You can click on section 2.3 to access this section of the textbook. Below is a modified version of this section.
Measures of the Location of the Data
The common measures of location are quartiles and percentiles. Quartiles are special percentiles. The first quartile, Q1, is the same as the 25th percentile, and the third quartile, Q3, is the same as the 75th percentile. The median, M, is called both the second quartile and the 50th percentile.
To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. Scoring in the 90th percentile of an exam does not necessarily mean that you received 90% on the test. It means that 90% of test scores are the same as or less than your score and 10% of the test scores are the same as or greater than your score.
Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively. One instance in which colleges and universities use percentiles is when SAT results are used to determine a minimum testing score that will be used as an acceptance factor. For example, suppose Duke accepts SAT scores at or above the 75th percentile. That translates into a score of at least 1220.
Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of the test scores are less (and not the same or less) than your score, it would be acceptable because removing one particular data value is not significant.
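To see why the two wordings barely differ in a large population, the following minimal sketch compares "the same as or less than" with strictly "less than". The population of 10,000 distinct scores and the helper names `percent_at_or_below` and `percent_below` are hypothetical, chosen only for illustration.

```python
def percent_at_or_below(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


def percent_below(values, x):
    """Percentage of values that are strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)


# Hypothetical population: 10,000 distinct scores, 1 through 10,000.
scores = list(range(1, 10_001))
my_score = 9_000

print(percent_at_or_below(scores, my_score))  # share at or below my score
print(percent_below(scores, my_score))        # share strictly below my score
```

In this made-up population the two percentages differ by only one data value out of 10,000, which is why either wording is acceptable when the population is large.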
The median is a number that measures the “center” of the data. You can think of the median as the “middle value,” but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same as or smaller than the median, and half the values are the same as or larger. For example, consider the following data.
1; 11.5; 6; 7.2; 4; 8; 9; 10; 6.8; 8.3; 2; 2; 10; 1
Ordered from smallest to largest:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
Since there are 14 observations, the median is between the seventh value, 6.8, and the eighth value, 7.2. To find the median, add the two values together and divide by two: (6.8 + 7.2) / 2 = 7.
The median is seven. Half of the values are smaller than seven and half of the values are larger than seven.
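The median rule can be sketched in a few lines of Python. The function name `median` is just an illustrative choice; the logic is to sort the data, then take the middle value, or the average of the two middle values when the count is even.

```python
def median(values):
    """Middle value of the ordered data.

    For an even number of observations, return the mean of the
    two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]
print(median(data))  # average of the 7th and 8th ordered values
```

Python's standard library also offers `statistics.median`, which follows the same rule.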
Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data. To find the quartiles, first find the median or second quartile. The first quartile, Q1, is the middle value of the lower half of the data, and the third quartile, Q3, is the middle value, or median, of the upper half of the data. To get the idea, consider the same data set:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
The median or second quartile is seven. The lower half of the data are 1, 1, 2, 2, 4, 6, 6.8. The middle value of the lower half is two.
1; 1; 2; 2; 4; 6; 6.8
The number two, which is part of the data, is the first quartile. One-fourth of the entire set of values is the same as or less than two, and three-fourths of the values are more than two.
The upper half of the data is 7.2, 8, 8.3, 9, 10, 10, 11.5. The middle value of the upper half is nine.
The third quartile, Q3, is nine. Three-fourths (75%) of the ordered data set are less than nine. One-fourth (25%) of the ordered data set are greater than nine. The third quartile is part of the data set in this example.
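The median-of-halves procedure used here can be sketched as follows. This is a minimal illustration, assuming that the median itself is left out of both halves when the number of observations is odd. The helper names `median_of_sorted` and `quartiles` are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method.

    Assumption: when n is odd, the median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # the smallest n // 2 values
    upper = ordered[n - half:]    # the largest n // 2 values
    return (median_of_sorted(lower),
            median_of_sorted(ordered),
            median_of_sorted(upper))


data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
q1, m, q3 = quartiles(data)
print(q1, m, q3)
```

Statistical software often uses other quartile conventions, such as interpolating between data values, so library functions may report slightly different quartiles for the same data.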
The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the data. It is the difference between the third quartile (Q3) and the first quartile (Q1).
IQR = Q3 – Q1
The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it lies more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile, that is, if it is less than Q1 – (1.5)(IQR) or greater than Q3 + (1.5)(IQR). Potential outliers always require further investigation.
A potential outlier is a data point that is significantly different from the other data points. These special data points may be errors or some kind of abnormality or they may be a key to understanding the data.
For the following 13 real estate prices, calculate the IQR and determine if any prices are potential outliers. Prices are in dollars.
389,950; 230,500; 158,000; 479,000; 639,000; 114,950; 5,500,000; 387,000; 659,000; 529,000; 575,000; 488,800; 1,095,000
Order the data from smallest to largest.
114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,800; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000
M = 488,800
Q1 = (230,500 + 387,000) / 2 = 308,750
Q3 = (639,000 + 659,000) / 2 = 649,000
IQR = 649,000 – 308,750 = 340,250
(1.5)(IQR) = (1.5)(340,250) = 510,375
Q1 – (1.5)(IQR) = 308,750 – 510,375 = –201,625
Q3 + (1.5)(IQR) = 649,000 + 510,375 = 1,159,375
No house price is less than –201,625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
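The calculation above can be checked with a short script. This is a sketch, not a definitive implementation: it assumes the same median-of-halves quartile method used in this section, and the function names `median_of_sorted` and `iqr_outlier_check` are made up for illustration.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def iqr_outlier_check(values):
    """Return (IQR, lower fence, upper fence, potential outliers)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median_of_sorted(ordered[:half])       # median of the lower half
    q3 = median_of_sorted(ordered[n - half:])   # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [v for v in ordered if v < lower_fence or v > upper_fence]
    return iqr, lower_fence, upper_fence, flagged


prices = [389_950, 230_500, 158_000, 479_000, 639_000, 114_950, 5_500_000,
          387_000, 659_000, 529_000, 575_000, 488_800, 1_095_000]

iqr, lower_fence, upper_fence, flagged = iqr_outlier_check(prices)
print(iqr, lower_fence, upper_fence, flagged)
```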
For the two data sets in the test scores example, find the following:
- The interquartile range. Compare the two interquartile ranges.
- Any outliers in either set.
The five number summary for the day and night classes is
a) The IQR for the day group is Q3 – Q1 = 82.5 – 56 = 26.5
The IQR for the night group is Q3 – Q1 = 89 – 78 = 11
The interquartile range (the spread or variability) for the day class is larger than the night class IQR. This suggests more variation will be found in the day class’s test scores.
b) Day class outliers are found using the IQR times 1.5 rule. So,
- Q1 – IQR(1.5) = 56 – 26.5(1.5) = 16.25
- Q3 + IQR(1.5) = 82.5 + 26.5(1.5) = 122.25
Since the minimum and maximum values for the day class are greater than 16.25 and less than 122.25, there are no outliers.
Night class outliers are calculated as:
- Q1 – IQR(1.5) = 78 – 11(1.5) = 61.5
- Q3 + IQR(1.5) = 89 + 11(1.5) = 105.5
For this class, any test score less than 61.5 is an outlier. Therefore, the scores of 45 and 25.5 are outliers. Since no test score is greater than 105.5, there is no upper end outlier.
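When only the quartiles are known, the 1.5 × IQR rule can be applied directly to them. The sketch below uses the Q1 and Q3 values quoted above; the function name `fences` and its parameter `k` are hypothetical.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper cutoffs for potential outliers: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


day_low, day_high = fences(56, 82.5)
night_low, night_high = fences(78, 89)
print(day_low, day_high)
print(night_low, night_high)

# The two night-class scores named in the text both fall below the lower cutoff.
low_scores = [s for s in (45, 25.5) if s < night_low]
print(low_scores)
```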
Interpreting Percentiles, Quartiles, and Median
A percentile indicates the relative standing of a data value when data are sorted into numerical order from smallest to largest. For any p, p percent of data values are less than or equal to the pth percentile. For example, 15% of data values are less than or equal to the 15th percentile.
- Low percentiles always correspond to lower data values.
- High percentiles always correspond to higher data values.
A percentile may or may not correspond to a value judgment about whether it is “good” or “bad.” The interpretation of whether a certain percentile is “good” or “bad” depends on the context of the situation to which the data applies. In some situations, a low percentile would be considered “good”; in other contexts a high percentile might be considered “good.” In many situations, there is no value judgment that applies.
On a timed math test, the first quartile for time it took to finish the exam was 35 minutes. Interpret the first quartile in the context of this situation.
- Twenty-five percent of students finished the exam in 35 minutes or less.
- In other words, seventy-five percent of students finished the exam in 35 minutes or more.
- Here a low percentile could be considered good, as finishing more quickly on a timed exam is desirable. (If you take too long, you might not be able to finish.)
On a 20 question math test, the 70th percentile for the number of correct answers was 16. Interpret the 70th percentile in the context of this situation.
- 70% of students answered 16 or fewer questions correctly.
- In other words, 30% of students answered more than 16 questions correctly.
- Here a high percentile could be considered good, as answering more questions correctly is desirable.
At a community college, it was found that the 30th percentile of credit units that students are enrolled for is seven units. Interpret the 30th percentile in the context of this situation.
- 30% of students are enrolled in 7 credit units or fewer.
- In other words, 70% of students are enrolled in 7 or more units.
- Here a high percentile could be considered good, as enrolling in more credit units is desirable.
Sharpe Middle School is applying for a grant that will be used to add fitness equipment to the gym. The principal surveyed 15 anonymous students to determine how many minutes a day the students spend exercising. The results from the 15 anonymous students are shown.
0; 40; 60; 30; 60; 10; 45; 30; 300; 90; 30; 120; 60; 0; 20
Determine the following five values.
- Min = 0
- Q1 = 20
- Med = 40
- Q3 = 60
- Max = 300
If you were the principal, would you be justified in purchasing new fitness equipment?
Since 75% of the students exercise for 60 minutes or less daily, and since the IQR is 40 minutes (60 – 20 = 40), we know that half of the students surveyed exercise between 20 minutes and 60 minutes daily. This seems a reasonable amount of time spent exercising, so the principal would be justified in purchasing the new equipment.
However, the principal needs to be careful. The value 300 appears to be a potential outlier.
Q3 + 1.5(IQR) = 60 + (1.5)(40) = 120.
The value 300 is greater than 120 so it is a potential outlier. If we delete it and calculate the five values, we get the following values:
- Min = 0
- Q1 = 20
- Med = 35
- Q3 = 60
- Max = 120
We still have 75% of the students exercising for 60 minutes or less daily and half of the students exercising between 20 and 60 minutes a day. However, 15 students is a small sample and the principal should survey more students to be sure of his survey results.
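The five values for the survey, with and without the potential outlier, can be reproduced with the sketch below. It assumes the median-of-halves quartile method used throughout this section; the names `median_of_sorted` and `five_number_summary` are illustrative only.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (Min, Q1, Med, Q3, Max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return (ordered[0],
            median_of_sorted(ordered[:half]),      # Q1: median of lower half
            median_of_sorted(ordered),             # Med
            median_of_sorted(ordered[n - half:]),  # Q3: median of upper half
            ordered[-1])


minutes = [0, 40, 60, 30, 60, 10, 45, 30, 300, 90, 30, 120, 60, 0, 20]
without_300 = [m for m in minutes if m != 300]

print(five_number_summary(minutes))
print(five_number_summary(without_300))
```

Removing 300 changes the maximum and shifts the median slightly, but Q1 and Q3 stay the same, which is why the conclusions about the middle half of the students do not change.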