



    Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

    Section 2.7

    Here is the link to section 2.7 of the textbook.

    2.7 Measures of the Spread of the Data

     

    An important characteristic of any set of data is the variation in the data. In some data sets, the data values are concentrated closely near the mean; in other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation. The standard deviation is a number that measures how far data values are from their mean.

    The standard deviation

    • provides a numerical measure of the overall amount of variation in a data set, and
    • can be used to determine whether a particular data value is close to or far from the mean.

     

     

    The standard deviation provides a measure of the overall variation in a data set

    The standard deviation is always positive or zero. The standard deviation is small when the data are all concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger when the data values are more spread out from the mean, exhibiting more variation.

    Suppose that we are studying the amount of time customers wait in line at the checkout at supermarket A and supermarket B. The average wait time at both supermarkets is five minutes. At supermarket A, the standard deviation for the wait time is two minutes; at supermarket B, the standard deviation for the wait time is four minutes.

     

    Because supermarket B has a higher standard deviation, we know that there is more variation in the wait times at supermarket B. Overall, wait times at supermarket B are more spread out from the average; wait times at supermarket A are more concentrated near the average.

     

    The standard deviation can be used to determine whether a data value is close to or far from the mean.

    Suppose that Rosa and Binh both shop at supermarket A. Rosa waits at the checkout counter for seven minutes and Binh waits for one minute. At supermarket A, the mean waiting time is five minutes and the standard deviation is two minutes.

    Rosa waits for seven minutes:

    • Seven is two minutes longer than the average of five; two minutes is equal to one standard deviation.
    • Rosa’s wait time of seven minutes is two minutes longer than the average of five minutes.
    • Rosa’s wait time of seven minutes is one standard deviation above the average of five minutes.

    Binh waits for one minute:

    • One is four minutes less than the average of five; four minutes is equal to two standard deviations.
    • Binh’s wait time of one minute is four minutes less than the average of five minutes.
    • Binh’s wait time of one minute is two standard deviations below the average of five minutes.
    • A data value that is two standard deviations from the average is just on the borderline for what many statisticians would consider to be far from the average. Considering data to be far from the mean if they are more than two standard deviations away is more of an approximate “rule of thumb” than a rigid rule. In general, the shape of the distribution of the data affects how much of the data is farther away than two standard deviations. (You will learn more about this in later chapters.)
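The arithmetic in the Rosa and Binh bullets can be sketched in a few lines of Python. The function names `num_std_devs` and `far_from_mean` are hypothetical helpers for illustration, not part of the course materials, and the "more than two standard deviations" cutoff is only the rule of thumb described above, not a rigid rule.

```python
def num_std_devs(value: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations `value` lies from `mean`.

    Positive results are above the mean; negative results are below it.
    """
    return (value - mean) / std_dev


def far_from_mean(value: float, mean: float, std_dev: float) -> bool:
    """Hypothetical rule-of-thumb check: 'far' means MORE than two
    standard deviations away, so exactly two counts as borderline, not far."""
    return abs(num_std_devs(value, mean, std_dev)) > 2


# Supermarket A: mean wait 5 minutes, standard deviation 2 minutes.
MEAN_A, SD_A = 5, 2
rosa = num_std_devs(7, MEAN_A, SD_A)  # one standard deviation above the mean
binh = num_std_devs(1, MEAN_A, SD_A)  # two standard deviations below the mean
print(rosa, binh)                     # 1.0 -2.0
print(far_from_mean(1, MEAN_A, SD_A)) # False: Binh's wait is only borderline
```

A wait of zero minutes would be 2.5 standard deviations below the mean and would be flagged as far under this sketch.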

    The number line may help you understand standard deviation. If we put five and seven on a number line, seven is to the right of five. We say, then, that seven is one standard deviation to the right of five because 5 + (1)(2) = 7.

    If one were also part of the data set, then one is two standard deviations to the left of five because 5 + (–2)(2) = 1.

    [Figure: number line]

     

    In general, a value = mean + (#ofSTDEVs)(standard deviation)

    where #ofSTDEVs = the number of standard deviations (positive for values above the mean, negative for values below it).

    One is two standard deviations less than the mean of five because 1 = 5 + (–2)(2).
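The general formula value = mean + (#ofSTDEVs)(standard deviation) translates directly into code. This is a minimal sketch; the function name `value_at` is an assumption made for illustration.

```python
def value_at(mean: float, num_std_devs: float, std_dev: float) -> float:
    """value = mean + (#ofSTDEVs)(standard deviation).

    A negative number of standard deviations moves left on the number line.
    """
    return mean + num_std_devs * std_dev


print(value_at(5, 1, 2))   # 7: one standard deviation to the right of five
print(value_at(5, -2, 2))  # 1: two standard deviations to the left of five
```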

     

    Calculating the Standard Deviation

    If x is a number, then the difference “x – mean” is called its deviation. In a data set, there are as many deviations as there are items in the data set. The deviations are used to calculate the standard deviation. If the numbers belong to a population, in symbols a deviation is x – μ. For sample data, in symbols a deviation is x – x̄.
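A short Python sketch of computing deviations for a sample. The wait times below are hypothetical values chosen only so that the arithmetic is easy to follow; they do not come from the text.

```python
from statistics import mean

# Hypothetical sample of checkout wait times, in minutes.
waits = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(waits)                      # sample mean, x̄
deviations = [x - x_bar for x in waits]  # one deviation x – x̄ per data value

print(x_bar)            # 5
print(deviations)       # [-3, -1, -1, -1, 0, 0, 2, 4]
print(sum(deviations))  # 0: deviations from the mean always sum to zero
```

There is exactly one deviation per data value, and because positive and negative deviations cancel, their plain average is always zero. That is why the next step squares them.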

    The procedure to calculate the standard deviation depends on whether the numbers are the entire population or are data from a sample. The calculations are similar, but not identical. Therefore the symbol used to represent the standard deviation depends on whether it is calculated from a population or a sample. The lower case letter s represents the sample standard deviation and the Greek letter σ (sigma, lower case) represents the population standard deviation. If the sample has the same characteristics as the population, then s should be a good estimate of σ.

    To calculate the standard deviation, we need to calculate the variance first. The variance is the average of the squares of the deviations (the x – x̄ values for a sample, or the x – μ values for a population). The symbol σ² represents the population variance; the population standard deviation σ is the square root of the population variance. The symbol s² represents the sample variance; the sample standard deviation s is the square root of the sample variance. You can think of the standard deviation as a special average of the deviations.
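The two-step recipe (variance first, then its square root) can be written out by hand and compared with Python's standard-library `statistics` module, whose `pstdev` function uses the population formula (divide by N) and whose `stdev` function uses the sample formula (divide by n – 1). The wait times are hypothetical, and the helper name `variance_from_deviations` is an assumption for illustration.

```python
import math
import statistics

# Hypothetical sample of checkout wait times, in minutes.
waits = [2, 4, 4, 4, 5, 5, 7, 9]


def variance_from_deviations(data: list[float], sample: bool) -> float:
    """Average the squared deviations from the mean.

    Population data divide by N; sample data divide by n - 1.
    """
    m = statistics.mean(data)
    squared = [(x - m) ** 2 for x in data]
    divisor = len(data) - 1 if sample else len(data)
    return sum(squared) / divisor


pop_var = variance_from_deviations(waits, sample=False)  # σ²
samp_var = variance_from_deviations(waits, sample=True)  # s²
sigma, s = math.sqrt(pop_var), math.sqrt(samp_var)       # take square roots

print(pop_var, sigma)  # 4.0 2.0
print(samp_var, s)     # s is slightly larger because it divides by n - 1

# Cross-check against the standard library.
assert math.isclose(sigma, statistics.pstdev(waits))
assert math.isclose(s, statistics.stdev(waits))
```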

    In this course, we calculate the standard deviation with the software GeoGebra. Once you get the summary statistics in GeoGebra, you can read the standard deviation from the table.

    If the numbers come from a census of the entire population and not a sample, when we calculate the average of the squared deviations to find the variance, we divide by N, the number of items in the population. If the data are from a sample rather than a population, when we calculate the average of the squared deviations, we divide by n – 1, one less than the number of items in the sample.
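Written as formulas, the procedure described above is as follows, where Σ sums over all data values, N is the number of items in the population, and n is the number of items in the sample:

```latex
\sigma^2 = \frac{\sum (x - \mu)^2}{N}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}

s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
```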


    Example. In a fifth-grade class, the teacher was interested in the average age and the sample standard deviation of the ages of her students. The following data are the ages for a SAMPLE of n = 20 fifth-grade students. The ages are rounded to the nearest half year:

    9; 9.5; 9.5; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5
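As an optional cross-check on the numbers GeoGebra reports, the same summary statistics can be computed with Python's standard `statistics` module. This is a sketch outside the course workflow; because these ages are a sample, the value to compare is the sample standard deviation s.

```python
import statistics

# Ages (years, rounded to the nearest half year) for the SAMPLE of n = 20 fifth graders.
ages = [9, 9.5, 9.5, 10, 10, 10, 10, 10.5, 10.5, 10.5, 10.5,
        11, 11, 11, 11, 11, 11, 11.5, 11.5, 11.5]

x_bar = statistics.mean(ages)    # sample mean
s = statistics.stdev(ages)       # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(ages)  # population formula (divides by N), for comparison

print(len(ages), x_bar)            # 20 10.525
print(round(s, 2), round(sigma, 2))  # 0.72 0.7
```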

    1. Enter the data into the Spreadsheet in GeoGebra.

    2. Select/highlight the data.

    3. Under the “tool” icon, select “One Variable Analysis”.

    4. Click on the “Show Statistics” icon on the top right.

    5. Now you can see the table under “Statistics”.


    From the table, you can read the standard deviations. In the table, σ and s