    Attribution-NonCommercial-ShareAlike 4.0 International

    Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

    Section 2.2

    Here is the link to Section 2.2 of the textbook. Below is a modified version of the section.

    Histogram

    For most of the work you do in this book, you will use a histogram to display the data. One advantage of a histogram is that it can readily display large data sets. A rule of thumb is to use a histogram when the data set consists of 100 values or more.

    A histogram consists of contiguous (adjoining) boxes. It has both a horizontal axis and a vertical axis. The horizontal axis is labeled with what the data represents (for instance, distance from your home to school). The vertical axis is labeled either frequency or relative frequency (or percent frequency or probability). The graph will have the same shape with either label. The histogram can give you the shape of the data, the center, and the spread of the data.

    To construct a histogram, first decide how many bars or intervals, also called classes, will represent the data. Many histograms use five to 15 bars or classes for clarity. Once the number of bars is chosen, choose a starting point for the first interval.

    Example 2.7

    The following data are the heights (in inches to the nearest half inch) of 100 male semiprofessional soccer players. The heights are continuous data, since height is measured.

    60 60.5 61 61 61.5 63.5 63.5 63.5
    64 64 64 64 64 64 64 64.5
    64.5 64.5 64.5 64.5 64.5 64.5 64.5 66
    66 66 66 66 66 66 66 66
    66 66.5 66.5 66.5 66.5 66.5 66.5 66.5
    66.5 66.5 66.5 66.5 67 67 67 67
    67 67 67 67 67 67 67 67
    67.5 67.5 67.5 67.5 67.5 67.5 67.5 68
    68 69 69 69 69 69 69 69
    69 69 69 69.5 69.5 69.5 69.5 69.5
    70 70 70 70 70 70 70.5 70.5
    70.5 71 71 71 72 72 72 72.5
    72.5 73 73.5 74

     

    The smallest data value is 60. We can use this as the starting point.

    The largest value is 74. Let’s use this as the ending value.

    Next, calculate the width of each bar or class interval. To calculate this width, subtract the starting point from the ending value and divide by the number of bars (you must choose the number of bars you desire). Suppose you choose five bars.

     

    \[ \frac{74 - 60}{5} = 2.8 \]

     

    If you round 2.8 up to the next whole number, it becomes 3. So, let's build five bars, each with a width of 3. The boundaries are:

    • 60
    • 60 + 3 = 63
    • 63 + 3 = 66
    • 66 + 3 = 69
    • 69 + 3 = 72
    • 72 + 3 = 75

     

    The heights 60 through 61.5 inches are in the interval 60–63. The heights 63.5 through 64.5 are in the interval 63–66. The heights 66 through 68 are in the interval 66–69. The heights 69 through 71 are in the interval 69–72. The heights 72 through 74 are in the interval 72–75.
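The width and boundary calculation above can be sketched in a few lines of Python. This is only an illustration: the variable names are our own, and rounding the width up with `math.ceil` is one reasonable reading of "round 2.8 up".

```python
import math

smallest, largest = 60, 74   # starting point and ending value from the data
num_bars = 5                 # chosen by the person building the histogram

raw_width = (largest - smallest) / num_bars   # 14 / 5 = 2.8
width = math.ceil(raw_width)                  # round up to a whole number

# There is always one more boundary than there are bars.
boundaries = [smallest + width * i for i in range(num_bars + 1)]

print(raw_width, width)
print(boundaries)
```

With these inputs the boundaries come out as 60, 63, 66, 69, 72, and 75, so the last bar ends one inch past the largest height.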

    The following histogram displays the heights on the x-axis and frequency on the y-axis.

     

    [Figure: frequency histogram of the heights]
    As you can see in the picture above, this histogram has 5 bins, each of width 3. The x-axis represents the heights (in inches) and the y-axis represents the frequency. A height equal to a boundary, such as 66, is counted in the bin that starts at that boundary. We can read the histogram as follows:
    • The first bin tells us there are 5 players with heights between 60 and 63 inches.
    • The second bin tells us there are 18 players with heights between 63 and 66 inches.
    • The third bin tells us there are 42 players with heights between 66 and 69 inches.
    • The fourth bin tells us there are 27 players with heights between 69 and 72 inches.
    • The last bin tells us there are 8 players with heights between 72 and 75 inches.
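To check the bar heights, the frequencies can be tallied directly from the data. The sketch below is an illustration rather than part of the textbook: the helper name `bin_frequencies` is made up, the heights are stored as a value-to-count table to save space, and each class is assumed to include its left boundary but not its right one.

```python
from bisect import bisect_right

# Heights from Example 2.7, stored as {height: number of players}.
height_counts = {
    60: 1, 60.5: 1, 61: 2, 61.5: 1, 63.5: 3, 64: 7, 64.5: 8,
    66: 10, 66.5: 11, 67: 12, 67.5: 7, 68: 2, 69: 10, 69.5: 5,
    70: 6, 70.5: 3, 71: 3, 72: 3, 72.5: 2, 73: 1, 73.5: 1, 74: 1,
}
heights = [h for h, n in height_counts.items() for _ in range(n)]

boundaries = [60, 63, 66, 69, 72, 75]


def bin_frequencies(values, edges):
    """Count values per class; a class includes its left edge only."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1   # index of the class containing v
        if 0 <= i < len(counts):         # ignore values outside the edges
            counts[i] += 1
    return counts


frequencies = bin_frequencies(heights, boundaries)
print(len(heights), frequencies)
```

The five counts should add up to the sample size of 100, which is a quick sanity check on any histogram.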
    The relative frequency histogram of the same data is shown below. The only difference is that the y-axis shows the relative frequency of the data values instead of the frequency.
    [Figure: relative frequency histogram]
    The following picture also shows a relative frequency histogram for the data, this time with eight bins of width 2. Interpret the histogram.
    [Figure: relative frequency histogram]
    Answer:
    • The first bin tells us 5% of the sample have heights between 59.95 and 61.95 inches.
    • The second bin tells us 3% of the sample have heights between 61.95 and 63.95 inches.
    • The third bin tells us 15% of the sample have heights between 63.95 and 65.95 inches.
    • The fourth bin tells us 40% of the sample have heights between 65.95 and 67.95 inches.
    • The fifth bin tells us 17% of the sample have heights between 67.95 and 69.95 inches.
    • The sixth bin tells us 12% of the sample have heights between 69.95 and 71.95 inches.
    • The seventh bin tells us 7% of the sample have heights between 71.95 and 73.95 inches.
    • The last bin tells us 1% of the sample have heights between 73.95 and 75.95 inches.
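The percentages in this answer can be reproduced by dividing each bin count by the sample size. This is a minimal sketch, assuming eight bins of width 2 starting at 59.95 (as read off the answer above); the variable names are illustrative.

```python
# Heights from Example 2.7, stored as {height: number of players}.
height_counts = {
    60: 1, 60.5: 1, 61: 2, 61.5: 1, 63.5: 3, 64: 7, 64.5: 8,
    66: 10, 66.5: 11, 67: 12, 67.5: 7, 68: 2, 69: 10, 69.5: 5,
    70: 6, 70.5: 3, 71: 3, 72: 3, 72.5: 2, 73: 1, 73.5: 1, 74: 1,
}
heights = [h for h, n in height_counts.items() for _ in range(n)]

start, width, num_bins = 59.95, 2, 8   # assumed from the bin labels

counts = [0] * num_bins
for h in heights:
    # Floor division gives the index of the bin that contains h.
    counts[int((h - start) // width)] += 1

n = len(heights)
relative = [c / n for c in counts]         # relative frequencies
percents = [100 * c / n for c in counts]   # the same values as percentages

print(counts)
print(percents)
```

Because the boundaries end in .95 and every height ends in .0 or .5, no data value can fall exactly on a boundary in this setup.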

    TRY IT 2.7

    The following data are the shoe sizes of 50 male students. The sizes are discrete data since shoe size is measured in whole and half units only. Construct a histogram and calculate the width of each bar or class interval. Suppose you choose six bars.


    9; 9; 9.5; 9.5; 10; 10; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5
    11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5; 11.5; 11.5; 11.5; 11.5
    12; 12; 12; 12; 12; 12; 12; 12.5; 12.5; 12.5; 12.5; 14

     

    Example 2.8
    Create a histogram for the following data: the number of books bought by 50 part-time college students at ABC College. The number of books is discrete data, since books are counted.

    1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1
    2; 2; 2; 2; 2; 2; 2; 2; 2; 2
    3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3
    4; 4; 4; 4; 4; 4
    5; 5; 5; 5; 5
    6; 6
    Eleven students buy one book. Ten students buy two books. Sixteen students buy three books. Six students buy four books. Five students buy five books. Two students buy six books.
    Because the data are integers, subtract 0.5 from 1, the smallest data value, and add 0.5 to 6, the largest data value. Then the starting point is 0.5 and the ending value is 6.5.
    Calculate the number of bars as follows:
    \[ \frac{6.5 - 0.5}{\text{number of bars}} = 1 \]
    where 1 is the width of a bar. Therefore, the number of bars is 6.
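The same arithmetic can be written out as a short sketch. The names `start`, `end`, and `edges` are our own, and the bar width of 1 is taken from the text above.

```python
smallest, largest = 1, 6    # smallest and largest number of books

start = smallest - 0.5      # starting point: 0.5
end = largest + 0.5         # ending value: 6.5
width = 1                   # one bar per whole number of books

num_bars = int((end - start) / width)
edges = [start + width * i for i in range(num_bars + 1)]

print(start, end, num_bars)
print(edges)
```

Shifting the boundaries by 0.5 centers each bar on a whole number, so no data value can land on a boundary.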
    The following histogram displays the number of books on the x-axis and the frequency on the y-axis.


    We can read the histogram as follows:
    • The first bin tells us that 11 students bought 1 book.
    • The second bin tells us that 10 students bought 2 books.
    • The third bin tells us that 16 students bought 3 books.
    • The fourth bin tells us that 6 students bought 4 books.
    • The fifth bin tells us that 5 students bought 5 books.
    • The last bin tells us that 2 students bought 6 books.
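For discrete data like this, the bar heights are simply the counts of each value. Here is a small sketch using `collections.Counter` from the Python standard library; the list `books` rebuilds the data of Example 2.8 from its stated frequencies.

```python
from collections import Counter

# Number of books bought by each of the 50 students in Example 2.8.
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2

frequency = Counter(books)   # maps each value to how often it occurs

for value in sorted(frequency):
    print(f"{value} book(s): {frequency[value]} students")

print("total students:", sum(frequency.values()))
```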

    TRY IT 2.8

    The following data are the number of sports played by 50 student athletes. The number of sports is discrete data since sports are counted.

    1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1
    2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 2
    3; 3; 3; 3; 3; 3; 3; 3


    20 student athletes play one sport. 22 student athletes play two sports. Eight student athletes play three sports.

     

    Example 2.9

    Using this data set, construct a histogram.

    Number of Hours My Classmates Spent Playing Video Games on Weekends
    9.95 10 2.25 16.75 0
    19.5 22.5 7.5 15 12.75
    5.5 11 10 20.75 17.5
    23 21.9 24 23.75 18
    20 15 22.9 18.8 20.5

     

     

    Answer:

    Some values in this data set fall on boundaries for the class intervals. A value is counted in a class interval if it falls on the left boundary, but not if it falls on the right boundary. Different researchers may set up histograms for the same data in different ways. There is more than one correct way to set up a histogram. 
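The left-boundary rule can be made concrete with a short sketch. The class intervals used here (width 5, starting at 0) are an assumption chosen for illustration, since the original figure is not reproduced; other choices are equally valid.

```python
# Hours spent playing video games, from the table in Example 2.9.
hours = [
    9.95, 10, 2.25, 16.75, 0,
    19.5, 22.5, 7.5, 15, 12.75,
    5.5, 11, 10, 20.75, 17.5,
    23, 21.9, 24, 23.75, 18,
    20, 15, 22.9, 18.8, 20.5,
]

width = 5                        # assumed class width
edges = [0, 5, 10, 15, 20, 25]   # assumed class boundaries

counts = [0] * (len(edges) - 1)
for h in hours:
    # Floor division puts a boundary value such as 10 or 15 into the
    # class that starts there, so the left boundary is included.
    counts[int(h // width)] += 1

for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left} to under {right}: {c}")
```

Under these assumed intervals, the two values of 10 land in the 10-to-15 class and the two values of 15 land in the 15-to-20 class.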
     

    TRY IT 2.9

    The following data represent the number of employees at various restaurants in New York City. Using this data, create a histogram.

    22; 35; 15; 26; 40; 28; 18; 20; 25; 34; 39; 42; 24; 22; 19; 27; 22; 34; 40; 20; 38; and 28

    Use 10–19 as the first interval.

     

    Dot Plot

    A dot plot is another graph for visualizing the distribution of data. In a dot plot, we put a dot above each data value, stacking the dots when a value occurs more than once.
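As a rough illustration, a dot plot can be drawn in plain text by stacking one mark per occurrence above each value. The function `text_dot_plot` below is a made-up helper, not a library routine. It assumes small whole-number data and reuses the book counts from Example 2.8.

```python
from collections import Counter


def text_dot_plot(values):
    """Return a text dot plot for single-digit whole-number data."""
    counts = Counter(values)
    lo, hi = min(counts), max(counts)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row down to the row just above the axis.
    for level in range(tallest, 0, -1):
        cells = ["*" if counts.get(x, 0) >= level else " "
                 for x in range(lo, hi + 1)]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(x) for x in range(lo, hi + 1)))   # axis labels
    return "\n".join(rows)


# Number of books bought by the 50 students in Example 2.8.
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2
print(text_dot_plot(books))
```

The height of each column equals the frequency of that value, so the tallest column (16 marks) sits above 3.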

    Example. The dot plot shows the distribution of ages of employees at a company.

     

     

    • How many employees are in this company?

           Answer: There are 25 dots in this graph, so in total there are 25 employees in this company.

    • How many employees are 40 years old?

            Answer: There are 4 dots above 40 indicating that 4 employees are 40 years old.

     

    • What percentage of the employees are 40 years old?

           Answer: 4 of the 25 employees are 40 years old, so 16% of the employees are 40 years old.

    \[ \frac{4}{25} \times 100 = 16\% \]

     

    •  How many employees are 45 years old?

             Answer: 2 employees are 45 years old.

     

     

    • How many employees are 44 or older?

             Answer: There are 6 employees who are 44 years old and 2 employees who are 45 years old. So, in total, 8 employees are 44 or older.

     

     

    • What percentage of the employees are 44 or older?

            Answer: 8 of the 25 employees are 44 or older, so 32% of the employees are 44 or older.

    \[ \frac{8}{25} \times 100 = 32\% \]
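The percentage answers above follow one pattern: divide the number of dots of interest by the total number of dots and multiply by 100. The sketch below uses only the counts stated in the answers (the full dot plot is not reproduced here), and the helper name `percent` is our own.

```python
total_employees = 25            # total number of dots in the plot
dots = {40: 4, 44: 6, 45: 2}    # only the counts stated in the answers


def percent(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


aged_40 = percent(dots[40], total_employees)

# Employees aged 44 or older: add the dots at 44 and at 45.
older = sum(n for age, n in dots.items() if age >= 44)
aged_44_plus = percent(older, total_employees)

print(aged_40, older, aged_44_plus)
```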