One major drawback of many data visualization methods is that they become increasingly difficult to read as datasets grow. Histograms are an exception: they are mostly used to visualize large datasets of discrete and continuous data.
Histograms provide a visual representation of quantitative data by using the height of neatly joined rectangular bars to indicate the frequency of points in a class interval. This graph can be generated manually by drawing it with a straight ruler, or digitally using Excel.
Constructing a histogram is quite easy when done digitally. This article therefore covers what a histogram is and how to create one digitally using Excel.
What is a Histogram Graph?
A histogram graph is a graph that is used to visualize the frequency of discrete and continuous data using rectangular bars. The rectangular bars show the number of data points that fall into a specified class interval.
A histogram graph is also known as a histogram chart. Its class intervals (or bins) are not always of equal size across the horizontal axis. When constructing a histogram chart, the first thing to do after data collection is to determine the bins or class intervals.
The data should be grouped according to these intervals, and the number of data points that fall into each group is then counted to give its frequency. Since the class intervals usually form a continuous range of values, the rectangular bars are not spaced (i.e. they touch each other), unlike those of a bar chart.
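The grouping and counting described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `bin_frequencies`, the sample values, and the convention that each bin includes its lower boundary but not its upper one (except the last bin) are all assumptions made for the example.

```python
def bin_frequencies(values, edges):
    """Count how many values fall into each class interval.

    `edges` holds the bin boundaries in ascending order. Bin i covers
    [edges[i], edges[i + 1]); the last bin also includes its upper
    boundary. This boundary convention is an assumption of the sketch.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # value lies outside every bin, so it is ignored
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical sample data, not taken from the article.
scores = [5, 12, 18, 21, 25, 33, 38, 40, 47, 59, 60]
edges = [0, 20, 40, 60]
print(bin_frequencies(scores, edges))  # [3, 4, 4]
```

Each count becomes the height of one bar, and consecutive bins share a boundary, which is why the bars touch.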
Features of a Histogram Graph
A histogram graph is a popular graphing tool that provides a visual representation of data distribution. To identify a histogram chart, look out for the following features.
The title of a histogram is what gives an insight into the data visualized on the graph. It summarizes the information depicted on the histogram chart.
With a title on a histogram chart, a third party can easily tell what the graph is about without going any further to read the graph itself. In the illustration above, the title of the histogram chart is Histogram.
A histogram chart has two axes, the vertical and the horizontal axis. The vertical axis on the histogram chart indicates the frequency, while the horizontal axis indicates the class intervals or bins.
These two axes are usually labeled with what they represent, giving more meaning to the title of the histogram graph. The horizontal (x) axis shows the scale of values in which the class interval is measured.
Each label describes the kind of data plotted on its axis. In this case, the horizontal label is Bin while the vertical label is Frequency.
The bars are the body of a histogram graph, which mainly visualizes the data set. The bars on a histogram chart are rectangular in shape, and they indicate the number of times values fall in each class interval.
The height of each bar shows the frequency, while its width indicates the size of the class interval. In histogram graphs with uniform class intervals, all the bars have the same width.
The scale of a histogram is a set of numbers used to measure or quantify the dataset on the graph. This is part of what determines the width and height of each rectangular bar on the histogram chart.
It describes how each unit on the horizontal and vertical axes is structured. In the illustration, 1 unit on the horizontal axis is equivalent to 20, while 2 units on the vertical axis are equivalent to 2.
Histographs are graphs formed by joining the top midpoints of the rectangular bars on an existing histogram chart. Also known as frequency polygons, they are usually used when visualizing a dataset of continuous variables.
In other words, we may say that histographs are line charts drawn on a histogram chart, where the top midpoints of the rectangular bars are the data points.
Not all histogram charts have a histograph. However, when added to a histogram chart, it is very useful in giving more information about the dataset.
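As a rough sketch of how the points of a histograph could be derived, the Python snippet below pairs the midpoint of each class interval with the frequency of its bar. The function name `frequency_polygon_points` and the sample bins and counts are illustrative assumptions.

```python
def frequency_polygon_points(edges, counts):
    """Return one (midpoint, frequency) pair per bar.

    Joining these points with straight lines gives the histograph
    (frequency polygon) drawn over the histogram.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("expected exactly one more bin edge than counts")
    return [((lo + hi) / 2, c) for lo, hi, c in zip(edges, edges[1:], counts)]


# Hypothetical bins and frequencies used only for illustration.
edges = [0, 20, 40, 60]
counts = [4, 7, 2]
print(frequency_polygon_points(edges, counts))
# [(10.0, 4), (30.0, 7), (50.0, 2)]
```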
Types of Histogram Graph
Histogram graphs are classified into different types based on the distribution of the rectangular bars on the graph, that is, the way the bars are shaped and the overall structure of the graph.
The various distributions of histogram charts are highlighted below:
A histogram chart is said to have a normal distribution if it is bell-shaped: roughly symmetric, with a single peak in the middle and bars that taper off on both sides. As the name implies, it is the typical shape a distribution is expected to adopt, although this does not mean that other shapes are abnormal.
In some cases, a histogram graph can be said to be normally distributed by merely taking a look at it. However, other distributions are similar to a normal distribution, making it necessary to perform statistical calculations before a distribution can be said to be normal.
A bimodal distribution is an outcome of combining two different processes in one dataset. This distribution contains two different normally distributed graphs.
For example, the data collected from the two divisions of a class (e.g. Class 1A and Class 1B) may be bimodal. Because it looks like the back of a double-humped camel, this distribution is also referred to as a double-peaked distribution.
A skewed distribution is an asymmetric graph with an off-center peak tending towards the limit of the graph (or away from the tail). There are two types of skewed distributions, namely right-skewed and left-skewed distributions.
In a right-skewed distribution, the tail of the graph is on the right-hand side. It is also known as a positively skewed distribution.
On the other hand, a left-skewed distribution has its tail on the left-hand side, and is also known as a negatively skewed distribution.
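When the direction of the tail is hard to judge by eye, a skewness coefficient can help. The sketch below uses the moment coefficient of skewness (third central moment divided by the cube of the standard deviation), which is positive for a right-hand tail and negative for a left-hand tail. The function names, the sample data, and the `tolerance` threshold for calling a dataset "roughly symmetric" are assumptions chosen for illustration, not a standard cut-off.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical, so there is no tail at all
    return m3 / m2 ** 1.5


def describe_skew(values, tolerance=0.1):
    """Label a dataset by the sign of its skewness (tolerance is assumed)."""
    g = skewness(values)
    if g > tolerance:
        return "right-skewed"
    if g < -tolerance:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical data with a long tail on the right-hand side.
right_tail = [1, 1, 2, 2, 2, 3, 3, 4, 9]
print(describe_skew(right_tail))                 # right-skewed
print(describe_skew([-v for v in right_tail]))   # left-skewed
print(describe_skew([1, 2, 3, 4, 5]))            # roughly symmetric
```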
A random distribution lacks a particular pattern and produces several peaks. Hence, it is also referred to as a multimodal distribution.
A random distribution is usually generated when a dataset combines variables with different properties. In this case, the data should be sorted and analyzed separately.
Edge Peak Distribution
The edge peak distribution is very similar to the normal distribution, with the distinguishing factor being that the former has a large peak at one of the tails.
This kind of distribution is usually generated due to an error in the histogram graph construction.
As the name suggests, the bars in a comb distribution have a comb-like structure. These bars alternate between tall and short, making the graph look like the teeth of a comb.
The comb distribution is usually generated due to rounding errors in the data set.
A truncated distribution is generated when the tail of a normal distribution is cut off in the resulting histogram chart. Cutting off the tail sometimes gives it a heart-like shape, resulting in it being called a heart-cut distribution.
Histogram Graph Examples
Example 1: ABC Company is trying to reduce customer waiting time in queues for better customer satisfaction. To do this, they took a random customer and interviewed him on the amount of time he has had to wait in the queue in the past 10 days.
The table below is the result of this interview. Create a histogram chart using this data. Then determine what kind of distribution the graph has.
Solution: As shown below, we have created a histogram with 4 bins, each with its own frequency. The horizontal axis shows the range of waiting time, while the vertical axis indicates the number of recorded waits that fall into each interval of waiting time.
The resulting histogram chart has a random distribution.
Example 2: A philanthropist wants to donate supplies to a less privileged community. In order to determine the quantity and kind of supplies to donate, a survey was carried out in the community.
The goal of this survey is to find out the age demography of community residents. After conducting this survey, it was visualized on a histogram for easy analysis as shown below.
Given that the Number of People indicated in the graph is in hundreds, determine the population of people in each age bracket, then use your result to find out which of these brackets has the highest population.
Solution: Since the number of people specified on the histogram chart is in hundreds, the number of people that belong to each age bracket is:
0-20: 4*100 = 400 people
20-40: 7*100 = 700 people
40-60: 2*100 = 200 people
Clearly, the bracket with the highest population is the 20-40 age bracket. This means that the majority of the residents of this community are between the ages of 20 and 40.
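The same arithmetic can be checked with a short Python sketch. The bar heights are the ones read off the histogram in this example; the variable names are only illustrative.

```python
# Bar heights read from the histogram, in hundreds of people.
bar_heights = {"0-20": 4, "20-40": 7, "40-60": 2}
UNIT = 100  # each unit on the vertical axis stands for 100 people

# Scale every bar height to an actual head count.
populations = {bracket: height * UNIT for bracket, height in bar_heights.items()}

# The bracket with the tallest bar has the highest population.
largest = max(populations, key=populations.get)

print(populations)  # {'0-20': 400, '20-40': 700, '40-60': 200}
print(largest)      # 20-40
```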
Uses of Histogram Graph
- Identification of Mode in a Dataset
Without complex mathematical computations, one can easily identify the most common process outcome in a dataset. By visualizing the collected data on a histogram chart, the outcome with the highest frequency will easily stand out as the peak of the graph.
- Identifying data structure
One can easily spot trends in the data when reading a histogram chart. This can be helpful in making predictions, optimizing processes, and identifying possible issues.
- Spotting deviations in data
You can easily spot deviations in data when visualizing using a histogram compared to some other data visualization methods. This is very useful in cases where you are collecting data over time.
As soon as there is a deviation in the data, it shows up on the histogram chart. This lets you inspect the data collection process and make amends if the deviation is caused by human error.
Histogram vs Bar Chart
Although possessing very similar structures and characteristics, histogram charts and bar charts have quite a number of differences. These differences are what will assist us in recognizing these charts when we come across them.
Therefore, this section will dive into the similarities and differences between bar charts and histograms.
The rectangular bars in a bar chart are spaced, while the rectangular bars in a histogram are joined together. Also, the horizontal labels on a bar graph usually represent discrete or nominal categories.
Histograms, on the other hand, have their horizontal axis labeled with the bins or class intervals of the data set.
In data analysis, bar graphs are used to measure the frequency of categorical data, while histograms measure ordinal and quantitative (interval and ratio) data. Although the vertical axis of both graphs is discrete, the horizontal axis of a bar graph is categorical while that of a histogram is numerical.
The rectangular bars on a bar graph can be arranged in any order, and are often sorted in order of decreasing height. Histograms, on the other hand, have their rectangular bars ordered according to where they fall in the class interval.
Although the class intervals are arranged in ascending order, this does not mean the heights of the rectangular bars will follow the same order. This is because the frequency of each interval depends on the dataset.
Both histograms and bar charts have a title, axes, scale, and rectangular bars. At a glance, the two graphs look a lot like each other, mainly because they both use rectangular bars to visualize data.
Bar charts and histograms are both used to determine the mode or frequency of the elements in a dataset. The height of the rectangular bars corresponds to the frequency of a particular element in the dataset.
The simplest way of reading these two graphs is to follow the unofficial rule which states that, “The higher the bar, the higher the frequency, and vice versa.”
Histogram Graph in Excel
To construct a histogram chart using Excel, follow these few simple steps:
- Step 1: Enter your data into the Excel workbook as shown in the figure below.
The inputs are the set of random variables we want to visualize using Excel, while Bin Range is the range of values you want to be indicated on the horizontal axis. This is what determines the width of the rectangular bars and the scale of the horizontal axis.
- Step 2: Go to Data>Analysis|Data Analysis. If you can’t find the Data Analysis button, it means that you haven’t enabled the Analysis ToolPak add-in. To enable it, go to File>Options>Add-ins and a dialogue box similar to the one below will pop up.
Click on Analysis ToolPak, then on the Go button. Another dialogue box similar to the diagram below will pop up.
Check Analysis ToolPak and click OK as shown above. The Data Analysis button will now show up in the Analysis group of the Data tab.
- Step 3: After clicking on Data Analysis, a dialog box similar to the one below will pop up.
Click on Histogram and then OK to go to the next step.
- Step 4: Enter the input range and bin range, then check the necessary options as shown in the diagram below. Click OK and you will have your histogram chart.
- Step 5: In the diagram below, our generated histogram chart looks more like a bar chart with space in between the bars.
Edit the gap between the rectangular bars by highlighting all the bars, then going to Format Data Series>Series Options. Eliminate the gap by reducing Gap Width to 0% as shown below.
Separate the bars from each other by adding a Border Color to the bars.
- Step 6: Edit the bins or class intervals by right-clicking on the graph, then choosing Select Data. This will bring up a prompt similar to the one below.
Click Edit in the Horizontal (Category) Axis Labels to edit the bin labeling.
Click OK and there you have your histogram chart.
By merely looking at the shape of this graph, we can conclude that it has a random distribution.
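For readers who want to check the frequencies outside Excel, the counting step can be approximated in Python. This sketch assumes that a value is counted in a bin when it is less than or equal to that bin's boundary and greater than the previous boundary, with a final "More" row for anything above the last bin. The function name `excel_style_histogram` and the sample data are hypothetical.

```python
from bisect import bisect_left


def excel_style_histogram(values, bins):
    """Return (bin label, frequency) rows, ending with a 'More' row.

    Assumed convention: a value v is counted in the first bin whose
    boundary b satisfies v <= b; values above the last boundary are
    counted under 'More'.
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left returns the index of the first boundary >= v,
        # or len(bins) when v exceeds every boundary.
        counts[bisect_left(bins, v)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))


# Hypothetical input range and bin range.
data = [3, 20, 21, 40, 45, 61, 80, 95]
bin_range = [20, 40, 60, 80]
for label, frequency in excel_style_histogram(data, bin_range):
    print(label, frequency)
```

With this sample input the rows are 20: 2, 40: 2, 60: 1, 80: 2 and More: 1, which can be compared against the Bin and Frequency columns that Excel generates.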
Disadvantages of a Histogram Graph
- It is best suited to numerical data grouped into intervals. Since the bars are joined together, it is incorrect to use a histogram to visualize categorical data. That can only be done using a bar chart.
- Since data is grouped into bins, histograms cannot show exact values. You cannot identify specific data points in the graph by merely reading the histogram chart.
- It is not well suited to comparing two data sets on the same chart.
A histogram chart is a great visualization tool for studying the variation of large data sets. It is one of the most used data visualization methods in statistical analysis.
Histograms are one of the seven basic tools of quality control because of their simplicity and ability to solve the majority of quality-related issues. Quality control analysts study different things in the histogram chart, including the distribution, width, and height of the rectangular bars.
Although it is commonly said that the height of a bar in a histogram indicates the frequency of occurrences in the bin, this does not apply to all cases. Where the histogram does not have equal bins, the frequency of the interval is determined by the area of the rectangular bar.
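To illustrate the unequal-bin case, the sketch below computes the bar height as frequency divided by bin width (often called frequency density), so that height multiplied by width gives back the frequency. The function name and the sample bins are assumptions made for the example.

```python
def frequency_density(edges, counts):
    """Bar height for each bin when bin widths differ: count / width."""
    return [c / (hi - lo) for lo, hi, c in zip(edges, edges[1:], counts)]


# Hypothetical bins of unequal width: 0-10, 10-20 and 20-50.
edges = [0, 10, 20, 50]
counts = [5, 10, 15]

heights = frequency_density(edges, counts)
print(heights)  # [0.5, 1.0, 0.5]

# The area of each bar (height x width) recovers the original frequency.
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:])]
print(areas)  # [5.0, 10.0, 15.0]
```

Here the widest bin holds the most data points yet is no taller than the first bar, which is why the area rather than the height should be read as the frequency.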