Sunday, January 5, 2020

The Range of Statistical Data Sets

In statistics and mathematics, the range is the difference between the maximum and minimum values of a data set. It gives statisticians a quick sense of how varied the data are.

Two important features of a data set are its center and its spread. The center can be measured in a number of ways, the most popular being the mean, median, mode, and midrange. In a similar fashion, there are different ways to measure how spread out a data set is, and the easiest and crudest of these is the range.

The calculation of the range is very straightforward. All we need to do is find the difference between the largest data value in our set and the smallest data value. Stated succinctly, we have the following formula:

Range = Maximum Value - Minimum Value

For example, the data set 4, 6, 10, 15, 18 has a maximum of 18, a minimum of 4, and a range of 18 - 4 = 14.

Limitations of Range

The range is a very crude measurement of the spread of data because it is extremely sensitive to outliers: a single data value can greatly affect its value. For example, consider the set of data 1, 2, 3, 4, 6, 7, 7, 8. The maximum value is 8, the minimum is 1, and the range is 7. Then consider the same set of data, only with the value 100 included. The range now becomes 100 - 1 = 99, so the addition of a single extra data point greatly affected the value of the range. The standard deviation is another measure of spread that is less susceptible to outliers than the range, but the drawback is that its calculation is much more complicated.

The range also tells us nothing about the internal features of our data set.
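The formula and the outlier example above can be sketched in a few lines of Python. This is only an illustration: the helper name data_range is a hypothetical choice, not something from the discussion itself, and the sample standard deviation from the standard library's statistics module is printed purely for comparison.

```python
import statistics


def data_range(values):
    """Return the range: maximum value minus minimum value.

    The name data_range is a hypothetical helper chosen for this sketch.
    """
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# First worked example from the text: 18 - 4 = 14
small = [4, 6, 10, 15, 18]
print(data_range(small))

# Outlier example: one extra value moves the range from 7 to 99
base = [1, 2, 3, 4, 6, 7, 7, 8]
with_outlier = base + [100]
print(data_range(base))
print(data_range(with_outlier))

# Sample standard deviation of the same two sets, for comparison
print(statistics.stdev(base))
print(statistics.stdev(with_outlier))
```

Because the range depends only on the two extreme values, appending 100 replaces the maximum outright, which is why the result jumps so sharply.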
To see how little the range reveals about internal structure, consider the data set 1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10, whose range is 10 - 1 = 9. Compare this to the data set 1, 1, 1, 2, 9, 9, 9, 10. Here the range is, yet again, 9. However, unlike the first set, the data in this second set are clustered around the minimum and maximum. Other statistics, such as the first and third quartiles, would need to be used to detect some of this internal structure.

Applications of Range

The range is a good way to get a very basic understanding of how spread out the numbers in a data set are, because it is easy to calculate: it requires only a basic arithmetic operation. There are also a few other applications of the range in statistics.

The range can be used to estimate another measure of spread, the standard deviation. Rather than go through a fairly complicated formula to find the standard deviation, we can instead use what is called the range rule, in which the range is the fundamental quantity.

The range also occurs in a boxplot, or box-and-whiskers plot. The maximum and minimum values are both graphed at the ends of the whiskers, and the total length of the whiskers and box is equal to the range.
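As a rough sketch of the two ideas above, the Python snippet below compares the quartiles of the two data sets that share a range of 9 and then applies a range rule. Two things here are assumptions rather than statements from the text: the range rule is taken to be the common rule of thumb that the standard deviation is roughly the range divided by 4, and the quartiles use the default method of the standard library's statistics.quantiles (other quartile conventions give slightly different values). The helper names are hypothetical.

```python
import statistics


def data_range(values):
    """Maximum value minus minimum value (hypothetical helper name)."""
    return max(values) - min(values)


def quartiles(values):
    """Return (Q1, Q3) using statistics.quantiles with its default method."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q1, q3


def range_rule_estimate(values):
    """Estimate the standard deviation as range / 4.

    Assumption: this is the usual "range rule of thumb"; the text does
    not spell out the formula.
    """
    return data_range(values) / 4


spread_out = [1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10]
clustered = [1, 1, 1, 2, 9, 9, 9, 10]

for name, data in [("spread_out", spread_out), ("clustered", clustered)]:
    q1, q3 = quartiles(data)
    print(name, "range:", data_range(data), "Q1:", q1, "Q3:", q3)

# Both sets have the same range, so the rule gives the same estimate for both,
# even though their actual sample standard deviations differ.
print(range_rule_estimate(spread_out), statistics.stdev(spread_out))
print(range_rule_estimate(clustered), statistics.stdev(clustered))
```

The two sets have identical ranges but different quartiles, which is the internal structure the range alone cannot show. The same comparison also shows the limits of the range rule: it is a quick estimate, not a substitute for computing the standard deviation.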
