In grouped data, we generally assume that the frequencies within each class interval are concentrated at the middle of the interval. Using these middle values (midpoints), we compute moments and, from them, the mean, variance, and other quantitative measures. However, this assumption does not hold in general, and it introduces a grouping error.
Understanding grouping error
In simple terms, it can be explained as follows:
- Grouped data is constructed by dividing the entire range of continuous data into class intervals.
- The frequency of a class is the number of data points that fall within that class interval.
- Grouping represents every data point in a class interval by a single value, called the midpoint.
- The midpoint is the average of the lower and upper class limits: (lower limit + upper limit)/2.
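The grouping steps listed above can be sketched in Python. This is a minimal illustration under assumptions of our own: classes are half-open intervals of equal width starting at a chosen lower limit, empty classes are simply omitted, the data values are made up, and `group_data` is a hypothetical helper, not part of any library.

```python
import math
from collections import Counter


def group_data(values, lower, width):
    """Group continuous values into equal-width class intervals.

    Classes are assumed to be half-open, [lower + k*width, lower + (k+1)*width),
    where `lower` is the lower limit of the first class. Returns a list of
    (lower limit, upper limit, midpoint, frequency) tuples sorted by class;
    classes with zero frequency are not listed.
    """
    counts = Counter(math.floor((v - lower) / width) for v in values)
    table = []
    for k in sorted(counts):
        lo = lower + k * width
        hi = lo + width
        midpoint = (lo + hi) / 2  # (lower limit + upper limit) / 2
        table.append((lo, hi, midpoint, counts[k]))
    return table


# Made-up observations, grouped into classes of width 1 starting at 4.
data = [4.2, 4.7, 5.11, 5.13, 5.5, 5.99, 6.4]
for row in group_data(data, lower=4.0, width=1.0):
    print(row)
```

Every value in a class is now represented only by that class's midpoint and frequency; the individual values are no longer available.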
This procedure is also called grouping observations by the midpoint of their class interval. Because every observation is replaced by the midpoint, information is lost, as the following example shows.
Suppose a class interval spans 5 to 6 and contains the observations {5.11, 5.13, 5.2, 5.22, 5.31, 5.41, 5.51, 5.6, 5.7, 5.89, 5.99}. The midpoint, or representative value, is (5 + 6)/2 = 5.5. Grouping implies that every data point within the class interval equals 5.5, which is clearly not true for these observations.
In the above diagram, a few observations are taken to depict how grouping works: after grouping, every data point is represented by 5.5 regardless of its actual value. This is how grouping introduces error.
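The example can be checked numerically. The sketch below takes the eleven observations from the 5 to 6 class, represents each by the midpoint, and reports how far that representation is from the actual values.

```python
# Observations from the 5 to 6 class interval in the example.
observations = [5.11, 5.13, 5.2, 5.22, 5.31, 5.41, 5.51, 5.6, 5.7, 5.89, 5.99]
lower, upper = 5, 6
midpoint = (lower + upper) / 2  # note the parentheses: (5 + 6)/2 = 5.5

# After grouping, every observation is represented by the midpoint,
# so each one carries an error of (midpoint - actual value).
errors = [midpoint - x for x in observations]
actual_mean = sum(observations) / len(observations)

print(f"midpoint        : {midpoint}")
print(f"actual mean     : {actual_mean:.4f}")
print(f"largest |error| : {max(abs(e) for e in errors):.2f}")
print(f"mean of errors  : {sum(errors) / len(errors):.4f}")
```

For these values the actual mean is 60.07/11, about 5.461, so even the mean differs from 5.5, and the error for a single observation reaches 0.49 (for 5.99).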
Impact of grouping error on moments
For grouped continuous data, moments are computed from the midpoint values. As explained above, a midpoint does not represent the actual values of the individual data points in its class. These errors are therefore reflected in the moments and, consequently, propagate to the mean, variance, and other quantitative measures computed from the moments.
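To see the effect on a moment, the following sketch groups simulated data and compares the variance computed from the raw values with the variance computed from the midpoints. The standard normal data, the seed, and the class width c = 1 are arbitrary choices made for illustration.

```python
import math
import random


def variance(values):
    """Population variance (second central moment)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(42)  # fixed seed so the run is repeatable
c = 1.0                  # class width, chosen arbitrarily for the demo
raw = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

# Grouping: replace each value by the midpoint of its class [k*c, (k+1)*c).
grouped = [(math.floor(x / c) + 0.5) * c for x in raw]

var_raw = variance(raw)
var_grouped = variance(grouped)
print(f"variance from raw values : {var_raw:.4f}")
print(f"variance from midpoints  : {var_grouped:.4f}")
print(f"difference               : {var_grouped - var_raw:.4f}")
print(f"c**2 / 12                : {c ** 2 / 12:.4f}")
```

On smooth, bell-shaped data like this, the midpoint-based variance typically comes out larger than the raw-data variance by roughly c²/12, which is the term Sheppard's correction subtracts from the second moment.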
Sheppard’s Correction
Sheppard’s correction applies to continuous data and corrects, to a good approximation, the grouping error that can strongly affect the moments. The moments are still computed from the class midpoints, but they are then adjusted using the width of the class interval.
Prof. Sheppard proved that if the frequency distribution is continuous and the frequency tapers off to zero in both directions, the grouping effect can be corrected as follows. Let c denote the width of the class interval.
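The correction terms can be motivated with a heuristic argument; this is a sketch, not Sheppard's proof. Assume that a midpoint $m$ differs from the true value $x$ by a rounding error $e = m - x$ that is uniformly distributed on $[-c/2, c/2]$ and independent of $x$ (the independence is an assumption that is reasonable only for smooth distributions that taper off at both ends). Then $E[e] = E[e^3] = 0$, $E[e^2] = c^2/12$ and $E[e^4] = c^4/80$, and expanding powers of $m = x + e$ gives:

```latex
\begin{aligned}
E[m]   &= E[x],\\
E[m^2] &= E[x^2] + E[e^2] = E[x^2] + \frac{c^2}{12},\\
E[m^3] &= E[x^3] + 3E[x]\,E[e^2] = E[x^3] + \frac{c^2}{4}\,E[x],\\
E[m^4] &= E[x^4] + 6E[x^2]\,E[e^2] + E[e^4] = E[x^4] + \frac{c^2}{2}\,E[x^2] + \frac{c^4}{80},\\[4pt]
\Rightarrow\quad E[x^4] &= E[m^4] - \frac{c^2}{2}\left(E[m^2] - \frac{c^2}{12}\right) - \frac{c^4}{80}
                        = E[m^4] - \frac{c^2}{2}\,E[m^2] + \frac{7}{240}\,c^4 .
\end{aligned}
```

Reading $E[m^r]$ as the raw moment $\mu_r'$ computed from midpoints and $E[x^r]$ as its corrected value gives the raw-moment corrections for $r = 1, \dots, 4$. Because the mean is unaffected, applying the same argument to deviations from the mean gives the central-moment corrections.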
Corrected Raw Moments
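With $\mu_r'$ denoting the r-th raw moment computed from the class midpoints:
- $\mu_1'(\text{corrected}) = \mu_1'$
- $\mu_2'(\text{corrected}) = \mu_2' - \frac{c^2}{12}$
- $\mu_3'(\text{corrected}) = \mu_3' - \frac{c^2}{4}\,\mu_1'$
- $\mu_4'(\text{corrected}) = \mu_4' - \frac{c^2}{2}\,\mu_2' + \frac{7}{240}\,c^4$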
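The raw-moment corrections translate directly into code. The sketch below computes raw moments about the origin from a grouped table and then applies the corrections for class width c. The frequency table is made up for illustration, and `raw_moments` and `sheppard_raw` are hypothetical helper names.

```python
def raw_moments(midpoints, freqs, order=4):
    """Raw moments about the origin, computed from class midpoints."""
    n = sum(freqs)
    return [
        sum(f * m ** r for m, f in zip(midpoints, freqs)) / n
        for r in range(1, order + 1)
    ]


def sheppard_raw(mu1, mu2, mu3, mu4, c):
    """Sheppard-corrected raw moments for class width c."""
    return (
        mu1,                                          # first moment unchanged
        mu2 - c ** 2 / 12,
        mu3 - (c ** 2 / 4) * mu1,
        mu4 - (c ** 2 / 2) * mu2 + 7 * c ** 4 / 240,
    )


# Hypothetical grouped table: classes 0-2, 2-4, ..., 8-10 (width c = 2).
midpoints = [1, 3, 5, 7, 9]
freqs = [2, 8, 14, 8, 2]
c = 2

grouped = raw_moments(midpoints, freqs)
corrected = sheppard_raw(*grouped, c)
for r, (g, k) in enumerate(zip(grouped, corrected), start=1):
    print(f"mu'_{r}: grouped = {g:.4f}, corrected = {k:.4f}")
```

Note that the corrections for the third and fourth raw moments use the uncorrected lower-order moments, exactly as in the formulas.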
Corrected Central Moments
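With $\mu_r$ denoting the r-th central moment computed from the class midpoints:
- $\mu_2(\text{corrected}) = \mu_2 - \frac{c^2}{12}$
- $\mu_3(\text{corrected}) = \mu_3$
- $\mu_4(\text{corrected}) = \mu_4 - \frac{c^2}{2}\,\mu_2 + \frac{7}{240}\,c^4$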
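A similar sketch for the central moments follows. It computes the mean and the second to fourth central moments from a made-up grouped table, applies the corrections, and reports the variance and Pearson's kurtosis coefficient beta2 = mu4 / mu2**2 before and after correction. The helper names are hypothetical.

```python
def central_moments(midpoints, freqs):
    """Mean and 2nd to 4th central moments, computed from class midpoints."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n

    def moment(r):
        return sum(f * (m - mean) ** r for m, f in zip(midpoints, freqs)) / n

    return mean, moment(2), moment(3), moment(4)


def sheppard_central(mu2, mu3, mu4, c):
    """Sheppard-corrected central moments for class width c."""
    return (
        mu2 - c ** 2 / 12,
        mu3,                                          # third moment unchanged
        mu4 - (c ** 2 / 2) * mu2 + 7 * c ** 4 / 240,
    )


# Hypothetical grouped table with class width c = 2 (midpoints 1, 3, ..., 9).
midpoints = [1, 3, 5, 7, 9]
freqs = [2, 8, 14, 8, 2]
c = 2

mean, mu2, mu3, mu4 = central_moments(midpoints, freqs)
mu2_c, mu3_c, mu4_c = sheppard_central(mu2, mu3, mu4, c)
print(f"variance: grouped = {mu2:.4f}, corrected = {mu2_c:.4f}")
print(f"beta2   : grouped = {mu4 / mu2 ** 2:.4f}, corrected = {mu4_c / mu2_c ** 2:.4f}")
```

Because the mean is not corrected, the corrected variance here equals the corrected second raw moment minus the squared mean.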
Conditions for correction
- The total frequency should be large enough for the correction to be applied.
- The frequency distribution should be continuous and have a finite range.
- The frequencies should taper off to zero in both directions, which means the frequency curve should be bell-shaped (symmetric or only moderately skewed) and approach the horizontal axis gradually at both ends, rather than being J-shaped or U-shaped.
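The tapering condition matters in practice. The sketch below applies the variance correction to two samples: simulated normal data, whose frequencies taper off in both tails, and evenly spaced "flat" data on [0, 10), whose frequencies stay high right up to both ends. Both samples, the seed, and the class width c = 1 are assumptions chosen for illustration.

```python
import math
import random


def variance(values):
    """Population variance (second central moment)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def to_midpoints(values, c):
    """Replace each value by the midpoint of its class [k*c, (k+1)*c)."""
    return [(math.floor(x / c) + 0.5) * c for x in values]


c = 1.0
n = 100_000
rng = random.Random(1)  # fixed seed so the run is repeatable
samples = {
    # Bell-shaped: frequencies taper off to zero in both tails.
    "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
    # Flat on [0, 10): frequencies do not taper off at the ends.
    "flat": [10 * (i + 0.5) / n for i in range(n)],
}

results = {}
for name, data in samples.items():
    true_var = variance(data)
    grouped_var = variance(to_midpoints(data, c))
    corrected_var = grouped_var - c ** 2 / 12  # Sheppard's correction
    results[name] = (true_var, grouped_var, corrected_var)
    print(f"{name:>6}: true = {true_var:.4f}, grouped = {grouped_var:.4f}, "
          f"corrected = {corrected_var:.4f}")
```

For the normal sample the corrected variance lands close to the true variance. For the flat sample the grouped variance (8.25) is already below the true value (about 8.33), so subtracting c²/12 moves it further away, which is why the correction should not be applied to such distributions.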