In earlier notes on descriptive statistics, we covered raw, central, and absolute moments. In this note, we study the characteristics of a frequency curve or distribution, specifically its shape and peakedness, in terms of the following:
- Skewness: It measures the amount and the direction of the departure from symmetry.
- Kurtosis: It describes the shape of the central peak, that is, the peakedness or flatness of the curve.
Studying the Shape of the Frequency Curve or Distribution
The first question is whether the data are concentrated more on the left, more on the right, or evenly around the center of the frequency curve. This feature is called skewness, and it is quantified by the coefficient of skewness.
The literal meaning of skewness is lack of symmetry. It describes the shape of the frequency curve of the data and shows whether the observations are concentrated toward the higher or the lower values of the variable.
Skewness
A distribution is said to be skewed if its frequency curve is not symmetric, that is, if it is stretched more to one side than to the other.
- A frequency curve with a longer tail toward the left-hand side is negatively skewed.
- A frequency curve with a longer tail toward the right-hand side is positively skewed.
- A frequency curve that is symmetric about its center has zero skewness.
Distributions are therefore classified into three categories: positively skewed, negatively skewed, and zero skewness (symmetric).
Negative and Positive Skewness
In a negatively skewed frequency curve, most values are clustered toward the right side of the curve, while the left tail is longer. In a positively skewed frequency curve, most values are clustered toward the left side of the curve, while the right tail is longer.
The coefficient of skewness measures the skewness of a distribution or frequency curve. Its standard form is based on the central moments of the distribution, but other coefficients of skewness are built from measures of central tendency and location. Below, we calculate the coefficient of skewness using moments, the mean and median, the mean and mode, quartiles, and percentiles.
Coefficient of Skewness
For population
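$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
where $\mu_2$ and $\mu_3$ are the second and third central moments, respectively. Because $\mu_3$ is squared, $\beta_1$ measures only the magnitude of skewness. To measure both the magnitude and the sign (positive or negative), we use $\gamma_1$ below, whose sign is the sign of $\mu_3$.
$$\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}$$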
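As a minimal sketch, assuming the population is a discrete distribution given as a list of values and their probabilities, $\beta_1 = \mu_3^2/\mu_2^3$ and $\gamma_1 = \mu_3/\mu_2^{3/2}$ can be computed directly from the central moments. The helper name `population_skewness` and the example distribution are hypothetical, chosen only for illustration.

```python
def population_skewness(values, probs):
    """Return (beta_1, gamma_1) for a discrete population.

    Hypothetical helper: `values` are the possible outcomes and `probs`
    their probabilities (assumed to sum to 1).
    """
    # Population mean: mu = sum(p_i * x_i)
    mu = sum(p * x for x, p in zip(values, probs))
    # Second and third central moments
    mu2 = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    mu3 = sum(p * (x - mu) ** 3 for x, p in zip(values, probs))
    beta1 = mu3 ** 2 / mu2 ** 3   # magnitude only (never negative)
    gamma1 = mu3 / mu2 ** 1.5     # carries the sign of mu3
    return beta1, gamma1


# Hypothetical distribution with a longer right tail
beta1, gamma1 = population_skewness([0, 1, 2], [0.6, 0.3, 0.1])
print(round(beta1, 3), round(gamma1, 3))
```

Here $\mu_2 = 0.45$ and $\mu_3 = 0.3$, so $\beta_1 \approx 0.988$ and $\gamma_1 \approx 0.994$: the positive sign of $\gamma_1$ signals positive skewness, which $\beta_1$ alone cannot show.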
For sample
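Suppose we have $n$ observations $x_1, x_2, \ldots, x_n$ with arithmetic mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and let $m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$ denote the $r$-th sample central moment, so that the variance is $s^2 = m_2$. The sample coefficients of skewness are:
$$b_1 = \frac{m_3^2}{m_2^3}$$
$$g_1 = \frac{m_3}{m_2^{3/2}} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}$$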
Interpretations
- If $\gamma_1$ (or $g_1$) $= 0$, the distribution is symmetric, i.e., it has zero skewness, as the normal distribution does.
- If $\gamma_1$ (or $g_1$) $> 0$, the distribution is positively skewed.
- If $\gamma_1$ (or $g_1$) $< 0$, the distribution is negatively skewed.
The value of $\gamma_1$ (or $g_1$) therefore tells us whether the distribution is symmetric and, if it is not, whether it is positively or negatively skewed.
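A short sketch of the sample formulas and their interpretation, assuming sample moments with divisor $n$ as in $g_1 = m_3/m_2^{3/2}$, where $m_r = \frac{1}{n}\sum (x_i - \bar{x})^r$. The function names and the `tol` parameter (which absorbs floating-point noise when testing for zero) are hypothetical.

```python
def sample_skewness(data):
    """Return (b_1, g_1) using sample central moments with divisor n."""
    n = len(data)
    xbar = sum(data) / n
    m2 = sum((x - xbar) ** 2 for x in data) / n
    m3 = sum((x - xbar) ** 3 for x in data) / n
    b1 = m3 ** 2 / m2 ** 3    # magnitude only
    g1 = m3 / m2 ** 1.5       # signed coefficient
    return b1, g1


def describe_skewness(g1, tol=1e-12):
    """Interpret the sign of g1; `tol` is a hypothetical zero threshold."""
    if abs(g1) <= tol:
        return "symmetric (zero skewness)"
    return "positively skewed" if g1 > 0 else "negatively skewed"


for data in ([1, 2, 3, 4, 5], [1, 1, 2, 2, 3, 10], [-1, -1, -2, -2, -3, -10]):
    b1, g1 = sample_skewness(data)
    print(data, round(g1, 3), describe_skewness(g1))
```

If SciPy is available, `scipy.stats.skew(data)` with its default `bias=True` computes the same moment-based $g_1$.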
When the distribution is positively or negatively skewed, the mean, median, and mode differ from one another. We will study this relationship in a later section.
Coefficient of Skewness using mean & median
To measure the coefficient of skewness of sample data using the mean and median, let $\bar{x}$ be the mean, $M_d$ the median, and $s$ the standard deviation. The coefficient of skewness is defined as follows:
$$S_k = \frac{3(\bar{x} - M_d)}{s}$$
Its range is $-3 \le S_k \le 3$, and the interpretation of positive, negative, and zero skewness is the same as above.
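A minimal sketch of $S_k = 3(\bar{x} - M_d)/s$ using Python's standard `statistics` module. The helper name is hypothetical, and the choice of `statistics.pstdev` (divisor $n$) for $s$ is an assumption; `statistics.stdev` uses divisor $n - 1$ instead.

```python
import statistics


def median_skewness(data):
    """Hypothetical helper: 3 * (mean - median) / s."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    # Assumption: s uses divisor n; swap in statistics.stdev for n - 1.
    s = statistics.pstdev(data)
    return 3 * (mean - median) / s


print(median_skewness([1, 2, 3, 4, 5]))       # symmetric data
print(median_skewness([1, 1, 2, 2, 3, 10]))   # longer right tail
```

The second sample has mean $19/6 \approx 3.17$ and median $2$, so the coefficient is positive.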
Coefficient of Skewness using mean & mode
To measure the coefficient of skewness of sample data using the mean and mode, let $\bar{x}$ be the mean, $M_o$ the mode, and $s$ the standard deviation. The coefficient of skewness is defined as follows:
$$S_k = \frac{\bar{x} - M_o}{s}$$
Unlike the median-based coefficient, this coefficient has no fixed theoretical bound; the interpretation of positive, negative, and zero skewness is the same as above.
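A sketch of $S_k = (\bar{x} - M_o)/s$, assuming the data have a single, well-defined mode. The helper name is hypothetical; note that `statistics.mode` returns the first mode it encounters when the data are multimodal, so `statistics.multimode` is worth checking first in practice.

```python
import statistics


def mode_skewness(data):
    """Hypothetical helper: (mean - mode) / s."""
    mean = statistics.mean(data)
    mode = statistics.mode(data)   # assumes a single, well-defined mode
    s = statistics.pstdev(data)    # assumption: divisor n
    return (mean - mode) / s


print(mode_skewness([2, 3, 3, 3, 4, 5, 9]))   # mode 3, mean about 4.14
```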
Coefficient of Skewness using quartiles
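To measure the coefficient of skewness of sample data using quartiles (Bowley's coefficient of skewness), let $Q_1$, $Q_2$, and $Q_3$ be the first, second, and third quartiles, below which 25%, 50%, and 75% of the observations lie, respectively. The coefficient of skewness is defined as follows:
$$S_Q = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
Its range is $-1 \le S_Q \le 1$, and the interpretation of positive, negative, and zero skewness is the same as above.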
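A sketch of the quartile coefficient $S_Q = (Q_3 + Q_1 - 2Q_2)/(Q_3 - Q_1)$ using `statistics.quantiles`. Quartile conventions differ between textbooks and software; the `method="inclusive"` choice here is an assumption, and other methods can give slightly different quartiles. The helper name is hypothetical.

```python
import statistics


def quartile_skewness(data):
    """Hypothetical helper: Bowley's quartile coefficient of skewness."""
    # Assumption: "inclusive" quartiles; returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Undefined (division by zero) if Q3 == Q1
    return (q3 + q1 - 2 * q2) / (q3 - q1)


print(quartile_skewness(range(1, 10)))                     # symmetric: 0.0
print(quartile_skewness([1, 2, 2, 3, 3, 3, 8, 9, 20]))    # Q1=2, Q2=3, Q3=8
```

For the second sample, $S_Q = (8 + 2 - 6)/(8 - 2) \approx 0.667$, a positive skew.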
Coefficient of Skewness using percentiles
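To measure the coefficient of skewness of sample data using percentiles (Kelly's coefficient of skewness), let $P_{10}$, $P_{50}$, and $P_{90}$ be the 10th, 50th, and 90th percentiles, below which 10%, 50%, and 90% of the observations lie, respectively. The coefficient of skewness is defined as follows:
$$S_P = \frac{(P_{90} - P_{50}) - (P_{50} - P_{10})}{P_{90} - P_{10}} = \frac{P_{90} + P_{10} - 2P_{50}}{P_{90} - P_{10}}$$
Its range is $-1 \le S_P \le 1$, and the interpretation of positive, negative, and zero skewness is the same as above.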
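The percentile version follows the same pattern, reading $P_{10}$, $P_{50}$, and $P_{90}$ off the nine decile cut points returned by `statistics.quantiles(data, n=10)`. As before, the `"inclusive"` method and the helper name are assumptions made for this sketch.

```python
import statistics


def percentile_skewness(data):
    """Hypothetical helper: (P90 + P10 - 2*P50) / (P90 - P10)."""
    # Nine cut points D1..D9; index 0 -> P10, 4 -> P50, 8 -> P90
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    return (p90 + p10 - 2 * p50) / (p90 - p10)


print(percentile_skewness(range(1, 102)))                    # symmetric: 0.0
print(percentile_skewness([x * x for x in range(1, 102)]))   # right-skewed
```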
Central Tendency Measures in Negatively Skewed, Symmetric, and Positively Skewed Curves
In a symmetric (zero-skewness) unimodal curve, such as the normal curve, the mean, median, and mode have approximately the same value. For positively or negatively skewed distribution curves, however, these measures separate, and the relationship among them is depicted in the above diagram of the frequency curve.
Positively skewed distribution
The mode corresponds to the highest frequency, i.e., the peak of the curve. In terms of data values, however, the mode is the smallest, followed by the median and then the mean, which is pulled toward the long right tail. The typical order is: Mode < Median < Mean.
Negatively skewed distribution
Again, the mode corresponds to the highest frequency. In terms of data values, the mean is the smallest, pulled toward the long left tail, followed by the median and then the mode. The typical order is: Mode > Median > Mean.
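A small illustration of these orderings, using a hypothetical right-skewed sample and its mirror image. The orderings are typical for skewed unimodal data but are a rule of thumb, not a theorem, so this sketch shows one example rather than a proof.

```python
import statistics


def central_tendencies(data):
    """Return (mode, median, mean) of the data."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)


right = [1, 2, 2, 2, 3, 3, 4, 5, 8, 12]   # hypothetical data, long right tail
left = [14 - x for x in right]            # mirror image, long left tail

print(central_tendencies(right))  # (2, 3.0, 4.2): Mode < Median < Mean
print(central_tendencies(left))   # (12, 11.0, 9.8): Mode > Median > Mean
```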
If the frequency curve or distribution is symmetric, the mean is the preferred measure of central tendency. If the frequency curve is positively or negatively skewed, however, the median is more representative, because it is less affected by the values in the long tail.
Dealing with skewed curves
Many statistical tests assume that the data are normally, or approximately normally, distributed. When the frequency curve is strongly skewed, such tests may produce misleading results. Several transformations can bring a skewed curve closer to normal; one of the most popular is the log transformation, which reduces positive (right) skewness and requires all values to be positive.
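A sketch of the log transformation's effect, measuring skewness with the moment coefficient $g_1 = m_3/m_2^{3/2}$ (divisor $n$). The data are hypothetical, right-skewed, and strictly positive; for data containing zeros, `math.log1p` (that is, $\log(1 + x)$) is a common variant.

```python
import math


def g1(data):
    """Moment coefficient of skewness m3 / m2**1.5 (divisor n)."""
    n = len(data)
    xbar = sum(data) / n
    m2 = sum((x - xbar) ** 2 for x in data) / n
    m3 = sum((x - xbar) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


values = [12, 15, 18, 20, 22, 25, 30, 40, 60, 120]   # hypothetical positive data
logged = [math.log(x) for x in values]                # requires every x > 0

print(round(g1(values), 2), round(g1(logged), 2))
```

On this sample, $g_1$ drops from roughly 1.9 to roughly 0.9: the log transformation compresses the long right tail, although the result is still somewhat skewed.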
Kurtosis
Kurtosis describes the peakedness or flatness of a frequency curve or distribution, that is, how sharply peaked or how flat the curve is around its center.
In the above diagram, we can see three curves with different peakedness and compare them visually. The question is how to quantify and compare the peakedness or flatness of different frequency curves mathematically.
Comparison of Peakedness
The peakedness of a frequency curve is usually measured relative to the peakedness of the normal distribution. If the curve is normally distributed, its excess kurtosis is zero; a curve that is more or less peaked than the normal curve has a non-zero value.
In effect, kurtosis compares the hump (or flatness) of a given frequency curve with the hump of the normal distribution, whose shape is accepted as the standard.
- Curves whose hump is like that of the normal distribution curve are called mesokurtic.
- Curves that are more peaked (less flat) than the normal distribution curve are called leptokurtic.
- Curves that are less peaked (flatter) than the normal distribution curve are called platykurtic.
Quantify the Peakedness using Coefficient of Kurtosis
We can quantify peakedness using a coefficient of kurtosis. There are several such coefficients; one of them is Karl Pearson's coefficient of kurtosis, defined as follows:
For population
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
where $\mu_2$ and $\mu_4$ are the second and fourth central moments, respectively. Since $\beta_2$ is always positive, it measures only the magnitude. To also obtain a sign (positive or negative), we subtract 3 from $\beta_2$, as in the equation below. The number 3 is the value of $\beta_2$ for the normal distribution, whose peakedness is the reference for comparison.
$$\gamma_2 = \beta_2 - 3$$
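A minimal sketch of $\beta_2 = \mu_4/\mu_2^2$ and $\gamma_2 = \beta_2 - 3$ for a discrete population, assuming values and probabilities are given as lists. The helper name is hypothetical, and a fair six-sided die is used as the example population.

```python
def population_kurtosis(values, probs):
    """Return (beta_2, gamma_2) for a discrete population (hypothetical helper)."""
    mu = sum(p * x for x, p in zip(values, probs))
    mu2 = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    mu4 = sum(p * (x - mu) ** 4 for x, p in zip(values, probs))
    beta2 = mu4 / mu2 ** 2
    gamma2 = beta2 - 3   # excess kurtosis relative to the normal value 3
    return beta2, gamma2


# Fair die: faces 1..6, each with probability 1/6
beta2, gamma2 = population_kurtosis(range(1, 7), [1 / 6] * 6)
print(round(beta2, 4), round(gamma2, 4))
```

For the die, $\mu_2 = 35/12$ and $\mu_4 = 707/48$, giving $\beta_2 = 2121/1225 \approx 1.7314$ and $\gamma_2 \approx -1.2686$: a flat (platykurtic) distribution.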
For sample
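Suppose we have $n$ observations $x_1, x_2, \ldots, x_n$ with arithmetic mean $\bar{x}$ and sample central moments $m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$, so that the variance is $s^2 = m_2$. The sample-based coefficients of kurtosis are as follows:
$$b_2 = \frac{m_4}{m_2^2} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^2}$$
$$g_2 = b_2 - 3$$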
Interpretations
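- For a normal (mesokurtic) distribution, $\beta_2$ (or $b_2$) $= 3$ and $\gamma_2$ (or $g_2$) $= 0$.
- For a leptokurtic distribution, $\beta_2$ (or $b_2$) $> 3$ and $\gamma_2$ (or $g_2$) $> 0$.
- For a platykurtic distribution, $\beta_2$ (or $b_2$) $< 3$ and $\gamma_2$ (or $g_2$) $< 0$.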
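A sketch of the sample coefficients $b_2 = m_4/m_2^2$ and $g_2 = b_2 - 3$ (divisor $n$), together with the interpretation rules above. The function names and the `tol` parameter are hypothetical; a sample $g_2$ is rarely exactly zero, so in practice an analyst chooses how close to zero counts as mesokurtic.

```python
def sample_kurtosis(data):
    """Return (b_2, g_2) using sample central moments with divisor n."""
    n = len(data)
    xbar = sum(data) / n
    m2 = sum((x - xbar) ** 2 for x in data) / n
    m4 = sum((x - xbar) ** 4 for x in data) / n
    b2 = m4 / m2 ** 2
    return b2, b2 - 3


def describe_kurtosis(g2, tol=0.0):
    """Classify by the sign of g2; `tol` is a hypothetical zero threshold."""
    if abs(g2) <= tol:
        return "mesokurtic"
    return "leptokurtic" if g2 > 0 else "platykurtic"


for data in ([1, 2, 3, 4, 5], [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]):
    b2, g2 = sample_kurtosis(data)
    print(data, round(b2, 3), describe_kurtosis(g2))
```

The first sample gives $b_2 = 1.7$ (platykurtic), and the second, with most values at the center and two far out, gives $b_2 = 5$ (leptokurtic). With SciPy, `scipy.stats.kurtosis(data)` (defaults `fisher=True`, `bias=True`) returns this $g_2$.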
Importance of Studying the Kurtosis of a Frequency Curve
Leptokurtic curves have heavier tails than mesokurtic curves, so extreme values are more likely under a leptokurtic distribution than under a mesokurtic (normal) one.
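To make the claim concrete, the sketch below compares tail probabilities $P(|X| > k)$ for a standard normal distribution (mesokurtic, $\gamma_2 = 0$) and a Laplace distribution scaled to the same variance 1 (leptokurtic, $\gamma_2 = 3$). The Laplace distribution is chosen here only as a convenient leptokurtic example with closed-form tails; with scale $b = 1/\sqrt{2}$, its tail probability is $e^{-k/b}$.

```python
import math


def normal_tail(k):
    """P(|X| > k) for a standard normal X (variance 1)."""
    return math.erfc(k / math.sqrt(2))


def laplace_tail(k):
    """P(|X| > k) for a Laplace X scaled to variance 1 (scale b = 1/sqrt(2))."""
    b = 1 / math.sqrt(2)
    return math.exp(-k / b)


for k in (2, 3, 4):
    print(k, f"{normal_tail(k):.5f}", f"{laplace_tail(k):.5f}")
```

At $k = 3$ standard deviations, the normal tail probability is about 0.0027, while the Laplace tail is about 0.0144, roughly five times larger, even though both distributions have the same variance.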
References
- Descriptive Statistics, by Prof. Shalabh, Department of Mathematics and Statistics, IIT Kanpur.