Descriptive Statistics – Measures of Shape – Skewness and Kurtosis

In the earlier notes on descriptive statistics, we covered raw, central, and absolute moments. In this note, we study the shape of a frequency curve or distribution, specifically its asymmetry and peakedness, in terms of the following:

  • Skewness: It tells the amount and the direction of the departure from horizontal symmetry.
  • Kurtosis: It tells the shape of the central peak or flatness of the curve.

Study the shape of the Frequency Curve or Distribution

To study the symmetry of a curve for a given set of data, we ask whether the observations are concentrated more on the left, more on the right, or evenly around the center of the frequency curve. This feature is called skewness, and it is quantified by the coefficient of skewness.

The literal meaning of skewness is lack of symmetry. It gives an idea about the shape of the frequency curve of the data and shows whether the observations are concentrated towards the higher or the lower values of the variable.

Skewness

A distribution is said to be skewed if its frequency curve is not symmetric and is stretched more to one side than to the other.

  • The frequency distribution for which the curve has a longer tail towards the left-hand side is negatively skewed.
  • The frequency distribution for which the curve has a longer tail towards the right-hand side is positively skewed.
  • The frequency distribution for which the curve is equally distributed on both left and right has zero skewness.

A distribution therefore falls into one of three categories: positively skewed, negatively skewed, or zero skewness.

Negative and Positive Skewness

Figure: Negatively and positively skewed frequency curves

In a negatively skewed frequency curve, most values are clustered on the right side of the curve while the left tail is longer. In a positively skewed frequency curve, most values are clustered on the left side of the curve while the right tail is longer.

The coefficient of skewness measures the skewness of a distribution or frequency curve. The standard version is based on the central moments of the distribution. Other versions are built from measures of central tendency and position. We will see how to calculate the coefficient of skewness using moments, mean and median, mean and mode, quartiles, and percentiles.

Coefficient of Skewness

For population

\beta_1 = \frac{\mu^2_3}{\mu^3_2}

Where \mu_2 and \mu_3 are the second and third central moments respectively. \beta_1 measures the magnitude of skewness only, because \mu_3 is squared. To capture both the magnitude and the sign, positive (+) or negative (-), we use \gamma_1. Since \mu_2 is never negative, the equation below shows that the sign of \gamma_1 depends on \mu_3 alone.

\gamma_1 = \pm \sqrt{ \beta_1} = \frac{\mu_3}{\sqrt{\mu^3_2}}

For sample

Let us consider n observations x_1, x_2, x_3, \ldots, x_n, whose mean is \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i and whose variance is \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) ^2 . The skewness of the distribution curve is measured as follows:

\beta_{1s} = \dfrac{\left ( \dfrac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) ^3 \right )^2 }{\left ( \dfrac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) ^2 \right )^3}

\gamma_{1s} = \dfrac{\dfrac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) ^3}{\left ( \dfrac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) ^2 \right )^{\dfrac{3}{2}}}
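The sample formulas above translate directly into code. The following is a minimal Python sketch, assuming the 1/n convention used in this note. The function names `central_moment` and `moment_skewness` and the example data are illustrative and not part of the original notes.

```python
def central_moment(data, k):
    """k-th central moment of the sample, using the 1/n convention."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def moment_skewness(data):
    """Return (beta_1s, gamma_1s) computed from the sample central moments."""
    m2 = central_moment(data, 2)
    m3 = central_moment(data, 3)
    beta_1s = m3 ** 2 / m2 ** 3   # magnitude only, never negative
    gamma_1s = m3 / m2 ** 1.5     # keeps the sign of the third moment
    return beta_1s, gamma_1s


# Illustrative data with one large value that stretches the right tail.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
beta_1s, gamma_1s = moment_skewness(right_tailed)
print("beta_1s  =", beta_1s)
print("gamma_1s =", gamma_1s)   # positive for this data
```

Note that library routines may apply a bias correction (for example, dividing by n - 1), so their results can differ slightly from this 1/n version.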

Interpretations

  • If \gamma_1 or \gamma_{1s} = 0, the distribution has zero skewness. Every symmetric distribution, such as the normal distribution, has zero skewness.
  • If \gamma_1 or \gamma_{1s} > 0, it means the distribution is positively skewed.
  • If \gamma_1 or \gamma_{1s} < 0, it means the distribution is negatively skewed.

The value of \gamma_1 or \gamma_{1s} therefore indicates whether the distribution is skewed and, if it is, whether the skew is positive or negative.

When the distribution is positively or negatively skewed, the measures of central tendency such as mean, median, and mode generally differ. We will study this in a later section.

Coefficient of Skewness using mean & median

The coefficient of skewness for sample data can be measured using the mean and the median. Let \bar{x} be the mean, \bar{x}_{med} the median, and \sigma_x the standard deviation. The coefficient of skewness is defined as follows:

S_{sk} =  \dfrac{ 3 \left ( \bar{x} - \bar{x}_{med} \right ) }{ \sigma_x }

S_{sk} lies in the range -3 \leq S_{sk} \leq 3, and the interpretation of positive, negative, and zero skewness is the same as above.
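As a rough illustration, the mean and median formula can be computed with Python's standard `statistics` module. The function name `pearson_median_skewness` and the data are made up for this sketch, and it assumes \sigma_x is the population (1/n) standard deviation.

```python
import statistics


def pearson_median_skewness(data):
    """S_sk = 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    # Assumption: sigma_x is the population (1/n) standard deviation.
    sigma = statistics.pstdev(data)
    return 3 * (mean - median) / sigma


# Illustrative data with a long right tail: the mean exceeds the median.
data = [1, 2, 2, 2, 3, 3, 4, 5, 6, 12]
print(pearson_median_skewness(data))   # positive for this data
```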

Coefficient of Skewness using mean & mode

The coefficient of skewness for sample data can also be measured using the mean and the mode. Let \bar{x} be the mean, \bar{x}_{mode} the mode, and \sigma_x the standard deviation. The coefficient of skewness is defined as follows:

S_{sk} =  \dfrac{ \bar{x} - \bar{x}_{mode} }{ \sigma_x }

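This coefficient is usually quoted with the range -3 \leq S_{sk} \leq 3, although, unlike the median-based version, it is not strictly bounded in general. The interpretation of positive, negative, and zero skewness is the same as above.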
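The mean and mode version can be sketched in the same way. The function name `pearson_mode_skewness` and the data below are illustrative, and the sketch again assumes the population (1/n) standard deviation. It is only meaningful when the data have a single clear mode.

```python
import statistics


def pearson_mode_skewness(data):
    """S_sk = (mean - mode) / standard deviation."""
    mean = statistics.mean(data)
    # In Python 3.8+, statistics.mode returns the first mode if there are ties.
    mode = statistics.mode(data)
    # Assumption: sigma_x is the population (1/n) standard deviation.
    sigma = statistics.pstdev(data)
    return (mean - mode) / sigma


# Illustrative data: the mode (2) sits to the left of the mean (4).
data = [1, 2, 2, 2, 3, 3, 4, 5, 6, 12]
print(pearson_mode_skewness(data))   # positive for this data
```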

Coefficient of Skewness using quartiles

The coefficient of skewness for sample data can be measured using quartiles. Let Q_1, Q_2, and Q_3 denote the first, second, and third quartiles, that is, the values below which 25%, 50%, and 75% of the observations fall. The coefficient of skewness is defined as follows:

S_{qsk} =  \dfrac{ (Q_3 - Q_2) - (Q_2 - Q_1)}{ (Q_3 - Q_2) + (Q_2 - Q_1)}

Because Q_1 \leq Q_2 \leq Q_3, the numerator can never exceed the denominator in magnitude, so S_{qsk} lies in the range -1 \leq S_{qsk} \leq 1. The interpretation of positive, negative, and zero skewness is the same as above.
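A small sketch of the quartile formula using `statistics.quantiles` from the Python standard library (available from Python 3.8). The function name `quartile_skewness` and the data are illustrative. Different quartile conventions give slightly different values, and this sketch simply uses the library default.

```python
import statistics


def quartile_skewness(data):
    """S_qsk = ((Q3 - Q2) - (Q2 - Q1)) / ((Q3 - Q2) + (Q2 - Q1))."""
    # n=4 returns the three cut points Q1, Q2, Q3 (default 'exclusive' method).
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return ((q3 - q2) - (q2 - q1)) / ((q3 - q2) + (q2 - q1))


# Illustrative data whose upper half is more spread out than its lower half.
data = [1, 2, 3, 4, 5, 7, 10, 15, 25]
print(quartile_skewness(data))   # positive for this data
```

Since only the quartiles enter the formula, values in the extreme tails have no effect on the result, which makes this measure less sensitive to outliers than the moment-based coefficient.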

Coefficient of Skewness using percentiles

The coefficient of skewness for sample data can be measured using percentiles. Let P_{10}, P_{50}, and P_{90} denote the 10th, 50th, and 90th percentiles, that is, the values below which 10%, 50%, and 90% of the observations fall. The coefficient of skewness is defined as follows:

S_{psk} =  \dfrac{ (P_{90} - P_{50}) - (P_{50} - P_{10})}{ (P_{90} - P_{50}) + (P_{50} - P_{10})}

Because P_{10} \leq P_{50} \leq P_{90}, S_{psk} lies in the range -1 \leq S_{psk} \leq 1. The interpretation of positive, negative, and zero skewness is the same as above.
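The percentile version can be sketched the same way by asking `statistics.quantiles` for deciles. The function name `percentile_skewness` and the data are illustrative, and the percentile convention is the library default, which is an assumption rather than something the notes specify.

```python
import statistics


def percentile_skewness(data):
    """S_psk = ((P90 - P50) - (P50 - P10)) / ((P90 - P50) + (P50 - P10))."""
    # n=10 returns nine cut points: the 10th, 20th, ..., 90th percentiles.
    deciles = statistics.quantiles(data, n=10)
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    return ((p90 - p50) - (p50 - p10)) / ((p90 - p50) + (p50 - p10))


# Illustrative data whose upper half is more spread out than its lower half.
data = [1, 2, 3, 4, 5, 7, 10, 15, 25]
print(percentile_skewness(data))   # positive for this data
```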

Central Tendency Measures in Negatively, Zero, and Positively Skewed Curves

Figure: Mean, median, and mode in negatively skewed, zero-skewed, and positively skewed curves

For a normal curve, or any symmetric curve with a single peak, the mean, median, and mode take approximately the same value. For positively or negatively skewed curves, however, the three measures separate, and their typical relationship is depicted in the above diagram of the frequency curves.

Positively skewed distribution

The mode has the highest frequency, followed by the median and then the mean. In terms of data values, the mode is typically the smallest, followed by the median and then the mean, giving the order: Mode < Median < Mean.

Negatively skewed distribution

The mode again has the highest frequency, followed by the median and then the mean. In terms of data values, the mean is typically the smallest, followed by the median and then the mode, giving the order: Mode > Median > Mean.

If the frequency curve or distribution is symmetrical, the mean is the best measure of central tendency. However, if the frequency curve is skewed positively or negatively, the median is usually a more representative measure of central tendency, because it is less affected by the values in the long tail.
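The ordering of the three measures can be checked on a small example. The data below are made up for illustration, and the ordering is a rule of thumb for single-peaked data rather than a guarantee for every data set.

```python
import statistics


def central_measures(data):
    """Return (mode, median, mean) of the data."""
    return (statistics.mode(data),
            statistics.median(data),
            statistics.mean(data))


# Illustrative data with a long right tail, and its mirror image.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 6, 12]
left_skewed = [13 - x for x in right_skewed]

print("positively skewed:", central_measures(right_skewed))
print("negatively skewed:", central_measures(left_skewed))
```

For the right-skewed data the mode is 2, the median is 3, and the mean is 4. For the mirrored data the mode is 11, the median is 10, and the mean is 9, so the order is reversed.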

Dealing with skewed curves

Many statistical tests work best when the frequency curve is normal or close to a normal distribution. In the case of skewed frequency curves, such a test may produce a misleading result. There are various approaches to bring the frequency curve closer to normal. One of the most popular techniques is the log transformation, which reduces the skewness of a positively skewed frequency curve and requires all values to be positive.
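The effect of a log transformation can be seen on a deliberately idealized example. The data below form a geometric sequence, chosen only because their logarithms are evenly spaced. Real data will rarely become perfectly symmetric. The helper `moment_skewness` is an illustrative implementation of \gamma_{1s} with the 1/n convention.

```python
import math


def moment_skewness(data):
    """gamma_1s: third central moment over the second to the power 3/2."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Idealized positive data with a long right tail (each value doubles).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]   # requires strictly positive values

print("skewness before log:", moment_skewness(raw))      # clearly positive
print("skewness after log: ", moment_skewness(logged))   # close to zero
```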

Kurtosis

It describes the peakedness or flatness of a frequency curve or distribution. Flatness refers to how flat the curve is at its peak.

Figure: Measures of shape – Kurtosis

In the above diagram, we can see three curves with different degrees of peakedness or flatness. In other words, we can visually compare the peakedness of different distribution curves. The question is how to mathematically quantify and compare the peakedness or flatness of the frequency curves of different variables.

Comparison of Peakedness

Usually, the peakedness of a distribution curve or frequency curve is measured relative to the peakedness of a normal distribution. The coefficient is shifted so that a normally distributed curve has a kurtosis value of zero, while curves that are more peaked or flatter than the normal curve have positive or negative values respectively.

Effectively, kurtosis examines the hump or flatness of the given frequency curve or distribution with respect to the hump or flatness of the normal distribution. Here the shape of the hump of the normal distribution has been accepted as the standard.

  • Curves with a hump like that of the normal distribution curve are called mesokurtic.
  • Curves with greater peakedness or less flatness than the normal distribution curve are called leptokurtic.
  • Curves with less peakedness or greater flatness than the normal distribution curve are called platykurtic.

Quantify the Peakedness using Coefficient of Kurtosis

We can quantify the peakedness using the coefficient of kurtosis. There are different coefficients of kurtosis, and one of them is known as Karl Pearson’s coefficient of kurtosis. It is represented as follows:

For Population

\beta_2 = \frac{\mu_4}{\mu^2_2}

Where \mu_2 and \mu_4 are the second and fourth central moments respectively. \beta_2 is never negative, so on its own it does not show whether a curve is more or less peaked than the normal curve. To get a signed measure, we subtract three from \beta_2, as in the equation below. Three is the \beta_2 value of the normal distribution, whose peakedness is the reference for comparison.

\gamma_2 = \beta_2 - 3

For sample

Let us consider n observations x_1, x_2, x_3, \ldots, x_n, whose arithmetic mean is \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i and whose variance is \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) ^2 . The sample-based coefficients of kurtosis are as follows:

\beta_{2s} = \dfrac{\left ( \dfrac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) ^4 \right ) }{\left ( \dfrac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) ^2 \right )^2}

\gamma_{2s} = \beta_{2s} - 3
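The sample coefficients of kurtosis can be computed in the same way as the skewness coefficients. This is a minimal sketch assuming the 1/n convention of this note. The function names and both data sets are illustrative.

```python
def central_moment(data, k):
    """k-th central moment of the sample, using the 1/n convention."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def kurtosis_coefficients(data):
    """Return (beta_2s, gamma_2s) for the sample."""
    m2 = central_moment(data, 2)
    m4 = central_moment(data, 4)
    beta_2s = m4 / m2 ** 2
    gamma_2s = beta_2s - 3   # excess over the normal reference value of 3
    return beta_2s, gamma_2s


# Evenly spread values: a flat shape with no extreme observations.
flat = [1, 2, 3, 4, 5]
# Mostly central values plus two extreme observations.
heavy_tailed = [-10, -1, 0, 0, 0, 0, 0, 1, 10]

print(kurtosis_coefficients(flat))          # beta_2s below 3
print(kurtosis_coefficients(heavy_tailed))  # beta_2s above 3
```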

Interpretations

  • For a normal or mesokurtic distribution, \beta_2 or \beta_{2s} = 3 and \gamma_2 or \gamma_{2s} = 0.
  • For a leptokurtic distribution, \beta_2 or \beta_{2s} > 3 and \gamma_2 or \gamma_{2s} > 0.
  • For a platykurtic distribution, \beta_2 or \beta_{2s} < 3 and \gamma_2 or \gamma_{2s} < 0.
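These rules can be written as a small helper that labels a curve from its \gamma_2 or \gamma_{2s} value. The function name `classify_kurtosis` and the tolerance `tol` are hypothetical. A sample value is almost never exactly zero, so in practice some tolerance is needed, and its size is a judgment call.

```python
def classify_kurtosis(gamma_2, tol=1e-9):
    """Label a curve from its excess kurtosis (gamma_2 = beta_2 - 3).

    tol is a hypothetical tolerance around zero; choose it to suit the data.
    """
    if gamma_2 > tol:
        return "leptokurtic"
    if gamma_2 < -tol:
        return "platykurtic"
    return "mesokurtic"


# Illustrative gamma_2 values, not taken from real data.
for value in (-1.3, 0.0, 1.4):
    print(value, "->", classify_kurtosis(value))
```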

Importance of studying Kurtosis of the frequency curve

Leptokurtic curves have a higher likelihood of extreme values than mesokurtic curves, because more of the distribution lies in the tails.

References

  1. Descriptive Statistic, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.
