Descriptive Statistics – Variance of data – Coefficient of Variation

In the earlier notes, we have considered two aspects of data (also called a variable) separately. The first one measures the central tendency of data, and the second one is variation in data. These are a crucial part of the information that is contained inside the data. 

In this note, we will learn how to use these measures (arithmetic mean and variance) together to gather information about the data. Taken individually, neither the mean nor the variance may give us the correct picture when we compare datasets. One tool that combines them to measure the relative variation of data is the coefficient of variation.

Need for Coefficient of Variation

Suppose we have two datasets: the first contains the heights of students in centimeters (cm) and the second contains the weights of the students in kilograms (kg). The mean and standard deviation of each dataset are expressed in that dataset's own units and scale. We can't compare the variability of these two datasets just by looking at statistics such as the mean and standard deviation.

More examples:

  • House rent is paid in Indian rupees in India and similarly in dollars in the USA
  • Salary received in Indian rupees in India and similarly in dollars in the USA

House rent is typically some percentage of salary, so directly comparing the house rent paid in India with the house rent paid in the USA will not give a meaningful result. We may feel that house rent is too high in the USA, but that impression may not hold once we compare the two with an appropriate statistical tool, i.e., the coefficient of variation.

Coefficient of Variation (CV)

The coefficient of variation measures the variability of a data set without reference to the scale or units of the data. It is very useful in comparing the results from two different surveys or tests in which the values are collected on different scales.

Suppose there are two data sets with sample means \bar{x}_1 and \bar{x}_2 and sample standard deviations s_1 and s_2. The question is how to compare the statistics of the two data sets, since the data sets may be on two different scales and each mean and standard deviation is calculated on the scale of its own data set. The solution is the coefficient of variation, which is defined as:

Coefficient of Variation for Sample Data

CV = \frac{s}{\bar{x}}

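for a sample dataset, where s is the sample standard deviation and \bar{x} is the sample mean. The sample mean should always be greater than zero: the ratio is undefined when the mean is zero and hard to interpret when it is negative.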
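The sample formula can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `sample_cv` and the height values are made up for this note and are not taken from any dataset.

```python
import statistics

def sample_cv(values):
    """Return s / x_bar, where s is the sample standard deviation (n - 1 denominator)."""
    mean = statistics.mean(values)
    if mean <= 0:
        # The note requires a positive mean for the CV to be meaningful.
        raise ValueError("the sample mean must be greater than zero")
    return statistics.stdev(values) / mean

# Hypothetical heights in cm, for illustration only
heights_cm = [150, 160, 170, 180, 190]
cv_heights = sample_cv(heights_cm)
print(f"sample CV of heights: {cv_heights:.4f}")  # about 0.0930
```

The result is often reported as a percentage, here roughly 9.3% of the mean.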

Coefficient of Variation for Population

CV = \frac{\sigma}{\mu}

for a population, where \sigma is the population standard deviation and \mu is the population mean. The mean should always be greater than zero.
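When the data cover the whole population, the only change is the standard deviation used: the divisor is n instead of n - 1. A minimal sketch, assuming the list below is the complete population (the name `population_cv` and the weight values are hypothetical):

```python
import statistics

def population_cv(values):
    """Return sigma / mu, where sigma is the population standard deviation (n denominator)."""
    mu = statistics.mean(values)
    if mu <= 0:
        # The note requires a positive mean for the CV to be meaningful.
        raise ValueError("the population mean must be greater than zero")
    return statistics.pstdev(values) / mu

# Hypothetical weights in kg, treated here as an entire population
weights_kg = [50, 60, 70, 80, 90]
cv_weights = population_cv(weights_kg)
print(f"population CV of weights: {cv_weights:.4f}")  # about 0.2020
```

For the same values, `statistics.stdev` would give a slightly larger number than `statistics.pstdev`, so the sample CV is slightly larger than the population CV.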

The data set with the higher CV has more variability relative to its mean than the other. This parallels the other measures of variability, where a higher variance implies that the data are spread more widely around the mean.

The CV helps in comparing data sets based on two completely different measurements. The variables are measured on different scales, but their dimensionless CV enables the comparison of their variation.
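The dimensionless property can be checked directly: changing the unit of measurement rescales the mean and the standard deviation by the same factor, so their ratio stays the same. The sketch below uses made-up heights and a helper named `cv` that is only an illustration.

```python
import math
import statistics

def cv(values):
    """Sample coefficient of variation: standard deviation divided by mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical heights, first in centimeters and then converted to meters
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

sd_cm = statistics.stdev(heights_cm)
sd_m = statistics.stdev(heights_m)

# The standard deviation depends on the unit; the CV does not
same_cv = math.isclose(cv(heights_cm), cv(heights_m), rel_tol=1e-9)
print(f"sd in cm: {sd_cm:.4f}, sd in m: {sd_m:.4f}")
print(f"CV unchanged by the unit conversion: {same_cv}")
```

This invariance holds for a change of scale (multiplying by a positive constant). It does not hold if a constant is added to every value, because that shifts the mean without changing the standard deviation.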

Comparison of Variation of variables using Coefficient of Variation in Python
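A small sketch of such a comparison, using the height and weight example from earlier in this note. The numbers are invented for illustration, and the function name `coefficient_of_variation` is an assumption rather than a library function.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV: sample standard deviation divided by the sample mean."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("the mean must be greater than zero")
    return statistics.stdev(values) / mean

# Hypothetical measurements for the same six students
datasets = {
    "height_cm": [152, 160, 165, 171, 178, 183],
    "weight_kg": [58, 62, 65, 70, 74, 79],
}

summary = {}
for name, data in datasets.items():
    summary[name] = {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "cv": coefficient_of_variation(data),
    }
    row = summary[name]
    print(f"{name}: mean={row['mean']:.2f}, sd={row['sd']:.2f}, cv={row['cv']:.4f}")

# The variable with the larger CV varies more relative to its own mean
more_variable = max(summary, key=lambda key: summary[key]["cv"])
print("more variable relative to its mean:", more_variable)
```

With these made-up numbers, height has the larger standard deviation (about 11.5 cm against about 7.8 kg), yet weight has the larger CV (about 0.115 against about 0.068). Comparing the raw standard deviations would therefore point the wrong way, while the CV shows that weight varies more relative to its mean.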

References

  1. Descriptive Statistic, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.
