Descriptive Statistics – Measures of Variability based on absolute deviation

An earlier note covered what variability is and how to measure it using specific values or partitioning values. In this note, we start measuring variation (dispersion) based on deviations.

Deviation based measures of variation

We need a tool that measures how far every observation lies from a given value. Any reference value can be used, but in practice we mostly take the mean and measure the deviations around it. This approach is called deviation-based measurement of variation.

Suppose the deviation of an observation x_i from a value A is measured as d_i = (x_i - A). It gives the signed difference between x_i and A, and its sign depends on where x_i lies relative to A:

  • if x_i > A , then such deviations d_i are positive.
  • if x_i < A , then such deviations d_i are negative.
  • if x_i = A , then such deviations d_i are zero.

Looking at all the individual d_i values one by one does not lead to a conclusion, so it is better to summarize them in a single quantity. A natural candidate is the average of the deviations d_i:

Deviation = \frac {1}{n} \sum_{i=1}^n {d_i}

where d_1, d_2, ..., d_n are the deviations of the respective x_1, x_2, ..., x_n values from A.

This average may be close to zero and so suggest little or no variation, even when the observations are widely spread. When A is the mean, it is exactly zero. So we need to consider only the magnitudes of the deviations and drop the signs.

The reason for dropping the direction is that positive and negative deviations cancel each other during averaging, which produces a misleading result.
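A minimal Python sketch of this cancellation, using a few illustrative values (not taken from any dataset in this note) and assuming the mean as the reference value A:

```python
# Illustrative values only; A is assumed to be the arithmetic mean.
x = [2, 4, 6, 8, 10]
A = sum(x) / len(x)                          # 6.0

d = [xi - A for xi in x]                     # signed deviations d_i = x_i - A
avg_signed = sum(d) / len(d)                 # positive and negative parts cancel
avg_abs = sum(abs(di) for di in d) / len(d)  # magnitudes only

print(d)           # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(avg_signed)  # 0.0
print(avg_abs)     # 2.4
```

The signed average reports no variation at all, while the average of the magnitudes reflects the actual spread.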

There are two ways to achieve this: take the absolute value of each deviation, or square each deviation. These lead to two types of measures. The first is absolute deviation, and the second is variance.

Absolute Deviation

Datasets contain discrete (ungrouped) or continuous (grouped) variables. For a discrete variable, we use the observations as they are. For a continuous variable, we group the observations into class intervals, convert the data into a frequency table, and use the mid-values of the class intervals together with the corresponding frequencies to construct the statistical measures.

Absolute Deviation for Discrete Data

Suppose we have n observations, x_1, x_2, ..., x_n, on a variable X. To calculate the absolute deviation of the observations, we need a reference value A, which can be one of the observations or a derived value. For each observation, we compute:

 | x_i - A | , for all n observations

Here we ignore the sign and keep only the magnitude of the difference. For example, | 5 - 10 | and | 10 - 5 | are both equal to 5. Finally, we sum all the absolute values and divide by the number of observations:

Absolute deviation = \frac {1}{n} \sum_{i=1}^n {| x_i - A |}
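The formula above can be sketched in Python as follows. The function name `absolute_deviation`, the observations, and the reference value A = 5 are illustrative assumptions, not part of the original note:

```python
def absolute_deviation(values, a):
    """Average of |x_i - a| over all observations (ungrouped data)."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    return sum(abs(x - a) for x in values) / n


# Illustrative observations; A = 5 is an arbitrary reference value.
observations = [3, 5, 8, 10, 14]
print(absolute_deviation(observations, 5))  # (2 + 0 + 3 + 5 + 9) / 5 = 3.8
```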

Absolute Deviation for Continuous Data

Suppose we have observations on a variable X grouped into k class intervals e_1 - e_2, e_2 - e_3, ..., e_k - e_{k+1} in a frequency table. The midpoint of the i-th interval is:

 m_i = \frac{e_i + e_{i+1}}{2} , for i = 1, 2, ..., k

and the associated absolute frequency is f_i, the number of observations that belong to the class interval e_i - e_{i+1}. The absolute frequencies sum to the total number of observations, n = \sum_{i=1}^k {f_i}. The absolute deviation about A is then:

Absolute deviation = \frac {1}{n} \sum_{i=1}^k {f_i \times | m_i - A |}
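As a rough sketch of the grouped formula, the Python function below takes the class boundaries and the frequencies of a frequency table. The function name, the boundaries, and the frequencies are hypothetical and chosen only for illustration; each class is assumed to run between two consecutive boundaries:

```python
def grouped_absolute_deviation(edges, freqs, a):
    """Absolute deviation about `a` for data given as a frequency table.

    edges: class boundaries in increasing order (one more entry than freqs)
    freqs: absolute frequency f_i of each class interval
    """
    if len(edges) != len(freqs) + 1:
        raise ValueError("need exactly one more edge than frequencies")
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequencies must not sum to zero")
    # Midpoint m_i of every class interval (pairs of consecutive boundaries).
    mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    return sum(f * abs(m - a) for f, m in zip(freqs, mids)) / n


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40.
edges = [0, 10, 20, 30, 40]
freqs = [2, 5, 8, 5]
print(grouped_absolute_deviation(edges, freqs, 20))  # 170 / 20 = 8.5
```

Because every observation in a class is represented by the class midpoint, the result is an approximation of the value that the raw observations would give.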

Average Absolute Deviation

In the previous section, we saw that the absolute deviation can be calculated from any value A. When A is one of the measures of central tendency, such as the mean, median, or mode, the result is called the mean absolute deviation, median absolute deviation, or mode absolute deviation, respectively.

The average absolute deviation about the median is always less than or equal to the average absolute deviation about the mean, and in fact about any other value A, because the median minimizes the sum of absolute deviations.
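This relationship can be checked numerically. The sketch below uses illustrative values with one large observation (not the tips dataset), a hypothetical helper `absolute_deviation`, and the `mean` and `median` functions from Python's standard `statistics` module:

```python
from statistics import mean, median


def absolute_deviation(values, a):
    """Average of |x_i - a| over all observations."""
    return sum(abs(x - a) for x in values) / len(values)


# Illustrative values with one large observation.
x = [1, 2, 3, 4, 100]

about_mean = absolute_deviation(x, mean(x))      # mean is 22
about_median = absolute_deviation(x, median(x))  # median is 3

print(about_mean)    # 31.2
print(about_median)  # 20.2

# No integer reference value between 0 and 100 does better than the median.
is_minimal = all(about_median <= absolute_deviation(x, a) for a in range(0, 101))
print(is_minimal)    # True
```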

The note on measures of central tendency discussed how to compute the mean, median, and mode for both discrete and continuous datasets. The same formulas must be used here; otherwise, the whole calculation will produce the wrong result.

To compute average absolute deviation, we need continuous or discrete values. Instead of creating dummy data, we will use the tips dataset and calculate the average absolute deviation using the Python programming language. Also, we will analyze the average absolute deviation by considering the measures of central tendency using mean, median, and mode on the same variable.

Mean Absolute Deviation

Suppose we have n observations, x_1, x_2, ..., x_n, on a discrete or ungrouped variable X. The mean absolute deviation is computed as follows:

  • \bar{x}  = \frac {1}{n} \sum_{i=1}^n {x_i} is the sample mean. From it, we find the absolute deviation of every observation, |x_i - \bar{x}|.
  • Mean Absolute Deviation = \frac {1}{n} \sum_{i=1}^n { | x_i - \bar{x} |}
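The two steps above can be sketched in Python. The function name and the sample values are illustrative assumptions; the values are not taken from the tips dataset:

```python
from statistics import mean


def mean_absolute_deviation(values):
    """Average absolute deviation of the observations about their mean."""
    x_bar = mean(values)  # step 1: sample mean
    # step 2: average the absolute deviations from the mean
    return sum(abs(x - x_bar) for x in values) / len(values)


# Illustrative sample; its mean is 16.0.
sample = [10.0, 12.0, 15.0, 18.0, 25.0]
print(mean_absolute_deviation(sample))  # (6 + 4 + 1 + 2 + 9) / 5 = 4.4
```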

Median Absolute Deviation

Suppose we have n observations, x_1, x_2, ..., x_n, on a discrete or ungrouped variable X. The median absolute deviation is computed as follows:

  • Let \tilde{x} be the median of the observations. To see how to compute the median, refer to the note on measures of central tendency using the median.
  • Find the absolute deviation of every observation from the median, | x_i - \tilde{x} |. Once we have all the deviation values, take the median of these values.
  • Median Absolute Deviation = Median(| x_i - \tilde{x} |), for i = 1, 2, ..., n.
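A minimal Python sketch of these steps, using `median` from the standard `statistics` module. The function name and the sample values are illustrative assumptions, not taken from the tips dataset:

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations of the observations from their median."""
    med = median(values)                          # median of the observations
    deviations = [abs(x - med) for x in values]   # absolute deviations from it
    return median(deviations)                     # median of those deviations


# Illustrative sample with one large observation; its median is 3.
sample = [1, 2, 3, 4, 100]
print(median_absolute_deviation(sample))  # deviations [2, 1, 0, 1, 97] -> 1
```

In this sample the single large value 100 barely affects the result, whereas the mean absolute deviation of the same values is 31.2.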

