Descriptive Statistics – Measures of Variability based on specific values

In this series of notes on descriptive statistics, we have introduced the concept of variability (or dispersion) and explained why it is essential for understanding the behavior of data. This note goes into the details of measures of variability that are based on specific or partitioning values. The following measures fall into this category:

  • Range
  • Inter Quartile Range (IQR)
  • Quartile Deviation

Specific or partitioning value-based measures of variation

First, we cover the measures of variation based on specific and partitioning values, which include the range, the interquartile range and the quartile deviation.

Range

The range is the difference between the maximum and minimum values of the observations. For n observations x_1, x_2, x_3, ..., x_n, the range is calculated as follows:

$$R = \max(x_1, x_2, \ldots, x_n) - \min(x_1, x_2, \ldots, x_n)$$

where max() and min() are functions that return the maximum and minimum values of the observations.

Interpretation of Range: If a variable consisting of n observations has a high range, it has high variability; conversely, a low range implies low variability. Low variability means the observations are concentrated and outliers are less likely, whereas high variability means the observations are spread out and outliers are more likely than usual. Because the range depends only on the two extreme values, a single outlier is enough to make it large.
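To make the interpretation concrete, here is a minimal sketch with two made-up variables. The arrays `concentrated` and `spread` and the helper `data_range` are hypothetical, chosen only for illustration: both variables have the same mean (50), but one has its observations packed together and the other has them widely spread.

```python
import numpy as np

# Hypothetical data: both variables have mean 50
concentrated = np.array([48, 49, 50, 50, 51, 52])
spread = np.array([20, 35, 50, 50, 65, 80])

def data_range(values):
    """Hypothetical helper: maximum minus minimum of the observations."""
    return values.max() - values.min()

print(data_range(concentrated))  # 4  -> low variability
print(data_range(spread))        # 60 -> high variability
```

The same mean can hide very different spreads, which is exactly what the range exposes.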

Whenever we measure and compare the variability of two variables, the same measure must be used for both; otherwise, the comparison may be misleading. For example, if the range is the chosen measure, the range should be computed and compared for both variables, rather than comparing results from different measures such as the range of one variable and the IQR (or the standard deviation) of the other.
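The pitfall of mixing measures can be seen in a short sketch. The arrays `a` and `b` are hypothetical data, and the helpers `data_range` and `iqr` are illustrative names; `iqr` relies on `np.percentile` with its default linear interpolation, and other quartile conventions may give slightly different values. Comparing range with range says `b` is more variable, whereas comparing the range of `a` with the IQR of `b` would wrongly suggest the opposite.

```python
import numpy as np

# Hypothetical observations for two variables
a = np.array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
b = np.array([0, 18, 19, 19, 20, 20, 21, 21, 22, 40])

def data_range(values):
    """Maximum minus minimum."""
    return values.max() - values.min()

def iqr(values):
    """Q3 - Q1 using NumPy's default percentile interpolation."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

# Like with like: range vs range -> b is more variable
print(data_range(a), data_range(b))  # 18 40

# Mixed measures: range of a vs IQR of b -> misleadingly suggests b varies less
print(data_range(a), iqr(b))  # 18 2.0
```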

Example to calculate range using Python

In the example below, we generate random integers and use NumPy's amax() and amin() functions to find the maximum and minimum values, then calculate the range by subtracting the minimum from the maximum.

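import numpy as np

n_num = 10      # number of observations to generate
start_num = 10  # lowest possible value (inclusive)
end_num = 100   # upper bound (exclusive): values go up to 99
x = np.random.randint(low=start_num, high=end_num, size=n_num)

print(x)
# Example output (the numbers differ on every run):
# [28 50 29 86 24 90 79 53 77 11]

# Range = maximum - minimum
range_val = np.amax(x) - np.amin(x)
print(range_val)
# Output for the sample above: 79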
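NumPy also provides `np.ptp` ("peak to peak"), which returns the maximum minus the minimum in a single call. The sketch below fixes the sample printed above so the result is reproducible, and, as an optional alternative to `np.random.randint`, draws values from a seeded generator (`np.random.default_rng`); the seed value 42 is an arbitrary choice.

```python
import numpy as np

# The sample shown above, fixed so the result does not change between runs
x = np.array([28, 50, 29, 86, 24, 90, 79, 53, 77, 11])

print(np.ptp(x))                # 79
print(np.amax(x) - np.amin(x))  # 79, same result

# A seeded generator gives the same random numbers on every run
rng = np.random.default_rng(42)
y = rng.integers(low=10, high=100, size=10)  # high is exclusive
print(np.ptp(y) == np.amax(y) - np.amin(y))  # True
```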

Interquartile Range (IQR)

In the notes on quantiles, we saw that the three quartiles Q1, Q2 and Q3 divide the entire frequency distribution into four equal parts:

  • Q1: the first (lower) quartile, i.e. the 25th percentile
  • Q2: the second quartile, i.e. the median or 50th percentile
  • Q3: the third (upper) quartile, i.e. the 75th percentile
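The quartiles can be computed with `np.percentile`. This is a minimal sketch on hypothetical data; note that several conventions exist for sample quartiles, and NumPy's default linear interpolation may differ slightly from values computed by hand or in other software.

```python
import numpy as np

# Hypothetical observations (np.percentile does not require sorted input)
data = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 18])

q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3)  # 4.5 8.0 11.5

# Q2 is the median
print(q2 == np.median(data))  # True
```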

The IQR is defined as the difference between the 75th and 25th percentiles or, equivalently, between the third and first quartiles: IQR = Q3 – Q1. It covers the centre of the distribution and contains the middle 50% of the observations.
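A short sketch of IQR = Q3 – Q1 on hypothetical data (the integers 1 to 20), which also checks that half of the observations lie between Q1 and Q3. With small samples and interpolated quartiles this share is generally only approximately 50%; it happens to be exact here.

```python
import numpy as np

# Hypothetical observations: the integers 1 to 20
data = np.arange(1, 21)

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
print(q1, q3, iqr)  # 5.75 15.25 9.5

# Share of observations lying between Q1 and Q3
inside = (data >= q1) & (data <= q3)
print(inside.mean())  # 0.5
```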

In decision-making, a dataset with a higher interquartile range has more variability, whereas a lower interquartile range indicates that the middle half of the data is more tightly clustered, which is generally preferable.
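For a decision between two datasets, the IQRs can be computed and compared directly. In this sketch, `batch_a` and `batch_b` are hypothetical measurements (for example, from two sources), and `iqr` is an illustrative helper using NumPy's default percentile interpolation; the batch with the smaller IQR has the more tightly clustered middle half.

```python
import numpy as np

def iqr(values):
    """Interquartile range Q3 - Q1 (NumPy's default linear interpolation)."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

# Hypothetical measurements from two sources
batch_a = np.array([98, 99, 100, 100, 101, 102])
batch_b = np.array([90, 95, 100, 100, 105, 110])

iqr_a, iqr_b = iqr(batch_a), iqr(batch_b)
print(iqr_a, iqr_b)  # 1.5 7.5
print("batch_a is less variable" if iqr_a < iqr_b else "batch_b is less variable")
```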

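Suppose we compare the range and the interquartile range (IQR). Both measure the same aspect of the data, namely its variability, but they do so differently: the range uses only the two extreme observations, so a single outlier can inflate it, while the IQR uses the first and third quartiles and is therefore largely unaffected by extreme values in the lowest and highest quarters of the data.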
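The difference shows up clearly when an outlier is introduced. In this sketch (hypothetical data; `summary` is an illustrative helper), replacing the largest value with an extreme one changes the range dramatically while the IQR stays the same.

```python
import numpy as np

def summary(values):
    """Return (range, IQR) for a 1-D array."""
    q1, q3 = np.percentile(values, [25, 75])
    return values.max() - values.min(), q3 - q1

clean = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18])
with_outlier = clean.copy()
with_outlier[-1] = 100  # hypothetical extreme value replacing 18

for name, values in [("clean", clean), ("with outlier", with_outlier)]:
    r, q = summary(values)
    print(f"{name}: range={r}, IQR={q}")
# clean: range=8, IQR=4.0
# with outlier: range=90, IQR=4.0
```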

Quartile Deviation

The quartile deviation (also called the semi-interquartile range) is defined as half the difference between the 75th and 25th percentiles, i.e. half of the interquartile range:

$$QD = \frac{1}{2}\, IQR = \frac{Q_3 - Q_1}{2}$$
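Continuing with hypothetical data (the integers 1 to 20), a minimal sketch of the quartile deviation as half of the IQR; `quartile_deviation` is an illustrative helper name, and the quartiles use NumPy's default percentile interpolation.

```python
import numpy as np

def quartile_deviation(values):
    """Half of the interquartile range, (Q3 - Q1) / 2."""
    q1, q3 = np.percentile(values, [25, 75])
    return (q3 - q1) / 2

data = np.arange(1, 21)  # hypothetical observations 1..20 (IQR = 9.5)
print(quartile_deviation(data))  # 4.75
```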

So far, we have seen how to measure the variation in the data using the range, the interquartile range and the quartile deviation.

  • The range depends on two values: the minimum and the maximum of the observations.
  • The interquartile range and the quartile deviation depend on two values: the first quartile and the third quartile.

Each of these measures is based on only two values at a time: either the minimum and maximum, or the first and third quartiles. Another approach measures variation through the individual deviations of the data from a central value (such as the mean) or from some other value. Rather than using just two values, these measures take the deviation of every data point from the central value and then combine all of the deviations.
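As a small illustration of this idea, the sketch below (hypothetical data) computes each observation's deviation from the mean. The raw deviations always sum to zero, so they must be combined in some other way; averaging their absolute values (the mean absolute deviation) is one common choice, shown here only as an example of combining deviations.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical observations, mean 5.0

deviations = data - data.mean()  # deviation of each point from the mean
print(deviations.sum())          # 0.0: positive and negative deviations cancel

mad = np.abs(deviations).mean()  # combine all deviations via their absolute values
print(mad)                       # 1.5
```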
