In the previous note on Descriptive Statistics – Variables and Types of Data, we saw the different types of data, the variables used to access them, and how variables are associated with other data.
In this note, we go a step further, assuming that we have several statistical variables filled with observations from an identified population or sample. Whenever we have data, our first objective is to extract meaningful information from it, and to do that we first need to arrange the data in a suitable format.
Suppose we have millions of observations of dog heights. Simply looking at the raw observations tells us very little, because data cannot speak for itself. Instead, we need to arrange the observations into groups or classes according to their resemblances and similarities. Later, using statistical tools, we can extract information such as the average height of the dogs.
Classification of Data
Classification is the process of arranging data into groups or classes according to their resemblances and similarities; in other words, data are grouped according to different aspects. Why do we classify data? Classification serves the following purposes:
Functions of Classification
- Condenses the data – Observations are grouped according to aspects such as the type of quantity or their nature, so that a large mass of raw data is reduced to a small number of groups defined by chosen characteristics.
- Facilitates comparisons – Once the data are grouped or condensed, the groups can be compared with one another. Such comparisons are the starting point for studying relationships, which is sometimes done through statistical modeling of the relationship between input and output variables.
- Helps in studying relationships – Using statistical tools such as descriptive statistics, we can describe the characteristics of each variable (for example, its average) and the relationships between variables.
- Facilitates statistical treatment of the data
The frequency of a value is the number of times it occurs in a data set. A frequency distribution is a table that lists each distinct value and its frequency. The relative frequency of a value is its frequency expressed as a fraction, or percentage, of the total number of data points.
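A frequency distribution of this kind can be sketched in plain Python with `collections.Counter`. The list of values below is hypothetical, chosen only to illustrate the definitions; the helper `frequency_table` is likewise a name introduced here for illustration. It returns each distinct value with its frequency and its relative frequency as a percentage.

```python
from collections import Counter

# Hypothetical observations, used only to illustrate the definitions
data = [2, 3, 3, 5, 2, 3, 4, 5, 3, 2]

def frequency_table(values):
    """Return (value, frequency, relative frequency in %) for each distinct value."""
    counts = Counter(values)   # maps each distinct value to its number of occurrences
    total = len(values)        # total number of data points
    return [(value, freq, 100 * freq / total) for value, freq in sorted(counts.items())]

for value, freq, pct in frequency_table(data):
    print(f"{value}: frequency = {freq}, relative frequency = {pct:.0f}%")
```

The relative frequencies always add up to 100%, because the frequencies add up to the total number of data points.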
Absolute & Relative Frequencies
Frequency counts the occurrences of an outcome in a given sample, and there are two main types: absolute and relative frequency.
Absolute frequency is the number of observations in a particular category.
For example, suppose 10 persons took a test, and their results were declared in two categories: Pass (P) and Fail (F). The results are {P,F,P,F,F,P,P,F,P,P}. Let a1 and a2 denote the Pass and Fail categories, and let n1 and n2 denote their absolute frequencies, respectively.
- 6 persons passed, so n1 = 6.
- 4 persons failed, so n2 = 4.
n1 and n2 simply count the number of units in each category.
Relative frequency is the ratio of the number of times a value occurs to the total number of outcomes. For example, the relative frequencies of a1 and a2 are:
- a1: n1/(n1+n2) = 6/10 = 0.6, or 60%
- a2: n2/(n1+n2) = 4/10 = 0.4, or 40%
The relative frequencies therefore give the proportions of participants who passed and failed the test.
The following code snippet calculates the absolute and relative frequencies in Python for the example above, encoding Pass as 1 and Fail as 0.
```python
import pandas as pd

# Results of the 10 participants: P,F,P,F,F,P,P,F,P,P (1 = Pass, 0 = Fail)
result = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# value_counts() counts the number of occurrences of each distinct observation.
# Absolute frequency
abs_freq = pd.Series(result).value_counts()
print(abs_freq)

# Relative frequency: divide each count by the total number of observations
rel_freq = abs_freq / len(result)
print(rel_freq)
```
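pandas can also return relative frequencies directly: `Series.value_counts(normalize=True)` divides each count by the total. The variant below is a sketch that assumes the results are kept as the labels "P" and "F" rather than encoded as 1/0, so the output is indexed by category name.

```python
import pandas as pd

# Test results of the 10 participants, kept as category labels
results = pd.Series(["P", "F", "P", "F", "F", "P", "P", "F", "P", "P"])

# Absolute frequencies n1 (Pass) and n2 (Fail)
absolute = results.value_counts()

# Relative frequencies: normalize=True divides each count by the total
relative = results.value_counts(normalize=True)

# Relative frequencies expressed as percentages
percent = relative * 100

print(absolute)
print(relative)
print(percent)
```

Keeping the labels avoids having to remember which number stands for which category.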
Frequency Distribution
Arranging ungrouped data into groups is called a frequency distribution of the data. It divides the entire range of the variable into a suitable number of groups, called classes, and counts the observations in each.
A frequency distribution is an overview of all distinct values of a variable and the number of times each occurs; it shows how the frequencies are distributed over the values. It is one way to represent data so that meaningful information can be extracted from it.
The lower and upper boundary values of a class are called its lower limit and upper limit, respectively, and the difference between the limits is called the width of the class, or class interval. The value lying midway between the two limits, (lower limit + upper limit)/2, is called the midpoint (mid-value) of the class.
- The number of observations in a particular class is called absolute frequency.
- The number of observations in a particular class divided by the total frequency is called relative frequency.
- The cumulative frequency corresponding to any variate value is the number of observations less than or equal to that value.
- The cumulative frequency corresponding to a class is the total number of observations less than or equal to the upper limit of the class.
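The two cumulative-frequency definitions above can be sketched as follows. The observations and class limits are hypothetical, the classes are assumed to be non-overlapping with inclusive integer limits, and the helper names are introduced only for illustration. `itertools.accumulate` produces the running totals of the class frequencies.

```python
from itertools import accumulate

# Hypothetical observations and classes (inclusive integer limits)
data = [12, 15, 21, 23, 27, 34, 38, 45]
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]

def cumulative_frequency_at(values, x):
    """Number of observations less than or equal to x."""
    return sum(1 for v in values if v <= x)

def class_frequencies(values, limits):
    """Absolute frequency of each class, counting lower <= v <= upper."""
    return [sum(1 for v in values if lower <= v <= upper) for lower, upper in limits]

freqs = class_frequencies(data, classes)
cumulative = list(accumulate(freqs))  # running total up to each class

print(freqs)
print(cumulative)
# The cumulative frequency of a class equals the count at its upper limit
print([cumulative_frequency_at(data, upper) for _, upper in classes])
```

The last line checks the second definition against the first: the running total for each class matches the number of observations at or below its upper limit.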
Example 1:
Suppose the following are the times taken (in seconds) by 20 participants in a race: { 32, 35, 45, 74, 55, 68, 38, 35, 55, 66, 65, 42, 68, 72, 84, 67, 36, 42, 58}. Only 19 of the 20 values appear in this list; according to the table below, the missing value falls in the 81-90 class. The data are summarized in the class intervals 31-40, 41-50, 51-60, 61-70, 71-80, and 81-90.
- Smallest observation = 32
- Largest observation = 84
- Range = 84 - 32 = 52
- Class width = 10 (the distance between the lower limits of consecutive classes, e.g. 41 - 31)
Class Interval (s) | Midpoint (s) | Absolute Frequency | Relative Frequency | Cumulative Frequency |
31-40 | 35.5 | 5 | 5/20 = 0.25 | 5 |
41-50 | 45.5 | 3 | 3/20 = 0.15 | 5+3 = 8 |
51-60 | 55.5 | 3 | 3/20 = 0.15 | 8+3 = 11 |
61-70 | 65.5 | 5 | 5/20 = 0.25 | 11+5 = 16 |
71-80 | 75.5 | 2 | 2/20 = 0.10 | 16+2 = 18 |
81-90 | 85.5 | 2 | 2/20 = 0.10 | 18+2 = 20 |
Total | | 20 | 1.00 | |
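The table can be reproduced with a short script. This sketch uses only the 19 race times that actually appear in the list, so it reports 1 observation (not 2) in the 81-90 class and a total of 19, and its relative frequencies are computed over 19. The classes are taken as 31-40, 41-50, ..., 81-90 with inclusive integer limits, which is consistent with the midpoints 35.5, 45.5, ..., 85.5; the helper `frequency_table` is a name introduced here for illustration.

```python
from itertools import accumulate

# The 19 race times (seconds) listed in Example 1
times = [32, 35, 45, 74, 55, 68, 38, 35, 55, 66, 65, 42, 68, 72, 84, 67, 36, 42, 58]

# Classes with inclusive integer limits
classes = [(31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90)]

def frequency_table(values, limits):
    """Rows of (class, midpoint, absolute, relative, cumulative frequency)."""
    total = len(values)
    absolute = [sum(1 for v in values if lo <= v <= hi) for lo, hi in limits]
    cumulative = list(accumulate(absolute))  # running totals of the class counts
    rows = []
    for (lo, hi), n, cum in zip(limits, absolute, cumulative):
        rows.append((f"{lo}-{hi}", (lo + hi) / 2, n, n / total, cum))
    return rows

for label, mid, n, rel, cum in frequency_table(times, classes):
    print(f"{label} | {mid} | {n} | {rel:.2f} | {cum} |")
```

Adding the missing twentieth value to `times` would bring the last row and the totals in line with the table.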
Q&A
From this note on descriptive statistics, we can answer the following questions.
- What is a distribution in statistics?
- What is a frequency distribution?
- What is the difference between absolute and relative frequency distribution?