In this note, we will get familiar with the different data types used in descriptive statistics. They are the foundation for understanding and selecting graphical and analytical tools for descriptive statistics.
The motivation for studying data types is that not every measure (measures of central tendency, measures of dispersion, measures of association, and so on) can be applied to every data type. Each measure is meaningful only for specific data types.
Any statistical analysis starts from an objective, and that objective is based on the research problem. Once the research problem is fixed and the population of interest is identified, data is collected on the relevant variables from that population, and these variables can be of different types.
Variables
Once the research problem and the population of interest are fixed, observations are collected on statistical variables. Any information of interest can be captured in such a variable.
Variables come in different types, and the terminology used for them depends on the kind of statistical analysis. In probabilistic analysis, for example, we speak of random variables, continuous random variables, discrete random variables, and so on.
A statistical analysis can involve one variable or several. An analysis of a single variable is called univariate analysis, and an analysis of more than one variable is called multivariate analysis.
Values on Variables
In statistics, variables are denoted by capital letters (X, Y, ...), and observations are collected on variables. These observations are the values the variable takes, and they are denoted by lowercase letters (x, y, z, ...). For example,
- When X is gender, then it takes three values – Male, Female, Transgender.
- When X is a country in Asia, it takes India, Bangladesh, China, Thailand, Bhutan, etc.
- When X is an odd number, then it takes values – 1,3,5,7, … etc.
Example: Let the variable X be the height of students, and suppose there are two students whose heights are 150 cm and 160 cm.
- X: Height of students
- x: values of height in centimeters (cm)
- x1 = 150, x2 = 160
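The notation above maps directly onto code. The following is a minimal sketch in Python; the name `heights_cm` is an illustrative choice, not something defined in this note.

```python
# X: height of students. Each observation x_i is one value of X.
heights_cm = [150, 160]  # x1 = 150, x2 = 160

# Python lists are 0-indexed, so x1 is stored at heights_cm[0].
x1, x2 = heights_cm
n = len(heights_cm)  # number of observations collected on X

print(f"n = {n}, x1 = {x1} cm, x2 = {x2} cm")
```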
Variable Types
There are two types of variables: quantitative (numerical) variables and qualitative (categorical) variables. Quantitative variables are further divided into discrete variables and continuous variables.
Quantitative Variables
They represent measurable quantities, and their values can be ordered in a logical and natural way.
Example:
- Size of shirts – 39,40,42, and so on.
- Per kilo prices of vegetables – Rs. 30, Rs. 35, Rs. 50, and so on.
- The number of colleges in a city – 5, 10, 15, and so on.
- Heights of children – 1.2 m, 1.23 m, 1.5 m, and so on.
As noted above, quantitative variables are of two types: discrete and continuous.
Discrete Variable
These variables take a finite or countable number of distinct values, typically counts. For example, the number of children in a family (0, 1, 2, 3, etc.) or the number of branches of a school in a city (0, 1, 2, 3, 4, 5, and so on).
Continuous Variable
These variables can take any value within a range, so there are infinitely many possible values between any two of them. For example, the length of a wire can be 1.5 meters, 1.55 meters, 1.6 meters, or anything in between.
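One informal way to see the difference between the two kinds of quantitative variable is to ask whether the value halfway between two observations is itself a possible observation. The sketch below is only an illustration; the helper `midpoint` and the variable names are hypothetical.

```python
def midpoint(a, b):
    """Return the value halfway between a and b."""
    return (a + b) / 2

# Discrete: number of children in a family (counts only).
mid_count = midpoint(1, 2)
# A count must be a whole number, so 1.5 children is not a possible value.
is_valid_count = float(mid_count).is_integer()

# Continuous: length of a wire in meters.
mid_length = midpoint(1.5, 1.55)
# A wire can have this length, and the same argument can be repeated
# between 1.5 and mid_length, and so on without end.
is_between = 1.5 < mid_length < 1.55

print(is_valid_count, is_between)
```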
Qualitative (Categorical) Variables
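They represent characteristics that are described by categories or labels rather than measured as numbers, and their values generally can’t be ordered in a logical and natural way (ordinal variables, described below, are the exception).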
Example:
- Names of cities – Kanpur, Mumbai, Kolkata, etc.
- Colors of hair – Black, White, Brown, etc.
- Performance – Good, Excellent, Bad, etc.
- Tastes of Food – Sweet, Salty, Neutral, etc.
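Usually, numbers are assigned to the values of qualitative variables as labels. For example, Names of cities (X): x1 = 1 = Kanpur, x2 = 2 = Mumbai, and so on. These numbers only identify the categories; arithmetic on them is not meaningful.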
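The coding of categories as numbers can be sketched as follows. This is an illustration under stated assumptions: the code 3 for Kolkata and the sample `observations` are made up for the example, and only the codes for Kanpur and Mumbai come from the text above. It shows why counting is a sensible summary for coded categories while averaging the codes is not.

```python
from collections import Counter
from statistics import mean

# Numeric labels for the categories (3 = Kolkata is an assumed code).
city_codes = {"Kanpur": 1, "Mumbai": 2, "Kolkata": 3}

# Hypothetical sample of observed cities.
observations = ["Kanpur", "Kolkata", "Kanpur", "Kolkata", "Kanpur"]
coded = [city_codes[city] for city in observations]

# Meaningful summary: how often each category occurs, and the mode.
counts = Counter(observations)
most_common_city, frequency = counts.most_common(1)[0]

# Not meaningful: the average of the codes does not correspond to any city
# and would change if the labels were assigned differently.
mean_of_codes = mean(coded)

print(most_common_city, frequency, mean_of_codes)
```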
Nominal variable
A nominal variable is one that has two or more categories with no intrinsic ordering among them. For example, a binary variable (such as the answer to a yes/no question) is a categorical variable with two categories (yes or no) and no intrinsic ordering.
Another example:
- Hair color is a categorical variable with several categories (blonde, brown, brunette, red, etc.), and there is no agreed way to order these from highest to lowest.
A purely nominal variable simply lets you assign categories, but you cannot clearly order them. If the variable has a clear ordering, it is an ordinal variable.
Ordinal variable
An ordinal variable is a variable whose values have a clear order or rank, such as a position in a list. For example:
- Two judges give ranks to fashion models.
- Two persons rank the dishes prepared, or the scores they give are converted to ranks.
These observations are the ranks given by the two judges (two variables), for example, 1st, 2nd, 3rd position, etc.
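Because ordinal values carry an order, they can be sorted and summarized by a median, even though the distance between ranks is not defined. The sketch below is only an illustration: it assumes the ordering Bad < Good < Excellent for the Performance example given earlier, and the sample `ratings` is made up.

```python
from statistics import median_low

# Assumed ordering of the categories, from lowest to highest.
order = ["Bad", "Good", "Excellent"]
rank = {level: position for position, level in enumerate(order, start=1)}

# Hypothetical sample of performance ratings.
ratings = ["Good", "Excellent", "Bad", "Good", "Excellent"]

# Sorting uses the category order, not alphabetical order.
sorted_ratings = sorted(ratings, key=lambda level: rank[level])

# median_low always returns an observed rank, so it maps back to a category.
median_rank = median_low([rank[level] for level in ratings])
median_rating = order[median_rank - 1]

print(sorted_ratings, median_rating)
```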
Interval variable
An interval variable is similar to an ordinal variable, except that the values are numerical and the intervals between them are equally spaced, so differences between values are meaningful.
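Temperature in degrees Celsius is a standard textbook example of an interval variable; it is not an example from this note and is used here only as an illustration. The sketch below shows that comparisons of differences survive a change of scale to Fahrenheit, while ratios of the values themselves do not, because the zero point of such a scale is arbitrary. The helper `c_to_f` and the sample temperatures are hypothetical.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

temps_c = [10.0, 20.0, 30.0]
temps_f = [c_to_f(t) for t in temps_c]

# Ratios of values depend on the scale: "twice as hot" is not meaningful.
ratio_c = temps_c[1] / temps_c[0]
ratio_f = temps_f[1] / temps_f[0]

# Ratios of differences do not depend on the scale: equal steps stay equal.
diff_ratio_c = (temps_c[2] - temps_c[1]) / (temps_c[1] - temps_c[0])
diff_ratio_f = (temps_f[2] - temps_f[1]) / (temps_f[1] - temps_f[0])

print(ratio_c, ratio_f, diff_ratio_c, diff_ratio_f)
```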
Q&A
From this note on descriptive statistics – variable and data types, we can get answers to the following questions.
- What is the difference between quantitative data and qualitative data?
- What is descriptive statistics, and how is it different from inferential statistics?
- Why is it important for a data scientist to learn statistics?
Summary
In this note, we have seen the basic building blocks of descriptive statistics: variables, the values they take, and the different variable types. Broadly, variables are categorized into quantitative and qualitative types.
Quantitative variables are again of two types: discrete and continuous. The main difference is that a discrete variable takes only a finite or countable set of distinct values, whereas a continuous variable can take any value within a range.
This note also covered three further variable types. The first is the nominal variable, whose values are unordered categories, like (Gender -> Male, Female, or Transgender), which has three categories. The second is the ordinal variable, which is similar to nominal, but its categories have a definite order or rank. The third is the interval variable, which is similar to ordinal, but its values are equally spaced.
References
- Descriptive Statistic, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.