In the previous note, we saw how data science and statistics are related. In this and subsequent notes, we will learn the fundamental mathematics underlying data science, which includes the basics of probability theory and statistical inference. While demonstrating and visualizing the theory, we will use small datasets. In actual data science work, however, datasets are usually huge, and it is not easy to perform or verify the computations mentally, so we will use the Python and R programming languages for that purpose.
This note series aims to create awareness and help you select the appropriate statistical tools for data science, so that the resulting statistical inference supports accurate judgments. In practice, people will not believe a proposed hypothesis unless our observations and decisions are based on a scientific approach backed by sufficient evidence.
Introduction to Statistics
It has become a rule rather than an exception that if we want to learn about a phenomenon or process, we must collect data and learn from it. Statistics, then, is the science of learning from data. It covers collecting data, extracting the hidden information through descriptive analysis, and drawing conclusions or inferences.
A statistical analysis usually begins with a given set of data and uses different tools to describe, summarize, and analyze it. If data is not available, however, a statistically designed experiment is used to generate it. At the end of the investigation, the data is described and summarized using the tools of descriptive statistics.
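As a rough illustration of what describing and summarizing a dataset can look like in Python, here is a minimal sketch using only the standard library. The dataset `heights_cm` and its values are hypothetical and chosen purely for illustration; they do not come from the course material.

```python
import statistics

# Hypothetical small dataset (illustrative values only)
heights_cm = [150, 152, 155, 158, 160, 160, 165, 170]

# A few common descriptive statistics
summary = {
    "n": len(heights_cm),
    "mean": statistics.mean(heights_cm),
    "median": statistics.median(heights_cm),
    "sample_stdev": statistics.stdev(heights_cm),  # sample standard deviation (n - 1)
    "min": min(heights_cm),
    "max": max(heights_cm),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For larger datasets, libraries such as pandas or NumPy are the usual choice, but the idea is the same: reduce the raw values to a handful of numbers that describe the data.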
Inferential Statistics
After the experiment is complete and the data has been described and summarized, statistical conclusions are drawn using the tools of inferential statistics. The concept of chance is taken into account when drawing conclusions from the data. To do so, some assumptions are made about the chances (probabilities) of obtaining the different data values. These assumptions are referred to as the probability model for the data.
A careful description and presentation of the data enables us to infer an appropriate probability model for a given data set, which can then be verified using additional data. The tools of statistical inference lay the foundation for formulating a probability model to describe the data. Thus, understanding how to use statistical inference to draw valid conclusions from data requires knowledge of the theory of probability.
Elements of Probability
The probability of an event is open to various meanings or interpretations, depending on who states it and how. For example, if a medical doctor says that there is a 70% chance that the patient will be cured, the doctor is giving an intuitive idea about the success of the treatment. One reading is that the doctor feels that, over the long run, patients have recovered in 70% of such cases.
Probability measures uncertainty. There are two types of interpretation of probability: the frequency interpretation and the subjective interpretation.
Frequency Interpretation of Probability
In the frequency interpretation, the probability of a given outcome of an experiment is regarded as a property of that outcome, and this property can be determined by continual repetition of the experiment. A popular way to state it is as follows: the probability of the outcome is the proportion of the experiments that result in that outcome, observed over a long run of repetitions.
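The frequency interpretation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `relative_frequency`, the success probability of 0.5 (a fair coin), and the fixed seed are all choices made here, not part of the source material. It repeats a simulated experiment many times and reports the proportion of repetitions in which the outcome occurred.

```python
import random

def relative_frequency(n_trials, p_success=0.5, seed=0):
    """Proportion of n_trials simulated experiments that end in 'success'.

    p_success is the assumed true probability of the outcome;
    seed makes the simulation reproducible.
    """
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_trials) if rng.random() < p_success)
    return successes / n_trials

# The observed proportion tends to settle near p_success as n grows
for n in (10, 100, 1000, 10000, 100000):
    print(n, relative_frequency(n))
```

With few repetitions the proportion can be far from 0.5, while with many repetitions it typically stays close to it, which is the sense in which the probability is "determined by continual repetition of the experiment".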
Subjective Interpretation of Probability
In the subjective interpretation, the probability of an outcome is not thought of as a property of the outcome. Rather, it is considered a statement about the beliefs of the person quoting the probability regarding the chance that the outcome occurs. Probability thus becomes a subjective or personal concept and has no meaning outside of expressing one’s degree of belief. Decision-makers often favor this interpretation of probability.
Interpretation of Probability
Irrespective of whether the frequency or the subjective interpretation is adopted, there is a practical consensus that the mathematics of probability is the same in either case. For example, if we say that the probability of rain tomorrow is 0.8, we mean that there is an 80% chance of rain tomorrow, so cloudy weather is expected. Correspondingly, there is a 20% chance that it will not rain tomorrow and that the weather will not be cloudy.
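The arithmetic in the rain example follows the complement rule: the probabilities of "rain" and "no rain" must add up to 1, whichever interpretation is adopted. A minimal sketch, using exact fractions to avoid floating-point rounding (the variable names are illustrative):

```python
from fractions import Fraction

p_rain = Fraction(8, 10)   # probability of rain tomorrow, 0.8 as in the example
p_no_rain = 1 - p_rain     # complement rule: P(no rain) = 1 - P(rain)

print(p_rain, p_no_rain, p_rain + p_no_rain)
```

Here `p_no_rain` comes out as 1/5, that is, the 20% chance mentioned in the text.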
Q&A
From this note, we can get answers to the following questions.
- What is Descriptive Statistics, and how is it different from Inferential Statistics?
- Why is it important for a Data Scientist to learn Statistics?
- What is Data Science?
References
- Essentials of Data Science With R Software – 1: Probability and Statistical Inference, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.
CITE THIS AS:
“Introduction to Probability and Statistical Inference for Data Science” From NotePub.io – Publish & Share Note! https://notepub.io/notes/mathematics/statistics/statistical-inference-for-data-science/introduction/