In this note series on Probability and Statistical Inference, we have already seen the importance of probability distributions and their associated probability functions for discrete random variables. In addition, we have learned to model natural random phenomena with these probability distributions: the Degenerate, Uniform, Bernoulli, Binomial, Poisson, and Geometric distributions.
This note covers probability distributions and their associated probability functions for continuous random variables. We have already covered the Continuous Uniform Distribution, and now we will explore the Normal Distribution, one of the most widely used probability distributions for modelling random phenomena. It has many applications in Data Science.
Normal Distribution
The normal distribution, also called the Gaussian distribution, is one of the essential distributions in statistics and the most widely used model for the distribution of a random variable.
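A random variable X is said to follow a normal distribution with parameters $\mu$ and $\sigma^2$ if its Probability Density Function (PDF) is given by:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, $-\infty < x < \infty$, with $-\infty < \mu < \infty$ and $\sigma^2 > 0$.
The mean and variance of X are:
$E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$
The value of E(X) determines the centre of the probability density function, and the value of Var(X) determines the width. The distribution is written as $X \sim N(\mu, \sigma^2)$.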
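As a minimal sketch, the density of $N(\mu, \sigma^2)$ can be evaluated directly from its formula and compared with a library value. The function name `normal_pdf` and the values $\mu = 20$, $\sigma = 2$ are illustrative assumptions; the example uses Python's standard-library `statistics.NormalDist`, and `scipy.stats.norm(loc, scale).pdf` should give the same numbers.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu, sigma = 20.0, 2.0  # illustrative parameter values (assumed)

manual = normal_pdf(21.0, mu, sigma)        # formula evaluated by hand
library = NormalDist(mu, sigma).pdf(21.0)   # same value from the standard library
peak = normal_pdf(mu, mu, sigma)            # density at the centre x = mu

print(manual, library, peak)
```

The density at the centre equals $1/(\sigma\sqrt{2\pi})$, so a larger $\sigma$ lowers the peak and widens the curve.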
Properties of Normal Distribution
- The density of the normal distribution has its maximum at $x = \mu$.
- The density curve is symmetric about $\mu$ and bell-shaped.
- The inflexion points of the density are at $x = \mu - \sigma$ and $x = \mu + \sigma$.
- A lower $\sigma$ indicates a higher concentration around the mean $\mu$: the variance is lower and the data cluster near the mean.
- A higher $\sigma$ indicates a flatter density: the variance is higher and the data points are spread further from the mean.
- The mean and the variance are separate parameters. Two normal distributions with the same mean can have different variances, and two with the same variance can have different means. The mean fixes the location of the density curve and the variance fixes its spread.
- Normal distribution probabilities follow the 68-95-99.7 rule. About 68% of the data lies within one standard deviation of the mean ($\mu \pm \sigma$), about 95% within two standard deviations ($\mu \pm 2\sigma$), and about 99.7% within three standard deviations ($\mu \pm 3\sigma$).
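The 68-95-99.7 rule can be checked numerically. The sketch below is one possible way to do it with `statistics.NormalDist` from the standard library; the helper name `prob_within` and the parameter values are assumptions made for illustration.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The result does not depend on mu or sigma; these values are illustrative.
coverage = {k: prob_within(k, mu=20.0, sigma=2.0) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

The three probabilities come out at roughly 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.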
Cumulative Distribution Function
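The cumulative distribution function of $X \sim N(\mu, \sigma^2)$ is:
$F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt$
which is often denoted as $\Phi(x)$ when $\mu = 0$ and $\sigma^2 = 1$.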
The integral has no closed-form expression in terms of elementary functions, so it has to be evaluated by numerical or computational methods. This is why CDF tables are presented in almost all statistical textbooks.
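To illustrate what "numerical methods" can mean here, the sketch below approximates the CDF integral with composite Simpson's rule and compares the result with the standard library. This is only one of many possible schemes; the function names, the cut-off at $\mu - 10\sigma$ and the number of subintervals are assumptions chosen for illustration, not a recommended implementation.

```python
import math
from statistics import NormalDist

def normal_pdf(t, mu, sigma):
    """Density of N(mu, sigma^2) at t."""
    return math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf_simpson(x, mu, sigma, n=2000):
    """Approximate F(x) with composite Simpson's rule on [mu - 10*sigma, x].

    n is the number of subintervals and must be even.
    """
    lower = mu - 10 * sigma  # assumed cut-off: the tail mass below it is negligible
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = normal_pdf(lower, mu, sigma) + normal_pdf(x, mu, sigma)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * normal_pdf(lower + i * h, mu, sigma)
    return total * h / 3

approx = normal_cdf_simpson(18.0, 20.0, 2.0)
reference = NormalDist(20.0, 2.0).cdf(18.0)
print(approx, reference)
```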
Standard Normal Distribution
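If $\mu = 0$ and $\sigma^2 = 1$, then X is said to follow a standard normal distribution, $X \sim N(0, 1)$. The PDF of a standard normal distribution is given by:
$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$, $-\infty < x < \infty$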
Important results:
If X is normally distributed with mean $\mu$ and variance $\sigma^2$, then for any constants a and b ≠ 0, the random variable Y = a + bX is also normally distributed, with parameters as follows:
$Y \sim N(a + b\mu, \; b^2\sigma^2)$
If X is normally distributed with mean $\mu$ and variance $\sigma^2$, let Z be another random variable defined as follows:
$Z = \frac{X - \mu}{\sigma}$
Then Z has a standard normal distribution N(0,1); this is called the Z transformation. This result lets us express probability statements about X in terms of probabilities for Z. In simple words, a normally distributed random variable becomes a standard normal random variable when we perform the Z transformation.
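Both results can be illustrated with `statistics.NormalDist`, which supports adding and multiplying a normal distribution by constants. The values of `a`, `b`, `mu` and `sigma` below are assumptions picked for the sketch.

```python
from statistics import NormalDist

mu, sigma = 20.0, 2.0          # illustrative parameters (assumed)
X = NormalDist(mu, sigma)
Z = NormalDist(0.0, 1.0)       # standard normal

# Linear transformation Y = a + b*X, expected to be N(a + b*mu, b^2 * sigma^2)
a, b = 5.0, -3.0
Y = a + b * X
print(Y.mean, Y.variance)      # compare with a + b*mu and b**2 * sigma**2

# Z transformation: P(X <= x) equals P(Z <= (x - mu) / sigma)
x = 18.0
z = (x - mu) / sigma
p_x = X.cdf(x)
p_z = Z.cdf(z)
print(z, p_x, p_z)
```

Because the two probabilities agree, a single table (or function) for the standard normal CDF is enough to answer questions about any $N(\mu, \sigma^2)$.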
Z Transformation
It is a process of standardization that allows for the comparison of scores from disparate distributions. Using a distribution’s mean and standard deviation, z transformations convert separate distributions into a standardized distribution, allowing for the comparison of dissimilar metrics.
- The standardized distribution is made up of z scores, hence the term z transformation.
- Z scores are a special type of standard score in which each unit represents one standard deviation from the mean.
- Z scores always have a distribution with a mean value of 0 and a standard deviation of 1.
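A small sketch of how z scores make differently scaled measurements comparable. The two score lists are hypothetical data invented for this illustration, and the helper `z_scores` standardizes each list with its own mean and population standard deviation.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Standardize values using their own mean and population standard deviation."""
    m = fmean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical scores on two differently scaled tests (illustrative data only)
test_a = [52, 61, 70, 79, 88]        # marks out of 100
test_b = [410, 480, 550, 620, 690]   # scaled scores

za = z_scores(test_a)
zb = z_scores(test_b)

print([round(z, 3) for z in za])
print([round(z, 3) for z in zb])
print(fmean(za), pstdev(za))  # close to 0 and 1 after standardization
```

In this made-up data a mark of 79 on the first test and a score of 620 on the second sit the same number of standard deviations above their respective means, which is exactly what the z score expresses.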
Important results:
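If Z follows a standard normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$, then the Cumulative Distribution Function (CDF) can be computed as follows:
$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(t)\,dt$
After the Z transformation, finding the CDF becomes easy because the $\mu$ (mean) and $\sigma^2$ (variance) values are fixed, as the equations below illustrate.
- For $X \sim N(\mu, \sigma^2)$, $P(X \le x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \int_{-\infty}^{(x-\mu)/\sigma} \phi(t)\,dt$, where $\phi$ is the pdf of $N(0, 1)$.
- $\Phi(\infty) = \int_{-\infty}^{\infty} \phi(t)\,dt = 1$, that is, the total area under the whole curve is 1.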
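In software, the standard normal CDF is commonly evaluated through the error function, using the identity $\Phi(z) = \frac{1}{2}\left[1 + \mathrm{erf}\left(z/\sqrt{2}\right)\right]$. The sketch below assumes this identity and uses `math.erf` from the standard library; the function names are chosen here for illustration.

```python
import math

def std_normal_cdf(z):
    """Phi(z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), obtained through the Z transformation."""
    return std_normal_cdf((x - mu) / sigma)

print(std_normal_cdf(0.0))           # half of the area lies below the mean
print(std_normal_cdf(-1.0))          # area to the left of z = -1
print(normal_cdf(18.0, 20.0, 2.0))   # same area, expressed for N(20, 4)
print(std_normal_cdf(10.0))          # effectively the total area
```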
Normal Distribution – Examples
Example 1: An apple farmer sells apples in boxes. The weights of the boxes vary and are assumed to be normally distributed with $\mu = 20$ kg and $\sigma^2 = 4$ kg$^2$ (that is, $\sigma = 2$ kg). The farmer wants to avoid dissatisfied customers who receive boxes that are too light.
Therefore the farmer wants to know the probability that a box with a weight of less than 18 kg is sold.
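Solution: This can be obtained by computing $P(X < 18) = \Phi\left(\frac{18 - 20}{2}\right) = \Phi(-1) \approx 0.1587$:
from scipy.stats import norm
print(norm(loc=20, scale=2).cdf(18))  # P(X < 18) for mean 20 and standard deviation 2
So about 15.9% of the boxes weigh less than 18 kg.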
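As a rough cross-check of this example, the probability can also be estimated by simulation. The sketch below draws box weights with the standard library's `random.gauss`, assuming a mean of 20 kg and a standard deviation of 2 kg; the seed and the number of simulated boxes are arbitrary choices.

```python
import random
from statistics import NormalDist

rng = random.Random(0)          # fixed seed so the run is repeatable
mu, sigma, threshold = 20.0, 2.0, 18.0
n_boxes = 100_000               # number of simulated boxes (arbitrary)

# Count the simulated boxes that weigh less than the threshold
light = sum(1 for _ in range(n_boxes) if rng.gauss(mu, sigma) < threshold)
simulated = light / n_boxes

exact = NormalDist(mu, sigma).cdf(threshold)
print(f"simulated: {simulated:.4f}, exact: {exact:.4f}")
```

The simulated fraction should land close to the exact value, with the gap shrinking as the number of simulated boxes grows.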
Sample Random Variables from Normal Distribution
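Consider a random sample $X_1, X_2, \ldots, X_n$ of independent and identically distributed random variables, each with a $N(\mu, \sigma^2)$ distribution. Then,
Arithmetic mean:
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Expectation:
$E(\bar{X}) = \mu$
Variance:
$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
where $\sigma^2 = \mathrm{Var}(X_i)$ is the common variance of the $X_i$. Consequently, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.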
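The behaviour of the sample mean can be explored with a short simulation. This is a sketch under assumed values ($\mu = 20$, $\sigma = 2$, samples of size 25, and 20,000 repetitions): it draws many samples, computes each sample's arithmetic mean, and compares the mean and variance of those means with $\mu$ and $\sigma^2 / n$.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(1)     # fixed seed so the run is repeatable
mu, sigma = 20.0, 2.0      # illustrative population parameters (assumed)
n = 25                     # size of each sample
reps = 20_000              # number of repeated samples

# One arithmetic mean per simulated sample of size n
sample_means = [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mean_of_means = fmean(sample_means)     # should be close to mu
var_of_means = pvariance(sample_means)  # should be close to sigma**2 / n

print(mean_of_means, var_of_means, sigma ** 2 / n)
```

Increasing `n` shrinks the variance of the sample mean in proportion to $1/n$, which is why larger samples give more stable estimates of $\mu$.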
References
- Essentials of Data Science With R Software – 1: Probability and Statistical Inference, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.
CITE THIS AS:
“Probability and Statistical Inference – Introduction to Normal Distribution” From NotePub.io – Publish & Share Note! https://notepub.io/notes/mathematics/statistics/statistical-inference-for-data-science/normal-distribution/