Essentials of Data Science – Probability and Statistical Inference – Bayes’ Theorem

In the previous note on probability and statistical inference, we saw the axioms of probability, counting principles, conditional probability, and the multiplication theorem of probability. In this note, we will learn and understand Bayes' theorem, which is the foundation of Bayesian analysis.

Whenever we compute a probability, there are two possibilities: an event will happen, or it has already happened. The basic intuition is that, from past experience of an event, we may predict something about a future event. This is a natural phenomenon, and we make these kinds of predictions in our day-to-day life.

For example, suppose we want to go from home to the mall for shopping; directly or indirectly, we usually estimate the travel time based on our past experience. Information used in this way to predict a future event is called prior information.

Law of Total Probability

Let us assume that we have A_1, A_2, A_3, A_4, \cdots , A_m events associated with a sample space \Omega, such that:

  • A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_m = \Omega,
  • A_i \cap A_j = \emptyset (pairwise disjoint) for all i \neq j, where i, j = 1, 2, \cdots, m,
  • P(A_i) > 0 for all i,

then the probability of an event B which is also associated to \Omega, can be calculated as:

P(B) = \sum_{i=1}^m P(B|A_i)P(A_i)

In other words, the basic idea behind the law of total probability is that if you have a family of disjoint events that cover the entire sample space, then for any event B we have:

P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3) + \cdots + P(B|A_m)P(A_m)

The basic intuition is that we break the main experiment into a sequence of sub-experiments, compute the probability of each sub-experiment, and then put them together using the law of total probability to get the answer. Such multi-stage experiments can be conveniently described using a probability tree.
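As a minimal numerical sketch of the law of total probability (the three events and all numbers below are illustrative, not from the note):

```python
# Law of total probability: P(B) = sum_i P(B|A_i) * P(A_i),
# for disjoint events A_1, ..., A_m that cover the sample space.
priors = [0.5, 0.3, 0.2]       # P(A_i); must sum to 1
likelihoods = [0.9, 0.5, 0.1]  # P(B|A_i)

# Weight each conditional probability by its prior and sum.
p_b = sum(l * p for l, p in zip(likelihoods, priors))
print(p_b)  # 0.9*0.5 + 0.5*0.3 + 0.1*0.2 = 0.62
```

Each term of the sum corresponds to one branch of the probability tree: first which A_i occurred, then whether B occurred given A_i.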

Bayes’ Theorem

Bayes' Theorem gives a connection between P(A|B) and P(B|A). For two events A and B with P(A) > 0 and P(B) > 0, we get

P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{P(A)} \cdot \frac{P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}

For m events A_1, A_2, A_3, \cdots, A_m such that

  • A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_m = \Omega,
  • A_i \cap A_j = \emptyset (pairwise disjoint) for all i \neq j, where i, j = 1, 2, \cdots, m,
  • P(A_i) > 0 for all i,
  • B is another event with P(B) > 0,

 P(A_j|B) = \frac{P(B|A_j)P(A_j)}{\sum_{i=1}^m P(B|A_i)P(A_i)}

is known as Bayes' formula, where:

  • P(A_i): Prior probabilities
  • P(A_j|B): Posterior probabilities
  • P(B|A_i): Model probabilities (or Likelihood)
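Bayes' formula can be sketched as a small helper function (the function name and the numbers in the usage line are illustrative, not from the note):

```python
def posterior(priors, likelihoods, j):
    """Return P(A_j | B) via Bayes' formula.

    priors[i]      = P(A_i), for disjoint events covering the sample space
    likelihoods[i] = P(B | A_i)
    The denominator P(B) comes from the law of total probability.
    """
    p_b = sum(l * p for l, p in zip(likelihoods, priors))
    return likelihoods[j] * priors[j] / p_b

# Illustrative numbers: two equally likely causes with different likelihoods.
print(posterior([0.5, 0.5], [0.8, 0.4], 0))  # 0.8*0.5 / (0.8*0.5 + 0.4*0.5) = 2/3
```

Note how the numerator is just one term of the total-probability sum in the denominator, so the posteriors over all A_j automatically sum to 1.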

Example 1:

Consider an example to understand the importance of prior probabilities in Bayes' theorem. Suppose a blood test for detecting the presence or absence of a rare disease is developed with the probabilities given below.

Solution:

Let us define two events, A and D: A is the event that the outcome of the test is positive, and D is the event that the person has the disease.

Suppose P(the test is positive given the person has the disease) = P(A|D) = 0.999, and P(the test is negative given the person does not have the disease) = P(\bar{A}|\bar{D}) = 0.999. At first sight, this seems to be a very good test. However, what a positive result actually tells us depends on the prior probability P(D): when the disease is rare, the posterior probability P(D|A) can be surprisingly small.
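How good the test really is depends on the prior probability of disease. Below is a minimal numerical sketch, assuming a hypothetical prevalence P(D) = 0.001 (this figure is an assumption, not given in the note):

```python
# Hypothetical prior: the disease is rare, P(D) = 0.001 (assumed, not from the note).
p_d = 0.001
p_pos_given_d = 0.999          # P(A|D): test is positive given disease
p_neg_given_not_d = 0.999      # P(A-bar|D-bar): test is negative given no disease
p_pos_given_not_d = 1 - p_neg_given_not_d  # false-positive rate

# Law of total probability: P(A) over the two cases (disease / no disease).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(D|A) = P(A|D) P(D) / P(A).
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.5
```

Under this assumed prior, a positive result means only a 50% chance of disease: the false positives among the large healthy population are as numerous as the true positives, which is exactly why the prior matters.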

Example 2:

Suppose someone rents books from two different libraries. It has been observed that sometimes a book is defective due to missing pages. Our ultimate goal is to find the probability that a book rented from a library is not defective.

Let us consider the following events:

  • A_i, for i = 1, 2, is the event that the book is issued from library i (in this problem there are only two libraries).
  • B is the event that the book is not defective.

Given prior probabilities:

  • P(A_1) = 0.6
  • P(A_2) = 0.4

Given model probabilities or likelihoods:

  • P(B|A_1) = 0.95
  • P(B|A_2) = 0.75

According to the law of total probability: P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) = 0.95×0.6 + 0.75×0.4 = 0.87.

Suppose we want to find the posterior probability: given that we received a non-defective book, what is the probability that it was rented from library 1? This is obtained as follows:

P(A_1|B) = \frac{P(A_1 \cap B)}{P(B)} = \frac{P(B|A_1) \times P(A_1)}{P(B)} = \frac{0.95 \times 0.6}{0.87} = \frac{0.57}{0.87} \approx 0.6552
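The calculation above can be checked numerically with the note's own figures (the dictionary keys are just labels for the two libraries):

```python
priors = {"A1": 0.6, "A2": 0.4}            # P(A_i): which library the book came from
likelihoods = {"A1": 0.95, "A2": 0.75}     # P(B|A_i): book is non-defective

# Law of total probability: P(B).
p_b = sum(likelihoods[a] * priors[a] for a in priors)

# Bayes' theorem: P(A_1|B) = P(B|A_1) P(A_1) / P(B).
p_a1_given_b = likelihoods["A1"] * priors["A1"] / p_b
print(round(p_b, 2), round(p_a1_given_b, 4))  # 0.87 0.6552
```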


References

  1. Essentials of Data Science With R Software – 1: Probability and Statistical Inference, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.

