Essentials of Data Science – Probability and Statistical Inference – Bayes’ Theorem

In the previous note on probability and statistical inference, we saw the axioms of probability, counting principles, conditional probability, and the multiplication theorem of probability. In this note, we will learn and understand Bayes' theorem, which is the foundation of Bayesian analysis.

Whenever we compute a probability, there are two possibilities: an event will happen, or it has already happened. The basic intuition is that, from past experience of an event, we may predict something about a future event. This is a natural phenomenon, and we make these kinds of predictions in our day-to-day life.

For example, suppose we want to go from home to the mall for shopping; directly or indirectly, we usually estimate the travel time based on our past experience. Information used in this way to predict a future event is called prior information.

Law of Total Probability

Let us assume that we have A_1, A_2, A_3, A_4, \cdots , A_m events associated with a sample space \Omega, such that:

  • A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_m = \Omega,
  • A_i \cap A_j = \emptyset (pairwise disjoint) for all i \neq j, where i, j = 1, 2, \cdots, m,
  • P(A_i) > 0 for all i,

then the probability of an event B which is also associated to \Omega, can be calculated as:

P(B) = \sum_{i=1}^m P(B|A_i)P(A_i)

In other words, the basic idea behind the law of total probability is that if you have a family of disjoint events that cover the entire sample space, then for any event B we have:

P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3) + \cdots + P(B|A_m)P(A_m)

The basic intuition is that we break the main experiment into a sequence of sub-experiments, compute the probability of each sub-experiment, and then put them together using the law of total probability to get the answer. Such multi-stage experiments can be conveniently described using a probability tree.
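As a minimal numerical sketch of the law of total probability (the three events and all numbers below are illustrative, not from the note):

```python
# Law of total probability: P(B) = sum_i P(B|A_i) * P(A_i),
# for disjoint events A_1, ..., A_m that cover the sample space.
priors = [0.5, 0.3, 0.2]       # P(A_i); must sum to 1
likelihoods = [0.9, 0.5, 0.1]  # P(B|A_i)

# Weight each conditional probability by its prior and sum.
p_b = sum(l * p for l, p in zip(likelihoods, priors))
print(p_b)  # 0.9*0.5 + 0.5*0.3 + 0.1*0.2 = 0.62
```

Each term of the sum corresponds to one branch of the probability tree: first which A_i occurred, then whether B occurred given A_i.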

Bayes’ Theorem

Bayes' Theorem gives a connection between P(A|B) and P(B|A). For two events A and B with P(A) > 0 and P(B) > 0, we get

P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{P(A)} \cdot \frac{P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}

For m events A_1, A_2, A_3, \cdots, A_m such that

  • A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_m = \Omega,
  • A_i \cap A_j = \emptyset (pairwise disjoint) for all i \neq j, where i, j = 1, 2, \cdots, m,
  • P(A_i) > 0 for all i,
  • B is another event with P(B) > 0,

 P(A_j|B) = \frac{P(B|A_j)P(A_j)}{\sum_{i=1}^m P(B|A_i)P(A_i)}

is known as Bayes' formula, where:

  • P(A_i): Prior probabilities
  • P(A_j|B): Posterior probabilities
  • P(B|A_i): Model probabilities (or Likelihood)
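Bayes' formula can be sketched as a small helper function (the function name and the numbers in the usage line are illustrative, not from the note):

```python
def posterior(priors, likelihoods, j):
    """Return P(A_j | B) via Bayes' formula.

    priors[i]      = P(A_i), for disjoint events covering the sample space
    likelihoods[i] = P(B | A_i)
    The denominator P(B) comes from the law of total probability.
    """
    p_b = sum(l * p for l, p in zip(likelihoods, priors))
    return likelihoods[j] * priors[j] / p_b

# Illustrative numbers: two equally likely causes with different likelihoods.
print(posterior([0.5, 0.5], [0.8, 0.4], 0))  # 0.8*0.5 / (0.8*0.5 + 0.4*0.5) = 2/3
```

Note how the numerator is just one term of the total-probability sum in the denominator, so the posteriors over all A_j automatically sum to 1.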

Example 1:

Consider an example to understand the importance of prior probabilities in Bayes' theorem. Suppose a blood test for detecting the presence or absence of a rare disease is developed with the probabilities given below.

Solution:

Let us define two events, A and D: A is the event that the outcome of the test is positive, and D is the event that the person has the disease.

Suppose P(the test is positive given the person has the disease) = P(A|D) = 0.999, and P(the test is negative given the person does not have the disease) = P(\bar{A}|\bar{D}) = 0.999. At first sight, this seems to be a very good test. However, what a positive result actually tells us depends on the prior probability P(D): when the disease is rare, the posterior probability P(D|A) can be surprisingly small.
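How good the test really is depends on the prior probability of disease. Below is a minimal numerical sketch, assuming a hypothetical prevalence P(D) = 0.001 (this figure is an assumption, not given in the note):

```python
# Hypothetical prior: the disease is rare, P(D) = 0.001 (assumed, not from the note).
p_d = 0.001
p_pos_given_d = 0.999          # P(A|D): test is positive given disease
p_neg_given_not_d = 0.999      # P(A-bar|D-bar): test is negative given no disease
p_pos_given_not_d = 1 - p_neg_given_not_d  # false-positive rate

# Law of total probability: P(A) over the two cases (disease / no disease).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(D|A) = P(A|D) P(D) / P(A).
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.5
```

Under this assumed prior, a positive result means only a 50% chance of disease: the false positives among the large healthy population are as numerous as the true positives, which is exactly why the prior matters.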

Example 2:

Suppose someone rents books from two different libraries. It has been observed that sometimes a book is defective due to missing pages. Our ultimate goal is to find the probability that a book rented from a library is not defective.

Let us consider the following events:

  • A_i, for i = 1, 2, is the event that the book is issued from library i (in this problem there are only two libraries).
  • B is the event that the book is not defective.

Given prior probabilities:

  • P(A_1) = 0.6
  • P(A_2) = 0.4

Given model probabilities or likelihoods:

  • P(B|A_1) = 0.95
  • P(B|A_2) = 0.75

According to the law of total probability: P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) = 0.95×0.6 + 0.75×0.4 = 0.87.

Suppose we want to find the posterior probability: given that we received a non-defective book, what is the probability that it was rented from library 1? This is obtained as follows:

P(A_1|B) = \frac{P(A_1 \cap B)}{P(B)} = \frac{P(B|A_1) \times P(A_1)}{P(B)} = \frac{0.95 \times 0.6}{0.87} = \frac{0.57}{0.87} \approx 0.6552
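The calculation above can be checked numerically with the note's own figures (the dictionary keys are just labels for the two libraries):

```python
priors = {"A1": 0.6, "A2": 0.4}            # P(A_i): which library the book came from
likelihoods = {"A1": 0.95, "A2": 0.75}     # P(B|A_i): book is non-defective

# Law of total probability: P(B).
p_b = sum(likelihoods[a] * priors[a] for a in priors)

# Bayes' theorem: P(A_1|B) = P(B|A_1) P(A_1) / P(B).
p_a1_given_b = likelihoods["A1"] * priors["A1"] / p_b
print(round(p_b, 2), round(p_a1_given_b, 4))  # 0.87 0.6552
```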


References

  1. Essentials of Data Science With R Software – 1: Probability and Statistical Inference, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.

