Essentials of Data Science – Probability and Statistical Inference – Geometric Distribution

In this note series on Probability and Statistical Inference, we have seen the importance of probability distributions such as the Bernoulli, Binomial, and Poisson distributions, and how these distributions model real random phenomena.

This note covers the basic intuition behind the Geometric distribution, along with the expectation, variance, and other quantitative measures that characterize a geometric random variable. It also walks through several phenomena that follow a Geometric distribution and uses the probability function of the distribution to compute these measures for them.

Condition of a Geometric Distribution

Consider a random experiment closely related to the one used in the definition of a binomial distribution. Again, assume a series of Bernoulli trials (independent trials with a constant probability p of success on each trial). However, instead of running a fixed number of trials, the trials are conducted until the first success is obtained.
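The experiment described above can be sketched as a small simulation. This is only an illustrative sketch: the function name trials_until_first_success, the success probability 0.1, the seed, and the sample size are assumptions chosen for the example, not values from the note.

```python
import random

def trials_until_first_success(p, rng):
    """Run Bernoulli(p) trials and return the number of the trial
    on which the first success occurs (1, 2, 3, ...)."""
    count = 1
    while rng.random() >= p:  # a failure happens with probability 1 - p
        count += 1
    return count

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [trials_until_first_success(0.1, rng) for _ in range(10_000)]

# Smallest observed count and the average number of trials needed
print(min(samples), sum(samples) / len(samples))
```

Each value in samples is one realization of the "number of trials until the first success", so a histogram of samples gives an empirical picture of the distribution discussed in the rest of this note.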

Example 1: Suppose we buy lottery tickets for a draw, and every ticket results in either a win or a loss. We want to know how many tickets we need to buy until we win for the first time.

It is similar to determining how many independent Bernoulli trials are needed until the event of interest occurs for the first time. The outcomes of the Bernoulli trials are win or lose. 

Example 2: Suppose different drugs are being tried in a clinical trial to treat a disease. We want to know how many drugs must be tried until one of them treats the disease successfully.

Geometric Distribution

The geometric distribution can be used to determine the probability that the event of interest happens at the k^{th} trial for the first time.

A discrete random variable X is said to follow a geometric distribution with parameter p if its probability mass function (PMF) is given by:

P(X = k) = p(1-p)^{k-1}, \quad k = 1, 2, 3, \cdots

The mean and variance of a Geometric random variable are given by:

  • E(X) = \frac{1}{p}
  • Var(X) = \frac{1}{p} \left ( \frac{1}{p} - 1 \right )

Example 3: Suppose the probability that a machine produces a faulty transistor is 0.1, and assume that transistors are produced independently of one another. Let the random variable X denote the number of transistors produced up to and including the first faulty one.

Then, P(X = 5) is the probability that the first four transistors are produced correctly, and the fifth transistor has an error. This event can be denoted as {GGGGD}, where G denotes a good transistor and D denotes a defective transistor.

Since the transistors are produced independently, the probability of a defective transistor is 0.1, and the probability of a good transistor is 0.9, we get:

 \begin{aligned}P(X = 5) &= P(GGGGD) \\ &= P(G) P(G) P(G) P(G) P(D) \\ &= 0.9^4 \times 0.1 \\ &\approx 0.066 \end{aligned}
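The same calculation can be reproduced with a few lines of Python. This is a minimal sketch of the PMF given earlier; the helper name geometric_pmf is an assumption made for this example.

```python
def geometric_pmf(k, p):
    """P(X = k) = p * (1 - p)**(k - 1) for k = 1, 2, 3, ..."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return p * (1 - p) ** (k - 1)

p_defect = 0.1                     # probability of a faulty transistor
prob = geometric_pmf(5, p_defect)  # four good transistors, then a faulty one
print(round(prob, 5))              # 0.06561
```

Rounded to three decimal places, this matches the value 0.066 obtained by hand.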

Example 5:  Suppose a coin is tossed until the Head is obtained for the first time. The probability of getting a Head is p = 0.5 for each toss.

  • P(X= 1) = 0.5
  • P(X = 2) = 0.5 \times (1 - 0.5) = 0.25
  • P(X = 3) = 0.5 \times (1 - 0.5)(1 - 0.5) = 0.125

The mean and variance are:

  • E(X) = \frac{1}{0.5} = 2
  • Var(X) = \frac{1}{0.5} \left( \frac{1}{0.5} - 1 \right) = 2
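These two values can be checked numerically by summing k P(X = k) and k^2 P(X = k) directly over the PMF. The sketch below truncates the infinite sums at an arbitrary cut-off; the function name geometric_moments and the cut-off max_k are assumptions made for illustration.

```python
def geometric_moments(p, max_k=10_000):
    """Approximate E(X) and Var(X) by summing over k = 1..max_k."""
    mean = 0.0
    second_moment = 0.0
    for k in range(1, max_k + 1):
        prob = p * (1 - p) ** (k - 1)  # P(X = k)
        mean += k * prob
        second_moment += k * k * prob
    # Var(X) = E(X^2) - E(X)^2
    return mean, second_moment - mean ** 2

mean, var = geometric_moments(0.5)
print(round(mean, 6), round(var, 6))
```

For p = 0.5 the terms shrink so quickly that the truncated sums agree with the values above to within floating-point rounding.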

Example 6: An urn contains N white and M black balls. Balls are selected randomly, one at a time, until a black one is obtained. Each selected ball is replaced before the next one is drawn. We want to find the probability that exactly n draws are needed, and the probability that at least k draws are needed.

Probability that exactly n draws are needed:

Let X be the random variable denoting the number of draws needed to select a black ball.

Here p = \frac{M}{M + N} and (1-p) = \frac{N}{M + N}, so the probability that exactly n draws are needed is:

 \begin{aligned}P(X = n) &= \left ( \frac{N}{M+N} \right) ^ {n - 1} \left ( \frac{M}{M+N} \right) \\ &= \frac{MN^{n-1}}{\left ( M + N \right)^n} \end{aligned}

Probability that at least k draws are needed:

As before, X denotes the number of draws needed to select a black ball.

Here p = \frac{M}{M + N} and (1-p) = \frac{N}{M + N}. At least k draws are needed exactly when the first k - 1 draws are all white, so:

P(X \geq k) = (1 - p)^{k-1} = \left( \frac{N}{M+N} \right) ^ {k -1}
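Both urn probabilities can be evaluated exactly with rational arithmetic. The sketch below is illustrative only: the function name urn_probabilities and the ball counts (3 white, 2 black) are assumptions chosen for the example, not values from the note.

```python
from fractions import Fraction

def urn_probabilities(n_white, m_black, k):
    """Return (P(X = k), P(X >= k)) when drawing with replacement
    until the first black ball appears."""
    p = Fraction(m_black, m_black + n_white)  # black on a single draw
    q = 1 - p                                 # white on a single draw
    exactly_k = q ** (k - 1) * p              # k - 1 whites, then a black
    at_least_k = q ** (k - 1)                 # first k - 1 draws all white
    return exactly_k, at_least_k

exactly, at_least = urn_probabilities(n_white=3, m_black=2, k=3)
print(exactly, at_least)  # 18/125 9/25
```

Using Fraction keeps the results in the same form as the closed-form expressions, so they can be compared with M N^{n-1} / (M + N)^n directly.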

Properties of Geometric Distribution

Lack of memory property: A geometric random variable is defined as the number of trials until the first success. However, because the trials are independent, the count of the number of trials until the next success can be started at any trial without changing the probability distribution of the random variable.

In other words, the memoryless property means that the distribution of the remaining number of trials is independent of the history. The probability of the event happening in the future is not related to whether it has happened in the past. The history of the process is irrelevant to its future.

For example, in the transistor production example, suppose 100 transistors are produced without any defect and the 105th transistor is defective, so the first error occurs on the 105th transistor.

The probability of the next five outcomes after the 100th transistor, i.e., of the event denoted as GGGGD, is:

 \begin{aligned}P(X = 5) &= P(GGGGD) \\ &= P(G) P(G) P(G) P(G) P(D) \\ &= 0.9^4 \times 0.1 \\ &\approx 0.066 \end{aligned}

where p = 0.1 and (1-p) = 0.9

This probability is identical to the probability that the initial error occurs on the 5th transistor. Thus, the implication of using a geometric model is that the system presumably will not wear out and the probability of an error remains constant for all transistors. In this sense, the geometric distribution is said to lack any memory.
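The memoryless property can also be checked numerically. For a geometric random variable, P(X > n) = (1 - p)^n, because X exceeds n exactly when the first n trials are all failures. The sketch below compares the conditional probability P(X > m + n | X > m) with P(X > n) for the transistor example; the helper name survival is an assumption made for this illustration.

```python
def survival(n, p):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

p = 0.1       # probability of a faulty transistor
m, n = 100, 5 # 100 good transistors already seen, look 5 ahead

# P(X > m + n | X > m) = P(X > m + n) / P(X > m)
conditional = survival(m + n, p) / survival(m, p)

# The conditional probability should equal P(X > n) up to rounding
print(abs(conditional - survival(n, p)) < 1e-12)  # True
```

The check does not depend on the particular value of m, which is the sense in which the first 100 defect-free transistors carry no information about the next five.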

References

  1. Essentials of Data Science With R Software – 1: Probability and Statistical Inference, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.

CITE THIS AS:

“Probability and Statistical Inference – Introduction to Geometric Probability Distribution”  From NotePub.io – Publish & Share Note! https://notepub.io/notes/mathematics/statistics/statistical-inference-for-data-science/geometric-distribution/
