Descriptive Statistics – Measures of association – Pearson's Chi-Squared Statistics

In the earlier note on contingency tables, we saw how to create a contingency table and interpret the relationship between two variables in terms of marginal and conditional frequency distributions, which can be obtained using either absolute or relative frequencies.

In this note, we will learn a few statistical tools that use a contingency table to show the association between two categorical (count) variables: Pearson’s Chi-Squared statistic, Cramer’s V statistic, and the Contingency Coefficient. These quantify the degree of association between variables, much as the correlation coefficient does for continuous variables and the rank correlation coefficient does for ordinal or rank data.

Table of Contents

1 Pearson’s Chi-Squared Statistics

1.1 Interpretation of Pearson’s Chi-Squared Statistics

2 Cramer’s V Statistics

2.1 Interpretation of Cramer’s V Statistics

3 Contingency Coefficient

3.1 Interpretation of Contingency Coefficient Statistics

4 References


Pearson’s Chi-Squared Statistics

It is used to measure the association between two variables in a contingency table. The $\tilde{\chi}^2$ statistic for a $k \times l$ contingency table is given as follows.

$\tilde{\chi}^2$ = $\sum_{i=1}^k \sum_{j=1}^l \left [ \dfrac{\left [ n_{ij} - \frac{n_{i+} n_{+j}}{n} \right ]^2}{\frac {n_{i+} n_{+j}}{n}} \right ]$ ; $0 \leq \tilde{\chi}^2 \leq n [ min(k,l) - 1]$

Marginal Frequencies

$n_{i+} = \sum_{j=1}^l n_{ij}$ is the marginal frequency of the i-th category of X: j runs from 1 to l while i is held constant.
$n_{+j} = \sum_{i=1}^k n_{ij}$ is the marginal frequency of the j-th category of Y: i runs from 1 to k while j is held constant.

Total Frequency

$n = \sum_{i=1}^k n_{i+} = \sum_{j=1}^l n_{+j} = \sum_{i=1}^k \sum_{j=1}^l n_{ij}$

Absolute Frequencies

$n_{ij}$ is the absolute frequency of the cell in which X takes its i-th value and Y takes its j-th value. Together, these cell frequencies form the joint frequency distribution of X and Y, which tells how the values of the two variables behave jointly.
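The definitions above can be turned into a short computation. The following is a minimal sketch in plain Python, not a reference implementation: the function name `chi_squared` and the example counts are made up for illustration, and the code assumes every row and column total is positive (otherwise an expected frequency would be zero).

```python
def chi_squared(table):
    """Pearson's chi-squared statistic for a k x l table of absolute frequencies."""
    row_totals = [sum(row) for row in table]         # n_{i+}
    col_totals = [sum(col) for col in zip(*table)]   # n_{+j}
    n = sum(row_totals)                              # total frequency n
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # n_{i+} n_{+j} / n
            stat += (n_ij - expected) ** 2 / expected
    return stat


# Hypothetical 2 x 3 table, invented purely for illustration.
example = [[10, 20, 30],
           [20, 20, 20]]
print(round(chi_squared(example), 3))  # 5.333
```

Here the expected frequencies are 15, 20 and 25 in each row, so only the first and last columns contribute to the sum.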

Interpretation of Pearson’s Chi-Squared Statistics

A value of $\tilde{\chi}^2$ close to zero implies a weak association between the two variables.
A value of $\tilde{\chi}^2$ close to $n \times [min(k,l) - 1]$ implies a strong association between the two variables. Here $min(k,l)$ is the smaller of the number of rows and the number of columns of the contingency table.
Values in between indicate a low, moderate, or high degree of association, depending on where they fall in this range.

The $\tilde{\chi}^2$ statistic is symmetric in the sense that its value does not depend on which variable is defined as X and which as Y.
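The two ends of the range and the symmetry property can be checked numerically. The sketch below uses invented tables (they are not taken from the examples in this note): one with proportional rows, one where each row determines the column, and one non-square table that is transposed to swap the roles of X and Y.

```python
def chi_squared(table):
    """Pearson's chi-squared statistic; assumes all marginal totals are positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (n_ij - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, n_ij in enumerate(row)
    )


# All three tables are hypothetical.
independent = [[20, 30], [20, 30]]   # rows are proportional: no association
perfect = [[30, 0], [0, 70]]         # knowing the row fixes the column
upper_bound = 100 * (min(2, 2) - 1)  # n * (min(k, l) - 1) with n = 100

print(chi_squared(independent))            # lower end of the range
print(chi_squared(perfect), upper_bound)   # upper end of the range

# Symmetry: swapping X and Y (transposing the table) leaves the value unchanged.
skewed = [[5, 15, 10], [20, 10, 40]]
transposed = [list(col) for col in zip(*skewed)]
print(chi_squared(skewed), chi_squared(transposed))
```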

Example 1: A sample of 100 students was chosen and divided into two groups, weak and strong, according to academic performance. Some of the students were given tuition. We would like to see whether tuition was helpful in improving the academic performance of the students. The data have been compiled in the following contingency table.

Contingency Table – Example 1

$\tilde{\chi}^2$ = $\left [ \frac{100 \times (40 \times 30 - 20 \times 10)^2}{50 \times 50 \times 40 \times 60}\right ]$ = 16.67
$n \times (min(k,l) - 1)$ = $100 \times (min(2,2) - 1)$ = 100

Pearson’s Chi-squared statistic is 16.67, which is neither close to zero nor close to 100, so we interpret the association between the two variables as moderate. This interpretation is quite subjective, because there is no fixed cut-off that separates weak, moderate, and strong association.
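The arithmetic for Example 1 uses the usual shortcut for a 2 x 2 table with cells a, b (first row) and c, d (second row): $n (ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$. The sketch below reproduces it. The cell values 40, 20, 10, 30 are inferred from the numbers in the formula above, since the table itself is not reproduced here; which cell belongs to which group of students is not assumed, and swapping rows with columns gives the same result.

```python
def chi_squared_2x2(a, b, c, d):
    """Shortcut form of Pearson's chi-squared statistic for a 2 x 2 table
    [[a, b], [c, d]]; assumes all four marginal totals are positive."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cell counts inferred from the worked formula for Example 1 (an assumption).
chi2_example1 = chi_squared_2x2(40, 20, 10, 30)
print(round(chi2_example1, 2))
```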

Example 2: The following data were collected from 20 persons on their age category and their response to the taste of a soft drink. The drink was served to children, young persons, and elderly persons, and each person rated its taste as good or bad.

We have constructed the following $2 \times 3$ contingency table from the data; the marginal frequencies appear as the row and column totals.

Contingency Table – Example 2

$\tilde{\chi}^2$ = 0.278
$n \times (min(k,l) - 1)$ = $20 \times (min(2,3) - 1)$ = 20

Pearson’s Chi-squared statistic is 0.278, which is close to zero, so the association is weak. The limitation of Pearson’s Chi-squared statistic is that its range depends on the sample size and on the size of the contingency table, so the upper bound changes from one situation to the next and values from different tables cannot be compared directly.

Cramer therefore rescaled the statistic and proposed Cramer’s V statistic for a $k \times l$ contingency table.
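The dependence on sample size is easy to see numerically: multiplying every cell by the same factor keeps the pattern of association unchanged but multiplies the statistic by that factor. A minimal sketch with invented counts:

```python
def chi_squared(table):
    """Pearson's chi-squared statistic; assumes all marginal totals are positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (n_ij - expected) ** 2 / expected
    return stat


base = [[12, 8], [9, 11]]                          # hypothetical counts, n = 40
scaled = [[10 * x for x in row] for row in base]   # same proportions, n = 400

chi2_base = chi_squared(base)
chi2_scaled = chi_squared(scaled)
print(chi2_base, chi2_scaled)
print(chi2_scaled / chi2_base)   # the ratio equals the scaling factor
```

Both tables describe the same relative frequencies, yet their statistics differ by a factor of ten, which is why the raw value is hard to compare across samples.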

Cramer’s V Statistics

$V = \sqrt{\frac{\tilde{\chi}^2}{n \times (min(k,l) - 1)}}$ ; $0 \leq V \leq 1$

The advantage of the $V$ statistic is that it is simpler to interpret, as its values always lie between 0 and 1.

Interpretation of Cramer’s V Statistics

A value of $V$ close to zero implies a weak association between the two variables.
A value of $V$ close to 1 implies a strong association between the two variables.
Values in between indicate a moderate association between the variables.

For Example 1, $\tilde{\chi}^2$ = 16.67, so $V = \sqrt{\frac{16.67}{100}}$ = 0.41. This again shows a moderate association.

For Example 2, $\tilde{\chi}^2$ = 0.278, so $V = \sqrt{\frac{0.278}{20}}$ = 0.12. This shows a weak association: taste does not depend much on age.
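The two calculations can be reproduced with a small helper. This is a sketch only: the function name `cramers_v` and its argument order are choices made for this illustration, and the chi-squared values are passed in directly rather than recomputed from the tables (for Example 1 the unrounded value 50/3 is used).

```python
import math


def cramers_v(chi2, n, k, l):
    """Cramer's V from a chi-squared value, sample size n, and table size k x l."""
    return math.sqrt(chi2 / (n * (min(k, l) - 1)))


v_example1 = cramers_v(50 / 3, n=100, k=2, l=2)   # chi-squared of Example 1, unrounded
v_example2 = cramers_v(0.278, n=20, k=2, l=3)     # chi-squared of Example 2 as stated
print(round(v_example1, 2), round(v_example2, 2))
```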

Contingency Coefficient

The corrected version of Pearson’s contingency coefficient is:

$C_{corr} = \frac{C}{C_{max}} ; 0 \leq C_{corr} \leq 1$ , where $C = \sqrt{\frac{\tilde{\chi}^2}{\tilde{\chi}^2 + n}}$ , $C_{max} = \sqrt{\frac{min(k,l) - 1}{min(k,l)}}$ .

Interpretation of Contingency Coefficient Statistics

A value of $C_{corr}$ close to zero implies a weak association between the two variables.
A value of $C_{corr}$ close to 1 implies a strong association between the two variables.
Values in between indicate a moderate association between the two variables.

For Example 1, $\tilde{\chi}^2$ = 16.67. So,

$C = \sqrt{\frac{16.67}{16.67 + 100}}$ = 0.378
$C_{max} = \sqrt{\frac{min(2,2) - 1}{min(2,2)}}$ = 0.707
$C_{corr} = \frac{0.378}{0.707}$ = 0.535

The value of $C_{corr}$ = 0.535 again shows a moderate association between the two variables.
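The same three steps can be written as a short function. As before, this is an illustrative sketch: the name `corrected_contingency` is hypothetical, and the chi-squared value for Example 1 is supplied directly in unrounded form (50/3).

```python
import math


def corrected_contingency(chi2, n, k, l):
    """Return (C, C_max, C_corr) for a chi-squared value from a k x l table."""
    c = math.sqrt(chi2 / (chi2 + n))     # Pearson's contingency coefficient
    m = min(k, l)
    c_max = math.sqrt((m - 1) / m)       # largest value C can reach for this table size
    return c, c_max, c / c_max


c, c_max, c_corr = corrected_contingency(50 / 3, n=100, k=2, l=2)
print(round(c, 3), round(c_max, 3), round(c_corr, 3))
```

Dividing by $C_{max}$ is what stretches the coefficient to the full 0 to 1 range, so that tables of different sizes can be compared.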

References

Descriptive Statistic, By Prof. Shalabh, Dept. of Mathematics and Statistics, IIT Kanpur.

NotePub

Indranagar,
Bangalore - 560038, Karnataka, India
