Pearson's chi-squared test
The null hypothesis of the chi-squared test of independence is that the two variables are independent, that is, there is no relationship between them.
The chi-squared test of independence works by comparing the observed frequencies with the frequencies that would be expected if there were no relationship between the two categorical variables (that is, if the null hypothesis were true).
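The comparison of observed and expected frequencies can be sketched in a few lines of Python. This is a minimal illustration of the standard Pearson statistic, not the module's own code; the function names and the 2x2 table of counts are hypothetical.

```python
def expected_counts(observed):
    """Expected cell counts under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Pearson's statistic: sum over all cells of
    (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts, for illustration only
# (rows: two gender groups, columns: smoker / non-smoker).
observed = [[20, 30],
            [30, 20]]

expected = expected_counts(observed)         # every cell is 25.0
statistic = chi_squared_statistic(observed)  # 4.0
```

The larger the statistic, the further the observed table is from what independence would predict. The statistic is compared against a chi-squared distribution with (rows - 1) * (columns - 1) degrees of freedom to obtain the p-value.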
Select two categorical variables to create a contingency table containing the frequencies of each combination of categories in these variables. Pearson's chi-squared test is then performed on this contingency table.
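To make the contingency table concrete, the sketch below cross-tabulates two columns by counting how often each pair of categories occurs. The argument names mirror the module's `test_var` and `group_var` inputs, but the function itself and the sample data are hypothetical and only illustrate the idea.

```python
from collections import Counter


def contingency_table(test_var, group_var):
    """Cross-tabulate two equal-length categorical columns into counts."""
    if len(test_var) != len(group_var):
        raise ValueError("both columns must have the same number of rows")
    pair_counts = Counter(zip(test_var, group_var))
    row_labels = sorted(set(test_var))
    col_labels = sorted(set(group_var))
    # A Counter returns 0 for category pairs that never occur together.
    table = [[pair_counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Hypothetical data, for illustration only.
gender = ["F", "M", "F", "M", "F", "M"]
smoking = ["no", "no", "yes", "yes", "no", "no"]

row_labels, col_labels, table = contingency_table(gender, smoking)
# row_labels == ["F", "M"], col_labels == ["no", "yes"]
# table == [[2, 1], [2, 1]]
```

Each row of `table` corresponds to one category of the first variable and each column to one category of the second.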
The example below shows how to use the Pearson's chi-squared test module to test whether there is a relationship between two categorical variables: gender and smoking.
In this example, our hypotheses are:
- Null hypothesis: the two categorical variables are independent; there is no relationship between them.
- Alternative hypothesis: the two categorical variables are dependent; there is a relationship between them.
The output below shows the results of Pearson's chi-squared test. The first section of the output is a tabulation of the selected groups on which the test is performed. The second section is the printed output of a chisq.test run in R; the R documentation for chisq.test describes this printed output in more detail.
In the output, the p-value is 0.52, which is greater than the significance level of 0.05, so we fail to reject the null hypothesis. The data provide no evidence of a relationship between gender and smoking; they are consistent with the two variables being independent.
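The decision rule above can be written out as a short sketch. It assumes a 2x2 table, so the statistic has 1 degree of freedom and the p-value has a closed form in terms of the complementary error function. The function names are hypothetical. Note also that R's chisq.test applies a continuity correction to 2x2 tables by default, which this sketch omits, so its p-values can differ from the module's output.

```python
import math


def p_value_df1(statistic):
    """Upper-tail probability of a chi-squared distribution with
    1 degree of freedom (valid for 2x2 tables only)."""
    return math.erfc(math.sqrt(statistic / 2))


def reject_null(p_value, alpha=0.05):
    """Return True if the null hypothesis of independence is rejected
    at significance level alpha."""
    return p_value < alpha


# The p-value reported in the example output is 0.52.
decision = reject_null(0.52)  # False: fail to reject the null hypothesis
```

A statistic of about 3.84 corresponds to a p-value of about 0.05 with 1 degree of freedom, which is why that value is commonly quoted as the critical value for 2x2 tables.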
|Parameter|Required|Input type|Description|
|---|---|---|---|
|test_var|Yes|Column Input. Text, Integer, Boolean, Date, DateTime|The first categorical variable from which to calculate the contingency table.|
|group_var|Yes|Column Input. Text, Integer, Boolean, Date, DateTime|The second categorical variable from which to calculate the contingency table.|