Hypothesis test for the difference between proportions

This document is prepared automatically using the following R command.

Problem

Solution

This lesson explains how to conduct a hypothesis test to determine whether the difference between two proportions is significant.

The test procedure, called the two-proportion z-test, is appropriate when the following conditions are met:

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.

1. State the hypotheses

The first step is to state the null hypothesis and an alternative hypothesis.

\[Null\ hypothesis(H_0): P_1 \leqq P_2\] \[Alternative\ hypothesis(H_1): P_1 > P_2\]

Note that these hypotheses constitute a one-tailed test. The null hypothesis will be rejected if the proportion from population 1 is too big..

2. Formulate an analysis plan

For this analysis, the significance level is 0.05`. The test method, shown in the next section, is a two-proportion z-test.

3. Analyze sample data

Using sample data, we calculate the pooled sample proportion (p) and the standard error (SE). Using those measures, we compute the z-score test statistic (z).

\[p=\frac{p_1 \times n_1+ p_2 \times n_2}{n1+n2}\] \[p=\frac{0.71 \times 150+ 0.63 \times 100}{150+100}\]

\[p=169.5/250=0.678\]

\[SE=\sqrt{p\times(1-p)\times[1/n_1+1/n_2]}\]

\[SE=\sqrt{0.678\times0.322\times[1/150+1/100]}=0.061\]

\[z=\frac{p_1-p_2}{SE}=\frac{0.71-0.63}{0.061}=1.33\]

where \(p_1\) is the sample proportion in sample 1, where \(p_2\) is the sample proportion in sample 2, \(n_1\) is the size of sample 1, and \(n_2\) is the size of sample 2.

Since we have a one-tailed test, the P-value is the probability that the z statistic is or greater than 1.33.

We can use following R code to find the p value.

\[p=pnorm(1.33,lower.tail=FALSE)=0.092\]

Alternatively,we can use the Normal Distribution curve to find p value.

draw_n(z=x$result$z,alternative=x$result$alternative)

4. Interpret results.

Since the P-value (0.092) is greater than the significance level (0.05), we cannot reject the null hypothesis.

Result of propCI()

$data
# A tibble: 1 × 2
  x     y    
  <lgl> <lgl>
1 NA    NA   

$result
  alpha   p1   p2  n1  n2  DF   pd         se critical        ME      lower
1  0.05 0.71 0.63 150 100 248 0.08 0.06085776 1.644854 0.1001021 -0.0201021
      upper                      CI ppooled   sepooled        z     pvalue
1 0.1801021 0.08 [95CI -0.02; 0.18]   0.678 0.06032081 1.326242 0.09237975
  alternative
1     greater

$call
propCI(n1 = 150, n2 = 100, p1 = 0.71, p2 = 0.63, P = 0, alternative = "greater")

attr(,"measure")
[1] "propdiff"

Reference

The contents of this document are modified from StatTrek.com. Berman H.B., “AP Statistics Tutorial”, [online] Available at: https://stattrek.com/hypothesis-test/difference-in-proportions.aspx?tutorial=AP URL[Accessed Data: 1/23/2022].