Comparing two groups

Introduction to the t-test

Greg Nishihara

2024-07-07

A juvenile Sargassum macrocarpum

A random sample

Table 1: Size (mm) of juvenile Sargassum macrocarpum (ノコギリモク).

Let’s compare the two groups

The mean (\(\overline{x}\)), standard deviation (\(s\)), and standard error (s.e.) for juvenile S. macrocarpum from sites A and B are:

  • \(\overline{x}_A=\) 20.033, \(s_A=\) 1.003, and s.e. = 0.41
  • \(\overline{x}_B=\) 22.25, \(s_B=\) 0.971, and s.e. = 0.396
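The standard errors listed here follow from s.e. = \(s / \sqrt{n}\). The sketch below is a minimal check in R, assuming \(n = 6\) juveniles per site (the paired differences later in these slides run from index 1 to 6). The variable names are illustrative, and because the inputs are already rounded, the results agree with the reported values only up to rounding.

```r
# Standard error of the mean: s.e. = s / sqrt(n)
# Assumption: n = 6 observations per site.
n   <- 6
s_A <- 1.003  # sample standard deviation, site A
s_B <- 0.971  # sample standard deviation, site B

se_A <- s_A / sqrt(n)
se_B <- s_B / sqrt(n)

# Compare with the reported standard errors (0.41 and 0.396)
round(c(A = se_A, B = se_B), 2)
```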

What is our question?

  • If we want to statistically compare the size from the two sites, we need a question (i.e., a working hypothesis).

Working hypothesis (作業仮説): The sizes (widths) of juvenile S. macrocarpum collected from sites A and B are different.

  • We know that the sample means for sites A and B are different, but the standard deviations and standard errors are similar.
    • \(\overline{x}_A=\) 20.033; \(s=\) 1.003; s.e. = 0.41
    • \(\overline{x}_B=\) 22.25; \(s=\) 0.971; s.e. = 0.396

Define our hypotheses

Let’s formally define our statistical hypotheses.

  • \(H_0\) (null hypothesis 帰無仮説): There is no difference in the paired values.

  • \(H_A\) (alternative hypothesis 対立仮説): There is a difference in the paired values.

Other alternative hypotheses

  • \(H_P\) (alternative hypothesis): The difference in paired values is positive.
  • \(H_N\) (alternative hypothesis): The difference in paired values is negative.

Calculate the size differences among pairs

Assume that we can compare the paired differences (e.g., \(x_{A,1} - x_{B,1}\), \(x_{A,2} - x_{B,2}\), \(x_{A,3} - x_{B,3}\), \(\cdots\), \(x_{A,6} - x_{B,6}\)).

Recall the hypotheses

The two statistical hypotheses that we defined were:

  • \(H_0\): There is no difference in the paired values.
  • \(H_A\): There is a difference in the paired values.

The mean difference (\(\overline{x}_{A-B}\)) is -2.217, the standard deviation (\(s_{A-B}\)) is 1.289, and the standard error (\(\text{s.e.}_{A-B}\)) is 0.526

Note: The true difference \(\mu_{A-B}\) is -2, the true standard deviation \(\sigma_A = \sigma_B\) is 1.

Distribution of the mean

Recall that the central limit theorem (中心極限定理) states that, as the sample size grows, the distribution of the sample mean approaches a Gaussian (normal) distribution (正規分布).

  • \(\overline{x}_{A-B} =\) -2.217 (mean)
  • \(s_{A-B}=\) 1.289 (standard deviation)
  • \(\text{s.e.}_{A-B} =\) 0.526 (standard error)

The shaded area is the 95% probability region. The range of values it spans is called a confidence interval (信頼区間). If the significance level (有意水準) is \(\alpha = 0.05\), the interval has a confidence level of \(1-\alpha = 0.95\) and is called a 95% confidence interval (95% 信頼区間).

Developing the confidence interval

The confidence interval is an interval \([l, u]\) with a lower bound of \(l\) and an upper bound of \(u\).

For a probability \(1-\alpha\), the interval \([l, u]\) for \(x\) is

\[ P(l \le x \le u) = 1-\alpha \]

If \(\overline{x}\) is a sample mean, then the z-score (z値) is

\[ z = \frac{\overline{x}-\mu}{\sigma} \]

where \(\mu\) is the population mean and \(\sigma\) is the population standard deviation. Then, to find \(l\) and \(u\), we need to solve

\[ P(l \le z \le u) = 1-\alpha \]

for the interval \([l, u]\) of \(z\) given a probability of \(1-\alpha\).

Central limit theorem (中心極限定理)

Recall that the central limit theorem states that:

\[ \lim_{n\rightarrow\infty} \sqrt{n}\overbrace{\left(\frac{\overline{x}_n-\mu}{\sigma}\right)}^{z} \xrightarrow{d} N(0, 1) \]

Therefore, for \(\alpha = 0.05\), we can define an \([l, u]\)

\[ P\left(l \le z \le u \right) = 1-0.05 = 0.95 \]

For the standard normal distribution (\(N(0,1)\))

  • \(l\) is the \(\alpha/2=0.05/2=0.025\) quantile.
  • \(u\) is the \(1-\alpha/2=1-0.05/2=0.975\) quantile.
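These two quantiles can be looked up in R with `qnorm()`, the quantile function of the normal distribution. A minimal sketch, with `l` and `u` named after the bounds above:

```r
# Lower and upper quantiles of N(0, 1) for a given significance level
alpha <- 0.05
l <- qnorm(alpha / 2)       # 0.025 quantile
u <- qnorm(1 - alpha / 2)   # 0.975 quantile
round(c(l = l, u = u), 3)

# The probability between l and u is 1 - alpha
pnorm(u) - pnorm(l)
```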

Determining the lower and upper quantiles of \(N(0, 1)\)

Note that \([-1 s, 1 s]\) is the 68.3% interval, \([-2 s, 2 s]\) is the 95.4% interval, and \([-3 s, 3 s]\) is the 99.7% interval.

Table of quantiles for \(N(0, 1)\)

Significance level, α   Confidence interval (%)   ± quantile
0.500                   50.000                    0.674
0.317                   68.269                    1.000
0.200                   80.000                    1.282
0.100                   90.000                    1.645
0.050                   95.000                    1.960
0.046                   95.450                    2.000
0.025                   97.500                    2.241
0.003                   99.730                    3.000
0.000                   99.994                    4.000
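A table like this one can be rebuilt with `qnorm()` and `pnorm()`. The sketch below assumes the rows come in two kinds: rows where \(\alpha\) was chosen and the quantile computed, and rows where the quantile (1, 2, 3, 4) was chosen and \(\alpha\) computed. The column names are illustrative.

```r
# Rows defined by a chosen significance level
alpha <- c(0.50, 0.20, 0.10, 0.05, 0.025)
by_alpha <- data.frame(
  alpha      = alpha,
  ci_percent = 100 * (1 - alpha),
  quantile   = qnorm(1 - alpha / 2)
)

# Rows defined by a chosen quantile k (two-sided tail area is 2 * P(Z < -k))
k <- 1:4
by_quantile <- data.frame(
  alpha      = 2 * pnorm(-k),
  ci_percent = 100 * (1 - 2 * pnorm(-k)),
  quantile   = k
)

# Combine and sort by quantile
tab <- rbind(by_alpha, by_quantile)
tab <- tab[order(tab$quantile), ]
rownames(tab) <- NULL
round(tab, 3)
```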

Calculating the confidence interval

Let \(x = \overline{x}_{A-B} =\) -2.217, \(s = \text{s.e.} =\) 0.526, \(\sigma_A = \sigma_B =\) 1, and \(\alpha = 0.05\).

\[ P\left(l \le \frac{\overline{x}-\mu}{\sigma}\le u\right) = 1-\alpha = 0.95 \]

\[ P\left(\overline{x} +l \sigma \le \mu \le \overline{x} + u\sigma\right) = 1-\alpha = 0.95 \]

When \(\alpha= 0.05\), the \(l\) and \(u\) quantiles are \(l=\) -1.96 and \(u=\) 1.96, and \(\sigma = 1\).

\[ P( -2.217 - 1.96 \times 1 \le \mu \le -2.217 + 1.96 \times 1 ) = 0.95 \]

\[ P( -4.177 \le \mu \le -0.257 ) = 0.95 \]

The 95% confidence interval of \(\overline{x}=\) -2.217 is \([-4.177, -0.257 ]\).
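The arithmetic above can be checked in R. This is a minimal sketch that follows the slide's setup as written, using \(\sigma = 1\) as the scale of the interval; the variable names are illustrative.

```r
# 95% confidence interval with known sigma, as set up above
x_bar <- -2.217   # observed mean difference
sigma <- 1        # known population standard deviation
alpha <- 0.05

# x_bar + l * sigma and x_bar + u * sigma, with l and u from N(0, 1)
ci <- x_bar + qnorm(c(alpha / 2, 1 - alpha / 2)) * sigma
round(ci, 3)
```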

Confidence intervals for each experiment when \(\sigma\) is known

\(\sigma_A = \sigma_B=\) 1, \(H_0:\) \(\overline{x}_A = \overline{x}_B\) or \(\overline{x}_{A-B}=0\)

How often does the confidence interval contain 0?

  • The true difference is -2, therefore \(H_0\) is false.
  • If we do not reject \(H_0\), we are making a Type-II Error (第2種の誤り).
  • The 95% confidence intervals of 8 experiments include 0.
  • The error rate is \(\beta=\) 8 / 20 = 0.4 or 40%.
  • The power of this analysis (\(1 - \beta\)) is 0.6

We made some wrong assumptions

When the population mean \(\mu\) and the population standard deviation \(\sigma\) are known, the z-score follows a standard normal distribution.

\[ z = \frac{\overline{x} - \mu}{\sigma}\sim N(0,1) \]

However, if the population standard deviation is unknown, we must calculate the t-value instead.

\[ t_{\overline{x}} = \frac{\overline{x} - x_0}{s.e.} = \frac{\overline{x} - x_0}{s / \sqrt{n}} \]

The t-value follows a t-distribution. \(x_0\) is a constant that is often set to zero.
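As a worked example, the t-value for the paired differences reported earlier (\(\overline{x}_{A-B} =\) -2.217, \(s_{A-B} =\) 1.289) can be computed as follows. The sketch assumes \(n = 6\) pairs and \(x_0 = 0\); because the inputs are rounded, the result is approximate.

```r
# t-value from summary statistics: t = (x_bar - x0) / (s / sqrt(n))
x_bar <- -2.217  # mean of the paired differences
s     <- 1.289   # standard deviation of the paired differences
n     <- 6       # assumed number of pairs
x0    <- 0       # value under the null hypothesis

se      <- s / sqrt(n)         # standard error
t_value <- (x_bar - x0) / se   # t-value
round(c(se = se, t = t_value), 3)
```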

Determining the lower and upper quantiles of \(t(d.f.)\)

Note that the degrees-of-freedom (d.f., 自由度) for the t-distribution is \(n - 1\) = 5.

Table of quantiles for \(t(d.f. = 5)\)

Quantiles of the t distribution for d.f. = 5.

Significance level, \(\alpha\)   Confidence interval (%)   \(\pm\) quantile
0.500                            50.000                    0.727
0.363                            63.678                    1.000
0.200                            80.000                    1.476
0.102                            89.806                    2.000
0.100                            90.000                    2.015
0.050                            95.000                    2.571
0.030                            96.990                    3.000
0.025                            97.500                    3.163
0.010                            98.968                    4.000
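The t quantiles come from `qt()`. The sketch below compares the 95% quantile for d.f. = 5 with the normal quantile and then builds a t-based interval for the mean difference reported earlier (\(\overline{x}_{A-B} =\) -2.217, s.e. = 0.526). The interval is an illustration computed from those rounded values, not a figure taken from the slides.

```r
alpha <- 0.05
df    <- 5

t_crit <- qt(1 - alpha / 2, df = df)  # t quantile, heavier tails
z_crit <- qnorm(1 - alpha / 2)        # normal quantile, for comparison
round(c(t = t_crit, z = z_crit), 3)

# t-based 95% confidence interval: x_bar +/- t_crit * s.e.
x_bar <- -2.217
se    <- 0.526
ci_t  <- x_bar + c(-1, 1) * t_crit * se
round(ci_t, 3)
```

Because the t quantile is larger than the normal quantile, the same standard error gives a wider interval when \(\sigma\) has to be estimated from the data.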

Confidence intervals for each experiment when \(\sigma\) is unknown

\(H_0:\) \(\overline{x}_A = \overline{x}_B\) or \(\overline{x}_{A-B}=0\)

How often does the confidence interval contain 0?

  • The true difference is -2, therefore \(H_0\) is false.
  • If we do not reject \(H_0\), we are making a Type-II Error (第2種の誤り).
  • The 95% confidence intervals of 5 experiments include 0. So for 5 experiments, we do not reject \(H_0\).
  • The error rate is \(\beta=\) 5 / 20 = 0.25 or 25%.
  • The power of this analysis (\(1 - \beta\)) is 0.75
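A small simulation shows how such a count can be obtained. This is only a sketch: the site means of 20 and 22 are hypothetical values chosen to give a true difference of -2, each site is drawn independently with \(\sigma = 1\), and the seed is arbitrary, so the result will not reproduce the 5 / 20 reported above.

```r
set.seed(1)    # arbitrary seed, only for reproducibility
n_exp <- 20    # number of simulated experiments
n     <- 6     # observations per site
mu_A  <- 20    # hypothetical true mean, site A
mu_B  <- 22    # hypothetical true mean, site B (true difference = -2)
sigma <- 1     # true standard deviation at both sites

# For each experiment, build the t-based 95% CI of the mean difference
# and record whether it contains 0 (i.e., H0 is not rejected).
contains_zero <- replicate(n_exp, {
  d  <- rnorm(n, mu_A, sigma) - rnorm(n, mu_B, sigma)
  ci <- mean(d) + qt(c(0.025, 0.975), df = n - 1) * sd(d) / sqrt(n)
  ci[1] <= 0 && 0 <= ci[2]
})

beta_hat  <- mean(contains_zero)  # estimated Type-II error rate
power_hat <- 1 - beta_hat         # estimated power
c(beta = beta_hat, power = power_hat)
```

With only 20 experiments the estimate is noisy; increasing `n_exp` gives a more stable value.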

Formal tests of group means

The paired t-test

Type-II error rate \(\beta\) = 5 / 20 = 25% and power (\(1-\beta\)) is 0.75.

t-test (unpaired assuming unequal variance)

Type-II error rate \(\beta\) = 1 / 20 = 5% and power (\(1-\beta\)) is 0.95.

The null hypothesis of the t-test

\(H_0\) null hypothesis (帰無仮説): \(\overline{x}_A - \overline{x}_B = \overline{x}_{A-B}=0\)

Paired t-test

Paired t-test (対応ありのt検定)

We need to calculate the t-value, which is the statistic for the t-test.

\[ t^* = \frac{\overline{x}_{A-B} - \mu}{s_{A-B} / \sqrt{n}} \]

We also need to determine the degrees-of-freedom (自由度), which is \(n-1\).

The paired t-test is used when observations can be paired, for example the lengths of the left and right fins of a fish.

Independent two sample t-test

There are two versions.

Equal variance (等分散)

\[ t^* = \frac{\overline{x}_A - \overline{x}_B}{s_p \sqrt{1 / n_A + 1/n_B}} \]

\[ s_p = \sqrt{ \frac{(n_A-1)s_A^2 + (n_B-1)s_B^2} {n_A + n_B -2}} \]

Degrees-of-freedom is \(n_A + n_B - 2\).
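The equal-variance formulas can be written out directly in R. The sketch below uses the two samples from the R code at the end of these slides and cross-checks the hand calculation against `t.test(..., var.equal = TRUE)`; the intermediate variable names are illustrative.

```r
A <- c(9.8, 11.1, 10.7, 10.7, 11.8, 6.2)
B <- c(12.5, 13.8, 12.0, 15.5, 9.8, 11.5)
n_A <- length(A)
n_B <- length(B)

# Pooled standard deviation
s_p <- sqrt(((n_A - 1) * var(A) + (n_B - 1) * var(B)) / (n_A + n_B - 2))

# t-value, degrees-of-freedom, and two-sided P-value
t_star      <- (mean(A) - mean(B)) / (s_p * sqrt(1 / n_A + 1 / n_B))
df          <- n_A + n_B - 2
p_two_sided <- 2 * pt(-abs(t_star), df = df)

# Cross-check against the built-in test
fit <- t.test(A, B, var.equal = TRUE)
c(manual = t_star, builtin = unname(fit$statistic))
```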

Unequal variance, Welch’s t-test (ウェルチのt検定)

\[ t^* = \frac{\overline{x}_A - \overline{x}_B}{s_p} \]

\[ s_p = \sqrt{ \frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} \]

Degrees-of-freedom is calculated with the Welch-Satterthwaite Equation.

\(s\) is the sample standard deviation. \(n\) is the number of observations in each sample. \(\overline{x}\) is the sample mean. \(t^*\) is the t-value.

Welch-Satterthwaite Equation

\[ \text{degrees-of-freedom} = \frac{ \left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2 } {\frac{\left(s_A^2 / n_A\right)^2}{n_A-1} + \frac{\left(s_B^2 / n_B\right)^2}{n_B-1}} \]
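The equation translates into a short R function. `welch_df()` is a hypothetical helper written for this illustration, not a function from any package; the example call uses the standard deviations 1.995 and 1.961 with \(n = 6\) that appear in the Welch's t-test calculation in these slides.

```r
# Welch-Satterthwaite degrees-of-freedom from summary statistics
# (welch_df is a hypothetical helper name)
welch_df <- function(s_A, n_A, s_B, n_B) {
  v_A <- s_A^2 / n_A  # squared standard error, group A
  v_B <- s_B^2 / n_B  # squared standard error, group B
  (v_A + v_B)^2 / (v_A^2 / (n_A - 1) + v_B^2 / (n_B - 1))
}

welch_df(s_A = 1.995, n_A = 6, s_B = 1.961, n_B = 6)

# With equal variances and equal sample sizes the result
# reduces to n_A + n_B - 2
welch_df(s_A = 1, n_A = 6, s_B = 1, n_B = 6)
```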

The paired t-test

\[ \begin{aligned} t^* &= \frac{\overline{x}_{A-B} - \mu}{s_{A-B} / \sqrt{n}} \\ t^* &= \frac{-2.467}{2.642 / \sqrt{6}} \\ t^* &= -2.287 \end{aligned} \]

  • \(\overline{x}_{A-B}=\) -2.467
  • \(\mu=0\)
  • \(n\) = 6
  • \(s_{A-B}=\) 2.642
  • \(\alpha\) = 0.05
  • t-value: -2.287
  • One-sided P-value: 0.035
  • Two-sided P-value: 0.071

The juvenile S. macrocarpum size observations cannot be paired, so this is the wrong test.
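For reference, the numbers on this slide can be reproduced from the two samples in the R code at the end of these slides by treating the observations as if they were paired. This sketch only reproduces the calculation; it does not make the pairing valid.

```r
A <- c(9.8, 11.1, 10.7, 10.7, 11.8, 6.2)
B <- c(12.5, 13.8, 12.0, 15.5, 9.8, 11.5)

d <- A - B          # differences, treating row i of A and B as a pair
n <- length(d)

t_star      <- (mean(d) - 0) / (sd(d) / sqrt(n))
p_one_sided <- pt(t_star, df = n - 1)             # H_N: difference is negative
p_two_sided <- 2 * pt(-abs(t_star), df = n - 1)   # H_A: difference is not zero

round(c(mean = mean(d), sd = sd(d), t = t_star,
        p_one = p_one_sided, p_two = p_two_sided), 3)

# The built-in equivalent
fit <- t.test(A, B, paired = TRUE)
```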

The correct test is Welch’s t-test

\[ \begin{aligned} t^* &= \frac{\overline{x}_A -\overline{x}_B}{s_p} \\ s_p &= \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} \\ s_p &= \sqrt{\frac{1.995^2}{6} + \frac{1.961^2}{6}} \\ t^* &= \frac{10.05 - 12.517}{1.142} \\ t^* &= -2.16 \\ \text{d.f.} &= 9.997 \end{aligned} \]

  • \(\alpha\) = 0.05
  • t-value: -2.16
  • One-sided P-value: 0.028
  • Two-sided P-value: 0.056

Compared with the paired t-test, the two-sided P-value decreases from 0.071 to 0.056, but \(0.056 \ge \alpha= 0.05\), so we cannot reject \(H_0\).
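The same decision can be read from the confidence interval of the difference in means: if the 95% interval contains 0, \(H_0\) is not rejected at \(\alpha = 0.05\). The sketch below builds that interval by hand from the Welch standard error and compares it with the interval returned by `t.test()`; the data are the two samples used in the R code in these slides.

```r
A <- c(9.8, 11.1, 10.7, 10.7, 11.8, 6.2)
B <- c(12.5, 13.8, 12.0, 15.5, 9.8, 11.5)

fit <- t.test(A, B)             # Welch's t-test (the default)
df  <- unname(fit$parameter)    # Welch-Satterthwaite degrees-of-freedom

# Standard error of the difference (s_p in the slide's notation)
s_p <- sqrt(var(A) / length(A) + var(B) / length(B))

# 95% confidence interval of mean(A) - mean(B)
ci <- (mean(A) - mean(B)) + qt(c(0.025, 0.975), df = df) * s_p
round(ci, 3)

# TRUE means the interval contains 0, so H0 is not rejected
ci[1] <= 0 && 0 <= ci[2]
```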

Behavior of the t-test (equal variance)

Increasing the number of observations decreases the Type-II error rate and increases the power of the test. The Type-I error rate is fixed at \(\alpha=0.05\).
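One way to see this relationship is `power.t.test()` from R's `stats` package, which computes the power of the equal-variance two-sample t-test. The sketch assumes a true difference of 2 and a standard deviation of 1, the true values noted earlier in these slides; the sample sizes are illustrative.

```r
# Power of the two-sample (equal variance) t-test for several sample sizes
n_per_group <- c(3, 4, 6, 10)

power <- sapply(n_per_group, function(n) {
  power.t.test(n = n, delta = 2, sd = 1, sig.level = 0.05,
               type = "two.sample", alternative = "two.sided")$power
})

# beta is the Type-II error rate, 1 - power
data.frame(n = n_per_group, power = round(power, 3), beta = round(1 - power, 3))
```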

Behavior of the t-test (unequal variance)

Unbalanced variances (\(s^2\)) increase the Type-II error rate (\(\beta\)) and decrease the power (\(1-\beta\)) of the t-test. The Type-I error rate is fixed at \(\alpha=0.05\).

Welch’s t-test R code

library(tidyverse)

# Size observations from the two sites
A <- c(9.8, 11.1, 10.7, 10.7, 11.8, 6.2)
B <- c(12.5, 13.8, 12.0, 15.5, 9.8, 11.5)

# Reshape from wide (columns A, B) to long (columns name, value)
data <- tibble(A, B) %>% pivot_longer(cols = c(A, B))

# Welch's t-test (ウェルチのt検定); this is the default for t.test()
t.test(value ~ name, data = data)
# t.test(A, B) # Alternative method

# Two-sample, equal-variance t-test (等分散のt検定)
# t.test(value ~ name, data = data, var.equal = TRUE)

Welch’s t-test does not require equal variances or equal sample size.

The equal-variance two-sample t-test (var.equal = TRUE) assumes that both groups have the same variance.


    Welch Two Sample t-test

data:  value by name
t = -2.16, df = 9.9971, p-value = 0.05612
alternative hypothesis: true difference in means between group A and group B is not equal to 0
95 percent confidence interval:
 -5.01124979  0.07791646
sample estimates:
mean in group A mean in group B 
       10.05000        12.51667