T-test in R to compare means (2024)

The t.test function in R is used to perform a t-test, which is a statistical test to compare the means of two groups and determine if they are significantly different from each other or to test if the mean of a sample is equal to a certain value. The function allows you to conduct various types of t-tests, such as one-sample t-test, independent samples t-test and paired samples t-test, for equal or different variances.

Syntax

The syntax of the t.test function is the following:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

# Method for class 'formula'
t.test(formula, data, subset, na.action, ...)

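Where:

x: a numeric vector of data values.
y: an optional numeric vector of data values for the second sample.
alternative: the alternative hypothesis, one of "two.sided" (the default), "less" or "greater".
mu: the hypothesized value of the mean (or of the difference in means for a two sample test).
paired: whether to perform a paired t-test.
var.equal: whether to treat the two variances as equal. If TRUE, the pooled variance is used; otherwise the Welch approximation is used.
conf.level: the confidence level of the interval.
formula: a formula of the form lhs ~ rhs, where lhs is a numeric variable and rhs is a factor with two levels.
data: an optional data frame containing the variables in the formula.
subset: an optional vector specifying a subset of observations to use.
na.action: a function that indicates what should happen when the data contain NA values.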
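As an illustration of the formula method shown in the syntax, the following sketch compares two groups stored in a single data frame. The data frame dat and its columns value and group are hypothetical names used only for this example.

```r
# Hypothetical long-format data: one numeric column and a two-level factor
set.seed(10)
dat <- data.frame(
  value = c(rnorm(50, mean = 5), rnorm(50, mean = 6)),
  group = factor(rep(c("A", "B"), each = 50))
)

# Formula method: compare the mean of 'value' between the levels of 'group'
res_formula <- t.test(value ~ group, data = dat)

# Equivalent call with the default (x, y) method
res_default <- t.test(x = dat$value[dat$group == "A"],
                      y = dat$value[dat$group == "B"])

# Both calls should report the same p-value
res_formula$p.value
res_default$p.value
```

The formula interface is usually more convenient when the data are already in a data frame with a grouping column, while the x and y arguments suit data stored in separate vectors.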

One sample t-test

The one sample t-test can be used to check if the true mean of a simple random sample drawn from a normal population with unknown mean \(\mu\) is equal to \(\mu_0\), greater than \(\mu_0\) or lower than \(\mu_0\).

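The t-test assumes that the observations are independent and drawn from a normal distribution. When normality is doubtful, a sample size of at least 30 is a commonly used rule of thumb, because the test is fairly robust to moderate departures from normality in larger samples.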
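One possible way to check the normality assumption before running the test is the Shapiro-Wilk test, available in base R as shapiro.test. This is only a sketch of one option, not a required step of the t-test itself.

```r
# Sample data
set.seed(10)
x <- rnorm(100, mean = 10)

# Shapiro-Wilk test: the null hypothesis is that the data come from
# a normal distribution, so a small p-value suggests non-normality
sw <- shapiro.test(x)
sw$p.value
```

A normal Q-Q plot (qqnorm(x) followed by qqline(x)) is a common graphical complement to this test.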

Mean equal to \(\mu_0\)

The null and alternative hypotheses are the following:

\(H_0\): The mean of the distribution IS \(\mu_0\).
\(H_1\): The mean of the distribution is NOT \(\mu_0\).

Given a data sample, you can test whether its true mean is equal to \(\mu_0\) using the t.test function. The following example tests whether the true mean is equal to 10 at a 95% confidence level (that is, a 5% significance level).

# Sample data
set.seed(10)
x <- rnorm(100, mean = 10)

# Is the mean of 'x' different from 10?
t.test(x = x, mu = 10, conf.level = 0.95)

One Sample t-test

data:  x
t = -1.4507, df = 99, p-value = 0.15
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
  9.676689 10.050213
sample estimates:
mean of x
 9.863451

The p-value is greater than the usual significance levels, so we don’t have enough evidence to reject the null hypothesis that the true mean is equal to 10. Notice that \(\mu_0\) is inside the 95% confidence interval returned by the function.
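To make the numbers in the output less of a black box, the sketch below recomputes the test statistic, the two-sided p-value and the confidence interval by hand using the standard one sample formula \(t = (\bar{x} - \mu_0) / (s / \sqrt{n})\), and compares them with the components of the object returned by t.test. The variable names (mu0, se, t_stat and so on) are chosen only for this illustration.

```r
# Sample data
set.seed(10)
x <- rnorm(100, mean = 10)
mu0 <- 10

# Manual computation
n <- length(x)
se <- sd(x) / sqrt(n)                        # standard error of the mean
t_stat <- (mean(x) - mu0) / se               # t statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se  # 95% interval

# Same quantities extracted from the t.test result
res <- t.test(x = x, mu = mu0, conf.level = 0.95)
c(manual = t_stat, t.test = unname(res$statistic))
c(manual = p_value, t.test = res$p.value)
rbind(manual = ci, t.test = as.numeric(res$conf.int))
```

The result of t.test is a list of class "htest", so elements such as statistic, parameter, p.value, conf.int and estimate can be reused in later computations instead of being read from the printed output.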

Mean lower than \(\mu_0\)

In this scenario the null and alternative hypotheses are the following:

\(H_0\): The mean of the distribution IS \(\mu_0\).
\(H_1\): The mean of the distribution is LOWER than \(\mu_0\).

The next example checks whether there is enough evidence to reject the null hypothesis or not. As the alternative hypothesis is that the mean of the distribution is lower than \(\mu_0\) we have to set alternative = "less".

# Sample data
set.seed(10)
x <- rnorm(100, mean = 8)

# Is the mean of 'x' less than 10?
t.test(x = x, mu = 10, alternative = "less")

One Sample t-test

data:  x
t = -22.699, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is less than 10
95 percent confidence interval:
     -Inf 8.019733
sample estimates:
mean of x
 7.863451

The p-value indicates strong evidence against the null hypothesis. This implies that the null hypothesis (true mean is 10) can be rejected in favor of the alternative hypothesis (true mean is less than 10). The one-sided 95% confidence interval supports this: it is \((-\infty, 8.019733)\), and its upper bound is well below 10.
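For a one-sided test the p-value uses a single tail of the t distribution, and the confidence interval has only one finite bound. The following sketch reproduces both by hand for the "less" alternative; the variable names are illustrative.

```r
# Sample data
set.seed(10)
x <- rnorm(100, mean = 8)
mu0 <- 10

n <- length(x)
se <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu0) / se

# Lower-tail p-value: probability of a t statistic this small or smaller
p_less <- pt(t_stat, df = n - 1)

# Upper bound of the one-sided 95% confidence interval
upper <- mean(x) + qt(0.95, df = n - 1) * se

res <- t.test(x = x, mu = mu0, alternative = "less")
c(manual = p_less, t.test = res$p.value)
c(manual = upper, t.test = res$conf.int[2])
```

Note that the one-sided interval uses the 0.95 quantile rather than the 0.975 quantile used for a two-sided 95% interval.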

Mean greater than \(\mu_0\)

The last option involves conducting a test where the null hypothesis assumes the true mean to be \(\mu_0\), while the alternative hypothesis considers the true mean greater than \(\mu_0\):

\(H_0\): The mean of the distribution IS \(\mu_0\).
\(H_1\): The mean of the distribution is GREATER than \(\mu_0\).

# Sample data
set.seed(10)
x <- rnorm(100, mean = 8)

# Is the mean of 'x' greater than 10?
t.test(x = x, mu = 10, alternative = "greater")

One Sample t-test

data:  x
t = -22.699, df = 99, p-value = 1
alternative hypothesis: true mean is greater than 10
95 percent confidence interval:
 7.707169      Inf
sample estimates:
mean of x
 7.863451

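In this case, the p-value of 1 means that the data provide no evidence for the alternative hypothesis: the sample mean (7.863451) is below 10, so there is no support for a true mean greater than 10 and the null hypothesis cannot be rejected.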
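The "less" and "greater" alternatives use opposite tails of the same t distribution, so for the same data and the same \(\mu_0\) their p-values add up to 1. A short sketch to check this with the data from the examples above:

```r
# Sample data
set.seed(10)
x <- rnorm(100, mean = 8)

# One-sided p-values for the two directions
p_less <- t.test(x = x, mu = 10, alternative = "less")$p.value
p_greater <- t.test(x = x, mu = 10, alternative = "greater")$p.value

# The two tails cover the whole distribution
p_less + p_greater
```

This explains why a very small p-value for one direction goes together with a p-value close to 1 for the other direction.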

Two sample t-test

The t.test function can also perform a two sample t-test to compare the means between two groups. To conduct this test, assign one group to x and the other to y inside the function. Note that by default both groups are considered independent and with different variances.

If the population variances are assumed to be different (the default), this test is also called a Welch test or Welch’s t-test.
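As a sketch of what the Welch test computes, the code below calculates the test statistic and the approximate degrees of freedom (the Welch-Satterthwaite formula) by hand and compares them with the t.test result. The variable names vx, vy and df_welch are illustrative.

```r
# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Squared standard errors of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_welch <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

res <- t.test(x = x, y = y)
c(manual = t_stat, t.test = unname(res$statistic))
c(manual = df_welch, t.test = unname(res$parameter))
c(manual = p_value, t.test = res$p.value)
```

The degrees of freedom are generally not an integer in the Welch test, which is why the printed outputs of the two sample examples show values such as 197.83.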

Equal means

The null hypothesis for a test of equal means states that the means of the populations are equal, while the alternative hypothesis contends that the means differ between the populations:

\(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y. (Or the difference in means is 0.)
\(H_1\): The mean of the distribution of X is DIFFERENT from the mean of the distribution of Y. (Or the difference in means is not 0.)

# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Is mean of 'x' different from the mean of 'y'?
t.test(x = x, y = y)

Welch Two Sample t-test

data:  x and y
t = -0.30777, df = 197.83, p-value = 0.7586
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3080508  0.2248780
sample estimates:
  mean of x   mean of y
-0.13654894 -0.09496258

The p-value is greater than the usual significance levels, which implies that there is not enough evidence to reject the null hypothesis of equal means.

Lower mean

In this scenario the alternative hypothesis is that the true mean of the first group is lower than the true mean of the second group:

\(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y.
\(H_1\): The mean of the distribution of X is LOWER than the mean of the distribution of Y.

# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Is mean of 'x' less than mean of 'y'?
t.test(x = x, y = y, alternative = "less")

Welch Two Sample t-test

data:  x and y
t = -0.30777, df = 197.83, p-value = 0.3793
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf 0.1817153
sample estimates:
  mean of x   mean of y
-0.13654894 -0.09496258

The p-value is greater than the usual significance levels, so there is not enough evidence to reject the null hypothesis of equal means in favor of the alternative that the mean of X is lower than the mean of Y.

Greater mean

\(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y.
\(H_1\): The mean of the distribution of X is GREATER than the mean of the distribution of Y.

# Sample data
set.seed(10)
x <- rnorm(100, mean = 3)
y <- rnorm(100)

# Is mean of 'x' greater than mean of 'y'?
t.test(x = x, y = y, alternative = "greater")

Welch Two Sample t-test

data:  x and y
t = 21.894, df = 197.83, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 2.735112      Inf
sample estimates:
  mean of x   mean of y
 2.86345106 -0.09496258

In this case, the p-value is almost 0, which implies that there is strong evidence to reject the null hypothesis of equal means.

Equal variances

By default, t.test assumes different population variances. However, if an F-test (e.g., conducted with the var.test function) does not provide sufficient evidence to reject the null hypothesis of equal variances, you can set var.equal = TRUE. This setting enables the use of a pooled variance estimate for the calculation.

# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Independent samples t-test with equal population variances
t.test(x = x, y = y, var.equal = TRUE)

Two Sample t-test

data:  x and y
t = -0.30777, df = 198, p-value = 0.7586
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3080493  0.2248766
sample estimates:
  mean of x   mean of y
-0.13654894 -0.09496258

The p-value is greater than the usual significance levels, which implies that there is not enough evidence to reject the null hypothesis of equal means.
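The F-test mentioned in this section can be run with var.test before choosing the value of var.equal. The sketch below shows one possible workflow; the 0.05 threshold and the name equal_var are assumptions made for this example, not a rule imposed by R.

```r
# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# F-test: the null hypothesis is that the ratio of the two
# population variances is equal to 1
f_res <- var.test(x, y)
f_res$p.value

# Assumed decision rule: treat the variances as equal when the
# F-test does not reject at the 0.05 level
equal_var <- f_res$p.value > 0.05

# Pooled-variance t-test if equal_var is TRUE, Welch test otherwise
res <- t.test(x = x, y = y, var.equal = equal_var)
res$method
```

Keep in mind that the F-test is itself sensitive to departures from normality, so leaving the default Welch test in place is a reasonable choice when in doubt.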

Paired t-test

Lastly, if the groups are dependent, you should specify paired = TRUE to execute a paired samples t-test.

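# Sample data
set.seed(10)
x <- rnorm(100)
# Note: sqrt() returns NaN (with a warning) for the negative values of 'x';
# t.test() drops those incomplete pairs, which is why the output reports df = 43
x_2 <- sqrt(x)

# Paired samples t-test
t.test(x = x, y = x_2, paired = TRUE)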

Paired t-test

data:  x and x_2
t = -2.152, df = 43, p-value = 0.03705
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.136882092 -0.004443087
sample estimates:
mean difference
    -0.07066259

In this test the p-value is 0.03705, so the null hypothesis of equal means (a true mean difference of 0) can be rejected at the 0.05 and 0.1 significance levels, but there is not enough evidence to reject it at the 0.01 level.
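A paired t-test is equivalent to a one sample t-test on the pairwise differences with \(\mu_0 = 0\). The following sketch checks this with hypothetical before/after measurements on the same 30 subjects; the data and the names before and after are invented for the illustration.

```r
# Hypothetical before/after measurements on the same 30 subjects
set.seed(10)
before <- rnorm(30, mean = 50, sd = 5)
after <- before + rnorm(30, mean = 1, sd = 2)

# Paired samples t-test
res_paired <- t.test(x = after, y = before, paired = TRUE)

# One sample t-test on the pairwise differences
res_diff <- t.test(x = after - before, mu = 0)

# Both approaches should give the same statistic and p-value
c(paired = unname(res_paired$statistic), differences = unname(res_diff$statistic))
c(paired = res_paired$p.value, differences = res_diff$p.value)
```

This also shows why the normality assumption of the paired test concerns the differences between pairs rather than each group separately.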