t- test

History About Student t- Distribution

t-static: The t-statistic was first discovered by Englishman W.S. Gosset, who published it under the pseudonym “Student” in his 1908 research paper entitled “The Probable Error of the Mean.” Therefore, this t-statistic is called Student’s t-statistic. Later on, Prof. R.A. Fisher developed and further defined the t-statistic in 1926, and it became known as Fisher’s t-statistic. (Source: From the book “Probability and Inference,” page 273)

Definition of Student’s t-Statistic

Let a random samples \((x1,x2,...,xn)\) of size \(n\) be drawn from normal population with mean \(\mu\) and variance \(\delta^2\). Then student’s t-statistic is defined as, \(t= \frac{\bar(x)-\mu}{\frac{s}{\sqrt(n)}}\)

Which follows student’s t-distribution with \((n-1)\) degree of freedom, where \(bar(x)=\frac{ \sum(x_i)}{n}\) is the sample mean and \(s^2 = \frac{1}{n-1}\sum(x_i-\bar(x))^2\) is the sample variance.

When to use t- test

Here is the passage in correct form:

The t-test is a parametric test; therefore, it approximates the normal distribution as the sample size increases. When choosing a t-test, two considerations need to be taken into account: whether the groups being compared come from a single population or two different populations, and whether you want to test the difference in a specific direction. Source

Types of t-test

  • One-sample, two-sample, or paired t-test: If the groups come from a single population.
  • Two-sample t-test: If the groups come from two different populations.
  • Multisample t-test: If the groups come from the several populations.

Before Diving into T-test

We need to know,

  • Types of Hypothesis
  • p-value
  • Confidence Interval

Hypothesis Testing with t-test

There are two types of hypotheses:

  1. Null Hypothesis (\(H_0\)): The null hypothesis is also defined as the hypothesis of no difference. It should be a simple hypothesis and is tested for possible rejection based on sample observations drawn from the population. It is accepted when the p-value is greater than α (usually 0.05).

  2. Alternative Hypothesis (\(H_1\)): Any hypothesis that is mutually exclusive and complementary to the null hypothesis is called the alternative hypothesis. It is accepted when the p-value is less than \(alpha\) (usually 0.05).

They are tested like,

  • \(H_0\) : \(bar(x)\) = \(mu\) (Sample mean and populations are equal or sample is taken from same population.)

  • \(H_1\): \(bar(x) \neq \mu\) (Sample mean not equal to population mean or sample is not taken from same population.)

p-value

In statistics, the p-value represents the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. A smaller p-value indicates stronger evidence in favor of the alternative hypothesis.

Confidence Interval

A confidence interval represents the range of values around the mean within which a parameter is likely to fall with a specified level of confidence.

T Statistics

\[t = \frac{\bar{x} - \mu}{\frac{S} {\sqrt{n}}}\]

Where, \(bar{x}\) is sample mean, \(mu\) is population mean and \(S\) is sample standard deviation and \(n\) is sample size.

If the test statistics is above the value from the T table, we reject the null hypothesis. Which is opposite to the p-value.

From Wikipedia.

R code for testing the hypothesis with t -test

Here I have used builtin data mtcars to test hypothesis. A data frame with 32 observations on 11 (numeric) variables.

[, 1]	mpg	Miles/(US) gallon
[, 2]	cyl	Number of cylinders
[, 3]	disp	Displacement (cu.in.)
[, 4]	hp	Gross horsepower
[, 5]	drat	Rear axle ratio
[, 6]	wt	Weight (1000 lbs)
[, 7]	qsec	1/4 mile time
[, 8]	vs	Engine (0 = V-shaped, 1 = straight)
[, 9]	am	Transmission (0 = automatic, 1 = manual)
[,10]	gear	Number of forward gears
[,11]	carb	Number of carburetors

One Sample T test

we want to hypothesize that the mean ‘mpg’ is very close to 20. To determine whether I can make this claim or not, I will use a one-sample t-test. We will use a significance level of 0.05, which represents the percentage of error that we can tolerate.

#load the data as dataframe.
df <- data.frame(mtcars)
t.test(df$mpg,mu = 20)
	One Sample t-test

data:  df$mpg
t = 0.08506, df = 31, p-value = 0.9328
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
 17.91768 22.26357
sample estimates:
mean of x 
 20.09062 

Above code is the example of One Sample t-test where \(mu\) = 20 is our hypothesized mean. Looking over different parameters:

  • df = 31: We have degree of freedom 31 (i.e. we have 32 rows and df= n-1).
  • t = 0.08506: Our level of significance is 0.05. Now we go to t-table and find the value associated with it. We look into df as 31, alpha as 0.05 and the value was 2.042 (taken from here). Since our calculated t statistics is smaller than tabulated value, we conclude that there is no evidence of two means being different. Or in other words, we were unable to reject the null hypothesis.
  • p-value = 0.9328: We compare this p-value with our level of significance and this value resembles level of error we could accept. Since this is huge than our level of significance, we conclude that there is not a strong evidence against null hypothesis.

In conclusion, there is not a significant difference between hypothesized mean and data’s mean.

Two Sample T test

We use two sample T test to test whether two means come from same population or not.

Group the mtcars’s mpg on the basis of am column. Our goal is to find whether there is difference in average of mpg in each group of am. We have 0 as automatic and 1 as manual for am. See above table for more info.

t.test(mpg~am,var.equal= T, data = mtcars)
	Two Sample t-test

data:  mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -10.84837  -3.64151
sample estimates:
mean in group 0 mean in group 1 
       17.14737        24.39231 

Here we saw value for t test is -4.1061, degree of freedom = 30, and p value is 0.000285 from this p value we have strong evidence to reject \(H_0\). We accept \(H_1\) which means that two samples are not taken from same population.

As the p-value is smaller than our chosen level of significance (0.05), we have sufficient grounds to reject the null hypothesis. Therefore, we can conclude that there is a statistically significant difference in the average ‘mpg’ between the two groups.

Comments