What is Hypothesis Testing?

Hypothesis testing is a type of inferential statistics in which results from a (random) sample are extrapolated to the entire population. It is used to make decisions based on statistical tests and models, using the p-value, which is compared against the significance level (alpha, the probability of a Type I error).

Type I Error : When we reject a true null hypothesis, it is called a type I error.

Type II Error : When we fail to reject a false null hypothesis, it is called a type II error.

Hypothesis testing can be carried out using parametric or non-parametric methods/models.

Parametric : These methods make certain assumptions about the data (model) and/or the errors, and those assumptions must be validated before the results can be accepted.

Non-parametric : These methods make no assumptions about the data distribution (model) or the errors.

Why use a parametric test?

Parametric tests are regarded as “more powerful” because they are based on the mean, standard deviation, and the normal distribution. Non-parametric tests are based on the median, the IQR, and non-normal distributions, and are therefore deemed “less powerful” than parametric tests/models.

Two Statistical Hypotheses

Null Hypothesis : It is also known as the hypothesis of no difference.

Alternative Hypothesis : It is complementary to the null hypothesis and is also known as the research hypothesis.

When to Accept the Null or Alternative Hypothesis

We accept (fail to reject) the null hypothesis from parametric or non-parametric tests when the p-value is > 0.05 (goodness-of-fit tests).

To accept the alternative hypothesis from parametric or non-parametric tests (research hypothesis tests), the p-value must be less than 0.05.
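As a quick illustration, this decision rule can be written in R as follows (a minimal sketch, assuming a significance level of 0.05 and a p-value already computed by some test):

# minimal sketch of the decision rule (significance level alpha assumed to be 0.05)
alpha <- 0.05
p_value <- 0.93   # placeholder p-value; replace with the value returned by your test
if (p_value > alpha) {
  print("Fail to reject the null hypothesis")
} else {
  print("Reject the null hypothesis in favour of the alternative")
}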

Some Commonly Used Parametric Tests Using R

One-Sample Z-Test on the mtcars Data

In this blog I am only going to show how to run a one-sample z-test in R, without explaining what the z-test is or how it works, because I already covered that in a previous blog.

# define the parameters for the one-sample z-test
mu0 <- 20                   # hypothesised population mean
sigma <- 6                  # known population standard deviation
xbar <- mean(mtcars$mpg)    # sample mean
n <- length(mtcars$mpg)     # sample size
z <- sqrt(n) * (xbar - mu0) / sigma   # z statistic
p_value <- 2 * pnorm(-abs(z))         # two-sided p-value

Let’s check the z value and the p-value:

z
## [1] 0.08544207

Hence, we found the value of z to be 0.08544207.

p_value
## [1] 0.9319099

We found a p-value of 0.9319099, which is > 0.05; hence we accept the null hypothesis, i.e. the means of the sample and the population are equal.
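For convenience, the same calculation can be wrapped in a small helper function. This z.test() function is just an illustrative sketch and is not part of base R:

# illustrative helper for a one-sample z-test (not part of base R)
z.test <- function(x, mu0, sigma) {
  n <- length(x)
  z <- sqrt(n) * (mean(x) - mu0) / sigma
  p_value <- 2 * pnorm(-abs(z))   # two-sided p-value
  list(z = z, p_value = p_value)
}
z.test(mtcars$mpg, mu0 = 20, sigma = 6)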

Why is there no one-sample z-test in base R?

Because the t-distribution behaves like the z-distribution for n >= 30, the t-test can be employed for both small and large samples. Thus, we don’t need a one-sample z-test in R!

One-Sample t-test: It works for small samples as well as for large samples.

t.test(mtcars$mpg, mu =20)
## 
##  One Sample t-test
## 
## data:  mtcars$mpg
## t = 0.08506, df = 31, p-value = 0.9328
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  17.91768 22.26357
## sample estimates:
## mean of x 
##  20.09062

Hence we obtained a p-value of 0.9328, which means we do not reject the null hypothesis.

Two-Sample t-test

It is used to compare the means of a dependent variable across the two categories of a grouping (independent) variable. For instance, we can compare exam scores (dependent variable) between male and female groups of students!
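As a quick sketch of how such a comparison looks in R (the data frame students and its columns score and group are hypothetical, made up only for illustration):

# hypothetical toy data: exam scores for two groups of students
students <- data.frame(
  score = c(72, 65, 80, 70, 68, 75, 82, 78, 74, 69),
  group = rep(c("male", "female"), each = 5)
)
t.test(score ~ group, data = students, var.equal = TRUE)   # two-sample Student t-test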

Assumptions

  • For each category, the dependent variable must follow the normal distribution (Test of normality-GOF)

  • The variance is homogeneous (i.e. equal) across independent variable categories (Test of equal variance-GOF)

What to do if the variances across independent variable categories are not equal?

In this case we use the Welch t-test (a sketch of the call follows the assumptions below).

Assumptions

  • For each category, the dependent variable must follow the normal distribution (Test of normality-GOF)

  • Variances across independent variable categories are not homogeneous, i.e. not equal.
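In R, the Welch test is actually the default behaviour of t.test(): leaving out var.equal = TRUE (or setting it to FALSE) gives the Welch version. A minimal sketch on the mtcars data (output not shown):

# Welch two-sample t-test: unequal variances allowed (default in t.test)
t.test(mpg ~ am, data = mtcars, var.equal = FALSE)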

Let’s do the normality test on the mtcars data

with(mtcars, shapiro.test(mpg[am == 0]))
## 
##  Shapiro-Wilk normality test
## 
## data:  mpg[am == 0]
## W = 0.97677, p-value = 0.8987

Here, the p-value is 0.8987. Hence, we do not reject the null hypothesis, which means mpg for this group (am == 0) follows a normal distribution.

with(mtcars, shapiro.test(mpg[am == 1]))
## 
##  Shapiro-Wilk normality test
## 
## data:  mpg[am == 1]
## W = 0.9458, p-value = 0.5363

It also follows a normal distribution. Hence the first condition is satisfied, i.e. the dependent variable mpg follows a normal distribution in both groups.

Variance Check

var.test(mpg ~ am, data = mtcars)
## 
##  F test to compare two variances
## 
## data:  mpg by am
## F = 0.38656, num df = 18, denom df = 12, p-value = 0.06691
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.1243721 1.0703429
## sample estimates:
## ratio of variances 
##          0.3865615

We can see the p-value is 0.06691, which is greater than 0.05. Hence we can say the variances across the independent variable categories are the same. Now we can use the two-sample Student t-test.

t.test(mpg ~ am, var.equal= T, data = mtcars)
## 
##  Two Sample t-test
## 
## data:  mpg by am
## t = -4.1061, df = 30, p-value = 0.000285
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -10.84837  -3.64151
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231

Here we see a p-value of 0.000285, which is less than 0.05. Hence we reject H0, which means mileage (mpg) is statistically different between cars with automatic and manual transmission systems.

Let’s check the two-sample Student t-test result against a simple linear regression model

summary(lm(mpg ~ am, data = mtcars))
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

This difference is statistically significant, and the p-value is the same as that given by the two-sample t-test.

What test should we use if we have to compare the means of more than two samples?

If we need to compare the means of more than two samples, we use the 1-way ANOVA test.

Assumptions

  • Dependent variable must be “normally distributed”

  • The variance across categories must be the same

1-way ANOVA assumption checks

Normality by categories

with(mtcars, shapiro.test(mpg[gear == 3]))
## 
##  Shapiro-Wilk normality test
## 
## data:  mpg[gear == 3]
## W = 0.95833, p-value = 0.6634

Category 3 follows a normal distribution.

with(mtcars, shapiro.test(mpg[gear == 4]))
## 
##  Shapiro-Wilk normality test
## 
## data:  mpg[gear == 4]
## W = 0.90908, p-value = 0.2076

Category 4 also follows a normal distribution.

with(mtcars, shapiro.test(mpg[gear == 5]))
## 
##  Shapiro-Wilk normality test
## 
## data:  mpg[gear == 5]
## W = 0.90897, p-value = 0.4614

So, the dependent variable follows a normal distribution in all three categories.

Let’s do the variance test

In the case of more than two samples we do not use var.test(). Instead we use leveneTest(), available in the car package. Let’s check. Before doing this we need to convert our independent variable into a factor.

library(car)
## Loading required package: carData
leveneTest(mpg ~ as.factor(gear), data=mtcars)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  1.4886 0.2424
##       29

Here, we find a p-value of 0.2424, which is greater than 0.05. Hence the variance across categories is the same. So we can now use the classical one-way ANOVA test.

1-Way Classical ANOVA test

summary(aov(mpg ~ gear, data = mtcars))
##             Df Sum Sq Mean Sq F value Pr(>F)   
## gear         1  259.7  259.75   8.995 0.0054 **
## Residuals   30  866.3   28.88                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We find a p-value less than 0.05. Hence we reject the null hypothesis, which means the sample means are not all equal, so a post-hoc test or pairwise comparison is required. (Note that gear enters this aov() call as a numeric variable, which is why it has only 1 degree of freedom; the post-hoc call below treats it as a factor.) For classical 1-way ANOVA the TukeyHSD post-hoc test is best. Let’s use it.

TukeyHSD(aov(mpg ~ as.factor(gear), data = mtcars))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mpg ~ as.factor(gear), data = mtcars)
## 
## $`as.factor(gear)`
##          diff        lwr       upr     p adj
## 4-3  8.426667  3.9234704 12.929863 0.0002088
## 5-3  5.273333 -0.7309284 11.277595 0.0937176
## 5-4 -3.153333 -9.3423846  3.035718 0.4295874

Let’s check this result with a simple linear model

summary(lm(mpg ~ gear, data = mtcars))
## 
## Call:
## lm(formula = mpg ~ gear, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.240  -2.793  -0.205   2.126  12.583 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    5.623      4.916   1.144   0.2618   
## gear           3.923      1.308   2.999   0.0054 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.374 on 30 degrees of freedom
## Multiple R-squared:  0.2307, Adjusted R-squared:  0.205 
## F-statistic: 8.995 on 1 and 30 DF,  p-value: 0.005401
pairwise.t.test(mtcars$mpg, mtcars$gear, p.adj= "none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  mtcars$mpg and mtcars$gear 
## 
##   3       4    
## 4 7.3e-05 -    
## 5 0.038   0.218
## 
## P value adjustment method: none

The gear = 3 category is omitted from the result because, when gear is treated as a factor (as in the TukeyHSD call above), R automatically creates dummy variables for the three categories of gear (3, 4 and 5), uses only the last two of them in the model, and takes the first one as the reference level.
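To see this dummy coding explicitly, we can refit the model with gear as a factor (a minimal sketch; output not shown):

# refit with gear as a factor: gear = 3 becomes the reference level,
# and the coefficients for levels 4 and 5 measure differences from gear = 3
summary(lm(mpg ~ as.factor(gear), data = mtcars))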
