In this article, we will build a clear understanding of:
- the definition of a t-test,
- the terms and factors involved in conducting a t-test, such as mean, variance, standard deviation, significance level, degrees of freedom, t-value, and p-value,
- the assumptions behind a t-test,
- the different kinds of t-tests,
- how to perform each kind of t-test, and
- how to set up a t-test calculation in MS Excel.
What is a t-test?
A t-test (also called Student's t-test) is a statistical tool for determining whether the difference between the means of two groups is statistically significant. The groups may be related in certain features. As a hypothesis-testing tool, it lets us assess how likely it is that the observed difference arose by chance alone.
To compare the means of more than two groups, we use an ANOVA test instead, followed by post-hoc tests when multiple pairwise comparisons are needed.
For example, we can use a t-test to determine whether the recovery time from a common cold differs between people taking natural remedies and people taking over-the-counter (OTC) pharmaceutical remedies. Suppose the cold lasts a couple of days when treated with naturopathy and about a week when treated with OTC remedies.
Suppose further that surveying other people shows the same pattern. A t-test compares the two groups' means to estimate how likely such a difference would be if it were due to chance alone, which helps us judge whether the result reflects a real effect rather than a coincidence.
T-test explanation
A t-test compares the mean values of two data sets and helps us conclude whether they could have come from populations with the same mean. Mathematically, we take a sample from each of the two population groups to calculate the t-test. The t-test is typically applied to small samples (commonly n < 30) when the population standard deviation is unknown.
We set up the problem by assuming that the two means are equal (the null hypothesis). Using the formula for the relevant kind of t-test, we calculate the t-value and compare it against the critical value from the t-distribution table. Based on this comparison, we either reject the null hypothesis or fail to reject it.
In any kind of t-test, there are two possible hypotheses:
- Null hypothesis: there is no significant difference between the two groups
- Alternative hypothesis: there is a significant difference between the two groups
If the null hypothesis is rejected, it implies that the observed difference is unlikely to be due to chance alone.
Factors involved in conducting a t-test
Before diving into the t-test calculations, we need to understand a few basic terms, each of which is a crucial part of a t-test calculation:
Mean: Also called the average, it is the sum of all the values divided by the number of values. For example, the mean of 6, 12, and 18 is (6 + 12 + 18)/3 = 12.
Variance (s²): It measures how spread out a data set is, so it is also called a measure of variability. It is the average squared deviation of the values from the mean. For example, the mean of 2, 4, and 6 is 4, and the variance is [(2 - 4)² + (4 - 4)² + (6 - 4)²]/3 = 8/3 ≈ 2.667. Dividing by n like this gives the population variance; the sample variance used in t-test formulas divides by n - 1 instead, which here gives 8/2 = 4.
Standard deviation (σ, or s for a sample): It measures the amount of dispersion in a set of values and is the square root of the variance. A low standard deviation means the values lie close to the mean.
Significance level (α): It is the threshold probability used to decide whether to reject the null hypothesis; it equals the probability of rejecting the null hypothesis when it is actually true. In principle any value between 0 and 1 can be used, but the conventional choices are 0.01, 0.05, and 0.1.
Degrees of freedom (DF): It is the number of observations in a data set that are free to vary while estimating statistical parameters.
t-value: It is the result of a t-test calculation and determines whether to reject the null hypothesis. A t-value of zero means that the sample result exactly matches the null hypothesis. As the difference between the sample estimate and the null hypothesis grows, the absolute value of the t-value increases.
P-value: Assuming the null hypothesis is true, the p-value is the probability of obtaining results at least as extreme as those actually observed. A smaller p-value provides stronger evidence against the null hypothesis and in favor of the alternative hypothesis.
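The mean, variance, and standard deviation examples above can be reproduced with Python's standard `statistics` module. This is a minimal sketch: `statistics.pvariance` divides by n (matching the 2.667 example), while `statistics.variance` and `statistics.stdev` divide by n - 1, which is the sample version that t-test formulas rely on.

```python
import statistics

# Mean example: (6 + 12 + 18) / 3
mean_value = statistics.mean([6, 12, 18])

data = [2, 4, 6]
# Population variance: average squared deviation from the mean (divides by n)
pop_variance = statistics.pvariance(data)
# Sample variance: divides by n - 1, the version used in t-test formulas
sample_variance = statistics.variance(data)
# Sample standard deviation: square root of the sample variance
sample_sd = statistics.stdev(data)

print(mean_value)              # 12
print(round(pop_variance, 3))  # 2.667
print(sample_variance)         # 4
print(sample_sd)               # 2.0
```

The gap between 2.667 and 4 shows why it matters which variance a formula expects: with small samples, dividing by n - 1 noticeably enlarges the estimate.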
Different kinds of t-test
There are three kinds of t-tests:
- One-sample t-test: It is done to determine whether the mean of a population differs from a hypothesized value. It checks whether the selected sample could belong to a population with a specific mean value.
- Independent two-sample t-test: It is done to compare the means of two independent groups to check whether their population means differ significantly from one another.
- Paired t-test: It is done to compare the means of two related sets of measurements, such as the same subjects measured before and after a treatment, or matched pairs drawn from the same population.
Assumptions for t-test calculation
The first assumption concerns the scale of measurement: the collected data are assumed to be measured on a continuous or ordinal scale.
The second assumption is that the sample is collected randomly from the population.
The third assumption is normality: when plotted on a graph, the data should approximately follow a bell-shaped normal distribution.
The fourth assumption is homogeneity of variance: the samples' standard deviations should be approximately equal.
t-test calculation
We need three key data values to calculate a t-test:
- Mean difference: the difference between the mean values from each data set.
- Each group’s standard deviation
- The number of data values in each group
The value obtained in a t-test is called the t-value. It is compared with a critical value read from the critical value table, known as the t-distribution table. The comparison tells us whether the difference in means could plausibly be due to chance.
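As an illustration of how these three inputs combine, here is a sketch for the independent two-sample case that assumes equal variances, so the two standard deviations are pooled. The function name `pooled_t_from_summary` and the example numbers are hypothetical. The pooled variance ((nA - 1)·sA² + (nB - 1)·sB²)/(nA + nB - 2) is the same common variance estimator used later for the independent two-sample test, written in terms of each group's standard deviation.

```python
import math

def pooled_t_from_summary(mean_diff, sd_a, sd_b, n_a, n_b):
    """t-value for two independent groups from summary statistics (equal variances assumed)."""
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its own degrees of freedom
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var / n_a + pooled_var / n_b)
    return mean_diff / standard_error, df

# Hypothetical summaries: mean difference 4, both SDs sqrt(2.5), 5 values per group
t, df = pooled_t_from_summary(4, math.sqrt(2.5), math.sqrt(2.5), 5, 5)
print(round(t, 6), df)  # 4.0 8
```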
t-distribution table
The t-distribution table is available in two formats: one-tail and two-tail.
A one-tail t-distribution table is used when the alternative hypothesis is directional, that is, when we only care whether a value lies beyond a threshold in one direction. For example, we might test only whether the mean recovery time with one remedy is shorter than with the other.
A two-tail t-distribution table is used when a difference in either direction matters, so the analysis is range-bound. For example, we might check whether the t-value lies between -3 and 3 or falls outside that range on either side.
t-test calculations can be done with any standard software that supports the required statistical functions, such as MS Excel.
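The difference between one-tail and two-tail lookups can be sketched in Python with only the standard library. This is an illustrative sketch rather than a replacement for a statistics package: the helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, the cumulative probability is approximated by Simpson's-rule integration of the t density, and the critical value is found by bisection over an assumed search range of 0 to 100.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df, tails):
    """Bisection search for the cutoff whose upper-tail area is alpha / tails."""
    target = 1 - alpha / tails
    lo, hi = 0.0, 100.0  # assumed search range, wide enough for common alpha and df
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# alpha = 0.05 with 10 degrees of freedom
print(round(t_critical(0.05, 10, tails=1), 3))  # 1.812
print(round(t_critical(0.05, 10, tails=2), 3))  # 2.228
```

The one-tail cutoff puts the whole 5% in a single tail, while the two-tail cutoff splits it into 2.5% on each side, so the two-tail critical value is larger. Both numbers match the usual printed t-table entries for 10 degrees of freedom.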
How to do a one-sample t-test
- Define the null and alternative hypotheses.
- Define the significance level.
- Calculate the degrees of freedom (DF).
DF = n-1
- Calculate t-value.
$$t = \frac{\bar{x} - M}{s/\sqrt{n}}$$
$\bar{x}$ signifies the observed sample mean
$M$ signifies the hypothesized population mean (null hypothesis)
$s$ signifies the sample's standard deviation
$n$ signifies the number of observations in the sample
- Calculate p-value
- Draw a conclusion about the null hypothesis: reject it when the p-value is less than the significance level.
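The one-sample steps above can be sketched in Python with only the standard library. This is a hedged sketch: the helper names are hypothetical, the five data values are made up for illustration, and the two-sided p-value is approximated by Simpson's-rule integration of the t density (in practice, a library routine such as SciPy's `scipy.stats.ttest_1samp` would normally be used instead).

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area from -|t| to |t|, by symmetry
    return 1 - central_area

def one_sample_t_test(sample, hypothesized_mean):
    n = len(sample)
    x_bar = statistics.mean(sample)  # observed sample mean
    s = statistics.stdev(sample)     # sample standard deviation (divides by n - 1)
    t = (x_bar - hypothesized_mean) / (s / math.sqrt(n))
    df = n - 1
    return t, df, two_sided_p(t, df)

# Hypothetical data: five measurements tested against M = 10
sample = [12, 14, 10, 16, 13]
t, df, p = one_sample_t_test(sample, 10)
alpha = 0.05
print(round(t, 6), df, round(p, 4))                       # 3.0 4 0.0399
print("reject H0" if p < alpha else "fail to reject H0")  # reject H0
```

With these made-up numbers the sample mean is 13 and s/√n works out to 1, so t = 3.0 with 4 degrees of freedom. The resulting p ≈ 0.0399 would lead us to reject the null hypothesis at α = 0.05 but not at α = 0.01.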
How to do an independent two-sample t-test?
- Define the null and alternative hypotheses.
- Define significance level.
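- Calculate the degrees of freedom (DF). Because there are two samples, DF depends on both sample sizes. When the common variance estimator below is used (equal variances assumed), DF = nA + nB - 2. If the two variances are not assumed to be equal, a conservative rule of thumb is to use the smaller of nA - 1 and nB - 1.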
- Calculate the t-value.
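$$t = \frac{m_A - m_B}{\sqrt{\dfrac{s^2}{n_A} + \dfrac{s^2}{n_B}}}$$
$m_A$ = mean of sample A
$m_B$ = mean of sample B
$n_A$ = size of sample A
$n_B$ = size of sample B
$s^2$ = common variance estimator of the two samples
$$s^2 = \frac{\sum_{x \in A} (x - m_A)^2 + \sum_{x \in B} (x - m_B)^2}{n_A + n_B - 2}$$
$n_A + n_B - 2$ is the degrees of freedom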
- Calculate the p-value.
- When the p-value is less than the significance level, reject the null hypothesis.
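The independent two-sample steps can be sketched the same way. This sketch assumes equal variances, so it uses the pooled (common) variance estimator and DF = nA + nB - 2; the unequal-variance variant is not shown. The helper names and both groups of measurements are hypothetical, and the p-value is again approximated by numerical integration of the t density.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

def pooled_two_sample_t_test(a, b):
    """Independent two-sample t-test assuming equal variances (pooled variance)."""
    n_a, n_b = len(a), len(b)
    m_a, m_b = statistics.mean(a), statistics.mean(b)
    # Sum of squared deviations from each sample's own mean
    ss_a = sum((x - m_a) ** 2 for x in a)
    ss_b = sum((x - m_b) ** 2 for x in b)
    df = n_a + n_b - 2
    s2 = (ss_a + ss_b) / df  # common variance estimator
    t = (m_a - m_b) / math.sqrt(s2 / n_a + s2 / n_b)
    return t, df, two_sided_p(t, df)

# Hypothetical measurements for two independent groups
group_a = [20, 22, 19, 21, 18]
group_b = [16, 17, 15, 18, 14]
t, df, p = pooled_two_sample_t_test(group_a, group_b)
print(round(t, 6), df, round(p, 5))  # 4.0 8 0.00395
```

Here mA = 20, mB = 16, and s² = 20/8 = 2.5, so the standard error is exactly 1 and t = 4.0. With p ≈ 0.004, the null hypothesis would be rejected at any of the conventional significance levels.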
How to do a paired t-test?
- Define paired differences (d).
$$d = x_1 - x_2$$
$x_1$ = value of x in data set one; $x_2$ = value of x in data set two that is paired with $x_1$
- Define the null and alternative hypotheses.
- Define the significance level.
- Calculate the degrees of freedom (DF = n - 1), where n is the number of paired observations.
- Calculate t-value.
$$t = \frac{\bar{d} - M}{s_d/\sqrt{n}}$$
$\bar{d}$ = sample mean of the differences between paired observations
$M$ = hypothesized mean difference (null hypothesis)
$s_d$ = standard deviation of the d values
- Calculate the p-value.
- We reject the null hypothesis when the p-value is less than the significance level.
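A paired t-test is a one-sample t-test applied to the paired differences d = x1 - x2, which the sketch below makes explicit. The helper names and the before/after scores are hypothetical, and the p-value is approximated numerically as in a plain standard-library setting.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

def paired_t_test(x1, x2, hypothesized_diff=0):
    """Paired t-test: one-sample t-test on the differences d = x1 - x2."""
    if len(x1) != len(x2):
        raise ValueError("paired data sets must have the same length")
    d = [a - b for a, b in zip(x1, x2)]  # paired differences
    n = len(d)
    d_bar = statistics.mean(d)  # sample mean difference
    s_d = statistics.stdev(d)   # standard deviation of the d values
    t = (d_bar - hypothesized_diff) / (s_d / math.sqrt(n))
    df = n - 1
    return t, df, two_sided_p(t, df)

# Hypothetical scores for the same five people after (x1) and before (x2) a treatment
after = [72, 69, 80, 78, 71]
before = [70, 65, 80, 72, 68]
t, df, p = paired_t_test(after, before)
print(round(t, 6), df, round(p, 4))  # 3.0 4 0.0399
```

Working on the differences removes person-to-person variation, which is why pairing is preferred when the same subjects are measured twice.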
How to set up a t-test in MS Excel
First, make sure the Analysis ToolPak add-in is installed. To check, click 'Data' on the menu bar and look for 'Data Analysis' in the 'Analyze' section. If 'Data Analysis' is absent, you can enable the add-in for free.
To enable it, click 'File' and then 'Options.' Click 'Add-ins' and then 'Go.' In the popup that appears, check 'Analysis ToolPak' and click 'OK.'
Once the Analysis ToolPak is enabled, 'Data Analysis' appears on the Data menu. Clicking it displays the different analyses we can perform, including three t-test options: 't-Test: Paired Two Sample for Means', 't-Test: Two-Sample Assuming Equal Variances', and 't-Test: Two-Sample Assuming Unequal Variances'.
A t-test can also be done online.
Commonly Asked Questions
When do we use a t-test?
A t-test is used for hypothesis testing. It compares the means of two groups, and it is used when we need to determine whether a process or treatment has a real effect on a particular population, or how different two groups are from each other.
A t-test can only be used for pairwise comparisons, that is, comparisons between two groups at a time.
For example, we can use a t-test to determine whether the mean petal length of a flower differs between species. We choose two flower species for the calculation and measure an equal number of petals from each species.
When do we reject the null hypothesis in a t-test?
If the absolute value of the t-value is greater than the critical value, we reject the null hypothesis. If it is less than the critical value, we fail to reject the null hypothesis.
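The critical-value rule can be written as a tiny Python function. The function name `decide` is hypothetical, the t-values passed in are made-up examples, and 2.228 is the standard two-tailed t-table entry for α = 0.05 with 10 degrees of freedom.

```python
def decide(t_value, critical_value):
    """Reject H0 when |t| exceeds the critical value from the t-distribution table."""
    return "reject H0" if abs(t_value) > critical_value else "fail to reject H0"

# Two-tailed critical value for alpha = 0.05 and DF = 10, from a standard t-table
CRITICAL_DF10_ALPHA05 = 2.228

print(decide(2.5, CRITICAL_DF10_ALPHA05))   # reject H0
print(decide(-2.5, CRITICAL_DF10_ALPHA05))  # reject H0 (only |t| matters in a two-tailed test)
print(decide(1.9, CRITICAL_DF10_ALPHA05))   # fail to reject H0
```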
What does the p-value signify?
The p-value tells us the probability, assuming the null hypothesis is true, of observing data at least as extreme as ours. A p-value of 0.05 means that, if the null hypothesis were true, there would be a 5% probability of observing a t-value at least as extreme as the one we obtained.
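This interpretation can be checked with a small simulation. The sketch rests on assumptions not stated in the text above: samples of size 10 are drawn from a normal population whose mean really equals the hypothesized value (so the null hypothesis is true), and the two-tailed critical value 2.262 for α = 0.05 with 9 degrees of freedom is taken from a standard t-table. Because |t| > 2.262 is equivalent to p < 0.05 here, the fraction of such "extreme" results should come out close to 5%.

```python
import math
import random
import statistics

def one_sample_t(sample, hypothesized_mean):
    """One-sample t-value: (sample mean - M) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - hypothesized_mean) / (statistics.stdev(sample) / math.sqrt(n))

random.seed(42)       # fixed seed so the run is reproducible
CRITICAL_DF9 = 2.262  # two-tailed critical value, alpha = 0.05, DF = 9 (standard t-table)
trials = 5000
extreme = 0
for _ in range(trials):
    # Ten values from a normal population whose true mean is 0, so H0 (mean = 0) is true
    sample = [random.gauss(0, 1) for _ in range(10)]
    if abs(one_sample_t(sample, 0)) > CRITICAL_DF9:
        extreme += 1  # |t| beyond the cutoff, i.e. p < 0.05

rate = extreme / trials
print(rate)  # close to 0.05
```

The exact rate varies slightly with the seed, but it stays near 0.05, which is what a 5% significance level promises when the null hypothesis is true.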