Example t-test and answers to sample size questions

2015-01-13

azim58 - Example t-test and answers to sample size questions




Spreadsheet file can be found here:
https://imtiewhl.livedrive.com/files/5183
or here
"F:\kurt\storage\CIM Research Folder\DR\2012\5-17-12\example t-test
5-17-12.xlsx"



Here are two questions that I would like to be able to answer.

  1. Given a certain standard deviation and number of samples in each of
two groups, how large would the difference between the two population
means need to be in order to achieve significance, and is this difference
detectable and greater than the standard deviation?

  2. Given a certain standard deviation and difference between two
population means, how many samples in each group would be necessary to
achieve enough significance to conclude that the two group means truly
are different (with a 5% false positive error)?

Note that in order to answer these questions the approximate standard
deviation would need to be known from an experiment or previous values in
the literature.

Excel has two nice functions for answering these questions: TDIST and
TINV, which go back and forth between t and p values (TDIST gives the p
value for a t value, and TINV gives the t value for a p value). Note that
determining the p value from the t value essentially involves determining
the area under the tail(s) of Student's t distribution curve beyond the t
value. However, the equations for doing this are quite large and involved,
so these Excel functions can greatly speed up the calculations.

Before answering these two questions, I will first go through an example
two-sample t-test using formulas that can handle unequal sample sizes and
unequal variances.

Let's say I have signal intensities for a particular spot on a microarray
due to tumor sera binding, and I also have signal intensities for a
particular spot on a microarray due to naive sera binding.


tumor1   tumor2   naive1   naive2
1297     1293     2519     2567


I then calculate the average and standard deviation for the tumor group
and the naive group.

tumor avg   tumor stdv   naïve avg   naïve stdv
1295        2.828427     2543        33.94113
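
As a cross-check outside the spreadsheet, here is a minimal Python sketch that reproduces these averages and standard deviations using only the standard library. The variable names are my own. statistics.stdev is the sample standard deviation (n - 1 in the denominator), which I am assuming is what the spreadsheet used, since it reproduces the values above.

```python
import statistics

# Signal intensities from the example above.
tumor = [1297, 1293]
naive = [2519, 2567]

# statistics.stdev divides by n - 1 (sample standard deviation).
tumor_avg = statistics.mean(tumor)
tumor_stdv = statistics.stdev(tumor)
naive_avg = statistics.mean(naive)
naive_stdv = statistics.stdev(naive)

print(tumor_avg, round(tumor_stdv, 6))
print(naive_avg, round(naive_stdv, 5))
```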


Next I need to calculate t from this information.


$$t = \frac{\overline{X}_1 - \overline{X}_2}{s_{\overline{X}_1 - \overline{X}_2}}$$

where

$$s_{\overline{X}_1 - \overline{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$


With this data the value for t becomes:

n   Sx1-x2     t
2   24.08319   51.82038

Note that when computing the t value, the larger mean must be used as x1
so that t is positive; alternatively, the absolute value of the difference
can be used. (TDIST does not accept a negative t value.)
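
The calculation of "Sx1-x2" and t can be sketched in Python as follows. This is only an illustration of the two formulas above; the function name welch_t is my own, and it takes the absolute value of the difference so the order of the groups does not matter.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (standard error of the difference, t) for two groups."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Absolute value so that t is positive whichever group is listed first.
    t = abs(mean1 - mean2) / se
    return se, t

# Tumor group first, naive group second (values from the tables above).
se, t = welch_t(1295, 2.828427, 2, 2543, 33.94113, 2)
print(round(se, 5), round(t, 5))
```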

Next before using the t value to determine the p value, we need to
determine the degrees of freedom.



$$\mathrm{d.f.} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}.$$


For this data I obtain


df
1.013888
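
The degrees of freedom formula above (usually called the Welch-Satterthwaite approximation) can be sketched in Python like this. The function name welch_df is my own choice for illustration.

```python
def welch_df(sd1, n1, sd2, n2):
    """Approximate degrees of freedom for unequal variances and sample sizes."""
    v1 = sd1**2 / n1  # s_1^2 / n_1
    v2 = sd2**2 / n2  # s_2^2 / n_2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df = welch_df(2.828427, 2, 33.94113, 2)
print(round(df, 6))
```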


The Excel function TDIST(t, df, # of tails) can then be used to determine
the p value. Researchers often set the "alpha" value to 0.05, and
consider the means of the two populations to be significantly different if
the p value is below this alpha value. Note that alpha is also the Type I
error rate or the false positive error rate. I just think of it as the
probability of concluding that the two means are different when they
actually are not. Note that you can never really accept the alternative
hypothesis; you can just reject the null hypothesis (that the two
population means are equal), and alpha is the probability of rejecting it
when it is actually true.

With this data I obtain:

p value
0.012284
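
For anyone without Excel, the p value can be approximated by numerically integrating Student's t density. The sketch below uses Simpson's rule with the standard library only; the function name two_tailed_p and the step count are my own choices. One thing to be aware of: TDIST truncates a non-integer df to a whole number, so the spreadsheet value corresponds to df = 1, and using the fractional df of 1.013888 gives a slightly smaller p value.

```python
import math

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p value: 1 minus the area between -|t| and |t|.

    The area is found with Simpson's rule (steps must be even).
    """
    t = abs(t)
    # Normalizing constant of Student's t density.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so double the area over [0, |t|].
    return 1 - 2 * total * h / 3

t, df = 51.82038, 1.013888
p_excel_like = two_tailed_p(t, math.floor(df))  # df truncated to 1, as TDIST does
p_fractional = two_tailed_p(t, df)              # df used exactly as calculated
print(round(p_excel_like, 6), p_fractional)
```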


Now let's answer the original questions.
  1. Given a certain standard deviation and number of samples in each of
two groups, how large would the difference between the two population
means need to be in order to achieve significance, and is this difference
detectable and greater than the standard deviation?

Let's keep the standard deviations and the number of samples the same, and
set the p value to 0.05. Then let's use TINV(p value, df) to determine the
t value. From the t value we can calculate the difference in population
means required to obtain that p value: since t = difference/"Sx1-x2", the
required difference is t * "Sx1-x2". I obtain the following values.


tumor stdv  naïve stdv  n  Sx1-x2    df        p value  t value  difference between population means required
2.828427    33.94113    2  24.08319  1.013888  0.05     12.7062  306.0059
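
The TINV step can be imitated in Python by searching for the t value whose two-tailed p value equals 0.05. This is a rough sketch using bisection and the standard library only; the helper names two_tailed_p and critical_t are my own. TINV truncates a non-integer df to a whole number, so df = 1 reproduces the spreadsheet value, while the fractional df gives a somewhat smaller critical t.

```python
import math

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p value via Simpson's rule (steps must be even)."""
    t = abs(t)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * total * h / 3

def critical_t(alpha, df):
    """Find t such that two_tailed_p(t, df) is alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while two_tailed_p(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # p still too large, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2

se = 24.08319                                 # "Sx1-x2" from the table above
t_crit = critical_t(0.05, 1)                  # df truncated to 1, as TINV does
t_crit_fractional = critical_t(0.05, 1.013888)
required_difference = t_crit * se
print(round(t_crit, 4), round(required_difference, 4), t_crit_fractional)
```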


Now let's answer the 2nd question.
  2. Given a certain standard deviation and difference between two
population means, how many samples in each group would be necessary to
achieve enough significance to conclude that the two group means truly
are different (with a 5% false positive error)?

A solution to this problem can be found by using the Goal Seek function
in Excel. Set a cell for the difference between the population means; set
a cell for n (just choose some arbitrary starting number like 2); then
calculate df and t using the other information (including the standard
deviations). Then use the TDIST function to get the p value from the t
value. Finally, use Goal Seek (Data -> What If Analysis -> Goal Seek) to
set the p value to 0.05 by changing n. So for this example data, if I
expect the two population means to differ by 40, Goal Seek settles on an n
of about 5.5, which means I need 6 samples in each group to show that this
difference is significant.


tumor stdv  naïve stdv  n         Sx1-x2   df        p value   t value   difference between population means required
2.828427    33.94113    5.523944  14.4912  4.586773  0.050834  2.760296  40
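
Instead of Goal Seek, the same search can be sketched in Python by trying whole-number values of n until the p value drops below 0.05. The function names here are my own, and the sketch uses the fractional df rather than Excel's truncated one, so the individual p values differ slightly from the spreadsheet. Like the Goal Seek approach, it assumes the observed difference is exactly 40 and says nothing about statistical power.

```python
import math

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p value via Simpson's rule (steps must be even)."""
    t = abs(t)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * total * h / 3

def p_for_n(n, sd1, sd2, difference):
    """p value for two groups of n samples each at the assumed difference."""
    v1, v2 = sd1**2 / n, sd2**2 / n
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n - 1) + v2**2 / (n - 1))
    return two_tailed_p(difference / se, df)

def samples_needed(sd1, sd2, difference, alpha=0.05, max_n=1000):
    """Smallest whole n per group with p < alpha, or None if max_n is too small."""
    for n in range(2, max_n + 1):
        if p_for_n(n, sd1, sd2, difference) < alpha:
            return n
    return None

n_required = samples_needed(2.828427, 33.94113, 40)
print(n_required)
```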


