Notes on t-test
2015-01-13 azim58 - Notes on t-test
Notes on Student's t-test
Sources of Information
http://en.wikipedia.org/wiki/T_test
http://www.tc3.edu/instruct/sbrown/stat/sampsiz.htm#Case5
http://www.andrews.edu/~calkins/math/edrm611/edrm11.htm
http://martin-bell.suite101.com/sample-size-calculations---confidence-and-power-a209629
http://www.surveysystem.com/sscalc.htm#one
Tiger also found this link for looking at how many samples are needed for
a microarray with a certain number of spots
http://bioinformatics.mdanderson.org/MicroarraySampleSize/
Sample size software (from Tiger):
http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
Here's a paper about this program:
G*Power 3: A flexible statistical power analysis program for the social,
behavioral, and biomedical sciences
Other papers on sample sizes for microarrays
The t-test is used for data that follow a normal distribution (a
normality test such as the Shapiro-Wilk or Kolmogorov-Smirnov test can be
used to check whether the data are normally distributed)
null hypothesis (paired test): the difference between two responses
measured on the same statistical unit has a mean value of zero. For an
unpaired test, the null hypothesis is that the two group means are equal.
unpaired t-test: two independent groups of samples are compared (one
treatment group and one non-treatment group)
paired t-test: often a repeated measures t-test (one group is tested
before and after treatment)
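A minimal sketch of how the two statistics differ, using only the Python
standard library. The function names (unpaired_t, paired_t) and the data
values are made up for illustration, and the unpaired version assumes
equal variances (pooled variance).

```python
import math
import statistics

def unpaired_t(a, b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def paired_t(before, after):
    """Paired t statistic: a one-sample t on the per-unit differences."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up measurements on the same five units before and after treatment
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 11.0, 12.0, 16.0]

t_paired, df_paired = paired_t(before, after)
t_unpaired, df_unpaired = unpaired_t(after, before)
print(t_paired, df_paired)
print(t_unpaired, df_unpaired)
```

With these numbers the paired statistic (about 4.81, df = 4) is much larger
than the unpaired one (about 1.62, df = 8), because pairing removes the
unit-to-unit variation.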
p-value: determined from the t value using a table of Student's
t-distribution (there are also formulas for Student's t-distribution that
can be used instead of tables). Excel has some normal distribution
formulas, described here
http://www.exceluser.com/explore/statsnormal.htm
this page has an explanation for how to use tables to look up p values
http://www.math.unco.edu/facstaff/Powers/M550/ContinuousDistributions/pdf/7.2.1%20Finding%20Areas%20under%20the%20Standard%20Normal%20Curve.pdf
Excel has three functions useful for t tests: TTEST, TINV, and TDIST
The p value can be found from Student's t distribution
this page looks like it has some good information about how to do this
http://commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section8/smallsampmean/tdist/tdist.htm
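As a rough sketch of the formula route (instead of a table lookup), the
p value can be obtained by numerically integrating the density of
Student's t-distribution. The function names below (t_pdf, t_cdf,
two_sided_p) are made up for this note, and Simpson's rule is just one
simple way to do the integration, not necessarily what Excel or any other
package does internally.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then use symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def two_sided_p(t, df):
    """Two-sided p value for an observed t statistic."""
    return 2 * (1 - t_cdf(abs(t), df))

# 2.228 is the tabulated two-sided 5% critical value for df = 10
print(two_sided_p(2.228, 10))
```

The printed value should be very close to 0.05, and should be comparable
to what Excel's TDIST(t, df, 2) returns for the same inputs.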
degrees of freedom for a two-sample t-test with equal variances is
n1+n2-2 (n-2 where n is the total number of samples). The degrees of
freedom for unequal sample sizes and unequal variances (Welch's t-test)
is more complicated; the formula is in the Wikipedia article.
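For reference, a small sketch of the unequal-variance degrees of freedom
(the Welch-Satterthwaite approximation) next to the simple pooled count.
The function names and the example numbers are made up for illustration.

```python
def pooled_df(n1, n2):
    """Degrees of freedom for the equal-variance two-sample t-test."""
    return n1 + n2 - 2

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom.

    s1, s2 are the sample standard deviations; n1, n2 the sample sizes.
    """
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Made-up example: a small, tight group against a larger, noisier group
print(pooled_df(5, 20))
print(welch_df(1.0, 5, 3.0, 20))
```

The Welch value is generally not an integer. It lies between the smaller of
n1-1 and n2-1 and the pooled value n1+n2-2, and equals the pooled value when
both groups have the same size and standard deviation.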
some t test Khan Academy videos start here:
http://www.khanacademy.org/math/statistics/v/t-statistic-confidence-interval
One-sample t-test
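t = (x_bar - mu0)/(s/sqrt(n))
where x_bar = sample mean, mu0 = hypothesized population mean, s = sample
standard deviation, n = sample size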
df = n-1
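The formula above can be sketched directly in Python with the standard
library. The function name one_sample_t and the sample values are made up
for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n-1 denominator)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Made-up measurements tested against a hypothesized mean of 5.0
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.5]
t_stat, df = one_sample_t(sample, mu0=5.0)
print(t_stat, df)
```

The resulting t value is then compared against Student's t-distribution
with df degrees of freedom to get the p value.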
Two sample t test for unequal sample sizes, unequal variance
can be found on wikipedia article (long equations)
http://en.wikipedia.org/wiki/T_test
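A short sketch of the unequal-size, unequal-variance statistic (Welch's
t-test), where each group contributes its own variance divided by its own
sample size. The function name welch_t and the data are made up for
illustration.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic: no equal-variance or equal-size assumption."""
    se_squared = (statistics.variance(a) / len(a)
                  + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_squared)

# Made-up groups of different size and spread
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0]
t_welch = welch_t(group_a, group_b)
print(t_welch)
```

Swapping the two groups only flips the sign of the statistic.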
Selecting appropriate sample sizes
confidence level = 1-alpha (e.g. 0.95 when alpha = 0.05); the confidence
interval is the range of values computed at that confidence level
alpha = significance level (preselected error level) (often 0.05)
characteristics of sample or population
margin of error
determining sample size needed to estimate the true mean of a population
with a certain margin of error
this page talks all about that
http://www.tc3.edu/instruct/sbrown/stat/sampsiz.htm#Case5
this page also has information about determining the difference of two
population proportions (but I think this is for a binomial distribution
and not a normal distribution)
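For the case of estimating a population mean to within a given margin of
error, a common textbook approximation is n = (z*sigma/E)^2, rounded up.
The sketch below assumes the standard deviation sigma is known (or well
estimated) and uses the normal z value rather than a t value; the function
name and the example numbers are made up.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Samples needed so the confidence interval half-width is <= margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical z value
    return math.ceil((z * sigma / margin) ** 2)

# Made-up example: standard deviation 15, desired margin of error 3
print(sample_size_for_mean(sigma=15, margin=3))
print(sample_size_for_mean(sigma=15, margin=3, confidence=0.99))
```

Halving the margin of error roughly quadruples the required sample size,
since the margin enters the formula squared.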
power = probability of rejecting the null hypothesis when you should
(when the specific alternative hypothesis is true) (1-beta). In other
words, the power is the probability of detecting an effect when there
truly is an effect.
beta = Type II error rate (like a false negative) (you fail to reject the
null hypothesis when the alternative hypothesis is actually true)
alpha = Type I error rate (probability of rejecting the null hypothesis
when it is actually true, i.e. accepting the alternative hypothesis when
it is actually false (like a false positive)) (commonly set to 0.05)
this page talks about power analysis
http://www.ats.ucla.edu/stat/sas/dae/t_test_power2.htm
information needed
expected average difference
standard deviations of groups
pre-specified level of statistical power for calculating the sample size
(could be set to 0.8)
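With those three inputs, the per-group sample size for comparing two means
can be roughly estimated with the normal approximation
n = 2*((z_alpha + z_beta)*sigma/delta)^2. This is only a sketch: it assumes
equal group sizes and a common standard deviation, and it tends to come out
slightly lower than an exact t-based calculation (such as G*Power or SAS
would give). The function name and numbers are made up.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Made-up example: expected difference 5, standard deviation 10
print(n_per_group(delta=5, sigma=10))
```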
"good estimate of effect size is the key to a good power analysis"
I could use SAS to determine the power of a test with a certain effect
size, stdev, and sample sizes
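The same kind of power calculation can be approximated without SAS. The
sketch below uses the normal approximation for a two-sided, two-sample
test with equal group sizes and a common standard deviation; it is not
what SAS computes exactly (SAS uses the t-distribution), and the function
name and numbers are made up.

```python
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power to detect a true mean difference delta."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard errors of the difference
    shift = abs(delta) / (sigma * (2 / n_per_group) ** 0.5)
    # Probability of landing in either rejection tail
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Made-up example: difference 5, standard deviation 10, 63 per group
print(approx_power(delta=5, sigma=10, n_per_group=63))
```

When delta is zero the function returns alpha, which matches the
definition of the Type I error rate.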
===========================================================================
Notes on t-test for specific applications
Notes on q-value for many simultaneous tests
===========================================================================
Normal distribution function
http://mathworld.wolfram.com/NormalDistributionFunction.html
Cumulative distribution function
Here are two questions that I would like to be able to answer.
- Given a certain standard deviation and number of samples in each of
two groups, what does the difference between the group means need to be
greater than in order to achieve significance, and is this difference
detectable and greater than the standard deviation?
- Given a certain standard deviation and difference between two groups,
how many samples are needed to achieve significance?
Example t-test and answers to sample size questions
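A rough sketch of both questions for an equal-variance two-sample t-test
with equal group sizes, using only the Python standard library. All
function names are made up for this note. The critical t value is found by
numerically integrating the t density and bisecting, which is just one
simple approach. Note that this asks when an observed difference of exactly
that size would be significant; it does not account for power, so a proper
power analysis will call for more samples.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df):
    """P(T <= t) by Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    steps = 2 * max(50, int(10 * a))  # even count, step width <= 0.05
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(df, alpha=0.05):
    """Two-sided critical t value, found by bisection on t_cdf."""
    target = 1 - alpha / 2
    hi = 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains the root
        hi *= 2
    lo = 0.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def min_significant_difference(sd, n_per_group, alpha=0.05):
    """Question 1: smallest observed mean difference that is significant."""
    df = 2 * n_per_group - 2
    return t_critical(df, alpha) * sd * math.sqrt(2 / n_per_group)

def min_n_for_significance(sd, difference, alpha=0.05, n_max=10000):
    """Question 2: smallest n per group at which that difference is significant."""
    for n in range(2, n_max + 1):
        if min_significant_difference(sd, n, alpha) <= abs(difference):
            return n
    return None  # not reached within n_max

# Made-up example: standard deviation 1.0
print(min_significant_difference(sd=1.0, n_per_group=6))
print(min_n_for_significance(sd=1.0, difference=1.5))
```

Comparing the first result to the standard deviation answers the second
half of the first question: with sd = 1.0 and 6 samples per group, the
difference has to exceed the standard deviation to be significant.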
===========================================================================
Calculating false discovery rate
===========================================================================
The R statistics program could be useful for the analysis.