Notes on t-test

2015-01-13

azim58 - Notes on t-test


Notes on Student's t-test

Sources of Information
http://en.wikipedia.org/wiki/T_test
http://www.tc3.edu/instruct/sbrown/stat/sampsiz.htm#Case5
http://www.andrews.edu/~calkins/math/edrm611/edrm11.htm
http://martin-bell.suite101.com/sample-size-calculations---confidence-and-power-a209629
http://www.surveysystem.com/sscalc.htm#one
Tiger also found this link for looking at how many samples are needed for
a microarray with a certain number of spots
http://bioinformatics.mdanderson.org/MicroarraySampleSize/
Sample size software (from Tiger):
http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
Here's a paper about this program:
G*Power 3: A flexible statistical power analysis program for the social,
behavioral, and biomedical sciences
Other papers on sample sizes for microarrays


used for data that follow a normal distribution (a normality test such
as the Shapiro-Wilk or Kolmogorov-Smirnov test can be used to check
whether the data are normally distributed)
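null hypothesis (paired t-test): the difference between two responses
measured on the same statistical unit has a mean value of zero. For an
unpaired t-test, the null hypothesis is that the two population means
are equal.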
unpaired t-test: two independent groups of samples are compared (one
treatment group and one non-treatment group)
paired t-test: often used for repeated measures (the same group is
tested before and after treatment)
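To see why the paired/unpaired choice matters, here is a minimal Python sketch (standard library only) that analyzes the same before/after numbers both ways. It assumes the paired test is a one-sample t-test on the differences and the unpaired test is the pooled (equal-variance) version; the function names and data are hypothetical, for illustration only.

```python
import math
from statistics import fmean, stdev, variance

def paired_t(before, after):
    """Paired t: one-sample t-test on the per-unit differences (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # (t statistic, degrees of freedom)

def unpaired_pooled_t(group1, group2):
    """Unpaired t with a pooled variance estimate (assumes equal variances)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (fmean(group2) - fmean(group1)) / se, n1 + n2 - 2

before = [10, 12, 14]  # hypothetical measurements on 3 units
after = [11, 14, 17]   # same 3 units after treatment
print(paired_t(before, after))           # t is about 3.46 with df = 2
print(unpaired_pooled_t(before, after))  # t is about 0.96 with df = 4
```

The mean difference is 2 in both cases, but pairing removes the unit-to-unit variation, so the paired t is much larger.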

p-value: determined from the t value using a table of Student's
t-distribution (there are also formulas for Student's t-distribution that
can be used instead of tables). Excel has some normal distribution
formulas, described here
http://www.exceluser.com/explore/statsnormal.htm
this page has an explanation for how to use tables to look up p values
http://www.math.unco.edu/facstaff/Powers/M550/ContinuousDistributions/pdf/7.2.1%20Finding%20Areas%20under%20the%20Standard%20Normal%20Curve.pdf
Excel has three functions useful for t-tests: TTEST (p-value directly
from two data ranges), TDIST (p-value from a t value and degrees of
freedom), and TINV (critical t value for a two-tailed probability)
The p value can be found from Student's t-distribution;
this page looks like it has some good information about how to do this
http://commons.bcit.ca/math/faculty/david_sabo/apples/math2441/section8/smallsampmean/tdist/tdist.htm
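As an alternative to tables or Excel, the two-sided p-value can be computed by numerically integrating the t density. This is a minimal standard-library sketch; using Simpson's rule and the `steps` parameter are my own assumptions (in practice a library routine such as scipy's `scipy.stats.t.sf` is the usual route).

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=10_000):
    """P(|T| >= |t|), using Simpson's rule on the density from 0 to |t|."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    # By symmetry, the two tails together are 1 - 2 * P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)

print(two_sided_p_value(2.228, 10))  # close to 0.05
```

The printed value should be close to 0.05, which matches the usual table entry (two-sided critical t of about 2.228 for df = 10).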

degrees of freedom for a two-sample t-test with pooled (equal) variances
is n1+n2-2 (total number of observations minus 2). The degrees of
freedom for unequal sample sizes and unequal variances (Welch's t-test)
are much more complicated and are given in the Wikipedia article.
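A small Python sketch comparing the two degrees-of-freedom rules. It assumes the Welch case uses the Welch-Satterthwaite approximation with sample standard deviations s1, s2; the function names are hypothetical.

```python
def pooled_df(n1, n2):
    """Degrees of freedom for the equal-variance (pooled) two-sample t-test."""
    return n1 + n2 - 2

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (unequal variances)."""
    v1 = s1 ** 2 / n1  # squared standard error of mean 1
    v2 = s2 ** 2 / n2  # squared standard error of mean 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(pooled_df(10, 10), welch_df(1.0, 10, 1.0, 10))  # 18 and about 18
print(welch_df(1.0, 10, 5.0, 5))                     # about 4.16
```

With equal standard deviations and equal sample sizes the Welch value reduces to n1+n2-2. When one group is much noisier, it drops toward that group's n-1, which makes the test more conservative.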

some t-test Khan Academy videos start here:
http://www.khanacademy.org/math/statistics/v/t-statistic-confidence-interval

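One-sample t-test
t = (x_bar - mu0)/(s/sqrt(n))
where x_bar = sample mean, s = sample standard deviation, n = sample
size, mu0 = hypothesized population mean
df = n-1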
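The one-sample formula above, translated directly into Python (standard library only); the data and the name `one_sample_t` are hypothetical.

```python
import math
from statistics import fmean, stdev

def one_sample_t(data, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)) with df = n - 1."""
    n = len(data)
    x_bar = fmean(data)
    s = stdev(data)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

print(one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu0=4))  # t is about 1.32, df = 7
```

Here x_bar = 5 and s = sqrt(32/7), so t = 1/(s/sqrt(8)), which is about 1.32. A two-sided p-value then comes from the t-distribution with 7 degrees of freedom.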

Two-sample t-test for unequal sample sizes and unequal variances
(Welch's t-test): the equations are long; see the Wikipedia article
http://en.wikipedia.org/wiki/T_test
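For quick reference, here is the Welch t statistic. The notation is assumed: x̄1, x̄2 are the sample means, s1, s2 the sample standard deviations, and n1, n2 the sample sizes. Its degrees of freedom come from the Welch-Satterthwaite approximation, not from n1+n2-2.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```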


Selecting appropriate sample sizes
confidence level = 1-alpha (e.g. 0.95 for a 95% confidence interval)
significance level alpha (preselected Type I error rate) (often 0.05)
characteristics of sample or population
margin of error


determining sample size needed to estimate the true mean of a population
with a certain margin of error
this page talks all about that
http://www.tc3.edu/instruct/sbrown/stat/sampsiz.htm#Case5
this page also covers sample sizes for estimating the difference between
two population proportions (this applies to binomial data, i.e.
proportions, rather than normally distributed measurements)
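A minimal sketch of the basic "estimate a mean within a margin of error" calculation, n = (z * sigma / E)^2, rounded up. It assumes sigma is known or estimated from pilot data and uses the normal (z) approximation. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_margin_of_error(sigma, margin, confidence=0.95):
    """Smallest n so a confidence interval for the mean has half-width <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    return math.ceil((z * sigma / margin) ** 2)

print(n_for_margin_of_error(sigma=15, margin=5))                   # 35
print(n_for_margin_of_error(sigma=15, margin=5, confidence=0.99))  # 60
```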


power = probability of rejecting the null hypothesis when you should
(when the specific alternative hypothesis is true) (1-beta). In other
words, the power is the probability of detecting an effect when there
truly is an effect.
beta = Type II error rate (like a false negative) (you fail to reject the
null hypothesis when the alternative hypothesis is actually true)
alpha = Type I error rate (probability of rejecting the null hypothesis
when it is actually true, i.e. accepting the alternative hypothesis when
it is actually false (like a false positive)) (commonly set to 0.05)
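A small Monte Carlo sketch that makes alpha and power concrete. It repeatedly draws two groups, tests for a difference, and counts rejections. To keep it standard-library only, it assumes the population sigma is known and uses a z-test in place of the t-test. All names and settings are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_diff, sigma=1.0, n=20, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated experiments in which H0 (equal means) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5  # standard error of the difference in means
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(true_diff, sigma) for _ in range(n)]
        if abs((fmean(b) - fmean(a)) / se) > z_crit:
            rejections += 1
    return rejections / trials

print(rejection_rate(0.0))  # no real effect: close to alpha (Type I error rate)
print(rejection_rate(1.0))  # real effect of 1 sigma: estimate of power (1 - beta)
```

With no real difference, the rejection rate should land near 0.05. With a true difference of one standard deviation and 20 per group, it should land near 0.88.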

this page talks about power analysis
http://www.ats.ucla.edu/stat/sas/dae/t_test_power2.htm
information needed
expected average difference
standard deviations of groups
pre-specified level of statistical power for calculating the sample size
(could be set to 0.8)
"good estimate of effect size is the key to a good power analysis"
I could use SAS to determine the power of a test with a certain effect
size, stdev, and sample sizes
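As a quick check before setting things up in SAS, the power of a two-sided, two-sample test can be approximated with the normal distribution. This sketch assumes equal group sizes and a common standard deviation. Because it uses z instead of t, it slightly overstates power for small samples. The function name is hypothetical.

```python
from statistics import NormalDist

def approx_power(diff, sigma, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / (sigma * (2 / n_per_group) ** 0.5)  # effect in SE units
    # Probability that the test statistic falls beyond either critical value
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

print(approx_power(diff=1.0, sigma=1.0, n_per_group=20))  # about 0.885
```

With diff = 0 the function returns alpha, which is a useful sanity check.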




===========================================================================
Notes on t-test for specific applications
Notes on q-value for many simultaneous tests

===========================================================================
Normal distribution function
http://mathworld.wolfram.com/NormalDistributionFunction.html
Cumulative distribution function
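The standard normal CDF can be written with the error function, Phi(x) = 0.5 * (1 + erf(x / sqrt(2))). Some references, including (as I read it) the MathWorld page above, use "normal distribution function" for the area between 0 and x instead, which is Phi(x) - 0.5. A minimal Python sketch of both; the function names are hypothetical.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def area_zero_to_x(x):
    """Standard normal area between 0 and x (negative for x < 0)."""
    return normal_cdf(x) - 0.5

print(normal_cdf(1.96))      # about 0.975
print(area_zero_to_x(1.96))  # about 0.475
```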

Here are two questions that I would like to be able to answer.
  1. Given a certain standard deviation and number of samples in each of
two groups, how large would the difference between the two population
means need to be in order to achieve significance, and is this
difference detectable and greater than the standard deviation?

  2. Given a certain standard deviation and difference between two
population means, how many samples in each group would be necessary to
achieve significance?
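A rough sketch of how both questions could be answered with the normal approximation. It assumes a two-sided test, equal group sizes, and a common standard deviation sigma. The function names, and the choice of 80% power for question 2, are my assumptions. A t-based calculation (for example in G*Power) typically gives a slightly larger n.

```python
import math
from statistics import NormalDist

def min_significant_difference(sigma, n, alpha=0.05):
    """Q1: smallest observed difference in means that reaches significance."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma * math.sqrt(2 / n)

def sample_size_per_group(diff, sigma, alpha=0.05, power=0.8):
    """Q2: n per group to detect a true difference `diff` with the given power."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    z_b = nd.inv_cdf(power)  # power = 0.5 gives z_b = 0 (significant half the time)
    return math.ceil(2 * ((z_a + z_b) * sigma / diff) ** 2)

print(min_significant_difference(sigma=1.0, n=8))  # about 0.98 (just under 1 sigma)
print(sample_size_per_group(diff=1.0, sigma=1.0))  # 16
print(sample_size_per_group(diff=0.5, sigma=1.0))  # 63
```

For question 1, the minimum difference drops below one standard deviation once n reaches 8 per group. For question 2, halving the difference roughly quadruples the required n.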

Example t-test and answers to sample size questions


===========================================================================
Calculating false discovery rate
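One standard way to control the false discovery rate across many simultaneous tests is the Benjamini-Hochberg step-up procedure. Below is a minimal Python sketch of BH-adjusted p-values. These are often loosely called q-values, but Storey's q-value (which also estimates the proportion of true nulls) is a related, different quantity. In R, `p.adjust(p, method = "BH")` performs this same adjustment. The example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # about [0.02, 0.04, 0.04, 0.02]
```

Calling a test significant when its adjusted value is at most 0.05 controls the expected false discovery rate at 5%, assuming the tests are independent (or positively dependent).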


===========================================================================
The R statistics program could be useful for the analysis.