Notes on q-value for many simultaneous tests

2015-01-13

azim58 - Notes on q-value for many simultaneous tests


The q value is used instead of the p value when many tests are run
simultaneously. The q value is essentially the false discovery rate (FDR)
across many tests rather than the false positive rate of a single test.
The q values for a set of features (gene, peptide, etc.) can be
calculated from the p values using a variety of different algorithms
(Benjamini-Hochberg, Storey's q-value, etc.). Bonferroni is also a
multiple-testing correction, but it controls the family-wise error rate
rather than the FDR, so its adjusted p values are not really q values.
Two ways of calculating these q values are to use the p.adjust function
in R (see example R session 5-23-12) or the QValue program (the qvalue
package) with R. Once these q values are obtained, one could select the
features which all have a false discovery rate below a certain level, or
simply see what the false discovery rate is for features with certain p
values. One could also determine how many samples would be necessary to
obtain false discovery rates below a certain threshold, assuming that
the effect size and standard deviations remain about the same. This
calculation could be accomplished through a process of trial and error
in Excel.
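
To make the p value to q value step concrete, here is a minimal Python
sketch of the Benjamini-Hochberg adjustment. It is not the code from the
R session mentioned above; the function name bh_qvalues and the example
p values are made up for illustration. The result should correspond to
what p.adjust gives with method = "BH", but treat that as something to
check rather than a guarantee.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values (often reported as q values)."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest so that the
    # adjusted values never increase as the p values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p values for ten features (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
qvals = bh_qvalues(pvals)

# Select the features whose false discovery rate is below 5%.
selected = [p for p, q in zip(pvals, qvals) if q < 0.05]
print(selected)
```

Note that several features with p < 0.05 do not survive the q < 0.05
cut, which is the whole point of using q values for many tests.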

"L:\storage\CIM Research Folder\DR\2012\5-23-12\example determination of
multi test sample size 5-23-12.xlsx"
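
The same kind of trial and error can be sketched in Python. This is not
a reconstruction of the spreadsheet above; it is one possible approach
under loudly stated assumptions: two groups of equal size, a two-sided
z-test (normal approximation), the same effect size and standard
deviation for every truly changed feature, and a fixed p value cutoff.
The function names and all the numbers in the example call are
hypothetical.

```python
from statistics import NormalDist


def expected_fdr(n_per_group, effect_size, sd, alpha, n_tests, n_true):
    """Approximate expected FDR when calling features with p < alpha."""
    z = NormalDist()
    # Standard error of the difference between two group means.
    se = sd * (2.0 / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size / se
    # Power of a two-sided z-test for a truly changed feature.
    power = (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
    # Expected counts of false and true positives among the calls.
    false_pos = alpha * (n_tests - n_true)
    true_pos = power * n_true
    return false_pos / (false_pos + true_pos)


def smallest_n(target_fdr, max_n=10000, **kwargs):
    """Try increasing sample sizes until the expected FDR is low enough."""
    for n in range(2, max_n + 1):
        if expected_fdr(n, **kwargs) <= target_fdr:
            return n
    return None  # target not reachable with this cutoff


# Hypothetical experiment: 1000 features, 50 of them truly changed.
n_needed = smallest_n(0.05, effect_size=1.0, sd=1.0, alpha=0.001,
                      n_tests=1000, n_true=50)
print(n_needed)
```

If the p value cutoff is too loose, the target may be unreachable at any
sample size (the function then returns None), because the false
positives from the unchanged features alone exceed the target.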


Small program for calculating Q-value
http://genomics.princeton.edu/storeylab/qvalue/

reference for qvalue

===========================================================================
Other information
This site looks like it might have some good information.
http://viiia.org/fdrFigs/?l=en-us
http://www.nonlinear.com/support/progenesis/samespots/faq/pq-values.aspx

A p-value of 0.05 implies that 5% of all tests will result in false
positives. An FDR adjusted p-value (or q-value) of 0.05 implies that 5%
of significant tests will result in false positives. The latter is
clearly a far smaller quantity.
When features are sorted by p-value, the q-values will also be ordered.
For example, at a q-value of 0.0141, expect 1.41% of all the spots with
q-value less than this to be false positives.
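
A little arithmetic makes the difference between the two thresholds
concrete. The spot counts below are invented for illustration and do not
come from the FAQ; only the 5% and 1.41% figures come from the text.

```python
n_tests = 1000        # hypothetical number of spots tested
n_significant = 100   # hypothetical number of spots called significant

# p < 0.05: roughly 5% of ALL tests are expected to be false positives.
false_pos_p = 0.05 * n_tests

# q < 0.05: roughly 5% of the SIGNIFICANT tests are expected to be
# false positives.
false_pos_q = 0.05 * n_significant

# q < 0.0141: roughly 1.41% of the spots below that q-value are
# expected to be false positives.
false_pos_q_0141 = 0.0141 * n_significant

print(false_pos_p, false_pos_q, false_pos_q_0141)
```

With these made-up counts the p-value cutoff allows about 50 false
positives, while the q-value cutoff of 0.05 allows about 5.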