Notes on q-value for many simultaneous tests

2015-01-13

azim58 - Notes on q-value for many simultaneous tests

The q value is used instead of the p value when many tests are run
simultaneously. The q value is essentially the false discovery rate
(FDR) among all features called significant at a given cutoff, rather
than the false positive rate of a single test. The q values for a set of
features (gene, peptide, etc.) can be calculated from the p values using
a variety of different algorithms (Benjamini-Hochberg, Storey's method,
etc.). Note that Bonferroni, although offered alongside these in
p.adjust, controls the family-wise error rate rather than the FDR, so
Bonferroni-adjusted p values are not q values. Two ways of calculating
q values are to use the p.adjust function in R (see example R session
5-23-12) or the qvalue package for R. Once these q values are obtained,
one could select all features with a false discovery rate below a
certain level, or simply see what the false discovery rate is for
features with certain p values.

One could also determine how many samples would be necessary to obtain
false discovery rates below a certain threshold, assuming that the
effect sizes and standard deviations remain about the same. This
calculation could be accomplished through a process of trial and error
in Excel.
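
A minimal Python sketch of what the Benjamini-Hochberg adjustment
computes, which should be the same quantity R's p.adjust returns with
method = "BH". The function name bh_adjust and the example p values are
made up for illustration; this is not the code from the R session
mentioned above.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p value in ascending order (1..m)
        candidate = pvalues[idx] * m / rank
        # Taking the running minimum keeps the adjusted values monotone.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for five features.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
qvals = bh_adjust(pvals)

# Select the features whose q value is at or below 0.05.
selected = [i for i, q in enumerate(qvals) if q <= 0.05]
print(qvals)
print(selected)
```

With these made-up inputs, four features have p below 0.05 but only the
first two have q below 0.05, which is the kind of difference the
adjustment is meant to expose.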

"L:\storage\CIM Research Folder\DR\2012\5-23-12\example determination of
multi test sample size 5-23-12.xlsx"
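
The trial-and-error search for a sample size could also be sketched in
Python rather than a spreadsheet. Everything here is an assumption for
illustration, not the contents of the workbook: a two-group comparison
with equal group sizes, a normal approximation for the test statistic
(z = d * sqrt(n / 2), where d is the effect size in standard deviation
units), Benjamini-Hochberg as the q value method, and invented effect
sizes. The names two_sided_p, bh_adjust and smallest_n are hypothetical.

```python
import math


def two_sided_p(effect_size, n_per_group):
    # Normal approximation to a two-sample test with equal group sizes.
    z = abs(effect_size) * math.sqrt(n_per_group / 2.0)
    return math.erfc(z / math.sqrt(2.0))


def bh_adjust(pvalues):
    # Benjamini-Hochberg adjusted p values, in the input order.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        running_min = min(running_min, pvalues[idx] * m / (m - position))
        adjusted[idx] = running_min
    return adjusted


def smallest_n(effect_sizes, targets, q_threshold=0.05, n_max=1000):
    """Smallest n per group at which every target feature has q <= threshold.

    Returns None if no n up to n_max is large enough.
    """
    for n in range(2, n_max + 1):
        q = bh_adjust([two_sided_p(d, n) for d in effect_sizes])
        if all(q[i] <= q_threshold for i in targets):
            return n
    return None


# Hypothetical data: 3 features with real effects, 17 with almost none.
effect_sizes = [1.2, 0.9, 0.8] + [0.1] * 17
n_needed = smallest_n(effect_sizes, targets=[0, 1, 2])
print(n_needed)
```

The loop simply tries n = 2, 3, 4, ... and stops at the first value that
works, which mirrors adjusting a cell by hand until the q values drop
below the threshold.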

Small program for calculating Q-value
http://genomics.princeton.edu/storeylab/qvalue/

References for qvalue
- Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440-9445.
- Storey JD. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64: 479-498.

===========================================================================
Other information
These sites look like they might have some good information.
http://viiia.org/fdrFigs/?l=en-us
http://www.nonlinear.com/support/progenesis/samespots/faq/pq-values.aspx

* Another way to look at the difference is that a p-value threshold of
  0.05 implies that up to 5% of all tests will result in false
  positives (exactly 5% if every null hypothesis is true). An FDR
  adjusted p-value (or q-value) of 0.05 implies that 5% of significant
  tests will result in false positives. The latter is usually a far
  smaller quantity, because only some of the tests are significant.
* If you order the p-values used to calculate the q-values, then the
  q-values will also be ordered.
* For example, a q-value of 0.0141 (a little greater than the
  corresponding p-value) means we should expect 1.41% of all the spots
  with a q-value at or below this to be false positives.
* In this way, a threshold of 0.05 has meaning across the entire
  experiment.
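
The difference between the two thresholds can be put into numbers with
a little arithmetic. This is only a sketch: the counts (1000 tests, 100
of them significant) are invented, and the p value figure is the worst
case in which every null hypothesis is true.

```python
def expected_false_positives_p(n_tests, alpha):
    # Worst case for a p value cutoff: every tested feature is truly null,
    # so a fraction alpha of ALL tests comes out falsely positive.
    return n_tests * alpha


def expected_false_positives_q(n_significant, q_threshold):
    # For a q value cutoff, the fraction applies only to the features
    # that were called significant.
    return n_significant * q_threshold


n_tests = 1000        # hypothetical number of features tested
n_significant = 100   # hypothetical number passing the q value cutoff

by_p = expected_false_positives_p(n_tests, 0.05)
by_q = expected_false_positives_q(n_significant, 0.05)
print(by_p, by_q)
```

Under these assumptions the p value cutoff allows about 50 false
positives across the experiment, while the q value cutoff allows about
5 among the 100 features actually reported.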
