Marginal asymptotics for the "large P, small N" paradigm: With applications to microarray data

Cited by: 46
Authors
Kosorok, Michael R.
Ma, Shuangge
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[2] Yale Univ, Div Biostat, New Haven, CT 06520 USA
Keywords
Brownian bridge; Brownian motion; empirical process; false discovery rate; Hungarian construction; marginal asymptotics; maximal inequalities; median tests; microarrays; t-tests
DOI
10.1214/009053606000001433
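Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract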
The "large p, small n" paradigm arises in microarray studies, image analysis, high throughput molecular screening, astronomy, and in many other high dimensional applications. False discovery rate (FDR) methods are useful for resolving the accompanying multiple testing problems. In cDNA microarray studies, for example, p-values may be computed for each of p genes using data from n arrays, where typically p is in the thousands and n is less than 30. For FDR methods to be valid in identifying differentially expressed genes, the p-values for the nondifferentially expressed genes must simultaneously have uniform distributions marginally. While feasible for permutation p-values, this uniformity is problematic for asymptotic based p-values since the number of p-values involved goes to infinity and intuition suggests that at least some of the p-values should behave erratically. We examine this neglected issue when n is moderately large but p is almost exponentially large relative to n. We show the somewhat surprising result that, under very general dependence structures and for both mean and median tests, the p-values are simultaneously valid. A small simulation study and data analysis are used for illustration.
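As an illustration of the FDR methods the abstract refers to, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure (the first entry in the related-papers list). It is not the authors' code: the function name `benjamini_hochberg`, the level `q`, and the six example p-values are hypothetical choices made only for illustration. The procedure's FDR guarantee relies on the null p-values being marginally uniform, which is the property the abstract studies.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule at level q."""
    m = len(p_values)
    # Order hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up: find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


# Hypothetical p-values for six genes (illustrative numbers, not from the paper).
example = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
rejected = benjamini_hochberg(example, q=0.05)
```

In a microarray setting, `p_values` would hold one marginal p-value per gene, for example from a t-test or median test across the n arrays.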
Pages: 1456-1486
Page count: 31
Related papers
29 in total
[1]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[2]  
Billingsley P., 1995, PROBABILITY MEASURE
[3]   HUNGARIAN CONSTRUCTIONS FROM THE NONASYMPTOTIC VIEWPOINT [J].
BRETAGNOLLE, J ;
MASSART, P .
ANNALS OF PROBABILITY, 1989, 17 (01) :239-256
[4]  
Csorgo M., 1981, Probability and Mathematical Statistics: a series of monographs and textbooks
[5]   Comparison of discrimination methods for the classification of tumors using gene expression data [J].
Dudoit, S ;
Fridlyand, J ;
Speed, TP .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (457) :77-87
[6]   ASYMPTOTIC MINIMAX CHARACTER OF THE SAMPLE DISTRIBUTION FUNCTION AND OF THE CLASSICAL MULTINOMIAL ESTIMATOR [J].
DVORETZKY, A ;
KIEFER, J ;
WOLFOWITZ, J .
ANNALS OF MATHEMATICAL STATISTICS, 1956, 27 (03) :642-669
[7]  
FAN J, 2005, HOW MANY SIMULATANEO
[8]   Semilinear high-dimensional model for normalization of microarray data: A theoretical analysis and partial consistency [J].
Fan, JQ ;
Peng, H ;
Huang, T .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2005, 100 (471) :781-796
[9]   Normalization and analysis of cDNA microarrays using within-array replications applied to neuroblastoma cell response to a cytokine [J].
Fan, JQ ;
Tam, P ;
Woude, GV ;
Ren, Y .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101 (05) :1135-1140
[10]   Operating characteristics and extensions of the false discovery rate procedure [J].
Genovese, C ;
Wasserman, L .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2002, 64 :499-517