A two-way semilinear model for normalization and analysis of cDNA microarray data

Times cited: 24
Authors
Huang, J [1]
Wang, D
Zhang, CH
Affiliations
[1] Univ Iowa, Dept Stat & Actuarial Sci, Iowa City, IA 52242 USA
[2] Univ Iowa, Program Publ Hlth Genet, Iowa City, IA 52242 USA
[3] Univ Alabama Birmingham, Ctr Comprehens Canc, Biostat & Bioinformat Unit, Birmingham, AL 35294 USA
[4] Rutgers State Univ, Dept Stat, Piscataway, NJ 08855 USA
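Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
analysis of variance; differentially expressed gene; high-dimensional data; microarray; noise level; semiparametric regression; spline; variance estimation
DOI
10.1198/016214504000002032
Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract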
A basic question in analyzing cDNA microarray data is normalization, the purpose of which is to remove systematic bias in the observed expression values by establishing a normalization curve across the whole dynamic range. A proper normalization procedure ensures that the normalized intensity ratios provide meaningful measures of relative expression levels. We propose a two-way semilinear model (TW-SLM) for normalization and analysis of microarray data. This method does not make the usual assumptions underlying some of the existing methods. For example, it does not assume that the percentage of differentially expressed genes is small or that there is symmetry in the expression levels of up-regulated and down-regulated genes, as required in the lowess normalization method. The TW-SLM also naturally incorporates uncertainty due to normalization into significance analysis of microarrays. We use a semiparametric approach based on polynomial splines in the TW-SLM to estimate the normalization curves and the normalized expression values. We study the theoretical properties of the proposed estimator in the TW-SLM, including the finite-sample distributional properties of the estimated gene effects and the rate of convergence of the estimated normalization curves when the number of genes under study is large. We also conduct simulation studies to evaluate the TW-SLM method and illustrate the proposed method using a published microarray dataset.
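The abstract describes estimating gene effects and array-specific normalization curves jointly with polynomial splines. The following is a minimal sketch, assuming the model takes the form y_ij = beta_i + phi_j(x_ij) + e_ij, where i indexes genes, j indexes arrays, x is the log-intensity and y is the log-ratio. It alternates between averaging out the curves to update the gene effects and refitting a least-squares cubic spline for each array. The backfitting loop, the truncated-power basis, the quantile knot placement, the sum-to-zero constraint on beta, and the names fit_tw_slm and spline_basis are illustrative assumptions. They are not the authors' estimator or software.

```python
import numpy as np


def spline_basis(z, knots):
    """Cubic truncated-power basis: 1, z, z^2, z^3 and (z - k)_+^3 per knot."""
    cols = [np.ones_like(z), z, z**2, z**3]
    cols += [np.clip(z - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_tw_slm(y, x, n_knots=4, n_iter=50):
    """Backfitting sketch for y[i, j] = beta[i] + phi_j(x[i, j]) + noise.

    y: log-ratios, x: log-intensities, both shaped (n_genes, n_arrays).
    Returns beta (gene effects, centered to sum to zero) and phi_hat
    (each array's fitted normalization curve evaluated at x).
    """
    n_genes, n_arrays = y.shape
    # One spline design matrix per array on standardized x, with interior
    # knots at quantiles (an assumption, not the paper's knot rule).
    bases = []
    for j in range(n_arrays):
        z = (x[:, j] - x[:, j].mean()) / x[:, j].std()
        knots = np.quantile(z, np.linspace(0, 1, n_knots + 2)[1:-1])
        bases.append(spline_basis(z, knots))

    phi_hat = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        # Gene step: average the curve-corrected values across arrays, then
        # center so beta is identifiable against the curves' intercepts.
        beta = (y - phi_hat).mean(axis=1)
        beta -= beta.mean()
        # Array step: least-squares spline fit of (y - beta) on x, per array.
        for j in range(n_arrays):
            coef, *_ = np.linalg.lstsq(bases[j], y[:, j] - beta, rcond=None)
            phi_hat[:, j] = bases[j] @ coef
    return beta, phi_hat


# Usage: simulated data with asymmetric differential expression
# (100 genes up by 1.5, 100 genes down by 1.0) and a different curve per array.
rng = np.random.default_rng(0)
n_genes, n_arrays = 2000, 4
beta_true = np.zeros(n_genes)
beta_true[:100] = 1.5
beta_true[100:200] = -1.0
x = rng.uniform(6.0, 16.0, size=(n_genes, n_arrays))
curves = np.column_stack(
    [0.3 * (j + 1) * np.sin(x[:, j] / 2.0) for j in range(n_arrays)]
)
y = beta_true[:, None] + curves + rng.normal(0.0, 0.2, size=(n_genes, n_arrays))
beta_hat, phi_hat = fit_tw_slm(y, x)
```

Because the gene step pools information across arrays while each curve is fitted by least squares on all genes, the sketch does not rely on the fraction of differentially expressed genes being small or on up- and down-regulated genes being balanced; the simulation above is deliberately asymmetric to exercise that case.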
Pages: 814-829
Page count: 16