High-density oligonucleotide arrays have become a popular assay for concurrent measurement of mRNA expression at the genome scale. Much effort has been devoted to developing statistical analysis tools that reduce experimental noise and normalize experimental variation in gene expression analysis. However, these efforts do not detect or catalog systematic problems associated with specific oligonucleotide probes. Here, we present an investigation of problematic probes that yield consistent but inaccurate signals across multiple experiments. By evaluating data integrity among gene, probe sequence, and genomic structure, we identified a total of 20,696 (10.5%) nonspecific probes that could cross-hybridize to multiple genes and a total of 18,363 (9.3%) probes that miss the target transcript sequences on the Affymetrix GeneChip U95A/Av2 array. The numbers of nonspecific and mistargeted probes on the U133A array are 29,405 (12.1%) and 19,717 (8.0%), respectively. The poor performance of the mistargeted probes was confirmed in two GeneChip experiments, in which these probes showed a 20-30% decrease in detecting present signals compared with normal probes. Comparison of qualitative expression signals obtained from SAGE and EST data with those from GeneChip arrays showed that the consistency between platforms is 30% lower for problematic probes than for normal probes. A Web application was developed to apply our results to improve the accuracy of expression analysis.
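The two probe categories described above can be illustrated with a minimal sketch. It assumes exact substring matching against a small in-memory transcript table; this is a simplification for illustration, not the matching procedure used in the study. The function name `classify_probe`, the `transcripts` dictionary layout, and the label strings are all hypothetical.

```python
def classify_probe(probe_seq, target_gene, transcripts):
    """Label a probe as 'mistargeted', 'nonspecific', or 'normal'.

    transcripts: hypothetical dict mapping gene name -> list of transcript
    sequences. Matching is exact substring search (an assumption made for
    this sketch; real pipelines would use sequence alignment).
    """
    # Collect every gene with at least one transcript containing the probe.
    hit_genes = {
        gene
        for gene, seqs in transcripts.items()
        if any(probe_seq in seq for seq in seqs)
    }
    # A probe that does not match its intended gene misses its target.
    if target_gene not in hit_genes:
        return "mistargeted"
    # A probe that matches its target plus other genes could cross-hybridize.
    if len(hit_genes) > 1:
        return "nonspecific"
    return "normal"
```

In this sketch a probe that misses its intended gene is labeled mistargeted even if it happens to match another gene, since the target check runs first.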