Microarrays – Key Genome Expression Trackers – Work Better When Probes Are Sequence-Verified
Many probes don’t match the latest RefSeq database information
BETHESDA, Md. (July 22, 2004) -- Microarray
technology, sometimes referred to as biochip technology, has been extensively
used to investigate genome-wide expression patterns and has facilitated a revolution
in the characterization of cellular regulation. In addition, comprehensive
gene expression profiling shows great potential for human disease
diagnostics.
For instance, multiple research groups have shown that
microarray data can identify previously unappreciated molecular subtypes of
lung cancer that differ in their prognoses. Unfortunately, results are often
poorly reproducible across studies.
Furthermore, there is now a tremendous volume of data,
particularly from human clinical specimens, which can’t be duplicated, so
strategies to improve analysis of (that is, “clean up”) existing data sets
are needed. One limitation of microarray technology may be that similar
studies fail to measure identical biological parameters. In other words, the
problem could arise from the fact that many of the microarray probes (and
there are now up to hundreds of thousands on a single slide) are based on
gene sequences that are five years old or more.
Background
Frustrated by more than two years of trying to analyze
microarray data contrasting two known conditions, researchers at Harvard
Medical School and Washington University in St. Louis decided to look at the
nucleotide sequences that measure gene expression on the most widely used
commercial microarray technology. They found that in many cases the probe
sequences did not match the most current sequence information.
In this study, they undertook a global analysis of the
microarrays and systematically attempted to confirm the accuracy of
individual probe sequences. They looked at every probe on the array to see
if it corresponded with the gene that it was intended to measure. They found
that a substantial percentage of the probe sequences (sometimes as much as
20%, on both old and currently used platforms) didn’t perfectly correspond
with the appropriate mRNA as defined by the reference sequence (RefSeq).
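The check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that "perfectly correspond" means the probe sequence occurs as an exact substring of the reference mRNA, and that probe and reference are given in the same orientation. The function name, accession labels, and sequences are hypothetical.

```python
def verify_probes(probes, refseq):
    """Flag each probe as verified if its sequence is an exact substring
    of the reference mRNA it is intended to measure.

    probes: dict of probe_id -> (target_accession, probe_sequence)
    refseq: dict of target_accession -> reference mRNA sequence
    Assumption: exact, same-orientation matching; a probe whose target
    has no reference sequence is treated as unverified.
    """
    verified = {}
    for probe_id, (accession, probe_seq) in probes.items():
        mrna = refseq.get(accession)
        verified[probe_id] = (
            mrna is not None and probe_seq.upper() in mrna.upper()
        )
    return verified


# Hypothetical toy data, for illustration only.
probes = {
    "probe_A": ("NM_HYPOTHETICAL_1", "ACGTTGCA"),  # matches the reference
    "probe_B": ("NM_HYPOTHETICAL_1", "ACGTAGCA"),  # one-base mismatch
    "probe_C": ("NM_HYPOTHETICAL_2", "GGGCCC"),    # target not in reference set
}
refseq = {"NM_HYPOTHETICAL_1": "TTACGTTGCATT"}

result = verify_probes(probes, refseq)
unverified_fraction = sum(not ok for ok in result.values()) / len(result)
```

A real analysis would work from the manufacturer's probe files and a current RefSeq release, and would likely use a sequence alignment tool rather than plain string matching.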
Research at Harvard’s Brigham & Women’s Hospital
The study, entitled “Increased measurement accuracy for
sequence-verified microarray probes,” will appear in the August 2004 edition
of Physiological Genomics, one of 14 journals published by the
American Physiological Society.
Researchers Brigham H. Mecham, Daniel Z. Wetmore and
Thomas J. Mariani worked in the Division of Pulmonary and Critical Care
Medicine, Department of Medicine, Brigham and Women’s Hospital (BWH) at
Harvard Medical School, Boston; Zoltan Szallasi and Isaac Kohane were at the
Children’s Hospital Informatics Program of Harvard Medical School; and Yoel
Sadovsky was at the Department of Obstetrics and Gynecology, Washington
University School of Medicine, St. Louis, MO.
The work in this paper was supported by the Harvard
Lung Biology Center, HL071885 (TJM), ES11597-01 (YS) and the Francis
Families Foundation.
Results
The researchers found many causes for the probe sequence inaccuracies,
most notably the constant improvement of sequence information databases
over time, which leaves older probe designs out of date. Regardless of the
nature of probe sequence inaccuracies, the study clearly shows that
sequence-verified probes perform more consistently, and with higher
accuracy, within replicates and across different versions of the technology.
They note that the leading manufacturer of such
microarrays “apparently…has come to the same conclusion and has recently
released a platform containing RefSeq-verified probes.”
Based on a comprehensive analysis of probe sequences on
the 20 most common mammalian microarray platforms, the researchers found
that data derived from verified probes showed greater accuracy than data
from unverified probes:
- Between technical replicates
- Across generations of same-platform technology
- In comparisons between different technology platforms
- When comparing patient-oriented data from multiple, independent
  diagnostic microarray studies.
After identifying the limitations of the probe
sequences, they used this information to improve the application of the
technology. On the diagnostic side, they tested the effects of probe
sequence accuracy in data from two independent breast cancer expression
profiling studies. Their results indicate that restricting data to
sequence-verified probes can improve the diagnostic power of microarray
technology.
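As a rough illustration of what restricting data to sequence-verified probes could look like when two studies are compared, here is a short Python sketch. It assumes each study is a mapping from probe ID to a gene label and a list of expression values, and that a set of verified probe IDs is already available for each platform. All names and numbers are hypothetical, and this is not the authors' method.

```python
def verified_genes(study, verified_ids):
    """Keep only measurements from verified probes, keyed by gene.

    study: dict of probe_id -> (gene, list of expression values)
    Simplification: if several verified probes target the same gene,
    the last one encountered is kept.
    """
    return {
        gene: values
        for probe_id, (gene, values) in study.items()
        if probe_id in verified_ids
    }


def shared_verified(study_a, verified_a, study_b, verified_b):
    """Pair up genes measured by verified probes in both studies."""
    a = verified_genes(study_a, verified_a)
    b = verified_genes(study_b, verified_b)
    return {gene: (a[gene], b[gene]) for gene in sorted(a.keys() & b.keys())}


# Hypothetical toy data, for illustration only.
study_a = {"a1": ("GENE_X", [1.0, 2.0]), "a2": ("GENE_Y", [3.0, 4.0])}
study_b = {"b1": ("GENE_X", [1.5, 2.5]), "b2": ("GENE_Y", [0.5, 0.7])}
verified_a = {"a1"}          # probe a2 failed sequence verification
verified_b = {"b1", "b2"}

combined = shared_verified(study_a, verified_a, study_b, verified_b)
```

Only GENE_X survives in this toy example, because its probes are verified in both studies; any downstream classification would then run on the restricted table.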
Discussion and data availability
The researchers stress that the result did not address
a particular classification scheme. Rather, it indicated that removing
unverified probe sets allowed the major component of variation in the data
to reflect the underlying biology (in this data set, breast cancer) rather
than the source of the experiments.
“As combining data from multiple microarray
platforms/technologies is certain to prove a common method, our results
showing increased accuracy of sequence-verified probes across platforms (oligo
vs. oligo and oligo vs. cDNA) substantiate the importance of using the most
reliable information to verify equivalence of measurement across
technologies,” the researchers conclude.
The authors have created a website for checking
sequences/measurements on microarrays for the 10 most common platforms,
a number that is expected to rise to 26 relatively soon. Called the “Lung
Transcriptome,” it was designed and built by Brigham Mecham, B.S., and
Thomas Mariani, Ph.D., to serve “as both a microarray data repository and
source for information and analytical tools for functional genomics-based,
pulmonary-focused research applications.”
It can be found at
http://lungtranscriptome.bwh.harvard.edu.
Source: Physiological Genomics, July
2004, one of 14 journals containing almost 4,000 articles annually,
published by the American Physiological Society.
Editor’s note: A copy of the research paper by
Mecham et al. is available in PDF format to the media. Members of the media
are encouraged to obtain an electronic version and to interview members of
the research team. To do so, please contact Donna Krupa at APS:
(301) 634-7209, cell (703) 967-2751, or
dkrupa@the-aps.org.
The American Physiological Society was founded in 1887 to foster basic and
applied bioscience. The Bethesda, Maryland-based society has more than
10,000 members and provides a wide range of research, educational and career
support to further the contributions of physiology to understanding the
mechanisms of diseased and healthy states.