I experienced an interesting confluence of events the other day. My 11-year-old son has been finding out about the great power of online information. Although we limit his access to assistance with homework, he is already a digital whiz kid who knows where to find a great deal of information for writing tomes like Norse history reports. In my day, it would have taken an entire afternoon of digging through texts at a university library to obtain items he found in a few seconds.
The confluence came about because of what I was doing while sitting next to him. While he was busy downloading information about Vikings, I was reading an update on a story that I have been following for a few years: the attempt to create a simple, objective blood test that could properly identify mood disorders. That would be a truly handy gadget for mental health professionals to have in their diagnostic tool kits! As I went through the literature, which relies heavily on gene expression data, it hit me how profoundly the judicious use of online databases has contributed to the scientific rigor of the research. The Internet was seminal not only to my son’s work but also to this blood test research.
In this column, I will discuss new progress on this Internet-boosted line of inquiry. I will begin with a few basics about differential gene expression and microarrays and will then move on to something that researchers are calling “convergent functional genomics.” As you shall see, the clever use of online databases both confirmed and extended the work done at the bench. As a result, it may very well be possible in the next few years to have a clinic-ready blood test that is capable of diagnosing unipolar and bipolar depression. There may even be a diagnostic test for schizophrenia.
To understand this promising research, we first need to review a few facts about differential gene expression, microarrays, and their use in the laboratory. As you may remember, only about 2% of the genome encodes messenger RNA (mRNA). These sequences are usually referred to as class II genes (the rest of the genes encode either ribosomal RNA, called class I genes, or transfer RNA, called class III genes).
You can subdivide class II genes into 2 categories based on their transcriptional activity. Some class II genes are turned on all the time; we often refer to them as “housekeeping” sequences. Others are expressed quite cell-specifically (a neuron has a very different job description from, say, a gut fibroblast, after all), and they are either completely silent or are called on infrequently, depending on the needs of the cell.
Researchers can capture these “need-specific” class II mRNAs quite easily because of the binding properties of their nucleotides. Consider this example: Suppose you are interested in finding out which neural genes, if any, become activated in the presence of a test medication. You take 2 groups of cells; 1 group will not be exposed to the drug (serving as the unstimulated control), while the other will be exposed to the drug for a set period.
How do you get the medication-specific genes? You simply isolate both sets of mRNA, convert them to helical DNA, and then mix them together. The genes that are commonly expressed in both populations (like those housekeeping genes) will find each other and, with some coaxing, bind together. This makes them double-stranded. The genes that are unique to the medication stimulation have no “partners” and will not bind to anything. This makes them single-stranded.
Since it is easy in the laboratory to separate double-stranded from single-stranded snippets of DNA, we can quickly isolate our “medication-specific” gene population. (This technique can also be used in the opposite direction, because some medications suppress, or “turn off,” genes.) Simply looking for unpaired populations in the controls can give the researcher valuable information about active and suppressive events related to medication exposure.
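The logic of this comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not a laboratory protocol: the gene labels are hypothetical, and a real experiment compares hybridization signals rather than names. Transcripts found in both populations stand in for the double-stranded (paired) fraction, and transcripts found in only one population stand in for the single-stranded (unpaired) fraction.

```python
# Hypothetical transcript labels for two cell populations (not real data).
control_mrna = {"housekeeping_1", "housekeeping_2", "gene_a"}
treated_mrna = {"housekeeping_1", "housekeeping_2", "gene_b", "gene_c"}


def differential_expression(control, treated):
    """Split transcripts into shared, drug-activated, and drug-suppressed sets."""
    shared = control & treated        # find "partners" in both populations
    switched_on = treated - control   # unpaired transcripts from treated cells
    switched_off = control - treated  # unpaired transcripts from control cells
    return shared, switched_on, switched_off


shared, switched_on, switched_off = differential_expression(control_mrna, treated_mrna)
print("Shared:", sorted(shared))
print("Turned on by the drug:", sorted(switched_on))
print("Turned off by the drug:", sorted(switched_off))
```

In this toy example, the unpaired transcripts from the treated cells play the role of the “medication-specific” genes, and the unpaired transcripts from the controls play the role of genes that the drug suppresses.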
Today, populations of nucleotides can be embedded in something we call a “microarray,” which is essentially a plastic tray to which DNA samples have been previously and irreversibly bound. Any DNA can be attached to the microarray, including any (or all) products from the 40,000-plus genes that make up the human genome. Once the DNA is embedded, you simply wash the plastic with the nucleotide sample that you are testing and see what does and does not bind to the nucleotides on the dish. This hybridization principle was used extensively in the data I am about to describe.
The blood test experiment was an attempt to measure whole genome expression differences in populations with mood disorders (and schizophrenia) using only their blood as the sample substrate. If any unique gene sequences were discovered, would these sequences predict mood disorders in unknown populations? The researchers had their biological work cut out for them. It is quite an experimental leap to ask about events going on in the brain by interrogating only the blood. As you shall see, using a database that looked at human brain–specific gene expression (on the Internet) turned out to be critical for this work.
Le-Niculescu and colleagues1 enrolled 3 cohorts in this study: 2 for depression and 1 for psychotic disorders. Twenty-nine patients in the first cohort had been given a diagnosis of bipolar I disorder. The second group, a replicant cohort, consisted of 19 patients with bipolar I disorder. The third group comprised the psychoses-related cohort and included 30 persons with schizoaffective disorders, substance-induced psychoses, and schizophrenia.
The first task for the researchers was to isolate the genetic substrates of the patients in various phases of their mood disorder. Blood samples were collected when the patients were in a high-mood state (a visual analog scale score of 60 or higher) and in a low-mood state (visual analog scale score of 40 or lower). The various populations of mRNAs were isolated from these blood samples, and the hybridization work involving microarrays began.
Unique gene expression profiles were eventually obtained in both low- and high-mood states and were then divided into forward and reverse mRNA subpopulations.
The forward population represented the manic state. Those mRNA populations were classified as absent in the low (ie, the gene was not expressed in the low-mood state) and present in the high (meaning that the gene was expressed only in the high-mood state). The isolated sequences were considered to be candidate biomarkers for the manic phase of the disorder.
The reverse populations were also isolated. These mRNAs were absent in the high-mood state but present in the low-mood state. These sequences were considered to be candidate biomarkers for the depressive phase of the disorder.
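The forward and reverse sorting described here amounts to a simple rule applied to each sequence. The sketch below is a minimal illustration under stated assumptions: the probe names and the "present"/"absent" calls are hypothetical, and the function name is mine, not the researchers'.

```python
def classify_probe(call_low, call_high):
    """Classify one sequence from its calls in the low- and high-mood states.

    Each call is the string "present" or "absent" (a hypothetical encoding).
    """
    if call_low == "absent" and call_high == "present":
        return "forward"   # candidate biomarker for the high-mood (manic) phase
    if call_low == "present" and call_high == "absent":
        return "reverse"   # candidate biomarker for the low-mood (depressive) phase
    return "uninformative"  # same call in both states


# Hypothetical calls: (call in low-mood samples, call in high-mood samples).
calls = {
    "probe_1": ("absent", "present"),
    "probe_2": ("present", "absent"),
    "probe_3": ("present", "present"),
}

classified = {name: classify_probe(low, high) for name, (low, high) in calls.items()}
for name, label in classified.items():
    print(name, label)
```

Sequences that receive the same call in both mood states say nothing about the phase of illness, so in this sketch they are simply set aside.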
Next, the strongest gene candidates in each category underwent an extensive series of tests and cross-checking (Figure). These validation exercises can be divided into 3 categories.
The first category was cross-validation using animal models. This vetting procedure used a pharmacogenomic mouse model for bipolar disorder. Both low-specific and high-specific populations were characterized, and gene sequences were isolated using microarray procedures similar to those deployed in the human work just discussed. In these tests, the source of the mRNAs included not only mouse blood but also brain tissue. The mouse sequences isolated in this fashion were compared with the human sequences previously described.
The second category was cross-validation using human postmortem brain sample databases. This vetting procedure involved peering into the Internet and specifically accessing a URL that was carrying data from “GeneCards,” an Online Mendelian Inheritance in Man database (http://www.nslij-genetics.org/search_omim.html). This database contains published reports of changes in the expression of specific genes in postmortem brain tissues that were obtained from patients with bipolar disorder. The idea was to compare the sequences that were isolated from living patient blood samples with sequences that were isolated from the brain samples in deceased patients.
This was a key step because the cross-checking not only involved human-to-human comparisons but it was also the first attempt to establish blood-to-brain connections with the data. As was hinted at previously, the body spends a ridiculous amount of time and resources trying to wall these systems off from each other. Any blood test that is designed to assay something in the brain by looking for something in the blood would need the concordance between the tissues down pat. It is also tricky to determine the phase of illness at which the person died: was it at the low end or at the high end? The researchers assumed that the deaths occurred when the patients were experiencing the low symptoms. Most amazingly, perhaps, the researchers found a number of sequences that converged well with the genes they previously obtained from the blood sample.
The third category was cross-validation using human genetic data linked to mood disorders. This work also involved extensive use of an Internet-borne database. An online sequence-based integrated map of the human genome is published by the University of Southampton in the United Kingdom. (There is a similar collection of information called the Marshfield Clinic Research Foundation database in the United States.) These databases on gene sequences include previously published works shown to have a genetic linkage to mood disorders.
Taken together, 3 separate vetting procedures were used to screen the sequences isolated from the original living human cohorts. At each step, a single question was asked: “Do any of the sequences match?” Answering this question was not straightforward, and statistical analyses were then performed to determine convergence. The researchers termed the entire protocol “convergent functional genomics.”
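To make the “Do any of the sequences match?” question concrete, here is a minimal sketch that counts how many independent lines of evidence support each blood-derived candidate. It is an assumption-laden simplification: the gene labels and evidence sets are hypothetical, and a plain count stands in for the statistical analyses the researchers actually performed, which are not described in this column.

```python
# Hypothetical candidates from the human blood work (not real data).
blood_candidates = {"gene_a", "gene_b", "gene_c", "gene_d"}

# Hypothetical hits from the 3 vetting categories described above.
evidence = {
    "mouse_model": {"gene_a", "gene_b"},
    "postmortem_brain": {"gene_a", "gene_c"},
    "genetic_linkage": {"gene_a", "gene_b", "gene_e"},
}


def convergence_scores(candidates, evidence_sets):
    """Count, for each candidate, how many evidence sets also contain it."""
    return {
        gene: sum(gene in hits for hits in evidence_sets.values())
        for gene in candidates
    }


scores = convergence_scores(blood_candidates, evidence)

# Rank by score (highest first), breaking ties alphabetically.
ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
for gene in ranked:
    print(gene, scores[gene])
```

A candidate supported by all 3 categories rises to the top of the list, while one seen only in blood drops to the bottom. That is the intuition behind the word “convergent.”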
That any genes could still be present after such screening is a testament to both the rigor of the work and the insightful nature of the experimental design. The researchers did indeed find matches. A total of 10 candidate genes survived the screenings. Five came from the selection in the high-mood, or manic population: Atxn1, EdnRb, Edg2, Fzd3, and Mbp. Five came from the selection in the low-mood, or depressive population: Erbb3, FGfr1, Mag, Pmp22, and Ugt8.
What do these gene sequences do? This is probably the most biologically interesting aspect of the work, and it is easily the most opaque. Some of the gene sequences are involved in the normal myelination of neurons. These included the sequences Edg2, Mag, Mbp, Pmp22, and Ugt8. Several of these are involved in growth factor signaling: Erbb3, FGfr1, Fzd3, Igfbp6, and Ptprm.
What does the isolation of these sequences mean to our biological understanding of mood disorders? Not much, unfortunately. Growth factor and signal transduction sequences seem to hold the greatest promise for obtaining early leads. The presence of so many myelination-specific genes in an affective disorder, however, is less intuitive and certainly more surprising, and their roles are nearly a complete mystery.
Discovering biological roles was not the point of this work, however. There was a more practical issue: Given the blood data, how well did these sequences actually predict a mood disorder?
The answer makes these data especially compelling for the future clinic. Using the original populations, these 10 biomarkers were tasked to predict which patients had what disorder and which phase they were experiencing at the time of the test.
Such prediction is relatively easy. The researchers calculated a score on the basis of the ratio of high-mood to low-mood genes, using both sensitivity scores and specificity inventories. Their results were a stunner. In the first cohort (high mood only), sensitivity was 84.6% and specificity was 68.8%. In the second cohort (high mood only), sensitivity was 70.0% and specificity was 66.7%.
Similar results were obtained when predicting low mood. In the first cohort (low mood only), sensitivity was 76.9% and specificity was 81.3%. In the second cohort (low mood only), sensitivity was 66.7% and specificity was 61.5%.
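For readers who want to see what these percentages measure, the sketch below computes sensitivity and specificity from counts of correct and incorrect predictions. The counts are illustrative assumptions, not figures from the study: I chose them because they are arithmetically consistent with the percentages reported for high mood in the first cohort and with a cohort of 29 patients. The column's source does not give the underlying counts here.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of patients who have the condition and are correctly flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of patients without the condition who are correctly cleared."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only (an assumption, not data from the paper):
# 13 patients in a high-mood state, 16 patients not in a high-mood state.
true_pos, false_neg = 11, 2
true_neg, false_pos = 11, 5

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
print(f"Sensitivity: {sens:.3f}")
print(f"Specificity: {spec:.3f}")
```

Sensitivity tells you how often the test catches the mood state when it is present. Specificity tells you how often the test stays quiet when the mood state is absent. A useful clinical test needs both to be high.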
These are extraordinary figures. As the researchers themselves pointed out, these scores are comparable to results obtained in prenatal tests that can predict Down syndrome. They do indeed seem to have uncovered a working blood test for an affective disorder.
These data were obtained with adults who were experiencing a specific disorder in an even more specific phase. The test was conducted with an assay that could be administered in any clinic capable of drawing someone’s blood. Although not mentioned in this space, similar results were obtained in predicting disease states in the psychoses cohort. Some believe that blood test kits that are capable of such diagnostic discrimination could be available in as few as 5 years.
Of course, the robustness of these findings immediately suggests the commissioning of larger, more prospective studies. It also suggests something equally extraordinary: the critical role of the creation of specific databases and how their unfettered, online access took part in uncovering such big science.
That is the convergence that hit me as I finished reading some of the articles describing this work and gazed up to look at my son. There he was, fully engaged in a beautifully interactive Web site that described the Vikings’ impact on medieval European life while my nose was buried in GeneCards. Two very different purposes, one very handy information source.
The Internet does not always have a great reputation, and some of the criticism is deserved. Nonetheless, I knew that such easy access to online information has—like my son and the field I love so much—quite a future indeed.