LifeMap Discovery is a gene expression database that aims to build a systematic map of gene expression profiles in normal and diseased tissues and cells.
During very early development, cells are totipotent, with the potential to develop into any type of mature cell in the developing embryo. As development proceeds, the cells lose this potential and their final fate is determined. Throughout this differentiation process, cells go through a number of stages, each characterized by unique gene expression profiles. The final cell type is determined via the integration of the expression or repression of specific sets of genes, as well as by regulation at the epigenetic and protein levels.
Stem cells have two characteristics that distinguish them from other cells in the body. The first is their potential to differentiate into different cell types, both during early development and in the adult body; the second is their capacity for self-renewal, which makes them a potential cell reservoir for tissue repair. Although most stem cells differentiate, mammals retain some stem/progenitor cells during adult life. Under specific conditions, these cells can develop into specialized cells and replace cells that are lost through normal physiological processes, injury or disease.
These stem and progenitor cells can be cultured in vitro and induced to differentiate into specific cell types. Their stemness and differentiation pathways are determined by the sets of genes they express, which, in cultured cells, depend on the specific treatment the cells receive and on their culture conditions.
LifeMap Discovery provides a snapshot of gene expression patterns at cellular and anatomical resolutions. The LifeMap database includes the following entity types: in vivo cells, anatomical compartments, tissues, organs and cultured cells, including protocol-derived cells (PDCs, cells derived during a differentiation protocol). Gene expression data are accrued by manually surveying the published literature and/or by extracting information about the gene status from related high throughput experiments (see below – High throughput gene expression in LifeMap Discovery). The information is then allocated to appropriate entity cards. Hence, the database can be used as a platform for identifying cells, based on specific gene expression profiles.
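Because gene lists are attached to entity cards, identifying a cell from an observed profile can be framed as ranking cards by gene-set overlap. The following is a minimal sketch, assuming a hypothetical in-memory mapping from card names to gene symbols and using Jaccard similarity as the ranking score (an assumption); it does not reproduce LifeMap Discovery's actual search.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_entities(query_genes: set[str], cards: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank entity cards by the similarity of their gene lists to the query profile."""
    scores = [(name, jaccard(query_genes, genes)) for name, genes in cards.items()]
    # Highest similarity first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical cards; the names and gene assignments are illustrative only.
cards = {
    "hypothetical_cell_A": {"SOX2", "NES", "PAX6"},
    "hypothetical_cell_B": {"ALB", "AFP", "HNF4A"},
}
ranking = rank_entities({"SOX2", "PAX6"}, cards)
print(ranking)
```

Jaccard is only one possible score; a more realistic search might weight positive and negative selective markers differently from plain expressed genes.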
A DNA microarray is a collection of thousands of microscopic DNA spots (probes) attached to a solid surface. The number of genes represented depends on the array design, but typically covers most known expressed genes in the genome. RNA is extracted from cells of the two populations under investigation and reverse transcribed to cDNA. The cDNA is fluorescently labeled and applied to the microarray chip. After the labeled cDNA hybridizes to the probes, the microarray is scanned. The fluorescence intensities of the spots, which correspond to the level of gene expression in each population, are normalized and compared. The comparison is usually performed by calculating the fold change in normalized intensity of one sample versus the other, with additional statistical analysis to exclude false-positive results.
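The fold-change comparison can be sketched as follows. This is a minimal illustration that assumes the intensities are already normalized; the pseudocount and the spot values are hypothetical, and the statistical filtering step is not shown. Working on the log2 scale makes up- and down-regulation symmetric: a 4-fold increase is +2 and a 4-fold decrease is -2.

```python
import math


def log2_fold_change(intensity_a: float, intensity_b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of the normalized intensity in sample A to that in sample B.

    The pseudocount is an illustrative assumption that avoids division by
    zero for spots with no detectable signal.
    """
    return math.log2((intensity_a + pseudocount) / (intensity_b + pseudocount))


# Hypothetical normalized spot intensities: (sample A, sample B).
normalized = {
    "gene_1": (1599.0, 399.0),
    "gene_2": (99.0, 99.0),
    "gene_3": (199.0, 799.0),
}
for gene, (a, b) in normalized.items():
    lfc = log2_fold_change(a, b)
    print(f"{gene}: log2 fold change {lfc:+.2f}, fold change {2 ** lfc:.2f}")
```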
A DNA microarray experiment compares gene expression between two populations. The DNA microarray chip contains a collection of thousands of microscopic DNA spots, termed probes, attached to a solid surface. Common microarrays contain probes corresponding to 20,000-40,000 genes. Other types of arrays include exon arrays and miRNA arrays, which contain probes specific to each exon in the genome or to miRNAs, respectively. In the microarray technique, RNA is isolated from cells or tissues (A) and reverse transcribed to cDNA (B). The cDNA is then labeled and hybridized to a microarray (C). The more copies of a specific transcript the RNA sample contains, the stronger the signal intensity will be. The microarray is then scanned to quantify the signal intensity of each spot, which enables calculation of relative gene expression (D). In the current illustration, the gene marked pink has higher expression in sample A, the gene marked green has higher expression in sample B, and the yellow-marked gene is equally expressed in both samples.
RNA sequencing (RNA-Seq) allows quantitative determination of RNA expression levels. Its advantage over microarrays is that it is not limited to predefined probes: it covers the entire transcriptome, including single-nucleotide polymorphisms (SNPs) in transcribed sequences. In this method, RNA is extracted from cells and the mRNA is isolated. In some cases, the mRNA is fragmented at this stage. The mRNA is then reverse transcribed into cDNA, which is fragmented, if necessary, to lengths compatible with the sequencing system. Once the fragments are sequenced, the resulting sequences (reads) are assembled into transcripts. Although the transcriptome can be assembled de novo, it is usually more efficient to align the reads to a reference genome or to reference transcripts. Because RNA-Seq is quantitative, results can be compared directly between experiments.
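Read alignment can be illustrated with a toy seed-and-verify aligner. This is a minimal sketch, not how production aligners work: it assumes error-free reads, finds only exact, unspliced matches, and uses hypothetical reference sequences and a hypothetical seed length `k`.

```python
from collections import defaultdict


def build_index(references: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer (seed) to the (reference name, offset) positions where it occurs."""
    index = defaultdict(list)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def align_read(read: str, references: dict[str, str], index, k: int) -> list[tuple[str, int]]:
    """Return (reference, offset) for every exact, full-length match of the read.

    The first k bases of the read serve as the seed; each seed hit is then
    verified against the whole read. Mismatches, indels and splicing are not
    handled.
    """
    hits = []
    for name, offset in index.get(read[:k], []):
        if references[name][offset:offset + len(read)] == read:
            hits.append((name, offset))
    return hits


# Hypothetical reference transcripts and reads.
refs = {"tx1": "ATGGCGTACGTTAGC", "tx2": "TTGACCGTAGGCATA"}
idx = build_index(refs, k=4)
for read in ["GCGTACG", "CCGTAGG", "GGGGGGG"]:
    print(read, align_read(read, refs, idx, k=4))
```

Indexing the reference once and looking up short seeds is what makes alignment far cheaper than de novo assembly; real tools extend the same idea with compressed indexes and tolerance for sequencing errors.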
To perform RNA sequencing, RNA is extracted from cells or tissues and reverse transcribed to cDNA (A). The cDNA is then fragmented; fragment length may vary according to the specifications of the sequencing machine (B). Each cDNA fragment is sequenced in a high-throughput manner to obtain short sequences termed “short sequenced reads” (SSRs). SSRs are aligned to a reference genome to produce a genome-scale transcription map (C). Transcript expression is calculated from the number of reads aligned to each gene (D).
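Step (D) can be sketched as counting reads per gene and then normalizing. The text does not name a normalization method; this sketch assumes transcripts per million (TPM), one common choice, which divides each count by gene length in kilobases and rescales so that the values sum to one million. Gene names, lengths and read assignments are hypothetical.

```python
from collections import Counter


def tpm(read_assignments: list[str], gene_lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million from per-read gene assignments.

    Reads assigned to genes absent from gene_lengths_bp are ignored.
    """
    counts = Counter(read_assignments)
    # Reads per kilobase corrects for longer genes attracting more reads.
    rates = {gene: counts[gene] / (length / 1000) for gene, length in gene_lengths_bp.items()}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical data: 100 reads on each gene, but gene_B is twice as long.
assignments = ["gene_A"] * 100 + ["gene_B"] * 100
lengths = {"gene_A": 1000, "gene_B": 2000}
values = tpm(assignments, lengths)
print({gene: round(v) for gene, v in values.items()})
```

With equal raw counts, the shorter gene ends up with twice the TPM, reflecting twice as many transcripts per unit length.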
In situ hybridization (ISH) provides high-resolution gene expression information in the context of the natural location of the transcripts within an organ or organism. ISH uses a labeled nucleic acid fragment (probe) complementary to a target sequence to locate that sequence, here an mRNA, in a portion or section of tissue (in situ). The basic steps in ISH are cell permeabilization, hybridization of the labeled probe, and detection of the probe, which reveals the location of the mRNA of interest. This process can be scaled up, and the results are often presented in databases such as MGI and GENSAT. These databases are linked to LifeMap Discovery.
Analysis of gene expression by in situ hybridization (ISH) provides high-resolution information on both the level and the location of gene expression within a tissue or even a whole organism. ISH can be automated, enabling the determination of expression patterns for thousands of genes (large scale ISH). Whole-mount embryos or tissue sections are placed and fixed on a standard microscope slide (A, B). DNA templates are transcribed into cRNA strands, termed riboprobes, which are hydrolyzed, labeled and hybridized to the sections (C). The riboprobes hybridize to their complementary RNA transcripts in the sample, making the location of each RNA transcript visible (D).
Normalized data from microarray analyses deposited in a public repository were used to calculate lists of differentially expressed genes, using an empirical Bayes (eBayes) analysis tool developed at LifeMap Sciences. For normal (non-disease) expression data, a Benjamini-Hochberg false discovery rate (FDR) correction for multiple testing is applied (for more information, see the Wikipedia article on FDR). Genes with a p-value < 0.05 and a fold change of at least 2 were regarded as differentially expressed. For experiments without normalized data in a public repository, the differentially expressed gene lists are taken from the article describing the high throughput experiment.
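The selection thresholds can be sketched as follows. This is a minimal illustration, not LifeMap Sciences' tool: the empirical Bayes step that produces the p-values is not reproduced, so p-values are taken as input, and applying the 0.05 cut-off to the Benjamini-Hochberg adjusted values (when the correction is used) is an assumption about how the two steps combine. Gene names and values are hypothetical.

```python
import math


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, p_values, fold_changes, alpha=0.05, min_fold=2.0, apply_fdr=True):
    """Genes passing the p-value and fold-change thresholds.

    fold_changes are linear ratios (test / control); a gene passes the
    fold-change filter if it changes at least min_fold in either direction.
    """
    p = benjamini_hochberg(p_values) if apply_fdr else list(p_values)
    threshold = math.log2(min_fold)
    return [
        gene
        for gene, pv, fc in zip(genes, p, fold_changes)
        if pv < alpha and abs(math.log2(fc)) >= threshold
    ]


genes = ["g1", "g2", "g3", "g4"]
p_values = [0.01, 0.04, 0.03, 0.005]
fold_changes = [4.0, 1.5, 0.25, 2.0]
print(benjamini_hochberg(p_values))
print(differentially_expressed(genes, p_values, fold_changes))
```

Taking the absolute log2 ratio treats a 4-fold decrease (ratio 0.25) the same as a 4-fold increase, so "at least 2-fold" covers both directions.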
In the LifeMap Discovery database, high throughput experiments (e.g., DNA microarray and RNA sequencing) are detailed in dedicated experiment cards. In normal gene expression experiment cards, the samples are linked to organ/tissue/anatomical compartment/in vivo cell/cultured cell cards. The genes are allocated to these entity cards based on information from the literature as well as from public gene expression data repositories, such as GEO. Each card contains a description of the experiment, a list of the samples tested in the experiment and their related LifeMap Discovery entity cards. In addition, the card lists the gene expression comparisons made between samples in the experiment. Each experiment includes one or more comparisons, each with its own card (termed a gene expression comparison card – see below).
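The card relationships described above can be modeled as a small data structure. This is a minimal sketch with hypothetical class and field names, not the database's actual schema. It encodes three points: samples may link to entity cards, each comparison holds two sample groups and a gene list, and every experiment has at least one comparison.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    sample_id: str
    # Linked entity card (organ, tissue, compartment or cell); None if unlinked.
    linked_card: Optional[str] = None


@dataclass
class ComparisonCard:
    group_1: list[Sample]
    group_2: list[Sample]
    differentially_expressed: list[str] = field(default_factory=list)


@dataclass
class ExperimentCard:
    description: str
    samples: list[Sample]
    comparisons: list[ComparisonCard]

    def __post_init__(self) -> None:
        # Every experiment includes at least one comparison.
        if not self.comparisons:
            raise ValueError("an experiment card needs at least one comparison")

    def linked_cards(self) -> list[str]:
        """Distinct entity cards referenced by the experiment's samples."""
        return sorted({s.linked_card for s in self.samples if s.linked_card})


s1 = Sample("sample_1", "spinal cord")
s2 = Sample("sample_2", "hypothetical cultured cell")
s3 = Sample("sample_3")  # e.g. a reference sample with no entity link
experiment = ExperimentCard(
    description="hypothetical microarray experiment",
    samples=[s1, s2, s3],
    comparisons=[ComparisonCard([s1], [s3], ["G1", "G2"])],
)
print(experiment.linked_cards())
```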
In disease gene expression experiment cards, the samples are not linked to any other card in LifeMap Discovery, but are linked to cards in MalaCards.
A comparison card presents the two sample groups that were compared and the resulting list of differentially expressed genes. Each sample group consists of one or more experimental samples.
In normal expression comparison cards, the samples within each sample group are linked to the same LifeMap Discovery entity card. The differentially expressed gene lists contain the top 100 differentially expressed genes in one sample group compared with the other. One of the sample groups can be a reference group not linked to LifeMap Discovery entities. References can be RNA extracted from whole embryos, commercial synthetic RNA or mixed samples.
In disease expression comparison cards, the test sample group is always the diseased tissue, and the control sample group is the matched normal tissue. The differentially expressed gene lists contain up to 200 down-regulated and 200 up-regulated genes in the diseased tissue compared with the control.
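The list-size limits for the two kinds of comparison cards can be sketched as follows. The text does not say how genes are ranked, so ranking by log2 fold change is an assumption, as are the function names; in practice the input would already be restricted to genes passing the significance filters.

```python
def top_genes(log2_fold_changes: dict[str, float], n: int = 100) -> list[str]:
    """The n genes with the largest absolute log2 fold change."""
    ranked = sorted(log2_fold_changes, key=lambda g: (-abs(log2_fold_changes[g]), g))
    return ranked[:n]


def top_up_down(log2_fold_changes: dict[str, float], max_each: int = 200) -> tuple[list[str], list[str]]:
    """Up to max_each up-regulated and max_each down-regulated genes (test vs control)."""
    up = sorted((g for g, v in log2_fold_changes.items() if v > 0),
                key=lambda g: (-log2_fold_changes[g], g))
    down = sorted((g for g, v in log2_fold_changes.items() if v < 0),
                  key=lambda g: (log2_fold_changes[g], g))
    return up[:max_each], down[:max_each]


# Hypothetical log2 fold changes, diseased versus matched normal tissue.
lfc = {"G1": 3.0, "G2": -2.5, "G3": 1.0, "G4": -0.5, "G5": 0.0}
print(top_genes(lfc, n=2))
print(top_up_down(lfc, max_each=200))
```

Capping each direction separately, as in the disease cards, keeps a strongly skewed comparison from filling the whole list with changes in one direction.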
In addition to data derived from the literature and from high throughput experiments, LifeMap Discovery presents lists of differentially expressed genes derived from large scale data sets:
Each large scale data set is presented in an experiment card containing a list of its samples.
In normal expression datasets, each sample is detailed in its own card, which includes a link to the related LifeMap Discovery card (tissue, anatomical compartment or cell) and the list of genes expressed in the sample. When a sample is linked to compartment or cell card(s), its expressed genes appear in the gene expression list of the linked card(s). When the sample is linked to an organ card, its expressed genes appear in the gene expression list only if the sample is derived from the whole organ/tissue and is linked solely to the organ/tissue card (and not also to a cell or compartment).
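The display rule above amounts to a short decision procedure. This sketch rests on assumptions: the (name, type) representation and the type labels are hypothetical, and organ and tissue cards are treated as a single type.

```python
def cards_showing_sample_genes(linked_cards: list[tuple[str, str]], from_whole_organ: bool) -> list[str]:
    """Cards whose gene expression list includes this sample's expressed genes.

    linked_cards holds (card name, card type) pairs, where card type is
    "organ" (organ/tissue), "compartment" or "cell".
    """
    organ_cards = [name for name, kind in linked_cards if kind == "organ"]
    other_cards = [name for name, kind in linked_cards if kind != "organ"]
    if other_cards:
        # Compartment/cell links always receive the genes; organ links do not,
        # because the sample is not linked solely to the organ card.
        return other_cards
    # Linked only to organ/tissue cards: show genes only for whole-organ samples.
    return organ_cards if from_whole_organ else []


print(cards_showing_sample_genes([("spinal cord", "organ")], from_whole_organ=True))
print(cards_showing_sample_genes([("spinal cord", "organ"), ("motor neuron", "cell")], from_whole_organ=True))
```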
In disease expression datasets, the samples are not linked to any card in LifeMap Discovery, but are linked to cards in MalaCards. Each sample card contains a list of up to 200 down- and 200 up-regulated genes in the diseased tissue in comparison to the control.
The database presents gene expression information for in vivo and in vitro cell entities:
Organs/tissues: The organ/tissue cards, which provide a low-resolution description of mammalian development, indicate the total number of genes described in the card itself and in all its related cards, including developmental path, anatomical compartment, in vivo and in vitro cell, and large scale dataset sample cards. Users can request the full list of genes, which annotates developmental path-specific and developmental path-enriched genes.
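The gene total on an organ/tissue card is a count of distinct genes across the card and its related cards. The sketch below assumes a hypothetical mapping from each card to the cards directly beneath it and collects genes by breadth-first traversal, so that a gene appearing on several cards (or orthologs sharing one gene symbol) is counted once. Card names and genes are placeholders.

```python
from collections import deque


def aggregate_genes(root: str, related: dict[str, list[str]], genes_by_card: dict[str, set[str]]) -> set[str]:
    """Distinct genes on root and on every card reachable from it."""
    seen = {root}
    queue = deque([root])
    genes: set[str] = set()
    while queue:
        card = queue.popleft()
        genes |= genes_by_card.get(card, set())
        for child in related.get(card, []):
            if child not in seen:  # guards against cycles and shared children
                seen.add(child)
                queue.append(child)
    return genes


related = {
    "spinal cord": ["developmental path 1"],
    "developmental path 1": ["compartment 1", "cell 1"],
}
genes_by_card = {
    "spinal cord": {"G1"},
    "developmental path 1": {"G1", "G2"},
    "compartment 1": {"G3"},
    "cell 1": {"G2", "G4"},
}
total = aggregate_genes("spinal cord", related, genes_by_card)
print(len(total), sorted(total))
```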
Below is a partial sample of the spinal cord gene expression data (the complete summary includes 64 entities and 2,901 genes – 17 developmental path-specific and 5 developmental path-enriched).
Each column in the file represents one gene that is expressed in the spinal cord or its related entities. Gene orthologs are referred to by the same gene symbol.
Each row represents a development path, anatomical compartment, cell or large scale dataset sample related to the spinal cord.
The list contains the following gene types:
In addition, the organ/tissue card includes a detailed gene expression table summarizing genes found to be expressed in samples taken from the whole tissue, based on high throughput gene expression comparisons and/or large scale data sets.
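The entity-by-gene layout of the spinal cord summary (one column per gene, one row per related entity) can be sketched as a presence/absence matrix. The 1/0 encoding, the CSV output and the entity and gene names are assumptions for illustration; the actual file may encode expression differently.

```python
import csv
import io


def presence_matrix(genes_by_entity: dict[str, set[str]]) -> tuple[list[str], list[list]]:
    """Rows are entities, columns are genes; 1 marks a gene expressed in that entity."""
    genes = sorted(set().union(*genes_by_entity.values()))
    rows = [[entity] + [1 if g in expressed else 0 for g in genes]
            for entity, expressed in genes_by_entity.items()]
    return genes, rows


def to_csv(genes: list[str], rows: list[list]) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entity"] + genes)
    writer.writerows(rows)
    return buffer.getvalue()


genes, rows = presence_matrix({
    "developmental path 1": {"G1", "G2"},
    "compartment 1": {"G2", "G3"},
})
print(to_csv(genes, rows))
```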
Anatomical compartments: Gene expression profiles in the anatomical compartment cards, which outline the developmental ancestry of specific temporospatial regions within an organ/tissue, are extracted from high throughput gene expression comparisons and/or large scale data sets related to the specific anatomical region. At the anatomical compartment level, genes expressed in any of the cells that comprise the related compartment are aggregated and presented.
In vivo cells and cultured stem, progenitor and primary cells: Gene expression profiles in the cell card, which details the developmental paths taken by cells directed toward specific lineages, include gene expression data collected through manual curation, high throughput gene expression comparisons and/or large scale data sets related to the specific cell.
The following information is provided for each gene in each card:
Positive selective cell marker. Genes that are either established markers of the cell or have been suggested to be characteristic of it.
Negative selective cell marker. Genes that are selectively not expressed in the cell, and whose absence of expression has been suggested to be characteristic of the cell. Usually, 'selectively not expressed' genes discriminate between developmentally similar cells.
Expressed genes. Genes that are known to be expressed in the cells but are not defined as selective cell markers.
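The three gene categories can be represented as an enumeration, and the marker definitions suggest a simple consistency check against an observed profile. This is a hypothetical sketch: the names are illustrative, and the strict all-or-nothing rule is an assumption, since the text does not describe how markers are used to score a cell.

```python
from enum import Enum


class GeneStatus(Enum):
    POSITIVE_MARKER = "positive selective cell marker"
    NEGATIVE_MARKER = "negative selective cell marker"
    EXPRESSED = "expressed"


def consistent_with_card(observed: set[str], card_genes: dict[str, GeneStatus]) -> bool:
    """True if every positive marker is observed and no negative marker is."""
    for gene, status in card_genes.items():
        if status is GeneStatus.POSITIVE_MARKER and gene not in observed:
            return False
        if status is GeneStatus.NEGATIVE_MARKER and gene in observed:
            return False
    # Genes with plain EXPRESSED status are not used by this check.
    return True


card = {
    "G1": GeneStatus.POSITIVE_MARKER,
    "G2": GeneStatus.NEGATIVE_MARKER,
    "G3": GeneStatus.EXPRESSED,
}
print(consistent_with_card({"G1", "G3"}, card))
print(consistent_with_card({"G1", "G2"}, card))
```

Negative markers matter most when two candidate cells share their positive markers, which matches their described role in discriminating between developmentally similar cells.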