- What are Gene Expression Samples and Conditions?
- How to access Gene Expression Samples and Conditions?
- What is the z-score/significance value and how is it calculated?
- How is the P-value calculated?
- Gene Expression Samples and Conditions Graph
- Gene Expression Samples and Conditions Data table
Samples and Conditions can be accessed from the Gene Expression section of a Gene Detail page. For example, see the figure below.
Fig 1. Part of Gene Detail page for lexA showing Gene Expression section with a link to Gene Expression Samples and Conditions.
The standard z-score indicates how many standard deviations a measured expression value is above or below the mean. The higher the absolute value of the z-score, the more the gene's expression deviates from the mean and the more significant the change in expression. We report the significance score as the absolute value of the z-score, so a significantly expressed gene might be up- or down-regulated in a sample.
To calculate the standard z-score, we used the approach described by Yang et al., with slight modifications. The z-score is calculated by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. To facilitate the z-score calculation, data from each hybridization/sample are first graphed in an MA plot. The MA plot is a graphical way to see log ratios and fluorescence intensity at the same time. It was proposed by Dudoit et al., who defined it as:
A = 1/2 * (log2(Cy5) + log2(Cy3))
M = log2(Cy5 / Cy3)
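As a minimal sketch of the two definitions above, the following Python function computes M and A for a single feature. The function name `ma_values` and the example intensities are hypothetical, and the inputs are assumed to be positive, background-corrected intensities.

```python
import math

def ma_values(cy5, cy3):
    """Return (M, A) for one feature.

    cy5, cy3: positive channel intensities (assumed to be background-corrected).
    """
    m = math.log2(cy5 / cy3)                        # log ratio
    a = 0.5 * (math.log2(cy5) + math.log2(cy3))     # average log intensity
    return m, a

# Hypothetical feature: Cy5 is four times brighter than Cy3.
m, a = ma_values(1600.0, 400.0)
print(f"M = {m:.3f}, A = {a:.3f}")
```

Each feature on the slide then contributes one (A, M) point to the MA plot.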
Then, z-scores are calculated for each feature on the MA plot in a sliding window of 2000 features across the x-axis. The z-scores of individual features on the microarray slide that represent the same gene are averaged, and this average is the z-score for that gene.
Figure 2 shows the results of such a calculation in one sample named "10ug/mL EMB vs DMSO, 6h", with data values less than one standard deviation from the mean shown in blue, those between one and two standard deviations from the mean plotted in green, and those more than two standard deviations from the mean indicated in red. The further a differential gene expression value is from the mean expression for the sample, the more significant the expression value is likely to be.
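The sliding-window calculation can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact procedure used by the database: the window is assumed to be centered on each feature (clamped at both ends of the A axis), and the function name `sliding_window_z` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def sliding_window_z(features, window=2000):
    """Average per-gene z-scores of M within a sliding window along A.

    features: list of (gene, A, M) tuples, one per microarray feature.
    Returns {gene: mean z-score over all features for that gene}.
    """
    ordered = sorted(features, key=lambda f: f[1])   # order features along the x-axis (A)
    n = len(ordered)
    size = min(window, n)
    per_gene = defaultdict(list)
    for i, (gene, _a, m) in enumerate(ordered):
        # Assumption: window centered on the feature, clamped at the ends.
        start = max(0, min(i - size // 2, n - size))
        local_m = [f[2] for f in ordered[start:start + size]]
        sd = pstdev(local_m)                         # population standard deviation
        z = (m - fmean(local_m)) / sd if sd > 0 else 0.0
        per_gene[gene].append(z)
    # Features representing the same gene are averaged.
    return {gene: fmean(zs) for gene, zs in per_gene.items()}

# Toy data with a tiny window; real slides would use window=2000.
toy = [("geneA", 1.0, -1.0), ("geneB", 2.0, 0.0), ("geneC", 3.0, 1.0)]
z_scores = sliding_window_z(toy, window=3)
significance = {gene: abs(z) for gene, z in z_scores.items()}
print(significance)
```

The reported significance is the absolute value of the averaged z-score, as in the last line of the example.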
Figure 2. MA plot for "10ug/mL EMB vs DMSO, 6h" sample (in tbdb database).
The above method is followed for calculating z-scores for type one experiments, where the control sample is tailored to the particular experiment/slide (not a common reference). For type two gene expression experiments, where the control RNA is made from a common reference, the z-score calculation involves the following slight modification.
Normalization and log ratios are calculated on a per-experiment-set basis. In each experiment set, the values in channel 2 are averaged, and this average is treated as the channel 1 control for calculations. For certain experiments, such as time courses or drug treatments, where multiple replicates of zero-time-point or no-treatment controls are available, the average channel 2 values from these slides are used as the control instead of the average of all channel 2 values. After this, the z-scores are calculated in the same way as described above for type one experiments.
For Affymetrix slides (single-channel data), the slides in a set are summarized and normalized, then a mean is calculated for each row in the dataset. These average values are used as the reference to convert values to log2 ratios.
The probability that a given condition is significant across all samples is determined by an approach similar to that used in the GO::TermFinder program described by Boyle et al. The hypergeometric distribution is used to determine the probability that a given condition will be significant within the sample size. Given a population of N experiments, of which M meet the significance threshold, what is the probability that x or more samples with a given condition meet the threshold, given that there are n total samples with that condition?
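A rough sketch of the type two modification is shown below. It assumes that the channel 2 values are averaged arithmetically per feature before the log2 ratio is taken; the text does not specify this detail. The function name, the slide names, and the intensities are hypothetical.

```python
import math
from statistics import fmean

def log_ratios_vs_pooled_reference(channel2, control_slides=None):
    """Convert channel 2 intensities to log2 ratios against a pooled control.

    channel2: {slide name: [intensity per feature]} for one experiment set.
    control_slides: optional slide names (e.g. zero-time-point replicates)
        whose average is used as the control; defaults to all slides.
    """
    ref_slides = control_slides or list(channel2)
    n_features = len(next(iter(channel2.values())))
    # Per-feature average of channel 2, treated as the channel 1 control.
    reference = [fmean(channel2[s][i] for s in ref_slides) for i in range(n_features)]
    return {
        slide: [math.log2(v / r) for v, r in zip(values, reference)]
        for slide, values in channel2.items()
    }

# Hypothetical time course with one zero-time-point slide as the control.
experiment_set = {"t0": [100.0, 200.0], "t6h": [400.0, 200.0]}
ratios = log_ratios_vs_pooled_reference(experiment_set, control_slides=["t0"])
print(ratios["t6h"])
```

Omitting `control_slides` uses the average of all channel 2 values in the set, which corresponds to the default case described above.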
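The hypergeometric tail probability described above can be written out as a short Python sketch. The function name `condition_p_value` is hypothetical, and the numbers in the example are made up for illustration; the symbols N, M, n, and x follow the text.

```python
from math import comb

def condition_p_value(N, M, n, x):
    """P(X >= x) for a hypergeometric draw.

    N: total number of experiments (population)
    M: experiments meeting the significance threshold
    n: samples annotated with the condition
    x: samples with the condition that meet the threshold
    """
    total = comb(N, n)
    upper = min(M, n)
    # Sum the probabilities of observing k = x, x+1, ..., min(M, n) hits.
    hits = sum(comb(M, k) * comb(N - M, n - k) for k in range(x, upper + 1))
    return hits / total

# Made-up example: 10 experiments, 4 significant, 3 with the condition,
# 2 of which are significant.
p = condition_p_value(N=10, M=4, n=3, x=2)
print(f"P-value = {p:.4f}")
```

A small P-value suggests that the condition is over-represented among the significant samples relative to what random sampling would produce.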
By default, the histogram shows the significance value (the absolute value of the z-score) on the x-axis and the number of samples within a z-score range on the y-axis. By selecting the appropriate radio button at the bottom of the histogram, you can regenerate the histogram to plot intensity or log ratio (base 2) (Expression Value) on the x-axis. A higher significance value indicates that the gene's expression deviates significantly from the mean in that sample. Note that significant gene expression can result from both up-regulation and down-regulation. Refer to the "Expression Value" in the table to the right to see whether a gene is "up" or "down". Positive expression values indicate elevated gene expression and negative values indicate decreased gene expression. Using this histogram and its expandable sliding selection bar, you can choose samples within a certain significance value, intensity, or log (base 2) ratio range. Histogram bars falling within the selected range are highlighted. Samples/hybridizations corresponding to the selected range are displayed in a table on the right-hand side of the page.
By default, the data in the "Hide/Show Experimental Conditions" section above the histogram is hidden. You can click on the section to show or hide the data. This section displays the experimental conditions corresponding to the samples selected in the histogram. For each experimental condition, it displays the P-value and the percentage of samples in the selected area out of the total samples annotated with that experimental condition.
Fig 3. Graph view showing the significance values for lexA across all samples/hybridizations. Selected significance values are indicated in blue; samples where the gene shows that range of significance values are described in the Samples and Conditions Data table.
Samples falling within a selected significance value range are displayed in a table on the right-hand side. This table displays the following information.
Sample Name: Description of the sample(s) hybridized. Provided by the researcher or obtained from a publication.
Experimental Condition: Description of the key experimental parameter of the sample studied. This can be mutant information, a drug treatment, or some other growth condition studied. Provided by the researcher or obtained from a publication by a curator. These experimental conditions are color coded by P-value. The legend is shown in the tooltip when you hover your mouse over the column header.
Expression Value: Positive values indicate that the gene is expressed at a higher level than in the control, and negative values indicate that it is expressed at a lower level than in the control. For two-color microarrays, the expression value is the log ratio (base 2) of the normalized, background-corrected intensity of the test sample (usually the red channel) to the background-corrected intensity of the reference sample (usually the green channel). This is sometimes referred to as log(2)[(corrected, normalized channel 2 intensity) / (corrected channel 1 intensity)] or the log(2) ratio. For one-channel microarrays, the expression value is the log(2) of the background-corrected, normalized intensity.
Intensity: log(2) transformed value of the product of background-corrected intensities from each channel.
Significance: Absolute value of the z-score for the expression value. A positive z-score indicates that a gene's expression compared to its intensity is above the median while a negative z-score indicates that it is below the median. The higher the absolute value of the z-score, the more the gene's expression deviates from the median and is therefore more likely to reveal underlying physiological processes.
Publication: Link to the publication that generated this data as well as the full complement of microarray data from the paper.
Fig 4. Table view of selected samples from the graph.
You can sort samples by name, intensity, expression value, category, etc. You can select or unselect individual samples in this table by checking or unchecking the corresponding boxes. You can download the data from this table. There are also various other download and analysis options, as indicated by the following icons.
Download Table Content: Allows you to download the data selected from the graph and displayed in the table as an Excel spreadsheet.
Download Raw Datafiles: Allows you to download raw data for the samples listed in the table. Raw data files represent a data dump from PortEco along with original and normalized data values with current, up-to-date annotations.
Download PCL file: Allows you to download data for all genes from the samples listed in the table in pre-clustering (PCL) file format.
Allows you to take a PCL file for the selected samples into Gene Pattern (hyperlink to GP web site) implemented within PortEco. Gene Pattern provides a suite of dozens of data analysis tools that were implemented within PortEco to give users seamless access.
This will take you to hierarchical clustering with a pre-clustering (PCL) file containing data for all genes from the samples listed in the table.
Add to Repository: Allows you to save the PCL file into your data repository at PortEco (if you have an account with Expression::PortEco).