05/05/2020

Quality Metrics Results

Legend	Name	Description
A	Overview	Table containing the index, title, and validation of analyzed samples. The validation column displays "OK" for samples without outliers, or one or more section numbers where statistical outliers were found in the samples.
B	Full report	Generates the full report with the complete description of each analysis section.
C	Export results	Exports the results as HTML to the local computer.
D	Save project	Saves the results inside a project.
E	Sections	Tabs for each results section.

Section 1: Between Array Comparison

Legend	Name	Description
A	Distance between arrays	See remarks
B	Distance between arrays (Outliers)	See remarks
C	Principal Component Analysis (PCA)	See remarks

Remarks

Figure 1.1 shows a false color heatmap of the distances between arrays. The color scale is chosen to cover the range of distances encountered in the dataset. Patterns in this plot can indicate clustering of the arrays because of either intended biological or unintended experimental factors (batch effects). The distance d_ab between two arrays a and b is computed as the mean absolute difference (L₁-distance) between the data of the arrays (using the data from all probes without filtering). As a formula, d_ab = mean_i |M_ai - M_bi|, where M_ai is the value of the i-th probe on the a-th array. Outlier detection was performed by looking for arrays for which the sum of the distances to all other arrays, S_a = Σ_b d_ab, was exceptionally large. One such array was detected, and it is marked by an asterisk, *.
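
The distance and the outlier criterion described above can be sketched in a few lines. This is a minimal illustration, not GEAP's implementation; the function names and the sample values are hypothetical.

```python
from statistics import mean

def array_distance(a, b):
    """d_ab: mean absolute difference (L1 distance) over all probes."""
    return mean(abs(x - y) for x, y in zip(a, b))

def distance_sums(arrays):
    """S_a: for each array, the sum of its distances to all other arrays."""
    n = len(arrays)
    return [
        sum(array_distance(arrays[a], arrays[b]) for b in range(n) if b != a)
        for a in range(n)
    ]

# Hypothetical data: three similar arrays and one shifted array, four probes each.
arrays = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0],
]
sums = distance_sums(arrays)
print(sums)  # the last array has by far the largest S_a
```

An array whose S_a stands well apart from the rest, like the last one here, is the kind of array the heatmap marks with an asterisk.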

Figure 1.2 shows a bar chart of the sum of distances to other arrays S_a, the outlier detection criterion from the previous figure. The bars are shown in the original order of the arrays. Based on the distribution of the values across all arrays, a threshold of 866 was determined, which is indicated by the vertical line. One array exceeded the threshold and was considered an outlier.
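
This page does not spell out how the threshold is derived from the distribution of values. The sketch below assumes a boxplot-style upper fence (third quartile plus 1.5 times the interquartile range). The rule, the function name, and the S_a values are illustrative assumptions, not a confirmed description of GEAP's method.

```python
from statistics import quantiles

def upper_fence(values, coef=1.5):
    """Assumed rule: third quartile plus coef times the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + coef * (q3 - q1)

# Hypothetical S_a values for six arrays, in their original order.
s_a = [610.0, 640.0, 655.0, 630.0, 620.0, 980.0]

threshold = upper_fence(s_a)
outliers = [i for i, value in enumerate(s_a) if value > threshold]
print(threshold, outliers)
```

Under this assumed rule only the last array exceeds the threshold, which mirrors the single bar crossing the vertical line in the chart.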

Figure 1.3 shows a scatterplot of the arrays along the first two principal components. You can use this plot to explore if the arrays cluster, and whether this is according to an intended experimental factor (you can indicate such a factor by color using the 'intgroup' argument), or according to unintended causes such as batch effects. Move the mouse over the points to see the sample names.
Principal component analysis is a dimension reduction and visualisation technique that is here used to project the multivariate data vector of each array into a two-dimensional plot, such that the spatial arrangement of the points in the plot reflects the overall data (dis)similarity between the arrays.

Section 2: Array Intensity Distributions

Legend	Name	Description
A	Boxplots	See remarks
B	Boxplots (Outliers)	See remarks
C	Density plots	See remarks

Remarks

Figure 2.1 shows boxplots representing summaries of the signal intensity distributions of the arrays. Each box corresponds to one array. Typically, one expects the boxes to have similar positions and widths. If the distribution of an array is very different from the others, this may indicate an experimental problem. Outlier detection was performed by computing the Kolmogorov-Smirnov statistic K_a between each array's distribution and the distribution of the pooled data.
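
The Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical cumulative distribution functions. A minimal sketch, comparing each array with the pooled data of all arrays; the function names and the sample values are hypothetical.

```python
from bisect import bisect_right

def ecdf(sorted_values, x):
    """Fraction of values less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)

def ks_statistic(sample, reference):
    """Largest absolute gap between the two empirical CDFs."""
    s, r = sorted(sample), sorted(reference)
    # Both CDFs are step functions, so the gap only changes at data points.
    return max(abs(ecdf(s, x) - ecdf(r, x)) for x in set(s) | set(r))

# Hypothetical data: two identical arrays and one shifted to higher intensities.
arrays = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]
pooled = [value for array in arrays for value in array]
k_a = [ks_statistic(array, pooled) for array in arrays]
print(k_a)  # the shifted array has the largest K_a
```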

Figure 2.2 shows a bar chart of the Kolmogorov-Smirnov statistic K_a, the outlier detection criterion from the previous figure. The bars are shown in the original order of the arrays. Based on the distribution of the values across all arrays, a threshold of 0.0233 was determined, which is indicated by the vertical line. One array exceeded the threshold and was considered an outlier.

Figure 2.3 shows density estimates (smoothed histograms) of the data. Typically, the distributions of the arrays should have similar shapes and ranges. Arrays whose distributions are very different from the others should be considered for possible problems. Various features of the distributions can be indicative of quality-related phenomena. For instance, high levels of background will shift an array's distribution to the right. Lack of signal diminishes its right tail. A bulge at the upper end of the intensity range often indicates signal saturation.

Section 3: Individual Array Quality

Legend	Name	Description
A	MA Plots	See remarks
B	MA Plots (Outliers)	See remarks

Remarks

Figure 3.1 shows MA plots. M and A are defined as:
M = log₂(I₁) - log₂(I₂)
A = 1/2 (log₂(I₁)+log₂(I₂)),
where I₁ is the intensity of the array studied, and I₂ is the intensity of a "pseudo"-array that consists of the median across arrays. Typically, we expect the mass of the distribution in an MA plot to be concentrated along the M = 0 axis, and there should be no trend in M as a function of A. If there is a trend in the lower range of A, this often indicates that the arrays have different background intensities; this may be addressed by background correction. A trend in the upper range of A can indicate saturation of the measurements; in mild cases, this may be addressed by non-linear normalisation (e.g. quantile normalisation).
Outlier detection was performed by computing Hoeffding's statistic D_a on the joint distribution of A and M for each array. The value of D_a is shown in the panel headings. No arrays had D_a > 0.15, so none were marked as outliers. For more information on Hoeffding's D-statistic, please see the manual page of the function hoeffd in the Hmisc package.
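
The definitions of M and A can be checked with a small calculation. A minimal sketch, assuming raw (untransformed) intensities as input and a per-probe median as the "pseudo"-array; the function name and the sample values are hypothetical.

```python
from math import log2
from statistics import median

def ma_values(arrays, index):
    """Return (M, A) per probe for arrays[index] against the median pseudo-array."""
    # I2: per-probe median intensity across all arrays.
    reference = [median(probe) for probe in zip(*arrays)]
    target = arrays[index]  # I1: intensities of the array studied
    m = [log2(i1) - log2(i2) for i1, i2 in zip(target, reference)]
    a = [0.5 * (log2(i1) + log2(i2)) for i1, i2 in zip(target, reference)]
    return m, a

# Hypothetical intensities: three arrays, three probes each.
arrays = [
    [2.0, 8.0, 32.0],
    [4.0, 8.0, 64.0],
    [8.0, 8.0, 128.0],
]
m, a = ma_values(arrays, 2)
print(m)  # [1.0, 0.0, 1.0]
print(a)  # [2.5, 3.0, 6.5]
```

A probe with M = 0 has the same intensity as the median pseudo-array; here the first and third probes of the studied array are twice as bright as the median.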

Figure 3.2 shows a bar chart of D_a, the outlier detection criterion from the previous figure. The bars are shown in the original order of the arrays. A threshold of 0.15 was used, which is indicated by the vertical line. None of the arrays exceeded the threshold, so none was considered an outlier.

Section 4: Variance Mean Dependency

Legend	Name	Description
A	Standard Deviation versus Rank of the Mean	See remarks

Remarks

Figure 4 shows a density plot of the standard deviation of the intensities across arrays on the y-axis versus the rank of their mean on the x-axis. The red dots, connected by lines, show the running median of the standard deviation. After normalisation and transformation to a logarithm(-like) scale, one typically expects the red line to be approximately horizontal, that is, to show no substantial trend. In some cases, a hump on the right-hand side of the x-axis can be observed and is symptomatic of a saturation of the intensities.
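
The quantities behind this plot can be sketched as follows: a per-probe standard deviation across arrays, ordered by the rank of the per-probe mean, then smoothed with a running median. The function names, the window size, and the sample values are hypothetical, and the exact smoothing used for the red line is not stated on this page.

```python
from statistics import mean, median, stdev

def sd_by_mean_rank(arrays):
    """Per-probe standard deviations, ordered by the rank of the per-probe mean."""
    probes = zip(*arrays)  # one tuple of values per probe, across arrays
    ranked = sorted((mean(p), stdev(p)) for p in probes)
    return [sd for _, sd in ranked]

def running_median(values, window):
    """Median over each run of `window` consecutive values."""
    return [
        median(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]

# Hypothetical log-scale data: three arrays, five probes each.
arrays = [
    [1.0, 5.0, 3.0, 9.0, 8.0],
    [2.0, 5.0, 3.0, 9.0, 10.0],
    [3.0, 5.0, 3.0, 9.0, 12.0],
]
sds = sd_by_mean_rank(arrays)
trend = running_median(sds, window=3)
print(sds, trend)
```

A roughly constant running median corresponds to the horizontal red line expected after normalisation; a rise at the high-rank end would correspond to the hump described above.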

Quality Analysis Results - Export HTML

Legend	Name	Description
A	Report name
B	Image format	Options: PNG, JPEG, GIF, BMP, SVG, and WMF (windows-only)
C	Save mode	Embedded in the file or by separate image files.
D
E

References

Kauffmann A, Gentleman R, Huber W (2009). arrayQualityMetrics - a bioconductor package for quality assessment of microarray data. Bioinformatics. [LINK]

GEAP

Gene Expression Analysis Platform
