Protocol
for Analysis of E. coli Membrane Array Data
Statistical Considerations:
We routinely obtain at least 2 replicates of each experimental condition.
The membrane arrays have duplicate spots for each gene and each spot is
considered to be a separate determination. We normalize individual hybridization
replicates by expressing the signal intensity for each spot as a percentage
of the sum of all spot intensities on the array (global normalization).
The uncertainty that lies behind individual gene measurements can be variously
calculated as the standard deviation of the determinations or the coefficient
of variation can be used. Since we are usually interested in the statistical
significance of differences between an experimental condition and a control,
we prefer the student t-test as a means for calculating the probability,
based on the uncertainty of the measurements in both conditions, that
gene expression differs significantly between the conditions. A conservative
approach is to consider significance at a level of P<0.0002, where
false positives might be expected for 1 in 5000 genes. We use the standard
deviation of the mean of the log ratios, within the context of the P value,
to indicate significant up- or down-regulation of gene expression. The
standard deviation for the log ratios is calculated and usually those
genes that differ by more than 3 standard deviations (99.9% confidence
in each tail) from the mean of the log ratio (usually zero, or no change)
are considered. This approach is meaningful where the expression level
of the majority of genes does not change significantly between conditions
and where the researcher is interested in genes that show substantially
different expression.
Downloads:
The following downloads provide analysis tools (and examples) for
processing membrane array data in a Windows-PC environment using
Microsoft Office tools. The macros embedded in these Microsoft Excel
workbooks were designed to work with Sigma-GenoSys
Panorama E. coli gene arrays. The macros are written in Visual Basic
and can be modified in VB Editor for use with other array designs.
6. If the array is stripped, proceed to a new hybridization experiment.
Store unused arrays in Saran Wrap at 4oC.
Array Data Analysis in Excel:
We have developed a series of manually executed macros, written
in Visual Basic for Microsoft Excel, that allow for rapid, semi-automated
gene expression data analysis. The first VB macro, (embedded in
Image Data Cruncher.xls)
assigns a spot number to each spot by its coordinate (location)
on the array, assigns gene names and associated annotation information
to spot numbers, and normalizes individual hybridization replicates
by expressing the signal intensity for each spot as a percentage
of the sum of all spot intensities. The second macro (embedded in
2-Replicate-Stats.xls)
streamlines statistical analysis of experimental replicates by application
of the student t-test to calculate p-values that indicate confidence
levels for the significance of the ratio of experimentals to controls,
based on the uncertainty of individual gene measurements. The third
macro (embedded in Data Analysis.xls)
introduces a threshold value, calculates expression ratios between
experimental and control conditions, and calculates an additional
statistical metric, the standard deviation of the mean of the log
ratios. A fourth macro (embedded in Data
Analysis.xls) is used to create a processed dataset suitable
for import into a Microsoft Access database. Step-by-step instructions
follow:
- Open the workbook called "Image
Data Cruncher", being sure to enable macros and open as read-only.
[This workbook has three worksheets, named: arvdata, allfields,
and Data Analysis. The "arvdata" worksheet is a blank worksheet,
"allfields" has information from Sigma-Genosys used to assign
spot numbers to spot-array coordinates, and the "data-analysis"
worksheet contains annotation information to assign the gene name
to each spot on the image.] Open the .XLS file that contains the
exported raw data from Array Vision. Select the entire data set
by clicking on the upper left had box, and copy/paste this information
into the blank "arvdata" worksheet beginning at cell A1. To run
the macro, choose Tools on the toolbar, select Macros, highlight
"AllDataCrunched7" and click Run. (These 6 subroutines can be
run manually, in order, to learn how the macro works.) Save the
crunched data as "condition-replicate.crunched" following your
standardized file-naming scheme.
- The second workbook (2-Replicate-Stats),
is used to execute the student t-test on two replicate data sets
for two conditions. [This workbook has four worksheets named,
"Enter Data (2 replicates)" "P-Raw", "P-Ln", and "All Values".]
Open the four crunched data files to be analyzed and the "2-Replicate-Stats"
workbook. The four data sets are sequentially copied and pasted
beginning with the first replicate of the control condition followed
by the second replicate of the control, then the first replicate
of the experimental, and the second replicate of the experimental.
Select columns A-F of the crunched data set in the "Data Analysis"
worksheet of the "condition replicate crunched file" and copy/paste
into the corresponding six columns of the "Enter Data (2 Replicates)"
worksheet in the "2-Replicate-Stats" workbook. The process is
repeated for the remaining three replicates. To run the macro,
choose Tools, select Macros, "AllAnalysis8", and click Run. Save
the file as "condition-replicate stats", following your file-naming
scheme.
- The third workbook (Data Analysis)
is used to execute two macros. The first calculates the log ratio
and a second statistical metric, the standard deviation of the
mean of the log ratios; the second macro creates an additional
spreadsheet containing selected data columns for import into Microsoft
Access. [The Data Analysis workbook has two worksheets named:
"crunched data" and "DataAnalysis".]
Open the "condition-replicate stats" file and the "Data
Analysis" workbook. Copy columns A-F of the "All Values"
worksheet in the "condition-replicate stats" workbook
and paste into the "crunched data" worksheet of the
"Data Analysis" workbook. To run the macro, choose Tools,
select Macros, "AllAnalysis6", and click Run. Next,
the P values calculated in the 2-Replicate-Stats workbook are
copied to the Data Analysis workbook: copy columns H-I of the
"All Values" worksheet in the "condition-replicate
stats" workbook and paste into columns I-J of the "Data
Analysis" worksheet of the "Data Analysis" workbook.
To run the second macro, select Tools, select Macros, highlight
"MakeDBsheet", and click Run. Save the file as "condition-replicate
analysis", following your file-naming scheme. The threshold
value can be adjusted by editing the “ThresholdRatios3”
subroutine in the “AllAnalysis6” macro to use a cell
either higher or lower than F501 [2nd line: Range("F501").Select].
- Combining multiple experiments in Microsoft Access: When multiple
time points or multiple conditions need to be compared, it is
useful to build a database. A sample database has been included
here called "sampleDB". There
are 3 tables in the sample database. This first one is a populated
annotation table. In this table is important reference information,
such as b#, array coordinate, gene, gene product, functional groupings,
and accession numbers. Both the control and the test tables are
populated with sample information. This information is identical
to that contained in the "Data Analysis" worksheet. To see how
the data tables are designed, highlight a table and click the
“Design” button. Additional tables can be created and populated
by copying/pasting the table (structure only), editing the design
as necessary, and pasting the appropriate data from the worksheet
created by MakeDB. Queries are designed by linking tables by b#,
as shown in the "sampleDB". Simple
queries can be run by manipulation of the parameters in query
design. It is best that users obtain a third-party software manual.
Data can be copy/pasted from Access into presentation software
such as Spotfire (www.spotfire.com).
Workbooks and Macros
Workbook |
Macro |
Subroutine |
Purpose |
Image Data
Cruncher |
AllDataCrunched6 |
ArvAllSort1 |
associates
array coordinate with spot number |
Image Data
Cruncher |
AllDataCrunched6 |
Nameall2 |
associates
spot number with unique identifier for gene |
Image Data
Cruncher |
AllDataCrunched6 |
CalcPct3 |
normalizes
data by expressing each spot as percentage of sum of all spot
intensities |
Image Data
Cruncher |
AllDataCrunched6 |
Cleanup4 |
reorganizes
data and calculates avg values for duplicate spots |
Image Data
Cruncher |
AllDataCrunched6 |
Statistics5 |
calculates
averages of genomic DNA controls and blank spots |
Workbook |
Macro |
Subroutine |
Purpose |
2-replicate-stats |
AllAnalysis8 |
OrganizebySpotNo1 |
sorts each
of the 4 data sets individually by spot number |
2-replicate-stats |
AllAnalysis8 |
CalculateAverages2 |
calculates
averages of volumes and pct values for the control and test
replicates |
2-replicate-stats |
AllAnalysis8 |
CopyValuesIntoPRaw3 |
copies and
pastes pct values into a separate spreadsheet for calculation
of p values |
2-replicate-stats |
AllAnalysis8 |
CalculateLn4 |
copies pct
values into a separate spreadsheet and natural log transforms
data |
2-replicate-stats |
AllAnalysis8 |
CalculatePRaw5 |
calculates
the Pvalue for the raw data by application of the student t-test |
2-replicate-stats |
AllAnalysis8 |
CalculatePLn6 |
calculates
the Pvalue for the log transformed data by application of the
student t-test |
2-replicate-stats |
AllAnalysis8 |
CopyAllValues7 |
copies and
pastes data used for ratio calculations into separate spreadsheet& |
Workbook |
Macro |
Subroutine |
Purpose |
Data
Analysis |
AllAnalysis6 |
SpotSort1 |
sorts
control and test data sets by spot number |
Data
Analysis |
AllAnalysis6 |
DataSort2 |
copies
data set into spread sheet used for ratio calculations, sorts
by total pct value |
Data
Analysis |
AllAnalysis6 |
ThresholdRatios3 |
calculates
ratio of Test/Control using threshold of tot pct value for 500th
lowest gene |
Data
Analysis |
AllAnalysis6 |
Cleanup4 |
reorganizes
data and calculates log (10) of ratio |
Data
Analysis |
AllAnalysis6 |
Stats5 |
calculates
standard deviation of log ratio values and correlation between
Test and Control |
Data
Analysis |
Manual
Step |
|
copy
and paste special values Praw and PLn values from 2-replicate-stats
to Data Analysis |
Data
Analysis |
MakeDBsheet |
|
reorganizes
data and copies into separate spreadsheet for entry into Access
database |
Directory Structure:
Array Analysis
User Name
Project Name
Images (.gel TIFF image files)
Export (.xls raw data files from ArrayVision)
Crunched (.xls crunched image files created in Image
Data Cruncher)
Stats (.xls files with student t-test results created in 2-Replicate-Stats)
Analysis (.xls files with ratio calculations created in Data
Analysis)
|