Protocol for Analysis of E. coli Membrane Array Data

Statistical Considerations:

We routinely obtain at least 2 replicates of each experimental condition. The membrane arrays have duplicate spots for each gene and each spot is considered to be a separate determination. We normalize individual hybridization replicates by expressing the signal intensity for each spot as a percentage of the sum of all spot intensities on the array (global normalization). The uncertainty that lies behind individual gene measurements can be variously calculated as the standard deviation of the determinations or the coefficient of variation can be used. Since we are usually interested in the statistical significance of differences between an experimental condition and a control, we prefer the student t-test as a means for calculating the probability, based on the uncertainty of the measurements in both conditions, that gene expression differs significantly between the conditions. A conservative approach is to consider significance at a level of P<0.0002, where false positives might be expected for 1 in 5000 genes. We use the standard deviation of the mean of the log ratios, within the context of the P value, to indicate significant up- or down-regulation of gene expression. The standard deviation for the log ratios is calculated and usually those genes that differ by more than 3 standard deviations (99.9% confidence in each tail) from the mean of the log ratio (usually zero, or no change) are considered. This approach is meaningful where the expression level of the majority of genes does not change significantly between conditions and where the researcher is interested in genes that show substantially different expression.

Downloads:

The following downloads provide analysis tools (and examples) for processing membrane array data in a Windows-PC environment using Microsoft Office tools. The macros embedded in these Microsoft Excel workbooks were designed to work with Sigma-GenoSys Panorama E. coli gene arrays. The macros are written in Visual Basic and can be modified in VB Editor for use with other array designs. 6. If the array is stripped, proceed to a new hybridization experiment. Store unused arrays in Saran Wrap at 4oC.


Array Data Analysis in Excel:


We have developed a series of manually executed macros, written in Visual Basic for Microsoft Excel, that allow for rapid, semi-automated gene expression data analysis. The first VB macro, (embedded in Image Data Cruncher.xls) assigns a spot number to each spot by its coordinate (location) on the array, assigns gene names and associated annotation information to spot numbers, and normalizes individual hybridization replicates by expressing the signal intensity for each spot as a percentage of the sum of all spot intensities. The second macro (embedded in 2-Replicate-Stats.xls) streamlines statistical analysis of experimental replicates by application of the student t-test to calculate p-values that indicate confidence levels for the significance of the ratio of experimentals to controls, based on the uncertainty of individual gene measurements. The third macro (embedded in Data Analysis.xls) introduces a threshold value, calculates expression ratios between experimental and control conditions, and calculates an additional statistical metric, the standard deviation of the mean of the log ratios. A fourth macro (embedded in Data Analysis.xls) is used to create a processed dataset suitable for import into a Microsoft Access database. Step-by-step instructions follow:

  • Open the workbook called "Image Data Cruncher", being sure to enable macros and open as read-only. [This workbook has three worksheets, named: arvdata, allfields, and Data Analysis. The "arvdata" worksheet is a blank worksheet, "allfields" has information from Sigma-Genosys used to assign spot numbers to spot-array coordinates, and the "data-analysis" worksheet contains annotation information to assign the gene name to each spot on the image.] Open the .XLS file that contains the exported raw data from Array Vision. Select the entire data set by clicking on the upper left had box, and copy/paste this information into the blank "arvdata" worksheet beginning at cell A1. To run the macro, choose Tools on the toolbar, select Macros, highlight "AllDataCrunched7" and click Run. (These 6 subroutines can be run manually, in order, to learn how the macro works.) Save the crunched data as "condition-replicate.crunched" following your standardized file-naming scheme.


  • The second workbook (2-Replicate-Stats), is used to execute the student t-test on two replicate data sets for two conditions. [This workbook has four worksheets named, "Enter Data (2 replicates)" "P-Raw", "P-Ln", and "All Values".] Open the four crunched data files to be analyzed and the "2-Replicate-Stats" workbook. The four data sets are sequentially copied and pasted beginning with the first replicate of the control condition followed by the second replicate of the control, then the first replicate of the experimental, and the second replicate of the experimental. Select columns A-F of the crunched data set in the "Data Analysis" worksheet of the "condition replicate crunched file" and copy/paste into the corresponding six columns of the "Enter Data (2 Replicates)" worksheet in the "2-Replicate-Stats" workbook. The process is repeated for the remaining three replicates. To run the macro, choose Tools, select Macros, "AllAnalysis8", and click Run. Save the file as "condition-replicate stats", following your file-naming scheme.


  • The third workbook (Data Analysis) is used to execute two macros. The first calculates the log ratio and a second statistical metric, the standard deviation of the mean of the log ratios; the second macro creates an additional spreadsheet containing selected data columns for import into Microsoft Access. [The Data Analysis workbook has two worksheets named: "crunched data" and "DataAnalysis".] Open the "condition-replicate stats" file and the "Data Analysis" workbook. Copy columns A-F of the "All Values" worksheet in the "condition-replicate stats" workbook and paste into the "crunched data" worksheet of the "Data Analysis" workbook. To run the macro, choose Tools, select Macros, "AllAnalysis6", and click Run. Next, the P values calculated in the 2-Replicate-Stats workbook are copied to the Data Analysis workbook: copy columns H-I of the "All Values" worksheet in the "condition-replicate stats" workbook and paste into columns I-J of the "Data Analysis" worksheet of the "Data Analysis" workbook. To run the second macro, select Tools, select Macros, highlight "MakeDBsheet", and click Run. Save the file as "condition-replicate analysis", following your file-naming scheme. The threshold value can be adjusted by editing the “ThresholdRatios3” subroutine in the “AllAnalysis6” macro to use a cell either higher or lower than F501 [2nd line: Range("F501").Select].


  • Combining multiple experiments in Microsoft Access: When multiple time points or multiple conditions need to be compared, it is useful to build a database. A sample database has been included here called "sampleDB". There are 3 tables in the sample database. This first one is a populated annotation table. In this table is important reference information, such as b#, array coordinate, gene, gene product, functional groupings, and accession numbers. Both the control and the test tables are populated with sample information. This information is identical to that contained in the "Data Analysis" worksheet. To see how the data tables are designed, highlight a table and click the “Design” button. Additional tables can be created and populated by copying/pasting the table (structure only), editing the design as necessary, and pasting the appropriate data from the worksheet created by MakeDB. Queries are designed by linking tables by b#, as shown in the "sampleDB". Simple queries can be run by manipulation of the parameters in query design. It is best that users obtain a third-party software manual. Data can be copy/pasted from Access into presentation software such as Spotfire (www.spotfire.com).

Workbooks and Macros

Workbook Macro Subroutine Purpose
Image Data Cruncher AllDataCrunched6 ArvAllSort1 associates array coordinate with spot number
Image Data Cruncher AllDataCrunched6 Nameall2 associates spot number with unique identifier for gene
Image Data Cruncher AllDataCrunched6 CalcPct3 normalizes data by expressing each spot as percentage of sum of all spot intensities
Image Data Cruncher AllDataCrunched6 Cleanup4 reorganizes data and calculates avg values for duplicate spots
Image Data Cruncher AllDataCrunched6 Statistics5 calculates averages of genomic DNA controls and blank spots
Workbook Macro Subroutine Purpose
2-replicate-stats AllAnalysis8 OrganizebySpotNo1 sorts each of the 4 data sets individually by spot number
2-replicate-stats AllAnalysis8 CalculateAverages2 calculates averages of volumes and pct values for the control and test replicates
2-replicate-stats AllAnalysis8 CopyValuesIntoPRaw3 copies and pastes pct values into a separate spreadsheet for calculation of p values
2-replicate-stats AllAnalysis8 CalculateLn4 copies pct values into a separate spreadsheet and natural log transforms data
2-replicate-stats AllAnalysis8 CalculatePRaw5 calculates the Pvalue for the raw data by application of the student t-test
2-replicate-stats AllAnalysis8 CalculatePLn6 calculates the Pvalue for the log transformed data by application of the student t-test
2-replicate-stats AllAnalysis8 CopyAllValues7 copies and pastes data used for ratio calculations into separate spreadsheet&
Workbook Macro Subroutine Purpose
Data Analysis AllAnalysis6 SpotSort1 sorts control and test data sets by spot number
Data Analysis AllAnalysis6 DataSort2 copies data set into spread sheet used for ratio calculations, sorts by total pct value
Data Analysis AllAnalysis6 ThresholdRatios3 calculates ratio of Test/Control using threshold of tot pct value for 500th lowest gene
Data Analysis AllAnalysis6 Cleanup4 reorganizes data and calculates log (10) of ratio
Data Analysis AllAnalysis6 Stats5 calculates standard deviation of log ratio values and correlation between Test and Control
Data Analysis Manual Step   copy and paste special values Praw and PLn values from 2-replicate-stats to Data Analysis
Data Analysis

MakeDBsheet

  reorganizes data and copies into separate spreadsheet for entry into Access database


Directory Structure:


Array Analysis
 indentMarker User Name
     indentMarker Project Name
         indentMarker Images (.gel TIFF image files)
         indentMarker Export (.xls raw data files from ArrayVision)
         indentMarker Crunched (.xls crunched image files created in Image Data Cruncher)
         indentMarker Stats (.xls files with student t-test results created in 2-Replicate-Stats)
         indentMarker Analysis (.xls files with ratio calculations created in Data Analysis)