Gaggle and Firegoose: integrate, explore and analyze data
In this example, we explore and analyze gene expression changes in response to perturbations in oxygen tension. The exploration uses six different tools, all developed independently of one another:
DMV (Data Matrix Viewer, developed at ISB) for visualizing matrices; in this example, two matrices of matching dimensions: one (log10 ratios) holding the magnitude of change in expression for each of 2400 genes, and another (lambdas) holding the significance of each measured change in the first matrix
R (from R-project.org) for statistical analysis using command-line operations
MEV (from Dana-Farber Cancer Institute) for statistical analysis using simple point-and-click operations
Genome Browser (from ISB) for visualizing the genome map along with tracks for Transcriptome Structure (based on total RNA hybridization to high-density arrays) and Protein-DNA interactions (ChIP-chip data)
Cytoscape (from Agilent, ISB, Sloan-Kettering, NCBI, Institut Pasteur, UToronto, UCSD, UCSF, and Unilever) for visualizing evolutionary, literature-based, and functional associations (using comparative genomics and literature mining) from the STRING database
Firegoose (from ISB) for performing interactive queries against online databases (KEGG in this example)
(0:00) From DMV to R and back. The exploration begins in DMV. The user wishes to filter the matrix based on the significance of the change in gene expression. To do this, he selects the lambda matrix and broadcasts it to R. In R, he retrieves the matrix (with the getMatrix() function), filters it (filterMatrix()), and broadcasts the names of the genes that match the filtering criteria to DMV (broadcast(filterMatrix())), calling setTargetGoose('DMV') before issuing the broadcast command.
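The filtering step can be sketched outside of R as follows. This is only an illustration of the idea, not the filterMatrix() function used in the demo: the function name filter_genes, the threshold of 15.0, the rule that a gene must pass in at least two conditions, the assumption that larger lambda values mean higher significance, and the gene names are all hypothetical.

```python
# Illustrative sketch only: filter_genes, the threshold, and the gene names
# are assumptions, not part of the Gaggle R package or the demo itself.

def filter_genes(lambdas, threshold, min_conditions=1):
    """Return names of genes whose lambda value reaches `threshold`
    in at least `min_conditions` columns of the lambda matrix."""
    selected = []
    for gene, values in lambdas.items():
        hits = sum(1 for v in values if v >= threshold)
        if hits >= min_conditions:
            selected.append(gene)
    return selected

# Toy lambda matrix: one row per gene, one column per condition.
lambdas = {
    "geneA": [2.0, 30.1, 45.2],
    "geneB": [1.2, 3.4, 0.8],
    "geneC": [18.0, 22.5, 5.0],
}

significant = filter_genes(lambdas, threshold=15.0, min_conditions=2)
print(significant)  # ['geneA', 'geneC']
```

The resulting list of names corresponds to what is broadcast back to DMV in the walkthrough.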
From MEV to Genome Browser. Next, the user slices the main log10 ratios matrix using the genes selected by the R broadcast. The expression changes of the genes in this submatrix are significant, but their profiles have complex relationships to one another. For further analysis, he sends the submatrix to MEV, where he performs hierarchical clustering to find correlated patterns of change in gene expression. To learn more about a specific branch in the hierarchical tree, he selects the branch by clicking, sets Genome Browser as the destination, and broadcasts the gene names.
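As a rough illustration of slicing the ratios matrix and looking for correlated profiles, the sketch below keeps only the selected rows and computes the Pearson correlation between pairs of profiles. Correlation is one common similarity measure underlying hierarchical clustering; whether MEV was configured this way in the demo is not stated, so treat that choice, the helper names slice_rows and pearson, and the toy data as assumptions.

```python
from math import sqrt

# Illustrative sketch only: slice_rows, pearson, and the data are hypothetical.

def slice_rows(matrix, genes):
    """Keep only the rows of `matrix` whose gene name is in `genes`."""
    return {g: matrix[g] for g in genes if g in matrix}

def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Raises ZeroDivisionError if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy log10 ratios matrix: one row per gene, one column per condition.
ratios = {
    "geneA": [0.1, 0.5, 0.9],
    "geneB": [0.0, 0.1, -0.1],
    "geneC": [0.2, 1.0, 1.8],
    "geneD": [0.9, 0.5, 0.1],
}

submatrix = slice_rows(ratios, ["geneA", "geneC", "geneD"])
r_ac = pearson(submatrix["geneA"], submatrix["geneC"])  # profiles rise together
r_ad = pearson(submatrix["geneA"], submatrix["geneD"])  # profiles move oppositely
```

Genes with a correlation near 1 would fall on the same branch of a correlation-based tree, while genes with a correlation near -1 would be placed far apart.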
(2:28) In Genome Browser. The user explores the locations of the co-expressed genes discovered using MEV in the context of Transcriptome Structure and Protein-DNA interactions; these data are visualized as tracks along the genome map. This exploration illustrates how many of the co-expressed genes share binding sites for a transcription factor, suggesting they might be co-regulated.
From MEV to Cytoscape. To understand the relationships among these co-expressed genes, the user goes back to MEV and broadcasts the gene names to Cytoscape, which contains a complex network of all genes, connected to one another based on whether they share phylogeny, are co-mentioned in the literature, are chromosomally proximal to one another, and so on. The user observes that many of the potentially co-regulated genes are also associated in multiple ways.
In Cytoscape and thereon to Firegoose. The user extracts the subnetwork for genes selected from the MEV broadcast. He selects a subset of interacting genes in the subnetwork and broadcasts it to Firegoose.
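Extracting a subnetwork for a set of selected genes amounts to keeping only the edges whose two endpoints are both in the selection. The sketch below shows that idea on a toy edge list; it does not use the Cytoscape API, and the function name induced_subnetwork, the gene names, and the edge labels are hypothetical.

```python
# Illustrative sketch only: not Cytoscape code; names and data are made up.

def induced_subnetwork(edges, selected):
    """Return the edges (source, target, kind) whose endpoints
    are both in the `selected` set of gene names."""
    keep = set(selected)
    return [(a, b, kind) for a, b, kind in edges if a in keep and b in keep]

# Toy association network: (gene, gene, type of association).
edges = [
    ("geneA", "geneC", "co-mentioned in literature"),
    ("geneA", "geneB", "chromosomal proximity"),
    ("geneC", "geneD", "shared phylogeny"),
    ("geneB", "geneE", "shared phylogeny"),
]

subnetwork = induced_subnetwork(edges, ["geneA", "geneC", "geneD"])
for source, target, kind in subnetwork:
    print(source, target, kind)
```

Only the geneA-geneC and geneC-geneD edges survive here, because every other edge touches a gene outside the selection.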
From Firegoose to KEGG and back to the desktop. The user broadcasts the genes received by Firegoose to KEGG and explores the query results to learn that the potentially co-regulated genes function in diverse metabolic pathways. Further, he learns that two of the 21 genes he queried catalyze linked steps in the TCA cycle; importantly, these genes are not chromosomally linked, based on his findings from the Genome Browser. Collectively, this exploration has enabled the user to assemble a hypothesis that a single transcription factor ties together chromosomally distant genes into an oxygen-responsive regulon. The user can send either the entire list of TCA cycle genes or the metabolic network back to Cytoscape for archiving. In this manner, explorations within the Gaggle and Firegoose framework can move seamlessly through third-party resources on the desktop or on a remote website.