A researcher undertakes a study of the physiological response in H. salinarum to changes in oxygen level. She does a series of 61 microarrays under conditions of varying oxygen concentration.
What can we learn from this data?
In Part 1 we found differentially expressed genes from microarray data and clustered them by expression profile. Now we want to find information about the functions of genes in our clusters.
Cluster 1: these 222 genes were activated under anaerobic conditions.
Cluster 2: these 223 genes were repressed under anaerobic conditions and active under aerobic conditions.
In this part of the analysis we demonstrate the utility of the Firegoose by discovering the cellular processes in which the differentially regulated genes take part. We will concentrate on the anaerobically induced genes, but a similar analysis could be done on the aerobically induced genes as well.
If you haven't already, install the Firegoose add-on to the Mozilla browser (instructions here). The Firegoose adds a toolbar to the browser that enables quick and easy data transfer between the Gaggle and a number of popular biological web sites. The parts of the toolbar and their functions are shown below.
The Firegoose has some bugs that cause strange behavior when multiple browser windows are open. Also, at some points time-consuming operations run in the background without a clear indication that the user should wait. Please be patient with these problems.
Query the KEGG Pathway database for standard biochemical pathways in which our genes of interest participate.
A new tab should open in the browser window showing KEGG pathways found for our query. Along with many genes of unknown function, we see a few interesting results:
H. salinarum ferments arginine in the absence of oxygen, a process in which some of the amino acid metabolism genes are likely involved. Other pathways include transporters by which the organism may alter its uptake of nutrients. Finally, two transcription factors are identified, providing clues to the regulatory systems involved.
EMBL STRING is a database of functional associations between proteins. We will next query STRING with our set of genes of interest.
Note: there will be a long pause while STRING processes the query. Wait for as long as the message "Waiting for string.embl.de..." is displayed in the lower left corner of the browser.
A large network will appear showing the proteins as nodes, with colored edges indicating various kinds of evidence for functional associations. Some connected components in the network correspond to the pathways found in KEGG. Others represent specialized pathways not present in the KEGG database. One of these is dimethyl sulfoxide (DMSO) respiration, which appears in the network as six interconnected proteins.
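The idea of a connected component can be made concrete with a minimal sketch. It assumes the network has been reduced to a plain list of undirected edges; the function name `connected_components` and the placeholder protein names are illustrative only and are not part of STRING, the Gaggle, or the Firegoose.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components from an undirected edge list."""
    # Build an adjacency map: node -> set of directly linked nodes.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first walk collecting every node reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)

    # Largest groups first, since these are usually the most informative.
    return sorted(components, key=len, reverse=True)


# Placeholder edges (hypothetical names, not real STRING data).
example_edges = [
    ("protein_1", "protein_2"),
    ("protein_2", "protein_3"),
    ("protein_4", "protein_5"),
]
for group in connected_components(example_edges):
    print(sorted(group))
```

Each printed group corresponds to one cluster of interconnected proteins, such as the DMSO respiration group described above.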
Clicking on the nodes in a STRING network gives more information, including protein domains and annotations. Clicking on VNG0829G reveals its function as a DMSO reductase. Another group of 5 interconnected proteins are gas vesicle proteins (gvp), which H. salinarum uses to control buoyancy.
We'll drill down into STRING's supporting evidence for the group of genes of unknown function containing VNG1187G.
This brings in proteins that were not identified as differentially expressed, but may still participate in pathways with the proteins linked to VNG1187G. The additional nodes may shed some light on the function of these proteins.
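As a rough illustration of what this expansion does, the sketch below adds the direct neighbors of a seed set of proteins. It assumes an undirected edge list; `expand_with_neighbors` and the placeholder names are hypothetical and do not reflect how STRING itself selects or scores additional nodes.

```python
def expand_with_neighbors(seed, edges):
    """Return the seed nodes plus every node directly linked to one of them."""
    seed = set(seed)
    expanded = set(seed)
    for a, b in edges:
        # An edge touching a seed node pulls its other endpoint into the result.
        if a in seed:
            expanded.add(b)
        if b in seed:
            expanded.add(a)
    return expanded


# Placeholder data: only protein_1 was in the original query.
example_edges = [
    ("protein_1", "protein_2"),
    ("protein_2", "protein_3"),
]
print(sorted(expand_with_neighbors({"protein_1"}, example_edges)))
```

Only first neighbors are added here, so `protein_3` stays out of the result until the expansion is repeated on the larger set.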
The size of the network makes it unwieldy, and STRING provides no way to select subnetworks. So, we want to broadcast the full STRING network to Cytoscape. Again, be prepared for a long pause.
NOTICE: This page is temporarily using an older version of Cytoscape. The instructions will be the same, but what you see in Cytoscape may not exactly match the image on this page.
Now that we have our network in a more interactive form, we can select components and broadcast them. This enables us to find more information about selected subsets of the network.
Detailed instructions for using Cytoscape are available at cytoscape.org.
The Halo Annotations target refers to an organism-specific database of functional annotations based on sequence- and structure-based computation and on experimental evidence. We can query this database by selecting some nodes in Cytoscape and broadcasting them (as a List) to the Firegoose, then rebroadcasting them to the Halo Annotations target.
We see that some of these genes are annotated as being involved in nitrite reduction. We can drill down into the supporting evidence by clicking the links in the Function column.
Try the same procedure on other groups of genes. Select them in Cytoscape, broadcast to Firegoose, then use the Firegoose to query the annotations database or other data sources, for example, Entrez Protein.
DAVID is another resource for functional annotations and protein domains. One challenge in using DAVID is that it doesn't work well with the VNG naming system used in H. salinarum. We'll have to translate VNG identifiers to GI numbers when broadcasting to DAVID, and GI numbers back to VNG identifiers when broadcasting from it.
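Conceptually, the translation is a table lookup in each direction. The sketch below assumes the mapping is available as a simple dictionary; the function names and the placeholder identifiers are hypothetical, and this is not the translator that the Gaggle actually uses.

```python
def translate(names, mapping):
    """Translate identifiers with a lookup table, reporting any that are unmapped."""
    translated = []
    unmapped = []
    for name in names:
        if name in mapping:
            translated.append(mapping[name])
        else:
            unmapped.append(name)
    return translated, unmapped


def invert(mapping):
    """Build the reverse lookup table, refusing ambiguous (many-to-one) mappings."""
    inverse = {}
    for source, target in mapping.items():
        if target in inverse:
            raise ValueError(f"{target!r} maps back to more than one identifier")
        inverse[target] = source
    return inverse


# Placeholder table (hypothetical values, not real VNG names or GI numbers).
vng_to_gi = {"VNG_A": "GI_1", "VNG_B": "GI_2"}

# Outbound: VNG names to GI numbers before sending a list to DAVID.
gi_list, unmapped = translate(["VNG_A", "VNG_B", "VNG_C"], vng_to_gi)

# Inbound: GI numbers back to VNG names for results coming from DAVID.
gi_to_vng = invert(vng_to_gi)
round_trip, _ = translate(gi_list, gi_to_vng)
print(gi_list, unmapped, round_trip)
```

Reporting unmapped names separately, rather than dropping them silently, makes it easier to notice when a translated list is shorter than the original.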
For convenience, the translated list of genes has been embedded in this page and appears as Anaerobic genes as GI numbers: NameList(219) in the Gaggle Data menu.
The genes are pasted into the search form of DAVID. We still have to tell DAVID that these are GI numbers.
DAVID groups our proteins into clusters based on functional annotations from several primary sources. We can broadcast a cluster or the subcategories within a cluster back to the Gaggle, but doing so requires another translator, from GI to VNG this time, and a trick.
We could hunt and find these proteins in the original large STRING network, but viewing them separately would be easier.
Alternatively, broadcast the embedded Signal transducers: NameList(4) to STRING. We should note that although Bop clusters with signal transducers, its true function is phototrophy.
Now we can explore the domains identified in these signal transducers.
Using the Firegoose, the Gaggle, and several public data sources, we have created an integrated visualization of the cellular processes activated by H. salinarum in response to anoxic environments. The diagram summarizes our results.
Edges represent several types of evidence for functional association provided by STRING. Yellow filled nodes indicate genes classified by KEGG. Blue outline nodes indicate genes classified by DAVID. Other nodes were characterized by the annotation database or other sources, including PFAM, BLAST, and PDB. 102 genes of unknown function were omitted.
The above network is available for viewing and manipulation in Cytoscape. Try selecting a group of nodes and broadcasting to STRING or the Halobacterium annotations database.
STRING network in Cytoscape (1.x).