
The Scenario

A researcher undertakes a study of the physiological response in H. salinarum to changes in oxygen level. She does a series of 61 microarrays under conditions of varying oxygen concentration.

What can we learn from this data?

Differentially expressed genes

In Part 1 we found differentially expressed genes from microarray data and clustered them by expression profile. Now we want to find information about the functions of genes in our clusters.

Cluster 1: Anaerobic genes

These 222 genes were found to be activated under anaerobic conditions.


Cluster 2: Aerobic genes

These 223 genes were found to be repressed under anaerobic conditions and active under aerobic conditions.


Part 2: Exploring the functions of anaerobically induced genes.

In this part of the analysis we demonstrate the utility of the Firegoose by discovering the cellular processes in which the differentially regulated genes take part. We will concentrate on the anaerobically induced genes, but a similar analysis could be done on the aerobically induced genes as well.

If you haven't already, install the Firegoose add-on to the Mozilla browser (instructions here). The Firegoose adds a toolbar to the browser that enables quick and easy data transfer between the Gaggle and a number of popular biological web sites. The parts of the toolbar and their functions are shown below.

Firegoose toolbar

  1. If the Gaggle Boss is not already running, start the Gaggle Boss.
  2. Press the multicolored Gaggle button on the Firegoose toolbar to connect the Firegoose to the Boss. The status indicator in the lower right of the browser should say, "Firegoose: Connected" and the Firegoose should appear in the list of connected geese in the Boss.

Known problems

The Firegoose has some bugs that cause strange behavior when multiple browser windows are open. Also, at some points time-consuming operations run in the background without a clear indication that the user should wait. Please be patient with these problems.

Step 1: Find KEGG Pathways

Query the KEGG Pathway database for standard biochemical pathways in which our genes of interest participate.

  1. In the Firegoose toolbar, click on the Gaggle Data drop-down menu and select cluster 1 anaerobic genes: NameList(222). The toolbar has detected the presence of data embedded in this web page in the Gaggle microformat.
  2. In the target menu, select KEGG Pathway. The list of 222 anaerobically induced genes is now ready to be broadcast to KEGG.
  3. Click the Broadcast button.

A new tab should open in the browser window showing KEGG pathways found for our query. Along with many genes of unknown function, we see a few interesting results:

H. salinarum ferments arginine in the absence of oxygen, a process in which some of the amino acid metabolism genes are likely involved. Other pathways include transporters by which the organism may alter its uptake of nutrients. Finally, two transcription factors are identified, providing clues to the regulatory systems involved.
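For readers who prefer to script this lookup, the following is a minimal sketch and not what the Firegoose does internally. It assumes the public KEGG REST "link" operation and assumes that "hal" is the KEGG organism code for this organism; both should be checked before use. The helper names and the sample response text are hypothetical, and the pathway labels in the sample are placeholders, not real results.

```python
from collections import defaultdict

# Assumption: "hal" is the KEGG organism code for this Halobacterium strain.
KEGG_ORG = "hal"


def kegg_link_url(genes, org=KEGG_ORG):
    """Build a KEGG REST 'link' URL asking for pathways linked to the genes.

    Nothing is downloaded here; the function only assembles the URL.
    """
    entries = "+".join(f"{org}:{gene}" for gene in genes)
    return "https://rest.kegg.jp/link/pathway/" + entries


def group_by_pathway(tsv_text):
    """Group genes by pathway from tab-delimited 'gene<TAB>pathway' lines."""
    pathways = defaultdict(list)
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene, pathway = line.split("\t")
        # Drop the database prefixes ("hal:" and "path:") for readability.
        pathways[pathway.split(":", 1)[1]].append(gene.split(":", 1)[1])
    return dict(pathways)


# Placeholder response text: the pathway labels are invented for illustration.
SAMPLE_RESPONSE = (
    "hal:VNG0829G\tpath:example_A\n"
    "hal:VNG1187G\tpath:example_B\n"
)
```

Calling `group_by_pathway(SAMPLE_RESPONSE)` returns one entry per pathway, each holding the list of genes assigned to it.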

Step 2: Search EMBL STRING

EMBL STRING is a database of functional associations between proteins. We will next query STRING with our set of genes of interest.

Note: there will be a long pause while STRING is processing the query. Wait while the message "Waiting for string.embl.de..." appears in the lower left corner of the browser.

  1. Make sure cluster 1 anaerobic genes: NameList(222) is still selected in the Gaggle Data menu.
  2. Select the target EMBL STRING and click Broadcast.
  3. The initial STRING query page will appear. Wait while STRING processes the query, which will take about 10-15 seconds. A list of proteins will appear. Click the Continue -> button.

String network

A large network will appear showing the proteins as nodes and colored edges indicating various kinds of evidence for functional associations. Some connected components in the network correspond to the pathways found in KEGG. Others represent specialized pathways not present in the KEGG database. One of these is Dimethylsulfoxide respiration, which appears in the network as 6 interconnected proteins.

DMSO respiration proteins

Clicking on the nodes in a STRING network gives more information, including protein domains and annotations. Clicking on VNG0829G reveals its function as a DMSO reductase. Another group of 5 interconnected proteins consists of gas vesicle proteins (gvp), which H. salinarum uses to control buoyancy.
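The idea of a connected component can be made concrete with a short sketch. It assumes the network is available as a plain list of protein pairs; the function name and the toy edge list are hypothetical, and the partner names in the toy data are placeholders rather than real STRING results.

```python
from collections import defaultdict


def connected_components(edges):
    """Return the connected components of an undirected graph, largest first.

    edges is an iterable of (protein_a, protein_b) pairs.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first walk that collects everything reachable from start.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)

    components.sort(key=len, reverse=True)
    return components


# Toy edge list: "partner1" etc. are placeholders, not real interaction data.
TOY_EDGES = [
    ("VNG0829G", "partner1"),
    ("partner1", "partner2"),
    ("VNG1187G", "partner3"),
]
```

On the toy data, `connected_components(TOY_EDGES)` yields two groups: one of three proteins around VNG0829G and one of two proteins around VNG1187G.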

Step 3: Examine the VNG1187G cluster

We'll drill down into STRING's supporting evidence for the group of genes of unknown function containing VNG1187G.

String 1187

  1. Click on VNG1187G in the STRING network. In the pop-up menu click on recenter the network on this node. VNG1187G should now appear in a network by itself.
  2. Click more twice to expand the network out to second neighbors.

This brings in proteins that were not identified as differentially expressed, but may still participate in pathways with the proteins linked to VNG1187G. The additional nodes may shed some light on the function of these proteins.
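Expanding to second neighbors amounts to a breadth-first search limited to two steps. The sketch below assumes, as the text describes, that each expansion adds one layer of neighbors; STRING itself may choose the added proteins differently. The function name and the toy adjacency map are hypothetical.

```python
from collections import deque


def neighborhood(adjacency, center, depth):
    """Return {node: distance} for all nodes within depth steps of center.

    adjacency maps each node to an iterable of its neighbors.
    """
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Toy chain: n1, n2, n3 are placeholder names, not real STRING neighbors.
TOY_ADJACENCY = {
    "VNG1187G": ["n1"],
    "n1": ["VNG1187G", "n2"],
    "n2": ["n1", "n3"],
    "n3": ["n2"],
}
```

With `depth=2`, the result for the toy chain contains VNG1187G, n1, and n2 but not n3, which is three steps away.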

  1. Click on the nodes in the new network and examine their annotations and domains.
  2. Click the browser's Back button 3 times to return to the full network.

Step 4: Broadcast to Cytoscape

The size of the network makes it unwieldy, and STRING provides no way to select subnetworks. So, we want to broadcast the full STRING network to Cytoscape. Again, be prepared for a longish pause.

NOTICE: This page is temporarily using an older version of Cytoscape. The instructions will be the same, but what you see in Cytoscape may not exactly match the image on this page.

  1. Start Cytoscape (with cygoose plug-in).
  2. Make sure Cytoscape connects to the Boss.
  3. Click the multicolored Gaggle button on the Firegoose toolbar to update the list of connected geese.
  4. With the STRING tab open in the browser, select Protein interactions from STRING: Network in the Gaggle Data menu.
  5. Select Cytoscape in Firegoose's target menu.
  6. Click Broadcast. Wait ~10-15 seconds while the toolbar downloads the network as an XML file, parses it, and transfers it to Cytoscape.
  7. Switch to Cytoscape. Open the Layout menu and select yFiles->Organic. Zoom in until the node labels become visible.

Cytoscape

Now that we have our network in a more interactive form, we can select components and broadcast them. This enables us to find more information about selected subsets of the network.

Detailed instructions for using Cytoscape are available at cytoscape.org.

Step 5: Broadcast selected nodes to annotation database

The Halo Annotations target refers to an organism-specific database of functional annotations based on sequence- and structure-based computation and experimental evidence. We can query this database by selecting some nodes in Cytoscape and broadcasting them (as a List) to the Firegoose, then rebroadcasting them to the Halo Annotations target.

  1. Select the group of four connected genes containing VNG1187G in Cytoscape.
  2. In the CyGoose panel at the left of the Cytoscape window, select Firegoose as the target.
  3. Click the button marked List to broadcast the four gene identifiers to Firegoose.
  4. In the Firegoose, make sure gaggle: NameList(4) is selected in the Gaggle Data menu and select Halo Annotations in the target menu. Click Broadcast.

Halo Annotations

We see that some of these genes are annotated as being involved in nitrite reduction. We can drill down into the supporting evidence by clicking the links in the Function column.

  1. Click on "putative Cu-containing nitrite reductase", which is the annotation for VNG1187G.
  2. Then click on the match labeled 1kbvA to show the entry in the Protein Data Bank on which this annotation was based.

Try the same procedure on other groups of genes. Select them in Cytoscape, broadcast to Firegoose, then use the Firegoose to query the annotations database or other data sources, for example, Entrez Protein.

Step 6: Perform functional clustering with DAVID

DAVID is another resource for functional annotations and protein domains. One challenge in using DAVID is that it doesn't work well with the VNG naming system used in H. salinarum. We'll have to translate VNG identifiers to GI numbers when broadcasting to DAVID, and GI numbers back to VNG identifiers when broadcasting from it.

(optional)

  1. Start the VNG->GI translator.
  2. Select Firegoose as the target in the translator.
  3. Check the "auto" checkbox. With auto checked, the translator will automatically translate and rebroadcast any broadcast it receives.
  4. Update the list of geese in Firegoose by clicking the multicolored Gaggle button.
  5. From the Firegoose, broadcast the cluster 1 anaerobic genes: NameList(222) to the translator. Matching GI numbers will be automatically rebroadcast back to the Firegoose. gaggle: NameList(219) will appear in the Gaggle Data menu (a few genes fail to translate).
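The translation step is essentially a table lookup in which identifiers without a match are dropped, which is why 222 names go in and 219 come out. Here is a minimal sketch of that behavior, not the translator's actual code. The function names are hypothetical, and the GI values in the toy table are placeholders rather than real GI numbers.

```python
def translate_ids(ids, mapping):
    """Translate identifiers with a lookup table.

    Returns (translated, untranslated): the mapped values in input order,
    and the input identifiers that had no entry in the table.
    """
    translated = []
    untranslated = []
    for identifier in ids:
        if identifier in mapping:
            translated.append(mapping[identifier])
        else:
            untranslated.append(identifier)
    return translated, untranslated


def invert_mapping(mapping):
    """Build the reverse lookup table (for example GI -> VNG from VNG -> GI)."""
    return {value: key for key, value in mapping.items()}


# Toy table: the GI values are placeholders, not real GI numbers.
TOY_VNG_TO_GI = {
    "VNG0829G": "gi-placeholder-1",
    "VNG1187G": "gi-placeholder-2",
}
```

For example, `translate_ids(["VNG0829G", "VNG9999X"], TOY_VNG_TO_GI)` returns one translated value and reports the made-up name `VNG9999X` as untranslated. The inverted table supports the return trip from GI numbers to VNG identifiers.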

Translator

For convenience, the translated list of genes has been embedded in this page and appears as Anaerobic genes as GI numbers: NameList(219) in the Gaggle Data menu.


  1. Broadcast either of gaggle: NameList(219) or Anaerobic genes as GI numbers: NameList(219) to DAVID.

The genes are pasted into the search form of DAVID. We still have to tell DAVID that these are GI numbers.

  1. Select "GI_ACCESSION" in the drop-down labeled Step 2: select identifier.
  2. Click Submit List.
  3. When DAVID has finished processing the uploaded list, press the button marked Functional Annotation Clustering.

DAVID

DAVID groups our proteins into clusters based on functional annotations from several primary sources. We can broadcast a cluster or the subcategories within a cluster back to the Gaggle, but doing so requires another translator, from GI to VNG this time, and a trick.

(optional)

  1. Start the GI->VNG translator.
  2. Click the multicolored Gaggle button on the toolbar to update the list of connected geese.
  3. Back in the DAVID clustering results, find the cluster containing signal transduction annotations.
  4. Click on the red G.
  5. Now for the trick: we want to select the first column of the table, which holds the GI_ACCESSION numbers. Hold down the Control key (on Windows) or the Command key (on OS X) and drag the mouse over the table cells in that column. Those cells should then be selected.
  6. Right-click on the selected cells and choose Capture Selection. An entry Selection: geneList(4) should appear in the Gaggle Data menu.

Capture Selections

Step 7: View signal transduction protein domains in STRING

We could hunt for these proteins in the original large STRING network, but viewing them separately is easier.

  1. Broadcast the Selection: geneList(4) to the GI->VNG translator.
  2. From the translator, broadcast the corresponding VNG numbers back to the Firegoose.
  3. Rebroadcast to STRING. Wait until the list of proteins shows up. Press Continue ->.

Alternatively, broadcast the embedded Signal transducers: NameList(4) to STRING. Note that although Bop clusters with the signal transducers, its true function is phototrophy.


STRING protein domains

Now we can explore the domains identified in these signal transducers.

  1. Click on a protein in the STRING network.
  2. Follow the links to find out more about its known domains.

Conclusions

Using the Firegoose, the Gaggle, and several public data sources, we have created an integrated visualization of the cellular processes activated by H. salinarum in response to anoxic environments. The diagram summarizes our results.

Cellular processes activated in anoxic environments

Edges represent several types of evidence for functional association provided by STRING. Yellow filled nodes indicate genes classified by KEGG. Blue outline nodes indicate genes classified by DAVID. Other nodes were characterized by the annotation database or other sources, including PFAM, BLAST, and PDB. 102 genes of unknown function were omitted.

The above network is available for viewing and manipulation in Cytoscape. Try selecting a group of nodes and broadcasting to STRING or the Halobacterium annotations database.

String network in Cytoscape (1.x).

Institute for Systems Biology
