Helicobacter pylori Demo (2007-04)

An Introduction to the Boss and a few simple geese

Here we introduce you to the Gaggle using a few genes from Helicobacter pylori, some public microarray data, and these tools:

  • The Gaggle Boss
  • Cytoscape: a program for visualizing and exploring biomolecular networks
  • The Data Matrix Viewer (DMV), which gives a spreadsheet-like view of tabular data
  • The R statistical environment
  • The Firegoose toolbar for Firefox

Start all these programs from these web start links:

  1. Boss   (start this first! Don't proceed until it is running.)
  2. Small flagellar-related network   (23 genes)   'Prolinks'
  3. Data Matrix Viewer   w/ public H.py microarray data from Stanford.   'DMV'
  4. Start up R and load the gaggle package.
    (Please note that installing and using the R goose involves some effort,
    and should probably be skipped by those who are new both to the Gaggle and to R.
    )
  5. Start up Firefox. (See installation instructions for the Firegoose.)

Explore the Boss

  1. Make sure that you can see all three geese in the Gaggle Boss main tab -- H. pylori Prolinks Network, DMV.
  2. Click Select All then Hide:   all three geese are now hidden.
  3. Click Listen None and Listen All and observe how the 'Listening?' status changes.
  4. Direct clicking in the 'goose table' will also change selection and listening status.
  5. Select 'Prolinks' and then click 'Show'; the Cytoscape Prolinks goose will become visible.
  6. Make sure, in addition, that only the Prolinks goose is listening.
  7. Switch to the Boss's 'Annotation Search' tab.
  8. Enter flagellar in the text box, then click Search or press the 'Enter' key.
  9. 32 genes match. Click Select All then Broadcast. Three Prolinks nodes are selected.
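
The annotation search in steps 8 and 9 amounts to a text match over gene descriptions. The following R sketch illustrates the idea only: the annotation table, its gene names, and the case-insensitive matching are assumptions made for this example, not the Boss's actual data or search rules.

```r
# Hypothetical annotation table (illustrative entries, not the Boss's data)
annotations <- data.frame(
  gene = c("gene1", "gene2", "gene3", "gene4"),
  description = c("flagellar motor switch protein",
                  "urease subunit alpha",
                  "Flagellar hook protein",
                  "chemotaxis protein"),
  stringsAsFactors = FALSE
)

# Substring match on the description column. Ignoring case is an
# assumption of this sketch, not documented Boss behavior.
is_match <- grepl("flagellar", annotations$description, ignore.case = TRUE)
matches <- annotations$gene[is_match]
print(matches)
```

The resulting character vector plays the role of the name list that the Boss broadcasts to the listening geese.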

Explore the Prolinks Cytoscape Goose

  1. Make sure that the three nodes broadcast from the Boss (just above) are selected in the network.
  2. Note that these three flagellar genes are associated, in the default view, with seven others via protein-protein, gene cluster (operon) and gene neighborhood (genome proximity in multiple organisms) edges. (You can use the keystroke 'Control-F' to select first neighbors of already-selected nodes to reveal these connections.)
  3. You may wish to experiment with the 'Confidence' and 'p value' sliders, along with clearing the node selections, and rebroadcasting from the Boss Annotation Search tab. This limits or expands the list of genes associated with the three flagellar genes.
  4. Select the three or more flagellar-plus-associated genes you are interested in. The next step (after bringing the DMV into view, see below) will be to examine microarray expression profiles for these genes. We recommend that you finish this step with ten nodes selected:   three flagellar genes, plus their seven first neighbors.
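
Selecting first neighbors, as in step 2, means adding every node that shares an edge with an already-selected node. A minimal R sketch of that operation follows. The edge list, the gene names, and the helper `first_neighbors` are hypothetical and are not taken from the Prolinks network or from Cytoscape.

```r
# Hypothetical edge list (illustrative only, not real Prolinks data)
edges <- data.frame(
  from = c("geneA", "geneA", "geneB", "geneC", "geneD"),
  to   = c("geneX", "geneY", "geneY", "geneZ", "geneW"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the selected nodes plus every node that
# shares an edge with at least one of them
first_neighbors <- function(edges, selected) {
  touching <- edges$from %in% selected | edges$to %in% selected
  unique(c(selected, edges$from[touching], edges$to[touching]))
}

selected <- c("geneA", "geneB", "geneC")
expanded <- first_neighbors(edges, selected)
print(expanded)
```

In this toy network the edge between geneD and geneW touches no selected node, so neither gene joins the selection. Tightening the 'Confidence' or 'p value' sliders corresponds to dropping rows from `edges` before calling the helper.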

Goose-to-Goose Communication and Control

  1. You saw above that the Gaggle Boss window allows you to show and hide particular geese, and to set their 'listening state' -- that is, whether or not they will receive broadcasts made by other geese.
  2. Let's say that you want to display the DMV goose. From the pulldown menu in the Cytoscape Prolinks goose, select DMV, and then press the S (for 'show') button immediately to the right. H hides the selected goose. B broadcasts the current node selection. N broadcasts the current network.
  3. If you set the Boss as the 'current goose', and broadcast a selection (in the Cytoscape Goose, this is either the node selection, or a network), then the broadcast goes to the Boss, which distributes the message to any and all geese which are currently listening, as controlled by the goose list in the boss.
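
The routing described in step 3 can be pictured with a toy model in plain R. This is not the Gaggle implementation or its API: the goose table, the listening flags, and the helper `boss_dispatch` are hypothetical, and skipping the sender is an assumption of the sketch.

```r
# Toy model of the Boss's goose table (not the real Gaggle implementation)
geese <- data.frame(
  name      = c("Prolinks", "DMV", "RShellGoose"),
  listening = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a broadcast sent to the Boss is passed on to every
# goose that is currently listening. Excluding the sender is an assumption.
boss_dispatch <- function(geese, sender, names_to_send) {
  targets <- geese$name[geese$listening & geese$name != sender]
  for (target in targets) {
    message(sender, " -> ", target, ": ", length(names_to_send), " names")
  }
  targets
}

recipients <- boss_dispatch(geese, "Prolinks", c("HP1030", "HP0584"))
print(recipients)
```

Because RShellGoose is not listening in this toy table, only the DMV receives the two names. Flipping a `listening` flag mirrors what Listen All and Listen None do in the Boss.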

Explore the DMV

  1. From the Cytoscape Prolinks goose, select the DMV, and press S to bring it to the front.
  2. In the left panel of the DMV mouse-click on 'environmental', find and click on the button with red label 57 just above, and see that a data matrix 1590 x 57 is loaded.
  3. Click on the left-most large button whose tooltip says, "Change the type of name displayed in the row headers". This changes the row names to gene symbols wherever those are known.
  4. Go back to the Prolinks Cytoscape window, and (after checking to be sure that some nodes are selected) broadcast these node names.
  5. Observe that the DMV now has 10 selected rows.
  6. Find and press the Plot Selected Rows button; examine the plot in its new tab.
  7. There are many more operations possible within the DMV:   finding correlations, creating submatrices, broadcasting (and receiving) matrices. We explore these in (forthcoming) demo #2.
  8. Of particular interest to the statistically savvy, one may send matrices to the R goose, perform any kind of statistical manipulation, and broadcast either a transformed matrix, or the names of genes which meet certain criteria. This is explored below in the topic titled 'Explore the R Goose'.
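
As an illustration of the kind of manipulation step 8 has in mind, the R sketch below builds a submatrix and a gene list from a selection criterion. The matrix is simulated, the gene and condition names are hypothetical, and the standard-deviation cutoff of 1 is an arbitrary choice for this example.

```r
# Simulated expression matrix: 4 hypothetical genes x 5 conditions
m <- rbind(
  geneA = c(0.1, 0.1, 0.2, 0.1, 0.1),
  geneB = c(-1.5, 0.5, 2.0, -0.8, 1.2),
  geneC = c(0.0, 0.1, -0.1, 0.0, 0.1),
  geneD = c(2.1, -1.9, 0.3, 1.7, -2.2)
)
colnames(m) <- paste0("cond", 1:5)

# Criterion (arbitrary for this sketch): row standard deviation above 1
row_sd <- apply(m, 1, sd)
variable_genes <- rownames(m)[row_sd > 1]

# Submatrix containing only the rows that pass the criterion
sub <- m[variable_genes, , drop = FALSE]
print(variable_genes)
print(dim(sub))
```

In a live session with the R goose connected to the Boss, a character vector such as `variable_genes` could be passed to `broadcast`, as in the R goose commands later in this demo.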

Explore Web Resources using the Firegoose

  1. Make sure Firefox is started and connected to the Boss with the Firegoose extension. See instructions for installing and operating the Firegoose.
  2. From the Cytoscape goose, broadcast the names of the selected nodes to the Firegoose. For our purposes here, we recommend that you include the three flagellar genes and their seven immediate neighbors (obtained when no restrictions are put on the confidence or p value of the Prolinks-derived edges).
  3. Back in Firefox, we should see gaggle: NameList(7) on the toolbar to indicate that we have received 7 gene names from the latest broadcast.
  4. Select the target web site from the drop-down list on the Gaggle toolbar. Select KEGG and click on the button labeled gaggle: NameList(7).
  5. After a few seconds' pause, we see that KEGG annotates (some of) these genes to three pathways: bacterial chemotaxis, flagellar assembly, and the type III secretion system. Follow the displayed KEGG links, and you will see, for example, that fliY (HP1030) appears in all three pathways. Note that KEGG, somewhat confusingly, lumps fliN (HP0584) with fliY.
  6. Now try broadcasting the same genes to EMBL String. Select "EMBL String" and click gaggle: NameList(7) again.
  7. STRING reveals, among many other things, that putative homologs of fliY and fliM are mentioned for three species in PubMed.

Explore the R Goose

  1. Start R, and load the gaggle package. (See R Goose Installation.)
  2. From Cytoscape Prolinks, broadcast the 10 selected nodes (3 flagellar genes, plus 7 first neighbors) to the DMV.
  3. Check in the DMV to make sure just those 10 rows are selected.
  4. Update the DMV's goose list, select RShellGoose, and broadcast (using the M button) the 10 x 57 matrix to R.
  5. In the R goose, try these commands:
    
    # get the newly-broadcast matrix into an R variable called 'm' 
    m = getMatrix ()
    
    # what are the dimensions of this matrix? 
    dim (m)
    
    # find the correlation coefficient of all 10 genes to 'HP1035'
    apply (m, 1, cor, m ['HP1035',])
    
    
  6. Clear selections in the DMV and Cytoscape and make sure they're both listening. Then, back in R, enter this command:
    
    # broadcast highly correlated (> 0.8) gene names back to the gaggle
    broadcast (rownames (m) [apply (m, 1, function (i) cor (m ['HP1035',], i) > 0.8)])
    
    
  7. Now, in the DMV, you can easily examine the expression profile of genes highly correlated with HP1035. Press the "Plot" button.
  8. Using the Prolinks Cytoscape goose, you can see network relationships of the correlated genes: gene neighborhood (genome proximity in multiple genomes) relationships connect these three highly-correlated genes.
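
The correlation filter used in the R goose commands can be tried without a running Boss. The R sketch below substitutes a small simulated matrix for the one returned by `getMatrix`. Apart from the row name HP1035, the gene names and all values are hypothetical.

```r
# Simulated stand-in for the matrix returned by getMatrix()
ref <- c(1, 2, 3, 4, 5, 6)
m <- rbind(
  HP1035 = ref,
  geneP  = 2 * ref + 1,            # perfectly correlated with HP1035
  geneQ  = -ref,                   # perfectly anti-correlated with HP1035
  geneR  = c(3, 1, 4, 1, 5, 2)     # only weakly correlated with HP1035
)

# Pearson correlation of every row with the HP1035 row; apply() passes
# each row as the first argument of cor() and the HP1035 row as the second
r <- apply(m, 1, cor, m["HP1035", ])

# Row names that pass the 0.8 cutoff used in the broadcast command
hits <- names(r)[r > 0.8]
print(round(r, 2))
print(hits)
```

Note that the reference gene always passes the filter, because its correlation with itself is 1. In a live session the broadcast list therefore includes HP1035 along with the genes correlated with it.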

Though this excursion into R does not reveal any startling biology, it does illustrate the way in which R is valuable -- perhaps even indispensable -- in any biological exploration which includes high throughput data.

© 2007, Institute for Systems Biology, All Rights Reserved