Translator

In biology, there are many naming systems for ORFs, genes, and their products. The Translator manages some of that complexity by allowing relatively painless conversion from one naming system to another.

Different naming systems are often mutually inconsistent, so mapping between them is destined to be a lossy process. That's lossy as in lossy data compression, not loosey as in loosey-goosey.

That said, the software supports a loose definition of translation, encompassing scenarios like mapping peptides to genes or mapping across species via COG membership. These go beyond simply exchanging one naming system for another. Maintaining the desired degree of rigor is up to the user's judgement.
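As a rough illustration of mapping across species through shared group membership, the sketch below goes from a gene to its groups and then to the target-species genes in those groups. It is not the Translator's own code: the function name map_via_groups, the (group, species, gene) tuple layout, and all identifiers are hypothetical and made up for the example.

```python
from collections import defaultdict

def map_via_groups(terms, membership, target_species):
    """Map each term to target-species genes that share a group with it.

    membership is a list of (group_id, species, gene) tuples. This layout
    is an assumption for illustration, not a Translator file format.
    """
    groups_of = defaultdict(set)   # gene -> groups it belongs to
    members = defaultdict(list)    # group -> target-species genes in it
    for group, species, gene in membership:
        groups_of[gene].add(group)
        if species == target_species:
            members[group].append(gene)

    result = {}
    for term in terms:
        hits = []
        for group in sorted(groups_of.get(term, ())):
            for gene in members.get(group, []):
                if gene not in hits:
                    hits.append(gene)
        result[term] = hits
    return result

# Made-up group memberships, for illustration only.
MEMBERSHIP = [
    ("G1", "sp1", "geneA"),
    ("G1", "sp2", "geneP"),
    ("G1", "sp2", "geneQ"),
    ("G2", "sp1", "geneB"),
    ("G2", "sp3", "geneR"),
]

mapped = map_via_groups(["geneA", "geneB", "geneC"], MEMBERSHIP, "sp2")
print(mapped)  # {'geneA': ['geneP', 'geneQ'], 'geneB': [], 'geneC': []}
```

The output shows why this kind of translation is loose: geneA picks up two candidates, while geneB and geneC find none in the target species.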

Options

In lieu of consistency, the Translator gives the user control. There are options to configure how the translation is performed, and it is easy to modify an existing translation or load a completely new one. Translations are stored in tab-delimited text files, so modifying one is a matter of editing the file.
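To make the idea of a tab-delimited translation concrete, here is a minimal sketch of reading such a table into a lookup. The exact layout the Translator expects is not described here, so the sketch assumes a header row naming each naming system and one row per group of equivalent identifiers. The function name load_translation, the column names, and the identifiers are all hypothetical.

```python
import csv
import io
from collections import defaultdict

def load_translation(handle, source, target):
    """Build a {source term: [target terms]} lookup from a tab-delimited table.

    Assumes a header row of naming-system names and one row per group of
    equivalent identifiers. That layout is an assumption for illustration.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    mapping = defaultdict(list)
    for row in reader:
        src = (row.get(source) or "").strip()
        tgt = (row.get(target) or "").strip()
        # Skip rows where either column is empty; keep each target once.
        if src and tgt and tgt not in mapping[src]:
            mapping[src].append(tgt)
    return dict(mapping)

# Made-up identifiers, for illustration only.
SAMPLE = "locus\tsymbol\nORF001\tabcA\nORF002\tabcB\nORF002\tabcC\nORF003\t\n"

table = load_translation(io.StringIO(SAMPLE), "locus", "symbol")
print(table)  # {'ORF001': ['abcA'], 'ORF002': ['abcB', 'abcC']}
```

Storing a list of targets per source term keeps one-to-many relationships visible, so the choice of how to resolve them can be made later.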

If the Allow One-to-Many option is set, as it is by default, one term in the source namespace may map to more than one term in the target namespace. So, if you're translating a list of gene names, you may start with 10 names and end up with 11. Networks may expand during translation, and data matrices may grow extra rows.

If Allow One-to-Many is false, a term in the source namespace will map to at most one term in the target namespace. If a translates to x, y, and z, the software has no way of knowing which of x, y, or z is the preferred translation, so it makes the choice arbitrarily. If this is a concern, a hand-curated one-to-one mapping is probably better.

The Drop Untranslatable Terms option controls what happens when there is no translation for a given term. The term will either be dropped or simply translate to itself.

To get a strict one-to-one mapping, set both Allow One-to-Many and Drop Untranslatable Terms to false. Then every element in the source data structure will map to exactly one element in the translated data structure.
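The interaction of the two options can be sketched in a few lines. This is an illustration of the behavior described above, not the Translator's implementation: the function translate and its parameter names are hypothetical, and "arbitrary" is modeled here as "first listed". The default for Drop Untranslatable Terms is not stated in this document, so the sketch requires the caller to pass it explicitly.

```python
def translate(terms, mapping, *, allow_one_to_many=True, drop_untranslatable):
    """Translate a list of terms using a {source: [targets]} lookup.

    allow_one_to_many defaults to True, as the document describes.
    drop_untranslatable has no default here because the document does
    not state one.
    """
    result = []
    for term in terms:
        targets = mapping.get(term, [])
        if not targets:
            # No translation: drop the term, or let it translate to itself.
            if not drop_untranslatable:
                result.append(term)
            continue
        if allow_one_to_many:
            result.extend(targets)
        else:
            # Arbitrary choice among candidates (assumption: first listed).
            result.append(targets[0])
    return result

MAPPING = {"a": ["x", "y", "z"], "b": ["w"]}
TERMS = ["a", "b", "c"]

expanded = translate(TERMS, MAPPING, allow_one_to_many=True, drop_untranslatable=False)
strict = translate(TERMS, MAPPING, allow_one_to_many=False, drop_untranslatable=False)
dropped = translate(TERMS, MAPPING, allow_one_to_many=True, drop_untranslatable=True)

print(expanded)  # ['x', 'y', 'z', 'w', 'c']
print(strict)    # ['x', 'w', 'c']
print(dropped)   # ['x', 'y', 'z', 'w']
```

With both options false, the output has the same length as the input, which is the strict one-to-one case.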

Orthology

One goal of the Translator is to support mapping between species via orthology.

A quick demo

  1. Start the Gaggle boss (optional).
  2. Start the translator.
  3. Click File|Load Translation File. Enter or browse to a translation file, such as the one at:
    http://gaggle.systemsbiology.net/docs/geese/translator/MAGGIEortho.tsv
    This file contains orthologous genes from three organisms: Halobacterium, Pyrococcus, and Sulfolobus.
  4. Press Preview and select a translation from Halo locus to any of the other naming systems.
  5. Paste these gene names into the source text box, then press Translate:
    VNG0565C
    VNG0566C
    VNG0285C
    VNG0715G
    VNG1252G
    VNG0284C
    VNG0284C
    VNG1615G
    

Homologene Sample

Use HomoloGene data to map orthologs among several model organisms. The data file contains 44,449 sets of homologous gene IDs from 20 organisms.

  • Download gene ID data file: homologene.tsv
  • ...or use gene symbols data file: symbols.tsv
  • Start translator.
  • Click File|Load Translation File, navigate to the data file, click Preview, and select the desired source and target organisms. Click OK, then broadcast or paste some source gene IDs into the source box. Press Translate. Homologous gene IDs from the target organism should appear in the target box.
  • For example, select mouse as the source and rat as the target and try translating the mouse gene IDs below.
    65116
    19214
    19215
    19216
    19217
    19220
    19224
    27367
    100043000
    67891
    19899
    68193
    19946
    667779
    666899
    19943
    100042670
    
© 2006, Institute for Systems Biology, All Rights Reserved