
Gaggle Genome Browser

New Project Wizard

For a quick start, see Getting Started.

A genome is defined as a collection of:

  1. Sequences
  2. Genes

Creating a new project in the genome browser involves importing the data that defines a genome. First, the program needs to know the number and lengths of the sequences you'll be working with. These include chromosomes, plasmids, and organellar sequences such as those of mitochondria or plastids. Typically, we'll also create a track of genes (protein-coding regions plus tRNAs, rRNAs, and other transcribed regions where available). For common model organisms, this information can be acquired automatically from public data sources.

Projects are created using the New Project Wizard. Each step of the wizard is explained below.

Pick an organism

In simple cases, all you need to do is pick an organism and click OK. Reasonable defaults will be chosen for the remaining options. This scenario is illustrated in the Getting Started section.

You may want more control, or you may be working with an organism that is not on the list of known organisms. In either case, enter the name of your organism on the first panel of the wizard. We'll use Creepus delicti as a silly example. Use the green right arrow to proceed through the New Project Wizard.

Name your project

By default, the genome browser creates a directory called hbgb in your documents directory. The program tries to respect operating system conventions for where to put user documents. For example, on Windows XP, this might be C:\Documents and Settings\[username]\My Documents\hbgb\. Each project is contained in a single SQLite database file, named after the organism, with a .hbgb file extension.

The project also has a name, distinct from the filename, which serves as a human-readable title for your project.
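Because each project is an ordinary SQLite file, it can be inspected outside the browser. The sketch below uses Python's standard sqlite3 module to list the tables inside every .hbgb file in a directory. The helper name list_projects is hypothetical, and this page does not document the table layout of a project file, so the sketch only reports whatever tables it finds rather than assuming a schema.

```python
import sqlite3
from pathlib import Path
from typing import Dict, List


def list_projects(project_dir: Path) -> Dict[str, List[str]]:
    """Map each .hbgb project file in project_dir to the names of its tables.

    Hypothetical helper: the project schema is not documented here,
    so this only reports the tables it finds.
    """
    projects: Dict[str, List[str]] = {}
    for db_path in sorted(project_dir.glob("*.hbgb")):
        # Each project is a single SQLite database file.
        conn = sqlite3.connect(str(db_path))
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
        projects[db_path.name] = [name for (name,) in rows]
    return projects


if __name__ == "__main__":
    # Adjust this path: the default location follows OS conventions and varies.
    print(list_projects(Path.home() / "Documents" / "hbgb"))
```

The query only reads from the file, so running it leaves the project unchanged.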

Select a Data Source

If your organism is recognized by the program, the genome data can be downloaded from UCSC. We intend to support NCBI as well in a future release. In other cases, it will be necessary to load your own data.

Specify Sequences

Sequences are specified by pasting them into a text box, one sequence per line. Each line gives a name, a length, and an optional topology, separated by semicolons. An example is shown below.

We can select the topology for all sequences at once using the radio buttons, or set it individually on each line, as we have done below. Per-sequence settings override the radio buttons.

chromosome 1; 1000000; circular
chromosome 2; 500000; circular
plasmid XYZ; 100000; circular
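The format is simple enough to check before pasting. Below is a minimal Python sketch of a parser for these lines. It assumes that linear is the alternative to circular and that sequence names must be unique; neither is spelled out on this page. parse_sequences and SequenceSpec are hypothetical names, and the default_topology parameter stands in for the radio-button setting.

```python
from dataclasses import dataclass
from typing import List

TOPOLOGIES = ("linear", "circular")  # assumption: linear is the non-circular option


@dataclass
class SequenceSpec:
    name: str
    length: int
    topology: str


def parse_sequences(text: str, default_topology: str = "linear") -> List[SequenceSpec]:
    """Parse 'name; length; optional topology' lines (hypothetical helper)."""
    specs: List[SequenceSpec] = []
    seen = set()
    for line_no, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in raw.split(";")]
        if len(fields) not in (2, 3):
            raise ValueError(f"line {line_no}: expected 2 or 3 fields, got {len(fields)}")
        name, length = fields[0], int(fields[1])  # int() raises ValueError on bad input
        if not name or length <= 0:
            raise ValueError(f"line {line_no}: need a name and a positive length")
        if name in seen:  # assumption: names must be unique
            raise ValueError(f"line {line_no}: duplicate sequence name {name!r}")
        seen.add(name)
        # A topology on the line overrides the default (the radio-button choice).
        topology = fields[2].lower() if len(fields) == 3 and fields[2] else default_topology
        if topology not in TOPOLOGIES:
            raise ValueError(f"line {line_no}: unknown topology {topology!r}")
        specs.append(SequenceSpec(name, length, topology))
    return specs
```

Reporting the offending line number makes it easy to fix a long pasted list; the names it returns, with surrounding spaces trimmed, are the ones a feature file would need to match.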

More info about sequences

Genome Features

Loading features is optional. It requires a tab-delimited text file with the following columns:

(Sequence, Strand, Start, End, Unique Name, Common Name, Gene Type)

Sequence names must exactly match those defined on the previous panel. Strand should be +, -, or . (for no strand). Gene Type indicates whether the feature is a coding sequence or some other type of feature, for example: cds, rna, trna, or rrna.
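A feature file can likewise be checked against the sequence names before loading. This Python sketch reads the seven tab-delimited columns with the standard csv module and rejects rows whose sequence name is unknown or whose strand is not one of +, - and the dot. It assumes there is no header row and that Start and End are plain integers; read_features and Feature are hypothetical names, and the browser's own loader may be more or less strict.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Set

VALID_STRANDS = {"+", "-", "."}  # "." means no strand


@dataclass
class Feature:
    sequence: str
    strand: str
    start: int
    end: int
    unique_name: str
    common_name: str
    gene_type: str  # e.g. cds, rna, trna, rrna


def read_features(lines: Iterable[str], sequence_names: Set[str]) -> List[Feature]:
    """Parse tab-delimited feature rows (hypothetical helper; assumes no header row)."""
    features: List[Feature] = []
    # QUOTE_NONE treats quote characters as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row_no, row in enumerate(reader, start=1):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 7:
            raise ValueError(f"row {row_no}: expected 7 columns, got {len(row)}")
        seq, strand, start, end, unique_name, common_name, gene_type = row
        # Sequence names must match exactly, so no trimming or case folding here.
        if seq not in sequence_names:
            raise ValueError(f"row {row_no}: unknown sequence {seq!r}")
        if strand not in VALID_STRANDS:
            raise ValueError(f"row {row_no}: strand must be +, - or ., got {strand!r}")
        features.append(
            Feature(seq, strand, int(start), int(end), unique_name, common_name, gene_type)
        )
    return features
```

To read a file, open it with newline="" as the csv module documentation recommends, and pass the handle together with the set of sequence names from the previous panel.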

© 2006, Institute for Systems Biology, All Rights Reserved