Gaggle Genome Browser

Schema

GGB uses SQLite files to store datasets, usually with the ".hbgb" file extension. The basic schema is shown below.

A dataset (typically there's just one) has a set of sequences and a set of tracks. Sequences, the chromosomes and plasmids that make up the genome, determine the coordinate system. Tracks hold data to be plotted on those coordinates.

[Figure: schema diagram]

Tracks and Features

Track data consists of a set of features, where a feature is a data point related to some location on the genome. To allow for different kinds of features, we store the features for each track in a separate table - one feature table per track. The feature tables are linked to their tracks by the table_name field in the tracks table.
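As a minimal sketch of this lookup, the Python snippet below builds a tiny in-memory SQLite database and follows a track's table_name to its feature table. The column names and types are assumptions inferred from the example tables on this page, not the definitive schema (see schema.sql for that). The helpers `build_demo_db` and `features_for_track` are hypothetical and the rows are illustrative.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database using an assumed, simplified layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tracks (
            uuid TEXT PRIMARY KEY, name TEXT, type TEXT, table_name TEXT);
        CREATE TABLE features_transcript_signal (
            sequences_id INTEGER, strand TEXT,
            start INTEGER, "end" INTEGER, value REAL);
    """)
    conn.execute(
        "INSERT INTO tracks VALUES (?, ?, ?, ?)",
        ("6206eaff-b0c1-4dc1-8cef-521cbb2dc0a3", "Transcript Signal",
         "quantitative.segment", "features_transcript_signal"),
    )
    conn.executemany(
        "INSERT INTO features_transcript_signal VALUES (?, ?, ?, ?, ?)",
        [(1, "+", 31, 90, 9.0327), (1, "+", 11, 70, 12.5113)],
    )
    return conn

def features_for_track(conn, track_name):
    """Resolve a track's feature table via table_name and return its rows."""
    row = conn.execute(
        "SELECT table_name FROM tracks WHERE name = ?", (track_name,)
    ).fetchone()
    if row is None:
        raise KeyError(track_name)
    table = row[0]
    # Table names cannot be bound as SQL parameters, so check the name
    # before interpolating it into the query text.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return conn.execute(f'SELECT * FROM "{table}" ORDER BY start').fetchall()

conn = build_demo_db()
for feature in features_for_track(conn, "Transcript Signal"):
    print(feature)
```

Because each track names its own feature table, a reader of the file needs two steps: one query against tracks, then a second query against whichever table that row points to.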

The three basic types of features supported by GGB are "gene", "quantitative.segment", and "quantitative.positional". As shown in the diagram, the feature tables for these feature types vary in schema, allowing different types of data to be associated with locations on the genome. Other track types exist or can be added for specialized purposes, for example "peptide" and "quantitative.segment.matrix".

Most users should not have to know this, but the mapping of track types, feature types and renderers takes place in the TrackRendererRegistry (newInstance() method). Anyone digging into this code should send questions and abuse to the mailing list.

See also schema.sql and the Gaggle Genome Browser paper for more details.

Attributes

All entities identified by a UUID (datasets, sequences and tracks) can be assigned attributes. These are key/value pairs where the value can be a string, integer, floating point number or boolean value, taking advantage of SQLite's flexible concept of type affinity. Among other uses, attributes configure the visual style of tracks: which renderer will be used to draw the data, plus color, location and other parameters.
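The following Python sketch illustrates how a single value column can hold mixed types in SQLite. The table layout is an assumption for illustration (the real definition is in schema.sql), the "visible" key is hypothetical, and `attributes_for` is a hypothetical helper. Leaving the value column without a declared type lets SQLite keep each value in its own storage class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: "value" has no declared type, so SQLite stores each
# value as given (real, text, integer) instead of coercing it.
conn.execute("CREATE TABLE attributes (uuid TEXT, key TEXT, value)")

track = "6206eaff-b0c1-4dc1-8cef-521cbb2dc0a3"
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [
        (track, "top", 0.37),
        (track, "height", 0.11),
        (track, "viewer", "Scaling"),
        (track, "visible", 1),  # hypothetical key; SQLite keeps booleans as 0/1
    ],
)

def attributes_for(conn, uuid):
    """Return all attributes of one entity as a dict of key -> value."""
    rows = conn.execute(
        "SELECT key, value FROM attributes WHERE uuid = ?", (uuid,)
    )
    return dict(rows)

print(attributes_for(conn, track))
# typeof() reports the storage class SQLite chose for each value.
for key, storage_class in conn.execute(
    "SELECT key, typeof(value) FROM attributes"
):
    print(key, storage_class)
```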

Example

Here, we show an example. The track "Transcript Signal" is of type "quantitative.segment", meaning it maps a floating point value to a location on the genome that has a start and end (a segment). Its feature table "features_transcript_signal" reflects this in its schema. The track data will be rendered using the Scaling renderer in a nice shade of blue, and positioned using the top and height values.

Specifically, the track will occupy a horizontal band across the screen between 37% and 48% (0.37+0.11) of the screen height. It is implicit in the choice of "Scaling" renderer that data on the negative strand will be reflected about the horizontal axis in a band between 52% and 63% of the screen height. In general, interpretation of positioning and other visual properties is entirely up to the renderer, an approach that is very flexible at the cost of some complexity.
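The band arithmetic can be checked with a few lines of Python. This is a sketch under one assumption: that the reflection is a mirror image about the mid-line of the screen (0.5), which is inferred from the percentages quoted above rather than taken from the renderer's source. The function `track_bands` is hypothetical.

```python
def track_bands(top, height):
    """Return (forward, reverse) bands as fractions of screen height.

    Each band is a (upper_edge, lower_edge) pair. The reverse-strand band
    is assumed to mirror the forward band about the mid-line at 0.5.
    """
    forward = (top, top + height)
    reverse = (1.0 - (top + height), 1.0 - top)
    return forward, reverse

forward, reverse = track_bands(top=0.37, height=0.11)
print([round(edge, 2) for edge in forward])  # [0.37, 0.48]
print([round(edge, 2) for edge in reverse])  # [0.52, 0.63]
```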

Note that "features_genes" holds a different type of feature than "features_transcript_signal", and consequently has a different schema.

tracks

| UUID | Name | Type | table_name |
|---|---|---|---|
| 22e5d4ba-a1c0-4a9f-921d-53db6e8be038 | Genes | gene | features_genes |
| 6206eaff-b0c1-4dc1-8cef-521cbb2dc0a3 | Transcript Signal | quantitative.segment | features_transcript_signal |
| ... | | | |

attributes

| UUID | Key | Value |
|---|---|---|
| 6206eaff-b0c1-4dc1-8cef-521cbb2dc0a3 | top | 0.37 |
| 6206eaff-b0c1-4dc1-8cef-521cbb2dc0a3 | height | 0.11 |
| 6206eaff-b0c1-4dc1-8cef-521cbb2dc0a3 | color | 0x80336699 |
| 6206eaff-b0c1-4dc1-8cef-521cbb2dc0a3 | viewer | Scaling |
| ... | | |

features_genes

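| Sequences_id | Strand | Start | End | name | common_name | gene_type |
|---|---|---|---|---|---|---|
| 1 | + | 248 | 1450 | VNG0001H | | cds |
| 1 | + | 1450 | 2112 | VNG0002G | yrvO | cds |
| 1 | + | 2145 | 3251 | VNG0003C | | cds |
| ... | | | | | | |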

features_transcript_signal

| Sequences_id | Strand | Start | End | Value |
|---|---|---|---|---|
| 1 | + | 11 | 70 | 12.5113 |
| 1 | + | 31 | 90 | 9.0327 |
| 1 | + | 51 | 110 | 6.8395 |
| ... | | | | |

sequences

| ID | UUID | Name | Length | Topology |
|---|---|---|---|---|
| 1 | 213faf4a-e763-4acc-8d2f-d61d4694e23a | chromosome | 2014239 | circular |
| 2 | 2df49fe5-8b7a-4623-9c9a-6bef06e712e1 | pNRC200 | 365425 | circular |
| 3 | a859ef0f-6815-4a84-bbbd-84508736dde4 | pNRC100 | 191346 | circular |
| ... | | | | |
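To show how the example tables fit together, here is a short Python sketch that joins a feature table to sequences through Sequences_id, so that features can be selected by sequence name. The column names and types are assumptions based on the example rows (the authoritative definitions are in schema.sql), and `genes_on` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified table definitions for illustration only.
conn.executescript("""
    CREATE TABLE sequences (
        id INTEGER PRIMARY KEY, uuid TEXT, name TEXT,
        length INTEGER, topology TEXT);
    CREATE TABLE features_genes (
        sequences_id INTEGER, strand TEXT, start INTEGER, "end" INTEGER,
        name TEXT, common_name TEXT, gene_type TEXT);
""")
conn.execute(
    "INSERT INTO sequences VALUES (?, ?, ?, ?, ?)",
    (1, "213faf4a-e763-4acc-8d2f-d61d4694e23a", "chromosome", 2014239, "circular"),
)
conn.execute(
    "INSERT INTO features_genes VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, "+", 1450, 2112, "VNG0002G", "yrvO", "cds"),
)

def genes_on(conn, sequence_name):
    """Return (name, start, end) for every gene on the named sequence."""
    return conn.execute(
        """
        SELECT g.name, g.start, g."end"
        FROM features_genes AS g
        JOIN sequences AS s ON s.id = g.sequences_id
        WHERE s.name = ?
        ORDER BY g.start
        """,
        (sequence_name,),
    ).fetchall()

print(genes_on(conn, "chromosome"))
```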

Help

Please ask for help on the Gaggle mailing list.

© 2006, Institute for Systems Biology, All Rights Reserved