Gaggle Genome Browser performance analysis

Last updated: 2010/05/25

Gaggle Genome Browser

Version: 0.9
Build Number: 156
Build Date: 2010/05/07 16:22
Window size: 800x600

Hardware specs:

Workstation
Name: Mac Pro
Processor Name: Quad-Core Intel Xeon
Processor Speed: 2.66 GHz
Memory: 8 GB

Low-end machine
Name: Windows XP box
Processor Name: AMD Athlon 2600+
Processor Speed: 1.92 GHz
Memory: 1 GB

Rendering speed in GGB depends on several factors, chiefly the number of features drawn and the complexity of the rendering. Complexity is a function of how many distinct tracks are drawn, the computation needed to assign visual properties such as color and location, and the complexity of the shapes being drawn (for example, circles are more costly than rectangles). Rendering is reasonably fast for feature counts into the low tens of thousands.

Distributions of rendering time reflect specific usage patterns, which were simulated by randomly scrolling both short and longer distances at a fixed zoom level. Since these are not precisely standardized loads, the exact distributions are approximate at best. However, the general shapes of the distributions are informative. Distinct peaks are visible, representing optimal rendering and rendering slowed either by fetching uncached data from the DB or by garbage collection. Other factors that may influence rendering time are the load on the Swing event queue and on the program's internal task queue, disk seek time, and overall system load.
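
The kind of load described above can be sketched as a small timing harness. This is only an illustration in Python, not GGB's actual test code: `simulate_scrolling`, the `render` callback, and the even mix of short and long scrolls are all assumptions made for the example.

```python
import random
import time


def simulate_scrolling(render, genome_length, window, samples, seed=0):
    """Time `render(start, end)` over random short and long scrolls.

    `render` is a hypothetical callback standing in for the browser's
    drawing routine. The window width stays fixed, which corresponds to
    a fixed zoom level. Returns elapsed times in milliseconds.
    """
    rng = random.Random(seed)
    start = 0
    max_start = genome_length - window
    timings_ms = []
    for _ in range(samples):
        # Assumed mix: half short scrolls (at most one window), half long jumps.
        if rng.random() < 0.5:
            step = rng.randint(-window, window)
            start = min(max(start + step, 0), max_start)
        else:
            start = rng.randint(0, max_start)
        t0 = time.perf_counter()
        render(start, start + window)
        timings_ms.append((time.perf_counter() - t0) * 1000.0)
    return timings_ms
```

With a real drawing routine passed as `render`, the returned list is the raw sample from which a distribution of rendering times could be plotted.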

We tested a maximum database size of 13G, which stores single-nucleotide RNA-seq data for human chromosome 1 (simulated with artificial data). Rendering shows no discernible dependency on the size of the underlying database or on the total number of features it contains.

We have not tried to rigorously tune the cache size. A small performance improvement could probably be achieved by doing so.

Rendering times are shown for 4 datasets of varying complexity, in milliseconds.
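
Each summary in this report lists the minimum, first quartile, median, mean, third quartile and maximum of the sampled rendering times. As a minimal sketch of how such a summary can be computed from raw timings, here is a Python version; the function name `six_number_summary` is ours, and the choice of quantile method is an assumption, since the report does not say how its quartiles were calculated.

```python
import statistics


def six_number_summary(timings_ms):
    """Return min, quartiles, median, mean and max of render times (ms)."""
    # method="inclusive" interpolates linearly between order statistics,
    # the same rule as R's default quantile type 7 (assumed here).
    q1, median, q3 = statistics.quantiles(timings_ms, n=4, method="inclusive")
    return {
        "Min.": min(timings_ms),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(timings_ms),
        "3rd Qu.": q3,
        "Max.": max(timings_ms),
    }
```

A mean well above the median, as in several of the summaries, indicates a long right tail of slow renders.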

Rendering Time Datasets Compared


B. anthracis RNA-seq

8 tracks of single nucleotide resolution RNA-seq data. Below, we show the 200k feature zoom level, which typically renders in under 1/4 of a second on the workstation. Demo available.

B Anthracis 200k Features

DB size: 358M
Total features: 43,278,382

Summary at 200k features (211 samples)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   89.0    95.5   160.0   280.0   386.5  1209.0

Summary at 2M features (139 samples)

  Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   718    2716    2841    2788    2927    3282

Rendering Time Ba 200k


S. solfataricus

39 total tracks, including heatmap display of tiling array data. Dataset also contains RNA-seq, not shown. The screenshot shows the 20k feature view, which typically renders in about 3/4 of a second.

S Solfataricus 20k

DB size: 573M
Total features: 27,401,764

Summary at 10k features (382 samples)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  213.0   364.0   376.5   399.7   432.0   681.0

Summary at 20k features (191 samples)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  700.0   727.5   780.0   787.2   806.5  1267.0

Summary at 40k features (66 samples)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1475    1507    1558    1611    1681    2042 

Summary at 200k features (52 samples)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5863    6026    6102    6297    6272    8882 
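
As a rough check on how rendering time scales with feature count, the median times reported above can be divided by the number of features on screen. The short Python calculation below uses only the figures from the four S. solfataricus summaries; the variable names are ours.

```python
# Median rendering times (ms) for S. solfataricus, copied from the summaries above.
median_ms_by_features = {10_000: 376.5, 20_000: 780.0, 40_000: 1558.0, 200_000: 6102.0}

# Milliseconds per 1,000 features drawn at each zoom level.
ms_per_1k = {
    n: median / (n / 1000) for n, median in median_ms_by_features.items()
}

for n, cost in ms_per_1k.items():
    print(f"{n:>7,} features: {cost:.1f} ms per 1k features")
```

For this dataset the cost stays between about 30 and 39 ms per 1,000 features, which is consistent with rendering time growing roughly linearly with the number of features drawn.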

Rendering Time Ss

H. salinarum growth series

Growth series of 14 tracks of tiling array data with segmentation. 30 total tracks. Growth curve demo available (under item 6).

H Salinarum 200k

DB size: 258M
Total features: 7,251,794

Summary at 200k features (209 samples), 512M max heap, 314M total memory used

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  482.0   495.0   500.0   587.1   534.0  1743.0

Summary at 200k features (121 samples), 65M max heap

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  494.0   505.0   512.0   668.1   596.0  1751.0

Summary at 200k features (117 samples) on low end hardware, 512M max heap

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    812     843     922    1420    2171    3812

Summary at 200k features (236 samples) on low end hardware, 65M max heap

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  797.0   828.0   844.0  1244.0   995.8  7656.0

Ideally, comparisons would be done with standardized loads, but since we don't have an automated way of doing this, we simulate typical usage by randomly scrolling back and forth at a fixed zoom level. In spite of this, the timing results reported here are reasonably reproducible, as shown below.

Rendering Time Halo 200k Replicates

Here, performance is compared across different machines and different JVM heap sizes. Except where indicated, this analysis was done with the default JVM heap size of 65M, which is probably too restrictive. A heap size of 512M gives little improvement in median rendering times but a noticeable decrease in maximum rendering times. Our two machines were a workstation-class Mac and a low-end Windows XP box. As expected, rendering times go up on the slower machine, but performance remains adequate and the UI remains responsive.
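
The effect of heap size can be read off the four H. salinarum summaries above. The following Python sketch simply restates those reported medians and maxima and computes the change when going from a 65M to a 512M heap; the dictionary layout and the function name are illustrative.

```python
# (median, max) rendering times in ms at 200k features, from the summaries above.
reported = {
    ("workstation", "512M"): (500.0, 1743.0),
    ("workstation", "65M"): (512.0, 1751.0),
    ("low-end", "512M"): (922.0, 3812.0),
    ("low-end", "65M"): (844.0, 7656.0),
}


def change_with_larger_heap(machine):
    """Return (median change, max change) in ms going from 65M to 512M."""
    small_median, small_max = reported[(machine, "65M")]
    large_median, large_max = reported[(machine, "512M")]
    return large_median - small_median, large_max - small_max


for machine in ("workstation", "low-end"):
    d_median, d_max = change_with_larger_heap(machine)
    print(f"{machine}: median {d_median:+.0f} ms, max {d_max:+.0f} ms")
```

In these samples the medians move by less than 10% in either direction, while the maximum on the low-end machine roughly halves (7656 ms to 3812 ms) with the larger heap. The workstation maximum barely changes.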

Rendering Time Halo Different Machines

Human chromosome 1

Using artificial data, we simulated 1 track of single-base-resolution data on human chromosome 1, which has a length of ~250 million bp. Although this dataset has many more features than the other datasets, its complexity is very low, so it renders fast. Shown is the 200k feature view.

Because of our early use cases, we assume irregular genome coordinates, which is typical of tiling arrays. This is inefficient for data where we could take advantage of the regularity of the genome coordinates. Doing so should allow this data to be stored in ~250M * 8 (bytes) * 2 (strands) = 4G, plus reasonable overhead. Even so, rendering remained quick and the program remained responsive, except while indexing, which need not be a user-facing function. This shows the viability of the software for very large datasets.
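
The storage estimate in the paragraph above can be written out as a short calculation. This Python snippet only restates that arithmetic; the variable names are ours, and the 8-byte value width is the figure given in the text.

```python
# Estimate from the text: one 8-byte value per base position, per strand.
positions = 250_000_000      # approximate length of human chromosome 1 in bp
bytes_per_value = 8          # width of one stored value, as stated in the text
strands = 2

total_bytes = positions * bytes_per_value * strands
print(f"{total_bytes / 1e9:.1f} GB")      # decimal gigabytes
print(f"{total_bytes / 2**30:.2f} GiB")   # binary gibibytes
```

At about 4 billion bytes before overhead, a regular-coordinate layout would need roughly a third of the 13G used by the current database for this dataset.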

Human 200k

DB size: 13G
Total features: 494,533,116

Summary at 200k features (610 samples)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  14.00   15.00   33.50   97.75  126.00  631.00

Rendering Time 200k

Summary at 2M features (253 samples)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    284     399    1667    1798    2310    4774

Rendering Time 2M

