Last updated: 2010/05/25
Gaggle Genome Browser
|Build Date:||2010/05/07 16:22|

Test machines (Mac workstation and low-end Windows XP box):

|Name:||Mac workstation|
|Processor Name:||Quad-Core Intel Xeon|
|Processor Speed:||2.66 GHz|

|Name:||Windows XP box|
|Processor Name:||AMD Athlon 2600+|
|Processor Speed:||1.92 GHz|
Rendering speed in GGB depends on several factors, but mainly on the number of features drawn and the complexity of the rendering. Complexity is a function of how many distinct tracks are drawn, the computation needed to assign visual properties such as color and location, and the complexity of the shapes being drawn (for example, circles are more costly than rectangular shapes). Rendering is reasonably fast for feature counts into the low 10,000s.
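The shape-cost claim is easy to check with a micro-benchmark. The sketch below (our illustration, not GGB code) times a render pass of N features drawn as rectangles versus ovals on an off-screen image; class and method names are hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical micro-benchmark: time drawing n small glyphs as
// rectangles vs. ovals, to illustrate the shape-complexity cost.
public class RenderCost {
    static long timeRenderMs(int n, boolean ovals) {
        BufferedImage img = new BufferedImage(800, 200, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            int x = i % 800;
            if (ovals) g.fillOval(x, 50, 4, 4);   // curved glyph: more costly
            else       g.fillRect(x, 50, 4, 4);   // rectangular glyph: cheaper
        }
        g.dispose();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 200_000;  // comparable to the 200k feature views below
        System.out.println("rects (ms): " + timeRenderMs(n, false));
        System.out.println("ovals (ms): " + timeRenderMs(n, true));
    }
}
```

Absolute numbers will vary by machine and JVM; only the relative cost of the two shapes is the point.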
Distributions of rendering time reflect specific usage patterns, which we simulated by randomly scrolling both short and longer distances at a fixed zoom level. Since these loads are not precisely standardized, the exact distributions are at best approximate; however, their general shapes are informative. Distinct peaks are visible, representing optimal rendering versus rendering slowed by fetching uncached data from the database or by garbage collection. Other factors that may influence rendering time are load on the Swing event queue and on the program's internal task queue, disk seek time, and overall system load.
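The random-scrolling load can be sketched as follows. This is our reconstruction of the procedure described above, not the actual GGB test harness; the class, parameters, and jump sizes are assumptions.

```java
import java.util.Random;

// Sketch of the simulated load: random scrolling at a fixed zoom level,
// mixing short jumps (a fraction of a screen) and longer jumps
// (several screens), clamped to the sequence bounds.
public class ScrollSim {
    static int[] scrollPositions(int seqLength, int viewWidth, int steps, long seed) {
        Random rnd = new Random(seed);
        int[] positions = new int[steps];
        int pos = 0;
        for (int i = 0; i < steps; i++) {
            int jump = rnd.nextBoolean()
                    ? rnd.nextInt(viewWidth)         // short scroll
                    : rnd.nextInt(viewWidth * 10);   // longer scroll
            pos += rnd.nextBoolean() ? jump : -jump; // random direction
            pos = Math.max(0, Math.min(pos, seqLength - viewWidth));
            positions[i] = pos;
        }
        return positions;
    }

    public static void main(String[] args) {
        // e.g. a 200k bp view scrolled over a chromosome-sized sequence
        for (int p : scrollPositions(250_000_000, 200_000, 10, 42L))
            System.out.println(p);
    }
}
```

Each position in the sequence would be rendered and timed, yielding the distributions discussed here.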
We tested a maximum database size of 13G, which stores single-nucleotide RNA-seq for human chromosome 1 (simulated with fake data). Rendering shows no discernible dependency on the size of the underlying database or on the total number of features it contains.
We have not tried to rigorously tune the cache size. A small performance improvement could probably be achieved by doing so.
Rendering times are shown for 4 datasets of varying complexity, in milliseconds.
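The Min./1st Qu./Median/Mean/3rd Qu./Max. rows below are R-style six-number summaries of the raw timing samples. They can be reproduced as sketched here; the quantile interpolation follows R's default ("type 7") method, and the sample data in `main` is illustrative, not our actual measurements.

```java
import java.util.Arrays;

// Compute an R-style summary (Min. 1st Qu. Median Mean 3rd Qu. Max.)
// from a set of rendering times in milliseconds.
public class TimingSummary {
    // R's default type-7 quantile: linear interpolation on sorted data
    static double quantile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p;
        int lo = (int) Math.floor(h);
        int hi = (int) Math.ceil(h);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    static double[] summary(double[] timingsMs) {
        double[] s = timingsMs.clone();
        Arrays.sort(s);
        double mean = Arrays.stream(s).average().orElse(Double.NaN);
        return new double[] {
            s[0], quantile(s, 0.25), quantile(s, 0.5),
            mean, quantile(s, 0.75), s[s.length - 1]
        };
    }

    public static void main(String[] args) {
        double[] t = {89, 95, 160, 280, 386, 1209};  // illustrative timings
        System.out.println(Arrays.toString(summary(t)));
    }
}
```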
8 tracks of single-nucleotide-resolution RNA-seq data. Below, we show the 200k feature zoom level, which typically renders in under a quarter of a second on the workstation. Demo available.
DB size: 358M
Total features: 43,278,382
Summary at 200k features (211 samples)
Min.   1st Qu.  Median  Mean    3rd Qu.  Max.
89.0   95.5     160.0   280.0   386.5    1209.0
Summary at 2M features (139 samples)
Min.   1st Qu.  Median  Mean    3rd Qu.  Max.
718    2716     2841    2788    2927     3282
39 total tracks, including a heatmap display of tiling array data. The dataset also contains RNA-seq, not shown. The screenshot shows the 20k feature view, which typically renders in about three quarters of a second.
DB size: 573M
Total features: 27,401,764
Summary at 10k features (382 samples)
Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
213.0   364.0    376.5   399.7   432.0    681.0
Summary at 20k features (191 samples)
Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
700.0   727.5    780.0   787.2   806.5    1267.0
Summary at 40k features (66 samples)
Min.   1st Qu.  Median  Mean    3rd Qu.  Max.
1475   1507     1558    1611    1681     2042
Summary at 200k features (52 samples)
Min.   1st Qu.  Median  Mean    3rd Qu.  Max.
5863   6026     6102    6297    6272     8882
Growth series of 14 tracks of tiling array data with segmentation; 30 total tracks. Growth curve demo available (under item 6).
DB size: 258M
Total features: 7,251,794
Summary at 200k features (209 samples), 512M max heap, 314M total memory used
Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
482.0   495.0    500.0   587.1   534.0    1743.0
Summary at 200k features (121 samples), 65M max heap
Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
494.0   505.0    512.0   668.1   596.0    1751.0
Summary at 200k features (117 samples) on low end hardware, 512M max heap
Min.   1st Qu.  Median  Mean    3rd Qu.  Max.
812    843      922     1420    2171     3812
Summary at 200k features (236 samples) on low end hardware, 65M max heap
Min.    1st Qu.  Median  Mean     3rd Qu.  Max.
797.0   828.0    844.0   1244.0   995.8    7656.0
Ideally, comparisons would be done with standardized loads, but since we don't have an automated way of generating them, we simulate typical usage by randomly scrolling back and forth at a fixed zoom level. Even so, the timing results reported here are reasonably reproducible, as shown below.
Here, performance is compared on different machines and different JVM heap sizes. Except where indicated, this analysis was done with the default JVM heap size of 65M, which is probably too restrictive. A heap size of 512M gives little improvement in median rendering times but a noticeable decrease in maximum rendering times. Our two machines were a workstation class Mac and a low-end Windows XP box. As expected, rendering times go up on the slower machine, but performance remains adequate and the UI remains responsive.
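The heap ceiling is set at JVM launch (for example with `java -Xmx512m -jar ggb.jar`, where the jar name is hypothetical). A program can verify what it was actually given, as in this small sketch:

```java
// Report the maximum heap the JVM will use, as configured by -Xmx.
// If launched with the 65M default discussed above, this prints a
// correspondingly small number; with -Xmx512m, roughly 512.
public class HeapCheck {
    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MB): " + maxMb);
    }
}
```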
Using artificial data, we simulated 1 track of single-base-resolution data on human chromosome 1, which has a length of ~250 million bp. Although this dataset has many more features than the other datasets, its complexity is very low, so it renders quickly. Shown is the 200k feature view.
Due to our early use cases, we assume irregular genome coordinates, as is typical of tiling arrays. This is inefficient for data where we could exploit the regularity of the genome coordinates; doing so should allow this data to be stored in ~250M * 8 (bytes) * 2 (strands) = 4G, plus reasonable overhead. Nevertheless, rendering remained quick and the program remained responsive, except while indexing, which need not be a user-facing operation, demonstrating the viability of the software for very large datasets.
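The storage estimate above, spelled out (one 8-byte value per base per strand over ~250 million bp):

```java
// Back-of-the-envelope storage for dense single-base data on chr1:
// positions * bytes-per-value * strands.
public class StorageEstimate {
    static long estimateBytes() {
        long positions = 250_000_000L;  // ~length of human chr1 in bp
        long bytesPerValue = 8;         // one double per position
        long strands = 2;               // forward and reverse
        return positions * bytesPerValue * strands;
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes() / 1_000_000_000L + " GB");  // -> 4 GB
    }
}
```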
DB size: 13G
Total features: 494,533,116
Summary at 200k features (610 samples)
Min. 1st Qu. Median Mean 3rd Qu. Max. 14.00 15.00 33.50 97.75 126.00 631.00
Summary at 2M features (253 samples)
Min. 1st Qu. Median Mean 3rd Qu. Max. 284 399 1667 1798 2310 4774