Sunday, April 29, 2012

v1 - FIRST release of source code

Today marks the start of regular releases of the Wash U Epigenome Browser source code. The first version is available here: http://epigenomegateway.wustl.edu/source/ (this link is also accessible from the browser home page).

Inside this directory there's currently one compressed file marked as "v1". Download the file to get all the working code of the Browser. As stated on the browser web site, the code is free for non-commercial use.

The contents of the downloaded file are a freeze of my working directory. Most of the files are text files. The only exceptions are a few key documents with the suffix ".dia", which require the Dia software to open.

About the version number: it is an integer starting from 1 with this first release, and it will be incremented by 1 at each new release. This is my personal tribute to the prestigious UCSC Kent Source Tree, the fundamental basis of my work, which is now at v265 on its bi-weekly release cycle. We have yet to move to GitHub because we have no plans to branch out from the existing code.

Each release will be accompanied by a blog post explaining bug fixes, updates, and new features.

===

UPDATES AND FIXES

Gene tracks have been updated, and an error in the gene information table in the database has been corrected (the chromosome name field was defined as char(10), so chromosome names longer than 10 characters were truncated, which led to errors).
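To illustrate why a char(10) field causes trouble, here is a minimal Python sketch. It assumes the database truncates over-long values silently rather than rejecting them, and the chromosome names are illustrative only; the post does not list which names were affected.

```python
def store_char(value: str, width: int = 10) -> str:
    """Mimic a fixed-width char(width) column that silently truncates."""
    return value[:width]

# Illustrative names only; the affected chromosomes are not listed in the post.
names = ["chr1", "chrUn_gl000211", "chrUn_gl000212"]
stored = [store_char(name) for name in names]

# Names whose stored form no longer matches the original.
truncated = [n for n, s in zip(names, stored) if n != s]

# Truncation can also make two distinct names indistinguishable.
has_collision = len(set(stored)) < len(set(names))
```

Short names survive unchanged, while longer ones lose their tail and may collapse onto the same stored value, so lookups by the full chromosome name fail.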

The centralized gene symbol and description table has been discarded; each gene type now has its own symbol-description table.

Transposable element tracks were added for fruit fly and Arabidopsis.

[ Human ]
Source data for the UCSC Gene, RefSeq, Ensembl, and GENCODE gene tracks were retrieved from the UCSC Genome Browser FTP site on April 26th, 2012.

[ Mouse ]
Source data for the UCSC Gene, RefSeq, and Ensembl gene tracks were retrieved from the UCSC Genome Browser FTP site on April 26th.

[ Zebrafish ]
Source data for the RefSeq and Ensembl gene tracks were retrieved from the UCSC Genome Browser FTP site on April 26th.

[ Fruitfly ]
Source data for the RefSeq, Ensembl, and FlyBase gene tracks and for RepeatMasker were retrieved from the UCSC Genome Browser FTP site on April 26th.

[ Arabidopsis ]
The TAIR gene track data and transposon information were retrieved from the TAIR FTP site on April 27th.

Wednesday, April 25, 2012

Bug fix report

The following bugs have just been fixed:

1. Gene Set View now works with a large number of tracks. Previously, when 100+ tracks were showing in the genome heatmap, the browser would reject a custom gene set if you tried to run Gene Set View. This mistake has been corrected.

2. This problem is also associated with a large number of tracks. In the metadata color map, attributes associated with a large number of tracks are now rendered correctly. Internally the browser supplies 41 predefined colors to visualize attributes. When the number of unique attributes exceeds 41, the browser now handles it correctly: a color is selected from the list and darkened, ensuring the new color is unique.
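The idea behind fix 2 can be sketched roughly as follows. This is not the browser's code: the palette here is a hypothetical stand-in for the 41 predefined colors, and the cycling order and the 0.8 darkening factor are assumptions made for illustration.

```python
import colorsys

def make_palette(n=41):
    """Hypothetical stand-in for the 41 predefined colors: evenly spaced hues."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.6, 1.0)
        palette.append((round(r * 255), round(g * 255), round(b * 255)))
    return palette

def darken(color, factor=0.8):
    """Scale each RGB channel toward black."""
    return tuple(int(c * factor) for c in color)

def assign_colors(attributes, palette, factor=0.8):
    """Give every unique attribute a unique color.

    Colors are taken from the palette in order; once the palette is
    exhausted, a palette color is reused and darkened until it differs
    from every color already handed out.
    """
    used = set()
    mapping = {}
    for i, attr in enumerate(dict.fromkeys(attributes)):  # drop duplicates
        color = palette[i % len(palette)]
        while color in used:
            darker = darken(color, factor)
            if darker == color:  # already black, cannot darken further
                raise ValueError("ran out of distinguishable colors")
            color = darker
        used.add(color)
        mapping[attr] = color
    return mapping

palette = make_palette()
colors = assign_colors([f"attr{i}" for i in range(100)], palette)
```

With 100 attributes, the first 41 receive palette colors unchanged and the rest receive progressively darker variants, so no two attributes share a color.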

3. Finally, if you have custom metadata terms in use, clicking on a term in the metadata color map now shows its name correctly in the context menu.

Please let us know of any further bugs or problems.

Wednesday, April 18, 2012

Improved interface for Gene Plot function

The Gene Plot function has undergone two slight changes again:
  1. improved user interface
  2. on-demand loading of Google Chart library
A quick overview follows.

In the floating toolbox, click the "Apps" button to show the list of apps:


The Gene Plot panel will be displayed on top of the browser panel:


You can now drag this panel around using the narrow header at the top of the panel.

As the interface layout shows, Gene Plot consists of 4 steps. Start with "Step 0" by entering some genes into the text area:



In the next step, choose a track with quantitative data. The name of the selected track will be printed:


Choose a graph type for "step 2". By default the first type "Quartiles & extremes" is selected.

The last step is choosing a rendering method from the drop-down menu. By default "R software" is the only available method, and the "Google Chart" method is disabled. This speeds up page loading and prevents a long pause if Google services are blocked at the user's location (which is known to happen in some countries).

If you have a working connection to Google, you can enable the Google Chart service by clicking the "enable" button. Once the external libraries are imported, the page will print a message to confirm:


As shown above, the Google Chart option in the drop-down menu is now active. Find the orange button at the bottom of the panel and press it to make the plot. Once done, the plot will be displayed in the same panel:



Finally, by clicking the stripe at the top, you can go back to the control options and redo the graph.


Sunday, April 15, 2012

New color palette


The new color palette is now in use! As shown above, click on any color blob in the configuration panel to invoke the palette. It sits in a glossy, modern-looking transparent plate, contains a display of lovely colors (with their even more lovely names printed on them), and ... a gradient slider!

Using it is simple: pick a color by clicking on any named block, and that color is chosen for whatever purpose. You can go further and select a lighter or darker version of that color by dragging the slider (or clicking anywhere on the gradient stripe).
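As a rough illustration of what such a slider might compute, the sketch below blends a base color toward white or black. The slider range of -1 to 1 and the linear blend are assumptions; the post does not describe how the browser derives the gradient.

```python
def shade(color, t):
    """Return a lighter (t > 0) or darker (t < 0) version of an RGB color.

    t is a hypothetical slider position in [-1, 1]; 0 leaves the color
    unchanged, 1 gives white, and -1 gives black.
    """
    if not -1.0 <= t <= 1.0:
        raise ValueError("slider position must be between -1 and 1")
    target = 255 if t > 0 else 0
    amount = abs(t)
    return tuple(round(c + (target - c) * amount) for c in color)

orange = (255, 165, 0)        # the named color "Orange"
lighter = shade(orange, 0.2)  # nudged toward white
darkest = shade(orange, -1)   # slider at the dark end
```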

Palette colors were selected from here: http://www.w3schools.com/html/html_colornames.asp

The new palette is a complete upgrade over its predecessor. It displays color names, which helps vision-impaired users know what they are looking at, and it offers a much wider spectrum to choose from. Looking good is important!

Friday, April 13, 2012

The "overview wiggle track"

Updated on Feb 8, 2014: This feature has been dropped.



Today we release a new feature, the "overview wiggle track". Let's take a quick look here.

Open the browser and select a species (I'm choosing "human" here); the page will load as usual. Something new appears on top of the small chromosome ideogram: a wiggle track.


By default the small wiggle track is drawn in a rather obscure color. On close inspection you'll be able to tell it's the "UCSC genes" track, showing the gene density of the current chromosome.

Left-clicking (not right-clicking) on the track invokes a context menu:


Select the "configure" option if you are unsatisfied with the wiggle's rendering style. A familiar-looking configuration panel will open in the floating toolbox:


Below is the new look of the wiggle track, with the color changed to blue and the height increased a bit:


With the "replace" option in the menu we can change the content of the track:


A sub-menu is displayed, prompting you to select a track from either the "heatmap" or the "genomic feature" category. Choose either one and the corresponding track selection panel will be displayed. Upon selection of a track, the wiggle plot will update. Both native and custom bigWig and bigBed tracks can be displayed (density counts are shown for bigBed genomic features). However, BAM tracks are not supported for the moment.
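To make "density counts" concrete, here is one way such counts could be computed: divide the displayed region into equal bins and count the features overlapping each bin. The binning scheme, the half-open coordinates, and the function name are assumptions for illustration, not the browser's actual method.

```python
def feature_density(features, region_start, region_end, n_bins):
    """Count features overlapping each of n_bins equal-width bins.

    features is a list of (start, end) half-open intervals on one
    chromosome; the region [region_start, region_end) is also half-open.
    """
    span = region_end - region_start
    counts = [0] * n_bins
    for start, end in features:
        # Clip the feature to the displayed region.
        s = max(start, region_start)
        e = min(end, region_end)
        if s >= e:
            continue  # feature lies outside the region
        first = (s - region_start) * n_bins // span
        last = (e - 1 - region_start) * n_bins // span
        for b in range(first, last + 1):
            counts[b] += 1
    return counts

counts = feature_density([(0, 10), (5, 25), (90, 100)], 0, 100, 10)
```

A feature spanning several bins is counted once in each bin it touches, which is one reasonable convention among several.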

If you are happy with that, you can go on with your browsing. The wiggle plot updates when appropriate. The following shows the case where the displayed region spans multiple chromosomes. The wiggle plot automatically extends its coverage over those chromosomes:


This might sound crazy and is indeed rarely useful in the context of genome browsing. But it would be really useful in the Repeat Browser, where users constantly browse data over multiple repeat subfamilies for comparison. We hope you'll see this happen in a few months.

During Gene Set View, the wiggle plot updates again. The following example shows a wiggle plot of some H3K9me3 data over KEGG's glycolysis pathway. Two genes in that pathway carry extraordinarily high H3K9me3 marks.


Finally, you can add another wiggle plot beneath the ideogram bar. Here I add an H3K4me3 track from the same sample (CD34 primary cell) as the H3K9me3 track:


It appears that the H3K4me3 mark is more dispersed than the H3K9me3 mark here, and the two marks don't correlate with each other.

In general, this small function presents data patterns on a wider scale and hopefully provides some guidance for your browsing.

Monday, April 2, 2012

The data hub


Tabular format is deprecated. Click here to read about the new JSON format.


Updated 24/9/2013: start to use the JSON hub format
Updated 6/1/2013: file format indicator "sam" is replaced by "bam"
Updated 12/27/2012
Updated 10/2/2012
Posted on 4/2012


We are very happy to announce a new function, the "data hub".

The "data hub" concept is about organizing custom tracks in a tidy way and uploading them in batch. A user can prepare a data hub to host bedGraph or SAM tracks generated from her experiments, write up some metadata terms, and bring it all up on the browser with a single mouse click.


The benefits

  1. Batch uploading (eliminating the labor of manually uploading tracks one at a time)
  2. Custom track information is made persistent (encoded in a hub descriptor file)
  3. Track files don't have to be on the same web server
  4. All types of tracks can be annotated with custom metadata
  5. With a datahub you can configure:
    1. Default track display mode (you can have some of the tracks turned on by default while keeping the rest hidden)
    2. Default rendering style (this feature is still being worked on, but you can control some key styles, e.g. color and height)



Submitting a sample datahub


On the toolbox panel click the "CustomTK" button; the custom track submission panel will be displayed:


Click the "DataHub" button to show the contents:




Enter the URL of a sample hub (or click the text as indicated; the URL for the hg19 sample hub is http://vizhub.wustl.edu/hubSample/hg19/hub2.txt). Click the "Load" button and the tracks of the sample hub will be displayed:



As shown in the screenshot above, the hg19 sample hub contains many tracks, including 4 heatmap tracks (in blue), one bed track named "mattress", one long-range interaction track named "fractal globule", and one BAM track named "tempest". It also contains two metadata terms, shown as two columns in the metadata color map. The heatmap tracks are annotated with these terms.

Go back to the "Custom track" control panel:


This time select the "Manage" button. The number in parentheses is the number of custom tracks registered in the management table; it increases as more tracks are added. The table looks like this:






Creating your own datahub

You need to create a hub descriptor file and place it on your web server, and that's all.

The file is a simple text file. It is line-oriented and tab-delimited. Each line defines a track or a metadata term. Each type of track might require a different number of fields to describe itself.

As you've already seen from the custom track panel, five types of custom tracks are available to be included in your data hub:
  1. Quantitative data (bedGraph format, slow and non-restrictive)
  2. Quantitative data (bigWig format, fast and restrictive)
  3. Genomic features or annotation (BED)
  4. Long-range genome interaction (BED-derived format)
  5. Read alignment (BAM format)
Lines starting with "#" are comments. Blank lines are allowed.

Non-comment lines are split into fields by tabs. Each file type might have different requirements for its fields, and the order of the fields must be obeyed.

Note that the track style feature has only limited options and is under development. To use it, follow this simple example.
  1. Quantitative data (bigWig)
    1. 1st field must be bigWig
    2. 2nd field is URL to the bigwig file
    3. 3rd field is track name, which must not contain tabs, quotes, or other unusual characters
    4. 4th field is mode string, must be either show or hide
    5. 5th field is metadata annotation, given as "name:attribute" pairs; a colon separates "name" from "attribute" within a pair, and multiple pairs are separated by commas
    6. 6th field is optional custom style
  2. Quantitative data (bedGraph)
    1. 1st field must be bedGraph
    2. 2nd field is URL to the bedGraph file
    3. 3rd field is track name
    4. 4th field is mode string, must be either show or hide
    5. 5th field is metadata annotation, same as previous
    6. 6th field is optional custom style
  3. Genomic annotation
    1. 1st field must be the word bed
    2. 2nd field is URL to the bed file
    3. 3rd field is track name
    4. 4th field is display mode, must be one of hide, density, thin, or full
    5. 5th field is metadata annotation, same as previous
    6. 6th field is optional custom style
  4. Long-range genome interaction
    1. 1st field must be the word longrange
    2. 2nd field is URL to the track file
    3. 3rd field is track name
    4. 4th field is display mode, must be one of hide, arc, trihm, thin, or full
    5. 5th field is metadata annotation, same as previous
    6. 6th field is optional custom style
  5. Read alignment
    1. 1st field must be the word BAM
    2. fields 2 to 6 have the same requirements as for a bed track
  6. metadata term:
    1. first field must be the word metadata
    2. second field is term name; terms identical to native metadata terms (such as "Sample") can be used
    3. third field lists the attributes of the term; multiple attributes are separated by commas
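The field rules above can be sketched as a small parser for the tabular descriptor format. This is an illustration only, not the browser's own loader: the function names, the sample URLs (example.org), and the track names are hypothetical, and type keywords are matched exactly as spelled in the list above because the browser's case handling is not stated here.

```python
# Allowed mode strings per track type, taken from the list above.
TRACK_MODES = {
    "bigWig": {"show", "hide"},
    "bedGraph": {"show", "hide"},
    "bed": {"hide", "density", "thin", "full"},
    "longrange": {"hide", "arc", "trihm", "thin", "full"},
    "BAM": {"hide", "density", "thin", "full"},  # same rules as bed
}

def parse_annotation(text):
    """Split 'name:attribute,name:attribute' into (name, attribute) pairs."""
    pairs = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, attribute = item.partition(":")
        if not sep:
            raise ValueError(f"annotation lacks a colon: {item!r}")
        pairs.append((name.strip(), attribute.strip()))
    return pairs

def parse_hub(text):
    """Parse hub descriptor text into (tracks, metadata)."""
    tracks, metadata = [], {}
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments are allowed
        fields = line.split("\t")
        kind = fields[0]
        if kind == "metadata":
            if len(fields) < 3:
                raise ValueError(f"line {lineno}: metadata needs 3 fields")
            metadata[fields[1]] = [a.strip() for a in fields[2].split(",")]
        elif kind in TRACK_MODES:
            if len(fields) < 5:
                raise ValueError(f"line {lineno}: track needs 5 fields")
            url, name, mode, annotation = fields[1:5]
            if mode not in TRACK_MODES[kind]:
                raise ValueError(f"line {lineno}: bad mode {mode!r}")
            tracks.append({
                "type": kind, "url": url, "name": name, "mode": mode,
                "annotation": parse_annotation(annotation),
                "style": fields[5] if len(fields) > 5 else None,  # optional
            })
        else:
            raise ValueError(f"line {lineno}: unknown type {kind!r}")
    return tracks, metadata

# Hypothetical descriptor; URLs and names are placeholders.
SAMPLE = "\n".join([
    "# my hub",
    "metadata\tAssay\tH3K4me3,H3K9me3",
    "",
    "bigWig\thttp://example.org/a.bigWig\tsample A\tshow\tAssay:H3K4me3",
    "bed\thttp://example.org/genes.bed\tmy genes\tdensity\tAssay:H3K9me3",
])
tracks, metadata = parse_hub(SAMPLE)
```

The optional 6th field is kept as an uninterpreted string because its style syntax is not described in this post.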