Tuesday, May 29, 2012

v4 code release: faster gene track rendering

With the start of summer, we are releasing the 4th version of the Wash U Epigenome Browser. Here is a summary.


Major improvement
The Browser now responds faster, thanks to the new gene track. Rendering a gene track now needs only one Ajax query to fetch the gene data. In the past, two Ajax queries were needed, which dragged down performance. A lot of coding went into this change, essentially replacing the old design with a new one. Clear arrow marks indicate the strand of the gene, and they are drawn over both introns and exons:



Minor improvements and changes

To show more information about a gene, click it to open the tooltip:


In the balloon, the gene symbol is printed in a large italic font at the top left. To its right are the "internal identifier" of this gene track and a link to this gene's entry in the external database from which the gene information was retrieved.

In the middle of the balloon, click "show gene structure" to reveal the structure data for this gene:


We are going to add Gene Ontology annotation for all species, and it will be presented in the same style as the gene structure data shown here. The small static tooltip balloon is turning into a dashboard of rich information.

We have also added functions to the control panel that make it more convenient to control the appearance of genomic feature tracks. Go to "Tracks" > "Genomic features" to see them:


This table shows your collection of genomic feature tracks. You all know what the wrench icon does: click it to configure the track's rendering through the panel displayed in the floating toolbox. The color boxes on the right are shortcuts to the color configuration function. Clicking them opens the color palette:


❖, ☁, and T are for "bigBed" tracks (e.g. genes or repeats), and control the color of the "item box", "density plots", and "name text" respectively. Numerical tracks like "vertebrate PhyloP" have different shortcuts: "+" and "-" control the plotting color of positive and negative values. Hope you'll like them!





*** Notice to maize genome users ***

We have changed the chromosome names from "n" to "chrn", where "n" is one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Mt, Pt, UNKNOWN. We previously followed the naming used by maizesequence.org on their site, but we were persuaded to switch to the "conventional style" (chr1 instead of 1). Please be aware of this change. You need to prepare your bigWig/bigBed/BAM files accordingly so that they display properly on the browser.
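If your data is still in a tab-delimited text form (for example BED or bedGraph) before conversion to bigBed/bigWig, you can rename the chromosomes with a short script. The sketch below is a hypothetical helper, not part of the Browser. It assumes the chromosome name is in the first column, and it leaves blank, comment, "track" and "browser" lines untouched. Binary BAM files need a different route, which is not covered here.

```python
# Maize chromosome names listed in the notice above.
MAIZE_CHROMS = {str(n) for n in range(1, 11)} | {"Mt", "Pt", "UNKNOWN"}


def add_chr_prefix(line: str) -> str:
    """Rename the first column of one tab-delimited line from "n" to "chrn"."""
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return line  # blank, comment and header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if fields[0] in MAIZE_CHROMS:
        fields[0] = "chr" + fields[0]  # e.g. "1" -> "chr1", "Pt" -> "chrPt"
    return "\t".join(fields) + "\n"


def convert_file(src: str, dst: str) -> None:
    """Write a copy of src with the maize chromosome names renamed."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(add_chr_prefix(line))
```

Names that already carry the prefix, such as "chr1", are left as they are. This means running the script twice on the same file is harmless.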



Bug fix
A small coding error that caused some arabidopsis and maize genes to be displayed incorrectly has been fixed. The genes now display correctly:



Monday, May 21, 2012

v3 code release - Maize genome supported

Version 3 of the Wash U Epigenome Browser source code is available now! Follow this link to download it. Here is an account of what's new.


New genome assembly
We're very happy to support the maize genome (from the B73 strain, assembly version 2).

(Image from Wikipedia)


Corn is nutritious and makes healthy food. In industrialized countries, maize is grown mainly to feed livestock and produce biofuel, but maize researchers all know how important it is as a staple food for people in many parts of the world.

So we now support the maize genome in our Browser, in the hope that maize researchers and breeders can benefit from browsing maize genomic data with our service.

Data including the genome sequence and the gene and repeat predictions were downloaded from http://www.maizesequence.org/index.html. Apart from a few genome annotation tracks, the maize genome database is currently empty. We will certainly add public maize data sets here (let us know what you would like to see), but you can always view your own data sets via the custom track and Data Hub functions. Click the following link to open the browser showing the maize genome and tracks from a sample data hub:

http://epigenomegateway.wustl.edu/browser/?genome=AGPv2&coordinate=1:11499583-11999166&datahub=http://epigenomegateway.wustl.edu/browser/b73/testhub.txt&gftk=AGPv2_5a,full

You can find a sketch of the procedure for building the maize database here.


Bug fixes
  1. In the heatmap track facet browsing panel, clicking ⊞ or ⊟ now correctly opens or folds the contents (it used to generate an error).
  2. When the correlation function is in use, any newly added heatmap tracks will have their correlation coefficients properly computed and displayed.
  3. An occasional error when clicking some items in the gene track has been eliminated.


Minor improvements
To make it easier to configure the metadata color map, a button has been added to the right side of the metadata color map:

Clicking this button displays a panel containing the metadata vocabulary. You can browse through the vocabulary and add or remove terms from the color map by toggling the checkboxes:


Sunday, May 20, 2012

Variant of BED file format (and how to make it)

UPDATE: The contents of this post are obsolete, as the WashU Epigenome Browser no longer supports the bigBed file format.

Look at this post to see how to reformat your data into the tabix format.

-----------------------------------

In the Wash U Epigenome Browser, we use a slightly altered version of the BED format to encode positional data of genomic features: the 5th field is set to a unique integer, which is used as the ID of the genomic feature represented by that line. There's no upper bound on the ID value; it can go as high as 10 million if there are 10 million lines in the BED file. (example)
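Here is a minimal sketch of how such a file could be produced from an ordinary BED file. It is a hypothetical helper, not the Browser's own tooling. It assumes tab-delimited input with at least four columns (chrom, start, end, name). It numbers features from 1, which is an arbitrary choice, since the post only requires the IDs to be unique integers.

```python
from collections.abc import Iterable, Iterator


def add_feature_ids(bed_lines: Iterable[str]) -> Iterator[str]:
    """Yield BED lines whose 5th field is replaced by a unique integer ID."""
    next_id = 1  # assumption: IDs start at 1 and increase by one per feature
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            raise ValueError(f"need at least 4 columns: {line!r}")
        if len(fields) == 4:
            fields.append(str(next_id))  # add a 5th column
        else:
            fields[4] = str(next_id)  # overwrite the standard "score" column
        next_id += 1
        yield "\t".join(fields) + "\n"
```

Columns after the 5th (strand, thickStart and so on) are kept unchanged.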

But there's a catch. Users who diligently follow our guideline to prepare a custom genomic feature track will now encounter the following error when converting a BED file into a bigBed file with the bedToBigBed program:


At line xx, score (xxx) must be between 0 and 1000


... where "xxx" is an integer bigger than 1000.


This is because the BED format specification defines the 5th field as "score", and its value must be between 0 and 1000. The bedToBigBed program scrutinizes the input BED file and squawks when it sees a "score" bigger than 1000.
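To see in advance which lines the native bedToBigBed would object to, you can scan the file for 5th-field values outside 0 to 1000. The sketch below is a hypothetical check written for this post. It reports 1-based line numbers in the input. It is not guaranteed to number lines exactly as bedToBigBed does.

```python
from collections.abc import Iterable


def out_of_range_scores(bed_lines: Iterable[str], low: int = 0, high: int = 1000) -> list[tuple[int, int]]:
    """Return (line_number, score) pairs whose 5th field lies outside [low, high]."""
    problems = []
    for number, line in enumerate(bed_lines, start=1):
        if line.startswith(("#", "track", "browser")):
            continue  # header and comment lines carry no score
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # no 5th field, nothing to check
        score = int(fields[4])
        if not low <= score <= high:
            problems.append((number, score))
    return problems
```

On a file prepared with unique IDs, every line past the 1000th feature will show up here. This is exactly why the patched binary is needed.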

To work around this hurdle and generate working bigBed files, use the following bedToBigBed binary instead of the native one:

http://epigenomegateway.wustl.edu/bedToBigBed

This binary was compiled on a PC running 32-bit Ubuntu, using the Kent Source Tree downloaded on Apr 25, 2012. It should work on both 32-bit and 64-bit Linux PCs.


Here is the recipe to rebuild a bedToBigBed program that doesn't squawk:

  1. Download the Kent Source Tree from http://hgdownload.cse.ucsc.edu/admin/jksrc.zip and decompress it; a directory named "kent" will be created in your working directory.
  2. Open the file "kent/src/lib/basicBed.c".
  3. If line 1375 reads "if (!isCt && (bed->score < 0 || bed->score > 1000))", remove lines 1375 and 1376. Otherwise, do nothing.
  4. Save your edit to this file.
  5. Resume the normal procedure to build the library and the bedToBigBed binary:
    1. Remove the "-Werror" flag from the file kent/src/inc/common.mk
    2. Go to kent/src/
    3. Run "make libs"
    4. Go to kent/src/utils/bedToBigBed/
    5. Run "make", then a new "bedToBigBed" binary will be generated
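The source edits in steps 3 and 5.1 can also be scripted. The sketch below is a hypothetical helper, not part of the Kent Source Tree. It applies the same conditional check as the recipe: lines 1375 and 1376 of basicBed.c are dropped only when line 1375 holds the score check. It also removes the literal "-Werror" text from common.mk. It assumes, as the recipe implies, that the statement guarded by the check sits entirely on line 1376.

```python
from pathlib import Path

# Line content the recipe says to look for at line 1375 of basicBed.c.
SCORE_CHECK = "if (!isCt && (bed->score < 0 || bed->score > 1000))"


def remove_score_check(source: str, line_no: int = 1375) -> tuple[str, bool]:
    """Step 3: drop lines line_no and line_no+1 if line_no holds the score check."""
    lines = source.splitlines(keepends=True)
    idx = line_no - 1  # 1-based line number -> 0-based list index
    if idx < len(lines) and lines[idx].strip() == SCORE_CHECK:
        del lines[idx:idx + 2]  # the "if" line and the statement under it
        return "".join(lines), True
    return source, False  # content differs: leave the file alone


def strip_werror(makefile: str) -> str:
    """Step 5.1: remove every literal "-Werror" flag from common.mk."""
    return makefile.replace("-Werror", "")


def patch_in_place(path: str, transform) -> None:
    """Apply a text-to-text transform to a file and write the result back."""
    p = Path(path)
    p.write_text(transform(p.read_text()))
```

For example, from your working directory you could call patch_in_place("kent/src/lib/basicBed.c", lambda s: remove_score_check(s)[0]) and patch_in_place("kent/src/inc/common.mk", strip_werror). Then continue with "make libs" and "make" as described above.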


We have to stick to this variation of the BED format because the genomic feature track needs the ID field for scrolling. The ID is a neat way for the Browser to tell which genomic features have been extended by scrolling, so that new data can be correctly appended to the cached data. We don't think this is bizarre, savage, or ruthless. The 5th field of a BED file is already of integer type, so why not free it of limits and let it bear any value it wants? We apologize for any inconvenience this may cause, and we're happy to hear your thoughts.
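To show why a unique ID helps, here is a simplified, hypothetical sketch of merging freshly fetched features into a cache after a scroll. It is not the Browser's actual code. A feature that spans both the old view and the newly exposed region comes back from both queries, and keying the cache by ID lets such features be recognized rather than appended twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    feature_id: int  # unique integer taken from the 5th BED column
    chrom: str
    start: int
    end: int
    name: str


def merge_into_cache(cache: dict[int, Feature], fetched: list[Feature]) -> list[Feature]:
    """Add fetched features to the cache, skipping IDs that are already there.

    Returns only the features that were new to the cache.
    """
    new = []
    for feature in fetched:
        if feature.feature_id not in cache:
            cache[feature.feature_id] = feature
            new.append(feature)
    return new
```

With a score capped at 1000, such IDs could not stay unique in any track that has more than about a thousand features.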

Free(dom) is good, isn't it?

Sunday, May 6, 2012

v2 code release - URL parsing and persistent hyperlinks

UPDATE December 27th, 2012:
The URL parameter function described in this post is obsolete; new specifications have been in use since Version 13. See this post for details:
http://washugb.blogspot.com/2012/12/url-parameter-specification-effective.html




Today we're very happy to release the second version of the Wash U Epigenome Browser. Go to the source code archive page and download subtleKnife.v2.tgz to get the code.


New features
You can now supply parameters through the URL to control browser behavior, just as good old CGI programs do.

For a quick demonstration, click the following link and see how it works:

http://epigenomegateway.wustl.edu/browser/?genome=hg19&datahub=http://remc.wustl.edu/xzhou/hub.txt

Clicking it opens the browser showing the human hg19 genome and tracks from the sample data hub.

We expect users to use this feature to obtain persistent hyperlinks to their current browsing status and share them with collaborators. External web sites can also provide links and buttons that direct users to our browser with customized display information (e.g. a specific composition of custom tracks hosted on external websites).

A URL with parameters is in fact an alternative to the session function. Such a URL can be truly persistent, because it doesn't require any storage to stay valid. Session information, by contrast, must be kept in the database, so it could be deleted and become invalid. On the other hand, URL parameters can leave users with really long URL strings, whereas with the "session" parameter the URL can be very short and handy.

Such a URL is composed as follows:

[base URL] + '?' + [key1] + '=' + [value1] + '&' + [key2] + '=' + [value2] + '&' + [more key/values pairs]

Explanations:
  1. Base URL is http://epigenomegateway.wustl.edu/browser/.
  2. The question mark '?' must be present and must immediately follow the base URL.
  3. Key and value are joined by '=', and multiple key/value pairs are joined by '&'.
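The composition rule above can be sketched in a few lines. This is a hypothetical helper, not part of the Browser. It inserts values verbatim, with no percent-encoding, to mirror the example links in this post. Because of that, it rejects values containing '&' or '=', which would otherwise be misread as separators.

```python
BASE_URL = "http://epigenomegateway.wustl.edu/browser/"


def browser_url(params: list[tuple[str, str]], base: str = BASE_URL) -> str:
    """Build base + '?' + key1=value1 & key2=value2 & ... from (key, value) pairs."""
    pairs = []
    for key, value in params:
        if "&" in value or "=" in value:
            # these characters would be read as separators, not as part of the value
            raise ValueError(f"value for {key!r} contains '&' or '='")
        pairs.append(f"{key}={value}")
    return base + "?" + "&".join(pairs)
```

For instance, browser_url([("genome", "hg19"), ("datahub", "http://remc.wustl.edu/xzhou/hub.txt")]) reproduces the sample link given earlier in this post.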
Following is the key/value specification. All keys are case-insensitive:
  • "genome", the name of the genome to display; allowed values are:
    • hg19 (human)
    • mm9 (mouse)
    • danRer7 (zebrafish)
    • dm3 (fruitfly)
    • tair10 (arabidopsis)
  • "session", to restore a saved session. Value is the session ID string.
  • "statusid", value is the status ID. This parameter is used in conjunction with "session".
  • "metadata", to choose which metadata terms are displayed in the metadata color map. Value is a comma-joined list of metadata term names (experimental; only leaf terms have been tested to work this way).
  • "coordinate", to set the genomic position the browser should display. Value is a coordinate string in the form chr1:5000-6000.
  • "juxtapose", to run juxtaposition on a bigBed track. The value can take one of two forms:
    • if supplying a URL, the URL must point to a valid bigBed file. The additional parameter "juxtaposecustom=on" must be supplied as well.
    • if using a native genomic feature track to run juxtaposition, a valid track name must be provided (experimental; the list of bigBed tracks is not given explicitly but can be found in the config/hg19/makeDb.sql file).
  • "geneset", to run Gene Set View. Value is a comma-separated list of gene names. This parameter should not be used with "coordinate" or "juxtapose".
  • "datahub", to display tracks from a data hub. Value is the URL of a data hub descriptor file.
  • "hmtk", to display specific native heatmap tracks. Value is a comma-separated list of native heatmap track names (experimental; the list of native heatmap tracks is not given explicitly but can be found in the config/hg19/track2Detail file).
  • "customhmtk", to display custom heatmap tracks. Value is in the form "name1,url1,name2,url2,...", where each name/url pair defines one custom heatmap track. All fields are joined by commas, so track names must not contain commas.
  • "gftk", to display specific native genomic feature tracks. Value is in the form "name1,mode1,name2,mode2,...", where each name/mode pair defines the display of one native genomic feature track. Mode must be one of thin/full/density (experimental).
  • "customgftk", to display custom genomic feature tracks. Value is in the form "name1,url1,mode1,name2,url2,mode2,...". It has almost the same requirements as above.
  • "bam", to display BAM tracks hosted on our server. This function is experimental because we are still working to align and generate BAM files for every heatmap track.
  • "custombam", to display custom BAM tracks. Value is in the form "name1,url1,mode1,name2,url2,mode2,...". It has the same requirements as "customgftk".
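The comma-joined values for "customhmtk" and "customgftk" can be assembled and checked before they go into a URL. The sketch below uses hypothetical helpers, not part of the Browser. The mode check follows the thin/full/density rule stated for "gftk". Applying the same modes to "custombam" is an assumption, based on the note that it has the same requirements as "customgftk".

```python
GFTK_MODES = {"thin", "full", "density"}


def _check_field(field: str) -> str:
    # Fields are joined by commas, so a comma inside a field would shift all later fields.
    if "," in field:
        raise ValueError(f"field must not contain a comma: {field!r}")
    return field


def customhmtk_value(tracks: list[tuple[str, str]]) -> str:
    """Value for "customhmtk": name1,url1,name2,url2,..."""
    return ",".join(_check_field(f) for name, url in tracks for f in (name, url))


def customgftk_value(tracks: list[tuple[str, str, str]]) -> str:
    """Value for "customgftk" (assumed to suit "custombam" too): name1,url1,mode1,..."""
    fields = []
    for name, url, mode in tracks:
        if mode not in GFTK_MODES:
            raise ValueError(f"mode must be one of {sorted(GFTK_MODES)}: {mode!r}")
        fields += [_check_field(name), _check_field(url), mode]
    return ",".join(fields)
```

The example.com URLs used in the checks for these helpers are placeholders, not real tracks.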

In the next code release, we expect the browser to automatically generate a URL as a snapshot of the current browsing status.


Bug fix

  1. Session saving and restoring now work when no term is displayed in the metadata color map.
  2. The browser now tolerates empty lines in data hub descriptor file.
  3. When no track is displayed in the genome heatmap, clicking the "add track" button in the pairwise comparison panel no longer causes an error. Instead, a warning message is printed in the message console stating that there is no track to choose from.



Minor changes
In the track selection panel, custom genomic feature tracks and BAM tracks are now presented in a new way:


Click the new tabs "Genomic features / custom" and "Read alignments / custom" to see tracks of these types.

In the Bird's Eye View panel, left-click (not right-click) the wiggle plot image to show a small context menu: