Monday, November 7, 2011

Correlation function revised

The correlation function has been extensively refactored and debugged. It still uses Spearman's rank correlation, which remains the only available metric, and the correlation coefficient is plotted the same way as before. But here are the new things:
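For readers unfamiliar with the metric, here is a minimal sketch of Spearman's rank correlation. It replaces each series by its ranks, giving tied values the average of their positions, and then takes the Pearson correlation of those ranks. This is an illustration of the textbook definition, not the browser's own code. Function names are hypothetical, and returning 0.0 for a constant series is an assumption.

```python
from __future__ import annotations

from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant series is reported as uncorrelated
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, any monotonic relationship scores a perfect 1.0. For example, `spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])` gives 1.0 even though the relationship is not linear.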

First of all, a right-click menu is now enabled on the correlation canvas. It provides options to turn off correlation, or to sort tracks by correlation coefficient in descending order:
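The sorting option amounts to ordering tracks by their coefficient, highest first. Below is a minimal sketch, assuming the coefficients are held in a mapping from track name to value. The tie-break by name is an assumption added to make the order deterministic, and the track names are hypothetical.

```python
from typing import Dict, List


def sort_tracks_by_correlation(coefficients: Dict[str, float]) -> List[str]:
    """Return track names ordered from highest to lowest coefficient.

    Ties are broken alphabetically by name (an assumption, for determinism).
    """
    return sorted(coefficients, key=lambda name: (-coefficients[name], name))
```

For example, `{"trackA": 0.12, "trackB": 0.85, "trackC": -0.40}` sorts to `["trackB", "trackA", "trackC"]`. Strongly negative correlations end up at the bottom rather than next to strong positive ones.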



In the correlation control panel, genomic feature tracks are presented as radio buttons. Checking a button computes the correlation between the heatmap track data and the feature density of that genomic feature track.
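The post doesn't define "feature density" precisely. One plausible reading, assumed here, is the number of features (genes, repeats, etc.) overlapping each bin of the viewed region. This minimal sketch computes that per-bin count from half-open `[start, end)` intervals. The function name and the equal-width binning are illustrative assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def feature_density(features: List[Interval], region_start: int,
                    region_end: int, n_bins: int) -> List[int]:
    """Count the features overlapping each of n_bins equal-width bins."""
    if n_bins <= 0 or region_end <= region_start:
        raise ValueError("need a non-empty region and at least one bin")
    width = (region_end - region_start) / n_bins
    counts = [0] * n_bins
    for start, end in features:
        # Clip the feature to the region; skip it if nothing remains.
        s, e = max(start, region_start), min(end, region_end)
        if s >= e:
            continue
        first = min(int((s - region_start) // width), n_bins - 1)
        last = min(int((e - 1 - region_start) // width), n_bins - 1)
        for b in range(first, last + 1):
            counts[b] += 1
    return counts
```

The resulting vector can then be paired bin by bin with the heatmap track's values for the same region and passed to a rank correlation.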




At the bottom is a new option which, when left at "Yes", enforces "density" mode for checked tracks. If a track is not displayed, it will be brought up in "density" mode; if it is already displayed in "thin" or "full" mode, it will be switched to "density" mode.
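The two cases described above can be expressed as a small state update. This minimal sketch assumes the display state is a mapping from track name to mode. The `Mode` enum, the function name, and leaving everything untouched when the option is "No" are assumptions for illustration only.

```python
from enum import Enum
from typing import Dict, Iterable


class Mode(Enum):
    THIN = "thin"
    FULL = "full"
    DENSITY = "density"


def enforce_density(displayed: Dict[str, Mode], checked: Iterable[str],
                    enforce: bool = True) -> Dict[str, Mode]:
    """Return a new display map with every checked track in density mode."""
    result = dict(displayed)
    if not enforce:
        return result  # assumption: "No" leaves the display state unchanged
    for track in checked:
        current = result.get(track)
        if current is None:
            result[track] = Mode.DENSITY  # not displayed: bring it up in density
        elif current in (Mode.THIN, Mode.FULL):
            result[track] = Mode.DENSITY  # displayed in thin/full: flip it
    return result
```

Returning a new mapping instead of mutating the input keeps the previous display state available, for example to restore it when correlation is turned off.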

Finally, the correlation status can now be saved as part of a session.
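Saving the correlation status means serializing it together with the rest of the session and restoring it later. The post doesn't describe the session format. This minimal sketch assumes a JSON session object and a hypothetical `correlation` entry. Every field name below is an illustrative assumption, not the browser's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class CorrelationState:
    enabled: bool                 # correlation turned on or off
    feature_track: Optional[str]  # genomic feature chosen via radio button
    sorted_by_coefficient: bool   # right-click "sort" option applied
    enforce_density: bool         # the "Yes"/"No" density option
    checked_tracks: List[str]


def save_to_session(session: dict, state: CorrelationState) -> str:
    """Serialize the session with the correlation state under "correlation"."""
    merged = dict(session)
    merged["correlation"] = asdict(state)
    return json.dumps(merged, sort_keys=True)


def load_from_session(text: str) -> Optional[CorrelationState]:
    """Restore the correlation state, or None if the session has none."""
    data = json.loads(text).get("correlation")
    return CorrelationState(**data) if data is not None else None
```

A round trip through `save_to_session` and `load_from_session` returns an equal `CorrelationState`, while other session keys such as the genome assembly are kept alongside it.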

Future work: add other correlation/distance metrics, such as Pearson correlation.

Thursday, November 3, 2011

New Gene and RepeatMasker tracks

Gene models can be predicted by different algorithms based on different information sources. As on the UCSC Genome Browser, each species usually has more than one gene track. To let users browse track data in a richer context of gene information, some of the UCSC gene tracks have been downloaded and integrated into the Human Epigenome Browser for the currently supported species (human hg19, mouse mm9, and zebrafish danRer7).

On the navigation bar, go to "Tracks" --> "Genomic features"; gene tracks are listed in the first box:


The screenshot above shows 5 gene tracks for the human genome. Use the drop-down menus to turn them on:


Clicking on a gene pops up a tooltip with more information about that gene:


Gene symbols are always displayed when available, regardless of which method or source the gene comes from. For example, the gene DNAJC1 has the same display name in RefSeq and in the other tracks, even though it has a distinct identifier in each source. Clicking the "More info" link takes you to the NCBI RefSeq database page for this gene.

Apart from this view, you can selectively view only certain parts of a gene. In the control panel, each gene track name is followed by a downward arrow; clicking it reveals additional tracks:


Genes from each source have been split into 5 separate tracks, each of which can be brought up for viewing independently.

The same practice has been applied to RepeatMasker tracks. Following the UCSC Genome Browser display convention, the RepeatMasker box lists "classes" of repetitive elements:


Clicking the arrow reveals additional tracks, which are the "families" of repetitive elements belonging to that class:



Here is some related future work:
  1. Currently there is only one level of "parent-child" relationship for organizing genomic feature tracks. This might be expanded to more levels (useful for repeats, where a "subfamily" level follows the "family" level).
  2. Regular synchronization of genomic feature track data with UCSC
  3. Adding UCSC sequence conservation tracks
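Right now RepeatMasker class and family tracks form a single level of parent-child relationship. The sketch below builds that grouping from (class, family) annotation pairs, such as the repClass and repFamily columns of UCSC's rmsk table. The function name and sample records are illustrative only. Supporting subfamilies (item 1 above) would require a nested structure rather than this flat mapping.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_by_class(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each repeat class (parent track) to its sorted families (child tracks)."""
    tree: Dict[str, Set[str]] = defaultdict(set)
    for rep_class, rep_family in records:
        tree[rep_class].add(rep_family)  # a set drops duplicate annotations
    return {cls: sorted(fams) for cls, fams in sorted(tree.items())}
```

For instance, the pairs `("SINE", "Alu")`, `("LINE", "L1")`, `("SINE", "MIR")`, and `("LINE", "L2")` group into `{"LINE": ["L1", "L2"], "SINE": ["Alu", "MIR"]}`.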