Comparing reference-based RNA-Seq mapping methods for non-human primate data
BMC Genomics, Jul 11, 2014
Ashlee M Benjamin¹, Marshall Nichols¹, Thomas W Burke¹, Geoffrey S Ginsburg¹, Joseph E Lucas²
¹Center for Applied Genomics, Department of Medicine, Duke University, Durham, North Carolina, USA
²Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA
The application of next-generation sequencing technology to gene expression quantification,
namely RNA sequencing (RNA-Seq), has transformed the way gene expression studies are conducted
and analyzed. These advances are of particular interest to researchers studying organisms with missing
or incomplete genomes, because RNA-Seq does not require prior knowledge of the target sequences. De
novo assembly methods have gained widespread acceptance in the RNA-Seq community for organisms
with no true reference genome or transcriptome. While such methods have tremendous utility, their
computational cost remains a significant challenge for organisms with large and complex genomes.
In this manuscript, we present a comparison of four reference-based mapping methods for non-human
primate data. We use TopHat2 and GSNAP to map to the human genome, and Bowtie2 and Stampy
to map to both the human genome and the human transcriptome, for a total of six mapping approaches.
For each approach, we examine mapping rates and locations, the number of detected genes, correlations
between computed expression values, and the utility of the resulting data for differential expression analysis.
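Correlating expression values across mapping approaches is a simple way to check how much the choice of mapper changes downstream quantification. The sketch below is a minimal illustration, not the authors' pipeline: it computes a Pearson correlation on log2(count + 1) values for the genes quantified by two approaches. The approach names `mapper_A` and `mapper_B`, the gene IDs, the counts, and the log transform are all hypothetical assumptions.

```python
import math


def log_counts(counts):
    """log2(count + 1) transform; an assumed, commonly used choice for read counts."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene read counts from two mapping approaches (illustrative only).
counts_by_method = {
    "mapper_A": {"GENE1": 1200, "GENE2": 35, "GENE3": 0, "GENE4": 480, "GENE5": 9},
    "mapper_B": {"GENE1": 1100, "GENE2": 41, "GENE3": 2, "GENE4": 510, "GENE5": 4},
}


def compare(method_a, method_b, table):
    """Correlate log-transformed counts over the genes both approaches report."""
    genes = sorted(set(table[method_a]) & set(table[method_b]))
    x = log_counts([table[method_a][g] for g in genes])
    y = log_counts([table[method_b][g] for g in genes])
    return pearson(x, y)


if __name__ == "__main__":
    r = compare("mapper_A", "mapper_B", counts_by_method)
    print(f"Pearson r on log2(count + 1): {r:.3f}")
```

A rank-based (Spearman) correlation is a reasonable alternative when a few very highly expressed genes would otherwise dominate the comparison.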
We show that reference-based mapping methods are indeed useful for RNA-Seq analysis of mammalian
data with no true reference, but that the details of the mapping method should be carefully considered
when taking this approach. Critical algorithm features include short seed sequences, tolerance of mismatches,
and support for gapped alignments (insertions and deletions) in addition to splice-junction gaps. Together,
these features enable sensitive alignment of non-human primate RNA-Seq data to a human reference.
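Why short seeds and mismatch tolerance matter for cross-species mapping can be sketched with a deliberately simple model. It assumes each base of a non-human primate read differs from the human reference independently with a fixed probability (the hypothetical `divergence`). Indels, splice junctions, and the overlapping or spaced seeds that real aligners use are not modeled. `seed_hit_probability` estimates the chance that at least one non-overlapping seed matches exactly, and `within_mismatch_budget` estimates the chance that an ungapped alignment needs no more than a given number of mismatches. The function names, read length, and divergence value are illustrative assumptions, not parameters of TopHat2, GSNAP, Bowtie2, or Stampy.

```python
import math


def seed_hit_probability(read_len: int, seed_len: int, divergence: float) -> float:
    """P(at least one non-overlapping seed of length seed_len matches exactly).

    Assumes independent per-base mismatches at rate `divergence` (a simplified,
    hypothetical model; real aligners use overlapping or spaced seeds).
    """
    if not 0 < seed_len <= read_len:
        raise ValueError("seed_len must satisfy 0 < seed_len <= read_len")
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a probability")
    n_seeds = read_len // seed_len              # seeds tiled end to end
    p_seed_exact = (1.0 - divergence) ** seed_len
    return 1.0 - (1.0 - p_seed_exact) ** n_seeds


def within_mismatch_budget(read_len: int, max_mismatches: int, divergence: float) -> float:
    """P(an ungapped read alignment has <= max_mismatches mismatches).

    Binomial CDF under the same independence assumption; indels are not modeled.
    """
    if max_mismatches < 0:
        raise ValueError("max_mismatches must be non-negative")
    return sum(
        math.comb(read_len, i) * divergence**i * (1.0 - divergence) ** (read_len - i)
        for i in range(min(max_mismatches, read_len) + 1)
    )


if __name__ == "__main__":
    READ_LEN = 100      # hypothetical read length
    DIVERGENCE = 0.05   # hypothetical per-base cross-species mismatch rate
    for k in (12, 20, 32):
        p = seed_hit_probability(READ_LEN, k, DIVERGENCE)
        print(f"seed length {k:2d}: P(>=1 exact seed) = {p:.3f}")
    for m in (0, 2, 5):
        p = within_mismatch_budget(READ_LEN, m, DIVERGENCE)
        print(f"<= {m} mismatches allowed: P(alignment accepted) = {p:.3f}")
```

Under this toy model, shortening the seed raises the chance of finding an exact-match anchor, and a larger mismatch budget raises the chance that the extended alignment is accepted. This matches the intuition behind the algorithm features listed above, although real aligner sensitivity also depends on gap handling and scoring.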