Mercurial > repos > devteam > data_manager_hisat_index_builder

<tool id="hisat_index_builder_data_manager" name="HISAT index" tool_type="manage_data" version="1.0.0">
    <description>builder</description>
    <requirements>
        <requirement type="package" version="0.1.6">hisat</requirement>
    </requirements>
    <stdio>
        <exit_code range=":-1" />
        <exit_code range="1:" />
    </stdio>
    <command interpreter="python">hisat_index_builder.py "${out_file}" --fasta_filename "${all_fasta_source.fields.path}" --fasta_dbkey "${all_fasta_source.fields.dbkey}" --fasta_description "${all_fasta_source.fields.name}" --data_table_name "hisat_indexes"</command>
    <inputs>
        <param label="Source FASTA Sequence" name="all_fasta_source" type="select">
            <options from_data_table="all_fasta" />
        </param>
        <param label="Name of sequence" name="sequence_name" type="text" value="" />
        <param label="ID for sequence" name="sequence_id" type="text" value="" />
    </inputs>
    <outputs>
        <data format="data_manager_json" name="out_file" />
    </outputs>
    <help>
<![CDATA[
.. class:: infomark

**Notice:** If you leave name, description, or id blank, it will be generated automatically.

What is HISAT?
--------------

`HISAT <http://ccb.jhu.edu/software/hisat>`__ is a fast and sensitive
spliced alignment program. As part of HISAT, we have developed a new
indexing scheme based on the Burrows-Wheeler transform
(`BWT <http://en.wikipedia.org/wiki/Burrows-Wheeler_transform>`__) and
the `FM index <http://en.wikipedia.org/wiki/FM-index>`__, called
hierarchical indexing, that employs two types of indexes: (1) one global
FM index representing the whole genome, and (2) many separate local FM
indexes for small regions collectively covering the genome. Our
hierarchical index for the human genome (about 3 billion bp) includes
~48,000 local FM indexes, each representing a genomic region of
~64,000bp. As the basis for non-gapped alignment, the FM index is
extremely fast with a low memory footprint, as demonstrated by
`Bowtie <http://bowtie-bio.sf.net>`__. In addition, HISAT provides
several alignment strategies specifically designed for mapping different
types of RNA-seq reads. All these together, HISAT enables extremely fast
and sensitive alignment of reads, in particular those spanning two exons
or more. As a result, HISAT is much faster >50 times than
`TopHat2 <http://ccb.jhu.edu/software/tophat>`__ with better alignment
quality. Although it uses a large number of indexes, the memory
requirement of HISAT is still modest, approximately 4.3 GB for human.
HISAT uses the `Bowtie2 <http://bowtie-bio.sf.net/bowtie2>`__
implementation to handle most of the operations on the FM index. In
addition to spliced alignment, HISAT handles reads involving indels and
supports a paired-end alignment mode. Multiple processors can be used
simultaneously to achieve greater alignment speed. HISAT outputs
alignments in `SAM <http://samtools.sourceforge.net/SAM1.pdf>`__ format,
enabling interoperation with a large number of other tools (e.g.
`SAMtools <http://samtools.sourceforge.net>`__,
`GATK <http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit>`__)
that use SAM. HISAT is distributed under the `GPLv3
license <http://www.gnu.org/licenses/gpl-3.0.html>`__, and it runs on
the command line under Linux, Mac OS X and Windows.
]]>
    </help>
    <citations>
        <citation type="doi">10.1038/nmeth.3317</citation>
    </citations>
</tool>
author	devteam
date	Fri, 06 Nov 2015 14:17:24 -0500
parents
children