Mercurial > repos > devteam > fasta_clipping_histogram

<tool id="cshl_fasta_clipping_histogram" name="Length Distribution" version="1.0.1">
    <description>chart</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements">
        <requirement type="package" version="1.49">perl-gdgraph</requirement>
    </expand>
    <command detect_errors="exit_code"><![CDATA[
fasta_clipping_histogram.pl
'$input'
'$outfile'
    ]]></command>

    <inputs>
        <expand macro="fasta_input" />
    </inputs>

    <outputs>
        <data name="outfile" format="png" metadata_source="input" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="fasta_clipping_histogram-in1.fa" />
            <output name="outfile" file="fasta_clipping_histogram-out1.png" compare="sim_size" delta="512" />
        </test>
        <test>
            <param name="input" value="fasta_clipping_histogram-in2.fa" />
            <output name="outfile" file="fasta_clipping_histogram-out2.png" compare="sim_size" delta="512" />
        </test>
    </tests>
    <help><![CDATA[
**What it does**

This tool creates a histogram image of sequence lengths distribution in a given fasta dataset file.

**TIP:** Use this tool after clipping your library (with **FASTX Clipper tool**), to visualize the clipping results.

-----

**Output Examples**

In the following library, most sequences are 24-mers to 27-mers.
This could indicate an abundance of endo-siRNAs (depending of course of what you've tried to sequence in the first place).

.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_1.png

In the following library, most sequences are 19,22 or 23-mers.
This could indicate an abundance of miRNAs (depending of course of what you've tried to sequence in the first place).

.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_2.png

-----

**Input Formats**

This tool accepts short-reads FASTA files. The reads don't have to be short, but they do have to be on a single line, like so::

    >sequence1
    AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
    >sequence2
    GTGTGTGTGGGAAGTTGACACAGTA
    >sequence3
    CCTTGAGATTAACGCTAATCAAGTAAAC

If the sequences span over multiple lines::

    >sequence1
    CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
    TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
    aactggtctttacctTTAAGTTG

Use the **FASTA Width Formatter** tool to re-format the FASTA into a single-lined sequences::

   >sequence1
   CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG

-----

**Multiplicity counts (a.k.a reads-count)**

If the sequence identifier (the text after the '>') contains a dash and a number, it is treated as a multiplicity count value (i.e. how many times that individual sequence repeated in the original FASTA file, before collapsing).

Example 1 - The following FASTA file *does not* have multiplicity counts::

    >seq1
    GGATCC
    >seq2
    GGTCATGGGTTTAAA
    >seq3
    GGGATATATCCCCACACACACACAC

Each sequence is counts as one, to produce the following chart:

.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_3.png

Example 2 - The following FASTA file have multiplicity counts::

    >seq1-2
    GGATCC
    >seq2-10
    GGTCATGGGTTTAAA
    >seq3-3
    GGGATATATCCCCACACACACACAC

The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart:

.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_4.png

Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts.

------

This tool is based on `FASTX-toolkit`__ by Assaf Gordon.

.. __: http://hannonlab.cshl.edu/fastx_toolkit/
    ]]></help>
    <expand macro="citations" />
</tool>
author	iuc
date	Tue, 08 May 2018 10:45:30 -0400
parents	d0969fa24eb1
children	cf0a5c96c7f8