<!-- fasta_clipping_histogram.xml, changeset 0:82e8c467e2ec "Uploaded" by devteam, Wed, 25 Sep 2013 14:38:48 -0400 -->
<tool id="cshl_fasta_clipping_histogram" name="Length Distribution">
    <description>chart</description>
    <requirements>
        <requirement type="package" version="0.0.13">fastx_toolkit</requirement>
    </requirements>
    <command>fasta_clipping_histogram.pl $input $outfile</command>

    <inputs>
        <param format="fasta" name="input" type="data" label="Library to analyze" />
    </inputs>

    <outputs>
        <data format="png" name="outfile" metadata_source="input" />
    </outputs>
    <help>

**What it does**

This tool creates a histogram image of the sequence length distribution in a FASTA dataset.

**TIP:** Use this tool after clipping your library (with the **FASTX Clipper** tool) to visualize the clipping results.

-----

**Output Examples**

In the following library, most sequences are 24-mers to 27-mers.
This could indicate an abundance of endo-siRNAs (depending, of course, on what you tried to sequence in the first place).

.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_1.png


In the following library, most sequences are 19-, 22- or 23-mers.
This could indicate an abundance of miRNAs (depending, of course, on what you tried to sequence in the first place).

.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_2.png


-----


**Input Formats**

This tool accepts short-read FASTA files. The reads don't have to be short, but each sequence must be on a single line, like so::

    >sequence1
    AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
    >sequence2
    GTGTGTGTGGGAAGTTGACACAGTA
    >sequence3
    CCTTGAGATTAACGCTAATCAAGTAAAC


If the sequences span multiple lines::

    >sequence1
    CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
    TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
    aactggtctttacctTTAAGTTG

Use the **FASTA Width Formatter** tool to re-format the FASTA file into single-line sequences::

    >sequence1
    CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG
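
As a rough illustration of what re-formatting into single-line sequences involves, the Python sketch below joins each record's wrapped sequence lines into one line. The unwrap_fasta function is hypothetical and is not how the FASTA Width Formatter tool is implemented; it assumes that header lines start with '>', drops blank lines, and keeps the letter case of the sequence unchanged.

```python
def unwrap_fasta(lines):
    """Join each record's wrapped sequence lines into a single line.

    Hypothetical helper for illustration: header lines (starting with '>')
    are kept as-is, blank lines are dropped, and all sequence lines up to
    the next header are concatenated without changing their letter case.
    """
    output = []
    chunks = []  # sequence lines of the record currently being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record's sequence, if any.
            if chunks:
                output.append("".join(chunks))
                chunks = []
            output.append(line)
        else:
            chunks.append(line)
    # Flush the sequence of the last record.
    if chunks:
        output.append("".join(chunks))
    return output


wrapped = [
    ">sequence1",
    "CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG",
    "TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG",
    "aactggtctttacctTTAAGTTG",
]
for out_line in unwrap_fasta(wrapped):
    print(out_line)
```

Run on the three wrapped lines of sequence1 shown above, it prints the header followed by the same single joined line as the re-formatted example.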


-----



**Multiplicity counts (a.k.a. read counts)**

If the sequence identifier (the text after the '>') contains a dash followed by a number, that number is treated as a multiplicity count (i.e. how many times that individual sequence appeared in the original FASTA file before collapsing).

Example 1 - The following FASTA file *does not* have multiplicity counts::

    >seq1
    GGATCC
    >seq2
    GGTCATGGGTTTAAA
    >seq3
    GGGATATATCCCCACACACACACAC

Each sequence counts as one, producing the following chart:

.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_3.png


Example 2 - The following FASTA file has multiplicity counts::

    >seq1-2
    GGATCC
    >seq2-10
    GGTCATGGGTTTAAA
    >seq3-3
    GGGATATATCCCCACACACACACAC

The first sequence counts as 2, the second as 10 and the third as 3, producing the following chart:

.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_4.png

Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts.
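
The counting rule above can be sketched in Python. This is a minimal illustration, not the code of fasta_clipping_histogram.pl: the hypothetical length_histogram and read_weight functions expect single-line FASTA, and they assume the multiplicity count is a dash followed by digits at the very end of the identifier (as in seq2-10), so any other identifier counts as one read.

```python
from collections import Counter


def read_weight(header):
    """Return the multiplicity count encoded in a '>' header line, or 1.

    Assumption: the count is the text after the last dash, and only when
    it consists of digits (e.g. "seq2-10" gives 10, "my-seq" gives 1).
    """
    identifier = header[1:].strip()
    head, dash, suffix = identifier.rpartition("-")
    if dash and suffix.isdigit():
        return int(suffix)
    return 1


def length_histogram(lines):
    """Map each sequence length to its total, count-weighted number of reads.

    Expects single-line FASTA: every header is followed by exactly one
    sequence line.
    """
    histogram = Counter()
    weight = 1
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            weight = read_weight(line)
        else:
            histogram[len(line)] += weight
    return dict(sorted(histogram.items()))


collapsed = [
    ">seq1-2", "GGATCC",
    ">seq2-10", "GGTCATGGGTTTAAA",
    ">seq3-3", "GGGATATATCCCCACACACACACAC",
]
for length, reads in length_histogram(collapsed).items():
    print(length, reads)
```

For Example 2 it reports 2 reads of length 6 and 10 reads of length 15, together with the 3 reads of seq3; the actual tool goes on to draw these counts as a PNG chart, which the sketch leaves out.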

------

This tool is based on `FASTX-toolkit`__ by Assaf Gordon.

.. __: http://hannonlab.cshl.edu/fastx_toolkit/

    </help>
</tool>
<!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->