comparison fasta_clipping_histogram.xml @ 0:82e8c467e2ec draft
Uploaded
author | devteam |
---|---|
date | Wed, 25 Sep 2013 14:38:48 -0400 |
parents | |
children | de44f4045b05 |
comparison
-1:000000000000 | 0:82e8c467e2ec |
---|---|
1 <tool id="cshl_fasta_clipping_histogram" name="Length Distribution"> | |
2 <description>chart</description> | |
3 <requirements> | |
4 <requirement type="package" version="0.0.13">fastx_toolkit</requirement> | |
5 </requirements> | |
6 <command>fasta_clipping_histogram.pl $input $outfile</command> | |
7 | |
8 <inputs> | |
9 <param format="fasta" name="input" type="data" label="Library to analyze" /> | |
10 </inputs> | |
11 | |
12 <outputs> | |
13 <data format="png" name="outfile" metadata_source="input" /> | |
14 </outputs> | |
15 <help> | |
16 | |
17 **What it does** | |
18 | |
19 This tool creates a histogram image of the sequence length distribution in a given FASTA dataset. | |
20 | |
21 **TIP:** Use this tool after clipping your library (with the **FASTX Clipper** tool) to visualize the clipping results. | |
22 | |
23 ----- | |
24 | |
25 **Output Examples** | |
26 | |
27 In the following library, most sequences are 24-mers to 27-mers. | |
28 This could indicate an abundance of endo-siRNAs (depending, of course, on what you tried to sequence in the first place). | |
29 | |
30 .. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_1.png | |
31 | |
32 | |
33 In the following library, most sequences are 19-, 22- or 23-mers. | |
34 This could indicate an abundance of miRNAs (depending, of course, on what you tried to sequence in the first place). | |
35 | |
36 .. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_2.png | |
37 | |
38 | |
39 ----- | |
40 | |
41 | |
42 **Input Formats** | |
43 | |
44 This tool accepts short-read FASTA files. The reads don't have to be short, but each sequence must be on a single line, like so:: | |
45 | |
46 >sequence1 | |
47 AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG | |
48 >sequence2 | |
49 GTGTGTGTGGGAAGTTGACACAGTA | |
50 >sequence3 | |
51 CCTTGAGATTAACGCTAATCAAGTAAAC | |
52 | |
53 | |
54 If the sequences span multiple lines:: | |
55 | |
56 >sequence1 | |
57 CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG | |
58 TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG | |
59 aactggtctttacctTTAAGTTG | |
60 | |
61 Use the **FASTA Width Formatter** tool to re-format the FASTA into single-line sequences:: | |
62 | |
63 >sequence1 | |
64 CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG | |
65 | |
66 | |
67 ----- | |
68 | |
69 | |
70 | |
71 **Multiplicity counts (a.k.a. read counts)** | |
72 | |
73 If the sequence identifier (the text after the '>') contains a dash and a number, the number is treated as a multiplicity count (i.e. how many times that individual sequence occurred in the original FASTA file, before collapsing). | |
74 | |
75 Example 1 - The following FASTA file *does not* have multiplicity counts:: | |
76 | |
77 >seq1 | |
78 GGATCC | |
79 >seq2 | |
80 GGTCATGGGTTTAAA | |
81 >seq3 | |
82 GGGATATATCCCCACACACACACAC | |
83 | |
84 Each sequence counts as one, to produce the following chart: | |
85 | |
86 .. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_3.png | |
87 | |
88 | |
89 Example 2 - The following FASTA file has multiplicity counts:: | |
90 | |
91 >seq1-2 | |
92 GGATCC | |
93 >seq2-10 | |
94 GGTCATGGGTTTAAA | |
95 >seq3-3 | |
96 GGGATATATCCCCACACACACACAC | |
97 | |
98 The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart: | |
99 | |
100 .. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_4.png | |
101 | |
102 Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts. | |
103 | |
104 ------ | |
105 | |
106 This tool is based on `FASTX-toolkit`__ by Assaf Gordon. | |
107 | |
108 .. __: http://hannonlab.cshl.edu/fastx_toolkit/ | |
109 | |
110 </help> | |
111 </tool> | |
112 <!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> |
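The multiplicity-count rule described in the help text can be illustrated with a small Python sketch. This is not the code of `fasta_clipping_histogram.pl`; it only shows how a length histogram weighted by multiplicity counts might be computed. It assumes single-line sequences and assumes the count is the run of digits after the last dash at the end of the identifier. The names `multiplicity` and `length_histogram` are hypothetical.

```python
import re
from collections import Counter

# Assumption: the multiplicity count is the number that follows the last
# dash at the very end of the identifier (e.g. ">seq2-10" -> 10).
_COUNT_RE = re.compile(r"-(\d+)$")


def multiplicity(header):
    """Return the count encoded in a FASTA header line, or 1 if there is none."""
    match = _COUNT_RE.search(header.strip())
    return int(match.group(1)) if match else 1


def length_histogram(lines):
    """Map sequence length -> total count, for single-line FASTA records."""
    histogram = Counter()
    count = None  # count of the record whose sequence line is expected next
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            count = multiplicity(line)
        elif count is not None:
            histogram[len(line)] += count
            count = None  # only the first line after a header is used
    return histogram


# Example 2 from the help text above.
example = """>seq1-2
GGATCC
>seq2-10
GGTCATGGGTTTAAA
>seq3-3
GGGATATATCCCCACACACACACAC
"""

hist = length_histogram(example.splitlines())
for length in sorted(hist):
    print(length, hist[length])
```

With headers that carry no dash-and-number suffix (as in Example 1), every sequence contributes a count of one.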