comparison rnabob.xml @ 1:95756833bc6c draft default tip
Uploaded
| author | rnateam |
|---|---|
| date | Fri, 09 Jan 2015 14:23:53 -0500 |
| parents | 0a63e16e1e84 |
| children | |
| 0:0a63e16e1e84 | 1:95756833bc6c |
|---|---|
| 4 <requirement type="package" version="2.2.1">rnabob</requirement> | 4 <requirement type="package" version="2.2.1">rnabob</requirement> |
| 5 </requirements> | 5 </requirements> |
| 6 <version_command>echo "2.2.1"</version_command> | 6 <version_command>echo "2.2.1"</version_command> |
| 7 <command> | 7 <command> |
| 8 <![CDATA[ | 8 <![CDATA[ |
| 9 rnabob | 9 rnabob |
| 10 -q | 10 -q |
| 11 $fancy | 11 $fancy |
| 12 $compStrands | 12 $compStrands |
| 13 $skipOverlapping | 13 $skipOverlapping |
| 14 $descriptorFile | 14 $descriptorFile |
| 15 $sequenceFile > $stdout | 15 $sequenceFile > $stdout |
| 16 ]]> | 16 ]]> |
| 17 </command> | 17 </command> |
| 18 <stdio> | 18 <stdio> |
| 19 <exit_code range="1:" level="fatal" description="Error occurred. Please check Tool Standard Error" /> | 19 <exit_code range="1:" level="fatal" description="Error occurred. Please check Tool Standard Error" /> |
| 20 <exit_code range=":-1" level="fatal" description="Error occurred. Please check Tool Standard Error" /> | 20 <exit_code range=":-1" level="fatal" description="Error occurred. Please check Tool Standard Error" /> |
| 21 </stdio> | 21 </stdio> |
| 22 <inputs> | 22 <inputs> |
| 23 <param name="descriptorFile" type="data" format="txt" multiple="false" label="Motif Descriptor File" help="This file contains the description of the motif for which to search"/> | 23 <param name="descriptorFile" type="data" format="txt" multiple="false" label="Motif Descriptor File" help="This file contains the description of the motif for which to search"/> |
| 24 <param name="sequenceFile" type="data" format="fasta" multiple="false" label="Sequence File" help="This file specifies the sequence in which the motif will be searched"/> | 24 <param name="sequenceFile" type="data" format="fasta" multiple="false" label="Sequence File" help="This file specifies the sequence in which the motif will be searched"/> |
| 25 <param name="compStrands" type="boolean" truevalue="-c" falsevalue="" checked="false" label="Also search on complementary strands" help="-c : Search both strands of the supplied sequence"/> | 25 <param name="compStrands" type="boolean" truevalue="-c" falsevalue="" checked="false" label="Also search on complementary strands" help="-c : Search both strands of the supplied sequence"/> |
| 26 <param name="skipOverlapping" type="boolean" truevalue="-s" falsevalue="" checked="false" label="Skip overlapping matches" help="-s : This is a workaround to avoid a problem in the DNABANK, overlapping matches will be ignored"/> | 26 <param name="skipOverlapping" type="boolean" truevalue="-s" falsevalue="" checked="false" label="Skip overlapping matches" help="-s : This is a workaround to avoid a problem in the DNABANK, overlapping matches will be ignored"/> |
| 27 <param name="fancy" type="boolean" checked="false" truevalue="-F" falsevalue="" label="Show Alignments" help="Display full alignments to pattern"/> | 27 <param name="fancy" type="boolean" checked="false" truevalue="-F" falsevalue="" label="Show Alignments" help="Display full alignments to pattern"/> |
| 28 </inputs> | 28 </inputs> |
| 29 <outputs> | 29 <outputs> |
| 30 <data format="txt" name="stdout" label="${tool.name} on ${on_string}" /> | 30 <data format="txt" name="stdout" label="${tool.name} on ${on_string}" /> |
| 31 </outputs> | 31 </outputs> |
| 32 <tests> | 32 <tests> |
| 46 <param name="fancy" value="False" /> | 46 <param name="fancy" value="False" /> |
| 47 <output name="stdout" file="trna.bob" /> | 47 <output name="stdout" file="trna.bob" /> |
| 48 </test> | 48 </test> |
| 49 </tests> | 49 </tests> |
| 50 <help> | 50 <help> |
| | 51 <![CDATA[ |
| 51 **What RNABOB does** | 52 **What RNABOB does** |
| 52 | 53 |
| 53 RNABOB allows searching a sequence database for RNA structural motifs. | 54 RNABOB allows searching a sequence database for RNA structural motifs. |
| 54 The probe motif is specified in a *descriptor* file, | 55 The probe motif is specified in a *descriptor* file, |
| 55 which describes its primary sequence, secondary structure, and tertiary constraints. | 56 which describes its primary sequence, secondary structure, and tertiary constraints. |
| 57 | 58 |
| 58 ----- | 59 ----- |
| 59 | 60 |
| 60 **Sequence database format** | 61 **Sequence database format** |
| 61 | 62 |
| 62 RNABOB is currently restricted to reading sequence files in FASTA format. | 63 RNABOB is currently restricted to reading sequence files in FASTA format. |
| 63 The command line version of RNABOB can also read sequence files in GCG, EMBL, GenBank and other formats. | 64 The command line version of RNABOB can also read sequence files in GCG, EMBL, GenBank and other formats. |
| 64 | 65 |
| 65 ----- | 66 ----- |
| 66 | 67 |
| 67 **Descriptor file syntax** | 68 **Descriptor file syntax** |
| 68 | 69 |
| 69 The descriptor file syntax is fairly powerful, and allows a great deal of freedom for specifying | 70 The descriptor file syntax is fairly powerful, and allows a great deal of freedom for specifying |
| 70 RNA motifs. The syntax is therefore a bit complicated. | 71 RNA motifs. The syntax is therefore a bit complicated. |
| 71 | 72 |
| 72 The descriptor file has two parts: a **topology** description and an **explicit** description. | 73 The descriptor file has two parts: a **topology** description and an **explicit** description. |
| 73 | 74 |
| 74 The first non-blank, non-comment line of the file is the topology description. It defines the | 75 The first non-blank, non-comment line of the file is the topology description. It defines the |
| 75 order of occurrence of a series of single-stranded, double-stranded and related elements. Each | 76 order of occurrence of a series of single-stranded, double-stranded and related elements. Each |
| 76 element must be given a unique name (a number, typically) and must be prefixed with '**s**', | 77 element must be given a unique name (a number, typically) and must be prefixed with '**s**', |
| 77 '**h**', or '**r**', indicating single-strand, helical, or a relational element. Helical and | 78 '**h**', or '**r**', indicating single-strand, helical, or a relational element. Helical and |
| 78 relational elements are paired to other elements, which are suffixed by a prime, **\'**. | 79 relational elements are paired to other elements, which are suffixed by a prime, **\'**. |
| 79 | 80 |
| 80 For example:: | 81 For example:: |
| 81 | 82 |
| 82 \ | 83 \ |
| 83 h1 s1 h1' | 84 h1 s1 h1' |
| 84 | 85 |
| 85 describes a hairpin loop structure with a simple helix and single-stranded loop. If the helix | 86 describes a hairpin loop structure with a simple helix and single-stranded loop. If the helix |
| 86 always contained a non-canonical base pair at one position, the topology could be described as:: | 87 always contained a non-canonical base pair at one position, the topology could be described as:: |
| 87 | 88 |
| 88 \ | 89 \ |
| 89 h1 r1 h2 s1 h2' r1' h1' | 90 h1 r1 h2 s1 h2' r1' h1' |
| 90 | 91 |
| 91 where r1,r1' indicate a correlation, where the sequence r1 constrains the sequence of r1'. | 92 where r1,r1' indicate a correlation, where the sequence r1 constrains the sequence of r1'. |
| 92 (Helices are a special case of this.) | 93 (Helices are a special case of this.) |
| 93 | 94 |
| 94 The remaining non-comment, non-blank lines are explicit descriptions of each element in turn. Each | 95 The remaining non-comment, non-blank lines are explicit descriptions of each element in turn. Each |
| 95 line contains 3 or 4 fields, separated by tabs or blank space. The first field is the name of the | 96 line contains 3 or 4 fields, separated by tabs or blank space. The first field is the name of the |
| 96 element, from the topology description. The second field is the number of mismatches allowed in | 97 element, from the topology description. The second field is the number of mismatches allowed in |
| 97 this element. The third field is the primary sequence constraint to apply to this element. | 98 this element. The third field is the primary sequence constraint to apply to this element. |
| 98 | 99 |
| 99 Helices and relational element pairs are specified on a single line rather than two. Mismatches | 100 Helices and relational element pairs are specified on a single line rather than two. Mismatches |
| 100 and primary sequence constraints are given as pairs, separated by a colon '**:**'. The left side | 101 and primary sequence constraints are given as pairs, separated by a colon '**:**'. The left side |
| 101 is the constraint applied to the upstream element, and the right side is applied to the downstream | 102 is the constraint applied to the upstream element, and the right side is applied to the downstream |
| 102 elements. | 103 elements. |
| 103 | 104 |
| 104 The primary sequence constraint is given as a sequence of nucleotides. Any IUPAC single-letter | 105 The primary sequence constraint is given as a sequence of nucleotides. Any IUPAC single-letter |
| 105 code is recognized, including N if the position can have any base identity. Allowed length | 106 code is recognized, including N if the position can have any base identity. Allowed length |
| 106 variations are specified with asterisks ``'*'``, where each ``*`` will allow either 0 or 1 N at | 107 variations are specified with asterisks ``'*'``, where each ``*`` will allow either 0 or 1 N at |
| 107 that position. | 108 that position. |
| 108 | 109 |
| 109 For example:: | 110 For example:: |
| 110 | 111 |
| 111 \ | 112 \ |
| 112 GGAGG******NNNAUG | 113 GGAGG******NNNAUG |
| 113 | 114 |
| 114 specifies a GGAGG Shine/Dalgarno site and an AUG initiation codon, separated by a spacer of 3 to 9 | 115 specifies a GGAGG Shine/Dalgarno site and an AUG initiation codon, separated by a spacer of 3 to 9 |
| 115 nucleotides of any sequence. | 116 nucleotides of any sequence. |
| 116 | 117 |
| 117 An alternative syntax can be used for very long gaps:: | 118 An alternative syntax can be used for very long gaps:: |
| 118 | 119 |
| 119 \ | 120 \ |
| 120 GGAGG[10]NNNAUG is the same as GGAGG**********NNNAUG | 121 GGAGG[10]NNNAUG is the same as GGAGG**********NNNAUG |
| 121 | 122 |
| 122 Be careful defining variable length helices and relational elements; if the number and type (gap | 123 Be careful defining variable length helices and relational elements; if the number and type (gap |
| 123 or identity) of position do not match on left and right sides, the program will refuse to accept | 124 or identity) of position do not match on left and right sides, the program will refuse to accept |
| 124 the descriptor. | 125 the descriptor. |
| 125 | 126 |
| 126 Relational elements have an additional field which specifies a "transformation matrix" of four | 127 Relational elements have an additional field which specifies a "transformation matrix" of four |
| 127 nucleotides, specifying the rule for making the ``r'`` pattern from the ``r`` sequence in order | 128 nucleotides, specifying the rule for making the ``r'`` pattern from the ``r`` sequence in order |
| 128 ``A-C-G-T``. For example, the transformation matrix for a simple helix is ``TGCA``; if you allow | 129 ``A-C-G-T``. For example, the transformation matrix for a simple helix is ``TGCA``; if you allow |
| 129 ``G-U`` pairs, it is ``TGYR``. RNABOB allows ``G-U`` pairing by default and uses the ``TGYR`` | 130 ``G-U`` pairs, it is ``TGYR``. RNABOB allows ``G-U`` pairing by default and uses the ``TGYR`` |
| 130 matrix for helical elements. | 131 matrix for helical elements. |
| 131 | 132 |
| 132 For example, the explicit description of our hairpin might be: | 133 For example, the explicit description of our hairpin might be: |
| 133 | 134 |
| 134 :: | 135 :: |
| 135 | 136 |
| 136 \ | 137 \ |
| 137 h1 0:0 NNN:NNN | 138 h1 0:0 NNN:NNN |
| 138 r1 0:0 R:N GNAN | 139 r1 0:0 R:N GNAN |
| 139 h2 0:0 **NC:GN** | 140 h2 0:0 **NC:GN** |
| 140 s1 0 UUCG | 141 s1 0 UUCG |
| 141 | 142 |
| 142 This describes a stem of 6 to 8 base pairs, in which the 4th pair from the bottom of the stem must | 143 This describes a stem of 6 to 8 base pairs, in which the 4th pair from the bottom of the stem must |
| 143 be a non-canonical GA pair. Note that, in general, the left side of the primary constraint for | 144 be a non-canonical GA pair. Note that, in general, the left side of the primary constraint for |
| 144 helices and relational elements is redundant, and should be given as all N. In some cases it is | 145 helices and relational elements is redundant, and should be given as all N. In some cases it is |
| 145 convenient to constrain the right side to require a particular base pair (GU, for instance) at one | 146 convenient to constrain the right side to require a particular base pair (GU, for instance) at one |
| 146 position. | 147 position. |
| 147 | 148 |
| 148 A note on mismatches: The split format for helices and relational elements works like this. The | 149 A note on mismatches: The split format for helices and relational elements works like this. The |
| 149 number on the left constrains the primary sequence match of the left side of the primary | 150 number on the left constrains the primary sequence match of the left side of the primary |
| 150 constraint. The number on the right constrains the match of the right side of the primary | 151 constraint. The number on the right constrains the match of the right side of the primary |
| 151 constraint, *after* that side has been constructed according to the sequence on the left. In other | 152 constraint, *after* that side has been constructed according to the sequence on the left. In other |
| 152 words, the number on the left constrains the mismatches in primary sequence only, while the number | 153 words, the number on the left constrains the mismatches in primary sequence only, while the number |
| 153 on the right will constrain the number of mispaired positions in the helix. | 154 on the right will constrain the number of mispaired positions in the helix. |
| 154 | 155 |
| 155 Finally: any line that begins with a pound sign '#' is a comment line, and will not be interpreted | 156 Finally: any line that begins with a pound sign '#' is a comment line, and will not be interpreted |
| 156 by the pattern compiler. | 157 by the pattern compiler. |
| 157 | 158 |
| 158 **Options** | 159 **Options** |
| 159 | 160 |
| 160 The behavior of RNABOB can be modified by use of the following options: | 161 The behavior of RNABOB can be modified by use of the following options: |
| 161 | 162 |
| 162 *Complement*: Selecting this option will cause RNABOB to search for the pattern also on the | 163 *Complement*: Selecting this option will cause RNABOB to search for the pattern also on the |
| 163 complementary strands. | 164 complementary strands. |
| 164 | 165 |
| 165 *Skip*: This is a workaround to avoid a problem in the DNABANK. There are some sequences in the | 166 *Skip*: This is a workaround to avoid a problem in the DNABANK. There are some sequences in the |
| 166 database which have long stretches of ambiguous sequence (N's). Descriptors with no primary | 167 database which have long stretches of ambiguous sequence (N's). Descriptors with no primary |
| 167 sequence constraints will match these garbage sequences at many, many positions, and generate huge | 168 sequence constraints will match these garbage sequences at many, many positions, and generate huge |
| 168 outputs. This option toggles a search strategy that skips forward a pattern-length rather than a | 169 outputs. This option toggles a search strategy that skips forward a pattern-length rather than a |
| 169 single base when a match is found, thus printing out only a single match when overlapping matches | 170 single base when a match is found, thus printing out only a single match when overlapping matches |
| 170 are found. | 171 are found. |
| 171 | 172 |
| 172 **Examples** | 173 **Examples** |
| 173 | 174 |
| 174 The following example descriptors are included in the source distribution | 175 The following example descriptors are included in the source distribution |
| 175 (http://selab.janelia.org/software/rnabob/rnabob.tar.gz): | 176 (http://selab.janelia.org/software/rnabob/rnabob.tar.gz): |
| 176 | 177 |
| 177 - trna.des - a general descriptor of a tRNA structure | 178 - trna.des - a general descriptor of a tRNA structure |
| 178 - r17.des - descriptor of the consensus binding site for the r17 phage coat protein | 179 - r17.des - descriptor of the consensus binding site for the r17 phage coat protein |
| 179 - pseudoknot.des - description of a simple pseudoknotted structure | 180 - pseudoknot.des - description of a simple pseudoknotted structure |
| 180 | 181 |
| 181 An example cosmid ``F22B7.fa`` from the *C. elegans* genome sequencing project is also provided | 182 An example cosmid ``F22B7.fa`` from the *C. elegans* genome sequencing project is also provided |
| 182 for running these descriptors against. | 183 for running these descriptors against. |
| 183 | 184 |
| 184 :: | 185 :: |
| 185 | 186 |
| 186 \ | 187 \ |
| 187 # trna.des | 188 # trna.des |
| 188 # | 189 # |
| 189 # Generalized descriptor of a tRNA cloverleaf. Doesn't | 190 # Generalized descriptor of a tRNA cloverleaf. Doesn't |
| 190 # find them all though. | 191 # find them all though. |
| 191 # | 192 # |
| 192 | 193 |
| 193 h1 s1 h2 s2 h2' s3 h3 s4 h3' s5 h4 s6 h4' h1' s8 | 194 h1 s1 h2 s2 h2' s3 h3 s4 h3' s5 h4 s6 h4' h1' s8 |
| 194 | 195 |
| 195 h1 0:2 NNNNNNN:NNNNNNN | 196 h1 0:2 NNNNNNN:NNNNNNN |
| 196 h2 0:1 *NNN:NNN* | 197 h2 0:1 *NNN:NNN* |
| 197 h3 0:1 NNNNN:NNNNN | 198 h3 0:1 NNNNN:NNNNN |
| 198 h4 0:1 NNNNN:NNNNN | 199 h4 0:1 NNNNN:NNNNN |
| 199 s1 0 TN | 200 s1 0 TN |
| 200 s2 0 NNNN********** | 201 s2 0 NNNN********** |
| 201 s3 0 N | 202 s3 0 N |
| 202 s4 0 NNNNNN* | 203 s4 0 NNNNNN* |
| 203 s5 0 NN******************** | 204 s5 0 NN******************** |
| 204 s6 0 TTC**** | 205 s6 0 TTC**** |
| 205 s8 0 NCCA | 206 s8 0 NCCA |
| 206 | 207 |
| 207 Running RNABOB with ``trna.des`` against ``F22B7.fa`` searches the top strand of the cosmid for | 208 Running RNABOB with ``trna.des`` against ``F22B7.fa`` searches the top strand of the cosmid for |
| 208 the above motif. ``trna.des`` hits twice, once on each strand. (F22B7 has several other tRNA genes | 209 the above motif. ``trna.des`` hits twice, once on each strand. (F22B7 has several other tRNA genes |
| 209 in it which the pattern fails to detect - this is *not* a pattern to use for tRNA genefinding!). | 210 in it which the pattern fails to detect - this is *not* a pattern to use for tRNA genefinding!). |
| 210 </help> | 211 ]]> |
| | 212 </help> |
| 211 <citations> | 213 <citations> |
| 212 <citation type="doi">10.1093/bioinformatics/6.4.325</citation> | 214 <citation type="doi">10.1093/bioinformatics/6.4.325</citation> |
| 213 <citation type="bibtex">@UNPUBLISHED{rnabob, | 215 <citation type="bibtex">@UNPUBLISHED{rnabob, |
| 214 author = {Eddy S.R}, | 216 author = {Eddy S.R}, |
| 215 title = {RNABOB: a program to search for RNA secondary structure motifs in sequence databases}, | 217 title = {RNABOB: a program to search for RNA secondary structure motifs in sequence databases}, |
| 216 note = {}}</citation> | 218 note = {}}</citation> |
| 217 </citations> | 219 </citations> |
| 218 </tool> | 220 </tool> |
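
The help text in this revision describes two gap notations for primary sequence constraints: each `*` allows 0 or 1 N, and `[n]` is shorthand for n asterisks. The following is a minimal Python sketch of that rule, not RNABOB's own code. The function names `expand_gaps` and `length_range` are hypothetical and were chosen for illustration only. The sketch handles one side of a constraint and does not split helix pairs on `:`.

```python
import re


def expand_gaps(constraint):
    """Rewrite every [n] gap as n asterisks, e.g. 'GG[3]A' -> 'GG***A'.

    Hypothetical helper: it only mirrors the equivalence stated in the
    help text and is not part of RNABOB.
    """
    return re.sub(r"\[(\d+)\]", lambda m: "*" * int(m.group(1)), constraint)


def length_range(constraint):
    """Return (min_len, max_len) of sequences the constraint can match.

    Each '*' contributes 0 or 1 nucleotide; every other character
    (an IUPAC code) contributes exactly 1.
    """
    expanded = expand_gaps(constraint)
    optional = expanded.count("*")
    fixed = len(expanded) - optional
    return fixed, fixed + optional


if __name__ == "__main__":
    # The spacer from the Shine/Dalgarno example: six '*' followed by NNN.
    print(length_range("******NNN"))
    # The bracket form expands to the asterisk form given in the help text.
    print(expand_gaps("GGAGG[10]NNNAUG"))
```

Applied to the spacer `******NNN` from the Shine/Dalgarno example, `length_range` gives the 3 to 9 nucleotide range quoted in the help text.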
