comparison flye.xml @ 6:b94de04ca7c2 draft
planemo upload for repository https://github.com/quadram-institute-bioscience/galaxy-tools/tree/master/tools/flye commit e5796f490952b36c7f1360351be90ec0bb60de55-dirty
| author | thanhlv |
|---|---|
| date | Tue, 17 Sep 2019 11:08:59 -0400 |
| parents | af87635c6888 |
| children | 36721dedba06 |
| 5:693c8773a40c | 6:b94de04ca7c2 |
|---|---|
| 4 <import>macros.xml</import> | 4 <import>macros.xml</import> |
| 5 </macros> | 5 </macros> |
| 6 <expand macro="requirements" /> | 6 <expand macro="requirements" /> |
| 7 <version_command>flye --version</version_command> | 7 <version_command>flye --version</version_command> |
| 8 <command detect_errors="exit_code"> | 8 <command detect_errors="exit_code"> |
| 9 <![CDATA[ | 9 <![CDATA[ |
| 10 | 10 |
| 11 #for $counter, $input in enumerate($inputs): | 11 #for $counter, $input in enumerate($inputs): |
| 12 | 12 |
| 13 #if $input.is_of_type('fastqsanger', 'fastq'): | 13 #if $input.is_of_type('fastqsanger', 'fastq'): |
| 14 #set $ext = 'fastq' | 14 #set $ext = 'fastq' |
| 46 #end if | 46 #end if |
| 47 #if $no_trestle: | 47 #if $no_trestle: |
| 48 '$no_trestle' | 48 '$no_trestle' |
| 49 #end if | 49 #end if |
| 50 2>&1 | 50 2>&1 |
| 51 ]]></command> | 51 ]]> </command> |
| 52 <inputs> | 52 <inputs> |
| 53 <param name="inputs" type="data" format="fasta,fasta.gz,fastq,fastq.gz,fastqsanger.gz,fastqsanger" multiple="true" label="Input reads" /> | 53 <param name="inputs" type="data" format="fasta,fasta.gz,fastq,fastq.gz,fastqsanger.gz,fastqsanger" multiple="true" label="Input reads" > |
| 54 <help><![CDATA[ | |
| 55 | |
| 56 Input reads could be in FASTA or FASTQ format, uncompressed | |
| 57 or compressed with gz. Currently, raw and corrected reads | |
| 58 from PacBio and ONT are supported. The expected error rates are | |
| 59 <30% for raw and <2% for corrected reads. Additionally, | |
| 60 --subassemblies option performs a consensus assembly of multiple | |
| 61 sets of high-quality contigs. You may specify multiple | |
| 62 files with reads (separated by spaces). Mixing different read | |
| 63 types is not yet supported. | |
| 64 ]]> </help> | |
| 65 </param> | |
| 54 <param name="mode" type="select" label="Mode"> | 66 <param name="mode" type="select" label="Mode"> |
| 55 <option value="--nano-raw">Nanopore raw</option> | 67 <option value="--nano-raw">Nanopore raw</option> |
| 56 <option value="--nano-corr">Nanopore corrected</option> | 68 <option value="--nano-corr">Nanopore corrected</option> |
| 57 <option value="--pacbio-raw">PacBio raw</option> | 69 <option value="--pacbio-raw">PacBio raw</option> |
| 58 <option value="--pacbio-corr">PacBio corrected</option> | 70 <option value="--pacbio-corr">PacBio corrected</option> |
| 59 <option value="--subassemblies">high-quality contig-like input</option> | 71 <option value="--subassemblies">high-quality contig-like input</option> |
| 60 </param> | 72 </param> |
| 61 <param argument="-g" type="text" label="estimated genome size (for example, 5m or 2.6g)"> | 73 <param argument="-g" type="text" label="estimated genome size (for example, 5m or 2.6g)"> |
| 74 <help> | |
| 75 <![CDATA[ | |
| 76 <span>The genome size estimate is used for solid k-mer selection in the | |
| 77 initial disjointig assembly stage. <b>Flye is not very sensitive to this | |
| 78 parameter, and the estimate could be rough</b>. It is ok if the parameter is | |
| 79 within 0.5x-2x of the actual genome size. If the final assembly size is | |
| 80 very different from the initial guess, consider re-running the pipeline | |
| 81 with an updated estimate for better results.</span> | |
| 82 <br> | |
| 83 <span>An alternative option is to run Flye in <b>--meta</b> mode, which uses a different | |
| 84 approach for solid k-mer selection. This mode is almost independent from the | |
| 85 genome size parameter (you still need to provide an estimate for the selection | |
| 86 of some other parameters). When assembly is completed, you can re-run in the | |
| 87 normal mode with the inferred genome size.</span> | |
| 88 ]]> | |
| 89 </help> | |
| 62 <validator type="regex" message="Genome size must be a float or integer, optionally followed by a unit prefix (kmg)">^([0-9]*[.])?[0-9]+[kmg]?$</validator> | 90 <validator type="regex" message="Genome size must be a float or integer, optionally followed by a unit prefix (kmg)">^([0-9]*[.])?[0-9]+[kmg]?$</validator> |
| 63 </param> | 91 </param> |
| 64 <param argument="-i" type="integer" value="1" label="number of polishing iterations" /> | 92 <param argument="-i" type="integer" value="1" label="number of polishing iterations" /> |
| 65 <param argument="-m" type="integer" optional="true" label="minimum overlap between reads (default: auto)" /> | 93 <param argument="-m" type="integer" optional="true" label="minimum overlap between reads (default: auto)" help="This sets a minimum overlap length for two reads to be considered overlapping. In the latest Flye versions, this parameter is chosen automatically based on the read length distribution (reads N90) and does not require manual setting. Typical value is 3k-5k (and down to 1k for datasets with shorter read length). Intuitively, we want to set this parameter as high as possible, so the repeat graph is less tangled. However, higher values might lead to assembly gaps. In some rare cases (for example in case of biased read length distribution) it makes sense to set this parameter manually."/> |
| 66 <param argument="--asm_coverage" type="integer" optional="true" label="reduced coverage for initial contig assembly (default: not set)" /> | 94 <param argument="--asm_coverage" type="integer" optional="true" label="reduced coverage for initial contig assembly (default: not set)" /> |
| 67 <param argument="--plasmid" type="boolean" truevalue="--plasmid" falsevalue="" checked="False" label="rescue short unassembled plasmids" /> | 95 <param argument="--plasmid" type="boolean" truevalue="--plasmid" falsevalue="" checked="False" label="rescue short unassembled plasmids" /> |
| 68 <param argument="--meta" type="boolean" truevalue="--meta" falsevalue="" checked="False" label="metagenome / uneven coverage mode" /> | 96 <param argument="--meta" type="boolean" truevalue="--meta" falsevalue="" checked="False" label="metagenome / uneven coverage mode" /> |
| 69 <param argument="--no_trestle" type="boolean" truevalue="--no-trestle" falsevalue="" checked="False" label="skip Trestle stage" /> | 97 <param argument="--no_trestle" type="boolean" truevalue="--no-trestle" falsevalue="" checked="False" label="skip Trestle stage" help="After resolving bridged repeats, Trestle module attempts to resolve simple unbridged repeats (of multiplicity 2) using the heterogeneities between repeat copies"/> |
| 70 </inputs> | 98 </inputs> |
| 71 <outputs> | 99 <outputs> |
| 72 <data name="scaffolds" format="fasta" from_work_dir="out_dir/scaffolds.fasta" label="${tool.name} on ${on_string} (scaffolds)"/> | 100 <data name="scaffolds" format="fasta" from_work_dir="out_dir/scaffolds.fasta" label="${tool.name} on ${on_string} (scaffolds)"/> |
| 73 <data name="assembly_info" format="tabular" from_work_dir="out_dir/assembly_info.txt" label="${tool.name} on ${on_string} (assembly_info)"/> | 101 <data name="assembly_info" format="tabular" from_work_dir="out_dir/assembly_info.txt" label="${tool.name} on ${on_string} (assembly_info)"/> |
| 74 <data name="assembly_graph" format="graph_dot" from_work_dir="out_dir/assembly_graph.gv" label="${tool.name} on ${on_string} (assembly_graph)"/> | 102 <data name="assembly_graph" format="graph_dot" from_work_dir="out_dir/assembly_graph.gv" label="${tool.name} on ${on_string} (assembly_graph)"/> |
| 101 <param name="i" value="2"/> | 129 <param name="i" value="2"/> |
| 102 <output name="scaffolds" file="result3_scaffolds.fasta" ftype="fasta" compare="sim_size"/> | 130 <output name="scaffolds" file="result3_scaffolds.fasta" ftype="fasta" compare="sim_size"/> |
| 103 <output name="assembly_gfa" file="result2_assembly_graph.gfa" ftype="txt" compare="sim_size"/> | 131 <output name="assembly_gfa" file="result2_assembly_graph.gfa" ftype="txt" compare="sim_size"/> |
| 104 </test> | 132 </test> |
| 105 </tests> | 133 </tests> |
| 106 <help><![CDATA[ | 134 <help> |
| 135 <![CDATA[ | |
| 136 Flye output | |
| 137 The main output files are: | |
| 107 | 138 |
| 108 Input reads could be in FASTA or FASTQ format, uncompressed | 139 - **assembly.fasta** - Final assembly. Contains contigs and possibly scaffolds (see below). |
| 109 or compressed with gz. Currently, raw and corrected reads | |
| 110 from PacBio and ONT are supported. The expected error rates are | |
| 111 <30% for raw and <2% for corrected reads. Additionally, | |
| 112 --subassemblies option performs a consensus assembly of multiple | |
| 113 sets of high-quality contigs. You may specify multiple | |
| 114 files with reads (separated by spaces). Mixing different read | |
| 115 types is not yet supported. | |
| 116 | 140 |
| 117 You must provide an estimate of the genome size as input, | 141 - **assembly_graph.{gfa|gv}** - Final repeat graph. Note that the edge sequences might be different (shorter) than contig sequences, because contigs might include multiple graph edges (see below). |
| 118 which is used for solid k-mers selection. The estimate could | 142 |
| 119 be rough (e.g. within 0.5x-2x range) and does not affect | 143 - **assembly_info.txt** - Extra information about contigs (such as length or coverage). |
| 120 the other assembly stages. Standard size modifiers are | 144 |
| 121 supported (e.g. 5m or 2.6g). | 145 Each contig is formed by a single unique graph edge. If possible, unique contigs are extended with the sequence from flanking unresolved repeats on the graph. Thus, a contig fully contains the corresponding graph edge (with the same id), but might be longer than this edge. This is somewhat similar to unitig-contig relation in OLC assemblers. In a rare case when a repetitive graph edge is not covered by the set of "extended" contigs, it will also be output in the assembly file. |
| 122 | 146 |
| 123 ]]></help> | 147 Sometimes it is possible to further order contigs into scaffolds based on the repeat graph structure. These ordered contigs will be output as part of a scaffold in the assembly file (with a scaffold\_ prefix). Since it is hard to give a reliable estimate of the gap size, those gaps are represented with the default 100 Ns. The assembly_info.txt file (below) contains additional information about how scaffolds were formed. |
| 148 | |
| 149 Extra information about contigs/scaffolds is output into the assembly_info.txt file. It is a tab-delimited table with the columns as follows: | |
| 150 | |
| 151 - Contig/scaffold id | |
| 152 | |
| 153 - Length | |
| 154 | |
| 155 - Coverage | |
| 156 | |
| 157 - Is circular (representing circular sequence, such as bacterial chromosome or plasmid) | |
| 158 | |
| 159 - Is repetitive (represents repeated, rather than unique sequence) | |
| 160 | |
| 161 - Multiplicity (inferred multiplicity based on coverage) | |
| 162 | |
| 163 - Graph path (repeat graph path corresponding to this contig/scaffold). Scaffold gaps are marked with ?? symbols, and * symbol denotes a terminal graph node. | |
| 164 | |
| 165 scaffolds.fasta file is a symlink to assembly.fasta, which is retained for backward compatibility. | |
| 166 ]]> | |
| 167 </help> | |
| 124 <expand macro="citations" /> | 168 <expand macro="citations" /> |
| 125 </tool> | 169 </tool> |
