comparison lastz_d.xml @ 5:ec4affe27298 draft

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/lastz commit 25c49a61a5358cc7ab016fb5847328af7e67a24c"
author iuc
date Fri, 02 Apr 2021 17:18:09 +0000
parents 0acd9701676b
children ac0ffffa649e
4:0acd9701676b 5:ec4affe27298
19 '--infscores=${output}' 19 '--infscores=${output}'
20 ]]> 20 ]]>
21 </command> 21 </command>
22 <inputs> 22 <inputs>
23 <expand macro="target_input"/> 23 <expand macro="target_input"/>
24 <param name="query" format="fasta,fastq" type="data" label="Select QUERY sequence(s)" help="These are the sequences that you are aligning against TARGET"/> 24 <param name="query" format="fasta,fastq,fasta.gz,fastq.gz,fastq.bz2" type="data" label="Select QUERY sequence(s)" help="These are the sequences that you are aligning against TARGET"/>
25 <param name="score_file" type="data" format="txt" optional="true" label="Control file for inference" argument="--inferonly[=control_file]" help="Optional controf file. If nothing is selected, LASTZ_D uses default described in the manual"/> 25 <param name="score_file" type="data" format="txt" optional="true" label="Control file for inference" argument="--inferonly[=control_file]" help="Optional controf file. If nothing is selected, LASTZ_D uses default described in the manual"/>
26 </inputs> 26 </inputs>
27 <outputs> 27 <outputs>
28 <data format="txt" name="output" label="${tool.name} on ${on_string}: substituion matrix"/> 28 <data format="txt" name="output" label="${tool.name} on ${on_string}: substituion matrix"/>
29 </outputs> 29 </outputs>
45 45
46 <help><![CDATA[ 46 <help><![CDATA[
47 47
48 **What is does** 48 **What is does**
49 49
50 LASTZ_D is a non-integer (**D** stands for Double) version of LASTZ that can be used to estimate substitution matrix that will be used to score alignments. It was developed by `Bob Harris <http://www.bx.psu.edu/~rsharris/>`_ in the lab of Webb Miller at Penn State as a part of LASTZ. Matrix computed by this tool is to be used by LASTZ (see below). 50 LASTZ_D is a non-integer (**D** stands for Double) version of LASTZ that can be used to estimate substitution matrix that
51 will be used to score alignments. It was developed by `Bob Harris <http://www.bx.psu.edu/~rsharris/>`_ in the lab of
52 Webb Miller at Penn State as a part of LASTZ. Matrix computed by this tool is to be used by LASTZ (see below).
51 53
52 .. class:: warningmark 54 .. class:: warningmark
53 55
54 **Read documentation** before proceeding. LASTZ is a complex tool with many parameter options. Fortunately, there is a `great manual <https://lastz.github.io/lastz/>`_ maintained by its author. The two sections that are particularly relevant to the inference of substitution matrix are `Inferring Score Sets <http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.00.html#adv_inference>`_ and `Inference Control File <http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.00.html#fmt_inference>`_. 56 **Read documentation** before proceeding. LASTZ is a complex tool with many parameter options. Fortunately, there is
57 a `great manual <https://lastz.github.io/lastz/>`_ maintained by its author. The two sections that are particularly
58 relevant to the inference of substitution matrix are
59 `Inferring Score Sets <https://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.03.html#adv_inference>`_ and
60 `Inference Control File <https://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.03.html#fmt_inference>`_.
55 61
56 **Notes on the inference** 62 **Notes on the inference**
57 63
58 Inference is achieved by computing the probability of each of the 18 different alignment events (gap open, gap extend, and 16 substitutions). These probabilities are estimated from alignments of the sequences. Of course, at first we don't have alignments, so the process begins by using a generic scoring set to create alignments, infer scores from those, then realign, and so on, until the scores stabilize or "converge". Ungapped alignments are performed until the substitution scores converge, then gapped alignments are performed (holding the substitution scores constant) until the gap penalties converge. In the end you get a matrix like this:: 64 Inference is achieved by computing the probability of each of the 18 different alignment events (gap open, gap extend, and 16 substitutions).
65 These probabilities are estimated from alignments of the sequences. Of course, at first we don't have alignments, so the process
66 begins by using a generic scoring set to create alignments, infer scores from those, then realign, and so on, until the scores stabilize
67 or "converge". Ungapped alignments are performed until the substitution scores converge, then gapped alignments are performed
68 (holding the substitution scores constant) until the gap penalties converge. In the end you get a matrix like this::
59 69
60 # (a LASTZ scoring set, created by "LASTZ --infer") 70 # (a LASTZ scoring set, created by "LASTZ --infer")
61 71
62 bad_score = X:-1781 # used for sub[X][*] and sub[*][X] 72 bad_score = X:-1781 # used for sub[X][*] and sub[*][X]
63 fill_score = -178 # used when sub[*][*] not otherwise defined 73 fill_score = -178 # used when sub[*][*] not otherwise defined