<tool id="bacteria_tradis" name="Bio-TraDis reads to counts" version="@TOOL_VERSION@+galaxy@VERSION@">
    <description></description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="aggressive">
        <![CDATA[
            cp '${input_ref}' reference.fa &&
            ls '${input_fastq}' > file.txt &&
            bacteria_tradis -v -f file.txt -r reference.fa
                #if str($map_parameters.map_options) == "modify":

                    #if str($map_parameters.set_kmers_options.set) == "yes":
                       --smalt_k  '$map_parameters.set_kmers_options.kmer_length'
                       --smalt_s  '$map_parameters.set_kmers_options.step_size'
                    #end if

                    --smalt_y '$map_parameters.min_percentage'
                    --smalt_r '$map_parameters.duplicate_reads'
                    -m '$map_parameters.min_quality'

                #end if

                #if str($tranposon_tag.use) == "yes":
                    -mm '$tranposon_tag.nb_mismatches'
                    -t '$tranposon_tag.sequence'
                    -td '$tranposon_tag.tagdir'
                #end if
            2>&1
        ]]>
    </command>

    <inputs>
        <param name="input_fastq" type="data" format="fastq" label="Fastq file containing TraDis reads"/>
        <param name="input_ref" type="data" format="fasta" label="Fasta File of the reference Genome"/>

        <conditional name="map_parameters">
                <param name="map_options" type="select" label="Mapping Parameters" help="By default, the bacteria_tradis pipeline determines appropriate read mapping parameters automatically from the length of the first read in the fastq file. These defaults have been tested on data produced with the TraDIS protocol of Barquist et al.">
                <option value="default" selected="true">Use Default Parameters</option>
                <option value="modify">Set Mapping parameters</option>

            </param>
            <when value="modify">

                <conditional name="set_kmers_options">
                    <param name="set" type="boolean" label="Modify kmer parameters" truevalue='yes' falsevalue='no' />
                    <when value="yes">
                        <param name="kmer_length" type="integer" value="" min="9" max="20" label="Length of kmers hashed (--smalt_k)" help="The minimum length of an exact match between a read and the genome needed to trigger an alignment attempt. Appropriate values are between ~10 and 20 for bacterial genomes depending on read length. Lower values lead to increased sensitivity at the expense of runtime." />
                        <param name="step_size" type="integer" value="" min="1" max="15" label="Step size for smalt kmers (--smalt_s)" help="Distance between the start of hashed kmers. Appropriate values are between 1 and ~15, but should be less than --smalt_k to ensure kmers overlap. Lower values lead to increased sensitivity at the expense of runtime." />
                    </when>
                    <when value="no">
                    </when>
                </conditional>

                <param name="min_percentage" type="float" value="0.96" min="0" max="1" label="Minimum percentage of identical bases between read and reference (--smalt_y)" help="May be lowered to improve sensitivity in the case of low quality or short reads." />
                <param name="duplicate_reads" type="boolean" truevalue="0" falsevalue="-1" label="Randomly assign a position to reads that align in multiple locations (--smalt_r)" help="If not set, reads that map to multiple positions are left unmapped." />
                <param name="min_quality" type="integer" value="30" label="Minimum mapping quality score (-m) " help="Multi-mapping reads have a quality score of 0 by definition, so this parameter needs to be set to 0 for these reads to be properly processed. Can be lowered without dramatically affecting results in most cases, particularly if --smalt_y is set reasonably." />

            </when>
            <when value="default">
            </when>
        </conditional>

        <conditional name="tranposon_tag">
            <param name="use" type="boolean" truevalue="yes" falsevalue="no" label="Search for a transposon tag" help="Use with data containing a transposon tag attached to the reads. Only reads containing the transposon tag will be processed, and the tag will be removed before mapping." />

            <when value="yes">

                <param name="sequence" type="text" value="" label="Transposon tag sequence (-t)" help="" />
                <param name="nb_mismatches" type="integer" value="2" min="0" label="Number of mismatches allowed when matching the transposon tag (-mm)" help="If there is evidence for low-quality bases in the transposon tag (from FastQC, for instance), setting this to 1 or 2 may result in higher recovery of insertion sites. Higher than 2 is not advisable with the typical transposon tag lengths (10 - 12 bases) produced by TraDIS protocols, but may be appropriate with protocols that produce significantly longer transposon tags." />
                <param name="tagdir" type="select" label="Direction of the transposon tag" help="" >
                    <option value="3" selected="true">3'</option>
                    <option value="5">5'</option>
                </param>

            </when>
            <when value="no">
            </when>
        </conditional>


    </inputs>

    <outputs>
        <data format="txt" name="Statistics" from_work_dir="file.stats" label="${tool.name} on ${on_string} : Statistics" />
        <data name="Counts" format="tabular.gz" from_work_dir="./*.gz" label="${tool.name} on ${on_string} : Counts"  />
        <data name="Aligned_reads" format="bam" from_work_dir="./*.bam" label="${tool.name} on ${on_string} : Mapped Reads"  />
    </outputs>

    <tests>
        <test>
            <param name="input_fastq" ftype="fastq" value="tiny.fastq.gz"/>
            <param name="input_ref" ftype="fasta" value="tiny_ref.fasta"/>
            <param name="map_options" value="default"/>
            <param name="min_quality" ftype="float" value="0"/>
            <param name="use" value="no"/>
            <param name="set" ftype="select" value="no"/>
            <output name="Statistics" file="file.stats" lines_diff="2"  />
            <output name="Counts" file="tiny.out.gz.CP009273.1_60_120.insert_site_plot.gz" compare="diff" decompress="true" lines_diff="0" />
        </test>
        <test>
            <param name="input_fastq" ftype="fastq" value="tiny.fastq.gz"/>
            <param name="input_ref" ftype="fasta" value="tiny_ref.fasta"/>
            <param name="min_quality" ftype="integer" value="0"/>
            <param name="map_options" value="modify"/>
            <param name="min_percentage" ftype="float" value="0.5"/>
            <param name="duplicate_reads" ftype="boolean" value="-1"/>
            <param name="min_quality" ftype="float" value="20"/>
            <param name="use" value="no"/>
            <param name="set" ftype="select" value="yes"/>
            <param name="kmer_length" ftype="integer" value="10"/>
            <param name="step_size" ftype="integer" value="5"/>
            <output name="Statistics" file="file.stats" lines_diff="2"  />
            <output name="Counts" file="tiny_1.out.gz.CP009273.1_60_120.insert_site_plot.gz" compare="diff" decompress="true" lines_diff="0" />
        </test>

    </tests>
    <help>
<![CDATA[

**What it does**

Bio-TraDis provides software utilities for the processing, mapping, and analysis of transposon insertion sequencing data. The pipeline was designed with the data from the TraDIS sequencing protocol in mind, but should work with a variety of transposon insertion sequencing protocols as long as they produce data in the expected format.

-----

**Parameters**

Setting *--smalt_r 0* and *-m 0* maps reads with multiple best mappings to a random position and keeps them in downstream analyses; by default these reads are left unmapped. Mapping and processing a library can take about 30 minutes to an hour on a typical desktop computer.

By default, the bacteria_tradis pipeline determines appropriate read mapping parameters automatically from the length of the first read in the fastq file. The default parameters have been tested using the optimized TraDIS protocol of Barquist et al., 20XX in the hands of an experienced sequencing specialist; they will need to be tuned for other protocols, pilot runs, and similar cases. It is also appropriate to reduce the stringency of these parameters when read trimming has been applied, when there are quality issues in the library, for certain types of studies (particularly gene essentiality studies), or when the reference genome is of low quality or from a different strain.


The *-mm* option specifies the number of mismatches allowed when matching the transposon tag; by default none are allowed. We sometimes observe one or two positions within the transposon tag that seem to have generally low quality. If there is evidence for low-quality bases in the transposon tag (from FastQC, for instance), setting this to 1 or 2 may result in higher recovery of insertion sites. Higher than 2 is not advisable with the typical transposon tag lengths (10 - 12 bases) produced by TraDIS protocols, but may be appropriate with protocols that produce significantly longer transposon tags.


The *-m* option sets the minimum mapping quality score required for an alignment to be used in downstream analysis (e.g. plot files); it defaults to 30. Multi-mapping reads have a quality score of 0 by definition, so this parameter needs to be set to 0 for these reads to be properly processed. It can be lowered without dramatically affecting results in most cases, particularly if *--smalt_y* is set reasonably.


The other options specify parameters for the smalt mapper, which are discussed in more detail in the smalt manual (ftp.sanger.ac.uk/pub/resources/software/smalt/smalt-manual-0.7.4.pdf). We will discuss their effects on TraDIS mapping briefly here:

*--smalt_k*: length of kmers hashed; roughly, the minimum length of an exact match between a read and the genome needed to trigger an alignment attempt. Appropriate values are between ~10 and 20 for bacterial genomes depending on read length. Lower values lead to increased sensitivity at the expense of runtime.

*--smalt_s*: skipstep. Sampling step size, i.e. the distance between successive words that are hashed along the genomic reference sequence. With the option -s 1 every word is hashed, with -s 2 every second word, with -s 3 every third, etc. Appropriate values are between 1 and ~15, but should be less than --smalt_k to ensure kmers overlap. Lower values lead to increased sensitivity at the expense of runtime.
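
As a rough illustration of the constraints above, the following shell sketch checks a pair of values before they are passed to *--smalt_k* and *--smalt_s*. The function name ``check_smalt_seed`` is hypothetical and is not part of Bio-TraDis; the ranges are the ones this tool form accepts (9 to 20 for the kmer length, 1 to 15 for the step size).

```shell
# Hypothetical helper (not part of Bio-TraDis): validate kmer length and step size.
# Usage: check_smalt_seed <kmer_length> <step_size>
check_smalt_seed() {
    kmer=$1
    step=$2
    if [ "$kmer" -lt 9 ] || [ "$kmer" -gt 20 ]; then
        echo "kmer length must be between 9 and 20"
        return 1
    fi
    if [ "$step" -lt 1 ] || [ "$step" -gt 15 ]; then
        echo "step size must be between 1 and 15"
        return 1
    fi
    # Hashed kmers only overlap when the step is shorter than the kmer.
    if [ "$step" -ge "$kmer" ]; then
        echo "step size must be smaller than the kmer length"
        return 1
    fi
    echo "ok"
}

check_smalt_seed 10 5
```

A kmer length of 10 with a step size of 5 passes all three checks, whereas a step size equal to or larger than the kmer length is rejected.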


*--smalt_y*: minimum percentage of identical bases between read and reference. Defaults to 0.96, i.e. 96% identity, or 4 mismatches allowed in a 100-base read. May be lowered to improve sensitivity in the case of low-quality or short reads.
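
The arithmetic behind the 100-base example can be sketched in shell. This assumes the threshold is read as a minimum number of identical bases, rounded up to a whole base; smalt's own rounding may differ, and the function name ``max_mismatches`` is hypothetical.

```shell
# Hypothetical helper: approximate number of mismatches tolerated for a read.
# Usage: max_mismatches <identity_fraction> <read_length>
max_mismatches() {
    awk -v y="$1" -v len="$2" 'BEGIN {
        need = y * len                 # identical bases required (may be fractional)
        matches = int(need)
        # Round up to a whole base, with a small tolerance for floating-point noise.
        if (matches < need - 1e-9) matches++
        print len - matches
    }'
}

max_mismatches 0.96 100
max_mismatches 0.96 50
```

Under this assumption, shorter reads tolerate proportionally fewer mismatches at the same threshold, which is one reason to lower the value for short reads.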


*--smalt_r*: specifies what to do with reads that map equally well in multiple locations. By default this is set to -1, meaning that multi-mapping reads are left unmapped. This is appropriate in studies comparing insertion frequency in the same library passaged through multiple conditions, as in this case a change in frequency of one repetitive gene could lead to many genes artifactually appearing to be selected. For studies of gene essentiality in a newly created library, this should be set to 0 (randomly assign a position) to avoid repetitive elements (particularly insertion sequences and the like) artificially appearing to be essential.
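
The two study types described above map onto option pairs, which the following shell sketch prints. The helper ``multimap_flags`` and its study-type names are hypothetical; the option values are the ones given in this help text.

```shell
# Hypothetical helper: print the multi-mapping options for a study type.
# Usage: multimap_flags essentiality|comparison
multimap_flags() {
    case "$1" in
        essentiality)
            # Randomly place multi-mapping reads and keep their quality-0 alignments.
            echo "--smalt_r 0 -m 0"
            ;;
        comparison)
            # Defaults: multi-mapping reads stay unmapped, quality cutoff of 30.
            echo "--smalt_r -1 -m 30"
            ;;
        *)
            echo "unknown study type: $1" >&2
            return 1
            ;;
    esac
}

multimap_flags essentiality
```

The printed options can be appended to a call such as ``bacteria_tradis -v -f file.txt -r reference.fa``, which is the command line this wrapper builds.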

-----

**Output files**

On completion, bacteria_tradis produces a number of files. These include:

**(input list name).stats**: Mapping statistics file. This is comma delimited, and includes one line for each library mapped along with a header. It can be easily opened in e.g. Excel or R.

**(library name.replicon_name).insert_site_plot.gz**: Plot files, one for each replicon and library. These contain insertion counts on each strand for every nucleotide position in the replicon. They can be opened as “user plots” in the Artemis genome browser, and will be used for further analysis.

**(library name).mapped.bam**: BAM file containing mapped reads.
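
A plot file can be summarised from the command line. The sketch below assumes each line holds two whitespace-separated counts (one per strand) for one nucleotide position; the function name ``summarise_plot`` and the example file are made up for illustration.

```shell
# Hypothetical helper: summarise a gzip-compressed insert site plot file.
# Assumes one line per position with two whitespace-separated strand counts.
summarise_plot() {
    gzip -dc "$1" | awk '{
        n = $1 + $2            # counts on both strands at this position
        total += n
        if (n > 0) sites++     # position carries at least one insertion
    } END {
        printf "positions=%d insertion_sites=%d total_count=%d\n", NR, sites, total
    }'
}

# Made-up example: three positions, two of them with insertions.
tmpdir=$(mktemp -d)
printf '0 0\n3 1\n0 2\n' | gzip -c > "$tmpdir/example.insert_site_plot.gz"
summarise_plot "$tmpdir/example.insert_site_plot.gz"
rm -r "$tmpdir"
```

For the made-up example this prints ``positions=3 insertion_sites=2 total_count=6``.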

-----

**More information**

.. class:: infomark

Additional information about Bio-TraDis can be found at https://github.com/sanger-pathogens/Bio-Tradis
]]>
    </help>

    <expand macro="citations" />

</tool>
author	iuc
date	Mon, 04 Jul 2022 07:28:50 +0000
parents	476d4cefec3a
children