# HG changeset patch # User iuc # Date 1511535328 18000 # Node ID cbc665adcde405bb7731a7f95b327831f4283ee8 # Parent c022e4a68b7640f94cdc0389b2fbb4ab7972ec55 planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/bwa commit c355891532cecaab6b3288a148a6b3bcb5973396 diff -r c022e4a68b76 -r cbc665adcde4 bwa-mem.xml --- a/bwa-mem.xml Tue Nov 21 11:23:45 2017 -0500 +++ b/bwa-mem.xml Fri Nov 24 09:55:28 2017 -0500 @@ -1,63 +1,65 @@ - - - map medium and long reads (> 100 bp) against reference genome - - read_group_macros.xml - bwa_macros.xml - - - - + + - map medium and long reads (> 100 bp) against reference genome + + read_group_macros.xml + bwa_macros.xml + + + + - + '${reference_fasta_filename}' + '${fastq_input.fastq_input1.forward}' '${fastq_input.fastq_input1.reverse}' +#else: + '${reference_fasta_filename}' + '${fastq_input.fastq_input1}' +#end if - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +| samtools sort -@\${GALAXY_SLOTS:-2} -O bam -o '$bam_output' +]]> + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - + - - - - - + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 70 nt) reads against large reference genomes. - -This Galaxy tool wraps bwa-mem module of bwa read mapping tool. Galaxy implementation takes fastq files as input and produces output in BAM (not SAM) format, which can be further processed using various BAM utilities exiting in Galaxy (BAMTools, SAMTools, Picard). +This Galaxy tool wraps bwa-mem module of bwa read mapping tool. The Galaxy implementation takes fastq files as input and produces output in BAM format, which can be further processed using various BAM utilities exiting in Galaxy (BAMTools, SAMTools, Picard). ----- @@ -314,9 +315,9 @@ Galaxy wrapper for BWA allows you select between precomputed and user-defined indices for reference genomes using **Will you select a reference genome from your history or use a built-in index?** flag. This flag has two options: - 1. **Use a built-in genome index** - when selected (this is default), Galaxy provides the user with **Select reference genome index** dropdown. Genomes listed in this dropdown have been pre-indexed with bwa index utility and are ready to be mapped against. + 1. **Use a built-in genome index** - when selected (this is default), Galaxy provides the user with **Select reference genome index** dropdown. Genomes listed in this dropdown have been pre-indexed with bwa index utility and are ready to be mapped against. 2. **Use a genome from the history and build index** - when selected, Galaxy provides the user with **Select reference genome sequence** dropdown. This dropdown is populated by all FASTA formatted files listed in your current history. If your genome of interest is uploaded into history it will be shown there. Selecting a genome from this dropdown will cause Galaxy to first transparently index it using `bwa index` command, and then run mapping with `bwa mem`. - + If your genome of interest is not listed here you have two choices: 1. Contact galaxy team using **Help->Support** link at the top of the interface and let us know that an index needs to be added @@ -328,74 +329,23 @@ Galaxy allows four levels of control over bwa-mem options provided by **Select analysis mode** menu option. These are: - 1. *Simple Illumina mode*: The simplest possible bwa mem application in which it alignes single or paired-end data to reference using default parameters. It is equivalent to the following command: bwa mem <reference index> <fastq dataset1> [fastq dataset2] - 2. *PacBio mode*: The mode adjusted specifically for mapping of long PacBio subreads. Equivalent to the following command: bwa mem -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0 <reference index> <PacBio dataset in fastq format> + 1. *Simple Illumina mode*: The simplest possible bwa mem application in which it alignes single or paired-end data to reference using default parameters. It is equivalent to the following command: bwa mem [fastq dataset2] + 2. *PacBio mode*: The mode adjusted specifically for mapping of long PacBio subreads. Equivalent to the following command: bwa mem -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0 3. *Full list of options*: Allows access to all options through Galaxy interface. ------- - -**BWA MEM options** - -Each Galaxy parameter widget corresponds to command line flags listed below: - -Algorithm options:: - - -k INT minimum seed length [19] - -w INT band width for banded alignment [100] - -d INT off-diagonal X-dropoff [100] - -r FLOAT look for internal seeds inside a seed longer than {-k} * FLOAT [1.5] - -y INT find MEMs longer than {-k} * {-r} with size less than INT [0] - -c INT skip seeds with more than INT occurrences [500] - -D FLOAT drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50] - -W INT discard a chain if seeded bases shorter than INT [0] - -m INT perform at most INT rounds of mate rescues for each read [50] - -S skip mate rescue - -P skip pairing; mate rescue performed unless -S also in use - -e discard full-length exact matches - -Scoring options:: - - -A INT score for a sequence match, which scales options -TdBOELU unless overridden [1] - -B INT penalty for a mismatch [4] - -O INT[,INT] gap open penalties for deletions and insertions [6,6] - -E INT[,INT] gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1] - -L INT[,INT] penalty for 5'- and 3'-end clipping [5,5] - -U INT penalty for an unpaired read pair [17] - -Input/output options:: - - -p first query file consists of interleaved paired-end sequences - -R STR read group header line such as '@RG\tID:foo\tSM:bar' [null] - - -v INT verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3] - -T INT minimum score to output [30] - -h INT if there are <INT hits with score >80% of the max score, output all in XA [5] - -a output all alignments for SE or unpaired PE - -C append FASTA/FASTQ comment to SAM output - -V output the reference FASTA header in the XR tag - -Y use soft clipping for supplementary alignments - -M mark shorter split hits as secondary - - -I FLOAT[,FLOAT[,INT[,INT]]] - specify the mean, standard deviation (10% of the mean if absent), max - (4 sigma from the mean if absent) and min of the insert size distribution. - FR orientation only. [inferred] - -@dataset_collections@ - @RG@ @info@ - - - 10.1093/bioinformatics/btp324 - 10.1093/bioinformatics/btp698 - @misc{1303.3997, -Author = {Heng Li}, -Title = {Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM}, -Year = {2013}, -Eprint = {arXiv:1303.3997}, -url = {http://arxiv.org/abs/1303.3997}, -} - + ]]> + + 10.1093/bioinformatics/btp324 + 10.1093/bioinformatics/btp698 + @misc{1303.3997, + Author = {Heng Li}, + Title = {Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM}, + Year = {2013}, + Eprint = {arXiv:1303.3997}, + url = {http://arxiv.org/abs/1303.3997}, + } + diff -r c022e4a68b76 -r cbc665adcde4 bwa.xml --- a/bwa.xml Tue Nov 21 11:23:45 2017 -0500 +++ b/bwa.xml Fri Nov 24 09:55:28 2017 -0500 @@ -1,457 +1,412 @@ - - - map short reads (< 100 bp) against reference genome - - read_group_macros.xml - bwa_macros.xml - - #if str( $analysis_type.analysis_type_selector ) == "full": - -n ${analysis_type.n} - -o ${analysis_type.o} - -e ${analysis_type.e} - -i ${analysis_type.i} - -d ${analysis_type.d} - -l ${analysis_type.l} - -k ${analysis_type.k} - -m ${analysis_type.m} - -M ${analysis_type.M} - -O ${analysis_type.O} - -E ${analysis_type.E} - -R ${analysis_type.R} - -q ${analysis_type.q} - - #if str( $analysis_type.B ): + + - map short reads (< 100 bp) against reference genome + + read_group_macros.xml + bwa_macros.xml + +#if str( $analysis_type.analysis_type_selector ) == "full": + -n ${analysis_type.n} + -o ${analysis_type.o} + -e ${analysis_type.e} + -i ${analysis_type.i} + -d ${analysis_type.d} + -l ${analysis_type.l} + -k ${analysis_type.k} + -m ${analysis_type.m} + -M ${analysis_type.M} + -O ${analysis_type.O} + -E ${analysis_type.E} + -R ${analysis_type.R} + -q ${analysis_type.q} + #if str( $analysis_type.B ): -B ${analysis_type.B} - #end if - - #if str( $analysis_type.L ): + #end if + #if str( $analysis_type.L ): -L ${analysis_type.L} - #end if #end if - - - #if $use_rg: - @set_rg_string@ - -r '$rg_string' - #end if - +#end if + + +#if $use_rg: + @set_rg_string@ + -r '$rg_string' +#end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - first.sai && +#if str( $input_type.input_type_selector ) == "paired" or str( $input_type.input_type_selector ) == "paired_collection": + bwa aln + -t "\${GALAXY_SLOTS:-1}" + @command_options@ + '$reference_fasta_filename' + #if str( $input_type.input_type_selector ) == "paired_collection": + '${input_type.fastq_input1.forward}' + #else + '${input_type.fastq_input1}' + #end if + > first.sai && - bwa aln - -t "\${GALAXY_SLOTS:-1}" - - @command_options@ - - "${reference_fasta_filename}" + bwa aln + -t "\${GALAXY_SLOTS:-1}" + @command_options@ + '${reference_fasta_filename}' + #if str( $input_type.input_type_selector ) == "paired_collection": + '${input_type.fastq_input1.reverse}' + #else + '${input_type.fastq_input2}' + #end if + > second.sai && - #if str( $input_type.input_type_selector ) == "paired_collection": - "${input_type.fastq_input1.reverse}" - #else - "${input_type.fastq_input2}" - #end if - - > second.sai && - - bwa sampe - - #if str( $input_type.adv_pe_options.adv_pe_options_selector) == "True": + bwa sampe + #if str( $input_type.adv_pe_options.adv_pe_options_selector) == "True": -a ${$input_type.adv_pe_options.a} -o ${$input_type.adv_pe_options.o} -n ${$input_type.adv_pe_options.n} -N ${$input_type.adv_pe_options.N} - #end if - - @read_group_options@ + #end if + @read_group_options@ + #if str( $input_type.input_type_selector ) == "paired_collection": + '${reference_fasta_filename}' first.sai second.sai '${input_type.fastq_input1.forward}' '${input_type.fastq_input1.reverse}' + #else: + '${reference_fasta_filename}' first.sai second.sai '${input_type.fastq_input1}' '${input_type.fastq_input2}' + #end if - #if str( $input_type.input_type_selector ) == "paired_collection": - "${reference_fasta_filename}" first.sai second.sai "${input_type.fastq_input1.forward}" "${input_type.fastq_input1.reverse}" - #else: - "${reference_fasta_filename}" first.sai second.sai "${input_type.fastq_input1}" "${input_type.fastq_input2}" - #end if + ## Fastq single -####### Fastq single - - #elif str( $input_type.input_type_selector ) == "single": - bwa aln - -t "\${GALAXY_SLOTS:-1}" +#elif str( $input_type.input_type_selector ) == "single": + bwa aln + -t "\${GALAXY_SLOTS:-1}" - @command_options@ + @command_options@ - "${reference_fasta_filename}" - "${input_type.fastq_input1}" - > first.sai && - - bwa samse + '${reference_fasta_filename}' + '${input_type.fastq_input1}' + > first.sai && - #if str( $input_type.adv_se_options.adv_se_options_selector) == "True": - -n ${$input_type.adv_se_options.n} - #end if + bwa samse - @read_group_options@ - - "${reference_fasta_filename}" first.sai "${input_type.fastq_input1}" + #if str( $input_type.adv_se_options.adv_se_options_selector) == "True": + -n ${$input_type.adv_se_options.n} + #end if + @read_group_options@ + '${reference_fasta_filename}' first.sai '${input_type.fastq_input1}' ####### BAM paired - #elif str( $input_type.input_type_selector ) == "paired_bam": - bwa aln - -t "\${GALAXY_SLOTS:-1}" - -b - -1 - - @command_options@ - - "${reference_fasta_filename}" - "${input_type.bam_input}" - > first.sai && +#elif str( $input_type.input_type_selector ) == "paired_bam": + bwa aln + -t "\${GALAXY_SLOTS:-1}" + -b + -1 + @command_options@ + '${reference_fasta_filename}' + '${input_type.bam_input}' + > first.sai && - bwa aln - -t "\${GALAXY_SLOTS:-1}" - -b - -2 - @command_options@ - "${reference_fasta_filename}" - "${input_type.bam_input}" - > second.sai && + bwa aln + -t "\${GALAXY_SLOTS:-1}" + -b + -2 + @command_options@ + '${reference_fasta_filename}' + '${input_type.bam_input}' + > second.sai && - bwa sampe + bwa sampe - #if str( $input_type.adv_bam_pe_options.adv_pe_options_selector) == "True": + #if str( $input_type.adv_bam_pe_options.adv_pe_options_selector) == "True": -a ${$input_type.adv_bam_pe_options.a} -o ${$input_type.adv_bam_pe_options.o} -n ${$input_type.adv_bam_pe_options.n} -N ${$input_type.adv_bam_pe_options.N} - #end if - - @read_group_options@ - - "${reference_fasta_filename}" first.sai second.sai "${input_type.bam_input}" "${input_type.bam_input}" + #end if + @read_group_options@ + '${reference_fasta_filename}' first.sai second.sai '${input_type.bam_input}' '${input_type.bam_input}' ####### Fastq single ------------ to do next - #elif str( $input_type.input_type_selector ) == "single_bam": - bwa aln - -t "\${GALAXY_SLOTS:-1}" - -b - -0 - - @command_options@ - - "${reference_fasta_filename}" - "${input_type.bam_input}" - > first.sai && - - bwa samse - - #if str( $input_type.adv_bam_se_options.adv_se_options_selector) == "True": - -n ${$input_type.adv_bam_se_options.n} - #end if - - @read_group_options@ +#elif str( $input_type.input_type_selector ) == "single_bam": + bwa aln + -t "\${GALAXY_SLOTS:-1}" + -b + -0 - "${reference_fasta_filename}" first.sai "${input_type.bam_input}" - #end if - - | samtools sort -O bam -o '$bam_output' -]]> - + @command_options@ - - - - - - - - - - - - - - + '${reference_fasta_filename}' + '${input_type.bam_input}' + > first.sai && - - - - - - - - - - - - - - - - - - - + bwa samse - - - - - - - - - - + #if str( $input_type.adv_bam_se_options.adv_se_options_selector) == "True": + -n ${$input_type.adv_bam_se_options.n} + #end if + @read_group_options@ + '${reference_fasta_filename}' first.sai '${input_type.bam_input}' +#end if - - - - - - - - - - - +| samtools sort -@\${GALAXY_SLOTS:-2} -O bam -o '$bam_output' +]]> + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Support** link at the top of the interface and let us know that an index needs to be added - 2. Upload your genome of interest as a FASTA file to Galaxy history and selected **Use a genome from the history and build index** option. - ------ - -**Galaxy-specific option** - -Galaxy allows three levels of control over bwa-mem options provided by **Select analysis mode** menu option. These are: - - 1. *Simple Illumina mode*: The simplest possible bwa mem application in which it alignes single or paired-end data to reference using default parameters. It is equivalent to the following command: bwa mem <reference index> <fastq dataset1> [fastq dataset2] - 2. *Full list of options*: Allows access to all options through Galaxy interface. - ------- - -**bwa-aln options** - -Each Galaxy parameter widget corresponds to command line flags listed below:: + 1. Contact galaxy team using **Help->Support** link at the top of the interface and let us know that an index + needs to be added + 2. Upload your genome of interest as a FASTA file to Galaxy history and selected **Use a genome from the history + and build index** option. - -n NUM max #diff (int) or missing prob under 0.02 err rate (float) [0.04] - -o INT maximum number or fraction of gap opens [1] - -e INT maximum number of gap extensions, -1 for disabling long gaps [-1] - -i INT do not put an indel within INT bp towards the ends [5] - -d INT maximum occurrences for extending a long deletion [10] - -l INT seed length [32] - -k INT maximum differences in the seed [2] - -m INT maximum entries in the queue [2000000] - -M INT mismatch penalty [3] - -O INT gap open penalty [11] - -E INT gap extension penalty [4] - -R INT stop searching when there are >INT equally best hits [30] - -q INT quality threshold for read trimming down to 35bp [0] - -B INT length of barcode - -L log-scaled gap penalty for long deletions - -N non-iterative mode: search for all n-difference hits (slooow) - -I the input is in the Illumina 1.3+ FASTQ-like format - -b the input read file is in the BAM format - -0 use single-end reads only (effective with -b) - -1 use the 1st read in a pair (effective with -b) - -2 use the 2nd read in a pair (effective with -b) - -**bwa-samse options**:: - - -a INT maximum insert size [500] - -o INT maximum occurrences for one end [100000] - -n INT maximum hits to output for paired reads [3] - -N INT maximum hits to output for discordant pairs [10] - -c FLOAT prior of chimeric rate (lower bound) [1.0e-05] - -r STR read group header line [null] - -**bwa-sampe options**:: - - -n INT maximum hits to output for paired reads [3] - -r STR read group header line [null] - -@dataset_collections@ @RG@ @info@ - - - 10.1093/bioinformatics/btp324 - 10.1093/bioinformatics/btp698 - + ]]> + + 10.1093/bioinformatics/btp324 + 10.1093/bioinformatics/btp698 + diff -r c022e4a68b76 -r cbc665adcde4 bwa_macros.xml --- a/bwa_macros.xml Tue Nov 21 11:23:45 2017 -0500 +++ b/bwa_macros.xml Fri Nov 24 09:55:28 2017 -0500 @@ -1,115 +1,115 @@ - read_group_macros.xml + read_group_macros.xml - 0.7.17 + 0.7.17 - - #set $rg_string = "@RG\\tID:" + str($rg_id) - #set $rg_string += $format_read_group("\\tSM:", $rg_sm) - #set $rg_string += $format_read_group("\\tPL:", $rg_pl) - #set $rg_string += $format_read_group("\\tLB:", $rg_lb) - #set $rg_string += $format_read_group("\\tCN:", $rg_cn) - #set $rg_string += $format_read_group("\\tDS:", $rg_ds) - #set $rg_string += $format_read_group("\\tDT:", $rg_dt) - #set $rg_string += $format_read_group("\\tFO:", $rg_fo) - #set $rg_string += $format_read_group("\\tKS:", $rg_ks) - #set $rg_string += $format_read_group("\\tPG:", $rg_pg) - #set $rg_string += $format_read_group("\\tPI:", $rg_pi) - #set $rg_string += $format_read_group("\\tPU:", $rg_pu) - + + #set $rg_string = "@RG\\tID:" + str($rg_id) + #set $rg_string += $format_read_group("\\tSM:", $rg_sm) + #set $rg_string += $format_read_group("\\tPL:", $rg_pl) + #set $rg_string += $format_read_group("\\tLB:", $rg_lb) + #set $rg_string += $format_read_group("\\tCN:", $rg_cn) + #set $rg_string += $format_read_group("\\tDS:", $rg_ds) + #set $rg_string += $format_read_group("\\tDT:", $rg_dt) + #set $rg_string += $format_read_group("\\tFO:", $rg_fo) + #set $rg_string += $format_read_group("\\tKS:", $rg_ks) + #set $rg_string += $format_read_group("\\tPG:", $rg_pg) + #set $rg_string += $format_read_group("\\tPI:", $rg_pi) + #set $rg_string += $format_read_group("\\tPU:", $rg_pu) + - - - - bwa - samtools - - + + + bwa + samtools + + - - - - - - - - + + + + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + - + ----- .. class:: warningmark @@ -160,8 +160,8 @@ @RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400 Note the hierarchical relationship between read groups (unique for each lane) to libraries (sequenced on two lanes) and samples (across four lanes, two lanes for each library). - - + + ----- .. class:: infomark @@ -175,16 +175,5 @@ 3. https://github.com/lh3/bwa 4. http://bio-bwa.sourceforge.net/ - - - ------- - -**Dataset collections - processing large numbers of datasets at once** - -Dataset collections are in beta-testing. Extensive documentation will be added later this Spring. - - - - +