# HG changeset patch # User yhoogstrate # Date 1393937419 18000 # Node ID 17f10d28dab3f7d03896d428ee49edf4f42f693c # Parent b578aaede79b9067e7791eeef55f9be2a86cf298 Uploaded diff -r b578aaede79b -r 17f10d28dab3 samtools-parallel-mpileup.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/samtools-parallel-mpileup.xml Tue Mar 04 07:50:19 2014 -0500 @@ -0,0 +1,222 @@ + + + Samtools mpileup (classical or supporting parallelization). + + samtools-parallel-mpileup + samtools + + + #if $reference_genome_source.source_select == "attribute" and len({ alignment.metadata.dbkey:True for alignment in $alignments }.keys()) != 1 + echo "Invalid number of dbkeys are found: ${ len({ alignment.metadata.dbkey:True for alignment in $alignments }.keys()) }, while only one should be used. Make sure that the alignments are done on the same reference genome and that 'tool-data/all_fasta.loc' is configured properly!" >&2 + #else + #if $mpileup_parallelization.mpileup_parallelization_select == "true" + samtools-parallel-mpileup mpileup + -t $mpileup_parallelization.samtools_threads + #else + samtools mpileup + #end if + -f + #if $reference_genome_source.source_select == "indexed_filtered" + "$reference_genome_source.reference_genome" + #else if $reference_genome_source.source_select == "indexed_all" + "$reference_genome_source.reference_genome" + #else if $reference_genome_source.source_select == "history" + "$reference_genome_source.reference_genome" + #else + + "${ filter( lambda x: str( x[0] ) == str( { alignment.metadata.dbkey:True for alignment in $alignments }.keys()[0] ), $__app__.tool_data_tables[ 'all_fasta' ].get_fields() )[0][-1] }" + #end if + + #if $extended_parameters_regions.samtools_regions == "region" + -r $extended_parameters_regions.$samtools_r + #elif $extended_parameters_regions.samtools_regions == "regions_file_pos" or $extended_parameters_regions.samtools_regions == "regions_file_bed" + -l $extended_parameters_regions.$samtools_l + #end if + + #if 
$extended_parameters.parameters == "extended" + $extended_parameters.samtools_6 + $extended_parameters.samtools_A + $extended_parameters.samtools_B + -C $extended_parameters.samtools_C + -d $extended_parameters.samtools_d + $extended_parameters.samtools_E + -M $extended_parameters.samtools_M + $extended_parameters.samtools_R + -q $extended_parameters.samtools_q + -Q $extended_parameters.samtools_Q + + -e $extended_parameters.samtools_e + -F $extended_parameters.samtools_F + -h $extended_parameters.samtools_h + $extended_parameters.samtools_I + -L $extended_parameters.samtools_L + -m $extended_parameters.samtools_m + -o $extended_parameters.samtools_o + $extended_parameters.samtools_p + -P $extended_parameters.samtools_P + #end if + + #for $alignment in $alignments + ${alignment} + #end for + + 2> stderr_1.txt + > $output ; + cat stderr_1.txt + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +VarScan2.3.6:: + +*VarScan2 Overview* + +VarScan is a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. The newest version, VarScan 2, is written in Java, so it runs on most operating systems. +http://dx.doi.org/10.1101/gr.129684.111 +http://www.ncbi.nlm.nih.gov/pubmed/19542151 + +*VarScan* requires mpileup formatted input files, which are generally derived from BAM files. Since mpileup files can become humongous, the interim step of storing it is bypassed. Thus, in this wrapper one or multiple BAM/SAM files go in, get processed into a mpileup file and get directly linked to VarScan. +The samtools package is not able to parallelize the mpileup generation which make it a very slow process. 
+Other people were aware of this and have written a version that supports parallelization: +https://github.com/mydatascience/parallel-mpileup + +Consequently, when a BAM file is processed by this wrapper, it is processed by *parallel-mpileup* before it is sent to VarScan. + +.. _VarScan: http://varscan.sourceforge.net/ + +**Input formats** + +VarScan2 accepts sequencing alignments in the same format, either SAM or BAM (http://samtools.sourceforge.net/). The alignment files have to be linked to a reference genome by Galaxy. This is indicated under every history item with, e.g., *"database: hg19"* for a link to hg19, or *"database: ?"* if the link is missing. + +**Installation** + +Make sure your reference genomes are properly annotated in "tool-data/all_fasta.loc", and linked to the names of the reference used for alignment. + +**License** + +* VarScan2.3.6: Non-Profit Open Software License 3.0 (Non-Profit OSL 3.0) +* parallel-mpileup: MIT License (https://github.com/mydatascience/parallel-mpileup/blob/master/samtools-0.1.19/COPYING) +* samtools: MIT License + + +**Contact** + +The tool wrapper has been written by Youri Hoogstrate from the Erasmus Medical Center (Rotterdam, Netherlands) on behalf of the Translational Research IT (TraIT) project: +http://www.ctmm.nl/en/programmas/infrastructuren/traitprojecttranslationeleresearch + +More tools by the Translational Research IT (TraIT) project can be found in the following repository: +http://toolshed.dtls.nl/ + + diff -r b578aaede79b -r 17f10d28dab3 tool_data_table_conf.xml.sample --- a/tool_data_table_conf.xml.sample Wed Feb 19 02:46:43 2014 -0500 +++ b/tool_data_table_conf.xml.sample Tue Mar 04 07:50:19 2014 -0500 @@ -1,5 +1,8 @@ - - - name, dbkey, display_name, value - -
\ No newline at end of file + + + + + name, dbkey, display_name, value + +
+
\ No newline at end of file diff -r b578aaede79b -r 17f10d28dab3 tool_dependencies.xml --- a/tool_dependencies.xml Wed Feb 19 02:46:43 2014 -0500 +++ b/tool_dependencies.xml Tue Mar 04 07:50:19 2014 -0500 @@ -1,11 +1,12 @@ - + + - svn checkout https://github.com/mydatascience/parallel-mpileup/trunk samtools-mpileup-parallel && cd samtools-mpileup-parallel && cd $(ls |grep samtools-) && make && cp samtools ../samtools-mpileup-parallel + svn checkout https://github.com/yhoogstrate/parallel-mpileup/trunk samtools-parallel-mpileup && cd samtools-parallel-mpileup && cd $(ls |grep samtools-) && make && cp samtools ../samtools-parallel-mpileup - samtools-mpileup-parallel + samtools-parallel-mpileup $INSTALL_DIR/bin @@ -18,6 +19,87 @@ Downloads and installs a modified version of samtools, able to paralellize the mpileup function. + + + + + + http://downloads.sourceforge.net/project/samtools/samtools/0.1.19/samtools-0.1.19.tar.bz2 + sed -i.bak 's/-lcurses/-lncurses/' Makefile + make + chmod ugo+rx misc/*.p? 
+ mkdir misc/bin + cp -p `find misc -type f -perm -555` misc/bin/ + + samtools + $INSTALL_DIR/bin + + + bcftools/bcftools + $INSTALL_DIR/bin + + + bcftools/vcfutils.pl + $INSTALL_DIR/bin + + + misc/bin + $INSTALL_DIR/bin + + + $INSTALL_DIR/bin + + + + +Program: samtools (Tools for alignments in the SAM format) +Version: 0.1.19 + +Usage: samtools <command> [options] + +Command: view SAM<->BAM conversion + sort sort alignment file + mpileup multi-way pileup + depth compute the depth + faidx index/extract FASTA + tview text alignment viewer + index index alignment + idxstats BAM index stats (r595 or later) + fixmate fix mate information + flagstat simple stats + calmd recalculate MD/NM tags and '=' bases + merge merge sorted alignments + rmdup remove PCR duplicates + reheader replace BAM header + cat concatenate BAMs + targetcut cut fosmid regions (for fosmid pool only) + phase phase heterozygotes + +This also installs bcftools and misc utility commands: + bcftools + vcfutils.pl + ace2sam + bamcheck + blast2sam.pl + bowtie2sam.pl + export2sam.pl + interpolate_sam.pl + maq2sam-long + maq2sam-short + md5fa + md5sum-lite + novo2sam.pl + psl2sam.pl + sam2vcf.pl + samtools.pl + soap2sam.pl + varfilter.py + wgsim + wgsim_eval.pl + zoom2sam.pl + + + @@ -35,4 +117,4 @@ Downloads VarScan2. - + \ No newline at end of file diff -r b578aaede79b -r 17f10d28dab3 varscan_mpileup2snp.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/varscan_mpileup2snp.xml Tue Mar 04 07:50:19 2014 -0500 @@ -0,0 +1,103 @@ + + + VarScan2 SNP/SNV detection; directly reading *.mpileup file(s). 
+ + VarScan + + + cat $mpileup_input | java + -Xmx64G + -jar \$JAVA_JAR_PATH/VarScan.v2.3.6.jar + mpileup2snp + + #if $extended_parameters.parameters == "extended" + --min-coverage $varscan_min_coverage + --min-reads2 $varscan_min_reads2 + --min-avg-qual $varscan_min_avg_qual + --min-var-freq $varscan_min_var_freq + --min-freq-for-hom $varscan_min_freq_for_hom + --p-value $varscan_p_value + $varscan_strand_filter + $varscan_output_vcf + $varscan_variants + #end if + + --output-vcf $varscan_output_vcf + > $snv_output + 2> &1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +VarScan2.3.6:: + +*VarScan2 Overview* + +VarScan is a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. The newest version, VarScan 2, is written in Java, so it runs on most operating systems. +http://dx.doi.org/10.1101/gr.129684.111 +http://www.ncbi.nlm.nih.gov/pubmed/19542151 + +*VarScan* requires mpileup formatted input files, which are generally derived from BAM files. Since mpileup files can become humongous, the interim step of storing it is bypassed. Thus, in this wrapper one or multiple BAM/SAM files go in, get processed into a mpileup file and get directly linked to VarScan. +The samtools package is not able to parallelize the mpileup generation which make it a very slow process. +Other people were aware of this and have written a version that can do parallelization: +https://github.com/mydatascience/parallel-mpileup + +Consequently, when a BAM files gets processed by this wrapper, it's processed by *parallel-mpileup* before its send to VarScan. + +.. _VarScan: http://varscan.sourceforge.net/ + +**Input formats** + +VarScan2 accepts sequencing alignments in the same, either SAM or BAM format (http://samtools.sourceforge.net/). The alignment files have to be linked to a reference genome by galaxy. 
This is indicated under every history item with e.g.: *"database: hg19"* for a link to hg19, or *"database: ?"* if the link is missing. + +**Installation** + +Make sure your reference genomes are properly annotated in "tool-data/all_fasta.loc", and linked to the names of the reference used for alignment. + +**License** + +* VarScan2.3.6: Non-Profit Open Software License 3.0 (Non-Profit OSL 3.0) +* parallel-mpileup: MIT License (https://github.com/mydatascience/parallel-mpileup/blob/master/samtools-0.1.19/COPYING) + + +**Contact** + +The tool wrapper has been written by Youri Hoogstrate from the Erasmus Medical Center (Rotterdam, Netherlands) on behalf of the Translational Research IT (TraIT) project: +http://www.ctmm.nl/en/programmas/infrastructuren/traitprojecttranslationeleresearch + +More tools by the Translational Research IT (TraIT) project can be found in the following repository: +http://toolshed.dtls.nl/ + + diff -r b578aaede79b -r 17f10d28dab3 varscan_mpileup2snp_from_bam.xml --- a/varscan_mpileup2snp_from_bam.xml Wed Feb 19 02:46:43 2014 -0500 +++ b/varscan_mpileup2snp_from_bam.xml Tue Mar 04 07:50:19 2014 -0500 @@ -2,15 +2,20 @@ VarScan2 SNP/SNV detection; directly reading *.bam file(s) & using parallel mpileup generation, to avoid unncessairy I/O overhead and increase performance. - samtools-mpileup-parallel + samtools-parallel-mpileup VarScan + samtools #if $reference_genome_source.source_select == "attribute" and len({ alignment.metadata.dbkey:True for alignment in $alignments }.keys()) != 1 echo "Invalid number of dbkeys are found: ${ len({ alignment.metadata.dbkey:True for alignment in $alignments }.keys()) }, while only one should be used. Make sure that the alignments are done on the same reference genome and that 'tool-data/all_fasta.loc' is configured properly!" 
>&2 #else - samtools-mpileup-parallel mpileup - -t $samtools_threads + #if $mpileup_parallelization.mpileup_parallelization_select == "true" + samtools-parallel-mpileup mpileup + -t $mpileup_parallelization.samtools_threads + #else + samtools mpileup + #end if -f #if $reference_genome_source.source_select == "indexed_filtered" "$reference_genome_source.reference_genome" @@ -60,7 +65,7 @@ #for $alignment in $alignments ${alignment} #end for - 2>/dev/null + 2> stderr_1.txt | java -Xmx64G -jar \$JAVA_JAR_PATH/VarScan.v2.3.6.jar @@ -80,12 +85,24 @@ --output-vcf $varscan_output_vcf > $snv_output - 2>&1 + 2> stderr_2.txt ; + + echo "-------------------------[ mpileup generation ]-------------------------" ; + echo "" ; + cat stderr_1.txt ; + echo "" ; + echo "" ; + echo "-------------------------[ VarScan SNP detect ]-------------------------" ; + echo "" ; + echo "" ; + cat stderr_2.txt ; + echo "" ; + echo "------------------------------------------------------------------------" ; #end if - + @@ -95,6 +112,9 @@ + + + @@ -116,20 +136,20 @@ - - + + - - + + @@ -138,19 +158,27 @@ - + - + + + + + + + + + + - + - - + @@ -171,7 +199,7 @@ - + @@ -233,6 +261,6 @@ http://www.ctmm.nl/en/programmas/infrastructuren/traitprojecttranslationeleresearch More tools by the Translational Research IT (TraIT) project can be found in the following repository: -http://toolshed.nbic.nl/ +http://toolshed.dtls.nl/
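The command templates in this changeset first check that all selected alignments share exactly one dbkey, and then look up the matching reference FASTA in the 'all_fasta' data table by comparing the first field of each row and taking the last field. The same logic can be sketched in plain Python, outside of Galaxy. This is only an illustration under assumptions: the function name `resolve_reference_fasta`, the example rows, and the file paths are hypothetical and are not part of the changeset.

```python
def resolve_reference_fasta(dbkeys, fasta_rows):
    """Return the FASTA path for the single dbkey shared by all alignments.

    dbkeys     -- one dbkey per alignment (stand-in for alignment.metadata.dbkey)
    fasta_rows -- rows of the 'all_fasta' data table, each a list of fields
    """
    # The template builds a dict keyed on dbkey to deduplicate; a set does the same.
    unique_dbkeys = set(dbkeys)
    if len(unique_dbkeys) != 1:
        raise ValueError(
            "Invalid number of dbkeys found: %d, while only one should be used."
            % len(unique_dbkeys)
        )
    dbkey = next(iter(unique_dbkeys))

    # Mirror the template: compare the first field as a string, return the last field.
    matches = [row for row in fasta_rows if str(row[0]) == str(dbkey)]
    if not matches:
        raise LookupError("No entry for dbkey '%s' in the all_fasta table" % dbkey)
    return matches[0][-1]


# Hypothetical table contents, for illustration only.
example_rows = [
    ["hg19", "hg19", "Human (hg19)", "/data/genomes/hg19.fa"],
    ["mm10", "mm10", "Mouse (mm10)", "/data/genomes/mm10.fa"],
]

if __name__ == "__main__":
    print(resolve_reference_fasta(["hg19", "hg19"], example_rows))
```

Raising an exception for a missing table entry is a choice made for this sketch; the template as written would instead fail with an index error when no row matches.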