Mercurial > repos > iuc > gatk2
changeset 5:c2645201ffae draft
Uploaded
author | iuc |
---|---|
date | Tue, 11 Mar 2014 07:42:09 -0400 |
parents | e67da4f2c9bf |
children | b80301676614 |
files | gatk2_annotations.txt.sample gatk2_macros.xml gatk2_wrapper.py haplotype_caller.xml tool_dependencies.xml unified_genotyper.xml variant_recalibrator.xml |
diffstat | 7 files changed, 146 insertions(+), 104 deletions(-) |
--- a/gatk2_annotations.txt.sample Sat Jan 18 07:00:26 2014 -0500
+++ b/gatk2_annotations.txt.sample Tue Mar 11 07:42:09 2014 -0400
@@ -1,30 +1,45 @@
 #unique_id name gatk_value tools_valid_for
-AlleleBalance AlleleBalance AlleleBalance UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-AlleleBalanceBySample AlleleBalanceBySample AlleleBalanceBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-BaseCounts BaseCounts BaseCounts UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-BaseQualityRankSumTest BaseQualityRankSumTest BaseQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-ChromosomeCounts ChromosomeCounts ChromosomeCounts UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-DepthOfCoverage DepthOfCoverage DepthOfCoverage UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-DepthPerAlleleBySample DepthPerAlleleBySample DepthPerAlleleBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-FisherStrand FisherStrand FisherStrand UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-GCContent GCContent GCContent UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-HaplotypeScore HaplotypeScore HaplotypeScore UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-HardyWeinberg HardyWeinberg HardyWeinberg UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-HomopolymerRun HomopolymerRun HomopolymerRun UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-InbreedingCoeff InbreedingCoeff InbreedingCoeff UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
+#http://gatkforums.broadinstitute.org/discussion/1268/how-should-i-interpret-vcf-files-produced-by-the-gatk
+AlleleBalance AlleleBalance AB UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+AlleleBalanceBySample AlleleBalanceBySample AlleleBalanceBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+BaseCounts BaseCounts BaseCounts UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+BaseQualityRankSumTest BaseQualityRankSumTest BaseQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+ChromosomeCounts ChromosomeCounts ChromosomeCounts UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+Coverage Coverage Coverage UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+DepthPerAlleleBySample DepthPerAlleleBySample DepthPerAlleleBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+FisherStrand FisherStrand FisherStrand UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+GCContent GCContent GCContent UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+HaplotypeScore HaplotypeScore HaplotypeScore UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+HardyWeinberg HardyWeinberg HardyWeinberg UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+HomopolymerRun HomopolymerRun HomopolymerRun UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+InbreedingCoeff InbreedingCoeff InbreedingCoeff UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
 IndelType IndelType IndelType UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-LowMQ LowMQ LowMQ UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MVLikelihoodRatio MVLikelihoodRatio MVLikelihoodRatio UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MappingQualityRankSumTest MappingQualityRankSumTest MappingQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MappingQualityZero MappingQualityZero MappingQualityZero UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MappingQualityZeroBySample MappingQualityZeroBySample MappingQualityZeroBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MappingQualityZeroFraction MappingQualityZeroFraction MappingQualityZeroFraction UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-NBaseCount NBaseCount NBaseCount UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-QualByDepth QualByDepth QualByDepth UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-RMSMappingQuality RMSMappingQuality RMSMappingQuality UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-ReadDepthAndAllelicFractionBySample ReadDepthAndAllelicFractionBySample ReadDepthAndAllelicFractionBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-ReadPosRankSumTest ReadPosRankSumTest ReadPosRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-SampleList SampleList SampleList UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-SnpEff SnpEff SnpEff VariantAnnotator,VariantRecalibrator
-SpanningDeletions SpanningDeletions SpanningDeletions UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-TechnologyComposition TechnologyComposition TechnologyComposition UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
+LowMQ LowMQ LowMQ UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MVLikelihoodRatio MVLikelihoodRatio MVLikelihoodRatio VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MappingQualityRankSumTest MappingQualityRankSumTest MappingQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MappingQualityZero MappingQualityZero MappingQualityZero UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MappingQualityZeroBySample MappingQualityZeroBySample MappingQualityZeroBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+NBaseCount NBaseCount NBaseCount UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+QualByDepth QualByDepth QualByDepth UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+RMSMappingQuality RMSMappingQuality RMSMappingQuality UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+ReadPosRankSumTest ReadPosRankSumTest ReadPosRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+SampleList SampleList SampleList UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+SnpEff SnpEff SnpEff VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+SpanningDeletions SpanningDeletions SpanningDeletions UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+VariantType VariantType VariantType UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+AlleleCount Allele count in genotypes, for each ALT allele (AC) AC UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+AlleleFrequency Allele Frequency, for each ALT allele (AF) AF UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+AlleleNumber Total number of alleles in called genotypes (AN) AN UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+Coverage Unfiltered depth over all samples (DP) DP UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+Dels Dels Dels UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
+MQ RMS Mapping Quality MQ UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MQ0 Mapping Quality Zero MQ0 UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+BaseQualityRankSumTest BaseQualityRankSumTest BaseQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MappingQualityRankSumTest MappingQualityRankSumTest MappingQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+ReadPosRankSumTest ReadPosRankSumTest ReadPosRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+HaplotypeScore HaplotypeScore HaplotypeScore UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+QualByDepth QualByDepth QD UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+VQSLOD Variant quality score recalibration VQSLOD UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+FisherStrand FisherStrand FS UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+StrandBias Strand Bias evidence (higher SB, more bias, more false positive calls) SB UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+
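The `tools_valid_for` column edited in this file is what the tool XML filters on: a `multiple_splitter` filter splits it on `,` and a `static_value` filter keeps the rows for one tool. A minimal Python sketch of that selection follows, assuming the sample file is tab-separated with the four columns named in its header line. The function name `annotations_for_tool` and the inline `SAMPLE` rows are illustrative only and are not part of the repository.

```python
def annotations_for_tool(lines, tool):
    """Return (unique_id, name, gatk_value) rows whose tools_valid_for lists `tool`."""
    selected = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (the header and the forum URL).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")  # assumption: columns are tab-separated
        if len(fields) != 4:
            continue  # malformed row; ignored in this sketch
        unique_id, name, gatk_value, tools_valid_for = fields
        # Rough equivalent of the multiple_splitter filter: one entry per tool.
        tools = [t.strip() for t in tools_valid_for.split(",")]
        # Rough equivalent of the static_value filter.
        if tool in tools:
            selected.append((unique_id, name, gatk_value))
    return selected


# Three rows taken from the new version of the file, plus its header.
SAMPLE = [
    "#unique_id\tname\tgatk_value\ttools_valid_for",
    "AlleleBalance\tAlleleBalance\tAB\t"
    "UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller",
    "IndelType\tIndelType\tIndelType\t"
    "UnifiedGenotyper,VariantAnnotator,VariantRecalibrator",
    "SnpEff\tSnpEff\tSnpEff\t"
    "VariantAnnotator,VariantRecalibrator,HaplotypeCaller",
]

haplotype_rows = annotations_for_tool(SAMPLE, "HaplotypeCaller")
genotyper_rows = annotations_for_tool(SAMPLE, "UnifiedGenotyper")
```

With these rows, `IndelType` is offered to UnifiedGenotyper but not to HaplotypeCaller, and `SnpEff` the other way round, which matches how the changeset leaves those two entries.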
--- a/gatk2_macros.xml Sat Jan 18 07:00:26 2014 -0500
+++ b/gatk2_macros.xml Tue Mar 11 07:42:09 2014 -0400
@@ -3,6 +3,7 @@
     <requirements>
         <requirement type="package">gatk2</requirement>
         <requirement type="package" version="0.1.19">samtools</requirement>
+        <requirement type="package" version="1.56.0">picard</requirement>
         <requirement type="set_environment">GATK2_PATH</requirement>
         <requirement type="set_environment">GATK2_SITE_OPTIONS</requirement>
     </requirements>
--- a/gatk2_wrapper.py Sat Jan 18 07:00:26 2014 -0500
+++ b/gatk2_wrapper.py Tue Mar 11 07:42:09 2014 -0400
@@ -7,7 +7,6 @@
 
 import sys, optparse, os, tempfile, subprocess, shutil
 from binascii import unhexlify
-from string import Template
 
 GALAXY_EXT_TO_GATK_EXT = { 'gatk_interval':'intervals', 'bam_index':'bam.bai', 'gatk_dbsnp':'dbSNP', 'picard_interval_list':'interval_list' } #items not listed here will use the galaxy extension as-is
 GALAXY_EXT_TO_GATK_FILE_TYPE = GALAXY_EXT_TO_GATK_EXT #for now, these are the same, but could be different if needed
@@ -19,6 +18,7 @@
     if tmp_dir and os.path.exists( tmp_dir ):
         shutil.rmtree( tmp_dir )
 
+
 def gatk_filename_from_galaxy( galaxy_filename, galaxy_ext, target_dir = None, prefix = None ):
     suffix = GALAXY_EXT_TO_GATK_EXT.get( galaxy_ext, galaxy_ext )
     if prefix is None:
@@ -29,36 +29,39 @@
     os.symlink( galaxy_filename, gatk_filename )
     return gatk_filename
 
+
 def gatk_filetype_argument_substitution( argument, galaxy_ext ):
     return argument % dict( file_type = GALAXY_EXT_TO_GATK_FILE_TYPE.get( galaxy_ext, galaxy_ext ) )
 
+
 def open_file_from_option( filename, mode = 'rb' ):
     if filename:
         return open( filename, mode = mode )
     return None
 
+
 def html_report_from_directory( html_out, dir ):
     html_out.write( '<html>\n<head>\n<title>Galaxy - GATK Output</title>\n</head>\n<body>\n<p/>\n<ul>\n' )
     for fname in sorted( os.listdir( dir ) ):
         html_out.write( '<li><a href="%s">%s</a></li>\n' % ( fname, fname ) )
     html_out.write( '</ul>\n</body>\n</html>\n' )
 
-def index_bam_files( bam_filenames, tmp_dir ):
+
+def index_bam_files( bam_filenames ):
     for bam_filename in bam_filenames:
         bam_index_filename = "%s.bai" % bam_filename
         if not os.path.exists( bam_index_filename ):
             #need to index this bam file
             stderr_name = tempfile.NamedTemporaryFile( prefix = "bam_index_stderr" ).name
             command = 'samtools index %s %s' % ( bam_filename, bam_index_filename )
-            proc = subprocess.Popen( args=command, shell=True, stderr=open( stderr_name, 'wb' ) )
-            return_code = proc.wait()
-            if return_code:
+            try:
+                subprocess.check_call( args=command, shell=True, stderr=open( stderr_name, 'wb' ) )
+            except:
                 for line in open( stderr_name ):
                     print >> sys.stderr, line
-                os.unlink( stderr_name ) #clean up
-                cleanup_before_exit( tmp_dir )
                 raise Exception( "Error indexing BAM file" )
-            os.unlink( stderr_name ) #clean up
+            finally:
+                os.unlink( stderr_name )
 
 def __main__():
     #Parse Command Line
@@ -74,8 +77,7 @@
     parser.add_option( '-e', '--phone_home', dest='phone_home', action='store', type="string", default='STANDARD', help='What kind of GATK run report should we generate(NO_ET|STANDARD|STDOUT)' )
     parser.add_option( '-K', '--gatk_key', dest='gatk_key', action='store', type="string", default=None, help='What kind of GATK run report should we generate(NO_ET|STANDARD|STDOUT)' )
     (options, args) = parser.parse_args()
-
-    tmp_dir = tempfile.mkdtemp( prefix='tmp-gatk-' )
+
     if options.pass_through_options:
         cmd = ' '.join( options.pass_through_options )
     else:
@@ -87,42 +89,50 @@
     elif options.max_jvm_heap_fraction is not None:
         cmd = cmd.replace( 'java ', 'java -XX:DefaultMaxRAMFraction=%s -XX:+UseParallelGC ' % ( options.max_jvm_heap_fraction ), 1 )
     bam_filenames = []
-    if options.datasets:
-        for ( dataset_arg, filename, galaxy_ext, prefix ) in options.datasets:
-            gatk_filename = gatk_filename_from_galaxy( filename, galaxy_ext, target_dir = tmp_dir, prefix = prefix )
-            if dataset_arg:
-                cmd = '%s %s "%s"' % ( cmd, gatk_filetype_argument_substitution( dataset_arg, galaxy_ext ), gatk_filename )
-            if galaxy_ext == "bam":
-                bam_filenames.append( gatk_filename )
-    index_bam_files( bam_filenames, tmp_dir )
-    #set up stdout and stderr output options
-    stdout = open_file_from_option( options.stdout, mode = 'wb' )
-    stderr = open_file_from_option( options.stderr, mode = 'wb' )
-    #if no stderr file is specified, we'll use our own
-    if stderr is None:
-        stderr = tempfile.NamedTemporaryFile( prefix="gatk-stderr-", dir=tmp_dir )
-
-    proc = subprocess.Popen( args=cmd, stdout=stdout, stderr=stderr, shell=True, cwd=tmp_dir )
-    return_code = proc.wait()
-
-    if return_code:
-        stderr_target = sys.stderr
-    else:
-        stderr_target = sys.stdout
-    stderr.flush()
-    stderr.seek(0)
-    while True:
-        chunk = stderr.read( CHUNK_SIZE )
-        if chunk:
-            stderr_target.write( chunk )
+    tmp_dir = tempfile.mkdtemp( prefix='tmp-gatk-' )
+    try:
+        if options.datasets:
+            for ( dataset_arg, filename, galaxy_ext, prefix ) in options.datasets:
+                gatk_filename = gatk_filename_from_galaxy( filename, galaxy_ext, target_dir = tmp_dir, prefix = prefix )
+                if dataset_arg:
+                    cmd = '%s %s "%s"' % ( cmd, gatk_filetype_argument_substitution( dataset_arg, galaxy_ext ), gatk_filename )
+                if galaxy_ext == "bam":
+                    bam_filenames.append( gatk_filename )
+                if galaxy_ext == 'fasta':
+                    subprocess.check_call( 'samtools faidx "%s"' % gatk_filename, shell=True )
+                    subprocess.check_call( 'java -jar %s R=%s O=%s QUIET=true' % ( os.path.join(os.environ['JAVA_JAR_PATH'], 'CreateSequenceDictionary.jar'), gatk_filename, os.path.splitext(gatk_filename)[0] + '.dict' ), shell=True )
+        index_bam_files( bam_filenames )
+        #set up stdout and stderr output options
+        stdout = open_file_from_option( options.stdout, mode = 'wb' )
+        stderr = open_file_from_option( options.stderr, mode = 'wb' )
+        #if no stderr file is specified, we'll use our own
+        if stderr is None:
+            stderr = tempfile.NamedTemporaryFile( prefix="gatk-stderr-", dir=tmp_dir )
+
+        proc = subprocess.Popen( args=cmd, stdout=stdout, stderr=stderr, shell=True, cwd=tmp_dir )
+        return_code = proc.wait()
+
+        if return_code:
+            stderr_target = sys.stderr
         else:
-            break
-    stderr.close()
+            stderr_target = sys.stdout
+        stderr.flush()
+        stderr.seek(0)
+        while True:
+            chunk = stderr.read( CHUNK_SIZE )
+            if chunk:
+                stderr_target.write( chunk )
+            else:
+                break
+        stderr.close()
+    finally:
+        cleanup_before_exit( tmp_dir )
+
     #generate html reports
     if options.html_report_from_directory:
         for ( html_filename, html_dir ) in options.html_report_from_directory:
             html_report_from_directory( open( html_filename, 'wb' ), html_dir )
-
-    cleanup_before_exit( tmp_dir )
+
 
-if __name__=="__main__": __main__()
+if __name__ == "__main__":
+    __main__()
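The wrapper change above moves the temporary-directory cleanup into a `finally` clause and replaces the `Popen`/`wait` pair in `index_bam_files` with `subprocess.check_call`. The sketch below shows that pattern in isolation. It is written for Python 3 and passes an argument list, whereas the wrapper itself is Python 2 and passes a shell string. `run_in_scratch_dir` is a hypothetical helper, and the child commands are harmless stand-ins for `samtools` and GATK.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_in_scratch_dir(command):
    """Run `command` (an argument list) inside a throwaway directory.

    Returns (return_code, scratch_dir) so the caller can confirm the cleanup.
    """
    scratch_dir = tempfile.mkdtemp(prefix="tmp-gatk-")
    try:
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command, cwd=scratch_dir)
        return 0, scratch_dir
    except subprocess.CalledProcessError as err:
        return err.returncode, scratch_dir
    finally:
        # Runs on success, on failure, and on unexpected exceptions alike,
        # so no code path needs its own cleanup call.
        if os.path.exists(scratch_dir):
            shutil.rmtree(scratch_dir)


# Stand-in commands: one that succeeds and one that exits with status 3.
ok_code, ok_dir = run_in_scratch_dir([sys.executable, "-c", "pass"])
bad_code, bad_dir = run_in_scratch_dir(
    [sys.executable, "-c", "import sys; sys.exit(3)"]
)
```

Putting the cleanup in `finally` is what lets the changeset drop the `tmp_dir` parameter from `index_bam_files`: the helper no longer has to clean up the caller's directory before raising.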
--- a/haplotype_caller.xml Sat Jan 18 07:00:26 2014 -0500
+++ b/haplotype_caller.xml Tue Mar 11 07:42:09 2014 -0400
@@ -158,7 +158,7 @@
     <!-- load the available annotations from an external configuration file, since additional ones can be added to local installs -->
     <options from_data_table="gatk2_annotations">
         <filter type="multiple_splitter" column="tools_valid_for" separator=","/>
-        <filter type="static_value" value="UnifiedGenotyper" column="tools_valid_for"/>
+        <filter type="static_value" value="HaplotypeCaller" column="tools_valid_for"/>
     </options>
 </param>
 <repeat name="additional_annotations" title="Additional annotation" help="-A,--annotation &lt;annotation&gt;">
@@ -191,7 +191,7 @@
     <!-- load the available annotations from an external configuration file, since additional ones can be added to local installs -->
     <options from_data_table="gatk2_annotations">
         <filter type="multiple_splitter" column="tools_valid_for" separator=","/>
-        <filter type="static_value" value="UnifiedGenotyper" column="tools_valid_for"/>
+        <filter type="static_value" value="HaplotypeCaller" column="tools_valid_for"/>
     </options>
 </param>
--- a/tool_dependencies.xml Sat Jan 18 07:00:26 2014 -0500
+++ b/tool_dependencies.xml Tue Mar 11 07:42:09 2014 -0400
@@ -15,6 +15,38 @@
     </set_environment>
 
     <package name="samtools" version="0.1.19">
-        <repository changeset_revision="54195f1d4b0f" name="package_samtools_0_1_19" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+        <repository changeset_revision="9f412e12b103" name="package_samtools_0_1_19" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+    </package>
+    <package name="picard" version="1.56.0">
+        <repository changeset_revision="7206dbf34dcd" name="package_picard_1_56_0" owner="devteam" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+    </package>
+
+    <package name="gatk2_r_dependencies" version="2.8">
+        <install version="1.0">
+            <actions>
+                <action type="setup_r_environment">
+
+                    <repository changeset_revision="2c0a13200a73" name="package_r_2_11_0" owner="devteam" toolshed="http://testtoolshed.g2.bx.psu.edu">
+                        <package name="R" version="2.11.0" />
+                    </repository>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/colorspace_1.2-4.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/stringr_0.6.2.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/RColorBrewer_1.0-5.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/dichromat_2.0-0.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/munsell_0.4.2.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/labeling_0.2.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/plyr_1.8.1.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/digest_0.6.4.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/gtable_0.1.2.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/reshape2_1.2.2.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/scales_0.2.3.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/proto_0.3-10.tar.gz</package>
+                    <package>https://github.com/bgruening/download_store/raw/master/gatk2_R_deps/ggplot2_0.9.3.1.tar.gz</package>
+                </action>
+            </actions>
+        </install>
+        <readme>
+            R depencies for GATK2.
+        </readme>
     </package>
 </tool_dependency>
--- a/unified_genotyper.xml Sat Jan 18 07:00:26 2014 -0500
+++ b/unified_genotyper.xml Tue Mar 11 07:42:09 2014 -0400
@@ -22,7 +22,7 @@
         \$GATK2_SITE_OPTIONS
 
         ## according to http://www.broadinstitute.org/gatk/guide/article?id=1975
-        --num_cpu_threads_per_data_thread 6
+        --num_cpu_threads_per_data_thread 1
 
         #if $reference_source.reference_source_selector != "history":
             -R "${reference_source.ref_file.fields.path}"
--- a/variant_recalibrator.xml Sat Jan 18 07:00:26 2014 -0500
+++ b/variant_recalibrator.xml Tue Mar 11 07:42:09 2014 -0400
@@ -63,15 +63,12 @@
         --maxIterations "${analysis_param_type.max_iterations}"
         --numKMeans "${analysis_param_type.num_k_means}"
         --stdThreshold "${analysis_param_type.std_threshold}"
-        --qualThreshold "${analysis_param_type.qual_threshold}"
         --shrinkage "${analysis_param_type.shrinkage}"
         --dirichlet "${analysis_param_type.dirichlet}"
         --priorCounts "${analysis_param_type.prior_counts}"
-        #if str( $analysis_param_type.bad_variant_selector.bad_variant_selector_type ) == 'percent':
-            --percentBadVariants "${analysis_param_type.bad_variant_selector.percent_bad_variants}"
-        #else:
-            --minNumBadVariants "${analysis_param_type.bad_variant_selector.min_num_bad_variants}"
-        #end if
+
+        --minNumBadVariants "${analysis_param_type.min_num_bad_variants}"
+
         --target_titv "${analysis_param_type.target_titv}"
         #for $tranche in [ $tranche.strip() for $tranche in str( $analysis_param_type.ts_tranche ).split( ',' ) if $tranche.strip() ]
             --TStranche "${tranche}"
@@ -83,7 +80,6 @@
             #end if
             --ignore_filter "${ignore_filter_name}"
         #end for
-        --ts_filter_level "${analysis_param_type.ts_filter_level}"
         '
     #end if
@@ -100,7 +96,7 @@
     <param name="input_variants" type="data" format="vcf" label="Variant file to recalibrate" />
 </repeat>
 <param name="ref_file" type="select" label="Using reference genome" help="-R,--reference_sequence &lt;reference_sequence&gt;">
-    <options from_data_table="gatk_picard_indexes">
+    <options from_data_table="gatk2_picard_indexes">
         <!-- <filter type="data_meta" key="dbkey" ref="variants[0].input_variants" column="dbkey"/> -->
     </options>
     <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
@@ -114,7 +110,7 @@
     </when>
 </conditional>
 
-<repeat name="rod_bind" title="Binding for reference-ordered data" help="-resource,--resource &lt;resource&gt;">
+<repeat name="rod_bind" title="Binding for reference-ordered data" help="-resource,--resource &lt;resource&gt;" min="1">
     <conditional name="rod_bind_type">
         <param name="rod_bind_type_selector" type="select" label="Binding Type">
             <option value="dbsnp" selected="True">dbSNP</option>
@@ -324,26 +320,17 @@
 <expand macro="gatk_param_type_conditional" />
 
 <expand macro="analysis_type_conditional">
-    <param name="max_gaussians" type="integer" label="maximum number of Gaussians to try during variational Bayes Algorithm" value="10" help="-mG,--maxGaussians &lt;maxGaussians&gt;"/>
-    <param name="max_iterations" type="integer" label="maximum number of maximum number of VBEM iterations to be performed in variational Bayes Algorithm" value="100" help="-mI,--maxIterations &lt;maxIterations&gt;"/>
-    <param name="num_k_means" type="integer" label="number of k-means iterations to perform in order to initialize the means of the Gaussians in the Gaussian mixture model" value="30" help="-nKM,--numKMeans &lt;numKMeans&gt;"/>
-    <param name="std_threshold" type="float" label="If a variant has annotations more than -std standard deviations away from mean then don't use it for building the Gaussian mixture model." value="8.0" help="-std,--stdThreshold &lt;stdThreshold&gt;"/>
-    <param name="qual_threshold" type="float" label="If a known variant has raw QUAL value less than -qual then don't use it for building the Gaussian mixture model." value="80.0" help="-qual,--qualThreshold &lt;qualThreshold&gt;"/>
+    <param name="max_gaussians" type="integer" label="maximum number of Gaussians to try during variational Bayes Algorithm" value="8" help="-mG,--maxGaussians &lt;maxGaussians&gt;"/>
+    <param name="max_iterations" type="integer" label="maximum number of maximum number of VBEM iterations to be performed in variational Bayes Algorithm" value="150" help="-mI,--maxIterations &lt;maxIterations&gt;"/>
+    <param name="num_k_means" type="integer" label="number of k-means iterations to perform in order to initialize the means of the Gaussians in the Gaussian mixture model" value="100" help="-nKM,--numKMeans &lt;numKMeans&gt;"/>
+    <param name="std_threshold" type="float" label="If a variant has annotations more than -std standard deviations away from mean then don't use it for building the Gaussian mixture model." value="10.0" help="-std,--stdThreshold &lt;stdThreshold&gt;"/>
     <param name="shrinkage" type="float" label="shrinkage parameter in variational Bayes algorithm" value="1.0" help="-shrinkage,--shrinkage &lt;shrinkage&gt;"/>
     <param name="dirichlet" type="float" label="dirichlet parameter in variational Bayes algorithm" value="0.001" help="-dirichlet,--dirichlet &lt;dirichlet&gt;"/>
     <param name="prior_counts" type="float" label="number of prior counts to use in variational Bayes algorithm" value="20.0" help="-priorCounts,--priorCounts &lt;priorCounts&gt;"/>
-    <conditional name="bad_variant_selector">
-        <param name="bad_variant_selector_type" type="select" label="How to specify bad variants">
-            <option value="percent" selected="True">Percent</option>
-            <option value="min_num">Number</option>
-        </param>
-        <when value="percent">
-            <param name="percent_bad_variants" type="float" label="percentage of the worst scoring variants to use when building the Gaussian mixture model of bad variants. 0.07 means bottom 7 percent." value="0.03" help="-percentBad,--percentBadVariants &lt;percentBadVariants&gt;"/>
-        </when>
-        <when value="min_num">
-            <param name="min_num_bad_variants" type="integer" label="minimum amount of worst scoring variants to use when building the Gaussian mixture model of bad variants. Will override -percentBad arugment if necessary" value="2000" help="-minNumBad,--minNumBadVariants &lt;minNumBadVariants&gt;"/>
-        </when>
-    </conditional>
+    <!--<param name="trustAllPolymorphic" type="boolean" label="trustAllPolymorphic" truevalue="-/-trustAllPolymorphic=true" falsevalue="-/-trustAllPolymorphic=false"
+        help="Trust that all the input training sets' unfiltered records contain only polymorphic sites to drastically speed up the computation. -trustAllPolymorphic" />-->
+
+    <param name="min_num_bad_variants" type="integer" label="minimum amount of worst scoring variants to use when building the Gaussian mixture model of bad variants. Will override -percentBad arugment if necessary" value="1000" help="-minNumBad,--minNumBadVariants &lt;minNumBadVariants&gt;"/>
     <param name="target_titv" type="float" label="expected novel Ti/Tv ratio to use when calculating FDR tranches and for display on optimization curve output figures. (approx 2.15 for whole genome experiments). ONLY USED FOR PLOTTING PURPOSES!" value="2.15" help="-titv,--target_titv &lt;target_titv&gt;"/>
     <param name="ts_tranche" type="text" label="levels of novel false discovery rate (FDR, implied by ti/tv) at which to slice the data. (in percent, that is 1.0 for 1 percent)" value="100.0, 99.9, 99.0, 90.0" help="-tranche,--TStranche &lt;TStranche&gt;"/>
     <repeat name="ignore_filters" title="Ignore Filter" help="-ignoreFilter,--ignore_filter &lt;ignore_filter&gt;">
@@ -360,7 +347,6 @@
             <when value="LowQual" />
         </conditional>
     </repeat>
-    <param name="ts_filter_level" type="float" label="truth sensitivity level at which to start filtering, used here to indicate filtered variants in plots" value="99.0" help="-ts_filter_level,--ts_filter_level &lt;ts_filter_level&gt;"/>
 </expand>
 </inputs>
 <outputs>
@@ -410,7 +396,6 @@
  maxIterations The maximum number of VBEM iterations to be performed in variational Bayes algorithm. Procedure will normally end when convergence is detected.
  numKMeans The number of k-means iterations to perform in order to initialize the means of the Gaussians in the Gaussian mixture model.
  stdThreshold If a variant has annotations more than -std standard deviations away from mean then don't use it for building the Gaussian mixture model.
- qualThreshold If a known variant has raw QUAL value less than -qual then don't use it for building the Gaussian mixture model.
  shrinkage The shrinkage parameter in variational Bayes algorithm.
  dirichlet The dirichlet parameter in variational Bayes algorithm.
  priorCounts The number of prior counts to use in variational Bayes algorithm.
@@ -423,7 +408,6 @@
  path_to_Rscript The path to your implementation of Rscript. For Broad users this is maybe /broad/tools/apps/R-2.6.0/bin/Rscript
  rscript_file The output rscript file generated by the VQSR to aid in visualization of the input data and learned model
  path_to_resources Path to resources folder holding the Sting R scripts.
- ts_filter_level The truth sensitivity level at which to start filtering, used here to indicate filtered variants in plots
 
 @CITATION_SECTION@
 </help>
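The `#for $tranche in [...]` loop kept as context in the command template turns the comma-separated `ts_tranche` text parameter into one `--TStranche` argument per non-empty entry. A rough Python equivalent is sketched below. `tranche_arguments` is a hypothetical name, and the sketch returns an argument list where the template emits a quoted string.

```python
def tranche_arguments(ts_tranche):
    """Expand a comma-separated tranche string into --TStranche arguments."""
    # Same comprehension as the template: split on commas, strip whitespace,
    # and drop entries that are empty after stripping.
    levels = [t.strip() for t in str(ts_tranche).split(",") if t.strip()]
    args = []
    for level in levels:
        args.extend(["--TStranche", level])
    return args


# The default value of the ts_tranche parameter in variant_recalibrator.xml.
default_args = tranche_arguments("100.0, 99.9, 99.0, 90.0")
```

Stray commas or spaces in the text field therefore produce no empty `--TStranche ""` arguments.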