# HG changeset patch
# User Jim Johnson
# Date 1380895769 18000
# Node ID 6ad9205c1307f2e2caef91937ad00030dd2472d7
# Parent bea875f081e8ada2df1da6bd5da873a1c4c329d5
Update to SnpEff version 3.3

diff -r bea875f081e8 -r 6ad9205c1307 README
--- a/README Thu Jul 04 09:49:15 2013 -0500
+++ b/README Fri Oct 04 09:09:29 2013 -0500
@@ -1,15 +1,18 @@
-These are galaxy tools for Snp Effect ( http://snpeff.sourceforge.net/ )
+These are galaxy tools for SnpEff ( http://snpeff.sourceforge.net/ )
+
+This repository contains a tool_dependencies.xml file that will attempt to automatically install SnpEff and SnpSift.
+
+This will use the default location for genome reference downloads from the snpEff.config:
+data_dir = ~/snpEff/data/
+You can manually edit the installed snpEff.config and change the location, or you can create a symbolic link to the desired data location from ~/snpEff.
+
 The genome reference options used by the tools:
   "SnpEff" snpEff.xml
   "SnpEff Download" snpEff_download.xml
 are taken from: tool-data/snpeffect_genomedb.loc
-The tool-data/snpeffect_genomedb.loc.sample file has the genomes references from the SnpEffect dwnloads page:
-http://snpeff.sourceforge.net/download.html
-The values for snpeffect_genomedb.loc.sample were populated by:
-java -jar snpEff.jar cfg2table galaxy | grep 'option' | sed 's/^.*value="\([^"]*\)">\([^<]*\).*$/\1#\2/' | tr '#' '\t' >> snpeffect_genomedb.loc.sample
+There are 2 datamanagers to download and install prebuilt SnpEff Genome databases:
+  data_manager_snpeff_databases - generates a list of available SnpEff genome databases into the tool-data/snpeff_databases.loc
+  data_manager_snpeff_download - downloads a SnpEff genome database selected from: tool-data/snpeff_databases.loc and adds entries to snpeff_genomedb.loc,snpeff_regulationdb.loc,snpeff_annotations.loc
-This repository contains a tool_dependencies.xml file that will allow SnpEff and SnpSift to be automatically installed.
-This will use the default location for genome reference downloads from the snpEff.config:
-data_dir = ~/snpEff/data/
diff -r bea875f081e8 -r 6ad9205c1307 data_manager/data_manager_snpEff_databases.py
--- a/data_manager/data_manager_snpEff_databases.py Thu Jul 04 09:49:15 2013 -0500
+++ b/data_manager/data_manager_snpEff_databases.py Fri Oct 04 09:09:29 2013 -0500
@@ -46,7 +46,7 @@
                 genome_version = fields[0].strip()
                 if genome_version.startswith("Genome") or genome_version.startswith("-"):
                     continue
-                description = fields[1].strip()
+                description = fields[1].strip() + ' : ' + genome_version
                 data_table_entries.append(dict(value=genome_version, name=description))
         data_manager_dict['data_tables']['snpeff_databases'] = data_table_entries
     except Exception, e:
diff -r bea875f081e8 -r 6ad9205c1307 data_manager/data_manager_snpEff_databases.xml
--- a/data_manager/data_manager_snpEff_databases.xml Thu Jul 04 09:49:15 2013 -0500
+++ b/data_manager/data_manager_snpEff_databases.xml Fri Oct 04 09:09:29 2013 -0500
@@ -1,10 +1,10 @@
-
+
 Read the list of available snpEff databases
-snpEff
+snpEff
-data_manager_snpEff_databases.py --jar_path \$JAVA_JAR_PATH/snpEff.jar "$out_file"
+data_manager_snpEff_databases.py --jar_path \$SNPEFF_JAR_PATH/snpEff.jar "$out_file"
diff -r bea875f081e8 -r 6ad9205c1307 data_manager/data_manager_snpEff_download.py
--- a/data_manager/data_manager_snpEff_download.py Thu Jul 04 09:49:15 2013 -0500
+++ b/data_manager/data_manager_snpEff_download.py Fri Oct 04 09:09:29 2013 -0500
@@ -22,7 +22,7 @@
 # Download human database 'hg19'
 java -jar snpEff.jar download -v hg19
-java -jar \$JAVA_JAR_PATH/snpEff.jar download -c \$JAVA_JAR_PATH/snpEff.config $genomeVersion > $logfile
+java -jar \$SNPEFF_JAR_PATH/snpEff.jar download -c \$JAVA_JAR_PATH/snpEff.config $genomeVersion > $logfile
 snpEffectPredictor.bin
 regulation_HeLa-S3.bin
@@ -65,6 +65,8 @@
         sys.exit( return_code )
     ## search data_dir/genome_version for files
     regulation_pattern = 'regulation_(.+).bin'
+    # annotation files that are included in snpEff by a flag
+    annotations_dict = {'nextProt.bin' : '-nextprot','motif.bin': '-motif'}
     genome_path = os.path.join(data_dir,genome_version)
     if os.path.isdir(genome_path):
         for root, dirs, files in os.walk(genome_path):
@@ -78,8 +80,13 @@
                     m = re.match(regulation_pattern,fname)
                     if m:
                         name = m.groups()[0]
-                        data_table_entry = dict(value=genome_version, name=name)
+                        data_table_entry = dict(genome=genome_version,value=name, name=name)
                         _add_data_table_entry( data_manager_dict, 'snpeff_regulationdb', data_table_entry )
+                    elif fname in annotations_dict:
+                        value = annotations_dict[fname]
+                        name = value.lstrip('-')
+                        data_table_entry = dict(genome=genome_version,value=value, name=name)
+                        _add_data_table_entry( data_manager_dict, 'snpeff_annotations', data_table_entry )
     return data_manager_dict
 def _add_data_table_entry( data_manager_dict, data_table, data_table_entry ):
diff -r bea875f081e8 -r 6ad9205c1307 data_manager/data_manager_snpEff_download.xml
--- a/data_manager/data_manager_snpEff_download.xml Thu Jul 04 09:49:15 2013 -0500
+++ b/data_manager/data_manager_snpEff_download.xml Fri Oct 04 09:09:29 2013 -0500
@@ -1,10 +1,10 @@
-
+
 Download a new database
-snpEff
+snpEff
-data_manager_snpEff_download.py --jar_path \$JAVA_JAR_PATH/snpEff.jar --config \$JAVA_JAR_PATH/snpEff.config
+data_manager_snpEff_download.py --jar_path \$SNPEFF_JAR_PATH/snpEff.jar --config \$SNPEFF_JAR_PATH/snpEff.config
 --genome_version "${genome_databases.fields.value}" --organism "${genome_databases.fields.name}" "$out_file"
diff -r bea875f081e8 -r 6ad9205c1307 data_manager_conf.xml
--- a/data_manager_conf.xml Thu Jul 04 09:49:15 2013 -0500
+++ b/data_manager_conf.xml Fri Oct 04 09:09:29 2013 -0500
@@ -4,8 +4,7 @@
-
-
+
@@ -13,15 +12,28 @@
-
+
+
+
+snpEff/data
+
+${GALAXY_DATA_MANAGER_DATA_PATH}/snpEff/data
+abspath
+
-
-
+
+
+
+
+
+
+
+
diff -r bea875f081e8 -r 6ad9205c1307 snpEff.xml
--- a/snpEff.xml Thu Jul 04 09:49:15 2013 -0500
+++ b/snpEff.xml Fri Oct 04 09:09:29 2013 -0500
@@ -1,4 +1,4 @@
-
+
 Variant effect and annotation
-snpEff
+snpEff
-SNPEFF_DATA_DIR=`grep '^data_dir' \$JAVA_JAR_PATH/snpEff.config | sed 's/.*data_dir.*[=:]//'`;
+SNPEFF_DATA_DIR=`grep '^data_dir' \$SNPEFF_JAR_PATH/snpEff.config | sed 's/.*data_dir.*[=:]//'`;
 eval "if [ ! -e \$SNPEFF_DATA_DIR/$genomeVersion ] ;
-then java -Xmx6G -jar \$JAVA_JAR_PATH/snpEff.jar download -c \$JAVA_JAR_PATH/snpEff.config $genomeVersion ;
+then java -Xmx6G -jar \$SNPEFF_JAR_PATH/snpEff.jar download -c \$SNPEFF_JAR_PATH/snpEff.config $genomeVersion ;
 fi";
-java -Xmx6G -jar \$JAVA_JAR_PATH/snpEff.jar eff -c \$JAVA_JAR_PATH/snpEff.config -i $inputFormat -o $outputFormat -upDownStreamLen $udLength
+java -Xmx6G -jar \$SNPEFF_JAR_PATH/snpEff.jar eff -c \$SNPEFF_JAR_PATH/snpEff.config -i $inputFormat -o $outputFormat -upDownStreamLen $udLength
 #if $spliceSiteSize and $spliceSiteSize.__str__ != '':
   -spliceSiteSize $spliceSiteSize
 #end if
 #if $filterIn and $filterIn.__str__ != 'no_filter':
-  -$filterIn
+  $filterIn
 #end if
 #if $filterHomHet and $filterHomHet.__str__ != 'no_filter':
-  -$filterHomHet
+  $filterHomHet
 #end if
 #if $annotations and $annotations.__str__ != '':
-  -#slurp
-  #echo ' -'.join($annotations.__str__.split(','))
+  #echo ' '.join($annotations.__str__.split(','))
+#end if
+#if $extra_annotations and $extra_annotations.__str__ != '':
+  #echo ' '.join($extra_annotations.__str__.split(','))
+#end if
+#if $regulation and $regulation.__str__ != '':
+  -reg #echo ' -reg '.join($regulation.__str__.split(','))#
 #end if
 #if $filterOut and $filterOut.__str__ != '':
-  -#slurp
-  #echo ' -'.join($filterOut.__str__.split(','))
+  #echo ' '.join($filterOut.__str__.split(','))
 #end if
 #if str( $transcripts ) != 'None':
   -onlyTr $transcripts
@@ -96,7 +100,7 @@
   -stats $statsFile
 #end if
 #if $offset.__str__ != '':
-  -${offset}
+  ${offset}
 #end if
 #if $chr.__str__.strip() != '':
   -chr "$chr"
@@ -150,35 +154,43 @@
-
-
+
+
-
-
-
-
+
+
+
+
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+These are available for only a few genomes
+
+
+
+These are available for only a few genomes
-
-
+
+
@@ -186,21 +198,21 @@
-
-
-
-
-
+
+
+
+
+
-
-
+
+
 ^\S*$
-
+
@@ -233,9 +245,7 @@
-
+
@@ -258,7 +268,7 @@
-
+
-snpEff
+snpEff
-java -Xmx6G -jar \$JAVA_JAR_PATH/SnpSift.jar $annotate_cmd
+java -Xmx6G -jar \$SNPEFF_JAR_PATH/SnpSift.jar $annotate_cmd
 #if $annotate.id :
   -id
 #elif $annotate.info_ids.__str__.strip() != '' :
diff -r bea875f081e8 -r 6ad9205c1307 snpSift_caseControl.xml
--- a/snpSift_caseControl.xml Thu Jul 04 09:49:15 2013 -0500
+++ b/snpSift_caseControl.xml Fri Oct 04 09:09:29 2013 -0500
@@ -1,14 +1,14 @@
-
+
 Count samples are in 'case' and 'control' groups.
-snpEff
+snpEff
-java -Xmx1G -jar \$JAVA_JAR_PATH/SnpSift.jar caseControl -q
+java -Xmx1G -jar \$SNPEFF_JAR_PATH/SnpSift.jar caseControl -q
 #if $name.__str__.strip() != '':
   -name $name
 #end if
diff -r bea875f081e8 -r 6ad9205c1307 snpSift_filter.xml
--- a/snpSift_filter.xml Thu Jul 04 09:49:15 2013 -0500
+++ b/snpSift_filter.xml Fri Oct 04 09:09:29 2013 -0500
@@ -1,11 +1,11 @@
-
+
 Filter variants using arbitrary expressions
-snpEff
+snpEff
-java -Xmx6G -jar \$JAVA_JAR_PATH/SnpSift.jar filter -f $input -e $exprFile $inverse $pass
+java -Xmx6G -jar \$SNPEFF_JAR_PATH/SnpSift.jar filter -f $input -e $exprFile $inverse $pass
 #if $filterId and len($filterId.__str__.strip()) > 0:
   --filterId = "$filterId"
 #end if
diff -r bea875f081e8 -r 6ad9205c1307 snpSift_int.xml
--- a/snpSift_int.xml Thu Jul 04 09:49:15 2013 -0500
+++ b/snpSift_int.xml Fri Oct 04 09:09:29 2013 -0500
@@ -1,14 +1,14 @@
-
+
 Filter variants using intervals
-snpEff
+snpEff
-java -Xmx2G -jar \$JAVA_JAR_PATH/SnpSift.jar intervals -i $input $exclude $bedFile > $output
+java -Xmx2G -jar \$SNPEFF_JAR_PATH/SnpSift.jar intervals -i $input $exclude $bedFile > $output
diff -r bea875f081e8 -r 6ad9205c1307 tool-data/snpeff_annotations.loc.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool-data/snpeff_annotations.loc.sample Fri Oct 04 09:09:29 2013 -0500
@@ -0,0 +1,5 @@
+## Regulation Databases for SnpEff
+## These are from the list on: http://snpeff.sourceforge.net/download.html
+#genome annotation_name description
+#GRCh37.71 nextprot nextprot
+#GRCh37.71 motif motif
diff -r bea875f081e8 -r 6ad9205c1307 tool-data/snpeff_regulationdb.loc.sample
--- a/tool-data/snpeff_regulationdb.loc.sample Thu Jul 04 09:49:15 2013 -0500
+++ b/tool-data/snpeff_regulationdb.loc.sample Fri Oct 04 09:09:29 2013 -0500
@@ -1,5 +1,4 @@
-## Databases for SnpEff
+## Regulation Databases for SnpEff
 ## These are from the list on: http://snpeff.sourceforge.net/download.html
-## the Description field in this sample is "Genome : Version"
-#Genome Regulation_Name
-#GRCh37.70 CD4
+#genome regulation_name description
+#GRCh37.70 CD4 CD4
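
Reviewer note: the sketch below restates, in standalone Python 3, the file classification this changeset adds to data_manager_snpEff_download.py and the `-reg` joining added to the snpEff.xml command template. The regulation pattern and the annotation mapping are copied from the hunks above. The function names `classify` and `reg_flags` are hypothetical and do not appear in the patch, and the sketch assumes a flat list of file names rather than the script's `os.walk` traversal.

```python
import re

# Copied from the data_manager_snpEff_download.py hunk in this changeset.
REGULATION_PATTERN = 'regulation_(.+).bin'
ANNOTATIONS = {'nextProt.bin': '-nextprot', 'motif.bin': '-motif'}


def classify(genome_version, filenames):
    """Hypothetical helper: return (data_table, entry) pairs for a genome directory listing."""
    rows = []
    for fname in filenames:
        m = re.match(REGULATION_PATTERN, fname)
        if m:
            # regulation_<name>.bin becomes a row in snpeff_regulationdb
            name = m.group(1)
            rows.append(('snpeff_regulationdb',
                         dict(genome=genome_version, value=name, name=name)))
        elif fname in ANNOTATIONS:
            # the stored value is the snpEff flag; the display name drops the dash
            value = ANNOTATIONS[fname]
            rows.append(('snpeff_annotations',
                         dict(genome=genome_version, value=value, name=value.lstrip('-'))))
    return rows


def reg_flags(selection):
    """Hypothetical helper: mirror the template's -reg expansion for a comma-separated selection."""
    return '-reg ' + ' -reg '.join(selection.split(','))


if __name__ == '__main__':
    listing = ['snpEffectPredictor.bin', 'regulation_HeLa-S3.bin', 'nextProt.bin', 'motif.bin']
    for table, entry in classify('GRCh37.71', listing):
        print(table, entry)
    print(reg_flags('CD4,HeLa-S3'))
```

Because each row now carries a `genome` field, a tool form can presumably filter the regulation and annotation choices by the selected genome, which the earlier `value=genome_version` layout did not allow.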