Mercurial > repos > bgruening > upload_testing
changeset 58:d602d8b1dc4f
Uploaded
author | bgruening |
---|---|
date | Tue, 13 Aug 2013 09:38:07 -0400 |
parents | 5b919ef94655 |
children | b2e673e1db33 |
files | tool-data/homer_available_genomes.loc.sample tools/README tools/annotatePeaks.xml tools/bed2pos.xml tools/findMotifsGenome.xml tools/findPeaks.xml tools/homer_macros.xml tools/makeTagDirectory.py tools/makeTagDirectory.xml tools/pos2bed.xml |
diffstat | 10 files changed, 134 insertions(+), 477 deletions(-) |
--- a/tool-data/homer_available_genomes.loc.sample Mon Aug 12 14:39:25 2013 -0400
+++ b/tool-data/homer_available_genomes.loc.sample Tue Aug 13 09:38:07 2013 -0400
@@ -2,5 +2,3 @@
 hg19
 mm9
 mm10
-
-
--- a/tools/README Mon Aug 12 14:39:25 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,15 +0,0 @@
-Homer wrapper for Galaxy
-
-The homer tools will need to be accessible from command line
-
-Code repo: https://bitbucket.org/gvl/homer
-
-=========================================:
-LICENSE for this wrapper:
-=========================================:
-Kevin Ying
-Garvan Institute: http://www.garvan.org.au
-GVL: https://genome.edu.au/wiki/GVL
-
-http://opensource.org/licenses/mit-license.php
-
--- a/tools/annotatePeaks.xml Mon Aug 12 14:39:25 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,164 +0,0 @@ -<tool id="homer_annotatePeaks" name="homer_annotatePeaks" version="0.0.5"> - <requirements> - <requirement type="package" version="4.1">homer</requirement> - </requirements> - <description></description> - <!--<version_command></version_command>--> - <command> - annotatePeaks.pl $input_bed $genome_selector 1> $out_annotated - 2> $out_log || echo "Error running annotatePeaks." >&2 - </command> - <inputs> - <param format="tabular,bed" name="input_bed" type="data" label="Homer peaks OR BED format"/> - <param name="genome_selector" type="select" label="Genome version"> - <option value="hg19" selected="true">hg19</option> - </param> - <param type="text" name="options" label="Extra options" value="" help="See link below for more options"> - <sanitizer> - <valid initial="string.printable"> - <remove value="'"/> - <remove value="/"/> - </valid> - <mapping initial="none"> - <add source="'" target="__sq__"/> - </mapping> - </sanitizer> - </param> - </inputs> - <outputs> - <!--<data format="html" name="html_outfile" label="index" />--> - <!--<data format="html" hidden="True" name="html_outfile" label="index.html" />--> - <data format="csv" name="out_annotated" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}" /> - <data format="txt" name="out_log" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#_genome_${genome_selector}.log" /> - </outputs> - <tests> - <test> - <!--<param name="input_file" value="extract_genomic_dna.fa" />--> - <!--<output name="html_file" file="sample_output.html" ftype="html" />--> - </test> - </tests> - - <help> - - .. 
class:: infomark - - **Homer annoatePeaks** - - More information on accepted formats and options - - http://biowhat.ucsd.edu/homer/ngs/annotation.html - - TIP: use homer_bed2pos and homer_pos2bed to convert between the homer peak positions and the BED format. - -**Parameter list** - -Command line options (not all of them are supported):: - - Usage: annotatePeaks.pl <peak file | tss> <genome version> [additional options...] - - Available Genomes (required argument): (name,org,directory,default promoter set) - -- or -- - Custom: provide the path to genome FASTA files (directory or single file) - - User defined annotation files (default is UCSC refGene annotation): - annotatePeaks.pl accepts GTF (gene transfer formatted) files to annotate positions relative - to custom annotations, such as those from de novo transcript discovery or Gencode. - -gtf <gtf format file> (-gff and -gff3 can work for those files, but GTF is better) - - Peak vs. tss/tts/rna mode (works with custom GTF file): - If the first argument is "tss" (i.e. annotatePeaks.pl tss hg18 ...) then a TSS centric - analysis will be carried out. Tag counts and motifs will be found relative to the TSS. - (no position file needed) ["tts" now works too - e.g. 3' end of gene] - ["rna" specifies gene bodies, will automaticall set "-size given"] - NOTE: The default TSS peak size is 4000 bp, i.e. +/- 2kb (change with -size option) - -list <gene id list> (subset of genes to perform analysis [unigene, gene id, accession, - probe, etc.], default = all promoters) - -cTSS <promoter position file i.e. peak file> (should be centered on TSS) - - Primary Annotation Options: - -mask (Masked repeats, can also add 'r' to end of genome name) - -m <motif file 1> [motif file 2] ... 
(list of motifs to find in peaks) - -mscore (reports the highest log-odds score within the peak) - -nmotifs (reports the number of motifs per peak) - -mdist (reports distance to closest motif) - -mfasta <filename> (reports sites in a fasta file - for building new motifs) - -fm <motif file 1> [motif file 2] (list of motifs to filter from above) - -rmrevopp <#> (only count sites found within <#> on both strands once, i.e. palindromic) - -matrix <prefix> (outputs a motif co-occurrence files: - prefix.count.matrix.txt - number of peaks with motif co-occurrence - prefix.ratio.matrix.txt - ratio of observed vs. expected co-occurrence - prefix.logPvalue.matrix.txt - co-occurrence enrichment - prefix.stats.txt - table of pair-wise motif co-occurrence statistics - additional options: - -matrixMinDist <#> (minimum distance between motif pairs - to avoid overlap) - -matrixMaxDist <#> (maximum distance between motif pairs) - -mbed <filename> (Output motif positions to a BED file to load at UCSC (or -mpeak)) - -mlogic <filename> (will output stats on common motif orientations) - -d <tag directory 1> [tag directory 2] ... (list of experiment directories to show - tag counts for) NOTE: -dfile <file> where file is a list of directories in first column - -bedGraph <bedGraph file 1> [bedGraph file 2] ... (read coverage counts from bedGraph files) - -wig <wiggle file 1> [wiggle file 2] ... (read coverage counts from wiggle files) - -p <peak file> [peak file 2] ... (to find nearest peaks) - -pdist to report only distance (-pdist2 gives directional distance) - -pcount to report number of peaks within region - -vcf <VCF file> (annotate peaks with genetic variation infomation, one col per individual) - -editDistance (Computes the # bp changes relative to reference) - -individuals <name1> [name2] ... (restrict analysis to these individuals) - -gene <data file> ... (Adds additional data to result based on the closest gene. - This is useful for adding gene expression data. 
The file must have a header, - and the first column must be a GeneID, Accession number, etc. If the peak - cannot be mapped to data in the file then the entry will be left empty. - -go <output directory> (perform GO analysis using genes near peaks) - -genomeOntology <output directory> (perform genomeOntology analysis on peaks) - -gsize <#> (Genome size for genomeOntology analysis, default: 2e9) - - Annotation vs. Histogram mode: - -hist <bin size in bp> (i.e 1, 2, 5, 10, 20, 50, 100 etc.) - The -hist option can be used to generate histograms of position dependent features relative - to the center of peaks. This is primarily meant to be used with -d and -m options to map - distribution of motifs and ChIP-Seq tags. For ChIP-Seq peaks for a Transcription factor - you might want to use the -center option (below) to center peaks on the known motif - ** If using "-size given", histogram will be scaled to each region (i.e. 0-100%), with - the -hist parameter being the number of bins to divide each region into. - Histogram Mode specific Options: - -nuc (calculated mononucleotide frequencies at each position, - Will report by default if extracting sequence for other purposes like motifs) - -di (calculated dinucleotide frequencies at each position) - -histNorm <#> (normalize the total tag count for each region to 1, where <#> is the - minimum tag total per region - use to avoid tag spikes from low coverage - -ghist (outputs profiles for each gene, for peak shape clustering) - -rm <#> (remove occurrences of same motif that occur within # bp) - - Peak Centering: (other options are ignored) - -center <motif file> (This will re-center peaks on the specified motif, or remove peak - if there is no motif in the peak. ONLY recentering will be performed, and all other - options will be ignored. 
This will output a new peak file that can then be reanalyzed - to reveal fine-grain structure in peaks (It is advised to use -size < 200) with this - to keep peaks from moving too far (-mirror flips the position) - -multi (returns genomic positions of all sites instead of just the closest to center) - - Advanced Options: - -len <#> / -fragLength <#> (Fragment length, default=auto, might want to set to 0 for RNA) - -size <#> (Peak size[from center of peak], default=inferred from peak file) - -size #,# (i.e. -size -10,50 count tags from -10 bp to +50 bp from center) - -size "given" (count tags etc. using the actual regions - for variable length regions) - -log (output tag counts as log2(x+1+rand) values - for scatter plots) - -sqrt (output tag counts as sqrt(x+rand) values - for scatter plots) - -strand <+|-|both> (Count tags on specific strands relative to peak, default: both) - -pc <#> (maximum number of tags to count per bp, default=0 [no maximum]) - -cons (Retrieve conservation information for peaks/sites) - -CpG (Calculate CpG/GC content) - -ratio (process tag values as ratios - i.e. chip-seq, or mCpG/CpG) - -nfr (report nuclesome free region scores instead of tag counts, also -nfrSize <#>) - -norevopp (do not search for motifs on the opposite strand [works with -center too]) - -noadj (do not adjust the tag counts based on total tags sequenced) - -norm <#> (normalize tags to this tag count, default=1e7, 0=average tag count in all directories) - -pdist (only report distance to nearest peak using -p, not peak name) - -map <mapping file> (mapping between peak IDs and promoter IDs, overrides closest assignment) - -noann, -nogene (skip genome annotation step, skip TSS annotation) - -homer1/-homer2 (by default, the new version of homer [-homer2] is used for finding motifs) - - - </help> -</tool> -
--- a/tools/bed2pos.xml Mon Aug 12 14:39:25 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,37 +0,0 @@
-<tool id="homer_bed2pos" name="homer_bed2pos" version="1.0.0">
-    <requirements>
-        <requirement type="package" version="4.1">homer</requirement>
-    </requirements>
-    <description></description>
-    <!--<version_command></version_command>-->
-    <command>
-        bed2pos.pl $input_bed 1> $out_pos
-        2> $out_log || echo "Error running bed2pos." >&2
-    </command>
-    <inputs>
-        <param format="tabular,bed" name="input_bed" type="data" label="BED file" />
-    </inputs>
-    <outputs>
-        <!--<data format="html" name="html_outfile" label="index" />-->
-        <!--<data format="html" hidden="True" name="html_outfile" label="index.html" />-->
-        <data format="tabular" name="out_pos" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#" />
-        <data format="txt" name="out_log" label="${tool.name} on #echo os.path.splitext(str($input_bed.name))[0]#.log" />
-    </outputs>
-    <tests>
-        <test>
-            <!--<param name="input_file" value="extract_genomic_dna.fa" />-->
-            <!--<output name="html_file" file="sample_output.html" ftype="html" />-->
-        </test>
-    </tests>
-
-    <help>
-    .. class:: infomark
-
-    Converts: BED -(to)-> homer peak positions
-
-    **Homer bed2pos.pl**
-
-    http://biowhat.ucsd.edu/homer/ngs/miscellaneous.html
-    </help>
-</tool>
-
--- a/tools/findMotifsGenome.xml Mon Aug 12 14:39:25 2013 -0400 +++ b/tools/findMotifsGenome.xml Tue Aug 13 09:38:07 2013 -0400 @@ -1,48 +1,116 @@ -<tool id="homer_findMotifsGenome" name="identify motifs" version="0.1.2"> +<tool id="homer_findMotifsGenome" name="identify Motifs" version="0.1.2"> + <description></description> <requirements> <requirement type="package" version="35x1">blat</requirement> <requirement type="package" version="2.8.2">weblogo</requirement> <requirement type="package" version="9.07">ghostscript</requirement> </requirements> - <description></description> - <!--<version_command></version_command>--> <command> + #import os #import tempfile - #set $tmpdir = tempfile.mkdtemp() + + #set $tmpdir = os.path.abspath( tempfile.mkdtemp() ) export PATH=\$PATH:$database.fields.path; - findMotifsGenome $infile ${infile.metadata.dbkey} $tmpdir + findMotifsGenome.pl $infile ${infile.metadata.dbkey} $tmpdir + + -p 4 + $mask + -size $size + -len $motif_len + -mis $mismatches + -S $number_of_motifs + $noweight + $cpg + -nlen $nlen + -olen $olen + $hypergeometric + $norevopp + $rna + + #if $bg_infile: + -bg $bg_infile + #end if + + #if $logfile_output: + 2> $out_logfile + #else: + 2>&1 + #end if + + ; + cp $tmpdir/knownResults.txt $known_results_tabular; + + #if $concat_motifs_output: + cp $tmpdir/homerMotifs.all.motifs $out_concat_motifs; + #end if + + #if $html_output: + #set $go_path = os.path.join($tmpdir, 'geneOntology.html') + + mkdir $denovo_results_html.files_path; + cp $tmpdir/homerResults.html $denovo_results_html; + cp $tmpdir/homerResults.html "$denovo_results_html.files_path"; + cp -r $tmpdir/homerResults/ "$denovo_results_html.files_path"; - ; - cp $tmpdir/homerResults.html $denovo_results_html; - cp -r $tmpdir/homerResults/* "$denovo_results_html.files_path"; + mkdir "$known_results_html.files_path"; + cp $tmpdir/knownResults.html $known_results_html; + cp $tmpdir/knownResults.html "$known_results_html.files_path"; + cp $tmpdir/homerResults.html 
"$known_results_html.files_path"; + cp -r $tmpdir/knownResults/ "$known_results_html.files_path"; - cp $tmpdir/knownResults.html $known_results_html; - cp -r $tmpdir/knownResults/* "$known_results_html.files_path"; + #if os.path.exists( $go_path ): + cp $go_path "$denovo_results_html.files_path"; + cp $go_path "$known_results_html.files_path"; + #end if - + #end if - 2>&1 + ##rm -rf $tmpdir </command> <inputs> - <param name="database" type="select" label="HOMER database" min="1"> - <options from_file="homer.loc"> - <column name="value" index="0"/> - <column name="name" index="1"/> - <column name="path" index="2"/> - </options> - </param> + <expand macro="input_choose_homer_version" /> <param name="infile" format="bed" type="data" label="BED file" help="a file containing genomic coordinates"> <validator type="dataset_metadata_in_file" filename="homer_available_genomes.loc" metadata_name="dbkey" metadata_column="0" message="No HOMER genome build available for your species." /> </param> + <param name="rna" type="boolean" truevalue="-rna" falsevalue="" checked="False" label="Search for RNA motifs" help="If looking at RNA data (i.e. Clip-Seq or similar), this option will restrict HOMER to only search the + strand (relative to the peak), and will output RNA motif logos (i.e. U instead of T). It will also try to compare found motifs to an RNA motif database, which sadly, only contains miRNAs right now... I guess chuck roundhouse kicked all of the splicing and other RNA motifs into hard to find databases."/> + <param name="norevopp" type="boolean" truevalue="-norevopp" falsevalue="" checked="False" label="Only search for motifs on + strand" help=""/> + <param name="hypergeometric" type="boolean" truevalue="-h" falsevalue="" checked="False" label="Hypergeometric enrichment scoring" help="By default, findMotifsGenome.pl uses the binomial distribution to score motifs. 
This works well when the number of background sequences greatly out number the target sequences - however, if you are using '-bg' option above, and the number of background sequences is smaller than target sequences, it is a good idea to use the hypergeometric distribution instead. FYI - The binomial is faster to compute, hence it's use for motif finding in large numbers of regions."/> + <param name="olen" type="integer" value="0" label="Motif level autonormalization" help ="0 means disabled"/> + <param name="nlen" type="integer" value="3" label="Region level autonormalization" help ="0 to disable"/> + <param name="noweight" type="boolean" truevalue="-noweight" falsevalue="" checked="False" label="disabling GC/CpG normalization" help=""/> + <param name="cpg" type="boolean" truevalue="-cpg" falsevalue="" checked="False" label="normalize CpG% content instead of GC% content" help=""/> + <param name="number_of_motifs" type="integer" value="25" label="Number of motifs to find" help ="The more mismatches you allow, the more sensitive the algorithm, particularly for longer motifs. However, this also slows down the algorithm a bit. If searching for motifs longer than 12-15 bp, it's best to increase this value to at least 3 or even 4."/> + <param name="mismatches" type="integer" value="2" label="Mismatches allowed in global optimization phase" help ="The more mismatches you allow, the more sensitive the algorithm, particularly for longer motifs. However, this also slows down the algorithm a bit. 
If searching for motifs longer than 12-15 bp, it's best to increase this value to at least 3 or even 4."/> + <param name="mask" type="boolean" truevalue="-mask" falsevalue="" checked="True" label="Use masked version of the genome" help=""/> + <param name="size" type="integer" value="200" label="The size of the region used for motif finding" help =" If analyzing ChIP-Seq peaks from a transcription factor, Chuck would recommend 50 bp for establishing the primary motif bound by a given transcription factor and 200 bp for finding both primary and 'co-enriched' motifs for a transcription factor. When looking at histone marked regions, 500-1000 bp is probably a good idea (i.e. H3K4me or H3/H4 acetylated regions). In theory, HOMER can work with very large regions (i.e. 10kb), but with the larger the regions comes more sequence and longer execution time."/> + <param name="motif_len" type="text" value="8,10,12" label="Specifies the length of motifs to be found" help ="HOMER will find motifs of each size separately and then combine the results at the end. The length of time it takes to find motifs increases greatly with increasing size. In general, it's best to try out enrichment with shorter lengths (i.e. less than 15) before trying longer lengths. Much longer motifs can be found with HOMER, but it's best to use smaller sets of sequence when trying to find long motifs (i.e. use '-len 20 -size 50'), otherwise it may take way too long (or take too much memory). The other trick to reduce the total resource consumption is to reduce the number of background sequences (-N #)."/> + + <param name="bg_infile" format="bed" type="data" optional="True" label="User defined background regions" help="These will still be normalized for CpG% or GC% content just like randomly chosen sequences and autonormalized unless these options are turned off (i.e. '-nlen 0 -noweight'). This can be very useful since HOMER is a differential motif discovery algorithm. 
For example, you can give HOMER a set of peaks co-bound by another factor and compare them to the rest of the peaks. HOMER will automatically check if the background peaks overlap with the target peaks using mergePeaks, and discard overlapping regions."/> + + <param name="concat_motifs_output" type="boolean" truevalue="" falsevalue="" checked="True" label="Output concatenated file composed of all motifs" help=""/> + <param name="html_output" type="boolean" truevalue="" falsevalue="" checked="True" label="Output HOMER visual summaries" help=""/> + <param name="logfile_output" type="boolean" truevalue="" falsevalue="" label="Output HOMER logfile" help=""/> + </inputs> <outputs> - <data format="html" name="denovo_results_html" label="HOMER de novo motifs" /> - <data format="html" name="known_results_html" label="HOMER known motifs" /> + <data format="tabular" name="known_results_tabular" label="HOMER known motifs" /> + <data format="html" name="denovo_results_html" label="HOMER de novo motifs"> + <filter>html_output is True</filter> + </data> + <data format="html" name="known_results_html" label="HOMER known motifs"> + <filter>html_output is True</filter> + </data> + <data format="txt" name="out_concat_motifs" label="HOMER concatenated motif files"> + <filter>concat_motifs_output is True</filter> + </data> + <data name="out_logfile" type="data" format="txt" label="HOMER logfile: motifs from ${on_string}"> + <filter>logfile_output is True</filter> + </data> </outputs> <tests> <test> @@ -56,6 +124,11 @@ **Homer findMotifsGenome** +Autonormalization attempts to remove sequence bias from lower order oligos (1-mers, 2-mers ... up to #). +Region level autonormalization, which is for 1/2/3 mers by default, attempts to normalize background regions by adjusting their weights. +If this isn't getting the job done (autonormalization is not guaranteed to remove all sequence bias), you can try the more aggressive motif level autonormalization (-olen #). 
+This performs the autonormalization routine on the oligo table during de novo motif discovery. + </help> </tool>
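Review note on the findMotifsGenome.xml command template above: each boolean param contributes its truevalue (for example -mask or -rna) when checked and an empty string otherwise, while the integer and text params are always passed with their flag. A minimal Python sketch of that mapping follows, assuming the defaults declared in the <inputs> section. The helper name build_find_motifs_args and the example file names are hypothetical and not part of the wrapper. The real command is rendered by Galaxy's Cheetah templating, not by this code.

```python
import shlex


def build_find_motifs_args(infile, dbkey, outdir, *, mask=True, size=200,
                           motif_len="8,10,12", mismatches=2,
                           number_of_motifs=25, noweight=False, cpg=False,
                           nlen=3, olen=0, hypergeometric=False,
                           norevopp=False, rna=False, bg_infile=None):
    """Hypothetical helper mirroring the flag order of the <command> template."""
    args = ["findMotifsGenome.pl", infile, dbkey, outdir, "-p", "4"]

    def flag(enabled, name):
        # A boolean param renders its truevalue when checked and its empty
        # falsevalue otherwise, so an unchecked flag simply disappears.
        if enabled:
            args.append(name)

    flag(mask, "-mask")
    args += ["-size", str(size), "-len", motif_len,
             "-mis", str(mismatches), "-S", str(number_of_motifs)]
    flag(noweight, "-noweight")
    flag(cpg, "-cpg")
    args += ["-nlen", str(nlen), "-olen", str(olen)]
    flag(hypergeometric, "-h")
    flag(norevopp, "-norevopp")
    flag(rna, "-rna")
    # The template only emits -bg when a background dataset was supplied.
    if bg_infile:
        args += ["-bg", bg_infile]
    return args


if __name__ == "__main__":
    # Example file names are placeholders, not files from this repository.
    print(shlex.join(build_find_motifs_args("peaks.bed", "hg19", "homer_out")))
```

Returning an argument list rather than a single string keeps the sketch independent of shell quoting; shlex.join is used only for display.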
--- a/tools/findPeaks.xml Mon Aug 12 14:39:25 2013 -0400 +++ b/tools/findPeaks.xml Tue Aug 13 09:38:07 2013 -0400 @@ -1,45 +1,38 @@ -<tool id="homer_findPeaks" name="homer_findPeaks" version="0.1.2"> +<tool id="homer_findPeaks" name="find Peaks" version="0.1.2"> + <description></description> <requirements> <requirement type="package" version="35x1">blat</requirement> <requirement type="package" version="2.8.2">weblogo</requirement> <requirement type="package" version="9.07">ghostscript</requirement> </requirements> - <description>Homer's peakcaller. Requires tag directories (see makeTagDirectory)</description> <!--<version_command></version_command>--> <command> export PATH=\$PATH:$database.fields.path; findPeaks $affected_tag_dir.extra_files_path -o $outputPeakFile - #if $control_tag_dir: - -i $control_tag_dir.extra_files_path - #end if + #if $control_tag_dir: + -i $control_tag_dir.extra_files_path + #end if - 2>&1 + #if $logfile_output: + 2> $out_logfile + #else: + 2>&1 + #end if </command> <inputs> - <param name="database" type="select" label="HOMER database" min="1"> - <options from_file="homer.loc"> - <column name="value" index="0"/> - <column name="name" index="1"/> - <column name="path" index="2"/> - </options> - </param> + <expand macro="input_choose_homer_version" /> <param name="affected_tag_dir" format="homer_tagdir" type="data" label="tag directory" help="Must be made with the tool makeTagDirectory" /> <param name="control_tag_dir" type="data" format="homer_tagdir" optional="True" label="Control tag directory" help="Must be made with makeTagDirectory" /> - + <param name="logfile_output" type="boolean" truevalue="" falsevalue="" label="Output HOMER logfile" help=""/> </inputs> <outputs> - <!--<data format="html" name="html_outfile" label="index" />--> - <!--<data format="html" hidden="True" name="html_outfile" label="index.html" />--> <data format="txt" name="outputPeakFile" label="${tool.name} on #echo 
os.path.splitext(str($affected_tag_dir.name))[0]#.txt" /> - <!--<data format="txt" name="out_log" label="${tool.name} on #echo os.path.splitext(str($tagDir.name))[0]#.log" />--> </outputs> <tests> <test> - <!--<param name="input_file" value="extract_genomic_dna.fa" />--> - <!--<output name="html_file" file="sample_output.html" ftype="html" />--> </test> </tests> @@ -49,76 +42,8 @@ **Homer findPeaks** - For more options, look under: "Command line options for findPeaks" - - http://biowhat.ucsd.edu/homer/ngs/peaks.html - - TIP: use homer_bed2pos and homer_pos2bed to convert between the homer peak positions and the BED format. - -**Parameter list** - -Command line options (not all of them are supported):: - - Usage: findPeaks <tag directory> [options] - - Finds peaks in the provided tag directory. By default, peak list printed to stdout - - General analysis options: - -o <filename|auto> (file name for to output peaks, default: stdout) - "-o auto" will send output to "<tag directory>/peaks.txt", ".../regions.txt", - or ".../transcripts.txt" depending on the "-style" option - -style <option> (Specialized options for specific analysis strategies) - factor (transcription factor ChIP-Seq, uses -center, output: peaks.txt, default) - histone (histone modification ChIP-Seq, region based, uses -region -size 500 -L 0, regions.txt) - groseq (de novo transcript identification from GroSeq data, transcripts.txt) - tss (TSS identification from 5' RNA sequencing, tss.txt) - dnase (Hypersensitivity [crawford style (nicking)], peaks.txt) +Requires tag directories (see makeTagDirectory) - chipseq/histone options: - -i <input tag directory> (Experiment to use as IgG/Input/Control) - -size <#> (Peak size, default: auto) - -minDist <#> (minimum distance between peaks, default: peak size x2) - -gsize <#> (Set effective mappable genome size, default: 2e9) - -fragLength <#|auto> (Approximate fragment length, default: auto) - -inputFragLength <#|auto> (Approximate fragment length of input tags, 
default: auto) - -tbp <#> (Maximum tags per bp to count, 0 = no limit, default: auto) - -inputtbp <#> (Maximum tags per bp to count in input, 0 = no limit, default: auto) - -strand <both|separate> (find peaks using tags on both strands or separate, default:both) - -norm # (Tag count to normalize to, default 10000000) - -region (extends start/stop coordinates to cover full region considered "enriched") - -center (Centers peaks on maximum tag overlap and calculates focus ratios) - -nfr (Centers peaks on most likely nucleosome free region [works best with mnase data]) - (-center and -nfr can be performed later with "getPeakTags" - - Peak Filtering options: (set -F/-L/-C to 0 to skip) - -F <#> (fold enrichment over input tag count, default: 4.0) - -P <#> (poisson p-value threshold relative to input tag count, default: 0.0001) - -L <#> (fold enrichment over local tag count, default: 4.0) - -LP <#> (poisson p-value threshold relative to local tag count, default: 0.0001) - -C <#> (fold enrichment limit of expected unique tag positions, default: 2.0) - -localSize <#> (region to check for local tag enrichment, default: 10000) - -inputSize <#> (Size of region to search for control tags, default: 2x peak size) - -fdr <#> (False discovery rate, default = 0.001) - -poisson <#> (Set poisson p-value cutoff, default: uses fdr) - -tagThreshold <#> (Set # of tags to define a peak, default: 25) - -ntagThreshold <#> (Set # of normalized tags to define a peak, by default uses 1e7 for norm) - -minTagThreshold <#> (Absolute minimum tags per peak, default: expected tags per peak) - - GroSeq Options: (Need to specify "-style groseq"): - -tssSize <#> (size of region for initiation detection/artifact size, default: 250) - -minBodySize <#> (size of regoin for transcript body detection, default: 1000) - -maxBodySize <#> (size of regoin for transcript body detection, default: 10000) - -tssFold <#> (fold enrichment for new initiation dectection, default: 4.0) - -bodyFold <#> (fold enrichment for 
new transcript dectection, default: 4.0) - -endFold <#> (end transcript when levels are this much less than the start, default: 10.0) - -fragLength <#> (Approximate fragment length, default: 150) - -uniqmap <directory> (directory of binary files specifying uniquely mappable locations) - Download from http://biowhat.ucsd.edu/homer/groseq/ - -confPvalue <#> (confidence p-value: 1.00e-05) - -minReadDepth <#> (Minimum initial read depth for transcripts, default: auto) - -pseudoCount <#> (Pseudo tag count, default: 2.0) - -gtf <filename> (Output de novo transcripts in GTF format) - "-o auto" will produce <dir>/transcripts.txt and <dir>/transcripts.gtf </help> </tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/homer_macros.xml Tue Aug 13 09:38:07 2013 -0400
@@ -0,0 +1,11 @@
+<macros>
+    <macro name="input_choose_homer_version">
+        <param name="database" type="select" label="HOMER version and data files" min="1">
+            <options from_file="homer.loc">
+                <column name="value" index="0"/>
+                <column name="name" index="1"/>
+                <column name="path" index="2"/>
+            </options>
+        </param>
+    </macro>
+</macros>
--- a/tools/makeTagDirectory.py Mon Aug 12 14:39:25 2013 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,94 +0,0 @@ -""" - - -""" -import re -import os -import sys -import subprocess -import optparse -import shutil -import tempfile - -def getFileString(fpath, outpath): - """ - format a nice file size string - """ - size = '' - fp = os.path.join(outpath, fpath) - s = '? ?' - if os.path.isfile(fp): - n = float(os.path.getsize(fp)) - if n > 2**20: - size = ' (%1.1f MB)' % (n/2**20) - elif n > 2**10: - size = ' (%1.1f KB)' % (n/2**10) - elif n > 0: - size = ' (%d B)' % (int(n)) - s = '%s %s' % (fpath, size) - return s - -class makeTagDirectory(): - """wrapper - """ - - def __init__(self,opts=None, args=None): - self.opts = opts - self.args = args - - def run_makeTagDirectory(self): - """ - makeTagDirectory <Output Directory Name> [options] <alignment file1> [alignment file 2] - - """ - if self.opts.format != "bam": - cl = [self.opts.executable] + args + ["-format" , self.opts.format] - else: - cl = [self.opts.executable] + args - print cl - p = subprocess.Popen(cl) - retval = p.wait() - - - html = self.gen_html(args[0]) - #html = self.gen_html() - return html,retval - - def gen_html(self, dr=os.getcwd()): - flist = os.listdir(dr) - print flist - """ add a list of all files in the tagdirectory - """ - res = ['<div class="module"><h2>Files created by makeTagDirectory</h2><table cellspacing="2" cellpadding="2">\n'] - - flist.sort() - for i,f in enumerate(flist): - if not(os.path.isdir(f)): - fn = os.path.split(f)[-1] - res.append('<tr><td><a href="%s">%s</a></td></tr>\n' % (fn,getFileString(fn, dr))) - - res.append('</table>\n') - - return res - -if __name__ == '__main__': - op = optparse.OptionParser() - op.add_option('-e', '--executable', default='makeTagDirectory') - op.add_option('-o', '--htmloutput', default=None) - op.add_option('-f', '--format', default="sam") - opts, args = op.parse_args() - #assert os.path.isfile(opts.executable),'## makeTagDirectory.py 
error - cannot find executable %s' % opts.executable - - #if not os.path.exists(opts.outputdir): - #os.makedirs(opts.outputdir) - f = makeTagDirectory(opts, args) - - html,retval = f.run_makeTagDirectory() - f = open(opts.htmloutput, 'w') - f.write(''.join(html)) - f.close() - if retval <> 0: - print >> sys.stderr, serr # indicate failure - - -
--- a/tools/makeTagDirectory.xml Mon Aug 12 14:39:25 2013 -0400 +++ b/tools/makeTagDirectory.xml Tue Aug 13 09:38:07 2013 -0400 @@ -1,34 +1,31 @@ -<tool id="homer_makeTagDirectory" name="Make HOMER database" version="1.0.1"> +<tool id="homer_makeTagDirectory" name="create HOMER database" version="1.0.1"> <requirements> <requirement type="package" version="35x1">blat</requirement> <requirement type="package" version="2.8.2">weblogo</requirement> <requirement type="package" version="9.07">ghostscript</requirement> </requirements> - <description>(TagDirectory). Used by findPeaks</description> - <!--<version_command></version_command>--> + <description>(TagDirectory)</description> <command> - #set $HOMER_PATH = str($database.fields.path) - export PATH=\$PATH:$database.fields.path; + #set $HOMER_PATH = str($database.fields.path) + export PATH=\$PATH:$database.fields.path; + + makeTagDirectory $tag_dir.extra_files_path + #for $infile in $alignment_files: + $infile.file + #end for - makeTagDirectory $tag_dir.extra_files_path - #for $infile in $alignment_files: - $infile.file - #end for - - 2>&1 + #if $logfile_output: + 2> $out_logfile + #else: + 2>&1 + #end if </command> <inputs> - <param name="database" type="select" label="HOMER database" min="1"> - <options from_file="homer.loc"> - <column name="value" index="0"/> - <column name="name" index="1"/> - <column name="path" index="2"/> - </options> - </param> + <expand macro="input_choose_homer_version" /> <repeat name="alignment_files" title="Alignment Files"> - <param name="file" label="Add file" type="data" format="sam,bed,bam" help="Alignments in SAM, BAM or BED format" /> + <param name="file" label="Add file" type="data" format="sam,bed,bam" help="Alignments in SAM, BAM or BED format" /> </repeat> </inputs>
--- a/tools/pos2bed.xml Mon Aug 12 14:39:25 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,37 +0,0 @@
-<tool id="homer_pos2bed" name="homer_pos2bed" version="1.0.0">
-    <requirements>
-        <requirement type="package" version="4.1" >homer</requirement>
-    </requirements>
-    <description></description>
-    <!--<version_command></version_command>-->
-    <command>
-        pos2bed.pl $input_peak 1> $out_bed
-        2> $out_log || echo "Error running pos2bed." >&2
-    </command>
-    <inputs>
-        <param format="tabular" name="input_peak" type="data" label="Homer peak positions" />
-    </inputs>
-    <outputs>
-        <!--<data format="html" name="html_outfile" label="index" />-->
-        <!--<data format="html" hidden="True" name="html_outfile" label="index.html" />-->
-        <data format="bed" name="out_bed" label="${tool.name} on #echo os.path.splitext(str($input_peak.name))[0]#.bed" />
-        <data format="txt" name="out_log" label="${tool.name} on #echo os.path.splitext(str($input_peak.name))[0]#.log" />
-    </outputs>
-    <tests>
-        <test>
-            <!--<param name="input_file" value="extract_genomic_dna.fa" />-->
-            <!--<output name="html_file" file="sample_output.html" ftype="html" />-->
-        </test>
-    </tests>
-
-    <help>
-    .. class:: infomark
-
-    Converts: homer peak positions -(to)-> BED format
-
-    **Homer pos2bed.pl**
-
-    http://biowhat.ucsd.edu/homer/ngs/miscellaneous.html
-    </help>
-</tool>
-