# HG changeset patch
# User iuc
# Date 1430794040 14400
# Node ID 57841366f1129f6ecec193f8f053b8f947af0802
# Parent dc6df7644fc4bb84add6af4c73e9293841dd5fd3
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/htseq commit 344140b8df53b8b7024618bb04594607a045c03a

diff -r dc6df7644fc4 -r 57841366f112 htseqsams2mx.xml
--- a/htseqsams2mx.xml Wed Apr 29 12:06:59 2015 -0400
+++ b/htseqsams2mx.xml Mon May 04 22:47:20 2015 -0400
@@ -1,19 +1,19 @@
 using HTSeq code
+
+ pysam
+ matplotlib
+ htseq
+
-
- pysam
- matplotlib
- htseq
-
 htseqsams2mx.py -g "$gfffile" -o "$outfile" -m "$model" --id_attribute "$id_attr" --feature_type "$feature_type"
- --mapqMin $mapqMin
+ --mapqMin $mapqMin
 #for $s in $samfiles:
 #if $s.ext != 'data':
- --samf "'${s}','${s.name}','${s.ext}','${s.metadata.bam_index}'"
+ --samf "'${s}','${s.name}','${s.ext}','${s.metadata.bam_index}'"
 #end if
 #end for
 #if $filter_extras:
@@ -22,8 +22,8 @@
- union - + - + - + - +
@@ -80,36 +80,36 @@
 Counts reads in multiple sam/bam format mapped files and generates a matrix ideal for edgeR and other count based tools
 It uses HTSeq to count your sam reads over a gene model supplied as a GTF file
-The output is a tabular text (columnar - spreadsheet) file containing the
+The output is a tabular text (columnar - spreadsheet) file containing the
 count matrix for downstream processing. Each row contains the counts from each sample for each
-of the non-emtpy GTF input file contigs matching the GTF attribute choice above.
-You probably want to use gene level GTF output attribute and count reads that overlap
+of the non-emtpy GTF input file contigs matching the GTF attribute choice above.
+You probably want to use gene level GTF output attribute and count reads that overlap
 GTF exons for RNA-seq. Or you can count over exons by using transcript level output names or ids. Etc.
 ----
 **Author's plea on replicates**
-If you want to interpret the downstream p values in terms of rejecting or accepting the null hypothesis
+If you want to interpret the downstream p values in terms of rejecting or accepting the null hypothesis
 under random sampling with replacement from the universe of possible biological/experimental replicates from which your data was derived,
-which is what published p values are often assumed to do, then you need biological
-(or for cell culture material experimental) replicates.
+which is what published p values are often assumed to do, then you need biological
+(or for cell culture material experimental) replicates.
-Using technical or no replicates means the downstream p values are not interpretable the way most people would assume
+Using technical or no replicates means the downstream p values are not interpretable the way most people would assume
 they are - ie as the probability of obtaining a result as or more extreme as your experimental data in millions of experiments conducted using the same methods under the null hypothesis.
-There is no way around this and it is scientific fraud to ignore this issue and publish bogus p values derived from
-technical or no replicates without making the lack of biological or experimental error in the p value calculations
+There is no way around this and it is scientific fraud to ignore this issue and publish bogus p values derived from
+technical or no replicates without making the lack of biological or experimental error in the p value calculations
 clear to your readers so they can adjust their expectations.
 However, the buck stops here at higher level inference. If you have no replicates, you must not use this tool as the p values are uninterpretable. So there.
-See your stats 101 notes on the central limit theorem and test statistics for a refresher or talk to a
+See your stats 101 notes on the central limit theorem and test statistics for a refresher or talk to a
 statistician if this makes no sense please.
 **Attribution**
-This Galaxy tool relies on HTSeq_ from http://www-huber.embl.de/users/anders/HTSeq/doc/index.html
+This Galaxy tool relies on HTSeq_ from http://www-huber.embl.de/users/anders/HTSeq/doc/index.html
 for the tricky work of counting. That code includes the following attribution:
 ## Written by Simon Anders (sanders@fs.tum.de), European Molecular Biology
@@ -121,7 +121,7 @@
 Otherwise, all code and documentation comprising this tool including the requirement for more than one sample bam
-was written by Ross Lazarus and is
+was written by Ross Lazarus and is
 licensed to you under the LGPL_ like other rgenetics artefacts
 Sorry, I don't use readgroups so had no reason to code read groups. Contributions welcome. Send code
@@ -129,5 +129,6 @@
 .. _LGPL: http://www.gnu.org/copyleft/lesser.html
 .. _HTSeq: http://www-huber.embl.de/users/anders/HTSeq/doc/index.html
-
+
+