htseq_count_eba: htseq-count.xml annotate

annotate htseq-count.xml @ 1:9e5fd206da01 draft default tip

Uploaded

author	yboursin
date	Wed, 25 May 2016 10:07:31 -0400
parents	34cfb3829048
children

rev	line source
1 9e5fd206da01 Uploaded yboursin parents: 0 diff changeset	1 <tool id="htseq_count" name="htseq-count" version="EBA2016-v1">
0 34cfb3829048 Uploaded yboursin parents: diff changeset	2 <description> - Count aligned reads in a BAM file that overlap features in a GFF file</description>
34cfb3829048 Uploaded yboursin parents: diff changeset	3 <requirements>
34cfb3829048 Uploaded yboursin parents: diff changeset	4 <requirement type="package" version="0.6.1">htseq</requirement>
34cfb3829048 Uploaded yboursin parents: diff changeset	5 <requirement type="package" version="1.7.1">numpy</requirement>
34cfb3829048 Uploaded yboursin parents: diff changeset	6 <requirement type="package" version="0.1.19">samtools</requirement>
34cfb3829048 Uploaded yboursin parents: diff changeset	7 <requirement type="package" version="0.7.7">pysam</requirement>
34cfb3829048 Uploaded yboursin parents: diff changeset	8 </requirements>
34cfb3829048 Uploaded yboursin parents: diff changeset	9
34cfb3829048 Uploaded yboursin parents: diff changeset	10 <stdio>
34cfb3829048 Uploaded yboursin parents: diff changeset	11 <exit_code range="1:" level="fatal" description="Unknown error occurred" />
34cfb3829048 Uploaded yboursin parents: diff changeset	12 <regex match="htseq-count: (command ){0,1}not found" source="stderr" level="fatal" description="The HTSeq python package is not properly installed, contact Galaxy administrators" />
34cfb3829048 Uploaded yboursin parents: diff changeset	13 <regex match="samtools: (command ){0,1}not found" source="stderr" level="fatal" description="The samtools package is not properly installed, contact Galaxy administrators" />
34cfb3829048 Uploaded yboursin parents: diff changeset	14 <regex match="Error: Feature (.+) does not contain a '(.+)' attribute" source="both" level="fatal" description="Error parsing the GFF file, at least one feature of the specified 'Feature type' does not have a value for the specified 'ID Attribute'" />
34cfb3829048 Uploaded yboursin parents: diff changeset	15 <regex match="Error occured in line (\d+) of file" source="stderr" level="fatal" description="Unknown error parsing the GFF file" />
34cfb3829048 Uploaded yboursin parents: diff changeset	16 <regex match="Error" source="stderr" level="fatal" description="Unknown error occurred" />
34cfb3829048 Uploaded yboursin parents: diff changeset	17 <regex match="Warning: Read (.+) claims to have an aligned mate which could not be found. \(Is the SAM file properly sorted\?\)" source="stderr" level="warning" description="PAIRED DATA MISSING OR NOT PROPERLY SORTED. Try rerunning and selecting the option to 'Force sorting of SAM/BAM file by NAME'. See stderr output of this dataset for more information." />
34cfb3829048 Uploaded yboursin parents: diff changeset	18 </stdio>
34cfb3829048 Uploaded yboursin parents: diff changeset	19
34cfb3829048 Uploaded yboursin parents: diff changeset	20 <version_command>htseq-count -h | grep version | sed 's/^\(.*\)*\(version .*\)\./\2/'</version_command>
34cfb3829048 Uploaded yboursin parents: diff changeset	21
34cfb3829048 Uploaded yboursin parents: diff changeset	22 <command><![CDATA[
34cfb3829048 Uploaded yboursin parents: diff changeset	23 ##set up input files
34cfb3829048 Uploaded yboursin parents: diff changeset	24 #set $reference_fasta_filename = "localref.fa"
34cfb3829048 Uploaded yboursin parents: diff changeset	25 #if $samout_conditional.samout:
34cfb3829048 Uploaded yboursin parents: diff changeset	26 #if str( $samout_conditional.reference_source.reference_source_selector ) == "history":
34cfb3829048 Uploaded yboursin parents: diff changeset	27 ln -s "${samout_conditional.reference_source.ref_file}" "${reference_fasta_filename}" &&
34cfb3829048 Uploaded yboursin parents: diff changeset	28 samtools faidx "${reference_fasta_filename}" 2>&1 || echo "Error running samtools faidx for htseq-count" >&2 &&
34cfb3829048 Uploaded yboursin parents: diff changeset	29 #else:
34cfb3829048 Uploaded yboursin parents: diff changeset	30 #set $reference_fasta_filename = str( $samout_conditional.reference_source.ref_file.fields.path )
34cfb3829048 Uploaded yboursin parents: diff changeset	31 #end if
34cfb3829048 Uploaded yboursin parents: diff changeset	32 #end if
34cfb3829048 Uploaded yboursin parents: diff changeset	33 #if $force_sort:
34cfb3829048 Uploaded yboursin parents: diff changeset	34 #if $samfile.extension == 'bam':
34cfb3829048 Uploaded yboursin parents: diff changeset	35 samtools sort -n "$samfile" "name_sorted_alignment" &&
34cfb3829048 Uploaded yboursin parents: diff changeset	36 #else
34cfb3829048 Uploaded yboursin parents: diff changeset	37 samtools view -Su -t ${reference_fasta_filename}.fai "$samfile" | samtools sort -n - "name_sorted_alignment" &&
34cfb3829048 Uploaded yboursin parents: diff changeset	38 #end if
34cfb3829048 Uploaded yboursin parents: diff changeset	39 #end if
34cfb3829048 Uploaded yboursin parents: diff changeset	40 htseq-count
34cfb3829048 Uploaded yboursin parents: diff changeset	41 --mode=$mode
34cfb3829048 Uploaded yboursin parents: diff changeset	42 --stranded=$stranded
34cfb3829048 Uploaded yboursin parents: diff changeset	43 --minaqual=$minaqual
34cfb3829048 Uploaded yboursin parents: diff changeset	44 --type="$featuretype"
34cfb3829048 Uploaded yboursin parents: diff changeset	45 --idattr="$idattr"
34cfb3829048 Uploaded yboursin parents: diff changeset	46 #if $samout_conditional.samout:
34cfb3829048 Uploaded yboursin parents: diff changeset	47 --samout=$__new_file_path__/${samoutfile.id}_tmp
34cfb3829048 Uploaded yboursin parents: diff changeset	48 #end if
34cfb3829048 Uploaded yboursin parents: diff changeset	49 #if $force_sort:
34cfb3829048 Uploaded yboursin parents: diff changeset	50 --order=name
34cfb3829048 Uploaded yboursin parents: diff changeset	51 --format=bam
34cfb3829048 Uploaded yboursin parents: diff changeset	52 name_sorted_alignment.bam
34cfb3829048 Uploaded yboursin parents: diff changeset	53 #else
34cfb3829048 Uploaded yboursin parents: diff changeset	54 --order=pos
34cfb3829048 Uploaded yboursin parents: diff changeset	55 --format=$samfile.extension
34cfb3829048 Uploaded yboursin parents: diff changeset	56 $samfile
34cfb3829048 Uploaded yboursin parents: diff changeset	57 #end if
34cfb3829048 Uploaded yboursin parents: diff changeset	58 "$gfffile"
34cfb3829048 Uploaded yboursin parents: diff changeset	59 | awk '{if ($1 ~ "no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique") print $0 | "cat 1>&2"; else print $0}' > $counts 2>$othercounts
34cfb3829048 Uploaded yboursin parents: diff changeset	60 #if $samout_conditional.samout:
34cfb3829048 Uploaded yboursin parents: diff changeset	61 && samtools view -Su -t ${reference_fasta_filename}.fai $__new_file_path__/${samoutfile.id}_tmp | samtools sort -o - sorted > $samoutfile
34cfb3829048 Uploaded yboursin parents: diff changeset	62 #end if
34cfb3829048 Uploaded yboursin parents: diff changeset	63 ]]>
34cfb3829048 Uploaded yboursin parents: diff changeset	64 </command>
34cfb3829048 Uploaded yboursin parents: diff changeset	65
34cfb3829048 Uploaded yboursin parents: diff changeset	66 <inputs>
34cfb3829048 Uploaded yboursin parents: diff changeset	67 <param format="sam,bam" name="samfile" type="data" label="Aligned SAM/BAM File"/>
34cfb3829048 Uploaded yboursin parents: diff changeset	68 <param format="gff" name="gfffile" type="data" label="GFF File"/>
34cfb3829048 Uploaded yboursin parents: diff changeset	69 <param name="mode" type="select" label="Mode" help="(--mode)">
34cfb3829048 Uploaded yboursin parents: diff changeset	70 <help>Mode to handle reads overlapping more than one feature.</help>
34cfb3829048 Uploaded yboursin parents: diff changeset	71 <option value="union" selected="true">Union</option>
34cfb3829048 Uploaded yboursin parents: diff changeset	72 <option value="intersection-strict">Intersection (strict)</option>
34cfb3829048 Uploaded yboursin parents: diff changeset	73 <option value="intersection-nonempty">Intersection (nonempty)</option>
34cfb3829048 Uploaded yboursin parents: diff changeset	74 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	75 <param name="stranded" type="select" label="Stranded" help="dUTP, NSR, and NNSR (e.g.: Illumina TruSeq Stranded mRNA Library Prep) are Reverse if mapping was done using the fr-firststrand library type. SOLID or ligation (e.g.: Epicentre ScriptSeq) are Yes if mapping was done using the fr-secondstrand library type (--stranded)">
34cfb3829048 Uploaded yboursin parents: diff changeset	76 <help>Specify whether the data is from a strand-specific assay. 'Reverse' means yes with reversed strand interpretation. dUTP, NSR, and NNSR (e.g.: Illumina TruSeq Stranded mRNA Library Prep) are Reverse if mapping was done using the fr-firststrand library type. SOLID or ligation (e.g.: Epicentre ScriptSeq) are Yes if mapping was done using the fr-secondstrand library type.</help>
34cfb3829048 Uploaded yboursin parents: diff changeset	77 <option value="yes" selected="true">Yes</option>
34cfb3829048 Uploaded yboursin parents: diff changeset	78 <option value="no">No</option>
34cfb3829048 Uploaded yboursin parents: diff changeset	79 <option value="reverse">Reverse</option>
34cfb3829048 Uploaded yboursin parents: diff changeset	80 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	81 <param name="minaqual" type="integer" value="10" label="Minimum alignment quality">
34cfb3829048 Uploaded yboursin parents: diff changeset	82 <help>Skip all reads with alignment quality lower than the given minimum value. (--minaqual)</help>
34cfb3829048 Uploaded yboursin parents: diff changeset	83 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	84 <param name="featuretype" type="text" value="exon" label="Feature type">
34cfb3829048 Uploaded yboursin parents: diff changeset	85 <help>Feature type (3rd column in GFF file) to be used. All features of other types are ignored. The default, suitable for RNA-Seq and Ensembl GTF files, is exon. (--type)</help>
34cfb3829048 Uploaded yboursin parents: diff changeset	86 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	87 <param name="idattr" type="text" value="gene_id" label="ID Attribute">
34cfb3829048 Uploaded yboursin parents: diff changeset	88 <help>GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. All features of the specified type MUST have a value for this attribute. The default, suitable for RNA-Seq and Ensembl GTF files, is gene_id. (--idattr)</help>
34cfb3829048 Uploaded yboursin parents: diff changeset	89 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	90 <conditional name="samout_conditional">
34cfb3829048 Uploaded yboursin parents: diff changeset	91 <param name="samout" type="boolean" value="False" truevalue="True" falsevalue="False" label="Additional BAM Output">
34cfb3829048 Uploaded yboursin parents: diff changeset	92 <help>Write out all SAM alignment records into an output BAM file, annotating each line with its assignment to a feature or a special counter (as an optional field with tag ‘XF’).</help>
34cfb3829048 Uploaded yboursin parents: diff changeset	93 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	94 <when value="True">
34cfb3829048 Uploaded yboursin parents: diff changeset	95 <conditional name="reference_source">
34cfb3829048 Uploaded yboursin parents: diff changeset	96 <param name="reference_source_selector" type="select" label="Choose the source for the reference list">
34cfb3829048 Uploaded yboursin parents: diff changeset	97 <option value="cached">Locally cached</option>
34cfb3829048 Uploaded yboursin parents: diff changeset	98 <option value="history">History</option>
34cfb3829048 Uploaded yboursin parents: diff changeset	99 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	100 <when value="cached">
34cfb3829048 Uploaded yboursin parents: diff changeset	101 <param name="ref_file" type="select" label="Using reference genome">
34cfb3829048 Uploaded yboursin parents: diff changeset	102 <options from_data_table="sam_fa_indexes">
34cfb3829048 Uploaded yboursin parents: diff changeset	103 <filter type="data_meta" key="dbkey" ref="samfile" column="1"/>
34cfb3829048 Uploaded yboursin parents: diff changeset	104 </options>
34cfb3829048 Uploaded yboursin parents: diff changeset	105 <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
34cfb3829048 Uploaded yboursin parents: diff changeset	106 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	107 </when>
34cfb3829048 Uploaded yboursin parents: diff changeset	108 <when value="history"> <!-- FIX ME!!!! -->
34cfb3829048 Uploaded yboursin parents: diff changeset	109 <param name="ref_file" type="data" format="fasta" label="Using reference file" />
34cfb3829048 Uploaded yboursin parents: diff changeset	110 </when>
34cfb3829048 Uploaded yboursin parents: diff changeset	111 </conditional>
34cfb3829048 Uploaded yboursin parents: diff changeset	112 </when>
34cfb3829048 Uploaded yboursin parents: diff changeset	113 </conditional>
34cfb3829048 Uploaded yboursin parents: diff changeset	114 <param name="force_sort" type="boolean" value="False" truevalue="True" falsevalue="False" label="Force sorting of SAM/BAM file by NAME">
34cfb3829048 Uploaded yboursin parents: diff changeset	115 <help>This option can be used for paired-end data that has many unmapped mates. Use this if you get the warning about paired-end data missing or not being properly sorted.</help>
34cfb3829048 Uploaded yboursin parents: diff changeset	116 </param>
34cfb3829048 Uploaded yboursin parents: diff changeset	117 </inputs>
34cfb3829048 Uploaded yboursin parents: diff changeset	118
34cfb3829048 Uploaded yboursin parents: diff changeset	119 <outputs>
34cfb3829048 Uploaded yboursin parents: diff changeset	120 <data format="tabular" name="counts" metadata_source="samfile" label="${tool.name} on ${on_string}"/>
34cfb3829048 Uploaded yboursin parents: diff changeset	121 <data format="tabular" name="othercounts" metadata_source="samfile" label="${tool.name} on ${on_string} (no feature)"/>
34cfb3829048 Uploaded yboursin parents: diff changeset	122 <data format="bam" name="samoutfile" metadata_source="samfile" label="${tool.name} on ${on_string} (BAM)">
34cfb3829048 Uploaded yboursin parents: diff changeset	123 <filter>samout_conditional['samout']</filter>
34cfb3829048 Uploaded yboursin parents: diff changeset	124 </data>
34cfb3829048 Uploaded yboursin parents: diff changeset	125 </outputs>
34cfb3829048 Uploaded yboursin parents: diff changeset	126
34cfb3829048 Uploaded yboursin parents: diff changeset	127 <tests>
34cfb3829048 Uploaded yboursin parents: diff changeset	128 <test>
34cfb3829048 Uploaded yboursin parents: diff changeset	129 <param name="samfile" value="htseq-test.sam" />
34cfb3829048 Uploaded yboursin parents: diff changeset	130 <param name="gfffile" value="htseq-test.gff" />
34cfb3829048 Uploaded yboursin parents: diff changeset	131 <param name="samout" value="False" />
34cfb3829048 Uploaded yboursin parents: diff changeset	132 <output name="counts" file="htseq-test_counts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	133 <output name="othercounts" file="htseq-test_othercounts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	134 </test>
34cfb3829048 Uploaded yboursin parents: diff changeset	135 <test>
34cfb3829048 Uploaded yboursin parents: diff changeset	136 <param name="samfile" value="htseq-test.sam" />
34cfb3829048 Uploaded yboursin parents: diff changeset	137 <param name="gfffile" value="htseq-test.gff" />
34cfb3829048 Uploaded yboursin parents: diff changeset	138 <param name="samout" value="False" />
34cfb3829048 Uploaded yboursin parents: diff changeset	139 <param name="force_sort" value="True" />
34cfb3829048 Uploaded yboursin parents: diff changeset	140 <output name="counts" file="htseq-test_counts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	141 <output name="othercounts" file="htseq-test_othercounts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	142 </test>
34cfb3829048 Uploaded yboursin parents: diff changeset	143 <test>
34cfb3829048 Uploaded yboursin parents: diff changeset	144 <param name="samfile" value="htseq-test.bam" />
34cfb3829048 Uploaded yboursin parents: diff changeset	145 <param name="gfffile" value="htseq-test.gff" />
34cfb3829048 Uploaded yboursin parents: diff changeset	146 <param name="samout" value="False" />
34cfb3829048 Uploaded yboursin parents: diff changeset	147 <output name="counts" file="htseq-test_counts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	148 <output name="othercounts" file="htseq-test_othercounts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	149 </test>
34cfb3829048 Uploaded yboursin parents: diff changeset	150 <test>
34cfb3829048 Uploaded yboursin parents: diff changeset	151 <param name="samfile" value="htseq-test-paired.bam" />
34cfb3829048 Uploaded yboursin parents: diff changeset	152 <param name="singlepaired" value="paired" />
34cfb3829048 Uploaded yboursin parents: diff changeset	153 <param name="gfffile" value="htseq-test.gff" />
34cfb3829048 Uploaded yboursin parents: diff changeset	154 <param name="samout" value="False" />
34cfb3829048 Uploaded yboursin parents: diff changeset	155 <output name="counts" file="htseq-test-paired_counts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	156 <output name="othercounts" file="htseq-test-paired_othercounts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	157 </test>
34cfb3829048 Uploaded yboursin parents: diff changeset	158 <test>
34cfb3829048 Uploaded yboursin parents: diff changeset	159 <param name="samfile" value="htseq-test-paired.bam" />
34cfb3829048 Uploaded yboursin parents: diff changeset	160 <param name="singlepaired" value="paired" />
34cfb3829048 Uploaded yboursin parents: diff changeset	161 <param name="gfffile" value="htseq-test.gff" />
34cfb3829048 Uploaded yboursin parents: diff changeset	162 <param name="samout" value="False" />
34cfb3829048 Uploaded yboursin parents: diff changeset	163 <param name="force_sort" value="True" />
34cfb3829048 Uploaded yboursin parents: diff changeset	164 <output name="counts" file="htseq-test-paired_counts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	165 <output name="othercounts" file="htseq-test-paired_othercounts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	166 </test>
34cfb3829048 Uploaded yboursin parents: diff changeset	167
34cfb3829048 Uploaded yboursin parents: diff changeset	168 <!-- Seems to be an issue setting the $reference_fasta_filename variable during test
34cfb3829048 Uploaded yboursin parents: diff changeset	169 <test>
34cfb3829048 Uploaded yboursin parents: diff changeset	170 <param name="samfile" value="htseq-test.sam" />
34cfb3829048 Uploaded yboursin parents: diff changeset	171 <param name="gfffile" value="htseq-test.gff" />
34cfb3829048 Uploaded yboursin parents: diff changeset	172 <param name="samout" value="True" />
34cfb3829048 Uploaded yboursin parents: diff changeset	173 <param name="reference_source_selector" value="history" />
34cfb3829048 Uploaded yboursin parents: diff changeset	174 <param name="ref_file" value="htseq-test_reference.fasta" />
34cfb3829048 Uploaded yboursin parents: diff changeset	175 <output name="counts" file="htseq-test_counts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	176 <output name="othercounts" file="htseq-test_othercounts.tsv" />
34cfb3829048 Uploaded yboursin parents: diff changeset	177 <output name="samoutfile" file="htseq-test_samout.bam" />
34cfb3829048 Uploaded yboursin parents: diff changeset	178 </test>
34cfb3829048 Uploaded yboursin parents: diff changeset	179 -->
34cfb3829048 Uploaded yboursin parents: diff changeset	180 </tests>
34cfb3829048 Uploaded yboursin parents: diff changeset	181
34cfb3829048 Uploaded yboursin parents: diff changeset	182 <help>
34cfb3829048 Uploaded yboursin parents: diff changeset	183 <![CDATA[
34cfb3829048 Uploaded yboursin parents: diff changeset	184 Overview
34cfb3829048 Uploaded yboursin parents: diff changeset	185 --------
34cfb3829048 Uploaded yboursin parents: diff changeset	186
34cfb3829048 Uploaded yboursin parents: diff changeset	187 This tool takes an alignment file in SAM or BAM format and feature file in GFF format
34cfb3829048 Uploaded yboursin parents: diff changeset	188 and calculates the number of reads mapping to each feature. It uses the htseq-count
34cfb3829048 Uploaded yboursin parents: diff changeset	189 script that is part of the HTSeq python module. See
34cfb3829048 Uploaded yboursin parents: diff changeset	190 http://www-huber.embl.de/users/anders/HTSeq/doc/count.html for details.
34cfb3829048 Uploaded yboursin parents: diff changeset	191
34cfb3829048 Uploaded yboursin parents: diff changeset	192 A feature is an interval (i.e., a range of positions) on a chromosome or a union of
34cfb3829048 Uploaded yboursin parents: diff changeset	193 such intervals. In the case of RNA-Seq, the features are typically genes, where
34cfb3829048 Uploaded yboursin parents: diff changeset	194 each gene is considered here as the union of all its exons. One may also consider
34cfb3829048 Uploaded yboursin parents: diff changeset	195 each exon as a feature, e.g., in order to check for alternative splicing. For
34cfb3829048 Uploaded yboursin parents: diff changeset	196 comparative ChIP-Seq, the features might be binding regions from a pre-determined
34cfb3829048 Uploaded yboursin parents: diff changeset	197 list.
34cfb3829048 Uploaded yboursin parents: diff changeset	198
34cfb3829048 Uploaded yboursin parents: diff changeset	199
34cfb3829048 Uploaded yboursin parents: diff changeset	200 Overlap Modes
34cfb3829048 Uploaded yboursin parents: diff changeset	201 -------------
34cfb3829048 Uploaded yboursin parents: diff changeset	202
34cfb3829048 Uploaded yboursin parents: diff changeset	203 Special care must be taken to decide how to deal with reads that overlap more than one feature.
34cfb3829048 Uploaded yboursin parents: diff changeset	204
34cfb3829048 Uploaded yboursin parents: diff changeset	205 The htseq-count script offers three modes to choose from: union, intersection-strict, and intersection-nonempty.
34cfb3829048 Uploaded yboursin parents: diff changeset	206
34cfb3829048 Uploaded yboursin parents: diff changeset	207 The following figure illustrates the effect of these three modes:
34cfb3829048 Uploaded yboursin parents: diff changeset	208
34cfb3829048 Uploaded yboursin parents: diff changeset	209 .. image:: count_modes.png
34cfb3829048 Uploaded yboursin parents: diff changeset	210
34cfb3829048 Uploaded yboursin parents: diff changeset	211
34cfb3829048 Uploaded yboursin parents: diff changeset	212 Strandedness
34cfb3829048 Uploaded yboursin parents: diff changeset	213 ------------
34cfb3829048 Uploaded yboursin parents: diff changeset	214
34cfb3829048 Uploaded yboursin parents: diff changeset	215 Important: The default for strandedness is yes. If your RNA-Seq data has not been made with a strand-specific protocol, this causes half of the reads to be lost. Hence, make sure to set the option Stranded to 'No' unless you have strand-specific data!
34cfb3829048 Uploaded yboursin parents: diff changeset	216
34cfb3829048 Uploaded yboursin parents: diff changeset	217
34cfb3829048 Uploaded yboursin parents: diff changeset	218 Output
34cfb3829048 Uploaded yboursin parents: diff changeset	219 ------
34cfb3829048 Uploaded yboursin parents: diff changeset	220
34cfb3829048 Uploaded yboursin parents: diff changeset	221 The script outputs a table with counts for each feature, followed by the special counters, which count reads that were not counted for any feature for one of the following reasons:
34cfb3829048 Uploaded yboursin parents: diff changeset	222
34cfb3829048 Uploaded yboursin parents: diff changeset	223 - no_feature: reads which could not be assigned to any feature (the set S of features overlapping the read, as determined by the selected overlap mode, was empty).
34cfb3829048 Uploaded yboursin parents: diff changeset	224
34cfb3829048 Uploaded yboursin parents: diff changeset	225 - ambiguous: reads which could have been assigned to more than one feature and hence were not counted for any of these (set S had more than one element).
34cfb3829048 Uploaded yboursin parents: diff changeset	226
34cfb3829048 Uploaded yboursin parents: diff changeset	227 - too_low_aQual: reads which were not counted because their alignment quality was below the minimum set with the -a (--minaqual) option, see below.
34cfb3829048 Uploaded yboursin parents: diff changeset	228
34cfb3829048 Uploaded yboursin parents: diff changeset	229 - not_aligned: reads in the SAM file without alignment
34cfb3829048 Uploaded yboursin parents: diff changeset	230
34cfb3829048 Uploaded yboursin parents: diff changeset	231 - alignment_not_unique: reads with more than one reported alignment. These reads are recognized from the NH optional SAM field tag. (If the aligner does not set this field, multiply aligned reads will be counted multiple times.)
34cfb3829048 Uploaded yboursin parents: diff changeset	232
34cfb3829048 Uploaded yboursin parents: diff changeset	233
34cfb3829048 Uploaded yboursin parents: diff changeset	234 Options Summary
34cfb3829048 Uploaded yboursin parents: diff changeset	235 ---------------
34cfb3829048 Uploaded yboursin parents: diff changeset	236
34cfb3829048 Uploaded yboursin parents: diff changeset	237 Usage: htseq-count [options] sam_file gff_file
34cfb3829048 Uploaded yboursin parents: diff changeset	238
34cfb3829048 Uploaded yboursin parents: diff changeset	239 This script takes an alignment file in SAM format and a feature file in GFF
34cfb3829048 Uploaded yboursin parents: diff changeset	240 format and calculates for each feature the number of reads mapping to it. See
34cfb3829048 Uploaded yboursin parents: diff changeset	241 http://www-huber.embl.de/users/anders/HTSeq/doc/count.html for details.
34cfb3829048 Uploaded yboursin parents: diff changeset	242
34cfb3829048 Uploaded yboursin parents: diff changeset	243 Options:
34cfb3829048 Uploaded yboursin parents: diff changeset	244 -h, --help show this help message and exit
34cfb3829048 Uploaded yboursin parents: diff changeset	245 -m MODE, --mode=MODE mode to handle reads overlapping more than one
34cfb3829048 Uploaded yboursin parents: diff changeset	246 feature(choices: union, intersection-strict,
34cfb3829048 Uploaded yboursin parents: diff changeset	247 intersection-nonempty; default: union)
34cfb3829048 Uploaded yboursin parents: diff changeset	248 -s STRANDED, --stranded=STRANDED
34cfb3829048 Uploaded yboursin parents: diff changeset	249 whether the data is from a strand-specific assay.
34cfb3829048 Uploaded yboursin parents: diff changeset	250 Specify 'yes', 'no', or 'reverse' (default: yes).
34cfb3829048 Uploaded yboursin parents: diff changeset	251 'reverse' means 'yes' with reversed strand
34cfb3829048 Uploaded yboursin parents: diff changeset	252 interpretation
34cfb3829048 Uploaded yboursin parents: diff changeset	253 -a MINAQUAL, --minaqual=MINAQUAL
34cfb3829048 Uploaded yboursin parents: diff changeset	254 skip all reads with alignment quality lower than the
34cfb3829048 Uploaded yboursin parents: diff changeset	255 given minimum value (default: 0)
34cfb3829048 Uploaded yboursin parents: diff changeset	256 -t FEATURETYPE, --type=FEATURETYPE
34cfb3829048 Uploaded yboursin parents: diff changeset	257 feature type (3rd column in GFF file) to be used, all
34cfb3829048 Uploaded yboursin parents: diff changeset	258 features of other type are ignored (default, suitable
34cfb3829048 Uploaded yboursin parents: diff changeset	259 for Ensembl GTF files: exon)
34cfb3829048 Uploaded yboursin parents: diff changeset	260 -i IDATTR, --idattr=IDATTR
34cfb3829048 Uploaded yboursin parents: diff changeset	261 GFF attribute to be used as feature ID (default,
34cfb3829048 Uploaded yboursin parents: diff changeset	262 suitable for Ensembl GTF files: gene_id)
34cfb3829048 Uploaded yboursin parents: diff changeset	263 -o SAMOUT, --samout=SAMOUT
34cfb3829048 Uploaded yboursin parents: diff changeset	264 write out all SAM alignment records into an output SAM
34cfb3829048 Uploaded yboursin parents: diff changeset	265 file called SAMOUT, annotating each line with its
34cfb3829048 Uploaded yboursin parents: diff changeset	266 feature assignment (as an optional field with tag
34cfb3829048 Uploaded yboursin parents: diff changeset	267 'XF')
34cfb3829048 Uploaded yboursin parents: diff changeset	268 -q, --quiet suppress progress report and warnings
34cfb3829048 Uploaded yboursin parents: diff changeset	269
34cfb3829048 Uploaded yboursin parents: diff changeset	270 Written by Simon Anders (sanders@fs.tum.de), European Molecular Biology
34cfb3829048 Uploaded yboursin parents: diff changeset	271 Laboratory (EMBL). (c) 2010. Released under the terms of the GNU General
34cfb3829048 Uploaded yboursin parents: diff changeset	272 Public License v3. Part of the 'HTSeq' framework.
34cfb3829048 Uploaded yboursin parents: diff changeset	273 ]]>
34cfb3829048 Uploaded yboursin parents: diff changeset	274 </help>
34cfb3829048 Uploaded yboursin parents: diff changeset	275
34cfb3829048 Uploaded yboursin parents: diff changeset	276 <citations>
34cfb3829048 Uploaded yboursin parents: diff changeset	277 <citation type="bibtex">
34cfb3829048 Uploaded yboursin parents: diff changeset	278 @article{anders_htseqpython_2015,
34cfb3829048 Uploaded yboursin parents: diff changeset	279 title = {{HTSeq}—a {Python} framework to work with high-throughput sequencing data},
34cfb3829048 Uploaded yboursin parents: diff changeset	280 volume = {31},
34cfb3829048 Uploaded yboursin parents: diff changeset	281 issn = {1367-4803, 1460-2059},
34cfb3829048 Uploaded yboursin parents: diff changeset	282 url = {http://bioinformatics.oxfordjournals.org/content/31/2/166},
34cfb3829048 Uploaded yboursin parents: diff changeset	283 doi = {10.1093/bioinformatics/btu638},
34cfb3829048 Uploaded yboursin parents: diff changeset	284 abstract = {Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
34cfb3829048 Uploaded yboursin parents: diff changeset	285 Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
34cfb3829048 Uploaded yboursin parents: diff changeset	286 Availability and implementation: HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
34cfb3829048 Uploaded yboursin parents: diff changeset	287 Contact: sanders\{at\}fs.tum.de},
34cfb3829048 Uploaded yboursin parents: diff changeset	288 language = {en},
34cfb3829048 Uploaded yboursin parents: diff changeset	289 number = {2},
34cfb3829048 Uploaded yboursin parents: diff changeset	290 urldate = {2015-04-21},
34cfb3829048 Uploaded yboursin parents: diff changeset	291 journal = {Bioinformatics},
34cfb3829048 Uploaded yboursin parents: diff changeset	292 author = {Anders, Simon and Pyl, Paul Theodor and Huber, Wolfgang},
34cfb3829048 Uploaded yboursin parents: diff changeset	293 month = jan,
34cfb3829048 Uploaded yboursin parents: diff changeset	294 year = {2015},
34cfb3829048 Uploaded yboursin parents: diff changeset	295 pmid = {25260700},
34cfb3829048 Uploaded yboursin parents: diff changeset	296 pages = {166--169},
34cfb3829048 Uploaded yboursin parents: diff changeset	297 file = {Full Text PDF:/Users/lparsons/Library/Application Support/Firefox/Profiles/thd2t4je.default/zotero/storage/84XQB8V6/Anders et al. - 2015 - HTSeq—a Python framework to work with high-through.pdf:application/pdf;Snapshot:/Users/lparsons/Library/Application Support/Firefox/Profiles/thd2t4je.default/zotero/storage/JKUAUCKB/166.html:text/html}
34cfb3829048 Uploaded yboursin parents: diff changeset	298 }
34cfb3829048 Uploaded yboursin parents: diff changeset	299 </citation>
34cfb3829048 Uploaded yboursin parents: diff changeset	300 </citations>
34cfb3829048 Uploaded yboursin parents: diff changeset	301 </tool>
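
The awk filter at the end of the command block sends the special counter rows to a second output (othercounts) and everything else to counts. A minimal Python sketch of the same split follows, assuming tab- or space-separated rows as htseq-count prints them. The function name `split_counts` is hypothetical and not part of the tool. Like the awk expression, it uses an unanchored regex on the first field, so a feature ID that happens to contain one of these words would also land in the second list.

```python
import re

# Same alternation the awk filter in the wrapper tests against $1.
SPECIAL = re.compile(
    "no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique"
)


def split_counts(lines):
    """Split htseq-count rows into (feature_rows, special_rows).

    Hypothetical helper: mirrors the wrapper's awk filter, it is not
    part of HTSeq or Galaxy.
    """
    feature_rows, special_rows = [], []
    for raw in lines:
        row = raw.rstrip("\n")
        fields = row.split()  # awk-style whitespace splitting
        first = fields[0] if fields else ""
        if SPECIAL.search(first):
            special_rows.append(row)   # goes to "othercounts"
        else:
            feature_rows.append(row)   # goes to "counts"
    return feature_rows, special_rows


if __name__ == "__main__":
    demo = ["geneA\t10\n", "geneB\t0\n", "__no_feature\t5\n", "__ambiguous\t2\n"]
    counts, other = split_counts(demo)
    print(counts)
    print(other)
```

Matching on a substring rather than the exact name lets the filter work whether or not the installed HTSeq version prefixes the special counters with underscores.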

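The three overlap modes named in the help text can be illustrated with plain Python sets. This is a simplified sketch of the rule as described in the HTSeq documentation, not HTSeq's own code: each aligned position of a read contributes the set of features covering it, the mode decides how those per-position sets are combined into a set S, and the read is counted only if S holds exactly one feature. The function name `assign_read` and the list-of-sets input are assumptions made for illustration.

```python
def assign_read(position_sets, mode="union"):
    """Return the feature a read is counted for, or a special counter name.

    position_sets: one set of feature IDs per aligned position of the read
    (hypothetical input shape chosen for this sketch).
    """
    sets = [set(p) for p in position_sets]
    if mode == "union":
        # Every feature touched anywhere along the read.
        s = set().union(*sets)
    elif mode == "intersection-strict":
        # Features covering every position of the read.
        s = set.intersection(*sets) if sets else set()
    elif mode == "intersection-nonempty":
        # As above, but positions covered by no feature are ignored.
        nonempty = [p for p in sets if p]
        s = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError("unknown mode: %s" % mode)

    if not s:
        return "no_feature"
    if len(s) > 1:
        return "ambiguous"
    return next(iter(s))


if __name__ == "__main__":
    # Read hanging off the end of gene_A: second position has no feature.
    read = [{"gene_A"}, set()]
    for m in ("union", "intersection-strict", "intersection-nonempty"):
        print(m, assign_read(read, m))
```

The example read shows where the modes diverge: union and intersection-nonempty still assign it to gene_A, while intersection-strict reports no_feature because one position lies outside the gene.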