changeset 31:83ab9e468b86 draft
"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/featurecounts commit 839f962c859728f53bb696cea0720862418f1a13"
author | iuc |
---|---|
date | Sat, 04 Dec 2021 22:17:25 +0000 |
parents | 56aa64c23690 |
children | 456cc2927264 |
files | featurecounts.xml featurecounts.xml.orig |
diffstat | 2 files changed, 43 insertions(+), 678 deletions(-) |
--- a/featurecounts.xml Tue Aug 31 08:07:43 2021 +0000
+++ b/featurecounts.xml Sat Dec 04 22:17:25 2021 +0000
@@ -2,7 +2,7 @@
     <description>Measure gene expression in RNA-Seq experiments from SAM or BAM files</description>
     <macros>
         <token name="@TOOL_VERSION@">2.0.1</token>
-        <token name="@VERSION_SUFFIX@">1</token>
+        <token name="@VERSION_SUFFIX@">2</token>
     </macros>
     <xrefs>
         <xref type="bio.tools">subread</xref>
@@ -38,6 +38,12 @@
             -T \${GALAXY_SLOTS:-2}
 
             -s $strand_specificity
+
+            -Q $read_filtering_parameters.mapping_quality
+            $read_filtering_parameters.splitonly
+            $read_filtering_parameters.primary
+            $read_filtering_parameters.ignore_dup
+
             -t '$extended_parameters.gff_feature_type'
             -g '$extended_parameters.gff_feature_attribute'
             $extended_parameters.summarization_level
@@ -66,14 +72,11 @@
 
             $extended_parameters.by_read_group
 
-            -Q $extended_parameters.mapping_quality
             $extended_parameters.largest_overlap
             --minOverlap $extended_parameters.min_overlap
             --fracOverlap $extended_parameters.frac_overlap
             --fracOverlapFeature $extended_parameters.frac_overlap_feature
             $extended_parameters.read_reduction
-            $extended_parameters.primary
-            $extended_parameters.ignore_dup
             #if $extended_parameters.R:
                 $extended_parameters.R
             #end if
@@ -258,6 +261,33 @@
                    help="If specified, the chimeric fragments (those fragments that have their two ends aligned to different chromosomes) will NOT be included for summarization. This option is only applicable for paired-end read data." />
         </section>
 
+        <section name="read_filtering_parameters" title="Read filtering options">
+            <param name="mapping_quality"
+                   type="integer"
+                   value="0"
+                   argument="-Q"
+                   label="Minimum mapping quality per read"
+                   help="The minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criteria. 0 by default." />
+            <param name="splitonly" type="select" display="radio" label="Filter split alignments" help="Split alignments are alignments with CIGAR string containing 'N', e.g. exon spanning reads in RNASeq.">
+                <option value="">No filtering: count split and non-split alignments</option>
+                <option value="--splitOnly">Count only split alignments (--splitOnly)</option>
+                <option value="--nonSplitOnly">Count only non-split alignments (--nonSplitOnly)</option>
+            </param>
+            <param type="boolean"
+                   truevalue=" --primary"
+                   falsevalue=""
+                   argument="--primary"
+                   label="Only count primary alignments"
+                   help="If specified, only primary alignments will be counted. Primary and secondary alignments are identified using bit 0x100 in theFlag field of SAM/BAM files. All primary alignments in a dataset will be counted regardless of whether they are from multi-mapping reads or not ('-M' is ignored)." />
+            <param name="ignore_dup"
+                   type="boolean"
+                   truevalue=" --ignoreDup"
+                   falsevalue=""
+                   argument="--ignoreDup"
+                   label="Ignore reads marked as duplicate"
+                   help="If specified, reads that were marked as duplicates will be ignored. Bit Ox400 in the FLAG field of a SAM/BAM file is used for identifying duplicate reads. In paired end data, the entire read pair will be ignored if at least one end is found to be a duplicate read." />
+        </section>
+
         <section name="extended_parameters" title="Advanced options">
             <param name="gff_feature_type"
                    type="text"
@@ -283,8 +313,8 @@
 
             <conditional name = "multifeatures">
                 <param name="multifeat" type="select" label="Allow reads to map to multiple features" help="Setting -O, -M and --fraction">
-                    <option value="" selected="true">Disabled; reads that align to multiple features or overlapping features are excluded</option>
-                    <option value="-M">Enabled; multi-mapping reads are included (-M)</option>
+                    <option value="" selected="true">Disabled: reads that align to multiple features or overlapping features are excluded</option>
+                    <option value="-M">Enabled: multi-mapping reads are included (-M)</option>
                     <option value="-O">Enabled: multi-overlapping features are included (-O)</option>
                     <option value="-O -M">Enabled: both multi-mapping and multi-overlapping features are included (-M -O)</option>
                 </param>
@@ -295,8 +325,8 @@
                            truevalue="--fraction"
                            falsevalue=""
                            argument="--fraction"
-                           label="Assign fractions to multimapping reads"
-                           help="If specified, a fractional count 1/n will be generated for each multi-mapping read, where n is the number of alignments (indica- ted by 'NH' tag) reported for the read."/>
+                           label="Assign fractions to multi-mapping reads"
+                           help="If specified, a fractional count 1/x will be generated for each multi-mapping read, where x is the number of alignments (indicated by 'NH' tag) reported for the read."/>
                 </when>
                 <when value="-O">
                     <param name="fraction"
@@ -304,8 +334,8 @@
                            truevalue="--fraction"
                            falsevalue=""
                            argument="--fraction"
-                           label="Assign fractions to multimapping reads"
-                           help="If specified, a fractional count 1/n will be generated for each multi-overlapping read, where n is the number of alignments (indica- ted by 'NH' tag) reported for the read."/>
+                           label="Assign fractions to multi-overlapping features"
+                           help="If specified, a fractional count 1/y will be generated for each multi-overlapping feature, where y is the number of features overlapping with the read."/>
                 </when>
                 <when value="-O -M">
                     <param name="fraction"
@@ -313,18 +343,11 @@
                            truevalue="--fraction"
                            falsevalue=""
                            argument="--fraction"
-                           label="Assign fractions to multimapping reads"
-                           help="If specified, a fractional count 1/n will be generated for each multi-mapping or multi-overlapping read, where n is the number of alignments (indica- ted by 'NH' tag) reported for the read."/>
+                           label="Assign fractions to both multi-mapping reads and multi-overlapping features"
+                           help="If specified, a fractional count 1/(x*y) will be generated, where x is the number of alignments (indicated by 'NH' tag) and y the number of overlapping features."/>
                 </when>
             </conditional>
 
-            <param name="mapping_quality"
-                   type="integer"
-                   value="0"
-                   argument="-Q"
-                   label="Minimum mapping quality per read"
-                   help="The minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criteria. 0 by default." />
-
             <conditional name="exon_exon_junction_read_counting_enabled">
                 <param name="count_exon_exon_junction_reads" argument="-J" type="boolean" truevalue="-J" falsevalue=""
                        label="Exon-exon junctions"
@@ -403,22 +426,6 @@
                 <option value="--read2pos 3">Reduce it to the 3' end</option>
             </param>
 
-            <param name="primary"
-                   type="boolean"
-                   truevalue=" --primary"
-                   falsevalue=""
-                   argument="--primary"
-                   label="Only count primary alignments"
-                   help="If specified, only primary alignments will be counted. Primary and secondary alignments are identified using bit 0x100 in theFlag field of SAM/BAM files. All primary alignments in a dataset will be counted regardless of whether they are from multi-mapping reads or not ('-M' is ignored)." />
-
-            <param name="ignore_dup"
-                   type="boolean"
-                   truevalue=" --ignoreDup"
-                   falsevalue=""
-                   argument="--ignoreDup"
-                   label="Ignore reads marked as duplicate"
-                   help="If specified, reads that were marked as duplicates will be ignored. Bit Ox400 in the FLAG field of a SAM/BAM file is used for identifying duplicate reads. In paired end data, the entire read pair will be ignored if at least one end is found to be a duplicate read." />
-
             <param type="boolean"
                    truevalue="-R BAM"
                    falsevalue=""
@@ -426,13 +433,6 @@
                    label="Annotates the alignment file with 'XS:Z:'-tags to described per read or read-pair the corresponding assigned feature(s)."
                    help="" />
 
-            <param name="count_split_alignments_only"
-                   type="boolean"
-                   truevalue=" --countSplitAlignmentsOnly"
-                   falsevalue=""
-                   argument="--countSplitAlignmentsOnly"
-                   label="Ignore unspliced alignments"
-                   help="If specified, only split alignments (CIGAR strings containing the letter `N') will be counted. All the other alignments will be ignored. An example of split alignments are exon-spanning reads in RNA-seq data." />
         </section>
     </inputs>
     <outputs>
@@ -620,7 +620,7 @@
 FeatureCounts produces a table containing counted reads, per gene, per row. Optionally the last column can be set to be the effective gene-length. These tables are compatible with the DESeq2, edgeR and limma-voom Galaxy wrappers by IUC.
 
 .. _Subread: http://subread.sourceforge.net/
-.. _`Subread User's Guide`: http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf
+.. _`Subread User's Guide`: https://bioconductor.org/packages/release/bioc/vignettes/Rsubread/inst/doc/SubreadUsersGuide.pdf
 .. _`Subread package`: https://sourceforge.net/projects/subread/files/
     ]]></help>
     <citations>
--- a/featurecounts.xml.orig Tue Aug 31 08:07:43 2021 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,635 +0,0 @@ -<<<<<<< HEAD -<tool id="featurecounts" name="featureCounts" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.05"> - <description>Measure gene expression in RNA-Seq experiments from SAM or BAM files.</description> - <macros> - <token name="@TOOL_VERSION@">2.0.1</token> - <token name="@VERSION_SUFFIX@">1</token> - </macros> - -======= -<tool id="featurecounts" name="featureCounts" version="2.0.1" profile="16.04"> - <description>Measure gene expression in RNA-Seq experiments from SAM or BAM files</description> - <xrefs> - <xref type='bio.tools'>subread</xref> - </xrefs> ->>>>>>> d0c3b656f (add bio.tools ID) - <requirements> - <requirement type="package" version="@TOOL_VERSION@">subread</requirement> - <requirement type="package" version="1.11">samtools</requirement> - <requirement type="package" version="8.31">coreutils</requirement> - </requirements> - - <version_command>featureCounts -v 2>&1 | grep .</version_command> - <command detect_errors="exit_code"><![CDATA[ - - ## Export fc path for its built-in annotation - - export FC_PATH=\$(command -v featureCounts | sed 's@/bin/featureCounts$@@') && - - ## Check whether all alignments are from the same type (bam || sam) - featureCounts - - #if $anno.anno_select=="history": - -a '$anno.reference_gene_sets' - -F "GTF" - #elif $anno.anno_select=="cached": - -a '$anno.reference_gene_sets_builtin.fields.path' - -F "GTF" - #elif $anno.anno_select=="builtin": - -a \${FC_PATH}/annotation/${anno.bgenome}_RefSeq_exon.txt - -F "SAF" - #end if - - -o "output" - -T \${GALAXY_SLOTS:-2} - - -s $strand_specificity - -t '$extended_parameters.gff_feature_type' - -g '$extended_parameters.gff_feature_attribute' - $extended_parameters.summarization_level - - $extended_parameters.multifeatures.multifeat - #if $extended_parameters.multifeatures.multifeat != "": - $extended_parameters.multifeatures.fraction - 
#end if - - - ## $extended_parameters.contribute_to_multiple_features - ## $extended_parameters.multimapping_enabled.multimapping_counts - - ###if str($extended_parameters.multimapping_enabled.multimapping_counts) == " -M": - ## $extended_parameters.multimapping_enabled.fraction - ###end if --> - - $extended_parameters.exon_exon_junction_read_counting_enabled.count_exon_exon_junction_reads - #if str($extended_parameters.exon_exon_junction_read_counting_enabled.count_exon_exon_junction_reads) == "-J": - #if $extended_parameters.exon_exon_junction_read_counting_enabled.genome: - -G '$extended_parameters.exon_exon_junction_read_counting_enabled.genome' - #end if - #end if - - $extended_parameters.long_reads - - $extended_parameters.by_read_group - - -Q $extended_parameters.mapping_quality - $extended_parameters.largest_overlap - --minOverlap $extended_parameters.min_overlap - --fracOverlap $extended_parameters.frac_overlap - --fracOverlapFeature $extended_parameters.frac_overlap_feature - $extended_parameters.read_reduction - $extended_parameters.primary - $extended_parameters.ignore_dup - #if $extended_parameters.R: - $extended_parameters.R - #end if - #if str($extended_parameters.read_extension_5p) != "0": - --readExtension5 $extended_parameters.read_extension_5p - #end if - - #if str($extended_parameters.read_extension_3p) != "0": - --readExtension3 $extended_parameters.read_extension_3p - #end if - - $pe_parameters.fragment_counting_enabled.fragment_counting - #if str($pe_parameters.fragment_counting_enabled.fragment_counting) == " -p": - $pe_parameters.fragment_counting_enabled.check_distance_enabled.check_distance - #if str($pe_parameters.fragment_counting_enabled.check_distance_enabled.check_distance) == " -P": - -d $pe_parameters.fragment_counting_enabled.check_distance_enabled.minimum_fragment_length - -D $pe_parameters.fragment_counting_enabled.check_distance_enabled.maximum_fragment_length - #end if - #end if - - $pe_parameters.only_both_ends - 
$pe_parameters.exclude_chimerics - - '${alignment}' - - ## Remove comment and add sample name to header - && grep -v "^#" "output" - | sed -e 's|${alignment}|${alignment.element_identifier}|g' - > body.txt - ## Set the right columns for the tabular formats - #if $format.value == "tabdel_medium": - && cut -f 1,7 body.txt > expression_matrix.txt - - ## Paste doesn't allow a non ordered list of columns: -f 1,7,8,6 will only return columns 1,7 and 8 - ## Thus the gene length column (last column) has to be added separately - && cut -f 6 body.txt > gene_lengths.txt - && paste expression_matrix.txt gene_lengths.txt > expression_matrix.txt.bak - && mv -f expression_matrix.txt.bak '${output_medium}' - #elif $format.value == "tabdel_short": - && cut -f 1,7 body.txt > '${output_short}' - #else: - && cp body.txt '${output_full}' - #end if - - #if str($include_feature_length_file) == "true": - && cut -f 1,6 body.txt > '${output_feature_lengths}' - #end if - - #if str($extended_parameters.exon_exon_junction_read_counting_enabled.count_exon_exon_junction_reads) == "-J": - && sed -e 's|${alignment}|${alignment.element_identifier}|g' 'output.jcounts' > '${output_jcounts}' - #end if - - #if $extended_parameters.R: - && samtools sort --no-PG -o '$output_bam' -@ \${GALAXY_SLOTS:-2} -T "\${TMPDIR:-.}" *.featureCounts.bam - #end if - && sed -e 's|${alignment}|${alignment.element_identifier}|g' 'output.summary' > '${output_summary}' - ]]></command> - <inputs> - <param name="alignment" - type="data" - multiple="false" - format="bam,sam" - label="Alignment file" - help="The input alignment file(s) where the gene expression has to be counted. The file can have a SAM or BAM format; but ALL files must be in the same format. Unless you are using a Gene annotation file from the History, these files must have the database/genome attribute already specified e.g. hg38, not the default: ?" 
> - </param> - - <param name="strand_specificity" - type="select" - label="Specify strand information" - argument="-s" - help="Indicate if the data is stranded and if strand-specific read counting should be performed. Strand setting must be the same as the strand settings used to produce the mapped BAM input(s)"> - <option value="0" selected="true">Unstranded</option> - <option value="1">Stranded (Forward)</option> - <option value="2">Stranded (Reverse)</option> - </param> - - <conditional name="anno"> - <param name="anno_select" type="select" label="Gene annotation file"> - <option value="builtin">featureCounts built-in</option> - <option value="cached" selected="True">locally cached</option> - <option value="history">in your history</option> - </param> - <when value="builtin"> - <param name="bgenome" type="select" label="Select built-in genome" help="Built-in gene annotations for genomes hg38, hg19, mm10 and mm9 are included in featureCounts"> - <options from_data_table="featurecounts_anno"> - <filter type="data_meta" key="dbkey" ref="alignment" column="dbkey"/> - </options> - </param> - </when> - <when value="cached"> - <param name="reference_gene_sets_builtin" type="select" label="Using locally cached annotation" help="If the annotation file you require is not listed here, please contact the Galaxy administrator"> - <options from_data_table="gene_sets"> - <filter type="data_meta" key="dbkey" ref="alignment" column="dbkey"/> - <filter type="sort_by" column="2" /> - </options> - <validator type="no_options" message="An annotation file is not available for the build associated with the selected input file"/> - </param> - </when> - <when value="history"> - <param name="reference_gene_sets" - format="gff,gtf,gff3" - type="data" - label="Gene annotation file" - help="The program assumes that the provided annotation file is in GTF format. 
Make sure that the gene annotation file corresponds to the same reference genome as used for the alignment"> - </param> - </when> - </conditional> - - <param name="format" - type="select" - label="Output format" - help="The output format will be tabular, select the preferred columns here"> - <option value="tabdel_short">Gene-ID "\t" read-count (MultiQC/DESeq2/edgeR/limma-voom compatible)</option> - <option value="tabdel_medium">Gene-ID "\t" read-count "\t" gene-length</option> - <option value="tabdel_full">featureCounts 1.4.0+ default (includes regions provided by the GTF file)</option> - </param> - - <param name="include_feature_length_file" - type="boolean" - truevalue="true" - falsevalue="false" - checked="false" - label="Create gene-length file" - help="Creates a tabular file that contains the effective (nucleotides used for counting reads) length of the feature; might be useful for estimating FPKM/RPKM" /> - - - <section name="pe_parameters" title="Options for paired-end reads"> - <conditional name="fragment_counting_enabled"> - - <param name="fragment_counting" - type="select" - argument="-p" - checked="true" - label="Count fragments instead of reads" - help="If specified, fragments (or templates) will be counted instead of reads."> - <option value="" selected="true">Disabled; all reads/mates will be counted individually</option> - <option value=" -p">Enabled; fragments (or templates) will be counted instead of reads</option> - </param> - - <when value=" -p"> - <conditional name="check_distance_enabled"> - <param name="check_distance" - type="boolean" - truevalue=" -P" - falsevalue="" - argument="-P" - label="Check paired-end distance" - help="If specified, paired-end distance will be checked when assigning fragments to meta-features or features. This option is only applicable when -p (Count fragments instead of reads) is specified. The distance thresholds should be specified using -d and -D (minimum and maximum fragment/template length) options." 
/> - <when value=" -P"> - <param name="minimum_fragment_length" - type="integer" - value="50" - argument="-d" - label="Minimum fragment/template length." /> - <param name="maximum_fragment_length" - type="integer" - value="600" - argument="-D" - label="Maximum fragment/template length." /> - </when> - <when value="" /> - </conditional> - </when> - <when value="" /> - </conditional> - - <param name="only_both_ends" - type="boolean" - truevalue=" -B" - falsevalue="" - argument="-B" - label="Only allow fragments with both reads aligned" - help="If specified, only fragments that have both ends successfully aligned will be considered for summarization. This option is only applicable for paired-end reads." /> - - <param name="exclude_chimerics" - type="boolean" - truevalue=" -C" - falsevalue="" - argument="-C" - checked="true" - label="Exclude chimeric fragments" - help="If specified, the chimeric fragments (those fragments that have their two ends aligned to different chromosomes) will NOT be included for summarization. This option is only applicable for paired-end read data." /> - </section> - - <section name="extended_parameters" title="Advanced options"> - <param name="gff_feature_type" - type="text" - value="exon" - argument="-t" - label="GFF feature type filter" - help="Specify the feature type. Only rows which have the matched matched feature type in the provided GTF annotation file will be included for read counting. `exon' by default." /> - - <param name="gff_feature_attribute" - type="text" - value="gene_id" - argument="-g" - label="GFF gene identifier" - help="Specify the attribute type used to group features (eg. exons) into meta-features (eg. genes), when GTF annotation is provided. `gene_id' by default. This attribute type is usually the gene identifier. This argument is useful for the meta-feature level summarization." 
/> - - <param name="summarization_level" - type="boolean" - truevalue=" -f" - falsevalue="" - argument="-f" - label="On feature level" - help="If specified, read summarization will be performed at the feature level. By default (-f is not specified), the read summarization is performed at the meta-feature level." /> - - <conditional name = "multifeatures"> - <param name="multifeat" type="select" label="Allow reads to map to multiple features" help="Setting -O, -M and --fraction"> - <option value="" selected="true">Disabled; reads that align to multiple features or overlapping features are excluded</option> - <option value="-M">Enabled; multi-mapping reads are included (-M)</option> - <option value="-O">Enabled: multi-overlapping features are included (-O)</option> - <option value="-O -M">Enabled: both multi-mapping and multi-overlapping features are included (-M -O)</option> - </param> - <when value=""/> - <when value="-M"> - <param name="fraction" - type="boolean" - truevalue="--fraction" - falsevalue="" - argument="--fraction" - label="Assign fractions to multimapping reads" - help="If specified, a fractional count 1/n will be generated for each multi-mapping read, where n is the number of alignments (indica- ted by 'NH' tag) reported for the read."/> - </when> - <when value="-O"> - <param name="fraction" - type="boolean" - truevalue="--fraction" - falsevalue="" - argument="--fraction" - label="Assign fractions to multimapping reads" - help="If specified, a fractional count 1/n will be generated for each multi-overlapping read, where n is the number of alignments (indica- ted by 'NH' tag) reported for the read."/> - </when> - <when value="-O -M"> - <param name="fraction" - type="boolean" - truevalue="--fraction" - falsevalue="" - argument="--fraction" - label="Assign fractions to multimapping reads" - help="If specified, a fractional count 1/n will be generated for each multi-mapping or multi-overlapping read, where n is the number of alignments (indica- ted by 
'NH' tag) reported for the read."/> - </when> - </conditional> - - <param name="mapping_quality" - type="integer" - value="0" - argument="-Q" - label="Minimum mapping quality per read" - help="The minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criteria. 0 by default." /> - - <conditional name="exon_exon_junction_read_counting_enabled"> - <param name="count_exon_exon_junction_reads" argument="-J" type="boolean" truevalue="-J" falsevalue="" - label="Exon-exon junctions" - help="If specified, reads supporting each exon-exon junction will be counted" /> - <when value="-J"> - <param name="genome" argument="-G" type="data" format="fasta" optional="true" - label="Reference sequence file" - help="The FASTA-format file that contains the reference sequences used in read mapping can be used to improve read counting for junctions" /> - </when> - <when value="" /> - </conditional> - - <param name="long_reads" argument="-L" type="boolean" truevalue="-L" falsevalue="" - label="Long reads" - help="If specified, long reads such as Nanopore and PacBio reads will be counted. Long read counting can only run in one thread and only reads (not read-pairs) can be counted." /> - - <param name="by_read_group" argument="--byReadGroup" type="boolean" truevalue="--byReadGroup" falsevalue="" - label="Count reads by read group" - help="If specified, reads are counted for each read group separately. The 'RG' tag must be present in the input BAM/SAM alignment files." 
/> - - - <param name="largest_overlap" - type="boolean" - truevalue=" --largestOverlap" - falsevalue="" - argument="--largestOverlap" - label="Largest overlap" - help="If specified, reads (or fragments) will be assigned to the target that has the largest number of overlapping bases" /> - - <param name="min_overlap" - type="integer" - value="1" - argument="--minOverlap" - label="Minimum bases of overlap" - help="Specify the minimum required number of overlapping bases between a read (or a fragment) and a feature. 1 by default. If a negative value is provided, the read will be extended from both ends." /> - - <param name="frac_overlap" - type="integer" - value="0" - min="0" - max="1" - argument="--fracOverlap" - label="Minimum fraction (of read) overlapping a feature" - help="Specify the minimum required fraction of overlapping bases between a read (or a fragment) and a feature. Value should be within range [0,1]. 0 by default. Number of overlapping bases is counted from both reads if paired end. Both this option and '--minOverlap' need to be satisfied for read assignment." /> - - <param name="frac_overlap_feature" - type="integer" - value="0" - min="0" - max="1" - argument="--fracOverlapFeature" - label="Minimum fraction (of feature) overlapping a read" - help="Specify the minimum required fraction of bases included in a feature overlapping bases between a read (or a read-pair). Value should be within range [0,1]. 0 by default." /> - - <param name="read_extension_5p" - type="integer" - value="0" - argument="--readExtension5" - label="Read 5' extension" - help="Reads are extended upstream by ... bases from their 5' end" /> - - <param name="read_extension_3p" - type="integer" - value="0" - argument="--readExtension3" - label="Read 3' extension" - help="Reads are extended upstream by ... 
bases from their 3' end" /> - - <param name="read_reduction" - type="select" - label="Reduce read to single position" - argument="--read2pos" - help="The read is reduced to its 5' most base or 3'most base. Read summarization is then performed based on the single base the the read is reduced to."> - <option value="" selected="true">Leave the read as it is</option> - <option value="--read2pos 5">Reduce it to the 5' end</option> - <option value="--read2pos 3">Reduce it to the 3' end</option> - </param> - - <param name="primary" - type="boolean" - truevalue=" --primary" - falsevalue="" - argument="--primary" - label="Only count primary alignments" - help="If specified, only primary alignments will be counted. Primary and secondary alignments are identified using bit 0x100 in theFlag field of SAM/BAM files. All primary alignments in a dataset will be counted regardless of whether they are from multi-mapping reads or not ('-M' is ignored)." /> - - <param name="ignore_dup" - type="boolean" - truevalue=" --ignoreDup" - falsevalue="" - argument="--ignoreDup" - label="Ignore reads marked as duplicate" - help="If specified, reads that were marked as duplicates will be ignored. Bit Ox400 in the FLAG field of a SAM/BAM file is used for identifying duplicate reads. In paired end data, the entire read pair will be ignored if at least one end is found to be a duplicate read." /> - - <param type="boolean" - truevalue="-R BAM" - falsevalue="" - argument="-R" - label="Annotates the alignment file with 'XS:Z:'-tags to described per read or read-pair the corresponding assigned feature(s)." - help="" /> - - <param name="count_split_alignments_only" - type="boolean" - truevalue=" --countSplitAlignmentsOnly" - falsevalue="" - argument="--countSplitAlignmentsOnly" - label="Ignore unspliced alignments" - help="If specified, only split alignments (CIGAR strings containing the letter `N') will be counted. All the other alignments will be ignored. 
An example of split alignments are exon-spanning reads in RNA-seq data." /> - </section> - </inputs> - <outputs> - <data format="tabular" - name="output_medium" - label="${tool.name} on ${on_string}: Counts (with length)"> - <filter>format == "tabdel_medium"</filter> - <actions> - <action name="column_names" type="metadata" default="Geneid,${alignment.element_identifier},Length" /> - </actions> - </data> - - <data format="bam" - name="output_bam" - label="${tool.name} on ${on_string}: Alignment file"> - <filter>extended_parameters['R']</filter> - </data> - - <data format="tabular" - name="output_short" - label="${tool.name} on ${on_string}: Counts"> - <filter>format == "tabdel_short"</filter> - <actions> - <action name="column_names" type="metadata" default="Geneid,${alignment.element_identifier}" /> - </actions> - </data> - - <data format="tabular" - name="output_full" - label="${tool.name} on ${on_string}: Counts (with location)"> - <filter>format == "tabdel_full"</filter> - <actions> - <action name="column_names" type="metadata" default="Geneid,Chr,Start,End,Strand,Length,${alignment.element_identifier}" /> - </actions> - </data> - - <data format="tabular" - name="output_summary" - label="${tool.name} on ${on_string}: Summary"> - <actions> - <action name="column_names" type="metadata" default="Status,${alignment.element_identifier}" /> - </actions> - </data> - - <data format="tabular" - name="output_feature_lengths" - label="${tool.name} on ${on_string}: Feature lengths"> - <filter>include_feature_length_file</filter> - <actions> - <action name="column_names" type="metadata" default="Feature,Length" /> - </actions> - </data> - - <data name="output_jcounts" format="tabular" - label="${tool.name} on ${on_string}: Junction counts"> - <filter>extended_parameters['exon_exon_junction_read_counting_enabled']['count_exon_exon_junction_reads']</filter> - <actions> - <action name="column_names" type="metadata" - 
default="PrimaryGene,SecondaryGene,Site1_chr,Site1_location,Site1_strand,Site2_chr,Site2_location,Site2_strand,${alignment.element_identifier}" />
-            </actions>
-        </data>
-    </outputs>
-    <tests>
-        <test expect_num_outputs="3">
-            <param name="alignment" value="featureCounts_input1.bam" ftype="bam" dbkey="hg38" />
-            <param name="anno_select" value="history"/>
-            <param name="reference_gene_sets" value="featureCounts_guide.gff" ftype="gff" dbkey="hg38" />
-            <param name="format" value="tabdel_medium" />
-            <param name="include_feature_length_file" value="true"/>
-            <output name="output_medium" file="output_1_medium.tab">
-                <metadata name="column_names" value="Geneid,featureCounts_input1.bam,Length"/>
-            </output>
-            <output name="output_summary" file="output_1_summary.tab">
-                <metadata name="column_names" value="Status,featureCounts_input1.bam"/>
-            </output>
-        </test>
-        <test expect_num_outputs="3">
-            <param name="alignment" value="featureCounts_input1.bam" ftype="bam" dbkey="hg38" />
-            <param name="anno_select" value="history"/>
-            <param name="reference_gene_sets" value="featureCounts_guide.gff" ftype="gff" dbkey="hg38" />
-            <param name="format" value="tabdel_full" />
-            <param name="include_feature_length_file" value="true"/>
-            <output name="output_full" file="output_1_full.tab">
-                <metadata name="column_names" value="Geneid,Chr,Start,End,Strand,Length,featureCounts_input1.bam"/>
-            </output>
-            <output name="output_summary" file="output_1_summary.tab">
-                <metadata name="column_names" value="Status,featureCounts_input1.bam"/>
-            </output>
-            <output name="output_feature_lengths" file="output_feature_lengths.tab">
-                <metadata name="column_names" value="Feature,Length"/>
-            </output>
-        </test>
-        <test expect_num_outputs="4">
-            <param name="alignment" value="featureCounts_input1.bam" ftype="bam" dbkey="hg38" />
-            <param name="anno_select" value="history"/>
-            <param name="reference_gene_sets" value="featureCounts_guide.gff" ftype="gff" dbkey="hg38" />
-            <param name="format" value="tabdel_short" />
-            <param name="include_feature_length_file" value="true"/>
-            <param name="count_exon_exon_junction_reads" value="-J"/>
-            <output name="output_short" file="output_1_short.tab">
-                <metadata name="column_names" value="Geneid,featureCounts_input1.bam"/>
-            </output>
-            <output name="output_summary" file="output_1_summary.tab">
-                <metadata name="column_names" value="Status,featureCounts_input1.bam"/>
-            </output>
-            <output name="output_jcounts" file="output_1_jcounts.tab">
-                <metadata name="column_names" value="PrimaryGene,SecondaryGene,Site1_chr,Site1_location,Site1_strand,Site2_chr,Site2_location,Site2_strand,featureCounts_input1.bam"/>
-            </output>
-        </test>
-        <!-- Ensure featureCounts built-in annotation works -->
-        <test expect_num_outputs="3">
-            <param name="alignment" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam" ftype="bam" dbkey="hg19" />
-            <param name="anno_select" value="builtin"/>
-            <param name="format" value="tabdel_short" />
-            <section name="extended_parameters">
-                <param name="R" value="true" />
-            </section>
-            <output name="output_short" file="output_builtin_hg19.tab">
-                <metadata name="column_names" value="Geneid,pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/>
-            </output>
-            <output name="output_summary" file="output_summary_builtin_hg19.tab"/>
-            <output name="output_bam" file="output.bam" ftype="bam"/>
-        </test>
-        <!-- Ensure cached GTFs work -->
-        <test expect_num_outputs="3">
-            <param name="alignment" value="featureCounts_input1.bam" ftype="bam" dbkey="hg38" />
-            <param name="anno_select" value="cached"/>
-            <param name="format" value="tabdel_medium" />
-            <param name="include_feature_length_file" value="true"/>
-            <output name="output_medium" file="output_1_medium.tab">
-                <metadata name="column_names" value="Geneid,featureCounts_input1.bam,Length"/>
-            </output>
-            <output name="output_summary" file="output_1_summary.tab">
-                <metadata name="column_names" value="Status,featureCounts_input1.bam"/>
-            </output>
-        </test>
-        <!-- Ensure BAM output works -->
-        <test>
-            <param name="alignment" value="subset.sorted.bam" ftype="bam" />
-            <param name="anno_select" value="history" />
-            <param name="reference_gene_sets" value="small.gtf" ftype="gtf" />
-            <section name="extended_parameters" >
-                <param name="R" value="true" />
-            </section>
-            <output name="output_bam" value="subset.sorted.featurecounts.bam" compare="sim_size"/>
-        </test>
-    </tests>
-
-    <help><![CDATA[
-featureCounts
-#############
-
-Overview
---------
-FeatureCounts is a light-weight read counting program written entirely in the C programming language. It can be used to count both gDNA-seq and RNA-seq reads for genomic features in in SAM/BAM files. FeatureCounts is part of the Subread_ package.
-
-Input formats
--------------
-Alignments should be provided in either:
-
- - SAM format, http://samtools.sourceforge.net/samtools.shtml#5
- - BAM format
-
-Annotations for gene regions should be provided in the GFF/GTF format:
-
- - http://genome.ucsc.edu/FAQ/FAQformat.html#format3
- - http://www.ensembl.org/info/website/upload/gff.html
-
-Alternatively, the featureCounts built-in annotations for genomes hg38, hg19, mm10 and mm9 can be used through selecting the built-in option above. These annotation files are in simplified annotation format (SAF) as shown below. The GeneID column contains Entrez gene identifiers and each entry (row) is taken as a feature (e.g. an exon).
-
-Example - **Built-in annotation format**:
-
-  ====== ==== ======= ======= ======
-  GeneID Chr  Start   End     Strand
-  ====== ==== ======= ======= ======
-  497097 chr1 3204563 3207049 -
-  497097 chr1 3411783 3411982 -
-  497097 chr1 3660633 3661579 -
-  ====== ==== ======= ======= ======
-
-These annotation files can be found in the `Subread package`_. You can see the version of Subread used by this wrapper in the tool form above under `Options > Requirements`. To create the files, the annotations were downloaded from NCBI RefSeq database and then adapted by merging overlapping exons from the same gene to form a set of disjoint exons for each gene. Genes with the same Entrez gene identifiers were also merged into one gene. See the `Subread User's Guide`_ for more information. Gene names can be obtained for these Entrez identifiers with the Galaxy **annotateMyIDs** tool.
-
-Output format
--------------
-FeatureCounts produces a table containing counted reads, per gene, per row. Optionally the last column can be set to be the effective gene-length. These tables are compatible with the DESeq2, edgeR and limma-voom Galaxy wrappers by IUC.
-
-.. _Subread: http://subread.sourceforge.net/
-.. _`Subread User's Guide`: http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf
-.. _`Subread package`: https://sourceforge.net/projects/subread/files/
-    ]]></help>
-    <citations>
-        <citation type="doi">10.1093/bioinformatics/btt656</citation>
-    </citations>
-</tool>