Mercurial > repos > devteam > fastq_stats

<tool id="fastq_stats" name="FASTQ Summary Statistics" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
    <description>by column</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <edam_topics>
        <edam_topic>topic_0622</edam_topic>
    </edam_topics>
    <edam_operations>
        <edam_operation>operation_3180</edam_operation>
    </edam_operations>
    <expand macro="requirements"/>
    <command><![CDATA[
gx-fastq-stats '$input_file' '$output_file' '${input_file.extension[len('fastq'):]}'
    ]]></command>
    <inputs>
        <param name="input_file" type="data" format="fastqsanger,fastqillumina,fastqsolexa,fastqcssanger,fastqsanger.gz,fastqillumina.gz,fastqsolexa.gz,fastqcssanger.gz,fastqsanger.bz2,fastqillumina.bz2,fastqsolexa.bz2,fastqcssanger.bz2" label="FASTQ File"/>
    </inputs>
    <outputs>
        <data name="output_file" format="tabular" />
    </outputs>
    <tests>
        <test>
            <param name="input_file" value="fastq_stats1.fastq" ftype="fastqsanger" />
            <output name="output_file" file="fastq_stats_1_out.tabular" />
        </test>
    </tests>
    <help><![CDATA[
**What is does**

This tool creates summary statistics on a FASTQ file.

.. class:: infomark

**TIP:** This statistics report can be used as input for the **Boxplot** tools.

-----

**The output file will contain the following fields:**

* column      = column number (1 to 36 for a 36-cycles read Solexa file)
* count       = number of bases found in this column.
* min         = Lowest quality score value found in this column.
* max         = Highest quality score value found in this column.
* sum         = Sum of quality score values for this column.
* mean        = Mean quality score value for this column.
* Q1          = 1st quartile quality score.
* med         = Median quality score.
* Q3          = 3rd quartile quality score.
* IQR         = Inter-Quartile range (Q3-Q1).
* lW          = 'Left-Whisker' value (for boxplotting).
* rW          = 'Right-Whisker' value (for boxplotting).
* outliers    = Scores falling beyond the left and right whiskers (comma separated list).
* A_Count     = Count of 'A' nucleotides found in this column.
* C_Count     = Count of 'C' nucleotides found in this column.
* G_Count     = Count of 'G' nucleotides found in this column.
* T_Count     = Count of 'T' nucleotides found in this column.
* N_Count     = Count of 'N' nucleotides found in this column.
* Other_Nucs  = Comma separated list of other nucleotides found in this column.
* Other_Count = Comma separated count of other nucleotides found in this column.

For example::

  #column   count   min max sum mean    Q1  med Q3  IQR lW  rW  outliers    A_Count C_Count G_Count T_Count N_Count other_bases other_base_count
  1   14336356    2   33  450600675   31.4306281875   32.0    33.0    33.0    1.0 31  33  2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30    4482314 2199633 4425957 3208745 19707
  2   14336356    2   34  441135033   30.7703737965   30.0    33.0    33.0    3.0 26  34  2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25   4419184 2170537 4627987 3118567 81
  3   14336356    2   34  433659182   30.2489127642   29.0    32.0    33.0    4.0 23  34  2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22    4310988 2941988 3437467 3645784 129
  4   14336356    2   34  433635331   30.2472490917   29.0    32.0    33.0    4.0 23  34  2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22    4110637 3007028 3671749 3546839 103
  5   14336356    2   34  432498583   30.167957813    29.0    32.0    33.0    4.0 23  34  2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22    4348275 2935903 3293025 3759029 124

-----

.. class:: warningmark

Adapter bases in color space reads are excluded from statistics.
    ]]></help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/btq281</citation>
    </citations>
</tool>
author	iuc
date	Fri, 04 Oct 2024 10:33:53 +0000
parents	1bbd777c6b87
children