<tool id="dimspy_replicate_filter" name="Replicate Filter" version="1.0.0">
    <description> - Remove peaks that fail to appear in at least x-out-of-n (technical) replicates</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code">
    <![CDATA[
        dimspy replicate-filter
        --input '$hdf5_file_in'
        --output '$hdf5_file_out'
        #if $filelist
            --filelist '$filelist'
        #end if
        --ppm $ppm
        --replicates $replicates
        --min-peak-present $min_peaks
        #if $rsd_threshold
            --rsd-threshold $rsd_threshold
        #end if
        --report '$report'
        &&
        dimspy create-sample-list
        --input '$hdf5_file_out'
        --output '$samplelist'
        --delimiter tab
        &&
        dimspy hdf5-pls-to-txt
        --input '$hdf5_file_out'
        --output .
        --delimiter $delimiter
    ]]>
    </command>
    <inputs>
        <param name="hdf5_file_in" type="data" format="h5" label="Peaklists (HDF5 file)" help="Peaklists generated by Process Scans (SIM-Stitch)." argument="--input"/>
        <param name="filelist" type="data" optional="true" format="tsv,tabular" label="Filelist / Samplelist" help="Only provide a filelist if you would like to exclude Peaklists, update the metadata (e.g. classLabel), or if you have not provided a filelist for Process Scans." argument="--filelist" />
        <param name="replicates" type="integer" value="3" label="Number of technical replicates for each sample" help="" argument="--replicates"/>
        <param name="min_peaks" type="integer" value="2" label="Minimum number of technical replicates a peak has to be present in" help="" argument="--min-peak-present"/>
        <param name="ppm" type="float" value="2.0" label="Ppm error tolerance" help="Maximum tolerated m/z deviation across technical replicates in parts per million (ppm)." argument="--ppm"/>
        <param name="rsd_threshold" type="text" value="" label="Relative standard deviation threshold" help="Maximum tolerated relative standard deviation (RSD) of the peak intensities across technical replicates. Leave empty to skip this filter step." argument="--rsd-threshold"/>
        <param name="delimiter" type="hidden" value="tab" argument="--delimiter"/>
    </inputs>
    <outputs>
        <data name="hdf5_file_out" format="h5" label="${tool.name} on ${on_string}: Peaklists (HDF5 file)"/>
        <data name="report" format="txt" label="${tool.name} on ${on_string}: Report"/>
        <data name="samplelist" format="tsv" label="${tool.name} on ${on_string}: Sample Metadata (updated)" />
        <collection name="peaklists_txt" type="list" label="${tool.name} on ${on_string}: Peaklists">
            <discover_datasets pattern="(?P&lt;designation&gt;.+)\.txt" format="tsv" directory="." visible="false" />
        </collection>
    </outputs>
    <tests>
        <test>
            <param name="hdf5_file_in" value="pls.h5" ftype="h5"/>
            <param name="replicates" value="3"/>
            <param name="min_peaks" value="2"/>
            <param name="ppm" value="2.0"/>
            <param name="rsd_threshold" value=""/>
            <output name="hdf5_file_out" value="pls_rf.h5" ftype="h5" compare="sim_size" />
            <output name="report" value="report_pls_rf_01.txt" ftype="txt"/>
            <output name="samplelist" value="samplelist_1.txt" ftype="tsv"/>
            <output_collection name="peaklists_txt" type="list">
                <element name="batch04_QC17_rep01_262_2_263_3_264" file="batch04_QC17_rep01_262_2_263_3_264.txt" ftype="tsv"/>
            </output_collection>
        </test>
        <test>
            <param name="hdf5_file_in" value="pls.h5" ftype="h5"/>
            <param name="filelist" value="filelist_mzml_triplicates.txt" ftype="tsv"/>
            <param name="replicates" value="3"/>
            <param name="min_peaks" value="2"/>
            <param name="ppm" value="2.0"/>
            <param name="rsd_threshold" value=""/>
            <output name="hdf5_file_out" value="pls_rf.h5" ftype="h5" compare="sim_size" />
            <output name="report" value="report_pls_rf_02.txt" ftype="txt"/>
            <output name="samplelist" value="samplelist_2.txt" ftype="tsv"/>
            <output_collection name="peaklists_txt" type="list">
                <element name="batch04_QC17_rep01_262_2_263_3_264" file="batch04_QC17_rep01_262_2_263_3_264.txt" ftype="tsv"/>
            </output_collection>
        </test>
    </tests>
    <help>
----------------
Replicate filter
----------------

|

Description
-----------

| This tool is typically applied following the 'process_scans' tool.
| Under the DIMS analysis workflow, biological samples are often analysed as a set of technical replicates. This tool is used to combine the Peaklists for each of these technical replicates into a single Peaklist.
|
| When combining technical replicate Peaklists, peaks are clustered together where the difference in their m/z values (measured in parts per million, 'ppm') is less than or equal to the user-defined threshold.
|
| Peaks are removed from the final Peaklist if:

1) they occur in fewer than the user-defined 'Minimum number of technical replicates a peak has to be present in' and/or
2) the relative standard deviation (measured in % and also termed the coefficient of variation) among intensity values for a peak is greater than the user-defined value.

Parameters
----------

**\1. Set of Peaklists (HDF5 file)** (REQUIRED)

A set of Peaklists (HDF5 format). This file is returned automatically by the preceding 'process_scans' tool.

**\2. Filelist / Samplelist** (OPTIONAL)

| A tabular-formatted .txt file with columns: filename, replicate, batch, classLabel, injectionOrder.
| Additional columns are allowed but are not used during processing.
| This file must be uploaded into (or available from) the current history so that it can be selected from the drop-down menu.
| **NOTE:** Only provide a filelist if you would like to exclude Peaklists, update the metadata (e.g. classLabel), or if you have not provided a filelist for 'process scans'.
|

@example_filelist@

**\3. Number of technical replicates** (REQUIRED)

The total number of technical replicates acquired for each sample. All samples must have the same number of technical replicates.

**\4. Minimum number of technical replicates a peak has to be present in** (REQUIRED)

A numerical value from 0 up to the numerical value entered in the 'Number of technical replicates' box. Peaks that occur in fewer than this number of technical replicates are removed from the output Peaklist.
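
As an illustration only, the x-out-of-n rule described above can be sketched in Python. This is a minimal sketch and not DIMSpy's implementation: the function name ``passes_presence_filter`` and the use of ``None`` to mark a replicate in which the peak was not detected are assumptions made for this example.

```python
def passes_presence_filter(intensities, min_peaks):
    """Return True if a clustered peak is present in enough replicates.

    intensities: one value per technical replicate; None marks a replicate
    in which the peak was not detected (an assumption of this sketch).
    min_peaks: minimum number of replicates the peak has to be present in.
    """
    present = sum(1 for value in intensities if value is not None)
    return present >= min_peaks


# Peak found in 2 of 3 replicates with a minimum of 2: the peak is kept.
print(passes_presence_filter([1520.0, None, 1498.5], 2))
# Peak found in only 1 of 3 replicates: the peak is removed.
print(passes_presence_filter([None, None, 310.2], 2))
```

With the default settings of this tool (3 replicates, minimum of 2), a peak missing from a single replicate is retained, while a peak seen in only one replicate is discarded.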

**\5. ppm error tolerance** (REQUIRED)

A numerical value from 0 upwards. This value defines the tolerance applied when clustering peaks (based on their m/z value) across each of the technical replicates.
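
To give a feel for the scale of a ppm tolerance, the following minimal sketch computes the difference between two m/z values in parts per million. The function name ``ppm_difference`` is hypothetical, and taking the first m/z value as the reference is an assumption of this example; DIMSpy's clustering may choose its reference differently.

```python
def ppm_difference(mz_a, mz_b):
    """Absolute m/z difference in parts per million, relative to mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6


# Two peaks 0.0002 m/z apart near m/z 200 differ by about 1 ppm.
diff = ppm_difference(200.0000, 200.0002)
print(round(diff, 3))
# True when the difference is within a tolerance of 2.0 ppm.
print(2.0 >= diff)
```

Because the tolerance is relative, the same ppm value corresponds to a larger absolute m/z window at higher m/z.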

**\6. Relative standard deviation threshold** (OPTIONAL)

A numerical value from 0 upwards. Leave blank if you do not intend to apply a relative standard deviation threshold to your data.
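
For reference, the relative standard deviation of a peak's intensities can be computed as sketched below. The function name ``relative_standard_deviation`` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption of this example; it is not confirmed to match DIMSpy's calculation.

```python
import statistics


def relative_standard_deviation(intensities):
    """RSD in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(intensities) / statistics.mean(intensities) * 100.0


# Intensities of one peak across three technical replicates.
rsd = relative_standard_deviation([90.0, 100.0, 110.0])
print(round(rsd, 2))
# True when the RSD does not exceed a hypothetical threshold of 20 percent.
print(20.0 >= rsd)
```

A peak whose RSD exceeds the chosen threshold would be removed from the output Peaklist.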

Output file(s)
--------------

| An HDF5 file containing the replicate-filtered Peaklists
|
| A replicate-filtered Peaklist, in .tsv format, for each file specified in the filelist (Data Collection)

- Tab-delimited text file containing a numeric data matrix, with . as decimal, and NA for missing values.
- Includes additional information, such as the signal-to-noise ratio, relative standard deviation (RSD) and 'purity' for each peak.

An **updated** filelist, similar to the one described above.

@github_developers_contributors@

    </help>
    <expand macro="citations" />
</tool>
author	rjmw
date	Wed, 30 May 2018 09:17:44 -0400
parents	66453f08e258