view replicate_filter.xml @ 1:9fc2f24615a7 draft default tip
planemo upload for repository https://github.com/computational-metabolomics/dimspy-galaxy commit 42331bc61ea07d75f88007e5a2c65eaf9e811f06
author | rjmw |
---|---|
date | Wed, 30 May 2018 09:17:44 -0400 |
parents | 66453f08e258 |
children |
<tool id="dimspy_replicate_filter" name="Replicate Filter" version="1.0.0">
    <description>- Remove peaks that fail to appear in at least x-out-of-n (technical) replicates</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code">
<![CDATA[
dimspy replicate-filter
--input '$hdf5_file_in'
--output '$hdf5_file_out'
#if $filelist
  --filelist '$filelist'
#end if
--ppm $ppm
--replicates $replicates
--min-peak-present $min_peaks
#if $rsd_threshold
  --rsd-threshold $rsd_threshold
#end if
--report '$report'
&&
dimspy create-sample-list
--input '$hdf5_file_out'
--output '$samplelist'
--delimiter tab
&&
dimspy hdf5-pls-to-txt
--input '$hdf5_file_out'
--output .
--delimiter $delimiter
]]>
    </command>
    <inputs>
        <param name="hdf5_file_in" type="data" format="h5" label="Peaklists (HDF5 file)" help="Peaklists generated by Process Scans (SIM-Stitch)." argument="--hdf5_file_in"/>
        <param name="filelist" type="data" optional="true" format="tsv,tabular" label="Filelist / Samplelist" help="Only provide a filelist if you would like to exclude Peaklists, update the metadata (e.g. classLabel), or if you did not provide a filelist to Process Scans." argument="--filelist" />
        <param name="replicates" type="integer" value="3" label="Number of technical replicates for each sample" help="" argument="--replicates"/>
        <param name="min_peaks" type="integer" value="2" label="Minimum number of technical replicates a peak has to be present in" help="" argument="--min_peaks"/>
        <param name="ppm" type="float" value="2.0" label="ppm error tolerance" help="Maximum tolerated m/z deviation across technical replicates in parts per million (ppm)." argument="--ppm"/>
        <param name="rsd_threshold" type="text" value="" label="Relative standard deviation threshold" help="Maximum tolerated relative standard deviation (RSD, in %) of the peak intensities across technical replicates. Leave empty to skip this filter step." argument="--rsd-threshold"/>
        <param name="delimiter" type="hidden" value="tab" argument="--delimiter"/>
    </inputs>
    <outputs>
        <data name="hdf5_file_out" format="h5" label="${tool.name} on ${on_string}: Peaklists (HDF5 file)"/>
        <data name="report" format="txt" label="${tool.name} on ${on_string}: Report"/>
        <data name="samplelist" format="tsv" label="${tool.name} on ${on_string}: Sample Metadata (updated)" />
        <collection name="peaklists_txt" type="list" label="${tool.name} on ${on_string}: Peaklists">
            <discover_datasets pattern="(?P&lt;designation&gt;.+)\.txt" format="tsv" directory="." visible="false" />
        </collection>
    </outputs>
    <tests>
        <test>
            <param name="hdf5_file_in" value="pls.h5" ftype="h5"/>
            <param name="replicates" value="3"/>
            <param name="min_peaks" value="2"/>
            <param name="ppm" value="2.0"/>
            <param name="rsd_threshold" value=""/>
            <output name="hdf5_file_out" value="pls_rf.h5" ftype="h5" compare="sim_size" />
            <output name="report" value="report_pls_rf_01.txt" ftype="txt"/>
            <output name="samplelist" value="samplelist_1.txt" ftype="tsv"/>
            <output_collection name="peaklists_txt" type="list">
                <element name="batch04_QC17_rep01_262_2_263_3_264" file="batch04_QC17_rep01_262_2_263_3_264.txt" ftype="tsv"/>
            </output_collection>
        </test>
        <test>
            <param name="hdf5_file_in" value="pls.h5" ftype="h5"/>
            <param name="filelist" value="filelist_mzml_triplicates.txt" ftype="tsv"/>
            <param name="replicates" value="3"/>
            <param name="min_peaks" value="2"/>
            <param name="ppm" value="2.0"/>
            <param name="rsd_threshold" value=""/>
            <output name="hdf5_file_out" value="pls_rf.h5" ftype="h5" compare="sim_size" />
            <output name="report" value="report_pls_rf_02.txt" ftype="txt"/>
            <output name="samplelist" value="samplelist_2.txt" ftype="tsv"/>
            <output_collection name="peaklists_txt" type="list">
                <element name="batch04_QC17_rep01_262_2_263_3_264" file="batch04_QC17_rep01_262_2_263_3_264.txt" ftype="tsv"/>
            </output_collection>
        </test>
    </tests>
    <help>
----------------
Replicate filter
----------------

|

Description
-----------

| This tool is typically applied after the 'process_scans' tool.
| In the DIMS analysis workflow, biological samples are often analysed as a set of technical replicates. This tool combines the Peaklists of these technical replicates into a single Peaklist.
|
| When combining technical replicate Peaklists, peaks are clustered together if the difference between their m/z values (measured in parts per million, 'ppm') is less than or equal to the user-defined threshold.
|
| Peaks are removed from the final Peaklist if: 1) they occur in fewer technical replicates than the user-defined 'Minimum number of technical replicates a peak has to be present in', and/or 2) the relative standard deviation (measured in % and also termed the coefficient of variation) of the peak's intensity values is greater than the user-defined value.

Parameters
----------

**\1. Set of Peaklists (HDF5 file)** (REQUIRED)

A set of Peaklists (HDF5 format). This file is automatically generated by the preceding 'process_scans' tool.

**\2. Filelist / Samplelist** (OPTIONAL)

| A tabular-formatted .txt file with columns: filename, replicate, batch, classLabel, injectionOrder.
| Additional columns are allowed but are not used during processing.
| This file must be uploaded to (or available from) the current history so that it can be selected from the drop-down menu.
| **NOTE:** Only provide a filelist if you would like to exclude Peaklists, update the metadata (e.g. classLabel), or if you did not provide a filelist to 'process_scans'.
|

@example_filelist@

**\3. Number of technical replicates** (REQUIRED)

The total number of technical replicates acquired for each sample (all samples must have the same number of technical replicates).

**\4. Minimum number of technical replicates a peak has to be present in** (REQUIRED)

A numerical value from 0 up to the value entered in the 'Number of technical replicates' box.
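
The x-out-of-n and RSD rules described above can be illustrated with a minimal Python sketch. It assumes that each technical replicate's Peaklist is a list of (m/z, intensity) pairs. The function names replicate_filter and ppm_diff are hypothetical and are not part of the dimspy API. Peaks are clustered greedily along sorted m/z, starting a new cluster whenever the gap to the previous peak exceeds the ppm tolerance. This single-linkage pass is a simplification and may differ from how dimspy aligns peaks. The RSD is assumed to use the sample standard deviation.

```python
from statistics import mean, stdev


def ppm_diff(mz_a, mz_b):
    """Absolute difference between two m/z values, in ppm of mz_a."""
    return abs(mz_b - mz_a) / mz_a * 1e6


def replicate_filter(peaklists, ppm, min_peaks, rsd_threshold=None):
    """Merge technical-replicate Peaklists and apply the x-out-of-n and RSD rules.

    Hypothetical helper: `peaklists` holds one list of (mz, intensity) pairs
    per technical replicate. The greedy single-linkage clustering over sorted
    m/z is an assumption made for illustration only.
    """
    # Tag every peak with the index of the replicate it came from, sorted by m/z.
    tagged = sorted(
        (mz, intensity, rep)
        for rep, peaks in enumerate(peaklists)
        for mz, intensity in peaks
    )

    # Start a new cluster whenever the gap to the previous peak exceeds `ppm`.
    clusters = []
    for peak in tagged:
        if clusters and ppm >= ppm_diff(clusters[-1][-1][0], peak[0]):
            clusters[-1].append(peak)
        else:
            clusters.append([peak])

    merged = []
    for cluster in clusters:
        # x-out-of-n rule: count distinct replicates, not individual peaks.
        n_present = len({rep for _, _, rep in cluster})
        if min_peaks > n_present:
            continue

        intensities = [intensity for _, intensity, _ in cluster]
        # RSD (%) from the sample standard deviation; undefined for one value.
        rsd = None
        if len(intensities) >= 2:
            rsd = stdev(intensities) / mean(intensities) * 100

        # Optional RSD rule: drop peaks whose intensities vary too much.
        if rsd_threshold is not None and rsd is not None and rsd > rsd_threshold:
            continue

        mean_mz = mean(mz for mz, _, _ in cluster)
        merged.append((mean_mz, mean(intensities), n_present, rsd))
    return merged


# Three technical replicates of one sample (illustrative values).
replicates = [
    [(100.0000, 1000.0), (200.0000, 50.0)],
    [(100.0001, 1100.0), (300.0000, 10.0)],
    [(100.0002, 900.0)],
]
print(replicate_filter(replicates, ppm=2.0, min_peaks=2))
print(replicate_filter(replicates, ppm=2.0, min_peaks=2, rsd_threshold=5.0))
```

Each retained peak is returned as a tuple of mean m/z, mean intensity, the number of replicates it was found in, and its RSD (%). In this example, only the peak near m/z 100 passes the 2-out-of-3 rule. It is then removed once an RSD threshold of 5 % is applied.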

Peaks that occur in fewer than this minimum number of technical replicates are removed from the output Peaklist.

**\5. ppm error tolerance** (REQUIRED)

A numerical value from 0 upwards. This value defines the tolerance (in ppm) applied when clustering peaks, based on their m/z values, across the technical replicates.

**\6. Relative standard deviation threshold** (OPTIONAL)

A numerical value from 0 upwards (in %). Leave blank if you do not intend to apply a relative standard deviation threshold to your data.

Output file(s)
--------------

| An HDF5 file containing the replicate-filtered Peaklists.
|
| A replicate-filtered Peaklist, in .tsv format, for each sample (i.e. each set of technical replicates) listed in the filelist (Data Collection):

- Tab-delimited text file containing a numeric data matrix, with . as decimal, and NA for missing values.
- Includes additional information, such as the signal-to-noise ratio, relative standard deviation (rsd) and 'purity' for each peak.

| An **updated** filelist, in the format described above.

@github_developers_contributors@
    </help>
    <expand macro="citations" />
</tool>