Mercurial > repos > oinizan > frogs

<?xml version="1.0"?>
<!--
# Copyright (C) 2022 INRAE
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
-->
<tool id="FROGSFUNC_step2_functions" name="FROGSFUNC_2_functions" version= "@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.05">
    <description>Calculates functions abundances in each sample. </description>

  <macros>
        <import>macros.xml</import>
  </macros>

  <expand macro="requirements_frogsfunc" />

  <command detect_errors="exit_code">
        <![CDATA[
        frogsfunc_functions.py
                @CPUS@
                --input-biom $input_biom
                --input-fasta $input_fasta
                --input-tree $input_tree
                --input-marker $input_marker
                --marker-type $category.value

                #if $category.value == "16S"
                    --functions $functions
                #end if

                #if $category.value != "16S"
                    --input-function-table $functions.fields.traits
                #end if

                --max-nsti $max_nsti
                --min-blast-ident $min_blast_ident
                --min-blast-cov $min_blast_cov
                --hsp-method $hsp_method
                --output-biom $output_biom
                --output-fasta $output_fasta
                --output-function-abund "frogsfunc_functions_unstrat.tsv"
                --output-otu-norm $output_otu_norm
                --output-weighted $output_weighted
                --output-excluded $output_excluded
                --summary $summary_file
        ]]>
    </command>
    <inputs>
        <!-- Input files -->
        <param argument="--input-biom" format="biom1" type="data" label="Biom file" help="The abundance file i.e. FROGSFUNC_1_placeseqs_copynumber tool output file (frogsfunc_placeseqs.biom)." optional="false"/>
       	<param argument="--input-fasta" format="fasta" type="data" label="Sequence file" help="The fasta file i.e. from FROGSFUNC_1_placeseqs_copynumber tool output file (frogsfunc_placeseqs.fasta)." optional="false"/>
        <param argument="--input-tree" format="nhx" type="data" label="Tree file" help="The file contains the tree information from FROGSFUNC_1_placeseqs_copynumber tool (frogsfunc_placeseqs_tree.nwk)." optional="false"/>
        <param argument="--input-marker" format="tsv" type="data" label="Marker file" help="Table of predicted marker copy number i.e. FROGSFUNC_1_placeseqs_copynumber output (frogsfunc_marker.tsv)." optional="false"/>

        <!-- Parameters-->
	    <param name="category" type="select" label="Taxonomic marker" help="Taxonomic marker of interest." multiple="false" display="radio">
            <options from_data_table="frogs_picrust2_functions">
                <column name='name' index='0' />
                <column name='value' index='0' />
                <filter type="unique_value" column='0'/>
                    <validator type="no_options" message="A built-in database is not available" />
            </options>
		</param>
		<param argument="--functions" type="select" label="Target function database" multiple="true" optional="false" help=" 16S : at least 'EC' or/and 'KO' should be chosen (EC for Metacyc pathway analysis or/and KO for KEGG pathway analysis) - others values are optionnal. ITS and 18S : 'EC' only available." >
			<options from_data_table="frogs_picrust2_functions">
				<column name='name' index='1' />
				<column name='value' index='1' />
				<column name='path' index='2' />
				<column name='traits' index='3' />
                <filter type="param_value" ref="category" column="0" />
 		<validator type="expression" message="'EC' is the default database used by PICRUSt2. 'EC' or 'KO' must be at least selected. Other tables are optionnal">"EC" in value or "KO" in value</validator>
            </options>
        </param>
        <param argument="--max-nsti" type="float" label="NSTI cut-off" help="Any sequence with an NSTI above this threshold will be out. (default: 2)" value="2" min="0" optional="false" />
        <param argument="--min-blast-ident" type="float" label="Identity alignment cut-off" help="Percentage identity of the alignment between the input sequence and the PICRUSt2 reference sequence. Below this threshold, all sequences will be discarded. (default: None)" value="0" min="0" max="1" optional="true" />
        <param argument="--min-blast-cov" type="float" label="Coverage alignment cut-off" help="Coverage identity of the alignment between the input sequence and the PICRUSt2 reference sequence. Below this threshold, all sequences will be discarded.  (default: None)" value="0" min="0" max="1" optional="true" />
		<param argument="--hsp-method" type="select" label="HSP method" help="Hidden-state prediction method to use: maximum parsimony (mp), empirical probabilities (emp_prob), continuous traits prediction using subtree averaging (subtree_average), continuous traits prediction with phylogentic independent contrast (pic), continuous traits reconstruction using squared-change parsimony (scp) (default: mp)." multiple="false" display="radio">
            <option value="mp">mp</option>
            <option value="emp_prob">emp_prob</option>
            <option value="pic">pic</option>
            <option value="scp">scp</option>
            <option value="subtree_average">subtree_average</option>
		</param>
    </inputs>

    <outputs>
        <data format="html" name="summary_file" label="${tool.name}: report.html" from_work_dir="report.html"/>
		<data format="biom1" name="output_biom" label="${tool.name}: frogsfunc_functions_asv_abundances.biom" from_work_dir="frogsfunc_functions.biom"/>
		<data format="fasta" name="output_fasta" label="${tool.name}: frogsfunc_functions_asv.fasta" from_work_dir="frogsfunc_functions.fasta"/>

        <data format="tsv" name="output_otu_norm" label="${tool.name}: frogsfunc_functions_marker_norm.tsv" from_work_dir="frogsfunc_functions_marker_norm.tsv.tsv"/>
        <data format="tsv" name="output_weighted" label="${tool.name}: frogsfunc_functions_weighted_nsti.tsv" from_work_dir="frogsfunc_functions_weighted_nsti.tsv"/>
        <data format="tsv" name="output_excluded" label="${tool.name}: frogsfunc_functions_asv_excluded.tsv" from_work_dir="frogsfunc_functions_excluded.tsv"/>
        <data format="tsv" name="output_copy_ec_abund" label="${tool.name}: EC_copynumbers_predicted.tsv" from_work_dir="EC_copynumbers_predicted.tsv">
            <filter>"EC" in functions</filter>
        </data>
        <data format="tsv" name="output_copy_ko_abund" label="${tool.name}: KO_copynumbers_predicted.tsv" from_work_dir="KO_copynumbers_predicted.tsv">
            <filter>"KO" in functions</filter>
        </data>
        <data format="tsv" name="output_copy_cog_abund" label="${tool.name}: COG_copynumbers_predicted.tsv" from_work_dir="COG_copynumbers_predicted.tsv">
            <filter>"COG" in functions</filter>
        </data>
        <data format="tsv" name="output_copy_pfam_abund" label="${tool.name}: PFAM_copynumbers_predicted.tsv" from_work_dir="PFAM_copynumbers_predicted.tsv">
            <filter>"PFAM" in functions</filter>
        </data>
        <data format="tsv" name="output_copy_tigrfam_abund" label="${tool.name}: TIGRFAM_copynumbers_predicted.tsv" from_work_dir="TIGRFAM_copynumbers_predicted.tsv">
            <filter>"TIGRFAM" in functions</filter>
        </data>
        <data format="tsv" name="output_copy_pheno_abund" label="${tool.name}: PHENO_copynumbers_predicted.tsv" from_work_dir="PHENO_copynumbers_predicted.tsv">
            <filter>"PHENO" in functions</filter>
        </data>
        <data format="tsv" name="output_function_ec_abund" label="${tool.name}:  frogsfunc_functions_unstrat_EC.tsv" from_work_dir="frogsfunc_functions_unstrat_EC.tsv">
            <filter>"EC" in functions</filter>
        </data>
        <data format="tsv" name="output_function_ko_abund" label="${tool.name}:  frogsfunc_functions_unstrat_KO.tsv" from_work_dir="frogsfunc_functions_unstrat_KO.tsv">
            <filter>"KO" in functions</filter>
        </data>
        <data format="tsv" name="output_function_cog_abund" label="${tool.name}:  frogsfunc_functions_unstrat_COG.tsv" from_work_dir="frogsfunc_functions_unstrat_COG.tsv">
            <filter>"COG" in functions</filter>
        </data>
        <data format="tsv" name="output_function_pfam_abund" label="${tool.name}:  frogsfunc_functions_unstrat_PFAM.tsv" from_work_dir="frogsfunc_functions_unstrat_PFAM.tsv">
            <filter>"PFAM" in functions</filter>
        </data>
        <data format="tsv" name="output_function_tigrfam_abund" label="${tool.name}:  frogsfunc_functions_unstrat_TIGRFAM.tsv" from_work_dir="frogsfunc_functions_unstrat_TIGRFAM.tsv">
            <filter>"TIGRFAM" in functions</filter>
        </data>
        <data format="tsv" name="output_function_pheno_abund" label="${tool.name}:  frogsfunc_functions_unstrat_PHENO.tsv" from_work_dir="frogsfunc_functions_unstrat_PHENO.tsv">
            <filter>"PHENO" in functions</filter>
        </data>
    </outputs>

    <tests>
        <test expect_num_outputs="8">
            <param name="input_fasta" value="references/25-frogsfunc_placeseqs.fasta" />
            <param name="input_biom" value="references/25-frogsfunc_placeseqs.biom" />
            <param name="input_tree" value="references/25-frogsfunc_placeseqs_tree.nwk" />
            <param name="input_marker" value="references/25-frogsfunc_copynumbers_marker.tsv" />
            <param name="category" value="16S" />
            <param name="functions" value="EC" />
            <param name="max_nsti" value="2" />

            <output name="output_function_ec_abund" file="references/27-frogsfunc_functions_unstrat.tsv" compare="diff" lines_diff="0" />
            <output name="output_otu_norm" file="references/27-frogsfunc_functions_marker_norm.tsv" compare="diff" lines_diff="0" />
            <output name="output_weighted" file="references/27-frogsfunc_functions_weighted_nsti.tsv" compare="diff" lines_diff="0" />
            <output name="summary_file" file="references/27-frogsfunc_functions_report.html" compare="diff" lines_diff="0" />
            <output name="output_excluded" file="references/27-frogsfunc_functions_excluded.txt" compare="diff" lines_diff="0" />
        </test>
    </tests>

     <help>

@HELP_LOGO@

.. class:: infomark page-header h2

What it does

FROGSFUNC_2_functions is the second step of PICRUSt2. It ables to predicts :
	(i) Functional abundances based solely on the sequences of marker genes with PICRUSt2. The available marker genes are 16S, ITS and 18S.

	(ii) Functions, weighted by the relative abundance of ASVs in the community. Inferring the metagenomes of the communities with `PICRUSt2 &lt;https://github.com/picrust/picrust2&gt;`_.


There are three steps performed at this stage:
	(i) It runs hidden-state prediction (hsp) to predict function abundances with castor-R of each ASVs placed in the PICRUSt2 reference phylogenetic tree (FROGSFUNC_1_placeseqs_copynumber outputs).

    (ii) The read depth per ASV is divided by the predicted marker (16S/ITS/18S) copy numbers. This is performed to help control for variation in marker copy numbers across organisms, which can result in interpretation issues. For instance, imagine an organism with five identical copies of the 16S gene that is at the same absolute abundance as an organism with one 16S gene. The ASV corresponding to the first organism would erroneously be inferred to be at higher relative abundance simply because this organism had more copies of the 16S gene.

    (iii) The ASV read depths per sample (after normalizing by marker (16S/ITS/18S) copy number) are multiplied by the predicted function copy numbers per ASV.


.. class:: infomark page-header h2

Inputs/Outputs


.. class:: h3

Inputs


**-Biom file-**:

The ASVs biom file from FROGSFUNC_1_placeseqs_copynumber tool (format `biom1 &lt;http://biom-format.org/documentation/format_versions/biom-1.0.html&gt;`_). (FROGSFUNC_1_placeseqs_copynumber.biom from FROGSFUNC_1_placeseqs_copynumber)

**-Sequence file-**:

The sequence file of inserted ASVs into PICRUST2 reference tree from (frogsfunc_placesesqs.fasta from FROGSFUNC_1_placeseqs_copynumber step).

**-Tree file (format newick nwk)-**:

The file contains the tree informations from FROGSFUNC_1_placeseqs_copynumber step (FROGSFUNC_1_placeseqs_copynumber output : FROGSFUNC_1_placeseqs_copynumber_tree.nwk)

**-Marker file-**:

Output table of predicted marker gene copy numbers per sequence. (frogsfunc_marker.tsv from FROGSFUNC_1_placeseqs_copynumber step)

.. class:: h3

Parameters


**-Taxonomic marker-**:

Marker gene to be analyzed from the previous FROGSFUNC_1_placeseqs_copynumber step (frogsfunc_marker.tsv from FROGSFUNC_1_placeseqs_copynumber).

**-Target function database-**:

Which default pre-calculated count table to use ?
 - For 16S rRNA gene you can choose between: 'EC', 'KO', 'PFAM', 'COG', 'TIGRFAM', and/or 'PHENO'. You must select at least 'EC' or 'KO' because for next FROGSFUNC tools, the information from Metacyc (EC) or KEGG (KO) are requiered.
 - For ITS and 18S markers, 'EC' is only available.

For more informations about the different databases:

 - EC : https://enzyme.expasy.org/
 - KO : https://www.genome.jp/kegg/ko.html
 - PFAM : http://pfam.xfam.org/
 - COG : https://www.ncbi.nlm.nih.gov/research/cog-project/
 - TIGRFAM : https://tigrfams.jcvi.org/cgi-bin/index.cgi
 - PHENO : https://phenodb.org/

**-NSTI cut-off-**:

 Nearest Sequenced Taxon Index (`NSTI &lt;https://www.nature.com/articles/nbt.2676&gt;`_) is the phylogenetic distance between the ASV and the nearest sequenced reference genome. This metric can be used to identify ASVs that are highly distant from all reference sequences (the predictions for these sequences are less reliable!). The higher the NSTI score, the less the affiliations are relevant. Any ASVs with a NSTI value higher than 2 are typically either from uncharacterized phyla or off-target sequences.

**-Identity alignment cut-off-**:

 All sequences with a identity percentage of alignment against the PICRUSt2 closest reference sequence is lower than this value will be excluded (between 0 and 1).

**-Coverage alignment cut-off-**:

 All sequences with a coverage percentage of alignment against the PICRUSt2 closest reference sequence is lower than this value will be excluded (between 0 and 1).

**-HSP method-**:

 Hidden-state prediction method to use.


.. class:: h3

Outputs

**-Fasta file-**:

 Sequence file without excluded ASVs (NSTI, blast perc identity or blast perc coverage thresholds). (FROGSFUNC_2_functions.fasta)

**-ASV abundance Biom file-**:

 ASV abundance data i a biom file without excluded ASVs (NSTI, %identity or %coverage thresholds alignment). (FROGSFUNC_2_functions.biom)

**-Function abundance file-**:

 It is the function abundance predictions of metagenome, per sample. (frogsfunc_functions_unstrat_DATABASENAME.tsv, for exemple: FROGSFUNC_2_functions_unstrat_EC.tsv)

Table column description:
 - classification: the hierarchy classification of the gene function.
 - db_link: the url on the link accession ID (*observation_name*) of the function.
 - observation_name: Accession identifier
 - observation_sum: Total abundance of functions across all samples.
 - last columns: Abundances of these functions in each samples.

**-ASV normalized abundance table-**:

 Table with normalized abundances per marker copy number from FROGSFUNC_1 step. (FROGSFUNC_2_functions_marker_norm.tsv)

**-Weighted NSTI file-**:

 Output file with the mean of NSTI value per sample. (FROGSFUNC_2_functions_weighted_nsti.tsv)

**-Excluded sequences file-**:

Information about removed sequences that have a NSTI value aboved the NSTI threshold chosen in this step:
 - ASV: ASV name id.
 - FROGS_taxonomy
 - PICRUSt2_taxonomy
 - exclusion_paramater: The paramater(s) that excluded the ASVs.
 - value_parameter: The values associated with the paramater(s).

**-Copy number marker file - one per chosen target function database (EC, KO, PFAM, COG, TIGRFAM,PHENO)-**:

Output table of predicted function copy numbers per ASV. There are as many tables as chosen target function database (EC, KO, PFAM, COG, TIGRFAM,PHENO) (exemple : FROGSFUNC_step3_functions: EC_copynumbers_predicted.tsv and FROGSFUNC_step3_functions: PHENO_copynumbers_predicted.tsv )

**-Report file-**: (report.html)

.. image:: FROGS_frogsfunc_functions_piechart.png
    :height: 375
    :width: 1014

ASVs are excluded if the associated NSTI is above the threshold, or if the alignment values are below the thresholds.


.. image:: FROGS_frogsfunc_functions_starplot.png
    :height: 466
    :width: 806

Number of different taxonomic ranks before (green) and after (orange) application of the filters.


.. image:: FROGS_frogsfunc_functions_table.png
    :height: 580
    :width: 1452


.. image:: FROGS_frogsfunc_functions_sunburst.png


Gene families/function from KEGG or Metacyc databases are classified according to 4 hierarchy levels. The graph shows the proportion of each level within the selected samples.


@HELP_CONTACT@

    </help>


    <expand macro="citations" />

</tool>
author	oinizan
date	Fri, 02 May 2025 07:44:22 +0000
parents	646bee69560f
children