--- a/export2graphlan.xml	Tue Aug 26 14:51:29 2014 -0400
+++ b/export2graphlan.xml	Thu Sep 04 13:24:08 2014 -0400
@@ -25,61 +25,146 @@
 	</outputs>

   <help>
-**export2graphlan** is an automatic conversion script from **LEfSe**, **MetaPhlAn2**, and **HUMAnN** input and/or output files, to **GraPhlAn**. Input file can be also given in BIOM 2.0 format.
-
-The aim of this tool is to support biologists, helping them by automatically write the tree and the annotation file for **GraPhlAn**.
-
-----
-
-.. contents::
-
-----
-
 Overview
 ========
+**export2graphlan** is an *OPTIONAL* tool that automatically converts **LEfSe**, **MetaPhlAn2**, and **HUMAnN** input and/or output files into input files for **GraPhlAn**. Input files can also be given in BIOM format (versions 1 and 2).

-A graphical representation of how **export2graphlan** can be used in the analysis pipeline:
+The aim of this tool is to support biologists by automatically providing the tree and annotation files for GraPhlAn.
+
+Input files
+-----------
+
+As shown in the image below, export2graphlan can work with just one of the following files or with both of them.
+
+ * **Result of MetaPhlAn or HUMAnN analysis**: As depicted in the image below, this file can be the result of a MetaPhlAn analysis or a HUMAnN analysis. Generally, it is a tab-separated file in which each row contains a taxonomy and an abundance value.
+
+ * **Output of LEfSe**: This file is the result of running LEfSe on the *Result of MetaPhlAn or HUMAnN analysis* file. It allows GraPhlAn to highlight the biomarkers that were found.
+
+Input parameters
+----------------
+
+#
+    --annotations ANNOTATIONS
+                        List which levels should be annotated in the tree. Use
+                        a comma-separated list of values, e.g.,
+                        --annotations 1,2,3. Default is None
+    --external_annotations EXTERNAL_ANNOTATIONS
+                        List which levels should use the external legend for
+                        the annotation. Use a comma-separated list of values,
+                        e.g., --external_annotations 1,2,3. Default is None
+    --background_levels BACKGROUND_LEVELS
+                        List which levels should be highlighted with a shaded
+                        background. Use a comma-separated list of values,
+                        e.g., --background_levels 1,2,3
+    --background_clades BACKGROUND_CLADES
+                        Specify the clades that should be highlighted with a
+                        shaded background. Use a comma-separated list and
+                        surround the string with " if it contains spaces.
+                        Example: --background_clades "Bacteria.Actinobacteria,
+                        Bacteria.Bacteroidetes.Bacteroidia,
+                        Bacteria.Firmicutes.Clostridia.Clostridiales"
+    --background_colors BACKGROUND_COLORS
+                        Set the color to use for the shaded background. Colors
+                        can be given either in RGB or in HSV format (HSV
+                        values are separated by semicolons and surrounded by
+                        parentheses). Use a comma-separated list and surround
+                        the string with " if it contains spaces. Example:
+                        --background_colors "#29cc36, (150; 100; 100), (280;
+                        80; 88)"
+    --title TITLE         If specified, set the title of the GraPhlAn plot.
+                        Surround the string with " if it contains spaces,
+                        e.g., --title "Title example"
+    --title_font_size TITLE_FONT_SIZE
+                        Set the title font size. Default is 15
+    --def_clade_size DEF_CLADE_SIZE
+                        Set a default size for clades that are not found as
+                        biomarkers by LEfSe. Default is 10
+    --min_clade_size MIN_CLADE_SIZE
+                        Set the minimum size of clades that are biomarkers.
+                        Default is 20
+    --max_clade_size MAX_CLADE_SIZE
+                        Set the maximum size of clades that are biomarkers.
+                        Default is 200
+    --def_font_size DEF_FONT_SIZE
+                        Set a default font size. Default is 10
+    --min_font_size MIN_FONT_SIZE
+                        Set the minimum font size to use. Default is 8
+    --max_font_size MAX_FONT_SIZE
+                        Set the maximum font size. Default is 12
+    --annotation_legend_font_size ANNOTATION_LEGEND_FONT_SIZE
+                        Set the font size for the annotation legend. Default
+                        is 10
+    --abundance_threshold ABUNDANCE_THRESHOLD
+                        Set the minimum abundance value for a clade to be
+                        annotated. Default is 20.0
+    --most_abundant MOST_ABUNDANT
+                        When only lefse_input is provided, you can specify how
+                        many clades to highlight. Since the biomarkers are
+                        missing, they will be chosen from the most abundant clades
+    --least_biomarkers LEAST_BIOMARKERS
+                        When only lefse_input is provided, you can specify the
+                        minimum number of biomarkers to extract. The taxonomy
+                        is parsed, and the level is chosen in order to have
+                        at least the specified number of biomarkers
+    --discard_otus        If specified, the OTU ids will be discarded from the
+                        taxonomy. By default, OTU ids are kept in the taxonomy
+    --internal_levels     If specified, sum the abundance values from leaf to
+                        root. By default, abundances are not summed on the
+                        internal nodes
+
+    input parameters:
+    You need to provide at least one of the two arguments
+    -i LEFSE_INPUT, --lefse_input LEFSE_INPUT
+                        LEfSe input data
+    -o LEFSE_OUTPUT, --lefse_output LEFSE_OUTPUT
+                        LEfSe output result data
+
+    output parameters:
+    -t TREE, --tree TREE  Output filename where the input tree for GraPhlAn is saved
+    -a ANNOTATION, --annotation ANNOTATION
+                        Output filename where the GraPhlAn annotation is saved
+
+    Input data matrix parameters:
+    --sep SEP
+    --out_table OUT_TABLE
+                        Write processed data matrix to file
+    --fname_row FNAME_ROW
+                        row number containing the names of the features
+                        [default 0, specify -1 if no names are present in the
+                        matrix]
+    --sname_row SNAME_ROW
+                        column number containing the names of the samples
+                        [default 0, specify -1 if no names are present in the
+                        matrix]
+    --metadata_rows METADATA_ROWS
+                        Row numbers to use as metadata [default None, meaning
+                        no metadata]
+    --skip_rows SKIP_ROWS
+                        Row numbers to skip (0-indexed, comma-separated) from
+                        the input file [default None, meaning no rows skipped]
+    --sperc SPERC         Percentile of sample value distribution for sample
+                        selection
+    --fperc FPERC         Percentile of feature value distribution for sample
+                        selection
+    --stop STOP           Number of top samples to select (ordering based on
+                        percentile specified by --sperc)
+    --ftop FTOP           Number of top features to select (ordering based on
+                        percentile specified by --fperc)
+    --def_na DEF_NA       Set the default value for missing values [default None
+                        which means no replacement]
+
+Integration
+===========
+
+A graphical representation of how **export2graphlan** can be integrated into the analysis pipeline:

 .. image:: https://bitbucket.org/repo/oL6bEG/images/3364692296-graphlan_integration.png
-    :height: 500
-    :width: 600
-
-
-----
-
-HMP aerobiosis
-==============
-
-A taxonomic tree that shows three different classes of oxygen (low, medium, and high), highlighting biomarker clades for each class. Data are taken from the HMP project.
-
-.. image:: https://bitbucket.org/repo/oL6bEG/images/2487320460-hmp_aerobiosis.png
-    :height: 500
-    :width: 600
-
-
-Pipeline
---------
+    :height: 672
+    :width: 800

-
-    # download the data
-    $ wget http://huttenhower.sph.harvard.edu/webfm_send/129 -O hmp_aerobiosis_small.txt
-
-    # convert the file in LEfSe format, and run LEfSe
-    $ format_input.py hmp_aerobiosis_small.txt hmp_aerobiosis_small.in -c 1 -s 2 -u 3 -o 1000000
-    $ run_lefse.py hmp_aerobiosis_small.in hmp_aerobiosis_small.res
-
-    # convert it!
-    $ export2graphlan.py -i hmp_aerobiosis_small.txt -o hmp_aerobiosis_small.res -t tree.txt -a annot.txt --title "HMP aerobiosis" --annotations 2,3 --external_annotations 4,5,6 --fname_row 0 --skip_rows 1,2 --ftop 200
+Want to know more?
+==================

-    # attach annotation to the tree
-    $ graphlan_annotate.py --annot annot.txt tree.txt outtree.txt
-
-    # generate the beautiful image
-    $ graphlan.py --dpi 300 --size 7.0 outtree.txt outimg.png --external_legends
-
-The input file is downloaded from The Huttenhower Lab  and given to **LEfSe** for biomarkers discovery. The two file (the *LEfSe input* and the *LEfSe output* files) are then passed to **export2graphlan**. In this case the levels 2 (*phylum*) and 3 (*class*) are annotated on the circular tree, while levels 4 (*order*), 5 (*family*), and 6 (*genus*) are put on the external legend.
-
-
-
+If you want to know more about **export2graphlan** please have a look at the `tutorial <https://bitbucket.org/nsegata/graphlan/wiki/export2graphlan%20-%20tutorial>`_.
   </help>
 </tool>