# HG changeset patch
# User george-weingart
# Date 1409851448 14400
# Node ID 2c0d791fc95009402609ae86011b02f6acdcab94
# Parent  cac6247cb1d3de830966c24b1123e8a98536dd5e
Updated the help

diff -r cac6247cb1d3 -r 2c0d791fc950 export2graphlan.xml
--- a/export2graphlan.xml	Tue Aug 26 14:51:29 2014 -0400
+++ b/export2graphlan.xml	Thu Sep 04 13:24:08 2014 -0400
@@ -25,61 +25,146 @@
-**export2graphlan** is an automatic conversion script from **LEfSe**, **MetaPhlAn2**, and **HUMAnN** input and/or output files, to **GraPhlAn**. Input file can be also given in BIOM 2.0 format.
-
-The aim of this tool is to support biologists, helping them by automatically write the tree and the annotation file for **GraPhlAn**.
-
------
-
-.. contents::
-
------
-
 Overview
 ========
+**export2graphlan** is an *OPTIONAL* tool that automatically converts **LEfSe**, **MetaPhlAn2**, and **HUMAnN** input and/or output files to **GraPhlAn** input. The input file can also be given in BIOM format (both versions 1 and 2).
 
-A graphical representation of how **export2graphlan** can be used in the analysis pipeline:
+The aim of this tool is to support biologists by automatically generating the tree and the annotation files for **GraPhlAn**.
+
+Input files
+-----------
+
+As shown in the image below, **export2graphlan** can work with either one of the following files or with both of them.
+
+ * **Result of MetaPhlAn or HUMAnN analysis**: this file can be the output of either a MetaPhlAn or a HUMAnN analysis. It is generally a tab-separated file in which each row holds a taxonomy and an abundance value.
+
+ * **Output of LEfSe**: this file is the result of running LEfSe on the *Result of MetaPhlAn or HUMAnN analysis* file. It allows GraPhlAn to highlight the biomarkers found by LEfSe.
+
+Input parameters
+----------------
+::
+
+      --annotations ANNOTATIONS
+                            List which levels should be annotated in the tree. Use
+                            a comma-separated values form, e.g.,
+                            --annotations 1,2,3. Default is None
+      --external_annotations EXTERNAL_ANNOTATIONS
+                            List which levels should use the external legend for
+                            the annotation. Use a comma-separated values form,
+                            e.g., --external_annotations 1,2,3. Default is None
+      --background_levels BACKGROUND_LEVELS
+                            List which levels should be highlighted with a shaded
+                            background. Use a comma-separated values form, e.g.,
+                            --background_levels 1,2,3
+      --background_clades BACKGROUND_CLADES
+                            Specify the clades that should be highlighted with a
+                            shaded background. Use a comma-separated values form
+                            and surround the string with " if it contains spaces.
+                            Example: --background_clades "Bacteria.Actinobacteria,
+                            Bacteria.Bacteroidetes.Bacteroidia,
+                            Bacteria.Firmicutes.Clostridia.Clostridiales"
+      --background_colors BACKGROUND_COLORS
+                            Set the color to use for the shaded background. Colors
+                            can be either in RGB or HSV (using a semi-colon to
+                            separate values, surrounded with ()) format. Use a
+                            comma-separated values form and surround the string
+                            with " if it contains spaces. Example:
+                            --background_colors "#29cc36, (150; 100; 100), (280;
+                            80; 88)"
+      --title TITLE         If specified, set the title of the GraPhlAn plot.
+                            Surround the string with " if it contains spaces,
+                            e.g., --title "Title example"
+      --title_font_size TITLE_FONT_SIZE
+                            Set the title font size. Default is 15
+      --def_clade_size DEF_CLADE_SIZE
+                            Set a default size for clades that are not found as
+                            biomarkers by LEfSe. Default is 10
+      --min_clade_size MIN_CLADE_SIZE
+                            Set the minimum size of clades that are biomarkers.
+                            Default is 20
+      --max_clade_size MAX_CLADE_SIZE
+                            Set the maximum size of clades that are biomarkers.
+                            Default is 200
+      --def_font_size DEF_FONT_SIZE
+                            Set a default font size. Default is 10
+      --min_font_size MIN_FONT_SIZE
+                            Set the minimum font size to use. Default is 8
+      --max_font_size MAX_FONT_SIZE
+                            Set the maximum font size. Default is 12
+      --annotation_legend_font_size ANNOTATION_LEGEND_FONT_SIZE
+                            Set the font size for the annotation legend. Default
+                            is 10
+      --abundance_threshold ABUNDANCE_THRESHOLD
+                            Set the minimum abundance value for a clade to be
+                            annotated. Default is 20.0
+      --most_abundant MOST_ABUNDANT
+                            When only lefse_input is provided, you can specify how
+                            many clades to highlight. Since the biomarkers are
+                            missing, they will be chosen from the most abundant
+      --least_biomarkers LEAST_BIOMARKERS
+                            When only lefse_input is provided, you can specify the
+                            minimum number of biomarkers to extract. The taxonomy
+                            is parsed, and the level is chosen in order to have
+                            at least the specified number of biomarkers
+      --discard_otus        If specified, the OTU ids will be discarded from the
+                            taxonomy. By default, OTU ids are kept in the taxonomy
+      --internal_levels     If specified, sum up the abundance values from leaf to
+                            root. By default, abundances are not summed up on
+                            the internal nodes
+
+    input parameters:
+      You need to provide at least one of the two arguments
+      -i LEFSE_INPUT, --lefse_input LEFSE_INPUT
+                            LEfSe input data
+      -o LEFSE_OUTPUT, --lefse_output LEFSE_OUTPUT
+                            LEfSe output result data
+
+    output parameters:
+      -t TREE, --tree TREE  Output filename where to save the input tree for GraPhlAn
+      -a ANNOTATION, --annotation ANNOTATION
+                            Output filename where to save the GraPhlAn annotation
+
+    Input data matrix parameters:
+      --sep SEP
+      --out_table OUT_TABLE
+                            Write processed data matrix to file
+      --fname_row FNAME_ROW
+                            row number containing the names of the features
+                            [default 0, specify -1 if no names are present in the
+                            matrix]
+      --sname_row SNAME_ROW
+                            column number containing the names of the samples
+                            [default 0, specify -1 if no names are present in the
+                            matrix]
+      --metadata_rows METADATA_ROWS
+                            Row numbers to use as metadata [default None, meaning
+                            no metadata]
+      --skip_rows SKIP_ROWS
+                            Row numbers to skip (0-indexed, comma separated) from
+                            the input file [default None, meaning no rows skipped]
+      --sperc SPERC         Percentile of sample value distribution for sample
+                            selection
+      --fperc FPERC         Percentile of feature value distribution for sample
+                            selection
+      --stop STOP           Number of top samples to select (ordering based on
+                            percentile specified by --sperc)
+      --ftop FTOP           Number of top features to select (ordering based on
+                            percentile specified by --fperc)
+      --def_na DEF_NA       Set the default value for missing values [default None
+                            which means no replacement]
+
+Integration
+===========
+
+A graphical representation of how **export2graphlan** can be integrated into the analysis pipeline:
 
 .. image:: https://bitbucket.org/repo/oL6bEG/images/3364692296-graphlan_integration.png
-    :height: 500
-    :width: 600
-
-
------
-
-HMP aerobiosis
-==============
-
-A taxonomic tree that shows three different classes of oxygen (low, medium, and high), highlighting biomarker clades for each class. Data are taken from the HMP project.
-
-.. image:: https://bitbucket.org/repo/oL6bEG/images/2487320460-hmp_aerobiosis.png
-    :height: 500
-    :width: 600
-
-
-Pipeline
---------
+    :height: 672
+    :width: 800
 
-
-    # download the data
-    $ wget http://huttenhower.sph.harvard.edu/webfm_send/129 -O hmp_aerobiosis_small.txt
-
-    # convert the file in LEfSe format, and run LEfSe
-    $ format_input.py hmp_aerobiosis_small.txt hmp_aerobiosis_small.in -c 1 -s 2 -u 3 -o 1000000
-    $ run_lefse.py hmp_aerobiosis_small.in hmp_aerobiosis_small.res
-
-    # convert it!
-    $ export2graphlan.py -i hmp_aerobiosis_small.txt -o hmp_aerobiosis_small.res -t tree.txt -a annot.txt --title "HMP aerobiosis" --annotations 2,3 --external_annotations 4,5,6 --fname_row 0 --skip_rows 1,2 --ftop 200
+Want to know more?
+==================
 
-    # attach annotation to the tree
-    $ graphlan_annotate.py --annot annot.txt tree.txt outtree.txt
-
-    # generate the beautiful image
-    $ graphlan.py --dpi 300 --size 7.0 outtree.txt outimg.png --external_legends
-
-The input file is downloaded from The Huttenhower Lab and given to **LEfSe** for biomarkers discovery. The two file (the *LEfSe input* and the *LEfSe output* files) are then passed to **export2graphlan**. In this case the levels 2 (*phylum*) and 3 (*class*) are annotated on the circular tree, while levels 4 (*order*), 5 (*family*), and 6 (*genus*) are put on the external legend.
-
-
-
+If you want to know more about **export2graphlan**, please have a look at the `tutorial `_.
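The `--internal_levels` help text only states that abundances are summed "from leaf to root". The following is a minimal Python sketch of that propagation, not export2graphlan's actual code. It assumes dot-separated taxonomy strings (as in the `--background_clades` example) and input rows that carry leaf abundances only. `sum_internal_levels` is a hypothetical name.

```python
from collections import defaultdict


def sum_internal_levels(leaf_abundances, sep="."):
    """Propagate leaf abundances up to every ancestor clade.

    Hypothetical helper: `leaf_abundances` maps a full taxonomy string such as
    "Bacteria.Firmicutes.Clostridia" to its abundance. It is assumed to hold
    leaves only; rows for internal nodes would otherwise be counted twice.
    """
    totals = defaultdict(float)
    for taxonomy, abundance in leaf_abundances.items():
        levels = taxonomy.split(sep)
        # Credit this leaf's abundance to the leaf itself and to every prefix
        # of its taxonomy, i.e., to each ancestor up to the root.
        for depth in range(1, len(levels) + 1):
            totals[sep.join(levels[:depth])] += abundance
    return dict(totals)


if __name__ == "__main__":
    leaves = {
        "Bacteria.Firmicutes.Clostridia": 30.0,
        "Bacteria.Firmicutes.Bacilli": 10.0,
        "Bacteria.Bacteroidetes.Bacteroidia": 60.0,
    }
    for clade, total in sorted(sum_internal_levels(leaves).items()):
        print(clade, total)
```

With these three leaves, the internal node `Bacteria.Firmicutes` receives 40.0 and the root `Bacteria` receives 100.0. Keying internal nodes by their taxonomy prefix lets the sketch work without building an explicit tree structure.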