annotate src/breadcrumbs/docs/Tutorial-BreadCrumbs.md @ 32:041787cd0d31 draft default tip

Modified from StringIO import StringIO ## for Python 2 to from io import StringIO ## for Python 3
author george-weingart
date Wed, 23 Jun 2021 20:52:58 +0000
parents d589875b8125
0
d589875b8125 First version of micropita in this repository
george-weingart
parents:
diff changeset
# BreadCrumbs Tutorial #

This is a brief tutorial to get you acquainted with the scripts provided in breadcrumbs. This tutorial is organized by script and task. Examples are given using files in the demo_input folder, which is included in the BreadCrumbs package. Each of these commands should work from the command line in the breadcrumbs directory.

Please note all of the following calls expect you to be in the breadcrumbs directory and to have both ./breadcrumbs/src and ./breadcrumbs/scripts in your path and/or Python path.

Enjoy and happy research!

## Contents: ##
1. scriptPCoA.py
2. scriptManipulateTable.py
    I. Manipulating the measurements
    II. Filtering
    III. Filtering with knowledge of feature hierarchical relationship
    IV. Manipulate samples by metadata
    V. Manipulate the feature names
3. scriptPlotFeature.py
4. scriptBiplotTSV.R
5. scriptConvertBetweenBIOMAndPCL.py

## scriptPCoA.py ##
This script plots a PCoA of an abundance table. In the plot each sample is one marker. The marker shape and color are determined by a metadata of your choice. The distance between each pair of samples is determined by a specific beta-diversity distance metric. By default Bray-Curtis distance is used; this can be changed as needed. For every call you must give the script the sample id (-i) and the last metadata, which should be the row before your first data (-l). This helps the scripts understand what is a data measurement and what is a metadata.

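To make the role of -i and -l concrete, here is a minimal Python sketch of how a table could be split into metadata rows and data rows. It is an illustration only, not the code the scripts use. The function name `split_pcl_rows` and the tiny table are hypothetical, and the sketch assumes a tab-delimited file in which rows are features, columns are samples, and the sample id row comes first.

```python
def split_pcl_rows(lines, sample_id, last_metadata):
    """Split tab-delimited rows into (sample ids, metadata rows, data rows).

    Assumes the row named by sample_id comes first and every row up to and
    including last_metadata is metadata; everything after it is data.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    names = [row[0] for row in rows]
    if names[0] != sample_id:
        raise ValueError("expected the sample id row first")
    last = names.index(last_metadata)  # raises ValueError if the row is missing
    samples = rows[0][1:]
    metadata = {row[0]: row[1:] for row in rows[1:last + 1]}
    data = {row[0]: [float(v) for v in row[1:]] for row in rows[last + 1:]}
    return samples, metadata, data


# Hypothetical three-sample table, laid out like a call with -i TID -l STSite.
example = [
    "TID\tS1\tS2\tS3",
    "STSite\tgut\toral\tgut",
    "Bacteria|1\t5\t0\t2",
    "Bacteria|2\t1\t3\t4",
]
samples, metadata, data = split_pcl_rows(example, "TID", "STSite")
print(samples)   # ['S1', 'S2', 'S3']
print(metadata)  # {'STSite': ['gut', 'oral', 'gut']}
print(data)
```

If the wrong row is given as the last metadata, measurement rows would be read as metadata (or the reverse), which is why both arguments are required on every call.
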
A. How do I make a PCoA of an abundance table, painting (coloring) it by a specific metadata?

> scripts/scriptPcoa.py -i TID -l STSite -p STSite demo_input/Test.pcl

B. How do I make a series of PCoAs of an abundance table, one PCoA for every metadata?

If nothing is specified with -p then all metadata are painted. Note that there are a maximum of 9 shapes to use, so a metadata will be skipped if it has more than 9 levels (distinct values, each of which may occur many times). Don't worry, the script will let you know if this happens and will just skip to the next metadata.

> scripts/scriptPcoa.py -i TID -l STSite demo_input/Test.pcl

C. How do I use a different beta-diversity distance metric instead of Bray-Curtis distance?
The following metrics can be chosen: braycurtis, canberra, chebyshev, cityblock, correlation, cosine, euclidean, hamming, sqeuclidean, unifrac_unweighted, unifrac_weighted

> scripts/scriptPcoa.py -i TID -l STSite -m sqeuclidean demo_input/Test.pcl

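For reference, the Bray-Curtis distance between two samples is the sum of the absolute abundance differences divided by the total abundance of both samples. The pure-Python sketch below shows that arithmetic on two made-up samples. The function names and vectors are hypothetical, and this is not the script's own implementation.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("undefined when both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(samples):
    """Square matrix of pairwise Bray-Curtis distances, as nested lists."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]


# Two made-up samples measured over the same three bugs.
sample_a = [5, 0, 2]
sample_b = [1, 3, 4]
print(bray_curtis(sample_a, sample_b))  # (4 + 3 + 2) / (7 + 8) = 0.6
```

A PCoA takes a matrix like the one returned by `distance_matrix` as input, so changing the metric changes the plot even though the abundance table is unchanged.
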
D. How do I get the coordinates of the points in the PCoA plot? Use -C and give a file path to which to write.

> scripts/scriptPcoa.py -i TID -l STSite -C coordinates.txt demo_input/Test.pcl

E. How do I get the distance matrix represented by the PCoA plot? Use -D and give a file path to which to write.

> scripts/scriptPcoa.py -i TID -l STSite -D distances.txt demo_input/Test.pcl

F. How do I make a PCoA using unifrac type metrics?

> scripts/scriptPcoa.py -m unifrac_weighted -t demo_input/GreenGenesCore-May09.ref.tre -e demo_input/fastunifrac_Ley_et_al_NRM_2_sample_id_map.txt -c demo_input/fastunifrac_Ley_et_al_NRM_2_sample_id_map-colors.txt
> scripts/scriptPcoa.py -m unifrac_unweighted -t demo_input/GreenGenesCore-May09.ref.tre -e demo_input/fastunifrac_Ley_et_al_NRM_2_sample_id_map.txt -c demo_input/fastunifrac_Ley_et_al_NRM_2_sample_id_map-colors.txt

There already exists a collection of functionality surrounding unifrac distances in Qiime and related software. We support these metrics here for completeness; if your need is not met here, please look into Qiime and related software for a solution with a richer collection of functionality.

## scriptManipulateTable.py ##
Abundance tables can be difficult to manipulate. This script captures frequent tasks that may be important when manipulating an abundance table, including normalization, summing, filtering, stratifying the table into subsets (for instance breaking up a large HMP table into tables, one for each body site), and other functionality. For every call you must give the script the sample id (-i) and the last metadata, which should be the row before your first data (-l). This helps the scripts understand what is a data measurement and what is a metadata.

_Remember you can do multiple tasks or use multiple arguments at the same time._
_Here is an example of summing, normalizing, adding on clade prefixes, and stratifying the table based on the STSite metadata._

> scripts/scriptManipulateTable.py -i TID -l STSite -s -n -x -y STSite demo_input/Test.pcl

Please look at the detailed description of normalization and summation for a clear understanding of how the data is being manipulated.

*Manipulating the measurements*
A. How do I sum a table based on clade names?

> scripts/scriptManipulateTable.py -i TID -l STSite -s demo_input/Test.pcl

B. How do I normalize a table?

> scripts/scriptManipulateTable.py -i TID -l STSite -n demo_input/Test.pcl

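This tutorial does not spell out which normalization -n applies, so the sketch below shows only one common convention: scaling each sample so that its abundances sum to 1 (relative abundance). Treat this as an assumption about what normalization means here, not a description of the script. The function name and counts are hypothetical.

```python
def normalize_by_sample(table):
    """Scale each sample (column) so its abundances sum to 1.

    table maps feature name -> list of counts, one per sample.
    Samples whose total is 0 are left as zeros.
    """
    if not table:
        return {}
    n_samples = len(next(iter(table.values())))
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {
        name: [value / total if total else 0.0 for value, total in zip(row, totals)]
        for name, row in table.items()
    }


# Made-up counts for two bugs in three samples.
counts = {"Bacteria|1": [5, 0, 2], "Bacteria|2": [15, 3, 6]}
relative = normalize_by_sample(counts)
print(relative["Bacteria|1"])  # [0.25, 0.0, 0.25]
print(relative["Bacteria|2"])  # [0.75, 1.0, 0.75]
```
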
*Filtering*
C. How do I filter a normalized table by percentage?

This filters out bugs that are not in the top 0.95 percentage of at least 0.05 percent of the samples (a good default).

> scripts/scriptManipulateTable.py -i TID -l STSite -P 0.95,0.05 demo_input/Test.pcl

D. How do I filter a normalized table by a minimum abundance?

This filters out bugs that do not have at least a certain abundance in a certain number of samples. Here we show the call to filter out all bugs which do not have at least 3 samples with at least 0.0001 abundance (a good initial default).

> scripts/scriptManipulateTable.py -i TID -l STSite -A 0.0001,3 demo_input/Test.pcl

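The minimum abundance rule can be written as a short predicate: keep a bug only if enough samples reach the threshold. The sketch below illustrates that rule on made-up values. The function name and the table are hypothetical, and the script's actual implementation may differ in details such as how ties at the threshold are treated.

```python
def filter_by_min_abundance(table, min_abundance, min_samples):
    """Keep features with at least min_samples values >= min_abundance.

    table maps feature name -> list of abundances, one per sample.
    """
    return {
        name: row
        for name, row in table.items()
        if sum(1 for value in row if value >= min_abundance) >= min_samples
    }


# Made-up normalized abundances for two bugs in four samples.
table = {
    "bug_kept": [0.2, 0.0005, 0.0001, 0.0],
    "bug_dropped": [0.5, 0.00005, 0.0, 0.0],
}
kept = filter_by_min_abundance(table, 0.0001, 3)
print(sorted(kept))  # ['bug_kept']
```

The same function applied to raw counts, for example `filter_by_min_abundance(counts, 5, 3)`, expresses an "at least 5 counts in at least 3 samples" rule.
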
E. How do I filter a count table by count occurrence?

This removes bugs that do not have at least 5 counts in at least 3 samples (an initial default to use could be 2,2).

> scripts/scriptManipulateTable.py -i TID -l STSite -O 5,3 demo_input/Test.pcl

F. How do I filter a table by standard deviation?

> scripts/scriptManipulateTable.py -i TID -l STSite -D 1 demo_input/Test.pcl

*Filtering with knowledge of feature hierarchical relationship*
G. How do I make the table have only terminal nodes?

> scripts/scriptManipulateTable.py -i TID -l STSite -t demo_input/Test.pcl

H. How do I remove all the OTUs from a table?

> scripts/scriptManipulateTable.py -i TID -l STSite -u demo_input/Test.pcl

I. How do I reduce all bugs more specific than a certain clade? In other words, how do I reset a table to be only a clade (genus) or higher?

This reduces the bugs to bugs with 3 levels of hierarchy or less (class on a standard biological taxonomy).

> scripts/scriptManipulateTable.py -i TID -l STSite -c 3 demo_input/Test.pcl

You may want to hierarchically sum all of your bugs before reducing the table to a certain level, just in case you are missing some.

> scripts/scriptManipulateTable.py -i TID -l STSite -s -c 3 demo_input/Test.pcl

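One way to read "3 levels of hierarchy or less" is as a test on the number of `|`-delimited levels in each bug name. The sketch below applies that reading to a made-up table. It assumes that deeper bugs are simply dropped rather than merged into their ancestors, which the tutorial does not state explicitly. The function name and table are hypothetical.

```python
def reduce_to_level(table, max_levels, delimiter="|"):
    """Keep only features whose lineage has at most max_levels levels."""
    return {
        name: row
        for name, row in table.items()
        if len(name.split(delimiter)) <= max_levels
    }


# Made-up table: one class-level entry and one genus-level entry.
table = {
    "k__kingdom1|p__phylum2|c__class1": [17],
    "k__kingdom1|p__phylum2|c__class1|o__order1|f__family1|g__genus1": [15],
}
print(list(reduce_to_level(table, 3)))  # ['k__kingdom1|p__phylum2|c__class1']
```

Under this reading, a class-level row must already exist in the table to survive the reduction, which is one reason to sum hierarchically first.
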
Detail. OTUs or taxonomic clades are terminal nodes of a dendrogram representing the full taxonomy or phylogeny of a study. Biology may happen at these terminal clades or at higher level clades. Hierarchical summation uses the name of the bug (containing the consensus lineage) to add bugs together at different levels of their ancestral state and represent additional higher level clades or bigger groupings of bugs.

More plainly, imagine we have 2 bugs in a sample with 5 and 10 counts. These two bugs differ as species but share the rest of their ancestry. In this case, an additional bug is added for the genus level group.

    k__kingdom1|p__phylum2|c__class1|o__order1|f__family1|g__genus1|s__species1 5
    k__kingdom1|p__phylum2|c__class1|o__order1|f__family1|g__genus1|s__species2 10
    add
    k__kingdom1|p__phylum2|c__class1|o__order1|f__family1|g__genus1 15

New kingdom, phylum, class, order, and family entries are not added because they would be the same grouping of counts as the new genus level entry.

If we had an additional bug:

    k__kingdom1|p__phylum2|c__class1|o__order2|f__family12|g__genus23|s__species14 2
    k__kingdom1|p__phylum2|c__class1|o__order1|f__family1|g__genus1|s__species1 5
    k__kingdom1|p__phylum2|c__class1|o__order1|f__family1|g__genus1|s__species2 10
    add
    k__kingdom1|p__phylum2|c__class1 17
    k__kingdom1|p__phylum2|c__class1|o__order1|f__family1|g__genus1 15

Two new bugs are added because o__order1 and o__order2 can be combined at the c__class1 grouping and s__species1 and s__species2 can be combined at the g__genus1 level. Other groupings at other clade levels are not made because they represent the same groupings of counts already accounted for in the data and would be redundant. For instance, a k__kingdom1 entry with 17 counts would be the same grouping as the c__class1 bug that was added, and so it is not created.

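The rule described above (add an ancestral clade only when it groups a new set of bugs, and keep only the most specific clade for each grouping) can be sketched in a few lines of Python. This is a reconstruction from the worked example, not the script's code. The function name is hypothetical, and the sketch assumes the input holds only terminal bugs for a single sample.

```python
def hierarchical_sum(counts):
    """Add summed entries for ancestral clades that form a new grouping.

    counts maps a '|'-delimited lineage to its count in one sample.
    Among ancestors that cover the same set of bugs, only the most
    specific one is added; ancestors covering a single bug are skipped.
    """
    members = {}  # ancestor lineage -> set of original bugs beneath it
    for name in counts:
        levels = name.split("|")
        for depth in range(1, len(levels)):
            ancestor = "|".join(levels[:depth])
            members.setdefault(ancestor, set()).add(name)

    deepest = {}  # grouping of bugs -> most specific ancestor with that grouping
    for ancestor, group in members.items():
        key = frozenset(group)
        if len(key) < 2:
            continue  # same counts as one existing bug, so redundant
        best = deepest.get(key)
        if best is None or ancestor.count("|") > best.count("|"):
            deepest[key] = ancestor

    result = dict(counts)
    for key, ancestor in deepest.items():
        if ancestor not in result:
            result[ancestor] = sum(counts[name] for name in key)
    return result


genus1 = "k__kingdom1|p__phylum2|c__class1|o__order1|f__family1|g__genus1"
bugs = {
    "k__kingdom1|p__phylum2|c__class1|o__order2|f__family12|g__genus23|s__species14": 2,
    genus1 + "|s__species1": 5,
    genus1 + "|s__species2": 10,
}
summed = hierarchical_sum(bugs)
print(summed["k__kingdom1|p__phylum2|c__class1"])  # 17
print(summed[genus1])  # 15
```

Run on the three-bug example, the sketch adds exactly the two entries shown above (c__class1 with 17 and g__genus1 with 15) and nothing at the kingdom, phylum, order, or family levels.
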
*How do I reduce the table to a list of bugs?*

> scripts/scriptManipulateTable.py -i TID -l STSite -b features.txt demo_input/Test.pcl
> scripts/scriptManipulateTable.py -i TID -l STSite -b 'Bacteria|3417,Bacteria|unclassified|4904' demo_input/Test.pcl

*Manipulate samples by metadata*
J. How do I stratify the table into subtables based on a metadata? (Example: how do I take the HMP table and break it up by body site or time point?)

> scripts/scriptManipulateTable.py -i TID -l STSite -y STSite demo_input/Test.pcl

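Stratifying amounts to grouping sample columns by their metadata value and copying each group into its own table. A minimal sketch, assuming one metadata value per sample. The function name, sample ids, and values are hypothetical, and the script itself writes files rather than returning dictionaries.

```python
def stratify_by_metadata(samples, metadata_values, table):
    """Split a table into one subtable per distinct metadata value.

    samples: list of sample ids; metadata_values: one value per sample;
    table: feature name -> list of measurements, one per sample.
    Returns value -> (sample ids, subtable).
    """
    columns = {}  # metadata value -> indices of the samples that carry it
    for index, value in enumerate(metadata_values):
        columns.setdefault(value, []).append(index)
    return {
        value: (
            [samples[i] for i in idx],
            {name: [row[i] for i in idx] for name, row in table.items()},
        )
        for value, idx in columns.items()
    }


# Made-up table with two body sites.
samples = ["S1", "S2", "S3"]
site = ["gut", "oral", "gut"]
table = {"Bacteria|1": [5, 0, 2], "Bacteria|2": [1, 3, 4]}
strata = stratify_by_metadata(samples, site, table)
print(strata["gut"])   # (['S1', 'S3'], {'Bacteria|1': [5, 2], 'Bacteria|2': [1, 4]})
print(strata["oral"])  # (['S2'], {'Bacteria|1': [0], 'Bacteria|2': [3]})
```
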
K. How do I remove all samples of a certain metadata value? (Example: how do I remove all gut HMP body site samples but leave the rest in the table?)

> scripts/scriptManipulateTable.py -i TID -l STSite -r STSite,R_Retroauricular_crease,L_Retroauricular_crease demo_input/Test.pcl

*Manipulate the feature names*
L. How do I add on the 'k__' and 's__' on the names of my bugs?

> scripts/scriptManipulateTable.py -i TID -l STSite -x demo_input/Test.pcl

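Adding clade prefixes can be pictured as pairing each level of a bug name with the prefix for its rank. The sketch below assumes the rank order k, p, c, o, f, g, s seen in the lineage examples of this tutorial. The function name and the example lineage are hypothetical, and names that end in an OTU id are not handled here.

```python
# Assumed rank order, taken from the lineage examples in this tutorial.
CLADE_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def add_clade_prefixes(name, delimiter="|"):
    """Prefix each level of a lineage with its rank marker, if missing."""
    levels = name.split(delimiter)
    if len(levels) > len(CLADE_PREFIXES):
        raise ValueError("more levels than known prefixes")
    return delimiter.join(
        level if level.startswith(prefix) else prefix + level
        for prefix, level in zip(CLADE_PREFIXES, levels)
    )


print(add_clade_prefixes("Bacteria|Firmicutes|Clostridia"))
# k__Bacteria|p__Firmicutes|c__Clostridia
```
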
## scriptPlotFeature.py ##
This script allows you to plot a row of an abundance table, metadata or data. This assumes the first column is the id and the remaining columns are values to be plotted. Three different plots can be generated based on the input arguments and the type of data given to the script. A boxplot is made if two features are given, one numeric and one categorical. A scatterplot is made if two numeric features are given. A histogram is made if one numeric feature is given.

A. How do I plot a box plot of two data?
A box plot requires two features, one numeric and one categorical. The script detects this automatically and will plot the correct plot for you as you go.

> scripts/scriptPlotFeature.py demo_input/Test.pcl STSite 'Bacteria|3417'

B. How do I plot a scatter plot of two data?
A scatter plot requires two features, both numeric. The script detects this automatically and will plot the correct plot for you as you go.

> scripts/scriptPlotFeature.py demo_input/Test.pcl 'Bacteria|unclassified|4904' 'Bacteria|3417'

C. How do I plot a histogram of a feature?
Just plot one numeric feature.

> scripts/scriptPlotFeature.py demo_input/Test.pcl 'Bacteria|3417'

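The three rules above reduce to a small decision on how many features were given and whether each one is numeric. The sketch below shows that decision. How the script itself detects numeric data is not described here, so the "every value parses as a float" test is an assumption, and the function names are hypothetical.

```python
def is_numeric(values):
    """True if every value parses as a float (assumed detection rule)."""
    try:
        for value in values:
            float(value)
    except (TypeError, ValueError):
        return False
    return True


def choose_plot(*features):
    """Return the plot type implied by one or two feature rows."""
    kinds = [is_numeric(values) for values in features]
    if len(kinds) == 1 and kinds[0]:
        return "histogram"
    if len(kinds) == 2 and all(kinds):
        return "scatterplot"
    if len(kinds) == 2 and any(kinds):
        return "boxplot"
    raise ValueError("no plot defined for this combination of features")


# Made-up rows: one categorical metadata and two numeric bugs.
site = ["gut", "oral", "gut"]
bug_a = ["0.10", "0.30", "0.25"]
bug_b = ["0.02", "0.00", "0.07"]
print(choose_plot(site, bug_a))   # boxplot
print(choose_plot(bug_a, bug_b))  # scatterplot
print(choose_plot(bug_a))         # histogram
```
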
D. How do I change the title or axes?

> scripts/scriptPlotFeature.py -t Title -x Xaxis -y Yaxis demo_input/Test.pcl 'Bacteria|3417'

E. How do I change the color? Use -c and a hex color.

> scripts/scriptPlotFeature.py -c '#333333' demo_input/Test.pcl 'Bacteria|3417'

F. How do I invert the colors for a black background? Use -r.

> scripts/scriptPlotFeature.py -r demo_input/Test.pcl 'Bacteria|3417'

## scriptBiplotTSV.R ##
This script plots a tsv file as a biplot. A tsv file is a transposed PCL file (demo files are found in demo_input). The positions of the sample markers and bug text are generated by nonmetric multidimensional scaling. Each metadata is represented by an arrow with a text label at the head of the arrow. The coordinates of the arrows are determined by the center (average) of the coordinates of the samples with that metadata, showing the central tendency of where that metadata is located. More specifically, discontinuous metadata are broken down into levels (values), and each level is made into its own binary metadata (0 for not having that value and 1 for having that value). For each new metadata, the samples with a value of 1 are selected and their ordination coordinates are averaged. This average coordinate set is then used as the coordinates for that metadata level. For continuous metadata, the metadata values are placed in a landscape over the ordination, using the sample coordinates as x and y and the metadata value as z. This landscape is smoothed with a lowess, and the coordinates of the maximum fitted value are used as the central tendency of the metadata.

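The averaging step for discontinuous metadata can be illustrated with a short sketch. This is not the code used by scriptBiplotTSV.R (which is written in R); the function name `level_centroids` and the toy coordinates are assumptions made for illustration. Grouping samples by level is equivalent to selecting the samples whose binary metadata is 1 for that level.

```python
from collections import defaultdict


def level_centroids(coords, levels, metadata_id):
    """Average the ordination coordinates of the samples in each level.

    coords: list of (x, y) tuples, one per sample (hypothetical input).
    levels: list of level names, one per sample, in the same order.
    Returns {"<metadata_id>_<level>": (mean_x, mean_y)}.
    """
    grouped = defaultdict(list)
    for point, level in zip(coords, levels):
        # Same as keeping the samples whose binary indicator for `level` is 1.
        grouped[level].append(point)

    centroids = {}
    for level, points in grouped.items():
        mean_x = sum(x for x, _ in points) / len(points)
        mean_y = sum(y for _, y in points) / len(points)
        centroids[f"{metadata_id}_{level}"] = (mean_x, mean_y)
    return centroids


# Toy ordination coordinates, invented for this example.
coords = [(0.0, 0.0), (2.0, 2.0), (4.0, 0.0), (6.0, 2.0)]
levels = [
    "L_Antecubital_fossa",
    "L_Antecubital_fossa",
    "R_Antecubital_fossa",
    "R_Antecubital_fossa",
]
centroids = level_centroids(coords, levels, "STSite")
print(centroids)
```

Each key in the result follows the metadata ID plus "_" plus level naming used elsewhere in this tutorial. The lowess-based handling of continuous metadata is not covered by this sketch.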
This is a customizable plot: the metadata plotted, the bugs plotted, the arrow color, the bug and metadata text color, the sample colors, the sample shapes, and the title can all be changed. Below are examples of how to use the command line for this script; although options are shown separately here, many options can be used together.

A. This is the minimal call. The call to the script must include the positional arguments lastmetadata and inputPCLFile after any optional arguments (given with flags preceded by - or --).

> ./scripts/scriptBiplotTSV.R STSite demo_input/Test-Biplot.tsv

This consists of the script call, the last metadata value, and the input file.

B. How do I specify an output file name? Use -o and then a name ending with the extension .pdf.

> ./scripts/scriptBiplotTSV.R -o Test2Biplot.pdf STSite demo_input/Test-Biplot.tsv

C. How do I specify bug names to plot? Use -b and the names of the bugs to plot (as written in your pcl file) separated by commas.

> ./scripts/scriptBiplotTSV.R -b 'Bacteria|3417' STSite demo_input/Test-Biplot.tsv
> ./scripts/scriptBiplotTSV.R -b 'Bacteria|3417,Bacteria|unclassified|4904' STSite demo_input/Test-Biplot.tsv

D. How do I specify metadata to plot? Use the -m option. If plotting a continuous metadata, give the ID of the metadata (the first column entry of the metadata); for any other metadata, concatenate the metadata ID and the value of interest with "_". Here are two working examples.

> ./scripts/scriptBiplotTSV.R -m 'STSite_L_Antecubital_fossa,STSite_R_Antecubital_fossa' STSite demo_input/Test-Biplot.tsv
> ./scripts/scriptBiplotTSV.R -m 'Continuous' STSite demo_input/Test-Biplot.tsv

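The naming rule for -m can be sketched as a small helper that builds the option value. The helper name `metadata_arg` and its parameters are hypothetical and not part of BreadCrumbs; it only reproduces the two strings used in the calls above.

```python
def metadata_arg(continuous_ids=(), level_pairs=()):
    """Build a value for -m from metadata IDs and (ID, level) pairs.

    continuous_ids: metadata IDs passed through unchanged.
    level_pairs: (metadata ID, value) pairs, joined with "_".
    """
    names = list(continuous_ids)
    names += [f"{meta_id}_{value}" for meta_id, value in level_pairs]
    return ",".join(names)


discrete_arg = metadata_arg(
    level_pairs=[
        ("STSite", "L_Antecubital_fossa"),
        ("STSite", "R_Antecubital_fossa"),
    ]
)
continuous_arg = metadata_arg(continuous_ids=["Continuous"])
print(discrete_arg)
print(continuous_arg)
```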
E. How do I specify a title? Use -i and the title text. Surround the text with single quotes ('). This tells the command line that the text belongs together and is not a series of commands. Quotes should be used whenever you give a flag a value that contains spaces or anything but alphanumeric characters.

> ./scripts/scriptBiplotTSV.R -i 'Test Title' STSite demo_input/Test-Biplot.tsv

F. How do I specify a metadata to shape markers by? Use -y and the ID for your metadata in your pcl file (the entry for your metadata in the first column).

> ./scripts/scriptBiplotTSV.R -y STSite STSite demo_input/Test-Biplot.tsv

G. How do I specify specific shapes to use? This requires the combination of -y and -s. Use -y to specify the metadata to use. Use -s to specify which shape should be used for which metadata value. These are given as metadataValue:shape,metadataValue:shape.

> ./scripts/scriptBiplotTSV.R -y STSite -s 'L_Antecubital_fossa:15,R_Antecubital_fossa:23' STSite demo_input/Test-Biplot.tsv

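To make the -s format concrete, here is a minimal sketch that reads a metadataValue:shape list into a dictionary. It is an illustration of the format only, not the parsing done inside scriptBiplotTSV.R; the function name `parse_shape_spec` is an assumption.

```python
def parse_shape_spec(spec):
    """Turn 'value:shape,value:shape' into {value: shape number}."""
    shapes = {}
    for pair in spec.split(","):
        # Split on the last ':' so the shape number is always the final part.
        value, _, shape = pair.rpartition(":")
        if not value or not shape.isdigit():
            raise ValueError(f"expected value:shape, got {pair!r}")
        shapes[value] = int(shape)
    return shapes


shapes = parse_shape_spec("L_Antecubital_fossa:15,R_Antecubital_fossa:23")
print(shapes)
```

The shape numbers are the same values R accepts for its pch parameter.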
H. How do I specify a metadata to color markers by? Use -c and the ID for your metadata in your pcl file (the entry for your metadata in the first column).

> ./scripts/scriptBiplotTSV.R -c STSite STSite demo_input/Test-Biplot.tsv

I. How do I specify a default marker shape to use instead of using a metadata? Use -d and a number recognized by R's pch parameter (a number between 1 and 25). For more information, see http://www.statmethods.net/advgraphs/parameters.html

> ./scripts/scriptBiplotTSV.R -d 1 STSite demo_input/Test-Biplot.tsv

J. How do I specify a color range to use when coloring? Use -r with two (R supported) colors separated by a comma. R supported colors can be found in many sources including this one: http://www.stats4stem.org/r-colors.html

> ./scripts/scriptBiplotTSV.R -r 'red,cyan' STSite demo_input/Test-Biplot.tsv

K. How do I specify a color to use when drawing arrows? Use -a and an (R supported) color. R supported colors can be found in many sources including this one: http://www.stats4stem.org/r-colors.html

> ./scripts/scriptBiplotTSV.R -a red STSite demo_input/Test-Biplot.tsv

L. How do I specify a color to use for arrow text? Use -w and an (R supported) color. R supported colors can be found in many sources including this one: http://www.stats4stem.org/r-colors.html

> ./scripts/scriptBiplotTSV.R -w orange STSite demo_input/Test-Biplot.tsv

M. How do I specify a color to use for bug text? Use -t and an (R supported) color. Make sure to use -b to plot bugs. R supported colors can be found in many sources including this one: http://www.stats4stem.org/r-colors.html

> ./scripts/scriptBiplotTSV.R -t pink -b 'Bacteria|3417' STSite demo_input/Test-Biplot.tsv

N. How do I rotate the projection (plot) in reference to a specific metadata? Use the -e option and give it a metadata and a weight for that metadata; the larger the weight, the more the rotation takes the metadata into account. You may have to experiment with different weights to see how the rotation is affected. The weights can range from 0 (no rotation by the metadata) to a very large number. The metadata name should be the metadata ID if the value is continuous, or the metadata ID and the value (level) of interest separated by a _ if the metadata is not continuous. Below are two examples, one using a continuous metadata and one using a discontinuous metadata.

> ./scripts/scriptBiplotTSV.R -e 'Continuous,2' STSite demo_input/Test-Biplot.tsv
> ./scripts/scriptBiplotTSV.R -e 'STSite_L_Antecubital_fossa,.5' STSite demo_input/Test-Biplot.tsv

O. How do I color NAs a specific color, no matter the other coloring in the plot? Use -n and a color supported by R. This requires you to be coloring the plot by a metadata (option -c). R supported colors can be found in many sources including this one: http://www.stats4stem.org/r-colors.html

> ./scripts/scriptBiplotTSV.R -n grey -c STSite STSite demo_input/Test-BiplotNA.tsv

P. How do I scale arrows in the plot? Use -z and a number to scale the length of the arrows by (a number between 0 and very large).

> ./scripts/scriptBiplotTSV.R -z 2 STSite demo_input/Test-Biplot.tsv

Q. How do I plot metadata labels without the arrows? Use -A.

> ./scripts/scriptBiplotTSV.R -A STSite demo_input/Test-Biplot.tsv

R. How do I plot the biplot without metadata? Use -m with an empty string.

> ./scripts/scriptBiplotTSV.R -m "" STSite demo_input/Test-Biplot.tsv

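Since many of the options above can be used together, a call can be assembled programmatically with the optional flags first and the two positional arguments last. The sketch below only builds and prints the command; the helper name `biplot_command`, its keyword names, and the output file name Combined.pdf are assumptions for illustration, and this particular combination of flags has not been checked against the script.

```python
import shlex

# Flags taken from the examples above; keyword names are hypothetical.
FLAGS = {
    "output": "-o",
    "bugs": "-b",
    "title": "-i",
    "color_by": "-c",
    "shape_by": "-y",
}


def biplot_command(last_metadata, input_tsv, **options):
    """Return an argv list: script, optional flags, then positional arguments."""
    argv = ["./scripts/scriptBiplotTSV.R"]
    for name, value in options.items():
        argv += [FLAGS[name], value]
    # Positional arguments must come after any optional arguments.
    argv += [last_metadata, input_tsv]
    return argv


argv = biplot_command(
    "STSite",
    "demo_input/Test-Biplot.tsv",
    output="Combined.pdf",
    title="Test Title",
    color_by="STSite",
)
# shlex.join quotes values that contain spaces, such as the title.
print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run(argv)` from the breadcrumbs directory.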
## scriptConvertBetweenBIOMAndPCL.py ##
This script converts between the PCL and BIOM file formats. The ID, last feature (row) metadata, and last sample metadata are optional information in the script call (when converting from PCL to BIOM). These are used to dictate the placement of certain key sample metadata in the PCL file. Typically, it is helpful to set these arguments, as this aids in the consistent and reliable manipulation of these files. If they are not given, a guess will be made at the ID and it will be assumed that no metadata exist.

A quick definition:
*ID or sample id* - typically the first row in your PCL file (the IDs of all your samples); in the example below, "ID"
*Feature (row) metadata* - columns in your PCL file which describe your features. These come after your feature IDs but before your measurements.
*Sample metadata* - rows in your PCL file which come before your measurements and describe your samples

For a description of a PCL and its parts, please look in the docs folder for PCL-Description.txt

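The definitions above can be made concrete with a small sketch that splits a PCL table into sample IDs, sample metadata rows, and measurement rows, given the name of the last sample metadata row. This is not the BreadCrumbs parser: the function `split_pcl` and the toy table are assumptions, and feature (row) metadata columns are not handled.

```python
import csv
import io


def split_pcl(text, last_metadata=None, delimiter="\t"):
    """Split PCL text into sample IDs, sample metadata, and measurements.

    The first row holds the sample IDs. Rows up to and including the row
    named `last_metadata` are sample metadata; the rest are measurements.
    If `last_metadata` is None, no sample metadata are assumed.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if last_metadata is None:
        first_data = 1
    else:
        row_names = [row[0] for row in rows]
        first_data = row_names.index(last_metadata) + 1
    return {
        "ids": rows[0][1:],
        "metadata": {row[0]: row[1:] for row in rows[1:first_data]},
        "features": {
            row[0]: [float(v) for v in row[1:]] for row in rows[first_data:]
        },
    }


# Toy table invented for this example (tab-delimited).
pcl_text = (
    "ID\tSample1\tSample2\n"
    "STSite\tL_Antecubital_fossa\tR_Antecubital_fossa\n"
    "Bacteria|3417\t0.25\t0.75\n"
)
pcl = split_pcl(pcl_text, last_metadata="STSite")
print(pcl)
```

Passing `delimiter=","` would read a comma-delimited table in the same way.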
A. The minimal call to convert a BIOM file to a PCL file or vice versa. The -i and -l options indicate which sample metadata entry is the sample ID and which is the last listed metadata in a pcl file (before the data measurements). When converting a PCL file that has no metadata and only a metadata ID, -l and -i are not required. If there are multiple metadata in a pcl file, the -l (last metadata) field is required. Neither of these fields is required for converting a biom file to pcl.

> ./scripts/scriptConvertBetweenBIOMAndPCL.py demo_input/Test_no_metadata.pcl example1.biom
> ./scripts/scriptConvertBetweenBIOMAndPCL.py demo_input/Test.biom example2.pcl
> ./scripts/scriptConvertBetweenBIOMAndPCL.py -l STSite demo_input/Test.pcl example3.biom

B. Specifying ID and lastmetadata

> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i TID -l STSite demo_input/Test.pcl example4.biom
> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i TID -l STSite demo_input/Test.biom example5.pcl

C. The case where there are no sample metadata, just sample IDs. Indicate the ID; if no last metadata is indicated (-l), it is assumed that no sample metadata exist.

> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i ID demo_input/Test_no_metadata.pcl example6.biom
> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i ID demo_input/Test_no_metadata.biom example7.pcl

D. The case when converting a PCL file with feature (row) metadata (for example taxonomy_5). Use -r to indicate the last column with feature metadata.

> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i ID -r taxonomy_5 -l STSite ./demo_input/testFeatureMetadata.pcl testFeatureMetadata.biom

E. Although the output file name can be automatically generated, it can also be given if needed.

> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i TID -l STSite demo_input/Test.biom CustomFileName.pcl
> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i TID -l STSite demo_input/Test.pcl CustomFileName.biom

F. Use -f to indicate that the input pcl file uses a delimiter that is not tab, or that the pcl file to be created should use a delimiter that is not tab.

> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i TID -l STSite -f , demo_input/Test-comma.pcl
> ./scripts/scriptConvertBetweenBIOMAndPCL.py -i TID -l STSite -f , demo_input/Test-comma.biom