annotate format_input.py @ 20:a6284ef17bf3 draft default tip

Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
author george-weingart
date Tue, 07 Jul 2015 13:26:55 -0400
parents 47ac77f2fe68
children
rev   line source
0
47ac77f2fe68 First version of lefse in this toolshed
george-weingart
parents:
diff changeset
1 #!/usr/bin/env python
2
20
a6284ef17bf3 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents: 0
diff changeset
3 import sys,os,argparse,pickle,re,numpy
4
5
6
0
7
20
8 #***************************************************************************************************************
9 #* Log of change *
10 #* January 16, 2014 - George Weingart - george.weingart@gmail.com *
11 #* *
12 #* biom Support *
13 #* Modified the program to enable it to accept biom files as input *
14 #* *
15 #* Added two optional input parameters: *
16 #* 1. biom_c is the name of the biom metadata to be used as class *
17 #* 2. biom_s is the name of the biom metadata to be used as subclass *
18 #* class and subclass are used in the same context as the original *
19 #* parameters class and subclass *
20 #* These parameters are totally optional; by default the program *
21 #* chooses as class the first metadata received from the conversion *
22 #* of the biom file into a sequential (pcl) file as generated by *
23 #* breadcrumbs, and similarly, the second metadata is selected as *
24 #* subclass. *
25 #* The syntax or logic for the original non-biom case was NOT changed. *
26 #* *
27 #* <******************* IMPORTANT NOTE *************************> *
28 #* The biom case requires breadcrumbs and therefore there is *
29 #* a conditional import of the breadcrumbs modules *
30 #* If the User uses a biom input and breadcrumbs is not detected, *
31 #* the run is abnormally ended *
32 #* breadcrumbs itself needs a biom environment, so if the import *
33 #* of biom in breadcrumbs fails, the run is also abnormally *
34 #* ended (Only if the input file was biom) *
35 #* *
36 #* USAGE EXAMPLES *
37 #* -------------- *
38 #* Case #1: Using a sequential file as input (Old version - did not change) *
39 #* ./format_input.py hmp_aerobiosis_small.txt hmp_aerobiosis_small.in -c 1 -s 2 -u 3 -o 1000000 *
40 #* Case #2: Using a biom file as input *
41 #* ./format_input.py hmp_aerobiosis_small.biom hmp_aerobiosis_small.in -o 1000000 *
42 #* Case #3: Using a biom file as input and override the class and subclass *
43 #* ./format_input.py lefse.biom hmp_aerobiosis_small.in -biom_c oxygen_availability -biom_s body_site -o 1000000 *
44 #* *
45 #***************************************************************************************************************
46
47 def read_input_file(inp_file, CommonArea):
48
49     if inp_file.endswith('.biom'): #* If the file format is biom:
50         CommonArea = biom_processing(inp_file) #* Process in biom format
51         return CommonArea #* And return the CommonArea
52
0
53     with open(inp_file) as inp:
20
54         CommonArea['ReturnedData'] = [[v.strip() for v in line.strip().split("\t")] for line in inp.readlines()]
55         return CommonArea
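The non-biom branch above reads a tab-separated text file into a list of rows. A minimal Python 3 sketch of that step follows; the helper name `read_tab_separated` and the demo table are made up for illustration and are not part of format_input.py. Note that `line.strip()` also removes leading and trailing tabs, so empty cells at either end of a row are dropped.

```python
import os
import tempfile

def read_tab_separated(path):
    """Return the file as a list of rows, each row a list of stripped cells."""
    with open(path) as handle:
        return [[cell.strip() for cell in line.strip().split("\t")] for line in handle]

# Made-up demo table: one metadata row followed by two feature rows.
demo_text = "oxygen\tHigh_O2\tLow_O2\nBacteria\t0.9\t0.8\nArchaea\t0.1\t0.2\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(demo_text)
rows = read_tab_separated(tmp.name)
os.remove(tmp.name)
```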
0
56
57 def transpose(data):
58     return zip(*data)
59
60 def read_params(args):
61     parser = argparse.ArgumentParser(description='LEfSe formatting modules')
62     parser.add_argument('input_file', metavar='INPUT_FILE', type=str, help="the input file, feature hierarchical level can be specified with | or . and those symbols must not be present for other reasons in the input file.")
63     parser.add_argument('output_file', metavar='OUTPUT_FILE', type=str,
64         help="the output file containing the data for LEfSe")
65     parser.add_argument('--output_table', type=str, required=False, default="",
66         help="the formatted table in txt format")
67     parser.add_argument('-f',dest="feats_dir", choices=["c","r"], type=str, default="r",
68         help="set whether the features are on rows (default) or on columns")
69     parser.add_argument('-c',dest="class", metavar="[1..n_feats]", type=int, default=1,
70         help="set which feature to use as class (default 1)")
71     parser.add_argument('-s',dest="subclass", metavar="[1..n_feats]", type=int, default=None,
72         help="set which feature to use as subclass (default None meaning no subclass)")
73     parser.add_argument('-o',dest="norm_v", metavar="float", type=float, default=-1.0,
74         help="set the normalization value (default -1.0 meaning no normalization)")
75     parser.add_argument('-u',dest="subject", metavar="[1..n_feats]", type=int, default=None,
76         help="set which feature to use as subject (default None meaning no subject)")
77     parser.add_argument('-m',dest="missing_p", choices=["f","s"], type=str, default="d",
78         help="set the policy to adopt with missing values: f removes the features with missing values, s removes samples with missing values (default f)")
79     parser.add_argument('-n',dest="subcl_min_card", metavar="int", type=int, default=10,
80         help="set the minimum cardinality of each subclass (subclasses with low cardinalities will be grouped together, if the cardinality is still low, no pairwise comparison will be performed with them)")
81
20
82     parser.add_argument('-biom_c',dest="biom_class", type=str,
83         help="For biom input files: set which feature to use as class")
84     parser.add_argument('-biom_s',dest="biom_subclass", type=str,
85         help="For biom input files: set which feature to use as subclass")
86
0
87     args = parser.parse_args()
88
89     return vars(args)
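read_params returns `vars(args)`, a plain dict, rather than the argparse namespace. One likely reason is that `dest="class"` is a Python keyword, so `args.class` would be a syntax error while `params["class"]` works. The sketch below illustrates this with a reduced, hypothetical subset of the options and an explicit argument list instead of `sys.argv`; `build_parser` is a name invented for this example.

```python
import argparse

def build_parser():
    """Build a parser with a small subset of the options shown above."""
    parser = argparse.ArgumentParser(description="demo of a subset of the options")
    parser.add_argument("input_file")
    parser.add_argument("output_file")
    parser.add_argument("-c", dest="class", type=int, default=1)
    parser.add_argument("-s", dest="subclass", type=int, default=None)
    parser.add_argument("-o", dest="norm_v", type=float, default=-1.0)
    return parser

# Passing the argument list explicitly keeps the example independent of sys.argv.
argv = ["in.txt", "out.in", "-c", "1", "-s", "2", "-o", "1000000"]
params = vars(build_parser().parse_args(argv))
```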
90
91 def remove_missing(data,roc):
92     if roc == "c": data = transpose(data)
93     max_len = max([len(r) for r in data])
94     to_rem = []
95     for i,r in enumerate(data):
96         if len([v for v in r if not( v == "" or v.isspace())]) < max_len: to_rem.append(i)
97     if len(to_rem):
98         for i in reversed(to_rem): # list.reverse() returns None, so iterate a reversed view instead
99             data.pop(i)
100     if roc == "c": return transpose(data)
101     return data
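Deleting rows by index is order-sensitive: popping from the highest index down keeps the remaining indices valid. A minimal Python 3 sketch of the row-wise case follows, assuming every cell is a string; `remove_incomplete_rows` is a hypothetical name, and the column-wise case would transpose before and after.

```python
def remove_incomplete_rows(rows):
    """Drop every row that has fewer non-blank cells than the longest row."""
    rows = [list(r) for r in rows]          # work on a mutable copy
    max_len = max(len(r) for r in rows)
    # v.strip() is empty exactly when v is "" or whitespace only.
    to_remove = [i for i, r in enumerate(rows)
                 if sum(1 for v in r if v.strip()) < max_len]
    for i in reversed(to_remove):           # highest index first
        rows.pop(i)
    return rows

cleaned = remove_incomplete_rows(
    [["a", "1", "2"], ["b", "", "3"], ["c", "4"], ["d", "5", "6"]]
)
```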
102
103
104 def sort_by_cl(data,n,c,s,u):
105     def sort_lines1(a,b):
106         return int(a[c] > b[c])*2-1
107     def sort_lines2u(a,b):
108         if a[c] != b[c]: return int(a[c] > b[c])*2-1
109         return int(a[u] > b[u])*2-1
110     def sort_lines2s(a,b):
111         if a[c] != b[c]: return int(a[c] > b[c])*2-1
112         return int(a[s] > b[s])*2-1
113     def sort_lines3(a,b):
114         if a[c] != b[c]: return int(a[c] > b[c])*2-1
115         if a[s] != b[s]: return int(a[s] > b[s])*2-1
116         return int(a[u] > b[u])*2-1
117     if n == 3: data.sort(sort_lines3)
118     if n == 2:
119         if s is None:
120             data.sort(sort_lines2u)
121         else:
122             data.sort(sort_lines2s)
123     if n == 1: data.sort(sort_lines1)
124     return data
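The comparison functions above rely on the Python 2 form `list.sort(cmp)`, which Python 3 no longer accepts. A possible Python 3 equivalent is sketched below using a tuple key. It assumes that `n` simply counts how many of `c`, `s`, `u` are set, which this listing does not confirm, and `sort_by_class` is a hypothetical name.

```python
def sort_by_class(rows, c, s=None, u=None):
    """Sort rows by class column, then subclass, then subject, skipping unset columns."""
    columns = [idx for idx in (c, s, u) if idx is not None]
    return sorted(rows, key=lambda row: tuple(row[idx] for idx in columns))

rows = [("b", "y", "2"), ("a", "z", "1"), ("a", "x", "3")]
by_class_subclass = sort_by_class(rows, c=0, s=1)
by_class_subject = sort_by_class(rows, c=0, u=2)
```

Unlike the comparators above, which never return 0, a key-based sort leaves rows with equal keys in their input order.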
125
126 def group_small_subclasses(cls,min_subcl):
127     last = ""
128     n = 0
129     repl = []
130     dd = [list(cls['class']),list(cls['subclass'])]
131     for d in dd:
132         if d[1] != last:
133             if n < min_subcl and last != "":
134                 repl.append(d[1])
135             last = d[1]
136         n = 1
137     for i,d in enumerate(dd):
138         if d[1] in repl: dd[i][1] = "other"
139         dd[i][1] = str(dd[i][0])+"_"+str(dd[i][1])
140     cls['class'] = dd[0]
141     cls['subclass'] = dd[1]
142     return cls
143
144 def get_class_slices(data):
145     previous_class = data[0][0]
146     previous_subclass = data[0][1]
147     subclass_slices = []
148     class_slices = []
149     last_cl = 0
150     last_subcl = 0
151     class_hierarchy = []
152     subcls = []
153     for i,d in enumerate(data):
154         if d[1] != previous_subclass:
155             subclass_slices.append((previous_subclass,(last_subcl,i)))
156             last_subcl = i
157             subcls.append(previous_subclass)
158         if d[0] != previous_class:
159             class_slices.append((previous_class,(last_cl,i)))
160             class_hierarchy.append((previous_class,subcls))
161             subcls = []
162             last_cl = i
163         previous_subclass = d[1]
164         previous_class = d[0]
165     subclass_slices.append((previous_subclass,(last_subcl,i+1)))
166     subcls.append(previous_subclass)
167     class_slices.append((previous_class,(last_cl,i+1)))
168     class_hierarchy.append((previous_class,subcls))
169     return dict(class_slices), dict(subclass_slices), dict(class_hierarchy)
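get_class_slices walks rows that are already sorted by class and subclass and records the half-open index range covered by each. The same result can be sketched with `itertools.groupby`. This is an alternative formulation, not the code in the repository; it assumes each row is a `(class, subclass)` pair and that subclass names are unique across classes. The name `class_slices` and the sample data are made up.

```python
from itertools import groupby

def class_slices(pairs):
    """Return (class -> (start, end)), (subclass -> (start, end)), (class -> [subclasses])."""
    class_sl, subclass_sl, hierarchy = {}, {}, {}
    start = 0
    for cls, cls_group in groupby(pairs, key=lambda p: p[0]):
        cls_rows = list(cls_group)
        sub_start = start
        hierarchy[cls] = []
        for sub, sub_group in groupby(cls_rows, key=lambda p: p[1]):
            n = len(list(sub_group))
            subclass_sl[sub] = (sub_start, sub_start + n)   # half-open range
            hierarchy[cls].append(sub)
            sub_start += n
        class_sl[cls] = (start, start + len(cls_rows))
        start += len(cls_rows)
    return class_sl, subclass_sl, hierarchy

pairs = [("A", "A_x"), ("A", "A_x"), ("A", "A_y"), ("B", "B_z")]
cls_sl, sub_sl, hier = class_slices(pairs)
```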
170
171 def numerical_values(feats,norm):
172     mm = []
173     for k,v in feats.items():
174         feats[k] = [float(val) for val in v]
175     if norm < 0.0: return feats
176     tr = zip(*(feats.values()))
177     mul = []
178     fk = feats.keys()
179     hie = True if sum([k.count(".") for k in fk]) > len(fk) else False
180     for i in range(len(feats.values()[0])):
181         if hie: mul.append(sum([t for j,t in enumerate(tr[i]) if fk[j].count(".") < 1 ]))
182         else: mul.append(sum(tr[i]))
183     if hie and sum(mul) == 0:
184         mul = []
185         for i in range(len(feats.values()[0])):
186             mul.append(sum(tr[i]))
187     for i,m in enumerate(mul):
188         if m == 0: mul[i] = 0.0
189         else: mul[i] = float(norm) / m
190     for k,v in feats.items():
191         feats[k] = [val*mul[i] for i,val in enumerate(v)]
20
192         if numpy.mean(feats[k]) and (numpy.std(feats[k])/numpy.mean(feats[k])) < 1e-10:
193             feats[k] = [ float(round(kv*1e6)/1e6) for kv in feats[k]]
0
194     return feats
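The revision 20 lines above address floating-point jitter: after each sample is scaled so that its root-level clades sum to the normalization value, a clade such as "Bacteria" can come out as 999999.9999999999 in one sample and 1000000.0 in another, so a feature that should be constant is not. The sketch below is a simplified Python 3 reading of the function. It uses `statistics.pstdev` in place of `numpy.std` (both are population standard deviations by default) and omits the fallback for an all-zero root sum. The name `normalize` and the sample table are hypothetical.

```python
from statistics import pstdev

def normalize(feats, norm):
    """Scale every sample (column) so that its root-level features sum to `norm`."""
    names = list(feats)
    columns = list(zip(*(feats[k] for k in names)))
    # Same heuristic as above: hierarchical when dots outnumber feature names.
    hierarchical = sum(k.count(".") for k in names) > len(names)
    totals = []
    for col in columns:
        if hierarchical:
            totals.append(sum(v for k, v in zip(names, col) if "." not in k))
        else:
            totals.append(sum(col))
    scale = [norm / t if t else 0.0 for t in totals]
    out = {}
    for k in names:
        vals = [v * scale[i] for i, v in enumerate(feats[k])]
        mean = sum(vals) / len(vals)
        # Snap near-constant rows to 6 decimals to hide floating-point jitter.
        if mean and pstdev(vals) / mean < 1e-10:
            vals = [round(v * 1e6) / 1e6 for v in vals]
        out[k] = vals
    return out

table = {
    "Bacteria": [3.0, 7.0],
    "Bacteria.Firmicutes": [1.5, 3.5],
    "Bacteria.Firmicutes.Clostridia": [1.5, 3.5],
    "Bacteria.Firmicutes.Clostridia.Clostridiales": [0.3, 1.4],
}
normalized = normalize(table, 1000000.0)
```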
47ac77f2fe68 First version of lefse in this toolshed
george-weingart
parents:
diff changeset
195
47ac77f2fe68 First version of lefse in this toolshed
george-weingart
parents:
diff changeset
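The last two statements of `numerical_values` guard against floating-point noise: after every sample is scaled to the same total, a root-level clade should be identical across samples, but the multiplication can leave tiny differences. A minimal Python 3 sketch of that check follows. It uses the standard-library `statistics` module in place of numpy purely to stay dependency-free, and `round_if_constant`, `rel_tol` and `scale` are hypothetical names that do not appear in lefse.

```python
import statistics

def round_if_constant(values, rel_tol=1e-10, scale=1e6):
    """Round to 1/scale when the coefficient of variation is below rel_tol."""
    mean = statistics.mean(values)
    # numpy.std defaults to the population standard deviation, hence pstdev.
    if mean and statistics.pstdev(values) / mean < rel_tol:
        return [float(round(v * scale) / scale) for v in values]
    return list(values)

# A root-level clade after normalization to 1e6, with round-off noise.
noisy_root = [1e6, 1e6 + 1e-9, 1e6 - 1e-9]
print(round_if_constant(noisy_root))

# A feature that genuinely varies is returned unchanged.
print(round_if_constant([1.0, 2.0, 3.0]))
```

The relative threshold means only features whose spread is negligible compared with their mean are touched, so real biological variation should pass through as-is.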
def add_missing_levels2(ff):

    if sum( [f.count(".") for f in ff] ) < 1: return ff

    dn = {}

    added = True
    while added:
        added = False
        for f in ff:
            lev = f.count(".")
            if lev == 0: continue
            if lev not in dn: dn[lev] = [f]
            else: dn[lev].append(f)
        for fn in sorted(dn,reverse=True):
            for f in dn[fn]:
                fc = ".".join(f.split('.')[:-1])
                if fc not in ff:
                    ab_all = [ff[fg] for fg in ff if (fg.count(".") == 0 and fg == fc) or (fg.count(".") > 0 and fc == ".".join(fg.split('.')[:-1]))]
                    ab =[]
                    for l in [f for f in zip(*ab_all)]:
                        ab.append(sum([float(ll) for ll in l]))
                    ff[fc] = ab
                    added = True
            if added:
                break

    return ff


def add_missing_levels(ff):
    if sum( [f.count(".") for f in ff] ) < 1: return ff

    clades2leaves = {}
    for f in ff:
        fs = f.split(".")
        if len(fs) < 2:
            continue
        for l in range(len(fs)):
            n = ".".join( fs[:l] )
            if n in clades2leaves:
                clades2leaves[n].append( f )
            else:
                clades2leaves[n] = [f]
    for k,v in clades2leaves.items():
        if k and k not in ff:
            ff[k] = [sum(a) for a in zip(*[[float(fn) for fn in ff[vv]] for vv in v])]
    return ff



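To make the effect of `add_missing_levels` concrete, here is a small Python 3 sketch of the same idea: every ancestor clade that is absent from the table is filled in with the per-sample sum of the features beneath it. The function name `add_missing_levels_sketch` and the sample data are illustrative assumptions, not part of lefse.

```python
def add_missing_levels_sketch(ff):
    """Add each missing ancestor clade as the sum of its descendant rows."""
    if sum(f.count(".") for f in ff) < 1:
        return ff  # flat feature names, nothing to add

    clades2leaves = {}
    for f in ff:
        fs = f.split(".")
        if len(fs) < 2:
            continue
        # Every proper, non-empty prefix of the name is an ancestor clade.
        for depth in range(1, len(fs)):
            clades2leaves.setdefault(".".join(fs[:depth]), []).append(f)

    for clade, leaves in clades2leaves.items():
        if clade not in ff:
            rows = [[float(x) for x in ff[leaf]] for leaf in leaves]
            ff[clade] = [sum(col) for col in zip(*rows)]
    return ff

table = {
    "Bacteria.Firmicutes.Clostridia": [1, 2],
    "Bacteria.Firmicutes.Bacilli": [3, 4],
    "Bacteria.Proteobacteria.Gamma": [5, 6],
}
filled = add_missing_levels_sketch(table)
print(filled["Bacteria.Firmicutes"])
print(filled["Bacteria"])
```

As in the original, a table that already contains some intermediate clades alongside their children would have both counted towards the missing ancestors, so the sketch is best read with leaf-only input in mind.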
def modify_feature_names(fn):
    ret = fn

    # Characters that are dropped from feature names.
    for v in [' ',r'\$',r'\@',r'#',r'%',r'\^',r'\&',r'\*',r'\"',r'\'']:
        ret = [re.sub(v,"",f) for f in ret]
    # Characters that are replaced by an underscore.
    for v in ["/",r'\(',r'\)',r'-',r'\+',r'=',r'{',r'}',r'\[',r'\]',
              r',',r'\.',r';',r':',r'\?',r'\<',r'\>',r'\.',r'\,']:
        ret = [re.sub(v,"_",f) for f in ret]
    # The pipe becomes the clade-level separator.
    for v in [r'\|']:
        ret = [re.sub(v,".",f) for f in ret]

    # Names starting with a digit get an "f_" prefix.
    ret2 = []
    for r in ret:
        if r[0] in ['0','1','2','3','4','5','6','7','8','9']:
            ret2.append("f_"+r)
        else: ret2.append(r)

    return ret2


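The renaming rules in `modify_feature_names` are easier to follow on an example. The following Python 3 sketch applies the same three passes in the same order (drop, replace with `_`, then `|` to `.`) using character classes instead of one `re.sub` call per character. The names `REMOVE`, `TO_UNDERSCORE` and `modify_feature_names_sketch` are hypothetical, and unlike the original the sketch tolerates an empty name.

```python
import re

REMOVE = " $@#%^&*\"'"               # characters that are dropped
TO_UNDERSCORE = "/()-+={}[],.;:?<>"   # characters that become "_"

def modify_feature_names_sketch(names):
    remove_re = re.compile("[" + re.escape(REMOVE) + "]")
    underscore_re = re.compile("[" + re.escape(TO_UNDERSCORE) + "]")
    out = []
    for name in names:
        name = remove_re.sub("", name)
        name = underscore_re.sub("_", name)
        # Dots were already turned into "_", so "." now only marks levels.
        name = name.replace("|", ".")
        if name and name[0] in "0123456789":
            name = "f_" + name
        out.append(name)
    return out

print(modify_feature_names_sketch(["k__Bacteria|p__Firmicutes", "16S rRNA (V4)"]))
```

The order of the passes matters: literal dots in the input are converted to underscores before the pipe is mapped to a dot, so after renaming a dot always denotes a hierarchy boundary.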
def rename_same_subcl(cl,subcl):
    toc = []
    for sc in set(subcl):
        if len(set([cl[i] for i in range(len(subcl)) if sc == subcl[i]])) > 1:
            toc.append(sc)
    new_subcl = []
    for i,sc in enumerate(subcl):
        if sc in toc: new_subcl.append(cl[i]+"_"+sc)
        else: new_subcl.append(sc)
    return new_subcl


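`rename_same_subcl` disambiguates subclass labels that occur under more than one class by prefixing them with the class name. A short Python 3 usage sketch, with the function restated in a slightly more compact form so the block runs on its own (the sample labels are made up for illustration):

```python
def rename_same_subcl(cl, subcl):
    # Subclass labels that appear under more than one class.
    toc = [sc for sc in set(subcl)
           if len(set(c for c, s in zip(cl, subcl) if s == sc)) > 1]
    # Prefix only the ambiguous labels with their class.
    return [c + "_" + s if s in toc else s for c, s in zip(cl, subcl)]

classes    = ["healthy", "healthy", "disease", "disease"]
subclasses = ["male",    "female",  "male",    "other"]
print(rename_same_subcl(classes, subclasses))
```

Labels that are unique to a single class, such as `female` and `other` here, are left untouched.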
#*************************************************************************************
#* Modifications by George Weingart, Jan 15, 2014                                    *
#* If the input file is biom:                                                        *
#* a. Load an AbundanceTable (Using breadcrumbs)                                     *
#* b. Create a sequential file from the AbundanceTable (de-facto - pcl)              *
#* c. Use that file as input to the rest of the program                              *
#* d. Calculate the c,s,and u parameters, either from the values the User entered    *
#*    from the meta data values in the biom file or set up defaults                  *
#* <<<------------- I M P O R T A N T   N O T E ------------------->>                *
#* breadcrumbs src directory must be included in the PYTHONPATH                      *
#* <<<------------- I M P O R T A N T   N O T E ------------------->>                *
#*************************************************************************************
def biom_processing(inp_file):
    CommonArea = dict()    #* Set up a dictionary to return
    CommonArea['abndData'] = AbundanceTable.funcMakeFromFile(inp_file,    #* Create AbundanceTable from input biom file
        cDelimiter = None,
        sMetadataID = None,
        sLastMetadataRow = None,
        sLastMetadata = None,
        strFormat = None)

    #****************************************************************
    #* Building the data element here                               *
    #****************************************************************
    ResolvedData = list()    #This is the Resolved data that will be returned
    IDMetadataName = CommonArea['abndData'].funcGetIDMetadataName()    #* ID Metadataname
    IDMetadata = [CommonArea['abndData'].funcGetIDMetadataName()]    #* The first Row
    for IDMetadataEntry in CommonArea['abndData'].funcGetMetadataCopy()[IDMetadataName]:    #* Loop on all the metadata values
        IDMetadata.append(IDMetadataEntry)
    ResolvedData.append(IDMetadata)    #Add the IDMetadata with all its values to the resolved area
    for key, value in CommonArea['abndData'].funcGetMetadataCopy().iteritems():
        if key != IDMetadataName:
            MetadataEntry = list()    #* Set it up
            MetadataEntry.append(key)    #* And post it to the area
            for x in value:
                MetadataEntry.append(x)    #* Append the metadata value name
            ResolvedData.append(MetadataEntry)
    for AbundanceDataEntry in CommonArea['abndData'].funcGetAbundanceCopy():    #* The Abundance Data
        lstAbundanceDataEntry = list(AbundanceDataEntry)    #Convert tuple to list
        ResolvedData.append(lstAbundanceDataEntry)    #Append the list to the metadata list
    CommonArea['ReturnedData'] = ResolvedData    #Post the results
    return CommonArea


#*******************************************************************************
#* Check the params and override in the case of biom                           *
#*******************************************************************************
def check_params_for_biom_case(params, CommonArea):
    CommonArea['MetadataNames'] = list()    #Metadata names
    params['original_class'] = params['class']    #Save the original class
    params['original_subclass'] = params['subclass']    #Save the original subclass
    params['original_subject'] = params['subject']    #Save the original subject


    TotalMetadataEntriesAndIDInBiomFile = len(CommonArea['abndData'].funcGetMetadataCopy())    # The number of metadata entries
    for i in range(0,TotalMetadataEntriesAndIDInBiomFile):    #* Populate the meta data names table
        CommonArea['MetadataNames'].append(CommonArea['ReturnedData'][i][0])    #Add the metadata name


    #****************************************************
    #* Setting the params here                          *
    #****************************************************

    if TotalMetadataEntriesAndIDInBiomFile > 0:    #If there is at least one entry - has to be the subject
        params['subject'] = 1
    if TotalMetadataEntriesAndIDInBiomFile == 2:    #If there are 2 - The first is the subject and the second has to be the metadata, and that is the class
        params['class'] = 2
    if TotalMetadataEntriesAndIDInBiomFile == 3:    #If there are 3: Set up default that the second entry is the class and the third is the subclass
        params['class'] = 2
        params['subclass'] = 3
        FlagError = False    #Set up error flag

        if not params['biom_class'] is None and not params['biom_subclass'] is None:    #Check if the User passed a valid class and subclass
            if params['biom_class'] in CommonArea['MetadataNames']:
                params['class'] = CommonArea['MetadataNames'].index(params['biom_class']) +1    #* Set up the index for that metadata
            else:
                FlagError = True
            if params['biom_subclass'] in CommonArea['MetadataNames']:
                params['subclass'] = CommonArea['MetadataNames'].index(params['biom_subclass']) +1    #* Set up the index for that metadata
            else:
                FlagError = True
        if FlagError == True:    #* If the User passed an invalid class
            print "**Invalid biom class or subclass passed - Using defaults: First metadata=class, Second Metadata=subclass\n"
            params['class'] = 2
            params['subclass'] = 3
    return params



368 if __name__ == '__main__':
20
a6284ef17bf3 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents: 0
diff changeset
369 CommonArea = dict() #Build a Common Area to pass variables in the biom case
0
47ac77f2fe68 First version of lefse in this toolshed
george-weingart
parents:
diff changeset
370 params = read_params(sys.argv)
371
20
a6284ef17bf3 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents: 0
diff changeset
372 #*************************************************************
373 #* Conditionally import breadcrumbs if file is a biom file *
374 #* If it is and no breadcrumbs found - abnormally exit *
375 #*************************************************************
376 if params['input_file'].endswith('.biom'):
377 try:
378 from lefsebiom.ConstantsBreadCrumbs import *
379 from lefsebiom.AbundanceTable import *
380 except ImportError:
381 sys.stderr.write("************************************************************************************************************ \n")
382 sys.stderr.write("* Error: Breadcrumbs libraries not detected - required to process biom files - run abnormally terminated * \n")
383 sys.stderr.write("************************************************************************************************************ \n")
384 sys.exit(1)
385
386
0
47ac77f2fe68 First version of lefse in this toolshed
george-weingart
parents:
diff changeset
387 if type(params['subclass']) is int and int(params['subclass']) < 1:
388 params['subclass'] = None
389 if type(params['subject']) is int and int(params['subject']) < 1:
390 params['subject'] = None
20
a6284ef17bf3 Fixed bug due to numerical approximation after normalization affecting root-level clades (e.g. "Bacteria" or "Archaea")
george-weingart
parents: 0
diff changeset
391
392
393 CommonArea = read_input_file(params['input_file'], CommonArea) #Pass the CommonArea to the reader
394 data = CommonArea['ReturnedData'] #Select the data
395
396 if params['input_file'].endswith('.biom'): #* Check if biom:
397 params = check_params_for_biom_case(params, CommonArea) #Check the params for the biom case
0
47ac77f2fe68 First version of lefse in this toolshed
george-weingart
parents:
diff changeset
398
399 if params['feats_dir'] == "c":
400 data = transpose(data)
401
402 ncl = 1
403 if not params['subclass'] is None: ncl += 1
404 if not params['subject'] is None: ncl += 1
405
406 first_line = zip(*data)[0]
407
408 first_line = modify_feature_names(list(first_line))
409
410 data = zip( first_line,
411 *sort_by_cl(zip(*data)[1:],
412 ncl,
413 params['class']-1,
414 params['subclass']-1 if not params['subclass'] is None else None,
415 params['subject']-1 if not params['subject'] is None else None))
416 # data.insert(0,first_line)
417 # data = remove_missing(data,params['missing_p'])
418 cls = {}
419
420 cls_i = [('class',params['class']-1)]
421 if params['subclass'] is not None: cls_i.append(('subclass',params['subclass']-1))
422 if params['subject'] is not None: cls_i.append(('subject',params['subject']-1))
423 cls_i.sort(key=lambda x: x[1], reverse=True) # highest row index first, so each pop leaves lower indices valid
424 for v in cls_i: cls[v[0]] = data.pop(v[1])[1:]
425 if not params['subclass'] > 0: cls['subclass'] = [str(cl)+"_subcl" for cl in cls['class']]
426
427 cls['subclass'] = rename_same_subcl(cls['class'],cls['subclass'])
428 # if 'subclass' in cls.keys(): cls = group_small_subclasses(cls,params['subcl_min_card'])
429 class_sl,subclass_sl,class_hierarchy = get_class_slices(zip(*cls.values()))
430
431 feats = dict([(d[0],d[1:]) for d in data])
432
433 feats = add_missing_levels(feats)
434
435 feats = numerical_values(feats,params['norm_v'])
436 out = {}
437 out['feats'] = feats
438 out['norm'] = params['norm_v']
439 out['cls'] = cls
440 out['class_sl'] = class_sl
441 out['subclass_sl'] = subclass_sl
442 out['class_hierarchy'] = class_hierarchy
443
444 if params['output_table']:
445 with open( params['output_table'], "w") as outf:
446 if 'class' in cls: outf.write( "\t".join(list(["class"])+list(cls['class'])) + "\n" )
447 if 'subclass' in cls: outf.write( "\t".join(list(["subclass"])+list(cls['subclass'])) + "\n" )
448 if 'subject' in cls: outf.write( "\t".join(list(["subject"])+list(cls['subject'])) + "\n" )
449 for k,v in out['feats'].items(): outf.write( "\t".join([k]+[str(vv) for vv in v]) + "\n" )
450
451 with open(params['output_file'], 'wb') as back_file:
452 pickle.dump(out,back_file)
453
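
Note on source lines 420-431: the main block removes the class, subclass and subject rows from `data` in descending index order, then turns the remaining rows into the `feats` dictionary. The sketch below illustrates that idea in isolation. It is not part of format_input.py: the helper name `split_metadata_rows`, its arguments and the toy table are assumptions made for this example only.

```python
def split_metadata_rows(rows, meta_index):
    """Separate metadata rows from feature rows (hypothetical helper).

    rows: list of rows, each shaped [name, value_1, value_2, ...]
    meta_index: dict mapping a metadata name to its 0-based row index
    """
    remaining = [list(r) for r in rows]  # work on a copy of the table
    meta = {}
    # Pop from the highest index down so that an earlier pop does not
    # shift the position of a row that still has to be removed.
    ordered = sorted(meta_index.items(), key=lambda kv: kv[1], reverse=True)
    for name, idx in ordered:
        meta[name] = remaining.pop(idx)[1:]
    # Whatever is left is a feature row: first cell is the name.
    feats = dict((r[0], r[1:]) for r in remaining)
    return meta, feats


# Toy table with made-up values, not taken from any real dataset.
table = [
    ["class", "healthy", "sick"],
    ["subclass", "a", "b"],
    ["Bacteria", "0.7", "0.4"],
    ["Archaea", "0.3", "0.6"],
]
meta, feats = split_metadata_rows(table, {"class": 0, "subclass": 1})
```

Popping in ascending order instead would remove the wrong row on the second pop, because every row after the first removed one moves up by one position.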