annotate docs/scripts/txt/ExtractFromSDFiles.txt @ 3:90ea638ce878 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:11:59 -0500
parents 2abf0d43254d
children
NAME
    ExtractFromSDFiles.pl - Extract specific data from SDFile(s)

SYNOPSIS
    ExtractFromSDFiles.pl SDFile(s)...

    ExtractFromSDFiles.pl [-h, --help] [-d, --datafields "fieldlabel,..." |
    "fieldlabel,value,criteria..." | "fieldlabel,value,value..."]
    [--datafieldsfile filename] [--indelim comma | tab | semicolon] [-m,
    --mode alldatafields | commondatafields | datafields |
    datafieldsbyvalue | datafieldsbyregex | datafieldbylist |
    datafielduniquebylist | datafieldnotbylist | molnames | randomcmpds |
    recordnum | recordnums | recordrange | 2dcmpdrecords | 3dcmpdrecords]
    [-n, --numofcmpds number] [--outdelim comma | tab | semicolon]
    [--output SD | text | both] [-o, --overwrite] [-q, --quote yes | no]
    [--record recnum | recnums | startrecnum,endrecnum]
    [--RegexIgnoreCase yes | no] [-r, --root rootname] [-s, --seed number]
    [--StrDataString yes | no] [--StrDataStringDelimiter text]
    [--StrDataStringMode StrOnly | StrAndDataFields]
    [--ValueComparisonMode Numeric | Alphanumeric] [-v, --violations
    number] [-w, --workingdir dirname] SDFile(s)...

DESCRIPTION
    Extract specific data from *SDFile(s)* and generate appropriate SD or
    CSV/TSV text file(s). The structure data from SDFile(s) is not
    transferred to CSV/TSV text file(s) unless --StrDataString is set to
    *yes*. Multiple SDFile names are separated by spaces. The valid file
    extensions are *.sdf* and *.sd*. All other file names are ignored. All
    the SD files in the current directory can be specified either by
    **.sdf* or the current directory name.

OPTIONS
    -h, --help
        Print this help message.

    -d, --datafields *"fieldlabel,..." | "fieldlabel,value,criteria..." |
    "fieldlabel,value,value,..."*
        This value is mode specific. In general, it's a list of comma
        separated data field labels and associated mode specific values.

        For *datafields* mode, input value format is: *fieldlabel,...*.
        Examples:

            Extreg
            Extreg,CompoundName,ID

        For *datafieldsbyvalue* mode, input value format contains these
        triplets: *fieldlabel,value,criteria...*. Possible values for
        criteria: *le, ge or eq*. The value of --ValueComparisonMode
        determines whether values are compared using numerical or string
        comparison operators. The default is to treat data field values as
        numerical values and use numerical comparison operators. Examples:

            MolWt,450,le
            MolWt,450,le,LogP,5,le,SumNumNO,10,le,SumNHOH,5,le

        For *datafieldsbyregex* mode, input value format contains these
        triplets: *fieldlabel,regex,criteria...*. *regex* corresponds to
        any valid regular expression and is used to match the values for
        the specified *fieldlabel*. Possible values for criteria: *eq or
        ne*. For *eq* and *ne* criteria, the data field value is matched
        against the regular expression using =~ and !~ respectively. The
        --RegexIgnoreCase option value determines whether letter case is
        ignored during the regular expression match. Examples:

            Name,ol,eq
            Name,'^pat',ne

        For *datafieldbylist* and *datafielduniquebylist* mode, input value
        format is: *fieldlabel,value1,value2...*. This is equivalent to
        *datafieldsbyvalue* mode with this input value format:
        *fieldlabel,value1,eq,fieldlabel,value2,eq,...*. For
        *datafielduniquebylist* mode, only unique compounds identified by
        first occurrence of *value* associated with *fieldlabel* in
        *SDFile(s)* are kept; any subsequent compounds are simply ignored.

        For *datafieldnotbylist* mode, input value format is:
        *fieldlabel,value1,value2...*. In this mode, the script behaves
        exactly opposite of *datafieldbylist* mode, and only those compounds
        are extracted whose data field values don't match any specified data
        field value.

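The triplet format used by *datafieldsbyvalue* and *datafieldsbyregex* can be pictured with a short Python sketch. This is only an illustration of the input format described above, not the script's own Perl code; the function name parse_triplets is hypothetical.

```python
def parse_triplets(spec, delim=","):
    """Split 'fieldlabel,value,criteria,...' into (label, value, criteria) tuples.

    Hypothetical helper: it illustrates the documented format only.
    """
    parts = [part.strip() for part in spec.split(delim)]
    if len(parts) % 3 != 0:
        raise ValueError("expected fieldlabel,value,criteria triplets")
    return [tuple(parts[i:i + 3]) for i in range(0, len(parts), 3)]


print(parse_triplets("MolWt,450,le,LogP,5,le"))
```

A specification whose length is not a multiple of three, such as "MolWt,450", is rejected here because it cannot be a complete list of triplets.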
    --datafieldsfile *filename*
        Filename which contains various mode specific values. This option
        provides a way to specify mode specific values in a file instead of
        entering them on the command line using -d, --datafields.

        For *datafields* mode, input file lines contain comma delimited
        field labels: *fieldlabel,...*. Example:

            Line 1:MolId
            Line 2:"Extreg",CompoundName,ID

        For *datafieldsbyvalue* mode, input file lines contain these comma
        separated triplets: *fieldlabel,value,criteria*. Possible values
        for criteria: *le, ge or eq*. Examples:

            Line 1:MolWt,450,le

            Line 1:"MolWt",450,le,"LogP",5,le,"SumNumNO",10,le,"SumNHOH",5,le

            Line 1:MolWt,450,le
            Line 2:"LogP",5,le
            Line 3:"SumNumNO",10,le
            Line 4: SumNHOH,5,le

        For *datafieldbylist*, *datafielduniquebylist*, and
        *datafieldnotbylist* mode, input file line format is:

            Line 1:fieldlabel;
            Subsequent lines:value1,value2...

        For *datafielduniquebylist* mode, only unique compounds identified
        by first occurrence of *value* associated with *fieldlabel* in
        *SDFile(s)* are kept; any subsequent compounds are simply ignored.
        Example:

            Line 1: MolID
            Subsequent Lines:
            907508
            832291,4642
            "1254","907303"

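As a rough illustration of the list file layout for the *bylist* modes (field label on the first line, values on the following lines, optionally quoted), here is a Python sketch. It is not how the script reads the file; parse_list_file is a hypothetical name, and comma is assumed as the delimiter.

```python
import csv
import io


def parse_list_file(text):
    """Return (fieldlabel, values) from list-mode file content.

    Assumes comma-delimited lines; quoted values are unquoted by csv.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    label = rows[0][0].strip()
    values = [value.strip() for row in rows[1:] for value in row]
    return label, values


sample = 'MolID\n907508\n832291,4642\n"1254","907303"\n'
print(parse_list_file(sample))
```

Values may be spread over any number of lines; the sketch flattens them into a single list in file order.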
    --indelim *comma | tab | semicolon*
        Delimiter used to specify text values for -d, --datafields and
        --datafieldsfile options. Possible values: *comma, tab, or
        semicolon*. Default value: *comma*.

    -m, --mode *alldatafields | commondatafields | datafields |
    datafieldsbyvalue | datafieldsbyregex | datafieldbylist |
    datafielduniquebylist | datafieldnotbylist | molnames | randomcmpds |
    recordnum | recordnums | recordrange | 2dcmpdrecords | 3dcmpdrecords*
        Specify what to extract from *SDFile(s)*. Possible values:
        *alldatafields, commondatafields, datafields, datafieldsbyvalue,
        datafieldsbyregex, datafieldbylist, datafielduniquebylist,
        datafieldnotbylist, molnames, randomcmpds, recordnum, recordnums,
        recordrange, 2dcmpdrecords, 3dcmpdrecords*. Default value:
        *alldatafields*.

        For *alldatafields* and *molnames* mode, only a CSV/TSV text file is
        generated; for all other modes, an SD file is generated by default.
        Use the --output option to generate text file(s) instead.

        For *3DCmpdRecords* mode, only those compounds with at least one
        non-zero value for Z atomic coordinates are retrieved; however,
        during retrieval of compounds in *2DCmpdRecords* mode, all Z atomic
        coordinates must be zero.

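The 2D/3D rule stated above reduces to a single test on the Z coordinates. A minimal Python sketch, assuming the atom coordinates have already been read from the record as (x, y, z) tuples; the function name is hypothetical:

```python
def classify_dimensionality(atom_coords):
    """Return '3D' if any Z coordinate is non-zero, otherwise '2D'."""
    return "3D" if any(z != 0.0 for _, _, z in atom_coords) else "2D"


flat = [(0.0, 0.0, 0.0), (1.2, 0.7, 0.0)]
folded = [(0.0, 0.0, 0.0), (1.2, 0.7, -0.4)]
print(classify_dimensionality(flat), classify_dimensionality(folded))
```

Under this rule every record falls into exactly one of the two modes, so *2dcmpdrecords* and *3dcmpdrecords* together partition the input.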
    -n, --numofcmpds *number*
        Number of compounds to extract during *randomcmpds* mode.

    --outdelim *comma | tab | semicolon*
        Delimiter for output CSV/TSV text file(s). Possible values: *comma,
        tab, or semicolon*. Default value: *comma*.

    --output *SD | text | both*
        Type of output files to generate. Possible values: *SD, text, or
        both*. Default value: *SD*. For *alldatafields* and *molnames* mode,
        this option is ignored and only a CSV/TSV text file is generated.

    -o, --overwrite
        Overwrite existing files.

    -q, --quote *yes | no*
        Put quotes around column values in output CSV/TSV text file(s).
        Possible values: *yes or no*. Default value: *yes*.

    --record *recnum | recnums | startrecnum,endrecnum*
        Record number, record numbers or range of records to extract during
        *recordnum*, *recordnums* and *recordrange* mode. Input value format
        is: <num>, <num1,num2,...> and <startnum,endnum> for *recordnum*,
        *recordnums* and *recordrange* modes respectively. Default value:
        none.

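The three --record formats can be compared side by side in a small Python sketch. It assumes that *recordrange* includes both end points, which the text above does not state explicitly; records_to_extract is a hypothetical helper, not part of the script.

```python
def records_to_extract(mode, spec):
    """Expand a --record value into a list of record numbers.

    Assumption: recordrange is inclusive of startnum and endnum.
    """
    nums = [int(n) for n in spec.split(",")]
    if mode == "recordnum":
        if len(nums) != 1:
            raise ValueError("recordnum expects a single number")
        return nums
    if mode == "recordnums":
        return nums
    if mode == "recordrange":
        if len(nums) != 2 or nums[0] > nums[1]:
            raise ValueError("recordrange expects startnum,endnum")
        return list(range(nums[0], nums[1] + 1))
    raise ValueError("unsupported mode: " + mode)


print(records_to_extract("recordnums", "10,20,30"))
print(records_to_extract("recordrange", "10,20"))
```

Note that the same value, for example 10,20, means two records in *recordnums* mode and eleven records in *recordrange* mode under the inclusive assumption.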
    --RegexIgnoreCase *yes | no*
        Specify whether to ignore letter case during regular expression
        matching in *datafieldsbyregex* mode. Possible values: *yes or no*.
        Default value: *yes*.

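The effect of this option on *eq* and *ne* criteria can be sketched in Python. The script itself uses Perl's =~ and !~ operators; Python's re module is used here only as a stand-in, and regex_criterion_met is a hypothetical name.

```python
import re


def regex_criterion_met(value, pattern, criterion, ignore_case=True):
    """Evaluate an eq/ne regex criterion against a data field value."""
    flags = re.IGNORECASE if ignore_case else 0
    matched = re.search(pattern, value, flags) is not None
    if criterion == "eq":
        return matched
    if criterion == "ne":
        return not matched
    raise ValueError("criterion must be eq or ne")


print(regex_criterion_met("Ethanol", "OL", "eq"))
print(regex_criterion_met("Ethanol", "OL", "eq", ignore_case=False))
```

With the default of ignoring case, "OL" matches "Ethanol"; with case respected it does not, so the same criterion selects a different set of compounds.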
    -r, --root *rootname*
        New file name is generated using the root: <Root>.<Ext>. Default for
        new file names: <SDFileName><mode>.<Ext>. The file type determines
        <Ext> value. The sdf, csv, and tsv <Ext> values are used for SD,
        comma/semicolon, and tab delimited text files respectively. This
        option is ignored for multiple input files.

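The naming rules above can be summarized in a Python sketch. The exact text the script substitutes for <mode> in default names is not given here, so the sketch simply appends the mode string as passed in; that choice, and the function name output_name, are assumptions.

```python
import os


def output_name(sd_file, mode, output="SD", outdelim="comma", root=None,
                num_inputs=1):
    """Build an output file name following the documented pattern."""
    if output == "SD":
        ext = "sdf"
    else:
        # comma and semicolon delimited text both use csv; tab uses tsv
        ext = "tsv" if outdelim == "tab" else "csv"
    if root and num_inputs == 1:
        base = root  # --root only applies to a single input file
    else:
        # Assumption: <mode> is appended verbatim to the input file's base name
        base = os.path.splitext(os.path.basename(sd_file))[0] + mode
    return base + "." + ext


print(output_name("Sample.sdf", "commondatafields", output="text",
                  outdelim="tab", root="NewSample"))
```

When more than one input file is given, the sketch falls back to the default pattern for each file, mirroring the statement that --root is ignored in that case.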
    -s, --seed *number*
        Random number seed used for *randomcmpds* mode. Default: 123456789.

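A fixed seed makes a random selection repeatable, which is the point of this option. The Python sketch below shows the idea only: Python's random number generator differs from Perl's, so it will not pick the same records as the script, and pick_random_records is a hypothetical name.

```python
import random


def pick_random_records(total_records, num_cmpds, seed=123456789):
    """Pick num_cmpds distinct record numbers (1-based) reproducibly."""
    rng = random.Random(seed)
    count = min(num_cmpds, total_records)
    return sorted(rng.sample(range(1, total_records + 1), count))


first = pick_random_records(100, 10)
second = pick_random_records(100, 10)
print(first == second)  # same seed, same selection
```

Changing the seed value is the way to draw a different random subset from the same file.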
    --StrDataString *yes | no*
        Specify whether to write out structure data string to CSV/TSV text
        file(s). Possible values: *yes or no*. Default value: *no*.

        The value of --StrDataStringDelimiter option is used as a delimiter
        to join structure data lines into a structure data string.

        This option is ignored during generation of SD file(s).

    --StrDataStringDelimiter *text*
        Delimiter for joining multiple structure data lines into a string
        before writing to CSV/TSV text file(s). Possible values: *any
        alphanumeric text*. Default value: *|*.

        This option is ignored during generation of SD file(s).

    --StrDataStringMode *StrOnly | StrAndDataFields*
        Specify whether to include SD data fields and values along with the
        structure data into structure data string before writing it out to
        CSV/TSV text file(s). Possible values: *StrOnly or
        StrAndDataFields*. Default value: *StrOnly*.

        The value of --StrDataStringDelimiter option is used as a delimiter
        to join structure data lines into a structure data string.

        This option is ignored during generation of SD file(s).

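How the three --StrDataString options combine can be sketched in Python. The sketch assumes the structure block ends at the "M  END" line of the record, as in the SD file format, and that the record terminator is not part of the string; the script may differ in such details, and structure_data_string is a hypothetical name.

```python
def structure_data_string(record_lines, mode="StrOnly", delimiter="|"):
    """Join the lines of one SD record into a single delimited string."""
    if mode == "StrOnly":
        # Assumption: keep everything up to and including "M  END"
        lines = record_lines[:record_lines.index("M  END") + 1]
    elif mode == "StrAndDataFields":
        lines = record_lines
    else:
        raise ValueError("mode must be StrOnly or StrAndDataFields")
    return delimiter.join(lines)


record = [
    "Water", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0",
    "M  END",
    ">  <MolID>", "42", "",
]
print(structure_data_string(record))
print(structure_data_string(record, mode="StrAndDataFields"))
```

The delimiter should be a string that does not occur in the record lines, otherwise the original lines cannot be recovered by splitting.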
    --ValueComparisonMode *Numeric | Alphanumeric*
        Specify how to compare data field values during *datafieldsbyvalue*
        mode: Compare values using either numeric or string (eq, le, ge)
        comparison operators. Possible values: *Numeric or Alphanumeric*.
        Default value: *Numeric*.

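The difference between the two comparison modes shows up as soon as values have different numbers of digits. A Python sketch of the idea follows; the script uses Perl's comparison operators, and criterion_met is a hypothetical name.

```python
def criterion_met(field_value, target, criterion, mode="Numeric"):
    """Compare a data field value to a target using le, ge or eq."""
    if mode == "Numeric":
        left, right = float(field_value), float(target)
    elif mode == "Alphanumeric":
        left, right = str(field_value), str(target)
    else:
        raise ValueError("mode must be Numeric or Alphanumeric")
    if criterion == "le":
        return left <= right
    if criterion == "ge":
        return left >= right
    if criterion == "eq":
        return left == right
    raise ValueError("criterion must be le, ge or eq")


print(criterion_met("95", "450", "le"))                       # 95 <= 450
print(criterion_met("95", "450", "le", mode="Alphanumeric"))  # "95" vs "450"
```

Compared as numbers, 95 is below 450; compared as strings, "95" sorts after "450" because "9" follows "4", so the criterion fails.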
    -v, --violations *number*
        Number of criterion violations allowed for values specified during
        *datafieldsbyvalue* and *datafieldsbyregex* mode. Default value:
        *0*.

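A compound is kept when the number of criteria it fails does not exceed the allowed violations. A minimal Python sketch of that counting, assuming numeric comparison and that every listed field is present in the record; the names are hypothetical:

```python
OPERATORS = {
    "le": lambda a, b: a <= b,
    "ge": lambda a, b: a >= b,
    "eq": lambda a, b: a == b,
}


def count_violations(record, triplets):
    """Count criteria that the record's field values fail (numeric mode)."""
    return sum(
        1
        for label, target, criterion in triplets
        if not OPERATORS[criterion](float(record[label]), float(target))
    )


def keep_compound(record, triplets, max_violations=0):
    return count_violations(record, triplets) <= max_violations


criteria = [("MolWt", "450", "le"), ("LogP", "5", "le"),
            ("SumNumNO", "10", "le"), ("SumNHOH", "5", "le")]
record = {"MolWt": "480.5", "LogP": "4.2", "SumNumNO": "8", "SumNHOH": "3"}
print(keep_compound(record, criteria))     # False: one violation, none allowed
print(keep_compound(record, criteria, 1))  # True: one violation allowed
```

With the default of 0, all criteria must be met; raising the value relaxes the filter one criterion at a time.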
    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To retrieve all data fields from SD files and generate CSV text files,
    type:

        % ExtractFromSDFiles.pl -o Sample.sdf
        % ExtractFromSDFiles.pl -o *.sdf

    To retrieve all data fields from an SD file and generate CSV text files
    containing a column with structure data as a string with | as line
    delimiter, type:

        % ExtractFromSDFiles.pl --StrDataString Yes -o Sample.sdf

    To retrieve the Mol_ID data field from an SD file and generate CSV text
    files containing a column with structure data along with all data fields
    as a string with | as line delimiter, type:

        % ExtractFromSDFiles.pl -m datafields -d "Mol_ID" --StrDataString Yes
          --StrDataStringMode StrAndDataFields --StrDataStringDelimiter "|"
          --output text -o Sample.sdf

    To retrieve common data fields which exist for all the compounds in an
    SD file and generate a TSV text file NewSample.tsv, type:

        % ExtractFromSDFiles.pl -m commondatafields --outdelim tab -r NewSample
          --output Text -o Sample.sdf

    To retrieve Mol_ID, MolWeight, and CompoundName data fields from an SD
    file and generate a CSV text file NewSample.csv, type:

        % ExtractFromSDFiles.pl -m datafields -d "Mol_ID,MolWeight,CompoundName"
          -r NewSample --output Text -o Sample.sdf

    To retrieve compounds which meet a specific set of criteria - MolWt <=
    450, LogP <= 5 and SumNO <= 10 - from an SD file and generate a new SD
    file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m datafieldsbyvalue
          -d "MolWt,450,le,LogP,5,le,SumNO,10,le" -r NewSample -o Sample.sdf

    To retrieve compounds from an SD file with a specific set of values for
    Mol_ID and generate a new SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m datafieldbylist -d "Mol_ID,159,4509,4619"
          -r NewSample -o Sample.sdf

    To retrieve compounds from an SD file with values for Mol_ID not on a
    list of specified values and generate a new SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m datafieldnotbylist -d "Mol_ID,159,4509,4619"
          -r NewSample -o Sample.sdf

    To retrieve 10 random compounds from an SD file and generate a new SD
    file RandomSample.sdf, type:

        % ExtractFromSDFiles.pl -m randomcmpds -n 10 -r RandomSample
          -o Sample.sdf

    To retrieve compound record number 10 from an SD file and generate a new
    SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m recordnum --record 10 -r NewSample
          -o Sample.sdf

    To retrieve compound record numbers 10, 20 and 30 from an SD file and
    generate a new SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m recordnums --record 10,20,30 -r NewSample
          -o Sample.sdf

    To retrieve compound records 10 through 20 from an SD file and generate
    a new SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m recordrange --record 10,20 -r NewSample
          -o Sample.sdf

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    FilterSDFiles.pl, InfoSDFiles.pl, SplitSDFiles.pl,
    MergeTextFilesWithSD.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.