annotate docs/scripts/txt/ExtractFromSDFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
parents
children
NAME
    ExtractFromSDFiles.pl - Extract specific data from SDFile(s)

SYNOPSIS
    ExtractFromSDFiles.pl SDFile(s)...

    ExtractFromSDFiles.pl [-h, --help] [-d, --datafields "fieldlabel,..." |
    "fieldlabel,value,criteria..." | "fieldlabel,value,value..."]
    [--datafieldsfile filename] [--indelim comma | tab | semicolon] [-m,
    --mode alldatafields | commondatafields | datafieldnotbylist |
    datafields | datafieldsbyvalue | datafieldsbyregex | datafieldbylist |
    datafielduniquebylist | molnames | randomcmpds | recordnum | recordnums
    | recordrange | 2dcmpdrecords | 3dcmpdrecords] [-n, --numofcmpds
    number] [--outdelim comma | tab | semicolon] [--output SD | text | both]
    [-o, --overwrite] [-q, --quote yes | no] [--record recnum | recnums |
    startrecnum,endrecnum] [--RegexIgnoreCase yes | no] [-r, --root
    rootname] [-s, --seed number] [--StrDataString yes | no]
    [--StrDataStringDelimiter text] [--StrDataStringMode StrOnly |
    StrAndDataFields] [--ValueComparisonMode Numeric | Alphanumeric] [-v,
    --violations number] [-w, --workingdir dirname] SDFile(s)...

DESCRIPTION
    Extract specific data from *SDFile(s)* and generate appropriate SD or
    CSV/TSV text file(s). By default, the structure data from SDFile(s) is
    not transferred to CSV/TSV text file(s). Multiple SDFile names are
    separated by spaces. The valid file extensions are *.sdf* and *.sd*. All
    other file names are ignored. All the SD files in the current directory
    can be specified either by **.sdf* or the current directory name.

OPTIONS
    -h, --help
        Print this help message.

    -d, --datafields *"fieldlabel,..." | "fieldlabel,value,criteria..." |
    "fieldlabel,value,value,..."*
        This value is mode specific. In general, it's a list of comma
        separated data field labels and associated mode specific values.

        For *datafields* mode, input value format is: *fieldlabel,...*.
        Examples:

            Extreg
            Extreg,CompoundName,ID

        For *datafieldsbyvalue* mode, input value format contains these
        triplets: *fieldlabel,value,criteria...*. Possible values for
        criteria: *le, ge or eq*. The value of --ValueComparisonMode
        indicates whether values are compared using numerical or string
        comparison operators. The default is to treat data field values as
        numerical values and use numerical comparison operators. Examples:

            MolWt,450,le
            MolWt,450,le,LogP,5,le,SumNumNO,10,le,SumNHOH,5,le

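The triplet format above can be illustrated with a short Python sketch. This is not the script's Perl implementation; the helper names `parse_triplets` and `meets` and the sample field values are hypothetical, and the sketch only mirrors the documented *le, ge, eq* criteria and the numeric versus string comparison choice.

```python
import operator

# Criteria named in the text, mapped to Python comparison operators.
CRITERIA = {"le": operator.le, "ge": operator.ge, "eq": operator.eq}

def parse_triplets(spec, delim=","):
    """Split 'label,value,criterion,...' into (label, value, criterion) tuples."""
    parts = [p.strip() for p in spec.split(delim)]
    if len(parts) % 3 != 0:
        raise ValueError("expected fieldlabel,value,criteria triplets")
    triplets = [tuple(parts[i:i + 3]) for i in range(0, len(parts), 3)]
    for _, _, criterion in triplets:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
    return triplets

def meets(field_value, value, criterion, numeric=True):
    """Compare one data field value against one criterion."""
    if numeric:
        return CRITERIA[criterion](float(field_value), float(value))
    return CRITERIA[criterion](field_value, value)  # plain string comparison

# Hypothetical data field values for one compound.
fields = {"MolWt": "320.4", "LogP": "6.1"}
triplets = parse_triplets("MolWt,450,le,LogP,5,le")
results = [meets(fields[label], value, crit) for label, value, crit in triplets]
print(results)
```

The comparison mode matters: under numeric comparison 9 is less than or equal to 10, while as strings "9" sorts after "10".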
        For *datafieldsbyregex* mode, input value format contains these
        triplets: *fieldlabel,regex,criteria...*. *regex* corresponds to
        any valid regular expression and is used to match the values for
        the specified *fieldlabel*. Possible values for criteria: *eq or
        ne*. For *eq* and *ne* criteria, the data field value is matched
        against the regular expression using =~ and !~ respectively. The
        --RegexIgnoreCase option value determines whether letter case is
        ignored during the regular expression match. Examples:

            Name,ol,eq
            Name,'^pat',ne

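A rough Python analogue of the *eq* and *ne* regex criteria is sketched below. The script itself uses Perl's =~ and !~ operators, and Perl and Python regular expression syntax differ in places, so treat this only as an illustration; `regex_meets` and the compound names are made up for the example.

```python
import re

def regex_meets(field_value, pattern, criterion, ignore_case=True):
    """'eq' passes when the pattern matches (like =~); 'ne' when it does not (like !~)."""
    flags = re.IGNORECASE if ignore_case else 0
    matched = re.search(pattern, field_value, flags) is not None
    if criterion == "eq":
        return matched
    if criterion == "ne":
        return not matched
    raise ValueError(f"unknown criterion: {criterion}")

# Hypothetical values of a Name data field.
names = ["Ethanol", "Patchouli oil", "Benzene"]
with_ol = [n for n in names if regex_meets(n, "ol", "eq")]
not_pat = [n for n in names if regex_meets(n, "^pat", "ne")]
print(with_ol, not_pat)
```

With case ignored, "Patchouli oil" matches `^pat` and is dropped by the *ne* criterion; with `ignore_case=False` it would be kept.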
        For *datafieldbylist* and *datafielduniquebylist* mode, input value
        format is: *fieldlabel,value1,value2...*. This is equivalent to
        *datafieldsbyvalue* mode with this input value format:
        *fieldlabel,value1,eq,fieldlabel,value2,eq,...*. For
        *datafielduniquebylist* mode, only unique compounds identified by
        the first occurrence of *value* associated with *fieldlabel* in
        *SDFile(s)* are kept; any subsequent compounds are simply ignored.

        For *datafieldnotbylist* mode, input value format is:
        *fieldlabel,value1,value2...*. In this mode, the script behaves
        exactly opposite of *datafieldbylist* mode, and only those compounds
        are extracted whose data field values don't match any specified data
        field value.

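The three list modes can be compared side by side in a small Python sketch. It assumes each compound is represented as a dict of data field labels to string values, which is a simplification for illustration; `select_by_list` is a hypothetical helper, and how the real script treats compounds that lack the field is not stated in the text.

```python
def select_by_list(records, label, values, mode="datafieldbylist"):
    """Filter records (dicts of data field label -> value) by a value list."""
    wanted = set(values)
    seen = set()
    selected = []
    for record in records:
        value = record.get(label)  # None when the field is missing (assumption)
        listed = value in wanted
        if mode == "datafieldnotbylist":
            if not listed:
                selected.append(record)
        elif listed:
            if mode == "datafielduniquebylist":
                if value in seen:
                    continue  # keep only the first occurrence of each value
                seen.add(value)
            selected.append(record)
    return selected

records = [{"Mol_ID": "159"}, {"Mol_ID": "4509"}, {"Mol_ID": "159"}, {"Mol_ID": "77"}]
values = ["159", "4509", "4619"]
by_list = select_by_list(records, "Mol_ID", values)
unique = select_by_list(records, "Mol_ID", values, mode="datafielduniquebylist")
not_by_list = select_by_list(records, "Mol_ID", values, mode="datafieldnotbylist")
print(len(by_list), len(unique), len(not_by_list))
```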
    --datafieldsfile *filename*
        Filename which contains various mode specific values. This option
        provides a way to specify mode specific values in a file instead of
        entering them on the command line using -d, --datafields.

        For *datafields* mode, input file lines contain comma delimited
        field labels: *fieldlabel,...*. Example:

            Line 1:MolId
            Line 2:"Extreg",CompoundName,ID

        For *datafieldsbyvalue* mode, input file lines contain these comma
        separated triplets: *fieldlabel,value,criteria*. Possible values
        for criteria: *le, ge or eq*. Examples:

            Line 1:MolWt,450,le

            Line 1:"MolWt",450,le,"LogP",5,le,"SumNumNO",10,le,"SumNHOH",5,le

            Line 1:MolWt,450,le
            Line 2:"LogP",5,le
            Line 3:"SumNumNO",10,le
            Line 4: SumNHOH,5,le

        For *datafieldbylist*, *datafielduniquebylist*, and
        *datafieldnotbylist* mode, input file line format is:

            Line 1:fieldlabel;
            Subsequent lines:value1,value2...

        For *datafielduniquebylist* mode, only unique compounds identified
        by the first occurrence of *value* associated with *fieldlabel* in
        *SDFile(s)* are kept; any subsequent compounds are simply ignored.
        Example:

            Line 1: MolID
            Subsequent Lines:
            907508
            832291,4642
            "1254","907303"

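As an illustration of the list-mode file layout, the following Python sketch reads a field label from the first line and collects values from the remaining lines. It assumes the "Line 1:" and "Subsequent lines:" prefixes in the examples are annotations rather than file content, and that quoted values follow ordinary CSV quoting; `read_list_file` is a hypothetical name, not part of the script.

```python
import csv
import io

def read_list_file(text, delim=","):
    """Return (fieldlabel, values) from list-mode data fields file content."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delim) if row]
    if not rows:
        raise ValueError("empty data fields file")
    label = rows[0][0].strip()
    # Flatten all later lines into one list of values, dropping empty cells.
    values = [cell.strip() for row in rows[1:] for cell in row if cell.strip()]
    return label, values

content = 'MolID\n907508\n832291,4642\n"1254","907303"\n'
label, values = read_list_file(content)
print(label, values)
```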
    --indelim *comma | tab | semicolon*
        Delimiter used to specify text values for -d, --datafields and
        --datafieldsfile options. Possible values: *comma, tab, or
        semicolon*. Default value: *comma*.

    -m, --mode *alldatafields | commondatafields | datafields |
    datafieldsbyvalue | datafieldsbyregex | datafieldbylist |
    datafielduniquebylist | datafieldnotbylist | molnames | randomcmpds |
    recordnum | recordnums | recordrange | 2dcmpdrecords | 3dcmpdrecords*
        Specify what to extract from *SDFile(s)*. Possible values:
        *alldatafields, commondatafields, datafields, datafieldsbyvalue,
        datafieldsbyregex, datafieldbylist, datafielduniquebylist,
        datafieldnotbylist, molnames, randomcmpds, recordnum, recordnums,
        recordrange, 2dcmpdrecords, 3dcmpdrecords*. Default value:
        *alldatafields*.

        For *alldatafields* and *molnames* mode, only a CSV/TSV text file is
        generated; for all other modes, an SD file is generated by default.
        You can use the *--output* option to generate a text file instead.

        For *3DCmpdRecords* mode, only those compounds with at least one
        non-zero value for Z atomic coordinates are retrieved; in
        *2DCmpdRecords* mode, only those compounds whose Z atomic
        coordinates are all zero are retrieved.

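The 2D versus 3D distinction can be sketched in Python by reading the Z column of a V2000 molfile atom block (counts line on the fourth line, coordinates in three 10-character columns). This is only an illustration of the rule stated above; the helper names are hypothetical, `make_molblock` exists purely to build demo input, and how the script handles other connection table versions is not covered here.

```python
def z_coordinates(molblock):
    """Read Z coordinates from the atom block of a V2000 molfile."""
    lines = molblock.splitlines()
    atom_count = int(lines[3][0:3])          # counts line: first 3 columns
    atom_lines = lines[4:4 + atom_count]
    return [float(line[20:30]) for line in atom_lines]

def is_3d(molblock):
    """At least one non-zero Z coordinate."""
    return any(z != 0.0 for z in z_coordinates(molblock))

def is_2d(molblock):
    """All Z coordinates are zero."""
    return all(z == 0.0 for z in z_coordinates(molblock))

def make_molblock(atoms):
    """Build a minimal V2000 molblock for the demo; atoms are (symbol, x, y, z)."""
    header = ["demo", "", ""]
    counts = f"{len(atoms):3d}{0:3d}  0  0  0  0  0  0  0  0999 V2000"
    atom_lines = [f"{x:10.4f}{y:10.4f}{z:10.4f} {sym:<3} 0  0" for sym, x, y, z in atoms]
    return "\n".join(header + [counts] + atom_lines + ["M  END"])

flat = make_molblock([("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.0, 0.0)])
bent = make_molblock([("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.0, 0.5)])
print(is_2d(flat), is_3d(bent))
```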
    -n, --numofcmpds *number*
        Number of compounds to extract during *randomcmpds* mode.

    --outdelim *comma | tab | semicolon*
        Delimiter for output CSV/TSV text file(s). Possible values: *comma,
        tab, or semicolon*. Default value: *comma*.

    --output *SD | text | both*
        Type of output files to generate. Possible values: *SD, text, or
        both*. Default value: *SD*. For *alldatafields* and *molnames* mode,
        this option is ignored and only a CSV/TSV text file is generated.

    -o, --overwrite
        Overwrite existing files.

    -q, --quote *yes | no*
        Put quotes around column values in output CSV/TSV text file(s).
        Possible values: *yes or no*. Default value: *yes*.

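The combined effect of --outdelim and --quote on a text row can be approximated with Python's csv module. This is a sketch of the documented behavior, not the script's own writer; `format_rows` and the sample row are illustrative, and with quoting off the csv module still quotes values that contain the delimiter, which may differ from what the script does.

```python
import csv
import io

DELIMITERS = {"comma": ",", "tab": "\t", "semicolon": ";"}

def format_rows(rows, outdelim="comma", quote=True):
    """Render rows as delimited text, quoting every value when quote is True."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=DELIMITERS[outdelim],
        quoting=csv.QUOTE_ALL if quote else csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()

rows = [["Mol_ID", "MolWt"], ["159", "320.4"]]
quoted_csv = format_rows(rows)
plain_tsv = format_rows(rows, outdelim="tab", quote=False)
print(quoted_csv)
print(plain_tsv)
```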
    --record *recnum | recnums | startrecnum,endrecnum*
        Record number, record numbers or range of records to extract during
        *recordnum*, *recordnums* and *recordrange* mode. Input value format
        is: <num>, <num1,num2,...> and <startnum,endnum> for *recordnum*,
        *recordnums* and *recordrange* modes respectively. Default value:
        none.

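How one --record value maps to a set of records depends on the mode, which the Python sketch below tries to make concrete. It assumes record numbers are 1-based and that the end of a range is inclusive; neither is stated explicitly in the text, and `parse_record_spec` is a hypothetical helper.

```python
def parse_record_spec(mode, spec):
    """Return the list of record numbers selected by a --record value."""
    numbers = [int(part) for part in spec.split(",")]
    if any(n < 1 for n in numbers):
        raise ValueError("record numbers are assumed to start at 1")
    if mode == "recordnum":
        if len(numbers) != 1:
            raise ValueError("recordnum takes a single record number")
        return numbers
    if mode == "recordnums":
        return sorted(set(numbers))
    if mode == "recordrange":
        if len(numbers) != 2 or numbers[0] > numbers[1]:
            raise ValueError("recordrange takes startrecnum,endrecnum")
        return list(range(numbers[0], numbers[1] + 1))  # end assumed inclusive
    raise ValueError(f"not a record mode: {mode}")

print(parse_record_spec("recordnums", "10,20,30"))
print(parse_record_spec("recordrange", "10,20"))
```

The same value "10,20" therefore selects two records in *recordnums* mode but eleven in *recordrange* mode under these assumptions.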
    --RegexIgnoreCase *yes | no*
        Specify whether to ignore case during *datafieldsbyregex* value of
        -m, --mode option. Possible values: *yes or no*. Default value:
        *yes*.

    -r, --root *rootname*
        New file name is generated using the root: <Root>.<Ext>. Default for
        new file names: <SDFileName><mode>.<Ext>. The file type determines
        <Ext> value. The sdf, csv, and tsv <Ext> values are used for SD,
        comma/semicolon, and tab delimited text files respectively. This
        option is ignored for multiple input files.

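The naming rule can be written out as a small Python function. It is a sketch under assumptions: `output_name` is a hypothetical helper, and the default name simply appends the mode string as typed, whereas the exact spelling the script uses for the <mode> part is not given in the text.

```python
import os

def output_name(sd_file, mode, file_type, outdelim="comma", root=None):
    """Build <Root>.<Ext>, or <SDFileName><mode>.<Ext> when no root is given."""
    if file_type == "SD":
        ext = "sdf"
    else:
        # Text output: tab delimited files get tsv, comma/semicolon get csv.
        ext = "tsv" if outdelim == "tab" else "csv"
    if root is None:
        base = os.path.splitext(os.path.basename(sd_file))[0]
        root = f"{base}{mode}"  # mode appended verbatim (assumption)
    return f"{root}.{ext}"

print(output_name("Sample.sdf", "randomcmpds", "SD", root="RandomSample"))
print(output_name("Sample.sdf", "commondatafields", "text", outdelim="tab", root="NewSample"))
```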
    -s, --seed *number*
        Random number seed used for *randomcmpds* mode. Default: *123456789*.

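A fixed seed makes a random selection repeatable, which the Python sketch below demonstrates. It will not reproduce the records the Perl script picks, since the two languages use different random number generators; `pick_random_records` is a hypothetical helper and records are assumed to be numbered from 1.

```python
import random

def pick_random_records(record_count, num_of_cmpds, seed=123456789):
    """Choose num_of_cmpds distinct record numbers reproducibly from a seed."""
    rng = random.Random(seed)  # private generator, so the seed fully decides the result
    picked = rng.sample(range(1, record_count + 1), num_of_cmpds)
    return sorted(picked)

first = pick_random_records(100, 10)
second = pick_random_records(100, 10)
print(first == second)
```

Running the selection twice with the same seed gives the same records; changing the seed gives a different, equally repeatable selection.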
    --StrDataString *yes | no*
        Specify whether to write out structure data string to CSV/TSV text
        file(s). Possible values: *yes or no*. Default value: *no*.

        The value of --StrDataStringDelimiter option is used as a delimiter
        to join structure data lines into a structure data string.

        This option is ignored during generation of SD file(s).

    --StrDataStringDelimiter *text*
        Delimiter for joining multiple structure data lines into a string
        before writing to CSV/TSV text file(s). Possible values: *any
        alphanumeric text*. Default value: *|*.

        This option is ignored during generation of SD file(s).

    --StrDataStringMode *StrOnly | StrAndDataFields*
        Specify whether to include SD data fields and values along with the
        structure data into structure data string before writing it out to
        CSV/TSV text file(s). Possible values: *StrOnly or
        StrAndDataFields*. Default value: *StrOnly*.

        The value of --StrDataStringDelimiter option is used as a delimiter
        to join structure data lines into a structure data string.

        This option is ignored during generation of SD file(s).

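The two structure data string modes can be sketched in Python as a join over the lines of one SD record. The sketch assumes the structure block ends at the "M  END" line and that the "$$$$" record terminator is left out of the string; both are assumptions, and `structure_data_string` is a hypothetical helper rather than the script's code.

```python
def structure_data_string(record_lines, delimiter="|", mode="StrOnly"):
    """Join the lines of one SD record into a single delimited string."""
    if mode == "StrOnly":
        end = record_lines.index("M  END") + 1  # structure block only
        lines = record_lines[:end]
    elif mode == "StrAndDataFields":
        lines = [line for line in record_lines if line != "$$$$"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return delimiter.join(lines)

# A tiny made-up record: empty structure block plus one data field.
record = [
    "demo", "", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <Mol_ID>", "159", "",
    "$$$$",
]
str_only = structure_data_string(record)
str_and_fields = structure_data_string(record, mode="StrAndDataFields")
print(str_only)
print(str_and_fields)
```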
    --ValueComparisonMode *Numeric | Alphanumeric*
        Specify how to compare data field values during *datafieldsbyvalue*
        mode: Compare values using either numeric or string (eq, le, ge)
        comparison operators. Possible values: *Numeric or Alphanumeric*.
        Default value: *Numeric*.

    -v, --violations *number*
        Number of criterion violations allowed for values specified during
        *datafieldsbyvalue* and *datafieldsbyregex* mode. Default value:
        *0*.

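The violations allowance amounts to counting failed criteria and comparing the count with the permitted number. A minimal Python sketch, assuming each criterion has already been evaluated to a boolean (the helper name `passes` and the sample outcomes are hypothetical):

```python
def passes(criterion_results, violations=0):
    """criterion_results holds one boolean per criterion; True means it is met."""
    failed = sum(1 for met in criterion_results if not met)
    return failed <= violations

# Outcomes of four criteria for one made-up compound: one criterion fails.
outcomes = [True, False, True, True]
strict = passes(outcomes)
lenient = passes(outcomes, violations=1)
print(strict, lenient)
```

With the default of 0, a compound must meet every criterion; allowing one violation lets the compound above through.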
    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To retrieve all data fields from SD files and generate CSV text files,
    type:

        % ExtractFromSDFiles.pl -o Sample.sdf
        % ExtractFromSDFiles.pl -o *.sdf

    To retrieve all data fields from an SD file and generate CSV text files
    containing a column with structure data as a string with | as line
    delimiter, type:

        % ExtractFromSDFiles.pl --StrDataString Yes -o Sample.sdf

    To retrieve the Mol_ID data field from an SD file and generate CSV text
    files containing a column with structure data along with all data
    fields as a string with | as line delimiter, type:

        % ExtractFromSDFiles.pl -m datafields -d "Mol_ID" --StrDataString Yes
          --StrDataStringMode StrAndDataFields --StrDataStringDelimiter "|"
          --output text -o Sample.sdf

    To retrieve common data fields which exist for all the compounds in an
    SD file and generate a TSV text file NewSample.tsv, type:

        % ExtractFromSDFiles.pl -m commondatafields --outdelim tab -r NewSample
          --output Text -o Sample.sdf

    To retrieve Mol_ID, MolWeight, and CompoundName data fields from an SD
    file and generate a CSV text file NewSample.csv, type:

        % ExtractFromSDFiles.pl -m datafields -d "Mol_ID,MolWeight,CompoundName"
          -r NewSample --output Text -o Sample.sdf

    To retrieve compounds which meet a specific set of criteria - MolWt <=
    450, LogP <= 5 and SumNO <= 10 - from an SD file and generate a new SD
    file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m datafieldsbyvalue
          -d "MolWt,450,le,LogP,5,le,SumNO,10,le" -r NewSample -o Sample.sdf

    To retrieve compounds from an SD file with a specific set of values for
    Mol_ID and generate a new SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m datafieldbylist -d "Mol_ID,159,4509,4619"
          -r NewSample -o Sample.sdf

    To retrieve compounds from an SD file with values for Mol_ID not on a
    list of specified values and generate a new SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m datafieldnotbylist -d "Mol_ID,159,4509,4619"
          -r NewSample -o Sample.sdf

    To retrieve 10 random compounds from an SD file and generate a new SD
    file RandomSample.sdf, type:

        % ExtractFromSDFiles.pl -m randomcmpds -n 10 -r RandomSample
          -o Sample.sdf

    To retrieve compound record number 10 from an SD file and generate a new
    SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m recordnum --record 10 -r NewSample
          -o Sample.sdf

    To retrieve compound record numbers 10, 20 and 30 from an SD file and
    generate a new SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m recordnums --record 10,20,30 -r NewSample
          -o Sample.sdf

    To retrieve compound records from 10 to 20 from an SD file and generate
    a new SD file NewSample.sdf, type:

        % ExtractFromSDFiles.pl -m recordrange --record 10,20 -r NewSample
          -o Sample.sdf

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    FilterSDFiles.pl, InfoSDFiles.pl, SplitSDFiles.pl,
    MergeTextFilesWithSD.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.