comparison docs/scripts/txt/ModifySDFilesDataFields.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
-1:000000000000 0:4816e4a8ae95
1 NAME
2 ModifySDFilesDataFields.pl - Modify data fields in SDFile(s)
3
4 SYNOPSIS
5 ModifySDFilesDataFields.pl SDFile(s)...
6
7 ModifySDFilesDataFields.pl [-d, --detail infolevel] [--datafieldscommon
8 newfieldlabel, newfieldvalue, [newfieldlabel, newfieldvalue,...]]
9 [--datafieldsmap newfieldlabel, oldfieldlabel, [oldfieldlabel,...];
10 [newfieldlabel, oldfieldlabel, [oldfieldlabel,...]]]
11 [--datafieldsmapfile filename] [--datafieldURL URLDataFieldLabel,
12 CGIScriptPath, CGIParamName, CmpdIDFieldLabel] [-h, --help] [-k,
13 --keepolddatafields all | unmappedonly | none] [-m, --mode molname |
14 datafields | both] [--molnamemode datafield | labelprefix] [--molname
15 datafieldname or prefixstring] [--molnamereplace always | empty] [-o,
16 --overwrite] [-r, --root rootname] [-w, --workingdir dirname]
17 SDFile(s)...
18
19 DESCRIPTION
20 Modify molname line and data fields in *SDFile(s)*. Molname line can be
21 replaced by a data field value or assigned a sequential ID prefixed with
22 a specific string. For data fields and modification of their values,
23 these types of options are supported: replace data field labels by
24 another set of labels; combine values of multiple data fields and assign
25 a new label; add a specific set of data field labels and values to all
26 compound records; and others.
27
28 The file names are separated by spaces. The valid file extensions are
29 *.sdf* and *.sd*. All other file names are ignored. All the SD files in
30 the current directory can be specified either by **.sdf* or the current
31 directory name.
32
33 OPTIONS
34 -d, --detail *infolevel*
35 Level of information to print about compound records being ignored.
36 Default: *1*. Possible values: *1, 2 or 3*.
37
38 --datafieldscommon *newfieldlabel, newfieldvalue, [newfieldlabel,
39 newfieldvalue,...]*
40 Specify data field labels and values for addition to each compound
41 record. It's a comma delimited list of data field label and value
42 pairs. Default: *none*.
43
44 Examples:
45
46 DepositionDate,YYYY-MM-DD
47 Source,www.domainname.org,ReleaseDate,YYYY-MM-DD
48
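The label and value pairing described for --datafieldscommon can be sketched in Python as follows. This is only an illustration of the documented format, not the script's own Perl code: the function name parse_common_fields is hypothetical, and trimming whitespace and rejecting an odd number of items are assumptions.

```python
def parse_common_fields(spec):
    """Split 'Label1,Value1,Label2,Value2' into {label: value}."""
    # Assumption: whitespace around items is not significant.
    items = [item.strip() for item in spec.split(",")]
    # Assumption: an unpaired label is treated as an error.
    if len(items) % 2 != 0:
        raise ValueError("expected label,value pairs: " + spec)
    # Even positions are labels, odd positions are their values.
    return dict(zip(items[0::2], items[1::2]))


if __name__ == "__main__":
    print(parse_common_fields("DepositionDate,YYYY-MM-DD"))
```

Each resulting pair would be written to every compound record as an additional data field.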
49 --datafieldsmap *newfieldlabel, oldfieldlabel, [oldfieldlabel,...];
50 [newfieldlabel, oldfieldlabel, [oldfieldlabel,...]]*
51 Specify how various data field labels and values are combined to
52 generate new data field labels and their values. All the comma
53 delimited data fields within a semicolon delimited set are mapped
54 to the first new data field label, with the data field values
55 joined by a newline character. Default: *none*.
56
57 Examples:
58
59 Synonym,Name,SystematicName,Synonym;CmpdID,Extreg
60 HBondDonors,SumNHOH
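A minimal Python sketch of the mapping semantics described for --datafieldsmap, under stated assumptions. The function names parse_fields_map and apply_fields_map are hypothetical, records are modeled as plain dictionaries, and skipping missing or empty old fields is an assumption rather than documented behavior of the script.

```python
def parse_fields_map(spec):
    """Parse 'New,Old1,Old2;New2,Old3' into {new_label: [old_labels]}."""
    mapping = {}
    for group in spec.split(";"):
        labels = [label.strip() for label in group.split(",") if label.strip()]
        # Assumption: a set needs a new label plus at least one old label.
        if len(labels) < 2:
            continue
        mapping[labels[0]] = labels[1:]
    return mapping


def apply_fields_map(record, mapping):
    """Build new data fields by joining old field values with newlines."""
    new_fields = {}
    for new_label, old_labels in mapping.items():
        # Assumption: old fields that are absent or empty are skipped.
        values = [record[old] for old in old_labels if record.get(old)]
        if values:
            new_fields[new_label] = "\n".join(values)
    return new_fields


if __name__ == "__main__":
    fields_map = parse_fields_map("Synonym,Name,SystematicName;CmpdID,Extreg")
    compound = {"Name": "aspirin", "SystematicName": "2-acetoxybenzoic acid", "Extreg": "X1"}
    print(apply_fields_map(compound, fields_map))
```

In this sketch the first label of each semicolon delimited set becomes the new field label, and the values of the remaining labels are joined in the order listed.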
61
62 --datafieldsmapfile *filename*
63 Filename containing mapping of data fields. Format of data fields
64 line in this file corresponds to --datafieldsmap option. Example:
65
66 Line 1: Synonym,Name,SystematicName,Synonym;CmpdID,Extreg
67 Line 2: HBondDonors,SumNHOH
68
69 --datafieldURL *URLDataFieldLabel, CGIScriptPath, CGIParamName,
70 CmpdIDFieldLabel*
71 Specify how to generate a URL for retrieving compound data from a
72 web server and add it to each compound record. *URLDataFieldLabel*
73 is used as the data field label for URL value which is created by
74 combining *CGIScriptPath,CGIParamName,CmpdIDFieldLabel* values:
75 CGIScriptPath?CGIParamName=CmpdIDFieldLabelValue. Default: *none*.
76
77 Example:
78
79 Source,http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID
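The URL construction described for --datafieldURL, CGIScriptPath?CGIParamName=CmpdIDFieldLabelValue, can be illustrated with a short Python sketch. The function name build_compound_url is hypothetical, the record is modeled as a dictionary, and the compound ID is inserted without any URL encoding, which is an assumption about how the script behaves.

```python
def build_compound_url(spec, record):
    """Return (URLDataFieldLabel, URL) for one compound record."""
    # The specification holds exactly four comma delimited values.
    label, script_path, param_name, id_label = [p.strip() for p in spec.split(",")]
    # Assumption: the ID value is used as is, with no URL encoding.
    url = f"{script_path}?{param_name}={record[id_label]}"
    return label, url


if __name__ == "__main__":
    url_spec = "Source,http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID"
    print(build_compound_url(url_spec, {"Mol_ID": "Cmpd42"}))
```

The returned label and URL would then be added to the record as one more data field.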
80
81 -h, --help
82 Print this help message.
83
84 -k, --keepolddatafields *all | unmappedonly | none*
85 Specify how to transfer old data fields from input SDFile(s) to new
86 SDFile(s) during *datafields | both* value of -m, --mode option:
87 keep all old data fields; write out the ones not mapped to new
88 fields as specified by --datafieldsmap or --datafieldsmapfile
89 options; or ignore all old data field labels. For *molname* value of
90 -m, --mode option, old data fields are always kept. Possible values: *all |
91 unmappedonly | none*. Default: *none*.
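The three --keepolddatafields choices can be sketched in Python as a simple filter. This is an illustration under assumptions, not the script's implementation: the function name old_fields_to_keep is hypothetical, a record is a dictionary of field labels to values, and the mapping argument is a dictionary from each new label to the list of old labels mapped to it.

```python
def old_fields_to_keep(record, mapping, keep="none"):
    """Select which old data fields are carried over to the new record."""
    if keep == "all":
        return dict(record)
    if keep == "unmappedonly":
        # Collect every old label that was mapped to some new label.
        mapped = {old for olds in mapping.values() for old in olds}
        return {k: v for k, v in record.items() if k not in mapped}
    # keep == "none": drop all old data fields.
    return {}


if __name__ == "__main__":
    compound = {"Name": "aspirin", "MolWt": "180.16"}
    print(old_fields_to_keep(compound, {"Synonym": ["Name"]}, keep="unmappedonly"))
```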
92
93 -m, --mode *molname | datafields | both*
94 Specify how to modify SDFile(s): *molname* - change molname line by
95 another datafield or value; *datafields* - modify data field labels
96 and values by replacing one label by another, combining multiple
97 data field labels and values, adding specific set of data field
98 labels and values to all compounds, or inserting a URL for compound
99 retrieval to each record; *both* - change molname line and
100 datafields simultaneously. Possible values: *molname | datafields |
101 both*. Default: *molname*.
102
103 --molnamemode *datafield | labelprefix*
104 Specify how to change molname line for -m --mode option values of
105 *molname | both*: use a datafield label value or assign a sequential
106 ID prefixed with *labelprefix*. Possible values: *datafield |
107 labelprefix*. Default: *labelprefix*.
108
109 --molname *datafieldname or prefixstring*
110 Molname generation method. For *datafield* value of --molnamemode
111 option, it corresponds to datafield label name whose value is used
112 for molname; otherwise, it's a prefix string used for generating
113 compound IDs like labelprefixstring<Number>. Default value, *Cmpd*,
114 generates compound IDs like Cmpd<Number> for molname.
115
116 --molnamereplace *always | empty*
117 Specify when to replace molname line for -m --mode option values of
118 *molname | both*: always replace the molname line using --molname
119 option or only when it's empty. Possible values: *always | empty*.
120 Default: *empty*.
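How --molnamemode, --molname and --molnamereplace interact can be sketched in Python as below. The function name new_molname and its parameters are hypothetical, and keeping the current molname line when the named data field is missing is an assumption, since the text above does not say what happens in that case.

```python
def new_molname(current, record, number, mode="labelprefix",
                molname="Cmpd", replace="empty"):
    """Return the molname line for one compound record."""
    # With 'empty', a non-blank molname line is left untouched.
    if replace == "empty" and current.strip():
        return current
    if mode == "datafield":
        # Assumption: fall back to the current line if the field is absent.
        return record.get(molname, current)
    # labelprefix mode: prefix string followed by a sequential number.
    return f"{molname}{number}"


if __name__ == "__main__":
    print(new_molname("", {}, 1))
    print(new_molname("old name", {"Mol_ID": "M7"}, 2,
                      mode="datafield", molname="Mol_ID", replace="always"))
```

With the defaults shown, only empty molname lines are replaced, and they receive IDs of the form Cmpd<Number>.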
121
122 -o, --overwrite
123 Overwrite existing files.
124
125 -r, --root *rootname*
126 New SD file name is generated using the root: <Root>.<Ext>. Default
127 new file name: <InitialSDFileName>ModifiedDataFields.<Ext>. This
128 option is ignored for multiple input files.
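The output file naming rule can be sketched in Python as follows. The function name output_name and the n_inputs parameter are hypothetical; the sketch only mirrors the rule stated above and ignores the working directory.

```python
import os


def output_name(input_file, root=None, n_inputs=1):
    """Return the new SD file name for one input file."""
    base, ext = os.path.splitext(os.path.basename(input_file))
    # The root name is honored only when there is a single input file.
    if root and n_inputs == 1:
        return f"{root}{ext}"
    return f"{base}ModifiedDataFields{ext}"


if __name__ == "__main__":
    print(output_name("Sample1.sdf", root="NewSample1"))
    print(output_name("Sample1.sdf"))
```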
129
130 -w, --workingdir *dirname*
131 Location of working directory. Default: current directory.
132
133 EXAMPLES
134 To replace empty molname lines by Cmpd<CmpdNumber> and generate a new SD
135 file NewSample1.sdf, type:
136
137 % ModifySDFilesDataFields.pl -o -r NewSample1 Sample1.sdf
138
139 To replace all molname lines by Mol_ID data field and generate a new SD
140 file NewSample1.sdf, type:
141
142 % ModifySDFilesDataFields.pl --molnamemode datafield
143 --molnamereplace always --molname Mol_ID -r NewSample1 -o Sample1.sdf
144
145 To replace all molname lines by Mol_ID data field, map Name and
146 CompoundName to a new datafield Synonym, and generate a new SD file
147 NewSample1.sdf, type:
148
149 % ModifySDFilesDataFields.pl --molnamemode datafield
150 --molnamereplace always --molname Mol_ID --mode both
151 --datafieldsmap "Synonym,Name,CompoundName" -r
152 NewSample1 -o Sample1.sdf
153
154 To replace all molname lines by Mol_ID data field, map Name and
155 CompoundName to a new datafield Synonym, add common fields ReleaseDate
156 and Source, and generate a new SD file NewSample1.sdf without keeping
157 any old SD data fields, type:
158
159 % ModifySDFilesDataFields.pl --molnamemode datafield
160 --molnamereplace always --molname Mol_ID --mode both
161 --datafieldsmap "Synonym,Name,CompoundName"
162 --datafieldscommon "ReleaseDate,yyyy-mm-dd,Source,
163 www.mayachemtools.org" --keepolddatafields none -r
164 NewSample1 -o Sample1.sdf
165
166 Preparing SD files for PubChem deposition:
167
168 Consider an SD file with these fields: Mol_ID, Name, Synonyms and
169 Systematic_Name, where the Mol_ID data field uniquely identifies your
170 compound.
171
172 To prepare a new SD file CmpdDataForPubChem.sdf containing only required
173 PUBCHEM_EXT_DATASOURCE_REGID field, type:
174
175 % ModifySDFilesDataFields.pl -m datafields
176 --datafieldsmap
177 "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID"
178 -r CmpdDataForPubChem -o Sample1.sdf
179
180 To prepare a new SD file CmpdDataForPubChem.sdf containing only required
181 PUBCHEM_EXT_DATASOURCE_REGID field and replace molname line with Mol_ID,
182 type:
183
184 % ModifySDFilesDataFields.pl --molnamemode datafield
185 --molnamereplace always --molname Mol_ID --mode both
186 --datafieldsmap
187 "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID"
188 -r CmpdDataForPubChem -o Sample1.sdf
189
190 In addition to the required PubChem data field, you can also add optional
191 PubChem data fields.
192
193 To map your Name, Synonyms and Systematic_Name data fields to optional
194 PUBCHEM_SUBSTANCE_SYNONYM data field along with required ID field, type:
195
196 % ModifySDFilesDataFields.pl --molnamemode datafield
197 --molnamereplace always --molname Mol_ID --mode both
198 --datafieldsmap
199 "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
200 PUBCHEM_SUBSTANCE_SYNONYM,Name,Synonyms,Systematic_Name"
201 -r CmpdDataForPubChem -o Sample1.sdf
202
203 To add your <domain.org> as PUBCHEM_EXT_SUBSTANCE_URL and link substance
204 retrieval to your CGI script
205 <http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID> via
206 PUBCHEM_EXT_DATASOURCE_REGID field along with optional and required data
207 fields, type:
208
209 % ModifySDFilesDataFields.pl --molnamemode datafield
210 --molnamereplace always --molname Mol_ID --mode both
211 --datafieldsmap
212 "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
213 PUBCHEM_SUBSTANCE_SYNONYM,Name,Synonyms,Systematic_Name"
214 --datafieldscommon
215 "PUBCHEM_EXT_SUBSTANCE_URL,domain.org"
216 --datafieldURL "PUBCHEM_EXT_DATASOURCE_URL,
217 http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID"
218 -r CmpdDataForPubChem -o Sample1.sdf
219
220 To add a publication date and request a release date using
221 PUBCHEM_PUBLICATION_DATE and PUBCHEM_DEPOSITOR_RECORD_DATE data fields
222 along with all the data fields in earlier examples,
223 type:
224
225 % ModifySDFilesDataFields.pl --molnamemode datafield
226 --molnamereplace always --molname Mol_ID --mode both
227 --datafieldsmap
228 "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
229 PUBCHEM_SUBSTANCE_SYNONYM,Name,Synonyms,Systematic_Name"
230 --datafieldURL "PUBCHEM_EXT_DATASOURCE_URL,
231 http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID"
232 --datafieldscommon
233 "PUBCHEM_EXT_SUBSTANCE_URL,domain.org,
234 PUBCHEM_PUBLICATION_DATE,YYYY-MM-DD,
235 PUBCHEM_DEPOSITOR_RECORD_DATE,YYYY-MM-DD"
236 -r CmpdDataForPubChem -o Sample1.sdf
237
238 AUTHOR
239 Manish Sud <msud@san.rr.com>
240
241 SEE ALSO
242 InfoSDFiles.pl, JoinSDFiles.pl, MergeTextFilesWithSD.pl,
243 SplitSDFiles.pl, SDFilesToHTML.pl
244
245 COPYRIGHT
246 Copyright (C) 2015 Manish Sud. All rights reserved.
247
248 This file is part of MayaChemTools.
249
250 MayaChemTools is free software; you can redistribute it and/or modify it
251 under the terms of the GNU Lesser General Public License as published by
252 the Free Software Foundation; either version 3 of the License, or (at
253 your option) any later version.
254