comparison mayachemtools/docs/scripts/html/ModifySDFilesDataFields.html @ 0:73ae111cf86f draft

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 11:55:01 -0500
-1:000000000000 0:73ae111cf86f
1 <html>
2 <head>
3 <title>MayaChemTools:Documentation:ModifySDFilesDataFields.pl</title>
4 <meta http-equiv="content-type" content="text/html;charset=utf-8">
5 <link rel="stylesheet" type="text/css" href="../../css/MayaChemTools.css">
6 </head>
7 <body leftmargin="20" rightmargin="20" topmargin="10" bottommargin="10">
8 <br/>
9 <center>
10 <a href="http://www.mayachemtools.org" title="MayaChemTools Home"><img src="../../images/MayaChemToolsLogo.gif" border="0" alt="MayaChemTools"></a>
11 </center>
12 <br/>
13 <div class="DocNav">
14 <table width="100%" border=0 cellpadding=0 cellspacing=2>
15 <tr align="left" valign="top"><td width="33%" align="left"><a href="./ModifyPDBFiles.html" title="ModifyPDBFiles.html">Previous</a>&nbsp;&nbsp;<a href="./index.html" title="Table of Contents">TOC</a>&nbsp;&nbsp;<a href="./ModifyTextFilesFormat.html" title="ModifyTextFilesFormat.html">Next</a></td><td width="34%" align="middle"><strong>ModifySDFilesDataFields.pl</strong></td><td width="33%" align="right"><a href="././code/ModifySDFilesDataFields.html" title="View source code">Code</a>&nbsp;|&nbsp;<a href="./../pdf/ModifySDFilesDataFields.pdf" title="PDF US Letter Size">PDF</a>&nbsp;|&nbsp;<a href="./../pdfgreen/ModifySDFilesDataFields.pdf" title="PDF US Letter Size with narrow margins: www.changethemargins.com">PDFGreen</a>&nbsp;|&nbsp;<a href="./../pdfa4/ModifySDFilesDataFields.pdf" title="PDF A4 Size">PDFA4</a>&nbsp;|&nbsp;<a href="./../pdfa4green/ModifySDFilesDataFields.pdf" title="PDF A4 Size with narrow margins: www.changethemargins.com">PDFA4Green</a></td></tr>
16 </table>
17 </div>
18 <p>
19 </p>
20 <h2>NAME</h2>
21 <p>ModifySDFilesDataFields.pl - Modify data fields in SDFile(s)</p>
22 <p>
23 </p>
24 <h2>SYNOPSIS</h2>
25 <p>ModifySDFilesDataFields.pl SDFile(s)...</p>
26 <p>ModifySDFilesDataFields.pl [<strong>-d, --detail</strong> infolevel]
27 [<strong>--datafieldscommon</strong> newfieldlabel, newfieldvalue, [newfieldlabel, newfieldvalue,...]]
28 [<strong>--datafieldsmap</strong> newfieldlabel, oldfieldlabel, [oldfieldlabel,...]; [newfieldlabel, oldfieldlabel, [oldfieldlabel,...]]]
29 [<strong>--datafieldsmapfile</strong> filename] [<strong>--datafieldURL</strong> URLDataFieldLabel, CGIScriptPath, CGIParamName, CmpdIDFieldLabel]
30 [<strong>-h, --help</strong>] [<strong>-k, --keepolddatafields</strong> all | unmappedonly | none] [<strong>-m, --mode</strong> molname | datafields | both]
31 [<strong>--molnamemode</strong> datafield | labelprefix] [<strong>--molname</strong> datafieldname or prefixstring]
32 [<strong>--molnamereplace</strong> always | empty] [<strong>-o, --overwrite</strong>] [<strong>-r, --root</strong> rootname]
33 [<strong>-w, --workingdir</strong> dirname] SDFile(s)...</p>
34 <p>
35 </p>
36 <h2>DESCRIPTION</h2>
37 <p>Modify the molname line and data fields in <em>SDFile(s)</em>. The molname line can be replaced by a
38 data field value or assigned a sequential ID prefixed with a specific string. For data
39 fields and their values, these kinds of modification are supported: replace
40 data field labels with another set of labels; combine values of multiple data fields and
41 assign a new label; add a specific set of data field labels and values to all compound
42 records; and others.</p>
43 <p>The file names are separated by spaces. The valid file extensions are <em>.sdf</em> and <em>.sd</em>.
44 All other file names are ignored. All the SD files in the current directory can be specified
45 either by <em>*.sdf</em> or the current directory name.</p>
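<p>As a rough illustration of the extension rule above, the following Python sketch filters a list of names down to SD files. It is not the script's Perl code; the function name <em>select_sd_files</em> is hypothetical, and treating extensions case-insensitively is an assumption.</p>

```python
def select_sd_files(names):
    """Keep only names ending in .sdf or .sd; all other names are ignored."""
    valid_extensions = (".sdf", ".sd")
    # Assumption: extensions are compared case-insensitively.
    return [name for name in names if name.lower().endswith(valid_extensions)]


if __name__ == "__main__":
    print(select_sd_files(["Sample1.sdf", "Sample2.sd", "notes.txt"]))
```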
46 <p>
47 </p>
48 <h2>OPTIONS</h2>
49 <dl>
50 <dt><strong><strong>-d, --detail</strong> <em>infolevel</em></strong></dt>
51 <dd>
52 <p>Level of information to print about compound records being ignored. Default: <em>1</em>. Possible
53 values: <em>1, 2 or 3</em>.</p>
54 </dd>
55 <dt><strong><strong>--datafieldscommon</strong> <em>newfieldlabel, newfieldvalue, [newfieldlabel, newfieldvalue,...]</em></strong></dt>
56 <dd>
57 <p>Specify data field labels and values to add to each compound record. It's a comma-delimited
58 list of data field label and value pairs. Default: <em>none</em>.</p>
59 <p>Examples:</p>
60 <div class="OptionsBox">
61 DepositionDate,YYYY-MM-DD
62 <br/> Source,www.domainname.org,ReleaseDate,YYYY-MM-DD</div>
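<p>To make the pair format concrete, here is a minimal Python sketch of how such a list could be split into label and value pairs. It is only an illustration, not the script's own parsing code; <em>parse_common_fields</em> is a hypothetical name, and trimming whitespace around items is an assumption.</p>

```python
def parse_common_fields(spec):
    """Split 'label,value,label,value,...' into ordered (label, value) pairs."""
    # Assumption: whitespace around each item is not significant.
    items = [item.strip() for item in spec.split(",")]
    if len(items) % 2 != 0:
        raise ValueError(f"expected label,value pairs, got {len(items)} items")
    # Even positions are labels, odd positions are their values.
    return list(zip(items[0::2], items[1::2]))


if __name__ == "__main__":
    for label, value in parse_common_fields("Source,www.domainname.org,ReleaseDate,YYYY-MM-DD"):
        print(f"{label} = {value}")
```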
63 </dd>
64 <dt><strong><strong>--datafieldsmap</strong> <em>newfieldlabel, oldfieldlabel, [oldfieldlabel,...]; [newfieldlabel, oldfieldlabel, [oldfieldlabel,...]]</em></strong></dt>
65 <dd>
66 <p>Specify how various data field labels and values are combined to generate new data field
67 labels and their values. All the comma-delimited data fields within a semicolon-delimited set
68 are mapped to the first (new) data field label, with their values joined by a newline
69 character. Default: <em>none</em>.</p>
70 <p>Examples:</p>
71 <div class="OptionsBox">
72 Synonym,Name,SystematicName,Synonym;CmpdID,Extreg
73 <br/> HBondDonors,SumNHOH</div>
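<p>The mapping rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the script's implementation: the function names are hypothetical, a compound record is modeled as a plain dictionary, and skipping old fields that are missing or empty is an assumption.</p>

```python
def parse_fields_map(spec):
    """Return {new_label: [old_label, ...]} from 'new,old,old;new,old' text."""
    mapping = {}
    for group in spec.split(";"):
        labels = [label.strip() for label in group.split(",") if label.strip()]
        if not labels:
            continue
        if len(labels) < 2:
            raise ValueError(f"no old field labels given for {labels[0]!r}")
        # The first label in each set is the new label; the rest are old labels.
        mapping[labels[0]] = labels[1:]
    return mapping


def apply_fields_map(record, mapping):
    """Build the new fields, joining the old fields' values with newlines."""
    new_fields = {}
    for new_label, old_labels in mapping.items():
        # Assumption: old fields that are absent or empty are skipped.
        values = [record[old] for old in old_labels if record.get(old)]
        if values:
            new_fields[new_label] = "\n".join(values)
    return new_fields


if __name__ == "__main__":
    mapping = parse_fields_map("Synonym,Name,SystematicName,Synonym;CmpdID,Extreg")
    record = {"Name": "aspirin", "SystematicName": "2-acetoxybenzoic acid", "Extreg": "X1"}
    print(apply_fields_map(record, mapping))
```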
74 </dd>
75 <dt><strong><strong>--datafieldsmapfile</strong> <em>filename</em></strong></dt>
76 <dd>
77 <p>Name of a file containing the mapping of data fields. The format of each data fields line in this file corresponds
78 to the <strong>--datafieldsmap</strong> option. Example:</p>
79 <div class="OptionsBox">
80 Line 1: Synonym,Name,SystematicName,Synonym;CmpdID,Extreg
81 <br/> Line 2: HBondDonors,SumNHOH</div>
82 </dd>
83 <dt><strong><strong>--datafieldURL</strong> <em>URLDataFieldLabel, CGIScriptPath, CGIParamName, CmpdIDFieldLabel</em></strong></dt>
84 <dd>
85 <p>Specify how to generate a URL for retrieving compound data from a web server and add it
86 to each compound record. <em>URLDataFieldLabel</em> is used as the data field label for the URL value,
87 which is created by combining <em>CGIScriptPath,CGIParamName,CmpdIDFieldLabel</em> values:
88 CGIScriptPath?CGIParamName=CmpdIDFieldLabelValue. Default: <em>none</em>.</p>
89 <p>Example:</p>
90 <div class="OptionsBox">
91 Source,<a href="http://www.yourdomain.org/GetCmpd.pl">http://www.yourdomain.org/GetCmpd.pl</a>,Reg_ID,Mol_ID</div>
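<p>The URL pattern CGIScriptPath?CGIParamName=CmpdIDFieldLabelValue can be illustrated with a short Python sketch. The function name <em>add_url_field</em> and the dictionary model of a record are hypothetical, and no URL escaping is applied because the description above does not mention any.</p>

```python
def add_url_field(record, spec):
    """Return a copy of record with the URL data field added.

    spec is 'URLDataFieldLabel,CGIScriptPath,CGIParamName,CmpdIDFieldLabel'.
    """
    parts = [part.strip() for part in spec.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-delimited values, got {len(parts)}")
    url_label, script_path, param_name, id_label = parts
    updated = dict(record)
    # CGIScriptPath?CGIParamName=<value of the CmpdIDFieldLabel data field>
    updated[url_label] = f"{script_path}?{param_name}={record[id_label]}"
    return updated


if __name__ == "__main__":
    spec = "Source,http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID"
    print(add_url_field({"Mol_ID": "12345"}, spec)["Source"])
```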
92 </dd>
93 <dt><strong><strong>-h, --help</strong></strong></dt>
94 <dd>
95 <p>Print this help message.</p>
96 </dd>
97 <dt><strong><strong>-k, --keepolddatafields</strong> <em>all | unmappedonly | none</em></strong></dt>
98 <dd>
99 <p>Specify how to transfer old data fields from input SDFile(s) to new SDFile(s) for
100 <em>datafields | both</em> values of the <strong>-m, --mode</strong> option: keep all old data fields; write out only the ones
101 not mapped to new fields as specified by the <strong>--datafieldsmap</strong> or <strong>--datafieldsmapfile</strong> options;
102 or ignore all old data field labels. For <em>molname</em> <strong>-m, --mode</strong>, old data fields are always kept.
103 Possible values: <em>all | unmappedonly | none</em>. Default: <em>none</em>.</p>
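<p>The three settings can be summarized with a small Python sketch. It is an illustration only: <em>select_old_fields</em> is a hypothetical name, a record is modeled as a dictionary, and the mapping is assumed to have the form {new label: [old labels]}.</p>

```python
def select_old_fields(record, mapping, keep="none"):
    """Return the old data fields to carry over into the new record."""
    if keep == "all":
        return dict(record)
    if keep == "unmappedonly":
        # Old labels that were mapped to a new label are left out.
        mapped = {old for old_labels in mapping.values() for old in old_labels}
        return {label: value for label, value in record.items() if label not in mapped}
    if keep == "none":
        return {}
    raise ValueError(f"unknown keepolddatafields value: {keep!r}")


if __name__ == "__main__":
    record = {"Mol_ID": "M7", "Name": "aspirin", "MolWt": "180.16"}
    mapping = {"Synonym": ["Name"]}
    print(select_old_fields(record, mapping, keep="unmappedonly"))
```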
104 </dd>
105 <dt><strong><strong>-m, --mode</strong> <em>molname | datafields | both</em></strong></dt>
106 <dd>
107 <p>Specify how to modify SDFile(s): <em>molname</em> - change the molname line using a data field value or a prefixed sequential ID;
108 <em>datafields</em> - modify data field labels and values by replacing one label with another, combining
109 multiple data field labels and values, adding a specific set of data field labels and values to all compound records, or
110 inserting a URL for compound retrieval into each record; <em>both</em> - change molname line and data fields
111 simultaneously. Possible values: <em>molname | datafields | both</em>. Default: <em>molname</em>.</p>
112 </dd>
113 <dt><strong><strong>--molnamemode</strong> <em>datafield | labelprefix</em></strong></dt>
114 <dd>
115 <p>Specify how to change molname line for <strong>-m --mode</strong> option values of <em>molname | both</em>: use
116 a datafield label value or assign a sequential ID prefixed with <em>labelprefix</em>. Possible values:
117 <em>datafield | labelprefix</em>. Default: <em>labelprefix</em>.</p>
118 </dd>
119 <dt><strong><strong>--molname</strong> <em>datafieldname or prefixstring</em></strong></dt>
120 <dd>
121 <p>Molname generation method. For <em>datafield</em> value of <strong>--molnamemode</strong> option, it corresponds
122 to datafield label name whose value is used for molname; otherwise, it's a prefix string used for
123 generating compound IDs like labelprefixstring&lt;Number&gt;. Default value, <em>Cmpd</em>, generates
124 compound IDs like Cmpd&lt;Number&gt; for molname.</p>
125 </dd>
126 <dt><strong><strong>--molnamereplace</strong> <em>always | empty</em></strong></dt>
127 <dd>
128 <p>Specify when to replace molname line for <strong>-m --mode</strong> option values of <em>molname | both</em>:
129 always replace the molname line using <strong>--molname</strong> option or only when it's empty. Possible
130 values: <em>always | empty</em>. Default: <em>empty</em>.</p>
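<p>How <strong>--molnamemode</strong>, <strong>--molname</strong> and <strong>--molnamereplace</strong> interact can be sketched in Python as below. This is not the script's code: <em>new_molname</em> is a hypothetical name, and keeping the current molname when the requested data field is missing is an assumption.</p>

```python
def new_molname(current, record, number,
                mode="labelprefix", molname="Cmpd", replace="empty"):
    """Return the molname line for one compound record.

    current: existing molname line; record: dict of data fields;
    number: sequential compound number used in labelprefix mode.
    """
    # With replace == "empty", a non-empty molname line is left alone.
    if replace == "empty" and current.strip():
        return current
    if mode == "datafield":
        # Assumption: fall back to the current molname if the field is absent.
        return record.get(molname, current)
    # labelprefix mode: prefix string followed by the compound number.
    return f"{molname}{number}"


if __name__ == "__main__":
    print(new_molname("", {}, 1))
    print(new_molname("aspirin", {"Mol_ID": "M7"}, 1,
                      mode="datafield", molname="Mol_ID", replace="always"))
```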
131 </dd>
132 <dt><strong><strong>-o, --overwrite</strong></strong></dt>
133 <dd>
134 <p>Overwrite existing files.</p>
135 </dd>
136 <dt><strong><strong>-r, --root</strong> <em>rootname</em></strong></dt>
137 <dd>
138 <p>New SD file name is generated using the root: &lt;Root&gt;.&lt;Ext&gt;. Default new file
139 name: &lt;InitialSDFileName&gt;ModifiedDataFields.&lt;Ext&gt;. This option is ignored for multiple
140 input files.</p>
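<p>The naming rule can be illustrated with a short Python sketch. The function name <em>output_name</em> and its parameters are hypothetical, and keeping the input file's extension unchanged is an assumption.</p>

```python
import os


def output_name(sd_file, root=None, n_inputs=1):
    """Return the new SD file name for one input file."""
    base, ext = os.path.splitext(os.path.basename(sd_file))
    # The root name is only honored when there is a single input file.
    if root and n_inputs == 1:
        return f"{root}{ext}"
    return f"{base}ModifiedDataFields{ext}"


if __name__ == "__main__":
    print(output_name("Sample1.sdf", root="NewSample1"))
    print(output_name("Sample1.sdf"))
```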
141 </dd>
142 <dt><strong><strong>-w, --workingdir</strong> <em>dirname</em></strong></dt>
143 <dd>
144 <p>Location of working directory. Default: current directory.</p>
145 </dd>
146 </dl>
147 <p>
148 </p>
149 <h2>EXAMPLES</h2>
150 <p>To replace empty molname lines by Cmpd&lt;CmpdNumber&gt; and generate a new SD file
151 NewSample1.sdf, type:</p>
152 <div class="ExampleBox">
153 % ModifySDFilesDataFields.pl -o -r NewSample1 Sample1.sdf</div>
154 <p>To replace all molname lines by the Mol_ID data field and generate a new SD file
155 NewSample1.sdf, type:</p>
156 <div class="ExampleBox">
157 % ModifySDFilesDataFields.pl --molnamemode datafield
158 --molnamereplace always --molname Mol_ID -r NewSample1 -o Sample1.sdf</div>
159 <p>To replace all molname lines by Mol_ID data field, map Name and CompoundName to
160 a new datafield Synonym, and generate a new SD file NewSample1.sdf, type:</p>
161 <div class="ExampleBox">
162 % ModifySDFilesDataFields.pl --molnamemode datafield
163 --molnamereplace always --molname Mol_ID --mode both
164 --datafieldsmap &quot;Synonym,Name,CompoundName&quot; -r
165 NewSample1 -o Sample1.sdf</div>
166 <p>To replace all molname lines by Mol_ID data field, map Name and CompoundName to
167 a new datafield Synonym, add common fields ReleaseDate and Source, and
168 generate a new SD file NewSample1.sdf without keeping any old SD data fields, type:</p>
169 <div class="ExampleBox">
170 % ModifySDFilesDataFields.pl --molnamemode datafield
171 --molnamereplace always --molname Mol_ID --mode both
172 --datafieldsmap &quot;Synonym,Name,CompoundName&quot;
173 --datafieldscommon &quot;ReleaseDate,yyyy-mm-dd,Source,
174 www.mayachemtools.org&quot; --keepolddatafields none -r
175 NewSample1 -o Sample1.sdf</div>
176 <p><strong>Preparing SD files for PubChem deposition:</strong></p>
177 <p>Consider an SD file with these fields: Mol_ID, Name, Synonyms and Systematic_Name,
178 where the Mol_ID data field uniquely identifies your compound.</p>
179 <p>To prepare a new SD file CmpdDataForPubChem.sdf containing only the required
180 PUBCHEM_EXT_DATASOURCE_REGID field, type:</p>
181 <div class="ExampleBox">
182 % ModifySDFilesDataFields.pl -m datafields
183 --datafieldsmap
184 &quot;PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID&quot;
185 -r CmpdDataForPubChem -o Sample1.sdf</div>
186 <p>To prepare a new SD file CmpdDataForPubChem.sdf containing only the required
187 PUBCHEM_EXT_DATASOURCE_REGID field and replace the molname line with Mol_ID, type:</p>
188 <div class="ExampleBox">
189 % ModifySDFilesDataFields.pl --molnamemode datafield
190 --molnamereplace always --molname Mol_ID --mode both
191 --datafieldsmap
192 &quot;PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID&quot;
193 -r CmpdDataForPubChem -o Sample1.sdf</div>
194 <p>In addition to the required PubChem data field, you can also add optional PubChem data
195 fields.</p>
196 <p>To map your Name, Synonyms and Systematic_Name data fields to the optional
197 PUBCHEM_SUBSTANCE_SYNONYM data field along with the required ID field, type:</p>
198 <div class="ExampleBox">
199 % ModifySDFilesDataFields.pl --molnamemode datafield
200 --molnamereplace always --molname Mol_ID --mode both
201 --datafieldsmap
202 &quot;PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
203 PUBCHEM_SUBSTANCE_SYNONYM,Name,Synonyms,Systematic_Name&quot;
204 -r CmpdDataForPubChem -o Sample1.sdf</div>
205 <p>To add your &lt;domain.org&gt; as PUBCHEM_EXT_SUBSTANCE_URL and link substance
206 retrieval to your CGI script &lt;http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID&gt;
207 via PUBCHEM_EXT_DATASOURCE_REGID field along with optional and required
208 data fields, type:</p>
209 <div class="ExampleBox">
210 % ModifySDFilesDataFields.pl --molnamemode datafield
211 --molnamereplace always --molname Mol_ID --mode both
212 --datafieldsmap
213 &quot;PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
214 PUBCHEM_SUBSTANCE_SYNONYM,Name,Synonyms,Systematic_Name&quot;
215 --datafieldscommon
216 &quot;PUBCHEM_EXT_SUBSTANCE_URL,domain.org&quot;
217 --datafieldURL &quot;PUBCHEM_EXT_DATASOURCE_URL,
218 <a href="http://www.yourdomain.org/GetCmpd.pl">http://www.yourdomain.org/GetCmpd.pl</a>,Reg_ID,Mol_ID&quot;
219 -r CmpdDataForPubChem -o Sample1.sdf</div>
220 <p>And to add a publication date and request a release date using
221 PUBCHEM_PUBLICATION_DATE and PUBCHEM_DEPOSITOR_RECORD_DATE data fields
222 along with all the data fields from the earlier examples,
223 type:</p>
224 <div class="ExampleBox">
225 % ModifySDFilesDataFields.pl --molnamemode datafield
226 --molnamereplace always --molname Mol_ID --mode both
227 --datafieldsmap
228 &quot;PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
229 PUBCHEM_SUBSTANCE_SYNONYM,Name,Synonyms,Systematic_Name&quot;
230 --datafieldURL &quot;PUBCHEM_EXT_DATASOURCE_URL,
231 <a href="http://www.yourdomain.org/GetCmpd.pl">http://www.yourdomain.org/GetCmpd.pl</a>,Reg_ID,Mol_ID&quot;
232 --datafieldscommon
233 &quot;PUBCHEM_EXT_SUBSTANCE_URL,domain.org,
234 PUBCHEM_PUBLICATION_DATE,YYYY-MM-DD,
235 PUBCHEM_DEPOSITOR_RECORD_DATE,YYYY-MM-DD&quot;
236 -r CmpdDataForPubChem -o Sample1.sdf</div>
237 <p>
238 </p>
239 <h2>AUTHOR</h2>
240 <p><a href="mailto:msud@san.rr.com">Manish Sud</a></p>
241 <p>
242 </p>
243 <h2>SEE ALSO</h2>
244 <p><a href="./InfoSDFiles.html">InfoSDFiles.pl</a>,&nbsp;<a href="./JoinSDFiles.html">JoinSDFiles.pl</a>,&nbsp;<a href="./MergeTextFilesWithSD.html">MergeTextFilesWithSD.pl</a>,&nbsp;<a href="./SplitSDFiles.html">SplitSDFiles.pl</a>,&nbsp;<a href="./SDFilesToHTML.html">SDFilesToHTML.pl</a>
245 </p>
246 <p>
247 </p>
248 <h2>COPYRIGHT</h2>
249 <p>Copyright (C) 2015 Manish Sud. All rights reserved.</p>
250 <p>This file is part of MayaChemTools.</p>
251 <p>MayaChemTools is free software; you can redistribute it and/or modify it under
252 the terms of the GNU Lesser General Public License as published by the Free
253 Software Foundation; either version 3 of the License, or (at your option)
254 any later version.</p>
255 <p>&nbsp;</p><p>&nbsp;</p><div class="DocNav">
256 <table width="100%" border=0 cellpadding=0 cellspacing=2>
257 <tr align="left" valign="top"><td width="33%" align="left"><a href="./ModifyPDBFiles.html" title="ModifyPDBFiles.html">Previous</a>&nbsp;&nbsp;<a href="./index.html" title="Table of Contents">TOC</a>&nbsp;&nbsp;<a href="./ModifyTextFilesFormat.html" title="ModifyTextFilesFormat.html">Next</a></td><td width="34%" align="middle"><strong>March 29, 2015</strong></td><td width="33%" align="right"><strong>ModifySDFilesDataFields.pl</strong></td></tr>
258 </table>
259 </div>
260 <br />
261 <center>
262 <img src="../../images/h2o2.png">
263 </center>
264 </body>
265 </html>