comparison docs/scripts/txt/ExtractFromPDBFiles.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
parents | |
children |
-1:000000000000 | 0:4816e4a8ae95 |
---|---|
1 NAME | |
2 ExtractFromPDBFiles.pl - Extract specific data from PDBFile(s) | |
3 | |
4 SYNOPSIS | |
5 ExtractFromPDBFiles.pl PDBFile(s)... | |
6 | |
7 ExtractFromPDBFiles.pl [-a, --Atoms "AtomNum, [AtomNum...]" | | |
8 "StartAtomNum, EndAtomNum" | "AtomName, [AtomName...]"] [-c, --chains | |
9 First | All | "ChainID, [ChainID,...]"] [--CombineChains yes | no] | |
10 [-d, --distance number] [--DistanceMode Atom | Hetatm | Residue | XYZ] | |
11 [--DistanceOrigin "AtomNumber, AtomName" | "HetatmNumber, HetAtmName" | | |
12 "ResidueNumber, ResidueName, [ChainID]" | "X,Y,Z"] | |
13 [--DistanceSelectionMode ByAtom | ByResidue] [-h, --help] [-k, | |
14 --KeepOldRecords yes | no] [-m, --mode Chains | Sequences | Atoms | | |
15 CAlphas | AtomNums | AtomsRange | AtomNames | ResidueNums | | |
16 ResiduesRange | ResidueNames | Distance | NonWater | NonHydrogens] | |
17 [--ModifyHeader yes | no] [--NonStandardKeep yes | no] | |
18 [--NonStandardCode character] [-o, --overwrite] [-r, --root rootname] | |
19 [--RecordMode Atom | Hetatm | AtomAndHetatm] [--Residues | |
20 "ResidueNum,[ResidueNum...]" | "StartResidueNum,EndResidueNum"] | |
21 [--SequenceLength number] [--SequenceRecords Atom | SeqRes] | |
22 [--SequenceIDPrefix FileName | HeaderRecord | Automatic] | |
23 [--WaterResidueNames Automatic | "ResidueName, [ResidueName,...]"] [-w, | |
24 --WorkingDir dirname] PDBFile(s)... | |
25 | |
26 DESCRIPTION | |
27 Extract specific data from *PDBFile(s)* and generate appropriate PDB or | |
28 sequence file(s). Multiple PDBFile names are separated by spaces. The | |
29 valid file extension is *.pdb*. All other file name extensions are | |
30 ignored during the wild card expansion. All the PDB files in the current | |
31 directory can be specified either by **.pdb* or the current directory | |
32 name. | |
33 | |
34 For *Chains* and *Sequences* values of -m, --mode option, all | |
35 ATOM/HETATM records for chains after the first model in PDB files | |
36 containing data for multiple models are ignored. | |
37 | |
38 OPTIONS | |
39 -a, --Atoms *"AtomNum,[AtomNum...]" | "StartAtomNum,EndAtomNum" | | |
40 "AtomName,[AtomName...]"* | |
41 Specify which atom records to extract from *PDBFile(s)* during | |
42 *AtomNums*, *AtomsRange*, and *AtomNames* values of -m, --mode | |
43 option: extract records corresponding to atom numbers specified in a | |
44 comma delimited list of atom numbers/names, or within the range of | |
45 start and end atom numbers. Possible values: | |
46 *"AtomNum[,AtomNum,..]"*, *StartAtomNum,EndAtomNum*, or | |
47 *"AtomName[,AtomName,..]"*. Default: *None*. Examples: | |
48 | |
49 10 | |
50 15,20 | |
51 N,CA,C,O | |
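The same --Atoms value can mean different things depending on -m, --mode: "15,20" is a list of two atom numbers in *AtomNums* mode but a start/end pair in *AtomsRange* mode. The sketch below illustrates that interpretation in Python. It is not the script's Perl code; the function name `parse_atoms_option` and the validation rules (exactly two values for a range, start not greater than end) are assumptions made for illustration.

```python
def parse_atoms_option(value, mode):
    """Interpret an --Atoms value according to the -m, --mode setting.

    Hypothetical helper: returns a list of ints for AtomNums, a
    (start, end) tuple for AtomsRange, and a list of names for AtomNames.
    """
    items = [item.strip() for item in value.split(",") if item.strip()]
    if not items:
        raise ValueError("no atoms specified")

    if mode == "AtomNums":
        # Comma delimited list of atom serial numbers, e.g. "10" or "15,20".
        return [int(item) for item in items]

    if mode == "AtomsRange":
        # Exactly two numbers: start and end of the range (assumed rule).
        if len(items) != 2:
            raise ValueError("AtomsRange needs StartAtomNum,EndAtomNum")
        start, end = int(items[0]), int(items[1])
        if start > end:
            raise ValueError("start atom number exceeds end atom number")
        return (start, end)

    if mode == "AtomNames":
        # Comma delimited list of atom names, e.g. "N,CA,C,O".
        return items

    raise ValueError(f"--Atoms is not used in mode {mode!r}")
```

With this reading, `parse_atoms_option("15,20", "AtomNums")` yields two separate atom numbers, while the same string in `AtomsRange` mode yields one inclusive range.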
52 | |
53 -c, --chains *First | All | ChainID,[ChainID,...]* | |
54 Specify which chains to extract from *PDBFile(s)* during *Chains | | |
55 Sequences* value of -m, --mode option: first chain, all chains, or a | |
56 specific list of comma delimited chain IDs. Possible values: *First | |
57 | All | ChainID,[ChainID,...]*. Default: *First*. Examples: | |
58 | |
59 A | |
60 A,B | |
61 All | |
62 | |
63 --CombineChains *yes | no* | |
64 Specify whether to combine extracted chains data into a single file | |
65 during *Chains* or *Sequences* value of -m, --mode option. Possible | |
66 values: *yes | no*. Default: *no*. | |
67 | |
68 During *Chains* value of -m, --mode option with *yes* value of | |
69 --CombineChains, extracted data for specified chains is written | |
70 into a single file instead of an individual file for each chain. | |
71 | |
72 During *Sequences* value of -m, --mode option with *yes* value of | |
73 --CombineChains, residue sequences for specified chains are | |
74 extracted and concatenated into a single sequence file instead of | |
75 an individual file for each chain. | |
76 | |
77 -d, --distance *number* | |
78 Specify distance used to extract ATOM/HETATM records during | |
79 *Distance* value of -m, --mode option. Default: *10.0* angstroms. | |
80 | |
81 --RecordMode option controls type of record lines to extract from | |
82 *PDBFile(s)*: ATOM, HETATM or both. | |
83 | |
84 --DistanceMode *Atom | Hetatm | Residue | XYZ* | |
85 Specify how to extract ATOM/HETATM records from *PDBFile(s)* during | |
86 *Distance* value of -m, --mode option: extract all the records | |
87 within a certain distance specified by -d, --distance from an atom or | |
88 hetero atom record, a residue, or any arbitrary point. Possible | |
89 values: *Atom | Hetatm | Residue | XYZ*. Default: *XYZ*. | |
90 | |
91 During *Residue* value of --DistanceMode, distance of ATOM/HETATM | |
92 records is calculated from all the atoms in the residue and the | |
93 records are selected as long as any atom of the residue lies within | |
94 the distance specified using -d, --distance option. | |
95 | |
96 --RecordMode option controls type of record lines to extract from | |
97 *PDBFile(s)*: ATOM, HETATM or both. | |
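For the *XYZ* value of --DistanceMode, the selection amounts to a distance filter over coordinate records. The following Python sketch illustrates the idea only; it is not the script's Perl implementation. It assumes standard fixed-column PDB records (x, y, z in columns 31-54), uses hypothetical function names, and assumes the cutoff is inclusive.

```python
import math

# Record names accepted for each --RecordMode value.
RECORD_MODES = {
    "Atom": ("ATOM",),
    "Hetatm": ("HETATM",),
    "AtomAndHetatm": ("ATOM", "HETATM"),
}

def parse_xyz(line):
    """Return (x, y, z) from a fixed-column ATOM/HETATM record."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])

def select_within_distance(lines, origin, cutoff=10.0,
                           record_mode="AtomAndHetatm"):
    """Keep coordinate records lying within `cutoff` angstroms of `origin`.

    `origin` is an (x, y, z) tuple, as given by --DistanceOrigin "X,Y,Z".
    Treating the cutoff as inclusive is an assumption of this sketch.
    """
    wanted = RECORD_MODES[record_mode]
    selected = []
    for line in lines:
        if line[:6].strip() not in wanted:
            continue  # skip non-coordinate or unwanted record types
        if math.dist(parse_xyz(line), origin) <= cutoff:
            selected.append(line)
    return selected
```

For the *Atom*, *Hetatm*, and *Residue* values, the origin would first be looked up from the matching record(s) instead of being given directly; for *Residue*, the test would be repeated against every atom of the origin residue.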
98 | |
99 --DistanceSelectionMode *ByAtom | ByResidue* | |
100 Specify how to extract ATOM/HETATM records from *PDBFile(s)* | |
101 during *Distance* value of -m, --mode option for all values of | |
102 --DistanceMode option. *ByAtom*: extract only those ATOM/HETATM records | |
103 that meet the specified distance criterion. *ByResidue*: extract all | |
104 records corresponding to a residue as long as one of the ATOM/HETATM | |
105 records in the residue satisfies the specified distance criterion. | |
106 Possible values: *ByAtom | ByResidue*. Default value: *ByAtom*. | |
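The difference between the two selection modes can be sketched as a two-pass filter: find the records that satisfy the distance criterion, then, for *ByResidue*, widen the result to every record of any residue that had a hit. This Python sketch is illustrative only and is not taken from the script. It assumes fixed-column PDB records, identifies a residue by chain ID, residue number, and residue name (an assumption), and uses hypothetical function names.

```python
import math

def residue_key(line):
    """Identify a residue by (chain ID, residue number, residue name)."""
    return line[21], line[22:26].strip(), line[17:20].strip()

def select_by_distance(lines, origin, cutoff, selection_mode="ByAtom"):
    """Select ATOM/HETATM records near `origin` by atom or by residue."""
    coords = [l for l in lines if l.startswith(("ATOM", "HETATM"))]

    def is_close(line):
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        return math.dist(xyz, origin) <= cutoff

    # Pass 1: records that individually meet the distance criterion.
    hits = [l for l in coords if is_close(l)]
    if selection_mode == "ByAtom":
        return hits

    # Pass 2 (ByResidue): keep complete residues with at least one hit.
    hit_residues = {residue_key(l) for l in hits}
    return [l for l in coords if residue_key(l) in hit_residues]
```

*ByResidue* therefore never returns fewer records than *ByAtom* for the same origin and cutoff, and it avoids writing out residues with only some of their atoms.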
107 | |
108 --DistanceOrigin *"AtomNumber,AtomName" | "HetatmNumber,HetAtmName" | | |
109 "ResidueNumber,ResidueName[,ChainID]" | "X,Y,Z"* | |
110 This value is --DistanceMode specific. In general, it identifies a | |
111 point used to select other ATOM/HETATM records within a specific | |
112 distance from this point. | |
113 | |
114 For *Atom* value of --distancemode, this option corresponds to an | |
115 atom specification. Format: *AtomNumber,AtomName*. Example: | |
116 | |
117 455,CA | |
118 | |
119 For *Hetatm* value of --distancemode, this option corresponds to a | |
120 hetatm specification. Format: *HetatmNumber,HetAtmName*. Example: | |
121 | |
122 5295,C1 | |
123 | |
124 For *Residue* value of --distancemode, this option corresponds to a | |
125 residue specification. Format: *ResidueNumber, | |
126 ResidueName[,ChainID]*. Example: | |
127 | |
128 78,MSE | |
129 977,RET,A | |
130 978,RET,B | |
131 | |
132 For *XYZ* value of --distancemode, this option corresponds to a | |
133 coordinate of an arbitrary point. Format: *X,Y,Z*. Example: | |
134 | |
135 10.044,19.261,-4.292 | |
136 | |
137 --RecordMode option controls type of record lines to extract from | |
138 *PDBFile(s)*: ATOM, HETATM or both. | |
139 | |
140 -h, --help | |
141 Print this help message. | |
142 | |
143 -k, --KeepOldRecords *yes | no* | |
144 Specify whether to transfer old non ATOM and HETATM records from | |
145 input PDBFile(s) to new PDBFile(s) during *Chains | Atoms | HetAtms | |
146 | CAlphas | Distance | NonWater | NonHydrogens* value of -m, --mode | |
147 option. By default, except for the HEADER record, all other | |
148 unnecessary non ATOM/HETATM records are dropped during the | |
149 generation of new PDB files. Possible values: *yes | no*. Default: | |
150 *no*. | |
151 | |
152 -m, --mode *Chains | Sequences | Atoms | CAlphas | AtomNums | AtomsRange | |
153 | AtomNames | ResidueNums | ResiduesRange | ResidueNames | Distance | | |
154 NonWater | NonHydrogens* | |
155 Specify what to extract from *PDBFile(s)*: *Chains* - retrieve | |
156 records for specified chains; *Sequences* - generate sequence files | |
157 for specific chains; *Atoms* - extract atom records; *CAlphas* - | |
158 extract atom records for alpha carbon atoms; *AtomNums* - extract | |
159 atom records for specified atom numbers; *AtomsRange* - extract atom | |
160 records between specified atom number range; *AtomNames* - extract | |
161 atom records for specified atom names; *ResidueNums* - extract | |
162 records for specified residue numbers; *ResiduesRange* - extract | |
163 records for residues between specified residue number range; | |
164 *ResidueNames* - extract records for specified residue names; | |
165 *Distance* - extract records with in a certain distance from a | |
166 specific position; *NonWater* - extract records corresponding to | |
167 residues other than water; *NonHydrogens* - extract non-hydrogen | |
168 records. | |
169 | |
170 Possible values: *Chains, Sequences, Atoms, CAlphas, AtomNums, | |
171 AtomsRange, AtomNames, ResidueNums, ResiduesRange, ResidueNames, | |
172 Distance, NonWater, NonHydrogens*. Default value: *NonWater* | |
173 | |
174 During the generation of new PDB files, unnecessary CONECT records | |
175 are dropped. | |
176 | |
177 For *Chains* mode, data for appropriate chains specified by -c, | |
178 --chains option is extracted from *PDBFile(s)* and placed into new | |
179 PDB file(s). | |
180 | |
181 For *Sequences* mode, residue names using various sequence related | |
182 options are extracted for chains specified by -c, --chains option | |
183 from *PDBFile(s)* and FASTA sequence file(s) are generated. | |
184 | |
185 For *Distance* mode, all ATOM/HETATM records within a distance | |
186 specified by -d, --distance option from a specific atom, residue or a | |
187 point indicated by --distancemode are extracted and placed into new | |
188 PDB file(s). | |
189 | |
190 For *NonWater* mode, non water ATOM/HETATM record lines, identified | |
191 using value of --WaterResidueNames, are extracted and written to new | |
192 PDB file(s). | |
193 | |
194 For *NonHydrogens* mode, ATOM/HETATM record lines containing an | |
195 element symbol other than *H* are extracted and written to new PDB | |
196 file(s). | |
197 | |
198 For all other options, appropriate ATOM/HETATM records are extracted | |
199 to generate new PDB file(s). | |
200 | |
201 --RecordMode option controls type of record lines to extract and | |
202 process from *PDBFile(s)*: ATOM, HETATM or both. | |
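Two of the simpler modes, *CAlphas* and *NonHydrogens*, reduce to a test on a single field of each coordinate record. The Python sketch below illustrates this under stated assumptions: records follow the fixed-column PDB layout (atom name in columns 13-16, element symbol in columns 77-78), and the function names are hypothetical. It is not the script's own code.

```python
def is_alpha_carbon(line):
    """True for ATOM records whose atom name is CA.

    Restricting the test to ATOM records mirrors the documented default
    --RecordMode of Atom for CAlphas mode, and keeps calcium ions
    (HETATM records that are also named CA) out of the result.
    """
    return line.startswith("ATOM") and line[12:16].strip() == "CA"

def is_non_hydrogen(line):
    """True for ATOM/HETATM records whose element symbol is not H.

    Records with a blank element column are kept; that fallback is an
    assumption of this sketch.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return False
    return line[76:78].strip().upper() != "H"

def extract(lines, predicate):
    """Return the records for which `predicate` holds, in file order."""
    return [line for line in lines if predicate(line)]
```

A call such as `extract(lines, is_alpha_carbon)` then corresponds roughly to `-m CAlphas`, and `extract(lines, is_non_hydrogen)` to `-m NonHydrogens`.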
203 | |
204 --ModifyHeader *yes | no* | |
205 Specify whether to modify HEADER record during the generation of new | |
206 PDB files for -m, --mode values of *Chains | Atoms | CAlphas | | |
207 Distance*. Possible values: *yes | no*. Default: *yes*. By default, | |
208 Classification data is replaced by *Data extracted using | |
209 MayaChemTools* before writing out HEADER record. | |
210 | |
211 --NonStandardKeep *yes | no* | |
212 Specify whether to keep non-standard three letter residue codes, | |
213 converted into the code specified using --NonStandardCode option, | |
214 in the sequence file(s) generated during *Sequences* | |
215 value of -m, --mode option. Possible values: *yes | no*. Default: | |
216 *yes*. | |
217 | |
218 A warning is also printed about the presence of non-standard | |
219 residues. Any residue other than standard 20 amino acids and 5 | |
220 nucleic acids is considered non-standard; additionally, HETATM | |
221 residues in chains are also tagged as non-standard. | |
222 | |
223 --NonStandardCode *character* | |
224 A single character code to use for non-standard residues. Default: | |
225 *X*. Possible values: *?, -, or X*. | |
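The interplay of --NonStandardKeep and --NonStandardCode can be sketched as a lookup with a fallback. The Python example below is a simplified illustration, not the script's implementation: it covers only the 20 standard amino acids (the nucleic acid codes mentioned above are omitted), and the function and parameter names are hypothetical.

```python
# One-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def to_sequence(residue_names, keep_nonstandard=True, nonstandard_code="X"):
    """Convert three letter residue names into a one-letter sequence.

    Non-standard residues are either replaced by `nonstandard_code`
    (keep_nonstandard=True) or left out of the sequence entirely.
    """
    if nonstandard_code not in ("?", "-", "X"):
        raise ValueError("nonstandard_code must be one of ?, -, or X")
    letters = []
    for name in residue_names:
        code = THREE_TO_ONE.get(name.strip().upper())
        if code is None:
            if not keep_nonstandard:
                continue  # drop the non-standard residue
            code = nonstandard_code
        letters.append(code)
    return "".join(letters)
```

For a chain containing selenomethionine, `to_sequence(["MET", "MSE", "ALA"])` keeps the position and marks it with the default code, which preserves residue numbering in the resulting sequence.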
226 | |
227 -o, --overwrite | |
228 Overwrite existing files. | |
229 | |
230 -r, --root *rootname* | |
231 New PDB and sequence file name is generated using the root: | |
232 <Root><Mode>.<Ext>. Default new file name: | |
233 <PDBFileName>Chain<ChainID>.pdb for *Chains* mode; | |
234 <PDBFileName>SequenceChain<ChainID>.fasta for *Sequences* mode; | |
235 <PDBFileName>DistanceBy<DistanceMode>.pdb for *Distance* mode; | |
236 <PDBFileName><Mode>.pdb for *Atoms | CAlphas | NonWater | | |
237 NonHydrogens* -m, --mode values. This option is ignored for multiple | |
238 input files. | |
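The naming rules above can be summarized as a small function. This Python sketch reproduces the default names quoted in this section and in the EXAMPLES; how -r, --root combines with the chain and distance suffixes is an assumption (the root is simply substituted for the input file's base name), and `output_name` is a hypothetical helper, not part of the script.

```python
import os

def output_name(pdb_file, mode, root=None, chain_id=None, distance_mode=None):
    """Build the output file name for one input file and one mode.

    `root` stands in for -r, --root; when it is not given, the input
    file name without its extension is used as the base.
    """
    base = root if root else os.path.splitext(os.path.basename(pdb_file))[0]
    if mode == "Chains":
        return f"{base}Chain{chain_id}.pdb"
    if mode == "Sequences":
        return f"{base}SequenceChain{chain_id}.fasta"
    if mode == "Distance":
        return f"{base}DistanceBy{distance_mode}.pdb"
    # Atoms, CAlphas, NonWater, NonHydrogens and similar modes.
    return f"{base}{mode}.pdb"
```

For instance, `output_name("Sample2.pdb", "NonWater", root="Sample2New")` gives `Sample2NewNonWater.pdb`, matching the example that uses `-r Sample2New`.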
239 | |
240 --RecordMode *Atom | Hetatm | AtomAndHetatm* | |
241 Specify type of record lines to extract and process from | |
242 *PDBFile(s)* during various values of -m, --mode option: extract | |
243 only ATOM record lines; extract only HETATM record lines; extract | |
244 both ATOM and HETATM lines. Possible values: *Atom | Hetatm | | |
245 AtomAndHetatm*. Default during *Atoms, CAlphas, AtomNums, | |
246 AtomsRange, AtomNames* values of -m, --mode option: *Atom*; | |
247 otherwise: *AtomAndHetatm*. | |
248 | |
249 This option is ignored during *Chains, Sequences* values of -m, | |
250 --mode option. | |
251 | |
252 --Residues *"ResidueNum,[ResidueNum...]" | | |
253 "StartResidueNum,EndResidueNum" | "ResidueName,[ResidueName...]"* | |
254 Specify which residue records to extract from *PDBFile(s)* during | |
255 *ResidueNums*, *ResiduesRange*, and *ResidueNames* values of -m, | |
256 --mode option: extract records corresponding to residue numbers | |
257 specified in a comma delimited list of residue numbers/names, or | |
258 within the range of start and end residue numbers. Possible values: | |
259 *"ResidueNum[,ResidueNum,..]"*, *StartResidueNum,EndResidueNum*, or | |
260 *"ResidueName[,ResidueName,..]"*. Default: *None*. Examples: | |
261 | |
262 20 | |
263 5,10 | |
264 TYR,SER,THR | |
265 | |
266 --RecordMode option controls type of record lines to extract from | |
267 *PDBFile(s)*: ATOM, HETATM or both. | |
268 | |
269 --SequenceLength *number* | |
270 Maximum sequence length per line in sequence file(s). Default: *80*. | |
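The effect of --SequenceLength is simply to wrap the sequence at a fixed width in the generated file. A minimal Python sketch, assuming a conventional FASTA layout (a ">" header line followed by wrapped sequence lines); the function name and the sequence ID used in the usage note are hypothetical, and the script's exact header text is not reproduced here.

```python
def format_fasta(seq_id, sequence, line_length=80):
    """Return a FASTA entry with at most `line_length` residues per line."""
    if line_length < 1:
        raise ValueError("line_length must be positive")
    rows = [sequence[i:i + line_length]
            for i in range(0, len(sequence), line_length)]
    return "\n".join([f">{seq_id}"] + rows) + "\n"
```

With the default of 80, a 100-residue chain passed as `format_fasta("Sample2ChainA", seq)` is written as one header line, one line of 80 residues, and one line of 20.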
271 | |
272 --SequenceRecords *Atom | SeqRes* | |
273 Specify which records to use for extracting residue names from | |
274 *PDBFile(s)* during *Sequences* value of -m, --mode option: use | |
275 ATOM records to compile a list of residues in a chain or parse | |
276 SEQRES record to get a list of residues. Possible values: *Atom | | |
277 SeqRes*. Default: *Atom*. | |
278 | |
279 --SequenceIDPrefix *FileName | HeaderRecord | Automatic* | |
280 Specify how to generate a prefix for sequence IDs during *Sequences* | |
281 value of -m, --mode option: use input file name prefix; retrieve PDB | |
282 ID from HEADER record; or automatically decide the method for | |
283 generating the prefix. The chain IDs are also appended to the | |
284 prefix. Possible values: *FileName | HeaderRecord | Automatic*. | |
285 Default: *Automatic* | |
286 | |
287 --WaterResidueNames *Automatic | "ResidueName,[ResidueName,...]"* | |
288 Identification of water residues during *NonWater* value of -m, | |
289 --mode option. Possible values: *Automatic | | |
290 "ResidueName,[ResidueName,...]"*. Default: *Automatic* - corresponds | |
291 to "HOH,WAT,H20". You can also specify a different comma delimited | |
292 list of residue names to use for water. | |
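Water removal in *NonWater* mode comes down to comparing each record's residue name against the configured list. The sketch below is a Python illustration rather than the script's code; it assumes fixed-column PDB records (residue name in columns 18-20), uses the *Automatic* list exactly as quoted above, and introduces a hypothetical function name.

```python
# The Automatic list as quoted in this documentation.
DEFAULT_WATER_NAMES = ("HOH", "WAT", "H20")

def non_water_records(lines, water_names=DEFAULT_WATER_NAMES):
    """Return ATOM/HETATM records whose residue name is not a water name.

    `water_names` plays the role of --WaterResidueNames; comparing names
    case-insensitively is an assumption of this sketch.
    """
    names = {name.strip().upper() for name in water_names}
    kept = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # only coordinate records are considered here
        if line[17:20].strip().upper() in names:
            continue  # water residue: drop it
        kept.append(line)
    return kept
```

Passing `water_names=("HOH", "WAT")` corresponds to `--WaterResidueNames "HOH,WAT"` in the examples below.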
293 | |
294 -w, --WorkingDir *dirname* | |
295 Location of working directory. Default: current directory. | |
296 | |
297 EXAMPLES | |
298 To extract non-water records from Sample2.pdb file and generate | |
299 Sample2NonWater.pdb file, type: | |
300 | |
301 % ExtractFromPDBFiles.pl Sample2.pdb | |
302 | |
303 To extract non-water records corresponding to only ATOM records from | |
304 Sample2.pdb file and generate Sample2NonWater.pdb file, type: | |
305 | |
306 % ExtractFromPDBFiles.pl --RecordMode Atom Sample2.pdb | |
307 | |
308 To extract non-water records from Sample2.pdb file using HOH or WAT | |
309 residue name for water along with all old non-coordinate records and | |
310 generate Sample2NewNonWater.pdb file, type: | |
311 | |
312 % ExtractFromPDBFiles.pl -m NonWater --WaterResidueNames "HOH,WAT" | |
313 --KeepOldRecords Yes -r Sample2New -o Sample2.pdb | |
314 | |
315 To extract non-hydrogen records from Sample2.pdb file and generate | |
316 Sample2NonHydrogens.pdb file, type: | |
317 | |
318 % ExtractFromPDBFiles.pl -m NonHydrogens Sample2.pdb | |
319 | |
320 To extract data for first chain in Sample2.pdb and generate | |
321 Sample2ChainA.pdb file, type: | |
322 | |
323 % ExtractFromPDBFiles.pl -m chains -o Sample2.pdb | |
324 | |
325 To extract data for both chains in Sample2.pdb and generate | |
326 Sample2ChainA.pdb and Sample2ChainB.pdb, type: | |
327 | |
328 % ExtractFromPDBFiles.pl -m chains -c All -o Sample2.pdb | |
329 | |
330 To extract data for alpha carbons in Sample2.pdb and generate | |
331 Sample2CAlphas.pdb, type: | |
332 | |
333 % ExtractFromPDBFiles.pl -m CAlphas -o Sample2.pdb | |
334 | |
335 To extract records for specific residue numbers in all chains from | |
336 Sample2.pdb file and generate Sample2ResidueNums.pdb file, type: | |
337 | |
338 % ExtractFromPDBFiles.pl -m ResidueNums --Residues "3,6" | |
339 Sample2.pdb | |
340 | |
341 To extract records for a specific range of residue numbers in all chains | |
342 from Sample2.pdb file and generate Sample2ResiduesRange.pdb file, type: | |
343 | |
344 % ExtractFromPDBFiles.pl -m ResiduesRange --Residues "10,30" | |
345 Sample2.pdb | |
346 | |
347 To extract data for all ATOM and HETATM records within 10 angstroms of | |
348 an atom specified by atom serial number and name "1,N" in Sample2.pdb | |
349 file and generate Sample2DistanceByAtom.pdb, type: | |
350 | |
351 % ExtractFromPDBFiles.pl -m Distance --DistanceMode Atom | |
352 --DistanceOrigin "1,N" -k No --distance 10 -o Sample2.pdb | |
353 | |
354 To extract data for all ATOM and HETATM records for complete residues | |
355 with any atom or hetatm less than 10 angstroms from an atom specified by | |
356 atom serial number and name "1,N" in Sample2.pdb file and generate | |
357 Sample2DistanceByAtom.pdb, type: | |
358 | |
359 % ExtractFromPDBFiles.pl -m Distance --DistanceMode Atom | |
360 --DistanceOrigin "1,N" --DistanceSelectionMode ByResidue | |
361 -k No --distance 10 -o Sample2.pdb | |
362 | |
363 To extract data for all ATOM and HETATM records within 25 angstroms of | |
364 an arbitrary point "0,0,0" in Sample2.pdb file and generate | |
365 Sample2DistanceByXYZ.pdb, type: | |
366 | |
367 % ExtractFromPDBFiles.pl -m Distance --DistanceMode XYZ | |
368 --DistanceOrigin "0,0,0" -k No --distance 25 -o Sample2.pdb | |
369 | |
370 AUTHOR | |
371 Manish Sud <msud@san.rr.com> | |
372 | |
373 SEE ALSO | |
374 InfoPDBFiles.pl, ModifyPDBFiles.pl | |
375 | |
376 COPYRIGHT | |
377 Copyright (C) 2015 Manish Sud. All rights reserved. | |
378 | |
379 This file is part of MayaChemTools. | |
380 | |
381 MayaChemTools is free software; you can redistribute it and/or modify it | |
382 under the terms of the GNU Lesser General Public License as published by | |
383 the Free Software Foundation; either version 3 of the License, or (at | |
384 your option) any later version. | |
385 |