NAME
    ExtractFromPDBFiles.pl - Extract specific data from PDBFile(s)

SYNOPSIS
    ExtractFromPDBFiles.pl PDBFile(s)...

    ExtractFromPDBFiles.pl [-a, --Atoms "AtomNum,[AtomNum...]" |
    "StartAtomNum,EndAtomNum" | "AtomName,[AtomName...]"] [-c, --chains
    First | All | "ChainID,[ChainID,...]"] [--CombineChains yes | no]
    [-d, --distance number] [--DistanceMode Atom | Hetatm | Residue | XYZ]
    [--DistanceOrigin "AtomNumber,AtomName" | "HetatmNumber,HetAtmName" |
    "ResidueNumber,ResidueName[,ChainID]" | "X,Y,Z"]
    [--DistanceSelectionMode ByAtom | ByResidue] [-h, --help] [-k,
    --KeepOldRecords yes | no] [-m, --mode Chains | Sequences | Atoms |
    CAlphas | AtomNums | AtomsRange | AtomNames | ResidueNums |
    ResiduesRange | ResidueNames | Distance | NonWater | NonHydrogens]
    [--ModifyHeader yes | no] [--NonStandardKeep yes | no]
    [--NonStandardCode character] [-o, --overwrite] [-r, --root rootname]
    [--RecordMode Atom | Hetatm | AtomAndHetatm] [--Residues
    "ResidueNum,[ResidueNum...]" | "StartResidueNum,EndResidueNum" |
    "ResidueName,[ResidueName...]"]
    [--SequenceLength number] [--SequenceRecords Atom | SeqRes]
    [--SequenceIDPrefix FileName | HeaderRecord | Automatic]
    [--WaterResidueNames Automatic | "ResidueName,[ResidueName,...]"] [-w,
    --WorkingDir dirname] PDBFile(s)...

DESCRIPTION
    Extract specific data from *PDBFile(s)* and generate appropriate PDB or
    sequence file(s). Multiple PDBFile names are separated by spaces. The
    valid file extension is *.pdb*. All other file name extensions are
    ignored during the wild card expansion. All the PDB files in the current
    directory can be specified either by **.pdb* or the current directory
    name.

    For the *Chains* and *Sequences* values of the -m, --mode option, all
    ATOM/HETATM records for chains after the first model in PDB files
    containing data for multiple models are ignored.

OPTIONS
    -a, --Atoms *"AtomNum,[AtomNum...]" | "StartAtomNum,EndAtomNum" |
    "AtomName,[AtomName...]"*
        Specify which atom records to extract from *PDBFile(s)* during
        *AtomNums*, *AtomsRange*, and *AtomNames* values of -m, --mode
        option: extract records corresponding to the atom numbers or names
        specified in a comma delimited list, or within the range of
        start and end atom numbers. Possible values:
        *"AtomNum[,AtomNum,..]"*, *StartAtomNum,EndAtomNum*, or
        *"AtomName[,AtomName,..]"*. Default: *None*. Examples:

            10
            15,20
            N,CA,C,O

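        The three value forms accepted by -a, --Atoms can be told apart
        only by the -m, --mode value in effect. The following is a minimal
        Python sketch of that interpretation, not the script's own Perl
        code; the function name parse_atoms_value and its error handling
        are assumptions made for illustration.

```python
def parse_atoms_value(mode, value):
    """Interpret an -a, --Atoms value for a given -m, --mode value.

    Hypothetical helper: the real script is written in Perl and may
    validate its input differently.
    """
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty --Atoms value")

    if mode == "AtomNums":
        # Comma delimited list of atom serial numbers, e.g. "10" or "3,7,9".
        return [int(p) for p in parts]

    if mode == "AtomsRange":
        # Exactly two numbers: start and end of the atom number range.
        if len(parts) != 2:
            raise ValueError("AtomsRange needs StartAtomNum,EndAtomNum")
        start, end = int(parts[0]), int(parts[1])
        if start > end:
            raise ValueError("start atom number exceeds end atom number")
        return (start, end)

    if mode == "AtomNames":
        # Comma delimited list of atom names, e.g. "N,CA,C,O".
        return parts

    raise ValueError(f"--Atoms is not used in mode {mode}")
```

        With this sketch, "15,20" yields the pair (15, 20) in *AtomsRange*
        mode but the list [15, 20] in *AtomNums* mode, which is why the
        mode has to be known before the value is parsed.
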
    -c, --chains *First | All | ChainID,[ChainID,...]*
        Specify which chains to extract from *PDBFile(s)* during *Chains |
        Sequences* value of -m, --mode option: first chain, all chains, or a
        specific list of comma delimited chain IDs. Possible values: *First
        | All | ChainID,[ChainID,...]*. Default: *First*. Examples:

            A
            A,B
            All

    --CombineChains *yes | no*
        Specify whether to combine extracted chains data into a single file
        during *Chains* or *Sequences* value of -m, --mode option. Possible
        values: *yes | no*. Default: *no*.

        During *Chains* value of -m, --mode option with *yes* value of
        --CombineChains, extracted data for specified chains is written
        into a single file instead of an individual file for each chain.

        During *Sequences* value of -m, --mode option with *yes* value of
        --CombineChains, residue sequences for specified chains are
        extracted and concatenated into a single sequence file instead of
        an individual file for each chain.

    -d, --distance *number*
        Specify distance used to extract ATOM/HETATM records during
        *Distance* value of -m, --mode option. Default: *10.0* angstroms.

        --RecordMode option controls type of record lines to extract from
        *PDBFile(s)*: ATOM, HETATM or both.

    --DistanceMode *Atom | Hetatm | Residue | XYZ*
        Specify how to extract ATOM/HETATM records from *PDBFile(s)* during
        *Distance* value of -m, --mode option: extract all the records
        within a certain distance specified by -d, --distance from an atom
        or hetero atom record, a residue, or any arbitrary point. Possible
        values: *Atom | Hetatm | Residue | XYZ*. Default: *XYZ*.

        During *Residue* value of --DistanceMode, distance of ATOM/HETATM
        records is calculated from all the atoms in the residue and the
        records are selected as long as any atom of the residue lies within
        the distance specified using -d, --distance option.

        --RecordMode option controls type of record lines to extract from
        *PDBFile(s)*: ATOM, HETATM or both.

    --DistanceSelectionMode *ByAtom | ByResidue*
        Specify how to extract ATOM/HETATM records from *PDBFile(s)*
        during *Distance* value of -m, --mode option for all values of
        --DistanceMode option. *ByAtom* extracts only those ATOM/HETATM
        records that meet the specified distance criterion; *ByResidue*
        extracts all records corresponding to a residue as long as one of
        the ATOM/HETATM records in the residue satisfies the specified
        distance criterion. Possible values: *ByAtom, ByResidue*. Default
        value: *ByAtom*.

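        The difference between *ByAtom* and *ByResidue* selection can be
        illustrated with a short Python sketch. It is not taken from the
        script itself: the record layout (serial number, residue key,
        coordinates), the function name select_by_distance, and the use of
        "less than or equal to" for the distance test are all assumptions.

```python
import math


def select_by_distance(records, origin, cutoff, selection_mode="ByAtom"):
    """Select records within `cutoff` angstroms of `origin`.

    records: list of (serial, residue_key, (x, y, z)) tuples, where
    residue_key identifies the residue, e.g. (chain_id, residue_number).
    This layout is hypothetical and chosen only for the illustration.
    """
    # Records that individually satisfy the distance criterion.
    hits = [r for r in records if math.dist(r[2], origin) <= cutoff]

    if selection_mode == "ByAtom":
        return hits

    if selection_mode == "ByResidue":
        # Keep every record of any residue that has at least one hit.
        residues = {r[1] for r in hits}
        return [r for r in records if r[1] in residues]

    raise ValueError(f"unknown selection mode: {selection_mode}")


records = [
    (1, ("A", 1), (0.0, 0.0, 0.0)),
    (2, ("A", 1), (12.0, 0.0, 0.0)),   # same residue, outside the cutoff
    (3, ("A", 2), (30.0, 0.0, 0.0)),   # different residue, outside
]
by_atom = select_by_distance(records, (0.0, 0.0, 0.0), 10.0, "ByAtom")
by_residue = select_by_distance(records, (0.0, 0.0, 0.0), 10.0, "ByResidue")
```

        Here by_atom holds only record 1, while by_residue also holds
        record 2 because it belongs to the same residue as a record that
        met the criterion.
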
    --DistanceOrigin *"AtomNumber,AtomName" | "HetatmNumber,HetAtmName" |
    "ResidueNumber,ResidueName[,ChainID]" | "X,Y,Z"*
        This value is --DistanceMode specific. In general, it identifies a
        point used to select other ATOM/HETATM records within a specific
        distance from this point.

        For *Atom* value of --DistanceMode, this option corresponds to an
        atom specification. Format: *AtomNumber,AtomName*. Example:

            455,CA

        For *Hetatm* value of --DistanceMode, this option corresponds to a
        hetatm specification. Format: *HetatmNumber,HetAtmName*. Example:

            5295,C1

        For *Residue* value of --DistanceMode, this option corresponds to a
        residue specification. Format:
        *ResidueNumber,ResidueName[,ChainID]*. Examples:

            78,MSE
            977,RET,A
            978,RET,B

        For *XYZ* value of --DistanceMode, this option corresponds to the
        coordinates of an arbitrary point. Format: *X,Y,Z*. Example:

            10.044,19.261,-4.292

        --RecordMode option controls type of record lines to extract from
        *PDBFile(s)*: ATOM, HETATM or both.

    -h, --help
        Print this help message.

    -k, --KeepOldRecords *yes | no*
        Specify whether to transfer old non ATOM and HETATM records from
        input PDBFile(s) to new PDBFile(s) during *Chains | Atoms | HetAtms
        | CAlphas | Distance | NonWater | NonHydrogens* value of -m, --mode
        option. By default, except for the HEADER record, all other
        unnecessary non ATOM/HETATM records are dropped during the
        generation of new PDB files. Possible values: *yes | no*. Default:
        *no*.

    -m, --mode *Chains | Sequences | Atoms | CAlphas | AtomNums | AtomsRange
    | AtomNames | ResidueNums | ResiduesRange | ResidueNames | Distance |
    NonWater | NonHydrogens*
        Specify what to extract from *PDBFile(s)*: *Chains* - retrieve
        records for specified chains; *Sequences* - generate sequence files
        for specific chains; *Atoms* - extract atom records; *CAlphas* -
        extract atom records for alpha carbon atoms; *AtomNums* - extract
        atom records for specified atom numbers; *AtomsRange* - extract atom
        records within specified atom number range; *AtomNames* - extract
        atom records for specified atom names; *ResidueNums* - extract
        records for specified residue numbers; *ResiduesRange* - extract
        records for residues within specified residue number range;
        *ResidueNames* - extract records for specified residue names;
        *Distance* - extract records within a certain distance from a
        specific position; *NonWater* - extract records corresponding to
        residues other than water; *NonHydrogens* - extract non-hydrogen
        records.

        Possible values: *Chains, Sequences, Atoms, CAlphas, AtomNums,
        AtomsRange, AtomNames, ResidueNums, ResiduesRange, ResidueNames,
        Distance, NonWater, NonHydrogens*. Default value: *NonWater*

        During the generation of new PDB files, unnecessary CONECT records
        are dropped.

        For *Chains* mode, data for appropriate chains specified by -c,
        --chains option is extracted from *PDBFile(s)* and placed into new
        PDB file(s).

        For *Sequences* mode, residue names are extracted, according to
        the various sequence related options, for chains specified by -c,
        --chains option from *PDBFile(s)* and FASTA sequence file(s) are
        generated.

        For *Distance* mode, all ATOM/HETATM records within a distance
        specified by -d, --distance option from a specific atom, residue or
        a point indicated by --DistanceMode are extracted and placed into
        new PDB file(s).

        For *NonWater* mode, non water ATOM/HETATM record lines, identified
        using value of --WaterResidueNames, are extracted and written to new
        PDB file(s).

        For *NonHydrogens* mode, ATOM/HETATM record lines containing
        element symbol other than *H* are extracted and written to new PDB
        file(s).

        For all other modes, appropriate ATOM/HETATM records are extracted
        to generate new PDB file(s).

        --RecordMode option controls type of record lines to extract and
        process from *PDBFile(s)*: ATOM, HETATM or both.

    --ModifyHeader *yes | no*
        Specify whether to modify HEADER record during the generation of new
        PDB files for -m, --mode values of *Chains | Atoms | CAlphas |
        Distance*. Possible values: *yes | no*. Default: *yes*. By default,
        Classification data is replaced by *Data extracted using
        MayaChemTools* before writing out HEADER record.

    --NonStandardKeep *yes | no*
        Specify whether to convert non-standard three letter residue codes
        into the code specified using --NonStandardCode option and include
        them in sequence file(s) generated during *Sequences* value of -m,
        --mode option. Possible values: *yes | no*. Default: *yes*.

        A warning is also printed about the presence of non-standard
        residues. Any residue other than the standard 20 amino acids and 5
        nucleic acids is considered non-standard; additionally, HETATM
        residues in chains are also tagged as non-standard.

    --NonStandardCode *character*
        A single character code to use for non-standard residues. Default:
        *X*. Possible values: *?, -, or X*.

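        The combined effect of --NonStandardKeep and --NonStandardCode on
        a sequence can be sketched in Python as follows. This is an
        illustration under stated assumptions, not the script's code: the
        function name residues_to_sequence is hypothetical, only the 20
        standard amino acids are mapped (nucleic acids are left out for
        brevity), and dropping non-standard residues when
        --NonStandardKeep is *no* is inferred from the option description.

```python
# One-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residues_to_sequence(residue_names, keep_nonstandard=True,
                         nonstandard_code="X"):
    """Convert three-letter residue names to a one-letter sequence.

    Hypothetical helper: non-standard residues become `nonstandard_code`
    when kept, and are skipped otherwise (an assumed reading of
    --NonStandardKeep no).
    """
    if nonstandard_code not in ("?", "-", "X"):
        raise ValueError("non-standard code must be one of ?, -, X")

    letters = []
    for name in residue_names:
        code = THREE_TO_ONE.get(name.strip().upper())
        if code is not None:
            letters.append(code)
        elif keep_nonstandard:
            letters.append(nonstandard_code)
    return "".join(letters)
```

        For the residue list MET, MSE, ALA this sketch gives "MXA" with
        the defaults and "MA" when non-standard residues are not kept.
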
    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New PDB and sequence file names are generated using the root:
        <Root><Mode>.<Ext>. Default new file name:
        <PDBFileName>Chain<ChainID>.pdb for *Chains* mode;
        <PDBFileName>SequenceChain<ChainID>.fasta for *Sequences* mode;
        <PDBFileName>DistanceBy<DistanceMode>.pdb for *Distance* mode;
        <PDBFileName><Mode>.pdb for *Atoms | CAlphas | NonWater |
        NonHydrogens* -m, --mode values. This option is ignored for multiple
        input files.

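        The naming patterns listed for -r, --root can be summarized in a
        small Python sketch. The function default_output_name is
        hypothetical, and two points are assumptions rather than
        documented behavior: that a supplied root simply replaces
        <PDBFileName> in every pattern, and that modes not listed above
        follow the <PDBFileName><Mode>.pdb pattern. File names for
        --CombineChains yes are not covered.

```python
import os


def default_output_name(pdb_file, mode, chain_id=None, distance_mode=None,
                        root=None):
    """Build an output file name from the documented naming patterns.

    Hypothetical helper for illustration only; a root, when given, is
    assumed to replace the input file's base name.
    """
    base = root or os.path.splitext(os.path.basename(pdb_file))[0]

    if mode == "Chains":
        return f"{base}Chain{chain_id}.pdb"
    if mode == "Sequences":
        return f"{base}SequenceChain{chain_id}.fasta"
    if mode == "Distance":
        return f"{base}DistanceBy{distance_mode}.pdb"
    # Atoms, CAlphas, NonWater, NonHydrogens and, by assumption, the rest.
    return f"{base}{mode}.pdb"
```

        For example, default_output_name("Sample2.pdb", "NonWater",
        root="Sample2New") returns "Sample2NewNonWater.pdb", matching the
        file name used in the EXAMPLES section.
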
    --RecordMode *Atom | Hetatm | AtomAndHetatm*
        Specify type of record lines to extract and process from
        *PDBFile(s)* during various values of -m, --mode option: extract
        only ATOM record lines; extract only HETATM record lines; extract
        both ATOM and HETATM lines. Possible values: *Atom | Hetatm |
        AtomAndHetatm*. Default during *Atoms, CAlphas, AtomNums,
        AtomsRange, AtomNames* values of -m, --mode option: *Atom*;
        otherwise: *AtomAndHetatm*.

        This option is ignored during *Chains, Sequences* values of -m,
        --mode option.

    --Residues *"ResidueNum,[ResidueNum...]" |
    "StartResidueNum,EndResidueNum" | "ResidueName,[ResidueName...]"*
        Specify which residue records to extract from *PDBFile(s)* during
        *ResidueNums*, *ResiduesRange*, and *ResidueNames* values of -m,
        --mode option: extract records corresponding to the residue numbers
        or names specified in a comma delimited list, or within the range
        of start and end residue numbers. Possible values:
        *"ResidueNum[,ResidueNum,..]"*, *StartResidueNum,EndResidueNum*, or
        *"ResidueName[,ResidueName,..]"*. Default: *None*. Examples:

            20
            5,10
            TYR,SER,THR

        --RecordMode option controls type of record lines to extract from
        *PDBFile(s)*: ATOM, HETATM or both.

    --SequenceLength *number*
        Maximum sequence length per line in sequence file(s). Default: *80*.

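        To make the effect of --SequenceLength concrete, the sketch below
        wraps a one-letter sequence at a maximum line length and prepends
        a FASTA header line. It is written in Python for illustration;
        the function name format_fasta and the exact header layout are
        assumptions, not the script's actual output format.

```python
def format_fasta(seq_id, sequence, max_len=80):
    """Return a FASTA entry with at most `max_len` residues per line.

    Hypothetical helper: the header here is just ">" followed by the
    sequence ID, which may differ from what the script writes.
    """
    if max_len < 1:
        raise ValueError("sequence length per line must be positive")

    lines = [f">{seq_id}"]
    # Slice the sequence into consecutive chunks of max_len characters.
    for start in range(0, len(sequence), max_len):
        lines.append(sequence[start:start + max_len])
    return "\n".join(lines) + "\n"
```

        A 170-residue sequence formatted with the default of 80 produces
        three sequence lines of 80, 80, and 10 characters.
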
    --SequenceRecords *Atom | SeqRes*
        Specify which records to use for extracting residue names from
        *PDBFile(s)* during *Sequences* value of -m, --mode option: use
        ATOM records to compile a list of residues in a chain or parse
        SEQRES records to get a list of residues. Possible values: *Atom |
        SeqRes*. Default: *Atom*.

    --SequenceIDPrefix *FileName | HeaderRecord | Automatic*
        Specify how to generate a prefix for sequence IDs during *Sequences*
        value of -m, --mode option: use input file name prefix; retrieve PDB
        ID from HEADER record; or automatically decide the method for
        generating the prefix. The chain IDs are also appended to the
        prefix. Possible values: *FileName | HeaderRecord | Automatic*.
        Default: *Automatic*

    --WaterResidueNames *Automatic | "ResidueName,[ResidueName,...]"*
        Specify how water residues are identified during *NonWater* value
        of -m, --mode option. Possible values: *Automatic |
        "ResidueName,[ResidueName,...]"*. Default: *Automatic* - corresponds
        to "HOH,WAT,H20". You can also specify a different comma delimited
        list of residue names to use for water.

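        The *NonWater* filtering driven by --WaterResidueNames can be
        sketched in Python as below. The residue name is read from
        columns 18-20 of each ATOM/HETATM line, as laid out in the PDB
        file format. The function name non_water_lines is hypothetical,
        and the default name list is copied verbatim from the description
        above rather than from the script's source.

```python
def non_water_lines(lines, water_names=("HOH", "WAT", "H20")):
    """Return ATOM/HETATM lines whose residue is not a water residue.

    Hypothetical helper; default water names are copied as written in
    the option description ("HOH,WAT,H20").
    """
    water = {name.strip().upper() for name in water_names}
    kept = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # only coordinate records are considered here
        # Residue name occupies columns 18-20 (0-based slice 17:20).
        residue_name = line[17:20].strip().upper()
        if residue_name not in water:
            kept.append(line)
    return kept


def make_line(record, serial, atom, residue):
    """Build a minimal fixed-column coordinate line for the demo."""
    return f"{record:<6}{serial:>5} {atom:<4} {residue:>3} A{1:>4}"


demo = [
    make_line("ATOM", 1, "N", "MET"),
    make_line("HETATM", 1000, "O", "HOH"),
    make_line("HETATM", 1001, "C1", "RET"),
]
kept = non_water_lines(demo)
```

        In this demo kept contains the MET and RET lines; passing
        water_names=("HOH", "RET") would instead keep only the MET line.
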
    -w, --WorkingDir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To extract non-water records from Sample2.pdb file and generate
    Sample2NonWater.pdb file, type:

        % ExtractFromPDBFiles.pl Sample2.pdb

    To extract non-water records corresponding to only ATOM records from
    Sample2.pdb file and generate Sample2NonWater.pdb file, type:

        % ExtractFromPDBFiles.pl --RecordMode Atom Sample2.pdb

    To extract non-water records from Sample2.pdb file using HOH or WAT
    residue name for water along with all old non-coordinate records and
    generate Sample2NewNonWater.pdb file, type:

        % ExtractFromPDBFiles.pl -m NonWater --WaterResidueNames "HOH,WAT"
          --KeepOldRecords Yes -r Sample2New -o Sample2.pdb

    To extract non-hydrogen records from Sample2.pdb file and generate
    Sample2NonHydrogens.pdb file, type:

        % ExtractFromPDBFiles.pl -m NonHydrogens Sample2.pdb

    To extract data for first chain in Sample2.pdb and generate
    Sample2ChainA.pdb, type:

        % ExtractFromPDBFiles.pl -m chains -o Sample2.pdb

    To extract data for both chains in Sample2.pdb and generate
    Sample2ChainA.pdb and Sample2ChainB.pdb, type:

        % ExtractFromPDBFiles.pl -m chains -c All -o Sample2.pdb

    To extract data for alpha carbons in Sample2.pdb and generate
    Sample2CAlphas.pdb, type:

        % ExtractFromPDBFiles.pl -m CAlphas -o Sample2.pdb

    To extract records for specific residue numbers in all chains from
    Sample2.pdb file and generate Sample2ResidueNums.pdb file, type:

        % ExtractFromPDBFiles.pl -m ResidueNums --Residues "3,6"
          Sample2.pdb

    To extract records for a specific range of residue numbers in all chains
    from Sample2.pdb file and generate Sample2ResiduesRange.pdb file, type:

        % ExtractFromPDBFiles.pl -m ResiduesRange --Residues "10,30"
          Sample2.pdb

    To extract data for all ATOM and HETATM records within 10 angstroms of
    an atom specified by atom serial number and name "1,N" in Sample2.pdb
    file and generate Sample2DistanceByAtom.pdb, type:

        % ExtractFromPDBFiles.pl -m Distance --DistanceMode Atom
          --DistanceOrigin "1,N" -k No --distance 10 -o Sample2.pdb

    To extract data for all ATOM and HETATM records for complete residues
    with any atom or hetatm within 10 angstroms of an atom specified by
    atom serial number and name "1,N" in Sample2.pdb file and generate
    Sample2DistanceByAtom.pdb, type:

        % ExtractFromPDBFiles.pl -m Distance --DistanceMode Atom
          --DistanceOrigin "1,N" --DistanceSelectionMode ByResidue
          -k No --distance 10 -o Sample2.pdb

    To extract data for all ATOM and HETATM records within 25 angstroms of
    an arbitrary point "0,0,0" in Sample2.pdb file and generate
    Sample2DistanceByXYZ.pdb, type:

        % ExtractFromPDBFiles.pl -m Distance --DistanceMode XYZ
          --DistanceOrigin "0,0,0" -k No --distance 25 -o Sample2.pdb

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    InfoPDBFiles.pl, ModifyPDBFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.