NAME
    ExtractFromPDBFiles.pl - Extract specific data from PDBFile(s)

SYNOPSIS
    ExtractFromPDBFiles.pl PDBFile(s)...

    ExtractFromPDBFiles.pl [-a, --Atoms "AtomNum,[AtomNum...]" |
    "StartAtomNum,EndAtomNum" | "AtomName,[AtomName...]"] [-c, --chains
    First | All | "ChainID,[ChainID,...]"] [--CombineChains yes | no]
    [-d, --distance number] [--DistanceMode Atom | Hetatm | Residue | XYZ]
    [--DistanceOrigin "AtomNumber,AtomName" | "HetatmNumber,HetAtmName" |
    "ResidueNumber,ResidueName[,ChainID]" | "X,Y,Z"]
    [--DistanceSelectionMode ByAtom | ByResidue] [-h, --help] [-k,
    --KeepOldRecords yes | no] [-m, --mode Chains | Sequences | Atoms |
    CAlphas | AtomNums | AtomsRange | AtomNames | ResidueNums |
    ResiduesRange | ResidueNames | Distance | NonWater | NonHydrogens]
    [--ModifyHeader yes | no] [--NonStandardKeep yes | no]
    [--NonStandardCode character] [-o, --overwrite] [-r, --root rootname]
    [--RecordMode Atom | Hetatm | AtomAndHetatm] [--Residues
    "ResidueNum,[ResidueNum...]" | "StartResidueNum,EndResidueNum" |
    "ResidueName,[ResidueName...]"] [--SequenceLength number]
    [--SequenceRecords Atom | SeqRes]
    [--SequenceIDPrefix FileName | HeaderRecord | Automatic]
    [--WaterResidueNames Automatic | "ResidueName,[ResidueName,...]"] [-w,
    --WorkingDir dirname] PDBFile(s)...

DESCRIPTION
    Extract specific data from *PDBFile(s)* and generate appropriate PDB or
    sequence file(s). Multiple PDBFile names are separated by spaces. The
    valid file extension is *.pdb*. All other file name extensions are
    ignored during wildcard expansion. All the PDB files in the current
    directory can be specified either by "*.pdb" or by the current
    directory name.

    For the *Chains* and *Sequences* values of the -m, --mode option,
    ATOM/HETATM records for chains in all models after the first one are
    ignored in PDB files containing data for multiple models.
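
    The first-model rule can be illustrated with a short Python sketch.
    It is not the MayaChemTools code: the function name is hypothetical,
    and it assumes that models are delimited by standard PDB MODEL/ENDMDL
    records, so a file without ENDMDL is treated as a single model.

```python
def first_model_records(lines):
    """Yield ATOM/HETATM lines belonging to the first model only.

    Hypothetical helper: assumes models are delimited by MODEL/ENDMDL
    records; a file without MODEL records is treated as a single model.
    """
    for line in lines:
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "ENDMDL":
            break  # records of the second and later models are ignored
        if record in ("ATOM", "HETATM"):
            yield line
```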

OPTIONS
    -a, --Atoms *"AtomNum,[AtomNum...]" | "StartAtomNum,EndAtomNum" |
    "AtomName,[AtomName...]"*
        Specify which atom records to extract from *PDBFile(s)* during the
        *AtomNums*, *AtomsRange*, and *AtomNames* values of the -m, --mode
        option: extract records corresponding to the atom numbers or names
        specified in a comma delimited list, or the records within the range
        of the start and end atom numbers. Possible values:
        *"AtomNum[,AtomNum,..]"*, *StartAtomNum,EndAtomNum*, or
        *"AtomName[,AtomName,..]"*. Default: *None*. Examples:

            10
            15,20
            N,CA,C,O
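
        A minimal Python sketch of how the three --Atoms forms might be
        applied to fixed-column PDB lines (serial number in columns 7-11,
        atom name in columns 13-16). The function names are hypothetical,
        and treating the range as inclusive is an assumption.

```python
def atom_fields(line):
    """Return (serial number, atom name) from a fixed-column ATOM/HETATM line."""
    return int(line[6:11]), line[12:16].strip()  # columns 7-11 and 13-16


def select_atoms(lines, mode, value):
    """Hypothetical filter mirroring the described -a, --Atoms semantics.

    mode: "AtomNums", "AtomsRange", or "AtomNames".
    value: the comma delimited --Atoms string, e.g. "10", "15,20", "N,CA,C,O".
    """
    items = [item.strip() for item in value.split(",") if item.strip()]
    if mode == "AtomNums":
        numbers = {int(item) for item in items}

        def keep(serial, name):
            return serial in numbers
    elif mode == "AtomsRange":
        if len(items) != 2:
            raise ValueError("AtomsRange expects StartAtomNum,EndAtomNum")
        start, end = int(items[0]), int(items[1])

        def keep(serial, name):
            return start <= serial <= end  # inclusive range (assumed)
    elif mode == "AtomNames":
        names = set(items)

        def keep(serial, name):
            return name in names
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return [line for line in lines if keep(*atom_fields(line))]
```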

    -c, --chains *First | All | ChainID,[ChainID,...]*
        Specify which chains to extract from *PDBFile(s)* during the *Chains
        | Sequences* values of the -m, --mode option: the first chain, all
        chains, or a comma delimited list of specific chain IDs. Possible
        values: *First | All | ChainID,[ChainID,...]*. Default: *First*.
        Examples:

            A
            A,B
            All
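
        The following Python sketch shows one way to resolve the -c,
        --chains value against the chain IDs (column 22) found in a file.
        The helper names are hypothetical. Matching *First* and *All*
        case-insensitively is an assumption, as is raising an error for an
        unknown chain ID.

```python
def resolve_chains(chain_ids, chains_option="First"):
    """Hypothetical resolver for the -c, --chains value.

    chain_ids: chain IDs found in a PDB file, in order of first appearance.
    chains_option: "First", "All", or a comma delimited list such as "A,B".
    """
    option = chains_option.strip()
    if option.lower() == "first":  # case-insensitive match is an assumption
        return list(chain_ids[:1])
    if option.lower() == "all":
        return list(chain_ids)
    requested = [c.strip() for c in option.split(",") if c.strip()]
    missing = [c for c in requested if c not in chain_ids]
    if missing:
        # This sketch raises; the real script may report the problem differently.
        raise ValueError(f"chain(s) not found: {','.join(missing)}")
    return requested


def chain_ids_in_order(lines):
    """Collect chain IDs (column 22) from ATOM/HETATM lines in order of appearance."""
    seen = []
    for line in lines:
        if line[:6].strip() in ("ATOM", "HETATM") and line[21] not in seen:
            seen.append(line[21])
    return seen
```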

    --CombineChains *yes | no*
        Specify whether to combine the extracted chains data into a single
        file during the *Chains* or *Sequences* values of the -m, --mode
        option. Possible values: *yes | no*. Default: *no*.

        During the *Chains* value of the -m, --mode option with the *Yes*
        value of --CombineChains, extracted data for the specified chains is
        written into a single file instead of an individual file for each
        chain.

        During the *Sequences* value of the -m, --mode option with the *Yes*
        value of --CombineChains, residue sequences for the specified chains
        are extracted and concatenated into a single sequence file instead
        of an individual file for each chain.

    -d, --distance *number*
        Specify the distance used to extract ATOM/HETATM records during the
        *Distance* value of the -m, --mode option. Default: *10.0* angstroms.

        The --RecordMode option controls the type of record lines to extract
        from *PDBFile(s)*: ATOM, HETATM, or both.

    --DistanceMode *Atom | Hetatm | Residue | XYZ*
        Specify how to extract ATOM/HETATM records from *PDBFile(s)* during
        the *Distance* value of the -m, --mode option: extract all the
        records within the distance specified by -d, --distance from an atom
        record, a hetero atom record, a residue, or an arbitrary point.
        Possible values: *Atom | Hetatm | Residue | XYZ*. Default: *XYZ*.

        During the *Residue* value of --DistanceMode, the distance of each
        ATOM/HETATM record is calculated from all the atoms in the residue,
        and a record is selected as long as any atom of the residue lies
        within the distance specified using the -d, --distance option.

        The --RecordMode option controls the type of record lines to extract
        from *PDBFile(s)*: ATOM, HETATM, or both.
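
        The *Residue* rule amounts to a minimum-distance test: a record
        qualifies if the nearest atom of the origin residue lies within the
        cutoff. A minimal Python sketch (Python 3.8+ for math.dist), with
        hypothetical function names and an inclusive cutoff assumed:

```python
import math


def near_residue(point, residue_atoms, cutoff=10.0):
    """True if point is within cutoff of ANY atom of the origin residue.

    point and each residue atom are (x, y, z) tuples in angstroms. Treating
    the cutoff as inclusive (<=) is an assumption of this sketch.
    """
    return any(math.dist(point, atom) <= cutoff for atom in residue_atoms)


def select_near_residue(records, residue_atoms, cutoff=10.0):
    """Keep the record part of (record, xyz) pairs that pass near_residue()."""
    return [rec for rec, xyz in records if near_residue(xyz, residue_atoms, cutoff)]
```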

    --DistanceSelectionMode *ByAtom | ByResidue*
        Specify how to extract ATOM/HETATM records from *PDBFile(s)* during
        the *Distance* value of the -m, --mode option for all values of the
        --DistanceMode option: *ByAtom* extracts only those ATOM/HETATM
        records that meet the specified distance criterion; *ByResidue*
        extracts all the records of a residue as long as one of the
        ATOM/HETATM records in the residue satisfies the specified distance
        criterion. Possible values: *ByAtom, ByResidue*. Default value:
        *ByAtom*.
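
        A Python sketch contrasting *ByAtom* and *ByResidue* around an XYZ
        origin. The Atom record type and the function name are hypothetical.
        Residues are keyed by chain, residue number, and residue name, and
        insertion codes are ignored for brevity.

```python
import math
from collections import namedtuple

# Hypothetical record type: residue identity is (chain, residue number, name);
# insertion codes are ignored for brevity.
Atom = namedtuple("Atom", "serial chain res_num res_name xyz")


def select_by_distance(atoms, origin, cutoff=10.0, selection_mode="ByAtom"):
    """Sketch of --DistanceSelectionMode around an (x, y, z) origin."""
    near = [a for a in atoms if math.dist(a.xyz, origin) <= cutoff]
    if selection_mode == "ByAtom":
        return near  # only the records that meet the criterion
    if selection_mode == "ByResidue":
        residues = {(a.chain, a.res_num, a.res_name) for a in near}
        # every record of a residue with at least one qualifying record
        return [a for a in atoms if (a.chain, a.res_num, a.res_name) in residues]
    raise ValueError(f"unsupported selection mode: {selection_mode}")
```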

    --DistanceOrigin *"AtomNumber,AtomName" | "HetatmNumber,HetAtmName" |
    "ResidueNumber,ResidueName[,ChainID]" | "X,Y,Z"*
        This value is specific to --DistanceMode. In general, it identifies
        a point used to select other ATOM/HETATM records within a specific
        distance from this point.

        For the *Atom* value of --DistanceMode, this option corresponds to
        an atom specification. Format: *AtomNumber,AtomName*. Example:

            455,CA

        For the *Hetatm* value of --DistanceMode, this option corresponds to
        a hetatm specification. Format: *HetatmNumber,HetAtmName*. Example:

            5295,C1

        For the *Residue* value of --DistanceMode, this option corresponds
        to a residue specification. Format: *ResidueNumber,
        ResidueName[,ChainID]*. Examples:

            78,MSE
            977,RET,A
            978,RET,B

        For the *XYZ* value of --DistanceMode, this option corresponds to
        the coordinates of an arbitrary point. Format: *X,Y,Z*. Example:

            10.044,19.261,-4.292

        The --RecordMode option controls the type of record lines to extract
        from *PDBFile(s)*: ATOM, HETATM, or both.
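
        A hypothetical Python parser showing how each --DistanceOrigin
        format could be split according to the --DistanceMode value. It only
        validates field counts and types, and it is not the script's own
        parsing code.

```python
def parse_distance_origin(distance_mode, origin):
    """Hypothetical parser for --DistanceOrigin, keyed on --DistanceMode."""
    fields = [field.strip() for field in origin.split(",")]
    mode = distance_mode.lower()
    if mode == "xyz":
        if len(fields) != 3:
            raise ValueError("XYZ expects X,Y,Z")
        return tuple(float(f) for f in fields)
    if mode in ("atom", "hetatm"):
        if len(fields) != 2:
            raise ValueError("Atom/Hetatm expects Number,Name")
        return int(fields[0]), fields[1]
    if mode == "residue":
        if len(fields) not in (2, 3):
            raise ValueError("Residue expects ResidueNumber,ResidueName[,ChainID]")
        chain_id = fields[2] if len(fields) == 3 else None  # chain ID is optional
        return int(fields[0]), fields[1], chain_id
    raise ValueError(f"unsupported distance mode: {distance_mode}")
```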

    -h, --help
        Print this help message.

    -k, --KeepOldRecords *yes | no*
        Specify whether to transfer old non-ATOM and non-HETATM records from
        input PDBFile(s) to new PDBFile(s) during the *Chains | Atoms |
        HetAtms | CAlphas | Distance | NonWater | NonHydrogens* values of the
        -m, --mode option. By default, except for the HEADER record, all
        other unnecessary non-ATOM/HETATM records are dropped during the
        generation of new PDB files. Possible values: *yes | no*. Default:
        *no*.

    -m, --mode *Chains | Sequences | Atoms | CAlphas | AtomNums | AtomsRange
    | AtomNames | ResidueNums | ResiduesRange | ResidueNames | Distance |
    NonWater | NonHydrogens*
        Specify what to extract from *PDBFile(s)*: *Chains* - retrieve
        records for the specified chains; *Sequences* - generate sequence
        files for the specified chains; *Atoms* - extract atom records;
        *CAlphas* - extract atom records for alpha carbon atoms; *AtomNums* -
        extract atom records for the specified atom numbers; *AtomsRange* -
        extract atom records within the specified atom number range;
        *AtomNames* - extract atom records for the specified atom names;
        *ResidueNums* - extract records for the specified residue numbers;
        *ResiduesRange* - extract records for residues within the specified
        residue number range; *ResidueNames* - extract records for the
        specified residue names; *Distance* - extract records within a
        certain distance from a specific position; *NonWater* - extract
        records corresponding to residues other than water; *NonHydrogens* -
        extract non-hydrogen records.

        Possible values: *Chains, Sequences, Atoms, CAlphas, AtomNums,
        AtomsRange, AtomNames, ResidueNums, ResiduesRange, ResidueNames,
        Distance, NonWater, NonHydrogens*. Default value: *NonWater*.

        During the generation of new PDB files, unnecessary CONECT records
        are dropped.

        For *Chains* mode, data for the chains specified by the -c, --chains
        option is extracted from *PDBFile(s)* and placed into new PDB
        file(s).

        For *Sequences* mode, residue names are extracted from *PDBFile(s)*,
        using the various sequence related options, for the chains specified
        by the -c, --chains option, and FASTA sequence file(s) are generated.

        For *Distance* mode, all ATOM/HETATM records within the distance
        specified by the -d, --distance option from a specific atom, residue,
        or point, indicated by the --DistanceMode and --DistanceOrigin
        options, are extracted and placed into new PDB file(s).

        For *NonWater* mode, non-water ATOM/HETATM record lines, identified
        using the value of --WaterResidueNames, are extracted and written to
        new PDB file(s).

        For *NonHydrogens* mode, ATOM/HETATM record lines containing an
        element symbol other than *H* are extracted and written to new PDB
        file(s).

        For all other modes, appropriate ATOM/HETATM records are extracted
        to generate new PDB file(s).

        The --RecordMode option controls the type of record lines to extract
        and process from *PDBFile(s)*: ATOM, HETATM, or both.
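
        Two of the per-record rules above, *NonHydrogens* and *CAlphas*, can
        be sketched in Python using the fixed PDB columns for the record
        name (1-6), atom name (13-16), and element symbol (77-78). The
        function names are hypothetical, and keeping records with a blank
        element field is an assumption.

```python
def is_non_hydrogen(line):
    """NonHydrogens rule: element symbol (columns 77-78) other than H.

    Records with a blank element field are kept here; how the real script
    treats them is not stated, so that choice is an assumption.
    """
    return line[76:78].strip().upper() != "H"


def is_calpha(line):
    """CAlphas sketch: ATOM records (the default --RecordMode) named CA."""
    return line[:6].strip() == "ATOM" and line[12:16].strip() == "CA"


def filter_records(lines, predicate):
    """Apply a per-line rule to ATOM/HETATM records only."""
    return [
        line for line in lines
        if line[:6].strip() in ("ATOM", "HETATM") and predicate(line)
    ]
```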

    --ModifyHeader *yes | no*
        Specify whether to modify the HEADER record during the generation of
        new PDB files for the *Chains | Atoms | CAlphas | Distance* values of
        the -m, --mode option. Possible values: *yes | no*. Default: *yes*.
        By default, the classification data is replaced by *Data extracted
        using MayaChemTools* before the HEADER record is written out.
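
        In the PDB HEADER record, the classification occupies columns 11-50,
        so the default behavior can be sketched as a fixed-column
        substitution. This Python helper is hypothetical and leaves the
        deposition date and ID code untouched:

```python
def modify_header(line, text="Data extracted using MayaChemTools"):
    """Replace the classification field (columns 11-50) of a HEADER record.

    Other fields, such as the deposition date (columns 51-59) and the
    ID code (columns 63-66), are left untouched.
    """
    if not line.startswith("HEADER"):
        return line
    padded = line.rstrip("\n").ljust(80)
    return (padded[:10] + text[:40].ljust(40) + padded[50:]).rstrip()
```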

    --NonStandardKeep *yes | no*
        Specify whether to include non-standard three-letter residue codes
        in the sequence file(s) generated during the *Sequences* value of
        the -m, --mode option, after converting them into the code specified
        using the --NonStandardCode option. Possible values: *yes | no*.
        Default: *yes*.

        A warning is also printed about the presence of non-standard
        residues. Any residue other than the standard 20 amino acids and 5
        nucleic acids is considered non-standard; additionally, HETATM
        residues in chains are also tagged as non-standard.

    --NonStandardCode *character*
        A single-character code to use for non-standard residues. Possible
        values: *?, -, or X*. Default: *X*.
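
        The --NonStandardKeep and --NonStandardCode options can be
        illustrated with a Python sketch that maps three-letter names to
        one-letter codes. It is a simplification under stated assumptions:
        only the 20 standard amino acids are mapped, nucleic acids and the
        HETATM tagging rule are omitted, and the function returns the
        non-standard names instead of printing a warning.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_sequence(residue_names, keep_non_standard=True, non_standard_code="X"):
    """Convert three-letter residue names to a one-letter sequence.

    Returns (sequence, non_standard_names) so the caller can print a warning.
    Only the 20 standard amino acids are mapped in this sketch.
    """
    if non_standard_code not in ("?", "-", "X"):
        raise ValueError("non-standard code must be ?, - or X")
    letters, non_standard = [], []
    for name in residue_names:
        code = THREE_TO_ONE.get(name.upper())
        if code is not None:
            letters.append(code)
            continue
        non_standard.append(name)
        if keep_non_standard:  # corresponds to --NonStandardKeep yes
            letters.append(non_standard_code)
    return "".join(letters), non_standard
```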

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New PDB and sequence file names are generated using the root:
        <Root><Mode>.<Ext>. Default new file names:
        <PDBFileName>Chain<ChainID>.pdb for *Chains* mode;
        <PDBFileName>SequenceChain<ChainID>.fasta for *Sequences* mode;
        <PDBFileName>DistanceBy<DistanceMode>.pdb for *Distance* mode; and
        <PDBFileName><Mode>.pdb for the *Atoms | CAlphas | NonWater |
        NonHydrogens* values of the -m, --mode option. This option is
        ignored for multiple input files.
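
        The default naming scheme can be summarized in a small Python
        sketch. The function is hypothetical, and it assumes that a --root
        value simply replaces the <PDBFileName> prefix.

```python
import os


def output_file_name(pdb_file, mode, chain_id=None, distance_mode=None, root=None):
    """Sketch of the default output names listed under -r, --root.

    Assumption: a --root value simply replaces the <PDBFileName> prefix.
    """
    prefix = root or os.path.splitext(os.path.basename(pdb_file))[0]
    if mode == "Chains":
        return f"{prefix}Chain{chain_id}.pdb"
    if mode == "Sequences":
        return f"{prefix}SequenceChain{chain_id}.fasta"
    if mode == "Distance":
        return f"{prefix}DistanceBy{distance_mode}.pdb"
    return f"{prefix}{mode}.pdb"  # e.g. Atoms, CAlphas, NonWater, NonHydrogens
```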

    --RecordMode *Atom | Hetatm | AtomAndHetatm*
        Specify the type of record lines to extract and process from
        *PDBFile(s)* during the various values of the -m, --mode option:
        extract only ATOM record lines, extract only HETATM record lines, or
        extract both ATOM and HETATM record lines. Possible values: *Atom |
        Hetatm | AtomAndHetatm*. Default during the *Atoms, CAlphas,
        AtomNums, AtomsRange, AtomNames* values of the -m, --mode option:
        *Atom*; otherwise: *AtomAndHetatm*.

        This option is ignored during the *Chains, Sequences* values of the
        -m, --mode option.
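
        A Python sketch of the --RecordMode filter and its mode-dependent
        default, following the values listed above. The table and function
        names are hypothetical.

```python
RECORD_TYPES = {
    "Atom": {"ATOM"},
    "Hetatm": {"HETATM"},
    "AtomAndHetatm": {"ATOM", "HETATM"},
}

# -m, --mode values whose documented --RecordMode default is Atom.
ATOM_DEFAULT_MODES = {"Atoms", "CAlphas", "AtomNums", "AtomsRange", "AtomNames"}


def default_record_mode(mode):
    """Return the documented --RecordMode default for a given -m, --mode value."""
    return "Atom" if mode in ATOM_DEFAULT_MODES else "AtomAndHetatm"


def filter_by_record_mode(lines, record_mode):
    """Keep only the record types selected by --RecordMode."""
    wanted = RECORD_TYPES[record_mode]
    return [line for line in lines if line[:6].strip() in wanted]
```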

    --Residues *"ResidueNum,[ResidueNum...]" |
    "StartResidueNum,EndResidueNum" | "ResidueName,[ResidueName...]"*
        Specify which residue records to extract from *PDBFile(s)* during
        the *ResidueNums*, *ResiduesRange*, and *ResidueNames* values of the
        -m, --mode option: extract records corresponding to the residue
        numbers or names specified in a comma delimited list, or the records
        within the range of the start and end residue numbers. Possible
        values: *"ResidueNum[,ResidueNum,..]"*,
        *StartResidueNum,EndResidueNum*, or *"ResidueName[,ResidueName,..]"*.
        Default: *None*. Examples:

            20
            5,10
            TYR,SER,THR

        The --RecordMode option controls the type of record lines to extract
        from *PDBFile(s)*: ATOM, HETATM, or both.

    --SequenceLength *number*
        Maximum sequence length per line in sequence file(s). Default: *80*.

    --SequenceRecords *Atom | SeqRes*
        Specify which records to use for extracting residue names from
        *PDBFile(s)* during the *Sequences* value of the -m, --mode option:
        use ATOM records to compile a list of residues in a chain, or parse
        SEQRES records to get a list of residues. Possible values: *Atom |
        SeqRes*. Default: *Atom*.
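
        For the *SeqRes* value, residue names can be read from the fixed
        SEQRES columns (chain ID in column 12, residue names from column 20
        onward). A minimal Python sketch with a hypothetical function name:

```python
from collections import defaultdict


def seqres_residues(lines):
    """Collect residue names per chain from SEQRES records.

    SEQRES layout: chain ID in column 12, residue names from column 20 on,
    up to 13 names per line.
    """
    chains = defaultdict(list)
    for line in lines:
        if line.startswith("SEQRES"):
            chains[line[11]].extend(line[19:].split())
    return dict(chains)
```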

    --SequenceIDPrefix *FileName | HeaderRecord | Automatic*
        Specify how to generate a prefix for sequence IDs during the
        *Sequences* value of the -m, --mode option: use the input file name
        prefix, retrieve the PDB ID from the HEADER record, or automatically
        decide the method for generating the prefix. The chain IDs are also
        appended to the prefix. Possible values: *FileName | HeaderRecord |
        Automatic*. Default: *Automatic*.
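
        A hypothetical Python sketch of sequence ID prefixes and FASTA
        output wrapped at --SequenceLength characters per line. The PDB ID
        code is taken from columns 63-66 of the HEADER record. The
        *Automatic* rule shown here (use the HEADER ID code if present,
        otherwise the file name) is an assumption.

```python
import os


def sequence_id_prefix(pdb_file, header_line="", method="Automatic"):
    """Hypothetical --SequenceIDPrefix logic.

    FileName: input file name without extension; HeaderRecord: the PDB ID
    code in columns 63-66 of the HEADER record. The Automatic rule used
    here (HEADER ID if present, else file name) is an assumption.
    """
    file_prefix = os.path.splitext(os.path.basename(pdb_file))[0]
    header_id = header_line[62:66].strip() if header_line.startswith("HEADER") else ""
    if method == "FileName":
        return file_prefix
    if method == "HeaderRecord":
        return header_id
    return header_id or file_prefix


def fasta_record(seq_id, sequence, line_length=80):
    """Format one FASTA entry, wrapping at line_length characters per line."""
    lines = [f">{seq_id}"]
    lines += [sequence[i:i + line_length] for i in range(0, len(sequence), line_length)]
    return "\n".join(lines)
```

        For example, fasta_record("Sample2_A", sequence, 80) would format
        one chain's entry. The "_" separator between the prefix and the
        chain ID is only illustrative.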

    --WaterResidueNames *Automatic | "ResidueName,[ResidueName,...]"*
        Identification of water residues during the *NonWater* value of the
        -m, --mode option. Possible values: *Automatic |
        "ResidueName,[ResidueName,...]"*. Default: *Automatic*, which
        corresponds to "HOH,WAT,H20". You can also specify a different comma
        delimited list of residue names to use for water.
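
        A Python sketch of the *NonWater* filter using the residue name in
        columns 18-20. The function names are hypothetical, and matching
        names exactly as given (case-sensitive) is an assumption.

```python
def water_residue_names(option="Automatic"):
    """Resolve --WaterResidueNames; Automatic is documented as HOH,WAT,H20."""
    if option.strip().lower() == "automatic":
        return {"HOH", "WAT", "H20"}
    return {name.strip() for name in option.split(",") if name.strip()}


def non_water_records(lines, option="Automatic"):
    """NonWater sketch: keep ATOM/HETATM lines whose residue name
    (columns 18-20) is not one of the water residue names."""
    waters = water_residue_names(option)
    return [
        line for line in lines
        if line[:6].strip() in ("ATOM", "HETATM") and line[17:20].strip() not in waters
    ]
```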

    -w, --WorkingDir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To extract non-water records from the Sample2.pdb file and generate the
    Sample2NonWater.pdb file, type:

        % ExtractFromPDBFiles.pl Sample2.pdb

    To extract non-water records corresponding to only ATOM records from
    the Sample2.pdb file and generate the Sample2NonWater.pdb file, type:

        % ExtractFromPDBFiles.pl --RecordMode Atom Sample2.pdb

    To extract non-water records from the Sample2.pdb file using HOH or WAT
    residue names for water, along with all old non-coordinate records, and
    generate the Sample2NewNonWater.pdb file, type:

        % ExtractFromPDBFiles.pl -m NonWater --WaterResidueNames "HOH,WAT"
          --KeepOldRecords Yes -r Sample2New -o Sample2.pdb

    To extract non-hydrogen records from the Sample2.pdb file and generate
    the Sample2NonHydrogen.pdb file, type:

        % ExtractFromPDBFiles.pl -m NonHydrogens Sample2.pdb

    To extract data for the first chain in Sample2.pdb and generate the
    Sample2ChainA.pdb file, type:

        % ExtractFromPDBFiles.pl -m chains -o Sample2.pdb

    To extract data for both chains in Sample2.pdb and generate the
    Sample2ChainA.pdb and Sample2ChainB.pdb files, type:

        % ExtractFromPDBFiles.pl -m chains -c All -o Sample2.pdb

    To extract data for alpha carbons in Sample2.pdb and generate the
    Sample2CAlphas.pdb file, type:

        % ExtractFromPDBFiles.pl -m CAlphas -o Sample2.pdb

    To extract records for specific residue numbers in all chains from the
    Sample2.pdb file and generate the Sample2ResidueNums.pdb file, type:

        % ExtractFromPDBFiles.pl -m ResidueNums --Residues "3,6"
          Sample2.pdb

    To extract records for a specific range of residue numbers in all
    chains from the Sample2.pdb file and generate the
    Sample2ResiduesRange.pdb file, type:

        % ExtractFromPDBFiles.pl -m ResiduesRange --Residues "10,30"
          Sample2.pdb

    To extract data for all ATOM and HETATM records within 10 angstroms of
    an atom specified by the atom serial number and name "1,N" in the
    Sample2.pdb file and generate Sample2DistanceByAtom.pdb, type:

        % ExtractFromPDBFiles.pl -m Distance --DistanceMode Atom
          --DistanceOrigin "1,N" -k No --distance 10 -o Sample2.pdb

    To extract data for all ATOM and HETATM records of complete residues
    with any atom or hetatm less than 10 angstroms from an atom specified
    by the atom serial number and name "1,N" in the Sample2.pdb file and
    generate Sample2DistanceByAtom.pdb, type:

        % ExtractFromPDBFiles.pl -m Distance --DistanceMode Atom
          --DistanceOrigin "1,N" --DistanceSelectionMode ByResidue
          -k No --distance 10 -o Sample2.pdb

    To extract data for all ATOM and HETATM records within 25 angstroms of
    an arbitrary point "0,0,0" in the Sample2.pdb file and generate
    Sample2DistanceByXYZ.pdb, type:

        % ExtractFromPDBFiles.pl -m Distance --DistanceMode XYZ
          --DistanceOrigin "0,0,0" -k No --distance 25 -o Sample2.pdb

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    InfoPDBFiles.pl, ModifyPDBFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.
