NAME
    SplitSDFiles.pl - Split SDFile(s) into multiple SD files

SYNOPSIS
    SplitSDFiles.pl SDFile(s)...

    SplitSDFiles.pl [-c, --CmpdsMode DataField | MolName | RootPrefix] [-d,
    --DataField DataFieldName] [-h, --help] [-m, --mode Cmpds | Files] [-n,
    --numfiles number] [--numcmpds number] [-o, --overwrite] [-r, --root
    rootname] [-w, --workingdir dirname] SDFile(s)...

DESCRIPTION
    Split *SDFile(s)* into multiple SD files. Each new SD file contains a
    subset of compounds, of similar size, from the initial file. Multiple
    *SDFile(s)* names are separated by spaces. The valid file extensions are
    *.sdf* and *.sd*. All other file names are ignored. All the SD files in
    the current directory can be specified either by **.sdf* or the current
    directory name.

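    The extension rule above can be illustrated with a short Python sketch.
    It is not the script's own code: the function name select_sd_files is
    hypothetical, and the case-insensitive comparison is an assumption,
    since the text does not say how case is handled.

```python
import os

# Extensions the documentation lists as valid.
VALID_EXTENSIONS = (".sdf", ".sd")


def select_sd_files(names):
    """Return the names that end in a valid SD file extension.

    Assumption: the comparison is case-insensitive; the documentation
    does not say either way.
    """
    selected = []
    for name in names:
        _, ext = os.path.splitext(name)
        if ext.lower() in VALID_EXTENSIONS:
            selected.append(name)
    return selected
```

    For instance, select_sd_files(["Sample1.sdf", "Sample2.sd",
    "Notes.txt"]) keeps the first two names and drops Notes.txt.
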
OPTIONS
    -c, --CmpdsMode *DataField | MolName | RootPrefix*
        This option is used only when -m, --mode is *Cmpds* and --numcmpds
        is 1.

        Specify how to generate new file names when -m, --mode is *Cmpds*:
        use the *SDFile(s)* datafield value or molname line for a specific
        compound, or generate a sequential ID using the root prefix
        specified by the -r, --root option.

        Possible values: *DataField | MolName | RootPrefix*. Default:
        *RootPrefix*.

        If the *MolName* or *DataField* value is empty in these modes, the
        file name is automatically generated using *RootPrefix*.

        For the *RootPrefix* value of the -c, --CmpdsMode option, new file
        names are generated by appending the compound record number to the
        value of the -r, --root option. For example:
        *RootName*Cmd<RecordNumber>.sdf.

        Allowed characters in file names are: a-zA-Z0-9_. All other
        characters in datafield values, molname line, and root prefix are
        ignored during generation of file names.

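        The naming rules for -c, --CmpdsMode can be sketched in Python as
        follows. This is an illustration, not the script's implementation:
        clean_name and new_file_name are hypothetical names, and two
        details are assumptions, namely that a value left empty after
        cleaning also triggers the *RootPrefix* fallback and that names
        built from molname or datafield values get the .sdf extension.

```python
import re


def clean_name(text):
    """Keep only the characters the documentation allows: a-zA-Z0-9_."""
    return re.sub(r"[^a-zA-Z0-9_]", "", text)


def new_file_name(candidate, root, record_number):
    """Build a file name from a molname or datafield value.

    Falls back to <Root>Cmd<RecordNumber>.sdf when the candidate is empty.
    Assumption: a value that is empty after cleaning counts as empty too.
    """
    cleaned = clean_name(candidate or "")
    if cleaned:
        return cleaned + ".sdf"
    return f"{clean_name(root)}Cmd{record_number}.sdf"
```

        Under these assumptions, new_file_name("Aspirin (2-acetoxybenzoic
        acid)", "Sample1", 3) gives Aspirin2acetoxybenzoicacid.sdf, and
        new_file_name("", "New-Sample", 7) gives NewSampleCmd7.sdf.
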
    -d, --DataField *DataFieldName*
        This option is used only when -c, --CmpdsMode is *DataField*.

        Specify the *SDFile(s)* datafield label name whose value is used to
        generate the new file name for a specific compound. Default value:
        *None*.

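        To make the molname line and datafield value concrete, here is a
        minimal Python sketch that reads both from one SD record. It
        relies on the usual SD file layout (molname on the first line of
        a record, a data header line starting with ">" that carries the
        label in angle brackets, and the value on the following line).
        The function names and the sample record are hypothetical and are
        not taken from the script.

```python
def mol_name(record):
    """Return the molname line: the first line of an SD record."""
    lines = record.splitlines()
    return lines[0].strip() if lines else ""


def data_field_value(record, label):
    """Return the first value line of the data field named label, or ""."""
    lines = record.splitlines()
    for index, line in enumerate(lines):
        # A data header line starts with ">" and carries the label in <...>.
        if line.startswith(">") and f"<{label}>" in line:
            if index + 1 < len(lines):
                return lines[index + 1].strip()
            return ""
    return ""


# Hypothetical record with an empty structure block and one data field.
RECORD = """Aspirin
  ExampleProgram

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <MolID>
Cmpd42

$$$$
"""
```

        With this record, mol_name(RECORD) returns "Aspirin" and
        data_field_value(RECORD, "MolID") returns "Cmpd42"; a label that
        is not present yields an empty string, which corresponds to the
        case where the file name falls back to *RootPrefix*.
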
    -h, --help
        Print this help message.

    -m, --mode *Cmpds | Files*
        Specify how to split *SDFile(s)*: split into files with each file
        containing a specified number of compounds, or split into a
        specified number of files.

        Possible values: *Cmpds | Files*. Default: *Files*.

        For the *Cmpds* value of the -m, --mode option, the value of the
        --numcmpds option determines the number of new files. For the
        *Files* value, the number of new files is set by the -n, --numfiles
        option.

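        The arithmetic behind the two modes can be sketched in Python. The
        function names are hypothetical, and the way a remainder is spread
        across files in *Files* mode is an assumption; the script may
        distribute compounds differently while still producing files of
        similar size.

```python
import math


def files_in_cmpds_mode(total_cmpds, num_cmpds):
    """Number of new files when each file holds at most num_cmpds compounds."""
    return math.ceil(total_cmpds / num_cmpds)


def sizes_in_files_mode(total_cmpds, num_files):
    """Compounds per file for num_files files of similar size.

    Assumption: any remainder is spread one compound at a time over the
    first files; the tool may distribute it differently.
    """
    base, extra = divmod(total_cmpds, num_files)
    return [base + 1 if i < extra else base for i in range(num_files)]
```

        For an input with 23 compounds, files_in_cmpds_mode(23, 5) gives 5
        files, and sizes_in_files_mode(23, 5) gives [5, 5, 5, 4, 4].
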
    -n, --numfiles *number*
        Number of new files to generate for each *SDFile(s)*. Default: *2*.

        This value is used only when -m, --mode is *Files*.

    --numcmpds *number*
        Number of compounds in each new file corresponding to each
        *SDFile(s)*. Default: *1*.

        This value is used only when -m, --mode is *Cmpds*.

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New SD file names are generated using the root:
        <Root>Part<Count>.sdf. Default new file names:
        <InitialSDFileName>Part<Count>.sdf. This option is ignored for
        multiple input files.

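        The naming pattern can be sketched in Python as follows. The
        function name part_file_names is hypothetical, and the sketch
        assumes that Count starts at 1 and that the default root is the
        input file name without its directory and extension; neither
        detail is stated in the text.

```python
import os


def part_file_names(sd_file, num_parts, root=None):
    """List <Root>Part<Count>.sdf names for one input file.

    Assumptions: Count starts at 1, and the default root is the input
    file name without directory and extension.
    """
    if root is None:
        root = os.path.splitext(os.path.basename(sd_file))[0]
    return [f"{root}Part{count}.sdf" for count in range(1, num_parts + 1)]
```

        Under these assumptions, part_file_names("Sample1.sdf", 3) gives
        Sample1Part1.sdf, Sample1Part2.sdf, and Sample1Part3.sdf, while
        passing root="NewSample" gives NewSamplePart1.sdf and so on.
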
    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To split each SD file into 5 new SD files, type:

        % SplitSDFiles.pl -n 5 -o Sample1.sdf Sample2.sdf
        % SplitSDFiles.pl -n 5 -o *.sdf

    To split Sample1.sdf into 10 new NewSample*.sdf files, type:

        % SplitSDFiles.pl -m Files -n 10 -r NewSample -o Sample1.sdf

    To split Sample1.sdf into new NewSample*.sdf files containing a maximum
    of 5 compounds in each file, type:

        % SplitSDFiles.pl -m Cmpds --numcmpds 5 -r NewSample -o Sample1.sdf

    To split Sample1.sdf into new SD files containing one compound each,
    with new file names corresponding to the molname line, type:

        % SplitSDFiles.pl -m Cmpds --numcmpds 1 -c MolName -o Sample1.sdf

    To split Sample1.sdf into new SD files containing one compound each,
    with new file names corresponding to the value of datafield MolID, type:

        % SplitSDFiles.pl -m Cmpds --numcmpds 1 -c DataField -d MolID
          -o Sample1.sdf

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    InfoSDFiles.pl, JoinSDFiles.pl, MolFilesToSD.pl, SDToMolFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.
