comparison docs/scripts/txt/SplitSDFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
-1:000000000000 0:4816e4a8ae95
1 NAME
2 SplitSDFiles.pl - Split SDFile(s) into multiple SD files
3
4 SYNOPSIS
5 SplitSDFiles.pl SDFile(s)...
6
7 SplitSDFiles.pl [-c, --CmpdsMode DataField | MolName | RootPrefix] [-d,
8 --DataField DataFieldName] [-h, --help] [-m, --mode Cmpds | Files] [-n,
9 --numfiles number] [--numcmpds number] [-o, --overwrite] [-r, --root
10 rootname] [-w,--workingdir dirname] SDFile(s)...
11
12 DESCRIPTION
13 Split *SDFile(s)* into multiple SD files. Each new SDFile contains a
14 compound subset of similar size from the initial file. Multiple
15 *SDFile(s)* names are separated by space. The valid file extensions are
16 *.sdf* and *.sd*. All other file names are ignored. All the SD files in
17 the current directory can be specified either by **.sdf* or the current
18 directory name.
19
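The file selection rule described above can be sketched in Python. This is only an illustration of the documented behavior, not the script's actual Perl code: the helper name select_sd_files is hypothetical, and the case-insensitive extension match and sorted directory listing are assumptions.

```python
from pathlib import Path

SD_EXTENSIONS = {".sdf", ".sd"}


def select_sd_files(names):
    """Return the names that look like SD files, expanding directories."""
    selected = []
    for name in names:
        path = Path(name)
        if path.is_dir():
            # A directory name stands for the files directly inside it.
            candidates = sorted(p for p in path.iterdir() if p.is_file())
        else:
            candidates = [path]
        # Assumption: extensions are compared case-insensitively.
        selected.extend(
            str(p) for p in candidates if p.suffix.lower() in SD_EXTENSIONS
        )
    return selected
```

Names with any other extension are silently dropped, mirroring the "All other file names are ignored" rule.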
20 OPTIONS
21 -c, --CmpdsMode *DataField | MolName | RootPrefix*
22 This option is only used during *Cmpds* value of <-m, --mode> option
23 with specified --numcmpds value of 1.
24
25 Specify how to generate new file names during *Cmpds* value of <-m,
26 --mode> option: use *SDFile(s)* datafield value or molname line for
27 a specific compound; generate a sequential ID using root prefix
28 specified by -r, --root option.
29
30 Possible values: *DataField | MolName | RootPrefix*.
31 Default: *RootPrefix*.
32
33 If the *MolName* or *DataField* value is empty in these modes, the
34 file name is automatically generated using *RootPrefix*.
35
36 For *RootPrefix* value of -c, --CmpdsMode option, new file names are
37 generated by appending the compound record number to the value of -r,
38 --root option. For example: *RootName*Cmd<RecordNumber>.sdf.
39
40 Allowed characters in file names are: a-zA-Z0-9_. All other
41 characters in datafield values, molname line, and root prefix are
42 ignored during generation of file names.
43
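The naming rules for single-compound files can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the script's implementation: the function names are hypothetical, and falling back to the root prefix when a value becomes empty after filtering (rather than only when it is empty to begin with) is an assumption.

```python
import re


def clean_name(value):
    """Drop every character outside a-zA-Z0-9_."""
    return re.sub(r"[^a-zA-Z0-9_]", "", value)


def single_cmpd_file_name(record_number, root, candidate=""):
    """Build the output name for one compound.

    candidate is the molname line or datafield value; leave it empty
    for RootPrefix-style naming.
    """
    name = clean_name(candidate)
    if not name:
        # Fallback documented above: <RootName>Cmd<RecordNumber>.sdf
        name = f"{clean_name(root)}Cmd{record_number}"
    return name + ".sdf"
```

For instance, a molname line of "Asp irin" would give Aspirin.sdf, while an empty molname for record 3 with root NewSample would give NewSampleCmd3.sdf.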
44 -d, --DataField *DataFieldName*
45 This option is only used during *DataField* value of <-c,
46 --CmpdsMode> option.
47
48 Specify *SDFile(s)* datafield label name whose value is used for
49 generation of the new file name for a specific compound. Default
50 value: *None*.
51
52 -h, --help
53 Print this help message.
54
55 -m, --mode *Cmpds | Files*
56 Specify how to split *SDFile(s)*: split into files with each file
57 containing specified number of compounds or split into a specified
58 number of files.
59
60 Possible values: *Cmpds | Files*. Default: *Files*.
61
62 For *Cmpds* value of -m, --mode option, the number of new files is
63 determined by the value of --numcmpds option. For *Files* value of
64 -m, --mode option, the number of new files is set by the value of
65 -n, --numfiles option.
66
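The arithmetic behind the two modes can be sketched in Python. The function plan_split is hypothetical, and the way compounds are balanced across files in *Files* mode is an assumption; the documentation only promises subsets "of similar size".

```python
def plan_split(total_cmpds, mode="Files", num_files=2, num_cmpds=1):
    """Return the number of compounds that would go into each new file."""
    if mode == "Cmpds":
        # Full files of num_cmpds compounds, plus one smaller file
        # for any remainder.
        full, rest = divmod(total_cmpds, num_cmpds)
        return [num_cmpds] * full + ([rest] if rest else [])
    # Files mode. Assumption: compounds are spread as evenly as
    # possible, with the first files taking one extra each.
    base, extra = divmod(total_cmpds, num_files)
    return [base + 1] * extra + [base] * (num_files - extra)
```

With 12 compounds, *Cmpds* mode and --numcmpds 5 would produce three files (5, 5, and 2 compounds); with 10 compounds, *Files* mode and -n 4 would produce four files under this even-spread assumption.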
67 -n, --numfiles *number*
68 Number of new files to generate for each *SDFile(s)*. Default: *2*.
69
70 This value is only used during *Files* value of -m, --mode option.
71
72 --numcmpds *number*
73 Number of compounds in each new file corresponding to each
74 *SDFile(s)*. Default: *1*.
75
76 This value is only used during *Cmpds* value of -m, --mode option.
77
78 -o, --overwrite
79 Overwrite existing files.
80
81 -r, --root *rootname*
82 New SD file names are generated using the root:
83 <Root>Part<Count>.sdf. Default new file names:
84 <InitialSDFileName>Part<Count>.sdf. This option is ignored for multiple input files.
85
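The <Root>Part<Count>.sdf pattern can be illustrated with a short Python sketch. The helper part_file_names is hypothetical; starting the count at 1 and using the input file name without its extension as the default root are assumptions.

```python
from pathlib import Path


def part_file_names(sd_file, count, root=None):
    """List the names of the new files for one input SD file."""
    # Assumption: the default root is the input name minus its extension.
    stem = root if root else Path(sd_file).stem
    # Assumption: part numbering starts at 1.
    return [f"{stem}Part{i}.sdf" for i in range(1, count + 1)]
```

Because the root is ignored for multiple input files, a caller handling several inputs would pass root=None for each of them.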
86 -w,--workingdir *dirname*
87 Location of working directory. Default: current directory.
88
89 EXAMPLES
90 To split each SD file into 5 new SD files, type:
91
92 % SplitSDFiles.pl -n 5 -o Sample1.sdf Sample2.sdf
93 % SplitSDFiles.pl -n 5 -o *.sdf
94
95 To split Sample1.sdf into 10 new NewSample*.sdf files, type:
96
97 % SplitSDFiles.pl -m Files -n 10 -r NewSample -o Sample1.sdf
98
99 To split Sample1.sdf into new NewSample*.sdf files containing a maximum
100 of 5 compounds in each file, type:
101
102 % SplitSDFiles.pl -m Cmpds --numcmpds 5 -r NewSample -o Sample1.sdf
103
104 To split Sample1.sdf into new SD files containing one compound each with
105 new file names corresponding to molname line, type:
106
107 % SplitSDFiles.pl -m Cmpds --numcmpds 1 -c MolName -o Sample1.sdf
108
109 To split Sample1.sdf into new SD files containing one compound each with
110 new file names corresponding to value of datafield MolID, type:
111
112 % SplitSDFiles.pl -m Cmpds --numcmpds 1 -c DataField -d MolID
113 -o Sample1.sdf
114
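The last two examples name files after the molname line or a datafield value. As a rough Python sketch of where those values sit in an SD record (the first line of a record is the molecule name, data items follow a "> <Label>" header, and each record ends with a $$$$ line), the helpers below are hypothetical and do not reproduce the script's own parser. SAMPLE is made-up data for illustration.

```python
SAMPLE = "\n".join([
    "Aspirin", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    "> <MolID>", "Cmpd_1", "", "$$$$",
    "", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END", "$$$$",
])


def read_records(text):
    """Split SD text into records, each a list of lines before $$$$."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records  # trailing lines without a $$$$ terminator are dropped


def mol_name(record):
    """The molname line is the first line of the record."""
    return record[0].strip() if record else ""


def data_field(record, label):
    """Return the line after the '> <label>' header, or '' if absent."""
    for i, line in enumerate(record):
        if line.startswith(">") and f"<{label}>" in line:
            return record[i + 1].strip() if i + 1 < len(record) else ""
    return ""
```

In SAMPLE, the first record has both a molname and a MolID value, while the second has neither, which is the case where the documented fallback to *RootPrefix* naming would apply.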
115 AUTHOR
116 Manish Sud <msud@san.rr.com>
117
118 SEE ALSO
119 InfoSDFiles.pl, JoinSDFiles.pl, MolFilesToSD.pl, SDToMolFiles.pl
120
121 COPYRIGHT
122 Copyright (C) 2015 Manish Sud. All rights reserved.
123
124 This file is part of MayaChemTools.
125
126 MayaChemTools is free software; you can redistribute it and/or modify it
127 under the terms of the GNU Lesser General Public License as published by
128 the Free Software Foundation; either version 3 of the License, or (at
129 your option) any later version.
130