annotate docs/scripts/txt/SplitSDFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
NAME
    SplitSDFiles.pl - Split SDFile(s) into multiple SD files

SYNOPSIS
    SplitSDFiles.pl SDFile(s)...

    SplitSDFiles.pl [-c, --CmpdsMode DataField | MolName | RootPrefix] [-d,
    --DataField DataFieldName] [-h, --help] [-m, --mode Cmpds | Files] [-n,
    --numfiles number] [--numcmpds number] [-o, --overwrite] [-r, --root
    rootname] [-w, --workingdir dirname] SDFile(s)...

DESCRIPTION
    Split *SDFile(s)* into multiple SD files. Each new SD file contains a
    subset of compounds of similar size from the initial file. Multiple
    *SDFile(s)* names are separated by spaces. The valid file extensions are
    *.sdf* and *.sd*. All other file names are ignored. All the SD files in
    the current directory can be specified either by **.sdf* or by the
    current directory name.

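    The splitting described above can be sketched in Python as follows.
    This is only an illustration, not the script's implementation: the
    function names are hypothetical, records are assumed to end with a
    "$$$$" delimiter line, and the way leftover compounds are spread over
    the new files is an assumption, since the text only promises subsets
    "of similar size".

```python
def read_sd_records(text):
    """Return a list of SD records; each record ends with a '$$$$' line.

    Trailing lines that are not closed by '$$$$' are dropped in this sketch.
    """
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def split_into_files(records, num_files):
    """Distribute records over num_files chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(records), num_files)
    chunks, start = [], 0
    for i in range(num_files):
        # Assumption: the first 'extra' chunks each take one additional record.
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


# Seven minimal placeholder records (not chemically meaningful molfiles).
sample = "".join(f"Mol{i}\n\n\nM  END\n$$$$\n" for i in range(1, 8))
records = read_sd_records(sample)
chunks = split_into_files(records, 3)
print([len(chunk) for chunk in chunks])
```

    With seven records and three files, the chunks hold 3, 2, and 2
    records.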
OPTIONS
    -c, --CmpdsMode *DataField | MolName | RootPrefix*
        This option is only used for the *Cmpds* value of the -m, --mode
        option with a specified --numcmpds value of 1.

        Specify how to generate new file names for the *Cmpds* value of the
        -m, --mode option: use the *SDFile(s)* datafield value or molname
        line for a specific compound, or generate a sequential ID using the
        root prefix specified by the -r, --root option.

        Possible values: *DataField | MolName | RootPrefix*. Default:
        *RootPrefix*.

        For empty *MolName* and *DataField* values in these modes, the file
        name is automatically generated using *RootPrefix*.

        For the *RootPrefix* value of the -c, --CmpdsMode option, new file
        names are generated by appending the compound record number to the
        value of the -r, --root option. For example:
        *RootName*Cmd<RecordNumber>.sdf.

        Allowed characters in file names are: a-zA-Z0-9_. All other
        characters in datafield values, molname line, and root prefix are
        ignored during generation of file names.

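        The naming rules above can be illustrated with a short Python
        sketch. It is an assumption-laden illustration rather than the
        script's own code: make_file_name is a hypothetical helper, and the
        fallback name follows the *RootName*Cmd<RecordNumber>.sdf pattern
        quoted above.

```python
import re

# Characters outside a-zA-Z0-9_ are dropped, as described in the option text.
DISALLOWED = re.compile(r"[^a-zA-Z0-9_]")


def make_file_name(value, root_prefix, record_number):
    """Build a file name from a molname line or datafield value.

    Falls back to <RootPrefix>Cmd<RecordNumber> when nothing usable remains.
    """
    cleaned = DISALLOWED.sub("", value or "")
    if not cleaned:
        # Assumption: the root prefix is cleaned with the same character rule.
        cleaned = f"{DISALLOWED.sub('', root_prefix)}Cmd{record_number}"
    return cleaned + ".sdf"


print(make_file_name("Aspirin (2-acetoxy)", "Sample1", 3))
print(make_file_name("", "Sample1", 3))
```

        The first call yields Aspirin2acetoxy.sdf; the second has an empty
        value and falls back to Sample1Cmd3.sdf.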
    -d, --DataField *DataFieldName*
        This option is only used for the *DataField* value of the -c,
        --CmpdsMode option.

        Specify the *SDFile(s)* datafield label name whose value is used to
        generate the new file name for a specific compound. Default value:
        *None*.

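        As a rough illustration of what "datafield label name" refers to,
        the Python sketch below looks up the value stored under a label
        such as MolID in one SD record. It assumes the usual SD layout, in
        which a header line of the form "> <Label>" is followed by the
        value and a blank line; get_data_field_value is a hypothetical
        helper, and only the first value line is returned.

```python
import re


def get_data_field_value(record, field_name):
    """Return the first value line stored under field_name, or '' if absent."""
    lines = record.splitlines()
    # A data header starts with '>' and carries the label in angle brackets.
    header = re.compile(r"^>.*<" + re.escape(field_name) + r">")
    for index, line in enumerate(lines):
        if header.match(line) and index + 1 < len(lines):
            return lines[index + 1].strip()
    return ""


# Minimal placeholder record with two data fields.
record = (
    "Mol1\n\n\nM  END\n"
    ">  <MolID>\nCMPD_001\n\n"
    ">  <Name>\nAspirin\n\n"
    "$$$$\n"
)
print(get_data_field_value(record, "MolID"))
print(repr(get_data_field_value(record, "Missing")))
```

        An empty result corresponds to the case described under -c,
        --CmpdsMode, where the file name is generated from *RootPrefix*
        instead.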
    -h, --help
        Print this help message.

    -m, --mode *Cmpds | Files*
        Specify how to split *SDFile(s)*: split into files with each file
        containing a specified number of compounds, or split into a
        specified number of files.

        Possible values: *Cmpds | Files*. Default: *Files*.

        For the *Cmpds* value of the -m, --mode option, the value of the
        --numcmpds option determines the number of new files. For the
        *Files* value, the number of new files is given by the -n,
        --numfiles option.

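        The relationship between the two modes and the number of new files
        can be written out as a small Python sketch. The function name is
        hypothetical, and rounding up in *Cmpds* mode is an assumption
        based on each file holding at most --numcmpds compounds; the case
        of fewer compounds than requested files is not covered by the text
        and is not handled here.

```python
import math


def expected_file_count(total_compounds, mode="Files", num_files=2, num_cmpds=1):
    """Estimate how many new files one input SD file produces."""
    if mode == "Files":
        # The count is given directly by -n, --numfiles.
        return num_files
    if mode == "Cmpds":
        # Each file holds at most num_cmpds compounds, so round up.
        return math.ceil(total_compounds / num_cmpds)
    raise ValueError(f"unknown mode: {mode}")


print(expected_file_count(23, mode="Files", num_files=10))
print(expected_file_count(23, mode="Cmpds", num_cmpds=5))
```

        For an input with 23 compounds, *Files* mode with -n 10 gives 10
        files, while *Cmpds* mode with --numcmpds 5 gives 5 files, the last
        one holding the remaining 3 compounds.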
    -n, --numfiles *number*
        Number of new files to generate for each *SDFile(s)*. Default: *2*.

        This value is only used for the *Files* value of the -m, --mode
        option.

    --numcmpds *number*
        Number of compounds in each new file corresponding to each
        *SDFile(s)*. Default: *1*.

        This value is only used for the *Cmpds* value of the -m, --mode
        option.

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New SD file names are generated using the root:
        <Root>Part<Count>.sdf. Default new file names:
        <InitialSDFileName>Part<Count>.sdf. This option is ignored for
        multiple input files.

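        A minimal Python sketch of this naming pattern follows.
        part_file_names is a hypothetical helper; starting <Count> at 1 and
        taking <InitialSDFileName> as the input file name without its
        extension are assumptions, as the text does not spell them out.

```python
import os


def part_file_names(sd_file, num_parts, root=None):
    """List the <Root>Part<Count>.sdf names for one input SD file."""
    if root is None:
        # Default root: the input file name without directory or extension.
        root = os.path.splitext(os.path.basename(sd_file))[0]
    # Assumption: counting starts at 1.
    return [f"{root}Part{count}.sdf" for count in range(1, num_parts + 1)]


print(part_file_names("Sample1.sdf", 3))
print(part_file_names("Sample1.sdf", 2, root="NewSample"))
```

        Under these assumptions the first call lists Sample1Part1.sdf
        through Sample1Part3.sdf, and the second lists NewSamplePart1.sdf
        and NewSamplePart2.sdf.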
    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To split each SD file into 5 new SD files, type:

        % SplitSDFiles.pl -n 5 -o Sample1.sdf Sample2.sdf
        % SplitSDFiles.pl -n 5 -o *.sdf

    To split Sample1.sdf into 10 new NewSample*.sdf files, type:

        % SplitSDFiles.pl -m Files -n 10 -r NewSample -o Sample1.sdf

    To split Sample1.sdf into new NewSample*.sdf files containing a maximum
    of 5 compounds in each file, type:

        % SplitSDFiles.pl -m Cmpds --numcmpds 5 -r NewSample -o Sample1.sdf

    To split Sample1.sdf into new SD files containing one compound each,
    with new file names corresponding to the molname line, type:

        % SplitSDFiles.pl -m Cmpds --numcmpds 1 -c MolName -o Sample1.sdf

    To split Sample1.sdf into new SD files containing one compound each,
    with new file names corresponding to the value of datafield MolID, type:

        % SplitSDFiles.pl -m Cmpds --numcmpds 1 -c DataField -d MolID -o Sample1.sdf

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    InfoSDFiles.pl, JoinSDFiles.pl, MolFilesToSD.pl, SDToMolFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.