comparison docs/scripts/txt/SplitSDFiles.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
| author | deepakjadmin | 
|---|---|
| date | Wed, 20 Jan 2016 09:23:18 -0500 | 
| parents | |
| children | 
| -1:000000000000 | 0:4816e4a8ae95 | 
|---|---|
| 1 NAME | |
| 2 SplitSDFiles.pl - Split SDFile(s) into multiple SD files | |
| 3 | |
| 4 SYNOPSIS | |
| 5 SplitSDFiles.pl SDFile(s)... | |
| 6 | |
| 7 SplitSDFiles.pl [-c, --CmpdsMode DataField | MolName | RootPrefix] [-d, | |
| 8 --DataField DataFieldName] [-h, --help] [-m, --mode Cmpds | Files] [-n, | |
| 9 --numfiles number] [--numcmpds number] [-o, --overwrite] [-r, --root | |
| 10 rootname] [-w,--workingdir dirname] SDFile(s)... | |
| 11 | |
| 12 DESCRIPTION | |
| 13 Split *SDFile(s)* into multiple SD files. Each new SDFile contains a | |
| 14 compound subset of similar size from the initial file. Multiple | |
| 15 *SDFile(s)* names are separated by space. The valid file extensions are | |
| 16 *.sdf* and *.sd*. All other file names are ignored. All the SD files in | |
| 17 the current directory can be specified either by **.sdf* or the current | |
| 18 directory name. | |
| 19 | |
| 20 OPTIONS | |
| 21 -c, --CmpdsMode *DataField | MolName | RootPrefix* | |
| 22 This option is only used during *Cmpds* value of <-m, --mode> option | |
| 23 with specified --numcmpds value of 1. | |
| 24 | |
| 25 Specify how to generate new file names during *Cmpds* value of <-m, | |
| 26 --mode> option: use *SDFile(s)* datafield value or molname line for | |
| 27 a specific compound; generate a sequential ID using root prefix | |
| 28 specified by -r, --root option. | |
| 29 | |
| 30 Possible values: *DataField | MolName | RootPrefix*. | |
| 31 Default: *RootPrefix*. | |
| 32 | |
| 33 For empty *MolName* and *DataField* values during these specified | |
| 34 modes, file name is automatically generated using *RootPrefix*. | |
| 35 | |
| 36 For *RootPrefix* value of -c, --CmpdsMode option, new file names are | |
| 37 generated by appending compound record number to value of -r, | |
| 38 --root option. For example: *RootName*Cmd<RecordNumber>.sdf. | |
| 39 | |
| 40 Allowed characters in file names are: a-zA-Z0-9_. All other | |
| 41 characters in datafield values, molname line, and root prefix are | |
| 42 ignored during generation of file names. | |
| 43 | |
| 44 -d, --DataField *DataFieldName* | |
| 45 This option is only used during *DataField* value of <-c, | |
| 46 --CmpdsMode> option. | |
| 47 | |
| 48 Specify *SDFile(s)* datafield label name whose value is used for | |
| 49 generation of new file name for a specific compound. Default value: | |
| 50 *None*. | |
| 51 | |
| 52 -h, --help | |
| 53 Print this help message. | |
| 54 | |
| 55 -m, --mode *Cmpds | Files* | |
| 56 Specify how to split *SDFile(s)*: split into files with each file | |
| 57 containing specified number of compounds or split into a specified | |
| 58 number of files. | |
| 59 | |
| 60 Possible values: *Cmpds | Files*. Default: *Files*. | |
| 61 | |
| 62 For *Cmpds* value of -m, --mode option, value of --numcmpds option | |
| 63 sets the number of compounds per file and thereby determines the | |
| 64 number of new files. For *Files* value of -m, --mode option, value | |
| 65 of -n, --numfiles option sets the number of new files. | |
| 66 | |
| 67 -n, --numfiles *number* | |
| 68 Number of new files to generate for each *SDFile(s)*. Default: *2*. | |
| 69 | |
| 70 This value is only used during *Files* value of -m, --mode option. | |
| 71 | |
| 72 --numcmpds *number* | |
| 73 Number of compounds in each new file corresponding to each | |
| 74 *SDFile(s)*. Default: *1*. | |
| 75 | |
| 76 This value is only used during *Cmpds* value of -m, --mode option. | |
| 77 | |
| 78 -o, --overwrite | |
| 79 Overwrite existing files. | |
| 80 | |
| 81 -r, --root *rootname* | |
| 82 New SD file names are generated using the root: | |
| 83 <Root>Part<Count>.sdf. Default new file names: | |
| 84 <InitialSDFileName>Part<Count>.sdf. This option is ignored for multiple input files. | |
| 85 | |
| 86 -w,--workingdir *dirname* | |
| 87 Location of working directory. Default: current directory. | |
| 88 | |
| 89 EXAMPLES | |
| 90 To split each SD file into 5 new SD files, type: | |
| 91 | |
| 92 % SplitSDFiles.pl -n 5 -o Sample1.sdf Sample2.sdf | |
| 93 % SplitSDFiles.pl -n 5 -o *.sdf | |
| 94 | |
| 95 To split Sample1.sdf into 10 new NewSample*.sdf files, type: | |
| 96 | |
| 97 % SplitSDFiles.pl -m Files -n 10 -r NewSample -o Sample1.sdf | |
| 98 | |
| 99 To split Sample1.sdf into new NewSample*.sdf files containing a maximum of | |
| 100 5 compounds in each file, type: | |
| 101 | |
| 102 % SplitSDFiles.pl -m Cmpds --numcmpds 5 -r NewSample -o Sample1.sdf | |
| 103 | |
| 104 To split Sample1.sdf into new SD files containing one compound each with | |
| 105 new file names corresponding to molname line, type: | |
| 106 | |
| 107 % SplitSDFiles.pl -m Cmpds --numcmpds 1 -c MolName -o Sample1.sdf | |
| 108 | |
| 109 To split Sample1.sdf into new SD files containing one compound each with | |
| 110 new file names corresponding to value of datafield MolID, type: | |
| 111 | |
| 112 % SplitSDFiles.pl -m Cmpds --numcmpds 1 -c DataField -d MolID | |
| 113 -o Sample1.sdf | |
| 114 | |
| 115 AUTHOR | |
| 116 Manish Sud <msud@san.rr.com> | |
| 117 | |
| 118 SEE ALSO | |
| 119 InfoSDFiles.pl, JoinSDFiles.pl, MolFilesToSD.pl, SDToMolFiles.pl | |
| 120 | |
| 121 COPYRIGHT | |
| 122 Copyright (C) 2015 Manish Sud. All rights reserved. | |
| 123 | |
| 124 This file is part of MayaChemTools. | |
| 125 | |
| 126 MayaChemTools is free software; you can redistribute it and/or modify it | |
| 127 under the terms of the GNU Lesser General Public License as published by | |
| 128 the Free Software Foundation; either version 3 of the License, or (at | |
| 129 your option) any later version. | |
| 130 | 
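
The file naming and *Cmpds* mode rules described under OPTIONS can be illustrated with a minimal Python sketch. It is not the SplitSDFiles.pl implementation: the function names (`sanitize_name`, `read_records`, `chunk_by_count`, `cmpd_file_name`) are hypothetical, the `<Label>.sdf` naming for *MolName* and *DataField* modes is an assumption, and the only format fact relied on is that each SD file record ends with a `$$$$` line.

```python
import re
from typing import List


def sanitize_name(value: str) -> str:
    """Keep only the characters the text allows in file names: a-zA-Z0-9_."""
    return re.sub(r"[^a-zA-Z0-9_]", "", value)


def read_records(sd_text: str) -> List[str]:
    """Split SD file text into compound records; a '$$$$' line closes each record."""
    records: List[str] = []
    current: List[str] = []
    for line in sd_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def chunk_by_count(records: List[str], num_cmpds: int) -> List[List[str]]:
    """Cmpds-style split: at most num_cmpds records per new file."""
    return [records[i:i + num_cmpds] for i in range(0, len(records), num_cmpds)]


def cmpd_file_name(root: str, record_number: int, label: str = "") -> str:
    """Name a one-compound file.

    Assumption: a non-empty label (molname line or datafield value) becomes
    <Label>.sdf; an empty label falls back to <Root>Cmd<RecordNumber>.sdf,
    mirroring the RootPrefix fallback described in the text.
    """
    cleaned = sanitize_name(label)
    if cleaned:
        return f"{cleaned}.sdf"
    return f"{sanitize_name(root)}Cmd{record_number}.sdf"


if __name__ == "__main__":
    # Seven tiny placeholder records, not real molecule blocks.
    text = "".join(f"Mol{i}\n$$$$\n" for i in range(7))
    chunks = chunk_by_count(read_records(text), 5)
    print([len(chunk) for chunk in chunks])
    print(cmpd_file_name("NewSample", 3))
```

With seven records and a limit of five compounds per file, the sketch yields two chunks, which is how the --numcmpds value indirectly fixes the number of new files.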
