NAME
    MergeTextFilesWithSD.pl - Merge CSV or TSV TextFile(s) into SDFile

SYNOPSIS
    MergeTextFilesWithSD.pl SDFile TextFile(s)...

    MergeTextFilesWithSD.pl [-h, --help] [--indelim comma | semicolon] [-c,
    --columns colnum,...;... | collabel,...;...] [-k, --keys colkeynum;... |
    colkeylabel;...] [-m, --mode colnum | collabel] [-o, --overwrite] [-r,
    --root rootname] [-s, --sdkey sdfieldname] [-w, --workingdir dirname]
    SDFile TextFile(s)...

DESCRIPTION
    Merge multiple CSV or TSV *TextFile(s)* into *SDFile*. Unless the -k,
    --keys option is used, data rows from all *TextFile(s)* are added to
    *SDFile* in sequential order, and the number of compounds in *SDFile*
    determines how many rows of data are added from *TextFile(s)*.

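The sequential behavior described above can be pictured with a minimal Python sketch. It is an illustration only, not the script's implementation: the function name `rows_for_sequential_merge` is hypothetical, and filling in `None` when a text file has fewer rows than there are compounds is an assumption, since the documentation does not say what happens in that case.

```python
def rows_for_sequential_merge(compound_count, text_file_rows):
    """Pair each compound position with the row at the same position
    in every text file.

    compound_count: number of compounds in the SD file.
    text_file_rows: one list of data rows per text file (header excluded).
    """
    merged = []
    for index in range(compound_count):
        # Assumption: a text file that runs out of rows contributes None.
        merged.append([rows[index] if index < len(rows) else None
                       for rows in text_file_rows])
    return merged


file1 = [["A", "1"], ["B", "2"], ["C", "3"]]
file2 = [["x"], ["y"]]

# Only two compounds, so only the first two rows of each file are used.
first_two = rows_for_sequential_merge(2, [file1, file2])
```

The point of the sketch is that the compound count, not the length of any text file, bounds the number of rows taken.
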
    Multiple *TextFile(s)* names are separated by spaces. The valid file
    extensions are *.csv* and *.tsv* for comma/semicolon and tab delimited
    text files respectively. All other file names are ignored. All the text
    files in the current directory can be specified by "*.csv", "*.tsv", or
    the current directory name. The --indelim option determines the format
    of *TextFile(s)*. Any file which doesn't correspond to the format
    indicated by the --indelim option is ignored.

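As a rough sketch of the extension rule above, the following Python function keeps only .csv and .tsv names and pairs each with a delimiter. The function name `select_text_files` is hypothetical, and treating extensions case-insensitively is an assumption. The sketch covers only the extension check; it does not inspect file contents.

```python
import os


def select_text_files(names, indelim="comma"):
    """Return (name, delimiter) pairs for names ending in .csv or .tsv."""
    csv_delimiters = {"comma": ",", "semicolon": ";"}
    selected = []
    for name in names:
        # Assumption: the extension comparison ignores case.
        ext = os.path.splitext(name)[1].lower()
        if ext == ".csv":
            selected.append((name, csv_delimiters[indelim]))
        elif ext == ".tsv":
            # For TSV files the --indelim value is not used.
            selected.append((name, "\t"))
    return selected


picked = select_text_files(["a.csv", "b.tsv", "notes.txt"], "semicolon")
```
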
OPTIONS
    -h, --help
        Print this help message.

    --indelim *comma | semicolon*
        Input delimiter for CSV *TextFile(s)*. Possible values: *comma or
        semicolon*. Default value: *comma*. For TSV files, this option is
        ignored and *tab* is used as a delimiter.

    -c, --columns *colnum,...;... | collabel,...;...*
        This value is mode specific. It is a list of columns to merge into
        *SDFile* specified by column numbers or labels for each text file
        delimited by ";". All *TextFile(s)* are merged into *SDFile*.

        Default value: *all;all;...*. By default, all columns from
        *TextFile(s)* are merged into *SDFile*.

        For *colnum* mode, input value format is:
        *colnum,...;colnum,...;...*. Example:

            "1,2;1,3,4;7,8,9"

        For *collabel* mode, input value format is:
        *collabel,...;collabel,...;...*. Example:

            "MW,SumNO;SumNHOH,ClogP,PSA;MolName,Mol_Id,Extreg"

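The structure of a --columns value (one ";"-separated group per text file, with ","-separated entries inside each group) can be shown with a small Python sketch. The function name `parse_columns` is hypothetical, and passing a group of "all" through unchanged is an assumption based on the documented default; this is not the script's own parser.

```python
def parse_columns(spec, mode="colnum"):
    """Split a --columns style value into one list per text file."""
    groups = []
    for group in spec.split(";"):
        items = [item.strip() for item in group.split(",")]
        # In colnum mode the entries are column numbers; "all" is kept as is.
        if mode == "colnum" and items != ["all"]:
            items = [int(item) for item in items]
        groups.append(items)
    return groups


by_number = parse_columns("1,2;1,3,4;7,8,9")
by_label = parse_columns(
    "MW,SumNO;SumNHOH,ClogP,PSA;MolName,Mol_Id,Extreg", mode="collabel")
```

Each resulting list applies to the text file at the same position on the command line.
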
    -k, --keys *colkeynum;... | colkeylabel;...*
        This value is mode specific. It specifies column keys to use for
        merging *TextFile(s)* into *SDFile*. The column keys, delimited by
        ";", are specified by column numbers or labels for *TextFile(s)*.

        By default, data rows from *TextFile(s)* are merged into *SDFile* in
        the order they appear.

        For *colnum* mode, input value format is:
        *colkeynum;colkeynum;...*. Example:

            "1;3;7"

        For *collabel* mode, input value format is:
        *colkeylabel;colkeylabel;...*. Example:

            "Mol_Id;Mol_Id;Cmpd_Id"

    -m, --mode *colnum | collabel*
        Specify how to merge *TextFile(s)* into *SDFile*: using column
        numbers or column labels. Possible values: *colnum or collabel*.
        Default value: *colnum*.

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New SD file name is generated using the root: <Root>.sdf. Default
        file name:
        <InitialSDFileName>MergedWith<FirstTextFileName>1To<Count>.sdf.

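One possible reading of the default file name pattern is sketched below in Python. Two points are assumptions, not statements from the documentation: that <InitialSDFileName> and <FirstTextFileName> are file names with their extensions removed, and that <Count> is the number of text files. The function name `default_output_name` is hypothetical.

```python
import os


def default_output_name(sdfile, textfiles):
    """Build a name following the documented default pattern."""
    # Assumption: both name parts are used without their extensions.
    sd_root = os.path.splitext(os.path.basename(sdfile))[0]
    first_root = os.path.splitext(os.path.basename(textfiles[0]))[0]
    # Assumption: <Count> is the number of text files being merged.
    return f"{sd_root}MergedWith{first_root}1To{len(textfiles)}.sdf"


name = default_output_name("Sample.sdf", ["Sample1.csv", "Sample2.csv"])
```
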
    -s, --sdkey *sdfieldname*
        *SDFile* data field name used as a key to merge data from
        *TextFile(s)*. By default, data rows from *TextFile(s)* are merged
        into *SDFile* in the order they appear.

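Key-based merging, as selected by -s, --sdkey together with -k, --keys, amounts to a lookup of each compound's key field in the text file's key column. The Python sketch below illustrates the idea with plain dictionaries. The function name `merge_by_key` is hypothetical, and leaving unmatched compounds unchanged and letting the last duplicate key win are assumptions; the script's actual handling of those cases is not documented here.

```python
def merge_by_key(compounds, sdkey, text_rows, key_label, columns):
    """Copy selected columns from text rows onto compounds with the same key.

    compounds: list of dicts holding SD data fields.
    text_rows: list of dicts keyed by column label.
    """
    # Assumption: if a key occurs more than once, the last row wins.
    lookup = {row[key_label]: row for row in text_rows}
    merged = []
    for compound in compounds:
        fields = dict(compound)
        row = lookup.get(compound.get(sdkey))
        # Assumption: compounds without a matching row are left unchanged.
        if row is not None:
            for label in columns:
                fields[label] = row[label]
        merged.append(fields)
    return merged


compounds = [{"Mol_ID": "M1"}, {"Mol_ID": "M2"}]
rows = [{"Mol_ID": "M2", "Formula": "CH4", "MolWeight": "16.04"}]
keyed = merge_by_key(compounds, "Mol_ID", rows, "Mol_ID",
                     ["Formula", "MolWeight"])
```

Unlike the sequential default, the row order in the text file does not matter in this mode.
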
    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To merge Sample1.csv and Sample2.csv into Sample.sdf and generate
    NewSample.sdf, type:

        % MergeTextFilesWithSD.pl -r NewSample -o Sample.sdf
          Sample1.csv Sample2.csv

    To merge all Sample*.tsv into Sample.sdf and generate NewSample.sdf
    file, type:

        % MergeTextFilesWithSD.pl -r NewSample -o Sample.sdf
          Sample*.tsv

    To merge column numbers "1,2" and "3,4,5" from Sample1.csv and
    Sample2.csv into Sample.sdf and to generate NewSample.sdf, type:

        % MergeTextFilesWithSD.pl -r NewSample -m colnum -c "1,2;3,4,5"
          -o Sample.sdf Sample1.csv Sample2.csv

    To merge columns "Mol_ID,Formula,MolWeight" and "Mol_ID,ChemBankID,NAME"
    from Sample1.csv and Sample2.csv into Sample.sdf, using "Mol_ID" as both
    the SD key and the column key, and to generate NewSample.sdf, type:

        % MergeTextFilesWithSD.pl -r NewSample -s Mol_ID -k "Mol_ID;Mol_ID"
          -m collabel -c "Mol_ID,Formula,MolWeight;Mol_ID,ChemBankID,NAME"
          -o Sample.sdf Sample1.csv Sample2.csv

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    ExtractFromSDFiles.pl, FilterSDFiles.pl, InfoSDFiles.pl, JoinSDFiles.pl,
    JoinTextFiles.pl, MergeTextFiles.pl, ModifyTextFilesFormat.pl,
    SplitSDFiles.pl, SplitTextFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.