comparison docs/scripts/txt/MergeTextFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500

NAME
    MergeTextFiles.pl - Merge multiple CSV or TSV text files into a single
    text file

SYNOPSIS
    MergeTextFiles.pl TextFiles...

    MergeTextFiles.pl [-h, --help] [--indelim comma | semicolon] [-c,
    --columns colnum,...;... | collabel,...;...] [-k, --keys colnum,...;...
    | collabel,...;...] [-m, --mode colnum | collabel] [-o, --overwrite]
    [--outdelim comma | tab | semicolon] [-q, --quote yes | no] [-r, --root
    rootname] [-s, --startcol colnum | collabel] [--startcolmode before |
    after] [-w, --workingdir dirname] TextFiles...

DESCRIPTION
    Merge multiple CSV or TSV *TextFiles* into the first *TextFile* to
    generate a single text file. Unless the -k, --keys option is used, data
    rows from the other *TextFiles* are added to the first *TextFile* in
    sequential order, and the number of rows in the first *TextFile*
    determines how many rows of data are added from the other *TextFiles*.
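
    The sequential (keyless) merge described above can be illustrated with
    the following Python sketch. It is not the code of MergeTextFiles.pl,
    which is a Perl script. The function name merge_sequential is
    hypothetical, and padding missing rows with empty cells is an
    assumption; the help text only says that the first file's row count
    decides how many rows are taken.

```python
def merge_sequential(first_rows, other_tables):
    """Append the columns of each other table to the first table, row by row.

    Hypothetical sketch: tables are lists of rows (lists of strings), header
    row included. The first table's row count decides how many rows are
    read from each other table; extra rows are ignored, and missing rows
    are padded with empty cells (the padding is an assumption).
    """
    merged = []
    for i, row in enumerate(first_rows):
        new_row = list(row)
        for table in other_tables:
            width = len(table[0]) if table else 0
            extra = table[i] if i < len(table) else [""] * width
            new_row.extend(extra)
        merged.append(new_row)
    return merged


first = [["Mol_ID", "MW"], ["m1", "10"], ["m2", "20"]]
second = [["Formula"], ["C1"], ["C2"], ["C3"]]  # extra row "C3" is ignored
print(merge_sequential(first, [second]))
```

    Rows are matched purely by position, so the output has exactly as many
    rows as the first file.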

    Multiple *TextFiles* names are separated by spaces. The valid file
    extensions are *.csv* and *.tsv* for comma/semicolon and tab delimited
    text files, respectively. All other file names are ignored. All the text
    files in the current directory can be specified by *.csv, *.tsv, or the
    current directory name. The --indelim option determines the format of
    *TextFiles*; any file that doesn't match the format indicated by the
    --indelim option is ignored.
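
    The extension rules above can be sketched in Python as a filter that
    pairs each accepted file name with its input delimiter. This is an
    illustration only, not the script's own logic. The names
    select_text_files and CSV_DELIMITERS are hypothetical, and the check
    that a file's contents match --indelim is not modeled.

```python
import os

# Delimiters for *.csv files, keyed by the --indelim value.
CSV_DELIMITERS = {"comma": ",", "semicolon": ";"}


def select_text_files(names, indelim="comma"):
    """Keep only *.csv and *.tsv names and pair each with its delimiter.

    Hypothetical sketch: *.csv files use the --indelim delimiter, *.tsv
    files always use a tab, and any other name is skipped.
    """
    delimiter = CSV_DELIMITERS[indelim]  # KeyError for unsupported values
    selected = []
    for name in names:
        ext = os.path.splitext(name)[1]
        if ext == ".csv":
            selected.append((name, delimiter))
        elif ext == ".tsv":
            selected.append((name, "\t"))
    return selected


print(select_text_files(["Sample1.csv", "Sample2.tsv", "notes.txt"], "semicolon"))
```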

OPTIONS
    -h, --help
        Print this help message.

    --indelim *comma | semicolon*
        Input delimiter for CSV *TextFile(s)*. Possible values: *comma or
        semicolon*. Default value: *comma*. For TSV files, this option is
        ignored and *tab* is used as a delimiter.

    -c, --columns *colnum,...;... | collabel,...;...*
        This value is mode specific. It lists the columns, specified by
        column numbers or labels, to merge into the first text file; the
        column lists for individual text files are delimited by ";". All
        specified text files are merged into the first text file.

        Default value: *all;all;...*. By default, all columns from the
        specified text files are merged into the first text file.

        For *colnum* mode, input value format is:
        *colnum,...;colnum,...;...*. Example:

            "1,2;1,3,4;7,8,9"

        For *collabel* mode, input value format is:
        *collabel,...;collabel,...;...*. Example:

            "MW,SumNO;SumNHOH,ClogP,PSA;MolName,Mol_Id,Extreg"
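
        A -c, --columns value can be read as one ";"-separated group per
        text file, each group being "all" or a ","-separated list. The
        Python sketch below resolves such a value to 0-based column
        indices. It is hypothetical (parse_columns is not part of
        MayaChemTools) and assumes 1-based column numbers and exactly one
        group per text file.

```python
def parse_columns(spec, headers, mode="colnum"):
    """Resolve a --columns value such as "all;1,2;3,4,5" to column indices.

    Hypothetical sketch. headers holds one header row (list of labels) per
    text file; the result holds one list of 0-based indices per file.
    """
    groups = spec.split(";")
    if len(groups) != len(headers):
        raise ValueError("need one column group per text file")
    result = []
    for group, header in zip(groups, headers):
        group = group.strip()
        if group == "all":
            result.append(list(range(len(header))))
            continue
        indices = []
        for item in group.split(","):
            item = item.strip()
            if mode == "colnum":
                index = int(item) - 1  # column numbers assumed 1-based
                if not 0 <= index < len(header):
                    raise ValueError(f"column number out of range: {item}")
            else:  # collabel mode
                if item not in header:
                    raise ValueError(f"unknown column label: {item}")
                index = header.index(item)
            indices.append(index)
        result.append(indices)
    return result


headers = [["A", "B", "C"],
           ["Mol_ID", "Formula", "MolWeight", "X"],
           ["Mol_ID", "NAME", "ChemBankID"]]
print(parse_columns("all;1,2;3", headers))
print(parse_columns("all;Mol_ID,MolWeight;NAME", headers, mode="collabel"))
```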

    -k, --keys *colnum,...;... | collabel,...;...*
        This value is mode specific. It specifies the column keys used to
        merge all specified text files into the first text file. The column
        keys are specified by column numbers or labels for each text file,
        delimited by ";".

        By default, data rows from the text files are merged into the first
        file in the order they appear.

        For *colnum* mode, input value format is: *colkeynum, colkeynum;...*.
        Example:

            "1;3;7"

        For *collabel* mode, input value format is: *colkeylabel,
        colkeylabel;...*. Example:

            "Mol_Id;Mol_Id;Cmpd_Id"
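
        A keyed merge can be pictured as a lookup join: each row of the
        first file is extended with the row from every other file whose key
        value matches. The Python sketch below is a hypothetical
        illustration, not the script's algorithm. merge_by_keys is an
        invented name, key indices are 0-based (subtract 1 from a colnum
        value such as "1;3;7"), keys are assumed unique within each file,
        and unmatched rows get empty cells.

```python
def merge_by_keys(tables, key_columns):
    """Join each other table onto tables[0] by matching key values.

    Hypothetical sketch: every table is a list of rows with the header row
    first; key_columns holds one 0-based key column index per table.
    """
    first, others = tables[0], tables[1:]
    first_key = key_columns[0]
    # One key -> row lookup per other table (duplicate keys: last one wins).
    lookups = [{row[key]: row for row in table[1:]}
               for table, key in zip(others, key_columns[1:])]
    header = list(first[0]) + [cell for table in others for cell in table[0]]
    merged = [header]
    for row in first[1:]:
        new_row = list(row)
        for table, lookup in zip(others, lookups):
            match = lookup.get(row[first_key])
            new_row.extend(match if match is not None else [""] * len(table[0]))
        merged.append(new_row)
    return merged


t1 = [["Mol_ID", "MW"], ["m1", "10"], ["m2", "20"]]
t2 = [["Mol_ID", "Formula"], ["m2", "C2"], ["m1", "C1"]]
t3 = [["Cmpd_Id", "Name"], ["m1", "one"]]
for line in merge_by_keys([t1, t2, t3], [0, 0, 0]):
    print(line)
```

        Unlike the sequential merge, row order in the other files does not
        matter here; only the key values do.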

    -m, --mode *colnum | collabel*
        Specify how to merge text files: using column numbers or column
        labels. Possible values: *colnum or collabel*. Default value:
        *colnum*.

    -o, --overwrite
        Overwrite existing files.

    --outdelim *comma | tab | semicolon*
        Output text file delimiter. Possible values: *comma, tab, or
        semicolon*. Default value: *comma*.

    -q, --quote *yes | no*
        Put quotes around column values in the output text file. Possible
        values: *yes or no*. Default value: *yes*.
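
        The combined effect of --outdelim and --quote on a single output
        line can be sketched in Python as follows. format_row is a
        hypothetical helper. How the real script escapes quote characters
        that already appear inside values is not documented here, so this
        sketch leaves them unchanged.

```python
# Output delimiters keyed by the --outdelim value.
OUT_DELIMITERS = {"comma": ",", "tab": "\t", "semicolon": ";"}


def format_row(values, outdelim="comma", quote="yes"):
    """Build one output line; quote="yes" wraps every value in double quotes.

    Hypothetical sketch: embedded double quotes are not escaped.
    """
    delimiter = OUT_DELIMITERS[outdelim]
    if quote == "yes":
        values = [f'"{value}"' for value in values]
    return delimiter.join(values)


print(format_row(["Mol_ID", "MW"]))               # default: comma, quoted
print(format_row(["m1", "10"], "tab", "no"))     # tab delimited, unquoted
```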

    -r, --root *rootname*
        The new text file name is generated using the root: <Root>.<Ext>.
        Default file name: <FirstTextFileName>1To<Count>Merged.<Ext>. The
        <Ext> values csv and tsv are used for comma/semicolon and tab
        delimited text files, respectively.
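
        The naming rule can be sketched in Python. The function
        output_file_name is hypothetical, and two details are assumptions
        rather than documented behavior: <FirstTextFileName> is taken as
        the first file's name without its extension, and <Count> as the
        number of merged files. Following the EXAMPLES section, the
        extension is chosen from the output delimiter.

```python
import os


def output_file_name(text_files, root=None, outdelim="comma"):
    """Return <Root>.<Ext>, or the default <First>1To<Count>Merged.<Ext>.

    Hypothetical sketch; see the assumptions stated in the lead-in.
    """
    ext = "tsv" if outdelim == "tab" else "csv"
    if root is None:
        first_name = os.path.splitext(os.path.basename(text_files[0]))[0]
        root = f"{first_name}1To{len(text_files)}Merged"
    return f"{root}.{ext}"


print(output_file_name(["Sample1.csv", "Sample2.csv", "Sample3.csv"]))
print(output_file_name(["Sample1.csv", "Sample2.csv"], root="NewSample", outdelim="tab"))
```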

    -s, --startcol *colnum | collabel*
        This value is mode specific. It specifies the column in the first
        text file at which merging of the other text files starts. For
        *colnum* mode, specify a column number; for *collabel* mode, specify
        a column label.

        Default value: *last*. Start merging after the last column.

    --startcolmode *before | after*
        Start the merge before or after the -s, --startcol value. Possible
        values: *before or after*. Default value: *after*.
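
        The way -s, --startcol and --startcolmode select an insertion point
        can be sketched in Python. Both function names are hypothetical.
        startcol=None stands in for the default *last*, and column numbers
        are assumed to be 1-based.

```python
def insert_position(header, startcol=None, startcolmode="after", mode="colnum"):
    """Return the 0-based index in the first file's columns where merged
    columns are inserted. Hypothetical sketch of -s and --startcolmode."""
    if startcol is None:  # default "last": merge after the last column
        return len(header)
    if mode == "colnum":
        index = int(startcol) - 1  # column numbers assumed 1-based
        if not 0 <= index < len(header):
            raise ValueError(f"column number out of range: {startcol}")
    else:  # collabel mode; list.index raises ValueError for unknown labels
        index = header.index(startcol)
    return index if startcolmode == "before" else index + 1


def insert_columns(row, new_cells, position):
    """Splice merged cells into a row of the first file at position."""
    return row[:position] + new_cells + row[position:]


header = ["A", "B", "C", "D"]
position = insert_position(header, 3, "before")  # like "-s 3 --startcolmode before"
print(insert_columns(header, ["X", "Y"], position))
```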

    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To merge Sample2.csv and Sample3.csv into Sample1.csv and generate
    NewSample.csv, type:

        % MergeTextFiles.pl -r NewSample -o Sample1.csv Sample2.csv Sample3.csv

    To merge all Sample*.csv files and generate a tab delimited
    NewSample.tsv file, type:

        % MergeTextFiles.pl -r NewSample --indelim comma --outdelim tab -o Sample*.csv

    To merge column numbers "1,2" and "3,4,5" from Sample2.csv and
    Sample3.csv into Sample1.csv, starting before column number 3 in
    Sample1.csv, and to generate NewSample.csv without quoting column data,
    type:

        % MergeTextFiles.pl -s 3 --startcolmode before -r NewSample -q no \
          -m colnum -c "all;1,2;3,4,5" -o Sample1.csv Sample2.csv Sample3.csv

    To merge columns "Mol_ID,Formula,MolWeight" and "Mol_ID,NAME,ChemBankID"
    from Sample2.csv and Sample3.csv into Sample1.csv, using "Mol_ID" as the
    column key, starting after the last column, and to generate
    NewSample.tsv, type:

        % MergeTextFiles.pl -r NewSample --outdelim tab -k "Mol_ID;Mol_ID;Mol_ID" \
          -m collabel -c "all;Mol_ID,Formula,MolWeight;Mol_ID,NAME,ChemBankID" \
          -o Sample1.csv Sample2.csv Sample3.csv

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    JoinTextFiles.pl, MergeTextFilesWithSD.pl, ModifyTextFilesFormat.pl,
    SplitTextFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.