annotate docs/scripts/txt/MergeTextFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
NAME
    MergeTextFiles.pl - Merge multiple CSV or TSV text files into a single
    text file

SYNOPSIS
    MergeTextFiles.pl TextFiles...

    MergeTextFiles.pl [-h, --help] [--indelim comma | semicolon] [-c,
    --columns colnum,...;... | collabel,...;...] [-k, --keys colnum,...;...
    | collabel,...;...] [-m, --mode colnum | collabel] [-o, --overwrite]
    [--outdelim comma | tab | semicolon] [-q, --quote yes | no] [-r, --root
    rootname] [-s, --startcol colnum | collabel] [--startcolmode before |
    after] [-w, --workingdir dirname] TextFiles...

DESCRIPTION
    Merge multiple CSV or TSV *TextFiles* into the first *TextFile* to
    generate a single text file. Unless the -k, --keys option is used, data
    rows from the other *TextFiles* are added to the first *TextFile* in
    sequential order, and the number of rows in the first *TextFile*
    determines how many rows of data are added from the other *TextFiles*.
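    The sequential (keyless) merge described above can be sketched in
    Python as follows. This is a minimal illustration and not the code of
    MergeTextFiles.pl. The function name merge_sequential is hypothetical,
    rows are plain lists of strings, and padding missing rows with empty
    values is an assumption. The text only states that the first file's
    row count decides how many rows are taken.

```python
from typing import List

Row = List[str]


def merge_sequential(first: List[Row], others: List[List[Row]]) -> List[Row]:
    """Append each other file's row i to row i of the first file.

    The first file's row count decides how many rows are taken from every
    other file: surplus rows are dropped, and missing rows are padded with
    empty strings (the padding is an assumption of this sketch).
    """
    merged = [list(row) for row in first]  # copy so the input is not modified
    for other in others:
        # Widest row in this file, used as the padding width.
        width = max((len(row) for row in other), default=0)
        for i, row in enumerate(merged):
            extra = other[i] if i < len(other) else [""] * width
            row.extend(extra)
    return merged
```

    Iterating over the first file's rows, rather than over the other files'
    rows, is what makes the first file's row count the limit.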

    Multiple *TextFiles* names are separated by spaces. The valid file
    extensions are *.csv* and *.tsv* for comma/semicolon and tab delimited
    text files, respectively. All other file names are ignored. All the text
    files in the current directory can be specified by **.csv*, **.tsv*, or
    the current directory name. The --indelim option determines the format
    of *TextFiles*. Any file that doesn't correspond to the format indicated
    by the --indelim option is ignored.
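    A minimal Python sketch of the file-selection rule. It assumes the file
    extension alone decides whether a file is kept. The names
    input_delimiter and select_text_files are hypothetical, and the
    case-insensitive extension match is an assumption. The sketch does not
    reproduce the tool's check that a file's contents match the --indelim
    format.

```python
import os
from typing import List, Optional, Tuple

# Hypothetical mapping of --indelim values to delimiter characters.
IN_DELIMS = {"comma": ",", "semicolon": ";"}


def input_delimiter(path: str, indelim: str = "comma") -> Optional[str]:
    """Return the delimiter used to read a file, or None if it is ignored."""
    ext = os.path.splitext(path)[1].lower()  # case-insensitive match is an assumption
    if ext == ".csv":
        return IN_DELIMS[indelim]
    if ext == ".tsv":
        return "\t"  # --indelim is ignored for TSV files
    return None  # all other file names are skipped


def select_text_files(paths: List[str], indelim: str = "comma") -> List[Tuple[str, str]]:
    """Keep only .csv and .tsv files, each paired with its delimiter."""
    selected = []
    for path in paths:
        delim = input_delimiter(path, indelim)
        if delim is not None:
            selected.append((path, delim))
    return selected
```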

OPTIONS
    -h, --help
        Print this help message.

    --indelim *comma | semicolon*
        Input delimiter for CSV *TextFile(s)*. Possible values: *comma or
        semicolon*. Default value: *comma*. For TSV files, this option is
        ignored and *tab* is used as a delimiter.

    -c, --columns *colnum,...;... | collabel,...;...*
        This value is mode specific. It is a list of columns to merge into
        the first text file, specified by column numbers or labels, with the
        list for each text file delimited by ";". All specified text files
        are merged into the first text file.

        Default value: *all;all;...*. By default, all columns from the
        specified text files are merged into the first text file.

        For *colnum* mode, input value format is:
        *colnum,...;colnum,...;...*. Example:

            "1,2;1,3,4;7,8,9"

        For *collabel* mode, input value format is:
        *collabel,...;collabel,...;...*. Example:

            "MW,SumNO;SumNHOH,ClogP,PSA;MolName,Mol_Id,Extreg"
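        The following sketch shows how a mode-specific --columns value maps
        onto each file. It is a hypothetical Python parser that turns a value
        such as "all;1,2;3,4,5" into 0-based column indices, one group per
        text file. The function name, the requirement of exactly one group
        per file, and the error handling are assumptions for illustration.

```python
from typing import List


def parse_column_spec(spec: str, headers: List[List[str]], mode: str = "colnum") -> List[List[int]]:
    """Convert a -c, --columns value into 0-based column indices per text file.

    headers holds the header row of each text file, in command-line order.
    """
    groups = spec.split(";")
    if len(groups) != len(headers):
        raise ValueError("expected one ';'-separated group per text file")
    result = []
    for group, header in zip(groups, headers):
        group = group.strip()
        if group == "all":
            result.append(list(range(len(header))))  # every column of this file
            continue
        indices = []
        for item in group.split(","):
            item = item.strip()
            if mode == "colnum":
                index = int(item) - 1  # column numbers are 1-based
                if not 0 <= index < len(header):
                    raise ValueError(f"column number {item} is out of range")
            else:  # collabel mode: find the label in this file's header row
                if item not in header:
                    raise ValueError(f"column label {item!r} not found")
                index = header.index(item)
            indices.append(index)
        result.append(indices)
    return result
```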

    -k, --keys *colnum,...;... | collabel,...;...*
        This value is mode specific. It specifies the column keys to use for
        merging all specified text files into the first text file. The column
        keys are specified by column numbers or labels for each text file,
        delimited by ";".

        By default, data rows from the text files are merged into the first
        file in the order in which they appear.

        For *colnum* mode, input value format is: *colkeynum, colkeynum;...*.
        Example:

            "1;3;7"

        For *collabel* mode, input value format is: *colkeylabel,
        colkeylabel;...*. Example:

            "Mol_Id;Mol_Id;Cmpd_Id"
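        The idea behind a key-based merge can be sketched as a lookup table
        built from each other file's key column. This Python sketch is
        illustrative only. merge_by_key is a hypothetical name, each file's
        first row is assumed to be its header, and keys are given as 0-based
        indices. Unmatched rows are assumed to receive empty values, and a
        repeated key is assumed to keep its last row.

```python
from typing import Dict, List

Row = List[str]


def merge_by_key(first: List[Row], first_key: int,
                 others: List[List[Row]], other_keys: List[int]) -> List[Row]:
    """Merge other files into the first file by matching key column values.

    Row 0 of every file is treated as its header; key columns are 0-based.
    Unmatched rows get empty values and a repeated key keeps its last row
    (both are assumptions of this sketch).
    """
    merged = [list(row) for row in first]
    for other, key in zip(others, other_keys):
        header, data = other[0], other[1:]
        # Index the other file once so each lookup is a dictionary access.
        lookup: Dict[str, Row] = {row[key]: row for row in data}
        merged[0].extend(header)
        for row in merged[1:]:
            match = lookup.get(row[first_key])
            row.extend(match if match is not None else [""] * len(header))
    return merged
```

        Building the dictionary once per file keeps the merge linear in the
        number of rows. The alternative, rescanning every other file for each
        row of the first file, does not.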

    -m, --mode *colnum | collabel*
        Specify how to merge text files: using column numbers or column
        labels. Possible values: *colnum or collabel*. Default value:
        *colnum*.

    -o, --overwrite
        Overwrite existing files.

    --outdelim *comma | tab | semicolon*
        Output text file delimiter. Possible values: *comma, tab, or
        semicolon*. Default value: *comma*.

    -q, --quote *yes | no*
        Put quotes around column values in the output text file. Possible
        values: *yes or no*. Default value: *yes*.
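        The sketch below shows how --outdelim and --quote could shape a
        single output row. The mapping table and format_row are
        hypothetical. Doubling embedded quote characters (as RFC 4180 does)
        is an assumption about how MergeTextFiles.pl escapes values.

```python
from typing import List

# Hypothetical mapping of --outdelim values to delimiter characters.
OUT_DELIMS = {"comma": ",", "tab": "\t", "semicolon": ";"}


def format_row(values: List[str], outdelim: str = "comma", quote: str = "yes") -> str:
    """Join one output row using the --outdelim and --quote settings."""
    delim = OUT_DELIMS[outdelim]
    if quote == "yes":
        # Double embedded quotes, RFC 4180 style (an assumption about escaping).
        values = ['"' + v.replace('"', '""') + '"' for v in values]
    return delim.join(values)
```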

    -r, --root *rootname*
        The new text file name is generated using the root: <Root>.<Ext>.
        Default file name: <FirstTextFileName>1To<Count>Merged.<Ext>. The
        csv and tsv <Ext> values are used for comma/semicolon and tab
        delimited text files, respectively.
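        The following hypothetical Python helper builds the output file name
        from the rules above. It assumes <FirstTextFileName> is the first
        file's name with its extension removed. It also assumes <Count> is
        the number of text files being merged. The text spells out neither
        detail.

```python
import os
from typing import List, Optional


def output_file_name(text_files: List[str], outdelim: str = "comma",
                     root: Optional[str] = None) -> str:
    """Return <Root>.<Ext>, or <FirstTextFileName>1To<Count>Merged.<Ext> by default."""
    ext = "tsv" if outdelim == "tab" else "csv"  # comma and semicolon both map to csv
    if root is None:
        # Assumption: the first file's name minus its extension, and Count = number of files.
        first_name = os.path.splitext(text_files[0])[0]
        root = f"{first_name}1To{len(text_files)}Merged"
    return f"{root}.{ext}"
```

        Under these assumptions, merging Sample1.csv, Sample2.csv, and
        Sample3.csv without -r would yield Sample11To3Merged.csv.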

    -s, --startcol *colnum | collabel*
        This value is mode specific. It specifies the column in the first
        text file at which merging of the other text files starts. For
        *colnum* mode, specify a column number; for *collabel* mode, specify
        a column label.

        Default value: *last*. Start merge after the last column.

    --startcolmode *before | after*
        Start the merge before or after the -s, --startcol value. Possible
        values: *before or after*. Default value: *after*.
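        The way --startcol and --startcolmode choose the insertion point can
        be sketched as an index computation over the first file's header
        row. The function names are hypothetical, and indices are 0-based.
        Treating the default *last* as "append at the end" regardless of
        --startcolmode is an assumption.

```python
from typing import List


def insert_position(header: List[str], startcol: str = "last",
                    startcolmode: str = "after", mode: str = "colnum") -> int:
    """Return the 0-based index in the first file's columns where merged columns go."""
    if startcol == "last":
        return len(header)  # default: append after the last column
    if mode == "colnum":
        index = int(startcol) - 1  # column numbers are 1-based
    else:
        index = header.index(startcol)  # raises ValueError for an unknown label
    return index if startcolmode == "before" else index + 1


def insert_columns(row: List[str], new_values: List[str], position: int) -> List[str]:
    """Splice the merged values into a row at the computed position."""
    return row[:position] + new_values + row[position:]
```

        With -s 3 --startcolmode before, the sketch returns index 2. The
        merged columns therefore land between the second and third columns
        of the first file.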

    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To merge Sample2.csv and Sample3.csv into Sample1.csv and generate
    NewSample.csv, type:

        % MergeTextFiles.pl -r NewSample -o Sample1.csv Sample2.csv
        Sample3.csv

    To merge all Sample*.csv files and generate NewSample.tsv, type:

        % MergeTextFiles.pl -r NewSample --indelim comma --outdelim tab -o
        Sample*.csv

    To merge column numbers "1,2" and "3,4,5" from Sample2.csv and
    Sample3.csv into Sample1.csv, starting before column number 3 in
    Sample1.csv, and to generate NewSample.csv without quoting column data,
    type:

        % MergeTextFiles.pl -s 3 --startcolmode before -r NewSample -q no
        -m colnum -c "all;1,2;3,4,5" -o Sample1.csv Sample2.csv
        Sample3.csv

    To merge columns "Mol_ID,Formula,MolWeight" and "Mol_ID,NAME,ChemBankID"
    from Sample2.csv and Sample3.csv into Sample1.csv, using "Mol_ID" as the
    column key, starting after the last column, and to generate
    NewSample.tsv, type:

        % MergeTextFiles.pl -r NewSample --outdelim tab
        -k "Mol_ID;Mol_ID;Mol_ID" -m collabel
        -c "all;Mol_ID,Formula,MolWeight;Mol_ID,NAME,ChemBankID"
        -o Sample1.csv Sample2.csv Sample3.csv

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    JoinTextFiles.pl, MergeTextFilesWithSD.pl, ModifyTextFilesFormat.pl,
    SplitTextFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.