annotate docs/scripts/txt/MergeTextFilesWithSD.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
NAME
    MergeTextFilesWithSD.pl - Merge CSV or TSV TextFile(s) into SDFile

SYNOPSIS
    MergeTextFilesWithSD.pl SDFile TextFile(s)...

    MergeTextFilesWithSD.pl [-h, --help] [--indelim comma | semicolon] [-c,
    --columns colnum,...;... | collabel,...;...] [-k, --keys colkeynum;... |
    colkeylabel;...] [-m, --mode colnum | collabel] [-o, --overwrite] [-r,
    --root rootname] [-s, --sdkey sdfieldname] [-w, --workingdir dirname]
    SDFile TextFile(s)...

DESCRIPTION
    Merge multiple CSV or TSV *TextFile(s)* into *SDFile*. Unless the -k,
    --keys option is used, data rows from all *TextFile(s)* are added to
    *SDFile* in sequential order, and the number of compounds in *SDFile*
    determines how many rows of data are added from *TextFile(s)*.

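    The sequential behavior described above can be sketched in Python. This
    is a minimal illustration, not the script's actual implementation:
    merge_sequential and its in-memory data layout are hypothetical, and
    skipping a text file that has fewer rows than there are compounds is an
    assumption.

```python
def merge_sequential(compounds, text_tables):
    """Attach the i-th row of every text table to the i-th compound.

    compounds   -- list of dicts, one per SD record (hypothetical layout)
    text_tables -- list of (header, rows) pairs, one per text file
    """
    merged = []
    for index, fields in enumerate(compounds):
        record = dict(fields)  # copy so the input is left untouched
        for header, rows in text_tables:
            # Assumption: a table that runs out of rows adds nothing.
            if index < len(rows):
                record.update(zip(header, rows[index]))
        merged.append(record)
    return merged


compounds = [{"Mol_ID": "M1"}, {"Mol_ID": "M2"}]
table = (["MW", "ClogP"], [["180.2", "1.2"], ["151.2", "0.5"], ["999.9", "9.9"]])
result = merge_sequential(compounds, [table])
print(result)
```

    In this sketch the third text row is never used, because the SD data
    holds only two compounds.
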
    Multiple *TextFile(s)* names are separated by spaces. The valid file
    extensions are *.csv* and *.tsv* for comma/semicolon and tab delimited
    text files respectively. All other file names are ignored. All the text
    files in the current directory can be specified by the wildcards *.csv
    or *.tsv, or by the current directory name. The --indelim option
    determines the format of *TextFile(s)*. Any file which doesn't
    correspond to the format indicated by the --indelim option is ignored.

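    The extension and delimiter rules above can be sketched as a small
    Python helper. The function name delimiter_for is hypothetical, and
    treating extensions case-insensitively is an assumption. The sketch
    covers only the choice by file name, not any check of file contents.

```python
import os


def delimiter_for(path, indelim="comma"):
    """Return the field delimiter for a text file, or None to ignore it."""
    extension = os.path.splitext(path)[1].lower()  # case folding is an assumption
    if extension == ".tsv":
        return "\t"  # --indelim is ignored for TSV files
    if extension == ".csv":
        return {"comma": ",", "semicolon": ";"}[indelim]
    return None  # any other file name is ignored


for name in ["Sample1.csv", "Sample2.tsv", "Notes.txt"]:
    print(name, repr(delimiter_for(name, indelim="semicolon")))
```
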
OPTIONS
    -h, --help
        Print this help message.

    --indelim *comma | semicolon*
        Input delimiter for CSV *TextFile(s)*. Possible values: *comma or
        semicolon*. Default value: *comma*. For TSV files, this option is
        ignored and *tab* is used as a delimiter.

    -c, --columns *colnum,...;... | collabel,...;...*
        This value is mode specific. It lists the columns to merge into
        *SDFile*, given as column numbers or labels, with the list for each
        text file delimited by ";". All *TextFile(s)* are merged into
        *SDFile*.

        Default value: *all;all;...*. By default, all columns from
        *TextFile(s)* are merged into *SDFile*.

        For *colnum* mode, input value format is:
        *colnum,...;colnum,...;...*. Example:

            "1,2;1,3,4;7,8,9"

        For *collabel* mode, input value format is:
        *collabel,...;collabel,...;...*. Example:

            "MW,SumNO;SumNHOH,ClogP,PSA;MolName,Mol_Id,Extreg"

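        As an illustration of the value format, the following Python sketch
        splits a --columns value into one list per text file. The function
        parse_columns is hypothetical and not part of the script; keeping
        *all* as a keyword string is an assumption based on the documented
        default.

```python
def parse_columns(spec, mode="colnum"):
    """Split a --columns value into one entry per text file."""
    per_file = []
    for group in spec.split(";"):  # ";" separates the text files
        items = [item.strip() for item in group.split(",")]  # "," separates columns
        if items == ["all"]:
            per_file.append("all")  # keyword taken from the documented default
        elif mode == "colnum":
            per_file.append([int(item) for item in items])
        else:
            per_file.append(items)
    return per_file


print(parse_columns("1,2;1,3,4;7,8,9"))
# [[1, 2], [1, 3, 4], [7, 8, 9]]
print(parse_columns("MW,SumNO;SumNHOH,ClogP,PSA", mode="collabel"))
# [['MW', 'SumNO'], ['SumNHOH', 'ClogP', 'PSA']]
```
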
    -k, --keys *colkeynum;... | colkeylabel;...*
        This value is mode specific. It specifies column keys to use for
        merging *TextFile(s)* into *SDFile*. The column keys, delimited by
        ";", are specified by column numbers or labels for *TextFile(s)*.

        By default, data rows from *TextFile(s)* are merged into *SDFile* in
        the order they appear.

        For *colnum* mode, input value format is:
        *colkeynum;colkeynum;...*. Example:

            "1;3;7"

        For *collabel* mode, input value format is:
        *colkeylabel;colkeylabel;...*. Example:

            "Mol_Id;Mol_Id;Cmpd_Id"

    -m, --mode *colnum | collabel*
        Specify how to merge *TextFile(s)* into *SDFile*: using column
        numbers or column labels. Possible values: *colnum or collabel*.
        Default value: *colnum*.

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New SD file name is generated using the root: <Root>.sdf. Default
        file name:
        <InitialSDFileName>MergedWith<FirstTextFileName>1To<Count>.sdf.

    -s, --sdkey *sdfieldname*
        *SDFile* data field name used as a key to merge data from
        *TextFile(s)*. By default, data rows from *TextFile(s)* are merged
        into *SDFile* in the order they appear.

    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

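    The key-based merge selected by -k, --keys together with -s, --sdkey
    can be sketched in Python as a lookup on the key column. This is only
    an illustration under stated assumptions: merge_by_key and the data
    layout are hypothetical, compounds without a matching row are left
    unchanged, and the last row wins when a key is repeated. The script's
    actual handling of these cases is not described here.

```python
def merge_by_key(compounds, sd_key, header, rows, col_key):
    """Merge text rows into SD records by matching key values."""
    key_index = header.index(col_key)
    # Assumption: for a repeated key, the last row replaces earlier ones.
    lookup = {row[key_index]: row for row in rows}
    merged = []
    for fields in compounds:
        record = dict(fields)  # copy so the input is left untouched
        row = lookup.get(record.get(sd_key))
        if row is not None:  # assumption: unmatched compounds stay as they are
            record.update(zip(header, row))
        merged.append(record)
    return merged


compounds = [{"Mol_ID": "M2"}, {"Mol_ID": "M1"}, {"Mol_ID": "M9"}]
header = ["Mol_ID", "Formula"]
rows = [["M1", "C6H6"], ["M2", "CH4"]]
merged = merge_by_key(compounds, "Mol_ID", header, rows, "Mol_ID")
print(merged)
```

    Unlike the sequential default, the row order in the text file does not
    matter here: each compound picks up the row whose key matches its own.
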
EXAMPLES
    To merge Sample1.csv and Sample2.csv into Sample.sdf and generate
    NewSample.sdf, type:

        % MergeTextFilesWithSD.pl -r NewSample -o Sample.sdf
          Sample1.csv Sample2.csv

    To merge all Sample*.tsv into Sample.sdf and generate NewSample.sdf
    file, type:

        % MergeTextFilesWithSD.pl -r NewSample -o Sample.sdf
          Sample*.tsv

    To merge column numbers "1,2" and "3,4,5" from Sample1.csv and
    Sample2.csv into Sample.sdf and to generate NewSample.sdf, type:

        % MergeTextFilesWithSD.pl -r NewSample -m colnum -c "1,2;3,4,5"
          -o Sample.sdf Sample1.csv Sample2.csv

    To merge columns "Mol_ID,Formula,MolWeight" and "Mol_ID,ChemBankID,NAME"
    from Sample1.csv and Sample2.csv into Sample.sdf using "Mol_ID" as both
    the SD key and the column key to generate NewSample.sdf, type:

        % MergeTextFilesWithSD.pl -r NewSample -s Mol_ID -k "Mol_ID;Mol_ID"
          -m collabel -c "Mol_ID,Formula,MolWeight;Mol_ID,ChemBankID,NAME"
          -o Sample.sdf Sample1.csv Sample2.csv

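    The examples quote the --columns and --keys values because a POSIX
    shell treats ";" as a command separator. When the script is launched
    from another program, passing the arguments as a list avoids shell
    quoting altogether. The Python sketch below only builds and prints the
    command line from the third example; it assumes Python 3.8 or later for
    shlex.join and does not run the script.

```python
import shlex

# Arguments taken from the third example above, one list item per argument.
command = [
    "MergeTextFilesWithSD.pl",
    "-r", "NewSample",
    "-m", "colnum",
    "-c", "1,2;3,4,5",  # no shell quoting needed inside a list
    "-o",
    "Sample.sdf",
    "Sample1.csv",
    "Sample2.csv",
]

# shlex.join shows how the same command would be quoted for a shell.
print(shlex.join(command))
# To run it, one could call subprocess.run(command, check=True), assuming
# the script is on PATH and the input files exist.
```
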
AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    ExtractFromSDFiles.pl, FilterSDFiles.pl, InfoSDFiles.pl, JoinSDFiles.pl,
    JoinTextFiles.pl, MergeTextFiles.pl, ModifyTextFilesFormat.pl,
    SplitSDFiles.pl, SplitTextFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.
