annotate mayachemtool/mayachemtools/docs/scripts/txt/SplitTextFiles.txt @ 0:a4a2ad5a214e draft default tip

Uploaded
author deepakjadmin
date Thu, 05 Nov 2015 02:37:56 -0500
NAME
    SplitTextFiles.pl - Split CSV or TSV TextFile(s) into multiple text
    files

SYNOPSIS
    SplitTextFiles.pl TextFile(s)...

    SplitTextFiles.pl [-f, --fast] [-h, --help] [--indelim comma |
    semicolon] [-l, --label yes | no] [-n, --numfiles number] [-o,
    --overwrite] [--outdelim comma | tab | semicolon] [-q, --quote yes | no]
    [-r, --root rootname] [-w, --workingdir dirname] TextFile(s)...

DESCRIPTION
    Split CSV or TSV *TextFile(s)* into multiple text files. Each new text
    file contains a subset of lines from the initial file, and the subsets
    are of similar size. Multiple file names are separated by spaces. The
    valid file extensions are *.csv* and *.tsv* for comma/semicolon and tab
    delimited text files respectively. All other file names are ignored.
    All the text files in the current directory can be specified by
    **.csv*, **.tsv*, or the current directory name. The --indelim option
    determines the format of *TextFile(s)*. Any file which doesn't
    correspond to the format indicated by the --indelim option is ignored.

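    The "similar number of lines" behavior described above can be
    illustrated with a minimal Python sketch. It is not the script's own
    code: the function name split_lines is hypothetical, and two details
    are assumptions rather than documented behavior, namely that the label
    line is repeated in every part and that leftover lines go to the
    earliest parts.

```python
def split_lines(lines, num_files=2, has_label=True):
    """Divide lines into num_files parts holding a similar number of data lines.

    Assumption: when has_label is true, the first line is a label line and
    is repeated at the top of every part.
    """
    label = lines[:1] if has_label else []
    data = lines[1:] if has_label else list(lines)
    base, extra = divmod(len(data), num_files)
    parts = []
    start = 0
    for index in range(num_files):
        # Assumption: the first `extra` parts take one additional line each.
        size = base + (1 if index < extra else 0)
        parts.append(label + data[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    sample = ["ID,Value"] + [f"{i},{i * i}" for i in range(1, 8)]
    # Seven data lines split three ways, plus the label line in each part.
    print([len(part) for part in split_lines(sample, num_files=3)])  # [4, 3, 3]
```

    With 7 data lines and 3 parts, the parts hold 3, 2, and 2 data lines,
    so no two parts differ by more than one line.
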
OPTIONS
    -f, --fast
        In this mode, the --indelim, --outdelim, and -q, --quote options
        are ignored. The format of the input and output file(s) is assumed
        to be similar, and the text lines from the input *TextFile(s)* are
        transferred to the output file(s) without any processing.

    -h, --help
        Print this help message.

    --indelim *comma | semicolon*
        Input delimiter for CSV *TextFile(s)*. Possible values: *comma or
        semicolon*. Default value: *comma*. For TSV files, this option is
        ignored and *tab* is used as a delimiter.

    -l, --label *yes | no*
        First line contains column labels. Possible values: *yes or no*.
        Default value: *yes*.

    -n, --numfiles *number*
        Number of new files to generate for each *TextFile*. Default: *2*.

    -o, --overwrite
        Overwrite existing files.

    --outdelim *comma | tab | semicolon*
        Output text file delimiter. Possible values: *comma, tab, or
        semicolon*. Default value: *comma*.

    -q, --quote *yes | no*
        Put quotes around column values in output text file. Possible
        values: *yes or no*. Default value: *yes*.

    -r, --root *rootname*
        New text file names are generated using the root:
        <Root>Part<Count>.<Ext>. Default new file names:
        <InitialTextFileName>Part<Count>.<Ext>. The csv and tsv <Ext>
        values are used for comma/semicolon and tab delimited text files
        respectively. This option is ignored for multiple input files.

    -w, --workingdir *dirname*
        Location of working directory. Default: current directory.

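    The naming rule given for -r, --root can be sketched in Python as
    follows. This is an illustration only: the helper part_names is
    hypothetical, and it assumes that <Count> starts at 1 and that
    <InitialTextFileName> means the input file name without its extension,
    neither of which is stated in the option descriptions.

```python
import os

# Extension for each output delimiter, as described for <Ext> under --root.
EXT_BY_DELIM = {"comma": "csv", "semicolon": "csv", "tab": "tsv"}


def part_names(text_file, num_files=2, outdelim="comma", root=None):
    """Return output file names of the form <Root>Part<Count>.<Ext>."""
    if root is None:
        # Assumption: the default root is the input name minus its extension.
        root = os.path.splitext(os.path.basename(text_file))[0]
    ext = EXT_BY_DELIM[outdelim]
    # Assumption: part numbering starts at 1.
    return [f"{root}Part{count}.{ext}" for count in range(1, num_files + 1)]


if __name__ == "__main__":
    for name in part_names("Sample1.tsv", num_files=3, outdelim="comma"):
        print(name)
```

    Under these assumptions, a tab delimited input written with the default
    comma output delimiter produces names ending in .csv.
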
EXAMPLES
    To split each CSV text file into 5 different text files, type:

        % SplitTextFiles.pl -n 5 -o Sample1.csv Sample2.csv
        % SplitTextFiles.pl -n 5 -o *.csv

    To split Sample1.tsv into 10 different CSV text files, type:

        % SplitTextFiles.pl -n 10 --outdelim comma -o Sample1.tsv

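    The last example reads tab delimited lines and writes comma delimited,
    quoted values. A rough Python equivalent of that per-line conversion,
    using the standard csv module, is sketched here. The function
    convert_lines and its parameters are hypothetical, and treating
    --quote no as "quote only when required" is an assumption; the script
    itself may omit quotes entirely in that case.

```python
import csv
import io

# Delimiter characters for the option values named in this document.
DELIMS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def convert_lines(lines, indelim="tab", outdelim="comma", quote=True):
    """Re-delimit text lines, optionally quoting every column value."""
    reader = csv.reader(lines, delimiter=DELIMS[indelim])
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=DELIMS[outdelim],
        # Assumption: quote=False maps to minimal quoting, not "never quote".
        quoting=csv.QUOTE_ALL if quote else csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(reader)
    return buffer.getvalue().splitlines()


if __name__ == "__main__":
    for line in convert_lines(["ID\tName", "1\tAspirin"]):
        print(line)
```

    In --fast mode no such conversion would take place, since lines are
    transferred to the output without any processing.
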
AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    JoinTextFiles.pl, MergeTextFiles.pl, ModifyTextFilesFormat.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.