comparison docs/scripts/txt/ExtractFromTextFiles.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
| author | deepakjadmin |
|---|---|
| date | Wed, 20 Jan 2016 09:23:18 -0500 |
| -1:000000000000 | 0:4816e4a8ae95 |
|---|---|
| 1 NAME | |
| 2 ExtractFromTextFiles.pl - Extract specific data from TextFile(s) | |
| 3 | |
| 4 SYNOPSIS | |
| 5 ExtractFromTextFiles.pl TextFile(s)... | |
| 6 | |
| 7 ExtractFromTextFiles.pl [-c, --colmode colnum | collabel] [--categorycol | |
| 8 number | string] [--columns "colnum,[colnum]..." | | |
| 9 "collabel,[collabel]..."] [-h, --help] [--indelim *comma | semicolon*] | |
| 10 [-m, --mode *columns | rows | categories*] [-o, --overwrite] [--outdelim | |
| 11 *comma | tab | semicolon*] [-q, --quote *yes | no*] [--rows | |
| 12 "colid,value,criteria..." | "colid,value..." | | |
| 13 "colid,mincolvalue,maxcolvalue" | "rownum,rownum,..." | colid | | |
| 14 "minrownum,maxrownum"] [ --rowsmode rowsbycolvalue | rowsbycolvaluelist | |
| 15 | rowsbycolvaluerange | rowbymincolvalue | rowbymaxcolvalue | rownums | | |
| 16 rownumrange] [-r, --root *rootname*] [-w, --workingdir *dirname*] | |
| 17 TextFile(s)... | |
| 18 | |
| 19 DESCRIPTION | |
| 20 Extract column(s)/row(s) data from *TextFile(s)* identified by column | |
| 21 numbers or labels. Or categorize data using a specified column category. | |
| 22 During categorization, a summary text file is generated containing | |
| 23 category name and count; an additional text file, containing data for | |
| 24 each category, is also generated. The file names are separated by | |
| 25 spaces. The valid file extensions are *.csv* and *.tsv* for | |
| 26 comma/semicolon and tab delimited text files respectively. All other | |
| 27 file names are ignored. All the text files in the current directory can be | |
| 28 specified by **.csv*, **.tsv*, or the current directory name. The | |
| 29 --indelim option determines the format of *TextFile(s)*. Any file which | |
| 30 doesn't correspond to the format indicated by --indelim option is | |
| 31 ignored. | |
| 32 | |
| 33 OPTIONS | |
| 34 -c, --colmode *colnum | collabel* | |
| 35 Specify how columns are identified in *TextFile(s)*: using column | |
| 36 number or column label. Possible values: *colnum or collabel*. | |
| 37 Default value: *colnum*. | |
| 38 | |
| 39 --categorycol *number | string* | |
| 40 Column used to categorize data. Default value: First column. | |
| 41 | |
| 42 For *colnum* value of -c, --colmode option, input value is a column | |
| 43 number. Example: *1*. | |
| 44 | |
| 45 For *collabel* value of -c, --colmode option, input value is a | |
| 46 column label. Example: *Mol_ID*. | |
| 47 | |
| 48 --columns *"colnum,[colnum]..." | "collabel,[collabel]..."* | |
| 49 List of comma delimited columns to extract. Default value: First | |
| 50 column. | |
| 51 | |
| 52 For *colnum* value of -c, --colmode option, input values format is: | |
| 53 *colnum,colnum,...*. Example: *1,3,5* | |
| 54 | |
| 55 For *collabel* value of -c, --colmode option, input values format | |
| 56 is: *collabel,collabel,..*. Example: *Mol_ID,MolWeight* | |
| 57 | |
| 58 -h, --help | |
| 59 Print this help message. | |
| 60 | |
| 61 --indelim *comma | semicolon* | |
| 62 Input delimiter for CSV *TextFile(s)*. Possible values: *comma or | |
| 63 semicolon*. Default value: *comma*. For TSV files, this option is | |
| 64 ignored and *tab* is used as a delimiter. | |
| 65 | |
| 66 -m, --mode *columns | rows | categories* | |
| 67 Specify what to extract from *TextFile(s)*. Possible values: | |
| 68 *columns, rows, or categories*. Default value: *columns*. | |
| 69 | |
| 70 For *columns* mode, data for appropriate columns specified by | |
| 71 --columns option is extracted from *TextFile(s)* and placed into new | |
| 72 text files. | |
| 73 | |
| 74 For *rows* mode, appropriate rows specified in conjunction with | |
| 75 --rowsmode and --rows options are extracted from *TextFile(s)* and | |
| 76 placed into new text files. | |
| 77 | |
| 78 For *categories* mode, column specified by --categorycol is used to | |
| 79 categorize data, and a summary text file is generated containing | |
| 80 category name and count; an additional text file, containing data | |
| 81 for each category, is also generated. | |
| 82 | |
| 83 -o, --overwrite | |
| 84 Overwrite existing files. | |
| 85 | |
| 86 --outdelim *comma | tab | semicolon* | |
| 87 Output text file delimiter. Possible values: *comma, tab, or | |
| 88 semicolon*. Default value: *comma*. | |
| 89 | |
| 90 -q, --quote *yes | no* | |
| 91 Put quotes around column values in output text file. Possible | |
| 92 values: *yes or no*. Default value: *yes*. | |
| 93 | |
| 94 -r, --root *rootname* | |
| 95 New file name is generated using the root: <Root>.<Ext>. Default for | |
| 96 new file names: <TextFile>CategoriesSummary.<Ext>, | |
| 97 <TextFile>ExtractedColumns.<Ext>, and <TextFile>ExtractedRows.<Ext> | |
| 98 for *categories*, *columns*, and *rows* mode respectively. And | |
| 99 <TextFile>Category<CategoryName>.<Ext> for each category retrieved | |
| 100 from each text file. The output file type determines <Ext> value: | |
| 101 csv and tsv for CSV and TSV files, respectively. | |
| 102 | |
| 103 This option is ignored for multiple input files. | |
| 104 | |
| 105 --rows *"colid,value,criteria..." | "colid,value..." | | |
| 106 "colid,mincolvalue,maxcolvalue" | "rownum,rownum,..." | colid | | |
| 107 "minrownum,maxrownum"* | |
| 108 This value is --rowsmode specific. In general, it's a list of comma | |
| 109 separated column ids and associated mode specific values. Column id | |
| 110 specification, by column label or number, is controlled by the | |
| 111 -c, --colmode option. | |
| 112 | |
| 113 First line containing column labels is always written out. And value | |
| 114 comparisons assume numerical column data. | |
| 115 | |
| 116 For *rowsbycolvalue* mode, input value format contains these | |
| 117 triplets: *colid,value, criteria...*. Possible values for criteria: | |
| 118 *le, ge or eq*. Examples: | |
| 119 | |
| 120 MolWt,450,le | |
| 121 MolWt,450,le,LogP,5,le,SumNumNO,10,le,SumNHOH,5,le | |
| 122 | |
| 123 For *rowsbycolvaluelist* mode, input value format is: | |
| 124 *colid,value...*. Examples: | |
| 125 | |
| 126 Mol_ID,20 | |
| 127 Mol_ID,20,1002,1115 | |
| 128 | |
| 129 For *rowsbycolvaluerange* mode, input value format is: | |
| 130 *colid,mincolvalue,maxcolvalue*. Examples: | |
| 131 | |
| 132 MolWt,100,450 | |
| 133 | |
| 134 For *rowbymincolvalue, rowbymaxcolvalue* modes, input value format | |
| 135 is: *colid*. | |
| 136 | |
| 137 For *rownums* mode, input value format is: *rownum,rownum,...*. | |
| 138 Default value: *2*. | |
| 139 | |
| 140 For *rownumrange* mode, input value format is: *minrownum, | |
| 141 maxrownum*. Examples: | |
| 142 | |
| 143 10,40 | |
| 144 | |
| 145 --rowsmode *rowsbycolvalue | rowsbycolvaluelist | rowsbycolvaluerange | | |
| 146 rowbymincolvalue | rowbymaxcolvalue | rownums | rownumrange* | |
| 147 Specify how to extract rows from *TextFile(s)*. Possible values: | |
| 148 *rowsbycolvalue, rowsbycolvaluelist, rowsbycolvaluerange, | |
| 149 rowbymincolvalue, rowbymaxcolvalue, rownums, rownumrange*. Default | |
| 150 value: *rownums*. | |
| 151 | |
| 152 Use --rows option to list rows criterion used for extraction of rows | |
| 153 from *TextFile(s)*. | |
| 154 | |
| 155 -w, --workingdir *dirname* | |
| 156 Location of working directory. Default: current directory. | |
| 157 | |
| 158 EXAMPLES | |
| 159 To extract the first column from a text file and generate a new CSV text | |
| 160 file NewSample1.csv, type: | |
| 161 | |
| 162 % ExtractFromTextFiles.pl -r NewSample1 -o Sample1.csv | |
| 163 | |
| 164 To extract columns Mol_ID, MolWeight, and NAME from Sample1.csv and | |
| 165 generate a new textfile NewSample1.tsv with no quotes, type: | |
| 166 | |
| 167 % ExtractFromTextFiles.pl -m columns -c collabel --columns "Mol_ID, | |
| 168 MolWeight,NAME" --outdelim tab --quote no -r NewSample1 | |
| 169 -o Sample1.csv | |
| 170 | |
| 171 To extract rows with MolWeight column values less than or equal to 450 | |
| 172 from Sample1.csv and generate a new textfile NewSample1.csv, type: | |
| 173 | |
| 174 % ExtractFromTextFiles.pl -m rows --rowsmode rowsbycolvalue | |
| 175 -c collabel --rows MolWeight,450,le -r NewSample1 | |
| 176 -o Sample1.csv | |
| 177 | |
| 178 To extract rows containing values for MolWeight column between 450 and | |
| 179 500 from Sample1.csv and generate a new textfile NewSample1.csv, type: | |
| 180 | |
| 181 % ExtractFromTextFiles.pl -m rows --rowsmode rowsbycolvaluerange | |
| 182 -c collabel --rows MolWeight,450,500 -r NewSample1 | |
| 183 -o Sample1.csv | |
| 184 | |
| 185 To extract a row containing minimum value for column MolWeight from | |
| 186 Sample1.csv and generate a new textfile NewSample1.csv, type: | |
| 187 | |
| 188 % ExtractFromTextFiles.pl -m rows --rowsmode rowbymincolvalue | |
| 189 -c collabel --rows MolWeight -r NewSample1 | |
| 190 -o Sample1.csv | |
| 191 | |
| 192 AUTHOR | |
| 193 Manish Sud <msud@san.rr.com> | |
| 194 | |
| 195 SEE ALSO | |
| 196 JoinTextFiles.pl, MergeTextFilesWithSD.pl, ModifyTextFilesFormat.pl, | |
| 197 SplitTextFiles.pl | |
| 198 | |
| 199 COPYRIGHT | |
| 200 Copyright (C) 2015 Manish Sud. All rights reserved. | |
| 201 | |
| 202 This file is part of MayaChemTools. | |
| 203 | |
| 204 MayaChemTools is free software; you can redistribute it and/or modify it | |
| 205 under the terms of the GNU Lesser General Public License as published by | |
| 206 the Free Software Foundation; either version 3 of the License, or (at | |
| 207 your option) any later version. | |
| 208 |
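
The *rowsbycolvalue* format described under --rows (repeated *colid,value,criteria* triplets with criteria *le, ge or eq*) can be illustrated with a short Python sketch. This is not the script's Perl implementation: the function names `parse_triplets` and `filter_rows` and the sample data are hypothetical. It also assumes that a row must satisfy every triplet to be kept, which the documentation does not state. As documented, the header line is always written out and comparisons are numerical.

```python
import csv
import io
import operator

# Criteria names come from the --rows documentation: le, ge, eq.
CRITERIA = {"le": operator.le, "ge": operator.ge, "eq": operator.eq}


def parse_triplets(spec):
    """Split "colid,value,criteria,..." into (colid, value, compare) tuples."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) % 3 != 0:
        raise ValueError("expected colid,value,criteria triplets")
    return [
        (parts[i], float(parts[i + 1]), CRITERIA[parts[i + 2]])
        for i in range(0, len(parts), 3)
    ]


def filter_rows(text, spec, delimiter=","):
    """Return the header row plus data rows that satisfy every triplet.

    Assumption: triplets are combined with logical AND.
    Column ids are treated as labels (the collabel column mode).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    tests = [
        (header.index(col), value, compare)
        for col, value, compare in parse_triplets(spec)
    ]
    kept = [
        row
        for row in data
        if all(compare(float(row[idx]), value) for idx, value, compare in tests)
    ]
    return [header] + kept


# Hypothetical sample data, made up for illustration only.
SAMPLE = "Mol_ID,MolWeight,LogP\n1,320.5,2.1\n2,450,5.5\n3,512.3,4.0\n"
result = filter_rows(SAMPLE, "MolWeight,450,le,LogP,5,le")
for row in result:
    print(",".join(row))
```

With the sample above, only the row for Mol_ID 1 passes both tests. Mol_ID 2 meets the MolWeight limit, since *le* includes equality, but fails the LogP limit.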
