text_processing: awk.xml annotate

annotate awk.xml @ 5:3f0e0d4c15a9 draft

Uploaded

author	bgruening
date	Wed, 07 Jan 2015 11:15:41 -0500
parents	56e80527c482
children	8928e6d1e7ba

rev	line source
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	1 <tool id="tp_awk_tool" name="Text reformatting" version="@BASE_VERSION@.0">
2 fc862d5bccaf Uploaded bgruening parents: 1 diff changeset	2 <description>with awk</description>
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	3 <macros>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	4 <import>macros.xml</import>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	5 </macros>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	6 <expand macro="requirements">
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	7 <requirement type="package" version="4.1.0">gnu_awk</requirement>
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	8 </expand>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	9 <version_command>awk --version | head -n 1</version_command>
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	10 <command>
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	11 <![CDATA[
56e80527c482 Uploaded bgruening parents: 3 diff changeset	12 awk
56e80527c482 Uploaded bgruening parents: 3 diff changeset	13 --sandbox
56e80527c482 Uploaded bgruening parents: 3 diff changeset	14 -v FS=\$'\t'
56e80527c482 Uploaded bgruening parents: 3 diff changeset	15 -v OFS=\$'\t'
56e80527c482 Uploaded bgruening parents: 3 diff changeset	16 --re-interval
56e80527c482 Uploaded bgruening parents: 3 diff changeset	17 -f '$awk_script'
56e80527c482 Uploaded bgruening parents: 3 diff changeset	18 "$input"
56e80527c482 Uploaded bgruening parents: 3 diff changeset	19 > "$output"
56e80527c482 Uploaded bgruening parents: 3 diff changeset	20 ]]>
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	21 </command>
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	22 <inputs>
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	23 <param format="txt" name="input" type="data" label="File to process" />
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	24 <param name="url_paste" type="text" area="true" size="5x35" label="AWK Program" help="">
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	25 <sanitizer>
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	26 <valid initial="string.printable">
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	27 <remove value="'"/>
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	28 </valid>
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	29 </sanitizer>
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	30 </param>
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	31 </inputs>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	32 <configfiles>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	33 <configfile name="awk_script">
56e80527c482 Uploaded bgruening parents: 3 diff changeset	34 $url_paste
56e80527c482 Uploaded bgruening parents: 3 diff changeset	35 </configfile>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	36 </configfiles>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	37 <outputs>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	38 <data format="input" name="output" metadata_source="input"/>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	39 </outputs>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	40 <tests>
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	41 <test>
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	42 <param name="input" value="unix_awk_input1.txt" />
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	43 <param name="url_paste" value="$2>0.5 { print $2*9, $1 }" />
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	44 <output name="output" file="unix_awk_output1.txt" />
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	45 </test>
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	46 </tests>
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	47
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	48 <help>
56e80527c482 Uploaded bgruening parents: 3 diff changeset	49 <![CDATA[
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	50 What it does
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	51
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	52 This tool runs the Unix awk command on the selected data file.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	53
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	54 .. class:: infomark
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	55
1 a4ad586d1403 Uploaded bgruening parents: 0 diff changeset	56 TIP:
a4ad586d1403 Uploaded bgruening parents: 0 diff changeset	57
a4ad586d1403 Uploaded bgruening parents: 0 diff changeset	58 This tool uses the extended regular expression syntax (not the Perl syntax).
a4ad586d1403 Uploaded bgruening parents: 0 diff changeset	59 \\d, \\w, \\s etc. are not supported.
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	60
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	61
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	62 Further reading
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	63
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	64 - Awk by Example (http://www.ibm.com/developerworks/linux/library/l-awk1.html)
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	65 - Long AWK tutorial (http://www.grymoire.com/Unix/Awk.html)
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	66 - Learn AWK in 1 hour (http://www.selectorweb.com/awk.html)
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	67 - awk cheat-sheet (http://cbi.med.harvard.edu/people/peshkin/sb302/awk_cheatsheets.pdf)
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	68 - Collection of useful awk one-liners (http://student.northpark.edu/pemente/awk/awk1line.txt)
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	69
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	70 -----
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	71
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	72 AWK programs
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	73
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	74 Most AWK programs consist of patterns (i.e. rules that match lines of text) and actions (i.e. commands to execute when a pattern matches a line).
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	75
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	76 The basic form of an AWK program is::
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	77
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	78 pattern { action 1; action 2; action 3; }
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	79
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	80
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	81
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	82 Pattern Examples
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	83
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	84 - $2 == "chr3" will match lines whose second column is the string 'chr3'
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	85 - $5-$4>23 will match lines where subtracting the value of the fourth column from the value of the fifth column gives a value larger than 23.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	86 - /AG..AG/ will match lines that contain the regular expression AG..AG (meaning the characters AG followed by any two characters followed by AG). (This is the way to specify regular expressions on the entire line, similar to grep.)
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	87 - $7 ~ /A{4}U/ will match lines whose seventh column contains 4 consecutive A's followed by a U. (This is the way to specify regular expressions on a specific field.)
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	88 - 10000 < $4 && $4 < 20000 will match lines whose fourth column value is larger than 10,000 but smaller than 20,000
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	89 - If no pattern is specified, all lines match (meaning the action part will be executed on all lines).
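
The patterns above can be tried on a small tab-separated sample. This is only a sketch and not part of the tool: the sample rows, their column layout (id, chromosome, strand, start, end) and the shell variable names are made up for illustration. FS is set to a tab to mirror the tool's command line.

```shell
# Hypothetical sample: id, chromosome, strand, start, end (tab-separated).
sample=$(printf 'g1\tchr1\t+\t100\t150\ng2\tchr3\t-\t200\t210\ng3\tchr3\t+\t15000\t15100\n')

# String comparison on a field: lines whose second column is "chr3".
chr3_lines=$(printf '%s\n' "$sample" | awk -v FS='\t' '$2 == "chr3"')

# Arithmetic on fields: lines where fifth minus fourth column is larger than 23.
long_lines=$(printf '%s\n' "$sample" | awk -v FS='\t' '$5-$4>23')

# Range test: fourth column larger than 10,000 but smaller than 20,000.
range_lines=$(printf '%s\n' "$sample" | awk -v FS='\t' '10000 < $4 && $4 < 20000')

printf '%s\n' "$chr3_lines" "$long_lines" "$range_lines"
```

None of the three programs has an action part, so each matching line is printed unchanged.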
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	90
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	91
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	92
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	93 Action Examples
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	94
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	95 - { print } or { print $0 } will print the entire input line (the line that matched the pattern). $0 is a special marker meaning 'the entire line'.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	96 - { print $1, $4, $5 } will print only the first, fourth and fifth fields of the input line.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	97 - { print $4, $5-$4 } will print the fourth column and the difference between the fifth and fourth columns. (If the fourth column holds the start position in the input file and the fifth column holds the end position, the output file will contain the start position and the length.)
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	98 - If no action part is specified (not even the curly brackets), the default action is to print the entire line.
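
A pattern and an action can be combined in one program. The following sketch uses a hypothetical two-line sample (id, chromosome, strand, start, end) and made-up variable names. Setting both FS and OFS to a tab, as the tool's command does, keeps the printed fields tab-separated.

```shell
# Hypothetical sample: id, chromosome, strand, start, end (tab-separated).
sample=$(printf 'g1\tchr1\t+\t100\t150\ng2\tchr3\t-\t200\t210\n')

# Action only: print the start position and the length of every line.
lengths=$(printf '%s\n' "$sample" | awk -v FS='\t' -v OFS='\t' '{ print $4, $5-$4 }')

# Pattern plus action: first, fourth and fifth fields of chr3 lines only.
subset=$(printf '%s\n' "$sample" | awk -v FS='\t' -v OFS='\t' '$2 == "chr3" { print $1, $4, $5 }')

printf '%s\n' "$lengths" "$subset"
```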
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	99
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	100
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	101
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	102 AWK's Regular Expression Syntax
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	103
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	104 AWK searches the data for lines that match (or do not match) the given pattern. A regular expression is a pattern describing a certain amount of text.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	105
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	106 - ( ) { } [ ] . * ? + \\ ^ $ are all special characters. \\ can be used to "escape" a special character, allowing that special character to be searched for.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	107 - ^ matches the beginning of a string (but not an internal line).
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	108 - ( .. ) groups a particular pattern.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	109 - { n or n, or n,m } specifies an expected number of repetitions of the preceding pattern.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	110
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	111 - {n} The preceding item is matched exactly n times.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	112 - {n,} The preceding item is matched n or more times.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	113 - {n,m} The preceding item is matched at least n times but not more than m times.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	114
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	115 - [ ... ] creates a character class. Within the brackets, single characters can be placed. A dash (-) may be used to indicate a range such as a-z.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	116 - . Matches any single character except a newline.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	117 - \* The preceding item will be matched zero or more times.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	118 - ? The preceding item is optional and matched at most once.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	119 - + The preceding item will be matched one or more times.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	120 - ^ has two meanings:
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	121 - matches the beginning of a line or string.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	122 - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	123 - $ matches the end of a line or string.
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	124 - \| Separates alternate possibilities.
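
A few of these operators can be checked against a handful of lines. This is a rough sketch with invented sequences and variable names. Interval expressions such as A{4}U are left out because not every awk enables them by default, which is presumably why the tool passes --re-interval.

```shell
# Hypothetical sequences, one per line.
seqs=$(printf 'AGCCAG\nTTAGGAGT\nAAAAU\nGATTACA\n')

# . matches any single character: AG, any two characters, then AG.
dot_hits=$(printf '%s\n' "$seqs" | awk '/AG..AG/')

# ^ and $ anchor the match to both ends of the line; + means one or more.
anchored=$(printf '%s\n' "$seqs" | awk '/^A+U$/')

# [^...] negates a character class; | separates alternatives.
alt_hits=$(printf '%s\n' "$seqs" | awk '/^[^AT]|U$/')

printf '%s\n' "$dot_hits" "$anchored" "$alt_hits"
```

The last program keeps lines that either start with a character other than A or T, or end with U.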
ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	125
4 56e80527c482 Uploaded bgruening parents: 3 diff changeset	126 @REFERENCES@
56e80527c482 Uploaded bgruening parents: 3 diff changeset	127 ]]>
1 a4ad586d1403 Uploaded bgruening parents: 0 diff changeset	128 </help>
0 ec66f9d90ef0 initial uploaded bgruening parents: diff changeset	129 </tool>
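
Outside Galaxy, the command block of this wrapper can be approximated roughly as follows. This is a sketch under assumptions: the input rows, the temporary directory and the file names are hypothetical, the program text is the one from the wrapper's test, and the gawk-specific --sandbox and --re-interval options are dropped so that the example also runs with other awk implementations.

```shell
# Hypothetical input: name and score, tab-separated (not the tool's real test file).
workdir=$(mktemp -d)
printf 'a\t0.2\nb\t0.7\nc\t1\n' > "$workdir/input.txt"

# The program text goes into a file, as the configfile does with $url_paste.
cat > "$workdir/script.awk" <<'EOF'
$2>0.5 { print $2*9, $1 }
EOF

# Tab as input and output field separator, program read with -f, output redirected.
awk -v FS='\t' -v OFS='\t' -f "$workdir/script.awk" "$workdir/input.txt" > "$workdir/output.txt"

result=$(cat "$workdir/output.txt")
rm -rf "$workdir"
printf '%s\n' "$result"
```

Reading the program from a file with -f avoids quoting the user's AWK code on the command line.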
