annotate gsummary.xml @ 94:6ef11b60940a draft

Uploaded
author bernhardlutz
date Sun, 26 Jan 2014 09:11:50 -0500
parents c4a3a8999945
children
<tool id="Summary_Statistics1" name="Summary Statistics" version="1.3.0">
  <description>for any numerical column</description>
  <expand macro="requirements" />
  <macros>
    <import>statistic_tools_macros.xml</import>
  </macros>
  <command interpreter="python">gsummary.py $input $out_file1 "$cond"</command>
  <inputs>
    <param format="tabular" name="input" type="data" label="Summary statistics on" help="Dataset missing? See TIP below"/>
    <param name="cond" size="30" type="text" value="c5" label="Column or expression" help="See syntax below">
      <validator type="empty_field" message="Enter a valid column or expression, see syntax below for examples"/>
    </param>
  </inputs>
  <outputs>
    <data format="tabular" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input" value="1.bed"/>
      <output name="out_file1" file="gsummary_out1.tabular"/>
      <param name="cond" value="c2"/>
    </test>
  </tests>
  <help>

.. class:: warningmark

This tool expects input datasets consisting of tab-delimited columns (blank or comment lines beginning with a # character are automatically skipped).

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert delimiters to TAB*

.. class:: infomark

**TIP:** Computing summary statistics may throw exceptions if the values in the columns being summarized are not all numerical. If a line is missing a value or contains a non-numerical value in the column being summarized, that line is skipped and its value is not included in the statistical computation. The number of invalid lines skipped is reported in the resulting history item.
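
The skipping behaviour described in this tip can be sketched in Python as follows. This is only an illustration, not the code in gsummary.py: the function name ``collect_column`` is hypothetical, and it assumes that blank and comment lines are ignored without being counted as invalid.

```python
def collect_column(lines, column):
    """Return (values, skipped) for a 1-based column number.

    Hypothetical helper for illustration; not taken from gsummary.py.
    """
    values = []
    skipped = 0
    for line in lines:
        stripped = line.strip()
        # Assumption: blank lines and comment lines are ignored, not counted.
        if not stripped or stripped.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        try:
            values.append(float(fields[column - 1]))
        except (IndexError, ValueError):
            # Missing or non-numerical value: skip the line and count it.
            skipped += 1
    return values, skipped
```

Calling ``collect_column(open("input.tabular"), 6)`` on a hypothetical file would return the numerical values of column 6 together with the number of lines that were skipped as invalid.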

.. class:: infomark

**USING R FUNCTIONS:** Most functions (like *abs*) take only a single expression. *log* can take one or two parameters, like *log(expression,base)*

Currently, these R functions are supported: *abs, sign, sqrt, floor, ceiling, trunc, round, signif, exp, log, cos, sin, tan, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh, lgamma, gamma, gammaCody, digamma, trigamma, cumsum, cumprod, cummax, cummin*

-----

**Syntax**

This tool computes basic summary statistics on a given column, or on a valid expression containing one or more columns.

- Columns are referenced with **c** and a **number**. For example, **c1** refers to the first column of a tab-delimited file.

- Expressions may combine columns, arithmetic operators, and the R functions listed above. For example:

  - **log(c5)** calculates the summary statistics for the natural log of column 5
  - **(c5 + c6 + c7) / 3** calculates the summary statistics for the average of columns 5-7
  - **log(c5,10)** calculates the summary statistics for the base 10 log of column 5
  - **sqrt(c5+c9)** calculates the summary statistics for the square root of column 5 + column 9
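
To make the column syntax concrete, here is a minimal Python sketch of how such an expression could be evaluated for a single line. It is an assumption-laden illustration only: the tool itself hands expressions to R, the name ``evaluate_expression`` is hypothetical, and only a handful of the listed functions are emulated here.

```python
import math
import re

# Matches column references such as c5 (the letter c followed by digits).
COLUMN_REF = re.compile(r"\bc(\d+)\b")


def _log(x, base=math.e):
    """Mimic log(expression) and log(expression,base)."""
    return math.log(x, base)


# Small whitelist standing in for the R functions; an assumption of this sketch.
ALLOWED = {"abs": abs, "sqrt": math.sqrt, "exp": math.exp, "log": _log}


def evaluate_expression(expression, line):
    """Evaluate a column expression for one tab-delimited line (hypothetical)."""
    fields = line.rstrip("\r\n").split("\t")

    def substitute(match):
        # c1 is the first column, so shift to a zero-based index.
        value = float(fields[int(match.group(1)) - 1])
        # Parentheses keep negative values intact inside larger expressions.
        return "(" + repr(value) + ")"

    python_expression = COLUMN_REF.sub(substitute, expression)
    # Evaluate with no builtins; only the whitelisted names are visible.
    return eval(python_expression, {"__builtins__": {}}, ALLOWED)
```

With a line whose third and fourth columns are 161416 and 170887, ``evaluate_expression("(c3 + c4) / 2", line)`` would yield their average. A non-numerical column raises ``ValueError``, which a caller could treat as an invalid line.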

-----

**Examples**

- Input Dataset::

    c1      c2      c3          c4          c5              c6
    586     chrX    161416      170887      41108_at        16990
    73      chrX    505078      532318      35073_at        1700
    595     chrX    1361578     1388460     33665_s_at      1960
    74      chrX    1420620     1461919     1185_at         8600

- Summary Statistics on column c6 of the above input dataset::

    #sum        mean        stdev       0%          25%         50%         75%         100%
    29250.000   7312.500    7198.636    1700.000    1895.000    5280.000    10697.500   16990.000

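The numbers in the example row can be reproduced with a short Python sketch. It assumes the sample standard deviation (dividing by n - 1) and quantiles obtained by linear interpolation between neighbouring sorted values, which is consistent with the example output. The function names are hypothetical and this is not the tool's own code.

```python
import math
import statistics


def quantile(sorted_values, fraction):
    """Linear interpolation between the two nearest sorted values."""
    position = (len(sorted_values) - 1) * fraction
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summarize(values):
    """Return sum, mean, sample stdev and five quantiles as formatted strings."""
    ordered = sorted(values)
    results = [
        sum(ordered),
        statistics.mean(ordered),
        statistics.stdev(ordered),  # sample standard deviation (n - 1)
    ]
    results += [quantile(ordered, f) for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
    return ["%.3f" % value for value in results]


# Column c6 of the example input dataset.
print("\t".join(summarize([16990, 1700, 1960, 8600])))
```

For the four c6 values this prints the same eight figures as the example row, from 29250.000 through 16990.000.
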
  </help>
</tool>