annotate cor.xml @ 0:ffcdde989859 draft

Uploaded
author iuc
date Tue, 29 Jul 2014 06:30:45 -0400
parents
children 2e7bc1bb2dbe
<tool id="cor2" name="Correlation" version="1.1.0">
    <description>for numeric columns</description>
    <expand macro="requirements" />
    <macros>
        <import>statistic_tools_macros.xml</import>
    </macros>
    <command interpreter="python">cor.py $input1 $out_file1 $numeric_columns $method</command>
    <inputs>
        <param format="tabular" name="input1" type="data" label="Dataset" help="Dataset missing? See TIP below"/>
        <param name="numeric_columns" label="Numerical columns" type="data_column" numerical="True" multiple="True" data_ref="input1" help="Multi-select list - hold the appropriate key while clicking to select multiple columns" />
        <param name="method" type="select" label="Method">
            <option value="pearson">Pearson</option>
            <option value="kendall">Kendall rank</option>
            <option value="spearman">Spearman rank</option>
        </param>
    </inputs>
    <outputs>
        <data format="txt" name="out_file1" />
    </outputs>
    <tests>
        <!--
            Test a tabular input whose first line is a comment that does not start with a # character
        -->
        <test>
            <param name="input1" value="cor.tabular" />
            <param name="numeric_columns" value="2,3" />
            <param name="method" value="pearson" />
            <output name="out_file1" file="cor_out.txt" />
        </test>
    </tests>
    <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert*

.. class:: warningmark

Missing data ("nan") is removed from each pairwise comparison.

-----

**Syntax**

This tool computes the matrix of correlation coefficients between numeric columns.

- All invalid, blank and comment lines are skipped when performing computations. The number of skipped lines is displayed in the resulting history item.

- **Pearson's Correlation** reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables. The formula for Pearson's correlation is:

    .. image:: $PATH_TO_IMAGES/pearson.png

    where n is the number of items

- **Kendall's rank correlation** measures the degree of correspondence between two rankings and assesses the significance of this correspondence. The formula for Kendall's rank correlation is:

    .. image:: $PATH_TO_IMAGES/kendall.png

    where n is the number of items, and P is the sum.

- **Spearman's rank correlation** assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. The formula for Spearman's rank correlation is:

    .. image:: $PATH_TO_IMAGES/spearman.png

    where D is the difference between the ranks of corresponding values of X and Y, and N is the number of pairs of values.
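
The three coefficients described above can be sketched in plain Python. This is an illustration only, not the code in cor.py. The formula images are not reproduced here, so the sketch assumes the common textbook forms: the deviation form for Pearson, 1 - 6*sum(D^2) / (N*(N^2 - 1)) for Spearman, and 4P / (n*(n - 1)) - 1 for Kendall, with P assumed to be the number of concordant pairs. The rank-based forms assume no tied values, and the function names and sample lists are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's r from deviations about the two means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Ranks from 1 (smallest) to n. Assumption: no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho from squared rank differences (no ties assumed)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendall(x, y):
    """Kendall's tau, taking P as the count of concordant pairs (no ties assumed)."""
    n = len(x)
    p = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant when both columns order it the same way.
            if (x[i] - x[j]) * (y[i] - y[j]) > 0:
                p += 1
    return 4 * p / (n * (n - 1)) - 1


# Hypothetical sample columns, chosen so that no values are tied.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson(x, y), spearman(x, y), kendall(x, y))
```

For these sample columns Pearson and Spearman both come out at about 0.8 and Kendall at about 0.6. With tied values the two rank-based shortcuts above no longer apply as written.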

-----

**Example**

- Input file::

    #Person  Height  Self Esteem
    1        68      4.1
    2        71      4.6
    3        62      3.8
    4        75      4.4
    5        58      3.2
    6        60      3.1
    7        67      3.8
    8        68      4.1
    9        71      4.3
    10       69      3.7
    11       68      3.5
    12       67      3.2
    13       63      3.7
    14       62      3.3
    15       60      3.4
    16       63      4.0
    17       65      4.1
    18       67      3.8
    19       63      3.4
    20       61      3.6

- Computing the correlation coefficients between columns 2 and 3 of the above file (using Pearson's Correlation) gives the following output::

    1.0             0.730635686279
    0.730635686279  1.0

So the correlation for our twenty cases is 0.73, which is a fairly strong positive relationship.
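
The example can be checked by hand with a short Python sketch. It is not the tool's own implementation: it assumes the raw-sums form of Pearson's correlation, and the variable and function names are hypothetical. The two lists are columns 2 and 3 of the input file shown above.

```python
import math

# Columns 2 and 3 (Height, Self Esteem) of the example input file.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson(x, y):
    """Pearson's r computed from raw sums over n paired values."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Build the 2 x 2 correlation matrix, one row per selected column.
columns = [height, esteem]
matrix = [[pearson(a, b) for b in columns] for a in columns]
for row in matrix:
    print("\t".join(str(round(value, 12)) for value in row))
```

The off-diagonal entries should agree with the coefficient of about 0.73 reported above, and the diagonal entries are 1.0 because each column is perfectly correlated with itself.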
    </help>
</tool>