annotate cor.xml @ 1:2e7bc1bb2dbe draft default tip

Uploaded
author iuc
date Fri, 09 Jan 2015 12:56:07 -0500
parents ffcdde989859
children
rev   line source
1
2e7bc1bb2dbe Uploaded
iuc
parents: 0
diff changeset
1 <tool id="cor2" name="Correlation" version="1.1.0">
2 <description>for numeric columns</description>
3 <expand macro="requirements" />
4 <macros>
5 <import>statistic_tools_macros.xml</import>
6 </macros>
7 <command interpreter="python">
8 <![CDATA[
9 cor.py
10 $input1
11 $out_file1
12 $numeric_columns
13 $method
14 ]]>
15 </command>
16 <inputs>
17 <param format="tabular" name="input1" type="data" label="Dataset" help="Dataset missing? See TIP below"/>
18 <param name="numeric_columns" label="Numerical columns" type="data_column" numerical="True" multiple="True" data_ref="input1" help="Multi-select list - hold the appropriate key while clicking to select multiple columns" />
19 <param name="method" type="select" label="Method">
20 <option value="pearson">Pearson</option>
21 <option value="kendall">Kendall rank</option>
22 <option value="spearman">Spearman rank</option>
23 </param>
24 </inputs>
25 <outputs>
26 <data format="txt" name="out_file1" />
27 </outputs>
28 <tests>
29 <!--
30 Test a tabular input with the first line being a comment without a # character to start
31 -->
32 <test>
33 <param name="input1" value="cor.tabular" />
34 <param name="numeric_columns" value="2,3" />
35 <param name="method" value="pearson" />
36 <output name="out_file1" file="cor_out.txt" />
37 </test>
38 </tests>
39 <help>
40 <![CDATA[
41
42 .. class:: infomark
43
44 **TIP:** If your data is not TAB delimited, use *Text Manipulation->Convert*
45
46 .. class:: warningmark
47
48 Missing data ("nan") are removed from each pairwise comparison.
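Pairwise removal of missing values can be sketched as follows. This is a minimal illustration and not the tool's own code: the helper name `drop_missing_pairs` is hypothetical, and it assumes missing entries have already been parsed into float NaN values.

```python
import math


def drop_missing_pairs(x, y):
    """Keep only the positions where both columns hold a real value.

    Hypothetical helper for illustration; assumes missing entries
    are represented as float('nan').
    """
    kept = [(a, b) for a, b in zip(x, y)
            if not (math.isnan(a) or math.isnan(b))]
    xs = [a for a, _ in kept]
    ys = [b for _, b in kept]
    return xs, ys


nan = float("nan")
col_a = [1.0, nan, 3.0, 4.0]
col_b = [2.0, 5.0, nan, 8.0]

# Rows 2 and 3 each contain a NaN, so only rows 1 and 4 survive.
clean_a, clean_b = drop_missing_pairs(col_a, col_b)
print(clean_a, clean_b)
```

Because the filtering is done per pair of columns, a row dropped for one comparison may still contribute to another comparison in which both of its values are present.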
49
50 -----
51
52 **Syntax**
53
54 This tool computes the matrix of correlation coefficients between numeric columns.
55
56 - All invalid, blank and comment lines are skipped when performing computations. The number of skipped lines is displayed in the resulting history item.
57
58 - **Pearson's Correlation** reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables. The formula for Pearson's correlation is:
59
60 .. image:: $PATH_TO_IMAGES/pearson.png
61
62 where n is the number of items.
63
64 - **Kendall's rank correlation** measures the degree of correspondence between two rankings and assesses the significance of this correspondence. The formula for Kendall's rank correlation is:
65
66 .. image:: $PATH_TO_IMAGES/kendall.png
67
68 where n is the number of items, and P is the sum, over all items, of the number of items ranked after the given item by both rankings.
69
70 - **Spearman's rank correlation** assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. The formula for Spearman's rank correlation is:
71
72 .. image:: $PATH_TO_IMAGES/spearman.png
73
74 where D is the difference between the ranks of corresponding values of X and Y, and N is the number of pairs of values.
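The two rank-based coefficients can be sketched in plain Python. The formula images are not reproduced here, so this sketch assumes the common textbook forms for data without ties: Spearman's rho as 1 - 6 * sum(D^2) / (N * (N^2 - 1)), and Kendall's tau as 4P / (n * (n - 1)) - 1 with P taken as the number of concordant pairs. The function names are hypothetical and this is not the tool's implementation.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; assumes no tied values (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's rho via the sum of squared rank differences (no ties)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_no_ties(x, y):
    """Kendall's tau from the count of concordant pairs P (no ties)."""
    n = len(x)
    concordant = sum(
        1 for i, j in combinations(range(n), 2)
        if (x[i] - x[j]) * (y[i] - y[j]) > 0
    )
    return 4 * concordant / (n * (n - 1)) - 1


# Made-up illustration data: two adjacent swaps relative to perfect order.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho = spearman_no_ties(x, y)
tau = kendall_no_ties(x, y)
print(round(rho, 3), round(tau, 3))
```

For this made-up data the squared rank differences sum to 4 and 8 of the 10 pairs are concordant, so rho is 0.8 and tau is 0.6. Tied values need the tie-corrected variants, which this sketch does not cover.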
75
76 -----
77
78 **Example**
79
80 - Input file::
81
82 #Person Height Self Esteem
83 1 68 4.1
84 2 71 4.6
85 3 62 3.8
86 4 75 4.4
87 5 58 3.2
88 6 60 3.1
89 7 67 3.8
90 8 68 4.1
91 9 71 4.3
92 10 69 3.7
93 11 68 3.5
94 12 67 3.2
95 13 63 3.7
96 14 62 3.3
97 15 60 3.4
98 16 63 4.0
99 17 65 4.1
100 18 67 3.8
101 19 63 3.4
102 20 61 3.6
103
104 - Computing the correlation coefficients between columns 2 and 3 of the above file (using Pearson's Correlation) gives the output::
105
106 1.0 0.730635686279
107 0.730635686279 1.0
108
109 So the correlation for our twenty cases is 0.73, which indicates a fairly strong positive relationship.
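The off-diagonal value in the example output can be checked by hand with a short standalone script. This is a sketch using the raw-sum form of the textbook Pearson formula. The function name `pearson` is an assumption made for illustration, and the script is independent of however the tool itself computes the coefficient.

```python
import math


def pearson(x, y):
    """Pearson's r from raw sums (illustrative helper, not the tool's code)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Columns 2 and 3 of the example input file above.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

r = pearson(height, self_esteem)
print(round(r, 2))  # prints 0.73
```

The diagonal entries of the output matrix are 1.0 because each column correlates perfectly with itself.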
110 ]]>
111 </help>
112 </tool>