annotate cor.xml @ 0:6c20d2297d67 draft default tip

Imported from capsule None
author devteam
date Mon, 28 Jul 2014 11:30:05 -0400
<tool id="cor2" name="Correlation" version="1.0.0">
  <description>for numeric columns</description>
  <requirements>
    <requirement type="package" version="1.0.3">rpy</requirement>
  </requirements>
  <command interpreter="python">cor.py $input1 $out_file1 $numeric_columns $method</command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Dataset" help="Dataset missing? See TIP below"/>
    <param name="numeric_columns" label="Numerical columns" type="data_column" numerical="True" multiple="True" data_ref="input1" help="Multi-select list - hold the appropriate key while clicking to select multiple columns" />
    <param name="method" type="select" label="Method">
      <option value="pearson">Pearson</option>
      <option value="kendall">Kendall rank</option>
      <option value="spearman">Spearman rank</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="out_file1" />
  </outputs>
  <tests>
    <!--
    Test a tabular input with the first line being a comment without a # character to start
    -->
    <test>
      <param name="input1" value="cor.tabular" />
      <param name="numeric_columns" value="2,3" />
      <param name="method" value="pearson" />
      <output name="out_file1" file="cor_out.txt" />
    </test>
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert*

.. class:: warningmark

Missing data ("nan") is removed from each pairwise comparison.
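
The pairwise removal described above can be sketched as follows. This is a minimal illustration in Python 3, not the tool's own script: the helper name ``pairwise_complete`` and the sample columns are hypothetical, and the tool may handle missing values differently internally.

```python
import math


def pairwise_complete(x, y):
    """Keep only the positions where both values are present (not NaN)."""
    kept = [(a, b) for a, b in zip(x, y)
            if not (math.isnan(a) or math.isnan(b))]
    return [a for a, _ in kept], [b for _, b in kept]


# Hypothetical sample columns with one missing value each in A and B.
col_a = [1.0, 2.0, float("nan"), 4.0]
col_b = [2.0, float("nan"), 6.0, 8.0]
col_c = [1.0, 3.0, 5.0, 7.0]

# Each pair of columns is filtered independently, so different pairs
# may end up using different rows.
print(pairwise_complete(col_a, col_b))  # rows 0 and 3 survive
print(pairwise_complete(col_a, col_c))  # rows 0, 1 and 3 survive
```

Because the filtering is done per pair, a row with a missing value in one column still contributes to the coefficients of the other column pairs.
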

-----

**Syntax**

This tool computes the matrix of correlation coefficients between numeric columns.

- All invalid, blank and comment lines are skipped when performing computations. The number of skipped lines is displayed in the resulting history item.

- **Pearson's Correlation** reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables. The formula for Pearson's correlation is:

    .. image:: pearson.png

    where n is the number of items

- **Kendall's rank correlation** measures the degree of correspondence between two rankings and assesses the significance of this correspondence. The formula for Kendall's rank correlation is:

    .. image:: kendall.png

    where n is the number of items, and P is the sum, over all items, of the number of items ranked after the given item by both rankings.

- **Spearman's rank correlation** assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. The formula for Spearman's rank correlation is:

    .. image:: spearman.png

    where D is the difference between the ranks of corresponding values of X and Y, and N is the number of pairs of values.
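
The two rank-based coefficients can be illustrated with a short Python 3 sketch. It assumes the formula images show the usual textbook forms, rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)) for Spearman and tau = 4P / (n * (n - 1)) - 1 for Kendall, with P taken as the number of pairs ordered the same way by both rankings. Both forms are exact only when there are no tied values. The function names and sample data are hypothetical, and the tool itself may compute these differently.

```python
def ranks(values):
    """Rank values from 1 to n; assumes no ties (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_no_ties(x, y):
    """tau = 4P / (n * (n - 1)) - 1, with P the number of concordant pairs."""
    n = len(x)
    concordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant when both columns move in the same direction.
            if (x[j] - x[i]) * (y[j] - y[i]) > 0:
                concordant += 1
    return 4 * concordant / (n * (n - 1)) - 1


# Hypothetical sample data: two adjacent swaps relative to perfect agreement.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman_no_ties(x, y))
print(kendall_no_ties(x, y))
```

For this small input the sketch gives roughly 0.8 for Spearman and 0.6 for Kendall. Data with tied values needs the tie-corrected variants instead.
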

-----

**Example**

- Input file::

    #Person   Height   Self Esteem
    1         68       4.1
    2         71       4.6
    3         62       3.8
    4         75       4.4
    5         58       3.2
    6         60       3.1
    7         67       3.8
    8         68       4.1
    9         71       4.3
    10        69       3.7
    11        68       3.5
    12        67       3.2
    13        63       3.7
    14        62       3.3
    15        60       3.4
    16        63       4.0
    17        65       4.1
    18        67       3.8
    19        63       3.4
    20        61       3.6

- Computing the correlation coefficients between columns 2 and 3 of the above file (using Pearson's Correlation) gives the output::

    1.0             0.730635686279
    0.730635686279  1.0

So the correlation for our twenty cases is 0.73, which is a fairly strong positive relationship.
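
As a cross-check, the off-diagonal value can be reproduced from the Height and Self Esteem columns of the input file with a few lines of Python 3. This is only a sketch using the raw-sums form of Pearson's formula; the function name ``pearson`` is hypothetical and this is not the tool's own script.

```python
import math


def pearson(x, y):
    """Pearson's r from raw sums (assumes equal lengths, no missing values)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Columns 2 and 3 of the example input file.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

r = pearson(height, self_esteem)
print(round(r, 2))  # 0.73 for the example data
```

The diagonal entries of the output matrix are 1.0 because every column correlates perfectly with itself.
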
  </help>
</tool>