<!-- logistic_regression_vif.xml, revision 1:2e7bc1bb2dbe (draft, default, tip); uploaded by iuc on Fri, 09 Jan 2015 12:56:07 -0500; parent ffcdde989859 -->
<tool id="LogisticRegression" name="Perform Logistic Regression with vif" version="1.1.0">
    <description> </description>
    <expand macro="requirements" />
    <macros>
        <import>statistic_tools_macros.xml</import>
    </macros>
    <command interpreter="python">
<![CDATA[
logistic_regression_vif.py
    $input1
    $response_col
    $predictor_cols
    $out_file1
    1>/dev/null
]]>
    </command>
    <inputs>
        <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
        <param name="response_col" label="Response column (Y)" type="data_column" data_ref="input1" numerical="True"/>
        <param name="predictor_cols" label="Predictor columns (X)" type="data_column" data_ref="input1" numerical="True" multiple="true" >
            <validator type="no_options" message="Please select at least one column."/>
        </param>
    </inputs>
    <outputs>
        <data format="input" name="out_file1" metadata_source="input1" />
    </outputs>
    <tests>
        <test>
            <param name="input1" value="logreg_inp.tabular"/>
            <param name="response_col" value="4"/>
            <param name="predictor_cols" value="1,2,3"/>
            <output name="out_file1" file="logreg_out2.tabular"/>
        </test>
    </tests>
    <help>
<![CDATA[

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Edit Datasets->Convert characters*

-----

.. class:: infomark

**What it does**

This tool uses the **'glm'** function from the R statistical package to perform logistic regression on the input data. It outputs one file containing the summary statistics of the fitted regression. It also calculates the VIF (Variance Inflation Factor) with the **'vif'** function from the R library **car**.

*R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.*

-----

.. class:: warningmark

**Note**

- This tool currently treats all predictor variables as continuous numeric variables and the response variable as a categorical variable. The response variable can currently have only two classes, coded 0 and 1, and 0 is taken as the base class.

- Rows containing non-numeric (or missing) data in any of the chosen columns are excluded from the analysis.

- The summary statistics in the output are described below:

  - Pseudo R-squared: the proportional improvement of the fitted model over the null (intercept-only) model.
  - p-value: the p-value of the z-test of the null hypothesis that the corresponding slope is equal to zero, against the two-sided alternative.
  - Coefficient: the change in the log odds, log(probability of class 1 / probability of class 0), associated with a one-unit increase in the corresponding predictor.

- This tool also provides the **Variance Inflation Factor (VIF)**, which quantifies the level of multicollinearity. The tool automatically computes VIFs when the model has more than one predictor. The higher the VIF, the stronger the multicollinearity. Multicollinearity inflates the standard errors of the coefficients and lowers the significance of the affected predictors. In the worst case, it can reverse the direction of the slope of highly correlated predictors when one of them is significant. A common rule of thumb is to use only predictors with a VIF lower than 10 (or, more strictly, 5).
- **VIF** is calculated by:

  - First, regressing each predictor on all the other predictors and recording the R-squared of each regression.
  - Second, computing the VIF of each predictor as 1/(1 - R_squared).
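
The two-step VIF recipe above can be sketched in plain Python. This is a minimal, hypothetical illustration: the helper names ``solve``, ``r_squared`` and ``vif`` are ours, not part of this tool or of R. Each predictor is regressed by ordinary least squares (with an intercept) on all the other predictors, and 1/(1 - R_squared) is taken. R's ``car::vif`` works from the coefficient covariance matrix of the fitted model, so for a logistic model its values may differ somewhat from this unweighted textbook version.

```python
"""Hypothetical sketch: VIF via auxiliary OLS regressions (stdlib only)."""
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: List[float], xs: List[List[float]]) -> float:
    """R-squared of an OLS regression of y on the columns in xs plus an intercept."""
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(predictors: List[List[float]]) -> List[float]:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others."""
    if len(predictors) < 2:
        raise ValueError("VIF needs at least two predictors")
    result = []
    for j, target in enumerate(predictors):
        others = [col for i, col in enumerate(predictors) if i != j]
        # An R-squared of exactly 1 (perfect collinearity) raises ZeroDivisionError.
        result.append(1.0 / (1.0 - r_squared(target, others)))
    return result


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # nearly 2 * x1: strong collinearity
    x3 = [5.0, 3.0, 6.0, 2.0, 7.0, 1.0]
    print([round(v, 2) for v in vif([x1, x2, x3])])
```

With the nearly proportional ``x1`` and ``x2`` above, both of their VIFs land far above the usual threshold of 10.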
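
The input rules in the notes above (numeric columns only, rows with non-numeric or missing values skipped, response coded 0/1) might be implemented along the lines of the following sketch. It is hypothetical: the function name ``load_numeric_columns``, the treatment of ``#`` header lines, and raising an error for other response codes are our assumptions, not documented behaviour of ``logistic_regression_vif.py``. Column numbers are 1-based, as in Galaxy's ``data_column`` parameters.

```python
"""Hypothetical sketch of the row-skipping rule (stdlib only)."""
import math
from typing import Iterable, List, Sequence, Tuple


def load_numeric_columns(lines: Iterable[str], response_col: int,
                         predictor_cols: Sequence[int]
                         ) -> Tuple[List[float], List[List[float]], int]:
    """Return (response values, predictor columns, number of skipped rows)."""
    wanted = [response_col] + list(predictor_cols)  # 1-based column numbers
    y: List[float] = []
    rows: List[List[float]] = []
    skipped = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # assumption: blank lines and '#' header lines are not data rows
        fields = line.rstrip("\r\n").split("\t")
        try:
            values = [float(fields[c - 1]) for c in wanted]
        except (IndexError, ValueError):  # missing column or text such as "NA"
            skipped += 1
            continue
        if not all(math.isfinite(v) for v in values):  # float() accepts "nan" and "inf"
            skipped += 1
            continue
        y.append(values[0])
        rows.append(values[1:])
    unexpected = sorted({v for v in y if v not in (0.0, 1.0)})
    if unexpected:  # assumption: fail loudly instead of guessing a recoding
        raise ValueError(f"response must be coded 0/1, found {unexpected}")
    columns = [list(col) for col in zip(*rows)] if rows else [[] for _ in predictor_cols]
    return y, columns, skipped
```

Returning the data column by column keeps one list per predictor, the shape a per-predictor regression needs.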
]]>
    </help>
</tool>