annotate readme.rst @ 0:cf295f36d606 draft

Initial commit for iuc/test rglasso
author fubar
date Sat, 31 Oct 2015 01:07:28 -0400

glmnet wrappers
===============

This is a self-installing Galaxy tool exposing the glmnet_ R package, which has excellent documentation at
glmnet_. Only minimal details are provided in this wrapper, so please read the manual to get the best out of it.

The tool exposes the entire range of penalised maximum-likelihood GLMs, from pure lasso (set alpha to 1)
to pure ridge regression (set alpha to 0), with elastic-net mixtures of the two in between.

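For readers new to the alpha parameter, here is a minimal Python sketch of the elastic-net penalty as it is written in the glmnet documentation, lambda * ((1 - alpha)/2 * sum(beta^2) + alpha * sum(|beta|)). The function name and the ``lam`` argument (standing in for glmnet's lambda) are illustrative only; this is not code from the wrapper.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic-net penalty in the glmnet parameterisation:
    lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).
    alpha = 1 gives the pure lasso (L1) penalty, alpha = 0 pure ridge (L2)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in beta)  # lasso part
    l2 = sum(b * b for b in beta)   # ridge part
    return lam * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)


beta = [0.5, -2.0, 0.0]
print(elastic_net_penalty(beta, lam=1.0, alpha=1.0))  # lasso: 2.5
print(elastic_net_penalty(beta, lam=1.0, alpha=0.0))  # ridge: 2.125
print(elastic_net_penalty(beta, lam=1.0, alpha=0.5))  # mixture: 2.3125
```

The L1 term is what drives some coefficients exactly to zero (variable selection), while the L2 term only shrinks them, which is why alpha = 1 gives sparse models and alpha = 0 does not.
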
These models can be internally cross-validated with k folds to help select an "optimal" predictive or classification
model. Predictive coefficients for each included independent variable are output for each model.

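k-fold cross-validation starts by randomly assigning every sample to one of k folds of nearly equal size. The sketch below shuffles repeated fold labels, which is similar in spirit to what cv.glmnet does by default; ``assign_folds`` and ``train_test_splits`` are hypothetical helper names, not part of the wrapper or of glmnet.

```python
import random


def assign_folds(n_samples, k=10, seed=None):
    """Randomly assign each sample to one of k folds of (nearly) equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    rng = random.Random(seed)
    # Repeat labels 0..k-1 until every sample has one, then shuffle them.
    fold_ids = [i % k for i in range(n_samples)]
    rng.shuffle(fold_ids)
    return fold_ids


def train_test_splits(fold_ids, k):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for fold in range(k):
        test = [i for i, f in enumerate(fold_ids) if f == fold]
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        yield train, test


folds = assign_folds(23, k=5, seed=1)
for train, test in train_test_splits(folds, 5):
    print(len(train), len(test))
```

With 23 samples and k = 5, three folds hold 5 samples and two hold 4, so every sample is held out exactly once and each training set has 18 or 19 samples.
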
Predictors can be forced into models to adjust for known confounders or explanatory factors.

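In glmnet itself, forcing a predictor in is usually done with the ``penalty.factor`` argument: a factor of 0 removes the penalty for that coefficient, so it is never shrunk to zero and always stays in the model. Whether this wrapper uses exactly that mechanism is an assumption here. The sketch builds such a factor vector and applies it to the elastic-net penalty; the predictor names (age, sex, gene1, gene2) and helper names are made up for illustration.

```python
def penalty_factors(predictors, forced):
    """One factor per predictor: 0.0 (unpenalised, always kept) if forced in, else 1.0."""
    unknown = set(forced) - set(predictors)
    if unknown:
        raise ValueError(f"forced predictors not in the model: {sorted(unknown)}")
    return [0.0 if p in forced else 1.0 for p in predictors]


def weighted_penalty(beta, factors, lam, alpha):
    """Elastic-net penalty where each coefficient's term is scaled by its factor."""
    return lam * sum(
        f * ((1.0 - alpha) / 2.0 * b * b + alpha * abs(b))
        for b, f in zip(beta, factors)
    )


predictors = ["age", "sex", "gene1", "gene2"]  # hypothetical column names
factors = penalty_factors(predictors, forced=["age", "sex"])
print(factors)  # [0.0, 0.0, 1.0, 1.0]
# Large coefficients on the forced-in confounders add nothing to the penalty:
print(weighted_penalty([2.0, 1.0, 0.5, -0.25], factors, lam=1.0, alpha=1.0))  # 0.75
```
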
The glmnet_ implementation of the coordinate descent algorithm is fast and efficient even on relatively large problems
with tens of thousands of predictors and thousands of samples, such as normalised microarray intensities and anthropometry
on a very large sample of obese patients.

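To make the coordinate descent idea concrete, here is a deliberately naive, pure-Python sketch for the Gaussian family: each coefficient in turn is updated by soft-thresholding its partial-residual correlation. It assumes centred y and standardised columns (mean 0, mean square 1) and fits no intercept. It omits the warm starts along a decreasing lambda path and the other refinements that make glmnet fast, so treat it as an illustration of the update rule rather than the package's implementation.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=1.0, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for the Gaussian elastic net, minimising
    (1/2n) * ||y - X b||^2 + lam * ((1 - alpha)/2 * ||b||^2 + alpha * ||b||_1).
    Assumes centred y and columns of X with mean 0 and mean square 1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X @ beta, starting from beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (excluding j).
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residuals in sync
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:  # no coefficient moved appreciably: converged
            break
    return beta


X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]  # standardised, orthogonal
y = [2.25, 1.75, -1.75, -2.25]  # = X @ [2.0, 0.25], already centred
print(elastic_net_cd(X, y, lam=0.5))  # lasso (alpha = 1): [1.5, 0.0]
```

With orthogonal standardised columns the lasso solution is just the soft-thresholded least-squares estimate, so at lam = 0.5 the true coefficients [2, 0.25] shrink to [1.5, 0.0]: the weaker predictor is dropped entirely.
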
The user supplies a tabular file with one row per sample and one column per observed variable, then chooses
as many predictors as required. A separate model is output for each dependent variable, and several
dependent variables may be chosen. Each model is reported as the coefficients for the terms in an 'optimal' model.

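As an illustration of that input layout, the sketch below reads a tab-separated file with a header row (an assumption about the file; Galaxy tabular data may or may not carry one) and splits out the chosen predictor columns and one or more dependent variables. The column names and the ``read_tabular`` helper are hypothetical.

```python
import csv
import io


def read_tabular(handle, predictors, responses):
    """Read a tab-separated file (header row, one row per sample) and return
    (X, Y): X[i] holds the chosen predictor values for sample i, and Y maps
    each dependent variable name to its column of values."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = set(predictors) | set(responses)
    missing -= set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    X, Y = [], {r: [] for r in responses}
    for row in reader:
        X.append([float(row[p]) for p in predictors])
        for r in responses:
            Y[r].append(float(row[r]))
    return X, Y


data = io.StringIO(
    "id\tage\tgene1\tgene2\tbmi\tglucose\n"
    "s1\t50\t1.2\t0.4\t31.0\t5.6\n"
    "s2\t62\t0.8\t1.1\t28.5\t6.1\n"
)
X, Y = read_tabular(data, ["age", "gene1", "gene2"], ["bmi", "glucose"])
# One model would then be fitted per dependent variable:
for name, y in Y.items():
    print(name, len(X), len(y))
```
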
These optimal predictors are selected by k-fold internal cross-validation (k defaults to 10). At each of the k steps,
a different random 1/k of the samples is set aside, a model is built on the remaining samples, and its AUC or
deviance is estimated on the held-out samples. Plots show the range of these k (e.g. 10) internal validation
estimates, together with the mean model AUC (binomial) or residual deviance, at each step of the penalty sequence.

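cv.glmnet summarises the fold estimates as a mean curve over the lambda sequence and reports two choices: ``lambda.min``, which minimises the mean cross-validated error, and ``lambda.1se``, the largest lambda whose error is within one standard error of that minimum. Which of these the wrapper treats as "optimal" is not stated above, so the sketch computes both. It simplifies the standard error to the sample standard deviation across folds divided by sqrt(k), and handles AUC by flipping its sign so that smaller is always better; ``select_lambda`` is a hypothetical name.

```python
import math
import statistics


def select_lambda(lambdas, fold_scores, higher_is_better=False):
    """Summarise k-fold CV results and pick lambda.min and lambda.1se.

    lambdas:     penalty values in decreasing order, as glmnet generates them.
    fold_scores: fold_scores[f][i] is the held-out score of fold f at lambdas[i]
                 (deviance by default; set higher_is_better=True for AUC).
    Returns (lambda_min, lambda_1se, mean_curve)."""
    k = len(fold_scores)
    if k < 2:
        raise ValueError("need at least two folds")
    sign = -1.0 if higher_is_better else 1.0  # flip AUC so smaller is better
    means, ses = [], []
    for i in range(len(lambdas)):
        scores = [sign * fold[i] for fold in fold_scores]
        means.append(statistics.mean(scores))
        ses.append(statistics.stdev(scores) / math.sqrt(k))
    best = min(range(len(lambdas)), key=lambda i: means[i])
    threshold = means[best] + ses[best]
    # lambdas decrease, so the first index within one SE is the largest lambda,
    # i.e. the sparsest model whose error is close to the best.
    one_se = min(i for i in range(len(lambdas)) if means[i] <= threshold)
    return lambdas[best], lambdas[one_se], [sign * m for m in means]


lambdas = [1.0, 0.5, 0.25, 0.125]
deviance = [  # one row per fold (k = 3 here), one column per lambda
    [10.0, 6.0, 5.0, 5.5],
    [11.0, 6.5, 5.0, 6.0],
    [12.0, 7.0, 8.0, 7.0],
]
lam_min, lam_1se, curve = select_lambda(lambdas, deviance)
print(lam_min, lam_1se)  # 0.25 0.5
```

The one-standard-error rule trades a little cross-validated error for a more heavily penalised, and usually sparser, model.
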
A full range of model families is available in this wrapper, including Gaussian, Poisson, binomial and
Cox proportional hazards (time to failure for censored data).

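Cross-validation and the residual deviance plots rely on a per-family deviance. A minimal sketch for the three GLM families named above, each with its canonical link (identity, log, logit), is below; the Cox model uses a partial likelihood instead and is left out. The function names are illustrative.

```python
import math


def gaussian_deviance(y, mu):
    """Residual sum of squares (identity link)."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))


def poisson_deviance(y, mu):
    """2 * sum(y * log(y / mu) - (y - mu)); the y * log term is 0 when y == 0 (log link)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def binomial_deviance(y, p):
    """-2 * log-likelihood for 0/1 outcomes and fitted probabilities p (logit link)."""
    return -2.0 * sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p)
    )


print(gaussian_deviance([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 1.25
print(poisson_deviance([0, 2, 4], [1.0, 2.0, 4.0]))         # 2.0
print(binomial_deviance([1, 0], [0.5, 0.5]))                # 4 * log(2), about 2.7726
```
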
Note that multinomial and multiresponse Gaussian models are NOT yet implemented, since I have not yet
had a use for them. Send code!

.. _glmnet: http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html

Wrapper author: Ross Lazarus
19 October 2014
