glmnet wrappers
===============

This is a self-installing Galaxy tool exposing the glmnet_ R package, which has excellent documentation at glmnet_.
Only minimal details are provided in this wrapper; please read the manual to get the best out of it.

The tool exposes the entire range of penalised maximum likelihood
GLM models, from pure lasso (set alpha to 1) to pure ridge regression (set alpha to 0).
Intermediate values of alpha give elastic-net mixtures of the two penalties.

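As a rough guide to what alpha controls, the glmnet documentation writes the penalty added to the (scaled) negative log-likelihood as lambda * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1). The Python function below is only an illustrative sketch of that formula; the name ``elastic_net_penalty`` and its arguments are hypothetical and are not part of this wrapper or of glmnet. The intercept is assumed to be excluded from ``beta``, since it is not penalised.

```python
# Illustrative sketch of the elastic-net penalty; hypothetical helper,
# not part of the wrapper or of glmnet.


def elastic_net_penalty(beta, lam, alpha):
    """Return lam * ((1 - alpha) / 2 * sum(b**2) + alpha * sum(|b|)).

    beta excludes the (unpenalised) intercept; alpha = 1 is the lasso,
    alpha = 0 is ridge regression.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in beta)        # lasso part
    l2_sq = sum(b * b for b in beta)      # ridge part
    return lam * ((1.0 - alpha) / 2.0 * l2_sq + alpha * l1)


beta = [1.0, -2.0, 0.0]
print(elastic_net_penalty(beta, lam=0.5, alpha=1.0))  # pure lasso: 1.5
print(elastic_net_penalty(beta, lam=0.5, alpha=0.0))  # pure ridge: 1.25
```

With alpha = 1 only the absolute values of the coefficients matter, which is what lets the lasso set some of them exactly to zero; with alpha = 0 only their squares matter, so ridge regression shrinks coefficients without zeroing them.
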
These models can be internally validated using k-fold cross-validation to help select an "optimal" predictive or classification
model. Predictive coefficients for each included independent variable are output for each model.

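k-fold cross-validation starts by randomly dividing the samples into k folds of nearly equal size, each of which is held out exactly once. A minimal Python sketch of that bookkeeping follows; ``assign_folds`` and ``train_test_splits`` are hypothetical helper names, and this is not the wrapper's code (in R, ``cv.glmnet`` performs its own fold assignment).

```python
import random

# Illustrative sketch of k-fold bookkeeping; hypothetical helpers only.


def assign_folds(n_samples, k=10, seed=None):
    """Randomly assign each sample a fold label in 1..k.

    Labels 1..k are repeated to length n_samples and then shuffled,
    so fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    labels = [i % k + 1 for i in range(n_samples)]
    random.Random(seed).shuffle(labels)
    return labels


def train_test_splits(fold_ids):
    """Yield (fold, train_indices, held_out_indices) for each fold in turn."""
    for fold in sorted(set(fold_ids)):
        held_out = [i for i, f in enumerate(fold_ids) if f == fold]
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        yield fold, train, held_out


folds = assign_folds(n_samples=23, k=10, seed=1)
for fold, train, held_out in train_test_splits(folds):
    print(fold, len(train), len(held_out))
```

With 23 samples and k = 10, three folds hold out 3 samples and the other seven hold out 2; every sample is held out exactly once across the 10 splits.
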
Predictors can be forced into models to adjust for known confounders or explanatory factors.

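glmnet's documented mechanism for this is its ``penalty.factor`` argument: each coefficient's penalty is multiplied by its factor, and a factor of 0 means no shrinkage, so that variable is always included. We assume, without having checked the wrapper's R code, that forced predictors are handled this way. The sketch below shows the effect on a single coordinate update for a standardised predictor in the Gaussian case; ``soft_threshold`` and ``penalised_update`` are hypothetical names, and glmnet's internal rescaling of the penalty factors (to sum to the number of variables) is ignored.

```python
# Illustrative sketch only: assumes a standardised predictor (mean 0,
# mean square 1) and the Gaussian family; not the wrapper's actual code.


def soft_threshold(z, gamma):
    """Return sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def penalised_update(z, lam, alpha, penalty_factor=1.0):
    """Closed-form update of one coefficient, given z, the mean product of
    the predictor with the partial residual. penalty_factor scales lam for
    this predictor only, in the spirit of glmnet's penalty.factor."""
    scaled_lam = lam * penalty_factor
    return soft_threshold(z, scaled_lam * alpha) / (1.0 + scaled_lam * (1.0 - alpha))


# A weak association under a strong lasso penalty:
print(penalised_update(0.05, lam=0.2, alpha=1.0))                      # 0.0 (dropped)
print(penalised_update(0.05, lam=0.2, alpha=1.0, penalty_factor=0.0))  # 0.05 (forced in)
```

With a penalty factor of 0 the threshold and the ridge shrinkage both vanish, so the coefficient takes its unpenalised value however large lambda is.
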
The glmnet_ implementation of the coordinate descent algorithm is fast and efficient even on relatively large problems
with tens of thousands of predictors and thousands of samples, such as normalised microarray intensities and anthropometry
on a very large sample of obese patients.

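To show why coordinate descent is cheap, the sketch below cycles through the predictors one at a time, updating each coefficient in closed form (a soft-threshold for the lasso part, a rescaling for the ridge part) while maintaining a running residual. It is a plain-Python sketch for clarity rather than speed: it handles only the Gaussian family and penalises coefficients on the scale of the supplied predictors, whereas glmnet standardises them by default. ``elastic_net_cd`` is a hypothetical name, and glmnet's own implementation adds refinements, such as warm starts along the lambda path, that are omitted here.

```python
import math

# Illustrative sketch of cyclic coordinate descent (Gaussian family only);
# hypothetical helper, not glmnet's implementation.


def elastic_net_cd(X, y, lam, alpha=1.0, tol=1e-8, max_iter=1000):
    """Minimise (1/2n) * sum((y - b0 - X b)^2)
       + lam * sum((1 - alpha)/2 * b_j^2 + alpha * |b_j|).

    X is a list of n rows with p predictors each; returns (b0, b).
    """
    n, p = len(X), len(X[0])
    # Centre y and each column so the unpenalised intercept drops out.
    y_mean = sum(y) / n
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals while all b_j = 0
    col_ss = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # constant column: leave its coefficient at zero
            # Mean product of predictor j with the partial residual (j excluded).
            z = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * b[j]
            # Soft-threshold by lam * alpha, then apply the ridge shrinkage.
            new_bj = math.copysign(max(abs(z) - lam * alpha, 0.0), z) / (
                col_ss[j] + lam * (1.0 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep residuals current
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - sum(m * bj for m, bj in zip(x_means, b))
    return b0, b


X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
print(elastic_net_cd(X, y, lam=0.0))  # (1.0, [2.0]): no penalty, exact fit
print(elastic_net_cd(X, y, lam=0.5))  # lasso shrinks the slope to about 1.6
```

Each update costs one pass over the samples for one predictor, and coefficients that are thresholded to zero stay cheap to revisit, which is why this approach scales to many thousands of predictors.
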
The user supplies a tabular file with rows as samples and columns as observed variables, then chooses
as many predictors as required. A separate model is output for each of potentially multiple dependent
variables. Models are reported as the coefficients for terms in an "optimal" model.
These optimal predictors are selected by k-fold internal cross-validation (k defaults to 10).
The samples are randomly split into k folds; each fold in turn (a random 1/k of the samples) is set aside,
a model is built on the remaining samples, and its performance (AUC or deviance) is estimated on the held-out fold.
Plots show the range of these (e.g. 10) internal validation estimates and the mean model AUC (binomial)
or residual deviance at each penalty increment step.

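The per-fold estimates described above can be summarised at each penalty value as a mean and a standard error, and the penalty with the best mean picked. The sketch below illustrates such a summary under our own assumptions (deviance: lower is better; AUC: higher is better). ``summarise_cv`` is a hypothetical name, the numbers in the usage example are made up, and ``cv.glmnet`` computes its own curves, possibly weighting folds differently.

```python
import math

# Illustrative sketch of summarising k-fold results along a penalty path;
# hypothetical helper, not the wrapper's or glmnet's code.


def summarise_cv(fold_scores, higher_is_better=False):
    """fold_scores[f][idx] is the held-out score (e.g. deviance or AUC) of
    fold f at the idx-th penalty value. Needs at least two folds.

    Returns (means, std_errors, best_index).
    """
    k = len(fold_scores)
    n_penalties = len(fold_scores[0])
    means, std_errors = [], []
    for idx in range(n_penalties):
        scores = [fold[idx] for fold in fold_scores]
        m = sum(scores) / k
        var = sum((s - m) ** 2 for s in scores) / (k - 1)  # sample variance
        means.append(m)
        std_errors.append(math.sqrt(var / k))  # standard error of the mean
    pick = max if higher_is_better else min
    best_index = pick(range(n_penalties), key=lambda idx: means[idx])
    return means, std_errors, best_index


# Made-up deviances: three folds, four penalty values (largest lambda first).
deviance = [
    [10.0, 7.0, 6.0, 6.5],
    [11.0, 8.0, 5.0, 6.0],
    [9.0, 6.0, 7.0, 7.5],
]
means, ses, best = summarise_cv(deviance)
print(means, best)  # best index is 2 (lowest mean deviance)
```

Plotting ``means`` with a band of plus or minus one standard error against the penalty gives the familiar cross-validation curve; for a binomial model scored by AUC, pass ``higher_is_better=True``.
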
The model families available in this wrapper are Gaussian, Poisson, binomial and
Cox proportional hazards (time to failure for censored data).

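For the Gaussian, binomial and Poisson families, the residual deviance used to compare models has a simple closed form. The function below is a sketch of those standard formulas; ``deviance`` is a hypothetical Python helper rather than part of this wrapper, and the Cox family, whose deviance is based on the partial likelihood, is left out.

```python
import math

# Illustrative sketch of standard deviance formulas; hypothetical helper.


def deviance(y, mu, family):
    """Deviance of fitted means mu for observed responses y.

    'gaussian': residual sum of squares.
    'binomial': y in {0, 1}, mu = fitted P(y = 1), 0 < mu < 1.
    'poisson':  y are non-negative counts, mu > 0.
    """
    if family == "gaussian":
        return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    if family == "binomial":
        return -2.0 * sum(
            math.log(mi) if yi == 1 else math.log(1.0 - mi)
            for yi, mi in zip(y, mu))
    if family == "poisson":
        total = 0.0
        for yi, mi in zip(y, mu):
            # y * log(y / mu) is taken as 0 when y == 0
            term = yi * math.log(yi / mi) if yi > 0 else 0.0
            total += term - (yi - mi)
        return 2.0 * total
    raise ValueError("unsupported family: " + family)


print(deviance([1.0, 2.0], [1.5, 2.0], "gaussian"))  # 0.25
print(deviance([0, 1], [0.5, 0.5], "binomial"))      # 4 * ln(2), about 2.77
print(deviance([2.0, 3.0], [2.0, 3.0], "poisson"))   # 0.0 for a perfect fit
```
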
Note that multinomial and multiresponse Gaussian models are NOT yet implemented, since I have not yet
had a use for them; send code!

.. _glmnet: http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html

Wrapper author: Ross Lazarus

19 October 2014