diff readme.rst @ 4:fb4959ed5b2b draft

Fixes to paths in git for deps
author fubar
date Sat, 31 Oct 2015 02:26:24 -0400
parents cf295f36d606
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/readme.rst	Sat Oct 31 02:26:24 2015 -0400
@@ -0,0 +1,39 @@
+glmnet wrappers
+===============
+
+This is a self-installing Galaxy tool exposing the glmnet_ R package, which has excellent documentation at
+glmnet_. Only minimal details are provided in this wrapper, so please read the manual to get the best out of it.
+
+The tool exposes the entire range of penalised maximum likelihood GLM models, from pure lasso
+(set alpha to 1) through elastic net mixtures (0 < alpha < 1) to pure ridge regression (set alpha to 0).
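+
+As a sketch of what alpha controls, assuming the penalty form given in the glmnet_ documentation
+rather than anything specific to this wrapper, the penalty added to the scaled negative log-likelihood
+at strength lambda is ``lambda * sum_j((1 - alpha)/2 * beta_j**2 + alpha * abs(beta_j))``. The helper
+``elastic_net_penalty`` below is hypothetical and only evaluates that expression; ``lam`` stands in
+for lambda, which is a reserved word in Python.
+
+.. code:: python
+
+    def elastic_net_penalty(beta, lam, alpha):
+        """lam * sum((1 - alpha)/2 * b**2 + alpha * |b|) over coefficients b."""
+        if not 0.0 <= alpha <= 1.0:
+            raise ValueError("alpha must lie in [0, 1]")
+        ridge = sum(b * b for b in beta) / 2.0  # L2 part, weighted by (1 - alpha)
+        lasso = sum(abs(b) for b in beta)       # L1 part, weighted by alpha
+        return lam * ((1.0 - alpha) * ridge + alpha * lasso)
+
+
+    beta = [0.5, -2.0, 0.0]
+    print(elastic_net_penalty(beta, lam=0.1, alpha=1.0))  # pure lasso: L1 term only
+    print(elastic_net_penalty(beta, lam=0.1, alpha=0.0))  # pure ridge: L2 term only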
+
+These models can be internally k-fold cross-validated to help select an "optimal" predictive or classification
+model. Predictive coefficients for each included independent variable are output for each model.
+
+Predictors can be forced into models to adjust for known confounders or explanatory factors.
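+
+glmnet itself documents a ``penalty.factor`` argument: a per-variable multiplier on lambda, where a
+factor of 0 means no shrinkage, so that variable is always included in the model. Assuming forced
+predictors are handled this way (the wrapper's mechanism is not described here), the sketch below
+shows the coordinate update for one standardised predictor with a hypothetical factor ``v``; ``z`` is
+the predictor's inner product with the partial residual, divided by the number of samples.
+
+.. code:: python
+
+    def soft_threshold(z, gamma):
+        """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
+        if z > gamma:
+            return z - gamma
+        if z < -gamma:
+            return z + gamma
+        return 0.0
+
+
+    def coordinate_update(z, lam, alpha, v):
+        """Elastic net update for one standardised predictor with penalty factor v."""
+        lam_j = lam * v  # v = 0 switches the penalty off for this predictor
+        return soft_threshold(z, lam_j * alpha) / (1.0 + lam_j * (1.0 - alpha))
+
+
+    print(coordinate_update(0.05, lam=0.1, alpha=1.0, v=1.0))  # 0.0: penalised out
+    print(coordinate_update(0.05, lam=0.1, alpha=1.0, v=0.0))  # 0.05: forced in, unshrunk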
+
+The glmnet_ implementation of the coordinate descent algorithm is fast and efficient even on relatively large problems
+with tens of thousands of predictors and thousands of samples, such as normalised microarray intensities and anthropometry
+from a very large sample of obese patients.
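+
+For readers curious about what coordinate descent does, here is a minimal pure-Python sketch for the
+Gaussian family only. It assumes the predictors are already standardised (mean 0, mean square 1)
+and the response is centred, so no intercept is fitted, and it simply cycles through the
+coefficients, soft-thresholding each in turn until none moves by more than ``tol``. The real glmnet_
+Fortran code adds warm starts along the lambda path, active-set cycling and other refinements that
+are omitted here; ``cd_elastic_net`` is a hypothetical name.
+
+.. code:: python
+
+    import math
+
+
+    def soft_threshold(z, gamma):
+        """sign(z) * max(|z| - gamma, 0)."""
+        return math.copysign(max(abs(z) - gamma, 0.0), z)
+
+
+    def cd_elastic_net(X, y, lam, alpha, tol=1e-8, max_iter=1000):
+        """Minimise (1/2n)*RSS + lam*((1 - alpha)/2*||b||^2 + alpha*||b||_1).
+
+        X is a list of n rows of p standardised predictors; y is centred.
+        """
+        n, p = len(X), len(X[0])
+        beta = [0.0] * p
+        resid = list(y)  # y - X @ beta, with beta starting at zero
+        for _ in range(max_iter):
+            max_change = 0.0
+            for j in range(p):
+                # (1/n) x_j . (resid + x_j*beta_j) = (1/n) x_j . resid + beta_j for standardised x_j
+                z = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
+                new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
+                delta = new - beta[j]
+                if delta != 0.0:
+                    for i in range(n):
+                        resid[i] -= X[i][j] * delta  # keep the residual in sync with beta
+                    beta[j] = new
+                    max_change = max(max_change, abs(delta))
+            if max_change < tol:
+                break
+        return beta
+
+
+    X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]  # two standardised, orthogonal columns
+    y = [2.1, -1.9, 1.9, -2.1]                # y = 2*x1 + 0.1*x2, already centred
+    print(cd_elastic_net(X, y, lam=0.5, alpha=1.0))  # lasso keeps x1 and zeroes the weak x2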
+
+The user supplies a tabular file with rows as samples and columns containing the observed variables, then chooses
+as many predictors as required. A separate model is output for each of potentially multiple dependent
+variables. Models are reported as the coefficients for terms in an 'optimal' model.
+The optimal model, and hence its set of predictors, is chosen by k-fold internal cross-validation
+(k defaults to 10): the samples are split at random into k subsets, and each subset in turn is
+set aside while a model is built from the remaining samples and then used to estimate AUC or
+deviance on the held-out 1/k of the data.
+Plots show the range of these (e.g. 10) internal validation estimates, together with the mean
+AUC (binomial) or residual deviance, at each step along the penalty sequence.
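+
+A minimal sketch of the selection step just described, assuming the held-out score (e.g. deviance) of
+each fold has already been computed at every lambda on the penalty path. Samples are assigned at
+random to k folds of near-equal size, and the lambda with the best mean held-out score is taken as
+"optimal". The function names are hypothetical; ``cv.glmnet`` also reports a more conservative
+``lambda.1se`` choice, and this sketch does not say which one the wrapper reports.
+
+.. code:: python
+
+    import random
+
+
+    def assign_folds(n_samples, k=10, seed=None):
+        """Randomly assign each sample a fold label 0..k-1, fold sizes differing by at most 1."""
+        folds = [i % k for i in range(n_samples)]
+        random.Random(seed).shuffle(folds)
+        return folds
+
+
+    def pick_lambda(lambdas, fold_scores, larger_is_better=False):
+        """fold_scores[f][m] is fold f's held-out score at lambdas[m].
+
+        Minimise mean deviance, or pass larger_is_better=True to maximise mean AUC.
+        """
+        k = len(fold_scores)
+        means = [sum(scores[m] for scores in fold_scores) / k for m in range(len(lambdas))]
+        choose = max if larger_is_better else min
+        best = choose(range(len(lambdas)), key=means.__getitem__)
+        return lambdas[best], means
+
+
+    folds = assign_folds(25, k=10, seed=1)  # sample i is held out when fold folds[i] is set aside
+    lambdas = [1.0, 0.5, 0.1, 0.01]
+    fold_scores = [[1.2, 0.9, 0.7, 0.8],   # made-up deviances for 3 folds only
+                   [1.1, 0.8, 0.6, 0.9],
+                   [1.3, 1.0, 0.8, 0.7]]
+    print(pick_lambda(lambdas, fold_scores)[0])  # 0.1 has the lowest mean deviance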
+
+The Gaussian, Poisson, binomial and Cox proportional hazards (time to failure for censored data) model families
+are available in this wrapper.
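+
+The held-out deviance used to score models depends on the family. As a minimal sketch (not the
+wrapper's own code, and with hypothetical function names), these are the standard GLM deviances for
+the Gaussian, Poisson and binomial families, with fitted means obtained from the linear predictor
+``eta`` through the inverse of each family's canonical link (identity, log and logit respectively).
+The Cox partial-likelihood deviance is omitted.
+
+.. code:: python
+
+    import math
+
+
+    def gaussian_deviance(y, mu):
+        """Residual sum of squares."""
+        return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
+
+
+    def poisson_deviance(y, mu):
+        """2 * sum(y*log(y/mu) - (y - mu)), taking y*log(y/mu) = 0 when y = 0."""
+        total = 0.0
+        for yi, mi in zip(y, mu):
+            term = yi * math.log(yi / mi) if yi > 0 else 0.0
+            total += term - (yi - mi)
+        return 2.0 * total
+
+
+    def binomial_deviance(y, p):
+        """-2 * log-likelihood for 0/1 outcomes y and fitted probabilities p in (0, 1)."""
+        return -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
+                          for yi, pi in zip(y, p))
+
+
+    eta = [0.0, 1.0]  # linear predictor for two samples
+    print(gaussian_deviance([0.5, 1.0], eta))                                 # identity link
+    print(poisson_deviance([1, 3], [math.exp(e) for e in eta]))               # log link
+    print(binomial_deviance([0, 1], [1 / (1 + math.exp(-e)) for e in eta]))  # logit link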
+
+Note that multinomial and multi-response Gaussian models are NOT yet implemented since I have not yet
+had a use for them - send code!
+
+.. _glmnet: http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
+
+Wrapper author: Ross Lazarus
+19 October 2014
+