# HG changeset patch # User iuc # Date 1430794049 14400 # Node ID bb725f6d6d38e5dfda3123fbc5b3aaafdecbd888 # Parent 8c31e2aac682a596f52713aab9075445a21d3105 planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/rglasso commit 344140b8df53b8b7024618bb04594607a045c03a diff -r 8c31e2aac682 -r bb725f6d6d38 rg_nri.xml --- a/rg_nri.xml Wed Apr 29 12:07:11 2015 -0400 +++ b/rg_nri.xml Mon May 04 22:47:29 2015 -0400 @@ -7,145 +7,12 @@ glmnet_lars_2_14 - rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "rg_NRI" + rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "rg_NRI" --output_dir "$html_file.files_path" --output_html "$html_file" --make_HTML "yes" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -**Before you start** - -This is a simple tool to calculate various measures of improvement in prediction between two models described in pickering_paper_ -It is based on an R script pickering_code_ written by Dr John W Pickering and Dr David Cairns from sunny Otago University which -has been debugged and slightly adjusted to fit a Galaxy tool wrapper. - - -**What it does** - -Copied from the documentation in pickering_code_ :: - - - Functions to create risk assessment plots and associated summary statistics - - - (c) 2012 Dr John W Pickering, john.pickering@otago.ac.nz, and Dr David Cairns - Last modified August 2014 - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are met: - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the distribution - - FUNCTIONS - raplot - Produces a Risk Assessment Plot and outputs the coordinates of the four curves - Based on: Pickering, J. W. and Endre, Z. H. (2012). New Metrics for Assessing Diagnostic Potential of - Candidate Biomarkers. Clinical Journal of the American Society of Nephrology, 7, 1355–1364. doi:10.2215/CJN.09590911 - - statistics.raplot - Produces the NRIs, IDIs, IS, IP, AUCs. - Based on: Pencina, M. J., D'Agostino, R. B. and Steyerberg, E. W. (2011). Extensions of net reclassification improvement calculations to - measure usefulness of new biomarkers. Statistics in Medicine, 30(1), 11–21. doi:10.1002/sim.4085 - Pencina, M. J., D'Agostino, R. B. and Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the - ROC curve to reclassification and beyond. - Statistics in Medicine, 27(2), 157–172. doi:10.1002/sim.2929 - DeLong, E., DeLong, D. and Clarke-Pearson, D. (1988). Comparing the areas under 2 or more correlated receiver operating characteristic curves - a nonparametric approach. - Biometrics, 44(3), 837–845. - - summary.raplot - Produces the NRIs, IDIs, IS, IP, AUCs with confidence intervals using a bootstrap or asymptotic procedure. (I prefer bootstrap which is chosed by cis=c("boot")) - - - Required arguments for all functions: - x1 is calculated risk (eg from a glm) for the null model, i.e. 
predict(,type="response") on a glm object - x2 is calculated risk (eg from a glm) for the alternative model - y is the case-control indicator (0 for controls, 1 for cases) - Optional argument - t are the boundaries of the risks for each group (ie 0, 1 and the thresholds beteween. eg c(0,0,3,0,7,1)). If missing, defaults to c(0, the incidence, 1) - - -**Input** - -The observed and predicted outcomes from two models to be compared. - -**Output** - -Lots'o'measures (TM) see pickering_paper_ for details - -**Attributions** - -pickering_paper_ is the paper the caclulations performed by this tool is based on - -pickering_code_ is the R function from John Pickering exposed by this Galaxy tool with minor modifications and hacks by Ross Lazarus. - -Galaxy_ (that's what you are using right now!) for gluing everything together - -Otherwise, all code and documentation comprising this tool was written by Ross Lazarus and is -licensed to you under the LGPL_ like other rgenetics artefacts - -.. _LGPL: http://www.gnu.org/copyleft/lesser.html -.. _pickering_code: http://www.researchgate.net/publication/264672640_R_function_for_Risk_Assessment_Plot__reclassification_metrics_NRI_IDI_cfNRI -.. _pickering_paper: http://cjasn.asnjournals.org/content/early/2012/05/24/CJN.09590911.full -.. _Galaxy: http://getgalaxy.org - - - - - + 0) cfpup.ne = mean(d[b] > 0) @@ -400,31 +267,31 @@ ### Output output = c(n, na, nb, pup.ev, pup.ne, pdown.ev, pdown.ne, nri, se.nri, z.nri, - nri.ev, se.nri.ev, z.nri.ev, nri.ne, se.nri.ne, z.nri.ne, + nri.ev, se.nri.ev, z.nri.ev, nri.ne, se.nri.ne, z.nri.ne, cfpup.ev, cfpup.ne, cfpdown.ev, cfpdown.ne, cfnri, se.cfnri, z.cfnri, - cfnri.ev, se.cfnri.ev, z.cfnri.ev, cfnri.ne, se.cfnri.ne, z.cfnri.ne, - improveSens, improveSpec, idi.ev, se.idi.ev, z.idi.ev, idi.ne, - se.idi.ne, z.idi.ne, idi, se.idi, z.idi, is.x1, NA, is.x2, NA, - ip.x1, NA, ip.x2, NA, auc.x1, se.auc.x1, auc.x2, se.auc.x2, + cfnri.ev, se.cfnri.ev, z.cfnri.ev, cfnri.ne, se.cfnri.ne, z.cfnri.ne, + improveSens, improveSpec, idi.ev, se.idi.ev, z.idi.ev, idi.ne, + se.idi.ne, z.idi.ne, idi, se.idi, z.idi, is.x1, NA, is.x2, NA, + ip.x1, NA, ip.x2, NA, auc.x1, se.auc.x1, auc.x2, se.auc.x2, roc.test.x1.x2\$p.value,incidence) - names(output) = c("n", "na", "nb", "pup.ev", "pup.ne", "pdown.ev", "pdown.ne", + names(output) = c("n", "na", "nb", "pup.ev", "pup.ne", "pdown.ev", "pdown.ne", "nri", "se.nri", "z.nri", "nri.ev", "se.nri.ev", "z.nri.ev", "nri.ne", "se.nri.ne", "z.nri.ne", - "cfpup.ev", "cfpup.ne", "cfpdown.ev", "cfpdown.ne", + "cfpup.ev", "cfpup.ne", "cfpdown.ev", "cfpdown.ne", "cfnri", "se.cfnri", "z.cfnri", "cfnri.ev", "se.cfnri.ev", "z.cfnri.ev", "cfnri.ne", "se.cfnri.ne", "z.cfnri.ne", "improveSens", "improveSpec", - "idi.ev", "se.idi.ev", "z.idi.ev", "idi.ne", "se.idi.ne", - "z.idi.ne", "idi", "se.idi", "z.idi", "is.x1", "se.is.x1", - "is.x2", "se.is.x2", "ip.x1", "se.ip.x1", "ip.x2", "se.ip.x2", - "auc.x1", "se.auc.x1", "auc.x2", "se.auc.x2", + "idi.ev", "se.idi.ev", "z.idi.ev", "idi.ne", "se.idi.ne", + "z.idi.ne", "idi", "se.idi", "z.idi", "is.x1", "se.is.x1", + "is.x2", "se.is.x2", "ip.x1", "se.ip.x1", "ip.x2", "se.ip.x2", + "auc.x1", "se.auc.x1", "auc.x2", "se.auc.x2", "roc.test.x1.x2.pvalue","incidence") resdf = data.frame(N=n, Na=na, Nb=nb, pup.ev=pup.ev, pup.ne=pup.ne, pdown.ev=pdown.ev, pdown.ne=pdown.ne, NRI=nri, NRI.se=se.nri, NRI.z=z.nri, - NRI.ev=nri.ev, NRI.ev.se=se.nri.ev, NRI.ev.z=z.nri.ev, NRI.ne=nri.ne, NRI.ne.se=se.nri.ne, NRI.ne.z=z.nri.ne, + NRI.ev=nri.ev, NRI.ev.se=se.nri.ev, NRI.ev.z=z.nri.ev, 
NRI.ne=nri.ne, NRI.ne.se=se.nri.ne, NRI.ne.z=z.nri.ne, cfpup.ev=cfpup.ev, cfpup.ne=cfpup.ne, cfpdown.ev=cfpdown.ev, cfpdown.ne=cfpdown.ne, CFNRI=cfnri, CFNRI.se=se.cfnri, CFNRI.z=z.cfnri, - CFNRI.ev=cfnri.ev, CFNRI.ev.se=se.cfnri.ev, CFNRI.ev.z=z.cfnri.ev, CFNRI.ne=cfnri.ne, CFNRI.ne.se=se.cfnri.ne, CFNRI.ne.z=z.cfnri.ne, - improvSens=improveSens, improvSpec=improveSpec, IDI.ev=idi.ev, IDI.ev.se=se.idi.ev, IDI.ev.z=z.idi.ev, IDI.ne=idi.ne, - IDI.ne.se=se.idi.ne, IDI.ne.z=z.idi.ne, IDI=idi, IDI.se=se.idi, IDI.z=z.idi, isx1=is.x1, isx2=is.x2, - ipxi=ip.x1, ipx2=ip.x2, AUC.x1=auc.x1, AUC.x1.se=se.auc.x1, AUC.x2=auc.x2, AUC.x2.se=se.auc.x2, + CFNRI.ev=cfnri.ev, CFNRI.ev.se=se.cfnri.ev, CFNRI.ev.z=z.cfnri.ev, CFNRI.ne=cfnri.ne, CFNRI.ne.se=se.cfnri.ne, CFNRI.ne.z=z.cfnri.ne, + improvSens=improveSens, improvSpec=improveSpec, IDI.ev=idi.ev, IDI.ev.se=se.idi.ev, IDI.ev.z=z.idi.ev, IDI.ne=idi.ne, + IDI.ne.se=se.idi.ne, IDI.ne.z=z.idi.ne, IDI=idi, IDI.se=se.idi, IDI.z=z.idi, isx1=is.x1, isx2=is.x2, + ipxi=ip.x1, ipx2=ip.x2, AUC.x1=auc.x1, AUC.x1.se=se.auc.x1, AUC.x2=auc.x2, AUC.x2.se=se.auc.x2, roctestpval=roc.test.x1.x2\$p.value,incidence=incidence) tr = t(resdf) tresdf = data.frame(measure=colnames(resdf),value=tr[,1]) @@ -453,7 +320,7 @@ boot.index = sample(length(y), replace = TRUE) risk.model1.boot = x1[boot.index] risk.model2.boot = x2[boot.index] - cc.status.boot = y[boot.index] + cc.status.boot = y[boot.index] r = statistics.raplot(x1 = risk.model1.boot, x2 = risk.model2.boot, y = cc.status.boot) results.boot[i, ] = r\$output } @@ -478,87 +345,87 @@ results.matrix[2, ] = c("Events (n)", results["na"]) results.matrix[3, ] = c("Non-events (n)", results["nb"]) results.matrix[4, ] = c("Category free NRI and summary statistics","-------------------------") - results.matrix[5, ] = c("cfNRI events (%)", - paste(round(100*results["cfnri.ev"], dp-2), " (", + results.matrix[5, ] = c("cfNRI events (%)", + paste(round(100*results["cfnri.ev"], dp-2), " (", round(100*results["cfnri.ev"] - z * 100*results["se.cfnri.ev"], dp-2), - " to ", round(100*results["cfnri.ev"] + + " to ", round(100*results["cfnri.ev"] + z * 100*results["se.cfnri.ev"], dp-2), ")", sep = "")) - results.matrix[6, ] = c("cfNRI non-events (%)", + results.matrix[6, ] = c("cfNRI non-events (%)", paste(round(100*results["cfnri.ne"], dp-2), " (", round(100*results["cfnri.ne"] - z * 100*results["se.cfnri.ne"], dp)-2, - " to ", round(100*results["cfnri.ne"] + z * 100*results["se.cfnri.ne"], - dp-2), ")", sep = "")) - results.matrix[7, ] = c("cfNRI (%)", - paste(round(100*results["cfnri"], dp-2), " (", - round(100*results["cfnri"] - z * 100*results["se.cfnri"], dp-2), - " to ", round(100*results["cfnri"] + z * 100*results["se.cfnri"], + " to ", round(100*results["cfnri.ne"] + z * 100*results["se.cfnri.ne"], + dp-2), ")", sep = "")) + results.matrix[7, ] = c("cfNRI (%)", + paste(round(100*results["cfnri"], dp-2), " (", + round(100*results["cfnri"] - z * 100*results["se.cfnri"], dp-2), + " to ", round(100*results["cfnri"] + z * 100*results["se.cfnri"], dp-2), ")", sep = "")) results.matrix[8, ] = c("NRI and summary statistics","-------------------------") - results.matrix[9, ] = c("NRI events (%)", - paste(round(100*results["nri.ev"], dp-2), " (", + results.matrix[9, ] = c("NRI events (%)", + paste(round(100*results["nri.ev"], dp-2), " (", round(100*results["nri.ev"] - z * 100*results["se.nri.ev"], dp-2), - " to ", round(100*results["nri.ev"] + + " to ", round(100*results["nri.ev"] + z * 100*results["se.nri.ev"], dp-2), ")", sep = "")) - 
results.matrix[10, ] = c("NRI non-events (%)", + results.matrix[10, ] = c("NRI non-events (%)", paste(round(100*results["nri.ne"], dp-2), " (", round(100*results["nri.ne"] - z * 100*results["se.nri.ne"], dp-2), - " to ", round(100*results["nri.ne"] + z * 100*results["se.nri.ne"], - dp-2), ")", sep = "")) - results.matrix[11, ] = c("NRI (%)", - paste(round(100*results["nri"], dp-2), " (", - round(100*results["nri"] - z * 100*results["se.nri"], dp-2), - " to ", round(100*results["nri"] + z * 100*results["se.nri"], + " to ", round(100*results["nri.ne"] + z * 100*results["se.nri.ne"], + dp-2), ")", sep = "")) + results.matrix[11, ] = c("NRI (%)", + paste(round(100*results["nri"], dp-2), " (", + round(100*results["nri"] - z * 100*results["se.nri"], dp-2), + " to ", round(100*results["nri"] + z * 100*results["se.nri"], dp-2), ")", sep = "")) results.matrix[12, ] = c("IDI and summary statistics","-------------------------") - results.matrix[13, ] = c("IDI events", - paste(round(results["idi.ev"], dp), " (", - round(results["idi.ev"] - z * results["se.idi.ev"], dp), - " to ", round(results["idi.ev"] + z * results["se.idi.ev"], + results.matrix[13, ] = c("IDI events", + paste(round(results["idi.ev"], dp), " (", + round(results["idi.ev"] - z * results["se.idi.ev"], dp), + " to ", round(results["idi.ev"] + z * results["se.idi.ev"], dp), ")", sep = "")) - results.matrix[14, ] = c("IDI non-events", - paste(round(results["idi.ne"], dp), " (", - round(results["idi.ne"] - z * results["se.idi.ne"], dp), - " to ", round(results["idi.ne"] + z * results["se.idi.ne"], + results.matrix[14, ] = c("IDI non-events", + paste(round(results["idi.ne"], dp), " (", + round(results["idi.ne"] - z * results["se.idi.ne"], dp), + " to ", round(results["idi.ne"] + z * results["se.idi.ne"], dp), ")", sep = "")) - results.matrix[15, ] = c("IDI", - paste(round(results["idi"], dp), " (", - round(results["idi"] - z * results["se.idi"], dp), - " to ", round(results["idi"] + z * results["se.idi"], + results.matrix[15, ] = c("IDI", + paste(round(results["idi"], dp), " (", + round(results["idi"] - z * results["se.idi"], dp), + " to ", round(results["idi"] + z * results["se.idi"], dp), ")", sep = "")) - results.matrix[16, ] = c("IS (null model)", - paste(round(results["is.x1"], dp), " (", - round(results["is.x1"] - z * results["se.is.x1"], dp), - " to ", round(results["is.x1"] + z * results["se.is.x1"], + results.matrix[16, ] = c("IS (null model)", + paste(round(results["is.x1"], dp), " (", + round(results["is.x1"] - z * results["se.is.x1"], dp), + " to ", round(results["is.x1"] + z * results["se.is.x1"], dp), ")", sep = "")) - results.matrix[17, ] = c("IS (alt model)", - paste(round(results["is.x2"], dp), " (", - round(results["is.x2"] - z * results["se.is.x2"], dp), - " to ", round(results["is.x2"] + z * results["se.is.x2"], + results.matrix[17, ] = c("IS (alt model)", + paste(round(results["is.x2"], dp), " (", + round(results["is.x2"] - z * results["se.is.x2"], dp), + " to ", round(results["is.x2"] + z * results["se.is.x2"], dp), ")", sep = "")) - results.matrix[18, ] = c("IP (null model)", - paste(round(results["ip.x1"], dp), " (", - round(results["ip.x1"] - z * results["se.ip.x1"], dp), - " to ", round(results["ip.x1"] + z * results["se.ip.x1"], + results.matrix[18, ] = c("IP (null model)", + paste(round(results["ip.x1"], dp), " (", + round(results["ip.x1"] - z * results["se.ip.x1"], dp), + " to ", round(results["ip.x1"] + z * results["se.ip.x1"], dp), ")", sep = "")) - results.matrix[19, ] = c("IP (alt model)", - 
paste(round(results["ip.x2"], dp), " (", - round(results["ip.x2"] - z * results["se.ip.x2"], dp), - " to ", round(results["ip.x2"] + z * results["se.ip.x2"], + results.matrix[19, ] = c("IP (alt model)", + paste(round(results["ip.x2"], dp), " (", + round(results["ip.x2"] - z * results["se.ip.x2"], dp), + " to ", round(results["ip.x2"] + z * results["se.ip.x2"], dp), ")", sep = "")) results.matrix[20, ] = c("AUC","-------------------------") - results.matrix[21, ] = c("AUC (null model)", - paste(round(results["auc.x1"], dp), " (", - round(results["auc.x1"] - z * results["se.auc.x1"], dp), - " to ", round(results["auc.x1"] + z * results["se.auc.x1"], + results.matrix[21, ] = c("AUC (null model)", + paste(round(results["auc.x1"], dp), " (", + round(results["auc.x1"] - z * results["se.auc.x1"], dp), + " to ", round(results["auc.x1"] + z * results["se.auc.x1"], dp), ")", sep = "")) - results.matrix[22, ] = c("AUC (alt model)", - paste(round(results["auc.x2"], dp), " (", - round(results["auc.x2"] - z * results["se.auc.x2"], dp), - " to ", round(results["auc.x2"] + z * results["se.auc.x2"], + results.matrix[22, ] = c("AUC (alt model)", + paste(round(results["auc.x2"], dp), " (", + round(results["auc.x2"] - z * results["se.auc.x2"], dp), + " to ", round(results["auc.x2"] + z * results["se.auc.x2"], dp), ")", sep = "")) results.matrix[23, ] = c("difference (P)", round(results["roc.test.x1.x2.pvalue"], dp)) results.matrix[24, ] = c("Incidence", round(results["incidence"], dp)) - + return(results.matrix) } @@ -624,6 +491,139 @@ sink() + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**Before you start** + +This is a simple tool to calculate various measures of improvement in prediction between two models described in pickering_paper_ +It is based on an R script pickering_code_ written by Dr John W Pickering and Dr David Cairns from sunny Otago University which +has been debugged and slightly adjusted to fit a Galaxy tool wrapper. + + +**What it does** + +Copied from the documentation in pickering_code_ :: + + + Functions to create risk assessment plots and associated summary statistics + + + (c) 2012 Dr John W Pickering, john.pickering@otago.ac.nz, and Dr David Cairns + Last modified August 2014 + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are met: + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the distribution + + FUNCTIONS + raplot + Produces a Risk Assessment Plot and outputs the coordinates of the four curves + Based on: Pickering, J. W. and Endre, Z. H. (2012). New Metrics for Assessing Diagnostic Potential of + Candidate Biomarkers. Clinical Journal of the American Society of Nephrology, 7, 1355–1364. doi:10.2215/CJN.09590911 + + statistics.raplot + Produces the NRIs, IDIs, IS, IP, AUCs. + Based on: Pencina, M. J., D'Agostino, R. B. and Steyerberg, E. W. (2011). Extensions of net reclassification improvement calculations to + measure usefulness of new biomarkers. Statistics in Medicine, 30(1), 11–21. doi:10.1002/sim.4085 + Pencina, M. J., D'Agostino, R. B. and Vasan, R. S. (2008). 
Evaluating the added predictive ability of a new marker: From area under the + ROC curve to reclassification and beyond. + Statistics in Medicine, 27(2), 157–172. doi:10.1002/sim.2929 + DeLong, E., DeLong, D. and Clarke-Pearson, D. (1988). Comparing the areas under 2 or more correlated receiver operating characteristic curves - a nonparametric approach. + Biometrics, 44(3), 837–845. + + summary.raplot + Produces the NRIs, IDIs, IS, IP, AUCs with confidence intervals using a bootstrap or asymptotic procedure. (I prefer bootstrap which is chosed by cis=c("boot")) + + + Required arguments for all functions: + x1 is calculated risk (eg from a glm) for the null model, i.e. predict(,type="response") on a glm object + x2 is calculated risk (eg from a glm) for the alternative model + y is the case-control indicator (0 for controls, 1 for cases) + Optional argument + t are the boundaries of the risks for each group (ie 0, 1 and the thresholds beteween. eg c(0,0,3,0,7,1)). If missing, defaults to c(0, the incidence, 1) + + +**Input** + +The observed and predicted outcomes from two models to be compared. + +**Output** + +Lots'o'measures (TM) see pickering_paper_ for details + +**Attributions** + +pickering_paper_ is the paper the caclulations performed by this tool is based on + +pickering_code_ is the R function from John Pickering exposed by this Galaxy tool with minor modifications and hacks by Ross Lazarus. + +Galaxy_ (that's what you are using right now!) for gluing everything together + +Otherwise, all code and documentation comprising this tool was written by Ross Lazarus and is +licensed to you under the LGPL_ like other rgenetics artefacts + +.. _LGPL: http://www.gnu.org/copyleft/lesser.html +.. _pickering_code: http://www.researchgate.net/publication/264672640_R_function_for_Risk_Assessment_Plot__reclassification_metrics_NRI_IDI_cfNRI +.. _pickering_paper: http://cjasn.asnjournals.org/content/early/2012/05/24/CJN.09590911.full +.. _Galaxy: http://getgalaxy.org + + + + doi: 10.2215/​CJN.09590911 diff -r 8c31e2aac682 -r bb725f6d6d38 rglasso_cox.xml --- a/rglasso_cox.xml Wed Apr 29 12:07:11 2015 -0400 +++ b/rglasso_cox.xml Mon May 04 22:47:29 2015 -0400 @@ -7,223 +7,9 @@ glmnet_lars_2_14 - rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "rglasso" + rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "rglasso" --output_dir "$html_file.files_path" --output_html "$html_file" --make_HTML "yes" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - l - - - - l - - - - - - - - - - model['output_full'] == 'T' - - - model['output_pred'] == 'T' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -**Before you start** - -Please read the glmnet documentation @ glmnet_ - -This Galaxy wrapper merely exposes that code and the glmnet_ documentation is essential reading -before getting useful results here. - -**What it does** - -From documentation at glmnet_ :: - - Glmnet is a package that fits a generalized linear model via penalized maximum likelihood. - The regularization path is computed for the lasso or elasticnet penalty at a grid of values for the regularization parameter lambda. - The algorithm is extremely fast, and can exploit sparsity in the input matrix x. - It fits linear, logistic and multinomial, poisson, and Cox regression models. - A variety of predictions can be made from the fitted models. 
- -Internal cross validation is used to optimise the choice of lambda based on CV AUC for logistic (binomial outcome) models, or CV mse for gaussian. - -**Warning about the tyrany of dimensionality** - -Yes, this package will select 'optimal' models even when you (optimistically) supply more predictors than you have cases. -The model returned is unlikely to represent the only informative regularisation path through your data - if you run repeatedly with -exactly the same settings, you will probably see many different models being selected. -This is not a software bug - the real problem is that you just don't have enough information in your data. - -Sufficiently big jobs will take a while (eg each lasso regression with 20k features on 1k samples takes about 2-3 minutes on our aged cluster) - -**Input** - -Assuming you have more measurements than samples, you supply data as a tabular text file where each row is a sample and columns -are variables. You specify which columns are dependent (predictors) and which are observations for each sample. Each of multiple -dependent variable columns will be run and reported independently. Predictors can be forced in to the model. - -**Output** - -For each selected dependent regression variable, a brief report of the model coefficients predicted at the -'optimal' nfold CV value of lambda. - -**Predicted event probabilities for Cox and Logistic models** - -If you want to compare (eg) two competing clinical predictions, there's a companion generic NRI tool -for predicted event probabilities. Estimates dozens of measures of improvement in prediction. Currently only works for identical id subjects -but can probably be extended to independent sample predictions. - -Given a model, we can generate a predicted p (for status 1) in binomial or cox frameworks so models can be evaluated in terms of NRI. -Of course, estimates are likely substantially inflated over 'real world' performance by being estimated from the same sample - but you probably -already knew that since you were smart enough to reach this far down into the on screen help. The author salutes you, intrepid reader! - -It may seem an odd thing to do, but we can predict p for an event for each subject from our original data, given a parsimonious model. Doing -this for two separate models (eg, forcing in an additional known explanatory measurement to the new model) allows comparison of the two models -predicted status for each subject, or the same model in independent populations to see how badly it does - -**Attributions** - -glmnet_ is the R package exposed by this Galaxy tool. - -Galaxy_ (that's what you are using right now!) for gluing everything together - -Otherwise, all code and documentation comprising this tool was written by Ross Lazarus and is -licensed to you under the LGPL_ like other rgenetics artefacts - -.. _LGPL: http://www.gnu.org/copyleft/lesser.html -.. _glmnet: http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html -.. _Galaxy: http://getgalaxy.org - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + l + + + + l + + + + + + + + + + model['output_full'] == 'T' + + + model['output_pred'] == 'T' + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**Before you start** + +Please read the glmnet documentation @ glmnet_ + +This Galaxy wrapper merely exposes that code and the glmnet_ documentation is essential reading +before getting useful results here. 
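+
+For orientation, the sketch below shows roughly the kind of glmnet call this wrapper drives for a
+binomial outcome, run outside Galaxy on synthetic data. It is an illustration only: the data, seed
+and the use of lambda.min are assumptions, not the wrapper's exact defaults::
+
+    library(glmnet)
+
+    set.seed(42)
+    x <- matrix(rnorm(200 * 20), nrow = 200)   # 200 samples, 20 predictors
+    y <- rbinom(200, 1, 0.5)                   # binary outcome
+
+    # lasso (alpha = 1) logistic model; lambda tuned by internal CV on AUC
+    cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1,
+                       type.measure = "auc", nfolds = 10)
+
+    coef(cvfit, s = "lambda.min")              # coefficients at the CV-chosen lambda
+    p <- predict(cvfit, newx = x, s = "lambda.min",
+                 type = "response")            # predicted event probabilities
+
+Predicted probabilities such as p are the per-subject risks that can be fed to the companion NRI
+tool described in rg_nri.xml.
+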
+ +**What it does** + +From documentation at glmnet_ :: + + Glmnet is a package that fits a generalized linear model via penalized maximum likelihood. + The regularization path is computed for the lasso or elasticnet penalty at a grid of values for the regularization parameter lambda. + The algorithm is extremely fast, and can exploit sparsity in the input matrix x. + It fits linear, logistic and multinomial, poisson, and Cox regression models. + A variety of predictions can be made from the fitted models. + +Internal cross validation is used to optimise the choice of lambda based on CV AUC for logistic (binomial outcome) models, or CV mse for gaussian. + +**Warning about the tyrany of dimensionality** + +Yes, this package will select 'optimal' models even when you (optimistically) supply more predictors than you have cases. +The model returned is unlikely to represent the only informative regularisation path through your data - if you run repeatedly with +exactly the same settings, you will probably see many different models being selected. +This is not a software bug - the real problem is that you just don't have enough information in your data. + +Sufficiently big jobs will take a while (eg each lasso regression with 20k features on 1k samples takes about 2-3 minutes on our aged cluster) + +**Input** + +Assuming you have more measurements than samples, you supply data as a tabular text file where each row is a sample and columns +are variables. You specify which columns are dependent (predictors) and which are observations for each sample. Each of multiple +dependent variable columns will be run and reported independently. Predictors can be forced in to the model. + +**Output** + +For each selected dependent regression variable, a brief report of the model coefficients predicted at the +'optimal' nfold CV value of lambda. + +**Predicted event probabilities for Cox and Logistic models** + +If you want to compare (eg) two competing clinical predictions, there's a companion generic NRI tool +for predicted event probabilities. Estimates dozens of measures of improvement in prediction. Currently only works for identical id subjects +but can probably be extended to independent sample predictions. + +Given a model, we can generate a predicted p (for status 1) in binomial or cox frameworks so models can be evaluated in terms of NRI. +Of course, estimates are likely substantially inflated over 'real world' performance by being estimated from the same sample - but you probably +already knew that since you were smart enough to reach this far down into the on screen help. The author salutes you, intrepid reader! + +It may seem an odd thing to do, but we can predict p for an event for each subject from our original data, given a parsimonious model. Doing +this for two separate models (eg, forcing in an additional known explanatory measurement to the new model) allows comparison of the two models +predicted status for each subject, or the same model in independent populations to see how badly it does + +**Attributions** + +glmnet_ is the R package exposed by this Galaxy tool. + +Galaxy_ (that's what you are using right now!) for gluing everything together + +Otherwise, all code and documentation comprising this tool was written by Ross Lazarus and is +licensed to you under the LGPL_ like other rgenetics artefacts + +.. _LGPL: http://www.gnu.org/copyleft/lesser.html +.. _glmnet: http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html +.. 
_Galaxy: http://getgalaxy.org + + @Article{Friedman2010, title = {Regularization Paths for Generalized Linear Models via Coordinate Descent}, @@ -917,6 +908,3 @@ - - -
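A corresponding usage sketch for the companion NRI calculation exposed by rg_nri.xml above, again run
outside Galaxy. It assumes the Pickering/Cairns script defining summary.raplot() has been sourced (the
file name "risk_assessment_plot.R" is a placeholder, not a file in this repository) and simply follows
the argument description quoted in that tool's help, where x1 and x2 are the predicted risks from the
null and alternative models and y is the case/control indicator::

    source("risk_assessment_plot.R")   # placeholder path to the Pickering/Cairns functions

    set.seed(42)                       # synthetic example data
    old.marker <- rnorm(200)
    new.marker <- rnorm(200)
    d <- data.frame(old.marker = old.marker, new.marker = new.marker,
                    y = rbinom(200, 1, plogis(-1 + old.marker + 0.8 * new.marker)))

    null.model <- glm(y ~ old.marker,              data = d, family = binomial)
    alt.model  <- glm(y ~ old.marker + new.marker, data = d, family = binomial)

    x1 <- predict(null.model, type = "response")   # risks under the null model
    x2 <- predict(alt.model,  type = "response")   # risks under the alternative model

    # cfNRI, NRI, IDI, IS, IP and AUCs with bootstrap confidence intervals
    summary.raplot(x1 = x1, x2 = x2, y = d$y, cis = c("boot"))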