comparison rg_nri.xml @ 2:21b12c7c52e4 draft

Fixes to paths in git for deps
author fubar
date Sat, 31 Oct 2015 01:34:27 -0400
parents cf295f36d606
children 31be675baa50
comparison
equal deleted inserted replaced
1:7d66bfa4fd56 2:21b12c7c52e4
1 <tool id="rg_nri" name="NRI" version="0.03">
2 <description>and other model improvement measures</description>
3 <requirements>
4 <requirement type="package" version="3.2.2">R_3_2_2</requirement>
5 <requirement type="package" version="1.3.18">graphicsmagick</requirement>
6 <requirement type="package" version="9.10">ghostscript</requirement>
7 <requirement type="package" version="3.2">glmnet_lars_3_2</requirement>
8 </requirements>
9 <command interpreter="python">
10 rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "rg_NRI"
11 --output_dir "$html_file.files_path" --output_html "$html_file" --make_HTML "yes"
12 </command>
13 <configfiles>
14 <configfile name="runme">
15
16 <![CDATA[
17
18 ### http://www.researchgate.net/publication/264672640_R_function_for_Risk_Assessment_Plot__reclassification_metrics_NRI_IDI_cfNRI code
19 ### http://cjasn.asnjournals.org/content/early/2012/05/24/CJN.09590911.full is the reference
20 ### lots of little tweaks and but fixes. Using t as a variable name seems fraught to me.
21 ### Ross Lazarus october 2014 for a Galaxy tool wrapper using the toolfactory infrastucture
22
23 #############################################################################
24 ###Functions to create risk assessment plots and associated summary statistics
25 #############################################################################
26 ###
27 ### (c) 2012 Dr John W Pickering, john.pickering@otago.ac.nz, and Dr David Cairns
28 ### Last modified August 2014
29 ###
30 ### Redistribution and use in source and binary forms, with or without
31 ### modification, are permitted provided that the following conditions are met:
32 ### * Redistributions of source code must retain the above copyright
33 ### notice, this list of conditions and the following disclaimer.
34 ### * Redistributions in binary form must reproduce the above copyright
35 ### notice, this list of conditions and the following disclaimer in
36 ### the documentation and/or other materials provided with the distribution
37
38 ### FUNCTIONS
39 ### raplot
40 ### Produces a Risk Assessment Plot and outputs the coordinates of the four curves
41 ### Based on: Pickering, J. W. and Endre, Z. H. (2012). New Metrics for Assessing Diagnostic Potential of
42 ### Candidate Biomarkers. Clinical Journal of the American Society of Nephrology, 7, 1355–1364. doi:10.2215/CJN.09590911
43 ###
44 ### statistics.raplot
45 ### Produces the NRIs, IDIs, IS, IP, AUCs.
46 ### Based on: Pencina, M. J., D'Agostino, R. B. and Steyerberg, E. W. (2011).
47 ### Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Statistics in Medicine, 30(1), 11–21. doi:10.1002/sim.4085
48 ### Pencina, M. J., D'Agostino, R. B. and Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond.
49 ### Statistics in Medicine, 27(2), 157–172. doi:10.1002/sim.2929
50 ### DeLong, E., DeLong, D. and Clarke-Pearson, D. (1988). Comparing the areas under 2 or more correlated receiver operating characteristic curves - a nonparametric approach.
51 ### Biometrics, 44(3), 837–845.
52 ###
53 ### summary.raplot
54 ### Produces the NRIs, IDIs, IS, IP, AUCs with confidence intervals using a bootstrap or asymptotic procedure. (I prefer bootstrap which is chosed by cis=c("boot"))
55
56
57 ### Required arguments for all functions:
58 ### x1 is calculated risk (eg from a glm) for the null model, i.e. predict(,type="response") on a glm object
59 ### x2 is calculated risk (eg from a glm) for the alternative model
60 ### y is the case-control indicator (0 for controls, 1 for cases)
61 ### Optional argument
62 ### t are the boundaries of the risks for each group (ie 0, 1 and the thresholds beteween. eg c(0,0,3,0,7,1)). If missing, defaults to c(0, the incidence, 1)
63
64
65 ### risk assessment plot
66
67 library('e1071')
68 library('caret')
69 library('pROC')
70 library('Hmisc')
71 library('pracma')
72
73 raplot = function(x1, x2, y, outplot, title) {
74
75 roc.model1 = roc(y, x1)
76 roc.model2 = roc(y, x2)
77 sens.model1 = roc.model1\$sensitivities
78 spec.model1 = 1 - roc.model1\$specificities
79 n.model1 = length(sens.model1)
80 thresh.model1 = roc.model1\$thresholds
81 thresh.model1 = thresh.model1[c(-1,-n.model1)]
82 sens.model1 = sens.model1[c(-1,-n.model1)]
83 spec.model1 = spec.model1[c(-1,-n.model1)]
84 sens.model2 = roc.model2\$sensitivities
85 spec.model2 = 1 - roc.model2\$specificities
86 n.model2 = length(sens.model2)
87 thresh.model2 = roc.model2\$thresholds
88 thresh.model2[1]=0
89 thresh.model2[length(thresh.model2)]=1
90 thresh.model2 = thresh.model2[c(-1,-n.model2)]
91 sens.model2 = sens.model2[c(-1,-n.model2)]
92 spec.model2 = spec.model2[c(-1,-n.model2)]
93
94 n.model1 = length(sens.model1)
95 n.model2 = length(sens.model2)
96
97 ### actual plotting
98 pdf(outplot)
99 plot(thresh.model1, sens.model1, xlim = c(0, 1), ylim = c(0, 1), type = "n",
100 lty = 2, lwd = 2, xlab = "Risk of Event", ylab = "", col = "black", main=title)
101 grid()
102
103 polygon(x = c(thresh.model1, thresh.model2[n.model2:1]),
104 y = c(sens.model1, sens.model2[n.model2:1]), border = NA, col = gray(0.8))
105 polygon(x = c(thresh.model1, thresh.model2[n.model2:1]),
106 y = c(spec.model1, spec.model2[n.model2:1]), border = NA, col = gray(0.8))
107
108 lines(thresh.model1, sens.model1, type = "l", lty = 2, lwd = 2, col = "black")
109 lines(thresh.model2, sens.model2, type = "l", lty = 1, lwd = 2, col = "black")
110
111 lines(thresh.model1, spec.model1, type = "l", lty = 2, lwd = 2, col = "red")
112 lines(thresh.model2, spec.model2, type = "l", lty = 1, lwd = 2, col = "red")
113
114 text(x = -0.15, y = 0.4, labels = "Sensitivity, ", col = "black", xpd = TRUE, srt = 90)
115 text(x = -0.15, y = 0.4 + 0.175, labels = "1-Specificity", col = "red", xpd = TRUE, srt = 90)
116 legend("topleft", c("Event: New model", "Event: Baseline model",
117 "No Event: New model", "No Event: Baseline model"),
118 col = c("black", "black", "red", "red"),
119 lty = c(1,2, 1, 2), lwd = 2, bty = "n")
120 dev.off()
121 return(data.frame("Null.p.sens"=thresh.model1,
122 "Null.sens"=sens.model1,
123 "Null.p.1spec"=thresh.model1,
124 "Null.1spec"=sens.model1,
125 "Alt.p.sens"=thresh.model2,
126 "Alt.sens"=sens.model2,
127 "Alt.p.1spec"=thresh.model2,
128 "Alt.1spec"=sens.model2))
129
130 }
131
132
133
134 ### statistics from a raplot (is an adaptation of improveProb() from Hmisc)
135
136 statistics.raplot = function(x1, x2, y, threshvec)
137 {
138
139 s = is.na(x1 + x2 + y) ###Remove rows with missing data
140 if (any(s)) {
141 smiss = sum(s)
142 s = !s
143 x1 = x1[s]
144 x2 = x2[s]
145 y = y[s]
146 print.noquote(paste('Warning: removed',smiss,'cases with missing values'))
147 }
148 n = length(y)
149 y = as.numeric(y)
150 u = sort(unique(y))
151 if (length(u) != 2 || u[1] != 0 || u[2] != 1) {
152 print.noquote("INPUT ERROR: y must have only two values: 0 and 1")
153 sink()
154 quit(save="no",status=2)
155 }
156 r = range(x1, x2)
157 if (r[1] < 0 || r[2] > 1) {
158 print.noquote("INPUT ERROR: x1 and x2 must be in [0,1]")
159 sink()
160 quit(save="no",status=3)
161 }
162 incidence=sum(y)/n
163 if (missing(threshvec)) {
164 threshvec=c(0, incidence,1)
165 print(paste('threshvec missing. using',paste(threshvec,collapse=',')))
166 }
167 a = (y == 1)
168 b = (y == 0)
169 na = sum(a)
170 nb = sum(b)
171 d = x2 - x1
172 ### NRI
173 n.thresh=length(threshvec)-1
174 risk.class.x1.ev=cut2(x1[a],threshvec)
175 risk.class.x2.ev=cut2(x2[a],threshvec)
176 thresh=c()
177 lt = length(threshvec)
178 for (i in 1:(lt-1)) {
179 thresh[i] = paste("[",toString(threshvec[i]),",",toString(threshvec[i+1]),"]")
180 }
181 levels(risk.class.x1.ev)=thresh
182 levels(risk.class.x2.ev)=thresh
183 cM.ev=confusionMatrix(risk.class.x2.ev,risk.class.x1.ev)
184 pup.ev=0
185 pdown.ev=0
186 for (i in 1:(n.thresh-1)) { pup.ev = pup.ev + sum(cM.ev\$table[(i+1):n.thresh,i])}
187 for (i in 2:n.thresh) { pdown.ev = pdown.ev + sum(cM.ev\$table[1:(i-1),i])}
188 pup.ev=pup.ev/na
189 pdown.ev=pdown.ev/na
190 risk.class.x1.ne=cut2(x1[b],threshvec)
191 risk.class.x2.ne=cut2(x2[b],threshvec)
192 levels(risk.class.x1.ne)=thresh
193 levels(risk.class.x2.ne)=thresh
194 cM.ne=confusionMatrix(risk.class.x2.ne,risk.class.x1.ne)
195 pup.ne=0
196 pdown.ne=0
197 for (i in 1:(n.thresh-1)){pup.ne=pup.ev+sum(cM.ne\$table[(i+1):n.thresh,i])}
198 for (i in 2:n.thresh){pdown.ne=pdown.ne+sum(cM.ne\$table[1:(i-1),i])}
199 pdown.ne=pdown.ne/nb
200 pup.ne=pup.ne/nb
201 nri = pup.ev - pdown.ev - (pup.ne - pdown.ne)
202 se.nri = sqrt((pup.ev + pdown.ev)/na + (pup.ne + pdown.ne)/nb)
203 z.nri = nri/se.nri
204 nri.ev = pup.ev - pdown.ev
205 se.nri.ev = sqrt((pup.ev + pdown.ev)/na)
206 z.nri.ev = nri.ev/se.nri.ev
207 nri.ne = pdown.ne - pup.ne
208 se.nri.ne = sqrt((pdown.ne + pup.ne)/nb)
209 z.nri.ne = nri.ne/se.nri.ne
210 ### Category Free NRI calculations
211 cfpup.ev = mean(d[a] > 0)
212 cfpup.ne = mean(d[b] > 0)
213 cfpdown.ev = mean(d[a] < 0)
214 cfpdown.ne = mean(d[b] < 0)
215 cfnri = cfpup.ev - cfpdown.ev - (cfpup.ne - cfpdown.ne)
216 se.cfnri = sqrt((cfpup.ev + cfpdown.ev)/na + (cfpup.ne + cfpdown.ne)/nb)
217 z.cfnri = cfnri/se.cfnri
218 cfnri.ev = cfpup.ev - cfpdown.ev
219 se.cfnri.ev = sqrt((cfpup.ev + cfpdown.ev)/na)
220 z.cfnri.ev = cfnri.ev/se.cfnri.ev
221 cfnri.ne = cfpdown.ne - cfpup.ne
222 se.cfnri.ne = sqrt((cfpdown.ne + cfpup.ne)/nb)
223 z.cfnri.ne = cfnri.ne/se.cfnri.ne
224 ### IDI calculations
225 improveSens = sum(d[a])/na
226 improveSpec = -sum(d[b])/nb
227 idi.ev = mean(improveSens)
228 idi.ne = mean(improveSpec)
229 idi = idi.ev - idi.ne
230 var.ev = var(d[a])/na
231 se.idi.ev = sqrt(var.ev)
232 z.idi.ev = idi.ev/se.idi.ev
233 var.ne = var(d[b])/nb
234 se.idi.ne = sqrt(var.ne)
235 z.idi.ne = idi.ne/se.idi.ne
236 se.idi = sqrt(var.ev + var.ne)
237 z.idi = idi/se.idi
238 ### AUC calculations
239 roc.x1 = roc(y, x1)
240 auc.x1 = auc(roc.x1)
241 ci.auc.x1 = ci.auc(roc.x1)
242 se.auc.x1 = (ci.auc.x1[3] - auc.x1)/qnorm(0.975)
243 roc.x2 = roc(y, x2)
244 auc.x2 = auc(roc.x2)
245 ci.auc.x2 = ci.auc(roc.x2)
246 se.auc.x2 = (ci.auc.x2[3] - auc.x2)/qnorm(0.975)
247 roc.test.x1.x2 = roc.test(roc.x1, roc.x2) ###Uses the default Delong method
248 sens.x1 = roc.x1\$sensitivities
249 spec.x1 = 1 - roc.x1\$specificities
250 n.x1 = length(sens.x1)
251 x1 = roc.x1\$thresholds
252 x1 = x1[c(-1,-n.x1)]
253 sens.x1 = sens.x1[c(-1,-n.x1)]
254 spec.x1 = spec.x1[c(-1,-n.x1)]
255 sens.x2 = roc.x2\$sensitivities
256 spec.x2 = 1 - roc.x2\$specificities
257 n.x2 = length(sens.x2)
258 x2 = roc.x2\$thresholds
259 x2 = x2[c(-1,-n.x2)]
260 sens.x2 = sens.x2[c(-1,-n.x2)]
261 spec.x2 = spec.x2[c(-1,-n.x2)]
262 ### Integrated sensitivity and 1-specificity calculations
263 is.x1 = trapz(x = x1, y = sens.x1) ### area under curves (relates to integrated sens, 1-spec)
264 is.x2 = trapz(x = x2, y = sens.x2)
265 ip.x1 = trapz(x = x1, y = spec.x1)
266 ip.x2 = trapz(x = x2, y = spec.x2)
267
268 ### Output
269 output = c(n, na, nb, pup.ev, pup.ne, pdown.ev, pdown.ne, nri, se.nri, z.nri,
270 nri.ev, se.nri.ev, z.nri.ev, nri.ne, se.nri.ne, z.nri.ne,
271 cfpup.ev, cfpup.ne, cfpdown.ev, cfpdown.ne, cfnri, se.cfnri, z.cfnri,
272 cfnri.ev, se.cfnri.ev, z.cfnri.ev, cfnri.ne, se.cfnri.ne, z.cfnri.ne,
273 improveSens, improveSpec, idi.ev, se.idi.ev, z.idi.ev, idi.ne,
274 se.idi.ne, z.idi.ne, idi, se.idi, z.idi, is.x1, NA, is.x2, NA,
275 ip.x1, NA, ip.x2, NA, auc.x1, se.auc.x1, auc.x2, se.auc.x2,
276 roc.test.x1.x2\$p.value,incidence)
277 names(output) = c("n", "na", "nb", "pup.ev", "pup.ne", "pdown.ev", "pdown.ne",
278 "nri", "se.nri", "z.nri", "nri.ev", "se.nri.ev", "z.nri.ev",
279 "nri.ne", "se.nri.ne", "z.nri.ne",
280 "cfpup.ev", "cfpup.ne", "cfpdown.ev", "cfpdown.ne",
281 "cfnri", "se.cfnri", "z.cfnri", "cfnri.ev", "se.cfnri.ev", "z.cfnri.ev",
282 "cfnri.ne", "se.cfnri.ne", "z.cfnri.ne", "improveSens", "improveSpec",
283 "idi.ev", "se.idi.ev", "z.idi.ev", "idi.ne", "se.idi.ne",
284 "z.idi.ne", "idi", "se.idi", "z.idi", "is.x1", "se.is.x1",
285 "is.x2", "se.is.x2", "ip.x1", "se.ip.x1", "ip.x2", "se.ip.x2",
286 "auc.x1", "se.auc.x1", "auc.x2", "se.auc.x2",
287 "roc.test.x1.x2.pvalue","incidence")
288 resdf = data.frame(N=n, Na=na, Nb=nb, pup.ev=pup.ev, pup.ne=pup.ne, pdown.ev=pdown.ev, pdown.ne=pdown.ne, NRI=nri, NRI.se=se.nri, NRI.z=z.nri,
289 NRI.ev=nri.ev, NRI.ev.se=se.nri.ev, NRI.ev.z=z.nri.ev, NRI.ne=nri.ne, NRI.ne.se=se.nri.ne, NRI.ne.z=z.nri.ne,
290 cfpup.ev=cfpup.ev, cfpup.ne=cfpup.ne, cfpdown.ev=cfpdown.ev, cfpdown.ne=cfpdown.ne, CFNRI=cfnri, CFNRI.se=se.cfnri, CFNRI.z=z.cfnri,
291 CFNRI.ev=cfnri.ev, CFNRI.ev.se=se.cfnri.ev, CFNRI.ev.z=z.cfnri.ev, CFNRI.ne=cfnri.ne, CFNRI.ne.se=se.cfnri.ne, CFNRI.ne.z=z.cfnri.ne,
292 improvSens=improveSens, improvSpec=improveSpec, IDI.ev=idi.ev, IDI.ev.se=se.idi.ev, IDI.ev.z=z.idi.ev, IDI.ne=idi.ne,
293 IDI.ne.se=se.idi.ne, IDI.ne.z=z.idi.ne, IDI=idi, IDI.se=se.idi, IDI.z=z.idi, isx1=is.x1, isx2=is.x2,
294 ipxi=ip.x1, ipx2=ip.x2, AUC.x1=auc.x1, AUC.x1.se=se.auc.x1, AUC.x2=auc.x2, AUC.x2.se=se.auc.x2,
295 roctestpval=roc.test.x1.x2\$p.value,incidence=incidence)
296 tr = t(resdf)
297 tresdf = data.frame(measure=colnames(resdf),value=tr[,1])
298 return(list(resdf=tresdf,output=output))
299 }
300
301
302 ### More comprehensive summary statistics from a raplot
303 ### Choice of confidence intervals determined through asymptotics or bootstrapping (n.boot = ### of bootstrap resamples)
304 ### dp is number of decimal places for results table
305
306 summary.raplot = function(x1, x2, y, threshvec, cis = "boot", conf.level = 0.95, n.boot = 2000, dp = 4, stat_ra=NA)
307 {
308 results = stat_ra
309 if (cis == "boot") {
310 print.noquote("Bootstrap estimates for SE")
311 results.boot = matrix(NA, n.boot, length(names(results)))
312
313 colnames(results.boot) = names(results)
314
315 for (i in 1:n.boot) {
316 ###boot.index = sample(length(cc.status), replace = TRUE)
317 ###risk.model1.boot = risk.model1[boot.index]
318 ###risk.model2.boot = risk.model2[boot.index]
319 ###cc.status.boot = cc.status[boot.index]
320 boot.index = sample(length(y), replace = TRUE)
321 risk.model1.boot = x1[boot.index]
322 risk.model2.boot = x2[boot.index]
323 cc.status.boot = y[boot.index]
324 r = statistics.raplot(x1 = risk.model1.boot, x2 = risk.model2.boot, y = cc.status.boot)
325 results.boot[i, ] = r\$output
326 }
327
328 results.se.boot = apply(results.boot, 2, sd)
329 print(paste(results.se.boot,collapse=','))
330
331
332 results[grep("se", names(results))] = results.se.boot[grep("se", names(results)) - 1]
333
334 }
335
336
337
338 ### calculate cis and return
339
340 z = abs(qnorm((1 - conf.level)/2))
341
342 results.matrix = matrix(NA, 24, 2)
343
344 results.matrix[1, ] = c("Total (n)", results["n"])
345 results.matrix[2, ] = c("Events (n)", results["na"])
346 results.matrix[3, ] = c("Non-events (n)", results["nb"])
347 results.matrix[4, ] = c("Category free NRI and summary statistics","-------------------------")
348 results.matrix[5, ] = c("cfNRI events (%)",
349 paste(round(100*results["cfnri.ev"], dp-2), " (",
350 round(100*results["cfnri.ev"] - z * 100*results["se.cfnri.ev"], dp-2),
351 " to ", round(100*results["cfnri.ev"] +
352 z * 100*results["se.cfnri.ev"], dp-2), ")", sep = ""))
353 results.matrix[6, ] = c("cfNRI non-events (%)",
354 paste(round(100*results["cfnri.ne"], dp-2), " (",
355 round(100*results["cfnri.ne"] - z * 100*results["se.cfnri.ne"], dp)-2,
356 " to ", round(100*results["cfnri.ne"] + z * 100*results["se.cfnri.ne"],
357 dp-2), ")", sep = ""))
358 results.matrix[7, ] = c("cfNRI (%)",
359 paste(round(100*results["cfnri"], dp-2), " (",
360 round(100*results["cfnri"] - z * 100*results["se.cfnri"], dp-2),
361 " to ", round(100*results["cfnri"] + z * 100*results["se.cfnri"],
362 dp-2), ")", sep = ""))
363 results.matrix[8, ] = c("NRI and summary statistics","-------------------------")
364 results.matrix[9, ] = c("NRI events (%)",
365 paste(round(100*results["nri.ev"], dp-2), " (",
366 round(100*results["nri.ev"] - z * 100*results["se.nri.ev"], dp-2),
367 " to ", round(100*results["nri.ev"] +
368 z * 100*results["se.nri.ev"], dp-2), ")", sep = ""))
369 results.matrix[10, ] = c("NRI non-events (%)",
370 paste(round(100*results["nri.ne"], dp-2), " (",
371 round(100*results["nri.ne"] - z * 100*results["se.nri.ne"], dp-2),
372 " to ", round(100*results["nri.ne"] + z * 100*results["se.nri.ne"],
373 dp-2), ")", sep = ""))
374 results.matrix[11, ] = c("NRI (%)",
375 paste(round(100*results["nri"], dp-2), " (",
376 round(100*results["nri"] - z * 100*results["se.nri"], dp-2),
377 " to ", round(100*results["nri"] + z * 100*results["se.nri"],
378 dp-2), ")", sep = ""))
379 results.matrix[12, ] = c("IDI and summary statistics","-------------------------")
380 results.matrix[13, ] = c("IDI events",
381 paste(round(results["idi.ev"], dp), " (",
382 round(results["idi.ev"] - z * results["se.idi.ev"], dp),
383 " to ", round(results["idi.ev"] + z * results["se.idi.ev"],
384 dp), ")", sep = ""))
385 results.matrix[14, ] = c("IDI non-events",
386 paste(round(results["idi.ne"], dp), " (",
387 round(results["idi.ne"] - z * results["se.idi.ne"], dp),
388 " to ", round(results["idi.ne"] + z * results["se.idi.ne"],
389 dp), ")", sep = ""))
390 results.matrix[15, ] = c("IDI",
391 paste(round(results["idi"], dp), " (",
392 round(results["idi"] - z * results["se.idi"], dp),
393 " to ", round(results["idi"] + z * results["se.idi"],
394 dp), ")", sep = ""))
395 results.matrix[16, ] = c("IS (null model)",
396 paste(round(results["is.x1"], dp), " (",
397 round(results["is.x1"] - z * results["se.is.x1"], dp),
398 " to ", round(results["is.x1"] + z * results["se.is.x1"],
399 dp), ")", sep = ""))
400 results.matrix[17, ] = c("IS (alt model)",
401 paste(round(results["is.x2"], dp), " (",
402 round(results["is.x2"] - z * results["se.is.x2"], dp),
403 " to ", round(results["is.x2"] + z * results["se.is.x2"],
404 dp), ")", sep = ""))
405 results.matrix[18, ] = c("IP (null model)",
406 paste(round(results["ip.x1"], dp), " (",
407 round(results["ip.x1"] - z * results["se.ip.x1"], dp),
408 " to ", round(results["ip.x1"] + z * results["se.ip.x1"],
409 dp), ")", sep = ""))
410 results.matrix[19, ] = c("IP (alt model)",
411 paste(round(results["ip.x2"], dp), " (",
412 round(results["ip.x2"] - z * results["se.ip.x2"], dp),
413 " to ", round(results["ip.x2"] + z * results["se.ip.x2"],
414 dp), ")", sep = ""))
415 results.matrix[20, ] = c("AUC","-------------------------")
416 results.matrix[21, ] = c("AUC (null model)",
417 paste(round(results["auc.x1"], dp), " (",
418 round(results["auc.x1"] - z * results["se.auc.x1"], dp),
419 " to ", round(results["auc.x1"] + z * results["se.auc.x1"],
420 dp), ")", sep = ""))
421 results.matrix[22, ] = c("AUC (alt model)",
422 paste(round(results["auc.x2"], dp), " (",
423 round(results["auc.x2"] - z * results["se.auc.x2"], dp),
424 " to ", round(results["auc.x2"] + z * results["se.auc.x2"],
425 dp), ")", sep = ""))
426 results.matrix[23, ] = c("difference (P)", round(results["roc.test.x1.x2.pvalue"], dp))
427 results.matrix[24, ] = c("Incidence", round(results["incidence"], dp))
428
429 return(results.matrix)
430 }
431
432
433
434 ]]>
435
436 options(width=120)
437 options(digits=5)
438 logf = file("rgNRI.log", open = "a")
439 sink(logf,type = c("output", "message"))
440 Out_Dir = "$html_file.files_path"
441 Input1 = "$input1"
442 Input2 = "$input2"
443 myTitle = "$title"
444 outtab = "$nri_file"
445 input1_obs = $input1_observed
446 input1_pred = $input1_predicted
447 input1_id = $input1_id
448 input2_obs = $input2_observed
449 input2_pred = $input2_predicted
450 input2_id = $input2_id
451 in1 = read.table(Input1,head=T,sep='\t')
452 in2 = read.table(Input2,head=T,sep='\t')
453 id1 = in1[,input1_id]
454 id2 = in2[,input2_id]
455 useme1 = in1[which(id1 %in% id2),]
456 useme2 = in2[which(id2 %in% id1),]
457 id1 = useme1[,input1_id]
458 id2 = useme2[,input2_id]
459 useme1 = useme1[order(id1),]
460 useme2 = useme2[order(id2),]
461 x1 = useme1[,input1_pred]
462 x2 = useme2[,input2_pred]
463 y1 = useme1[,input1_obs]
464 y2 = useme2[,input2_obs]
465 n.boot = $CImeth.nboot
466 conf.level = 0.95
467 cis = "$CImeth.cis"
468 digits = 4
469 nydiff = sum(y1 != y2)
470 if (nydiff &gt; 0) {
471 print.noquote(paste('Input error: observed status column has',nydiff,'differences - cannot reliably proceed'))
472 quit(save="no",status=1)
473 }
474 y = y2
475 outplot = 'rgNRI_EventRisk.pdf'
476 res = raplot(x1=x1, x2=x2, y=y, outplot=outplot,title=myTitle)
477
478 stats = statistics.raplot(x1=x1, x2=x2, y=y)
479 res1 = stats\$resdf
480 out1 = stats\$output
481 print.noquote('Results:')
482 print.noquote(res1,digits=4)
483 res2 = summary.raplot(x1=x1, x2=x2, y=y, cis = cis, conf.level = conf.level, n.boot = n.boot, dp = digits, stat_ra=out1)
484 print.noquote('Summary:')
485 print.noquote(res2,digits=4)
486 write.table(format(res1,digits=4),outtab,quote=F, col.names=F,sep="\t",row.names=F)
487 print.noquote('SessionInfo for this R session:')
488 sessionInfo()
489 print.noquote('warnings for this R session:')
490 warnings()
491 sink()
492 </configfile>
493 </configfiles>
494 <inputs>
495 <param name="title" type="text" value="NRI test" label="Plot Title" help="Will appear as the title for the comparison plot"/>
496 <param name="input1" type="data" format="tabular" label="Select a tabular file from the baseline model with predicted and observed outcome column for each subject"
497 multiple='False' help="Observed and predicted status columns must be selected from this file below - NOTE both models must be in same order with exact matches in all observed outcomes" optional="False"/>
498 <param name="input1_observed" label="Select column containing observed outcome (0 for no event, 1 for an event)" type="data_column" data_ref="input1" numerical="True"
499 multiple="False" use_header_names="True" optional="False" help = "Observed outcomes are compared in the two files to check that the datasets are from the same data"/>
500 <param name="input1_predicted" label="Select column containing predicted event probabilies from baseline model" type="data_column" data_ref="input1" numerical="True"
501 multiple="False" use_header_names="True" optional="False" help="Must be in range 0-1"/>
502 <param name="input1_id" label="Select column containing subject ID from baseline model" type="data_column" data_ref="input1" numerical="True"
503 multiple="False" use_header_names="True" optional="False" help="Subect IDs are needed to match subjects to compare predictions in the two inputs"/>
504 <param name="input2" type="data" format="tabular" label="Select a tabular file from the new model with predicted and observed outcome columns for each subject"
505 multiple='False' help="Observed and predicted status columns must be selected from this file below" />
506 <param name="input2_observed" label="Select column containing observed outcome (0 for no event, 1 for an event)" type="data_column" data_ref="input2" numerical="True"
507 multiple="False" use_header_names="True" optional="False" help = "Observed outcomes are compared in the two files to check that the datasets are from the same data"/>
508 <param name="input2_predicted" label="Select column containing predicted event probabilities from the new model" type="data_column" data_ref="input2" numerical="True"
509 multiple="False" use_header_names="True" optional="False" help="Must be in range 0-1"/>
510 <param name="input2_id" label="Select column containing subject ID from the new model" type="data_column" data_ref="input2" numerical="True"
511 multiple="False" use_header_names="True" optional="False" help="Subect IDs are needed to match subjects to compare predictions in the two inputs"/>
512 <conditional name="CImeth">
513 <param name="cis" type="select" label="CI calculation method"
514 help="Bootstrap will take time - a long time for thousands - asymptotic is quick and informative">
515 <option value="asymptotic" selected="true">Asymptotic estimate</option>
516 <option value="boot">Bootstrap for empirical CIs</option>
517 </param>
518 <when value="boot">
519 <param name="nboot" type="integer" value="1000" label="Number of bootstrap replicates"/>
520 </when>
521 <when value="asymptotic">
522 <param name="nboot" type="hidden" value="1000"/>
523 </when>
524 </conditional>
525 </inputs>
526 <outputs>
527 <data format="html" name="html_file" label="${title}.html"/>
528 <data format="tabular" name="nri_file" label="${title}_nrires.xls"/>
529 </outputs>
530 <tests>
531 <test>
532 <param name='title' value='nri_test1' />
533 <param name='input1' value='nri_test1.xls' ftype='tabular' />
534 <param name='input2' value='nri_test1.xls' ftype='tabular' />
535 <param name='input1_id' value="1" />
536 <param name='input1_observed' value="2" />
537 <param name='input1_predicted' value="3" />
538 <param name='input2_observed' value="2" />
539 <param name='input2_predicted' value="4" />
540 <output name='html_file' file='nri_test1_out.html' compare='diff' lines_diff='10' />
541 <output name='nri_file' file='nri_test1_out.xls' />
542 </test>
543 </tests>
544 <help>
545
546 **Before you start**
547
548 This is a simple tool to calculate various measures of improvement in prediction between two models described in pickering_paper_
549 It is based on an R script pickering_code_ written by Dr John W Pickering and Dr David Cairns from sunny Otago University which
550 has been debugged and slightly adjusted to fit a Galaxy tool wrapper.
551
552
553 **What it does**
554
555 Copied from the documentation in pickering_code_ ::
556
557
558 Functions to create risk assessment plots and associated summary statistics
559
560
561 (c) 2012 Dr John W Pickering, john.pickering@otago.ac.nz, and Dr David Cairns
562 Last modified August 2014
563
564 Redistribution and use in source and binary forms, with or without
565 modification, are permitted provided that the following conditions are met:
566 * Redistributions of source code must retain the above copyright
567 notice, this list of conditions and the following disclaimer.
568 * Redistributions in binary form must reproduce the above copyright
569 notice, this list of conditions and the following disclaimer in
570 the documentation and/or other materials provided with the distribution
571
572 FUNCTIONS
573 raplot
574 Produces a Risk Assessment Plot and outputs the coordinates of the four curves
575 Based on: Pickering, J. W. and Endre, Z. H. (2012). New Metrics for Assessing Diagnostic Potential of
576 Candidate Biomarkers. Clinical Journal of the American Society of Nephrology, 7, 1355–1364. doi:10.2215/CJN.09590911
577
578 statistics.raplot
579 Produces the NRIs, IDIs, IS, IP, AUCs.
580 Based on: Pencina, M. J., D'Agostino, R. B. and Steyerberg, E. W. (2011). Extensions of net reclassification improvement calculations to
581 measure usefulness of new biomarkers. Statistics in Medicine, 30(1), 11–21. doi:10.1002/sim.4085
582 Pencina, M. J., D'Agostino, R. B. and Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the
583 ROC curve to reclassification and beyond.
584 Statistics in Medicine, 27(2), 157–172. doi:10.1002/sim.2929
585 DeLong, E., DeLong, D. and Clarke-Pearson, D. (1988). Comparing the areas under 2 or more correlated receiver operating characteristic curves - a nonparametric approach.
586 Biometrics, 44(3), 837–845.
587
588 summary.raplot
589 Produces the NRIs, IDIs, IS, IP, AUCs with confidence intervals using a bootstrap or asymptotic procedure. (I prefer bootstrap which is chosed by cis=c("boot"))
590
591
592 Required arguments for all functions:
593 x1 is calculated risk (eg from a glm) for the null model, i.e. predict(,type="response") on a glm object
594 x2 is calculated risk (eg from a glm) for the alternative model
595 y is the case-control indicator (0 for controls, 1 for cases)
596 Optional argument
597 t are the boundaries of the risks for each group (ie 0, 1 and the thresholds beteween. eg c(0,0,3,0,7,1)). If missing, defaults to c(0, the incidence, 1)
598
599
600 **Input**
601
602 The observed and predicted outcomes from two models to be compared.
603
604 **Output**
605
606 Lots'o'measures (TM) see pickering_paper_ for details
607
608 **Attributions**
609
610 pickering_paper_ is the paper the caclulations performed by this tool is based on
611
612 pickering_code_ is the R function from John Pickering exposed by this Galaxy tool with minor modifications and hacks by Ross Lazarus.
613
614 Galaxy_ (that's what you are using right now!) for gluing everything together
615
616 Otherwise, all code and documentation comprising this tool was written by Ross Lazarus and is
617 licensed to you under the LGPL_ like other rgenetics artefacts
618
619 .. _LGPL: http://www.gnu.org/copyleft/lesser.html
620 .. _pickering_code: http://www.researchgate.net/publication/264672640_R_function_for_Risk_Assessment_Plot__reclassification_metrics_NRI_IDI_cfNRI
621 .. _pickering_paper: http://cjasn.asnjournals.org/content/early/2012/05/24/CJN.09590911.full
622 .. _Galaxy: http://getgalaxy.org
623
624
625 </help>
626
627 <citations>
628 <citation type="doi">doi: 10.2215/​CJN.09590911</citation>
629 </citations>
630 </tool>
631
632
633