annotate caret_regression/tool2/templateLibrary.py @ 0:68300206e90d draft default tip

Uploaded
author deepakjadmin
date Thu, 05 Nov 2015 02:41:30 -0500
def __template4Rnw():

    template4Rnw = r'''%% Regression Modeling Script
%% Max Kuhn (max.kuhn@pfizer.com, mxkuhn@gmail.com)
%% Version: 1.00
%% Created on: 2010/10/02
%%
%% Lynn Group
%% Version: 2.00
%% Created on: 2014/11/15

%% This is a Sweave template for building and describing
%% regression models. It mixes R and LaTeX code. The document can
%% be processed using R's Sweave function to produce a tex file.
%%
%% The inputs are:
%% - the initial data set in a data frame called 'rawData'
%% - a numeric column in the data set called 'outcome'. This should be the
%%   outcome variable
%% - all other columns in rawData should be predictor variables
%% - the type of model should be in a variable called 'modName'.
%%
%% The script attempts to make some intelligent choices based on the
%% model being used. For example, if modName is "pls", the script will
%% automatically center and scale the predictor data. There are
%% situations where these choices can (and should) be changed.
%%
%% There are other options that may make sense to change. For example,
%% the user may want to adjust the type of resampling. To find these
%% parts of the script, search on the string 'OPTION'. These parts of
%% the code will document the options.
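%%
%% Illustrative sketch (an addition, not part of the original template):
%% the startup chunk below loads an R data file and expects it to contain
%% a predictor data frame 'dataX' and a numeric vector 'dataY'. Assuming
%% that layout, a compatible file could be prepared in R roughly as
%% follows; the toy data and the file name "example.RData" are
%% hypothetical.
%%   dataX <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
%%   dataY <- rnorm(20)
%%   save(dataX, dataY, file = "example.RData")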

\documentclass[12pt]{report}
\usepackage{amsmath}
\usepackage[pdftex]{graphicx}
\usepackage{color}
\usepackage{ctable}
\usepackage{xspace}
\usepackage{fancyvrb}
\usepackage{fancyhdr}
\usepackage{lastpage}
\usepackage{longtable}
\usepackage{algorithm2e}
\usepackage[
            colorlinks=true,
            linkcolor=blue,
            citecolor=blue,
            urlcolor=blue]
           {hyperref}
\usepackage{lscape}

\usepackage{Sweave}
\SweaveOpts{keep.source = TRUE}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% define new colors for use
\definecolor{darkgreen}{rgb}{0,0.6,0}
\definecolor{darkred}{rgb}{0.6,0.0,0}
\definecolor{lightbrown}{rgb}{1,0.9,0.8}
\definecolor{brown}{rgb}{0.6,0.3,0.3}
\definecolor{darkblue}{rgb}{0,0,0.8}
\definecolor{darkmagenta}{rgb}{0.5,0,0.5}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\bld}[1]{\mbox{\boldmath $#1$}}
\newcommand{\shell}[1]{\mbox{$#1$}}
\renewcommand{\vec}[1]{\mbox{\bf {#1}}}

\newcommand{\ReallySmallSpacing}{\renewcommand{\baselinestretch}{.6}\Large\normalsize}
\newcommand{\SmallSpacing}{\renewcommand{\baselinestretch}{1.1}\Large\normalsize}

\newcommand{\halfs}{\frac{1}{2}}

\setlength{\oddsidemargin}{-.25 truein}
\setlength{\evensidemargin}{0truein}
\setlength{\topmargin}{-0.2truein}
\setlength{\textwidth}{7 truein}
\setlength{\textheight}{8.5 truein}
\setlength{\parindent}{0.20truein}
\setlength{\parskip}{0.10truein}

\setcounter{LTchunksize}{50}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pagestyle{fancy}
\lhead{}
%% OPTION Report header name
\chead{Regression Model Script}
\rhead{}
\lfoot{}
\cfoot{}
\rfoot{\thepage\ of \pageref{LastPage}}
\renewcommand{\headrulewidth}{1pt}
\renewcommand{\footrulewidth}{1pt}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% OPTION Report title and modeler name
\title{Regression Model Script using $METHOD }
\author{"M. Kuhn and Lynn Group, SCIS, JNU, New Delhi"}

\begin{document}

\maketitle

\thispagestyle{empty}

<<startup, eval= TRUE, results = hide, echo = FALSE>>=
library(Hmisc)
library(caret)
versionTest <- compareVersion(packageDescription("caret")$Version,
                              "4.65")
if(versionTest < 0) stop("caret version 4.65 or later is required")

library(RColorBrewer)


listString <- function (x, period = FALSE, verbose = FALSE)
{
  if (verbose) cat("\n entering listString\n")
  flush.console()
  if (!is.character(x))
    x <- as.character(x)
  numElements <- length(x)
  out <- if (length(x) > 0) {
    switch(min(numElements, 3), x, paste(x, collapse = " and "),
           {
             x <- paste(x, c(rep(",", numElements - 2), " and", ""), sep = "")
             paste(x, collapse = " ")
           })
  }
  else ""
  if (period) out <- paste(out, ".", sep = "")
  if (verbose) cat(" leaving listString\n\n")
  flush.console()
  out
}
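## Illustrative usage (an addition, not part of the original template).
## Tracing the switch() above by hand, listString joins a character
## vector into an English-style list:
##   listString(c("a", "b"))                 # "a and b"
##   listString(c("a", "b", "c"))            # "a, b and c"
##   listString(c("a", "b"), period = TRUE)  # "a and b."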

## Describe the resampled performance at the selected tuning parameters as text
resampleStats <- function(x, digits = 3)
{
  bestPerf <- x$bestTune
  colnames(bestPerf) <- gsub("^\\.", "", colnames(bestPerf))
  out <- merge(x$results, bestPerf)
  out <- out[, colnames(out) %in% x$perfNames]
  names(out) <- gsub("ROC", "area under the ROC curve", names(out), fixed = TRUE)
  names(out) <- gsub("Sens", "sensitivity", names(out), fixed = TRUE)
  names(out) <- gsub("Spec", "specificity", names(out), fixed = TRUE)
  names(out) <- gsub("Accuracy", "overall accuracy", names(out), fixed = TRUE)
  names(out) <- gsub("Kappa", "Kappa statistics", names(out), fixed = TRUE)
  names(out) <- gsub("RMSE", "root mean squared error", names(out), fixed = TRUE)
  names(out) <- gsub("Rsquared", "$R^2$", names(out), fixed = TRUE)

  out <- format(out, digits = digits)
  listString(paste(names(out), "was", out))
}

## Scatter plot (or scatter plot matrix when x is a data frame) with
## point size and color driven by z
latticeBubble <- function(x, y, z, offset = .5, splits = 10,
                          pal = colorRampPalette(brewer.pal(9,"YlOrRd")[-(1:2)]),
                          ...)
{
  cexValues <- rank(z)/length(z) + offset
  splits <- unique(quantile(z, probs = seq(0, 1, length = splits)))
  splitup <- cut(z, breaks = splits, include.lowest = TRUE)
  cols <- pal(length(levels(splitup)))
  colValues <- cols[as.numeric(splitup)]
  if(is.data.frame(x))
    {
      out <- splom(~x, col = colValues, cex = cexValues, ...)

    } else out <- xyplot(y~x, col = colValues, cex = cexValues, ...)
  out

}


##OPTION: model name: see ?train for more values/models
modName <- "$METHOD"
load("$RDATA")
rawData <- dataX
rawData$$outcome <- dataY



@


\section*{Data Sets}\label{S:data}

%% OPTION: provide some background on the problem, the experimental
%% data, how the compounds were selected etc


<<getDataInfo, eval = $GETDATAINFOEVAL, echo = $GETDATAINFOECHO, results = $GETDATAINFORESULT>>=
if(!any(names(rawData) == "outcome")) stop("a variable called outcome should be in the data set")
if(!is.numeric(rawData$outcome)) stop("the outcome should be a numeric vector")

numSamples <- nrow(rawData)
numPredictors <- ncol(rawData) - 1
predictorNames <- names(rawData)[names(rawData) != "outcome"]

isNum <- apply(rawData[,predictorNames, drop = FALSE], 2, is.numeric)
if(any(!isNum)) stop("all predictors in rawData should be numeric")

@


<<missingFilter, eval = $MISSINGFILTEREVAL, echo = $MISSINGFILTERECHO, results = $MISSINGFILTERRESULT>>=

colRate <- apply(rawData[, predictorNames, drop = FALSE],
                 2, function(x) mean(is.na(x)))

##OPTION thresholds can be changed
colExclude <- colRate > $MISSINGFILTERTHRESHC

missingText <- ""

if(any(colExclude))
  {
    missingText <- paste(missingText,
                         ifelse(sum(colExclude) > 1,
                                " There were ",
                                " There was "),
                         sum(colExclude),
                         ifelse(sum(colExclude) > 1,
                                " predictors ",
                                " predictor "),
                         "with an excessive amount of ",
                         "missing data. ",
                         ifelse(sum(colExclude) > 1,
                                " These were excluded. ",
                                " This was excluded. "))
    predictorNames <- predictorNames[!colExclude]
    rawData <- rawData[, names(rawData) %in% c("outcome", predictorNames), drop = FALSE]
  }


rowRate <- apply(rawData[, predictorNames, drop = FALSE],
                 1, function(x) mean(is.na(x)))

rowExclude <- rowRate > $MISSINGFILTERTHRESHR


if(any(rowExclude))
  {
    missingText <- paste(missingText,
                         ifelse(sum(rowExclude) > 1,
                                " There were ",
                                " There was "),
                         sum(rowExclude),
                         ifelse(sum(rowExclude) > 1,
                                " samples ",
                                " sample "),
                         "with an excessive amount of ",
                         "missing data. ",
                         ifelse(sum(rowExclude) > 1,
                                " These were excluded. ",
                                " This was excluded. "),
                         "After filtering, ",
                         sum(!rowExclude),
                         " samples remained.")
    rawData <- rawData[!rowExclude, ]
    hasMissing <- apply(rawData[, predictorNames, drop = FALSE],
                        1, function(x) mean(is.na(x)))
  } else {
    hasMissing <- apply(rawData[, predictorNames, drop = FALSE],
                        1, function(x) any(is.na(x)))
    missingText <- paste(missingText,
                         ifelse(missingText == "",
                                "There ",
                                "Subsequently, there "),
                         ifelse(sum(hasMissing) == 1,
                                "was ",
                                "were "),
                         ifelse(sum(hasMissing) > 0,
                                sum(hasMissing),
                                "no"),
                         ifelse(sum(hasMissing) == 1,
                                "sample ",
                                "samples "),
                         "with missing values.")
    rawData <- rawData[complete.cases(rawData),]
  }

dataDist <- summary(rawData$outcome)
dataSD <- sd(rawData$outcome, na.rm = TRUE)
dataText <- paste("The outcome had a mean of ",
                  dataDist["Mean"],
                  " and a standard deviation of ",
                  dataSD, ". The minimum and maximum values were ",
                  dataDist["Min."], " and ", dataDist["Max."],
                  ", respectively. Figure \\\\ref{F:dens} shows a ",
                  "density plot (i.e. a smooth histogram) of the response.",
                  sep = "")

## split off the predictors; the outcome is assumed to be the last column
rawData1 <- rawData[, seq_len(ncol(rawData) - 1), drop = FALSE]
rawData2 <- rawData[, ncol(rawData)]

set.seed(222)
nzv1 <- nearZeroVar(rawData1)
if(length(nzv1) > 0)
  {
    nzvVars1 <- names(rawData1)[nzv1]
    rawData <- rawData1[, -nzv1, drop = FALSE]
    rawData$outcome <- rawData2
    nzvText1 <- paste("There were ",
                      length(nzv1),
                      " predictors that were removed from the original data due to",
                      " severely unbalanced distributions that",
                      " could negatively affect the model fit",
                      ifelse(length(nzv1) > 10,
                             ".",
                             paste(": ",
                                   listString(nzvVars1),
                                   ".",
                                   sep = "")),
                      sep = "")

  } else {
    rawData <- rawData1
    rawData$outcome <- rawData2
    nzvText1 <- ""

  }

remove("rawData1")
remove("rawData2")

@

The initial data set consisted of \Sexpr{numSamples} samples and
\Sexpr{numPredictors} predictor variables. \Sexpr{dataText} \Sexpr{missingText}
\Sexpr{nzvText1}

\setkeys{Gin}{width = 0.8\textwidth}
\begin{figure}[b]
  \begin{center}

<<densityplot, echo = FALSE, results = hide, fig = TRUE, width = 8, height = 4.5>>=
trellis.par.set(caretTheme(), warn = TRUE)
print(densityplot(~rawData$outcome, pch = "|", adjust = 1.25, xlab = ""))
@
    \caption[Data Density]{A density plot of the response. The marks
      along the $x$--axis show the locations of the data points.}
    \label{F:dens}
  \end{center}
\end{figure}


<<pca, eval= $PCAEVAL, echo = $PCAECHO, results = $PCARESULT>>=
predictorNames <- names(rawData)[names(rawData) != "outcome"]
numPredictors <- length(predictorNames)

predictors <- rawData[, predictorNames, drop = FALSE]
## PCA will fail with predictors having fewer than 2 unique values
isZeroVar <- apply(predictors, 2,
                   function(x) length(unique(x)) < 2)
if(any(isZeroVar)) predictors <- predictors[, !isZeroVar, drop = FALSE]
## For whatever reason, only the formula interface to prcomp
## handles missing values
pcaForm <- as.formula(
             paste("~",
                   paste(names(predictors), collapse = "+")))
pca <- prcomp(pcaForm,
              data = predictors,
              center = TRUE,
              scale. = TRUE,
              na.action = na.omit)
## OPTION: the number of components plotted/discussed can be set
numPCAcomp <- $PCACOMP
pctVar <- pca$sdev^2/sum(pca$sdev^2)*100
pcaText <- paste(round(pctVar[1:numPCAcomp], 1),
                 "\\\\%",
                 sep = "")
pcaText <- listString(pcaText)
@
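
%% Illustration only, not part of the original template: a minimal
%% sketch of how percent-of-variance figures are derived from the
%% standard deviations of principal components. The chunk label
%% 'pctVarSketch' and the values in 'sdevDemo' are made up for this
%% example; the chunk is neither evaluated nor echoed.
<<pctVarSketch, eval = FALSE, echo = FALSE>>=
## hypothetical standard deviations of three principal components
sdevDemo <- c(2, 1, 1)
## variance of each component as a percentage of the total variance
pctVarDemo <- sdevDemo^2 / sum(sdevDemo^2) * 100
round(pctVarDemo, 1)   # 66.7 16.7 16.7
@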

To get an initial assessment of the structure of the data,
principal component analysis (PCA) was used to distill the
\Sexpr{numPredictors} predictors down into \Sexpr{numPCAcomp}
surrogate variables (i.e. the principal components) in a manner that
attempts to maximize the amount of information preserved from the
original predictor set. Figure \ref{F:inititalPCA} contains plots of
the first \Sexpr{numPCAcomp} components, which accounted for
\Sexpr{pcaText} of the variability in the original predictors
(respectively).

%% OPTION: remark on how well (or poorly) the data separated

\setkeys{Gin}{width = 0.8\textwidth}
\begin{figure}[p]
\begin{center}

<<pcaPlot, eval = $PCAPLOTEVAL, echo = $PCAPLOTECHO, results = $PCAPLOTRESULT, fig = $PCAPLOTFIG, width = 8, height = 8>>=
trellis.par.set(caretTheme(), warn = TRUE)
if(numPCAcomp == 2)
  {
    axisRange <- extendrange(pca$x[, 1:2])
    print(
          latticeBubble(x = as.data.frame(pca$x)$PC1,
                        y = as.data.frame(pca$x)$PC2,
                        z = rawData$outcome,
                        type = c("p", "g"),
                        xlab = "PC1", ylab = "PC2",
                        xlim = axisRange,
                        ylim = axisRange))

  } else {
    axisRange <- extendrange(pca$x[, 1:numPCAcomp])
    print(
          latticeBubble(x = as.data.frame(pca$x)[,1:numPCAcomp],

                        z = rawData$outcome,
                        type = c("p", "g"),
                        xlab = "PC1", ylab = "PC2",
                        xlim = axisRange,
                        ylim = axisRange))


  }
@
\caption[PCA Plot]{A plot of the first \Sexpr{numPCAcomp}
principal components for the original data set. Smaller, lighter
points indicate smaller values of the response while darker,
larger points correspond to larger values of the response.}
\label{F:inititalPCA}
\end{center}
\end{figure}


<<initialDataSplit, eval = $INITIALDATASPLITEVAL, echo = $INITIALDATASPLITECHO, results = $INITIALDATASPLITRESULT>>=

## OPTION: with small sample sizes, you may not want to set aside a
## test set and instead focus on the resampling results.
numSamples <- nrow(rawData)

predictorNames <- names(rawData)[names(rawData) != "outcome"]
numPredictors <- length(predictorNames)

# pctTrain <- .15
pctTrain <- $PERCENT

if(pctTrain < 1)
  {
    ## OPTION: seed number can be changed
    set.seed(1)
    inTrain <- createDataPartition(rawData$outcome,
                                   p = pctTrain,
                                   list = FALSE)
    ## drop = FALSE keeps a data frame even with a single predictor
    trainX <- rawData[ inTrain, predictorNames, drop = FALSE]
    testX  <- rawData[-inTrain, predictorNames, drop = FALSE]
    trainY <- rawData[ inTrain, "outcome"]
    testY  <- rawData[-inTrain, "outcome"]
    splitText <- paste("The original data were split into ",
                       "a training set ($n$=",
                       nrow(trainX),
                       ") and a test set ($n$=",
                       nrow(testX),
                       ") in a manner that preserved the ",
                       "distribution of the response.",
                       sep = "")
    isZeroVar <- apply(trainX, 2,
                       function(x) length(unique(x)) < 2)
    if(any(isZeroVar))
      {
        trainX <- trainX[, !isZeroVar, drop = FALSE]
        testX <- testX[, !isZeroVar, drop = FALSE]
      }

  } else {
    trainX <- rawData[, predictorNames, drop = FALSE]
    testX <- NULL
    trainY <- rawData[, "outcome"]
    testY <- NULL
    splitText <- "The entire data set was used as the training set."
  }

remove("rawData")

@
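
%% Illustration only, not part of the original template: a sketch of a
%% stratified split with caret's createDataPartition, assuming the
%% caret package is installed. The chunk label 'splitSketch' and the
%% objects 'yDemo' and 'idxDemo' are hypothetical; the chunk is
%% neither evaluated nor echoed.
<<splitSketch, eval = FALSE, echo = FALSE>>=
library(caret)
set.seed(1)
yDemo <- rnorm(100)                 # made-up numeric response
## for a numeric outcome the rows are sampled within quantile groups,
## so both subsets cover the range of the response
idxDemo <- createDataPartition(yDemo, p = 0.8, list = FALSE)
length(idxDemo)                     # about 80 rows go to training
summary(yDemo[idxDemo])             # training response
summary(yDemo[-idxDemo])            # held-out response
@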

\Sexpr{splitText}
The data set for model building consisted of \Sexpr{numSamples} samples and
\Sexpr{numPredictors} predictor variables.



<<nzv, eval= $NZVEVAL, results = $NZVRESULT, echo = $NZVECHO>>=
## OPTION: other pre-processing steps can be used
ppSteps <- caret:::suggestions(modName)

set.seed(2)
if(ppSteps["nzv"])
  {
    nzv <- nearZeroVar(trainX)
    if(length(nzv) > 0)
      {
        nzvVars <- names(trainX)[nzv]
        trainX <- trainX[, -nzv, drop = FALSE]
        nzvText <- paste("There were ",
                         length(nzv),
                         " predictors that were removed due to",
                         " severely unbalanced distributions that",
                         " could negatively affect the model fit",
                         ifelse(length(nzv) > 10,
                                ".",
                                paste(": ",
                                      listString(nzvVars),
                                      ".",
                                      sep = "")),
                         sep = "")
        ## testX is NULL when the entire data set is used for training
        if(!is.null(testX)) testX <- testX[, -nzv, drop = FALSE]
      } else nzvText <- ""
  } else nzvText <- ""
@

\Sexpr{nzvText}

<<corrFilter, eval = $CORRFILTEREVAL, results = $CORRFILTERRESULT, echo = $CORRFILTERECHO>>=

if(ppSteps["corr"])
  {
    ## OPTION: the correlation threshold can be changed
    ##corrThresh <- .75
    corrThresh <- $THRESHHOLDCOR
    highCorr <- findCorrelation(cor(trainX, use = "pairwise.complete.obs"),
                                corrThresh)
    if(length(highCorr) > 0)
      {
        corrVars <- names(trainX)[highCorr]
        trainX <- trainX[, -highCorr, drop = FALSE]
        ## report the removed predictors by name, not by column index
        corrText <- paste("There were ",
                          length(highCorr),
                          " predictors that were removed due to",
                          " large between--predictor correlations that",
                          " could negatively affect the model fit",
                          ifelse(length(highCorr) > 10,
                                 ".",
                                 paste(": ",
                                       listString(corrVars),
                                       ".",
                                       sep = "")),
                          " Removing these predictors forced",
                          " all pair--wise correlations to be",
                          " less than ",
                          corrThresh,
                          ".",
                          sep = "")
        ## testX is NULL when the entire data set is used for training
        if(!is.null(testX)) testX <- testX[, -highCorr, drop = FALSE]
      } else corrText <- ""
  }else corrText <- ""
@
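
%% Illustration only, not part of the original template: a sketch of
%% the correlation filter on made-up data, assuming the caret package
%% is installed. The chunk label 'corrFilterSketch' and the objects
%% 'aDemo', 'xDemo', 'dropDemo' and 'xKept' are hypothetical; the
%% chunk is neither evaluated nor echoed.
<<corrFilterSketch, eval = FALSE, echo = FALSE>>=
library(caret)
set.seed(1)
aDemo <- rnorm(50)
xDemo <- data.frame(a = aDemo,
                    b = aDemo + rnorm(50, sd = 0.01),  # near-duplicate of a
                    c = rnorm(50))                     # unrelated column
## column indices flagged for removal at a 0.75 absolute-correlation cutoff
dropDemo <- findCorrelation(cor(xDemo), cutoff = 0.75)
names(xDemo)[dropDemo]              # expected to be one of "a" or "b"
## guard against an empty index vector before negative indexing
xKept <- if(length(dropDemo) > 0) xDemo[, -dropDemo, drop = FALSE] else xDemo
@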

\Sexpr{corrText}

<<preProc, eval = $PREPROCEVAL, echo = $PREPROCECHO, results = $PREPROCRESULT>>=

ppMethods <- NULL
if(ppSteps["center"]) ppMethods <- c(ppMethods, "center")
if(ppSteps["scale"]) ppMethods <- c(ppMethods, "scale")
if(any(hasMissing)) ppMethods <- c(ppMethods, "knnImpute")
##OPTION other methods, such as spatial sign, can be added to this list

if(length(ppMethods) > 0)
  {
    ppInfo <- preProcess(trainX, method = ppMethods)
    trainX <- predict(ppInfo, trainX)
    if(pctTrain < 1) testX <- predict(ppInfo, testX)
    ppText <- paste("The following pre--processing methods were",
                    " applied to the training",
                    ifelse(pctTrain < 1, " and test", ""),
                    " data: ",
                    listString(ppMethods),
                    ".",
                    sep = "")
    ppText <- gsub("center", "mean centering", ppText)
    ppText <- gsub("scale", "scaling to unit variance", ppText)
    ppText <- gsub("knnImpute",
                   paste(ppInfo$k, "--nearest neighbor imputation", sep = ""),
                   ppText)
    ppText <- gsub("spatialSign", "the spatial sign transformation", ppText)
    ppText <- gsub("pca", "principal component feature extraction", ppText)
    ppText <- gsub("ica", "independent component feature extraction", ppText)
  } else {
    ppInfo <- NULL
    ppText <- ""
  }

predictorNames <- names(trainX)
if(nzvText != "" || corrText != "" || ppText != "")
  {
    varText <- paste("After pre--processing, ",
                     ncol(trainX),
                     "predictors remained for modeling.")
  } else varText <- ""

@

\Sexpr{ppText} \Sexpr{varText}

\clearpage
\section*{Model Building}

<<setupWorkers, eval = TRUE, echo = $SETUPWORKERSECHO, results = $SETUPWORKERSRESULT>>=

numWorkers <- $NUMWORKERS
##OPTION: turn up numWorkers to use MPI
if(numWorkers > 1)
  {
    mpiCalcs <- function(X, FUN, ...)
      {
        theDots <- list(...)
        parLapply(theDots$cl, X, FUN)
      }

    library(snow)
    cl <- makeCluster(numWorkers, "MPI")
  }
@

<<setupResampling, echo = $SETUPRESAMPLINGECHO, results = $SETUPRESAMPLINGRESULT>>=
##<<setupResampling, echo = FALSE, results = hide>>=
##OPTION: the resampling options can be changed. See
## ?trainControl for details
resampName <- "repeatedcv"
resampNumber <- $RESAMPLENUMBER
numRepeat <- 3
resampP <- $RESAMPLENUMBERPERCENT

modelInfo <- modelLookup(modName)

set.seed(3)
ctlObj <- trainControl(method = resampName,
                       number = resampNumber,
                       repeats = numRepeat,
                       p = resampP)


##OPTION select other performance metrics as needed
optMetric <- "RMSE"

if(numWorkers > 1)
  {
    ctlObj$workers <- numWorkers
    ctlObj$computeFunction <- mpiCalcs
    ctlObj$computeArgs <- list(cl = cl)
  }
@

<<setupGrid, results = $SETUPGRIDRESULT, echo = $SETUPGRIDECHO>>=

##OPTION expand or contract these grids as needed (or
## add more models)

gridSize <- $SETUPGRIDSIZE

if(modName %in% c("svmPoly", "svmRadial", "svmLinear", "ctree2", "ctree")) gridSize <- 5
if(modName %in% c("earth")) gridSize <- 7
if(modName %in% c("knn", "glmboost", "rf", "nodeHarvest")) gridSize <- 10

if(modName %in% c("rpart")) gridSize <- 15
if(modName %in% c("pls", "lars2", "lars")) gridSize <- min(20, ncol(trainX))

if(modName == "gbm")
  {
    tGrid <- expand.grid(.interaction.depth = -1 + (1:5)*2 ,
                         .n.trees = (1:10)*20,
                         .shrinkage = .1)
  }

if(modName == "nnet")
  {
    tGrid <- expand.grid(.size = -1 + (1:5)*2 ,
                         .decay = c(0, .001, .01, .1))
  }

@
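
%% Illustration only, not part of the original template: a sketch of
%% what the gbm tuning grid built in the setupGrid chunk contains,
%% using only base R. The chunk label 'gridSketch' and the object
%% 'gridDemo' are hypothetical; the chunk is neither evaluated nor
%% echoed.
<<gridSketch, eval = FALSE, echo = FALSE>>=
## same construction as the gbm grid: 5 depths x 10 tree counts x 1 rate
gridDemo <- expand.grid(.interaction.depth = -1 + (1:5)*2,
                        .n.trees = (1:10)*20,
                        .shrinkage = .1)
unique(gridDemo[[".interaction.depth"]])   # 1 3 5 7 9
range(gridDemo[[".n.trees"]])              # 20 200
nrow(gridDemo)                             # 50 candidate models
@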


<<fitModel, results = $FITMODELRESULT, echo = $FITMODELECHO, eval = $FITMODELEVAL>>=

##OPTION alter as needed

set.seed(4)
modelFit <- switch(modName,
                   gbm =
                   {
                     ## shuffle the rows before fitting
                     mix <- sample(seq_along(trainY))
                     train(
                           trainX[mix,], trainY[mix], modName,
                           verbose = FALSE,
                           bag.fraction = .9,
                           metric = optMetric,
                           trControl = ctlObj,
                           tuneGrid = tGrid)
                   },

                   nnet =
                   {
                     ## nnet's iteration limit argument is 'maxit';
                     ## 'maxiter' would be silently ignored
                     train(
                           trainX, trainY, modName,
                           metric = optMetric,
                           linout = TRUE,
                           trace = FALSE,
                           maxit = 1000,
                           MaxNWts = 5000,
                           trControl = ctlObj,
                           tuneGrid = tGrid)

                   },

                   svmRadial =, svmPoly =, svmLinear =
                   {
                     train(
                           trainX, trainY, modName,
                           metric = optMetric,
                           scaled = TRUE,
                           trControl = ctlObj,
                           tuneLength = gridSize)
                   },
                   {
                     train(trainX, trainY, modName,
                           trControl = ctlObj,
                           metric = optMetric,
                           tuneLength = gridSize)
                   })

@

<<modelDescr, echo = $MODELDESCRECHO, results = $MODELDESCRRESULT>>=


summaryText <- ""

## control$p is stored as a proportion, so it is scaled to a percentage for display
resampleName <- switch(tolower(modelFit$control$method),
                       boot = paste("the bootstrap (", length(modelFit$control$index), " reps)", sep = ""),
                       boot632 = paste("the bootstrap 632 rule (", length(modelFit$control$index), " reps)", sep = ""),
                       cv = paste("cross-validation (", modelFit$control$number, " fold)", sep = ""),
                       repeatedcv = paste("cross-validation (", modelFit$control$number, " fold, repeated ",
                         modelFit$control$repeats, " times)", sep = ""),
                       lgocv = paste("repeated train/test splits (", length(modelFit$control$index), " reps, ",
                         round(modelFit$control$p * 100, 2), "$\\%$)", sep = ""))

tuneVars <- latexTranslate(tolower(modelInfo$label))
tuneVars <- gsub("\\#", "the number of ", tuneVars, fixed = TRUE)
if(ncol(modelFit$bestTune) == 1 && colnames(modelFit$bestTune) == ".parameter")
  {
    summaryText <- paste(summaryText,
                         "\n\n",
                         "There are no tuning parameters associated with this model.",
                         "To characterize the model performance on the training set,",
                         resampleName,
                         "was used.",
                         "Table \\\\ref{T:resamps} and Figure \\\\ref{F:profile}",
                         "show summaries of the resampling results. ")

  } else {
    summaryText <- paste("There",
                         ifelse(nrow(modelInfo) > 1, "are", "is"),
                         nrow(modelInfo),
                         ifelse(nrow(modelInfo) > 1, "tuning parameters", "tuning parameter"),
                         "associated with this model:",
                         listString(tuneVars, period = TRUE))



    paramNames <- gsub(".", "", names(modelFit$bestTune), fixed = TRUE)
    for(i in seq(along = paramNames))
      {
        check <- modelInfo$parameter %in% paramNames[i]
        if(any(check))
          {
            paramNames[i] <- modelInfo$label[which(check)]
          }
      }

    paramNames <- gsub("#", "the number of ", paramNames, fixed = TRUE)
    ## Check to see if there was only one combination fit
    summaryText <- paste(summaryText,
                         "To choose",
                         ifelse(nrow(modelInfo) > 1,
                                "appropriate values of the tuning parameters,",
                                "an appropriate value of the tuning parameter,"),
                         resampleName,
                         "was used to generate a profile of performance across the",
                         nrow(modelFit$results),
                         ifelse(nrow(modelInfo) > 1,
                                "combinations of the tuning parameters.",
                                "candidate values."),

                         "Table \\\\ref{T:resamps} and Figure \\\\ref{F:profile} show",
                         "summaries of the resampling profile. ", "The final model fitted to the entire training set was:",
                         listString(paste(latexTranslate(tolower(paramNames)), "=", modelFit$bestTune[1,]), period = TRUE))

  }
@

\Sexpr{summaryText}

<<resampTable, echo = $RESAMPTABLEECHO, results = $RESAMPTABLERESULT>>=

tableData <- modelFit$results

if(all(modelInfo$parameter == "parameter"))
  {
    tableData <- tableData[,-1, drop = FALSE]
    colNums <- c(length(modelFit$perfNames), length(modelFit$perfNames))
    colLabels <- c("Mean", "Standard Deviation")
    constString <- ""
    isConst <- NULL
  } else {

    isConst <- apply(tableData[, modelInfo$parameter, drop = FALSE],
                     2,
                     function(x) length(unique(x)) == 1)

    numParamInTable <- sum(!isConst)

    if(any(isConst))
      {
        constParam <- modelInfo$parameter[isConst]
        constValues <- format(tableData[, constParam, drop = FALSE], digits = 4)[1,,drop = FALSE]
        tableData <- tableData[, !(names(tableData) %in% constParam), drop = FALSE]
        constString <- paste("The tuning",
                             ifelse(sum(isConst) > 1,
                                    "parameters",
                                    "parameter"),
                             listString(paste("``", names(constValues), "''", sep = "")),
                             ifelse(sum(isConst) > 1,
                                    "were",
                                    "was"),
                             "held constant at",
                             ifelse(sum(isConst) > 1,
                                    "values of",
                                    "a value of"),
                             listString(constValues[1,]))

      } else constString <- ""

    cn <- colnames(tableData)
    for(i in seq(along = cn))
      {
        check <- modelInfo$parameter %in% cn[i]
        if(any(check))
          {
            cn[i] <- modelInfo$label[which(check)]
          }
      }
    colnames(tableData) <- cn

    colNums <- c(numParamInTable,
                 length(modelFit$perfNames),
                 length(modelFit$perfNames))
    colLabels <- c("", "Mean", "Standard Deviation")
  }

colnames(tableData) <- gsub("SD$", "", colnames(tableData))
colnames(tableData) <- latexTranslate(colnames(tableData))
rownames(tableData) <- latexTranslate(rownames(tableData))

latex(tableData,
      rowname = NULL,
      file = "",
      cgroup = colLabels,
      n.cgroup = colNums,
      where = "h!",
      digits = 4,
      longtable = nrow(tableData) > 30,
      caption = paste(resampleName, "results from the model fit.", constString),
      label = "T:resamps")
@

\setkeys{Gin}{ width = 0.9\textwidth}
\begin{figure}[b]
\begin{center}

<<profilePlot, echo = $PROFILEPLOTECHO, fig = $PROFILEPLOTFIG, width = 8, height = 6>>=

trellis.par.set(caretTheme(), warn = TRUE)
if(all(modelInfo$parameter == "parameter") | all(isConst) | modName == "nb")
  {
    resultsPlot <- resampleHist(modelFit)
    plotCaption <- paste("Distributions of model performance from the ",
                         "training set estimated using ",
                         resampleName)
  } else {
    if(modName %in% c("svmPoly", "svmRadial", "svmLinear"))
      {
        resultsPlot <- plot(modelFit,
                            metric = optMetric,
                            xTrans = function(x) log10(x))
        resultsPlot <- update(resultsPlot,
                              type = c("g", "p", "l"),
                              ylab = paste(optMetric, " (", resampleName, ")", sep = ""))

      } else {
        resultsPlot <- plot(modelFit,
                            metric = optMetric)
        resultsPlot <- update(resultsPlot,
                              type = c("g", "p", "l"),
                              ylab = paste(optMetric, " (", resampleName, ")", sep = ""))
      }
    plotCaption <- paste("A plot of the estimates of the",
                         optMetric,
                         "values calculated using",
                         resampleName)
  }
print(resultsPlot)
@
\caption[Performance Plot]{\Sexpr{plotCaption}.}
\label{F:profile}
\end{center}
\end{figure}

<<stopWorkers, echo = $STOPWORKERSECHO, results = $STOPWORKERSRESULT>>=
##<<stopWorkers, echo = FALSE, results = hide>>=
if(numWorkers > 1) stopCluster(cl)
@

<<testPred, results = $TESTPREDRESULT, echo = $TESTPREDECHO>>=


if(pctTrain < 1)
  {
    cat("\\clearpage\n\\section*{Test Set Results}\n\n")

    testPreds <- extractPrediction(list(fit = modelFit),
                                   testX = testX, testY = testY)
    testPreds <- subset(testPreds, dataType == "Test")
    values <- modelFit$control$summaryFunction(testPreds)
    names(values) <- gsub("RMSE", "root mean squared error", names(values), fixed = TRUE)
    names(values) <- gsub("Rsquared", "$R^2$", names(values), fixed = TRUE)
    values <- format(values, digits = 3)

    testString <- paste("Based on the test set of",
                        nrow(testX),
                        "samples,",
                        listString(paste(names(values), "was", values), period = TRUE),
                        " A plot of the observed and predicted outcomes for the test set ",
                        "is given in Figure \\\\ref{F:obsPred}.")
    testString <- paste(testString,
                        " Using ", resampleName,
                        ", the training set estimates were ",
                        resampleStats(modelFit),
                        ".",
                        sep = "")

    axisRange <- extendrange(testPreds[, c("obs", "pred")])
    obsPred <- xyplot(obs ~ pred,
                      data = testPreds,
                      xlim = axisRange,
                      ylim = axisRange,
                      panel = function(x, y)
                      {
                        panel.abline(0, 1, col = "darkgrey", lty = 2)
                        panel.xyplot(x, y, type = c("p", "g"))
                        panel.loess(x, y, col = "darkred", lwd = 2)


                      },
                      ylab = "Observed Response",
                      xlab = "Predicted Response")

    pdf("obsPred.pdf", height = 8, width = 8)
    trellis.par.set(caretTheme())
    print(obsPred)
    dev.off()

  } else testString <- ""
@
\Sexpr{testString}


<<classProbsTex, results = $CLASSPROBSTEXRESULT, echo = $CLASSPROBSTEXECHO>>=


if(pctTrain < 1)
  {
    cat(
        paste("\\begin{figure}[p]\n",
              "\\begin{center}\n",
              "\\includegraphics{obsPred}",
              "\\caption[Observed versus Predicted Values]{",
              "The observed and predicted responses. ",
              "The grey line is the line of identity while the",
              "solid red line is a smoothed trend line.}\n",
              "\\label{F:obsPred}\n",
              "\\end{center}\n",
              "\\end{figure}"))
  }

@

\section*{Versions}

<<versions, echo = FALSE, results = tex>>=
toLatex(sessionInfo())

@

<<save-data, echo = $SAVEDATAECHO, results = $SAVEDATARESULT>>=
## change this to the name of modName....
Fit <- modelFit
save(Fit, file = "$METHOD-Fit.RData")
@
The model was built using $METHOD and saved as $METHOD-Fit.RData for reuse. The file contains the fitted model in the variable Fit.


\end{document}'''
    return template4Rnw
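
The listing above does not show how placeholders such as `$METHOD` or `$SAVEDATAECHO` get filled in. The following is a minimal sketch, assuming they are expanded with Python's standard `string.Template`; the helper `fill_template` and the miniature `snippet` are hypothetical and not part of the original file. It uses `safe_substitute` because the template also contains R accessors such as `modelFit$control$method`, which look like placeholders; plain `substitute` would raise `KeyError` on them.

```python
from string import Template

# Hypothetical miniature of the Sweave template: $METHOD and $SAVEDATAECHO are
# placeholders to fill, while modelFit$control$method is ordinary R code.
snippet = r'''<<save-data, echo = $SAVEDATAECHO>>=
method <- tolower(modelFit$control$method)
save(Fit, file = "$METHOD-Fit.RData")
@'''


def fill_template(text, values):
    """Fill known placeholders and leave every other '$' token untouched."""
    return Template(text).safe_substitute(values)


filled = fill_template(snippet, {"METHOD": "svmRadial", "SAVEDATAECHO": "FALSE"})
print(filled)
```

Placeholder lookup is case-sensitive, so the R accessor `$method` is not confused with the `METHOD` key. The R accessors survive only as long as no mapping key happens to share their exact name.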