annotate tool2/templateLibrary.py.orig @ 0:ee374e48024f draft default tip

Uploaded
author deepakjadmin
date Thu, 21 Jan 2016 00:34:58 -0500
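
Note on the placeholders in the listing below: the template text contains names such as $METHOD and $RDATA, and writes R's `$` operator as `$$`. That pattern matches Python's standard string.Template syntax, so the following is a minimal sketch of how such a template could be filled in. It assumes string.Template is the mechanism; the mini template, the render() helper and the file name "input.RData" are hypothetical and do not come from this file.

```python
from string import Template

# Hypothetical miniature of the Sweave template: $METHOD and $RDATA are
# placeholders, while $$ is the escape for a literal dollar sign (R's `$`).
mini_template = Template(r'''\title{Classification Model Script using $METHOD}
<<startup, eval=TRUE, echo=FALSE, results=hide>>=
modName <- "$METHOD"
load("$RDATA")
rawData$$outcome <- dataY
@
''')


def render(method, rdata_path):
    """Fill both placeholders; substitute() raises KeyError if one is missing."""
    return mini_template.substitute(METHOD=method, RDATA=rdata_path)


rendered = render("pls", "input.RData")
print(rendered)
```

Under this assumption, an unescaped `$name` in the R code would be read as a placeholder, which is why member access is written with a doubled dollar sign in the template.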
def __template4Rnw():

    template4Rnw = r'''%% Classification Modeling Script
%% Max Kuhn (max.kuhn@pfizer.com, mxkuhn@gmail.com)
%% Version: 1.00
%% Created on: 2010/10/02
%%
%% This is a Sweave template for building and describing
%% classification models. It mixes R and LaTeX code. The document can
%% be processed using R's Sweave function to produce a tex file.
%%
%% The inputs are:
%% - the initial data set in a data frame called 'rawData'
%% - a factor column in the data set called 'outcome'. This should be the
%%   outcome variable
%% - all other columns in rawData should be predictor variables
%% - the type of model should be in a variable called 'modName'.
%%
%% The script attempts to make some intelligent choices based on the
%% model being used. For example, if modName is "pls", the script will
%% automatically center and scale the predictor data. There are
%% situations where these choices can (and should) be changed.
%%
%% There are other options that may make sense to change. For example,
%% the user may want to adjust the type of resampling. To find these
%% parts of the script, search on the string 'OPTION'. These parts of
%% the code will document the options.

\documentclass[14pt]{report}
\usepackage{amsmath}
\usepackage[pdftex]{graphicx}
\usepackage{color}
\usepackage{ctable}
\usepackage{xspace}
\usepackage{fancyvrb}
\usepackage{fancyhdr}
\usepackage{lastpage}
\usepackage{longtable}
\usepackage{algorithm2e}
\usepackage[
         colorlinks=true,
         linkcolor=blue,
         citecolor=blue,
         urlcolor=blue]
         {hyperref}
\usepackage{lscape}
\usepackage{Sweave}
\SweaveOpts{keep.source = TRUE}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% define new colors for use
\definecolor{darkgreen}{rgb}{0,0.6,0}
\definecolor{darkred}{rgb}{0.6,0.0,0}
\definecolor{lightbrown}{rgb}{1,0.9,0.8}
\definecolor{brown}{rgb}{0.6,0.3,0.3}
\definecolor{darkblue}{rgb}{0,0,0.8}
\definecolor{darkmagenta}{rgb}{0.5,0,0.5}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\bld}[1]{\mbox{\boldmath $$#1$$}}
\newcommand{\shell}[1]{\mbox{$$#1$$}}
\renewcommand{\vec}[1]{\mbox{\bf {#1}}}

\newcommand{\ReallySmallSpacing}{\renewcommand{\baselinestretch}{.6}\Large\normalsize}
\newcommand{\SmallSpacing}{\renewcommand{\baselinestretch}{1.1}\Large\normalsize}

\newcommand{\halfs}{\frac{1}{2}}

\setlength{\oddsidemargin}{-.25 truein}
\setlength{\evensidemargin}{0truein}
\setlength{\topmargin}{-0.2truein}
\setlength{\textwidth}{7 truein}
\setlength{\textheight}{8.5 truein}
\setlength{\parindent}{0.20truein}
\setlength{\parskip}{0.10truein}

\setcounter{LTchunksize}{50}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pagestyle{fancy}
\lhead{}
%% OPTION Report header name
\chead{Classification Model Script}
\rhead{}
\lfoot{}
\cfoot{}
\rfoot{\thepage\ of \pageref{LastPage}}
\renewcommand{\headrulewidth}{1pt}
\renewcommand{\footrulewidth}{1pt}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% OPTION Report title and modeler name
\title{Classification Model Script using $METHOD}
\author{"Lynn Group with M. Kuhn, SCIS, JNU, New Delhi"}

\begin{document}

\maketitle

\thispagestyle{empty}
<<dummy, eval=TRUE, echo=FALSE, results=hide>>=
# sets values for variables used later in the program to prevent the \Sexpr error on parsing with Sweave
numSamples=''
classDistString=''
missingText=''
numPredictors=''
numPCAcomp=''
pcaText=''
nzvText=''
corrText=''
ppText=''
varText=''
splitText="Dummy Text"
nirText="Dummy Text"
# pctTrain is a variable that is initialised in Data splitting, and reused later in testPred
pctTrain=0.8
Smpling=''
nzvText1=''
classDistString1=''
dwnsmpl=''
upsmpl=''

@
<<startup, eval= TRUE, results = hide, echo = FALSE>>=
library(Hmisc)
library(caret)
library(pROC)
versionTest <- compareVersion(packageDescription("caret")$$Version,
                              "4.65")
if(versionTest < 0) stop("caret version 4.65 or later is required")

library(RColorBrewer)


## Collapse a vector into an English list: "a", "a and b" or "a, b and c"
listString <- function (x, period = FALSE, verbose = FALSE)
{
  if (verbose) cat("\n entering listString\n")
  flush.console()
  if (!is.character(x))
    x <- as.character(x)
  numElements <- length(x)
  out <- if (length(x) > 0) {
    switch(min(numElements, 3), x, paste(x, collapse = " and "),
           {
             x <- paste(x, c(rep(",", numElements - 2), " and", ""), sep = "")
             paste(x, collapse = " ")
           })
  }
  else ""
  if (period) out <- paste(out, ".", sep = "")
  if (verbose) cat(" leaving listString\n\n")
  flush.console()
  out
}

## Describe the resampled performance at the selected tuning parameters as text
resampleStats <- function(x, digits = 3)
{
  bestPerf <- x$$bestTune
  colnames(bestPerf) <- gsub("^\\.", "", colnames(bestPerf))
  out <- merge(x$$results, bestPerf)
  out <- out[, colnames(out) %in% x$$perfNames]
  names(out) <- gsub("ROC", "area under the ROC curve", names(out), fixed = TRUE)
  names(out) <- gsub("Sens", "sensitivity", names(out), fixed = TRUE)
  names(out) <- gsub("Spec", "specificity", names(out), fixed = TRUE)
  names(out) <- gsub("Accuracy", "overall accuracy", names(out), fixed = TRUE)
  names(out) <- gsub("Kappa", "Kappa statistic", names(out), fixed = TRUE)

  out <- format(out, digits = digits)
  listString(paste(names(out), "was", out))
}

twoClassNoProbs <- function (data, lev = NULL, model = NULL)
{
  out <- c(sensitivity(data[, "pred"], data[, "obs"], lev[1]),
           specificity(data[, "pred"], data[, "obs"], lev[2]),
           confusionMatrix(data[, "pred"], data[, "obs"])$$overall["Kappa"])

  names(out) <- c("Sens", "Spec", "Kappa")
  out
}



##OPTION: model name: see ?train for more values/models
modName <- "$METHOD"


load("$RDATA")
rawData <- dataX
rawData$$outcome <- dataY

@


\section*{Data Sets}\label{S:data}

%% OPTION: provide some background on the problem, the experimental
%% data, how the compounds were selected etc

<<getDataInfo, eval = $GETDATAINFOEVAL, echo = $GETDATAINFOECHO, results = $GETDATAINFORESULT>>=
if(!any(names(rawData) == "outcome")) stop("a variable called outcome should be in the data set")
if(!is.factor(rawData$$outcome)) stop("the outcome should be a factor vector")

## OPTION: when there are only two classes, the first level of the
##         factor is used as the "positive" or "event" for calculating
##         sensitivity and specificity. Adjust the outcome factor accordingly.
numClasses <- length(levels(rawData$$outcome))
numSamples <- nrow(rawData)
numPredictors <- ncol(rawData) - 1
predictorNames <- names(rawData)[names(rawData) != "outcome"]

isNum <- apply(rawData[,predictorNames, drop = FALSE], 2, is.numeric)
if(any(!isNum)) stop("all predictors in rawData should be numeric")

## all.equal() returns a character description, not FALSE, on a mismatch,
## so wrap it in isTRUE() before negating
classTextCheck <- isTRUE(all.equal(levels(rawData$$outcome), make.names(levels(rawData$$outcome))))
if(!classTextCheck) warning("the class levels are not valid R variable names; this may cause errors")

## Get the class distribution
classDist <- table(rawData$$outcome)
classDistString <- paste("``",
                         names(classDist),
                         "'' ($$n$$=",
                         classDist,
                         ")",
                         sep = "")
classDistString <- listString(classDistString)
@

<<missingFilter, eval = $MISSINGFILTEREVAL, echo = $MISSINGFILTERECHO, results = $MISSINGFILTERRESULT>>=
colRate <- apply(rawData[, predictorNames, drop = FALSE],
                 2, function(x) mean(is.na(x)))

##OPTION thresholds can be changed
colExclude <- colRate > $MISSINGFILTERTHRESHC

missingText <- ""

if(any(colExclude))
  {
    missingText <- paste(missingText,
                         ifelse(sum(colExclude) > 1,
                                " There were ",
                                " There was "),
                         sum(colExclude),
                         ifelse(sum(colExclude) > 1,
                                " predictors ",
                                " predictor "),
                         "with an excessive amount of ",
                         "missing data. ",
                         ifelse(sum(colExclude) > 1,
                                " These were excluded. ",
                                " This was excluded. "))
    predictorNames <- predictorNames[!colExclude]
    rawData <- rawData[, names(rawData) %in% c("outcome", predictorNames), drop = FALSE]
  }


rowRate <- apply(rawData[, predictorNames, drop = FALSE],
                 1, function(x) mean(is.na(x)))

rowExclude <- rowRate > $MISSINGFILTERTHRESHR


if(any(rowExclude)) {
  ## report the number of excluded rows (samples), not excluded columns
  missingText <- paste(missingText,
                       ifelse(sum(rowExclude) > 1,
                              " There were ",
                              " There was "),
                       sum(rowExclude),
                       ifelse(sum(rowExclude) > 1,
                              " samples ",
                              " sample "),
                       "with an excessive amount of ",
                       "missing data. ",
                       ifelse(sum(rowExclude) > 1,
                              " These were excluded. ",
                              " This was excluded. "),
                       "After filtering, ",
                       sum(!rowExclude),
                       " samples remained.")
  rawData <- rawData[!rowExclude, ]
  hasMissing <- apply(rawData[, predictorNames, drop = FALSE],
                      1, function(x) mean(is.na(x)))
} else {
  hasMissing <- apply(rawData[, predictorNames, drop = FALSE],
                      1, function(x) any(is.na(x)))
  missingText <- paste(missingText,
                       ifelse(missingText == "",
                              "There ",
                              "Subsequently, there "),
                       ifelse(sum(hasMissing) == 1,
                              "was ",
                              "were "),
                       ifelse(sum(hasMissing) > 0,
                              sum(hasMissing),
                              "no"),
                       ifelse(sum(hasMissing) == 1,
                              "sample ",
                              "samples "),
                       "with missing values.")

  rawData <- rawData[complete.cases(rawData),]

}

## Split predictors from the outcome; this assumes the outcome is the last column.
## The parentheses make the intended range explicit, and drop = FALSE keeps a
## data frame even when a single predictor remains.
rawData1 <- rawData[, 1:(length(rawData) - 1), drop = FALSE]
rawData2 <- rawData[, length(rawData)]

set.seed(222)
nzv1 <- nearZeroVar(rawData1)
if(length(nzv1) > 0)
  {
    nzvVars1 <- names(rawData1)[nzv1]
    rawData <- rawData1[, -nzv1, drop = FALSE]
    rawData$$outcome <- rawData2
    nzvText1 <- paste("There were ",
                      length(nzv1),
                      " predictors that were removed from original data due to",
                      " severely unbalanced distributions that",
                      " could negatively affect the model fit",
                      ifelse(length(nzv1) > 10,
                             ".",
                             paste(": ",
                                   listString(nzvVars1),
                                   ".",
                                   sep = "")),
                      sep = "")

  } else {
    rawData <- rawData1
    rawData$$outcome <- rawData2
    nzvText1 <- ""

  }

remove("rawData1")
remove("rawData2")

@

The initial data set consisted of \Sexpr{numSamples} samples and
\Sexpr{numPredictors} predictor variables. The breakdown of the
outcome data classes was: \Sexpr{classDistString}.

\Sexpr{missingText}

\Sexpr{nzvText1}

<<pca, eval = $PCAEVAL, echo = $PCAECHO, results = $PCARESULT>>=

predictorNames <- names(rawData)[names(rawData) != "outcome"]
numPredictors <- length(predictorNames)
predictors <- rawData[, predictorNames, drop = FALSE]
## PCA will fail with predictors having fewer than 2 unique values
isZeroVar <- apply(predictors, 2,
                   function(x) length(unique(x)) < 2)
if(any(isZeroVar)) predictors <- predictors[, !isZeroVar, drop = FALSE]
## For whatever reason, only the formula interface to prcomp
## handles missing values
pcaForm <- as.formula(
                      paste("~",
                            paste(names(predictors), collapse = "+")))
pca <- prcomp(pcaForm,
              data = predictors,
              center = TRUE,
              scale. = TRUE,
              na.action = na.omit)
## OPTION: the number of components plotted/discussed can be set
numPCAcomp <- $PCACOMP
pctVar <- pca$$sdev^2/sum(pca$$sdev^2)*100
pcaText <- paste(round(pctVar[1:numPCAcomp], 1),
                 "\\\\%",
                 sep = "")
pcaText <- listString(pcaText)
@

To get an initial assessment of the separability of the classes,
principal component analysis (PCA) was used to distill the
\Sexpr{numPredictors} predictors down into \Sexpr{numPCAcomp}
surrogate variables (i.e. the principal components) in a manner that
attempts to maximize the amount of information preserved from the
original predictor set. Figure \ref{F:inititalPCA} contains plots of
the first \Sexpr{numPCAcomp} components, which accounted for
\Sexpr{pcaText} of the variability in the original predictors
(respectively).


%% OPTION: remark on how well (or poorly) the data separated

\setkeys{Gin}{width = 0.8\textwidth}
\begin{figure}[p]
\begin{center}

<<pcaPlot, eval = $PCAPLOTEVAL, echo = $PCAPLOTECHO, results = $PCAPLOTRESULT, fig = $PCAPLOTFIG, width = 8, height = 8>>=
trellis.par.set(caretTheme(), warn = TRUE)
if(numPCAcomp == 2)
{
  axisRange <- extendrange(pca$$x[, 1:2])
  print(
        xyplot(PC1 ~ PC2,
               data = as.data.frame(pca$$x),
               type = c("p", "g"),
               groups = rawData$$outcome,
               auto.key = list(columns = 2),
               xlim = axisRange,
               ylim = axisRange))
} else {
  axisRange <- extendrange(pca$$x[, 1:numPCAcomp])
  print(
        splom(~as.data.frame(pca$$x)[, 1:numPCAcomp],
              type = c("p", "g"),
              groups = rawData$$outcome,
              auto.key = list(columns = 2),
              as.table = TRUE,
              prepanel.limits = function(x) axisRange
              ))

}

@

\caption[PCA Plot]{A plot of the first \Sexpr{numPCAcomp}
principal components for the original data set.}
\label{F:inititalPCA}
\end{center}
\end{figure}



<<initialDataSplit, eval = $INITIALDATASPLITEVAL, echo = $INITIALDATASPLITECHO, results = $INITIALDATASPLITRESULT>>=

## OPTION: with small sample sizes, you may not want to set aside a
## test set and instead focus on the resampling results.

set.seed(1234)
## the outcome is assumed to be the last column of rawData
dataX <- rawData[, seq_len(ncol(rawData) - 1), drop = FALSE]
dataY <- rawData[, ncol(rawData)]

Smpling <- "$SAAMPLING"

if(Smpling == "downsampling")
{
  dwnsmpl <- downSample(dataX, dataY)
  rawData <- dwnsmpl[, seq_len(ncol(dwnsmpl) - 1), drop = FALSE]
  rawData$$outcome <- dwnsmpl[, ncol(dwnsmpl)]
  remove("dwnsmpl")
  remove("dataX")
  remove("dataY")
} else if(Smpling == "upsampling") {
  upsmpl <- upSample(dataX, dataY)
  rawData <- upsmpl[, seq_len(ncol(upsmpl) - 1), drop = FALSE]
  rawData$$outcome <- upsmpl[, ncol(upsmpl)]
  remove("upsmpl")
  remove("dataX")
  remove("dataY")
} else {
  remove("dataX")
  remove("dataY")
}



numSamples <- nrow(rawData)

predictorNames <- names(rawData)[names(rawData) != "outcome"]
numPredictors <- length(predictorNames)


classDist1 <- table(rawData$$outcome)
classDistString1 <- paste("``",
                          names(classDist1),
                          "'' ($$n$$=",
                          classDist1,
                          ")",
                          sep = "")
classDistString1 <- listString(classDistString1)

pctTrain <- $PERCENT

if(pctTrain < 1)
{
  ## OPTION: seed number can be changed
  set.seed(1)
  inTrain <- createDataPartition(rawData$$outcome,
                                 p = pctTrain,
                                 list = FALSE)
  trainX <- rawData[ inTrain, predictorNames]
  testX  <- rawData[-inTrain, predictorNames]
  trainY <- rawData[ inTrain, "outcome"]
  testY  <- rawData[-inTrain, "outcome"]
  splitText <- paste("The original data were split into ",
                     "a training set ($$n$$=",
                     nrow(trainX),
                     ") and a test set ($$n$$=",
                     nrow(testX),
                     ") in a manner that preserved the ",
                     "distribution of the classes.",
                     sep = "")
  isZeroVar <- apply(trainX, 2,
                     function(x) length(unique(x)) < 2)
  if(any(isZeroVar))
  {
    trainX <- trainX[, !isZeroVar, drop = FALSE]
    testX <- testX[, !isZeroVar, drop = FALSE]
  }

} else {
  trainX <- rawData[, predictorNames]
  testX <- NULL
  trainY <- rawData[, "outcome"]
  testY <- NULL
  splitText <- "The entire data set was used as the training set."
}
trainDist <- table(trainY)
nir <- max(trainDist)/length(trainY)*100
niClass <- names(trainDist)[which.max(trainDist)]
nirText <- paste("The non--information rate is the accuracy that can be ",
                 "achieved by predicting all samples using the most ",
                 "dominant class. For these data, the rate is ",
                 round(nir, 2), "\\\\% using the ``",
                 niClass,
                 "'' class.",
                 sep = "")

remove("rawData")

@

\Sexpr{splitText}

\Sexpr{nirText}

The data set for model building consisted of \Sexpr{numSamples} samples and
\Sexpr{numPredictors} predictor variables. The breakdown of the
outcome data classes was: \Sexpr{classDistString1}.

<<nzv, eval = $NZVEVAL, results = $NZVRESULT, echo = $NZVECHO>>=
## OPTION: other pre-processing steps can be used
ppSteps <- caret:::suggestions(modName)

set.seed(2)
if(ppSteps["nzv"])
{
  nzv <- nearZeroVar(trainX)
  if(length(nzv) > 0)
  {
    nzvVars <- names(trainX)[nzv]
    trainX <- trainX[, -nzv, drop = FALSE]
    nzvText <- paste("There were ",
                     length(nzv),
                     " predictors that were removed from the training set due to",
                     " severely unbalanced distributions that",
                     " could negatively affect the model",
                     ifelse(length(nzv) > 10,
                            ".",
                            paste(": ",
                                  listString(nzvVars),
                                  ".",
                                  sep = "")),
                     sep = "")
    ## testX is NULL when the entire data set is used for training
    if(!is.null(testX)) testX <- testX[, -nzv, drop = FALSE]
  } else nzvText <- ""
} else nzvText <- ""
@

\Sexpr{nzvText}


<<corrFilter, eval = $CORRFILTEREVAL, results = $CORRFILTERRESULT, echo = $CORRFILTERECHO>>=
if(ppSteps["corr"])
{
  ## OPTION: the correlation cutoff can be changed
  corrThresh <- $THRESHHOLDCOR
  highCorr <- findCorrelation(cor(trainX, use = "pairwise.complete.obs"),
                              corrThresh)
  if(length(highCorr) > 0)
  {
    corrVars <- names(trainX)[highCorr]
    trainX <- trainX[, -highCorr, drop = FALSE]
    corrText <- paste("There were ",
                      length(highCorr),
                      " predictors that were removed due to",
                      " large between--predictor correlations that",
                      " could negatively affect the model fit",
                      ifelse(length(highCorr) > 10,
                             ".",
                             paste(": ",
                                   listString(corrVars),
                                   ".",
                                   sep = "")),
                      " Removing these predictors forced",
                      " all pair--wise correlations to be",
                      " less than ",
                      corrThresh,
                      ".",
                      sep = "")
    ## testX is NULL when the entire data set is used for training
    if(!is.null(testX)) testX <- testX[, -highCorr, drop = FALSE]
  } else corrText <- "No between--predictor correlations exceeded the given threshold."
} else corrText <- ""
@

\Sexpr{corrText}

<<preProc, eval = $PREPROCEVAL, echo = $PREPROCECHO, results = $PREPROCRESULT>>=
ppMethods <- NULL
if(ppSteps["center"]) ppMethods <- c(ppMethods, "center")
if(ppSteps["scale"]) ppMethods <- c(ppMethods, "scale")
if(any(hasMissing)) ppMethods <- c(ppMethods, "knnImpute")
## OPTION: other methods, such as spatial sign, can be added to this list

if(length(ppMethods) > 0)
{
  ppInfo <- preProcess(trainX, method = ppMethods)
  trainX <- predict(ppInfo, trainX)
  if(pctTrain < 1) testX <- predict(ppInfo, testX)
  ppText <- paste("The following pre--processing methods were",
                  " applied to the training",
                  ifelse(pctTrain < 1, " and test", ""),
                  " data: ",
                  listString(ppMethods),
                  ".",
                  sep = "")
  ppText <- gsub("center", "mean centering", ppText)
  ppText <- gsub("scale", "scaling to unit variance", ppText)
  ppText <- gsub("knnImpute",
                 paste(ppInfo$$k, "--nearest neighbor imputation", sep = ""),
                 ppText)
  ppText <- gsub("spatialSign", "the spatial sign transformation", ppText)
  ppText <- gsub("pca", "principal component feature extraction", ppText)
  ppText <- gsub("ica", "independent component feature extraction", ppText)
} else {
  ppInfo <- NULL
  ppText <- ""
}

predictorNames <- names(trainX)
if(nzvText != "" || corrText != "" || ppText != "")
{
  varText <- paste("After pre--processing,",
                   ncol(trainX),
                   "predictors remained for modeling.")
} else varText <- ""

@

\Sexpr{ppText}
\Sexpr{varText}

\clearpage

\section*{Model Building}

<<setupWorkers, eval = TRUE, echo = $SETUPWORKERSECHO, results = $SETUPWORKERSRESULT>>=
numWorkers <- $NUMWORKERS
## OPTION: turn up numWorkers to use MPI
if(numWorkers > 1)
{
  mpiCalcs <- function(X, FUN, ...)
  {
    theDots <- list(...)
    parLapply(theDots$$cl, X, FUN)
  }

  library(snow)
  cl <- makeCluster(numWorkers, "MPI")
}
@

<<setupResampling, echo = $SETUPRESAMPLINGECHO, results = $SETUPRESAMPLINGRESULT>>=
## OPTION: the resampling options can be changed. See
## ?trainControl for details

resampName <- "$RESAMPNAME"
resampNumber <- $RESAMPLENUMBER
numRepeat <- $NUMREPEAT
resampP <- $RESAMPLENUMBERPERCENT

modelInfo <- modelLookup(modName)

if(numClasses == 2)
{
  foo <- if(any(modelInfo$$probModel)) twoClassSummary else twoClassNoProbs
} else foo <- defaultSummary

set.seed(3)
ctlObj <- trainControl(method = resampName,
                       number = resampNumber,
                       repeats = numRepeat,
                       p = resampP,
                       classProbs = any(modelInfo$$probModel),
                       summaryFunction = foo)


## OPTION: select other performance metrics as needed
optMetric <- if(numClasses == 2 && any(modelInfo$$probModel)) "ROC" else "Kappa"

if(numWorkers > 1)
{
  ctlObj$$workers <- numWorkers
  ctlObj$$computeFunction <- mpiCalcs
  ctlObj$$computeArgs <- list(cl = cl)
}
@

<<setupGrid, results = $SETUPGRIDRESULT, echo = $SETUPGRIDECHO>>=
## OPTION: expand or contract these grids as needed (or
## add more models)

gridSize <- $SETUPGRIDSIZE

if(modName %in% c("svmPoly", "svmRadial", "svmLinear", "lvq", "ctree2", "ctree")) gridSize <- 5
if(modName %in% c("earth", "fda")) gridSize <- 7
if(modName %in% c("knn", "rocc", "glmboost", "rf", "nodeHarvest")) gridSize <- 10

if(modName %in% c("nb")) gridSize <- 2
if(modName %in% c("pam", "rpart")) gridSize <- 15
if(modName %in% c("pls")) gridSize <- min(20, ncol(trainX))

if(modName == "gbm")
{
  tGrid <- expand.grid(.interaction.depth = -1 + (1:5)*2 ,
                       .n.trees = (1:10)*20,
                       .shrinkage = .1)
}

if(modName == "nnet")
{
  tGrid <- expand.grid(.size = -1 + (1:5)*2 ,
                       .decay = c(0, .001, .01, .1))
}

if(modName == "ada")
{
  tGrid <- expand.grid(.maxdepth = 1, .iter = c(100,200,300,400), .nu = 1 )

737 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
738
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
739
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
740 @
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
741
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
742 <<fitModel, results = $FITMODELRESULT, echo = $FITMODELECHO, eval = $FITMODELEVAL>>=
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
743 ##OPTION alter as needed
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
744
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
745 set.seed(4)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
746 modelFit <- switch(modName,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
747 gbm =
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
748 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
749 mix <- sample(seq(along = trainY))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
750 train(
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
751 trainX[mix,], trainY[mix], modName,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
752 verbose = FALSE,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
753 bag.fraction = .9,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
754 metric = optMetric,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
755 trControl = ctlObj,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
756 tuneGrid = tGrid)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
757 },
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
758
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
759 multinom =
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
760 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
761 train(
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
762 trainX, trainY, modName,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
763 trace = FALSE,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
764 metric = optMetric,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
765 maxiter = 1000,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
766 MaxNWts = 5000,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
767 trControl = ctlObj,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
768 tuneLength = gridSize)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
769 },
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
770
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
771 nnet =
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
772 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
773 train(
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
774 trainX, trainY, modName,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
775 metric = optMetric,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
776 linout = FALSE,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
777 trace = FALSE,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
778 maxiter = 1000,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
779 MaxNWts = 5000,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
780 trControl = ctlObj,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
781 tuneGrid = tGrid)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
782
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
783 },
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
784
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
785 svmRadial =, svmPoly =, svmLinear =
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
786 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
787 train(
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
788 trainX, trainY, modName,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
789 metric = optMetric,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
790 scaled = TRUE,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
791 trControl = ctlObj,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
792 tuneLength = gridSize)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
793 },
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
794 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
795 train(trainX, trainY, modName,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
796 trControl = ctlObj,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
797 metric = optMetric,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
798 tuneLength = gridSize)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
799 })
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
800
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
801 @
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
802
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
803 <<modelDescr, echo = $MODELDESCRECHO, results = $MODELDESCRRESULT>>=
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
804 summaryText <- ""
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
805
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
806 resampleName <- switch(tolower(modelFit$$control$$method),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
807 boot = paste("the bootstrap (", length(modelFit$$control$$index), " reps)", sep = ""),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
808 boot632 = paste("the bootstrap 632 rule (", length(modelFit$$control$$index), " reps)", sep = ""),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
809 cv = paste("cross-validation (", modelFit$$control$$number, " fold)", sep = ""),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
810 repeatedcv = paste("cross-validation (", modelFit$$control$$number, " fold, repeated ",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
811 modelFit$$control$$repeats, " times)", sep = ""),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
812 lgocv = paste("repeated train/test splits (", length(modelFit$$control$$index), " reps, ",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
813 round(100 * modelFit$$control$$p, 2), "$$\\%$$)", sep = ""))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
814
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
815 tuneVars <- latexTranslate(tolower(modelInfo$$label))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
816 tuneVars <- gsub("\\#", "the number of ", tuneVars, fixed = TRUE)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
817 if(ncol(modelFit$$bestTune) == 1 && colnames(modelFit$$bestTune) == ".parameter")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
818 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
819 summaryText <- paste(summaryText,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
820 "\n\n",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
821 "There are no tuning parameters associated with this model.",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
822 "To characterize the model performance on the training set,",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
823 resampleName,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
824 "was used.",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
825 "Table \\\\ref{T:resamps} and Figure \\\\ref{F:profile}",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
826 "show summaries of the resampling results. ")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
827
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
828 } else {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
829 summaryText <- paste("There",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
830 ifelse(nrow(modelInfo) > 1, "are", "is"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
831 nrow(modelInfo),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
832 ifelse(nrow(modelInfo) > 1, "tuning parameters", "tuning parameter"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
833 "associated with this model:",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
834 listString(tuneVars, period = TRUE))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
835
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
836
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
837
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
838 paramNames <- gsub(".", "", names(modelFit$$bestTune), fixed = TRUE)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
839 for(i in seq(along = paramNames))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
840 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
841 check <- modelInfo$$parameter %in% paramNames[i]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
842 if(any(check))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
843 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
844 paramNames[i] <- modelInfo$$label[which(check)]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
845 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
846 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
847
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
848 paramNames <- gsub("#", "the number of ", paramNames, fixed = TRUE)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
849 ## Check to see if there was only one combination fit
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
850 summaryText <- paste(summaryText,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
851 "To choose",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
852 ifelse(nrow(modelInfo) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
853 "appropriate values of the tuning parameters,",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
854 "an appropriate value of the tuning parameter,"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
855 resampleName,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
856 "was used to generate a profile of performance across the",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
857 nrow(modelFit$$results),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
858 ifelse(nrow(modelInfo) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
859 "combinations of the tuning parameters.",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
860 "candidate values."),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
861
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
862 "Table \\\\ref{T:resamps} and Figure \\\\ref{F:profile} show",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
863 "summaries of the resampling profile. ", "The final model fitted to the entire training set was:",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
864 listString(paste(latexTranslate(tolower(paramNames)), "=", modelFit$$bestTune[1,]), period = TRUE))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
865
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
866 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
867 @
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
868
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
869 \Sexpr{summaryText}
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
870
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
871 <<resampTable, echo = $RESAMPTABLEECHO, results = $RESAMPTABLERESULT>>=
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
872 tableData <- modelFit$$results
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
873
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
874 if(all(modelInfo$$parameter == "parameter") && resampName == "boot632")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
875 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
876 tableData <- tableData[,-1, drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
877 colNums <- c( length(modelFit$$perfNames), length(modelFit$$perfNames), length(modelFit$$perfNames))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
878 colLabels <- c("Mean", "Standard Deviation","Apparent")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
879 constString <- ""
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
880 isConst <- NULL
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
881 }else if (all(modelInfo$$parameter == "parameter") && (resampName == "boot" | resampName == "cv" | resampName == "repeatedcv" )){
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
882 tableData <- tableData[,-1, drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
883 colNums <- c(length(modelFit$$perfNames), length(modelFit$$perfNames))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
884 colLabels <- c("Mean", "Standard Deviation")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
885 constString <- ""
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
886 isConst <- NULL
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
887 }else if (all(modelInfo$$parameter == "parameter") && resampName == "LOOCV" ){
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
888 tableData <- tableData[,-1, drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
889 colNums <- length(modelFit$$perfNames)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
890 colLabels <- c("Measures")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
891 constString <- ""
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
892 isConst <- NULL
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
893 }else if (all(modelInfo$$parameter != "parameter") && (resampName == "boot" | resampName == "cv" | resampName == "repeatedcv" )){
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
894 isConst <- apply(tableData[, modelInfo$$parameter, drop = FALSE],
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
895 2,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
896 function(x) length(unique(x)) == 1)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
897
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
898 numParamInTable <- sum(!isConst)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
899
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
900 if(any(isConst))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
901 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
902 constParam <- modelInfo$$parameter[isConst]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
903 constValues <- format(tableData[, constParam, drop = FALSE], digits = 4)[1,,drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
904 tableData <- tableData[, !(names(tableData) %in% constParam), drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
905 constString <- paste("The tuning",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
906 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
907 "parameters",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
908 "parameter"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
909 listString(paste("``", names(constValues), "''", sep = "")),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
910 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
911 "were",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
912 "was"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
913 "held constant at",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
914 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
916 "values of",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
917 "a value of"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
918 listString(constValues[1,]))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
919
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
920 } else constString <- ""
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
921
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
922 cn <- colnames(tableData)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
923 for(i in seq(along = cn))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
924 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
925 check <- modelInfo$$parameter %in% cn[i]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
926 if(any(check))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
927 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
928 cn[i] <- modelInfo$$label[which(check)]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
929 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
930 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
931 colnames(tableData) <- cn
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
932
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
933 colNums <- c(numParamInTable,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
934 length(modelFit$$perfNames),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
935 length(modelFit$$perfNames))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
936 colLabels <- c("", "Measures", "Standard Deviation")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
937
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
938 } else if (all(modelInfo$$parameter != "parameter") && (resampName == "LOOCV" )){
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
939 isConst <- apply(tableData[, modelInfo$$parameter, drop = FALSE],
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
940 2,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
941 function(x) length(unique(x)) == 1)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
942
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
943 numParamInTable <- sum(!isConst)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
944
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
945 if(any(isConst))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
946 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
947 constParam <- modelInfo$$parameter[isConst]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
948 constValues <- format(tableData[, constParam, drop = FALSE], digits = 4)[1,,drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
949 tableData <- tableData[, !(names(tableData) %in% constParam), drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
950 constString <- paste("The tuning",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
951 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
952 "parameters",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
953 "parameter"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
954 listString(paste("``", names(constValues), "''", sep = "")),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
955 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
956 "were",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
957 "was"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
958 "held constant at",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
959 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
961 "values of",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
962 "a value of"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
963 listString(constValues[1,]))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
964
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
965 } else constString <- ""
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
966
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
967 cn <- colnames(tableData)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
968 for(i in seq(along = cn))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
969 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
970 check <- modelInfo$$parameter %in% cn[i]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
971 if(any(check))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
972 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
973 cn[i] <- modelInfo$$label[which(check)]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
974 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
975 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
976 colnames(tableData) <- cn
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
977
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
978 colNums <- c(numParamInTable,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
979 length(modelFit$$perfNames))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
980 colLabels <- c("", "Measures" )
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
981
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
982
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
983 } else if (all(modelInfo$$parameter != "parameter") && (resampName == "boot632" )) {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
984 isConst <- apply(tableData[, modelInfo$$parameter, drop = FALSE],
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
985 2,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
986 function(x) length(unique(x)) == 1)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
987
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
988 numParamInTable <- sum(!isConst)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
989
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
990 if(any(isConst))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
991 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
992 constParam <- modelInfo$$parameter[isConst]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
993 constValues <- format(tableData[, constParam, drop = FALSE], digits = 4)[1,,drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
994 tableData <- tableData[, !(names(tableData) %in% constParam), drop = FALSE]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
995 constString <- paste("The tuning",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
996 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
997 "parameters",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
998 "parameter"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
999 listString(paste("``", names(constValues), "''", sep = "")),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1000 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1001 "were",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1002 "was"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1003 "held constant at",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1004 ifelse(sum(isConst) > 1,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1006 "values of",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1007 "a value of"),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1008 listString(constValues[1,]))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1009
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1010 } else constString <- ""
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1011
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1012 cn <- colnames(tableData)
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1013 for(i in seq(along = cn))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1014 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1015 check <- modelInfo$$parameter %in% cn[i]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1016 if(any(check))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1017 {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1018 cn[i] <- modelInfo$$label[which(check)]
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1019 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1020 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1021 colnames(tableData) <- cn
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1022
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1023 colNums <- c(numParamInTable,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1024 length(modelFit$$perfNames),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1025 length(modelFit$$perfNames),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1026 length(modelFit$$perfNames))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1027 colLabels <- c("", "Mean", "Standard Deviation", "Apparent")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1028 } else {
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1029 constString <- "The selected resampling method is not supported by this template."
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1030 }
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1031
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1032
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1033
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1034
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1035 colnames(tableData) <- gsub("SD$$", "", colnames(tableData))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1036 colnames(tableData) <- gsub("Apparent$$", "", colnames(tableData))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1037 colnames(tableData) <- latexTranslate(colnames(tableData))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1038 rownames(tableData) <- latexTranslate(rownames(tableData))
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1039
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1040 latex(tableData,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1041 rowname = NULL,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1042 file = "",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1043 cgroup = colLabels,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1044 n.cgroup = colNums,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1045 where = "h!",
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1046 digits = 4,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1047 longtable = nrow(tableData) > 30,
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1048 caption = paste(resampleName, "results from the model fit.", constString),
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1049 label = "T:resamps")
ee374e48024f Uploaded
deepakjadmin
parents:
diff changeset
1050 @

\setkeys{Gin}{ width = 0.9\textwidth}
\begin{figure}[b]
\begin{center}

<<profilePlot, echo = $PROFILEPLOTECHO, fig = $PROFILEPLOTFIG, width = 8, height = 6>>=
trellis.par.set(caretTheme(), warn = TRUE)
if(all(modelInfo$$parameter == "parameter") | all(isConst) | modName == "nb")
{
  resultsPlot <- resampleHist(modelFit)
  plotCaption <- paste("Distributions of model performance from the ",
                       "training set estimated using ",
                       resampleName)
} else {
  if(modName %in% c("svmPoly", "svmRadial", "svmLinear"))
  {
    resultsPlot <- plot(modelFit,
                        metric = optMetric,
                        xTrans = function(x) log10(x))
    resultsPlot <- update(resultsPlot,
                          type = c("g", "p", "l"),
                          ylab = paste(optMetric, " (", resampleName, ")", sep = ""))

  } else {
    resultsPlot <- plot(modelFit,
                        metric = optMetric)
    resultsPlot <- update(resultsPlot,
                          type = c("g", "p", "l"),
                          ylab = paste(optMetric, " (", resampleName, ")", sep = ""))
  }
  plotCaption <- paste("A plot of the estimates of the",
                       optMetric,
                       "values calculated using",
                       resampleName)
}
print(resultsPlot)
@
\caption[Performance Plot]{\Sexpr{plotCaption}.}
\label{F:profile}
\end{center}
\end{figure}


<<stopWorkers, echo = $STOPWORKERSECHO, results = $STOPWORKERSRESULT>>=
if(numWorkers > 1) stopCluster(cl)
@

<<testPred, results = $TESTPREDRESULT, echo = $TESTPREDECHO>>=
if(pctTrain < 1)
{
  cat("\\clearpage\n\\section*{Test Set Results}\n\n")
  classPred <- predict(modelFit, testX)
  cm <- confusionMatrix(classPred, testY)
  values <- cm$$overall[c("Accuracy", "Kappa", "AccuracyPValue", "McnemarPValue")]

  values <- values[!is.na(values) & !is.nan(values)]
  values <- c(format(values[1:2], digits = 3),
              format.pval(values[-(1:2)], digits = 5))
  nms <- c("the overall accuracy", "the Kappa statistic",
           "the $$p$$--value that accuracy is greater than the no--information rate",
           "the $$p$$--value of concordance from McNemar's test")
  nms <- nms[seq(along = values)]
  names(values) <- nms

  if(any(modelInfo$$probModel))
  {
    classProbs <- extractProb(list(fit = modelFit),
                              testX = testX,
                              testY = testY)
    classProbs <- subset(classProbs, dataType == "Test")
    if(numClasses == 2)
    {
      tmp <- twoClassSummary(classProbs, lev = levels(classProbs$$obs))
      ## select by name so the labels below do not depend on the order
      ## in which twoClassSummary returns its statistics
      tmp <- tmp[c("Sens", "Spec", "ROC")]
      tmp <- c(format(tmp, digits = 3))
      names(tmp) <- c("the sensitivity", "the specificity",
                      "the area under the ROC curve")
      values <- c(values, tmp)
    }
    probPlot <- plotClassProbs(classProbs)
  }
  testString <- paste("Based on the test set of",
                      nrow(testX),
                      "samples,",
                      listString(paste(names(values), "was", values), period = TRUE),
                      "The confusion matrix for the test set is shown in Table",
                      "\\\\ref{T:cm}.")
  testString <- paste(testString,
                      " Using ", resampleName,
                      ", the training set estimates were ",
                      resampleStats(modelFit),
                      ".",
                      sep = "")

  ## paste0 keeps the closing period attached to the figure reference
  if(any(modelInfo$$probModel)) testString <- paste(testString,
                                                    "Histograms of the class probabilities",
                                                    "for the test set samples are shown in",
                                                    paste0("Figure \\\\ref{F:probs}",
                                                           ifelse(numClasses == 2,
                                                                  " and the test set ROC curve is in Figure \\\\ref{F:roc}.",
                                                                  ".")))

  latex(cm$$table,
        title = "",
        file = "",
        where = "h",
        cgroup = "Observed Values",
        n.cgroup = numClasses,
        caption = "The confusion matrix for the test set",
        label = "T:cm")

} else testString <- ""
@
\Sexpr{testString}


<<classProbsTex, results = $CLASSPROBSTEXRESULT, echo = $CLASSPROBSTEXECHO>>=
if(any(modelInfo$$probModel))
{
  cat(
      paste("\\begin{figure}[p]\n",
            "\\begin{center}\n",
            "\\includegraphics{classProbs}",
            "\\caption[Class Probabilities]{Class probabilities",
            "for the test set. Each panel contains ",
            "separate classes}\n",
            "\\label{F:probs}\n",
            "\\end{center}\n",
            "\\end{figure}"))
}
if(any(modelInfo$$probModel) & numClasses == 2)
{
  cat(
      paste("\\begin{figure}[p]\n",
            "\\begin{center}\n",
            "\\includegraphics[clip, width = .8\\textwidth]{roc}",
            "\\caption[ROC Plot]{ROC Curve",
            "for the test set.}\n",
            "\\label{F:roc}\n",
            "\\end{center}\n",
            "\\end{figure}"))
}
@
<<classProbsTex, results = $CLASSPROBSTEXRESULT1, echo = $CLASSPROBSTEXECHO1 >>=
if(any(modelInfo$$probModel))
{
  pdf("classProbs.pdf", height = 7, width = 7)
  trellis.par.set(caretTheme(), warn = FALSE)
  print(probPlot)
  dev.off()
}

if(any(modelInfo$$probModel) & numClasses == 2)
{
  resPonse <- testY
  preDictor <- classProbs[, levels(trainY)[1]]
  pdf("roc.pdf", height = 8, width = 8)
  # from pROC example at http://web.expasy.org/pROC/screenshots.htm
  plot.roc(resPonse, preDictor, # data
           percent=TRUE, # show all values in percent
           partial.auc=c(100, 90), partial.auc.correct=TRUE, # define a partial AUC (pAUC)
           print.auc=TRUE, # display pAUC value on the plot with following options:
           print.auc.pattern="Corrected pAUC (100-90%% SP):\n%.1f%%", print.auc.col="#1c61b6",
           auc.polygon=TRUE, auc.polygon.col="#1c61b6", # show pAUC as a polygon
           max.auc.polygon=TRUE, max.auc.polygon.col="#1c61b622", # also show the 100% polygon
           main="Partial AUC (pAUC)")
  plot.roc(resPonse, preDictor,
           percent=TRUE, add=TRUE, type="n", # add to plot, but don't re-add the ROC itself (useless)
           partial.auc=c(100, 90), partial.auc.correct=TRUE,
           partial.auc.focus="se", # focus pAUC on the sensitivity
           print.auc=TRUE, print.auc.pattern="Corrected pAUC (100-90%% SE):\n%.1f%%", print.auc.col="#008600",
           print.auc.y=40, # do not print auc over the previous one
           auc.polygon=TRUE, auc.polygon.col="#008600",
           max.auc.polygon=TRUE, max.auc.polygon.col="#00860022")
  dev.off()
}
@

\section*{Versions}

<<versions, echo = FALSE, results = tex>>=
toLatex(sessionInfo())

@

<<save-data, echo = $SAVEDATAECHO, results = $SAVEDATARESULT>>=
## change this to the name of modName....
Fit <- modelFit
save(Fit, file = "$METHOD-Fit.RData")
@
The model was built using $METHOD and is saved as $METHOD-Fit.RData for reuse. The file contains the variable Fit.

\end{document}'''

    return template4Rnw
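
The `$NAME` placeholders and the doubled `$$` in the template above look like Python `string.Template` syntax, where `$$` yields a literal `$` for R's `modelInfo$parameter`. How the tool actually fills the template is not shown in this file, so the following is only a minimal sketch under that assumption. The `fragment` string is a short stand-in built from lines of the template, and the values in `options` are hypothetical.

```python
from string import Template

# Short stand-in for the full template; the chunk header, the "$$" escape
# and the save() call are copied from the template text above.
fragment = r'''<<stopWorkers, echo = $STOPWORKERSECHO, results = $STOPWORKERSRESULT>>=
if(numWorkers > 1) stopCluster(cl)
@
values <- cm$$overall["Accuracy"]
save(Fit, file = "$METHOD-Fit.RData")
'''

# Hypothetical placeholder values; the real caller may pass different ones.
options = {
    "STOPWORKERSECHO": "FALSE",
    "STOPWORKERSRESULT": "hide",
    "METHOD": "rf",
}

# substitute() raises KeyError if any placeholder has no value, which makes
# a missing chunk option fail loudly instead of reaching Sweave unfilled.
filled = Template(fragment).substitute(options)
print(filled)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a leftover `$NAME` in a chunk header would otherwise only surface later as an Sweave error.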