This package allows the user to assess classifiers.
It contains the class ClassifierAssessment,
which is the super-class of all implemented assessment
methodologies. It should also be used as the super-class
of all future assessments, since this class already
implements basic patterns such as:
- handling of given classifiers
- handling of given models and construction of classifiers using those models
- management of temporary results (in general, an assessment is a repeated
procedure producing several temporary results that are summarized in terms of
means, standard deviations, or standard errors)
- construction of a summary (mean and standard error of the temporary results)
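The summary construction mentioned above can be sketched as follows. This is a hypothetical illustration of the statistics involved, not the API of the package; the class and method names are invented for this example:

```java
// Hypothetical sketch: summarizing the temporary results of a repeated
// assessment by their mean and standard error. Not part of the package API.
public class SummarySketch {

    // Mean of the temporary results of all repetitions.
    static double mean(double[] results) {
        double sum = 0;
        for (double r : results) sum += r;
        return sum / results.length;
    }

    // Standard error of the mean: sample standard deviation / sqrt(n).
    static double standardError(double[] results) {
        double m = mean(results);
        double ss = 0;
        for (double r : results) ss += (r - m) * (r - m);
        double sd = Math.sqrt(ss / (results.length - 1));
        return sd / Math.sqrt(results.length);
    }

    public static void main(String[] args) {
        // e.g. classification accuracies from five repetitions
        double[] accuracies = {0.90, 0.85, 0.95, 0.80, 0.90};
        System.out.println("mean = " + mean(accuracies));
        System.out.println("standard error = " + standardError(accuracies));
    }
}
```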
Furthermore, the package contains four implementations of
classifier assessments. These are:
- Repeated Hold-Out Experiment
- Sampled Repeated Hold-Out Experiment
- K-Fold Cross-Validation
- Repeated Subsampling Experiment
A RepeatedHoldOutExperiment
implements the following procedure.
For given data sets, it randomly partitions each data set into
a mutually exclusive train data set and test data set. It then uses these
data sets to first train the classifiers and afterwards assess their performance
in correctly predicting the elements of the test data sets. This step is repeated
as often as the user specifies.
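One repetition of such a hold-out split can be sketched as follows. This is a hypothetical illustration of the partitioning step, not the API of the package; it works on element indices and the names are invented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: one repetition of a repeated hold-out experiment,
// randomly partitioning n element indices into mutually exclusive train
// and test parts. Not part of the package API.
public class HoldOutSketch {

    // Returns {trainIndices, testIndices}; the two parts are disjoint
    // and together cover all n indices.
    static List<List<Integer>> holdOutSplit(int n, double testFraction, Random rnd) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, rnd);
        int testSize = (int) Math.round(n * testFraction);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(testSize, n))); // train part
        parts.add(new ArrayList<>(indices.subList(0, testSize))); // test part
        return parts;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Each repetition would train the classifiers on the train part
        // and assess them on the test part.
        for (int rep = 0; rep < 3; rep++) {
            List<List<Integer>> parts = holdOutSplit(10, 0.3, rnd);
            System.out.println("train=" + parts.get(0) + " test=" + parts.get(1));
        }
    }
}
```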
A Sampled_RepeatedHoldOutExperiment
is a special ClassifierAssessment
that partitions the data set of a user-specified reference class and, for all
other classes, samples non-overlapping data sets, so that one gets the same
number of sequences (and the same lengths of the sequences) in each train and
test data set.
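The sampling step for a non-reference class can be sketched as follows. This is a hypothetical illustration under the assumption that, after the reference class has been partitioned, the other classes contribute disjoint train and test sets of exactly the reference sizes; it is not the API of the package, and it ignores sequence lengths for brevity:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: sample disjoint train and test index sets from a
// non-reference class, matching the sizes obtained for the reference class.
// Not part of the package API; sequence lengths are ignored here.
public class SampledHoldOutSketch {

    // Returns {trainIndices, testIndices} for a class with n elements;
    // the parts are disjoint and have exactly the requested sizes.
    static List<List<Integer>> sampleLikeReference(int n, int trainSize, int testSize, Random rnd) {
        if (trainSize + testSize > n) {
            throw new IllegalArgumentException("class too small for requested sizes");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, rnd);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, trainSize)));                    // train part
        parts.add(new ArrayList<>(indices.subList(trainSize, trainSize + testSize))); // test part
        return parts;
    }

    public static void main(String[] args) {
        // Suppose the reference class was split into 70 train and 30 test
        // sequences; sample matching, non-overlapping sets from a class of 200.
        List<List<Integer>> parts = sampleLikeReference(200, 70, 30, new Random(42));
        System.out.println("train size = " + parts.get(0).size()
                + ", test size = " + parts.get(1).size());
    }
}
```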
A KFoldCrossValidation
implements a k-fold cross-validation. That is,
the given data is randomly partitioned into k mutually exclusive parts.
Each of these parts is used once as the test data set, while the remaining k-1
parts are used as train data sets. In each of the k steps, the classifiers
are trained on the train data sets and their performance in correctly predicting
the elements of the test data set is assessed.
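The fold construction can be sketched as follows. Again a hypothetical illustration on element indices, not the API of the package:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: partition n element indices into k mutually
// exclusive folds; each fold serves once as the test set, the union of
// the remaining k-1 folds as the train set. Not part of the package API.
public class KFoldSketch {

    static List<List<Integer>> kFolds(int n, int k, Random rnd) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, rnd);
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        // Deal the shuffled indices round-robin into the k folds.
        for (int i = 0; i < n; i++) folds.get(i % k).add(indices.get(i));
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = kFolds(10, 3, new Random(42));
        for (int f = 0; f < folds.size(); f++) {
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < folds.size(); g++) {
                if (g != f) train.addAll(folds.get(g));
            }
            // Step f: train on 'train', assess on fold f.
            System.out.println("step " + f + ": test=" + folds.get(f) + " train=" + train);
        }
    }
}
```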
A RepeatedSubSamplingExperiment
subsamples, in each step,
a train data set and a test data set from the given data. These data sets
may overlap. The classifiers are then trained on the
train data sets, and their performance in predicting the elements of the
test data sets is assessed. This procedure is repeated as often as the user specifies.
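The subsampling step can be sketched as follows. A hypothetical illustration, not the API of the package: train and test indices are drawn independently from the same data, so unlike in a hold-out experiment the two sets may overlap:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: one step of a repeated subsampling experiment.
// Train and test index sets are drawn independently from the same n
// elements, so they may overlap. Not part of the package API.
public class SubSamplingSketch {

    // Draws 'size' distinct indices uniformly at random from 0..n-1.
    static List<Integer> subsample(int n, int size, Random rnd) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, rnd);
        return new ArrayList<>(indices.subList(0, size));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Independent draws: the test set is NOT the complement of the
        // train set, so the two may share elements.
        List<Integer> train = subsample(100, 60, rnd);
        List<Integer> test = subsample(100, 20, rnd);
        System.out.println("train size = " + train.size() + ", test size = " + test.size());
    }
}
```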
In addition, all classes allow the user to assess classifiers using a set of
user-specified test data sets and a set of user-specified train data sets.
In this way, the user can employ test and train data sets
that are not generated automatically.