annotate README.rst @ 13:ebd3bd2f2985 draft default tip

planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 80417bf0158a9b596e485dd66408f738f405145a
author bgruening
date Mon, 02 Oct 2023 08:46:12 +0000
parents ac8bef635fcb
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
1 Galaxy wrapper for scikit-learn library
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
2 ***************************************
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
3
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
4 Contents
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
5 ========
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
6
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
7 - `What is scikit-learn?`_
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
8 - `Scikit-learn main package groups`_
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
9 - `Tools offered by this wrapper`_
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
10
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
11 - `Machine learning workflows`_
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
12 - `Supervised learning workflows`_
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
13 - `Unsupervised learning workflows`_
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
14
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
15
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
16 ____________________________
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
17
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
18
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
19 .. _What is scikit-learn?:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
20
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
21 What is scikit-learn?
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
22 =====================
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
23
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
24 Scikit-learn is an open-source machine learning library for the Python programming language. It offers various algorithms for performing supervised and unsupervised learning as well as data preprocessing and transformation, model selection and evaluation, and dataset utilities. It is built upon SciPy (Scientific Python) library.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
25
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
26 Scikit-learn source code can be accessed at https://github.com/scikit-learn/scikit-learn.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
27 Detailed installation instructions can be found at http://scikit-learn.org/stable/install.html
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
28
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
29
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
30 .. _Scikit-learn main package groups:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
31
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
32 Scikit-learn main package groups
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
33 ================================
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
34
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
35 Scikit-learn provides the users with several main groups of related operations.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
36 These are:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
37
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
38 - Classification
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
39 - Identifying to which category an object belongs.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
40 - Regression
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
41 - Predicting a continuous-valued attribute associated with an object.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
42 - Clustering
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
43 - Automatic grouping of similar objects into sets.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
44 - Preprocessing
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
45 - Feature extraction and normalization.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
46 - Model selection and evaluation
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
47 - Comparing, validating and choosing parameters and models.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
48 - Dimensionality reduction
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
49 - Reducing the number of random variables to consider.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
50
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
51 Each group consists of a number of well-known algorithms from the category. For example, one can find hierarchical, spectral, kmeans, and other clustering methods in sklearn.cluster package.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
52
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
53
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
54 .. _Tools offered by this wrapper:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
55
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
56 Available tools in the current wrapper
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
57 ======================================
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
58
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
59 The current release of the wrapper offers a subset of the packages from scikit-learn library. You can find:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
60
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
61 - A subset of classification metric functions
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
62 - Linear and quadratic discriminant classifiers
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
63 - Random forest and Ada boost classifiers and regressors
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
64 - All the clustering methods
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
65 - All support vector machine classifiers
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
66 - A subset of data preprocessing estimator classes
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
67 - Pairwise metric measurement functions
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
68
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
69 In addition, several tools for performing matrix operations, generating problem-specific datasets, and encoding text and extracting features have been prepared to help the user with more advanced operations.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
70
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
71 .. _Machine learning workflows:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
72
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
73 Machine learning workflows
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
74 ==========================
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
75
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
76 Machine learning is about processes. No matter what machine learning algorithm we use, we can apply typical workflows and dataflows to produce more robust models and better predictions.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
77 Here we discuss supervised and unsupervised learning workflows.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
78
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
79 .. _Supervised learning workflows:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
80
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
81 Supervised machine learning workflows
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
82 =====================================
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
83
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
84 **What is supervised learning?**
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
85
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
86 In this machine learning task, given sample data which are labeled, the aim is to build a model which can predict the labels for new observations.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
87 In practice, there are five steps which we can go through to start from raw input data and end up getting reasonable predictions for new samples:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
88
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
89 1. Preprocess the data::
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
90
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
91 * Change the collected data into the proper format and datatype.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
92 * Adjust the data quality by filling the missing values, performing
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
93 required scaling and normalizations, etc.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
94 * Extract features which are the most meaningfull for the learning task.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
95 * Split the ready dataset into training and test samples.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
96
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
97 2. Choose an algorithm::
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
98
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
99 * These factors help one to choose a learning algorithm:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
100 - Nature of the data (e.g. linear vs. nonlinear data)
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
101 - Structure of the predicted output (e.g. binary vs. multilabel classification)
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
102 - Memory and time usage of the training
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
103 - Predictive accuracy on new data
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
104 - Interpretability of the predictions
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
105
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
106 3. Choose a validation method
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
107
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
108 Every machine learning model should be evaluated before being put into practicical use.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
109 There are numerous performance metrics to evaluate machine learning models.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
110 For supervised learning, usually classification or regression metrics are used.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
111
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
112 A validation method helps to evaluate the performance metrics of a trained model in order
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
113 to optimize its performance or ultimately switch to a more efficient model.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
114 Cross-validation is a known validation method.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
115
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
116 4. Fit a model
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
117
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
118 Given the learning algorithm, validation method, and performance metric(s)
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
119 repeat the following steps::
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
120
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
121 * Train the model.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
122 * Evaluate based on metrics.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
123 * Optimize unitl satisfied.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
124
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
125 5. Use fitted model for prediction::
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
126
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
127 This is a final evaluation in which, the optimized model is used to make predictions
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
128 on unseen (here test) samples. After this, the model is put into production.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
129
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
130 .. _Unsupervised learning workflows:
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
131
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
132 Unsupervised machine learning workflows
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
133 =======================================
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
134
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
135 **What is unsupervised learning?**
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
136
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
137 Unlike supervised learning and more liklely in real life, here the initial data is not labeled.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
138 The task is to extract the structure from the data and group the samples based on their similarities.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
139 Clustering and dimensionality reduction are two famous examples of unsupervised learning tasks.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
140
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
141 In this case, the workflow is as follows::
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
142
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
143 * Preprocess the data (without splitting to train and test).
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
144 * Train a model.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
145 * Evaluate and tune parameters.
ac8bef635fcb planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit 60f0fbc0eafd7c11bc60fb6c77f2937782efd8a9-dirty
bgruening
parents:
diff changeset
146 * Analyse the model and test on real data.