Mercurial > repos > bgruening > eden_toolbox
view EDeN_train.xml @ 7:59b3b6ce10bb draft
Uploaded
author | bgruening |
---|---|
date | Tue, 29 Oct 2013 11:07:49 -0400 |
parents | a3edc97e056c |
children | 9262f801d739 |
line wrap: on
line source
<tool id="bg_eden_train" name="EDeN Train" version="0.1"> <description></description> <macros> <import>eden_macros.xml</import> </macros> <expand macro="requirements" /> <command> tmp_dir=`mktemp -d -u`; EDeN --action TRAIN --input_data_file_name $infile --file_type "SPARSE_VECTOR" --binary_file_type ##--output_directory_path \$tmp_dir ## TODO: we need a tool that creates such a file, maybe from the metadata of an SDF file ## target_file_name is a file with 1 or -1 one in each row, indicating the class --target_file_name $target_infile --model_file_name $model_outfile --lambda $lambda ##??? notation? --epochs $epoch --sparsification_num_iterations $sparsification_num_iterations --topological_regularization_num_neighbors $topological_regularization_num_neighbors --topological_regularization_decay_rate $topological_regularization_decay_rate --num_iterations $num_iterations --threshold $threshold --only_positive $only_positive --only_negative $only_negative --random_seed $random_seed </command> <inputs> <param format="eden_sparse_vector" name="infile" type="data" label="Input Graph" help=""/> <param format="txt" name="target_infile" type="data" label="Target file" help="indicates with -1 and 1 the class"/> <param name="epoch" type="integer" value="10" label="Epoch, Stochastic gradient descend algorithm." help=""> <validator type="in_range" min="1" /> </param> <param name="lambda" type="text" value="1e-4" label="lambda, Stochastic gradient descend algorithm." help="" /> <!-- Semi-supervised-settings --> <param name="threshold" type="float" value="1.0" label="Top and low quantile" help="Only the top and low quantile will be used as positives and negative instances. A threshold of 1 means that all unsupervised instaces are used in the next phase."> <validator type="in_range" min="0.0" /> </param> <param name="num_iterations" type="integer" value="3" label="Number of iterations" /> <param name="only_negative" type="boolean" label="Induce only negative class instances." truevalue="--only_negative" falsevalue="" checked="false" /> <param name="only_positive" type="boolean" label="Induce only positive class instances." truevalue="--only_positive" falsevalue="" checked="false" /> <param name="topological_regularization_decay_rate" type="float" value="0.01" label="Topological regularization decay rate"> <validator type="in_range" min="0.0" /> </param> <param name="topological_regularization_num_neighbors" type="integer" value="0" label="Topological regularization number of neighbors"> <validator type="in_range" min="0" /> </param> <param name="sparsification_num_iterations" type="integer" value="0" label="Sparsification number of iterations"> <validator type="in_range" min="0" /> </param> <param name="random_seed" type="integer" value="1" label="Random Seed" help="" /> </inputs> <outputs> <data format="txt" name="model_outfile" label="Train Model from ${on_string}"/> </outputs> <tests> <test> <param name="infile" value="3_molceuls.sdf" /> <output name="outfile" file="3_molecules.gspan" /> </test> </tests> <help> .. class:: infomark **What it does** The linear model is induced using the accelerated stochastic gradient descent technique by Léon Bottou and Yann LeCun. When the target information is 0, a self-training algorithm is used to impute a positive or negative class to the unsupervised instances. If the target information is imbalanced a minority class resampling technique is used to rebalance the training set. @references@ The code for Stochastic Gradient Descent SVM is adapted from http://leon.bottou.org/projects/sgd. Léon Bottou and Yann LeCun, ''Large Scale Online Learning'', Advances in Neural Information Processing Systems 16, Edited by Sebastian Thrun, Lawrence Saul and Bernhard Schölkopf, MIT Press, Cambridge, MA, 2004. </help> </tool>