Mercurial > repos > bgruening > upload_testing

<tool id="LinearRegression1" name="Perform Linear Regression" version="1.1.0">
  <description> </description>
  <expand macro="requirements" />
    <macros>
        <import>statistic_tools_macros.xml</import>
    </macros>
  <command interpreter="python">
    linear_regression.py
      $input1
      $response_col
      $predictor_cols
      $out_file1
      $out_file2
      1>/dev/null
  </command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
    <param name="response_col" label="Response column (Y)" type="data_column" data_ref="input1" numerical="True"/>
    <param name="predictor_cols" label="Predictor columns (X)" type="data_column" data_ref="input1" numerical="True" multiple="true" >
        <validator type="no_options" message="Please select at least one column."/>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input1" />
    <data format="pdf" name="out_file2" />
  </outputs>
  <tests>
    <test>
        <param name="input1" value="regr_inp.tabular"/>
        <param name="response_col" value="3"/>
        <param name="predictor_cols" value="1,2"/>
        <output name="out_file1" file="regr_out.tabular"/>
        <output name="out_file2" file="regr_out.pdf"/>
    </test>
  </tests>
  <help>


.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Edit Datasets-&gt;Convert characters*

-----

.. class:: infomark

**What it does**

This tool uses the 'lm' function from R statistical package to perform linear regression on the input data. It outputs two files, one containing the summary statistics of the performed regression, and the other containing diagnostic plots to check whether model assumptions are satisfied.

*R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.*

-----

.. class:: warningmark

**Note**

- This tool currently treats all predictor and response variables as continuous numeric variables. Running the tool on categorical variables might result in incorrect results.

- Rows containing non-numeric (or missing) data in any of the chosen columns will be skipped from the analysis.

- The summary statistics in the output are described below:

  - sigma: the square root of the estimated variance of the random error (standard error of the residiuals)
  - R-squared: the fraction of variance explained by the model
  - Adjusted R-squared: the above R-squared statistic adjusted, penalizing for the number of the predictors (p)
  - p-value: p-value for the t-test of the null hypothesis that the corresponding slope is equal to zero against the two-sided alternative.


  </help>
</tool>
author	bernhardlutz
date	Thu, 23 Jan 2014 14:53:46 -0500
parents	c4a3a8999945
children