view statistical_hypothesis_testing.xml @ 0:a3d8cadaf060 draft default tip
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/statistics commit 7c5002672919ca1e5eacacb835a4ce66ffa19656
author: bgruening
date:   Mon, 21 Nov 2022 18:07:45 +0000
<tool id="bg_statistical_hypothesis_testing" name="Statistical hypothesis testing" version="0.3"> <description>computes several descriptive statistics</description> <macros> <import>macros.xml</import> </macros> <expand macro="requirements" /> <command detect_errors="exit_code"><![CDATA[ python '$__tool_directory__/statistical_hypothesis_testing.py' --infile '${infile}' --outfile '${outfile}' --test_id '${test_methods.test_methods_opts}' #if str($test_methods.test_methods_opts) == "describe" or str($test_methods.test_methods_opts) == "mode" or str($test_methods.test_methods_opts) == "normaltest" or str($test_methods.test_methods_opts) == "kurtosistest" or str($test_methods.test_methods_opts) == "skewtest" or str($test_methods.test_methods_opts) == "nanmean" or str($test_methods.test_methods_opts) == "nanmedian" or str($test_methods.test_methods_opts) == "variation" or str($test_methods.test_methods_opts) == "itemfreq" or str($test_methods.test_methods_opts) == "tiecorrect": --sample_one_cols '${test_methods.sample_one_cols}' #elif str($test_methods.test_methods_opts) == "gmean" or str($test_methods.test_methods_opts) == "hmean": --sample_one_cols '${test_methods.sample_one_cols}' --dtype "${test_methods.dtype}" #elif str($test_methods.test_methods_opts) == "anderson": --sample_one_cols "${test_methods.sample_one_cols}" --dist "${test_methods.dist}" #elif str($test_methods.test_methods_opts) == "binom_test": --sample_one_cols "${test_methods.sample_one_cols}" --n "${test_methods.n}" --p "${test_methods.p}" #elif str($test_methods.test_methods_opts) == "kurtosis": --sample_one_cols "${test_methods.sample_one_cols}" --axis "${test_methods.axis}" $test_methods.fisher $test_methods.bias #elif
str($test_methods.test_methods_opts) == "moment": --sample_one_cols "${test_methods.sample_one_cols}" --n "${test_methods.n}" #elif str($test_methods.test_methods_opts) == "bayes_mvs": --sample_one_cols "${test_methods.sample_one_cols}" --alpha "${test_methods.alpha}" #elif str($test_methods.test_methods_opts) == "percentileofscore": --sample_one_cols "${test_methods.sample_one_cols}" --score "${test_methods.score}" --kind "${test_methods.kind}" #elif str($test_methods.test_methods_opts) == "sigmaclip": --sample_one_cols "${test_methods.sample_one_cols}" --n "${test_methods.n}" --m "${test_methods.m}" #elif str($test_methods.test_methods_opts) == "chi2_contingency": --sample_one_cols "${test_methods.sample_one_cols}" $test_methods.correction #if str($test_methods.lambda_).strip(): --lambda_ "${test_methods.lambda_}" #end if #elif str($test_methods.test_methods_opts) == "skew" or str($test_methods.test_methods_opts) == "nanstd" : --sample_one_cols "${test_methods.sample_one_cols}" $test_methods.bias #elif str($test_methods.test_methods_opts) == "rankdata": --sample_one_cols "${test_methods.sample_one_cols}" --md "${test_methods.md}" #elif str($test_methods.test_methods_opts) == "sem" or str($test_methods.test_methods_opts) == "zscore" or str($test_methods.test_methods_opts) == "signaltonoise": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.ddof).strip(): --ddof "${test_methods.ddof}" #end if #elif str($test_methods.test_methods_opts) == "trimboth": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.proportiontocut).strip(): --proportiontocut "${test_methods.proportiontocut}" #end if #elif str($test_methods.test_methods_opts) == "trim1": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.proportiontocut).strip(): --proportiontocut "${test_methods.proportiontocut}" #end if --tail "${test_methods.tail}" #elif str($test_methods.test_methods_opts) == "boxcox": --sample_one_cols 
"${test_methods.sample_one_cols}" --alpha "${test_methods.alpha}" #if str($test_methods.imbda).strip(): --imbda "${test_methods.imbda}" #end if #elif str($test_methods.test_methods_opts) == "boxcox_llf": --sample_one_cols "${test_methods.sample_one_cols}" --imbda "${test_methods.imbda}" #elif str($test_methods.test_methods_opts) == "kstest": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.ni).strip(): --ni "${test_methods.ni}" #end if --cdf '${test_methods.cdf}' --alternative '${test_methods.alternative}' --mode '${test_methods.mode}' #elif str($test_methods.test_methods_opts) == "boxcox_normmax": --sample_one_cols '${test_methods.sample_one_cols}' #if str($test_methods.mf).strip(): --mf '${test_methods.mf}' #end if #if str($test_methods.nf).strip(): --nf '${test_methods.nf}' #end if --method '${test_methods.method}' #elif str($test_methods.test_methods_opts) == "tmean" or str($test_methods.test_methods_opts) == "tvar" or str($test_methods.test_methods_opts) == "tstd" or str($test_methods.test_methods_opts) == "tsem": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.mf).strip(): --mf '${test_methods.mf}' #end if #if str($test_methods.nf).strip(): --nf '${test_methods.nf}' #end if $test_methods.inclusive1 $test_methods.inclusive2 #elif str($test_methods.test_methods_opts) == "tmin": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.mf).strip(): --mf "${test_methods.mf}" #end if $test_methods.inclusive #elif str($test_methods.test_methods_opts) == "tmax": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.nf).strip(): --nf "${test_methods.nf}" #end if $test_methods.inclusive #elif str($test_methods.test_methods_opts) == "histogram": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.mf).strip(): --mf "${test_methods.mf}" #end if #if str($test_methods.nf).strip(): --nf "${test_methods.nf}" #end if --b "${test_methods.b}" $test_methods.printextras 
#elif str($test_methods.test_methods_opts) == "cumfreq": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.mf).strip(): --mf "${test_methods.mf}" #end if #if str($test_methods.nf).strip(): --nf "${test_methods.nf}" #end if --b "${test_methods.b}" #elif str($test_methods.test_methods_opts) == "threshold": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.mf).strip(): --mf "${test_methods.mf}" #end if #if str($test_methods.nf).strip(): --nf "${test_methods.nf}" #end if --new "${test_methods.new}" #elif str($test_methods.test_methods_opts) == "relfreq": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.mf).strip(): --mf "${test_methods.mf}" #end if #if str($test_methods.nf).strip(): --nf "${test_methods.nf}" #end if --b "${test_methods.b}" #elif str($test_methods.test_methods_opts) == "spearmanr": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.sample_two_cols).strip(): --sample_two_cols "${test_methods.sample_two_cols}" #end if #elif str($test_methods.test_methods_opts) == "theilslopes": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.sample_two_cols).strip(): --sample_two_cols "${test_methods.sample_two_cols}" #end if --alpha "${test_methods.alpha}" #elif str($test_methods.test_methods_opts) == "chisquare": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.sample_two_cols).strip(): --sample_two_cols "${test_methods.sample_two_cols}" #end if #if str($test_methods.ddof).strip(): --ddof "${test_methods.ddof}" #end if #elif str($test_methods.test_methods_opts) == "power_divergence": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.sample_two_cols).strip(): --sample_two_cols "${test_methods.sample_two_cols}" #end if #if str($test_methods.ddof).strip(): --ddof "${test_methods.ddof}" #end if #if str($test_methods.lambda_).strip(): --lambda_ "${test_methods.lambda_}" #end if #elif 
str($test_methods.test_methods_opts) == "combine_pvalues": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.sample_two_cols).strip() and $test_methods.sample_two_cols: --sample_two_cols "${test_methods.sample_two_cols}" #end if --med "${test_methods.med}" #elif str($test_methods.test_methods_opts) == "wilcoxon": --sample_one_cols "${test_methods.sample_one_cols}" #if str($test_methods.sample_two_cols).strip() and $test_methods.sample_two_cols: --sample_two_cols "${test_methods.sample_two_cols}" #end if --zero_method "${test_methods.zero_method}" $test_methods.correction #elif str($test_methods.test_methods_opts) == "ranksums" or str($test_methods.test_methods_opts) == "ansari" or str($test_methods.test_methods_opts) == "linregress" or str($test_methods.test_methods_opts) == "pearsonr" or str($test_methods.test_methods_opts) == "pointbiserialr" or str($test_methods.test_methods_opts) == "ks_2samp" or str($test_methods.test_methods_opts) == "ttest_1samp" or str($test_methods.test_methods_opts) == "histogram2": --sample_one_cols '${test_methods.sample_one_cols}' --sample_two_cols '${test_methods.sample_two_cols}' #elif str($test_methods.test_methods_opts) == "entropy": --sample_one_cols '${test_methods.sample_one_cols}' --sample_two_cols '${test_methods.sample_two_cols}' --base "${test_methods.base}" #elif str($test_methods.test_methods_opts) == "kendalltau": --sample_one_cols '${test_methods.sample_one_cols}' --sample_two_cols '${test_methods.sample_two_cols}' $test_methods.initial_lexsort #elif str($test_methods.test_methods_opts) == "mannwhitneyu": --sample_one_cols '${test_methods.sample_one_cols}' --sample_two_cols '${test_methods.sample_two_cols}' $test_methods.mwu_use_continuity #elif str($test_methods.test_methods_opts) == "ttest_ind":
--sample_one_cols '${test_methods.sample_one_cols}' --sample_two_cols '${test_methods.sample_two_cols}' $test_methods.equal_var #elif str($test_methods.test_methods_opts) == "ttest_rel": --sample_one_cols '${test_methods.sample_one_cols}' --sample_two_cols '${test_methods.sample_two_cols}' --axis "${test_methods.axis}" #elif str($test_methods.test_methods_opts) == "zmap": --sample_one_cols '${test_methods.sample_one_cols}' --sample_two_cols '${test_methods.sample_two_cols}' #if str($test_methods.ddof).strip(): --ddof '${test_methods.ddof}' #end if #elif str($test_methods.test_methods_opts) == "binned_statistic": --sample_one_cols '${test_methods.sample_one_cols}' --sample_two_cols '${test_methods.sample_two_cols}' #if str($test_methods.mf).strip(): --mf '${test_methods.mf}' #end if #if str($test_methods.nf).strip(): --nf '${test_methods.nf}' #end if --statistic '${test_methods.statistic}' --b '${test_methods.b}' #elif str($test_methods.test_methods_opts) == "scoreatpercentile": --sample_one_cols "${test_methods.sample_one_cols}" --sample_two_cols "${test_methods.sample_two_cols}" #if str($test_methods.mf).strip(): --mf '${test_methods.mf}' #end if #if str($test_methods.nf).strip(): --nf '${test_methods.nf}' #end if --interpolation '${test_methods.interpolation}' #elif str($test_methods.test_methods_opts) == "mood": --axis "${test_methods.axis}" --sample_one_cols "${test_methods.sample_one_cols}" --sample_two_cols "${test_methods.sample_two_cols}" #elif str($test_methods.test_methods_opts) == "shapiro": --sample_one_cols "${test_methods.sample_one_cols}" #elif str($test_methods.test_methods_opts) == "bartlett" or str($test_methods.test_methods_opts) == "f_oneway" or str($test_methods.test_methods_opts) == "kruskal" or str($test_methods.test_methods_opts) == "friedmanchisquare" or str($test_methods.test_methods_opts) == "obrientransform": --sample_cols "#echo ';'.join( [str($list.sample_cols) for $list in $test_methods.samples] )#" #elif 
str($test_methods.test_methods_opts) == "levene": --sample_cols "#echo ';'.join( [str($list.sample_cols) for $list in $test_methods.samples] )#" --center "${test_methods.center}" #if str($test_methods.proportiontocut).strip(): --proportiontocut "${test_methods.proportiontocut}" #end if #elif str($test_methods.test_methods_opts) == "fligner": --sample_cols "#echo ';'.join( [str($list.sample_cols) for $list in $test_methods.samples] )#" --center "${test_methods.center}" #if str($test_methods.proportiontocut).strip(): --proportiontocut "${test_methods.proportiontocut}" #end if #elif str($test_methods.test_methods_opts) == "median_test": --sample_cols "#echo ';'.join( [str($list.sample_cols) for $list in $test_methods.samples] )#" $test_methods.correction #if str($test_methods.lambda_).strip(): --lambda_ "${test_methods.lambda_}" #end if --ties '${test_methods.ties}' #end if ]]></command> <inputs> <param name="infile" type="data" format="tabular" label="Sample file" help="tabular file containing the observations"/> <conditional name="test_methods"> <param name="test_methods_opts" type="select" label="Select a statistical test method"> <option value="describe">Computes several descriptive statistics of the passed array</option> <option value="gmean">Compute the geometric mean along the specified axis</option> <option value="hmean">Calculates the harmonic mean along the specified axis</option> <option value="kurtosis">Computes the kurtosis (Fisher or Pearson) of a dataset</option> <option value="kurtosistest">Tests whether a dataset has normal kurtosis</option> <option value="mode">show the most common value in the passed array</option> <option value="moment">Calculates the nth moment about the mean for a sample</option> <option value="normaltest">Tests whether a sample differs from a normal distribution</option> <option value="skew">Computes the skewness of a data set.</option> <option value="skewtest">Tests whether the skew is different from the normal 
distribution.</option> <option value="tmean">Compute the trimmed mean</option> <option value="tvar">Compute the trimmed variance</option> <option value="tmin">Compute the trimmed minimum</option> <option value="tmax">Compute the trimmed maximum</option> <option value="tstd">Compute the trimmed sample standard deviation</option> <option value="tsem">Compute the trimmed standard error of the mean</option> <option value="nanmean">Compute the mean ignoring nans</option> <option value="nanstd">Compute the standard deviation ignoring nans</option> <option value="nanmedian">Compute the median ignoring nan values.</option> <option value="variation">Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.</option> <option value="cumfreq">Returns a cumulative frequency histogram, using the histogram function</option> <option value="histogram2">Compute histogram using divisions in bins</option> <option value="histogram">Separates the range into several bins</option> <option value="itemfreq">Compute frequencies for each number</option> <option value="percentileofscore">The percentile rank of a score relative to a list of scores</option> <option value="scoreatpercentile">Calculate the score at a given percentile of the input sequence</option> <option value="relfreq">Returns a relative frequency histogram, using the histogram function</option> <option value="binned_statistic">Compute a binned statistic for a set of data</option> <option value="obrientransform">Computes the O’Brien transform on input data</option> <option value="signaltonoise">The signal-to-noise ratio of the input data</option> <option value="bayes_mvs">Bayesian confidence intervals for the mean, var, and std</option> <option value="sem">Calculates the standard error of the mean of the value</option> <option value="zmap">Calculates the relative z-scores</option> <option value="zscore">Calculates the z score of each value in the sample, relative to the sample mean and 
standard deviation</option> <option value="sigmaclip">Iterative sigma-clipping of array elements</option> <option value="threshold">Clip array to a given value</option> <option value="trimboth">Slices off a proportion of items from both ends of an array</option> <option value="trim1">Slices off a proportion of items from ONE end of the passed array distribution</option> <option value="f_oneway">Performs a 1-way ANOVA</option> <option value="pearsonr">Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.</option> <option value="spearmanr">Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation</option> <option value="pointbiserialr">Calculates a point biserial correlation coefficient and the associated p-value</option> <option value="kendalltau">Calculates Kendall’s tau, a correlation measure for ordinal data</option> <option value="linregress">This computes a least-squares regression for two sets of measurements</option> <option value="theilslopes">Computes the Theil-Sen estimator for a set of points (x, y)</option> <option value="ttest_1samp">Calculates the T-test for the mean of ONE group of scores</option> <option value="ttest_ind">T-test for the means of TWO INDEPENDENT samples of scores</option> <option value="ttest_rel">T-test for the means of TWO RELATED samples of scores</option> <option value="kstest">Perform the Kolmogorov-Smirnov test for goodness of fit.</option> <option value="chisquare">Calculates a one-way chi square test</option> <option value="power_divergence">Cressie-Read power divergence statistic and goodness of fit test</option> <option value="ks_2samp">Computes the Kolmogorov-Smirnov statistic on 2 samples</option> <option value="mannwhitneyu">Computes the Mann-Whitney rank test on samples x and y</option> <option value="tiecorrect">Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests</option> <option value="rankdata">Assign ranks to 
data, dealing with ties appropriately</option> <option value="ranksums">Compute the Wilcoxon rank-sum statistic for two samples</option> <option value="wilcoxon">Calculate the Wilcoxon signed-rank test</option> <option value="kruskal">Compute the Kruskal-Wallis H-test for independent samples</option> <option value="friedmanchisquare">Computes the Friedman test for repeated measurements</option> <option value="combine_pvalues">Methods for combining the p-values of independent tests bearing upon the same hypothesis</option> <option value="ansari">Perform the Ansari-Bradley test for equal scale parameters</option> <option value="bartlett">Perform Bartlett’s test for equal variances</option> <option value="levene">Perform Levene test for equal variances.</option> <option value="shapiro">Perform the Shapiro-Wilk test for normality</option> <option value="anderson">Anderson-Darling test for data coming from a particular distribution</option> <option value="binom_test">Perform a test that the probability of success is p</option> <option value="fligner">Perform Fligner’s test for equal variances</option> <option value="median_test">Mood’s median test</option> <option value="mood">Perform Mood’s test for equal scale parameters</option> <option value="boxcox">Return a positive dataset transformed by a Box-Cox power transformation</option> <option value="boxcox_normmax">Compute optimal Box-Cox transform parameter for input data</option> <option value="boxcox_llf">The boxcox log-likelihood function</option> <option value="entropy">Calculate the entropy of a distribution for given probability values</option> <option value="chi2_contingency">Chi-square test of independence of variables in a contingency table</option> </param> <when value="itemfreq"> <expand macro="macro_sample_one_cols"/> </when> <when value="sem"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_ddof"/> </when> <when value="zscore"> <expand macro="macro_sample_one_cols"/> <expand 
macro="macro_ddof"/> </when> <when value="relfreq"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_b"/> </when> <when value="signaltonoise"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_ddof"/> </when> <when value="bayes_mvs"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_alpha"/> </when> <when value="threshold"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_new"/> </when> <when value="trimboth"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_proportiontocut"/> </when> <when value="trim1"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_proportiontocut"/> <expand macro="macro_tail"/> </when> <when value="percentileofscore"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_score"/> <expand macro="macro_kind"/> </when> <when value="normaltest"> <expand macro="macro_sample_one_cols"/> </when> <when value="kurtosistest"> <expand macro="macro_sample_one_cols"/> </when> <when value="describe"> <expand macro="macro_sample_one_cols"/> </when> <when value="mode"> <expand macro="macro_sample_one_cols"/> </when> <when value="skewtest"> <expand macro="macro_sample_one_cols"/> </when> <when value="nanmean"> <expand macro="macro_sample_one_cols"/> </when> <when value="nanmedian"> <expand macro="macro_sample_one_cols"/> </when> <when value="variation"> <expand macro="macro_sample_one_cols"/> </when> <when value="tiecorrect"> <expand macro="macro_sample_one_cols"/> </when> <when value="gmean"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_dtype"/> </when> <when value="hmean"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_dtype"/> </when> <when value="sigmaclip"> <expand macro="macro_sample_one_cols"/>
<expand macro="macro_m"/> <expand macro="macro_n_in"/> </when> <when value="kurtosis"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_axis"/> <expand macro="macro_fisher"/> <expand macro="macro_bias"/> </when> <when value="chi2_contingency"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_correction"/> <expand macro="macro_lambda_"/> </when> <when value="binom_test"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_n_in"/> <expand macro="macro_p"/> </when> <when value="moment"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_n_moment"/> </when> <when value="skew"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_bias"/> </when> <when value="tmean"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_inclusive1"/> <expand macro="macro_inclusive2"/> </when> <when value="tmin"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_inclusive"/> </when> <when value="tmax"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_nf"/> <expand macro="macro_inclusive"/> </when> <when value="tvar"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_inclusive1"/> <expand macro="macro_inclusive2"/> </when> <when value="tstd"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_inclusive1"/> <expand macro="macro_inclusive2"/> </when> <when value="tsem"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_inclusive1"/> <expand macro="macro_inclusive2"/> </when> <when value="nanstd"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_bias"/> </when> <when value="histogram"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_b"/> <expand 
macro="macro_printextras"/> </when> <when value="cumfreq"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_b"/> </when> <when value="boxcox"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_imbda"/> <expand macro="macro_alpha"/> </when> <when value="boxcox_llf"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_imbda"/> </when> <when value="boxcox_normmax"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_method"/> </when> <when value="anderson"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_dist"/> </when> <when value="rankdata"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_md"/> </when> <when value="kstest"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_cdf"/> <expand macro="macro_ni"/> <expand macro="macro_alternative"/> <expand macro="macro_mode"/> </when> <when value="spearmanr"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when value="ranksums"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when value="ansari"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when value="linregress"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when value="histogram2"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when value="pearsonr"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when value="pointbiserialr"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when value="ttest_1samp"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when value="ks_2samp"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> </when> <when 
value="kendalltau"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_initial_lexsort"/> </when> <when value="mannwhitneyu"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_mwu_use_continuity"/> </when> <when value="ttest_ind"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_equal_var"/> </when> <when value="ttest_rel"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_axis"/> </when> <when value="entropy"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_base"/> </when> <when value="theilslopes"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_alpha"/> </when> <when value="zmap"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_ddof"/> </when> <when value="chisquare"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_ddof"/> </when> <when value="power_divergence"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_lambda_"/> <expand macro="macro_ddof"/> </when> <when value="combine_pvalues"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_med"/> </when> <when value="mood"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_axis"/> </when> <when value="shapiro"> <expand macro="macro_sample_one_cols"/> </when> <when value="wilcoxon"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_zero_method"/> <expand macro="macro_correction"/> </when> <when value="scoreatpercentile"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand 
macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_interpolation"/> </when> <when value="binned_statistic"> <expand macro="macro_sample_one_cols"/> <expand macro="macro_sample_two_cols"/> <expand macro="macro_mf"/> <expand macro="macro_nf"/> <expand macro="macro_b"/> <expand macro="macro_statistic"/> </when> <when value="fligner"> <expand macro="macro_proportiontocut"/> <expand macro="macro_center"/> <expand macro="macro_sample_cols_min2"/> </when> <when value="f_oneway"> <expand macro="macro_sample_cols_min2"/> </when> <when value="kruskal"> <expand macro="macro_sample_cols_min2"/> </when> <when value="friedmanchisquare"> <expand macro="macro_sample_cols_min3"/> </when> <when value="bartlett"> <expand macro="macro_sample_cols_min2"/> </when> <when value="levene"> <expand macro="macro_proportiontocut"/> <expand macro="macro_center"/> <expand macro="macro_sample_cols_min2"/> </when> <when value="obrientransform"> <expand macro="macro_sample_cols_min2"/> </when> <when value="median_test"> <expand macro="macro_ties"/> <expand macro="macro_correction"/> <expand macro="macro_lambda_"/> <expand macro="macro_sample_cols_min2"/> </when> </conditional> </inputs> <outputs> <data format="tabular" name="outfile" label="${tool.name} on ${on_string}" /> </outputs> <tests> <!-- Test 01 --> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="boxcox_normmax2.tabular"/> <param name="sample_one_cols" value="1,2,3,4"/> <param name="test_methods_opts" value="boxcox_normmax"/> <param name="method" value="pearsonr"/> <param name="mf" value="-2.0"/> <param name="nf" value="2.0"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="normaltest.tabular" lines_diff="4"/> <param name="sample_one_cols" value="1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24"/> <param name="test_methods_opts" value="normaltest"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" 
file="tmin.tabular"/> <param name="sample_one_cols" value="1,2,3,4,5,6"/> <param name="test_methods_opts" value="tmin"/> <param name="mf" value="10.0"/> <param name="inclusive" value="True"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="shapiro2.tabular"/> <param name="sample_one_cols" value="1,2,3,4,8,9"/> <param name="test_methods_opts" value="shapiro"/> </test> <!-- Test 05 --> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="obrientransform.tabular"/> <repeat name="samples"> <param name="sample_cols" value="1,2,3,4"/> </repeat> <repeat name="samples"> <param name="sample_cols" value="5,6,7,8"/> </repeat> <param name="test_methods_opts" value="obrientransform"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="median_test_result1.tabular"/> <repeat name="samples"> <param name="sample_cols" value="1,2,3,4"/> </repeat> <repeat name="samples"> <param name="sample_cols" value="5,6,7,8"/> </repeat> <repeat name="samples"> <param name="sample_cols" value="9,10,11,12"/> </repeat> <param name="test_methods_opts" value="median_test"/> <param name="ties" value="above"/> <param name="correction" value="True"/> <param name="lambda_" value="1"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="wilcoxon_result1.tabular"/> <param name="sample_one_cols" value="1,2,3,4,5,6,7,8,9,10"/> <param name="sample_two_cols" value="11,12,13,14,15,16,17,18,19,20"/> <param name="test_methods_opts" value="wilcoxon"/> <param name="zero_method" value="pratt"/> <param name="correction" value="False"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="percentileofscore1.tabular"/> <param name="sample_one_cols" value="1,2,3,4"/> <param name="sample_two_cols" value="5,6,7,8"/> <param name="test_methods_opts" value="percentileofscore"/> <param name="score" value="1"/> <param name="kind" value="rank"/> 
</test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="percentileofscore2.tabular"/> <param name="sample_one_cols" value="1,2,3,4"/> <param name="sample_two_cols" value="5,6,7,8"/> <param name="test_methods_opts" value="percentileofscore"/> <param name="score" value="2"/> <param name="kind" value="mean"/> </test> <!-- Test 10 --> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="trim1.tabular"/> <param name="sample_one_cols" value="1,2,3,4,5,6"/> <param name="test_methods_opts" value="trim1"/> <param name="tail" value="left"/> <param name="proportiontocut" value="1.0"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="scoreatpercentile.tabular"/> <param name="sample_one_cols" value="1,2,3,4"/> <param name="sample_two_cols" value="11,12,13,14"/> <param name="test_methods_opts" value="scoreatpercentile"/> <param name="mf" value="5.0"/> <param name="nf" value="50.0"/> <param name="interpolation" value="lower"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="anderson.tabular"/> <param name="sample_one_cols" value="1,2,3,4"/> <param name="test_methods_opts" value="anderson"/> <param name="dist" value="expon"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="boxcox_normmax.tabular" lines_diff="14"/> <param name="sample_one_cols" value="1,2,3,4"/> <param name="test_methods_opts" value="boxcox_normmax"/> <param name="method" value="mle"/> <param name="mf" value="-3.0"/> <param name="nf" value="3.0"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="f_oneway.tabular"/> <repeat name="samples"> <param name="sample_cols" value="1,2,3,4"/> </repeat> <repeat name="samples"> <param name="sample_cols" value="5,6,7,8"/> </repeat> <param name="test_methods_opts" value="f_oneway"/> </test> <!-- Test 15 --> <test> <param name="infile" value="input.tabular"/> 
<output name="outfile" file="shapiro.tabular"/> <param name="sample_one_cols" value="1,2,3,4"/> <param name="test_methods_opts" value="shapiro"/> </test> <!-- Fail with the following error: ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are: 0.08823529411764706 <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="power_divergence.tabular"/> <param name="sample_one_cols" value="1,2,3,4"/> <param name="sample_two_cols" value="5,6,7,8"/> <param name="test_methods_opts" value="power_divergence"/> <param name="ddof" value="1"/> <param name="lambda_" value="1"/> </test> --> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="itemfreq.tabular"/> <param name="sample_one_cols" value="1,2,3,4,5,6,7,8,9,10"/> <param name="test_methods_opts" value="itemfreq"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="trimboth.tabular"/> <param name="sample_one_cols" value="1,2,3,4,5,6,7,8,9,10"/> <param name="proportiontocut" value="0"/> <param name="test_methods_opts" value="trimboth"/> </test> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="tmean.tabular"/> <param name="sample_one_cols" value="1,2,3,4,5,6"/> <param name="test_methods_opts" value="tmean"/> <param name="mf" value="0"/> <param name="nf" value="50"/> <param name="inclusive1" value="True"/> <param name="inclusive2" value="True"/> </test> <!-- Test 20 --> <test> <param name="infile" value="input.tabular"/> <output name="outfile" file="tvar.tabular"/> <param name="sample_one_cols" value="1,2,3,4,5,6"/> <param name="test_methods_opts" value="tvar"/> <param name="mf" value="0"/> <param name="nf" value="50"/> <param name="inclusive1" value="True"/> <param name="inclusive2" value="True"/> </test> </tests> <help> .. 
class:: warningmark

Computes a large number of probability distributions as well as many statistical functions. For more information, have a look at the `SciPy site`_.

.. _`SciPy site`: http://docs.scipy.org/doc/scipy/reference/stats.html

-----

========
Describe
========

Computes several descriptive statistics for a sample x.

-----

**The outputs are:**

size of the data : int
    length of data along axis
(min, max) : tuple of ndarrays or floats
    minimum and maximum value of the data array
arithmetic mean : ndarray or float
    mean of data along axis
unbiased variance : ndarray or float
    variance of the data along axis; the denominator is the number of observations minus one
biased skewness : ndarray or float
    skewness, based on moment calculations with denominator equal to the number of observations, i.e. no degrees of freedom correction
biased kurtosis : ndarray or float
    kurtosis (Fisher), normalized so that it is zero for the normal distribution. No degrees of freedom or bias correction is used.

**example**: describe([4,417,8,3]) the result is (4, (3.0, 417.0), 108.0, 42440.6666667, 1.15432044278, -0.666961688151)

=====
Gmean
=====

Compute the geometric mean along the specified axis. Returns the geometric average of the array elements, that is: the n-th root of (x1 * x2 * ... * xn).

-----

**The output is:**

gmean : ndarray
    see dtype parameter above

**example**: stats.gmean([4,17,8,3], dtype='float64') the result is (6.35594365562)

=====
Hmean
=====

Calculates the harmonic mean along the specified axis, that is: n / (1/x1 + 1/x2 + ... + 1/xn).

-----

**The output is:**

hmean : ndarray
    see dtype parameter above

**example**: stats.hmean([4,17,8,3], dtype='float64') the result is (5.21405750799)

========
Kurtosis
========

Computes the kurtosis (Fisher or Pearson) of a dataset. Kurtosis is the fourth central moment divided by the square of the variance.
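The means above and this moment-based definition of kurtosis can be checked without SciPy, using only Python's standard library (`fisher_kurtosis` is an illustrative helper name, not part of this tool):

```python
import statistics

# Geometric and harmonic means of the Gmean/Hmean examples above:
print(statistics.geometric_mean([4, 17, 8, 3]))  # ~6.35594365562
print(statistics.harmonic_mean([4, 17, 8, 3]))   # ~5.21405750799

def fisher_kurtosis(xs):
    """Fourth central moment over squared variance, minus 3 (Fisher)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased variance
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

print(fisher_kurtosis([4, 417, 8, 3]))  # ~-0.666961688151
```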
If Fisher's definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution. If bias is False, the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators.

-----

Computes the kurtosis for a sample x.

**The output is:**

kurtosis : array
    The kurtosis of values along an axis. If all values are equal, returns -3 for Fisher's definition and 0 for Pearson's definition.

**example**: kurtosis([4,417,8,3], 0, True, True) the result is (-0.666961688151)

=============
Kurtosis Test
=============

Tests whether a dataset has normal kurtosis. This function tests the null hypothesis that the kurtosis of the population from which the sample was drawn is that of the normal distribution: kurtosis = 3(n-1)/(n+1).

-----

Computes the z-value and p-value for a sample x. kurtosistest is only valid for n >= 20.

**The outputs are:**

z-score : float
    The computed z-score for this test.
p-value : float
    The 2-sided p-value for the hypothesis test.

**example**: kurtosistest([4,17,8,3,30,45,5,3,4,17,8,3,30,45,5,3,4,17,8,3,30,45,5,3]) the result is (0.29775013081425117, 0.7658938788569033)

====
Mode
====

Returns an array of the modal value in the passed array. If there is more than one such value, only the first is returned. The bin-count for the modal bins is also returned.

-----

Computes the most common value for a sample x.

**The outputs are:**

vals : ndarray
    Array of modal values.
counts : ndarray
    Array of counts for each mode.

**example**: mode([4,417,8,3]) the result is ([ 3.], [ 1.])

======
Moment
======

Calculates the nth moment about the mean for a sample. Generally used to calculate coefficients of skewness and kurtosis.

-----

Computes the nth moment about the mean for a sample x.

**The output is:**

n-th central moment : ndarray or float
    The appropriate moment along the given axis or over all values if axis is None.
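A dependency-free sketch of this n-th central moment (`central_moment` is an illustrative helper name, matching the Moment example in this help text):

```python
def central_moment(xs, n):
    """n-th moment about the mean; the denominator is the number of observations."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** n for x in xs) / len(xs)

print(central_moment([4, 417, 8, 3], 2))  # 31830.5
```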
The denominator for the moment calculation is the number of observations; no degrees of freedom correction is done.

**example**: moment([4,417,8,3], moment=2) the result is (31830.5)

===========
Normal Test
===========

Tests whether a sample differs from a normal distribution. This function tests the null hypothesis that a sample comes from a normal distribution. It is based on D'Agostino and Pearson's test that combines skew and kurtosis to produce an omnibus test of normality.

-----

Computes the k2 and p-value for a sample x. skewtest is not valid with fewer than 8 samples; kurtosistest is only valid for n >= 20.

**The outputs are:**

k2 : float or array
    s^2 + k^2, where s is the z-score returned by skewtest and k is the z-score returned by kurtosistest.
p-value : float or array
    A 2-sided chi squared probability for the hypothesis test.

**example**: normaltest([4,17,8,3,30,45,5,3,4,17,8,3,30,45,5,3,4,17,8,3,30,45,5,3]) the result is (5.8877986151838, 0.052659990380181286)

====
Skew
====

Computes the skewness of a data set. For normally distributed data, the skewness should be about 0. For unimodal continuous distributions, a skewness value > 0 means that there is more weight in the right tail of the distribution. The function skewtest can be used to determine if the skewness value is close enough to 0, statistically speaking.

-----

Computes the skewness of a sample x.

**The output is:**

skewness : ndarray
    The skewness of values along an axis, returning 0 where all values are equal.

**example**: skew([4,417,8,3]) the result is (1.1543204427775307)

=========
Skew Test
=========

Tests whether the skew is different from the normal distribution. This function tests the null hypothesis that the skewness of the population that the sample was drawn from is the same as that of a corresponding normal distribution.

-----

Computes the z-value and p-value for a sample x. skewtest is not valid with fewer than 8 samples.

**The outputs are:**

z-score : float
    The computed z-score for this test.
p-value : float
    A 2-sided p-value for the hypothesis test.

**example**: skewtest([4,17,8,3,30,45,5,3,4,17,8,3,30,45,5,3,4,17,8,3,30,45,5,3]) the result is (2.40814108282, 0.0160339834731)

======
tmean
======

Compute the trimmed mean. This function finds the arithmetic mean of given values, ignoring values outside the given limits.

-----

Computes the mean of a sample x, considering the lower and upper limits. Values in the input array less than the lower limit or greater than the upper limit are ignored. The inclusive flags determine whether values exactly equal to the lower or upper limits are included; the default is (True, True).

**The output is:**

tmean : float
    The computed trimmed mean.

**example**: tmean([4,17,8,3], (0,20), (True,True)) the result is (8.0)

=====
tvar
=====

Compute the trimmed variance. This function computes the sample variance of an array of values, while ignoring values which are outside of given limits.

-----

Computes the variance of a sample x, considering the lower and upper limits. Values in the input array less than the lower limit or greater than the upper limit are ignored. The inclusive flags determine whether values exactly equal to the lower or upper limits are included; the default is (True, True).

**The output is:**

tvar : float
    The computed trimmed variance.

**example**: tvar([4,17,8,3], (0,99999), (True,True)) the result is (40.6666666667)

=====
tmin
=====

Compute the trimmed minimum. This function finds the minimum value of an array a along the specified axis, but only considering values greater than a specified lower limit.

**The output is:**

tmin : float
    The computed trimmed minimum.

**example**: stats.tmin([4,17,8,3], 2, 0, True) the result is (3.0)

============
tmax
============

Compute the trimmed maximum.
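As a minimal sketch of the trimmed statistics above, assuming inclusive limits on both sides (`tmean` and `tvar` here are illustrative stdlib-only re-implementations, not the tool's code):

```python
def trimmed(xs, lower, upper):
    """Keep only values inside [lower, upper] (inclusive on both ends)."""
    return [x for x in xs if lower <= x <= upper]

def tmean(xs, lower, upper):
    """Arithmetic mean of the kept values."""
    kept = trimmed(xs, lower, upper)
    return sum(kept) / len(kept)

def tvar(xs, lower, upper):
    """Unbiased sample variance of the kept values."""
    kept = trimmed(xs, lower, upper)
    m = sum(kept) / len(kept)
    return sum((x - m) ** 2 for x in kept) / (len(kept) - 1)

print(tmean([4, 17, 8, 3], 0, 20))    # 8.0
print(tvar([4, 17, 8, 3], 0, 99999))  # ~40.6666666667
```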
This function finds the arithmetic maximum of given values, ignoring values outside the given limits. It computes the maximum value of an array along a given axis, while ignoring values larger than a specified upper limit.

**The output is:**

tmax : float
    The computed trimmed maximum.

**example**: stats.tmax([4,17,8,3], 50, 0, True) the result is (17.0)

============
tstd
============

Compute the trimmed sample standard deviation. This function finds the sample standard deviation of given values, ignoring values outside the given limits.

-----

Computes the standard deviation of a sample x, considering the lower and upper limits. Values in the input array less than the lower limit or greater than the upper limit are ignored. The inclusive flags determine whether values exactly equal to the lower or upper limits are included; the default is (True, True).

**The output is:**

tstd : float
    The computed trimmed standard deviation.

**example**: tstd([4,17,8,3], (0,99999), (True,True)) the result is (6.37704215657)

============
tsem
============

Compute the trimmed standard error of the mean. This function finds the standard error of the mean for given values, ignoring values outside the given limits.

-----

Computes the standard error of the mean for a sample x, considering the lower and upper limits. Values in the input array less than the lower limit or greater than the upper limit are ignored. The inclusive flags determine whether values exactly equal to the lower or upper limits are included; the default is (True, True).

**The output is:**

tsem : float
    The computed trimmed standard error of the mean.

**example**: tsem([4,17,8,3], (0,99999), (True,True)) the result is (3.18852107828)

========
nanmean
========

Compute the mean over the given axis, ignoring nans.

-----

Computes the mean for a sample x without considering nans.

**The output is:**

m : float
    The computed mean.
**example**: nanmean([4,17,8,3]) the result is (8.0)

=======
nanstd
=======

Compute the standard deviation over the given axis, ignoring nans.

-----

Computes the standard deviation for a sample x without considering nans.

**The output is:**

s : float
    The computed standard deviation.

**example**: nanstd([4,17,8,3], 0, False) the result is (5.52268050859)

============
nanmedian
============

Computes the median for a sample x without considering nans.

**The output is:**

m : float
    The computed median.

**example**: nanmedian([4,17,8,3]) the result is (6.0)

============
variation
============

Computes the coefficient of variation, the ratio of the biased standard deviation to the mean, for a sample x.

**The output is:**

ratio : float
    The ratio of the biased standard deviation to the mean.

**example**: variation([4,17,8,3]) the result is (0.690335063574)

============
cumfreq
============

Returns a cumulative frequency histogram, using the histogram function.

**The outputs are:**

cumfreq : ndarray
    Binned values of cumulative frequency.
lowerreallimit : float
    Lower real limit.
binsize : float
    Width of each bin.
extrapoints : int
    Extra points.

**example**: cumfreq([4,17,8,3], defaultreallimits=(2.0,3.5)) the result is ([ 0. 0. 0. 0. 0. 0. 1. 1. 1. 1.], 2.0, 0.15, 3)

==========
histogram2
==========

Compute a histogram using divisions in bins. Counts the number of times values from array a fall into numerical ranges defined by bins. Two samples are required.

**The output is:**

histogram2 : ndarray of rank 1
    Each value represents the occurrences for a given bin (range) of values.

**example**: stats.histogram2([4,17,8,3], [30,45,5,3]) the result is (array([ 0, -2, -2, 4]))

============
histogram
============

Separates the range into several bins and returns the number of instances in each bin.

**The outputs are:**

histogram : ndarray
    Number of points (or sum of weights) in each bin.
low_range : float
    Lowest value of the histogram, the lower limit of the first bin.
binsize : float
    The size of the bins (all bins have the same size).
extrapoints : int
    The number of points outside the range of the histogram.

**example**: histogram([4,17,8,3], defaultlimits=(2.0,3.4)) the result is ([ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.], 2.0, 0.14, 3)

============
itemfreq
============

Computes the frequencies of the values in a sample.

**The output is:**

itemfreq : (K, 2) ndarray
    A 2-D frequency table. Column 1 contains sorted, unique values from a; column 2 contains their respective counts.

**example**: itemfreq([4,17,8,3]) the result is array([[ 3, 1], [ 4, 1], [ 8, 1], [17, 1]])

===
Sem
===

Calculates the standard error of the mean (or standard error of measurement) of the values in the input array.

**The output is:**

s : ndarray or float
    The standard error of the mean in the sample(s), along the input axis.

**example**: sem([4,17,8,3], ddof=1) the result is (3.18852107828)

=====
Z Map
=====

Calculates relative z-scores. Returns an array of z-scores, i.e., scores that are standardized to zero mean and unit variance, where mean and variance are calculated from the comparison array.

**The output is:**

zscore : array_like
    Z-scores, in the same shape as scores.

**example**: stats.zmap([4,17,8,3], [30,45,5,3], ddof=1) the result is [-0.82496302 -0.18469321 -0.62795692 -0.87421454]

=======
Z Score
=======

Calculates the z-score of each value in the sample, relative to the sample mean and standard deviation.

**The output is:**

zscore : array_like
    The z-scores, standardized by the mean and standard deviation of the input array a.

**example**: zscore([4,17,8,3], ddof=0) the result is ([-0.72428597 1.62964343 0. -0.90535746])

===============
Signal to noise
===============

The signal-to-noise ratio of the input data. Returns the signal-to-noise ratio of a, here defined as the mean divided by the standard deviation.
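The z-score standardization and this mean/standard-deviation ratio can be sketched with the standard library only (`zscores` and `signal_to_noise` are illustrative helper names; ddof=0 means the biased standard deviation):

```python
def zscores(xs, ddof=0):
    """Standardize xs by its own mean and standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / (n - ddof)) ** 0.5
    return [(x - mean) / std for x in xs]

def signal_to_noise(xs):
    """Mean divided by the (biased) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return mean / std

print([round(z, 8) for z in zscores([4, 17, 8, 3])])
# [-0.72428597, 1.62964343, 0.0, -0.90535746]
print(signal_to_noise([4, 17, 8, 3]))  # ~1.44857193668
```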
**The output is:**

s2n : ndarray
    The mean to standard deviation ratio(s) along axis, or 0 where the standard deviation is 0.

**example**: signaltonoise([4,17,8,3], ddof=0) the result is (1.44857193668)

===================
Percentile of score
===================

The percentile rank of a score relative to a list of scores. A percentileofscore of, for example, 80% means that 80% of the scores in a are below the given score. In the case of gaps or ties, the exact definition depends on the optional keyword, kind.

**The output is:**

pcos : float
    Percentile-position of score (0-100) relative to a.

**example**: percentileofscore([4,17,8,3], score=3, kind='rank') the result is (25.0)

===================
Score at percentile
===================

Calculate the score at a given percentile of the input sequence. For example, the score at per=50 is the median. If the desired quantile lies between two data points, we interpolate between them, according to the value of interpolation. If the parameter limit is provided, it should be a tuple (lower, upper) of two values. The second sample (the percentiles) should be in the range [0,100].

**The output is:**

score : float or ndarray
    Score at percentile(s).

**example**: stats.scoreatpercentile([4,17,8,3], [8,3], (0,100), 'fraction') the result is array([ 3.24, 3.09])

=======
relfreq
=======

Returns a relative frequency histogram, using the histogram function. numbins is the number of bins to use for the histogram.

**The outputs are:**

relfreq : ndarray
    Binned values of relative frequency.
lowerreallimit : float
    Lower real limit.
binsize : float
    Width of each bin.
extrapoints : int
    Extra points.

**example**: stats.relfreq([4,17,8,3], 10, (0,100)) the result is (array([ 0.75, 0.25, 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 ]), 0, 10.0, 0)

================
Binned statistic
================

Compute a binned statistic for a set of data. This is a generalization of a histogram function.
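The 'rank' kind of the percentile-of-score computation described above can be sketched as follows (`percentile_of_score_rank` is an illustrative helper: it averages the strictly-below and at-or-below counts, adding a half-step when the score occurs in the data):

```python
def percentile_of_score_rank(data, score):
    """'rank' kind: average percentage ranking of score within data."""
    left = sum(1 for x in data if x < score)    # strictly below
    right = sum(1 for x in data if x <= score)  # below or equal
    exact = 1 if right > left else 0            # score occurs in the data
    return (left + right + exact) * 50.0 / len(data)

print(percentile_of_score_rank([4, 17, 8, 3], 3))  # 25.0
```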
A histogram divides the space into bins and returns the count of the number of points in each bin. This function allows the computation of the sum, mean, median, or other statistic of the values within each bin. Y must be the same shape as X.

**The outputs are:**

statistic : array
    The values of the selected statistic in each bin.
bin_edges : array of dtype float
    The bin edges (length(statistic)+1).
binnumber : 1-D ndarray of ints
    Assigns to each observation an integer that represents the bin in which the observation falls. Has the same length as values.

**example**: stats.binned_statistic([4,17,8,3], [30,45,5,3], 'sum', 10, (0,100)) the result is ([ 38. 45. 0. 0. 0. 0. 0. 0. 0. 0.], [ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90. 100.], [1 2 1 1])

================
obrientransform
================

Computes the O'Brien transform on input data (any number of arrays). Used to test for homogeneity of variance prior to running one-way stats. At least two samples are required.

**The output is:**

obrientransform : ndarray
    Transformed data for use in an ANOVA. The first dimension of the result corresponds to the sequence of transformed arrays. If the arrays given are all 1-D of the same length, the return value is a 2-D array; otherwise it is a 1-D array of type object, with each element being an ndarray.

**example**: stats.obrientransform([4,17,8,3], [30,45,5,3]) the result is (array([[ 16.5 , 124.83333333, -10.16666667, 31.5 ], [ 39.54166667, 877.04166667, 310.375 , 422.04166667]]))

=========
bayes mvs
=========

Bayesian confidence intervals for the mean, var, and std. alpha must be larger than 0 and smaller than 1.

**The outputs are:**

mean_cntr, var_cntr, std_cntr : tuple
    The three results are for the mean, variance and standard deviation, respectively.
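The binned 'sum' statistic shown above for binned_statistic can be sketched without SciPy (`binned_sum` is an illustrative helper; the right-most edge is folded into the last bin, as in numpy-style histogramming):

```python
def binned_sum(x, values, bins, value_range):
    """Sum the values whose x falls in each of `bins` equal-width bins."""
    lo, hi = value_range
    width = (hi - lo) / bins
    sums = [0.0] * bins
    for xi, vi in zip(x, values):
        idx = min(int((xi - lo) / width), bins - 1)  # fold right edge into last bin
        sums[idx] += vi
    return sums

print(binned_sum([4, 17, 8, 3], [30, 45, 5, 3], 10, (0, 100)))
# [38.0, 45.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```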
Each result is a tuple of the form (center, (lower, upper)), with center the mean of the conditional pdf of the value given the data, and (lower, upper) a confidence interval, centered on the median, containing the estimate to a probability alpha.

**example**: stats.bayes_mvs([4,17,8,3], 0.8) the result is (8.0, (0.49625108326958145, 15.503748916730416)); (122.0, (15.611548029617781, 346.74229584218108)); (8.8129230241075476, (3.9511451542075475, 18.621017583423871))

=========
sigmaclip
=========

Iterative sigma-clipping of array elements. The output array contains only those elements of the input array c that satisfy the conditions.

**The outputs are:**

c : ndarray
    Input array with clipped elements removed.
critlower : float
    Lower threshold value used for clipping.
critupper : float
    Upper threshold value used for clipping.

**example**: sigmaclip([4,17,8,3]) the result is ([ 4. 17. 8. 3.], -14.0907220344, 30.0907220344)

=========
threshold
=========

Clip an array to given values. Similar to numpy.clip(), except that values less than threshmin or greater than threshmax are replaced by newval, instead of by threshmin and threshmax respectively.

**The output is:**

out : ndarray
    The clipped input array, with values less than threshmin or greater than threshmax replaced with newval.

**example**: stats.threshold([4,17,8,3], 2, 8, 0) the result is array([4, 0, 8, 3])

========
trimboth
========

Slices off a proportion of items from both ends of an array (i.e., with proportiontocut = 0.1, slices off the leftmost 10% and rightmost 10% of scores). You must pre-sort the array if you want 'proper' trimming. Slices off less if the proportion results in a non-integer slice index (i.e., conservatively slices off proportiontocut).

**The output is:**

out : ndarray
    Trimmed version of array a.
**example**: stats.trimboth([4,17,8,3], 0.1) the result is array([ 4, 17, 8, 3])

=====
trim1
=====

Slices off a proportion of items from ONE end of the passed array distribution. If proportiontocut = 0.1, slices off the 'leftmost' or 'rightmost' 10% of scores. Slices off LESS if the proportion results in a non-integer slice index (i.e., conservatively slices off proportiontocut).

**The output is:**

trim1 : ndarray
    Trimmed version of array a.

**example**: stats.trim1([4,17,8,3], 0.5, 'left') the result is array([8, 3])

=========
spearmanr
=========

Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation. The Spearman correlation is a nonparametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Like other correlation coefficients, it varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

**The outputs are:**

rho : float or ndarray (2-D square)
    Spearman correlation matrix or correlation coefficient (if only 2 variables are given as parameters). The correlation matrix is square, with length equal to the total number of variables (columns or rows) in a and b combined.
p-value : float
    The two-sided p-value for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated; has the same dimension as rho.

**example**: stats.spearmanr([4,17,8,3,30,45,5,3], [5,3,4,17,8,3,30,45]) the result is (-0.722891566265, 0.0427539458876)

========
f oneway
========

Performs a 1-way ANOVA. The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.
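The Spearman coefficient described above can be reproduced without SciPy by ranking both samples (ties get the average rank) and applying Pearson's formula to the rank vectors (`average_ranks` and `spearman_rho` are illustrative helpers):

```python
def average_ranks(xs):
    """Ranks starting at 1; tied values get the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the tie group
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

print(spearman_rho([4, 17, 8, 3, 30, 45, 5, 3], [5, 3, 4, 17, 8, 3, 30, 45]))
# ~-0.722891566265
```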
**The outputs are:**

F-value : float
    The computed F-value of the test.
p-value : float
    The associated p-value from the F-distribution.

**example**: stats.f_oneway([4,17,8,3], [30,45,5,3]) the result is (1.43569457222, 0.276015080537)

=================
Mann-Whitney rank
=================

Compute the Wilcoxon rank-sum statistic for two samples. The Wilcoxon rank-sum test tests the null hypothesis that two sets of measurements are drawn from the same distribution. The alternative hypothesis is that values in one sample are more likely to be larger than the values in the other sample. This test should be used to compare two samples from continuous distributions. It does not handle ties between measurements in x and y. For tie-handling and an optional continuity correction, use mannwhitneyu.

-----

Computes the Mann-Whitney rank test on samples x and y.

**The outputs are:**

u : float
    The Mann-Whitney statistic.
prob : float
    One-sided p-value assuming an asymptotic normal distribution.

===================
Ansari-Bradley test
===================

Perform the Ansari-Bradley test for equal scale parameters. The Ansari-Bradley test is a non-parametric test for the equality of the scale parameter of the distributions from which two samples were drawn. The p-value given is exact when the sample sizes are both less than 55 and there are no ties; otherwise a normal approximation for the p-value is used.

-----

Computes the Ansari-Bradley test for samples x and y.

**The outputs are:**

AB : float
    The Ansari-Bradley test statistic.
p-value : float
    The p-value of the hypothesis test.

**example**: ansari([1,2,3,4], [15,5,20,8,10,12]) the result is (10.0, 0.53333333333333333)

========
bartlett
========

Perform Bartlett's test for equal variances. Bartlett's test tests the null hypothesis that all input samples are from populations with equal variances. At least two samples are required.

**The outputs are:**

T : float
    The test statistic.
p-value : float
    The p-value of the test.
**example**: stats.bartlett([4,17,8,3], [30,45,5,3]) the result is (2.87507113948, 0.0899609995242)

======
levene
======

Perform the Levene test for equal variances. The Levene test tests the null hypothesis that all input samples are from populations with equal variances. At least two samples are required.

**The outputs are:**

W : float
    The test statistic.
p-value : float
    The p-value for the test.

**example**: stats.levene([4,17,8,3], [30,45,5,3], center='mean', proportiontocut=0.01) the result is (11.5803858521, 0.014442549362)

=======
fligner
=======

Perform Fligner's test for equal variances. Fligner's test tests the null hypothesis that all input samples are from populations with equal variances. Fligner's test is non-parametric, in contrast to Bartlett's test (bartlett) and Levene's test (levene).

**The outputs are:**

Xsq : float
    The test statistic.
p-value : float
    The p-value for the hypothesis test.

==========
linregress
==========

Calculate a regression line. This computes a least-squares regression for two sets of measurements.

-----

Computes the least-squares regression for samples x and y.

**The outputs are:**

slope : float
    Slope of the regression line.
intercept : float
    Intercept of the regression line.
r-value : float
    Correlation coefficient.
p-value : float
    Two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero.
stderr : float
    Standard error of the estimate.

**example**: linregress([4,417,8,3], [30,45,5,3]) the result is (0.0783053989099, 12.2930169177, 0.794515680443, 0.205484319557, 0.0423191764713)

===========
ttest 1samp
===========

Calculates the T-test for the mean of ONE group of scores. This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean.

**The outputs are:**

t : float or array
    The calculated t-statistic.
prob : float or array
    The two-tailed p-value.
**example**: stats.ttest_1samp([4,17,8,3], [30,45,5,3]) the result is (array([ -6.89975053, -11.60412589, 0.94087507, 1.56812512]), array([ 0.00623831, 0.00137449, 0.41617971, 0.21485306]))

=========
ttest ind
=========

Calculates the T-test for the means of TWO INDEPENDENT samples of scores. This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances. The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared.

-----

Computes the T-test for the means of independent samples x and y.

**The outputs are:**

t : float or array
    The calculated t-statistic.
prob : float or array
    The two-tailed p-value.

**example**: ttest_ind([4,417,8,3], [30,45,5,3]) the result is (0.842956644207, 0.431566932748)

=========
ttest rel
=========

Calculates the T-test on TWO RELATED samples of scores, a and b. This is a two-sided test for the null hypothesis that 2 related or repeated samples have identical average (expected) values. Related-samples t-tests typically consist of a sample of matched pairs of similar units, or one group of units that has been tested twice (a "repeated measures" t-test).

-----

Computes the T-test for the means of related samples x and y.

**The outputs are:**

t : float or array
    The t-statistic.
prob : float or array
    The two-tailed p-value.

**example**: ttest_rel([4,417,8,3], [30,45,5,3]) the result is (0.917072474241, 0.426732624361)

=========
chisquare
=========

Calculates a one-way chi-squared test. The chi-squared test tests the null hypothesis that the categorical data has the given frequencies.

**The outputs are:**

chisq : float or ndarray
    The chi-squared test statistic. The value is a float if axis is None or f_obs and f_exp are 1-D.
p : float or ndarray
    The p-value of the test.
    The value is a float if ddof and the return value chisq are scalars.

**example**: stats.chisquare([4,17,8,3], [30,45,5,3], ddof=1) the result is (41.7555555556, 8.5683326078e-10)

================
power divergence
================

Cressie-Read power divergence statistic and goodness of fit test. This function tests the null hypothesis that the categorical data has the given frequencies, using the Cressie-Read power divergence statistic.

**The outputs are:**

stat : float or ndarray
    The Cressie-Read power divergence test statistic. The value is a float if axis is None or if f_obs and f_exp are 1-D.
p : float or ndarray
    The p-value of the test. The value is a float if ddof and the return value stat are scalars.

**example**: stats.power_divergence([4,17,8,3], [30,45,5,3], 1, lambda_=1) the result is (41.7555555556, 8.5683326078e-10)

==========
tiecorrect
==========

Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests.

**The output is:**

factor : float
    Correction factor for U or H.

**example**: stats.tiecorrect([4,17,8,3,30,45,5,3]) the result is (0.988095238095)

========
rankdata
========

Assign ranks to data, dealing with ties appropriately. Ranks begin at 1. The method argument controls how ranks are assigned to equal values.

**The output is:**

ranks : ndarray
    An array of length equal to the size of a, containing rank scores.

**example**: stats.rankdata([4,17,8,3], 'average') the result is ([ 2. 4. 3. 1.])

=======
kruskal
=======

Compute the Kruskal-Wallis H-test for independent samples. The Kruskal-Wallis H-test tests the null hypothesis that the population medians of all of the groups are equal. It is a non-parametric version of ANOVA. At least two samples are required.

**The outputs are:**

H-statistic : float
    The Kruskal-Wallis H statistic, corrected for ties.
p-value : float
    The p-value for the test, using the assumption that H has a chi square distribution.

**example**: stats.kruskal([4,17,8,3], [30,45,5,3]) the result is (0.527108433735, 0.467825077285)

==================
friedmanchisquare
==================

Computes the Friedman test for repeated measurements. The Friedman test tests the null hypothesis that repeated measurements of the same individuals have the same distribution. It is often used to test for consistency among measurements obtained in different ways. At least three samples are required.

**The outputs are:**

friedman chi-square statistic : float
    The test statistic, correcting for ties.
p-value : float
    The associated p-value assuming that the test statistic has a chi squared distribution.

**example**: stats.friedmanchisquare([4,17,8,3], [8,3,30,45], [30,45,5,3]) the result is (0.933333333333, 0.627089085273)

=====
mood
=====

Perform Mood's test for equal scale parameters. Mood's two-sample test for scale parameters is a non-parametric test for the null hypothesis that two samples are drawn from the same distribution with the same scale parameter.

-----

Computes Mood's test for equal-scale samples x and y.

**The outputs are:**

z : scalar or ndarray
    The z-score for the hypothesis test. For 1-D inputs a scalar is returned.
p-value : scalar or ndarray
    The p-value for the hypothesis test.

**example**: mood([4,417,8,3], [30,45,5,3]) the result is (0.396928310068, 0.691420327045)

===============
combine_pvalues
===============

Methods for combining the p-values of independent tests bearing upon the same hypothesis.

**The outputs are:**

statistic : float
    The statistic calculated by the specified method: "fisher" gives the chi-squared statistic, "stouffer" the Z-score.
pval : float
    The combined p-value.

**example**: stats.combine_pvalues([4,17,8,3], method='fisher', weights=[5,6,7,8]) the result is (-14.795123071, 1.0)

===========
median test
===========

Mood's median test. Tests that two or more samples come from populations with the same median.

**The outputs are:**

stat : float
    The test statistic.
    The statistic that is returned is determined by lambda_. The default is Pearson's chi-squared statistic.
p : float
    The p-value of the test.
m : float
    The grand median.
table : ndarray
    The contingency table.

**example**: stats.median_test(*a,ties='below',correction=True,lambda_=1) the result is ((0.0, 1.0, 6.5, array([[2, 2],[2, 2]])))

=======
shapiro
=======

Perform the Shapiro-Wilk test for normality.

The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution.

-----

Computes the Shapiro-Wilk test for samples x and y. If x has length n, then y must have length n/2.

**The outputs are:**

W : float
    The test statistic.
p-value : float
    The p-value for the hypothesis test.

**example**: shapiro([4,417,8,3]) the result is (0.66630089283, 0.00436889193952)

========
anderson
========

Anderson-Darling test for data coming from a particular distribution.

The Anderson-Darling test is a modification of the Kolmogorov-Smirnov test kstest for the null hypothesis that a sample is drawn from a population that follows a particular distribution. For the Anderson-Darling test, the critical values depend on which distribution is being tested against. This function works for normal, exponential, logistic, or Gumbel (Extreme Value Type I) distributions.

-----

Computes the Anderson-Darling test for a sample x which comes from a specific distribution.

**The outputs are:**

A2 : float
    The Anderson-Darling test statistic.
critical : list
    The critical values for this distribution.
sig : list
    The significance levels for the corresponding critical values, in percents. The function returns critical values for a differing set of significance levels depending on the distribution that is being tested against.

**example**: anderson([4,417,8,3],'norm') the result is (0.806976419634,[ 1.317 1.499 1.799 2.098 2.496] ,[ 15. 10. 5. 2.5 1. ])

==========
binom_test
==========

Perform a test that the probability of success is p.
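A runnable sketch of this exact test; note that newer scipy releases renamed binom_test to binomtest, so the helper below tries both (an assumption about the installed scipy version):

```python
from scipy import stats

def exact_binom_pvalue(successes, trials, p=0.5):
    """Two-sided exact binomial test p-value across scipy versions."""
    try:
        # scipy >= 1.7 exposes binomtest, which returns a result object
        return stats.binomtest(successes, trials, p).pvalue
    except AttributeError:
        # older scipy exposes binom_test, which returns the p-value directly
        return stats.binom_test(successes, trials, p)

# 5 successes in 10 trials matches the expectation under p=0.5 exactly,
# so the two-sided p-value is 1.0
pval = exact_binom_pvalue(5, 10, 0.5)
```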
This is an exact, two-sided test of the null hypothesis that the probability of success in a Bernoulli experiment is p. The binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories.

-----

Computes the test that the probability of success is p.

**The outputs are:**

p-value : float
    The p-value of the hypothesis test.

**example**: binom_test([417,8],1,0.5) the result is (5.81382734132e-112)

========
pearsonr
========

Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

The Pearson correlation coefficient measures the linear relationship between two datasets. The value of the correlation (i.e., the correlation coefficient) does not depend on the specific measurement units used.

**The outputs are:**

Pearson's correlation coefficient : float
2-tailed p-value : float

**example**: pearsonr([4,17,8,3],[30,45,5,3]) the result is (0.695092958988,0.304907041012)

========
wilcoxon
========

Calculate the Wilcoxon signed-rank test.

The Wilcoxon signed-rank test tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences x - y is symmetric about zero. It is a non-parametric version of the paired T-test.

**The outputs are:**

T : float
    The sum of the ranks of the differences above or below zero, whichever is smaller.
p-value : float
    The two-sided p-value for the test.

**example**: stats.wilcoxon([3,6,23,70,20,55,4,19,3,6], [23,70,20,55,4,19,3,6,23,70],zero_method='pratt',correction=True) the result is (23.0, 0.68309139830960874)

==============
pointbiserialr
==============

Calculates a point biserial correlation coefficient and the p-value for testing non-correlation.

The point biserial correlation is used to measure the relationship between a binary variable x and a continuous variable y. Like other correlation coefficients, it varies between -1 and +1, with 0 implying no correlation.
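Because the point biserial coefficient is mathematically a Pearson correlation computed against a binary variable, pointbiserialr and pearsonr agree on the coefficient; a minimal sketch:

```python
from scipy import stats

binary = [0, 0, 0, 1, 1, 1, 1]      # dichotomous variable
continuous = [1, 0, 1, 2, 3, 4, 5]  # continuous variable

r_pb, p_pb = stats.pointbiserialr(binary, continuous)
r_p, _ = stats.pearsonr(binary, continuous)
# r_pb equals the Pearson coefficient computed on the same data
```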
**The outputs are:**

r : float
    The R value.
p-value : float
    The 2-tailed p-value.

**example**: pointbiserialr([0,0,0,1,1,1,1],[1,0,1,2,3,4,5]) the result is (0.84162541153017323, 0.017570710081214368)

========
ks_2samp
========

Computes the Kolmogorov-Smirnov statistic on 2 samples.

This is a two-sided test for the null hypothesis that 2 independent samples are drawn from the same continuous distribution. If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.

**The outputs are:**

D : float
    The KS statistic.
p-value : float
    The two-tailed p-value.

**example**: ks_2samp([4,17,8,3],[30,45,5,3]) the result is (0.5,0.534415719217)

==========
kendalltau
==========

Calculates Kendall's tau, a correlation measure for sample x and sample y; the two samples must be the same size.

Kendall's tau is a measure of the correspondence between two rankings. Values close to 1 indicate strong agreement, values close to -1 indicate strong disagreement. This is the tau-b version of Kendall's tau, which accounts for ties.

**The outputs are:**

Kendall's tau : float
    The tau statistic.
p-value : float
    The two-sided p-value for a hypothesis test whose null hypothesis is an absence of association, tau = 0.

**example**: kendalltau([4,17,8,3],[30,45,5,3]) the result is (0.666666666667,0.174231399708)

================
chi2_contingency
================

Chi-square test of independence of variables in a contingency table.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table observed.

**The outputs are:**

chi2 : float
    The test statistic.
p : float
    The p-value of the test.
dof : int
    Degrees of freedom.
expected : ndarray, same shape as observed
    The expected frequencies, based on the marginal sums of the table.
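For a genuine contingency table the input should be 2-D; a minimal sketch with a 2x2 table whose observed counts already equal the expected counts:

```python
from scipy import stats

observed = [[10, 10],
            [10, 10]]  # observed counts match the expected counts exactly

chi2, p, dof, expected = stats.chi2_contingency(observed)
# no deviation from independence: chi2 is 0, p is 1, dof is 1
```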
**example**: stats.chi2_contingency([4,17,8,3],1) the result is (0.0, 1.0, 0, array([ 4., 17., 8., 3.]))

======
boxcox
======

Return a positive dataset transformed by a Box-Cox power transformation.

**The outputs are:**

boxcox : ndarray
    The Box-Cox power transformed array.
maxlog : float, optional
    If the lmbda parameter is None, the second returned argument is the lambda that maximizes the log-likelihood function.
(min_ci, max_ci) : tuple of float, optional
    If the lmbda parameter is None and alpha is not None, this returned tuple of floats represents the minimum and maximum confidence limits given alpha.

**example**: stats.boxcox([4,17,8,3],0.9) the result is ([ 1.03301717 1.60587825 1.35353026 0.8679017 ],-0.447422166194,(-0.5699221654511225, -0.3259515659400082))

==============
boxcox normmax
==============

Compute the optimal Box-Cox transform parameter for input data.

**The outputs are:**

maxlog : float or ndarray
    The optimal transform parameter found. An array instead of a scalar for method='all'.

**example**: stats.boxcox_normmax([4,17,8,3],(-2,2),'pearsonr') the result is (-0.702386238971)

==========
boxcox llf
==========

The Box-Cox log-likelihood function.

**The outputs are:**

llf : float or ndarray
    The Box-Cox log-likelihood of data given lmb. A float for 1-D data, an array otherwise.

**example**: stats.boxcox_llf(1,[4,17,8,3]) the result is (-6.83545336723)

=======
entropy
=======

Calculate the entropy of a distribution for given probability values.

If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=0). If qk is not None, then compute the Kullback-Leibler divergence S = sum(pk * log(pk / qk), axis=0). This routine will normalize pk and qk if they don't sum to 1.

**The outputs are:**

S : float
    The calculated entropy.

**example**: stats.entropy([4,17,8,3],[30,45,5,3],1.6) the result is (0.641692653659)

======
kstest
======

Perform the Kolmogorov-Smirnov test for goodness of fit.
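A minimal sketch, comparing an evenly spaced sample against the uniform distribution it approximates ('uniform' is scipy's standard U[0,1]):

```python
from scipy import stats

# Evenly spaced points in (0, 1), deterministic for reproducibility
sample = [i / 100.0 for i in range(1, 100)]

d_stat, p_value = stats.kstest(sample, 'uniform')
# the sample tracks the U[0,1] CDF closely, so D is small and p is large
```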
**The outputs are:**

D : float
    The KS test statistic, either D, D+ or D-.
p-value : float
    One-tailed or two-tailed p-value.

**example**: stats.kstest([4,17,8,3],'norm',N=20,alternative='two-sided',mode='approx') the result is (0.998650101968,6.6409100441e-12)

===========
theilslopes
===========

Computes the Theil-Sen estimator for a set of points (x, y).

theilslopes implements a method for robust linear regression. It computes the slope as the median of all slopes between paired values.

**The outputs are:**

medslope : float
    The Theil slope.
medintercept : float
    The intercept of the Theil line, as median(y) - medslope*median(x).
lo_slope : float
    The lower bound of the confidence interval on medslope.
up_slope : float
    The upper bound of the confidence interval on medslope.

**example**: stats.theilslopes([4,17,8,3],[30,45,5,3],0.95) the result is (0.279166666667,1.11458333333,-0.16,2.5)
</help>
</tool>