# HG changeset patch
# User rico
# Date 1333652778 14400
# Node ID 3cc35686acfb6b7c2eac83d48e86545dff1bdf27
Uploaded

diff -r 000000000000 -r 3cc35686acfb add_fst_column.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/add_fst_column.xml Thu Apr 05 15:06:18 2012 -0400
@@ -0,0 +1,90 @@
+
+ to a table
+
+
+ add_fst_column.py "$input" "$p1_input" "$p2_input" "$data_source" "$min_reads" "$min_qual" "$retain" "$discard_fixed" "$biased" "$output"
+ #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns)
+ #set $arg = '%s:%s' % ($individual_col, $individual)
+ "$arg"
+ #end for
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+The user specifies a SNP table and two "populations" of individuals,
+both previously defined using the Galaxy tool to select individuals from
+a SNP table. No individual can be in both populations. Other choices are
+as follows.
+
+Data source. The allele frequencies of a SNP in the two populations can be
+estimated either by the total number of reads of each allele, or by adding
+the frequencies inferred from genotypes of individuals in the populations.
+
+After specifying the data source, the user sets lower bounds on amount
+of data required at a SNP. For estimating the Fst using read counts,
+the bound is the minimum count of reads of the two alleles in a population.
+For estimations based on genotype, the bound is the minimum reported genotype
+quality per individual.
+
+The user specifies whether the SNPs that violate the lower bound should be
+ignored or the Fst set to -1.
+
+The user specifies whether SNPs where both populations appear to be fixed
+for the same allele should be retained or discarded.
+
+Finally, the user chooses which definition of Fst to use: Wright's original
+definition or Weir's unbiased estimator.
+
+A column is appended to the SNP table giving the Fst for each retained SNP.
+
+
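
The help text in this patch describes the per-SNP decision flow without showing add_fst_column.py itself. The following is a minimal sketch of that flow for the read-count data source only, assuming a Wright-style Fst of the form (H_T - H_S) / H_T for a biallelic SNP in two equally weighted populations. The function names, the argument layout, and the exact formula are illustrative assumptions, not the script's actual code; Weir's unbiased estimator is not sketched here.

```python
def allele_freq(ref_reads, alt_reads):
    """Frequency of the reference allele estimated from read counts."""
    return float(ref_reads) / float(ref_reads + alt_reads)


def wright_fst(p1, p2):
    """Wright-style Fst for one biallelic SNP and two populations.

    Assumed form: (H_T - H_S) / H_T with equal population weights.
    Returns None when both populations are fixed for the same allele,
    because H_T is zero and the ratio is undefined.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return None
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def fst_from_reads(pop1, pop2, min_reads, retain=True, discard_fixed=True):
    """Hypothetical per-SNP driver for the read-count data source.

    pop1, pop2: (ref_reads, alt_reads) totals for each population.
    Returns the Fst value, -1.0 for a retained SNP below the read bound,
    or None when the SNP would be dropped from the output table.
    """
    for ref, alt in (pop1, pop2):
        # A SNP with no reads at all can never yield a frequency.
        if ref + alt < max(min_reads, 1):
            return -1.0 if retain else None
    fst = wright_fst(allele_freq(*pop1), allele_freq(*pop2))
    if fst is None:
        # Both populations fixed for the same allele.
        return None if discard_fixed else 0.0
    return fst
```

For example, under these assumptions `fst_from_reads((10, 0), (0, 10), 5)` describes two populations fixed for different alleles, while `fst_from_reads((1, 1), (10, 10), 5)` falls below the read bound in the first population and yields the -1 marker.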