
--- a/ukseed_stage1.xml	Fri Apr 20 14:44:25 2018 -0400
+++ b/ukseed_stage1.xml	Sun Apr 22 04:44:44 2018 -0400
@@ -27,11 +27,17 @@
     </stdio>

 <help>
-In **UK-SeeD Data Analysis Infrastructure**, a BBSRC-Newton funded project, we have deployed an advanced computing hardware and software platform for the
-analysis of large genomics datasets for wheat varieties. The platform integrates computing resources and bioinformatics expertise to
-enable crop geneticists to implement sophisticated data analysis algorithms to improve the use of genetic resources for wheat and
-other important crops. The computing platform is distributed across the partners’ sites with hardware deployed at CIMMYT (Mexico) and
-the Earlham Institute (UK).
+This pipeline has been developed to load a DArT SNP or SilicoDArT report and apply filters to those datasets based on locus and
+individual call rates and locus reproducibility. It also exports the data to other formats such as GDS, PLINK bed and a text file
+suitable for loading into R, with a header line followed by one line per sample with V+6 fields, where V is the number of variants.
+Finally, the pipeline performs a Principal Coordinates Analysis (PCoA, also known as Multidimensional Scaling, MDS) to explore
+similarities in the data and outputs a vcf file suitable for visualizing in CurlyWhirly.
+
+In **UK-SeeD Data Analysis Infrastructure**, a BBSRC-Newton funded project, we have deployed an advanced computing hardware and software
+platform for the analysis of large genomics datasets for wheat varieties. The platform integrates computing resources and bioinformatics
+expertise to enable crop geneticists to implement sophisticated data analysis algorithms to improve the use of genetic resources for
+wheat and other important crops. The computing platform is distributed across the partners’ sites with hardware deployed at CIMMYT
+(Mexico) and the Earlham Institute (UK).

 |LOGOS|
--- a/ukseed_stage2.xml	Fri Apr 20 14:44:25 2018 -0400
+++ b/ukseed_stage2.xml	Sun Apr 22 04:44:44 2018 -0400
@@ -21,14 +21,11 @@
         <param format="csv,txt" name="input" type="data" label="Input file"
             help="Input file of genotype data"/>

-		<param name="gl_call_rate" type="float" value="0.75" label="gl_call_rate"
-			help="gl_call_rate"/>
+		<param name="gl_call_rate" type="float" value="0.75" label="Minimum call rate per locus"/>

-		<param name="gl_final" type="float" value="0.8" label="gl_final"
-			help="gl_final"/>
+		<param name="gl_final" type="float" value="0.8" label="Minimum call rate per individual"/>

-		<param name="gl_rep" type="float" value="0.98" label="gl_rep"
-			help="gl_rep"/>
+		<param name="gl_rep" type="float" value="0.98" label="Minimum locus reproducibility"/>
     </inputs>

     <outputs>
@@ -40,11 +37,17 @@
     </stdio>

 <help>
-In **UK-SeeD Data Analysis Infrastructure**, a BBSRC-Newton funded project, we have deployed an advanced computing hardware and software platform for the
-analysis of large genomics datasets for wheat varieties. The platform integrates computing resources and bioinformatics expertise to
-enable crop geneticists to implement sophisticated data analysis algorithms to improve the use of genetic resources for wheat and
-other important crops. The computing platform is distributed across the partners’ sites with hardware deployed at CIMMYT (Mexico) and
-the Earlham Institute (UK).
+This pipeline has been developed to load a DArT SNP or SilicoDArT report and apply filters to those datasets based on locus and
+individual call rates and locus reproducibility. It also exports the data to other formats such as GDS, PLINK bed and a text file
+suitable for loading into R, with a header line followed by one line per sample with V+6 fields, where V is the number of variants.
+Finally, the pipeline performs a Principal Coordinates Analysis (PCoA, also known as Multidimensional Scaling, MDS) to explore
+similarities in the data and outputs a vcf file suitable for visualizing in CurlyWhirly.
+
+In **UK-SeeD Data Analysis Infrastructure**, a BBSRC-Newton funded project, we have deployed an advanced computing hardware and software
+platform for the analysis of large genomics datasets for wheat varieties. The platform integrates computing resources and bioinformatics
+expertise to enable crop geneticists to implement sophisticated data analysis algorithms to improve the use of genetic resources for
+wheat and other important crops. The computing platform is distributed across the partners’ sites with hardware deployed at CIMMYT
+(Mexico) and the Earlham Institute (UK).

 |LOGOS|