view check_aln_design_file.xml @ 0:8b2027117ce5 draft default tip

"planemo upload for repository https://github.com/McIntyre-Lab/BayesASE/tree/main/galaxy commit b0be4c13808f3b2973aad4df5bfe87c6f41a3359"
author malex
date Thu, 14 Jan 2021 21:31:44 +0000
parents
children
line wrap: on
line source

<tool id="base_check_alignment_design_file" name="Check Alignment Design File" version="21.1.13">
    <description>for correct formatting and duplicate FASTQ names </description>
    <macros>
      <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command><![CDATA[
    check_aln_design_file.py
    --design=$design
    --logfile=$logfile
    --dups=$dups
]]></command>
    <inputs>
        <param name="design" type="data" format="tabular,tsv" label="Alignment Design file" help="Design file containing FASTQ file names, sampleIDs, etc. Refer to the help section at the bottom of this page for the format. [Required]"/>
    </inputs>
    <outputs>
        <data format="tabular" name="dups" label="${tool.name} on ${on_string}: FASTQ Duplicates"/>
        <data format="tabular" name="logfile" label="${tool.name} on ${on_string}: Alignment Design Criteria"/>
    </outputs>
    <tests>
        <test>
            <param name="design" ftype="data"     value="align_and_counts_test_data/alignment_design_file.tsv"/>
            <output name="dups"  ftype="data" file="align_and_counts_test_data/alignment_design_file_duplicates.tabular" />
            <output name="logfile"  ftype="data" file="align_and_counts_test_data/alignment_design_file_criteria.csv" />
        </test>
    </tests>
    <help><![CDATA[
**Tool Description**

This tool checks to make sure the Alignment design file is formatted correctly and has all needed headers. Also verifies that there are no duplicate FastQ file names.
Default values are already in place to save user time.

**NOTE:** There are two Design Files that must be created and supplied by the user. They are the *Alignment Design File* and the *Comparate Design File*.

All others (Sample and Priors) are created and used as intermediates by the BASE workflows.

**The design file should contain the following columns, in order:**

	(1) G1 - name of parental genome 1 for alignment
	(2) G2 - name of parental genome 2 for alignment
	(3) sampleID - sample identifier (no spaces)
	(4) fqName - name of the column containing the FASTQ file names, WITHOUT the extension
	(5) fqExtension - FASTQ file extension, for example, .fq or .fastq (NOT gzipped)
	(6) techRep - name of the column containing the technical replicates for each sampleID, for example, the same library run on different lanes.
	(7) readLength - the read length in base pairs

The sample identifier must contain the biological replicate number, and the comparate conditions to be tested in the Bayesian Model for Allelic Imbalance.

An example of a comparate condition is W1118_F, where W1118 is the genome and F refers to female. There must be at least two comparate conditions for Bayesian Analysis.

In the example below, there are two comparate condtions, W55_Mated and W55_Virgin, and E1 refers to the biological replicate number.

An example design file::

	G1	G2	sampleID		fqName			fqExtension	techRep	readLength
	W1118	W55	mel_W55_Mated_E1	mel_W55_Mated_E1_R1	.fq		1	150
	W1118	W55	mel_W55_Mated_E1	mel_W55_Mated_E1_R2	.fq		2	150
	W1118	W55	mel_W55_Mated_E1	mel_W55_Mated_E1_R3	.fq		3	150
	W1118	W55	mel_W55_Virgin_E1	mel_W55_Virgin_E1_R1	.fq		1	150
	W1118	W55	mel_W55_Virgin_E1	mel_W55_Virgin_E1_R2	.fq		2	150
	W1118	W55	mel_W55_Virgin_E1	mel_W55_Virgin_E1_R3	.fq		3	150

If using simulated reads, include the technical replicate column, but label the technical replicates with the same number

Example design file for simulated data ::

	G1	G2      sampleID   fqName	fqExtension	techRep	readLength
	W1118	W55      W55_M_1   SRR1989586_1     .fq		1	96
	W1118	W55      W55_M_2   SRR1989588_1     .fq		1	96
	W1118	W55      W55_V_1   SRR1989592_1     .fq		1	96
	W1118	W55      W55_V_2   SRR1989594_1     .fq		1	96

**Outputs**

	(1) a logfile with information about whether the column names are correct and if there are any duplicated FASTQ file names.
	(2) a text file containing a list of any duplicated FASTQ file names, if present.

    ]]></help>
    <citations>
            <citation type="bibtex">@ARTICLE{Miller20BASE,
            author = {Brecca Miller, Alison M. Morse, Elyse Borgert, Zihao Liu, Kelsey Sinclair, Gavin Gamble, Fei Zou, Jeremy Newman, Luis Leon Novello, Fabio Marroni, Lauren M. McIntyre},
            title = {Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele imbalance among conditions (BASE)},
            journal = {????},
            year = {submitted for publication}
            }</citation>
        </citations>
</tool>