<tool id='aegean_parseval' name='AEGeAn ParsEval' version='@TOOL_VERSION@' profile='20.01'>
    <description>compare two sets of gene annotations for the same sequence</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro='xrefs'/>
    <expand macro='edam_ontology'/>
    <expand macro='requirements'/>
    <version_command>parseval --version</version_command>
    <command detect_errors='exit_code'>
        <![CDATA[
            #if $output_type == 'html'
                mkdir -p '${output_html.extra_files_path}' &&
            #end if
            parseval '$referencegff3' '$predictiongff3'
            --delta $delta
            --maxtrans $maxtrans
            -w
            #if $refrlabel
                --refrlabel '$refrlabel'
            #end if
            #if $predlabel
                --predlabel '$predlabel'
            #end if
            #if $output_type == 'text'
                -f 'text'
                -o '$output_txt'
            #else if $output_type == 'html'
                -f 'html'
                -o '${output_html.files_path}' &&
                echo "</div> </body> </html>" >> '${output_html.files_path}'/index.html &&
                cp '${output_html.files_path}'/index.html '$output_html'
            #end if
            ]]>
    </command>
    <inputs>
        <param name='referencegff3' type='data'  format='gff3' label="Reference GFF3 file"/>
        <param name='predictiongff3' type='data'  format='gff3' label="Prediction GFF3 file"/>
        <param argument='--delta' type='integer'
            min='0' max='20' value='0'
            label='Number of nucleotides to extend gene loci'/>
        <param argument='--maxtrans' type='integer'
            min='1' max='50' value='32'
            label='Maximum number of transcripts allowed per locus'/>
        <param name='output_type' type='select'
            label='Select the output type'>
            <option value='text'>Text</option>
            <option value='html'>HTML</option>
        </param>
        <param argument='--refrlabel' type='text'
            value='' optional='true'
            label='Reference annotation label'
            help='Optional label for reference annotations'/>
        <param argument='--predlabel' type='text'
            value='' optional='true'
            label='Prediction annotation label'
            help='Optional label for prediction annotations'/>
    </inputs>
    <outputs>
        <data name='output_txt' format='txt'>
            <filter>output_type == 'text'</filter>
        </data>
        <data name='output_html' format='html' from_work_dir="output.html">
            <filter>output_type == 'html'</filter>
        </data>
    </outputs>
    <tests>
        <test expect_num_outputs="1">
            <param name='referencegff3' value='TAIR9_GFF3_genes.gff'/>
            <param name='predictiongff3' value='TAIR10_GFF3_genes.gff'/>
            <param name='delta' value='5'/>
            <param name='maxtrans' value='20'/>
            <output name='output_txt' file='parseval_output_test1.txt' lines_diff='8'/>
        </test>
        <test expect_num_outputs="1">
            <param name='referencegff3' value='TAIR9_GFF3_genes.gff'/>
            <param name='predictiongff3' value='TAIR10_GFF3_genes.gff'/>
            <output name='output_txt' file='parseval_output_test2.txt' lines_diff='8'/>
        </test>
        <test expect_num_outputs="1">
            <param name='referencegff3' value='TAIR9_GFF3_genes.gff'/>
            <param name='predictiongff3' value='TAIR10_GFF3_genes.gff'/>
            <param name='delta' value='10'/>
            <param name='maxtrans' value='10'/>
            <output name='output_txt' file='parseval_output_test3.txt' lines_diff='8'/>
        </test>
        <test expect_num_outputs="1">
            <param name='referencegff3' value='TAIR9_GFF3_genes.gff'/>
            <param name='predictiongff3' value='TAIR10_GFF3_genes.gff'/>
            <param name='output_type' value='html'/>
            <output name='output_html' file='parseval_output_test4.html' lines_diff='8'/>
        </test>
        <test expect_num_outputs="1">
            <param name='referencegff3' value='TAIR9_GFF3_genes.gff'/>
            <param name='predictiongff3' value='TAIR10_GFF3_genes.gff'/>
            <param name='output_type' value='html'/>
            <param name='delta' value='10'/>
            <param name='maxtrans' value='10'/>
            <output name='output_html' file='parseval_output_test5.html' lines_diff='8'/>
        </test>
        <test expect_num_outputs="1">
            <param name='referencegff3' value='TAIR9_GFF3_genes.gff'/>
            <param name='predictiongff3' value='TAIR10_GFF3_genes.gff'/>
            <param name='output_type' value='text'/>
            <param name='refrlabel' value='example_ref_label'/>
            <param name='predlabel' value='example_pred_label'/>
            <output name='output_txt' file='parseval_output_test6.txt' lines_diff='8'/>
        </test>
    </tests>
    <help>
        <![CDATA[
.. class:: infomark

**Purpose**

ParsEval is a program for comparing two sets of gene annotations for the same sequence. The most common use cases for ParsEval are as follows.

- You are annotating a newly assembled genome. The optimal parameter settings for annotation are not clear initially, so you do some exploratory data analysis and try several different parameter settings. You can use ParsEval to identify the similarities and differences between the different annotations you have produced.

- You are doing a genome-wide analysis of genes in your favorite organism. There is a gene annotation available from the consortium that sequenced and assembled the genome, but there is a different annotation available at NCBI. Here too, ParsEval lets you compare the two annotations and quickly identify their similarities and differences.

-----

.. class:: infomark

**Input**

ParsEval takes two sets of annotations in GFF3 format as input. It uses the GenomeTools GFF3 parser, which strictly enforces the syntax rules laid out in the GFF3 specification. ParsEval itself performs some additional checks on the data to make sure valid comparisons are possible. Any features not directly related to protein-coding genes are ignored.

ParsEval will infer features implicitly encoded in the data. For example, if a gene annotation declares 6 exon features but no intron features, ParsEval will infer the 5 corresponding intron features from the exon boundaries. However, if ParsEval sees any intron features in a gene model it will assume all introns are declared explicitly. Violations of that assumption will likely elicit a program error.
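
As an illustration of this inference, consider a hypothetical gene with three exons and no declared introns. Assuming introns span exactly the gaps between adjacent exons (GFF3 coordinates are 1-based and inclusive), the two inferred introns would be as follows. The coordinates are made up for this example::

    exon     1001-1200
    exon     1501-1700
    exon     2001-2300

    intron   1201-1500   (inferred)
    intron   1701-2000   (inferred)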

ParsEval is flexible in handling the common conventions for encoding gene structure: exons + start/stop codons, exons + CDS, CDS + UTRs, etc. Any subset of features that completely captures the gene’s exon/intron structure, CDS(s), and UTRs should be handled correctly.

ParsEval requires that gene isoforms be encoded using the feature type mRNA (as opposed to transcript, primary_transcript, or other valid SO terms). For mRNA features lacking an explicitly declared gene parent, ParsEval will create one. Note, however, that ParsEval will treat each such transcript as belonging to a separate gene, which will erroneously inflate the summary statistics it reports.

-----

.. class:: infomark

**Output**

ParsEval output includes a variety of similarity statistics that measure the agreement between the two annotations. Our use of *agreement* here instead of *accuracy* is intentional: except in rare cases, you will not be comparing a prediction to a true high-quality “gold standard.” It is much more common to compare two annotation sets whose relative quality is unknown. ParsEval uses the terms *reference* and *prediction* only to distinguish the two sets: it makes no assumptions as to their relative quality.
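
As a rough guide, a text-format run of this tool corresponds to a command line like the one below. The file names and label values are placeholders chosen for illustration; the options are the ones this wrapper passes to ParsEval, with ``--delta`` and ``--maxtrans`` shown at the wrapper's default values::

    parseval reference.gff3 prediction.gff3 \
        --delta 0 --maxtrans 32 -w \
        --refrlabel 'my_reference' --predlabel 'my_prediction' \
        -f text -o parseval_report.txt

Selecting HTML output switches ``-f text`` to ``-f html`` and writes the report to a directory instead of a single file.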


]]>
    </help>
    <expand macro='citations'/>
</tool>