# HG changeset patch
# User bcrain-completegenomics
# Date 1339689887 14400
# Node ID e8bedf1cbdb33d97e35d3c526b0631678d293a41
# Parent 26c4a8289928d05d1f1d49662b6f1284b0634f37
Deleted selected files

diff -r 26c4a8289928 -r e8bedf1cbdb3 cgatools/tool-data/cg_crr_files.loc.sample
--- a/cgatools/tool-data/cg_crr_files.loc.sample Tue Jun 12 13:16:21 2012 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,11 +0,0 @@
-#This is a sample file distributed with Galaxy that enables tools
-#to use .crr reference files. You will need to download or create
-#the .crr reference files and then create a cg_crr_files.loc file
-#similar to this one (store it in this directory) that points to
-#the location of the files. The cg_crr_files.loc
-#file has this format (white space characters are TAB characters):
-#
-#
-#
-#hg19 hg19 hg19.crr /Users/bcrain/Documents/hg19.crr
-
diff -r 26c4a8289928 -r e8bedf1cbdb3 cgatools/tool_data_table_conf.xml.sample
--- a/cgatools/tool_data_table_conf.xml.sample Tue Jun 12 13:16:21 2012 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,9 +0,0 @@
-
-
-
-        value, dbkey, name, path
-
-
-
diff -r 26c4a8289928 -r e8bedf1cbdb3 cgatools/tools/cgatools/calldiff.xml
--- a/cgatools/tools/cgatools/calldiff.xml Tue Jun 12 13:16:21 2012 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,343 +0,0 @@
-
-
-  compares two Complete Genomics variant files.
-
-
-    cgatools
-
-
-
-    cgatools calldiff --beta
-    --reference ${crr.fields.path}
-    --variantsA $data_sources.inputA
-    --variantsB $data_sources.inputB
-    $validation
-    $diploid
-    --locus-stats-column-count $column
-    --max-hypothesis-count $hypothesis
-    --output-prefix cg_
-    --reports `echo ${report1} ${report2} ${report3} ${report4} ${report5} ${somatic.report6} | sed 's/ */,/g'`
-    #if $somatic.report6 == "SomaticOutput"
-    --genome-rootA $somatic.genomeA
-    --genome-rootB $somatic.genomeB
-    --calibration-root $somatic.calibration
-    #end if
-
-
-
-
-      (report1 == 'SuperlocusOutput')
-
-
-      (report2 == 'SuperlocusStats')
-
-
-      (report3 == 'LocusOutput')
-
-
-      (report4 == 'LocusStats')
-
-
-      (report5 == 'VariantOutput')
-
-
-      (report5 == 'VariantOutput')
-
-
-      (somatic['report6'] == 'SomaticOutput')
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-This tool compares two Complete Genomics variant files.
-
-cgatools: http://sourceforge.net/projects/cgatools/files/
-
------
-
-**cgatools Manual**::
-
-  COMMAND NAME
-      calldiff - Compares two Complete Genomics variant files.
-
-  DESCRIPTION
-      Compares two Complete Genomics variant files. Divides the genome up into
-      superloci of nearby variants, then compares the superloci. Also refines the
-      comparison to determine per-call or per-locus comparison results.
-
-      Comparison results are usually described by a semi-colon separated string,
-      one per allele. Each allele's comparison result is one of the following
-      classifications:
-
-          ref-identical    The alleles of the two variant files are identical, and
-                           they are consistent with the reference.
-          alt-identical    The alleles of the two variant files are identical, and
-                           they are inconsistent with the reference.
-          ref-consistent   The alleles of the two variant files are consistent,
-                           and they are consistent with the reference.
-          alt-consistent   The alleles of the two variant files are consistent,
-                           and they are inconsistent with the reference.
-          onlyA            The alleles of the two variant files are inconsistent,
-                           and only file A is inconsistent with the reference.
-          onlyB            The alleles of the two variant files are inconsistent,
-                           and only file B is inconsistent with the reference.
-          mismatch         The alleles of the two variant files are inconsistent,
-                           and they are both inconsistent with the reference.
-          phase-mismatch   The two variant files would be consistent if the
-                           hapLink field had been empty, but they are
-                           inconsistent.
-          ploidy-mismatch  The superlocus did not have uniform ploidy.
-
-      In some contexts, this classification is rolled up into a simplified
-      classification, which is one of "identical", "consistent", "onlyA",
-      "onlyB", or "mismatch".
-
-      A good place to start looking at the results is the superlocus-output file.
-      It has columns defined as follows:
-
-          SuperlocusId     An identifier given to the superlocus.
-          Chromosome       The name of the chromosome.
-          Begin            The 0-based offset of the start of the superlocus.
-          End              The 0-based offset of the base one past the end of the
-                           superlocus.
-          Classification   The match classification of the superlocus.
-          Reference        The reference sequence.
-          AllelesA         A semicolon-separated list of the alleles (one per
-                           haplotype) for variant file A, for the phasing with the
-                           best comparison result.
-          AllelesB         A semicolon-separated list of the alleles (one per
-                           haplotype) for variant file B, for the phasing with the
-                           best comparison result.
-
-      The locus-output file contains, for each locus in file A and file B that is
-      not consistent with the reference, an annotated set of calls for the locus.
-      The calls are annotated with the following columns:
-
-          SuperlocusId             The id of the superlocus containing the locus.
-          File                     The variant file (A or B).
-          LocusClassification      The locus classification is determined by the
-                                   varType column of the call that is inconsistent
-                                   with the reference, concatenated with a
-                                   modifier that describes whether the locus is
-                                   heterozygous, homozygous, or contains no-calls.
-                                   If there is no one variant in the locus (i.e.,
-                                   it is heterozygous alt-alt), the locus
-                                   classification begins with "other".
-          LocusDiffClassification  The match classification for the locus. This is
-                                   defined to be the best of the comparison of the
-                                   locus to the same region in the other file, or
-                                   the comparison of the superlocus.
-
-      The somatic output file contains a list of putative somatic variations of
-      genome A. The output includes only those loci that can be classified as
-      snp, del, ins or sub in file A, and are called reference in the file B.
-      Every locus is annotated with the following columns:
-
-          VarCvgA                  The totalReadCount from file A for this locus
-                                   (computed on the fly if file A is not a
-                                   masterVar file).
-          VarScoreA                The varScoreVAF from file A, or varScoreEAF if
-                                   the "--diploid" option is used.
-          RefCvgB                  The maximum of the uniqueSequenceCoverage
-                                   values for the locus in genome B.
-          RefScoreB                Minimum of the reference scores of the locus in
-                                   genome B.
-          SomaticCategory          The category used for determining the
-                                   calibrated scores and the SomaticRank.
-          VarScoreACalib           The calibrated variant score of file A, under
-                                   the model selected by using or not using the
-                                   "--diploid" option, and corrected for the count
-                                   of heterozygous variants observed in this
-                                   genome. See user guide for more information.
-          VarScoreBCalib           The calibrated reference score of file B, under
-                                   the model selected by using or not using the
-                                   "--diploid" option, and corrected for the count
-                                   of heterozygous variants observed in this
-                                   genome. See user guide for more information.
-          SomaticRank              The estimated rank of this somatic mutation,
-                                   amongst all true somatic mutations within this
-                                   SomaticCategory. The value is a number between
-                                   0 and 1; a value of 0.012 means, for example,
-                                   that an estimated 1.2% of the true somatic
-                                   mutations in this somaticCategory have a
-                                   somaticScore less than the somaticScore for
-                                   this mutation. See user guide for more
-                                   information.
-          SomaticScore             An integer that provides a total order on
-                                   quality for all somatic mutations. It is equal
-                                   to -10*log10( P(false)/P(true) ), under the
-                                   assumption that this genome has a rate of
-                                   somatic mutation equal to 1/Mb for
-                                   SomaticCategory snp, 1/10Mb for SomaticCategory
-                                   ins, 1/10Mb for SomaticCategory del, and 1/20Mb
-                                   for SomaticCategory sub. The computation is
-                                   based on the assumptions described in the user
-                                   guide, and is affected by choice of variant
-                                   model selected by using or not using the
-                                   "--diploid" option.
-          SomaticQuality           Equal to VQHIGH for all somatic mutations where
-                                   SomaticScore >= -10. Otherwise, this column is
-                                   empty.
-
-  OPTIONS
-      -h [ --help ]
-          Print this help message.
-
-      --reference arg
-          The input crr file.
-
-      --variantsA arg
-          The "A" input variant file.
-
-      --variantsB arg
-          The "B" input variant file.
-
-      --output-prefix arg
-          The path prefix for all output reports.
-
-      --reports arg (=SuperlocusOutput,SuperlocusStats,LocusOutput,LocusStats)
-          Comma-separated list of reports to generate. (Beware any reports whose
-          name begins with "Debug".) A report is one of:
-              SuperlocusOutput       Report for superlocus classification.
-              SuperlocusStats        Report for superlocus classification stats.
-              LocusOutput            Report for locus classification.
-              LocusStats             Report for locus stats.
-              VariantOutput          Both variant files annotated by comparison
-                                     results.If the somatic output report is
-                                     requested, file A is also annotated with the
-                                     same score ranks as produced in that report.
-              SomaticOutput          Report for the list of simple variations that
-                                     are present only in file "A", annotated with
-                                     the score that indicates the probability of
-                                     the variation being truly somatic. Requires
-                                     beta, genome-rootA, and genome-rootB options
-                                     to be provided as well. Note: generating this
-                                     report slows calldiff by 10x-20x.
-              DebugCallOutput        Report for call classification.
-              DebugSuperlocusOutput  Report for debug superlocus information.
-              DebugSomaticOutput     Report for distribution estimates used for
-                                     somatic rescoring. Only produced if
-                                     SomaticOutput is also turned on.
-
-      --diploid
-          Uses varScoreEAF instead of varScoreVAF in somatic score computations.
-          Also, uses diploid variant model instead of variable allele mixture
-          model.
-
-      --locus-stats-column-count arg (=15)
-          The number of columns for locus compare classification in the locus
-          stats file.
-
-      --max-hypothesis-count arg (=32)
-          The maximum number of possible phasings to consider for a superlocus.
-
-      --no-reference-cover-validation
-          Turns off validation that all bases of a chromosome are covered by
-          calls of the variant file.
-
-      --genome-rootA arg
-          The "A" genome directory, for example /data/GS00118-DNA_A01; this
-          directory is expected to contain ASM/REF and ASM/EVIDENCE
-          subdirectories.
-
-      --genome-rootB arg
-          The "B" genome directory.
-
-      --calibration-root arg
-          The directory containing calibration data. For example, there should
-          exist a file calibration-root/0.0.0/metrics.tsv.
-
-      --beta
-          This flag enables the SomaticOutput report, which is beta
-          functionality.
-
-  SUPPORTED FORMAT_VERSION
-      0.3 or later
-
-
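
The removed help text defines SomaticScore as -10*log10( P(false)/P(true) ) and marks a call VQHIGH when SomaticScore >= -10. The following is a minimal sketch of that arithmetic only, not cgatools code: the function names are hypothetical, and it assumes P(false) = 1 - P(true), which the help text does not state. The help text says the score is an integer but does not describe the rounding, so the sketch leaves the value unrounded.

```python
import math


def somatic_score(p_true: float) -> float:
    """Return -10*log10(P(false)/P(true)).

    Assumption (not from the help text): P(false) = 1 - P(true).
    """
    if not 0.0 < p_true < 1.0:
        raise ValueError("p_true must be strictly between 0 and 1")
    return -10.0 * math.log10((1.0 - p_true) / p_true)


def p_true_from_score(score: float) -> float:
    """Invert the score back to P(true) under the same assumption."""
    odds_false = 10.0 ** (-score / 10.0)  # P(false)/P(true)
    return 1.0 / (1.0 + odds_false)


def somatic_quality(score: float) -> str:
    """VQHIGH when SomaticScore >= -10, otherwise empty (as in the help text)."""
    return "VQHIGH" if score >= -10 else ""
```

Under that assumption, the VQHIGH cutoff of -10 corresponds to odds P(false)/P(true) of 10, that is P(true) of 1/11, and a score of 0 corresponds to even odds.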