# HG changeset patch
# User devteam
# Date 1400511582 14400
# Node ID 4414f07398086616acbf64502a43b55cf877353f
Imported from capsule None
diff -r 000000000000 -r 4414f0739808 annotation_profiler.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/annotation_profiler.xml Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,147 @@
+
+ for a set of genomic intervals
+
+ bx-python
+
+ annotation_profiler_for_interval.py -i $input1 -c ${input1.metadata.chromCol} -s ${input1.metadata.startCol} -e ${input1.metadata.endCol} -o $out_file1 $keep_empty -p ${GALAXY_DATA_INDEX_DIR}/annotation_profiler/$dbkey $summary -b 3 -t $table_names
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+Takes an input set of intervals and, for each interval, determines the base coverage of the interval by a set of features (tables) available from UCSC. Regions within each feature table have been merged by overlap or direct adjacency (e.g. a table having the ranges 1-10, 6-12, 12-20 and 25-28 results in two merged ranges: 1-20 and 25-28).
+
+By default, this tool checks the coverage of your intervals against all available features; you may, however, select only the tables you want to include. Selecting a section heading selects all of its children.
+
+You may alternatively choose to receive a summary across all of the intervals that you provide.
+
+-----
+
+**Example**
+
+Using the interval below and selecting several tables::
+
+ chr1 4558 14764 uc001aab.1 0 -
+
+results in::
+
+ chr1 4558 14764 uc001aab.1 0 - snp126Exceptions 151 142
+ chr1 4558 14764 uc001aab.1 0 - genomicSuperDups 10206 1
+ chr1 4558 14764 uc001aab.1 0 - chainOryLat1 3718 1
+ chr1 4558 14764 uc001aab.1 0 - multiz28way 10206 1
+ chr1 4558 14764 uc001aab.1 0 - affyHuEx1 3553 32
+ chr1 4558 14764 uc001aab.1 0 - netXenTro2 3050 1
+ chr1 4558 14764 uc001aab.1 0 - intronEst 10206 1
+ chr1 4558 14764 uc001aab.1 0 - xenoMrna 10203 1
+ chr1 4558 14764 uc001aab.1 0 - ctgPos 10206 1
+ chr1 4558 14764 uc001aab.1 0 - clonePos 10206 1
+ chr1 4558 14764 uc001aab.1 0 - chainStrPur2Link 1323 29
+ chr1 4558 14764 uc001aab.1 0 - affyTxnPhase3HeLaNuclear 9011 8
+ chr1 4558 14764 uc001aab.1 0 - snp126orthoPanTro2RheMac2 61 58
+ chr1 4558 14764 uc001aab.1 0 - snp126 205 192
+ chr1 4558 14764 uc001aab.1 0 - chainEquCab1 10206 1
+ chr1 4558 14764 uc001aab.1 0 - netGalGal3 3686 1
+ chr1 4558 14764 uc001aab.1 0 - phastCons28wayPlacMammal 10172 3
+
+Where::
+
+ The first added column is the table name.
+ The second added column is the number of bases covered by the table.
+ The third added column is the number of regions from the table that overlap the interval.
+
+Alternatively, requesting a summary for the intervals below and selecting several tables::
+
+ chr1 4558 14764 uc001aab.1 0 -
+ chr1 4558 19346 uc001aac.1 0 -
+
+results in::
+
+ #tableName tableSize tableRegionCount allIntervalCount allIntervalSize allCoverage allTableRegionsOverlaped allIntervalsOverlapingTable nrIntervalCount nrIntervalSize nrCoverage nrTableRegionsOverlaped nrIntervalsOverlapingTable
+ snp126Exceptions 133601 92469 2 24994 388 359 2 1 14788 237 217 1
+ genomicSuperDups 12268847 657 2 24994 24994 2 2 1 14788 14788 1 1
+ chainOryLat1 70337730 2542 2 24994 7436 2 2 1 14788 3718 1 1
+ affyHuEx1 15703901 112274 2 24994 7846 70 2 1 14788 4293 38 1
+ netXenTro2 111440392 1877 2 24994 6100 2 2 1 14788 3050 1 1
+ snp126orthoPanTro2RheMac2 700436 690674 2 24994 124 118 2 1 14788 63 60 1
+ intronEst 135796064 2332 2 24994 24994 2 2 1 14788 14788 1 1
+ xenoMrna 129031327 1586 2 24994 20406 2 2 1 14788 10203 1 1
+ snp126 956976 838091 2 24994 498 461 2 1 14788 293 269 1
+ clonePos 224999719 39 2 24994 24994 2 2 1 14788 14788 1 1
+ chainStrPur2Link 7948016 119841 2 24994 2646 58 2 1 14788 1323 29 1
+ affyTxnPhase3HeLaNuclear 136797870 140244 2 24994 22601 17 2 1 14788 13590 9 1
+ multiz28way 225928588 38 2 24994 24994 2 2 1 14788 14788 1 1
+ ctgPos 224999719 39 2 24994 24994 2 2 1 14788 14788 1 1
+ chainEquCab1 246306414 141 2 24994 24994 2 2 1 14788 14788 1 1
+ netGalGal3 203351973 461 2 24994 7372 2 2 1 14788 3686 1 1
+ phastCons28wayPlacMammal 221017670 22803 2 24994 24926 6 2 1 14788 14754 3 1
+
+Where::
+
+ tableName is the name of the table
+ tableChromosomeCoverage is the number of positions covered by the table, counting only the chromosomes referenced by the interval file
+ tableChromosomeCount is the number of regions in the table, counting only the chromosomes referenced by the interval file
+ tableRegionCoverage is the number of positions covered by the table between the start of the first and the end of the last interval on each chromosome referenced by the interval file
+ tableRegionCount is the number of table regions between the start of the first and the end of the last interval on each chromosome referenced by the interval file
+
+ allIntervalCount is the number of provided intervals
+ allIntervalSize is the sum of the lengths of the provided intervals
+ allCoverage is the sum, over all provided intervals, of the number of bases covered by the table
+ allTableRegionsOverlapped is the sum of the number of regions of the table (non-unique) that were overlapped for each interval
+ allIntervalsOverlappingTable is the number of provided intervals which overlap the table
+
+ nrIntervalCount is the number of non-redundant intervals
+ nrIntervalSize is the sum of the lengths of non-redundant intervals
+ nrCoverage is the sum of the coverage of non-redundant intervals
+ nrTableRegionsOverlapped is the number of regions of the table (unique) that were overlapped by the non-redundant intervals
+ nrIntervalsOverlappingTable is the number of non-redundant intervals which overlap the table
+
+
+.. class:: infomark
+
+**TIP:** non-redundant (nr) refers to the set of intervals that remains after the provided intervals have been merged to resolve overlaps (directly adjacent intervals are merged as well).
+
+------
+
+**Citation**
+
+For the underlying data, please see http://genome.ucsc.edu/cite.html for the proper citation.
+
+If you use this tool in Galaxy, please cite Blankenberg D, et al. *In preparation.*
+
+
+
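The tool help above says that regions within each UCSC table are merged by overlap or direct adjacency before coverage is computed. The index builder does this by setting ranges in bx-python BitSets; as a plainer illustration, the following is a minimal sort-and-sweep sketch of the same rule. It assumes half-open (start, end) coordinates, so ranges that share an endpoint (6-12 and 12-20) count as adjacent. `merge_ranges` is a hypothetical helper, not part of the tool.

```python
def merge_ranges(ranges):
    """Merge overlapping or directly adjacent half-open (start, end) ranges.

    Hypothetical helper for illustration. Ranges are sorted by start; any
    range whose start is <= the end of the current merged range is folded
    into it, otherwise it begins a new merged range.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the current merged range.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


if __name__ == "__main__":
    # The example ranges from the tool help text.
    print(merge_ranges([(1, 10), (6, 12), (12, 20), (25, 28)]))  # [(1, 20), (25, 28)]
```

A BitSet gives the same result without sorting, at the cost of memory proportional to chromosome length rather than to the number of ranges.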
diff -r 000000000000 -r 4414f0739808 annotation_profiler_for_interval.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/annotation_profiler_for_interval.py Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,358 @@
+#!/usr/bin/env python
+#Dan Blankenberg
+#For a set of intervals, this tool returns the same set of intervals with 3
+#additional fields: the name of a Table/Feature, the number of bases covered,
+#and the number of Table/Feature regions overlapped. The original intervals are repeated for each Table/Feature.
+
+import sys, struct, optparse, os, random
+import bx.intervals.io
+import bx.bitset
+try:
+ import psyco
+ psyco.full()
+except:
+ pass
+
+assert sys.version_info[:2] >= ( 2, 4 )
+
+class CachedRangesInFile:
+ DEFAULT_STRUCT_FORMAT = ' self._coverage[-1][1]:
+ return len( self._coverage ) - 1
+ i = 0
+ j = len( self._coverage) - 1
+ while i < j:
+ k = ( i + j ) / 2
+ if start <= self._coverage[k][1]:
+ j = k
+ else:
+ i = k + 1
+ return i
+ def get_coverage( self, start, end ):
+ return self.get_coverage_regions_overlap( start, end )[0]
+ def get_coverage_regions_overlap( self, start, end ):
+ return self.get_coverage_regions_index_overlap( start, end )[0:2]
+ def get_coverage_regions_index_overlap( self, start, end ):
+ if len( self._coverage ) < 1 or start > self._coverage[-1][1] or end < self._coverage[0][0]:
+ return 0, 0, 0
+ if self._total_coverage and start <= self._coverage[0][0] and end >= self._coverage[-1][1]:
+ return self._total_coverage, len( self._coverage ), 0
+ coverage = 0
+ region_count = 0
+ start_index = self.get_start_index( start )
+ for i in xrange( start_index, len( self._coverage ) ):
+ c_start, c_end = self._coverage[i]
+ if c_start > end:
+ break
+ if c_start <= end and c_end >= start:
+ coverage += min( end, c_end ) - max( start, c_start )
+ region_count += 1
+ return coverage, region_count, start_index
+
+class CachedCoverageReader:
+ def __init__( self, base_file_path, buffer = 10, table_names = None, profiler_info = None ):
+ self._base_file_path = base_file_path
+ self._buffer = buffer #number of chromosomes to keep in memory at a time
+ self._coverage = {}
+ if table_names is None: table_names = [ table_dir for table_dir in os.listdir( self._base_file_path ) if os.path.isdir( os.path.join( self._base_file_path, table_dir ) ) ]
+ for tablename in table_names: self._coverage[tablename] = {}
+ if profiler_info is None: profiler_info = {}
+ self._profiler_info = profiler_info
+ def iter_table_coverage_by_region( self, chrom, start, end ):
+ for tablename, coverage, regions in self.iter_table_coverage_regions_by_region( chrom, start, end ):
+ yield tablename, coverage
+ def iter_table_coverage_regions_by_region( self, chrom, start, end ):
+ for tablename, coverage, regions, index in self.iter_table_coverage_regions_index_by_region( chrom, start, end ):
+ yield tablename, coverage, regions
+ def iter_table_coverage_regions_index_by_region( self, chrom, start, end ):
+ for tablename, chromosomes in self._coverage.iteritems():
+ if chrom not in chromosomes:
+ if len( chromosomes ) >= self._buffer:
+ #randomly remove one chromosome from this table
+ del chromosomes[ chromosomes.keys().pop( random.randint( 0, self._buffer - 1 ) ) ]
+ chromosomes[chrom] = RegionCoverage( os.path.join ( self._base_file_path, tablename, chrom ), self._profiler_info )
+ coverage, regions, index = chromosomes[chrom].get_coverage_regions_index_overlap( start, end )
+ yield tablename, coverage, regions, index
+
+class TableCoverageSummary:
+ def __init__( self, coverage_reader, chrom_lengths ):
+ self.coverage_reader = coverage_reader
+ self.chrom_lengths = chrom_lengths
+ self.chromosome_coverage = {} #dict of bitset by chromosome holding user's collapsed input intervals
+ self.total_interval_size = 0 #total size of user's input intervals
+ self.total_interval_count = 0 #total number of user's input intervals
+ self.table_coverage = {} #dict of total coverage by user's input intervals by table
+ self.table_chromosome_size = {} #dict of dict of table:chrom containing total coverage of table for a chrom
+ self.table_chromosome_count = {} #dict of dict of table:chrom containing total number of coverage ranges of table for a chrom
+ self.table_regions_overlaped_count = {} #total number of table regions overlapping user's input intervals (non-unique)
+ self.interval_table_overlap_count = {} #total number of user input intervals which overlap table
+ self.region_size_errors = {} #dictionary of lists of invalid ranges by chromosome
+ def add_region( self, chrom, start, end ):
+ chrom_length = self.chrom_lengths.get( chrom )
+ region_start = min( start, chrom_length )
+ region_end = min( end, chrom_length )
+ region_length = region_end - region_start
+
+ if region_length < 1 or region_start != start or region_end != end:
+ if chrom not in self.region_size_errors:
+ self.region_size_errors[chrom] = []
+ self.region_size_errors[chrom].append( ( start, end ) )
+ if region_length < 1: return
+
+ self.total_interval_size += region_length
+ self.total_interval_count += 1
+ if chrom not in self.chromosome_coverage:
+ self.chromosome_coverage[chrom] = bx.bitset.BitSet( chrom_length )
+
+ self.chromosome_coverage[chrom].set_range( region_start, region_length )
+ for table_name, coverage, regions in self.coverage_reader.iter_table_coverage_regions_by_region( chrom, region_start, region_end ):
+ if table_name not in self.table_coverage:
+ self.table_coverage[table_name] = 0
+ self.table_chromosome_size[table_name] = {}
+ self.table_regions_overlaped_count[table_name] = 0
+ self.interval_table_overlap_count[table_name] = 0
+ self.table_chromosome_count[table_name] = {}
+ if chrom not in self.table_chromosome_size[table_name]:
+ self.table_chromosome_size[table_name][chrom] = self.coverage_reader._coverage[table_name][chrom]._total_coverage
+ self.table_chromosome_count[table_name][chrom] = len( self.coverage_reader._coverage[table_name][chrom]._coverage )
+ self.table_coverage[table_name] += coverage
+ if coverage:
+ self.interval_table_overlap_count[table_name] += 1
+ self.table_regions_overlaped_count[table_name] += regions
+ def iter_table_coverage( self ):
+ def get_nr_coverage():
+ #returns non-redundant coverage, where user's input intervals have been collapsed to resolve overlaps
+ table_coverage = {} #dictionary of tables containing number of table bases overlapped by nr intervals
+ interval_table_overlap_count = {} #dictionary of tables containing number of nr intervals overlapping table
+ table_regions_overlap_count = {} #dictionary of tables containing number of regions overlapped (unique)
+ interval_count = 0 #total number of nr intervals
+ interval_size = 0 #holds total size of nr intervals
+ region_start_end = {} #holds absolute start,end for each user input chromosome
+ for chrom, chromosome_bitset in self.chromosome_coverage.iteritems():
+ #loop through user's collapsed input intervals
+ end = 0
+ last_end_index = {}
+ interval_size += chromosome_bitset.count_range()
+ while True:
+ if end >= chromosome_bitset.size: break
+ start = chromosome_bitset.next_set( end )
+ if start >= chromosome_bitset.size: break
+ end = chromosome_bitset.next_clear( start )
+ interval_count += 1
+ if chrom not in region_start_end:
+ region_start_end[chrom] = [start, end]
+ else:
+ region_start_end[chrom][1] = end
+ for table_name, coverage, region_count, start_index in self.coverage_reader.iter_table_coverage_regions_index_by_region( chrom, start, end ):
+ if table_name not in table_coverage:
+ table_coverage[table_name] = 0
+ interval_table_overlap_count[table_name] = 0
+ table_regions_overlap_count[table_name] = 0
+ table_coverage[table_name] += coverage
+ if coverage:
+ interval_table_overlap_count[table_name] += 1
+ table_regions_overlap_count[table_name] += region_count
+ if table_name in last_end_index and last_end_index[table_name] == start_index:
+ table_regions_overlap_count[table_name] -= 1
+ last_end_index[table_name] = start_index + region_count - 1
+ table_region_coverage = {} #total coverage for tables by bounding nr interval region
+ table_region_count = {} #total number for tables by bounding nr interval region
+ for chrom, start_end in region_start_end.items():
+ for table_name, coverage, region_count in self.coverage_reader.iter_table_coverage_regions_by_region( chrom, start_end[0], start_end[1] ):
+ if table_name not in table_region_coverage:
+ table_region_coverage[table_name] = 0
+ table_region_count[table_name] = 0
+ table_region_coverage[table_name] += coverage
+ table_region_count[table_name] += region_count
+ return table_region_coverage, table_region_count, interval_count, interval_size, table_coverage, table_regions_overlap_count, interval_table_overlap_count
+ table_region_coverage, table_region_count, nr_interval_count, nr_interval_size, nr_table_coverage, nr_table_regions_overlap_count, nr_interval_table_overlap_count = get_nr_coverage()
+ for table_name in self.table_coverage:
+ #TODO: determine a type of statistic, then calculate and report here
+ yield table_name, sum( self.table_chromosome_size.get( table_name, {} ).values() ), sum( self.table_chromosome_count.get( table_name, {} ).values() ), table_region_coverage.get( table_name, 0 ), table_region_count.get( table_name, 0 ), self.total_interval_count, self.total_interval_size, self.table_coverage[table_name], self.table_regions_overlaped_count.get( table_name, 0), self.interval_table_overlap_count.get( table_name, 0 ), nr_interval_count, nr_interval_size, nr_table_coverage[table_name], nr_table_regions_overlap_count.get( table_name, 0 ), nr_interval_table_overlap_count.get( table_name, 0 )
+
+def profile_per_interval( interval_filename, chrom_col, start_col, end_col, out_filename, keep_empty, coverage_reader ):
+ out = open( out_filename, 'wb' )
+ for region in bx.intervals.io.NiceReaderWrapper( open( interval_filename, 'rb' ), chrom_col = chrom_col, start_col = start_col, end_col = end_col, fix_strand = True, return_header = False, return_comments = False ):
+ for table_name, coverage, region_count in coverage_reader.iter_table_coverage_regions_by_region( region.chrom, region.start, region.end ):
+ if keep_empty or coverage:
+ #only output regions that have at least 1 base covered, unless empty results are requested
+ out.write( "%s\t%s\t%s\t%s\n" % ( "\t".join( region.fields ), table_name, coverage, region_count ) )
+ out.close()
+
+def profile_summary( interval_filename, chrom_col, start_col, end_col, out_filename, keep_empty, coverage_reader, chrom_lengths ):
+ out = open( out_filename, 'wb' )
+ table_coverage_summary = TableCoverageSummary( coverage_reader, chrom_lengths )
+ for region in bx.intervals.io.NiceReaderWrapper( open( interval_filename, 'rb' ), chrom_col = chrom_col, start_col = start_col, end_col = end_col, fix_strand = True, return_header = False, return_comments = False ):
+ table_coverage_summary.add_region( region.chrom, region.start, region.end )
+
+ out.write( "#tableName\ttableChromosomeCoverage\ttableChromosomeCount\ttableRegionCoverage\ttableRegionCount\tallIntervalCount\tallIntervalSize\tallCoverage\tallTableRegionsOverlaped\tallIntervalsOverlapingTable\tnrIntervalCount\tnrIntervalSize\tnrCoverage\tnrTableRegionsOverlaped\tnrIntervalsOverlapingTable\n" )
+ for table_name, table_chromosome_size, table_chromosome_count, table_region_coverage, table_region_count, total_interval_count, total_interval_size, total_coverage, table_regions_overlaped_count, interval_region_overlap_count, nr_interval_count, nr_interval_size, nr_coverage, nr_table_regions_overlaped_count, nr_interval_table_overlap_count in table_coverage_summary.iter_table_coverage():
+ if keep_empty or total_coverage:
+ #only output tables that have at least 1 base covered, unless empty results are requested
+ out.write( "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n" % ( table_name, table_chromosome_size, table_chromosome_count, table_region_coverage, table_region_count, total_interval_count, total_interval_size, total_coverage, table_regions_overlaped_count, interval_region_overlap_count, nr_interval_count, nr_interval_size, nr_coverage, nr_table_regions_overlaped_count, nr_interval_table_overlap_count ) )
+ out.close()
+
+ #report chrom size errors as needed:
+ if table_coverage_summary.region_size_errors:
+ print "Regions provided extended beyond known chromosome lengths, and have been truncated as necessary, for the following intervals:"
+ for chrom, regions in table_coverage_summary.region_size_errors.items():
+ if len( regions ) > 3:
+ extra_region_info = ", ... "
+ else:
+ extra_region_info = ""
+ print "%s has max length of %s, exceeded by %s%s." % ( chrom, chrom_lengths.get( chrom ), ", ".join( map( str, regions[:3] ) ), extra_region_info )
+
+class ChromosomeLengths:
+ def __init__( self, profiler_info ):
+ self.chroms = {}
+ self.default_bitset_size = int( profiler_info.get( 'bitset_size', bx.bitset.MAX ) )
+ chroms = profiler_info.get( 'chromosomes', None )
+ if chroms:
+ for chrom in chroms.split( ',' ):
+ fields = chrom.rsplit( '=', 1 )
+ if len( fields ) == 2:
+ self.chroms[ fields[0] ] = int( fields[1] )
+ else:
+ self.chroms[ fields[0] ] = self.default_bitset_size
+ def get( self, name ):
+ return self.chroms.get( name, self.default_bitset_size )
+
+def parse_profiler_info( filename ):
+ profiler_info = {}
+ try:
+ for line in open( filename ):
+ fields = line.rstrip( '\n\r' ).split( '\t', 1 )
+ if len( fields ) == 2:
+ if fields[0] in profiler_info:
+ if not isinstance( profiler_info[ fields[0] ], list ):
+ profiler_info[ fields[0] ] = [ profiler_info[ fields[0] ] ]
+ profiler_info[ fields[0] ].append( fields[1] )
+ else:
+ profiler_info[ fields[0] ] = fields[1]
+ except:
+ pass #likely missing file
+ return profiler_info
+
+def __main__():
+ parser = optparse.OptionParser()
+ parser.add_option(
+ '-k','--keep_empty',
+ action="store_true",
+ dest='keep_empty',
+ default=False,
+ help='Keep tables with 0 coverage'
+ )
+ parser.add_option(
+ '-b','--buffer',
+ dest='buffer',
+ type='int',default=10,
+ help='Number of Chromosomes to keep buffered'
+ )
+ parser.add_option(
+ '-c','--chrom_col',
+ dest='chrom_col',
+ type='int',default=1,
+ help='Chromosome column'
+ )
+ parser.add_option(
+ '-s','--start_col',
+ dest='start_col',
+ type='int',default=2,
+ help='Start Column'
+ )
+ parser.add_option(
+ '-e','--end_col',
+ dest='end_col',
+ type='int',default=3,
+ help='End Column'
+ )
+ parser.add_option(
+ '-p','--path',
+ dest='path',
+ type='str',default='/galaxy/data/annotation_profiler/hg18',
+ help='Path to profiled data for this organism'
+ )
+ parser.add_option(
+ '-t','--table_names',
+ dest='table_names',
+ type='str',default='None',
+ help='Table names requested'
+ )
+ parser.add_option(
+ '-i','--input',
+ dest='interval_filename',
+ type='str',
+ help='Input Interval File'
+ )
+ parser.add_option(
+ '-o','--output',
+ dest='out_filename',
+ type='str',
+ help='Output File'
+ )
+ parser.add_option(
+ '-S','--summary',
+ action="store_true",
+ dest='summary',
+ default=False,
+ help='Display Summary Results'
+ )
+
+ options, args = parser.parse_args()
+
+ assert os.path.isdir( options.path ), IOError( "Configuration error: Table directory is missing (%s)" % options.path )
+
+ #get profiler_info
+ profiler_info = parse_profiler_info( os.path.join( options.path, 'profiler_info.txt' ) )
+
+ table_names = options.table_names.split( "," )
+ if table_names == ['None']: table_names = None
+ coverage_reader = CachedCoverageReader( options.path, buffer = options.buffer, table_names = table_names, profiler_info = profiler_info )
+
+ if options.summary:
+ profile_summary( options.interval_filename, options.chrom_col - 1, options.start_col - 1, options.end_col -1, options.out_filename, options.keep_empty, coverage_reader, ChromosomeLengths( profiler_info ) )
+ else:
+ profile_per_interval( options.interval_filename, options.chrom_col - 1, options.start_col - 1, options.end_col -1, options.out_filename, options.keep_empty, coverage_reader )
+
+ #print out data version info
+ print 'Data version (%s:%s:%s)' % ( profiler_info.get( 'dbkey', 'unknown' ), profiler_info.get( 'profiler_hash', 'unknown' ), profiler_info.get( 'dump_time', 'unknown' ) )
+
+if __name__ == "__main__": __main__()
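In `annotation_profiler_for_interval.py` above, `get_start_index` binary-searches a table's sorted, merged ranges for the first one that can overlap the query, and `get_coverage_regions_index_overlap` then sweeps forward summing the overlapping bases. The following is a self-contained sketch of that lookup using the standard `bisect` module; `coverage_and_regions` is a hypothetical name. One deliberate difference: this sketch treats ranges as half-open and counts only ranges with a positive-length overlap, whereas the script's test (`c_start <= end and c_end >= start`) also counts ranges that merely touch the query.

```python
import bisect


def coverage_and_regions(ranges, start, end):
    """Return (bases covered, ranges overlapped) for the half-open query [start, end).

    Hypothetical helper for illustration. `ranges` must be sorted,
    non-overlapping half-open (start, end) tuples, e.g. merged table ranges.
    """
    ends = [r_end for _, r_end in ranges]
    # Index of the first range whose end lies strictly after the query start.
    i = bisect.bisect_right(ends, start)
    covered = 0
    regions = 0
    # Sweep forward until a range begins at or after the query end.
    while i < len(ranges) and ranges[i][0] < end:
        r_start, r_end = ranges[i]
        covered += min(end, r_end) - max(start, r_start)
        regions += 1
        i += 1
    return covered, regions


if __name__ == "__main__":
    table = [(1, 20), (25, 28)]
    print(coverage_and_regions(table, 10, 26))  # (11, 2)
    print(coverage_and_regions(table, 20, 25))  # (0, 0)
```

Rebuilding `ends` on every call costs O(n). The script avoids repeated loading by caching each chromosome's ranges (up to `--buffer` chromosomes per table), and a longer-lived version of this sketch would cache `ends` alongside them.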
diff -r 000000000000 -r 4414f0739808 scripts/README.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/README.txt Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,54 @@
+This file explains how to create annotation indexes for the annotation profiler tool. Annotation profiler indexes are an exceedingly simple binary format,
+containing no header information and consisting of an ordered linear list of (start,stop encoded individually as ' hg19.txt
+
+where the genome build is hg19 and /ucsc_data/hg19/database/ contains the downloaded database dump from UCSC (e.g. obtained by rsync: rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ /ucsc_data/hg19/database/).
+
+
+
+By default, chromosome names come from a file named 'chromInfo.txt.gz' found in the input directory, with FTP used as a backup.
When FTP is used to obtain the names of chromosomes from UCSC for a particular genome build, alternate FTP sites and paths can be specified with the --ftp_site and --ftp_path options.
Chromosome names can instead be provided on the command line via the --chromosomes option, which accepts a comma-separated list of the form ChromName1[=length],ChromName2[=length],...
+
+
+
+ usage = "usage: %prog options"
+ parser = OptionParser( usage=usage )
+ parser.add_option( '-d', '--dbkey', dest='dbkey', default='hg18', help='dbkey to process' )
+ parser.add_option( '-i', '--input_dir', dest='input_dir', default=os.path.join( 'golden_path','%s', 'database' ), help='Input Directory' )
+ parser.add_option( '-o', '--output_dir', dest='output_dir', default=os.path.join( 'profiled_annotations','%s' ), help='Output Directory' )
+ parser.add_option( '-c', '--chromosomes', dest='chromosomes', default='', help='Comma separated list of: ChromName1[=length],ChromName2[=length],...' )
+ parser.add_option( '-b', '--bitset_size', dest='bitset_size', default=DEFAULT_BITSET_SIZE, type='int', help='Default BitSet size; overridden by sizes specified in chromInfo.txt.gz or by --chromosomes' )
+ parser.add_option( '-f', '--ftp_site', dest='ftp_site', default='hgdownload.cse.ucsc.edu', help='FTP site; used for chromosome info when chromInfo.txt.gz method fails' )
+ parser.add_option( '-p', '--ftp_path', dest='ftp_path', default='/goldenPath/%s/chromosomes/', help='FTP Path; used for chromosome info when chromInfo.txt.gz method fails' )
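The README above describes each index as a headerless, ordered list of (start, stop) values, each packed individually with a fixed `struct` format; `build_profile_indexes.py` records that format under `profiler_struct_format` in `profiler_info.txt`. A minimal round-trip sketch follows, assuming one file holds the merged ranges for one table on one chromosome (as the `os.path.join( base_file_path, tablename, chrom )` lookup in the profiler suggests). The format string is deliberately a parameter: `"<I"` in the usage lines is only an example, not necessarily the tool's format, and `write_ranges`/`read_ranges` are hypothetical helpers.

```python
import os
import struct
import tempfile


def write_ranges(path, ranges, fmt):
    """Write (start, end) pairs as consecutive values, each packed with `fmt`."""
    with open(path, "wb") as out:
        for start, end in ranges:
            out.write(struct.pack(fmt, start))
            out.write(struct.pack(fmt, end))


def read_ranges(path, fmt):
    """Read back the (start, end) pairs written by write_ranges."""
    size = struct.calcsize(fmt)
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % (2 * size):
        raise ValueError("file length is not a whole number of (start, end) pairs")
    ranges = []
    for offset in range(0, len(data), 2 * size):
        start = struct.unpack_from(fmt, data, offset)[0]
        end = struct.unpack_from(fmt, data, offset + size)[0]
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # "<I" (little-endian unsigned 32-bit) is an example format only.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    write_ranges(path, [(1, 20), (25, 28)], "<I")
    print(read_ranges(path, "<I"))  # [(1, 20), (25, 28)]
    os.remove(path)
```

Reading the format from `profiler_info.txt` rather than hard-coding it keeps a reader in step with whatever format the index builder used.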
diff -r 000000000000 -r 4414f0739808 scripts/build_profile_indexes.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/build_profile_indexes.py Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,338 @@
+#!/usr/bin/env python
+#Dan Blankenberg
+
+VERSION = '1.0.0' # version of this script
+
+from optparse import OptionParser
+import os, gzip, struct, time
+from ftplib import FTP #FTP is the fallback for determining chrom names; consider another method, e.g. a local copy
+
+#import md5 from hashlib; if python2.4 or less, use old md5
+try:
+ from hashlib import md5
+except ImportError:
+ from md5 import new as md5
+
+#import BitSet from bx-python, try using eggs and package resources, fall back to any local installation
+try:
+ from galaxy import eggs
+ import pkg_resources
+ pkg_resources.require( "bx-python" )
+except: pass #Maybe there is a local installation available
+from bx.bitset import BitSet
+
+#Define constants
+STRUCT_FMT = ' 1:
+ setting_value = setting_fields[ 1 ]
+ if setting_name or setting_value:
+ tables[ table_name ][ 'settings' ][ setting_name ] = setting_value
+ #Load Groups
+ groups = load_groups()
+ in_groups = {}
+ for table_name, values in tables.iteritems():
+ if os.path.exists( os.path.join( output_dir, table_name ) ):
+ group = values['grp']
+ if group not in in_groups:
+ in_groups[group]={}
+ #***NAME CHANGE***, 'subTrack' no longer exists as a setting...use 'parent' instead
+ #subTrack = values.get('settings', {} ).get( 'subTrack', table_name )
+ subTrack = values.get('settings', {} ).get( 'parent', table_name ).split( ' ' )[0] #need to split, because could be e.g. 'trackgroup on'
+ if subTrack not in in_groups[group]:
+ in_groups[group][subTrack]=[]
+ in_groups[group][subTrack].append( table_name )
+
+ assigned_tables = []
+ out.write( """\n""" % ( dbkey ) )
+ out.write( " \n" )
+ for group, subTracks in sorted( in_groups.iteritems() ):
+ out.write( """ \n""" % ( track_label, track ) )
+ assigned_tables.append( track )
+ out.write( " \n" )
+ else:
+ track = sub_tracks[0]
+ track_label = track
+ if "$" not in tables[track]['shortLabel']:
+ track_label = tables[track]['shortLabel']
+ out.write( """ \n""" % ( track_label, track ) )
+ assigned_tables.append( track )
+ out.write( " \n" )
+ unassigned_tables = list( sorted( [ table_dir for table_dir in os.listdir( output_dir ) if table_dir not in assigned_tables and os.path.isdir( os.path.join( output_dir, table_dir ) ) ] ) )
+ if unassigned_tables:
+ out.write( """ \n""" % ( table_name, table_name ) )
+ out.write( " \n" )
+ out.write( " \n" )
+ out.write( """\n""" )
+ out.close()
+
+def write_database_dump_info( input_dir, output_dir, dbkey, chrom_lengths, default_bitset_size ):
+ #generate hash for profiled table directories
+ #sort directories off output root (files in output root not hashed, including the profiler_info.txt file)
+ #sort files in each directory and hash file contents
+ profiled_hash = md5()
+ for table_dir in sorted( [ table_dir for table_dir in os.listdir( output_dir ) if os.path.isdir( os.path.join( output_dir, table_dir ) ) ] ):
+ for filename in sorted( os.listdir( os.path.join( output_dir, table_dir ) ) ):
+ f = open( os.path.join( output_dir, table_dir, filename ), 'rb' )
+ while True:
+ hash_chunk = f.read( CHUNK_SIZE )
+ if not hash_chunk:
+ break
+ profiled_hash.update( hash_chunk )
+ profiled_hash = profiled_hash.hexdigest()
+
+ #generate hash for input dir
+ #sort directories off input root
+ #sort files in each directory and hash file contents
+ database_hash = md5()
+ for dirpath, dirnames, filenames in sorted( os.walk( input_dir ) ):
+ for filename in sorted( filenames ):
+ f = open( os.path.join( dirpath, filename ), 'rb' ) #dirpath from os.walk already includes input_dir
+ while True:
+ hash_chunk = f.read( CHUNK_SIZE )
+ if not hash_chunk:
+ break
+ database_hash.update( hash_chunk )
+ database_hash = database_hash.hexdigest()
+
+ #write out info file
+ out = open( os.path.join( output_dir, 'profiler_info.txt' ), 'wb' )
+ out.write( 'dbkey\t%s\n' % ( dbkey ) )
+ out.write( 'chromosomes\t%s\n' % ( ','.join( [ '%s=%s' % ( chrom_name, chrom_len ) for chrom_name, chrom_len in chrom_lengths.iteritems() ] ) ) )
+ out.write( 'bitset_size\t%s\n' % ( default_bitset_size ) )
+ for line in open( os.path.join( input_dir, 'trackDb.sql' ) ):
+ line = line.strip()
+ if line.startswith( '-- Dump completed on ' ):
+ line = line[ len( '-- Dump completed on ' ): ]
+ out.write( 'dump_time\t%s\n' % ( line ) )
+ break
+ out.write( 'dump_hash\t%s\n' % ( database_hash ) )
+ out.write( 'profiler_time\t%s\n' % ( time.time() ) )
+ out.write( 'profiler_hash\t%s\n' % ( profiled_hash ) )
+ out.write( 'profiler_version\t%s\n' % ( VERSION ) )
+ out.write( 'profiler_struct_format\t%s\n' % ( STRUCT_FMT ) )
+ out.write( 'profiler_struct_size\t%s\n' % ( STRUCT_SIZE ) )
+ out.close()
+
+def __main__():
+ usage = "usage: %prog options"
+ parser = OptionParser( usage=usage )
+ parser.add_option( '-d', '--dbkey', dest='dbkey', default='hg18', help='dbkey to process' )
+ parser.add_option( '-i', '--input_dir', dest='input_dir', default=os.path.join( 'golden_path','%s', 'database' ), help='Input Directory' )
+ parser.add_option( '-o', '--output_dir', dest='output_dir', default=os.path.join( 'profiled_annotations','%s' ), help='Output Directory' )
+ parser.add_option( '-c', '--chromosomes', dest='chromosomes', default='', help='Comma separated list of: ChromName1[=length],ChromName2[=length],...' )
+ parser.add_option( '-b', '--bitset_size', dest='bitset_size', default=DEFAULT_BITSET_SIZE, type='int', help='Default BitSet size; overridden by sizes specified in chromInfo.txt.gz or by --chromosomes' )
+ parser.add_option( '-f', '--ftp_site', dest='ftp_site', default='hgdownload.cse.ucsc.edu', help='FTP site; used for chromosome info when chromInfo.txt.gz method fails' )
+ parser.add_option( '-p', '--ftp_path', dest='ftp_path', default='/goldenPath/%s/chromosomes/', help='FTP Path; used for chromosome info when chromInfo.txt.gz method fails' )
+
+ ( options, args ) = parser.parse_args()
+
+ input_dir = options.input_dir
+ if '%' in input_dir:
+ input_dir = input_dir % options.dbkey
+ assert os.path.exists( input_dir ), 'Input directory does not exist'
+ output_dir = options.output_dir
+ if '%' in output_dir:
+ output_dir = output_dir % options.dbkey
+ assert not os.path.exists( output_dir ), 'Output directory already exists'
+ os.makedirs( output_dir )
+ ftp_path = options.ftp_path
+ if '%' in ftp_path:
+ ftp_path = ftp_path % options.dbkey
+
+    #Get chromosome names and lengths
+    chrom_lengths = {}
+    if options.chromosomes:
+        for chrom in options.chromosomes.split( ',' ):
+            fields = chrom.split( '=' )
+            chrom = fields[0]
+            if len( fields ) > 1:
+                chrom_len = int( fields[1] )
+            else:
+                chrom_len = options.bitset_size
+            chrom_lengths[ chrom ] = chrom_len
+        chroms = chrom_lengths.keys()
+        print 'Chrom info taken from command line option.'
+    else:
+        try:
+            for line in gzip.open( os.path.join( input_dir, 'chromInfo.txt.gz' ) ):
+                fields = line.strip().split( '\t' )
+                chrom_lengths[ fields[0] ] = int( fields[ 1 ] )
+            chroms = chrom_lengths.keys()
+            print 'Chrom info taken from chromInfo.txt.gz.'
+        except Exception, e:
+            print 'Error loading chrom info from chromInfo.txt.gz, trying FTP method.'
+            chrom_lengths = {} #zero out chrom_lengths
+            chroms = []
+            ftp = FTP( options.ftp_site )
+            ftp.login()
+            for name in ftp.nlst( ftp_path ):
+                if name.endswith( '.fa.gz' ):
+                    chroms.append( name.split( '/' )[-1][ :-len( '.fa.gz' ) ] )
+            ftp.close()
+            for chrom in chroms:
+                chrom_lengths[ chrom ] = options.bitset_size
+    #sort chroms by length of name, descending; necessary for when table names start with chrom name
+    chroms = list( reversed( [ chrom for chrom_len, chrom in sorted( [ ( len( chrom ), chrom ) for chrom in chroms ] ) ] ) )
+
+    #parse tables from local files
+    #loop through directory contents, if file ends in '.sql', process table
+    for filename in os.listdir( input_dir ):
+        if filename.endswith ( '.sql' ):
+            base_filename = filename[ 0:-len( '.sql' ) ]
+            table_out_dir = os.path.join( output_dir, base_filename )
+            #some tables are chromosome specific; let's strip off the chrom name
+            for chrom in chroms:
+                if base_filename.startswith( "%s_" % chrom ):
+                    #found chromosome
+                    table_out_dir = os.path.join( output_dir, base_filename[len( "%s_" % chrom ):] )
+                    break
+            #create table dir
+            if not os.path.exists( table_out_dir ):
+                os.mkdir( table_out_dir ) #table dir may already exist in the case of per-chromosome tables, which share one dir
+                print "Created table dir (%s)." % table_out_dir
+            else:
+                print "Table dir (%s) already exists." % table_out_dir
+            #find column assignments
+            table_name, chrom_col, start_col, end_col = get_columns( "%s.sql" % os.path.join( input_dir, base_filename ) )
+            if chrom_col is None or start_col is None or end_col is None:
+                print "Table %s (%s) does not appear to have a chromosome, a start, or a stop." % ( table_name, "%s.sql" % os.path.join( input_dir, base_filename ) )
+                if not os.listdir( table_out_dir ):
+                    print "Removing empty table (%s) directory (%s)." % ( table_name, table_out_dir )
+                    os.rmdir( table_out_dir )
+                continue
+            #build bitsets from table
+            bitset_dict = {}
+            for line in gzip.open( '%s.txt.gz' % os.path.join( input_dir, base_filename ) ):
+                fields = line.strip().split( '\t' )
+                chrom = fields[ chrom_col ]
+                start = int( fields[ start_col ] )
+                end = int( fields[ end_col ] )
+                if chrom not in bitset_dict:
+                    bitset_dict[ chrom ] = BitSet( chrom_lengths.get( chrom, options.bitset_size ) )
+                bitset_dict[ chrom ].set_range( start, end - start )
+            #write bitsets as profiled annotations
+            for chrom_name, chrom_bits in bitset_dict.iteritems():
+                out = open( os.path.join( table_out_dir, '%s.covered' % chrom_name ), 'wb' )
+                end = 0
+                total_regions = 0
+                total_coverage = 0
+                max_size = chrom_lengths.get( chrom_name, options.bitset_size )
+                while True:
+                    start = chrom_bits.next_set( end )
+                    if start >= max_size:
+                        break
+                    end = chrom_bits.next_clear( start )
+                    out.write( struct.pack( STRUCT_FMT, start ) )
+                    out.write( struct.pack( STRUCT_FMT, end ) )
+                    total_regions += 1
+                    total_coverage += end - start
+                    if end >= max_size:
+                        break
+                out.close()
+                open( os.path.join( table_out_dir, '%s.total_regions' % chrom_name ), 'wb' ).write( str( total_regions ) )
+                open( os.path.join( table_out_dir, '%s.total_coverage' % chrom_name ), 'wb' ).write( str( total_coverage ) )
+
+    #create xml
+    create_grouping_xml( input_dir, output_dir, options.dbkey )
+    #create database dump info file, for database version control
+    write_database_dump_info( input_dir, output_dir, options.dbkey, chrom_lengths, options.bitset_size )
+
+if __name__ == "__main__": __main__()
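The table-building loop above depends on two steps that are worth isolating.

- **Table-name prefixes.** Per-chromosome tables are mapped to a single directory by stripping a `<chrom>_` prefix. Longer chromosome names are tried first, so a hypothetical table file such as `chr1_random_rmsk` is not mistaken for a `chr1` table.
- **Covered runs.** Each chromosome's BitSet is walked into maximal covered runs. These are written as packed (start, end) pairs, together with a region count and a base total.

What follows is a minimal plain-Python 3 sketch of those steps, not the script itself. It replaces bx-python's `BitSet` with a sort-and-merge of half-open ranges. That produces the same maximal runs, because overlapping and directly adjacent ranges coalesce in both. `STRUCT_FMT` is defined elsewhere in the script and does not appear in this excerpt, so `'<I'` is an assumption. All function names are hypothetical.

```python
import struct

# Assumption: the script's real STRUCT_FMT is defined outside this excerpt;
# '<I' (little-endian unsigned 32-bit int) is used here only for illustration.
STRUCT_FMT = '<I'


def parse_chromosome_spec(spec, default_size):
    """Parse 'Name1[=length],Name2[=length],...' into {name: length}."""
    lengths = {}
    for item in spec.split(','):
        fields = item.split('=')
        # A missing '=length' falls back to the default bitset size.
        lengths[fields[0]] = int(fields[1]) if len(fields) > 1 else default_size
    return lengths


def table_dir_name(base_filename, chroms):
    """Strip a leading '<chrom>_' prefix, trying longer chromosome names first."""
    for chrom in sorted(chroms, key=len, reverse=True):
        prefix = '%s_' % chrom
        if base_filename.startswith(prefix):
            return base_filename[len(prefix):]
    return base_filename


def merged_runs(ranges):
    """Merge half-open (start, end) ranges that overlap or abut into maximal runs."""
    runs = []
    for start, end in sorted(ranges):
        if runs and start <= runs[-1][1]:
            runs[-1][1] = max(runs[-1][1], end)
        else:
            runs.append([start, end])
    return [tuple(run) for run in runs]


def pack_runs(runs):
    """Serialize runs as consecutive start, end values, one STRUCT_FMT value each."""
    return b''.join(struct.pack(STRUCT_FMT, start) + struct.pack(STRUCT_FMT, end)
                    for start, end in runs)


def unpack_runs(data):
    """Read (start, end) pairs back from bytes produced by pack_runs."""
    size = struct.calcsize(STRUCT_FMT)
    values = [struct.unpack(STRUCT_FMT, data[i:i + size])[0]
              for i in range(0, len(data), size)]
    return list(zip(values[0::2], values[1::2]))


runs = merged_runs([(1, 10), (6, 12), (12, 20), (25, 28)])
total_regions = len(runs)
total_coverage = sum(end - start for start, end in runs)
print(parse_chromosome_spec('chr1=1000,chrM', 500))  # {'chr1': 1000, 'chrM': 500}
print(table_dir_name('chr1_random_rmsk', ['chr1', 'chr1_random']))  # rmsk
print(runs, total_regions, total_coverage)  # [(1, 20), (25, 28)] 2 22
assert unpack_runs(pack_runs(runs)) == runs
```

The tool's help text gives the ranges 1-10, 6-12, 12-20 and 25-28. Treated as half-open, `merged_runs` turns them into the two runs (1, 20) and (25, 28), which cover 22 bases.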
diff -r 000000000000 -r 4414f0739808 test-data/3.bed
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/3.bed Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,25 @@
+chr1 147962006 147975713 NM_005997 0 - 147962192 147975670 0 6 574,145,177,115,153,160, 0,1543,7859,9048,9340,13547,
+chr1 147984101 148035079 BC007833 0 + 147984545 148033414 0 14 529,32,81,131,118,153,300,206,84,49,85,130,46,1668, 0,25695,28767,33118,33695,33998,35644,38005,39629,40577,41402,43885,48367,49310,
+chr1 148077485 148111797 NM_002651 0 - 148078400 148111728 0 12 1097,121,133,266,124,105,110,228,228,45,937,77, 0,2081,2472,6871,9907,10257,11604,14199,15637,18274,23636,34235,
+chr1 148185113 148187485 NM_002796 0 + 148185136 148187378 0 7 163,207,147,82,117,89,120, 0,416,877,1199,1674,1977,2252,
+chr2 118288484 118306183 NM_006773 0 + 118288583 118304530 0 14 184,285,144,136,101,200,115,140,162,153,114,57,178,1796, 0,2765,4970,6482,6971,7183,7468,9890,10261,10768,11590,14270,14610,15903,
+chr2 118389378 118390700 BC005078 0 - 118390395 118390500 0 1 1322, 0,
+chr2 220108603 220116964 NM_001927 0 + 220108689 220116217 0 9 664,61,96,162,126,221,44,83,789, 0,1718,1874,2118,2451,2963,5400,7286,7572,
+chr2 220229182 220233943 NM_024536 0 - 220229609 220233765 0 4 1687,180,574,492, 0,1990,2660,4269,
+chr5 131170738 131357870 AF099740 0 - 131311206 131357817 0 31 112,124,120,81,65,40,120,129,61,88,94,79,72,102,144,117,89,73,96,135,135,78,74,52,33,179,100,102,65,115,248, 0,11593,44117,47607,104668,109739,114675,126366,135488,137518,138009,140437,152389,153373,155388,159269,160793,162981,164403,165577,166119,167611,169501,178260,179675,180901,181658,182260,182953,183706,186884,
+chr5 131424245 131426795 NM_000588 0 + 131424298 131426383 0 5 215,42,90,42,535, 0,313,1658,1872,2015,
+chr5 131556201 131590458 NM_004199 0 - 131556601 131582218 0 15 471,97,69,66,54,100,71,177,194,240,138,152,97,100,170, 0,2316,2802,5596,6269,11138,11472,15098,16528,17674,21306,24587,25142,25935,34087,
+chr5 131621285 131637046 NM_003687 0 + 131621326 131635821 0 7 134,152,82,179,164,118,1430, 0,4915,8770,13221,13609,14097,14331,
+chr6 108298214 108386086 NM_007214 0 - 108299600 108385906 0 21 1530,105,99,102,159,174,60,83,148,155,93,133,95,109,51,59,62,113,115,100,304, 0,2490,6246,10831,12670,23164,23520,27331,31052,32526,34311,36130,36365,38609,41028,42398,43048,51479,54500,59097,87568,
+chr6 108593954 108616704 NM_003269 0 + 108594662 108615360 0 9 733,146,88,236,147,97,150,106,1507, 0,5400,8778,10445,12037,14265,14749,15488,21243,
+chr6 108639410 108689143 NM_152827 0 - 108640045 108688818 0 3 741,125,487, 0,2984,49246,
+chr6 108722790 108950942 NM_145315 0 + 108722976 108950321 0 13 325,224,52,102,131,100,59,83,71,101,141,114,750, 0,28931,52094,60760,61796,71339,107102,152319,181970,182297,215317,224802,227402,
+chr7 113320332 113924911 AK131266 0 + 113862563 113893433 0 20 285,91,178,90,58,75,138,51,201,178,214,105,88,84,77,102,122,70,164,1124, 0,201692,340175,448290,451999,484480,542213,543265,543478,545201,556083,558358,565876,567599,573029,573245,575738,577123,577946,603455,
+chr7 116511232 116557294 NM_003391 0 - 116512159 116556994 0 5 1157,265,278,227,383, 0,20384,37843,43339,45679,
+chr7 116713967 116902666 NM_000492 0 + 116714099 116901113 0 27 185,111,109,216,90,164,126,247,93,183,192,95,87,724,129,38,251,80,151,228,101,249,156,90,173,106,1754, 0,24290,29071,50936,54313,55285,56585,60137,62053,68678,79501,107776,110390,111971,114967,122863,123569,126711,130556,131618,134650,147559,162475,172879,184725,185496,186945,
+chr7 116944658 117107512 AF377960 0 - 116945541 116979926 0 23 1129,102,133,64,186,206,179,188,153,100,87,80,96,276,118,255,151,100,204,1654,225,108,173, 0,7364,8850,10413,13893,14398,17435,24259,24615,35177,35359,45901,47221,49781,56405,66857,69787,72208,73597,80474,100111,150555,162681,
+chr8 118880786 119193239 NM_000127 0 - 118881131 119192466 0 11 531,172,161,90,96,119,133,120,108,94,1735, 0,5355,7850,13505,19068,20309,23098,30863,36077,37741,310718,
+chr9 128763240 128783870 NM_174933 0 + 128764156 128783586 0 12 261,118,74,159,76,48,56,63,129,117,127,370, 0,522,875,5630,12374,12603,15040,15175,18961,19191,20037,20260,
+chr9 128787362 128789566 NM_014908 0 - 128787519 128789136 0 1 2204, 0,
+chr9 128789530 128848928 NM_015354 0 + 128789552 128848511 0 44 54,55,74,85,81,45,93,120,212,115,201,90,66,120,127,153,127,88,77,115,121,67,129,140,107,207,170,70,68,196,78,86,146,182,201,93,159,138,75,228,132,74,130,594, 0,1491,5075,8652,9254,10312,11104,11317,20808,21702,23060,25462,31564,32908,33566,34851,35204,35595,35776,37202,38860,39111,39891,40349,42422,45499,45827,46675,47158,47621,50453,50840,51474,51926,53831,54186,55119,55619,57449,57605,57947,58352,58541,58804,
+chr9 128849867 128870133 NM_020145 0 - 128850516 128869987 0 11 757,241,101,90,24,63,93,134,129,142,209, 0,1071,1736,2085,2635,4201,6376,6736,13056,14247,20057,
diff -r 000000000000 -r 4414f0739808 test-data/4.bed
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/4.bed Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,1 @@
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 +
diff -r 000000000000 -r 4414f0739808 test-data/annotation_profiler_1.out
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/annotation_profiler_1.out Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,9 @@
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + multiz17way 1700000 1
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + mrna 1476531 12
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + multiz28way 1700000 1
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + refGene 1247808 15
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + knownAlt 14617 57
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + affyGnf1h 16218 2
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + snp126 8224 7262
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + acembly 1532618 20
+chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 + knownGene 1282789 18
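Each line of annotation_profiler_1.out appends a table name and two numbers to the single 4.bed interval. For multiz17way the first number, 1700000, equals the interval length (31828507 - 30128507). This suggests the two columns are:

- the number of interval bases covered by the table's merged regions;
- the number of merged regions that overlap the interval.

Below is a minimal sketch of that calculation, assuming sorted, non-overlapping half-open regions like those produced at build time. The function name and the example region are hypothetical. The tool's actual logic lives in annotation_profiler_for_interval.py, which is not part of this excerpt.

```python
from bisect import bisect_right


def coverage_and_count(interval_start, interval_end, regions):
    """Return (covered_bases, overlapping_region_count) for one half-open interval.

    `regions` must be sorted, non-overlapping half-open (start, end) pairs.
    """
    starts = [start for start, _ in regions]
    # Begin at the last region starting at or before interval_start: earlier
    # regions end before it starts, so they cannot overlap the interval.
    i = max(bisect_right(starts, interval_start) - 1, 0)
    covered = 0
    count = 0
    for start, end in regions[i:]:
        if start >= interval_end:
            break
        overlap = min(end, interval_end) - max(start, interval_start)
        if overlap > 0:
            covered += overlap
            count += 1
    return covered, count


# A made-up region spanning the whole 4.bed interval, mirroring the multiz17way line.
print(coverage_and_count(30128507, 31828507, [(30000000, 32000000)]))  # (1700000, 1)
```

Regions that merely touch the interval boundary contribute zero overlap, so they are not counted.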
diff -r 000000000000 -r 4414f0739808 test-data/annotation_profiler_2.out
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/annotation_profiler_2.out Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,10 @@
+#tableName tableChromosomeCoverage tableChromosomeCount tableRegionCoverage tableRegionCount allIntervalCount allIntervalSize allCoverage allTableRegionsOverlaped allIntervalsOverlapingTable nrIntervalCount nrIntervalSize nrCoverage nrTableRegionsOverlaped nrIntervalsOverlapingTable
+multiz17way 1232617592 115 107496500 7 25 2178864 2178864 25 25 24 2178828 2178828 7 24
+mrna 610115393 8453 53577685 617 25 2178864 1904380 38 24 24 2178828 1904344 33 23
+multiz28way 1233785185 143 107466479 10 25 2178864 2178864 25 25 24 2178828 2178828 8 24
+refGene 496767116 7324 46112187 488 25 2178864 1677947 30 23 24 2178828 1677911 27 22
+knownAlt 8647368 20213 766619 1630 25 2178864 5612 31 11 24 2178828 5612 31 11
+affyGnf1h 24034558 3995 2446754 307 25 2178864 191851 9 6 24 2178828 191851 9 6
+snp126 5297125 4456213 382226 331523 25 2178864 9205 7074 25 24 2178828 9205 7074 24
+acembly 710938193 13800 63146381 938 25 2178864 1903560 35 24 24 2178828 1903524 30 23
+knownGene 555770538 7921 50317496 558 25 2178864 1822985 30 23 24 2178828 1822949 27 22
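The summary output's header has separate "all" and "nr" columns, and the test data shows how they differ:

- allIntervalCount is 25, the number of lines in 3.bed.
- allIntervalSize, 2178864, is the sum of those 25 interval lengths.
- nrIntervalCount is 24, and nrIntervalSize is 36 bases smaller than allIntervalSize.

That gap matches the only overlap in 3.bed: the chr9 intervals 128787362-128789566 and 128789530-128848928 share 128789566 - 128789530 = 36 bases. The "nr" columns therefore appear to describe the input after overlapping intervals are merged.

The sketch below computes both pairs of numbers under that reading. This is an inference from the test data, not the tool's code. Merging intervals that merely touch mirrors how the help text describes merging table regions; whether the tool does the same for input intervals is an assumption. The function name is hypothetical.

```python
def summarize_intervals(intervals):
    """Return (all_count, all_size, nr_count, nr_size) for half-open (chrom, start, end) intervals."""
    all_count = len(intervals)
    all_size = sum(end - start for _, start, end in intervals)
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    nr_count = 0
    nr_size = 0
    for ranges in by_chrom.values():
        cur_start, cur_end = None, None
        for start, end in sorted(ranges):
            if cur_end is not None and start <= cur_end:
                # Overlapping (or abutting) interval: extend the current merged one.
                cur_end = max(cur_end, end)
            else:
                if cur_end is not None:
                    nr_count += 1
                    nr_size += cur_end - cur_start
                cur_start, cur_end = start, end
        if cur_end is not None:
            nr_count += 1
            nr_size += cur_end - cur_start
    return all_count, all_size, nr_count, nr_size


# The two overlapping chr9 intervals from test-data/3.bed.
chr9_pair = [('chr9', 128787362, 128789566), ('chr9', 128789530, 128848928)]
print(summarize_intervals(chr9_pair))  # (2, 61602, 1, 61566)
```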
diff -r 000000000000 -r 4414f0739808 tool-data/annotation_profiler_options.xml.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool-data/annotation_profiler_options.xml.sample Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,1101 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff -r 000000000000 -r 4414f0739808 tool-data/annotation_profiler_valid_builds.txt.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool-data/annotation_profiler_valid_builds.txt.sample Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,1 @@
+hg18
diff -r 000000000000 -r 4414f0739808 tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Mon May 19 10:59:42 2014 -0400
@@ -0,0 +1,6 @@
+
+
+
+
+
+