msp_sr_readmap_and_size_histograms: readmap.py annotate

annotate readmap.py @ 5:6ee5a6e89aa4 draft

planemo upload for repository https://bitbucket.org/drosofff/gedtools/ commit 8d708b1a6643c06464e00e9e41d271474a85c0ba

author	mvdbeek
date	Wed, 21 Oct 2015 09:31:37 -0400
parents	9af9983dcd02
children	70f4385534f9

#!/usr/bin/python
# Python parser module for readmaps and size distributions, guided by GFF3
# version 0.9.1 (1-6-2014)
# Usage readmap.py <1:index source> <2:extraction directive> <3:output pre-mir> <4: output mature miRs> <5:mirbase GFF3>
# <6:pathToLatticeDataframe or "dummy_dataframe_path"> <7:Rcode or "dummy_plotCode"> <8:latticePDF or "dummy_latticePDF">
# <9:10:11 filePath:FileExt:FileLabel> <.. ad lib>
# Note: options are parsed with argparse; see Parser() below for the actual flags.

import sys, subprocess, argparse
from smRtools import *
from collections import OrderedDict, defaultdict
import os

def Parser():
    the_parser = argparse.ArgumentParser()
    the_parser.add_argument('--output_readmap', action="store", type=str, help="readmap dataframe")
    the_parser.add_argument('--output_size_distribution', action="store", type=str, help="size distribution dataframe")
    the_parser.add_argument('--reference_fasta', action="store", type=str, help="path to a fasta reference")
    the_parser.add_argument('--reference_bowtie_index', action='store', help="path to a bowtie-indexed reference")
    the_parser.add_argument('--input', nargs='+', help="paths to multiple input files")
    the_parser.add_argument('--ext', nargs='+', help="input file type")
    the_parser.add_argument('--label', nargs='+', help="labels of multiple input files")
    the_parser.add_argument('--normalization_factor', nargs='+', type=float, help="Normalization factor for input file")
    the_parser.add_argument('--gff', type=str, help="GFF containing regions of interest")
    the_parser.add_argument('--minquery', type=int, help="Minimum readsize")
    the_parser.add_argument('--maxquery', type=int, help="Maximum readsize")
    the_parser.add_argument('--rcode', type=str, help="R script")
    args = the_parser.parse_args()
    return args

args=Parser()
if args.reference_fasta:
    genomeRefFormat = "fastaSource"
    genomeRefFile = args.reference_fasta
if args.reference_bowtie_index:
    genomeRefFormat = "bowtieIndex"
    genomeRefFile = args.reference_bowtie_index
readmap_file=args.output_readmap
size_distribution_file=args.output_size_distribution
minquery=args.minquery
maxquery=args.maxquery
Rcode = args.rcode
filePath=args.input
fileExt=args.ext
fileLabel=args.label
normalization_factor=args.normalization_factor

MasterListOfGenomes = OrderedDict()

def process_samples(filePath):
    # One HandleSmRNAwindows object per input file, keyed by its label
    for i, path in enumerate(filePath):
        norm=normalization_factor[i]
        print fileLabel[i]
        MasterListOfGenomes[fileLabel[i]] = HandleSmRNAwindows (alignmentFile=path, alignmentFileFormat=fileExt[i], genomeRefFile=genomeRefFile, genomeRefFormat=genomeRefFormat,
                                                                biosample=fileLabel[i], size_inf=minquery, size_sup=maxquery, norm=norm)
    return MasterListOfGenomes

def dataframe_sanityzer (listofdatalines):
    # Drop every line whose gene (first field) has a summed count (third field) of zero
    Dict = defaultdict(float)
    for line in listofdatalines:
        fields= line.split("\t")
        Dict[fields[0]] += float (fields[2])
    filtered_list = []
    for line in listofdatalines:
        fields= line.split("\t")
        if Dict[fields[0]] != 0:
            filtered_list.append(line)
    return filtered_list

def write_readplot_dataframe(readDict, readmap_file):
    listoflines = []
    with open(readmap_file, 'w') as readmap:
        print >>readmap, "gene\tcoord\tcount\tpolarity\tsample"
        for sample in readDict.keys():
            if args.gff:
                gene_dict=readDict[sample]
            else:
                gene_dict=readDict[sample].instanceDict
            for gene in gene_dict.keys():
                plottable = gene_dict[gene].readplot()
                for line in plottable:
                    #print >>readmap, "%s\t%s" % (line, sample)
                    listoflines.append ("%s\t%s" % (line, sample))
        listoflines = dataframe_sanityzer(listoflines)
        for line in listoflines:
            print >>readmap, line

def write_size_distribution_dataframe(readDict, size_distribution_file):
    listoflines = []
    with open(size_distribution_file, 'w') as size_distrib:
        print >>size_distrib, "gene\tsize\tcount\tpolarity\tsample" # test before was "gene\tpolarity\tsize\tcount\tsample"
        for sample in readDict.keys():
            if args.gff:
                gene_dict=readDict[sample]
            else:
                gene_dict=readDict[sample].instanceDict
            for gene in gene_dict.keys():
                histogram = gene_dict[gene].size_histogram(minquery=args.minquery, maxquery=args.maxquery)
                for polarity in histogram.keys():
                    if polarity=='both':
                        continue
                    #for size in xrange(args.minquery, args.maxquery):
                    #    if not size in histogram[polarity].keys():
                    #        histogram[size]=0
                    for size, count in histogram[polarity].iteritems():
                        #print >>size_distrib, "%s\t%s\t%s\t%s\t%s" % (gene, size, count, polarity, sample) # test, changed the order accordingly
                        listoflines.append ("%s\t%s\t%s\t%s\t%s" % (gene, size, count, polarity, sample) )
        listoflines = dataframe_sanityzer(listoflines)
        for line in listoflines:
            print >>size_distrib, line

def gff_item_subinstances(readDict, gff3):
    GFFinstanceDict=OrderedDict()
    for sample in readDict.keys():
        GFFinstanceDict[sample]={} # to implement the 2nd level of dictionary in an OrderedDict Class object (would not be required with defaultdict Class)
    with open(gff3) as gff:
        for line in gff:
            if line[0] == "#": continue
            gff_fields = line[:-1].split("\t")
            chrom = gff_fields[0]
            gff_name = gff_fields[-1].split("Name=")[-1].split(";")[0] # to isolate the GFF Name
            item_upstream_coordinate = int(gff_fields[3])
            item_downstream_coordinate = int(gff_fields[4])
            item_polarity = gff_fields[6]
            for sample in readDict.keys():
                ## this is not required anymore but test
                # if not GFFinstanceDict.has_key(sample):
                #     GFFinstanceDict[sample]={}
                ####
                subinstance=extractsubinstance(item_upstream_coordinate, item_downstream_coordinate, readDict[sample].instanceDict[chrom])
                if item_polarity == '-':
                    # negate the coordinate keys for items annotated on the minus strand
                    subinstance.readDict={key*-1:value for key, value in subinstance.readDict.iteritems()}
                subinstance.gene=gff_name
                GFFinstanceDict[sample][gff_name]=subinstance
    return GFFinstanceDict

MasterListOfGenomes=process_samples(filePath)

if args.gff:
    MasterListOfGenomes=gff_item_subinstances(MasterListOfGenomes, args.gff)

write_readplot_dataframe(MasterListOfGenomes, readmap_file)
write_size_distribution_dataframe(MasterListOfGenomes, size_distribution_file)

# Pass the command as a list so that a script path containing spaces is not split
process = subprocess.Popen(["Rscript", Rcode])
process.wait()
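
The filtering step in dataframe_sanityzer can be tried in isolation. Below is a minimal sketch, assuming tab-separated lines with the gene in the first column and the count in the third, as in both dataframes written by the script. The function name drop_zero_total_genes and the sample rows are hypothetical, and the sketch is written for Python 3, whereas the listing itself is Python 2.

```python
from collections import defaultdict

def drop_zero_total_genes(lines):
    """Keep only lines whose gene (column 1) has a non-zero summed count (column 3)."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split("\t")
        totals[fields[0]] += float(fields[2])
    return [line for line in lines if totals[line.split("\t")[0]] != 0]

# Hypothetical rows: gene, coord, count, polarity, sample
rows = [
    "geneA\t10\t5.0\tF\tsampleA",
    "geneA\t12\t-2.0\tR\tsampleA",
    "geneB\t3\t0.0\tF\tsampleA",
    "geneB\t7\t0\tR\tsampleA",
]
kept = drop_zero_total_genes(rows)
print(kept)  # only the two geneA rows remain
```

Counts are summed with their sign, so a gene whose positive and negative counts cancel exactly would be dropped as well. Whether that can happen depends on how smRtools reports reverse-strand counts, which is not visible in this file.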

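The per-line GFF3 handling in gff_item_subinstances can also be looked at on its own. This is a sketch under stated assumptions: parse_gff_line and flip_coordinates are hypothetical helper names, the example line is made up, and the code targets Python 3. It uses rstrip("\n") rather than line[:-1], so a final line without a trailing newline is not truncated.

```python
def parse_gff_line(line):
    """Return (chrom, name, start, end, strand) for a GFF3 feature line, None for a comment."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    # Same expression as the script: text after "Name=" up to the next ";"
    name = fields[-1].split("Name=")[-1].split(";")[0]
    return fields[0], name, int(fields[3]), int(fields[4]), fields[6]

def flip_coordinates(read_dict):
    """Negate the coordinate keys, as the script does for '-' strand items."""
    return {key * -1: value for key, value in read_dict.items()}

# Hypothetical feature line with the nine tab-separated GFF3 columns
example = "chr1\t.\tmiRNA\t100\t121\t.\t-\t.\tID=feat1;Name=mir-example\n"
parsed = parse_gff_line(example)
print(parsed)

# Without a Name= attribute the expression falls back to the first attribute
no_name = parse_gff_line("chr1\t.\tmiRNA\t5\t9\t.\t+\t.\tID=feat2\n")
print(no_name[1])

flipped = flip_coordinates({100: 3, 105: 1})
print(flipped)
```

The fallback in the second call means that features lacking a Name attribute end up keyed by their first attribute (here the ID), which may or may not be the intended behaviour.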