annotate scripts/pogs.py @ 6:b19ed7395dcc draft

planemo upload for repository https://github.com/abims-sbr/adaptsearch commit cf1b9c905931ca2ca25faa4844d45c908756472f
author abims-sbr
date Wed, 17 Jan 2018 08:54:30 -0500
parents
children 1a728cb1da31
#!/usr/bin/env python
# coding: utf8
# September 2017 - Author : Victor Mataigne (Station Biologique de Roscoff - ABiMS)
# Command line : ./pogsPOO.py <list_of_input_files_separated_by_commas> <minimal number of species per group> [-v] [-p]

"""
What it does:
    - pogs.py parses output files from the "pairwise" tool of the AdaptSearch suite and proceeds to gather genes in orthogroups (using transitivity).
    - A minimal number of species per group has to be set.

BETA VERSION
"""

import os, argparse
import numpy as np
import pandas as pd

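The module docstring says genes are gathered into orthogroups "using transitivity": if A pairs with B and B pairs with C, then A, B and C end up in one group. The following is a minimal sketch of that idea only, not the script's own implementation. The helper name `merge_pairs` and the headers are hypothetical, and plain strings stand in for loci.

```python
# Hypothetical helper, not part of pogs.py: merge pairs that share a member.
def merge_pairs(pairs):
    groups = []  # disjoint sets built so far
    for pair in pairs:
        merged = set(pair)
        remaining = []
        for group in groups:
            if group & merged:       # shares at least one locus with the new pair
                merged |= group      # absorb the whole group (transitivity)
            else:
                remaining.append(group)
        remaining.append(merged)
        groups = remaining
    return groups

# Made-up headers; the leading letters stand for a species code.
pairs = [{"Ab1", "Cd1"}, {"Cd1", "Ef1"}, {"Ab2", "Ef2"}]
groups = merge_pairs(pairs)
```

Because the groups stay disjoint after every step, a pair that bridges two existing groups absorbs both of them in a single pass.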
""" Definition of a locus : header + sequence + a tag """
class Locus:

    def __init__(self, header, sequence):
        self.header = header
        self.sequence = sequence
        self.tagged = False

    def __str__(self):
        return "{}{}".format(self.header, self.sequence)

    def __eq__(self, other):
        return self.getHeader() == other.getHeader()  # Test if two loci are the same

    def __hash__(self):
        # Make the object hashable (so loci can be stored in sets)
        # Note : the hash uses header and sequence, whereas __eq__ compares headers only
        return hash((self.header, self.sequence))

    def getHeader(self):
        return self.header

    def getSequence(self):
        return self.sequence

    def getTag(self):
        return self.tagged

    def prettyPrint(self):
        # Used for debugging : print "{ Header : ", self.header[0:-1], "Tag : ", self.tagged, " }"
        print("[ Header : {header} ]".format(header=self.header[0:-1]))

    def prettyPrint2(self):
        print("[ Header : {header} Sequence : {sequence} ]".format(header=self.header[0:-1], sequence=self.sequence[0:-1]))

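The Locus class above defines `__eq__` on the header alone and `__hash__` on the (header, sequence) tuple. A set looks at the hash first and only then calls `__eq__`, so two loci with the same header but different sequences compare equal, yet a set will normally still keep both. The sketch below illustrates this with a stand-in class; `LocusSketch` and the sample records are assumptions made for the example, not code from the script.

```python
# Stand-in mirroring only the __eq__/__hash__ pair of the Locus class.
class LocusSketch(object):
    def __init__(self, header, sequence):
        self.header = header
        self.sequence = sequence

    def __eq__(self, other):
        return self.header == other.header

    def __hash__(self):
        return hash((self.header, self.sequence))

a = LocusSketch(">Ab1\n", "ATGC\n")
b = LocusSketch(">Ab1\n", "ATGC\n")  # same header, same sequence
c = LocusSketch(">Ab1\n", "ATG\n")   # same header, shorter sequence

same_record_collapses = len({a, b}) == 1  # equal hashes and equal headers
equal_despite_sequence = (a == c)         # __eq__ ignores the sequence
```

Whether the same header can ever carry two different sequences across pairwise files depends on the upstream tool; if it cannot, the mismatch has no practical effect.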
""" Applies the getPairwiseCouple() function to a list of files and returns a big list with ALL pairwise couples
    Returns a list of sets (2 items per set) """
def getListPairwiseAll(listPairwiseFiles):

    # Sub-Function

    """ Reads an output file from the 'Pairwise' tool (AdaptSearch suite) and returns its content as a list
        Returns a list of sets (2 items per set) """
    def getPairwiseCouple(pairwiseFile):
        list_pairwises_2sp = []
        with open(pairwiseFile, "r") as file:
            while (1):  # Ugly !
                name, sequence, name2, sequence2 = file.readline(), file.readline(), file.readline(), file.readline()
                if not name: break  # Use assert ?
                # One locus every two lines (one pairwise couple = 4 lines) : header + sequence
                locus1 = Locus(name, sequence)
                locus2 = Locus(name2, sequence2)
                group = set([])
                group.add(locus1)
                group.add(locus2)
                list_pairwises_2sp.append(group)
        return (list_pairwises_2sp)

    # Function
    list_pairwises_allsp = []
    for file in listPairwiseFiles:
        listPairwises = getPairwiseCouple(file)
        for pairwise in listPairwises:
            list_pairwises_allsp.append(pairwise)  # all pairwises in the same 1D list
    return list_pairwises_allsp

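getPairwiseCouple() reads its input four lines at a time (header, sequence, header, sequence) and stops when the first header comes back empty. The following is a hedged sketch of the same record layout written as a generator, with an explicit check for a file that ends in the middle of a pair. The function name `read_pairs` and the sample content are hypothetical, and unlike the script this sketch strips the trailing newlines.

```python
import os
import tempfile

def read_pairs(path):
    """Yield (header1, seq1, header2, seq2) tuples from a 4-lines-per-pair file."""
    with open(path, "r") as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:        # empty string means end of file
                break
            if not all(record):      # file ended in the middle of a pair
                raise ValueError("incomplete pair at end of {}".format(path))
            yield tuple(line.rstrip("\n") for line in record)

# Made-up two-pair file, for illustration only.
content = ">Ab1\nATGC\n>Cd1\nATGA\n>Ab2\nGGCC\n>Cd2\nGGCA\n"
tmp = tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False)
tmp.write(content)
tmp.close()

pairs = list(read_pairs(tmp.name))
os.remove(tmp.name)
```

Raising on a truncated record is a design choice of this sketch; the script itself would build a Locus from empty strings in that situation.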
""" Proceeds to create orthogroups by putting together pairwise couples sharing a locus.
    Iterates over the orthogroups list and tags as 'True' the pairwise couples already gathered in a group to avoid
    redundancy. Writes each orthogroup in a fasta file
    Returns an integer (a list length) """
def makeOrthogroups(list_pairwises_allsp, minspec, nb_rbh, verbose, paralogs):

    # Sub-functions

    """ Checks if a locus/group has already been treated in makeOrthogroups()
        Returns a boolean """
    def checkIfTagged(pair):
        tag = True
        for element in pair:
            if not element.getTag() and tag:  # use a list comprehension maybe ?
                tag = False
        return tag

    """ True means a locus/group has already been treated in makeOrthogroups()
        A more robust design would be to implement this as a method of the class Locus """
    def tagGroup(pair):
        for element in pair:
            element.tagged = True

    """ Writes an orthogroup in a file """
    def writeOutputFile(orthogroup, number, naming):
        name = ""
        if naming:
            name = "orthogroup_{}_with_{}_sequences_withParalogs.fasta".format(number, len(orthogroup))
        else:
            name = "orthogroup_{}_with_{}_sequences.fasta".format(number, len(orthogroup))
        with open(name, "w") as result:
            for locus in orthogroup:
                # Headers and sequences may or may not end with a newline : write exactly one
                if locus.getHeader()[-1] == "\n":
                    result.write("%s" % locus.getHeader())  # write geneID
                else:
                    result.write("%s\n" % locus.getHeader())  # write geneID
                if locus.getSequence()[-1] == "\n":
                    result.write("%s" % locus.getSequence())  # write sequence
                else:
                    result.write("%s\n" % locus.getSequence())  # write sequence
        # Move the file into the matching output directory
        if naming:
            os.system("mv {} outputs_withParalogs/".format(name))
        else:
            os.system("mv {} outputs/".format(name))

    """ Parses an orthogroup list to keep only one paralog sequence per species & per group
        (Keeps the first paralog encountered)
        Returns a list """
    def filterParalogs(list_orthogroups, minspec):
        list_orthogroups_format = []
        j = 1

        for nofilter_group in list_orthogroups:
            new_group = []
            species = {}
            # The species code is read from characters 1 and 2 of the header
            for loci in nofilter_group:
                species[loci.getHeader()[1:3]] = False
            for loci in nofilter_group:
                if not species[loci.getHeader()[1:3]]:
                    new_group.append(loci)
                    species[loci.getHeader()[1:3]] = True

            if len(new_group) >= minspec:  # Drop too small orthogroups
                list_orthogroups_format.append(new_group)
                writeOutputFile(new_group, j, False)
                j += 1

        return list_orthogroups_format

    """ Builds a 2D array for a summary
    Returns a numpy 2D array """
    def countings(listOrthogroups, nb_rbh):

        def compute_nbspec(nb_rbh):
            # Recover the number of species from the number of pairwise (RBH) files:
            # n species give n*(n-1)/2 pairwise comparisons.

            def factorielle(x):
                # Despite its name, this returns the sum 1 + 2 + ... + x (not x!)
                n = 1
                s = 0
                while n <= x:
                    s += n
                    n += 1
                return s

            x = 2
            # x*x - factorielle(x) equals x*(x-1)/2, the number of species pairs
            while x*x - factorielle(x) < nb_rbh:
                x += 1
            return x
        #listOrthogroups.sort().reverse()
        #nblines = len(listOrthogroups[0])
        # Number of rows = size of the largest orthogroup
        nblines = 0
        for group in listOrthogroups:
            if len(group) > nblines:
                nblines = len(group)
        matrix = np.array([[0]*compute_nbspec(nb_rbh)]*nblines)

        # matrix[i][j] counts the orthogroups with i+1 sequences from j+1 species
        for group in listOrthogroups:
            listSpecs = []
            for loci in group:
                # header[1:3] is treated as the species tag
                if loci.getHeader()[1:3] not in listSpecs:
                    listSpecs.append(loci.getHeader()[1:3])
            matrix[len(group)-1][len(listSpecs)-1] += 1

        return matrix

    """ Converts the numpy 2D array into a labelled dataframe
    Returns a pandas 2D dataframe """
    def asFrame(matrix):
        index = [str(i+1)+" seqs" for i in range(len(matrix))]
        colnames = [str(i+1)+" sps" for i in range(len(matrix[0]))]
        df = pd.DataFrame(matrix, index=index, columns=colnames)
        return df # TODO: add a selection to return only the rows and columns that sum to > 0
        #return df.loc['4 seqs':'9 seqs'].loc[:,colnames[3:]]

    # Function -------------------------------------------------------------------------------------------------
    list_orthogroups = []

    for ortho_pair1 in list_pairwises_allsp[0:-1]:
        if not checkIfTagged(ortho_pair1):
            orthogroup = ortho_pair1 # the orthogroup grows as we go through the second loop

            # Check for a common locus between two groups
            for ortho_pair2 in list_pairwises_allsp[list_pairwises_allsp.index(ortho_pair1) + 1:]:
                if len(orthogroup.intersection(ortho_pair2)) != 0 and not checkIfTagged(ortho_pair2):
                    orthogroup.update(ortho_pair2)
                    tagGroup(ortho_pair2)

            # Check if the subgroup is already computed
            if len(list_orthogroups) > 0:
                presence = False
                for group in list_orthogroups:
                    if len(group.intersection(orthogroup)) != 0:
                        group.update(orthogroup)
                        presence = True
                if not presence:
                    list_orthogroups.append(orthogroup)
            else:
                list_orthogroups.append(orthogroup)

    # Options --------------------------------------------------------------------------------------------------

    """ nb : I could implement more complex code that does all of the following within the previous loop, to avoid parsing
    the orthogroups list several times, but the code would become hard to read. Since the whole program is already quite fast, I chose
    code simplicity over code efficiency """

    # Print summary table with all paralogs
    if verbose:
        frame = countings(list_orthogroups, nb_rbh)
        df = asFrame(frame)
        print "\n Summary before paralogous filtering : \n"
        print df.loc[df.ne(0).any(axis=1), df.ne(0).any()], "\n" # Don't display columns and rows filled with 0

    # Write output files with all the paralogs
    if paralogs:
        print "Writing orthogroups with paralogs files ...\n"
        j = 1
        for group in list_orthogroups:
            if len(group) >= minspec:
                writeOutputFile(group, j, True)
                j += 1

    # Paralogs filtering and summary ----------------------------------------------------------------------------

    print "Filtering paralogous sequences and writing final orthogroups files ..."
    print " (Dropping Orthogroups with fewer than {} species)".format(minspec)

    # writeOutputFile() is called in filterParalogs()
    list_orthogroups_format = filterParalogs(list_orthogroups, minspec)

    frame = countings(list_orthogroups_format, nb_rbh)
    df = asFrame(frame)
    print "\n Summary after paralogous filtering : \n"
    print df.loc[df.ne(0).any(axis=1), df.ne(0).any()]

    # Return only the length of the list (at this point the program doesn't need more)
    return len(list_orthogroups_format)

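The transitive grouping done in makeOrthogroups() can also be read as finding connected components in a graph whose edges are the RBH pairs. The sketch below is not the method used in pogs.py; it is a hypothetical alternative based on a union-find structure, with plain strings standing in for the Locus objects. The names `group_by_transitivity`, `find`, `union` and the example identifiers are assumptions made for illustration only.

```python
def group_by_transitivity(pairs):
    """Merge pairs that share at least one member into groups.

    `pairs` is an iterable of 2-element tuples of hashable items
    (hypothetical stand-ins for the Locus objects used in pogs.py).
    """
    parent = {}

    def find(item):
        # Register unseen items as their own root
        parent.setdefault(item, item)
        # Walk up to the root, shortening the path along the way
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    # Collect every item under the root of its component
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


# Illustrative identifiers only: two pairs share "Bc7", the third is isolated
example_pairs = [("Ap1", "Bc7"), ("Bc7", "Cd3"), ("Ap2", "Cd9")]
example_groups = group_by_transitivity(example_pairs)
```

Each pair is processed once, so this formulation avoids rescanning the list of pairs for every group; whether that matters depends on the input size, and the original comment already notes that the program is fast enough in practice.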
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("files", help="Input files separated by commas. Each file contains all the reciprocal best hits between a pair of species")
    parser.add_argument("minspec", help="Only keep Orthogroups with at least this number of species", type=int)
    parser.add_argument("-v", "--verbose", action="store_true", help="Also print a summary table of orthogroups before paralogs filtering")
    parser.add_argument("-p", "--paralogs", action="store_true", help="Also write the orthogroups as they are before paralogs filtering")
    args = parser.parse_args()

    print "*** pogs.py ***"
    print "\nBuilding of orthogroups based on pairs of genes obtained by pairwise comparisons between pairs of species."
    print "Genes are gathered in orthogroups based on the principle of transitivity between gene pairs."

    os.system("mkdir outputs")
    if args.paralogs: os.system("mkdir outputs_withParalogs")
    infiles = args.files
    listPairwiseFiles = infiles.split(",")
    print "\nParsing input files ..."
    list_Locus = getListPairwiseAll(listPairwiseFiles)
    print "Creating Orthogroups ..."
    nb_orthogroups = makeOrthogroups(list_Locus, args.minspec, len(listPairwiseFiles), args.verbose, args.paralogs)
    print "\n{} orthogroups have been inferred from {} pairwise comparisons by RBH\n".format(nb_orthogroups, len(listPairwiseFiles))

if __name__ == "__main__":
    main()
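compute_nbspec() in countings() recovers the number of species from the number of pairwise input files by incrementing x until x*(x-1)/2 reaches nb_rbh. A minimal sketch of the same computation using the closed-form root of n*(n-1)/2 = nb_rbh is given below. The function name `species_count` and its parameter name are hypothetical and not part of pogs.py; the two correction loops are there only to absorb floating-point rounding.

```python
import math


def species_count(nb_pairwise_files):
    """Smallest n >= 2 such that n*(n-1)/2 >= nb_pairwise_files."""
    # Floating-point estimate of the positive root of n*(n-1)/2 = nb_pairwise_files
    n = int((1 + math.sqrt(1 + 8 * nb_pairwise_files)) / 2)
    n = max(n, 2)
    # Correct a possible rounding error in either direction
    while n * (n - 1) // 2 < nb_pairwise_files:
        n += 1
    while n > 2 and (n - 1) * (n - 2) // 2 >= nb_pairwise_files:
        n -= 1
    return n
```

For instance, 10 pairwise files correspond to 5 species, since 5*4/2 = 10.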