annotate scripts/S01_find_orf_on_multiple_alignment.py @ 11:06a28df198b6 draft default tip

planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
author lecorguille
date Mon, 24 Sep 2018 03:58:34 -0400
parents 3d00be2d05f3
children
rev   line source
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
1 #!/usr/bin/env python
8
716a45028e55 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 90cfcf697b9f128e81bea1270378e59d63ab0a6f
abims-sbr
parents: 7
diff changeset
2 # coding: utf8
11
3 # Author: Eric Fontanillas
4 # Modification: 03/09/14 by Julie BAFFARD
5 # Last modification : 25/07/18 by Victor Mataigne
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
6
11
7 # Description: Predict potential ORFs on the basis of 2 criteria + 1 optional criterion
8 # CRITERION 1 - Longest part of the sequence alignment without a stop codon "*", tested in the 3 potential ORFs
9 # CRITERION 2 - This longest part should be > 150 nucleotides or 50 aa
10 # CRITERION 3 - [OPTIONAL] A start codon "M" should be present in this longest part, before the last 50 aa
11 # OUTPUTs "05_CDS_aa" & "05_CDS_nuc" => DO NOT INCLUDE THIS CRITERION
12 # OUTPUTs "06_CDS_with_M_aa" & "06_CDS_with_M_nuc" => INCLUDE THIS CRITERION
1
13
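The first two criteria listed in the header can be illustrated with a small, self-contained sketch. It is not the script's implementation: the function name `longest_stop_free_run`, its signature and the toy alignment are assumptions made for illustration, and it is written for Python 3 whereas the annotated script targets Python 2. The sketch scans the columns of an amino-acid alignment, treats any column containing `*` as a break, and keeps the longest break-free run only if it is strictly longer than `min_len`.

```python
def longest_stop_free_run(aligned_aa, min_len=50):
    """Return (start, end) of the longest run of columns without '*', or None.

    aligned_aa: list of equal-length amino-acid strings (hypothetical input).
    The run is reported as a half-open column interval [start, end).
    """
    if not aligned_aa:
        return None
    n_cols = len(aligned_aa[0])
    best = None
    run_start = 0
    # Iterate one column past the end so that the final run is closed too.
    for col in range(n_cols + 1):
        at_end = col == n_cols
        if at_end or any(seq[col] == "*" for seq in aligned_aa):
            run_len = col - run_start
            if run_len > min_len and (best is None or run_len > best[1] - best[0]):
                best = (run_start, col)
            run_start = col + 1
    return best


toy = ["MKT*AAAA", "MKTLAAAA"]
print(longest_stop_free_run(toy, min_len=3))  # (4, 8)
```

Closing the last run at the end of the alignment is a choice made in this sketch, so that an alignment with no stop codon at all still yields one candidate segment.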
11
14 import string, os, time, re, zipfile, sys, argparse
15 from dico import dico
1
16
17 def code_universel(F1):
11
18     """ Creates a hash (dict) for the genetic code (key: codon; value: amino acid) """
1
19     bash_codeUniversel = {}
11
20
7
35e39b4128ba planemo upload for repository https://github.com/abims-sbr/adaptsearch commit b7a3030ea134b5dfad89b1a869db659d72d1145c
abims-sbr
parents: 3
diff changeset
21     with open(F1, "r") as file:
22         for line in file.readlines():
23             L1 = string.split(line, " ")
24             length1 = len(L1)
25             if length1 == 3:
26                 key = L1[0]
27                 value = L1[2][:-1]
28                 bash_codeUniversel[key] = value
29             else:
30                 key = L1[0]
31                 value = L1[2]
32                 bash_codeUniversel[key] = value
1
33
11
34     return(bash_codeUniversel)
35
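As a rough Python 3 counterpart to `code_universel`, the sketch below reads a codon table from any line iterable. The three-field line layout (`UUU = F`), the function name `load_genetic_code` and the in-memory sample are assumptions; the real format of the table file is not shown in this listing. Unlike the original, the sketch splits on any whitespace and skips lines with fewer than three fields.

```python
import io


def load_genetic_code(handle):
    """Build a {codon: amino_acid} dict from lines shaped like 'UUU = F'."""
    code = {}
    for line in handle:
        fields = line.split()  # split on any whitespace, drops the newline
        if len(fields) < 3:
            continue  # ignore blank or malformed lines (assumption)
        code[fields[0]] = fields[2]
    return code


sample = io.StringIO("UUU = F\nAUG = M\n\nUAA = *\n")
table = load_genetic_code(sample)
print(table["AUG"])  # M
```

Using `str.split()` without an argument avoids the manual `[:-1]` newline trimming that the Python 2 version performs.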
1
36 def multiple3(seq):
11
37     """ Tests if the sequence length is a multiple of 3, and if not removes the extra bases
38         !! A codon may be lost when the ORFs are tested (as the reading frame is shifted) """
39
40     m = len(seq)%3
41     if m != 0 :
42         return seq[:-m], m
43     else :
44         return seq, m
1
45
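The trimming step in `multiple3` can be written compactly. The sketch below uses a hypothetical name, `trim_to_codons`, and is meant only to show why the zero-remainder case needs its own branch: `seq[:-0]` is the empty string, not the whole sequence.

```python
def trim_to_codons(seq):
    """Drop trailing bases so that len(seq) is a multiple of 3.

    Returns the trimmed sequence and the number of bases removed.
    """
    extra = len(seq) % 3
    # seq[:-0] would be '', so the untrimmed case is handled explicitly.
    trimmed = seq[:-extra] if extra else seq
    return trimmed, extra


print(trim_to_codons("ATGAA"))   # ('ATG', 2)
print(trim_to_codons("ATGAAA"))  # ('ATGAAA', 0)
```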
11
46 def detect_Methionine(seq_aa, Ortho, minimal_cds_length):
47     """ Detects whether a methionine is present in the aa sequence """
1
48
11
49     ln = len(seq_aa)
50     CUTOFF_Last_50aa = ln - minimal_cds_length
1
51
11
52     # Find all indices of occurrences of "M" in a string of aa
53     list_indices = [pos for pos, char in enumerate(seq_aa) if char == "M"]
1
54
11
55     # If some "M" are present, check that the first "M" found is not in the last 50 aa (index < CUTOFF_Last_50aa); if it is in the last 50 aa: maybe not a CDS
56     if list_indices != []:
57         first_M = list_indices[0]
58         if first_M < CUTOFF_Last_50aa:
59             Ortho = 1 # means orthologs found
2
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
60
11
61     return(Ortho)
62
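The methionine test can be condensed with `str.find`, which returns the index of the first `M` or `-1` when there is none. This is a sketch under assumptions: the name `has_early_methionine` and the boolean return value are illustrative, whereas `detect_Methionine` above threads an `Ortho` flag through its arguments.

```python
def has_early_methionine(seq_aa, minimal_cds_length=50):
    """True if the first 'M' lies before the last `minimal_cds_length` residues."""
    first_m = seq_aa.find("M")  # -1 when no methionine is present
    cutoff = len(seq_aa) - minimal_cds_length
    return first_m != -1 and first_m < cutoff


print(has_early_methionine("AM" + "K" * 60))  # True
print(has_early_methionine("K" * 60 + "M"))   # False
```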
63 def ReverseComplement2(seq):
64     """ Reverse complement DNA sequence """
65     seq1 = 'ATCGN-TAGCN-atcgn-tagcn-'
66     seq_dict = { seq1[i]:seq1[i+6] for i in range(24) if i < 6 or 12<=i<=16 }
1
67
11
68     return "".join([seq_dict[base] for base in reversed(seq)])
69
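An alternative way to express the same mapping in Python 3 is a translation table built with `str.maketrans`. The name `reverse_complement` is hypothetical. One behavioral difference is worth noting: characters absent from the table (including the gap `-`) pass through unchanged here, while the dictionary lookup in `ReverseComplement2` raises `KeyError` for any unexpected character.

```python
# Map each base to its complement; 'N' and 'n' map to themselves.
_COMPLEMENT = str.maketrans("ATCGNatcgn", "TAGCNtagcn")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATG-cn"))  # ng-CAT
```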
70 def simply_get_ORF(seq_dna, gen_code):
71     seq_by_codons = [seq_dna.upper().replace('T', 'U')[i:i+3] for i in range(0, len(seq_dna), 3)]
72     seq_by_aa = [gen_code[codon] if codon in gen_code.keys() else '?' for codon in seq_by_codons]
1
73
11
74     return ''.join(seq_by_aa)
1
75
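To make the behavior of `simply_get_ORF` concrete, here is a minimal Python 3 sketch with a deliberately tiny codon table. The name `translate` and the three-entry `mini_code` dict are assumptions for illustration; a real run would use the full table loaded by `code_universel`. Codons missing from the table, such as gap codons or a trailing partial codon, come out as `?`.

```python
def translate(seq_dna, gen_code):
    """Translate DNA codon by codon using an RNA-keyed table; unknown codons give '?'."""
    rna = seq_dna.upper().replace("T", "U")
    codons = (rna[i:i + 3] for i in range(0, len(rna), 3))
    return "".join(gen_code.get(codon, "?") for codon in codons)


mini_code = {"AUG": "M", "AAA": "K", "UAA": "*"}  # toy table, not the full code
print(translate("ATGAAATAA", mini_code))  # MK*
print(translate("atg---taa", mini_code))  # M?*
```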
11
76 def find_good_ORF_criteria_3(bash_aligned_nc_seq, bash_codeUniversel, minimal_cds_length, min_spec):
77     # Multiple-sequence based: based on the alignment of several sequences (orthogroup)
78     # Criterion 1: Get the segment in the alignment with no stop codon
2
79
11
80     # 1 - Get the list of aligned aa seq for the 6 ORFs (3 forward, 3 reverse complement):
1
81     bash_of_aligned_aa_seq_3ORF = {}
82     bash_of_aligned_nuc_seq_3ORF = {}
83     BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION = []
11
84
1
85     for fasta_name in bash_aligned_nc_seq.keys():
11
86         # Get the sequence, check that its length is a multiple of 3, then get the 6 ORFs
1
87         sequence_nc = bash_aligned_nc_seq[fasta_name]
11
88         new_sequence_nc, modulo = multiple3(sequence_nc)
89         new_sequence_rev = ReverseComplement2(new_sequence_nc)
90         # For each seq of the multialignment => give the 6 ORFs (in nuc)
91         bash_of_aligned_nuc_seq_3ORF[fasta_name] = [new_sequence_nc, new_sequence_nc[1:-2], new_sequence_nc[2:-1], new_sequence_rev, new_sequence_rev[1:-2], new_sequence_rev[2:-1]]
2
92
11
93         seq_prot_ORF1 = simply_get_ORF(new_sequence_nc, bash_codeUniversel)
94         seq_prot_ORF2 = simply_get_ORF(new_sequence_nc[1:-2], bash_codeUniversel)
95         seq_prot_ORF3 = simply_get_ORF(new_sequence_nc[2:-1], bash_codeUniversel)
96         seq_prot_ORF4 = simply_get_ORF(new_sequence_rev, bash_codeUniversel)
97         seq_prot_ORF5 = simply_get_ORF(new_sequence_rev[1:-2], bash_codeUniversel)
98         seq_prot_ORF6 = simply_get_ORF(new_sequence_rev[2:-1], bash_codeUniversel)
99
100         # For each seq of the multialignment => give the 6 ORFs (in aa)
101         bash_of_aligned_aa_seq_3ORF[fasta_name] = [seq_prot_ORF1, seq_prot_ORF2, seq_prot_ORF3, seq_prot_ORF4, seq_prot_ORF5, seq_prot_ORF6]
1
102
11
103     # 2 - Test for the best ORF (get the longest segment in the alignment with no stop codon, for each ORF; the longest should give the ORF)
104     BEST_MAX = 0
1
105
11
106     for i in [0,1,2,3,4,5]: # Test the 6 ORFs
1
107         ORF_Aligned_aa = []
108         ORF_Aligned_nuc = []
2
109
11
110         # 2.1 - Get the alignment of sequences for a given ORF
111         # Compare the 1st ORF between all sequences => list them in ORF_Aligned_aa // then do the same for the second ORF, and then the 3rd
2
112         for fasta_name in bash_of_aligned_aa_seq_3ORF.keys():
1
113             ORFsequence = bash_of_aligned_aa_seq_3ORF[fasta_name][i]
114             aa_length = len(ORFsequence)
115             ORF_Aligned_aa.append(ORFsequence) ### List of all sequences in ORF number "i"
116
117         n = i+1
2
118
119         for fasta_name in bash_of_aligned_nuc_seq_3ORF.keys():
1
120             ORFsequence = bash_of_aligned_nuc_seq_3ORF[fasta_name][i]
121             nuc_length = len(ORFsequence)
11
122             ORF_Aligned_nuc.append(ORFsequence) # List of all sequences in ORF number "i"
1
123
11
124         # 2.2 - Get the list of sublists of positions without a stop codon in the alignment
125         # For each ORF, now we have the list of sequences available (i.e. THE ALIGNMENT IN A GIVEN ORF)
126         # Next step is to get the longest subsequence without a stop
127         # We will look for a stop "*" in each column of the alignment, and get the positions of the segments between the columns with "*"
1
128         MAX_LENGTH = 0
2
129         LONGUEST_SEGMENT_UNSTOPPED = ""
130         j = 0 # Start from first position in alignment
1
131         List_of_List_subsequences = []
132         List_positions_subsequence = []
2
133         while j < aa_length:
1
134             column = []
135             for seq in ORF_Aligned_aa:
136                 column.append(seq[j])
137             j = j+1
138             if "*" in column:
11
139                 List_of_List_subsequences.append(List_positions_subsequence) # Add previous list of positions
140                 List_positions_subsequence = [] # Re-initialize list of positions
1
141             else:
142                 List_positions_subsequence.append(j)
2
143
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
144 # 2.3 - Among all the sublists (separated by column with codon stop "*"), get the longuest one (BETTER SEGMENT for a given ORF)
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
145 LONGUEST_SUBSEQUENCE_LIST_POSITION = []
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
146 MAX=0
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
147 for sublist in List_of_List_subsequences:
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
148 if len(sublist) > MAX and len(sublist) > minimal_cds_length:
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
149 MAX = len(sublist)
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
150 LONGUEST_SUBSEQUENCE_LIST_POSITION = sublist
2
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
151
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
152 # 2.4 - Test whether the longest subsequence starts exactly at the beginning of the original sequence (i.e. the ORF may be truncated)
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
153 if LONGUEST_SUBSEQUENCE_LIST_POSITION != []:
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
154 if LONGUEST_SUBSEQUENCE_LIST_POSITION[0] == 0:
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
155 CDS_maybe_truncated = 1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
156 else:
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
157 CDS_maybe_truncated = 0
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
158 else:
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
159 CDS_maybe_truncated = 0
2
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
160
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
161
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
162 # 2.5 - Test whether this BEST SEGMENT for a given ORF is better than the one found for the other ORFs (GET THE BEST ORF)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
163 # Keep it if it is longer than the best segment found so far
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
164 if MAX > BEST_MAX:
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
165 BEST_MAX = MAX
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
166 BEST_ORF = i+1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
167 BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION = LONGUEST_SUBSEQUENCE_LIST_POSITION
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
168
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
169
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
170 # 3 - ONCE we have this best segment (BEST CODING SEGMENT)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
171 # ==> GET THE STARTING and ENDING POSITIONS (in aa position and in nuc position)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
172 # And get the INDEX of the best ORF [0, 1, or 2]
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
173 if BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION != []:
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
174 pos_MIN_aa = BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION[0]
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
175 pos_MIN_aa = pos_MIN_aa - 1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
176 pos_MAX_aa = BEST_LONGUEST_SUBSEQUENCE_LIST_POSITION[-1]
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
177
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
178
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
179 BESTORF_bash_of_aligned_aa_seq = {}
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
180 BESTORF_bash_of_aligned_aa_seq_CODING = {}
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
181 for fasta_name in bash_of_aligned_aa_seq_3ORF.keys():
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
182 index_BEST_ORF = BEST_ORF-1 # because the list is indexed from 0 to 2 in LIST_3_ORF, while the ORF number runs from 1 to 3
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
183 seq = bash_of_aligned_aa_seq_3ORF[fasta_name][index_BEST_ORF]
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
184 seq_coding = seq[pos_MIN_aa:pos_MAX_aa]
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
185 BESTORF_bash_of_aligned_aa_seq[fasta_name] = seq
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
186 BESTORF_bash_of_aligned_aa_seq_CODING[fasta_name] = seq_coding
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
187
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
188 # 4 - Get the corresponding position (START/END of BEST CODING SEGMENT) for nucleotides alignment
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
189 pos_MIN_nuc = pos_MIN_aa * 3
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
190 pos_MAX_nuc = pos_MAX_aa * 3
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
191
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
192 BESTORF_bash_aligned_nc_seq = {}
2
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
193 BESTORF_bash_aligned_nc_seq_CODING = {}
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
194 for fasta_name in bash_aligned_nc_seq.keys():
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
195 seq = bash_of_aligned_nuc_seq_3ORF[fasta_name][index_BEST_ORF]
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
196 seq_coding = seq[pos_MIN_nuc:pos_MAX_nuc]
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
197 BESTORF_bash_aligned_nc_seq[fasta_name] = seq
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
198 BESTORF_bash_aligned_nc_seq_CODING[fasta_name] = seq_coding
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
199
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
200 else: # no CDS found
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
201 BESTORF_bash_aligned_nc_seq = {}
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
202 BESTORF_bash_aligned_nc_seq_CODING = {}
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
203 BESTORF_bash_of_aligned_aa_seq = {}
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
204 BESTORF_bash_of_aligned_aa_seq_CODING ={}
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
205
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
206 # Check whether there is an "M" or not and, if at least one "M" is present, that it is not in the last 50 aa
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
207
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
208 BESTORF_bash_of_aligned_aa_seq_CDS_with_M = {}
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
209 BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = {}
2
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
210
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
211 Ortho = 0
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
212 for fasta_name in BESTORF_bash_of_aligned_aa_seq_CODING.keys():
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
213 seq_aa = BESTORF_bash_of_aligned_aa_seq_CODING[fasta_name]
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
214 Ortho = detect_Methionine(seq_aa, Ortho, minimal_cds_length) ### DEF6 ###
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
215
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
216 # CASE 1: An "M" is present and correctly located (not in the last 50 aa)
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
217 if Ortho == 1:
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
218 BESTORF_bash_of_aligned_aa_seq_CDS_with_M = BESTORF_bash_of_aligned_aa_seq_CODING
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
219 BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = BESTORF_bash_aligned_nc_seq_CODING
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
220
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
221 # CASE 2: the CDS may be truncated, so the "M" may be missing:
2
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
222 if Ortho == 0 and CDS_maybe_truncated == 1:
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
223 BESTORF_bash_of_aligned_aa_seq_CDS_with_M = BESTORF_bash_of_aligned_aa_seq_CODING
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
224 BESTORF_bash_of_aligned_nuc_seq_CDS_with_M = BESTORF_bash_aligned_nc_seq_CODING
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
225
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
226 # CASE 3: CDS not truncated AND no "M" found in a valid position (i.e. before the last 50 aa):
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
227 ## => the 2 bash "CDS_with_M" are left empty ("{}")
2
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
228
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
229 return(BESTORF_bash_aligned_nc_seq, BESTORF_bash_aligned_nc_seq_CODING, BESTORF_bash_of_aligned_nuc_seq_CDS_with_M, BESTORF_bash_of_aligned_aa_seq, BESTORF_bash_of_aligned_aa_seq_CODING, BESTORF_bash_of_aligned_aa_seq_CDS_with_M)
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
230
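The segment selection in steps 2.3 and 4 above can be illustrated with a minimal, standalone sketch. It is not the script's own code: the helper names (split_on_stop_columns, longest_run, aa_span_to_nuc_span) are hypothetical, and the sketch uses a half-open [start, end) span rather than reproducing the exact offsets used by the function above. It only assumes what the comments state: columns containing "*" in any sequence split the alignment, the longest run strictly longer than the minimal CDS length is kept, and amino-acid coordinates map to nucleotide coordinates by a factor of 3.

```python
def split_on_stop_columns(aligned_aa_seqs):
    """Return runs of consecutive column indices with no '*' in any sequence."""
    length = min(len(s) for s in aligned_aa_seqs)
    runs, current = [], []
    for j in range(length):
        if any(s[j] == "*" for s in aligned_aa_seqs):
            # A stop codon in this column closes the current run.
            if current:
                runs.append(current)
            current = []
        else:
            current.append(j)
    if current:
        runs.append(current)
    return runs


def longest_run(runs, min_len):
    """Pick the longest run strictly longer than min_len, or [] if none qualifies."""
    best = []
    for run in runs:
        if len(run) > len(best) and len(run) > min_len:
            best = run
    return best


def aa_span_to_nuc_span(start_aa, end_aa):
    """Map a half-open amino-acid span to the matching nucleotide span."""
    return start_aa * 3, end_aa * 3


# Hypothetical two-sequence alignment with a stop codon in column 3.
runs = split_on_stop_columns(["MKT*AAAAG", "MRT*AGAAG"])
best = longest_run(runs, 2)
nuc_span = aa_span_to_nuc_span(best[0], best[-1] + 1)
```

Ties are resolved in favour of the first run encountered, which mirrors the strict "greater than" comparison against MAX in the loop above.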
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
231 def write_output_file(results_dict, name_elems, path_out):
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
232 if results_dict != {}:
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
233 name_elems[3] = str(len(results_dict.keys()))
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
234 new_name = "_".join(name_elems)
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
235
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
236 out1 = open("%s/%s" %(path_out,new_name), "w")
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
237 for fasta_name in results_dict.keys():
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
238 seq = results_dict[fasta_name]
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
239 out1.write("%s\n" %fasta_name)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
240 out1.write("%s\n" %seq)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
241 out1.close()
2
0d2f72caea10 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 44a89d5eeb82789bfc643b33c11f391281b6374b
abims-sbr
parents: 1
diff changeset
242
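As a hedged alternative to write_output_file above, the same naming and writing logic could be sketched with a context manager and os.path.join. The function name write_fasta_group and its return value are assumptions made for illustration; unlike the original, this sketch copies name_elems instead of modifying the caller's list.

```python
import os
import tempfile


def write_fasta_group(results, name_elems, out_dir):
    """Write one name/sequence pair per record; return the path, or None if empty."""
    if not results:
        return None
    elems = list(name_elems)          # copy, so the caller's list is untouched
    elems[3] = str(len(results))      # number of sequences kept in the group
    path = os.path.join(out_dir, "_".join(elems))
    with open(path, "w") as handle:   # the file is closed even if a write fails
        for fasta_name, seq in results.items():
            handle.write("%s\n%s\n" % (fasta_name, seq))
    return path


# Hypothetical usage with a temporary directory and made-up sequences.
tmp_dir = tempfile.mkdtemp()
template = ["orthogroup", "7", "with", "0", "species.fasta"]
out_path = write_fasta_group({">sp1": "ATG", ">sp2": "ATA"}, template, tmp_dir)
```

Keeping the template list unchanged avoids carrying the species count of one output over to the next call, at the cost of one small list copy per file.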
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
243 def main():
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
244 parser = argparse.ArgumentParser()
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
245 parser.add_argument("codeUniversel", help="File describing the genetic code (code_universel_modified.txt)")
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
246 parser.add_argument("min_cds_len", help="Minimal length of a CDS (in amino-acids)", type=int)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
247 parser.add_argument("min_spec", help="Minimal number of species per alignment")
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
248 parser.add_argument("list_files", help="File with all input files names")
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
249 args = parser.parse_args()
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
250
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
251 minimal_cds_length = int(args.min_cds_len) # in aa number
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
252 bash_codeUniversel = code_universel(args.codeUniversel)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
253 minimum_species = int(args.min_spec)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
254
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
255 # Inputs from file containing list of species
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
256 list_files = []
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
257 with open(args.list_files, 'r') as f:
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
258 for line in f.readlines():
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
259 list_files.append(line.strip('\n'))
9
640ef4c06ed5 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit f1ba8d136e0129f3e8435b25a95f70f697d51464-dirty
abims-sbr
parents: 8
diff changeset
260
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
261 # Directories for results
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
262 dirs = ["04_BEST_ORF_nuc", "04_BEST_ORF_aa", "05_CDS_nuc", "05_CDS_aa", "06_CDS_with_M_nuc", "06_CDS_with_M_aa"]
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
263 for directory in dirs:
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
264 os.mkdir(directory)
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
265
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
266 count_file_processed, count_file_with_CDS, count_file_without_CDS, count_file_with_CDS_plus_M = 0, 0, 0, 0
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
267 count_file_with_cds_and_enought_species, count_file_with_cds_M_and_enought_species = 0, 0
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
268
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
269 # ! : Currently, files are named "Orthogroup_x_y_sequences.fasta", where x is the number of the orthogroup (not important, just here to make a distinct name),
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
270 # and y is the number of sequences/species in the group. These files are outputs of blastalign, where species can be removed. y is then modified.
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
271 name_elems = ["orthogroup", "0", "with", "0", "species.fasta"]
8
716a45028e55 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 90cfcf697b9f128e81bea1270378e59d63ab0a6f
abims-sbr
parents: 7
diff changeset
272
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
273 # By fixing the counter here, there will be some "holes" in the output directories (missing numbers), but the groups will correspond across directories
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
274 #n0 = 0
8
716a45028e55 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 90cfcf697b9f128e81bea1270378e59d63ab0a6f
abims-sbr
parents: 7
diff changeset
275
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
276 for file in list_files:
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
277 #n0 += 1
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
278
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
279 count_file_processed = count_file_processed + 1
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
280 nb_gp = file.split('_')[1] # Keep trace of the orthogroup number
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
281 fasta_file_path = "./%s" %file
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
282 bash_fasta = dico(fasta_file_path)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
283 BESTORF_nuc, BESTORF_nuc_CODING, BESTORF_nuc_CDS_with_M, BESTORF_aa, BESTORF_aa_CODING, BESTORF_aa_CDS_with_M = find_good_ORF_criteria_3(bash_fasta, bash_codeUniversel, minimal_cds_length, minimum_species)
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
284
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
285 name_elems[1] = nb_gp
1
567d5b771a90 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit ab76075e541dd7ece1090f6b55ca508ec0fde39d
lecorguille
parents:
diff changeset
286
11
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
287 # Update counts and write group in corresponding output directory
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
288 if BESTORF_nuc != {}:
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
289 count_file_with_CDS += 1
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
290 if len(BESTORF_nuc.keys()) >= minimum_species :
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
291 count_file_with_cds_and_enought_species += 1
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
292 write_output_file(BESTORF_nuc, name_elems, dirs[0]) # OUTPUT BESTORF_nuc
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
293 write_output_file(BESTORF_aa, name_elems, dirs[1]) # The most interesting
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
294 else:
06a28df198b6 planemo upload for repository https://github.com/abims-sbr/adaptsearch commit 3c7982d775b6f3b472f6514d791edcb43cd258a1
lecorguille
parents: 10
diff changeset
295 count_file_without_CDS += 1

        if BESTORF_nuc_CODING != {} and len(BESTORF_nuc_CODING.keys()) >= minimum_species:
            write_output_file(BESTORF_nuc_CODING, name_elems, dirs[2])
            write_output_file(BESTORF_aa_CODING, name_elems, dirs[3])

        if BESTORF_nuc_CDS_with_M != {}:
            count_file_with_CDS_plus_M += 1
            if len(BESTORF_nuc_CDS_with_M.keys()) >= minimum_species:
                count_file_with_cds_M_and_enought_species += 1
                write_output_file(BESTORF_nuc_CDS_with_M, name_elems, dirs[4])
                write_output_file(BESTORF_aa_CDS_with_M, name_elems, dirs[5])

    # Summary report, printed once after all files have been processed
    print "*************** CDS detection ***************"
    print "\nFiles processed: %d" % count_file_processed
    print "\tFiles with CDS: %d" % count_file_with_CDS
    print "\tFiles with CDS and at least %s species: %d" % (minimum_species, count_file_with_cds_and_enought_species)
    print "\t\tFiles with CDS plus M (start codon): %d" % count_file_with_CDS_plus_M
    print "\t\tFiles with CDS plus M (start codon) and at least %s species: %d" % (minimum_species, count_file_with_cds_M_and_enought_species)
    print "\tFiles without CDS: %d \n" % count_file_without_CDS
    print ""

if __name__ == '__main__':
    main()