annotate variant_effect_predictor/Bio/Assembly/ContigAnalysis.pm @ 0:2bc9b66ada89 draft default tip

Uploaded
author mahtabm
date Thu, 11 Apr 2013 06:29:17 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
1 # $Id: ContigAnalysis.pm,v 1.2 2002/12/01 00:03:28 jason Exp $
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
2 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
3 # BioPerl module for Bio::Assembly::ContigAnalysis
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
4 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
5 # Cared for by Robson francisco de Souza <rfsouza@citri.iq.usp.br>
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
6 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
7 # Copyright Robson Francisco de Souza
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
8 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
9 # You may distribute this module under the same terms as perl itself
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
10 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
11 # POD documentation - main docs before the code
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
12
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
13 =head1 NAME
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
14
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
15 Bio::Assembly::ContigAnalysis -
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
16 Perform analysis on sequence assembly contigs.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
17
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
18 =head1 SYNOPSIS
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
19
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
20 # Module loading
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
21 use Bio::Assembly::ContigAnalysis;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
22
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
23 # Assembly loading methods
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
24 my $ca = new Bio::Assembly::ContigAnalysis( -contig=>$contigOBJ );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
25
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
26 my @lcq = $ca->low_consensus_quality;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
27 my @hqd = $ca->high_quality_discrepancies;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
28 my @ss = $ca->single_strand_regions;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
29
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
30 =head1 DESCRIPTION
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
31
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
32 A contig is as a set of sequences, locally aligned to each other, when
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
33 the sequences in a pair may be aligned. It may also include a
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
34 consensus sequence. Bio::Assembly::ContigAnalysis is a module
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
35 holding a collection of methods to analyze contig objects. It was
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
36 developed around the Bio::Assembly::Contig implementation of contigs and
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
37 can not work with another contig interface.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
38
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
39 =head1 FEEDBACK
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
40
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
41 =head2 Mailing Lists
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
42
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
43 User feedback is an integral part of the evolution of this and other
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
44 Bioperl modules. Send your comments and suggestions preferably to the
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
45 Bioperl mailing lists Your participation is much appreciated.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
46
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
47 bioperl-l@bioperl.org - General discussion
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
48 http://bio.perl.org/MailList.html - About the mailing lists
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
49
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
50 =head2 Reporting Bugs
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
51
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
52 Report bugs to the Bioperl bug tracking system to help us keep track
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
53 the bugs and their resolution. Bug reports can be submitted via email
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
54 or the web:
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
55
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
56 bioperl-bugs@bio.perl.org
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
57 http://bugzilla.bioperl.org/
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
58
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
59 =head1 AUTHOR - Robson Francisco de Souza
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
60
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
61 Email: rfsouza@citri.iq.usp.br
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
62
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
63 =head1 APPENDIX
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
64
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
65 The rest of the documentation details each of the object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
66 methods. Internal methods are usually preceded with a _
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
67
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
68 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
69
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
70 package Bio::Assembly::ContigAnalysis;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
71
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
72 use Bio::Root::Root;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
73 use strict;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
74 use vars qw(@ISA);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
75
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
76 @ISA = qw(Bio::Root::Root);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
77
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
78 =head1 Object creator
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
79
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
80 =head2 new
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
81
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
82 Title : new
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
83 Usage : my $contig = Bio::Assembly::ContigAnalysis->new(-contig=>$contigOBJ);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
84 Function : Creates a new contig analysis object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
85 Returns : Bio::Assembly::ContigAnalysis
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
86 Args :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
87 -contig : a Bio::Assembly::Contig object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
88
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
89 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
90
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
91 sub new {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
92 my($class,@args) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
93 my $self = $class->SUPER::new(@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
94
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
95 my ($contigOBJ) = $self->_rearrange([qw(CONTIG)],@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
96 unless ($contigOBJ->isa("Bio::Assembly::Contig")) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
97 $self->throw("ContigAnal works only on Bio::Assembly::Contig objects\n");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
98 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
99
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
100 $self->{'_objref'} = $contigOBJ;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
101 return $self;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
102 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
103
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
104 =head1 Analysis methods
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
105
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
106 =head2 high_quality_discrepancies
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
107
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
108 Title : high_quality_discrepancies
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
109 Usage : my $sfc = $ContigAnal->high_quality_discrepancies();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
110 Function :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
111
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
112 Locates all high quality discrepancies among aligned
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
113 sequences and the consensus sequence.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
114
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
115 Note: see Bio::Assembly::Contig POD documentation,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
116 section "Coordinate System", for a definition of
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
117 available types. Default coordinate system type is
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
118 "gapped consensus", i.e. consensus sequence (with gaps)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
119 coordinates. If limits are not specified, the entire
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
120 alignment is analyzed.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
121
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
122 Returns : Bio::SeqFeature::Collection
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
123 Args : optional arguments are
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
124 -threshold : cutoff value for low quality (minimum high quality)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
125 Default: 40
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
126 -ignore : number of bases that will not be analysed at
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
127 both ends of contig aligned elements
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
128 Default: 5
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
129 -start : start of interval that will be analyzed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
130 -end : start of interval that will be analyzed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
131 -type : coordinate system type for interval
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
132
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
133 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
134
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
135 sub high_quality_discrepancies {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
136 my ($self,@args) = shift; # Package reference
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
137
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
138 my ($threshold,$ignore,$start,$end,$type) =
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
139 $self->_rearrange([qw(THRESHOLD IGNORE START END TYPE)],@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
140
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
141 # Defining default threhold and HQD_ignore
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
142 $threshold = 40 unless (defined($threshold));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
143 $ignore = 5 unless (defined($ignore));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
144 $type = 'gapped consensus' unless (defined($type));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
145
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
146 # Changing input coordinates system (if needed)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
147 if (defined $start && $type ne 'gapped consensus') {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
148 $start = $self->{'_objref'}->change_coord($type,'gapped consensus',$start);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
149 } elsif (!defined $start) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
150 $start = 1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
151 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
152 if (defined $end && $type ne 'gapped consensus') {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
153 $end = $self->{'_objref'}->change_coord($type,'gapped consensus',$end);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
154 } elsif (!defined $end) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
155 $end = $self->{'_objref'}->get_consensus_length();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
156 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
157
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
158 # Scanning each read sequence and the contig sequence and
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
159 # adding discrepancies to Bio::SeqFeature::Collection
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
160 my @seqIDs = $self->{'_objref'}->get_seq_ids(-start=>$start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
161 -end=>$end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
162 -type=>$type);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
163 my $consensus = $self->{'_objref'}->get_consensus_sequence()->seq;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
164
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
165 my @HQD = ();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
166 foreach my $seqID (@seqIDs) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
167 # Setting aligned read sub-sequence limits and loading data
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
168 my $seq = $self->{'_objref'}->get_seq_by_name($seqID);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
169 my $qual = $self->{'_objref'}->get_qual_by_name($seqID);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
170 unless (defined $qual) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
171 $self->warn("Can't correctly evaluate HQD without aligned sequence qualities for $seqID");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
172 next;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
173 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
174 my $sequence = $seq->seq;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
175 my @quality = @{ $qual->qual };
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
176
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
177 # Scanning the aligned region of each read
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
178 my $seq_ix = 0;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
179 my $coord = $self->{'_objref'}->get_seq_feat_by_tag($seq,"_align_clipping:$seqID");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
180 my ($astart,$aend) = ($coord->start,$coord->end);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
181 $astart = $astart + $ignore; # Redefining limits to evaluate HQDs (jump $ignore at start)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
182 $aend = $aend - $ignore; # Redefining limits to evaluate HQDs (stop $ignore before end)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
183
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
184 my ($d_start,$d_end,$i);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
185 for ($i=$astart-1; $i<=$aend-1; $i++) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
186 # Changing coordinate $i+1 from 'gapped consensus' mode to "aligned $seqID" (coordinate $seq_ix)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
187 $seq_ix = $self->{'_objref'}->change_coord('gapped consensus',"aligned $seqID",$i+1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
188 next unless (($i >= $start) && ($i <= $end));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
189
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
190 my $r_base = uc(substr($sequence,$seq_ix-1,1));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
191 my $c_base = uc(substr($consensus,$i,1));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
192
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
193 # Discrepant region start: store $d_start and $type
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
194 (!defined($d_start) &&
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
195 ($r_base ne $c_base) &&
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
196 ($quality[$seq_ix-1] >= $threshold)) && do {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
197 $d_start = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i+1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
198 #print $seqID," ",$r_base," ",$i+1," ",$c_base," ",$contig_ix-1," ",$quality[$i]," $type\n";
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
199 next;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
200 };
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
201
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
202 # Quality change or end of discrepant region: store limits and undef $d_start
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
203 if (defined($d_start) &&
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
204 (($quality[$seq_ix-1] < $threshold) ||
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
205 (uc($r_base) eq uc($c_base)))) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
206 $d_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
207 #print $seqID," ",$r_base," ",$i+1," ",$c_base," ",$contig_ix-1," ",$quality[$i]," $type\n";
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
208 push(@HQD, Bio::SeqFeature::Generic->new(-primary=>"high_quality_discrepancy:$seqID",
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
209 -start=>$d_start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
210 -end=>$d_end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
211 -strand=>$seq->strand()) );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
212 $d_start = undef;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
213 next;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
214 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
215 } # for ($i=$astart-1; $i<=$aend-1; $i++)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
216
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
217 # Loading discrepancies located at sub-sequence end, if any.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
218 if (defined($d_start)) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
219 $d_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
220 push(@HQD, Bio::SeqFeature::Generic->new(-primary=>"high_quality_discrepancy:$seqID",
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
221 -start=>$d_start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
222 -end=>$d_end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
223 -strand=>$seq->strand()) );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
224 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
225 } # foreach my $seqID (@seqIDs)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
226
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
227 return @HQD;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
228 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
229
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
230 =head2 low_consensus_quality
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
231
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
232 Title : low_consensus_quality
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
233 Usage : my $sfc = $ContigAnal->low_consensus_quality();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
234 Function : Locates all low quality regions in the consensus
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
235 Returns : an array of Bio::SeqFeature::Generic objects
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
236 Args : optional arguments are
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
237 -threshold : cutoff value for low quality (minimum high quality)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
238 Default: 25
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
239 -start : start of interval that will be analyzed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
240 -end : start of interval that will be analyzed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
241 -type : coordinate system type for interval
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
242
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
243 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
244
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
245 sub low_consensus_quality {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
246 my ($self,@args) = shift; # Packege reference
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
247
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
248 my ($threshold,$start,$end,$type) =
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
249 $self->_rearrange([qw(THRESHOLD START END TYPE)],@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
250
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
251 # Setting default value for threshold
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
252 $threshold = 25 unless (defined($threshold));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
253
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
254 # Loading qualities
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
255 my @quality = @{ $self->{'_objref'}->get_consensus_quality()->qual };
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
256
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
257 # Changing coordinates to gap mode noaln (consed: consensus without alignments)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
258 $start = 1 unless (defined($start));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
259 if (defined $start && defined $type && ($type ne 'gapped consensus')) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
260 $start = $self->{'objref'}->change_coord($type,'gapped consensus',$start);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
261 $end = $self->{'objref'}->change_coord($type,'gapped consensus',$end) if (defined($end));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
262 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
263 $end = $self->{'_objref'}->get_consensus_length unless (defined $end);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
264
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
265 # Scanning @quality vector and storing intervals limits with base qualities less then
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
266 # the threshold value
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
267 my ($lcq_start);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
268 my ($i,@LCQ);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
269 for ($i=$start-1; $i<=$end-1; $i++) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
270 # print $quality[$i],"\t",$i,"\n";
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
271 if (!defined($lcq_start) && (($quality[$i] <= $threshold) || ($quality[$i] == 98))) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
272 $lcq_start = $i+1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
273 } elsif (defined($lcq_start) && ($quality[$i] > $threshold)) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
274 $lcq_start = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$lcq_start);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
275 my $lcq_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
276 push(@LCQ, Bio::SeqFeature::Generic->new(-start=>$lcq_start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
277 -end=>$lcq_end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
278 -primary=>'low_consensus_quality') );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
279 $lcq_start = undef;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
280 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
281 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
282
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
283 if (defined $lcq_start) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
284 $lcq_start = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$lcq_start);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
285 my $lcq_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
286 push(@LCQ, Bio::SeqFeature::Generic->new(-start=>$lcq_start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
287 -end=>$lcq_end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
288 -primary=>'low_consensus_quality') );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
289 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
290
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
291 return @LCQ;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
292 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
293
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
294 =head2 not_confirmed_on_both_strands
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
295
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
296 Title : low_quality_consensus
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
297 Usage : my $sfc = $ContigAnal->low_quality_consensus();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
298 Function :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
299
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
300 Locates all regions whose consensus bases were not
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
301 confirmed by bases from sequences aligned in both
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
302 orientations, i.e., in such regions, no bases in aligned
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
303 sequences of either +1 or -1 strand agree with the
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
304 consensus bases.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
305
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
306 Returns : an array of Bio::SeqFeature::Generic objects
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
307 Args : optional arguments are
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
308 -start : start of interval that will be analyzed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
309 -end : start of interval that will be analyzed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
310 -type : coordinate system type for interval
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
311
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
312 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
313
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
314 sub not_confirmed_on_both_strands {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
315 my ($self,@args) = shift; # Package reference
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
316
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
317 my ($start,$end,$type) =
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
318 $self->_rearrange([qw(START END TYPE)],@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
319
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
320 # Changing coordinates to default system 'align' (contig sequence with alignments)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
321 $start = 1 unless (defined($start));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
322 if (defined($type) && ($type ne 'gapped consensus')) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
323 $start = $self->{'_objref'}->change_coord($type,'gapped consensus',$start);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
324 $end = $self->{'_objref'}->change_coord($type,'gapped consensus',$end) if (defined($end));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
325 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
326 $end = $self->{'_objref'}->get_consensus_length unless (defined($end));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
327
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
328 # Scanning alignment
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
329 my %confirmed = (); # If ($confirmed{$orientation}[$i] > 0) then $i is confirmed in $orientation strand
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
330 my ($i);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
331 my $consensus = $self->{'_objref'}->get_consensus_sequence()->seq;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
332 foreach my $seqID ($self->{'_objref'}->get_seq_ids) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
333 # Setting aligned read sub-sequence limits and loading data
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
334 my $seq = $self->{'_objref'}->get_seq_by_name($seqID);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
335 my $sequence = $seq->seq;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
336
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
337 # Scanning the aligned regions of each read and registering confirmed sites
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
338 my $contig_ix = 0;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
339 my $coord = $self->{'_objref'}->get_seq_feat_by_tag($seq,"_align_clipping:$seqID");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
340 my ($astart,$aend,$orientation) = ($coord->start,$coord->end,$coord->strand);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
341 $astart = $self->{'_objref'}->change_coord('gapped consensus',"aligned $seqID",$astart);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
342 $aend = $self->{'_objref'}->change_coord('gapped consensus',"aligned $seqID",$aend);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
343 for ($i=$astart-1; $i<=$aend-1; $i++) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
344 # $i+1 in 'align' mode is $contig_ix
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
345 $contig_ix = $self->{'_objref'}->change_coord("aligned $seqID",'gapped consensus',$i+1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
346 next unless (($contig_ix >= $start) && ($contig_ix <= $end));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
347 my $r_base = uc(substr($sequence,$i,1));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
348 my $c_base = uc(substr($consensus,$contig_ix-1,1));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
349 if ($c_base eq '-') {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
350 $confirmed{$orientation}[$contig_ix] = -1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
351 } elsif (uc($r_base) eq uc($c_base)) { # Non discrepant region found
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
352 $confirmed{$orientation}[$contig_ix]++;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
353 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
354 } # for ($i=$astart-1; $i<=$aend-1; $i++)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
355 } # foreach $seqID (@reads)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
356
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
357 # Locating non-confirmed aligned regions for each orientation in $confirmed registry
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
358 my ($orientation);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
359 my @NCBS = ();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
360 foreach $orientation (keys %confirmed) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
361 my ($ncbs_start,$ncbs_end);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
362
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
363 for ($i=$start; $i<=$end; $i++) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
364 if (!defined($ncbs_start) &&
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
365 (!defined($confirmed{$orientation}[$i]) || ($confirmed{$orientation}[$i] == 0))) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
366 $ncbs_start = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
367 } elsif (defined($ncbs_start) &&
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
368 defined($confirmed{$orientation}[$i]) &&
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
369 ($confirmed{$orientation}[$i] > 0)) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
370 $ncbs_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i-1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
371 push(@NCBS, Bio::SeqFeature::Generic->new(-start=>$ncbs_start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
372 -end=>$ncbs_end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
373 -strand=>$orientation,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
374 -primary=>"not_confirmed_on_both_strands") );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
375 $ncbs_start = undef;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
376 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
377 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
378
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
379 if (defined($ncbs_start)) { # NCBS at the end of contig
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
380 $ncbs_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$end);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
381 push(@NCBS, Bio::SeqFeature::Generic->new(-start=>$ncbs_start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
382 -end=>$ncbs_end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
383 -strand=>$orientation,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
384 -primary=>'not_confirmed_on_both_strands') );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
385 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
386 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
387
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
388 return @NCBS;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
389 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
390
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
391 =head2 single_strand
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
392
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
393 Title : single_strand
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
394 Usage : my $sfc = $ContigAnal->single_strand();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
395 Function :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
396
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
397 Locates all regions covered by aligned sequences only in
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
398 one of the two strands, i.e., regions for which aligned
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
399 sequence's strand() method returns +1 or -1 for all
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
400 sequences.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
401
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
402 Returns : an array of Bio::SeqFeature::Generic objects
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
403 Args : optional arguments are
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
404 -start : start of interval that will be analyzed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
405 -end : start of interval that will be analyzed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
406 -type : coordinate system type for interval
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
407
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
408 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
409
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
410 #'
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
411 sub single_strand {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
412 my ($self,@args) = shift; # Package reference
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
413
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
414 my ($start,$end,$type) =
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
415 $self->_rearrange([qw(START END TYPE)],@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
416
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
417 # Changing coordinates to gap mode align (consed: consensus sequence with alignments)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
418 $type = 'gapped consensus' unless(defined($type));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
419 $start = 1 unless (defined($start));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
420 if (defined($type) && $type ne 'gapped consensus') {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
421 $start = $self->{'objref'}->change_coord($type,'gapped consensus',$start);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
422 $end = $self->{'objref'}->change_coord($type,'gapped consensus',$end) if (defined($end));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
423 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
424 ($end) = $self->{'_objref'}->get_consensus_length unless (defined($end));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
425
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
426 # Loading complete list of coordinates for aligned sequences
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
427 my $sfc = $self->{'_objref'}->get_features_collection();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
428 my @forward = grep { $_->primary_tag =~ /^_aligned_coord:/ }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
429 $sfc->features_in_range(-start=>$start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
430 -end=>$end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
431 -contain=>0,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
432 -strand=>1,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
433 -strandmatch=>'strong');
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
434 my @reverse = grep { $_->primary_tag =~ /^_aligned_coord:/ }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
435 $sfc->features_in_range(-start=>$start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
436 -end=>$end,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
437 -contain=>0,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
438 -strand=>-1,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
439 -strandmatch=>'strong');
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
440 # Merging overlapping features
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
441 @forward = $self->_merge_overlapping_features(@forward);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
442 @reverse = $self->_merge_overlapping_features(@reverse);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
443
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
444 # Finding single stranded regions
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
445 my ($length) = $self->{'_objref'}->get_consensus_length;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
446 $length = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$length);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
447 @forward = $self->_complementary_features_list(1,$length,@forward);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
448 @reverse = $self->_complementary_features_list(1,$length,@reverse);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
449
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
450 my @SS = ();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
451 foreach my $feat (@forward, @reverse) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
452 $feat->primary_tag('single_strand_region');
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
453 push(@SS,$feat);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
454 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
455
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
456 return @SS;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
457 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
458
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
459 =head1 Internal Methods
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
460
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
461 =head2 _merge_overlapping_features
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
462
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
463 Title : _merge_overlapping_features
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
464 Usage : my @feat = $ContigAnal->_merge_overlapping_features(@features);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
465 Function : Merge all overlapping features into features
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
466 that hold original features as sub-features
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
467 Returns : array of Bio::SeqFeature::Generic objects
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
468 Args : array of Bio::SeqFeature::Generic objects
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
469
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
470 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
471
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
472 sub _merge_overlapping_features {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
473 my ($self,@feat) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
474
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
475 $self->throw_not_implemented();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
476 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
477
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
478 =head2 _complementary_features_list
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
479
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
480 Title : _complementary_features_list
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
481 Usage : @feat = $ContigAnal->_complementary_features_list($start,$end,@features);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
482 Function : Build a list of features for regions
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
483 not covered by features in @features array
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
484 Returns : array of Bio::SeqFeature::Generic objects
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
485 Args :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
486 $start : [integer] start of first output feature
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
487 $end : [integer] end of last output feature
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
488 @features : array of Bio::SeqFeature::Generic objects
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
489
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
490 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
491
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
492 sub _complementary_features_list {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
493 my ($self,$start,$end,@feat) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
494
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
495 $self->throw_not_implemented();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
496 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
497
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
498 1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
499
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
500 __END__