annotate variant_effect_predictor/Bio/Assembly/ContigAnalysis.pm @ 3:d30fa12e4cc5 default tip

Merge heads 2:a5976b2dce6f and 1:09613ce8151e which were created as a result of a recently fixed bug.
author devteam <devteam@galaxyproject.org>
date Mon, 13 Jan 2014 10:38:30 -0500
parents 1f6dce3d34e0
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 # $Id: ContigAnalysis.pm,v 1.2 2002/12/01 00:03:28 jason Exp $
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 # BioPerl module for Bio::Assembly::ContigAnalysis
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5 # Cared for by Robson francisco de Souza <rfsouza@citri.iq.usp.br>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 # Copyright Robson Francisco de Souza
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 # You may distribute this module under the same terms as perl itself
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11 # POD documentation - main docs before the code
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15 Bio::Assembly::ContigAnalysis -
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16 Perform analysis on sequence assembly contigs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18 =head1 SYNOPSIS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20 # Module loading
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21 use Bio::Assembly::ContigAnalysis;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23 # Assembly loading methods
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24 my $ca = new Bio::Assembly::ContigAnalysis( -contig=>$contigOBJ );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26 my @lcq = $ca->low_consensus_quality;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27 my @hqd = $ca->high_quality_discrepancies;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28 my @ss = $ca->single_strand_regions;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32 A contig is as a set of sequences, locally aligned to each other, when
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33 the sequences in a pair may be aligned. It may also include a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34 consensus sequence. Bio::Assembly::ContigAnalysis is a module
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35 holding a collection of methods to analyze contig objects. It was
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36 developed around the Bio::Assembly::Contig implementation of contigs and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37 can not work with another contig interface.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39 =head1 FEEDBACK
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 =head2 Mailing Lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43 User feedback is an integral part of the evolution of this and other
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44 Bioperl modules. Send your comments and suggestions preferably to the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45 Bioperl mailing lists Your participation is much appreciated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47 bioperl-l@bioperl.org - General discussion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48 http://bio.perl.org/MailList.html - About the mailing lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50 =head2 Reporting Bugs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52 Report bugs to the Bioperl bug tracking system to help us keep track
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53 the bugs and their resolution. Bug reports can be submitted via email
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54 or the web:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 bioperl-bugs@bio.perl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57 http://bugzilla.bioperl.org/
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59 =head1 AUTHOR - Robson Francisco de Souza
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61 Email: rfsouza@citri.iq.usp.br
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63 =head1 APPENDIX
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65 The rest of the documentation details each of the object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66 methods. Internal methods are usually preceded with a _
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70 package Bio::Assembly::ContigAnalysis;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72 use Bio::Root::Root;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74 use vars qw(@ISA);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76 @ISA = qw(Bio::Root::Root);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78 =head1 Object creator
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80 =head2 new
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82 Title : new
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83 Usage : my $contig = Bio::Assembly::ContigAnalysis->new(-contig=>$contigOBJ);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84 Function : Creates a new contig analysis object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85 Returns : Bio::Assembly::ContigAnalysis
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87 -contig : a Bio::Assembly::Contig object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91 sub new {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92 my($class,@args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93 my $self = $class->SUPER::new(@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95 my ($contigOBJ) = $self->_rearrange([qw(CONTIG)],@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96 unless ($contigOBJ->isa("Bio::Assembly::Contig")) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97 $self->throw("ContigAnal works only on Bio::Assembly::Contig objects\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100 $self->{'_objref'} = $contigOBJ;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101 return $self;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104 =head1 Analysis methods
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106 =head2 high_quality_discrepancies
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108 Title : high_quality_discrepancies
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109 Usage : my $sfc = $ContigAnal->high_quality_discrepancies();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110 Function :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112 Locates all high quality discrepancies among aligned
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113 sequences and the consensus sequence.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115 Note: see Bio::Assembly::Contig POD documentation,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116 section "Coordinate System", for a definition of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117 available types. Default coordinate system type is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118 "gapped consensus", i.e. consensus sequence (with gaps)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119 coordinates. If limits are not specified, the entire
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120 alignment is analyzed.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122 Returns : Bio::SeqFeature::Collection
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 Args : optional arguments are
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124 -threshold : cutoff value for low quality (minimum high quality)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125 Default: 40
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126 -ignore : number of bases that will not be analysed at
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127 both ends of contig aligned elements
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128 Default: 5
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129 -start : start of interval that will be analyzed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130 -end : start of interval that will be analyzed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131 -type : coordinate system type for interval
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135 sub high_quality_discrepancies {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136 my ($self,@args) = shift; # Package reference
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138 my ($threshold,$ignore,$start,$end,$type) =
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139 $self->_rearrange([qw(THRESHOLD IGNORE START END TYPE)],@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 # Defining default threhold and HQD_ignore
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142 $threshold = 40 unless (defined($threshold));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143 $ignore = 5 unless (defined($ignore));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144 $type = 'gapped consensus' unless (defined($type));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146 # Changing input coordinates system (if needed)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 if (defined $start && $type ne 'gapped consensus') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148 $start = $self->{'_objref'}->change_coord($type,'gapped consensus',$start);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149 } elsif (!defined $start) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150 $start = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152 if (defined $end && $type ne 'gapped consensus') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153 $end = $self->{'_objref'}->change_coord($type,'gapped consensus',$end);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154 } elsif (!defined $end) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155 $end = $self->{'_objref'}->get_consensus_length();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158 # Scanning each read sequence and the contig sequence and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159 # adding discrepancies to Bio::SeqFeature::Collection
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160 my @seqIDs = $self->{'_objref'}->get_seq_ids(-start=>$start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 -end=>$end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162 -type=>$type);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163 my $consensus = $self->{'_objref'}->get_consensus_sequence()->seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165 my @HQD = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166 foreach my $seqID (@seqIDs) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167 # Setting aligned read sub-sequence limits and loading data
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168 my $seq = $self->{'_objref'}->get_seq_by_name($seqID);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169 my $qual = $self->{'_objref'}->get_qual_by_name($seqID);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170 unless (defined $qual) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171 $self->warn("Can't correctly evaluate HQD without aligned sequence qualities for $seqID");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174 my $sequence = $seq->seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175 my @quality = @{ $qual->qual };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177 # Scanning the aligned region of each read
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178 my $seq_ix = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179 my $coord = $self->{'_objref'}->get_seq_feat_by_tag($seq,"_align_clipping:$seqID");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180 my ($astart,$aend) = ($coord->start,$coord->end);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181 $astart = $astart + $ignore; # Redefining limits to evaluate HQDs (jump $ignore at start)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182 $aend = $aend - $ignore; # Redefining limits to evaluate HQDs (stop $ignore before end)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 my ($d_start,$d_end,$i);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185 for ($i=$astart-1; $i<=$aend-1; $i++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 # Changing coordinate $i+1 from 'gapped consensus' mode to "aligned $seqID" (coordinate $seq_ix)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187 $seq_ix = $self->{'_objref'}->change_coord('gapped consensus',"aligned $seqID",$i+1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188 next unless (($i >= $start) && ($i <= $end));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190 my $r_base = uc(substr($sequence,$seq_ix-1,1));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191 my $c_base = uc(substr($consensus,$i,1));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193 # Discrepant region start: store $d_start and $type
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 (!defined($d_start) &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195 ($r_base ne $c_base) &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196 ($quality[$seq_ix-1] >= $threshold)) && do {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197 $d_start = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i+1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198 #print $seqID," ",$r_base," ",$i+1," ",$c_base," ",$contig_ix-1," ",$quality[$i]," $type\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202 # Quality change or end of discrepant region: store limits and undef $d_start
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203 if (defined($d_start) &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204 (($quality[$seq_ix-1] < $threshold) ||
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205 (uc($r_base) eq uc($c_base)))) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206 $d_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 #print $seqID," ",$r_base," ",$i+1," ",$c_base," ",$contig_ix-1," ",$quality[$i]," $type\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208 push(@HQD, Bio::SeqFeature::Generic->new(-primary=>"high_quality_discrepancy:$seqID",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 -start=>$d_start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 -end=>$d_end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211 -strand=>$seq->strand()) );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212 $d_start = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215 } # for ($i=$astart-1; $i<=$aend-1; $i++)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217 # Loading discrepancies located at sub-sequence end, if any.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218 if (defined($d_start)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219 $d_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220 push(@HQD, Bio::SeqFeature::Generic->new(-primary=>"high_quality_discrepancy:$seqID",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221 -start=>$d_start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222 -end=>$d_end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223 -strand=>$seq->strand()) );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225 } # foreach my $seqID (@seqIDs)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227 return @HQD;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230 =head2 low_consensus_quality
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232 Title : low_consensus_quality
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 Usage : my $sfc = $ContigAnal->low_consensus_quality();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234 Function : Locates all low quality regions in the consensus
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235 Returns : an array of Bio::SeqFeature::Generic objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 Args : optional arguments are
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237 -threshold : cutoff value for low quality (minimum high quality)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238 Default: 25
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239 -start : start of interval that will be analyzed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240 -end : start of interval that will be analyzed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241 -type : coordinate system type for interval
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 sub low_consensus_quality {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246 my ($self,@args) = shift; # Packege reference
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 my ($threshold,$start,$end,$type) =
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 $self->_rearrange([qw(THRESHOLD START END TYPE)],@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251 # Setting default value for threshold
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 $threshold = 25 unless (defined($threshold));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 # Loading qualities
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 my @quality = @{ $self->{'_objref'}->get_consensus_quality()->qual };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 # Changing coordinates to gap mode noaln (consed: consensus without alignments)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258 $start = 1 unless (defined($start));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259 if (defined $start && defined $type && ($type ne 'gapped consensus')) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260 $start = $self->{'objref'}->change_coord($type,'gapped consensus',$start);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261 $end = $self->{'objref'}->change_coord($type,'gapped consensus',$end) if (defined($end));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263 $end = $self->{'_objref'}->get_consensus_length unless (defined $end);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 # Scanning @quality vector and storing intervals limits with base qualities less then
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266 # the threshold value
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267 my ($lcq_start);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268 my ($i,@LCQ);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269 for ($i=$start-1; $i<=$end-1; $i++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270 # print $quality[$i],"\t",$i,"\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 if (!defined($lcq_start) && (($quality[$i] <= $threshold) || ($quality[$i] == 98))) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 $lcq_start = $i+1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 } elsif (defined($lcq_start) && ($quality[$i] > $threshold)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274 $lcq_start = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$lcq_start);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275 my $lcq_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276 push(@LCQ, Bio::SeqFeature::Generic->new(-start=>$lcq_start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277 -end=>$lcq_end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 -primary=>'low_consensus_quality') );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279 $lcq_start = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283 if (defined $lcq_start) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284 $lcq_start = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$lcq_start);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285 my $lcq_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286 push(@LCQ, Bio::SeqFeature::Generic->new(-start=>$lcq_start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287 -end=>$lcq_end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 -primary=>'low_consensus_quality') );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291 return @LCQ;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 =head2 not_confirmed_on_both_strands
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296 Title : low_quality_consensus
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297 Usage : my $sfc = $ContigAnal->low_quality_consensus();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298 Function :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 Locates all regions whose consensus bases were not
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301 confirmed by bases from sequences aligned in both
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 orientations, i.e., in such regions, no bases in aligned
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303 sequences of either +1 or -1 strand agree with the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304 consensus bases.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306 Returns : an array of Bio::SeqFeature::Generic objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307 Args : optional arguments are
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308 -start : start of interval that will be analyzed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 -end : start of interval that will be analyzed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310 -type : coordinate system type for interval
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 sub not_confirmed_on_both_strands {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315 my ($self,@args) = shift; # Package reference
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317 my ($start,$end,$type) =
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318 $self->_rearrange([qw(START END TYPE)],@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320 # Changing coordinates to default system 'align' (contig sequence with alignments)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 $start = 1 unless (defined($start));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322 if (defined($type) && ($type ne 'gapped consensus')) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 $start = $self->{'_objref'}->change_coord($type,'gapped consensus',$start);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324 $end = $self->{'_objref'}->change_coord($type,'gapped consensus',$end) if (defined($end));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326 $end = $self->{'_objref'}->get_consensus_length unless (defined($end));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328 # Scanning alignment
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329 my %confirmed = (); # If ($confirmed{$orientation}[$i] > 0) then $i is confirmed in $orientation strand
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330 my ($i);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331 my $consensus = $self->{'_objref'}->get_consensus_sequence()->seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332 foreach my $seqID ($self->{'_objref'}->get_seq_ids) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333 # Setting aligned read sub-sequence limits and loading data
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334 my $seq = $self->{'_objref'}->get_seq_by_name($seqID);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335 my $sequence = $seq->seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337 # Scanning the aligned regions of each read and registering confirmed sites
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338 my $contig_ix = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339 my $coord = $self->{'_objref'}->get_seq_feat_by_tag($seq,"_align_clipping:$seqID");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 my ($astart,$aend,$orientation) = ($coord->start,$coord->end,$coord->strand);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341 $astart = $self->{'_objref'}->change_coord('gapped consensus',"aligned $seqID",$astart);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342 $aend = $self->{'_objref'}->change_coord('gapped consensus',"aligned $seqID",$aend);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343 for ($i=$astart-1; $i<=$aend-1; $i++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344 # $i+1 in 'align' mode is $contig_ix
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345 $contig_ix = $self->{'_objref'}->change_coord("aligned $seqID",'gapped consensus',$i+1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346 next unless (($contig_ix >= $start) && ($contig_ix <= $end));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347 my $r_base = uc(substr($sequence,$i,1));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348 my $c_base = uc(substr($consensus,$contig_ix-1,1));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349 if ($c_base eq '-') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350 $confirmed{$orientation}[$contig_ix] = -1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
351 } elsif (uc($r_base) eq uc($c_base)) { # Non discrepant region found
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
352 $confirmed{$orientation}[$contig_ix]++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
353 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
354 } # for ($i=$astart-1; $i<=$aend-1; $i++)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
355 } # foreach $seqID (@reads)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
356
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
357 # Locating non-confirmed aligned regions for each orientation in $confirmed registry
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
358 my ($orientation);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
359 my @NCBS = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360 foreach $orientation (keys %confirmed) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361 my ($ncbs_start,$ncbs_end);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363 for ($i=$start; $i<=$end; $i++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364 if (!defined($ncbs_start) &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365 (!defined($confirmed{$orientation}[$i]) || ($confirmed{$orientation}[$i] == 0))) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366 $ncbs_start = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367 } elsif (defined($ncbs_start) &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
368 defined($confirmed{$orientation}[$i]) &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
369 ($confirmed{$orientation}[$i] > 0)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
370 $ncbs_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$i-1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
371 push(@NCBS, Bio::SeqFeature::Generic->new(-start=>$ncbs_start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
372 -end=>$ncbs_end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
373 -strand=>$orientation,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
374 -primary=>"not_confirmed_on_both_strands") );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
375 $ncbs_start = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
376 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
377 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
378
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
379 if (defined($ncbs_start)) { # NCBS at the end of contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
380 $ncbs_end = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$end);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
381 push(@NCBS, Bio::SeqFeature::Generic->new(-start=>$ncbs_start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
382 -end=>$ncbs_end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
383 -strand=>$orientation,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
384 -primary=>'not_confirmed_on_both_strands') );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
385 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
386 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
387
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
388 return @NCBS;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
389 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
390
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
391 =head2 single_strand
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
392
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
393 Title : single_strand
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
394 Usage : my $sfc = $ContigAnal->single_strand();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
395 Function :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
396
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
397 Locates all regions covered by aligned sequences only in
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
398 one of the two strands, i.e., regions for which aligned
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
399 sequence's strand() method returns +1 or -1 for all
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
400 sequences.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
401
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
402 Returns : an array of Bio::SeqFeature::Generic objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
403 Args : optional arguments are
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
404 -start : start of interval that will be analyzed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
405 -end : start of interval that will be analyzed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
406 -type : coordinate system type for interval
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
407
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
408 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
409
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
410 #'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
411 sub single_strand {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
412 my ($self,@args) = shift; # Package reference
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
413
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
414 my ($start,$end,$type) =
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
415 $self->_rearrange([qw(START END TYPE)],@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
416
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
417 # Changing coordinates to gap mode align (consed: consensus sequence with alignments)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
418 $type = 'gapped consensus' unless(defined($type));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
419 $start = 1 unless (defined($start));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
420 if (defined($type) && $type ne 'gapped consensus') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
421 $start = $self->{'objref'}->change_coord($type,'gapped consensus',$start);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
422 $end = $self->{'objref'}->change_coord($type,'gapped consensus',$end) if (defined($end));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
423 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
424 ($end) = $self->{'_objref'}->get_consensus_length unless (defined($end));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
425
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
426 # Loading complete list of coordinates for aligned sequences
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
427 my $sfc = $self->{'_objref'}->get_features_collection();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
428 my @forward = grep { $_->primary_tag =~ /^_aligned_coord:/ }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
429 $sfc->features_in_range(-start=>$start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
430 -end=>$end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
431 -contain=>0,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
432 -strand=>1,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
433 -strandmatch=>'strong');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
434 my @reverse = grep { $_->primary_tag =~ /^_aligned_coord:/ }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
435 $sfc->features_in_range(-start=>$start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
436 -end=>$end,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
437 -contain=>0,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
438 -strand=>-1,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
439 -strandmatch=>'strong');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
440 # Merging overlapping features
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
441 @forward = $self->_merge_overlapping_features(@forward);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
442 @reverse = $self->_merge_overlapping_features(@reverse);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
443
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
444 # Finding single stranded regions
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
445 my ($length) = $self->{'_objref'}->get_consensus_length;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
446 $length = $self->{'_objref'}->change_coord('gapped consensus','ungapped consensus',$length);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
447 @forward = $self->_complementary_features_list(1,$length,@forward);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
448 @reverse = $self->_complementary_features_list(1,$length,@reverse);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
449
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
450 my @SS = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
451 foreach my $feat (@forward, @reverse) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
452 $feat->primary_tag('single_strand_region');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
453 push(@SS,$feat);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
454 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
455
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
456 return @SS;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
457 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
458
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
459 =head1 Internal Methods
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
460
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
461 =head2 _merge_overlapping_features
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
462
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
463 Title : _merge_overlapping_features
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
464 Usage : my @feat = $ContigAnal->_merge_overlapping_features(@features);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
465 Function : Merge all overlapping features into features
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
466 that hold original features as sub-features
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
467 Returns : array of Bio::SeqFeature::Generic objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
468 Args : array of Bio::SeqFeature::Generic objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
469
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
470 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
471
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
472 sub _merge_overlapping_features {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
473 my ($self,@feat) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
474
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
475 $self->throw_not_implemented();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
476 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
477
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
478 =head2 _complementary_features_list
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
479
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
480 Title : _complementary_features_list
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
481 Usage : @feat = $ContigAnal->_complementary_features_list($start,$end,@features);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
482 Function : Build a list of features for regions
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
483 not covered by features in @features array
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
484 Returns : array of Bio::SeqFeature::Generic objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
485 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
486 $start : [integer] start of first output feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
487 $end : [integer] end of last output feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
488 @features : array of Bio::SeqFeature::Generic objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
489
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
490 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
491
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
492 sub _complementary_features_list {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
493 my ($self,$start,$end,@feat) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
494
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
495 $self->throw_not_implemented();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
496 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
497
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
498 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
499
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
500 __END__