annotate variant_effect_predictor/Bio/Tools/Alignment/Trim.pm @ 1:d6778b5d8382 draft default tip

Deleted selected files
author willmclaren
date Fri, 03 Aug 2012 10:05:43 -0400
parents 21066c0abaf5
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1 # Bio::Tools::Alignment::Trim.pm
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
2 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
3 # Cared for by Chad Matsalla
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
4 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
5 # Copyright Chad Matsalla
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
6 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
7 # You may distribute this module under the same terms as perl itself
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
8
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
9 # POD documentation - main docs before the code
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
10
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
11 =head1 NAME
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
12
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
13 Bio::Tools::Alignment::Trim - A kludge to do specialized trimming of
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
14 sequence based on quality.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
15
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
16 =head1 SYNOPSIS
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
17
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
18 use Bio::Tools::Alignment::Trim;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
19 $o_trim = new Bio::Tools::Alignment::Trim;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
20 $o_trim->set_reverse_designator("R");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
21 $o_trim->set_forward_designator("F");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
22
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
23
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
24 =head1 DESCRIPTION
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
25
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
26 This is a specialized module designed by Chad for Chad to trim sequences
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
27 based on a highly specialized list of requirements. In other words, write
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
28 something that will trim sequences 'just like the people in the lab would
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
29 do manually'.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
30
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
31 I settled on a sliding-window-average style of search which is ugly and
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
32 slow but does _exactly_ what I want it to do.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
33
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
34 Mental note: rewrite this.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
35
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
36 It is very important to keep in mind the context in which this module was
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
37 written: strictly to support the projects for which Consed.pm was
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
38 designed.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
39
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
40 =head1 FEEDBACK
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
41
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
42 =head2 Mailing Lists
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
43
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
44 User feedback is an integral part of the evolution of this and other
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
45 Bioperl modules. Send your comments and suggestions preferably to one
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
46 of the Bioperl mailing lists. Your participation is much appreciated.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
47
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
48 bioperl-l@bioperl.org - General discussion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
49 http://bio.perl.org/MailList.html - About the mailing
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
50 lists
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
51
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
52 =head2 Reporting Bugs
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
53
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
54 Report bugs to the Bioperl bug tracking system to help us keep track
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
55 the bugs and their resolution. Bug reports can be submitted via
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
56 email or the web:
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
57
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
58 bioperl-bugs@bio.perl.org
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
59 http://bugzilla.bioperl.org/
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
60
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
61 =head1 AUTHOR - Chad Matsalla
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
62
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
63 Email bioinformatics@dieselwurks.com
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
64
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
65 =head1 APPENDIX
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
66
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
67 The rest of the documentation details each of the object methods.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
68 Internal methods are usually preceded with a _
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
69
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
70 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
71
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
72 package Bio::Tools::Alignment::Trim;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
73
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
74 use Bio::Root::Root;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
75 use strict;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
76 use Dumpvalue;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
77
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
78
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
79
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
80 use vars qw($VERSION @ISA %DEFAULTS);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
81
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
82 $VERSION = '0.01';
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
83
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
84 @ISA = qw(Bio::Root::Root);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
85
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
86 BEGIN {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
87 %DEFAULTS = ( 'f_designator' => 'f',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
88 'r_designator' => 'r',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
89 'windowsize' => '10',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
90 'phreds' => '20');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
91 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
92
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
93 =head2 new()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
94
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
95 Title : new()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
96 Usage : $o_trim = Bio::Tools::Alignment::Trim->new();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
97 Function: Construct the Bio::Tools::Alignment::Trim object. No parameters
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
98 are required to create this object. It is strictly a bundle of
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
99 functions, as far as I am concerned.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
100 Returns : A reference to a Bio::Tools::Alignment::Trim object.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
101 Args : (optional)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
102 -windowsize (default 10)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
103 -phreds (default 20)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
104
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
105
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
106 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
107
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
108 sub new {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
109 my ($class,@args) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
110 my $self = $class->SUPER::new(@args);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
111 my($windowsize,$phreds) =
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
112 $self->_rearrange([qw(
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
113 WINDOWSIZE
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
114 PHREDS
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
115 )],
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
116 @args);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
117 $self->{windowsize} = $windowsize || $DEFAULTS{'windowsize'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
118 $self->{phreds} = $phreds || $DEFAULTS{'phreds'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
119 # print("Constructor set phreds to ".$self->{phreds}."\n") if $self->verbose > 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
120 $self->set_designators($DEFAULTS{'f_designator'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
121 $DEFAULTS{'r_designator'});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
122 return $self;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
123 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
124
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
125 =head2 set_designators($forward_designator,$reverse_designator)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
126
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
127 Title : set_designators(<forward>,<reverse>)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
128 Usage : $o_trim->set_designators("F","R")
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
129 Function: Set the string by which the system determines whether a given
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
130 sequence represents a forward or a reverse read.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
131 Returns : Nothing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
132 Args : two scalars: one representing the forward designator and one
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
133 representing the reverse designator
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
134
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
135 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
136
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
137 sub set_designators {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
138 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
139 ($self->{'f_designator'},$self->{'r_designator'}) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
140 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
141
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
142 =head2 set_forward_designator($designator)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
143
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
144 Title : set_forward_designator($designator)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
145 Usage : $o_trim->set_forward_designator("F")
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
146 Function: Set the string by which the system determines if a given
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
147 sequence is a forward read.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
148 Returns : Nothing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
149 Args : A string representing the forward designator of this project.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
150
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
151 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
152
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
153 sub set_forward_designator {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
154 my ($self,$desig) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
155 $self->{'f_designator'} = $desig;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
156 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
157
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
158 =head2 set_reverse_designator($reverse_designator)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
159
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
160 Title : set_reverse_designator($reverse_designator)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
161 Function: Set the string by which the system determines if a given
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
162 sequence is a reverse read.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
163 Usage : $o_trim->set_reverse_designator("R")
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
164 Returns : Nothing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
165 Args : A string representing the forward designator of this project.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
166
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
167 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
168
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
169 sub set_reverse_designator {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
170 my ($self,$desig) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
171 $self->{'r_designator'} = $desig;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
172 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
173
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
174 =head2 get_designators()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
175
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
176 Title : get_designators()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
177 Usage : $o_trim->get_designators()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
178 Returns : A string describing the current designators.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
179 Args : None
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
180 Notes : Really for informational purposes only. Duh.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
181
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
182 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
183
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
184 sub get_designators {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
185 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
186 return("forward: ".$self->{'f_designator'}." reverse: ".$self->{'r_designator'});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
187 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
188
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
189 =head2 trim_leading_polys()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
190
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
191 Title : trim_leading_polys()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
192 Usage : $o_trim->trim_leading_polys()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
193 Function: Not implemented. Does nothing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
194 Returns : Nothing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
195 Args : None.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
196 Notes : This function is not implemented. Part of something I wanted to
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
197 do but never got around to doing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
198
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
199 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
200
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
201 sub trim_leading_polys {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
202 my ($self, $sequence) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
203 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
204
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
205 =head2 dump_hash()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
206
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
207 Title : dump_hash()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
208 Usage : $o_trim->dump_hash()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
209 Function: Unimplemented.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
210 Returns : Nothing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
211 Args : None.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
212 Notes : Does nothing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
213
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
214 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
215
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
216 sub dump_hash {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
217 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
218 my %hash = %{$self->{'qualities'}};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
219 } # end dump_hash
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
220
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
221 =head2 trim_singlet($sequence,$quality,$name,$class)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
222
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
223 Title : trim_singlet($sequence,$quality,$name,$class)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
224 Usage : ($r_trim_points,$trimmed_sequence) =
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
225 @{$o_trim->trim_singlet($sequence,$quality,$name,$class)};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
226 Function: Trim a singlet based on its quality.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
227 Returns : a reference to an array containing the forward and reverse
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
228 trim points and the trimmed sequence.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
229 Args : $sequence : A sequence (SCALAR, please)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
230 $quality : A _scalar_ of space-delimited quality values.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
231 $name : the name of the sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
232 $class : The class of the sequence. One of qw(singlet
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
233 singleton doublet pair multiplet)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
234 Notes : At the time this was written the bioperl objects SeqWithQuality
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
235 and PrimaryQual did not exist. This is what is with the clumsy
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
236 passing of references and so on. I will rewrite this next time I
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
237 have to work with it. I also wasn't sure whether this function
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
238 should return just the trim points or the points and the sequence.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
239 I decided that I always wanted both so that's how I implemented
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
240 it.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
241 - Note that the size of the sliding windows is set during construction of
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
242 the Bio::Tools::Alignment::Trim object.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
243
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
244 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
245
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
246 sub trim_singlet {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
247 my ($self,$sequence,$quality,$name,$class) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
248 # this split is done because I normally store quality values in a
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
249 # space-delimited scalar rather then in an array.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
250 # I do this because serialization of the arrays is tough.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
251 my @qual = split(' ',$quality);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
252 my @points;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
253 my $sequence_length = length($sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
254 my ($returnstring,$processed_sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
255 # smooth out the qualities
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
256 my $r_windows = &_sliding_window(\@qual,$self->{windowsize});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
257 # find out the leading and trailing trimpoints
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
258 my $start_base = $self->_get_start($r_windows,$self->{windowsize},$self->{phreds});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
259 my (@new_points,$trimmed_sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
260 # do you think that any sequence shorter then 100 should be
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
261 # discarded? I don't think that this should be the decision of this
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
262 # module.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
263 # removed, 020926
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
264 $points[0] = $start_base;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
265 # whew! now for the end base
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
266 # required parameters: reference_to_windows,windowsize,$phredvalue,start_base
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
267 my $end_base = &_get_end($r_windows,$self->{windowsize},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
268 $self->{phreds},$start_base);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
269 $points[1] = $end_base;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
270 # now do the actual trimming
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
271 # CHAD : I don't think that it is a good idea to call chop_sequence here
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
272 # because chop_sequence also removes X's and N's and things
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
273 # and that is not always what is wanted
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
274 return \@points;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
275 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
276
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
277 =head2 trim_doublet($sequence,$quality,$name,$class)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
278
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
279 Title : trim_doublet($sequence,$quality,$name,$class)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
280 Usage : ($r_trim_points,$trimmed_sequence) =
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
281 @{$o_trim->trim_singlet($sequence,$quality,$name,$class)};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
282 Function: Trim a singlet based on its quality.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
283 Returns : a reference to an array containing the forward and reverse
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
284 Args : $sequence : A sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
285 $quality : A _scalar_ of space-delimited quality values.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
286 $name : the name of the sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
287 $class : The class of the sequence. One of qw(singlet
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
288 singleton doublet pair multiplet)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
289 Notes : At the time this was written the bioperl objects SeqWithQuality
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
290 and PrimaryQual did not exist. This is what is with the clumsy
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
291 passing of references and so on. I will rewrite this next time I
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
292 have to work with it. I also wasn't sure whether this function
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
293 should return just the trim points or the points and the sequence.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
294 I decided that I always wanted both so that's how I implemented
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
295 it.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
296
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
297 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
298
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
299 #'
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
300 sub trim_doublet {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
301 my ($self,$sequence,$quality,$name,$class) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
302 my @qual = split(' ',$quality);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
303 my @points;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
304 my $sequence_length = length($sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
305 my ($returnstring,$processed_sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
306 # smooth out the qualities
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
307 my $r_windows = &_sliding_window(\@qual,$self->{windowsize});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
308 # determine where the consensus sequence starts
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
309 my $offset = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
310 for (my $current = 0; $current<$sequence_length;$current++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
311 if ($qual[$current] != 0) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
312 $offset = $current;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
313 last;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
314 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
315 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
316 # start_base required: r_quality,$windowsize,$phredvalue
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
317 my $start_base = $self->_get_start($r_windows,$self->{windowsize},$self->{phreds},$offset);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
318 if ($start_base > ($sequence_length - 100)) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
319 $points[0] = ("FAILED");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
320 $points[1] = ("FAILED");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
321 return @points;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
322 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
323 $points[0] = $start_base;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
324 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
325 # whew! now for the end base
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
326 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
327 # required parameters: reference_to_windows,windowsize,$phredvalue,start_base
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
328 # |
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
329 # 010420 NOTE: We will no longer get the end base to avoid the Q/--\___/-- syndrome
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
330 my $end_base = $sequence_length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
331 my $start_of_trailing_zeros = &count_doublet_trailing_zeros(\@qual);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
332 $points[1] = $end_base;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
333 # CHAD : I don't think that it is a good idea to call chop_sequence here
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
334 # because chop_sequence also removes X's and N's and things
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
335 # and that is not always what is wanted
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
336 return @points;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
337 } # end trim_doublet
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
338
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
339 =head2 chop_sequence($name,$class,$sequence,@points)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
340
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
341 Title : chop_sequence($name,$class,$sequence,@points)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
342 Usage : ($start_point,$end_point,$chopped_sequence) =
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
343 $o_trim->chop_sequence($name,$class,$sequence,@points);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
344 Function: Chop a sequence based on its name, class, and sequence.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
345 Returns : an array containing three scalars:
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
346 1- the start trim point
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
347 2- the end trim point
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
348 3- the chopped sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
349 Args :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
350 $name : the name of the sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
351 $class : The class of the sequence. One of qw(singlet
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
352 singleton doublet pair multiplet)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
353 $sequence : A sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
354 @points : An array containing two elements- the first contains
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
355 the start trim point and the second conatines the end trim
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
356 point.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
357
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
358 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
359
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
360 sub chop_sequence {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
361 my ($self,$name,$class,$sequence,@points) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
362 print("Coming into chop_sequence, \@points are @points\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
363 my $fdesig = $self->{'f_designator'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
364 my $rdesig = $self->{'r_designator'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
365 if (!$points[0] && !$points[1]) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
366 $sequence = "junk";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
367 return $sequence;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
368 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
369 if ($class eq "singlet" && $name =~ /$fdesig$/) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
370 $sequence = substr($sequence,$points[0],$points[1]-$points[0]);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
371 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
372 elsif ($class eq "singlet" && $name =~ /$rdesig$/) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
373 $sequence = substr($sequence,$points[0],$points[1]-$points[0]);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
374 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
375 elsif ($class eq "singleton" && $name =~ /$fdesig$/) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
376 $sequence = substr($sequence,$points[0],$points[1]-$points[0]);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
377 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
378 elsif ($class eq "singleton" && $name =~ /$rdesig$/) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
379 $sequence = substr($sequence,$points[0],$points[1]-$points[0]);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
380 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
381 elsif ($class eq "doublet") {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
382 $sequence = substr($sequence,$points[0],$points[1]-$points[0]);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
383 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
384 # this is a _terrible_ to do this! i couldn't seem to find a better way
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
385 # i thought something like s/(^.*[Xx]{5,})//g; might work, but no go
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
386 # no time to find a fix!
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
387 my $length_before_trimming = length($sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
388 my $subs_Xs = $sequence =~ s/^.*[Xx]{5,}//g;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
389 if ($subs_Xs) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
390 my $length_after_trimming = length($sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
391 my $number_Xs_trimmed = $length_before_trimming - $length_after_trimming;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
392 $points[0] += $number_Xs_trimmed;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
393 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
394 $length_before_trimming = length($sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
395 my $subs_Ns = $sequence =~ s/[Nn]{1,}$//g;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
396 if ($subs_Ns) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
397 my $length_after_trimming = length($sequence);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
398 my $number_Ns_trimmed = $length_before_trimming - $length_after_trimming;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
399 $points[1] -= $number_Ns_trimmed;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
400 $points[1] -= 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
401 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
402 push @points,$sequence;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
403 print("chop_sequence \@points are @points\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
404 return @points;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
405 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
406
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
407 =head2 _get_start($r_quals,$windowsize,$phreds,$offset)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
408
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
409 Title : _get_start($r_quals,$windowsize,$phreds,$offset)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
410 Usage : $start_base = $self->_get_start($r_windows,5,20);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
411 Function: Provide the start trim point for this sequence.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
412 Returns : a scalar representing the start of the sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
413 Args :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
414 $r_quals : A reference to an array containing quality values. In
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
415 context, this array of values has been smoothed by then
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
416 sliding window-look ahead algorithm.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
417 $windowsize : The size of the window used when the sliding window
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
418 look-ahead average was calculated.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
419 $phreds : <fill in what this does here>
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
420 $offset : <fill in what this does here>
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
421
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
422 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
423
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
424 sub _get_start {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
425 my ($self,$r_quals,$windowsize,$phreds,$offset) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
426 print("Using $phreds phreds\n") if $self->verbose > 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
427 # this is to help determine whether the sequence is good at all
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
428 my @quals = @$r_quals;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
429 my ($count,$count2,$qualsum);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
430 if ($offset) { $count = $offset; } else { $count = 0; }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
431 # search along the length of the sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
432 for (; ($count+$windowsize) <= scalar(@quals); $count++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
433 # sum all of the quality values in this window.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
434 my $cumulative=0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
435 for($count2 = $count; $count2 < $count+$windowsize; $count2++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
436 if (!$quals[$count2]) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
437 # print("Quals don't exist here!\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
438 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
439 else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
440 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
441 # print("Incremented qualsum to ($qualsum)\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
442 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
443 $cumulative++;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
444 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
445 # print("The sum of this window (starting at $count) is $qualsum. I counted $cumulative bases.\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
446 # if the total of windowsize * phreds is
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
447 if ($qualsum && $qualsum >= $windowsize*$phreds) { return $count; }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
448 $qualsum = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
449 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
450 # if ($count > scalar(@quals)-$windowsize) { return; }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
451 return $count;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
452 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
453
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
454 =head2 _get_end($r_qual,$windowsize,$phreds,$count)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
455
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
456 Title : _get_end($r_qual,$windowsize,$phreds,$count)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
457 Usage : my $end_base = &_get_end($r_windows,20,20,$start_base);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
458 Function: Get the end trim point for this sequence.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
459 Returns : A scalar representing the end trim point for this sequence.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
460 Args :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
461 $r_qual : A reference to an array containing quality values. In
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
462 context, this array of values has been smoothed by then
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
463 sliding window-look ahead algorithm.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
464 $windowsize : The size of the window used when the sliding window
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
465 look-ahead average was calculated.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
466 $phreds : <fill in what this does here>
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
467 $count : Start looking for the end of the sequence here.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
468
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
469 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
470
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
471 sub _get_end {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
472 my ($r_qual,$windowsize,$phreds,$count) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
473 my @quals = @$r_qual;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
474 my $total_bases = scalar(@quals);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
475 my ($count2,$qualsum,$end_of_quals,$bases_counted);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
476 if (!$count) { $count=0; }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
477 BASE: for (; $count < $total_bases; $count++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
478 $bases_counted = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
479 $qualsum = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
480 POSITION: for($count2 = $count; $count2 < $total_bases; $count2++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
481 $bases_counted++;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
482
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
483 if ($count2 == $total_bases-1) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
484 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
485 $bases_counted++;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
486 last BASE;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
487 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
488 elsif ($bases_counted == $windowsize) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
489 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
490 if ($qualsum < $bases_counted*$phreds) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
491 return $count+$bases_counted+$windowsize;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
492 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
493 next BASE;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
494 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
495 else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
496 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
497 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
498 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
499 if ($qualsum < $bases_counted*$phreds) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
500 return $count+$bases_counted+$windowsize;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
501 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
502 else { }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
503 $qualsum = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
504 } # end for
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
505 if ($end_of_quals) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
506 my $bases_for_average = $total_bases-$count2;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
507 return $count2;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
508 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
509 else { }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
510 if ($qualsum) { } # print ("$qualsum\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
511 return $total_bases;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
512 } # end get_end
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
513
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
514 =head2 count_doublet_trailing_zeros($r_qual)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
515
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
516 Title : count_doublet_trailing_zeros($r_qual)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
517 Usage : my $start_of_trailing_zeros = &count_doublet_trailing_zeros(\@qual);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
518 Function: Find out when the trailing zero qualities start.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
519 Returns : A scalar representing where the zeros start.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
520 Args : A reference to an array of quality values.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
521 Notes : Again, this should be rewritten to use PrimaryQual objects.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
522 A more detailed explanation of why phrap puts these zeros here should
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
523 be written and placed here. Please email and hassle the author.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
524
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
525
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
526 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
527
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
528 sub count_doublet_trailing_zeros {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
529 my ($r_qual) = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
530 my $number_of_trailing_zeros = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
531 my @qualities = @$r_qual;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
532 for (my $current=scalar(@qualities);$current>0;$current--) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
533 if ($qualities[$current] && $qualities[$current] != 0) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
534 $number_of_trailing_zeros = scalar(@qualities)-$current;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
535 return $current+1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
536 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
537 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
538 return scalar(@qualities);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
539 } # end count_doublet_trailing_zeros
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
540
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
541 =head2 _sliding_window($r_quals,$windowsize)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
542
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
543 Title : _sliding_window($r_quals,$windowsize)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
544 Usage : my $r_windows = &_sliding_window(\@qual,$windowsize);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
545 Function: Create a sliding window, look-forward-average on an array
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
546 of quality values. Used to smooth out differences in qualities.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
547 Returns : A reference to an array containing the smoothed values.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
548 Args : $r_quals: A reference to an array containing quality values.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
549 $windowsize : The size of the sliding window.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
550 Notes : This was written before PrimaryQual objects existed. They
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
551 should use that object but I haven't rewritten this yet.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
552
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
553 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
554
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
555 #'
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
556 sub _sliding_window {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
557 my ($r_quals,$windowsize) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
558 my (@window,@quals,$qualsum,$count,$count2,$average,@averages,$bases_counted);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
559 @quals = @$r_quals;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
560 my $size_of_quality = scalar(@quals);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
561 # do this loop for all of the qualities
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
562 for ($count=0; $count <= $size_of_quality; $count++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
563 $bases_counted = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
564 BASE: for($count2 = $count; $count2 < $size_of_quality; $count2++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
565 $bases_counted++;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
566 # if the search hits the end of the averages, stop
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
567 # this is for the case near the end where bases remaining < windowsize
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
568 if ($count2 == $size_of_quality) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
569 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
570 last BASE;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
571 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
572 # if the search hits the size of the window
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
573 elsif ($bases_counted == $windowsize) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
574 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
575 last BASE;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
576 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
577 # otherwise add the quality value
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
578 unless (!$quals[$count2]) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
579 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
580 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
581 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
582 unless (!$qualsum || !$windowsize) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
583 $average = $qualsum / $bases_counted;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
584 if (!$average) { $average = "0"; }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
585 push @averages,$average;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
586 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
587 $qualsum = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
588 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
589 # 02101 Yes, I repaired the mismatching numbers between averages and windows.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
590 # print("There are ".scalar(@$r_quals)." quality values. They are @$r_quals\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
591 # print("There are ".scalar(@averages)." average values. They are @averages\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
592 return \@averages;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
593
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
594 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
595
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
596 =head2 _print_formatted_qualities
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
597
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
598 Title : _print_formatted_qualities(\@quals)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
599 Usage : &_print_formatted_qualities(\@quals);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
600 Returns : Nothing. Prints.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
601 Args : A reference to an array containing quality values.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
602 Notes : An internal procedure used in debugging. Prints out an array nicely.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
603
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
604 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
605
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
606 sub _print_formatted_qualities {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
607 my $rquals = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
608 my @qual = @$rquals;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
609 for (my $count=0; $count<scalar(@qual) ; $count++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
610 if (($count%10)==0) { print("\n$count\t"); }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
611 if ($qual[$count]) { print ("$qual[$count]\t");}
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
612 else { print("0\t"); }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
613 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
614 print("\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
615 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
616
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
617 =head2 _get_end_old($r_qual,$windowsize,$phreds,$count)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
618
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
619 Title : _get_end_old($r_qual,$windowsize,$phreds,$count)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
620 Usage : Deprecated. Don't use this!
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
621 Returns : Deprecated. Don't use this!
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
622 Args : Deprecated. Don't use this!
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
623
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
624 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
625
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
626 #'
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
627 sub _get_end_old {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
628 my ($r_qual,$windowsize,$phreds,$count) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
629 warn("Do Not Use this function (_get_end_old)");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
630 my $target = $windowsize*$phreds;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
631 my @quals = @$r_qual;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
632 my $total_bases = scalar(@quals);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
633 my ($count2,$qualsum,$end_of_quals);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
634 if (!$count) { $count=0; }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
635 BASE: for (; $count < $total_bases; $count++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
636 for($count2 = $count; $count2 < $count+$windowsize; $count2++) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
637 if ($count2 == scalar(@quals)-1) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
638 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
639 $end_of_quals = 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
640 last BASE;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
641
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
642 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
643 $qualsum += $quals[$count2];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
644 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
645 if ($qualsum < $windowsize*$phreds) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
646 return $count+$windowsize;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
647 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
648 $qualsum = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
649 } # end for
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
650 } # end get_end_old
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
651
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
652
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
653 # Autoload methods go after =cut, and are processed by the autosplit program.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
654
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
655 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
656 __END__