annotate variant_effect_predictor/Bio/Tools/Alignment/Consed.pm @ 2:a5976b2dce6f

changing defualt values for ensembl database
author mahtabm
date Thu, 11 Apr 2013 17:15:42 +1000
parents 1f6dce3d34e0
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 # Bio::Tools::Alignment::Consed.pm
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 # Cared for by Chad Matsalla
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5 # Copyright Chad Matsalla
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 # You may distribute this module under the same terms as perl itself
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 # POD documentation - main docs before the code
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13 Bio::Tools::Alignment::Consed - A module to work with objects from consed .ace files
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15 =head1 SYNOPSIS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17 # a report for sequencing stuff
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18 my $o_consed = new Bio::Tools::Alignment::Consed(
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19 -acefile => "/path/to/an/acefile.ace.1",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20 -verbose => 1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21 my $foo = $o_consed->set_reverse_designator("r");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22 my $bar = $o_consed->set_forward_designator("f");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24 # get the contig numbers
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25 my @keys = $o_consed->get_contigs();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27 # construct the doublets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28 my $setter_doublets = $o_consed->choose_doublets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30 # get the doublets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31 my @doublets = $o_consed->get_doublets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35 Bio::Tools::Alignment::Consed provides methods and objects to deal
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36 with the output from the Consed package of programs. Specifically,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37 Bio::Tools::Alignment::Consed takes in the name of in .ace file and
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38 provides objects for the results.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40 A word about doublets: This module was written to accomodate a large
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 EST sequencing operation. In this case, EST's were sequenced from the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42 3' and from the 5' end of the EST. The objective was to find a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43 consensus sequence for these two reads. Thus, a contig of two is what
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44 we wanted, and this contig should consist of the forward and reverse
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45 reads of a getn clone. For example, for a forward designator of "F"
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46 and a reverse designator of "R", if the two reads chad1F and chad1R
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47 were in a single contig (for example Contig 5) it will be determined
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48 that the consensus sequence for Contig 5 will be the sequence for
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49 clone chad1.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51 Doublets are good!
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53 This module parses .ace and related files. A detailed list of methods
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54 can be found at the end of this document.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 I wrote a detailed rationale for design that may explain the reasons
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57 why some things were done the way they were done. That document is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58 beyond the scope of this pod and can probably be found in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59 directory from which this module was 'made' or at
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60 http://www.dieselwurks.com/bioinformatics/consedpm_documentation.pdf
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62 Note that the pod in that document might be old but the original
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63 rationale still stands.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65 =head1 FEEDBACK
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67 =head2 Mailing Lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69 User feedback is an integral part of the evolution of this and other
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70 Bioperl modules. Send your comments and suggestions preferably to one
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71 of the Bioperl mailing lists. Your participation is much appreciated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73 bioperl-l@bioperl.org - General discussion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74 http://bio.perl.org/MailList.html - About the mailing lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76 =head2 Reporting Bugs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78 Report bugs to the Bioperl bug tracking system to help us keep track
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79 the bugs and their resolution. Bug reports can be submitted via email
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80 or the web:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82 bioperl-bugs@bio.perl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83 http://bugzilla.bioperl.org/
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86 =head1 AUTHOR - Chad Matsalla
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88 chad@dieselwurks.com
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90 =head1 APPENDIX
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92 The rest of the documentation details each of the object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93 methods. Internal methods are usually preceded with a _
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97 #'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99 package Bio::Tools::Alignment::Consed;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102 use vars qw($VERSION @ISA $Contigs %DEFAULTS);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103 use FileHandle;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104 use Dumpvalue qw(dumpValue);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105 use Bio::Tools::Alignment::Trim;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106 use Bio::Root::Root;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107 use Bio::Root::IO;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109 @ISA = qw(Bio::Root::Root Bio::Root::IO);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111 $VERSION = '0.60';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113 BEGIN {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114 %DEFAULTS = ( 'f_designator' => 'f',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115 'r_designator' => 'r');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118 =head2 new()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120 Title : new(-acefile => $path_to_some_acefile, -verbose => "1")
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121 Usage : $o_consed = Bio::Tools::Alignment::Consed->
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122 new(-acefile => $path_to_some_acefile, -verbose => "1");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 Function: Construct the Bio::Tools::Alignment::Consed object. Sets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124 verbosity for the following procedures, if necessary:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125 1. Construct a new Bio::Tools::Alignment::Trim object, to
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126 handle quality trimming 2. Read in the acefile and parse it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128 Returns : A reference to a Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129 Args : A hash. (-acefile) is the filename of an acefile. If a full path
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130 is not specified "./" is prepended to the filename and used from
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131 instantiation until destruction. If you want
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132 Bio::Tools::Alignment::Consed to be noisy during parsing of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133 the acefile, specify some value for (-verbose).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137 sub new {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138 my ($class,%args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139 my $self = $class->SUPER::new(%args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 $self->{'filename'} = $args{'-acefile'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142 # this is special to UNIX and should probably use catfile
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143 if (!($self->{'filename'} =~ /\//)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144 $self->{'filename'} = "./".$self->{'filename'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146 $self->{'filename'} =~ m/(.*\/)(.*)ace.*$/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 $self->{'path'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148 $self->_initialize_io('-file'=>$self->{'filename'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149 $self->{'o_trim'} = new Bio::Tools::Alignment::Trim(-verbose => $self->verbose());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150 $self->set_forward_designator($DEFAULTS{'f_designator'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151 $self->set_reverse_designator($DEFAULTS{'r_designator'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153 $self->_read_file();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154 return $self;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157 =head2 verbose()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159 Title : verbose()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160 Usage : $o_consed->verbose(1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 Function: Set the verbosity level for debugging messages. On instantiation
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162 of the Bio::Tools::Alignment::Consed object the verbosity level
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163 is set to 0 (quiet).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164 Returns : 1 or 0.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165 Args : The verbosity levels are:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166 0 - quiet
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167 1 - noisy
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168 2 - noisier
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169 3 - annoyingly noisy
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170 Notes : This method for setting verbosity has largely been superseeded by
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171 a sub-by-sub way, where for every sub you can provide a (-verbose)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172 switch. I am doing converting this bit-by-bit so do not be surprised
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173 if some subs do not honour this.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177 # from RootI
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179 # backwards compat
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180 sub set_verbose { (shift)->verbose(@_) }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182 =head2 get_filename()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 Title : get_filename()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185 Usage : $o_consed->get_filename();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 Function: Returns the name of the acefile being used by the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187 Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188 Returns : A scalar containing the name of a file.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 sub get_filename {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196 return $self->{'filename'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199 =head2 count_sequences_with_grep()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201 Title : count_sequences_with_grep()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202 Usage : $o_consed->count_sequences_with_grep();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203 Function: Use /bin/grep to scan through the files in the ace project dir
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204 and count sequences in those files. I used this method in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205 development of this module to verify that I was getting all of the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206 sequences. It works, but it is (I think) unix-like platform
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 dependent.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208 Returns : A scalar containing the number of sequences in the ace project
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 directory.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211 Notes : If you are on a non-UNIX platform, you really do not have to use
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212 this. It is more of a debugging routine designed to address very
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 specific problems.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215 This method was reimplemented to be platform independent with a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216 pure perl implementation. The above note can be ignored.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220 sub count_sequences_with_grep {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222 my ($working_dir,$grep_cli,@total_grep_sequences);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223 # this should be migrated to a pure perl implementation ala
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224 # Tom Christiansen's 'tcgrep'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225 # http://www.perl.com/language/ppt/src/grep/tcgrep
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227 open(FILE, $self->{'filename'}) or do { $self->warn("cannot open file ".$self->{'filename'}. " for grepping"); return};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228 my $counter =0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229 while(<FILE>) { $counter++ if(/^AF/); }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231 close FILE;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232 opendir(SINGLETS,$self->{'path'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 foreach my $f ( readdir(SINGLETS) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234 next unless ($f =~ /\.singlets$/);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235 open(FILE, $self->catfile($self->{'path'},$f)) or do{ $self->warn("cannot open file ".$self->catfile($self->{'path'},$f)); next };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 while(<FILE>) { $counter++ if(/^>/) }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237 close FILE;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239 return $counter;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240 # Chad's platform implementation which required grep on the system
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242 # I tried to cause graceful exiting if not on *ix here
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243 # then i took platforms from Bioperl*/PLATFORMS here. Is that good?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244 # print("\$^O is $^O\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 # if (!($^O =~ /dec_osf|linux|unix|bsd|solaris|darwin/i)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246 # $self->warn("Bio::Tools::Alignment::Consed::count_sequences_with_grep: This sub uses grep which is doesn't run on this operating system, AFAIK. Sorry .".$^O);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247 # return 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 # }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 # $grep_cli = `which grep`;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250 # if (!$grep_cli) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251 # $self->warn("I couldn't see to find grep on this system, or the which command is broken. Bio::Tools::Alignment::Consed::count_sequences_with_grep requires grep and which to find grep.");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 # return 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253 # }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 # chomp $grep_cli;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 # push(@total_grep_sequences, my @foo = `$grep_cli ^AF $self->{filename}`);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256 # my $cli = "$grep_cli \\> $self->{'path'}*.singlets";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 # push(@total_grep_sequences, @foo = `$cli`);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258 # return scalar(@total_grep_sequences);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261 =head2 get_path()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263 Title : get_path()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264 Usage : $o_consed->get_path();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 Function: Returns the path to the acefile this object is working with.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266 Returns : Scalar. The path to the working acefile.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 sub get_path {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 return $self->{'path'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276 =head2 get_contigs()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 Title : get_contigs()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279 Usage : $o_consed->get_contigs();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 Function: Return the keys to the Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281 Returns : An array containing the keynames in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282 Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284 Notes : This would normally be used to get the keynames for some sort of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285 iterator. These keys are worthless in general day-to-day use because
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286 in the Consed acefile they are simply Contig1, Contig2, ...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290 sub get_contigs {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291 my ($self,$contig) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292 my @keys = (sort keys %{$self->{'contigs'}});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293 return @keys;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296 =head2 get_class($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298 Title : get_class($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299 Usage : $o_consed->get_class($contig_keyname);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 Function: Return the class name for this contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301 Returns : A scalar representing the class of this contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307 sub get_class {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308 my ($self,$contig) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 return $self->{contigs}->{$contig}->{class};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312 =head2 get_quality_array($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 Title : get_quality_array($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315 Usage : $o_consed->get_quality_array($contig_keyname);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316 Function: Returns the quality for the consensus sequence for the given
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317 contig as an array. See get_quality_scalar to get this as a scalar.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318 Returns : An array containing the quality for the consensus sequence with
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319 the given keyname.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320 Args : The keyname of a contig. Note: This is a keyname. The key would
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 normally come from get_contigs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322 Notes : Returns an array, not a reference. Is this a bug? <thinking> No.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 Well, maybe.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324 Why was this developed like this? I was using FreezeThaw for object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325 persistence, and when it froze out these arrays it took a long time
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326 to thaw it. Much better as a scalar.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328 See L<get_quality_scalar()|get_quality_scalar>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332 sub get_quality_array {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333 my ($self,$contig) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334 my $quality = $self->{contigs}->{$contig}->{quality};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335 # chad, what is with the s/// ?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336 # my @qualities = split
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337 # (' ',($self->{contigs}->{$contig}->{quality} =~ s/\s+//));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338 my @qualities = split
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339 (' ',$self->{contigs}->{$contig}->{quality});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 return @qualities;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343 =head2 get_quality_scalar($contig_keyname))
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345 Title : get_quality_scalar($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346 Usage : $o_consed->get_quality_scalar($contig_keyname);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347 Function: Returns the quality for the consensus sequence for the given
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348 contig as a scalar. See get_quality_array to get this as an array.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349 Returns : An scalar containing the quality for the consensus sequence with
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350 the given keyname.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
351 Args : The keyname of a contig. Note this is a _keyname_. The key would
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
352 normally come from get_contigs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
353 Notes : Why was this developed like this? I was using FreezeThaw for object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
354 persistence, and when it froze out these arrays it took a coon's age
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
355 to thaw it. Much better as a scalar.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
356
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
357 See L<get_quality_array()|get_quality_array>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
358
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
359 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361 #'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362 sub get_quality_scalar {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363 my ($self,$contig) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364 return $self->{'contigs'}->{$contig}->{'quality'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367 =head2 freeze_hash()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
368
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
369 Title : freeze_hash()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
370 Usage : $o_consed->freeze_hash();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
371
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
372 Function: Use Ilya's FreezeThaw module to create a persistent data
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
373 object for this Bio::Tools::Alignment::Consed data
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
374 structure. In the case of AAFC, we use
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
375 Bio::Tools::Alignment::Consed to pre-process bunches of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
376 sequences, freeze the structures, and send in a harvesting
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
377 robot later to do database stuff.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
378 Returns : 0 or 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
379 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
380 Notes : This procedure was removed so Consed.pm won't require
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
381 FreezeThaw.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
382
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
383
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
384 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
385
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
386 #'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
387 sub freeze_hash {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
388 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
389 $self->warn("This method (freeze_hash) was removed from the bioperl consed.pm. Sorry.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
390 if (1==2) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
391 $self->debug("Bio::Tools::Alignment::Consed::freeze_hash: \$self->{path} is $self->{path}\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
392 my $filename = $self->{'path'}."frozen";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
393 my %contigs = %{$self->{'contigs'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
394 my $frozen = freeze(%contigs);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
395 umask 0001;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
396 open (FREEZE,">$filename") or do {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
397 $self->warn( "Bio::Tools::Alignment::Consed could not freeze the contig hash because the file ($filename) could not be opened: $!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
398 return 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
399 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
400 print FREEZE $frozen;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
401 close FREEZE;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
402 return 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
403 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
404 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
405
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
406 =head2 get_members($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
407
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
408 Title : get_members($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
409 Usage : $o_consed->get_members($contig_keyname);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
410 Function: Return the _names_ of the reads in this contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
411 Returns : An array containing the names of the reads in this contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
412 Args : The keyname of a contig. Note this is a keyname. The keyname
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
413 would normally come from get_contigs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
414
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
415 See L<get_contigs()|get_contigs>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
416
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
417 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
418
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
419 sub get_members {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
420 my ($self,$contig) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
421 if (!$contig) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
422 $self->warn("You need to provide the name of a contig to use Bio::Tools::Alignment::Consed::get_members!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
423 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
424 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
425 return @{$self->{'contigs'}->{$contig}->{'member_array'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
426 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
427
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
428 =head2 get_members_by_name($some_arbitrary_name)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
429
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
430 Title : get_members_by_name($some_arbitrary_name)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
431 Usage : $o_consed->get_members_by_name($some_arbitrary_name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
432 Function: Return the names of the reads in a contig. This is the name given
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
433 to $contig{key} based on what is in the contig. This is different
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
434 from the keys retrieved through get_contigs().
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
435 Returns : An array containing the names of the reads in the contig with this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
436 name.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
437 Args : The name of a contig. Not a key, but a name.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
438 Notes : Highly inefficient. use some other method if possible.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
439
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
440 See L<get_contigs()|get_contigs>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
441
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
442 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
443
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
444 sub get_members_by_name {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
445 my ($self,$name) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
446 # build a list to try to screen for redundancy
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
447 my @contigs_with_that_name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
448 foreach my $currkey ( sort keys %{$self->{'contigs'}} ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
449 next if (!$self->{'contigs'}->{$currkey}->{'name'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
450 if ($self->{'contigs'}->{$currkey}->{'name'} eq "$name") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
451 push @contigs_with_that_name,$currkey;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
452 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
453 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
454 my $count = @contigs_with_that_name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
455 if ($count == 1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
456 my $contig_num = $contigs_with_that_name[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
457 return @{$self->{'contigs'}->{$contig_num}->{'member_array'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
458 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
459 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
460
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
461 =head2 get_contig_number_by_name($some_arbitrary_name)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
462
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
463 Title : get_contig_number_by_name($some_arbitrary_name)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
464 Usage : $o_consed->get_contig_number_by_name($some_arbitrary_name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
465 Function: Return the names of the reads in a contig. This is the name given
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
466 to $contig{key} based on what is in the contig. This is different
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
467 from the keys retrieved through get_contigs().
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
468 Returns : An array containing the names of the reads in the contig with this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
469 name.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
470 Args : The name of a contig. Not a key, but a name.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
471
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
472 See L<get_contigs()|get_contigs>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
473
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
474 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
475
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
476 sub get_contig_number_by_name {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
477 my ($self,$name) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
478 foreach my $currkey (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
479 if ($self->{'contigs'}->{$currkey}->{'name'} &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
480 $self->{'contigs'}->{$currkey}->{'name'} eq "$name") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
481 return $currkey;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
482 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
483 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
484 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
485
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
486 =head2 get_sequence($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
487
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
488 Title : get_sequence($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
489 Usage : $o_consed->get_sequence($contig_keyname);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
490 Function: Returns the consensus sequence for a given contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
491 Returns : A scalar containing a sequence.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
492 Args : The keyname of a contig. Note this is a key. The key would
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
493 normally come from get_contigs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
494
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
495 See L<get_contigs()|get_contigs>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
496
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
497 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
498
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
499 sub get_sequence {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
500 my ($self,$contig) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
501 return $self->{'contigs'}->{$contig}->{'consensus'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
502 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
503
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
504 =head2 set_final_sequence($some_sequence)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
505
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
506 Title : set_final_sequence($name,$some_sequence)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
507 Usage : $o_consed->set_final_sequence($name,$some_sequence);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
508 Function: Provides a manual way to set the sequence for a given key in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
509 contig hash. Rarely used.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
510 Returns : 0 or 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
511 Args : The name (not the keyname) of a contig and an arbitrary string.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
512 Notes : A method with a questionable and somewhat mysterious origin. May
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
513 raise the dead or something like that.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
514
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
515 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
516
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
517 sub set_final_sequence {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
518 my ($self,$name,$sequence) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
519 if (!$self->{'contigs'}->{$name}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
520 $self->warn("You cannot set the final sequence for $name because it doesn't exist!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
521 return 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
522 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
523 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
524 $self->{'contigs'}->{$name}->{'final_sequence'} = $sequence;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
525 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
526 return 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
527 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
528
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
529 =head2 _read_file()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
530
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
531 Title : _read_file();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
532 Usage : _read_file();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
533 Function: An internal subroutine used to read in an acefile and parse it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
534 into a Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
535 Returns : 0 or 1.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
536 Args : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
537 Notes : This routine creates and saves the filhandle for reading the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
538 files in {fh}
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
539
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
540 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
541
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
542 sub _read_file {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
543 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
544 my ($line,$in_contig,$in_quality,$contig_number,$top);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
545 # make it easier to type $fhl
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
546 while (defined($line=$self->_readline()) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
547 chomp $line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
548 # check if there is anything on this line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
549 # if not, you can stop gathering consensus sequence
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
550 if (!$line) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
551 # if the line is blank you are no longer to gather consensus
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
552 # sequence or quality values
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
553 $in_contig = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
554 $in_quality = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
555 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
556 # you are currently gathering consensus sequence
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
557 elsif ($in_contig) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
558 if ($in_contig == 1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
559 $self->debug("Adding $line to consensus of contig number $contig_number.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
560 $self->{'contigs'}->{$contig_number}->{'consensus'} .= $line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
561 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
562 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
563 elsif ($in_quality) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
564 if (!$line) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
565 $in_quality = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
566 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
567 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
568 # I wrote this in here because acefiles produced by cap3 do not have a leading space
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
569 # like the acefiles produced by phrap and there is the potential to have concatenated
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
570 # quality values like this: 2020 rather then 20 20 whre lines collide. Thanks Andrew for noticing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
571 if ($self->{'contigs'}->{$contig_number}->{'quality'} && !($self->{'contigs'}->{$contig_number}->{'quality'} =~ m/\ $/)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
572 $self->{'contigs'}->{$contig_number}->{'quality'} .= " ";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
573 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
574 $self->{'contigs'}->{$contig_number}->{'quality'} .= $line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
575 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
576 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
577 elsif ($line =~ /^BQ/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
578 $in_quality = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
579 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
580 # the line /^CO/ like this:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
581 # CO Contig1 796 1 1 U
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
582 # can be broken down as follows:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
583 # CO - Contig!
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
584 # Contig1 - the name of this contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
585 # 796 - Number of bases in this contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
586 # 1 - Number of reads in this contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
587 # 1 - number of base segments in this contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
588 # U - Uncomplemented
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
589 elsif ($line =~ /^CO/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
590 $line =~ m/^CO\ Contig(\d+)\ \d+\ \d+\ \d+\ (\w)/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
591 $contig_number = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
592 if ($2 eq "C") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
593 $self->debug("Contig $contig_number is complemented!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
594 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
595 $self->{'contigs'}->{$contig_number}->{'member_array'} = [];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
596 $self->{'contigs'}->{$contig_number}->{'contig_direction'} = "$2";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
597 $in_contig = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
598 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
599 # 000713
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
600 # this BS is deprecated, I think.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
601 # haha, I am really witty. <ew>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
602 elsif ($line =~ /^BSDEPRECATED/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
603 $line =~ m/^BS\s+\d+\s+\d+\s+(.+)/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
604 my $member = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
605 $self->{'contigs'}->{$contig_number}->{$member}++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
606 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
607 # the members of the contigs are determined by the AF line in the ace file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
608 elsif ($line =~ /^AF/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
609 $self->debug("I see an AF line here.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
610 $line =~ /^AF\ (\S+)\ (\w)\ (\S+)/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
611 # push the name of the current read onto the member array for this contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
612 push @{$self->{'contigs'}->{$contig_number}->{'member_array'}},$1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
613 # the first read in the contig will be named the "top" read
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
614 if (!$top) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
615 $self->debug("\$top is not set.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
616 if ($self->{'contigs'}->{$contig_number}->{'contig_direction'} eq "C") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
617 $self->debug("Reversing the order of the reads. The bottom will be $1\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
618 # if the contig sequence is marked as the complement, the top becomes the bottom and$
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
619 $self->{'contigs'}->{$contig_number}->{'bottom_name'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
620 $self->{'contigs'}->{$contig_number}->{'bottom_complement'} = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
621 $self->{'contigs'}->{$contig_number}->{'bottom_start'} = $3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
622 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
623 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
624 $self->debug("NOT reversing the order of the reads. The top_name will be $1\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
625 # if the contig sequence is marked as the complement, the top becomes the bottom and$
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
626 $self->{'contigs'}->{$contig_number}->{'top_name'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
627 $self->{'contigs'}->{$contig_number}->{'top_complement'} = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
628 $self->{'contigs'}->{$contig_number}->{'top_start'} = $3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
629 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
630 $top = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
631 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
632 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
633 # if the contig sequence is marked as the complement, the top becomes the bottom and the bottom becomes the top
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
634 if ($self->{'contigs'}->{$contig_number}->{'contig_direction'} eq "C") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
635 $self->debug("Reversing the order of the reads. The top will be $1\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
636 $self->{'contigs'}->{$contig_number}->{'top_name'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
637 $self->{'contigs'}->{$contig_number}->{'top_complement'} = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
638 $self->{'contigs'}->{$contig_number}->{'top_start'} = $3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
639 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
640 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
641 $self->debug("NOT reversing the order of the reads. The bottom will be $1\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
642 $self->{'contigs'}->{$contig_number}->{'bottom_name'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
643 $self->{'contigs'}->{$contig_number}->{'bottom_complement'} = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
644 $self->{'contigs'}->{$contig_number}->{'bottom_start'} = $3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
645 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
646 $top = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
647 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
648 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
649 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
650 return 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
651 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
652
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
653 =head2 set_reverse_designator($some_string)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
654
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
655 Title : set_reverse_designator($some_string)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
656 Usage : $o_consed->set_reverse_designator($some_string);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
657 Function: Set the designator for the reverse read of contigs in this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
658 Bio::Tools::Alignment::Consed object. Used to determine if
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
659 contigs containing two reads can be named.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
660 Returns : The value of $o_consed->{reverse_designator} so you can check
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
661 to see that it was set properly.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
662 Args : An arbitrary string.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
663 Notes : May be useful only to me. <shrug>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
664
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
665 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
666
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
667 sub set_reverse_designator {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
668 my ($self,$reverse_designator) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
669 $self->{'reverse_designator'} = $reverse_designator;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
670 $self->{'o_trim'}->set_reverse_designator($reverse_designator);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
671 return $self->{'reverse_designator'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
672 } # end set_reverse_designator
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
673
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
674 =head2 set_forward_designator($some_string)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
675
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
676 Title : set_forward_designator($some_string)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
677 Usage : $o_consed->set_forward_designator($some_string);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
678 Function: Set the designator for the forward read of contigs in this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
679 Bio::Tools::Alignment::Consed object. Used to determine if
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
680 contigs containing two reads can be named.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
681 Returns : The value of $o_consed->{forward_designator} so you can check
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
682 to see that it was set properly.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
683 Args : An arbitrary string.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
684 Notes : May be useful only to me. <shrug>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
685
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
686 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
687
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
688 sub set_forward_designator {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
689 my ($self,$forward_designator) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
690 $self->{'forward_designator'} = $forward_designator;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
691 $self->{'o_trim'}->set_forward_designator($forward_designator);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
692 return $self->{'forward_designator'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
693 } # end set_forward_designator
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
694
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
695 =head2 set_designator_ignore_case("yes")
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
696
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
697 Title : set_designator_ignore_case("yes")
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
698 Usage : $o_consed->set_designator_ignore_case("yes");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
699 Function: Deprecated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
700 Returns : Deprecated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
701 Args : Deprecated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
702 Notes : Deprecated. Really. Trust me.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
703
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
704 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
705
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
706 sub set_designator_ignore_case {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
707 my ($self,$ignore_case) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
708 if ($ignore_case eq "yes") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
709 $self->{'designator_ignore_case'} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
710 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
711 return $self->{'designator_ignore_case'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
712 } # end set_designator_ignore_case
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
713
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
714 =head2 set_trim_points_singlets_and_singletons()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
715
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
716 Title : set_trim_points_singlets_and_singletons()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
717 Usage : $o_consed->set_trim_points_singlets_and_singletons();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
718 Function: Set the trim points for singlets and singletons based on
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
719 quality. Uses the Bio::Tools::Alignment::Trim object. Use
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
720 at your own risk because the Bio::Tools::Alignment::Trim
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
721 object was designed specifically for me and is mysterious
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
722 in its ways. Every time somebody other then me uses it a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
723 swarm of locusts decends on a small Central American
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
724 village so do not say you weren't warned.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
725 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
726 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
727 Notes : Working on exceptions and warnings here.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
728
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
729 See L<Bio::Tools::Alignment::Trim> for more information
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
730
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
731 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
732
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
733 #' to make my emacs happy
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
734
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
735 sub set_trim_points_singlets_and_singletons {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
736 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
737 $self->debug("Consed.pm : \$self is $self\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
738 my (@points,$trimmed_sequence);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
739 if (!$self->{'doublets_set'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
740 $self->debug("You need to set the doublets before you use set_trim_points_singlets_and_doublets. Doing that now.");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
741 $self->set_doublets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
742 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
743 foreach (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
744 if ($self->{'contigs'}->{$_}->{'class'} eq "singlet") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
745 $self->debug("Singlet $_\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
746 # this is what Warehouse wants
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
747 # my ($self,$sequence,$quality,$name) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
748 # this is what Bio::Tools::Alignment::Trim::trim_singlet wants:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
749 # my ($self,$sequence,$quality,$name,$class) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
750 # the following several lines are to make the parameter passing legible.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
751 my ($sequence,$quality,$name,$class);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
752 $sequence = $self->{'contigs'}->{$_}->{'consensus'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
753 if (!$self->{'contigs'}->{$_}->{'quality'}) { $quality = "unset"; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
754 else { $quality = $self->{'contigs'}->{$_}->{'quality'}; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
755 $name = $self->{'contigs'}->{$_}->{'name'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
756 $class = $self->{'contigs'}->{$_}->{'class'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
757 (@points) = @{$self->{'o_trim'}->trim_singlet($sequence,$quality,$name,$class)};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
758 $self->{'contigs'}->{$_}->{'start_point'} = $points[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
759 $self->{'contigs'}->{$_}->{'end_point'} = $points[1];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
760 $self->{'contigs'}->{$_}->{'sequence_trimmed'} = substr($self->{contigs}->{$_}->{'consensus'},$points[0],$points[1]-$points[0]);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
761 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
762 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
763 $self->debug("Bio::Tools::Alignment::Consed::set_trim_points_singlets_and_singletons: Done setting the quality trimpoints.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
764 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
765 } # end set_trim_points_singlet
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
766
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
767 =head2 set_trim_points_doublets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
768
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
769 Title : set_trim_points_doublets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
770 Usage : $o_consed->set_trim_points_doublets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
771 Function: Set the trim points for doublets based on quality. Uses the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
772 Bio::Tools::Alignment::Trim object. Use at your own risk because
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
773 the Bio::Tools::Alignment::Trim object was designed specifically
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
774 for me and is mysterious in its ways. Every time somebody other
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
775 then me uses it you risk a biblical plague being loosed on your
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
776 city.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
777 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
778 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
779 Notes : Working on exceptions here.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
780
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
781 See L<Bio::Tools::Alignment::Trim> for more information
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
782
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
783 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
784
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
785 sub set_trim_points_doublets {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
786 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
787 my @points;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
788 $self->debug("Bio::Tools::Alignment::Consed::set_trim_points_doublets: Restoring zeros for doublets.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
789 # &show_missing_sequence($self);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
790 $self->debug("Bio::Tools::Alignment::Consed::set_trim_points_doublets: Setting doublet trim points.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
791 foreach (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
792 if ($self->{'contigs'}->{$_}->{'class'} eq "doublet") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
793 # my ($self,$sequence,$quality,$name,$class) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
794 my @quals = split(' ',$self->{'contigs'}->{$_}->{'quality'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
795
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
796 (@points) = $self->{o_trim}->trim_doublet($self->{'contigs'}->{$_}->{'consensus'},$self->{'contigs'}->{$_}->{'quality'},$self->{'contigs'}->{$_}->{name},$self->{'contigs'}->{$_}->{'class'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
797 $self->{'contigs'}->{$_}->{'start_point'} = $points[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
798 $self->{'contigs'}->{$_}->{'end_point'} = $points[1];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
799 # now set this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
800 $self->{'contigs'}->{$_}->{'sequence_trimmed'} = substr($self->{contigs}->{$_}->{'consensus'},$points[0],$points[1]-$points[0]);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
801 # 010102 the deprecated way to do things:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
802 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
803 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
804 $self->debug("Bio::Tools::Alignment::Consed::set_trim_points_doublets: Done setting doublet trim points.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
805 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
806 } # end set_trim_points_doublets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
807
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
808 =head2 get_trimmed_sequence_by_name($name)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
809
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
810 Title : get_trimmed_sequence_by_name($name)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
811 Usage : $o_consed->get_trimmed_sequence_by_name($name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
812 Function: Returns the trimmed_sequence of a contig with {name} eq $name.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
813 Returns : A scalar- the trimmed sequence.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
814 Args : The {name} of a contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
815 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
816
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
817 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
818
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
819 sub get_trimmed_sequence_by_name {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
820 my ($self,$name) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
821 my $trimmed_sequence;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
822 my $contigname = &get_contig_number_by_name($self,$name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
823 my $class = $self->{'contigs'}->{$contigname}->{'class'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
824 # what is this business and who was smoking crack while writing this?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
825 # if ($class eq "singlet") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
826 # send the sequence, the quality, and the name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
827 # $trimmed_sequence = $self->{o_trim}->trim_singlet($self->{'contigs'}->{$contigname}->{consensus},$self->{'contigs'}->{$contigname}->{'quality'},$name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
828 # }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
829 return $self->{'contigs'}->{$contigname}->{'sequence_trimmed'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
830 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
831
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
832 =head2 set_dash_present_in_sequence_name("yes")
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
833
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
834 Title : set_dash_present_in_sequence_name("yes")
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
835 Usage : $o_consed->set_dash_present_in_sequence_name("yes");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
836 Function: Deprecated. Part of an uncompleted thought. ("Oooh! Shiny!")
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
837 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
838 Args : "yes" to set {dash_present_in_sequence_name} to 1
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
839 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
840
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
841 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
842
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
843 sub set_dash_present_in_sequence_name {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
844 my ($self,$dash_present) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
845 if ($dash_present eq "yes") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
846 $self->{'dash_present_in_sequence_name'} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
847 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
848 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
849 $self->{'dash_present_in_sequence_name'} = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
850 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
851 return $self->{'dash_present_in_sequence_name'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
852 } # end set_dash_present_in_sequence_name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
853
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
854 =head2 set_doublets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
855
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
856 Title : set_doublets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
857 Usage : $o_consed->set_doublets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
858 Function: Find pairs that have similar names and mark them as doublets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
859 and set the {name}.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
860 Returns : 0 or 1.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
861 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
862 Notes : A complicated subroutine that iterates over the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
863 Bio::Tools::Alignment::Consed looking for contigs of 2. If the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
864 forward and reverse designator are removed from each of the reads
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
865 in {'member_array'} and the remaining reads are the same, {name}
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
866 is set to that name and the contig's class is set as "doublet".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
867 If any of those cases fail the contig is marked as a "pair".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
868
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
869 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
870
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
871 #' make my emacs happy
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
872
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
873 sub set_doublets {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
874 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
875 # set the designators in the Bio::Tools::Alignment::Trim object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
876
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
877 $self->{'o_trim'}->set_designators($self->{'reverse_designator'},
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
878 $self->{'forward_designator'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
879 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
880 foreach my $key_contig (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
881
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
882 # if there is a member array (why would there not be? This should be a die()able offence
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
883 # but for now I will leave it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
884 if ($self->{'contigs'}->{$key_contig}->{'member_array'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
885 # if there are two reads in this contig
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
886 # i am pretty sure that this is wrong but i am keeping it for reference
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
887 # if (@{$self->{'contigs'}->{$key_contig}->{'member_array'}} == 2 || !$self->{'contigs'}->{$key_contig}->{'class'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
888 # <seconds later>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
889 # <nod> WRONG. Was I on crack?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
890 if (@{$self->{'contigs'}->{$key_contig}->{'member_array'}} == 2) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
891 $self->{'contigs'}->{$key_contig}->{'num_members'} = 2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
892 $self->debug("\tThere are 2 members! Looking for the contig name...\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
893 my $name = _get_contig_name($self,$self->{'contigs'}->{$key_contig}->{'member_array'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
894 $self->debug("The name is $name\n") if defined $name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
895 if ($name) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
896 $self->{'contigs'}->{$key_contig}->{'name'} = $name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
897 $self->{'contigs'}->{$key_contig}->{'class'} = "doublet";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
898 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
899 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
900 $self->debug("$key_contig is a pair.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
901 $self->{'contigs'}->{$key_contig}->{'class'} = "pair";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
902 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
903 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
904 # this is all fair and good but what about singlets?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
905 # they have one reads in the member_array but certainly are not singletons
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
906 elsif (@{$self->{'contigs'}->{$key_contig}->{'member_array'}} == 1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
907 # set the name to be the name of the read
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
908 $self->{'contigs'}->{$key_contig}->{name} = @{$self->{'contigs'}->{$key_contig}->{'member_array'}}[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
909 # set the number of members to be one
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
910 $self->{'contigs'}->{$key_contig}->{num_members} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
911 # if this was a singlet, it would already belong to the class "singlet"
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
912 # so leave it alone
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
913 # if it is not a singlet, it is a singleton! lablel it appropriately
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
914 unless ($self->{'contigs'}->{$key_contig}->{'class'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
915 $self->{'contigs'}->{$key_contig}->{'class'} = "singleton";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
916 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
917 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
918 # set the multiplet characteristics
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
919 elsif (@{$self->{'contigs'}->{$key_contig}->{'member_array'}} >= 3) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
920 $self->{'contigs'}->{$key_contig}->{'num_members'} = @{$self->{'contigs'}->{$key_contig}->{'member_array'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
921 $self->{'contigs'}->{$key_contig}->{'class'} = "multiplet";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
922 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
923 $self->{'contigs'}->{$key_contig}->{'num_members'} = @{$self->{'contigs'}->{$key_contig}->{'member_array'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
924
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
925 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
926 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
927 $self->{'doublets_set'} = "done";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
928 return 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
929 } # end set_doublets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
930
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
931 =head2 set_singlets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
932
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
933 Title : set_singlets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
934 Usage : $o_consed->set_singlets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
935 Function: Read in a singlets file and place them into the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
936 Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
937 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
938 Args : A scalar to turn on verbose parsing of the singlets file.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
939 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
940
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
941 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
942
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
943 sub set_singlets {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
944 # parse out the contents of the singlets file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
945 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
946 $self->debug("Bio::Tools::Alignment::Consed Adding singlets to the contig hash...\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
947 my $full_filename = $self->{'filename'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
948 $self->debug("Bio::Tools::Alignment::Consed::set_singlets: \$full_filename is $full_filename\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
949 $full_filename =~ m/(.*\/)(.*ace.*)$/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
950 my ($base_path,$filename) = ($1,$2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
951 $self->debug("Bio::Tools::Alignment::Consed::set_singlets: singlets filename is $filename and \$base_path is $base_path\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
952 $filename =~ m/(.*)ace.*$/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
953 my $singletsfile = $base_path.$1."singlets";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
954 $self->debug("\$singletsfile is $singletsfile\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
955 if (-f $singletsfile) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
956 $self->debug("$singletsfile is indeed a file. Trying to open it...\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
957 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
958 my $singlets_fh = Bio::Root::IO->new(-file => $singletsfile);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
959 my ($sequence,$name,$count);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
960 while ($_ = $singlets_fh->_readline()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
961 chomp $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
962 if (/\>/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
963 if ($name && $sequence) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
964 $self->debug("Adding $name with sequence $sequence to hash...\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
965 push @{$self->{'contigs'}->{$name}->{'member_array'}},$name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
966 $self->{'contigs'}->{$name}->{'consensus'} = $sequence;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
967 $self->{'contigs'}->{$name}->{'name'} = $name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
968 $self->{'contigs'}->{$name}->{"singlet"} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
969 $self->{'contigs'}->{$name}->{'class'} = "singlet";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
970 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
971 $sequence = $name = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
972 $count++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
973 m/^\>(.*)\s\sCHROMAT/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
974 $name = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
975 if (!$name) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
976 m/\>(\S+)\s/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
977 $name = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
978 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
979 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
980 else { $sequence .= $_; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
981 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
982 if ($name && $sequence) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
983 $self->debug("Pushing the last of the singlets ($name)\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
984 @{$self->{'contigs'}->{$name}->{'member_array'}} = $name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
985 $self->{'contigs'}->{$name}->{'consensus'} = $sequence;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
986 $self->{'contigs'}->{$name}->{'name'} = $name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
987 $self->{'contigs'}->{$name}->{"singlet"} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
988 $self->{'contigs'}->{$name}->{'class'} = "singlet";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
989 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
990 $self->debug("Bio::Tools::Alignment::Consed::set_singlets: Done adding singlets to the singlets hash.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
991 $self->{'singlets_set'} = "done";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
992 return 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
993 } # end sub set_singlets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
994
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
995 =head2 get_singlets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
996
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
997 Title : get_singlets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
998 Usage : $o_consed->get_singlets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
999 Function: Return the keynames of the singlets.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1000 Returns : An array containing the keynames of all
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1001 Bio::Tools::Alignment::Consed sequences in the class "singlet".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1002 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1003 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1004
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1005 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1006
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1007 sub get_singlets {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1008 # returns an array of singlet names
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1009 # singlets have "singlet"=1 in the hash
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1010 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1011 if (!$self->{singlets_set}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1012 $self->debug("You need to set the singlets before you get them. Doing that now.");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1013 $self->set_singlets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1014 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1015
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1016 my (@singlets,@array);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1017 foreach my $key (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1018 # @array = @{$Consed::contigs{$key}->{'member_array'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1019 # somethimes a user will try to get a list of singlets before the classes for the rest of the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1020 # contigs has been set (see t/test.t for how I figured this out. <bah>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1021 # so either way, just return class=singlets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1022 if (!$self->{'contigs'}->{$key}->{'class'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1023 # print("$key has no class. why?\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1024 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1025 elsif ($self->{'contigs'}->{$key}->{'class'} eq "singlet") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1026 push @singlets,$key;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1027 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1028 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1029 return @singlets;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1030 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1031
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1032 =head2 set_quality_by_name($name,$quality)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1033
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1034 Title : set_quality_by_name($name,$quality)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1035 Usage : $o_consed->set_quality_by_name($name,$quality);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1036 Function: Deprecated. Make the contig with {name} have {'quality'} $quality.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1037 Probably used for testing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1038 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1039 Args : The name of a contig and a scalar for its quality.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1040 Notes : Deprecated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1041
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1042 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1043
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1044 sub set_quality_by_name {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1045 # this is likely deprecated
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1046 my ($self,$name,$quality) = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1047 my $return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1048 foreach (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1049 if ($self->{'contigs'} eq "$name" || $self->{'contigs'}->{'name'} eq "$name") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1050 $self->{'contigs'}->{'quality'} = $quality;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1051 $return=1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1052 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1053 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1054 if ($return) { return "0"; } else { return "1"; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1055 } # end set quality by name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1056
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1057 =head2 set_singlet_quality()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1058
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1059 Title : set_singlet_quality()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1060 Usage : $o_consed->set_singlet_quality();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1061 Function: For each singlet, go to the appropriate file in phd_dir and read
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1062 in the phred quality for that read and place it into {'quality'}
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1063 Returns : 0 or 1.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1064 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1065 Notes : This is the next subroutine that will receive substantial revision
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1066 in the next little while. It really should eval the creation of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1067 Bio::Tools::Alignment::Phred objects and go from there.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1068
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1069 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1070
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1071 sub set_singlet_quality {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1072 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1073 my $full_filename = $self->{'filename'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1074 $full_filename =~ m/(.*\/)(.*)ace.*$/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1075 my ($base_path,$filename) = ($1,"$2"."qual");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1076 my $singletsfile = $base_path.$filename;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1077 if (-f $singletsfile) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1078 # print("$singletsfile is indeed a file. Trying to open it...\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1079 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1080 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1081 $self->warn("$singletsfile is not a file. Sorry.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1082 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1083 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1084 my $singlets_fh = Bio::Root::IO->new(-file => $singletsfile);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1085 my ($sequence,$name,$count);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1086 my ($identity,$line,$quality,@qline);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1087 while ($line = $singlets_fh->_readline()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1088 chomp $line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1089 if ($line =~ /^\>/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1090 $quality = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1091 $line =~ m/\>(\S*)\s/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1092 $identity = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1093 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1094 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1095 if ($self->{'contigs'}->{$identity}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1096 $self->{'contigs'}->{$identity}->{'quality'} .= "$line ";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1097 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1098 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1099
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1100 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1101 return 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1102 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1103
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1104 =head2 set_contig_quality()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1105
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1106 Title : set_contig_quality()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1107 Usage : $o_consed->set_contig_quality();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1108 Function: Deprecated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1109 Returns : Deprecated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1110 Args : Deprecated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1111 Notes : Deprecated. Really. Trust me.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1112
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1113 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1114
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1115 sub set_contig_quality {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1116 # note: contigs _include_ singletons but _not_ singlets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1117 my ($self) = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1118 # the unexpected results I am referring to here are a doubling of quality values.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1119 # the profanity I uttered on discovering this reminded me of the simpsons:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1120 # Ned Flanders: "That is the loudest profanity I have ever heard!"
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1121 $self->warn("set_contig_quality is deprecated and will likely produce unexpected results");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1122 my $full_filename = $self->{'filename'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1123 # Run_SRC3700_2000-08-01_73+74.fasta.screen.contigs.qual
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1124 # from Consed.pm
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1125 $full_filename =~ m/(.*\/)(.*)ace.*$/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1126 my ($base_path,$filename) = ($1,"$2"."contigs.qual");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1127 my $singletsfile = $base_path.$filename;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1128 if (-f $singletsfile) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1129 # print("$singletsfile is indeed a file. Trying to open it...\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1130 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1131 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1132 $self->warn("Bio::Tools::Alignment::Consed::set_contig_quality $singletsfile is not a file. Sorry.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1133 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1134 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1135 my $contig_quality_fh = Bio::Root::IO->new(-file => $singletsfile);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1136
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1137 my ($sequence,$name,$count,$identity,$line,$quality);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1138 while ($line = $contig_quality_fh->_readline()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1139 chomp $line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1140 if ($line =~ /^\>/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1141 $quality = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1142 $line =~ m/\>.*Contig(\d+)\s/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1143 $identity = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1144 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1145 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1146 if ($self->{'contigs'}->{$identity} ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1147 $self->{'contigs'}->{$identity}->{'quality'} .= " $line";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1148 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1149 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1150 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1151 } # end set_contig_quality
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1152
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1153 =head2 get_multiplets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1154
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1155 Title : get_multiplets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1156 Usage : $o_consed->get_multiplets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1157 Function: Return the keynames of the multiplets.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1158 Returns : Returns an array containing the keynames of all
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1159 Bio::Tools::Alignment::Consed sequences in the class "multiplet".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1160 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1161 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1162
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1163 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1164
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1165 sub get_multiplets {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1166 # returns an array of multiplet names
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1167 # multiplets have # members > 2
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1168 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1169 my (@multiplets,@array);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1170 foreach my $key (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1171 if ($self->{'contigs'}->{$key}->{'class'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1172 if ($self->{'contigs'}->{$key}->{'class'} eq "multiplet") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1173 push @multiplets,$key;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1174 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1175 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1176 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1177 return @multiplets;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1178 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1179
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1180 =head2 get_all_members()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1181
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1182 Title : get_all_members()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1183 Usage : @all_members = $o_consed->get_all_members();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1184 Function: Return a list of all of the read names in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1185 Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1186 Returns : An array containing all of the elements in all of the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1187 {'member_array'}s.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1188 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1189 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1190
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1191 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1192
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1193 sub get_all_members {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1194 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1195 my @members;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1196 foreach my $key (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1197 if ($key =~ /^singlet/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1198 push @members,$self->{'contigs'}->{$key}->{'member_array'}[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1199 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1200 elsif ($self->{'contigs'}->{$key}->{'member_array'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1201 push @members,@{$self->{'contigs'}->{$key}->{'member_array'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1202 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1203 # else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1204 # print("Bio::Tools::Alignment::Consed: $key is _not_ an array. Pushing $self->{'contigs'}->{$key}->{'member_array'} onto \@members\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1205 # push @members,$self->{'contigs'}->{$key}->{'member_array'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1206 # }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1207 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1208 return @members;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1209 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1210
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1211 =head2 sum_lets($total_only)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1212
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1213 Title : sum_lets($total_only)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1214 Usage : $statistics = $o_consed->sum_lets($total_only);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1215 Function: Provide numbers for how many sequences were accounted for in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1216 Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1217 Returns : If a scalar is present, returns the total number of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1218 sequences accounted for in all classes. If no scalar passed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1219 then returns a string that looks like this:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1220 Singt/singn/doub/pair/mult/total : 2,0,1(2),0(0),0(0),4
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1221 This example means the following: There were 1 singlets.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1222 There were 0 singletons. There were 1 doublets for a total
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1223 of 2 sequences in this class. There were 0 pairs for a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1224 total of 0 sequences in this class. There were 0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1225 multiplets for a total of 0 sequences in this class. There
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1226 were a total of 4 sequences accounted for in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1227 Bio::Tools::Alignment::Consed object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1228 Args : A scalar is optional to change the way the numbers are returned.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1229 Notes:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1230
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1231 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1232
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1233 sub sum_lets {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1234 my ($self,$total_only) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1235 my ($count,$count_multiplets,$multiplet_count);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1236 my $singlets = &get_singlets($self); $count += $singlets;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1237 my $doublets = &get_doublets($self); $count += ($doublets * 2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1238 my $pairs = &get_pairs($self); $count += ($pairs * 2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1239 my $singletons = &get_singletons($self); $count += $singletons;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1240 my @multiplets = &get_multiplets($self);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1241 $count_multiplets = @multiplets;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1242 my $return_string;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1243 foreach (@multiplets) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1244 my $number_members = $self->{'contigs'}->{$_}->{num_members};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1245 $multiplet_count += $number_members;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1246 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1247 if ($multiplet_count) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1248 $count += $multiplet_count;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1249 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1250 foreach (qw(multiplet_count singlets doublets pairs singletons multiplets count_multiplets)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1251 no strict 'refs'; # renege for the block
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1252 if (!${$_}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1253 ${$_} = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1254 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1255 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1256 if (!$multiplet_count) { $multiplet_count = 0; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1257 if ($total_only) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1258 return $count;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1259 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1260 $return_string = "Singt/singn/doub/pair/mult/total : $singlets,$singletons,$doublets(".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1261 ($doublets*2)."),$pairs(".($pairs*2)."),$count_multiplets($multiplet_count),$count";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1262 return $return_string;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1263 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1264
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1265 =head2 write_stats()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1266
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1267 Title : write_stats()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1268 Usage : $o_consed->write_stats();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1269 Function: Write a file called "statistics" containing numbers similar to
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1270 those provided in sum_lets().
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1271 Returns : Nothing. Write a file in $o_consed->{path} containing something
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1272 like this:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1273
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1274 0,0,50(100),0(0),0(0),100
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1275
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1276 Where the numbers provided are in the format described in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1277 documentation for sum_lets().
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1278 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1279 Notes : This might break platform independence, I do not know.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1280
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1281 See L<sum_lets()|sum_lets>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1282
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1283 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1284
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1285 sub write_stats {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1286 # worry about platform dependence here?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1287 # oh shucksdarn.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1288 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1289 my $stats_filename = $self->{'path'}."statistics";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1290 my $statistics_raw = $self->sum_lets;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1291 (my $statsfilecontents = $statistics_raw) =~ s/.*\ \:\ //g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1292 umask 0001;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1293 my $fh = new Bio::Root::IO(-file=>"$stats_filename");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1294 # open(STATSFILE,">$stats_filename") or print("Could not open the statsfile: $!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1295 $fh->_print("$statsfilecontents");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1296 # close STATSFILE;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1297 $fh->close();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1298 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1299
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1300 =head2 get_singletons()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1301
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1302 Title : get_singletons()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1303 Usage : @singletons = $o_consed->get_singletons();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1304 Function: Return the keynames of the singletons.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1305 Returns : Returns an array containing the keynames of all
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1306 Bio::Tools::Alignment::Consed sequences in the class "singleton".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1307 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1308 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1309
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1310 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1311
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1312 sub get_singletons {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1313 # returns an array of singleton names
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1314 # singletons are contigs with one member (see consed documentation)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1315 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1316 my (@singletons,@array);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1317 foreach my $key (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1318 if ($self->{'contigs'}->{$key}->{'class'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1319 # print ("$key class: $self->{'contigs'}->{$key}->{'class'}\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1320 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1321 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1322 # print("$key belongs to no class. why?\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1323 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1324 if ($self->{'contigs'}->{$key}->{'member_array'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1325 @array = @{$self->{'contigs'}->{$key}->{'member_array'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1326 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1327 my $num_array_elem = @array;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1328 if ($num_array_elem == 1 && $self->{'contigs'}->{$key}->{'class'} && $self->{'contigs'}->{$key}->{'class'} eq "singleton") { push @singletons,$key; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1329 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1330 return @singletons;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1331 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1332
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1333 =head2 get_pairs()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1334
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1335 Title : get_pairs()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1336 Usage : @pairs = $o_consed->get_pairs();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1337 Function: Return the keynames of the pairs.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1338 Returns : Returns an array containing the keynames of all
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1339 Bio::Tools::Alignment::Consed sequences in the class "pair".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1340 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1341 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1342
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1343 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1344
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1345 sub get_pairs {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1346 # returns an array of pair contig names
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1347 # a pair is a contig of two where the names do not match
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1348 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1349 my (@pairs,@array);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1350 foreach my $key (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1351 if ($self->{'contigs'}->{$key}->{'member_array'}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1352 if (@{$self->{'contigs'}->{$key}->{'member_array'}} == 2 &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1353 $self->{'contigs'}->{$key}->{'class'} eq "pair") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1354 push @pairs,$key;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1355 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1356 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1357 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1358 return @pairs;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1359 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1360
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1361 =head2 get_name($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1362
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1363 Title : get_name($contig_keyname)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1364 Usage : $name = $o_consed->get_name($contig_keyname);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1365 Function: Return the {name} for $contig_keyname.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1366 Returns : A string. ({name})
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1367 Args : A contig keyname.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1368 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1369
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1370 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1371
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1372 sub get_name {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1373 my ($self,$contig) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1374 return $self->{'contigs'}->{$contig}->{'name'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1375 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1376
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1377 =head2 _get_contig_name(\@array_containing_reads)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1378
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1379 Title : _get_contig_name(\@array_containing_reads)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1380 Usage : $o_consed->_get_contig_name(\@array_containing_reads);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1381 Function: The logic for the set_doublets subroutine.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1382 Returns : The name for this contig.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1383 Args : A reference to an array containing read names.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1384 Notes : Depends on reverse_designator. Be sure this is set the way you
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1385 intend.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1386
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1387 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1388
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1389 sub _get_contig_name {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1390 my ($self,$r_array) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1391 my @contig_members = @$r_array;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1392 my @name_nodir;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1393 foreach (@contig_members) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1394 # how can I distinguish the clone name from the direction label?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1395 # look for $Consed::reverse_designator and $Consed::forward_designator
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1396 # what if you do not find _any_ of those?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1397 my $forward_designator = $self->{'forward_designator'} || "f";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1398 my $reverse_designator = $self->{'reverse_designator'} || "r";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1399 my $any_hits = /(.+)($forward_designator.*)/ || /(.+)($reverse_designator.*)/||/(.+)(_.+)/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1400 my $name = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1401 my $suffix = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1402 if ($name) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1403 # print("\t\$name is $name ");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1404 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1405 if ($suffix) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1406 # print("and \$suffix is $suffix.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1407 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1408 # Jee, I hope we get a naming convention soon
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1409 if ($suffix) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1410 if ($suffix =~ /^$forward_designator/ || $suffix =~ /^$reverse_designator/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1411 push @name_nodir,$name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1412 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1413 # bugwatch here! should this be unnested?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1414 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1415 push @name_nodir,"$name$suffix";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1416 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1417 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1418 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1419 # print("\@name_nodir: @name_nodir\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1420 my $mismatch = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1421 for (my $counter=0; $counter<@name_nodir;$counter++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1422 next if ($name_nodir[0] eq $name_nodir[$counter]);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1423 $mismatch = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1424 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1425 if ($mismatch == 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1426 # print("\tYou have a cohesive contig named $name_nodir[0].\n\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1427 return $name_nodir[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1428 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1429 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1430 # print("\tYou have mixed names in this contig.\n\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1431 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1432 } # end _get_contig_name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1433
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1434 =head2 get_doublets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1435
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1436 Title : get_doublets()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1437 Usage : @doublets = $o_consed->get_doublets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1438 Function: Return the keynames of the doublets.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1439 Returns : Returns an array containing the keynames of all
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1440 Bio::Tools::Alignment::Consed sequences in the class "doublet".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1441 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1442 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1443
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1444 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1445
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1446 sub get_doublets {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1447 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1448 if (!$self->{doublets_set}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1449 $self->warn("You need to set the doublets before you can get them. Doing that now.");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1450 $self->set_doublets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1451 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1452 my @doublets;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1453 foreach (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1454 if ($self->{'contigs'}->{$_}->{name} && $self->{'contigs'}->{$_}->{'class'} eq "doublet") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1455 push @doublets,$_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1456 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1457 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1458 return @doublets;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1459 } # end get_doublets
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1460
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1461 =head2 dump_hash()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1462
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1463 Title : dump_hash()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1464 Usage : $o_consed->dump_hash();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1465 Function: Use dumpvar.pl to dump out the Bio::Tools::Alignment::Consed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1466 object to STDOUT.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1467 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1468 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1469 Notes : I used this a lot in debugging.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1470
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1471 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1472
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1473 sub dump_hash {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1474 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1475 my $dumper = new Dumpvalue;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1476 $self->debug( "Bio::Tools::Alignment::Consed::dump_hash - ".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1477 "The following is the contents of the contig hash...\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1478 $dumper->dumpValue($self->{'contigs'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1479 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1480
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1481 =head2 dump_hash_compact()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1482
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1483 Title : dump_hash_compact()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1484 Usage : $o_consed->dump_hash_compact();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1485 Function: Dump out the Bio::Tools::Alignment::Consed object in a compact way.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1486 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1487 Args : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1488 Notes : Cleaner then dumpValue(), dumpHash(). I used this a lot in
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1489 debugging.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1490
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1491 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1492
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1493 sub dump_hash_compact {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1494 no strict 'refs'; # renege for the block
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1495 my ($self,$sequence) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1496 # get the classes
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1497 my @singlets = $self->get_singlets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1498 my @singletons = $self->get_singletons();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1499 my @doublets = $self->get_doublets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1500 my @pairs = $self->get_pairs();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1501 my @multiplets = $self->get_multiplets();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1502 print("Name\tClass\tMembers\tQuality?\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1503 foreach (@singlets) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1504 my @members = $self->get_members($_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1505 print($self->get_name($_)."\tsinglets\t".(join',',@members)."\t");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1506 if ($self->{'contigs'}->{$_}->{'quality'}) { print("qualities found here\n"); }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1507 else { print("no qualities found here\n"); }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1508
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1509 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1510 foreach (@singletons) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1511 my @members = $self->get_members($_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1512 print($self->get_name($_)."\tsingletons\t".(join',',@members)."\t");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1513 if ($self->{'contigs'}->{$_}->{'quality'}) { print("qualities found here\n"); }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1514 else { print("no qualities found here\n"); }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1515 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1516 foreach my $pair (@pairs) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1517 my @members = $self->get_members($pair);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1518 my $name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1519 if (!$self->get_name($pair)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1520 $name = "BLANK";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1521 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1522 else { $name = $self->get_name($pair); }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1523 print("$name\tpairs\t".(join',',@members)."\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1524 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1525 foreach (@doublets) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1526 my @members = $self->get_members_by_name($_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1527 print("$_\tdoublets\t".(join',',@members)."\t");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1528 my $contig_number = &get_contig_number_by_name($self,$_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1529 if ($self->{'contigs'}->{$contig_number}->{'quality'}) { print("qualities found here\n"); }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1530 else { print("no qualities found here\n"); }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1531 # print($_."\tdoublets\t".(join',',@members)."\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1532 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1533 foreach (@multiplets) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1534 my @members = $self->get_members($_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1535 print("Contig $_"."\tmultiplets\t".(join',',@members)."\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1536 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1537 } # end dump_hash_compact
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1538
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1539 =head2 get_phreds()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1540
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1541 Title : get_phreds()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1542 Usage : @phreds = $o_consed->get_phreds();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1543 Function: For each doublet in the Bio::Tools::Alignment::Consed hash, go
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1544 and get the phreds for the top and bottom reads. Place them into
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1545 {top_phreds} and {bottom_phreds}.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1546 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1547 Args : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1548 Notes : Requires parse_phd() and reverse_and_complement(). I realize that
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1549 it would be much more elegant to pull qualities as required but there
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1550 were certain "features" in the acefile that required a bit more
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1551 detailed work be done to get the qualities for certain parts of the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1552 consensus sequence. In order to make _sure_ that this was done
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1553 properly I wrote things to do all steps and then I used dump_hash()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1554 and checked each one to ensure expected bahavior. I have never changed
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1555 this, so there you are.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1556
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1557 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1558
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1559 sub get_phreds {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1560 # this subroutine is the target of a rewrite to use the Bio::Tools::Alignment::Phred object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1561 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1562 my $current_contig;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1563 foreach $current_contig (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1564 if ($self->{'contigs'}->{$current_contig}->{'class'} eq "doublet") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1565 $self->debug("$current_contig is a doublet. Going to parse_phd for top($self->{'contigs'}->{$current_contig}->{'top_name'}) and bottom($self->{'contigs'}->{$current_contig}->{'bottom_name'})\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1566 my $r_phreds_top = &parse_phd($self,$self->{'contigs'}->{$current_contig}->{'top_name'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1567 my $r_phreds_bottom = &parse_phd($self,$self->{'contigs'}->{$current_contig}->{'bottom_name'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1568 if ($self->{'contigs'}->{$current_contig}->{'top_complement'} eq "C") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1569 # print("Reversing and complementing...\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1570 $r_phreds_top = &reverse_and_complement($r_phreds_top);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1571 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1572 if ($self->{'contigs'}->{$current_contig}->{'bottom_complement'} eq "C") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1573 $r_phreds_bottom = &reverse_and_complement($r_phreds_bottom);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1574 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1575 $self->{'contigs'}->{$current_contig}->{'top_phreds'} = $r_phreds_top;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1576 $self->{'contigs'}->{$current_contig}->{'bottom_phreds'} = $r_phreds_bottom;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1577 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1578 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1579 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1580
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1581 =head2 parse_phd($read_name)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1582
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1583 Title : parse_phd($read_name)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1584 Usage : $o_consed->parse_phd($read_name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1585 Function: Suck in the contents of a .phd file.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1586 Returns : A reference to an array containing the quality values for the read.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1587 Args : The name of a read.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1588 Notes : This is a significantly weak subroutine because it was always
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1589 intended that these functions, along with the functions provided by
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1590 get_phreds() be put into the Bio::SeqIO:phd module. This is done
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1591 now but the Bio::Tools::Alignment::Consed module has not be
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1592 rewritten to reflect this change.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1593
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1594 See L<Bio::SeqIO::phd> for more information.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1595
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1596 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1597
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1598 sub parse_phd {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1599 my ($self,$sequence_name) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1600 $self->debug("Parsing phd for $sequence_name\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1601 my $in_dna = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1602 my $base_number = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1603 my (@bases,@current_line);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1604 # print("parse_phd: $sequence_name\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1605 my $fh = new Bio::Root::IO(-file=>"$self->{path}/../phd_dir/$sequence_name.phd.1");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1606 # open(PHD,"<$self->{path}/../phd_dir/$sequence_name.phd.1") or
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1607 # die "Couldn't open the phred for $sequence_name\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1608 while ($fh->_readline()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1609 # print("Reading a line from a phredfile!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1610 chomp;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1611 if (/^BEGIN_DNA/) { $in_dna = 1; next}
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1612 if (/^END_DNA/) { last; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1613 if (!$in_dna) { next; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1614 push(@bases,$_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1615 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1616 return \@bases;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1617 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1618
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1619 =head2 reverse_and_complement(\@source)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1620
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1621 Title : reverse_and_complement(\@source)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1622 Usage : $reference_to_array = $o_consed->reverse_and_complement(\@source);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1623 Function: A stub for the recursive routine reverse_recurse().
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1624 Returns : A reference to a reversed and complemented array of phred data.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1625 Args : A reference to an array of phred data.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1626 Notes :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1627
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1628 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1629
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1630 sub reverse_and_complement {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1631 my $r_source = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1632 my $r_destination;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1633 $r_destination = &reverse_recurse($r_source,$r_destination);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1634 return $r_destination;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1635 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1636
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1637 =head2 reverse_recurse($r_source,$r_destination)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1638
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1639 Title : reverse_recurse(\@source,\@destination)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1640 Usage : $o_consed->reverse_recurse(\@source,\@destination);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1641 Function: A recursive routine to reverse and complement an array of phred
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1642 data.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1643 Returns : A reference to an array containing reversed phred data.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1644 Args : A reference to a source array and a reverence to a destination
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1645 array.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1646 Notes : Recursion is kewl, but this sub should likely be _reverse_recurse.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1647
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1648 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1649
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1650 my $current_base;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1651 sub reverse_recurse($$) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1652 (my $r_source,my $r_destination) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1653 if (!@$r_source) { return $r_destination; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1654 $_=pop(@$r_source);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1655 (s/c/g/||s/g/c/||s/a/t/||s/t/a/);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1656 push(@$r_destination,$_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1657 &reverse_recurse($r_source,$r_destination);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1658 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1659
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1660 =head2 show_missing_sequence()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1661
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1662 Title : show_missing_sequence();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1663 Usage : $o_consed->show_missing_sequence();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1664 Function: Used by set_trim_points_doublets() to fill in quality values where
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1665 consed (phrap?) set them to 0 at the beginning and/or end of the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1666 consensus sequences.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1667 Returns : Nothing.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1668 Args : None.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1669 Notes : Acts on doublets only. Really very somewhat quite ugly. A
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1670 disgusting kludge. <insert pride here> It was written stepwise with
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1671 no real plan because it was not really evident why consed (phrap?)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1672 was doing this.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1673
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1674 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1675
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1676 sub show_missing_sequence() {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1677 # decide which sequence should not have been clipped at consensus position = 0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1678 my $self = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1679 &get_phreds($self);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1680 my ($current_contig,@qualities);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1681 foreach $current_contig (sort keys %{$self->{'contigs'}}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1682 if ($self->{'contigs'}->{$current_contig}->{'class'} eq "doublet") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1683 my $number_leading_xs = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1684 my $number_trailing_xs = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1685 my $measurer = $self->{'contigs'}->{$current_contig}->{'quality'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1686 while ($measurer =~ s/^\ 0\ /\ /) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1687 $number_leading_xs++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1688 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1689 while ($measurer =~ s/\ 0(\s*)$/$1/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1690 $number_trailing_xs++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1691 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1692 @qualities = split(' ',$self->{'contigs'}->{$current_contig}->{'quality'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1693 my $in_initial_zeros = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1694 for (my $count=0;$count<scalar(@qualities); $count++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1695 if ($qualities[$count] == 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1696 my ($quality,$top_phred_position,$bottom_phred_position,$top_phred_data,$bottom_phred_data);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1697 # print("The quality of the consensus at ".($count+1)." is zero. Retrieving the real quality value.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1698 # how do I know which strand to get these quality values from????
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1699 # boggle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1700 my $top_quality_here = $self->{'contigs'}->{$current_contig}->{'top_phreds'}->[0-$self->{'contigs'}->{$current_contig}->{'top_start'}+$count+1];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1701 my $bottom_quality_here = $self->{'contigs'}->{$current_contig}->{'bottom_phreds'}->[1-$self->{'contigs'}->{$current_contig}->{'bottom_start'}+$count];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1702 if (!$bottom_quality_here || (1-$self->{'contigs'}->{$current_contig}->{'bottom_start'}+$count)<0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1703 $bottom_quality_here = "not found";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1704 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1705 if (!$top_quality_here) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1706 $top_quality_here = "not found";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1707 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1708 # print("Looking for quals at position $count of $current_contig: top position ".(0-$self->{'contigs'}->{$current_contig}->{top_start}+$count)." ($self->{'contigs'}->{$current_contig}->{top_name}) $top_quality_here , bottom position ".(1-$self->{'contigs'}->{$current_contig}->{bottom_start}+$count)." ($self->{'contigs'}->{$current_contig}->{bottom_name}) $bottom_quality_here\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1709 if ($count<$number_leading_xs) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1710 # print("$count is less then $number_leading_xs so I will get the quality from the top strand\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1711 # print("retrieved quality is ".$self->{'contigs'}->{$current_contig}->{top_phreds}[0-$self->{'contigs'}->{$current_contig}->{top_start}+$count+1]."\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1712 my $quality = $top_quality_here;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1713 $quality =~ /\S+\s(\d+)\s+/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1714 $quality = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1715 # print("retrieved quality for leading zero $count is $quality\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1716 # t 9 9226
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1717 $qualities[$count] = $quality;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1718 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1719 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1720 # this part is tricky
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1721 # if the contig is like this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1722 # cccccccccccccccc
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1723 # ffffffffffffffffff
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1724 # rrrrrrrrrrrrrrrrr
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1725 # then take the quality value for the trailing zeros in the cons. seq from the r
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1726 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1727 # but if the contig is like this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1728 # cccccccccccccccccc
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1729 # ffffffffffffffffffffffffffffffff
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1730 # rrrrrrrrrrrrrrrrrrrrrrrxxxxxxxxr
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1731 # ^^^
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1732 # then any zeros that fall in the positions (^) must be decided whether the quality
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1733 # is the qual from the f or r strand. I will use the greater number
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1734 # does a similar situation exist for the leading zeros? i dunno
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1735 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1736 # print("$count is greater then $number_leading_xs so I will get the quality from the bottom strand\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1737 # print("retrieved quality is ".$contigs->{$current_contig}->{top_phreds}[0-$contigs->{$current_contig}->{top_start}+$count+1]."\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1738 # my ($quality,$top_phred_position,$bottom_phred_position,$top_phred_data,$bottom_phred_data);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1739 if ($bottom_quality_here eq "not found") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1740 # $top_phred_position = 1-$contigs->{$current_contig}->{bottom_start}+$count;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1741 # print("Going to get quality from here: $top_phred_position of the top.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1742 # my $temp_quality - $contigs->{$current_contig}->{top_phreds}
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1743 # $quality = $contigs->{$current_contig}->{top_phreds}[$top_phred_position];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1744 $top_quality_here =~ /\w+\s(\d+)\s/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1745 $quality = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1746 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1747 elsif ($top_quality_here eq "not found") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1748 # $bottom_phred_position = 1+$contigs->{$current_contig}->{bottom_start}+$count;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1749 # print("Going to get quality from here: $bottom_phred_position of the bottom.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1750 # $quality = $contigs->{$current_contig}->{bottom_phreds}[$bottom_phred_position];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1751 # print("Additional: no top quality but bottom is $quality\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1752 $bottom_quality_here =~ /\w+\s(\d+)\s/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1753 $quality = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1754 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1755 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1756 # print("Oh jeepers, there are 2 qualities to choose from at this position.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1757 # print("Going to compare these phred qualities: top: #$top_quality_here# bottom: #$bottom_quality_here#\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1758 # now you have to compare them
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1759 # my $top_quality_phred = $contigs->{$current_contig}->{top_phreds}[$top_phred_position];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1760 # #t 40 875#
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1761 # print("regexing #$top_quality_here#... ");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1762 $top_quality_here =~ /\w\ (\d+)\s/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1763 my $top_quality = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1764 # print("$top_quality\nregexing #$bottom_quality_here#... ");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1765 $bottom_quality_here =~ /\w\ (\d+)\s/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1766 my $bottom_quality = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1767 # print("$bottom_quality\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1768 # print("top_quality: $top_quality bottom quality: $bottom_quality\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1769 if ($bottom_quality > $top_quality) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1770 # print("Chose to take the bottom quality: $bottom_quality\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1771 $quality = $bottom_quality;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1772 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1773 # print("Chose to take the top quality: $top_quality\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1774 $quality = $top_quality;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1775 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1776 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1777 if (!$quality) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1778 # print("Warning: no quality value for $current_contig, position $count!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1779 # print("Additional data: top quality phred: $top_quality_here\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1780 # print("Additional data: bottom quality phred: $bottom_quality_here\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1781 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1782 $qualities[$count] = $quality;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1783 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1784 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1785 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1786
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1787 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1788 unless (!@qualities) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1789 $self->{'contigs'}->{$current_contig}->{'quality'} = join(" ",@qualities);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1790 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1791 $self->{'contigs'}->{$current_contig}->{'bottom_phreds'} = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1792 $self->{'contigs'}->{$current_contig}->{'top_phreds'} = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1793 my $count = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1794 } # end foreach key
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1795 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1796 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1797
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1798
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1799 # Autoload methods go after =cut, and are processed by the autosplit program.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1800
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1801 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1802 __END__
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1803
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1804 =head1 SEE ALSO
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1805
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1806 perl(1).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1807
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1808 =cut