annotate variant_effect_predictor/Bio/DB/Flat/OBDAIndex.pm @ 3:d30fa12e4cc5 default tip

Merge heads 2:a5976b2dce6f and 1:09613ce8151e which were created as a result of a recently fixed bug.
author devteam <devteam@galaxyproject.org>
date Mon, 13 Jan 2014 10:38:30 -0500
parents 1f6dce3d34e0
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 # $Id: OBDAIndex.pm,v 1.12.2.1 2003/06/28 20:47:16 jason Exp $
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 # BioPerl module for Bio::DB::Flat::OBDAIndex
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5 # Cared for by Michele Clamp <michele@sanger.ac.uk>>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 # You may distribute this module under the same terms as perl itself
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 # POD documentation - main docs before the code
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13 Bio::DB::Flat::OBDAIndex - Binary search indexing system for sequence files
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15 =head1 SYNOPSIS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17 This module can be used both to index sequence files and also to retrieve
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18 sequences from existing sequence files.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20 =head2 Index creation
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22 my $sequencefile; # Some fasta sequence file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24 Patterns have to be entered to define where the keys are to be
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25 indexed and also where the start of each record. E.g. for fasta
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27 my $start_pattern = "^>";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28 my $primary_pattern = "^>(\\S+)";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31 So the start of a record is a line starting with a E<gt> and the primary
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32 key is all characters up to the first space afterf the E<gt>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34 A string also has to be entered to defined what the primary key
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35 (primary_namespace) is called.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37 The index can now be created using
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39 my $index = new Bio::DB::Flat::OBDAIndex(
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40 -start_pattern => $start_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 -primary_pattern => $primary_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42 -primary_namespace => "ACC",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43 );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45 To actually write it out to disk we need to enter a directory where the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46 indices will live, a database name and an array of sequence files to index.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48 my @files = ("file1","file2","file3");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50 $index->make_index("/Users/michele/indices","mydatabase",@files);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52 The index is now ready to use. For large sequence files the perl
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53 way of indexing takes a *long* time and a *huge* amount of memory.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54 For indexing things like dbEST I recommend using the C indexer.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 =head2 Creating indices with secondary keys
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58 Sometimes just indexing files with one id per entry is not enough. For
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59 instance you may want to retrieve sequences from swissprot using
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60 their accessions as well as their ids.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62 To be able to do this when creating your index you need to pass in
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63 a hash of secondary_patterns which have their namespaces as the keys
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64 to the hash.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66 e.g. For Indexing something like
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68 ID 1433_CAEEL STANDARD; PRT; 248 AA.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69 AC P41932;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70 DT 01-NOV-1995 (Rel. 32, Created)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71 DT 01-NOV-1995 (Rel. 32, Last sequence update)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72 DT 15-DEC-1998 (Rel. 37, Last annotation update)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73 DE 14-3-3-LIKE PROTEIN 1.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74 GN FTT-1 OR M117.2.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75 OS Caenorhabditis elegans.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76 OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77 OC Rhabditidae; Peloderinae; Caenorhabditis.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78 OX NCBI_TaxID=6239;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79 RN [1]
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81 where we want to index the accession (P41932) as the primary key and the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82 id (1433_CAEEL) as the secondary id. The index is created as follows
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84 my %secondary_patterns;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86 my $start_pattern = "^ID (\\S+)";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87 my $primary_pattern = "^AC (\\S+)\;";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89 $secondary_patterns{"ID"} = "^ID (\\S+)";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91 my $index = new Bio::DB::Flat::OBDAIndex(
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92 -start_pattern => $start_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93 -primary_pattern => $primary_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94 -primary_namespace => 'ACC',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95 -secondary_patterns => \%secondary_patterns);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97 $index->make_index("/Users/michele/indices","mydb",($seqfile));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99 Of course having secondary indices makes indexing slower and more
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100 of a memory hog.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103 =head2 Index reading
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105 To fetch sequences using an existing index first of all create your sequence
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106 object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108 my $index = new Bio::DB::Flat::OBDAIndex(-index_dir => $index_directory,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109 -dbname => 'swissprot');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111 Now you can happily fetch sequences either by the primary key or
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112 by the secondary keys.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114 my $entry = $index->get_entry_by_id('HBA_HUMAN');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116 This returns just a string containing the whole entry. This is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117 useful is you just want to print the sequence to screen or write it to a file.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119 Other ways of getting sequences are
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121 my $fh = $index->get_stream_by_id('HBA_HUMAN');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 This can then be passed to a seqio object for output or converting
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124 into objects.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126 my $seq = new Bio::SeqIO(-fh => $fh,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127 -format => 'fasta');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129 The last way is to retrieve a sequence directly. This is the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130 slowest way of extracting as the sequence objects need to be made.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132 my $seq = $index->get_Seq_by_id('HBA_HUMAN');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134 To access the secondary indices the secondary namespace needs to be known
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135 (use $index-E<gt>secondary_namespaces) and the following call used
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137 my $seq = $index->get_Seq_by_secondary('ACC','Q21973');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138 my $fh = $index->get_stream_by_secondary('ACC','Q21973');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139 my $entry = $index->get_entry_by_secondary('ACC','Q21973');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143 This object allows indexing of sequence files both by a primary key
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144 (say accession) and multiple secondary keys (say ids). This is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145 different from the Bio::Index::Abstract (see L<Bio::Index::Abstract>)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146 which uses DBM files as storage. This module uses a binary search to
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 retrieve sequences which is more efficient for large datasets.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150 =head1 FEEDBACK
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152 =head2 Mailing Lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154 User feedback is an integral part of the evolution of this and other
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155 Bioperl modules. Send your comments and suggestions preferably to one
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156 of the Bioperl mailing lists. Your participation is much appreciated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158 bioperl-l@bioperl.org - General discussion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159 http://bioperl.org/MailList.shtml - About the mailing lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 =head2 Reporting Bugs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163 Report bugs to the Bioperl bug tracking system to help us keep track
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164 the bugs and their resolution. Bug reports can be submitted via
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165 email or the web:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167 bioperl-bugs@bio.perl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168 http://bugzilla.bioperl.org/
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170 =head1 AUTHOR - Michele Clamp
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172 Email - michele@sanger.ac.uk
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174 =head1 APPENDIX
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176 The rest of the documentation details each of the object methods. Internal
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177 methods are usually preceded with an "_" (underscore).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181 package Bio::DB::Flat::OBDAIndex;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 use vars qw(@ISA);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 use Fcntl qw(SEEK_END SEEK_CUR);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187 # rather than using tell which might be buffered
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188 sub systell{ sysseek($_[0], 0, SEEK_CUR) }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189 sub syseof{ sysseek($_[0], 0, SEEK_END) }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192 use Bio::DB::RandomAccessI;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193 use Bio::Root::RootI;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 use Bio::SeqIO;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195 use Bio::Seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197 @ISA = qw(Bio::DB::RandomAccessI);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199 use constant CONFIG_FILE_NAME => 'config.dat';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200 use constant HEADER_SIZE => 4;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202 my @formats = ['FASTA','SWISSPROT','EMBL'];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204 =head2 new
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206 Title : new
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 Usage : For reading
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208 my $index = new Bio::DB::Flat::OBDAIndex(
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 -index_dir => '/Users/michele/indices/',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 -dbname => 'dbEST',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211 -format => 'fasta');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 For writing
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215 my %secondary_patterns = {"ACC" => "^>\\S+ +(\\S+)"}
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216 my $index = new Bio::DB::Flat::OBDAIndex(
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217 -index_dir => '/Users/michele/indices',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218 -primary_pattern => "^>(\\S+)",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219 -secondary_patterns => \%secondary_patterns,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220 -primary_namespace => "ID");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222 my @files = ('file1','file2','file3');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224 $index->make_index('mydbname',@files);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227 Function: create a new Bio::DB::Flat::OBDAIndex object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228 Returns : new Bio::DB::Flat::OBDAIndex
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229 Args : -index_dir Directory containing the indices
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230 -primary_pattern Regexp defining the primary id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231 -secondary_patterns A hash ref containing the secondary
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232 patterns with the namespaces as keys
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 -primary_namespace A string defining what the primary key
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234 is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 Status : Public
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240 sub new {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241 my($class, @args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243 my $self = $class->SUPER::new(@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 bless $self, $class;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247 my ($index_dir,$dbname,$format,$primary_pattern,$primary_namespace,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 $start_pattern,$secondary_patterns) =
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 $self->_rearrange([qw(INDEX_DIR
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250 DBNAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251 FORMAT
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 PRIMARY_PATTERN
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253 PRIMARY_NAMESPACE
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 START_PATTERN
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 SECONDARY_PATTERNS)], @args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 $self->index_directory($index_dir);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258 $self->database_name ($dbname);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260 if ($self->index_directory && $dbname) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262 $self->read_config_file;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264 my $fh = $self->primary_index_filehandle;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 my $record_width = $self->read_header($fh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267 $self->record_size($record_width);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 $self->format ($format);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 $self->primary_pattern ($primary_pattern);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 $self->primary_namespace ($primary_namespace);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274 $self->start_pattern ($start_pattern);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275 $self->secondary_patterns($secondary_patterns);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277 return $self;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 sub new_from_registry {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281 my ($self,%config) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283 my $dbname = $config{'dbname'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284 my $location = $config{'location'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286 my $index = new Bio::DB::Flat::OBDAIndex(-dbname => $dbname,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287 -index_dir => $location,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291 =head2 get_Seq_by_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293 Title : get_Seq_by_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 Usage : $obj->get_Seq_by_id($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297 Returns : value of get_Seq_by_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 sub get_Seq_by_id {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303 my ($self,$id) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305 my ($fh,$length) = $self->get_stream_by_id($id);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307 if (!defined($self->format)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308 $self->throw("Can't create sequence - format is not defined");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311 if(!$fh){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 if (!defined($self->{_seqio})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316 $self->{_seqio} = new Bio::SeqIO(-fh => $fh,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317 -format => $self->format);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320 $self->{_seqio}->fh($fh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 return $self->{_seqio}->next_seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327 =head2 get_entry_by_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329 Title : get_entry_by_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330 Usage : $obj->get_entry_by_id($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339 sub get_entry_by_id {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 my ($self,$id) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342 my ($fh,$length) = $self->get_stream_by_id($id);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344 my $entry;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346 sysread($fh,$entry,$length);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348 return $entry;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
351
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
352 =head2 get_stream_by_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
353
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
354 Title : get_stream_by_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
355 Usage : $obj->get_stream_by_id($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
356 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
357 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
358 Returns : value of get_stream_by_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
359 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364 sub get_stream_by_id {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365 my ($self,$id) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367 my $indexfh = $self->primary_index_filehandle;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
368
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
369 syseof ($indexfh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
370
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
371 my $filesize = systell($indexfh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
372
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
373 my $end = ($filesize-$self->{_start_pos})/$self->record_size;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
374
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
375 my ($newid,$rest,$fhpos) = $self->find_entry($indexfh,0,$end,$id,$self->record_size);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
376
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
377
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
378 my ($fileid,$pos,$length) = split(/\t/,$rest);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
379
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
380 #print STDERR "OBDAIndex Found id entry $newid $fileid $pos $length:$rest\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
381
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
382 if (!$newid) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
383 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
384 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
385
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
386 my $fh = $self->get_filehandle_by_fileid($fileid);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
387 my $file = $self->{_file}{$fileid};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
388
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
389 open (IN,"<$file");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
390 $fh = \*IN;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
391
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
392 my $entry;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
393
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
394 sysseek($fh,$pos,0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
395
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
396 return ($fh,$length);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
397 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
398
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
399 =head2 get_Seq_by_acc
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
400
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
401 Title : get_Seq_by_acc
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
402 Usage : $obj->get_Seq_by_acc($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
403 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
404 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
405 Returns : value of get_Seq_by_acc
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
406 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
407
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
408
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
409 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
410
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
411 sub get_Seq_by_acc {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
412 my ($self,$acc) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
413
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
414 if ($self->primary_namespace eq "ACC") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
415 return $self->get_Seq_by_id($acc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
416 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
417 return $self->get_Seq_by_secondary("ACC",$acc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
418 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
419 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
420
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
421 =head2 get_Seq_by_secondary
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
422
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
423 Title : get_Seq_by_secondary
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
424 Usage : $obj->get_Seq_by_secondary($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
425 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
426 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
427 Returns : value of get_Seq_by_secondary
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
428 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
429
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
430
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
431 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
432
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
433 sub get_Seq_by_secondary {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
434 my ($self,$name,$id) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
435
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
436 my @names = $self->secondary_namespaces;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
437
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
438 my $found = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
439 foreach my $tmpname (@names) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
440 if ($name eq $tmpname) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
441 $found = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
442 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
443 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
444
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
445 if ($found == 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
446 $self->throw("Secondary index for $name doesn't exist\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
447 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
448
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
449 my $fh = $self->open_secondary_index($name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
450
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
451 syseof ($fh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
452
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
453 my $filesize = systell($fh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
454
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
455 my $recsize = $self->{_secondary_record_size}{$name};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
456 # print "Name " . $recsize . "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
457
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
458 my $end = ($filesize-$self->{_start_pos})/$recsize;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
459
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
460 # print "End $end $filesize\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
461
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
462 my ($newid,$primary_id,$pos) = $self->find_entry($fh,0,$end,$id,$recsize);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
463
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
464 sysseek($fh,$pos,0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
465
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
466 # print "Found new id $newid $primary_id\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
467 # We now need to shuffle up the index file to find the top secondary entry
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
468
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
469 my $record = $newid;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
470
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
471 while ($record =~ /^$newid/ && $pos >= 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
472
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
473 $record = $self->read_record($fh,$pos,$recsize);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
474 $pos = $pos - $recsize;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
475 # print "Up record = $record:$newid\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
476 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
477
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
478 $pos += $recsize;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
479
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
480 # print "Top position is $pos\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
481
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
482 # Now we have to shuffle back down again to read all the secondary entries
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
483
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
484 my $current_id = $newid;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
485 my %primary_id;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
486
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
487 $primary_id{$primary_id} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
488
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
489 while ($current_id eq $newid) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
490 $record = $self->read_record($fh,$pos,$recsize);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
491 print "Record is :$record:\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
492 my ($secid,$primary_id) = split(/\t/,$record,2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
493 $current_id = $secid;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
494
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
495 if ($current_id eq $newid) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
496 $primary_id =~ s/ //g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
497 # print "Primary $primary_id\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
498 $primary_id{$primary_id} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
499
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
500 $pos = $pos + $recsize;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
501 # print "Down record = $record\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
502 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
503 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
504
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
505 if (!defined($newid)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
506 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
507 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
508
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
509 my $entry;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
510
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
511 foreach my $id (keys %primary_id) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
512 $entry .= $self->get_Seq_by_id($id);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
513 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
514 return $entry;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
515
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
516 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
517
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
518 =head2 read_header
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
519
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
520 Title : read_header
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
521 Usage : $obj->read_header($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
522 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
523 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
524 Returns : value of read_header
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
525 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
526
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
527
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
528 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
529
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
530 sub read_header {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
531 my ($self,$fh) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
532
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
533 my $record_width;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
534
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
535 sysread($fh,$record_width,HEADER_SIZE);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
536
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
537 $self->{_start_pos} = HEADER_SIZE;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
538 $record_width =~ s/ //g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
539 $record_width = $record_width * 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
540
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
541 return $record_width;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
542 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
543
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
544 =head2 read_record
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
545
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
546 Title : read_record
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
547 Usage : $obj->read_record($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
548 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
549 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
550 Returns : value of read_record
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
551 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
552
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
553
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
554 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
555
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
556 sub read_record {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
557 my ($self,$fh,$pos,$len) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
558
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
559 sysseek($fh,$pos,0);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
560
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
561 my $record;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
562
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
563 sysread($fh,$record,$len);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
564
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
565 return $record;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
566
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
567 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
568
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
569
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
570 =head2 find_entry
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
571
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
572 Title : find_entry
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
573 Usage : $obj->find_entry($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
574 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
575 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
576 Returns : value of find_entry
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
577 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
578
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
579
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
580 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
581
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
582 sub find_entry {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
583 my ($self,$fh,$start,$end,$id,$recsize) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
584
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
585 my $mid = int(($end+1+$start)/2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
586 my $pos = ($mid-1)*$recsize + $self->{_start_pos};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
587
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
588 my ($record) = $self->read_record($fh,$pos,$recsize);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
589 my ($entryid,$rest) = split(/\t/,$record,2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
590
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
591 # print "Mid $recsize $mid $pos:$entryid:$rest:$record\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
592 # print "Entry :$id:$entryid:$rest\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
593
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
594
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
595 my ($first,$second) = sort { $a cmp $b} ($id,$entryid);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
596
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
597 if ($id eq $entryid) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
598
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
599 return ($id,$rest,$pos-$recsize);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
600
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
601 } elsif ($first eq $id) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
602
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
603 if ($end-$start <= 1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
604 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
605 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
606 my $end = $mid;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
607 # print "Moving up $entryid $id\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
608 $self->find_entry($fh,$start,$end,$id,$recsize);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
609
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
610 } elsif ($second eq $id ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
611 # print "Moving down $entryid $id\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
612 if ($end-$start <= 1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
613 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
614 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
615
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
616 $start = $mid;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
617
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
618 $self->find_entry($fh,$start,$end,$id,$recsize);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
619 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
620
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
621 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
622
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
623
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
624 =head2 make_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
625
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
626 Title : make_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
627 Usage : $obj->make_index($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
628 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
629 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
630 Returns : value of make_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
631 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
632
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
633
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
634 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
635
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
636 sub make_index {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
637 my ($self,$dbname,@files) = @_;;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
638
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
639 my $rootdir = $self->index_directory;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
640
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
641 if (!defined($rootdir)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
642 $self->throw("No index directory set - can't build indices");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
643 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
644
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
645 if (! -d $rootdir) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
646 $self->throw("Index directory [$rootdir] is not a directory. Cant' build indices");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
647 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
648 if (!(@files)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
649 $self->throw("Must enter an array of filenames to index");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
650 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
651
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
652 if (!defined($dbname)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
653 $self->throw("Must enter an index name for your files");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
654 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
655
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
656 my $pwd = `pwd`; chomp($pwd);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
657
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
658 foreach my $file (@files) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
659 if ($file !~ /^\//) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
660 $file = $pwd . "/$file";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
661 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
662 if (! -e $file) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
663 $self->throw("Can't index file [$file] as it doesn't exist");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
664 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
665 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
666
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
667 $self->database_name($dbname);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
668 $self->make_indexdir($rootdir);;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
669 $self->make_config_file(\@files);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
670
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
671 # Finally lets index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
672 foreach my $file (@files) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
673 $self->_index_file($file);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
674 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
675
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
676 # And finally write out the indices
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
677 $self->write_primary_index;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
678 $self->write_secondary_indices;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
679 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
680
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
681 =head2 _index_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
682
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
683 Title : _index_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
684 Usage : $obj->_index_file($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
685 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
686 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
687 Returns : value of _index_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
688 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
689
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
690
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
691 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
692
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
693 sub _index_file {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
694 my ($self,$file) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
695
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
696 open(FILE,"<$file") || $self->throw("Can't open file [$file]");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
697
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
698 my $recstart = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
699 my $fileid = $self->get_fileid_by_filename($file);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
700 my $found = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
701 my $id;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
702 my $count;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
703
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
704 my $primary = $self->primary_pattern;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
705 my $start_pattern = $self->start_pattern;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
706
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
707 my $pos = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
708
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
709 my $new_primary_entry;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
710
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
711 my $length;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
712 #my $pos = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
713 my $fh = \*FILE;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
714
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
715 my $done = -1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
716
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
717 my @secondary_names = $self->secondary_namespaces;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
718 my %secondary_id;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
719
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
720 while (<$fh>) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
721 if ($_ =~ /$start_pattern/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
722 if ($done == 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
723 $id = $new_primary_entry;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
724
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
725 my $tmplen = tell($fh) - length($_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
726
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
727 $length = $tmplen - $pos;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
728
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
729 if (!defined($id)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
730 $self->throw("No id defined for sequence");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
731 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
732 if (!defined($fileid)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
733 $self->throw("No fileid defined for file $file");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
734 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
735 if (!defined($pos)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
736 $self->throw("No position defined for " . $id . "\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
737 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
738 if (!defined($length)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
739 $self->throw("No length defined for " . $id . "\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
740 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
741
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
742 $self->_add_id_position($id,$pos,$fileid,$length,\%secondary_id);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
743
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
744 $pos = $tmplen;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
745
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
746 if ($count%1000 == 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
747 print STDERR "Indexed $count ids\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
748 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
749
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
750 $count++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
751 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
752 $done = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
753 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
754 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
755
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
756 if ($_ =~ /$primary/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
757 $new_primary_entry = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
758 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
759
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
760 my $secondary_patterns = $self->secondary_patterns;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
761
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
762 foreach my $sec (@secondary_names) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
763 my $pattern = $secondary_patterns->{$sec};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
764
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
765 if ($_ =~ /$pattern/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
766 $secondary_id{$sec} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
767 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
768 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
769
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
770 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
771
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
772 # Remeber to add in the last one
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
773
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
774 $id = $new_primary_entry;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
775
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
776 my $tmplen = tell($fh) - length($_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
777
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
778 $length = $tmplen - $pos;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
779
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
780 if (!defined($id)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
781 $self->throw("No id defined for sequence");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
782 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
783 if (!defined($fileid)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
784 $self->throw("No fileid defined for file $file");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
785 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
786 if (!defined($pos)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
787 $self->throw("No position defined for " . $id . "\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
788 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
789 if (!defined($length)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
790 $self->throw("No length defined for " . $id . "\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
791 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
792
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
793 $self->_add_id_position($id,$pos,$fileid,$length,\%secondary_id);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
794
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
795 close(FILE);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
796 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
797
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
798 =head2 write_primary_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
799
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
800 Title : write_primary_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
801 Usage : $obj->write_primary_index($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
802 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
803 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
804 Returns : value of write_primary_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
805 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
806
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
807
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
808 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
809
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
810 sub write_primary_index {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
811 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
812
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
813 my @ids = keys %{$self->{_id}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
814
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
815 @ids = sort {$a cmp $b} @ids;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
816
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
817 print STDERR "Number of ids = " . scalar(@ids) . "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
818
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
819 open (INDEX,">" . $self->primary_index_file) || $self->throw("Can't open primary index file [" . $self->primary_index_file . "]");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
820
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
821 my $recordlength = $self->{_maxidlength} +
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
822 $self->{_maxfileidlength} +
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
823 $self->{_maxposlength} +
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
824 $self->{_maxlengthlength} + 3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
825
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
826
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
827 print INDEX sprintf("%4d",$recordlength);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
828
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
829 foreach my $id (@ids) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
830
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
831 if (!defined($self->{_id}{$id}{_fileid})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
832 $self->throw("No fileid for $id\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
833 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
834 if (!defined($self->{_id}{$id}{_pos})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
835 $self->throw("No position for $id\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
836 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
837 if (!defined($self->{_id}{$id}{_length})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
838 $self->throw("No length for $id");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
839 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
840
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
841 my $record = $id . "\t" .
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
842 $self->{_id}{$id}{_fileid} . "\t" .
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
843 $self->{_id}{$id}{_pos} . "\t" .
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
844 $self->{_id}{$id}{_length};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
845
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
846 print INDEX sprintf("%-${recordlength}s",$record);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
847
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
848 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
849 close(INDEX);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
850 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
851
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
852 =head2 write_secondary_indices
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
853
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
854 Title : write_secondary_indices
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
855 Usage : $obj->write_secondary_indices($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
856 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
857 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
858 Returns : value of write_secondary_indices
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
859 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
860
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
861
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
862 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
863
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
864 sub write_secondary_indices {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
865 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
866
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
867 # These are the different
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
868 my @names = keys (%{$self->{_secondary_id}});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
869
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
870
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
871 foreach my $name (@names) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
872
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
873 my @seconds = keys %{$self->{_secondary_id}{$name}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
874
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
875 # First we need to loop over to get the longest record.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
876 my $length = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
877
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
878 foreach my $second (@seconds) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
879 my $tmplen = length($second) + 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
880 my @prims = keys %{$self->{_secondary_id}{$name}{$second}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
881
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
882 foreach my $prim (@prims) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
883 my $recordlen = $tmplen + length($prim);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
884
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
885 if ($recordlen > $length) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
886 $length = $recordlen;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
887 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
888 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
889 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
890
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
891 # Now we can print the index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
892
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
893 my $fh = $self->new_secondary_filehandle($name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
894
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
895 print $fh sprintf("%4d",$length);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
896 @seconds = sort @seconds;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
897
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
898 foreach my $second (@seconds) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
899
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
900 my @prims = keys %{$self->{_secondary_id}{$name}{$second}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
901 my $tmp = $second;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
902
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
903 foreach my $prim (@prims) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
904 my $record = $tmp . "\t" . $prim;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
905 if (length($record) > $length) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
906 $self->throw("Something has gone horribly wrong - length of record is more than we thought [$length]\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
907 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
908 print $fh sprintf("%-${length}s",$record);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
909 print $fh sprintf("%-${length}s",$record);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
910 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
911 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
912 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
913
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
914 close($fh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
915 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
916 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
917
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
918 =head2 new_secondary_filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
919
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
920 Title : new_secondary_filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
921 Usage : $obj->new_secondary_filehandle($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
922 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
923 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
924 Returns : value of new_secondary_filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
925 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
926
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
927
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
928 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
929
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
930 sub new_secondary_filehandle {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
931 my ($self,$name) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
932
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
933 my $indexdir = $self->index_directory;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
934
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
935 my $secindex = $indexdir . $self->database_name . "/id_$name.index";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
936
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
937 my $fh = new FileHandle(">$secindex");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
938
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
939 return $fh;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
940 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
941
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
942 =head2 open_secondary_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
943
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
944 Title : open_secondary_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
945 Usage : $obj->open_secondary_index($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
946 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
947 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
948 Returns : value of open_secondary_index
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
949 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
950
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
951
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
952 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
953
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
954 sub open_secondary_index {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
955 my ($self,$name) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
956
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
957 if (!defined($self->{_secondary_filehandle}{$name})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
958
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
959 my $indexdir = $self->index_directory;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
960 my $secindex = $indexdir . $self->database_name . "/id_$name.index";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
961
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
962 if (! -e $secindex) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
963 $self->throw("Index is not present for namespace [$name]\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
964 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
965
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
966 my $newfh = new FileHandle("<$secindex");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
967 my $reclen = $self->read_header($newfh);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
968
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
969 $self->{_secondary_filehandle} {$name} = $newfh;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
970 $self->{_secondary_record_size}{$name} = $reclen;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
971 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
972
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
973 return $self->{_secondary_filehandle}{$name};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
974
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
975 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
976
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
977 =head2 _add_id_position
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
978
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
979 Title : _add_id_position
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
980 Usage : $obj->_add_id_position($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
981 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
982 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
983 Returns : value of _add_id_position
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
984 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
985
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
986
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
987 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
988
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
989 sub _add_id_position {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
990 my ($self,$id,$pos,$fileid,$length,$secondary_id) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
991
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
992 if (!defined($id)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
993 $self->throw("No id defined. Can't add id position");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
994 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
995 if (!defined($pos)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
996 v $self->throw("No position defined. Can't add id position");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
997 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
998 if (!defined($fileid)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
999 $self->throw("No fileid defined. Can't add id position");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1000 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1001 if (!defined($length) || $length <= 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1002 $self->throw("No length defined or <= 0 [$length]. Can't add id position");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1003 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1004
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1005 $self->{_id}{$id}{_pos} = $pos;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1006 $self->{_id}{$id}{_length} = $length;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1007 $self->{_id}{$id}{_fileid} = $fileid;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1008
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1009 # Now the secondary ids
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1010
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1011 foreach my $sec (keys (%$secondary_id)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1012 my $value = $secondary_id->{$sec};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1013
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1014 $self->{_secondary_id}{$sec}{$value}{$id} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1015 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1016
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1017 if (length($id) >= $self->{_maxidlength}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1018 $self->{_maxidlength} = length($id);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1019 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1020
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1021 if (length($fileid) >= $self->{_maxfileidlength}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1022 $self->{_maxfileidlength} = length($fileid);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1023 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1024
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1025 if (length($pos) >= $self->{_maxposlength}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1026 $self->{_maxposlength} = length($pos);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1027 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1028
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1029 if (length($length) >= $self->{_maxlengthlength}) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1030 $self->{_maxlengthlength} = length($length);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1031 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1032 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1033
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1034 =head2 make_indexdir
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1035
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1036 Title : make_indexdir
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1037 Usage : $obj->make_indexdir($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1038 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1039 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1040 Returns : value of make_indexdir
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1041 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1042
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1043
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1044 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1045
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1046 sub make_indexdir {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1047 my ($self,$rootdir) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1048
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1049 if (!defined($rootdir)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1050 $self->throw("Must enter an index directory name for make_indexdir");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1051 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1052 if (! -e $rootdir) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1053 $self->throw("Root index directory [$rootdir] doesn't exist");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1054 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1055
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1056 if (! -d $rootdir) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1057 $self->throw("[$rootdir] exists but is not a directory");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1058 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1059
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1060 if ($rootdir !~ /\/$/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1061 $rootdir .= "/";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1062 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1063
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1064 my $indexdir = $rootdir . $self->database_name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1065
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1066 if (! -e $indexdir) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1067 mkdir $indexdir,0755;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1068 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1069 $self->throw("Index directory " . $indexdir . " already exists. Exiting\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1070 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1071
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1072 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1073
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1074 =head2 make_config_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1075
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1076 Title : make_config_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1077 Usage : $obj->make_config_file($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1078 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1079 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1080 Returns : value of make_config_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1081 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1082
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1083 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1084
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1085 sub make_config_file {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1086 my ($self,$files) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1087
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1088 my @files = @$files;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1089
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1090 my $dir = $self->index_directory;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1091
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1092 my $configfile = $dir . $self->database_name . "/" .CONFIG_FILE_NAME;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1093
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1094 open(CON,">$configfile") || $self->throw("Can't create config file [$configfile]");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1095
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1096 # First line must be the type of index - in this case flat
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1097
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1098 print CON "index\tflat/1\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1099
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1100 # Now the fileids
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1101
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1102 my $count = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1103
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1104 foreach my $file (@files) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1105
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1106 my $size = -s $file;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1107
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1108 print CON "fileid_$count\t$file\t$size\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1109
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1110 my $fh = new FileHandle("<$file");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1111 $self->{_fileid}{$count} = $fh;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1112 $self->{_file} {$count} = $file;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1113 $self->{_dbfile}{$file} = $count;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1114 $self->{_size}{$count} = $size;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1115
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1116 $count++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1117 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1118
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1119 # Now the namespaces
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1120
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1121 print CON "primary_namespace\t" .$self->primary_namespace. "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1122
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1123 # Needs fixing for the secondary stuff
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1124
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1125 my $second_patterns = $self->secondary_patterns;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1126
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1127 my @second = keys %$second_patterns;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1128
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1129 if ((@second)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1130 print CON "secondary_namespaces";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1131
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1132 foreach my $second (@second) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1133 print CON "\t$second";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1134 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1135 print CON "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1136 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1137
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1138 # Now the config format
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1139
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1140 if (!defined($self->format)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1141 $self->throw("Format does not exist in module - can't write config file");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1142 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1143 print CON "format\t" . $self->format . "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1144 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1145
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1146
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1147 close(CON);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1148 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1149
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1150 =head2 read_config_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1151
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1152 Title : read_config_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1153 Usage : $obj->read_config_file($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1154 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1155 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1156 Returns : value of read_config_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1157 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1158
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1159
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1160 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1161
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1162 sub read_config_file {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1163 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1164
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1165 my $dir = $self->index_directory . $self->database_name . "/";;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1166
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1167 if (! -d $dir) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1168 $self->throw("No index directory [" . $dir . "]. Can't read ". CONFIG_FILE_NAME);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1169 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1170
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1171 my $configfile = $dir . CONFIG_FILE_NAME;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1172
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1173 if (! -e $configfile) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1174 $self->throw("No config file [$configfile]. Can't read namespace");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1175 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1176
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1177 open(CON,"<$configfile") || $self->throw("Can't open configfile [$configfile]");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1178
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1179 # First line must be type
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1180
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1181 my $line = <CON>; chomp($line);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1182 my $version;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1183
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1184 # This is hard coded as we only index flatfiles here
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1185 if ($line =~ /index\tflat\/(\d+)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1186 $version = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1187 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1188 $self->throw("First line not compatible with flat file index. Should be something like\n\nindex\tflat/1");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1189 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1190
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1191 $self->index_type("flat");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1192 $self->index_version($version);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1193
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1194 while (<CON>) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1195 chomp;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1196
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1197 # Look for fileid lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1198 if ($_ =~ /^fileid_(\d+)\t(\S+)\t(\d+)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1199 my $fileid = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1200 my $filename = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1201 my $filesize = $3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1202
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1203 if (! -e $filename) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1204 $self->throw("File [$filename] does not exist!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1205 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1206 if (-s $filename != $filesize) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1207 $self->throw("Flatfile size for $filename differs from what the index thinks it is. Real size [" . (-s $filename) . "] Index thinks it is [" . $filesize . "]");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1208 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1209
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1210 my $fh = new FileHandle("<$filename");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1211
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1212 $self->{_fileid}{$fileid} = $fh;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1213 $self->{_file} {$fileid} = $filename;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1214 $self->{_dbfile}{$filename} = $fileid;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1215 $self->{_size} {$fileid} = $filesize;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1216
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1217 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1218
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1219 # Look for namespace lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1220 if ($_ =~ /(.*)_namespace.*\t(\S+)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1221 if ($1 eq "primary") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1222 $self->primary_namespace($2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1223 } elsif ($1 eq "secondary") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1224 $self->secondary_namespaces($2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1225 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1226 $self->throw("Unknown namespace name in config file [$1");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1227 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1228 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1229
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1230 # Look for format lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1231
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1232 if ($_ =~ /format\t(\S+)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1233
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1234 # Check the format here?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1235
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1236 $self->format($1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1237 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1238 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1239 close(CON);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1240
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1241 # Now check we have all that we need
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1242
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1243 my @fileid_keys = keys (%{$self->{_fileid}});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1244
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1245 if (!(@fileid_keys)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1246 $self->throw("No flatfile fileid files in config - check the index has been made correctly");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1247 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1248
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1249 if (!defined($self->primary_namespace)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1250 $self->throw("No primary namespace exists");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1251 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1252
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1253 if (! -e $self->primary_index_file) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1254 $self->throw("Primary index file [" . $self->primary_index_file . "] doesn't exist");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1255 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1256 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1257
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1258 =head2 get_fileid_by_filename
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1259
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1260 Title : get_fileid_by_filename
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1261 Usage : $obj->get_fileid_by_filename($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1262 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1263 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1264 Returns : value of get_fileid_by_filename
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1265 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1266
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1267
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1268 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1269
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1270 sub get_fileid_by_filename {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1271 my ($self,$file) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1272
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1273 if (!defined($self->{_dbfile})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1274 $self->throw("No file to fileid mapping present. Has the fileid file been read?");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1275 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1276
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1277
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1278 return $self->{_dbfile}{$file};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1279 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1280
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1281 =head2 get_filehandle_by_fileid
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1282
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1283 Title : get_filehandle_by_fileid
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1284 Usage : $obj->get_filehandle_by_fileid($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1285 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1286 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1287 Returns : value of get_filehandle_by_fileid
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1288 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1289
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1290
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1291 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1292
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1293 sub get_filehandle_by_fileid {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1294 my ($self,$fileid) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1295
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1296 if (!defined($self->{_fileid}{$fileid})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1297 $self->throw("ERROR: undefined fileid in index [$fileid]");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1298 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1299
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1300 return $self->{_fileid}{$fileid};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1301 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1302
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1303 =head2 primary_index_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1304
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1305 Title : primary_index_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1306 Usage : $obj->primary_index_file($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1307 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1308 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1309 Returns : value of primary_index_file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1310 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1311
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1312
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1313 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1314
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1315 sub primary_index_file {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1316 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1317
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1318 return $self->index_directory . $self->database_name . "/key_" . $self->primary_namespace . ".key";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1319 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1320
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1321 =head2 primary_index_filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1322
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1323 Title : primary_index_filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1324 Usage : $obj->primary_index_filehandle($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1325 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1326 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1327 Returns : value of primary_index_filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1328 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1329
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1330
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1331 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1332
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1333 sub primary_index_filehandle {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1334 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1335
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1336 if (!defined ($self->{_primary_index_handle})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1337 $self->{_primary_index_handle} = new FileHandle("<" . $self->primary_index_file);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1338 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1339 return $self->{_primary_index_handle};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1340 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1341
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1342 =head2 database_name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1343
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1344 Title : database_name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1345 Usage : $obj->database_name($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1346 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1347 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1348 Returns : value of database_name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1349 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1350
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1351
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1352 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1353
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1354
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1355 sub database_name {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1356 my ($self,$arg) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1357
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1358 if (defined($arg)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1359 $self->{_database_name} = $arg;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1360 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1361 return $self->{_database_name};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1362
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1363 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1364
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1365 =head2 format
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1366
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1367 Title : format
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1368 Usage : $obj->format($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1369 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1370 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1371 Returns : value of format
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1372 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1373
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1374
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1375 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1376
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1377 sub format{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1378 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1379 if( defined $value) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1380 $obj->{'format'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1381 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1382 return $obj->{'format'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1383
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1384 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1385
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1386 =head2 index_directory
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1387
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1388 Title : index_directory
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1389 Usage : $obj->index_directory($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1390 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1391 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1392 Returns : value of index_directory
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1393 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1394
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1395
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1396 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1397
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1398 sub index_directory {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1399 my ($self,$arg) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1400
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1401 if (defined($arg)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1402 if ($arg !~ /\/$/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1403 $arg .= "/";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1404 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1405 $self->{_index_directory} = $arg;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1406 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1407 return $self->{_index_directory};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1408
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1409 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1410
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1411 =head2 record_size
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1412
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1413 Title : record_size
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1414 Usage : $obj->record_size($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1415 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1416 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1417 Returns : value of record_size
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1418 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1419
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1420
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1421 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1422
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1423 sub record_size {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1424 my ($self,$arg) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1425
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1426 if (defined($arg)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1427 $self->{_record_size} = $arg;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1428 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1429 return $self->{_record_size};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1430 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1431
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1432 =head2 primary_namespace
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1433
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1434 Title : primary_namespace
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1435 Usage : $obj->primary_namespace($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1436 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1437 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1438 Returns : value of primary_namespace
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1439 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1440
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1441 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1442
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1443 sub primary_namespace {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1444 my ($self,$arg) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1445
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1446 if (defined($arg)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1447 $self->{_primary_namespace} = $arg;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1448 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1449 return $self->{_primary_namespace};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1450 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1451
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1452 =head2 index_type
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1453
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1454 Title : index_type
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1455 Usage : $obj->index_type($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1456 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1457 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1458 Returns : value of index_type
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1459 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1460
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1461
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1462 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1463
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1464 sub index_type {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1465 my ($self,$arg) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1466
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1467 if (defined($arg)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1468 $self->{_index_type} = $arg;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1469 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1470 return $self->{_index_type};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1471 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1472
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1473 =head2 index_version
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1474
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1475 Title : index_version
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1476 Usage : $obj->index_version($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1477 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1478 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1479 Returns : value of index_version
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1480 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1481
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1482
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1483 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1484
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1485 sub index_version {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1486 my ($self,$arg) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1487
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1488 if (defined($arg)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1489 $self->{_index_version} = $arg;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1490 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1491 return $self->{_index_version};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1492 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1493
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1494 =head2 primary_pattern
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1495
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1496 Title : primary_pattern
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1497 Usage : $obj->primary_pattern($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1498 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1499 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1500 Returns : value of primary_pattern
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1501 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1502
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1503
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1504 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1505
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1506 sub primary_pattern{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1507 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1508 if( defined $value) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1509 $obj->{'primary_pattern'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1510 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1511
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1512 return $obj->{'primary_pattern'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1513
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1514 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1515 =head2 start_pattern
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1516
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1517 Title : start_pattern
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1518 Usage : $obj->start_pattern($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1519 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1520 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1521 Returns : value of start_pattern
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1522 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1523
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1524
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1525 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1526
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1527 sub start_pattern{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1528 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1529 if( defined $value) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1530 $obj->{'start_pattern'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1531 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1532 return $obj->{'start_pattern'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1533
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1534 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1535
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1536 =head2 secondary_patterns
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1537
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1538 Title : secondary_patterns
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1539 Usage : $obj->secondary_patterns($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1540 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1541 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1542 Returns : value of secondary_patterns
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1543 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1544
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1545
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1546 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1547
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1548 sub secondary_patterns{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1549 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1550 if( defined $value) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1551 $obj->{'secondary_patterns'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1552
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1553 my @names = keys %$value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1554
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1555 foreach my $name (@names) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1556 $obj->secondary_namespaces($name);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1557 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1558 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1559 return $obj->{'secondary_patterns'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1560
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1561 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1562
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1563 =head2 secondary_namespaces
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1564
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1565 Title : secondary_namespaces
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1566 Usage : $obj->secondary_namespaces($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1567 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1568 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1569 Returns : value of secondary_namespaces
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1570 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1571
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1572
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1573 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1574
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1575 sub secondary_namespaces{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1576 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1577
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1578 if (!defined($obj->{secondary_namespaces})) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1579 $obj->{secondary_namespaces} = [];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1580 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1581 if( defined $value) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1582 push(@{$obj->{'secondary_namespaces'}},$value);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1583 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1584 return @{$obj->{'secondary_namespaces'}};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1585
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1586 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1587
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1588
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1589
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1590 ## These are indexing routines to index commonly used format - fasta
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1591 ## swissprot and embl
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1592
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1593 sub new_SWISSPROT_index {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1594 my ($self,$index_dir,$dbname,@files) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1595
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1596 my %secondary_patterns;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1597
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1598 my $start_pattern = "^ID (\\S+)";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1599 my $primary_pattern = "^AC (\\S+)\\;";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1600
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1601 $secondary_patterns{"ID"} = $start_pattern;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1602
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1603 my $index = new Bio::DB::Flat::OBDAIndex(-index_dir => $index_dir,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1604 -format => 'swiss',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1605 -primary_pattern => $primary_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1606 -primary_namespace => "ACC",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1607 -start_pattern => $start_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1608 -secondary_patterns => \%secondary_patterns);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1609
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1610 $index->make_index($dbname,@files);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1611 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1612
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1613 sub new_EMBL_index {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1614 my ($self,$index_dir,$dbname,@files) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1615
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1616 my %secondary_patterns;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1617
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1618 my $start_pattern = "^ID (\\S+)";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1619 my $primary_pattern = "^AC (\\S+)\\;";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1620 my $primary_namespace = "ACC";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1621
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1622 $secondary_patterns{"ID"} = $start_pattern;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1623
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1624 my $index = new Bio::DB::Flat::OBDAIndex(-index_dir => $index_dir,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1625 -format => 'embl',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1626 -primary_pattern => $primary_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1627 -primary_namespace => "ACC",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1628 -start_pattern => $start_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1629 -secondary_patterns => \%secondary_patterns);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1630
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1631 $index->make_index($dbname,@files);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1632
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1633 return $index;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1634 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1635
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1636 sub new_FASTA_index {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1637 my ($self,$index_dir,$dbname,@files) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1638
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1639 my %secondary_patterns;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1640
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1641 my $start_pattern = "^>";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1642 my $primary_pattern = "^>(\\S+)";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1643 my $primary_namespace = "ACC";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1644
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1645 $secondary_patterns{"ID"} = "^>\\S+ +(\\S+)";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1646
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1647 my $index = new Bio::DB::Flat::OBDAIndex(-index_dir => $index_dir,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1648 -format => 'fasta',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1649 -primary_pattern => $primary_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1650 -primary_namespace => "ACC",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1651 -start_pattern => $start_pattern,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1652 -secondary_patterns => \%secondary_patterns);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1653
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1654 $index->make_index($dbname,@files);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1655
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1656 return $index;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1657
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1658 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1659
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1660
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1661
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1662 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1663
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1664
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1665
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1666
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1667
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1668
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1669
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1670
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1671
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1672
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1673
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1674
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1675
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1676
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1677
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1678
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1679
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1680
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1681
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1682
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1683
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1684
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1685
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1686
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1687
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1688
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1689
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1690
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1691
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1692