annotate VelvetOptimiser-2.1.7_modified/VelvetOpt/Utils.pm @ 0:7363fee7f20c default tip

Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
author konradpaszkiewicz
date Tue, 07 Jun 2011 17:42:26 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
1 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
2 # VelvetOpt::Utils.pm
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
3 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
4 # Copyright 2008,2009,2010 Simon Gladman <simon.gladman@csiro.au>
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
5 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
6 # This program is free software; you can redistribute it and/or modify
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
7 # it under the terms of the GNU General Public License as published by
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
8 # the Free Software Foundation; either version 2 of the License, or
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
9 # (at your option) any later version.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
10 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
11 # This program is distributed in the hope that it will be useful,
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
12 # but WITHOUT ANY WARRANTY; without even the implied warranty of
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
13 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
14 # GNU General Public License for more details.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
15 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
16 # You should have received a copy of the GNU General Public License
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
17 # along with this program; if not, write to the Free Software
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
18 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
19 # MA 02110-1301, USA.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
20
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
21 # Version 2.1.3
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
22
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
23 # Changes for Version 2.0.1
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
24 # Added Mikael Brandstrom Durling's numCpus and freeMem for the Mac.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
25 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
26 # Changes for Version 2.1.0
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
27 # Fixed bug in estExpCov so it now correctly uses all short read categories not just the first two.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
28 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
29 # Changes for Version 2.1.2
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
30 # Fixed bug in estExpCov so it now won't take columns with "N/A" or "Inf" into account
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
31 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
32 # Changes for Version 2.1.3
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
33 # Changed the minimum contig size to use for estimating expected coverage to 3*kmer size -1 and set the minimum coverage to 2 instead of 0.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
34 # This should get rid of exp_covs of 1 when it should be very high for assembling reads that weren't ampped to a reference using one of the standard read mapping programs
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
35
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
36
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
37 package VelvetOpt::Utils;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
38
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
39 use strict;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
40 use lib "/usr/local/lib/perl5/site_perl/5.8.8";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
41 use warnings;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
42 use POSIX qw(ceil floor);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
43 use Carp;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
44 use List::Util qw(max);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
45 use Bio::SeqIO;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
46
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
47 # num_cpu
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
48 # It returns the number of cpus present in the system if linux.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
49 # If it is MAC then it returns the number of cores present.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
50 # If the OS is not linux or Mac then it returns 1.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
51 # Written by Torsten Seemann 2009 (linux) and Mikael Brandstrom Durling 2009 (Mac).
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
52
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
53 sub num_cpu {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
54 if ( $^O =~ m/linux/i ) {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
55 my ($num) = qx(grep -c ^processor /proc/cpuinfo);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
56 chomp $num;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
57 return $num if $num =~ m/^\d+/;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
58 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
59 elsif( $^O =~ m/darwin/i){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
60 my ($num) = qx(system_profiler SPHardwareDataType | grep Cores);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
61 $num =~ /.*Cores: (\d+)/;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
62 $num =$1;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
63 return $num;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
64 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
65 return 1;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
66 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
67
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
68 # free_mem
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
69 # Returns the current amount of free memory
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
70 # Mac Section written by Mikael Brandstrom Durling 2009 (Mac).
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
71
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
72 sub free_mem {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
73 if( $^O =~ m/linux/i ) {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
74 my $x = `free | grep '^Mem:' | sed 's/ */~/g' | cut -d '~' -f 4,7`;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
75 my @tmp = split "~", $x;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
76 my $total = $tmp[0] + $tmp[1];
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
77 my $totalGB = $total / 1024 / 1024;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
78 return $totalGB;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
79 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
80 elsif( $^O =~ m/darwin/i){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
81 my ($tmp) = qx(vm_stat | grep size);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
82 $tmp =~ /.*size of (\d+) bytes.*/;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
83 my $page_size = $1;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
84 ($tmp) = qx(vm_stat | grep free);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
85 $tmp =~ /[^0-9]+(\d+).*/;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
86 my $free_pages = $1;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
87 my $totalGB = ($free_pages * $page_size) / 1024 / 1024 / 1024;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
88 return $totalGB;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
89 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
90 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
91
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
92 # estExpCov
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
93 # it returns the expected coverage of short reads from an assembly by
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
94 # performing a math mode on the stats.txt file supplied.. It looks at
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
95 # all the short_cov? columns.. Uses minimum contig length and minimum coverage.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
96 # needs the stats.txt file path and name, and the k-value used in the assembly.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
97 # Original algorithm by Torsten Seemann 2009 under the GPL.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
98 # Adapted by Simon Gladman 2009.
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
99 # It does a weighted mode...
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
100
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
101 sub estExpCov {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
102 use List::Util qw(max);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
103 my $file = shift;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
104 my $kmer = shift;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
105 my $minlen = 3 * $kmer - 1;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
106 my $mincov = 2;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
107 my $fh;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
108 unless ( open IN, $file ) {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
109 croak "Unable to open $file for exp_cov determination.\n";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
110 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
111 my @cov;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
112 while (<IN>) {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
113 chomp;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
114 my @x = split m/\t/;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
115 my $len = scalar @x;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
116 next unless @x >= 7;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
117 next unless $x[1] =~ m/^\d+$/;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
118 next unless $x[1] >= $minlen;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
119
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
120 #add all the short_cov columns..
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
121 my $cov = 0;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
122 for(my $i = 5; $i < $len; $i += 2){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
123 if($x[$i] =~ /\d/){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
124 $cov += $x[$i];
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
125 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
126 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
127 next unless $cov > $mincov;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
128 push @cov, ( ( int($cov) ) x $x[1] );
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
129 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
130
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
131 my %freq_of;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
132 map { $freq_of{$_}++ } @cov;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
133 my $mode = 0;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
134 $freq_of{$mode} = 0; # sentinel
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
135 for my $x ( keys %freq_of ) {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
136 $mode = $x if $freq_of{$x} > $freq_of{$mode};
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
137 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
138 return $mode;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
139 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
140
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
141 # estVelvetMemUse
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
142 # returns the estimated memory usage for velvet in GB
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
143
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
144 sub estVelvetMemUse {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
145 my ($readsize, $genomesize, $numreads, $k) = @_;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
146 my $velvetgmem = -109635 + 18977*$readsize + 86326*$genomesize + 233353*$numreads - 51092*$k;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
147 my $out = ($velvetgmem/1024) / 1024;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
148 return $out;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
149 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
150
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
151 # getReadSizeNum
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
152 # returns the number of reads and average size in the short and shortPaired categories...
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
153
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
154 sub getReadSizeNum {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
155 my $f = shift;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
156 my %reads;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
157 my $num = 0;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
158 my $currentfiletype = "fasta";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
159 #first pull apart the velveth string and get the short and shortpaired filenames...
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
160 my @l = split /\s+/, $f;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
161 my $i = 0;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
162 foreach (@l){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
163 if(/^-/){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
164 if(/^-fasta/){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
165 $currentfiletype = "fasta";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
166 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
167 elsif(/^-fastq/){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
168 $currentfiletype = "fastq";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
169 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
170 elsif(/(-eland)|(-gerald)|(-fasta.gz)|(-fastq.gz)/) {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
171 croak "Cannot estimate memory usage from file types other than fasta or fastq..\n";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
172 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
173 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
174 elsif(-r $_){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
175 my $file = $_;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
176 if($currentfiletype eq "fasta"){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
177 my $x = `grep -c "^>" $file`;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
178 chomp($x);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
179 $num += $x;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
180 my $l = &getReadLength($file, 'Fasta');
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
181 $reads{$l} += $x;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
182 print STDERR "File: $file has $x reads of length $l\n";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
183 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
184 else {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
185 my $x = `grep -c "^@" $file`;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
186 chomp($x);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
187 $num += $x;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
188 my $l = &getReadLength($file, 'Fastq');
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
189 $reads{$l} += $x;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
190 print STDERR "File: $file has $x reads of length $l\n";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
191 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
192 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
193 $i ++;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
194 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
195 my $totlength = 0;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
196 foreach my $k (keys %reads){
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
197 $totlength += ($reads{$k} * $k);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
198 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
199
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
200
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
201 my @results;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
202 push @results, floor($totlength/$num);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
203 push @results, ($num/1000000);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
204 printf STDERR "Total reads: %.1f million. Avg length: %.1f\n",($num/1000000), ($totlength/$num);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
205 return @results;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
206 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
207
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
208 # getReadLength - returns the length of the first read in a file of type fasta or fastq..
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
209 #
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
210 sub getReadLength {
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
211 my ($f, $t) = @_;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
212 my $sio = Bio::SeqIO->new(-file => $f, -format => $t);
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
213 my $s = $sio->next_seq() or croak "Something went bad while reading file $f!\n";
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
214 return $s->length;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
215 }
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
216
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
217 return 1;
7363fee7f20c Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
konradpaszkiewicz
parents:
diff changeset
218