changeset 2:ceb6adffc4e2 draft

Uploaded
author plus91-technologies-pvt-ltd
date Wed, 04 Jun 2014 08:00:42 -0400
parents 1c4234620728
children 16c74d8815d0
files 2.4/LICENCE.txt 2.4/binary/String-Approx-3.27.tar.gz 2.4/binary/Text-LevenshteinXS-0.03.tar.gz 2.4/install.pl 2.4/lib/LevD.pm 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/String/Approx.pm 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/Text/LevenshteinXS.pm 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/.packlist 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/Approx.bs 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/Approx.so 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/.packlist 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/LevenshteinXS.bs 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/LevenshteinXS.so 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/autosplit.ix 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/perllocal.pod 2.4/library/LevD.pm 2.4/logs/StringApprox.err 2.4/logs/StringApprox.out 2.4/logs/levD.err 2.4/logs/levD.errmake 2.4/logs/levD.out 2.4/man/man3/String::Approx.3pm 2.4/man/man3/Text::LevenshteinXS.3pm 2.4/progress.txt 2.4/script/Annotate_SoftSearch.pl 2.4/script/Bam2pair.pl 2.4/script/Check_integration.sh 2.4/script/Extract_nSC.pl 2.4/script/Merge_SV.pl 2.4/script/Merge_Soft.pl 2.4/script/ReadCluster.pl 2.4/script/SoftSearch.multi.pl 2.4/script/SoftSearch.pl 2.4/script/SoftSearch_Filter.pl 2.4/script/Subset_targets.sh 2.4/script/blat_parse.pl 2.4/script/cluster.pair.pl 2.4/script/direction_filter.pl 2.4/script/reduce_redundancy.pl 2.4/script/run_blat.pl 2.4/script/standalone_blat2.pl 2.4/src/Annotate_SoftSearch.pl 2.4/src/Bam2pair.pl 2.4/src/Check_integration.sh 2.4/src/Extract_nSC.pl 2.4/src/Merge_SV.pl 2.4/src/Merge_Soft.pl 2.4/src/ReadCluster.pl 2.4/src/SoftSearch.multi.pl 2.4/src/SoftSearch.pl 2.4/src/SoftSearch_Filter.pl 2.4/src/Subset_targets.sh 2.4/src/blat_parse.pl 2.4/src/cluster.pair.pl 2.4/src/direction_filter.pl 2.4/src/reduce_redundancy.pl 2.4/src/run_blat.pl 2.4/src/standalone_blat2.pl 
fasta_indexes.loc.sample softsearch/SoftSearch.pl softsearch/softsearch.xml tool_data_table_conf.xml.sample
diffstat 58 files changed, 12341 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/LICENCE.txt	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,59 @@
+
+SoftSearch software terms of use (download)
+
+Mayo Foundation for Medical Education and Research (MFMER), has created SoftSearch software that identifies structural variations from whole-genome sequencing data ("Software").
+
+Using the Software indicates your agreement to be bound by the terms of this Software User Agreement ("Agreement"). Absent your agreement to the terms below, you (the "End User") have no rights to hold or use the Software whatsoever.
+
+MFMER agrees to grant hereunder the limited non-exclusive license to End User for the use of the Software in performance of End User's internal, non-profit research at End User's institution on the following terms and conditions:
+
+1. NO REDISTRIBUTION. Software remains the property of MFMER and End User shall not publish, distribute, or otherwise transfer or make available the Software to any other party.
+
+2. NO COMMERCIAL USE. The End User shall not use Software for commercial purposes and any such use hereunder this license is explicitly prohibited. This includes, but is not limited to, use of Software in fee for service core laboratories or to provide services to, or commercial sponsored research for third parties for a fee. If End User wishes to use Software for any commercial purposes, End User will need to execute a separate licensing agreement with the MFMER. Requests for the use of Software for commercial purposes, please contact:
+
+To MAYO:	Mayo Foundation for Medical Education and Research
+Mayo Clinic Ventures - BB4
+200 First Street SW
+Rochester, Minnesota 55905-0001
+Attn:  Ventures Operations
+Phone:  (507)293-3900
+Facsimile:  (507) 284-5410
+Email:  mayoclinicventures@mayo.edu 
+Fed Tax ID: 41-1506440
+
+3. OWNERSHIP AND COPYRIGHT NOTICE. MFMER owns all intellectual property in the Software. End User shall gain no ownership to the Software. End User shall not remove or delete and shall retain in the Software and any modifications to Software, the copyright, trademark, or other notices pertaining to Software as provided with the Software.
+
+4. FEEDBACK. In order to improve the Software, comments from End Users may be useful. End User agrees to provide MFMER with feedback on the End User's use of the Software e.g. any bugs in the Software, the user experience etc. MFMER is permitted to use such information provided by End User in making changes and improvements to the Software without compensation or an accounting to End User.
+
+5. NON ASSERT. End User acknowledges that MFMER may develop modifications to Software that may be based on the feedback provided by End User under Section 4. MFMER shall not be constrained in any way by End User regarding MFMER's use of such information. End User acknowledges the right of MFMER to prepare, publish and or use modifications to the Software that may be substantially similar or functionally equivalent to End User's modifications, and/or improvements if any. In the event that End User obtains patent protection for any modification or improvement to Software, End User agrees not to allege or enjoin infringement of End User's patent against MFMER, or any of its researchers, medical or research staff, officers, directors, and employees.
+
+6. PUBLICATION & ATTRIBUTION. End User has the right to publish, present, or share results from the use of the Software. If utilization of the Software results in outcomes which will be published, End User shall acknowledge MFMER as the provider of the Software, shall specify the version of the Software used and cite the following reference:
+
+_______________________________
+_______________________________
+
+7. NO CLINICAL USE. The Software is for academic research use only and it is not approved for clinical, diagnostic or treatment purposes. End User shall not use the Software for clinical, diagnostic or treatment purposes and any such uses are expressly prohibited.
+
+8. NO WARRANTIES. THE SOFTWARE IS EXPERIMENTAL IN NATURE AND IS MADE AVAILABLE "AS IS," WITHOUT OBLIGATION BY MFMER TO PROVIDE ACCOMPANYING SERVICES OR SUPPORT. ANY RISK ASSOCIATED WITH USE OF THE SOFTWARE IS AT THE SOLE RISK OF INSTITUTION AND END USER. MFMER MAKES NO WARRANTIES, EXPRESSED OR IMPLIED, REGARDING THAT QUALITY OF ANY PRODUCT PRODUCED UNDER THIS AGREEMENT. UNDER NO CIRCUMSTANCES, SHALL MFMER BE LIABLE FOR INCIDENTAL, SPECIAL, INDIRECT, DIRECT OR CONSEQUENTIAL DAMAGES OR LOSS OF PROFITS, INTERRUPTION OF BUSINESS, OR RELATED EXPENSES WHICH MAY ARISE FROM THE USE OF THE SOFTWARE, INCLUDING BUT NOT LIMITED TO THOSE RESULTING FROM DEFECT IN SOFTWARE AND/OR DOCUMENTATION, OR LOSS OR INACCURACY OF DATA OF ANY KIND.
+
+MFMER EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES CONCERNING SOFTWARE, INCLUDING ANY WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR ANY PARTICULAR PURPOSE, AND WARRANTIES OF PERFORMANCE, OR WARRANTY OF NON-INFRINGEMENT, AND ANY WARRANTY THAT MIGHT OTHERWISE ARISE FROM COURSE OF DEALING OR USAGE OF TRADE. NO WARRANTY IS EITHER EXPRESS OR IMPLIED WITH RESPECT TO THE USE OF THE SOFTWARE.
+
+9. INDEMNIFICATION. To the extent permitted by law, End User shall indemnify, defend and hold harmless MFMER, its corporate affiliates, current or future directors, trustees, officers, faculty, medical and professional staff, employees, students and agents and their respective successors, heirs and assigns (the "Indemnitees"), against any liability, damage, loss or expense (including reasonable attorney's fees and expenses of litigation) incurred by or imposed upon the Indemnitees or any one of them in connection with any claims, suits, actions, demands or judgments arising from End User's use of Software.  MFMER and MFMER's Affiliates shall have no obligation to indemnify End User hereunder.
+
+This Section 9 indemnification clause shall survive expiration or termination of this Agreement.
+
+10. GOVERNING LAW. This Agreement is made and performed in Minnesota.  The terms and conditions of this Agreement, as well as all disputes arising under or relating to this Agreement, shall be governed by Minnesota law, specifically excluding its choice-of-law principles, except that the interpretation, validity and enforceability of the Patent Rights will be governed by the patent laws of the country in which the patent application is pending or issued.  This is not an Agreement for the sale of goods and as such Article 2 of the Uniform Commercial Code as enacted in Minnesota does not apply.
+
+11. NON-USE OF NAME. Other than permitted under Sections 3 and 6, End User will not use for publicity, promotion or otherwise, any logo, name, trade name, service mark or trademark of MAYO or its Affiliates, including, but not limited to, the terms "MAYO®," "MAYO Clinic®" and the triple shield MAYO logo, or any simulation, abbreviation or adaptation of the same, or the name of any MAYO employee or agent, without MAYO's prior, written, express consent.  MAYO may withhold such consent in MAYO's absolute discretion.  With regard to the use of MAYO's name, all requests for approval pursuant to this Section must be submitted to the MAYO Clinic Public Affairs Business Relations Group, at the following e-mail address: PublicAffairsBR@MAYO.edu at least five (5) business days prior to the date on which a response is needed.
+
+IN WITNESS WHEREOF, the Parties have caused this Agreement to be executed by their duly authorized representatives. 
+
+MAYO FOUNDATION FOR MEDICAL		COMPANY
+EDUCATION AND RESEARCH 
+
+By	_______________________			By	___________________________
+	Name:						Name:
+	Title:						Title:
+
+Date:	_______________________			Date:	______________________
+
Binary file 2.4/binary/String-Approx-3.27.tar.gz has changed
Binary file 2.4/binary/Text-LevenshteinXS-0.03.tar.gz has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/install.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,316 @@
+#!/usr/bin/perl
+
+=head1 NAME
+   install.pl
+
+=head1 SYNOPSIS
+    USAGE: install.pl --prefix=/location/of/install/dir
+
+=head1 OPTIONS
+
+B<--prefix, -p>
+	Required. Prefix location where package will be installed.
+
+B<--perl_exec, -e>
+	Optional.  If perl exec is other than /usr/bin/perl please specify location of perl install
+
+B<--help,-h>
+
+
+=head1  DESCRIPTION
+	Install package
+
+=head1  INPUT
+
+=head1  OUTPUT
+
+
+=head1  CONTACT
+  bjaysheel@gmail.com
+
+
+=head1 EXAMPLE
+	./install.pl --prefix=/prefix
+
+=cut
+
+use strict;
+use warnings;
+use Cwd;
+use Data::Dumper;
+use Pod::Usage;
+use Getopt::Long qw(:config no_ignore_case no_auto_abbrev pass_through);
+
+my %options = ();
+my $results = GetOptions (\%options,
+                          'prefix|p=s',
+						  'perl_exec|e=s',
+						  'help|h') || pod2usage();
+
+## display documentation
+if( $options{'help'} ){
+    pod2usage( {-exitval => 0, -verbose => 2, -output => \*STDERR} );
+}
+
+#############################################################################
+#### make sure everything passed was peachy
+&check_parameters(\%options);
+
+#### print time now.
+timestamp();
+
+my $this = {};
+my $progress = {};
+my $cmd = "";
+
+#### get current working dir
+$this->{source} = getcwd();
+
+$progress = getProgress();
+
+#### make logs dir
+$cmd = "mkdir -p $options{prefix}/logs";
+execute_cmd($cmd);
+
+#### install libraries required for a successful run
+install_libraries();
+
+#### unpack the binary dir containing all packages to be installed,
+#### which are required for a successful run
+print STDERR "\n\nInstalling binaries...\n";
+
+#### install each package in binary folder.
+my @packages = qw(stringApprox levD);
+
+foreach my $tool (@packages) {
+	if ((exists $progress->{$tool}) && ($progress->{$tool})){
+		print STDERR "\t$tool already installed. Skipping...\n";
+	} else {
+		print STDERR "\tInstalling $tool...\n";
+
+		#### unpack and install each tool
+		__PACKAGE__->can("install_${tool}")->();
+	}
+}
+
+#### copy source code and update paths for perl and libs
+install_source();
+
+#### completion message
+print "\n\n\tSoftSearch installation complete.  Use following command to initiate a test run\n";
+print "\n\tperl $options{prefix}/src/SoftSearch.pl -f {GENOME} -b {BAM_FILE}\n\n";
+
+#### print time now
+timestamp();
+
+#############################################################################
+sub check_parameters {
+    my $options = shift;
+
+	my @required = qw(prefix);
+
+	foreach my $key (@required) {
+		unless ($options{$key}) {
+			print STDERR "ARG: $key is required\n";
+			pod2usage({-exitval => 2,  -message => "error message", -verbose => 1, -output => \*STDERR});
+			exit(-1);
+		}
+	}
+
+	$options{'perl_exec'} = "/usr/bin/perl" unless($options{'perl_exec'});
+}
+
+#############################################################################
+sub getProgress {
+	my $hash = {};
+	my @sofar;
+
+	#### if file exists get progress so far.
+	if (-s "$options{prefix}/progress.txt") {
+		open(FHD, "<", "$options{prefix}/progress.txt") or die "Could not open file to read $options{prefix}/progress.txt";
+		while(<FHD>){
+			chomp $_;
+			push @sofar, $_;
+		}
+		close(FHD);
+
+		map { $hash->{$1} = $2 if( /([^=]+)\s*=\s*([^=]+)/ ) } @sofar;
+	}
+
+	#### return hash
+	return $hash;
+}
+
+#############################################################################
+sub setProgress {
+	my $hash = shift;
+
+	open(OUT, ">", "$options{prefix}/progress.txt") or die "Could not open file to write $options{prefix}/progress.txt";
+
+	foreach my $key (keys %{$hash}){
+		print OUT $key."=".$hash->{$key}."\n";
+	}
+
+	close(OUT);
+}
+
+#############################################################################
+sub install_libraries {
+	if ((exists $progress->{libraries}) && ($progress->{libraries})){
+		print STDERR "\tLibraries already installed. Skipping...\n";
+		return;
+	}
+
+	print STDERR "\n\nInstalling libraries...\n\n";
+	chdir($this->{source});
+
+	$cmd = "cp -r $this->{source}/library $options{prefix}/lib";
+	execute_cmd($cmd);
+
+	$progress->{libraries} = 1;
+	setProgress($progress);
+}
+
+#############################################################################
+sub install_stringApprox {
+	#### check and install dir
+	my $dir = "$options{prefix}/lib";
+	my $cmd = "";
+
+	$cmd = "mkdir -p $dir";
+	execute_cmd($cmd);
+
+	$cmd = "tar -zxvf $this->{source}/binary/String-Approx-3.27.tar.gz -C $this->{source}/binary";
+	execute_cmd($cmd);
+
+	chdir("$this->{source}/binary/String-Approx-3.27");
+	$cmd = "perl Makefile.PL INSTALL_BASE=$options{prefix}";
+	$cmd .= " 1>$options{prefix}/logs/StringApprox.out";
+	$cmd .= " 2>$options{prefix}/logs/StringApprox.err";
+	execute_cmd($cmd);
+
+	$cmd = "make";
+	$cmd .= " 1>>$options{prefix}/logs/StringApprox.out";
+	$cmd .= " 2>>$options{prefix}/logs/StringApprox.err";
+	execute_cmd($cmd);
+
+	$cmd = "make install";
+	$cmd .= " 1>>$options{prefix}/logs/StringApprox.out";
+	$cmd .= " 2>>$options{prefix}/logs/StringApprox.err";
+	execute_cmd($cmd);
+
+
+	chdir("$this->{source}/binary");
+	$cmd = "rm -rf $this->{source}/binary/String-Approx-3.27";
+	execute_cmd($cmd);
+
+	$progress->{stringApprox} = 1;
+	setProgress($progress);
+}
+
+#############################################################################
+sub install_levD {
+	#### check and install dir
+	my $dir = "$options{prefix}/lib";
+	my $cmd = "";
+
+	$cmd = "mkdir -p $dir";
+	execute_cmd($cmd);
+
+	$cmd = "tar -zxvf $this->{source}/binary/Text-LevenshteinXS-0.03.tar.gz -C $this->{source}/binary";
+	execute_cmd($cmd);
+
+	chdir("$this->{source}/binary/Text-LevenshteinXS-0.03");
+	$cmd = "perl Makefile.PL INSTALL_BASE=$options{prefix}";
+	$cmd .= " 1>$options{prefix}/logs/levD.out";
+	$cmd .= " 2>$options{prefix}/logs/levD.err";
+	execute_cmd($cmd);
+
+	$cmd = "make";
+	$cmd .= " 1>>$options{prefix}/logs/levD.out";
+	$cmd .= " 2>>$options{prefix}/logs/levD.err";
+	execute_cmd($cmd);
+
+	$cmd = "make install";
+	$cmd .= " 1>>$options{prefix}/logs/levD.out";
+	$cmd .= " 2>>$options{prefix}/logs/levD.err";
+	execute_cmd($cmd);
+
+	chdir("$this->{source}/binary");
+	$cmd = "rm -rf $this->{source}/binary/Text-LevenshteinXS-0.03";
+	execute_cmd($cmd);
+
+	$progress->{levD} = 1;
+	setProgress($progress);
+}
+
+#############################################################################
+sub install_source {
+	if ((exists $progress->{source}) && ($progress->{source})){
+		print STDERR "\tSource already installed. Skipping...\n";
+		return;
+	}
+
+	print STDERR "\n\nInstalling source...\n\n";
+
+	#### create dir to store source code
+	$cmd = "mkdir -p $options{prefix}/src";
+	execute_cmd($cmd);
+
+	$cmd = "cp -r $this->{source}/script/* $options{prefix}/src/.";
+	execute_cmd($cmd);
+
+	#### make sure all scripts are executable
+	$cmd = "chmod -R +x $options{prefix}/src";
+	execute_cmd($cmd);
+
+	#### replace /usr/local/biotools/perl/5.10.0/bin/perl with perl_exec
+	$options{perl_exec} =~ s/\//\\\//g;
+	$cmd = "find $options{prefix}/src -name \"*.pl\" -print";
+	$cmd .= " -exec sed -i 's/#!\\/usr\\/local\\/biotools\\/perl\\/5.10.0\\/bin\\/perl/#!$options{perl_exec}/' {} \\;";
+	execute_cmd($cmd);
+
+	#### check if perl exec location is other than /usr/bin/perl (slashes were escaped above)
+	if ($options{perl_exec} ne "\\/usr\\/bin\\/perl") {
+		$cmd = "find $options{prefix}/src -name \"*.pl\" -print";
+		$cmd .= " -exec sed -i 's/#!\\/usr\\/bin\\/perl/#!$options{perl_exec}/' {} \\;";
+		execute_cmd($cmd);
+	}
+
+	#### replace library references to local install
+	my $lib = "$options{prefix}/lib";
+	$lib =~ s/\//\\\//g;
+
+        $cmd = "find $options{prefix}/src -name \"*.pl\" -print";
+        $cmd .= " -exec sed -i 's/\\/data2\\/bsi\\/reference\\/softsearch\\/lib/$lib/' {} \\;";
+        execute_cmd($cmd);
+        
+        $cmd = "find $options{prefix}/lib -name \"LevD.pm\" -print";
+        $cmd .= " -exec sed -i 's/\\/data2\\/bsi\\/reference\\/softsearch\\/lib/$lib/' {} \\;";
+        execute_cmd($cmd);
+
+	$progress->{source} = 1;
+	setProgress($progress);
+}
+
+#############################################################################
+sub execute_cmd {
+	my $cmd = shift;
+
+	system($cmd);
+
+	#while (( $? >> 8 ) != 0 ){
+	#	print STDERR "ERROR: Following command failed to execute. Exiting execution of workflow\n$cmd\n";
+	#	exit(-1);
+	#}
+}
+
+#############################################################################
+sub timestamp {
+	my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
+    my @weekDays = qw(Sun Mon Tue Wed Thu Fri Sat);
+    my ($second, $minute, $hour, $dayOfMonth, $month, $yearOffset, $dayOfWeek, $dayOfYear, $daylightSavings) = localtime();
+    my $year = 1900 + $yearOffset;
+    my $theTime = sprintf("%02d:%02d:%02d, %s %s %d, %d", $hour, $minute, $second, $weekDays[$dayOfWeek], $months[$month], $dayOfMonth, $year);
+    print "Time now: " . $theTime."\n";
+}
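Note on execute_cmd in install.pl above: it calls system() and discards the result, and the exit-status check that follows is commented out, so a failed tar, make or cp does not stop the install. A minimal sketch of how that status could be decoded, assuming a hypothetical helper named run_checked (not part of this changeset) and a POSIX shell on the PATH; whether the installer should abort or only warn on failure is left open here:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of install.pl: run a command through
# system() and return its exit code. Dies if the command could not be
# started or was terminated by a signal.
sub run_checked {
    my ($cmd) = @_;

    system($cmd);

    if ($? == -1) {
        die "Failed to start '$cmd': $!\n";
    }
    if ($? & 127) {
        die sprintf("'%s' died with signal %d\n", $cmd, $? & 127);
    }

    # The exit code of the child is stored in the high byte of $?.
    return $? >> 8;
}

# Illustrative call: the quoted argument forces execution via the shell.
my $status = run_checked("sh -c 'exit 3'");
print "exit status: $status\n";
```

A caller could compare the returned code with 0 and decide per step whether to stop, which keeps the policy out of the helper.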
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/lib/LevD.pm	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,80 @@
+package LevD;
+
+use lib "/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5";
+use strict;
+use warnings;
+use Data::Dumper;
+use String::Approx 'adist';
+use String::Approx 'adistr';
+use String::Approx 'aindex';
+
+my $WINDOW_SIZE = 100;
+
+sub new {
+	my ($class, $file) = @_;
+    my $self = {};
+
+ 	bless($self,$class);
+	$self->init();
+
+	return $self;
+}
+
+sub init {
+	my ($self) = @_;
+
+	#### default values.
+	$self->{index} = 0;
+	$self->{relative_edit_dist} = 0;
+	$self->{edit_dist} = 0;
+}
+
+sub search {
+	my ($self, $clip, $chr, $start, $stop, $ref) = @_;
+
+	if (! -s $ref) {
+		die "ERROR: Reference file $ref not found\n";
+	}
+
+	#### extract seq from reference file.
+	my $target = $chr .":". $start ."-". $stop;
+	my $cmd = "samtools faidx $ref $target";
+
+	my @output = $self->_run_system_cmd($cmd);
+
+	#### depending on ref file format seq could be on multiple lines
+	#### concatenate all except for the header in one line.
+	#### e.g:
+	#### >chr1:8222999-8223099
+	#### GGTGCAATCATAGCTCACTAAGCTTCAACCTCAAGAGATCCTCCCACCTCAGCCTCCCAG
+	#### GTAGCTGGGACTACAGGCAAATGCCATGACACCTAGCTAAT
+	my $seq = join("", @output[1..$#output]);
+
+	#### remove new line character
+	$seq =~ s/\n//g;
+
+	#### find number of mismatches and start index
+	#### of clip to be searched against target seq.
+	$self->{relative_edit_dist} = adistr($clip, $seq);
+	$self->{edit_dist} = adist($clip, $seq);
+	$self->{index} = aindex($clip, $seq);
+}
+
+sub _run_system_cmd {
+	my ($self, $cmd) = @_;
+	my @cmd_output;
+
+	eval {
+		@cmd_output = qx{$cmd 2>&1};
+		if ( $? != 0 ) {
+			die "@cmd_output";
+		}
+	};
+	if ($@) {
+		die "Error executing command $cmd: $@";
+	}
+
+	return @cmd_output;
+}
+
+1;
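Note on LevD::search above: it stores three values from String::Approx, namely adist (an edit distance), adistr (that distance divided by the length of the first argument, here the clip) and aindex (the match position). As a rough illustration of what the first two measure, here is a plain-Perl Levenshtein sketch. It is a swapped-in textbook dynamic-programming algorithm, not the String::Approx implementation: it compares whole strings, whereas adist() is built on approximate substring matching, so the numbers can differ. The names levenshtein and relative_distance are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Textbook Levenshtein distance between two whole strings, keeping only
# the previous and current rows of the dynamic-programming table.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);

    for my $i (1 .. length $s) {
        my @curr = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,          # deletion
                $curr[$j - 1] + 1,      # insertion
                $prev[$j - 1] + $cost,  # substitution or match
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Distance divided by the pattern length, mirroring how adistr()
# normalises by the length of its first argument.
sub relative_distance {
    my ($pattern, $text) = @_;
    my $len = length $pattern;
    return undef unless $len;
    return levenshtein($pattern, $text) / $len;
}

printf "%d\n",   levenshtein("GATTACA", "GACTATA");
printf "%.3f\n", relative_distance("GATTACA", "GACTATA");
```

For a soft-clipped read compared against a 100 bp reference window, a substring-aware measure such as adist() is the better fit; the sketch only makes the normalisation step concrete.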
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/String/Approx.pm	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,928 @@
+package String::Approx;
+
+require v5.8.0;
+
+$VERSION = '3.27';
+
+use strict;
+local $^W = 1;
+
+use Carp;
+use vars qw($VERSION @ISA @EXPORT @EXPORT_OK);
+
+require Exporter;
+require DynaLoader;
+
+@ISA = qw(Exporter DynaLoader);
+
+@EXPORT_OK = qw(amatch asubstitute aindex aslice arindex
+		adist adistr adistword adistrword);
+
+bootstrap String::Approx $VERSION;
+
+my $CACHE_MAX = 1000;	# high water mark
+my $CACHE_PURGE = 0.75;	# purge this much of the least used
+my $CACHE_N_PURGE;	# purge this many of the least used
+
+sub cache_n_purge () {
+    $CACHE_N_PURGE = $CACHE_MAX * $CACHE_PURGE;
+    $CACHE_N_PURGE = 1 if $CACHE_N_PURGE < 1;
+    return $CACHE_N_PURGE;
+}
+
+cache_n_purge();
+
+sub cache_max (;$) {
+    if (@_ == 0) {
+	return $CACHE_MAX;
+    } else {
+	$CACHE_MAX = shift;
+    }
+    $CACHE_MAX = 0 if $CACHE_MAX < 0;
+    cache_n_purge();
+}
+
+sub cache_purge (;$) {
+    if (@_ == 0) {
+	return $CACHE_PURGE;
+    } else {
+	$CACHE_PURGE = shift;
+    }
+    if ($CACHE_PURGE < 0) {
+	$CACHE_PURGE = 0;
+    } elsif ($CACHE_PURGE > 1) {
+	$CACHE_PURGE = 1;
+    }
+    cache_n_purge();
+}
+
+my %_simple;
+my %_simple_usage_count;
+
+sub _cf_simple {
+    my $P = shift;
+
+    my @usage =
+	sort { $_simple_usage_count{$a} <=> $_simple_usage_count{$b} }
+             grep { $_ ne $P }
+                  keys %_simple_usage_count;
+	    
+    # Make room, delete the least used entries.
+    $#usage = $CACHE_N_PURGE - 1;
+	    
+    delete @_simple_usage_count{@usage};
+    delete @_simple{@usage};
+}
+
+sub _simple {
+    my $P = shift;
+
+    my $_simple = new(__PACKAGE__, $P);
+
+    if ($CACHE_MAX) {
+	$_simple{$P} = $_simple unless exists $_simple{$P};
+
+	$_simple_usage_count{$P}++;
+
+	if (keys %_simple_usage_count > $CACHE_MAX) {
+	    _cf_simple($P);
+	}
+    }
+
+    return ( $_simple );
+}
+
+sub _parse_param {
+    use integer;
+
+    my ($n, @param) = @_;
+    my %param;
+
+    foreach (@param) {
+        while ($_ ne '') {
+	    s/^\s+//;
+            if (s/^([IDS]\s*)?(\d+)(\s*%)?//) {
+                my $k = defined $3 ? (($2-1) * $n) / 100 + ($2 ? 1 : 0) : $2;
+
+		if (defined $1) {
+		    $param{$1} = $k;
+		} else {
+		    $param{k}  = $k;
+		}
+	    } elsif (s/^initial_position\W+(\d+)\b//) {
+		$param{'initial_position'} = $1;
+	    } elsif (s/^final_position\W+(\d+)\b//) {
+		$param{'final_position'} = $1;
+	    } elsif (s/^position_range\W+(\d+)\b//) {
+		$param{'position_range'} = $1;
+	    } elsif (s/^minimal_distance\b//) {
+		$param{'minimal_distance'} = 1;
+            } elsif (s/^i//) {
+                $param{ i } = 1;
+            } elsif (s/^g//) {
+                $param{ g } = 1;
+            } elsif (s/^\?//) {
+                $param{'?'} = 1;
+            } else {
+                warn "unknown parameter: '$_'\n";
+                return;
+            }
+        }
+    }
+
+    return %param;
+}
+
+my %_param_key;
+my %_parsed_param;
+
+my %_complex;
+my %_complex_usage_count;
+
+sub _cf_complex {
+    my $P = shift;
+
+    my @usage =
+	sort { $_complex_usage_count{$a} <=>
+		   $_complex_usage_count{$b} }
+             grep { $_ ne $P }
+                  keys %_complex_usage_count;
+	    
+    # Make room, delete the least used entries.
+    $#usage = $CACHE_N_PURGE - 1;
+	    
+    delete @_complex_usage_count{@usage};
+    delete @_complex{@usage};
+}
+
+sub _complex {
+    my ($P, @param) = @_;
+    unshift @param, length $P;
+    my $param = "@param";
+    my $_param_key;
+    my %param;
+    my $complex;
+    my $is_new;
+
+    unless (exists $_param_key{$param}) {
+	%param = _parse_param(@param);
+	$_parsed_param{$param} = { %param };
+	$_param_key{$param} = join(" ", %param);
+    } else {
+	%param = %{ $_parsed_param{$param} };
+    }
+
+    $_param_key = $_param_key{$param};
+
+    if ($CACHE_MAX) {
+	if (exists $_complex{$P}->{$_param_key}) {
+	    $complex = $_complex{$P}->{$_param_key};
+	}
+    }
+
+    unless (defined $complex) {
+	if (exists $param{'k'}) {
+	    $complex = new(__PACKAGE__, $P, $param{k});
+	} else {
+	    $complex = new(__PACKAGE__, $P);
+	}
+	$_complex{$P}->{$_param_key} = $complex if $CACHE_MAX;
+	$is_new = 1;
+    }
+
+    if ($is_new) {
+	$complex->set_greedy unless exists $param{'?'};
+
+	$complex->set_insertions($param{'I'})
+	    if exists $param{'I'};
+	$complex->set_deletions($param{'D'})
+	    if exists $param{'D'};
+	$complex->set_substitutions($param{'S'})
+	    if exists $param{'S'};
+	
+	$complex->set_caseignore_slice
+	    if exists $param{'i'};
+
+	$complex->set_text_initial_position($param{'initial_position'})
+	    if exists $param{'initial_position'};
+
+	$complex->set_text_final_position($param{'final_position'})
+	    if exists $param{'final_position'};
+
+	$complex->set_text_position_range($param{'position_range'})
+	    if exists $param{'position_range'};
+
+	$complex->set_minimal_distance($param{'minimal_distance'})
+	    if exists $param{'minimal_distance'};
+    }
+
+    if ($CACHE_MAX) {
+	$_complex_usage_count{$P}->{$_param_key}++;
+
+	# If our cache overfloweth.
+	if (scalar keys %_complex_usage_count > $CACHE_MAX) {
+	    _cf_complex($P);
+	}
+    }
+
+    return ( $complex, %param );
+}
+
+sub cache_disable {
+    cache_max(0);
+}
+
+sub cache_flush_all {
+    my $old_purge = cache_purge();
+    cache_purge(1);
+    _cf_simple('');
+    _cf_complex('');
+    cache_purge($old_purge);
+}
+
+sub amatch {
+    my $P = shift;
+    return 1 unless length $P; 
+    my $a = ((@_ && ref $_[0] eq 'ARRAY') ?
+		 _complex($P, @{ shift(@_) }) : _simple($P))[0];
+
+    if (@_) {
+        if (wantarray) {
+            return grep { $a->match($_) } @_;
+        } else {
+            foreach (@_) {
+                return 1 if $a->match($_);
+            }
+             return 0;
+        }
+    } 
+    if (defined $_) {
+        if (wantarray) {
+            return $a->match($_) ? $_ : undef;
+        } else {
+	    return 1 if $a->match($_);
+        }
+    } 
+    return $a->match($_) if defined $_;
+
+    warn "amatch: \$_ is undefined: what are you matching?\n";
+    return;
+}
+
+sub _find_substitute {
+    my ($ri, $rs, $i, $s, $S, $rn) = @_;
+
+    push @{ $ri }, $i;
+    push @{ $rs }, $s;
+
+    my $pre = substr($_, 0, $i);
+    my $old = substr($_, $i, $s);
+    my $suf = substr($_, $i + $s);
+    my $new = $S;
+
+    $new =~ s/\$\`/$pre/g;
+    $new =~ s/\$\&/$old/g;
+    $new =~ s/\$\'/$suf/g;
+
+    push @{ $rn }, $new;
+}
+
+sub _do_substitute {
+    my ($rn, $ri, $rs, $rS) = @_;
+
+    my $d = 0;
+    my $n = $_;
+
+    foreach my $i (0..$#$rn) {
+	substr($n, $ri->[$i] + $d, $rs->[$i]) = $rn->[$i];
+	$d += length($rn->[$i]) - $rs->[$i];
+    }
+
+    push @{ $rS }, $n;
+}
+
+sub asubstitute {
+    my $P = shift;
+    my $S = shift;
+    my ($a, %p) =
+	(@_ && ref $_[0] eq 'ARRAY') ?
+	    _complex($P, @{ shift(@_) }) : _simple($P);
+
+    my ($i, $s, @i, @s, @n, @S);
+
+    if (@_) {
+	if (exists $p{ g }) {
+	    foreach (@_) {
+		@s = @i = @n = ();
+		while (($i, $s) = $a->slice_next($_)) {
+		    if (defined $i) {
+			_find_substitute(\@i, \@s, $i, $s, $S, \@n);
+		    }
+		}
+		_do_substitute(\@n, \@i, \@s, \@S) if @n;
+	    }
+	} else {
+	    foreach (@_) {
+		@s = @i = @n = ();
+		($i, $s) = $a->slice($_);
+		if (defined $i) {
+		    _find_substitute(\@i, \@s, $i, $s, $S, \@n);
+		    _do_substitute(\@n, \@i, \@s, \@S);
+		}
+	    }
+	}
+	return @S;
+    } elsif (defined $_) {
+	if (exists $p{ g }) {
+	    while (($i, $s) = $a->slice_next($_)) {
+		if (defined $i) {
+		    _find_substitute(\@i, \@s, $i, $s, $S, \@n);
+		}
+	    }
+	    _do_substitute(\@n, \@i, \@s, \@S) if @n;
+	} else {
+	    ($i, $s) = $a->slice($_);
+	    if (defined $i) {
+		_find_substitute(\@i, \@s, $i, $s, $S, \@n);
+		_do_substitute(\@n, \@i, \@s, \@S);
+	    }
+	}
+	return $_ = $n[0];
+    } else {
+	warn "asubstitute: \$_ is undefined: what are you substituting?\n";
+        return;
+    }
+}
+
+sub aindex {
+    my $P = shift;
+    return 0 unless length $P; 
+    my $a = ((@_ && ref $_[0] eq 'ARRAY') ?
+		 _complex($P, @{ shift(@_) }) : _simple($P))[0];
+
+    $a->set_greedy; # The *first* match, thank you.
+
+    if (@_) {
+	if (wantarray) {
+	    return map { $a->index($_) } @_;
+	} else {
+	    return $a->index($_[0]);
+	}
+    }
+    return $a->index($_) if defined $_;
+
+    warn "aindex: \$_ is undefined: what are you indexing?\n";
+    return;
+}
+
+sub aslice {
+    my $P = shift;
+    return (0, 0) unless length $P; 
+    my $a = ((@_ && ref $_[0] eq 'ARRAY') ?
+		 _complex($P, @{ shift(@_) }) : _simple($P))[0];
+
+    $a->set_greedy; # The *first* match, thank you.
+
+    if (@_) {
+	return map { [ $a->slice($_) ] } @_;
+    }
+    return $a->slice($_) if defined $_;
+
+    warn "aslice: \$_ is undefined: what are you slicing?\n";
+    return;
+}
+
+sub _adist {
+    my $s0 = shift;
+    my $s1 = shift;
+    my ($aslice) = aslice($s0, ['minimal_distance', @_], $s1);
+    my ($index, $size, $distance) = @$aslice;
+    my ($l0, $l1) = map { length } ($s0, $s1);
+    return $l0 <= $l1 ? $distance : -$distance;
+}
+
+sub adist {
+    my $a0 = shift;
+    my $a1 = shift;
+    if (length($a0) == 0) {
+      return length($a1);
+    }
+    if (length($a1) == 0) {
+      return length($a0);
+    }
+    my @m = ref $_[0] eq 'ARRAY' ? @{shift()} : ();
+    if (ref $a0 eq 'ARRAY') {
+	if (ref $a1 eq 'ARRAY') {
+	    return [ map {  adist($a0, $_, @m) } @{$a1} ];
+	} else {
+	    return [ map { _adist($_, $a1, @m) } @{$a0} ];
+	}
+    } elsif (ref $a1 eq 'ARRAY') {
+	return     [ map { _adist($a0, $_, @m) } @{$a1} ];
+    } else {
+	if (wantarray) {
+	    return map { _adist($a0, $_, @m) } ($a1, @_);
+	} else {
+	    return _adist($a0, $a1, @m);
+	}
+    }
+}
+
+sub adistr {
+    my $a0 = shift;
+    my $a1 = shift;
+    my @m = ref $_[0] eq 'ARRAY' ? shift : ();
+    if (ref $a0 eq 'ARRAY') {
+	if (ref $a1 eq 'ARRAY') {
+	    my $l0 = length();
+	    return $l0 ? [ map { adist($a0, $_, @m) }
+			  @{$a1} ] :
+		         [ ];
+	} else {
+	    return [ map { my $l0 = length();
+			   $l0 ? _adist($_, $a1, @m) / $l0 : undef
+		     } @{$a0} ];
+	}
+    } elsif (ref $a1 eq 'ARRAY') {
+	my $l0 = length($a0);
+	return [] unless $l0;
+	return     [ map { _adist($a0, $_, @m) / $l0 } @{$a1} ];
+    } else {
+	my $l0 = length($a0);
+	if (wantarray) {
+	    return map { $l0 ? _adist($a0, $_, @m) / $l0 : undef } ($a1, @_);
+	} else {
+	    return undef unless $l0;
+	    return _adist($a0, $a1, @m) / $l0;
+	}
+    }
+}
+
+sub adistword {
+    return adist($_[0], $_[1], ['position_range=0']);
+}
+
+sub adistrword {
+    return adistr($_[0], $_[1], ['position_range=0']);
+}
+
+sub arindex {
+    my $P = shift;
+    my $l = length $P;
+    return 0 unless $l;
+    my $R = reverse $P;
+    my $a = ((@_ && ref $_[0] eq 'ARRAY') ?
+		 _complex($R, @{ shift(@_) }) : _simple($R))[0];
+
+    $a->set_greedy; # The *first* match, thank you.
+
+    if (@_) {
+	if (wantarray) {
+	    return map {
+		my $aindex = $a->index(scalar reverse());
+		$aindex == -1 ? $aindex : (length($_) - $aindex - $l);
+	    } @_;
+	} else {
+	    my $aindex = $a->index(scalar reverse $_[0]);
+	    return $aindex == -1 ? $aindex : (length($_[0]) - $aindex - $l);
+	}
+    }
+    if (defined $_) {
+	my $aindex = $a->index(scalar reverse());
+	return $aindex == -1 ? $aindex : (length($_) - $aindex - $l);
+    }
+
+    warn "arindex: \$_ is undefined: what are you indexing?\n";
+    return;
+}
+
+1;
+__END__
+=pod
+
+=head1 NAME
+
+String::Approx - Perl extension for approximate matching (fuzzy matching)
+
+=head1 SYNOPSIS
+
+  use String::Approx 'amatch';
+
+  print if amatch("foobar");
+
+  my @matches = amatch("xyzzy", @inputs);
+
+  my @catches = amatch("plugh", ['2'], @inputs);
+
+=head1 DESCRIPTION
+
+String::Approx lets you match and substitute strings approximately.
+With this you can emulate errors: typing errorrs, speling errors,
+closely related vocabularies (colour color), genetic mutations (GAG
+ACT), abbreviations (McScot, MacScot).
+
+NOTE: String::Approx suits the task of B<string matching>, not 
+B<string comparison>, and it works for B<strings>, not for B<text>.
+
+If you want to compare strings for similarity, you probably just want
+the Levenshtein edit distance (explained below); see the Text::Levenshtein
+and Text::LevenshteinXS modules on CPAN.  See also Text::WagnerFischer
+and Text::PhraseDistance.  (There are functions for this in String::Approx,
+e.g. adist(), but their results sometimes differ from the bare Levenshtein
+et al.)
+
+If you want to compare things like text or source code, consisting of
+B<words> or B<tokens> and B<phrases> and B<sentences>, or
+B<expressions> and B<statements>, you should probably use some other
+tool than String::Approx, like for example the standard UNIX diff(1)
+tool, or the Algorithm::Diff module from CPAN.
+
+The measure of B<approximateness> is the I<Levenshtein edit distance>.
+It is the total number of "edits": insertions,
+
+	word world
+
+deletions,
+
+	monkey money
+
+and substitutions
+
+	sun fun
+
+required to transform a string to another string.  For example, to
+transform I<"lead"> into I<"gold">, you need three edits:
+
+	lead gead goad gold
+
+The edit distance of "lead" and "gold" is therefore three, or 75%.
+
+B<String::Approx> uses the Levenshtein edit distance as its measure, but
+String::Approx is not well-suited for comparing strings of different
+length, in other words, if you want a "fuzzy eq", see above.
+String::Approx is more like regular expressions or index(), it finds
+substrings that are close matches.
+
+=head1 MATCH
+
+	use String::Approx 'amatch';
+
+	$matched     = amatch("pattern") 
+	$matched     = amatch("pattern", [ modifiers ])
+
+	$any_matched = amatch("pattern", @inputs) 
+	$any_matched = amatch("pattern", [ modifiers ], @inputs)
+
+	@match       = amatch("pattern") 
+	@match       = amatch("pattern", [ modifiers ])
+
+	@matches     = amatch("pattern", @inputs) 
+	@matches     = amatch("pattern", [ modifiers ], @inputs)
+
+Match B<pattern> approximately.  In list context return the matched
+B<@inputs>.  If no inputs are given, match against the B<$_>.  In scalar
+context return true if I<any> of the inputs match, false if none match.
+
+Notice that the pattern is a string.  Not a regular expression.  None
+of the regular expression notations (^, ., *, and so on) work.  They
+are characters just like the others.  Note-on-note: some limited form
+of I<"regular expressionism"> is planned in future: for example
+character classes ([abc]) and I<any-chars> (.).  But that feature will
+be turned on by a special I<modifier> (just a guess: "r"), so there
+should be no backward compatibility problem.
+
+Notice also that matching is not symmetric.  The inputs are matched
+against the pattern, not the other way round.  In other words: the
+pattern can be a substring, a submatch, of an input element.  An input
+element is always a superstring of the pattern.
+
+=head2 MODIFIERS
+
+With the modifiers you can control the amount of approximateness and
+certain other control variables.  The modifiers are one or more
+strings, for example B<"i">, within a string optionally separated by
+whitespace.  The modifiers are inside an anonymous array: the B<[ ]>
+in the syntax are not notational, they really do mean B<[ ]>, for
+example B<[ "i", "2" ]>.  B<["2 i"]> would be identical.
+
+The implicit default approximateness is 10%, rounded up.  In other
+words: every tenth character in the pattern may be an error, an edit.
+You can explicitly set the maximum approximateness by supplying a
+modifier like
+
+	number
+	number%
+
+Examples: B<"3">, B<"15%">.
+
+Note that C<0%> is not rounded up, it is equal to C<0>.
+
+Using a similar syntax you can separately control the maximum number
+of insertions, deletions, and substitutions by prefixing the numbers
+with I, D, or S, like this:
+
+	Inumber
+	Inumber%
+	Dnumber
+	Dnumber%
+	Snumber
+	Snumber%
+
+Examples: B<"I2">, B<"D20%">, B<"S0">.
+
+You can ignore case (B<"A"> becomes equal to B<"a"> and vice versa)
+by adding the B<"i"> modifier.
+
+For example
+
+	[ "i 25%", "S0" ]
+
+means I<ignore case>, I<allow every fourth character to be "an edit">,
+but allow I<no substitutions>.  (See L<NOTES> about disallowing
+substitutions or insertions.)
+
+NOTE: setting C<I0 D0 S0> is not equivalent to using index().
+If you want to use index(), use index().
+
+=head1 SUBSTITUTE
+
+	use String::Approx 'asubstitute';
+
+	@substituted = asubstitute("pattern", "replacement")
+	@substituted = asubstitute("pattern", "replacement", @inputs) 
+	@substituted = asubstitute("pattern", "replacement", [ modifiers ])
+	@substituted = asubstitute("pattern", "replacement",
+				   [ modifiers ], @inputs)
+
+Substitute approximate B<pattern> with B<replacement> and return as a
+list I<copies> of B<@inputs>, the substitutions having been made on the
+elements that did match the pattern.  If no inputs are given,
+substitute in the B<$_>.  The replacement can contain magic strings
+B<$&>, B<$`>, B<$'> that stand for the matched string, the string
+before it, and the string after it, respectively.  All the other
+arguments are as in C<amatch()>, plus one additional modifier, B<"g">
+which means substitute globally (all the matches in an element and not
+just the first one, as is the default).
+
+See L<BAD NEWS> about the unfortunate stinginess of C<asubstitute()>.
+
+=head1 INDEX
+
+	use String::Approx 'aindex';
+
+	$index   = aindex("pattern")
+	@indices = aindex("pattern", @inputs)
+	$index   = aindex("pattern", [ modifiers ])
+	@indices = aindex("pattern", [ modifiers ], @inputs)
+
+Like C<amatch()> but returns the index/indices at which the pattern
+matches approximately.  In list context and if C<@inputs> are used,
+returns a list of indices, one index for each input element.
+If there's no approximate match, C<-1> is returned as the index.
+
+NOTE: if there is character repetition (e.g. "aa") either in
+the pattern or in the text, the returned index might start 
+"too early".  This is consistent with the goal of the module
+of matching "as early as possible", just like regular expressions
+(that there might be a "less approximate" match starting later is
+somewhat irrelevant).
+
+There's also backwards-scanning C<arindex()>.
+
+=head1 SLICE
+
+	use String::Approx 'aslice';
+
+	($index, $size)   = aslice("pattern")
+	([$i0, $s0], ...) = aslice("pattern", @inputs)
+	($index, $size)   = aslice("pattern", [ modifiers ])
+	([$i0, $s0], ...) = aslice("pattern", [ modifiers ], @inputs)
+
+Like C<aindex()> but returns also the size (length) of the match.
+If the match fails, returns an empty list (when matching against C<$_>)
+or an empty anonymous list corresponding to the particular input.
+
+NOTE: size of the match will very probably be something you did not
+expect (such as longer than the pattern, or a negative number).  This
+may or may not be fixed in future releases. Also the beginning of the
+match may vary from the expected as with aindex(), see above.
+
+If the modifier
+
+	"minimal_distance"
+
+is used, the minimal possible edit distance is returned as the
+third element:
+
+	($index, $size, $distance) = aslice("pattern", [ modifiers ])
+	([$i0, $s0, $d0], ...)     = aslice("pattern", [ modifiers ], @inputs)
+
+=head1 DISTANCE
+
+	use String::Approx 'adist';
+
+	$dist = adist("pattern", $input);
+	@dist = adist("pattern", @input);
+
+Return the I<edit distance> or distances between the pattern and the
+input or inputs.  Zero edit distance means exact match.  (Remember
+that the match can 'float' in the inputs, the match is a substring
+match.)  If the pattern is longer than the input or inputs, the
+returned distance or distances is or are negative.
+
+	use String::Approx 'adistr';
+
+	$dist = adistr("pattern", $input);
+	@dist = adistr("pattern", @inputs);
+
+Return the B<relative> I<edit distance> or distances between the
+pattern and the input or inputs.  Zero relative edit distance means
+exact match, one means completely different.  (Remember that the
+match can 'float' in the inputs, the match is a substring match.)  If
+the pattern is longer than the input or inputs, the returned distance
+or distances is or are negative.
+
+You can use adist() or adistr() to sort the inputs according to their
+approximateness:
+
+	my %d;
+	@d{@inputs} = map { abs } adistr("pattern", @inputs);
+	my @d = sort { $d{$a} <=> $d{$b} } @inputs;
+
+Now C<@d> contains the inputs, the most like C<"pattern"> first.
+
+=head1 CONTROLLING THE CACHE
+
+C<String::Approx> maintains a LU (least-used) cache that holds the
+'matching engines' for each instance of a I<pattern+modifiers>.  The
+cache is intended to help the case where you match a small set of
+patterns against a large set of strings.  However, the more engines you
+cache the more you eat memory.  If you have a lot of different
+patterns or if you have a lot of memory to burn, you may want to
+control the cache yourself.  For example, allowing a larger cache
+consumes more memory but probably runs a little bit faster since the
+cache fills (and needs flushing) less often.
+
+The cache has two parameters: I<max> and I<purge>.  The first one
+is the maximum size of the cache and the second one is the cache
+flushing ratio: when the number of cache entries exceeds I<max>,
+I<max> times I<purge> cache entries are flushed.  The default
+values are 1000 and 0.75, respectively, which means that when
+the 1001st entry would be cached, 750 least used entries will
+be removed from the cache.  To access the parameters you can
+use the calls
+
+	$now_max = String::Approx::cache_max();
+	String::Approx::cache_max($new_max);
+
+	$now_purge = String::Approx::cache_purge();
+	String::Approx::cache_purge($new_purge);
+
+	$limit = String::Approx::cache_n_purge();
+
+To be honest, there are actually B<two> caches: the first one is used
+for the patterns with no modifiers, the second one for the patterns
+with pattern modifiers.  Using the standard parameters you will
+therefore actually cache up to 2000 entries.  The above calls control
+both caches for the same price.
+
+To disable caching completely use
+
+	String::Approx::cache_disable();
+
+Note that this doesn't flush any possibly existing cache entries,
+to do that use
+
+	String::Approx::cache_flush_all();
+
+=head1 NOTES
+
+Because matching is by I<substrings>, not by whole strings, insertions
+and substitutions often produce very similar results: "abcde" matches
+"axbcde" either by insertion B<or> substitution of "x".
+
+The maximum edit distance is also the maximum number of edits.
+That is, the B<"I2"> in
+
+	amatch("abcd", ["I2"])
+
+is useless because the maximum edit distance is (implicitly) 1.
+You may have meant to say
+
+	amatch("abcd", ["2D1S1"])
+
+or something like that.
+
+If you want to simulate transposes
+
+	feet fete
+
+you need to allow at least edit distance of two because in terms of
+our edit primitives a transpose is first one deletion and then one
+insertion.
+
+=head2 TEXT POSITION
+
+The starting and ending positions of matching, substituting, indexing, or
+slicing can be changed from the beginning and end of the input(s) to
+some other positions by using either or both of the modifiers
+
+	"initial_position=24"
+	"final_position=42"
+
+or both of the modifiers
+
+	"initial_position=24"
+	"position_range=10"
+
+By setting the B<"position_range"> to be zero you can limit
+(anchor) the operation to happen only once (if a match is possible)
+at the position.
+
+=head1 VERSION
+
+Major release 3.
+
+=head1 CHANGES FROM VERSION 2
+
+=head2 GOOD NEWS
+
+=over 4
+
+=item The version 3 is 2-3 times faster than version 2
+
+=item No pattern length limitation
+
+The algorithm is independent of the pattern length: its time
+complexity is I<O(kn)>, where I<k> is the number of edits and I<n> the
+length of the text (input).  The preprocessing of the pattern will of
+course take some I<O(m)> (I<m> being the pattern length) time, but
+C<amatch()> and C<asubstitute()> cache the result of this
+preprocessing so that it is done only once per pattern.
+
+=back
+
+=head2 BAD NEWS
+
+=over 4
+
+=item You do need a C compiler to install the module
+
+Perl's regular expressions are no longer used; instead a faster and more
+scalable algorithm written in C is used.
+
+=item C<asubstitute()> is now always stingy
+
+The string matched and substituted is now always stingy, as short
+as possible.  It used to be as long as possible.  This is an unfortunate
+change stemming from switching the matching algorithm.  Example: with
+edit distance of two and substituting for B<"word"> from B<"cork"> and
+B<"wool"> previously did match B<"cork"> and B<"wool">.  Now it does
+match B<"or"> and B<"wo">.  As little as possible, or, in other words,
+with as much approximateness, as many edits, as possible.  Because
+there is no I<need> to match the B<"c"> of B<"cork">, it is not matched.
+
+=item no more C<aregex()> because regular expressions are no longer used
+
+=item no more C<compat1> for String::Approx version 1 compatibility
+
+=back
+
+=head1 ACKNOWLEDGEMENTS
+
+The following people have provided valuable test cases, documentation
+clarifications, and other feedback:
+
+Jared August, Arthur Bergman, Anirvan Chatterjee, Steve A. Chervitz,
+Aldo Calpini, David Curiel, Teun van den Dool, Alberto Fontaneda,
+Rob Fugina, Dmitrij Frishman, Lars Gregersen, Kevin Greiner,
+B. Elijah Griffin, Mike Hanafey, Mitch Helle, Ricky Houghton,
+'idallen', Helmut Jarausch, Damian Keefe, Ben Kennedy, Craig Kelley,
+Franz Kirsch, Dag Kristian, Mark Land, J. D. Laub, John P. Linderman,
+Tim Maher, Juha Muilu, Sergey Novoselov, Andy Oram, Ji Y Park,
+Eric Promislow, Nikolaus Rath, Stefan Ram, Slaven Rezic,
+Dag Kristian Rognlien, Stewart Russell, Slaven Rezic, Chris Rosin,
+Pasha Sadri, Ilya Sandler, Bob J.A. Schijvenaars, Ross Smith,
+Frank Tobin, Greg Ward, Rich Williams, Rick Wise.
+
+The matching algorithm was developed by Udi Manber, Sun Wu, and Burra
+Gopal in the Department of Computer Science, University of Arizona.
+
+=head1 AUTHOR
+
+Jarkko Hietaniemi <jhi@iki.fi>
+
+=head1 COPYRIGHT AND LICENSE
+
+Copyright 2001-2013 by Jarkko Hietaniemi
+
+This library is free software; you can redistribute it and/or modify
+under either the terms of the Artistic License 2.0, or the GNU Library
+General Public License, Version 2.  See the files Artistic and LGPL
+for more details.
+
+Furthermore: no warranties or obligations of any kind are given, and
+the separate file F<COPYRIGHT> must be included intact in all copies
+and derived materials.
+
+=cut
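The DISTANCE section of the POD above notes that a match can "float" inside the input, so adist() is a substring distance rather than a whole-string one. The following is only a rough sketch of that idea using a textbook Sellers-style dynamic program; `floating_distance_sketch` is a hypothetical name, and String::Approx itself uses a different algorithm written in C, so its results (including the sign convention for long patterns) may differ from this sketch.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Smallest number of edits needed to turn $pattern into *some*
# substring of $text.  Hypothetical helper, not part of String::Approx.
sub floating_distance_sketch {
    my ($pattern, $text) = @_;
    my @p = split //, $pattern;
    my @t = split //, $text;

    # Row 0 is all zeros: a match may start at any text position for free.
    my @prev = (0) x (@t + 1);

    for my $i (1 .. scalar @p) {
        my @curr = ($i);    # matching i pattern chars against nothing
        for my $j (1 .. scalar @t) {
            my $cost = $p[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,             # drop a pattern character
                $curr[$j - 1] + 1,         # absorb an extra text character
                $prev[$j - 1] + $cost,     # match or substitute
            );
        }
        @prev = @curr;
    }

    # A match may also end anywhere, so take the best cell of the last row.
    return min(@prev);
}

print floating_distance_sketch("word",  "a word here"), "\n";   # prints 0
print floating_distance_sketch("abcde", "axbcde"),      "\n";   # prints 1
```

Under this definition an exact occurrence anywhere in the input scores zero, which is why such a measure is a poor fit for a "fuzzy eq" between two whole strings.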
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/Text/LevenshteinXS.pm	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,75 @@
+package Text::LevenshteinXS;
+
+use strict;
+use warnings;
+use Carp;
+
+require Exporter;
+require DynaLoader;
+use AutoLoader;
+
+our @ISA = qw(Exporter DynaLoader);
+
+our %EXPORT_TAGS = ( 'all' => [ qw(
+	
+) ] );
+
+our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
+
+our @EXPORT = qw(
+distance
+);
+our $VERSION = '0.03';
+
+bootstrap Text::LevenshteinXS $VERSION;
+
+1;
+__END__
+
+=head1 NAME
+
+Text::LevenshteinXS - An XS implementation of the Levenshtein edit distance
+
+=head1 SYNOPSIS
+
+ use Text::LevenshteinXS qw(distance);
+
+ print distance("foo","four");
+ # prints "2"
+
+ print distance("foo","bar");
+ # prints "3"
+
+
+=head1 DESCRIPTION
+
+This module implements the Levenshtein edit distance in a XS way.
+
+The Levenshtein edit distance is a measure of the degree of proximity between two strings.
+This distance is the number of substitutions, deletions or insertions ("edits") 
+needed to transform one string into the other one (and vice versa).
+When two strings have distance 0, they are the same.
+A good starting point is: <http://www.merriampark.com/ld.htm>
+
+
+=head1 CREDITS
+
+All the credits go to Vladimir Levenshtein the author of the algorithm and to 
+Lorenzo Seidenari who made the C implementation <http://www.merriampark.com/ldc.htm>
+
+
+=head1 SEE ALSO
+
+Text::Levenshtein , Text::WagnerFischer , Text::Brew , String::Approx
+
+
+=head1 AUTHOR
+
+Copyright 2003 Dree Mistrut <F<dree@friul.it>>
+Modifications Copyright 2004 Josh Goldberg <F<josh@3io.com>>
+
+This package is free software and is provided "as is" without express
+or implied warranty.  You can redistribute it and/or modify it under 
+the same terms as Perl itself.
+
+=cut
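For readers without the compiled XS module at hand, the distance described in the POD above can be approximated with a pure-Perl dynamic program. This is a minimal sketch, not the module's C implementation; `levenshtein_sketch` is a hypothetical name and it assumes plain character-by-character comparison.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic two-row Levenshtein distance between two whole strings.
# Hypothetical helper, not part of Text::LevenshteinXS.
sub levenshtein_sketch {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # $prev[$j] = distance from the empty string to the first $j chars of $t
    my @prev = (0 .. scalar @t);

    for my $i (1 .. scalar @s) {
        my @curr = ($i);    # deleting the first $i chars of $s
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,             # deletion
                $curr[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,     # substitution or match
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print levenshtein_sketch("foo", "four"), "\n";   # prints 2
print levenshtein_sketch("foo", "bar"),  "\n";   # prints 3
```

The two calls mirror the SYNOPSIS examples, so the sketch can serve as a cross-check when the XS build is unavailable.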
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/.packlist	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,4 @@
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/String/Approx.pm
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/Approx.bs
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/Approx.so
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/man/man3/String::Approx.3pm
Binary file 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/Approx.so has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/.packlist	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,5 @@
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/Text/LevenshteinXS.pm
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/LevenshteinXS.bs
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/LevenshteinXS.so
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/autosplit.ix
+/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/man/man3/Text::LevenshteinXS.3pm
Binary file 2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/LevenshteinXS.so has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/autosplit.ix	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,3 @@
+# Index created by AutoSplit for blib/lib/Text/LevenshteinXS.pm
+#    (file acts as timestamp)
+1;
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/perllocal.pod	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,66 @@
+=head2 Tue Jun  3 18:21:52 2014: C<Module> L<String::Approx|String::Approx>
+
+=over 4
+
+=item *
+
+C<installed into: /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5>
+
+=item *
+
+C<LINKTYPE: dynamic>
+
+=item *
+
+C<VERSION: 3.27>
+
+=item *
+
+C<EXE_FILES: >
+
+=back
+
+=head2 Tue Jun  3 18:21:52 2014: C<Module> L<String::Approx|String::Approx>
+
+=over 4
+
+=item *
+
+C<installed into: /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5>
+
+=item *
+
+C<LINKTYPE: dynamic>
+
+=item *
+
+C<VERSION: 3.27>
+
+=item *
+
+C<EXE_FILES: >
+
+=back
+
+=head2 Tue Jun  3 18:21:53 2014: C<Module> L<Text::LevenshteinXS|Text::LevenshteinXS>
+
+=over 4
+
+=item *
+
+C<installed into: /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5>
+
+=item *
+
+C<LINKTYPE: dynamic>
+
+=item *
+
+C<VERSION: 0.03>
+
+=item *
+
+C<EXE_FILES: >
+
+=back
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/library/LevD.pm	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,80 @@
+package LevD;
+
+use lib "/data2/bsi/reference/softsearch/lib/perl5";
+use strict;
+use warnings;
+use Data::Dumper;
+use String::Approx 'adist';
+use String::Approx 'adistr';
+use String::Approx 'aindex';
+
+my $WINDOW_SIZE = 100;
+
+sub new {
+	my ($class, $file) = @_;
+    my $self = {};
+
+ 	bless($self,$class);
+	$self->init();
+
+	return $self;
+}
+
+sub init {
+	my ($self) = @_;
+
+	#### default values.
+	$self->{index} = 0;
+	$self->{relative_edit_dist} = 0;
+	$self->{edit_dist} = 0;
+}
+
+sub search {
+	my ($self, $clip, $chr, $start, $stop, $ref) = @_;
+
+	if (! -s $ref) {
+		die "ERROR: Reference file $ref not found\n";
+	}
+
+	#### extract seq from reference file.
+	my $target = $chr .":". $start ."-". $stop;
+	my $cmd = "samtools faidx $ref $target";
+
+	my @output = $self->_run_system_cmd($cmd);
+
+	#### depending on ref file format seq could be on multiple lines
+	#### concatenate all except for the header in one line.
+	#### e.g:
+	#### >chr1:8222999-8223099
+	#### GGTGCAATCATAGCTCACTAAGCTTCAACCTCAAGAGATCCTCCCACCTCAGCCTCCCAG
+	#### GTAGCTGGGACTACAGGCAAATGCCATGACACCTAGCTAAT
+	my $seq = join("", @output[1..$#output]);
+
+	#### remove new line character
+	$seq =~ s/\n//g;
+
+	#### find number of mismatches and start index
+	#### of clip to be searched against target seq.
+	$self->{relative_edit_dist} = adistr($clip, $seq);
+	$self->{edit_dist} = adist($clip, $seq);
+	$self->{index} = aindex($clip, $seq);
+}
+
+sub _run_system_cmd {
+	my ($self, $cmd) = @_;
+	my @cmd_output;
+
+	eval {
+		@cmd_output = qx{$cmd 2>&1};
+		if ( $? != 0 ) {
+			die "@cmd_output";
+		}
+	};
+	if ($@) {
+		die "Error executing command $cmd: $@";
+	}
+
+	return @cmd_output;
+}
+
+1;
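`_run_system_cmd` in LevD.pm above decides success from `$?` after `qx{}`. As a rough sketch of how that status word is laid out according to perlvar, the hypothetical helper `decode_child_status` below splits it into exit code, signal number, and core-dump flag. It is not part of LevD.pm, and the child command is just the running perl binary (`$^X`), chosen so the example needs no external tool.

```perl
use strict;
use warnings;

# Split Perl's $? into its documented parts.
# Hypothetical helper, not part of LevD.pm.
sub decode_child_status {
    my ($status) = @_;
    return {
        exit_code   => $status >> 8,             # high byte: value passed to exit()
        signal      => $status & 127,            # low 7 bits: terminating signal, if any
        core_dumped => ($status & 128) ? 1 : 0,  # bit 7: core dump flag
    };
}

# Run a child that exits with status 3 and decode the result.
my $cmd  = $^X . q{ -e "exit 3"};
my @out  = qx{$cmd};
my $info = decode_child_status($?);

printf "exit=%d signal=%d core=%d\n",
    $info->{exit_code}, $info->{signal}, $info->{core_dumped};
```

Checking `$? != 0` is enough to detect any failure; the decoded fields are mainly useful for building a more specific error message.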
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/logs/StringApprox.out	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,12 @@
+Checking if your kit is complete...
+Looks good
+Writing Makefile for String::Approx
+Writing MYMETA.yml
+Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/Approx.bs
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/String/Approx/Approx.so
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/String/Approx.pm
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/man/man3/String::Approx.3pm
+Appending installation info to /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/perllocal.pod
+Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
+Appending installation info to /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/perllocal.pod
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/logs/levD.err	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,1 @@
+Please specify prototyping behavior for LevenshteinXS.xs (see perlxs manual)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/logs/levD.out	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,25 @@
+Checking if your kit is complete...
+Looks good
+Writing Makefile for Text::LevenshteinXS
+Writing MYMETA.yml
+cp LevenshteinXS.pm blib/lib/Text/LevenshteinXS.pm
+AutoSplitting blib/lib/Text/LevenshteinXS.pm (blib/lib/auto/Text/LevenshteinXS)
+/usr/bin/perl /usr/share/perl/5.14/ExtUtils/xsubpp  -typemap /usr/share/perl/5.14/ExtUtils/typemap  LevenshteinXS.xs > LevenshteinXS.xsc && mv LevenshteinXS.xsc LevenshteinXS.c
+cc -c   -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g   -DVERSION=\"0.03\" -DXS_VERSION=\"0.03\" -fPIC "-I/usr/lib/perl/5.14/CORE"   LevenshteinXS.c
+Running Mkbootstrap for Text::LevenshteinXS ()
+chmod 644 LevenshteinXS.bs
+rm -f blib/arch/auto/Text/LevenshteinXS/LevenshteinXS.so
+cc  -shared -O2 -g -L/usr/local/lib -fstack-protector LevenshteinXS.o  -o blib/arch/auto/Text/LevenshteinXS/LevenshteinXS.so 	\
+	     	\
+	  
+chmod 755 blib/arch/auto/Text/LevenshteinXS/LevenshteinXS.so
+cp LevenshteinXS.bs blib/arch/auto/Text/LevenshteinXS/LevenshteinXS.bs
+chmod 644 blib/arch/auto/Text/LevenshteinXS/LevenshteinXS.bs
+Manifying blib/man3/Text::LevenshteinXS.3pm
+Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/LevenshteinXS.bs
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/LevenshteinXS.so
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/auto/Text/LevenshteinXS/autosplit.ix
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/Text/LevenshteinXS.pm
+Installing /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/man/man3/Text::LevenshteinXS.3pm
+Appending installation info to /home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib/perl5/x86_64-linux-gnu-thread-multi/perllocal.pod
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/man/man3/String::Approx.3pm	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,585 @@
+.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.16)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" Set up some character translations and predefined strings.  \*(-- will
+.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
+.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
+.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
+.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
+.\" nothing in troff, for use with C<>.
+.tr \(*W-
+.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
+.ie n \{\
+.    ds -- \(*W-
+.    ds PI pi
+.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
+.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
+.    ds L" ""
+.    ds R" ""
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds -- \|\(em\|
+.    ds PI \(*p
+.    ds L" ``
+.    ds R" ''
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is turned on, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.ie \nF \{\
+.    de IX
+.    tm Index:\\$1\t\\n%\t"\\$2"
+..
+.    nr % 0
+.    rr F
+.\}
+.el \{\
+.    de IX
+..
+.\}
+.\"
+.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
+.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
+.    \" fudge factors for nroff and troff
+.if n \{\
+.    ds #H 0
+.    ds #V .8m
+.    ds #F .3m
+.    ds #[ \f1
+.    ds #] \fP
+.\}
+.if t \{\
+.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
+.    ds #V .6m
+.    ds #F 0
+.    ds #[ \&
+.    ds #] \&
+.\}
+.    \" simple accents for nroff and troff
+.if n \{\
+.    ds ' \&
+.    ds ` \&
+.    ds ^ \&
+.    ds , \&
+.    ds ~ ~
+.    ds /
+.\}
+.if t \{\
+.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
+.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
+.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
+.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
+.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
+.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
+.\}
+.    \" troff and (daisy-wheel) nroff accents
+.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
+.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
+.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
+.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
+.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
+.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
+.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
+.ds ae a\h'-(\w'a'u*4/10)'e
+.ds Ae A\h'-(\w'A'u*4/10)'E
+.    \" corrections for vroff
+.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
+.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
+.    \" for low resolution devices (crt and lpr)
+.if \n(.H>23 .if \n(.V>19 \
+\{\
+.    ds : e
+.    ds 8 ss
+.    ds o a
+.    ds d- d\h'-1'\(ga
+.    ds D- D\h'-1'\(hy
+.    ds th \o'bp'
+.    ds Th \o'LP'
+.    ds ae ae
+.    ds Ae AE
+.\}
+.rm #[ #] #H #V #F C
+.\" ========================================================================
+.\"
+.IX Title "Approx 3pm"
+.TH Approx 3pm "2013-01-22" "perl v5.14.2" "User Contributed Perl Documentation"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH "NAME"
+String::Approx \- Perl extension for approximate matching (fuzzy matching)
+.SH "SYNOPSIS"
+.IX Header "SYNOPSIS"
+.Vb 1
+\&  use String::Approx \*(Aqamatch\*(Aq;
+\&
+\&  print if amatch("foobar");
+\&
+\&  my @matches = amatch("xyzzy", @inputs);
+\&
+\&  my @catches = amatch("plugh", [\*(Aq2\*(Aq], @inputs);
+.Ve
+.SH "DESCRIPTION"
+.IX Header "DESCRIPTION"
+String::Approx lets you match and substitute strings approximately.
+With this you can emulate errors: typing errorrs, speling errors,
+closely related vocabularies (colour color), genetic mutations (\s-1GAG\s0
+\&\s-1ACT\s0), abbreviations (McScot, MacScot).
+.PP
+\&\s-1NOTE:\s0 String::Approx suits the task of \fBstring matching\fR, not 
+\&\fBstring comparison\fR, and it works for \fBstrings\fR, not for \fBtext\fR.
+.PP
+If you want to compare strings for similarity, you probably just want
+the Levenshtein edit distance (explained below), the Text::Levenshtein
+and Text::LevenshteinXS modules in \s-1CPAN\s0.  See also Text::WagnerFischer
+and Text::PhraseDistance.  (There are functions for this in String::Approx,
+e.g. \fIadist()\fR, but their results sometimes differ from the bare Levenshtein
+et al.)
+.PP
+If you want to compare things like text or source code, consisting of
+\&\fBwords\fR or \fBtokens\fR and \fBphrases\fR and \fBsentences\fR, or
+\&\fBexpressions\fR and \fBstatements\fR, you should probably use some other
+tool than String::Approx, like for example the standard \s-1UNIX\s0 \fIdiff\fR\|(1)
+tool, or the Algorithm::Diff module from \s-1CPAN\s0.
+.PP
+The measure of \fBapproximateness\fR is the \fILevenshtein edit distance\fR.
+It is the total number of \*(L"edits\*(R": insertions,
+.PP
+.Vb 1
+\&        word world
+.Ve
+.PP
+deletions,
+.PP
+.Vb 1
+\&        monkey money
+.Ve
+.PP
+and substitutions
+.PP
+.Vb 1
+\&        sun fun
+.Ve
+.PP
+required to transform a string to another string.  For example, to
+transform \fI\*(L"lead\*(R"\fR into \fI\*(L"gold\*(R"\fR, you need three edits:
+.PP
+.Vb 1
+\&        lead gead goad gold
+.Ve
+.PP
+The edit distance of \*(L"lead\*(R" and \*(L"gold\*(R" is therefore three, or 75%.
+.PP
+\&\fBString::Approx\fR uses the Levenshtein edit distance as its measure, but
+String::Approx is not well-suited for comparing strings of different
+length, in other words, if you want a \*(L"fuzzy eq\*(R", see above.
+String::Approx is more like regular expressions or \fIindex()\fR, it finds
+substrings that are close matches.
+.SH "MATCH"
+.IX Header "MATCH"
+.Vb 1
+\&        use String::Approx \*(Aqamatch\*(Aq;
+\&
+\&        $matched     = amatch("pattern") 
+\&        $matched     = amatch("pattern", [ modifiers ])
+\&
+\&        $any_matched = amatch("pattern", @inputs) 
+\&        $any_matched = amatch("pattern", [ modifiers ], @inputs)
+\&
+\&        @match       = amatch("pattern") 
+\&        @match       = amatch("pattern", [ modifiers ])
+\&
+\&        @matches     = amatch("pattern", @inputs) 
+\&        @matches     = amatch("pattern", [ modifiers ], @inputs)
+.Ve
+.PP
+Match \fBpattern\fR approximately.  In list context return the matched
+\&\fB\f(CB@inputs\fB\fR.  If no inputs are given, match against the \fB\f(CB$_\fB\fR.  In scalar
+context return true if \fIany\fR of the inputs match, false if none match.
+.PP
+Notice that the pattern is a string.  Not a regular expression.  None
+of the regular expression notations (^, ., *, and so on) work.  They
+are characters just like the others.  Note-on-note: some limited form
+of \fI\*(L"regular expressionism\*(R"\fR is planned in future: for example
+character classes ([abc]) and \fIany-chars\fR (.).  But that feature will
+be turned on by a special \fImodifier\fR (just a guess: \*(L"r\*(R"), so there
+should be no backward compatibility problem.
+.PP
+Notice also that matching is not symmetric.  The inputs are matched
+against the pattern, not the other way round.  In other words: the
+pattern can be a substring, a submatch, of an input element.  An input
+element is always a superstring of the pattern.
+.SS "\s-1MODIFIERS\s0"
+.IX Subsection "MODIFIERS"
+With the modifiers you can control the amount of approximateness and
+certain other control variables.  The modifiers are one or more
+strings, for example \fB\*(L"i\*(R"\fR, within a string optionally separated by
+whitespace.  The modifiers are inside an anonymous array: the \fB[ ]\fR
+in the syntax are not notational, they really do mean \fB[ ]\fR, for
+example \fB[ \*(L"i\*(R", \*(L"2\*(R" ]\fR.  \fB[\*(L"2 i\*(R"]\fR would be identical.
+.PP
+The implicit default approximateness is 10%, rounded up.  In other
+words: every tenth character in the pattern may be an error, an edit.
+You can explicitly set the maximum approximateness by supplying a
+modifier like
+.PP
+.Vb 2
+\&        number
+\&        number%
+.Ve
+.PP
+Examples: \fB\*(L"3\*(R"\fR, \fB\*(L"15%\*(R"\fR.
+.PP
+Note that \f(CW\*(C`0%\*(C'\fR is not rounded up, it is equal to \f(CW0\fR.
+.PP
+Using a similar syntax you can separately control the maximum number
+of insertions, deletions, and substitutions by prefixing the numbers
+with I, D, or S, like this:
+.PP
+.Vb 6
+\&        Inumber
+\&        Inumber%
+\&        Dnumber
+\&        Dnumber%
+\&        Snumber
+\&        Snumber%
+.Ve
+.PP
+Examples: \fB\*(L"I2\*(R"\fR, \fB\*(L"D20%\*(R"\fR, \fB\*(L"S0\*(R"\fR.
+.PP
+You can ignore case (\fB\*(L"A\*(R"\fR becomes equal to \fB\*(L"a\*(R"\fR and vice versa)
+by adding the \fB\*(L"i\*(R"\fR modifier.
+.PP
+For example
+.PP
+.Vb 1
+\&        [ "i 25%", "S0" ]
+.Ve
+.PP
+means \fIignore case\fR, \fIallow every fourth character to be \*(L"an edit\*(R"\fR,
+but allow \fIno substitutions\fR.  (See \s-1NOTES\s0 about disallowing
+substitutions or insertions.)
+.PP
+\&\s-1NOTE:\s0 setting \f(CW\*(C`I0 D0 S0\*(C'\fR is not equivalent to using \fIindex()\fR.
+If you want to use \fIindex()\fR, use \fIindex()\fR.
+.SH "SUBSTITUTE"
+.IX Header "SUBSTITUTE"
+.Vb 1
+\&        use String::Approx \*(Aqasubstitute\*(Aq;
+\&
+\&        @substituted = asubstitute("pattern", "replacement")
+\&        @substituted = asubstitute("pattern", "replacement", @inputs) 
+\&        @substituted = asubstitute("pattern", "replacement", [ modifiers ])
+\&        @substituted = asubstitute("pattern", "replacement",
+\&                                   [ modifiers ], @inputs)
+.Ve
+.PP
+Substitute approximate \fBpattern\fR with \fBreplacement\fR and return as a
+list \fIcopies\fR of \fB\f(CB@inputs\fB\fR, the substitutions having been made on the
+elements that did match the pattern.  If no inputs are given,
+substitute in the \fB\f(CB$_\fB\fR.  The replacement can contain magic strings
+\&\fB$&\fR, \fB$`\fR, \fB$'\fR that stand for the matched string, the string
+before it, and the string after it, respectively.  All the other
+arguments are as in \f(CW\*(C`amatch()\*(C'\fR, plus one additional modifier, \fB\*(L"g\*(R"\fR
+which means substitute globally (all the matches in an element and not
+just the first one, as is the default).
+.PP
+See \*(L"\s-1BAD\s0 \s-1NEWS\s0\*(R" about the unfortunate stinginess of \f(CW\*(C`asubstitute()\*(C'\fR.
+.SH "INDEX"
+.IX Header "INDEX"
+.Vb 1
+\&        use String::Approx \*(Aqaindex\*(Aq;
+\&
+\&        $index   = aindex("pattern")
+\&        @indices = aindex("pattern", @inputs)
+\&        $index   = aindex("pattern", [ modifiers ])
+\&        @indices = aindex("pattern", [ modifiers ], @inputs)
+.Ve
+.PP
+Like \f(CW\*(C`amatch()\*(C'\fR but returns the index/indices at which the pattern
+matches approximately.  In list context and if \f(CW@inputs\fR are used,
+returns a list of indices, one index for each input element.
+If there's no approximate match, \f(CW\*(C`\-1\*(C'\fR is returned as the index.
+.PP
+\&\s-1NOTE:\s0 if there is character repetition (e.g. \*(L"aa\*(R") either in
+the pattern or in the text, the returned index might start 
+\&\*(L"too early\*(R".  This is consistent with the goal of the module
+of matching \*(L"as early as possible\*(R", just like regular expressions
+(that there might be a \*(L"less approximate\*(R" match starting later is
+somewhat irrelevant).
+.PP
+There's also backwards-scanning \f(CW\*(C`arindex()\*(C'\fR.
+.SH "SLICE"
+.IX Header "SLICE"
+.Vb 1
+\&        use String::Approx \*(Aqaslice\*(Aq;
+\&
+\&        ($index, $size)   = aslice("pattern")
+\&        ([$i0, $s0], ...) = aslice("pattern", @inputs)
+\&        ($index, $size)   = aslice("pattern", [ modifiers ])
+\&        ([$i0, $s0], ...) = aslice("pattern", [ modifiers ], @inputs)
+.Ve
+.PP
+Like \f(CW\*(C`aindex()\*(C'\fR but returns also the size (length) of the match.
+If the match fails, returns an empty list (when matching against \f(CW$_\fR)
+or an empty anonymous list corresponding to the particular input.
+.PP
+\&\s-1NOTE:\s0 size of the match will very probably be something you did not
+expect (such as longer than the pattern, or a negative number).  This
+may or may not be fixed in future releases. Also the beginning of the
+match may vary from the expected as with \fIaindex()\fR, see above.
+.PP
+If the modifier
+.PP
+.Vb 1
+\&        "minimal_distance"
+.Ve
+.PP
+is used, the minimal possible edit distance is returned as the
+third element:
+.PP
+.Vb 2
+\&        ($index, $size, $distance) = aslice("pattern", [ modifiers ])
+\&        ([$i0, $s0, $d0], ...)     = aslice("pattern", [ modifiers ], @inputs)
+.Ve
+.SH "DISTANCE"
+.IX Header "DISTANCE"
+.Vb 1
+\&        use String::Approx \*(Aqadist\*(Aq;
+\&
+\&        $dist = adist("pattern", $input);
+\&        @dist = adist("pattern", @input);
+.Ve
+.PP
+Return the \fIedit distance\fR or distances between the pattern and the
+input or inputs.  Zero edit distance means exact match.  (Remember
+that the match can 'float' in the inputs, the match is a substring
+match.)  If the pattern is longer than the input or inputs, the
+returned distance or distances is or are negative.
+.PP
+.Vb 1
+\&        use String::Approx \*(Aqadistr\*(Aq;
+\&
+\&        $dist = adistr("pattern", $input);
+\&        @dist = adistr("pattern", @inputs);
+.Ve
+.PP
+Return the \fBrelative\fR \fIedit distance\fR or distances between the
+pattern and the input or inputs.  Zero relative edit distance means
+exact match, one means completely different.  (Remember that the
+match can 'float' in the inputs, the match is a substring match.)  If
+the pattern is longer than the input or inputs, the returned distance
+or distances is or are negative.
+.PP
+You can use \fIadist()\fR or \fIadistr()\fR to sort the inputs according to their
+approximateness:
+.PP
+.Vb 3
+\&        my %d;
+\&        @d{@inputs} = map { abs } adistr("pattern", @inputs);
+\&        my @d = sort { $d{$a} <=> $d{$b} } @inputs;
+.Ve
+.PP
+Now \f(CW@d\fR contains the inputs, the most like \f(CW"pattern"\fR first.
+.SH "CONTROLLING THE CACHE"
+.IX Header "CONTROLLING THE CACHE"
+\&\f(CW\*(C`String::Approx\*(C'\fR maintains a \s-1LU\s0 (least-used) cache that holds the
+\&'matching engines' for each instance of a \fIpattern+modifiers\fR.  The
+cache is intended to help the case where you match a small set of
+patterns against a large set of strings.  However, the more engines you
+cache, the more memory you use.  If you have a lot of different
+patterns or if you have a lot of memory to burn, you may want to
+control the cache yourself.  For example, allowing a larger cache
+consumes more memory but probably runs a little bit faster since the
+cache fills (and needs flushing) less often.
+.PP
+The cache has two parameters: \fImax\fR and \fIpurge\fR.  The first one
+is the maximum size of the cache and the second one is the cache
+flushing ratio: when the number of cache entries exceeds \fImax\fR,
+\&\fImax\fR times \fIpurge\fR cache entries are flushed.  The default
+values are 1000 and 0.75, respectively, which means that when
+the 1001st entry would be cached, 750 least used entries will
+be removed from the cache.  To access the parameters you can
+use the calls
+.PP
+.Vb 2
+\&        $now_max = String::Approx::cache_max();
+\&        String::Approx::cache_max($new_max);
+\&
+\&        $now_purge = String::Approx::cache_purge();
+\&        String::Approx::cache_purge($new_purge);
+\&
+\&        $limit = String::Approx::cache_n_purge();
+.Ve
+.PP
+To be honest, there are actually \fBtwo\fR caches: the first one is used
+for the patterns with no modifiers, the second one for the patterns
+with pattern modifiers.  Using the standard parameters you will
+therefore actually cache up to 2000 entries.  The above calls control
+both caches for the same price.
+.PP
+To disable caching completely use
+.PP
+.Vb 1
+\&        String::Approx::cache_disable();
+.Ve
+.PP
+Note that this doesn't flush any possibly existing cache entries;
+to do that use
+.PP
+.Vb 1
+\&        String::Approx::cache_flush_all();
+.Ve
+.SH "NOTES"
+.IX Header "NOTES"
+Because matching is by \fIsubstrings\fR, not by whole strings, insertions
+and substitutions produce often very similar results: \*(L"abcde\*(R" matches
+\&\*(L"axbcde\*(R" either by insertion \fBor\fR substitution of \*(L"x\*(R".
+.PP
+The maximum edit distance is also the maximum number of edits.
+That is, the \fB\*(L"I2\*(R"\fR in
+.PP
+.Vb 1
+\&        amatch("abcd", ["I2"])
+.Ve
+.PP
+is useless because the maximum edit distance is (implicitly) 1.
+You may have meant to say
+.PP
+.Vb 1
+\&        amatch("abcd", ["2D1S1"])
+.Ve
+.PP
+or something like that.
+.PP
+If you want to simulate transposes
+.PP
+.Vb 1
+\&        feet fete
+.Ve
+.PP
+you need to allow an edit distance of at least two because in terms of
+our edit primitives a transpose is first one deletion and then one
+insertion.
+.SS "\s-1TEXT\s0 \s-1POSITION\s0"
+.IX Subsection "TEXT POSITION"
+The starting and ending positions of matching, substituting, indexing, or
+slicing can be changed from the beginning and end of the input(s) to
+some other positions by using either or both of the modifiers
+.PP
+.Vb 2
+\&        "initial_position=24"
+\&        "final_position=42"
+.Ve
+.PP
+or both of the modifiers
+.PP
+.Vb 2
+\&        "initial_position=24"
+\&        "position_range=10"
+.Ve
+.PP
+By setting the \fB\*(L"position_range\*(R"\fR to be zero you can limit
+(anchor) the operation to happen only once (if a match is possible)
+at the position.
+.SH "VERSION"
+.IX Header "VERSION"
+Major release 3.
+.SH "CHANGES FROM VERSION 2"
+.IX Header "CHANGES FROM VERSION 2"
+.SS "\s-1GOOD\s0 \s-1NEWS\s0"
+.IX Subsection "GOOD NEWS"
+.IP "The version 3 is 2\-3 times faster than version 2" 4
+.IX Item "The version 3 is 2-3 times faster than version 2"
+.PD 0
+.IP "No pattern length limitation" 4
+.IX Item "No pattern length limitation"
+.PD
+The algorithm is independent of the pattern length: its time
+complexity is \fIO(kn)\fR, where \fIk\fR is the number of edits and \fIn\fR the
+length of the text (input).  The preprocessing of the pattern will of
+course take some \fIO(m)\fR (\fIm\fR being the pattern length) time, but
+\&\f(CW\*(C`amatch()\*(C'\fR and \f(CW\*(C`asubstitute()\*(C'\fR cache the result of this
+preprocessing so that it is done only once per pattern.
+.SS "\s-1BAD\s0 \s-1NEWS\s0"
+.IX Subsection "BAD NEWS"
+.IP "You do need a C compiler to install the module" 4
+.IX Item "You do need a C compiler to install the module"
+Perl's regular expressions are no longer used; instead a faster and more
+scalable algorithm written in C is used.
+.ie n .IP """asubstitute()"" is now always stingy" 4
+.el .IP "\f(CWasubstitute()\fR is now always stingy" 4
+.IX Item "asubstitute() is now always stingy"
+The string matched and substituted is now always stingy, as short
+as possible.  It used to be as long as possible.  This is an unfortunate
+change stemming from switching the matching algorithm.  Example: with
+edit distance of two and substituting for \fB\*(L"word\*(R"\fR from \fB\*(L"cork\*(R"\fR and
+\&\fB\*(L"wool\*(R"\fR previously did match \fB\*(L"cork\*(R"\fR and \fB\*(L"wool\*(R"\fR.  Now it does
+match \fB\*(L"or\*(R"\fR and \fB\*(L"wo\*(R"\fR.  As little as possible, or, in other words,
+with as much approximateness, as many edits, as possible.  Because
+there is no \fIneed\fR to match the \fB\*(L"c\*(R"\fR of \fB\*(L"cork\*(R"\fR, it is not matched.
+.ie n .IP "no more ""aregex()"" because regular expressions are no more used" 4
+.el .IP "no more \f(CWaregex()\fR because regular expressions are no more used" 4
+.IX Item "no more aregex() because regular expressions are no more used"
+.PD 0
+.ie n .IP "no more ""compat1"" for String::Approx version 1 compatibility" 4
+.el .IP "no more \f(CWcompat1\fR for String::Approx version 1 compatibility" 4
+.IX Item "no more compat1 for String::Approx version 1 compatibility"
+.PD
+.SH "ACKNOWLEDGEMENTS"
+.IX Header "ACKNOWLEDGEMENTS"
+The following people have provided valuable test cases, documentation
+clarifications, and other feedback:
+.PP
+Jared August, Arthur Bergman, Anirvan Chatterjee, Steve A. Chervitz,
+Aldo Calpini, David Curiel, Teun van den Dool, Alberto Fontaneda,
+Rob Fugina, Dmitrij Frishman, Lars Gregersen, Kevin Greiner,
+B. Elijah Griffin, Mike Hanafey, Mitch Helle, Ricky Houghton,
+\&'idallen', Helmut Jarausch, Damian Keefe, Ben Kennedy, Craig Kelley,
+Franz Kirsch, Dag Kristian, Mark Land, J. D. Laub, John P. Linderman,
+Tim Maher, Juha Muilu, Sergey Novoselov, Andy Oram, Ji Y Park,
+Eric Promislow, Nikolaus Rath, Stefan Ram, Slaven Rezic,
+Dag Kristian Rognlien, Stewart Russell, Chris Rosin,
+Pasha Sadri, Ilya Sandler, Bob J.A. Schijvenaars, Ross Smith,
+Frank Tobin, Greg Ward, Rich Williams, Rick Wise.
+.PP
+The matching algorithm was developed by Udi Manber, Sun Wu, and Burra
+Gopal in the Department of Computer Science, University of Arizona.
+.SH "AUTHOR"
+.IX Header "AUTHOR"
+Jarkko Hietaniemi <jhi@iki.fi>
+.SH "COPYRIGHT AND LICENSE"
+.IX Header "COPYRIGHT AND LICENSE"
+Copyright 2001\-2013 by Jarkko Hietaniemi
+.PP
+This library is free software; you can redistribute it and/or modify it
+under either the terms of the Artistic License 2.0, or the \s-1GNU\s0 Library
+General Public License, Version 2.  See the files Artistic and \s-1LGPL\s0
+for more details.
+.PP
+Furthermore: no warranties or obligations of any kind are given, and
+the separate file \fI\s-1COPYRIGHT\s0\fR must be included intact in all copies
+and derived materials.
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/man/man3/Text::LevenshteinXS.3pm	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,168 @@
+.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.16)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" Set up some character translations and predefined strings.  \*(-- will
+.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
+.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
+.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
+.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
+.\" nothing in troff, for use with C<>.
+.tr \(*W-
+.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
+.ie n \{\
+.    ds -- \(*W-
+.    ds PI pi
+.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
+.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
+.    ds L" ""
+.    ds R" ""
+.    ds C` ""
+.    ds C' ""
+'br\}
+.el\{\
+.    ds -- \|\(em\|
+.    ds PI \(*p
+.    ds L" ``
+.    ds R" ''
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\"
+.\" If the F register is turned on, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD.  Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.ie \nF \{\
+.    de IX
+.    tm Index:\\$1\t\\n%\t"\\$2"
+..
+.    nr % 0
+.    rr F
+.\}
+.el \{\
+.    de IX
+..
+.\}
+.\"
+.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
+.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
+.    \" fudge factors for nroff and troff
+.if n \{\
+.    ds #H 0
+.    ds #V .8m
+.    ds #F .3m
+.    ds #[ \f1
+.    ds #] \fP
+.\}
+.if t \{\
+.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
+.    ds #V .6m
+.    ds #F 0
+.    ds #[ \&
+.    ds #] \&
+.\}
+.    \" simple accents for nroff and troff
+.if n \{\
+.    ds ' \&
+.    ds ` \&
+.    ds ^ \&
+.    ds , \&
+.    ds ~ ~
+.    ds /
+.\}
+.if t \{\
+.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
+.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
+.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
+.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
+.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
+.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
+.\}
+.    \" troff and (daisy-wheel) nroff accents
+.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
+.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
+.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
+.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
+.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
+.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
+.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
+.ds ae a\h'-(\w'a'u*4/10)'e
+.ds Ae A\h'-(\w'A'u*4/10)'E
+.    \" corrections for vroff
+.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
+.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
+.    \" for low resolution devices (crt and lpr)
+.if \n(.H>23 .if \n(.V>19 \
+\{\
+.    ds : e
+.    ds 8 ss
+.    ds o a
+.    ds d- d\h'-1'\(ga
+.    ds D- D\h'-1'\(hy
+.    ds th \o'bp'
+.    ds Th \o'LP'
+.    ds ae ae
+.    ds Ae AE
+.\}
+.rm #[ #] #H #V #F C
+.\" ========================================================================
+.\"
+.IX Title "LevenshteinXS 3pm"
+.TH LevenshteinXS 3pm "2004-06-30" "perl v5.14.2" "User Contributed Perl Documentation"
+.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH "NAME"
+Text::LevenshteinXS \- An XS implementation of the Levenshtein edit distance
+.SH "SYNOPSIS"
+.IX Header "SYNOPSIS"
+.Vb 1
+\& use Text::LevenshteinXS qw(distance);
+\&
+\& print distance("foo","four");
+\& # prints "2"
+\&
+\& print distance("foo","bar");
+\& # prints "3"
+.Ve
+.SH "DESCRIPTION"
+.IX Header "DESCRIPTION"
+This module implements the Levenshtein edit distance as an \s-1XS\s0 (compiled C) extension.
+.PP
+The Levenshtein edit distance is a measure of the degree of proximity between two strings.
+This distance is the number of substitutions, deletions or insertions (\*(L"edits\*(R") 
+needed to transform one string into the other one (and vice versa).
+When two strings have distance 0, they are the same.
+A good starting point is: <http://www.merriampark.com/ld.htm>
+.SH "CREDITS"
+.IX Header "CREDITS"
+All credit goes to Vladimir Levenshtein, the author of the algorithm, and to 
+Lorenzo Seidenari who made the C implementation <http://www.merriampark.com/ldc.htm>
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+Text::Levenshtein , Text::WagnerFischer , Text::Brew , String::Approx
+.SH "AUTHOR"
+.IX Header "AUTHOR"
+Copyright 2003 Dree Mistrut <\fIdree@friul.it\fR>
+Modifications Copyright 2004 Josh Goldberg <\fIjosh@3io.com\fR>
+.PP
+This package is free software and is provided \*(L"as is\*(R" without express
+or implied warranty.  You can redistribute it and/or modify it under 
+the same terms as Perl itself.
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/progress.txt	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,4 @@
+source=1
+stringApprox=1
+levD=1
+libraries=1
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/Annotate_SoftSearch.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,40 @@
+#!/usr/bin/perl
+open(VCF,"$ARGV[0]")||die "Usage: <VCF> <Annotation.bed>\n\n\t\t The annotation BED should be of exons\n";
+
+$bedtools=`which intersectBed`;
+if(!$bedtools){die "Requires Bedtools in path\n\n"}
+if(!$ARGV[1]){die "Usage: <VCF> <Annotation.bed>\n\n";}
+
+while (<VCF>){
+	if($_=~/^#/){print;next}
+	chomp;
+	@data=split(/\t/,$_);
+	#Get left pair information
+	$chr1=$data[0];
+        $pos1a=$data[1]-1;
+        $pos1b=$data[1];
+	#Get right pair information
+	$data[4]=~s/[ACTGactghr\[\]]//g;#$data[4]=~s/hr/chr/;
+	@pos2=split(/:/,$data[4]);
+	$chr2="chr";
+	$chr2.=$pos2[0];
+	$pos2a=$pos2[1]-1;
+	$pos2b=$pos2[1];
+	#Now get left side annotations
+	#
+	#print "LEFT=get_anno($chr1,$pos1a,$pos1b)\n";
+	$left_gene=get_anno($chr1,$pos1a,$pos1b);
+        #print "RIGHT=get_anno($chr2,$pos2a,$pos2b)\n";
+        $right_gene=get_anno($chr2,$pos2a,$pos2b);
+	print "$_\t$left_gene\t$right_gene\n";
+}
+
+close VCF;
+
+sub get_anno {
+	my ($chr,$pos1,$pos2)=@_;
+ 	$result=`perl -e 'if(($chr)&&($pos1)&&($pos2)){print join(\"\\t\",$chr,$pos1,$pos2,\"\\n\")}else {print STDERR "Not all variables defined: chr,pos1,pos2=$chr,$pos1,$pos2\n$_\n"}'|intersectBed -a $ARGV[1] -b stdin|cut -f4|head -1`;
+	chomp($result);$result=~s/\n//;
+	if(!$result){$result="NA"};
+	return $result;
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/Bam2pair.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,98 @@
+#!/usr/bin/perl
+#Author Steven Hart, PhD
+#11-15-2012
+#Convert and filter BAM files into merged bed 
+#Output should be 
+#chrA StartA EndA chrB StartB EndB Gene_id #supportingReads StrandA StrandB
+#chr9 1000 5000 chr9 3000 3800 bedpe_example2 100 - +
+
+use Cwd;
+use File::Basename;
+#Usage
+sub usage(){
+	print "Usage: perl Bam2Pair.pl -b <BAM> -o <outfile>\n
+		-isize [10000]\t\tThe insert size to be considered discordant\n
+		-winsize [10000]\tThe distance between mate pairs to be considered the same\n
+		-min [1]\t\tThe minimum number of reads required to support an SV event\n
+		-prefix need a random prefix so files with the same name don't get created\n\n"
+		;
+}
+$bedtools=`which intersectBed`;
+$samtools=`which samtools`;
+
+if(!$bedtools){die "BEDtools must be installed\n";}
+if(!$samtools){die "Samtools must be installed\n";}
+use Getopt::Long;
+#Declare variables
+GetOptions(
+	'b=s' => \$BAM_FILE,		#path to bam
+	'out=s' => \$outfile,		#path to output
+	'java:s' => \$java	,
+        'chrom:s' => \$chrom      ,
+	'isize=i' => \$isize,
+	'winsize=i' => \$winsize,
+        'prefix=s' => \$prefix,
+	'min=i' => \$minSupport,
+	'blacklist:s' => \$new_blacklist,
+	'q=s' => \$qual,
+	'v' => \$verbose
+	);
+#if(defined($picard_path)){$picard_path=$picard_path} else {die "Must specify a path to PICARD so that files can be sorted and indexed properly\n"};
+if(!defined($BAM_FILE)){die "Must specify a BAM file!\n".usage();}
+if(!defined($outfile)){die "Must specify an out filename!\n".usage();}
+if(!defined($java)){$java=`which java`;chomp($java);}
+if(!defined($qual)){$qual=20}
+if($new_blacklist){$new_blacklist=" -L $new_blacklist"}
+
+
+$Filter_BAM=$BAM_FILE;
+
+@bam=split("/",$Filter_BAM);
+$Filter_BAM=$bam[-1];
+$Filter_BAM=~s/\.bam$/$prefix.bam/;
+$Filter_sam=$Filter_BAM;
+$Filter_sam=~s/\.bam$/.sam/;
+
+
+
+
+print "\nLooking for Discordant read pairs (and Unmated reads) without soft-clips\n";
+
+#$command=join("","samtools view -h -q 20 -f 1 -F 1804 ",$BAM_FILE," ",$chrom," ",$new_blacklist," |  awk -F\'\\t\' \'{if (\$9 > ", $isize, " || \$9 < -",$isize," || \$9 == 0 || \$1 ~ /^@/) print \$0}' > ",$Filter_sam);
+
+
+#Change command to allow reads where mate is unmapped & remove qual
+$command=join("","samtools view -h -f 1 -F 1800 -q $qual ",$BAM_FILE," ",$chrom," ",$new_blacklist," |  awk -F\'\\t\' \'{if (\$9 > ", $isize, " || \$9 < -",$isize," || \$9 == 0 || \$1 ~ /^@/) print \$0}' > ",$Filter_sam);
+
+print "$command\n" if ($verbose);
+system($command);
+$path = dirname(__FILE__);
+$Filter_cluster=$Filter_sam;
+$Filter_cluster=~s/\.sam$/.cluster/;
+$command=join("",$path,"/ReadCluster.pl -i=$Filter_sam -o=$Filter_cluster -m $minSupport");
+if($verbose){print "\n$command\n"};	
+
+system($command);
+
+##################################
+#Now there are 2 SAM files of filtered reads
+#.filter.cluster.inter.sam
+#.filter.cluster.intra.sam
+$result_pe=join("",$Filter_cluster,".out");
+$command=join("","cat ",$Filter_cluster,".int\*|perl -ane 'next if(\@F[0]=~/^\@/);if(\@F[6]!~/=/){print join(\"\\t\",\$F[11],\@F[2],\@F[3],\@F[6],\@F[7],\"\\n\")}else{print join(\"\\t\",\$F[11],\@F[2],\@F[3],\@F[2],\@F[7],\"\\n\")}' >",$result_pe);
+if($verbose){print $command."\n"};
+system($command);
+ #my ($sample, $chrstart, $start, $chrend, $end) 
+$command=join("","cat ",$result_pe," | ",$path,"/cluster.pair.pl ",$winsize," |awk '(\$6 >",$minSupport,")' >> ", $outfile);
+if($verbose){print $command."\n"};
+system($command);
+$filt1=join("",$Filter_cluster,".inter.sam");
+$filt2=join("",$Filter_cluster,".intra.sam");
+
+
+unlink($Filter_sam,$filt1,$filt2,$result_pe);
+
+#########################################
+#Now determine if left or right clipping surrogate
+print "\nBam2pair.pl Done\n";
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/Check_integration.sh	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,86 @@
+#!/bin/sh
+#$ -V
+#$ -cwd
+#$ -q 1-day
+#$ -m ae
+#$ -M hart.steven@mayo.edu
+#$ -l h_vmem=8G
+#$ -l h_stack=10M
+VCF_FILE=$1
+x=$2
+#VIRAL_SEQDB=/data2/bsi/tertiary/m110344/SoftTile/Mia/BLAST_DB/OBrien/Virus_PCGS.fasta #must be indexed by bwasw
+VIRAL_SEQDB=$3
+VCF_FILE=$4
+#VCF_FILE=final.vcf
+
+set -x 
+
+perl -ane '$dist=100;
+$mate=$F[4];
+$mate=~s/[A-Z]|\[|\]//g;
+@mate=split(/:/,$mate);
+$end1a=@F[1]-$dist;
+$end1b=@F[1]+$dist;
+$end2a=@mate[1]-$dist;
+$end2b=$dist+@mate[1];
+print "@F[0]\t$end1a\t$end1b\n@mate[0]\t$end2a\t$end2b\n"' $VCF_FILE|
+sortBed > targets.bed
+
+
+#100 min
+time samtools view -h $x -L targets.bed |awk '(($9==0)&&($11!~/#/)&&($3!~/^chrGL/)&&($3!~/^chrM/))'|perl -ane 'print "\@@F[0]\n@F[9]\n+\n@F[10]\n"' >${x%%.bam}.res.fq
+#23 min
+time bwa mem -t 4 $VIRAL_SEQDB ${x%%.bam}.res.fq |samtools view -S - |grep gi > ${x%%.bam}.tmp.sam
+
+#find out how many hits there are
+cut -f3 ${x%%.bam}.tmp.sam|perl -ne '@_=split(":",$_);@res=split(/_/,@_[1],2);print "@res[1]"' | sort|uniq -c|sort -k1nr|tee ${x%%.bam}.Viral_maps.out |head
+#Get the reads mapping to those hits to find out where the integration site is
+
+#Read in the viruses until there is a significant drop off in number of reads (i.e. contributing less than 10%)
+perl -ne '@_=split(" ",$_);$i=$_[0]+$i;$j=$_[0];$res=$j/$i;if($res>.1){print "@_[1]\n"}' ${x%%.bam}.Viral_maps.out >${x%%.bam}.to.keep
+fgrep -f ${x%%.bam}.to.keep ${x%%.bam}.tmp.sam |cut -f1 >${x%%.bam}.reads
+
+#75min+
+
+time samtools view $x -L targets.bed |
+fgrep -f ${x%%.bam}.reads|
+awk '{if(($9==0)&&($11!~/#/)&&($3!~/^chrGL/)&&($3!~/^chrM/)&&($3!~/^\*/)){print $3"\t"$4"\t"$4"\t"$1}}'|
+tee ${x%%.bam}.unsorted.bed|
+sortBed | mergeBed -nms -d 1000|
+perl -e 'open (FILE,"$ARGV[0]") or die "cant open file\n\n";
+ $SAM="$ARGV[1]";
+ chomp($SAM);
+ while(<FILE>){
+chomp;
+  @_=split(/\t/,$_);
+  @reads=split(/;/,@_[3]);
+#print "LINE=$_\nRES=grep $reads[0] $SAM\n";
+  $res=`grep $reads[0] $SAM` ;
+#  print "AFTER GREP, RES=$res\n";
+  if($res){
+   @res=split(/\t/,$res);
+   print join("\t",@_[0..2],@res[2])."\n"
+   }
+  };
+ close FILE' - ${x%%.bam}.tmp.sam |
+perl -ne 's/\|/\t/g;@_=split("\t",$_);print join ("\t",@_[0..2,7])'|
+perl -pne 's/_/\t/'|  cut -f4 --complement |
+perl -e '
+ open (FILE,"$ARGV[0]") or die "cant open file\n\n";
+ $SAM="$ARGV[1]";
+ while(<FILE>){
+  chomp;
+  @_=split(/\t/,$_);
+  $res=`grep $_[3] $SAM`;
+  if($res){
+   @res=split(" ",$res);
+   $reads[0]=chomp;
+   print join("\t",@_[0..4],@res[0])."\n";
+  }
+ }
+close FILE;
+' - ${x%%.bam}.Viral_maps.out|
+perl -pne 's/_/ /g'> ${x%%.bam}.Virus.integrated.bed
+
+rm ${x%%.bam}.reads ${x%%.bam}.to.keep ${x%%.bam}.tmp.sam ${x%%.bam}.res.fq
+echo "DONE with $x"
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/Extract_nSC.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,27 @@
+#!/usr/bin/perl -w
+
+use Getopt::Long;
+
+#Initialize values
+my ($queries,@HEADER,$samples,@HEADER_OUT,$end,$samp);
+GetOptions ("query|q=s" => \$queries);
+if(!$queries){die "Usage: FORMAT_extract.pl <VCF> -query nSC 
+\n\n";}
+
+
+open (VCF,"$ARGV[0]") or die "Usage: <VCF>";
+
+while (<VCF>) {
+        if($_=~/^##/){print;next}
+    chomp;
+    @line=split(/\t/,$_);
+    if($line[0]=~/^#CH/){
+        print join ("\t",@line,$queries)."\n";
+	next}
+ @FORMAT=split(/:/,$line[8]);
+ @SAMPLE=split(/:/,$line[9]);
+	for($i=0;$i<@FORMAT;$i++){
+	if($FORMAT[$i] =~/^$queries$/){print join ("\t",@line,$SAMPLE[$i])."\n";next}
+	}
+}
+close VCF;
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/Merge_SV.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,218 @@
+#!/usr/bin/perl -w
+use Getopt::Long;
+use List::Util qw(min max);
+
+
+#Declare variables
+my ($window,$tmpSpace,$usage,$help,$outFile);
+
+GetOptions(
+        'v=s{2,}' => \@VCF,
+        'o:s' => \$outFile,
+        'w:s' => \$window,
+		'h|help' => \$help
+);
+
+if((!@VCF)||($help)){&usage();exit}
+
+
+if (!$window) {
+    $window=500;
+}
+if (!$outFile) {
+    $outFile="merged.vcf.out";
+}
+###########################################
+# Protect against merging too many results
+###########################################
+$tmpSpace='temporarySV_merge';
+if (-e $tmpSpace) {
+    #Delete temp file if it exists
+    unlink $tmpSpace;
+}
+###########################################
+#For each VCF, create a BEDPE file
+###########################################
+
+open(OUT,">>$tmpSpace") or die "Can't write in this directory\n";
+for (my $i=0;$i<@VCF;$i++){
+    #print STDERR "opening $VCF[$i]\n";
+    open(VCF,$VCF[$i]) or die &usage();
+    while (<VCF>) {
+        next if ($_=~/^#/);
+        chomp;
+        @line=split("\t",$_);
+        $mate=$line[4];
+        $mate=~s/[A-L]|[N-W]|[Z]|\[|\]//g;
+        @mate=split(/:/,$mate);
+        $end1a=$line[1]-$window;
+        $end1b=$line[1]+$window;
+        $end2a=$mate[1]-$window;
+        $end2b=$mate[1]+$window;
+        next if (($end1a<0)||($end2a<0));
+        if (($line[0]=~/^chr$/)||($mate[0]=~/^chr$/)) {
+            next;
+        }
+        print OUT "$line[0]\t$end1a\t$end1b\t$mate[0]\t$end2a\t$end2b\n";
+        print OUT "$mate[0]\t$end2a\t$end2b\t$line[0]\t$end1a\t$end1b\n";
+    }
+}
+close OUT;
+
+###########################################
+#Now merge the BEDPE into a unique BEDPE
+###########################################
+#Make sure the BEDPE is sorted
+#print "Make sure the BEDPE is sorted\n";
+my $tmpSpace2=join("",$tmpSpace,".2");
+system("cat $tmpSpace|sort -k1,1 -k2,3n -k4,4 -k5,5n -u > $tmpSpace2");
+unlink($tmpSpace);
+
+#Create output files for the left and right merged BEDPE
+my $tmpSpace3=join("",$tmpSpace,".3");
+my $tmpSpace4=join("",$tmpSpace,".4");
+
+open (OUT1,">$tmpSpace3") or die "Cant write in this directory\n";
+open (OUT2,">$tmpSpace4") or die "Cant write in this directory\n";
+
+open(BEDPE,"$tmpSpace2") or die "$tmpSpace2 has already been deleted\n";
+#Initialize positions
+#my ($chr1,$pos2,$pos3,$chr2,$pos3,$pos4);
+my (@chr1,@pos1,@pos2,@chr2,@pos3,@pos4);
+while (<BEDPE>) {
+    ($chr1,$pos1,$pos2,$chr2,$pos3,$pos4)=split("\t",$_);
+	if(!$Echr1){
+	($Echr1,$Epos1,$Epos2,$Echr2,$Epos3,$Epos4)=split("\t",$_);
+	}
+    while ( 
+		 ($chr1 =~ /^$Echr1$/)&&
+           ($pos2 <= $Epos2+$window)&&
+            ($chr2 =~ /^$Echr2$/)&&
+           ($pos3 <= $Epos3+$window)
+          )
+        {$nextline = <BEDPE> ;
+		last if (!$nextline);
+		$nextline=~chomp;
+         ($chr1,$pos1,$pos2,$chr2,$pos3,$pos4)=split("\t",$nextline);
+		 #print "NEXTLINE=$nextline";
+         push (@chr1,$chr1);
+         push (@pos1,$pos1);
+         push (@pos2,$pos2);
+         push (@chr2,$chr2);
+         push (@pos3,$pos3);
+         push (@pos4,$pos4);   
+		  }
+    ($Echr1,$Epos1,$Epos2,$Echr2,$Epos3,$Epos4)=($chr1[0],min(@pos1),max(@pos2),$chr2[-2],min(@pos3),$pos4[-2]);
+    #print join("\t",$Echr1,$Epos1,$Epos2,$Echr2,$Epos3,$Epos4);
+	if($pos1>$pos2){my $tmp=$pos1;$pos1=$pos2;$pos2=$tmp}
+	if($pos3>$pos4){my $tmp=$pos3;$pos3=$pos4;$pos4=$tmp}
+	print OUT1 join ("\t",$chr1,$pos1,$pos2)."\n";
+	print OUT2 join ("\t",$chr2,$pos3,$pos4);
+	($Echr1,$Epos1,$Epos2,$Echr2,$Epos3,$Epos4)=($chr1,$pos1,$pos2,$chr2,$pos3,$pos4);
+	}
+close BEDPE;
+close OUT1; close OUT2;
+unlink ($tmpSpace2);
+
+#####################################################################
+#Now find out for each Unique BEDPE, how many Samples was the SV in?
+#####################################################################
+#FOR EACH VCF
+#get NAME
+
+my $tmpSpace5=join("",$tmpSpace,".5");
+my $tmpSpace6=join("",$tmpSpace,".6");
+my $tmpSpace7=join("",$tmpSpace,".7");
+my $tmpSpace8=join("",$tmpSpace,".8");
+my $tmpSpace9=join("",$tmpSpace,".9");
+
+#Create a placeholder file
+system("paste $tmpSpace3 $tmpSpace4| awk '{OFS=\"\\t\"}{print \$1,\$2,\$3,\$4,\$5,\$6,0,\"NA\"}' > $tmpSpace7");
+#Convert the VCF into a BED PE
+for (my $i=0;$i<@VCF;$i++){
+	open (OUT,">$tmpSpace5") or die "Cant write in this directory\n";
+	open(VCF,$VCF[$i]) ;
+	print STDERR "Starting on $VCF[$i]\n";
+		while (<VCF>) {
+			next if ($_=~/^#/);
+			chomp;
+			@line=split("\t",$_);
+			$mate=$line[4];
+			$mate=~s/[A-L]|[N-W]|[Z]|\[|\]//g;
+			@mate=split(/:/,$mate);
+			$end1a=$line[1]-$window;
+			$end1b=$line[1]+$window;
+			$end2a=$mate[1]-$window;
+			$end2b=$mate[1]+$window;
+			next if (($end1a<0)||($end2a<0));
+			if (($line[0]=~/^chr$/)||($mate[0]=~/^chr$/)) {
+				#print "$_\n";
+				next;
+			}
+			print OUT "$line[0]\t$end1a\t$end1b\t$mate[0]\t$end2a\t$end2b\n";
+			print OUT "$mate[0]\t$end2a\t$end2b\t$line[0]\t$end1a\t$end1b\n";
+		}
+	close VCF;
+	close OUT;
+	#for each row in $tmpSpace3, count the number of overlaps on both sides
+	my $left=join("",$tmpSpace,".left");
+	my $right=join("",$tmpSpace,".right");
+	system("intersectBed -a $tmpSpace3 -b $tmpSpace5 -loj -c > $left");
+	system("intersectBed -a $tmpSpace4 -b $tmpSpace5 -loj -c > $right");
+
+	my $Lcount=`wc -l $left|cut -f1 -d" "`;
+	my $Rcount=`wc -l $right|cut -f1 -d" "`;
+	if ($Lcount != $Rcount){die "Need to check for errors in $left and $right\n\n"}
+	system("paste $left $right > $tmpSpace5");
+	system ("rm $left $right");
+	open (IN,"$tmpSpace5") or die "Cant find $tmpSpace5\n";
+	open (OUT,">$tmpSpace6") or die "Cant write in this directory\n";
+	while(<IN>){
+		$_=~chomp;
+		@lines=split("\t",$_);
+		if(($lines[3] > 0)&&($lines[6] > 0)){print OUT "1\t$VCF[$i]\n"}else{print OUT "0\t.\n"}
+		}
+	close IN;
+	close OUT;
+
+	system("paste $tmpSpace7 $tmpSpace6 > $tmpSpace8");
+	#system("head $tmpSpace7 $tmpSpace8");
+	 open (IN,"$tmpSpace8") or die "Cant find $tmpSpace8\n";
+	 open (OUT,">$tmpSpace9") or die "Cant write in this directory\n";
+	 my ($Samples,$NumSamples,$EVENT);
+	 while(<IN>){
+		 $_=~chomp;
+		 @lines=split("\t",$_);
+
+		 if ($lines[8] > 0){
+			$Samples=$lines[7].";".$lines[9];
+			$Samples=~s/^NA;//;
+			$NumSamples=$lines[6]+$lines[8];
+			}
+			else{
+			$Samples=$lines[7];
+			$NumSamples=$lines[6];
+			}
+			print OUT join ("\t",@lines[0..5],$NumSamples,$Samples)."\n";
+	 }
+	 close IN;
+	 close OUT;
+	 print STDERR "completed with $VCF[$i]\n";
+	 system("cp $tmpSpace9 $tmpSpace7");
+}
+
+system("cp $tmpSpace7 $outFile");
+unlink ($tmpSpace9, $tmpSpace8, $tmpSpace7, $tmpSpace9,$tmpSpace3, $tmpSpace4, $tmpSpace5, $tmpSpace6);
+print STDERR "Your results are in $outFile\n";
+
+
+sub usage(){
+    print "
+###
+### This script will merge multiple SoftSearch VCF files
+###
+
+Usage: Merge_SV.pl -v <vcf1> <vcf2> <vcfN> -w [500] -o <output file>
+   
+    Note: Must have bedtools installed and in your path\n\n\n";
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/Merge_Soft.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,39 @@
+#!/usr/bin/perl -s
+#Merge Softsearch results by chrom
+if(!$ARGV[0]){die "Usage: <Sample.1.vcf>\n";}
+my ($sample,$cmd);
+
+#Get basename
+$sample="$ARGV[0]";
+$sample=~s/.[0-9(+)].out.vcf//;
+$sample=~s/.[0-9(+)].pe.vcf//;
+
+my $outfile=$sample;
+$outfile.="out.vcf";
+if( -e $outfile ){unlink($outfile)}
+$cmd="ls $sample\*vcf";
+my @samples=`$cmd`;
+print "there are " .scalar(@samples)." samples\n";
+
+open (OUT,">$outfile");
+my $i=0;
+my $tmp=$samples[$i];
+open(TMP,"$tmp");
+while (<TMP>){
+	print OUT if ($_=~/^#/);
+}
+
+open (OUT,">>$outfile");
+my $chr;
+for (my $i=0;$i<@samples;$i++){
+	my $tmp=@samples[$i];
+	open(TMP,"$tmp");
+	while (<TMP>){
+		unless (($_=~/^chrGL/)||($_=~/^#/)){print OUT $_;}
+	}
+	print "Done with $tmp";
+        unlink($tmp);
+	system("rm $tmp");
+	close TMP;
+}
+close OUT;
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/ReadCluster.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,191 @@
+#!/usr/bin/perl
+
+=head1 NAME
+   ReadCluster.pl
+
+=head1 SYNOPSIS
+
+    USAGE: ReadCluster.pl --input input_sam_file --output output_prefix [--window 10000 --minClusterSize 4]
+
+=head1 OPTIONS
+
+B<--input,-i>
+   Input file
+
+B<--output,-o>
+   output prefix
+
+B<--window, -w>
+    Window size
+
+B<--minClusterSize, -m>
+	Min size of cluster
+
+B<--help,-h>
+   This help message
+
+=head1  DESCRIPTION
+
+
+=head1  INPUT
+
+
+=head1  OUTPUT
+
+
+=head1  CONTACT
+  Jaysheel D. Bhavsar @ bjaysheel[at]gmail[dot]com
+
+
+=head1 EXAMPLE
+   ReadCluster.pl --input=filename.sam --window=10000 --output=PREFIX
+
+=cut
+
+use strict;
+use warnings;
+use Data::Dumper;
+use DBI;
+use Pod::Usage;
+use Scalar::Util qw(looks_like_number);
+use Getopt::Long qw(:config no_ignore_case no_auto_abbrev pass_through);
+
+my %options = ();
+my $results = GetOptions (\%options,
+                          'input|i=s',
+						  'output|o=s',
+                          'window|w=s',
+						  'minClusterSize|m=s',
+						  'help|h') || pod2usage();
+
+## display documentation
+if( $options{'help'} ){
+    pod2usage( {-exitval => 0, -verbose => 2, -output => \*STDERR} );
+}
+#############################################################################
+## make sure everything passed was peachy
+&check_parameters(\%options);
+
+my $r1_start = 0;
+my $r2_start = 0;
+my $r1_end = $r1_start + $options{window};
+my $r2_end = $r2_start + $options{window};
+my $r1_chr = "";
+my $r2_chr = "";
+
+my @cluster = ();
+
+open (FHD, "<", $options{input}) or die "Could not open file $options{input}\n";
+open (INTRA, ">", $options{output} . ".intra.sam") or die "Could not open file $options{output}.intra.sam\n";
+open (INTER, ">", $options{output} . ".inter.sam") or die "Could not open file $options{output}.inter.sam\n";
+
+while (<FHD>){
+	chomp $_;
+
+	#skip processing lines starting with @ just print to output file.
+	if ($_ =~ /^@/){
+		print INTRA $_."\n";
+		print INTER $_."\n";
+		next;
+	}
+#print "$_\n";
+	check_sequence($_);
+}
+
+close(FHD);
+close(INTRA);
+close(INTER);
+
+exit(0);
+
+#############################################################################
+sub check_parameters {
+    my $options = shift;
+
+	my @required = ("input", "output");
+
+	foreach my $key (@required) {
+		unless ($options{$key}) {
+			print STDERR "ARG: $key is required\n";
+			pod2usage({-exitval => 2,  -message => "error message", -verbose => 1, -output => \*STDERR});
+			exit(-1);
+		}
+	}
+
+	unless($options{window}) { $options{window} = 10000; }
+	unless($options{minClusterSize}) { $options{minClusterSize} = 4; }
+}
+
+#############################################################################
+sub check_sequence {
+	my $line = shift;
+
+	my @data = split(/\t/, $line);
+
+	## check if mates are within the window.
+	if ((inWindow($data[3], 1)) && (inWindow($data[7], 2)) &&
+		($r1_chr eq $data[2]) && ($r2_chr eq $data[6])) {
+
+		## if minClusterSize is reached output
+		if (scalar(@cluster) >= $options{minClusterSize}) {
+
+			## if chr are the same then print intra-chr else inter-chr
+			if ($data[6] =~ /=/) {
+				print INTRA $line."\n";
+			} else {
+				print INTER $line."\n";
+			}
+		} else {
+			push @cluster, $line;
+		}
+	} else {
+
+		if (scalar(@cluster) >= $options{minClusterSize}) {
+			dumpCluster(@cluster);
+		}
+
+		@cluster = ();
+		$r1_start = $data[3];
+		$r2_start = $data[7];
+		$r1_end = $r1_start + $options{window};
+		$r2_end = $r2_start + $options{window};
+		$r1_chr = $data[2];
+		$r2_chr = $data[6];
+	}
+}
+
+#############################################################################
+sub inWindow {
+	my $coord = shift;
+	my $read = shift;
+
+	my $start = 0;
+	my $end = 0;
+
+	if ($read == 1) {
+		$start = $r1_start;
+		$end = $r1_end;
+	} else {
+		$start = $r2_start;
+		$end = $r2_end;
+	}
+
+	if (($coord > $start) && ($coord < $end)){
+		return 1;
+	} else { return 0; }
+}
+
+#############################################################################
+sub dumpCluster {
+	my @cluster = @_;
+
+	foreach (@cluster){
+		my @data = split(/\t/, $_);
+
+		if ($data[6] =~ /=/) {
+			print INTRA $_."\n";
+		} else {
+			print INTER $_."\n";
+		}
+	}
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/SoftSearch.multi.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,1183 @@
+#!/usr/bin/perl
+
+####
+#### Usage: SoftSearch.pl [-lqrmsd] -b <BAM> -f <Genome.fa> -sam <samtools path> -bed <bedtools path>
+#### Created 1-30-2012 by Steven Hart, PhD
+#### hart.steven@mayo.edu
+#### Requires bedtools & samtools to be in path
+
+#use FindBin;
+#use lib "$FindBin::Bin/lib";
+use lib "/data2/bsi/reference/softsearch/lib" ;
+use Getopt::Long;
+use strict;
+use warnings;
+use Data::Dumper;
+use LevD;
+use File::Basename;
+use List::Util qw(min max);
+ 
+my (@INPUT_BAM,$INPUT_FASTA,$OUTPUT_FILE,$minSoft,$minSoftReads,$dist_To_Soft,$bedtools,$samtools);
+my ($minRP, $temp_output, $num_sd, $MapQ, $chrom, $unmated_pairs, $minBQ, $pair_only, $disable_RP_only);
+my ($levD_local_threshold, $levD_distl_threshold,$pe_upper_limit,$high_qual,$sv_only,$blacklist,$genome_file,$verbose);
+
+my $cmd = "";
+
+#Declare variables
+GetOptions(
+	'b=s{1,}' => \@INPUT_BAM,
+	'f=s' => \$INPUT_FASTA,
+	'o:s' => \$OUTPUT_FILE,
+	'm:i' => \$minRP,
+	'l:i' => \$minSoft,
+	'r:i' => \$minSoftReads,
+	't:i' => \$temp_output,
+	's:s' => \$num_sd,
+	'd:i' => \$dist_To_Soft,
+	'q:i' => \$MapQ,
+	'c:s' => \$chrom,
+	'u:s' => \$unmated_pairs,
+	'x:s' => \$minBQ,
+	'p' => \$pair_only,	
+	'g' => \$disable_RP_only,	#ignore softclips
+	'j:s' => \$levD_local_threshold,
+	'k:s' => \$levD_distl_threshold,
+    'a:s' => \$pe_upper_limit,
+    'e:s' => \$high_qual,
+	'L' => \$sv_only,
+	'v' => \$verbose, 
+	'blacklist:s' => \$blacklist,
+	'genome_file:s' => \$genome_file,
+	"help|h|?"	=> \&usage);
+#print "Using @INPUT_BAM as INPUT_BAM\n";
+unless($sv_only){$sv_only=""};
+my $INPUT_BAM=$INPUT_BAM[0];
+#print "MY NEW INPUT BAM=$INPUT_BAM[0]\n\n";die;
+if(defined($INPUT_BAM)){$INPUT_BAM=$INPUT_BAM} else {print usage();die "Where is the BAM file?\n\n"}
+if(defined($INPUT_FASTA)){$INPUT_FASTA=$INPUT_FASTA} else {print usage();die "Where is the fasta file?\n\n"}
+my ($fn,$pathname) = fileparse($INPUT_BAM,".bam");
+#my $index=`ls $pathname/$fn*bai|head -1`;
+#my $index =`ls \${INPUT_BAM%.bam}*bai`;
+#print "INDEX=$index\n";
+#if(!$index){die "\n\nERROR: you need index your BAM file\n\n"}
+my $index="";
+### get current time
+print "Start Time : " . &spGetCurDateTime() . "\n";
+my $now = time;
+
+#if(defined($OUTPUT_FILE)){$OUTPUT_FILE=$OUTPUT_FILE} else {$OUTPUT_FILE="output.vcf"; print "\nNo outfile specified.  Using output.vcf as default\n\n"}
+if(defined($minSoft)){$minSoft=$minSoft} else {$minSoft=5}
+if(defined($minRP)){$minRP=$minRP} else {$minRP=5}
+if(defined($minSoftReads)){$minSoftReads=$minSoftReads} else {$minSoftReads=5}
+if(defined($dist_To_Soft)){$dist_To_Soft=$dist_To_Soft} else {$dist_To_Soft=300}
+if(defined($num_sd)){$num_sd=$num_sd} else {$num_sd=6}
+if(defined($MapQ)){$MapQ=$MapQ} else {$MapQ=20}
+
+unless (defined $pe_upper_limit) { $pe_upper_limit = 10000; }
+unless (defined $levD_local_threshold) { $levD_local_threshold = 0.05; }
+unless (defined $levD_distl_threshold) { $levD_distl_threshold = 0.05; }
+#Get sample name if available
+my $SAMPLE_NAME="";
+my $OUTNAME ="";
+$SAMPLE_NAME=`samtools view -f2 -H $INPUT_BAM|awk '{if(\$1~/^\@RG/){sub("ID:","",\$2);name=\$2;print name}}'|head -1`;
+$SAMPLE_NAME=~s/\n//g;
+if (!$OUTPUT_FILE){
+	if($SAMPLE_NAME ne ""){$OUTNAME=$SAMPLE_NAME.".vcf"}
+	else {$OUTNAME="output.vcf"}
+}
+else{$OUTNAME=$OUTPUT_FILE}
+
+print "Writing results to $OUTNAME\n";
+
+
+##Make sure if submitting on SGE, to prepend the "chr".  Not all reference FASTA files require "chr", so we shouldn't force the issue.
+if(!defined($chrom)){$chrom=""}
+if(!defined($unmated_pairs)){$unmated_pairs=0}
+
+my $badQualValue=chr($MapQ);
+if(defined($minBQ)){ $badQualValue=chr($minBQ); }
+
+if($badQualValue  eq "#"){$badQualValue="\#"}
+
+# adding and checking for samtools and bedtools in the PATH
+## check for bedtools and samtools in the path
+$bedtools=`which intersectBed` ;
+if($bedtools !~ /intersectBed/){die "\nError:\n\tno bedtools. Please install bedtools and add to the path\n";}
+#$samtools=`samtools 2>&1`;
+$samtools=`which samtools`;
+if($samtools !~ /(samtools)/i){die "\nError:\n\tno samtools. Please install samtools and add to the path\n";}
+
+print "Usage = SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -s $num_sd -c $chrom -b @INPUT_BAM -f $INPUT_FASTA -o $OUTNAME \n\n";
+sub usage {
+	print "\nusage: SoftSearch.pl [-cqlrmsd] -b <BAM> -f <Genome.fa> \n";
+	print "\t-q\t\tMinimum mapping quality [20]\n";
+	print "\t-l\t\tMinimum length of soft-clipped segment [5]\n";
+	print "\t-r\t\tMinimum depth of soft-clipped reads at position [5]\n";
+	print "\t-m\t\tMinimum number of discordant read pairs [5]\n";
+	print "\t-s\t\tNumber of sd away from mean to be considered discordant [6]\n";
+	print "\t-u\t\tNumber of unmated pairs [0]\n";
+	print "\t-d\t\tMax distance between soft-clipped segments and discordant read pairs [300]\n";
+	print "\t-o\t\tOutput file name [output.vcf]\n";
+	print "\t-t\t\tPrint temp files for debugging [no|yes]\n";
+	print "\t-c\t\tuse only this chrom or chr:pos1-pos2\n";
+	print "\t-p\t\tuse paired-end mode only \n";
+	print "\t-g\t\tEnable paired-only search. This will look for discordant read pairs even without soft clips.\n";
+        print "\t-a\t\tset the minimum distance for a discordant read pair without soft-clipping info [10000]\n";
+        print "\t-L\t\tFlag to print out even small deletions (low quality)\n";
+        print "\t-e\t\tdisable strict quality filtering of base qualities in soft-clipped reads [no]\n";
+        print "\t-blacklist\tareas of the genome to skip calling.  Requires -genome_file\n";
+        print "\t-genome_file\ttab separated value of chromosome name and length.  Only used with -blacklist option\n\n";
+
+	exit 1;
+	}
+
+
+#############################################################
+# create temporary variable name
+#############################################################
+srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
+our $random_name=join "", map { ("a".."z")[rand 26] } 1..8;
+
+#############################################################
+## create green list
+##############################################################
+#
+my $new_blacklist="";
+if($blacklist){
+        if(!$genome_file){die "if using a blacklist, you must also specify the location of a genome_file
+        The format of the genome_file should be
+                chrom   size
+                chr1    249250621
+                chr2    243199373
+                ...
+
+        If using hg19, you can get the genome file by
+                mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \"select chrom, size from hg19.chromInfo\"  > hg19.genome";}
+        
+	$cmd=join("","complementBed -i $blacklist -g $genome_file >",$random_name,".bed") ;
+	system ($cmd);
+	$new_blacklist=join(""," -L ",$random_name,".bed ");
+	}
+
+if($verbose){print "CMD=$cmd\nBlacklist is $new_blacklist\n";}
+
+
+
+
+
+#############################################################
+# Calculate insert size distribution of properly mated reads
+#############################################################
+
+#Change for compatibility with other operating systems
+#my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|cut -f9|head -10000|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)**2)}'`;
+#print "samtools view -q $MapQ -f2 $INPUT_BAM $chrom|cut -f9|head -10000|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)^2)}'\n";
+my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|cut -f9|head -10000|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)^2)}'`;
+#my ($mean,$stdev)=split(/ /,$metrics);
+my ($mean,$stdev)=split(/\s/,$metrics);
+$stdev=~s/\n//;
+
+#print "MEAN=$mean\tSTDEV=$stdev\n\n";
+
+my $upper_limit=int($mean+($num_sd*$stdev));
+my $lower_limit=int($mean-($num_sd*$stdev));
+die if (!$mean);
+print qq{The mean insert size is $mean +/- $stdev (sd)
+The upper limit = $upper_limit
+The lower limit = $lower_limit\n
+};
+if($lower_limit<0){
+	print "Warning!! Given this insert size distribution, we can not call small indels.  No other data will be affected\n\n";
+	$lower_limit=1;
+}
+my $tmp_name=join ("",$random_name,".tmp.bam");
+my $random_file_sc = "";
+my $command = "";
+
+#############################################################
+# Make sam file that has soft clipped reads
+#############################################################
+#give file a name
+if(!defined($pair_only)){
+	foreach my $bam(@INPUT_BAM){
+	$random_file_sc=join ("",$random_name,".sc.sam");
+	$command=join ("","samtools view -q $MapQ -F 1024 $bam $chrom $new_blacklist| awk '{OFS=\"\\t\"}{c=0;if(\$6~/S/){++c};if(c == 1){print}}' | perl -ane '\$TR=(\@F[10]=~tr/\#//);if(\$TR<2){print}' >> ", $random_file_sc);
+	print "Making SAM file of soft-clipped reads from $bam\n";
+	if($verbose){	print "$command\n";}
+	system("$command");
+}
+	#############################################################
+	# Find areas that have deep enough soft-clip coverage
+	print "Identifying soft-clipped regions that are at least $minSoft bp long in $random_file_sc\n";
+	open (FILE,"$random_file_sc")||die "Can't open soft-clipped sam file $random_file_sc\n";
+
+	my $tmpfile=join("",$random_file_sc,".sc.passfilter");
+	open (OUT,">$tmpfile")||die "Can't write files here!\n";
+
+	while(<FILE>){
+		@_ = split(/\t/, $_);
+		#### parse CIGAR string and create a hash of array of each operation
+		my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+		my $hash;
+		map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+		#for ($i=0; $i<=$#softclip_pos; $i++)	{
+		foreach my $softclip (@{$hash->{S}}) {
+			#if	($CIGAR[$softclip_pos[$i]] > $minSoft){
+			if	($softclip > $minSoft){
+				###############Make sure base qualities don't have more than 2 bad marks
+				my $qual=$_[10];
+				my $TR=($qual=~tr/$badQualValue//);
+				if($badQualValue eq "#"){ $TR=($qual=~tr/\#//); }
+				#Skip the soft clip if there is more than 2 bad qual values
+				#next if($TR > 2);
+#				if (!$high_qual){next if($TR > 2);}
+				print OUT;
+				last;
+			}
+		}
+	}
+	close FILE;
+	close OUT;
+
+	$command=join(" ","mv",$tmpfile,$random_file_sc);
+if($verbose){	print "$command\n";}
+	system("$command");
+}
+
+#########################################################
+#Stack up SoftClips
+#########################################################
+my $random_file=join("",$random_name,".sc.direction.bed");
+if(!defined($pair_only)){
+        open (FILE,"$random_file_sc")|| die "Can't open sam file\n";
+        #$random_file=join("",$random_name,".sc.direction");
+
+        print "Calling sides of soft-clips from $random_file_sc\n";
+        #\nTMPOUT=$random_file\tINPUT=$random_file_sc\n\n";
+        open (TMPOUT,">$random_file")|| die "Can't create tmp file\n";
+
+        while (<FILE>){
+                @_ = split(/\t/, $_);
+                #### parse CIGAR string and create a hash of array of each operation
+                my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+                my $hash;
+                map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+                #### next if softclips on each end
+                next if ($_[5] =~ /^[0-9]+S.*S$/);
+
+                #### next if softclip occurs in the middle
+                next if ($_[5] =~ /^[0-9]+[^S][0-9].*S.+$/);
+
+                my $softclip = $hash->{S}[0];
+
+                my $end1 = 0;
+                my $end2 = 0;
+                my $softBases = "";
+		my $right_corrected="";my $left_corrected="";
+
+        if ($softclip > $minSoft) {
+		
+                        ####If the soft clip occurs at end of read and its on the minus strand, then it's a right clip
+                        if ($_[5] =~ /^.*S$/) {
+                                $end1=$_[3]+length($_[9])-$softclip-1;
+                                $end2=$end1+1;
+next if ($end1<0);
+                                #RIGHT clip on Minus
+                                $softBases=substr($_[9], length($_[9])-$softclip, length($_[9]));
+                                #Right clips don't always get clipped correctly, so fix that
+                                # Check to see if sc base matches ref
+                                $right_corrected=baseCheck($_[2],$end2,"right",$softBases);
+                               print TMPOUT "$right_corrected\n"
+
+                        } else {
+                                #### Begins with S (left clip)
+                                $end1=$_[3]-$softclip;
+next if ($end1<0);
+
+                                $softBases=substr($_[9], 0,$softclip);#print "TMP=$softBases\n";
+        			$left_corrected=baseCheck($_[2],$end1,"left",$softBases);
+if(!$left_corrected){print "baseCheck($_[2],$end1,left,$softBases)\n";next}
+                               print TMPOUT "$left_corrected\n";
+#print "\nSEQ=$_[9]\t\n";
+
+                        }
+        }
+  }
+close FILE;
+close TMPOUT;
+}
+sub baseCheck{
+        my ($chrom,$pos,$direction,$softBases)=@_;
+        #skip if position is less than 0, which is caused by MT DNA
+        return if ($pos<0);
+        my $exit="";
+
+        while(!$exit){
+        if($direction=~/right/){
+                        my $refBase=getSeq($chrom,$pos,$INPUT_FASTA);
+                        my $softBase=substr($softBases,0,1);
+                        if ($softBase !~ /$refBase/){
+                                my $value=join("\t",$chrom,$pos,$pos+1,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos+1;
+                                $softBases=substr($softBases, 1,length($softBases));
+                        }
+         }
+        else{
+                        my $refBase=getSeq($chrom,$pos+1,$INPUT_FASTA);
+                        my $softBase=substr($softBases,-1,1);
+                        if ($softBase !~ /$refBase/){
+                                $pos=$pos-1+length($softBases);
+                                my $value=join("\t",$chrom,$pos-1,$pos,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos-1;
+                                $softBases=substr($softBases, 0, -1);
+                                #print "Trying again $softBases\n";
+                       }
+
+        }
+
+}
+}
+#Remove SAM files to conserve space
+unlink($random_file_sc);
+
+
+
+###
+#
+######################################################
+# Transform Read pair groups into softclip equivalents
+######################################################
+#
+#
+#
+my $v="";
+#if($disable_RP_only){
+print "Running Bam2pair.pl\n";
+print "Looking for discordant read pairs without requiring soft-clipping information\n";
+	use FindBin qw($Bin);
+	my $path=$Bin;
+#	print"\n\nPATH=$path\n\n";
+if($verbose){$v="-v"}
+foreach my $random_file_disc(@INPUT_BAM){
+	my $tmp_out=join("",$random_name,"RP.out");
+	$command=join("","perl ",$path,"/Bam2pair.pl -b $random_file_disc  -o $tmp_out -isize $pe_upper_limit -winsize $dist_To_Soft -min $minRP -chrom $chrom -prefix $random_name -q $MapQ -blacklist $random_name.bed $v");
+if($verbose){	print "$command\n"};
+	system("$command");
+	$command=join("","perl -ane '\$end1=\@F[1];\$end2=\@F[3];print join(\"\\t\",\@F[0..1],\$end1,\"unknown|left\");print \"\\n\";print join(\"\\t\",\@F[2..3],\$end2,\"unknown|left\");print \"\\n\"' ", $tmp_out," >> ",$random_file);
+if($verbose){print "$command\n"};
+	system($command);
+	unlink($tmp_out);
+#}
+}
+
+
+######################################################
+unlink("$random_file","$tmp_name","$random_file","$index","$random_name","$new_blacklist") if (-z $random_file || ! -e $random_file) ;
+if (-z $random_file || ! -e $random_file){
+	print "Softclipped file is empty($random_file).\nNo soft clipping found using desired parameters\n\n";
+	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+
+#############################################################
+#  Make sure there are enough soft-clipped supporting reads
+#############################################################
+my $outfile=join("",$random_file,".sc.merge.bed");
+#sortbed -i .sc.direction | mergeBed -nms -d 25 -i stdin > .sc.merge.bed
+$command=join(" ","sortBed -i",$random_file," | mergeBed  -nms -i stdin","|grep \";\"","|awk '{OFS=\"\t\"}(NF==4)'",">",$outfile);
+
+#print "$command\n";
+system("$command");
+
+if (-z $outfile || ! -e $outfile){
+	unlink("$tmp_name","$random_file","$outfile","$index","$random_name","$new_blacklist"); 
+	print "mergeBed file is empty.\nNo structural variants found\n\n" ;
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed mergeBed\n";
+
+###############################################################
+# If left and right are on the same line, make into 2 lines
+###############################################################
+open (INFILE,$outfile)||die "couldn't open temp file : $! \n\n";
+my $tmp2=join("",$random_name,".sc.fixed.merge.bed");
+#print "INFILE=$outfile\tOUTFILE=$tmp2\n\n";
+#INPUT FORMAT=chr9\t131467\t131473\tATGCTTATTAAAA|left;TTATTAAAAGCATA|left
+open (OUTFILE,">$tmp2")||die "couldn't create temp file : $! \n\n";
+while(<INFILE>){
+	chomp $_;
+	my $l = $_;
+
+	my @a = split(/\t/, $l);
+	my $info = $a[3];
+	my @info_arr = split(/\;/, $info);
+	my @left_arr=();
+	my @right_arr=();
+	@left_arr = grep(/left/, @info_arr);
+	@right_arr = grep(/right/, @info_arr);
+
+	#New
+	my $left = join(";", @left_arr);
+	my $right = join(";", @right_arr);
+	$info = join(";", @info_arr);
+
+	if((@left_arr) && (@right_arr)){
+		print OUTFILE "$a[0]\t$a[1]\t$a[2]\t$left\n$a[0]\t$a[1]\t$a[2]\t$right\n";
+	} else{
+		my $all=join("\t",@a[0..2],$info);
+		print OUTFILE "$all\n";
+	}
+}
+
+# make sure output file name is $outfile
+$command=join(" ","sed -e '/ /s//\t/g'", $tmp2,"|awk 'BEGIN{OFS=\"\\t\"}(NF==4)'", "|perl -pne 's/ /\t/g'>",$outfile);
+system("$command");
+if($verbose){print "$command\n"};
+unlink("$tmp_name","$random_file","$tmp2","$outfile","$index","$random_name","$new_blacklist") if (-z $outfile || ! -e $outfile) ;
+ if (-z $outfile || ! -e $outfile){
+	print "Fixed mergeBed file is empty($outfile).\nNo structural variants found\n\n";
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed fixing mergeBed\n\n";
+
+###############################################################
+# Separate directions of soft clips
+###############################################################
+my $left_sc = join("", "left", $tmp2);
+my $right_sc = join("", "right", $tmp2);
+use FindBin qw($Bin);
+#my $path=$Bin;
+
+$command=join("","grep left ", $tmp2, " |sed -e '/left /s//left\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$left_sc);
+system("$command");
+#print "$command\n";
+$command=join("","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$right_sc);
+#$command=join(" ","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g' >",$right_sc);
+system("$command");
+#print "$command\n";
+#die "CHECK $right_sc\n";
+
+###############################################################
+# Count the number and identify directions of soft clips
+###############################################################
+print "Count the number and identify directions of soft clips\n";
+#print "looking in $outfile\n";
+$outfile=join("",$random_name,".sc.fixed.merge.bed");
+#system("ls -lhrt");
+open (INFILE,$outfile)||die "couldn't open temp file\n\n";
+my $tmp3 = join("", $random_file, "predSV");
+open (OUTFILE, ">$tmp3")||die "couldn't create temp file\n\n";
+while(<INFILE>){
+chomp;
+	@_=split(/\t/,$_);
+	my $count=tr/\;//;
+	$count=$count+1;
+	my $left=0;
+	my $right=0;
+
+	while ($_ =~ /left/g) { $left++ } # count number of left clips
+	while ($_ =~ /right/g) { $right++ } # count number of right clips
+
+	###############################################################
+	if ($count >= $minSoftReads){
+		####get longest soft-clipped read
+		my @clips=split(/\;|\|/,$_[3]);
+
+		my ($max, $temp, $temp2, $temp3, $dir, $maxSclip) = (0) x 6;
+		for (my $i=0; $i<$count; $i++) {
+			my $plus1=$i+1;
+			$temp=length($clips[$i]);
+			$temp2=$clips[$plus1];
+			$temp3=$clips[$i];
+
+			if ($temp > $max){
+				$maxSclip=$temp3;
+				$max =$temp;
+				$dir=$temp2;
+			} else {
+				$max=$max;
+				$dir=$dir;
+				$maxSclip=$maxSclip;
+			}
+			$i++;
+		}
+		my $order2 = join("|", $left, $right);
+        #print join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+		print OUTFILE join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+	} elsif($_=~/unknown/){
+	print OUTFILE join ("\t",@_[0..2],"NA","NA","left","NA","NA|NA") . "\n";
+        print OUTFILE join ("\t",@_[0..2],"NA","NA","right","NA","NA|NA") . "\n";
+	}
+	####Format is Chrom, start, end, longest soft-clip, length of longest soft-clip, direction of longest soft-clip, #supporting soft-clips, #left Sclips|#right Sclips
+}
+close INFILE;
+close OUTFILE;
+
+unlink("$tmp2","$tmp_name","$random_file","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$new_blacklist") if (-z $tmp3 || !-e $tmp3) ;
+
+ if (-z $tmp3 || !-e $tmp3){
+	print "No structural variants found while counting the number and identifying directions of soft clips.\n" ;
+
+#	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+#	&print_header();
+#	close OUT;
+exit;
+}
+
+print "Done counting Softclipped reads\n";
+###############################################################
+#### Print header information
+###############################################################
+
+
+foreach my $random_file_disc(@INPUT_BAM){
+print "Making the header for $random_file_disc\n";
+$SAMPLE_NAME=`samtools view -f2 -H $random_file_disc|awk '{if(\$1~/^\@RG/){sub("ID:","",\$2);name=\$2;print name}}'|head -1`;
+$SAMPLE_NAME=~s/\n//g;
+if($chrom){$SAMPLE_NAME.=".".$chrom}
+
+$SAMPLE_NAME.=".vcf";
+open (OUT,">$SAMPLE_NAME")||die "Can't write files here!\n";
+&print_header();
+
+# DO the bulk of the work
+open (FILE,"$tmp3")|| die "Can't open file\n";
+
+while (<FILE>){
+	#If left clip {+- or -- or -+ }{+- are uninformative b/c they go upstream}
+	#If right clip {++ or -- or +-}
+	chomp $_;
+	my @res=();my $res;
+	my $line = $_;
+	my @info = split(/\t/, $_);
+	my $i=0;
+	my $basename=basename($random_file_disc);$i=0;
+	if($info[5] eq "left") {
+		$res=bulk_work("left", $line, $random_file_disc);
+                if(!$res){$res=join("\t",".",".",".",".",".",".",".",".",".",".")};
+		$i++;
+		} 
+	elsif ($info[5] eq "right") {
+		$res=bulk_work("right", $line, $random_file_disc);
+		if(!$res){$res=join("\t",".",".",".",".",".",".",".",".",".",".")};
+		$i++;
+		}
+	if($res){@res=split("\t",$res);
+	print OUT join("\t",@res)."\n";
+	}}
+close FILE;
+close OUT;
+print "Done with $random_file_disc\n\n";
+}
+
+
+
+###############################################################################
+###############################################################################
+#### Delete temp files
+my $meregedBed=join("",$random_name,".sc.direction.bed.sc.merge.bed");
+
+if(defined($temp_output)){$temp_output=$temp_output} else {$temp_output="no"}
+
+if ($temp_output eq "no"){
+	unlink("$tmp_name","$random_file","$tmp2","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$meregedBed","$random_name.bed");
+}
+####Sort VCF
+#my $tmp=join(".",$random_name,"tmp");
+#Get header
+#$cmd="grep \"#\" $OUTNAME > $tmp";
+#system($cmd);
+#sort results
+#$cmd="grep -v \"#\" $OUTNAME|perl -pne 's/chr//'|sort -k1,1n -k2,2n|perl -ne 'print \"chr\".\$_' >>$tmp";
+#system($cmd);
+#$cmd="mv $tmp $OUTNAME";
+#system($cmd);
+#remove entries next to each other
+
+
+print "Analysis Completed\n\nYou did it!!!\n";
+print "Finish Time : " . &spGetCurDateTime() . "\n";
+$now = time - $now;
+printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600), int(($now % 3600) / 60),
+int($now % 60));
+
+exit;
+
+###############################################################################
+sub rev_comp {
+  my $dna = shift;
+  my $revcomp = reverse($dna);
+  $revcomp =~ tr/ACGTacgt/TGCAtgca/;
+  return $revcomp;
+}
+
+
+###############################################################################
+#### to get reference base
+sub getSeq{
+	my ($chr,$pos,$fasta)=@_;
+	#don't require chr
+	#if($chr !~ /^chr/){die "$chr is not correct\n";}
+#	die "$pos is not a number\n" if ($pos <0);
+my @result=();
+        if ($pos <0){print "$pos is not a valid position (likely caused by circular MT chromosome)\n";return;}
+
+	@result = `samtools faidx $fasta $chr:$pos-$pos`;
+	if($result[1]){chomp($result[1]);
+	return uc($result[1]);
+	}
+	return("NA");
+	#### after return will not be printed
+	####print "RESULTS=@result\n";
+}
+
+sub getBases{
+        my ($chr,$pos1,$pos2,$fasta)=@_;
+        #don't require chr
+        #if($chr !~ /^chr/){die "$chr is not correct\n";}
+my @result=();
+        if ($pos1 <0){print "$pos1 is not a valid position (likely caused by circular MT chromosome)\n";return;};
+
+        @result = `samtools faidx $fasta $chr:$pos1-$pos2`;
+	if(!$result[1]){$result[1]="NA"};
+        chomp($result[1]);
+        return uc($result[1]);
+
+        #### after return will not be printed
+        ####print "RESULTS=@result\n";
+}
+###############################################################################
+#### to get time
+sub spGetCurDateTime {
+	my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
+	my $curDateTime = sprintf "%4d-%02d-%02d %02d:%02d:%02d",
+	$year+1900, $mon+1, $mday, $hour, $min, $sec;
+	return ($curDateTime);
+}
+
+
+###############################################################################
+#### print header
+sub print_header {
+	my $date=&spGetCurDateTime();
+	my $header = qq{##fileformat=VCFv4.1
+##fileDate=$date
+##source=SoftSearch.pl
+##reference=$INPUT_FASTA
+##Usage= SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -u $unmated_pairs -s $num_sd -b @INPUT_BAM -f $INPUT_FASTA -o $OUTNAME
+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
+##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
+##FORMAT=<ID=lSC,Number=1,Type=Integer,Description="Length of the longest soft clip supporting the BND">
+##FORMAT=<ID=nSC,Number=1,Type=Integer,Description="Number of supporting soft-clips">
+##FORMAT=<ID=uRP,Number=1,Type=Integer,Description="Number of unmated read pairs nearby Soft-Clips">
+##FORMAT=<ID=levD_local,Number=1,Type=Float,Description="Levenshtein distance between soft-clipped bases and the area around the original soft-clipped site">
+##FORMAT=<ID=levD_distl,Number=1,Type=Float,Description="Levenshtein distance between the soft-clipped bases and mate location">
+##FORMAT=<ID=CTX,Number=1,Type=Integer,Description="Number of chromosomal translocations">
+##FORMAT=<ID=DEL,Number=1,Type=Integer,Description="Number of reads supporting Large Deletions">
+##FORMAT=<ID=INS,Number=1,Type=Integer,Description="Number of reads supporting Large insertions">
+##FORMAT=<ID=NOV_INS,Number=1,Type=Integer,Description="Number of reads supporting novel sequence insertion">
+##FORMAT=<ID=INV,Number=1,Type=Integer,Description="Number of reads supporting inversions">
+##FORMAT=<ID=sDEL,Number=1,Type=Integer,Description="Number of reads supporting small deletions">
+##INFO=<ID=NO_MATE_SC,Number=0,Type=Flag,Description="When there is no softclipping of the mate read location, an approximate position is used">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Dummy value for maintaining VCF-Spec">
+#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t$SAMPLE_NAME\n};
+
+	print OUT $header;
+}
+
+
+###############################################################################
+sub bulk_work {
+	my ($side, $line, $file) = @_;
+	my $local_levD = 0;
+	my $distl_levD = 0;
+
+	#my @info = split(/\t/, $line);
+	my @plus_Reads = split(/\t/, $line);
+	$plus_Reads[7] =~ s/\n//g;
+
+	#### length of the longest softclip and number of supporting softclips.
+	my $lSC = $plus_Reads[4];
+	my $nSC = $plus_Reads[6];
+
+
+	#Get all types of compatible reads
+	#Get improperly paired reads (@ max distance)
+
+	#### default value for left SIDE.
+	#If left-clip, then look downstream for match of softclipped reads to define a deletion, but look for DRPs upstream
+	my $sv_type = "SVTYPE=BND";
+	my $start_local=0; my $end_local=0;my $target_local="";my $target_drp="";my $start_drp="";my $end_drp="";
+	if ($side =~ /left/) {
+		$start_local = $plus_Reads[1]-$dist_To_Soft;
+		$end_local = $plus_Reads[2];
+                $start_drp = $plus_Reads[1];
+                $end_drp = $plus_Reads[1]+$dist_To_Soft;
+	
+	}
+	else{                
+                $start_local = $plus_Reads[1];
+                $end_local = $plus_Reads[1]+$dist_To_Soft;
+                $start_drp = $plus_Reads[1]-$dist_To_Soft;
+                $end_drp = $plus_Reads[1];
+        }
+	
+	$target_local=join("", $plus_Reads[0], ":", $start_local, "-", $end_local);
+	$target_drp=join("", $plus_Reads[0], ":", $start_drp, "-", $end_drp);
+	my $num_unmapped_pairs="";
+	if ($side =~ /right/) {
+		$num_unmapped_pairs=`samtools view $new_blacklist -q $MapQ -f8 -F 1536 -c $file $target_drp`;
+	} else {
+        $num_unmapped_pairs=`samtools view $new_blacklist -q $MapQ -f24 -F 1536 -c $file $target_drp`;
+	}
+if($verbose){print "samtools view $new_blacklist -q $MapQ -f24 -F 1536 -c $file $target_drp\n";}
+
+	$num_unmapped_pairs=~s/\n//;
+if($verbose){print "NUM UNMAPPED PAIRS= $num_unmapped_pairs\n";}
+	my $REF1_base = "";
+	my $REF2_base = "";
+	my $INFO_1 = "";
+	my $INFO_2 = "";
+	my $ALT_1 = "";
+	my $ALT_2 = "";
+	my $isize = 0;
+	my $QUAL = "";
+	my $FORMAT = "GT:";
+
+	#### get 8-character random id
+	my $BND1_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	my $BND2_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	$BND1_name=join "_","BND",$BND1_name;
+	$BND2_name=join "_","BND",$BND2_name;
+
+	my $counts = {CTX => 0, DEL => 0, INS => 0, INV => 0, TDUP => 0, NOV_INS => 0 };
+	my $event_mate_info = {CTX => "", DEL => "", INS => "", INV => "", TDUP => "", NOV_INS => "" };
+
+	#### get mate pair info and counts per event
+	foreach my $e (sort keys %{$counts}) {
+		my $h = get_counts_n_info($e, $side, $MapQ, $file, $dist_To_Soft, $target_drp, $upper_limit, $lower_limit);
+
+		$counts->{$e} = $h->{count};
+		$event_mate_info->{$e} = $h->{info};
+	}
+
+	my $max = 0;
+	my $type = "UNKNOWN";
+	my $nRP = 0;
+	my $mate_info = "NA\tNA\tNA\tNA";
+	my $summary = "GT:";
+
+	#### find max count of events and set type, nRP and info to corresponding
+	#### max count event.
+	#### also create a summary string of all counts to be added to VCF file.
+	foreach my $e (sort keys %{$counts}){
+#		if ($counts->{$e} >=i $max){
+		if ($counts->{$e} > $max){		
+			$type = $e .",". $counts->{$e};
+			$nRP = $counts->{$e};
+
+			$max = $counts->{$e};
+
+			if (length($event_mate_info->{$e})) {
+				$mate_info = $event_mate_info->{$e};
+			}
+		}
+
+		$summary .= $e .",". $counts->{$e} .":";
+	}
+	#print "done with Summaryi=$summary\n";
+	#### remove last colon ":" from the summary
+	$summary =~ s/:$//;
+ if (($minRP > $max)&&(!$disable_RP_only )){return};
+
+	#### Run Levenshtein distance on softclip in target region to find out if it's a small deletion/insertion
+	#### passing 1: clip_seq, 2: chr, 3: start, 4: end, 5: ref file.
+	my $levD = LevD->new;
+########################################################
+########################################################
+########################################################
+
+	#### redefine start and end location for LevD calc.
+#	$start = $plus_Reads[1]-$dist_To_Soft;
+#	$end = $plus_Reads[2];
+	my $num_bases_to_loc=0;
+	my $new_start=0;
+	my $new_end=0;
+	my $del_seq="";
+        my $start = $start_local;
+        my $end = $end_local;
+	if ($lSC=~/NA/){$lSC=0}
+
+	if ($side =~ /right/) {
+	        $levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	        $num_bases_to_loc=$levD->{index};
+		$new_start = $plus_Reads[2];
+                if ($plus_Reads[2]=~/^[0-9]/){$new_end=$plus_Reads[2]+$lSC};
+	}
+	else{
+		$levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+		$num_bases_to_loc=$levD->{index};
+		if ($plus_Reads[2]=~/^[0-9]/){$new_start=$plus_Reads[2]-$lSC};
+                $new_end = $plus_Reads[2];
+	}
+	return if((!$new_start)||(!$new_end));
+return if ($new_start<0);	
+	$del_seq=getBases($plus_Reads[0], $new_start,$new_end,$INPUT_FASTA);
+##############################################################################
+#	#If there is a match, where is the start position of the match?
+#
+##############################################################################
+
+
+	#if $plus_Reads[3] eq "NA", then it was found without soft-clipped reads
+	if($plus_Reads[3] !~  /NA/){
+			if (($local_levD < $levD_local_threshold)) {
+				return if (!$sv_only);
+				#### add value to summary to be written to vcf file.
+				$summary = "GT:sDel," . $plus_Reads[6];
+				$type = "sDEL";
+				###########################################################################
+				##### Printing output
+
+				#########################################
+				##### Get DNA info
+				#########################################
+				#$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF1_base = substr($del_seq, 0, 1);
+
+				#### this is alt ref. for softclip its $plus_Reads[3]
+				$REF2_base = $del_seq;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$isize = length($del_seq);
+
+				#### svtype = none for sDEL
+				#### isize = length($info[3]);
+				#### nRP = NA
+				#### mate_id = NA
+				#### CTX,:DEL,:....sDEL,##
+				$INFO_1=join(";", "SVTYPE=NA", "EVENT=$type", "ISIZE=$isize");
+
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE= "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+				$INFO_2=~s/\s//g;
+
+				$BND1_name =~ s/^BND/LEVD/;
+				# If left, then the start position is plus_Reads[1]-isize
+				my $start_pos=0;
+				#Make sure Ref1 and Ref2 bases are different
+				if($REF2_base eq $REF1_base){$REF1_base="NA"}
+				if($side=~/left/){$start_pos=$plus_Reads[1]-$isize}else{$start_pos=$plus_Reads[1]};		
+				 my $var=join("\t", $plus_Reads[0], $start_pos, $BND1_name, $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE);
+				return $var;
+				#print OUT join ("\t", $plus_Reads[0], $start_pos, $BND1_name, $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+#				return;
+			}
+		}
+
+		#### Otherwise, look for DRP mate info
+	#if($nRP=~/NA/){print "MATE_INFO=$mate_info\tSide=$side\tline=$line\n";}
+		my @mate_info_arr = split(/\t/, $mate_info);
+		$nRP = $mate_info_arr[3];
+		my $mate_chr=$mate_info_arr[0];
+
+			if((! defined $nRP) || ($nRP =~ /na/i) || ($mate_chr =~ /NA/) ){
+			#PRINT UNKNOWN
+return if ($nRP =~ /na/i);
+	#print "There is an unknown\nNRP=$nRP Mate_CHR=$mate_chr minRP=$minRP\n";die;
+				$summary .= ":unknown," . $plus_Reads[6];
+				$type = "unknown";
+				$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF2_base = $plus_Reads[3];
+				$BND1_name =~ s/^BND/UNKNOWN/;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$INFO_1=join(";", "SVTYPE=unknown", "EVENT=unknown", "ISIZE=unknown");
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE = "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+			       #print join ("\t", $plus_Reads[0], $plus_Reads[1],  $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+
+				#print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $REF2_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+				my $var=join("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $REF2_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE);
+				return $var;
+
+		}
+
+		#### end if there is no mate info or nRP+uRP<minRP
+		return if (($nRP<$minRP)&&($unmated_pairs > ($num_unmapped_pairs+$nRP)));
+
+		##################################################################################
+		# Find out if mates have nearby soft-clips (to refine the breakpoints)
+		##################################################################################
+		#Look for evidence of soft-clipping near mate
+		my @mate_soft_arr = ();
+		my $mate_start = 0;
+		my $mate_soft = "";
+
+		@mate_info_arr = split(/\t/, $mate_info);
+
+		#### mate start and end locations.
+		my $filename = $right_sc;
+
+		$start = $mate_info_arr[1] - $dist_To_Soft;
+		$end = $mate_info_arr[1];
+
+		if ($side =~ /right/) {
+			$start = $mate_info_arr[2];
+			$end = $mate_info_arr[2] + $dist_To_Soft;
+
+			$filename = $left_sc;
+		}
+
+		#### add Levenshtein distance to Summary
+	#print "Calc distal Levd\n";
+		$levD->search(rev_comp($plus_Reads[3]), $mate_info_arr[0], $start, $end, $INPUT_FASTA);
+		$distl_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	$distl_levD = "NA" if($plus_Reads[3] =~ /NA/);
+	#If there is no softclips to string match, then give 0 as quality value
+       if ($plus_Reads[3] !~ /NA/){
+			$QUAL=1/($distl_levD + 0.001);
+		}
+		else	{
+			$QUAL=0;
+		};
+	$QUAL=sprintf("%.2f",$QUAL);
+	#### looking for softclips to refine break point
+	#### if left look in right and vice-versa.
+	$cmd = qq{echo -e "$mate_info_arr[0]\t$start\t$end"};
+	$cmd .= qq{ | awk -F'\t' 'NF==3' | intersectBed -a stdin -b $filename | head -1};
+
+	$mate_soft = `$cmd`;
+
+	$mate_soft =~ s/\n//g;
+	@mate_soft_arr = split(/\s/, $mate_soft);
+my $NO_MATE_SC="";
+	if(@mate_soft_arr){
+		$mate_chr = $mate_soft_arr[0];
+		$mate_start = $mate_soft_arr[1];
+	} else{
+		#### no soft clips near the mate, so the mate position is only approximate
+		@mate_info_arr = split(/\s/,$mate_info);
+		$mate_chr = $mate_info_arr[0];
+		$mate_start = $mate_info_arr[1];
+                $NO_MATE_SC="APPROXIMATE";
+	}
+
+	#end if there is no mate info
+	return if ($mate_chr eq "");
+	#end if there is no mate info and !disable_RP_only
+	return if (($lSC =~/NA/)&&(!$disable_RP_only));
+	
+	
+	###########################################################################
+	##### Printing output
+
+	#########################################
+	# Get DNA info
+	#########################################
+	#print "PLUS_READS=$plus_Reads[0],$plus_Reads[1]\nMATE=$mate_chr,$mate_start,$INPUT_FASTA\n";
+	$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+
+	### this is alt ref. for softclip its $plus_Reads[3]
+	$REF2_base = getSeq($mate_chr, $mate_start, $INPUT_FASTA);
+
+	#########################################
+	# print in VCF format
+	#########################################
+
+	#### abs value to account for left and right reads.
+	$isize = abs($plus_Reads[1]-$mate_start);
+	
+	my $event_type=$type;
+	$event_type=~ s/,|[0-9]//g;
+	$INFO_1=join(";", "$sv_type", "EVENT=$event_type", "ISIZE=$isize","MATE_ID=$BND2_name");
+	$INFO_2=join(";", "$sv_type", "EVENT=$event_type", "ISIZE=$isize","MATE_ID=$BND1_name");
+
+	#### remove any white spaces.
+	#### ask: did you mean to remove space from ends? eg. trim()
+	$INFO_1=~s/\s//g;
+	$INFO_2=~s/\s//g;
+
+	$FORMAT=$summary; 
+ 	$FORMAT=~ s/,|[0-9]//g;
+        $FORMAT .= ":lSC:nSC:uRP:distl_levD";
+	if($NO_MATE_SC){$INFO_2 .= ";NO_MATE_SC"}
+	my $SAMPLE="0/1:";	
+	$SAMPLE .=$summary;
+#        if($NO_MATE_SC){$SAMPLE.= ":$NO_MATE_SC"}
+
+	$SAMPLE=~s/[A-Z|,|_]//g;
+        my $MATE_SAMPLE=$SAMPLE;
+        $SAMPLE .= ":$lSC:$nSC:$num_unmapped_pairs:$distl_levD";
+	$MATE_SAMPLE .=":NA:NA:NA:NA";
+	$SAMPLE=~s/::/:/g;
+	$MATE_SAMPLE=~s/::/:/g;
+ 
+	if($type !~ /INV/){
+		$ALT_1 = join("","]",$mate_chr,":",$mate_start,"]",$REF1_base);
+		$ALT_2 = join("",$REF2_base,"[",$plus_Reads[0],":",$plus_Reads[1],"[");
+		#		2      321682 bnd_V  T   ]13:123456]T  6    PASS SVTYPE=BND
+		#		13     123456 bnd_U  C   C[2:321682[   6    PASS SVTYPE=BND
+	} else {
+		$ALT_1 = join("", "]", $plus_Reads[0], ":", $plus_Reads[1], "]", $REF2_base);
+		$ALT_2 = join("", $REF1_base, "[", $mate_chr, ":", $mate_start, "[");
+	}
+
+	if(($mate_chr) && ($plus_Reads[0])){
+#		print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $ALT_1, $QUAL,"PASS", $INFO_1, $FORMAT,$SAMPLE,"\n");
+#		print OUT join ("\t", $mate_chr, $mate_start, $BND2_name, $REF2_base, $ALT_2, $QUAL, "PASS", $INFO_2, $FORMAT,$MATE_SAMPLE,"\n");
+		my $var=join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $ALT_1, $QUAL,"PASS", $INFO_1, $FORMAT,$SAMPLE);
+		return $var;		
+	}
+}
+
+###############################################################################
+###############################################################################
+sub get_counts_n_info {
+        my ($event, $side, $mapQ, $file, $dist, $target, $upL, $lwL) = @_;
+
+        my $mate_info = "";
+        my $cmd = "";
+
+        if ($event =~ /^CTX$/i) {
+                #print "CTX side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{ samtools view $new_blacklist -q $mapQ -f 16 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^DEL$/i) {
+                #print "DEL side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -F 1568 -f 16 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"} {if((\$7 ~ /=/)&&(\$9<-$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist_To_Soft -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^INS$/i) {
+                #print "INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<$lwL && \$9 > 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq {samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>-$lwL && \$9 < 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^INV$/i) {
+                #print "INV side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -F 1596 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist_To_Soft -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 48 -F 1548 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^TDUP$/i) {
+                #print "TDUP side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+#			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4>\$8)&&(\$9<0)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+#                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<-$upL )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4<\$8)&&(\$9>0)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^NOV_INS$/i) {
+                #print "NOV_INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 8 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 24 -F 1536 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        }
+
+        $mate_info=~s/\n//g;
+        my @tmp=split(/\t/, $mate_info);
+
+        my $counts = 0;
+
+        if (defined $tmp[3]) {
+                $tmp[3] =~ s/\n//g;
+
+                $counts = $tmp[3] if (length($tmp[3]));
+        }
+        return ({count=>$counts, info=>$mate_info});
+}
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/SoftSearch.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,1192 @@
+#!/usr/bin/perl
+
+####
+#### Usage: SoftSearch.pl [-lqrmsd] -b <BAM> -f <Genome.fa>
+#### Created 1-30-2012 by Steven Hart, PhD
+#### hart.steven@mayo.edu
+#### Requires bedtools & samtools to be in the path
+
+
+use lib "/data2/bsi/reference/softsearch/lib" ;
+
+use Getopt::Long;
+use strict;
+use warnings;
+#use Data::Dumper;
+use LevD;
+use File::Basename;
+
+my ($INPUT_BAM,$INPUT_FASTA,$OUTPUT_FILE,$minSoft,$minSoftReads,$dist_To_Soft,$bedtools,$samtools);
+my ($minRP, $temp_output, $num_sd, $MapQ, $chrom, $unmated_pairs, $minBQ, $pair_only, $disable_RP_only);
+my ($levD_local_threshold, $levD_distl_threshold,$pe_upper_limit,$high_qual,$sv_only,$blacklist,$genome_file,$verbose);
+
+my $cmd = "";
+
+#Declare variables
+GetOptions(
+	'b=s' => \$INPUT_BAM,
+	'f=s' => \$INPUT_FASTA,
+	'o:s' => \$OUTPUT_FILE,
+	'm:i' => \$minRP,
+	'l:i' => \$minSoft,
+	'r:i' => \$minSoftReads,
+	't:s' => \$temp_output,
+	's:s' => \$num_sd,
+	'd:i' => \$dist_To_Soft,
+	'q:i' => \$MapQ,
+	'c:s' => \$chrom,
+	'u:s' => \$unmated_pairs,
+	'x:s' => \$minBQ,
+	'p' => \$pair_only,
+	'g' => \$disable_RP_only,
+	'j:s' => \$levD_local_threshold,
+	'k:s' => \$levD_distl_threshold,
+        'a:s' => \$pe_upper_limit,
+        'e:s' => \$high_qual,
+	'L' => \$sv_only,
+	'v' => \$verbose, 
+	'blacklist:s' => \$blacklist,
+	'genome_file:s' => \$genome_file,
+	"help|h|?"	=> \&usage);
+
+unless($sv_only){$sv_only=""};
+if(defined($INPUT_BAM)){$INPUT_BAM=$INPUT_BAM} else {print usage();die "Where is the BAM file?\n\n"}
+if(defined($INPUT_FASTA)){$INPUT_FASTA=$INPUT_FASTA} else {print usage();die "Where is the fasta file?\n\n"}
+my ($fn,$pathname) = fileparse($INPUT_BAM,".bam");
+my $index=`ls $pathname/$fn*bai|head -1`;
+#my $index =`ls \${INPUT_BAM%.bam}*bai`;
+#print "INDEX=$index\n";
+if(!$index){die "\n\nERROR: you need to index your BAM file\n\n"}
+
+### get current time
+print "Start Time : " . &spGetCurDateTime() . "\n";
+my $now = time;
+
+#if(defined($OUTPUT_FILE)){$OUTPUT_FILE=$OUTPUT_FILE} else {$OUTPUT_FILE="output.vcf"; print "\nNo outfile specified.  Using output.vcf as default\n\n"}
+if(defined($minSoft)){$minSoft=$minSoft} else {$minSoft=5}
+if(defined($minRP)){$minRP=$minRP} else {$minRP=5}
+if(defined($minSoftReads)){$minSoftReads=$minSoftReads} else {$minSoftReads=5}
+if(defined($dist_To_Soft)){$dist_To_Soft=$dist_To_Soft} else {$dist_To_Soft=300}
+if(defined($num_sd)){$num_sd=$num_sd} else {$num_sd=6}
+if(defined($MapQ)){$MapQ=$MapQ} else {$MapQ=20}
+
+unless (defined $pe_upper_limit) { $pe_upper_limit = 10000; }
+unless (defined $levD_local_threshold) { $levD_local_threshold = 0.05; }
+unless (defined $levD_distl_threshold) { $levD_distl_threshold = 0.05; }
+#Get sample name if available
+my $SAMPLE_NAME="";
+my $OUTNAME ="";
+$SAMPLE_NAME=`samtools view -f2 -H $INPUT_BAM|awk '{if(\$1~/^\@RG/){sub("ID:","",\$2);name=\$2;print name}}'|head -1`;
+$SAMPLE_NAME=~s/\n//g;
+if (!$OUTPUT_FILE){
+	if($SAMPLE_NAME ne ""){$OUTNAME=$SAMPLE_NAME.".vcf"}
+	else {$OUTNAME="output.vcf"}
+}
+else{$OUTNAME=$OUTPUT_FILE}
+
+print "Writing results to $OUTNAME\n";
+
+
+##If submitting on SGE, make sure to prepend the "chr".  Not all reference FASTA files require "chr", so we shouldn't force the issue.
+if(!defined($chrom)){$chrom=""}
+if(!defined($unmated_pairs)){$unmated_pairs=0}
+
+my $badQualValue=chr($MapQ);
+if(defined($minBQ)){ $badQualValue=chr($minBQ); }
+
+if($badQualValue  eq "#"){$badQualValue="\#"}
+
+# adding and checking for samtools and bedtools in the PATH
+## check for bedtools and samtools in the path
+$bedtools=`which intersectBed` ;
+if($bedtools !~ /intersectBed/){die "\nError:\n\tno bedtools. Please install bedtools and add to the path\n";}
+#$samtools=`samtools 2>&1`;
+$samtools=`which samtools`;
+if($samtools !~ /(samtools)/i){die "\nError:\n\tno samtools. Please install samtools and add to the path\n";}
+
+print "Usage = SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -s $num_sd -c $chrom -b $INPUT_BAM -f $INPUT_FASTA -o $OUTNAME \n\n";
+sub usage {
+	print "\nusage: SoftSearch.pl [-cqlrmsd] -b <BAM> -f <Genome.fa> \n";
+	print "\t-q\t\tMinimum mapping quality [20]\n";
+	print "\t-l\t\tMinimum length of soft-clipped segment [5]\n";
+	print "\t-r\t\tMinimum depth of soft-clipped reads at position [5]\n";
+	print "\t-m\t\tMinimum number of discordant read pairs [5]\n";
+	print "\t-s\t\tNumber of sd away from mean to be considered discordant [6]\n";
+	print "\t-u\t\tNumber of unmated pairs [0]\n";
+	print "\t-d\t\tMax distance between soft-clipped segments and discordant read pairs [300]\n";
+	print "\t-o\t\tOutput file name [output.vcf]\n";
+	print "\t-t\t\tPrint temp files for debugging [no|yes]\n";
+	print "\t-c\t\tuse only this chrom or chr:pos1-pos2\n";
+	print "\t-p\t\tuse paired-end mode only. In other words, don't try to find soft-clipping events!\n";
+	print "\t-g\t\tEnable paired-only search. This will look for discordant read pairs even without soft clips.\n";
+        print "\t-a\t\tset the minimum distance for a discordant read pair without soft-clipping info [10000]\n";
+        print "\t-L\t\tFlag to print out even small deletions (low quality)\n";
+        print "\t-e\t\tdisable strict quality filtering of base qualities in soft-clipped reads [no]\n";
+        print "\t-blacklist\tareas of the genome to skip calling.  Requires -genome_file\n";
+        print "\t-genome_file\ttab separated value of chromosome name and length.  Only used with -blacklist option\n\n";
+
+	exit 1;
+	}
+
+
+#############################################################
+# create temporary variable name
+#############################################################
+srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
+our $random_name=join "", map { ("a".."z")[rand 26] } 1..8;
+
+#############################################################
+## create green list
+##############################################################
+#
+my $new_blacklist="";
+if($blacklist){
+        if(!$genome_file){die "if using a blacklist, you must also specify the location of a genome_file
+        The format of the genome_file should be
+                chrom   size
+                chr1    249250621
+                chr2    243199373
+                ...
+
+        If using hg19, you can get the genome file by
+                mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \"select chrom, size from hg19.chromInfo\"  > hg19.genome";}
+        
+	$cmd=join("","complementBed -i $blacklist -g $genome_file >",$random_name,".bed") ;
+	system ($cmd);
+	$new_blacklist=join(""," -L ",$random_name,".bed ");
+	}
+
+if($verbose){print "CMD=$cmd\nBlacklist is $new_blacklist\n";}
+
+
+
+
+
+#############################################################
+# Calculate insert size distribution of properly mated reads
+#############################################################
+
+#Change for compatibility with other operating systems
+#my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|head -10000|cut -f9|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)**2)}'`;
+
+my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|cut -f9|head -10000|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)^2)}'`;
+#my ($mean,$stdev)=split(/ /,$metrics);
+my ($mean,$stdev)=split(/\s/,$metrics);
+$stdev=~s/\n//;
+my $upper_limit=int($mean+($num_sd*$stdev));
+my $lower_limit=int($mean-($num_sd*$stdev));
+die if (!$mean);
+print qq{The mean insert size is $mean +/- $stdev (sd)
+The upper limit = $upper_limit
+The lower limit = $lower_limit\n
+};
+if($lower_limit<0){
+	print "Warning!! Given this insert size distribution, we can not call small indels.  No other data will be affected\n";
+	$lower_limit=1;
+}
+my $tmp_name=join ("",$random_name,".tmp.bam");
+my $random_file_sc = "";
+my $command = "";
+
+#############################################################
+# Make sam file that has soft clipped reads
+#############################################################
+#give file a name
+if(!defined($pair_only)){
+	$random_file_sc=join ("",$random_name,".sc.sam");
+	$command=join ("","samtools view -q $MapQ -F 1024 $INPUT_BAM $chrom $new_blacklist| awk '{OFS=\"\\t\"}{c=0;if(\$6~/S/){++c};if(c == 1){print}}' | perl -ane '\$TR=(\@F[10]=~tr/\#//);if(\$TR<2){print}' > ", $random_file_sc);
+
+	print "Making SAM file of soft-clipped reads\n";
+if($verbose){	print "$command\n";}
+	system("$command");
+
+	#############################################################
+	# Find areas that have deep enough soft-clip coverage
+	print "Identifying soft-clipped regions that are at least $minSoft bp long \n";
+	open (FILE,"$random_file_sc")||die "Can't open soft-clipped sam file $random_file_sc\n";
+
+	my $tmpfile=join("",$random_file_sc,".sc.passfilter");
+	open (OUT,">$tmpfile")||die "Can't write files here!\n";
+
+	while(<FILE>){
+		@_ = split(/\t/, $_);
+		#### parse CIGAR string and create a hash of array of each operation
+		my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+		my $hash;
+		map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+		#for ($i=0; $i<=$#softclip_pos; $i++)	{
+		foreach my $softclip (@{$hash->{S}}) {
+			#if	($CIGAR[$softclip_pos[$i]] > $minSoft){
+			if	($softclip > $minSoft){
+				###############Make sure base qualities don't have more than 2 bad marks
+				my $qual=$_[10];
+				my $TR=()=$qual=~/\Q$badQualValue\E/g; # tr/// does not interpolate variables, so count matches with a regex
+				if($badQualValue eq "#"){ $TR=($qual=~tr/\#//); }
+				#Skip the soft clip if there is more than 2 bad qual values
+				#next if($TR > 2);
+#				if (!$high_qual){next if($TR > 2);}
+				print OUT;
+				last;
+			}
+		}
+	}
+	close FILE;
+	close OUT;
+
+	$command=join(" ","mv",$tmpfile,$random_file_sc);
+if($verbose){	print "$command\n";}
+	system("$command");
+}
+
+#########################################################
+#Stack up SoftClips
+#########################################################
+my $random_file=join("",$random_name,".sc.direction.bed");
+if(!defined($pair_only)){
+        open (FILE,"$random_file_sc")|| die "Can't open sam file\n";
+        #$random_file=join("",$random_name,".sc.direction");
+
+        print "Calling sides of soft-clips\n";
+        #\nTMPOUT=$random_file\tINPUT=$random_file_sc\n\n";
+        open (TMPOUT,">$random_file")|| die "Can't create tmp file\n";
+
+        while (<FILE>){
+                @_ = split(/\t/, $_);
+                #### parse CIGAR string and create a hash of array of each operation
+                my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+                my $hash;
+                map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+                #### next if softclips on each end
+                next if ($_[5] =~ /^[0-9]+S.*S$/);
+
+                #### next if softclip occurs in the middle
+                next if ($_[5] =~ /^[0-9]+[^S][0-9].*S.+$/);
+
+                my $softclip = $hash->{S}[0];
+
+                my $end1 = 0;
+                my $end2 = 0;
+                my $softBases = "";
+		my $right_corrected="";my $left_corrected="";
+
+        if ($softclip > $minSoft) {
+		
+                        ####If the soft clip occurs at end of read and its on the minus strand, then it's a right clip
+                        if ($_[5] =~ /^.*S$/) {
+                                $end1=$_[3]+length($_[9])-$softclip-1;
+                                $end2=$end1+1;
+next if ($end1<0);
+                                #RIGHT clip on Minus
+                                $softBases=substr($_[9], length($_[9])-$softclip, length($_[9]));
+                                #Right clips don't always get clipped correctly, so fix that
+                                # Check to see if sc base matches ref
+                                $right_corrected=baseCheck($_[2],$end2,"right",$softBases);
+                               print TMPOUT "$right_corrected\n"
+
+                        } else {
+                                #### Begins with S (left clip)
+                                $end1=$_[3]-$softclip;
+next if ($end1<0);
+
+                                $softBases=substr($_[9], 0,$softclip);#print "TMP=$softBases\n";
+        			$left_corrected=baseCheck($_[2],$end1,"left",$softBases);
+if(!$left_corrected){print "baseCheck($_[2],$end1,left,$softBases)\n";next}
+                               print TMPOUT "$left_corrected\n";
+#print "\nSEQ=$_[9]\t\n";
+
+                        }
+        }
+  }
+close FILE;
+close TMPOUT;
+}
+sub baseCheck{
+        my ($chrom,$pos,$direction,$softBases)=@_;
+        #skip if position is less than 0, which is caused by MT DNA
+        return if ($pos<0);
+        my $exit="";
+
+        while(!$exit){
+        if($direction=~/right/){
+                        my $refBase=getSeq($chrom,$pos,$INPUT_FASTA);
+                        my $softBase=substr($softBases,0,1);
+                        if ($softBase !~ /$refBase/){
+                                my $value=join("\t",$chrom,$pos,$pos+1,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos+1;
+                                $softBases=substr($softBases, 1,length($softBases));
+                        }
+         }
+        else{
+                        my $refBase=getSeq($chrom,$pos+1,$INPUT_FASTA);
+                        my $softBase=substr($softBases,-1,1);
+                        if ($softBase !~ /$refBase/){
+                                $pos=$pos-1+length($softBases);
+                                my $value=join("\t",$chrom,$pos-1,$pos,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos-1;
+                                $softBases=substr($softBases, 0, -1);
+                                #print "Trying again $softBases\n";
+                       }
+
+        }
+
+}
+}
+#Remove SAM files to conserve space
+unlink($random_file_sc);
+
+
+my $random_file_disc="$INPUT_BAM";
+###
+#
+######################################################
+# Transform Read pair groups into softclip equivalents
+######################################################
+#
+#
+#
+my $v="";
+#if($disable_RP_only){
+print "Running Bam2pair.pl\n";
+print "Looking for discordant read pairs without requiring soft-clipping information\n";
+	use FindBin qw($Bin);
+	my $path=$Bin;
+#	print"\n\nPATH=$path\n\n";
+if($verbose){$v="-v"}
+	my $tmp_out=join("",$random_name,"RP.out");
+	$command=join("","perl ",$path,"/Bam2pair.pl -b $random_file_disc  -o $tmp_out -isize $pe_upper_limit -winsize $dist_To_Soft -min $minRP -chrom $chrom -prefix $random_name -q $MapQ -blacklist $random_name.bed $v");
+if($verbose){	print "$command\n"};
+	system("$command");
+	$command=join("","perl -ane '\$end1=\@F[1];\$end2=\@F[3];print join(\"\\t\",\@F[0..1],\$end1,\"unknown|left\");print \"\\n\";print join(\"\\t\",\@F[2..3],\$end2,\"unknown|left\");print \"\\n\"' ", $tmp_out," >> ",$random_file);
+if($verbose){print "$command\n"};
+	system($command);
+	unlink($tmp_out);
+#}
+#
+
+
+######################################################
+unlink("$random_file","$tmp_name","$random_file","$index","$random_name","$new_blacklist") if (-z $random_file || ! -e $random_file ) ;
+if (-z $random_file || ! -e $random_file){
+	print "Softclipped file is empty($random_file).\nNo soft clipping found using desired parameters\n\n";
+	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+	}
+
+
+#############################################################
+#  Make sure there are enough soft-clipped supporting reads
+#############################################################
+my $outfile=join("",$random_file,".sc.merge.bed");
+#sortbed -i .sc.direction | mergeBed -nms -d 25 -i stdin > .sc.merge.bed
+$command=join(" ","sortBed -i",$random_file," | mergeBed  -nms -i stdin","|egrep \";|,\"","|awk '{OFS=\"\t\"}(NF==4)'",">",$outfile);
+
+print "$command\n" if ($verbose);
+system("$command");
+
+if (-z $outfile || ! -e $outfile){
+	unlink("$tmp_name","$random_file","$outfile","$index","$random_name","$new_blacklist"); 
+	print "mergeBed file is empty.\nNo structural variants found\n\n" ;
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed mergeBed\n";
+
+###############################################################
+# If left and right are on the same line, make into 2 lines
+###############################################################
+open (INFILE,$outfile)||die "couldn't open temp file : $. \n\n";
+my $tmp2=join("",$random_name,".sc.fixed.merge.bed");
+#print "INFILE=$outfile\tOUTFILE=$tmp2\n\n";
+#INPUT FORMAT=chr9\t131467\t131473\tATGCTTATTAAAA|left;TTATTAAAAGCATA|left
+open (OUTFILE,">$tmp2")||die "couldn't create temp file : $. \n\n";
+while(<INFILE>){
+	chomp $_;
+	my $l = $_;
+
+	my @a = split(/\t/, $l);
+	my $info = $a[3];
+	my @info_arr = split(/\;/, $info);
+	my @left_arr=();
+	my @right_arr=();
+	@left_arr = grep(/left/, @info_arr);
+	@right_arr = grep(/right/, @info_arr);
+
+	#New
+	my $left = join(";", @left_arr);
+	my $right = join(";", @right_arr);
+	$info = join(";", @info_arr);
+
+	if((@left_arr) && (@right_arr)){
+		print OUTFILE "$a[0]\t$a[1]\t$a[2]\t$left\n$a[0]\t$a[1]\t$a[2]\t$right\n";
+	} else{
+		my $all=join("\t",@a[0..2],$info);
+		print OUTFILE "$all\n";
+	}
+}
+
+# make sure output file name is $outfile
+$command=join(" ","sed -e '/ /s//\t/g'", $tmp2,"|awk 'BEGIN{OFS=\"\\t\"}(NF==4)'", "|perl -pne 's/ /\t/g'>",$outfile);
+system("$command");
+if($verbose){print "$command\n"};
+unlink("$tmp_name","$random_file","$tmp2","$outfile","$index","$random_name","$new_blacklist") if (-z $outfile || ! -e $outfile) ;
+ if (-z $outfile || ! -e $outfile){
+	print "Fixed mergeBed file is empty($outfile).\nNo structural variants found\n\n";
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed fixing mergeBed\n\n";
+
+###############################################################
+# Separate directions of soft clips
+###############################################################
+my $left_sc = join("", "left", $tmp2);
+my $right_sc = join("", "right", $tmp2);
+use FindBin qw($Bin);
+#my $path=$Bin;
+
+$command=join("","grep left ", $tmp2, " |sed -e '/left /s//left\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$left_sc);
+system("$command");
+#print "$command\n";
+$command=join("","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$right_sc);
+#$command=join(" ","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g' >",$right_sc);
+system("$command");
+#print "$command\n";
+#die "CHECK $right_sc\n";
+
+###############################################################
+# Count the number and identify directions of soft clips
+###############################################################
+print "Count the number and identify directions of soft clips\n";
+#print "looking in $outfile\n";
+$outfile=join("",$random_name,".sc.fixed.merge.bed");
+
+open (INFILE,$outfile)||die "couldn't open temp file\n\n";
+my $tmp3 = join("", $random_file, "predSV");
+open (OUTFILE, ">$tmp3")||die "couldn't create temp file\n\n";
+while(<INFILE>){
+chomp;
+	@_=split(/\t/,$_);
+	my $count=tr/\;//;$count+=tr/\,//;
+	$count=$count+1;
+	my $left=0;
+	my $right=0;
+
+	while ($_ =~ /left/g) { $left++ } # count number of left clips
+	while ($_ =~ /right/g) { $right++ } # count number of right clips
+
+	###############################################################
+	if ($count >= $minSoftReads){
+		####get longest soft-clipped read
+		my @clips=split(/\;|,|\|/,$_[3]);
+
+		my ($max, $temp, $temp2, $temp3, $dir, $maxSclip) = (0) x 6;
+		for (my $i=0; $i<$count; $i++) {
+			my $plus1=$i+1;
+			$temp=length($clips[$i]);
+			$temp2=$clips[$plus1];
+			$temp3=$clips[$i];
+
+			if ($temp > $max){
+				$maxSclip=$temp3;
+				$max =$temp;
+				$dir=$temp2;
+			} else {
+				$max=$max;
+				$dir=$dir;
+				$maxSclip=$maxSclip;
+			}
+			$i++;
+		}
+		my $order2 = join("|", $left, $right);
+        #print join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+		print OUTFILE join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+	} elsif($_=~/unknown/){
+	print OUTFILE join ("\t",@_[0..2],"NA","NA","left","NA","NA|NA") . "\n";
+        print OUTFILE join ("\t",@_[0..2],"NA","NA","right","NA","NA|NA") . "\n";
+	}
+	####Format is Chrom, start, end, longest soft-clip, length of longest soft-clip, direction of longest soft-clip, #supporting softclips, #left Sclips|#right Sclips
+}
+close INFILE;
+close OUTFILE;
+
+unlink("$tmp2","$tmp_name","$random_file","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$new_blacklist") if (-z $tmp3 || !-e $tmp3) ;
+
+ if (-z $tmp3 || !-e $tmp3){
+	print "No structural variants found while Counting the number and identify directions of soft clips.\n" ;
+
+	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+	&print_header();
+	close OUT;
+	exit;
+
+}
+
+print "Done counting Softclipped reads\n";
+###############################################################
+#### Print header information
+###############################################################
+open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+&print_header();
+close OUT;
+
+###############################################################
+###############################################################
+#### DO the bulk of the work
+###############################################################
+use List::Util qw(min max);
+open (FILE,"$tmp3")|| die "Can't open file\n";
+open (OUT,">>$OUTNAME")|| die "Can't open file\n";
+
+#print "\nusing $tmp3 and writing to $OUTPUT_FILE \n";
+while (<FILE>){
+	#If left clip {+- or -- or -+ }{+- are uninformative b/c they go upstream}
+	#If right clip {++ or -- or +-}
+	chomp $_;
+	my $line = $_;
+	my @info = split(/\t/, $_);
+
+	if($info[5] eq "left") {
+		bulk_work("left", $line, $random_file_disc);
+
+	} elsif ($info[5] eq "right") {
+		bulk_work("right", $line, $random_file_disc);
+	}
+#if($. ==6){print "THIS IS LINE 6\n$_\n";die}
+print "Completed line $.\n" if ($verbose);
+}
+close FILE;
+close OUT;
+
+###############################################################################
+###############################################################################
+#### Delete temp files
+my $meregedBed=join("",$random_name,".sc.direction.bed.sc.merge.bed");
+
+if(defined($temp_output)){$temp_output=$temp_output} else {$temp_output="no"}
+
+if ($temp_output eq "no"){
+	unlink("$tmp_name","$random_file","$tmp2","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$meregedBed","$random_name.bed");
+}
+####Sort VCF
+my $tmp=join(".",$random_name,"tmp");
+#Get header
+$cmd="grep \"#\" $OUTNAME > $tmp";
+system($cmd);
+#sort results
+$cmd="grep -v \"#\" $OUTNAME|perl -pne 's/chr//'|sort -k1,1n -k2,2n|perl -ne 'print \"chr\".\$_' >>$tmp";
+system($cmd);
+$cmd="mv $tmp $OUTNAME";
+system($cmd);
+#remove entries next to each other
+
+
+
+
+#############################################################
+##May not need this anymore since filtering on left and right
+#############################################################
+#my $tmpout=$OUTNAME;
+#$tmpout.=".tmp";
+#use FindBin qw($Bin);
+##my $path=$Bin;
+#$command="perl ".$path."/Extract_nSC.pl $OUTNAME -q nSC > $tmpout";
+##print "Command=$command\n";
+#system($command);
+#$command="perl ".$path."/reduce_redundancy.pl $tmpout $upper_limit |cut -f1-10 > $OUTNAME";
+##print "$command\n";
+#system($command);
+#system("rm $tmpout");
+########################################################
+
+
+
+
+print "Analysis Completed\n\nYou did it!!!\n";
+print "Finish Time : " . &spGetCurDateTime() . "\n";
+$now = time - $now;
+printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600), int(($now % 3600) / 60),
+int($now % 60));
+
+exit;
+
+###############################################################################
+sub rev_comp {
+  my $dna = shift;
+  my $revcomp = reverse($dna);
+  $revcomp =~ tr/ACGTacgt/TGCAtgca/;
+
+  return $revcomp;
+}
+
+
+###############################################################################
+#### to get reference base
+sub getSeq{
+	my ($chr,$pos,$fasta)=@_;
+	#don't require chr
+	#if($chr !~ /^chr/){die "$chr is not correct\n";}
+#	die "$pos is not a number\n" if ($pos <0);
+my @result=();
+        if ($pos <0){print "$pos is not a valid position (likely caused by circular MT chromosome)\n";return;}
+
+	@result = `samtools faidx $fasta $chr:$pos-$pos`;
+	if($result[1]){chomp($result[1]);
+	return uc($result[1]);
+	}
+	return("NA");
+	#### after return will not be printed
+	####print "RESULTS=@result\n";
+}
+
+sub getBases{
+        my ($chr,$pos1,$pos2,$fasta)=@_;
+        #don't require chr
+        #if($chr !~ /^chr/){die "$chr is not correct\n";}
+my @result=();
+        if ($pos1 <0){print "$pos1 is not a valid position (likely caused by circular MT chromosome)\n";return;};
+
+        @result = `samtools faidx $fasta $chr:$pos1-$pos2`;
+	if(!$result[1]){$result[1]="NA"};
+        chomp($result[1]);
+        return uc($result[1]);
+
+        #### after return will not be printed
+        ####print "RESULTS=@result\n";
+}
+###############################################################################
+#### to get time
+sub spGetCurDateTime {
+	my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
+	my $curDateTime = sprintf "%4d-%02d-%02d %02d:%02d:%02d",
+	$year+1900, $mon+1, $mday, $hour, $min, $sec;
+	return ($curDateTime);
+}
+
+
+###############################################################################
+#### print header
+sub print_header {
+	my $date=&spGetCurDateTime();
+	my $header = qq{##fileformat=VCFv4.1
+##fileDate=$date
+##source=SoftSearch.pl
+##reference=$INPUT_FASTA
+##Usage= SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -u $unmated_pairs -s $num_sd -b $INPUT_BAM -f $INPUT_FASTA -o $OUTNAME
+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
+##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
+##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
+##INFO=<ID=ISIZE,Number=.,Type=String,Description="Size of the SV">
+##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
+##FORMAT=<ID=lSC,Number=1,Type=Integer,Description="Length of the longest soft clips supporting the BND">
+##FORMAT=<ID=nSC,Number=1,Type=Integer,Description="Number of supporting soft-clips">
+##FORMAT=<ID=uRP,Number=1,Type=Integer,Description="Number of unmated read pairs nearby Soft-Clips">
+##FORMAT=<ID=levD_local,Number=1,Type=Float,Description="Levenshtein distance between soft-clipped bases and the area around the original soft-clipped site">
+##FORMAT=<ID=levD_distl,Number=1,Type=Float,Description="Levenshtein distance between the soft-clipped bases and mate location">
+##FORMAT=<ID=CTX,Number=1,Type=Integer,Description="Number of chromosomal translocations">
+##FORMAT=<ID=DEL,Number=1,Type=Integer,Description="Number of reads supporting Large Deletions">
+##FORMAT=<ID=INS,Number=1,Type=Integer,Description="Number of reads supporting Large insertions">
+##FORMAT=<ID=NOV_INS,Number=1,Type=Integer,Description="Number of reads supporting novel sequence insertion">
+##FORMAT=<ID=TDUP,Number=1,Type=Integer,Description="Number of reads supporting a tandem duplication">
+##FORMAT=<ID=INV,Number=1,Type=Integer,Description="Number of reads supporting inversions">
+##FORMAT=<ID=sDEL,Number=1,Type=Integer,Description="Number of reads supporting small deletions">
+##INFO=<ID=NO_MATE_SC,Number=1,Type=Flag,Description="When there is no softclipping of the mate read location, an approximate position is used">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Dummy value for maintaining VCF-Spec">
+#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t$SAMPLE_NAME\n};
+
+	print OUT $header;
+}
+
+
+###############################################################################
+sub bulk_work {
+print "#####################################@_\n" if ($verbose);
+	my ($side, $line, $file) = @_;
+	my $local_levD = 0;
+	my $distl_levD = 0;
+
+	#my @info = split(/\t/, $line);
+	my @plus_Reads = split(/\t/, $line);
+	$plus_Reads[7] =~ s/\n//g;
+
+	#### length of the longest softclip (lSC) and number of supporting softclips (nSC).
+	my $lSC = $plus_Reads[4];
+	my $nSC = $plus_Reads[6];
+
+
+	#Get all types of compatible reads
+	#Get improperly paired reads (@ max distance)
+
+	#### default value for left SIDE.
+	#If left-clip, then look downstream for match of softclipped reads to define a deletion, but look for DRPs upstream
+	my $sv_type = "SVTYPE=BND";
+	my $start_local=0; my $end_local=0;my $target_local="";my $target_drp="";my $start_drp="";my $end_drp="";
+	if ($side =~ /left/) {
+		$start_local = $plus_Reads[1]-$dist_To_Soft;
+		$end_local = $plus_Reads[2];
+                $start_drp = $plus_Reads[1];
+                $end_drp = $plus_Reads[1]+$dist_To_Soft;
+	
+	}
+	else{                
+                $start_local = $plus_Reads[1];
+                $end_local = $plus_Reads[1]+$dist_To_Soft;
+                $start_drp = $plus_Reads[1]-$dist_To_Soft;
+                $end_drp = $plus_Reads[1];
+        }
+	
+	$target_local=join("", $plus_Reads[0], ":", $start_local, "-", $end_local);
+	$target_drp=join("", $plus_Reads[0], ":", $start_drp, "-", $end_drp);
+	my $num_unmapped_pairs="";
+	if ($side =~ /right/) {
+		$num_unmapped_pairs=`samtools view $new_blacklist -q $MapQ -f8 -F 1536 -c $INPUT_BAM $target_drp`;
+	} else {
+        $num_unmapped_pairs=`samtools view $new_blacklist -q $MapQ -f24 -F 1536 -c $INPUT_BAM $target_drp`;
+	}
+if($verbose){print "samtools view $new_blacklist -q $MapQ -f24 -F 1536 -c $INPUT_BAM $target_drp\n";}
+
+	$num_unmapped_pairs=~s/\n//;
+if($verbose){print "NUM UNMAPPED PAIRS= $num_unmapped_pairs\n";}
+	my $REF1_base = "";
+	my $REF2_base = "";
+	my $INFO_1 = "";
+	my $INFO_2 = "";
+	my $ALT_1 = "";
+	my $ALT_2 = "";
+	my $isize = 0;
+	my $QUAL = "";
+	my $FORMAT = "GT:";
+
+	#### get a random 8-letter id
+	my $BND1_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	my $BND2_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	$BND1_name=join "_","BND",$BND1_name;
+	$BND2_name=join "_","BND",$BND2_name;
+
+	my $counts = {CTX => 0, DEL => 0, INS => 0, INV => 0, TDUP => 0, NOV_INS => 0 };
+	my $event_mate_info = {CTX => "", DEL => "", INS => "", INV => "", TDUP => "", NOV_INS => "" };
+
+	#### get mate pair info and counts per event
+	foreach my $e (sort keys %{$counts}) {
+		my $h = get_counts_n_info($e, $side, $MapQ, $file, $dist_To_Soft, $target_drp, $upper_limit, $lower_limit);
+
+		$counts->{$e} = $h->{count};
+		$event_mate_info->{$e} = $h->{info};
+	}
+#print Dumper($counts);
+
+	my $max = 0;
+	my $type = "UNKNOWN";
+	my $nRP = 0;
+	my $mate_info = "NA\tNA\tNA\tNA";
+	my $summary = "GT:";
+
+	#### find max count of events and set type, nRP and info to corresponding
+	#### max count event.
+	#### also create a summary string of all counts to be added to VCF file.
+	foreach my $e (sort keys %{$counts}){
+#		if ($counts->{$e} >=i $max){
+		if ($counts->{$e} > $max){		
+			$type = $e .",". $counts->{$e};
+			$nRP = $counts->{$e};
+
+			$max = $counts->{$e};
+
+			if (length($event_mate_info->{$e})) {
+				$mate_info = $event_mate_info->{$e};
+			}
+		}
+
+		$summary .= $e .",". $counts->{$e} .":";
+	}
+#	print "done with Summary\n";
+	#### remove last colon ":" from the summary string
+	$summary =~ s/:$//;
+ if (($minRP > $max)&&(!$disable_RP_only )){if ($verbose){print "FAILED BECAUSE ($minRP > $max)&&(!$disable_RP_only )"};return};
+
+	#### Run Levenshtein distance on softclip in target region to find out if it's a small deletion/insertion
+	#### passing 1: clip_seq, 2: chr, 3: start, 4: end, 5: ref file.
+	my $levD = new LevD;
+########################################################
+########################################################
+########################################################
+
+	#### redefine start and end location for LevD calc.
+#	$start = $plus_Reads[1]-$dist_To_Soft;
+#	$end = $plus_Reads[2];
+	my $num_bases_to_loc=0;
+	my $new_start=0;
+	my $new_end=0;
+	my $del_seq="";
+        my $start = $start_local;
+        my $end = $end_local;
+	if ($lSC=~/NA/){$lSC=0}
+
+	if ($side =~ /right/) {
+	        $levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	        $num_bases_to_loc=$levD->{index};
+		$new_start = $plus_Reads[2];
+                if ($plus_Reads[2]=~/^[0-9]/){$new_end=$plus_Reads[2]+$lSC};
+	}
+	else{
+		$levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+		$num_bases_to_loc=$levD->{index};
+		if ($plus_Reads[2]=~/^[0-9]/){$new_start=$plus_Reads[2]-$lSC};
+                $new_end = $plus_Reads[2];
+	}
+	if((!$new_start)||(!$new_end)||($new_start<0)){print "FAILED AT ((!$new_start)||(!$new_end)||($new_start<0))\n";return};
+	
+	$del_seq=getBases($plus_Reads[0], $new_start,$new_end,$INPUT_FASTA);
+##############################################################################
+#	#If there is a match, where is the start position of the match?
+#
+##############################################################################
+
+
+	#if $plus_Reads[3] eq "NA", then it was found without soft-clipped reads
+	if($plus_Reads[3] !~  /NA/){
+			if (($local_levD < $levD_local_threshold)) {
+				return if (!$sv_only);
+				#### add value to summary to be written to vcf file.
+				$summary = "GT:sDel," . $plus_Reads[6];
+				$type = "sDEL";
+				###########################################################################
+				##### Printing output
+
+				#########################################
+				##### Get DNA info
+				#########################################
+				#$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF1_base = substr($del_seq, 0, 1);
+
+				#### this is alt ref. for softclip its $plus_Reads[3]
+				$REF2_base = $del_seq;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$isize = length($del_seq);
+
+				#### svtype = none for sDEL
+				#### isize = length($info[3]);
+				#### nRP = NA
+				#### mate_id = NA
+				#### CTX,:DEL,:....sDEL,##
+				$INFO_1=join(";", "SVTYPE=NA", "EVENT=$type", "ISIZE=$isize");
+
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE= "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+				$INFO_2=~s/\s//g;
+
+				$BND1_name =~ s/^BND/LEVD/;
+				# If left, then the start position is plus_Reads[1]-isize
+				my $start_pos=0;
+				#Make sure Ref1 and Ref2 bases are different
+				if($REF2_base eq $REF1_base){$REF1_base="NA"}
+				if($side=~/left/){$start_pos=$plus_Reads[1]-$isize}else{$start_pos=$plus_Reads[1]};		
+				print OUT join ("\t", $plus_Reads[0], $start_pos, $BND1_name, $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+				if ($verbose){print "No Softclipped reads here!\n"}
+				return;
+			}
+		}
+
+		#### Otherwise, look for DRP mate info
+	#if($nRP=~/NA/){print "MATE_INFO=$mate_info\tSide=$side\tline=$line\n";}
+		my @mate_info_arr = split(/\t/, $mate_info);
+		$nRP = $mate_info_arr[3];
+		my $mate_chr=$mate_info_arr[0];
+
+			if((! defined $nRP) || ($nRP =~ /na/i) || ($mate_chr =~ /NA/) ){
+			#PRINT UNKNOWN
+	if ($nRP =~ /na/i){print "Can't find SC reads\n" if ($verbose);return};
+	if ($verbose){print "There is an unknown\nNRP=$nRP Mate_CHR=$mate_chr minRP=$minRP\n"}
+				$summary .= ":unknown," . $plus_Reads[6];
+				$type = "unknown";
+				$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF2_base = $plus_Reads[3];
+				$BND1_name =~ s/^BND/UNKNOWN/;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$INFO_1=join(";", "SVTYPE=unknown", "EVENT=unknown", "ISIZE=unknown");
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE = "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+				$SAMPLE=~s/NA/0/g;
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+			       #print join ("\t", $plus_Reads[0], $plus_Reads[1],  $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+
+				print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $REF2_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+				return;
+
+		}
+		#### end if there is no mate info or nRP+uRP<minRP
+		if (($nRP<$minRP)&&($unmated_pairs > ($num_unmapped_pairs+$nRP))){
+			print "Something failed here\nif (($nRP<$minRP)&&($unmated_pairs > ($num_unmapped_pairs+$nRP)))\n";
+		return}
+
+		##################################################################################
+		# Find out if mates have nearby soft-clips (to refine the breakpoints)
+		##################################################################################
+		#Look for evidence of soft-clipping near mate
+		my @mate_soft_arr = ();
+		my $mate_start = 0;
+		my $mate_soft = "";
+
+		@mate_info_arr = split(/\t/, $mate_info);
+
+		#### mate start and end locations.
+		my $filename = $right_sc;
+
+		$start = $mate_info_arr[1] - $dist_To_Soft;
+		$end = $mate_info_arr[1];
+
+		if ($side =~ /right/) {
+			$start = $mate_info_arr[2];
+			$end = $mate_info_arr[2] + $dist_To_Soft;
+
+			$filename = $left_sc;
+		}
+
+		#### add Levenshtein distance to Summary
+	#print "Calc distal Levd\n";
+		$levD->search(rev_comp($plus_Reads[3]), $mate_info_arr[0], $start, $end, $INPUT_FASTA);
+		$distl_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	$distl_levD = "NA" if($plus_Reads[3] =~ /NA/);
+	#If there is no softclips to string match, then give 0 as quality value
+       if ($plus_Reads[3] !~ /NA/){
+			$QUAL=1/($distl_levD + 0.001);
+		}
+		else	{
+			$QUAL=0;
+		};
+	$QUAL=sprintf("%.2f",$QUAL);
+	#### looking for softclips to refine break point
+	#### if left look in right and vice-versa.
+	$cmd = qq{printf '%s\\t%s\\t%s\\n' "$mate_info_arr[0]" "$start" "$end"}; # printf is portable; "echo -e" prints a literal -e under dash
+	$cmd .= qq{ | awk -F'\t' 'NF==3' | intersectBed -a stdin -b $filename | head -1};
+print "$cmd\n" if $verbose;
+	$mate_soft = `$cmd`;
+
+	$mate_soft =~ s/\n//g;
+	@mate_soft_arr = split(/\s/, $mate_soft);
+my $NO_MATE_SC="";
+	if(@mate_soft_arr){
+		$mate_chr = $mate_soft_arr[0];
+		$mate_start = $mate_soft_arr[1];
+                $NO_MATE_SC="APPROXIMATE";
+
+	} else{
+		@mate_info_arr = split(/\s/,$mate_info);
+		$mate_chr = $mate_info_arr[0];
+		$mate_start = $mate_info_arr[1];
+	}
+
+	#end if there is no mate info
+	return if ($mate_chr eq "");
+	#end if there is no mate info and !disable_RP_only
+	return if (($lSC =~/NA/)&&(!$disable_RP_only));
+	
+	
+	###########################################################################
+	##### Printing output
+
+	#########################################
+	# Get DNA info
+	#########################################
+	#print "PLUS_READS=$plus_Reads[0],$plus_Reads[1]\nMATE=$mate_chr,$mate_start,$INPUT_FASTA\n";
+	$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+
+	### this is alt ref. for softclip its $plus_Reads[3]
+	$REF2_base = getSeq($mate_chr, $mate_start, $INPUT_FASTA);
+
+	#########################################
+	# print in VCF format
+	#########################################
+
+	#### abs value to account for left and right reads.
+	$isize = abs($plus_Reads[1]-$mate_start);
+	
+	my $event_type=$type;
+	$event_type=~ s/,|[0-9]//g;
+	$INFO_1=join(";", "$sv_type", "EVENT=$event_type","END=$mate_start", "ISIZE=$isize","MATEID=$BND2_name");
+	$INFO_2=join(";", "$sv_type", "EVENT=$event_type","END=$plus_Reads[1]", "ISIZE=$isize","MATEID=$BND1_name");
+
+	#### remove any white spaces.
+	#### ask: did you mean to remove space from ends? eg. trim()
+	$INFO_1=~s/\s//g;
+	$INFO_2=~s/\s//g;
+
+	$FORMAT=$summary;
+ 	$FORMAT=~ s/,|[0-9]//g;
+        $FORMAT .= ":lSC:nSC:uRP:distl_levD";
+	if($NO_MATE_SC){$INFO_2 .= ":NO_MATE_SC"}
+	my $SAMPLE="0/1:";	
+	$SAMPLE .=$summary;
+#        if($NO_MATE_SC){$SAMPLE.= ":$NO_MATE_SC"}
+
+	$SAMPLE=~s/[A-Z|,|_]//g;
+        my $MATE_SAMPLE=$SAMPLE;
+        $SAMPLE .= ":$lSC:$nSC:$num_unmapped_pairs:$distl_levD";
+	$MATE_SAMPLE .=":NA:NA:NA:NA";
+	$SAMPLE=~s/::/:/g;
+	$MATE_SAMPLE=~s/::/:/g;
+	$MATE_SAMPLE=~s/NA/0/g;
+	$SAMPLE=~s/NA/0/g;
+ 
+	if($type !~ /INV/){
+		$ALT_1 = join("","]",$mate_chr,":",$mate_start,"]",$REF1_base);
+		$ALT_2 = join("",$REF2_base,"[",$plus_Reads[0],":",$plus_Reads[1],"[");
+		#		2      321682 bnd_V  T   ]13:123456]T  6    PASS SVTYPE=BND
+		#		13     123456 bnd_U  C   C[2:321682[   6    PASS SVTYPE=BND
+	} else {
+		$ALT_1 = join("", "]", $plus_Reads[0], ":", $plus_Reads[1], "]", $REF2_base);
+		$ALT_2 = join("", $REF1_base, "[", $mate_chr, ":", $mate_start, "[");
+	}
+
+	if(($mate_chr) && ($plus_Reads[0])){
+		print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $ALT_1, $QUAL,"PASS", $INFO_1, $FORMAT,$SAMPLE,"\n");
+		print OUT join ("\t", $mate_chr, $mate_start, $BND2_name, $REF2_base, $ALT_2, $QUAL, "PASS", $INFO_2, $FORMAT,$MATE_SAMPLE,"\n");
+	}
+}
+
+###############################################################################
+###############################################################################
+sub get_counts_n_info {
+        my ($event, $side, $mapQ, $file, $dist, $target, $upL, $lwL) = @_;
+
+        my $mate_info = "";
+        my $cmd = "";
+
+        if ($event =~ /^CTX$/i) {
+                #print "CTX side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{ samtools view $new_blacklist -q $mapQ -f 16 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^DEL$/i) {
+                #print "DEL side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -F 1568 -f 16 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"} {if((\$7 ~ /=/)&&(\$9<-$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist_To_Soft -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^INS$/i) {
+                #print "INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<$lwL && \$9 > 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq {samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>-$lwL && \$9 < 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^INV$/i) {
+                #print "INV side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -F 1596 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist_To_Soft -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 48 -F 1548 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^TDUP$/i) {
+                #print "TDUP side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+#			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4>\$8)&&(\$9<0)&&(\$9<-$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+#                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<-$upL )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4<\$8)&&(\$9>0)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^NOV_INS$/i) {
+                #print "NOV_INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 8 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 24 -F 1536 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        }
+
+        $mate_info=~s/\n//g;
+        my @tmp=split(/\t/, $mate_info);
+
+        my $counts = 0;
+
+        if (defined $tmp[3]) {
+                $tmp[3] =~ s/\n//g;
+
+                $counts = $tmp[3] if (length($tmp[3]));
+        }
+        return ({count=>$counts, info=>$mate_info});
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/SoftSearch_Filter.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,137 @@
+#!/usr/bin/perl -s
+open (FILE,"$ARGV[0]")||usage();#die "Not using the right Parameters!\n\n";
+use Getopt::Long;
+#Declare variables
+my ($lsc,$minDist,$skip,$nSC,$nRP,$isize,$answer);
+GetOptions(
+	'dist:s' => \$minDist,		#minimum distance between events
+	'lsc:i' => \$lsc,		#minimum soft-clip length
+	'nsc:i' => \$nsc, 	#minimum number of soft-clipped reads
+	'nRP:i' => \$nRP,	#minimum number of discordant read pairs
+	'isize:i' => \$isize,	
+	'sv:s' => \$sv,		#whether or not to skip small deletions
+	'q:s' => \$answer,		#useful for plotting histograms
+	'skip:s' => \$skip
+	);
+$lsc = 0 unless defined $lsc;
+$nsc = 0 unless defined $nsc;
+$nRP = 0 unless defined $nRP;
+$minDist = 0 unless defined $minDist;
+if(!$isize){$isize=0};
+if(!$uRP){$uRP=0};
+
+if($answer eq "yes"){$answer=$answer} else {$answer="no"};
+
+if ($answer eq "yes"){
+open(lsc,">lsc.out")||die;
+open(nsc,">nsc.out")||die;
+open(nRP,">nRP.out")||die;
+}
+
+
+#Remove hits if they are within $minDist
+$chr="chr1";$pos=0;
+while (<FILE>){
+	if ($_=~/^#/){
+		print; 
+		next
+	};
+	if ($skip){next if $_=~/$skip/}
+	@_=split(/\t/,$_);
+	#Get ISIZE from INFO field
+	my @info=split(/;/,$_[7]);
+       	my $k = 0;
+	my $v = 0; 
+	my %infoHash;	#reset per record so INFO keys from a previous line cannot leak through
+	for (my $i=0;$i<@info;$i++){
+        	my @tmp=split(/=/,$info[$i]);
+		$k=shift(@tmp);
+		$v=shift(@tmp);
+		$infoHash{$k}=$v;
+	}
+
+	#Get the value of TYPE to find out how many reads support the event
+        my $counts = {CTX => 0, DEL => 0, INS => 0, INV => 0, TDUP => 0, NOV_INS => 0, lSC => 0, nSC => 0,uRP =>0,sDEL => 0,levD_local=>0,distl_levD => 0 };
+	#Get Complete Hash
+	#$_[8] is format
+	#$_[9] is values
+	my @format=split(/:/, $_[8]);
+	my @sample=split(/:/,$_[9]);
+	my %hash; 
+	@hash{@format}=@sample;
+	#Subset hash to get proper type of variants
+	my $max_val = 0;
+	my $max_type = "NA";
+	
+	#Get TYPEOF HASH 
+	my %type;
+	%type = %hash ;
+	delete $type{'lSC'};
+        delete $type{'nSC'};
+        delete $type{'uRP'};
+        delete $type{'levD_local'};
+        delete $type{'distl_levD'};
+
+ 	while (my ($key,$val)=each(%type)){
+		if($val > $max_val){$max_val=$val;$max_type=$key}
+		}
+
+
+#######################################################################################################
+        #Start applying filters
+	
+	#Remove hits if they are within $minDist
+	$chrom=$_[0];$position=$_[1];
+
+	#next if chroms are same and distance is less than X
+	$difference=abs($pos-$position);
+	if(($chrom eq $chr)&&($difference < $minDist)){
+		$pos=$position;$chr=$chrom;;
+		next}
+	$pos=$position;$chr=$chrom;	
+	$EVENT_SIZE=$infoHash{'ISIZE'};
+	$EVENT_TYPE=$max_type;
+	$EVENT_SUPPORT=$max_val;
+	$length_of_softClips=$hash{'lSC'};
+	$number_of_softclips=$hash{'nSC'};
+        $number_of_unmated=$hash{'uRP'};
+	
+	########################################################################
+	#Print if all fields are ok
+	next if($EVENT_SIZE < $isize);
+        next if($EVENT_SUPPORT < $nRP);
+        next if($length_of_softClips < $lsc);
+        next if($number_of_softclips < $nsc);
+        next if($number_of_unmated < $uRP);
+	next if (($sv)&&($EVENT_TYPE=~/sDEL/));
+	print;
+
+
+	if ($answer eq "yes"){
+	print lsc $length_of_softClips."\n";
+	print nsc $number_of_softclips."\n";
+	print nRP $EVENT_SUPPORT."\n";
+	}
+}
+
+
+sub usage{
+print "\nUsage: SoftSearch_Filter.pl <VCF>\n
+	-dist	#minimum distance between events [0]
+	-lsc	#minimum length soft-clip [0]
+	-nsc	#minimum number of soft-clip [0]
+	-nRP	#minimum number of discordant read pairs [0]
+	-isize	#minimum size [0]
+	-sv	#skip small deletions [no|yes]
+	-skip	#pipe-delimited list of strings to skip (e.g. chrM|chrY|chrGL)
+	\n"; exit 1;
+}
+
+#R
+# lsc<-read.table("lsc.out")
+# nsc<-read.table("nsc.out")
+# nRP<-read.table("nRP.out")
+# par(mfrow=c(2,2))
+# hist(lsc$V1,breaks=100)
+# hist(nsc$V1,breaks=100)
+# hist(nRP$V1,breaks=100)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/Subset_targets.sh	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,23 @@
+#!/bin/sh
+#$ -V
+#$ -cwd
+#$ -q 1-day
+#$ -m ae
+#$ -M hart.steven@mayo.edu
+#$ -l h_vmem=1G
+#$ -l h_stack=10M
+BAM=$1
+TARGET_BED=$2
+SAMPLE_NUMBER=$3
+
+#cat $HEADER > out.${SAMPLE_NUMBER}.sam
+samtools view -L "$TARGET_BED" "$BAM" |
+ perl -ane '
+ next if ($F[10]=~/#/);
+ $minSize=1000;
+ if( $F[1] & 8 || $F[1] & 4 ||  $F[8] == 0 || abs($F[8]) > $minSize || $F[5] =~/S/){
+ $rName=join("","@",$F[0]);
+  print join ("\n",$rName,$F[9],"+",$F[10])."\n";
+};
+ ' >> out.${SAMPLE_NUMBER}.fq
+echo "Done with $BAM `date`"
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/blat_parse.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,526 @@
+#####################################################################################################################################################
+#Purpose: To parse blat psl file
+#Date: 07-30-2013
+#####################################################################################################################################################
+use Getopt::Long;
+use Cwd;
+#reading input arguments
+&Getopt::Long::GetOptions(
+'b|BLAT_OUT=s'=> \$blat_out,
+'temp:s'=>\$dirtemp,
+'f|FASTA=s'=>\$infast,
+);
+$blat_out =~ s/\s|\t|\r|\n//g;
+$dirtemp =~ s/\s|\t|\r|\n//g;
+$infast =~ s/\s|\t|\r|\n//g;
+$samtools=`which samtools`;
+$samtools =~ s/\s|\t|\r|\n//g;
+
+if($blat_out eq "" || $infast eq "" )
+{
+	die "Try: perl blat_parse.pl -b <PSL FILE> -f <Contigs.fa> 
+	-temp	temporary file directory
+	\n";
+}   
+if (!(-e $samtools))
+{
+	die "samtools must be in your path\n";
+}
+
+if (!(-e $infast))
+{
+	die "input fasta file doesn't exist\n";
+}
+unless(-d $dirtemp)
+{
+    #system("mkdir -p $dirtemp");
+    $dirtemp= getcwd;
+}	
+#opening the blat output file
+open(BUFF,$blat_out) or die "no file found $blat_out\n";
+open(WRBUFF,">$dirtemp/Temp_out.txt") or  die "not able to write the file \n";
+#parsing through the file
+while(<BUFF>)
+{
+	if($_ =~ m/^\d/)
+	{
+		print WRBUFF $_;	
+	}
+	else
+	{
+		print "ignoring headers $.\n";
+	}
+}	
+close(WRBUFF);
+system("sort -k10,10 -k18,18n $dirtemp/Temp_out.txt > $dirtemp/Temp_out1.txt");
+system("mv  $dirtemp/Temp_out1.txt $dirtemp/Temp_out.txt");
+open(BUFF,"$dirtemp/Temp_out.txt") or die "no file found Temp_out.txt\n";
+open(WRBUFF,">$dirtemp/File1_out.txt") or  die "not able to write the file \n";
+close(WRBUFF);
+
+$prev_contig_name="";
+my @temp;
+#parsing through the file
+while(<BUFF>)
+{
+	
+		chomp($_);
+		@_ = split "\t";	#explicit assignment: split no longer fills @_ implicitly on modern Perl
+		if($_[9] ne $prev_contig_name)
+		{
+			if($prev_contig_name ne "")
+			{
+				#print @temp."\n";
+				#print @temp."\n";
+				&processing(@temp);
+			}
+			undef(@temp);
+			push(@temp,$_);		
+		}
+		else
+		{
+			push(@temp,$_);
+		}	
+		$prev_contig_name=$_[9];	
+	
+	
+}	
+#processing last record
+&processing(@temp);
+#print @temp."\n";
+close(BUFF);
+
+
+
+
+##################SUBROUTINES######################
+#actual processing of each record in the temp array(same query name objects)
+
+sub processing {
+	open(WRBUFF,">>$dirtemp/File1_out.txt") or  die "not able to write the file \n";
+        open(BAD_CONTIG,">>$dirtemp/bad_contig.out.txt") or  die "not able to write the file \n";
+
+	@temp = @_;
+	#if number of hits for a contig is one
+	if(@temp == 1)
+	{
+			$i=0;
+			#define blocksizes array
+			@row=split("\t",$temp[$i]);
+			$row[18] =~ s/,$//g;
+			@blockSizes=split(',',$row[18]);
+			#defining var
+			$qSize=$row[10];
+			$qStart=$row[11];
+			$qStop=$row[12];
+			$tstart=$row[15];
+			$tstop=$row[16];
+			$Strand=$row[8];
+			$coverage = $row[9];
+			$coverage =~ s/\w+_//g;
+			#calculate match val
+			if(($qSize-($qStop-$qStart)) ==0)
+			{ 	
+				$flag=1;
+				#these are non informative
+				if (@blockSizes ==1)
+				{
+					print "ignoring one of the event $row[9] $i as the event is non informative \n";
+					print BAD_CONTIG "$row[9]\n";
+				}
+				#Ignoring when number of blocks are more than two
+				if(@blockSizes > 2)
+				{
+					print "ignoring event $row[9] $. AS BLOCK SIZE is greater than 2\n";	
+				}
+				#if number of blocks is equal to 2
+				if(@blockSizes == 2)
+				{
+					$temp1=$tstart+$blockSizes[0]+1;
+					$temp2=$tstop-$blockSizes[1]-1;
+						
+					print  WRBUFF "$row[9]\t$row[13]\t$temp1\t$Strand\t$row[13]\t$temp2\t$Strand\t$coverage\n";
+				}
+				$i=@temp;
+			}
+			#later part missing
+			elsif($qStart ==0)
+			{	
+				$temp1=$tstart+$blockSizes[0]+1;
+				$infast_chr=$infast;
+				$infast_chr=~ s/\.fa//g;
+				$infast_chr_start=$qStop+1;
+				$infast_chr_stop=$qSize;
+				$sys="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+				
+				$sys = `$sys`;
+				chomp($sys);
+				@sys=split("\n",$sys);
+				$INSERTION="";
+				for($i=1;$i<@sys;$i++)
+				{
+					$INSERTION=$INSERTION.$sys[$i];
+				}
+				$INSERTION_LENGTH=length($INSERTION);
+				$temp1=$tstart+$blockSizes[0]+1;
+				print  WRBUFF "$row[9]\t$row[13]\t$temp1\t$Strand\tUNKNOWN\tUNKNOWN\t$Strand\t$coverage\t$INSERTION\t$INSERTION_LENGTH\n";
+				
+			}
+			#initial part missing
+			elsif($qStop == $qSize)
+			{
+				$temp1=$tstart;
+				$infast_chr=$infast;
+				$infast_chr=~ s/\.fa//g;
+				$infast_chr_start=0;
+				$infast_chr_stop=$qStart;
+				$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+				#die "$sys\n";
+				$sys = `$sys`;
+				#die "$sys\n";
+				chomp($sys);
+				@sys=split("\n",$sys);
+				$INSERTION="";
+				for( $i=1;$i<@sys;$i++)
+				{
+						$INSERTION=$INSERTION.$sys[$i];
+				}
+				$INSERTION_LENGTH=length($INSERTION);
+				$temp1=$tstart+1;
+				print  WRBUFF "$row[9]\tUNKNOWN\tUNKNOWN\t$Strand\t$row[13]\t$temp1\t$Strand\t$coverage\n";
+				
+			}
+			else
+			{
+				print "ignoring one of the event $row[9] $i as the event is non informative \n";
+			}
+		
+	}
+	#if number of hits for a contig is greater than one
+	else
+	{
+		#this flag is used to see if perfect hit not found (match val =0)
+		$flag1 = 0;
+		for(my $i=0;$i<@temp;$i++)
+		{
+			
+			#define blocksizes array
+			@row=split("\t",$temp[$i]);
+			$row[18] =~ s/,$//g;
+			@blockSizes=split(',',$row[18]);
+			#defining var
+			$qSize=$row[10];
+			$qStart=$row[11];
+			$qStop=$row[12];
+			$tstart=$row[15];
+			$tstop=$row[16];
+			$Strand=$row[8];
+			$coverage = $row[9];
+			$coverage =~ s/\w+_//g;
+			#calculate match val
+			if(($qSize-($qStop-$qStart)) ==0)
+			{ 	
+				$flag1=1;
+				#these are non informative
+				if (@blockSizes ==1)
+				{
+					print "ignoring one of the event $row[9] $i as the event is non informative \n";
+					print BAD_CONTIG "$row[9]\n";
+				}
+				#Ignoring when number of blocks are more than two
+				if(@blockSizes > 2)
+				{
+					print "ignoring event $row[9] $. AS BLOCK SIZE is greater than 2\n";	
+				}
+				if(@blockSizes == 2)
+				{
+					$temp1=$tstart+$blockSizes[0]+1;
+					$temp2=$tstop-$blockSizes[1]-1;
+						
+					print  WRBUFF "$row[9]\t$row[13]\t$temp1\t$Strand\t$row[13]\t$temp2\t$Strand\t$coverage\n";
+				}
+				$i=@temp;
+			}
+		}
+		#as flag value not changed proceed to see next step
+		if($flag1 == 0)
+		{
+			undef(@initial);
+			my @initial;
+			for(my $i=0;$i<@temp;$i++)
+			{
+				@row=split("\t",$temp[$i]);
+				#print "@row\n";
+				unshift(@initial,[@row]);
+			}
+			#sorting the hits according to qstart & qend
+			@initial = sort {$a->[11] <=> $b->[11] || $b->[12] <=> $a->[12]} @initial;
+			#print "$row[9]\t@initial\n";
+			#if($row[9]  eq "NODE_5_length_149_cov_12.395973")
+			#{
+			#	for($i=0;$i<@initial;$i++)
+			#	{
+			#		print "@{$initial[$i]}\n";
+			#	}
+			#}
+			$start = "";
+			$stop = "";
+			$start_len=0;
+			$stop_len=0;
+			#this super flag is used to skip processing of remaining unnecessary hits
+			$super_flag = 0;
+			for($i=0;$i<@initial && $super_flag == 0;$i++)
+			{
+				$flag = 0;
+				#print "@{$initial[$i]}\n";
+				$initial[$i][18] =~ s/,$//g;
+				@blockSizes1=split(',',$initial[$i][18]);
+				#defining var
+				$qSize1=$initial[$i][10];
+				$qStart1=$initial[$i][11];
+				$qStop1=$initial[$i][12];
+				$tstart1=$initial[$i][15];
+				$tstop1=$initial[$i][16];
+				$Strand1=$initial[$i][8];
+				$Chr1 = $initial[$i][13];
+				$coverage1 = $initial[$i][9];
+				$coverage1 =~ s/\w+_//g;
+				#die "$qSize1\t$qStart1\t$qStop1\t$tstart1\t$tstop1\t$Strand1\t$Chr1\t$coverage1\n";
+				#if a hit qstart = 0 then set flag =1 
+				if($qStart1 == 0)
+				{
+					$flag =1;
+				}
+				#if a hit qstop = qsize then set flag =2
+				if($qStop1 == $qSize1)
+				{
+					$flag =2;
+				}
+				#if($row[9]  eq "NODE_5_length_149_cov_12.395973")
+				#{
+				#	print "$flag \n";
+				#}
+				if(@blockSizes1 == 1)
+				{
+					if($flag == 1 )
+					{
+						for($j=0;$j<@initial;$j++)
+						{
+							#both hits should not be the same 
+							if($i != $j)
+							{
+								#print "@{$initial[$i]}\n";
+								$initial[$j][18] =~ s/,$//g;
+								@blockSizes2=split(',',$initial[$j][18]);
+								#defining var
+								$qSize2=$initial[$j][10];
+								$qStart2=$initial[$j][11];
+								$qStop2=$initial[$j][12];
+								$tstart2=$initial[$j][15];
+								$tstop2=$initial[$j][16];
+								$Strand2=$initial[$j][8];
+								$coverage2 = $initial[$j][9];
+								$Chr2 = $initial[$j][13];
+								$coverage2 =~ s/\w+_//g;
+								#making sure both hits are not over lapping
+								if($qStart2 > $qStart1)
+								{	#allowing +-2 bases, as this hit is the immediately following contiguous hit
+									if($qStop1 >= $qStart2 -2  &&  $qStop1 <= $qStart2 +2  )
+									{
+										#perfect match
+										if($qStop2 == $qSize2)
+										{
+											if($Strand1 eq "+")
+											{
+												$tmp1 = $tstart1+$blockSizes1[0]+1;
+												$tmp2 = $tstart2+$blockSizes2[0];
+												print WRBUFF "$initial[$i][9]\t$Chr1\t$tmp1\t$Strand1\t$Chr2\t$tmp2\t$Strand2\t$coverage1\n";
+											}
+											else
+											{
+												$tmp1 = $tstart1+1;
+												$tmp2 = $tstart2+1;
+												print WRBUFF "$initial[$i][9]\t$Chr1\t$tmp1\t$Strand1\t$Chr2\t$tmp2\t$Strand2\t$coverage1\n";
+											
+											}
+											$super_flag = 1;
+											$j = @initial+1;	
+										}
+										#some part is missing after the second hit
+										else
+										{
+											$tmp1 = $tstart1+$blockSizes1[0];
+											$tmp2 = $tstart2+$blockSizes2[0];
+											$INSERTION="";
+											$infast_chr=$infast;
+											$infast_chr=~ s/\.fa//g;
+											$infast_chr_start=$qStop1+1;
+											$infast_chr_stop=$qStart2-1;
+											$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+											#die "$sys\n";
+											$sys = `$sys`;
+											#die "$sys\n";
+											chomp($sys);
+											@sys=split("\n",$sys);
+											for(my $k=1;$k<@sys;$k++)	#separate index: reusing $i clobbered the outer loop variable
+											{
+												$INSERTION=$INSERTION.$sys[$k];
+											}
+											$INSERTION_LENGTH=length($INSERTION);
+											print WRBUFF "$initial[$i][9]\t$Chr1\t$tmp1\t$Strand1\t$Chr2\t$tmp2\t$Strand2\t$coverage1\n";
+											$super_flag = 1;
+											$j = @initial+1;	 
+										}
+										
+									}
+									#if there are some insertion between two hits
+									elsif($qStop2 == $qSize2)
+									{
+										$tmp1 = $tstart1+$blockSizes1[0];
+										$tmp2 = $tstart2+$blockSizes2[0];
+										$INSERTION="";
+										$infast_chr=$infast;
+										$infast_chr=~ s/\.fa//g;
+										$infast_chr_start=$qStop2+1;
+										$infast_chr_stop=$qSize;
+										$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+										#die "$sys\n";
+										$sys = `$sys`;
+										#die "$sys\n";
+										chomp($sys);
+										@sys=split("\n",$sys);
+										for(my $k=1;$k<@sys;$k++)	#separate index: reusing $i clobbered the outer loop variable
+										{
+											$INSERTION=$INSERTION.$sys[$k];
+										}
+										$INSERTION_LENGTH=length($INSERTION);
+										print WRBUFF "$initial[$i][9]\t$Chr1\t$tmp1\t$Strand1\t$Chr2\t$tmp2\t$Strand2\t$coverage1\n";
+										$super_flag = 1;
+										$j = @initial+1;	
+									}
+												
+								}
+									
+							}	
+						}
+						#if none worked with other reads then only process that read
+						if($j == @initial)
+						{
+							#die "success\n";
+							$temp1=$tstart1+$blockSizes1[0]+1;
+							#print  WRBUFF "$Chr1\t$temp1\t$Strand1\tUNKNOWN\tUNKNOWN\t$Strand\t$coverage\n";
+							$infast_chr=$infast;
+							$infast_chr=~ s/\.fa//g;
+							$infast_chr_start=$qStop1+1;
+							$infast_chr_stop=$qSize1;
+							$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+							#die "$sys\n";
+							$sys = `$sys`;
+							#die "$sys\n";
+							chomp($sys);
+							@sys=split("\n",$sys);
+							$INSERTION="";
+							for(my $k=1;$k<@sys;$k++)	#separate index: reusing $i clobbered the outer loop variable
+							{
+								$INSERTION=$INSERTION.$sys[$k];
+							}
+							$INSERTION_LENGTH=length($INSERTION);
+							print WRBUFF "$initial[$i][9]\t$Chr1\t$temp1\t$Strand1\tUNKNOWN\tUNKNOWN\t$Strand1\t$coverage1\n";
+							$super_flag = 1;
+						}	
+					}
+					#if query end is matched to query size
+					elsif($flag == 2)
+					{
+						#going through other hits
+						for($j=0;$j<@initial;$j++)
+						{
+							#hits should not be same
+							if($i != $j)
+							{
+								#print "@{$initial[$i]}\n";
+								$initial[$j][18] =~ s/,$//g;
+								@blockSizes2=split(',',$initial[$j][18]);
+								#defining var
+								$qSize2=$initial[$j][10];
+								$qStart2=$initial[$j][11];
+								$qStop2=$initial[$j][12];
+								$tstart2=$initial[$j][15];
+								$tstop2=$initial[$j][16];
+								$Strand2=$initial[$j][8];
+								$coverage2 = $initial[$j][9];
+								$Chr2 = $initial[$j][13];
+								$coverage2 =~ s/\w+_//g;
+								#if 
+								if($qStop2 < $qStop1)
+								{
+									if($qStart1 >= $qStop2 -2  &&  $qStart1 <= $qStop2 +2  )
+									{
+										#die "$qStart1 <= $qStop2 \n";
+										$tmp1 = $tstart1+$blockSizes1[0];
+										$tmp2 = $tstart2+$blockSizes2[0];
+										$INSERTION="";
+										$infast_chr=$infast;
+										$infast_chr=~ s/\.fa//g;
+										$infast_chr_start=0;
+										$infast_chr_stop=$qStart1-1;
+										$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+										#die "test $sys\n";
+										$sys = `$sys`;
+										#die "$sys\n";
+										chomp($sys);
+										@sys=split("\n",$sys);
+										for(my $k=1;$k<@sys;$k++)
+										{
+											$INSERTION=$INSERTION.$sys[$k];
+										}
+										$INSERTION_LENGTH=length($INSERTION);
+										print WRBUFF "$initial[$i][9]\t$Chr2\t$tmp2\t$Strand2\t$Chr1\t$tmp1\t$Strand1\t$coverage1\n";
+										$super_flag = 1;
+										$j = @initial+1;
+										
+									}
+									
+								}	
+							}
+						}
+						if($j == @initial)
+						{
+							$infast_chr=$infast;
+							$infast_chr=~ s/\.fa//g;
+							$infast_chr_start=0;
+							$infast_chr_stop=$qStart1;
+							$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+							#die "test $sys\n";
+							$sys = `$sys`;
+							#die "$sys\n";
+							chomp($sys);
+							@sys=split("\n",$sys);
+							$INSERTION="";
+							for(my $k=1;$k<@sys;$k++)
+							{
+								$INSERTION=$INSERTION.$sys[$k];
+							}
+							$INSERTION_LENGTH=length($INSERTION);
+							$tmp = $tstart1+1;
+							print WRBUFF "$initial[$i][9]\tUNKNOWN\tUNKNOWN\t$Strand1\t$Chr1\t$tmp\t$Strand1\t$coverage1\n";
+							$super_flag = 1;
+						}	
+					}
+				}
+				elsif(@blockSizes == 2)
+				{
+					$temp1=$tstart1+$blockSizes[0]+1;
+					$temp2=$tstop1-$blockSizes[1]-1;
+					print  WRBUFF "$initial[$i][9]\t$Chr1\t$temp1\t$Strand1\t$Chr1\t$temp2\t$Strand1\t$coverage1\n";
+				
+				}		
+			}
+		}
+		
+	}
+	close(WRBUFF);
+	
+	undef(@temp);
+}
+ 
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/cluster.pair.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,70 @@
+#!/usr/bin/perl                                                                                                                                            
+use strict;
+use POSIX;
+
+my $usage = "cluster.pair.pl maxdist\n";
+my $maxdist = shift or die $usage;
+
+my %count;
+
+while (<STDIN>){
+    chomp;
+    my ($sample, $chrstart, $start, $chrend, $end) = split /\t/;
+    my $nstart = floor ($start/$maxdist);
+    my $nend   = floor ($end/$maxdist);
+    my $coord = {start=>$start, end=>$end};
+
+    push @{$count{$chrstart}->{$nstart}->{$chrend}->{$nend}->{$sample}}, $coord;
+}
+
+print_groups (\%count);
+
+sub print_groups {
+    my ($rcount) = @_;
+    my %count = %{$rcount};
+
+    foreach my $chrstart (sort {$a<=>$b || $a cmp $b} keys %count) {
+	foreach my $posstart (sort {$a<=>$b} keys %{$count{$chrstart}}) {
+	    my %fcoord = %{$count{$chrstart}->{$posstart}};
+
+	    foreach my $chrend (sort {$a<=>$b || $a cmp $b} keys %fcoord) {
+		foreach my $posend (sort {$a<=>$b} keys %{$fcoord{$chrend}}){
+		    my @nsamples = sort {$a cmp $b} (keys %{$fcoord{$chrend}->{$posend}});
+
+		    my $cpos = $fcoord{$chrend}->{$posend};
+
+		    my @coords;
+		    my $totnum=0;
+	    
+		    foreach my $sample (@nsamples) {
+			my ($num, $avgx, $avgy) = calc_moments(@{$cpos->{$sample}});
+			push (@coords, {start=>$avgx, end=>$avgy});
+			$totnum+=$num;
+		    }
+
+		    my ($num, $avgx, $avgy)  = calc_moments(@coords);
+	    
+		    print $chrstart."\t".$avgx."\t".$chrend."\t".$avgy ."\t".$num."\t".$totnum."\t" ;
+	    
+		    print $_."\t" foreach (@nsamples);
+		    print "\n";
+		}
+	    }
+	}
+    }
+}
+
+sub calc_moments {
+    my (@pos) = @_;
+
+    my ($num, $sumx, $sumy) = (0,0,0);
+    foreach my $cpos (@pos) {
+	$num++;
+	$sumx+=$cpos->{start};
+	$sumy+=$cpos->{end};
+    }
+    my $avgx = sprintf ("%d", $sumx/$num);
+    my $avgy = sprintf ("%d", $sumy/$num);
+
+    return ($num, $avgx, $avgy);
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/direction_filter.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,55 @@
+use Getopt::Long;
+my ($v);
+
+GetOptions ("v|verbose"  => \$v);   # flag
+
+
+
+open (FILE,"$ARGV[0]") or die "Can't find file\n\n";
+my $dist=0;
+my $pos=0;
+my @max=0;
+my @events=0;
+
+while(<FILE>){
+	$dist=0;
+	@first=split(/\s+/,$_);
+	$numEvents=($_=~tr/\|//)+1;
+	$dist=$first[1]-$pos;
+	push(@max,$_);
+	push(@events,$numEvents);
+#print "STARTING_POS=$pos\n";
+	if(($dist<500)||eof(FILE)){
+		until (($dist>500)||eof(FILE)){
+			$newline=<FILE>;
+			@second=split(/\s+/,$newline);
+			$numEvents=($newline=~tr/\|//)+1;
+			push(@max,$newline);
+			push(@events,$numEvents);
+			if($v){print "DIST=$dist\nSEC1=$second[1] POS1=$pos;\n";}
+			my $tmp=$pos;
+			$pos=$second[1];
+			$dist=$second[1]-$tmp;
+		}
+	}
+	if ($v){print "Corrected dist= $dist\n"};
+	#Get the last values since they don't count
+	$NL=pop(@max);
+	$NE=pop(@events);
+	my $idxMax = 0;
+	#Get the index of the largest value in array
+	if ($v){print "Picking from events:\n"};
+	$events[$idxMax] > $events[$_] or $idxMax = $_ for 1 .. $#events;
+
+	my $val=$max[$idxMax];
+	print "$val" unless ($val=~/^0$/) ;
+	
+	
+	@max=$NL;
+	@events=$NE;		
+	my @tmp=split(/\s+/,$NL);
+	$pos=$tmp[1];
+}
+
+close FILE;
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/reduce_redundancy.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,65 @@
+open(BUFF,"$ARGV[0]") or die "no input file found\n";
+$range="$ARGV[1]";
+my %hash;
+my %store;
+$prev_chr="";
+$next=0;
+while(<BUFF>)
+{
+	chomp($_);
+	#print "$.\n";
+	if($_ !~ m/^#/)
+	{
+		@array=split("\t",$_);
+		$chr=$array[0];
+		$pos=$array[1];
+		$value=$array[@array-1];
+		if($prev_chr ne $chr )
+		{
+			if($prev_chr ne "")
+			{
+				foreach $key (sort {$hash{$b} <=> $hash{$a} } keys %hash)
+                        	{
+                                	print "$store{$key}\n";
+                                	last;
+                        	}
+
+			}
+			$next = $pos+$range;
+			undef(%hash);
+			undef(%store);
+		}
+		if($next< $pos)
+		{	
+			foreach $key (sort {$hash{$b} <=> $hash{$a} } keys %hash)
+			{
+     				print "$store{$key}\n";
+				last;
+			}
+			$next = $pos+$range;
+			undef(%hash);
+			undef(%store);
+			
+		}	
+		if($value eq "NA")
+                {
+                      $hash{$chr." ".$pos." ".$.}=0;
+                }
+                else
+                {
+                       $hash{$chr." ".$pos." ".$.}=$value;
+               	}
+                $store{$chr." ".$pos." ".$.}=$_;
+	}
+	else
+	{
+		print $_."\n";
+	}
+	$prev_chr = $chr;
+}
+foreach $key (sort {$hash{$b} <=> $hash{$a} } keys %hash)
+{
+       print "$store{$key}\n";
+       last;
+}
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/run_blat.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,68 @@
+#####################################################################################################################################################
+#Purpose: To perform blat and organize blat
+#Date: 07-19-2013
+#####################################################################################################################################################
+use Getopt::Long;
+#reading input arguments
+&Getopt::Long::GetOptions(
+'BLAT_PATH=s'=> \$blatpath,
+'REF_FILE=s'=> \$reffile,
+'INPUT_FILE=s' => \$inputfile,
+'OUTPUT_FILE=s' => \$outputfile,
+'MIN_SCORE=s'=> \$minScore,
+'MIN_IDENTITY=s'=> \$minidentity,
+'BLAT_PORT=s'=>\$blatport
+);
+$blatpath =~ s/\s|\t|\r|\n//g;
+$reffile=~ s/\s|\t|\r|\n//g;
+$inputfile=~ s/\s|\t|\r|\n//g;
+$outputfile=~ s/\s|\t|\r|\n//g;
+$minScore=~ s/\s|\t|\r|\n//g;
+$minidentity=~ s/\s|\t|\r|\n//g;
+$blatport=~ s/\s|\t|\r|\n//g;
+#input arguments
+
+#checking for missing arguments
+if($blatport eq "" || $blatpath eq "" || $reffile eq "" || $inputfile eq "" || $outputfile eq "" || $minScore eq "" || $minidentity eq "")
+{
+	die "missing arguments\nUSAGE : perl run_blat.pl -BLAT_PORT <BLAT_PORT> -MIN_SCORE <MIN_SCORE> -MIN_IDENTITY <MIN_IDENTITY> -BLAT_PATH <PATH TO BLAT FOLDER> -REF_FILE <PATH TO 2bit file> -INPUT_FILE <INPUT CONFIG FILE> -OUTPUT_FILE <OUTPUT FILE>\n";
+}
+
+#parsing the arguments
+
+#unless(-d $outdir)
+#{
+#	system("mkdir -p $outdir");
+#}
+$status=`$blatpath/gfServer status localhost $blatport |wc -l`;
+chomp($status);
+$count = 0;
+while($status < 2 )
+{
+	if($count > 0)
+	{
+		$blatport = $blatport+int(rand(1000))+1;
+	}
+	print "Starting the server\n";
+	$sys ="$blatpath/gfServer start -canStop localhost $blatport $reffile &";
+	print "$sys\n";
+	system($sys);
+	sleep(300);
+	$status=`$blatpath/gfServer status localhost $blatport |wc -l`;
+	chomp($status);
+	$count++;
+	if($count > 5)
+	{
+		die "Something is wrong with gfServer or the command above; the server failed to start after $count attempts\n";
+	}
+}	
+print "querying \n";
+$sys = "$blatpath/gfClient localhost $blatport / $inputfile $outputfile -minScore=$minScore -minIdentity=$minidentity";
+print "$sys\n";
+system($sys);
+print "stopping the server\n";
+#$sys = "$blatpath/gfServer stop localhost $blatport";
+chomp($pid = `ps|grep gfServer|head -1|awk '{print \$1}'`);
+$sys ="kill -9 $pid";
+print "$sys\n";
+system($sys);
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/script/standalone_blat2.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,267 @@
+#!/usr/bin/perl -sw
+use Getopt::Long;
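+sub usage {
+    print "
+    Usage: standalone_blat2.pl <VCF> -g <genome.2bit> -seq|s <seq.fa> -f <genome.fa>
+	-o	output VCF [out.vcf]
+	-n	file for informative contig names [contig.names]
+	-b	file for uninformative contig names [uninformative.contigs]
+	-dist	how wide of a window to look for bp [50]
+	-v	verbose option
+    Requires samtools, bedTools, and blat in your path\n";
+    die;
+}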
+#Initialize values
+my ($blat,$genome,$tei_bed,$vntr_bed,$out_vcf,$contig_names,$contig,$fasta,$uninformative_contigs,$dist,$verbose,$bedTools,$samtools);
+GetOptions ("genome|g=s" => \$genome,
+            "o|out:s" => \$out_vcf,
+            "names|n:s" => \$contig_names,
+            "seq|s=s" => \$contig,
+            "f|fasta:s" => \$fasta,
+	    "b|bad:s" => \$uninformative_contigs,
+            "dist:s" => \$dist,
+	    "v" => \$verbose
+	    );
+#$genome="/data2/bsi/reference/db/hg19.2bit""
+#$blat="/projects/bsi/bictools/apps/alignment/blat/34/blat" ;
+#TEI.bed=egrep "LINE|SINE|LTR" /data5/bsi/epibreast/m087494.couch/Reference_Data/Annotations/hg19.repeatMasker.bed >TEI.bed
+#VNTR_BED=egrep "Satellite|Simple_repeat" /data5/bsi/epibreast/m087494.couch/Reference_Data/Annotations/hg19.repeatMasker.bed > VNTR.bed
+
+
+$blat=`which blat`;
+if (!$blat) {die "You do not have BLAT in your path\n\n"}
+$samtools=`which samtools`;
+if (!$samtools) {die "You do not have samtools in your path\n\n"}
+$bedTools=`which sortBed`;
+if (!$bedTools) {die "You do not have bedTools in your path\n\n"}
+
+
+if (!$dist) {$dist=50}
+if (!$out_vcf) {$out_vcf="out.vcf"}
+if (!$contig_names) {$contig_names="contig.names"}
+if (!$uninformative_contigs) {$uninformative_contigs="uninformative.contigs"}
+
+if ((!$genome)||(!$contig)||(!$fasta)){&usage;die;}
+
+
+open(VCF,"$ARGV[0]") or die "must specify VCF file\n\n";
+open(OUT_VCF,">",$out_vcf) or die "can't open the output VCF\n";
+open(CONTIG_LIST,">",$contig_names) or die "can't open the contig names\n";
+open(BAD_CONTIG_LIST,">",$uninformative_contigs) or die "can't open the contig names\n";
+#print "writing to CONTIG_LIST=$contig_names\n";
+while (<VCF>) {
+    if($_=~/^#/){
+        if ($.==1) {
+            print OUT_VCF $_;
+            print OUT_VCF "##INFO=<ID=STRAND,Number=1,Type=String,Description=\"Strand to which assembled contig aligned\">\n";
+            print OUT_VCF "##INFO=<ID=CONTIG,Number=1,Type=String,Description=\"Name of assembled contig matching event\">\n";
+            print OUT_VCF "##INFO=<ID=MECHANISM,Number=1,Type=String,Description=\"Proposed mechanism of how the event arose\">\n";
+            print OUT_VCF "##INFO=<ID=INSLEN,Number=1,Type=Integer,Description=\"Length of insertion\">\n";
+            print OUT_VCF "##INFO=<ID=HOM_LEN,Number=1,Type=Integer,Description=\"Length of microhomology\">\n"; 
+            next;
+        }
+    else {
+        print OUT_VCF $_;
+        next;
+        }
+    };
+    chomp;
+
+    ##look for exact location of BP
+    @line=split("\t",$_);
+    my($left_chr,$start,$end);
+
+    #Get left position
+    $left_chr=$line[0];
+    $start=$line[1]-$dist;
+    $end=$line[1]+$dist;
+
+    #Get right position
+    my ($mate_pos,@mate,$mate_chr,$mate_start,$mate_end);
+    $mate_pos=$line[4];
+    $mate_pos=~s/[\[|\]|A-Z]//g;
+    #print "mate_pos=$mate_pos\n";
+    @mate=split(/:/,$mate_pos);
+    $mate_chr=$mate[0]; $mate_pos=$mate[1];
+    $mate_start=$mate_pos-$dist;$mate_end=$mate_pos+$dist;
+    #print "$left_chr:$start-$end\n$mate_chr:$mate_start-$mate_end\n";
+    
+    #Run through blat
+    my ($result1,$result2);
+    my $target1=join("",$left_chr,":",$start,"-",$end);
+    my $target2=join("",$mate_chr,":",$mate_start,"-",$mate_end);
+    #print "target1=$target1\ttarget2=$target2\n";die;
+    $result1=get_result($target1);
+    $result2=get_result($target2);
+   
+
+    my $NOV_INS="";
+    #If there is a NOV_INS, then there shouldn't be any output, so trick the results
+    if ($_=~/EVENT=NOV_INS/) {
+        $mate_start=$start;
+        $NOV_INS="true";
+        if (!$result1) {$result1=join("\t","0","0","0","0","0","0","0","0","+","UNKNOWN_NODE","0","0",$dist);}
+        if (!$result2) {$result2=join("\t","0","0","0","0","0","0","0","0","+","UNKNOWN_NODE","0","0",$dist);}
+   }
+    
+    #Skip over events that aren't supported
+    if ((!$result1)||(!$result2)){
+	my @tmp1=split("\t",$result1);
+	my @tmp2=split("\t",$result2);
+	if ($tmp1[9]) {print BAD_CONTIG_LIST "$tmp1[9]\n"}
+	if ($tmp2[9]) {print BAD_CONTIG_LIST "$tmp2[9]\n" }
+	next;
+    }
+    #Parse blat results   
+    my @result1=split("\t",$result1);
+    my @result2=split("\t",$result2);
+    if($result2[9] ne $result1[9]){print "$result2[9] != $result1[9]\n";next}
+    #print "@result1\n@result2\n";die;
+    my $pos1=$start+($result1[12]-$result1[11]);
+    my $pos2=$mate_start+($result2[12]-$result2[11]);
+    #print "$_\n$pos1\t$pos2\n";
+    
+    ##############################################################
+    ### Build Classifier
+    
+    my ($QSTART1,$QEND1,$QSTART2,$QEND2,$len,$MECHANISM, $INSERTION, $DELETION, $bed_res1,$bed_res2);
+    $MECHANISM="UNKNOWN";
+    $len="UNKNOWN";
+    #Make sure the later event is second
+    if ($result1[11] <  $result2[11]){
+	$QSTART1=$result1[11];
+	$QEND1=$result1[12];
+	$QSTART2=$result2[11];
+	$QEND2=$result2[12];
+    }
+    else{
+	$QSTART1=$result2[11];
+	$QEND1=$result2[12];
+	$QSTART2=$result1[11];
+	$QEND2=$result1[12];
+    }
+    #Now calculate the difference between $QEND1 and QSTART2
+    if($verbose){print "QEND1=$QEND1\tQSTART2=$QSTART2\n";}
+    $len=$QEND1-$QSTART2;
+    #Check for TEI
+    if($_=~/MECHANISM=TEI/){$MECHANISM="TEI"}
+    elsif($_=~/MECHANISM=VNTR/){$MECHANISM="VNTR"}
+    else{
+        if ($len==0) {$MECHANISM="NHEJ"}
+        else{
+            if ($len>0){$INSERTION="true"}
+            if ($len<0){$DELETION="true"}
+            if ($INSERTION){
+                if ($len>10) {$MECHANISM="FOSTES"}
+                else{$MECHANISM="NHEJ"}
+            }
+            elsif ($DELETION){
+                if (abs($len)>100) {$MECHANISM="NAHR"} #$len is negative in this branch, so compare its magnitude
+                elsif (abs($len) > 2){$MECHANISM="altEJ"}
+                else{$MECHANISM="NHEJ"}
+            }
+        }
+    }
+
+    
+#if ($verbose){print "@result1";print "@result2";}
+
+    #print out VCF
+    #############################################################
+    # create temporary variable name
+    #############################################################
+    srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
+    my $random_name=join "", map { ("a".."z")[rand 26] } 1..8;
+    my $random_name2=join "", map { ("a".."z")[rand 26] } 1..8;
+   
+   #Get Ref Base
+   my ($ref_base,$alt_base,$tmp_mate_pos);
+   $ref_base=getBases($left_chr,$pos1,$fasta);
+   $alt_base=getBases($mate_chr,$pos2,$fasta);#print "ALT=$alt_base\n";
+   #Substitute the new mate position and base
+   $tmp_mate_pos=$line[4];
+   $tmp_mate_pos=~s/$mate_pos/$pos2/;
+   $tmp_mate_pos=~s/[A-Z]/$alt_base/;
+   #split apart the INFO field to adjust the ISIZE and MATEID
+   my $NEW_INFO="";
+   my @INFO=split(/;/,$line[7]);
+   for (my $i=0;$i<@INFO;$i++){
+        if ($INFO[$i] =~ /^ISIZE=/){
+            #recompute ISIZE from the refined breakpoints
+            my $new_ISIZE=$pos2-$pos1;
+            $NEW_INFO.="ISIZE=";
+            $NEW_INFO.=$new_ISIZE.";";
+        }
+        elsif($INFO[$i] =~ /^MATE_ID=/){
+            $NEW_INFO.="MATE_ID=".$random_name2.";";
+        }
+        else{
+            $NEW_INFO.=$INFO[$i].";";
+        }
+   }
+   #ADD in strand and name
+   $NEW_INFO.="STRAND=".$result1[8];
+   $NEW_INFO.=";CONTIG=".$result1[9];
+   if($MECHANISM!~/TEI|VNTR/){$NEW_INFO.=";MECHANISM=".$MECHANISM;}
+    $NEW_INFO.=";HOM_LEN=".$len;
+   #don't print the contig name if it's a novel insertion
+   if(!$NOV_INS){print CONTIG_LIST "$result1[9]\n";}#else{print "I'm not printing $result1[9]\n";}
+    print OUT_VCF "$left_chr\t$pos1\t$random_name\t$ref_base\t$tmp_mate_pos\t1000\tPASS\t$NEW_INFO\t$line[8]\t$line[9]\n";
+    #Now go through and fill info in for mate
+    #Substitute the new mate position and base
+   $tmp_mate_pos=$line[4];
+   $tmp_mate_pos=~s/$mate_pos/$pos1/;
+   $tmp_mate_pos=~s/[A-Z]/$ref_base/;
+   $tmp_mate_pos=~s/$mate_chr/$left_chr/;
+    $NEW_INFO="";
+    @INFO=split(/;/,$line[7]);
+   for (my $i=0;$i<@INFO;$i++){
+        if ($INFO[$i] =~ /^ISIZE=/){
+            #recompute ISIZE from the refined breakpoints
+            my $new_ISIZE=$pos2-$pos1;
+            $NEW_INFO.="ISIZE=";
+            $NEW_INFO.=$new_ISIZE.";";
+        }
+        elsif($INFO[$i] =~ /^MATE_ID=/){
+            $NEW_INFO.="MATE_ID=".$random_name.";";
+        }
+        else{
+            $NEW_INFO.=$INFO[$i].";";
+        }
+   }
+    #ADD in strand and name
+   $NEW_INFO.="STRAND=".$result2[8];
+   $NEW_INFO.=";CONTIG=".$result2[9];
+   if ($MECHANISM!~/TEI|VNTR/){$NEW_INFO.=";MECHANISM=".$MECHANISM;}
+    $NEW_INFO.=";HOM_LEN=".$len;
+
+   #don't print the contig name if it's a novel insertion
+   if(!$NOV_INS){print CONTIG_LIST "$result2[9]\n";} #else{print "I'm not printing $result1[9]\n";}
+    print OUT_VCF "$mate_chr\t$pos2\t$random_name2\t$alt_base\t$tmp_mate_pos\t1000\tPASS\t$NEW_INFO\t$line[8]\t$line[9]\n";
+	if ($verbose){print  "$mate_chr\t$pos2\t$random_name2\t$alt_base\t$tmp_mate_pos\t1000\tPASS\t$NEW_INFO\t$line[8]\t$line[9]\n";}
+}
+close VCF;
+close OUT_VCF;
+close CONTIG_LIST;
+close BAD_CONTIG_LIST;
+sub get_result{
+        my $target=($_[0]);
+if($verbose){print "target=$target\n"}#;die;
+        my $cmd="blat $genome:$target $contig /dev/stdout -t=dna -q=dna -noHead|egrep -v \"Searched|Loaded\" |head -1";
+
+if ($verbose){print "$cmd\n"}        #print "$cmd\n";die;
+        my $result=`$cmd`;
+        #an empty result means blat found no hit; the caller handles that case
+        return ($result);
+}
+sub getBases{
+        my ($chr,$pos1,$fasta)=@_;
+        my @result=();
+        if ($pos1 <0){print "$pos1 is not a valid position (likely caused by circular MT chromosome)\n";return "NA";}
+        @result = `samtools faidx $fasta $chr:$pos1-$pos1`;
+        if(!$result[1]){$result[1]="NA"};
+        chomp($result[1]);
+        return uc($result[1]);
+}
+
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/Annotate_SoftSearch.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,40 @@
+#!/usr/bin/perl
+open(VCF,"$ARGV[0]")||die "Usage: <VCF> <Annotation.bed>\n\n\t\t The annotation BED should be of exons\n";
+
+$bedtools=`which intersectBed`;
+if(!$bedtools){die "Requires Bedtools in path\n\n"}
+if(!$ARGV[1]){die "Usage: <VCF> <Annotation.bed>\n\n";}
+
+while (<VCF>){
+	if($_=~/^#/){print;next}
+	chomp;
+	@data=split(/\t/,$_);
+	#Get left pair information
+	$chr1=$data[0];
+        $pos1a=$data[1]-1;
+        $pos1b=$data[1];
+	#Get right pair information
+	$data[4]=~s/[ACTGactghr\[\]]//g;#$data[4]=~s/hr/chr/;
+	@pos2=split(/:/,$data[4]);
+	$chr2="chr";
+	$chr2.=$pos2[0];
+	$pos2a=$pos2[1]-1;
+	$pos2b=$pos2[1];
+	#Now get left side annotations
+	#
+	#print "LEFT=get_anno($chr1,$pos1a,$pos1b)\n";
+	$left_gene=get_anno($chr1,$pos1a,$pos1b);
+        #print "RIGHT=get_anno($chr2,$pos2a,$pos2b)\n";
+        $right_gene=get_anno($chr2,$pos2a,$pos2b);
+	print "$_\t$left_gene\t$right_gene\n";
+}
+
+close VCF;
+
+sub get_anno {
+	my ($chr,$pos1,$pos2)=@_;
+	if(!(($chr)&&($pos1)&&($pos2))){print STDERR "Not all variables defined: chr,pos1,pos2=$chr,$pos1,$pos2\n";return "NA"}
+	chomp(my $result=`printf '%s\\t%s\\t%s\\n' '$chr' '$pos1' '$pos2'|intersectBed -a $ARGV[1] -b stdin|cut -f4|head -1`);
+	if(!$result){$result="NA"};
+	return $result;
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/Bam2pair.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,98 @@
+#!/usr/bin/perl
+#Author Steven Hart, PhD
+#11-15-2012
+#Convert and filter BAM files into merged bed 
+#Output should be 
+#chrA StartA EndA chrB StartB EndB Gene_id #supportingReads StrandA StrandB
+#chr9 1000 5000 chr9 3000 3800 bedpe_example2 100 - +
+
+use Cwd;
+use File::Basename;
+#Usage
+sub usage(){
+	print "Usage: perl Bam2pair.pl -b <BAM> -o <outfile>\n
+		-isize [10000]\t\tThe insert size to be considered discordant\n
+		-winsize [10000]\tThe distance between mate pairs to be considered the same\n
+		-min [1]\t\tThe minimum number of reads required to support an SV event\n
+		-prefix need a random prefix so files with the same name don't get created\n\n"
+		;
+}
+$bedtools=`which intersectBed`;
+$samtools=`which samtools`;
+
+if(!$bedtools){die "BEDtools must be installed\n";}
+if(!$samtools){die "Samtools must be installed\n";}
+use Getopt::Long;
+#Declare variables
+GetOptions(
+	'b=s' => \$BAM_FILE,		#path to bam
+	'out=s' => \$outfile,		#path to output
+	'java:s' => \$java	,
+        'chrom:s' => \$chrom      ,
+	'isize=i' => \$isize,
+	'winsize=i' => \$winsize,
+        'prefix=s' => \$prefix,
+	'min=i' => \$minSupport,
+	'blacklist:s' => \$new_blacklist,
+	'q=s' => \$qual,
+	'v' => \$verbose
+	);
+#if(defined($picard_path)){$picard_path=$picard_path} else {die "Must specify a path to PICARD so that files can be sorted and indexed properly\n"};
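+if(!defined($BAM_FILE)){usage();die "Must specify a BAM file!\n";}
+if(!defined($outfile)){usage();die "Must specify an out filename!\n";}
+if(!defined($java)){$java=`which java`;chomp($java)}
+if(!defined($qual)){$qual=20}
+if($new_blacklist){$new_blacklist=" -L $new_blacklist"}
+#fall back to the defaults advertised in usage()
+if(!defined($isize)){$isize=10000} if(!defined($winsize)){$winsize=10000} if(!defined($minSupport)){$minSupport=1}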
+$Filter_BAM=$BAM_FILE;
+
+@bam=split("/",$Filter_BAM);
+$Filter_BAM=$bam[-1];
+$Filter_BAM=~s/\.bam$/$prefix.bam/;
+$Filter_sam=$Filter_BAM;
+$Filter_sam=~s/\.bam$/.sam/;
+
+
+
+
+print "\nLooking for Discordant read pairs (and Unmated reads) without soft-clips\n";
+
+#$command=join("","samtools view -h -q 20 -f 1 -F 1804 ",$BAM_FILE," ",$chrom," ",$new_blacklist," |  awk -F\'\\t\' \'{if (\$9 > ", $isize, " || \$9 < -",$isize," || \$9 == 0 || \$1 ~ /^@/) print \$0}' > ",$Filter_sam);
+
+
+#Change command to allow reads where mate is unmapped & remove qual
+$command=join("","samtools view -h -f 1 -F 1800 -q $qual ",$BAM_FILE," ",$chrom," ",$new_blacklist," |  awk -F\'\\t\' \'{if (\$9 > ", $isize, " || \$9 < -",$isize," || \$9 == 0 || \$1 ~ /^@/) print \$0}' > ",$Filter_sam);
+
+print "$command\n" if ($verbose);
+system($command);
+$path = dirname(__FILE__);
+$Filter_cluster=$Filter_sam;
+$Filter_cluster=~s/\.sam$/.cluster/;
+$command=join("",$path,"/ReadCluster.pl -i=$Filter_sam -o=$Filter_cluster -m $minSupport");
+if($verbose){print "\n$command\n"};	
+
+system($command);
+
+##################################
+#Now there are 2 SAM files of filtered reads
+#.filter.cluster.inter.sam
+#.filter.cluster.intra.sam
+$result_pe=join("",$Filter_cluster,".out");
+$command=join("","cat ",$Filter_cluster,".int\*|perl -ane 'next if(\@F[0]=~/^\@/);if(\@F[6]!~/=/){print join(\"\\t\",\$F[11],\@F[2],\@F[3],\@F[6],\@F[7],\"\\n\")}else{print join(\"\\t\",\$F[11],\@F[2],\@F[3],\@F[2],\@F[7],\"\\n\")}' >",$result_pe);
+if($verbose){print $command."\n"};
+system($command);
+ #my ($sample, $chrstart, $start, $chrend, $end) 
+$command=join("","cat ",$result_pe," | ",$path,"/cluster.pair.pl ",$winsize," |awk '(\$6 >",$minSupport,")' >> ", $outfile);
+if($verbose){print $command."\n"};
+system($command);
+$filt1=join("",$Filter_cluster,".inter.sam");
+$filt2=join("",$Filter_cluster,".intra.sam");
+
+
+unlink($Filter_sam,$filt1,$filt2,$result_pe);
+
+#########################################
+#Now determine if left or right clipping surrogate
+print "\nBam2pair.pl Done\n";
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/Check_integration.sh	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,86 @@
+#!/bin/sh
+#$ -V
+#$ -cwd
+#$ -q 1-day
+#$ -m ae
+#$ -M hart.steven@mayo.edu
+#$ -l h_vmem=8G
+#$ -l h_stack=10M
+VCF_FILE=$1
+x=$2
+#VIRAL_SEQDB=/data2/bsi/tertiary/m110344/SoftTile/Mia/BLAST_DB/OBrien/Virus_PCGS.fasta #must be indexed by bwa
+VIRAL_SEQDB=$3
+VCF_FILE=$4
+#VCF_FILE=final.vcf
+
+set -x 
+
+perl -ane '$dist=100;
+$mate=$F[4];
+$mate=~s/[A-Z]|\[|\]//g;
+@mate=split(/:/,$mate);
+$end1a=$F[1]-$dist;
+$end1b=$F[1]+$dist;
+$end2a=$mate[1]-$dist;
+$end2b=$mate[1]+$dist;
+print "$F[0]\t$end1a\t$end1b\n$mate[0]\t$end2a\t$end2b\n"' $VCF_FILE|
+sortBed > targets.bed
+
+
+#100 min
+time samtools view -h $x -L targets.bed |awk '(($9==0)&&($11!~/#/)&&($3!~/^chrGL/)&&($3!~/^chrM/))'|perl -ane 'print "\@@F[0]\n@F[9]\n+\n@F[10]\n"' >${x%%.bam}.res.fq
+#23 min
+time bwa mem -t 4 $VIRAL_SEQDB ${x%%.bam}.res.fq |samtools view -S - |grep gi > ${x%%.bam}.tmp.sam
+
+#find out how many hits there are
+cut -f3 ${x%%.bam}.tmp.sam|perl -ne '@_=split(":",$_);@res=split(/_/,@_[1],2);print "@res[1]"' | sort|uniq -c|sort -k1nr|tee ${x%%.bam}.Viral_maps.out |head
+#Get the reads mapping to those hits to find out where the integration site is
+
+#Read in the viruses until there is a significant drop off in number of reads (i.e. contributing less than 10%)
+perl -ne '@_=split(" ",$_);$i=$_[0]+$i;$j=$_[0];$res=$j/$i;if($res>.1){print "@_[1]\n"}' ${x%%.bam}.Viral_maps.out >${x%%.bam}.to.keep
+fgrep -f ${x%%.bam}.to.keep ${x%%.bam}.tmp.sam |cut -f1 >${x%%.bam}.reads
+
+#75min+
+
+time samtools view $x -L targets.bed |
+fgrep -f ${x%%.bam}.reads|
+awk '{if(($9==0)&&($11!~/#/)&&($3!~/^chrGL/)&&($3!~/^chrM/)&&($3!~/^\*/)){print $3"\t"$4"\t"$4"\t"$1}}'|
+tee ${x%%.bam}.unsorted.bed|
+sortBed | mergeBed -nms -d 1000|
+perl -e 'open (FILE,"$ARGV[0]") or die "cant open file\n\n";
+ $SAM="$ARGV[1]";
+ chomp($SAM);
+ while(<FILE>){
+chomp;
+  @_=split(/\t/,$_);
+  @reads=split(/;/,@_[3]);
+#print "LINE=$_\nRES=grep $reads[0] $SAM\n";
+  $res=`grep $reads[0] $SAM` ;
+#  print "AFTER GREP, RES=$res\n";
+  if($res){
+   @res=split(/\t/,$res);
+   print join("\t",@_[0..2],@res[2])."\n"
+   }
+  };
+ close FILE' - ${x%%.bam}.tmp.sam |
+perl -ne 's/\|/\t/g;@_=split("\t",$_);print join ("\t",@_[0..2,7])'|
+perl -pne 's/_/\t/'|  cut -f4 --complement |
+perl -e '
+ open (FILE,"$ARGV[0]") or die "cant open file\n\n";
+ $SAM="$ARGV[1]";
+ while(<FILE>){
+  chomp;
+  @_=split(/\t/,$_);
+  $res=`grep $_[3] $SAM`;
+  if($res){
+   @res=split(" ",$res);
+   $reads[0]=chomp;
+   print join("\t",@_[0..4],@res[0])."\n";
+  }
+ }
+close FILE;
+' - ${x%%.bam}.Viral_maps.out|
+perl -pne 's/_/ /g'> ${x%%.bam}.Virus.integrated.bed
+
+rm ${x%%.bam}.reads ${x%%.bam}.to.keep ${x%%.bam}.tmp.sam ${x%%.bam}.res.fq
+echo "DONE with $x"
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/Extract_nSC.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,27 @@
+#!/usr/bin/perl -w
+
+use Getopt::Long;
+
+#Initialize values
+my ($queries,@HEADER,$samples,@HEADER_OUT,$end,$samp);
+GetOptions ("query|q=s" => \$queries);
+if(!$queries){die "Usage: Extract_nSC.pl <VCF> -query nSC 
+\n\n";}
+
+
+open (VCF,"$ARGV[0]") or die "Usage: <VCF>";
+
+while (<VCF>) {
+        if($_=~/^##/){print;next}
+    chomp;
+    @line=split(/\t/,$_);
+    if($line[0]=~/^#CH/){
+        print join ("\t",@line,$queries)."\n";
+	next}
+ @FORMAT=split(/:/,$line[8]);
+ @SAMPLE=split(/:/,$line[9]);
+	for($i=0;$i<@FORMAT;$i++){
+	if($FORMAT[$i] =~/^$queries$/){print join ("\t",@line,$SAMPLE[$i])."\n";last}
+	}
+}
+close VCF;
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/Merge_SV.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,218 @@
+#!/usr/bin/perl -w
+use Getopt::Long;
+use List::Util qw(min max);
+
+
+#Declare variables
+my ($window,$tmpSpace,$usage,$help,$outFile);
+
+GetOptions(
+        'v=s{2,}' => \@VCF,
+        'o:s' => \$outFile,
+        'w:s' => \$window,
+		'h|help' => \$help
+);
+
+if((!@VCF)||($help)){&usage();exit}
+
+
+if (!$window) {
+    $window=500;
+}
+if (!$outFile) {
+    $outFile="merged.vcf.out";
+}
+###########################################
+# Protect against merging too many results
+###########################################
+$tmpSpace='temporarySV_merge';
+if (-e $tmpSpace) {
+    #Delete temp file if it exists
+    unlink $tmpSpace;
+}
+###########################################
+#For each VCF, create a BEDPE file
+###########################################
+
+open(OUT,">>$tmpSpace") or die "Can't write in this directory\n";
+for (my $i=0;$i<@VCF;$i++){
+    #print STDERR "opening $VCF[$i]\n";
+    open(VCF,$VCF[$i]) or die &usage();
+    while (<VCF>) {
+        next if ($_=~/^#/);
+        chomp;
+        @line=split("\t",$_);
+        $mate=$line[4];
+        $mate=~s/[A-L]|[N-W]|[Z]|\[|\]//g;
+        @mate=split(/:/,$mate);
+        $end1a=$line[1]-$window;
+        $end1b=$line[1]+$window;
+        $end2a=$mate[1]-$window;
+        $end2b=$mate[1]+$window;
+        next if (($end1a<0)||($end2a<0));
+        if (($line[0]=~/^chr$/)||($mate[0]=~/^chr$/)) {
+            next;
+        }
+        print OUT "$line[0]\t$end1a\t$end1b\t$mate[0]\t$end2a\t$end2b\n";
+        print OUT "$mate[0]\t$end2a\t$end2b\t$line[0]\t$end1a\t$end1b\n";
+    }
+}
+close OUT;
+
+###########################################
+#Now merge the BEDPE into a unique BEDPE
+###########################################
+#Make sure the BEDPE is sorted
+#print "Make sure the BEDPE is sorted\n";
+my $tmpSpace2=join("",$tmpSpace,".2");
+system("cat $tmpSpace|sort -k1,1 -k2,3n -k4,4 -k5,5n -u > $tmpSpace2");
+unlink($tmpSpace);
+
+#Create output files for the left and right merged BEDPE
+my $tmpSpace3=join("",$tmpSpace,".3");
+my $tmpSpace4=join("",$tmpSpace,".4");
+
+open (OUT1,">$tmpSpace3") or die "Cant write in this directory\n";
+open (OUT2,">$tmpSpace4") or die "Cant write in this directory\n";
+
+open(BEDPE,"$tmpSpace2") or die "$tmpSpace2 has already been deleted\n";
+#Initialize positions
+#my ($chr1,$pos1,$pos2,$chr2,$pos3,$pos4);
+my (@chr1,@pos1,@pos2,@chr2,@pos3,@pos4);
+while (<BEDPE>) {
+    ($chr1,$pos1,$pos2,$chr2,$pos3,$pos4)=split("\t",$_);
+	if(!$Echr1){
+	($Echr1,$Epos1,$Epos2,$Echr2,$Epos3,$Epos4)=split("\t",$_);
+	}
+    while ( 
+		 ($chr1 =~ /^$Echr1$/)&&
+           ($pos2 <= $Epos2+$window)&&
+            ($chr2 =~ /^$Echr2$/)&&
+           ($pos3 <= $Epos3+$window)
+          )
+        {$nextline = <BEDPE> ;
+		last if (!$nextline);
+		$nextline=~chomp;
+         ($chr1,$pos1,$pos2,$chr2,$pos3,$pos4)=split("\t",$nextline);
+		 #print "NEXTLINE=$nextline";
+         push (@chr1,$chr1);
+         push (@pos1,$pos1);
+         push (@pos2,$pos2);
+         push (@chr2,$chr2);
+         push (@pos3,$pos3);
+         push (@pos4,$pos4);   
+		  }
+    ($Echr1,$Epos1,$Epos2,$Echr2,$Epos3,$Epos4)=($chr1[0],min(@pos1),max(@pos2),$chr2[-2],min(@pos3),$pos4[-2]);
+    #print join("\t",$Echr1,$Epos1,$Epos2,$Echr2,$Epos3,$Epos4);
+	if($pos1>$pos2){my $tmp=$pos1;$pos1=$pos2;$pos2=$tmp}
+	if($pos3>$pos4){my $tmp=$pos3;$pos3=$pos4;$pos4=$tmp}
+	print OUT1 join ("\t",$chr1,$pos1,$pos2)."\n";
+	print OUT2 join ("\t",$chr2,$pos3,$pos4);
+	($Echr1,$Epos1,$Epos2,$Echr2,$Epos3,$Epos4)=($chr1,$pos1,$pos2,$chr2,$pos3,$pos4);
+	}
+close BEDPE;
+close OUT1; close OUT2;
+unlink ($tmpSpace2);
+
+#####################################################################
+#Now find out for each Unique BEDPE, how many Samples was the SV in?
+#####################################################################
+#FOR EACH VCF
+#get NAME
+
+my $tmpSpace5=join("",$tmpSpace,".5");
+my $tmpSpace6=join("",$tmpSpace,".6");
+my $tmpSpace7=join("",$tmpSpace,".7");
+my $tmpSpace8=join("",$tmpSpace,".8");
+my $tmpSpace9=join("",$tmpSpace,".9");
+
+#Create a placeholder file
+system("paste $tmpSpace3 $tmpSpace4| awk '{OFS=\"\\t\"}{print \$1,\$2,\$3,\$4,\$5,\$6,0,\"NA\"}' > $tmpSpace7");
+#Convert the VCF into a BED PE
+for (my $i=0;$i<@VCF;$i++){
+	open (OUT,">$tmpSpace5") or die "Can't write in this directory\n";
+	open(VCF,$VCF[$i]) or die "Can't open $VCF[$i]\n";
+	print STDERR "Starting on $VCF[$i]\n";
+		while (<VCF>) {
+			next if ($_=~/^#/);
+			chomp;
+			@line=split("\t",$_);
+			$mate=$line[4];
+			$mate=~s/[A-L]|[N-W]|[Z]|\[|\]//g;
+			@mate=split(/:/,$mate);
+			$end1a=$line[1]-$window;
+			$end1b=$line[1]+$window;
+			$end2a=$mate[1]-$window;
+			$end2b=$mate[1]+$window;
+			next if (($end1a<0)||($end2a<0));
+			if (($line[0]=~/^chr$/)||($mate[0]=~/^chr$/)) {
+				#print "$_\n";
+				next;
+			}
+			print OUT "$line[0]\t$end1a\t$end1b\t$mate[0]\t$end2a\t$end2b\n";
+			print OUT "$mate[0]\t$end2a\t$end2b\t$line[0]\t$end1a\t$end1b\n";
+		}
+	close VCF;
+	close OUT;
+	#for each row in $tmpSpace3, count the number of overlaps on both sides
+	my $left=join("",$tmpSpace,".left");
+	my $right=join("",$tmpSpace,".right");
+	system("intersectBed -a $tmpSpace3 -b $tmpSpace5 -loj -c > $left");
+	system("intersectBed -a $tmpSpace4 -b $tmpSpace5 -loj -c > $right");
+
+	my $Lcount=`wc -l $left|cut -f1 -d" "`;
+	my $Rcount=`wc -l $right|cut -f1 -d" "`;
+	if ($Lcount != $Rcount){die "Need to check for errors in $left and $right\n\n"}
+	system("paste $left $right > $tmpSpace5");
+	system ("rm $left $right");
+	open (IN,"$tmpSpace5") or die "Can't find $tmpSpace5\n";
+	open (OUT,">$tmpSpace6") or die "Can't write in this directory\n";
+	while(<IN>){
+		chomp;
+		@lines=split("\t",$_);
+		if(($lines[3] > 0)&&($lines[6] > 0)){print OUT "1\t$VCF[$i]\n"}else{print OUT "0\t.\n"}
+		}
+	close IN;
+	close OUT;
+
+	system("paste $tmpSpace7 $tmpSpace6 > $tmpSpace8");
+	#system("head $tmpSpace7 $tmpSpace8");
+	 open (IN,"$tmpSpace8") or die "Can't find $tmpSpace8\n";
+	 open (OUT,">$tmpSpace9") or die "Can't write in this directory\n";
+	 my ($Samples,$NumSamples,$EVENT);
+	 while(<IN>){
+		 chomp;
+		 @lines=split("\t",$_);
+
+		 if ($lines[8] > 0){
+			$Samples=$lines[7].";".$lines[9];
+			$Samples=~s/^NA;//;
+			$NumSamples=$lines[6]+$lines[8];
+			}
+			else{
+			$Samples=$lines[7];
+			$NumSamples=$lines[6];
+			}
+			print OUT join ("\t",@lines[0..5],$NumSamples,$Samples)."\n";
+	 }
+	 close IN;
+	 close OUT;
+	 print STDERR "completed with $VCF[$i]\n";
+	 system("cp $tmpSpace9 $tmpSpace7");
+}
+
+system("cp $tmpSpace7 $outFile");
+unlink ($tmpSpace3, $tmpSpace4, $tmpSpace5, $tmpSpace6, $tmpSpace7, $tmpSpace8, $tmpSpace9);
+print STDERR "Your results are in $outFile\n";
+
+
+sub usage(){
+    print "
+###
+### This script will merge multiple SoftSearch VCF files
+###
+
+Usage: Merge_SV.pl -v <vcf1> <vcf2> <vcfN> -w [500] -o <output file>
+   
+    Note: Must have bedtools installed and in your path\n\n\n";
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/Merge_Soft.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,39 @@
+#!/usr/bin/perl -s
+#Merge Softsearch results by chrom
+if(!$ARGV[0]){die "Usage: <Sample.1.vcf>\n";}
+my ($sample,$cmd);
+
+#Get basename
+$sample="$ARGV[0]";
+$sample=~s/\.[0-9]+\.out\.vcf$//;
+$sample=~s/\.[0-9]+\.pe\.vcf$//;
+
+my $outfile=$sample;
+$outfile.="out.vcf";
+if( -e $outfile ){unlink($outfile)}
+$cmd="ls $sample\*vcf";
+chomp(my @samples=`$cmd`);
+print "there are " .scalar(@samples)." samples\n";
+
+open (OUT,">$outfile") or die "Can't write $outfile\n";
+#Copy the VCF header from the first file only
+my $tmp=$samples[0];
+open(TMP,"$tmp") or die "Can't open $tmp\n";
+while (<TMP>){
+	print OUT if ($_=~/^#/);
+}
+
+close TMP;
+my $chr;
+for (my $i=0;$i<@samples;$i++){
+	my $tmp=$samples[$i];
+	open(TMP,"$tmp") or die "Can't open $tmp\n";
+	while (<TMP>){
+		unless (($_=~/^chrGL/)||($_=~/^#/)){print OUT $_;}
+	}
+	close TMP;
+	print "Done with $tmp\n";
+	#Remove the per-chromosome file once it has been merged
+	unlink($tmp);
+}
+close OUT;
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/ReadCluster.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,191 @@
+#!/usr/bin/perl
+
+=head1 NAME
+   ReadCluster.pl
+
+=head1 SYNOPSIS
+
+    USAGE: ReadCluster.pl --input input_sam_file --output output_prefix [--window 10000 --minClusterSize 4]
+
+=head1 OPTIONS
+
+B<--input,-i>
+   Input file
+
+B<--output,-o>
+   output prefix
+
+B<--window, -w>
+    Window size
+
+B<--minClusterSize, -m>
+	Min size of cluster
+
+B<--help,-h>
+   This help message
+
+=head1  DESCRIPTION
+
+
+=head1  INPUT
+
+
+=head1  OUTPUT
+
+
+=head1  CONTACT
+  Jaysheel D. Bhavsar @ bjaysheel[at]gmail[dot]com
+
+
+=head1 EXAMPLE
+   ReadCluster.pl --input=filename.sam --window=10000 --output=PREFIX
+
+=cut
+
+use strict;
+use warnings;
+use Data::Dumper;
+use DBI;
+use Pod::Usage;
+use Scalar::Util qw(looks_like_number);
+use Getopt::Long qw(:config no_ignore_case no_auto_abbrev pass_through);
+
+my %options = ();
+my $results = GetOptions (\%options,
+                          'input|i=s',
+						  'output|o=s',
+                          'window|w=s',
+						  'minClusterSize|m=s',
+						  'help|h') || pod2usage();
+
+## display documentation
+if( $options{'help'} ){
+    pod2usage( {-exitval => 0, -verbose => 2, -output => \*STDERR} );
+}
+#############################################################################
+## make sure everything passed was peachy
+&check_parameters(\%options);
+
+my $r1_start = 0;
+my $r2_start = 0;
+my $r1_end = $r1_start + $options{window};
+my $r2_end = $r2_start + $options{window};
+my $r1_chr = "";
+my $r2_chr = "";
+
+my @cluster = ();
+
+open (FHD, "<", $options{input}) or die "Could not open file $options{input}\n";
+open (INTRA, ">", $options{output} . ".intra.sam") or die "Could not open file $options{output}.intra.sam\n";
+open (INTER, ">", $options{output} . ".inter.sam") or die "Could not open file $options{output}.inter.sam\n";
+
+while (<FHD>){
+	chomp $_;
+
+	#skip processing lines starting with @ just print to output file.
+	if ($_ =~ /^@/){
+		print INTRA $_."\n";
+		print INTER $_."\n";
+		next;
+	}
+#print "$_\n";
+	check_sequence($_);
+}
+
+close(FHD);
+close(INTRA);
+close(INTER);
+
+exit(0);
+
+#############################################################################
+sub check_parameters {
+    my $options = shift;
+
+	my @required = ("input", "output");
+
+	foreach my $key (@required) {
+		unless ($options{$key}) {
+			print STDERR "ARG: $key is required\n";
+			pod2usage({-exitval => 2,  -message => "Missing required argument: $key", -verbose => 1, -output => \*STDERR});
+			exit(-1);
+		}
+	}
+
+	unless($options{window}) { $options{window} = 10000; }
+	unless($options{minClusterSize}) { $options{minClusterSize} = 4; }
+}
+
+#############################################################################
+sub check_sequence {
+	my $line = shift;
+
+	my @data = split(/\t/, $line);
+
+	## check if mates are within the window.
+	if ((inWindow($data[3], 1)) && (inWindow($data[7], 2)) &&
+		($r1_chr eq $data[2]) && ($r2_chr eq $data[6])) {
+
+		## if minClusterSize is reached output
+		if (scalar(@cluster) >= $options{minClusterSize}) {
+
+			## if chr are the same then print intra-chr else inter-chr
+			if ($data[6] =~ /=/) {
+				print INTRA $line."\n";
+			} else {
+				print INTER $line."\n";
+			}
+		} else {
+			push @cluster, $line;
+		}
+	} else {
+
+		if (scalar(@cluster) >= $options{minClusterSize}) {
+			dumpCluster(@cluster);
+		}
+
+		@cluster = ();
+		$r1_start = $data[3];
+		$r2_start = $data[7];
+		$r1_end = $r1_start + $options{window};
+		$r2_end = $r2_start + $options{window};
+		$r1_chr = $data[2];
+		$r2_chr = $data[6];
+	}
+}
+
+#############################################################################
+sub inWindow {
+	my $coord = shift;
+	my $read = shift;
+
+	my $start = 0;
+	my $end = 0;
+
+	if ($read == 1) {
+		$start = $r1_start;
+		$end = $r1_end;
+	} else {
+		$start = $r2_start;
+		$end = $r2_end;
+	}
+
+	if (($coord > $start) && ($coord < $end)){
+		return 1;
+	} else { return 0; }
+}
+
+#############################################################################
+sub dumpCluster {
+	my @cluster = @_;
+
+	foreach (@cluster){
+		my @data = split(/\t/, $_);
+
+		if ($data[6] =~ /=/) {
+			print INTRA $_."\n";
+		} else {
+			print INTER $_."\n";
+		}
+	}
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/SoftSearch.multi.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,1183 @@
+#!/usr/bin/perl
+
+####
+#### Usage: SoftSearch.pl [-lqrmsd] -b <BAM> -f <Genome.fa> -sam <samtools path> -bed <bedtools path>
+#### Created 1-30-2012 by Steven Hart, PhD
+#### hart.steven@mayo.edu
+#### Requires bedtools & samtools to be in path
+
+#use FindBin;
+#use lib "$FindBin::Bin/lib";
+use lib "/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib" ;
+use Getopt::Long;
+use strict;
+use warnings;
+use Data::Dumper;
+use LevD;
+use File::Basename;
+use List::Util qw(min max);
+ 
+my (@INPUT_BAM,$INPUT_FASTA,$OUTPUT_FILE,$minSoft,$minSoftReads,$dist_To_Soft,$bedtools,$samtools);
+my ($minRP, $temp_output, $num_sd, $MapQ, $chrom, $unmated_pairs, $minBQ, $pair_only, $disable_RP_only);
+my ($levD_local_threshold, $levD_distl_threshold,$pe_upper_limit,$high_qual,$sv_only,$blacklist,$genome_file,$verbose);
+
+my $cmd = "";
+
+#Declare variables
+GetOptions(
+	'b=s{1,}' => \@INPUT_BAM,
+	'f=s' => \$INPUT_FASTA,
+	'o:s' => \$OUTPUT_FILE,
+	'm:i' => \$minRP,
+	'l:i' => \$minSoft,
+	'r:i' => \$minSoftReads,
+	't:i' => \$temp_output,
+	's:s' => \$num_sd,
+	'd:i' => \$dist_To_Soft,
+	'q:i' => \$MapQ,
+	'c:s' => \$chrom,
+	'u:s' => \$unmated_pairs,
+	'x:s' => \$minBQ,
+	'p' => \$pair_only,	
+	'g' => \$disable_RP_only,	#ignore softclips
+	'j:s' => \$levD_local_threshold,
+	'k:s' => \$levD_distl_threshold,
+    'a:s' => \$pe_upper_limit,
+    'e:s' => \$high_qual,
+	'L' => \$sv_only,
+	'v' => \$verbose, 
+	'blacklist:s' => \$blacklist,
+	'genome_file:s' => \$genome_file,
+	"help|h|?"	=> \&usage);
+#print "Using @INPUT_BAM as INPUT_BAM\n";
+unless($sv_only){$sv_only=""};
+my $INPUT_BAM=$INPUT_BAM[0];
+#print "MY NEW INPUT BAM=$INPUT_BAM[0]\n\n";die;
+if(defined($INPUT_BAM)){$INPUT_BAM=$INPUT_BAM} else {print usage();die "Where is the BAM file?\n\n"}
+if(defined($INPUT_FASTA)){$INPUT_FASTA=$INPUT_FASTA} else {print usage();die "Where is the fasta file?\n\n"}
+my ($fn,$pathname) = fileparse($INPUT_BAM,".bam");
+#my $index=`ls $pathname/$fn*bai|head -1`;
+#my $index =`ls \${INPUT_BAM%.bam}*bai`;
+#print "INDEX=$index\n";
+#if(!$index){die "\n\nERROR: you need index your BAM file\n\n"}
+my $index="";
+### get current time
+print "Start Time : " . &spGetCurDateTime() . "\n";
+my $now = time;
+
+#if(defined($OUTPUT_FILE)){$OUTPUT_FILE=$OUTPUT_FILE} else {$OUTPUT_FILE="output.vcf"; print "\nNo outfile specified.  Using output.vcf as default\n\n"}
+$minSoft=5 unless defined($minSoft);
+$minRP=5 unless defined($minRP);
+$minSoftReads=5 unless defined($minSoftReads);
+$dist_To_Soft=300 unless defined($dist_To_Soft);
+$num_sd=6 unless defined($num_sd);
+$MapQ=20 unless defined($MapQ);
+
+unless (defined $pe_upper_limit) { $pe_upper_limit = 10000; }
+unless (defined $levD_local_threshold) { $levD_local_threshold = 0.05; }
+unless (defined $levD_distl_threshold) { $levD_distl_threshold = 0.05; }
+#Get sample name if available
+my $SAMPLE_NAME="";
+my $OUTNAME ="";
+$SAMPLE_NAME=`samtools view -f2 -H $INPUT_BAM|awk '{if(\$1~/^\@RG/){sub("ID:","",\$2);name=\$2;print name}}'|head -1`;
+$SAMPLE_NAME=~s/\n//g;
+if (!$OUTPUT_FILE){
+	if($SAMPLE_NAME ne ""){$OUTNAME=$SAMPLE_NAME.".vcf"}
+	else {$OUTNAME="output.vcf"}
+}
+else{$OUTNAME=$OUTPUT_FILE}
+
+print "Writing results to $OUTNAME\n";
+
+
+##If submitting on SGE, make sure to prepend "chr" where needed.  Not all reference FASTA files use "chr", so we shouldn't force the issue.
+if(!defined($chrom)){$chrom=""}
+if(!defined($unmated_pairs)){$unmated_pairs=0}
+
+my $badQualValue=chr($MapQ);
+if(defined($minBQ)){ $badQualValue=chr($minBQ); }
+
+if($badQualValue  eq "#"){$badQualValue="\#"}
+
+# adding and checking for samtools and bedtools in the PATH
+## check for bedtools and samtools in the path
+$bedtools=`which intersectBed` ;
+if($bedtools !~ /intersectBed/){die "\nError:\n\tno bedtools. Please install bedtools and add to the path\n";}
+#$samtools=`samtools 2>&1`;
+$samtools=`which samtools`;
+if($samtools !~ /(samtools)/i){die "\nError:\n\tno samtools. Please install samtools and add to the path\n";}
+
+print "Usage = SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -s $num_sd -c $chrom -b @INPUT_BAM -f $INPUT_FASTA -o $OUTNAME \n\n";
+sub usage {
+	print "\nusage: SoftSearch.pl [-cqlrmsd] -b <BAM> -f <Genome.fa> \n";
+	print "\t-q\t\tMinimum mapping quality [20]\n";
+	print "\t-l\t\tMinimum length of soft-clipped segment [5]\n";
+	print "\t-r\t\tMinimum depth of soft-clipped reads at position [5]\n";
+	print "\t-m\t\tMinimum number of discordant read pairs [5]\n";
+	print "\t-s\t\tNumber of sd away from mean to be considered discordant [6]\n";
+	print "\t-u\t\tNumber of unmated pairs [0]\n";
+	print "\t-d\t\tMax distance between soft-clipped segments and discordant read pairs [300]\n";
+	print "\t-o\t\tOutput file name [output.vcf]\n";
+	print "\t-t\t\tPrint temp files for debugging [no|yes]\n";
+	print "\t-c\t\tuse only this chrom or chr:pos1-pos2\n";
+	print "\t-p\t\tuse paired-end mode only \n";
+	print "\t-g\t\tEnable paired-only search. This will look for discordant read pairs even without soft clips.\n";
+        print "\t-a\t\tset the minimum distance for a discordant read pair without soft-clipping info [10000]\n";
+        print "\t-L\t\tFlag to print out even small deletions (low quality)\n";
+        print "\t-e\t\tdisable strict quality filtering of base qualities in soft-clipped reads [no]\n";
+        print "\t-blacklist\tareas of the genome to skip calling.  Requires -genome_file\n";
+        print "\t-genome_file\ttab separated value of chromosome name and length.  Only used with -blacklist option\n\n";
+
+	exit 1;
+	}
+
+
+#############################################################
+# create temporary variable name
+#############################################################
+srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
+our $random_name=join "", map { ("a".."z")[rand 26] } 1..8;
+
+#############################################################
+## create green list
+##############################################################
+#
+my $new_blacklist="";
+if($blacklist){
+        if(!$genome_file){die "if using a blacklist, you must also specify the location of a genome_file
+        The format of the genome_file should be
+                chrom   size
+                chr1    249250621
+                chr2    243199373
+                ...
+
+        If using hg19, you can get the genome file by
+                mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \"select chrom, size from hg19.chromInfo\"  > hg19.genome";}
+        
+	$cmd=join("","complementBed -i $blacklist -g $genome_file >",$random_name,".bed") ;
+	system ($cmd);
+	$new_blacklist=join(""," -L ",$random_name,".bed ");
+	}
+
+if($verbose){print "CMD=$cmd\nBlacklist is $new_blacklist\n";}
+
+
+
+
+
+#############################################################
+# Calculate insert size distribution of properly mated reads
+#############################################################
+
+#Change for compatibility with other operating systems
+#my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|cut -f9|head -10000|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)**2)}'`;
+#print "samtools view -q $MapQ -f2 $INPUT_BAM $chrom|cut -f9|head -10000|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)^2)}'\n";
+my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|cut -f9|head -10000|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)^2)}'`;
+#my ($mean,$stdev)=split(/ /,$metrics);
+my ($mean,$stdev)=split(/\s/,$metrics);
+$stdev=~s/\n//;
+
+#print "MEAN=$mean\tSTDEV=$stdev\n\n";
+
+die "Could not calculate the insert size distribution from $INPUT_BAM\n" if (!$mean);
+my $upper_limit=int($mean+($num_sd*$stdev));
+my $lower_limit=int($mean-($num_sd*$stdev));
+print qq{The mean insert size is $mean +/- $stdev (sd)
+The upper limit = $upper_limit
+The lower limit = $lower_limit\n
+};
+if($lower_limit<0){
+	print "Warning!! Given this insert size distribution, we can not call small indels.  No other data will be affected\n\n";
+	$lower_limit=1;
+}
+my $tmp_name=join ("",$random_name,".tmp.bam");
+my $random_file_sc = "";
+my $command = "";
+
+#############################################################
+# Make sam file that has soft clipped reads
+#############################################################
+#give file a name
+if(!defined($pair_only)){
+	foreach my $bam(@INPUT_BAM){
+	$random_file_sc=join ("",$random_name,".sc.sam");
+	$command=join ("","samtools view -q $MapQ -F 1024 $bam $chrom $new_blacklist| awk '{OFS=\"\\t\"}{c=0;if(\$6~/S/){++c};if(c == 1){print}}' | perl -ane '\$TR=(\@F[10]=~tr/\#//);if(\$TR<2){print}' >> ", $random_file_sc);
+	print "Making SAM file of soft-clipped reads from $bam\n";
+	if($verbose){	print "$command\n";}
+	system("$command");
+}
+	#############################################################
+	# Find areas that have deep enough soft-clip coverage
+	print "Identifying soft-clipped regions that are at least $minSoft bp long in $random_file_sc\n";
+	open (FILE,"$random_file_sc")||die "Can't open soft-clipped sam file $random_file_sc\n";
+
+	my $tmpfile=join("",$random_file_sc,".sc.passfilter");
+	open (OUT,">$tmpfile")||die "Can't write files here!\n";
+
+	while(<FILE>){
+		@_ = split(/\t/, $_);
+		#### parse CIGAR string and create a hash of array of each operation
+		my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+		my $hash;
+		map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+		#for ($i=0; $i<=$#softclip_pos; $i++)	{
+		foreach my $softclip (@{$hash->{S}}) {
+			#if	($CIGAR[$softclip_pos[$i]] > $minSoft){
+			if	($softclip > $minSoft){
+				###############Make sure base qualities don't have more than 2 bad marks
+				my $qual=$_[10];
+				my $TR=()=$qual=~/\Q$badQualValue\E/g; #tr/// does not interpolate variables
+				if($badQualValue eq "#"){ $TR=($qual=~tr/\#//); }
+				#Skip the soft clip if there is more than 2 bad qual values
+				#next if($TR > 2);
+#				if (!$high_qual){next if($TR > 2);}
+				print OUT;
+				last;
+			}
+		}
+	}
+	close FILE;
+	close OUT;
+
+	$command=join(" ","mv",$tmpfile,$random_file_sc);
+if($verbose){	print "$command\n";}
+	system("$command");
+}
+
+#########################################################
+#Stack up SoftClips
+#########################################################
+my $random_file=join("",$random_name,".sc.direction.bed");
+if(!defined($pair_only)){
+        open (FILE,"$random_file_sc")|| die "Can't open sam file\n";
+        #$random_file=join("",$random_name,".sc.direction");
+
+        print "Calling sides of soft-clips from $random_file_sc\n";
+        #\nTMPOUT=$random_file\tINPUT=$random_file_sc\n\n";
+        open (TMPOUT,">$random_file")|| die "Can't create tmp file\n";
+
+        while (<FILE>){
+                @_ = split(/\t/, $_);
+                #### parse CIGAR string and create a hash of array of each operation
+                my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+                my $hash;
+                map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+                #### next if softclips on each end
+                next if ($_[5] =~ /^[0-9]+S.*S$/);
+
+                #### skip if the softclip occurs in the middle
+                next if ($_[5] =~ /^[0-9]+[^S][0-9].*S.+$/);
+
+                my $softclip = $hash->{S}[0];
+
+                my $end1 = 0;
+                my $end2 = 0;
+                my $softBases = "";
+		my $right_corrected="";my $left_corrected="";
+
+        if ($softclip > $minSoft) {
+		
+                        ####If the soft clip occurs at end of read and its on the minus strand, then it's a right clip
+                        if ($_[5] =~ /^.*S$/) {
+                                $end1=$_[3]+length($_[9])-$softclip-1;
+                                $end2=$end1+1;
+next if ($end1<0);
+                                #RIGHT clip on Minus
+                                $softBases=substr($_[9], length($_[9])-$softclip, length($_[9]));
+                                #Right clips don't always get clipped correctly, so fix that
+                                # Check to see if sc base matches ref
+                                $right_corrected=baseCheck($_[2],$end2,"right",$softBases);
+                               print TMPOUT "$right_corrected\n"
+
+                        } else {
+                                #### Begins with S (left clip)
+                                $end1=$_[3]-$softclip;
+next if ($end1<0);
+
+                                $softBases=substr($_[9], 0,$softclip);#print "TMP=$softBases\n";
+        			$left_corrected=baseCheck($_[2],$end1,"left",$softBases);
+if(!$left_corrected){print "baseCheck($_[2],$end1,left,$softBases)\n";next}
+                               print TMPOUT "$left_corrected\n";
+#print "\nSEQ=$_[9]\t\n";
+
+                        }
+        }
+  }
+close FILE;
+close TMPOUT;
+}
+sub baseCheck{
+        my ($chrom,$pos,$direction,$softBases)=@_;
+        #skip if position is less than 0, which is caused by MT DNA
+        return if ($pos<0);
+        my $exit="";
+
+        while(!$exit){
+        if($direction=~/right/){
+                        my $refBase=getSeq($chrom,$pos,$INPUT_FASTA);
+                        my $softBase=substr($softBases,0,1);
+                        if ($softBase !~ /$refBase/){
+                                my $value=join("\t",$chrom,$pos,$pos+1,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos+1;
+                                $softBases=substr($softBases, 1,length($softBases));
+                        }
+         }
+        else{
+                        my $refBase=getSeq($chrom,$pos+1,$INPUT_FASTA);
+                        my $softBase=substr($softBases,-1,1);
+                        if ($softBase !~ /$refBase/){
+                                $pos=$pos-1+length($softBases);
+                                my $value=join("\t",$chrom,$pos-1,$pos,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos-1;
+                                $softBases=substr($softBases, 0, -1);
+                                #print "Trying again $softBases\n";
+                       }
+
+        }
+
+}
+}
+#Remove SAM files to conserve space
+unlink($random_file_sc);
+
+
+
+###
+#
+######################################################
+# Transform Read pair groups into softclip equivalents
+######################################################
+#
+#
+#
+my $v="";
+#if($disable_RP_only){
+print "Running Bam2pair.pl\n";
+print "Looking for discordant read pairs without requiring soft-clipping information\n";
+	use FindBin qw($Bin);
+	my $path=$Bin;
+#	print"\n\nPATH=$path\n\n";
+if($verbose){$v="-v"}
+foreach my $random_file_disc(@INPUT_BAM){
+	my $tmp_out=join("",$random_name,"RP.out");
+	$command=join("","perl ",$path,"/Bam2pair.pl -b $random_file_disc  -o $tmp_out -isize $pe_upper_limit -winsize $dist_To_Soft -min $minRP -chrom $chrom -prefix $random_name -q $MapQ -blacklist $random_name.bed $v");
+if($verbose){	print "$command\n"};
+	system("$command");
+	$command=join("","perl -ane '\$end1=\@F[1];\$end2=\@F[3];print join(\"\\t\",\@F[0..1],\$end1,\"unknown|left\");print \"\\n\";print join(\"\\t\",\@F[2..3],\$end2,\"unknown|left\");print \"\\n\"' ", $tmp_out," >> ",$random_file);
+if($verbose){print "$command\n"};
+	system($command);
+	unlink($tmp_out);
+#}
+}
+
+
+######################################################
+unlink("$random_file","$tmp_name","$random_file","$index","$random_name","$new_blacklist") if (-z $random_file || ! -e $random_file) ;
+if (-z $random_file || ! -e $random_file){
+	print "Softclipped file is empty($random_file).\nNo soft clipping found using desired parameters\n\n";
+	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+
+#############################################################
+#  Make sure there are enough soft-clipped supporting reads
+#############################################################
+my $outfile=join("",$random_file,".sc.merge.bed");
+#sortbed -i .sc.direction | mergeBed -nms -d 25 -i stdin > .sc.merge.bed
+$command=join(" ","sortBed -i",$random_file," | mergeBed  -nms -i stdin","|grep \";\"","|awk '{OFS=\"\t\"}(NF==4)'",">",$outfile);
+
+#print "$command\n";
+system("$command");
+
+if (-z $outfile || ! -e $outfile){
+	unlink("$tmp_name","$random_file","$outfile","$index","$random_name","$new_blacklist"); 
+	print "mergeBed file is empty.\nNo structural variants found\n\n" ;
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed mergeBed\n";
+
+###############################################################
+# If left and right are on the same line, make into 2 lines
+###############################################################
+open (INFILE,$outfile)||die "couldn't open temp file : $. \n\n";
+my $tmp2=join("",$random_name,".sc.fixed.merge.bed");
+#print "INFILE=$outfile\tOUTFILE=$tmp2\n\n";
+#INPUT FORMAT=chr9\t131467\t131473\tATGCTTATTAAAA|left;TTATTAAAAGCATA|left
+open (OUTFILE,">$tmp2")||die "couldn't create temp file : $. \n\n";
+while(<INFILE>){
+	chomp $_;
+	my $l = $_;
+
+	my @a = split(/\t/, $l);
+	my $info = $a[3];
+	my @info_arr = split(/\;/, $info);
+	my @left_arr=();
+	my @right_arr=();
+	@left_arr = grep(/left/, @info_arr);
+	@right_arr = grep(/right/, @info_arr);
+
+	#New
+	my $left = join(";", @left_arr);
+	my $right = join(";", @right_arr);
+	$info = join(";", @info_arr);
+
+	if((@left_arr) && (@right_arr)){
+		print OUTFILE "$a[0]\t$a[1]\t$a[2]\t$left\n$a[0]\t$a[1]\t$a[2]\t$right\n";
+	} else{
+		my $all=join("\t",@a[0..2],$info);
+		print OUTFILE "$all\n";
+	}
+}
+
+# make sure output file name is $outfile
+$command=join(" ","sed -e '/ /s//\t/g'", $tmp2,"|awk 'BEGIN{OFS=\"\\t\"}(NF==4)'", "|perl -pne 's/ /\t/g'>",$outfile);
+system("$command");
+if($verbose){print "$command\n"};
+unlink("$tmp_name","$random_file","$tmp2","$outfile","$index","$random_name","$new_blacklist") if (-z $outfile || ! -e $outfile) ;
+ if (-z $outfile || ! -e $outfile){
+	print "Fixed mergeBed file is empty($outfile).\nNo structural variants found\n\n";
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed fixing mergeBed\n\n";
+
+###############################################################
+# Separate directions of soft clips
+###############################################################
+my $left_sc = join("", "left", $tmp2);
+my $right_sc = join("", "right", $tmp2);
+use FindBin qw($Bin);
+#my $path=$Bin;
+
+$command=join("","grep left ", $tmp2, " |sed -e '/left /s//left\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$left_sc);
+system("$command");
+#print "$command\n";
+$command=join("","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$right_sc);
+#$command=join(" ","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g' >",$right_sc);
+system("$command");
+#print "$command\n";
+#die "CHECK $right_sc\n";
+
+###############################################################
+# Count the number and identify directions of soft clips
+###############################################################
+print "Count the number and identify directions of soft clips\n";
+#print "looking in $outfile\n";
+$outfile=join("",$random_name,".sc.fixed.merge.bed");
+#system("ls -lhrt");
+open (INFILE,$outfile)||die "couldn't open temp file\n\n";
+my $tmp3 = join("", $random_file, "predSV");
+open (OUTFILE, ">$tmp3")||die "couldn't create temp file\n\n";
+while(<INFILE>){
+chomp;
+	@_=split(/\t/,$_);
+	my $count=tr/\;//;
+	$count=$count+1;
+	my $left=0;
+	my $right=0;
+
+	while ($_ =~ /left/g) { $left++ } # count number of left clips
+	while ($_ =~ /right/g) { $right++ } # count number of right clips
+
+	###############################################################
+	if ($count >= $minSoftReads){
+		####get longest soft-clipped read
+		my @clips=split(/\;|\|/,$_[3]);
+
+		my ($max, $temp, $temp2, $temp3, $dir, $maxSclip) = (0) x 6;
+		for (my $i=0; $i<$count; $i++) {
+			my $plus1=$i+1;
+			$temp=length($clips[$i]);
+			$temp2=$clips[$plus1];
+			$temp3=$clips[$i];
+
+			if ($temp > $max){
+				$maxSclip=$temp3;
+				$max =$temp;
+				$dir=$temp2;
+			} else {
+				$max=$max;
+				$dir=$dir;
+				$maxSclip=$maxSclip;
+			}
+			$i++;
+		}
+		my $order2 = join("|", $left, $right);
+        #print join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+		print OUTFILE join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+	} elsif($_=~/unknown/){
+	print OUTFILE join ("\t",@_[0..2],"NA","NA","left","NA","NA|NA") . "\n";
+        print OUTFILE join ("\t",@_[0..2],"NA","NA","right","NA","NA|NA") . "\n";
+	}
+	####Format is Chrom,start, end,longest Soft-clip,length of longest Soft-clip, direction of longest soft-clip,#supporting softclips,#left Sclips|#right Sclips
+}
+close INFILE;
+close OUTFILE;
+
+unlink("$tmp2","$tmp_name","$random_file","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$new_blacklist") if (-z $tmp3 || !-e $tmp3) ;
+
+ if (-z $tmp3 || !-e $tmp3){
+	print "No structural variants found while Counting the number and identify directions of soft clips.\n" ;
+
+#	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+#	&print_header();
+#	close OUT;
+exit;
+}
+
+print "Done counting Softclipped reads\n";
+###############################################################
+#### Print header information
+###############################################################
+
+
+foreach my $random_file_disc(@INPUT_BAM){
+print "Making the header for $random_file_disc\n";
+$SAMPLE_NAME=`samtools view -f2 -H $random_file_disc|awk '{if(\$1~/^\@RG/){sub("ID:","",\$2);name=\$2;print name}}'|head -1`;
+$SAMPLE_NAME=~s/\n//g;
+if($chrom){$SAMPLE_NAME.=".".$chrom}
+
+$SAMPLE_NAME.=".vcf";
+open (OUT,">$SAMPLE_NAME")||die "Can't write files here!\n";
+&print_header();
+
+# DO the bulk of the work
+open (FILE,"$tmp3")|| die "Can't open file\n";
+
+while (<FILE>){
+	#If left clip {+- or -- or -+ }{+- are uninformative b/c they go upstream}
+	#If right clip {++ or -- or +-}
+	chomp $_;
+	my @res=();my $res;
+	my $line = $_;
+	my @info = split(/\t/, $_);
+	my $i=0;
+	my $basename=basename($random_file_disc);$i=0;
+	if($info[5] eq "left") {
+		$res=bulk_work("left", $line, $random_file_disc);
+                if(!$res){$res=join("\t",".",".",".",".",".",".",".",".",".",".")};
+		$i++;
+		} 
+	elsif ($info[5] eq "right") {
+		$res=bulk_work("right", $line, $random_file_disc);
+		if(!$res){$res=join("\t",".",".",".",".",".",".",".",".",".",".")};
+		$i++;
+		}
+	if($res){@res=split("\t",$res);
+	print OUT join("\t",@res)."\n";
+	}}
+close FILE;
+close OUT;
+print "Done with $random_file_disc\n\n";
+}
+
+
+
+###############################################################################
+###############################################################################
+#### Delete temp files
+my $meregedBed=join("",$random_name,".sc.direction.bed.sc.merge.bed");
+
+$temp_output="no" unless defined($temp_output);
+
+if ($temp_output eq "no"){
+	unlink("$tmp_name","$random_file","$tmp2","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$meregedBed","$random_name.bed");
+}
+####Sort VCF
+#my $tmp=join(".",$random_name,"tmp");
+#Get header
+#$cmd="grep \"#\" $OUTNAME > $tmp";
+#system($cmd);
+#sort results
+#$cmd="grep -v \"#\" $OUTNAME|perl -pne 's/chr//'|sort -k1,1n -k2,2n|perl -ne 'print \"chr\".\$_' >>$tmp";
+#system($cmd);
+#$cmd="mv $tmp $OUTNAME";
+#system($cmd);
+#remove entries next to each other
+
+
+print "Analysis Completed\n\nYou did it!!!\n";
+print "Finish Time : " . &spGetCurDateTime() . "\n";
+$now = time - $now;
+printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600), int(($now % 3600) / 60),
+int($now % 60));
+
+exit;
+
+###############################################################################
+sub rev_comp {
+  my $dna = shift;
+  my $revcomp = reverse($dna);
+  $revcomp =~ tr/ACGTacgt/TGCAtgca/;
+  return $revcomp;
+}
+
+
+###############################################################################
+#### to get reference base
+sub getSeq{
+	my ($chr,$pos,$fasta)=@_;
+	#don't require chr
+	#if($chr !~ /^chr/){die "$chr is not correct\n";}
+#	die "$pos is not a number\n" if ($pos <0);
+my @result=();
+        if ($pos <0){print "$pos is not a valid position (likely caused by circular MT chromosome)\n";return;}
+
+	@result = `samtools faidx $fasta $chr:$pos-$pos`;
+	if($result[1]){chomp($result[1]);
+	return uc($result[1]);
+	}
+	return("NA");
+	#### after return will not be printed
+	####print "RESULTS=@result\n";
+}
+
+sub getBases{
+        my ($chr,$pos1,$pos2,$fasta)=@_;
+        #don't require chr
+        #if($chr !~ /^chr/){die "$chr is not correct\n";}
+my @result=();
+        if ($pos1 <0){print "$pos1 is not a valid position (likely caused by circular MT chromosome)\n";return;};
+
+        @result = `samtools faidx $fasta $chr:$pos1-$pos2`;
+	if(!$result[1]){$result[1]="NA"};
+        chomp($result[1]);
+        return uc($result[1]);
+
+        #### after return will not be printed
+        ####print "RESULTS=@result\n";
+}
+###############################################################################
+#### to get time
+sub spGetCurDateTime {
+	my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
+	my $curDateTime = sprintf "%4d-%02d-%02d %02d:%02d:%02d",
+	$year+1900, $mon+1, $mday, $hour, $min, $sec;
+	return ($curDateTime);
+}
+
+
+###############################################################################
+#### print header
+sub print_header {
+	my $date=&spGetCurDateTime();
+	my $header = qq{##fileformat=VCFv4.1
+##fileDate=$date
+##source=SoftSearch.pl
+##reference=$INPUT_FASTA
+##Usage= SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -u $unmated_pairs -s $num_sd -b @INPUT_BAM -f $INPUT_FASTA -o $OUTNAME
+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
+##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
+##FORMAT=<ID=lSC,Number=1,Type=Integer,Description="Length of the longest soft clips supporting the BND">
+##FORMAT=<ID=nSC,Number=1,Type=Integer,Description="Number of supporting soft-clips">
+##FORMAT=<ID=uRP,Number=1,Type=Integer,Description="Number of unmated read pairs nearby Soft-Clips">
+##FORMAT=<ID=levD_local,Number=1,Type=Float,Description="Levenshtein distance between soft-clipped bases and the area around the original soft-clipped site">
+##FORMAT=<ID=distl_levD,Number=1,Type=Float,Description="Levenshtein distance between the soft-clipped bases and mate location">
+##FORMAT=<ID=CTX,Number=1,Type=Integer,Description="Number of chromosomal translocations">
+##FORMAT=<ID=DEL,Number=1,Type=Integer,Description="Number of reads supporting Large Deletions">
+##FORMAT=<ID=INS,Number=1,Type=Integer,Description="Number of reads supporting Large insertions">
+##FORMAT=<ID=NOV_INS,Number=1,Type=Integer,Description="Number of reads supporting novel sequence insertion">
+##FORMAT=<ID=INV,Number=1,Type=Integer,Description="Number of reads supporting inversions">
+##FORMAT=<ID=sDEL,Number=1,Type=Integer,Description="Number of reads supporting small deletions">
+##INFO=<ID=NO_MATE_SC,Number=0,Type=Flag,Description="When there is no soft-clipping at the mate read location, an approximate position is used">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Dummy value for maintaining VCF-Spec">
+#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t$SAMPLE_NAME\n};
+
+	print OUT $header;
+}
+
+
+###############################################################################
+sub bulk_work {
+	my ($side, $line, $file) = @_;
+	my $local_levD = 0;
+	my $distl_levD = 0;
+
+	#my @info = split(/\t/, $line);
+	my @plus_Reads = split(/\t/, $line);
+	$plus_Reads[7] =~ s/\n//g;
+
+	#### length of the longest soft clip and number of supporting soft clips.
+	my $lSC = $plus_Reads[4];
+	my $nSC = $plus_Reads[6];
+
+
+	#Get all types of compatible reads
+	#Get improperly paired reads (@ max distance)
+
+	#### default value for left SIDE.
+	#If left-clip, then look downstream for match of softclipped reads to define a deletion, but look for DRPs upstream
+	my $sv_type = "SVTYPE=BND";
+	my $start_local=0; my $end_local=0;my $target_local="";my $target_drp="";my $start_drp="";my $end_drp="";
+	if ($side =~ /left/) {
+		$start_local = $plus_Reads[1]-$dist_To_Soft;
+		$end_local = $plus_Reads[2];
+                $start_drp = $plus_Reads[1];
+                $end_drp = $plus_Reads[1]+$dist_To_Soft;
+	
+	}
+	else{                
+                $start_local = $plus_Reads[1];
+                $end_local = $plus_Reads[1]+$dist_To_Soft;
+                $start_drp = $plus_Reads[1]-$dist_To_Soft;
+                $end_drp = $plus_Reads[1];
+        }
+	
+	$target_local=join("", $plus_Reads[0], ":", $start_local, "-", $end_local);
+	$target_drp=join("", $plus_Reads[0], ":", $start_drp, "-", $end_drp);
+	my $num_unmapped_pairs="";
+	#### -f8: mate unmapped; -f24: mate unmapped and read on reverse strand; -F 1536: skip QC-fail and duplicate reads
+	my $unmapped_flag = ($side =~ /right/) ? 8 : 24;
+	my $unmapped_cmd = "samtools view $new_blacklist -q $MapQ -f$unmapped_flag -F 1536 -c $file $target_drp";
+	$num_unmapped_pairs=`$unmapped_cmd`;
+
+if($verbose){print "$unmapped_cmd\n";}
+
+	$num_unmapped_pairs=~s/\n//;
+if($verbose){print "NUM UNMAPPED PAIRS= $num_unmapped_pairs\n";}
+	my $REF1_base = "";
+	my $REF2_base = "";
+	my $INFO_1 = "";
+	my $INFO_2 = "";
+	my $ALT_1 = "";
+	my $ALT_2 = "";
+	my $isize = 0;
+	my $QUAL = "";
+	my $FORMAT = "GT:";
+
+	#### get 8-character random id
+	my $BND1_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	my $BND2_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	$BND1_name=join "_","BND",$BND1_name;
+	$BND2_name=join "_","BND",$BND2_name;
+
+	my $counts = {CTX => 0, DEL => 0, INS => 0, INV => 0, TDUP => 0, NOV_INS => 0 };
+	my $event_mate_info = {CTX => "", DEL => "", INS => "", INV => "", TDUP => "", NOV_INS => "" };
+
+	#### get mate pair info and counts per event
+	foreach my $e (sort keys %{$counts}) {
+		my $h = get_counts_n_info($e, $side, $MapQ, $file, $dist_To_Soft, $target_drp, $upper_limit, $lower_limit);
+
+		$counts->{$e} = $h->{count};
+		$event_mate_info->{$e} = $h->{info};
+	}
+
+	my $max = 0;
+	my $type = "UNKNOWN";
+	my $nRP = 0;
+	my $mate_info = "NA\tNA\tNA\tNA";
+	my $summary = "GT:";
+
+	#### find max count of events and set type, nRP and info to corresponding
+	#### max count event.
+	#### also create a summary string of all counts to be added to VCF file.
+	foreach my $e (sort keys %{$counts}){
+#		if ($counts->{$e} >=i $max){
+		if ($counts->{$e} > $max){		
+			$type = $e .",". $counts->{$e};
+			$nRP = $counts->{$e};
+
+			$max = $counts->{$e};
+
+			if (length($event_mate_info->{$e})) {
+				$mate_info = $event_mate_info->{$e};
+			}
+		}
+
+		$summary .= $e .",". $counts->{$e} .":";
+	}
+	#print "done with Summaryi=$summary\n";
+	#### remove last colon ":" from the summary
+	$summary =~ s/:$//;
+ if (($minRP > $max)&&(!$disable_RP_only )){return};
+
+	#### Run Levenshtein distance on softclip in target region to find out if it's a small deletion/insertion
+	#### passing 1: clip_seq, 2: chr, 3: start, 4: end, 5: ref file.
+	my $levD = LevD->new;
+########################################################
+########################################################
+########################################################
+
+	#### redefine start and end location for LevD calc.
+#	$start = $plus_Reads[1]-$dist_To_Soft;
+#	$end = $plus_Reads[2];
+	my $num_bases_to_loc=0;
+	my $new_start=0;
+	my $new_end=0;
+	my $del_seq="";
+        my $start = $start_local;
+        my $end = $end_local;
+	if ($lSC=~/NA/){$lSC=0}
+
+	if ($side =~ /right/) {
+	        $levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	        $num_bases_to_loc=$levD->{index};
+		$new_start = $plus_Reads[2];
+                if ($plus_Reads[2]=~/^[0-9]/){$new_end=$plus_Reads[2]+$lSC};
+	}
+	else{
+		$levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+		$num_bases_to_loc=$levD->{index};
+		if ($plus_Reads[2]=~/^[0-9]/){$new_start=$plus_Reads[2]-$lSC};
+                $new_end = $plus_Reads[2];
+	}
+	return if((!$new_start)||(!$new_end));
+return if ($new_start<0);	
+	$del_seq=getBases($plus_Reads[0], $new_start,$new_end,$INPUT_FASTA);
+##############################################################################
+#	#If there is a match, where is the start position of the match?
+#
+##############################################################################
+
+
+	#if $plus_Reads[3] eq "NA", then it was found without soft-clipped reads
+	if($plus_Reads[3] !~  /NA/){
+			if (($local_levD < $levD_local_threshold)) {
+				return if (!$sv_only);
+				#### add value to summary to be written to vcf file.
+				$summary = "GT:sDel," . $plus_Reads[6];
+				$type = "sDEL";
+				###########################################################################
+				##### Printing output
+
+				#########################################
+				##### Get DNA info
+				#########################################
+				#$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF1_base = substr($del_seq, 0, 1);
+
+				#### this is alt ref. for softclip its $plus_Reads[3]
+				$REF2_base = $del_seq;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$isize = length($del_seq);
+
+				#### svtype = none for sDEL
+				#### isize = length($info[3]);
+				#### nRP = NA
+				#### mate_id = NA
+				#### CTX,:DEL,:....sDEL,##
+				$INFO_1=join(";", "SVTYPE=NA", "EVENT=$type", "ISIZE=$isize");
+
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE= "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+				$INFO_2=~s/\s//g;
+
+				$BND1_name =~ s/^BND/LEVD/;
+				# If left, then the start position is plus_Reads[1]-isize
+				my $start_pos=0;
+				#Make sure Ref1 and Ref2 bases are different
+				if($REF2_base eq $REF1_base){$REF1_base="NA"}
+				if($side=~/left/){$start_pos=$plus_Reads[1]-$isize}else{$start_pos=$plus_Reads[1]};		
+				 my $var=join("\t", $plus_Reads[0], $start_pos, $BND1_name, $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE);
+				return $var;
+				#print OUT join ("\t", $plus_Reads[0], $start_pos, $BND1_name, $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+#				return;
+			}
+		}
+
+		#### Otherwise, look for DRP mate info
+	#if($nRP=~/NA/){print "MATE_INFO=$mate_info\tSide=$side\tline=$line\n";}
+		my @mate_info_arr = split(/\t/, $mate_info);
+		$nRP = $mate_info_arr[3];
+		my $mate_chr=$mate_info_arr[0];
+
+			if((! defined $nRP) || ($nRP =~ /na/i) || ($mate_chr =~ /NA/) ){
+			#PRINT UNKNOWN
+return if ($nRP =~ /na/i);
+	#print "There is an unknown\nNRP=$nRP Mate_CHR=$mate_chr minRP=$minRP\n";die;
+				$summary .= ":unknown," . $plus_Reads[6];
+				$type = "unknown";
+				$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF2_base = $plus_Reads[3];
+				$BND1_name =~ s/^BND/UNKNOWN/;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$INFO_1=join(";", "SVTYPE=unknown", "EVENT=unknown", "ISIZE=unknown");
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE = "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+			       #print join ("\t", $plus_Reads[0], $plus_Reads[1],  $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+
+				#print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $REF2_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+				my $var=join("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $REF2_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE);
+				return $var;
+
+		}
+
+		#### end if there is no mate info or nRP+uRP<minRP
+		return if (($nRP<$minRP)&&($unmated_pairs > ($num_unmapped_pairs+$nRP)));
+
+		##################################################################################
+		# Find out if mates have nearby soft-clips (to refine the breakpoints)
+		##################################################################################
+		#Look for evidence of soft-clipping near mate
+		my @mate_soft_arr = ();
+		my $mate_start = 0;
+		my $mate_soft = "";
+
+		@mate_info_arr = split(/\t/, $mate_info);
+
+		#### mate start and end locations.
+		my $filename = $right_sc;
+
+		$start = $mate_info_arr[1] - $dist_To_Soft;
+		$end = $mate_info_arr[1];
+
+		if ($side =~ /right/) {
+			$start = $mate_info_arr[2];
+			$end = $mate_info_arr[2] + $dist_To_Soft;
+
+			$filename = $left_sc;
+		}
+
+		#### add Levenshtein distance to Summary
+	#print "Calc distal Levd\n";
+		$levD->search(rev_comp($plus_Reads[3]), $mate_info_arr[0], $start, $end, $INPUT_FASTA);
+		$distl_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	$distl_levD = "NA" if($plus_Reads[3] =~ /NA/);
+	#If there are no softclips to string match, then give 0 as quality value
+       if ($plus_Reads[3] !~ /NA/){
+			$QUAL=1/($distl_levD + 0.001);
+		}
+		else	{
+			$QUAL=0;
+		};
+	$QUAL=sprintf("%.2f",$QUAL);
+	#### looking for softclips to refine break point
+	#### if left look in right and vice-versa.
+	$cmd = qq{echo -e "$mate_info_arr[0]\t$start\t$end"};
+	$cmd .= qq{ | awk -F'\t' 'NF==3' | intersectBed -a stdin -b $filename | head -1};
+
+	$mate_soft = `$cmd`;
+
+	$mate_soft =~ s/\n//g;
+	@mate_soft_arr = split(/\s/, $mate_soft);
+my $NO_MATE_SC="";
+	if(@mate_soft_arr){
+		$mate_chr = $mate_soft_arr[0];
+		$mate_start = $mate_soft_arr[1];
+	} else{
+		#### no soft clips near the mate: fall back to the approximate mate position
+		@mate_info_arr = split(/\s/,$mate_info);
+		$mate_chr = $mate_info_arr[0];
+		$mate_start = $mate_info_arr[1];
+		$NO_MATE_SC="APPROXIMATE";
+	}
+
+	#end if there is no mate info
+	return if ($mate_chr eq "");
+	#end if there is no mate info and !disable_RP_only
+	return if (($lSC =~/NA/)&&(!$disable_RP_only));
+	
+	
+	###########################################################################
+	##### Printing output
+
+	#########################################
+	# Get DNA info
+	#########################################
+	#print "PLUS_READS=$plus_Reads[0],$plus_Reads[1]\nMATE=$mate_chr,$mate_start,$INPUT_FASTA\n";
+	$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+
+	### this is alt ref. for softclip its $plus_Reads[3]
+	$REF2_base = getSeq($mate_chr, $mate_start, $INPUT_FASTA);
+
+	#########################################
+	# print in VCF format
+	#########################################
+
+	#### abs value to account for left and right reads.
+	$isize = abs($plus_Reads[1]-$mate_start);
+	
+	my $event_type=$type;
+	$event_type=~ s/,|[0-9]//g;
+	$INFO_1=join(";", "$sv_type", "EVENT=$event_type", "ISIZE=$isize","MATE_ID=$BND2_name");
+	$INFO_2=join(";", "$sv_type", "EVENT=$event_type", "ISIZE=$isize","MATE_ID=$BND1_name");
+
+	#### remove any white spaces.
+	#### ask: did you mean to remove space from ends? eg. trim()
+	$INFO_1=~s/\s//g;
+	$INFO_2=~s/\s//g;
+
+	$FORMAT=$summary; 
+ 	$FORMAT=~ s/,|[0-9]//g;
+        $FORMAT .= ":lSC:nSC:uRP:distl_levD";
+	if($NO_MATE_SC){$INFO_2 .= ";NO_MATE_SC"}
+	my $SAMPLE="0/1:";	
+	$SAMPLE .=$summary;
+#        if($NO_MATE_SC){$SAMPLE.= ":$NO_MATE_SC"}
+
+	$SAMPLE=~s/[A-Z|,|_]//g;
+        my $MATE_SAMPLE=$SAMPLE;
+        $SAMPLE .= ":$lSC:$nSC:$num_unmapped_pairs:$distl_levD";
+	$MATE_SAMPLE .=":NA:NA:NA:NA";
+	$SAMPLE=~s/::/:/g;
+	$MATE_SAMPLE=~s/::/:/g;
+ 
+	if($type !~ /INV/){
+		$ALT_1 = join("","]",$mate_chr,":",$mate_start,"]",$REF1_base);
+		$ALT_2 = join("",$REF2_base,"[",$plus_Reads[0],":",$plus_Reads[1],"[");
+		#		2      321682 bnd_V  T   ]13:123456]T  6    PASS SVTYPE=BND
+		#		13     123456 bnd_U  C   C[2:321682[   6    PASS SVTYPE=BND
+	} else {
+		$ALT_1 = join("", "]", $plus_Reads[0], ":", $plus_Reads[1], "]", $REF2_base);
+		$ALT_2 = join("", $REF1_base, "[", $mate_chr, ":", $mate_start, "[");
+	}
+
+	if(($mate_chr) && ($plus_Reads[0])){
+#		print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $ALT_1, $QUAL,"PASS", $INFO_1, $FORMAT,$SAMPLE,"\n");
+#		print OUT join ("\t", $mate_chr, $mate_start, $BND2_name, $REF2_base, $ALT_2, $QUAL, "PASS", $INFO_2, $FORMAT,$MATE_SAMPLE,"\n");
+		my $var=join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $ALT_1, $QUAL,"PASS", $INFO_1, $FORMAT,$SAMPLE);
+		return $var;		
+	}
+}
+
+###############################################################################
+###############################################################################
+sub get_counts_n_info {
+        my ($event, $side, $mapQ, $file, $dist, $target, $upL, $lwL) = @_;
+
+        my $mate_info = "";
+        my $cmd = "";
+
+        if ($event =~ /^CTX$/i) {
+                #print "CTX side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{ samtools view $new_blacklist -q $mapQ -f 16 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^DEL$/i) {
+                #print "DEL side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -F 1568 -f 16 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"} {if((\$7 ~ /=/)&&(\$9<-$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^INS$/i) {
+                #print "INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<$lwL && \$9 > 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq {samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>-$lwL && \$9 < 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^INV$/i) {
+                #print "INV side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -F 1596 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 48 -F 1548 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^TDUP$/i) {
+                #print "TDUP side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+#			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4>\$8)&&(\$9<0)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+#                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<-$upL )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4<\$8)&&(\$9>0)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^NOV_INS$/i) {
+                #print "NOV_INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 8 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 24 -F 1536 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        }
+
+        $mate_info=~s/\n//g;
+        my @tmp=split(/\t/, $mate_info);
+
+        my $counts = 0;
+
+        if (defined $tmp[3]) {
+                $tmp[3] =~ s/\n//g;
+
+                $counts = $tmp[3] if (length($tmp[3]));
+        }
+        return ({count=>$counts, info=>$mate_info});
+}
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/SoftSearch.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,1192 @@
+#!/usr/bin/perl
+
+####
+#### Usage: SoftSearch.pl [-lqrmsd] -b <BAM> -f <Genome.fa> -sam <samtools path> -bed <bedtools path>
+#### Created 1-30-2012 by Steven Hart, PhD
+#### hart.steven@mayo.edu
+#### Requires bedtools & samtools to be in path
+
+
+use lib "/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib" ;
+
+use Getopt::Long;
+use strict;
+use warnings;
+#use Data::Dumper;
+use LevD;
+use File::Basename;
+
+my ($INPUT_BAM,$INPUT_FASTA,$OUTPUT_FILE,$minSoft,$minSoftReads,$dist_To_Soft,$bedtools,$samtools);
+my ($minRP, $temp_output, $num_sd, $MapQ, $chrom, $unmated_pairs, $minBQ, $pair_only, $disable_RP_only);
+my ($levD_local_threshold, $levD_distl_threshold,$pe_upper_limit,$high_qual,$sv_only,$blacklist,$genome_file,$verbose);
+
+my $cmd = "";
+
+#Declare variables
+GetOptions(
+	'b=s' => \$INPUT_BAM,
+	'f=s' => \$INPUT_FASTA,
+	'o:s' => \$OUTPUT_FILE,
+	'm:i' => \$minRP,
+	'l:i' => \$minSoft,
+	'r:i' => \$minSoftReads,
+	't:s' => \$temp_output,
+	's:s' => \$num_sd,
+	'd:i' => \$dist_To_Soft,
+	'q:i' => \$MapQ,
+	'c:s' => \$chrom,
+	'u:s' => \$unmated_pairs,
+	'x:s' => \$minBQ,
+	'p' => \$pair_only,
+	'g' => \$disable_RP_only,
+	'j:s' => \$levD_local_threshold,
+	'k:s' => \$levD_distl_threshold,
+        'a:s' => \$pe_upper_limit,
+        'e:s' => \$high_qual,
+	'L' => \$sv_only,
+	'v' => \$verbose, 
+	'blacklist:s' => \$blacklist,
+	'genome_file:s' => \$genome_file,
+	"help|h|?"	=> \&usage);
+
+unless($sv_only){$sv_only=""};
+if(!defined($INPUT_BAM)){print "Where is the BAM file?\n\n";usage()}
+if(!defined($INPUT_FASTA)){print "Where is the fasta file?\n\n";usage()}
+my ($fn,$pathname) = fileparse($INPUT_BAM,".bam");
+my $index=`ls $pathname/$fn*bai|head -1`;
+#my $index =`ls \${INPUT_BAM%.bam}*bai`;
+#print "INDEX=$index\n";
+if(!$index){die "\n\nERROR: you need to index your BAM file\n\n"}
+
+### get current time
+print "Start Time : " . &spGetCurDateTime() . "\n";
+my $now = time;
+
+#if(defined($OUTPUT_FILE)){$OUTPUT_FILE=$OUTPUT_FILE} else {$OUTPUT_FILE="output.vcf"; print "\nNo outfile specified.  Using output.vcf as default\n\n"}
+$minSoft=5 unless defined($minSoft);
+$minRP=5 unless defined($minRP);
+$minSoftReads=5 unless defined($minSoftReads);
+$dist_To_Soft=300 unless defined($dist_To_Soft);
+$num_sd=6 unless defined($num_sd);
+$MapQ=20 unless defined($MapQ);
+
+unless (defined $pe_upper_limit) { $pe_upper_limit = 10000; }
+unless (defined $levD_local_threshold) { $levD_local_threshold = 0.05; }
+unless (defined $levD_distl_threshold) { $levD_distl_threshold = 0.05; }
+#Get sample name if available
+my $SAMPLE_NAME="";
+my $OUTNAME ="";
+$SAMPLE_NAME=`samtools view -f2 -H $INPUT_BAM|awk '{if(\$1~/^\@RG/){sub("ID:","",\$2);name=\$2;print name}}'|head -1`;
+$SAMPLE_NAME=~s/\n//g;
+if (!$OUTPUT_FILE){
+	if($SAMPLE_NAME ne ""){$OUTNAME=$SAMPLE_NAME.".vcf"}
+	else {$OUTNAME="output.vcf"}
+}
+else{$OUTNAME=$OUTPUT_FILE}
+
+print "Writing results to $OUTNAME\n";
+
+
+##Make sure if submitting on SGE, to prepend the "chr".  Not all reference FASTA files require "chr", so we shouldn't force the issue.
+if(!defined($chrom)){$chrom=""}
+if(!defined($unmated_pairs)){$unmated_pairs=0}
+
+my $badQualValue=chr($MapQ);
+if(defined($minBQ)){ $badQualValue=chr($minBQ); }
+
+if($badQualValue  eq "#"){$badQualValue="\#"}
+
+# adding and checking for samtools and bedtools in the PATH
+## check for bedtools and samtools in the path
+$bedtools=`which intersectBed` ;
+if(!$bedtools){die "\nError:\n\tno bedtools. Please install bedtools and add to the path\n";}
+#$samtools=`samtools 2>&1`;
+$samtools=`which samtools`;
+if($samtools !~ /(samtools)/i){die "\nError:\n\tno samtools. Please install samtools and add to the path\n";}
+
+print "Usage = SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -s $num_sd -c $chrom -b $INPUT_BAM -f $INPUT_FASTA -o $OUTNAME \n\n";
+sub usage {
+	print "\nusage: SoftSearch.pl [-cqlrmsd] -b <BAM> -f <Genome.fa> \n";
+	print "\t-q\t\tMinimum mapping quality [20]\n";
+	print "\t-l\t\tMinimum length of soft-clipped segment [5]\n";
+	print "\t-r\t\tMinimum depth of soft-clipped reads at position [5]\n";
+	print "\t-m\t\tMinimum number of discordant read pairs [5]\n";
+	print "\t-s\t\tNumber of sd away from mean to be considered discordant [6]\n";
+	print "\t-u\t\tNumber of unmated pairs [0]\n";
+	print "\t-d\t\tMax distance between soft-clipped segments and discordant read pairs [300]\n";
+	print "\t-o\t\tOutput file name [output.vcf]\n";
+	print "\t-t\t\tPrint temp files for debugging [no|yes]\n";
+	print "\t-c\t\tuse only this chrom or chr:pos1-pos2\n";
+	print "\t-p\t\tuse paired-end mode only. In other words, don't try to find soft-clipping events!\n";
+	print "\t-g\t\tEnable paired-only search. This will look for discordant read pairs even without soft clips.\n";
+        print "\t-a\t\tset the minimum distance for a discordant read pair without soft-clipping info [10000]\n";
+        print "\t-L\t\tFlag to print out even small deletions (low quality)\n";
+        print "\t-e\t\tdisable strict quality filtering of base qualities in soft-clipped reads [no]\n";
+        print "\t-blacklist\tareas of the genome to skip calling.  Requires -genome_file\n";
+        print "\t-genome_file\ttab separated value of chromosome name and length.  Only used with -blacklist option\n\n";
+
+	exit 1;
+	}
+
+
+#############################################################
+# create temporary variable name
+#############################################################
+srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
+our $random_name=join "", map { ("a".."z")[rand 26] } 1..8;
+
+#############################################################
+## create green list
+##############################################################
+#
+my $new_blacklist="";
+if($blacklist){
+        if(!$genome_file){die "if using a blacklist, you must also specify the location of a genome_file
+        The format of the genome_file should be
+                chrom   size
+                chr1    249250621
+                chr2    243199373
+                ...
+
+        If using hg19, you can get the genome file by
+                mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \"select chrom, size from hg19.chromInfo\"  > hg19.genome";}
+        
+	$cmd=join("","complementBed -i $blacklist -g $genome_file >",$random_name,".bed") ;
+	system ($cmd);
+	$new_blacklist=join(""," -L ",$random_name,".bed ");
+	}
+
+if($verbose){print "CMD=$cmd\nBlacklist is $new_blacklist\n";}
+
+
+
+
+
+#############################################################
+# Calculate insert size distribution of properly mated reads
+#############################################################
+
+#Change for compatibility with other operating systems
+#my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|head -10000|cut -f9|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)**2)}'`;
+
+my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|head -10000|cut -f9|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)^2)}'`;
+#my ($mean,$stdev)=split(/ /,$metrics);
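+my ($mean,$stdev)=split(/\s/,$metrics);
+#Stop before computing the limits if no insert sizes could be read
+die "Could not calculate the insert size distribution from $INPUT_BAM\n" if (!$mean);
+$stdev=~s/\n//;
+my $upper_limit=int($mean+($num_sd*$stdev));
+my $lower_limit=int($mean-($num_sd*$stdev));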
+print qq{The mean insert size is $mean +/- $stdev (sd)
+The upper limit = $upper_limit
+The lower limit = $lower_limit\n
+};
+if($lower_limit<0){
+	print "Warning!! Given this insert size distribution, we cannot call small indels.  No other data will be affected\n";
+	$lower_limit=1;
+}
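+
+#############################################################
+# Sketch only: the awk one-liner above could also be written in plain Perl.
+# insert_size_stats() is a hypothetical helper (it is not called anywhere in
+# this script). It takes a list of insert sizes (SAM column 9) and returns the
+# mean and the population standard deviation of their absolute values, which
+# is the same quantity the awk END block prints.
+#############################################################
+sub insert_size_stats {
+	my @isizes = map { abs($_) } @_;
+	return (0, 0) if (!@isizes);
+	my ($sum, $sumsq) = (0, 0);
+	foreach my $i (@isizes) { $sum += $i; $sumsq += $i*$i; }
+	my $n = scalar(@isizes);
+	my $avg = $sum/$n;
+	my $var = $sumsq/$n - $avg*$avg;
+	$var = 0 if ($var < 0);	# guard against floating point rounding
+	return ($avg, sqrt($var));
+}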
+my $tmp_name=join ("",$random_name,".tmp.bam");
+my $random_file_sc = "";
+my $command = "";
+
+#############################################################
+# Make sam file that has soft clipped reads
+#############################################################
+#give file a name
+if(!defined($pair_only)){
+	$random_file_sc=join ("",$random_name,".sc.sam");
+	$command=join ("","samtools view -q $MapQ -F 1024 $INPUT_BAM $chrom $new_blacklist| awk '{OFS=\"\\t\"}{c=0;if(\$6~/S/){++c};if(c == 1){print}}' | perl -ane '\$TR=(\@F[10]=~tr/\#//);if(\$TR<2){print}' > ", $random_file_sc);
+
+	print "Making SAM file of soft-clipped reads\n";
+if($verbose){	print "$command\n";}
+	system("$command");
+
+	#############################################################
+	# Find areas that have deep enough soft-clip coverage
+	print "Identifying soft-clipped regions that are at least $minSoft bp long \n";
+	open (FILE,"$random_file_sc")||die "Can't open soft-clipped sam file $random_file_sc\n";
+
+	my $tmpfile=join("",$random_file_sc,".sc.passfilter");
+	open (OUT,">$tmpfile")||die "Can't write files here!\n";
+
+	while(<FILE>){
+		@_ = split(/\t/, $_);
+		#### parse CIGAR string and create a hash of array of each operation
+		my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+		my $hash;
+		map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+		#for ($i=0; $i<=$#softclip_pos; $i++)	{
+		foreach my $softclip (@{$hash->{S}}) {
+			#if	($CIGAR[$softclip_pos[$i]] > $minSoft){
+			if	($softclip > $minSoft){
+				###############Make sure base qualities don't have more than 2 bad marks
+				my $qual=$_[10];
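+				#tr/// does not interpolate variables, so count the bad quality character with a quoted regex
+				my $TR=0;
+				if(defined($badQualValue) && length($badQualValue)){ $TR++ while ($qual=~/\Q$badQualValue\E/g); }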
+				#Skip the soft clip if there is more than 2 bad qual values
+				#next if($TR > 2);
+#				if (!$high_qual){next if($TR > 2);}
+				print OUT;
+				last;
+			}
+		}
+	}
+	close FILE;
+	close OUT;
+
+	$command=join(" ","mv",$tmpfile,$random_file_sc);
+if($verbose){	print "$command\n";}
+	system("$command");
+}
+
+#########################################################
+#Stack up SoftClips
+#########################################################
+my $random_file=join("",$random_name,".sc.direction.bed");
+if(!defined($pair_only)){
+        open (FILE,"$random_file_sc")|| die "Can't open sam file\n";
+        #$random_file=join("",$random_name,".sc.direction");
+
+        print "Calling sides of soft-clips\n";
+        #\nTMPOUT=$random_file\tINPUT=$random_file_sc\n\n";
+        open (TMPOUT,">$random_file")|| die "Can't create tmp file\n";
+
+        while (<FILE>){
+                @_ = split(/\t/, $_);
+                #### parse CIGAR string and create a hash of array of each operation
+                my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+                my $hash;
+                map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+                #### next if softclips on each end
+                next if ($_[5] =~ /^[0-9]+S.*S$/);
+
+                #### skip if the softclip occurs in the middle of the read
+                next if ($_[5] =~ /^[0-9]+[^S][0-9].*S.+$/);
+
+                my $softclip = $hash->{S}[0];
+
+                my $end1 = 0;
+                my $end2 = 0;
+                my $softBases = "";
+		my $right_corrected="";my $left_corrected="";
+
+        if ($softclip > $minSoft) {
+		
+                        ####If the soft clip occurs at end of read and its on the minus strand, then it's a right clip
+                        if ($_[5] =~ /^.*S$/) {
+                                $end1=$_[3]+length($_[9])-$softclip-1;
+                                $end2=$end1+1;
+next if ($end1<0);
+                                #RIGHT clip on Minus
+                                $softBases=substr($_[9], length($_[9])-$softclip, length($_[9]));
+                                #Right clips don't always get clipped correctly, so fix that
+                                # Check to see if sc base matches ref
+                                $right_corrected=baseCheck($_[2],$end2,"right",$softBases);
+                               print TMPOUT "$right_corrected\n"
+
+                        } else {
+                                #### Begins with S (left clip)
+                                $end1=$_[3]-$softclip;
+next if ($end1<0);
+
+                                $softBases=substr($_[9], 0,$softclip);#print "TMP=$softBases\n";
+        			$left_corrected=baseCheck($_[2],$end1,"left",$softBases);
+if(!$left_corrected){print "baseCheck($_[2],$end1,left,$softBases)\n";next}
+                               print TMPOUT "$left_corrected\n";
+#print "\nSEQ=$_[9]\t\n";
+
+                        }
+        }
+  }
+close FILE;
+close TMPOUT;
+}
+sub baseCheck{
+        my ($chrom,$pos,$direction,$softBases)=@_;
+        #skip if position is less than 0, which is caused by MT DNA
+        return if ($pos<0);
+        my $exit="";
+
+        while(!$exit){
+        if($direction=~/right/){
+                        my $refBase=getSeq($chrom,$pos,$INPUT_FASTA);
+                        my $softBase=substr($softBases,0,1);
+                        if ($softBase !~ /$refBase/){
+                                my $value=join("\t",$chrom,$pos,$pos+1,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos+1;
+                                $softBases=substr($softBases, 1,length($softBases));
+                        }
+         }
+        else{
+                        my $refBase=getSeq($chrom,$pos+1,$INPUT_FASTA);
+                        my $softBase=substr($softBases,-1,1);
+                        if ($softBase !~ /$refBase/){
+                                $pos=$pos-1+length($softBases);
+                                my $value=join("\t",$chrom,$pos-1,$pos,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos-1;
+                                $softBases=substr($softBases, 0, -1);
+                                #print "Trying again $softBases\n";
+                       }
+
+        }
+
+}
+}
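+
+#############################################################
+# Sketch only: cigar_ops() is a hypothetical helper (it is not called anywhere
+# in this script) that shows the CIGAR handling used in the loops above as a
+# standalone function. It returns a hash reference mapping each operation
+# letter to the list of its lengths, in order of appearance. Unlike the inline
+# version it also accepts the "=" operation, which is an assumption about the
+# input rather than something this script requires.
+#############################################################
+sub cigar_ops {
+	my ($cigar) = @_;
+	my %ops;
+	while ($cigar =~ /(\d+)([MIDNSHPX=])/g) {
+		push(@{$ops{$2}}, $1);
+	}
+	return \%ops;
+}
+# Example: cigar_ops("5S90M3S")->{S} holds the soft-clip lengths 5 and 3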
+#Remove SAM files to conserve space
+unlink($random_file_sc);
+
+
+my $random_file_disc="$INPUT_BAM";
+###
+#
+######################################################
+# Transform Read pair groups into softclip equivalents
+######################################################
+#
+#
+#
+my $v="";
+#if($disable_RP_only){
+print "Running Bam2pair.pl\n";
+print "Looking for discordant read pairs without requiring soft-clipping information\n";
+	use FindBin qw($Bin);
+	my $path=$Bin;
+#	print"\n\nPATH=$path\n\n";
+if($verbose){$v="-v"}
+	my $tmp_out=join("",$random_name,"RP.out");
+	$command=join("","perl ",$path,"/Bam2pair.pl -b $random_file_disc  -o $tmp_out -isize $pe_upper_limit -winsize $dist_To_Soft -min $minRP -chrom $chrom -prefix $random_name -q $MapQ -blacklist $random_name.bed $v");
+if($verbose){	print "$command\n"};
+	system("$command");
+	$command=join("","perl -ane '\$end1=\@F[1];\$end2=\@F[3];print join(\"\\t\",\@F[0..1],\$end1,\"unknown|left\");print \"\\n\";print join(\"\\t\",\@F[2..3],\$end2,\"unknown|left\");print \"\\n\"' ", $tmp_out," >> ",$random_file);
+if($verbose){print "$command\n"};
+	system($command);
+	unlink($tmp_out);
+#}
+#
+
+
+######################################################
+unlink("$random_file","$tmp_name","$random_file","$index","$random_name","$new_blacklist") if (-z $random_file || ! -e $random_file ) ;
+if (-z $random_file || ! -e $random_file){
+	print "Softclipped file is empty($random_file).\nNo soft clipping found using desired parameters\n\n";
+	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+	}
+
+
+#############################################################
+#  Make sure there are enough soft-clipped supporting reads
+#############################################################
+my $outfile=join("",$random_file,".sc.merge.bed");
+#sortbed -i .sc.direction | mergeBed -nms -d 25 -i stdin > .sc.merge.bed
+$command=join(" ","sortBed -i",$random_file," | mergeBed  -nms -i stdin","|egrep \";|,\"","|awk '{OFS=\"\t\"}(NF==4)'",">",$outfile);
+
+print "$command\n" if ($verbose);
+system("$command");
+
+if (-z $outfile || ! -e $outfile){
+	unlink("$tmp_name","$random_file","$outfile","$index","$random_name","$new_blacklist"); 
+	print "mergeBed file is empty.\nNo structural variants found\n\n" ;
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed mergeBed\n";
+
+###############################################################
+# If left and right are on the same line, make into 2 lines
+###############################################################
+open (INFILE,$outfile)||die "couldn't open temp file : $! \n\n";
+my $tmp2=join("",$random_name,".sc.fixed.merge.bed");
+#print "INFILE=$outfile\tOUTFILE=$tmp2\n\n";
+#INPUT FORMAT=chr9\t131467\t131473\tATGCTTATTAAAA|left;TTATTAAAAGCATA|left
+open (OUTFILE,">$tmp2")||die "couldn't create temp file : $! \n\n";
+while(<INFILE>){
+	chomp $_;
+	my $l = $_;
+
+	my @a = split(/\t/, $l);
+	my $info = $a[3];
+	my @info_arr = split(/\;/, $info);
+	my @left_arr=();
+	my @right_arr=();
+	@left_arr = grep(/left/, @info_arr);
+	@right_arr = grep(/right/, @info_arr);
+
+	#New
+	my $left = join(";", @left_arr);
+	my $right = join(";", @right_arr);
+	$info = join(";", @info_arr);
+
+	if((@left_arr) && (@right_arr)){
+		print OUTFILE "$a[0]\t$a[1]\t$a[2]\t$left\n$a[0]\t$a[1]\t$a[2]\t$right\n";
+	} else{
+		my $all=join("\t",@a[0..2],$info);
+		print OUTFILE "$all\n";
+	}
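+}
+#Close (and flush) the handles so the shell commands below see the complete file
+close INFILE;
+close OUTFILE;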
+
+# make sure output file name is $outfile
+$command=join(" ","sed -e '/ /s//\t/g'", $tmp2,"|awk 'BEGIN{OFS=\"\\t\"}(NF==4)'", "|perl -pne 's/ /\t/g'>",$outfile);
+system("$command");
+if($verbose){print "$command\n"};
+unlink("$tmp_name","$random_file","$tmp2","$outfile","$index","$random_name","$new_blacklist") if (-z $outfile || ! -e $outfile) ;
+ if (-z $outfile || ! -e $outfile){
+	print "Fixed mergeBed file is empty($outfile).\nNo structural variants found\n\n";
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed fixing mergeBed\n\n";
+
+###############################################################
+# Separate directions of soft clips
+###############################################################
+my $left_sc = join("", "left", $tmp2);
+my $right_sc = join("", "right", $tmp2);
+use FindBin qw($Bin);
+#my $path=$Bin;
+
+$command=join("","grep left ", $tmp2, " |sed -e '/left /s//left\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$left_sc);
+system("$command");
+#print "$command\n";
+$command=join("","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$right_sc);
+#$command=join(" ","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g' >",$right_sc);
+system("$command");
+#print "$command\n";
+#die "CHECK $right_sc\n";
+
+###############################################################
+# Count the number and identify directions of soft clips
+###############################################################
+print "Count the number and identify directions of soft clips\n";
+#print "looking in $outfile\n";
+$outfile=join("",$random_name,".sc.fixed.merge.bed");
+
+open (INFILE,$outfile)||die "couldn't open temp file\n\n";
+my $tmp3 = join("", $random_file, "predSV");
+open (OUTFILE, ">$tmp3")||die "couldn't create temp file\n\n";
+while(<INFILE>){
+chomp;
+	@_=split(/\t/,$_);
+	my $count=tr/\;//;$count+=tr/\,//;
+	$count=$count+1;
+	my $left=0;
+	my $right=0;
+
+	while ($_ =~ /left/g) { $left++ } # count number of left clips
+	while ($_ =~ /right/g) { $right++ } # count number of right clips
+
+	###############################################################
+	if ($count >= $minSoftReads){
+		####get longest soft-clipped read
+		my @clips=split(/\;|,|\|/,$_[3]);
+
+		my ($max, $temp, $temp2, $temp3, $dir, $maxSclip) = (0) x 6;
+		for (my $i=0; $i<$count; $i++) {
+			my $plus1=$i+1;
+			$temp=length($clips[$i]);
+			$temp2=$clips[$plus1];
+			$temp3=$clips[$i];
+
+			if ($temp > $max){
+				$maxSclip=$temp3;
+				$max =$temp;
+				$dir=$temp2;
+			} else {
+				$max=$max;
+				$dir=$dir;
+				$maxSclip=$maxSclip;
+			}
+			$i++;
+		}
+		my $order2 = join("|", $left, $right);
+        #print join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+		print OUTFILE join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+	} elsif($_=~/unknown/){
+	print OUTFILE join ("\t",@_[0..2],"NA","NA","left","NA","NA|NA") . "\n";
+        print OUTFILE join ("\t",@_[0..2],"NA","NA","right","NA","NA|NA") . "\n";
+	}
+	####Format is Chrom,start, end,longest Soft-clip,length of longest Soft-clip, direction of longest soft-clip,#supporting softclips,#left Sclips|#right Sclips
+}
+close INFILE;
+close OUTFILE;
+
+unlink("$tmp2","$tmp_name","$random_file","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$new_blacklist") if (-z $tmp3 || !-e $tmp3) ;
+
+ if (-z $tmp3 || !-e $tmp3){
+	print "No structural variants found while Counting the number and identify directions of soft clips.\n" ;
+
+	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+	&print_header();
+	close OUT;
+	exit;
+
+}
+
+print "Done counting Softclipped reads\n";
+###############################################################
+#### Print header information
+###############################################################
+open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+&print_header();
+close OUT;
+
+###############################################################
+###############################################################
+#### DO the bulk of the work
+###############################################################
+use List::Util qw(min max);
+open (FILE,"$tmp3")|| die "Can't open file\n";
+open (OUT,">>$OUTNAME")|| die "Can't open file\n";
+
+#print "\nusing $tmp3 and writing to $OUTPUT_FILE \n";
+while (<FILE>){
+	#If left clip {+- or -- or -+ }{+- are uninformative b/c they go upstream}
+	#If right clip {++ or -- or +-}
+	chomp $_;
+	my $line = $_;
+	my @info = split(/\t/, $_);
+
+	if($info[5] eq "left") {
+		bulk_work("left", $line, $random_file_disc);
+
+	} elsif ($info[5] eq "right") {
+		bulk_work("right", $line, $random_file_disc);
+	}
+#if($. ==6){print "THIS IS LINE 6\n$_\n";die}
+print "Completed line $.\n" if ($verbose);
+}
+close FILE;
+close OUT;
+
+###############################################################################
+###############################################################################
+#### Delete temp files
+my $mergedBed=join("",$random_name,".sc.direction.bed.sc.merge.bed");
+
+if(!defined($temp_output)){$temp_output="no"}
+
+if ($temp_output eq "no"){
+	unlink("$tmp_name","$random_file","$tmp2","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$mergedBed","$random_name.bed");
+}
+####Sort VCF
+my $tmp=join(".",$random_name,"tmp");
+#Get header
+$cmd="grep \"#\" $OUTNAME > $tmp";
+system($cmd);
+#sort results
+$cmd="grep -v \"#\" $OUTNAME|perl -pne 's/chr//'|sort -k1,1n -k2,2n|perl -ne 'print \"chr\".\$_' >>$tmp";
+system($cmd);
+$cmd="mv $tmp $OUTNAME";
+system($cmd);
+#remove entries next to each other
+
+
+
+
+#############################################################
+##May not need this anymore since filtering on left and right
+#############################################################
+#my $tmpout=$OUTNAME;
+#$tmpout.=".tmp";
+#use FindBin qw($Bin);
+##my $path=$Bin;
+#$command="perl ".$path."/Extract_nSC.pl $OUTNAME -q nSC > $tmpout";
+##print "Command=$command\n";
+#system($command);
+#$command="perl ".$path."/reduce_redundancy.pl $tmpout $upper_limit |cut -f1-10 > $OUTNAME";
+##print "$command\n";
+#system($command);
+#system("rm $tmpout");
+########################################################
+
+
+
+
+print "Analysis Completed\n\nYou did it!!!\n";
+print "Finish Time : " . &spGetCurDateTime() . "\n";
+$now = time - $now;
+printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600), int(($now % 3600) / 60),
+int($now % 60));
+
+exit;
+
+###############################################################################
+sub rev_comp {
+  my $dna = shift;
+  my $revcomp = reverse($dna);
+  $revcomp =~ tr/ACGTacgt/TGCAtgca/;
+
+  return $revcomp;
+}
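+
+###############################################################################
+#### Sketch only: rev_comp() above leaves N and the other IUPAC ambiguity
+#### codes unchanged. rev_comp_iupac() is a hypothetical variant (it is not
+#### called anywhere in this script) that also complements those codes,
+#### assuming the standard pairs R<->Y, K<->M, B<->V and D<->H, with S, W and N
+#### being their own complements.
+sub rev_comp_iupac {
+  my $dna = shift;
+  my $revcomp = reverse($dna);
+  $revcomp =~ tr/ACGTRYKMBVDHacgtrykmbvdh/TGCAYRMKVBHDtgcayrmkvbhd/;
+
+  return $revcomp;
+}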
+
+
+###############################################################################
+#### to get reference base
+sub getSeq{
+	my ($chr,$pos,$fasta)=@_;
+	#don't require chr
+	#if($chr !~ /^chr/){die "$chr is not correct\n";}
+#	die "$pos is not a number\n" if ($pos <0);
+my @result=();
+        if ($pos <0){print "$pos is not a valid position (likely caused by circular MT chromosome)\n";return;}
+
+	@result = `samtools faidx $fasta $chr:$pos-$pos`;
+	if($result[1]){chomp($result[1]);
+	return uc($result[1]);
+	}
+	return("NA");
+	#### after return will not be printed
+	####print "RESULTS=@result\n";
+}
+
+sub getBases{
+        my ($chr,$pos1,$pos2,$fasta)=@_;
+        #don't require chr
+        #if($chr !~ /^chr/){die "$chr is not correct\n";}
+my @result=();
+        if ($pos1 <0){print "$pos1 is not a valid position (likely caused by circular MT chromosome)\n";return;};
+
+        @result = `samtools faidx $fasta $chr:$pos1-$pos2`;
+	if(!$result[1]){$result[1]="NA"};
+        chomp($result[1]);
+        return uc($result[1]);
+
+        #### after return will not be printed
+        ####print "RESULTS=@result\n";
+}
+###############################################################################
+#### to get time
+sub spGetCurDateTime {
+	my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
+	my $curDateTime = sprintf "%4d-%02d-%02d %02d:%02d:%02d",
+	$year+1900, $mon+1, $mday, $hour, $min, $sec;
+	return ($curDateTime);
+}
+
+
+###############################################################################
+#### print header
+sub print_header {
+	my $date=&spGetCurDateTime();
+	my $header = qq{##fileformat=VCFv4.1
+##fileDate=$date
+##source=SoftSearch.pl
+##reference=$INPUT_FASTA
+##Usage= SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -u $unmated_pairs -s $num_sd -b $INPUT_BAM -f $INPUT_FASTA -o $OUTNAME
+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
+##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
+##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
+##INFO=<ID=ISIZE,Number=.,Type=String,Description="Size of the SV">
+##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
+##FORMAT=<ID=lSC,Number=1,Type=Integer,Description="Length of the longest soft clips supporting the BND">
+##FORMAT=<ID=nSC,Number=1,Type=Integer,Description="Number of supporting soft-clips">
+##FORMAT=<ID=uRP,Number=1,Type=Integer,Description="Number of unmated read pairs nearby Soft-Clips">
+##FORMAT=<ID=levD_local,Number=1,Type=Float,Description="Levenshtein distance between soft-clipped bases and the area around the original soft-clipped site">
+##FORMAT=<ID=levD_distl,Number=1,Type=Float,Description="Levenshtein distance between the soft-clipped bases and mate location">
+##FORMAT=<ID=CTX,Number=1,Type=Integer,Description="Number of chromosomal translocations">
+##FORMAT=<ID=DEL,Number=1,Type=Integer,Description="Number of reads supporting Large Deletions">
+##FORMAT=<ID=INS,Number=1,Type=Integer,Description="Number of reads supporting Large insertions">
+##FORMAT=<ID=NOV_INS,Number=1,Type=Integer,Description="Number of reads supporting novel sequence insertion">
+##FORMAT=<ID=TDUP,Number=1,Type=Integer,Description="Number of reads supporting a tandem duplication">
+##FORMAT=<ID=INV,Number=1,Type=Integer,Description="Number of reads supporting inversions">
+##FORMAT=<ID=sDEL,Number=1,Type=Integer,Description="Number of reads supporting small deletions">
+##INFO=<ID=NO_MATE_SC,Number=0,Type=Flag,Description="When there is no softclipping of the mate read location, an approximate position is used">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Dummy value for maintaining VCF-Spec">
+#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t$SAMPLE_NAME\n};
+
+	print OUT $header;
+}
+
+
+###############################################################################
+sub bulk_work {
+print "#####################################@_\n" if ($verbose);
+	my ($side, $line, $file) = @_;
+	my $local_levD = 0;
+	my $distl_levD = 0;
+
+	#my @info = split(/\t/, $line);
+	my @plus_Reads = split(/\t/, $line);
+	$plus_Reads[7] =~ s/\n//g;
+
+	#### length of the longest soft clip and number of supporting soft clips.
+	my $lSC = $plus_Reads[4];
+	my $nSC = $plus_Reads[6];
+
+
+	#Get all types of compatible reads
+	#Get improperly paired reads (@ max distance)
+
+	#### default value for left SIDE.
+	#If left-clip, then look downstream for match of softclipped reads to define a deletion, but look for DRPs upstream
+	my $sv_type = "SVTYPE=BND";
+	my $start_local=0; my $end_local=0;my $target_local="";my $target_drp="";my $start_drp="";my $end_drp="";
+	if ($side =~ /left/) {
+		$start_local = $plus_Reads[1]-$dist_To_Soft;
+		$end_local = $plus_Reads[2];
+                $start_drp = $plus_Reads[1];
+                $end_drp = $plus_Reads[1]+$dist_To_Soft;
+	
+	}
+	else{                
+                $start_local = $plus_Reads[1];
+                $end_local = $plus_Reads[1]+$dist_To_Soft;
+                $start_drp = $plus_Reads[1]-$dist_To_Soft;
+                $end_drp = $plus_Reads[1];
+        }
+	
+	$target_local=join("", $plus_Reads[0], ":", $start_local, "-", $end_local);
+	$target_drp=join("", $plus_Reads[0], ":", $start_drp, "-", $end_drp);
+	my $num_unmapped_pairs="";
+	if ($side =~ /right/) {
+		$num_unmapped_pairs=`samtools view $new_blacklist -q $MapQ -f8 -F 1536 -c $INPUT_BAM $target_drp`;
+	} else {
+        $num_unmapped_pairs=`samtools view $new_blacklist -q $MapQ -f24 -F 1536 -c $INPUT_BAM $target_drp`;
+	}
+if($verbose){print "samtools view $new_blacklist -q $MapQ -f24 -F 1536 -c $INPUT_BAM $target_drp\n";}
+
+	$num_unmapped_pairs=~s/\n//;
+if($verbose){print "NUM UNMAPPED PAIRS= $num_unmapped_pairs\n";}
+	my $REF1_base = "";
+	my $REF2_base = "";
+	my $INFO_1 = "";
+	my $INFO_2 = "";
+	my $ALT_1 = "";
+	my $ALT_2 = "";
+	my $isize = 0;
+	my $QUAL = "";
+	my $FORMAT = "GT:";
+
+	#### get 8-character random id
+	my $BND1_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	my $BND2_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	$BND1_name=join "_","BND",$BND1_name;
+	$BND2_name=join "_","BND",$BND2_name;
+
+	my $counts = {CTX => 0, DEL => 0, INS => 0, INV => 0, TDUP => 0, NOV_INS => 0 };
+	my $event_mate_info = {CTX => "", DEL => "", INS => "", INV => "", TDUP => "", NOV_INS => "" };
+
+	#### get mate pair info and counts per event
+	foreach my $e (sort keys %{$counts}) {
+		my $h = get_counts_n_info($e, $side, $MapQ, $file, $dist_To_Soft, $target_drp, $upper_limit, $lower_limit);
+
+		$counts->{$e} = $h->{count};
+		$event_mate_info->{$e} = $h->{info};
+	}
+#print Dumper($counts);
+
+	my $max = 0;
+	my $type = "UNKNOWN";
+	my $nRP = 0;
+	my $mate_info = "NA\tNA\tNA\tNA";
+	my $summary = "GT:";
+
+	#### find max count of events and set type, nRP and info to corresponding
+	#### max count event.
+	#### also create a summary string of all counts to be added to VCF file.
+	foreach my $e (sort keys %{$counts}){
+#		if ($counts->{$e} >=i $max){
+		if ($counts->{$e} > $max){		
+			$type = $e .",". $counts->{$e};
+			$nRP = $counts->{$e};
+
+			$max = $counts->{$e};
+
+			if (length($event_mate_info->{$e})) {
+				$mate_info = $event_mate_info->{$e};
+			}
+		}
+
+		$summary .= $e .",". $counts->{$e} .":";
+	}
+#	print "done with Summary\n";
+	#### remove the trailing colon ":" from the summary
+	$summary =~ s/:$//;
+ if (($minRP > $max)&&(!$disable_RP_only )){if ($verbose){print "FAILED BECAUSE ($minRP > $max)&&(!$disable_RP_only )"};return};
+
+	#### Run Levenshtein distance on softclip in target region to find out if it's a small deletion/insertion
+	#### passing 1: clip_seq, 2: chr, 3: start, 4: end, 5: ref file.
+	my $levD = new LevD;
+########################################################
+########################################################
+########################################################
+
+	#### redefine start and end location for LevD calc.
+#	$start = $plus_Reads[1]-$dist_To_Soft;
+#	$end = $plus_Reads[2];
+	my $num_bases_to_loc=0;
+	my $new_start=0;
+	my $new_end=0;
+	my $del_seq="";
+        my $start = $start_local;
+        my $end = $end_local;
+	if ($lSC=~/NA/){$lSC=0}
+
+	if ($side =~ /right/) {
+	        $levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	        $num_bases_to_loc=$levD->{index};
+		$new_start = $plus_Reads[2];
+                if ($plus_Reads[2]=~/^[0-9]/){$new_end=$plus_Reads[2]+$lSC};
+	}
+	else{
+		$levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+		$num_bases_to_loc=$levD->{index};
+		if ($plus_Reads[2]=~/^[0-9]/){$new_start=$plus_Reads[2]-$lSC};
+                $new_end = $plus_Reads[2];
+	}
+	if((!$new_start)||(!$new_end)||($new_start<0)){print "FAILED AT ((!$new_start)||(!$new_end)||($new_start<0))\n";return};
+	
+	$del_seq=getBases($plus_Reads[0], $new_start,$new_end,$INPUT_FASTA);
+##############################################################################
+#	#If there is a match, where is the start position of the match?
+#
+##############################################################################
+
+
+	#if $plus_Reads[3] eq "NA", then it was found without soft-clipped reads
+	if($plus_Reads[3] !~  /NA/){
+			if (($local_levD < $levD_local_threshold)) {
+				return if (!$sv_only);
+				#### add value to summary to be written to vcf file.
+				$summary = "GT:sDel," . $plus_Reads[6];
+				$type = "sDEL";
+				###########################################################################
+				##### Printing output
+
+				#########################################
+				##### Get DNA info
+				#########################################
+				#$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF1_base = substr($del_seq, 0, 1);
+
+				#### this is alt ref. for softclip its $plus_Reads[3]
+				$REF2_base = $del_seq;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$isize = length($del_seq);
+
+				#### svtype = none for sDEL
+				#### isize = length($info[3]);
+				#### nRP = NA
+				#### mate_id = NA
+				#### CTX,:DEL,:....sDEL,##
+				$INFO_1=join(";", "SVTYPE=NA", "EVENT=$type", "ISIZE=$isize");
+
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE= "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+				$INFO_2=~s/\s//g;
+
+				$BND1_name =~ s/^BND/LEVD/;
+				# If left, then the start position is plus_Reads[1]-isize
+				my $start_pos=0;
+				#Make sure Ref1 and Ref2 bases are different
+				if($REF2_base eq $REF1_base){$REF1_base="NA"}
+				if($side=~/left/){$start_pos=$plus_Reads[1]-$isize}else{$start_pos=$plus_Reads[1]};		
+				print OUT join ("\t", $plus_Reads[0], $start_pos, $BND1_name, $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+				if ($verbose){print "No Softclipped reads here!\n"}
+				return;
+			}
+		}
+
+		#### Otherwise, look for DRP mate info
+	#if($nRP=~/NA/){print "MATE_INFO=$mate_info\tSide=$side\tline=$line\n";}
+		my @mate_info_arr = split(/\t/, $mate_info);
+		$nRP = $mate_info_arr[3];
+		my $mate_chr=$mate_info_arr[0];
+
+			if((! defined $nRP) || ($nRP =~ /na/i) || ($mate_chr =~ /NA/) ){
+			#PRINT UNKNOWN
+	if ($nRP =~ /na/i){print "Can't find SC reads\n" if ($verbose);return};
+	if ($verbose){print "There is an unknown\nNRP=$nRP Mate_CHR=$mate_chr minRP=$minRP\n"}
+				$summary .= ":unknown," . $plus_Reads[6];
+				$type = "unknown";
+				$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF2_base = $plus_Reads[3];
+				$BND1_name =~ s/^BND/UNKNOWN/;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$INFO_1=join(";", "SVTYPE=unknown", "EVENT=unknown", "ISIZE=unknown");
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE = "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+				$SAMPLE=~s/NA/0/g;
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+			       #print join ("\t", $plus_Reads[0], $plus_Reads[1],  $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+
+				print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $REF2_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+				return;
+
+		}
+		#### end if there is no mate info or nRP+uRP<minRP
+		if (($nRP<$minRP)&&($unmated_pairs > ($num_unmapped_pairs+$nRP))){
+			print "Something failed here\nif (($nRP<$minRP)&&($unmated_pairs > ($num_unmapped_pairs+$nRP)))\n";
+		return}
+
+		##################################################################################
+		# Find out if mates have nearby soft-clips (to refine the breakpoints)
+		##################################################################################
+		#Look for evidence of soft-clipping near mate
+		my @mate_soft_arr = ();
+		my $mate_start = 0;
+		my $mate_soft = "";
+
+		@mate_info_arr = split(/\t/, $mate_info);
+
+		#### mate start and end locations.
+		my $filename = $right_sc;
+
+		$start = $mate_info_arr[1] - $dist_To_Soft;
+		$end = $mate_info_arr[1];
+
+		if ($side =~ /right/) {
+			$start = $mate_info_arr[2];
+			$end = $mate_info_arr[2] + $dist_To_Soft;
+
+			$filename = $left_sc;
+		}
+
+		#### add Levenshtein distance to Summary
+	#print "Calc distal Levd\n";
+		$levD->search(rev_comp($plus_Reads[3]), $mate_info_arr[0], $start, $end, $INPUT_FASTA);
+		$distl_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	$distl_levD = "NA" if($plus_Reads[3] =~ /NA/);
+	#If there are no softclips to string match, then give 0 as quality value
+       if ($plus_Reads[3] !~ /NA/){
+			$QUAL=1/($distl_levD + 0.001);
+		}
+		else	{
+			$QUAL=0;
+		};
+	$QUAL=sprintf("%.2f",$QUAL);
+	#### looking for softclips to refine break point
+	#### if left look in right and vice-versa.
+	$cmd = qq{echo -e "$mate_info_arr[0]\t$start\t$end"};
+	$cmd .= qq{ | awk -F'\t' 'NF==3' | intersectBed -a stdin -b $filename | head -1};
+print "$cmd\n" if $verbose;
+	$mate_soft = `$cmd`;
+
+	$mate_soft =~ s/\n//g;
+	@mate_soft_arr = split(/\s/, $mate_soft);
+my $NO_MATE_SC="";
+	if(@mate_soft_arr){
+		$mate_chr = $mate_soft_arr[0];
+		$mate_start = $mate_soft_arr[1];
+                $NO_MATE_SC="APPROXIMATE";
+
+	} else{
+		@mate_info_arr = split(/\s/,$mate_info);
+		$mate_chr = $mate_info_arr[0];
+		$mate_start = $mate_info_arr[1];
+	}
+
+	#end if there is no mate info
+	return if ($mate_chr eq "");
+	#end if there is no mate info and !disable_RP_only
+	return if (($lSC =~/NA/)&&(!$disable_RP_only));
+	
+	
+	###########################################################################
+	##### Printing output
+
+	#########################################
+	# Get DNA info
+	#########################################
+	#print "PLUS_READS=$plus_Reads[0],$plus_Reads[1]\nMATE=$mate_chr,$mate_start,$INPUT_FASTA\n";
+	$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+
+	### this is alt ref. for softclip its $plus_Reads[3]
+	$REF2_base = getSeq($mate_chr, $mate_start, $INPUT_FASTA);
+
+	#########################################
+	# print in VCF format
+	#########################################
+
+	#### abs value to account for left and right reads.
+	$isize = abs($plus_Reads[1]-$mate_start);
+	
+	my $event_type=$type;
+	$event_type=~ s/,|[0-9]//g;
+	$INFO_1=join(";", "$sv_type", "EVENT=$event_type","END=$mate_start", "ISIZE=$isize","MATEID=$BND2_name");
+	$INFO_2=join(";", "$sv_type", "EVENT=$event_type","END=$plus_Reads[1]", "ISIZE=$isize","MATEID=$BND1_name");
+
+	#### remove any white spaces.
+	#### ask: did you mean to remove space from ends? eg. trim()
+	$INFO_1=~s/\s//g;
+	$INFO_2=~s/\s//g;
+
+	$FORMAT=$summary;
+ 	$FORMAT=~ s/,|[0-9]//g;
+        $FORMAT .= ":lSC:nSC:uRP:distl_levD";
+	if($NO_MATE_SC){$INFO_2 .= ":NO_MATE_SC"}
+	my $SAMPLE="0/1:";	
+	$SAMPLE .=$summary;
+#        if($NO_MATE_SC){$SAMPLE.= ":$NO_MATE_SC"}
+
+	$SAMPLE=~s/[A-Z|,|_]//g;
+        my $MATE_SAMPLE=$SAMPLE;
+        $SAMPLE .= ":$lSC:$nSC:$num_unmapped_pairs:$distl_levD";
+	$MATE_SAMPLE .=":NA:NA:NA:NA";
+	$SAMPLE=~s/::/:/g;
+	$MATE_SAMPLE=~s/::/:/g;
+	$MATE_SAMPLE=~s/NA/0/g;
+	$SAMPLE=~s/NA/0/g;
+ 
+	if($type !~ /INV/){
+		$ALT_1 = join("","]",$mate_chr,":",$mate_start,"]",$REF1_base);
+		$ALT_2 = join("",$REF2_base,"[",$plus_Reads[0],":",$plus_Reads[1],"[");
+		#		2      321682 bnd_V  T   ]13:123456]T  6    PASS SVTYPE=BND
+		#		13     123456 bnd_U  C   C[2:321682[   6    PASS SVTYPE=BND
+	} else {
+		$ALT_1 = join("", "]", $plus_Reads[0], ":", $plus_Reads[1], "]", $REF2_base);
+		$ALT_2 = join("", $REF1_base, "[", $mate_chr, ":", $mate_start, "[");
+	}
+
+	if(($mate_chr) && ($plus_Reads[0])){
+		print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $ALT_1, $QUAL,"PASS", $INFO_1, $FORMAT,$SAMPLE,"\n");
+		print OUT join ("\t", $mate_chr, $mate_start, $BND2_name, $REF2_base, $ALT_2, $QUAL, "PASS", $INFO_2, $FORMAT,$MATE_SAMPLE,"\n");
+	}
+}
+
+###############################################################################
+###############################################################################
+sub get_counts_n_info {
+        my ($event, $side, $mapQ, $file, $dist, $target, $upL, $lwL) = @_;
+
+        my $mate_info = "";
+        my $cmd = "";
+
+        if ($event =~ /^CTX$/i) {
+                #print "CTX side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{ samtools view $new_blacklist -q $mapQ -f 16 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^DEL$/i) {
+                #print "DEL side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -F 1568 -f 16 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"} {if((\$7 ~ /=/)&&(\$9<-$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist_To_Soft -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^INS$/i) {
+                #print "INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<$lwL && \$9 > 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq {samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>-$lwL && \$9 < 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^INV$/i) {
+                #print "INV side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -F 1596 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist_To_Soft -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 48 -F 1548 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^TDUP$/i) {
+                #print "TDUP side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+#			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4>\$8)&&(\$9<0)&& (\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+#                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<-$upL )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4<\$8)&&(\$9>0)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^NOV_INS$/i) {
+                #print "NOV_INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 8 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 24 -F 1536 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        }
+
+        $mate_info=~s/\n//g;
+        my @tmp=split(/\t/, $mate_info);
+
+        my $counts = 0;
+
+        if (defined $tmp[3]) {
+                $tmp[3] =~ s/\n//g;
+
+                $counts = $tmp[3] if (length($tmp[3]));
+        }
+        return ({count=>$counts, info=>$mate_info});                                                                                                                                
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/SoftSearch_Filter.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,137 @@
+#!/usr/bin/perl -s
+open (FILE,"$ARGV[0]")||do{usage();exit 1};#die "Not using the right Parameters!\n\n";
+use Getopt::Long;
+#Declare variables
+my ($lsc,$minDist,$skip,$nsc,$nRP,$isize,$answer,$sv,$uRP);
+GetOptions(
+	'dist:s' => \$minDist,		#minimum distance between events
+	'lsc:i' => \$lsc,		#minimum soft-clip length
+	'nsc:i' => \$nsc, 	#minimum number of soft-clips
+	'nRP:i' => \$nRP,	#minimum number of discordant read pairs
+	'isize:i' => \$isize,	#minimum event size
+	'sv:s' => \$sv,		#whether or not to skip small deletions
+	'q:s' => \$answer,		#"yes" writes lsc.out, nsc.out and nRP.out for plotting histograms
+	'skip:s' => \$skip
+	);
+if(defined($lsc)){$lsc=$lsc} else {$lsc=0};
+if(defined($nsc)){$nsc=$nsc} else {$nsc=0};
+if(defined($nRP)){$nRP=$nRP} else {$nRP=0};
+if(defined($minDist)){$minDist=$minDist} else {$minDist=0};
+if(!$isize){$isize=0};
+if(!$uRP){$uRP=0};
+
+if($answer eq "yes"){$answer=$answer} else {$answer="no"};
+
+if ($answer eq "yes"){
+open(lsc,">lsc.out")||die;
+open(nsc,">nsc.out")||die;
+open(nRP,">nRP.out")||die;
+}
+
+
+#Remove hits if they are within $minDist
+$chr="chr1";$pos=0;
+while (<FILE>){
+	if ($_=~/^#/){
+		print; 
+		next
+	};
+	if ($skip){next if $_=~/$skip/}
+	@_=split(/\t/,$_);
+	#Get ISIZE from INFO field
+	my @info=split(/;/,$_[7]);
+       	my $k = 0;
+	my $v = 0; 
+	my %infoHash;
+	for (my $i=0;$i<@info;$i++){
+        	my @tmp=split(/=/,$info[$i]);
+		$k=shift(@tmp);
+		$v=shift(@tmp);
+		$infoHash{$k}=$v;
+	}
+
+	#Get the value of TYPE to find out how many reads support the event
+        my $counts = {CTX => 0, DEL => 0, INS => 0, INV => 0, TDUP => 0, NOV_INS => 0, lSC => 0, nSC => 0,uRP =>0,sDEL => 0,levD_local=>0,distl_levD => 0 };
+	#Get Complete Hash
+	#$_[8] is format
+	#$_[9] is values
+	my @format=split(/:/, $_[8]);
+	my @sample=split(/:/,$_[9]);
+	my %hash; 
+	@hash{@format}=@sample;
+	#Subset the hash to get the proper type of variant
+	my $max_val = 0;
+	my $max_type = "NA";
+	
+	#Get TYPEOF HASH 
+	my %type;
+	%type = %hash ;
+	delete $type{'lSC'};
+        delete $type{'nSC'};
+        delete $type{'uRP'};
+        delete $type{'levD_local'};
+        delete $type{'distl_levD'};
+
+ 	while (my ($key,$val)=each(%type)){
+		if($val > $max_val){$max_val=$val;$max_type=$key}
+		}
+
+
+#######################################################################################################
+        #Start applying filters
+	
+	#Remove hits if they are within $minDist
+	$chrom=$_[0];$position=$_[1];
+
+	#next if chroms are same and distance is less than X
+	$difference=abs($pos-$position);
+	if(($chrom eq $chr)&&($difference < $minDist)){
+		$pos=$position;$chr=$chrom;
+		next}
+	$pos=$position;$chr=$chrom;	
+	$EVENT_SIZE=$infoHash{'ISIZE'};
+	$EVENT_TYPE=$max_type;
+	$EVENT_SUPPORT=$max_val;
+	$length_of_softClips=$hash{'lSC'};
+	$number_of_softclips=$hash{'nSC'};
+        $number_of_unmated=$hash{'uRP'};
+	
+	########################################################################
+	#Print if all fields are ok
+	next if($EVENT_SIZE < $isize);
+        next if($EVENT_SUPPORT < $nRP);
+        next if($length_of_softClips < $lsc);
+        next if($number_of_softclips < $nsc);
+        next if($number_of_unmated < $uRP);
+	next if (($sv)&&($EVENT_TYPE=~/sDEL/));
+	print;
+
+
+	if ($answer eq "yes"){
+	print lsc $length_of_softClips."\n";
+	print nsc $number_of_softclips."\n";
+	print nRP $EVENT_SUPPORT."\n";
+	}
+}
+
+
+sub usage{
+print "\nUsage: SoftSearch_Filter.pl <VCF>\n
+	-dist	#minimum distance between events [0]
+	-lsc	#minimum length soft-clip [0]
+	-nsc	#minimum number of soft-clip [0]
+	-nRP	#minimum number of discordant read pairs [0]
+	-isize	#minimum size [0]
+	-sv	#skip small deletions [no|yes]
+	-skip	#pipe-delimited list of strings to skip (e.g. chrM|chrY|chrGL)
+	\n"
+}
+
+#R
+# lsc<-read.table("lsc.out")
+# nsc<-read.table("nsc.out")
+# nRP<-read.table("nRP.out")
+# par(mfrow=c(2,2))
+# hist(lsc$V1,breaks=100)
+# hist(nsc$V1,breaks=100)
+# hist(nRP$V1,breaks=100)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/Subset_targets.sh	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,23 @@
+#!/bin/sh
+#$ -V
+#$ -cwd
+#$ -q 1-day
+#$ -m ae
+#$ -M hart.steven@mayo.edu
+#$ -l h_vmem=1G
+#$ -l h_stack=10M
+BAM=$1
+TARGET_BED=$2
+SAMPLE_NUMBER=$3
+
+#cat $HEADER > out.${SAMPLE_NUMBER}.sam
+samtools view -L $TARGET_BED $BAM|
+ perl -ane '
+ next if ($F[10]=~/#/);
+ $minSize=1000;
+ if( $F[1] & 8 || $F[1] & 4 ||  $F[8] == 0 || abs($F[8]) > $minSize || $F[5] =~/S/){
+ $rName=join("","@",$F[0]);
+  print join ("\n",$rName,$F[9],"+",$F[10])."\n";
+};
+ ' >> out.${SAMPLE_NUMBER}.fq
+echo "Done with $BAM `date`"
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/blat_parse.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,526 @@
+#####################################################################################################################################################
+#Purpose: To parse blat psl file
+#Date: 07-30-2013
+#####################################################################################################################################################
+use Getopt::Long;
+use Cwd;
+#reading input arguments
+&Getopt::Long::GetOptions(
+'b|BLAT_OUT=s'=> \$blat_out,
+'temp:s'=>\$dirtemp,
+'f|FASTA=s'=>\$infast,
+);
+$blat_out =~ s/\s|\t|\r|\n//g;
+$dirtemp =~ s/\s|\t|\r|\n//g;
+$infast =~ s/\s|\t|\r|\n//g;
+$samtools=`which samtools`;
+$samtools =~ s/\s|\t|\r|\n//g;
+
+if($blat_out eq "" || $infast eq "" )
+{
+	die "Try: perl blat_parse.pl -b <PSL FILE> -f <Contigs.fa> 
+	-temp	temporary file directory
+	\n";
+}   
+if (!(-e $samtools))
+{
+	die "samtools must be in your path\n";
+}
+
+if (!(-e $infast))
+{
+	die "input fasta file doesn't exist\n";
+}
+unless(-d $dirtemp)
+{
+    #system("mkdir -p $dirtemp");
+    $dirtemp= getcwd;
+}	
+#opening the blat output file
+open(BUFF,$blat_out) or die "no file found $blat_out\n";
+open(WRBUFF,">$dirtemp/Temp_out.txt") or  die "not able to write the file \n";
+#parsing through the file
+while(<BUFF>)
+{
+	if($_ =~ m/^\d/)
+	{
+		print WRBUFF $_;	
+	}
+	else
+	{
+		print "ignoring headers $.\n";
+	}
+}	
+close(WRBUFF);
+system("sort -k10,10 -k18,18n $dirtemp/Temp_out.txt > $dirtemp/Temp_out1.txt");
+system("mv  $dirtemp/Temp_out1.txt $dirtemp/Temp_out.txt");
+open(BUFF,"$dirtemp/Temp_out.txt") or die "no file found Temp_out.txt\n";
+open(WRBUFF,">$dirtemp/File1_out.txt") or  die "not able to write the file \n";
+close(WRBUFF);
+
+$prev_contig_name="";
+my @temp;
+#parsing through the file
+while(<BUFF>)
+{
+	
+		chomp($_);
+		@_ = split "\t";
+		if($_[9] ne $prev_contig_name)
+		{
+			if($prev_contig_name ne "")
+			{
+				#print @temp."\n";
+				#print @temp."\n";
+				&processing(@temp);
+			}
+			undef(@temp);
+			push(@temp,$_);		
+		}
+		else
+		{
+			push(@temp,$_);
+		}	
+		$prev_contig_name=$_[9];	
+	
+	
+}	
+#processing last record
+&processing(@temp);
+#print @temp."\n";
+close(BUFF);
+
+
+
+
+##################SUBROUTINES######################
+#actual processing of each record in the temp array(same query name objects)
+
+sub processing {
+	open(WRBUFF,">>$dirtemp/File1_out.txt") or  die "not able to write the file \n";
+        open(BAD_CONTIG,">>$dirtemp/bad_contig.out.txt") or  die "not able to write the file \n";
+
+	@temp = @_;
+	#if number of hits for a contig is one
+	if(@temp == 1)
+	{
+			$i=0;
+			#define blocksizes array
+			@row=split("\t",$temp[$i]);
+			$row[18] =~ s/,$//g;
+			@blockSizes=split(',',$row[18]);
+			#defining var
+			$qSize=$row[10];
+			$qStart=$row[11];
+			$qStop=$row[12];
+			$tstart=$row[15];
+			$tstop=$row[16];
+			$Strand=$row[8];
+			$coverage = $row[9];
+			$coverage =~ s/\w+_//g;
+			#calculate match val
+			if(($qSize-($qStop-$qStart)) ==0)
+			{ 	
+				$flag=1;
+				#these are non-informative
+				if (@blockSizes ==1)
+				{
+					print "ignoring one of the event $row[9] $i as the event is non informative \n";
+					print BAD_CONTIG "$row[9]\n";
+				}
+				#Ignoring when number of blocks are more than two
+				if(@blockSizes > 2)
+				{
+					print "ignoring event $row[9] $. AS BLOCK SIZE is greater than 2\n";	
+				}
+				#if number of blocks is equal to 2
+				if(@blockSizes == 2)
+				{
+					$temp1=$tstart+$blockSizes[0]+1;
+					$temp2=$tstop-$blockSizes[1]-1;
+						
+					print  WRBUFF "$row[9]\t$row[13]\t$temp1\t$Strand\t$row[13]\t$temp2\t$Strand\t$coverage\n";
+				}
+				$i=@temp;
+			}
+			#later part missing
+			elsif($qStart ==0)
+			{	
+				$temp1=$tstart+$blockSizes[0]+1;
+				$infast_chr=$infast;
+				$infast_chr=~ s/\.fa//g;
+				$infast_chr_start=$qStop+1;
+				$infast_chr_stop=$qSize;
+				$sys="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+				
+				$sys = `$sys`;
+				chomp($sys);
+				@sys=split("\n",$sys);
+				$INSERTION="";
+				for($i=1;$i<@sys;$i++)
+				{
+					$INSERTION=$INSERTION.$sys[$i];
+				}
+				$INSERTION_LENGTH=length($INSERTION);
+				$temp1=$tstart+$blockSizes[0]+1;
+				print  WRBUFF "$row[9]\t$row[13]\t$temp1\t$Strand\tUNKNOWN\tUNKNOWN\t$Strand\t$coverage\t$INSERTION\t$INSERTION_LENGTH\n";
+				
+			}
+			#initial part missing
+			elsif($qStop == $qSize)
+			{
+				$temp1=$tstart;
+				$infast_chr=$infast;
+				$infast_chr=~ s/\.fa//g;
+				$infast_chr_start=0;
+				$infast_chr_stop=$qStart;
+				$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+				#die "$sys\n";
+				$sys = `$sys`;
+				#die "$sys\n";
+				chomp($sys);
+				@sys=split("\n",$sys);
+				$INSERTION="";
+				for( $i=1;$i<@sys;$i++)
+				{
+						$INSERTION=$INSERTION.$sys[$i];
+				}
+				$INSERTION_LENGTH=length($INSERTION);
+				$temp1=$tstart+1;
+				print  WRBUFF "$row[9]\tUNKNOWN\tUNKNOWN\t$Strand\t$row[13]\t$temp1\t$Strand\t$coverage\n";
+				
+			}
+			else
+			{
+				print "ignoring one of the event $row[9] $i as the event is non informative \n";
+			}
+		
+	}
+	#if number of hits for a contig is greater than one
+	else
+	{
+		#this flag is used to see if perfect hit not found (match val =0)
+		$flag1 = 0;
+		for(my $i=0;$i<@temp;$i++)
+		{
+			
+			#define blocksizes array
+			@row=split("\t",$temp[$i]);
+			$row[18] =~ s/,$//g;
+			@blockSizes=split(',',$row[18]);
+			#defining var
+			$qSize=$row[10];
+			$qStart=$row[11];
+			$qStop=$row[12];
+			$tstart=$row[15];
+			$tstop=$row[16];
+			$Strand=$row[8];
+			$coverage = $row[9];
+			$coverage =~ s/\w+_//g;
+			#calculate match val
+			if(($qSize-($qStop-$qStart)) ==0)
+			{ 	
+				$flag1=1;
+				#these are non-informative
+				if (@blockSizes ==1)
+				{
+					print "ignoring one of the event $row[9] $i as the event is non informative \n";
+					print BAD_CONTIG "$row[9]\n";
+				}
+				#Ignoring when number of blocks are more than two
+				if(@blockSizes > 2)
+				{
+					print "ignoring event $row[9] $. AS BLOCK SIZE is greater than 2\n";	
+				}
+				if(@blockSizes == 2)
+				{
+					$temp1=$tstart+$blockSizes[0]+1;
+					$temp2=$tstop-$blockSizes[1]-1;
+						
+					print  WRBUFF "$row[9]\t$row[13]\t$temp1\t$Strand\t$row[13]\t$temp2\t$Strand\t$coverage\n";
+				}
+				$i=@temp;
+			}
+		}
+		#as flag value not changed proceed to see next step
+		if($flag1 == 0)
+		{
+			undef(@initial);
+			my @initial;
+			for(my $i=0;$i<@temp;$i++)
+			{
+				@row=split("\t",$temp[$i]);
+				#print "@row\n";
+				unshift(@initial,[@row]);
+			}
+			#sorting the hits according to qstart & qend
+			@initial = sort {$a->[11] <=> $b->[11] || $b->[12] <=> $a->[12]} @initial;
+			#print "$row[9]\t@initial\n";
+			#if($row[9]  eq "NODE_5_length_149_cov_12.395973")
+			#{
+			#	for($i=0;$i<@initial;$i++)
+			#	{
+			#		print "@{$initial[$i]}\n";
+			#	}
+			#}
+			$start = "";
+			$stop = "";
+			$start_len=0;
+			$stop_len=0;
+			#this super flag is used to skip processing of the remaining unnecessary hits
+			$super_flag = 0;
+			for($i=0;$i<@initial && $super_flag == 0;$i++)
+			{
+				$flag = 0;
+				#print "@{$initial[$i]}\n";
+				$initial[$i][18] =~ s/,$//g;
+				@blockSizes1=split(',',$initial[$i][18]);
+				#defining var
+				$qSize1=$initial[$i][10];
+				$qStart1=$initial[$i][11];
+				$qStop1=$initial[$i][12];
+				$tstart1=$initial[$i][15];
+				$tstop1=$initial[$i][16];
+				$Strand1=$initial[$i][8];
+				$Chr1 = $initial[$i][13];
+				$coverage1 = $initial[$i][9];
+				$coverage1 =~ s/\w+_//g;
+				#die "$qSize1\t$qStart1\t$qStop1\t$tstart1\t$tstop1\t$Strand1\t$Chr1\t$coverage1\n";
+				#if a hit qstart = 0 then set flag =1 
+				if($qStart1 == 0)
+				{
+					$flag =1;
+				}
+				#if a hit qstop = qsize then set flag =2 
+				if($qStop1 == $qSize1)
+				{
+					$flag =2;
+				}
+				#if($row[9]  eq "NODE_5_length_149_cov_12.395973")
+				#{
+				#	print "$flag \n";
+				#}
+				if(@blockSizes1 == 1)
+				{
+					if($flag == 1 )
+					{
+						for($j=0;$j<@initial;$j++)
+						{
+							#both hits should not be the same 
+							if($i != $j)
+							{
+								#print "@{$initial[$i]}\n";
+								$initial[$j][18] =~ s/,$//g;
+								@blockSizes2=split(',',$initial[$j][18]);
+								#defining var
+								$qSize2=$initial[$j][10];
+								$qStart2=$initial[$j][11];
+								$qStop2=$initial[$j][12];
+								$tstart2=$initial[$j][15];
+								$tstop2=$initial[$j][16];
+								$Strand2=$initial[$j][8];
+								$coverage2 = $initial[$j][9];
+								$Chr2 = $initial[$j][13];
+								$coverage2 =~ s/\w+_//g;
+								#making sure both hits are not over lapping
+								if($qStart2 > $qStart1)
+								{	#allow +-2 bases, as this hit is the immediately following contiguous hit
+									if($qStop1 >= $qStart2 -2  &&  $qStop1 <= $qStart2 +2  )
+									{
+										#perfect match
+										if($qStop2 == $qSize2)
+										{
+											if($Strand1 eq "+")
+											{
+												$tmp1 = $tstart1+$blockSizes1[0]+1;
+												$tmp2 = $tstart2+$blockSizes2[0];
+												print WRBUFF "$initial[$i][9]\t$Chr1\t$tmp1\t$Strand1\t$Chr2\t$tmp2\t$Strand2\t$coverage1\n";
+											}
+											else
+											{
+												$tmp1 = $tstart1+1;
+												$tmp2 = $tstart2+1;
+												print WRBUFF "$initial[$i][9]\t$Chr1\t$tmp1\t$Strand1\t$Chr2\t$tmp2\t$Strand2\t$coverage1\n";
+											
+											}
+											$super_flag = 1;
+											$j = @initial+1;	
+										}
+										#some part is missing after the second hit
+										else
+										{
+											$tmp1 = $tstart1+$blockSizes1[0];
+											$tmp2 = $tstart2+$blockSizes2[0];
+											$INSERTION="";
+											$infast_chr=$infast;
+											$infast_chr=~ s/\.fa//g;
+											$infast_chr_start=$qStop1+1;
+											$infast_chr_stop=$qStart2-1;
+											$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+											#die "$sys\n";
+											$sys = `$sys`;
+											#die "$sys\n";
+											chomp($sys);
+											@sys=split("\n",$sys);
+											for(my $k=1;$k<@sys;$k++)
+											{
+												$INSERTION=$INSERTION.$sys[$k];
+											}
+											$INSERTION_LENGTH=length($INSERTION);
+											print WRBUFF "$initial[$i][9]\t$Chr1\t$tmp1\t$Strand1\t$Chr2\t$tmp2\t$Strand2\t$coverage1\n";
+											$super_flag = 1;
+											$j = @initial+1;	 
+										}
+										
+									}
+									#if there are some insertion between two hits
+									elsif($qStop2 == $qSize2)
+									{
+										$tmp1 = $tstart1+$blockSizes1[0];
+										$tmp2 = $tstart2+$blockSizes2[0];
+										$INSERTION="";
+										$infast_chr=$infast;
+										$infast_chr=~ s/\.fa//g;
+										$infast_chr_start=$qStop2+1;
+										$infast_chr_stop=$qSize;
+										$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+										#die "$sys\n";
+										$sys = `$sys`;
+										#die "$sys\n";
+										chomp($sys);
+										@sys=split("\n",$sys);
+										for(my $k=1;$k<@sys;$k++)
+										{
+											$INSERTION=$INSERTION.$sys[$k];
+										}
+										$INSERTION_LENGTH=length($INSERTION);
+										print WRBUFF "$initial[$i][9]\t$Chr1\t$tmp1\t$Strand1\t$Chr2\t$tmp2\t$Strand2\t$coverage1\n";
+										$super_flag = 1;
+										$j = @initial+1;	
+									}
+												
+								}
+									
+							}	
+						}
+						#if none worked with other reads then only process that read
+						if($j == @initial)
+						{
+							#die "success\n";
+							$temp1=$tstart1+$blockSizes1[0]+1;
+							#print  WRBUFF "$Chr1\t$temp1\t$Strand1\tUNKNOWN\tUNKNOWN\t$Strand\t$coverage\n";
+							$infast_chr=$infast;
+							$infast_chr=~ s/\.fa//g;
+							$infast_chr_start=$qStop1+1;
+							$infast_chr_stop=$qSize1;
+							$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+							#die "$sys\n";
+							$sys = `$sys`;
+							#die "$sys\n";
+							chomp($sys);
+							@sys=split("\n",$sys);
+							$INSERTION="";
+							for(my $k=1;$k<@sys;$k++)
+							{
+								$INSERTION=$INSERTION.$sys[$k];
+							}
+							$INSERTION_LENGTH=length($INSERTION);
+							print WRBUFF "$initial[$i][9]\t$Chr1\t$temp1\t$Strand1\tUNKNOWN\tUNKNOWN\t$Strand1\t$coverage1\n";
+							$super_flag = 1;
+						}	
+					}
+					#if query end is matched to query size
+					elsif($flag == 2)
+					{
+						#going through other hits
+						for($j=0;$j<@initial;$j++)
+						{
+							#hits should not be same
+							if($i != $j && $qStop2)
+							{
+								#print "@{$initial[$i]}\n";
+								$initial[$j][18] =~ s/,$//g;
+								@blockSizes2=split(',',$initial[$j][18]);
+								#defining var
+								$qSize2=$initial[$j][10];
+								$qStart2=$initial[$j][11];
+								$qStop2=$initial[$j][12];
+								$tstart2=$initial[$j][15];
+								$tstop2=$initial[$j][16];
+								$Strand2=$initial[$j][8];
+								$coverage2 = $initial[$j][9];
+								$Chr2 = $initial[$j][13];
+								$coverage2 =~ s/\w+_//g;
+								#the other hit must end before this hit ends
+								if($qStop2 < $qStop1)
+								{
+									if($qStart1 >= $qStop2 -2  &&  $qStart1 <= $qStop2 +2  )
+									{
+										#die "$qStart1 <= $qStop2 \n";
+										$tmp1 = $tstart1+$blockSizes1[0];
+										$tmp2 = $tstart2+$blockSizes2[0];
+										$INSERTION="";
+										$infast_chr=$infast;
+										$infast_chr=~ s/\.fa//g;
+										$infast_chr_start=0;
+										$infast_chr_stop=$qStart1-1;
+										$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+										#die "test $sys\n";
+										$sys = `$sys`;
+										#die "$sys\n";
+										chomp($sys);
+										@sys=split("\n",$sys);
+										for(my $k=1;$k<@sys;$k++)
+										{
+											$INSERTION=$INSERTION.$sys[$k];
+										}
+										$INSERTION_LENGTH=length($INSERTION);
+										print WRBUFF "$initial[$i][9]\t$Chr2\t$tmp2\t$Strand2\t$Chr1\t$tmp1\t$Strand1\t$coverage1\n";
+										$super_flag = 1;
+										$j = @initial+1;
+										
+									}
+									
+								}	
+							}
+						}
+						if($j == @initial)
+						{
+							$infast_chr=$infast;
+							$infast_chr=~ s/\.fa//g;
+							$infast_chr_start=0;
+							$infast_chr_stop=$qStart1;
+							$sys ="$samtools faidx $infast $infast_chr:$infast_chr_start-$infast_chr_stop";
+							#die "test $sys\n";
+							$sys = `$sys`;
+							#die "$sys\n";
+							chomp($sys);
+							@sys=split("\n",$sys);
+							$INSERTION="";
+							for(my $k=1;$k<@sys;$k++)
+							{
+								$INSERTION=$INSERTION.$sys[$k];
+							}
+							$INSERTION_LENGTH=length($INSERTION);
+							$tmp = $tstart1+1;
+							print WRBUFF "$initial[$i][9]\tUNKNOWN\tUNKNOWN\t$Strand1\t$Chr1\t$tmp\t$Strand1\t$coverage1\n";
+							$super_flag = 1;
+						}	
+					}
+				}
+				elsif(@blockSizes1 == 2)
+				{
+					$temp1=$tstart1+$blockSizes1[0]+1;
+					$temp2=$tstop1-$blockSizes1[1]-1;
+					print  WRBUFF "$initial[$i][9]\t$Chr1\t$temp1\t$Strand1\t$Chr1\t$temp2\t$Strand1\t$coverage1\n";
+				
+				}		
+			}
+		}
+		
+	}
+	close(WRBUFF);
+	
+	undef(@temp);
+}
+ 
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/cluster.pair.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,70 @@
+#!/usr/bin/perl                                                                                                                                            
+use strict;
+use POSIX;
+
+my $usage = "cluster.pair.pl maxdist\n";
+my $maxdist = shift or die $usage;
+
+my %count;
+
+while (<STDIN>){
+    chomp;
+    my ($sample, $chrstart, $start, $chrend, $end) = split /\t/;
+    my $nstart = floor ($start/$maxdist);
+    my $nend   = floor ($end/$maxdist);
+    my $coord = {start=>$start, end=>$end};
+
+    push @{$count{$chrstart}->{$nstart}->{$chrend}->{$nend}->{$sample}}, $coord;
+}
+
+print_groups (\%count);
+
+sub print_groups {
+    my ($rcount) = @_;
+    my %count = %{$rcount};
+
+    foreach my $chrstart (sort {$a<=>$b} keys %count) {
+	foreach my $posstart (sort {$a<=>$b} keys %{$count{$chrstart}}) {
+	    my %fcoord = %{$count{$chrstart}->{$posstart}};
+
+	    foreach my $chrend (sort {$a<=>$b} keys %fcoord) {
+		foreach my $posend (sort {$a<=>$b} keys %{$fcoord{$chrend}}){
+		    my @nsamples = sort {$a cmp $b} (keys %{$fcoord{$chrend}->{$posend}});
+
+		    my $cpos = $fcoord{$chrend}->{$posend};
+
+		    my @coords;
+		    my $totnum=0;
+	    
+		    foreach my $sample (@nsamples) {
+			my ($num, $avgx, $avgy) = calc_moments(@{$cpos->{$sample}});
+			push (@coords, {start=>$avgx, end=>$avgy});
+			$totnum+=$num;
+		    }
+
+		    my ($num, $avgx, $avgy)  = calc_moments(@coords);
+	    
+		    print $chrstart."\t".$avgx."\t".$chrend."\t".$avgy ."\t".$num."\t".$totnum."\t" ;
+	    
+		    print $_."\t" foreach (@nsamples);
+		    print "\n";
+		}
+	    }
+	}
+    }
+}
+
+sub calc_moments {
+    my (@pos) = @_;
+
+    my ($num, $sumx, $sumy) = (0,0,0);
+    foreach my $cpos (@pos) {
+	$num++;
+	$sumx+=$cpos->{start};
+	$sumy+=$cpos->{end};
+    }
+    my $avgx = sprintf ("%d", $sumx/$num);
+    my $avgy = sprintf ("%d", $sumy/$num);
+
+    return ($num, $avgx, $avgy);
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/direction_filter.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,55 @@
+use Getopt::Long;
+my ($v);
+
+GetOptions ("v|verbose"  => \$v);   # flag
+
+
+
+open (FILE,"$ARGV[0]") or die "Can't find file\n\n";
+my $dist=0;
+my $pos=0;
+my @max=0;
+my @events=0;
+
+while(<FILE>){
+	$dist=0;
+	@first=split(/\s+/,$_);
+	$numEvents=($_=~tr/\|//)+1;
+	$dist=$first[1]-$pos;
+	push(@max,$_);
+	push(@events,$numEvents);
+#print "STARTING_POS=$pos\n";
+	if(($dist<500)||eof(FILE)){
+		until (($dist>500)||eof(FILE)){
+			$newline=<FILE>;
+			@second=split(/\s+/,$newline);
+			$numEvents=($newline=~tr/\|//)+1;
+			push(@max,$newline);
+			push(@events,$numEvents);
+			if($v){print "DIST=$dist\nSEC1=$second[1] POS1=$pos;\n";}
+			my $tmp=$pos;
+			$pos=$second[1];
+			$dist=$second[1]-$tmp;
+		}
+	}
+if ($v){print "Corrected dist= $dist\n"};
+	#Get the last values since they don't count
+	$NL=pop(@max);
+	$NE=pop(@events);
+	my $idxMax = 0;
+	#Get the index of the largest value in array
+	if ($v){print "Picking from events:\n"};
+	$events[$idxMax] > $events[$_] or $idxMax = $_ for 1 .. $#events;
+
+	my $val=$max[$idxMax];
+	print "$val" unless ($val=~/^0$/) ;
+	
+	
+	@max=$NL;
+	@events=$NE;		
+	my @tmp=split(/\s+/,$NL);
+	$pos=$tmp[1];
+}
+
+close FILE;
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/reduce_redundancy.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,65 @@
+open(BUFF,"$ARGV[0]") or die "no input file found\n";
+$range="$ARGV[1]";
+my %hash;
+my %store;
+$prev_chr="";
+$next=0;
+while(<BUFF>)
+{
+	chomp($_);
+	#print "$.\n";
+	if($_ !~ m/^#/)
+	{
+		@array=split("\t",$_);
+		$chr=$array[0];
+		$pos=$array[1];
+		$value=$array[@array-1];
+		if($prev_chr ne $chr )
+		{
+			if($prev_chr ne "")
+			{
+				foreach $key (sort {$hash{$b} <=> $hash{$a} } keys %hash)
+                        	{
+                                	print "$store{$key}\n";
+                                	last;
+                        	}
+
+			}
+			$next = $pos+$range;
+			undef(%hash);
+			undef(%store);
+		}
+		if($next< $pos)
+		{	
+			foreach $key (sort {$hash{$b} <=> $hash{$a} } keys %hash)
+			{
+     				print "$store{$key}\n";
+				last;
+			}
+			$next = $pos+$range;
+			undef(%hash);
+			undef(%store);
+			
+		}	
+		if($value eq "NA")
+                {
+                      $hash{$chr." ".$pos." ".$.}=0;
+                }
+                else
+                {
+                       $hash{$chr." ".$pos." ".$.}=$value;
+               	}
+                $store{$chr." ".$pos." ".$.}=$_;
+	}
+	else
+	{
+		print $_."\n";
+	}
+	$prev_chr = $chr;
+}
+foreach $key (sort {$hash{$b} <=> $hash{$a} } keys %hash)
+{
+       print "$store{$key}\n";
+       last;
+}
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/run_blat.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,68 @@
+#####################################################################################################################################################
+#Purpose: Start a BLAT gfServer if one is not already running, query it with gfClient, then stop the server
+#Date: 07-19-2013
+#####################################################################################################################################################
+use Getopt::Long;
+#reading input arguments
+&Getopt::Long::GetOptions(
+'BLAT_PATH=s'=> \$blatpath,
+'REF_FILE=s'=> \$reffile,
+'INPUT_FILE=s' => \$inputfile,
+'OUTPUT_FILE=s' => \$outputfile,
+'MIN_SCORE=s'=> \$minScore,
+'MIN_IDENTITY=s'=> \$minidentity,
+'BLAT_PORT=s'=>\$blatport
+);
+$blatpath =~ s/\s|\t|\r|\n//g;
+$reffile=~ s/\s|\t|\r|\n//g;
+$inputfile=~ s/\s|\t|\r|\n//g;
+$outputfile=~ s/\s|\t|\r|\n//g;
+$minScore=~ s/\s|\t|\r|\n//g;
+$minidentity=~ s/\s|\t|\r|\n//g;
+$blatport=~ s/\s|\t|\r|\n//g;
+#input arguments
+
+#checking for missing arguments
+if($blatport eq "" || $blatpath eq "" || $reffile eq "" || $inputfile eq "" || $outputfile eq "" || $minScore eq "" || $minidentity eq "")
+{
+	die "missing arguments\n USAGE : perl run_blat.pl -BLAT_PORT <BLAT_PORT> -MIN_SCORE <MIN_SCORE> -MIN_IDENTITY <MIN_IDENTITY> -BLAT_PATH <PATH TO BLAT FOLDER> -REF_FILE <PATH TO 2bit file> -INPUT_FILE <INPUT CONFIG FILE> -OUTPUT_FILE <OUTPUT FILE>\n";
+}
+
+#parsing the arguments
+
+#unless(-d $outdir)
+#{
+#	system("mkdir -p $outdir");
+#}
+$status=`$blatpath/gfServer status localhost $blatport |wc -l`;
+chomp($status);
+$count = 0;
+while($status < 2 )
+{
+	if($count > 0)
+	{
+		$blatport = $blatport+int(rand(1000))+1;
+	}
+	print "Starting the server\n";
+	$sys ="$blatpath/gfServer start -canStop localhost $blatport $reffile &";
+	print "$sys\n";
+	system($sys);
+	sleep(300);
+	$status=`$blatpath/gfServer status localhost $blatport |wc -l`;
+	chomp($status);
+	$count++;
+	if($count > 5)
+	{
+		die "something wrong with gfServer or command. Failed 5 times\n";
+	}
+}	
+print "querying \n";
+$sys = "$blatpath/gfClient localhost $blatport / $inputfile $outputfile -minScore=$minScore -minIdentity=$minidentity";
+print "$sys\n";
+system($sys);
+print "stopping the server\n";
+#$sys = "$blatpath/gfServer stop localhost $blatport";
+$pid = `ps|grep gfServer|head -1|cut -f1 -d ' '`;
+$sys ="kill -9 $pid";
+print "$sys\n";
+system($sys);
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/2.4/src/standalone_blat2.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,267 @@
+#!/usr/bin/perl -sw
+use Getopt::Long;
+sub usage(){
+    print "
+    Usage: <VCF> -g <genome.2bit> -seq|s <seq.fa> -f <genome.fa>
+	-o out.vcf
+	-n contig.names
+        -dist   distance on each side of the reported position to search for the breakpoint [50]\n
+	-v	verbose option
+        Requires samtools, bedTools, and blat in your path\n
+        ";
+    die;
+}
+#Initialize values
+my ($blat,$genome,$tei_bed,$vntr_bed,$out_vcf,$contig_names,$contig,$fasta,$uninformative_contigs,$dist,$verbose,$bedTools,$samtools);
+GetOptions ("genome|g=s" => \$genome,
+            "o|out:s" => \$out_vcf,
+            "names|n:s" => \$contig_names,
+            "seq|s=s" => \$contig,
+            "f|fasta:s" => \$fasta,
+	    "b|bad:s" => \$uninformative_contigs,
+            "dist:s" => \$dist,
+	    "v" => \$verbose
+	    );
+#$genome="/data2/bsi/reference/db/hg19.2bit"
+#$blat="/projects/bsi/bictools/apps/alignment/blat/34/blat" ;
+#TEI.bed=egrep "LINE|SINE|LTR" /data5/bsi/epibreast/m087494.couch/Reference_Data/Annotations/hg19.repeatMasker.bed >TEI.bed
+#VNTR_BED=egrep "Satellite|Simple_repeat" /data5/bsi/epibreast/m087494.couch/Reference_Data/Annotations/hg19.repeatMasker.bed > VNTR.bed
+
+
+$blat=`which blat`;
+if (!$blat) {die "You do not have BLAT in your path\n\n"}
+$samtools=`which samtools`;
+if (!$samtools) {die "You do not have samtools in your path\n\n"}
+$bedTools=`which sortBed`;
+if (!$bedTools) {die "You do not have bedTools in your path\n\n"}
+
+
+if (!$dist) {$dist=50}
+if (!$out_vcf) {$out_vcf="out.vcf"}
+if (!$contig_names) {$contig_names="contig.names"}
+if (!$uninformative_contigs) {$uninformative_contigs="uninformative.contigs"}
+
+if ((!$genome)||(!$contig)||(!$fasta)){&usage;die;}
+
+
+open(VCF,"$ARGV[0]") or die "must specify VCF file\n\n";
+open(OUT_VCF,">",$out_vcf) or die "can't open the output VCF\n";
+open(CONTIG_LIST,">",$contig_names) or die "can't open the contig names\n";
+open(BAD_CONTIG_LIST,">",$uninformative_contigs) or die "can't open the uninformative contigs file\n";
+#print "writing to CONTIG_LIST=$contig_names\n";
+while (<VCF>) {
+    if($_=~/^#/){
+        if ($.==1) {
+            print OUT_VCF $_;
+            print OUT_VCF "##INFO=<ID=STRAND,Number=1,Type=String,Description=\"Strand to which assembled contig aligned\">\n";
+            print OUT_VCF "##INFO=<ID=CONTIG,Number=1,Type=String,Description=\"Name of assembled contig matching event\">\n";
+            print OUT_VCF "##INFO=<ID=MECHANISM,Number=1,Type=String,Description=\"Proposed mechanism of how the event arose\">\n";
+            print OUT_VCF "##INFO=<ID=INSLEN,Number=1,Type=Integer,Description=\"Length of insertion\">\n";
+            print OUT_VCF "##INFO=<ID=HOM_LEN,Number=1,Type=Integer,Description=\"Length of microhomology\">\n"; 
+            next;
+        }
+    else {
+        print OUT_VCF $_;
+        next;
+        }
+    };
+    chomp;
+
+    ##look for exact location of BP
+    @line=split("\t",$_);
+    my($left_chr,$start,$end);
+
+    #Get left position
+    $left_chr=$line[0];
+    $start=$line[1]-$dist;
+    $end=$line[1]+$dist;
+
+    #Get right position
+    my ($mate_pos,@mate,$mate_chr,$mate_start,$mate_end);
+    $mate_pos=$line[4];
+    $mate_pos=~s/[\[|\]|A-Z]//g;
+    #print "mate_pos=$mate_pos\n";
+    @mate=split(/:/,$mate_pos);
+    $mate_chr=$mate[0]; $mate_pos=$mate[1];
+    $mate_start=$mate_pos-$dist;$mate_end=$mate_pos+$dist;
+    #print "$left_chr:$start-$end\n$mate_chr:$mate_start-$mate_end\n";
+    
+    #Run through blat
+    my ($result1,$result2);
+    my $target1=join("",$left_chr,":",$start,"-",$end);
+    my $target2=join("",$mate_chr,":",$mate_start,"-",$mate_end);
+    #print "target1=$target1\ttarget2=$target2\n";die;
+    $result1=get_result($target1);
+    $result2=get_result($target2);
+   
+
+    my $NOV_INS="";
+    #If there is a NOV_INS, then there shouldn't be any output, so trick the results
+    if ($_=~/EVENT=NOV_INS/) {
+        $mate_start=$start;
+        $NOV_INS="true";
+        if (!$result1) {$result1=join("\t","0","0","0","0","0","0","0","0","+","UNKNOWN_NODE","0","0",$dist);}
+        if (!$result2) {$result2=join("\t","0","0","0","0","0","0","0","0","+","UNKNOWN_NODE","0","0",$dist);}
+   }
+    
+    #Skip over events that aren't supported
+    if ((!$result1)||(!$result2)){
+	my @tmp1=split("\t",$result1);
+	my @tmp2=split("\t",$result2);
+	if ($tmp1[9]) {print BAD_CONTIG_LIST "$tmp1[9]\n"}
+	if ($tmp2[9]) {print BAD_CONTIG_LIST "$tmp2[9]\n" }
+	next;
+    }
+    #Parse blat results   
+    my @result1=split("\t",$result1);
+    my @result2=split("\t",$result2);
+if($result2[9] ne $result1[9]){print "$result2[9] != $result1[9]\n";next}
+    #print "@result1\n@result2\n";die;
+    my $pos1=$start+($result1[12]-$result1[11]);
+    my $pos2=$mate_start+($result2[12]-$result2[11]);
+    #print "$_\n$pos1\t$pos2\n";
+    
+    ##############################################################
+    ### Build Classifier
+    
+    my ($QSTART1,$QEND1,$QSTART2,$QEND2,$len,$MECHANISM, $INSERTION, $DELETION, $bed_res1,$bed_res2);
+    $MECHANISM="UNKNOWN";
+    $len="UNKNOWN";
+    #Make sure the later event is second
+    if ($result1[11] <  $result2[11]){
+	$QSTART1=$result1[11];
+	$QEND1=$result1[12];
+	$QSTART2=$result2[11];
+	$QEND2=$result2[12];
+    }
+    else{
+	$QSTART1=$result2[11];
+	$QEND1=$result2[12];
+	$QSTART2=$result1[11];
+	$QEND2=$result1[12];
+    }
+    #Now calculate the difference between $QEND1 and QSTART2
+    if($verbose){print "QEND1=$QEND1\tQSTART2=$QSTART2\n";}
+    $len=$QEND1-$QSTART2;
+    #Check for TEI
+    if($_=~/MECHANISM=TEI/){$MECHANISM="TEI"}
+    elsif($_=~/MECHANISM=VNTR/){$MECHANISM="VNTR"}
+    else{
+        if ($len==0) {$MECHANISM="NHEJ"}
+        else{
+            if ($len>0){$INSERTION="true"}
+            if ($len<0){$DELETION="true"}
+            if ($INSERTION){
+                if ($len>10) {$MECHANISM="FOSTES"}
+                else{$MECHANISM="NHEJ"}
+            }
+            elsif ($DELETION){ # $len<0 here, so the two tests below never match; abs($len) may have been intended
+                if ($len>100) {$MECHANISM="NAHR"}
+                elsif ($len > 2){$MECHANISM="altEJ"}
+                else{$MECHANISM="NHEJ"}
+            }
+        }
+    }
+
+    
+#if ($verbose){print "@result1";print "@result2";}
+
+    #print out VCF
+    #############################################################
+    # create temporary variable name
+    #############################################################
+    srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
+    my $random_name=join "", map { ("a".."z")[rand 26] } 1..8;
+    my $random_name2=join "", map { ("a".."z")[rand 26] } 1..8;
+   
+   #Get Ref Base
+   my ($ref_base,$alt_base,$tmp_mate_pos);
+   $ref_base=getBases($left_chr,$pos1,$fasta);
+   $alt_base=getBases($mate_chr,$pos2,$fasta);#print "ALT=$alt_base\n";
+   #Substitute the new mate position and base
+   $tmp_mate_pos=$line[4];
+   $tmp_mate_pos=~s/$mate_pos/$pos2/;
+   $tmp_mate_pos=~s/[A-Z]/$alt_base/;
+   #split apart the INFO field to adjust the ISIZE and MATEID
+   my $NEW_INFO="";
+   my @INFO=split(/;/,$line[7]);
+   for (my $i=0;$i<@INFO;$i++){
+        if ($INFO[$i] =~ /^ISIZE=/){
+            my @tmp=split(/=/,$INFO[$i]);
+            $NEW_INFO.="ISIZE=";
+            my $new_ISIZE=$pos2-$pos1;
+            $NEW_INFO.=$new_ISIZE
+            }
+        elsif($INFO[$i] =~ /^MATE_ID=/){
+            $NEW_INFO.=";MATE_ID=".$random_name2 . ";";
+        }
+        else{
+            $NEW_INFO.=$INFO[$i].";";
+        }
+   }
+   #ADD in strand and name
+   $NEW_INFO.="STRAND=".$result1[8];
+   $NEW_INFO.=";CONTIG=".$result1[9];
+   if($MECHANISM!~/TEI|VNTR/){$NEW_INFO.=";MECHANISM=".$MECHANISM;}
+    $NEW_INFO.=";HOM_LEN=".$len;
+   #don't print contig name if it's a novel insertion
+   if(!$NOV_INS){print CONTIG_LIST "$result1[9]\n";}#else{print "I'm not printing $result1[9]\n";}
+    print OUT_VCF "$left_chr\t$pos1\t$random_name\t$ref_base\t$tmp_mate_pos\t1000\tPASS\t$NEW_INFO\t$line[8]\t$line[9]\n";
+    #Now go through and fill info in for mate
+    #Substitute the new mate position and base
+   $tmp_mate_pos=$line[4];
+   $tmp_mate_pos=~s/$mate_pos/$pos1/;
+   $tmp_mate_pos=~s/[A-Z]/$ref_base/;
+   $tmp_mate_pos=~s/$mate_chr/$left_chr/;
+    $NEW_INFO="";
+    @INFO=split(/;/,$line[7]);
+   for (my $i=0;$i<@INFO;$i++){
+    if ($INFO[$i] =~ /^ISIZE=/){
+            my @tmp=split(/=/,$INFO[$i]);
+            $NEW_INFO.="ISIZE=";
+            my $new_ISIZE=$pos2-$pos1;
+            $NEW_INFO.=$new_ISIZE
+            }
+        elsif($INFO[$i] =~ /^MATE_ID=/){
+            $NEW_INFO.=";MATE_ID=".$random_name.";";
+        }
+        else{
+            $NEW_INFO.=$INFO[$i].";";
+        }
+   }
+    #ADD in strand and name
+   $NEW_INFO.="STRAND=".$result2[8];
+   $NEW_INFO.=";CONTIG=".$result2[9];
+   if ($MECHANISM!~/TEI|VNTR/){$NEW_INFO.=";MECHANISM=".$MECHANISM;}
+    $NEW_INFO.=";HOM_LEN=".$len;
+
+   #don't print contig name if it's a novel insertion
+   if(!$NOV_INS){print CONTIG_LIST "$result2[9]\n";} #else{print "I'm not printing $result1[9]\n";}
+    print OUT_VCF "$mate_chr\t$pos2\t$random_name2\t$alt_base\t$tmp_mate_pos\t1000\tPASS\t$NEW_INFO\t$line[8]\t$line[9]\n";
+	if ($verbose){print  "$mate_chr\t$pos2\t$random_name2\t$alt_base\t$tmp_mate_pos\t1000\tPASS\t$NEW_INFO\t$line[8]\t$line[9]\n";}
+}
+close VCF;
+close OUT_VCF;
+close CONTIG_LIST;
+close BAD_CONTIG_LIST;
+sub get_result{
+        my $target=($_[0]);
+if($verbose){print "target=$target\n"}#;die;
+        my $cmd="blat $genome:$target $contig /dev/stdout -t=dna -q=dna -noHead|egrep -v \"Searched|Loaded\" |head -1";
+
+if ($verbose){print "$cmd\n"}        #print "$cmd\n";die;
+        my $result=`$cmd`;
+        next if (!$cmd);
+        return ($result);
+}
+sub getBases{
+        my ($chr,$pos1,$fasta)=@_;
+        my @result=();
+        if ($pos1 <0){print "$pos1 is not a valid position (likely caused by circular MT chromosome)\n";$result[1]="NA";};
+        @result = `samtools faidx $fasta $chr:$pos1-$pos1`;
+        if(!$result[1]){$result[1]="NA"};
+        chomp($result[1]);
+        return uc($result[1]);
+}
+
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/fasta_indexes.loc.sample	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,29 @@
+#This is a sample file distributed with Galaxy that enables tools
+#to use a directory of Samtools indexed sequences data files.  You will need
+#to create these data files and then create a fasta_indexes.loc file
+#similar to this one (store it in this directory) that points to
+#the directories in which those files are stored. The fasta_indexes.loc
+#file has this format (white space characters are TAB characters):
+#
+# <unique_build_id>	<dbkey>	<display_name>	<file_base_path>
+#
+#So, for example, if you had the hg19 Canonical index stored in
+#
+# /depot/data2/galaxy/hg19/sam/,
+#
+#then the fasta_indexes.loc entry would look like this:
+#
+#hg19canon	hg19	Human (Homo sapiens): hg19 Canonical	/depot/data2/galaxy/hg19/sam/hg19canon.fa
+#
+#and your /depot/data2/galaxy/hg19/sam/ directory
+#would contain hg19canon.fa and hg19canon.fa.fai files.
+#
+#Your fasta_indexes.loc file should include an entry per line for
+#each index set you have stored.  The file in the path does actually
+#exist, but it should never be directly used. Instead, the name serves
+#as a prefix for the index file.  For example:
+#
+#hg18canon	hg18	Human (Homo sapiens): hg18 Canonical	/depot/data2/galaxy/hg18/sam/hg18canon.fa
+#hg18full	hg18	Human (Homo sapiens): hg18 Full	/depot/data2/galaxy/hg18/sam/hg18full.fa
+#hg19canon	hg19	Human (Homo sapiens): hg19 Canonical	/depot/data2/galaxy/hg19/sam/hg19canon.fa
+#hg19full	hg19	Human (Homo sapiens): hg19 Full	/depot/data2/galaxy/hg19/sam/hg19full.fa
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/softsearch/SoftSearch.pl	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,1192 @@
+#!/usr/bin/perl
+
+####
+#### Usage: SoftSearch.pl [-lqrmsd] -b <BAM> -f <Genome.fa> -sam <samtools path> -bed <bedtools path>
+#### Created 1-30-2012 by Steven Hart, PhD
+#### hart.steven@mayo.edu
+#### Requires bedtools & samtools to be in path
+
+
+use lib "/home/plus91/shed_tools/toolshed.g2.bx.psu.edu/repos/plus91-technologies-pvt-ltd/softsearch/2.4/lib" ;
+
+use Getopt::Long;
+use strict;
+use warnings;
+#use Data::Dumper;
+use LevD;
+use File::Basename;
+
+my ($INPUT_BAM,$INPUT_FASTA,$OUTPUT_FILE,$minSoft,$minSoftReads,$dist_To_Soft,$bedtools,$samtools);
+my ($minRP, $temp_output, $num_sd, $MapQ, $chrom, $unmated_pairs, $minBQ, $pair_only, $disable_RP_only);
+my ($levD_local_threshold, $levD_distl_threshold,$pe_upper_limit,$high_qual,$sv_only,$blacklist,$genome_file,$verbose);
+
+my $cmd = "";
+
+#Declare variables
+GetOptions(
+	'b=s' => \$INPUT_BAM,
+	'f=s' => \$INPUT_FASTA,
+	'o:s' => \$OUTPUT_FILE,
+	'm:i' => \$minRP,
+	'l:i' => \$minSoft,
+	'r:i' => \$minSoftReads,
+	't:i' => \$temp_output,
+	's:s' => \$num_sd,
+	'd:i' => \$dist_To_Soft,
+	'q:i' => \$MapQ,
+	'c:s' => \$chrom,
+	'u:s' => \$unmated_pairs,
+	'x:s' => \$minBQ,
+	'p' => \$pair_only,
+	'g' => \$disable_RP_only,
+	'j:s' => \$levD_local_threshold,
+	'k:s' => \$levD_distl_threshold,
+        'a:s' => \$pe_upper_limit,
+        'e:s' => \$high_qual,
+	'L' => \$sv_only,
+	'v' => \$verbose, 
+	'blacklist:s' => \$blacklist,
+	'genome_file:s' => \$genome_file,
+	"help|h|?"	=> \&usage);
+
+unless($sv_only){$sv_only=""};
+if(defined($INPUT_BAM)){$INPUT_BAM=$INPUT_BAM} else {print usage();die "Where is the BAM file?\n\n"}
+if(defined($INPUT_FASTA)){$INPUT_FASTA=$INPUT_FASTA} else {print usage();die "Where is the fasta file?\n\n"}
+my ($fn,$pathname) = fileparse($INPUT_BAM,".bam");
+my $index=`ls $pathname/$fn*bai|head -1`;
+#my $index =`ls \${INPUT_BAM%.bam}*bai`;
+#print "INDEX=$index\n";
+if(!$index){die "\n\nERROR: you need to index your BAM file\n\n"}
+
+### get current time
+print "Start Time : " . &spGetCurDateTime() . "\n";
+my $now = time;
+
+#if(defined($OUTPUT_FILE)){$OUTPUT_FILE=$OUTPUT_FILE} else {$OUTPUT_FILE="output.vcf"; print "\nNo outfile specified.  Using output.vcf as default\n\n"}
+if(defined($minSoft)){$minSoft=$minSoft} else {$minSoft=5}
+if(defined($minRP)){$minRP=$minRP} else {$minRP=5}
+if(defined($minSoftReads)){$minSoftReads=$minSoftReads} else {$minSoftReads=5}
+if(defined($dist_To_Soft)){$dist_To_Soft=$dist_To_Soft} else {$dist_To_Soft=300}
+if(defined($num_sd)){$num_sd=$num_sd} else {$num_sd=6}
+if(defined($MapQ)){$MapQ=$MapQ} else {$MapQ=20}
+
+unless (defined $pe_upper_limit) { $pe_upper_limit = 10000; }
+unless (defined $levD_local_threshold) { $levD_local_threshold = 0.05; }
+unless (defined $levD_distl_threshold) { $levD_distl_threshold = 0.05; }
+#Get sample name if available
+my $SAMPLE_NAME="";
+my $OUTNAME ="";
+$SAMPLE_NAME=`samtools view -f2 -H $INPUT_BAM|awk '{if(\$1~/^\@RG/){sub("ID:","",\$2);name=\$2;print name}}'|head -1`;
+$SAMPLE_NAME=~s/\n//g;
+if (!$OUTPUT_FILE){
+	if($SAMPLE_NAME ne ""){$OUTNAME=$SAMPLE_NAME.".vcf"}
+	else {$OUTNAME="output.vcf"}
+}
+else{$OUTNAME=$OUTPUT_FILE}
+
+print "Writing results to $OUTNAME\n";
+
+
+##If submitting on SGE, make sure to prepend the "chr".  Not all reference FASTA files require "chr", so we shouldn't force the issue.
+if(!defined($chrom)){$chrom=""}
+if(!defined($unmated_pairs)){$unmated_pairs=0}
+
+my $badQualValue=chr($MapQ);
+if(defined($minBQ)){ $badQualValue=chr($minBQ); }
+
+if($badQualValue  eq "#"){$badQualValue="\#"}
+
+# checking for samtools and bedtools in the PATH
+## check for bedtools and samtools in the path
+$bedtools=`which intersectBed` ;
+if(!$bedtools){die "\nError:\n\tno bedtools. Please install bedtools and add to the path\n";}
+#$samtools=`samtools 2>&1`;
+$samtools=`which samtools`;
+if($samtools !~ /(samtools)/i){die "\nError:\n\tno samtools. Please install samtools and add to the path\n";}
+
+print "Usage = SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -s $num_sd -c $chrom -b $INPUT_BAM -f $INPUT_FASTA -o $OUTNAME \n\n";
+sub usage {
+	print "\nusage: SoftSearch.pl [-cqlrmsd] -b <BAM> -f <Genome.fa> \n";
+	print "\t-q\t\tMinimum mapping quality [20]\n";
+	print "\t-l\t\tMinimum length of soft-clipped segment [5]\n";
+	print "\t-r\t\tMinimum depth of soft-clipped reads at position [5]\n";
+	print "\t-m\t\tMinimum number of discordant read pairs [5]\n";
+	print "\t-s\t\tNumber of sd away from mean to be considered discordant [6]\n";
+	print "\t-u\t\tNumber of unmated pairs [0]\n";
+	print "\t-d\t\tMax distance between soft-clipped segments and discordant read pairs [300]\n";
+	print "\t-o\t\tOutput file name [output.vcf]\n";
+	print "\t-t\t\tPrint temp files for debugging [no|yes]\n";
+	print "\t-c\t\tuse only this chrom or chr:pos1-pos2\n";
+	print "\t-p\t\tuse paired-end mode only. In other words, don't try to find soft-clipping events!\n";
+	print "\t-g\t\tEnable paired-only search. This will look for discordant read pairs even without soft clips.\n";
+        print "\t-a\t\tset the minimum distance for a discordant read pair without soft-clipping info [10000]\n";
+        print "\t-L\t\tFlag to print out even small deletions (low quality)\n";
+        print "\t-e\t\tdisable strict quality filtering of base qualities in soft-clipped reads [no]\n";
+        print "\t-blacklist\tareas of the genome to skip calling.  Requires -genome_file\n";
+        print "\t-genome_file\ttab separated value of chromosome name and length.  Only used with -blacklist option\n\n";
+
+	exit 1;
+	}
+
+
+#############################################################
+# create temporary variable name
+#############################################################
+srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
+our $random_name=join "", map { ("a".."z")[rand 26] } 1..8;
+
+#############################################################
+## create green list
+##############################################################
+#
+my $new_blacklist="";
+if($blacklist){
+        if(!$genome_file){die "if using a blacklist, you must also specify the location of a genome_file
+        The format of the genome_file should be
+                chrom   size
+                chr1    249250621
+                chr2    243199373
+                ...
+
+        If using hg19, you can get the genome file by
+                mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \"select chrom, size from hg19.chromInfo\"  > hg19.genome";}
+        
+	$cmd=join("","complementBed -i $blacklist -g $genome_file >",$random_name,".bed") ;
+	system ($cmd);
+	$new_blacklist=join(""," -L ",$random_name,".bed ");
+	}
+
+if($verbose){print "CMD=$cmd\nBlacklist is $new_blacklist\n";}
+
+
+
+
+
+#############################################################
+# Calculate insert size distribution of properly mated reads
+#############################################################
+
+#Change for compatibility with other operating systems
+#my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|head -10000|cut -f9|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)**2)}'`;
+
+my $metrics=`samtools view -q $MapQ -f2 $INPUT_BAM $chrom|cut -f9|head -10000|awk '{if (\$1<0){\$1=-\$1}else{\$1=\$1} sum+=\$1; sumsq+=\$1*\$1} END {print sum/NR, sqrt(sumsq/NR - (sum/NR)^2)}'`;
+#my ($mean,$stdev)=split(/ /,$metrics);
+my ($mean,$stdev)=split(/\s/,$metrics);
+$stdev=~s/\n//;
+my $upper_limit=int($mean+($num_sd*$stdev));
+my $lower_limit=int($mean-($num_sd*$stdev));
+die if (!$mean);
+print qq{The mean insert size is $mean +/- $stdev (sd)
+The upper limit = $upper_limit
+The lower limit = $lower_limit\n
+};
+if($lower_limit<0){
+	print "Warning!! Given this insert size distribution, we can not call small indels.  No other data will be affected\n";
+	$lower_limit=1;
+}
+my $tmp_name=join ("",$random_name,".tmp.bam");
+my $random_file_sc = "";
+my $command = "";
+
+#############################################################
+# Make sam file that has soft clipped reads
+#############################################################
+#give file a name
+if(!defined($pair_only)){
+	$random_file_sc=join ("",$random_name,".sc.sam");
+	$command=join ("","samtools view -q $MapQ -F 1024 $INPUT_BAM $chrom $new_blacklist| awk '{OFS=\"\\t\"}{c=0;if(\$6~/S/){++c};if(c == 1){print}}' | perl -ane '\$TR=(\@F[10]=~tr/\#//);if(\$TR<2){print}' > ", $random_file_sc);
+
+	print "Making SAM file of soft-clipped reads\n";
+if($verbose){	print "$command\n";}
+	system("$command");
+
+	#############################################################
+	# Find areas that have deep enough soft-clip coverage
+	print "Identifying soft-clipped regions that are at least $minSoft bp long \n";
+	open (FILE,"$random_file_sc")||die "Can't open soft-clipped sam file $random_file_sc\n";
+
+	my $tmpfile=join("",$random_file_sc,".sc.passfilter");
+	open (OUT,">$tmpfile")||die "Can't write files here!\n";
+
+	while(<FILE>){
+		@_ = split(/\t/, $_);
+		#### parse CIGAR string and create a hash of array of each operation
+		my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+		my $hash;
+		map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+		#for ($i=0; $i<=$#softclip_pos; $i++)	{
+		foreach my $softclip (@{$hash->{S}}) {
+			#if	($CIGAR[$softclip_pos[$i]] > $minSoft){
+			if	($softclip > $minSoft){
+				###############Make sure base qualities don't have more than 2 bad marks
+				my $qual=$_[10];
+				my $TR=($qual=~tr/$badQualValue//);
+				if($badQualValue eq "#"){ $TR=($qual=~tr/\#//); }
+				#Skip the soft clip if there is more than 2 bad qual values
+				#next if($TR > 2);
+#				if (!$high_qual){next if($TR > 2);}
+				print OUT;
+				last;
+			}
+		}
+	}
+	close FILE;
+	close OUT;
+
+	$command=join(" ","mv",$tmpfile,$random_file_sc);
+if($verbose){	print "$command\n";}
+	system("$command");
+}
+
+#########################################################
+#Stack up SoftClips
+#########################################################
+my $random_file=join("",$random_name,".sc.direction.bed");
+if(!defined($pair_only)){
+        open (FILE,"$random_file_sc")|| die "Can't open sam file\n";
+        #$random_file=join("",$random_name,".sc.direction");
+
+        print "Calling sides of soft-clips\n";
+        #\nTMPOUT=$random_file\tINPUT=$random_file_sc\n\n";
+        open (TMPOUT,">$random_file")|| die "Can't create tmp file\n";
+
+        while (<FILE>){
+                @_ = split(/\t/, $_);
+                #### parse CIGAR string and create a hash of array of each operation
+                my @CIGAR = split(/([0-9]+[SMIDNHXP])/, $_[5]);
+                my $hash;
+                map { push(@{$hash->{$2}}, $1) if (/(\d+)([SMIDNHXP])/) } @CIGAR;
+
+                #### next if softclips on each end
+                next if ($_[5] =~ /^[0-9]+S.*S$/);
+
+                #### next if softclip occurs in the middle
+                next if ($_[5] =~ /^[0-9]+[^S][0-9].*S.+$/);
+
+                my $softclip = $hash->{S}[0];
+
+                my $end1 = 0;
+                my $end2 = 0;
+                my $softBases = "";
+		my $right_corrected="";my $left_corrected="";
+
+        if ($softclip > $minSoft) {
+		
+                        ####If the soft clip occurs at end of read and its on the minus strand, then it's a right clip
+                        if ($_[5] =~ /^.*S$/) {
+                                $end1=$_[3]+length($_[9])-$softclip-1;
+                                $end2=$end1+1;
+next if ($end1<0);
+                                #RIGHT clip on Minus
+                                $softBases=substr($_[9], length($_[9])-$softclip, length($_[9]));
+                                #Right clips don't always get clipped correctly, so fix that
+                                # Check to see if sc base matches ref
+                                $right_corrected=baseCheck($_[2],$end2,"right",$softBases);
+                               print TMPOUT "$right_corrected\n"
+
+                        } else {
+                                #### Begins with S (left clip)
+                                $end1=$_[3]-$softclip;
+next if ($end1<0);
+
+                                $softBases=substr($_[9], 0,$softclip);#print "TMP=$softBases\n";
+        			$left_corrected=baseCheck($_[2],$end1,"left",$softBases);
+if(!$left_corrected){print "baseCheck($_[2],$end1,left,$softBases)\n";next}
+                               print TMPOUT "$left_corrected\n";
+#print "\nSEQ=$_[9]\t\n";
+
+                        }
+        }
+  }
+close FILE;
+close TMPOUT;
+}
+sub baseCheck{
+        my ($chrom,$pos,$direction,$softBases)=@_;
+        #skip if position is less than 0, which is caused by MT DNA
+        return if ($pos<0);
+        my $exit="";
+
+        while(!$exit){
+        if($direction=~/right/){
+                        my $refBase=getSeq($chrom,$pos,$INPUT_FASTA);
+                        my $softBase=substr($softBases,0,1);
+                        if ($softBase !~ /$refBase/){
+                                my $value=join("\t",$chrom,$pos,$pos+1,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos+1;
+                                $softBases=substr($softBases, 1,length($softBases));
+                        }
+         }
+        else{
+                        my $refBase=getSeq($chrom,$pos+1,$INPUT_FASTA);
+                        my $softBase=substr($softBases,-1,1);
+                        if ($softBase !~ /$refBase/){
+                                $pos=$pos-1+length($softBases);
+                                my $value=join("\t",$chrom,$pos-1,$pos,join("|",$softBases,$direction));
+                                $exit="STOP";
+                                return $value;
+                        }
+                        else{
+                                $pos=$pos-1;
+                                $softBases=substr($softBases, 0, -1);
+                                #print "Trying again $softBases\n";
+                       }
+
+        }
+
+}
+}
+#Remove SAM files to conserve space
+unlink($random_file_sc);
+
+
+my $random_file_disc="$INPUT_BAM";
+###
+#
+######################################################
+# Transform Read pair groups into softclip equivalents
+######################################################
+#
+#
+#
+my $v="";
+#if($disable_RP_only){
+print "Running Bam2pair.pl\n";
+print "Looking for discordant read pairs without requiring soft-clipping information\n";
+	use FindBin qw($Bin);
+	my $path=$Bin;
+#	print"\n\nPATH=$path\n\n";
+if($verbose){$v="-v"}
+	my $tmp_out=join("",$random_name,"RP.out");
+	$command=join("","perl ",$path,"/Bam2pair.pl -b $random_file_disc  -o $tmp_out -isize $pe_upper_limit -winsize $dist_To_Soft -min $minRP -chrom $chrom -prefix $random_name -q $MapQ -blacklist $random_name.bed $v");
+if($verbose){	print "$command\n"};
+	system("$command");
+	$command=join("","perl -ane '\$end1=\@F[1];\$end2=\@F[3];print join(\"\\t\",\@F[0..1],\$end1,\"unknown|left\");print \"\\n\";print join(\"\\t\",\@F[2..3],\$end2,\"unknown|left\");print \"\\n\"' ", $tmp_out," >> ",$random_file);
+if($verbose){print "$command\n"};
+	system($command);
+	unlink($tmp_out);
+#}
+#
+
+
+######################################################
+unlink("$random_file","$tmp_name","$index","$random_name","$new_blacklist") if (-z $random_file || ! -e $random_file ) ;
+if (-z $random_file || ! -e $random_file){
+	print "Softclipped file is empty ($random_file).\nNo soft clipping found using the desired parameters\n\n";
+	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+	}
+
+
+#############################################################
+#  Make sure there are enough soft-clipped supporting reads
+#############################################################
+my $outfile=join("",$random_file,".sc.merge.bed");
+#sortbed -i .sc.direction | mergeBed -nms -d 25 -i stdin > .sc.merge.bed
+$command=join(" ","sortBed -i",$random_file," | mergeBed  -nms -i stdin","|egrep \";|,\"","|awk '{OFS=\"\t\"}(NF==4)'",">",$outfile);
+
+print "$command\n" if ($verbose);
+system("$command");
+
+if (-z $outfile || ! -e $outfile){
+	unlink("$tmp_name","$random_file","$outfile","$index","$random_name","$new_blacklist"); 
+	print "mergeBed file is empty.\nNo structural variants found\n\n";
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed mergeBed\n";
+
+###############################################################
+# If left and right are on the same line, make into 2 lines
+###############################################################
+open (INFILE,$outfile)||die "couldn't open temp file : $! \n\n";
+my $tmp2=join("",$random_name,".sc.fixed.merge.bed");
+#print "INFILE=$outfile\tOUTFILE=$tmp2\n\n";
+#INPUT FORMAT=chr9\t131467\t131473\tATGCTTATTAAAA|left;TTATTAAAAGCATA|left
+open (OUTFILE,">$tmp2")||die "couldn't create temp file : $! \n\n";
+while(<INFILE>){
+	chomp $_;
+	my $l = $_;
+
+	my @a = split(/\t/, $l);
+	my $info = $a[3];
+	my @info_arr = split(/\;/, $info);
+	my @left_arr=();
+	my @right_arr=();
+	@left_arr = grep(/left/, @info_arr);
+	@right_arr = grep(/right/, @info_arr);
+
+	#New
+	my $left = join(";", @left_arr);
+	my $right = join(";", @right_arr);
+	$info = join(";", @info_arr);
+
+	if((@left_arr) && (@right_arr)){
+		print OUTFILE "$a[0]\t$a[1]\t$a[2]\t$left\n$a[0]\t$a[1]\t$a[2]\t$right\n";
+	} else{
+		my $all=join("\t",@a[0..2],$info);
+		print OUTFILE "$all\n";
+	}
+}
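+close INFILE; close OUTFILE;	# make sure $tmp2 is fully written before the shell pipeline below reads it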
+# make sure output file name is $outfile
+$command=join(" ","sed -e '/ /s//\t/g'", $tmp2,"|awk 'BEGIN{OFS=\"\\t\"}(NF==4)'", "|perl -pne 's/ /\t/g'>",$outfile);
+system("$command");
+if($verbose){print "$command\n"};
+unlink("$tmp_name","$random_file","$tmp2","$outfile","$index","$random_name","$new_blacklist") if (-z $outfile || ! -e $outfile) ;
+ if (-z $outfile || ! -e $outfile){
+	print "Fixed mergeBed file is empty ($outfile).\nNo structural variants found\n\n";
+        open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+        &print_header();
+        close OUT;
+        exit;
+}
+
+print "completed fixing mergeBed\n\n";
+
+###############################################################
+# Separate directions of soft clips
+###############################################################
+my $left_sc = join("", "left", $tmp2);
+my $right_sc = join("", "right", $tmp2);
+use FindBin qw($Bin);
+#my $path=$Bin;
+
+$command=join("","grep left ", $tmp2, " |sed -e '/left /s//left\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$left_sc);
+system("$command");
+#print "$command\n";
+$command=join("","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g'|perl ".$path."/direction_filter.pl - >",$right_sc);
+#$command=join(" ","grep right ", $tmp2, " |sed -e '/right /s//right\;/g' | sed -e '/ /s//\t/g' >",$right_sc);
+system("$command");
+#print "$command\n";
+#die "CHECK $right_sc\n";
+
+###############################################################
+# Count the number and identify directions of soft clips
+###############################################################
+print "Count the number and identify directions of soft clips\n";
+#print "looking in $outfile\n";
+$outfile=join("",$random_name,".sc.fixed.merge.bed");
+
+open (INFILE,$outfile)||die "couldn't open temp file\n\n";
+my $tmp3 = join("", $random_file, "predSV");
+open (OUTFILE, ">$tmp3")||die "couldn't create temp file\n\n";
+while(<INFILE>){
+chomp;
+	@_=split(/\t/,$_);
+	my $count=tr/\;//;$count+=tr/\,//;
+	$count=$count+1;
+	my $left=0;
+	my $right=0;
+
+	while ($_ =~ /left/g) { $left++ } # count number of left clips
+	while ($_ =~ /right/g) { $right++ } # count number of right clips
+
+	###############################################################
+	if ($count >= $minSoftReads){
+		####get longest soft-clipped read
+		my @clips=split(/\;|,|\|/,$_[3]);
+
+		my ($max, $temp, $temp2, $temp3, $dir, $maxSclip) = (0) x 6;
+		for (my $i=0; $i<@clips; $i++) { # @clips alternates sequence,direction; the extra $i++ at the end of the body steps by pair
+			my $plus1=$i+1;
+			$temp=length($clips[$i]);
+			$temp2=$clips[$plus1];
+			$temp3=$clips[$i];
+
+			if ($temp > $max){
+				$maxSclip=$temp3;
+				$max =$temp;
+				$dir=$temp2;
+			} else {
+				$max=$max;
+				$dir=$dir;
+				$maxSclip=$maxSclip;
+			}
+			$i++;
+		}
+		my $order2 = join("|", $left, $right);
+        #print join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+		print OUTFILE join ("\t",@_[0..2],$maxSclip,$max,$dir,$count,$order2) . "\n";
+	} elsif($_=~/unknown/){
+	print OUTFILE join ("\t",@_[0..2],"NA","NA","left","NA","NA|NA") . "\n";
+        print OUTFILE join ("\t",@_[0..2],"NA","NA","right","NA","NA|NA") . "\n";
+	}
+	####Format is Chrom, start, end, longest soft-clip, length of longest soft-clip, direction of longest soft-clip, #supporting soft-clips, #left Sclips|#right Sclips
+}
+close INFILE;
+close OUTFILE;
+
+unlink("$tmp2","$tmp_name","$random_file","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$new_blacklist") if (-z $tmp3 || !-e $tmp3) ;
+
+ if (-z $tmp3 || !-e $tmp3){
+	print "No structural variants found while counting the number and identifying the directions of soft clips.\n" ;
+
+	open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+	&print_header();
+	close OUT;
+	exit;
+
+}
+
+print "Done counting Softclipped reads\n";
+###############################################################
+#### Print header information
+###############################################################
+open (OUT,">$OUTNAME")||die "Can't write files here!\n";
+&print_header();
+close OUT;
+
+###############################################################
+###############################################################
+#### DO the bulk of the work
+###############################################################
+use List::Util qw(min max);
+open (FILE,"$tmp3")|| die "Can't open file\n";
+open (OUT,">>$OUTNAME")|| die "Can't open file\n";
+
+#print "\nusing $tmp3 and writing to $OUTPUT_FILE \n";
+while (<FILE>){
+	#If left clip {+- or -- or -+ }{+- are uninformative b/c they go upstream}
+	#If right clip {++ or -- or +-}
+	chomp $_;
+	my $line = $_;
+	my @info = split(/\t/, $_);
+
+	if($info[5] eq "left") {
+		bulk_work("left", $line, $random_file_disc);
+
+	} elsif ($info[5] eq "right") {
+		bulk_work("right", $line, $random_file_disc);
+	}
+#if($. ==6){print "THIS IS LINE 6\n$_\n";die}
+print "Completed line $.\n" if ($verbose);
+}
+close FILE;
+close OUT;
+
+###############################################################################
+###############################################################################
+#### Delete temp files
+my $meregedBed=join("",$random_name,".sc.direction.bed.sc.merge.bed");
+
+$temp_output="no" unless defined($temp_output);
+
+if ($temp_output eq "no"){
+	unlink("$tmp_name","$random_file","$tmp2","$tmp3","$outfile","$index","$random_name","$right_sc","$left_sc","$meregedBed","$random_name.bed");
+}
+####Sort VCF
+my $tmp=join(".",$random_name,"tmp");
+#Get header
+$cmd="grep \"#\" $OUTNAME > $tmp";
+system($cmd);
+#sort results
+$cmd="grep -v \"#\" $OUTNAME|perl -pne 's/chr//'|sort -k1,1n -k2,2n|perl -ne 'print \"chr\".\$_' >>$tmp";
+system($cmd);
+$cmd="mv $tmp $OUTNAME";
+system($cmd);
+#remove entries next to each other
+
+
+
+
+#############################################################
+##May not need this anymore since filtering on left and right
+#############################################################
+#my $tmpout=$OUTNAME;
+#$tmpout.=".tmp";
+#use FindBin qw($Bin);
+##my $path=$Bin;
+#$command="perl ".$path."/Extract_nSC.pl $OUTNAME -q nSC > $tmpout";
+##print "Command=$command\n";
+#system($command);
+#$command="perl ".$path."/reduce_redundancy.pl $tmpout $upper_limit |cut -f1-10 > $OUTNAME";
+##print "$command\n";
+#system($command);
+#system("rm $tmpout");
+########################################################
+
+
+
+
+print "Analysis Completed\n\nYou did it!!!\n";
+print "Finish Time : " . &spGetCurDateTime() . "\n";
+$now = time - $now;
+printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600), int(($now % 3600) / 60),
+int($now % 60));
+
+exit;
+
+###############################################################################
+sub rev_comp {
+  my $dna = shift;
+  my $revcomp = reverse($dna);
+  $revcomp =~ tr/ACGTacgt/TGCAtgca/;
+
+  return $revcomp;
+}
+
+
+###############################################################################
+#### to get reference base
+sub getSeq{
+	my ($chr,$pos,$fasta)=@_;
+	#don't require chr
+	#if($chr !~ /^chr/){die "$chr is not correct\n";}
+#	die "$pos is not a number\n" if ($pos <0);
+my @result=();
+        if ($pos <0){print "$pos is not a valid position (likely caused by circular MT chromosome)\n";return;}
+
+	@result = `samtools faidx $fasta $chr:$pos-$pos`;
+	if($result[1]){chomp($result[1]);
+	return uc($result[1]);
+	}
+	return("NA");
+	#### after return will not be printed
+	####print "RESULTS=@result\n";
+}
+
+sub getBases{
+        my ($chr,$pos1,$pos2,$fasta)=@_;
+        #don't require chr
+        #if($chr !~ /^chr/){die "$chr is not correct\n";}
+my @result=();
+        if ($pos1 <0){print "$pos1 is not a valid position (likely caused by circular MT chromosome)\n";return;};
+
+        @result = `samtools faidx $fasta $chr:$pos1-$pos2`;
+	if(!$result[1]){$result[1]="NA"};
+        chomp($result[1]);
+        return uc($result[1]);
+
+        #### after return will not be printed
+        ####print "RESULTS=@result\n";
+}
+###############################################################################
+#### to get time
+sub spGetCurDateTime {
+	my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
+	my $curDateTime = sprintf "%4d-%02d-%02d %02d:%02d:%02d",
+	$year+1900, $mon+1, $mday, $hour, $min, $sec;
+	return ($curDateTime);
+}
+
+
+###############################################################################
+#### print header
+sub print_header {
+	my $date=&spGetCurDateTime();
+	my $header = qq{##fileformat=VCFv4.1
+##fileDate=$date
+##source=SoftSearch.pl
+##reference=$INPUT_FASTA
+##Usage= SoftSearch.pl -l $minSoft -q $MapQ -r $minSoftReads -d $dist_To_Soft -m $minRP -u $unmated_pairs -s $num_sd -b $INPUT_BAM -f $INPUT_FASTA -o $OUTNAME
+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
+##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
+##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
+##INFO=<ID=ISIZE,Number=.,Type=String,Description="Size of the SV">
+##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
+##FORMAT=<ID=lSC,Number=1,Type=Integer,Description="Length of the longest soft clips supporting the BND">
+##FORMAT=<ID=nSC,Number=1,Type=Integer,Description="Number of supporting soft-clips">
+##FORMAT=<ID=uRP,Number=1,Type=Integer,Description="Number of unmated read pairs nearby Soft-Clips">
+##FORMAT=<ID=levD_local,Number=1,Type=Float,Description="Levenshtein distance between soft-clipped bases and the area around the original soft-clipped site">
+##FORMAT=<ID=levD_distl,Number=1,Type=Float,Description="Levenshtein distance between the soft-clipped bases and mate location">
+##FORMAT=<ID=CTX,Number=1,Type=Integer,Description="Number of chromosomal translocations">
+##FORMAT=<ID=DEL,Number=1,Type=Integer,Description="Number of reads supporting Large Deletions">
+##FORMAT=<ID=INS,Number=1,Type=Integer,Description="Number of reads supporting Large insertions">
+##FORMAT=<ID=NOV_INS,Number=1,Type=Integer,Description="Number of reads supporting novel sequence insertion">
+##FORMAT=<ID=TDUP,Number=1,Type=Integer,Description="Number of reads supporting a tandem duplication">
+##FORMAT=<ID=INV,Number=1,Type=Integer,Description="Number of reads supporting inversions">
+##FORMAT=<ID=sDEL,Number=1,Type=Integer,Description="Number of reads supporting a small deletion">
+##INFO=<ID=NO_MATE_SC,Number=0,Type=Flag,Description="When there is no softclipping of the mate read location, an approximate position is used">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Dummy value for maintaining VCF-Spec">
+#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t$SAMPLE_NAME\n};
+
+	print OUT $header;
+}
+
+
+###############################################################################
+sub bulk_work {
+print "#####################################@_\n" if ($verbose);
+	my ($side, $line, $file) = @_;
+	my $local_levD = 0;
+	my $distl_levD = 0;
+
+	#my @info = split(/\t/, $line);
+	my @plus_Reads = split(/\t/, $line);
+	$plus_Reads[7] =~ s/\n//g;
+
+	#### length of the longest soft clip (lSC) and number of supporting soft clips (nSC).
+	my $lSC = $plus_Reads[4];
+	my $nSC = $plus_Reads[6];
+
+
+	#Get all types of compatible reads
+	#Get improperly paired reads (@ max distance)
+
+	#### default value for left SIDE.
+	#If left-clip, then look downstream for match of softclipped reads to define a deletion, but look for DRPs upstream
+	my $sv_type = "SVTYPE=BND";
+	my $start_local=0; my $end_local=0;my $target_local="";my $target_drp="";my $start_drp="";my $end_drp="";
+	if ($side =~ /left/) {
+		$start_local = $plus_Reads[1]-$dist_To_Soft;
+		$end_local = $plus_Reads[2];
+                $start_drp = $plus_Reads[1];
+                $end_drp = $plus_Reads[1]+$dist_To_Soft;
+	
+	}
+	else{                
+                $start_local = $plus_Reads[1];
+                $end_local = $plus_Reads[1]+$dist_To_Soft;
+                $start_drp = $plus_Reads[1]-$dist_To_Soft;
+                $end_drp = $plus_Reads[1];
+        }
+	
+	$target_local=join("", $plus_Reads[0], ":", $start_local, "-", $end_local);
+	$target_drp=join("", $plus_Reads[0], ":", $start_drp, "-", $end_drp);
+	my $num_unmapped_pairs="";
+	if ($side =~ /right/) {
+		$num_unmapped_pairs=`samtools view $new_blacklist -q $MapQ -f8 -F 1536 -c $INPUT_BAM $target_drp`;
+	} else {
+        $num_unmapped_pairs=`samtools view $new_blacklist -q $MapQ -f24 -F 1536 -c $INPUT_BAM $target_drp`;
+	}
+if($verbose){print "samtools view $new_blacklist -q $MapQ -f24 -F 1536 -c $INPUT_BAM $target_drp\n";}
+
+	$num_unmapped_pairs=~s/\n//;
+if($verbose){print "NUM UNMAPPED PAIRS= $num_unmapped_pairs\n";}
+	my $REF1_base = "";
+	my $REF2_base = "";
+	my $INFO_1 = "";
+	my $INFO_2 = "";
+	my $ALT_1 = "";
+	my $ALT_2 = "";
+	my $isize = 0;
+	my $QUAL = "";
+	my $FORMAT = "GT:";
+
+	#### get an 8-character random id
+	my $BND1_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	my $BND2_name = join "", map { ("a".."z")[rand 26] } 1..8;
+	$BND1_name=join "_","BND",$BND1_name;
+	$BND2_name=join "_","BND",$BND2_name;
+
+	my $counts = {CTX => 0, DEL => 0, INS => 0, INV => 0, TDUP => 0, NOV_INS => 0 };
+	my $event_mate_info = {CTX => "", DEL => "", INS => "", INV => "", TDUP => "", NOV_INS => "" };
+
+	#### get mate pair info and counts per event
+	foreach my $e (sort keys %{$counts}) {
+		my $h = get_counts_n_info($e, $side, $MapQ, $file, $dist_To_Soft, $target_drp, $upper_limit, $lower_limit);
+
+		$counts->{$e} = $h->{count};
+		$event_mate_info->{$e} = $h->{info};
+	}
+#print Dumper($counts);
+
+	my $max = 0;
+	my $type = "UNKNOWN";
+	my $nRP = 0;
+	my $mate_info = "NA\tNA\tNA\tNA";
+	my $summary = "GT:";
+
+	#### find max count of events and set type, nRP and info to corresponding
+	#### max count event.
+	#### also create a summary string of all counts to be added to VCF file.
+	foreach my $e (sort keys %{$counts}){
+#		if ($counts->{$e} >=i $max){
+		if ($counts->{$e} > $max){		
+			$type = $e .",". $counts->{$e};
+			$nRP = $counts->{$e};
+
+			$max = $counts->{$e};
+
+			if (length($event_mate_info->{$e})) {
+				$mate_info = $event_mate_info->{$e};
+			}
+		}
+
+		$summary .= $e .",". $counts->{$e} .":";
+	}
+#	print "done with Summary\n";
+	#### remove last colon ":" from the summary
+	$summary =~ s/:$//;
+ if (($minRP > $max)&&(!$disable_RP_only )){if ($verbose){print "FAILED BECAUSE ($minRP > $max)&&(!$disable_RP_only )"};return};
+
+	#### Run Levenshtein distance on softclip in target region to find out if it's a small deletion/insertion
+	#### passing 1: clip_seq, 2: chr, 3: start, 4: end, 5: ref file.
+	my $levD = LevD->new;
+########################################################
+########################################################
+########################################################
+
+	#### redefine start and end location for LevD calc.
+#	$start = $plus_Reads[1]-$dist_To_Soft;
+#	$end = $plus_Reads[2];
+	my $num_bases_to_loc=0;
+	my $new_start=0;
+	my $new_end=0;
+	my $del_seq="";
+        my $start = $start_local;
+        my $end = $end_local;
+	if ($lSC=~/NA/){$lSC=0}
+
+	if ($side =~ /right/) {
+	        $levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	        $num_bases_to_loc=$levD->{index};
+		$new_start = $plus_Reads[2];
+                if ($plus_Reads[2]=~/^[0-9]/){$new_end=$plus_Reads[2]+$lSC};
+	}
+	else{
+		$levD->search($plus_Reads[3], $plus_Reads[0], $start, $end, $INPUT_FASTA);
+		$local_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+		$num_bases_to_loc=$levD->{index};
+		if ($plus_Reads[2]=~/^[0-9]/){$new_start=$plus_Reads[2]-$lSC};
+                $new_end = $plus_Reads[2];
+	}
+	if((!$new_start)||(!$new_end)||($new_start<0)){print "FAILED AT ((!$new_start)||(!$new_end)||($new_start<0))\n";return};
+	
+	$del_seq=getBases($plus_Reads[0], $new_start,$new_end,$INPUT_FASTA);
+##############################################################################
+#	#If there is a match, where is the start position of the match?
+#
+##############################################################################
+
+
+	#if $plus_Reads[3] eq "NA", then it was found without soft-clipped reads
+	if($plus_Reads[3] !~  /NA/){
+			if (($local_levD < $levD_local_threshold)) {
+				return if (!$sv_only);
+				#### add value to summary to be written to vcf file.
+				$summary = "GT:sDel," . $plus_Reads[6];
+				$type = "sDEL";
+				###########################################################################
+				##### Printing output
+
+				#########################################
+				##### Get DNA info
+				#########################################
+				#$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF1_base = substr($del_seq, 0, 1);
+
+				#### this is alt ref. for softclip its $plus_Reads[3]
+				$REF2_base = $del_seq;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$isize = length($del_seq);
+
+				#### svtype = none for sDEL
+				#### isize = length($info[3]);
+				#### nRP = NA
+				#### mate_id = NA
+				#### CTX,:DEL,:....sDEL,##
+				$INFO_1=join(";", "SVTYPE=NA", "EVENT=$type", "ISIZE=$isize");
+
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE= "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+				$INFO_2=~s/\s//g;
+
+				$BND1_name =~ s/^BND/LEVD/;
+				# If left, then the start position is plus_Reads[1]-isize
+				my $start_pos=0;
+				#Make sure Ref1 and Ref2 bases are different
+				if($REF2_base eq $REF1_base){$REF1_base="NA"}
+				if($side=~/left/){$start_pos=$plus_Reads[1]-$isize}else{$start_pos=$plus_Reads[1]};		
+				print OUT join ("\t", $plus_Reads[0], $start_pos, $BND1_name, $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+				if ($verbose){print "No Softclipped reads here!\n"}
+				return;
+			}
+		}
+
+		#### Otherwise, look for DRP mate info
+	#if($nRP=~/NA/){print "MATE_INFO=$mate_info\tSide=$side\tline=$line\n";}
+		my @mate_info_arr = split(/\t/, $mate_info);
+		$nRP = $mate_info_arr[3];
+		my $mate_chr=$mate_info_arr[0];
+
+			if((! defined $nRP) || ($nRP =~ /na/i) || ($mate_chr =~ /NA/) ){
+			#PRINT UNKNOWN
+	if ($nRP =~ /na/i){print "Can't find SC reads\n" if ($verbose);return};
+	if ($verbose){print "There is an unknown\nNRP=$nRP Mate_CHR=$mate_chr minRP=$minRP\n"}
+				$summary .= ":unknown," . $plus_Reads[6];
+				$type = "unknown";
+				$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+				$REF2_base = $plus_Reads[3];
+				$BND1_name =~ s/^BND/UNKNOWN/;
+				$QUAL = 1/($local_levD + 0.001);
+				$QUAL = sprintf("%.2f",$QUAL);
+				$INFO_1=join(";", "SVTYPE=unknown", "EVENT=unknown", "ISIZE=unknown");
+				#Add Sample information
+				my $FORMAT="GT:sDEL";
+				$FORMAT .= ":lSC:nSC:uRP:levD_local";
+				my $SAMPLE = "0/1:";
+				$SAMPLE .= "$plus_Reads[6]:$lSC:$nSC:$num_unmapped_pairs:$local_levD";
+				$SAMPLE=~s/NA/0/g;
+				#### remove any white spaces.
+				$INFO_1=~s/\s//g;
+			       #print join ("\t", $plus_Reads[0], $plus_Reads[1],  $REF2_base, $REF1_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+
+				print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $REF2_base, $QUAL, "PASS", $INFO_1,$FORMAT,$SAMPLE, "\n");
+				return;
+
+		}
+		#### end if there is no mate info or nRP+uRP<minRP
+		if (($nRP<$minRP)&&($unmated_pairs > ($num_unmapped_pairs+$nRP))){
+			print "Something failed here\nif (($nRP<$minRP)&&($unmated_pairs > ($num_unmapped_pairs+$nRP)))\n";
+		return}
+
+		##################################################################################
+		# Find out if mates have nearby soft-clips (to refine the breakpoints)
+		##################################################################################
+		#Look for evidence of soft-clipping near mate
+		my @mate_soft_arr = ();
+		my $mate_start = 0;
+		my $mate_soft = "";
+
+		@mate_info_arr = split(/\t/, $mate_info);
+
+		#### mate start and end locations.
+		my $filename = $right_sc;
+
+		$start = $mate_info_arr[1] - $dist_To_Soft;
+		$end = $mate_info_arr[1];
+
+		if ($side =~ /right/) {
+			$start = $mate_info_arr[2];
+			$end = $mate_info_arr[2] + $dist_To_Soft;
+
+			$filename = $left_sc;
+		}
+
+		#### add Levenshtein distance to Summary
+	#print "Calc distal Levd\n";
+		$levD->search(rev_comp($plus_Reads[3]), $mate_info_arr[0], $start, $end, $INPUT_FASTA);
+		$distl_levD = sprintf("%.2f", $levD->{relative_edit_dist});
+	$distl_levD = "NA" if($plus_Reads[3] =~ /NA/);
+	#If there are no softclips to string match, then give 0 as quality value
+       if ($plus_Reads[3] !~ /NA/){
+			$QUAL=1/($distl_levD + 0.001);
+		}
+		else	{
+			$QUAL=0;
+		};
+	$QUAL=sprintf("%.2f",$QUAL);
+	#### looking for softclips to refine break point
+	#### if left look in right and vice-versa.
+	$cmd = qq{echo -e "$mate_info_arr[0]\t$start\t$end"};
+	$cmd .= qq{ | awk -F'\t' 'NF==3' | intersectBed -a stdin -b $filename | head -1};
+print "$cmd\n" if $verbose;
+	$mate_soft = `$cmd`;
+
+	$mate_soft =~ s/\n//g;
+	@mate_soft_arr = split(/\s/, $mate_soft);
+my $NO_MATE_SC="";
+	if(@mate_soft_arr){
+		$mate_chr = $mate_soft_arr[0];
+		$mate_start = $mate_soft_arr[1];
+
+	} else{
+		@mate_info_arr = split(/\s/,$mate_info);
+		$mate_chr = $mate_info_arr[0];
+		$mate_start = $mate_info_arr[1];
+		$NO_MATE_SC="APPROXIMATE";	# no soft clip near the mate: position comes from read pairs only
+	}
+
+	#end if there is no mate info
+	return if ($mate_chr eq "");
+	#end if there is no mate info and !disable_RP_only
+	return if (($lSC =~/NA/)&&(!$disable_RP_only));
+	
+	
+	###########################################################################
+	##### Printing output
+
+	#########################################
+	# Get DNA info
+	#########################################
+	#print "PLUS_READS=$plus_Reads[0],$plus_Reads[1]\nMATE=$mate_chr,$mate_start,$INPUT_FASTA\n";
+	$REF1_base = getSeq($plus_Reads[0], $plus_Reads[1], $INPUT_FASTA);
+
+	### this is alt ref. for softclip its $plus_Reads[3]
+	$REF2_base = getSeq($mate_chr, $mate_start, $INPUT_FASTA);
+
+	#########################################
+	# print in VCF format
+	#########################################
+
+	#### abs value to account for left and right reads.
+	$isize = abs($plus_Reads[1]-$mate_start);
+	
+	my $event_type=$type;
+	$event_type=~ s/,|[0-9]//g;
+	$INFO_1=join(";", "$sv_type", "EVENT=$event_type","END=$mate_start", "ISIZE=$isize","MATEID=$BND2_name");
+	$INFO_2=join(";", "$sv_type", "EVENT=$event_type","END=$plus_Reads[1]", "ISIZE=$isize","MATEID=$BND1_name");
+
+	#### remove any white spaces.
+	#### ask: did you mean to remove space from ends? eg. trim()
+	$INFO_1=~s/\s//g;
+	$INFO_2=~s/\s//g;
+
+	$FORMAT=$summary;
+ 	$FORMAT=~ s/,|[0-9]//g;
+        $FORMAT .= ":lSC:nSC:uRP:distl_levD";
+	if($NO_MATE_SC){$INFO_2 .= ";NO_MATE_SC"}
+	my $SAMPLE="0/1:";	
+	$SAMPLE .=$summary;
+#        if($NO_MATE_SC){$SAMPLE.= ":$NO_MATE_SC"}
+
+	$SAMPLE=~s/[A-Z|,|_]//g;
+        my $MATE_SAMPLE=$SAMPLE;
+        $SAMPLE .= ":$lSC:$nSC:$num_unmapped_pairs:$distl_levD";
+	$MATE_SAMPLE .=":NA:NA:NA:NA";
+	$SAMPLE=~s/::/:/g;
+	$MATE_SAMPLE=~s/::/:/g;
+	$MATE_SAMPLE=~s/NA/0/g;
+	$SAMPLE=~s/NA/0/g;
+ 
+	if($type !~ /INV/){
+		$ALT_1 = join("","]",$mate_chr,":",$mate_start,"]",$REF1_base);
+		$ALT_2 = join("",$REF2_base,"[",$plus_Reads[0],":",$plus_Reads[1],"[");
+		#		2      321682 bnd_V  T   ]13:123456]T  6    PASS SVTYPE=BND
+		#		13     123456 bnd_U  C   C[2:321682[   6    PASS SVTYPE=BND
+	} else {
+		$ALT_1 = join("", "]", $plus_Reads[0], ":", $plus_Reads[1], "]", $REF2_base);
+		$ALT_2 = join("", $REF1_base, "[", $mate_chr, ":", $mate_start, "[");
+	}
+
+	if(($mate_chr) && ($plus_Reads[0])){
+		print OUT join ("\t", $plus_Reads[0], $plus_Reads[1], $BND1_name, $REF1_base, $ALT_1, $QUAL,"PASS", $INFO_1, $FORMAT,$SAMPLE,"\n");
+		print OUT join ("\t", $mate_chr, $mate_start, $BND2_name, $REF2_base, $ALT_2, $QUAL, "PASS", $INFO_2, $FORMAT,$MATE_SAMPLE,"\n");
+	}
+}
+
+###############################################################################
+###############################################################################
+sub get_counts_n_info {
+        my ($event, $side, $mapQ, $file, $dist, $target, $upL, $lwL) = @_;
+
+        my $mate_info = "";
+        my $cmd = "";
+
+        if ($event =~ /^CTX$/i) {
+                #print "CTX side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{ samtools view $new_blacklist -q $mapQ -f 16 -F 1536 $file $target};
+                        $cmd .= qq{ | perl -ane 'if(\$F[6] ne "="){\$end=\$F[7]+1; print join ("\\t",\$F[6],\$F[7],\$end,"\\n")}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^DEL$/i) {
+                #print "DEL side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info=`$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -F 1568 -f 16 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"} {if((\$7 ~ /=/)&&(\$9<-$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist_To_Soft -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+
+                        $mate_info=`$cmd`;
+                }
+        } elsif ($event =~ /^INS$/i) {
+                #print "INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<$lwL && \$9 > 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq {samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>-$lwL && \$9 < 0 )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^INV$/i) {
+                #print "INV side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ  -F 1596 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist_To_Soft -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 48 -F 1548 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^TDUP$/i) {
+                #print "TDUP side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 32 -F 1552 $file $target};
+#			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+			$cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4>\$8)&&(\$9<0)&&(\$9<-$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 16 -F 1568 $file $target};
+#                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$9<-$upL )){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{if((\$7 ~ /=/)&&(\$4<\$8)&&(\$9>0)&&(\$9>$upL)){end=\$8+1;print \$3,\$8,end}}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        } elsif ($event =~ /^NOV_INS$/i) {
+                #print "NOV_INS side $side\n";
+                if ($side =~ /right/i) {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 8 -F 1552 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                } else {
+                        $cmd = qq{samtools view $new_blacklist -q $mapQ -f 24 -F 1536 $file $target};
+                        $cmd .= qq{ | awk '{OFS="\\t"}{end=\$8+1;print \$3,\$8,end}'};
+                        $cmd .= qq{ | sortBed | mergeBed -d $dist -n | sort -k4nr | head -1};
+#if($verbose){print "$cmd\n"}
+                        $mate_info = `$cmd`;
+                }
+        }
+
+        $mate_info=~s/\n//g;
+        my @tmp=split(/\t/, $mate_info);
+
+        my $counts = 0;
+
+        if (defined $tmp[3]) {
+                $tmp[3] =~ s/\n//g;
+
+                $counts = $tmp[3] if (length($tmp[3]));
+        }
+        return ({count=>$counts, info=>$mate_info});
+}
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/softsearch/softsearch.xml	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,57 @@
+<?xml version="1.0"?>
+<tool id="SoftSearch" name="SoftSearch" version="0.6">
+  <requirements>
+      <requirement type="package" version="0.6">ss_tool</requirement>
+      <requirement type="package" version="0.1.18">samtools</requirement>
+  </requirements>
+  <description>for structural variation</description>
+  <command>#if $source.index_source=="history" 
+	        samtools index $bam_file ; samtools faidx $source.history_fasta_file ; $inc | SoftSearch.pl -l $min_length_soft_clip -q $min_map_quality -r $min_depth_soft_clip_loc -m $min_no_discordant_read -s $no_sd_consider_discordant -b $bam_file -f $source.history_fasta_file -o $out_file1
+	    #else
+		samtools index $bam_file ; samtools faidx $source.ref_fasta.fields.path ; $inc | SoftSearch.pl -l $min_length_soft_clip -q $min_map_quality -r $min_depth_soft_clip_loc -m $min_no_discordant_read -s $no_sd_consider_discordant -b $bam_file -f $source.ref_fasta.fields.path -o $out_file1
+	    #end if		
+ </command>
+  <inputs>
+      <param name="bam_file" type="data" format="bam" label="BAM file" />
+	 <conditional name="source">
+            <param name="index_source" type="select" label="Choose the source for the reference list">
+                <option value="cached">Locally cached</option>
+                <option value="history">History</option>
+            </param>
+            <when value="history">
+                <param format="fasta" name="history_fasta_file" type="data" label="Fasta file from history."/>
+            </when>
+            <when value="cached">
+                  <param name="ref_fasta" type="select" >
+                    <options from_data_table="fasta_indexes">
+                    <validator type="no_options" message="No Fasta file is available" />
+                    </options>
+                  </param>
+            </when>
+        </conditional>
+      <param name="inc" type="hidden" value="n=$RANDOM" />
+      <param name="min_length_soft_clip" type="integer" value="10" label="-l Minimum length of soft-clipped segment [5]" />
+      <param name="min_map_quality" type="integer" value="20" label="-q Minimum mapping quality [20]" />
+      <param name="min_depth_soft_clip_loc" type="integer" value="10" label="-r Minimum depth of soft-clipped reads at position [5]" />
+      <param name="min_no_discordant_read" type="integer" value="10" label="-m Minimum number of discordant read pairs [5]" />
+      <param name="no_sd_consider_discordant" type="integer" value="4" label="-s Number of sd away from mean to be considered discordant" />
+  </inputs>
+  <outputs>
+      <data format="vcf" name="out_file1" />
+  </outputs>
+  <help>
+  </help>
+</tool>
+
+
+
+
+
+
+
+
+
+
+
+
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.sample	Wed Jun 04 08:00:42 2014 -0400
@@ -0,0 +1,82 @@
+<!-- Use the file tool_data_table_conf.xml.oldlocstyle if you don't want to update your loc files as changed in revision 4550:535d276c92bc-->
+<tables>
+    <!-- Locations of all fasta files under genome directory -->
+    <table name="all_fasta" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/all_fasta.loc" />
+    </table>
+    <!-- Locations of indexes in the BFAST mapper format -->
+    <table name="bfast_indexes" comment_char="#">
+        <columns>value, dbkey, formats, name, path</columns>
+        <file path="tool-data/bfast_indexes.loc" />
+    </table>
+    <!-- Locations of nucleotide (mega)blast databases -->
+    <table name="blastdb" comment_char="#">
+        <columns>value, name, path</columns>
+        <file path="tool-data/blastdb.loc" />
+    </table>
+    <!-- Locations of protein (mega)blast databases -->
+    <table name="blastdb_p" comment_char="#">
+        <columns>value, name, path</columns>
+        <file path="tool-data/blastdb_p.loc" />
+    </table>
+    <!-- Locations of indexes in the BWA mapper format -->
+    <table name="bwa_indexes" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/bwa_index.loc" />
+    </table>
+    <!-- Locations of indexes in the BWA color-space mapper format -->
+    <table name="bwa_indexes_color" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/bwa_index_color.loc" />
+    </table>
+    <!-- Locations of MAF files that have been indexed with bx-python -->
+    <table name="indexed_maf_files">
+        <columns>name, value, dbkey, species</columns>
+        <file path="tool-data/maf_index.loc" />
+    </table>
+    <!-- Locations of fasta files appropriate for NGS simulation -->
+    <table name="ngs_sim_fasta" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/ngs_sim_fasta.loc" />
+    </table>
+    <!-- Locations of PerM base index files -->
+    <table name="perm_base_indexes" comment_char="#">
+        <columns>value, name, path</columns>
+        <file path="tool-data/perm_base_index.loc" />
+    </table>
+    <!-- Locations of PerM color-space index files -->
+    <table name="perm_color_indexes" comment_char="#">
+        <columns>value, name, path</columns>
+        <file path="tool-data/perm_color_index.loc" />
+    </table>
+    <!-- Location of Picard dict file and other files -->
+    <table name="picard_indexes" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/picard_index.loc" />
+    </table>
+    <!-- Location of Picard dict files valid for GATK -->
+    <table name="gatk_picard_indexes" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/gatk_sorted_picard_index.loc" />
+    </table>
+    <!-- Available GATK annotations -->
+    <table name="gatk_annotations" comment_char="#">
+        <columns>value, name, gatk_value, tools_valid_for</columns>
+        <file path="tool-data/gatk_annotations.txt" />
+    </table>
+    <!-- Location of SRMA dict file and other files -->
+    <table name="srma_indexes" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/picard_index.loc" />
+    </table>
+    <!-- Location of Mosaik files -->
+    <table name="mosaik_indexes" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/mosaik_index.loc" />
+    </table>
+    <table name="fasta_indexes" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/fasta_indexes.loc" />
+    </table>
+</tables>