annotate tools/fastq/fastq_paired_unpaired.py @ 0:3a39d2053bc5

Migrated tool version 0.0.4 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 16:32:01 -0400
children 2feaef06d388
#!/usr/bin/env python
"""Divides a FASTQ into paired and single (orphan reads) as separate files.

The input file should be a valid FASTQ file which has been sorted so that
any partner forward+reverse reads are consecutive. The output files all
preserve this sort order. Pairs are recognised based on standard name
suffixes. See below or run the tool with no arguments for more details.
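
A minimal sketch of the suffix-based pairing described above, in plain
Python with no Galaxy dependencies. It works on read names only. Its
FORWARD and REVERSE patterns are assumed ASCII equivalents of this script's
re_f and re_r, written with character classes instead of backslash escapes.
The classify and split_pairs helpers are hypothetical and are not part of
this tool.

```python
import re

# Assumed equivalents of re_f / re_r: Illumina /1 and /2, .f and .r,
# and Sanger-style .p1k/.q1k suffixes ([sfp] forward, [rq] reverse).
FORWARD = re.compile(r"(/1|[.]f|[.][sfp][0-9][A-Za-z0-9_]*)$")
REVERSE = re.compile(r"(/2|[.]r|[.][rq][0-9][A-Za-z0-9_]*)$")


def classify(name):
    # Return ("F" or "R", template) for a recognised suffix, else (None, name).
    for direction, pattern in (("F", FORWARD), ("R", REVERSE)):
        match = pattern.search(name)
        if match:
            return direction, name[:match.start()]
    return None, name


def split_pairs(names):
    # Single pass over names sorted so partner reads are consecutive.
    # Returns (pairs, singles); pairs is a list of (forward, reverse) tuples.
    pairs, singles = [], []
    last_template, buffered = None, []
    for name in names:
        direction, template = classify(name)
        if direction == "F":
            if template == last_template:
                buffered.append(name)  # another forward read, same template
            else:
                singles.extend(buffered)  # older forward reads were orphans
                buffered, last_template = [name], template
        elif direction == "R" and template == last_template and buffered:
            # Pair with the first buffered forward read, as the script does.
            pairs.append((buffered.pop(0), name))
        else:
            # Unmatched reverse read or unrecognised suffix: flush and reset.
            singles.extend(buffered)
            singles.append(name)
            buffered, last_template = [], None
    singles.extend(buffered)  # trailing unpaired forward reads
    return pairs, singles
```

Applied to either ordering of the WTSI_1055_4p17 reads in the usage
message, split_pairs returns the pairs (p1kapIBF, q1kapIBR) and
(p1kpIBF, q1kpIBR) with no singles.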

Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even
Color Space should all work equally well).

This script is copyright 2010 by Peter Cock, SCRI, UK. All rights reserved.
See accompanying text file for licence details (MIT/BSD style).

This is version 0.0.4 of the script.
"""
import os
import sys
import re
from galaxy_utils.sequence.fastq import fastqReader, fastqWriter

def stop_err(msg, err=1):
    sys.stderr.write(msg.rstrip() + "\n")
    sys.exit(err)

msg = """Expect either 4 or 5 arguments: a FASTQ variant, then FASTQ filenames.

If you want two output files, use four arguments:
- FASTQ variant (e.g. sanger, solexa, illumina or cssanger),
- Sorted input FASTQ filename,
- Output paired FASTQ filename (forward then reverse interleaved),
- Output singles FASTQ filename (orphan reads)

If you want three output files, use five arguments:
- FASTQ variant (e.g. sanger, solexa, illumina or cssanger),
- Sorted input FASTQ filename,
- Output forward paired FASTQ filename,
- Output reverse paired FASTQ filename,
- Output singles FASTQ filename (orphan reads)

The input file should be a valid FASTQ file which has been sorted so that
any partner forward+reverse reads are consecutive. The output files all
preserve this sort order.

Any reads where the forward/reverse naming suffix used is not recognised
are treated as orphan reads. The tool supports the /1 and /2 convention
used by Illumina, the .f and .r convention, and the Sanger convention
(see http://staden.sourceforge.net/manual/pregap4_unix_50.html for details).

Note that this does support multiple forward and reverse reads per template
(which is quite common with Sanger sequencing), e.g. this list, which is
sorted alphabetically:

WTSI_1055_4p17.p1kapIBF
WTSI_1055_4p17.p1kpIBF
WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.q1kpIBR

or this one, where the reads already come in pairs:

WTSI_1055_4p17.p1kapIBF
WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.p1kpIBF
WTSI_1055_4p17.q1kpIBR

both become:

WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR
WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR
"""

if len(sys.argv) == 5:
    format, input_fastq, pairs_fastq, singles_fastq = sys.argv[1:]
elif len(sys.argv) == 6:
    pairs_fastq = None
    format, input_fastq, pairs_f_fastq, pairs_r_fastq, singles_fastq = sys.argv[1:]
else:
    stop_err(msg)

#Lower-case first so e.g. "fastqsanger" and "FASTQSANGER" both become "sanger"
format = format.lower().replace("fastq", "")
if not format:
    format = "sanger" #safe default
elif format not in ["sanger", "solexa", "illumina", "cssanger"]:
    stop_err("Unrecognised format %s" % format)

def f_match(name):
    if name.endswith("/1") or name.endswith(".f"):
        return True

#Cope with three widely used suffix naming conventions:
#Illumina: /1 or /2
#Forward/reverse: .f or .r
#Sanger, e.g. .p1k and .q1k
#See http://staden.sourceforge.net/manual/pregap4_unix_50.html
re_f = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
re_r = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")

#Use search rather than match, as the suffix is anchored at the end
assert re_f.search("demo/1")
assert re_f.search("demo.f")
assert re_f.search("demo.s1")
assert re_f.search("demo.f1k")
assert re_f.search("demo.p1")
assert re_f.search("demo.p1k")
assert re_f.search("demo.p1lk")
assert re_r.search("demo/2")
assert re_r.search("demo.r")
assert re_r.search("demo.q1")
assert re_r.search("demo.q1lk")
assert not re_r.search("demo/1")
assert not re_r.search("demo.f")
assert not re_r.search("demo.p")
assert not re_f.search("demo/2")
assert not re_f.search("demo.r")
assert not re_f.search("demo.q")

count, forward, reverse, neither, pairs, singles = 0, 0, 0, 0, 0, 0
in_handle = open(input_fastq)
if pairs_fastq:
    pairs_f_writer = fastqWriter(open(pairs_fastq, "w"), format)
    pairs_r_writer = pairs_f_writer
else:
    pairs_f_writer = fastqWriter(open(pairs_f_fastq, "w"), format)
    pairs_r_writer = fastqWriter(open(pairs_r_fastq, "w"), format)
singles_writer = fastqWriter(open(singles_fastq, "w"), format)
last_template, buffered_reads = None, []

for record in fastqReader(in_handle, format):
    count += 1
    name = record.identifier.split(None, 1)[0]
    assert name[0] == "@", record.identifier #Quirk of the Galaxy parser
    suffix = re_f.search(name)
    if suffix:
        #============
        #Forward read
        #============
        template = name[:suffix.start()]
        #print name, "forward", template
        forward += 1
        if last_template == template:
            buffered_reads.append(record)
        else:
            #Any old buffered reads are orphans
            for old in buffered_reads:
                singles_writer.write(old)
                singles += 1
            #Save this read in buffer
            buffered_reads = [record]
            last_template = template
    else:
        suffix = re_r.search(name)
        if suffix:
            #============
            #Reverse read
            #============
            template = name[:suffix.start()]
            #print name, "reverse", template
            reverse += 1
            if last_template == template and buffered_reads:
                #We have a pair!
                #If there are multiple buffered forward reads, pick
                #the first one (although we could try to do something more
                #clever, looking at the suffix to match them up...)
                old = buffered_reads.pop(0)
                pairs_f_writer.write(old)
                pairs_r_writer.write(record)
                pairs += 2
            else:
                #As this is a reverse read, this and any buffered read(s) are
                #all orphans
                for old in buffered_reads:
                    singles_writer.write(old)
                    singles += 1
                buffered_reads = []
                singles_writer.write(record)
                singles += 1
                last_template = None
        else:
            #===========================
            #Neither forward nor reverse
            #===========================
            neither += 1
            #Write any buffered (earlier) reads first to preserve sort order
            for old in buffered_reads:
                singles_writer.write(old)
                singles += 1
            buffered_reads = []
            last_template = None
            singles_writer.write(record)
            singles += 1
if last_template:
    #Left over singles...
    for old in buffered_reads:
        singles_writer.write(old)
        singles += 1
in_handle.close()
singles_writer.close()
if pairs_fastq:
    pairs_f_writer.close()
    assert pairs_r_writer.file.closed
else:
    pairs_f_writer.close()
    pairs_r_writer.close()

if neither:
    print("%i reads (%i forward, %i reverse, %i neither), %i in pairs, %i as singles"
          % (count, forward, reverse, neither, pairs, singles))
else:
    print("%i reads (%i forward, %i reverse), %i in pairs, %i as singles"
          % (count, forward, reverse, pairs, singles))

assert count == pairs + singles == forward + reverse + neither, \
       "%i vs %i+%i=%i vs %i+%i+%i=%i" \
       % (count, pairs, singles, pairs + singles,
          forward, reverse, neither, forward + reverse + neither)