annotate msp_split.py @ 10:0302d7e2ce01 draft

planemo upload for repository https://github.com/computational-metabolomics/dma-tools-galaxy commit af689d3f20c86f69aa824545e668280bcd5e0cca-dirty
author tomnl
date Tue, 12 Jun 2018 11:46:01 -0400
parents 2cba35789adf
children cb8dce9812ff
rev   line source
9
2cba35789adf planemo upload for repository https://github.com/computational-metabolomics/dma-tools-galaxy commit af689d3f20c86f69aa824545e668280bcd5e0cca
tomnl
from __future__ import print_function
import argparse
import textwrap
import os
import re
import csv
import math


def msp_split(msp_path, out_dir, n):
    """Split the MSP file at msp_path into n files in out_dir; return their paths."""
    spec_total = lcount('NAME', msp_path)
    # Maximum number of spectra per output file
    spec_lim = math.ceil(spec_total / float(n))
    spec_c = 0
    filelist = []
    header = ''  # NAME line read at the end of one file, owed to the next
    print('spec_lim', spec_lim)
    with open(msp_path, 'r') as msp_in:
        for idx in range(1, n + 1):
            # Zero-pad the index so the output files sort in order
            fname = 'file{}.msp'.format(str(idx).zfill(len(str(n))))
            out_path = os.path.join(out_dir, fname)
            filelist.append(out_path)
            with open(out_path, 'w') as msp_out:
                while spec_c <= spec_lim:
                    if header:
                        msp_out.write(header)
                        header = ''
                    line = msp_in.readline()

                    if not line:
                        break  # end of file

                    if re.match('^NAME:.*$', line, re.IGNORECASE):
                        # Hold the NAME line back; it is written on the next pass
                        header = line
                        spec_c += 1
                    else:
                        msp_out.write(line)
            # The held-back NAME line counts as the first spectrum of the next file
            spec_c = 1

    return filelist


def lcount(keyword, fname):
    # Count lines starting with 'keyword:' (any case), matching how msp_split
    # detects spectra; a plain substring test miscounts e.g. 'Name:' lines
    pattern = re.compile('^{}:'.format(re.escape(keyword)), re.IGNORECASE)
    with open(fname, 'r') as fin:
        return sum(1 for line in fin if pattern.match(line))


def main():

    p = argparse.ArgumentParser(prog='msp_split.py',
                                formatter_class=argparse.RawDescriptionHelpFormatter,
                                description='''Split an MSP file into n smaller MSP files''',
                                epilog=textwrap.dedent('''
        -------------------------------------------------------------------------

        Example Usage

        python msp_split.py -i [MSP file] -o [out dir] -n [number of files]

        '''))

    p.add_argument('-i', dest='i', help='MSP file to split', required=True)
    p.add_argument('-o', dest='o', help='out dir', required=True)
    p.add_argument('-n', dest='n', help='number of output files', type=int, required=True)

    args = p.parse_args()

    if not os.path.exists(args.o):
        os.makedirs(args.o)
    print('in file', args.i)
    print('out dir', args.o)
    print('nm files', args.n)

    msp_split(args.i, args.o, args.n)


if __name__ == '__main__':
    main()
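
The chunking arithmetic used by msp_split can be illustrated in memory, without touching the filesystem. The following is only a sketch under assumptions: split_records and chunk_records are hypothetical helper names that do not appear in msp_split.py, and the example text is made up. It assumes, as the script does, that each spectrum starts at a line beginning with NAME: (matched case-insensitively).

```python
import math
import re

# Same record delimiter the script looks for: a line starting with NAME:
NAME_RE = re.compile(r'^NAME:', re.IGNORECASE)


def split_records(text):
    """Group the lines of MSP text into records, each starting at a NAME: line."""
    records = []
    for line in text.splitlines(True):  # True keeps the line endings
        if NAME_RE.match(line) or not records:
            records.append([])
        records[-1].append(line)
    return [''.join(r) for r in records]


def chunk_records(records, n):
    """Distribute records over at most n chunks of ceil(len(records) / n) each."""
    if not records:
        return []
    lim = int(math.ceil(len(records) / float(n)))
    return [records[k:k + lim] for k in range(0, len(records), lim)]


# Made-up example input with three spectra
example = "NAME: a\n1 2\n\nName: b\n3 4\n\nNAME: c\n5 6\n"
chunks = chunk_records(split_records(example), 2)
print([len(c) for c in chunks])  # -> [2, 1]
```

Unlike msp_split, which always creates n output files (some possibly empty), this sketch returns only the non-empty chunks, so it can yield fewer than n. It also holds the whole input in memory, whereas the script streams line by line, which is preferable for large MSP libraries.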