annotate relabel_fasta.py @ 26:f0917c340f13 draft

planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit 34034189622f4cf14edd12a4de43739c37b50730-dirty
author pjbriggs
date Thu, 30 Aug 2018 08:41:11 -0400
parents fe354f5dd0ee
children
rev   line source
24
fe354f5dd0ee planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit 34034189622f4cf14edd12a4de43739c37b50730
pjbriggs
parents:
diff changeset
#!/usr/bin/env python

DESCRIPTION = \
"""Replace FASTA labels with new labels <PREFIX>1, <PREFIX>2,
<PREFIX>3 ... (<PREFIX> is provided by the user via the command
line).

Can be used to label OTUs as OTU_1, OTU_2 etc.

This reimplements the functionality of the fasta_number.py utility
from https://drive5.com/python/fasta_number_py.html
"""

import argparse

def relabel_fasta(fp,prefix,include_size=False):
    """
    Relabel sequence records in a FASTA file

    Arguments:
      fp (File): file-like object opened for reading,
        supplying the input FASTA data
      prefix (str): prefix to use in new labels
      include_size (bool): if True then copy
        'size=...' records into new labels (default
        is not to copy the size)

    Yields: updated lines from the input FASTA.
    """
    # Iterate over lines in file
    nlabel = 0
    for line in fp:
        # Strip trailing line endings (LF or CRLF)
        line = line.rstrip('\r\n')
        if not line:
            # Skip blank lines
            continue
        elif line.startswith('>'):
            # Deal with start of a sequence record
            nlabel += 1
            label = line[1:].strip()
            if include_size:
                # Extract size from the label; build a list because
                # the result of filter() is not indexable on Python 3
                try:
                    size = [x for x in label.split(';')
                            if x.startswith("size=")][0]
                except IndexError:
                    raise Exception("Couldn't locate 'size' in "
                                    "label: %s" % label)
                # Use the 'prefix' argument, not the module-level
                # 'args', so the function also works when imported
                yield ">%s%d;%s" % (prefix,
                                    nlabel,
                                    size)
            else:
                yield ">%s%d" % (prefix,
                                 nlabel)
        else:
            # Echo the line to output
            yield line

if __name__ == "__main__":
    # Set up command line parser
    p = argparse.ArgumentParser(description=DESCRIPTION)
    p.add_argument("--needsize",
                   action="store_true",
                   help="include the size as part of the "
                   "output label ('size=...' must be present "
                   "in the input FASTA labels). Output labels "
                   "will be '<PREFIX><NUMBER>;size=<SIZE>'")
    # Accepted but never read: omitting the size is already the default
    p.add_argument("--nosize",
                   action="store_true",
                   help="don't include the size as part of "
                   "the output label (this is the default)")
    p.add_argument("fasta",
                   metavar="FASTA",
                   help="input FASTA file")
    p.add_argument("prefix",
                   metavar="PREFIX",
                   help="prefix to use for labels in output")
    # Process command line
    args = p.parse_args()
    # Relabel FASTA
    # Mode 'r' instead of 'rU': the 'U' flag is rejected by Python 3.11+
    with open(args.fasta,'r') as fasta:
        for line in relabel_fasta(fasta,
                                  args.prefix,
                                  include_size=args.needsize):
            # Function-call form of print works on Python 2 and 3
            print(line)
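
The relabelling described in the docstrings above can be tried on in-memory data without going through the command line. What follows is a minimal sketch, assuming Python 3. The function `relabel` is a condensed, hypothetical stand-in for `relabel_fasta` and is not the author's code. It raises `ValueError` where the script raises a plain `Exception`. The input records (`seqA`, `seqB` and their sizes) are made up for illustration.

```python
import io

def relabel(fp, prefix, include_size=False):
    """Condensed stand-in for relabel_fasta (hypothetical helper)."""
    nlabel = 0
    for line in fp:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            nlabel += 1
            if include_size:
                # Keep the first 'size=...' field of the old label
                sizes = [f for f in line[1:].strip().split(";")
                         if f.startswith("size=")]
                if not sizes:
                    raise ValueError("no 'size' in label: %s" % line)
                yield ">%s%d;%s" % (prefix, nlabel, sizes[0])
            else:
                yield ">%s%d" % (prefix, nlabel)
        else:
            yield line  # sequence data passes through unchanged

# Made-up input using 'label;size=N;' style headers
example = io.StringIO(">seqA;size=120;\nACGT\n\n>seqB;size=7;\nGGCC\n")
relabelled = list(relabel(example, "OTU_", include_size=True))
print("\n".join(relabelled))
```

Because the function is a generator that takes any iterable of lines, the same call works on an open file handle, and the caller decides where the output goes.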