annotate relabel_fasta.py @ 26:f0917c340f13 draft
planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit 34034189622f4cf14edd12a4de43739c37b50730-dirty
author | pjbriggs |
---|---|
date | Thu, 30 Aug 2018 08:41:11 -0400 |
parents | fe354f5dd0ee |
children |
rev | line source |
---|---|
24
fe354f5dd0ee
planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit 34034189622f4cf14edd12a4de43739c37b50730
pjbriggs
parents:
diff
changeset
|
#!/usr/bin/env python

DESCRIPTION = \
"""Replace FASTA labels with new labels <PREFIX>1, <PREFIX>2,
<PREFIX>3 ... (<PREFIX> is provided by the user via the command
line).

Can be used to label OTUs as OTU_1, OTU_2 etc.

This reimplements the functionality of the fasta_number.py utility
from https://drive5.com/python/fasta_number_py.html
"""

import argparse

def relabel_fasta(fp,prefix,include_size=False):
    """
    Relabel sequence records in a FASTA file

    Arguments:
      fp (File): file-like object opened for reading
        input FASTA data from
      prefix (str): prefix to use in new labels
      include_size (bool): if True then copy
        'size=...' records into new labels (default
        is not to copy the size)

    Yields: updated lines from the input FASTA.
    """
    # Iterate over lines in file
    nlabel = 0
    for line in fp:
        # Strip trailing newlines
        line = line.rstrip('\n')
        if not line:
            # Skip blank lines
            continue
        elif line.startswith('>'):
            # Deal with start of a sequence record
            nlabel += 1
            label = line[1:].strip()
            if include_size:
                # Extract size from the label
                try:
                    # Use a list rather than indexing filter(...),
                    # which returns an iterator under Python 3
                    size = [x for x in label.split(';')
                            if x.startswith("size=")][0]
                except IndexError:
                    raise Exception("Couldn't locate 'size' in "
                                    "label: %s" % label)
                # Use the 'prefix' argument rather than the
                # module-level 'args', which only exists when
                # the file is run as a script
                yield ">%s%d;%s" % (prefix,
                                    nlabel,
                                    size)
            else:
                yield ">%s%d" % (prefix,
                                 nlabel)
        else:
            # Echo the line to output
            yield line

if __name__ == "__main__":
    # Set up command line parser
    p = argparse.ArgumentParser(description=DESCRIPTION)
    p.add_argument("--needsize",
                   action="store_true",
                   help="include the size as part of the "
                   "output label ('size=...' must be present "
                   "in the input FASTA labels). Output labels "
                   "will be '<PREFIX><NUMBER>;size=<SIZE>'")
    p.add_argument("--nosize",
                   action="store_true",
                   help="don't include the size as part of "
                   "the output label (this is the default)")
    p.add_argument("fasta",
                   metavar="FASTA",
                   help="input FASTA file")
    p.add_argument("prefix",
                   metavar="PREFIX",
                   help="prefix to use for labels in output")
    # Process command line
    args = p.parse_args()
    # Relabel FASTA
    # (plain 'r' mode: the 'U' flag was removed in Python 3.11)
    with open(args.fasta,'r') as fasta:
        for line in relabel_fasta(fasta,
                                  args.prefix,
                                  include_size=args.needsize):
            # Function-call form works on both Python 2 and 3
            print(line)
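
The generator in the listing above can be exercised without a file on disk, because it only needs an iterable of lines. The sketch below is not part of the repository: it restates the function compactly for Python 3 so the block is self-contained, and the sample records (`seq_a`, `seq_b`) and the `OTU_` prefix are made up for illustration.

```python
import io


def relabel_fasta(fp, prefix, include_size=False):
    """Compact restatement of the listing's generator (illustrative sketch)."""
    nlabel = 0
    for line in fp:
        line = line.rstrip('\n')
        if not line:
            # Blank lines are dropped from the output
            continue
        if line.startswith('>'):
            # Header line: replace the label with <prefix><counter>
            nlabel += 1
            if include_size:
                label = line[1:].strip()
                sizes = [x for x in label.split(';')
                         if x.startswith("size=")]
                if not sizes:
                    raise Exception("Couldn't locate 'size' in "
                                    "label: %s" % label)
                yield ">%s%d;%s" % (prefix, nlabel, sizes[0])
            else:
                yield ">%s%d" % (prefix, nlabel)
        else:
            # Sequence line: passed through unchanged
            yield line


# Hypothetical input in the 'label;size=N;' style
EXAMPLE_FASTA = (
    ">seq_a;size=120;\n"
    "ACGTACGT\n"
    "\n"
    ">seq_b;size=45;\n"
    "TTGACA\n"
)

# Default behaviour: labels become OTU_1, OTU_2, ...
plain = list(relabel_fasta(io.StringIO(EXAMPLE_FASTA), "OTU_"))
# With include_size=True the 'size=...' field is carried over
sized = list(relabel_fasta(io.StringIO(EXAMPLE_FASTA), "OTU_",
                           include_size=True))

print("\n".join(plain))
print("\n".join(sized))
```

On the command line, the same two modes would correspond to running the script with and without `--needsize`, passing the FASTA file and the prefix as the two positional arguments.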