tmhmm_and_signalp: tools/protein_analysis/psortb.py annotate

annotate tools/protein_analysis/psortb.py @ 34:7a2e20baacee draft default tip

"v0.2.13 - Python 3 fix for raising StopIteration"

author	peterjc
date	Thu, 17 Jun 2021 17:58:23 +0000
parents	20da7f48b56f
children

rev	line source
8 391a142c1e60 Uploaded peterjc parents: diff changeset	1 #!/usr/bin/env python
391a142c1e60 Uploaded peterjc parents: diff changeset	2 """Wrapper for psortb for use in Galaxy.
391a142c1e60 Uploaded peterjc parents: diff changeset	3
391a142c1e60 Uploaded peterjc parents: diff changeset	4 This script takes exactly seven command line arguments - which include the
391a142c1e60 Uploaded peterjc parents: diff changeset	5 number of threads, and the input protein FASTA filename and output
391a142c1e60 Uploaded peterjc parents: diff changeset	6 tabular filename. It then splits up the FASTA input and calls multiple
391a142c1e60 Uploaded peterjc parents: diff changeset	7 copies of the standalone psortb v3 program, then collates the output.
391a142c1e60 Uploaded peterjc parents: diff changeset	8 e.g. Rather than this,
391a142c1e60 Uploaded peterjc parents: diff changeset	9
391a142c1e60 Uploaded peterjc parents: diff changeset	10 psort $type -c $cutoff -d $divergent -o long $sequence > $outfile
391a142c1e60 Uploaded peterjc parents: diff changeset	11
391a142c1e60 Uploaded peterjc parents: diff changeset	12 Call this:
391a142c1e60 Uploaded peterjc parents: diff changeset	13
391a142c1e60 Uploaded peterjc parents: diff changeset	14 psort $threads $type $out_type $cutoff $divergent $sequence $outfile
391a142c1e60 Uploaded peterjc parents: diff changeset	15
391a142c1e60 Uploaded peterjc parents: diff changeset	16 If omitting -c or -d options, set $cutoff and $divergent to zero or blank.
391a142c1e60 Uploaded peterjc parents: diff changeset	17
391a142c1e60 Uploaded peterjc parents: diff changeset	18 Note that this is somewhat redundant with job-splitting available in Galaxy
391a142c1e60 Uploaded peterjc parents: diff changeset	19 itself (see the SignalP XML file for settings), but both can be applied.
391a142c1e60 Uploaded peterjc parents: diff changeset	20
391a142c1e60 Uploaded peterjc parents: diff changeset	21 Additionally it ensures the header line (with the column names) starts
391a142c1e60 Uploaded peterjc parents: diff changeset	22 with a # character as used elsewhere in Galaxy.
391a142c1e60 Uploaded peterjc parents: diff changeset	23 """
30 6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	24
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	25 from __future__ import print_function
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	26
8 391a142c1e60 Uploaded peterjc parents: diff changeset	27 import os
30 6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	28 import sys
8 391a142c1e60 Uploaded peterjc parents: diff changeset	29 import tempfile
30 6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	30
6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	31 from seq_analysis_utils import run_jobs, split_fasta, thread_count
8 391a142c1e60 Uploaded peterjc parents: diff changeset	32
391a142c1e60 Uploaded peterjc parents: diff changeset	33 FASTA_CHUNK = 500
391a142c1e60 Uploaded peterjc parents: diff changeset	34
391a142c1e60 Uploaded peterjc parents: diff changeset	35 if "-v" in sys.argv or "--version" in sys.argv:
391a142c1e60 Uploaded peterjc parents: diff changeset	36 # Return underlying PSORTb's version
391a142c1e60 Uploaded peterjc parents: diff changeset	37 sys.exit(os.system("psort --version"))
391a142c1e60 Uploaded peterjc parents: diff changeset	38
391a142c1e60 Uploaded peterjc parents: diff changeset	39 if len(sys.argv) != 8:
32 20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	40 sys.exit(
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	41 "Require 7 arguments, number of threads (int), type (e.g. archaea), "
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	42 "output (e.g. terse/normal/long), cutoff, divergent, input protein "
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	43 "FASTA file & output tabular file"
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	44 )
8 391a142c1e60 Uploaded peterjc parents: diff changeset	45
391a142c1e60 Uploaded peterjc parents: diff changeset	46 num_threads = thread_count(sys.argv[1], default=4)
391a142c1e60 Uploaded peterjc parents: diff changeset	47 org_type = sys.argv[2]
391a142c1e60 Uploaded peterjc parents: diff changeset	48 out_type = sys.argv[3]
391a142c1e60 Uploaded peterjc parents: diff changeset	49 cutoff = sys.argv[4]
391a142c1e60 Uploaded peterjc parents: diff changeset	50 if cutoff.strip() and float(cutoff.strip()) != 0.0:
391a142c1e60 Uploaded peterjc parents: diff changeset	51 cutoff = "-c %s" % cutoff
391a142c1e60 Uploaded peterjc parents: diff changeset	52 else:
391a142c1e60 Uploaded peterjc parents: diff changeset	53 cutoff = ""
391a142c1e60 Uploaded peterjc parents: diff changeset	54 divergent = sys.argv[5]
391a142c1e60 Uploaded peterjc parents: diff changeset	55 if divergent.strip() and float(divergent.strip()) != 0.0:
391a142c1e60 Uploaded peterjc parents: diff changeset	56 divergent = "-d %s" % divergent
391a142c1e60 Uploaded peterjc parents: diff changeset	57 else:
391a142c1e60 Uploaded peterjc parents: diff changeset	58 divergent = ""
391a142c1e60 Uploaded peterjc parents: diff changeset	59 fasta_file = sys.argv[6]
391a142c1e60 Uploaded peterjc parents: diff changeset	60 tabular_file = sys.argv[7]
391a142c1e60 Uploaded peterjc parents: diff changeset	61
391a142c1e60 Uploaded peterjc parents: diff changeset	62 if out_type == "terse":
32 20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	63 header = ["SeqID", "Localization", "Score"]
8 391a142c1e60 Uploaded peterjc parents: diff changeset	64 elif out_type == "normal":
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	65 sys.exit("Normal output not implemented yet, sorry.")
8 391a142c1e60 Uploaded peterjc parents: diff changeset	66 elif out_type == "long":
391a142c1e60 Uploaded peterjc parents: diff changeset	67 if org_type == "-n":
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	68 # Gram negative bacteria
32 20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	69 header = [
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	70 "SeqID",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	71 "CMSVM-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	72 "CMSVM-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	73 "CytoSVM-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	74 "CytoSVM-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	75 "ECSVM-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	76 "ECSVM-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	77 "ModHMM-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	78 "ModHMM-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	79 "Motif-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	80 "Motif-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	81 "OMPMotif-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	82 "OMPMotif-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	83 "OMSVM-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	84 "OMSVM-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	85 "PPSVM-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	86 "PPSVM-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	87 "Profile-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	88 "Profile-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	89 "SCL-BLAST-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	90 "SCL-BLAST-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	91 "SCL-BLASTe-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	92 "SCL-BLASTe-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	93 "Signal-_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	94 "Signal-_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	95 "Cytoplasmic_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	96 "CytoplasmicMembrane_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	97 "Periplasmic_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	98 "OuterMembrane_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	99 "Extracellular_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	100 "Final_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	101 "Final_Localization_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	102 "Final_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	103 "Secondary_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	104 "PSortb_Version",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	105 ]
8 391a142c1e60 Uploaded peterjc parents: diff changeset	106 elif org_type == "-p":
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	107 # Gram positive bacteria
32 20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	108 header = [
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	109 "SeqID",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	110 "CMSVM+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	111 "CMSVM+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	112 "CWSVM+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	113 "CWSVM+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	114 "CytoSVM+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	115 "CytoSVM+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	116 "ECSVM+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	117 "ECSVM+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	118 "ModHMM+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	119 "ModHMM+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	120 "Motif+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	121 "Motif+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	122 "Profile+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	123 "Profile+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	124 "SCL-BLAST+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	125 "SCL-BLAST+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	126 "SCL-BLASTe+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	127 "SCL-BLASTe+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	128 "Signal+_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	129 "Signal+_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	130 "Cytoplasmic_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	131 "CytoplasmicMembrane_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	132 "Cellwall_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	133 "Extracellular_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	134 "Final_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	135 "Final_Localization_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	136 "Final_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	137 "Secondary_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	138 "PSortb_Version",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	139 ]
8 391a142c1e60 Uploaded peterjc parents: diff changeset	140 elif org_type == "-a":
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	141 # Archaea
32 20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	142 header = [
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	143 "SeqID",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	144 "CMSVM_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	145 "CMSVM_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	146 "CWSVM_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	147 "CWSVM_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	148 "CytoSVM_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	149 "CytoSVM_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	150 "ECSVM_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	151 "ECSVM_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	152 "ModHMM_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	153 "ModHMM_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	154 "Motif_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	155 "Motif_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	156 "Profile_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	157 "Profile_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	158 "SCL-BLAST_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	159 "SCL-BLAST_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	160 "SCL-BLASTe_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	161 "SCL-BLASTe_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	162 "Signal_a_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	163 "Signal_a_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	164 "Cytoplasmic_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	165 "CytoplasmicMembrane_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	166 "Cellwall_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	167 "Extracellular_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	168 "Final_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	169 "Final_Localization_Details",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	170 "Final_Score",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	171 "Secondary_Localization",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	172 "PSortb_Version",
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	173 ]
8 391a142c1e60 Uploaded peterjc parents: diff changeset	174 else:
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	175 sys.exit("Expected -n, -p or -a for the organism type, not %r" % org_type)
8 391a142c1e60 Uploaded peterjc parents: diff changeset	176 else:
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	177 sys.exit("Expected terse, normal or long for the output type, not %r" % out_type)
8 391a142c1e60 Uploaded peterjc parents: diff changeset	178
391a142c1e60 Uploaded peterjc parents: diff changeset	179 tmp_dir = tempfile.mkdtemp()
391a142c1e60 Uploaded peterjc parents: diff changeset	180
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	181
8 391a142c1e60 Uploaded peterjc parents: diff changeset	182 def clean_tabular(raw_handle, out_handle):
391a142c1e60 Uploaded peterjc parents: diff changeset	183 """Clean up tabular PSORTb output, returns output line count."""
391a142c1e60 Uploaded peterjc parents: diff changeset	184 global header
391a142c1e60 Uploaded peterjc parents: diff changeset	185 count = 0
391a142c1e60 Uploaded peterjc parents: diff changeset	186 for line in raw_handle:
391a142c1e60 Uploaded peterjc parents: diff changeset	187 if not line.strip() or line.startswith("#"):
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	188 # Ignore any blank lines or comment lines
8 391a142c1e60 Uploaded peterjc parents: diff changeset	189 continue
391a142c1e60 Uploaded peterjc parents: diff changeset	190 parts = [x.strip() for x in line.rstrip("\r\n").split("\t")]
391a142c1e60 Uploaded peterjc parents: diff changeset	191 if parts == header:
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	192 # Ignore the header line
8 391a142c1e60 Uploaded peterjc parents: diff changeset	193 continue
391a142c1e60 Uploaded peterjc parents: diff changeset	194 if not parts[-1] and len(parts) == len(header) + 1:
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	195 # Ignore dummy blank extra column, e.g.
3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	196 # "...2.0\t\tPSORTb version 3.0\t\n"
8 391a142c1e60 Uploaded peterjc parents: diff changeset	197 parts = parts[:-1]
32 20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	198 assert len(parts) == len(header), "%i fields, not %i, in line:\n%r" % (
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	199 len(parts),
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	200 len(header),
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	201 line,
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	202 )
8 391a142c1e60 Uploaded peterjc parents: diff changeset	203 out_handle.write(line)
391a142c1e60 Uploaded peterjc parents: diff changeset	204 count += 1
391a142c1e60 Uploaded peterjc parents: diff changeset	205 return count
391a142c1e60 Uploaded peterjc parents: diff changeset	206
30 6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	207
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	208 # Note that if the input FASTA file contains no sequences,
3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	209 # split_fasta returns an empty list (i.e. zero temp files).
8 391a142c1e60 Uploaded peterjc parents: diff changeset	210 fasta_files = split_fasta(fasta_file, os.path.join(tmp_dir, "tmhmm"), FASTA_CHUNK)
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	211 temp_files = [f + ".out" for f in fasta_files]
32 20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	212 jobs = [
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	213 "psort %s %s %s -o %s %s > %s"
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	214 % (org_type, cutoff, divergent, out_type, fasta, temp)
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	215 for fasta, temp in zip(fasta_files, temp_files)
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	216 ]
8 391a142c1e60 Uploaded peterjc parents: diff changeset	217
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	218
8 391a142c1e60 Uploaded peterjc parents: diff changeset	219 def clean_up(file_list):
391a142c1e60 Uploaded peterjc parents: diff changeset	220 for f in file_list:
391a142c1e60 Uploaded peterjc parents: diff changeset	221 if os.path.isfile(f):
391a142c1e60 Uploaded peterjc parents: diff changeset	222 os.remove(f)
391a142c1e60 Uploaded peterjc parents: diff changeset	223 try:
391a142c1e60 Uploaded peterjc parents: diff changeset	224 os.rmdir(tmp_dir)
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	225 except Exception:
8 391a142c1e60 Uploaded peterjc parents: diff changeset	226 pass
391a142c1e60 Uploaded peterjc parents: diff changeset	227
30 6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	228
8 391a142c1e60 Uploaded peterjc parents: diff changeset	229 if len(jobs) > 1 and num_threads > 1:
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	230 # A small "info" message for Galaxy to show the user.
30 6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	231 print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
8 391a142c1e60 Uploaded peterjc parents: diff changeset	232 results = run_jobs(jobs, num_threads)
391a142c1e60 Uploaded peterjc parents: diff changeset	233 for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
391a142c1e60 Uploaded peterjc parents: diff changeset	234 error_level = results[cmd]
391a142c1e60 Uploaded peterjc parents: diff changeset	235 if error_level:
391a142c1e60 Uploaded peterjc parents: diff changeset	236 try:
391a142c1e60 Uploaded peterjc parents: diff changeset	237 output = open(temp).readline()
391a142c1e60 Uploaded peterjc parents: diff changeset	238 except IOError:
391a142c1e60 Uploaded peterjc parents: diff changeset	239 output = ""
391a142c1e60 Uploaded peterjc parents: diff changeset	240 clean_up(fasta_files + temp_files)
32 20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	241 sys.stderr.write(
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	242 "One or more tasks failed, e.g. %i from %r gave:\n%s\n"
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	243 % (error_level, cmd, output)
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	244 )
20da7f48b56f "Check this is up to date with all 2020 changes" peterjc parents: 30 diff changeset	245 sys.exit(error_level)
8 391a142c1e60 Uploaded peterjc parents: diff changeset	246 del results
391a142c1e60 Uploaded peterjc parents: diff changeset	247 del jobs
391a142c1e60 Uploaded peterjc parents: diff changeset	248
391a142c1e60 Uploaded peterjc parents: diff changeset	249 out_handle = open(tabular_file, "w")
391a142c1e60 Uploaded peterjc parents: diff changeset	250 out_handle.write("#%s\n" % "\t".join(header))
11 3d74c1176d67 Uploaded minor fix peterjc parents: 8 diff changeset	251 count = 0
8 391a142c1e60 Uploaded peterjc parents: diff changeset	252 for temp in temp_files:
391a142c1e60 Uploaded peterjc parents: diff changeset	253 data_handle = open(temp)
11 3d74c1176d67 Uploaded minor fix peterjc parents: 8 diff changeset	254 count += clean_tabular(data_handle, out_handle)
8 391a142c1e60 Uploaded peterjc parents: diff changeset	255 data_handle.close()
391a142c1e60 Uploaded peterjc parents: diff changeset	256 if not count:
391a142c1e60 Uploaded peterjc parents: diff changeset	257 clean_up(fasta_files + temp_files)
29 3cb02adf4326 v0.2.9 Python style improvements peterjc parents: 26 diff changeset	258 sys.exit("No output from psortb")
8 391a142c1e60 Uploaded peterjc parents: diff changeset	259 out_handle.close()
30 6d9d7cdf00fc v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes peterjc parents: 29 diff changeset	260 print("%i records" % count)
8 391a142c1e60 Uploaded peterjc parents: diff changeset	261
391a142c1e60 Uploaded peterjc parents: diff changeset	262 clean_up(fasta_files + temp_files)
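
The collation step in the listing above (clean_tabular plus the final loop over temp_files) can be tried in isolation. What follows is only a minimal sketch, not the wrapper's own code: it works on in-memory strings rather than temp files, and the names clean_rows and collate and the sample rows are hypothetical. It also differs from the listing in two ways, both assumptions made for illustration: it writes the normalized fields rather than the raw line, so the dummy trailing column is dropped from the output, and it raises ValueError instead of using assert.

```python
import io

# Three-column header matching the wrapper's "terse" layout.
HEADER = ["SeqID", "Localization", "Score"]


def clean_rows(raw_handle, out_handle, header):
    """Copy data rows to out_handle and return how many were written."""
    count = 0
    for line in raw_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = [x.strip() for x in line.rstrip("\r\n").split("\t")]
        if parts == header:
            continue  # skip the header repeated in each chunk
        if not parts[-1] and len(parts) == len(header) + 1:
            parts = parts[:-1]  # drop the dummy blank trailing column
        if len(parts) != len(header):
            raise ValueError(
                "%i fields, not %i, in line:\n%r" % (len(parts), len(header), line)
            )
        out_handle.write("\t".join(parts) + "\n")
        count += 1
    return count


def collate(chunks, header):
    """Merge per-chunk outputs (given as strings) into one table string."""
    out = io.StringIO()
    # Galaxy convention: the single header line starts with '#'.
    out.write("#%s\n" % "\t".join(header))
    total = 0
    for text in chunks:
        total += clean_rows(io.StringIO(text), out, header)
    return out.getvalue(), total
```

Calling collate(["SeqID\tLocalization\tScore\nseq1\tCytoplasmic\t9.97\t\n"], HEADER) would return a table with one "#"-prefixed header line followed by the seq1 row, and a record count of 1. The sample row is made up for illustration and is not real PSORTb output.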

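The handling of the optional -c and -d values in the listing (a blank or zero value means the flag is left out) can be sketched as a small helper. This is an illustrative rewrite under stated assumptions, not the script's code: optional_flag and build_command are hypothetical names, the value is stripped before formatting, empty pieces are filtered out so the command has no double spaces, and file names are assumed to contain no spaces or shell metacharacters.

```python
def optional_flag(flag, value):
    """Return 'flag value' for a non-zero numeric string, else ''."""
    value = value.strip()
    if value and float(value) != 0.0:
        return "%s %s" % (flag, value)
    return ""


def build_command(org_type, out_type, cutoff, divergent, fasta, temp):
    """Assemble one psort shell command for a single FASTA chunk."""
    parts = [
        "psort",
        org_type,
        optional_flag("-c", cutoff),
        optional_flag("-d", divergent),
        "-o",
        out_type,
        fasta,
    ]
    # Skip empty strings left by omitted flags, then redirect stdout.
    return " ".join(p for p in parts if p) + " > " + temp
```

With these definitions, build_command("-n", "long", "7.5", "", "in.fasta", "in.fasta.out") gives "psort -n -c 7.5 -o long in.fasta > in.fasta.out". As in the listing, a non-numeric cutoff would raise ValueError from float().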