# HG changeset patch
# User Matt Shirley
# Date 1371665497 14400
# Node ID 423f3eb064280331d847add404305b92087c4fcd
# Parent d850324e82cf03a60431de095bc4bd3827dc24e8
more fixes for sra datatype, added sra_pileup

diff -r d850324e82cf -r 423f3eb06428 datatypes_conf.xml
--- a/datatypes_conf.xml Wed Jun 19 13:31:58 2013 -0400
+++ b/datatypes_conf.xml Wed Jun 19 14:11:37 2013 -0400
@@ -4,6 +4,6 @@
-
+
diff -r d850324e82cf -r 423f3eb06428 fastq_dump.xml
--- a/fastq_dump.xml Wed Jun 19 13:31:58 2013 -0400
+++ b/fastq_dump.xml Wed Jun 19 14:11:37 2013 -0400
@@ -1,9 +1,9 @@
- format reads from NCBI SRA.
+ format reads from NCBI sra.
 fastq-dump --log-level fatal --accession '${input.name}' --stdout $split $aligned '$input' > $output
 fastq-dump --version
-
+
@@ -26,6 +26,8 @@
 sra_toolkit
- This tool extracts fastqsanger reads from SRA archives using fastq-dump. The fastq-dump program is developed at NCBI, and is available at: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software.
+ This tool extracts fastq format reads from sra archives using fastq-dump.
+ The fastq-dump program is developed at NCBI, and is available at: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software.
+ Contact Matt Shirley at mdshw5@gmail.com for support and bug reports.
diff -r d850324e82cf -r 423f3eb06428 sam_dump.xml
--- a/sam_dump.xml Wed Jun 19 13:31:58 2013 -0400
+++ b/sam_dump.xml Wed Jun 19 14:11:37 2013 -0400
@@ -1,5 +1,5 @@
- format reads from NCBI SRA.
+ format reads from NCBI sra.
 sam-dump $header $aligned $primary '$input' > $output
 sam-dump --version
@@ -27,7 +27,7 @@
 sra_toolkit
- This tool extracts SAM format reads from SRA archives using sam-dump.
+ This tool extracts sam format reads from sra archives using sam-dump.
 The sam-dump program is developed at NCBI, and is available at: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software.
 Contact Matt Shirley at mdshw5@gmail.com for support and bug reports.
diff -r d850324e82cf -r 423f3eb06428 sra.py
--- a/sra.py Wed Jun 19 13:31:58 2013 -0400
+++ b/sra.py Wed Jun 19 14:11:37 2013 -0400
@@ -1,5 +1,5 @@
 """
-SRA class
+NCBI sra class
 """
 import logging
 import binascii
@@ -10,14 +10,15 @@
 log = logging.getLogger(__name__)
-class SRA( Binary ):
+class sra( Binary ):
     """ Sequence Read Archive (SRA) """
     file_ext = 'sra'
     def __init__( self, **kwd ):
         Binary.__init__( self, **kwd )
     def sniff( self, filename ):
-        """ The first 8 bytes of any NCBI sra file is 'NCIB.sra', and the file is binary. Not sure if EBI and DDBJ files may differ.
+        """ The first 8 bytes of any NCBI sra file are 'NCBI.sra', and the file is binary. EBI and DDBJ files may differ, though EBI and DDBJ
+        submissions through NCBI (ERR and DRR accessions) read 'NCBI.sra'.
         For details about the format, see http://www.ncbi.nlm.nih.gov/books/n/helpsra/SRA_Overview_BK/#SRA_Overview_BK.4_SRA_Data_Structure
         """
         try:
@@ -30,7 +31,7 @@
             return False
     def set_peek(self, dataset, is_multi_byte=False):
         if not dataset.dataset.purged:
-            dataset.peek = 'Binary SRA file'
+            dataset.peek = 'Binary sra file'
             dataset.blurb = data.nice_size(dataset.get_size())
         else:
             dataset.peek = 'file does not exist'
@@ -39,7 +40,7 @@
         try:
             return dataset.peek
         except:
-            return 'Binary SRA file (%s)' % ( data.nice_size(dataset.get_size()))
+            return 'Binary sra file (%s)' % ( data.nice_size(dataset.get_size()))
 if hasattr(Binary, 'register_sniffable_binary_format'):
-    Binary.register_sniffable_binary_format('SRA', 'SRA', SRA)
+    Binary.register_sniffable_binary_format('sra', 'sra', sra)
diff -r d850324e82cf -r 423f3eb06428 sra_fetch.xml
--- a/sra_fetch.xml Wed Jun 19 13:31:58 2013 -0400
+++ b/sra_fetch.xml Wed Jun 19 14:11:37 2013 -0400
@@ -1,13 +1,13 @@
-
- by accession from NCBI SRA.
+
+ by accession from NCBI sra.
 sra_fetch.py --accession '$accession' --out '$output'
-
+
-
+
- This tool fetches SRA archives from NCBI over FTP using the python ftplib.
+ This tool fetches sra archives by accession from NCBI over ftp.
diff -r d850324e82cf -r 423f3eb06428 sra_pileup.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/sra_pileup.xml Wed Jun 19 14:11:37 2013 -0400
@@ -0,0 +1,19 @@
+
+ from NCBI sra.
+ sra-pileup '$input' > $output
+ sra-pileup --version
+
+
+
+
+
+
+
+ sra_toolkit
+
+
+ This tool produces pileup format from sra archives using sra-pileup.
+ The sra-pileup program is developed at NCBI, and is available at: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software.
+ Contact Matt Shirley at mdshw5@gmail.com for support and bug reports.
+
+
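The `sniff` docstring in sra.py states that an NCBI sra file begins with the 8 bytes 'NCBI.sra'. The hunks in this changeset do not show the full body of `sniff`, so what follows is only a minimal standalone sketch of such a magic-byte check, assuming nothing beyond that 8-byte prefix. The helper name `looks_like_ncbi_sra` is hypothetical; it is not Galaxy's `Binary.sniff` and does not use Galaxy's datatype API.

```python
import os
import tempfile

# Magic prefix described in the sniff docstring of sra.py (8 bytes).
NCBI_SRA_MAGIC = b"NCBI.sra"


def looks_like_ncbi_sra(filename):
    """Return True if the file starts with the 8-byte NCBI sra magic.

    Hypothetical helper: a standalone sketch, not Galaxy's Binary.sniff.
    """
    try:
        # Read only as many bytes as the magic prefix needs.
        with open(filename, "rb") as handle:
            header = handle.read(len(NCBI_SRA_MAGIC))
    except OSError:
        # Missing or unreadable files are simply not sniffed as sra.
        return False
    return header == NCBI_SRA_MAGIC


if __name__ == "__main__":
    # Write two throwaway files: one with the magic prefix, one fastq-like.
    with tempfile.NamedTemporaryFile(delete=False) as good:
        good.write(NCBI_SRA_MAGIC + b"\x00\x01rest-of-archive")
    with tempfile.NamedTemporaryFile(delete=False) as bad:
        bad.write(b"@read1\nACGT\n+\nIIII\n")
    print(looks_like_ncbi_sra(good.name))  # True
    print(looks_like_ncbi_sra(bad.name))   # False
    os.remove(good.name)
    os.remove(bad.name)
```

Reading just the first 8 bytes keeps sniffing cheap even for multi-gigabyte archives, and a file shorter than the prefix (including an empty one) is rejected because the short read never equals the magic.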