comparison tool-data/blastdb.loc.sample @ 1:5e9d5e536b79 draft

Uploaded v0.1.02 preview 2, clarify sample blastdb loc files, etc
author peterjc
date Tue, 03 Mar 2015 05:32:18 -0500
parents 432ea9614cc9
children
--- tool-data/blastdb.loc.sample @ 0:432ea9614cc9
+++ tool-data/blastdb.loc.sample @ 1:5e9d5e536b79
@@ -1,39 +1,44 @@
-#This is a sample file distributed with Galaxy that is used to define a
-#list of nucleotide BLAST databases, using three columns tab separated
-#(longer whitespace are TAB characters):
+# This is a sample file distributed with Galaxy that is used to define a
+# list of nucleotide BLAST databases, using three columns tab separated:
 #
-#<unique_id> <database_caption> <base_name_path>
+# <unique_id>{tab}<database_caption>{tab}<base_name_path>
 #
-#The captions typically contain spaces and might end with the build date.
-#It is important that the actual database name does not have a space in
-#it, and that there are only two tabs on each line.
+# The captions typically contain spaces and might end with the build date.
+# It is important that the actual database name does not have a space in
+# it, and that there are only two tabs on each line.
 #
-#So, for example, if your database is nt and the path to your base name
-#is /depot/data2/galaxy/blastdb/nt/nt.chunk, then the blastdb.loc entry
-#would look like this:
-#
-#nt_02_Dec_2009 nt 02 Dec 2009 /depot/data2/galaxy/blastdb/nt/nt.chunk
+# You can download the NCBI provided protein databases like NR from here:
+# ftp://ftp.ncbi.nlm.nih.gov/blast/db/
+#
+# For simplicity, many Galaxy servers are configured to offer just a live
+# version of each NCBI BLAST database (updated with the NCBI provided
+# Perl scripts or similar). In this case, we recommend using the case
+# sensistive base-name of the NCBI BLAST databases as the unique id.
+# Consistent naming is important for sharing workflows between Galaxy
+# servers.
 #
-#and your /depot/data2/galaxy/blastdb/nt directory would contain all of
-#your "base names" (e.g.):
+# For example, consider the NCBI partially non-redundant nucleotide
+# nt BLAST database, where you have downloaded and decompressed the
+# files under /data/blastdb/ meaning at the command line BLAST+ would
+# would look at the files /data/blastdb/nt.n* when run with:
 #
-#-rw-r--r-- 1 wychung galaxy 23437408 2008-04-09 11:26 nt.chunk.00.nhr
-#-rw-r--r-- 1 wychung galaxy 3689920 2008-04-09 11:26 nt.chunk.00.nin
-#-rw-r--r-- 1 wychung galaxy 251215198 2008-04-09 11:26 nt.chunk.00.nsq
-#...etc...
+# $ blastn -db /data/blastdb/nt -query ...
 #
-#Your blastdb.loc file should include an entry per line for each "base name"
-#you have stored. For example:
+# In this case use nr (lower case to match the NCBI file naming) as the
+# unique id in the first column of blastdb_p.loc, giving an entry like
+# this:
 #
-#nt_02_Dec_2009 nt 02 Dec 2009 /depot/data2/galaxy/blastdb/nt/nt.chunk
-#wgs_30_Nov_2009 wgs 30 Nov 2009 /depot/data2/galaxy/blastdb/wgs/wgs.chunk
-#test_20_Sep_2008 test 20 Sep 2008 /depot/data2/galaxy/blastdb/test/test
-#...etc...
+# nt{tab}NCBI partially non-redundant (nt){tab}/data/blastdb/nt
 #
-#You can download the NCBI provided protein databases like NT from here:
-#ftp://ftp.ncbi.nlm.nih.gov/blast/db/
+# Alternatively, rather than a "live" mirror of the NCBI databases which
+# are updated automatically, for full reproducibility the Galaxy Team
+# recommend saving date-stamped copies of the databases. In this case
+# your blastdb.loc file should include an entry per line for each
+# version you have stored. For example:
 #
-#See also blastdb_p.loc which is for any protein BLAST database, and
-#blastdb_d.loc which is for any protein domains databases (like CDD).
-
-
+# nt_05Jun2010{tab}NCBI nt (partially non-redundant) 05 Jun 2010{tab}/data/blastdb/05Jun2010/nt
+# nt_15Aug2010{tab}NCBI nt (partially non-redundant) 15 Aug 2010{tab}/data/blastdb/15Aug2010/nt
+# ...etc...
+#
+# See also blastdb_p.loc which is for any protein BLAST database, and
+# blastdb_d.loc which is for any protein domains databases (like CDD).
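The sample file describes a small format: three tab-separated columns, no spaces in the database name, and exactly two tabs per line. The new version also recommends date-stamped entries such as nt_05Jun2010. Below is a minimal Python sketch of checking that format and of building such a date-stamped line. It is not Galaxy's own loader: the names `LocEntry`, `parse_loc` and `dated_entry` are hypothetical. Treating blank lines and lines starting with "#" as comments, and reading "actual database name" as covering both the unique id and the base name path, are assumptions. Real files need actual TAB characters where the sample writes {tab}.

```python
from datetime import date
from typing import List, NamedTuple


class LocEntry(NamedTuple):
    unique_id: str  # first column, e.g. "nt"
    caption: str  # second column, may contain spaces
    base_path: str  # third column, the value BLAST+ would get as -db


def parse_loc(text: str) -> List[LocEntry]:
    """Parse blastdb.loc style text, raising ValueError on a malformed line."""
    entries: List[LocEntry] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # assumed: blank lines and '#' lines are comments
        fields = line.split("\t")
        if len(fields) != 3:  # exactly two tabs per line
            raise ValueError(
                f"line {lineno}: expected 3 tab-separated columns, got {len(fields)}"
            )
        unique_id, caption, base_path = fields
        # Assumed reading of "no space in the database name": check id and path.
        for label, value in (("unique id", unique_id), ("base name path", base_path)):
            if not value or any(ch.isspace() for ch in value):
                raise ValueError(f"line {lineno}: {label} must be non-empty with no spaces")
        entries.append(LocEntry(unique_id, caption, base_path))
    return entries


def dated_entry(db: str, caption: str, day: date, root: str) -> str:
    """Hypothetical helper building a date-stamped line like nt_05Jun2010.

    The naming scheme is copied from the sample entries; %b assumes the
    default C locale, which gives English month abbreviations.
    """
    stamp = day.strftime("%d%b%Y")  # e.g. 05Jun2010
    label = day.strftime("%d %b %Y")  # e.g. 05 Jun 2010
    path = f"{root.rstrip('/')}/{stamp}/{db}"
    return "\t".join([f"{db}_{stamp}", f"{caption} {label}", path])
```

For instance, `dated_entry("nt", "NCBI nt (partially non-redundant)", date(2010, 6, 5), "/data/blastdb")` should reproduce the first date-stamped line of the sample, with real tabs in place of {tab}. Splitting only on tabs is what allows the caption to keep its spaces while any stray tab is still caught as a column-count error.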