--- a/macros.xml	Thu Jul 15 11:51:11 2021 +0000
+++ b/macros.xml	Thu Jul 15 15:19:24 2021 +0000
@@ -327,40 +327,41 @@

 **AAIc - Average Amino Acid Identity coast**

-The AAIc is an attempt to have transform the AAI into a measure to compare two proteomes, as annotated.
-Low identity hits will be considered, when they are usually removed.
-On the other hand proteins that have no match at all will be also considered, as having 0 identity.
+The AAIc is an attempt to modify the AAI into a measure that compares proteomes across all annotated proteins.
+Low-identity hits are considered, although the traditional method usually removes them.
+Proteins that have no match at all are also considered, as having 0 identity.
 It provides a way to compare the actual annotations and to select organisms, even taxonomically distant ones, whose proteins could be
 relevant, for example, for determining the function of hypothetical proteins.
-For this the best hit is considered the one with the highest identity.
+For this purpose, the best hit is selected by the highest identity.

 **AAIbd - Average Amino Acid Identity blast-diamond**

 The AAIbd is an implementation of a calculation similar to that of the original
-AAI, but calculated simply one way. It has by default a coverage and identity
-of 50 and 40 respectively, as used also by EzAAI, based in the recent study
+AAI, but calculated only one way. By default it uses a coverage and identity
+of 50 and 40 respectively. These values are also used by EzAAI, based on the recent study
 done by Nicholson et al. in 2020. The best hit is then selected by the
-highest identity The main purpose of this metric is to provide the user with an
-estimate of how close taxonomically that taxid might be. The designation **bd** is
-to distinguish it from the original AAIb, and because of the fact it might be
+highest identity.
+The main purpose of this metric is to provide the user with an
+estimate of how taxonomically close that taxonomic node might be. The designation **bd** is used
+to distinguish it from the original AAIb. It indicates that the score might be
 produced using either BLAST results or diamond results.

-The following options might be used to calibrate this selection to the user's context:
+The following options might be used to calibrate this selection to the user's preference:

-- Minimum Identity: Minimum Amino Acid Identity, for hit selection for AAIbd calculation
-- Minimum Coverage: Minimum coverage, for hit selection for AAIbd calculation
+- Minimum Identity: Minimum Amino Acid Identity, for hit selection for the AAIbd calculation;
+- Minimum Coverage: Minimum coverage, for hit selection for the AAIbd calculation.

 **HITSPP - Hits Per Protein**

 The score is the total number of hits obtained by all proteins, divided by the number of proteins in the query
 proteome.
-This will help the user understand how represented the proteome’s proteins might be in in that database.
+This will help the user understand how well represented the proteome’s proteins might be in that particular database.

 .. class:: warningmark

 **WARNING** Very high values, above 100, might indicate that the taxonomic node is highly represented in the database.
 Intermediate steps only deal with up to 500 hits per protein, before best-hit selection.
-As such, a small number of organisms with very high HITSPP can reduce the amount of organisms returned.
+As such, a small number of organisms with very high HITSPP scores can reduce the number of organisms returned.

     ]]></token>
     <token name="@OUT_DESC@"><![CDATA[
@@ -368,14 +369,14 @@
 Outputs
 _______

-**Batch alignment results**  This is a non-optional output. It contains the total alignment search results for all proteins in the proteome. This can also be used to generated new outputs from the COAST Report tool, using different parameters.
+**Batch alignment results**  This is a non-optional output. It contains all alignment search results for all proteins in the proteome. It can also be used to generate new outputs from the COAST Report tool, using different parameters.

 **Summarized report**  An HTML document that contains a list of filtered results ordered by AAIc. This report includes a heatmap visualization of protein identities.
 It also contains metadata for the COAST job.

-**Best-hits table** Tabular file with all the individual selected best-hits for each protein in the proteome. These are hits selected for the AAIc calculation.
+**Best-hits table** Tabular file with all the individual selected best-hits for each protein in the proteome. These are the hits selected for the AAIc calculation.

-**Results table** Tabular file with aggregated metrics for each proteome match. Aggregated for TAXID.
+**Results table** Tabular file with aggregated metrics for each proteome match, aggregated by taxid.

     ]]></token>
     <token name="@TAX_FILTER_WARNING@"><![CDATA[
@@ -385,8 +386,8 @@

 Taxonomic based filtering is present in both BLAST and diamond. It is **THE** key for short COAST run times in large databases.

-Most organisms in a database, like nr or Trembl ,are not useful in the close proteomes identification process.
-When users try to identify similar viruses, the bacteria and eukaryotes in the same database will only slow the search down.
+Most organisms in a database, like nr or TrEMBL, are not useful in the process of identifying close proteomes.
+When users try, for example, to identify similar viruses, the bacteria and eukaryotes in the same database will only slow the search down.
 You should determine how wide you want the search to be and identify the corresponding TAXID node.
 Some of these filters are provided along with this tool.

@@ -394,7 +395,7 @@
     <token name="@HYPO_FILTER_WARNING@"><![CDATA[
 .. class:: warningmark

-**WARNING** Hypothetical protein filtering might lead to worse results. Should only be used when few of the proteins have corresponding best-hits and the database might lack poorly studied proteins.
+**WARNING - Experimental feature** Hypothetical protein filtering can lead to worse results. It should only be used when few of the proteins have corresponding best-hits and the database might lack poorly studied proteins.

     ]]></token>
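
The three scores described in the help text above (AAIc, AAIbd and HITSPP) can be illustrated with a minimal Python sketch. This is not the COAST implementation: the record layout `(query, taxid, identity, coverage)`, the function names, and the per-taxid grouping are assumptions made for illustration. In particular, averaging AAIbd only over proteins that have a qualifying best hit is an assumption based on the text saying it resembles the original AAI.

```python
from collections import defaultdict


def best_hits(hits, min_identity=0.0, min_coverage=0.0):
    """Keep, per (taxid, query protein), the highest identity among hits passing the thresholds."""
    best = {}
    for query, taxid, identity, coverage in hits:
        if identity < min_identity or coverage < min_coverage:
            continue
        key = (taxid, query)
        if key not in best or identity > best[key]:
            best[key] = identity
    return best


def aaic(hits, proteome_size):
    """Sum best-hit identities per taxid and divide by ALL query proteins.

    Proteins without any hit contribute 0 identity, as the help text describes.
    """
    totals = defaultdict(float)
    for (taxid, _query), identity in best_hits(hits).items():
        totals[taxid] += identity
    return {taxid: total / proteome_size for taxid, total in totals.items()}


def aaibd(hits, min_identity=40.0, min_coverage=50.0):
    """One-way average identity over proteins with a qualifying best hit (assumed denominator)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (taxid, _query), identity in best_hits(hits, min_identity, min_coverage).items():
        totals[taxid] += identity
        counts[taxid] += 1
    return {taxid: totals[taxid] / counts[taxid] for taxid in totals}


def hitspp(hits, proteome_size):
    """Count all hits per taxid and divide by the number of query proteins."""
    counts = defaultdict(int)
    for _query, taxid, _identity, _coverage in hits:
        counts[taxid] += 1
    return {taxid: count / proteome_size for taxid, count in counts.items()}


# Hypothetical example: a query proteome of 4 proteins, hits against two taxids.
example_hits = [
    ("p1", 10, 90.0, 80.0),
    ("p1", 10, 70.0, 95.0),
    ("p2", 10, 30.0, 60.0),
    ("p1", 20, 45.0, 40.0),
]
aaic_scores = aaic(example_hits, proteome_size=4)
aaibd_scores = aaibd(example_hits)
hitspp_scores = hitspp(example_hits, proteome_size=4)
```

With this toy input, taxid 10 gets an AAIc of 30.0 (best identities 90 and 30, divided by 4 proteins) but an AAIbd of 90.0, because the 30 % hit falls below the default identity threshold. Taxid 20 has no AAIbd value at all, since its only hit fails the coverage threshold.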
--- a/tool-data/blastdb.loc.sample	Thu Jul 15 11:51:11 2021 +0000
+++ b/tool-data/blastdb.loc.sample	Thu Jul 15 15:19:24 2021 +0000
@@ -1,5 +1,5 @@
 #This is a sample file distributed with Galaxy that enables tools
-#to use a directory of Samtools indexed sequences data files.  You will need
+#to use a directory of BLAST databases.  You will need
 #to create these data files and then create a coast_taxonomic_filters.loc file
 #similar to this one (store it in this directory) that points to
 #the directories in which those files are stored. The coast_taxonomic_filters.loc
--- a/tool-data/coast_taxonomic_filters.loc.sample	Thu Jul 15 11:51:11 2021 +0000
+++ b/tool-data/coast_taxonomic_filters.loc.sample	Thu Jul 15 15:19:24 2021 +0000
@@ -1,5 +1,5 @@
 #This is a sample file distributed with Galaxy that enables tools
-#to use a directory of Samtools indexed sequences data files.  You will need
+#to use a directory of coast_taxonomic_filters.  You will need
 #to create these data files and then create a coast_taxonomic_filters.loc file
 #similar to this one (store it in this directory) that points to
 #the directories in which those files are stored. The coast_taxonomic_filters.loc
--- a/tool-data/diamond_database.loc.sample	Thu Jul 15 11:51:11 2021 +0000
+++ b/tool-data/diamond_database.loc.sample	Thu Jul 15 15:19:24 2021 +0000
@@ -1,5 +1,5 @@
 #This is a sample file distributed with Galaxy that enables tools
-#to use a directory of Samtools indexed sequences data files.  You will need
+#to use a directory of diamond databases.  You will need
 #to create these data files and then create a coast_taxonomic_filters.loc file
 #similar to this one (store it in this directory) that points to
 #the directories in which those files are stored. The coast_taxonomic_filters.loc
--- a/tool_data_table_conf.xml.sample	Thu Jul 15 11:51:11 2021 +0000
+++ b/tool_data_table_conf.xml.sample	Thu Jul 15 15:19:24 2021 +0000
@@ -8,10 +8,6 @@
         <columns>value, name, path</columns>
         <file path="tool-data/blastdb.loc" />
     </table>
-	<table name="blastdb" comment_char="#">
-        <columns>value, name, path</columns>
-        <file path="tool-data/blastdb.loc" />
-    </table>
 	<table name="diamond_database" comment_char="#">
         <columns>value, name, db_path</columns>
         <file path="tool-data/diamond_database.loc" />