annotate README.md @ 1:1758bc8694e4 draft default tip

"planemo upload commit 6a9208ec123353417bc4c7f81c02e50c05d54e63-dirty"
author sanbi-uwc
date Sun, 19 Apr 2020 12:00:17 +0000
parents a1ae9babbfe1
children
0
a1ae9babbfe1 "planemo upload commit 8e874182b67b9a313bbb4d947d8db040a6148c5d-dirty"
sanbi-uwc
parents:
diff changeset
WindowMasker
------------

This is a Galaxy wrapper for WindowMasker. WindowMasker is a program that masks highly repetitive and low-complexity DNA sequences within a genome, using only the sequence of the genome itself.

The WinMask module works in two stages. During Stage 1, unit counts are collected and stored in a separate file. During Stage 2, that file is used to mask the input sequences. Usually the unit counts file is created once per genome and then reused for multiple masking runs.

WindowMasker_mkcounts
======================
Stage 1: Generate a counts file

$ windowmasker -mk_counts [-in input_file_name] [-out output_file_name] [-checkdup check_duplicates] [-t_low T_low] [-t_high T_high] [-fa_list input_is_a_list] [-mem available_memory] [-unit unit_length] [-genome_size genome_size] [-exclude_ids exclude_id_list] [-ids id_list] [-infmt input_format] [-sformat unit_counts_format] [-smem available_memory] [-use_ba use_bit_arrays]
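
The Stage 1 usage line above can be assembled programmatically. The following is a minimal Python sketch that only builds and prints the command; the helper name `mk_counts_command`, the file names, and the `-mem` value are hypothetical placeholders, and only flags that appear in the usage line are used.

```python
import shlex


def mk_counts_command(genome_fasta, counts_file, extra_options=None):
    """Build the argument list for a Stage 1 (unit counting) run."""
    # -mk_counts, -in and -out come from the usage line above.
    argv = ["windowmasker", "-mk_counts", "-in", genome_fasta, "-out", counts_file]
    # Any further flags from the usage line are passed through unchanged.
    for flag, value in (extra_options or {}).items():
        argv += [flag, str(value)]
    return argv


# Hypothetical file names and option value, for illustration only.
cmd = mk_counts_command("genome.fa", "genome.counts", {"-mem": 2048})
print(shlex.join(cmd))
```

To run the command rather than print it, the list can be handed to `subprocess.run(cmd, check=True)`, provided `windowmasker` is available on the PATH.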


WindowMasker_ustat
===================
Stage 2: WindowMasker reads the unit counts generated in Stage 1 and a set of input DNA sequences, and outputs information about the masked subintervals. If "-dust true" is specified, the DUST module's algorithm is applied to the input sequences in addition to window-based masking. When the DUST module is run, its results are merged with those of the WinMask module in the output: a base is masked if it is masked by either DUST or WinMask.
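
The merge rule described above (a base is masked if either module masks it) amounts to a union of two interval sets. The sketch below illustrates that rule in Python, assuming each mask is a list of half-open `(start, end)` pairs on one sequence. It is an illustration only, not WindowMasker's own implementation, and the coordinates are made up.

```python
def merge_masks(winmask, dust):
    """Return the union of two lists of half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(list(winmask) + list(dust)):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical masked regions on a single sequence.
winmask = [(10, 50), (200, 260)]
dust = [(40, 70), (300, 320)]
print(merge_masks(winmask, dust))
# [(10, 70), (200, 260), (300, 320)]
```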

$ windowmasker -ustat unit_counts [-in input_file_name] [-out output_file_name] [-window window_size] [-t_thres T_threshold] [-t_extend T_extend] [-t_low T_low] [-t_high T_high] [-set_t_low score] [-set_t_high score] [-infmt input_format] [-outfmt output_format] [-dust use_dust] [-exclude_ids exclude_id_list] [-ids id_list] [-text_match text_match_ids] [-use_ba use_bit_arrays]
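
Because the counts file is typically built once and reused, a Stage 2 run can be repeated over several inputs with the same `-ustat` argument. The Python sketch below only builds and prints such commands. The helper name `ustat_command` and all file names are hypothetical, and the accepted `-outfmt` values are not listed here, so they should be taken from the tool's own help output.

```python
import shlex


def ustat_command(counts_file, fasta, out_file, outfmt=None, dust=False):
    """Build the argument list for a Stage 2 (masking) run."""
    # -ustat, -in and -out come from the usage line above.
    argv = ["windowmasker", "-ustat", counts_file, "-in", fasta, "-out", out_file]
    if outfmt is not None:
        argv += ["-outfmt", outfmt]
    if dust:
        # "-dust true" additionally applies the DUST module, as described above.
        argv += ["-dust", "true"]
    return argv


# Hypothetical file names: one counts file reused for two inputs.
counts = "genome.counts"
for name in ["chr1", "chr2"]:
    cmd = ustat_command(counts, f"{name}.fa", f"{name}.masked", dust=True)
    print(shlex.join(cmd))
```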

Output formats:
* Use the binary or text maskinfo ASN.1 output formats to generate the mask file for the NCBI BLAST+ makeblastdb tool
* Use the BED output format to generate a list of masked regions
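
A list of masked regions in BED form is easy to summarise downstream. The Python sketch below totals the masked bases per sequence, assuming the BED output follows the standard layout (sequence name, 0-based start, exclusive end, separated by whitespace). The sample lines are invented for illustration.

```python
from collections import defaultdict


def masked_bases(bed_lines):
    """Sum the lengths of masked regions per sequence from BED lines."""
    totals = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # BED intervals are half-open, so the length is end - start.
        totals[chrom] += int(end) - int(start)
    return dict(totals)


# Hypothetical BED content.
example = ["chr1\t0\t120", "chr1\t300\t450", "chr2\t10\t25"]
print(masked_bases(example))
# {'chr1': 270, 'chr2': 15}
```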

Reference
==========
[NCBI C++ Toolkit Cross Reference -- WindowMasker](https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/app/winmasker/README)

Citation
=========

[1] Morgulis A, Gertz EM, Schaffer AA, Agarwala R. WindowMasker:
window-based masker for sequenced genomes. Submitted for publication.