
BWAKIT options

run-gen-ref options:

Usage: run-gen-ref <hs38|hs38a|hs38DH|hs37|hs37d5>
Analysis sets:
hs38     primary assembly of GRCh38 (incl. chromosomes, unplaced and unlocalized contigs) and EBV
hs38a    hs38 plus ALT contigs
hs38DH   hs38a plus decoy contigs and HLA genes (recommended for GRCh38 mapping)
hs37     primary assembly of GRCh37 (used by 1000g phase 1) plus the EBV genome
hs37d5   hs37 plus decoy contigs (used by 1000g phase 3)

Note: This script downloads human reference genomes. For hs38a and hs38DH, it needs additional
    sequences and ALT-to-REF mapping included in the bwa.kit package.
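
As a minimal sketch of how the analysis set name feeds into the next step, the helper below validates the name and prints the two commands that would follow. The function name gen_ref_cmds is hypothetical. The assumptions that run-gen-ref writes <set>.fa to the current directory and that the result is then indexed with "bwa index" come from the bwa.kit README, not from the usage text above.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands to build and index a reference.
# Assumes run-gen-ref writes <set>.fa in the current directory.
gen_ref_cmds() {
    case "$1" in
        hs38|hs38a|hs38DH|hs37|hs37d5)
            printf 'run-gen-ref %s\n' "$1"
            printf 'bwa index %s.fa\n' "$1"
            ;;
        *)
            # Reject anything that is not one of the five documented sets.
            printf 'unknown analysis set: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

gen_ref_cmds hs38DH
```

Printing the commands instead of running them keeps the sketch safe to try, since run-gen-ref downloads a full human reference genome.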

run-bwamem options:

Usage:   run-bwamem [options] <idxbase> <file1> [file2]

Options: -o STR    prefix for output files                       [inferred from input]
         -R STR    read group header line such as '@RG\tID:foo\tSM:bar'         [null]
         -x STR    read type: pacbio, ont2d or intractg                      [default]
                     intractg: intra-species contig (kb query, highly similar)
                     pacbio:   pacbio subreads (~10kb query, high error rate)
                     ont2d:    Oxford Nanopore reads (~10kb query, higher error rate)
         -t INT    number of threads                                               [1]

         -H        apply HLA typing
         -a        trim HiSeq2000/2500 PE resequencing adapters (via trimadap)
         -d        mark duplicates (via samblaster)
         -S        for BAM input, don't shuffle
         -s        sort the output alignment (via samtools; requiring more RAM)
         -k        keep temporary files generated by typeHLA
         -M        mark shorter split hits as secondary
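
The -R value is easy to get wrong because the \t sequences must reach run-bwamem as a literal backslash followed by "t", as in the option description above. The sketch below builds such a string and prints a command line that uses it. The helper name make_rg and the file names are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: build a read group header line for -R.
# The \t sequences are emitted as a literal backslash followed by 't',
# matching the form shown in the option description.
make_rg() {
    # $1 = read group ID, $2 = sample name
    printf '@RG\\tID:%s\\tSM:%s\n' "$1" "$2"
}

rg=$(make_rg foo bar)
# Print the command line rather than running it.
printf "run-bwamem -o prefix -t 8 -R '%s' hs38DH.fa read1.fq.gz read2.fq.gz\n" "$rg"
```

In the bwa.kit README the output of run-bwamem is itself piped to sh for execution. That behaviour is assumed here rather than shown in the usage text.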

Examples:

* Map paired-end reads to GRCh38+ALT+decoy+HLA and perform HLA typing:

    run-bwamem -o prefix -t8 -HR"@RG\tID:foo\tSM:bar" hs38DH.fa read1.fq.gz read2.fq.gz

Note: HLA typing is only effective for high-coverage data. The typing accuracy varies
with the quality of input. It is intended for research purposes only, not for diagnostics.

* Remap coordinate-sorted BAM, transfer read group tags, trim Illumina PE adapters and
sort the output. The BAM may contain single-end or paired-end reads, or a mixture of
the two types. Specifying -R stops read group transfer.

    run-bwamem -sao prefix hs38DH.fa old-srt.bam

Note: the adapter trimmer included in bwa.kit was chosen because it fits the current
mapping pipeline better. It is conservative and suboptimal. A more sophisticated
trimmer is recommended if this becomes a concern.

* Remap name-grouped BAM and mark duplicates:

    run-bwamem -Sdo prefix hs38DH.fa old-unsrt.bam

Note: streamed duplicate marking requires all reads from a single paired-end library
to be aligned at the same time.
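
One way to choose between the two BAM examples above is to inspect the @HD line of the BAM header. The sketch below reads header text on stdin (for example from "samtools view -H") and prints -S only when the header declares SO:queryname or GO:query. Treating those two tags as "name-grouped" is an assumption of this sketch, and the helper name shuffle_flag is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read SAM header text on stdin and print "-S" when the
# @HD line declares the reads as name-sorted (SO:queryname) or name-grouped
# (GO:query). Print nothing otherwise, so the input would be shuffled.
shuffle_flag() {
    awk -F '\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i == "SO:queryname" || $i == "GO:query") found = 1
        }
        END { if (found) print "-S" }
    '
}

# Example with inline header text; with a real file one might use
#   samtools view -H old.bam | shuffle_flag
printf '@HD\tVN:1.6\tSO:coordinate\n' | shuffle_flag   # prints nothing
printf '@HD\tVN:1.6\tSO:queryname\n'  | shuffle_flag   # prints -S
```

A header without an @HD line, or with SO:unsorted and no GO tag, yields no flag, which falls back to the shuffling behaviour of the coordinate-sorted example.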

Output files:

{-o}.aln.bam - final alignment
{-o}.hla.top - best genotypes for the 6 classical HLA genes (if there are HLA-* contigs)
{-o}.hla.all - additional HLA genotypes consistent with data
{-o}.log.*   - log files
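
As a rough sketch of how the naming scheme above can be used after a run, the helper below lists the files expected for a given -o prefix and reports any that are missing. The function name expected_outputs and its "hla" switch are hypothetical. The log files are left out because their suffixes are not spelled out above.

```shell
#!/bin/sh
# Hypothetical helper: list the output files expected for a given -o prefix.
# HLA files are listed only when the second argument is "hla".
expected_outputs() {
    printf '%s.aln.bam\n' "$1"
    if [ "${2:-}" = "hla" ]; then
        printf '%s.hla.top\n' "$1"
        printf '%s.hla.all\n' "$1"
    fi
}

# Report which expected files are missing in the current directory.
for f in $(expected_outputs prefix hla); do
    [ -e "$f" ] || printf 'missing: %s\n' "$f"
done
```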