
What it does

This is a Galaxy wrapper exposing fastq_screen, software from Babraham. It is designed to search sequence data in FASTQ files for matches to contaminants, or to check the likely species of origin. In QC checking, you can use it to look for (e.g.) sequence from contaminating mycoplasma in cell cultures: such contamination may be non-differential, but it will be pro-inflammatory and, well, less than ideal.

Here's the help from the Perl script used by this wrapper:

Fastq Screen - Screen sequences against a panel of databases

Synopsis

fastq_screen [OPTION]... [FastQ FILE]...

Function

Fastq Screen is intended to be used as part of a QC pipeline. It allows you to take a sequence dataset and search it against a set of bowtie databases. It will then generate both a text and a graphical summary of the results to see if the sequence dataset contains the kind of sequences you expect or not.
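As a rough sketch of how such a QC step might be scripted, the Python helper below assembles a fastq_screen command line from the options documented under Options. The function name `build_fastq_screen_cmd`, its parameters, and the example file names are hypothetical and are not part of fastq_screen or this wrapper. It also assumes that value-taking options accept their value as the following argument. Nothing is executed; the helper only returns the argument list.

```python
import shlex


def build_fastq_screen_cmd(fastq_files, paired=False, subset=None, threads=None,
                           aligner=None, outdir=None, conf=None,
                           executable="fastq_screen"):
    """Return an argv list for a fastq_screen run (hypothetical helper).

    Only flags named in the help text are used. Passing each value as the
    argument after its flag is an assumption, not confirmed by the help text.
    """
    fastq_files = list(fastq_files)
    if not fastq_files:
        raise ValueError("at least one FastQ file is required")
    # --paired expects pairs of files immediately after one another.
    if paired and len(fastq_files) % 2 != 0:
        raise ValueError("paired-end runs need an even number of files")
    # The help text lists 'bowtie' and 'bowtie2' as the valid aligners.
    if aligner is not None and aligner not in ("bowtie", "bowtie2"):
        raise ValueError("aligner must be 'bowtie' or 'bowtie2'")

    cmd = [executable]
    if paired:
        cmd.append("--paired")
    for flag, value in (("--subset", subset), ("--threads", threads),
                        ("--aligner", aligner), ("--outdir", outdir),
                        ("--conf", conf)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd + fastq_files


if __name__ == "__main__":
    # Example file names are placeholders.
    argv = build_fastq_screen_cmd(["sample_R1.fastq", "sample_R2.fastq"],
                                  paired=True, subset=100000, threads=4,
                                  aligner="bowtie2", outdir="qc")
    print(shlex.join(argv))
```

The returned list could be handed to `subprocess.run` on a machine where fastq_screen and its configuration file are installed. Building a list rather than a shell string avoids quoting problems with unusual file names.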

Options

--help -h Print program help and exit

--subset Don't use the whole sequence file to search, but create a temporary dataset of this size. The dataset created will be approximately (within a factor of 2) this size. If the real dataset is smaller than twice the specified size then the whole dataset will be used. Subsets will be taken evenly from throughout the whole original dataset.

--paired Files are paired end. Files must be specified in the correct order with pairs of files coming immediately after one another. Results files will be named after the first file in the pair if the names differ between the two files.

--outdir Specify a directory in which to save output files. If no directory is specified then output files are saved into the same directory as the input file.

--illumina1_3 Assume that the quality values are encoded in Illumina v1.3 format. Defaults to Sanger format if this flag is not specified.

--quiet Suppress all progress reports on stderr and only report errors

--version Print the program version and exit

--threads Specify how many threads bowtie will be allowed to use. Overrides the default value set in the conf file.

--conf Manually specify a location for the configuration file to be used for this run. If not specified then the file will be taken from the same directory as the fastq_screen program

--color FastQ files are in colorspace. This requires that the libraries configured in the config file are colorspace indices.

--bowtie Specify extra parameters to be passed to bowtie. These parameters should be quoted to clearly delimit bowtie parameters from fastq_screen parameters. You should not try to use this option to override the normal search or reporting options for bowtie which are set automatically but it might be useful to allow reads to be trimmed before alignment etc.

--bowtie2 Specify extra parameters to be passed to bowtie2. These parameters should be quoted to clearly delimit bowtie2 parameters from fastq_screen parameters. You should not try to use this option to override the normal search or reporting options for bowtie2 which are set automatically but it might be useful to allow reads to be trimmed before alignment etc.

--nohits Writes to a file the sequences that did not map to any of the specified genome libraries. If the subset option is also specified, only reads from the temporary dataset that failed to align to the reference genomes will be written to the output file.

--aligner Specify the aligner to use for the mapping. Valid arguments are 'bowtie' or 'bowtie2'.
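The sizing rule described for the subset option can be made concrete with a little arithmetic. The sketch below is not taken from the fastq_screen source. It shows one possible way to satisfy the stated behaviour, assuming reads are sampled by keeping every `step`-th read. The function name `plan_subset` and its return values are hypothetical.

```python
def plan_subset(total_reads, subset_size):
    """Return (use_whole_dataset, step, reads_kept) for a requested subset size.

    Illustrative only: assumes even sampling by keeping every step-th read,
    which may differ from what fastq_screen actually does.
    """
    if total_reads < 0 or subset_size <= 0:
        raise ValueError("total_reads must be >= 0 and subset_size > 0")
    # Stated rule: if the dataset is smaller than twice the requested size,
    # the whole dataset is used.
    if total_reads < 2 * subset_size:
        return True, 1, total_reads
    # Keeping reads 0, step, 2*step, ... spreads the sample evenly and yields
    # between subset_size and 2*subset_size reads.
    step = total_reads // subset_size
    reads_kept = -(-total_reads // step)  # ceiling division
    return False, step, reads_kept


if __name__ == "__main__":
    for total in (150, 250, 1000):
        print(total, plan_subset(total, 100))
```

Under this assumption a request for 100 reads from a 250-read file keeps every second read (125 reads), which is within the factor of 2 that the help text allows.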

Attributions

Note that each component has its own license. Good luck with figuring out your obligations.

fastq_screen - see the web site at Fastq_screen

Galaxy (that's what you are using right now!) for gluing everything together

The code and documentation comprising this tool were written by Ross Lazarus, and that part is licensed the same way as other rgenetics artefacts.