view msi_filtering.xml @ 7:a3ec6c3564ee draft

planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/msi_filtering commit 620a469e20836b921b6c0147421c8a4268b66ebd
author galaxyp
date Wed, 15 Aug 2018 05:38:41 -0400
parents c7c91ceeffcd
children 04fb6dce1ee3

<tool id="mass_spectrometry_imaging_filtering" name="MSI filtering" version="1.10.0.5">
    <description>tool for filtering mass spectrometry imaging data</description>
    <requirements>
        <requirement type="package" version="1.10.0">bioconductor-cardinal</requirement>
        <requirement type="package" version="2.2.1">r-gridextra</requirement>
        <requirement type="package" version="2.2.1">r-ggplot2</requirement>
    </requirements>
    <command detect_errors="exit_code">
    <![CDATA[

        #if $infile.ext == 'imzml'
            ln -s '${infile.extra_files_path}/imzml' infile.imzML &&
            ln -s '${infile.extra_files_path}/ibd' infile.ibd &&
        #elif $infile.ext == 'analyze75'
            ln -s '${infile.extra_files_path}/hdr' infile.hdr &&
            ln -s '${infile.extra_files_path}/img' infile.img &&
            ln -s '${infile.extra_files_path}/t2m' infile.t2m &&
        #else
            ln -s '$infile' infile.RData &&
        #end if
        cat '${MSI_subsetting}' &&
        echo '${MSI_subsetting}' &&
        Rscript '${MSI_subsetting}'

    ]]>
    </command>
    <configfiles>
        <configfile name="MSI_subsetting"><![CDATA[


################################# load libraries and read file #################


library(Cardinal)
library(ggplot2)
library(gridExtra)

#if $infile.ext == 'imzml'
    #if str($processed_cond.processed_file) == "processed":
        msidata <- readImzML('infile', mass.accuracy=$processed_cond.accuracy, units.accuracy = "$processed_cond.units")
    #else
        msidata <- readImzML('infile')
    #end if
#elif $infile.ext == 'analyze75'
    msidata = readAnalyze('infile')
#else
    load('infile.RData')
#end if


########################### QC numbers ########################

        ## Number of features (m/z)
        maxfeatures = length(features(msidata))
        ## Range m/z
        minmz = round(min(mz(msidata)), digits=2)
        maxmz = round(max(mz(msidata)), digits=2)
        ## Number of spectra (pixels)
        pixelcount = length(pixels(msidata))
        ## Range x coordinates
        minimumx = min(coord(msidata)[,1])
        maximumx = max(coord(msidata)[,1])
        ## Range y coordinates
        minimumy = min(coord(msidata)[,2])
        maximumy = max(coord(msidata)[,2])
        ## Number of intensities > 0
        npeaks= sum(spectra(msidata)[]>0, na.rm=TRUE)
        ## Spectra multiplied with m/z (potential number of peaks)
        numpeaks = ncol(spectra(msidata)[])*nrow(spectra(msidata)[])
        ## Percentage of intensities > 0
        percpeaks = round(npeaks/numpeaks*100, digits=2)
        ## Number of empty TICs
        TICs = colSums(spectra(msidata)[], na.rm=TRUE) 
        NumemptyTIC = sum(TICs == 0)
        ## median TIC
        medint = round(median(TICs), digits=2)
        ## Store features for QC plot
        featuresinfile = mz(msidata)

## Next steps will only run if there are more than 0 intensities/pixels/features in the file

if (sum(spectra(msidata)[]>0, na.rm=TRUE) > 0)
{


        ## prepare dataframe for QC of pixel distribution (will be overwritten in filtering of pixels condition)
        position_df = cbind(coord(msidata)[,1:2], rep("$infile.element_identifier", times=ncol(msidata)))
        colnames(position_df)[3] = "annotation"

    ###################################### Filtering of pixels #####################
    ################################################################################

    #################### Pixels in the one column format "x=,y=" #####################

    #if str($pixels_cond.pixel_filtering) == "single_column":
        print("single column")

        ## read tabular file, count number of rows (= number of pixels), count how many pixels are valid
        input_list = read.delim("$pixels_cond.single_pixels", header = FALSE, stringsAsFactors = FALSE)
        startingrow = $pixels_cond.pixel_header+1
        numberpixels = length(startingrow:nrow(input_list))
        valid_entries = input_list[startingrow:nrow(input_list),$pixels_cond.pixel_column] %in% names(pixels(msidata))
        validpixels = sum(valid_entries)
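        ## index the rows after the header lines so that valid_entries lines up with the pixel rows
        valid_annotations = input_list[startingrow:nrow(input_list),][valid_entries,c($pixels_cond.pixel_column, $pixels_cond.annotation_column)]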
        colnames(valid_annotations) = c("pixel_coordinates", "annotation")

        ## for valid pixels: filter file for pixels and create dataframe with x,y,annotation for QC
        if (validpixels != 0){
            ## filter file for pixels
            pixelsofinterest = pixels(msidata)[names(pixels(msidata)) %in% valid_annotations[,1]]
            msidata = msidata[,pixelsofinterest]
            ## position_df for QC


            ###pixel_coords = coord(msidata)[names(pixels(msidata)) %in% valid_annotations[,1],1:2]
            pixel_coords = cbind(rownames(coord(msidata)), coord(msidata))
            colnames(pixel_coords)[1] = "pixel_coordinates"

            ###position_df = cbind(pixel_coords, valid_annotations[,2])
            position_df = merge(pixel_coords, valid_annotations)
            position_df\$annotation = factor(position_df\$annotation)

        }else{
            msidata = msidata[,0]
            validpixels=0}

    ############ Pixels in two columns format: x and y in different columns #############

    #elif str($pixels_cond.pixel_filtering) == "two_columns":
        print("two columns")

        ## read tabular file, count number of rows (= number of pixels), extract dataframe with x,y,annotation (for QC), count number of valid pixels
        input_list = read.delim("$pixels_cond.two_columns_pixel", header = FALSE, 
        stringsAsFactors = FALSE)
        startingrow = $pixels_cond.pixel_header+1
        numberpixels = length(startingrow:nrow(input_list))
        inputpixels = input_list[startingrow:nrow(input_list),c($pixels_cond.pixel_column_x, $pixels_cond.pixel_column_y, $pixels_cond.annotation_column_xy)]
        colnames(inputpixels) = c("x", "y", "annotation")
        ## inner join: keep only pixels that occur in both the MSI data and the input file
        position_df = merge(coord(msidata)[,1:2], inputpixels, by=c("x", "y"))
        validpixels = nrow(position_df)
        colnames(position_df)[3] = "annotation"
        position_df\$annotation = factor(position_df\$annotation)

        ## for valid pixels: filter file for pixels
        if (validpixels != 0){
            pixelvector = character()
            for (pixel in 1:nrow(position_df)){
                pixelvector[pixel] = paste0("x = ", position_df[pixel,1],", ", "y = ", position_df[pixel,2])}
            pixelsofinterest= pixels(msidata)[names(pixels(msidata)) %in% pixelvector]
            msidata = msidata[,pixelsofinterest]
        }else{
            validpixels=0}

    ########### Pixels within x and y minima and maxima are kept ###################

    #elif str($pixels_cond.pixel_filtering) == "pixel_range":
        print("pixel range")

        numberpixels = "range"
        validpixels = "range"

        ## only filter pixels if at least one pixel lies within both the x and the y range
        pixels_in_range = coord(msidata)\$x <= $pixels_cond.max_x_range & coord(msidata)\$x >= $pixels_cond.min_x_range & coord(msidata)\$y <= $pixels_cond.max_y_range & coord(msidata)\$y >= $pixels_cond.min_y_range
        if (sum(pixels_in_range) > 0){

            msidata = msidata[, pixels_in_range]
        }else{
            msidata = msidata[,0]
            print("no valid pixel found")}

        ## update position_df for filtered pixels
        position_df = cbind(coord(msidata)[,1:2], rep("$infile.element_identifier", times=ncol(msidata)))
        colnames(position_df)[3] = "annotation"
        position_df\$annotation = factor(position_df\$annotation)

    #elif str($pixels_cond.pixel_filtering) == "none":
        print("no pixel filtering")

        numberpixels = 0
        validpixels = 0

    #end if

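}else{
    print("Inputfile has no intensities > 0")
    ## define the pixel counts so that the QC table below can still be built
    numberpixels = NA
    validpixels = NA
}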

    ################################# filtering of features ######################
    ##############################################################################

    ####################### Keep m/z from tabular file #########################

## feature filtering only when pixels/features/intensities are left
    npeaks_before_filtering= sum(spectra(msidata)[]>0, na.rm=TRUE)



if (npeaks_before_filtering > 0)

{

    #if str($features_cond.features_filtering) == "features_list":
        print("feature list")

        ## read tabular file, define starting row, extract and count valid features
        input_features = read.delim("$features_cond.inputfeatures", header = FALSE, stringsAsFactors = FALSE)
        startingrow = $features_cond.feature_header+1
        extracted_features = input_features[startingrow:nrow(input_features),$features_cond.feature_column]
        numberfeatures = length(extracted_features)

        ## find out type of tabular file (numeric or character format)
        if (grepl("m/z = ", input_features[startingrow,$features_cond.feature_column])==FALSE){

    ### if input is in numeric format
            if (is.numeric(extracted_features)){
                ### max digits given in the input file will be used to match m/z but the maximum is 4
                ### (values without decimal places count as 0 digits)
                decimal_digits = nchar(sub("^[^.]*\\.?", "", as.character(extracted_features)))
                max_digits = min(max(decimal_digits), 4)

                validfeatures = round(extracted_features, max_digits) %in% round(mz(msidata),max_digits)
                featuresofinterest = features(msidata)[round(mz(msidata), digits = max_digits) %in% round(extracted_features[validfeatures], max_digits)]
                validmz = length(unique(featuresofinterest))
            }else{
                    validmz = 0
                    featuresofinterest = 0}

    ### if input is already in character format (m/z = 800.01)

        }else{
            validfeatures = extracted_features %in% names(features(msidata))
            featuresofinterest = features(msidata)[names(features(msidata)) %in% extracted_features[validfeatures]]
            validmz = sum(validfeatures)}

    ### filter msidata for valid features

        msidata = msidata[featuresofinterest,]

    ############### features within a given range are kept #####################

    #elif str($features_cond.features_filtering) == "features_range":
        print("feature range")

        numberfeatures = "range"
        validmz = "range"

        if (sum(mz(msidata) >= $features_cond.min_mz & mz(msidata) <= $features_cond.max_mz)> 0){
            msidata = msidata[mz(msidata) >= $features_cond.min_mz & mz(msidata) <= $features_cond.max_mz,]
        }else{ 
            msidata = msidata[0,]
            print("no valid mz range")}

    ############### Remove m/z from tabular file #########################

    #elif str($features_cond.features_filtering) == "remove_features":
        print("remove features")

    ### Tabular file contains m/z either as numbers or in the format m/z = 800.01

        input_features = read.delim("$features_cond.inputfeatures_removal", header = FALSE, stringsAsFactors = FALSE)
        startingrow = $features_cond.removal_header+1
        extracted_features = input_features[startingrow:nrow(input_features),$features_cond.removal_column]
        numberfeatures = length(extracted_features)

        if (grepl("m/z = ", input_features[startingrow,$features_cond.removal_column])==TRUE){

    ### if input is in character format (m/z = 800.01)
            print("input is in format m/z = 800.01")
            validfeatures = extracted_features %in% names(features(msidata))
            validmz = sum(validfeatures)
            filtered_features = features(msidata)[names(features(msidata)) %in% extracted_features[validfeatures]]
            featuresofinterest = mz(msidata)[filtered_features]

    ### if input is numeric:
        }else{
            if (is.numeric(extracted_features)){
                print("input is numeric")
                featuresofinterest = extracted_features
                validmz = sum(featuresofinterest <= max(mz(msidata)) & featuresofinterest >= min(mz(msidata)))
            }else{featuresofinterest = 0
                    validmz = 0}
        }

    ### Here starts removal of features: 

        plusminus = $features_cond.removal_plusminus

        mass_to_remove = numeric()
        if (sum(featuresofinterest) > 0){
            for (masses in featuresofinterest){
                #if str($features_cond.units_removal) == "ppm": 
                    plusminus = masses * $features_cond.removal_plusminus/1000000
                #end if 
                current_mass = which(c(mz(msidata) <= masses + plusminus & mz(msidata) >= masses - plusminus))
                mass_to_remove = append(mass_to_remove, current_mass)}
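            ## negative indexing with an empty vector would drop every feature, so only subset when something matched
            if (length(mass_to_remove) > 0){
                msidata = msidata[-unique(mass_to_remove), ]}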
        }else{print("No features were removed as they were not fitting to m/z values and/or range")}


    #elif str($features_cond.features_filtering) == "none":

        print("no feature filtering")
        validmz = 0
        numberfeatures = 0

    #end if

    ## save msidata as Rfile
    save(msidata, file="$msidata_filtered")

}else{
    print("Inputfile or file filtered for pixels has no intensities > 0")
    numberfeatures = NA
    validmz = NA
}

    #################### QC numbers #######################


        ## Number of features (m/z)
        maxfeatures2 = length(features(msidata))
        ## Range m/z
        minmz2 = round(min(mz(msidata)), digits=2)
        maxmz2 = round(max(mz(msidata)), digits=2)
        ## Number of spectra (pixels)
        pixelcount2 = length(pixels(msidata))
        ## Range x coordinates
        minimumx2 = min(coord(msidata)[,1])
        maximumx2 = max(coord(msidata)[,1])
        ## Range y coordinates
        minimumy2 = min(coord(msidata)[,2])
        maximumy2 = max(coord(msidata)[,2])
        ## Number of intensities > 0
        npeaks2= sum(spectra(msidata)[]>0, na.rm=TRUE)
        ## Spectra multiplied with m/z (potential number of peaks)
        numpeaks2 = ncol(spectra(msidata)[])*nrow(spectra(msidata)[])
        ## Percentage of intensities > 0
        percpeaks2 = round(npeaks2/numpeaks2*100, digits=2)
        ## Number of empty TICs
        TICs2 = colSums(spectra(msidata)[], na.rm=TRUE) 
        NumemptyTIC2 = sum(TICs2 == 0)
        ## median TIC
        medint2 = round(median(TICs2), digits=2)

        properties = c("Number of m/z features",
                       "Range of m/z values",
                       "Number of pixels", 
                       "Range of x coordinates", 
                       "Range of y coordinates",
                       "Intensities > 0",
                       "Median TIC per pixel",
                       "Number of zero TICs", 
                       "pixel overview", 
                       "feature overview")

        before = c(paste0(maxfeatures), 
                   paste0(minmz, " - ", maxmz), 
                   paste0(pixelcount), 
                   paste0(minimumx, " - ", maximumx), 
                   paste0(minimumy, " - ", maximumy), 
                   paste0(percpeaks, " %"), 
                   paste0(medint),
                   paste0(NumemptyTIC), 
                   paste0("input pixels: ", numberpixels),
                   paste0("input mz: ", numberfeatures))

        filtered = c(paste0(maxfeatures2), 
                   paste0(minmz2, " - ", maxmz2), 
                   paste0(pixelcount2), 
                   paste0(minimumx2, " - ", maximumx2),  
                   paste0(minimumy2, " - ", maximumy2), 
                   paste0(percpeaks2, " %"), 
                   paste0(medint2),
                   paste0(NumemptyTIC2), 
                   paste0("valid pixels: ", validpixels),
                   paste0("valid mz: ", validmz))

        property_df = data.frame(properties, before, filtered)

    ############################### PDF QC ################################


        pdf("filtertool_QC.pdf", fonts = "Times", pointsize = 12)
        plot(0,type='n',axes=FALSE,ann=FALSE)
        title(main=paste0("Quality control of filtering tool for file: \n\n", "$infile.display_name"))
        grid.table(property_df, rows= NULL)

## QC report with more than value-table: only when pixels/features/intensities are left
if (npeaks2 > 0)
{
        ### visual pixel control

            pixel_image = ggplot(position_df, aes(x=x, y=y, fill=annotation))+
                   geom_tile(height = 1, width=1)+
                   coord_fixed()+
                   ggtitle("Spatial orientation of filtered pixels")+
                   theme_bw()+
                   theme(plot.title = element_text(hjust = 0.5))+
                   theme(text=element_text(family="ArialMT", face="bold", size=12))+
                   theme(legend.position="bottom",legend.direction="vertical")+
                   guides(fill=guide_legend(ncol=4,byrow=TRUE))

            print(pixel_image)

            ### control features which are removed
            hist(mz(msidata), xlab="m/z", main="Kept m/z values")
            #if str($features_cond.features_filtering) == "none":
                print("no difference histogram as no m/z filtering took place")
            #else:

                if (isTRUE(all.equal(featuresinfile, mz(msidata)))){
                print("No difference in m/z values before and after filtering, no histogram drawn")
                }else{
                hist(setdiff(featuresinfile, mz(msidata)), xlab="m/z", main="Removed m/z values")}
            #end if

        dev.off()

    ############################### optional intensity matrix ######################

    #if $output_matrix:

        spectramatrix = spectra(msidata)[]
        spectramatrix = cbind(mz(msidata),spectramatrix)
        newmatrix = rbind(c("mz | spectra", names(pixels(msidata))), spectramatrix)
        write.table(newmatrix, file="$matrixasoutput", quote = FALSE, row.names = FALSE, col.names=FALSE, sep = "\t")

    #end if

}else{
    print("Inputfile or filtered file has no intensities > 0")
    dev.off()
}
    ]]></configfile>
    </configfiles>
    <inputs>
        <param name="infile" type="data" format="imzml,rdata,analyze75"
               label="Inputfile as imzML, Analyze7.5 or Cardinal MSImageSet saved as RData"
                help="Upload composite datatype imzML (ibd+imzML) or analyze75 (hdr+img+t2m) or regular upload .RData (Cardinal MSImageSet)"/>
        <conditional name="processed_cond">
            <param name="processed_file" type="select" label="Is the input file a processed imzML file?">
                <option value="no_processed" selected="True">not a processed imzML</option>
                <option value="processed">processed imzML</option>
            </param>
            <when value="no_processed"/>
            <when value="processed">
                <param name="accuracy" type="float" value="50" label="Mass accuracy to which the m/z values will be binned" help="This should be set to the native accuracy of the mass spectrometer, if known"/>
                <param name="units" display="radio" type="select" label="Unit of the mass accuracy" help="either m/z or ppm">
                    <option value="mz" >mz</option>
                    <option value="ppm" selected="True" >ppm</option>
                </param>
            </when>
        </conditional>

        <conditional name="pixels_cond">
            <param name="pixel_filtering" type="select" label="Select pixel filtering option">
                <option value="none" selected="True">none</option>
                <option value="single_column">tabular file with single column (x = 1, y = 1)</option>
                <option value="two_columns">tabular file with separate columns for x and y values</option>
                <option value="pixel_range">ranges for x and y</option>
            </param>
            <when value="none"/>
            <when value="single_column">
                <param name="single_pixels" type="data" format="tabular" label="Pixels in single column for filtering of MSI data"
                    help="tabular file with pixels of interest in the form x = 1, y = 1"/>
                <param name="pixel_column" data_ref="single_pixels" label="Column with pixels" type="data_column"/>
                <param name="annotation_column" data_ref="single_pixels" label="Column with annotations for each pixel" type="data_column"/>
                <param name="pixel_header" label="Number of header lines to skip" value="0" type="integer"/>
            </when> 
            <when value="two_columns">
                <param name="two_columns_pixel" type="data" format="tabular" label="Pixels in two columns for filtering of MSI data"
                    help="tabular file with pixels of interest in two separate columns"/>
                <param name="pixel_column_x" data_ref="two_columns_pixel" label="Column with x values" type="data_column"/>
                <param name="pixel_column_y" data_ref="two_columns_pixel" label="Column with y values" type="data_column"/>
                <param name="annotation_column_xy" data_ref="two_columns_pixel" label="Column with annotations" type="data_column"/>
                <param name="pixel_header" label="Number of header lines to skip" value="0" type="integer"/>
            </when> 
            <when value="pixel_range">
                <param name="min_x_range" type="integer" value="0" label="Minimum value for x"/>
                <param name="max_x_range" type="integer" value="100" label="Maximum value for x"/>
                <param name="min_y_range" type="integer" value="0" label="Minimum value for y"/>
                <param name="max_y_range" type="integer" value="100" label="Maximum value for y"/>
            </when> 
        </conditional>

        <conditional name="features_cond">
            <param name="features_filtering" type="select" label="Select feature filtering option">
                <option value="none" selected="True">none</option>
                <option value="features_list">keep features (tabular input)</option>
                <option value="features_range">keep features within a range (manual input)</option>
                <option value="remove_features">remove features (tabular input)</option>
            </param>
            <when value="none"/>
            <when value="features_list">
                <param name="inputfeatures" type="data" format="tabular" label="Features for filtering of MSI data" help="tabular file with m/z of interest either as numbers (800.05) or in the form m/z = 800.05"/>
                <param name="feature_column" data_ref="inputfeatures" label="Column with features" type="data_column"/>
                <param name="feature_header" label="Number of header lines to skip" value="0" type="integer"/>
            </when> 
            <when value="features_range">
                <param name="min_mz" type="float" value="1" label="Minimum value for m/z"/>
                <param name="max_mz" type="float" value="100" label="Maximum value for m/z"/>
            </when> 
            <when value="remove_features">
                <param name="inputfeatures_removal" type="data" format="tabular" label="Features for filtering of MSI data" help="tabular file with m/z to be removed either as numbers (800.05) or in the form m/z = 800.05"/>
                <param name="removal_column" data_ref="inputfeatures_removal" label="Column with features" type="data_column"/>
                <param name="removal_header" label="Number of header lines to skip" value="0" type="integer"/>
                <param name="removal_plusminus" type="float" value="20" label="Window in which m/z will be removed" help="This value will be added to and subtracted from the given input value"/>
                <param name="units_removal" type="select" display = "radio" optional = "False" label="units">
                        <option value="ppm" selected="True">ppm</option>
                        <option value="Da">Da</option>
                </param>
            </when>
        </conditional>
         <param name="output_matrix" type="boolean" display="radio" label="Intensity matrix output"/>
    </inputs>

    <outputs>
        <data format="rdata" name="msidata_filtered" label="$infile.display_name filtered"/>
        <data format="pdf" name="filtering_qc" from_work_dir="filtertool_QC.pdf" label = "$infile.display_name filtered_QC"/>
        <data format="tabular" name="matrixasoutput" label="$infile.display_name filtered_matrix">
            <filter>output_matrix</filter>
        </data>
    </outputs>
    <tests>
        <test expect_num_outputs="2">
            <param name="infile" value="" ftype="imzml">
                <composite_data value="Example_Continuous.imzML"/>
                <composite_data value="Example_Continuous.ibd"/>
            </param>
            <param name="pixel_filtering" value="single_column"/>
            <param name="single_pixels" ftype="tabular" value = "inputpixels.tabular"/>
            <param name="pixel_column" value="1"/>
            <param name="annotation_column" value="2"/>
            <param name="features_filtering" value="features_list"/>
            <param name="inputfeatures" ftype="tabular" value = "inputfeatures.tabular"/>
            <param name="feature_column" value="2"/>
            <param name="feature_header" value="1"/>
            <output name="filtering_qc" file="imzml_filtered.pdf" compare="sim_size" delta="20000"/>
            <output name="msidata_filtered" file="imzml_filtered.RData" compare="sim_size"/>
        </test>
        <test expect_num_outputs="2">
            <param name="infile" value="" ftype="imzml">
                <composite_data value="Example_Continuous.imzML"/>
                <composite_data value="Example_Continuous.ibd"/>
            </param>
            <param name="pixel_filtering" value="pixel_range"/>
            <param name="min_x_range" value="10"/>
            <param name="max_x_range" value="20"/>
            <param name="min_y_range" value="2"/>
            <param name="max_y_range" value="2"/>
            <output name="filtering_qc" file="imzml_filtered2.pdf" compare="sim_size" delta="20000"/>
            <output name="msidata_filtered" file="imzml_filtered2.RData" compare="sim_size"/>
        </test>
        <test expect_num_outputs="3">
            <param name="infile" value="" ftype="imzml">
                <composite_data value="Example_Continuous.imzML"/>
                <composite_data value="Example_Continuous.ibd"/>
            </param>
            <param name="pixel_filtering" value="pixel_range"/>
            <param name="min_x_range" value="1"/>
            <param name="max_x_range" value="20"/>
            <param name="min_y_range" value="2"/>
            <param name="max_y_range" value="2"/>
            <param name="features_filtering" value="features_range"/>
            <param name="min_mz" value="350" />
            <param name="max_mz" value="500"/>
            <param name="output_matrix" value="True"/>
            <output name="filtering_qc" file="imzml_filtered3.pdf" compare="sim_size" delta="20000"/>
            <output name="msidata_filtered" file="imzml_filtered3.RData" compare="sim_size"/>
            <output name="matrixasoutput" file="imzml_matrix3.tabular"/>
        </test>
        <test expect_num_outputs="2">
            <param name="infile" value="" ftype="imzml">
                <composite_data value="Example_Continuous.imzML"/>
                <composite_data value="Example_Continuous.ibd"/>
            </param>
            <param name="pixel_filtering" value="two_columns"/>
            <param name="two_columns_pixel" ftype="tabular" value = "inputpixels_2column.tabular"/>
            <param name="pixel_column_x" value="1"/>
            <param name="pixel_column_y" value="3"/>
            <param name="annotation_column_xy" value="2"/>
            <output name="filtering_qc" file="imzml_filtered4.pdf" compare="sim_size" delta="20000"/>
            <output name="msidata_filtered" file="imzml_filtered4.RData" compare="sim_size"/>
        </test>
        <test expect_num_outputs="2">
            <param name="infile" value="" ftype="imzml">
                <composite_data value="Example_Continuous.imzML"/>
                <composite_data value="Example_Continuous.ibd"/>
            </param>
            <param name="pixel_filtering" value="pixel_range"/>
            <param name="min_x_range" value="0"/>
            <param name="max_x_range" value="10"/>
            <param name="min_y_range" value="2"/>
            <param name="max_y_range" value="20"/>
            <param name="features_filtering" value="features_list"/>
            <param name="inputfeatures" ftype="tabular" value = "featuresofinterest5.tabular"/>
            <param name="feature_column" value="1"/>
            <param name="feature_header" value="0"/>
            <output name="filtering_qc" file="imzml_filtered5.pdf" compare="sim_size" delta="20000"/>
            <output name="msidata_filtered" file="imzml_filtered5.RData" compare="sim_size" />
        </test>
        <test expect_num_outputs="3">
           <param name="infile" value="" ftype="analyze75">
                <composite_data value="Analyze75.hdr"/>
                <composite_data value="Analyze75.img"/>
                <composite_data value="Analyze75.t2m"/>
            </param>
            <param name="pixel_filtering" value="single_column"/>
            <param name="single_pixels" ftype="tabular" value = "inputpixels2.tabular"/>
            <param name="pixel_column" value="1"/>
            <param name="features_filtering" value="features_list"/>
            <param name="inputfeatures" ftype="tabular" value = "featuresofinterest2.tabular"/>
            <param name="feature_column" value="1"/>
            <param name="output_matrix" value="True"/>
            <output name="filtering_qc" file="analyze_filtered.pdf" compare="sim_size" delta="20000"/>
            <output name="msidata_filtered" file="analyze_filtered.RData" compare="sim_size" />
            <output name="matrixasoutput" file="analyze_matrix.tabular"/>
        </test>
        <test expect_num_outputs="2">
           <param name="infile" value="" ftype="analyze75">
                <composite_data value="Analyze75.hdr"/>
                <composite_data value="Analyze75.img"/>
                <composite_data value="Analyze75.t2m"/>
            </param>
            <output name="filtering_qc" file="analyze75_filtered2.pdf" compare="sim_size" delta="20000"/>
            <output name="msidata_filtered" file="analyze_filteredoutside.RData" compare="sim_size" />
        </test>
        <test expect_num_outputs="3">
            <param name="infile" value="preprocessed.RData" ftype="rdata"/>
            <param name="output_matrix" value="True"/>
            <output name="matrixasoutput" file="rdata_matrix.tabular"/>
            <output name="msidata_filtered" file="rdata_notfiltered.RData" compare="sim_size" />
            <output name="filtering_qc" file="rdata_notfiltered.pdf" compare="sim_size" />
        </test>
    </tests>
    <help>
        <![CDATA[

Cardinal is an R package that implements statistical & computational tools for analyzing mass spectrometry imaging datasets. `More information on Cardinal <http://cardinalmsi.org/>`_

This tool provides options to filter (subset) pixels and m/z features of mass spectrometry imaging data.

Input data: 3 types of input data can be used:

- imzML file (upload imzML and ibd files via the "composite" function) `Introduction to the imzML format <https://ms-imaging.org/wp/imzml/>`_
- Analyze7.5 (upload hdr, img and t2m files via the "composite" function)
- Cardinal "MSImageSet" data (with variable name "msidata", saved as .RData)


Options:

- pixel filtering/annotation: either with a tabular file containing x and y coordinates and pixel annotations, or by defining ranges for x and y by hand (no annotation is possible for ranges)
- m/z feature filtering: either with a tabular file containing the m/z of interest or by defining a range for the m/z values (numeric input is rounded to the number of decimal digits given in the file, at most 4, before it is matched to the m/z values)
- m/z feature removing: interfering m/z such as matrix contaminants can be removed by listing their m/z in a tabular file and setting a window (in ppm or m/z) within which features are removed
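
As a rough illustration of the tabular inputs described above (all values are made-up examples and the annotation names are hypothetical), a pixel file in the single column format could look like this, with the two columns separated by a tab::

    x = 1, y = 1    tumor
    x = 2, y = 1    tumor
    x = 3, y = 1    stroma

The same pixels in the format with separate columns for x and y, here with the annotation in the third column::

    1    1    tumor
    2    1    tumor
    3    1    stroma

For m/z removal with a window in ppm, the window is scaled by each m/z value. With the default of 20 ppm and an input m/z of 800, the arithmetic works out as::

    window        = 800 * 20 / 1000000 = 0.016
    removed range = 799.984 to 800.016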


Output: 

- MSI data filtered for pixels and/or m/z, saved as .RData (Cardinal "MSImageSet" with variable name "msidata")
- pdf with a heatmap showing the pixels that are left after filtering and histograms of kept and removed m/z
- optional: intensity matrix as tabular file (m/z in rows and pixels in columns)


Tip: 

- It is recommended to filter only with m/z that have been extracted from the same dataset. If you have m/z from dataset A and want to use them to filter dataset B, first find the corresponding (closest) features in dataset B with the tool "Join two files on column allowing a small difference". Afterwards use these corresponding m/z features of dataset B to filter dataset B.
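
A minimal R sketch of what such a closest-feature matching does, using made-up m/z values and an arbitrary tolerance of 50 ppm (the variable names are hypothetical and this is not the code of the join tool)::

    ## made-up example values, not taken from a real dataset
    mz_a = c(800.05, 1296.70)            # m/z of interest from dataset A
    mz_b = c(799.98, 800.06, 1296.69)    # m/z features present in dataset B
    ## for each m/z from A, pick the closest m/z feature in B
    closest_in_b = sapply(mz_a, function(m) mz_b[which.min(abs(mz_b - m))])
    ## distance of each match in ppm; keep matches within the tolerance
    ppm_difference = abs(closest_in_b - mz_a) / mz_a * 1e6
    features_for_b = closest_in_b[ppm_difference <= 50]

The values in features_for_b are m/z that exist in dataset B and could then be written to a tabular file for filtering.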


        ]]>
    </help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/btv146</citation>
    </citations>
</tool>