# HG changeset patch # User lecorguille # Date 1519895870 18000 # Node ID 3d43395940100f595d82029de63cd8ac3058fa96 # Parent 8ad83969888bdfc70a816c6f3e52e85fc3d03a7f planemo upload for repository https://github.com/workflow4metabolomics/xcms commit e384d6dd5f410799ec211f73bca0b5d5d7bc651e diff -r 8ad83969888b -r 3d4339594010 README.rst --- a/README.rst Tue Feb 13 04:45:13 2018 -0500 +++ b/README.rst Thu Mar 01 04:17:50 2018 -0500 @@ -2,10 +2,6 @@ Changelog/News -------------- -**Version 1.0.4 - 13/02/2018** - -- UPGRADE: upgrate the CAMERA version from 1.26.0 to 1.32.0 - **Version 1.0.3 - 03/02/2017** - IMPROVEMENT: xcms.summary can deal with merged individual data diff -r 8ad83969888b -r 3d4339594010 abims_xcms_summary.xml --- a/abims_xcms_summary.xml Tue Feb 13 04:45:13 2018 -0500 +++ b/abims_xcms_summary.xml Thu Mar 01 04:17:50 2018 -0500 @@ -1,4 +1,4 @@ - + Create a summary of XCMS analysis @@ -6,10 +6,9 @@ macros.xml - + bioconductor-camera - r-batch - + @@ -33,10 +32,10 @@ - + @@ -47,9 +46,9 @@ @HELP_AUTHORS@ -============ -Xcms.summary -============ +==================== +xcms process history +==================== ----------- Description @@ -85,6 +84,10 @@ Changelog/News -------------- +**Version 3.0.0.0 - 14/02/2018** + +- UPGRADE: upgrade the xcms version from 1.46.0 to 3.0.0. So refactoring of a lot of underlining codes and methods + **Version 1.0.4 - 13/02/2018** - UPGRADE: upgrate the CAMERA version from 1.26.0 to 1.32.0 diff -r 8ad83969888b -r 3d4339594010 lib.r --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/lib.r Thu Mar 01 04:17:50 2018 -0500 @@ -0,0 +1,764 @@ +#@authors ABiMS TEAM, Y. Guitton +# lib.r for Galaxy Workflow4Metabolomics xcms tools + +#@author G. Le Corguille +# solve an issue with batch if arguments are logical TRUE/FALSE +parseCommandArgs <- function(...) { + args <- batch::parseCommandArgs(...) 
+ for (key in names(args)) { + if (args[key] %in% c("TRUE","FALSE")) + args[key] = as.logical(args[key]) + } + return(args) +} + +#@author G. Le Corguille +# This function will +# - load the packages +# - display the sessionInfo +loadAndDisplayPackages <- function(pkgs) { + for(pkg in pkgs) suppressPackageStartupMessages( stopifnot( library(pkg, quietly=TRUE, logical.return=TRUE, character.only=TRUE))) + + sessioninfo = sessionInfo() + cat(sessioninfo$R.version$version.string,"\n") + cat("Main packages:\n") + for (pkg in names(sessioninfo$otherPkgs)) { cat(paste(pkg,packageVersion(pkg)),"\t") }; cat("\n") + cat("Other loaded packages:\n") + for (pkg in names(sessioninfo$loadedOnly)) { cat(paste(pkg,packageVersion(pkg)),"\t") }; cat("\n") +} + +#@author G. Le Corguille +# This function convert if it is required the Retention Time in minutes +RTSecondToMinute <- function(variableMetadata, convertRTMinute) { + if (convertRTMinute){ + #converting the retention times (seconds) into minutes + print("converting the retention times into minutes in the variableMetadata") + variableMetadata[,"rt"] <- variableMetadata[,"rt"]/60 + variableMetadata[,"rtmin"] <- variableMetadata[,"rtmin"]/60 + variableMetadata[,"rtmax"] <- variableMetadata[,"rtmax"]/60 + } + return (variableMetadata) +} + +#@author G. Le Corguille +# This function format ions identifiers +formatIonIdentifiers <- function(variableMetadata, numDigitsRT=0, numDigitsMZ=0) { + splitDeco <- strsplit(as.character(variableMetadata$name),"_") + idsDeco <- sapply(splitDeco, function(x) { deco=unlist(x)[2]; if (is.na(deco)) return ("") else return(paste0("_",deco)) }) + namecustom <- make.unique(paste0("M",round(variableMetadata[,"mz"],numDigitsMZ),"T",round(variableMetadata[,"rt"],numDigitsRT),idsDeco)) + variableMetadata <- cbind(name=variableMetadata$name, namecustom=namecustom, variableMetadata[,!(colnames(variableMetadata) %in% c("name"))]) + return(variableMetadata) +} + +#@author G. 
Le Corguille
+# Draw the plotChromPeakDensity 3 per page in a pdf file
+getPlotChromPeakDensity <- function(xdata) {
+    pdf(file="plotChromPeakDensity.pdf", width=16, height=12)
+
+    par(mfrow = c(3, 1), mar = c(4, 4, 1, 0.5))
+
+    group_colors <- brewer.pal(3, "Set1")[1:length(unique(xdata$sample_group))]
+    names(group_colors) <- unique(xdata$sample_group)
+
+    xlim <- c(min(featureDefinitions(xdata)$rtmin), max(featureDefinitions(xdata)$rtmax))
+    for (i in 1:nrow(featureDefinitions(xdata))) {
+        plotChromPeakDensity(xdata, mz=c(featureDefinitions(xdata)[i,]$mzmin,featureDefinitions(xdata)[i,]$mzmax), col=group_colors, pch=16, xlim=xlim)
+        legend("topright", legend=names(group_colors), col=group_colors, cex=0.8, lty=1)
+    }
+
+    dev.off()
+}
+
+#@author G. Le Corguille
+# Draw the raw vs adjusted retention times (plotAdjustedRtime) in a pdf file,
+# colored first by sample group, then by sample
+getPlotAdjustedRtime <- function(xdata) {
+    pdf(file="raw_vs_adjusted_rt.pdf", width=16, height=12)
+    # Color by group
+    group_colors <- brewer.pal(3, "Set1")[1:length(unique(xdata$sample_group))]
+    names(group_colors) <- unique(xdata$sample_group)
+    plotAdjustedRtime(xdata, col = group_colors[xdata$sample_group])
+    legend("topright", legend=names(group_colors), col=group_colors, cex=0.8, lty=1)
+    # Color by sample
+    plotAdjustedRtime(xdata, col = rainbow(length(xdata@phenoData@data$sample_name)))
+    legend("topright", legend=xdata@phenoData@data$sample_name, col=rainbow(length(xdata@phenoData@data$sample_name)), cex=0.8, lty=1)
+    dev.off()
+}
+
+#@author G.
Le Corguille +# value: intensity values to be used into, maxo or intb +getPeaklistW4M <- function(xdata, intval="into", convertRTMinute=F, numDigitsMZ=4, numDigitsRT=0, variableMetadataOutput, dataMatrixOutput) { + dataMatrix <- featureValues(xdata, method="medret", value=intval) + colnames(dataMatrix) <- tools::file_path_sans_ext(colnames(dataMatrix)) + dataMatrix = cbind(name=groupnamesW4M(xdata), dataMatrix) + variableMetadata <- featureDefinitions(xdata) + colnames(variableMetadata)[1] = "mz"; colnames(variableMetadata)[4] = "rt" + variableMetadata = data.frame(name=groupnamesW4M(xdata), variableMetadata) + + variableMetadata <- RTSecondToMinute(variableMetadata, convertRTMinute) + variableMetadata <- formatIonIdentifiers(variableMetadata, numDigitsRT=numDigitsRT, numDigitsMZ=numDigitsMZ) + + write.table(variableMetadata, file=variableMetadataOutput,sep="\t",quote=F,row.names=F) + write.table(dataMatrix, file=dataMatrixOutput,sep="\t",quote=F,row.names=F) + +} + +#@author Y. Guitton +getBPC <- function(file,rtcor=NULL, ...) { + object <- xcmsRaw(file) + sel <- profRange(object, ...) + cbind(if (is.null(rtcor)) object@scantime[sel$scanidx] else rtcor ,xcms:::colMax(object@env$profile[sel$massidx,sel$scanidx,drop=FALSE])) + #plotChrom(xcmsRaw(file), base=T) +} + +#@author Y. 
Guitton +getBPCs <- function (xcmsSet=NULL, pdfname="BPCs.pdf",rt=c("raw","corrected"), scanrange=NULL) { + cat("Creating BIC pdf...\n") + + if (is.null(xcmsSet)) { + cat("Enter an xcmsSet \n") + stop() + } else { + files <- filepaths(xcmsSet) + } + + phenoDataClass <- as.vector(levels(xcmsSet@phenoData[,"class"])) #sometime phenoData have more than 1 column use first as class + + classnames <- vector("list",length(phenoDataClass)) + for (i in 1:length(phenoDataClass)){ + classnames[[i]] <- which( xcmsSet@phenoData[,"class"]==phenoDataClass[i]) + } + + N <- dim(phenoData(xcmsSet))[1] + + TIC <- vector("list",N) + + + for (j in 1:N) { + + TIC[[j]] <- getBPC(files[j]) + #good for raw + # seems strange for corrected + #errors if scanrange used in xcmsSetgeneration + if (!is.null(xcmsSet) && rt == "corrected") + rtcor <- xcmsSet@rt$corrected[[j]] + else + rtcor <- NULL + + TIC[[j]] <- getBPC(files[j],rtcor=rtcor) + # TIC[[j]][,1]<-rtcor + } + + + + pdf(pdfname,w=16,h=10) + cols <- rainbow(N) + lty <- 1:N + pch <- 1:N + #search for max x and max y in BPCs + xlim <- range(sapply(TIC, function(x) range(x[,1]))) + ylim <- range(sapply(TIC, function(x) range(x[,2]))) + ylim <- c(-ylim[2], ylim[2]) + + + ##plot start + + if (length(phenoDataClass)>2){ + for (k in 1:(length(phenoDataClass)-1)){ + for (l in (k+1):length(phenoDataClass)){ + #print(paste(phenoDataClass[k],"vs",phenoDataClass[l],sep=" ")) + plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k]," vs ",phenoDataClass[l], sep=""), xlab="Retention Time (min)", ylab="BPC") + colvect <- NULL + for (j in 1:length(classnames[[k]])) { + tic <- TIC[[classnames[[k]][j]]] + # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l") + points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l") + colvect <- append(colvect,cols[classnames[[k]][j]]) + } + for (j in 1:length(classnames[[l]])) { + # i <- class2names[j] + tic <- 
TIC[[classnames[[l]][j]]] + points(tic[,1]/60, -tic[,2], col=cols[classnames[[l]][j]], pch=pch[classnames[[l]][j]], type="l") + colvect <- append(colvect,cols[classnames[[l]][j]]) + } + legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col=colvect, lty=lty, pch=pch) + } + } + }#end if length >2 + + if (length(phenoDataClass)==2){ + k <- 1 + l <- 2 + colvect <- NULL + plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k],"vs",phenoDataClass[l], sep=""), xlab="Retention Time (min)", ylab="BPC") + + for (j in 1:length(classnames[[k]])) { + + tic <- TIC[[classnames[[k]][j]]] + # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l") + points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l") + colvect<-append(colvect,cols[classnames[[k]][j]]) + } + for (j in 1:length(classnames[[l]])) { + # i <- class2names[j] + tic <- TIC[[classnames[[l]][j]]] + points(tic[,1]/60, -tic[,2], col=cols[classnames[[l]][j]], pch=pch[classnames[[l]][j]], type="l") + colvect <- append(colvect,cols[classnames[[l]][j]]) + } + legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col=colvect, lty=lty, pch=pch) + + }#end length ==2 + + #case where only one class + if (length(phenoDataClass)==1){ + k <- 1 + ylim <- range(sapply(TIC, function(x) range(x[,2]))) + colvect <- NULL + plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k], sep=""), xlab="Retention Time (min)", ylab="BPC") + + for (j in 1:length(classnames[[k]])) { + tic <- TIC[[classnames[[k]][j]]] + # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l") + points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l") + colvect <- append(colvect,cols[classnames[[k]][j]]) + } + + legend("topright",paste(basename(files[c(classnames[[k]])])), col=colvect, lty=lty, pch=pch) + + }#end 
length ==1 + + dev.off() #pdf(pdfname,w=16,h=10) + + invisible(TIC) +} + + + +#@author Y. Guitton +getTIC <- function(file, rtcor=NULL) { + object <- xcmsRaw(file) + cbind(if (is.null(rtcor)) object@scantime else rtcor, rawEIC(object, mzrange=range(object@env$mz))$intensity) +} + +#overlay TIC from all files in current folder or from xcmsSet, create pdf +#@author Y. Guitton +getTICs <- function(xcmsSet=NULL,files=NULL, pdfname="TICs.pdf", rt=c("raw","corrected")) { + cat("Creating TIC pdf...\n") + + if (is.null(xcmsSet)) { + filepattern <- c("[Cc][Dd][Ff]", "[Nn][Cc]", "([Mm][Zz])?[Xx][Mm][Ll]", "[Mm][Zz][Dd][Aa][Tt][Aa]", "[Mm][Zz][Mm][Ll]") + filepattern <- paste(paste("\\.", filepattern, "$", sep=""), collapse="|") + if (is.null(files)) + files <- getwd() + info <- file.info(files) + listed <- list.files(files[info$isdir], pattern=filepattern, recursive=TRUE, full.names=TRUE) + files <- c(files[!info$isdir], listed) + } else { + files <- filepaths(xcmsSet) + } + + phenoDataClass <- as.vector(levels(xcmsSet@phenoData[,"class"])) #sometime phenoData have more than 1 column use first as class + classnames <- vector("list",length(phenoDataClass)) + for (i in 1:length(phenoDataClass)){ + classnames[[i]] <- which( xcmsSet@phenoData[,"class"]==phenoDataClass[i]) + } + + N <- length(files) + TIC <- vector("list",N) + + for (i in 1:N) { + if (!is.null(xcmsSet) && rt == "corrected") + rtcor <- xcmsSet@rt$corrected[[i]] else + rtcor <- NULL + TIC[[i]] <- getTIC(files[i], rtcor=rtcor) + } + + pdf(pdfname, w=16, h=10) + cols <- rainbow(N) + lty <- 1:N + pch <- 1:N + #search for max x and max y in TICs + xlim <- range(sapply(TIC, function(x) range(x[,1]))) + ylim <- range(sapply(TIC, function(x) range(x[,2]))) + ylim <- c(-ylim[2], ylim[2]) + + + ##plot start + if (length(phenoDataClass)>2){ + for (k in 1:(length(phenoDataClass)-1)){ + for (l in (k+1):length(phenoDataClass)){ + #print(paste(phenoDataClass[k],"vs",phenoDataClass[l],sep=" ")) + plot(0, 0, type="n", 
xlim=xlim/60, ylim=ylim, main=paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k]," vs ",phenoDataClass[l], sep=""), xlab="Retention Time (min)", ylab="TIC") + colvect <- NULL + for (j in 1:length(classnames[[k]])) { + tic <- TIC[[classnames[[k]][j]]] + # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l") + points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l") + colvect <- append(colvect,cols[classnames[[k]][j]]) + } + for (j in 1:length(classnames[[l]])) { + # i=class2names[j] + tic <- TIC[[classnames[[l]][j]]] + points(tic[,1]/60, -tic[,2], col=cols[classnames[[l]][j]], pch=pch[classnames[[l]][j]], type="l") + colvect <- append(colvect,cols[classnames[[l]][j]]) + } + legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col=colvect, lty=lty, pch=pch) + } + } + }#end if length >2 + if (length(phenoDataClass)==2){ + k <- 1 + l <- 2 + + plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k],"vs",phenoDataClass[l], sep=""), xlab="Retention Time (min)", ylab="TIC") + colvect <- NULL + for (j in 1:length(classnames[[k]])) { + tic <- TIC[[classnames[[k]][j]]] + # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l") + points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l") + colvect <- append(colvect,cols[classnames[[k]][j]]) + } + for (j in 1:length(classnames[[l]])) { + # i <- class2names[j] + tic <- TIC[[classnames[[l]][j]]] + points(tic[,1]/60, -tic[,2], col=cols[classnames[[l]][j]], pch=pch[classnames[[l]][j]], type="l") + colvect <- append(colvect,cols[classnames[[l]][j]]) + } + legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col=colvect, lty=lty, pch=pch) + + }#end length ==2 + + #case where only one class + if (length(phenoDataClass)==1){ + k <- 1 + ylim <- range(sapply(TIC, function(x) range(x[,2]))) + + plot(0, 0, type="n", xlim=xlim/60, 
ylim=ylim, main=paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k], sep=""), xlab="Retention Time (min)", ylab="TIC") + colvect <- NULL + for (j in 1:length(classnames[[k]])) { + tic <- TIC[[classnames[[k]][j]]] + # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l") + points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l") + colvect <- append(colvect,cols[classnames[[k]][j]]) + } + + legend("topright",paste(basename(files[c(classnames[[k]])])), col=colvect, lty=lty, pch=pch) + + }#end length ==1 + + dev.off() #pdf(pdfname,w=16,h=10) + + invisible(TIC) +} + + + +# Get the polarities from all the samples of a condition +#@author Misharl Monsoor misharl.monsoor@sb-roscoff.fr ABiMS TEAM +#@author Gildas Le Corguille lecorguille@sb-roscoff.fr ABiMS TEAM +getSampleMetadata <- function(xdata=NULL, sampleMetadataOutput="sampleMetadata.tsv") { + cat("Creating the sampleMetadata file...\n") + + #Create the sampleMetada dataframe + sampleMetadata <- xdata@phenoData@data + rownames(sampleMetadata) <- NULL + colnames(sampleMetadata) <- c("sampleMetadata", "class") + + sampleNamesOrigin <- sampleMetadata$sampleMetadata + sampleNamesMakeNames <- make.names(sampleNamesOrigin) + + if (any(duplicated(sampleNamesMakeNames))) { + write("\n\nERROR: Usually, R has trouble to deal with special characters in its column names, so it rename them using make.names().\nIn your case, at least two columns after the renaming obtain the same name, thus XCMS will collapse those columns per name.", stderr()) + for (sampleName in sampleNamesOrigin) { + write(paste(sampleName,"\t->\t",make.names(sampleName)),stderr()) + } + stop("\n\nERROR: One or more of your files will not be import by xcmsSet. 
It may be due to bad characters in their filenames.")
+    }
+
+    if (!all(sampleNamesOrigin == sampleNamesMakeNames)) {
+        cat("\n\nWARNING: R usually has trouble dealing with special characters in column names, so it renames them using make.names()\nIn your case, one or more sample names will be renamed in the sampleMetadata and dataMatrix files:\n")
+        for (sampleName in sampleNamesOrigin) {
+            cat(paste(sampleName,"\t->\t",make.names(sampleName),"\n"))
+        }
+    }
+
+    sampleMetadata$sampleMetadata <- sampleNamesMakeNames
+
+
+    #For each sample file, the following actions are done
+    for (fileIdx in 1:length(fileNames(xdata))) {
+        #Check if the current file is in the CDF format (test one file at a time, not the whole vector)
+        if (!mzR:::netCDFIsFile(fileNames(xdata)[fileIdx])) {
+
+            # If the column doesn't exist, add one filled with NA
+            if (is.null(sampleMetadata$polarity)) sampleMetadata$polarity <- NA
+
+            #Extract the polarity (a list of polarities)
+            polarity <- fData(xdata)[fData(xdata)$fileIdx == fileIdx,"polarity"]
+            #Verify if all the scans have the same polarity
+            uniq_list <- unique(polarity)
+            if (length(uniq_list)>1){
+                polarity <- "mixed"
+            } else {
+                polarity <- as.character(uniq_list)
+            }
+
+            #Set the polarity attribute
+            sampleMetadata$polarity[fileIdx] <- polarity
+        }
+
+    }
+
+    write.table(sampleMetadata, sep="\t", quote=FALSE, row.names=FALSE, file=sampleMetadataOutput)
+
+    return(list("sampleNamesOrigin"=sampleNamesOrigin, "sampleNamesMakeNames"=sampleNamesMakeNames))
+
+}
+
+
+# This function checks whether xcms will find all the files
+#@author Gildas Le Corguille lecorguille@sb-roscoff.fr ABiMS TEAM
+checkFilesCompatibilityWithXcms <- function(directory) {
+    cat("Checking filename compatibility with xcms...\n")
+    # WHAT XCMS WILL FIND
+    filepattern <- c("[Cc][Dd][Ff]", "[Nn][Cc]", "([Mm][Zz])?[Xx][Mm][Ll]","[Mm][Zz][Dd][Aa][Tt][Aa]", "[Mm][Zz][Mm][Ll]")
+    filepattern <- paste(paste("\\.", filepattern, "$", sep=""),collapse="|")
+    info <- file.info(directory)
+    listed <- list.files(directory[info$isdir],
pattern=filepattern, recursive=TRUE, full.names=TRUE) + files <- c(directory[!info$isdir], listed) + files_abs <- file.path(getwd(), files) + exists <- file.exists(files_abs) + files[exists] <- files_abs[exists] + files[exists] <- sub("//","/",files[exists]) + + # WHAT IS ON THE FILESYSTEM + filesystem_filepaths <- system(paste("find $PWD/",directory," -not -name '\\.*' -not -path '*conda-env*' -type f -name \"*\"", sep=""), intern=T) + filesystem_filepaths <- filesystem_filepaths[grep(filepattern, filesystem_filepaths, perl=T)] + + # COMPARISON + if (!is.na(table(filesystem_filepaths %in% files)["FALSE"])) { + write("\n\nERROR: List of the files which will not be imported by xcmsSet",stderr()) + write(filesystem_filepaths[!(filesystem_filepaths %in% files)],stderr()) + stop("\n\nERROR: One or more of your files will not be import by xcmsSet. It may due to bad characters in their filenames.") + } +} + + +#This function list the compatible files within the directory as xcms did +#@author Gildas Le Corguille lecorguille@sb-roscoff.fr ABiMS TEAM +getMSFiles <- function (directory) { + filepattern <- c("[Cc][Dd][Ff]", "[Nn][Cc]", "([Mm][Zz])?[Xx][Mm][Ll]","[Mm][Zz][Dd][Aa][Tt][Aa]", "[Mm][Zz][Mm][Ll]") + filepattern <- paste(paste("\\.", filepattern, "$", sep=""),collapse="|") + info <- file.info(directory) + listed <- list.files(directory[info$isdir], pattern=filepattern,recursive=TRUE, full.names=TRUE) + files <- c(directory[!info$isdir], listed) + exists <- file.exists(files) + files <- files[exists] + return(files) +} + +# This function check if XML contains special caracters. It also checks integrity and completness. +#@author Misharl Monsoor misharl.monsoor@sb-roscoff.fr ABiMS TEAM +checkXmlStructure <- function (directory) { + cat("Checking XML structure...\n") + + cmd <- paste("IFS=$'\n'; for xml in $(find",directory,"-not -name '\\.*' -not -path '*conda-env*' -type f -iname '*.*ml*'); do if [ $(xmllint --nonet --noout \"$xml\" 2> /dev/null; echo $?) 
-gt 0 ]; then echo $xml;fi; done;") + capture <- system(cmd, intern=TRUE) + + if (length(capture)>0){ + #message=paste("The following mzXML or mzML file is incorrect, please check these files first:",capture) + write("\n\nERROR: The following mzXML or mzML file(s) are incorrect, please check these files first:", stderr()) + write(capture, stderr()) + stop("ERROR: xcmsSet cannot continue with incorrect mzXML or mzML files") + } + +} + + +# This function check if XML contain special characters +#@author Misharl Monsoor misharl.monsoor@sb-roscoff.fr ABiMS TEAM +deleteXmlBadCharacters<- function (directory) { + cat("Checking Non ASCII characters in the XML...\n") + + processed <- F + l <- system( paste("find",directory, "-not -name '\\.*' -not -path '*conda-env*' -type f -iname '*.*ml*'"), intern=TRUE) + for (i in l){ + cmd <- paste("LC_ALL=C grep '[^ -~]' \"", i, "\"", sep="") + capture <- suppressWarnings(system(cmd, intern=TRUE)) + if (length(capture)>0){ + cmd <- paste("perl -i -pe 's/[^[:ascii:]]//g;'",i) + print( paste("WARNING: Non ASCII characters have been removed from the ",i,"file") ) + c <- system(cmd, intern=TRUE) + capture <- "" + processed <- T + } + } + if (processed) cat("\n\n") + return(processed) +} + + +# This function will compute MD5 checksum to check the data integrity +#@author Gildas Le Corguille lecorguille@sb-roscoff.fr +getMd5sum <- function (directory) { + cat("Compute md5 checksum...\n") + # WHAT XCMS WILL FIND + filepattern <- c("[Cc][Dd][Ff]", "[Nn][Cc]", "([Mm][Zz])?[Xx][Mm][Ll]","[Mm][Zz][Dd][Aa][Tt][Aa]", "[Mm][Zz][Mm][Ll]") + filepattern <- paste(paste("\\.", filepattern, "$", sep=""),collapse="|") + info <- file.info(directory) + listed <- list.files(directory[info$isdir], pattern=filepattern, recursive=TRUE, full.names=TRUE) + files <- c(directory[!info$isdir], listed) + exists <- file.exists(files) + files <- files[exists] + + library(tools) + + #cat("\n\n") + + return(as.matrix(md5sum(files))) +} + + +# This function get the raw 
file path from the arguments +#@author Gildas Le Corguille lecorguille@sb-roscoff.fr +getRawfilePathFromArguments <- function(singlefile, zipfile, args) { + if (!is.null(args$zipfile)) zipfile <- args$zipfile + if (!is.null(args$zipfilePositive)) zipfile <- args$zipfilePositive + if (!is.null(args$zipfileNegative)) zipfile <- args$zipfileNegative + + if (!is.null(args$singlefile_galaxyPath)) { + singlefile_galaxyPaths <- args$singlefile_galaxyPath; + singlefile_sampleNames <- args$singlefile_sampleName + } + if (!is.null(args$singlefile_galaxyPathPositive)) { + singlefile_galaxyPaths <- args$singlefile_galaxyPathPositive; + singlefile_sampleNames <- args$singlefile_sampleNamePositive + } + if (!is.null(args$singlefile_galaxyPathNegative)) { + singlefile_galaxyPaths <- args$singlefile_galaxyPathNegative; + singlefile_sampleNames <- args$singlefile_sampleNameNegative + } + if (exists("singlefile_galaxyPaths")){ + singlefile_galaxyPaths <- unlist(strsplit(singlefile_galaxyPaths,",")) + singlefile_sampleNames <- unlist(strsplit(singlefile_sampleNames,",")) + + singlefile <- NULL + for (singlefile_galaxyPath_i in seq(1:length(singlefile_galaxyPaths))) { + singlefile_galaxyPath <- singlefile_galaxyPaths[singlefile_galaxyPath_i] + singlefile_sampleName <- singlefile_sampleNames[singlefile_galaxyPath_i] + singlefile[[singlefile_sampleName]] <- singlefile_galaxyPath + } + } + for (argument in c("zipfile","zipfilePositive","zipfileNegative","singlefile_galaxyPath","singlefile_sampleName","singlefile_galaxyPathPositive","singlefile_sampleNamePositive","singlefile_galaxyPathNegative","singlefile_sampleNameNegative")) { + args[[argument]] <- NULL + } + return(list(zipfile=zipfile, singlefile=singlefile, args=args)) +} + + +# This function retrieve the raw file in the working directory +# - if zipfile: unzip the file with its directory tree +# - if singlefiles: set symlink with the good filename +#@author Gildas Le Corguille lecorguille@sb-roscoff.fr 
+retrieveRawfileInTheWorkingDirectory <- function(singlefile, zipfile) {
+    # Test the length of the object itself, not of the literal string "singlefile"
+    if(!is.null(singlefile) && (length(singlefile)>0)) {
+        for (singlefile_sampleName in names(singlefile)) {
+            singlefile_galaxyPath <- singlefile[[singlefile_sampleName]]
+            if(!file.exists(singlefile_galaxyPath)){
+                error_message <- paste("Cannot access the sample:",singlefile_sampleName,"located:",singlefile_galaxyPath,". Please, contact your administrator ... if you have one!")
+                print(error_message); stop(error_message)
+            }
+
+            if (!suppressWarnings( try (file.link(singlefile_galaxyPath, singlefile_sampleName), silent=T)))
+                file.copy(singlefile_galaxyPath, singlefile_sampleName)
+
+        }
+        directory <- "."
+
+    }
+    if(!is.null(zipfile) && (zipfile != "")) {
+        if(!file.exists(zipfile)){
+            error_message <- paste("Cannot access the Zip file:",zipfile,". Please, contact your administrator ... if you have one!")
+            print(error_message)
+            stop(error_message)
+        }
+
+        #list all files in the zip file
+        #zip_files <- unzip(zipfile,list=T)[,"Name"]
+
+        #unzip
+        suppressWarnings(unzip(zipfile, unzip="unzip"))
+
+        #get the directory name
+        suppressWarnings(filesInZip <- unzip(zipfile, list=T))
+        directories <- unique(unlist(lapply(strsplit(filesInZip$Name,"/"), function(x) x[1])))
+        directories <- directories[!(directories %in% c("__MACOSX")) & file.info(directories)$isdir]
+        directory <- "."
+ if (length(directories) == 1) directory <- directories + + cat("files_root_directory\t",directory,"\n") + + } + return (directory) +} + + +# This function retrieve a xset like object +#@author Gildas Le Corguille lecorguille@sb-roscoff.fr +getxcmsSetObject <- function(xobject) { + # XCMS 1.x + if (class(xobject) == "xcmsSet") + return (xobject) + # XCMS 3.x + if (class(xobject) == "XCMSnExp") { + # Get the legacy xcmsSet object + suppressWarnings(xset <- as(xobject, 'xcmsSet')) + sampclass(xset) <- xset@phenoData$sample_group + return (xset) + } +} + + +#@TODO: remove this function as soon as we can use xcms 3.x.x from Bioconductor 3.7 +# https://github.com/sneumann/xcms/issues/250 +groupnamesW4M <- function(xdata, mzdec = 0, rtdec = 0) { + mzfmt <- paste("%.", mzdec, "f", sep = "") + rtfmt <- paste("%.", rtdec, "f", sep = "") + + gnames <- paste("M", sprintf(mzfmt, featureDefinitions(xdata)[,"mzmed"]), "T", + sprintf(rtfmt, featureDefinitions(xdata)[,"rtmed"]), sep = "") + + if (any(dup <- duplicated(gnames))) + for (dupname in unique(gnames[dup])) { + dupidx <- which(gnames == dupname) + gnames[dupidx] <- paste(gnames[dupidx], seq(along = dupidx), sep = "_") + } + + return (gnames) +} + +#@TODO: remove this function as soon as we can use xcms 3.x.x from Bioconductor 3.7 +# https://github.com/sneumann/xcms/issues/247 +.concatenate_XCMSnExp <- function(...) { + x <- list(...) + if (length(x) == 0) + return(NULL) + if (length(x) == 1) + return(x[[1]]) + ## Check that all are XCMSnExp objects. + if (!all(unlist(lapply(x, function(z) is(z, "XCMSnExp"))))) + stop("All passed objects should be 'XCMSnExp' objects") + new_x <- as(.concatenate_OnDiskMSnExp(...), "XCMSnExp") + ## If any of the XCMSnExp has alignment results or detected features drop + ## them! 
+ x <- lapply(x, function(z) { + if (hasAdjustedRtime(z)) { + z <- dropAdjustedRtime(z) + warning("Adjusted retention times found, had to drop them.") + } + if (hasFeatures(z)) { + z <- dropFeatureDefinitions(z) + warning("Feature definitions found, had to drop them.") + } + z + }) + ## Combine peaks + fls <- lapply(x, fileNames) + startidx <- cumsum(lengths(fls)) + pks <- lapply(x, chromPeaks) + procH <- lapply(x, processHistory) + for (i in 2:length(fls)) { + pks[[i]][, "sample"] <- pks[[i]][, "sample"] + startidx[i - 1] + procH[[i]] <- lapply(procH[[i]], function(z) { + z@fileIndex <- as.integer(z@fileIndex + startidx[i - 1]) + z + }) + } + pks <- do.call(rbind, pks) + new_x@.processHistory <- unlist(procH) + chromPeaks(new_x) <- pks + if (validObject(new_x)) + new_x +} + +#@TODO: remove this function as soon as we can use xcms 3.x.x from Bioconductor 3.7 +# https://github.com/sneumann/xcms/issues/247 +.concatenate_OnDiskMSnExp <- function(...) { + x <- list(...) + if (length(x) == 0) + return(NULL) + if (length(x) == 1) + return(x[[1]]) + ## Check that all are XCMSnExp objects. + if (!all(unlist(lapply(x, function(z) is(z, "OnDiskMSnExp"))))) + stop("All passed objects should be 'OnDiskMSnExp' objects") + ## Check processingQueue + procQ <- lapply(x, function(z) z@spectraProcessingQueue) + new_procQ <- procQ[[1]] + is_ok <- unlist(lapply(procQ, function(z) + !is.character(all.equal(new_procQ, z)) + )) + if (any(!is_ok)) { + warning("Processing queues from the submitted objects differ! ", + "Dropping the processing queue.") + new_procQ <- list() + } + ## processingData + fls <- lapply(x, function(z) z@processingData@files) + startidx <- cumsum(lengths(fls)) + ## featureData + featd <- lapply(x, fData) + ## Have to update the file index and the spectrum names. 
+ for (i in 2:length(featd)) { + featd[[i]]$fileIdx <- featd[[i]]$fileIdx + startidx[i - 1] + rownames(featd[[i]]) <- MSnbase:::formatFileSpectrumNames( + fileIds = featd[[i]]$fileIdx, + spectrumIds = featd[[i]]$spIdx, + nSpectra = nrow(featd[[i]]), + nFiles = length(unlist(fls)) + ) + } + featd <- do.call(rbind, featd) + featd$spectrum <- 1:nrow(featd) + ## experimentData + expdata <- lapply(x, function(z) { + ed <- z@experimentData + data.frame(instrumentManufacturer = ed@instrumentManufacturer, + instrumentModel = ed@instrumentModel, + ionSource = ed@ionSource, + analyser = ed@analyser, + detectorType = ed@detectorType, + stringsAsFactors = FALSE) + }) + expdata <- do.call(rbind, expdata) + expdata <- new("MIAPE", + instrumentManufacturer = expdata$instrumentManufacturer, + instrumentModel = expdata$instrumentModel, + ionSource = expdata$ionSource, + analyser = expdata$analyser, + detectorType = expdata$detectorType) + + ## protocolData + protodata <- lapply(x, function(z) z@protocolData) + if (any(unlist(lapply(protodata, nrow)) > 0)) + warning("Found non-empty protocol data, but merging protocol data is", + " currently not supported. Skipped.") + ## phenoData + pdata <- do.call(rbind, lapply(x, pData)) + res <- new( + "OnDiskMSnExp", + phenoData = new("NAnnotatedDataFrame", data = pdata), + featureData = new("AnnotatedDataFrame", featd), + processingData = new("MSnProcess", + processing = paste0("Concatenated [", date(), "]"), + files = unlist(fls), smoothed = NA), + experimentData = expdata, + spectraProcessingQueue = new_procQ) + if (validObject(res)) + res +} + +#@TODO: remove this function as soon as we can use xcms 3.x.x from Bioconductor 3.7 +# https://github.com/sneumann/xcms/issues/247 +c.XCMSnExp <- function(...) { + .concatenate_XCMSnExp(...) 
+} diff -r 8ad83969888b -r 3d4339594010 macros.xml --- a/macros.xml Tue Feb 13 04:45:13 2018 -0500 +++ b/macros.xml Thu Mar 01 04:17:50 2018 -0500 @@ -1,15 +1,12 @@ + 3.0.0 - r-snow - bioconductor-xcms + bioconductor-xcms r-batch - - - - - bioconductor-xcms + r-rcolorbrewer + @@ -18,15 +15,12 @@ - - LC_ALL=C Rscript $__tool_directory__/xcms.r - + LC_ALL=C Rscript $__tool_directory__/ ; return=\$?; - mv log.txt '$log'; - cat '$log'; + cat 'log.txt'; sh -c "exit \$return" @@ -70,6 +64,15 @@ + +
+ + + + +
+
+
@@ -81,8 +84,6 @@ #if $peaklist.peaklistBool - variableMetadataOutput '$variableMetadata' - dataMatrixOutput '$dataMatrix' convertRTMinute $peaklist.convertRTMinute numDigitsMZ $peaklist.numDigitsMZ numDigitsRT $peaklist.numDigitsRT @@ -108,10 +109,10 @@ - + (peaklist['peaklistBool']) - + (peaklist['peaklistBool']) @@ -131,6 +132,39 @@ + + +For details and explanations for all the parameters and the workflow of xcms_ package, see its manual_ and this example_ + +.. _xcms: https://bioconductor.org/packages/release/bioc/html/xcms.html +.. _manual: http://www.bioconductor.org/packages/release/bioc/manuals/xcms/man/xcms.pdf +.. _example: https://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html + + + + + +Get a Peak List +--------------- + +If 'true', the module generates two additional files corresponding to the peak list: +- the variable metadata file (corresponding to information about extracted ions such as mass or retention time) +- the data matrix (corresponding to related intensities) + +**decimal places for [mass or retention time] values in identifiers** + + | Ions' identifiers are constructed as MxxxTyyy where 'xxx' is the ion median mass and 'yyy' the ion median retention time. + | Two parameters are used to adjust the number of decimal places wanted in identifiers for mass and retention time respectively. + | Theses parameters do not affect decimal places in columns other than the identifier one. 
+
+**Reported intensity values**
+
+    | This parameter determines which values should be reported as intensities in the dataMatrix table; it correspond to xcms 'intval' parameter:
+    | - into: integrated area of original (raw) peak
+    | - maxo: maximum intensity of original (raw) peak
+    | - intb: baseline corrected integrated peak area (only available if peak detection was done by ‘findPeaks.centWave’)
+
+
diff -r 8ad83969888b -r 3d4339594010 test-data/faahKO-single.xset.merged.group.retcor.group.fillpeaks.RData
Binary file test-data/faahKO-single.xset.merged.group.retcor.group.fillpeaks.RData has changed
diff -r 8ad83969888b -r 3d4339594010 test-data/faahKO-single.xset.merged.group.retcor.group.fillpeaks.summary.html
--- a/test-data/faahKO-single.xset.merged.group.retcor.group.fillpeaks.summary.html	Tue Feb 13 04:45:13 2018 -0500
+++ b/test-data/faahKO-single.xset.merged.group.retcor.group.fillpeaks.summary.html	Thu Mar 01 04:17:50 2018 -0500
@@ -17,79 +17,126 @@

Samples used:

- +
samplefilenamemd5sum*
ko15 ./ko15.CDF 4698c36c0b3af007faf70975c04ccf2a
ko16 ./ko16.CDF afaeed94ced3140bc042d5ab6aeb16c1
wt15 ./wt15.CDF d58a27fad7c04ddddb0359ddc2b7ba68
wt16 ./wt16.CDF 29654e9f8ad48c1fbe2a41b9ba578f6e
ko15ko15.CDF4698c36c0b3af007faf70975c04ccf2a
ko16ko16.CDFafaeed94ced3140bc042d5ab6aeb16c1
wt15wt15.CDFd58a27fad7c04ddddb0359ddc2b7ba68
wt16wt16.CDF29654e9f8ad48c1fbe2a41b9ba578f6e

*The program md5sum is designed to verify data integrity. So you can check if the data were uploaded correctly or if the data were changed during the process.

Function launched:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + +
timestamp***functionargumentvalue
170203-11:04:42xcmsSetnSlaves1
methodcentWave
ppm25
peakwidth2050
170203-11:05:21xcmsSetnSlaves1
methodcentWave
ppm25
peakwidth2050
170203-11:06:21xcmsSetnSlaves1
methodcentWave
ppm25
peakwidth2050
170203-11:06:59xcmsSetnSlaves1
methodcentWave
ppm25
peakwidth2050
170203-14:38:53groupmethoddensity
sleep0.001
minfrac0.3
bw5
mzwid0.01
max50
170203-14:51:16retcormethodpeakgroups
smoothloess
extra1
missing1
span0.2
familygaussian
plottypedeviation
170203-15:27:58groupmethoddensity
sleep0.001
minfrac0.3
bw5
mzwid0.01
max50
170203-15:44:50fillPeaksmethodchrom
convertRTMinuteFALSE
numDigitsMZ4
numDigitsRT1
intvalinto
Wed Feb 7 11:15:25 2018Peak detection
+Object of class:  CentWaveParam 
+Parameters:
+ ppm: 25 
+ peakwidth: 20, 50 
+ snthresh: 10 
+ prefilter: 3, 100 
+ mzCenterFun: wMean 
+ integrate: 1 
+ mzdiff: -0.001 
+ fitgauss: FALSE 
+ noise: 0 
+ verboseColumns: FALSE 
+ roiList length: 0 
+ firstBaselineCheck TRUE 
+ roiScales length: 0 
+
Mon Feb 12 15:31:11 2018Peak grouping
+Object of class:  PeakDensityParam 
+Parameters:
+ sampleGroups: character of length 4 
+ bw: 30 
+ minFraction: 0.8 
+ minSamples: 1 
+ binSize: 0.25 
+ maxFeatures: 50 
+
Mon Feb 12 15:31:19 2018Retention time correction
+Object of class:  PeakGroupsParam 
+Parameters:
+ minFraction: 0.85 
+ extraPeaks: 1 
+ smooth: loess 
+ span: 0.2 
+ family: gaussian 
+ number of peak groups: 125 
+
Mon Feb 12 15:31:27 2018Peak grouping
+Object of class:  PeakDensityParam 
+Parameters:
+ sampleGroups: character of length 4 
+ bw: 20 
+ minFraction: 0.4 
+ minSamples: 1 
+ binSize: 0.25 
+ maxFeatures: 50 
+
Wed Feb 14 09:55:13 2018Missing peak filling
+Object of class:  FillChromPeaksParam 
+Parameters:
+ expandMz: 0 
+ expandRt: 0 
+ ppm: 0 
+
-
***timestamp format: yymmdd-hh:mm:ss +
***timestamp format: DD MM dd hh:mm:ss YYYY or yymmdd-hh:mm:ss
+

Informations about the XCMSnExp object:

+
+MSn experiment data ("XCMSnExp")
+Object size in memory: 1.36 Mb
+- - - Spectra data - - -
+ MS level(s): 1 
+ Number of spectra: 5112 
+ MSn retention times: 41:33 - 75:0 minutes
+- - - Processing information - - -
+Concatenated [Thu Feb  8 15:36:09 2018] 
+ MSnbase version: 2.4.2 
+- - - Meta data  - - -
+phenoData
+  rowNames: ./ko15.CDF ./ko16.CDF ./wt15.CDF ./wt16.CDF
+  varLabels: sample_name sample_group
+  varMetadata: labelDescription
+Loaded from:
+  [1] ko15.CDF...  [4] wt16.CDF
+  Use 'fileNames(.)' to see all files.
+protocolData: none
+featureData
+  featureNames: F1.S0001 F1.S0002 ... F4.S1278 (5112 total)
+  fvarLabels: fileIdx spIdx ... spectrum (27 total)
+  fvarMetadata: labelDescription
+experimentData: use 'experimentData(object)'
+- - - xcms preprocessing - - -
+Chromatographic peak detection:
+ method: centWave 
+ 15230 peaks identified in 4 samples.
+ On average 3808 chromatographic peaks per sample.
+Alignment/retention time adjustment:
+ method: peak groups 
+Correspondence:
+ method: chromatographic peak density 
+ 6332 features identified.
+ Median mz range of features: 0
+ Median rt range of features: 0
+ 5979 filled peaks (on average 1494.75 per sample).
+

Informations about the xcmsSet object:

 An "xcmsSet" object with 4 samples
 
-Time range: 2506-4484 seconds (41.8-74.7 minutes)
+Time range: 2499.4-4473.6 seconds (41.7-74.6 minutes)
 Mass range: 200.1-600 m/z
-Peaks: 32720 (about 8180 per sample)
-Peak Groups: 8157 
+Peaks: 15230 (about 3808 per sample)
+Peak Groups: 6332 
 Sample classes: KO, WT 
 
-Peak picking was performed on MS1.
+Feature detection:
+ o Peak picking performed on MS1.
+ o Scan range limited to  1 - 1278 
 Profile settings: method = bin
                   step = 0.1
 
-Memory usage: 4.25 MB
+Memory usage: 2.98 MB
 

Citations:

diff -r 8ad83969888b -r 3d4339594010 test-data/faahKO.xset.group.retcor.group.fillpeaks.summary.html
--- a/test-data/faahKO.xset.group.retcor.group.fillpeaks.summary.html	Tue Feb 13 04:45:13 2018 -0500
+++ b/test-data/faahKO.xset.group.retcor.group.fillpeaks.summary.html	Thu Mar 01 04:17:50 2018 -0500
@@ -17,7 +17,7 @@

    Samples used:

    - +
    samplefilenamemd5sum*
    ko15 faahKO_reduce/KO/ko15.CDF 4698c36c0b3af007faf70975c04ccf2a
    ko16 faahKO_reduce/KO/ko16.CDF afaeed94ced3140bc042d5ab6aeb16c1
    wt15 faahKO_reduce/WT/wt15.CDF d58a27fad7c04ddddb0359ddc2b7ba68
    wt16 faahKO_reduce/WT/wt16.CDF 29654e9f8ad48c1fbe2a41b9ba578f6e
    ko15faahKO_reduce/KO/ko15.CDF4698c36c0b3af007faf70975c04ccf2a
    ko16faahKO_reduce/KO/ko16.CDFafaeed94ced3140bc042d5ab6aeb16c1
    wt15faahKO_reduce/WT/wt15.CDFd58a27fad7c04ddddb0359ddc2b7ba68
    wt16faahKO_reduce/WT/wt16.CDF29654e9f8ad48c1fbe2a41b9ba578f6e

    *The program md5sum is designed to verify data integrity. So you can check if the data were uploaded correctly or if the data were changed during the process.
@@ -54,7 +54,7 @@
 160421-11:50:48fillPeaks methodchrom
-***timestamp format: yymmdd-hh:mm:ss
+***timestamp format: DD MM dd hh:mm:ss YYYY or yymmdd-hh:mm:ss

Informations about the xcmsSet object:

@@ -66,6 +66,7 @@
 Peak Groups: 8157 
 Sample classes: KO, WT 
 
+Feature detection:
 Profile settings: method = bin
                   step = 0.1
 
diff -r 8ad83969888b -r 3d4339594010 xcms_summary.r
--- a/xcms_summary.r	Tue Feb 13 04:45:13 2018 -0500
+++ b/xcms_summary.r	Thu Mar 01 04:17:50 2018 -0500
@@ -1,51 +1,81 @@
 #!/usr/bin/env Rscript
-# version="1.0.0"
-#@author Gildas Le Corguille lecorguille@sb-roscoff.fr ABIMS TEAM
 
 
 
 # ----- ARGUMENTS BLACKLIST -----
 #xcms.r
-argBlacklist=c("zipfile","singlefile_galaxyPath","singlefile_sampleName","xfunction","xsetRdataOutput","sampleMetadataOutput","ticspdf","bicspdf","rplotspdf")
+argBlacklist <- c("zipfile", "singlefile_galaxyPath", "singlefile_sampleName", "xfunction", "xsetRdataOutput", "sampleMetadataOutput", "ticspdf", "bicspdf", "rplotspdf")
 #CAMERA.r
-argBlacklist=c(argBlacklist,"dataMatrixOutput","variableMetadataOutput","new_file_path")
+argBlacklist <- c(argBlacklist, "dataMatrixOutput", "variableMetadataOutput", "new_file_path")
+
 
 # ----- PACKAGE -----
+cat("\tSESSION INFO\n")
 
-pkgs=c("parallel","BiocGenerics", "Biobase", "Rcpp", "mzR", "igraph", "xcms","CAMERA","batch")
-for(pkg in pkgs) {
-    cat(pkg,"\n")
-    suppressPackageStartupMessages( stopifnot( library(pkg, quietly=TRUE, logical.return=TRUE, character.only=TRUE)))
-}
+#Import the different functions
+source_local <- function(fname){ argv <- commandArgs(trailingOnly=FALSE); base_dir <- dirname(substring(argv[grep("--file=", argv)], 8)); source(paste(base_dir, fname, sep="/")) }
+source_local("lib.r")
+
+pkgs <- c("CAMERA","batch")
+loadAndDisplayPackages(pkgs)
+cat("\n\n");
 
 
 # ----- FUNCTION -----
-writehtml = function(...) { cat(...,"\n", file=htmlOutput,append = TRUE,sep="") }
+writehtml <- function(...) { cat(...,"\n", file=htmlOutput,append = TRUE,sep="") }
+writeraw <- function(htmlOutput, object, open="at") {
+    log_file <- file(htmlOutput, open = open)
+    sink(log_file)
+    sink(log_file, type = "output")
+        print(object)
+    sink()
+    close(log_file)
+}
+getSampleNames <- function(xobject) {
+    if (class(xobject) == "xcmsSet")
+        return (sampnames(xobject))
+    if (class(xobject) == "XCMSnExp")
+        return (xobject@phenoData@data$sample_name)
+}
+getFilePaths <- function(xobject) {
+    if (class(xobject) == "xcmsSet")
+        return (xobject@filepaths)
+    if (class(xobject) == "XCMSnExp")
+        return (fileNames(xobject))
+}
+equalParams <- function(param1, param2) {
+    writeraw("param1.txt", param1, open="wt")
+    writeraw("param2.txt", param2, open="wt")
+    return(tools::md5sum("param1.txt") == tools::md5sum("param2.txt"))
+}
 
 
 # ----- ARGUMENTS -----
 
-listArguments = parseCommandArgs(evaluate=FALSE) #interpretation of arguments given in command line as an R list of objects
+args <- parseCommandArgs(evaluate=FALSE) #interpretation of arguments given in command line as an R list of objects
 
 
 # ----- ARGUMENTS PROCESSING -----
 
 #image is an .RData file necessary to use xset variable given by previous tools
-load(listArguments[["image"]]);
+load(args$image);
 
-htmlOutput = "summary.html"
-if (!is.null(listArguments[["htmlOutput"]])) htmlOutput = listArguments[["htmlOutput"]];
+htmlOutput <- "summary.html"
+if (!is.null(args$htmlOutput)) htmlOutput = args$htmlOutput;
 
-user_email = NULL
-if (!is.null(listArguments[["user_email"]])) user_email = listArguments[["user_email"]];
+user_email <- NULL
+if (!is.null(args$user_email)) user_email = args$user_email;
 
-# if the RData come from CAMERA
-if (!exists("xset") & exists("xa")) xset=xa@xcmsSet
-
+# if the RData come from XCMS 1.x
+if (exists("xset")) xobject <- xset
 # retrocompatability
-if (!exists("sampleNamesList")) sampleNamesList=list("sampleNamesMakeNames"=make.names(sampnames(xset)))
+if (!exists("sampleNamesList")) sampleNamesList <- list("sampleNamesMakeNames"=make.names(sampnames(xobject)))
+# if the RData come from CAMERA
+if (exists("xa")) xobject <- xa@xcmsSet
+# if the RData come from XCMS 3.x
+if (exists("xdata")) xobject <- xdata
 
-if (!exists("xset")) stop("You need at least a xset or a xa object.")
+if (!exists("xobject")) stop("You need at least a xdata, a xset or a xa object.")
 
 
 
@@ -71,37 +101,37 @@
     writehtml("

___ XCMS analysis summary using Workflow4Metabolomics ___

") # to pass the planemo shed_test if (user_email != "test@bx.psu.edu") { - if (!is.null(user_email)) writehtml("By: ",user_email," - ") - writehtml("Date: ",format(Sys.time(), "%y%m%d-%H:%M:%S")) + if (!is.null(user_email)) writehtml("By: ", user_email," - ") + writehtml("Date: ", format(Sys.time(), "%y%m%d-%H:%M:%S")) } writehtml("
") writehtml("

Samples used:

") writehtml("
") - if (all(sampnames(xset) == sampleNamesList$sampleNamesMakeNames)) { - sampleNameHeaderHtml = paste("") - sampleNameHtml = paste("") + if (all(getSampleNames(xobject) == sampleNamesList$sampleNamesMakeNames)) { + sampleNameHeaderHtml <- paste0("") + sampleNameHtml <- paste0("") } else { - sampleNameHeaderHtml = paste("") - sampleNameHtml = paste("") + sampleNameHeaderHtml <- paste0("") + sampleNameHtml <- paste0("") } if (!exists("md5sumList")) { - md5sumHeaderHtml = "" - md5sumHtml = "" - md5sumLegend="" + md5sumHeaderHtml <- "" + md5sumHtml <- "" + md5sumLegend <- "" } else if (is.null(md5sumList$removalBadCharacters)) { - md5sumHeaderHtml = paste("") - md5sumHtml = paste("") - md5sumLegend = "
*The program md5sum is designed to verify data integrity. So you can check if the data were uploaded correctly or if the data were changed during the process." + md5sumHeaderHtml <- paste0("") + md5sumHtml <- paste0("") + md5sumLegend <- "
*The program md5sum is designed to verify data integrity. So you can check if the data were uploaded correctly or if the data were changed during the process." } else { - md5sumHeaderHtml = paste("") - md5sumHtml = paste("") - md5sumLegend = "
*The program md5sum is designed to verify data integrity. So you can check if the data were uploaded correctly or if the data were changed during the process.
**Because some bad characters (eg: accent) were removed from your original file, the checksum have changed too.
" + md5sumHeaderHtml <- paste0("") + md5sumHtml <- paste0("") + md5sumLegend <- "
*The program md5sum is designed to verify data integrity. So you can check if the data were uploaded correctly or if the data were changed during the process.
**Because some bad characters (eg: accent) were removed from your original file, the checksum have changed too.
" } writehtml("",sampleNameHeaderHtml,"",md5sumHeaderHtml,"") - writehtml(paste("",sampleNameHtml,"",md5sumHtml,"")) + writehtml(paste0("",sampleNameHtml,"",md5sumHtml,"")) writehtml("
sample",sampnames(xset),"sample",getSampleNames(xobject),"samplesample renamed",sampnames(xset),"",sampleNamesList$sampleNamesMakeNames,"samplesample renamed",getSampleNames(xobject),"",sampleNamesList$sampleNamesMakeNames,"md5sum*",md5sumList$origin,"md5sum*",md5sumList$origin,"md5sum*md5sum** after bad characters removal",md5sumList$origin,"",md5sumList$removalBadCharacters,"md5sum*md5sum** after bad characters removal",md5sumList$origin,"",md5sumList$removalBadCharacters,"
filename
",xset@filepaths,"
",getFilePaths(xobject),"
") writehtml(md5sumLegend) @@ -110,32 +140,57 @@ writehtml("

Function launched:

") writehtml("
") writehtml("") - for(tool in names(listOFlistArguments)) { - listOFlistArgumentsDisplay=listOFlistArguments[[tool]][!(names(listOFlistArguments[[tool]]) %in% argBlacklist)] + # XCMS 3.x + if (class(xobject) == "XCMSnExp") { + xcmsFunction <- NULL + params <- NULL + for (processHistoryItem in processHistory(xobject)) { + if ((xcmsFunction == processType(processHistoryItem)) && equalParams(params, processParam(processHistoryItem))) + next + timestamp <- processDate(processHistoryItem) + xcmsFunction <- processType(processHistoryItem) + params <- processParam(processHistoryItem) + writehtml("") + } + } + # CAMERA and retrocompatability XCMS 1.x + if (exists("listOFlistArguments")) { + for(tool in names(listOFlistArguments)) { + listOFlistArgumentsDisplay <- listOFlistArguments[[tool]][!(names(listOFlistArguments[[tool]]) %in% argBlacklist)] - timestamp = strsplit(tool,"_")[[1]][1] - xcmsFunction = strsplit(tool,"_")[[1]][2] - writehtml("") - line_begin="" - for (arg in names(listOFlistArgumentsDisplay)) { - writehtml(line_begin,"") - line_begin="" + timestamp <- strsplit(tool,"_")[[1]][1] + xcmsFunction <- strsplit(tool,"_")[[1]][2] + writehtml("") + line_begin <- "" + for (arg in names(listOFlistArgumentsDisplay)) { + writehtml(line_begin,"") + line_begin <- "" + } } } writehtml("
timestamp***functionargumentvalue
",timestamp,"",xcmsFunction,"
")
+                writeraw(htmlOutput, params)
+                writehtml("
",timestamp,"",xcmsFunction,"",arg,"",unlist(listOFlistArgumentsDisplay[arg][1]),"
",timestamp,"",xcmsFunction,"",arg,"",unlist(listOFlistArgumentsDisplay[arg][1]),"
") - writehtml("
***timestamp format: yymmdd-hh:mm:ss") + writehtml("
***timestamp format: DD MM dd hh:mm:ss YYYY or yymmdd-hh:mm:ss") writehtml("
") + if (class(xobject) == "XCMSnExp") { + writehtml("

Informations about the XCMSnExp object:

") + + writehtml("
")
+            writeraw(htmlOutput, xobject)
+        writehtml("
") + } + writehtml("

Informations about the xcmsSet object:

") writehtml("
")
-        log_file=file(htmlOutput, open = "at")
-        sink(log_file)
-        sink(log_file, type = "output")
-            xset
-        sink()
+        # Get the legacy xcmsSet object
+        xset <- getxcmsSetObject(xobject)
+        writeraw(htmlOutput, xset)
     writehtml("
") + # CAMERA if (exists("xa")) { writehtml("

Informations about the CAMERA object:

")