# HG changeset patch
# User lecorguille
# Date 1519895805 18000
# Node ID 4d6f4cd7c3ef2df589fa344e3be14d37e48b832e
# Parent c013ed353a2ff360f0ecc8a15120b1a8c43a31eb
planemo upload for repository https://github.com/workflow4metabolomics/xcms commit e384d6dd5f410799ec211f73bca0b5d5d7bc651e

diff -r c013ed353a2f -r 4d6f4cd7c3ef abims_xcms_retcor.xml
--- a/abims_xcms_retcor.xml Tue Feb 13 04:44:03 2018 -0500
+++ b/abims_xcms_retcor.xml Thu Mar 01 04:16:45 2018 -0500
@@ -1,6 +1,6 @@
-Retention Time Correction using retcor function from xcms R package
+Retention Time Correction
 macros.xml
@@ -10,27 +10,31 @@
@@ -87,95 +126,100 @@
-(methods['method'] == 'peakgroups')
-(options['option'] == 'show')
-(family == 'symmetric')
-(plottype != 'none')
 **peakgroups**
+ | Method: -> **PeakGroups**
 | smooth: -> **loess**
- | extra: -> **1**
- | missing -> **1**
+ | extraPeaks: -> **1**
+ | minFraction -> **1**
 | Advanced options: -> **show**
 | span -> **0.2**
 | family -> **gaussian**
@@ -319,6 +366,10 @@
 Changelog/News
 --------------
+**Version 3.0.0.0 - 14/02/2018**
+
+- UPGRADE: upgrade the xcms version from 1.46.0 to 3.0.0. So refactoring of a lot of underlining codes and methods
+
 **Version 2.1.1 - 29/11/2017**
 - BUGFIX: To avoid issues with accented letter in the parentFile tag of the mzXML files, we changed a hidden mechanim to LC_ALL=C

diff -r c013ed353a2f -r 4d6f4cd7c3ef lib.r
--- a/lib.r Tue Feb 13 04:44:03 2018 -0500
+++ b/lib.r Thu Mar 01 04:16:45 2018 -0500
@@ -1,54 +1,105 @@
-#Authors ABiMS TEAM
-#Lib.r for Galaxy Workflow4Metabolomics xcms tools
-#
-#version 2.4: lecorguille
-# add getPeaklistW4M
-#version 2.3: yguitton
-# correction for empty PDF when only 1 class
-#version 2.2
-# correct bug in Base Peak Chromatogram (BPC) option, not only TIC when scanrange used in xcmsSet
-# Note if scanrange is used a warning is prompted in R console but do not stop PDF generation
-#version 2.1: yguitton
-# Modifications made by Guitton Yann
+#@authors ABiMS TEAM, Y. Guitton
+# lib.r for Galaxy Workflow4Metabolomics xcms tools
+#@author G. Le Corguille
+# solve an issue with batch if arguments are logical TRUE/FALSE
+parseCommandArgs <- function(...) {
+    args <- batch::parseCommandArgs(...)
+    for (key in names(args)) {
+        if (args[key] %in% c("TRUE","FALSE"))
+            args[key] = as.logical(args[key])
+    }
+    return(args)
+}
 #@author G. Le Corguille
-#This function convert if it is required the Retention Time in minutes
+# This function will
+# - load the packages
+# - display the sessionInfo
+loadAndDisplayPackages <- function(pkgs) {
+    for(pkg in pkgs) suppressPackageStartupMessages( stopifnot( library(pkg, quietly=TRUE, logical.return=TRUE, character.only=TRUE)))
+
+    sessioninfo = sessionInfo()
+    cat(sessioninfo$R.version$version.string,"\n")
+    cat("Main packages:\n")
+    for (pkg in names(sessioninfo$otherPkgs)) { cat(paste(pkg,packageVersion(pkg)),"\t") }; cat("\n")
+    cat("Other loaded packages:\n")
+    for (pkg in names(sessioninfo$loadedOnly)) { cat(paste(pkg,packageVersion(pkg)),"\t") }; cat("\n")
+}
+
+#@author G. Le Corguille
+# This function convert if it is required the Retention Time in minutes
 RTSecondToMinute <- function(variableMetadata, convertRTMinute) {
     if (convertRTMinute){
         #converting the retention times (seconds) into minutes
         print("converting the retention times into minutes in the variableMetadata")
-        variableMetadata[,"rt"]=variableMetadata[,"rt"]/60
-        variableMetadata[,"rtmin"]=variableMetadata[,"rtmin"]/60
-        variableMetadata[,"rtmax"]=variableMetadata[,"rtmax"]/60
+        variableMetadata[,"rt"] <- variableMetadata[,"rt"]/60
+        variableMetadata[,"rtmin"] <- variableMetadata[,"rtmin"]/60
+        variableMetadata[,"rtmax"] <- variableMetadata[,"rtmax"]/60
     }
     return (variableMetadata)
 }
 #@author G. Le Corguille
-#This function format ions identifiers
+# This function format ions identifiers
 formatIonIdentifiers <- function(variableMetadata, numDigitsRT=0, numDigitsMZ=0) {
-    splitDeco = strsplit(as.character(variableMetadata$name),"_")
-    idsDeco = sapply(splitDeco, function(x) { deco=unlist(x)[2]; if (is.na(deco)) return ("") else return(paste0("_",deco)) })
-    namecustom = make.unique(paste0("M",round(variableMetadata[,"mz"],numDigitsMZ),"T",round(variableMetadata[,"rt"],numDigitsRT),idsDeco))
-    variableMetadata=cbind(name=variableMetadata$name, namecustom=namecustom, variableMetadata[,!(colnames(variableMetadata) %in% c("name"))])
+    splitDeco <- strsplit(as.character(variableMetadata$name),"_")
+    idsDeco <- sapply(splitDeco, function(x) { deco=unlist(x)[2]; if (is.na(deco)) return ("") else return(paste0("_",deco)) })
+    namecustom <- make.unique(paste0("M",round(variableMetadata[,"mz"],numDigitsMZ),"T",round(variableMetadata[,"rt"],numDigitsRT),idsDeco))
+    variableMetadata <- cbind(name=variableMetadata$name, namecustom=namecustom, variableMetadata[,!(colnames(variableMetadata) %in% c("name"))])
     return(variableMetadata)
 }
 #@author G. Le Corguille
+# Draw the plotChromPeakDensity 3 per page in a pdf file
+getPlotChromPeakDensity <- function(xdata) {
+    pdf(file="plotChromPeakDensity.pdf", width=16, height=12)
+
+    par(mfrow = c(3, 1), mar = c(4, 4, 1, 0.5))
+
+    group_colors <- brewer.pal(3, "Set1")[1:length(unique(xdata$sample_group))]
+    names(group_colors) <- unique(xdata$sample_group)
+
+    xlim <- c(min(featureDefinitions(xdata)$rtmin), max(featureDefinitions(xdata)$rtmax))
+    for (i in 1:nrow(featureDefinitions(xdata))) {
+        plotChromPeakDensity(xdata, mz=c(featureDefinitions(xdata)[i,]$mzmin,featureDefinitions(xdata)[i,]$mzmax), col=group_colors, pch=16, xlim=xlim)
+        legend("topright", legend=names(group_colors), col=group_colors, cex=0.8, lty=1)
+    }
+
+    dev.off()
+}
+
+#@author G. Le Corguille
+# Draw the plotChromPeakDensity 3 per page in a pdf file
+getPlotAdjustedRtime <- function(xdata) {
+    pdf(file="raw_vs_adjusted_rt.pdf", width=16, height=12)
+    # Color by group
+    group_colors <- brewer.pal(3, "Set1")[1:length(unique(xdata$sample_group))]
+    names(group_colors) <- unique(xdata$sample_group)
+    plotAdjustedRtime(xdata, col = group_colors[xdata$sample_group])
+    legend("topright", legend=names(group_colors), col=group_colors, cex=0.8, lty=1)
+    # Color by sample
+    plotAdjustedRtime(xdata, col = rainbow(length(xdata@phenoData@data$sample_name)))
+    legend("topright", legend=xdata@phenoData@data$sample_name, col=rainbow(length(xdata@phenoData@data$sample_name)), cex=0.8, lty=1)
+    dev.off()
+}
+
+#@author G. Le Corguille
 # value: intensity values to be used into, maxo or intb
-getPeaklistW4M <- function(xset, intval="into",convertRTMinute=F,numDigitsMZ=4,numDigitsRT=0,variableMetadataOutput,dataMatrixOutput) {
-    variableMetadata_dataMatrix = peakTable(xset, method="medret", value=intval)
-    variableMetadata_dataMatrix = cbind(name=groupnames(xset),variableMetadata_dataMatrix)
+getPeaklistW4M <- function(xdata, intval="into", convertRTMinute=F, numDigitsMZ=4, numDigitsRT=0, variableMetadataOutput, dataMatrixOutput) {
+    dataMatrix <- featureValues(xdata, method="medret", value=intval)
+    colnames(dataMatrix) <- tools::file_path_sans_ext(colnames(dataMatrix))
+    dataMatrix = cbind(name=groupnamesW4M(xdata), dataMatrix)
+    variableMetadata <- featureDefinitions(xdata)
+    colnames(variableMetadata)[1] = "mz"; colnames(variableMetadata)[4] = "rt"
+    variableMetadata = data.frame(name=groupnamesW4M(xdata), variableMetadata)
-    dataMatrix = variableMetadata_dataMatrix[,(make.names(colnames(variableMetadata_dataMatrix)) %in% c("name", make.names(sampnames(xset))))]
-
-    variableMetadata = variableMetadata_dataMatrix[,!(make.names(colnames(variableMetadata_dataMatrix)) %in% c(make.names(sampnames(xset))))]
-    variableMetadata = RTSecondToMinute(variableMetadata, convertRTMinute)
-    variableMetadata = formatIonIdentifiers(variableMetadata, numDigitsRT=numDigitsRT, numDigitsMZ=numDigitsMZ)
+    variableMetadata <- RTSecondToMinute(variableMetadata, convertRTMinute)
+    variableMetadata <- formatIonIdentifiers(variableMetadata, numDigitsRT=numDigitsRT, numDigitsMZ=numDigitsMZ)
     write.table(variableMetadata, file=variableMetadataOutput,sep="\t",quote=F,row.names=F)
     write.table(dataMatrix, file=dataMatrixOutput,sep="\t",quote=F,row.names=F)
+
 }
 #@author Y. Guitton
@@ -70,11 +121,11 @@
         files <- filepaths(xcmsSet)
     }
-    phenoDataClass<-as.vector(levels(xcmsSet@phenoData[,1])) #sometime phenoData have more than 1 column use first as class
+    phenoDataClass <- as.vector(levels(xcmsSet@phenoData[,"class"])) #sometime phenoData have more than 1 column use first as class
-    classnames<-vector("list",length(phenoDataClass))
+    classnames <- vector("list",length(phenoDataClass))
     for (i in 1:length(phenoDataClass)){
-        classnames[[i]]<-which( xcmsSet@phenoData[,1]==phenoDataClass[i])
+        classnames[[i]] <- which( xcmsSet@phenoData[,"class"]==phenoDataClass[i])
     }
     N <- dim(phenoData(xcmsSet))[1]
@@ -101,12 +152,12 @@
     pdf(pdfname,w=16,h=10)
     cols <- rainbow(N)
-    lty = 1:N
-    pch = 1:N
+    lty <- 1:N
+    pch <- 1:N
     #search for max x and max y in BPCs
-    xlim = range(sapply(TIC, function(x) range(x[,1])))
-    ylim = range(sapply(TIC, function(x) range(x[,2])))
-    ylim = c(-ylim[2], ylim[2])
+    xlim <- range(sapply(TIC, function(x) range(x[,1])))
+    ylim <- range(sapply(TIC, function(x) range(x[,2])))
+    ylim <- c(-ylim[2], ylim[2])
     ##plot start
@@ -115,63 +166,63 @@
         for (k in 1:(length(phenoDataClass)-1)){
             for (l in (k+1):length(phenoDataClass)){
                 #print(paste(phenoDataClass[k],"vs",phenoDataClass[l],sep=" "))
-                plot(0, 0, type="n", xlim = xlim/60, ylim = ylim, main = paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k]," vs ",phenoDataClass[l], sep=""), xlab = "Retention Time (min)", ylab = "BPC")
-                colvect<-NULL
+                plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k]," vs ",phenoDataClass[l], sep=""), xlab="Retention Time (min)", ylab="BPC")
+                colvect <- NULL
                 for (j in 1:length(classnames[[k]])) {
                     tic <- TIC[[classnames[[k]][j]]]
-                    # points(tic[,1]/60, tic[,2], col = cols[i], pch = pch[i], type="l")
-                    points(tic[,1]/60, tic[,2], col = cols[classnames[[k]][j]], pch = pch[classnames[[k]][j]], type="l")
-                    colvect<-append(colvect,cols[classnames[[k]][j]])
+                    # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l")
+                    points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l")
+                    colvect <- append(colvect,cols[classnames[[k]][j]])
                 }
                 for (j in 1:length(classnames[[l]])) {
-                    # i=class2names[j]
+                    # i <- class2names[j]
                     tic <- TIC[[classnames[[l]][j]]]
-                    points(tic[,1]/60, -tic[,2], col = cols[classnames[[l]][j]], pch = pch[classnames[[l]][j]], type="l")
-                    colvect<-append(colvect,cols[classnames[[l]][j]])
+                    points(tic[,1]/60, -tic[,2], col=cols[classnames[[l]][j]], pch=pch[classnames[[l]][j]], type="l")
+                    colvect <- append(colvect,cols[classnames[[l]][j]])
                 }
-                legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col = colvect, lty = lty, pch = pch)
+                legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col=colvect, lty=lty, pch=pch)
             }
         }
     }#end if length >2
     if (length(phenoDataClass)==2){
-        k=1
-        l=2
-        colvect<-NULL
-        plot(0, 0, type="n", xlim = xlim/60, ylim = ylim, main = paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k],"vs",phenoDataClass[l], sep=""), xlab = "Retention Time (min)", ylab = "BPC")
+        k <- 1
+        l <- 2
+        colvect <- NULL
+        plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k],"vs",phenoDataClass[l], sep=""), xlab="Retention Time (min)", ylab="BPC")
         for (j in 1:length(classnames[[k]])) {
             tic <- TIC[[classnames[[k]][j]]]
-            # points(tic[,1]/60, tic[,2], col = cols[i], pch = pch[i], type="l")
-            points(tic[,1]/60, tic[,2], col = cols[classnames[[k]][j]], pch = pch[classnames[[k]][j]], type="l")
+            # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l")
+            points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l")
             colvect<-append(colvect,cols[classnames[[k]][j]])
         }
         for (j in 1:length(classnames[[l]])) {
-            # i=class2names[j]
+            # i <- class2names[j]
             tic <- TIC[[classnames[[l]][j]]]
-            points(tic[,1]/60, -tic[,2], col = cols[classnames[[l]][j]], pch = pch[classnames[[l]][j]], type="l")
-            colvect<-append(colvect,cols[classnames[[l]][j]])
+            points(tic[,1]/60, -tic[,2], col=cols[classnames[[l]][j]], pch=pch[classnames[[l]][j]], type="l")
+            colvect <- append(colvect,cols[classnames[[l]][j]])
         }
-        legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col = colvect, lty = lty, pch = pch)
+        legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col=colvect, lty=lty, pch=pch)
     }#end length ==2
     #case where only one class
     if (length(phenoDataClass)==1){
-        k=1
-        ylim = range(sapply(TIC, function(x) range(x[,2])))
-        colvect<-NULL
-        plot(0, 0, type="n", xlim = xlim/60, ylim = ylim, main = paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k], sep=""), xlab = "Retention Time (min)", ylab = "BPC")
+        k <- 1
+        ylim <- range(sapply(TIC, function(x) range(x[,2])))
+        colvect <- NULL
+        plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Base Peak Chromatograms \n","BPCs_",phenoDataClass[k], sep=""), xlab="Retention Time (min)", ylab="BPC")
         for (j in 1:length(classnames[[k]])) {
             tic <- TIC[[classnames[[k]][j]]]
-            # points(tic[,1]/60, tic[,2], col = cols[i], pch = pch[i], type="l")
-            points(tic[,1]/60, tic[,2], col = cols[classnames[[k]][j]], pch = pch[classnames[[k]][j]], type="l")
-            colvect<-append(colvect,cols[classnames[[k]][j]])
+            # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l")
+            points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l")
+            colvect <- append(colvect,cols[classnames[[k]][j]])
         }
-        legend("topright",paste(basename(files[c(classnames[[k]])])), col = colvect, lty = lty, pch = pch)
+        legend("topright",paste(basename(files[c(classnames[[k]])])), col=colvect, lty=lty, pch=pch)
     }#end length ==1
@@ -183,34 +234,32 @@
 #@author Y. Guitton
-getTIC <- function(file,rtcor=NULL) {
+getTIC <- function(file, rtcor=NULL) {
     object <- xcmsRaw(file)
-    cbind(if (is.null(rtcor)) object@scantime else rtcor, rawEIC(object,mzrange=range(object@env$mz))$intensity)
+    cbind(if (is.null(rtcor)) object@scantime else rtcor, rawEIC(object, mzrange=range(object@env$mz))$intensity)
 }
-##
-## overlay TIC from all files in current folder or from xcmsSet, create pdf
-##
+#overlay TIC from all files in current folder or from xcmsSet, create pdf
 #@author Y. Guitton
-getTICs <- function(xcmsSet=NULL,files=NULL, pdfname="TICs.pdf",rt=c("raw","corrected")) {
+getTICs <- function(xcmsSet=NULL,files=NULL, pdfname="TICs.pdf", rt=c("raw","corrected")) {
     cat("Creating TIC pdf...\n")
     if (is.null(xcmsSet)) {
         filepattern <- c("[Cc][Dd][Ff]", "[Nn][Cc]", "([Mm][Zz])?[Xx][Mm][Ll]", "[Mm][Zz][Dd][Aa][Tt][Aa]", "[Mm][Zz][Mm][Ll]")
-        filepattern <- paste(paste("\\.", filepattern, "$", sep = ""), collapse = "|")
+        filepattern <- paste(paste("\\.", filepattern, "$", sep=""), collapse="|")
         if (is.null(files))
             files <- getwd()
         info <- file.info(files)
-        listed <- list.files(files[info$isdir], pattern = filepattern, recursive = TRUE, full.names = TRUE)
+        listed <- list.files(files[info$isdir], pattern=filepattern, recursive=TRUE, full.names=TRUE)
         files <- c(files[!info$isdir], listed)
     } else {
         files <- filepaths(xcmsSet)
     }
-    phenoDataClass<-as.vector(levels(xcmsSet@phenoData[,1])) #sometime phenoData have more than 1 column use first as class
-    classnames<-vector("list",length(phenoDataClass))
+    phenoDataClass <- as.vector(levels(xcmsSet@phenoData[,"class"])) #sometime phenoData have more than 1 column use first as class
+    classnames <- vector("list",length(phenoDataClass))
     for (i in 1:length(phenoDataClass)){
-        classnames[[i]]<-which( xcmsSet@phenoData[,1]==phenoDataClass[i])
+        classnames[[i]] <- which( xcmsSet@phenoData[,"class"]==phenoDataClass[i])
     }
     N <- length(files)
@@ -220,17 +269,17 @@
         if (!is.null(xcmsSet) && rt == "corrected")
             rtcor <- xcmsSet@rt$corrected[[i]] else
             rtcor <- NULL
-        TIC[[i]] <- getTIC(files[i],rtcor=rtcor)
+        TIC[[i]] <- getTIC(files[i], rtcor=rtcor)
     }
-    pdf(pdfname,w=16,h=10)
+    pdf(pdfname, w=16, h=10)
     cols <- rainbow(N)
-    lty = 1:N
-    pch = 1:N
+    lty <- 1:N
+    pch <- 1:N
     #search for max x and max y in TICs
-    xlim = range(sapply(TIC, function(x) range(x[,1])))
-    ylim = range(sapply(TIC, function(x) range(x[,2])))
-    ylim = c(-ylim[2], ylim[2])
+    xlim <- range(sapply(TIC, function(x) range(x[,1])))
+    ylim <- range(sapply(TIC, function(x) range(x[,2])))
+    ylim <- c(-ylim[2], ylim[2])
     ##plot start
@@ -238,61 +287,61 @@
         for (k in 1:(length(phenoDataClass)-1)){
             for (l in (k+1):length(phenoDataClass)){
                 #print(paste(phenoDataClass[k],"vs",phenoDataClass[l],sep=" "))
-                plot(0, 0, type="n", xlim = xlim/60, ylim = ylim, main = paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k]," vs ",phenoDataClass[l], sep=""), xlab = "Retention Time (min)", ylab = "TIC")
-                colvect<-NULL
+                plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k]," vs ",phenoDataClass[l], sep=""), xlab="Retention Time (min)", ylab="TIC")
+                colvect <- NULL
                 for (j in 1:length(classnames[[k]])) {
                     tic <- TIC[[classnames[[k]][j]]]
-                    # points(tic[,1]/60, tic[,2], col = cols[i], pch = pch[i], type="l")
-                    points(tic[,1]/60, tic[,2], col = cols[classnames[[k]][j]], pch = pch[classnames[[k]][j]], type="l")
-                    colvect<-append(colvect,cols[classnames[[k]][j]])
+                    # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l")
+                    points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l")
+                    colvect <- append(colvect,cols[classnames[[k]][j]])
                 }
                 for (j in 1:length(classnames[[l]])) {
                     # i=class2names[j]
                     tic <- TIC[[classnames[[l]][j]]]
-                    points(tic[,1]/60, -tic[,2], col = cols[classnames[[l]][j]], pch = pch[classnames[[l]][j]], type="l")
-                    colvect<-append(colvect,cols[classnames[[l]][j]])
+                    points(tic[,1]/60, -tic[,2], col=cols[classnames[[l]][j]], pch=pch[classnames[[l]][j]], type="l")
+                    colvect <- append(colvect,cols[classnames[[l]][j]])
                 }
-                legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col = colvect, lty = lty, pch = pch)
+                legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col=colvect, lty=lty, pch=pch)
             }
         }
     }#end if length >2
     if (length(phenoDataClass)==2){
-        k=1
-        l=2
+        k <- 1
+        l <- 2
-        plot(0, 0, type="n", xlim = xlim/60, ylim = ylim, main = paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k],"vs",phenoDataClass[l], sep=""), xlab = "Retention Time (min)", ylab = "TIC")
-        colvect<-NULL
+        plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k],"vs",phenoDataClass[l], sep=""), xlab="Retention Time (min)", ylab="TIC")
+        colvect <- NULL
         for (j in 1:length(classnames[[k]])) {
             tic <- TIC[[classnames[[k]][j]]]
-            # points(tic[,1]/60, tic[,2], col = cols[i], pch = pch[i], type="l")
-            points(tic[,1]/60, tic[,2], col = cols[classnames[[k]][j]], pch = pch[classnames[[k]][j]], type="l")
-            colvect<-append(colvect,cols[classnames[[k]][j]])
+            # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l")
+            points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l")
+            colvect <- append(colvect,cols[classnames[[k]][j]])
         }
         for (j in 1:length(classnames[[l]])) {
-            # i=class2names[j]
+            # i <- class2names[j]
             tic <- TIC[[classnames[[l]][j]]]
-            points(tic[,1]/60, -tic[,2], col = cols[classnames[[l]][j]], pch = pch[classnames[[l]][j]], type="l")
-            colvect<-append(colvect,cols[classnames[[l]][j]])
+            points(tic[,1]/60, -tic[,2], col=cols[classnames[[l]][j]], pch=pch[classnames[[l]][j]], type="l")
+            colvect <- append(colvect,cols[classnames[[l]][j]])
         }
-        legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col = colvect, lty = lty, pch = pch)
+        legend("topright",paste(basename(files[c(classnames[[k]],classnames[[l]])])), col=colvect, lty=lty, pch=pch)
     }#end length ==2
     #case where only one class
     if (length(phenoDataClass)==1){
-        k=1
-        ylim = range(sapply(TIC, function(x) range(x[,2])))
+        k <- 1
+        ylim <- range(sapply(TIC, function(x) range(x[,2])))
-        plot(0, 0, type="n", xlim = xlim/60, ylim = ylim, main = paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k], sep=""), xlab = "Retention Time (min)", ylab = "TIC")
-        colvect<-NULL
+        plot(0, 0, type="n", xlim=xlim/60, ylim=ylim, main=paste("Total Ion Chromatograms \n","TICs_",phenoDataClass[k], sep=""), xlab="Retention Time (min)", ylab="TIC")
+        colvect <- NULL
         for (j in 1:length(classnames[[k]])) {
             tic <- TIC[[classnames[[k]][j]]]
-            # points(tic[,1]/60, tic[,2], col = cols[i], pch = pch[i], type="l")
-            points(tic[,1]/60, tic[,2], col = cols[classnames[[k]][j]], pch = pch[classnames[[k]][j]], type="l")
-            colvect<-append(colvect,cols[classnames[[k]][j]])
+            # points(tic[,1]/60, tic[,2], col=cols[i], pch=pch[i], type="l")
+            points(tic[,1]/60, tic[,2], col=cols[classnames[[k]][j]], pch=pch[classnames[[k]][j]], type="l")
+            colvect <- append(colvect,cols[classnames[[k]][j]])
         }
-        legend("topright",paste(basename(files[c(classnames[[k]])])), col = colvect, lty = lty, pch = pch)
+        legend("topright",paste(basename(files[c(classnames[[k]])])), col=colvect, lty=lty, pch=pch)
     }#end length ==1
@@ -303,17 +352,19 @@
-##
-## Get the polarities from all the samples of a condition
+# Get the polarities from all the samples of a condition
 #@author Misharl Monsoor misharl.monsoor@sb-roscoff.fr ABiMS TEAM
 #@author Gildas Le Corguille lecorguille@sb-roscoff.fr ABiMS TEAM
-getSampleMetadata <- function(xcmsSet=NULL, sampleMetadataOutput="sampleMetadata.tsv") {
+getSampleMetadata <- function(xdata=NULL, sampleMetadataOutput="sampleMetadata.tsv") {
     cat("Creating the sampleMetadata file...\n")
     #Create the sampleMetada dataframe
-    sampleMetadata=xset@phenoData
-    sampleNamesOrigin=rownames(sampleMetadata)
-    sampleNamesMakeNames=make.names(sampleNamesOrigin)
+    sampleMetadata <- xdata@phenoData@data
+    rownames(sampleMetadata) <- NULL
+    colnames(sampleMetadata) <- c("sampleMetadata", "class")
+
+    sampleNamesOrigin <- sampleMetadata$sampleMetadata
+    sampleNamesMakeNames <- make.names(sampleNamesOrigin)
     if (any(duplicated(sampleNamesMakeNames))) {
         write("\n\nERROR: Usually, R has trouble to deal with special characters in its column names, so it rename them using make.names().\nIn your case, at least two columns after the renaming obtain the same name, thus XCMS will collapse those columns per name.", stderr())
@@ -330,63 +381,49 @@
         }
     }
-    sampleMetadata$sampleMetadata=sampleNamesMakeNames
-    sampleMetadata=cbind(sampleMetadata["sampleMetadata"],sampleMetadata["class"]) #Reorder columns
-    rownames(sampleMetadata)=NULL
+    sampleMetadata$sampleMetadata <- sampleNamesMakeNames
+
-    #Create a list of files name in the current directory
-    list_files=xset@filepaths
     #For each sample file, the following actions are done
-    for (file in list_files){
+    for (fileIdx in 1:length(fileNames(xdata))) {
         #Check if the file is in the CDF format
-        if (!mzR:::netCDFIsFile(file)){
+        if (!mzR:::netCDFIsFile(fileNames(xdata))) {
             # If the column isn't exist, with add one filled with NA
-            if (is.null(sampleMetadata$polarity)) sampleMetadata$polarity=NA
+            if (is.null(sampleMetadata$polarity)) sampleMetadata$polarity <- NA
-            #Create a simple xcmsRaw object for each sample
-            xcmsRaw=xcmsRaw(file)
             #Extract the polarity (a list of polarities)
-            polarity=xcmsRaw@polarity
+            polarity <- fData(xdata)[fData(xdata)$fileIdx == fileIdx,"polarity"]
             #Verify if all the scans have the same polarity
-            uniq_list=unique(polarity)
+            uniq_list <- unique(polarity)
             if (length(uniq_list)>1){
-                polarity="mixed"
+                polarity <- "mixed"
             } else {
-                polarity=as.character(uniq_list)
+                polarity <- as.character(uniq_list)
             }
-            #Transforms the character to obtain only the sample name
-            filename=basename(file)
-            library(tools)
-            samplename=file_path_sans_ext(filename)
             #Set the polarity attribute
-            sampleMetadata$polarity[sampleMetadata$sampleMetadata==samplename]=polarity
-
-            #Delete xcmsRaw object because it creates a bug for the fillpeaks step
-            rm(xcmsRaw)
+            sampleMetadata$polarity[fileIdx] <- polarity
         }
     }
     write.table(sampleMetadata, sep="\t", quote=FALSE, row.names=FALSE, file=sampleMetadataOutput)
-    return(list("sampleNamesOrigin"=sampleNamesOrigin,"sampleNamesMakeNames"=sampleNamesMakeNames))
+    return(list("sampleNamesOrigin"=sampleNamesOrigin, "sampleNamesMakeNames"=sampleNamesMakeNames))
 }
-##
-## This function check if xcms will found all the files
-##
+# This function check if xcms will found all the files
 #@author Gildas Le Corguille lecorguille@sb-roscoff.fr ABiMS TEAM
 checkFilesCompatibilityWithXcms <- function(directory) {
     cat("Checking files filenames compatibilities with xmcs...\n")
     # WHAT XCMS WILL FIND
     filepattern <- c("[Cc][Dd][Ff]", "[Nn][Cc]", "([Mm][Zz])?[Xx][Mm][Ll]","[Mm][Zz][Dd][Aa][Tt][Aa]", "[Mm][Zz][Mm][Ll]")
-    filepattern <- paste(paste("\\.", filepattern, "$", sep = ""),collapse = "|")
+    filepattern <- paste(paste("\\.", filepattern, "$", sep=""),collapse="|")
     info <- file.info(directory)
-    listed <- list.files(directory[info$isdir], pattern = filepattern,recursive = TRUE, full.names = TRUE)
+    listed <- list.files(directory[info$isdir], pattern=filepattern, recursive=TRUE, full.names=TRUE)
     files <- c(directory[!info$isdir], listed)
     files_abs <- file.path(getwd(), files)
     exists <- file.exists(files_abs)
@@ -394,8 +431,8 @@
     files[exists] <- sub("//","/",files[exists])
     # WHAT IS ON THE FILESYSTEM
-    filesystem_filepaths=system(paste("find $PWD/",directory," -not -name '\\.*' -not -path '*conda-env*' -type f -name \"*\"", sep=""), intern=T)
-    filesystem_filepaths=filesystem_filepaths[grep(filepattern, filesystem_filepaths, perl=T)]
+    filesystem_filepaths <- system(paste("find $PWD/",directory," -not -name '\\.*' -not -path '*conda-env*' -type f -name \"*\"", sep=""), intern=T)
+    filesystem_filepaths <- filesystem_filepaths[grep(filepattern, filesystem_filepaths, perl=T)]
     # COMPARISON
     if (!is.na(table(filesystem_filepaths %in% files)["FALSE"])) {
@@ -406,16 +443,26 @@
 }
+#This function list the compatible files within the directory as xcms did
+#@author Gildas Le Corguille lecorguille@sb-roscoff.fr ABiMS TEAM
+getMSFiles <- function (directory) {
+    filepattern <- c("[Cc][Dd][Ff]", "[Nn][Cc]", "([Mm][Zz])?[Xx][Mm][Ll]","[Mm][Zz][Dd][Aa][Tt][Aa]", "[Mm][Zz][Mm][Ll]")
+    filepattern <- paste(paste("\\.", filepattern, "$", sep=""),collapse="|")
+    info <- file.info(directory)
+    listed <- list.files(directory[info$isdir], pattern=filepattern,recursive=TRUE, full.names=TRUE)
+    files <- c(directory[!info$isdir], listed)
+    exists <- file.exists(files)
+    files <- files[exists]
+    return(files)
+}
-##
-## This function check if XML contains special caracters. It also checks integrity and completness.
-##
+# This function check if XML contains special caracters. It also checks integrity and completness.
 #@author Misharl Monsoor misharl.monsoor@sb-roscoff.fr ABiMS TEAM
 checkXmlStructure <- function (directory) {
     cat("Checking XML structure...\n")
-    cmd=paste("IFS=$'\n'; for xml in $(find",directory,"-not -name '\\.*' -not -path '*conda-env*' -type f -iname '*.*ml*'); do if [ $(xmllint --nonet --noout \"$xml\" 2> /dev/null; echo $?) -gt 0 ]; then echo $xml;fi; done;")
-    capture=system(cmd,intern=TRUE)
+    cmd <- paste("IFS=$'\n'; for xml in $(find",directory,"-not -name '\\.*' -not -path '*conda-env*' -type f -iname '*.*ml*'); do if [ $(xmllint --nonet --noout \"$xml\" 2> /dev/null; echo $?) -gt 0 ]; then echo $xml;fi; done;")
+    capture <- system(cmd, intern=TRUE)
     if (length(capture)>0){
         #message=paste("The following mzXML or mzML file is incorrect, please check these files first:",capture)
@@ -427,24 +474,22 @@
 }
-##
-## This function check if XML contain special characters
-##
+# This function check if XML contain special characters
 #@author Misharl Monsoor misharl.monsoor@sb-roscoff.fr ABiMS TEAM
 deleteXmlBadCharacters<- function (directory) {
     cat("Checking Non ASCII characters in the XML...\n")
-    processed=F
-    l=system( paste("find",directory, "-not -name '\\.*' -not -path '*conda-env*' -type f -iname '*.*ml*'"),intern=TRUE)
+    processed <- F
+    l <- system( paste("find",directory, "-not -name '\\.*' -not -path '*conda-env*' -type f -iname '*.*ml*'"), intern=TRUE)
     for (i in l){
-        cmd=paste("LC_ALL=C grep '[^ -~]' \"",i,"\"",sep="")
-        capture=suppressWarnings(system(cmd,intern=TRUE))
+        cmd <- paste("LC_ALL=C grep '[^ -~]' \"", i, "\"", sep="")
+        capture <- suppressWarnings(system(cmd, intern=TRUE))
         if (length(capture)>0){
-            cmd=paste("perl -i -pe 's/[^[:ascii:]]//g;'",i)
+            cmd <- paste("perl -i -pe 's/[^[:ascii:]]//g;'",i)
             print( paste("WARNING: Non ASCII characters have been removed from the ",i,"file") )
-            c=system(cmd,intern=TRUE)
-            capture=""
-            processed=T
+            c <- system(cmd, intern=TRUE)
+            capture <- ""
+            processed <- T
         }
     }
     if (processed) cat("\n\n")
@@ -452,17 +497,15 @@
 }
-##
-## This function will compute MD5 checksum to check the data integrity
-##
+# This function will compute MD5 checksum to check the data integrity
 #@author Gildas Le Corguille lecorguille@sb-roscoff.fr
 getMd5sum <- function (directory) {
     cat("Compute md5 checksum...\n")
     # WHAT XCMS WILL FIND
     filepattern <- c("[Cc][Dd][Ff]", "[Nn][Cc]", "([Mm][Zz])?[Xx][Mm][Ll]","[Mm][Zz][Dd][Aa][Tt][Aa]", "[Mm][Zz][Mm][Ll]")
-    filepattern <- paste(paste("\\.", filepattern, "$", sep = ""),collapse = "|")
+    filepattern <- paste(paste("\\.", filepattern, "$", sep=""),collapse="|")
     info <- file.info(directory)
-    listed <- list.files(directory[info$isdir], pattern = filepattern,recursive = TRUE, full.names = TRUE)
+    listed <- list.files(directory[info$isdir], pattern=filepattern, recursive=TRUE, full.names=TRUE)
     files <- c(directory[!info$isdir], listed)
     exists <- file.exists(files)
     files <- files[exists]
@@ -476,80 +519,246 @@
 # This function get the raw file path from the arguments
-getRawfilePathFromArguments <- function(singlefile, zipfile, listArguments) {
-    if (!is.null(listArguments[["zipfile"]])) zipfile = listArguments[["zipfile"]]
-    if (!is.null(listArguments[["zipfilePositive"]])) zipfile = listArguments[["zipfilePositive"]]
-    if (!is.null(listArguments[["zipfileNegative"]])) zipfile = listArguments[["zipfileNegative"]]
+#@author Gildas Le Corguille lecorguille@sb-roscoff.fr
+getRawfilePathFromArguments <- function(singlefile, zipfile, args) {
+    if (!is.null(args$zipfile)) zipfile <- args$zipfile
+    if (!is.null(args$zipfilePositive)) zipfile <- args$zipfilePositive
+    if (!is.null(args$zipfileNegative)) zipfile <- args$zipfileNegative
-    if (!is.null(listArguments[["singlefile_galaxyPath"]])) {
-        singlefile_galaxyPaths = listArguments[["singlefile_galaxyPath"]];
-        singlefile_sampleNames = listArguments[["singlefile_sampleName"]]
+    if (!is.null(args$singlefile_galaxyPath)) {
+        singlefile_galaxyPaths <- args$singlefile_galaxyPath;
+        singlefile_sampleNames <- args$singlefile_sampleName
     }
-    if (!is.null(listArguments[["singlefile_galaxyPathPositive"]])) {
-        singlefile_galaxyPaths = listArguments[["singlefile_galaxyPathPositive"]];
-        singlefile_sampleNames = listArguments[["singlefile_sampleNamePositive"]]
+    if (!is.null(args$singlefile_galaxyPathPositive)) {
+        singlefile_galaxyPaths <- args$singlefile_galaxyPathPositive;
+        singlefile_sampleNames <- args$singlefile_sampleNamePositive
     }
-    if (!is.null(listArguments[["singlefile_galaxyPathNegative"]])) {
-        singlefile_galaxyPaths = listArguments[["singlefile_galaxyPathNegative"]];
-        singlefile_sampleNames = listArguments[["singlefile_sampleNameNegative"]]
+    if (!is.null(args$singlefile_galaxyPathNegative)) {
+        singlefile_galaxyPaths <- args$singlefile_galaxyPathNegative;
+        singlefile_sampleNames <- args$singlefile_sampleNameNegative
     }
     if (exists("singlefile_galaxyPaths")){
-        singlefile_galaxyPaths = unlist(strsplit(singlefile_galaxyPaths,","))
-        singlefile_sampleNames = unlist(strsplit(singlefile_sampleNames,","))
+        singlefile_galaxyPaths <- unlist(strsplit(singlefile_galaxyPaths,","))
+        singlefile_sampleNames <- unlist(strsplit(singlefile_sampleNames,","))
-        singlefile=NULL
+        singlefile <- NULL
         for (singlefile_galaxyPath_i in seq(1:length(singlefile_galaxyPaths))) {
-            singlefile_galaxyPath=singlefile_galaxyPaths[singlefile_galaxyPath_i]
-            singlefile_sampleName=singlefile_sampleNames[singlefile_galaxyPath_i]
-            singlefile[[singlefile_sampleName]] = singlefile_galaxyPath
+            singlefile_galaxyPath <- singlefile_galaxyPaths[singlefile_galaxyPath_i]
+            singlefile_sampleName <- singlefile_sampleNames[singlefile_galaxyPath_i]
+            singlefile[[singlefile_sampleName]] <- singlefile_galaxyPath
         }
     }
     for (argument in c("zipfile","zipfilePositive","zipfileNegative","singlefile_galaxyPath","singlefile_sampleName","singlefile_galaxyPathPositive","singlefile_sampleNamePositive","singlefile_galaxyPathNegative","singlefile_sampleNameNegative")) {
-        listArguments[[argument]]=NULL
+        args[[argument]] <- NULL
     }
-    return(list(zipfile=zipfile, singlefile=singlefile, listArguments=listArguments))
+    return(list(zipfile=zipfile, singlefile=singlefile, args=args))
 }
 # This function retrieve the raw file in the working directory
 # - if zipfile: unzip the file with its directory tree
 # - if singlefiles: set symlink with the good filename
+#@author Gildas Le Corguille lecorguille@sb-roscoff.fr
 retrieveRawfileInTheWorkingDirectory <- function(singlefile, zipfile) {
     if(!is.null(singlefile) && (length("singlefile")>0)) {
         for (singlefile_sampleName in names(singlefile)) {
-            singlefile_galaxyPath = singlefile[[singlefile_sampleName]]
+            singlefile_galaxyPath <- singlefile[[singlefile_sampleName]]
             if(!file.exists(singlefile_galaxyPath)){
-                error_message=paste("Cannot access the sample:",singlefile_sampleName,"located:",singlefile_galaxyPath,". Please, contact your administrator ... if you have one!")
+                error_message <- paste("Cannot access the sample:",singlefile_sampleName,"located:",singlefile_galaxyPath,". Please, contact your administrator ... if you have one!")
                 print(error_message); stop(error_message)
             }
-            file.symlink(singlefile_galaxyPath,singlefile_sampleName)
+            if (!suppressWarnings( try (file.link(singlefile_galaxyPath, singlefile_sampleName), silent=T)))
+                file.copy(singlefile_galaxyPath, singlefile_sampleName)
+
         }
-        directory = "."
+        directory <- "."
     }
-    if(!is.null(zipfile) && (zipfile!="")) {
+    if(!is.null(zipfile) && (zipfile != "")) {
         if(!file.exists(zipfile)){
-            error_message=paste("Cannot access the Zip file:",zipfile,". Please, contact your administrator ... if you have one!")
+            error_message <- paste("Cannot access the Zip file:",zipfile,". Please, contact your administrator ... if you have one!")
             print(error_message)
             stop(error_message)
         }
         #list all file in the zip file
-        #zip_files=unzip(zipfile,list=T)[,"Name"]
+        #zip_files <- unzip(zipfile,list=T)[,"Name"]
         #unzip
         suppressWarnings(unzip(zipfile, unzip="unzip"))
         #get the directory name
-        filesInZip=unzip(zipfile, list=T);
-        directories=unique(unlist(lapply(strsplit(filesInZip$Name,"/"), function(x) x[1])));
-        directories=directories[!(directories %in% c("__MACOSX")) & file.info(directories)$isdir]
-        directory = "."
-        if (length(directories) == 1) directory = directories
+        suppressWarnings(filesInZip <- unzip(zipfile, list=T))
+        directories <- unique(unlist(lapply(strsplit(filesInZip$Name,"/"), function(x) x[1])))
+        directories <- directories[!(directories %in% c("__MACOSX")) & file.info(directories)$isdir]
+        directory <- "."
+        if (length(directories) == 1) directory <- directories
         cat("files_root_directory\t",directory,"\n")
     }
     return (directory)
 }
+
+
+# This function retrieve a xset like object
+#@author Gildas Le Corguille lecorguille@sb-roscoff.fr
+getxcmsSetObject <- function(xobject) {
+    # XCMS 1.x
+    if (class(xobject) == "xcmsSet")
+        return (xobject)
+    # XCMS 3.x
+    if (class(xobject) == "XCMSnExp") {
+        # Get the legacy xcmsSet object
+        suppressWarnings(xset <- as(xobject, 'xcmsSet'))
+        sampclass(xset) <- xset@phenoData$sample_group
+        return (xset)
+    }
+}
+
+
+#@TODO: remove this function as soon as we can use xcms 3.x.x from Bioconductor 3.7
+# https://github.com/sneumann/xcms/issues/250
+groupnamesW4M <- function(xdata, mzdec = 0, rtdec = 0) {
+    mzfmt <- paste("%.", mzdec, "f", sep = "")
+    rtfmt <- paste("%.", rtdec, "f", sep = "")
+
+    gnames <- paste("M", sprintf(mzfmt, featureDefinitions(xdata)[,"mzmed"]), "T",
+                    sprintf(rtfmt, featureDefinitions(xdata)[,"rtmed"]), sep = "")
+
+    if (any(dup <- duplicated(gnames)))
+        for (dupname in unique(gnames[dup])) {
+            dupidx <- which(gnames == dupname)
+            gnames[dupidx] <- paste(gnames[dupidx], seq(along = dupidx), sep = "_")
+        }
+
+    return (gnames)
+}
+
+#@TODO: remove this function as soon as we can use xcms 3.x.x from Bioconductor 3.7
+# https://github.com/sneumann/xcms/issues/247
+.concatenate_XCMSnExp <- function(...) {
+    x <- list(...)
+    if (length(x) == 0)
+        return(NULL)
+    if (length(x) == 1)
+        return(x[[1]])
+    ## Check that all are XCMSnExp objects.
+    if (!all(unlist(lapply(x, function(z) is(z, "XCMSnExp")))))
+        stop("All passed objects should be 'XCMSnExp' objects")
+    new_x <- as(.concatenate_OnDiskMSnExp(...), "XCMSnExp")
+    ## If any of the XCMSnExp has alignment results or detected features drop
+    ## them!
+ x <- lapply(x, function(z) { + if (hasAdjustedRtime(z)) { + z <- dropAdjustedRtime(z) + warning("Adjusted retention times found, had to drop them.") + } + if (hasFeatures(z)) { + z <- dropFeatureDefinitions(z) + warning("Feature definitions found, had to drop them.") + } + z + }) + ## Combine peaks + fls <- lapply(x, fileNames) + startidx <- cumsum(lengths(fls)) + pks <- lapply(x, chromPeaks) + procH <- lapply(x, processHistory) + for (i in 2:length(fls)) { + pks[[i]][, "sample"] <- pks[[i]][, "sample"] + startidx[i - 1] + procH[[i]] <- lapply(procH[[i]], function(z) { + z@fileIndex <- as.integer(z@fileIndex + startidx[i - 1]) + z + }) + } + pks <- do.call(rbind, pks) + new_x@.processHistory <- unlist(procH) + chromPeaks(new_x) <- pks + if (validObject(new_x)) + new_x +} + +#@TODO: remove this function as soon as we can use xcms 3.x.x from Bioconductor 3.7 +# https://github.com/sneumann/xcms/issues/247 +.concatenate_OnDiskMSnExp <- function(...) { + x <- list(...) + if (length(x) == 0) + return(NULL) + if (length(x) == 1) + return(x[[1]]) + ## Check that all are OnDiskMSnExp objects. + if (!all(unlist(lapply(x, function(z) is(z, "OnDiskMSnExp"))))) + stop("All passed objects should be 'OnDiskMSnExp' objects") + ## Check processingQueue + procQ <- lapply(x, function(z) z@spectraProcessingQueue) + new_procQ <- procQ[[1]] + is_ok <- unlist(lapply(procQ, function(z) + !is.character(all.equal(new_procQ, z)) + )) + if (any(!is_ok)) { + warning("Processing queues from the submitted objects differ! ", + "Dropping the processing queue.") + new_procQ <- list() + } + ## processingData + fls <- lapply(x, function(z) z@processingData@files) + startidx <- cumsum(lengths(fls)) + ## featureData + featd <- lapply(x, fData) + ## Have to update the file index and the spectrum names. 
+ for (i in 2:length(featd)) { + featd[[i]]$fileIdx <- featd[[i]]$fileIdx + startidx[i - 1] + rownames(featd[[i]]) <- MSnbase:::formatFileSpectrumNames( + fileIds = featd[[i]]$fileIdx, + spectrumIds = featd[[i]]$spIdx, + nSpectra = nrow(featd[[i]]), + nFiles = length(unlist(fls)) + ) + } + featd <- do.call(rbind, featd) + featd$spectrum <- 1:nrow(featd) + ## experimentData + expdata <- lapply(x, function(z) { + ed <- z@experimentData + data.frame(instrumentManufacturer = ed@instrumentManufacturer, + instrumentModel = ed@instrumentModel, + ionSource = ed@ionSource, + analyser = ed@analyser, + detectorType = ed@detectorType, + stringsAsFactors = FALSE) + }) + expdata <- do.call(rbind, expdata) + expdata <- new("MIAPE", + instrumentManufacturer = expdata$instrumentManufacturer, + instrumentModel = expdata$instrumentModel, + ionSource = expdata$ionSource, + analyser = expdata$analyser, + detectorType = expdata$detectorType) + + ## protocolData + protodata <- lapply(x, function(z) z@protocolData) + if (any(unlist(lapply(protodata, nrow)) > 0)) + warning("Found non-empty protocol data, but merging protocol data is", + " currently not supported. Skipped.") + ## phenoData + pdata <- do.call(rbind, lapply(x, pData)) + res <- new( + "OnDiskMSnExp", + phenoData = new("NAnnotatedDataFrame", data = pdata), + featureData = new("AnnotatedDataFrame", featd), + processingData = new("MSnProcess", + processing = paste0("Concatenated [", date(), "]"), + files = unlist(fls), smoothed = NA), + experimentData = expdata, + spectraProcessingQueue = new_procQ) + if (validObject(res)) + res +} + +#@TODO: remove this function as soon as we can use xcms 3.x.x from Bioconductor 3.7 +# https://github.com/sneumann/xcms/issues/247 +c.XCMSnExp <- function(...) { + .concatenate_XCMSnExp(...) 
+} diff -r c013ed353a2f -r 4d6f4cd7c3ef macros.xml --- a/macros.xml Tue Feb 13 04:44:03 2018 -0500 +++ b/macros.xml Thu Mar 01 04:16:45 2018 -0500 @@ -1,15 +1,12 @@ + 3.0.0 - r-snow - bioconductor-xcms + bioconductor-xcms r-batch - - - - - bioconductor-xcms + r-rcolorbrewer + @@ -18,15 +15,12 @@ - - LC_ALL=C Rscript $__tool_directory__/xcms.r - + LC_ALL=C Rscript $__tool_directory__/ ; return=\$?; - mv log.txt '$log'; - cat '$log'; + cat 'log.txt'; sh -c "exit \$return" @@ -70,6 +64,15 @@ + +
+ + + + +
+
+
@@ -81,8 +84,6 @@ #if $peaklist.peaklistBool - variableMetadataOutput '$variableMetadata' - dataMatrixOutput '$dataMatrix' convertRTMinute $peaklist.convertRTMinute numDigitsMZ $peaklist.numDigitsMZ numDigitsRT $peaklist.numDigitsRT @@ -108,10 +109,10 @@ - + (peaklist['peaklistBool']) - + (peaklist['peaklistBool']) @@ -131,6 +132,39 @@ + + +For details and explanations of all the parameters and the workflow of the xcms_ package, see its manual_ and this example_ + +.. _xcms: https://bioconductor.org/packages/release/bioc/html/xcms.html +.. _manual: http://www.bioconductor.org/packages/release/bioc/manuals/xcms/man/xcms.pdf +.. _example: https://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html + + + + + +Get a Peak List +--------------- + +If 'true', the module generates two additional files corresponding to the peak list: +- the variable metadata file (corresponding to information about extracted ions such as mass or retention time) +- the data matrix (corresponding to related intensities) + +**decimal places for [mass or retention time] values in identifiers** + + | Ions' identifiers are constructed as MxxxTyyy where 'xxx' is the ion median mass and 'yyy' the ion median retention time. + | Two parameters are used to adjust the number of decimal places wanted in identifiers for mass and retention time respectively. + | These parameters do not affect decimal places in columns other than the identifier one. 
+ +**Reported intensity values** + + | This parameter determines which values should be reported as intensities in the dataMatrix table; it corresponds to the xcms 'intval' parameter: + | - into: integrated area of original (raw) peak + | - maxo: maximum intensity of original (raw) peak + | - intb: baseline corrected integrated peak area (only available if peak detection was done by ‘findPeaks.centWave’) + + diff -r c013ed353a2f -r 4d6f4cd7c3ef test-data/faahKO-single-class.xset.group.RData Binary file test-data/faahKO-single-class.xset.group.RData has changed diff -r c013ed353a2f -r 4d6f4cd7c3ef test-data/faahKO.xset.group.RData Binary file test-data/faahKO.xset.group.RData has changed diff -r c013ed353a2f -r 4d6f4cd7c3ef xcms.r --- a/xcms.r Tue Feb 13 04:44:03 2018 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,229 +0,0 @@ -#!/usr/bin/env Rscript -# xcms.r version="2.2.0" -#Authors ABIMS TEAM -#BPC Addition from Y.guitton - - -# ----- LOG FILE ----- -log_file=file("log.txt", open = "wt") -sink(log_file) -sink(log_file, type = "output") - - -# ----- PACKAGE ----- -cat("\tPACKAGE INFO\n") -#pkgs=c("xcms","batch") -pkgs=c("parallel","BiocGenerics", "Biobase", "Rcpp", "mzR", "xcms","snow","batch") -for(pkg in pkgs) { - suppressPackageStartupMessages( stopifnot( library(pkg, quietly=TRUE, logical.return=TRUE, character.only=TRUE))) - cat(pkg,"\t",as.character(packageVersion(pkg)),"\n",sep="") -} -source_local <- function(fname){ argv <- commandArgs(trailingOnly = FALSE); base_dir <- dirname(substring(argv[grep("--file=", argv)], 8)); source(paste(base_dir, fname, sep="/")) } -cat("\n\n"); - - - - - -# ----- ARGUMENTS ----- -cat("\tARGUMENTS INFO\n") -listArguments = parseCommandArgs(evaluate=FALSE) #interpretation of arguments given in command line as an R list of objects -write.table(as.matrix(listArguments), col.names=F, quote=F, sep='\t') - -cat("\n\n"); - - -# ----- ARGUMENTS PROCESSING ----- -cat("\tINFILE PROCESSING INFO\n") - -#image is an .RData file 
necessary to use xset variable given by previous tools -if (!is.null(listArguments[["image"]])){ - load(listArguments[["image"]]); listArguments[["image"]]=NULL -} - -#Import the different functions -source_local("lib.r") - -cat("\n\n") - -#Import the different functions - -# ----- PROCESSING INFILE ----- -cat("\tARGUMENTS PROCESSING INFO\n") - -# Save arguments to generate a report -if (!exists("listOFlistArguments")) listOFlistArguments=list() -listOFlistArguments[[paste(format(Sys.time(), "%y%m%d-%H:%M:%S_"),listArguments[["xfunction"]],sep="")]] = listArguments - - -#saving the commun parameters -thefunction = listArguments[["xfunction"]]; listArguments[["xfunction"]]=NULL #delete from the list of arguments - -xsetRdataOutput = paste(thefunction,"RData",sep=".") -if (!is.null(listArguments[["xsetRdataOutput"]])){ - xsetRdataOutput = listArguments[["xsetRdataOutput"]]; listArguments[["xsetRdataOutput"]]=NULL -} - -#saving the specific parameters -rplotspdf = "Rplots.pdf" -if (!is.null(listArguments[["rplotspdf"]])){ - rplotspdf = listArguments[["rplotspdf"]]; listArguments[["rplotspdf"]]=NULL -} -sampleMetadataOutput = "sampleMetadata.tsv" -if (!is.null(listArguments[["sampleMetadataOutput"]])){ - sampleMetadataOutput = listArguments[["sampleMetadataOutput"]]; listArguments[["sampleMetadataOutput"]]=NULL -} -variableMetadataOutput = "variableMetadata.tsv" -if (!is.null(listArguments[["variableMetadataOutput"]])){ - variableMetadataOutput = listArguments[["variableMetadataOutput"]]; listArguments[["variableMetadataOutput"]]=NULL -} -dataMatrixOutput = "dataMatrix.tsv" -if (!is.null(listArguments[["dataMatrixOutput"]])){ - dataMatrixOutput = listArguments[["dataMatrixOutput"]]; listArguments[["dataMatrixOutput"]]=NULL -} -if (!is.null(listArguments[["convertRTMinute"]])){ - convertRTMinute = listArguments[["convertRTMinute"]]; listArguments[["convertRTMinute"]]=NULL -} -if (!is.null(listArguments[["numDigitsMZ"]])){ - numDigitsMZ = listArguments[["numDigitsMZ"]]; 
listArguments[["numDigitsMZ"]]=NULL -} -if (!is.null(listArguments[["numDigitsRT"]])){ - numDigitsRT = listArguments[["numDigitsRT"]]; listArguments[["numDigitsRT"]]=NULL -} -if (!is.null(listArguments[["intval"]])){ - intval = listArguments[["intval"]]; listArguments[["intval"]]=NULL -} - -if (thefunction %in% c("xcmsSet","retcor")) { - ticspdf = listArguments[["ticspdf"]]; listArguments[["ticspdf"]]=NULL - bicspdf = listArguments[["bicspdf"]]; listArguments[["bicspdf"]]=NULL -} - - -if (thefunction %in% c("xcmsSet","retcor","fillPeaks")) { - if (!exists("singlefile")) singlefile=NULL - if (!exists("zipfile")) zipfile=NULL - rawFilePath = getRawfilePathFromArguments(singlefile, zipfile, listArguments) - zipfile = rawFilePath$zipfile - singlefile = rawFilePath$singlefile - listArguments = rawFilePath$listArguments - directory = retrieveRawfileInTheWorkingDirectory(singlefile, zipfile) - md5sumList=list("origin"=getMd5sum(directory)) -} - -#addition of the directory to the list of arguments in the first position -if (thefunction == "xcmsSet") { - checkXmlStructure(directory) - checkFilesCompatibilityWithXcms(directory) - listArguments=append(directory, listArguments) -} - - -#addition of xset object to the list of arguments in the first position -if (exists("xset")){ - listArguments=append(list(xset), listArguments) -} - -cat("\n\n") - - - - -# ----- MAIN PROCESSING INFO ----- -cat("\tMAIN PROCESSING INFO\n") - - -#Verification of a group step before doing the fillpeaks job. - -if (thefunction == "fillPeaks") { - res=try(is.null(groupnames(xset))) - if (class(res) == "try-error"){ - error<-geterrmessage() - write(error, stderr()) - stop("You must always do a group step after a retcor. 
Otherwise it won't work for the fillpeaks step") - } - -} - -#change the default display settings -#dev.new(file="Rplots.pdf", width=16, height=12) -pdf(file=rplotspdf, width=16, height=12) -if (thefunction == "group") { - par(mfrow=c(2,2)) -} -#else if (thefunction == "retcor") { -#try to change the legend display -# par(xpd=NA) -# par(xpd=T, mar=par()$mar+c(0,0,0,4)) -#} - - -#execution of the function "thefunction" with the parameters given in "listArguments" - -cat("\t\tCOMPUTE\n") -xset = do.call(thefunction, listArguments) - -# check if there are no peaks -if (nrow(peaks(xset)) == 0) { - stop("No peaks were detected. You should review your settings") -} - - -cat("\n\n") - -dev.off() #dev.new(file="Rplots.pdf", width=16, height=12) - -if (thefunction == "xcmsSet") { - - #transform the files absolute pathways into relative pathways - xset@filepaths<-sub(paste(getwd(),"/",sep="") ,"", xset@filepaths) - if(exists("zipfile") && !is.null(zipfile) && (zipfile!="")) { - - #Modify the samples names (erase the path) - for(i in 1:length(sampnames(xset))){ - - sample_name=unlist(strsplit(sampnames(xset)[i], "/")) - sample_name=sample_name[length(sample_name)] - sample_name= unlist(strsplit(sample_name,"[.]"))[1] - sampnames(xset)[i]=sample_name - - } - - } - -} - -# -- TIC -- -if (thefunction == "xcmsSet") { - cat("\t\tGET TIC GRAPH\n") - sampleNamesList = getSampleMetadata(xcmsSet=xset, sampleMetadataOutput=sampleMetadataOutput) - getTICs(xcmsSet=xset, pdfname=ticspdf,rt="raw") - getBPCs(xcmsSet=xset,rt="raw",pdfname=bicspdf) -} else if (thefunction == "retcor") { - cat("\t\tGET TIC GRAPH\n") - getTICs(xcmsSet=xset, pdfname=ticspdf,rt="corrected") - getBPCs(xcmsSet=xset,rt="corrected",pdfname=bicspdf) -} - -if ((thefunction == "group" || thefunction == "fillPeaks") && exists("intval")) { - getPeaklistW4M(xset,intval,convertRTMinute,numDigitsMZ,numDigitsRT,variableMetadataOutput,dataMatrixOutput) -} - - -cat("\n\n") - -# ----- EXPORT ----- - -cat("\tXSET OBJECT INFO\n") 
-print(xset) -#delete the parameters to avoid the passage to the next tool in .RData image - - -#saving R data in .Rdata file to save the variables used in the present tool -objects2save = c("xset","zipfile","singlefile","listOFlistArguments","md5sumList","sampleNamesList") -save(list=objects2save[objects2save %in% ls()], file=xsetRdataOutput) - -cat("\n\n") - - -cat("\tDONE\n") diff -r c013ed353a2f -r 4d6f4cd7c3ef xcms_retcor.r --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/xcms_retcor.r Thu Mar 01 04:16:45 2018 -0500 @@ -0,0 +1,106 @@ +#!/usr/bin/env Rscript + +# ----- LOG FILE ----- +log_file=file("log.txt", open = "wt") +sink(log_file) +sink(log_file, type = "output") + + +# ----- PACKAGE ----- +cat("\tSESSION INFO\n") + +#Import the different functions +source_local <- function(fname){ argv <- commandArgs(trailingOnly=FALSE); base_dir <- dirname(substring(argv[grep("--file=", argv)], 8)); source(paste(base_dir, fname, sep="/")) } +source_local("lib.r") + +pkgs <- c("xcms","batch","RColorBrewer") +loadAndDisplayPackages(pkgs) +cat("\n\n"); + + +# ----- ARGUMENTS ----- +cat("\tARGUMENTS INFO\n") +args = parseCommandArgs(evaluate=FALSE) #interpretation of arguments given in command line as an R list of objects +write.table(as.matrix(args), col.names=F, quote=F, sep='\t') + +cat("\n\n") + +# ----- PROCESSING INFILE ----- +cat("\tARGUMENTS PROCESSING INFO\n") + +#saving the specific parameters +method <- args$method; args$method <- NULL + +cat("\n\n") + + +# ----- ARGUMENTS PROCESSING ----- +cat("\tINFILE PROCESSING INFO\n") + +#image is an .RData file necessary to use xset variable given by previous tools +load(args$image); args$image=NULL +if (!exists("xdata")) stop("\n\nERROR: The RData doesn't contain any object called 'xdata'. 
This RData should have been created by an old version of XCMS 2.*") + +# Handle infiles +if (!exists("singlefile")) singlefile <- NULL +if (!exists("zipfile")) zipfile <- NULL +rawFilePath <- getRawfilePathFromArguments(singlefile, zipfile, args) +zipfile <- rawFilePath$zipfile +singlefile <- rawFilePath$singlefile +args <- rawFilePath$args +directory <- retrieveRawfileInTheWorkingDirectory(singlefile, zipfile) + +# Check some character issues +md5sumList <- list("origin" = getMd5sum(directory)) +checkXmlStructure(directory) +checkFilesCompatibilityWithXcms(directory) + + +cat("\n\n") + + +# ----- MAIN PROCESSING INFO ----- +cat("\tMAIN PROCESSING INFO\n") + + +cat("\t\tCOMPUTE\n") + +cat("\t\t\tAlignment/Retention Time correction\n") +adjustRtimeParam <- do.call(paste0(method,"Param"), args) +print(adjustRtimeParam) +xdata <- adjustRtime(xdata, param=adjustRtimeParam) + +# Get the legacy xcmsSet object +xset <- getxcmsSetObject(xdata) + +cat("\n\n") + + +# -- TIC -- +cat("\t\tDRAW GRAPHICS\n") +getPlotAdjustedRtime(xdata) + +#@TODO: one day, use xdata instead of xset to draw the TICs and BPC or a complete other method +getTICs(xcmsSet=xset, rt="raw", pdfname="TICs.pdf") +getBPCs(xcmsSet=xset, rt="raw", pdfname="BICs.pdf") + +cat("\n\n") + +# ----- EXPORT ----- + +cat("\tXCMSnExp OBJECT INFO\n") +print(xdata) +cat("\n\n") + +cat("\txcmsSet OBJECT INFO\n") +print(xset) +cat("\n\n") + +#saving R data in .Rdata file to save the variables used in the present tool +objects2save = c("xdata","zipfile","singlefile","md5sumList","sampleNamesList") +save(list=objects2save[objects2save %in% ls()], file="retcor.RData") + +cat("\n\n") + + +cat("\tDONE\n")