How to name the PDF output file with CMD in R

I ran into a problem when reading the documentation for pdf(). The description of the file argument says:
a character string giving the name of the file. If it is of the form "|cmd", the output is piped to the command given by cmd.
I don't quite understand what this means.
Does it mean that I can write a statement such as pdf(file = "|cmd", ...) in an R script and then run a command such as Rscript input.R --args 'output.pdf' from the command line?
Here is the example given in the documentation:
## Test function for encodings
TestChars <- function(encoding = "ISOLatin1", ...)
{
    pdf(encoding = encoding, ...)
    par(pty = "s")
    plot(c(-1,16), c(-1,16), type = "n", xlab = "", ylab = "",
         xaxs = "i", yaxs = "i")
    title(paste("Centred chars in encoding", encoding))
    grid(17, 17, lty = 1)
    for(i in c(32:255)) {
        x <- i %% 16
        y <- i %/% 16
        points(x, y, pch = i)
    }
    dev.off()
}
## there will be many warnings.
TestChars("ISOLatin2")
## this does not view properly in older viewers.
TestChars("ISOLatin2", family = "URWHelvetica")
## works well for viewing in gs-based viewers, and often in xpdf.
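One possible reading of the question, as a minimal sketch rather than an authoritative answer: the output file name can be taken from the command line with commandArgs(trailingOnly = TRUE) and passed to pdf(file = ...). The "|cmd" form is a different feature, which sends the PDF to a shell command instead of a named file. The helper name plot_to_pdf and the default file name are hypothetical, and the piped variant in the last comment is untested here.

```r
# Hypothetical helper: write a simple plot to the PDF named by the first
# command-line argument, falling back to a default name.
plot_to_pdf <- function(args = commandArgs(trailingOnly = TRUE),
                        default = "output.pdf") {
  out_file <- if (length(args) >= 1) args[1] else default
  pdf(file = out_file)
  on.exit(dev.off(), add = TRUE)  # close the device even if plotting fails
  plot(1:10, main = "Example plot")
  invisible(out_file)
}

# In a script run as `Rscript input.R output.pdf`, calling plot_to_pdf()
# with no arguments would pick up "output.pdf" from the command line.
# Here the name is passed explicitly so the sketch runs anywhere:
out <- plot_to_pdf(args = file.path(tempdir(), "example.pdf"))

# The "|cmd" form pipes the PDF to a shell command (assumed Unix-like shell):
# pdf(file = "|cat > piped.pdf"); plot(1:10); dev.off()
```

The function takes the argument vector as a parameter so that it can be exercised without actually launching Rscript.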

Related

Suppress interactive plotting in R

I would like to send a plot to a file using pdf(), but plot.MCMCglmm() is attempting to act interactively, which interferes with dev.off().
pdf(file="model.pdf")
plot(model, random=FALSE)
Hit <Return> to see next plot: dev.off()
And the file is not closed. Adding another dev.off() closes the file. Is there a way to suppress the interactive plotting?
EDIT example:
require(MCMCglmm)
mod_dat <- data.frame( Name = rep(letters[1:3], each=10),
Group = rep(letters[1:3], 10),
Age = rep(letters[1:5], each=6),
Happy = rep(letters[1:2], 15),
x = rnorm(30),
y = rnorm(30) )
mod_out <- MCMCglmm( y~x, random=~Name+Group+Age+Happy,
data=mod_dat, verbose=FALSE )
pdf( file="model out.pdf" )
plot(mod_out)
dev.off()
dev.off()
You could modify the plot function for plot.MCMCglmm to turn off the new page prompt. You can get the code for the function by typing plot.MCMCglmm in the console.
myPlotGLMM = function (x, random = FALSE, ...)
{
    nF <- x$Fixed$nfl
    #devAskNewPage.orig <- devAskNewPage()
    if (random) {
        nF <- sum(rep(x$Random$nrl, x$Random$nfl)) + nF
        if (nF != dim(x$Sol)[2]) {
            stop("random effects not saved and cannot be plotted")
        }
    }
    plot(x$Sol[, 1:nF, drop = FALSE], ...)
    #devAskNewPage(TRUE)
    if (is.null(x$Lambda) == FALSE) {
        plot(x$Lambda, ...)
        #devAskNewPage(TRUE)
    }
    plot(x$VCV, ...)
    #devAskNewPage(devAskNewPage.orig)
}
myPlotGLMM(model)
Another option is to plot everything onto one page. Without a reproducible example I can't test this, but it should work:
pdf(file="model.pdf")
par(mfrow=c(2,2))
plot(model, random=FALSE)
dev.off()
So if this generates four plots, they will be arranged on one page in a 2 x 2 grid.
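As a base-graphics illustration of the one-page idea (a sketch only; whether plot.MCMCglmm keeps an outer par(mfrow) layout is not verified here), the following draws four plots into a single PDF page. The function name four_on_one_page and the random data are hypothetical.

```r
# Hypothetical helper: draw four base-graphics plots on one PDF page.
four_on_one_page <- function(file) {
  pdf(file = file)
  on.exit(dev.off(), add = TRUE)  # always close the device
  par(mfrow = c(2, 2))            # 2 x 2 grid on the current page
  for (k in 1:4) {
    plot(rnorm(30), main = paste("Plot", k))
  }
  invisible(file)
}

out <- four_on_one_page(file.path(tempdir(), "grid.pdf"))
```

Because the device is closed in on.exit, no second dev.off() is needed even if one of the plot calls fails.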

Reset graph at the end of the loop: could not find function "device" error

I am trying to generate plots in a loop. Here is my code:
n <- unique(wide_data$Product.Code)[1:3]
for (i in n)
{
my.prod2 <- filter(tall_bind, Product.Code == i, Date > ymd("2012/04/01"))
dev.new()
mypath <- file.path("C:","R","SAVEHERE",paste("myplot_", i, ".jpg", sep = ""))
jpeg(file=mypath)
mytitle = paste("Plot for product", i)
p <- qplot(Date, Sold, data = my.prod2, geom = "line", main=mytitle, group = Model, colour = Model) + facet_grid(Model ~ .)
ggsave("myplot_", i, plot=p, device= "jpg" )
}
I get the following error for the above code:
Saving 6.67 x 6.67 in image
Error in ggsave("myplot_", i, plot = p, device = "jpg") : could not find function "device"
Earlier, when I used dev.off() at the end of the loop, the graphs were generated but were totally blank.
Could someone please help me understand where the mistake in my code is?
You can leave out the dev.new() and jpeg() commands. Your arguments to ggsave() are also incorrect. This should work:
n <- unique(wide_data$Product.Code)[1:3]
for (i in n) {
    my.prod2 <- filter(tall_bind, Product.Code == i, Date > ymd("2012/04/01"))
    mypath <- file.path("C:","R","SAVEHERE",paste("myplot_", i, ".jpg", sep = ""))
    mytitle = paste("Plot for product", i)
    p <- qplot(Date, Sold, data = my.prod2, geom = "line", main=mytitle, group = Model, colour = Model) + facet_grid(Model ~ .)
    ggsave(filename = mypath, plot = p)
}
What you did was create a new default graphics device (typically a plotting window) and then a jpeg graphics device, i.e. a file. Then you tried to make ggplot2 plot directly to a file using ggsave, which opens its own (jpg) device and uses neither of the two graphics devices you created.
The error, however, occurred because you gave ggsave the wrong arguments. But even with the right arguments, you would still have ended up with additional unused graphics windows and files from the dev.new() and jpeg() commands. I suggest some extra reading of the help (e.g. type ?ggsave at the R console).
Typically, when using ggplot2, you do not need to worry about dev.new, jpeg and the like. qplot or ggplot and ggsave should do all you need.
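The same pattern can be sketched in a self-contained form, assuming ggplot2 is installed. The data frame sales is made-up data standing in for tall_bind, and the function name save_product_plots and its ext parameter are hypothetical. The extension defaults to "jpg" as in the question; the call below uses "pdf" only because a PDF device is available in every R build.

```r
library(ggplot2)

# Made-up data standing in for tall_bind from the question.
sales <- data.frame(
  Product.Code = rep(c("A", "B"), each = 6),
  Model = rep(rep(c("m1", "m2"), each = 3), times = 2),
  Date = rep(as.Date("2012-05-01") + 0:2, times = 4),
  Sold = c(3, 5, 4, 2, 6, 7, 1, 4, 2, 8, 6, 9)
)

# Hypothetical helper: one file per product code, written with ggsave only.
save_product_plots <- function(data, out_dir, ext = "jpg") {
  paths <- character(0)
  for (code in unique(data$Product.Code)) {
    subset_data <- data[data$Product.Code == code, ]
    p <- ggplot(subset_data,
                aes(x = Date, y = Sold, group = Model, colour = Model)) +
      geom_line() +
      facet_grid(Model ~ .) +
      ggtitle(paste("Plot for product", code))
    path <- file.path(out_dir, paste0("myplot_", code, ".", ext))
    # ggsave opens and closes its own device, so no jpeg()/dev.off() is needed.
    ggsave(filename = path, plot = p, width = 6, height = 6)
    paths <- c(paths, path)
  }
  paths
}

saved <- save_product_plots(sales, tempdir(), ext = "pdf")
```

Passing the plot object explicitly via plot = p avoids relying on the last plot displayed, which matters inside loops where nothing is printed.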

How to run PCA, distance matrix and other math procedures on genome VCF files in R?

I am learning to process VCF (variant call format) files to produce plots and reports. Here is my R code, which crashes for reasons unknown to me. Please advise how to fix it and recommend appropriate tutorials.
library(VariantAnnotation)
library(SNPRelate)
vcf<-readVcf("test.vcf","hg19") # load your VCF file from a set dir
snpgdsVCF2GDS("test.vcf", "my.gds")
snpgdsSummary("my.gds")
genofile <- openfn.gds("my.gds")
#dendogram
dissMatrix <- snpgdsDiss(genofile , sample.id=NULL, snp.id=NULL,
autosome.only=TRUE,remove.monosnp=TRUE, maf=NaN, missing.rate=NaN,
num.thread=10, verbose=TRUE)
snpHCluster <- snpgdsHCluster(dist, sample.id=NULL, need.mat=TRUE,
hang=0.25)
cutTree <- snpgdsCutTree(snpHCluster, z.threshold=15, outlier.n=5,
n.perm = 5000, samp.group=NULL,col.outlier="red", col.list=NULL,
pch.outlier=4, pch.list=NULL,label.H=FALSE, label.Z=TRUE,
verbose=TRUE)
#pca
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))
pop_code <- read.gdsn(index.gdsn(genofile, "sample.id")
pca <- snpgdsPCA(genofile)
tab <- data.framesample.id = pca$sample.id,pop =
factor(pop_code)[match(pca$sample.id, sample.id)],EV1 =
pca$eigenvect[,1],EV2 = pca$eigenvect[,2],stringsAsFactors = FALSE)
plot(tab$EV2, tab$EV1, col=as.integer(tab$pop),xlab="eigenvector 2",
ylab="eigenvector 1") legend("topleft", legend=levels(tab$pop),
pch="o", col=1:nlevels(tab$pop))
Your code has several issues:
- The snpgdsHCluster step should be run on dissMatrix, not dist:
snpHCluster <- snpgdsHCluster(dissMatrix, sample.id=NULL, need.mat=TRUE,
hang=0.25)
- The pop_code line is missing a closing parenthesis:
pop_code <- read.gdsn(index.gdsn(genofile, "sample.id"))
- You need an opening parenthesis after data.frame in the tab line:
tab <- data.frame(sample.id = pca$sample.id,pop =
factor(pop_code)[match(pca$sample.id, sample.id)],EV1 =
pca$eigenvect[,1],EV2 = pca$eigenvect[,2],stringsAsFactors = FALSE)
- legend is a separate command from plot, so it must start on its own line:
plot(tab$EV2, tab$EV1, col=as.integer(tab$pop),xlab="eigenvector 2",
ylab="eigenvector 1")
legend("topleft", legend=levels(tab$pop),
pch="o", col=1:nlevels(tab$pop))
Otherwise, I think it should work for you.
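To see the overall shape of the workflow without a VCF file, here is a rough sketch that uses base R's dist, hclust and prcomp as stand-ins for the SNPRelate functions. The genotype matrix is simulated and purely illustrative, and the 0/1/2 coding is an assumption about how genotypes would be represented.

```r
set.seed(1)

# Simulated genotype matrix: 12 samples x 50 SNPs, coded 0/1/2
# (number of alternate alleles). Illustrative data only.
geno <- matrix(rbinom(12 * 50, size = 2, prob = 0.3), nrow = 12,
               dimnames = list(paste0("sample", 1:12), paste0("snp", 1:50)))

# Drop monomorphic SNPs, which carry no information for PCA.
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]

diss <- dist(geno)                        # Euclidean distance between samples
tree <- hclust(diss, method = "average")  # hierarchical clustering
pca  <- prcomp(geno, center = TRUE, scale. = FALSE)

# Same table layout as in the question, built from the base R PCA result.
tab <- data.frame(sample.id = rownames(geno),
                  EV1 = pca$x[, 1],
                  EV2 = pca$x[, 2],
                  stringsAsFactors = FALSE)
```

The results will not match SNPRelate numerically, since snpgdsDiss and snpgdsPCA use genetics-specific estimators, but the sequence of steps (distance matrix, clustering, PCA, plotting table) is the same.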

How to modify R program to support RHadoop

I am new to RHadoop and R. I have a regular R program that uses library(methylKit). Can someone give me some insight into how to run this R program on Hadoop? What do I need to modify in the original R program? It would really help if someone could give me some ideas.
The Code:
library(methylKit)
file.list=list( "new_sample1.txt","new_sample2.txt","n_sample3.txt")
myobj=read(file.list,sample.id=list("test1","test2","ctrl1"),assembly="hg19",treatment=c(1,1,0),context="CpG", pipeline=list(fraction=TRUE,chr.col=1,start.col=2,end.col=2,
coverage.col=6,strand.col=3,freqC.col=5 ))
getMethylationStats(myobj[[1]],plot=F,both.strands=F)
pdf("sample1_statistics.pdf")
getMethylationStats(myobj[[1]],plot=T,both.strands=F)
dev.off()
getMethylationStats(myobj[[2]],plot=F,both.strands=F)
pdf("sample2_statistics.pdf")
getMethylationStats(myobj[[2]],plot=T,both.strands=F)
dev.off()
getCoverageStats(myobj[[3]],plot=F,both.strands=F)
pdf("sample3_statistics.pdf")
getMethylationStats(myobj[[3]],plot=T,both.strands=F)
dev.off()
library("graphics")
pdf("sample1_coverage.pdf")
getCoverageStats(myobj[[1]], plot = T, both.strands = F)
dev.off()
pdf("sample2_coverage.pdf")
getCoverageStats(myobj[[2]], plot = T, both.strands = F)
dev.off()
pdf("sample3_coverage.pdf")
getCoverageStats(myobj[[3]], plot = T, both.strands = F)
dev.off()
meth=unite(myobj, destrand=FALSE)
pdf("correlation.pdf")
getCorrelation(meth,plot=T)
dev.off()
pdf("cluster.pdf")
clusterSamples(meth, dist="correlation",method="ward", plot=TRUE)
dev.off()
hc <- clusterSamples(meth, dist = "correlation", method = "ward",plot = FALSE)
pdf("pca.pdf")
PCASamples(meth, screeplot = TRUE)
PCASamples(meth)
dev.off()
myDiff=calculateDiffMeth(meth)
write.table(myDiff, "mydiff.txt", sep='\t')
myDiff25p.hyper <-get.methylDiff(myDiff,differenc=25,qvalue=0.01,type="hyper")
myDiff25p.hyper
write.table(myDiff25p.hyper,"hyper_methylated.txt",sep='\t')
myDiff25p.hypo <-get.methylDiff(myDiff,differenc=25,qvalue=0.01,type="hypo")
myDiff25p.hypo
write.table(myDiff25p.hypo,"hypo_methylated.txt",sep='\t')
myDiff25p <-get.methylDiff(myDiff,differenc=25,qvalue=0.01)
myDiff25p
write.table(myDiff25p,"differentialy_methylated.txt",sep='\t')
diffMethPerChr(myDiff,plot=FALSE,qvalue.cutoff=0.01,meth.cutoff=25)
pdf("diffMethPerChr.pdf")
diffMethPerChr(myDiff,plot=TRUE,qvalue.cutoff=0.01,meth.cutoff=25)
dev.off()
gene.obj <- read.transcript.features(system.file("extdata","refseq.hg18.bed.txt", package = "methylKit"))
write.table(gene.obj,"gene_obj.txt", sep='\t')
annotate.WithGenicParts(myDiff25p, gene.obj)
cpg.obj <- read.feature.flank(system.file("extdata","cpgi.hg18.bed.txt", package = "methylKit"),feature.flank.name = c("CpGi","shores"))
write.table(cpg.obj,"cpg_obj.txt", sep='\t')
diffCpGann <- annotate.WithFeature.Flank(myDiff25p,cpg.obj$CpGi, cpg.obj$shores, feature.name = "CpGi",flank.name = "shores")
write.table(diffCpGann,"diffCpCann.txt", sep='\t')
diffCpGann
promoters <- regionCounts(myobj, gene.obj$promoters)
head(promoters[[1]])
write.table(promoters,"promoters.txt", sep='\t')
diffAnn <- annotate.WithGenicParts(myDiff25p, gene.obj)
head(getAssociationWithTSS(diffAnn))
diffAnn
write.table(getAssociationWithTSS(diffAnn),"diff_ann.txt", sep='\t')
getTargetAnnotationStats(diffAnn, percentage = TRUE,precedence = TRUE)
pdf("piechart1.pdf")
plotTargetAnnotation(diffAnn, precedence = TRUE, main ="differential methylation annotation")
dev.off()
pdf("piechart2.pdf")
plotTargetAnnotation(diffCpGann, col = c("green","gray", "white"), main = "differential methylation annotation")
dev.off()
getFeatsWithTargetsStats(diffAnn, percentage = TRUE)
Are the *.txt files located in HDFS? If not, copy them there first (for example with hadoop fs -put). You can then use Hadoop streaming to read the data from Hadoop.
line1 <- file('stdin')
open(line1)
while(length(line <- readLines(line1,n=1)) > 0) {
    # process 'line' here
}
close(line1)
'stdin' is the input passed to the R program by the Hadoop streaming jar. 'line' receives a new line of data on every iteration of the loop.
Write the logic for what to do with each line inside the while loop.
To run the program, use:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar -input hdfs_input_file1, file2,n-files -output hdfs_output_dir -file mapper_file -file reducer_file -mapper mapper.R -reducer reducer.R
-input accepts n input files. The Hadoop streaming jar reads them one by one and feeds them to stdin.
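To make the loop above concrete, here is a minimal sketch of what a mapper might look like. The function name run_mapper and the choice to emit the first tab-separated field as the key with a count of 1 are assumptions for illustration, not anything methylKit or Hadoop prescribes.

```r
# Hypothetical mapper: emit "<first field>\t1" for every tab-separated line.
run_mapper <- function(con_in, con_out = stdout()) {
  while (length(line <- readLines(con_in, n = 1, warn = FALSE)) > 0) {
    fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
    if (length(fields) == 0) next  # skip empty lines
    writeLines(paste(fields[1], 1, sep = "\t"), con_out)
  }
}

# In a real mapper.R the input would be stdin:
#   con <- file("stdin"); open(con); run_mapper(con); close(con)

# Local check with an in-memory connection instead of Hadoop:
input <- textConnection(c("chr1\t100\t+", "chr2\t200\t-", "chr1\t300\t+"))
mapped <- capture.output(run_mapper(input))
close(input)
```

Taking the connection as a parameter lets the same function be tested locally and then run unchanged under the streaming jar.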

Unused arguments in R error

I am new to R, and I am trying to run an example given in the "rebmix-help pdf". It uses the galaxy dataset, and here is the code:
library(rebmix)
devAskNewPage(ask = TRUE)
data("galaxy")
write.table(galaxy, file = "galaxy.txt", sep = "\t",eol = "\n", row.names = FALSE, col.names = FALSE)
REBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
for (j in 1:3) {
for (k in 1:3) {
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, InformationCriterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
if (is.null(Table))
Table <- REBMIX[[i, j, k]]$summary
else Table <- merge(Table, REBMIX[[i, j,k]]$summary, all = TRUE, sort = FALSE)
}
}
}
It gives me this error:
unused argument (InformationCriterion = InformationCriterion[j])
Please help.
I'm running R 3.0.2 (Windows), and in my version of the rebmix library the REBMIX function does not list InformationCriterion as a named argument; it uses Criterion instead.
In brief, invoke REBMIX as:
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, Criterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
It looks as though there have been substantial changes to the rebmix package since the example mentioned in the OP was created. Among the most noticeable changes is the use of S4 classes.
There's also an updated demo in the rebmix package using the galaxy data (see demo("rebmix.galaxy")).
To get the above example to produce results (note: I am not familiar with this package or the rebmix algorithm!):
Change the argument to Criterion as mentioned by @Giupo
Use the S4 slot access operator @ instead of $
Don't name the results object REBMIX because that's already the function name
library(rebmix)
data("galaxy")
## Don't re-name the REBMIX object!
myREBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
    for (j in 1:3) {
        for (k in 1:3) {
            myREBMIX[[i, j, k]] <- REBMIX(Dataset = list(galaxy),
                Preprocessing = Preprocessing[k], D = 0.0025,
                cmax = 12, Criterion = InformationCriterion[j],
                pdf = pdf[i], K = K[[k]])
            if (is.null(Table)) {
                Table <- myREBMIX[[i, j, k]]@summary
            } else {
                Table <- merge(Table, myREBMIX[[i, j,k]]@summary, all = TRUE, sort = FALSE)
            }
        }
    }
}
I guess this is late, but I ran into a similar problem just a few minutes ago, and I realized what is probably behind this kind of error message: a version conflict.
You may be using a different version of the R package from the one in the tutorial, so the argument names can differ between what you are running and what the tutorial's code expects.
So please check the version first, before you try to edit the code manually. It can also happen that an old version of the package is still on the library path and overrides the new one. This was exactly what happened to me, since I had manually installed the old and new versions separately.
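A quick way to run these checks from the console, sketched with base R only: inspect a function's formal arguments to see whether a name is accepted, and look up which version of a package is installed and where it was found. The helper name has_argument is hypothetical; write.table and the stats package are used only because they ship with every R installation, and you would substitute REBMIX and "rebmix".

```r
# Hypothetical helper: does a function accept a given argument name?
has_argument <- function(fun, arg) {
  arg %in% names(formals(fun))
}

has_argument(write.table, "sep")                   # TRUE
has_argument(write.table, "InformationCriterion")  # FALSE

# Installed version of a package and the directory it is loaded from.
pkg_version <- packageVersion("stats")
pkg_path <- find.package("stats")

# Library search path: an old copy earlier in this list shadows a newer one.
lib_paths <- .libPaths()
```

If the argument named in a tutorial is missing from names(formals(...)), comparing packageVersion with the version the tutorial was written for is usually the next step.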
