Obtain names of variable arguments passed through dot dot dot (...) in an R function (deparse)

I am creating an automated plotter, based on some dummy variables. I set it up such that:
plotter <- function(...) { }
will plot all the dummies I feed it.
However, I would like it to be able to add labels to the plot, namely the variable names.
I do know that
deparse(substitute(variablename))
will yield
"variablename"
which is a start, but how do I do this in the case of multiple arguments? Is it at all possible? Is there a workaround?

names(list(...)) will get you a character vector containing the names of the supplied arguments that have been absorbed by ...:
plotter <- function(...) {names(list(...))}
plotter(x=1:4, y=11:14)
# [1] "x" "y"
Alternatively, if you want to pass in unnamed arguments, try this (which extends @baptiste's now-deleted answer):
plotter <- function(..., pch=16, col="red") {
nms <- setdiff(as.character(match.call(expand.dots=TRUE)),
as.character(match.call(expand.dots=FALSE)))
nms
}
x <- 1:4
y <- 11:14
plotter(x, y, col="green")
# [1] "x" "y"
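A sketch of another approach, assuming you want one label for every argument in `...`: capture the unevaluated arguments with `substitute()` and deparse each expression. Named arguments keep their name, and unnamed ones fall back to the expression text, so `log(y)` is labelled `"log(y)"`. The helper name `dots_labels` is hypothetical.

```r
# Hypothetical helper: one label per argument in ..., without evaluating them.
dots_labels <- function(...) {
  # substitute() swaps ... for the expressions the caller typed: list(x, y, ...)
  exprs <- as.list(substitute(list(...)))[-1L]
  # Deparse each expression; paste() joins the pieces if deparse() splits a long one
  labs <- vapply(exprs, function(e) paste(deparse(e), collapse = " "), character(1))
  # Prefer an explicit argument name where one was given
  nms <- names(exprs)
  if (!is.null(nms)) labs[nzchar(nms)] <- nms[nzchar(nms)]
  unname(labs)
}

x <- 1:4
y <- 11:14
dots_labels(x, y)          # "x" "y"
dots_labels(x, log(y))     # "x" "log(y)"
dots_labels(first = x, y)  # "first" "y"
```

Unlike the `setdiff()` trick, this does not depend on comparing deparsed strings, so an unnamed argument is not dropped just because its text happens to match the value of a named one.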

Related

Remove outliers by condition from list of data frames

I am trying to write a function that removes multiple outliers, identified by Cook's distance, from a list of data frames.
There are some problems at the moment:
Part 1: Can I write part 1 as a function? I tried several things that did not work. I want to use several different variables in the lm, so it would be great if I could pass column numbers (the usual data frame indexing syntax) as input arguments.
Part 2: the file names of the plots are not correct. Each file is named after the first observation in the corresponding data frame from the list. How can I correct this?
Part 3: the data frames without the outliers are not created. The function ends after the message is printed, and I can't find my mistake.
data(iris)
iris.lst <- split(iris[, 1:2], iris$Species)
new_names <- c(paste0(unlist(levels(iris$Species)),"_data"))
for (i in 1:length(iris.lst)) {
assign(new_names[i], iris.lst[[i]])
}
# Part 1: Then cooks distances
fit <- lapply(mget(ls(pattern = "_data")),
function(x) lm(x[,1] ~ x[,3], data = x))
cooksd <-lapply(fit,cooks.distance)
# Part 2: Plot each data frame with suspected outlier
plots <- function(x){
jpeg(file=paste0(names(x),".jpeg")) # file names are numbers
#par(mfrow=c(2,1))
plot(x, pch="*", cex=2, main="Influential cases by Cooks distance") # plot cook's distance
abline(h = 3*mean(x, na.rm=T), col="red") # add cutoff line
text(x=1:length(x)+1, y=x, labels=ifelse(x > 3*mean(x, na.rm=T),
names(x),""), col="red")
dev.off()
}
myplots <- lapply(cooksd, plots)
# Part 3: give me new data frames without influential cases
show_influential_cases <- function(x){
# invisible(cooksd[["n_OG"]] <- lapply(cooksd, length)
influential <- lapply(x,function(x) names(x)[x > 3*mean(x, na.rm=T)])
test <- as.data.frame(unlist(influential))[,1]
test <- as.numeric(test)
}
tested <- show_influential_cases(result)
cleaned_data <- add_new[-tested,] # removing outliers by indexing
Could someone please help me to improve my code?
Many thanks,
Nadine
In general, it is not good practice to create multiple data frames in the global environment. A list is usually a better option because it is easier to manage.
Part 1 -
You can combine multiple steps in a single lapply call. Here, in Part 1, lm and cooks.distance are applied together inside the same lapply call.
master_data <- split(iris[, 1:2], iris$Species)
data <- lapply(master_data, function(x) {
cooks.distance(lm(Sepal.Length ~ Sepal.Width, data = x))
})
new_names <- paste0(levels(iris$Species),"_data")
names(data) <- new_names
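On the question of turning Part 1 into a function that takes column numbers, one possible sketch converts the column positions into a formula with `reformulate()`, so `lm()` still sees real column names. `cooks_by_group()` and its argument names are hypothetical; passing several `predictor_col` values gives an additive model.

```r
# Hypothetical helper: Cook's distances for lm(response ~ predictor(s)),
# fitted separately to every data frame in a list.
cooks_by_group <- function(df_list, response_col, predictor_col) {
  lapply(df_list, function(df) {
    cols <- names(df)
    # e.g. columns 1 and 2 of iris give Sepal.Length ~ Sepal.Width
    f <- reformulate(cols[predictor_col], response = cols[response_col])
    cooks.distance(lm(f, data = df))
  })
}

master_data <- split(iris[, 1:2], iris$Species)
cd <- cooks_by_group(master_data, response_col = 1, predictor_col = 2)
names(cd) <- paste0(names(cd), "_data")
```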
Part 2 -
lapply does not pass the list names to the function, so pass them separately and use Map to call the plots function.
plots <- function(x, y){
  jpeg(filename=paste0(y,".jpeg"))   # name the file after the list element
  cutoff <- 3*mean(x, na.rm=TRUE)
  plot(x, pch="*", cex=2, main="Influential cases by Cooks distance")
  abline(h = cutoff, col="red")      # add cutoff line
  # label influential points with their observation (row) names
  text(x=1:length(x)+1, y=x, labels=ifelse(x > cutoff, names(x), ""), col="red")
  dev.off()
}
invisible(Map(plots, data, names(data)))
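The file-name problem in the question comes from `names(x)`: inside `lapply()`, `x` is only the element, so `names(x)` returns the element's own names (the row names of the Cook's distance vector), never the list name, and `jpeg()` uses the first of them. A toy example with made-up data shows the difference:

```r
vals <- list(a = c(r1 = 1, r2 = 2), b = c(r3 = 3))

# lapply() only hands each element to the function, so names(x) are the
# element's own names, never "a" or "b"
lapply(vals, function(x) names(x))

# Map() walks the elements and the list names in parallel
files <- Map(function(x, nm) paste0(nm, ".jpeg"), vals, names(vals))
unlist(files)   # c(a = "a.jpeg", b = "b.jpeg")
```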
Part 3 -
I am not entirely clear on how you want Part 3 to work, but for now I keep the Cook's distances and the data in separate lists and remove the influential cases from each data frame:
remove_influential_cases <- function(x, y){
inds <- x > 3*mean(x, na.rm=TRUE)
y[!inds, ]
}
result <- Map(remove_influential_cases, data, master_data)
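If you want both pieces back, the cleaned data and the influential rows, a variant can return them together. This is a sketch assuming the models were fitted without missing values, so each Cook's distance vector lines up row for row with its data frame; `split_influential()` is a hypothetical name, and `k = 3` is the cutoff factor used in the question.

```r
# Hypothetical helper: separate influential rows (Cook's D > k * mean) from the rest.
# Assumes cd and df line up row for row (no rows dropped by lm because of NAs).
split_influential <- function(cd, df, k = 3) {
  infl <- cd > k * mean(cd, na.rm = TRUE)
  infl[is.na(infl)] <- FALSE            # treat missing distances as not influential
  list(kept    = df[!infl, , drop = FALSE],
       removed = df[infl, , drop = FALSE])
}

master_data <- split(iris[, 1:2], iris$Species)
cd <- lapply(master_data, function(x) cooks.distance(lm(Sepal.Length ~ Sepal.Width, data = x)))
res <- Map(split_influential, cd, master_data)
res$setosa$removed    # the influential setosa rows
```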

Parsing R user input into plotting functions

I've been dealing with user input for various graphs. My main aim is to ask the user for an input and then pass it to a plotting function. I managed to do this for a scatterplot, but not for boxplot and barplot. This is my working example:
n<- function(){
readline(prompt="enter x value to plot: ")
}
m<- function(){
readline(prompt="enter y value to plot: ")
}
plotfun <- function(dat) {
colx <- n()
coly <- m()
plot(dat[,colx], dat[,coly], main="Scatterplot", pch=20, xlab=colx, ylab=coly)
}
But when I try something similar with boxplot for example:
plot2<-function(infile){
a<-readline(prompt="which variable")
barplot(table(infile$a))
}
or
a<-readline(prompt="enter...")
Boxplot( ~ a, data=infile, id.method="y")
It doesn't work.
The errors were something like "can't find the object" and "argument "infile" is missing, with no default".
What is infile?
plot2 <- function(){
a <- readline(prompt = "which variable")
barplot(table(a))
}
You cannot use $ with a column name stored in a character variable: infile$a looks for a column literally named "a". Do the subsetting with [ as you did in the other cases:
plot2<-function(infile){
a<-readline(prompt="which variable")
barplot(table(infile[,a]))
}
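One way to keep the interactive part separate, so the plotting code can also be called (and tested) without `readline()`, is to pass the column name as an ordinary argument. `plot_bar()` and `ask_and_plot_bar()` are hypothetical names, and `iris` stands in for your data.

```r
# Hypothetical helper: bar plot of one column, chosen by name
plot_bar <- function(infile, col, ...) {
  if (!col %in% names(infile)) stop("no column named '", col, "' in infile")
  # infile[[col]] (like infile[, col]) looks the column up by the string's value
  barplot(table(infile[[col]]), main = col, ...)
}

# Hypothetical interactive wrapper: only this part talks to the user
ask_and_plot_bar <- function(infile) {
  col <- readline(prompt = "which variable: ")
  plot_bar(infile, col)
}

plot_bar(iris, "Species")
```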
If your Boxplot function is the one from car, then
a<-readline(prompt="enter...")
Boxplot(infile[,a], labels=rownames(infile), id.method="y")
is the equivalent that works with a column name stored in a variable. You can't use character variables in formulas either; they are taken as literal values rather than as column names.
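If you do want the formula interface, one option is to build the formula from the string with `reformulate()`, so the column name is inserted as a symbol rather than as a literal value. A sketch with base `boxplot()` and `iris`, assuming `a` holds a column name typed by the user:

```r
a <- "Sepal.Length"   # e.g. the value returned by readline()

# reformulate() turns strings into a formula: Sepal.Length ~ Species
f <- reformulate("Species", response = a)
boxplot(f, data = iris)
```

For a one-sided formula like the `~ a` in the question, `reformulate(a)` gives `~Sepal.Length`.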

How can I suppress the creation of a plot while calling a function in R?

I am using a function in R (specifically limma::plotMDS) that produces a plot and also returns a useful value. I want to get the returned value without producing the plot. Is there an easy way to call the function but suppress the plot that it creates?
You can wrap the function call like this:
plotMDS.invisible <- function(...){
ff <- tempfile()
png(filename=ff)
res <- plotMDS(...)
dev.off()
unlink(ff)
res
}
An example call:
x <- matrix(rnorm(1000*6,sd=0.5),1000,6)
rownames(x) <- paste("Gene",1:1000)
x[1:50,4:6] <- x[1:50,4:6] + 2
# without labels, indexes of samples are plotted.
mds <- plotMDS.invisible(x, col=c(rep("black",3), rep("red",3)) )
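A more general version of the same idea, sketched here as a hypothetical helper `without_plot()`, sends any plotting call to a null device: `pdf(file = NULL)` opens a device that writes nothing, so no temporary file is needed, and `on.exit()` closes it even if the call fails. Because R evaluates arguments lazily, `expr` only runs once the null device is open. It is shown with `hist()`; the commented `plotMDS()` line assumes limma is loaded and `x` is the matrix from the example above.

```r
# Hypothetical helper: evaluate a plotting call, keep its value, discard the plot
without_plot <- function(expr) {
  pdf(file = NULL)              # null device: nothing is written anywhere
  dev <- dev.cur()
  on.exit(dev.off(dev))         # close it even if expr throws an error
  expr                          # lazy evaluation: the call runs now, on the null device
}

h <- without_plot(hist(rnorm(100)))
class(h)   # "histogram"
# mds <- without_plot(plotMDS(x, col = c(rep("black", 3), rep("red", 3))))
```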

Passing a list of arguments to plot in R

I would like to use the same arguments for several calls to plot.
I tried to use a list (which can serve as a dictionary):
a <- list(type="o",ylab="")
plot(x,y, a)
But it does not work:
Error in plot.xy(xy, type, ...) : invalid plot type
Any suggestions?
Extending @baptiste's answer, you can use do.call like this:
x <- 1:10 # some data
y <- 10:1
do.call("plot", list(x,y, type="o", ylab=""))
Or put the arguments in a list called a:
a <- list(x,y, type="o", ylab="")
do.call(plot, a)
Another option is to create a function wrapper:
myplot <- function(...) plot(...,type="o",ylab="")
myplot(x,y)
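To keep the shared settings in one list and still vary `x` and `y` per call, concatenate that list with the per-call arguments before handing it to `do.call()`. A wrapper can also take the shared settings as defaults, so an individual call can override them; with the `myplot` wrapper above, `myplot(x, y, type = "l")` fails with "formal argument "type" matched by multiple actual arguments". `myplot2` is a hypothetical name.

```r
a <- list(type = "o", ylab = "")          # shared settings, defined once
x <- 1:10
y <- 10:1

# c() on lists concatenates them: list(x, y, type = "o", ylab = "")
do.call(plot, c(list(x, y), a))
do.call(plot, c(list(y, x), a, list(col = "red")))   # plus a per-call extra

# Wrapper whose shared settings are defaults that a caller may override
myplot2 <- function(..., type = "o", ylab = "") plot(..., type = type, ylab = ylab)
myplot2(x, y)
myplot2(x, y, type = "l")
```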

How can I auto-title a plot with the R call that produced it?

R's plotting is great for data exploration, as it often has very intelligent defaults. For example, when plotting with a formula the labels for the plot axes are derived from the formula. In other words, the following two calls produce the same output:
plot(x~y)
plot(x~y, xlab="y", ylab="x")
Is there any way to get a similar "intelligent auto-title"?
For example, I would like to call
plot(x~y, main=<something>)
And produce the same output as calling
plot(x~y, main="plot(x~y)")
where <something> inserts the call itself through some kind of introspection.
Is there a facility for doing this in R, either through some standard mechanism or an external package?
edit: One suggestion was to specify the formula as a string and supply it both as the argument to a formula() call and as main. This is useful, but it misses parameters that can affect a plot, such as using subsets of data. To elaborate, I'd like
x<-c(1,2,3)
y<-c(1,2,3)
z<-c(0,0,1)
d<-data.frame(x,y,z)
plot(x~y, subset(d, z==0), main=<something>)
To have the same effect as
plot(x~y, subset(d, z==0), main="plot(x~y, subset(d, z==0))")
I don't think this can be done without writing a thin wrapper around plot(). The reason is that R evaluates "supplied arguments" in the evaluation frame of the calling function, in which there's no way to access the current function call.
By contrast, "default arguments" are evaluated in the evaluation frame of the function, from where introspection is possible. Here are a couple of possibilities (differing only in whether you want "myPlot" or "plot" to appear in the title):
## Function that reports actual call to itself (i.e. 'myPlot()') in plot title.
myPlot <- function(x,...) {
cl <- deparse(sys.call())
plot(x, main=cl, ...)
}
## Function that 'lies' and says that plot() (rather than myPlot2()) called it.
myPlot2 <- function(x,...) {
cl <- sys.call()
cl[[1]] <- as.symbol("plot")
cl <- deparse(cl)
plot(x, main=cl, ...)
}
## Try them out
x <- 1:10
y <- 1:10
par(mfcol=c(1,2))
myPlot(x,y)
myPlot2(y~x)
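The point about default arguments can also be used directly: a default is evaluated lazily inside the function's own frame, so `sys.call()` in the default for `main` sees the call to the wrapper itself, while an explicit `main =` still overrides it. A sketch, with `myPlot3` as a hypothetical name:

```r
# main's default is evaluated inside myPlot3's frame, where sys.call() is the call to myPlot3
myPlot3 <- function(x, ..., main = paste(deparse(sys.call()), collapse = "\n")) {
  plot(x, ..., main = main)
  invisible(main)   # return the title so it can be checked
}

x <- 1:10
y <- 1:10
myPlot3(y ~ x)                      # title: "myPlot3(y ~ x)"
myPlot3(y ~ x, main = "my title")   # explicit title wins
```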
Here's a more general solution:
plotCaller <- function(plotCall, ...) {
  main <- deparse(substitute(plotCall))
  main <- paste(main, collapse="\n")
  # evaluate in the caller's environment so variables local to the caller are found
  eval(as.call(c(as.list(substitute(plotCall)), main=main, ...)), parent.frame())
}
## Try _it_ out
plotCaller(hist(rnorm(9999), breaks=100, col="red"))
library(lattice)
plotCaller(xyplot(rnorm(10)~1:10, pch=16))
## plotCaller will also pass through additional arguments, so they take effect
## without being displayed
plotCaller(xyplot(rnorm(10)~1:10), pch=16)
deparse will attempt to break deparsed lines if they get too long (the default is 60 characters). When it does this, it returns a vector of strings. plot methods assume that 'main' is a single string, so the line main <- paste(main, collapse='\n') deals with this by concatenating all the strings returned by deparse, joining them using \n.
Here is an example of where this is necessary:
plotCaller(hist(rnorm(9999), breaks=100, col="red", xlab="a rather long label",
ylab="yet another long label"))
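To see the splitting directly, deparse that long call as a quoted expression: `deparse()` returns several strings, which the `paste()` line joins into one. In R 4.0.0 and later, `deparse1()` does the joining for you (with a space by default rather than a newline).

```r
cl <- quote(hist(rnorm(9999), breaks = 100, col = "red",
                 xlab = "a rather long label", ylab = "yet another long label"))

parts <- deparse(cl)                   # split because width.cutoff defaults to 60
length(parts)                          # more than 1
main <- paste(parts, collapse = "\n")  # one string, as plotCaller builds it
one_line <- deparse1(cl)               # R >= 4.0.0: a single string
```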
Of course there is! Here ya go:
x = rnorm(100)
y = sin(x)
something = "y~x"
plot(formula(something),main=something)
You might be thinking of match.call. However, that only really works when called inside a function, not when passed in as an argument. You could write a wrapper function that calls match.call and passes everything else on to plot, or use substitute to capture the call and then add the deparsed call as main before evaluating it:
x <- runif(25)
y <- rnorm(25, x, .1)
myplot <- function(...) {
tmp <- match.call()
plot(..., main=deparse(tmp))
}
myplot( y~x )
myplot( y~x, xlim=c(-.25,1.25) )
## or
myplot2 <- function(FUN) {
tmp1 <- substitute(FUN)
tmp2 <- deparse(tmp1)
tmp3 <- as.list(tmp1)
tmp4 <- as.call(c(tmp3, main=tmp2))
eval(tmp4)
}
myplot2( plot(y~x) )
myplot2( plot(y~x, xlim=c(-.25,1.25) ) )
