Parsing R user input into plotting functions - r

I've been dealing with user input for various graphs. My main aim is to ask the user for an input and then pass it to a plotting function. I managed to do this for a scatterplot, but not for a boxplot or a barplot. This is my working example:
n<- function(){
readline(prompt="enter x value to plot: ")
}
m<- function(){
readline(prompt="enter y value to plot: ")
}
plotfun <- function(dat) {
colx <- n()
coly <- m()
plot(dat[,colx], dat[,coly], main="Scatterplot", pch=20, xlab=colx)
}
But when I try something similar with boxplot for example:
plot2<-function(infile){
a<-readline(prompt="which variable")
barplot(table(infile$a))
}
or
a<-readline(prompt="enter...")
Boxplot( ~ a, data=infile, id.method="y")
It doesn't work. The errors were along the lines of: can't find the object; argument "infile" is missing, with no default.

What is infile?
plot2 <- function(){
a <- readline(prompt = "which variable")
barplot(table(a))
}

You cannot use "$" with character variable names. You must do the subsetting with [ as you did in the other cases
plot2<-function(infile){
a<-readline(prompt="which variable")
barplot(table(infile[,a]))
}
If your Boxplot function is the one from car, then
a<-readline(prompt="enter...")
Boxplot(infile[,a], labels=rownames(infile), id.method="y")
is the variable-friendly equivalent. You can't use character variables in formulas either: the name a in ~ a is taken literally rather than being replaced by the string it holds.
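The difference between `$` and `[[` with a stored column name can be checked without any plotting. The following is a minimal sketch on made-up data: `infile`, its columns, and the helper `pick_column()` are illustrative names, not part of the original question. It also shows `reformulate()` from base R as one way to build a formula from strings.

```r
# Toy data; `infile` and its column names are illustrative only
infile <- data.frame(group = c("a", "b", "a", "c"),
                     value = c(1.5, 2.0, 3.5, 4.0))

a <- "group"  # stands in for the string returned by readline()

by_dollar  <- infile$a     # looks for a column literally called "a"
by_bracket <- infile[[a]]  # uses the string stored in `a`

# Hypothetical helper: reject names that are not columns before plotting
pick_column <- function(dat, name) {
  if (!name %in% names(dat)) {
    stop("No column named '", name, "' in the data")
  }
  dat[[name]]
}

counts <- table(pick_column(infile, a))  # ready for barplot(counts)

# A formula can be assembled from strings with reformulate()
f <- reformulate(a, response = "value")  # value ~ group
```

Checking the name against `names(dat)` first turns a typo at the prompt into a readable error instead of a failure inside the plotting call.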

Related

How to construct a function that will construct a histogram or a bar chart depending the variable

How can I construct a function (in base R) which will receive a variable as an input parameter, and will construct a histogram or a bar chart depending on whether it is a quantitative or a categorical variable?
I have tried googling, but the solution has to work without downloading an extra package.
Function
This assumes the input is either numeric or a factor, but you can keep adding else if branches for other kinds of variables.
plot_hist_or_bar <- function(x) {
if(is.numeric(x)) {
hist(x)
} else if(is.factor(x)) {
barplot(table(x))
} else {
stop("Input variable must be numeric or a factor.")
}
}
Testing
x <- rnorm(100)
plot_hist_or_bar(x)
y <- factor(rep(c("A", "B"), 50))
plot_hist_or_bar(y)
Created on 2023-02-06 with reprex v2.0.2
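As a sketch of how further branches might be added, the type decision can be separated from the drawing so that it is easy to check on its own. The names `plot_kind()` and `plot_auto()` are hypothetical, and treating character and logical vectors as categorical is an assumption rather than something the answer specifies.

```r
# Decide which chart suits a variable (hypothetical helper, base R only)
plot_kind <- function(x) {
  if (is.numeric(x)) {
    "histogram"
  } else if (is.factor(x) || is.character(x) || is.logical(x)) {
    # Assumption: character and logical vectors are treated as categorical
    "bar chart"
  } else {
    stop("Unsupported variable type: ", class(x)[1])
  }
}

# Draw the chart chosen by plot_kind(); extra arguments go to the plot call
plot_auto <- function(x, ...) {
  switch(plot_kind(x),
         "histogram" = hist(x, ...),
         "bar chart" = barplot(table(x), ...))
}
```

Calling `plot_auto(rnorm(100))` or `plot_auto(c("A", "B", "A"))` would then draw a histogram or a bar chart respectively.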

Change title of plots in list

I have a problem with my plotting function. I already asked a somewhat similar question, and here are all the data and the plotting function.
When I try to apply my plotting function to my list of dfs, the title gets changed and the title condition I specified in the function is not respected. Is there a way to rename all the plots in the list or fix the function/loop so that the title stays the same?
Thanks in advance
This works as intended:
mynames <- sapply(names(tbls), function(x) {
paste("How do they rank? -",gsub("\\.",": ",x))
})
myfilenames <- names(tbls)
plot_likert <- function(x, myname, myfilename){
p <- plot(likert(x),
type ="bar",center=3,
group.order=names(x))+
labs(x = "Theme", subtitle=paste("Number of observations:",nrow(x)))+
guides(fill=guide_legend("Rank"))+
ggtitle(myname)
p
}
list_plots <- lapply(1:length(tbls),function(i) {
plot_likert(tbls[[i]], mynames[i], myfilenames[i])
})
When in doubt, keep things stupid and simple. Non-standard evaluation like deparse(substitute()) will throw you right into Burns' R Inferno.
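A small base R sketch of why deparse(substitute()) tends to misbehave inside lapply() or sapply(): the function sees the expression the apply machinery used to call it, not the name of the list element. The list `tbls` here is a stand-in with plain vectors, and both title functions are hypothetical.

```r
# Stand-in for the list of data frames; the names mimic the question's style
tbls <- list(first.table = 1:3, second.table = 4:6)

# Title derived by non-standard evaluation
title_nse <- function(x) deparse(substitute(x))

# Title passed in explicitly, as in the answer above
title_explicit <- function(x, name) paste("How do they rank? -", name)

# Called directly, NSE recovers the variable name
my_table <- tbls[[1]]
direct <- title_nse(my_table)

# Called through sapply(), it recovers the internal indexing expression
# instead (typically something like "X[[i]]"), not the element name
nse_titles <- sapply(tbls, title_nse, USE.NAMES = FALSE)

# Passing the names alongside the data keeps the titles predictable
explicit_titles <- mapply(title_explicit, tbls, names(tbls), USE.NAMES = FALSE)
```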

Looping cut2 color argument in qplot

First off, fair warning: this is related to a quiz question from the coursera.org Practical Machine Learning course. However, my question does not deal with the actual question asked; it is a tangential question about plotting.
I have a training set of data and I am trying to create a plot for each predictor that includes the outcome on the y axis, the index of the data set on the x axis, and colors the plot by the predictor in order to determine the cause of bias along the index. To make the color argument more clear I am trying to use cut2() from the Hmisc package.
Here is my data:
library(ggplot2)
library(caret)
library(AppliedPredictiveModeling)
library(Hmisc)
data(concrete)
set.seed(1000)
inTrain = createDataPartition(mixtures$CompressiveStrength, p = 3/4)[[1]]
training = mixtures[ inTrain,]
testing = mixtures[-inTrain,]
training$index <- 1:nrow(training)
I tried this and it makes all the plots but they are all the same color.
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(paste0("cutEx",i), cut2(x[ ,i]))
print(qplot(x$index, x$CompressiveStrength, color=paste0("cutEx",i)))
}
}
plotCols(training)
Then I tried this and it makes all the plots, and this time they are colored but the cut doesn't work.
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(cols[i], cut2(x[ ,i]))
print(qplot(x$index, x$CompressiveStrength, color=x[ ,cols[i]]))
}
}
plotCols(training)
It seems qplot() doesn't like having paste() in the color argument. Does anyone know another way to loop through the color argument and still keep my cuts? Any help is greatly appreciated!
Your desired output is easier to achieve using ggplot() instead of qplot(), since you can use aes_string(), which accepts strings as arguments. (Note that aes_string() is deprecated in recent ggplot2 releases.)
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(paste0("cutEx", i), cut2(x[, i]))
p <- ggplot(x) +
aes_string("index", "CompressiveStrength", color = paste0("cutEx", i)) +
geom_point()
print(p)
}
}
plotCols(training)

Save plots as R objects and displaying in grid

In the following reproducible example I try to create a function that builds a ggplot distribution plot and saves it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine until now.
Then I want to make a second plot (note mean=100 instead of the previous 10):
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then I try to replot the first distribution, with mean 10:
output1[1]
I get the same distribution as the second plot (mean 100)?
If, however, I take the information returned by the function and reset it as global variables, it works:
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens I would like to know. It seems that in order to draw the intended distribution I have to reset the data.frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to do this? Luckily I can replot the first distribution,
but I can't plot them both at the same time:
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
In the answer to "Grid of multiple ggplot2 plots which have been made in a for loop", it is suggested to use a list to contain the plots:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
This again produces the distribution with mean=100 instead of mean=10,
and:
grid.arrange(p1,p2)
produces the same error:
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt I tried to use recordPlot() to record everything about the plot into an object. The following is now inside the function:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function produces the same errors as before: it depends on resetting the dat and var1 variables to what is needed for drawing the distribution, and similarly the result can't be put inside a grid.
I've tried similar things like arrangeGrob() from the question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid", but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time. I would also like to understand why this is happening, as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot to a png file saved somewhere and have the function return the path so that it can be reused. Is that what other people are doing?
Thanks for reading, and sorry for the long question.
Found a solution in "How can I reference the local environment within a function, in R?":
by inserting
localenv <- environment()
And referencing that in the ggplot
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work, even with grid.arrange()!
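One way to read this behaviour, offered as an interpretation rather than a statement about ggplot2 internals: an expression such as dat[, var1] is stored unevaluated and only looked up when the plot is drawn, so the result depends on which environment is searched at that moment. The base R sketch below imitates this with quote() and eval(); `make_lazy()`, `make_local()` and `later_env` are hypothetical names.

```r
df  <- data.frame(x = c(10, 11), y = c(1, 1))
df2 <- data.frame(x = c(1, 1),   y = c(100, 101))

# Keeps only the unevaluated expression, with no record of where it was made
make_lazy <- function(dat, var1) {
  quote(dat[, var1])
}

# Keeps the expression together with the function's own environment
make_local <- function(dat, var1) {
  list(expr = quote(dat[, var1]), env = environment())
}

lazy1  <- make_lazy(df, "x")
local1 <- make_local(df, "x")

# Stand-in for "whatever dat and var1 happen to be at draw time"
later_env <- new.env()
later_env$dat  <- df2
later_env$var1 <- "y"

drawn_lazy  <- eval(lazy1, later_env)          # picks up df2 and "y"
drawn_local <- eval(local1$expr, local1$env)   # still sees df and "x"
```

Storing the environment alongside the expression is what makes the second version redraw correctly regardless of later assignments.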

How can I auto-title a plot with the R call that produced it?

R's plotting is great for data exploration, as it often has very intelligent defaults. For example, when plotting with a formula the labels for the plot axes are derived from the formula. In other words, the following two calls produce the same output:
plot(x~y)
plot(x~y, xlab="x", ylab="y")
Is there any way to get a similar "intelligent auto-title"?
For example, I would like to call
plot(x~y, main=<something>)
And produce the same output as calling
plot(x~y, main="plot(x~y)")
Where the <something> inserts the call used using some kind of introspection.
Is there a facility for doing this in R, either through some standard mechanism or an external package?
edit: One suggestion was to specify the formula as a string, and supply that as the argument to a formula() call as well as main. This is useful, but it misses out on parameters that can affect a plot, such as using subsets of data. To elaborate, I'd like
x<-c(1,2,3)
y<-c(1,2,3)
z<-c(0,0,1)
d<-data.frame(x,y,z)
plot(x~y, subset(d, z==0), main=<something>)
To have the same effect as
plot(x~y, subset(d, z==0), main="plot(x~y, subset(d, z==0))")
I don't think this can be done without writing a thin wrapper around plot(). The reason is that R evaluates "supplied arguments" in the evaluation frame of the calling function, in which there's no way to access the current function call (see here for details).
By contrast, "default arguments" are evaluated in the evaluation frame of the function, from where introspection is possible. Here are a couple of possibilities (differing just in whether you want "myPlot" or "plot" to appear in the title):
## Function that reports actual call to itself (i.e. 'myPlot()') in plot title.
myPlot <- function(x,...) {
cl <- deparse(sys.call())
plot(x, main=cl, ...)
}
## Function that 'lies' and says that plot() (rather than myPlot2()) called it.
myPlot2 <- function(x,...) {
cl <- sys.call()
cl[[1]] <- as.symbol("plot")
cl <- deparse(cl)
plot(x, main=cl, ...)
}
## Try them out
x <- 1:10
y <- 1:10
par(mfcol=c(1,2))
myPlot(x,y)
myPlot2(y~x)
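The distinction between supplied and default arguments described above can be checked without drawing anything. In this sketch, `auto_title()` is a hypothetical function that simply returns the title string it would pass as main.

```r
# The default for `main` is evaluated inside auto_title()'s own frame,
# so sys.call() there refers to the call to auto_title() itself
auto_title <- function(x, main = deparse(sys.call())) {
  main
}

from_default <- auto_title(y ~ x)

# A supplied argument is evaluated in the caller's frame instead, where
# sys.call() knows nothing about the call to auto_title()
from_supplied <- auto_title(y ~ x, main = deparse(sys.call()))
```

Here `from_default` holds the text of the call, while `from_supplied` does not, which is why the introspection has to live inside a wrapper.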
Here's a more general solution:
plotCaller <- function(plotCall, ...) {
main <- deparse(substitute(plotCall))
main <- paste(main, collapse="\n")
eval(as.call(c(as.list(substitute(plotCall)), main=main, ...)))
}
## Try _it_ out
plotCaller(hist(rnorm(9999), breaks=100, col="red"))
library(lattice)
plotCaller(xyplot(rnorm(10)~1:10, pch=16))
## plotCaller will also pass through additional arguments, so they take effect
## without being displayed
plotCaller(xyplot(rnorm(10)~1:10), pch=16)
deparse will attempt to break deparsed lines if they get too long (the default is 60 characters). When it does this, it returns a vector of strings. plot methods assume that 'main' is a single string, so the line main <- paste(main, collapse='\n') deals with this by concatenating all the strings returned by deparse, joining them using \n.
Here is an example of where this is necessary:
plotCaller(hist(rnorm(9999), breaks=100, col="red", xlab="a rather long label",
ylab="yet another long label"))
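The line-breaking behaviour of deparse() can be seen directly on a quoted call, without plotting. This is a small sketch; the variable names are illustrative, and the alternative of raising width.cutoff is an addition not taken from the answer.

```r
long_call <- quote(hist(rnorm(9999), breaks = 100, col = "red",
                        xlab = "a rather long label",
                        ylab = "yet another long label"))

# With the default width.cutoff, a call this long comes back in several pieces
pieces <- deparse(long_call)

# Joining with "\n" gives a single string suitable for `main`
title <- paste(pieces, collapse = "\n")

# Alternative: raise width.cutoff (500 is the maximum) to avoid the breaks
one_line <- paste(deparse(long_call, width.cutoff = 500L), collapse = " ")
```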
Of course there is! Here ya go:
x = rnorm(100)
y = sin(x)
something = "y~x"
plot(formula(something),main=something)
You might be thinking of the functionality of match.call. However that only really works when called inside of a function, not passed in as an argument. You could create your wrapper function that would call match.call then pass everything else on to plot or use substitute to capture the call then modify it with the call before evaluating:
x <- runif(25)
y <- rnorm(25, x, .1)
myplot <- function(...) {
tmp <- match.call()
plot(..., main=deparse(tmp))
}
myplot( y~x )
myplot( y~x, xlim=c(-.25,1.25) )
## or
myplot2 <- function(FUN) {
tmp1 <- substitute(FUN)
tmp2 <- deparse(tmp1)
tmp3 <- as.list(tmp1)
tmp4 <- as.call(c(tmp3, main=tmp2))
eval(tmp4)
}
myplot2( plot(y~x) )
myplot2( plot(y~x, xlim=c(-.25,1.25) ) )
