Plotting a data.frame from within a function with ggplot2 - r

I have this function to take an object returned by the IRT package sirt and plot item response functions for a set of items that the user can specify:
plotRaschIRF <- function(x,items=NULL,thl=-5,thu=5,thi=.01,D=1.7) {
if (!class(x)=="rasch.mml") stop("Object must be of class rasch.mml")
thetas <- seq(thl,thu,thi)
N <- length(thetas)
n <- length(x$item$b)
tmp <- data.frame(item=rep(1:n,each=N),theta=rep(thetas,times=n),b=rep(x$item$b,each=N))
probs <- exp(D*(tmp[,2]-tmp[,3]))/(1+exp(D*(tmp[,2]-tmp[,3])))
dat <- data.frame(item=rep(1:n,each=N),theta=rep(thetas,times=n),b=rep(x$item$b,each=N),p=probs)
#dat$item <- factor(dat$item,levels=1:n,labels=paste0("Item",1:n))
if (is.null(items)) {
m <- min(10,n)
items <- 1:m
if (10<n) warning("By default, this function will plot only the first 10 items")
}
if (length(items)==1) {
title="Item Response Function"
} else {
title="Item Response Functions"
}
dat2 <- subset(dat,eval(quote(eval(item,dat) %in% items)))
dat2$item <- factor(dat2$item,levels=unique(dat2$item),labels=paste0("Item",unique(dat2$item)))
out <- ggplot(dat2,aes(x=theta,y=p,group=item)) +
geom_line(aes(color=dat2$item),lwd=1) + guides(col=guide_legend(title="Items")) +
theme_bw() + ggtitle(title) + xlab(expression(theta)) +
ylab("Probability") + scale_x_continuous(breaks=seq(thl,thu,1))
print(out)
}
But it seems to be getting stuck at either the line just before I start using ggplot2 (where I convert one column of dat2 to a factor) or at the ggplotting itself -- not really sure which. I get the error message "Error in eval(expr, envir, enclos) : object 'dat2' not found".
I tried reading through this as was suggested here but either this is a different problem or I'm just not getting it. The function works fine when I step through it line by line. Any help is greatly appreciated!

Based on your comments, the error is almost certainly in geom_line(aes(color=dat2$item)). Get rid of dat2$ and it should work fine (i.e. geom_line(aes(color=item))). Stuff in aes is evaluated in the data argument (dat2 here), with the global environment as the enclosure. Notably this means stuff in the function environment is not available for use by aes unless it is part of the data (dat2 here). Since dat2 doesn't exist inside dat2, and dat2 doesn't exist in the global environment, you get that error.
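The lookup rule described above can be imitated with base R's eval(), which takes the same envir/enclos pair that appears in the error message. The sketch below is only an illustration under that assumption: eval_in_data() and demo() are hypothetical names, and the snippet does not call ggplot2 itself (newer ggplot2 releases may resolve variables differently).

```r
# Hypothetical helper: look up symbols in the columns of `data` first,
# then fall back to the global environment (not the calling function).
eval_in_data <- function(expr, data) {
  eval(expr, envir = data, enclos = globalenv())
}

demo <- function() {
  # `dat2` exists only inside this function, as in the question.
  dat2 <- data.frame(item = c(1, 2), p = c(0.2, 0.8))

  # A bare column name is found among the columns of the data.
  ok <- eval_in_data(quote(item), dat2)

  # `dat2$item` needs an object called `dat2`, which is neither a column
  # nor (assuming no global of that name exists) in the global environment.
  bad <- tryCatch(eval_in_data(quote(dat2$item), dat2),
                  error = function(e) conditionMessage(e))

  list(ok = ok, bad = bad)
}

res <- demo()
```

Here res$ok holds the column values, while res$bad holds the message of the error raised for dat2$item.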

Related

R : pass Graph as parameter to a function

I have a decent-looking graph, which I plotted using
r <- ggplot(data=data2.Gurgaon,aes(x=createdDate,y=count))+geom_point()
Now I want to highlight a few points on the graph, say 500, 1000, 5000, etc.,
so I am trying to write a function to which I can pass the point I want to mark.
Below is the function I have written:
graphPoint <- function(graph,point) {
g <- graph
g <- g+geom_point(aes(x=createdDate[point],y=count[point]),pch=1,size=8,col='black')
g <- g+ geom_point(aes(x=createdDate[point],y=count[point]),pch=16,size=5,col='red')
g
}
When I pass the parameters
r <- graphPoint(r,500)
this gives the error
Error in lapply(X = x, FUN = "[", ..., drop = drop) :
object 'point' not found
I am not that great with R. I hope it's possible, but I am missing some small point. Thanks.
This is actually an extremely subtle (and annoying...) problem in ggplot, although not a bug. The aes(...) function evaluates all symbols first in the context of the default dataset (i.e. it looks for columns with that name) and, if that fails, in the global environment. It does not move up the calling chain, as you might justifiably expect it to. So in your case the symbol point is first evaluated in the context of data2.Gurgaon. Since there is no such column, it looks for point in the global environment, but not in the context of your graphPoint(...) function. Here is a demonstration:
df <- mtcars
library(ggplot2)
graphPoint <- function(graph,point) {
g <- graph
g <- g + geom_point(aes(x=wt[point],y=mpg[point]),pch=1,size=8,col='black')
g <- g + geom_point(aes(x=wt[point],y=mpg[point]),pch=16,size=5,col='red')
g
}
ggp <- ggplot(df, aes(x=wt, y=mpg)) + geom_point()
point=10
graphPoint(ggp, 10)
This works because I defined point in the global environment; the point argument inside the function is being ignored (you can demonstrate that by calling the function with something other than 10: you'll get the same plot).
The correct way around this is by subsetting the data=... argument, as shown in the other answer.
You cannot select a subset of the data within the aesthetics part of a ggplot function, as you are trying to do. However you can achieve this by extracting the original data from the ggplot object, subsetting it and using the subset in the rest of the function.
r <- ggplot(data=mtcars,aes(x=cyl,y=drat))+geom_point()
graphPoint <- function(graph,point) {
g <- graph
data_subset <- g$data[point, ]
g <- g+geom_point(data = data_subset,
aes(x=cyl,y=drat),pch=1,size=8,col='black')
g <- g+ geom_point(data = data_subset,
aes(x=cyl,y=drat),pch=16,size=5,col='red')
g
}
graphPoint(r, point = 2)
PS for upcoming posts I would advise you to make a reproducible example by using data that is generally accessible, like the mtcars data. This would make it easier to help you out.

How do I loop a qqplot in ggplot2?

I am trying to create a function that loops through the columns of my dataset and saves a qq-plot of each of my variables. I have spent a lot of time looking for a solution, but I am an R novice and haven't been able to successfully apply any answers to my data. Can anyone see what I am doing wrong?
The error I am given is this: "Error in eval(expr, envir, enclos) : object 'i' not found"
library(ggplot2)
QQPlot <- function(x, na.rm = TRUE, ...) {
nm <- names(x)
for (i in names(mybbs)) {
plots <-ggplot(mybbs, aes(sample = nm[i])) +
stat_qq()
ggsave(plots, filename = paste(nm[i], ".png", sep=""))
}
}
QQPlot(mybbs)
The error happens because you are trying to pass a string as a variable name. Use aes_string() instead of aes().
Moreover, you are looping over names, not indexes; nm[i] would work for something like for(i in seq_along(names(x))), but not with your current loop. You would be better off replacing every nm[i] with i in the function, since what you want is the variable name.
Finally, you use mybbs instead of x inside the function. That means it will not work properly with any other data.frame.
Here is a solution to those three problems:
QQPlot <- function(x, na.rm = TRUE, ...) {
for (i in names(x)) {
plots <-ggplot(x, aes_string(sample = i)) +
stat_qq()
#print(plots)
ggsave(plots, filename = paste(i, ".png", sep=""))
}
}
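As a possible alternative, newer ggplot2 releases (3.0.0 and later, as far as I know) let you refer to a column by its name as a string through the .data pronoun inside aes(). The sketch below assumes such a version is installed; qq_plots() and example_data are hypothetical names, and the function returns the plots in a list instead of saving them so that the example has no side effects.

```r
library(ggplot2)

# Hypothetical variant of QQPlot(): builds one QQ plot per numeric column
# and returns them in a named list.
qq_plots <- function(x) {
  # Keep only numeric columns, since stat_qq() needs numeric samples.
  cols <- names(x)[vapply(x, is.numeric, logical(1))]
  plots <- lapply(cols, function(col) {
    # `col` is a string; .data[[col]] looks it up in the plot's data.
    ggplot(x, aes(sample = .data[[col]])) +
      stat_qq() +
      ggtitle(col)
  })
  names(plots) <- cols
  plots
}

example_data <- data.frame(a = rnorm(20), b = runif(20), g = letters[1:20])
plots <- qq_plots(example_data)
```

lapply() is used rather than a for loop so that each plot gets its own col: the expression inside aes() is evaluated when the plot is drawn, by which time a shared loop variable would already hold its last value. To write the files, ggsave() could be called on each element of plots.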

Save plots as R objects and displaying in grid

In the following reproducible example I try to create a function for a ggplot distribution plot and saving it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine until now.
Then I want to make a second plot (note mean=100 instead of the previous 10):
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then I try to replot the first distribution with mean 10.
output1[1]
I get the same distribution as in the previous (mean=100) plot?
If, however, I take the information contained inside the function, return it, and reset it as global variables, it works.
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens I would like to know. It seems that in order to draw the intended distribution I have to reset the data.frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to do this? Luckily I can replot the first distribution,
but I can't plot them both at the same time:
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
The answer to "Grid of multiple ggplot2 plots which have been made in a for loop" suggests using a list to contain the plots:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
will produce the distribution with mean=100 again instead of mean=10,
and:
grid.arrange(p1,p2)
will produce the same error:
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt, I tried to use recordPlot() to record everything about the plot into an object. The following is now inside the function:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function produces the same errors as before: redrawing depends on resetting the dat and var1 variables to what is needed for the distribution, and the result similarly can't be put inside a grid.
I've tried similar things like arrangeGrob() from the question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid", but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time. I would also like to understand why this is happening, as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot as a png file, save it somewhere, and then have the function return the path so that it can be reused. Is that what other people are doing?
Thanks for reading, and sorry for the long question.
Found a solution in "How can I reference the local environment within a function, in R?": inserting
localenv <- environment()
and referencing that in the ggplot() call
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work, even with grid.arrange()!
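A separate detail may also have contributed to the grid.arrange() error: output1[1] keeps the list wrapper, so p1 is a list of length one rather than the plot itself, whereas output1[[1]] extracts the stored object. The base-R sketch below illustrates the difference with a stand-in object (a character string tagged with a made-up class) in place of a real ggplot; make_output() and fake_plot are hypothetical names.

```r
# Hypothetical stand-in for the list returned by ggplothist().
make_output <- function() {
  fake_plot <- structure("plot goes here", class = "fake_plot")
  list(fake_plot, 1L, data.frame(x = 1:3))
}

output1 <- make_output()

wrapped   <- output1[1]    # single bracket: still a list (of length 1)
extracted <- output1[[1]]  # double bracket: the stored object itself

class(wrapped)    # "list"
class(extracted)  # "fake_plot"
```

With real plots, p1 <- output1[[1]] would presumably hand grid.arrange() the plot object rather than a list wrapped around it.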

Variable class different within function?

In order to streamline future data analysis, I'm trying to write a script that will identify the different self-report scales included in a data.frame and perform routine analyses on each scale's items. Currently, I want it to identify which scales are present, find the responses for each of the scale's items, and then calculate the Cronbach's Alphas for each scale.
Everything seems to be working except when I run my function that should produce a list of alpha() outputs for each scale I get the following error:
> Cronbach.Alphas(scales.data, scale.names)
Error in alpha(data[, responses[[i]]]) :
Data must either be a data frame or a matrix
Obviously I know that this is saying the information being given to the alpha() function is not a data.frame or matrix. The reason I'm so confused though is that when I do these calculations manually step-by-step outside of my Cronbach.Alphas() function, it clearly tells me that it is a data.frame and seems to work like a charm:
> class(scales.data[,responses[[1]]])
[1] "data.frame"
This is driving me crazy and I'll be extremely appreciative of any help with figuring this out. My full code is pasted below. (Note: I'm pretty new to programming functions in R so the way I'm doing things is probably not optimal. Any additional advice is welcome as well.)
Also, it might help to mention that my code is designed to identify scale names based on the presence of an underscore in a column name. That is, "rsq_12" indicates the scale as rsq and the column as responses to item 12 of the scale.
require(psych)
##### Function for identifying names of scales present in the data file #####
GetScales <- function(x) {
find.scale.names <- regexec("^(([^_]+)_)", colnames(x))
scales <- do.call(rbind, lapply(regmatches(colnames(x), find.scale.names), `[`, 3L))
colnames(scales) <- "scale"
na.find <- ifelse(is.na(scales[,1]), 0, 1)
scales <- cbind(scales, na.find)
output <- scales[scales[,2] == 1,]
output[,1]
}
##### Function for calculating cronbach's alpha for each scale #####
Cronbach.Alphas <- function(data, scales){
for(i in 1:length(scales)){
if(i == 1) {
responses <- list(grep(scales[i], colnames(data)))
alphas <- list(alpha(data[,responses[[i]]]))
} else {
responses <- append(responses, list(grep(scales[i], colnames(data))))
alphas <- append(alphas, list(alpha(data[,responses[[i]]])))
}
}
return(alphas)
}
### Import data from .csv file ###
scales.data <- data.frame(read.csv(file.choose()))
### Identify each item's scale ###
scale.items <- GetScales(scales.data)
### Reduce to names of scales ###
scale.names <- cbind(scale.items, !duplicated(scale.items))
scale.names <- scale.names[scale.names[,2] == TRUE, 1]
scale.names
### Calculate list of alphas ###
Cronbach.Alphas(scales.data, scale.names)
Thank you to anyone who has taken the time to look over my code. I appreciate your help.
I was working off of the suggestions left here when I realized a simple mistake on my part...
One of the scales in the dataset that I've been using as a test while working on this script had only one item in it. Thus, data[,responses[[i]]] in my Cronbach.Alphas() function was passing a vector (rather than a data.frame or matrix) to the alpha() function at that point in the for loop. It is impossible to calculate Cronbach's alpha for a single-item scale because it is an index of inter-item reliability...
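The vector-versus-data.frame behaviour described above is easy to reproduce in base R. In this small sketch (the data and column names are made up for illustration), selecting a single column with [ , j] drops to a plain vector unless drop = FALSE is given.

```r
scales.demo <- data.frame(rsq_1 = 1:3, rsq_2 = 4:6, solo_1 = 7:9)

two.items <- grep("^rsq_", colnames(scales.demo))   # two matching columns
one.item  <- grep("^solo_", colnames(scales.demo))  # a single column

many    <- scales.demo[, two.items]               # stays a data.frame
dropped <- scales.demo[, one.item]                # becomes an integer vector
kept    <- scales.demo[, one.item, drop = FALSE]  # stays a data.frame
```

Keeping the data.frame with drop = FALSE avoids the class error, but a single-item scale still cannot yield a meaningful alpha, so skipping such scales remains the sensible choice.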
Sooooo, all my code needed was a way to identify scales with just one item:
Cronbach.Alphas <- function(data, scales){
for(i in 1:length(scales)){
if(i == 1) {
responses <- list(grep(scales[i], colnames(data)))
if(length(responses[[i]]) > 1){
alphas <- list(alpha(data[,responses[[i]]]))
}
} else {
responses <- append(responses, list(grep(scales[i], colnames(data))))
if(length(responses[[i]]) > 1){
alphas <- append(alphas, list(alpha(data[,responses[[i]]])))
}
}
}
return(alphas)
}
Sorry for wasting anyone's time with my mistake. On the plus side, by substituting this new Cronbach.Alphas() function into the script above, I've now posted a script that will automatically identify scales and produce a list of Cronbach's alphas (provided the columns are named with an underscore after the scale names) for anyone who might be interested. Thanks again!
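One caveat with the loop above: if the first scale has only one item, alphas is never created, so a later append() would fail. A possible restructuring, sketched here with hypothetical names (per_scale() and its fun argument, which could be psych's alpha), starts from an empty list and skips single-item scales. It also anchors the pattern to the scale name plus an underscore, which is an assumption based on the naming scheme described in the question.

```r
# Hypothetical helper: apply `fun` to the items of every multi-item scale.
per_scale <- function(data, scales, fun) {
  out <- list()
  for (s in scales) {
    # Columns whose names start with "<scale>_".
    cols <- grep(paste0("^", s, "_"), colnames(data))
    if (length(cols) > 1) {
      # drop = FALSE guarantees that `fun` receives a data.frame.
      out[[s]] <- fun(data[, cols, drop = FALSE])
    }
  }
  out
}

demo.data <- data.frame(rsq_1 = 1:4, rsq_2 = 4:1, solo_1 = c(2, 3, 2, 3))
# ncol() stands in for alpha() so the sketch runs without the psych package.
result <- per_scale(demo.data, c("solo", "rsq"), ncol)
```

The result is a list named by scale, so the single-item scale listed first is simply absent instead of breaking the loop.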

tryCatch() apparently ignoring a warning

I'm writing a function that uses kmeans to determine bin widths to convert a continuous measurement (a predicted probability) into an integer (one of 3 bins). I've stumbled upon an edge case in which it's possible for my algorithm to (correctly) predict the same probability for a whole set, and I want to handle that situation. I'm using the rattle package's binning() function in the following way:
btsKmeansBin <- function(x, k = 3, default = c(0, 0.3, 0.5, 1)) {
result <- binning(x, bins = k, method = "kmeans", ordered = T)
bins <- attr(result, "breaks")
attr(bins, "names") <- NULL
bins <- bins[order(bins)]
bins[1] <- 0
bins[length(bins)] <- 1
return(bins)
}
Run this function on x <- c(.5,.5,.5,.5,.5,.5), and you'll get an error at the order(bins) step, because bins will be NULL and therefore not a vector.
Obviously, if x has only one distinct value, kmeans shouldn't work. In this case, I'd like to return the default bin divisions. When this happens, binning issues "Warning: the variable is not considered." So I'd like to use tryCatch to handle this warning, but surrounding the result <- ... line with the following code doesn't work the way I expect:
...
tryCatch({
result <- binning(x, bins = k, method = "kmeans", ordered = T)
}, warning = function(w) {
warn(sprintf("%s. Using default values", w))
return(default)
}, error = function(e) {
stop(e)
})
...
The warning gets printed as though I hadn't used tryCatch, and the code progresses past the return statement and throws the error from order again. I have tried a bunch of variations to no avail. What am I missing, here??
If you look in binning I think you'll find that the "warning" you see is not generated via warning() but with cat(), which is why tryCatch isn't picking it up. The author of binning probably deserves a few lashings with a wet noodle for this oversight. ;) (Or it could be on purpose due to the particular way that rattle works, I'm not sure.)
It appears to return NULL when this happens, so you could simply handle it manually. Not ideal, but possibly the only way to go.
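The manual handling suggested above could look roughly like the following sketch. noisy_binning() is a made-up stand-in that prints with cat() and returns NULL; it is not rattle's binning(), and bins_or_default() is likewise a hypothetical name. The example shows that the warning handler never runs, and that checking for NULL is what actually selects the defaults.

```r
# Hypothetical stand-in for a function that reports a problem with cat()
# instead of warning(), and then returns NULL.
noisy_binning <- function(x) {
  if (length(unique(x)) < 2) {
    cat("Warning: the variable is not considered.\n")
    return(NULL)
  }
  range(x)
}

bins_or_default <- function(x, default = c(0, 0.3, 0.5, 1)) {
  handler_ran <- FALSE
  result <- tryCatch(
    noisy_binning(x),
    warning = function(w) {
      handler_ran <<- TRUE   # not reached here: cat() signals no condition
      NULL
    }
  )
  # Manual check for the NULL return value.
  value <- if (is.null(result)) default else result
  list(value = value, handler_ran = handler_ran)
}

res <- bins_or_default(c(.5, .5, .5, .5, .5, .5))
```

Note also that a handler's return value becomes the value of the whole tryCatch() call; return() inside a handler does not exit the enclosing function.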
