Getting strings recognized as variable names in R

Related: Strings as variable references in R
Possibly related: Concatenate expressions to subset a dataframe
I've simplified the question per the comment request. Here goes with some example data.
dat <- data.frame(num=1:10,sq=(1:10)^2,cu=(1:10)^3)
set1 <- subset(dat,num>5)
set2 <- subset(dat,num<=5)
Now, I'd like to make a bubble plot from these. I have a more complicated data set with 3+ colors and complicated subsets, but I do something like this:
symbols(set1$sq,set1$cu,circles=set1$num,bg="red")
symbols(set2$sq,set2$cu,circles=set2$num,bg="blue",add=T)
I'd like to do a for loop like this:
colors <- c("red","blue")
sets <- c("set1","set2")
vars <- c("sq","cu","num")
for (i in 1:length(sets)) {
symbols(sets[[i]][,sq],sets[[i]][,cu],circles=sets[[i]][,num],
bg=colors[[i]],add=T)
}
I know you can have a variable evaluated to specify the column (like var="cu"; set1[,var]). I want to know how to get a variable to specify the data.frame itself (and another to evaluate the column).
Update: Ran across this post on r-bloggers which has this example:
x <- 42
eval(parse(text = "x"))
[1] 42
I'm able to do something like this now:
eval(parse(text=paste(set[[1]],"$",var1,sep="")))
In fiddling with this, I'm finding it interesting that the following are not equivalent:
vars <- data.frame("var1","var2")
eval(parse(text=paste(set[[1]],"$",var1,sep="")))
eval(parse(text=paste(set[[1]],"[,vars[[1]]]",sep="")))
I actually have to do this:
eval(parse(text=paste(set[[1]],"[,as.character(vars[[1]])]",sep="")))
Update2: The above works to output values... but not in trying to plot. I can't do:
for (i in 1:length(set)) {
symbols(eval(parse(text=paste(set[[i]],"$",var1,sep=""))),
eval(parse(text=paste(set[[i]],"$",var2,sep=""))),
circles=paste(set[[i]],".","circles",sep=""),
fg="white",bg=colors[[i]],add=T)
}
I get invalid symbol coordinates. I checked the class of set[[1]] and it's a factor. If I do is.numeric(as.numeric(set[[1]])) I get TRUE. Even if I add that above prior to the eval statement, I still get the error. Oddly, though, I can do this:
set.xvars <- as.numeric(eval(parse(text=paste(set[[i]],"$",var1,sep=""))))
set.yvars <- as.numeric(eval(parse(text=paste(set[[i]],"$",var2,sep=""))))
symbols(xvars,yvars,circles=data$var3)
Why different behavior when stored as a variable vs. executed within the symbol function?

You found one answer, i.e. eval(parse()) . You can also investigate do.call() which is often simpler to implement. Keep in mind the useful as.name() tool as well, for converting strings to variable names.

The basic answer to the question in the title is eval(as.symbol(variable_name_as_string)) as Josh O'Brien uses. e.g.
var.name = "x"
assign(var.name, 5)
eval(as.symbol(var.name)) # outputs 5
Or more simply:
get(var.name) # 5
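Applied to the loop in the question, get() can look up each data frame by name, and [[ ]] can then pick the columns by name. This is only a minimal sketch: it rebuilds the example data from the question, and the xlim/ylim arguments on the first call are an addition so that both subsets fit in the plot region.

```r
# Example data from the question
dat  <- data.frame(num = 1:10, sq = (1:10)^2, cu = (1:10)^3)
set1 <- subset(dat, num > 5)
set2 <- subset(dat, num <= 5)

colors <- c("red", "blue")
sets   <- c("set1", "set2")
vars   <- c("sq", "cu", "num")

for (i in seq_along(sets)) {
  d <- get(sets[i])                      # string -> data frame
  symbols(d[[vars[1]]], d[[vars[2]]],    # string -> column
          circles = d[[vars[3]]],
          bg = colors[i],
          add = i > 1,                   # first call opens the plot
          xlim = range(dat$sq), ylim = range(dat$cu))
}
```

Keeping the data frames in a named list (for example list(set1 = set1, set2 = set2)) and indexing that list would avoid the name lookup altogether.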

Without any example data, it really is difficult to know exactly what you are wanting. For instance, I can't at all divine what your object set (or is it sets) looks like.
That said, does the following help at all?
set1 <- data.frame(x = 4:6, y = 6:4, z = c(1, 3, 5))
plot(1:10, type="n")
XX <- "set1"
with(eval(as.symbol(XX)), symbols(x, y, circles = z, add=TRUE))
EDIT:
Now that I see your real task, here is a one-liner that'll do everything you want without requiring any for() loops:
with(dat, symbols(sq, cu, circles = num,
bg = c("red", "blue")[(num>5) + 1]))
The one bit of code that may feel odd is the bit specifying the background color. Try out these two lines to see how it works:
c(TRUE, FALSE) + 1
# [1] 2 1
c("red", "blue")[c(F, F, T, T) + 1]
# [1] "red" "red" "blue" "blue"

If you want to use a string as a variable name, you can use assign:
var1="string_name"
assign(var1, c(5,4,5,6,7))
string_name
[1] 5 4 5 6 7

Subsetting the data and combining them back is unnecessary. So are loops since those operations are vectorized. From your previous edit, I'm guessing you are doing all of this to make bubble plots. If that is correct, perhaps the example below will help you. If this is way off, I can just delete the answer.
library(ggplot2)
# let's look at the included dataset named trees.
# ?trees for a description
data(trees)
ggplot(trees,aes(Height,Volume)) + geom_point(aes(size=Girth))
# Great, now how do we color the bubbles by groups?
# For this example, I'll divide Volume into three groups: lo, med, high
trees$set[trees$Volume<=22.7]="lo"
trees$set[trees$Volume>22.7 & trees$Volume<=45.4]="med"
trees$set[trees$Volume>45.4]="high"
ggplot(trees,aes(Height,Volume,colour=set)) + geom_point(aes(size=Girth))
# Instead of just circles scaled by Girth, let's also change the symbol
ggplot(trees,aes(Height,Volume,colour=set)) + geom_point(aes(size=Girth,pch=set))
# Now let's choose a specific symbol for each set. Full list of symbols at ?pch
trees$symbol[trees$Volume<=22.7]=1
trees$symbol[trees$Volume>22.7 & trees$Volume<=45.4]=2
trees$symbol[trees$Volume>45.4]=3
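# symbol is numeric, so use the identity scale to draw the pch codes as given
ggplot(trees,aes(Height,Volume,colour=set)) + geom_point(aes(size=Girth,shape=symbol)) + scale_shape_identity()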

What works best for me is using quote() and eval() together.
For example, let's print each column using a for loop:
library(magrittr) # provides %>%
Columns <- names(dat)
for (i in 1:ncol(dat)){
dat[, eval(quote(Columns[i]))] %>% print
}

Related

How to concatenate NOT as character in R?

I want to build iris$Sepal.Length by concatenation, so I can use it in a function to get the Sepal.Length column from the iris data frame. But when I use paste, as in paste("iris$", colnames(iris[3])), the result is a character string (with quotes), like "iris$Sepal.Length". I need the result not as a character. I have tried noquote(), as.data.frame() etc. but it doesn't work.
freq <- function(y) {
for (i in iris) {
count <-1
y <- paste0("iris$",colnames(iris[count]))
data.frame(as.list(y))
print(y)
span = seq(min(y),max(y), by = 1)
freq = cut(y, breaks = span, right = FALSE)
table(freq)
count = count +1
}
}
freq(1)
The crux of your problem isn't making that object not be a string, it's convincing R to do what you want with the string. You can do this with, e.g., eval(parse(text = foo)). Isolating out a small working example:
y <- "iris$Sepal.Length"
data.frame(as.list(y)) # does not display iris$Sepal.Length
data.frame(as.list(eval(parse(text = y)))) # DOES display iris$Sepal.Length
That said, I wanted to point out some issues with your function:
The input variable appears to not do anything (because it is immediately overwritten), which may not have been intended.
The for loop seems broken, since it resets count to 1 on each pass, which I think you didn't mean. Relatedly, it iterates over all i in iris, but then it doesn't use i in any meaningful way other than to keep a count. Instead, you could do something like for (count in 1:length(iris)), which would establish the count variable and iterate it for you as well.
It's generally better to avoid for loops in R entirely; there is a whole family of functions for applying a function to (e.g.) every column of a data frame. As a very simple version of this, something like apply(iris, 2, table) will apply the table function along margin 2 (the columns) of iris and, in this case, place the results in a list. The idea would be to build your function to do what you want to a single vector, then pass each vector through the function with something from the apply() family. For instance:
cleantable <- function(x) {
myspan = seq(min(x), max(x)) # if unspecified, by = 1
myfreq = cut(x, breaks = myspan, right = FALSE)
table(myfreq)
}
apply(iris[1:4], 2, cleantable) # can only use first 4 columns since 5th isn't numeric
would do what I think you were trying to do on the first 4 columns of iris. This way of programming will be generally more readable and less prone to mistakes.
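If the column name really does arrive as a string, [[ ]] indexing accepts it directly, so no eval(parse()) is needed. The sketch below is a variant of the approach above and not the answerer's code: bin_counts is a hypothetical helper that widens the breaks to whole numbers so every value lands in a bin.

```r
# Hypothetical helper: count values in unit-wide bins covering the full range
bin_counts <- function(x) {
  breaks <- seq(floor(min(x)), ceiling(max(x)), by = 1)
  table(cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE))
}

# A string can select a column directly with [[ ]]
col_name <- "Sepal.Length"
same <- identical(iris[[col_name]], iris$Sepal.Length)

# Loop over column names (strings) without pasting "iris$" onto them
num_cols <- names(iris)[1:4]
counts <- lapply(num_cols, function(nm) bin_counts(iris[[nm]]))
names(counts) <- num_cols
```

Each element of counts is a frequency table for one numeric column, e.g. counts[["Petal.Width"]].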

ggplot loop over columns and show plot in one page [duplicate]

This question already has answers here:
R assigning ggplot objects to list in loop
As you can see from the title, I ran into a problem that already took me one entire afternoon.
I have a data frame, which can be accessed from here. I want to plot several columns against some other columns, one pair of columns at a time to be precise.
Therefore, I created a data.frame to store the pairs of column names that I want to plot against each other:
varname.df<-data.frame(num=c(1:9),
cust= c("custEnvironment",'custCommunity','custHuman','custEmp','custDiversity',
'custProduct','custCorp',"custtotal.index","custtotal.noHC.index"),
firm=c("firmEnvironment",'firmCommunity','firmHuman','firmEmp','firmDiversity',
'firmProduct','firmCorp',"firmtotal.index","firmtotal.noHC.index"))
## factor to character
i<-sapply(varname.df,is.factor)
varname.df[i]<-lapply(varname.df[i], as.character)
rm(i)
then plot using ggplot2 and store the resultant figure in a list, see the code below:
## data I provided in the link above
temp<-read.xlsx('sample data.xlsx',1)
f <- list()
for(i in 1:9) { #dim(varname.df)[1]
# p[[i]]<-plot(x = SC.csr[,varname.df[i,'cust']],y = SC.csr[,varname.df[i,'firm']])
dat<-subset(temp,select = c(varname.df[i,'cust'], varname.df[i,'firm']))
pc1 <- ggplot(dat,aes(y = dat[,1], x = dat[,2])) +
# labs(title="Plot of CSR", x =colnames(dat)[2], y = colnames(dat)[1]) +
geom_point()
f[[i]]<-pc1
print(pc1)
Sys.sleep(5)
rm(pc1,pc2,pc3)
}
do.call(grid.arrange,f)
This is supposed to work like the answers here and here. However, it gives me the exact same points in every figure in the grid, even though running the for loop visibly produces a different figure at each iteration.
After debugging for nearly an afternoon, it seems that whenever I add a new ggplot object to the list, it changes the data points of all the other ggplot objects in the same list.
This is weird and frustrating, because no error is thrown, yet something is clearly wrong. Any suggestion would be deeply appreciated.
-----------"EDIT"-------
problem solved here, the 3rd answer.
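The symptom is consistent with ggplot2 evaluating aes() expressions lazily: aes(y = dat[,1]) is only evaluated when the plot is drawn, and by the time grid.arrange() runs, dat holds the last subset. One possible way around it, sketched below, is to refer to columns by name through the .data pronoun and to build each plot inside a function, so every plot keeps its own column names. The data frame temp and the column names here are made-up stand-ins for the spreadsheet.

```r
library(ggplot2)

# Hypothetical stand-in for the spreadsheet data
temp <- data.frame(custA = 1:5, firmA = c(2, 4, 6, 8, 10),
                   custB = 5:1, firmB = c(1, 1, 2, 3, 5))
pairs <- data.frame(cust = c("custA", "custB"),
                    firm = c("firmA", "firmB"),
                    stringsAsFactors = FALSE)

# Each call gets its own environment, so xn/yn are fixed per plot
plots <- lapply(seq_len(nrow(pairs)), function(i) {
  xn <- pairs$firm[i]
  yn <- pairs$cust[i]
  ggplot(temp, aes(x = .data[[xn]], y = .data[[yn]])) +
    geom_point() +
    labs(x = xn, y = yn)
})
```

The resulting list can then be passed to do.call(grid.arrange, plots) as in the question.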
If you would like to work with facets you could try this:
varname.df<-data.frame(num=c(1:9),
cust= c("custEnvironment",'custCommunity','custHuman','custEmp','custDiversity',
'custProduct','custCorp',"custtotal.index","custtotal.noHC.index"),
firm=c("firmEnvironment",'firmCommunity','firmHuman','firmEmp','firmDiversity',
'firmProduct','firmCorp',"firmtotal.index","firmtotal.noHC.index"))
## factor to character
i<-sapply(varname.df,is.factor)
varname.df[i]<-lapply(varname.df[i], as.character)
rm(i)
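library(dplyr); library(reshape2); library(stringr); library(ggplot2)
temp<-read.xlsx('sample data.xlsx',1)
## keep a row index so that each cust value is paired with the firm value from the same row
temp1=temp%>%select(c(varname.df$cust))%>%mutate(row=row_number())%>%melt(id.vars="row",value.name = "y")%>%mutate(id=str_replace(variable,"cust",""))
temp2=temp%>%select(c(varname.df$firm))%>%mutate(row=row_number())%>%melt(id.vars="row",value.name = "x")%>%mutate(id=str_replace(variable,"firm",""))
temp0=merge(temp1,temp2,by=c("id","row"))%>%select(id,x,y)
ggplot(temp0,aes(x=x,y=y))+geom_point()+facet_grid(.~id)+xlab("Firm")+ylab("Cust")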
If you prefer to store your plots in a list and then plot them in a grid, the code below seems to do the trick (plot_grid() comes from the cowplot package).
varname.df<-data.frame(num=c(1:9),
cust= c("custEnvironment",'custCommunity','custHuman','custEmp','custDiversity',
'custProduct','custCorp',"custtotal.index","custtotal.noHC.index"),
firm=c("firmEnvironment",'firmCommunity','firmHuman','firmEmp','firmDiversity',
'firmProduct','firmCorp',"firmtotal.index","firmtotal.noHC.index"))
## factor to character
i<-sapply(varname.df,is.factor)
varname.df[i]<-lapply(varname.df[i], as.character)
rm(i)
temp<-read.xlsx('sample data.xlsx',1)
f <- list()
for(i in 1:9) { #dim(varname.df)[1]
f[[i]]<-subset(temp,select = c(varname.df[i,'cust'], varname.df[i,'firm']))
}
plotlist=lapply(1:9,function(x) ggplot(f[[x]],aes(y = f[[x]][,1], x = f[[x]][,2])) +geom_point()+xlab("Firm")+
ylab("Cust"))
plot_grid(plotlist=plotlist)

Applying multiple functions via sapply

I'm trying to replicate the solution for applying multiple functions in sapply posted on R-Bloggers, but I can't get it to work in the desired manner. I'm working with a simple data set, similar to the one generated below:
require(datasets)
crs_mat <- cor(mtcars)
# Triangle function
get_upper_tri <- function(cormat){
cormat[lower.tri(cormat)] <- NA
return(cormat)
}
require(reshape2)
crs_mat <- melt(get_upper_tri(crs_mat))
I would like to replace some text values across columns Var1 and Var2. The erroneous syntax below illustrates what I am trying to achieve:
crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
# Replace first phrase
gsub("mpg","MPG",x),
# Replace second phrase
gsub("gear", "GeArr",x)
# Ideally, perform other changes
})
Naturally, the code is not syntactically correct and fails. To summarise, I would like to do the following:
Go through all the values in first two columns (Var1 and Var2) and perform simple replacements via gsub.
Ideally, I would like to avoid defining a separate function, as discussed in the linked post and keep everything within the sapply syntax
I don't want a nested loop
I had a look at the broadly similar subject discussed here and here but, if possible, I would like to avoid making use of plyr. I'm also interested in replacing the column values not in creating new columns and I would like to avoid specifying any column names. While working with my existing data frame it is more convenient for me to use column numbers.
Edit
Following very useful comments, what I'm trying to achieve can be summarised in the solution below:
fun.clean.columns <- function(x, str_width = 15) {
# Make character
x <- as.character(x)
# Replace various phrases
x <- gsub("perc85","something else", x)
x <- gsub("again", x) # incomplete as posted: gsub() needs pattern, replacement and x
x <- gsub("more","even more", x)
x <- gsub("abc","ohmg", x)
# Clean spaces
x <- trimws(x)
# Wrap strings
x <- str_wrap(x, width = str_width)
# Return object
return(x)
}
mean_data[,1:2] <- sapply(mean_data[,1:2], fun.clean.columns)
I don't need this function in my global.env, so I can run rm after this, but an even nicer solution would involve squeezing this within the apply syntax.
We can use mgsub from library(qdap) to replace multiple patterns. Here, I am looping over the first and second columns using lapply and assigning the results back to crs_mat[,1:2]. Note that I am using lapply instead of sapply, as lapply keeps the structure intact:
library(qdap)
crs_mat[,1:2] <- lapply(crs_mat[,1:2], mgsub,
pattern=c('mpg', 'gear'), replacement=c('MPG', 'GeArr'))
Here is a start of a solution for you, I think you're capable of extending it yourself. There's probably more elegant approaches available, but I don't see them atm.
crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
# Replace first phrase
step1 <- gsub("mpg","MPG",x)
# Replace second phrase. Note that this operates on a modified dataframe.
step2 <- gsub("gear", "GeArr",step1)
# Ideally, perform other changes
return(step2)
#or one nested line, not practical if more needs to be done
#return(gsub("gear", "GeArr",gsub("mpg","MPG",x)))
})
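For a longer list of replacements, the chain of gsub calls can be folded with Reduce, which stays in base R and keeps everything inside the lapply call. This is a sketch under assumptions: the small crs_mat below is a made-up stand-in for the melted correlation matrix, and replacements is a hypothetical named vector (names are patterns, values are replacements).

```r
# Made-up stand-in for the melted correlation matrix
crs_mat <- data.frame(Var1 = c("mpg", "gear", "cyl"),
                      Var2 = c("gear", "mpg", "hp"),
                      value = 1:3,
                      stringsAsFactors = TRUE)

# Hypothetical lookup: names are patterns, values are replacements
replacements <- c(mpg = "MPG", gear = "GeArr")

crs_mat[, 1:2] <- lapply(crs_mat[, 1:2], function(x) {
  # Apply each replacement in turn to the running result
  Reduce(function(acc, pat) gsub(pat, replacements[[pat]], acc, fixed = TRUE),
         names(replacements),
         as.character(x))
})
```

Note that the replaced columns come back as character vectors, not factors.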

What's the shortest way of creating a load of R objects with consecutive names?

This is what I've got at the moment:
weights0 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights1 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights2 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights3 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights4 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights5 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights0 <- 1 # sets initial weights to 1
Nice and clear, but not nice and short!
Would experienced R programmers write this in a different way?
EDIT:
Also, is there an established way of creating a number of weights that depends on a pre-existing variable to make this generalisable? For example, the parameter num.cons would equal 5: the number of constraints (and hence weights) that we need. Imagine this is a common programming problem, so sure there is a solution.
Option 1
If you want to create the different elements in your environment, you can do it with a for loop and assign. Other options are sapply and the envir argument of assign
for (i in 0:5)
assign(paste0("weights", i), array(dim=c(nrow(ind),nrow(all.msim))))
Option 2
However, as @Axolotl9250 points out, depending on your application, more often than not it makes sense to have these all in a single list
weights <- lapply(rep(NA, 6), array, dim=c(nrow(ind),nrow(all.msim)))
Then to assign to weights0 as you have above, you would use
weights[[1]][ ] <- 1
note the empty [ ] which is important to assign to ALL elements of weights[[1]]
Option 3
As per @flodel's suggestion, if all of your arrays have the same dim, you can create one big array with an extra dim of length equal to the number of objects you have (i.e., 6):
weights <- array(dim=c(nrow(ind),nrow(all.msim), 6))
Note that for any of the options:
If you want to assign to all elements of an array, you have to use empty brackets. For example, in option 3, to assign to the 1st array, you would use:
weights[,,1][] <- 1
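To make the list approach depend on num.cons, as asked in the question's edit, something along these lines may work. The matrices ind and all.msim below are hypothetical stand-ins with arbitrary sizes, since the real objects are not shown.

```r
# Hypothetical stand-ins for the objects in the question
ind      <- matrix(0, nrow = 4, ncol = 2)
all.msim <- matrix(0, nrow = 3, ncol = 2)

num.cons <- 5  # number of constraints, as in the question

# One empty array per weight set: weights0 ... weights<num.cons>
weights <- replicate(num.cons + 1,
                     array(dim = c(nrow(ind), nrow(all.msim))),
                     simplify = FALSE)
names(weights) <- paste0("weights", 0:num.cons)

# Set the initial weights to 1, keeping the array shape
weights[["weights0"]][] <- 1
```

Individual arrays are then reached by name, e.g. weights[["weights3"]], or by position.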
I've just tried to have a go at achieving this but with no joy, maybe someone else is better than I (most likely!!). However I can't help but feel maybe it's easier to have all the arrays in a single object, a list; that way a single lapply line would do, and instead of referring to weights1 weights2 weights3 weights4 it would be weights[[1]] weights[[2]] weights[[3]] weights[[4]]. Future operations on those arrays would then also be achieved by the apply family of functions. Sorry I can't get it exactly as you describe.
Given what you're doing, just using a for loop is quick and intuitive:
# create a character vector containing all the variable names you want..
variable.names <- paste0( 'weights' , 0:5 )
# look at it.
variable.names
# create the value to provide _each_ of those variable names
variable.value <- array( dim=c( nrow(ind) , nrow(all.msim) ) )
# assign them all
for ( i in variable.names ) assign( i , variable.value )
# look at what's now in memory
ls()
# look at any of them
weights4

Changing arguments in tapply?

I have several groups, let's say A, B, C, and I want to cut another variable based on these groups, i.e. each group has specific breaks for the same variable.
If I had to calculate the group means, I'd use tapply like this:
tapply(mydata$var,mydata$group,mean)
Unfortunately I do not know how to fix this for cut with changing breaks=c(...) arguments for different groups.
tapply(mydata$var,mydata$group,cut)
Any suggestions? I'd like to do it with tapply, but any other solution except a custom-made function would be suitable too.
EDIT: some small example:
test <- data.frame(var=rnorm(100,0,1),
group=c(rep("A",30),
rep("B",20),
rep("C",50)))
# for group A:
cut(test$var,breaks=c(-4,0,4))
# for group B
cut(test$var,breaks=c(-4,1,4))
and so on...
I'm going to put my mind-reading hat on here and take a stab that you want something like this:
dat <- data.frame(x = runif(100),grp = rep(letters[1:3],length.out = 100))
mapply(cut,split(dat$x,dat$grp),list(c(-Inf,0.5,Inf),
c(-Inf,0.1,0.5,0.9,Inf),
c(-Inf,0.25,0.5,0.75,Inf)))
So this is simply splitting x by grp and applying cut to each piece using different breaks for each piece.
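If the binned values are needed back in the original row order, for example as a new column, the pieces can be reassembled with unsplit(). The following is a sketch of that idea, assuming the breaks are kept in a named list keyed by group; the labels are converted to character because each group has its own set of levels.

```r
set.seed(1)
dat <- data.frame(x = runif(100), grp = rep(letters[1:3], length.out = 100))

# Group-specific breaks, keyed by group name
breaks <- list(a = c(-Inf, 0.5, Inf),
               b = c(-Inf, 0.1, 0.5, 0.9, Inf),
               c = c(-Inf, 0.25, 0.5, 0.75, Inf))

pieces <- split(dat$x, dat$grp)

# Cut each piece with its own breaks; keep interval labels as text
binned <- Map(function(v, b) as.character(cut(v, breaks = b)),
              pieces, breaks[names(pieces)])

# Put the labels back in the original row order
dat$bin <- unsplit(binned, dat$grp)
```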
Actually, R behaves quite cleverly here. I found a solution that works the way I initially had in mind, though it does not use the apply family. ifelse() drops the factor attributes, so the result holds the integer bin codes instead of factors, which is why this solution has no problem with differing factor levels like Joran mentions.
dat <- data.frame(x = rnorm(100),grp = rep(letters[1:3],length.out = 100))
ifelse(dat$grp == "a",cut(dat$x,breaks=c(-Inf,0.1,0.2,Inf)),
ifelse(dat$grp == "b",cut(dat$x,breaks=c(-Inf,0.1,1,Inf)),
cut(dat$x,breaks=c(-Inf,0.9,2,Inf))) )
