assign and get for dataframe variables in r - r

If I want to have variables with numbers accessible for example in a for loop I can use get and assign:
for(i in 1:2){
assign(paste0('a',toString(i)),i*pi)
}
get('a2')
output
[1] 6.283185
But what if I want to do something similar for a dataframe?
I would like to do something like
df<-data.frame(matrix(ncol = 2,nrow = 3))
varnames <- c()
for(i in 1:2){
varnames <- c(varnames, paste0('a', toString(i)))
}
colnames(df) <- varnames
for(i in 1:2){
assign(paste0('df$a',toString(i)), rep(i*pi,3))
}
get(paste0('df$a',toString(2)))
But this actually just creates variables called df$a1, df$a2 instead of assigning c(i*pi,i*pi,i*pi) to the columns of the dataframe df
And what I really want is to be able to manipulate individual entries of whole columns, like this:
for(i in 1:2){
for(j in 1:3)
assign(paste0('df$a',toString(i),'[',toString(j),']'), i*pi)
}
get(paste0('df$a',toString(2),'[2]'))
where I would be able to get df$a2[2].
I think something like a python dictionary would work too.

Instead of assign, just assign to the column directly with [[
for(i in 1:2) df[[paste0('a', i)]] <- rep(i * pi, 3)
and then can get the value with
df[[paste0('a', 2)]][2]
[1] 6.283185
assign can be used, but it is not recommended when we can do this more directly
for(i in 1:2) assign("df",`[[<-`(df, paste0('a', i), value = i * pi))
df[[paste0('a', 2)]][1]
[1] 6.283185
The get should be called on the object, i.e. 'df', rather than on the columns, and the column is then extracted from the result:
get('df')[[paste0('a', 2)]][1]

First of all, it is not generally a great idea to use assign to create objects in the global environment. In preference, you should be creating a named list instead for all sorts of good reasons, not least of which is the ability to iterate over the objects you create.
Secondly, note that the block of code:
varnames <- c()
for(i in 1:2){
varnames <- c(varnames, paste0('a', toString(i)))
}
colnames(df) <- varnames
Can be replaced with the one-liner:
colnames(df) <- paste0("a", 1:2)
Finally, you should take advantage of R's vectorization and the ability to subset with ["colname"] notation. This removes the need for an explicit loop altogether here:
df[paste0("a", 1:2)] <- sapply(1:2, \(i) rep(i * pi, 3))
df
#> a1 a2
#> 1 3.141593 6.283185
#> 2 3.141593 6.283185
#> 3 3.141593 6.283185
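The named-list suggestion above (and the "python dictionary" idea from the question) can be sketched as follows. This is only an illustration; the names vals and df_from_list are hypothetical and not from the original posts.

```r
# Build a named list instead of separate variables a1, a2
vals <- setNames(lapply(1:2, function(i) rep(i * pi, 3)),
                 paste0("a", 1:2))

# Look up an element by a name constructed at run time
second <- vals[[paste0("a", 2)]][2]

# Modify a single entry the same way
vals[["a1"]][3] <- 0

# A list of equal-length vectors converts directly to a data frame
df_from_list <- as.data.frame(vals)
```

Because the elements live in one list, they can be iterated over with lapply or sapply, which is not possible with loose variables created by assign.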

Related

How to loop dataframes for calculation in global environment?

I have many data objects in the global environment, and I want to loop over some of them (those with a pattern in their names) for a calculation.
For example, how do I
(1) select DataX1, DataX2, DataX3 for a loop,
(2) add 1 to each of them, and then
(3) store each result in a new object with a name pattern like r.DataX1
DataX1 <- sample(1:10)
DataX2 <- sample(1:10)
DataX3 <- sample(1:10)
DataY1 <- sample(1:10)
DataY2 <- sample(1:10)
DataY3 <- sample(1:10)
list = apropos("DataX.")
for (i in 1:3){
X <- list[[i]]
r.DataXi <- X + 1
}
BTW, how can I select all the data frames in the current global environment whose names begin with a "D", have a "Y" in the middle, and end with a "3", and loop over them?
Thank you.
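One possible approach, sketched here under assumptions, is to collect the matching objects into a named list with ls and mget and store the results in a second named list rather than in new global variables. The names env, x_names, results and y3_names are hypothetical, and fixed vectors replace sample() so the result is reproducible.

```r
# environment() is the global environment when this runs at top level
env <- environment()

DataX1 <- 1:10
DataX2 <- 11:20
DataX3 <- 21:30
DataY3 <- 31:40

# (1) names matching the pattern; ls() takes a regular expression
x_names <- ls(env, pattern = "^DataX[0-9]+$")

# (2) fetch the objects as a named list and add 1 to each
results <- lapply(mget(x_names, envir = env), function(x) x + 1)

# (3) keep the results in one list, named r.DataX1, r.DataX2, ...
names(results) <- paste0("r.", x_names)

# Names starting with "D", containing "Y", and ending in "3"
y3_names <- ls(env, pattern = "^D.*Y.*3$")
```

If only data frames should be kept, the list returned by mget can be passed through Filter(is.data.frame, ...) before looping.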

Using sapply instead of loop in R

I have a function that requires 4 parameters:
myFun <- function(a,b,c,d){}
I have a matrix where each row contains the parameters:
myMatrix = matrix(c(a1,a2,b1,b2,c1,c2,d1,d2), nrow=2, ncol=4)
Currently I have a loop which feeds the parameters to myFun:
m <- myMatrix
i <- 1
someVector <- c()
while (i<(length(m[,1])+1)){
someVector[i] <-
myFun(m[i,1],m[i,2],m[i,3],m[i,4])
i = i+1
}
print(someVector)
What I would like to know is whether there is a better way to get the same result using sapply instead of a loop.
You can use mapply() here, which accepts one vector per argument and calls the function on corresponding elements. One way is to turn your matrix into a data frame first; as.data.frame() names the columns V1 to V4 by default.
df <- as.data.frame(myMatrix)
results <- mapply(myFun, df$V1, df$V2, df$V3, df$V4)
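As a self-contained sketch of the same idea, the matrix columns can also be passed to mapply directly, or sapply can iterate over row indices. The body of myFun and the numeric values below are made up for illustration, since the original leaves them unspecified.

```r
# Hypothetical stand-in for the four-parameter function
myFun <- function(a, b, c, d) a + b * c - d

# Each row holds one set of parameters: (1, 3, 5, 7) and (2, 4, 6, 8)
myMatrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2, ncol = 4)

# mapply: one vector per argument, evaluated element by element
res_mapply <- mapply(myFun,
                     myMatrix[, 1], myMatrix[, 2],
                     myMatrix[, 3], myMatrix[, 4])

# sapply: iterate over row indices instead
res_sapply <- sapply(seq_len(nrow(myMatrix)), function(i) {
  myFun(myMatrix[i, 1], myMatrix[i, 2], myMatrix[i, 3], myMatrix[i, 4])
})
```

Both calls return one value per row, in row order.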

Create list with named objects in R and retrieve parts of the objects from this list

I have several dataframes (full data and reduced data) and now I want to do a whole lot of analysis with kmeans and hclust. I want to be able to work in a loop and store the results in a list from which I can retrieve (parts of) the stored objects based on their names. The reason is that in R Markdown there is no good way to create new objects (and no, assign is NOT a good option to do so).
So the idea is that I create several kmeans objects in a for loop on several dataframes and put them in a list. But I can't seem to store them in such a way that I can name these objects. In my list everything gets cluttered up. See my example.
To retrieve (parts of) an object from the desired list, I don't know how to address these parts (see the last part).
set.seed(4711)
df <- data.frame(matrix(sample(0:6, 120, replace = TRUE), ncol = 15, nrow = 8))
list_of_kmeans_objects <- list()
for (i in 2:4){
list_of_kmeans_objects <- c(list_of_kmeans_objects, kmeans(df, centers = i))
}
Now I have a cluttered-up list of 36 items. But what I want is a list with 'items' which I also want to be named. My desired list would be:
C2_kmeans_df <- kmeans(df, centers = 2)
C3_kmeans_df <- kmeans(df, centers = 3)
C4_kmeans_df <- kmeans(df, centers = 4)
desired_list_of_kmeans <- list(C2_kmeans_df, C3_kmeans_df, C4_kmeans_df)
names(desired_list_of_kmeans)[1] <- "C2_kmeans_df"
names(desired_list_of_kmeans)[2] <- "C3_kmeans_df"
names(desired_list_of_kmeans)[3] <- "C4_kmeans_df"
Once I have this list, my last problem is how to extract, for example,
C3_kmeans_df$cluster #or
C4_kmeans_df$tot.withinss
from this list, using the names of the objects in the desired list?
Here is an option using lapply and setNames.
idx <- 2:4
out <- setNames(object = lapply(idx, function(i) kmeans(df, centers = i)),
nm = paste0("C", idx, "_kmeans_df"))
Check the names
names(out)
# [1] "C2_kmeans_df" "C3_kmeans_df" "C4_kmeans_df"
Access cluster
out$C2_kmeans_df$cluster
# [1] 2 1 2 1 2 1 2 1
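To address the second half of the question (extracting parts by name), a small sketch follows. It rebuilds the list so that it runs on its own; nm and tot are hypothetical names, and no printed output is shown because the cluster assignments depend on the random start.

```r
set.seed(4711)
df <- data.frame(matrix(sample(0:6, 120, replace = TRUE), ncol = 15, nrow = 8))

idx <- 2:4
out <- setNames(lapply(idx, function(i) kmeans(df, centers = i)),
                paste0("C", idx, "_kmeans_df"))

# Access one object by a name built at run time, then take a component
nm <- paste0("C", 3, "_kmeans_df")
clusters_c3 <- out[[nm]]$cluster

# Pull the same component out of every object in the list
tot <- sapply(out, function(km) km$tot.withinss)
```

The [[ operator accepts a character string, so the element name does not have to be typed literally as it does with $.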
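In your present for loop, c() splices the individual components of each kmeans object into the list rather than adding the object as a single element, which is why the list ends up cluttered.
The following code should do what you want: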
list_of_kmeans_objects <- list()
aaa <- 0
for (i in 2:4) {
aaa <- aaa+1
list_of_kmeans_objects[[aaa]] <- kmeans(df, centers=i)
names(list_of_kmeans_objects)[aaa] <- paste0("C", i, "_kmeans_df")
}

probe global variables to call inside function

I want to pass variables within the .Globalenv when inside a function. Basically concatenate x number of data frames into a matrix.
Here is some dummy code;
Alpha <- data.frame(lon=124.9167,lat=1.53333)
Alpha_2 <- data.frame(lon=3.13333, lat=42.48333)
Alpha_3 <- data.frame(lon=-91.50667, lat=27.78333)
myfunc <- function(x){
vars <- ls(.GlobalEnv, pattern=x)
mat <- as.matrix(rbind(vars[1], vars[2], vars[3]))
return(mat)
}
When calling myfunc('Alpha') I would like the same thing to be returned as when you run;
as.matrix(rbind(Alpha, Alpha_2, Alpha_3))
lon lat
1 124.91670 1.53333
2 3.13333 42.48333
3 -91.50667 27.78333
Any pointers would be appreciated, thanks!
You can use get to retrieve a variable by name, or mget to retrieve several at once. Here mget fetches every object whose name matches the pattern, and do.call(rbind, ...) binds them together.
myfunc <- function(x){
vars <- ls(.GlobalEnv, pattern=x)
df <- do.call(rbind, mget(vars, .GlobalEnv)) # courtesy @Roland
return(df)
}
myfunc("Alpha")
# lon lat
# 1 124.91670 1.53333
# 2 3.13333 42.48333
# 3 -91.50667 27.78333
Note, in practice, you probably want to check that the variables that match the pattern actually are what you think they are, but this gives you the rough tools you want.
Old version (2nd line of func):
df <- do.call(rbind, lapply(vars, get, envir=.GlobalEnv))
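The check mentioned in the note above could look roughly like this. It is a sketch, not part of the original answer: bind_matching, its envir argument and the Alpha_note object are hypothetical, and the pattern is anchored with ^ because ls() treats it as a regular expression.

```r
Alpha   <- data.frame(lon = 124.9167,  lat = 1.53333)
Alpha_2 <- data.frame(lon = 3.13333,   lat = 42.48333)
Alpha_3 <- data.frame(lon = -91.50667, lat = 27.78333)
Alpha_note <- "matches the pattern but is not a data frame"

bind_matching <- function(pattern, envir = .GlobalEnv) {
  objs <- mget(ls(envir, pattern = pattern), envir = envir)
  # Keep only the objects that really are data frames
  dfs <- Filter(is.data.frame, objs)
  if (length(dfs) == 0) stop("no data frames match pattern: ", pattern)
  as.matrix(do.call(rbind, dfs))
}

# environment() is the global environment when run at top level
mat <- bind_matching("^Alpha", envir = environment())
```

Without the filter, rbind would try to combine the character string with the data frames.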

vectorize replacement over 3d for loop

I would like to vectorize (or optimize in any way possible), the following 3d for loop:
dat: array with dim = c(n,n,m)
ref: matrix with dim = c(n,m)
for(i in 1:length(dat[,1,1])){
for(k in 1:length(dat[1,1,])){
dat[i,,k][dat[i,,k] > ref[i,k]] <- NA
}
}
The array I am working with is 7e3 x 7e3 x 2e2 so the for loop above is impractically expensive. To boot, I will need to perform two or three very similar operations (on different arrays), so any saved time will be multiplied.
Example dat and ref arrays:
dat <- array(seq(1,75), dim=c(5,5,3))
ref <- cbind(seq(6,10), seq(36,40), seq(61,65))
You can use this instead. It creates a new_ref array which is conformable to dat, so you can compare them directly:
new_ref <- aperm(array(ref, dim(dat)[c(1,3,2)]), c(1,3,2))
dat3 <- dat
dat3[dat3 > new_ref] <- NA
Comparison with your loop:
dat2 <- dat
for(i in 1:length(dat[,1,1])){
for(k in 1:length(dat[1,1,])){
dat2[i,,k][dat2[i,,k] > ref[i,k]] <- NA
}
}
identical(dat2, dat3)
#[1] TRUE
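An alternative way to build the conformable reference array, shown here only as a sketch, is to look up ref[i, k] for every cell using slice.index. The names i_idx, k_idx, ref_full, dat_vec and dat_loop are hypothetical. Note that the index vectors are as large as dat itself, so for very large arrays this costs extra memory.

```r
dat <- array(seq(1, 75), dim = c(5, 5, 3))
ref <- cbind(seq(6, 10), seq(36, 40), seq(61, 65))

# For every cell of dat, its index along dimension 1 and dimension 3
i_idx <- as.vector(slice.index(dat, 1))
k_idx <- as.vector(slice.index(dat, 3))

# Matrix indexing picks ref[i, k] for each cell; reshape to dat's dimensions
ref_full <- array(ref[cbind(i_idx, k_idx)], dim = dim(dat))

dat_vec <- dat
dat_vec[dat_vec > ref_full] <- NA

# Loop version, for comparison
dat_loop <- dat
for (i in seq_len(dim(dat)[1])) {
  for (k in seq_len(dim(dat)[3])) {
    row_vals <- dat_loop[i, , k]
    row_vals[row_vals > ref[i, k]] <- NA
    dat_loop[i, , k] <- row_vals
  }
}
```

Comparing dat_vec and dat_loop with identical() is a quick way to confirm the two versions agree on the example data.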
