I have many objects in the global environment, and I want to loop over some of them (those whose names match a pattern) to run a calculation.
For example, how do I
(1) select DataX1, DataX2, DataX3 for a loop,
(2) add 1 to each of them, and then
(3) store each result in a new object with a name pattern like r.DataX1?
DataX1 <- sample(1:10)
DataX2 <- sample(1:10)
DataX3 <- sample(1:10)
DataY1 <- sample(1:10)
DataY2 <- sample(1:10)
DataY3 <- sample(1:10)
list = apropos("DataX.")
for (i in 1:3){
X <- list[[i]]
r.DataXi <- X + 1
}
BTW, how do I select all the data frames in the current global environment whose names begin with "D", have a "Y" in the middle, and end with "3", and then loop over them?
Thank you.
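One possible approach, sketched under a few assumptions: ls() with a regular expression picks the names, mget() fetches the objects into a named list, and lapply() does the calculation. The names env, x_names, x_list, results and y3_names are illustrative and not from the question. The results stay in a named list rather than being written back as separate objects.

```r
set.seed(1)
DataX1 <- sample(1:10)
DataX2 <- sample(1:10)
DataX3 <- sample(1:10)
DataY1 <- sample(1:10)
DataY3 <- sample(1:10)

# At top level, environment() is the global environment
env <- environment()

# (1) select the objects whose names match the pattern
x_names <- ls(env, pattern = "^DataX[0-9]+$")
x_list <- mget(x_names, envir = env)

# (2) add 1 to each, (3) keep the results in a list named r.DataX1, ...
results <- lapply(x_list, function(v) v + 1)
names(results) <- paste0("r.", names(results))

# Names starting with "D", containing "Y", and ending with "3"
y3_names <- ls(env, pattern = "^D.*Y.*3$")
# To keep only data frames among the matches:
# Filter(is.data.frame, mget(y3_names, envir = env))
```

If separate objects are really needed, list2env(results, envir = env) would create r.DataX1, r.DataX2 and r.DataX3 from the list.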
If I want to have variables with numbers accessible for example in a for loop I can use get and assign:
for(i in 1:2){
assign(paste0('a',toString(i)),i*pi)
}
get('a2')
output
[1] 6.283185
But what if I want to do something similar for a dataframe?
I would like to do something like
df<-data.frame(matrix(ncol = 2,nrow = 3))
varnames <- c()
for(i in 1:2){
varnames <- c(varnames, paste0('a', toString(i)))
}
colnames(df) <- varnames
for(i in 1:2){
assign(paste0('df$a',toString(i)), rep(i*pi,3))
}
get(paste0('df$a',toString(2)))
But this just creates variables called df$a1 and df$a2 instead of assigning c(i*pi, i*pi, i*pi) to the columns of the data frame df.
What I really want is to be able to manipulate individual entries of whole columns, like this:
for(i in 1:2){
for(j in 1:3)
assign(paste0('df$a',toString(i),'[',toString(j),']'), i*pi)
}
get(paste0('df$a',toString(2),'[2]'))
where I would be able to get df$a2[2].
I think something like a python dictionary would work too.
Instead of assign, just use [[ directly:
for(i in 1:2) df[[paste0('a', i)]] <- rep(i * pi, 3)
and then can get the value with
df[[paste0('a', 2)]][2]
[1] 6.283185
assign can be used, but it is not recommended when we can do this more directly:
for(i in 1:2) assign("df",`[[<-`(df, paste0('a', i), value = i * pi))
df[[paste0('a', 2)]][1]
[1] 6.283185
The get should be called on the object itself, i.e. 'df', rather than on its columns:
get('df')[[paste0('a', 2)]][1]
First of all, it is not generally a great idea to use assign to create objects in the global environment. In preference, you should be creating a named list instead for all sorts of good reasons, not least of which is the ability to iterate over the objects you create.
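A minimal sketch of the named-list alternative, assuming the same a1, a2 naming as in the question. The names results and doubled are illustrative only. A named list behaves much like a Python dictionary here.

```r
# Build a named list instead of calling assign() in a loop
results <- list()
for (i in 1:2) {
  results[[paste0("a", i)]] <- rep(i * pi, 3)
}

# Look up a single entry by name and position
results[[paste0("a", 2)]][2]

# Iterating over everything that was created is a single lapply() call
doubled <- lapply(results, function(v) v * 2)
```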
Secondly, note that the block of code:
varnames <- c()
for(i in 1:2){
varnames <- c(varnames, paste0('a', toString(i)))
}
colnames(df) <- varnames
Can be replaced with the one-liner:
colnames(df) <- paste0("a", 1:2)
Finally, you should take advantage of R's vectorization and the ability to subset with ["colname"] notation. This removes the need for an explicit loop altogether here:
df[paste0("a", 1:2)] <- sapply(1:2, \(i) rep(i * pi, 3))
df
#> a1 a2
#> 1 3.141593 6.283185
#> 2 3.141593 6.283185
#> 3 3.141593 6.283185
I have a data.frame, and I want to take a subset of it every 10 rows, apply a function to each subset, save the resulting object, and then remove it before moving on. Here is what I have so far:
L3 <- LETTERS[1:20]
df <- data.frame(1:391, "col", sample(L3, 391, replace = TRUE))
names(df) <- c("a", "b", "c")
b <- seq(from=1, to=391, by=10)
nsamp <- 0
for(i in seq_along(b)){
a <- i+1
nsamp <- nsamp+1
df_10 <- df[b[nsamp]:b[a], ]
res <- lapply(seq_along(df_10$b), function(x){...})
saveRDS(res, file="res.rds")
rm(res)
}
My problem is that the for loop crashes when it reaches the last element of my sequence b.
When partitioning data, split is your friend. It will create a list with each data subset as an item which is then easy to iterate over.
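# subtract 1 so that rows 1-10 share group 0; 1:nrow(df) %/% 10 would leave only 9 rows in the first group
dfs = split(df, (seq_len(nrow(df)) - 1) %/% 10)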
Then your for loop can be simplified to something like this (untested... I'm not exactly sure what you're doing because example data seems to switch from df to sc2_10 and I only hope your column named b is different from your vector named b):
for(i in seq_along(dfs)){
res <- lapply(seq_along(dfs[[i]]$b), function(x){...})
saveRDS(res, file = sprintf("res_%s.rds", i))
rm(res)
}
I also modified your save file name so that you aren't overwriting the same file every time.
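As a small check of the chunking, here is a sketch with a simplified stand-in for the question's data frame. The names chunk_id and chunk_sizes are illustrative. It also suggests why the original loop fails: b has 40 elements, so on the last iteration b[a] is b[41], which is NA, and b[40]:NA is an error.

```r
# Simplified stand-in for the 391-row data frame in the question
df <- data.frame(a = 1:391, b = "col")

# Group ids 0, 0, ..., 0, 1, 1, ...: ten consecutive rows per id
chunk_id <- (seq_len(nrow(df)) - 1) %/% 10
dfs <- split(df, chunk_id)

# Number of rows in each chunk; the final chunk holds the leftover row
chunk_sizes <- vapply(dfs, nrow, integer(1))
length(dfs)
table(chunk_sizes)
```

With 391 rows this gives 39 full chunks of 10 rows plus one final chunk of a single row, and split() handles that short last chunk without any special casing.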
A post on here a day back has me wondering how to assign values to multiple objects in the global environment from within a function. This is my attempt using lapply (assign may be safer than <<- but I have never actually used it and am not familiar with it).
#fake data set
df <- data.frame(
x.2=rnorm(25),
y.2=rnorm(25),
g=rep(factor(LETTERS[1:5]), 5)
)
#split it into a list of data frames
LIST <- split(df, df$g)
#pre-allot 5 objects in R with class data.frame()
V <- W <- X <- Y <- Z <- data.frame()
#attempt to assign the data frames in the LIST to the objects just created
lapply(seq_along(LIST), function(x) c(V, W, X, Y, Z)[x] <<- LIST[[x]])
Please feel free to shorten any/all parts of my code to make this work (or work better/faster).
Update of 2018-10-10:
The most succinct way to carry out this specific task is to use list2env() like so:
## Create an example list of five data.frames
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
## Assign them to the global environment
list2env(LIST, envir = .GlobalEnv)
## Check that it worked
ls()
## [1] "A" "B" "C" "D" "df" "E" "LIST"
Original answer, demonstrating use of assign()
You're right that assign() is the right tool for the job. Its envir argument gives you precise control over where assignment takes place -- control that is not available with either <- or <<-.
So, for example, to assign the value of X to an object named NAME in the global environment, you would do:
assign("NAME", X, envir = .GlobalEnv)
In your case:
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
NAMES <- c("V", "W", "X", "Y", "Z")
lapply(seq_along(LIST),
function(x) {
assign(NAMES[x], LIST[[x]], envir=.GlobalEnv)
}
)
ls()
[1] "df" "LIST" "NAMES" "V" "W" "X" "Y" "Z"
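The two approaches can also be combined. As a sketch, setNames() renames the list elements to the desired object names and list2env() then creates them in one call. The environment target is a hypothetical stand-in used here so the example does not touch the global environment; passing envir = .GlobalEnv instead would match the question.

```r
df <- data.frame(x = rnorm(25),
                 g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
NAMES <- c("V", "W", "X", "Y", "Z")

# Hypothetical target environment; use .GlobalEnv to create global objects
target <- new.env()

# Rename the list elements, then create one object per element
list2env(setNames(LIST, NAMES), envir = target)
ls(target)
```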
I think this question can have a nice crossover with this one: Can lists be created that name themselves based on input object names?
Say you want to apply the same modification to a set of objects on the fly. list2env() requires a named list, and you don't want to copy and paste the object names again. Borrowing the namedList function and combining it with Josh O'Brien's answer:
> namedList <- function(...) {
+ L <- list(...)
+ snm <- sapply(substitute(list(...)), deparse)[-1]
+ if (is.null(nm <- names(L))) nm <- snm
+ if (any(nonames <- nm=="")) nm[nonames] <- snm[nonames]
+ setNames(L ,nm)
+ }
>
> df_1 <- data.frame(x = 1)
> df_2 <- data.frame(x = 2)
> df_3 <- data.frame(x = 3)
>
> list2env(lapply(namedList(df_1, df_2, df_3), function(x) {
+ x <- cbind.data.frame(x, y = "B")
+ }), envir = .GlobalEnv)
<environment: R_GlobalEnv>
>
> df_1
x y
1 1 B
> df_2
x y
1 2 B
> df_3
x y
1 3 B
If you have a list of object names and file paths you can also use mapply:
object_names <- c("df_1", "df_2", "df_3")
file_paths <- list.files({path}, pattern = "\\.csv$", full.names = TRUE)
mapply(function(df_name, file)
assign(df_name, read.csv(file), envir=.GlobalEnv),
object_names,
file_paths)
I used list.files() to construct a vector of all the .csv files in a specific directory, but file_paths could be written or constructed in any way. If the files you want to read in are in the current working directory, then file_paths could be replaced with a character vector of file names. In the code above, you need to replace {path} with a string containing the desired directory's path.
This demonstrates how to split out a nested dataframe into objects in the global environment with tidyverse functions:
library(tidyverse)
library(palmerpenguins)
penguins %>%
group_nest(species) %>%
deframe() %>%
list2env(.GlobalEnv)
I want to access variables in .GlobalEnv from inside a function. Basically, I want to concatenate some number of data frames into a matrix.
Here is some dummy code:
Alpha <- data.frame(lon=124.9167,lat=1.53333)
Alpha_2 <- data.frame(lon=3.13333, lat=42.48333)
Alpha_3 <- data.frame(lon=-91.50667, lat=27.78333)
myfunc <- function(x){
vars <- ls(.GlobalEnv, pattern=x)
mat <- as.matrix(rbind(vars[1], vars[2], vars[3]))
return(mat)
}
When calling myfunc('Alpha') I would like the same thing to be returned as when you run:
as.matrix(rbind(Alpha, Alpha_2, Alpha_3))
lon lat
1 124.91670 1.53333
2 3.13333 42.48333
3 -91.50667 27.78333
Any pointers would be appreciated, thanks!
You can use get to retrieve a variable by name, or mget to retrieve several at once as a named list. Here mget fetches every matching object, and do.call(rbind, ...) binds them together.
myfunc <- function(x){
vars <- ls(.GlobalEnv, pattern=x)
df <- do.call(rbind, mget(vars, .GlobalEnv)) # courtesy @Roland
return(df)
}
myfunc("Alpha")
#               lon      lat
# Alpha   124.91670  1.53333
# Alpha_2   3.13333 42.48333
# Alpha_3 -91.50667 27.78333
Note, in practice, you probably want to check that the variables that match the pattern actually are what you think they are, but this gives you the rough tools you want.
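One way to do that check, as a sketch: keep only the matches that are data frames before binding. The function name bind_matching, its envir argument, and the demo environment are hypothetical and not part of the answer above. The demo uses a separate environment so it does not depend on what is in the global one.

```r
# Hypothetical helper: bind only the data frames whose names match `pattern`
bind_matching <- function(pattern, envir = .GlobalEnv) {
  vars <- ls(envir, pattern = pattern)
  objs <- Filter(is.data.frame, mget(vars, envir = envir))
  as.matrix(do.call(rbind, objs))
}

# Demo environment with one non-data-frame object that also matches
demo <- new.env()
assign("Alpha",      data.frame(lon = 124.9167,  lat = 1.53333),  envir = demo)
assign("Alpha_2",    data.frame(lon = 3.13333,   lat = 42.48333), envir = demo)
assign("Alpha_3",    data.frame(lon = -91.50667, lat = 27.78333), envir = demo)
assign("Alpha_note", "not a data frame",                          envir = demo)

mat <- bind_matching("Alpha", envir = demo)
mat
```

Alpha_note matches the pattern but is dropped by Filter(), so the result is a 3 x 2 numeric matrix.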
Old version (2nd line of func):
df <- do.call(rbind, lapply(vars, get, envir=.GlobalEnv))
My question is: why does the last statement "a <- ..." give me the subset of that data frame within the list, while automating the same process with a for loop over all data frames in the list produces all kinds of warnings and not the answer I am looking for?
time <- c(1:20)
temp <- c(2,3,4,5,6,2,3,4,5,6,2,3,4,5,6,2,3,4,5,6)
data <- data.frame(time,temp)
tmp <- c(1,diff(data[[2]]))
tmp2 <- tmp < 0
tmp3 <- cumsum(tmp2)
data1 <- split(data, tmp3)
#this does not work. I want to automate the successful process below through all data frames in the list "data1"
for(i in 1:length(data1)){
finale[i] <- subset(data1[[i]], data1[[i]][,2] > 3)
}
#this works to give me a part of what I want
a <- subset(data1[[1]], data1[[1]][,2] >3)
You may want to try lapply:
lapply(data1, function(x) subset(x, x[,2]>3))
Same result using a for loop
finale <- vector("list", length(data1))
for(i in 1:length(data1)){
finale[[i]] <- subset(data1[[i]], data1[[i]][,2] > 3)
}
It works because I preallocate a type and a length for finale. It didn't work for you because you did not declare what finale should be.
You're trying to save a data.frame (a 2D object) in a vector (a 1D object). Just define finale as a list and the code will work:
time <- c(1:20)
temp <- c(2,3,4,5,6,2,3,4,5,6,2,3,4,5,6,2,3,4,5,6)
data <- data.frame(time,temp)
tmp <- c(1,diff(data[[2]]))
tmp2 <- tmp < 0
tmp3 <- cumsum(tmp2)
data1 <- split(data, tmp3)
# define finale as a list first, then fill it inside the loop
finale <- vector(mode='list')
for(i in 1:length(data1)){
finale[[i]] <- subset(data1[[i]], data1[[i]][,2] > 3) # Use [[i]] instead of [i]
}
To save all in 1 data.frame:
finale <- do.call(rbind, finale)
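Putting the pieces together, a compact sketch of the same task using lapply() followed by do.call(rbind, ...). The names groups, filtered and combined are illustrative and replace tmp3, finale and the final data frame from the code above.

```r
time <- 1:20
temp <- rep(c(2, 3, 4, 5, 6), times = 4)
data <- data.frame(time, temp)

# Start a new group each time temp drops
groups <- cumsum(c(1, diff(data$temp)) < 0)
data1 <- split(data, groups)

# Filter every data frame in the list, then stack the results
filtered <- lapply(data1, function(x) subset(x, x[, 2] > 3))
combined <- do.call(rbind, filtered)
combined
```

Each of the four groups keeps its three rows with temp above 3, so combined has 12 rows.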