From the `environment` to a `list` - r

I have a set of datasets that end with .fin. I would like to create a list and merge them using
ls(pattern = ".fin")
"A.fin" "B.fin" "C.fin" "D.fin" "E.fin" "F.fin" "G.fin" "H.fin" "I.fin"
"J.fin" "K.fin" "L.fin" "M.fin" "N.fin"
I would like to go from the code above to the line below beginning with list, something like list(ls(pattern = ".fin")); however, this only returns a list containing a character vector of the dataset names. I have also tried list(get(ls(pattern = ".fin"))) and list(eval(parse(text = ls(pattern = ".fin")))), to no avail.
list(ls(pattern = ".fin")) %>% ### <- REPLACE THIS SOMEHOW
Reduce(function(dtf1,dtf2) full_join(dtf1,dtf2,by="i"), .)

You can use mget:
mget(ls(pattern = ".fin"))
A.fin <- c(1,2,3)
B.fin <- c(4,5,6)
mget(ls(pattern = ".fin"))
#$A.fin
#[1] 1 2 3
#$B.fin
#[1] 4 5 6
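To connect this back to the merge in the question, here is a minimal sketch. It assumes every object is a data frame sharing the key column "i" (taken from the by = "i" in the question); the example data, the separate environment env, and the use of base merge(all = TRUE) in place of dplyr's full_join are my own choices for a self-contained illustration.

```r
# Hypothetical example data: three data frames sharing a key column "i",
# stored in their own environment so the sketch does not touch the workspace
env <- new.env()
env$A.fin <- data.frame(i = 1:3, a = c(10, 20, 30))
env$B.fin <- data.frame(i = 2:4, b = c("x", "y", "z"))
env$C.fin <- data.frame(i = c(1L, 4L), c = c(TRUE, FALSE))

# Escape the dot and anchor the pattern so only names ending in ".fin" match
fin_names <- ls(envir = env, pattern = "\\.fin$")

# mget() returns a named list with one element per object
fin_list <- mget(fin_names, envir = env)

# merge(..., all = TRUE) keeps unmatched rows from both sides (a full join)
merged <- Reduce(function(d1, d2) merge(d1, d2, by = "i", all = TRUE), fin_list)
merged$i  # 1 2 3 4
```

In the global environment you would drop the envir arguments; the pattern "\\.fin$" is stricter than ".fin", which also matches names such as "Xafin" because an unescaped dot matches any character.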

get is not vectorized, so you need to loop over whatever ls() returns. You can do that either with
sapply(ls(pattern = ".fin"), FUN = get, simplify = FALSE)
or the long way
xy <- ls(pattern = ".fin")
mylist <- vector("list", length(xy))
for (i in seq_along(mylist)) {
  mylist[[i]] <- get(xy[i])
}
or use mget(ls(pattern = ".fin")).
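One thing to watch with the sapply route: sapply() simplifies its result by default, so equal-length objects can come back as a matrix rather than a list. A small sketch of the difference, using a throwaway environment env and the example vectors from the other answer (both are only for illustration):

```r
env <- new.env()
env$A.fin <- c(1, 2, 3)
env$B.fin <- c(4, 5, 6)
nms <- ls(envir = env, pattern = "\\.fin$")

# Default simplification: two length-3 vectors become a 3 x 2 matrix
simplified <- sapply(nms, get, envir = env)
dim(simplified)  # 3 2

# simplify = FALSE keeps a named list
as_list <- sapply(nms, get, envir = env, simplify = FALSE)

# lapply() never simplifies, but the names have to be added by hand
same <- setNames(lapply(nms, get, envir = env), nms)
identical(as_list, same)  # TRUE
```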

Related

How to order multiple dataframes in Global Environment R

I'm trying to run a simulation but I'm having trouble storing multiple dataframes called "data_i" in a list ordered by i. I start with a df called "data_", which has data from 1901 to 2032 (132 rows). I apply a loop to create one dataframe per row, called data_1, data_2, data_3, ..., data_132 (the row for 2032 is stored in data_132). Finally, I store all these dataframes in a list and use lapply to create a column in each dataframe. Here is a reproducible example:
library(stringr)  # provides str_extract()
#Main dataframe
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time,b)
#Loop for creating data_i where i goes from 1 to 132
simulations <- 10000
for (i in 1:132) {
assign(paste("data_",i, sep = ""), as.data.frame( sapply(data_[i,], function(n) rep(n,simulations)), stringsAsFactors = FALSE ))
}
#Store all dataframes in list (**I THINK THE PROBLEM IS HERE**)
data_names<-str_extract(ls(), '^data_[[:digit:]]{1,3}$')[!is.na(str_extract(ls(), '^data_[[:digit:]]{1,3}$'))]
dataframes<-lapply(data_names, function(x)get(x))
#Create a new column in each dataframe
new_list <- lapply(dataframes, function(x) cbind(x, production = as.numeric(runif(simulations, min = 50, max = 100))))
#Create data_newi in environnment
list2env(setNames(new_list,paste0("data_new", seq_along(dataframes))),
envir = parent.frame())
The code runs, but the problem is that the order of the dataframes is not data_1, data_2, data_3, ..., data_132 but data_1, data_10, data_100, data_101, ... As a result, data_names stores the names in that order, so that, for example, 2032 does not end up in data_new132 as I want it to.
Does anybody know how to solve this? Thanks in advance!
Andres, see if this helps. I added a pad of '0' up to the maximum number of characters (e.g. 132 is 3 characters wide):
library(stringr)  # provides str_pad() and str_extract()
#Main dataframe
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time,b)
#Loop for creating data_i where i goes from 1 to 132
simulations <- 10000
for (i in 1:132) {
assign(paste("data_",str_pad(i,nchar(max(b)),pad="0"), sep = ""), as.data.frame( sapply(data_[i,], function(n) rep(n,simulations)), stringsAsFactors = FALSE ))
}
#Store all dataframes in list (**I THINK THE PROBLEM IS HERE**)
data_names<-str_extract(ls(), '^data_[[:digit:]]{1,3}$')[!is.na(str_extract(ls(), '^data_[[:digit:]]{1,3}$'))]
dataframes<-lapply(data_names, function(x)get(x))
#Create a new column in each dataframe
new_list <- lapply(dataframes, function(x) cbind(x, production = as.numeric(runif(simulations, min = 50, max = 100))))
#Create data_newi in environnment
list2env(setNames(new_list,paste0("data_new", paste(str_pad(seq_along(dataframes),nchar(max(seq_along(dataframes))),pad="0"),sep=""))),
envir = parent.frame())
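If renaming the objects is not an option, another way is to sort the existing names by their numeric suffix. This is a sketch in base R; the vector nms stands in for what ls() would return, and the "data_" prefix is taken from the question.

```r
# Names as ls() would return them: sorted as text, not by number
nms <- c("data_1", "data_10", "data_100", "data_132", "data_2")

# Strip the prefix, convert the remainder to integer, and order by it
idx <- as.integer(sub("^data_", "", nms))
sorted <- nms[order(idx)]
sorted  # "data_1" "data_2" "data_10" "data_100" "data_132"
```

The sorted vector can then be passed to lapply(sorted, get) or mget(sorted) so that the list elements follow the numeric order.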
1) Use mixedsort in gtools:
library(gtools)
for(i in c(2, 10)) assign(paste0("data", i), i)
ls(pattern = "^data")
## [1] "data10" "data2"
mixedsort(ls(pattern = "^data"))
## [1] "data2" "data10"
2) or ensure that the names are the same length using leading 0's in which case ls() will sort them appropriately:
for(i in c(2, 10)) assign(sprintf("data%03d", i), i)
ls(pattern = "^data")
## [1] "data002" "data010"
3) Normally one does not assign such objects directly into the global environment but puts them into a list. One can refer to elements using L[[1]], etc.
L <- list()
for(i in 1:3) L[[i]] <- i
L
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
3a) or in one line:
L <- lapply(1:3, function(i) i)
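Applied to the simulation in the question, the list approach might look like the sketch below. It builds the per-row data frames directly inside lapply, so the order is the row order and no ls() call is involved. The small value of simulations is only to keep the example light (the question uses 10000), and the names sims and data_new1, data_new2, ... are illustrative.

```r
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time, b)
simulations <- 5  # the question uses 10000

sims <- lapply(seq_len(nrow(data_)), function(i) {
  # Repeat row i `simulations` times
  out <- data_[rep(i, simulations), , drop = FALSE]
  rownames(out) <- NULL
  # Add the random production column
  out$production <- runif(simulations, min = 50, max = 100)
  out
})
names(sims) <- paste0("data_new", seq_along(sims))

sims[["data_new132"]]$time[1]  # 2032
```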

split list into atomic character vectors by name [duplicate]

A post on here a day back has me wondering how to assign values to multiple objects in the global environment from within a function. This is my attempt using lapply (assign may be safer than <<- but I have never actually used it and am not familiar with it).
#fake data set
df <- data.frame(
x.2=rnorm(25),
y.2=rnorm(25),
g=rep(factor(LETTERS[1:5]), 5)
)
#split it into a list of data frames
LIST <- split(df, df$g)
#pre-allot 5 objects in R with class data.frame()
V <- W <- X <- Y <- Z <- data.frame()
#attempt to assign the data frames in the LIST to the objects just created
lapply(seq_along(LIST), function(x) c(V, W, X, Y, Z)[x] <<- LIST[[x]])
Please feel free to shorten any/all parts of my code to make this work (or work better/faster).
Update of 2018-10-10:
The most succinct way to carry out this specific task is to use list2env() like so:
## Create an example list of five data.frames
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
## Assign them to the global environment
list2env(LIST, envir = .GlobalEnv)
## Check that it worked
ls()
## [1] "A" "B" "C" "D" "df" "E" "LIST"
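list2env() is not tied to the global environment. As a sketch, the same call can target a dedicated environment (here a hypothetical e created with new.env()), which keeps the workspace clean, and mget() brings the objects back as a list:

```r
df <- data.frame(x = 1:10, g = rep(c("A", "B"), 5))
LIST <- split(df, df$g)

# Unpack the list into its own environment instead of .GlobalEnv
e <- new.env()
list2env(LIST, envir = e)
ls(e)  # "A" "B"

# Round trip: collect the objects back into a named list
back <- mget(names(LIST), envir = e)
```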
Original answer, demonstrating use of assign()
You're right that assign() is the right tool for the job. Its envir argument gives you precise control over where assignment takes place -- control that is not available with either <- or <<-.
So, for example, to assign the value of X to an object named NAME in the global environment, you would do:
assign("NAME", X, envir = .GlobalEnv)
In your case:
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
NAMES <- c("V", "W", "X", "Y", "Z")
lapply(seq_along(LIST),
function(x) {
assign(NAMES[x], LIST[[x]], envir=.GlobalEnv)
}
)
ls()
## [1] "df" "LIST" "NAMES" "V" "W" "X" "Y" "Z"
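Since the question asks about doing this from within a function, here is a minimal sketch of a helper that takes the target environment as an argument. The function name assign_all and its arguments are hypothetical; passing envir = .GlobalEnv would reproduce the behaviour above.

```r
# Hypothetical helper: assign each element of `values` under the matching name
assign_all <- function(values, names, envir) {
  stopifnot(length(values) == length(names))
  for (k in seq_along(values)) {
    assign(names[k], values[[k]], envir = envir)
  }
  invisible(envir)
}

# Use a scratch environment here so the example has no side effects
target <- new.env()
assign_all(list(1:3, letters[1:2]), c("V", "W"), envir = target)
ls(target)  # "V" "W"
```

Making the environment an explicit argument keeps the side effect visible to the caller, which is the control that <<- does not offer.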
I think this question can have a nice crossover with this one: Can lists be created that name themselves based on input object names?
Say you want to apply the same modification to a set of objects on the fly. list2env() requires a named list, and you don't want to type the object names again. Borrowing the namedList function and combining it with Josh O'Brien's answer:
> namedList <- function(...) {
+ L <- list(...)
+ snm <- sapply(substitute(list(...)), deparse)[-1]
+ if (is.null(nm <- names(L))) nm <- snm
+ if (any(nonames <- nm=="")) nm[nonames] <- snm[nonames]
+ setNames(L ,nm)
+ }
>
> df_1 <- data.frame(x = 1)
> df_2 <- data.frame(x = 2)
> df_3 <- data.frame(x = 3)
>
> list2env(lapply(namedList(df_1, df_2, df_3), function(x) {
+ x <- cbind.data.frame(x, y = "B")
+ }), envir = .GlobalEnv)
<environment: R_GlobalEnv>
>
> df_1
x y
1 1 B
> df_2
x y
1 2 B
> df_3
x y
1 3 B
If you have a list of object names and file paths you can also use mapply:
object_names <- c("df_1", "df_2", "df_3")
file_paths <- list.files({path}, pattern = "\\.csv$", full.names = TRUE)
mapply(function(df_name, file)
assign(df_name, read.csv(file), envir=.GlobalEnv),
object_names,
file_paths)
I used list.files() to construct a vector of all the .csv files in a
specific directory. But file_paths could be written or constructed in any way.
If the files you want to read in are in the current working
directory, then file_paths could be replaced with a character vector of
file names.
In the code above, you need to replace {path} with a
string of the desired directory's path.
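A variation on the same idea is to read the files into a named list rather than into the global environment. The sketch below writes two small CSV files to a temporary directory purely so it can run on its own; the directory, file names, and contents are made up for the example.

```r
# Create a scratch directory with two example CSV files
tmp <- tempfile("csv_demo_")
dir.create(tmp)
write.csv(data.frame(x = 1:2), file.path(tmp, "df_1.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(tmp, "df_2.csv"), row.names = FALSE)

# Anchor the pattern so only names ending in ".csv" match
file_paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Derive object names from the file names instead of typing them
object_names <- tools::file_path_sans_ext(basename(file_paths))

# One data frame per file, named after its file
tables <- setNames(lapply(file_paths, read.csv), object_names)
names(tables)  # "df_1" "df_2"
```

If separate objects are still wanted afterwards, list2env(tables, envir = .GlobalEnv) would unpack the list.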
This demonstrates how to split out a nested dataframe into objects in the global environment with tidyverse functions:
library(tidyverse)
library(palmerpenguins)
penguins %>%
group_nest(species) %>%
deframe() %>%
list2env(.GlobalEnv)

Perform Student's t-test between data.frames contained in two lists

I have two separate lists, each containing 4 data.frames. I need to perform a Student's t-test (t.test) on rainfall between the corresponding data.frames of the two lists.
Here are the lists:
lst1 = list(data.frame(rnorm(20), rnorm(20)), data.frame(rnorm(25), rnorm(25)), data.frame(rnorm(16), rnorm(16)), data.frame(rnorm(34), rnorm(34)))
lst1 = lapply(lst1, setNames, c('rainfall', 'snow'))
lst2 = list(data.frame(rnorm(19), rnorm(19)), data.frame(rnorm(38), rnorm(38)), data.frame(rnorm(22), rnorm(22)), data.frame(rnorm(59), rnorm(59)))
lst2 = lapply(lst2, setNames, c('rainfall', 'snow'))
What I would need to do is:
t.test(lst1[[1]]$rainfall, lst2[[1]]$rainfall)
t.test(lst1[[2]]$rainfall, lst2[[2]]$rainfall)
t.test(lst1[[3]]$rainfall, lst2[[3]]$rainfall)
t.test(lst1[[4]]$rainfall, lst2[[4]]$rainfall)
I can do it as above by writing out each of the 4 comparisons (I actually have 40 data.frames per list in my real data), but I would like to know if there is a smarter and quicker way to do it.
Here is what I tried (without success):
myfunction = function(x,y) {
test = t.test(x, y)
return(test)
}
result = mapply(myfunction, x=lst1, y=lst2)
x <- NULL
for (i in seq_along(lst1)){
x[[i]] <- t.test(lst1[[i]]$rainfall, lst2[[i]]$rainfall)
}
x
Works for me once the function compares the rainfall columns. I would use SIMPLIFY = FALSE to get the results formatted better though.
lst1 <- list()
lst1[[1]] <- data.frame(rainfall = rnorm(10))
lst1[[2]] <- data.frame(rainfall = rnorm(10))
lst2 <- list()
lst2[[1]] <- data.frame(rainfall = rnorm(10))
lst2[[2]] <- data.frame(rainfall = rnorm(10))
myfunction = function(x,y) {
test = t.test(x$rainfall, y$rainfall)
return(test)
}
mapply(myfunction, x = lst1, y = lst2, SIMPLIFY = FALSE)
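If only a summary is needed, the list of test results can be reduced to one number per pair. This is a sketch using Map (which behaves like mapply with SIMPLIFY = FALSE) and vapply; the list sizes and the seed are arbitrary choices for the example.

```r
set.seed(1)  # only so the example is reproducible
lst1 <- lapply(c(20, 25), function(n) data.frame(rainfall = rnorm(n), snow = rnorm(n)))
lst2 <- lapply(c(19, 38), function(n) data.frame(rainfall = rnorm(n), snow = rnorm(n)))

# One t-test per pair of data frames, kept as a list of "htest" objects
tests <- Map(function(x, y) t.test(x$rainfall, y$rainfall), lst1, lst2)

# Pull out the p-value of each test as a plain numeric vector
p_values <- vapply(tests, function(tt) tt$p.value, numeric(1))
```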
