How to change column names while in a loop in R?

for(i in 1:3){
names <- c("n1","n2","n3")
assign(paste0("mydf",i), data.frame(matrix("", nrow = 3, ncol = 3)))
}
I tried the code shown below but it didn't work.
for(i in 1:3){
names <- c("n1","n2","n3")
assign(paste0("mydf",i), names(data.frame(matrix("", nrow = 3, ncol = 3)))[1:3] <- names)
}
What's your solution? Thanks in advance.

This is the approach I would take. The following script not only changes the column names but also creates 3 data frames in the global environment, much like your original script.
for (i in 1:3){
noms <- c("n1","n2","n3") # create the names in order the columns appear in the dataframe
df_ <- data.frame(matrix("", nrow = 3, ncol = 3)) # create the dataframe
df_nom <- paste("mydf", i, sep = "") # create the dataframe name
colnames(df_) <- noms # assign the names to the columns
assign(df_nom, df_) # store the dataframe under the generated name
}
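A side note on why the attempt in the question fails: a replacement call such as names(x) <- value needs a variable on its left-hand side, so it cannot be applied directly to the result of data.frame(...). One possible workaround, sketched here with setNames() (which returns the renamed object and is not used in the answer above), keeps everything inside a single assign() call. The variable name nms is only illustrative.

```r
nms <- c("n1", "n2", "n3")
for (i in 1:3) {
  # setNames() returns the renamed data frame, so no left-hand side is needed
  assign(paste0("mydf", i),
         setNames(data.frame(matrix("", nrow = 3, ncol = 3)), nms))
}
names(mydf2)
#[1] "n1" "n2" "n3"
```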

1) Normally one puts the data frames in a list, but if you really want to put them into the current environment then do the following. If you want the global environment, replace the first line with e <- .GlobalEnv, or if you want to create a list instead (preferable), use e <- list().
# define 3 data frames
e <- environment() # or e <- .GlobalEnv or e <- list()
nms <- paste0("mydf", 1:3)
for(nm in nms) e[[nm]] <- data.frame(matrix("", 3, 3))
# change their column names
for(nm in nms) names(e[[nm]]) <- c("n1", "n2", "n3")
2) Even better if we want lists is:
L <- Map(function(x) data.frame(matrix("", 3, 3)), paste0("mydf", 1:3))
L[] <- lapply(L, `names<-`, c("n1", "n2", "n3")) # change col names
Converting
Note that we can convert a list L to data frames that are loose in the environment using one of these depending on which environment you want to put the list components into.
list2env(L, environment())
list2env(L, .GlobalEnv)
and we can go the other way using mget, where e is environment() or .GlobalEnv depending on what we need. We can omit the e argument if the data frames are in the current environment.
L <- mget(nms, e)
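As a rough illustration of that round trip, the sketch below pushes a named list into an environment and pulls it back out. It uses a fresh environment created with new.env() instead of the current or global one; that choice, and the name env_demo, are assumptions made here so the example does not touch your workspace.

```r
nms <- paste0("mydf", 1:3)

# a named list of three data frames that already carry the desired column names
L <- setNames(
  lapply(nms, function(nm) {
    setNames(data.frame(matrix("", 3, 3)), c("n1", "n2", "n3"))
  }),
  nms
)

env_demo <- new.env()                     # hypothetical target environment
invisible(list2env(L, envir = env_demo))  # list -> loose objects
L2 <- mget(nms, envir = env_demo)         # loose objects -> list, in the order of nms

names(L2)
#[1] "mydf1" "mydf2" "mydf3"
```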

You can use get to get the data.frame by name, update the names and assign it back.
nNames <- c("n1","n2","n3")
for(i in 1:3) {
D <- paste0("mydf",i)
tt <- get(D)
names(tt) <- nNames
assign(D, tt)
}
names(mydf1)
#[1] "n1" "n2" "n3"
Alternatively the names could already be set when creating the matrix by using dimnames:
nNames <- c("n1","n2","n3")
for(i in 1:3) {
assign(paste0("mydf", i),
data.frame(matrix("", 3, 3, dimnames=list(NULL, nNames))))
}
names(mydf1)
#[1] "n1" "n2" "n3"

Related

How to order multiple dataframes in Global Environment R

I'm trying to run a simulation but I'm having trouble storing multiple dataframes called "data_i" in a list ordered by i. I start with a df called "data_", which has data from 1901 to 2032 (132 rows). I apply a loop to create one dataframe per row called data_1, data_2, data_3, ..., data_132 (the row for 2032 is stored in data_132). Finally, I store all these dataframes in a list and use lapply to create a column in each dataframe. Here is a reproducible example:
library(stringr) # provides str_extract()
#Main dataframe
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time,b)
#Loop for creating data_i where i goes from 1 to 132
simulations <- 10000
for (i in 1:132) {
assign(paste("data_",i, sep = ""), as.data.frame( sapply(data_[i,], function(n) rep(n,simulations)), stringsAsFactors = FALSE ))
}
#Store all dataframes in list (**I THINK THE PROBLEM IS HERE**)
data_names<-str_extract(ls(), '^data_[[:digit:]]{1,3}$')[!is.na(str_extract(ls(), '^data_[[:digit:]]{1,3}$'))]
dataframes<-lapply(data_names, function(x)get(x))
#Create a new column in each dataframe
new_list <- lapply(dataframes, function(x) cbind(x, production = as.numeric(runif(simulations, min = 50, max = 100))))
#Create data_newi in environnment
list2env(setNames(new_list,paste0("data_new", seq_along(dataframes))),
envir = parent.frame())
The code runs, but the problem is that the order of the dataframes is not data_1, data_2, data_3, ..., data_132 but data_1, data_10, data_100, data_101, ... As a result, data_names stores the names in that order. This leads to, for example, 2032 not being in data_new132 as I would want it to be.
Does anybody know how to solve this? Thanks in advance!
Andres, See if this helps. I added a pad of '0' for the max number of characters (e.g. 132 = 3 characters wide):
library(stringr) # provides str_pad() and str_extract()
#Main dataframe
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time,b)
#Loop for creating data_i where i goes from 1 to 132
simulations <- 10000
for (i in 1:132) {
assign(paste("data_",str_pad(i,nchar(max(b)),pad="0"), sep = ""), as.data.frame( sapply(data_[i,], function(n) rep(n,simulations)), stringsAsFactors = FALSE ))
}
#Store all dataframes in list (**I THINK THE PROBLEM IS HERE**)
data_names<-str_extract(ls(), '^data_[[:digit:]]{1,3}$')[!is.na(str_extract(ls(), '^data_[[:digit:]]{1,3}$'))]
dataframes<-lapply(data_names, function(x)get(x))
#Create a new column in each dataframe
new_list <- lapply(dataframes, function(x) cbind(x, production = as.numeric(runif(simulations, min = 50, max = 100))))
#Create data_newi in environnment
list2env(setNames(new_list,paste0("data_new", paste(str_pad(seq_along(dataframes),nchar(max(seq_along(dataframes))),pad="0"),sep=""))),
envir = parent.frame())
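If you would rather keep the unpadded names, another option is to sort the names by their numeric suffix before fetching the objects. The following is a minimal base R sketch under that assumption; the vector nms stands in for the result of ls(), and the names suffix and nms_sorted are illustrative.

```r
# stand-in for the names returned by ls(), in its alphabetical order
nms <- c("data_1", "data_10", "data_100", "data_101", "data_132", "data_2")

# strip the prefix, convert the remainder to integer, and order by it
suffix <- as.integer(sub("^data_", "", nms))
nms_sorted <- nms[order(suffix)]
nms_sorted
#[1] "data_1"   "data_2"   "data_10"  "data_100" "data_101" "data_132"
```

With the names in numeric order, something like mget(nms_sorted) would then return the data frames in the same order.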
1) Use mixedsort in gtools:
library(gtools)
for(i in c(2, 10)) assign(paste0("data", i), i)
ls(pattern = "^data")
## [1] "data10" "data2"
mixedsort(ls(pattern = "^data"))
## [1] "data2" "data10"
2) or ensure that the names are the same length using leading 0's in which case ls() will sort them appropriately:
for(i in c(2, 10)) assign(sprintf("data%03d", i), i)
ls(pattern = "^data")
## [1] "data002" "data010"
3) Normally one does not assign such objects directly into the global environment but puts them into a list. One can refer to elements using L[[1]], etc.
L <- list()
for(i in 1:3) L[[i]] <- i
L
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
3a) or in one line:
L <- lapply(1:3, function(i) i)
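Applied to the simulation in the question, the list approach might look like the sketch below. It builds the per-row data frames directly inside lapply, so the order never depends on ls(). The names sim_list and out are illustrative, and simulations is set to 5 here only to keep the example small (the question uses 10000).

```r
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time, b)
simulations <- 5  # small value for illustration only

sim_list <- lapply(seq_len(nrow(data_)), function(i) {
  out <- data_[rep(i, simulations), ]  # repeat row i `simulations` times
  rownames(out) <- NULL
  out$production <- runif(simulations, min = 50, max = 100)
  out
})
names(sim_list) <- paste0("data_new", seq_along(sim_list))

sim_list[["data_new132"]]$time[1]
#[1] 2032
```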

split list into atomic character vectors by name [duplicate]

A post on here a day back has me wondering how to assign values to multiple objects in the global environment from within a function. This is my attempt using lapply (assign may be safer than <<- but I have never actually used it and am not familiar with it).
#fake data set
df <- data.frame(
x.2=rnorm(25),
y.2=rnorm(25),
g=rep(factor(LETTERS[1:5]), 5)
)
#split it into a list of data frames
LIST <- split(df, df$g)
#pre-allot 5 objects in R with class data.frame()
V <- W <- X <- Y <- Z <- data.frame()
#attempt to assign the data frames in the LIST to the objects just created
lapply(seq_along(LIST), function(x) c(V, W, X, Y, Z)[x] <<- LIST[[x]])
Please feel free to shorten any/all parts of my code to make this work (or work better/faster).
Update of 2018-10-10:
The most succinct way to carry out this specific task is to use list2env() like so:
## Create an example list of five data.frames
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
## Assign them to the global environment
list2env(LIST, envir = .GlobalEnv)
## Check that it worked
ls()
## [1] "A" "B" "C" "D" "df" "E" "LIST"
Original answer, demonstrating use of assign()
You're right that assign() is the right tool for the job. Its envir argument gives you precise control over where assignment takes place -- control that is not available with either <- or <<-.
So, for example, to assign the value of X to an object named NAME in the global environment, you would do:
assign("NAME", X, envir = .GlobalEnv)
In your case:
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
NAMES <- c("V", "W", "X", "Y", "Z")
lapply(seq_along(LIST),
function(x) {
assign(NAMES[x], LIST[[x]], envir=.GlobalEnv)
}
)
ls()
## [1] "df" "LIST" "NAMES" "V" "W" "X" "Y" "Z"
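To make the role of the envir argument concrete, here is a small sketch comparing assign() with and without it inside a function. It uses a scratch environment named target (an assumption for this example) as a stand-in for .GlobalEnv, so running it leaves your workspace clean; the function names are also illustrative.

```r
target <- new.env()  # hypothetical stand-in for .GlobalEnv

store_local <- function() {
  assign("res_local", 1)                   # lands in the function's own frame
  invisible(NULL)
}
store_target <- function() {
  assign("res_target", 2, envir = target)  # lands in the chosen environment
  invisible(NULL)
}

store_local()
store_target()
exists("res_local", envir = target, inherits = FALSE)
#[1] FALSE
exists("res_target", envir = target, inherits = FALSE)
#[1] TRUE
```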
I think this question can have a nice crossover with this one: Can lists be created that name themselves based on input object names?
Say you want to do the same modification to a set of objects on the fly. But list2env() requires a named list, and you don't want to copy and paste the names again. Borrowing the namedList function and combining it with Josh O'Brien's answer:
> namedList <- function(...) {
+ L <- list(...)
+ snm <- sapply(substitute(list(...)), deparse)[-1]
+ if (is.null(nm <- names(L))) nm <- snm
+ if (any(nonames <- nm=="")) nm[nonames] <- snm[nonames]
+ setNames(L ,nm)
+ }
>
> df_1 <- data.frame(x = 1)
> df_2 <- data.frame(x = 2)
> df_3 <- data.frame(x = 3)
>
> list2env(lapply(namedList(df_1, df_2, df_3), function(x) {
+ x <- cbind.data.frame(x, y = "B")
+ }), envir = .GlobalEnv)
<environment: R_GlobalEnv>
>
> df_1
x y
1 1 B
> df_2
x y
1 2 B
> df_3
x y
1 3 B
If you have a list of object names and file paths you can also use mapply:
object_names <- c("df_1", "df_2", "df_3")
file_paths <- list.files({path}, pattern = "\\.csv$", full.names = TRUE) # pattern is a regular expression
mapply(function(df_name, file)
assign(df_name, read.csv(file), envir=.GlobalEnv),
object_names,
file_paths)
I used list.files() to construct a vector of all the .csv files in a
specific directory. But file_paths could be written or constructed in any way.
If the files you want to read in are in the current working
directory, then file_paths could be replaced with a character vector of
file names.
In the code above, you need to replace {path} with a
string of the desired directory's path.
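A list-based variant of the same idea can be sketched as follows. It derives the object names from the file names instead of typing them by hand, and keeps the data frames in a named list. The temporary directory, the two small CSV files, and the names demo_dir and csv_list are all made up for this example.

```r
# set up two small CSV files in a temporary directory (illustrative data)
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "df_1.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(demo_dir, "df_2.csv"), row.names = FALSE)

file_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
# "df_1.csv" -> "df_1"
object_names <- tools::file_path_sans_ext(basename(file_paths))

csv_list <- setNames(lapply(file_paths, read.csv), object_names)
names(csv_list)
#[1] "df_1" "df_2"
```

If loose objects are really needed, list2env(csv_list, envir = .GlobalEnv) would place them in the global environment.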
This demonstrates how to split out a nested dataframe into objects in the global environment with tidyverse functions:
library(tidyverse)
library(palmerpenguins)
penguins %>%
group_nest(species) %>%
deframe() %>%
list2env(.GlobalEnv)

subsetting a data.frame using a for loop

I have a data.frame, and I want to subset it every 10 rows, apply a function to each subset, save the resulting object, and remove the previous object. Here is what I have so far:
L3 <- LETTERS[1:20]
df <- data.frame(1:391, "col", sample(L3, 391, replace = TRUE))
names(df) <- c("a", "b", "c")
b <- seq(from=1, to=391, by=10)
nsamp <- 0
for(i in seq_along(b)){
a <- i+1
nsamp <- nsamp+1
df_10 <- df[b[nsamp]:b[a], ]
res <- lapply(seq_along(df_10$b), function(x){...})
saveRDS(res, file="res.rds")
rm(res)
}
My problem is that the for loop crashes when reaching the last element of my sequence b.
When partitioning data, split is your friend. It will create a list with each data subset as an item which is then easy to iterate over.
dfs = split(df, (seq_len(nrow(df)) - 1) %/% 10) # zero-based index so the first chunk also has 10 rows
Then your for loop can be simplified to something like this (untested... I'm not exactly sure what you're doing because example data seems to switch from df to sc2_10 and I only hope your column named b is different from your vector named b):
for(i in seq_along(dfs)){
res <- lapply(seq_along(dfs[[i]]$b), function(x){...})
saveRDS(res, file = sprintf("res_%s.rds", i))
rm(res)
}
I also modified your save file name so that you aren't overwriting the same file every time.
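For reference, the chunk sizes produced by split() can be checked with a small sketch like the one below. It assumes a zero-based row index as the grouping variable, which gives chunks of exactly 10 rows plus one remainder chunk; the loop in the question fails at the last step because b[a] runs past the end of b and becomes NA. The name grp is illustrative.

```r
df <- data.frame(a = 1:391)

# rows 1-10 -> group 0, rows 11-20 -> group 1, ..., row 391 -> group 39
grp <- (seq_len(nrow(df)) - 1) %/% 10
dfs <- split(df, grp)

length(dfs)
#[1] 40
nrow(dfs[[1]])
#[1] 10
nrow(dfs[[length(dfs)]])
#[1] 1
```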
