Naming dataframes based on counter iteration in R?

I have a loop that will spit out a bunch of dataframes, and want to name the dataframes based on current iteration of the loop, e.g. df1 for the first iteration, df2 for the second iteration, and so on.
However, I'm running into problems trying to use the loop iteration counter to construct the dataframe name. For example, let's imagine I am in the first iteration of the loop and want to name the dataframe:
counter <- 1
as.name(paste("df",counter,sep="")) <- data.frame(x = (1:10), y = (10:1))
I get an error
Error in as.name(paste("df", counter, sep = "")) <- data.frame(x = (1:10), :
target of assignment expands to non-language object
Does anyone know how I might use the counter information to create dataframe names?

This is meant to complement Richard's, as it felt a little too substantial to simply edit into his.
A typical code pattern for this sort of thing would be:
#Initialize an empty list of the desired length
dfs <- vector("list",3)
#Fill the list with data frames, naming as we go
for (i in seq_along(dfs)){
dfs[[i]] <- data.frame(x = runif(5),y = runif(5))
names(dfs)[[i]] <- paste0("df",i)
}
This avoids assign, which is generally considered poor style for this kind of task. If the naming of the data frames is very regular, you don't even need to do it in the loop:
names(dfs) <- paste0("df",seq_along(dfs))
Here the names are set in a single vectorized call. And as I mentioned below Richard's answer, keeping them all in a list is never worse, and usually better, than having them as separate objects. If you still want separate objects, you can convert the list via:
list2env(dfs,envir = .GlobalEnv)
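The same list-based pattern can also be written without an explicit loop. This is only a sketch: make_df is a hypothetical stand-in for whatever each iteration actually computes, and n is an assumed number of iterations.

```r
# Hypothetical helper standing in for the work done in each loop iteration
make_df <- function(i) data.frame(x = runif(5), y = runif(5))

n <- 3  # assumed number of data frames

# Build all data frames in one call, then name them df1, df2, df3
dfs <- setNames(lapply(seq_len(n), make_df), paste0("df", seq_len(n)))

names(dfs)     # "df1" "df2" "df3"
nrow(dfs$df2)  # 5
```

Individual data frames are then available as dfs$df1 or dfs[["df1"]], and the whole collection can be passed to lapply in one go.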

Instead of cluttering the global environment with data frames, it would be best to collect them in a list, and then you can use paste0 to name them in setNames with e.g.
> dfList <- setNames(list(data.frame(x = 1:10, y = 10:1)), paste0("df", 1))
after that you can refer to the data frame with
> dfList$df1
    x  y
1   1 10
2   2  9
3   3  8
4   4  7
5   5  6
6   6  5
7   7  4
8   8  3
9   9  2
10 10  1
As joran notes, if you insist on populating the global environment with these data frames, you can use
list2env(dfList, envir = .GlobalEnv)
and the data frames will be assigned as objects in the global environment.
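To tie this back to the loop counter in the question, here is a rough sketch of filling such a list inside a loop. The three iterations and the multiplied x column are assumptions made purely so the data frames differ.

```r
dfList <- list()
for (counter in 1:3) {
  # Use the counter to build the element name, e.g. "df1"
  dfList[[paste0("df", counter)]] <- data.frame(x = (1:10) * counter, y = 10:1)
}

names(dfList)          # "df1" "df2" "df3"
dfList[["df2"]]$x[1]   # 2
```

Because the name is just a string, you can look up any element later with dfList[[paste0("df", counter)]].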

Use assign:
assign(paste0("df", counter), data.frame(x = (1:10), y = (10:1)))

I think you are looking for
assign("name", dataframe)
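For completeness, a small sketch of how assign pairs with get and mget when the names are built from a counter. It writes into a scratch environment (env, a name chosen here for illustration) rather than the global workspace; drop the envir argument if you really want global objects.

```r
# A separate environment keeps the example from touching the global workspace
env <- new.env()
for (counter in 1:3) {
  assign(paste0("df", counter),
         data.frame(x = 1:10, y = 10:1),
         envir = env)
}

# Retrieve one object by its constructed name, or several at once as a list
df2 <- get("df2", envir = env)
all_dfs <- mget(paste0("df", 1:3), envir = env)
```

Note that mget hands the objects back as a named list, which is where the list-based answers start from anyway.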

Related

R - How to extract single dataframes from list of lists?

I have a list of lists called step2 containing dataframes like this one:
And I want to extract every element in the list as a single dataframe, so that I have one dataframe called Likert_rank_Americas, Likert_rank_APAC, Likert_rank_Civil_law and so on for each dataframe contained in the list.
I tried with this:
list2env(step2,envir=.GlobalEnv)
But I only get the sub-lists contained in the main one as single objects, like so:
While what I want instead are the underlying dataframes as standalone objects, with the names as specified above. Is it possible to do this in a neat way without using list2env for each sub-list and then manually renaming each dataset?
I am quite new to R so apologies if the solution's easy.
Thanks in advance!
Without any data provided by you, what you want specifically is hard to guess, but at a minimum, to access a list of dataframes you need to follow this kind of logic...
a.1 <- data.frame(matrix(1:9, nrow=3))
a.2 <- data.frame(matrix(6:14, nrow=3))
data <- list(list(a.1,a.2),list("1","2"))
# NOTE: want only info from data[1] processed
library(purrr)
b <- map_dfr(data[1],rbind)
b
class(b)
dim(b)
# > b
# X1 X2 X3
# 1 1 4 7
# 2 2 5 8
# 3 3 6 9
# 4 6 9 12
# 5 7 10 13
# 6 8 11 14
# > class(b)
# [1] "data.frame"
# > dim(b)
# [1] 6 3
I think this oneliner should work.
The function 'list2env' assigns all list components to the global environment.
The function 'lapply' applies the function 'list2env' to every element of the list 'step2'.
library(magrittr)  # provides %>%
step2 %>% lapply(. %>% {list2env(., envir = .GlobalEnv)})
You can rename the dataframes before doing so of course.
names(step2$Geography)<- c(
'Likert_Rank_Americas',
'Likert_Rank_APAC',
'Likert_Rank_EMEA',
'Likert_Rank_Global')
names(step2$Legal_System)<- c(
'Likert_Rank_Civil_law',
'Likert_Rank_Common_law')
I created a quick reproducible dataset for testing with (this is good practice to include when asking for help)
dat <- list(list(data.frame(), data.frame(), data.frame()), list(data.frame(), data.frame(), data.frame()))
names(dat) <- c('list1' , 'list2')
names(dat$list1) <- c('A', 'B', 'C')
names(dat$list2) <- c('D', 'E', 'F')
Then I used
lapply(dat, list2env, .GlobalEnv)
Edit: To rename the dataframes, use the same structure as above where I named the sample dataframe, but use the names you want the end objects to have. If you want to automate this process, I would separate it into a different question, but I suspect you would be able to find another post with the answer already.
Something like (pseudo-code)...
name_vec <- paste0('naming_convention_', names(step2$Geography))
names(step2$Geography) <- name_vec
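One possible way to do the flattening and renaming in a single pass is sketched below. step2_toy is a made-up stand-in for the real step2 (which was not posted), and target is a scratch environment used so the example leaves the workspace alone.

```r
# Toy stand-in for step2: a list of two sub-lists of data frames
step2_toy <- list(
  Geography    = list(Americas = data.frame(a = 1), APAC = data.frame(a = 2)),
  Legal_System = list(Civil_law = data.frame(a = 3))
)

# Drop the outer names so they are not carried along, then concatenate the sub-lists
flat <- do.call(c, unname(step2_toy))
names(flat) <- paste0("Likert_rank_", names(flat))

# Assign into a scratch environment; pass .GlobalEnv instead to get workspace objects
target <- new.env()
list2env(flat, envir = target)
ls(target)
```

This relies on the inner names being unique across sub-lists; if they are not, later data frames would overwrite earlier ones when assigned.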

How to combine multiple data frames having similar variable names into one data frame?

I am trying to write code to combine multiple data frames (approximately 100), each stored under a variable name output1, output2, ..., output100. I want to merge them into a single data frame using the rbind function, but that means typing out every variable name.
I need a suggestion to write all variable names in one go or in the form of a loop.
Problem: I am trying to write the code as rbind(output1, output2, output3,....,output100) which is extremely long and tedious.
You could use mget. Example:
Calling ls() gives you the object names in your workspace.
ls()
# [1] "n" "out.lst" "output.1" "output.2" "output.3" "something.else"
Then use mget to grab the data frames by pattern= and rbind them using do.call.
output.long <- do.call(rbind, mget(ls(pattern="^output\\.")))
# x y z
# output.1.1 1 1 2
# output.1.2 5 5 4
# output.2.1 2 1 4
# output.2.2 5 4 1
# output.3.1 5 4 2
# output.3.2 2 2 3
Toy data:
set.seed(42)
n <- 3
out.lst <- setNames(replicate(n, data.frame(x=sample(1:5, 2),
                                            y=sample(1:5, 2),
                                            z=sample(1:5, 2)), simplify=FALSE),
                    paste0("output.", 1:n))
list2env(out.lst, envir=.GlobalEnv)
If you're willing to use the tidyverse package, you can make output a list, then just write, say, combined <- bind_rows(output). That fits naturally with using lapply() to create the data frames in the first place.
[Untested code]
library(tidyverse)
# inputFiles is assumed to be a character vector of CSV file paths
output <- lapply(inputFiles, read.csv)
combined <- bind_rows(output)
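If the objects already exist as output1, output2, and so on, a base-R sketch along these lines may be enough. The toy data frames and the scratch environment env are assumptions for illustration; the point is that building the names explicitly keeps them in numeric order.

```r
# Create twelve toy data frames named output1 ... output12 in a scratch environment
env <- new.env()
for (i in 1:12) {
  assign(paste0("output", i), data.frame(id = i, value = i^2), envir = env)
}

# ls() returns names sorted alphabetically, so "output10" would not follow
# "output9"; constructing the names directly keeps the numeric order
obj_names <- paste0("output", 1:12)
combined <- do.call(rbind, mget(obj_names, envir = env))

nrow(combined)  # 12
```

With objects in the global workspace you can leave out the envir arguments entirely.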

Creating Subset data frames in R within For loop [duplicate]

What I am trying to do is filter a larger data frame into 78 unique data frames based on the value of the first column in the larger data frame. The only way I can think of doing it properly is by applying the filter() function inside a for() loop:
for (i in 1:nrow(plantline))
{x1 = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
The issue is I don't know how to create a new data frame, say x2, x3, x4... every time the loop runs.
Can someone tell me if that is possible or if I should be trying to do this some other way?
There must be many duplicates for this question.
split(rawdta.df, rawdta.df$Plant_Line)
will create a list of data.frames.
However, depending on your use case, splitting the large data.frame into pieces might not be necessary as grouping can be used.
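As a rough illustration of the grouping route, assuming the goal is a per-line summary: raw_toy and its height column are invented here, since the real rawdta.df was not shown.

```r
# Toy stand-in for rawdta.df with three plant lines
raw_toy <- data.frame(
  Plant_Line = rep(c("A", "B", "C"), each = 4),
  height     = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

# One summary row per Plant_Line, computed on the intact data frame
means <- aggregate(height ~ Plant_Line, data = raw_toy, FUN = mean)
means
```

No intermediate data frames are created, and the result is itself a data frame with one row per group.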
You could use split -
# creates a list of 78 data frames, one per unique value of
# the first column in the larger data frame
lst = split(large_data_frame, large_data_frame$first_column)
# copies the data frames out of the list into the global environment,
# although this is not recommended since 78 separate data frames
# are difficult to work with
list2env(lst, envir = .GlobalEnv)
The names of the dataframes will be the same as the values in the first column.
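A small sketch of working with the split result while it is still a list, which is usually all you need. The toy data frame raw_toy is an assumption standing in for the real data.

```r
# Toy stand-in for the larger data frame
raw_toy <- data.frame(
  Plant_Line = rep(c("A", "B", "C"), each = 4),
  height     = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

pieces <- split(raw_toy, raw_toy$Plant_Line)
names(pieces)                                   # "A" "B" "C"

# Apply the same operation to every piece without creating separate objects
row_counts <- vapply(pieces, nrow, integer(1))

# Pull out a single piece by the value it was split on
pieces[["B"]]
```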
It would be easier if we could see the dataframes....
I propose something nevertheless. You can create a list of dataframes:
dataframes <- vector("list", nrow(plantline))
for (i in 1:nrow(plantline)){
dataframes[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}
You can use assign:
for (i in 1:nrow(plantline))
{assign(paste0("x", i), filter(rawdta.df, Plant_Line == plantline$Plant_Line[i]))}
alternatively you can save your results in a list :
X <- list()
for (i in 1:nrow(plantline))
{X[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
Would be easier with sample data. by would be my favorite.
d <- data.frame(plantline = rep(LETTERS[1:3], 4),
x = 1:12,
stringsAsFactors = FALSE)
l <- by(d, d$plantline, data.frame)
print(l$A)
print(l$B)
Solution using plyr:
ma <- cbind(x = 1:10, y = (-4:5)^2, z = 1:2)
ma <- as.data.frame(ma)
library(plyr)
dlply(ma, "z") # you split ma by the column named z

R: Use lapply with a function acting on part of a split dataframe

This R code is setting up an example of the issue I am attempting to resolve. The data set measures a release of particles over non-uniform time intervals. The particle release is integrated over time using the trapezoid rule.
library(caTools)
test.data.frame <- data.frame(
sample = c('sample 1','sample 1','sample 1','sample 1',
'sample 2','sample 2','sample 2','sample 2'))
test.data.frame$time <- c(1,2,4,6,1,4,5,6)
test.data.frame$material.released.g <- c(5,3,2,1,2,4,5,1)
split.test <- split(test.data.frame, test.data.frame$sample)
integrate.test <- function(x){
dataframe.segment <- do.call(rbind.data.frame,x)
return(trapz(dataframe.segment$time,dataframe.segment$material.released.g))
}
So far the integrate.test function appears to work on a single element of a list.
> integrate.test(split.test[1])
[1] 12
> integrate.test(split.test[2])
[1] 16.5
The lapply function gives zeros in the output.
> lapply(split.test, integrate.test)
$`sample 1`
[1] 0
$`sample 2`
[1] 0
The output I am looking for is a data frame equivalent to:
expected.output <- data.frame(
sample = c('sample 1','sample 2'),
total.material.released = c(12 , 16.5))
Is anyone able to help resolve this? Thanks!
It's the difference between split.test[1], which is a one-element list containing a data frame, and split.test[[1]], which is the data frame stored in list element [[1]].
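A quick way to see that difference for yourself, using a tiny made-up list (lst and its columns are illustrative only):

```r
lst <- list(`sample 1` = data.frame(time = c(1, 2), released = c(5, 3)))

class(lst[1])    # "list": single brackets keep the list wrapper
class(lst[[1]])  # "data.frame": double brackets return the element itself

# lapply hands each element to the function the way [[ ]] would
lapply(lst, class)
```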
Your function, by calling do.call(rbind.data.frame, x), is expecting that x will be a list. But lapply(split.test, integrate.test) actually feeds it a data frame. Here's what happens when you feed integrate.test a data frame rather than a (generic) list:
x = do.call(rbind.data.frame, split.test[[1]])
x
                    c.1..1..5. c.1..2..3. c.1..4..2. c.1..6..1.
sample                       1          1          1          1
time                         1          2          4          6
material.released.g          5          3          2          1
do.call operates over a list. If you feed it a generic list (like split.test[1], which is a one-element list) it tries to rbind each list element. If the list contained several data frames, it would stack them into a single data frame. But there's only one element--the data frame contained in element 1 of split.test--so that's what gets returned.
However, when you run do.call(rbind.data.frame, split.test[[1]]) you're giving do.call a data frame to operate on. A data frame is a special kind of list in which each column is a list element. So do.call takes the columns of your original data frame, turns each one into a row and stacks them. The integration returns 0 because the columns it wants to operate on no longer exist. When you reference those non-existent columns, NULL is returned instead of the data you were expecting, and trapz(NULL, NULL) is zero.
The function will work if you use the data frame directly and skip the do.call step:
integrate.test <- function(x){
#dataframe.segment <- do.call(rbind.data.frame,x)
dataframe.segment = x
return(trapz(dataframe.segment$time,dataframe.segment$material.released.g))
}
lapply(split.test, integrate.test)
$`sample 1`
[1] 12
$`sample 2`
[1] 16.5
Of course this can be shortened to:
integrate.test <- function(x){
return(trapz(x$time,x$material.released.g))
}
Or you can just use trapz directly, without wrapping it in a function.
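To get all the way to the data frame asked for in the question, something like the following should work. To keep the sketch free of dependencies it uses a small base-R trapezoid function (trapezoid, a name introduced here) in place of caTools::trapz; swapping trapz back in should give the same numbers.

```r
# Base-R trapezoid rule, a stand-in for caTools::trapz
trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

test.data.frame <- data.frame(
  sample = rep(c("sample 1", "sample 2"), each = 4),
  time = c(1, 2, 4, 6, 1, 4, 5, 6),
  material.released.g = c(5, 3, 2, 1, 2, 4, 5, 1)
)
split.test <- split(test.data.frame, test.data.frame$sample)

# One integrated total per sample, returned as a named numeric vector
totals <- vapply(split.test,
                 function(d) trapezoid(d$time, d$material.released.g),
                 numeric(1))

result <- data.frame(sample = names(totals),
                     total.material.released = unname(totals))
result
```

Using vapply instead of lapply gives a plain numeric vector, which makes building the final data frame a one-liner.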

R creating multiple 2 by 2 tables from a data frame

Next question - I have created the following data frame in R
x <- as.integer(rnorm(n=1000, mean=10, sd=5))
y <- 1:1000
z <- sample (c(0,1),1000, replace=T)
df <- data.frame(x,y,z)
# create variables df using x
for(i in 1:10){
df[paste0("col",i)] <- ifelse(df$x <i, 1, 0)
}
# create 2 by 2 tables of z against col1 to col 10
for(i in 1:10){
table[i] <- table (df[paste0("col",i)], df$z)
}
I already received some excellent help to create variables in R using a for loop within a data frame.
However, I am now struggling with using a similar for loop to create a two by two table (last section of the code).
Can anybody tell where I am going wrong?
Thanks again as always!
There are several problems with the code you have written.
First of all, the table data-object does not exist, so you cannot index-assign to it.
Secondly, you need to use "[[" when accessing a named item (otherwise you get a sublist).
Finally, if you make a list, which is really the most sensible type of storage for a series of table-objects, you need to use "[[" rather than "[" to extract an item (rather than a sublist).
I also took the liberty of renaming it to tbl so there would not be cognitive confusion about what was function and what was data.
tbl<- list();
for(i in 1:10){
tbl[[i]] <- table (df[[paste0("col",i)]], df$z)
}
tbl[[1]]
      0   1
  0 488 473
  1  16  23
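If you would rather have the tables named after their columns, an lapply version along these lines is one option. The seed and the simplified data frame are assumptions made so the sketch runs on its own; the counts will differ from the output shown above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(x = as.integer(rnorm(1000, mean = 10, sd = 5)),
                 z = sample(c(0, 1), 1000, replace = TRUE))
for (i in 1:10) df[[paste0("col", i)]] <- ifelse(df$x < i, 1, 0)

# Name the vector of column names by itself so the resulting list is named too
cols <- paste0("col", 1:10)
tbl <- lapply(setNames(cols, cols), function(nm) table(df[[nm]], df$z))

tbl$col3
```

Each table can then be pulled out by name (tbl$col3) or by position (tbl[[3]]).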
