I want to label data frames using a small function in combination with lapply().
I have the following code:
df1 <- data.frame(c(1,2,3), c(3,4,5))
df2 <- data.frame(c(6,7,8), c(9,10,11))
f.generate.name <- function(x) {
x$name <- deparse(substitute(x))
return(x)
}
my_list <- list(df1, df2)
# This works fine.
f.generate.name(df1)
# This does not work.
lapply(my_list, f.generate.name)
which produces the following output
[[1]]
c.1..2..3. c.3..4..5. name
1 1 3 X[[i]]
2 2 4 X[[i]]
3 3 5 X[[i]]
[[2]]
c.6..7..8. c.9..10..11. name
1 6 9 X[[i]]
2 7 10 X[[i]]
3 8 11 X[[i]]
What I want instead is:
[[1]]
c.1..2..3. c.3..4..5. name
1 1 3 df1
2 2 4 df1
3 3 5 df1
[[2]]
c.6..7..8. c.9..10..11. name
1 6 9 df2
2 7 10 df2
3 8 11 df2
What is the best way of doing this without using loops? How can I tweak the lapply() call or the function that I created to achieve the desired result?
Base R
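lapply() cannot iterate over more than one argument, so it cannot see the names alongside the data frames. Use mapply() or its wrapper Map(), which always returns a list, and pass it a named list:
my_list <- list(df1 = df1, df2 = df2) # named list, so names(my_list) is not NULL
Map(f = function(x, y){
x$name <- y
x },
my_list,
names(my_list))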
$df1
c.1..2..3. c.3..4..5. name
1 1 3 df1
2 2 4 df1
3 3 5 df1
$df2
c.6..7..8. c.9..10..11. name
1 6 9 df2
2 7 10 df2
3 8 11 df2
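If you would rather stay close to lapply(), one option is to iterate over the names instead of the data frames and look each frame up inside the function. A minimal sketch, assuming the list is named; the helper name add_name_col is made up for illustration:

```r
df1 <- data.frame(c(1, 2, 3), c(3, 4, 5))
df2 <- data.frame(c(6, 7, 8), c(9, 10, 11))
my_list <- list(df1 = df1, df2 = df2)  # must be a named list

# Hypothetical helper: takes a name and the list, returns the labelled data frame
add_name_col <- function(nm, lst) {
  x <- lst[[nm]]
  x$name <- nm
  x
}

# simplify = FALSE keeps the result as a list; the names are reused as element names
result <- sapply(names(my_list), add_name_col, lst = my_list, simplify = FALSE)
```

The trade-off is that the function needs access to the whole list, whereas Map() hands it one data frame and one name at a time.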
Tidyverse
If you are open to a purrr solution, you can use imap(), which makes the names of the list conveniently available as .y. There is no need to write a separate function:
library(purrr)
my_list <- list(df1 = df1, df2 = df2)
imap(my_list, ~{
.x$name <- .y
.x
})
$df1
c.1..2..3. c.3..4..5. name
1 1 3 df1
2 2 4 df1
3 3 5 df1
$df2
c.6..7..8. c.9..10..11. name
1 6 9 df2
2 7 10 df2
3 8 11 df2
Really the question is where do the names come from? An unnamed list like my_list in the question has lost the df1 and df2 names as we can see by looking at its internals:
dput(my_list) # no df1 or df2 seen
## list(structure(list(c.1..2..3. = c(1, 2, 3), c.3..4..5. = c(3,
## 4, 5)), class = "data.frame", row.names = c(NA, -3L)), structure(list(
## c.6..7..8. = c(6, 7, 8), c.9..10..11. = c(9, 10, 11)), class =
## "data.frame", row.names = c(NA,
## -3L)))
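The X[[i]] label in the question has the same root cause: lapply() calls the function as FUN(X[[i]], ...), so substitute() sees that expression rather than df1. A small sketch that shows both effects (the variable names are chosen only for illustration):

```r
df1 <- data.frame(a = 1:3)
df2 <- data.frame(a = 4:6)

unnamed <- list(df1, df2)
named <- list(df1 = df1, df2 = df2)

names(unnamed)  # NULL: the variable names are not stored anywhere
names(named)    # the names survive only when given explicitly

# substitute() captures the expression lapply() passes in, not the original variable
captured <- lapply(unnamed, function(x) deparse(substitute(x)))
```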
Thus either we need to create a named list in the first place or else provide a vector of names. We show both using only base R.
Named list
First create a named list of the data frames and then use Map as shown:
L <- mget(ls(pattern = "^df")) # create named list of all objects whose names start with df
Map(data.frame, L, name = names(L))
Unnamed list
Alternately if all you have is an unnamed list then we can Map over that and a vector of names:
my_list <- list(df1, df2) # unnamed list as in question
Map(data.frame, my_list, name = c("df1", "df2"))
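Both calls rely on data.frame() recycling a length-one value to the number of rows of the data frame it is combined with. A single call, sketched here with the first data frame from the question, shows what Map does for each element:

```r
df1 <- data.frame(c(1, 2, 3), c(3, 4, 5))

# The scalar "df1" is recycled to all three rows and becomes the column "name"
labelled <- data.frame(df1, name = "df1")
```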
Pass individual data frames
Yet another approach is to pass the individual data frames instead of a list. Because we have not destroyed the original names by creating an unnamed list we can still retrieve them. On R 4.0 and later deparse1 could optionally be used in place of deparse in the code.
add_names <- function(...) {
mc <- match.call()
Map(data.frame, list(...), name = sapply(mc[-1], deparse))
}
add_names(df1, df2)
Related
I have the following data which I have split by name into separate data frames. After I run the following code, the variables in each data set are automatically named "X..i..".
I would like to rename the variable of each separate data frame so it matches the data set.
# load data
df1_raw <- data.frame(name = c("A", "B", "C", "A", "C", "B"),
start = c(1, 3, 4, 5, 2, 1),
end = c(6, 5, 7, 8, 6, 7))
df1 <- split(x = df1_raw, f = df1_raw$name) # split data by name
df1 <- lapply(df1, function(x) Map(seq.int, x$start, x$end)) # generate sequence intervals
df1 <- map(df1, unlist) # unlist sequences
df1 <- lapply(df1, data.frame) # convert to df
# rename variables
name <- c("A", "B", "C")
for (i in seq_along(df1)) {
names(df1[i]) <- name[i]
}
The last for loop does not work to rename variables. When I type names(df1$A) I still get "X..i..". The output I would like from names(df1$A) is "A".
Does anyone have any thoughts on how to rename these variables? Thanks!
You need to use [[ ]] to extract an element from a list; df1[i] returns a one-element list, not the data frame itself:
for (i in seq_along(df1)) {
names(df1[[i]]) <- name[i]
}
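The difference between the two indexing forms can be checked on a small stand-in list. This is only a sketch; lst and new_names are names invented for the example:

```r
lst <- list(A = data.frame(X..i.. = 1:3), B = data.frame(X..i.. = 4:6))
new_names <- c("A", "B")

# Single brackets: lst[i] is a one-element list, so only that sub-list's
# element name is set, and the data frame's column is left alone
for (i in seq_along(lst)) {
  names(lst[i]) <- new_names[i]
}
before <- names(lst$A)

# Double brackets: lst[[i]] is the data frame itself, so its column is renamed
for (i in seq_along(lst)) {
  names(lst[[i]]) <- new_names[i]
}
after <- names(lst$A)
```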
Alternatively you could change how you create the list so you don't have to rename after the fact
df1 <- split(x = df1_raw, f = df1_raw$name) # split data by name
df1 <- lapply(df1, function(x) Map(seq.int, x$start, x$end)) # generate sequence intervals
df1 <- map(df1, unlist) # unlist sequences
df1 <- Map(function(x,name) {as.data.frame(setNames(list(x), name))}, df1, names(df1))
I think the solution by @MrFlick is enough to address the issue of renaming within a for loop.
Here is a base R workaround that may work for you
lapply(
split(df1_raw, df1_raw$name),
function(x) {
with(
x,
setNames(
data.frame(unlist(Map(seq, start, end))),
unique(name)
)
)
}
)
which gives
$A
A
1 1
2 2
3 3
4 4
5 5
6 6
7 5
8 6
9 7
10 8
$B
B
1 3
2 4
3 5
4 1
5 2
6 3
7 4
8 5
9 6
10 7
$C
C
1 4
2 5
3 6
4 7
5 2
6 3
7 4
8 5
9 6
I am new to R. I have a data frame that contains start and end values for 45 types of items, and I used dplyr to subset that data into 45 separate data frames. I have written a for loop that outputs a sequence from start to end for each row of the data frame. I would like to use this for loop on all data frames without having to copy and paste the code 45 times and tailor it to the name of each data frame. See below for an example:
A_list <- list()
B_list <- list()
C_list <- list()
dfA <- data.frame(name = c("A", "A"), start = c(1, 3), end = c(6, 5))
dfB <- data.frame(name = c("B", "B"), start = c(2, 1), end = c(7, 8))
dfC <- data.frame(name = c("C", "C"), start = c(1, 2), end = c(4, 7))
for(i in seq_along(dfA$start)) {
output <- seq.int(dfA$start[i], dfA$end[i])
A_list[[i]] <- output
}
I tried making a list of names of each data frame and then referring to it in the for loop, but this didn't work.
list_df_names <- list(dfA, dfB, dfC)
seq.int(list_df_names[1:3]$start[i], list_df_names[1:3]$end[i])
Does anyone have any thoughts on how to do this?
We can loop over the list of data sets and use Map to create the sequence between the 'start' and 'end' columns of each one, which gives a list of lists. If separate objects are needed (not recommended), set the names of the nested list to the preferred object names and then call list2env:
out <- lapply(list_df_names, function(x) Map(seq.int, x$start, x$end))
names(out) <- paste0(c('A', 'B', 'C'), "_list")
list2env(out, .GlobalEnv)
Output:
A_list
#[[1]]
#[1] 1 2 3 4 5 6
#[[2]]
#[1] 3 4 5
B_list
#[[1]]
#[1] 2 3 4 5 6 7
#[[2]]
#[1] 1 2 3 4 5 6 7 8
C_list
#[[1]]
#[1] 1 2 3 4
#[[2]]
#[1] 2 3 4 5 6 7
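If the separate A_list, B_list and C_list objects are not strictly needed, the same result can stay in one named list and be indexed by name. A sketch, assuming the data frames are collected in a named list (dfs is a name chosen for this example):

```r
dfA <- data.frame(name = c("A", "A"), start = c(1, 3), end = c(6, 5))
dfB <- data.frame(name = c("B", "B"), start = c(2, 1), end = c(7, 8))
dfC <- data.frame(name = c("C", "C"), start = c(1, 2), end = c(4, 7))

# Named list, so each result can be looked up by its label
dfs <- list(A = dfA, B = dfB, C = dfC)

# One list of sequences per data frame, one sequence per row
out <- lapply(dfs, function(x) Map(seq.int, x$start, x$end))

out[["A"]]   # all sequences for dfA
out$B[[2]]   # the sequence for the second row of dfB
```

With 45 data frames this avoids creating 45 extra objects in the workspace.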
I have a very simple question about lapply. I am transitioning from STATA to R and I think there is some very basic concept that I am not getting about looping in R. But I have been reading about it all afternoon and can't figure out a reasonable way to do this very simple thing.
I have three data frames df1, df2, and df3 that all have the same column names, in the same order, etc.
I want to rename their columns all at once.
I put the data frames in a list:
dflist <- list(df1, df2, df3)
What I want the new names to be:
varlist <- c("newname1", "newname2", "newname3")
Write a function that replaces names with those in varlist, and lapply it over the data frames
ChangeNames <- function(x) {
names(x) <- varlist
return(x)
}
dflist <- lapply(dflist, ChangeNames)
So, as far as I understand, R has changed the names of the copies of the data frames that I put in the list, but not the original data frames themselves. I want the data frames themselves to be renamed, not the elements of the list (which are trapped in a list).
Now, I can go
df1 <- as.data.frame(dflist[1])
df2 <- as.data.frame(dflist[2])
df3 <- as.data.frame(dflist[3])
But that seems weird. You need a loop to get back the elements of a loop?
Basically: once you've put some data frames in a list and run your function on them via lapply, how do you get them back out of the list, without starting back at square one?
If you just want to change the names, that isn't too hard in R. Bear in mind that the assignment operator, <-, can be applied in sequence. Hence:
names(df1) <- names(df2) <- names(df3) <- c("newname1", "newname2", "newname3")
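If the data frames are already in a list and have been processed with lapply, one way to get them back out as standalone objects is list2env(), provided the list is named. A hedged sketch with made-up sample data; note that it overwrites any existing objects called df1, df2 and df3 in the global environment:

```r
# Assumption: the list is named after the original objects
dflist <- list(df1 = data.frame(a = 1:3, b = 2:4, c = 3:5),
               df2 = data.frame(a = 4:6, b = 5:7, c = 6:8),
               df3 = data.frame(a = 7:9, b = 8:10, c = 9:11))

# Rename the columns of every data frame in the list
dflist <- lapply(dflist, setNames, nm = c("newname1", "newname2", "newname3"))

# Copy each list element into the global environment under its list name
invisible(list2env(dflist, envir = .GlobalEnv))

names(df1)
```

Keeping the data in the list and indexing it as dflist$df1 is usually the simpler option when further steps also apply to every data frame.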
I am not sure I understand correctly, do you want to rename the columns of the data frames or the components of the list that contain the data frames?
If it is the first, please always search before asking, the question has been asked here.
So what you can easily do in case you have even more data frames in the list is:
# Creating some sample data first
> dflist <- list(df1 = data.frame(a = 1:3, b = 2:4, c = 3:5),
+ df2 = data.frame(a = 4:6, b = 5:7, c = 6:8),
+ df3 = data.frame(a = 7:9, b = 8:10, c = 9:11))
# See how it looks like
> dflist
$df1
a b c
1 1 2 3
2 2 3 4
3 3 4 5
$df2
a b c
1 4 5 6
2 5 6 7
3 6 7 8
$df3
a b c
1 7 8 9
2 8 9 10
3 9 10 11
# And do the trick
> dflist <- lapply(dflist, setNames, nm = c("newname1", "newname2", "newname3"))
# See how it looks now
> dflist
$df1
newname1 newname2 newname3
1 1 2 3
2 2 3 4
3 3 4 5
$df2
newname1 newname2 newname3
1 4 5 6
2 5 6 7
3 6 7 8
$df3
newname1 newname2 newname3
1 7 8 9
2 8 9 10
3 9 10 11
So the names were changed from a, b and c to newname1, newname2 and newname3 for each data frame in the list.
If it is the second, you can do this:
> names(dflist) <- c("newname1", "newname2", "newname3")
I have n data frames and I would like to add a column to all of them.
a <- data.frame(1:4,5:8)
b <- data.frame(1:4, 5:8)
test=ls()
for (j in test){
j = cbind(get(j),IssueType=j)
}
The problem I'm running into is
j = cbind(get(j),IssueType=j)
because it assigns all the data to j instead of to a and b.
As commented, it's mostly better to keep related data in a list structure. If you already have the data.frames in your global environment and you want to get them into a list, you can use:
dflist <- Filter(is.data.frame, as.list(.GlobalEnv))
This is from here and makes sure that you only get data.frame objects from your global environment.
You will notice that you now already have a named list:
> dflist
# $a
# X1.4 X5.8
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
#
# $b
# X1.4 X5.8
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
So you can easily select the data you want by typing for example
dflist[["a"]]
If you still want to create extra columns, you could do it like this:
dflist <- Map(function(df, x) {df$IssueType <- x; df}, dflist, names(dflist))
Now, each data.frame in dflist has a new column called IssueType:
> dflist
# $a
# X1.4 X5.8 IssueType
# 1 1 5 a
# 2 2 6 a
# 3 3 7 a
# 4 4 8 a
#
# $b
# X1.4 X5.8 IssueType
# 1 1 5 b
# 2 2 6 b
# 3 3 7 b
# 4 4 8 b
In the future, you can create the data inside a list from the beginning, i.e.
dflist <- list(
a = data.frame(1:4,5:8),
b = data.frame(1:4, 5:8)
)
To create a list of your data.frames, do this:
a <- data.frame(1:4,5:8); b <- data.frame(1:4, 5:8); test <- list(a = a, b = b)
This allows you to use the lapply function to do whatever you like with each of the data frames, e.g.:
out <- lapply(names(test), function(j) cbind(test[[j]], IssueType = j))
For most data.frame operations I recommend using the packages dplyr and tidyr.
Here is the answer to the issue, with help from @docendo discimus.
Create the data frames
a <- data.frame(1:4,5:8)
b <- data.frame(1:4, 5:8)
Group the data frames into a list
dflist <- Filter(is.data.frame, as.list(.GlobalEnv))
Add the extra column
dflist <- Map(function(df, x) {df$IssueType <- x; df}, dflist, names(dflist))
Unlist the data frames back into the global environment
list2env(dflist ,.GlobalEnv)
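For completeness, the loop from the question can also be repaired directly: the assignment has to target the object whose name is stored in j, which is what assign() does. A sketch that uses an explicit vector of object names instead of ls(), so that unrelated objects in the workspace are not touched:

```r
a <- data.frame(1:4, 5:8)
b <- data.frame(1:4, 5:8)

test <- c("a", "b")  # assumption: explicit names rather than ls()

for (j in test) {
  # get(j) fetches the object named by j; assign() writes the result back under that name
  assign(j, cbind(get(j), IssueType = j))
}
```

The list-based approach is still easier to extend, since get() and assign() depend on which objects happen to exist in the environment.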
In attempting to answer a question earlier, I ran into a problem that seemed like it should be simple, but I couldn't figure out.
If I have a list of dataframes:
df1 <- data.frame(a=1:3, x=rnorm(3))
df2 <- data.frame(a=1:3, x=rnorm(3))
df3 <- data.frame(a=1:3, x=rnorm(3))
df.list <- list(df1, df2, df3)
That I want to rbind together, I can do the following:
df.all <- ldply(df.list, rbind)
However, I want another column that identifies which data.frame each row came from. I expected to be able to use the deparse(substitute(x)) method (here and elsewhere) to get the name of the relevant data.frame and add a column. This is how I approached it:
fun <- function(x) {
name <- deparse(substitute(x))
x$id <- name
return(x)
}
df.all <- ldply(df.list, fun)
Which returns
a x id
1 1 1.1138062 X[[1L]]
2 2 -0.5742069 X[[1L]]
3 3 0.7546323 X[[1L]]
4 1 1.8358605 X[[2L]]
5 2 0.9107199 X[[2L]]
6 3 0.8313439 X[[2L]]
7 1 0.5827148 X[[3L]]
8 2 -0.9896495 X[[3L]]
9 3 -0.9451503 X[[3L]]
So obviously each element of the list does not contain the name I think it does. Can anyone suggest a way to get what I expected (shown below)?
a x id
1 1 1.1138062 df1
2 2 -0.5742069 df1
3 3 0.7546323 df1
4 1 1.8358605 df2
5 2 0.9107199 df2
6 3 0.8313439 df2
7 1 0.5827148 df3
8 2 -0.9896495 df3
9 3 -0.9451503 df3
Define your list with names and ldply will add an .id column holding each data frame's name:
library(plyr) # provides ldply()
df.list <- list(df1=df1, df2=df2, df3=df3)
df.all <- ldply(df.list, rbind)
Output:
.id a x
1 df1 1 1.84658809
2 df1 2 -0.01177462
3 df1 3 0.58579469
4 df2 1 -0.64748756
5 df2 2 0.24384614
6 df2 3 0.59012676
7 df3 1 -0.63037679
8 df3 2 -1.17416295
9 df3 3 1.09349618
Then you can know the data.frame name from the column df.all$.id
Edit:
As per @Gary Weissman's comment, if you want to generate the names automatically you can do
names(df.list) <- paste0('df',seq_along(df.list))
Using base only, one could try something like:
dd <- lapply(seq_along(df.list), function(x) cbind(df_name = paste0('df',x),df.list[[x]]))
do.call(rbind,dd)
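The version above builds the labels from the position in the list, so it assumes the data frames are literally called df1, df2, and so on. A base-only sketch that takes the labels from the list names instead, assuming the list is named (set.seed is there only to make the example reproducible):

```r
set.seed(1)
df1 <- data.frame(a = 1:3, x = rnorm(3))
df2 <- data.frame(a = 1:3, x = rnorm(3))
df3 <- data.frame(a = 1:3, x = rnorm(3))
df.list <- list(df1 = df1, df2 = df2, df3 = df3)

# Pair each data frame with its name and prepend the name as an id column
dd <- Map(function(d, nm) cbind(id = nm, d), df.list, names(df.list))

df.all <- do.call(rbind, dd)
rownames(df.all) <- NULL  # drop the "df1.1"-style row names that rbind creates
```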
In your definition, df.list does not have names. Even if it did, the deparse(substitute()) idiom does not work easily here, because lapply calls .Internal(lapply(X, FUN)); you would have to look at the source to see whether the object name is available and how to get it.
Something like
names(df.list) <- paste('df', 1:3, sep = '')
foo <- function(n, .list){
.list[[n]]$id <- n
.list[[n]]
}
do.call(rbind, lapply(names(df.list), foo, .list = df.list))
a x id
1 1 0.8204213 df1
2 2 -0.8881671 df1
3 3 1.2880816 df1
4 1 -2.2766111 df2
5 2 0.3912521 df2
6 3 -1.3963381 df2
7 1 -1.8057246 df3
8 2 0.5862760 df3
9 3 0.5605867 df3
If you want to use your own function, use match.call() instead of deparse(substitute(x)); take the second element of the call and make sure to convert it to character:
name <- as.character(match.call()[[2]])