I have a list of dataframes. Each list element has a unique name but the column names are identical across all data frames.
I would like to prepend the name of each data frame to its column names, so that when I cbind them together into a single large data frame I can tell them apart.
Example data:
LIST <- list(df1 = data.frame("ColA" = c(1:5), "ColB" = c(10:14)),
df2 = data.frame("ColA" = c(21:25), "ColB" = c(30:34)))
str(LIST)
List of 2
$ df1:'data.frame': 5 obs. of 2 variables:
..$ ColA: int [1:5] 1 2 3 4 5
..$ ColB: int [1:5] 10 11 12 13 14
$ df2:'data.frame': 5 obs. of 2 variables:
..$ ColA: int [1:5] 21 22 23 24 25
..$ ColB: int [1:5] 30 31 32 33 34
Desired output:
List of 2
$ df1:'data.frame': 5 obs. of 2 variables:
..$ df1.ColA: int [1:5] 1 2 3 4 5
..$ df1.ColB: int [1:5] 10 11 12 13 14
$ df2:'data.frame': 5 obs. of 2 variables:
..$ df2.ColA: int [1:5] 21 22 23 24 25
..$ df2.ColB: int [1:5] 30 31 32 33 34
Since you mention that you want to use cbind later, you could call as.data.frame on the list right away. It prefixes each column with the name of its list element:
as.data.frame(LIST)
# df1.ColA df1.ColB df2.ColA df2.ColB
#1 1 10 21 30
#2 2 11 22 31
#3 3 12 23 32
#4 4 13 24 33
#5 5 14 25 34
Thanks to @RonakShah, you can use the following lines to get a list back in case you need it:
df1 <- as.data.frame(LIST)
split.default(df1, sub("\\..*", "", names(df1)))
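A self-contained sketch of that round trip, reusing the example LIST from the question. One assumption to keep in mind: the pattern "\\..*" removes everything from the first dot onward, so it only recovers the groups correctly when the list names themselves contain no dots (a name like my.df would be grouped as my).

```r
LIST <- list(df1 = data.frame(ColA = 1:5, ColB = 10:14),
             df2 = data.frame(ColA = 21:25, ColB = 30:34))
# as.data.frame() prefixes each column with its list element's name
wide <- as.data.frame(LIST)
# sub() strips everything from the first "." on, leaving df1 / df2 as the
# grouping key; split.default() then splits the columns by that key
back <- split.default(wide, sub("\\..*", "", names(wide)))
str(back)
```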
You could do this inside lapply with the superassignment operator <<-, which modifies LIST outside the anonymous function:
lapply(seq_along(LIST), function(x)
names(LIST[[x]]) <<- paste0(names(LIST)[x], ".", names(LIST[[x]])))
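If the side effect of <<- inside lapply feels too implicit, a plain for loop over the names does the same in-place renaming. This is an alternative sketch, not part of the original answer:

```r
LIST <- list(df1 = data.frame(ColA = 1:5, ColB = 10:14),
             df2 = data.frame(ColA = 21:25, ColB = 30:34))
# A for loop runs in the current environment, so ordinary <- is enough
for (nm in names(LIST)) {
  names(LIST[[nm]]) <- paste(nm, names(LIST[[nm]]), sep = ".")
}
str(LIST)
```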
Or, using Map as @Sotos suggested:
LIST <- Map(function(x, y) {names(x) <- paste0(y, '.', names(x)); x}, LIST, names(LIST))
Either way, this yields:
str(LIST)
# List of 2
# $ df1:'data.frame': 5 obs. of 2 variables:
# ..$ df1.ColA: int [1:5] 1 2 3 4 5
# ..$ df1.ColB: int [1:5] 10 11 12 13 14
# $ df2:'data.frame': 5 obs. of 2 variables:
# ..$ df2.ColA: int [1:5] 21 22 23 24 25
# ..$ df2.ColB: int [1:5] 30 31 32 33 34
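Since the end goal is a single wide data frame, here is a sketch of the final cbind step after renaming. setNames is used in place of the braces in the Map call above, which should be equivalent. Note the unname(): if the list keeps its names, cbind() receives them as argument names and data.frame() prefixes the multi-column data frames a second time, giving names like df1.df1.ColA.

```r
LIST <- list(df1 = data.frame(ColA = 1:5, ColB = 10:14),
             df2 = data.frame(ColA = 21:25, ColB = 30:34))
# Rename the columns of each data frame with its list element's name
LIST <- Map(function(x, y) setNames(x, paste(y, names(x), sep = ".")),
            LIST, names(LIST))
# unname() stops cbind() from adding the list names as a second prefix
wide <- do.call(cbind, unname(LIST))
names(wide)
```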
You can use map2 from purrr to do this:
library(tidyverse)
map2(LIST, names(LIST), ~rename_all(.x, function(z) paste(.y, z, sep = ".")))
EDIT:
Or, as suggested in the comments, use imap:
imap(LIST, ~rename_all(.x, function(z) paste(.y, z, sep = ".")))
This should be pretty easy, but I don't know how. I have a single data frame and a list with two data frames. Now I want to combine them so that I have a single list with three data frames, and I do not want to do it manually.
a = data.frame(xa = 1:10,
ya = 11:20)
b = list(c = data.frame(x = 1:10),
d = data.frame(x = 1:20,
y = 11:30))
Now I thought about something like this:
res = c(a, b)
But this results in this:
> sapply(res, class)
xa ya c d
"integer" "integer" "data.frame" "data.frame"
So it turns the two columns of the single data frame into separate vectors.
How can I keep the data frame structure of the single data frame and also extract the data frames from the list of 2?
You can use c, but you have to wrap your data.frame a in a list first.
res <- c(b, list(a=a))
str(res)
#List of 3
# $ c:'data.frame': 10 obs. of 1 variable:
# ..$ x: int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ d:'data.frame': 20 obs. of 2 variables:
# ..$ x: int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ y: int [1:20] 11 12 13 14 15 16 17 18 19 20 ...
# $ a:'data.frame': 10 obs. of 2 variables:
# ..$ xa: int [1:10] 1 2 3 4 5 6 7 8 9 10
# ..$ ya: int [1:10] 11 12 13 14 15 16 17 18 19 20
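If a should come first in the combined list rather than last, base R's append() accepts an after position. A small variation on the answer above, using the question's a and b:

```r
a <- data.frame(xa = 1:10, ya = 11:20)
b <- list(c = data.frame(x = 1:10),
          d = data.frame(x = 1:20, y = 11:30))
# Wrapping a in list() keeps it as one element; c(a, b) would splice its columns
res <- append(b, list(a = a), after = 0)
names(res)
```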
You can always add it as a new element:
b[["a"]] <- a
The name "a" can also come from a variable, so this works inside a loop or something similar.
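As an illustration of the loop case, the sketch below adds several data frames by name; extra and the data frame e are hypothetical stand-ins for whatever you want to append:

```r
b <- list(c = data.frame(x = 1:10),
          d = data.frame(x = 1:20, y = 11:30))
# Hypothetical data frames to add, keyed by the names they should get in b
extra <- list(a = data.frame(xa = 1:10, ya = 11:20),
              e = data.frame(z = 1:3))
for (nm in names(extra)) {
  b[[nm]] <- extra[[nm]]  # [[<- with a new character name appends an element
}
names(b)
```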
I have a dataframe like:
df<-data.frame(x=1:5,y=6:10,z=11:15)
x y z
1 1 6 11
2 2 7 12
3 3 8 13
4 4 9 14
5 5 10 15
Now I would like to create a list new_list where every element is a single column, as a vector.
This would look like:
str(new_list)
list of 3
$ : num[1:5] 1 2 3 4 5
$ : num[1:5] 6 7 8 9 10
$ : num[1:5] 11 12 13 14 15
I tried:
new_list <-lapply(df,c)
but this only returned
list of 3
$ : num[1:5] 1
$ : num[1:5] 6
$ : num[1:5] 11
Note: I hope my goal is clear. If not, please let me know.
Using c.
l <- lapply(df, c)
str(l)
# List of 3
# $ x: int [1:5] 1 2 3 4 5
# $ y: int [1:5] 6 7 8 9 10
# $ z: int [1:5] 11 12 13 14 15
or as.list.
l <- as.list(df)
str(l)
# List of 3
# $ x: int [1:5] 1 2 3 4 5
# $ y: int [1:5] 6 7 8 9 10
# $ z: int [1:5] 11 12 13 14 15
Using identity()
lapply(df, identity)
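The approaches above all return a named list (x, y, z), while the output sketched in the question has no names. unname() removes them; a quick sketch comparing the variants:

```r
df <- data.frame(x = 1:5, y = 6:10, z = 11:15)
# These three calls return the same named list of column vectors
l1 <- lapply(df, c)
l2 <- as.list(df)
l3 <- lapply(df, identity)
# unname() drops the x/y/z names to match the unnamed structure in the question
new_list <- unname(as.list(df))
str(new_list)
```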
Just modifying the attributes and class:
new_list <- unclass(df)
attr(new_list, "row.names") <- NULL
Or
new_list <- df
class(new_list) <- "list" # NULL also works but is not as clear
attr(new_list, "row.names") <- NULL
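A quick sketch checking that stripping the class and the row names leaves the same object as as.list(df), which suggests that as.list() on a data frame does essentially this:

```r
df <- data.frame(x = 1:5, y = 6:10, z = 11:15)
new_list <- unclass(df)                  # drops the "data.frame" class
attr(new_list, "row.names") <- NULL      # drops the row names
# Only the names attribute is left; compare with as.list()
print(identical(new_list, as.list(df)))
```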
A data.frame is just a list with some additional attributes. The simplest way to turn a data.frame into a list with the structure you have requested is to remove all the attributes.
If you need to create a new object, you can copy the existing one and then strip it.
new_list <- df
attributes(new_list) <- NULL
> str(new_list)
List of 3
$ : int [1:5] 1 2 3 4 5
$ : int [1:5] 6 7 8 9 10
$ : int [1:5] 11 12 13 14 15
However, this is usually done "in the wild" using as.list(), even though this retains variable names and has some additional overhead.
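To see which attributes are being removed, you can inspect them before and after. A small sketch using the df from the question:

```r
df <- data.frame(x = 1:5, y = 6:10, z = 11:15)
# The attributes that distinguish a data frame from a plain list
names(attributes(df))
new_list <- df
attributes(new_list) <- NULL
# With every attribute removed, only the bare list of column vectors remains
is.list(new_list)
is.null(attributes(new_list))
```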
I have a list, data,
and there are several data frames in it:
[[1]]
ID: int [1:100] ...
Date: Factor w/ ...
days: num [1:100] ...
[[2]]
ID: int [1:100] ...
Date: Factor w/ ...
like this.
And I want to convert that factor to Date format.
I thought about unlisting the list, changing the format, and then making it a list again.
But I have no idea how to do that.
sapply(data, function(x) x$Date <- as.Date(x$Date))
This doesn't work. It only returns Date and doesn't change the data type.
Is there any fast way to convert that format?
I can solve this by using a for loop:
for(i in 1:2){
data[[i]]$Date <- as.Date(data[[i]]$Date)}
But I would like to use sapply or lapply.
It is better to transform the factor into character first and then into Date format. The easiest way is to use the lubridate package: ymd transforms character vectors in a format such as 2018-11-22 into year-month-day Date objects. Pay attention to the body of the anonymous function: after the data frame is changed, its last line is x, which is equivalent to return(x). Your sapply attempt fails because a function returns the value of its last expression, and the value of x$Date <- ... is the new Date vector, not the modified data frame. See the code below:
library(lubridate)
# simulation of data
df1 <- data.frame(
ID = 1:100,
Date = as.factor(sample(seq(ymd("2018-01-01"), ymd("2018-12-01"), 1), 100)),
days = sample(100))
df2 <- data.frame(
ID = 1:100,
Date = as.factor(sample(seq(ymd("2018-01-01"), ymd("2018-12-01"), 1), 100, replace =TRUE)))
dfs <- list(df1, df2)
str(dfs)
# List of 2
# $ :'data.frame': 100 obs. of 3 variables:
# ..$ ID : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ Date: Factor w/ 100 levels "2018-01-06","2018-01-10",..: 17 89 40 2 84 46 58 62 66 43 ...
# ..$ days: int [1:100] 50 4 19 6 33 47 95 25 13 5 ...
# $ :'data.frame': 100 obs. of 2 variables:
# ..$ ID : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ Date: Factor w/ 87 levels "2018-01-03","2018-01-04",..: 3 30 61 6 78 34 5 71 49 55 ...
# handling the data
dfs_2 <- lapply(dfs, function(x) {
x$Date <- ymd(as.character(x$Date))
x
})
str(dfs_2)
# List of 2
# $ :'data.frame': 100 obs. of 3 variables:
# ..$ ID : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ Date: Date[1:100], format: "2018-03-10" "2018-10-25" "2018-11-25" ...
# ..$ days: int [1:100] 7 99 75 91 30 78 9 82 15 37 ...
# $ :'data.frame': 100 obs. of 2 variables:
# ..$ ID : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ Date: Date[1:100], format: "2018-05-30" "2018-05-20" "2018-05-13" ...
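A base-R sketch of the same idea, using as.Date instead of lubridate::ymd and a tiny made-up list in place of the real data. It also shows why the original attempt fails. lapply is used for both calls here so the results stay a list; with sapply, equal-length results can additionally be simplified into a matrix.

```r
# Small made-up stand-in: a list of data frames with a factor Date column
dfs <- list(
  data.frame(ID = 1:3, Date = factor(c("2018-01-06", "2018-02-10", "2018-03-15"))),
  data.frame(ID = 1:2, Date = factor(c("2018-05-30", "2018-05-20")))
)
# The attempt from the question: the function's value is the value of the
# assignment, i.e. the new Date vector, so no data frame is returned
bad <- lapply(dfs, function(x) x$Date <- as.Date(x$Date))
class(bad[[1]])
# Returning x explicitly gives back the modified data frames
good <- lapply(dfs, function(x) {
  x$Date <- as.Date(as.character(x$Date))
  x
})
str(good)
```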
I've written a simulation function in R. I'd like to do num simulations. Rather than using a for loop, I'm trying to use some sort of apply function, such as lapply or parallel::mclapply.
lapply, as I'm currently using it, is failing.
For example:
# t1() is a generic example function
t1 <- function() {data(cars); return(get("cars"))}
a <- t1() # works
a2 <- vector("list", 5) # pre-allocate list for 5 simulations
# otherwise: a2 <- vector("list", num) # where num was pre-specified
a2 <- lapply(a2, t1)
## Error in FUN(X[[1L]], ...) : unused argument (X[[1]])
What am I doing wrong? Thanks in advance!
I'd rather not need to do:
a2 <- vector("list", 5)
for (i in 1:5) {
a2[[i]] <- t1()
}
It's true that a <- t1() works, but a <- t1(2) would not have worked. lapply calls t1(X[[i]]) for each element, so you are passing an argument to a parameter that does not exist. Put a dummy parameter in the argument list and all will be fine. You might also look at the replicate function, which is specifically designed to support simulation efforts; it does not require a dummy parameter in the argument list.
> t1 <- function(z) {data(cars); return(get("cars"))}
> a <- t1() # works
> a2 <- vector("list", 5) # pre-allocate list for 5 simulations
> # otherwise: a2 <- vector("list", num) # where num was pre-specified
> a2 <- lapply(a2, t1) ;str(a2)
List of 5
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
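A sketch of the two options mentioned above, keeping t1 exactly as defined in the question (no dummy parameter): replicate() with simplify = FALSE to get a list back, or lapply over seq_len() with an anonymous function that ignores its argument.

```r
t1 <- function() {data(cars); return(get("cars"))}
# replicate() re-evaluates t1() 5 times; simplify = FALSE keeps a plain list
a2 <- replicate(5, t1(), simplify = FALSE)
# Equivalent lapply form: the anonymous function absorbs the index argument
a3 <- lapply(seq_len(5), function(i) t1())
str(a2[[1]])
```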
Compare the behavior of data.table and data.frame below:
library(data.table)
a.matrix <- matrix(seq_len(25), ncol = 5, nrow = 5)
a.list <- list(seq_len(5),a.matrix)
a.dt <- as.data.table(a.list)
a.df <- as.data.frame(a.list)
a.dt.df <- as.data.table(a.df)
str(a.dt)
str(a.df)
str(a.dt.df)
data.table treats the matrix as a single vector of length 25 and recycles the length-5 vector to match:
> str(a.dt)
Classes ‘data.table’ and 'data.frame': 25 obs. of 2 variables:
$ V1: int 1 2 3 4 5 1 2 3 4 5 ...
$ V2: int 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, ".internal.selfref")=<externalptr>
On the other hand, data.frame breaks each matrix column out into its own column:
> str(a.df)
'data.frame': 5 obs. of 6 variables:
$ X1.5: int 1 2 3 4 5
$ X1 : int 1 2 3 4 5
$ X2 : int 6 7 8 9 10
$ X3 : int 11 12 13 14 15
$ X4 : int 16 17 18 19 20
$ X5 : int 21 22 23 24 25
My current workaround to get this behavior with as.data.table is to pass the list through both coercers, as.data.frame first and then as.data.table:
> str(a.dt.df)
Classes ‘data.table’ and 'data.frame': 5 obs. of 6 variables:
$ X1.5: int 1 2 3 4 5
$ X1 : int 1 2 3 4 5
$ X2 : int 6 7 8 9 10
$ X3 : int 11 12 13 14 15
$ X4 : int 16 17 18 19 20
$ X5 : int 21 22 23 24 25
- attr(*, ".internal.selfref")=<externalptr>
Why is there a difference, and is there a fast way to get the data.frame behavior with data.table?
Just to close this on the SO end: as mentioned in the comments, this is now being handled as a bug/issue on GitHub and, as of this writing, has been added to the data.table milestone v1.9.8.
Follow-up
This is now resolved as per commit 64f377...
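On the data.frame side of this comparison, naming the list elements gives the generated columns more readable names than the deparsed X1.5, X1, ... shown above. A base-R sketch; id and m are hypothetical names chosen for illustration:

```r
a.matrix <- matrix(seq_len(25), ncol = 5, nrow = 5)
# Named list elements: the vector keeps the name id, and the matrix columns
# are named using m as a prefix instead of X1 ... X5
a.df <- as.data.frame(list(id = seq_len(5), m = a.matrix))
names(a.df)
```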