Vectorized simulation in R

I've written a simulation function in R. I'd like to do num simulations. Rather than using a for loop, I'm trying to use some sort of apply function, such as lapply or parallel::mclapply.
lapply, as I'm currently using it, is failing.
For example:
# t1() is a generic example function
t1 <- function() {data(cars); return(get("cars"))}
a <- t1() # works
a2 <- vector("list", 5) # pre-allocate list for 5 simulations
# otherwise: a2 <- vector("list", num) # where num was pre-specified
a2 <- lapply(a2, t1)
## Error in FUN(X[[1L]], ...) : unused argument (X[[1]])
What am I doing wrong? Thanks in advance!
I'd rather not need to do:
a2 <- vector("list", 5)
for (i in 1:5) {
a2[[i]] <- t1()
}

It's true that a <- t1() works, but a <- t1(2) would not have worked. lapply calls FUN(X[[i]], ...), so each element of a2 is passed to t1 as an argument, and t1 has no parameter to receive it. Put a dummy parameter in the argument list and all will be fine. You might also look at the replicate function. It is specifically designed to support simulation efforts, and it does not require a dummy parameter in the argument list.
> t1 <- function(z) {data(cars); return(get("cars"))}
> a <- t1() # works
> a2 <- vector("list", 5) # pre-allocate list for 5 simulations
> # otherwise: a2 <- vector("list", num) # where num was pre-specified
> a2 <- lapply(a2, t1) ;str(a2)
List of 5
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
$ :'data.frame': 50 obs. of 2 variables:
..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
>
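As a rough sketch of the replicate() suggestion, and of an lapply() variant that leaves the simulation function without a dummy parameter: one_sim() below is a hypothetical stand-in for the real simulation function, and num is assumed to hold the number of runs.

```r
# one_sim() is a hypothetical stand-in for the simulation function;
# like t1() in the question, it takes no arguments.
one_sim <- function() mean(rnorm(10))

num <- 5

# replicate() re-evaluates the expression num times; simplify = FALSE
# keeps the results in a list instead of collapsing them to a vector.
res_rep <- replicate(num, one_sim(), simplify = FALSE)

# With lapply(), iterate over an index and wrap the call so that the
# index is accepted by the wrapper and then ignored.
res_lap <- lapply(seq_len(num), function(i) one_sim())

# The same wrapper can be handed to parallel::mclapply(); mc.cores = 1
# is used here because forking is not available on Windows.
res_par <- parallel::mclapply(seq_len(num), function(i) one_sim(), mc.cores = 1)
```

The wrapper function(i) plays the role of the dummy parameter, so the simulation function itself stays unchanged.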

Related

Make vector of each column of a dataframe and return the vectors in a list

I have a dataframe like:
df<-data.frame(x=1:5,y=6:10,z=11:15)
x y z
1 1 6 11
2 2 7 12
3 3 8 13
4 4 9 14
5 5 10 15
Now I would like to create a list new_list where every element is a single column of df as a vector.
This would look like:
str(new_list)
list of 3
$ : num[1:5] 1 2 3 4 5
$ : num[1:5] 6 7 8 9 10
$ : num[1:5] 11 12 13 14 15
I tried:
new_list <-lapply(df,c)
but this only returned
list of 3
$ : num[1:5] 1
$ : num[1:5] 6
$ : num[1:5] 11
Note: I hope my goal is clear. If not, please let me know.
Using c.
l <- lapply(df, c)
str(l)
# List of 3
# $ x: int [1:5] 1 2 3 4 5
# $ y: int [1:5] 6 7 8 9 10
# $ z: int [1:5] 11 12 13 14 15
or as.list.
l <- as.list(df)
str(l)
# List of 3
# $ x: int [1:5] 1 2 3 4 5
# $ y: int [1:5] 6 7 8 9 10
# $ z: int [1:5] 11 12 13 14 15
Using identity()
lapply(df, identity)
Just modifying the attributes and class:
new_list <- unclass(df)
attr(new_list, "row.names") <- NULL
Or
new_list <- df
class(new_list) <- "list" # NULL also works but is not as clear
attr(new_list, "row.names") <- NULL
A data.frame is just a list with some additional attributes. The simplest way to turn a data.frame into a list with the structure you have requested is to remove all the attributes.
If you need to create a new object, you can copy the existing one and then strip it.
new_list <- df
attributes(new_list) <- NULL
> str(new_list)
List of 3
$ : int [1:5] 1 2 3 4 5
$ : int [1:5] 6 7 8 9 10
$ : int [1:5] 11 12 13 14 15
However, this is usually done "in the wild" using as.list(), even though this retains variable names and has some additional overhead.
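For comparison, a minimal sketch that gets the unnamed structure from the question while still going through as.list(); it assumes the df defined in the question and uses only base R.

```r
df <- data.frame(x = 1:5, y = 6:10, z = 11:15)

# A data.frame is a list of equal-length columns with extra attributes.
stopifnot(is.list(df))

# as.list() drops the data.frame class and the row names but keeps the
# column names; unname() then removes the names as well.
new_list <- unname(as.list(df))
str(new_list)
```

If the column names are useful later, skipping unname() keeps them on the list elements.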

Paste element name onto columns in each list element

I have a list of dataframes. Each list element has a unique name but the column names are identical across all data frames.
I would like to paste the name of each dataframe to the columns, so that when I cbind them together into a single large dataframe I can distinguish between them.
Example data;
LIST <- list(df1 = data.frame("ColA" = c(1:5), "ColB" = c(10:14)),
df2 = data.frame("ColA" = c(21:25), "ColB" = c(30:34)))
str(LIST)
List of 2
$ df1:'data.frame': 5 obs. of 2 variables:
..$ ColA: int [1:5] 1 2 3 4 5
..$ ColB: int [1:5] 10 11 12 13 14
$ df2:'data.frame': 5 obs. of 2 variables:
..$ ColA: int [1:5] 21 22 23 24 25
..$ ColB: int [1:5] 30 31 32 33 34
Desired output;
List of 2
$ df1:'data.frame': 5 obs. of 2 variables:
..$ df1.ColA: int [1:5] 1 2 3 4 5
..$ df1.ColB: int [1:5] 10 11 12 13 14
$ df2:'data.frame': 5 obs. of 2 variables:
..$ df2.ColA: int [1:5] 21 22 23 24 25
..$ df2.ColB: int [1:5] 30 31 32 33 34
Since you mention that you want to use cbind later, you might use as.data.frame right away
as.data.frame(LIST)
# df1.ColA df1.ColB df2.ColA df2.ColB
#1 1 10 21 30
#2 2 11 22 31
#3 3 12 23 32
#4 4 13 24 33
#5 5 14 25 34
Thanks to @RonakShah, you can use the following lines to get back a list in case you need it
df1 <- as.data.frame(LIST)
split.default(df1, sub("\\..*", "", names(df1)))
You could do this in a lapply with global assignment <<-.
lapply(seq_along(LIST), function(x)
names(LIST[[x]]) <<- paste0(names(LIST)[x], ".", names(LIST[[x]])))
Or using Map as @Sotos suggested
LIST <- Map(function(x, y) {names(x) <- paste0(y, '.', names(x)); x}, LIST, names(LIST))
Yields
str(LIST)
# List of 2
# $ df1:'data.frame': 5 obs. of 2 variables:
# ..$ df1.ColA: int [1:5] 1 2 3 4 5
# ..$ df1.ColB: int [1:5] 10 11 12 13 14
# $ df2:'data.frame': 5 obs. of 2 variables:
# ..$ df2.ColA: int [1:5] 21 22 23 24 25
# ..$ df2.ColB: int [1:5] 30 31 32 33 34
You can use map2 to do this:
library(tidyverse)
map2(LIST, names(LIST), ~rename_all(.x, function(z) paste(.y, z, sep = ".")))
EDIT:
Or, as suggested in the comments, use imap:
imap(LIST, ~rename_all(.x, function(z) paste(.y, z, sep = ".")))

converting data type stored in list into Date R

I have a list called data,
and there are several data frames in it.
[[1]]
ID: int [1:100] ...
Date: Factor w/ ...
days: num [1:100] ...
[[2]]
ID: int [1:100] ...
Date: Factor w/ ...
like this.
I want to convert that factor to Date format.
I thought about
unlisting the list, changing the format, and making it a list again,
but I have no idea how to do that.
sapply(data, function(x) x$Date <- as.Date(x$Date))
This doesn't work. It only returns the Date values and doesn't change the data type in the list.
Is there any fast way to convert that format?
I can solve this by using a for loop.
for(i in 1:2){
data[[i]]$Date <- as.Date(data[[i]]$Date)}
But I would like to use sapply or lapply.
It is better to convert the factor to character first and then to Date. An easy way is to use the lubridate package: ymd parses character vectors in year-month-day format (e.g. 2018-11-22) into Date objects. Pay attention to the body of the anonymous function: after the data frame is modified, the last line is x, which is shorthand for return(x), so lapply gets the modified data frame back. See the code below:
library(lubridate)
# simulation of data
df1 <- data.frame(
ID = 1:100,
Date = as.factor(sample(seq(ymd("2018-01-01"), ymd("2018-12-01"), 1), 100)),
days = sample(100))
df2 <- data.frame(
ID = 1:100,
Date = as.factor(sample(seq(ymd("2018-01-01"), ymd("2018-12-01"), 1), 100, replace =TRUE)))
dfs <- list(df1, df2)
str(dfs)
# List of 2
# $ :'data.frame': 100 obs. of 3 variables:
# ..$ ID : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ Date: Factor w/ 100 levels "2018-01-06","2018-01-10",..: 17 89 40 2 84 46 58 62 66 43 ...
# ..$ days: int [1:100] 50 4 19 6 33 47 95 25 13 5 ...
# $ :'data.frame': 100 obs. of 2 variables:
# ..$ ID : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ Date: Factor w/ 87 levels "2018-01-03","2018-01-04",..: 3 30 61 6 78 34 5 71 49 55 ...
# handling the data
dfs_2 <- lapply(dfs, function(x) {
x$Date <- ymd(as.character(x$Date))
x
})
str(dfs_2)
# List of 2
# $ :'data.frame': 100 obs. of 3 variables:
# ..$ ID : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ Date: Date[1:100], format: "2018-03-10" "2018-10-25" "2018-11-25" ...
# ..$ days: int [1:100] 7 99 75 91 30 78 9 82 15 37 ...
# $ :'data.frame': 100 obs. of 2 variables:
# ..$ ID : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
# ..$ Date: Date[1:100], format: "2018-05-30" "2018-05-20" "2018-05-13" ...
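The same pattern can be sketched in base R without lubridate, assuming the factor levels are ISO year-month-day strings. The two small data frames below are made-up toy data, not the asker's.

```r
# Toy stand-in for the asker's list of data frames (values are invented).
data <- list(
  data.frame(ID = 1:3,
             Date = factor(c("2018-01-05", "2018-02-10", "2018-03-15")),
             days = c(3, 1, 2)),
  data.frame(ID = 1:3,
             Date = factor(c("2018-04-01", "2018-05-20", "2018-06-30")))
)

# Modify the column inside the function, then return the whole data frame;
# returning only the assignment would hand back just the Date vector.
data <- lapply(data, function(x) {
  x$Date <- as.Date(as.character(x$Date))
  x
})

str(data)
```

lapply is used rather than sapply so that the result is always a list of data frames, with no attempt at simplification.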

Applying IF statement through multiple files in a list

I am attempting to apply an IF statement through a large list of 64 items. My data takes the following form:
file_list Large list (64 elements, 4.2 Mb)
file1: 'data.frame': 3012 obs. of 4 variables:
..$V1: int[1:3012] 1850 1850 1850 ...
..$V2: int[1:3012] 1 2 3 ...
..$V3: int[1:3012] 16 15 16 ...
..$V4: int[1:3012] 4.69E-05 6.99E-05 5.62E-05 ...
................................................................................
file64: 'data.frame': 5412 obs. of 4 variables:
..$V1: int[1:5412] 1850 1850 1850 ...
..$V2: int[1:5412] 1 2 3 ...
..$V3: int[1:5412] 16 15 16 ...
..$V4: int[1:5412] 6.96E-05 4.99E-05 5.37E-05 ...
What I want to do is multiply the fourth column ($V4) in each of the 64 files by a different number depending on the contents of the second column ($V2). The numbers in $V2 are months of the year, and I need to multiply $V4 by 31 when $V2 is 1, 3, 5, 7, 8, 10 and 12; 30 when $V2 is 4, 6, 9 and 11; and 28.25 when $V2 is 2.
I assume this will involve some sort of for loop, but I haven't been able to complete this task. Any suggestions?
Here's a reproducible solution that uses a small function:
file_list <- list(file1 = data.frame(v1 = sample(1:100, 100, TRUE),
v2 = sample(c(1,2,3,5,6,8,10,4,6,9,11), 100, TRUE),
v4 = rnorm(100)),
file2 = data.frame(v1 = sample(1:100, 100, TRUE),
v2 = sample(c(1,2,3,5,6,8,10,4,6,9,11), 100, TRUE),
v4 = rnorm(100)))
str(file_list)
# List of 2
# $ file1:'data.frame': 100 obs. of 3 variables:
# ..$ v1: int [1:100] 6 90 66 86 32 33 50 46 19 59 ...
# ..$ v2: num [1:100] 5 10 2 10 8 6 10 3 5 5 ...
# ..$ v4: num [1:100] -0.639 -2.234 -0.816 0.997 -0.302 ...
# $ file2:'data.frame': 100 obs. of 3 variables:
# ..$ v1: int [1:100] 34 25 24 4 100 59 80 100 21 97 ...
# ..$ v2: num [1:100] 3 6 8 8 9 1 8 1 3 3 ...
# ..$ v4: num [1:100] -2.2599 0.0548 -1.1666 -0.4049 0.4681 ...
myFun <- function(df) {
df$v4[df$v2 %in% c(1,3,5,7,8,10,12)] <- df$v4[df$v2 %in% c(1,3,5,7,8,10,12)] * 31
df$v4[df$v2 %in% c(4,6,9,11)] <- df$v4[df$v2 %in% c(4,6,9,11)] * 30
df$v4[df$v2 == 2] <- df$v4[df$v2 == 2] * 28.25
df
}
lapply(file_list, myFun)
# lapply(file_list, FUN = function(x) head(myFun(x)))
# $file1
# v1 v2 v4
# 1 6 5 -19.816836
# 2 90 10 -69.264329
# 3 66 2 -23.054110
# 4 86 10 30.910798
# 5 32 8 -9.347289
# 6 33 6 -16.316746
#
# $file2
# v1 v2 v4
# 1 34 3 -70.055942
# 2 25 6 1.642744
# 3 24 8 -36.165864
# 4 4 8 -12.550877
# 5 100 9 14.041857
# 6 59 1 -2.556662
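An alternative sketch replaces the three conditional assignments with a lookup vector indexed by month. It assumes V2 always holds whole numbers from 1 to 12 and uses the question's column names (V2, V4) and its 28.25 factor for February; the two small data frames are toy data.

```r
# Multiplier for each month, in calendar order (28.25 for February,
# as specified in the question).
days_in_month <- c(31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Scale V4 by the multiplier that matches each row's month in V2.
scale_by_month <- function(df) {
  df$V4 <- df$V4 * days_in_month[df$V2]
  df
}

# Toy list of data frames standing in for the 64 files.
file_list <- list(
  file1 = data.frame(V1 = 1850L, V2 = 1:3, V3 = c(16L, 15L, 16L),
                     V4 = c(4.69e-05, 6.99e-05, 5.62e-05)),
  file2 = data.frame(V1 = 1850L, V2 = c(4L, 11L, 12L), V3 = c(16L, 15L, 16L),
                     V4 = c(6.96e-05, 4.99e-05, 5.37e-05))
)

scaled <- lapply(file_list, scale_by_month)
```

Indexing the lookup vector by month touches each row exactly once, so there is no risk of a value being multiplied twice if the month sets were ever edited to overlap.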

Why does data.table recycle matrices into a single vector when data.frame does not?

Compare the behavior of data.table and data.frame below:
library(data.table)
a.matrix <- matrix(seq_len(25),ncol = 5, nrow = 5)
a.list <- list(seq_len(5),a.matrix)
a.dt <- as.data.table(a.list)
a.df <- as.data.frame(a.list)
a.dt.df <- as.data.table(a.df)
str(a.dt)
str(a.df)
str(a.dt.df)
data.table recycles the columns of the matrix into a vector of appropriate length:
> str(a.dt)
Classes ‘data.table’ and 'data.frame': 25 obs. of 2 variables:
$ V1: int 1 2 3 4 5 1 2 3 4 5 ...
$ V2: int 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, ".internal.selfref")=<externalptr>
On the other hand, data.frame breaks each column out:
> str(a.df)
'data.frame': 5 obs. of 6 variables:
$ X1.5: int 1 2 3 4 5
$ X1 : int 1 2 3 4 5
$ X2 : int 6 7 8 9 10
$ X3 : int 11 12 13 14 15
$ X4 : int 16 17 18 19 20
$ X5 : int 21 22 23 24 25
My current workaround to get this behavior quickly with as.data.table is just to feed it through both coercers:
> str(a.dt.df)
Classes ‘data.table’ and 'data.frame': 5 obs. of 6 variables:
$ X1.5: int 1 2 3 4 5
$ X1 : int 1 2 3 4 5
$ X2 : int 6 7 8 9 10
$ X3 : int 11 12 13 14 15
$ X4 : int 16 17 18 19 20
$ X5 : int 21 22 23 24 25
- attr(*, ".internal.selfref")=<externalptr>
Why is there a difference, and is there a fast way to get the data.frame behavior with data.table?
Just to close this on the SO end: as mentioned in the comments, this is now being handled as a bug/issue on GitHub, and it has been added to the data.table v1.9.8 milestone as of this writing.
Follow-up
This is now resolved as per commit 64f377...
