Simulations of Poisson Distributions with different sample sizes and lambdas - r

Let's start with a statistical model in which X is a Poisson random variable with parameter k - lambda, where k is a constant:
X ~ Pois(k - lambda)
Now assume that k = 20. How do I write a function, or use for loops, to run a simulation over several values of lambda, lambda <- c(2, 4, 8, 16), where each lambda is simulated at every sample size from n = 1 to n = 25?
n <- 1:10
k <- 20
lambda <- c(2, 4, 8, 16)
result <- rpois(n, k - lambda)
result
The output:
28 12 13 1 13 16 16 3 12 15
Obviously my code is wrong, because it does not give the output I want. For each value of lambda there should be one sample for every sample size: n = 1, n = 2, n = 3, and so on up to n = 25.
My idea is to use a double for loop: one loop over the sample size n and another over the lambda values. I'm not sure how to implement this, though.
The expected output should be something like this. For lambda = 8,
11
12,11
13,11,14
11,14,14,14
......
10 9 13 13 13 11 8 17 10 11 13 11 17 13 9 8 13 15 10 10 15 14 14 15 9

You can Vectorize rpois and put it in outer.
n <- 1:10
k <- 20
lambdas <- c(2, 4, 8, 16)
set.seed(42)
res <- outer(n, k - lambdas, Vectorize(rpois)) |> apply(2, as.list)
Gives
res |> setNames(paste0('lambda_', lambdas))
# $lambda_2
# $lambda_2[[1]]
# [1] 11
#
# $lambda_2[[2]]
# [1] 16 16
#
# $lambda_2[[3]]
# [1] 16 23 17
#
# $lambda_2[[4]]
# [1] 19 14 18 13
#
# $lambda_2[[5]]
# [1] 23 12 17 17 14
#
# $lambda_2[[6]]
# [1] 13 14 12 13 13 15
#
# $lambda_2[[7]]
# [1] 13 18 24 13 10 15 21
#
# $lambda_2[[8]]
# [1] 17 33 14 19 16 23 19 12
#
# $lambda_2[[9]]
# [1] 15 21 10 16 15 19 28 23 17
#
# $lambda_2[[10]]
# [1] 28 20 22 29 17 16 17 15 18 21
#
#
# $lambda_4
# $lambda_4[[1]]
# [1] 15
#
# $lambda_4[[2]]
# [1] 15 17
#
# $lambda_4[[3]]
# [1] 19 11 14
#
# $lambda_4[[4]]
# [1] 16 18 18 15
#
# $lambda_4[[5]]
# [1] 15 13 16 11 18
#
# $lambda_4[[6]]
# [1] 11 16 15 23 12 18
#
# $lambda_4[[7]]
# [1] 15 10 18 14 12 15 13
#
# $lambda_4[[8]]
# [1] 20 14 20 22 19 11 17 20
#
# $lambda_4[[9]]
# [1] 9 22 15 16 18 18 13 20 14
#
# $lambda_4[[10]]
# [1] 19 14 22 14 19 15 17 22 21 15
#
#
# $lambda_8
# $lambda_8[[1]]
# [1] 13
#
# $lambda_8[[2]]
# [1] 15 12
#
# $lambda_8[[3]]
# [1] 17 11 14
#
# $lambda_8[[4]]
# [1] 10 7 8 8
#
# $lambda_8[[5]]
# [1] 20 8 13 11 12
#
# $lambda_8[[6]]
# [1] 7 14 16 14 13 10
#
# $lambda_8[[7]]
# [1] 13 10 11 15 13 12 11
#
# $lambda_8[[8]]
# [1] 15 16 8 8 9 16 13 13
#
# $lambda_8[[9]]
# [1] 7 9 6 9 6 4 12 13 26
#
# $lambda_8[[10]]
# [1] 12 9 8 10 13 12 11 18 10 10
#
#
# $lambda_16
# $lambda_16[[1]]
# [1] 1
#
# $lambda_16[[2]]
# [1] 2 4
#
# $lambda_16[[3]]
# [1] 3 6 6
#
# $lambda_16[[4]]
# [1] 1 6 3 5
#
# $lambda_16[[5]]
# [1] 2 4 7 4 7
#
# $lambda_16[[6]]
# [1] 5 5 6 7 2 2
#
# $lambda_16[[7]]
# [1] 2 6 6 3 4 4 3
#
# $lambda_16[[8]]
# [1] 3 7 3 1 5 5 2 1
#
# $lambda_16[[9]]
# [1] 3 0 4 7 3 3 4 2 3
#
# $lambda_16[[10]]
# [1] 3 7 7 5 5 11 4 2 2 6
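
For comparison, here is a minimal sketch of the "double loop" idea from the question, written as nested lapply calls instead of outer. It also covers n = 1 to 25 rather than 1 to 10. The names sims and sizes are illustrative and not from the original post. As background, rpois(n, ...) with a vector n of length greater than 1 uses length(n) as the number of draws and recycles the rates, which is why the original attempt returned a single vector of 10 values.

```r
set.seed(42)
k <- 20
lambdas <- c(2, 4, 8, 16)
sizes <- 1:25

# Outer loop over lambda, inner loop over sample size:
# each inner call draws n values from Pois(k - lambda)
sims <- lapply(lambdas, function(lambda) {
  lapply(sizes, function(n) rpois(n, k - lambda))
})
names(sims) <- paste0("lambda_", lambdas)

# The sample of size 3 drawn for lambda = 8 (rate k - 8 = 12)
sims$lambda_8[[3]]
```

The result is a list with one element per lambda, each holding 25 vectors of increasing length, so sims$lambda_16[[25]] is the largest sample for lambda = 16.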

Related

creating new tibble columns based on mapping plus user data

I am trying to generate new columns in a tibble from the output of a function that takes as input several existing columns of that tibble plus user data. As a simplified example, I want to use this function
addup <- function(x, y, z){x + y + z}
and use it to add the numbers in the existing columns in this tibble...
set.seed(1)
(tib <- tibble(num1 = sample(12), num2 = sample(12)))
# A tibble: 12 x 2
num1 num2
<int> <int>
1 8 5
2 6 3
3 7 7
4 3 11
5 1 2
6 2 1
7 11 6
8 10 9
9 4 8
10 9 12
11 5 10
12 12 4
...together with user input. For instance, if a user defines the vector
vec <- c(3,6,4)
I would like to generate one new column per item in vec, adding the mapped values with the user input values.
The desired result in this case would look something like:
# A tibble: 12 x 5
num1 num2 `3` `6` `4`
<int> <int> <dbl> <dbl> <dbl>
1 5 7 15 18 16
2 8 2 13 16 14
3 7 9 19 22 20
4 1 11 15 18 16
5 3 3 9 12 10
6 9 12 24 27 25
7 6 6 15 18 16
8 10 10 23 26 24
9 11 4 18 21 19
10 12 5 20 23 21
11 4 1 8 11 9
12 2 8 13 16 14
If I know vec beforehand, I could achieve this by
tib %>%
mutate("3" = map2_dbl(num1, num2, ~addup(.x, .y, 3)),
"6" = map2_dbl(num1, num2, ~addup(.x, .y, 6)),
"4" = map2_dbl(num1, num2, ~addup(.x, .y, 4)))
but as the length of vec can vary, I do not know how to generalize this. I found the answer to "repeated mutate in tidyverse", but there the functions are repeated over the existing columns rather than mapping over several existing columns at once.
Any ideas?
Since neither the function nor the column names need to be passed as arguments, this is relatively simple. You just need to iterate over vec with a function that returns the summed column, and then combine the result with the original table. If your addup function accepts vector inputs, you can skip the whole map2 part; in fact this one does, but I don't know whether your real function does.
library(tidyverse)
vec <- c(3,6,4)
set.seed(1)
tib <- tibble(num1 = sample(12), num2 = sample(12))
addup <- function(c1, c2, z) {c1 + c2 + z}
addup_vec <- function(df, vec) {
new_cols <- map_dfc(
.x = vec,
.f = function(v) {
map2_dbl(
.x = df[["num1"]],
.y = df[["num2"]],
.f = ~ addup(.x, .y, v)
)
}
)
colnames(new_cols) <- vec
bind_cols(df, new_cols)
}
tib %>%
addup_vec(vec)
#> # A tibble: 12 x 5
#> num1 num2 `3` `6` `4`
#> <int> <int> <dbl> <dbl> <dbl>
#> 1 4 9 16 19 17
#> 2 5 5 13 16 14
#> 3 6 8 17 20 18
#> 4 9 11 23 26 24
#> 5 2 6 11 14 12
#> 6 7 7 17 20 18
#> 7 10 3 16 19 17
#> 8 12 4 19 22 20
#> 9 3 12 18 21 19
#> 10 1 1 5 8 6
#> 11 11 2 16 19 17
#> 12 8 10 21 24 22
Created on 2019-01-16 by the reprex package (v0.2.0).
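
As a rough illustration of the remark that a vectorised addup lets you skip map2, the sketch below computes each new column on whole columns at once. It uses base R and a plain data.frame so it runs without the tidyverse. The names new_cols and res are illustrative, and it assumes addup really is vectorised, which may not hold for the real function.

```r
# Illustrative data; the column names follow the question
set.seed(1)
tib <- data.frame(num1 = sample(12), num2 = sample(12))
vec <- c(3, 6, 4)
addup <- function(x, y, z) x + y + z  # assumed to be vectorised

# One column per element of vec, each computed on the full columns
new_cols <- vapply(
  vec,
  function(v) addup(tib$num1, tib$num2, v),
  numeric(nrow(tib))
)
colnames(new_cols) <- as.character(vec)

# Bind the new columns to the original data
res <- cbind(tib, new_cols)
res
```

Because vapply returns a matrix with one column per element of vec, the column names can be set in one step before binding.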
This uses lapply to apply the function to each element of your vector then binds the result to the original data frame and adds column names.
# Given example
set.seed(1)
(tib <- tibble(num1 = sample(12), num2 = sample(12)))
addup <- function(x, y, z){x + y + z}
vec <- c(3,6,4)
# Add columns and bind to original data frame
foo <- cbind(tib, lapply(vec, function(x)addup(tib$num1, tib$num2, x)))
# Correct column names
colnames(foo)[(ncol(tib)+1):ncol(foo)] <- vec
# Print result
print(foo)
# num1 num2 3 6 4
# 1 4 9 16 19 17
# 2 5 5 13 16 14
# 3 6 8 17 20 18
# 4 9 11 23 26 24
# 5 2 6 11 14 12
# 6 7 7 17 20 18
# 7 10 3 16 19 17
# 8 12 4 19 22 20
# 9 3 12 18 21 19
# 10 1 1 5 8 6
# 11 11 2 16 19 17
# 12 8 10 21 24 22

Adding a vector to components of a list

I have the following list:
A <- c(11)
B <- c(7, 13)
C <- c(1, 10, 11, 12)
my_list <- list(A, B, C)
> my_list
[[1]]
[1] 11
[[2]]
[1] 7 13
[[3]]
[1] 1 10 11 12
I would like to add -2, -1, 0, 1, and 2 to each number in this list, and retain all of the unique values within each list element, to obtain the following resulting list:
> my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 6 7 8 9 11 12 13 14 15
[[3]]
[1] -1 0 1 2 3 8 9 10 11 12 13 14
I tried the following code, but I did not get the result I was hoping for:
my_new_list <- lapply(res, `+`, -2:2)
> my_new_list
$`1`
[1] 9 10 11 12 13
$`2`
[1] 5 12 7 14 9
$`3`
[1] -1 9 11 13 3
Why is this happening, and how can I obtain the result I'd like? Thanks!
Assuming that we need the unique values
lapply(my_list, function(x) sort(unique(unlist(lapply(x, `+`, -2:2)))))
Or with outer
lapply(my_list, function(x) sort(unique(c(outer(x, -2:2, `+`)))))
Or with rep and recycling
lapply(my_list, function(x) sort(unique(rep(-2:2, each = length(x)) + x)))
#[[1]]
# [1] 9 10 11 12 13
#[[2]]
# [1] 5 6 7 8 9 11 12 13 14 15
#[[3]]
# [1] -1 0 1 2 3 8 9 10 11 12 13 14
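
On the "why is this happening" part of the question: `+` works element by element and recycles the shorter vector, so it does not form every pairwise sum. The short sketch below contrasts the two behaviours for the second list element; the variable names are illustrative.

```r
x <- c(7, 13)

# Element-wise addition: x is recycled to 7, 13, 7, 13, 7
# (R also warns that the lengths are not multiples of each other)
recycled <- suppressWarnings(x + (-2:2))
recycled
# 5 12 7 14 9

# All pairwise sums: one row per element of x, one column per offset
pairwise <- outer(x, -2:2, `+`)
all_sums <- sort(unique(c(pairwise)))
all_sums
# 5 6 7 8 9 11 12 13 14 15
```

The recycled result matches the unexpected output in the question, while the outer version gives the desired set of values.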
How about this:
my_new_list <- lapply(my_list, function(x) unique(union(x,sapply(x, function(y) y +c(-2:2)) )))
my_new_list <- lapply(my_new_list, sort)
my_new_list
[[1]]
[1] 9 10 11 12 13
[[2]]
[1] 5 6 7 8 9 11 12 13 14 15
[[3]]
[1] -1 0 1 2 3 8 9 10 11 12 13 14

Using purrr::map2 when one variable is not part of the function

If I had a function like this:
foo <- function(var) {
if(length(var) > 5) stop("can't be greater than 5")
data.frame(var = var)
}
Where this worked:
df <- 1:20
foo(var = df[1:5])
But this didn't:
foo(var = df)
The desired output is:
var
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20
If I know that I can only run this function in chunk of 5 rows, what would be the best approach if I wanted to evaluate all 20 rows? Can I use purrr::map() for this? Assume that the 5 row constraint is rigid.
Thanks in advance.
We split df into chunks of 5, use purrr::map_dfr to apply foo to each chunk, and bind the results together by rows.
library(tidyverse)
foo <- function(var) {
if(length(var) > 5) stop("can't be greater than 5")
data.frame(var = var)
}
df <- 1:20
df_split <- split(df, (seq(length(df))-1) %/% 5)
df_split
map_dfr(df_split, ~ foo(.x))
var
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20
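
The same split-apply-combine idea can be sketched in base R, in case purrr is not available. This is only an illustration: chunk_id, pieces and out are hypothetical names, and ceiling(seq_along(df) / 5) is just one way to label the chunks.

```r
foo <- function(var) {
  if (length(var) > 5) stop("can't be greater than 5")
  data.frame(var = var)
}

df <- 1:20

# Label positions 1-5 as chunk 1, 6-10 as chunk 2, and so on
chunk_id <- ceiling(seq_along(df) / 5)

# Apply foo to each chunk, then stack the resulting data frames
pieces <- lapply(split(df, chunk_id), foo)
out <- do.call(rbind, pieces)
rownames(out) <- NULL
out
```

Each call to foo receives at most 5 values, so the length check never fails.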
You can use dplyr::group_by or tapply:
data.frame(df) %>%
mutate(grp = (row_number()-1) %/% 5) %>%
group_by(grp) %>%
mutate(var = foo(df)$var) %>%
ungroup %>%
select(var)
# # A tibble: 20 x 1
# var
# <int>
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5
# 6 6
# 7 7
# 8 8
# 9 9
# 10 10
# 11 11
# 12 12
# 13 13
# 14 14
# 15 15
# 16 16
# 17 17
# 18 18
# 19 19
# 20 20
data.frame(var=unlist(tapply(df,(df-1) %/% 5,foo)))
# var
# 01 1
# 02 2
# 03 3
# 04 4
# 05 5
# 11 6
# 12 7
# 13 8
# 14 9
# 15 10
# 21 11
# 22 12
# 23 13
# 24 14
# 25 15
# 31 16
# 32 17
# 33 18
# 34 19
# 35 20

How to make data randomization faster in R?

I have a very big data set and I'm computing thousands of models for it. For every model I need to randomize my data 100 times. This randomization step makes my script very slow.
Would someone help me to make this step faster?
Here is my code:
for (l in seq(repeat.times)) {
y <- as.matrix(dfr[1])
x <- as.matrix(dfr[2:ncol(dfr)])
# Random Generation
x.random.name = sample(colnames(x),1,replace=FALSE)
x.random.1 <- sample(x[,x.random.name],nrow(y),replace=FALSE)
x <- cbind(x,x.random.1)
.
.
.
For example:
> x
A B C D E
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
> y
[,1]
[1,] 10
[2,] 20
[3,] 30
[4,] 40
After randomization:
> x
A B C D E x.random.1
[1,] 1 5 9 13 17 10
[2,] 2 6 10 14 18 12
[3,] 3 7 11 15 19 9
[4,] 4 8 12 16 20 11
>
This is much faster, if I understand the OP's requirement correctly:
x
## A B C D E
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
y
## [,1]
## [1,] 10
## [2,] 20
## [3,] 30
## [4,] 40
xncol <- ncol(x)
ynrow <- nrow(y)
require(microbenchmark)
microbenchmark(xrand <- sapply(1:100, FUN = function(iter) {
sample(x[, sample(1:xncol, 1)], ynrow)
}), times = 1L)
## Unit: milliseconds
## expr min
## xrand <- sapply(1:100, FUN = function(iter) { sample(x[, sample(1:xncol, 1)], ynrow) }) 1.083906
## lq median uq max neval
## 1.083906 1.083906 1.083906 1.083906 1
x <- cbind(x, xrand)
x
## A B C D E
## [1,] 1 5 9 13 17 8 16 2 18 5 3 10 10 14 9 19 6 6 15 18 2 13 13 15 18 7 20 17 11 13 1 16 1 20 1 9 19 14 20
## [2,] 2 6 10 14 18 7 14 3 20 8 4 12 9 13 10 20 8 8 13 20 1 14 15 16 20 6 19 19 10 16 2 15 4 17 4 12 20 15 19
## [3,] 3 7 11 15 19 5 15 1 19 7 2 11 12 15 11 18 7 7 14 17 4 15 16 14 19 8 17 18 9 14 4 14 2 18 3 11 18 16 17
## [4,] 4 8 12 16 20 6 13 4 17 6 1 9 11 16 12 17 5 5 16 19 3 16 14 13 17 5 18 20 12 15 3 13 3 19 2 10 17 13 18
##
## [1,] 5 13 2 3 5 2 5 8 4 6 19 3 7 19 4 7 6 4 17 9 18 9 5 3 1 15 8 19 19 3 19 15 15 1 1 10 15 19 11 6 5 17 7
## [2,] 7 15 1 1 7 1 6 6 3 8 18 2 6 17 2 6 5 3 18 10 17 11 8 1 3 13 6 17 18 4 17 16 13 4 3 11 16 18 9 8 8 18 6
## [3,] 8 14 3 2 8 3 8 7 2 7 20 1 8 18 3 8 8 1 20 12 19 10 6 2 2 16 5 20 17 2 18 13 16 3 4 12 13 20 12 7 7 20 8
## [4,] 6 16 4 4 6 4 7 5 1 5 17 4 5 20 1 5 7 2 19 11 20 12 7 4 4 14 7 18 20 1 20 14 14 2 2 9 14 17 10 5 6 19 5
##
## [1,] 3 3 15 19 2 12 16 11 18 7 10 11 5 12 12 10 1 2 19 2 16 17 11
## [2,] 4 2 13 20 1 11 15 12 17 5 11 12 6 10 9 11 4 3 18 3 14 19 9
## [3,] 1 4 16 18 4 10 14 9 19 8 12 9 8 11 11 9 3 4 20 4 13 20 12
## [4,] 2 1 14 17 3 9 13 10 20 6 9 10 7 9 10 12 2 1 17 1 15 18 10
The key step, of course, is the following, which I wrapped in microbenchmark purely for benchmarking purposes:
xrand <- sapply(1:100, FUN = function(iter) { sample(x[, sample(1:xncol, 1)], ynrow) })
Here is a one-liner:
# Data
x<-matrix(1:10^4,nrow=10)
# Generate 2000 replicates.
replicate(2000,x[order(runif(nrow(x))),sample(ncol(x),1)])
Or even just:
replicate(2000,sample(x[,sample(ncol(x),1)]))
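
A small variation on the one-liners above, sketched under the assumption that each replicate should be a row permutation of one randomly chosen column: draw all the column indices first, then permute each chosen column. The names n_rep, picked and xrand are illustrative. Indexing with sample.int(nrow(x)) avoids the special case where sample() is handed a single number.

```r
set.seed(1)
x <- matrix(1:20, nrow = 4, dimnames = list(NULL, LETTERS[1:5]))
n_rep <- 100

# Choose the column for every replicate in one call
picked <- sample(ncol(x), n_rep, replace = TRUE)

# Permute the rows of each chosen column; the result is nrow(x) by n_rep
xrand <- vapply(
  picked,
  function(j) x[sample.int(nrow(x)), j],
  integer(nrow(x))
)
dim(xrand)
```

Keeping picked around also records which original column each randomized column came from, which may help when interpreting the models.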
I found that you can dramatically reduce the runtime by building x and y once, outside the loop. Inside the loop you then only create the new, extended matrix:
y <- as.matrix(dfr[1])
XX <- as.matrix(dfr[2:ncol(dfr)])
for (l in seq(repeat.times)) {
# Random Generation
x.random.name = sample(colnames(XX),1,replace=FALSE)
x.random.1 <- sample(XX[,x.random.name],nrow(y),replace=FALSE)
x <- cbind(XX,x.random.1)
}
So I've moved x out of the loop and renamed it XX. When you do your analysis, you continue to use the newly built x. In my benchmark this sped things up by nearly two orders of magnitude.

Creating a set of sequences with decreasing length in R

I want to make a set of sequences from x to 20, with x = c(2:19). I want this, essentially, but without having to do it this way:
a = seq(2, 20)
b = seq(3, 20)
...
q = seq(18, 20)
r = seq(19, 20)
> a
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> b
[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
...
> q
[1] 18 19 20
> r
[1] 19 20
I've attempted it using a for loop, but I can't get the replacement to work out:
a = c(2:20)
b = numeric()
for (i in 1:19){
b = seq(a[i]:20)
}
Any help?
sapply(2:19, seq, to = 20)
[[1]]
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[2]]
[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[3]]
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[4]]
[1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[5]]
[1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[6]]
[1] 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[7]]
[1] 8 9 10 11 12 13 14 15 16 17 18 19 20
[[8]]
[1] 9 10 11 12 13 14 15 16 17 18 19 20
[[9]]
[1] 10 11 12 13 14 15 16 17 18 19 20
[[10]]
[1] 11 12 13 14 15 16 17 18 19 20
[[11]]
[1] 12 13 14 15 16 17 18 19 20
[[12]]
[1] 13 14 15 16 17 18 19 20
[[13]]
[1] 14 15 16 17 18 19 20
[[14]]
[1] 15 16 17 18 19 20
[[15]]
[1] 16 17 18 19 20
[[16]]
[1] 17 18 19 20
[[17]]
[1] 18 19 20
[[18]]
[1] 19 20
If you want to save the object and give a name to each element:
res <- sapply(2:19, seq, to = 20)
names(res) <- letters[1:length(res)]
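
A closely related sketch uses lapply, which always returns a list, and names the elements in the same step. The names starts and seqs are illustrative and not from the original answer.

```r
starts <- 2:19

# One sequence per starting value, each running up to 20
seqs <- setNames(
  lapply(starts, function(from) seq(from, 20)),
  letters[seq_along(starts)]
)

seqs$a  # 2 3 ... 20
seqs$r  # 19 20
```

Keeping the sequences in one named list makes it easy to loop over them later, for example with lengths(seqs).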
Extending on dickoa's answer to assign global variables a to r (although I would not see why that would ever be preferable over storing in a list):
mapply(FUN=assign,x=letters[1:18],value=sapply(2:19, seq, to = 20),MoreArgs=list(envir=.GlobalEnv))
Gives:
> a
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> b
[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> q
[1] 18 19 20
> r
[1] 19 20

Resources