I have written this simple function to generate pairs from a contaminated bivariate normal distribution. It performs a Bernoulli trial and, depending on the outcome (1 or 0), samples from one of two distributions.
require(mvtnorm)
rcn <- function(n, covar1, sigma1, sigma2, eps, bias1, bias2, covar2){
sigma1 <- matrix( c(sigma1, covar1, covar1, sigma2), ncol = 2, nrow = 2, byrow = T) ;
sigma2 <- matrix( c(sigma1, covar2, covar2, sigma2), ncol = 2, nrow = 2, byrow = T)
m <- matrix(0, nrow = n, ncol = 2)
for(i in 1:n){
ind <- rbinom(1, 1, eps)
m[i,] <- (1 - ind)*rmvnorm(1, sigma = sigma1) + ind*rmvnorm(1, sigma = sigma2, mean = c(bias1, bias2) )
}
list(y1 = m[,1], y2 = m[,2] )
}
rcn(20, 0.9, 1, 1, 0.05, 0, 0, -0.9)
The problem is that when I do that I get a warning I do not quite understand, namely
Warning message:
In matrix(c(sigma1, covar2, covar2, sigma2), ncol = 2, nrow = 2, :
data length [7] is not a sub-multiple or multiple of the number of rows [2]
Could you please tell me what this is about? I have tried changing my code in many ways but unfortunately I cannot make it go away. Thank you.
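You have overwritten sigma1 with a matrix in the first line of your function, but you then use it to create sigma2 in the second line. At that point sigma1 holds 4 values rather than 1, so c(sigma1, covar2, covar2, sigma2) has 7 elements, which is the data length [7] that the warning complains about. The two lines of code you should change are these: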
sigma1 <- matrix( c(sigma1, covar1, covar1, sigma2), ncol = 2, nrow = 2, byrow = T) ;
sigma2 <- matrix( c(sigma1, covar2, covar2, sigma2), ncol = 2, nrow = 2, byrow = T)
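To see where the 7 in the warning comes from, here is a minimal sketch that replays those two lines outside the function, using the values from the call in the question (the name `bad_input` is only for illustration):

```r
# Values taken from the call rcn(20, 0.9, 1, 1, 0.05, 0, 0, -0.9)
sigma1 <- 1; sigma2 <- 1; covar1 <- 0.9; covar2 <- -0.9

# After this line sigma1 is no longer a scalar but a 2x2 matrix (4 elements)
sigma1 <- matrix(c(sigma1, covar1, covar1, sigma2), ncol = 2, nrow = 2, byrow = TRUE)

# The vector meant to fill the second matrix now has 4 + 1 + 1 + 1 = 7 elements
bad_input <- c(sigma1, covar2, covar2, sigma2)
length(bad_input)
```

Storing the two covariance matrices under new names, instead of reusing sigma1 and sigma2, avoids the problem.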
As already answered, you have overwritten your sigmas. I reorganized your code to make it a bit more readable:
require(mvtnorm)
rcn <- function(n, covar1, covar2, sigma1, sigma2, eps, bias1, bias2){
Matrix1 <-
matrix(
c(sigma1, covar1
, covar1, sigma2)
, ncol = 2, nrow = 2, byrow = TRUE)
Matrix2 <-
matrix(
c(sigma1, covar2
, covar2, sigma2)
, ncol = 2, nrow = 2, byrow = TRUE)
m <- matrix(0, nrow = n, ncol = 2)
for(i in 1:n){
ind <- rbinom(1, 1, eps)
m[i,] <- (1 - ind)*rmvnorm(1, sigma = Matrix1) + ind*rmvnorm(1, sigma = Matrix2, mean = c(bias1, bias2) )
}
list(y1 = m[,1], y2 = m[,2])
}
rcn(20, 0.9, -0.9, 1, 1, 0.05, 0, 0)
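If the loop becomes slow for large n, the same mixture can be drawn without it. The following is only a sketch under assumptions: `rcn_vec` is a hypothetical name, and it swaps rmvnorm for a Cholesky factor applied to standard normal draws, which needs both covariance matrices to be positive definite.

```r
# Loop-free sketch of the contaminated bivariate normal sampler.
# Assumption: both covariance matrices are positive definite, so chol() works.
rcn_vec <- function(n, covar1, covar2, sigma1, sigma2, eps, bias1, bias2) {
  S1 <- matrix(c(sigma1, covar1, covar1, sigma2), ncol = 2, nrow = 2)
  S2 <- matrix(c(sigma1, covar2, covar2, sigma2), ncol = 2, nrow = 2)

  # One Bernoulli draw per row: TRUE means "contaminated" component
  ind <- rbinom(n, 1, eps) == 1

  # Standard normal draws; z %*% chol(S) has covariance S
  z <- matrix(rnorm(2 * n), ncol = 2)
  m <- z %*% chol(S1)

  # Overwrite the contaminated rows with draws from the second component
  if (any(ind)) {
    shift <- matrix(c(bias1, bias2), nrow = sum(ind), ncol = 2, byrow = TRUE)
    m[ind, ] <- z[ind, , drop = FALSE] %*% chol(S2) + shift
  }
  list(y1 = m[, 1], y2 = m[, 2])
}

out <- rcn_vec(20, 0.9, -0.9, 1, 1, 0.05, 0, 0)
```

The argument order matches the reorganized rcn above, so the two can be called the same way.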
Consider the following model for the evolution of an asset's price:
This is what I have done (in R). I could not find a function that randomly outputs +1 or -1, so I decided to adapt the built-in rbinom function.
## This code is in R
rm(list = ls())
library(dplyr)
library(dint)
library(magrittr)
library(stats)
path =
function(T, mu, sigma, p, x0) {
x = rep(NA, T)
x[1] = x0
for(i in 2:T){
z = if_else(rbinom(1,1,p) == 0, -1, 1)
x[i] = x[i-1] * exp(mu + sigma*z)
}
return(x)
}
## Just some testing
x_sim = path(T = 4, mu = 0, sigma = 0.01, p = 0.5, x0 = 100)
## Actual answer
Np = 10000
mc = matrix(nrow = 17, ncol = Np)
for(j in 1:Np){
mc[,j] = path(T = 17, mu = 0, sigma = 0.01, p = 0.5, x0 = 100)
}
test = mc[2:nrow(mc), ] >= 100
sum_test = colSums(test)
comp = sum(sum_test >= 1)/length(sum_test)
prob = 1 - comp
Does this make sense? Any help/tips/advice would be much appreciated. Thanks!
Staying close to your code, I came up with this. Intuitively, if you think about it, the probability should be rather low due to the parameters and I get a probability of about 6.7% which is roughly what I get if I run your code with the parameters from the assignment.
simpath <- function(t, mu, sigma, p, x0, seed){
# set seed
if(!missing(seed)){
set.seed(seed)
}
# set up matrix for storing the results
res <- matrix(c(1:t, rep(NA, t*2)), ncol = 3)
colnames(res) <- c('t', 'z_t', 'x_t')
res[, 'z_t'] <- sample(c(1, -1), size = t, prob = c(p, 1-p), replace = TRUE)
res[1, 3] <- x0
for(i in 2:t){
res[i, 3] <- res[i-1, 3] * exp(mu+sigma*res[i, 2])
}
return(res)
}
x_sim <- simpath(t = 4, mu = 0, sigma = 0.01, p = 0.5, x0 = 100, seed = 123)
x_sim2 <- simpath(t = 36, mu = 0, sigma = 0.03, p = 0.5, x0 = 100, seed = 123)
## Actual answer
Np <- 100000
mc <- matrix(nrow = 36, ncol = Np)
for (j in 1:Np){
mc[, j] <- simpath(t = 36, mu = 0, sigma = 0.03, p = 0.5, x0 = 100)[, 3]
}
test <- mc > 100
sum_test <- colSums(test)
comp = sum(sum_test == 0)/length(sum_test)
prob = comp
> prob
[1] 0.06759
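The Monte Carlo loop can also be written without building each path one step at a time. The sketch below is an assumption-laden alternative, not the method used above: `stay_below_prob` is a hypothetical name, it works with log(x_t / x0) = mu * t + sigma * (z_1 + ... + z_t), and it estimates the probability that the price stays strictly below x0 at every step, which is the event the question's code tests with `>= 100`. Adjust the comparison if the assignment defines the event differently. Comparing on the log scale also avoids relying on exp(sigma) * exp(-sigma) being exactly 1 in floating point.

```r
# Sketch: estimate P(price stays strictly below x0 at every step).
# n_steps is the number of random moves, e.g. 35 for 36 time points.
stay_below_prob <- function(n_steps, n_paths, mu, sigma, p) {
  # One column per simulated path, one row per step; entries are +1 or -1
  z <- matrix(sample(c(1, -1), n_steps * n_paths, replace = TRUE,
                     prob = c(p, 1 - p)),
              nrow = n_steps)

  # Running sum of the +/-1 moves within each path
  s <- apply(z, 2, cumsum)

  # log(x_t / x0); the drift vector is recycled down each column
  log_rel <- mu * seq_len(n_steps) + sigma * s

  # A path counts only if it is never at or above x0
  mean(colSums(log_rel >= 0) == 0)
}

set.seed(123)  # arbitrary seed, only for reproducibility
p_hat <- stay_below_prob(n_steps = 35, n_paths = 10000,
                         mu = 0, sigma = 0.03, p = 0.5)
```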
I have the following reprex list of 10 sample matrices:
# Sample of 10 2*2 matrices
z1 <- matrix(101:104, nrow = 2, ncol = 2)
z2 <- matrix(201:204, nrow = 2, ncol = 2)
z3 <- matrix(301:304, nrow = 2, ncol = 2)
z4 <- matrix(401:404, nrow = 2, ncol = 2)
z5 <- matrix(501:504, nrow = 2, ncol = 2)
z6 <- matrix(601:604, nrow = 2, ncol = 2)
z7 <- matrix(701:704, nrow = 2, ncol = 2)
z8 <- matrix(801:804, nrow = 2, ncol = 2)
z9 <- matrix(901:904, nrow = 2, ncol = 2)
z10 <- matrix(1001:1004, nrow = 2, ncol = 2)
# Combine all matrices into a single list
za <- list(z1, z2, z3, z4, z5, z6, z7, z8, z9, z10)
What we would like is to take za as input and obtain two 2*2 matrices, upper_quantile and lower_quantile.
Essentially, this means taking the 97.5% quantile of the corresponding entries across the 10 matrices above, and likewise the 2.5% quantile for the lower matrix.
In this case we can manually construct the upper_quantile matrix for this example as follows:
upper_quantile <- matrix(data = c(quantile(x = seq(101, 1001, by = 100), probs = 0.975),
c(quantile(x = seq(102, 1002, by = 100), probs = 0.975)),
c(quantile(x = seq(103, 1003, by = 100), probs = 0.975)),
c(quantile(x = seq(104, 1004, by = 100), probs = 0.975)))
, nrow = 2
, ncol = 2
, byrow = FALSE)
upper_quantile
#> [,1] [,2]
#> [1,] 978.5 980.5
#> [2,] 979.5 981.5
I would like to understand how to do this using purrr or tidyverse tools as I have been trying to avoid cumbersome loops on lists and would like to adjust to dimensions automatically.
Could anyone please assist?
Here's a slightly clunky method which at least keeps everything in one pipe. It assumes that all the matrices are the same dimension, which needs to be true else the desired output doesn't make much sense. Working with matrices in purrr is always a little odd. The approach is basically to use flatten to make it easy to group the cells in the order we want, which is one column per location. That lets us map across columns to produce another vector, and then put that vector back into the right matrix. Might need some testing for larger matrices than 2x2.
The other approach I thought about was using cross to make a list of all index combinations, and then mapping through and creating the matrix cell by cell analogous to your example. Can attempt that if desired.
library(tidyverse)
z1 <- matrix(101:104, nrow = 2, ncol = 2)
z2 <- matrix(201:204, nrow = 2, ncol = 2)
z3 <- matrix(301:304, nrow = 2, ncol = 2)
z4 <- matrix(401:404, nrow = 2, ncol = 2)
z5 <- matrix(501:504, nrow = 2, ncol = 2)
z6 <- matrix(601:604, nrow = 2, ncol = 2)
z7 <- matrix(701:704, nrow = 2, ncol = 2)
z8 <- matrix(801:804, nrow = 2, ncol = 2)
z9 <- matrix(901:904, nrow = 2, ncol = 2)
z10 <- matrix(1001:1004, nrow = 2, ncol = 2)
# Combine all matrices into a single list
za <- list(z1, z2, z3, z4, z5, z6, z7, z8, z9, z10)
quant_mat <- function(list, p){
dim = ncol(list[[1]]) * nrow(list[[1]])
list %>%
flatten_int() %>%
matrix(ncol = dim, byrow = TRUE) %>%
as_tibble() %>%
map_dbl(quantile, probs = p) %>%
matrix(ncol = ncol(list[[1]]))
}
quant_mat(za, 0.975)
#> [,1] [,2]
#> [1,] 978.5 980.5
#> [2,] 979.5 981.5
quant_mat(za, 0.025)
#> [,1] [,2]
#> [1,] 123.5 125.5
#> [2,] 124.5 126.5
Created on 2018-03-14 by the reprex package (v0.2.0).
This should do the trick for a single quantile using tidyverse:
tibble(za) %>%
mutate(za = map(za, ~ data.frame(t(flatten_dbl(list(.)))))) %>%
unnest(za) %>%
summarize_all(quantile, probs = .975) %>%
matrix(ncol = 2)
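For comparison, here is a base R sketch that does not use purrr at all, so it steps outside the tidyverse request in the question. It stacks the matrices into a three-dimensional array and applies quantile over the first two dimensions. The name `quant_mat_base` is hypothetical, and the sketch assumes all matrices share the same dimensions.

```r
# Rebuild the example list: 10 matrices of size 2x2
za <- lapply(1:10, function(i) matrix(100 * i + 1:4, nrow = 2, ncol = 2))

# Stack the list into a rows x cols x length(list) array,
# then take the quantile across the third dimension for every cell
quant_mat_base <- function(mats, p) {
  arr <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)))
  apply(arr, c(1, 2), quantile, probs = p, names = FALSE)
}

upper_quantile <- quant_mat_base(za, 0.975)
lower_quantile <- quant_mat_base(za, 0.025)
```

Because the array dimensions are taken from the first matrix, the function adapts to other matrix sizes without changes.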
I am trying to plot a two-dimensional phase portrait in R using the phaseR package. This is an example of what I want to do:
Example that works
library(phaseR)
lotkaVolterra <- function(t, y, parameters) {
x <- y[1]
y <- y[2]
lambda <- parameters[1]
epsilon <- parameters[2]
eta <- parameters[3]
delta <- parameters[4]
dy <- numeric(2)
dy[1] <- lambda*x - epsilon*x*y
dy[2] <- eta*x*y - delta*y
list(dy)
}
then when I plot it I get
lotkaVolterra.flowField <- flowField(lotkaVolterra, x.lim = c(0, 5), y.lim = c(0, 10), parameters = c(2, 1, 3, 2), points = 19, add = FALSE)
grid()
lotkaVolterra.nullclines <- nullclines(lotkaVolterra, x.lim = c(-1, 5), y.lim = c(-1, 10), parameters = c(2, 1, 3, 2), points = 500)
y0 <- matrix(c(1, 2, 2, 2, 3, 4), ncol = 2, nrow = 3, byrow = TRUE)
lotkaVolterra.trajectory <- trajectory(lotkaVolterra, y0 = y0, t.end = 10, parameters = c(2, 1, 3, 2), colour = rep("black", 3))
This is the plot I get:
The problem
When I try to do the same with my equation, however, the vector field does not appear:
WalpeFun <- function(t, y, parameters) {
x <- y[1]
y <- y[2]
k <- parameters[1]
z <- parameters[2]
w <- parameters[3]
b <- parameters[4]
d <- parameters[5]
v <- parameters[6]
a <- parameters[7]
g <- parameters[8]
l <- parameters[9]
e <- parameters[10]
dy <- numeric(2)
dy[1] <- 2.5*(1-(x/k)^z)+g*l+w*e - b*(x*y/d^2+y^2)
dy[2] <- 2.5 * (1 - (y/x + v)^a)
list(dy)
}
Walpe.flowField <-flowField(WalpeFun, x.lim = c(0, 150), y.lim = c(-1, 50), parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273) , points = 20, add = FALSE)
grid()
Walpe.nullclines <-nullclines(WalpeFun, x.lim = c(0, 150), y.lim = c(-1, 50), parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273))
y0 <- matrix(c(8.2, 2), ncol = 2, nrow = 1, byrow = TRUE)
Walpe.trajectory <-trajectory(WalpeFun, y0 = y0, t.end = 100, parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273),system = "two.dim", colour = "black")
I get this very different plot:
and get the following error:
Error in if ((dx[i, j] != 0) & (dy[i, j] != 0)) { : missing value where TRUE/FALSE needed
I don't understand why the vectors don't show, or why the blue nullcline is missing.
Mathematically your x.lim range exceeds the domain where the function can have a value. Because your dy[2] expression has x in the denominator of one of its terms, the function blows up at x == 0 and then there will be an NA in the dy[]-matrix that is internal to the function code. (There's a bit of an ambiguity in that your dy-object is a 2 element vector whereas looking at the code, the calculations are being stored in 2d-matrices named dx and dy.)
flowField #look at the code
png()
Walpe.flowField <-flowField(WalpeFun, x.lim = c(0.01, 150), y.lim = c(-1, 50), parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273) , points = 20, add = FALSE, system="two.dim")
Walpe.nullclines <-nullclines(WalpeFun, x.lim = c(0.01, 150), y.lim = c(-1, 50), parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273))
y0 <- matrix(c(8.2, 2), ncol = 2, nrow = 1, byrow = TRUE)
Walpe.trajectory <-trajectory(WalpeFun, y0 = y0, t.end = 100, parameters = c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500, 0.001, 0.01102, 320.995455, 5.582273),system = "two.dim", colour = "black")
dev.off()
I don't know why the nullclines don't appear, but I'm guessing there are features of the function that neither of us understands.
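A quick way to check the domain issue directly is to evaluate the derivative function by hand at a few grid points, without phaseR. The sketch below copies WalpeFun and the parameter vector from the question; the test points and the names `at_zero` and `at_eps` are chosen only for illustration.

```r
# WalpeFun and its parameters, copied from the question
WalpeFun <- function(t, y, parameters) {
  x <- y[1]
  y <- y[2]
  k <- parameters[1]; z <- parameters[2]; w <- parameters[3]
  b <- parameters[4]; d <- parameters[5]; v <- parameters[6]
  a <- parameters[7]; g <- parameters[8]; l <- parameters[9]
  e <- parameters[10]
  dy <- numeric(2)
  dy[1] <- 2.5*(1-(x/k)^z)+g*l+w*e - b*(x*y/d^2+y^2)
  dy[2] <- 2.5 * (1 - (y/x + v)^a)
  list(dy)
}
params <- c(120.73851, 0.51786, -0.75178, 0.00100, 1.00000, 500,
            0.001, 0.01102, 320.995455, 5.582273)

# Illustrative y values, evaluated at x = 0 and at x = 0.01
y_vals <- c(-1, 0, 10)
at_zero <- sapply(y_vals, function(yy) WalpeFun(0, c(0, yy), params)[[1]])
at_eps  <- sapply(y_vals, function(yy) WalpeFun(0, c(0.01, yy), params)[[1]])

# Each column holds (dy[1], dy[2]) for one y value
is.finite(at_zero)
is.finite(at_eps)
```

Any non-finite entry at x = 0 is consistent with the "missing value where TRUE/FALSE needed" error, since a comparison involving NaN yields NA inside an if condition.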
I have been tinkering with power simulations recently and I have the following code:
library(MASS)
library(Matrix)
simdat <- data.frame(mmm = rep(rep(factor(1:2,
labels=c("m1", "m2")),
each = 2),
times = 2800),
ttt = rep(factor(1:2,
labels = c("t1", "t2")),
times = 5600),
sss = rep(factor(1:70),
each = 160),
iii = rep(rep(factor(1:40),
each = 4),
times = 70))
beta <- c(1, 2)
X1 <- model.matrix(~ mmm,
data = simdat)
Z1 <- model.matrix(~ ttt,
data = simdat)
X1 and Z1 are 11200x2 matrices. With the help of Stackoverflow I managed to make my calculations a lot more efficient than they were before:
funab <- function(){
ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
Mb <- as.vector(X1 %*% beta)
M1 <- rowSums(Z1 * ran_sub[rep(1:70,
each = 160),])
M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
times = 70),])
Mout <- Mb + M1 + M2
Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}
Y will then be a vector of length 11200. I then replicate this function a lot (say 1000 times):
sim <- replicate(n = 1000,
expr = funab(),
simplify = FALSE)
sim will be a list of 1000 vectors, each of length 11200. Given that I want to do this many more times and possibly add more code to funab(), I wonder whether it is advisable to use sparse matrices for X1 and Z1 in the calculations in funab() as it stands?
OK, I followed the advice given in the comments to my question and ran a test with the microbenchmark package. To make copying and pasting easier, I will repeat the code from above:
library(MASS)
library(Matrix)
simdat <- data.frame(mmm = rep(rep(factor(1:2,
labels=c("m1", "m2")),
each = 2),
times = 2800),
ttt = rep(factor(1:2,
labels = c("t1", "t2")),
times = 5600),
sss = rep(factor(1:70),
each = 160),
iii = rep(rep(factor(1:40),
each = 4),
times = 70))
beta <- c(1, 2)
X1 <- model.matrix(~ mmm,
data = simdat)
Z1 <- model.matrix(~ ttt,
data = simdat)
I now create the same matrices as sparse matrices:
sparseX1 <- sparse.model.matrix(~ mmm,
data = simdat)
sparseZ1 <- sparse.model.matrix(~ ttt,
data = simdat)
I then set up the two functions:
funab_sparse <- function(){
ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
Mb <- as.vector(sparseX1 %*% beta)
M1 <- Matrix::rowSums(sparseZ1 * ran_sub[rep(1:70,
each = 160),])
M2 <- Matrix::rowSums(sparseZ1 * ran_ite[rep(rep(1:40, each = 4),
times = 70),])
Mout <- Mb + M1 + M2
Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}
funab <- function(){
ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
Mb <- as.vector(X1 %*% beta)
M1 <- rowSums(Z1 * ran_sub[rep(1:70,
each = 160),])
M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
times = 70),])
Mout <- Mb + M1 + M2
Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}
library(microbenchmark)
res <- microbenchmark(funab(), funab_sparse(), times = 1000)
and get the results:
> res <- microbenchmark(funab(), funab_sparse(), times = 1000)
> res
Unit: milliseconds
expr min lq median uq max neval
funab() 2.200342 2.277006 2.309587 2.481627 69.99895 1000
funab_sparse() 8.419564 8.568157 9.666248 9.874024 75.88907 1000
Assuming that I did not make any substantial mistakes, I can conclude that, with this particular way of doing the calculations, using sparse matrices will not speed up my code.
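One plausible contributing factor, which the benchmark above does not test directly, is that these model matrices are not very sparse to begin with. A minimal check of the share of non-zero entries, using only the two columns of simdat that the model matrices depend on:

```r
# Only mmm and ttt are needed to rebuild X1 and Z1
simdat <- data.frame(
  mmm = rep(rep(factor(1:2, labels = c("m1", "m2")), each = 2), times = 2800),
  ttt = rep(factor(1:2, labels = c("t1", "t2")), times = 5600)
)
X1 <- model.matrix(~ mmm, data = simdat)
Z1 <- model.matrix(~ ttt, data = simdat)

# Share of non-zero entries: an intercept column of ones
# plus a dummy column that is half ones gives 0.75
mean(X1 != 0)
mean(Z1 != 0)
```

With three quarters of the entries non-zero, a sparse representation has little to skip, so its bookkeeping overhead is not offset.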
I have the following code:
beta <- c(1, 2, 3)
X1 <- matrix(c(1, 1, 1, 1,
0, 1, 0, 1,
0, 0, 1, 1),
nrow = 4,
ncol = 3)
Z1 <- matrix(c(1, 1, 1, 1,
0, 1, 0, 1),
nrow = 4,
ncol = 2)
Z2 <- matrix(c(1, 1, 1, 1,
0, 1, 0, 1),
nrow = 4,
ncol = 2)
library(MASS)
S1 <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
S2 <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 4, 4, 2), ncol = 2))
z <- list()
y <- list()
for(j in 1:dim(S1)[1]){
for(i in 1:dim(S2)[1]){
z[[i]] <- X1 %*% beta+Z1 %*% S1[j,]+Z2 %*% S2[i,]+matrix(rnorm(4, mean = 0 , sd = 0.27), nrow = 4)
Z <- unname(do.call(rbind, z))
}
y[[j]] <- Z
Y <- unname(do.call(rbind, y))
}
X1 is a 4x3 matrix, and Z1 and Z2 are 4x2 matrices. So every time X1 %*% beta+Z1 %*% S1[j,]+Z2 %*% S2[i,]+matrix(rnorm(4, mean = 0 , sd = 0.27), nrow = 4) is called, it outputs a 4x1 matrix. So far I store all these values from the inner and outer loops in two lists and then call rbind() to transform them into a matrix. Is there a way to store them directly in matrices?
You can avoid using lists if you rely on the apply functions and on vector recycling. I broke down your equation into its parts. (I hope I interpreted it accurately!)
Mb <- as.vector(X1 %*% beta)
M1 <- apply(S1,1,function(x) Z1 %*% x )
M2 <- apply(S2,1,function(x) Z2 %*% x ) + Mb
Mout <- apply(M1,2,function(x) M2 + as.vector(x))
as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
Because the random numbers are added after the matrix multiplication (i.e., they are not involved in any calculation), you can just add them at the end.
Also note that you can't add a smaller matrix to a larger one, but if you make it a vector first then R will recycle it as necessary. So when Mb (a vector of length 4) is added to a matrix with 4 rows and n columns, it is recycled n times.
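The recycling behaviour can be seen with a small toy example. The values and the names `M`, `recycled` and `attempt` are made up for illustration and are not taken from the question's data:

```r
# A length-4 vector and a matrix with 4 rows and 3 columns
Mb <- c(10, 20, 30, 40)
M  <- matrix(1:12, nrow = 4, ncol = 3)

# The vector is recycled down each column, so Mb is added to every column
recycled <- M + Mb
recycled

# A 4x1 matrix is not recycled: adding it to a 4x3 matrix is an error
attempt <- try(M + matrix(Mb, ncol = 1), silent = TRUE)
inherits(attempt, "try-error")
```

This is why the answer converts X1 %*% beta with as.vector() before adding it.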