I have a summation that I can calculate using four for loops but I wonder if this can be simplified, maybe using a vectorized function, to reduce the computation time. Something similar to the Kronecker product (in R: kronecker(x, x)), or maybe something using outer?
The summation is:
$$ f_{ij} = \sum_{k \in E} \sum_{l \in E} h_{ik}\, h_{jl}\, g_{kl} $$
where E is the sample space of integers from 1 to 9, over which k and l range. The indices i and j are also integers from 1 to 9.
So f, g, and h are all matrices of dimension 9x9.
The h matrix is fixed and I have it, but I am simulating g many times and then choosing the one that minimizes another function. The problem is that one thousand simulations, which is too few, take about 1 second. I would really like to try a million, but that many would take a long time.
I have the for loops in a function:
sim <- function(y, nreps, h) {
  G <- vector("list", nreps) # list containing random values from Dirichlet distribution
  F <- vector("list", nreps) # list containing the f matrices
  M <- vector("numeric", nreps) # vector to store the results
  require(gtools)
  for(n in 1:nreps) {
    f <- matrix(0, nrow=9, ncol=9) # initialize f
    g <- gtools::rdirichlet(9, rep(1,9)) # simulate g
    for(i in 1:9) {
      for(j in 1:9) {
        for(k in 1:9) {
          for(l in 1:9) {
            f[i,j] <- f[i,j] + h[i,k] * h[j,l] * g[k,l] # summation (see above)
          }
        }
      }
    }
    F[[n]] <- f # store f matrix
    G[[n]] <- g # store g matrix
    M[n] <- sum((y - f)^2) # sum of squared differences between y and f
  }
  m <- which.min(M) # which M is the minimum?
  return(list(g=G[[m]], m=M[m]))
}
And I call the function with
sim(y=f.y1, nreps=1000, h=x)
Here is the data:
> dput(f.y1)
structure(c(0.0182002022244692, 0.0121334681496461, 0.0101112234580384,
0, 0, 0, 0, 0, 0, 0.0485338725985844, 0.0940343781597573, 0.112234580384226,
0.0434782608695652, 0.00910010111223458, 0.00101112234580384,
0, 0, 0, 0.0333670374115268, 0.110212335692619, 0.132457027300303,
0.0808897876643074, 0.0222446916076845, 0.0070778564206269, 0.00101112234580384,
0, 0, 0.0070778564206269, 0.0202224469160768, 0.0596562184024267,
0.0616784630940344, 0.0262891809908999, 0.0070778564206269, 0,
0, 0, 0.00202224469160768, 0.00505561172901921, 0.0151668351870576,
0.0182002022244692, 0.0111223458038423, 0.00404448938321537,
0, 0, 0, 0.00202224469160768, 0.00404448938321537, 0.00505561172901921,
0.00505561172901921, 0.00202224469160768, 0.00202224469160768,
0, 0, 0, 0, 0.00202224469160768, 0.00202224469160768, 0.00202224469160768,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), class = "table", dim = c(9L, 9L), dimnames = structure(list(
c("0", "1", "2", "3", "4", "5", "6", "7", "8"), c("0", "1",
"2", "3", "4", "5", "6", "7", "8")), names = c("", "")))
> dput(x)
structure(c(0.61, 0.16, 0.03, 0.005, 0, 0, 0, 0, 0, 0.32, 0.61,
0.16, 0.03, 0.005, 0, 0, 0, 0, 0.06, 0.16, 0.61, 0.16, 0.03,
0.005, 0, 0, 0, 0.01, 0.06, 0.16, 0.61, 0.16, 0.03, 0.01, 0,
0, 0, 0.01, 0.03, 0.16, 0.61, 0.16, 0.03, 0.01, 0, 0, 0, 0.01,
0.03, 0.16, 0.61, 0.16, 0.06, 0.01, 0, 0, 0, 0.005, 0.03, 0.16,
0.61, 0.16, 0.06, 0, 0, 0, 0, 0.005, 0.03, 0.16, 0.61, 0.32,
0, 0, 0, 0, 0, 0.005, 0.03, 0.16, 0.61), dim = c(9L, 9L))
And you'll need to load the gtools package for the rdirichlet function. Thanks heaps!
library(gtools)
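Luckily, this particular example is just "simple" matrix multiplication: f[i,j] is the sum over k and l of h[i,k] * g[k,l] * h[j,l], which is exactly the (i,j) entry of h %*% g %*% t(h). So the loops can easily be vectorised with: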
sim1 <- function(y, nreps, h) {
  G <- vector("list", nreps) # list containing random values from Dirichlet distribution
  F <- vector("list", nreps) # list containing the f matrices
  M <- vector("numeric", nreps) # vector to store the results
  require(gtools)
  for(n in 1:nreps) {
    g <- gtools::rdirichlet(9, rep(1,9)) # simulate g
    f <- h %*% g %*% t(h)
    F[[n]] <- f # store f matrix
    G[[n]] <- g # store g matrix
    M[n] <- sum((y - f)^2) # sum of squared differences between y and f
  }
  m <- which.min(M) # which M is the minimum?
  return(list(g=G[[m]], m=M[m]))
}
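The rewrite relies on the four-loop sum being exactly the (i, j) entry of h %*% g %*% t(h). A minimal sketch to check this on random 9x9 matrices; loop_f is a hypothetical helper that copies the question's loops, and runif values stand in for the real h and a simulated g:

```r
# Hypothetical helper: the question's four nested loops for a single g
loop_f <- function(h, g) {
  K <- nrow(h)
  f <- matrix(0, nrow = K, ncol = K)
  for (i in 1:K) for (j in 1:K) for (k in 1:K) for (l in 1:K)
    f[i, j] <- f[i, j] + h[i, k] * h[j, l] * g[k, l]
  f
}

set.seed(1)
h <- matrix(runif(81), 9, 9)  # stand-in for the fixed h
g <- matrix(runif(81), 9, 9)  # stand-in for one simulated g

# Both forms match the loop result up to floating-point tolerance
all.equal(loop_f(h, g), h %*% g %*% t(h))
all.equal(loop_f(h, g), tcrossprod(h %*% g, h))  # tcrossprod(a, b) is a %*% t(b)
```

tcrossprod avoids forming t(h) explicitly; whether that is measurably faster for 9x9 matrices has not been benchmarked here.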
Run both functions for comparison:
#Original version
set.seed(0)
system.time(a <- sim(y=f.y1, nreps=1000, h=x))
# user system elapsed
# 0.97 0.03 1.00
#revised version
set.seed(0)
system.time(b <- sim1(y=f.y1, nreps=1000, h=x))
# user system elapsed
# 0.01 0.00 0.02
#Check they give the same answer
all.equal(a, b)
#[1] TRUE
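For a million replications, keeping every f and g in the lists F and G costs roughly 1.3 GB (two lists of 10^6 matrices of 81 doubles each). A sketch that keeps only the best g found so far, assuming nothing else needs the stored matrices. As a further assumption, it swaps gtools::rdirichlet for the standard construction of Dirichlet(1, ..., 1) rows by normalising independent Exp(1) draws, so its random stream (and therefore its result) differs from sim and sim1 for the same seed; sim_stream is a hypothetical name:

```r
# Hypothetical streaming variant: keep only the best g instead of all F and G
sim_stream <- function(y, nreps, h) {
  K <- nrow(h)
  ht <- t(h)                 # transpose once, outside the loop
  best_g <- NULL
  best_m <- Inf
  for (n in seq_len(nreps)) {
    # Each row ~ Dirichlet(1, ..., 1): iid Exp(1) draws divided by their row sum
    g <- matrix(rexp(K * K), K, K)
    g <- g / rowSums(g)
    f <- h %*% g %*% ht
    m <- sum((y - f)^2)      # same criterion as sim1
    if (m < best_m) {        # strict '<' keeps the first minimum, like which.min
      best_m <- m
      best_g <- g
    }
  }
  list(g = best_g, m = best_m)
}
```

Called as sim_stream(y = f.y1, nreps = 1e6, h = x), memory use stays constant; run time still grows linearly with nreps.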
I've got a function written in Rcpp:
library(Rcpp)
cppFunction("NumericVector MatVecMul_cpp (NumericVector y, double k) {
int n = y.size();
NumericVector z(n);
int i; double *p1, *p2, *end = &z[n];
double tmp = 1.0;
for (i = 0; i < n; i++) {
for (p1 = &z[i], p2 = &y[0]; p1 < end; p1++, p2++) *p1 += tmp * (*p2);
tmp *= k;
}
return z;
}")
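The double loop in MatVecMul_cpp adds k^(i - j) * y[j] into z[i] for every j <= i, which amounts to the first-order recurrence z[i] = k * z[i - 1] + y[i]. A plain-R sketch of that recurrence next to base R's stats::filter with method = "recursive", which computes the same thing; recur_loop and recur_filter are hypothetical names:

```r
# Loop form of z[i] = k * z[i - 1] + y[i], with z[0] taken as 0
recur_loop <- function(y, k) {
  z <- numeric(length(y))
  acc <- 0
  for (i in seq_along(y)) {
    acc <- k * acc + y[i]
    z[i] <- acc
  }
  z
}

# Same recurrence via a recursive linear filter; as.numeric drops the ts class
recur_filter <- function(y, k) {
  as.numeric(stats::filter(y, k, method = "recursive"))
}

y <- c(0.7, 0, 0, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0, 0)
all.equal(recur_loop(y, 0.9), recur_filter(y, 0.9))
```

This only covers the original behaviour: the reset controlled by c makes the coefficient vary from row to row, which a fixed-coefficient stats::filter call cannot express.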
Basically, the function takes a numeric vector y and a parameter k and calculates an output vector z whose i-th element is the (i-1)-th element of z multiplied by k plus the i-th element of y. However, I now need a tweak: an additional parameter c such that, once more than c rows have passed since the last non-zero value in y, the output z should be 0. See the desired output below with c = 4 and k = 0.9.
structure(list(y = c(0.7, 0, 0, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0,
0), z = c(0.7, 0.63, 0.567, 0.5103, 0.45927, 0, 0, 4, 3.6, 3.24,
8.916, 8.0244, 7.22196, 6.499764)), row.names = c(NA, -14L), class = "data.frame")
So the 6th value of z (index 5 in C++'s zero-based indexing) is 0: the parameter c is equal to 4, so we no longer carry over (multiply) the previous value of z. The 11th value of z, on the other hand, is 8.916, because we not only multiply the previous value by 0.9 but also add 6.0 from the y column.
I tried creating a new 0-1 column in the data.frame (ctrl in the data below) indicating whether the 0.9 decay still applies, and then adjusting the function above, but the following didn't work (the values of z don't reset where ctrl = 0).
cppFunction("NumericVector adjust_cpp (NumericVector y, double k, NumericVector ctrl) {
int n = y.size();
NumericVector z(n);
int i; double *p1, *p2, *p3, *end = &z[n];
double tmp = 1.0;
for (i = 0; i < n; i++) {
for (p1 = &z[i], p2 = &y[0], p3 = &ctrl[0]; p1 < end; p1++, p2++, p3++) {
*p1 += tmp * (*p2);
*p1 *= *p3;
}
tmp *= k;
}
return z;
}"
)
How can I accomplish that?
structure(list(y = c(0.7, 0, 0, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0,
0), z = c(0.7, 0.63, 0.567, 0.5103, 0.45927, 0, 0, 4, 3.6, 3.24,
8.916, 8.0244, 7.22196, 6.499764), ctrl = c(1, 1, 1, 1, 1, 0,
0, 1, 1, 1, 1, 1, 1, 1)), .Names = c("y", "z", "ctrl"), row.names = c(NA,
-14L), class = "data.frame")
With the above data, in R this would be:
fun <- function(y, k, ctrl) {
  n <- length(y)
  z <- numeric(n)
  z[1] <- y[1]
  for (i in 1:(n - 1)) {
    z[i + 1] <- (y[i + 1] + z[i] * k) * ctrl[i + 1]
  }
  return(z)
}
Translating such a simple R function into Rcpp can be done line by line with minimal changes:
#include <Rcpp.h>
using Rcpp::NumericVector;
// [[Rcpp::export]]
NumericVector funC(NumericVector y, double k, NumericVector ctrl) {
R_xlen_t n = y.length();
NumericVector z(n);
z(0) = y(0);
for (R_xlen_t i = 0; i < n - 1; ++i) {
z(i + 1) = (y(i + 1) + z(i) * k) * ctrl(i + 1);
}
return z;
}
/*** R
df <- structure(list(y = c(0.7, 0, 0, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0,
0), z = c(0.7, 0.63, 0.567, 0.5103, 0.45927, 0, 0, 4, 3.6, 3.24,
8.916, 8.0244, 7.22196, 6.499764), ctrl = c(1, 1, 1, 1, 1, 0,
0, 1, 1, 1, 1, 1, 1, 1)), .Names = c("y", "z", "ctrl"), row.names = c(NA,
-14L), class = "data.frame")
fun <- function(y, k, ctrl) {
n <- length(y)
z <- numeric(n)
z[1] <- y[1]
for (i in 1:(n - 1)) {
z[i + 1] <- (y[i + 1] + z[i] * k) * ctrl[i + 1]
}
return(z)
}
z <- fun(df$y, 0.9, df$ctrl)
all.equal(df$z, z)
z <- funC(df$y, 0.9, df$ctrl)
all.equal(df$z, z)
*/
For the provided vectors of length 14, the R version is still faster on this machine. Duplicating y and ctrl ten times gives vectors long enough for Rcpp to be faster.
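The answer takes ctrl as given, while the question asks for a parameter c. Assuming the rule implied by the example (z is zeroed once more than c rows have passed since the most recent non-zero y, and stays zero until the next non-zero y), ctrl can be derived from y and c without a loop. make_ctrl is a hypothetical helper; fun is the R version from the answer:

```r
# R version from the answer above
fun <- function(y, k, ctrl) {
  n <- length(y)
  z <- numeric(n)
  z[1] <- y[1]
  for (i in 1:(n - 1)) {
    z[i + 1] <- (y[i + 1] + z[i] * k) * ctrl[i + 1]
  }
  return(z)
}

# Hypothetical helper: ctrl[i] is 1 while at most c rows have passed since the
# most recent non-zero y (the non-zero row itself counts as 0 rows), else 0
make_ctrl <- function(y, c) {
  idx <- seq_along(y)
  last_nz <- cummax(ifelse(y != 0, idx, 0L))  # 0 before the first non-zero y
  as.numeric(last_nz > 0 & (idx - last_nz) <= c)
}

y <- c(0.7, 0, 0, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0, 0)
ctrl <- make_ctrl(y, 4)
ctrl  # 1 1 1 1 1 0 0 1 1 1 1 1 1 1
fun(y, 0.9, ctrl)
```

With c = 4 this reproduces the ctrl column and the desired z from the question; the same ctrl vector can be passed to funC unchanged.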
I have a list of vectors and would like to check some conditions on these vectors. All my conditions work well except this one:
if (is.null(w[[i]]))
It returns this error:
Myfu(w=NULL,mat)
Error in w[[i]] :
attempt to select less than one element in integerOneIndex
w <- list(c(0.8,0.2),c(0.5,0.4))
mat1 <- c(0, 1, 3, 4, 4,
0, 0, 3, 4, 1,
0, 0, 0, 4, 1,
0, 0, 0, 0, 3,
0, 0, 0, 0, 0)
mat1 <- matrix(mat1, 5, 5)
# define R-vine pair-copula parameter matrix
mat2 <- c(0, 0.2, 0.9, 1.5, 3.9,
0, 0, 1.1, 1.6, 0.9,
0, 0, 0, 1.9, 0.5,
0, 0, 0, 0, 4.8,
0, 0, 0, 0, 0)
mat2 <- matrix(mat2, 5, 5)
mat <- list(mat1,mat2)
Myfu <- function(w,mat){
  m <- length(w)
  for(i in 1:m){
    if (is.null(w[[i]]))
      w[[i]] <- rep.int(1/m, m)
    if (sum(w[[i]]) > 1)
      stop("w must add to one")
    if(any(w[[i]] < 0))
      stop("w must be positive")
  }
  return(w)
}
Any help, please?
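The error most likely comes from 1:m rather than from is.null itself: with w = NULL, m is 0 and 1:0 is the vector c(1, 0), so the loop also runs with i = 0, and w[[0]] raises "attempt to select less than one element". A minimal sketch using seq_len, which gives an empty sequence when m is 0. Myfu2 is a hypothetical name, and returning NULL unchanged when w is NULL is an assumption (substitute whatever default you intend):

```r
# 1:m counts down when m is 0; seq_len(m) is empty instead
1:0          # 1 0
seq_len(0)   # integer(0)

# Hypothetical revision of Myfu: same checks, but the loop is skipped
# entirely when w is NULL (w is then returned unchanged, as NULL)
Myfu2 <- function(w, mat) {
  m <- length(w)
  for (i in seq_len(m)) {
    if (is.null(w[[i]]))
      w[[i]] <- rep.int(1/m, m)   # equal weights for a missing element
    if (sum(w[[i]]) > 1)
      stop("w must add to one")
    if (any(w[[i]] < 0))
      stop("w must be positive")
  }
  return(w)
}

Myfu2(w = NULL, mat = NULL)                     # NULL, no error (mat is unused)
Myfu2(w = list(NULL, c(0.5, 0.4)), mat = NULL)  # first element becomes c(0.5, 0.5)
```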
I've been given a matrix:
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3, 0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6), nrow = 6, ncol = 6, byrow = TRUE)
Using the functions mpow, rows_equal and matrices_equal, I want to find when P^n converges, in other words the n at which all the rows of the matrix are equal and P^n = P^(n+1).
By just looking at the functions, I have managed to deduce that the matrix converges at around n = 19-21.
However, I want to find the right n using a loop. Below are the functions mpow, rows_equal and matrices_equal. I know they could be written differently, but please keep them as they are.
mpow <- function(P, n, d=4) {
  if (n == 0) diag(nrow(P))
  else if (n == 1) P
  else P %*% mpow(P, n - 1)
}
rows_equal <- function(P, d = 4) {
  P_new <- trunc(P * 10^d)
  for (k in 2:nrow(P_new)) {
    if (!all(P_new[1, ] == P_new[k, ])) {
      return(FALSE)
    }
  }
  return(TRUE)
}
matrices_equal <- function(A, B, d = 4) {
  A_new <- trunc(A * 10^d)
  B_new <- trunc(B * 10^d)
  if (all(A_new == B_new)) TRUE else FALSE
}
Now, to write the loop, I think it should be something along the lines of first creating a function like so:
when_converged <- function(P) {...}
and then looping with
for (n in 1:50)
to try, for example, up to n = 50.
However, I don't know how to write the code correctly to do so. Can anyone help me with that?
Thank you for reading my question.
Actually, a much better way is to do this:
## transition probability matrix
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3, 0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6), nrow = 6, ncol = 6, byrow = TRUE)
## a function to find stationary distribution
stydis <- function(P, tol = 1e-16) {
  n <- 1; e <- 1
  P0 <- P ## transition matrix P0
  while(e > tol) {
    P <- P %*% P0 ## resulting matrix P
    e <- max(abs(sweep(P, 2, colMeans(P))))
    n <- n + 1
  }
  cat(paste("convergence after",n,"steps\n"))
  P[1, ]
}
Then when you call the function:
stydis(P)
# convergence after 71 steps
# [1] 0.002590674 0.025906736 0.116580311 0.310880829 0.272020725 0.272020725
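As a cross-check with a different technique (not what stydis does), the stationary distribution is the left eigenvector of P for eigenvalue 1, rescaled to sum to one. A sketch using base R's eigen on t(P); stationary_eigen is a hypothetical name:

```r
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3,
              0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6),
            nrow = 6, ncol = 6, byrow = TRUE)

# pi %*% P = pi means pi is an eigenvector of t(P) with eigenvalue 1
stationary_eigen <- function(P) {
  ev <- eigen(t(P))
  v <- Re(ev$vectors[, which.min(abs(ev$values - 1))])  # eigenvalue closest to 1
  v / sum(v)  # rescale so the probabilities sum to 1 (this also fixes the sign)
}

pi_hat <- stationary_eigen(P)
pi_hat
max(abs(pi_hat %*% P - pi_hat))  # residual at rounding-error level
```

For this P the result equals c(1, 10, 45, 120, 105, 105) / 386, matching the stydis output above; for example, column 1 of P has only P[2, 1] = 0.1 non-zero, and (10/386) * 0.1 = 1/386.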
The function stydis essentially repeats
P <- P %*% P0
until P converges. Convergence is determined numerically by the max norm of the discrepancy matrix (each column of P minus its column mean, i.e. how far the rows are from being identical):
sweep(P, 2, colMeans(P))
The max norm is the maximum absolute value over all matrix elements. When it drops below tol (1e-16 by default), convergence occurs.
As you can see, convergence takes 71 steps. We can obtain faster "convergence" by loosening tol (the tolerance):
stydis(P, tol = 1e-4)
# convergence after 17 steps
# [1] 0.002589361 0.025898057 0.116564506 0.310881819 0.272068444 0.271997814
But if you check:
mpow(P, 17)
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.002589361 0.02589806 0.1165645 0.3108818 0.2720684 0.2719978
# [2,] 0.002589415 0.02589722 0.1165599 0.3108747 0.2720749 0.2720039
# [3,] 0.002589738 0.02589714 0.1165539 0.3108615 0.2720788 0.2720189
# [4,] 0.002590797 0.02590083 0.1165520 0.3108412 0.2720638 0.2720515
# [5,] 0.002592925 0.02592074 0.1166035 0.3108739 0.2719451 0.2720638
# [6,] 0.002588814 0.02590459 0.1166029 0.3109419 0.2720166 0.2719451
The rows only agree to within about 1e-4, because tol = 1e-4.
A double-precision floating-point number carries only about 16 significant decimal digits, so I would suggest using tol = 1e-16 for a reliable convergence test.
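If you want to keep the question's mpow, rows_equal and matrices_equal, a sketch of the loop the question describes might look like this. It returns the first n in 1:max_n for which P^n has equal rows and P^n equals P^(n+1), both compared after truncating to d decimal places, or NA if none is found; max_n = 50 follows the question. The helpers are repeated (with mpow's stray parentheses removed) so the block runs on its own:

```r
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3,
              0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6),
            nrow = 6, ncol = 6, byrow = TRUE)

mpow <- function(P, n, d = 4) {
  if (n == 0) diag(nrow(P))
  else if (n == 1) P
  else P %*% mpow(P, n - 1)
}
rows_equal <- function(P, d = 4) {
  P_new <- trunc(P * 10^d)
  for (k in 2:nrow(P_new)) {
    if (!all(P_new[1, ] == P_new[k, ])) return(FALSE)
  }
  TRUE
}
matrices_equal <- function(A, B, d = 4) {
  all(trunc(A * 10^d) == trunc(B * 10^d))
}

# First n whose power has equal rows and equals the next power (to d places)
when_converged <- function(P, max_n = 50, d = 4) {
  for (n in 1:max_n) {
    Pn <- mpow(P, n)
    if (rows_equal(Pn, d) && matrices_equal(Pn, mpow(P, n + 1), d)) {
      return(n)
    }
  }
  NA_integer_  # no convergence within max_n steps
}

when_converged(P)
```

Because both checks truncate rather than compare with a tolerance, the n found depends on d and on how close the limiting probabilities sit to a truncation boundary.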
I'm setting up a Monte Carlo simulation, and I have been trying to create a set of dummy variables for 180 countries and 12 time periods. Given the large number of data points, is there a shorter way to create dummy variables for time and country fixed effects without pulling them out of an Excel file?
For example:
F.T(1) 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... (extends until 180 countries)
F.T(2) 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... (extends until 180 countries)
Any help would be greatly appreciated.
Using replicate with your random number generator of choice should do the trick.
Here is an example using a simple binomial distribution with prob = 1/2:
replicate(12, rbinom(180, 1, .5), simplify=FALSE)
I think it might be easier/faster to create all data with rbinom first and then convert it into a matrix instead of calling rbinom 12 times. That is:
set.seed(45)
t <- rbinom(180*12, 1, 0.5)
dim(t) <- c(180, 12)
Just to see if there's a difference, here's a benchmark:
# I use simplify = TRUE here.
FUN1 <- function(n, a) {
set.seed(45)
replicate(n, rbinom(a, 1, .5), simplify = TRUE)
}
FUN2 <- function(n, a) {
set.seed(45)
t <- rbinom(n*a, 1, 0.5)
dim(t) <- c(a, n)
t
}
require(rbenchmark)
benchmark(t1 <- FUN1(1000, 12000), t2 <- FUN2(1000, 12000),
order="elapsed", replications=5)
# test replications elapsed relative user.self sys.self
# 2 t2 <- FUN2(1000, 12000) 5 3.991 1.000 3.859 0.111
# 1 t1 <- FUN1(1000, 12000) 5 5.337 1.337 4.785 0.472
identical(t1, t2)
# [1] TRUE
To answer your question in the comments (here N is the number of countries, e.g. 180):
w <- rep(diag(12)[1:9, ], N)
dim(w) <- c(9, 12*N)
w <- t(w)
colnames(w) <- paste0("t", 1:9)
Or even better:
w2 <- do.call(rbind, replicate(N, diag(12)[, 1:9], simplify = FALSE))
colnames(w2) <- paste0("t", 1:9)
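A sketch that builds full sets of time and country fixed-effect dummies, with no random numbers and no Excel, by indexing rows of an identity matrix. It assumes the rows are ordered country by country with the 12 periods consecutive, as in the F.T(1) and F.T(2) example in the question; time_d and country_d are hypothetical names:

```r
N  <- 180  # countries
Tp <- 12   # time periods

# Row r is country ceiling(r / Tp), period ((r - 1) %% Tp) + 1
time_d    <- diag(Tp)[rep(seq_len(Tp), times = N), ]  # (N * Tp) x Tp
country_d <- diag(N)[rep(seq_len(N), each = Tp), ]    # (N * Tp) x N
colnames(time_d)    <- paste0("t", seq_len(Tp))
colnames(country_d) <- paste0("c", seq_len(N))

dim(time_d)            # 2160 12
dim(country_d)         # 2160 180
head(time_d[, 1], 14)  # 1 0 0 0 0 0 0 0 0 0 0 0 1 0, the F.T(1) pattern
```

The first nine columns of time_d are the same as w2 above. Alternatively, model.matrix(~ factor(country) + factor(time)) on a data frame of country and period indices produces the same dummies, with the first level of each factor dropped as the reference.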