Creating Monte Carlo data for Dummy Variables in R

I'm setting up a Monte Carlo simulation, and I have been trying to create a set of dummy variables for 180 countries and 12 time periods. Given the large number of data points, is there a shorter way to create dummy variables for time and country fixed effects than pulling them out of an Excel file?
For example:
F.T(1): 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... (extends until 180 countries)
F.T(2): 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... (extends until 180 countries)
Any help would be greatly appreciated.

Using replicate with your random number generator of choice should do the trick.
Here is an example using a Bernoulli distribution (binomial with size 1) with prob = 1/2:
replicate(12, rbinom(180, 1, .5), simplify=FALSE)

I think it might be easier/faster to create all data with rbinom first and then convert it into a matrix instead of calling rbinom 12 times. That is:
set.seed(45)
t <- rbinom(180*12, 1, 0.5)
dim(t) <- c(180, 12)
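A quick sanity check of the reshape step, assuming the layout you want is 180 rows (countries) by 12 columns (periods): `dim<-` fills column by column, exactly like `matrix()`, so both routes below should produce the same matrix from the same draws. The names `draws`, `m1` and `m2` are illustrative only (and avoid masking the base function `t()`).

```r
set.seed(45)
draws <- rbinom(180 * 12, 1, 0.5)

# Route 1: attach dimensions in place; values fill column 1 first, then column 2, ...
m1 <- draws
dim(m1) <- c(180, 12)

# Route 2: matrix() also fills column-major by default (byrow = FALSE)
m2 <- matrix(draws, nrow = 180, ncol = 12)

identical(m1, m2)
```

Either form is fine; `matrix()` is arguably easier to read, while `dim<-` avoids a copy of the vector.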
Just to see if there's a difference, here's a benchmark:
# I use simplify = TRUE here.
FUN1 <- function(n, a) {
set.seed(45)
replicate(n, rbinom(a, 1, .5), simplify = TRUE)
}
FUN2 <- function(n, a) {
set.seed(45)
t <- rbinom(n*a, 1, 0.5)
dim(t) <- c(a, n)
t
}
require(rbenchmark)
benchmark(t1 <- FUN1(1000, 12000), t2 <- FUN2(1000, 12000),
order="elapsed", replications=5)
# test replications elapsed relative user.self sys.self
# 2 t2 <- FUN2(1000, 12000) 5 3.991 1.000 3.859 0.111
# 1 t1 <- FUN1(1000, 12000) 5 5.337 1.337 4.785 0.472
identical(t1, t2)
# [1] TRUE
To answer your question in the comment (here N is the number of countries, 180 in your case):
w <- rep(diag(12)[1:9, ], N)
dim(w) <- c(9, 12*N)
w <- t(w)
colnames(w) <- paste0("t", 1:9)
Or even better:
w2 <- do.call(rbind, replicate(N, diag(12)[, 1:9], simplify = FALSE))
colnames(w2) <- paste0("t", 1:9)
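For the fixed-effects dummies themselves, a minimal base-R sketch, assuming the panel is ordered country by country with the 12 periods varying fastest (as in the F.T(1) and F.T(2) example in the question), is to build both sets with `kronecker()`. The names `n_countries`, `n_periods`, `panel`, `time_fe` and `country_fe` are illustrative. `model.matrix()` on a factor should give the same time dummies, and the first nine columns of `time_fe` should reproduce `w2` for N = 180.

```r
n_countries <- 180
n_periods   <- 12

# Time dummies: the 12 x 12 identity matrix stacked once per country (2160 x 12)
time_fe <- kronecker(rep(1, n_countries), diag(n_periods))
colnames(time_fe) <- paste0("t", seq_len(n_periods))

# Country dummies: each country's indicator repeated over its 12 periods (2160 x 180)
country_fe <- kronecker(diag(n_countries), rep(1, n_periods))
colnames(country_fe) <- paste0("c", seq_len(n_countries))

# Same time dummies via model.matrix(); "- 1" drops the intercept so all 12 levels are kept
panel <- expand.grid(period = seq_len(n_periods), country = seq_len(n_countries))
time_fe_mm <- model.matrix(~ factor(period) - 1, data = panel)

# Sanity checks against model.matrix() and against the w2 construction above
stopifnot(all(time_fe_mm == time_fe))
w2_check <- do.call(rbind, replicate(n_countries, diag(n_periods)[, 1:9], simplify = FALSE))
stopifnot(all(w2_check == time_fe[, 1:9]))
```

If the regression also has an intercept, drop one column from each set to avoid perfect collinearity (the dummy-variable trap).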

Related

How to construct mixture copula in R

I want to study mixture copulas for reliability analysis. However, I can't construct the RVineMatrix, so the probability integral transformation (PIT) cannot be performed: the copulas used in the h-functions to convert dependent variables into independent ones cannot be filled with mixture copulas.
Here is my code:
copula1 <- mixCopula(list(claytonCopula(param = 1.75,dim = 2),
frankCopula(param = 0.718,dim = 2),
gumbelCopula(param = 1.58,dim = 2)),w=c(0.4492,0.3383,0.2125))
copula2 <- mixCopula(list(frankCopula(param = 0.69,dim = 2),
gumbelCopula(param = 1.48,dim = 2),
claytonCopula(param = 1.9,dim = 2)),w=c(0.3784,0.3093,0.3123))
copula3 <- mixCopula(list(frankCopula(param = 7.01,dim = 2),
claytonCopula(param = 0.75,dim = 2),
gumbelCopula(param = 1.7,dim = 2)),w=c(0.4314,0.2611,0.3075))
copula4 <- mixCopula(list(gumbelCopula(param = 1.21,dim = 2),
claytonCopula(param = 0.89,dim = 2),
frankCopula(param = 3.62,dim = 2)),w=c(0.3306,0.2618,0.4076))
.......
Matrix <- c (5, 4, 3, 2, 1,
0, 4, 3, 2, 1,
0, 0, 3, 2, 1,
0, 0, 0, 2, 1,
0, 0, 0, 0, 1)
Matrix <- matrix(Matrix, 5, 5)
family1 <- c(0,copula10,copula9,copula7, copula4,
0, 0, copula8,copula6, copula3,
0, 0, 0, copula5, copula2,
0, 0, 0, 0, copula1,
0, 0, 0, 0, 0)
family1 <- matrix(family1, 5, 5)
par <- c(0, 0.2, 0.5, 0.32, 0.50,
0, 0, 0.5, 0.98, 0.5,
0, 0, 0, 0.9 , 0.5,
0, 0, 0, 0, 0.39,
0, 0, 0, 0, 0)
par <- matrix(par, 5, 5)
par2 <- c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0,
0, 0, 0, 0, 0,
0, 0, 0, 0, 0,
0, 0, 0, 0, 0)
par2 <- matrix(par2, 5, 5)
RVM <- RVineMatrix(Matrix = Matrix, family = family1,
par = par, par2 = par2,
names = c("V1", "V2", "V3", "V4", "V5"),check.pars = TRUE)
So, could you help me construct the RVineMatrix, or achieve this by other means? Thanks!
There are some points you should be aware of:
You use mixCopula() from the copula package. That gives you a mixture of copulas, not a mixture of R-vine copulas.
You then try to plug these mixture copula objects into an R-vine model. This will not work, because RVineMatrix() from VineCopula identifies each pair-copula family by an integer code (e.g. 1 = Gaussian, 3 = Clayton, 4 = Gumbel, 5 = Frank), and its family matrix accepts only these numbers, not copula objects from the copula package.
So, to build a mixture of R-vine copulas, you need a mixture of R-vine densities. There is a clustering package on GitHub called vineclust, designed for vine copula clustering models. Note that for a mixture of R-vine copulas with two components you need two sets of matrices: one structure Matrix, one family matrix and one parameter matrix per component.
An example of a vine mixture with vineclust is:
library(vineclust)
dims <- 3
obs <- c(500,500)
RVMs <- list()
RVMs[[1]] <- VineCopula::RVineMatrix(Matrix=matrix(c(1,3,2,0,3,2,0,0,2),dims,dims),
family=matrix(c(0,3,4,0,0,14,0,0,0),dims,dims),
par=matrix(c(0,0.8571429,2.5,0,0,5,0,0,0),dims,dims),
par2=matrix(0, dims, dims))
RVMs[[2]] <- VineCopula::RVineMatrix(Matrix=matrix(c(1,3,2,0,3,2,0,0,2), dims,dims),
family=matrix(c(0,6,5,0,0,13,0,0,0), dims,dims),
par=matrix(c(0,1.443813,11.43621,0,0,2,0,0,0),dims,dims),
par2=matrix(0, dims, dims))
margin <- matrix(c('Normal', 'Gamma', 'Lognormal', 'Lognormal', 'Normal', 'Gamma'), 3, 2)
margin_pars <- array(0, dim=c(2, 3, 2))
margin_pars[,1,1] <- c(1, 2)
margin_pars[,1,2] <- c(1.5, 0.4)
margin_pars[,2,1] <- c(1, 0.2)
margin_pars[,2,2] <- c(18, 5)
margin_pars[,3,1] <- c(0.8, 0.8)
margin_pars[,3,2] <- c(1, 0.2)
x_data <- rvcmm(dims, obs, margin, margin_pars, RVMs)

Replicating dplyr pipe structure with apply family or loop

I have a data frame df in which, for each column, I want to calculate what share of occurrences also occur in another column. Each row has a weight, so ideally I would like to get a weighted share.
A <- c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
B <- c(0, 1, 0, 1, 1, 0, 0, 0, 0, 0)
C <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1)
D <- c(1, 0, 0, 1, 1, 0, 0, 0, 0, 0)
weight <- c(0.5, 1, 0.2, 0.3, 1.4, 1.5, 0.8, 1.2, 1, 0.9)
df <- data.frame(A, B, C, D, weight)
I was trying to calculate it for each column pair this way:
library(dplyr)
#total weight of occurrences in A
wgt_A <- df%>%
filter(A == 1)%>%
summarise(weight_A = sum(weight))%>%
select(weight_A)
#weighted share of occurrences in A that also occur in B
wgt_A_B <- df%>%
filter(A == 1, B == 1)%>%
summarise(weight_A_B = sum(weight))%>%
select(weight_A_B)
Result_1 <- wgt_A_B / wgt_A
I would like to end up with six results in total, one for each pair of the 4 columns. However, for this I would need to replicate this dplyr pipe many times, and my actual dataset has 20+ columns like this. Is there a more efficient way to do this with apply/sapply or some kind of loop, where I can also select which columns to use?
I'm new to R and Stack Overflow, so please let me know (and excuse me) if I'm doing or saying anything stupid.
We can use combn in base R to loop over all pairs of columns:
out <- combn(df[1:4], 2, FUN = function(x)
sum(df$weight[x[[1]] & x[[2]]])/ sum(df$weight[as.logical(x[[1]])]) )
names(out) <- combn(names(df)[1:4], 2, FUN = paste, collapse = "_")
Output:
> out
A_B A_C A_D B_C B_D C_D
0.4444444 0.2592593 0.2592593 0.6296296 0.6296296 0.6538462
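If you also want the shares in both directions (A within B and B within A) for any chosen set of columns, one matrix-based sketch uses crossprod(). It assumes the indicator columns are strictly 0/1 and that the weights live in a column called "weight"; `weighted_share_matrix` is a hypothetical helper name, not part of any package. Entry [i, j] is the weighted share of occurrences in column i that also occur in column j, so the upper triangle matches `out` above.

```r
A <- c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0)
B <- c(0, 1, 0, 1, 1, 0, 0, 0, 0, 0)
C <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1)
D <- c(1, 0, 0, 1, 1, 0, 0, 0, 0, 0)
weight <- c(0.5, 1, 0.2, 0.3, 1.4, 1.5, 0.8, 1.2, 1, 0.9)
df <- data.frame(A, B, C, D, weight)

weighted_share_matrix <- function(data, cols, weight_col = "weight") {
  m <- as.matrix(data[cols])   # 0/1 indicator matrix, one column per selected variable
  w <- data[[weight_col]]
  # joint[i, j] = sum of weights over rows where columns i and j are both 1;
  # the diagonal is therefore the total weight of each column's occurrences
  joint <- crossprod(m * w, m)
  # Dividing by the diagonal vector recycles down columns, i.e. row i is divided by joint[i, i]
  joint / diag(joint)
}

res <- weighted_share_matrix(df, c("A", "B", "C", "D"))
round(res, 4)
```

Because `cols` is just a character vector, you can pass any subset of your 20+ columns.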

How to create a confusion matrix using a function in R

I created the following data set:
actual <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0)
The following code works, but I want to use a function to create a confusion matrix instead:
#create new data frame
new_data <- data.frame(actual, predicted)
new_data["class"] <- ifelse(new_data["actual"]==0 & new_data["predicted"]==0, "TN",
ifelse(new_data["actual"]==0 & new_data["predicted"]==1, "FP",
ifelse(new_data["actual"]==1 & new_data["predicted"]==0, "FN", "TP")))
(conf.val <- table(new_data["class"]))
What might be the code to do that?
If you want the same output format as the one you posted, consider this function:
confusion <- function(pred, real) {
stopifnot(all(c(pred, real) %in% 0:1))
table(matrix(c("TN", "FP", "FN", "TP"), 2L)[cbind(pred, real) + 1L])
}
Output
> confusion(predicted, actual)
FN FP TN TP
1 2 5 4
Alternatively, the caret package offers a large collection of machine-learning tools, including confusionMatrix(), which also reports accuracy, sensitivity and specificity:
library(caret)
actual <- as.factor(c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0))
predicted <- as.factor(c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0))
caret::confusionMatrix(data = predicted, reference = actual, positive = "1")
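Without extra packages, a base-R sketch is to cross-tabulate the two vectors as factors with fixed levels, which keeps the table 2 x 2 even if one class never appears, and derive the usual metrics from it. `confusion_2x2` is a hypothetical helper name; rows are predictions and columns are actual values.

```r
confusion_2x2 <- function(pred, real) {
  # Fixing the levels guarantees a 2 x 2 table even if a class is missing
  table(Predicted = factor(pred, levels = c(0, 1)),
        Actual    = factor(real, levels = c(0, 1)))
}

actual    <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0)

cm <- confusion_2x2(predicted, actual)
accuracy  <- sum(diag(cm)) / sum(cm)         # (TP + TN) / n
precision <- cm["1", "1"] / sum(cm["1", ])   # TP / (TP + FP)
recall    <- cm["1", "1"] / sum(cm[, "1"])   # TP / (TP + FN)
cm
```

With the data above this gives TP = 4, FP = 2, FN = 1 and TN = 5, consistent with the confusion() output.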

Accuracy in list of knn loop results different from actual knn accuracy

I'm running a knn model in R and I'm trying to find the optimal k. To achieve this, I've constructed the following code.
suppressMessages(library(class))
set.seed(1)
Lag1 = rnorm(30)
Direction = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1)
X_train <- data.frame(Lag1[1:20])
X_test <- data.frame(Lag1[21:30])
Y_train <- data.frame(Direction[1:20])
Y_test <- data.frame(Direction[21:30])
knn_res <- rep(1,10)
for (i in 1:10) {
predk <- knn(X_train, X_test, Y_train[,1], k=i)
cm <- as.matrix(table(predk, Y_test[,1]))
knn_res[i] <- sum(diag(cm))/length(predk)
}
# which is most optimal
which.max(knn_res)
# looks like K = 1 is the most optimal
predk <- knn(X_train, X_test, Y_train[,1], k=1)
cm <- as.matrix(table(predk1, Y_test[,1]))
sum(diag(cm))/length(predk)
According to which.max(knn_res), my optimal k should be 1, but when I run the exact code from the loop to print my confusion matrix, the accuracy returned does not match the accuracy in my knn_res vector: knn_res[1] returns 0.5, while sum(diag(cm))/length(predk) returns 0.3.
Where have I gone amiss? I feel like it's something in the way I'm adding to my knn_res list but I'm not sure what...
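Two things are worth checking, offered as a hedged sketch rather than a definitive diagnosis. First, the confusion matrix after the loop is built from predk1, which is never created in the code shown, so it may be a leftover object from an earlier session. Second, knn() breaks voting ties at random (see ?knn), so for even k a rerun can give a different accuracy unless the seed is reset. The sketch below computes accuracy with a single helper, both in the loop and afterwards, so the two numbers come from the same code path; `knn_accuracy` is a hypothetical name.

```r
library(class)  # knn(); the class package ships with R as a recommended package

set.seed(1)
Lag1 <- rnorm(30)
Direction <- c(0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1,
               1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1)
X_train <- data.frame(Lag1 = Lag1[1:20])
X_test  <- data.frame(Lag1 = Lag1[21:30])
y_train <- Direction[1:20]
y_test  <- Direction[21:30]

# One helper for both the tuning loop and the final check
knn_accuracy <- function(k) {
  pred <- knn(X_train, X_test, factor(y_train), k = k)
  mean(as.character(pred) == as.character(y_test))   # share of correct predictions
}

knn_res <- sapply(1:10, knn_accuracy)
best_k  <- which.max(knn_res)

# With k = 1 there are no voting ties, so recomputing should reproduce knn_res[1]
knn_accuracy(1) == knn_res[1]
```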

How to encode "pick the best X of Y choices (min or max)" in a mixed integer linear program using R and lpSolve?

I am trying to solve an optimization exercise that uses binary constraints. Below is a description of the problem.
For this problem I am using R and lpSolveAPI. So far I have translated the problem into a list of constraints and built the objective function; however, my program does not produce the correct output because I put the three y variables (yE, yT and yN) into my objective function. My objective function should not contain the three trailing zeros (see the definition of the objective function in the picture above).
My question: how can I define the y variables so that they are binary and used only in the constraints (so they don't appear in the objective function)?
# SELECT FROM ....
require(lpSolveAPI)
# Set the decision variables
obj <- c(21, 22.5, 22.5, 24.5, 23, 25.5, 0, 0, 0)
# Set the constrains parameters
# EG,EK,TG,TK,NG,NK,yE,yT,yN
LHS <- matrix(c(1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0,
1, 0, 1, 0, 1, 0, 0, 0, 0,
0, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 0, -425, 0, 0,
0, 0, 1, 1, 0, 0, 0, -400, 0,
0, 0, 0, 0, 1, 1, 0, 0, -750,
0, 0, 0, 0, 0, 0, 1, 1, 1), nrow=9, byrow = TRUE)
RHS <- c(425, 400, 750, 550, 450, 0, 0, 0, 2)
constranints_direction <- c("<=", "<=", "<=", ">=", ">=", "<=", "<=", "<=", "<=")
# Set 9 constraints and 9 decision variables ==> THERE SHOULD BE ONLY 6 !!!
lprec <- make.lp(nrow = 9, ncol = 9)
# Set the type of problem we are trying to solve
lp.control(lprec, sense="min")
set.type(lprec, 7:9, c("binary"))
set.objfn(lprec, obj)
# Add the 9 constraints row by row
for (i in seq_len(nrow(LHS))) add.constraint(lprec, LHS[i, ], constranints_direction[i], RHS[i])
# Display the LPsolve matrix
lprec
get.type(lprec)
# Solve problem
solve(lprec)
# Get the decision variables values
get.variables(lprec)
# Get the value of the objective function
get.objective(lprec)
This code produces an objective value of 22850:
> # Get the decision variables values
> get.variables(lprec)
[1] 0 425 0 0 550 25 1 0 1
> # Get the value of the objective function
> get.objective(lprec)
[1] 22850
However, it should produce 22850.50 for the same variable allocation.
If you run:
obj <- c(21, 22.5, 22.5, 24.5, 23, 25.5)
x <- c(0, 425, 0, 0, 550, 25)
obj %*% x
you will see:
[,1]
[1,] 22850
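That is, this allocation really does give an objective value of 22850 (22.5 × 425 + 23 × 550 + 25.5 × 25). The y variables have zero cost, so keeping them in the objective vector does not change the total.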
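As a base-R arithmetic check (no lpSolveAPI needed), the sketch below evaluates the objective for the reported solution with and without the three y columns, using the variable order from the question's comment (EG, EK, TG, TK, NG, NK, yE, yT, yN). Because yE, yT and yN have zero cost, both totals should be identical, and the per-variable breakdown shows where 22850 comes from.

```r
cost   <- c(EG = 21, EK = 22.5, TG = 22.5, TK = 24.5, NG = 23, NK = 25.5)
y_cost <- c(yE = 0, yT = 0, yN = 0)          # binary switches carry no cost
x      <- c(EG = 0, EK = 425, TG = 0, TK = 0, NG = 550, NK = 25)
y      <- c(yE = 1, yT = 0, yN = 1)          # values reported by get.variables()

total_with_y    <- sum(c(cost, y_cost) * c(x, y))   # 9-variable objective, as in the model
total_without_y <- sum(cost * x)                    # 6-variable objective
breakdown       <- cost * x                         # contribution of each variable

breakdown
c(total_with_y, total_without_y)
```

Zero objective coefficients are the usual way to keep auxiliary binaries in a MILP without letting them affect the cost.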
