Iteratively define user-defined discrete distributions - r

I am writing a script that, using -distr-, defines some discrete distributions based on the following objects:
margins <- c("discrete1", "discrete2")
vec1 <- list(support=c(0,1,2), probabilities=c(0.2, 0.2, 0.6))
vec2 <- list(support=c(12,14,20), probabilities=c(0.1, 0.15, 0.75))
Here is the code that works as expected: it creates the two distributions.
library("distr")
discrete1 <- DiscreteDistribution (supp = vec1[[1]], prob = vec1[[2]])
ddiscrete1 <- d(discrete1) # Density function
pdiscrete1 <- p(discrete1) # Distribution function
qdiscrete1 <- q(discrete1) # Quantile function
rdiscrete1 <- r(discrete1) # Random number generator
discrete2 <- DiscreteDistribution (supp = vec2[[1]], prob = vec2[[2]])
ddiscrete2 <- d(discrete2)
pdiscrete2 <- p(discrete2)
qdiscrete2 <- q(discrete2)
rdiscrete2 <- r(discrete2)
Once the two (or possibly more) distributions are defined, my final goal is to sample random numbers from them:
rdiscrete1(100)
rdiscrete2(100)
The problem with this code is that the number of distributions can be very high. I wonder how the creation of these functions could be automated in a more elegant manner.
Also, I need the two functions to be of class DiscreteDistribution and not as nested in lists (see is(discrete1) in my example).

l <- list(list(support = c(0, 1, 2), probabilities = c(0.2, 0.2, 0.6)),
list(support = c(12, 14, 20), probabilities = c(0.1, 0.15, 0.75)))
distrs <- lapply(1:length(l), function(n) {
d <- DiscreteDistribution(supp = l[[n]][[1]], prob = l[[n]][[2]])
list(d = d, dd = d(d), pd = p(d), qd = q(d), rd = r(d))
})
# First object of class DiscreteDistribution
is(distrs[[1]][[1]])
# [1] "DiscreteDistribution" "UnivariateDistribution" "AcDcLcDistribution"
# [4] "Distribution" "UnivDistrListOrDistribution"
# Random numbers
dim(sapply(distrs, function(x) x[[5]](100)))
# [1] 100 2
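If each distribution should be reachable under the name given in margins, rather than nested inside a list of helper functions, one possible variation is to build a named list of DiscreteDistribution objects and call r() only when samples are needed. This is a minimal sketch, assuming the specifications are collected in a list in the same order as margins; the names specs, distrs and samples are illustrative and do not appear in the original post.

```r
library(distr)

margins <- c("discrete1", "discrete2")
specs <- list(
  list(support = c(0, 1, 2),    probabilities = c(0.2, 0.2, 0.6)),
  list(support = c(12, 14, 20), probabilities = c(0.1, 0.15, 0.75))
)

# One DiscreteDistribution per spec, named after the corresponding margin.
distrs <- setNames(
  lapply(specs, function(s) {
    DiscreteDistribution(supp = s$support, prob = s$probabilities)
  }),
  margins
)

# Each element is itself a DiscreteDistribution object, not a nested list.
is(distrs$discrete1, "DiscreteDistribution")

# r() returns the sampling function; draw 100 values from every margin.
samples <- sapply(distrs, function(d) r(d)(100))
dim(samples)  # 100 rows, one column per margin

# Optional: copy the objects into the current environment as discrete1, discrete2.
list2env(distrs, envir = environment())
```

Keeping the objects in a named list means the d(), p(), q() and r() accessors can be applied on demand, so nothing has to be written out per distribution.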

Related

Avoiding duplication in R

I am trying to fit a variety of (truncated) probability distributions to the same very thin set of quantiles. I can do it but it seems to require lots of duplication of the same code. Is there a neater way?
I am using this code by Nadarajah and Kotz to generate the quantile function of the truncated distributions:
qtrunc <- function(p, spec, a = -Inf, b = Inf, ...)
{
tt <- p
G <- get(paste("p", spec, sep = ""), mode = "function")
Gin <- get(paste("q", spec, sep = ""), mode = "function")
tt <- Gin(G(a, ...) + p*(G(b, ...) - G(a, ...)), ...)
return(tt)
}
where spec can be the name of any untruncated distribution for which code in R exists, and the ... argument is used to provide the names of the parameters of that untruncated distribution.
To achieve the best fit I need to measure the distance between the given quantiles and those calculated using arbitrary values of the parameters of the distribution. In the case of the gamma distribution, for example, the code is as follows:
spec <- "gamma"
fit_gamma <- function(x, l = 0, h = 20, t1 = 5, t2 = 13){
ct1 <- qtrunc(p = 1/3, spec, a = l, b = h, shape = x[1],rate = x[2])
ct2 <- qtrunc(p = 2/3, spec, a = l, b = h, shape = x[1],rate = x[2])
dist <- vector(mode = "numeric", length = 2)
dist[1] <- (t1 - ct1)^2
dist[2] <- (t2- ct2)^2
return(sqrt(sum(dist)))
}
where l is the lower truncation, h is the higher and I am given the two tertiles t1 and t2.
Finally, I seek the best fit using optim, thus:
gamma_fit <- optim(par = c(2, 4),
fn = fit_gamma,
l = l,
h = h,
t1 = t1,
t2 = t2,
method = "L-BFGS-B",
lower = c(1.01, 1.4))
Now suppose I want to do the same thing but fitting a normal distribution instead. The names of the parameters of the normal distribution that I am using in R are mean and sd.
I can achieve what I want but only by writing a whole new function fit_normal that is extremely similar to my fit_gamma function but with the new parameter names used in the definition of ct1 and ct2.
The problem of duplication of code becomes very severe because I wish to try fitting a large number of different distributions to my data.
What I want to know is whether there is a way of writing a generic fit_spec as it were so that the parameter names do not have to be written out by me.
Use x as a named list to create a list of arguments to pass into qtrunc() using do.call().
fit_distro <- function(x, spec, l = 0, h = 20, t1 = 5, t2 = 13){
args <- c(x, list(spec = spec, a = l, b = h))
ct1 <- do.call(qtrunc, args = c(list(p = 1/3), args))
ct2 <- do.call(qtrunc, args = c(list(p = 2/3), args))
dist <- vector(mode = "numeric", length = 2)
dist[1] <- (t1 - ct1)^2
dist[2] <- (t2 - ct2)^2
return(sqrt(sum(dist)))
}
This is called as follows, and gives the same result as your original function.
fit_distro(list(shape = 2, rate = 3), "gamma")
# [1] 13.07425
fit_gamma(c(2, 3))
# [1] 13.07425
This will work with other distributions, for however many parameters they have.
fit_distro(list(mean = 10, sd = 3), "norm")
# [1] 4.08379
fit_distro(list(shape1 = 2, shape2 = 3, ncp = 10), "beta")
# [1] 12.98371
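Since optim() works with a numeric parameter vector while fit_distro() expects a named list, a small wrapper is needed to connect the two. The following is a minimal sketch under that assumption; fit_any is a hypothetical helper name, and the start values and bounds are the ones used for the gamma fit in the question.

```r
# Quantile function of a truncated distribution (as in the question).
qtrunc <- function(p, spec, a = -Inf, b = Inf, ...) {
  G   <- get(paste0("p", spec), mode = "function")
  Gin <- get(paste0("q", spec), mode = "function")
  Gin(G(a, ...) + p * (G(b, ...) - G(a, ...)), ...)
}

# Generic distance function (as in the answer above).
fit_distro <- function(x, spec, l = 0, h = 20, t1 = 5, t2 = 13) {
  args <- c(x, list(spec = spec, a = l, b = h))
  ct1 <- do.call(qtrunc, c(list(p = 1/3), args))
  ct2 <- do.call(qtrunc, c(list(p = 2/3), args))
  sqrt((t1 - ct1)^2 + (t2 - ct2)^2)
}

# Hypothetical helper: optimise over a named numeric start vector.
# `...` comes before `lower` so that l, h, t1, t2 are forwarded to fit_distro()
# and cannot be partially matched to `lower`.
fit_any <- function(spec, start, ..., lower = -Inf) {
  par_names <- names(start)
  objective <- function(par) {
    # Re-attach the parameter names and convert the vector to a named list.
    fit_distro(as.list(setNames(par, par_names)), spec = spec, ...)
  }
  optim(par = start, fn = objective, method = "L-BFGS-B", lower = lower)
}

gamma_fit <- fit_any("gamma", start = c(shape = 2, rate = 4),
                     lower = c(1.01, 1.4))
gamma_fit$par
```

The same helper could in principle be called with spec = "norm" and start = c(mean = 10, sd = 3), although the bounds would then need to keep sd positive.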

Computing Economic Models in R: How to apply shocks to parameter values in the euler equation?

Hi everyone, I'm using R to try and simulate some economic models. We do this primarily through the use of the Euler equation. I've figured out that applying shocks to values which are defined within the function (in this case k) is pretty simple, as seen in the code below. However, I'm interested in applying a shock to parameters like delta, theta and rho.
For what it's worth, I'm using the R package deSolve. Any help is appreciated.
library('deSolve')
##############################################
#Computing the neoclassical growth model in R#
##############################################
#parameters and state space
A<-1
theta<- 0.1
alpha<-0.5
delta<-0.3
rho<-0.9
kinital <- c(k = 1)
times <- seq(from = 0, to = 100, by = 0.2)
#define euler equation
euler <- function(t, k, parms)
list((1/theta)*alpha*A*k^(alpha-1)-delta-rho)
#Compute
out <- ode(y = kinital, times = times, func = euler,
parms = NULL)
plot(out, main = "Euler equation", lwd = 2)
#########################
#Temporary Capital Shock#
########################
eventdat <- data.frame(var = c("k"),
time = c(30) ,
value = c(10),
method = c("add"))
eventdat1 <- data.frame(var = c("k"),
time = c(30) ,
value = c(-5),
method = c("add"))
out3<-ode(y=kinital,times=times,func=euler,events=list(data=eventdat))
out4<-ode(y=kinital,times=times,func=euler,events=list(data=eventdat1))
plot(out,out3,out4,main="Temporary Shock",lwd=3)
Not a great fix, but one way to deal with this type of problem is to make the shock conditional on time, so that the parameter changes only over some interval. I do this for depreciation as follows:
##############################
#Temporary Depreciation Shock#
##############################
#New Vars
A<-1
theta<- 0.1
alpha<-0.5
delta<-0.3
rho<-0.9
kinital <- c(k = 17)
times <- seq(from = 0, to = 400, by = 0.2)
#Redefine Euler
euler2<-function(t,k,parms){
list((1/theta)*alpha*A*k^(alpha-1)-delta-rho)}
euler3<-function(t,k,parms){
list((1/theta)*alpha*A*k^(alpha-1)-(delta+0.05*(t>=30&t<=40))-rho)}
#Output
doutbase<-ode(y=kinital,times=times, func=euler2, parms=NULL)
doutchange<-ode(y=kinital,times=times, func=euler3, parms=NULL)
#plots
plot(doutbase,doutchange,main="Change in depreciation at t=30 until t=40",lwd=2)
A colleague on Stack Exchange suggested a cleaner approach: treat delta as a second state variable with a zero derivative, so that it can be changed through events. This is seen below:
A<-1
theta<- 0.1
alpha <- 0.5
rho<-0.9
init <- c(k = 17, delta = 0.3)
times <- seq(from = 0, to = 400, by = 0.2)
euler.function<-function(t,y, parms){
k <- y[1]
delta <- y[2]
dk <- (1/theta)*alpha*A*k^(alpha-1)-delta-rho
list(c(dk, 0))}
deventdat<- data.frame(var = c("delta", "delta"),
time = c(30, 51) ,
value = c(0.1, -0.1),
method = c("add"))
res<-ode(y=init,times=times, func=euler.function, parms=NULL, events=list(data=deventdat))
plot(res,lwd=2)
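The time-conditioning idea can be generalised so that any of delta, theta or rho can be shocked without writing a new function each time. The sketch below is one possible way to do that, assuming the parameters are passed through parms as a named vector; make_euler and the shock list are hypothetical names introduced for illustration. The derivative function needs only base R, and the solver call is guarded so that it runs only if deSolve is installed.

```r
# Parameters travel through `parms` instead of global variables.
parms <- c(A = 1, theta = 0.1, alpha = 0.5, delta = 0.3, rho = 0.9)

# Hypothetical shock description: add `size` to parameter `name` for start <= t <= end.
shock <- list(name = "delta", start = 30, end = 40, size = 0.05)

# Function factory: returns a derivative function with the shock built in.
make_euler <- function(shock = NULL) {
  function(t, y, parms) {
    p <- as.list(parms)
    if (!is.null(shock) && t >= shock$start && t <= shock$end) {
      p[[shock$name]] <- p[[shock$name]] + shock$size
    }
    k  <- y[1]
    dk <- (1 / p$theta) * p$alpha * p$A * k^(p$alpha - 1) - p$delta - p$rho
    list(dk)
  }
}

# The derivative can be inspected without a solver: inside the shock window
# it is lower than the baseline by approximately 0.05.
make_euler(shock)(35, c(k = 17), parms)[[1]] - make_euler()(35, c(k = 17), parms)[[1]]

# Solving requires deSolve.
if (requireNamespace("deSolve", quietly = TRUE)) {
  times   <- seq(from = 0, to = 400, by = 0.2)
  base    <- deSolve::ode(y = c(k = 17), times = times,
                          func = make_euler(), parms = parms)
  shocked <- deSolve::ode(y = c(k = 17), times = times,
                          func = make_euler(shock), parms = parms)
  tail(cbind(base, k_shocked = shocked[, "k"]))
}
```

Changing shock$name to "theta" or "rho" applies the same kind of temporary shock to those parameters.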

How do I need to assign values to each other in triplets using R?

The situation is as follows:
I need to create a dataset of triplets where we have a discrete distribution of stock prices S <- c(80,100,120,140,160) with probabilities P <- c(0.2, 0.3, 0.2, 0.2, 0.1), a call option C <- max(S-120,0) = c(0,0,0,20,40), and the liability of an option which pays 30 if S is in a certain region and zero otherwise, namely L <- 30 * I{110 \leq S \leq 150} = c(0,0,30,30,0). It is important to mention that if S[1] = 80 is drawn, then C[1] and L[1] must be used with it. This holds for i = 1,2,3,4,5. How do you create a dataset for N = 10000 simulations where each value for i corresponds to the other two values for the same i?
This is the code I have so far. Note that X_1 = S, X_2 = C and Y = L.
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
X_2 <- function(n) {
sample(X_1 - 120, size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
Y <- function(n) {
sample(L, size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
##Creating triplets##
df <- data.frame(S_T = X_1(10000), C_T = X_2(10000), L_T =Y(10000))
df
I'm not sure if you want C_T to be dependent on the S_T values. If you do, I think you just want to call X_1, assign the results to an object, then use that as the argument to X_2 (or just subtract 120, which is what X_2 does).
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
# Call that function
S_T <- X_1(10) # for practice
C_T <- S_T - 120 # that's all you're doing in function X_2, if you want to use S_T
If you want C_T to contain values independent of S_T, you can call one function inside another:
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
X_2 <- function(n) {
X_1(n) - 120
}
S_T <- X_1(10) # Same as above
C_T <- X_2(10) # Gives values not dependent on S_T
EDIT to address comment below:
It's hard to read the comment, but it looks like you want to create a function that takes the results of function X_1 and returns a result based on a condition. Use ifelse to check each element one at a time. You can create another function and then pass it the results of function X_1.
Y <- function(X_1_func){
ifelse(X_1_func == 80,
0,
ifelse(X_1_func == 100,
0,
ifelse(X_1_func == 120,
30,
60 # Add a default value here or the last possible value if others are F
)
)
)
}
Y(X_1(10)) # ifelse is vectorized, so the whole result of X_1 can be passed at once. Assign the result to L or whatever you wish to call it.
If this all works for you, you can accept the answer.
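If the three values must always belong to the same state i, as the question describes, one option is to sample the index once and look up S, C and L with it. This is a minimal sketch using the vectors given in the question; the index name i and the use of pmax() for the call payoff are choices made here for illustration.

```r
set.seed(1)  # only for reproducibility of the illustration

S <- c(80, 100, 120, 140, 160)   # stock prices
P <- c(0.2, 0.3, 0.2, 0.2, 0.1)  # probabilities
C <- pmax(S - 120, 0)            # call payoff: 0, 0, 0, 20, 40
L <- c(0, 0, 30, 30, 0)          # liability, as listed in the question

N <- 10000
# Draw the state index once, then look up all three values with the same index.
i <- sample(seq_along(S), size = N, replace = TRUE, prob = P)
df <- data.frame(S_T = S[i], C_T = C[i], L_T = L[i])
head(df)
```

Because every column is indexed by the same i, each row is a consistent triplet by construction.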

Monte Carlo simulations for VAR models

I've been trying to estimate VAR models using Monte Carlo Simulation. I have 3 endogenous variables. I need some guidance regarding this.
First of all, I want to add an outlier as a percentage of the sample size.
Second (a second simulation for the same model), I want to use a multivariate contaminated normal distribution such as 0.9N (0, I) + 0.1((0,0,0)',(100, 100, 100)) instead of outliers.
Could you tell me how to do these?
Thank you.
library(tsDyn) # provides VECM()
library(MonteCarlo) # provides MonteCarlo() and MakeTable()
RR <- function(n, out){
# n is number of observations
k <- 3 # Number of endogenous variables
p <- 2 # Number of lags
# add outlier
n[1]<- n[1]+out
# Generate coefficient matrices
B1 <- matrix(c(.1, .3, .4, .1, -.2, -.3, .03, .1, .1), k) # Coefficient matrix of lag 1
B2 <- matrix(c(0, .2, .1, .07, -.4, -.1, .5, 0, -.1), k) # Coefficient matrix of lag 2
M <- cbind(B1, B2) # Companion form of the coefficient matrices
# Generate series
DT <- matrix(0, k, n + 2*p) # Raw series with zeros
for (i in (p + 1):(n + 2*p)){ # Generate series with e ~ N(0,1)
DT[, i] <- B1%*%DT[, i-1] + B2%*%DT[, i-2] + rnorm(k, 0, 1)
}
DT <- ts(t(DT[, -(1:p)])) # Convert to time series format
#names <- c("V1", "V2", "V3") # Rename variables
colnames(DT) <- c("Y1", "Y2", "Y3")
#plot.ts(DT) # Plot the series
# estimate VECM
vecm1 <- VECM(DT, lag = 2, r = 2, include = "const", estim ="ML")
vecm2 <- VECM(DT, lag = 2, r = 1, include = "const", estim ="ML")
# mse
mse1 <- mean(vecm1$residuals^2)
mse2 <- mean(vecm2$residuals^2)
#param_list <- unname(param_list)
return(list("mse1" = mse1, "mse2" = mse2)) # mse3 was never computed, so it is not returned
}
# defined the parameter grids(define the parameters ranges we want to run our function with)
n_grid = c(50, 80, 200, 400)
out_grid = c(0 ,5, 10)
# collect parameter grids in a list (to enter it into the Monte Carlo function)
prml = list("n" = n_grid, "out" = out_grid)
# run simulation
RRS <- MonteCarlo(func = RR, nrep = 1000, param_list = prml)
summary(RRS)
# make table:
rows = "n"
cols = "out"
MakeTable(output = RRS, rows = rows, cols = cols)
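Both requests in this question can be sketched in base R. The code below is a rough illustration, not a definitive implementation: it assumes the contaminated distribution is meant as a mixture of N(0, I) with weight 0.9 and a zero-mean normal with variance 100 on each component with weight 0.1, and it assumes outliers are additive shifts applied to a randomly chosen share of observations. The names rcontam and add_outliers and their arguments are hypothetical.

```r
# Draw n error vectors of dimension k from
# (1 - eps) * N(0, I) + eps * N(0, contam_var * I)  (assumed reading of the question).
rcontam <- function(n, k = 3, eps = 0.1, contam_var = 100) {
  contaminated <- runif(n) < eps                  # draws from the wide component
  sds <- ifelse(contaminated, sqrt(contam_var), 1)
  matrix(rnorm(n * k), nrow = n, ncol = k) * sds  # row i is scaled by sds[i]
}

# Shift a share `pct` of randomly chosen rows of a series by `size`.
add_outliers <- function(x, pct = 0.05, size = 10) {
  m <- round(pct * nrow(x))
  rows <- sample(nrow(x), m)
  x[rows, ] <- x[rows, ] + size
  x
}

set.seed(123)
e     <- rcontam(n = 200)                      # 200 x 3 matrix of contaminated errors
clean <- matrix(rnorm(200 * 3), ncol = 3)      # placeholder for a simulated series
y     <- add_outliers(clean, pct = 0.05, size = 10)
```

Inside the simulation loop, a row of e could take the place of rnorm(k, 0, 1), and add_outliers() could be applied to the generated series before estimation, with pct supplied through the parameter grid.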

R. lapply multinomial test to list of dataframes

I have a data frame A, which I split into a list of 100 data frames, each having 3 rows (In my real data each data frame has 500 rows). Here I show A with 2 elements of the list (row1-row3; row4-row6):
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
count = c(24878, 33605, 12100 , 25899, 34777, 13765))
# This is the list:
nest <- split(A, rep(1:2, each = 3))
I want to apply the multinomial test to each of these data frames and extract the p-value of each test. So far I have done this:
library(EMT)
fun <- function(x){
multinomial.test(x$count,
prob=x$prob,
useChisq = FALSE, MonteCarlo = TRUE,
ntrial = 100, # n of withdrawals accomplished
atOnce=100)
}
lapply(nest, fun)
However, I get:
"Error in multinomial.test(x$counts_set, prob = x$norm_genome, useChisq = F, :
Observations have to be stored in a vector, e.g. 'observed <- c(5,2,1)'"
Does anyone have a smarter way of doing this?
The results of split are created with names 1, 2 and so on. That's why x$count in fun cannot access it. To make it simpler, you can combine your split elements using the list function and then use lapply:
n <- c(0,1,2,0,1,2)
prob <- c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1)
count <- c(24878, 33605, 12100 , 25899, 34777, 13765)
A <- cbind.data.frame(n, prob, count)
nest = split(A,rep(1:2,each=3))
fun <- function(x){
multinomial.test(x$count,
prob=x$prob,
useChisq = F, MonteCarlo = TRUE,
ntrial = 100, # n of withdrawals accomplished
atOnce=100)
}
# Create a list of the split elements
new_list <- list(nest$`1`, nest$`2`)
lapply(new_list, fun)
A solution with dplyr.
A = data.frame(n = c(0,1,2,0,1,2),
prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
count = c(43, 42, 9, 74, 82, 9))
library(dplyr)
library(EMT) # provides multinomial.test()
nest <- A %>%
mutate(pattern = rep(1:2,each=3)) %>%
group_by(pattern) %>%
dplyr::summarize(mn_pvals = multinomial.test(count, prob)$p.value)
nest
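To collect only the p-values from a list of data frames, vapply() can be wrapped around whichever test is used. The sketch below uses base R's chisq.test() as a stand-in so that it runs without extra packages; this is an asymptotic goodness-of-fit test, not the multinomial test from EMT. The helper name p_values and its test argument are hypothetical.

```r
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
                prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
                count = c(24878, 33605, 12100, 25899, 34777, 13765))
nest <- split(A, rep(1:2, each = 3))

# Hypothetical helper: apply a goodness-of-fit test to every data frame
# and keep only the p-value. `test` must return a list with a p.value element.
p_values <- function(dfs, test = function(obs, prob) chisq.test(obs, p = prob)) {
  vapply(dfs, function(x) test(x$count, x$prob)$p.value, numeric(1))
}

p_values(nest)  # named numeric vector, one p-value per list element

# With EMT installed, the Monte Carlo multinomial test could be plugged in instead:
# library(EMT)
# p_values(nest, function(obs, prob)
#   multinomial.test(obs, prob, MonteCarlo = TRUE, ntrial = 100, atOnce = 100))
```

Using vapply() with numeric(1) makes the call fail early if a test does not return a single numeric p-value.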
