I estimated the CDF of my density on an interval of length 0.03, using 10k points. Even though the CDF is quite smooth, the inverse of the CDF isn't smooth at all. Here, the interval of length 1 is also evaluated with 10k points.
See the figures: Estimated CDF & Estimated Inverse CDF.
For the Inverse CDF I use:
x = seq(from = 0, to = 1, length.out = 10000)
F_hat_inv_given_x = function(y){
  uniroot(function(x){
    F_hat_given_x(x) - y
  }, interval = c(0.065, 0.095))$root
}
F_hat_inv_given_x = Vectorize(F_hat_inv_given_x)
with F_hat_given_x defined as:
F_hat_given_x = function(y) {
  # integrate the estimated density from the lower end of the data up to y
  integrate(f = f_hat_given_x, min(y_data), y)$value
}
F_hat_given_x <- Vectorize(F_hat_given_x)
where f_hat_given_x is my density:
f_hat_given_x = function(y){
  # evaluates f_hat(x_sample, y_i) for each element y_i of y;
  # equivalent to sapply(y, function(yi) f_hat(x_sample, yi))
  tapply(y, INDEX = 1:length(y), FUN = f_hat, x = x_sample)
}
which is a conditional density f(y | X = x) for a given sample X = x_sample:
f_hat = function(x, y){
  sum(K(abs(x - x_data)/H_n) * K(abs(y - y_data)/h_n)) / (h_n * sum(K(abs(x - x_data)/H_n)))
}
where K is the Gaussian kernel, h_n and H_n are bandwidths computed via npcdensbw, and x_data and y_data are the given data on which I evaluate the conditional density f_hat(x, y).
Any idea why the inverse function isn't monotone increasing, as an inverse CDF should be? What is causing the inaccuracy?
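One likely culprit is numerical noise: integrate() carries an adaptive error and uniroot() stops at a finite tolerance, so each evaluation of the inverse jitters by roughly that amount. A cheap workaround is to evaluate F_hat once on a grid and invert it by monotone interpolation; a minimal sketch, assuming F_hat_given_x and y_data from above are defined:

grid <- seq(min(y_data), max(y_data), length.out = 2000)
F_grid <- cummax(F_hat_given_x(grid))  # enforce monotonicity against numerical noise
F_hat_inv_interp <- approxfun(F_grid, grid, ties = "ordered", rule = 2)

u <- seq(0, 1, length.out = 10000)
plot(u, F_hat_inv_interp(u), type = "l")

Because this interpolates a (forced) non-decreasing sequence, the resulting inverse is monotone by construction.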
I want to estimate a model in R. One part of it is a finite mixture model consisting of two OLS regressions.
As a newcomer to R, I don't know how to write this probability density function in R.
I wonder if you can give some help.
The probability density function is as follows:
f(y|x) = (p/σ1)*φ((y - x*b1)/σ1) + ((1-p)/σ2)*φ((y - x*b2)/σ2)
I have used Stata to write an example:
gen double `f1' = normalden($ML_y1, `xb1', exp(`lns1'))
gen double `f2' = normalden($ML_y1, `xb2', exp(`lns2'))
tempvar p
gen double `p' = exp(`lp')/(1 + exp(`lp'))
replace `lnf' = ln(`p'*`f1' + (1 - `p')*`f2')
I wonder if you can show me how to write this function in R.
Thanks a lot; I am looking forward to your help.
See the function FLXMRglm in the flexmix package. The density is estimated with dnorm:
library(flexmix)
FLXMRglm
# your case
if (family == "gaussian") {
  z@defineComponent <- function(para) {
    predict <- function(x, ...) {
      dotarg = list(...)
      if ("offset" %in% names(dotarg))
        offset <- dotarg$offset
      p <- x %*% para$coef
      if (!is.null(offset))
        p <- p + offset
      p
    }
    logLik <- function(x, y, ...) dnorm(y, mean = predict(x, ...),
      sd = para$sigma, log = TRUE)
    new("FLXcomponent", parameters = list(coef = para$coef,
      sigma = para$sigma), logLik = logLik, predict = predict,
      df = para$df)
  }
  z@fit <- function(x, y, w, component) {
    fit <- lm.wfit(x, y, w = w, offset = offset)
    z@defineComponent(para = list(coef = coef(fit), df = ncol(x) + 1,
      sigma = sqrt(sum(fit$weights * fit$residuals^2/mean(fit$weights))/(nrow(x) -
        fit$rank))))
  }
}
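If you just want the density itself, a direct translation of your formula into R might look like this (a sketch; b1 and b2 are the coefficient vectors, s1 and s2 the standard deviations, and p the mixing probability from your formula):

# mixture-of-two-regressions density f(y | x); parameters assumed given
fmix <- function(y, x, b1, b2, s1, s2, p) {
  p * dnorm(y, mean = x %*% b1, sd = s1) +
    (1 - p) * dnorm(y, mean = x %*% b2, sd = s2)
}

And to actually fit the two-component mixture, flexmix can be called directly (assuming a data frame df with columns y and x):

library(flexmix)
m <- flexmix(y ~ x, data = df, k = 2)
parameters(m)  # per-component coefficients and sigmas
prior(m)       # estimated mixing probabilities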
I have written some code to demonstrate the benefits of diversifying across a range of assets when planning a portfolio.
#Portfolio variance
y <- function(x) {
  vars <- rnorm(n = x, mean = 0.7, sd = 0.07)
  # rho generating function:
  ranrho <- function(z) {
    c(rep(0, times = z), runif(n = x - z, min = -1, max = 0.9)) + 0.1
  }
  # you get the lower diagonal of the covar matrix:
  rhold <- sapply(1:x, ranrho)
  # and the full corr matrix
  corr <- rhold + t(rhold)
  corr <- corr * sqrt(vars)
  corr <- t(t(corr) * sqrt(vars))
  # and the variance-covariance matrix:
  vcov <- corr + (if (x == 1) vars else diag(vars))
  # and recover the portfolio variance through the following equation
  portvar <- (1/x^2)*sum(vars) + (if (x == 1) 0 else (1/x^2)*(sum(corr)/2))
  return(portvar)
}
y(1)
y(5)
y(10)
y(20)
y(50)
y(100)
plot(1:100, y(1:100))
y returns a single value for each input x: x is an integer giving the number of assets, and y is the resulting portfolio variance. The example calls above each return a sensible value for y with no issues.
My problem is that I'm getting the following error when trying to plot my function:
Error in rhold + t(rhold) : non-conformable arrays
In addition: Warning message:
In 1:x : numerical expression has 100 elements: only the first used
Firstly, since the code works when I call y() with any single value, rhold and t(rhold) are clearly conformable. I'm also not sure what the warning means. Am I using the plot function wrong?
Another issue: I have looked but can't seem to find how to plot a function that only accepts integer values of x in its domain but returns values of y on the continuous real line.
Grateful for any help.
The problem with your code is that your function expects a single numeric, but plot(1:100, y(1:100)) hands it the whole vector at once; hence the warning that only the first of the 100 elements in 1:x is used.
Just use lapply, keeping your y() function unchanged:
plot(1:100, lapply(1:100, y))
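Equivalently, sapply simplifies the result to a plain numeric vector, or you can wrap y with Vectorize so it accepts a vector directly; a quick sketch of both alternatives:

plot(1:100, sapply(1:100, y))  # sapply returns a numeric vector

y_vec <- Vectorize(y)          # or vectorize y itself
plot(1:100, y_vec(1:100))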
I first want to sample 100 gamma-distributed numbers with shape = 2 and scale = 1/2. I wrote down the log-likelihood function and negated it, since I'm using a minimization tool to maximize. I also tried optim, but to no avail: optim and nlm gave me different answers. This is my code so far:
N = 100
shape = 2
scale = 1/2
Data <- rgamma(N, shape, scale)  # note: rgamma's third positional argument is the rate
LogL = function (x){
  k = x[1]       # shape
  lambda = x[2]  # rate
  (-1)*(N*k*log(lambda) + (k - 1)*sum(log(Data)) - lambda*sum(Data))
}
nlm(LogL, c(1.5, 1))
Your hand-written log-likelihood drops the -N*log(Γ(k)) term, which is likely why nlm and optim wander off to different answers. It is safer to build the log-likelihood from dgamma directly:
logL <- function (x) -sum(dgamma(Data, x[1], x[2], log = TRUE))
N = 100
shape = 2
scale = 1/2
Data <- rgamma(N, shape, scale)  # rate = 0.5, as above
optim(c(1.5, 1), logL)$par       # both optimizers now agree,
nlm(logL, c(1.5, 1))$estimate    # returning roughly (2, 0.5)
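For comparison, here is a sketch of the corrected hand-written version; the only change is restoring the missing N*lgamma(k) term, after which it matches the dgamma-based result:

negLogL <- function(x) {
  k <- x[1]; lambda <- x[2]  # shape and rate
  -(N*k*log(lambda) - N*lgamma(k) + (k - 1)*sum(log(Data)) - lambda*sum(Data))
}
nlm(negLogL, c(1.5, 1))$estimate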
Given a Laplace distribution proposal:
g(x) = (1/2)*e^(-|x|)
and sample size n = 1000, I want to conduct the Monte Carlo (MC) integration for estimating
θ = ∫_0^∞ e^(-√x - x/2) * sin²(x) dx
via importance sampling. Eventually I want to calculate the mean and standard deviation of this MC estimate in R once I get there.
Edit (added later, after the answer below):
This is what I have for my R code so far:
library(VGAM)
n = 1000
x = rexp(n,0.5)
hx = mean(2*exp(-sqrt(x))*(sin(x))^2)
gx = rlaplace(n, location = 0, scale = 1)
Now we can write a simple R function to sample from the Laplace distribution by inverting its CDF:
## `n` is sample size
rlaplace <- function (n) {
u <- runif(n, 0, 1)
ifelse(u < 0.5, log(2 * u), -log(2* (1 - u)))
}
Also write a function for the density of the Laplace distribution:
g <- function (x) ifelse(x < 0, 0.5 * exp(x), 0.5 * exp(-x))
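As a quick, purely illustrative sanity check, the sampler can be compared against this density:

set.seed(1)
hist(rlaplace(1e5), breaks = 100, freq = FALSE)
curve(g(x), from = -8, to = 8, add = TRUE, col = "red")

The histogram should hug the red curve.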
Now, your integrand is:
f <- function (x) {
ifelse(x > 0, exp(-sqrt(x) - 0.5 * x) * sin(x) ^ 2, 0)
}
Now we estimate the integral using 1000 samples (set.seed for reproducibility):
set.seed(0)
x <- rlaplace(1000)
mean(f(x) / g(x))
# [1] 0.2648853
Also compare with numerical integration using quadrature:
integrate(f, lower = 0, upper = Inf)
# 0.2617744 with absolute error < 1.6e-05
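To get the mean and standard deviation of the MC estimate that the question asks for, one simple option is to replicate the whole procedure; a sketch using the functions above:

set.seed(0)
est <- replicate(1000, {
  x <- rlaplace(1000)
  mean(f(x) / g(x))
})
mean(est)  # average of the 1000 MC estimates
sd(est)    # their standard deviation

Alternatively, the standard error of a single run can be estimated as sd(f(x)/g(x)) / sqrt(1000).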
I have a compositional sample and I would like to fit a finite mixture of Dirichlet distributions. To be more precise, consider the following example:
library(gtools)
set.seed(1)
PROB = c(0.25, 0.15, 0.60)
ALPHA = list(
c(1,1,1),
c(2,1,1),
c(1,1,20)
)
size = 500
N = sapply(1:3, function(i, z) sum(z == i),
           sample(1:3, size, prob = PROB, replace = TRUE))
X = do.call('rbind',
            sapply(1:3, function(i, N) rdirichlet(N[i], ALPHA[[i]]), N))[sample(1:size), ]
X contains a sample generated from a mixture of Dirichlet distributions defined on the 3-part simplex. The first Dirichlet component of this mixture has parameter (1,1,1), the second component has parameter (2,1,1), and the third (1,1,20). The mixture probabilities are 0.25, 0.15, 0.60. I would like to retrieve these parameters from the sample.
How would you find these parameters?
Reparameterizing in terms of theta1 = log(p1/p3), theta2 = log(p2/p3) and the logs of all 9 alpha parameters, and then maximizing the log-likelihood using optim() with method = "BFGS", seems to work if the initial values are sufficiently close to the parameter values used to simulate the data. At least, all eigenvalues of the Hessian of the minimized negative log-likelihood are positive (so the log-likelihood Hessian is negative definite at the optimum), and small changes in the initial values lead to the same optimum.
repar <- function(theta) {
  p <- exp(theta[1])
  p[2] <- exp(theta[2])
  p[3] <- 1
  p <- p/sum(p)
  alpha <- matrix(exp(theta[3:11]), 3, 3, byrow = TRUE)
  list(p = p, alpha = alpha)
}
# negative log-likelihood of the mixture (optim minimizes by default)
logL <- function(theta, x) {
  par <- repar(theta)
  p <- par$p
  alpha <- par$alpha
  terms <- 0
  for (i in 1:length(p)) {
    terms <- terms + p[i] * ddirichlet(x, alpha[i, ])
  }
  -sum(log(terms))
}
start <- c(log(c(.25,.15)/.6), log(c(1,1,1, 2,1,1, 1,1,20)))
fit <- optim(start, logL, x = X, hessian = TRUE, method = "BFGS")
repar(fit$par)
eigen(fit$hessian)$val
fit2 <- optim(start + rnorm(11, sd = .2), logL, x = X, hessian = TRUE, method = "BFGS")
repar(fit2$par)
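Since fit$hessian here is the Hessian of the minimized negative log-likelihood, its inverse approximates the covariance matrix of the estimates on the transformed theta scale; a quick sketch for approximate standard errors:

se <- sqrt(diag(solve(fit$hessian)))            # SEs on the theta scale
round(cbind(estimate = fit$par, se = se), 3)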