How to fix code in RMarkdown for simulation that will not run due to "Error: Discrete value supplied to continuous scale"?

I am wondering what is wrong with the following R code (R Markdown)? I keep getting an error message for the last line that says "Error in h(x.n, df = N - 2) : unused argument (df = N - 2)". I am confused because my TA looked at my code and told me that it should run perfectly.
For context, this is the problem I am working on:
library(MASS)
library(tidyverse)
library(hypergeo)
set.seed(1)
rm(list=ls())
N=7
Nsim=10000
rho=0
Sigma=matrix(c(1,rho,rho,1),2,2)
Sigma
mu=c(0,0)
r_vec=matrix(NaN,nrow=1,ncol=Nsim)
# mvrnorm simulates from the multivariate normal distribution: N = 7, correlation matrix Sigma.
# Before, X was fixed, but now it is random, with a dependence on Y that I can control.
# Compute rho hat and see if on average it gives the correct rho. Check how serious the bias is
# when the expected value of rho hat isn't equal to rho; I want a feeling for whether this is
# something I should worry about or not.
for (i in 1:Nsim){
  data = mvrnorm(N, mu, Sigma)
  r_vec[i] = cor(data[,1], data[,2])
}
mean(r_vec)
update.packages("deSolve")
x.n=seq(-1,1,0.1)
sim_rho0 <- function(Nsim, N, rho){
  rho = rho
  mu = c(0, 0)
  Sigma = matrix(c(1, rho, rho, 1), nrow = 2)
  r_vec = matrix(NaN, nrow = Nsim)
  for (i in 1:Nsim){
    data = mvrnorm(N, mu, Sigma)
    r_vec[i] = cor(data[,1], data[,2])
  }
  # here we compute t, which should have a t_{N-2} distribution. Trying to
  # reconstruct the density here; not a mathematical proof. Might be a mistake*****
  # range of values and plotting density for each one
  h <- function(N, rho, x.n){
    rho = rho
    a <- ((N-2)*(gamma(N-1))*(1-rho^2)^(N-1)/2*(1-x.n^2)^(N-4)/2)/((2*pi)*(sqrt(N-1/2))((1-x.n*rho)^(N-3/2)))
    b <- hypergeo(1/2, 1/2, (2*N-1/2), ((x.n*rho)+1)/2)
    h2 = a*b
    return(h2)
  }
  t = r_vec*sqrt(N-2)/(1-r_vec^2)
  x.n = seq(-1, 1, 0.1)
  y.n = h(x.n, df = N-2)
  df = tibble(X = t)
  df2 = tibble(x = x.n, y = y.n)
  ggplot() +
    geom_histogram(data = df, aes(x = X, y = ..density..), binwidth = 0.2,
                   color = "black", fill = "white") +
    geom_line(data = df2, aes(x = x, y = y), color = "red") +
    xlim(-5, 5)
}
rho=0.8
Nsim=3000
N=10
sim_rho0(Nsim,N,rho)

You've defined the function h to have the arguments N, rho and x.n. Then you try to call it with the argument df, which h does not have, therefore you get the error. You need to call h with the correct arguments (i.e. don't leave out N and rho, and if the value x.n should be passed to the function argument x.n, specify it by name rather than positionally). I also recommend following a style guide, e.g. https://style.tidyverse.org/
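For example, inside sim_rho0 the call could be written with matching named arguments (a minimal sketch of the fix; note that the body of h also has a missing * between sqrt(N-1/2) and (1-x.n*rho)^(N-3/2), which will be the next error you hit once the call itself is fixed):
y.n = h(N = N, rho = rho, x.n = x.n)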

Related

Trying to plot loglikelihood of Cauchy distribution for different values of theta in R

I am trying to plot the log-likelihood function of the Cauchy distribution for varying values of theta (location parameter). These are my observations:
obs<-c(1.77,-0.23,2.76,3.80,3.47,56.75,-1.34,4.24,3.29,3.71,-2.40,4.53,-0.07,-1.05,-13.87,-2.53,-1.74,0.27,43.21)
Here is my log-likelihood function:
ll_c <- function(theta, x_values){
  n <- length(x_values)
  logl <- -n*log(pi) - sum(log(1 + (x_values - theta)^2))
  return(logl)
}
and I've tried making a plot by using this code:
x <- seq(from = -10, to = 10, by = 0.1); length(x)
theta_null <- NULL
for (i in x){
  theta_log <- ll_c(i, counts)
  theta_null <- c(theta_null, theta_log)
}
plot(theta_null)
The graph does not look right and for some reason the length of x and theta_null differs.
I am assuming that theta is your location parameter (the scale is set to 1 in my example). You should obtain the same result using a t-distribution with 1 df and shifting the observations by theta. I left some comments in the code as guidance.
obs = c(1.77,-0.23,2.76,3.80,3.47,56.75,-1.34,4.24,3.29,3.71,-2.40,4.53,-0.07,-1.05,-13.87,-2.53,-1.74,0.27,43.21)
ll_c = function(theta, obs){
  # Compute log-lik for obs and a value of theta (location)
  logl = sum(dcauchy(obs, location = theta, scale = 1, log = TRUE))
  return(logl)
}
# Loop over possible values of theta (obs given)
x = seq(from = -10, to = 10, by = 0.1)
ll = NULL
for (i in x){
  ll = c(ll, ll_c(i, obs))
}
# Plot log-lik vs possible values of theta
plot(x, ll)
It is hard to say exactly what you are experiencing without more info, but I'll make an educated guess.
First of all, we can simplify this a lot by using the *t family of functions for the t distribution, as the Cauchy distribution is just the t distribution with df = 1. So your calculations could have been done using:
ncp <- seq(from = -10, to = 10, by = 0.1)
theta_null <- NULL
for (i in ncp)
  theta_null <- c(theta_null, sum(dt(obs, 1, i, log = TRUE)))
Note that constant terms such as -n*log(pi) don't actually matter for any practical purposes. We are usually interested in minimizing/maximizing the likelihood, in which case all constants are irrelevant.
Now if we use this approach, we can quite quickly notice something by printing the values:
print(head(theta_null))
[1] -Inf -Inf -Inf -Inf -Inf -Inf
So I am assuming what you are experiencing is that many of your values are "almost" negative infinity, and maybe these are not stored correctly in your outcome vector. I can't see that this should be the case from your code, but this would be my initial guess.
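If you want to see which candidate values are affected, a quick diagnostic (my addition, assuming the ncp/theta_null loop above) is:
ncp[!is.finite(theta_null)] # candidate values whose summed log-density underflowed to -Inf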

How to calculate Kullback-Leibler divergence of kernel estimation in R

I used kernel estimation to get a non-parametric probability density function. Then, I want to compare the tail 'distance' between two kernel distributions of continuous variables, using Kullback-Leibler divergence. I have tried the following code:
kl_l <- function(x, y) {
  integrand <- function(x, y) {
    f.x <- fitted(density(x, bw = "nrd0"))
    f.y <- fitted(density(y, bw = "nrd0"))
    return((log(f.x) - log(f.y)) * f.x)
  }
  return(integrate(integrand, lower = -Inf, upper = quantile(density(x, bw = "nrd0"), 0.25))$value)
  # the Kullback-Leibler equation
}
When I run kl_l(a, b) for a, b = 19 continuous variables, it returns an error:
Error in density(y, bw = "nrd0") : argument "y" is missing, with no default
Is there any way to calculate this?
(If anyone wants to see the actual equation: https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2019/attention-to-the-tails-global-financial-conditions-and-exchange-rate-risks.pdf page 13.)
In short, I think you just need to move the f.x and f.y outside the integrand (and possibly replace fitted with approxfun):
kl_l <- function(x, y) {
  f.x <- approxfun(density(x, bw = "nrd0"))
  f.y <- approxfun(density(y, bw = "nrd0"))
  integrand <- function(z) {
    return((log(f.x(z)) - log(f.y(z))) * f.x(z))
  }
  return(integrate(integrand, lower = -Inf, upper = quantile(density(x, bw = "nrd0"), 0.25))$value)
  # the Kullback-Leibler equation
}
Expanding a little:
Looking at the paper you referenced, it appears as though you need to first create the two fitted distributions f and g. So if your variable a contains observations under the 1-standard-deviation increase in global financial conditions, and b contains the observations under average global financial conditions, you can create two functions as in your example:
f <- approxfun(density(a))
g <- approxfun(density(b))
Then define the integrand:
integrand <- function(x) log(f(x) / g(x)) * f(x)
The upper bound:
upper <- quantile(density(b, bw = "nrd0"), 0.25)
And finally do the integration on x within the specified bounds. Note that each value of x in the numerical computation has to go into both f and g; in your function kl_l, the x and y were separately going into the integrand, which I think is incorrect; and in any case, integrate will only have operated on the first variable.
integrate(integrand, lower = -Inf, upper = upper)$value
One thing to check for is that approxfun returns NA for values outside the range specified in the density, which can mess up your operation, so you'll need to adjust for those (if you expect the density to go to zero, for example).
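Putting the pieces together, here is a minimal runnable sketch with simulated a and b (your actual data isn't shown); I've used the 25% quantile of the data itself as the upper bound and clamped the integrand to zero wherever either density estimate vanishes:
set.seed(1)
a <- rnorm(1000)
b <- rnorm(1000, mean = 0.5)
# yleft/yright make the fitted densities return 0 (not NA) outside their range
f <- approxfun(density(a, bw = "nrd0"), yleft = 0, yright = 0)
g <- approxfun(density(b, bw = "nrd0"), yleft = 0, yright = 0)
integrand <- function(z) ifelse(f(z) > 0 & g(z) > 0, log(f(z) / g(z)) * f(z), 0)
upper <- quantile(a, 0.25) # assumption: data quantile as the tail cutoff
integrate(integrand, lower = -Inf, upper = upper)$value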

mle2 on Weibull sample

I would like to use the mle2 function to produce MLEs for the Weibull shape and scale parameters. I have written the following code, but got the error shown below it.
So which component is NULL that I should change to numeric? Are there any other problems with my code that would prevent obtaining the MLEs?
x2<- rweibull(n, shape = 1, scale = 1.5)
library(bbmle)
loglik2 <- function(theta, x){
  shape <- theta[1]
  scale <- theta[2]
  K <- length(theta)
  n <- length(x2)
  out <- rep(0, K)
  for(k in 1:K){
    out[k] <- sum(dweibull(x2, shape, scale, log = TRUE))
  }
  return(out)
}
theta.start<- c(1, 1.4)
(mod <- mle2(loglik2,start=list(theta.start),data=list(x2)))
Error in validObject(.Object) :
invalid class “mle2” object: invalid object for slot "fullcoef" in class "mle2": got class "NULL", should be or extend class "numeric"
Edit following Ben Bolker's comments below:
You can pass the parameters individually rather than as a vector, or you can pass a named vector as input instead: see the vecpar argument in the docs (and use parnames(nllfun) <- ... on your negative log-likelihood function).
Passing individual parameters:
# some example data
library(bbmle)
set.seed(1)
n = 1000
x2 = rweibull(n, shape = 1, scale = 1.5)
Rewrite the likelihood function to return the negative log-likelihood:
loglik2 = function(shape, scale, x)
  -sum(dweibull(x, shape = shape, scale = scale, log = TRUE))
Estimate: name the start parameters (and set lower parameter limits to avoid warnings):
mle2(loglik2, start = list(shape = 1, scale = 1),
     method = "L-BFGS-B", lower = list(shape = 0, scale = 0),
     data = list(x = x2))
# Coefficients:
#    shape    scale
# 1.007049 1.485067

# you can also use the formula notation
mle2(x ~ dweibull(shape = shape, scale = scale),
     start = list(shape = 1, scale = 1),
     method = "L-BFGS-B", lower = list(shape = 0, scale = 0),
     data = list(x = x2))
Passing a named vector for the parameters:
Also note in this example that the parameters are forced to be greater than zero by using a log link. From Ben's comment "I would probably recommend a log-link rather than box constraints" -- this is instead of using the lower optimisation parameter in the above example.
loglik2 = function(theta, x)
  -sum(dweibull(x, shape = exp(theta[1]), scale = exp(theta[2]), log = TRUE))
# set the parameter names & set `vecpar` to TRUE
parnames(loglik2) = c("shape", "scale")
m = mle2(loglik2,
         start = list(shape = 0, scale = 0),
         data = list(x = x2), vecpar = TRUE)
exp(coef(m)) # exponentiate to get coefficients
# or the formula notation
mle2(x ~ dweibull(shape = exp(logshape), scale = exp(logscale)),
     start = list(logshape = 0, logscale = 0),
     data = list(x = x2))
A couple of comments on your code, from the ?bbmle help page: "Note that the minuslogl function should return the negative log-likelihood", which yours didn't; also, the start parameters should be a named list.

Wrong Hessian from optim in R

I am doing some extreme value analysis. I don't want to use the fevd package for a variety of reasons (the first being that I want to be able to tweak some things that I cannot do otherwise). I wrote my own code. It is mostly very simple, and I thought I had solved everything. But for some parameter combinations, the Hessian coming out of my log-likelihood analysis (based on optim) will not be correct.
Going over one step at a time, my code - or selected parts of it - looks like this:
# routines for non stationary
Log_lik_GEV <- function(dataIN, scaleIN, shapeIN, locationIN){
  # simply calculate the negative log likelihood value for a set of X and parameters, for the GEV
  # xi, mu, sigma - xi is the shape parameter, mu the location parameter, and sigma the scale parameter
  # shape = xi
  # location = mu
  # scale = beta
  library(fExtremes)
  # dgev: density of the GEV distribution, dgev(x, xi = 1, mu = 0, beta = 1)
  LLvalues <- dgev(dataIN, xi = shapeIN, mu = locationIN, beta = scaleIN)
  NLL <- -sum(log(LLvalues[is.finite(LLvalues)]))
  return(NLL)
}
function_MLE <- function(par, dataIN){
  scoreLL <- 0
  shape_param <- par[1]
  scale_param <- par[2]
  location_param <- par[3]
  scoreLL <- Log_lik_GEV(dataIN, scale_param, shape_param, location_param)
  if (abs(shape_param) > 0.3) scoreLL <- scoreLL*10000000
  if ((scale_param) <= 0) {
    scale_param <- abs(scale_param)
    par[2] <- abs(scale_param)
    scoreLL <- scoreLL*1000000000
  }
  sum(scoreLL)
}
kernel_estimation <- function(dati_AM, shape_o, scale_o, location_o) {
  paramOUT <- optim(par = c(shape_o, scale_o, location_o), fn = function_MLE, dataIN = dati_AM,
                    control = list(maxit = 3000, reltol = 0.00000001), hessian = TRUE)
  # calculation of std errors
  covmat <- solve(paramOUT$hessian)
  stde <- sqrt(diag(covmat))
  print(covmat)
  print('')
  result <- list(shape_gev = paramOUT$par[1], scale_gev = paramOUT$par[2],
                 location_gev = paramOUT$par[3], var_covar = covmat)
  return(result)
}
Everything works great, in some cases. If I run my routines and the fevd routines, I get exactly the same results. In some cases (in my specific case, when shape = -0.29, so strongly negative/Weibull), my routine will give negative variances and funky Hessians. It is not always wrong, but some parameter combinations are clearly not giving a valid Hessian. (Note: the parameters are still estimated correctly, meaning they are identical to the fevd results, but the covariance matrix is completely off.)
I found this post that compared the Hessian from two procedures, and indeed optim seems to be flaky. However, if I simply substitute maxLik in my routine, it just doesn't converge at all (even in those cases where convergence was happening before).
paramOUT = maxLik(function_MLE, start = c(shape_o, scale_o, location_o),
                  dataIN = dati_AM, method = 'NR')
I tried to give different initial values - even the correct ones - but it just doesn't converge.
I am not supplying data because I think that the optim routine is used correctly in my example. Simply, the numerical results are not stable for some parameter combinations. My questions are:
1) Am I missing something in the way I use maxLik?
2) Are there other optimization routines, besides maxLik, from which I can extract the Hessian?
thanks
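One possibility for question 2 (a sketch, reusing the functions above): recompute the Hessian at the optimum with the numDeriv package, whose hessian() uses Richardson extrapolation and is typically more accurate than the finite-difference Hessian that optim returns.
library(numDeriv)
fit <- optim(par = c(shape_o, scale_o, location_o), fn = function_MLE,
             dataIN = dati_AM, control = list(maxit = 3000, reltol = 0.00000001))
# extra arguments (here dataIN) are passed through to function_MLE
H <- hessian(func = function_MLE, x = fit$par, dataIN = dati_AM)
covmat <- solve(H)
sqrt(diag(covmat)) # standard errors
# caveat: the penalty factors inside function_MLE (the *10000000 terms) will
# corrupt the curvature whenever the optimum sits near a penalised region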

Compute multiple Integral and plot them (with R)

I'm having trouble computing and then plotting multiple integrals. It would be great if you could help me.
So I have this function
> f = function(x, mu = 30, s = 12){dnorm(x, mu, s)}
which I want to integrate multiple times, from z (1:100) to +Inf, to plot that with x = z and y = auc:
> auc = integrate(f, z, Inf)
R returns:
Warning message:
In if (is.finite(lower)) { :
the condition has length > 1 and only the first element will be used
I have tried a loop:
while(z < 100){
  z = 1
  auc = integrate(f, z, Inf)
  z = z + 1
}
That doesn't work either... I don't know what to do.
(I'm new to R, so I'm sorry if this is really easy.)
Thanks for your help! :)
There is no need to do the integration by hand. pnorm gives the integral of the normal density from negative infinity to its input. You can get the upper tail instead by modifying the lower.tail parameter:
z <- 1:100
y <- pnorm(z, mean = 30, sd = 12, lower.tail = FALSE)
plot(z, y)
If you're looking to integrate more complex functions then using integrate will be necessary - but if you're just looking to find probabilities for distributions then there will most likely be a function built in that does the integration for you directly.
Your problem is actually somewhat subtle, and in a certain sense gets to the core of how R works, so here is a slightly longer explanation.
R is a "vectorized" language, which means that just about everything works on vectors. If I have 2 vectors A and B, then A+B is the element-by-element sum of A and B. Nearly all R functions work this way also. If X is a vector, then Y <- exp(X) is also a vector, where each element of Y is the exponential of the corresponding element of X.
The function integrate(...) is one of the few functions in R that is not vectorized. So when you write:
f <- function(x, mu = 30, s = 12){dnorm(x, mu, s)}
auc <- integrate(f, z, Inf)
the integrate(...) function does not know what to do with z when it is a vector. So it takes the first element and complains. Hence the warning message.
There is a special function in R, Vectorize(...) that turns scalar functions into vectorized functions. You would use it this way:
f <- function(x, mu = 30, s = 12){dnorm(x, mu, s)}
auc <- Vectorize(function(z) integrate(f,z,Inf)$value)
z <- 1:100
plot(z,auc(z), type="l") # plot lines
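Equivalently, you can skip Vectorize and apply integrate over z yourself, e.g. with sapply (same result, just more explicit):
auc2 <- sapply(z, function(zi) integrate(f, zi, Inf)$value)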
