How do I solve an SDE with two cases in R?

I want to solve the following stochastic differential equation with R:
dx = f(x)\,dt + \sigma\,dW
f(x) = \begin{cases} a + bx + cx^2 & \text{for } x \leq 1 \\ a + bx & \text{for } x > 1 \end{cases}
and
\sigma = d^2,
where a, b, c, and d are constants.
I tried using:
f = expression(a + b*x + c*x^2)
s = expression(d^2)
solution <- sde.sim(X0=0.6, t0=0, N=2000, delta=0.01, drift = f, sigma = s )
But how do I include the second case (when x>1)?
Sorry for the poor formatting of the mathematical expressions; I do not know how to write LaTeX here.

Maybe something like this, where (x <= 1) evaluates to 1 or 0 depending on which case applies, so the quadratic term only contributes when x <= 1 (here with placeholder values a = 1, b = 2, c = 3, d = 2):
f = expression(1 + 2*x + (x <= 1) * 3*x^2)
s = expression(2^2)
solution <- sde.sim(X0 = 0.6, t0 = 0, N = 2000, delta = 0.01, drift = f, sigma = s)
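sde.sim may not see constants that exist only in your workspace when it evaluates the drift expression, so a safe approach is to substitute numeric values into the expressions. A minimal added sketch, assuming the sde package and the same placeholder values a = 1, b = 2, c = 3, d = 2:
library(sde)   # provides sde.sim()
a <- 1; b <- 2; c <- 3; d <- 2   # placeholder constants
# bake the constants into the drift and diffusion expressions;
# (x <= 1) switches the quadratic term off for x > 1
f <- eval(substitute(expression(a + b*x + (x <= 1) * c*x^2),
                     list(a = a, b = b, c = c)))
s <- eval(substitute(expression(d^2), list(d = d)))
set.seed(1)
solution <- sde.sim(X0 = 0.6, t0 = 0, N = 2000, delta = 0.01, drift = f, sigma = s)
plot(solution)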

Optimization problem | Solve an equation with 4 parameters having two conditions

I have this equation:
f(x) = i * ln(j * x + k)
with these two conditions: f(0) = 6 and f(1) = 12.
After several hours of research, I cannot figure out how to fit the parameters i, j, and k so that they satisfy these conditions, using RStudio.
I know how to do it with Excel, but I want to succeed in doing it with R.
Does anyone have an idea how to solve this problem in R?
I can help you with a Monte Carlo method.
Working through the algebra (see the derivation in the next answer), the two conditions give:
i = 6/log(k)
k = sqrt(j + k)
So you apply the Monte Carlo method: draw many random (j, k) pairs and keep the ones that satisfy the second relation.
a = data.frame(k = round(runif(1000000, -2, 2), 4), j = round(runif(1000000, -2, 2), 4))
a$k2 = round(sqrt(a$j + a$k), 4)
a = a[-which(is.na(a$k2)), ]  # drop the NaN rows coming from negative arguments to sqrt
library(tidyverse)  # for the near() function
a[which(near(a$k, a$k2, 0.001)), ]
Given the constraints you can solve for i and j in terms of k:
f(0) = 6
=> i*ln( j*0 + k) = 6
=> i*ln(k) = 6
=> i = 6/ln(k)
f(1) = 12
=> i*ln( j*1 + k) = 12
=> (6/ln(k))*ln(j+k) = 12
=> ln(j+k) = 2*ln(k)
=> j+k = k*k
=> j = k*k-k
So
f(x) = (6/ln(k))*ln( k*(k-1)*x + k )
As a check
f(0) = (6/ln(k))*ln( k*(k-1)*0 + k )
= (6/ln(k))*ln(k) = 6
f(1) = (6/ln(k))*ln( k*(k-1)*1 + k )
= (6/ln(k))*ln( k*k )
= (6/ln(k))*2*ln(k)
= 12
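The same check can be run numerically; a small added sketch using an arbitrary admissible k (here k = 3):
k <- 3                           # any k > 0 with k != 1 works
f <- function(x) (6 / log(k)) * log(k * (k - 1) * x + k)
c(f(0), f(1))                    # 6 and 12, as required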
However I do not understand what you want to optimize.
optim
Define f as the function in the question, except that we explicitly list all of its arguments, and define ss as the residual sum of squares. Then minimize ss using an arbitrary value for i (since we have two equations and three unknowns). Below we show the solution for j and k (in the par component of the output) using i = 10 as an input.
f <- function(x, i, j, k) i * log(j * x + k)
ss <- function(p, i) (f(x = 0, i = i, j = p[1], k = p[2]) - 6)^2 +
  (f(x = 1, i = i, j = p[1], k = p[2]) - 12)^2
optim(1:2, ss, i = 10)
giving:
$par
[1] 1.497972 1.822113
$value
[1] 9.894421e-09
$counts
function gradient
59 NA
$convergence
[1] 0
$message
NULL
nlsLM
Alternatively, we can use nonlinear least squares. This is slightly easier to specify since we don't need to define ss, but it does require a package. We use nlsLM from minpack.lm instead of nls in base R since nls does not handle zero-residual problems well.
library(minpack.lm)
nlsLM(y ~ f(x, i, j, k), data = list(y = c(6, 12), x = 0:1, i = 10),
start = c(j = 1, k = 2))
giving:
Nonlinear regression model
model: y ~ f(x, i, j, k)
data: list(y = c(6, 12), x = 0:1, i = 10)
j k
1.50 1.82
residual sum-of-squares: 0
Number of iterations to convergence: 4
Achieved convergence tolerance: 1.49e-08
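As a consistency check (an added note, not part of either answer), the closed-form relations derived earlier, i = 6/ln(k) and j = k^2 - k, reproduce these fitted values when i = 10:
i <- 10
k <- exp(6 / i)   # from i = 6/ln(k)
j <- k^2 - k      # from j + k = k^2
c(j = j, k = k)   # approximately j = 1.498, k = 1.822, matching optim and nlsLM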

Implementing OLS in matrix form

I'm having problems implementing this exercise from a quantitative economics course.
Here's my code:
N = 50
M = 20
a = 0.1
b = 0.2
c = 0.5
d = 1.0
σ = 0.1
estimates = zeros(M, 5)
for i ∈ 1:M
    x₁ = Vector{BigFloat}(randn(N))
    x₂ = Vector{BigFloat}(randn(N))
    w = Vector{BigFloat}(randn(N))
    # Derive y vector (element-wise operations)
    y = a*x₁ .+ b.*(x₁.^2) .+ c.*x₂ .+ d .+ σ.*w
    # Derive X matrix
    X = [x₁ x₁ x₂ fill(d, (N, 1)) w]
    # Implementation of the formula β = inv(XᵀX)Xᵀy
    estimates[i, :] = (X'*X)\X'*y
end
histogram(estimates, layout=5, labels=["a", "b", "c", "d", "σ"])
I get a SingularException(5) error, as the matrix X'X has a determinant of 0 and has no inverse. My question is, where have I gone wrong in this exercise? I heard that a reason the determinant might be zero is floating point inaccuracy, so I made the random variables BigFloats to no avail. I know the mistake I'm making isn't very complicated but I'm lost. Thank you!
Your X should be
X = [x₁ x₁.^2 x₂ fill(d, (N, 1))]
Explanation
It looks like you are trying to use OLS to estimate the parameters of the model:
y = α₀ + α₁x₁ + α₁₁x₁² + α₂x₂ + ϵ
where α₀ is the intercept of the model, α₁, α₁₁, and α₂ are the parameters of the explanatory variables, and ϵ is the random error with expected value 0 and variance σ². Hence the structure of X must match this specification.
By putting the x₁ column in twice you introduced collinearity, which makes X'X singular and produces the error.
You also do not want to "estimate" a parameter for ϵ (the w column), because it represents the randomness.

How does the cuhre function work in R?

I am going through my multivariate class notes, and they use the cuhre function from the R2Cuba package to evaluate a probability defined by conditions on the variables X and Y. Here's the complete chunk of code:
integrand <- function(z) {
  x <- z[1]
  y <- z[2]
  if (0 < x & x < 1 & 0 < y & y < 1 & x + y > 1)
    f <- 6 * x * y^2
  else
    f <- 0
  return(f)
}
NDIM = 2
NCOMP = 1
int <- cuhre(NDIM, NCOMP, integrand, rel.tol = 1e-3, abs.tol = 1e-12,
             flags = list(verbose = 2, final = 0))$value
int
The result is:
> int
[1] 0.8998863
I understand that this is the probability based on the rule:
0<x<1 & 0<y<1 & x+y>1
What I am not able to understand is this: integrand is defined as a function taking a single argument z, yet when cuhre is called it is passed without any arguments, the integration runs over two dimensions, and the probability gets saved in int. I understand about half of it, but I am not 100% clear on how it works. Further down the line, I think we could also use cuhre to calculate marginal probabilities, couldn't we?
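As a rough cross-check (an added sketch, not part of the original notes), the same probability can be computed with base R's integrate() by nesting one-dimensional integrals over the region x + y > 1 inside the unit square; the exact value is 0.9:
inner <- function(y) {
  sapply(y, function(yy)
    integrate(function(x) 6 * x * yy^2, lower = 1 - yy, upper = 1)$value)
}
integrate(inner, lower = 0, upper = 1)$value
# roughly 0.9, consistent with cuhre's 0.8998863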

How does ar.yw estimate the variance

In R, how does the function ar.yw estimate the variance? Specifically, where does the number "var.pred" come from? It does not seem to come from the usual YW estimate of the variance, nor the sum of squared residuals divided by df (even though there is disagreement about what the df should be, none of the choices give an answer equivalent to var.pred). And yes, I know that there are better methods than YW; just trying to figure out what R is doing.
set.seed(82346)
temp <- arima.sim(n=10, list(ar = 0.5), sd=1)
fit <- ar(temp, method = "yule-walker", demean = FALSE, aic=FALSE, order.max=1)
## R's estimate of the sigma squared
fit$var.pred
## YW estimate
sum(temp^2)/10 - fit$ar*sum(temp[2:10]*temp[1:9])/10
## YW if there was a mean
sum((temp-mean(temp))^2)/10 - fit$ar*sum((temp[2:10]-mean(temp))*(temp[1:9]-mean(temp)))/10
## estimate based on residuals, different possible df.
sum(na.omit(fit$resid^2))/10
sum(na.omit(fit$resid^2))/9
sum(na.omit(fit$resid^2))/8
sum(na.omit(fit$resid^2))/7
Need to read the code if it's not documented.
?ar.yw
Which says: "In ar.yw the variance matrix of the innovations is computed from the fitted coefficients and the autocovariance of x." If that is not enough explanation, then you need to look at the code:
methods(ar.yw)
#[1] ar.yw.default* ar.yw.mts*
#see '?methods' for accessing help and source code
getAnywhere(ar.yw.default)
# there are two cases that I see
x <- as.matrix(x)
nser <- ncol(x)
if (nser > 1L) # .... not your situation
#....
else {
    r <- as.double(drop(xacf))
    z <- .Fortran(C_eureka, as.integer(order.max), r, r,
                  coefs = double(order.max^2), vars = double(order.max),
                  double(order.max))
    coefs <- matrix(z$coefs, order.max, order.max)
    partialacf <- array(diag(coefs), dim = c(order.max, 1L, 1L))
    var.pred <- c(r[1L], z$vars)
    #.......
    order <- if (aic)
        (0L:order.max)[xaic == 0L]
    else order.max
    ar <- if (order)
        coefs[order, seq_len(order)]
    else numeric()
    var.pred <- var.pred[order + 1L]
    var.pred <- var.pred * n.used/(n.used - (order + 1L))
So you now need to find the Fortran code for C_eureka. I think it is here: https://svn.r-project.org/R/trunk/src/library/stats/src/eureka.f This is the code that I think is returning the var.pred estimate. I'm not a time series guy, and it's your responsibility to review this process for applicability to your problem.
subroutine eureka (lr,r,g,f,var,a)
c
c solves Toeplitz matrix equation toep(r)f=g(1+.)
c by Levinson's algorithm
c a is a workspace of size lr, the number
c of equations
c
snipped
c estimate the innovations variance
var(l) = var(l-1) * (1 - f(l,l)*f(l,l))
if (l .eq. lr) return
d = 0.0d0
q = 0.0d0
do 50 i = 1, l
k = l-i+2
d = d + a(i)*r(k)
q = q + f(l,i)*r(k)
50 continue
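Putting those pieces together, here is an added sketch (not from the original answer) that reproduces fit$var.pred by hand for the AR(1) example above: with demean = FALSE the autocovariances of the raw series are computed with divisor n, the Levinson step gives c0*(1 - phi^2), and the source then rescales by n.used/(n.used - (order + 1)):
n   <- 10
c0  <- sum(temp^2) / n                  # lag-0 autocovariance, divisor n, no demeaning
c1  <- sum(temp[2:10] * temp[1:9]) / n  # lag-1 autocovariance
phi <- c1 / c0                          # Yule-Walker AR(1) coefficient, equals fit$ar
v1  <- c0 * (1 - phi^2)                 # innovations variance from the Levinson recursion
v1 * n / (n - (1 + 1))                  # should match fit$var.pred
In other words, var.pred appears to be the asker's "YW estimate" rescaled by n/(n - order - 1).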

Maximum likelihood estimation

I am a new user of R and hope you will bear with me if my question is silly. I want to estimate the following model using maximum likelihood in R:
y = a + b*(ln(x) - α)
where a, b, and α are parameters to be estimated, and x and y are my data. I tried to use the following code that I found on the web:
library(foreign)
maindata <- read.csv("C:/Users/NUNU/Desktop/maindata/output2.csv")
h <- subset(maindata, cropid==10)
library(likelihood)
modelfun <- function (a, b, x) { b *(x-a)}
par <- list(a = 0, b = 0)
var<-list(x = "x")
par_lo <- list(a = 0, b = 0)
par_hi <- list(a = 50, b = 50)
var$y <- "y"
var$mean <- "predicted"
var$sd <- 0.815585
var$log <- TRUE
results <- anneal(model = modelfun, par = par, var = var,
                  source_data = h, par_lo = par_lo, par_hi = par_hi,
                  pdf = dnorm, dep_var = "y", max_iter = 20000)
The result I am getting is the same even when the data change, i.e., even when I change the cropid. Also, the predicted values generated are for x rather than for y.
I do not know what I missed or where I went wrong. Your help is highly appreciated.
I am not sure whether your model formula will lead to a unique solution, but in general you can find an MLE with the optim function.
Here is a simple example for linear regression with optim:
fn <- function(beta, x, y) {
  a <- beta[1]
  b <- beta[2]
  sum((y - (a + b * log(x)))^2)   # least squares, which is the MLE under normal errors
}
# generate some data for testing
x <- 1:100
# true values: a = 10, b = 3.5
y <- 10 + 3.5 * log(x)
optim(c(0, 0), fn, x = x, y = y, method = "BFGS")
You can change the function fn to reflect your model formula, e.g.
sum( (y - (YOUR MODEL FORMULA) )^2 )
EDIT
I am just giving a simple example of using optim in case you have a custom model formula to optimize. I did not mean to use it for simple linear regression, since lm would be sufficient.
I was a bit surprised that iTech used optim for a problem that is linear in its parameters. With his data for x and y:
> lm(y ~ log(x) )
Call:
lm(formula = y ~ log(x))
Coefficients:
(Intercept) log(x)
10.0 3.5
For linear problems with normally distributed errors, the least squares solution is the ML solution.
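One more added note: the asker's original model y = a + b*(ln(x) - α) is also linear once rewritten as y = (a - b*α) + b*ln(x), so a and α are not separately identifiable; only the combination a - b*α and the slope b can be estimated. A small sketch with simulated data:
set.seed(1)
x <- 1:100
y <- 10 + 3.5 * log(x) + rnorm(100, sd = 0.5)   # simulated data with noise
coef(lm(y ~ log(x)))   # intercept estimates a - b*α, slope estimates b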
