What I am about to explain is a bit tricky, but I hope I can explain it clearly.
Suppose you have a function that does Hodrick-Prescott detrending, which is essentially this:
The user picks the λ value, so for every λ there exists a trend series τ(λ).
Suppose you pick a small positive number V close to 0.
For this case, suppose V = 0.0001278846.
Then you want to compute this:
(I already have a function that does this.)
But you want to find a λ such that F(λ) = V.
How can I do this?
I tried to write a while loop but could not state the condition correctly, so I wrote a for loop with an if statement that breaks out of the loop when F(λ) - V = 0.
This is what my for loop looks like:
for(L in 1:3500){
F_ <- find_v(dataa, L)
if((F_-V)==0){
print(paste("The λ value for this series following Rule 1 is:", L))
break
}
cat(paste("The λ =",L,"has a (F-V) difference of:", (F_-V),"\n"))
}
where dataa is my data, consisting of 89 observations.
Using this for loop I see that (F-V) turns negative between L = 3276 and L = 3277.
Is there a better way to do it, such as optimization?
With the for loop, it feels like I'm obtaining the optimal λ by brute force.
Sorry for not including my data or the code for the Hodrick-Prescott detrending and the find_v function; they are far too long.
Since you are doing double optimization, consider the following:
The data
set.seed(0)
y <- rnorm(89)
The function to be optimized:
lfun <- function(tau, y, lambda){
n <- length(tau)
tt <- tau[-(1:2)] - 2 * tau[-c(1, n)] + head(tau, -2)
sum((y-tau)^2) + lambda *sum(tt^2)
}
The F function:
f_lambda <- function(lambda, y, V = 0){
tau <- optim(y,lfun,y = y, lambda = lambda, method = 'BFGS')$par
tt <- tail(tau,-2) - 2 * head(tau[-1], -1) + head(tau, -2)
sqrt((sum((y-tau)^2)/sum(tt^2) - V)^2)
}
Optimizing the F function:
optim(0.1, f_lambda, y = y, V=0.0001278846, method="Brent",lower=0, upper=100)
$par
[1] 0.003412824
$value
[1] 2.633131e-10
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
Now lambda = 0.003412824 gives the desired V, i.e.:
f_lambda(0.003412824, y)
[1] 0.0001278843
Which is very close to the V=0.0001278846 you started with.
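Because the goal is F(λ) = V rather than a minimum, a root finder is another option. The following is only a sketch under stated assumptions: it takes F to be the same ratio used in f_lambda above, and it replaces the inner optim call with the closed-form minimizer of lfun, τ = (I + λ D'D)^(-1) y, where D is the second-difference matrix. The names hp_trend and F_ratio are made up for this illustration.

```r
# Closed-form HP trend: minimizes sum((y - tau)^2) + lambda * sum((D tau)^2)
hp_trend <- function(y, lambda) {
  n <- length(y)
  D <- diff(diag(n), differences = 2)        # (n-2) x n second-difference matrix
  solve(diag(n) + lambda * crossprod(D), y)  # solves (I + lambda * D'D) tau = y
}

# Assumed form of F: residual sum of squares over squared second differences
F_ratio <- function(lambda, y) {
  tau <- hp_trend(y, lambda)
  sum((y - tau)^2) / sum(diff(tau, differences = 2)^2)
}

set.seed(0)
y <- rnorm(89)
V <- 0.0001278846

# F(0) = 0 < V and F grows with lambda, so the root is bracketed by [0, 100].
# The default uniroot tolerance is far too coarse for a root this small.
root <- uniroot(function(l) F_ratio(l, y) - V, interval = c(0, 100), tol = 1e-12)
lambda_hat <- root$root
lambda_hat
F_ratio(lambda_hat, y)
```

With real data, the interval would need to be widened until F(upper) exceeds V; the for loop in the question suggests a λ in the thousands for that series.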
I am reading Section 4.2 of Simulation (2006, 4th ed., Elsevier) by Sheldon M. Ross, which introduces generating a Poisson random variable by the inverse transform method.
Denote pi =P(X=xi)=e^{-λ} λ^i/i!, i=0,1,... and F(i)=P(X<=i)=Σ_{k=0}^i pi to be the PDF and CDF for Poisson, respectively, which can be computed via dpois(x,lambda) and ppois(x,lambda) in R.
There are two inverse transform algorithms for Poisson: the regular version and the improved one.
The steps for the regular version are as follows:
Simulate an observation U from U(0,1).
Set i=0 and F=F(0)=p0=e^{-λ}.
If U<F, select X=i and terminate.
If U >= F, obtain i=i+1, F=F+pi and return to the previous step.
I write and test the above steps as follows:
### write the regular R code
pois_inv_trans_regular = function(n, lambda){
X = rep(0, n) # generate n samples
for(m in 1:n){
U = runif(1)
i = 0; F = exp(-lambda) # initialize
while(U >= F){
i = i+1; F = F + dpois(i,lambda) # F=F+pi
}
X[m] = i
}
X
}
### test the code (for small λ, e.g. λ=3)
set.seed(0); X = pois_inv_trans_regular(n=10000,lambda=3); c(mean(X),var(X))
# [1] 3.005000 3.044079
Note that the mean and variance of Poisson(λ) are both λ, so the regular code and its test make sense.
Next I tried the improved one, which is designed for large λ and described according to the book as follows:
The regular algorithm will need to make 1+λ searches, i.e. O(λ) computing complexity, which is fine when λ is small, while it can be greatly improved upon when λ is large.
Indeed, since a Poisson random variable with mean λ is most likely to take on one of the two integral values closest to λ , a more efficient algorithm would first check one of these values, rather than starting at 0 and working upward. For instance, let I=Int(λ) and recursively determine F(I).
Now generate a Poisson random variable X with mean λ by generating a random number U, noting whether or not X <= I by seeing whether or not U <= F(I). Then search downward starting from I in the case where X <= I and upward starting from I+1 otherwise.
It is said that the improved algorithm only needs 1+0.798√λ searches, i.e., it has O(√λ) complexity.
I tried to write the R code for the improved one as follows:
### write the improved R code
pois_inv_trans_improved = function(n, lambda){
X = rep(0, n) # generate n samples
p = function(x) {dpois(x,lambda)} # PDF: p(x) = P(X=x) = λ^x exp(-λ)/x!
F = function(x) {ppois(x,lambda)} # CDF: F(x) = P(X ≤ x)
I = floor(lambda) # I=Int(λ)
F1 = F(I); F2 = F(I+1) # two close values
for(k in 1:n){
U = runif(1)
i = I
if ( F1 < U & U <= F2 ) {
i = I+1
}
while (U <= F1){ # search downward
i = i-1; F1 = F1 - p(i)
}
while (U > F2){ # search upward
i = i+1; F2 = F2 + p(i)
}
X[k] = i
}
X
}
### test the code (for large λ, e.g. λ=100)
set.seed(0); X = pois_inv_trans_improved(n=10000,lambda=100); c(mean(X),var(X))
# [1] 100.99900000 0.02180118
The simulation gives [1] 100.99900000 0.02180118 for c(mean(X),var(X)), so the variance is nonsense. How should I remedy this issue?
The main problem was that F1 and F2 were modified within the loop and never reset, so eventually a very wide range of U values is considered to be in the middle.
The second problem was that, in the downward search, the p(i) used should be the one for the original i, because F(x) = P(X <= x). Without this, the code hangs for low U.
The easiest fix for this is to start with i = I + 1. Then the "in the middle" if statement isn't needed.
pois_inv_trans_improved = function(n, lambda){
X = rep(0, n) # generate n samples
p = function(x) {dpois(x,lambda)} # PDF: p(x) = P(X=x) = λ^x exp(-λ)/x!
`F` = function(x) {ppois(x,lambda)} # CDF: F(x) = P(X ≤ x)
I = floor(lambda) # I=Int(λ)
F1 = F(I); F2 = F(I+1) # two close values
for(k in 1:n){
U = runif(1)
i = I + 1
# if ( F1 < U & U <= F2 ) {
# i = I + 1
# }
F1tmp = F1
while (U <= F1tmp){ # search downward
i = i-1; F1tmp = F1tmp - p(i);
}
F2tmp = F2
while (U > F2tmp){ # search upward
i = i+1; F2tmp = F2tmp + p(i)
}
X[k] = i
}
X
}
This gives:
[1] 100.0056 102.2380
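As a cross-check (not part of the book's algorithm), note that qpois(p, lambda) returns the smallest x with F(x) >= p, so applying it to uniform draws is the inverse transform method in vectorized form. A minimal sketch, with X_ref as a name chosen just for this example:

```r
set.seed(0)
U <- runif(10000)                  # uniform draws
X_ref <- qpois(U, lambda = 100)    # smallest x such that ppois(x, 100) >= U
c(mean(X_ref), var(X_ref))         # both should be close to lambda = 100
```

If the hand-written sampler is correct, its mean and variance should be in the same neighborhood as these reference values.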
I have to use a recurrence to produce pseudo-random numbers. For fixed values a, b and c, I need to calculate:
x_{n+1} = (a * x_n + c) mod 2^b. The random numbers are then R_n = x_n / 2^b. I need to save these R_n values to make a histogram. How can I write a function in R that uses its previous value x_n to produce x_{n+1}? I have made a start with my code, listed below.
a=5
b=4
c=3
k=10000
random <- function(x) {
if(x<k){
x = (a*x+c)%%2^b
k++
}
}
Here's a thought for starters,
random <- function(a = 5, b = 4, c = 3, k = 10000, x0 = 1) {
x <- x0 # or some other sane default
function(n = 1) {
newx <- Reduce(function(oldx, ign) (a*oldx + c) %% (2^b), seq_len(n),
init = x, accumulate = TRUE)[-1]
# if (x >= k)? do something else
if (length(newx)) {
x <<- newx[length(newx)]
k <<- k + n
}
newx
}
}
The premise is that the random function is a setup function that returns a function. This inner function has its a, b, c, k, and previous x variables stored within it.
thisran <- random()
thisran()
# [1] 8
thisran(3)
# [1] 11 10 5
I haven't studied creating PRNGs in depth, but I'm inferring that x0 here is effectively your seed. I'm not certain why you had an if (x<k) conditional in your function; since k was never used otherwise, just incremented, I'm thinking it only serves as a termination indicator for your PRNG (so it is not infinite).
If need be, the current k value (and other variables, for that matter) can be peeked-at with
get("k", environment(thisran))
# [1] 10004
BTW: the use of Reduce might seem like an unnecessary complication, but it enables the thisran(n) functionality, similar to other PRNGs in R. That is, one can do runif(7) for seven random numbers, and I thought it would be useful to do that here. Reduce is needed in that case because each calculation depends on the result of the previous one, so a simple replicate or sapply would not work (without some contrived coding that I wanted to avoid).
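If the closure is more machinery than needed, a plain loop also covers the original request (save the R_n values and make a histogram). This is only a sketch: the function name lcg and the default seed x0 = 1 are assumptions, not something from the question.

```r
# Linear congruential generator: x_{n+1} = (a * x_n + c) mod 2^b, R_n = x_n / 2^b
lcg <- function(n, a = 5, b = 4, c = 3, x0 = 1) {
  m <- 2^b
  x <- numeric(n)      # preallocate storage for x_1, ..., x_n
  prev <- x0           # x0 plays the role of the seed
  for (i in seq_len(n)) {
    prev <- (a * prev + c) %% m
    x[i] <- prev
  }
  x / m                # scale to [0, 1)
}

R <- lcg(10000)
h <- hist(R, breaks = seq(0, 1, by = 1/16), plot = FALSE)  # bin the R_n values
# plot(h) draws the histogram
```

With b = 4 there are only 16 possible states, so the histogram can show at most 16 distinct values; a larger b would be needed for anything resembling a uniform sample.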
For practice, I am rewriting in R an algorithm I originally wrote in C++, the Finite Difference Method. I am pretty new to R, so I don't know all the rules regarding vector/matrix multiplication. For some reason I am getting a non-conformable arguments error when I do this:
ST_u <- matrix(0,M,1)
ST_l <- matrix(0,M,1)
for(i in 1:M){
Z <- matrix(gaussian_box_muller(i),M,1)
ST_u[i] <- (S0 + delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
ST_l[i] <- (S0 - delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
}
I get this error:
Error in sqrt(T) %*% Z : non-conformable arguments
Here is my whole code:
gaussian_box_muller <- function(n){
theta <- runif(n, 0, 2 * pi)
rsq <- rexp(n, 0.5)
x <- sqrt(rsq) * cos(theta)
return(x)
}
d_j <- function(j, S, K, r, v,T) {
return ((log(S/K) + (r + (-1^(j-1))*0.5*v*v)*T)/(v*(T^0.5)))
}
call_delta <- function(S,K,r,v,T){
return (S * dnorm(d_j(1, S, K, r, v, T))-K*exp(-r*T) * dnorm(d_j(2, S, K, r, v, T)))
}
Finite_Difference <- function(S0,K,r,sigma,T,M,delta_S){
ST_u <- matrix(0,M,1)
ST_l <- matrix(0,M,1)
for(i in 1:M){
Z <- matrix(gaussian_box_muller(i),M,1)
ST_u[i] <- (S0 + delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
ST_l[i] <- (S0 - delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
}
Delta <- matrix(0,M,1)
totDelta <- 0
for(i in 1:M){
if(ST_u[i] - K > 0 && ST_l[i] - K > 0){
Delta[i] <- ((ST_u[i] - K) - (ST_l[i] - K))/(2*delta_S)
}else{
Delta <- 0
}
totDelta = totDelta + exp(-r*T)*Delta[i]
}
totDelta <- totDelta * 1/M
Var <- 0
for(i in 1:M){
Var = Var + (Delta[i] - totDelta)^2
}
Var = Var*1/M
cat("The Finite Difference Delta is : ", totDelta)
call_Delta_a <- call_delta(S,K,r,sigma,T)
bias <- abs(call_Delta_a - totDelta)
cat("The bias is: ", bias)
cat("The Variance of the Finite Difference method is: ", Var)
MSE <- bias*bias + Var
cat("The marginal squared error is thus: ", MSE)
}
S0 <- 100.0
delta_S <- 0.001
K <- 100.0
r <- 0.05
sigma <- 0.2
T <- 1.0
M <- 10
result1 <- Finite_Difference(S0,K,r,sigma,T,M,delta_S)
I can't seem to figure out the problem, any suggestions would be greatly appreciated.
In R, the %*% operator is reserved for multiplying two conformable matrices. As one special case, you can also use it to multiply a vector by a matrix (or vice versa), if the vector can be treated as a row or column vector that conforms to the matrix; as a second special case, it can be used to multiply two vectors to calculate their inner product.
However, one thing it cannot do is perform scalar multiplication. Scalar multiplication of vectors or matrices always uses the plain * operator. Specifically, in the expression sqrt(T) %*% Z, the first term sqrt(T) is a scalar, and the second Z is a matrix. If what you intend to do here is multiply the matrix Z by the scalar sqrt(T), then this should just be written sqrt(T) * Z.
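A small illustration of the difference, using made-up values for the scalar s and the column matrix Z:

```r
Z <- matrix(c(1, 2, 3), nrow = 3, ncol = 1)  # 3 x 1 column matrix
s <- sqrt(4)                                 # a scalar (length-1 vector)

scaled <- s * Z                              # elementwise scaling, still 3 x 1

# The matrix product fails: a length-1 vector cannot conform to a 3 x 1 matrix
msg <- tryCatch(s %*% Z, error = function(e) conditionMessage(e))
msg
```

Here scaled holds 2, 4, 6 in a 3 x 1 matrix, while msg captures the same kind of error reported in the question.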
When I made this change, your program still didn't work because of another bug -- S is used but never defined -- but I don't understand your algorithm well enough to attempt a fix.
A few other comments on the program not directly related to your original question:
The first loop in Finite_Difference looks suspicious: gaussian_box_muller(i) generates a vector of length i as i varies in the loop from 1 up to M, and forcing these vectors into a column matrix of length M to generate Z is probably not doing what you want. It will "reuse" the values in a cycle to populate the matrix. Try these to see what I mean:
matrix(gaussian_box_muller(1),10,1) # all one value
matrix(gaussian_box_muller(3),10,1) # cycle of three values
You also use loops in many places where R's vector operations would be easier to read and (typically) faster to execute. For example, your definition of Var is equivalent to:
Var <- sum((Delta - totDelta)^2)/M
and the definitions of Delta and totDelta could also be written in this simplified fashion.
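For instance, a vectorized version might look like the sketch below. It rests on an assumption about the intent: one Gaussian draw per simulated path (here from rnorm), and Delta set to 0 for paths where either terminal price is at or below K.

```r
set.seed(1)
S0 <- 100; K <- 100; r <- 0.05; sigma <- 0.2; T <- 1; M <- 10; delta_S <- 0.001

Z <- rnorm(M)                                    # assumed: one draw per path
growth <- exp((r - sigma^2 / 2) * T + sigma * sqrt(T) * Z)
ST_u <- (S0 + delta_S) * growth                  # bumped-up terminal prices
ST_l <- (S0 - delta_S) * growth                  # bumped-down terminal prices

# Central difference of the payoff where both paths end in the money, else 0
Delta <- ifelse(ST_u > K & ST_l > K, (ST_u - ST_l) / (2 * delta_S), 0)
totDelta <- mean(exp(-r * T) * Delta)            # discounted average
Var <- sum((Delta - totDelta)^2) / M
```

Every step operates on whole vectors of length M, so no explicit loop or index bookkeeping is needed.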
I'd suggest Googling for "vector and matrix operations in r" or something similar and reading some tutorials. Vector arithmetic in particular is idiomatic R, and you'll want to learn it early and use it often.
You might find it helpful to consider the rnorm function to generate random Gaussians.
Happy R-ing!
I'm trying to simulate a variable and it's supposed to work like this:
v[t] = Q * v[t-1] + e[t]
e is a random error I generate using rnorm(156,0,0.001); v is what I aim to simulate; Q is a coefficient (I'm using 0.5).
The 1st value v[1] would be equal to e[1]. Then
v[2] = Q * v[1] + e[2]
v[3] = Q * v[2] + e[3]
. . .
I'm new to R, I'm trying to use a for loop but I'm struggling (I was going to publish my code here but it isn't working so I thought I wouldn't waste people's time). Thanks in advance!
This is a typical autoregressive process, which can be generated using filter with the "recursive" method.
e <- rnorm(156, 0, 0.001)
filter(x = c(0, e), filter = 0.5, method = "recursive")[-1]
Let's consider a small example with length 5 only:
set.seed(0)
e <- rnorm(5, 0, 0.1)
# [1] 0.12629543 -0.03262334 0.13297993 0.12724293 0.04146414
x <- filter(x = c(0, e), filter = 0.5, method = "recursive")
x[-1]
# [1] 0.12629543 0.03052438 0.14824212 0.20136399 0.14214614
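Since the question asked about a for loop, here is a sketch of the explicit recursion next to the filter call, so the two can be compared. The names v and v_filter are chosen only for this example.

```r
set.seed(0)
e <- rnorm(5, 0, 0.1)
Q <- 0.5

v <- numeric(length(e))        # preallocate the result
v[1] <- e[1]                   # first value is just the first error
for (t in 2:length(e)) {
  v[t] <- Q * v[t - 1] + e[t]  # v[t] = Q * v[t-1] + e[t]
}

# Same series from stats::filter, dropping the leading zero
v_filter <- as.numeric(stats::filter(c(0, e), filter = Q, method = "recursive"))[-1]
all.equal(v, v_filter)
```

The loop is easier to read for a beginner; filter becomes attractive when the series is long.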
filter is the workhorse of arima.sim. However, it is simply a computational routine written in C and does not require the process to be stationary. Readers interested in arima.sim may continue to read:
Simulate a time series
Simulate an AR(1) process with uniform innovations
We note that the unit response to the auto-regressive process v(t)=Q*v(t-1) + u(t) is:
unit_res <- c(1, Q, Q^2, Q^3, ...)
We can generate this response using unit_res <- q^(seq_len(length(err))-1). Then, the response v to err is simply the convolution of err with this unit_res:
set.seed(123) ## for reproducibility
q <- 0.5
err <- rnorm(156,0,0.0001)
unit_res <- q^(seq_len(length(err))-1)
## first (initial value is zero) and we take the first 156 values from the convolution
v <- c(0, convolve(err,rev(unit_res),type="open")[1:156])
##head(v,20)
## [1] 0.000000e+00 -5.604756e-05 -5.104153e-05 1.303501e-04 7.222587e-05 4.904171e-05
## [7] 1.960274e-04 1.441053e-04 -5.445347e-05 -9.591202e-05 -9.252221e-05 7.614708e-05
##[13] 7.405492e-05 7.710461e-05 4.962057e-05 -3.077383e-05 1.633044e-04 1.314372e-04
##[19] -1.309431e-04 4.664044e-06
Since 156 is not a large number, another way to do this is to construct a unit response matrix for the difference equation v(t)=Q*v(t-1) + err(t) of the form:
Z = [1 0 0 0 ...
Q 1 0 0 ...
Q^2 Q 1 0 ...
Q^3 Q^2 Q 1 ...
... ... ... ... ...]
This matrix will be 156 x 156 in your case. Note that each column of this matrix is the response over time to a unit input in err at the time t equal to the column index. Since the system is linear, the response v to err=rnorm(156,0,0.001) is given by the superposition of the individual unit responses and can be computed by the matrix multiplication v = Z %*% err.
To construct this matrix, we can use the function:
constructZ <- function(Q, N) {
r <- Q^(seq_len(N)-1)
m <- matrix(rep(r,N),nrow=N)
z <- matrix(0,nrow=N,ncol=N)
z[lower.tri(z,diag=TRUE)] <- m[row(m) <= (N+1-col(m))]
z
}
With this we have:
v <- c(0,constructZ(q, length(err)) %*% err)
which gives the same result.
I am trying to construct a new variable, z, using two pre-existing variables - x and y. Suppose for simplicity that there are only 5 observations (corresponding to 5 time periods) and that x=c(5,7,9,10,14) and y=c(0,2,1,2,3). I’m really only using the first observation in x as the initial value, and then constructing the new variable z using depreciated values of x[1] (depreciation rate of 0.05 per annum) and each of the observations over time in the vector, y. The variable I am constructing takes the form of a new 5 by 1 vector, z, and it can be obtained using the following simple commands in R:
z=NULL
for(i in 1:length(x)){n=seq(1,i,by=1)
z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))}
The problem I am having is that I need to define this operation as a function. That is, I need to create a function f that will spit out the vector z whenever any arbitrary vectors x and y are plugged into the function, f(x,y). I’ve been going around in circles for days now and I was wondering if someone would be kind enough to provide me with a suggestion about how to proceed. Thanks in advance.
I hope following will work for you...
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getZ = function(x,y){
z = NULL
for(i in 1:length(x)){
n=seq(1,i,by=1)
z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))
}
return(z)
}
z = getZ(x,y)
z
[1] 5.000000 6.750000 7.412500 9.041875 11.589781
This will allow .05 (or any other value) to be passed in as r.
ConstructZ <- function(x, y, r){
n <- length(y)
d <- 1 - r
Z <- numeric(n) # preallocate a numeric result vector
for(i in seq_along(x)){
n = seq_len(i)
Z[i] = sum(c(d^(i-1)*x[1],d^(i-n)*y[n]))
}
return(Z)
}
Here is a cool (if I say so myself) way to implement this as an infix operator (since you called it an operation).
ff = function (x, y, i) {
n = seq.int(i)
sum(c(0.95 ^ (i - 1) * x[[1]],
0.95 ^ (i - n) * y[n]))
}
`%dep%` = function (x, y) sapply(seq_along(x), ff, x=x, y=y)
x %dep% y
[1] 5.000000 6.750000 7.412500 9.041875 11.589781
Doing the loop multiple times and recalculating the exponents every time may be inefficient. Here's another way to implement your calculation
getval <- function(x,y,lambda=.95) {
n <- length(y)
pp <- lambda^(1:n-1)
yy <- sapply(1:n, function(i) {
sum(y * c(pp[i:1], rep.int(0, n-i)))
})
pp*x[1] + yy
}
Testing with @vrajs5's sample data
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getval(x,y)
# [1] 5.000000 6.750000 7.412500 9.041875 11.589781
This gives the same result, but appears to be about 10x faster when testing on larger data such as
set.seed(15)
x <- rpois(200,20)
y <- rpois(200,20)
I'm not sure of how often you will run this or on what size of data so perhaps efficiency isn't a concern for you. I guess readability is often more important long-term for maintenance.
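One more option, offered as a sketch: the sum can be written as a recurrence, z[1] = x[1] + y[1] and z[i] = d * z[i-1] + y[i] with d = 1 - r, which needs only one pass over the data. The function name dep_recursive is an assumption made for this example.

```r
dep_recursive <- function(x, y, r = 0.05) {
  d <- 1 - r                      # retention factor per period
  z <- numeric(length(y))
  z[1] <- x[1] + y[1]             # initial stock plus first-period addition
  for (i in seq_along(y)[-1]) {
    z[i] <- d * z[i - 1] + y[i]   # depreciate last period's value, add y[i]
  }
  z
}

x <- c(5, 7, 9, 10, 14)
y <- c(0, 2, 1, 2, 3)
dep_recursive(x, y)
```

Expanding the recurrence gives back 0.95^(i-1)*x[1] plus the sum of 0.95^(i-n)*y[n] over n = 1..i, so it should match the versions above.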