Difference equations in R

Below is a parasite growth model:
Ni(a, t) represents the expected number of parasites of age a at time t, Ki(a, t) represents the killing effect, and PMF is the parasite multiplication factor. This is a discrete model with a = 1, 2, 3, ..., 48. Can anybody tell me how to implement this equation in R using difference equations? Many thanks for your support.

This is the best I could do with the information you provided. Point me to the rest of it and I might be able to get it to actually work; as it stands, I think it will recurse infinitely.
Ki <- function(a, t){ ## You need to actually define this properly
  return(1)
}

Ni <- function(a, t, PMF){
  if (a %% 1 != 0) stop("Only Takes Integer values of a")
  if (t %% 1 != 0) stop("Only Takes Integer values of t")
  if (a == 1){
    x = Ni(48, t-1, PMF)
    y = exp(-Ki(48, t-1))
    result = PMF * x * y
    return(result)
  }
  if (a > 1){
    x = Ni(a-1, t-1, PMF)
    y = exp(-Ki(a-1, t-1))
    result = x * y
    return(result)
  }
}

You don't have a set of initial conditions. Once you've got some initial 48 values for N(a = 1..48, t = 1), you compute N(a = 1, t = 2) from the second equation, and then compute N(a = 2..48, t = 2) from the first equation. Repeat for t = 3 and so on.
What you have is a recurrence relation, not a differential equation. You step through a recurrence relation as I've just explained.
It might be possible to convert this to a system of differential equations by looking at (N(t) - N(t-1))/dt and solving, but that's a maths job, not a programming job.
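For illustration, here is a minimal sketch of that stepping scheme, with the recurrence taken from the recursive code above; PMF, K() and the initial age profile are placeholders you would replace with the real model inputs:
PMF <- 10                         # hypothetical multiplication factor
K   <- function(a, t) 1           # placeholder killing effect -- define properly
n_ages  <- 48
n_times <- 20

N <- matrix(NA_real_, nrow = n_ages, ncol = n_times)
N[, 1] <- rep(1, n_ages)          # placeholder initial conditions N(a, t = 1)

for (t in 2:n_times) {
  # "second equation": new age-1 parasites produced by the oldest age class
  N[1, t] <- PMF * N[n_ages, t - 1] * exp(-K(n_ages, t - 1))
  # "first equation": surviving parasites age by one time step
  for (a in 2:n_ages) {
    N[a, t] <- N[a - 1, t - 1] * exp(-K(a - 1, t - 1))
  }
}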

Related

Newton root finding function does not work with sqrt(x) in R

Currently doing a homework exercise based on root finding algorithms:
A root finding algorithm can also be used to approximate certain functions. Show mathematically how the evaluation of the square root function f(x) = √x can be expressed as a root finding problem. Use both Newton's method and the bisection method to approximate √x for different values of x. Compare your approximations with the R function sqrt. For which values of x does the approximation work well? Does Newton's method or the bisection method perform better? How do the answers to these questions depend on your starting value?
I have the following code that worked for every function so far:
newton.function <- function(f, fPrime, nmax, eps, x0){
  n <- 1
  x1 <- x0
  result <- c()
  while((n <= nmax) && (abs(f(x1)) >= eps)){
    x1 <- x0 - (f(x0)/fPrime(x0))
    result <- c(result, x1)
    n <- n + 1
    x0 <- x1
  }
  iterations <- n - 1
  return(c(iterations, result[length(result)]))
}
Sqrt functions:
g <- function(x){
  x^(1/2)
}
gPrime <- function(x){
  1/(2*x^(1/2))
}
When I execute the function I either get
Error in if (abs(f(x1)) <= eps) break : missing value where TRUE/FALSE needed
or, if x0 = 0, I get 1 and 0 as the result.
newton.function(f = g, fPrime = gPrime, nmax = 1000, eps = 1E-8, x0 = 0)
My bisection function works equally badly, so I am stuck answering the question.
From a programming point of view, your code works as expected.
If you start with 0, which is the exact solution, you get 0, fine.
Now look what happens when starting with any other number:
x1 <- x0 - f(x0)/fPrime(x0) = x0 - x0^(1/2) / (1/(2*x0^(1/2))) = x0 - 2*x0 = -x0
So if you start with a positive number, x1 will be negative after the first iteration, and the next call to f(x1) returns NaN, since you are taking the square root of a negative number.
The error message tells you that R cannot evaluate abs(f(x1)) >= eps to TRUE or FALSE: abs(f(x1)) returns NaN, and the >= operator also returns NaN in that case.
So I advise you to consult a mathematics source to check your algorithm, but the R part is fine.
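For what it's worth, one common way to recast √x as a root-finding problem (which is what the exercise asks for) is to look for the positive root of h(y) = y^2 - x. This is only a sketch of that idea, reusing the poster's newton.function unchanged:
x <- 2                          # the number whose square root we want
h      <- function(y) y^2 - x   # sqrt(x) is the positive root of h
hPrime <- function(y) 2*y

# starting from a positive guess, Newton's update is y <- (y + x/y)/2
newton.function(f = h, fPrime = hPrime, nmax = 1000, eps = 1e-8, x0 = 1)
# returns c(iterations, approximation); compare with sqrt(2)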

How do I minimize a linear least squares function in R?

I'm reading Deep Learning by Goodfellow et al. and am trying to implement gradient descent as shown in Section 4.5 Example: Linear Least Squares. This is page 92 in the hard copy of the book.
The algorithm can be viewed in detail at https://www.deeplearningbook.org/contents/numerical.html with R implementation of linear least squares on page 94.
I've tried implementing it in R, and the algorithm as implemented converges on a vector, but this vector does not seem to minimize the least squares function as required. Adding epsilon to the vector in question frequently produces a "minimum" less than the minimum output by my program.
options(digits = 15)
dim_square = 2   # set dimension of square matrix

# Generate a random matrix A and a random vector b
set.seed(1234)
A = matrix(nrow = dim_square, ncol = dim_square, byrow = T, rlnorm(dim_square ^ 2)/10)
b = rep(rnorm(1), dim_square)

# having fixed A & b, select x randomly
x = rnorm(dim_square)   # vector of length dim_square -- supposed to be arbitrary

f = function(x, A, b){
  total_vector = A %*% x + b   # this is the function that we want to minimize
  total = 0.5 * sum(abs(total_vector) ^ 2)   # L2 norm squared
  return(total)
}
f(x, A, b)

# how close do we want to get?
epsilon = 0.1
delta = 0.01

value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5

steps = vector()
while(L2_norm > delta){
  x = x - epsilon * value
  value = (t(A) %*% A) %*% x - t(A) %*% b
  L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
  print(L2_norm)
}

minimum = f(x, A, b)
minimum

minimum_minus = f(x - 0.5*epsilon, A, b)
minimum_minus   # less than the minimum found by gradient descent! Why?
As on page 94 of the PDF at https://www.deeplearningbook.org/contents/numerical.html, I am trying to find the values of the vector x such that f(x) is minimized. However, as demonstrated by minimum and minimum_minus in my code, minimum is not the actual minimum, as it exceeds minimum_minus.
Any idea what the problem might be?
Original Problem
Finding the value of x such that the quantity Ax - b is minimized is equivalent to finding the value of x such that Ax - b = 0, or x = (A^-1)*b. This is because the L2 norm is the Euclidean norm, more commonly known as the distance formula. By definition, distance cannot be negative, making its minimum identically zero.
This algorithm, as implemented, actually comes quite close to estimating x. However, because of repeated subtraction and rounding, one quickly runs into the problem of underflow, resulting in the massive oscillation shown below:
[Plot: value of the L2 norm as a function of step size]
[Plot: the above algorithm vs. the solve function in R]
Above we have the results of A %*% x followed by A %*% min_x, with x estimated by the implemented algorithm and min_x estimated by the solve function in R.
The problem of underflow, well known to those familiar with numerical analysis, is probably best left to the programmers of the lower-level libraries that are best equipped to tackle it.
To summarize, the algorithm appears to work as implemented. It is important to note, however, that not every function has a minimum (think of a straight line), and also that this algorithm can only be expected to find a local, as opposed to a global, minimum.
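For completeness, a small sketch of that comparison, meant to be run after the question's code so that A, b and x are in scope; it assumes min_x is the exact solution obtained with solve:
# exact solution of A x = b, the fixed point the gradient iteration drives x toward
min_x <- solve(A, b)

# fitted values from the iterative estimate vs. the exact solve() solution
cbind(gradient_descent = as.numeric(A %*% x), exact = as.numeric(A %*% min_x))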

Floating point comparison with zero

I'm writing a function to calculate the quantile of the GEV distribution. The relevant aspect for this question is that a different form of the function is required when one of the parameters (the shape parameter, or kappa) is zero.
Programmatically, this is commonly addressed as follows (this is a snippet from evd::qgev and is similar in lmomco::quagev):
(Edit: Version 2.2.2 of lmomco has addressed the issue identified in this question)
if (shape == 0)
  return(loc - scale * log(-log(p)))
else
  return(loc + scale * ((-log(p))^(-shape) - 1)/shape)
This works fine if shape/kappa is exactly equal to zero but there is odd behaviour near zero.
Let's look at an example:
Qgev_zero <- function(shape){
  # p is an exceedance probability
  p = 0.01
  location = 0
  scale = 1
  if(shape == 0) return(location - scale*(log(-log(1-p))))
  location + (scale/shape)*((-log(1-p))^-shape - 1)
}
Qgev_zero(0)
#[1] 4.600149
Qgev_zero(1e-8)
#[1] 4.600149
This looks fine because the same answer is returned near zero and at zero. But look at what happens closer to zero.
k.seq <- seq(from = -4e-16, to = 4e-16, length.out = 1000)
plot(k.seq, sapply(k.seq, Qgev_zero), type = 'l')
The value returned by the function oscillates and is often incorrect.
These problems go away if I replace the direct comparison with zero with all.equal e.g.
if(isTRUE(all.equal(shape, 0))) return( location - scale*(log(-log(1-p) )))
Looking at the help for all.equal suggests that for default values, anything smaller than 1.5e-8 will be treated as zero.
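A quick check of that reading (a value just below the default tolerance compares equal to zero, a larger one does not):
isTRUE(all.equal(1e-8, 0))   # TRUE  -- treated as zero
isTRUE(all.equal(1e-7, 0))   # FALSE -- not treated as zero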
Of course this odd behaviour near zero is probably not generally an issue but in my case, I'm using optimisation/root finding to determine parameters from known quantiles so am concerned that my code needs to be robust.
To the question: is using all.equal(target, 0) an appropriate way to deal with this problem? Why is it that this approach isn't used routinely?
Some functions, when implemented the obvious way with floating point representations, are ill-behaved at certain points. That's especially likely to be the case when the function has to be manually defined at a single point: When things go absolutely undefined at a point, it's likely that they're hanging on for dear life when they get close.
In this case, that's from the kappa denominator fighting the kappa negative exponent. Which one wins the battle is determined on a bit-by-bit basis, each one sometimes winning the "rounding to a stronger magnitude" contest.
There's a variety of approaches to fixing these sorts of problems, all of them designed on a case-by-case basis. One often-flawed but easy-to-implement approach is to switch to a better-behaved representation (say, the Taylor expansion with respect to kappa) near the problematic point. That'll introduce discontinuities at the boundaries; if necessary, you can try interpolating between the two.
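Not from the original answer, but as a sketch of the Taylor-expansion idea: for small K, (1 - exp(-K*Y))/K ≈ Y - K*Y^2/2 + K^2*Y^3/6, which is perfectly well behaved at K = 0. The 1e-6 switch-over point below is arbitrary, and the parameterisation follows lmomco::quagev as in the follow-up answer below:
Qgev_series <- function(K, f, XI, A){
  # f is a non-exceedance probability, as in lmomco::quagev
  Y <- -log(-log(f))
  if (abs(K) < 1e-6) {
    # truncated Taylor expansion of (1 - exp(-K*Y))/K around K = 0
    Y <- Y - K*Y^2/2 + K^2*Y^3/6
  } else {
    Y <- (1 - exp(-K*Y))/K
  }
  XI + A*Y
}

Qgev_series(0, f = 0.99, XI = 0, A = 1)
# [1] 4.600149 -- matches Qgev_zero(0) above (where p = 0.01 was an exceedance probability)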
Following Sneftel's suggestion, I calculate the quantile at k = -1e-7 and k = 1e-7 and interpolate if the k argument falls between these limits. This seems to work.
In this code I'm using the parameterisation for the GEV quantile function from lmomco::quagev.
(Edit: Version 2.2.2 of lmomco has addressed the issues identified in this question)
The function Qgev is the problematic version (black line on plot), while Qgev_interp, interpolates near zero (green line on plot).
Qgev <- function(K, f, XI, A){
  # K  = shape
  # f  = probability
  # XI = location
  # A  = scale
  Y <- -log(-log(f))
  Y <- (1 - exp(-K*Y))/K
  x <- XI + A*Y
  return(x)
}

Qgev_interp <- function(K, f, XI, A){
  .F <- function(K, f, XI, A){
    Y <- -log(-log(f))
    Y <- (1 - exp(-K*Y))/K
    x <- XI + A*Y
    return(x)
  }
  k1 <- -1e-7
  k2 <- 1e-7
  y1 <- .F(k1, f, XI, A)
  y2 <- .F(k2, f, XI, A)
  F_nearZero <- approxfun(c(k1, k2), c(y1, y2))
  if(K > k1 & K < k2) {
    return(F_nearZero(K))
  } else {
    return(.F(K, f, XI, A))
  }
}
k.seq <- seq(from = -1.1e-7, to = 1.1e-7, length.out = 1000)
plot(k.seq, sapply(k.seq, Qgev, f = 0.01, XI = 0, A = 1), col=1, lwd = 1, type = 'l')
lines(k.seq, sapply(k.seq, Qgev_interp, f = 0.01, XI = 0, A = 1), col=3, lwd = 2)

Constrained optimization of a vector

I have a (non-symmetric) probability matrix, and an observed vector of integer outcomes. I would like to find a vector that maximises the probability of the outcomes, given the transition matrix. Simply, I am trying to estimate a distribution of particles at sea given their ultimate distribution on land, and a matrix of probabilities of a particle released from a given point in the ocean ending up at a given point on the land.
The vector that I want to find is subject to the constraint that all components must be between 0-1, and the sum of the components must equal 1. I am trying to figure out the best optimisation approach for the problem.
My transition matrix and data set are quite large, but I have created a smaller one here:
I used a simulated known at-sea distribution of msim <- c(.3,.2,.1,.3,.1,0) and a simulated probability matrix (t) to come up with an estimated coastal data set (Datasim2), as follows:
t <- matrix(c(0,.1,.1,.1,.1,.2,0,.1,0,0,.3,0,0,0,0,.4,.1,.3,0,.1,0,.1,.4,0,0,0,.1,0,.1,.1),
            nrow=5, ncol=6, byrow=T)
rownames(t) <- c("C1","C2","C3","C4","C5")        ### locations on land
colnames(t) <- c("S1","S2","S3","S4","S5","S6")   ### locations at sea
Datasim <- as.numeric(round((t %*% msim)*500))
Datasim2 <- c(rep("C1",95), rep("C2",35), rep("C3",90), rep("C4",15), rep("C5",30))
M <- c(0.1,0.1,0.1,0.1,0.1,0.1)                   ## starting M
I started with a straightforward function as follows:
EstimateSource3 <- function(M, Data, T){
  EstEndProbsall <- M %*% T
  TotalLkhd <- rep(NA, times=dim(Data)[1])
  for (j in 1:dim(Data)[1]){
    ## likelihood of the particle NOT ending up at locations other than the location of interest
    ObsEstEndLkhd <- 1 - EstEndProbsall[1,]
    ## likelihood of ending up at the location of interest
    IndexC <- which(colnames(EstEndProbsall)==Data$LocationCode[j], arr.ind=T)
    ObsEstEndLkhd[IndexC] <- EstEndProbsall[IndexC]
    # total likelihood
    TotalLkhd[j] <- sum(log(ObsEstEndLkhd))
  }
  SumTotalLkhd <- sum(TotalLkhd)
  return(SumTotalLkhd)
}
DistributionEstimate <- optim(par = M, fn = EstimateSource3, Data = Datasim2, T=t,
control = list(fnscale = -1, trace=5, maxit=500), lower = 0, upper = 1)
To constrain the sum to 1, I tried using a few of the suggestions posted here: How to set parameters' sum to 1 in constrained optimization
e.g. adding M <- M/sum(M) or SumTotalLkhd <- SumTotalLkhd - (10*pwr) to the body of the function, but neither yielded anything like msim, and in fact the second approach produced the error "L-BFGS-B needs finite values of 'fn'".
I thought perhaps the quadprog package might be of some help, but I don’t think I have a symmetric positive definite matrix…
Thanks in advance for your help!
How about this: let D = the distribution on land, M = the distribution at sea, and T the transition matrix. You know D and T, and you want to calculate M. You have
D' = M' T
hence D' T' = M' (T T')
and accordingly D'T'(T T')^(-1) = M'
Basically you solve it as when doing linear regression (seems SO does not support math notation: ' is transpose, ^(-1) is ordinary matrix inverse.)
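A small sketch of that route using the question's objects (the 5x6 matrix t and msim). Since there are 5 land sites but 6 sea sites, MASS::ginv (the Moore-Penrose pseudoinverse) stands in for a plain inverse; it returns the minimum-norm solution, which need not be a probability vector:
library(MASS)   # for ginv()

D     <- as.numeric(t %*% msim)      # land distribution implied by the simulated msim
M_hat <- as.numeric(ginv(t) %*% D)   # minimum-norm least-squares estimate of the at-sea vector
names(M_hat) <- colnames(t)
round(M_hat, 3)                      # compare with msim = c(.3, .2, .1, .3, .1, 0)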
Alternatively, D may be counts of particles, and then you can ask questions like: what is the most likely distribution of particles at sea? That needs a different approach, though.
Well, I have never done such models, but think along the following lines. Let M be of length 3 and D of length 2, so T is 3x2. We know T, and we observe D_1 particles at location 1 and D_2 particles at location 2.
What is the probability of observing one particle at location 1? It is Pr(D = 1) = M_1 T_11 + M_2 T_21 + M_3 T_31. Analogously, Pr(D = 2) = M_1 T_12 + M_2 T_22 + M_3 T_32. Now you can easily write the log-likelihood of observing D_1 and D_2 particles at locations 1 and 2. The code might look like this:
loglik <- function(M) {
  if(M[1] < 0 | M[1] > 1)
    return(NA)
  if(M[2] < 0 | M[2] > 1)
    return(NA)
  M3 <- 1 - M[1] - M[2]
  if(M3 < 0 | M3 > 1)
    return(NA)
  D[1]*log(T[1,1]*M[1] + T[2,1]*M[2] + T[3,1]*M3) +
    D[2]*log(T[1,2]*M[1] + T[2,2]*M[2] + T[3,2]*M3)
}
T <- matrix(c(0.1,0.2,0.3,0.9,0.8,0.7), 3, 2)
D <- c(100,200)
library(maxLik)
m <- maxLik(loglik, start=c(0.4,0.4), method="BFGS")
summary(m)
I get the answer (0, 0.2, 0.8) when I estimate it, but the standard errors are very large.
As I said, I have never done this, so I don't know whether it makes sense.

pseudocode into R language

I just started using the 'wbs' package, and I thought I could get a better sense of it if I coded its pseudocode in R before using the package. So I'm currently turning the wild binary segmentation pseudocode into R code.
function WildBinSeg(s, e, ζ_T)
  if e − s < 1 then
    STOP
  else
    M_{s,e} := set of those indices m for which [s_m, e_m] ∈ F_T^M is such that [s_m, e_m] ⊆ [s, e]
    (Optional: augment M_{s,e} := M_{s,e} ∪ {0}, where [s_0, e_0] = [s, e])
    (m_0, b_0) := argmax_{m ∈ M_{s,e}, b ∈ {s_m, ..., e_m − 1}} |X̃^b_{s_m, e_m}|
    if |X̃^{b_0}_{s_{m_0}, e_{m_0}}| > ζ_T then
      add b_0 to the set of estimated change-points
      WildBinSeg(s, b_0, ζ_T)
      WildBinSeg(b_0 + 1, e, ζ_T)
    else
      STOP
    end if
  end if
end function
I got confused at the line:
M_{s,e} := set of those indices m for which [s_m, e_m] ∈ F_T^M is such that [s_m, e_m] ⊆ [s, e]
I know the pseudocode is a function, but I'm not sure whether I should create another function for this line, since it involves two operations. Can anyone help me code this line?
You first need to write a function which draws M intervals and finds the maximiser of the absolute value of the CUSUM statistic in each interval (for details see the WBS paper).
In the R code below, I assume that the output of such a function is stored in the 'res' variable, which is an M by 4 matrix. The first two columns contain the left and right endpoints of the intervals drawn (s_m, e_m), the third contains the maximiser of the CUSUM statistic for each interval (b_m), and the last contains the value of the corresponding CUSUM statistic (X̃^{b_m}_{s_m, e_m}).
WildBinSeg <- function(s, e, threshold, res){
  if(e-s < 1) return(NULL)
  else{
    # we leave only intervals contained in [s,e]
    res <- res[res[,1] >= s & res[,2] <= e, , drop=FALSE]
    # place for the optional augmentation
    # check if there are any intervals left
    if(nrow(res)==0) return(NULL)
    else{
      # we find the maximum
      max.id <- which.max(abs(res[,4]))
      b0 <- res[max.id,3]
      Xb0 <- res[max.id,4]
      if(abs(Xb0) > threshold)
        return(c(
          WildBinSeg(s, b0, threshold, res),
          b0,
          WildBinSeg(b0+1, e, threshold, res)
        ))
      else return(NULL)
    }
  }
}
This function returns the localisations of the change-points. You can check how it works using this piece of code:
require(wbs)
set.seed(12)
#we generate a piecewise constant function plus Gaussian noise vector
x <- rnorm(1000)
x[1:500] <- x[1:500]+1
res <- wbs(x)$res[,1:4]
#we set the threshold to the recommended value
threshold <- 1.3*sqrt(2*log(1000))
WildBinSeg(1,1000,threshold,res)
