Constrained optimization of a vector - r

I have a (non-symmetric) probability matrix, and an observed vector of integer outcomes. I would like to find a vector that maximises the probability of the outcomes, given the transition matrix. Simply, I am trying to estimate a distribution of particles at sea given their ultimate distribution on land, and a matrix of probabilities of a particle released from a given point in the ocean ending up at a given point on the land.
The vector that I want to find is subject to the constraint that all components must be between 0-1, and the sum of the components must equal 1. I am trying to figure out the best optimisation approach for the problem.
My transition matrix and data set are quite large, but I have created a smaller one here:
I used a simulated known at- sea distribution of
msim<-c(.3,.2,.1,.3,.1,0) and a simulated probability matrix (t) to come up with an estimated coastal matrix (Datasim2), as follows:
t<-matrix (c(0,.1,.1,.1,.1,.2,0,.1,0,0,.3,0,0,0,0,.4,.1,.3,0,.1,0,.1,.4,0,0,0,.1,0,.1,.1),
nrow=5,ncol=6, byrow=T)
rownames(t)<-c("C1","C2","C3","C4","C5") ### locations on land
colnames(t)<-c("S1","S2","S3","S4","S5","S6") ### locations at sea
Datasim<-as.numeric (round((t %*% msim)*500))
Datasim2<-c(rep("C1",95), rep("C2",35), rep("C3",90),rep("C4",15),rep("C5",30))
M <-c(0.1,0.1,0.1,0.1,0.1,0.1) ## starting M
I started with a straightforward function as follows:
EstimateSource3<-function(M,Data,T){
EstEndProbsall<-M%*%T
TotalLkhd<-rep(NA, times=dim(Data)[1])
for (j in 1:dim(Data)[1]){
ObsEstEndLkhd<-0
ObsEstEndLkhd<-1-EstEndProbsall[1,] ## likelihood of particle NOT ending up at locations other than the location of interest
IndexC<-which(colnames(EstEndProbsall)==Data$LocationCode[j], arr.ind=T) ## likelihood of ending up at location of interest
ObsEstEndLkhd[IndexC]<-EstEndProbsall[IndexC]
#Total likelihood
TotalLkhd[j]<-sum(log(ObsEstEndLkhd))
}
SumTotalLkhd<-sum(TotalLkhd)
return(SumTotalLkhd)
}
DistributionEstimate <- optim(par = M, fn = EstimateSource3, Data = Datasim2, T=t,
control = list(fnscale = -1, trace=5, maxit=500), lower = 0, upper = 1)
To constrain the sum to 1, I tried using a few of the suggestions posted here:How to set parameters' sum to 1 in constrained optimization
e.g. adding M<-M/sum(M) or SumTotalLkhd<-SumTotalLkhd-(10*pwr) to the body of the function, but neither yielded anything like msim, and in fact, the 2nd solution came up with the error “L-BFGS-B needs finite values of 'fn'”
I thought perhaps the quadprog package might be of some help, but I don’t think I have a symmetric positive definite matrix…
Thanks in advance for your help!

What about that: Let D = distribution at land, M = at sea, T the transition matrix. You know D, T, you want to calculate M. You have
D' = M' T
hence D' T' = M' (T T')
and accordingly D'T'(T T')^(-1) = M'
Basically you solve it as when doing linear regression (seems SO does not support math notation: ' is transpose, ^(-1) is ordinary matrix inverse.)
Alternatively, D may be counts of particles, and now you can ask questions like: what is the most likely distribution of particles at sea. That needs a different approach though.

Well, I have never done such models but think along the following lines. Let M be of length 3 and D of length 2, and T is hence 3x2. We know T and we observe D_1 particles at location 1 and D_2 particles at location 2.
What is the likelihood that you observe one particle at location D_1? It is Pr(D = 1) = M_1 T_11 + M_2 T_21 + M_3 T_32. Analogously, Pr(D = 2) = M_1 T_12 + M_2 T_22 + M_3 T_32. Now you can easily write the log-likelihood of observing D_1 and D_2 particles at locations 1 and 2. The code might look like this:
loglik <- function(M) {
if(M[1] < 0 | M[1] > 1)
return(NA)
if(M[2] < 0 | M[2] > 1)
return(NA)
M3 <- 1 - M[1] - M[2]
if(M3 < 0 | M3 > 1)
return(NA)
D[1]*log(T[1,1]*M[1] + T[2,1]*M[2] + T[3,1]*M3) +
D[2]*log(T[1,2]*M[1] + T[2,2]*M[2] + T[3,2]*M3)
}
T <- matrix(c(0.1,0.2,0.3,0.9,0.8,0.7), 3, 2)
D <- c(100,200)
library(maxLik)
m <- maxLik(loglik, start=c(0.4,0.4), method="BFGS")
summary(m)
I get the answer (0, 0.2, 0.8) when I estimate it but standard errors are very large.
As I told, I have never done it so I don't know it it makes sense.

Related

How do I minimize a linear least squares function in R?

I'm reading Deep Learning by Goodfellow et al. and am trying to implement gradient descent as shown in Section 4.5 Example: Linear Least Squares. This is page 92 in the hard copy of the book.
The algorithm can be viewed in detail at https://www.deeplearningbook.org/contents/numerical.html with R implementation of linear least squares on page 94.
I've tried implementing in R, and the algorithm as implemented converges on a vector, but this vector does not seem to minimize the least squares function as required. Adding epsilon to the vector in question frequently produces a "minimum" less than the minimum outputted by my program.
options(digits = 15)
dim_square = 2 ### set dimension of square matrix
# Generate random vector, random matrix, and
set.seed(1234)
A = matrix(nrow = dim_square, ncol = dim_square, byrow = T, rlnorm(dim_square ^ 2)/10)
b = rep(rnorm(1), dim_square)
# having fixed A & B, select X randomly
x = rnorm(dim_square) # vector length of dim_square--supposed to be arbitrary
f = function(x, A, b){
total_vector = A %*% x + b # this is the function that we want to minimize
total = 0.5 * sum(abs(total_vector) ^ 2) # L2 norm squared
return(total)
}
f(x,A,b)
# how close do we want to get?
epsilon = 0.1
delta = 0.01
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
steps = vector()
while(L2_norm > delta){
x = x - epsilon * value
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
print(L2_norm)
}
minimum = f(x, A, b)
minimum
minimum_minus = f(x - 0.5*epsilon, A, b)
minimum_minus # less than the minimum found by gradient descent! Why?
On page 94 of the pdf appearing at https://www.deeplearningbook.org/contents/numerical.html
I am trying to find the values of the vector x such that f(x) is minimized. However, as demonstrated by the minimum in my code, and minimum_minus, minimum is not the actual minimum, as it exceeds minimum minus.
Any idea what the problem might be?
Original Problem
Finding the value of x such that the quantity Ax - b is minimized is equivalent to finding the value of x such that Ax - b = 0, or x = (A^-1)*b. This is because the L2 norm is the euclidean norm, more commonly known as the distance formula. By definition, distance cannot be negative, making its minimum identically zero.
This algorithm, as implemented, actually comes quite close to estimating x. However, because of recursive subtraction and rounding one quickly runs into the problem of underflow, resulting in massive oscillation, below:
Value of L2 Norm as a function of step size
Above algorithm vs. solve function in R
Above we have the results of A %% x followed by A %% min_x, with x estimated by the implemented algorithm and min_x estimated by the solve function in R.
The problem of underflow, well known to those familiar with numerical analysis, is probably best tackled by the programmers of lower-level libraries best equipped to tackle it.
To summarize, the algorithm appears to work as implemented. Important to note, however, is that not every function will have a minimum (think of a straight line), and also be aware that this algorithm should only be able to find a local, as opposed to a global minimum.

generating random x and y coordinates with a minimum distance

Is there a way in R to generate random coordinates with a minimum distance between them?
E.g. what I'd like to avoid
x <- c(0,3.9,4.1,8)
y <- c(1,4.1,3.9,7)
plot(x~y)
This is a classical problem from stochastic geometry. Completely random points in space where the number of points falling in disjoint regions are independent of each other corresponds to a homogeneous Poisson point process (in this case in R^2, but could be in almost any space).
An important feature is that the total number of points has to be random before you can have independence of the counts of points in disjoint regions.
For the Poisson process points can be arbitrarily close together. If you define a process by sampling the Poisson process until you don't have any points that are too close together you have the so-called Gibbs Hardcore process. This has been studied a lot in the literature and there are different ways to simulate it. The R package spatstat has functions to do this. rHardcore is a perfect sampler, but if you want a high intensity of points and a big hard core distance it may not terminate in finite time... The distribution can be obtained as the limit of a Markov chain and rmh.default lets you run a Markov chain with a given Gibbs model as its invariant distribution. This finishes in finite time but only gives a realisation of an approximate distribution.
In rmh.default you can also simulate conditional on a fixed number of points. Note that when you sample in a finite box there is of course an upper limit to how many points you can fit with a given hard core radius, and the closer you are to this limit the more problematic it becomes to sample correctly from the distribution.
Example:
library(spatstat)
beta <- 100; R = 0.1
win <- square(1) # Unit square for simulation
X1 <- rHardcore(beta, R, W = win) # Exact sampling -- beware it may run forever for some par.!
plot(X1, main = paste("Exact sim. of hardcore model; beta =", beta, "and R =", R))
minnndist(X1) # Observed min. nearest neighbour dist.
#> [1] 0.102402
Approximate simulation
model <- rmhmodel(cif="hardcore", par = list(beta=beta, hc=R), w = win)
X2 <- rmh(model)
#> Checking arguments..determining simulation windows...Starting simulation.
#> Initial state...Ready to simulate. Generating proposal points...Running Metropolis-Hastings.
plot(X2, main = paste("Approx. sim. of hardcore model; beta =", beta, "and R =", R))
minnndist(X2) # Observed min. nearest neighbour dist.
#> [1] 0.1005433
Approximate simulation conditional on number of points
X3 <- rmh(model, control = rmhcontrol(p=1), start = list(n.start = 42))
#> Checking arguments..determining simulation windows...Starting simulation.
#> Initial state...Ready to simulate. Generating proposal points...Running Metropolis-Hastings.
plot(X3, main = paste("Approx. sim. given n =", 42))
minnndist(X3) # Observed min. nearest neighbour dist.
#> [1] 0.1018068
OK, how about this? You just generate random number pairs without restriction and then remove the onces which are too close. This could be a great start for that:
minimumDistancePairs <- function(x, y, minDistance){
i <- 1
repeat{
distance <- sqrt((x-x[i])^2 + (y-y[i])^2) < minDistance # pythagorean theorem
distance[i] <- FALSE # distance to oneself is always zero
if(any(distance)) { # if too close to any other point
x <- x[-i] # remove element from x
y <- y[-i] # and remove element from y
} else { # otherwise...
i = i + 1 # repeat the procedure with the next element
}
if (i > length(x)) break
}
data.frame(x,y)
}
minimumDistancePairs(
c(0,3.9,4.1,8)
, c(1,4.1,3.9,7)
, 1
)
will lead to
x y
1 0.0 1.0
2 4.1 3.9
3 8.0 7.0
Be aware, though, of the fact that these are not random numbers anymore (however you solve problem).
You can use rejection sapling https://en.wikipedia.org/wiki/Rejection_sampling
The principle is simple: you resample until you data verify the condition.
> set.seed(1)
>
> x <- rnorm(2)
> y <- rnorm(2)
> (x[1]-x[2])^2+(y[1]-y[2])^2
[1] 6.565578
> while((x[1]-x[2])^2+(y[1]-y[2])^2 > 1) {
+ x <- rnorm(2)
+ y <- rnorm(2)
+ }
> (x[1]-x[2])^2+(y[1]-y[2])^2
[1] 0.9733252
>
The following is a naive hit-and-miss approach which for some choices of parameters (which were left unspecified in the question) works well. If performance becomes an issue, you could experiment with the package gpuR which has a GPU-accelerated distance matrix calculation.
rand.separated <- function(n,x0,x1,y0,y1,d,trials = 1000){
for(i in 1:trials){
nums <- cbind(runif(n,x0,x1),runif(n,y0,y1))
if(min(dist(nums)) >= d) return(nums)
}
return(NA) #no luck
}
This repeatedly draws samples of size n in [x0,x1]x[y0,y1] and then throws the sample away if it doesn't satisfy. As a safety, trials guards against an infinite loop. If solutions are hard to find or n is large you might need to increase or decrease trials.
For example:
> set.seed(2018)
> nums <- rand.separated(25,0,10,0,10,0.2)
> plot(nums)
runs almost instantly and produces:
Im not sure what you are asking.
if you want random coordinates here.
c(
runif(1,max=y[1],min=x[1]),
runif(1,max=y[2],min=x[2]),
runif(1,min=y[3],max=x[3]),
runif(1,min=y[4],max=x[4])
)

Difference equations in R

Below is a parasite growth model:
Ni(a, t) represents the expected number parasites of age a at time t, and ki(a, t) represents the killing effect, while PMF the multiplication factor. This is a discrete model as a equals 1, 2, 3.....48. Can anybody tell me how to implement this equation in R using difference equations? Many thanks for your support.
This is the best I could do with the information you provided. Point me to the rest of it and I might be able to get it to actually work, as stands I think it'll recur infinitely.
Ki <- function(a, t){ ## You need to actually define this properly
return(1)
}
Ni <- function(a, t, PMF){
if ((a %% 1 != 0)) stop("Only Takes Integer values of a")
if ((t %% 1 != 0)) stop("Only Takes Integer values of t")
if (a == 1){
x = Ni(48, t-1, PMF)
y = exp(-Ki(48,t-1))
result = PMF * x * y
return(result)
}
if (a > 1){
x = Ni(a-1, t-1, PMF)
y = exp(-Ki(a-1,t-1))
result = x * y
return(result)
}
}
You don't have a set of initial conditions. Once you've got some initial 48 values for N(a=1..48,t=1) you compute N(a=1,t=2) from the second equation, and then compute N(a=2..48,t=2) from the first equation. Repeat for t=3 and so on.
What you have is a recurrence relation, not a differential equation. You step through a recurrence relation as I've just explained.
It might be possible to convert this to a system of differential equations by looking at N(t)-N(t-1)/dt and solving but that's a maths job not a programming job.

Errors when attempting constrained optimisation using optim()

I have been using the Excel solver to handle the following problem
solve for a b and c in the equation:
y = a*b*c*x/((1 - c*x)(1 - c*x + b*c*x))
subject to the constraints
0 < a < 100
0 < b < 100
0 < c < 100
f(x[1]) < 10
f(x[2]) > 20
f(x[3]) < 40
where I have about 10 (x,y) value pairs. I minimize the sum of abs(y - f(x)). And I can constrain both the coefficients and the range of values for the result of my function at each x.
I tried nls (without trying to impose the constraints) and while Excel provided estimates for almost any starting values I cared to provide, nls almost never returned an answer.
I switched to using optim, but I'm having trouble applying the constraints.
This is where I have gotten so far-
best = function(p,x,y){sum(abs(y - p[1]*p[2]*p[3]*x/((1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x))))}
p = c(1,1,1)
x = c(.1,.5,.9)
y = c(5,26,35)
optim(p,best,x=x,y=y)
I did this to add the first set of constraints-
optim(p,best,x=x,y=y,method="L-BFGS-B",lower=c(0,0,0),upper=c(100,100,100))
I get the error ""ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"
and end up with a higher value of the error ($value). So it seems like I am doing something wrong. I couldn't figure out how to apply my other set of constraints at all.
Could someone provide me a basic idea how to solve this problem that a non-statistician can understand? I looked at a lot of posts and looked in a few R books. The R books stopped at the simplest use of optim.
The absolute value introduces a singularity:
you may want to use a square instead,
especially for gradient-based methods (such as L-BFGS).
The denominator of your function can be zero.
The fact that the parameters appear in products
and that you allow them to be (arbitrarily close to) zero
can also cause problems.
You can try with other optimizers
(complete list on the optimization task view),
until you find one for which the optimization converges.
x0 <- c(.1,.5,.9)
y0 <- c(5,26,35)
p <- c(1,1,1)
lower <- 0*p
upper <- 100 + lower
f <- function(p,x=x0,y=y0) sum(
(
y - p[1]*p[2]*p[3]*x / ( (1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x) )
)^2
)
library(dfoptim)
nmkb(p, f, lower=lower, upper=upper) # Converges
library(Rvmmin)
Rvmmin(p, f, lower=lower, upper=upper) # Does not converge
library(DEoptim)
DEoptim(f, lower, upper) # Does not converge
library(NMOF)
PSopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
DEopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
library(minqa)
bobyqa(p, f, lower, upper) # Does not really converge
As a last resort, you can always use a grid search.
library(NMOF)
r <- gridSearch( f,
lapply(seq_along(p), function(i) seq(lower[i],upper[i],length=200))
)

In R, how do I find the optimal variable to minimise the correlation between two datasets [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
In R, how do I find the optimal variable to maximize or minimize correlation between several datasets
This can be done in Excel, but my dataset has gotten too large. In excel, I would use solver.
I have 5 variables and I want to recreate a weighted average of these 5 variables so that they have the lowest correlation to a 6th variable.
Column A,B,C,D,E = random numbers
Column F = random number (which I want to minimise the correlation to)
Column G = Awi1+Bwi2+C*2i3+D*wi4+wi5*E
where wi1 to wi5 are coefficients resulted from solver In a separate cell, I would have correl(F,G)
This is all achieved with the following constraints in mind:
1. A,B,C,D, E have to be between 0 and 1
2. A+B+C+D+E= 1
I'd like to print the results of this so that I can have an efficient frontier type chart.
How can I do this in R? Thanks for the help.
I looked at the other thread mentioned by Vincent and I think I have a better solution. I hope it is correct. As Vincent points out, your biggest problem is that the optimization tools for such non-linear problems do not offer a lot of flexibility for dealing with your constraints. Here, you have two types of constraints: 1) all your weights must be >= 0, and 2) they must sum to 1.
The optim function has a lower option that can take care of your first constraint. For the second constraint, you have to be a bit creative: you can force your weights to sum to one by scaling them inside the function to be minimized, i.e. rewrite your correlation function as function(w) cor(X %*% w / sum(w), Y).
# create random data
n.obs <- 100
n.var <- 6
X <- matrix(runif(n.obs * n.var), nrow = n.obs, ncol = n.var)
Y <- matrix(runif(n.obs), nrow = n.obs, ncol = 1)
# function to minimize
correl <- function(w)cor(X %*% w / sum(w), Y)
# inital guess
w0 <- rep(1 / n.var, n.var)
# optimize
opt <- optim(par = w0, fn = correl, method = "L-BFGS-B", lower = 0)
optim.w <- opt$par / sum(opt$par)

Resources