Here is my data (the x and y columns are the relevant ones):
I need to fit the data with a polyline (a piecewise-linear function). The MATLAB code that does this is:
function [score, params] = spline_fit (points, x, y)
min_f = min(x)-1;
max_f = max(x);
points = [min_f points max_f];
params = zeros(length(points)-1, 2);
score = 0;
for i = 1:length(points)-1
    in = (x > points(i)) & (x <= points(i+1));
    if sum(in) > 2
        p = polyfit(x(in), y(in), 1);
        pred = p(1)*x(in) + p(2);
        score = score + norm(pred - y(in));
        params(i, :) = p;
    else
        params(i, :) = nan;
    end
end
%Find the parameters
r = [100,250,400];
p = fminsearch('spline_fit', r, [], x, y)
[score, param] = spline_fit(p, x, y)
%Plot the result
y1 = zeros(size(x));
p1 = [-inf, p, inf];
for i = 1:size(param, 1)
    in = (x > p1(i)) & (x <= p1(i+1));
    y1(in) = x(in)*param(i,1) + param(i,2);
end
[x1, I] = sort(x);
y1 = y1(I);
plot(x,y,'x',x1,y1,'k','LineWidth', 2)
This works well, producing the following optimized breakpoints: [102.9842, 191.0006, 421.9912]
I've implemented the same idea in R:
spline_fit <- function(x, xx, yy) {
  min_f = min(xx)-1;
  max_f = max(xx);
  points = c(min_f, x, max_f)
  params = array(0, c(length(points)-1, 2));
  score = 0;
  for( i in 1:length(points)-1) {
    inn <- (xx > points[i]) & (xx <= points[i+1]);
    if (sum(inn) > 2) {
      p <- polyfit(xx[inn], yy[inn], 1);
      pred <- p[1]*xx[inn] + p[2];
      score <- score + norm(as.matrix(pred - yy[inn]),"F");
      params[i,] <- p;
    } else {
      params[i,] <- NA;
    }
  }
  score
}
But I get very bad results:
> fminsearch(spline_fit,c(100,250,400), xx = Simple1$x, yy = Simple1$y)
[1] 100.1667 250.0000 400.0000
[1] 4452.761
[1] 2
As you can see, it stops after 2 iterations without finding good breakpoints.
I'd be very grateful for any help resolving this issue.
Also, if anyone knows how to implement this in C# using a free library, that would be even better. I know where to get polyfit, but not fminsearch.
The problem here is that the objective surface is very badly behaved: it has both multiple minima and discontinuous jumps (the score changes abruptly whenever a breakpoint crosses a data point), which makes the results you get with different optimizers almost arbitrary. MATLAB's optimizers are admittedly remarkably robust, but I would say it is largely a matter of chance (and of where you start) whether an optimizer reaches the global minimum in this case, unless you use some form of stochastic global optimization such as simulated annealing.
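As a rough sketch of the stochastic-global-optimization route, base R's optim() offers simulated annealing via method = "SANN". Everything here is an assumption for illustration: the data are synthetic (a noisy polyline with breakpoints at 100, 200 and 420, not the asker's data), seg_score() is a hypothetical re-implementation of the spline_fit objective that uses lm.fit() instead of pracma::polyfit() and sorts the breakpoints, and the temp/parscale settings are untuned guesses. The annealing result is then polished with Nelder-Mead.

```r
## Hypothetical synthetic data: a noisy polyline with breakpoints at 100, 200 and 420
set.seed(1)
x <- sort(runif(400, 0, 500))
y <- ifelse(x < 100, 2 * x,
     ifelse(x < 200, 200 + 0.5 * (x - 100),
     ifelse(x < 420, 250 - (x - 200), 30 + 0.2 * (x - 420))))
y <- y + rnorm(length(x), sd = 5)

## Hypothetical objective mirroring spline_fit: sum over segments of the
## Euclidean norm of the residuals of a straight-line fit in each segment
seg_score <- function(brk, xx, yy) {
  pts <- c(min(xx) - 1, sort(brk), max(xx))  # sorting is an extra assumption
  score <- 0
  for (i in seq_len(length(pts) - 1)) {
    inn <- xx > pts[i] & xx <= pts[i + 1]
    if (sum(inn) > 2) {
      fit <- lm.fit(cbind(1, xx[inn]), yy[inn])
      score <- score + sqrt(sum(fit$residuals^2))
    }
  }
  score
}

start <- c(100, 250, 400)
set.seed(42)
## Simulated annealing; parscale makes the random steps start roughly 50 units wide
sa <- optim(start, seg_score, xx = x, yy = y, method = "SANN",
            control = list(maxit = 5000, temp = 100, parscale = rep(50, 3)))
## Local Nelder-Mead polish starting from the annealing result
nm <- optim(sa$par, seg_score, xx = x, yy = y)
sort(nm$par)
nm$value
```

SANN returns the best point it visited, and Nelder-Mead never returns a point worse than its start, so the polished value can only match or improve on the starting guess; whether it reaches the global minimum still depends on the random seed and the annealing schedule.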
I chose to use R's built-in optimizer (which uses Nelder-Mead by default) rather than fminsearch from the pracma package.
spline_fit <- function(x, xx = Simple1$x, yy=Simple1$y) {
  min_f = min(xx)-1
  max_f = max(xx)
  points = c(min_f, x, max_f)
  params = array(0, c(length(points)-1, 2))
  score = 0
  for( i in 1:(length(points)-1)) {
    inn <- (xx > points[i]) & (xx <= points[i+1]);
    if (sum(inn) > 2) {
      p <- polyfit(xx[inn], yy[inn], 1);
      pred <- p[1]*xx[inn] + p[2];
      score <- score + norm(as.matrix(pred - yy[inn]),"F");
      params[i,] <- p;
    } else {
      params[i,] <- NA;
    }
  }
  score
}
library(pracma) ## for polyfit
Simple1 <- read.csv("Simple1.csv")
opt1 <- optim(fn=spline_fit,c(100,250,400), xx = Simple1$x, yy = Simple1$y)
## [1] 102.4365 201.5835 422.2503
This is better than the fminsearch result, but it still differs from the MATLAB result and is slightly worse:
## Matlab results:
matlab_fit <- c(102.9842, 191.0006, 421.9912)
spline_fit(matlab_fit, xx = Simple1$x, yy = Simple1$y)
## 3724.3
spline_fit(opt1$par, xx = Simple1$x, yy = Simple1$y)
## 3755.5 (worse)
The bbmle package offers an experimental/not very well documented set of tools for exploring optimization surfaces:
library(bbmle)
ss <- slice2D(fun=spline_fit,opt1$par,nt=51)
A 2D "slice" around the optim-estimated parameters. The circles show the optim fit (solid) and the minimum value within each slice (open).
A 'slice' between the MATLAB and optim fits shows that the surface is quite rugged:
ss2 <- bbmle:::slicetrans(matlab_fit,opt1$par,spline_fit)
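Since slicetrans is an unexported bbmle internal, a one-dimensional slice between two parameter vectors can also be computed by hand. The sketch below uses a hypothetical helper, slice_between() (not part of bbmle), that evaluates an objective along the straight line from p1 (t = 0) to p2 (t = 1); it is demonstrated on a made-up rugged function. With the objects above, the call would presumably be slice_between(spline_fit, matlab_fit, opt1$par, xx = Simple1$x, yy = Simple1$y).

```r
## Evaluate fn along the straight line from p1 (t = 0) to p2 (t = 1);
## extra arguments in ... are passed on to fn
slice_between <- function(fn, p1, p2, n = 101, ...) {
  t <- seq(0, 1, length.out = n)
  value <- vapply(t, function(ti) fn((1 - ti) * p1 + ti * p2, ...), numeric(1))
  data.frame(t = t, value = value)
}

## Hypothetical rugged test function of two parameters
rugged <- function(p) sum((p - 1)^2) + 0.3 * sum(sin(10 * p))

s <- slice_between(rugged, c(0, 0), c(2, 2))
plot(s$t, s$value, type = "l", xlab = "t (0 = p1, 1 = p2)", ylab = "objective")
```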
I would like to compute the Area Under the Curve (AUC) defined by a set of experimental values. I wrote a function that approximates the AUC using Simpson's rule, as described in this post. However, the function only works when it receives a vector of odd length. How can I modify the code to add the area of the last trapezoid when the input vector has an even length?
AUC <- function(x, h=1){
  # AUC function computes the Area Under the Curve of a time series using
  # Simpson's rule (numerical method).
  # Arguments
  # x: (vector) time series values
  # h: (int) temporal resolution of the time series. default h=1
  n = length(x)-1
  xValues = seq(from=1, to=n, by=2)
  sum <- list()
  for(i in 1:length(xValues)){
    n_sub <- xValues[[i]]-1
    n <- xValues[[i]]
    n_add <- xValues[[i]]+1
    v1 <- x[[n_sub+1]]
    v2 <- x[[n+1]]
    v3 <- x[[n_add+1]]
    s <- (h/3)*(v1+4*v2+v3)
    sum <- append(sum, s)
  }
  sum <- unlist(sum)
  auc <- sum(sum)
  auc
}
Here is an example dataset:
smoothed = c(0.3,0.317,0.379,0.452,0.519,0.573,0.61,0.629,0.628,0.613,0.587,0.556,0.521,
One recommended way to handle an even number of points while keeping the accuracy of Simpson's rule is to combine Simpson's 1/3 rule with Simpson's 3/8 rule. The 1/3 rule needs an even number of intervals (an odd number of points), while the 3/8 rule covers three intervals, so applying it once absorbs the leftover interval when the number of points is even. This approach is described in engineering textbooks on numerical methods.
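For reference, the standard forms of the rules, for equally spaced values f_k = f(x_k) with spacing h, are below. With N + 1 points where N is odd (so the number of points is even) and N ≥ 5, one option is to apply the 3/8 rule to the first three intervals and the composite 1/3 rule to the remaining N − 3 intervals, which is an even number.

```latex
% Simpson's 1/3 rule over two intervals
\int_{x_0}^{x_2} f(x)\,dx \approx \frac{h}{3}\left(f_0 + 4f_1 + f_2\right)

% Simpson's 3/8 rule over three intervals
\int_{x_0}^{x_3} f(x)\,dx \approx \frac{3h}{8}\left(f_0 + 3f_1 + 3f_2 + f_3\right)

% Composite 1/3 rule over an even number 2m of intervals
\int_{x_0}^{x_{2m}} f(x)\,dx \approx \frac{h}{3}\Bigl(f_0 + 4\sum_{k=1}^{m} f_{2k-1} + 2\sum_{k=1}^{m-1} f_{2k} + f_{2m}\Bigr)
```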
However, as a practical matter, you can write a code chunk to check the data length and add a single trapezoid at the end, as was suggested in the last comment of the post to which you linked. I wouldn't assume that it is necessarily as precise as combining Simpson's 1/3 and 3/8 rules, but it is probably reasonable for many applications.
You may want to double-check my code edits below, but this is the basic idea.
AUC <- function(x, h=1){
  # AUC function computes the Area Under the Curve of a time series using
  # Simpson's rule (numerical method).
  # Arguments
  # x: (vector) time series values
  # h: (int) temporal resolution of the time series. default h=1
  #jh edit: check for even data length
  #and chop off last data point if even
  nn = length(x)
  if(length(x) %% 2 == 0){
    xlast = x[length(x)]
    x = x[-length(x)]
  }
  n = length(x)-1
  xValues = seq(from=1, to=n, by=2)
  sum <- list()
  for(i in 1:length(xValues)){
    n_sub <- xValues[[i]]-1
    n <- xValues[[i]]
    n_add <- xValues[[i]]+1
    v1 <- x[[n_sub+1]]
    v2 <- x[[n+1]]
    v3 <- x[[n_add+1]]
    s <- (h/3)*(v1+4*v2+v3)
    sum <- append(sum, s)
  }
  sum <- unlist(sum)
  auc <- sum(sum)
  ##jh edit: add trapezoid for last two data points to result
  if(nn %% 2 == 0){
    auc <- auc + (x[length(x)] + xlast)/2 * h
  }
  auc
}
sm = smoothed[-length(smoothed)]
length(sm)
[1] 70
#even data as an example
AUC(sm)
[1] 20.17633
#original odd data
AUC(smoothed)
[1] 20.389
There may be a good reason for you to prefer using Simpson's rule, but if you're just looking for a quick and efficient estimate of AUC, the trapezoid rule is far easier to implement, and does not require an even number of breaks:
AUC <- function(x, h = 1) sum((x[-1] + x[-length(x)]) / 2 * h)
AUC(smoothed)
#> [1] 20.3945
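To illustrate what Simpson's rule buys over the trapezoid rule: the composite 1/3 rule is exact for polynomials up to degree 3, while the trapezoid rule is exact only for straight lines. The sketch below (simpson13() and trap() are hypothetical helper names) integrates x^3 over [0, 2], whose exact value is 4, using five points spaced h = 0.5 apart.

```r
## Composite Simpson's 1/3 rule; needs an odd number of points (even number of intervals)
simpson13 <- function(y, h = 1) {
  n <- length(y)
  stopifnot(n >= 3, n %% 2 == 1)
  w <- c(1, rep(c(4, 2), length.out = n - 2), 1)  # weights 1, 4, 2, 4, ..., 2, 4, 1
  h * sum(w * y) / 3
}

## Trapezoid rule, same formula as the one-liner above
trap <- function(y, h = 1) sum((y[-1] + y[-length(y)]) / 2 * h)

x <- seq(0, 2, by = 0.5)  # 5 points, 4 intervals
y <- x^3                  # exact integral over [0, 2] is 2^4 / 4 = 4
simpson13(y, h = 0.5)     # [1] 4
trap(y, h = 0.5)          # [1] 4.25
```

On smooth data like the smoothed series the two rules differ much less, as the 20.389 vs 20.3945 results above show.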
Here, I show example code that uses the Simpson's 1/3 and 3/8 rules in tandem for the numerical integration of data. As always, the usual caveats about the possibility of coding errors or compatibility issues apply.
The output at the end compares the accuracy of this algorithm and of the trapezoidal rule, each measured as the percent difference from the result of R's integrate() function.
#Algorithm adapted from:
#Numerical Methods for Engineers, Seventh Edition,
#By Chapra and Canale, page 623
#Modified to accept data instead of functional values
#Modified by: Jeffrey Harkness, M.S.
##Begin Simpson's rule function code
simp13 <- function(dat, h = 1){
  ans = 2*h*(dat[1] + 4*dat[2] + dat[3])/6
  return(ans)}
simp13m <- function(dat, h = 1){
  summ <- dat[1]
  n <- length(dat)
  nseq <- seq(2,(n-2),2)
  for(i in nseq){
    summ <- summ + 4*dat[i] + 2*dat[i+1]}
  summ <- summ + 4*dat[n-1] + dat[n]
  result <- (h*summ)/3
  return(result)}
simp38 <- function(dat, h = 1){
  ans <- 3*h*(dat[1] + 3*sum(dat[2:3]) + dat[4])/8
  return(ans)}
simpson = function(dat, h = 1){
  hin = h
  len = length(dat)
  comp <- len %% 2
  ##number of segments
  if(len == 2){
    ans = sum(dat)/2*h} ##n = 2 is the trapezoidal rule
  if(len == 3){
    ans = simp13(dat, h = hin)}
  if(len == 4){
    ans = simp38(dat,h = hin)}
  if(len == 6){
    ans <- simp38(dat[1:4],h = hin) + simp13(dat[4:len],h = hin)}
  if(len > 6 & comp == 0){
    ans = simp38(dat[1:4],h = hin) + simp13m(dat[4:len],h = hin)}
  if(len >= 5 & comp == 1){
    ans = simp13m(dat,h = hin)}
  return(ans)}
##End Simpson's rule function code
This next section of code shows the performance comparison. This code can easily be altered for different test functions and cases.
The precision difference tends to change with the sample size and test function used; this example is not intended to imply that the difference is always this pronounced.
#other algorithm for comparison purposes, from Allan Cameron above
oa <- function(x, h = 1) sum((x[-1] + x[-length(x)]) / 2 * h)
#Testing and algorithm comparison code
simans = NULL; oaans = NULL; simerr = NULL; oaerr = NULL; mp = NULL
for( j in 1:10){
  n = j
  #f = function(x) cos(x) + 2 ##Test functions
  f = function(x) 0.2 + 25*x - 200*x^2 + 675*x^3 - 900*x^4 + 400*x^5
  a = 0;b = 10
  h = (b-a)/n
  datain = seq(a,b,by = h)
  preans = integrate(f,a,b)$value #precise numerical estimate of test function
  simans[j] = simpson(f(datain), h = h)
  oaans[j] = oa(f(datain), h = h)
  (simerr[j] = abs(simans[j] - preans)/preans * 100)
  (oaerr[j] = abs(oaans[j] - preans)/preans * 100)
  mp[j] = simerr[j] < oaerr[j]
}
(outframe = data.frame("simpsons percent diff" = simerr,"trapezoidal percent diff" = oaerr, "more precise?" = mp, check.names = F))
simpsons percent diff trapezoidal percent diff more precise?
1 214.73489738 214.734897 FALSE
2 15.07958148 64.993410 TRUE
3 6.70203621 29.816799 TRUE
4 0.94247384 16.955208 TRUE
5 0.54830021 10.905620 TRUE
6 0.18616767 7.593825 TRUE
7 0.12051767 5.588209 TRUE
8 0.05890462 4.282980 TRUE
9 0.04087107 3.386525 TRUE
10 0.02412733 2.744500 TRUE
I apologise if this is a duplicate; I've read answers to similar questions to no avail.
I'm trying to integrate under a curve, given a specific formula (below) for said integration.
As a toy example, here's some data:
library(deSolve) # provides lsoda()
Antia_Model <- function(t,y,p1){
  r <- p1[1]; k <- p1[2]; p <- p1[3]; o <- p1[4]
  P <- y[1]; I <- y[2]
  dP = r*P - k*P*I
  dI = p*I*(P/(P + o))
  list(c(dP, dI))
}
r <- 0.25; k <- 0.01; p <- 1; o <- 1000 # Note that r can range between 0.1 and 10 in this model
parms <- c(r, k, p, o)
P0 <- 1; I0 <- 1
N0 <- c(P0, I0)
TT <- seq(0.1, 50, 0.1)
results <- lsoda(N0, TT, Antia_Model, parms, verbose = FALSE)
P <- results[,2]; I <- results[,3]
As I understand it, I should be able to use the auc() function from the MESS package (can I just use the integrate() function? Unclear...), which should look something like this:
auc(P, TT, from = x1, to = x2, type = "spline")
Though I don't really understand how to use the "from" and "to" arguments, or how to incorporate "u" from the original integration formula...
Using the integrate() function seems more intuitive, but if I try:
u <- 1
integrand <- function(P) {u*P}
q <- integrate(integrand, lower = 0, upper = Inf)
I get this error:
# Error in integrate(integrand, lower = 0, upper = Inf) :
# the integral is probably divergent
As you can tell, I'm pretty lost, so any help would be greatly appreciated! Thank you so much! :)
Your integrand is technically valid, but as written it is just the identity function f(x) = x (since u = 1). The area under it over [0, Inf) is infinite, i.e. the integral diverges.
From the documentation of integrate(), the first argument is:
an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error.
If instead you use a pulse function:
pulse <- function(x) {ifelse(x < 5 & x >= 0, 1, 0)}
integrate(pulse, lower = 0, upper = Inf)
#> 5 with absolute error < 8.5e-05
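In the question's setting, the useful quantity is the area under the sampled curve P(t) over a finite window, not an integral of the identity function. A minimal sketch, assuming a hypothetical stand-in curve P(t) = 100 e^(-0.1 t) instead of the lsoda output (so there is a closed form to check against), a constant weight u, and an arbitrary window [5, 20]. trap_area() is a hypothetical helper applying the trapezoid rule between from and to; integrate() is applied to a cubic-spline interpolant built with splinefun(), because integrate() needs a function rather than data points. MESS::auc(x, y, from, to, type) does the same job; note that it takes the time values (here TT) as its first argument and the curve values (P) as the second. With the real model output, TT and P would be results[, 1] and results[, 2].

```r
## Hypothetical stand-in for the lsoda output: P(t) = 100 * exp(-0.1 * t) on the question's grid
TT <- seq(0.1, 50, 0.1)
P  <- 100 * exp(-0.1 * TT)
u  <- 1   # constant weight u from the question's integrand u * P

## Trapezoid rule between from and to; the end points are linearly
## interpolated so they need not fall exactly on the grid
trap_area <- function(x, y, from, to) {
  inside <- x > from & x < to
  xs <- c(from, x[inside], to)
  ys <- c(approx(x, y, xout = from)$y, y[inside], approx(x, y, xout = to)$y)
  sum(diff(xs) * (ys[-1] + ys[-length(ys)]) / 2)
}

area_trap <- u * trap_area(TT, P, from = 5, to = 20)

## integrate() needs a function, so interpolate the samples with a cubic spline first
Pfun <- splinefun(TT, P)
area_spline <- integrate(function(t) u * Pfun(t), lower = 5, upper = 20)$value

exact <- u * 1000 * (exp(-0.5) - exp(-2))  # closed form for this stand-in curve
c(trapezoid = area_trap, spline = area_spline, exact = exact)
```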
I am reading Section 4.2 of Simulation (4th ed., Elsevier, 2006) by Sheldon M. Ross, which introduces generating a Poisson random variable by the inverse transform method.
Denote $p_i = P(X = i) = e^{-\lambda}\lambda^i / i!$, $i = 0, 1, \dots$, and $F(i) = P(X \le i) = \sum_{k=0}^{i} p_k$ as the PMF and CDF of the Poisson distribution, respectively; they can be computed with dpois(i, lambda) and ppois(i, lambda) in R.
There are two inverse transform algorithms for Poisson: the regular version and the improved one.
The steps for the regular version are as follows:
1. Simulate an observation U from U(0,1).
2. Set i = 0 and F = F(0) = p_0 = e^{-λ}.
3. If U < F, set X = i and stop.
4. If U >= F, set i = i+1 and F = F + p_i, and return to step 3.
I wrote and tested the above steps as follows:
### write the regular R code
pois_inv_trans_regular = function(n, lambda){
  X = rep(0, n) # generate n samples
  for(m in 1:n){
    U = runif(1)
    i = 0; F = exp(-lambda) # initialize
    while(U >= F){
      i = i+1; F = F + dpois(i,lambda) # F=F+pi
    }
    X[m] = i
  }
  X
}
### test the code (for small λ, e.g. λ=3)
set.seed(0); X = pois_inv_trans_regular(n=10000,lambda=3); c(mean(X),var(X))
# [1] 3.005000 3.044079
Note that the mean and variance of Poisson(λ) are both λ, so the regular code appears to work correctly.
Next I tried the improved version, which is designed for large λ and which the book describes as follows:
The regular algorithm needs on average 1+λ searches, i.e. O(λ) complexity, which is fine when λ is small but can be greatly improved upon when λ is large.
Indeed, since a Poisson random variable with mean λ is most likely to take on one of the two integral values closest to λ , a more efficient algorithm would first check one of these values, rather than starting at 0 and working upward. For instance, let I=Int(λ) and recursively determine F(I).
Now generate a Poisson random variable X with mean λ by generating a random number U, noting whether or not X <= I by seeing whether or not U <= F(I). Then search downward starting from I in the case where X <= I and upward starting from I+1 otherwise.
According to the book, the improved algorithm needs on average only about 1+0.798√λ searches, i.e. it has O(√λ) complexity.
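Where these search counts come from, as a short calculation (the second relies on the normal approximation to the Poisson distribution for large λ, and the count 1 + |X − λ| is itself approximate):

```latex
% Regular search: checks i = 0, 1, ..., X, i.e. X + 1 comparisons
E[N_{\mathrm{reg}}] = E[X + 1] = 1 + \lambda

% Improved search: roughly 1 + |X - \lambda| comparisons; for large \lambda,
% (X - \lambda)/\sqrt{\lambda} is approximately standard normal Z
E[N_{\mathrm{imp}}] \approx 1 + E\lvert X - \lambda \rvert
  \approx 1 + \sqrt{\lambda}\, E\lvert Z \rvert
  = 1 + \sqrt{2/\pi}\,\sqrt{\lambda}
  \approx 1 + 0.798\sqrt{\lambda}
```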
I tried to write the R code for the improved version as follows:
### write the improved R code
pois_inv_trans_improved = function(n, lambda){
  X = rep(0, n) # generate n samples
  p = function(x) {dpois(x,lambda)} # PDF: p(x) = P(X=x) = λ^x exp(-λ)/x!
  F = function(x) {ppois(x,lambda)} # CDF: F(x) = P(X ≤ x)
  I = floor(lambda) # I=Int(λ)
  F1 = F(I); F2 = F(I+1) # two close values
  for(k in 1:n){
    U = runif(1)
    i = I
    if ( F1 < U & U <= F2 ) {
      i = I+1
    }
    while (U <= F1){ # search downward
      i = i-1; F1 = F1 - p(i)
    }
    while (U > F2){ # search upward
      i = i+1; F2 = F2 + p(i)
    }
    X[k] = i
  }
  X
}
### test the code (for large λ, e.g. λ=100)
set.seed(0); X = pois_inv_trans_improved(n=10000,lambda=100); c(mean(X),var(X))
# [1] 100.99900000 0.02180118
The simulation result for c(mean(X), var(X)) is [1] 100.99900000 0.02180118, and the variance is clearly wrong. How should I fix this?
The main problem was that F1 and F2 were modified inside the loop and never reset for the next sample, so after a few samples a very wide range of U values was treated as falling "in the middle".
The second problem was that in the downward search, the p(i) subtracted must use the value of i before it is decremented: because F(x) = P(X <= x), F(i-1) = F(i) - p(i). Without this, the code hangs for low U.
The easiest fix for this is to start at i = I + 1; then the "in the middle" if statement is no longer needed.
pois_inv_trans_improved = function(n, lambda){
  X = rep(0, n) # generate n samples
  p = function(x) {dpois(x,lambda)} # PDF: p(x) = P(X=x) = λ^x exp(-λ)/x!
  `F` = function(x) {ppois(x,lambda)} # CDF: F(x) = P(X ≤ x)
  I = floor(lambda) # I=Int(λ)
  F1 = F(I); F2 = F(I+1) # two close values
  for(k in 1:n){
    U = runif(1)
    i = I + 1
    # if ( F1 < U & U <= F2 ) {
    # i = I + 1
    # }
    F1tmp = F1
    while (U <= F1tmp){ # search downward
      i = i-1; F1tmp = F1tmp - p(i);
    }
    F2tmp = F2
    while (U > F2tmp){ # search upward
      i = i+1; F2tmp = F2tmp + p(i)
    }
    X[k] = i
  }
  X
}
This gives:
[1] 100.0056 102.2380
I am taking baby steps toward using metaheuristics for constrained optimization problems. I am trying to solve the basic Markowitz mean-variance optimization model (given below) using the NMOF package in R.
$$\min_{w}\; \lambda \sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j \Sigma_{ij} \;-\; (1-\lambda) \sum_{i=1}^{N} w_i \mu_i$$
subject to
$$\sum_{i=1}^{N} w_i = 1, \qquad 0 \le w_i \le 1, \quad i = 1,\dots,N,$$
where λ takes values between 0 and 1 and N is the number of assets.
Here is my code (based on the book Numerical Methods and Optimization in Finance):
na <- dim(fundData)[2L]
ns <- dim(fundData)[1L]
Sigma <- cov(fundData)
winf <- 0.0
wsup <- 1.0
m <- colMeans(fundData)
resample <- function(x, ...) x[sample.int(length(x), ...)]
data <- list(R = t(fundData),
m = m,
na = dim(fundData)[2L],
ns = dim(fundData)[1L],
Sigma = Sigma,
eps = 0.5/100,
winf = winf,
wsup = wsup,
nFP = 100)
w0 <- runif(data$na); w0 <- w0/sum(w0)
OF <- function(w,data){
  wmu <- crossprod(w,m)
  res <- crossprod(w, data$Sigma)
  res <- tcrossprod(w,res)
  result <- res - wmu
}
neighbour <- function(w, data){
  toSell <- w > data$winf
  toBuy <- w < data$wsup
  i <- resample(which(toSell), size = 1L)
  j <- resample(which(toBuy), size = 1L)
  eps <- runif(1) * data$eps
  eps <- min(w[i] - data$winf, data$wsup - w[j], eps)
  w[i] <- w[i] - eps
  w[j] <- w[j] + eps
  w
}
algo <- list(x0 = w0, neighbour = neighbour, nS = 5000L)
system.time(sol1 <- LSopt(OF, algo, data))
I am not sure how to include lambda in the objective function (OF); the code above does not include it. I tried using a for loop, but it resulted in the following error:
OF <- function(w,data){
  lambdaSeq <- seq(.001,0.999, length = data$nFP)
  for(lambda in lambdaSeq){
    wmu <- crossprod(w,m)
    res <- crossprod(w, data$Sigma)
    res <- tcrossprod(w,res)
    result <- lambda*res - (1-lambda)*wmu
  }
}
Local Search.
Initial solution:
| | 0%
Error in if (xnF <= xcF) { : argument is of length zero
Timing stopped at: 0.01 0 0.03
It would be nice if someone could help me in this regard.
P.S.: I am aware that this can also be solved with quadratic programming. This is just a first step before adding other constraints.
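If I understand correctly, you want to replicate the mean-variance efficient frontier by Local Search? Then you need to run a Local Search for every value of lambda that you want to include in the frontier. (Your loop version fails for a separate reason: an R function returns the value of its last expression, and a for loop evaluates to NULL, so LSopt receives no objective value to compare.)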
The following example should help you get going. I start by attaching the package and setting up the list data.
library("NMOF")
data <- list(m = colMeans(fundData), ## expected returns
Sigma = cov(fundData), ## expected var of returns
na = dim(fundData)[2L], ## number of assets
eps = 0.2/100, ## stepsize for LS
winf = 0, ## minimum weight
wsup = 1, ## maximum weight
lambda = 1)
Next I compute a benchmark for the minimum-variance case (i.e. lambda equals one).
## benchmark: the QP solution
## ==> this will only work with a recent version of NMOF,
## which you can get by saying:
## install.packages('NMOF', type = 'source',
## repos = c('',
## getOption('repos')))
sol <- NMOF:::minvar(data$Sigma, 0, 1)
Next, the objective function and the neighbourhood function. I have slightly simplified both (for clarity; using crossprod in the objective function would probably be more efficient).
OF <- function(w, data){
  data$lambda * (w %*% data$Sigma %*% w) -
    (1 - data$lambda) * sum(w * data$m)
}
neighbour <- function(w, data){
  toSell <- which(w > data$winf)
  toBuy <- which(w < data$wsup)
  i <- toSell[sample.int(length(toSell), size = 1L)]
  j <- toBuy[sample.int(length(toBuy), size = 1L)]
  eps <- runif(1) * data$eps
  eps <- min(w[i] - data$winf, data$wsup - w[j], eps)
  w[i] <- w[i] - eps
  w[j] <- w[j] + eps
  w
}
Now we can run Local Search. Since it is a fairly large dataset (200 assets),
you will need a relatively large number of steps to reproduce the QP solution.
w0 <- runif(data$na) ## a random initial solution
w0 <- w0/sum(w0)
algo <- list(x0 = w0, neighbour = neighbour, nS = 50000L)
sol1 <- LSopt(OF, algo, data)
You can compare the weights you get from Local Search with the QP solution.
par(mfrow = c(3,1), mar = c(2,4,1,1), las = 1)
barplot(sol, main = "QP solution")
barplot(sol1$xbest, main = "LS solution")
barplot(sol - sol1$xbest,
ylim = c(-0.001,0.001)) ## +/-0.1%
Finally, if you want to compute the whole frontier, you need to rerun this code for different levels of data$lambda.
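A minimal sketch of that rerun, assuming the NMOF package is installed. To keep it small and self-contained it uses hypothetical random returns for 10 assets in place of fundData, an equal-weight starting portfolio, an arbitrary grid of five lambda values, and fewer steps than above; OF and neighbour follow this answer. Each row of frontier holds the risk (standard deviation) and expected return of the Local Search portfolio for one lambda.

```r
library("NMOF")  # provides LSopt()

## Hypothetical synthetic returns: 250 periods x 10 assets (stand-in for fundData)
set.seed(123)
R <- matrix(rnorm(250 * 10, mean = 0.0005, sd = 0.01), nrow = 250)

data <- list(m = colMeans(R), Sigma = cov(R), na = ncol(R),
             eps = 0.2/100, winf = 0, wsup = 1, lambda = 1)

OF <- function(w, data)
  data$lambda * c(w %*% data$Sigma %*% w) - (1 - data$lambda) * sum(w * data$m)

## Move a random amount eps from asset i to asset j (keeps sum(w) == 1 and the bounds)
neighbour <- function(w, data) {
  toSell <- which(w > data$winf)
  toBuy  <- which(w < data$wsup)
  i <- toSell[sample.int(length(toSell), size = 1L)]
  j <- toBuy[sample.int(length(toBuy), size = 1L)]
  eps <- min(w[i] - data$winf, data$wsup - w[j], runif(1) * data$eps)
  w[i] <- w[i] - eps
  w[j] <- w[j] + eps
  w
}

lambdas <- c(0.2, 0.4, 0.6, 0.8, 1)   # assumed grid of lambda values
w0 <- rep(1/data$na, data$na)         # equal-weight start (an assumption)
frontier <- t(sapply(lambdas, function(lam) {
  data$lambda <- lam                  # modifies a local copy only
  algo <- list(x0 = w0, neighbour = neighbour, nS = 5000L,
               printDetail = FALSE, printBar = FALSE)
  w <- LSopt(OF, algo, data)$xbest
  c(lambda = lam,
    risk = sqrt(c(w %*% data$Sigma %*% w)),
    expected_return = sum(w * data$m))
}))
frontier
```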
Any help with this would be greatly appreciated. I am optimising parameters of a lognormal distribution so that the proportion of estimates matches a set of target values (distances). The proportions are calculated using the following functions:
adj_sumifs <- function(sum_array, condition_array, f, m=1){
  n <- length(condition_array)
  sm = 0
  if (n == length(sum_array)){
    fun <- function(x,i){if (f (condition_array[i])){sum_array[i] + x}else{x} }
    sm <- Reduce(fun,1:n,0)
  }
  ifelse(m <= 0, sm , sm/m)
}
estimate.inrange <- function(vals,dist,lower,upper,total){
  n <- length(lower)
  if (n == length(upper)){
    sapply(1:n, function(i){ ifelse(i < n ,
      adj_sumifs(vals,dist, (function(x) x >= lower[i] && x < upper[i]),total) ,
      adj_sumifs(vals,dist, (function(x) x >= lower[i]) , total)
    ) })
  } else {
    # for a failure in the process
  }
}
And the function I would like to optimise is:
calculate_Det_ptns <- function(alpha, beta, pxa, low,up, distances, eF){
  temp <- numeric()
  if ( length(pxa) == length(distances) && length(low) == length(up) ){
    ln_values <- as.numeric(Map(function(pa,d) eF * pa * dlnorm(d, meanlog = alpha, sdlog = beta),pxa,distances))
    temp <- estimate.inrange (ln_values,distances,low,up, total = sum(ln_values))
  }
  temp
}
Optimisation is done with the Levenberg-Marquardt algorithm (nlsLM() from the minpack.lm package):
lnVals <- nlsLM(target ~ calculate_Det_ptns(alpha = a,beta = b, pxa = odab,low = low, up = up, distances = dist, eF = expF),
start = list(a = mu, b = sd ),
trace = T)
where up, low, and target are read from the same data file, e.g.:
low, up, target
while odab and distances are vectors of arbitrary length (usually much longer than target, etc.). The process works well when the target file has about 150 rows and distances and odab have about 500000 values. However, for reasons I cannot fathom, it fails when the target file has about 16 rows. The error message is:
Error in model.frame.default(formula = ~target + odab + low + up + dist) :
variable lengths differ (found for 'odab')
which suggests that the function is not being evaluated in the formula. Can anyone suggest a solution or explanation? It is important that the proportions are re-estimated for every new mu and sd.
You could try surrounding the function with I(), which will evaluate it as is before evaluating the formula; however, I could not replicate your problem with the code provided because I am missing some of the referenced objects (a, b, odab, dist, expF, mu, sd) so I could not confirm whether or not this works.
lnVals <- nlsLM(target ~ I(calculate_Det_ptns(alpha = a,beta = b, pxa = odab,low = low, up = up, distances = dist, eF = expF)), start = list(a = mu, b = sd ), trace = T)
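A likely cause, which I could not check against the original data: nls() decides which names in the formula are data variables by their length, and puts into the model frame every non-parameter variable whose length is a multiple of the response length. As far as I know, nlsLM() from minpack.lm is modelled on nls() and uses the same heuristic. The error message fits this explanation. The model-frame formula ~target + odab + low + up + dist omits expF (length 1) but includes odab and dist, which is what would happen if their length (about 500000) were a multiple of the 16 target rows but not of the 150 rows in the working case. The sketch below reproduces the effect with base R's nls() on made-up data (x, y, aux and model_fun are hypothetical). It also shows one workaround: wrap the model in a function of the parameters only, so the long vectors never appear in the formula.

```r
set.seed(1)
x   <- 1:16                                   # 16 observations, like the small target file
y   <- 2 * exp(-0.3 * x) + rnorm(16, sd = 0.01)
aux <- rep(1, 32)                             # long auxiliary vector; 32 is a multiple of 16

## aux is named in the formula and its length is a multiple of length(y),
## so nls() puts it in the model frame next to y and x, which fails
bad <- tryCatch(nls(y ~ a * exp(-b * x) * aux[1], start = list(a = 1.5, b = 0.2)),
                error = conditionMessage)
bad

## Workaround: the model function finds x and aux in its enclosing environment,
## so the formula only mentions y and the parameters a and b
model_fun <- function(a, b) a * exp(-b * x) * aux[1]
good <- nls(y ~ model_fun(a, b), start = list(a = 1.5, b = 0.2))
coef(good)
```

Applied to the question, the analogous (untested) call would be nlsLM(target ~ model_fun(a, b), start = list(a = mu, b = sd)), with model_fun(a, b) calling calculate_Det_ptns() on odab, low, up, dist and expF.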