Create a function with multiple parameters in R - r

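I want to compute the following functions:
f(n, x) = n * g(x) * (1 - G(x))^(n-1)
h(n) = integral from 0 to Inf of x * f(n, x) dx
Here, g(x) is the density function of a distribution and G(x) is its cumulative distribution function. I want to compute these functions for several distributions. In addition, I use the library fitdistrplus.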
To create g, I use the function do.call this way :
g<-function(x) {do.call(paste("d",i,sep=""),c(list(x=x),fti$estimate))}
fti$estimate contains the parameters of the distribution i.
G(x) is the cumulative distribution computed this way :
G<-function(x) {do.call(paste("p",i,sep=""),c(list(q=x),fti$estimate))}
I compute f(x) this way :
f<function(n,x) {n*g(x)*(1-G(x))^(n-1)
At last, I compute h(x) this way :
h<- function(n) {integrate(function(x) {x*f(n,x)},0,Inf)}
However, I can't plot these functions, I get the following errors :
1: In n*g(x):
Longer object length is not a multiple of shorter object length
2: In (1-G(x))^(n-1):
Longer object length is not a multiple of shorter object length
3: In x*f(n,x) :
Longer object length is not a multiple of shorter object length
Moreover, if I just want to plot f(n,x), I get this error:
Error in list(x=x) :'x' is missing
The minimal snippet I have is the following:
#i can be "exp" "lnorm" "norm" etc...
for( i in functionsName) {
png(paste(fileBase,"_",i,"_","graphics.png",sep=""))
plot.new()
fti<-fitdist(data, i)
plotdist(data,i, para=as.list(fti[[1]]))
#fti is a data table or data frame
#fti$estimate looks like this :
# meanlog sdlog
#8.475449 1.204958
#g
pdf<-function(x) {do.call(paste("d",i,sep=""), c(list(x=x),fti$estimate))}
#G
cdf<-function(x) do.call(paste("p",i,sep=""), c(list(q=x),fti$estimate))
#f
minLaw<- function(n,x) {n*pdf(x)*(1-cdf(x))^(n-1)}
#h
minExpectedValue<-function(n) {integrate(function(x) {x*minLaw(n,x)},0,Inf)}
#these 2 following lines give an error
plot(minExpectedValue)
plot(minLaw)
dev.off()
}

I had to do some reverse engineering to figure out your d1, q1 etc calls, but I think this is how you do it. Perhaps the original problem lies in a function call like f(n=2:3, x=1:9); in such a call n should be a single value, not a vector of values.
Even if length of x was a multiple of n length, the output would most likely not be what you really wanted.
If you try to give n a vector form, you might end up in a recycled (false) output:
> print(data.frame(n=2:3, x=1:6))
  n x
1 2 1
2 3 2
3 2 3
4 3 4
5 2 5
6 3 6
where x would be evaluated with n=2 at point x=1, n=3 at point x=2, etc. What you really wanted is something along the lines of:
> print(expand.grid(x=1:5, n=2:3))
   x n
1 1 2
2 2 2
3 3 2
4 4 2
5 5 2
6 1 3
7 2 3
8 3 3
9 4 3
10 5 3
You could do this by calling separately for each n value:
lapply(2:3, FUN=function(n) (f(n, x=1:5)))
#[[1]]
#[1] 0.0004981910 0.0006066275 0.0007328627 0.0008786344 0.0010456478
#
#[[2]]
#[1] 0.0007464956 0.0009087272 0.0010974595 0.0013152213 0.0015644676
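The difference between recycling and a full grid can be checked on a small self-contained case. The sketch below uses a rate-1 exponential distribution as a stand-in for the fitted one (an assumption, so that no `fti` object is needed), and `outer()` as one possible way to evaluate f for every (n, x) pair.

```r
# Stand-in for the OP's f(n, x): rate-1 exponential instead of a fitted
# distribution (assumption made only to keep the example self-contained)
f <- function(n, x) n * dexp(x) * (1 - pexp(x))^(n - 1)

ns <- 2:3
xs <- 1:6

# Recycling: n is cycled along x, giving the pairs (2,1), (3,2), (2,3), ...
recycled <- f(ns, xs)

# Full grid: one row per n, one column per x
grid <- outer(ns, xs, f)

# Same grid built one n at a time, as in the lapply() call above
by_n <- t(sapply(ns, function(n) f(n, xs)))

print(dim(grid))
print(isTRUE(all.equal(grid, by_n)))
```

`outer()` only works here because f is vectorised element-wise in both arguments; for a function that is not, the `lapply()` form is the safer choice.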
Did you use the same fti for all the distribution fits, even though it should've been different? Or does the i in fti refer to index i and it was a list of fits in form of ft[[i]]?
Below is a wrapper function, which is called separately for each n-value (and distribution i):
wrapper <- function(i, x, n, fti){
# As was provided by OP
g<-function(x) {do.call(paste("d",i,sep=""),c(list(x=x),fti$estimate))}
G<-function(x) {do.call(paste("p",i,sep=""),c(list(q=x),fti$estimate))}
# does the i in fti refer to fit of i:th distribution, i.e. should it be a list where i:th location in ft is i:th distribution estimates?
f<-function(n,x) {n*g(x)*(1-G(x))^(n-1)}
# was missing a '-' and a '}'
h<- function(n) {integrate(function(x) {x*f(n,x)},0,Inf)}
list(gres = g(x), Gres = G(x), fres = f(n,x), hres = h(n))
}
# Example data
require("fitdistrplus")
data(groundbeef)
serving <- groundbeef$serving
# Gumbel distribution
d1 <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
p1 <- function(q, a, b) exp(-exp((a-q)/b))
q1 <- function(p, a, b) a-b*log(-log(p))
fti1 <- fitdist(serving, "1", start=list(a=10, b=10))
#> fti1$estimate
# a b
#56.95893 29.07871
# Normal distribution
# dnorm, pnorm and qnorm are available in the default environment
d2 <- dnorm
p2 <- pnorm
q2 <- qnorm
fti2 <- fitdist(serving, "2", start=list(mean=0, sd=1))
#> fti2$estimate
# mean sd
#73.67743 35.92581
# Sequence of x-values
xs <- seq(-100, 100, by=1)
print((resultdist1n2 <- wrapper(i=1, x=xs, n=2, fti=fti1))$hres)
print((resultdist1n3 <- wrapper(i=1, x=xs, n=3, fti=fti1))$hres)
print((resultdist2n2 <- wrapper(i=2, x=xs, n=2, fti=fti2))$hres)
print((resultdist2n3 <- wrapper(i=2, x=xs, n=3, fti=fti2))$hres)
plot(xs, resultdist1n2$fres, col=1, type="l", ylim=c(0,0.025), xlab="x", ylab="f(n, x)")
points(xs, resultdist1n3$fres, col=2, type="l")
points(xs, resultdist2n2$fres, col=3, type="l")
points(xs, resultdist2n3$fres, col=4, type="l")
legend("topleft", legend=c("Gumbel (i=1) n=2", "Gumbel (i=1) n=3", "Normal (i=2) n=2", "Normal (i=2) n=3"), col=1:4, lty=1)
And the results of your desired h as found in resultdist1n2$hres etc:
h(n=2) for distribution i=1:
53.59385 with absolute error < 0.00022
h(n=3) for distribution i=1:
45.23146 with absolute error < 4.5e-05
h(n=2) for distribution i=2:
53.93748 with absolute error < 1.1e-05
h(n=3) for distribution i=2:
44.06331 with absolute error < 2e-05
EDIT: Here's how one uses the lapply function to call for each of the vector of n values 0<=n<=256:
ns <- 0:256
res1 <- lapply(ns, FUN=function(nseq) wrapper(i=1, x=xs, n=nseq, fti=fti1))
par(mfrow=c(1,2))
plot.new()
plot.window(xlim=c(-100,100), ylim=c(0, 0.05))
box(); axis(1); axis(2); title(xlab="x", ylab="f(n,x)", main="f(n,x) for Gumbel (i=1), n=0:256")
for(i in 1:length(ns)) points(xs, res1[[i]]$fres, col=rainbow(257)[i], type="l")
# perform similarly for the other distributions by calling with i=2, fti=fti2
# h as a function of n for dist i=1
plot(ns, unlist(lapply(res1, FUN=function(x) x$hres$value)), col=rainbow(257), xlab="n", ylab="h(n)", main="h(n) for Gumbel (i=1), n=0:256")
I would plot each distribution i separately like this.

The problem is that the plot method for a function expects that the function will be vectorised. In other words, if given an argument of length N, it should return a vector of results also of length N.
Your minExpectedValue doesn't satisfy this; it expects that n will be a scalar, and returns a single result. You can quickly fix this up with Vectorize. Since integrate() returns a list rather than a number, you also need to extract its value component, and you need to specify the name of the argument to plot over, in this case n.
minExpectedValueVec <- Vectorize(function(n) minExpectedValue(n)$value)
plot(minExpectedValueVec, xname="n")
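To see the effect of Vectorize in isolation, here is a minimal sketch that replaces the fitted distribution with a rate-1 exponential (an assumption made only so the snippet runs on its own; the names `pdf_fun`, `cdf_fun`, `min_law` and `min_expected` are illustrative). For that distribution the expected minimum of n draws is 1/n, which gives something to compare against.

```r
# Rate-1 exponential stands in for the fitted distribution (assumption)
pdf_fun <- function(x) dexp(x)
cdf_fun <- function(x) pexp(x)

# Density of the minimum of n independent draws
min_law <- function(n, x) n * pdf_fun(x) * (1 - cdf_fun(x))^(n - 1)

# Scalar-only in n; integrate() returns a list, so keep just $value
min_expected <- function(n) {
  integrate(function(x) x * min_law(n, x), 0, Inf)$value
}

# Vectorize() wraps the scalar function so a vector of n values is accepted
min_expected_v <- Vectorize(min_expected)

ns <- 1:5
vals <- min_expected_v(ns)
print(cbind(n = ns, numeric = vals, exact = 1 / ns))

# The vectorised version can then be handed to plot():
# plot(min_expected_v, from = 1, to = 5, xname = "n")
```

Note that Vectorize does not make the computation faster; it loops over n internally via mapply.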

Related

How to get the coefficents of a Cross Validated Lasso for a specific lambda (Not the “1se” or “min” lambda)

I run a CV Lasso with the cv.gamlr function in R. I can get the coefficients for the lambdas that correspond to the “1se” or “min” criterion.
set.seed(123)
lasso<-cv.gamlr(x = X, y = Y, family ='binomial')
coef(lasso,select = "1se")
coef(lasso,select = "min")
But what if I want to obtain the coefficients for a specific lambda, stored in the lasso$gamlr$lambda vector? Is it possible to obtain them?
For example, to get the coefficients for the 100th lambda in the model, something like this:
lambda_100<- lasso$gamlr$lambda[100]
coef(lasso,select = lambda_100)
Of course, this throws the following error:
Error in match.arg(select) : 'arg' must be NULL or a character vector
Thanks :)
The coefficients are stored under lasso$gamlr$beta, one column per lambda. Using simulated data in place of yours, you can access them like this:
library(gamlr)
x = matrix(runif(500),ncol=5)
y = rnorm(100)
cvfit <- cv.gamlr(x, y, gamma=1)
dim(cvfit$gamlr$beta)
[1] 5 100
length(cvfit$gamlr$lambda)
[1] 100
cvfit$gamlr$lambda[100]
seg100
0.00125315
cvfit$gamlr$beta[, 100, drop=FALSE]
5 x 1 sparse Matrix of class "dgCMatrix"
seg100
1 0.12960060
2 -0.16406246
3 -0.46566731
4 0.08197053
5 -0.54170494
Or if you prefer it in a vector:
cvfit$gamlr$beta[,100]
1 2 3 4 5
0.12960060 -0.16406246 -0.46566731 0.08197053 -0.54170494
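If the lambda of interest is known by value rather than by position, one option is to look up the nearest entry in the lambda vector and take the matching column. The helper below is a hypothetical sketch (`coef_at_lambda` is not part of gamlr); it is demonstrated on a small hand-made matrix so that it runs without the package.

```r
# Hypothetical helper: pick the coefficient column whose lambda is
# closest to `target`. `beta` has one column per entry of `lambda`.
coef_at_lambda <- function(beta, lambda, target) {
  stopifnot(ncol(beta) == length(lambda))
  idx <- unname(which.min(abs(lambda - target)))
  list(index = idx, lambda = lambda[idx], coef = beta[, idx])
}

# Toy stand-ins for cvfit$gamlr$lambda and cvfit$gamlr$beta (made-up numbers)
lambda <- c(seg1 = 0.5, seg2 = 0.1, seg3 = 0.01)
beta <- matrix(c(0, 0, 0.2, -0.1, 0.4, -0.3), nrow = 2,
               dimnames = list(c("1", "2"), names(lambda)))

out <- coef_at_lambda(beta, lambda, target = 0.09)
print(out)
```

With a real fit, the call would presumably look like `coef_at_lambda(cvfit$gamlr$beta, cvfit$gamlr$lambda, target)`. As the dimensions printed above suggest, beta has one row per predictor, so the intercept is not included.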

3D with value interpolation in R (X, Y, Z, V)

Is there an R package that does X, Y, Z, V interpolation? I see that Akima does X, Y, V but I need one more dimension.
Basically I have X,Y,Z coordinates plus the value (V) that I want to interpolate. This is all GIS data but my GIS does not do voxel interpolation
So if I have a point cloud of XYZ coordinates with a value of V, how can I interpolate what V would be at XYZ coordinate (15,15,-12) ? Some test data would look like this:
X <-rbind(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)
Y <- rbind(10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50)
Z <- rbind(-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
I had the same question and was hoping for an answer in R.
My question was: How do I perform 3D (trilinear) interpolation using regular gridded coordinate/value data (x,y,z,v)? For example, CT images, where each image has pixel centers (x, y) and greyscale value (v) and there are multiple image "slices" (z) along the thing being imaged (e.g., head, torso, leg, ...).
There is a slight problem with the given example data.
# original example data (reformatted)
X <- rep( rep( seq(10, 50, by=10), each=25), 3)
Y <- rep( rep( seq(10, 50, by=10), each=5), 15)
Z <- rep(c(-5, -17, -29), each=125)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
# the dimensions of the 3D grid described do not match the number of values
(length(unique(X))*length(unique(Y))*length(unique(Z))) == length(V)
## [1] FALSE
## which makes sense since 75 != 375
# visualize this:
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V])
# examine the example data real quick...
df <- data.frame(x=X,y=Y,z=Z,v=V);
head(df);
table(df$x, df$y, df$z);
# there are 5 V values at each X,Y,Z coordinate... duplicates!
# redefine Z so there are 15 unique values
# making 375 unique coordinate points
# and matching the length of the given value vector, V
df$z <- seq(-5, -29, length.out=15)
head(df)
table(df$x, df$y, df$z);
# there is now 1 V value at each X,Y,Z coordinate
# that was for testing, now actually redefine the Z vector.
Z <- rep(seq(-5,-29, length.out = 15), 25)
# plot it.
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V])
I couldn't find any 4D interpolation functions in the usual R packages, so I wrote a quick and dirty one. The following implements (without ANY error checking... caveat emptor!) the technique described at: https://en.wikipedia.org/wiki/Trilinear_interpolation
# convenience function #1:
# define a function that takes a vector of lookup values and a value to lookup
# and returns the two lookup values that the value falls between
between = function(vec, value) {
# extract list of unique lookup values
u = unique(vec)
# difference vector
dvec = u - value
vals = c(u[dvec==max(dvec[dvec<0])], u[dvec==min(dvec[dvec>0])])
return(vals)
}
# convenience function #2:
# return the value (v) from a grid data.frame for given point (x, y, z)
get_value = function(df, xi, yi, zi) {
# assumes df is data.frame with column names: x, y, z, v
subset(df, x==xi & y==yi & z==zi)$v
}
# inputs df (x,y,z,v), points to look up (x, y, z)
interp3 = function(dfin, xin, yin, zin) {
# TODO: check if all(xin, yin, zin) equals a grid point, if so just return the point value
# TODO: check if any(xin, yin, zin) equals a grid point, if so then do bilinear or linear interp
cube_x <- between(dfin$x, xin)
cube_y <- between(dfin$y, yin)
cube_z <- between(dfin$z, zin)
# find the two values in each dimension that the lookup value falls within
# and extract the cube of 8 points
tmp <- subset(dfin, x %in% cube_x &
y %in% cube_y &
z %in% cube_z)
stopifnot(nrow(tmp)==8)
# define points in a periodic and cubic lattice
x0 = min(cube_x); x1 = max(cube_x);
y0 = min(cube_y); y1 = max(cube_y);
z0 = min(cube_z); z1 = max(cube_z);
# define differences in each dimension
xd = (xin-x0)/(x1-x0); # 0.5
yd = (yin-y0)/(y1-y0); # 0.5
zd = (zin-z0)/(z1-z0); # 0.9166666
# interpolate along x:
v00 = get_value(tmp, x0, y0, z0)*(1-xd) + get_value(tmp,x1,y0,z0)*xd # 2.5
v01 = get_value(tmp, x0, y0, z1)*(1-xd) + get_value(tmp,x1,y0,z1)*xd # 0
v10 = get_value(tmp, x0, y1, z0)*(1-xd) + get_value(tmp,x1,y1,z0)*xd # 0
v11 = get_value(tmp, x0, y1, z1)*(1-xd) + get_value(tmp,x1,y1,z1)*xd # 65
# interpolate along y:
v0 = v00*(1-yd) + v10*yd # 1.25
v1 = v01*(1-yd) + v11*yd # 32.5
# interpolate along z:
return(v0*(1-zd) + v1*zd) # 29.89583 (~91.7% between v0 and v1)
}
> interp3(df, 15, 15, -12)
[1] 29.89583
Testing that same source's assertion that trilinear is simply linear(bilinear(), bilinear()), we can use the base R linear interpolation function, approx(), and the akima package's bilinear interpolation function, interp(), as follows:
library(akima)
approx(x=c(-11.857143,-13.571429),
y=c(interp(x=df[round(df$z,1)==-11.9,"x"], y=df[round(df$z,1)==-11.9,"y"], z=df[round(df$z,1)==-11.9,"v"], xo=15, yo=15)$z,
interp(x=df[round(df$z,1)==-13.6,"x"], y=df[round(df$z,1)==-13.6,"y"], z=df[round(df$z,1)==-13.6,"v"], xo=15, yo=15)$z),
xout=-12)$y
# [1] 0.2083331
Checked another package to triangulate:
library(oce)
Vmat <- array(data = V, dim = c(length(unique(X)), length(unique(Y)), length(unique(Z))))
approx3d(x=unique(X), y=unique(Y), z=unique(Z), f=Vmat, xout=15, yout=15, zout=-12)
[1] 1.666667
So 'oce', 'akima' and my function all give pretty different answers. This is either due to a mistake in my code somewhere, or due to differences in the underlying Fortran code in akima's interp() and in whatever oce's approx3d() does internally; we'll leave that for another day.
Not sure what the correct answer is because the MWE is not exactly "minimum" or simple. But I tested the functions with some really simple grids and it seems to give 'correct' answers. Here's one simple 2x2x2 example:
# really, really simple example:
# answer is always the z-coordinate value
sdf <- expand.grid(x=seq(0,1),y=seq(0,1),z=seq(0,1))
sdf$v <- rep(seq(0,1), each=4)
> interp3(sdf,0.25,0.25,.99)
[1] 0.99
> interp3(sdf,0.25,0.25,.4)
[1] 0.4
Trying akima on the simple example, we get the same answer (phew!):
library(akima)
approx(x=unique(sdf$z),
y=c(interp(x=sdf[sdf$z==0,"x"], y=sdf[sdf$z==0,"y"], z=sdf[sdf$z==0,"v"], xo=.25, yo=.25)$z,
interp(x=sdf[sdf$z==1,"x"], y=sdf[sdf$z==1,"y"], z=sdf[sdf$z==1,"v"], xo=.25, yo=.25)$z),
xout=.4)$y
# [1] 0.4
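Another way to cross-check is to store the values in a 3D array and collapse one axis at a time, which is the "linear of bilinear" idea written out directly. The function below is a sketch under assumptions: each axis is sorted in ascending order, `arr[i, j, k]` holds the value at `(xs[i], ys[j], zs[k])`, and the name `trilinear` is made up for this example. It is tested on an affine function, which trilinear interpolation reproduces exactly.

```r
# Sketch: trilinear interpolation on a regular grid stored as a 3D array.
# Assumes xs, ys, zs are ascending and arr[i, j, k] = value at (xs[i], ys[j], zs[k]).
trilinear <- function(xs, ys, zs, arr, x, y, z) {
  # index of the lower corner of the enclosing cell along each axis
  i <- findInterval(x, xs, rightmost.closed = TRUE)
  j <- findInterval(y, ys, rightmost.closed = TRUE)
  k <- findInterval(z, zs, rightmost.closed = TRUE)
  # refuse to extrapolate outside the grid
  stopifnot(i >= 1, i < length(xs), j >= 1, j < length(ys),
            k >= 1, k < length(zs))
  # fractional position inside the cell
  xd <- (x - xs[i]) / (xs[i + 1] - xs[i])
  yd <- (y - ys[j]) / (ys[j + 1] - ys[j])
  zd <- (z - zs[k]) / (zs[k + 1] - zs[k])
  cube <- arr[i:(i + 1), j:(j + 1), k:(k + 1)]
  face <- cube[1, , ] * (1 - xd) + cube[2, , ] * xd  # collapse x: 2 x 2 (y, z)
  edge <- face[1, ] * (1 - yd) + face[2, ] * yd      # collapse y: length 2 (z)
  edge[1] * (1 - zd) + edge[2] * zd                  # collapse z: scalar
}

# Test grid: v = 2x + 3y - z + 1 on the points 0, 1, 2 along each axis
axis_vals <- 0:2
g <- expand.grid(x = axis_vals, y = axis_vals, z = axis_vals)  # x varies fastest
arr <- array(2 * g$x + 3 * g$y - g$z + 1, dim = c(3, 3, 3))

res <- trilinear(axis_vals, axis_vals, axis_vals, arr, 0.3, 1.7, 0.5)
print(res)  # exact value of the affine function here is 6.2
```

Because `expand.grid()` varies its first column fastest, its row order matches R's column-major array layout. Data sorted differently would need reordering before being reshaped with `array()`.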
The new example data in the OP's own, accepted answer was not possible to interpolate with my simple interp3() function above because:
(a) the grid coordinates are not regularly spaced, and
(b) the coordinates to lookup (x1, y1, z1) lie outside of the grid.
# for completeness, here's the attempt:
options(scipen = 999)
XCoor=c(78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179)
YCoor=c(5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091)
ZCoor=c(-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5)
V=c(2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0)
adf = data.frame(x=XCoor, y=YCoor, z=ZCoor, v=V)
# the first y value looks like a typo?
> head(adf)
x y z v
1 78121.62 5213642.2 -3.0 2.4
2 78121.62 523642.2 -5.0 30.0
3 78121.62 523642.2 -10.0 620.0
4 78121.62 523642.2 -13.0 590.0
5 78136.72 523594.5 -3.5 61.0
6 78136.72 523594.5 -6.5 480.0
x1=198130.000
y1=1913590.000
z1=-8
> interp3(adf, x1,y1,z1)
numeric(0)
Warning message:
In min(dvec[dvec > 0]) : no non-missing arguments to min; returning Inf
Whether or not the test data made sense, I still needed an algorithm. Test data is just that: something to fiddle with, and for that purpose it was fine.
I wound up programming it in python and the following code takes XYZ V and does a 3D Inverse Distance Weighted (IDW) interpolation where you can set the number of points used in the interpolation. This python recipe only interpolates to one point (x1, y1, z1) but it is easy enough to extend.
import numpy as np
import math
#34 points
XCoor=np.array([78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179])
YCoor=np.array([5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091])
ZCoor=np.array([-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5])
V=np.array([2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0])
def Distance(x1, y1, z1, Npoints):
    # Euclidean distance from (x1, y1, z1) to every data point
    d = []
    for i in range(len(XCoor)):
        d.append(math.sqrt((x1-XCoor[i])*(x1-XCoor[i]) + (y1-YCoor[i])*(y1-YCoor[i]) + (z1-ZCoor[i])*(z1-ZCoor[i])))
    distance = np.array(d)
    # indices of the Npoints nearest data points
    myIndex = distance.argsort()[:Npoints]
    # inverse-squared-distance weighted average of V over those points
    weightedNum = 0
    weightedDen = 0
    for i in myIndex:
        weightedNum = weightedNum + (V[i]/(distance[i]*distance[i]))
        weightedDen = weightedDen + (1/(distance[i]*distance[i]))
    InterpValue = weightedNum/weightedDen
    return InterpValue
x1=198130.000
y1=1913590.000
z1=-8
print(Distance(x1,y1,z1, 12))
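Since the original question asked for R, the same inverse-distance-weighting idea can be sketched in base R as follows. This is not a translation of a package function: `idw3`, its arguments and the toy data are all made up for illustration, and the exact-hit branch (returning the stored value when the query coincides with a data point) is an addition that the Python recipe above does not have.

```r
# Sketch: 3D inverse distance weighting using the k nearest points.
# x, y, z, v are data vectors; (x0, y0, z0) is the query point.
idw3 <- function(x, y, z, v, x0, y0, z0, k = 12, power = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2 + (z - z0)^2)
  # query coincides with a data point: return its value (avoids dividing by 0)
  if (any(d == 0)) return(mean(v[d == 0]))
  nearest <- order(d)[seq_len(min(k, length(d)))]
  w <- 1 / d[nearest]^power
  sum(w * v[nearest]) / sum(w)
}

# Toy data: three points on the x-axis (made-up values)
px <- c(0, 2, 10); py <- c(0, 0, 0); pz <- c(0, 0, 0)
pv <- c(0, 10, 100)

mid <- idw3(px, py, pz, pv, 1, 0, 0, k = 2)   # equidistant from the first two points
hit <- idw3(px, py, pz, pv, 10, 0, 0, k = 2)  # exactly on the third point
print(c(mid = mid, hit = hit))
```

Unlike trilinear interpolation, this needs no regular grid, so it also handles scattered points and queries outside the data cloud, where the result is simply a weighted average of the nearest values.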

How to construct a sequence with a pattern in R

I would like to construct a sequence with length 50 of the following type:
X[n+1] = 4*X[n]*(1 - X[n]). For your information, this is the logistic map for r = 4. In the case of the logistic map with parameter r = 4 and an initial state in (0,1), the attractor is also the interval (0,1) and the probability measure corresponds to the beta distribution with parameters a = 0.5 and b = 0.5. (The logistic map is a polynomial mapping, or equivalently a recurrence relation, of degree 2, often cited as an archetypal example of how complex, chaotic behaviour can arise from very simple non-linear dynamical equations.) How can I do this in R?
There are some ready-to-use solutions on the net. Below is the general solution from mage's blog, where you can find a more detailed description.
logistic.map <- function(r, x, N, M){
## r: bifurcation parameter
## x: initial value
## N: number of iterations
## M: number of iteration points to be returned
z <- 1:N
z[1] <- x
for(i in c(1:(N-1))){
z[i+1] <- r *z[i] * (1 - z[i])
}
## Return the last M iterations
z[c((N-M):N)]
}
For the OP's example:
logistic.map(4,0.2,50,49)
This isn't really an R question, is it? More basic programming. Anyway, you probably need an accumulator and a value to process.
values <- 0.2 ## this accumulates as a vector, starting with 0.2
xn <- values ## xn gets the first value
for (it in 2:50) { ## start the loop from the second iteration
xn <- 4L*xn*(1L-xn) ## perform the sequence function
values <- c(values, xn) ## add the new value to the vector
}
values
# [1] 0.2000000000 0.6400000000 0.9216000000 0.2890137600 0.8219392261 0.5854205387 0.9708133262 0.1133392473 0.4019738493 0.9615634951 0.1478365599 0.5039236459
# [13] 0.9999384200 0.0002463048 0.0009849765 0.0039360251 0.0156821314 0.0617448085 0.2317295484 0.7121238592 0.8200138734 0.5903644834 0.9673370405 0.1263843622
# [25] 0.4416454208 0.9863789723 0.0537419811 0.2034151221 0.6481496409 0.9122067356 0.3203424285 0.8708926280 0.4497546341 0.9899016128 0.0399856390 0.1535471506
# [37] 0.5198816927 0.9984188732 0.0063145074 0.0250985376 0.0978744041 0.3531800204 0.9137755744 0.3151590962 0.8633353611 0.4719496615 0.9968527140 0.0125495222
# [49] 0.0495681269 0.1884445109
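A variant of the loop above that allocates the result vector once, instead of growing it with c() on every iteration, might look like this. The function name `logistic_seq` and its arguments are illustrative, not from either answer.

```r
# Sketch: first n terms of x[i+1] = r * x[i] * (1 - x[i]), starting from x0
logistic_seq <- function(x0, n, r = 4) {
  x <- numeric(n)  # preallocate the full result
  x[1] <- x0
  for (i in seq_len(n - 1)) {
    x[i + 1] <- r * x[i] * (1 - x[i])
  }
  x
}

s <- logistic_seq(0.2, 50)
print(head(s, 3))  # 0.2, then 4*0.2*0.8 = 0.64, then 4*0.64*0.36 = 0.9216
```

For 50 terms the difference is negligible, but preallocation avoids repeated copying when the sequence gets long.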

Solve systems of nonlinear equations in R / BlackScholesMerton Model

I am writing my Master's final project, in which I am deriving the probability of default using the Black Scholes Merton model. I have got stuck on the R code. Mathematically, I want to solve this system of nonlinear equations with the package nleqslv:
library(nleqslv)
T <- 1
D1 <- 20010.75
R <- 0.8516
sigmaS <- .11
SO1 <- 1311.74
fnewton <- function(x){
y <- numeric(2)
d1 <- (log(x[1]/D1)+(R+x[2]^2/2)*T)/(x[2]*sqrt(T))
d2 <- d1 - x[2]*sqrt(T)
y[1] <- SO1 - (x[1]*pnorm(d1) - exp(-R*T)*D1*pnorm(d2))
y[2] <- sigmaS*SO1 - pnorm(d1)*x[2]*x[1]
y
}
xstart <- c(1311.74,0.11)
nleqslv(xstart, fnewton, method="Broyden")
# $x
# [1] 1311.74 0.11
# $fvec
# [1] 1311.7400 144.2914
# $termcd
# [1] 6
# $message
# [1] "Jacobian is singular (see allowSingular option)"
# $scalex
# [1] 1 1
# $nfcnt
# [1] 0
# $njcnt
# [1] 1
# $iter
# [1] 1
I have tried this with many values of the 5 inputs (stated above, which I have computed for 2 companies for different years), but I am not getting the final values of S0 and sigma V.
I get the message "Jacobian is singular (see allowSingular option)". If I allow a singular Jacobian using "control=list(trace=1,allowSingular=TRUE)", still no answer is displayed. I do not know how to obtain the solution for these 2 variables now.
I really don't know what I am doing wrong, as I based my model on Tetereva's slides (her model code is on slide no. 5), whose presentation is the first result when googling
https://www.google.de/search?q=moodys+KMV+in+R&rlz=1C1SVED_enDE401DE401&aq=f&oq=moodys+KMV+in+R&aqs=chrome.0.57.13309j0&sourceid=chrome&ie=UTF-8#q=distance+to+default+in+R
Like me, but more successfully, she calculates the Distance to Default risk measure via the Black Scholes Merton approach. In this model, the value of equity (usually represented by the market capitalization, SO1) can be written as a European call option.
The other variables are:
x[1]: the variable I want to derive, value of total assets
x[2]: the variable I want to derive, volatility of total assets
D1: the book value of debt (19982009)
R: a riskfree interest rate
T: is set to 1 year (time)
sigmaS: estimated (historical) equity volatility
You should be able to use the initial values of SO1 and sigmaS as starting values for nleqslv.
First of all the R code given by Tetereva doesn't seem quite correct (the variable Z should be D1 as you have named it; similar changes for her S0 and D).
I have modified Tetereva's code into this:
library(nleqslv)
T <- 1
D1 <- 33404048
R <- 2.32
sigmaS <- .02396919
SO1 <- 4740291 # Ve?
fnewton <- function(x){
y <- numeric(2)
d1 <- (log(x[1]/D1)+(R+x[2]^2/2)*T)/(x[2]*sqrt(T))
d2 <- d1 - x[2]*sqrt(T)
y[1] <- SO1 - (x[1]*pnorm(d1) - exp(-R*T)*D1*pnorm(d2))
y[2] <- sigmaS*SO1 - pnorm(d1)*x[2]*x[1]
y
}
xstart <- c(SO1,sigmaS)
nleqslv(xstart, fnewton, method="Broyden",control=list(trace=1))
nleqslv(xstart, fnewton, method="Newton",control=list(trace=1))
which will give the solution given by Tetereva. (I use trace=1 here just to check the iteration steps.)
I believe the value you give for R should be 8.516 and not something else. Using your values for the parameters
T <- 1
D1 <- 20010.75
R <- 8.516 # modified
sigmaS <- .11
SO1 <- 1311.74
like this
xstart <- c(1311.74,0.11)
nleqslv(xstart, fnewton, method="Broyden")
nleqslv(xstart, fnewton, method="Newton")
Then running nleqslv with these values converges very quickly.
If one uses R <- 2.32 (like Tetereva) nleqslv will also converge albeit with more iterations.
I cannot help you with what R should actually be but from Tetereva's presentation I assume R is in percentages. Since I don't have enough knowledge on the Black-Scholes model I can't be of any help for finding out what the correct values are for the various parameters. It's up to you.
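One way to sanity-check a residual function like fnewton before handing it to a solver is a round trip: pick an asset value and asset volatility, compute the equity value and equity volatility they imply, and confirm that the residuals vanish at the chosen point. The sketch below does this in base R. All numbers are made up, the function names are illustrative, and the rate is entered as a decimal (0.05 for 5%), which is an assumption of this sketch rather than a statement about the data above.

```r
# d1 term of the Merton model; note the parentheses around sigmaV * sqrt(T)
merton_d1 <- function(V, sigmaV, D, r, T) {
  (log(V / D) + (r + sigmaV^2 / 2) * T) / (sigmaV * sqrt(T))
}

# Forward direction: asset value and volatility -> equity value and volatility
merton_forward <- function(V, sigmaV, D, r, T) {
  d1 <- merton_d1(V, sigmaV, D, r, T)
  d2 <- d1 - sigmaV * sqrt(T)
  E <- V * pnorm(d1) - exp(-r * T) * D * pnorm(d2)
  sigmaE <- pnorm(d1) * sigmaV * V / E
  c(E = E, sigmaE = sigmaE)
}

# Residuals in the same form as fnewton: x[1] = asset value, x[2] = asset volatility
merton_residuals <- function(x, E, sigmaE, D, r, T) {
  d1 <- merton_d1(x[1], x[2], D, r, T)
  d2 <- d1 - x[2] * sqrt(T)
  c(E - (x[1] * pnorm(d1) - exp(-r * T) * D * pnorm(d2)),
    sigmaE * E - pnorm(d1) * x[2] * x[1])
}

# Made-up inputs for the round trip
truth <- c(120, 0.25)
obs <- unname(merton_forward(truth[1], truth[2], D = 100, r = 0.05, T = 1))
res <- merton_residuals(truth, E = obs[1], sigmaE = obs[2], D = 100, r = 0.05, T = 1)
print(res)  # both entries should be zero up to rounding error
```

If the residuals are not close to zero at a point that generated the inputs, the equations or the units of the inputs are inconsistent, and no starting value will rescue the solver.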

Mode of density function using optimize

I want to find the mode (x-value) of a univariate density function using R's optimize function.
For example, for a normal density f(x) ~ N(3, 1), the mode should equal the mean, i.e. x=3.
I tried the following:
# Define the function
g <- function(x) dnorm(x = x, mean = 3, sd = 1)
Dvec <- c(-1000, 1000)
# First get the gradient of the function
gradfun <- function(x){grad(g, x)}
# Find the maximum value
x_mode <- optimize(f=g,interval = Dvec, maximum=TRUE)
x_mode
This gives the incorrect value of the mode as:
$maximum
[1] 999.9999
$objective
[1] 0
This is incorrect: it returns the upper end of the (-1000, 1000) interval as opposed to x=3.
Could anyone please help fix the optimisation code?
If this simple test case works, it will be used with more generic functions of x.
I would use optim for this, which avoids having to specify the interval. You can choose the starting value (seed) by taking the point that maximises the function on a grid over the originally guessed interval:
guessedInterval = min(Dvec):max(Dvec)
superStarSeed = guessedInterval[which.max(g(guessedInterval))]
optim(par=superStarSeed, fn=function(y) -g(y))
#$par
#[1] 3
#$value
#[1] -0.3989423
#$counts
#function gradient
# 24 NA
#$convergence
#[1] 0
#$message
#NULL
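The likely reason optimize fails on (-1000, 1000) is that dnorm underflows to exactly 0 everywhere except a narrow window around the mean, so the search sees a flat function at every point it probes. The same grid idea can therefore also be used to shrink the interval and keep optimize itself. This is a sketch, assuming a grid step of 1 is fine enough to land inside the window where the density is non-zero.

```r
g <- function(x) dnorm(x = x, mean = 3, sd = 1)
Dvec <- c(-1000, 1000)

# Coarse scan: find the grid point with the highest density
grid <- seq(Dvec[1], Dvec[2], by = 1)
best <- grid[which.max(g(grid))]

# Refine with optimize() on a bracket of one grid step either side
fit <- optimize(f = g, interval = c(best - 1, best + 1), maximum = TRUE)
print(fit$maximum)
```

For a one-parameter problem this also avoids the warning that optim's default Nelder-Mead method gives for one-dimensional optimisation.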
