Extract coefficient of x^k from Ryacas calculated expression

I have a Ryacas expression like this, and I want to extract the coefficient of x^4,11. How should I write the code to do this? Thanks.

Hope this helps:
x <- Sym("x")
P <- (x+1)*(x+2)*(x+3)
Expand(P)
# expression(x^3 + 6 * x^2 + 11 * x + 6)
# coefficient of x^2:
yacas("Coef((x+1)*(x+2)*(x+3), x, 2)")
# expression(6)
# or:
yacas(paste0("Coef(", P, ",x , 2)"))
# expression(6)
EDIT 2020-02
Updated for the new version of Ryacas:
library(Ryacas)
x <- ysym("x")
yac_assign((x+1)*(x+2)*(x+3), "P")
### expand the polynomial
yac_str("Expand(P)")
# [1] "x^3+6*x^2+11*x+6"
### extract coefficient of x^2:
yac_str("Coef(P, x, 2)")
# [1] "6"
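When the factors have plain numeric coefficients, the result can be cross-checked without a computer algebra system. The following is a minimal base R sketch and not part of Ryacas: poly_mult is a hypothetical helper that multiplies two polynomials stored as coefficient vectors in increasing order of power.

```r
# Hypothetical helper (not from Ryacas): multiply two polynomials given as
# coefficient vectors, lowest power first, e.g. c(1, 1) means 1 + x.
poly_mult <- function(p, q) {
  out <- numeric(length(p) + length(q) - 1)
  for (i in seq_along(p)) {
    idx <- i:(i + length(q) - 1)
    out[idx] <- out[idx] + p[i] * q
  }
  out
}

# (x+1)*(x+2)*(x+3)
coefs <- Reduce(poly_mult, list(c(1, 1), c(2, 1), c(3, 1)))
coefs
# [1]  6 11  6  1

# coefficient of x^k is stored at position k + 1
k <- 2
coefs[k + 1]
# [1] 6
```

This only covers numeric coefficients; for symbolic ones the Coef() call above remains the way to go.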


Minimising area under the ROC curve to optimise the parameters of a polynomial predictor with optim

My predictor (x) has a U-shaped distribution in relation to the binary outcome (y), with positive outcomes at both low and high values of x, leading to a biconcave ROC curve with a poor area under the curve (AUC).
To maximise its ability to discriminate the outcome, I am trying to optimise the parameters of a second-degree polynomial of x, using optim with 1 - auc as the cost function to minimise.
library(pROC) # roc() is assumed to come from the pROC package
x = c(13,7,7,7,1,100,3,4,4,2,2,7,14,8,3,14,5,12,8,
13,9,4,9,4,8,3,13,9,4,4,5,9,10,10,7,6,12,7,2,
6,6,4,3,2,3,10,5,2,5,8,3,5,4,2,7,5,7,6,79,9)
y = c(0,0,1,0,0,1,0,0,1,1,0,1,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0)
theta = c(0, 0, 0)
min_auc <- function(theta, x, y) {
  (1 - roc(y, (theta[1] + theta[2]*x + theta[3]*x^2))$auc)
}
optim(theta, min_auc, x = x, y = y)
The results are as follows:
$par
[1] 0.0 0.1 0.0
$value
[1] 0.4380054
$counts
function gradient
8 NA
$convergence
[1] 0
$message
NULL
However, by setting the parameters manually, I can see that min_auc can be reduced further.
theta = c(0, -40, 1)
(1 - roc(y, (theta[1] + theta[2]*x + theta[3]*(x^2)))$auc)
[1] 0.2762803
Could anyone explain to me what I am doing wrong, please? Is it possibly due to a non-convex cost function?
One possibility is it's a collinearity problem. Scaling the inputs helps:
min_auc <- function(theta, x, y) {
  (1 - roc(y, (theta[1] + theta[2]*scale(x) + theta[3]*scale(x)^2))$auc)
}
optim(theta, min_auc, x = x, y = y)
# $par
# [1] -0.02469136 -0.03117284 0.11049383
#
# $value
# [1] 0.2762803
#
# $counts
# function gradient
# 30 NA
#
# $convergence
# [1] 0
#
# $message
# NULL
#
Another potential problem is that the surface over which you're optimizing has some flat spots. Let's say, for example, that we fix the intercept in this equation to -2. This is about what you get if you do qlogis(mean(y)). Then, you're only optimizing over 2 parameters so the surface is easier to see. Here's what it looks like with the two remaining theta terms on the two horizontal axes and the 1-AUC value on the vertical (z) axis.
min_auc <- function(theta, x, y) {
  (1 - roc(y, (-2 + theta[1]*scale(x) + theta[2]*scale(x)^2))$auc)
}
s <- seq(-.25, .25, length=50)
o <- outer(s, s, Vectorize(function(z,w)min_auc(c(z,w),x,y)))
library(plotly)
plot_ly(x = ~s, y = ~s, z = ~o) %>% add_surface()
As you may have noticed above, there is no unique solution to the problem. There are lots of solutions that seem to get to the minimum value.
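The flat spots and the non-unique minimum both follow from the fact that the AUC depends only on how the scores are ranked, not on their values. The sketch below illustrates this on a small made-up data set. auc_rank is a hypothetical rank-based helper written for this illustration; it stands in for roc()$auc and makes no attempt to reproduce pROC's options.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC, i.e. the proportion of
# positive/negative pairs in which the positive has the higher score
# (ties count as 1/2).
auc_rank <- function(y, score) {
  r  <- rank(score)          # average ranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up toy data with positives at both ends of x (U shape)
x <- 1:8
y <- c(1, 0, 0, 0, 0, 0, 1, 1)

poly_score <- function(theta) theta[1] + theta[2] * x + theta[3] * x^2

a1 <- auc_rank(y, poly_score(c(0, -9.5, 1)))
a2 <- auc_rank(y, poly_score(c(5, -95, 10)))    # same scores, shifted and scaled by 10
a3 <- auc_rank(y, poly_score(c(0, -9.501, 1)))  # tiny perturbation of theta[2]

# All three are identical because the ranking of the scores is unchanged
c(a1, a2, a3)
```

Any positive rescaling or shift of theta, and any perturbation too small to reorder the scores, leaves the cost unchanged, so a simplex search started in such a region sees no improvement in any direction.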

Definite, improper and multiple integration?

Is it possible to implement the Excel "search of the decision function" in R? I need to create an R script to solve an integral equation.
I can solve the four integrals below (improper, double, triple and definite) by hand with paper, a pencil and 10 minutes.
I don't want to solve integrals like these manually, though. How can they be solved using R?
LaTeX formula editor code:
improper
\int_{2}^{\infty} \frac{1}{\left(x - 1\right)^{2}}\, dx
double integrals
\int_{0}^{1}\int_{-\frac{x}{2}}^{\frac{x}{2}} e^{-x - y}\, dy\, dx
triple integrals
\int_{0}^{1}\int_{-\frac{x}{2}}^{\frac{x}{2}}\int_{-\frac{y}{3}}^{\frac{y}{3}} e^{-x - y - z}\, dz\, dy\, dx
definite integrals
\int_{0}^{1} x^{2} \sin{\left(x\right)}\, dx
You can use the rSymPy package to integrate all four expressions as below:
Improper:
library(rSymPy)
x <- Var("x")
sympy("integrate(1 / (x - 1) ** 2, (x, 2, oo))")
# [1] "1"
Double:
library(rSymPy)
x <- Var("x")
y <- Var("y")
# double
sympy("integrate(exp(-x - y), (y, -x/2, x/2), (x, 0, 1))")
# [1] "4/3 + 2*exp(-3/2)/3 - 2*exp(-1/2)"
Triple:
library(rSymPy)
x <- Var("x")
y <- Var("y")
z <- Var("z")
sympy("integrate(exp(-x - y - z), (z, -y/3, y/3), (y, -x/2, x/2), (x, 0, 1))")
# [1] "-27/40 - 9*exp(-5/3)/20 + 9*exp(-4/3)/8 - 9*exp(-2/3)/4 + 9*exp(-1/3)/4"
Definite:
library(rSymPy)
x <- Var("x")
sympy("integrate(x ** 2 * sin(x), (x, 0, 1))")
# [1] "-2 + 2*sin(1) + cos(1)"
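If only numeric values are needed, base R's integrate() can serve as a cross-check on the symbolic results. The following is a sketch under that assumption, covering the improper, double and definite integrals; inner_y is a hypothetical helper name, and the double integral is handled by nesting one integrate() call inside another.

```r
# Improper: integral of 1/(x-1)^2 from 2 to infinity
i_improper <- integrate(function(x) 1 / (x - 1)^2, lower = 2, upper = Inf)$value

# Double: integrate over y for each x, then over x.
# integrate() passes a vector of x values, hence the sapply().
inner_y <- function(x) {
  sapply(x, function(xi) {
    integrate(function(y) exp(-xi - y), lower = -xi / 2, upper = xi / 2)$value
  })
}
i_double <- integrate(inner_y, lower = 0, upper = 1)$value

# Definite: integral of x^2 * sin(x) from 0 to 1
i_definite <- integrate(function(x) x^2 * sin(x), lower = 0, upper = 1)$value

# Compare with the closed forms returned by sympy above
c(i_improper, 1)
c(i_double, 4/3 + 2 * exp(-3/2) / 3 - 2 * exp(-1/2))
c(i_definite, -2 + 2 * sin(1) + cos(1))
```

The numeric values agree with the closed forms only up to integrate()'s tolerance, so this is a sanity check rather than a replacement for the symbolic answer.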

root values of simultaneous nonlinear equations in R

I've been trying to code this problem:
https://sg.answers.yahoo.com/question/index?qid=20110127015240AA9RjyZ
I believe there is a R function somewhere to solve for the root values of the following equations:
(x+3)^2 + (y-50)^2 = 1681
(x-11)^2 + (y+2)^2 = 169
(x-13)^2 + (y-34)^2 = 625
I tried using the 'solve' function, but it only handles linear equations(?)
I also tried 'nls':
dt = data.frame(a=c(-3,11,13), b = c(50, -2, 34), c = c(1681,169,625))
nls(c~(x-a)^2 + (y-b)^2, data = dt, start = list(x = 1, y = 1))
but I get an error every time (and yes, I already tried changing the maximum number of iterations):
Error in nls(c ~ (x - a)^2 + (y - b)^2, data = dt, start = list(x = 1, :
number of iterations exceeded maximum of 50
How do you solve for the root values in R?
nls does not work with zero residual data -- see ?nls where this is mentioned. nlxb in the nlmrt package is mostly similar to nls in terms of input arguments and does support zero residual data. Using dt from the question just replace nls with nlxb:
library(nlmrt)
nlxb(c~(x-a)^2 + (y-b)^2, data = dt, start = list(x = 1, y = 1))
giving:
nlmrt class object: x
residual sumsquares = 2.6535e-20 on 3 observations
after 5 Jacobian and 6 function evaluations
name coeff SE tstat pval gradient JSingval
x 6 7.21e-12 8.322e+11 7.649e-13 -1.594e-09 96.93
y 10 1.864e-12 5.366e+12 1.186e-13 -1.05e-08 22.45
You cannot always solve three equations for two variables. You can solve two equations for two variables and test whether the solution satisfies the third equation.
Use package nleqslv as follows.
library(nleqslv)
f1 <- function(z) {
  f <- numeric(2)
  x <- z[1]
  y <- z[2]
  f[1] <- (x+3)^2 + (y-50)^2 - 1681
  f[2] <- (x-11)^2 + (y+2)^2 - 169
  f
}
f2 <- function(z) {
  x <- z[1]
  y <- z[2]
  (x-13)^2 + (y-34)^2 - 625
}
zstart <- c(0,0)
z1 <- nleqslv(zstart,f1)
z1
f2(z1$x)
which gives you the following output:
>z1
$x
[1] 6 10
$fvec
[1] 7.779818e-09 7.779505e-09
$termcd
[1] 1
$message
[1] "Function criterion near zero"
$scalex
[1] 1 1
$nfcnt
[1] 9
$njcnt
[1] 1
$iter
[1] 9
>f2(z1$x)
[1] 5.919242e-08
So a solution has been found, and it is given by the vector z1$x. Inserting z1$x into function f2 also gives almost zero, so the third equation is satisfied as well.
You could also try package BB.
The rootSolve package is another option; see its vignette:
https://cran.r-project.org/web/packages/rootSolve/vignettes/rootSolve.pdf
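Because all three equations in the question are circles, there is also a route that needs only base R: subtracting the first equation from the other two cancels the x^2 and y^2 terms and leaves a 2x2 linear system, which solve() can handle. The sketch below assumes the three circles really do share a common point, which the final line checks; the variable names are illustrative.

```r
# Circle centres (a, b) and squared radii (c2), as in dt from the question
a  <- c(-3, 11, 13)
b  <- c(50, -2, 34)
c2 <- c(1681, 169, 625)

# Subtracting equation 1 from equation i gives a linear equation:
# 2*(a_i - a_1)*x + 2*(b_i - b_1)*y = c_1 - c_i + a_i^2 - a_1^2 + b_i^2 - b_1^2
A   <- cbind(2 * (a[-1] - a[1]), 2 * (b[-1] - b[1]))
rhs <- c2[1] - c2[-1] + a[-1]^2 - a[1]^2 + b[-1]^2 - b[1]^2

sol <- solve(A, rhs)
sol  # approximately 6 and 10

# Residuals of the three original equations (should all be close to zero)
(sol[1] - a)^2 + (sol[2] - b)^2 - c2
```

The linear system is a necessary condition only, so the residual check matters: if the circles had no common point, solve() would still return a value but the residuals would not vanish.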

3D with value interpolation in R (X, Y, Z, V)

Is there an R package that does X, Y, Z, V interpolation? I see that Akima does X, Y, V but I need one more dimension.
Basically I have X,Y,Z coordinates plus the value (V) that I want to interpolate. This is all GIS data, but my GIS does not do voxel interpolation.
So if I have a point cloud of XYZ coordinates with a value of V, how can I interpolate what V would be at XYZ coordinate (15,15,-12) ? Some test data would look like this:
X <-rbind(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)
Y <- rbind(10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50)
Z <- rbind(-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
I had the same question and was hoping for an answer in R.
My question was: How do I perform 3D (trilinear) interpolation using regular gridded coordinate/value data (x,y,z,v)? For example, CT images, where each image has pixel centers (x, y) and greyscale value (v) and there are multiple image "slices" (z) along the thing being imaged (e.g., head, torso, leg, ...).
There is a slight problem with the given example data.
# original example data (reformatted)
X <- rep( rep( seq(10, 50, by=10), each=25), 3)
Y <- rep( rep( seq(10, 50, by=10), each=5), 15)
Z <- rep(c(-5, -17, -29), each=125)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
# the dimensions of the 3D grid described do not match the number of values
(length(unique(X))*length(unique(Y))*length(unique(Z))) == length(V)
## [1] FALSE
## which makes sense since 75 != 375
# visualize this:
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V + 1]) # + 1 because V contains zeros and R indexing starts at 1
# examine the example data real quick...
df <- data.frame(x=X,y=Y,z=Z,v=V);
head(df);
table(df$x, df$y, df$z);
# there are 5 V values at each X,Y,Z coordinate... duplicates!
# redefine Z so there are 15 unique values
# making 375 unique coordinate points
# and matching the length of the given value vector, V
df$z <- seq(-5, -29, length.out=15)
head(df)
table(df$x, df$y, df$z);
# there is now 1 V value at each X,Y,Z coordinate
# that was for testing, now actually redefine the Z vector.
Z <- rep(seq(-5,-29, length.out = 15), 25)
# plot it.
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V + 1]) # + 1 because V contains zeros and R indexing starts at 1
I couldn't find any 4D interpolation functions in the usual R packages, so I wrote a quick and dirty one. The following implements (without ANY error checking... caveat emptor!) the technique described at: https://en.wikipedia.org/wiki/Trilinear_interpolation
# convenience function #1:
# define a function that takes a vector of lookup values and a value to lookup
# and returns the two lookup values that the value falls between
between = function(vec, value) {
  # extract list of unique lookup values
  u = unique(vec)
  # difference vector
  dvec = u - value
  vals = c(u[dvec==max(dvec[dvec<0])], u[dvec==min(dvec[dvec>0])])
  return(vals)
}
# convenience function #2:
# return the value (v) from a grid data.frame for given point (x, y, z)
get_value = function(df, xi, yi, zi) {
  # assumes df is data.frame with column names: x, y, z, v
  subset(df, x==xi & y==yi & z==zi)$v
}
# inputs df (x,y,z,v), points to look up (x, y, z)
interp3 = function(dfin, xin, yin, zin) {
  # TODO: check if all(xin, yin, zin) equals a grid point, if so just return the point value
  # TODO: check if any(xin, yin, zin) equals a grid point, if so then do bilinear or linear interp
  cube_x <- between(dfin$x, xin)
  cube_y <- between(dfin$y, yin)
  cube_z <- between(dfin$z, zin)
  # find the two values in each dimension that the lookup value falls within
  # and extract the cube of 8 points
  tmp <- subset(dfin, x %in% cube_x &
                      y %in% cube_y &
                      z %in% cube_z)
  stopifnot(nrow(tmp)==8)
  # define points in a periodic and cubic lattice
  x0 = min(cube_x); x1 = max(cube_x);
  y0 = min(cube_y); y1 = max(cube_y);
  z0 = min(cube_z); z1 = max(cube_z);
  # define differences in each dimension
  xd = (xin-x0)/(x1-x0); # 0.5
  yd = (yin-y0)/(y1-y0); # 0.5
  zd = (zin-z0)/(z1-z0); # 0.9166666
  # interpolate along x:
  v00 = get_value(tmp, x0, y0, z0)*(1-xd) + get_value(tmp,x1,y0,z0)*xd # 2.5
  v01 = get_value(tmp, x0, y0, z1)*(1-xd) + get_value(tmp,x1,y0,z1)*xd # 0
  v10 = get_value(tmp, x0, y1, z0)*(1-xd) + get_value(tmp,x1,y1,z0)*xd # 0
  v11 = get_value(tmp, x0, y1, z1)*(1-xd) + get_value(tmp,x1,y1,z1)*xd # 65
  # interpolate along y:
  v0 = v00*(1-yd) + v10*yd # 1.25
  v1 = v01*(1-yd) + v11*yd # 32.5
  # interpolate along z:
  return(v0*(1-zd) + v1*zd) # 29.89583 (~91.7% between v0 and v1)
}
> interp3(df, 15, 15, -12)
[1] 29.89583
Testing that same source's assertion that trilinear is simply linear(bilinear(), bilinear()), we can use the base R linear interpolation function, approx(), and the akima package's bilinear interpolation function, interp(), as follows:
library(akima)
approx(x=c(-11.857143,-13.571429),
y=c(interp(x=df[round(df$z,1)==-11.9,"x"], y=df[round(df$z,1)==-11.9,"y"], z=df[round(df$z,1)==-11.9,"v"], xo=15, yo=15)$z,
interp(x=df[round(df$z,1)==-13.6,"x"], y=df[round(df$z,1)==-13.6,"y"], z=df[round(df$z,1)==-13.6,"v"], xo=15, yo=15)$z),
xout=-12)$y
# [1] 0.2083331
Checked another package to triangulate:
library(oce)
Vmat <- array(data = V, dim = c(length(unique(X)), length(unique(Y)), length(unique(Z))))
approx3d(x=unique(X), y=unique(Y), z=unique(Z), f=Vmat, xout=15, yout=15, zout=-12)
[1] 1.666667
So 'oce', 'akima' and my function all give pretty different answers. This is either a mistake in my code somewhere, or due to differences in the underlying Fortran code in the akima interp(), and whatever is in the oce 'approx3d' function that we'll leave for another day.
Not sure what the correct answer is because the MWE is not exactly "minimum" or simple. But I tested the functions with some really simple grids and it seems to give 'correct' answers. Here's one simple 2x2x2 example:
# really, really simple example:
# answer is always the z-coordinate value
sdf <- expand.grid(x=seq(0,1),y=seq(0,1),z=seq(0,1))
sdf$v <- rep(seq(0,1), each=4)
> interp3(sdf,0.25,0.25,.99)
[1] 0.99
> interp3(sdf,0.25,0.25,.4)
[1] 0.4
Trying akima on the simple example, we get the same answer (phew!):
library(akima)
approx(x=unique(sdf$z),
y=c(interp(x=sdf[sdf$z==0,"x"], y=sdf[sdf$z==0,"y"], z=sdf[sdf$z==0,"v"], xo=.25, yo=.25)$z,
interp(x=sdf[sdf$z==1,"x"], y=sdf[sdf$z==1,"y"], z=sdf[sdf$z==1,"v"], xo=.25, yo=.25)$z),
xout=.4)$y
# [1] 0.4
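The three successive interpolations inside interp3() are equivalent to a single weighted sum over the 8 cube corners, where each corner's weight is the product of its per-axis weights. The sketch below shows that form for one grid cell. trilinear_cell is a hypothetical helper, and it assumes the corner values are stored in a 2x2x2 array with x varying fastest, then y, then z (the order expand.grid produces).

```r
# Hypothetical helper: trilinear interpolation inside one grid cell.
# corner[i, j, k] is the value at (x_{i-1}, y_{j-1}, z_{k-1});
# xd, yd, zd are the fractional positions (0..1) within the cell.
trilinear_cell <- function(corner, xd, yd, zd) {
  # 2x2x2 array of weights: product of the per-axis linear weights
  w <- outer(outer(c(1 - xd, xd), c(1 - yd, yd)), c(1 - zd, zd))
  sum(w * corner)
}

# Same simple 2x2x2 example as above: the value equals the z coordinate
sdf <- expand.grid(x = 0:1, y = 0:1, z = 0:1)
sdf$v <- rep(0:1, each = 4)
corner <- array(sdf$v, dim = c(2, 2, 2))

trilinear_cell(corner, 0.25, 0.25, 0.4)   # 0.4
trilinear_cell(corner, 0.25, 0.25, 0.99)  # 0.99
```

On a real grid the cell lookup (which 8 points surround the query) still has to be done first, as between() does in interp3().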
The new example data in the OP's own, accepted answer was not possible to interpolate with my simple interp3() function above because:
(a) the grid coordinates are not regularly spaced, and
(b) the coordinates to lookup (x1, y1, z1) lie outside of the grid.
# for completeness, here's the attempt:
options(scipen = 999)
XCoor=c(78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179)
YCoor=c(5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091)
ZCoor=c(-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5)
V=c(2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0)
adf = data.frame(x=XCoor, y=YCoor, z=ZCoor, v=V)
# the first y value looks like a typo?
> head(adf)
x y z v
1 78121.62 5213642.2 -3.0 2.4
2 78121.62 523642.2 -5.0 30.0
3 78121.62 523642.2 -10.0 620.0
4 78121.62 523642.2 -13.0 590.0
5 78136.72 523594.5 -3.5 61.0
6 78136.72 523594.5 -6.5 480.0
x1=198130.000
y1=1913590.000
z1=-8
> interp3(adf, x1,y1,z1)
numeric(0)
Warning message:
In min(dvec[dvec > 0]) : no non-missing arguments to min; returning Inf
Whether or not the test data made sense, I still needed an algorithm. Test data is just that, something to fiddle with, and as test data it was fine.
I wound up programming it in Python. The following code takes X, Y, Z and V and does a 3D Inverse Distance Weighted (IDW) interpolation where you can set the number of points used in the interpolation. This Python recipe only interpolates to one point (x1, y1, z1), but it is easy enough to extend.
import numpy as np
import math
#34 points
XCoor=np.array([78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179])
YCoor=np.array([5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091])
ZCoor=np.array([-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5])
V=np.array([2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0])
def Distance(x1,y1,z1, Npoints):
    i=0
    d=[]
    # compute the distance from the query point to every data point (all 34)
    while i < len(XCoor):
        d.append(math.sqrt((x1-XCoor[i])*(x1-XCoor[i]) + (y1-YCoor[i])*(y1-YCoor[i]) + (z1-ZCoor[i])*(z1-ZCoor[i]) ))
        i = i + 1
    distance=np.array(d)
    # indices of the Npoints nearest data points
    myIndex=distance.argsort()[:Npoints]
    weightedNum=0
    weightedDen=0
    # inverse squared-distance weighting
    for i in myIndex:
        weightedNum=weightedNum + (V[i]/(distance[i]*distance[i]))
        weightedDen=weightedDen + (1/(distance[i]*distance[i]))
    InterpValue=weightedNum/weightedDen
    return InterpValue
x1=198130.000
y1=1913590.000
z1=-8
print(Distance(x1,y1,z1, 12))
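For readers who want to stay in R, the same inverse-distance idea can be sketched in base R. This is a loose translation under stated assumptions, not the author's code: idw3d is a hypothetical helper, and the power argument and the shortcut for a query that coincides with a data point are additions.

```r
# Hypothetical helper: 3D inverse distance weighted interpolation at one
# query point (x0, y0, z0), using the n_points nearest data points.
idw3d <- function(x, y, z, v, x0, y0, z0, n_points = 12, power = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2 + (z - z0)^2)
  # Added shortcut: a query on top of a data point would divide by zero
  if (any(d == 0)) return(mean(v[d == 0]))
  idx <- order(d)[seq_len(min(n_points, length(d)))]
  w <- 1 / d[idx]^power
  sum(w * v[idx]) / sum(w)
}

# Tiny made-up check: halfway between two points, the result is their mean
idw3d(x = c(0, 2), y = c(0, 0), z = c(0, 0), v = c(0, 10),
      x0 = 1, y0 = 0, z0 = 0, n_points = 2)
# [1] 5
```

With the question's data the call would look like idw3d(XCoor, YCoor, ZCoor, V, x1, y1, z1, n_points = 12), using the vectors defined in the R attempt above.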

How to generate correlated numbers?

I want to generate two sets of numbers with a correlation of .9, .5, or .0.
A is drawn from rnorm(30,-0.5,1)
B is drawn from rnorm(30,.5,2)
and I want to make A and B correlated at .9, .5, and .0.
You are describing a multivariate normal distribution, which can be sampled with the mvrnorm function from the MASS package:
library(MASS)
meanA <- -0.5
meanB <- 0.5
sdA <- 1
sdB <- 2
correlation <- 0.9
set.seed(144)
vals <- mvrnorm(10000, c(meanA, meanB), matrix(c(sdA^2, correlation*sdA*sdB,
correlation*sdA*sdB, sdB^2), nrow=2))
mean(vals[,1])
# [1] -0.4883265
mean(vals[,2])
# [1] 0.5201586
sd(vals[,1])
# [1] 0.9994628
sd(vals[,2])
# [1] 1.992816
cor(vals[,1], vals[,2])
# [1] 0.8999285
As an alternative, please consider the following. Let the random variables X ~ N(0,1) and Y ~ N(0,1) independently. Then the random variables X and rho X + sqrt(1 - rho^2) Y are both distributed N(0,1), but are now correlated with correlation rho. So possible R code could be
# Define the parameters
meanA <- -0.5
meanB <- 0.5
sdA <- 1
sdB <- 2
correlation <- 0.9
n <- 10000 # You want 30
# Generate from independent standard normals
x <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)
# Transform
x2 <- x # could be avoided
y2 <- correlation*x + sqrt(1 - correlation^2)*y
# Fix up means and standard deviations
x3 <- meanA + sdA*x2
y3 <- meanB + sdB*y2
# Check summary statistics
mean(x3)
# [1] -0.4981958
mean(y3)
# [1] 0.4999068
sd(x3)
# [1] 1.014299
sd(y3)
# [1] 2.022377
cor(x3, y3)
# [1] 0.9002529
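With only 30 draws, the sample correlation from either approach will scatter around the target. If the sample correlation itself should hit .9, .5 or .0, the same transformation can be applied after making the second vector exactly uncorrelated with the first in the sample. The sketch below assumes that replacing B with a new vector is acceptable; exact_cor is a hypothetical helper, and the result keeps B's sample mean and standard deviation but not its original values.

```r
# Hypothetical helper: return a vector with the sample mean and sd of b
# whose sample correlation with a equals rho.
exact_cor <- function(a, b, rho) {
  a_s <- as.vector(scale(a))                  # standardised a
  # residuals of b on a are uncorrelated with a within the sample
  e   <- as.vector(scale(resid(lm(b ~ a))))
  b_s <- rho * a_s + sqrt(1 - rho^2) * e      # same transformation as above
  mean(b) + sd(b) * b_s
}

set.seed(1)
A <- rnorm(30, -0.5, 1)
B <- rnorm(30, 0.5, 2)

B9 <- exact_cor(A, B, 0.9)
B5 <- exact_cor(A, B, 0.5)
B0 <- exact_cor(A, B, 0.0)

# Sample correlations match the targets up to rounding error
c(cor(A, B9), cor(A, B5), cor(A, B0))
```

Because the new vectors are built from A and the regression residuals, they are no longer independent draws from N(0.5, 2^2); whether that matters depends on what the simulated data are for.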
I created the correlate package to be able to create a correlation between any type of variable (regardless of distribution) within a given tolerance. It does so by permutations.
install.packages('correlate')
library('correlate')
A <- rnorm(30, -0.5, 1)
B <- rnorm(30, .5, 2)
C <- correlate(cbind(A,B), 0.9)
# 0.9012749
D <- correlate(cbind(A,B), 0.5)
# 0.5018054
E <- correlate(cbind(A,B), 0.0)
# -0.00407327
You can pretty much specify the whole matrix if you want (for multiple variables) by giving a matrix as the second argument.
Ironically, you can also use it to create a multivariate normal.
