Interpolate new points between two given points - r

Given two points, how can I interpolate and generate 20 points in between them?
E.g., points:
x = c(2,8)
y = c(2,19)
I tried to fit a linear model and then use that to generate the points, but when the x values are identical (a vertical line), a linear model cannot be fit.

It is possibly easier to run approx(x, y, n = 20).

This is weird, because interpolating two points means...a straight line?
Anyway, here you go:
> x2<-seq(x[1],x[2],length.out=20)
> x2
[1] 2.000000 2.315789 2.631579 2.947368 3.263158 3.578947 3.894737 4.210526 4.526316 4.842105
[11] 5.157895 5.473684 5.789474 6.105263 6.421053 6.736842 7.052632 7.368421 7.684211 8.000000
> y2<-seq(y[1],y[2],length.out=20)
> y2
[1] 2.000000 2.894737 3.789474 4.684211 5.578947 6.473684 7.368421 8.263158 9.157895
[10] 10.052632 10.947368 11.842105 12.736842 13.631579 14.526316 15.421053 16.315789 17.210526
[19] 18.105263 19.000000

How about the following, which handles the case where both x values are equal (a vertical line) by stepping through y alone:
yfrom <- 8
yto <- 19
y <- seq(yfrom, yto, by = ((yto - yfrom)/(20 + 1)))
x <- rep(2, 22)
data.frame(x,y)

Related

3D with value interpolation in R (X, Y, Z, V)

Is there an R package that does X, Y, Z, V interpolation? I see that Akima does X, Y, V but I need one more dimension.
Basically I have X, Y, Z coordinates plus the value (V) that I want to interpolate. This is all GIS data, but my GIS does not do voxel interpolation.
So if I have a point cloud of XYZ coordinates with a value of V, how can I interpolate what V would be at XYZ coordinate (15, 15, -12)? Some test data would look like this:
X <-rbind(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)
Y <- rbind(10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50)
Z <- rbind(-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
I had the same question and was hoping for an answer in R.
My question was: How do I perform 3D (trilinear) interpolation using regular gridded coordinate/value data (x,y,z,v)? For example, CT images, where each image has pixel centers (x, y) and greyscale value (v) and there are multiple image "slices" (z) along the thing being imaged (e.g., head, torso, leg, ...).
There is a slight problem with the given example data.
# original example data (reformatted)
X <- rep( rep( seq(10, 50, by=10), each=25), 3)
Y <- rep( rep( seq(10, 50, by=10), each=5), 15)
Z <- rep(c(-5, -17, -29), each=125)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
# the dimensions of the 3D grid described do not match the number of values
(length(unique(X))*length(unique(Y))*length(unique(Z))) == length(V)
## [1] FALSE
## which makes sense since 75 != 375
# visualize this:
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V])
# examine the example data real quick...
df <- data.frame(x=X,y=Y,z=Z,v=V);
head(df);
table(df$x, df$y, df$z);
# there are 5 V values at each X,Y,Z coordinate... duplicates!
# redefine Z so there are 15 unique values
# making 375 unique coordinate points
# and matching the length of the given value vector, V
df$z <- seq(-5, -29, length.out=15)
head(df)
table(df$x, df$y, df$z);
# there is now 1 V value at each X,Y,Z coordinate
# that was for testing, now actually redefine the Z vector.
Z <- rep(seq(-5,-29, length.out = 15), 25)
# plot it.
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V])
I couldn't find any 4D interpolation functions in the usual R packages, so I wrote a quick and dirty one. The following implements (without ANY error checking... caveat emptor!) the technique described at: https://en.wikipedia.org/wiki/Trilinear_interpolation
# convenience function #1:
# define a function that takes a vector of lookup values and a value to lookup
# and returns the two lookup values that the value falls between
between = function(vec, value) {
  # extract list of unique lookup values
  u = unique(vec)
  # difference vector
  dvec = u - value
  vals = c(u[dvec==max(dvec[dvec<0])], u[dvec==min(dvec[dvec>0])])
  return(vals)
}
# convenience function #2:
# return the value (v) from a grid data.frame for given point (x, y, z)
get_value = function(df, xi, yi, zi) {
  # assumes df is data.frame with column names: x, y, z, v
  subset(df, x==xi & y==yi & z==zi)$v
}
# inputs df (x,y,z,v), points to look up (x, y, z)
interp3 = function(dfin, xin, yin, zin) {
  # TODO: check if all(xin, yin, zin) equals a grid point, if so just return the point value
  # TODO: check if any(xin, yin, zin) equals a grid point, if so then do bilinear or linear interp
  cube_x <- between(dfin$x, xin)
  cube_y <- between(dfin$y, yin)
  cube_z <- between(dfin$z, zin)
  # find the two values in each dimension that the lookup value falls within
  # and extract the cube of 8 points
  tmp <- subset(dfin, x %in% cube_x &
                      y %in% cube_y &
                      z %in% cube_z)
  stopifnot(nrow(tmp)==8)
  # define points in a periodic and cubic lattice
  x0 = min(cube_x); x1 = max(cube_x);
  y0 = min(cube_y); y1 = max(cube_y);
  z0 = min(cube_z); z1 = max(cube_z);
  # define differences in each dimension
  xd = (xin-x0)/(x1-x0); # 0.5
  yd = (yin-y0)/(y1-y0); # 0.5
  zd = (zin-z0)/(z1-z0); # 0.9166666
  # interpolate along x:
  v00 = get_value(tmp, x0, y0, z0)*(1-xd) + get_value(tmp, x1, y0, z0)*xd # 2.5
  v01 = get_value(tmp, x0, y0, z1)*(1-xd) + get_value(tmp, x1, y0, z1)*xd # 0
  v10 = get_value(tmp, x0, y1, z0)*(1-xd) + get_value(tmp, x1, y1, z0)*xd # 0
  v11 = get_value(tmp, x0, y1, z1)*(1-xd) + get_value(tmp, x1, y1, z1)*xd # 65
  # interpolate along y:
  v0 = v00*(1-yd) + v10*yd # 1.25
  v1 = v01*(1-yd) + v11*yd # 32.5
  # interpolate along z:
  return(v0*(1-zd) + v1*zd) # 29.89583 (~91.7% between v0 and v1)
}
> interp3(df, 15, 15, -12)
[1] 29.89583
Testing that same source's assertion that trilinear is simply linear(bilinear(), bilinear()), we can use the base R linear interpolation function, approx(), and the akima package's bilinear interpolation function, interp(), as follows:
library(akima)
approx(x=c(-11.857143,-13.571429),
       y=c(interp(x=df[round(df$z,1)==-11.9,"x"], y=df[round(df$z,1)==-11.9,"y"], z=df[round(df$z,1)==-11.9,"v"], xo=15, yo=15)$z,
           interp(x=df[round(df$z,1)==-13.6,"x"], y=df[round(df$z,1)==-13.6,"y"], z=df[round(df$z,1)==-13.6,"v"], xo=15, yo=15)$z),
       xout=-12)$y
# [1] 0.2083331
I also checked another package, oce, and its approx3d() function:
library(oce)
Vmat <- array(data = V, dim = c(length(unique(X)), length(unique(Y)), length(unique(Z))))
approx3d(x=unique(X), y=unique(Y), z=unique(Z), f=Vmat, xout=15, yout=15, zout=-12)
[1] 1.666667
So oce, akima and my function all give pretty different answers. That is either due to a mistake in my code somewhere, or to differences between the underlying Fortran code in akima's interp() and whatever is inside oce's approx3d(); I'll leave that for another day.
I'm not sure what the correct answer is, because the MWE is not exactly "minimal" or simple. But I tested my function with some really simple grids and it seems to give correct answers. Here's one simple 2x2x2 example:
# really, really simple example:
# answer is always the z-coordinate value
sdf <- expand.grid(x=seq(0,1),y=seq(0,1),z=seq(0,1))
sdf$v <- rep(seq(0,1), each=4)
> interp3(sdf,0.25,0.25,.99)
[1] 0.99
> interp3(sdf,0.25,0.25,.4)
[1] 0.4
Trying akima on the simple example, we get the same answer (phew!):
library(akima)
approx(x=unique(sdf$z),
       y=c(interp(x=sdf[sdf$z==0,"x"], y=sdf[sdf$z==0,"y"], z=sdf[sdf$z==0,"v"], xo=.25, yo=.25)$z,
           interp(x=sdf[sdf$z==1,"x"], y=sdf[sdf$z==1,"y"], z=sdf[sdf$z==1,"v"], xo=.25, yo=.25)$z),
       xout=.4)$y
# [1] 0.4
The new example data in the OP's own accepted answer was not possible to interpolate with my simple interp3() function above because:
(a) the grid coordinates are not regularly spaced, and
(b) the coordinates to lookup (x1, y1, z1) lie outside of the grid.
# for completeness, here's the attempt:
options(scipen = 999)
XCoor=c(78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179)
YCoor=c(5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091)
ZCoor=c(-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5)
V=c(2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0)
adf = data.frame(x=XCoor, y=YCoor, z=ZCoor, v=V)
# the first y value looks like a typo?
> head(adf)
x y z v
1 78121.62 5213642.2 -3.0 2.4
2 78121.62 523642.2 -5.0 30.0
3 78121.62 523642.2 -10.0 620.0
4 78121.62 523642.2 -13.0 590.0
5 78136.72 523594.5 -3.5 61.0
6 78136.72 523594.5 -6.5 480.0
x1=198130.000
y1=1913590.000
z1=-8
> interp3(adf, x1,y1,z1)
numeric(0)
Warning message:
In min(dvec[dvec > 0]) : no non-missing arguments to min; returning Inf
Whether or not the test data made sense, I still needed an algorithm. Test data is just that, something to fiddle with, and as test data it was fine.
I wound up programming it in Python. The following code takes X, Y, Z, V and does a 3D inverse distance weighted (IDW) interpolation, where you can set the number of points used in the interpolation. This Python recipe only interpolates to one point (x1, y1, z1), but it is easy enough to extend.
import numpy as np
import math
#34 points
XCoor=np.array([78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179])
YCoor=np.array([5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091])
ZCoor=np.array([-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5])
V=np.array([2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0])
def Distance(x1, y1, z1, Npoints):
    # distance from the lookup point to every data point
    i = 0
    d = []
    while i < len(XCoor):  # loop over all 34 points
        d.append(math.sqrt((x1-XCoor[i])*(x1-XCoor[i]) + (y1-YCoor[i])*(y1-YCoor[i]) + (z1-ZCoor[i])*(z1-ZCoor[i])))
        i = i + 1
    distance = np.array(d)
    # indices of the Npoints nearest data points
    myIndex = distance.argsort()[:Npoints]
    # inverse-distance-squared weighted average of V over those points
    weightedNum = 0
    weightedDen = 0
    for i in myIndex:
        weightedNum = weightedNum + (V[i]/(distance[i]*distance[i]))
        weightedDen = weightedDen + (1/(distance[i]*distance[i]))
    InterpValue = weightedNum/weightedDen
    return InterpValue
x1=198130.000
y1=1913590.000
z1=-8
print(Distance(x1,y1,z1, 12))

How do I estimate the coordinates of points along a graphed line in R?

Assume I have data:
x <- c(1900,1930,1944,1950,1970,1980,1983,1984)
y <- c(100,300,500,1500,2500,3500,4330,6703)
I then plot this data and add a line graph between my known x and y coordinates:
plot(x,y)
lines(x,y)
Is there a way to predict coordinates of unknown points along the graphed line?
You can use approxfun.
f <- approxfun(x, y=y)
f(seq(1900, 2000, length.out = 10))
# [1] 100.0000 174.0741 248.1481 347.6190 574.0741 1777.7778 2333.3333
# [8] 3277.7778 NA NA
Note the NAs where the sequence falls outside the range of the interpolated points (approxfun has rule, yleft and yright arguments to control this; see the sketch below).
You can manually calculate the slope of each line segment.
The equation of a line is given by
y − y1 = grad*(x − x1)
where grad is calculated by the change in y divided by the change in x
We can produce equations for each line by using two points from each line in the plot.
f2 <- function(xnew, X=x, Y=y) {
  # index of the segment each new x value falls in
  id0 <- findInterval(xnew, X, rightmost.closed = TRUE)
  id1 <- id0 + 1
  # slope of that segment
  grad <- (Y[id1] - Y[id0]) / (X[id1] - X[id0])
  Y[id0] + grad * (xnew - X[id0])
}
f2(x)
#[1] 100 300 500 1500 2500 3500 4330 6703
f <- approxfun(x, y=y)  # compare with approxfun
f(seq(1900, 2000, length.out = 10))
# [1] 100.0000 174.0741 248.1481 347.6190 574.0741 1777.7778 2333.3333 3277.7778 NA NA
f2(seq(1900, 2000, length.out = 10))
# [1] 100.0000 174.0741 248.1481 347.6190 574.0741 1777.7778 2333.3333 3277.7778 NA NA
If you wanted to extrapolate using the final slope, you can do this by adding the all.inside = TRUE argument to findInterval, as sketched below.
With that said, approxfun does it better & easier!
Or, if you want to approximate points along the lines at regular intervals, you can use approx instead of approxfun. For that purpose, this may save you a tiny bit of coding.
x <- c(1900,1930,1944,1950,1970,1980,1983,1984)
y <- c(100,300,500,1500,2500,3500,4330,6703)
new_points <- approx(x, y)
lapply(new_points, head)
#> $x
#> [1] 1900.000 1901.714 1903.429 1905.143 1906.857 1908.571
#>
#> $y
#> [1] 100.0000 111.4286 122.8571 134.2857 145.7143 157.1429
plot(new_points)
lines(x,y)
Created on 2022-11-20 with reprex v2.0.2

most occurring value in a vector

I have a vector with 100 values, all generated with runif between 0 and 1.
x <- runif(100,min=0,max=1)
x
[1] 0.84620011 0.82525410 0.31622827 0.08040362 0.12894525 0.23997187 0.57177296 0.91691368 0.65751720
[10] 0.39810175 0.60632205 0.26339035 0.93543618 0.09662383 0.35147739 0.51731042 0.29151612 0.54411769
[19] 0.73688309 0.26086586 0.37808273 0.19163366 0.62776847 0.70973345 0.31802726 0.69101574 0.50042561
[28] 0.20768256 0.23555818 0.21015820 0.18221151 0.85593725 0.12916935 0.52222127 0.62269135 0.51267707
[37] 0.60164023 0.30723904 0.81990231 0.61771762 0.02502631 0.47427724 0.21250040 0.88611710 0.88648546
[46] 0.92586513 0.57015942 0.33454379 0.03572245 0.68120369 0.48692522 0.76587764 0.55214917 0.31137200
[55] 0.47170307 0.48639510 0.68922858 0.73506033 0.23541740 0.81793240 0.17184666 0.06670039 0.55664270
[64] 0.10030533 0.94620061 0.58572228 0.53333567 0.80887841 0.55015406 0.82491114 0.81251132 0.06038019
[73] 0.10918904 0.84011824 0.33169617 0.03568364 0.07703029 0.15601158 0.31623253 0.25021777 0.77024833
[82] 0.88588620 0.49044305 0.10165930 0.55494697 0.17455070 0.94458467 0.43135868 0.99313733 0.04482747
[91] 0.53453604 0.52500493 0.35496966 0.06994880 0.11377845 0.71307042 0.35086237 0.04032254 0.23744845
[100] 0.81131033
Out of all these values in the vector, I need to find the most frequently occurring value (or something close to that). I'm new to R and have no idea how to do this. Please help?
One approach I have: divide all the values into ranges and find the frequency distribution. But would that be helpful?
One way to analyze the distribution of the numbers is to plot a histogram and add an approximate probability density curve.
This can be done with the ggplot2 library:
set.seed(123) # used here for reproducibility
x <- runif(100) # pseudo-random numbers between 0 and 1
library(ggplot2)
p <- ggplot(as.data.frame(x), aes(x=x, y=..density..)) +
  geom_histogram(fill="lightblue", colour="grey60", bins=50) +
  geom_density()
p  # display the plot
The value of bins specified in geom_histogram() is the number of bars in the histogram. You may want to change this value to obtain a different representation of the distribution.
OR
You could use base R and plot a simple histogram:
hist(x)
There you can also change the bin width (see breaks), but the default might be sufficient to show the concept.
You can identify which bin in this histogram has the most entries with
> hist(x)$mids[which.max(hist(x)$counts)]
#[1] 0.45
Which in this case means that most values occur near a value of 0.45 (the middle of the bin describing the range between 0.4 and 0.5).
Hope this helps.
You can do this:
set.seed(12)
x <- runif(100,min=0,max=1)
n <- length(x)
x_cut<-cut(x, breaks = n/4)
which(table(x_cut)==max(table(x_cut)))
The result depends on the breaks value you set. This is an alternative to using a histogram if you don't need one.
To really get just the most frequently occurring value, or when using discrete data as input, you could simply create a table, sort the results, and return the highest value:
values <- c("a", "a", "c", "c", "c")
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"
Breaking it down:
# create a table of the values
table(values)
#> a c
#> 2 3
# sort the table descending on number of occurrences
sort(table(values), decreasing = TRUE)
#> c a
#> 3 2
# now only keep the first value
sort(table(values), decreasing = TRUE)[1]
#> c
#> 3
# so the final line:
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"
If you're feeling like wanting to do fancy stuff, create a function that does this for you:
get_mode <- function(x) {
  names(sort(table(x), decreasing = TRUE)[1])
}
get_mode(values)
#> [1] "c"

How to obtain the density estimation of a specific value in stats::density?

Suppose I have data like the following:
val <- .65
set.seed(1)
distr <- replicate(1000, jitter(.5, amount = .2))
d <- density(distr)
Since stats::density uses a specific bw, it does not include all possible values in the interval (because they're infinite):
d$x[ d$x > .64 & d$x < .66 ]
[1] 0.6400439 0.6411318 0.6422197 0.6433076 0.6443955 0.6454834 0.6465713 0.6476592 0.6487471
[10] 0.6498350 0.6509229 0.6520108 0.6530987 0.6541866 0.6552745 0.6563624 0.6574503 0.6585382
[19] 0.6596261
I would like to find a way to provide val to the density function, so that it will return its d$y estimate (I will then use it to color areas of the density plot).
I can't guess how silly this question is, but I can't find a fast solution.
I thought of obtaining it by linear interpolation of the d$y values corresponding to the two values of d$x that are closest to val. Is there a faster way?
This illustrates the use of approxfun:
> Af <- approxfun(d$x, d$y)
> Af(val)
[1] 2.348879
> plot(d)
> points(val,Af(val) )
> png();plot(d); points(val,Af(val) ); dev.off()

interpolate a series over another series in R

Suppose I have a random time series that I want to interpolate over another time series. How would I do this in R?
# generate time interval series from exponential distribution
s = sort(rexp(10))
# scale between 0 and 1
scale01 = function(x){(x-min(x))/(max(x)-min(x))}
s = scale01(s)
> s
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
# generate random normal series
x = rnorm(20)
> x
[1] -0.82530658 0.92289557 0.39827984 -0.62416117 -1.69055539 -0.28164232 -1.32717654 -1.36992509
[9] -1.54352202 -1.09826247 -0.68260576 1.07307043 2.35298180 -0.41472811 0.38919315 -0.27325343
[17] -1.52592682 0.05400849 -0.43498544 0.73841106
# interpolate 'x' over 's' ?
> approx(x,xout=s)
$x
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
$y
[1] NA NA NA NA NA NA NA NA NA
[10] -0.8253066
I want to interpolate the series 'x' over the series 's'. Let's assume the time interval series for the 'x' series has 20 elements distributed uniformly over the interval [0,1]. Now I want to interpolate the 10 values of 'x' that occur at the time intervals described by 's'.
EDIT:
I think this does the job.
> approx(seq(0,1,length.out=20), x, xout=s)
$x
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
$y
[1] -0.8253066 0.1061033 0.8777987 0.3781018 -0.6221134 -1.5566990 -0.3483466 -0.4703429 -1.4444105
[10] 0.7384111
Thanks for your help, guys. I think I now understand how to use the interpolation functions in R. I should really use a time series data structure here.
This isn't meant as a direct answer to the OP's question, but rather to illustrate how approx() works so the OP can formulate a better question.
Your question makes next to no sense as posed. approx() works by taking a reference set of x and y coordinates and then interpolating to find y at n locations over the range of x, or at the xout locations supplied by the user.
So in your call you don't provide y, and x doesn't contain a y component, so I don't see how this can work.
If you want to interpolate s, so you can find time intervals for any value over the range of s, then:
> approx(s, seq_along(s), n = 20)
$x
[1] 0.00000000 0.05263158 0.10526316 0.15789474 0.21052632 0.26315789
[7] 0.31578947 0.36842105 0.42105263 0.47368421 0.52631579 0.57894737
[13] 0.63157895 0.68421053 0.73684211 0.78947368 0.84210526 0.89473684
[19] 0.94736842 1.00000000
$y
[1] 1.00000 26.25815 42.66323 54.79831 64.96162 76.99433 79.67388
[8] 83.78458 86.14656 89.86223 91.98513 93.36233 93.77353 94.19731
[15] 94.63652 95.26239 97.67724 98.74056 99.40548 100.00000
Here $y contains the interpolated values for s at n = 20 equally spaced locations on the range of s (0,1).
Edit: If x represents the series at unstated time intervals uniform on 0,1 and you want the interpolated values of x at the time intervals s, then you need something like this:
> set.seed(1)
> x <- rnorm(20)
> s <- sort(rexp(10))
> scale01 <- function(x) {
+ (x - min(x)) / (max(x) - min(x))
+ }
> s <- scale01(s)
>
> ## interpolate x at points s
> approx(seq(0, 1, length = length(x)), x, xout = s)
$x
[1] 0.00000000 0.04439851 0.11870795 0.14379236 0.20767388 0.21218632
[7] 0.25498856 0.29079300 0.40426335 1.00000000
$y
[1] -0.62645381 0.05692127 -0.21465011 0.94393053 0.39810806 0.29323742
[7] -0.64197207 -0.13373472 0.62763207 0.59390132
Is that closer to what you want?
