Obtain function from akima::interp() matrix - r

Using the interp function (Akima package), it is possible to draw the surface corresponding to the bivariate interpolation of a data set, see example below (from interp documentation):
library(rgl)
library(akima)
data(akima)
# data visualisation
rgl.spheres(akima$x, akima$z, akima$y, 0.5, color = "red")
rgl.bbox()
# bivariate linear interpolation
# interp:
akima.li <- interp(akima$x, akima$y, akima$z,
                   xo = seq(min(akima$x), max(akima$x), length = 100),
                   yo = seq(min(akima$y), max(akima$y), length = 100))
# interp surface:
rgl.surface(akima.li$x, akima.li$y, akima.li$z, color = "green", alpha = 0.5)
However, the output is only a list describing a set of points, not a general function.
Question: is there any method to obtain a function z = f(x,y) that matches the previously obtained surface? I know that it works using interp(akima$x, akima$y, akima$z, xo=A, yo=B), but it is very slow.
In two dimensions, the approxfun() function would do the job, but I could not find an equivalent for interpolation with more than one predictor.
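For reference, the slow approach I mention is essentially a wrapper around interp(), which re-runs the whole interpolation on every call (a rough sketch):
# Slow work-around: wrap interp() so it looks like a function z = f(x, y);
# every call redoes the full interpolation, hence the poor performance
f.slow <- function(A, B) {
  interp(akima$x, akima$y, akima$z, xo = A, yo = B)$z
}
f.slow(10, 12) # single interpolated value (returned as a 1x1 matrix)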

If you want a linear interpolation such that the surface passes through all points, you will not be able to express it as a function z = f(x,y), unless the dataset was simulated from such a function in the first place.
If you are looking for a function z = f(x,y) that matches your point set, you will have to build a model, with a GLM or a GAM for instance. However, this means the surface will not pass through all data points and there will be some residuals.
As I usually work with spatial datasets, i.e. x and y coordinates with a z observation, I will give you some pointers in that direction.
First, I prepare a dataset for interpolation:
library(rgl)
library(akima)
library(dplyr)
library(tidyr)
data(akima)
data.akima <- as.data.frame(akima)
# data visualisation
rgl.spheres(akima$x, akima$z , akima$y,0.5,color="red")
rgl.bbox()
# Dataset for interpolation
seq_x <- seq(min(akima$x) - 1, max(akima$x) + 1, length.out = 20)
seq_y <- seq(min(akima$y) - 1, max(akima$y) + 1, length.out = 20)
data.pred <- dplyr::full_join(data.frame(x = seq_x, by = 1),
                              data.frame(y = seq_y, by = 1)) %>%
  dplyr::select(-by)
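As a side note, the same regular prediction grid could be built without the dummy join column, for instance with base expand.grid (the row order differs, but that does not matter for the steps below):
# Equivalent prediction grid built with base R
data.pred.alt <- expand.grid(x = seq_x, y = seq_y)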
Then, I use your akima interpolation function:
# bivariate linear interpolation
# interp:
akima.li <- interp(akima$x, akima$y, akima$z,
                   xo = seq_x,
                   yo = seq_y)
# interp surface:
rgl.surface(akima.li$x,akima.li$y,akima.li$z,color="green",alpha=c(0.5))
rgl.spheres(akima$x, akima$z , akima$y,0.5,color="red")
rgl.bbox()
Using rasters
From here, if you want interpolated values at specific positions, you can either re-use the interp function or work with a rasterized image. With a raster you can then increase the resolution and read off a value at any spatial position.
# Using rasters
library(raster)
r.pred <- raster(akima.li$z, xmn = min(seq_x), xmx = max(seq_x),
                 ymn = min(seq_y), ymx = max(seq_y))
plot(r.pred)
## Further bilinear interpolations
## Double raster resolution
r.pred.2 <- disaggregate(r.pred, fact = 2, method = "bilinear")
plot(r.pred.2)
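Once the raster exists, values at arbitrary coordinates can be read off with extract(); the points below are arbitrary examples, just to show the call:
## Query the interpolated surface at arbitrary (x, y) positions
## ("bilinear" interpolates between the four surrounding cells)
pts <- cbind(x = c(10, 15.3), y = c(12, 17.8)) # example coordinates, chosen arbitrarily
extract(r.pred.2, pts, method = "bilinear")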
Spatial interpolation (inverse distance interpolation or kriging)
When thinking about spatial interpolation, I first think of kriging. This will smooth your surface, so it will not pass through every data point.
# Spatial inverse distance interpolation
library(sp)
library(gstat)
# Transform data as spatial objects
data.akima.sp <- data.akima
coordinates(data.akima.sp) <- ~x+y
data.pred.sp <- data.pred
coordinates(data.pred.sp) <- ~x+y
# Inverse distance interpolation
# idp is set to 2 as weight for interpolation is :
# w = 1/dist^idp
# nmax is set to 3, so that only the 3 closest points are used for interpolation
pred.idw <- idw(formula = as.formula("z~1"),
                locations = data.akima.sp,
                newdata = data.pred.sp,
                idp = 2,
                nmax = 3)
data.spread.idw <- data.pred %>%
  mutate(idw = pred.idw$var1.pred) %>%
  tidyr::spread(key = y, value = idw) %>%
  dplyr::select(-x)
surface3d(seq_x, seq_y, as.matrix(data.spread.idw), col = "green")
rgl.spheres(akima$x, akima$y , akima$z, 0.5, color = "red")
rgl.bbox()
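Kriging itself follows the same pattern as the idw() call above; here is a minimal sketch with gstat, where the spherical variogram model and its starting values are rough guesses that would need checking on real data:
# Ordinary kriging sketch (variogram model and starting values are rough guesses)
vgm.init <- vgm(psill = var(data.akima$z), model = "Sph", range = 10)
vgm.fit <- fit.variogram(variogram(z ~ 1, data.akima.sp), model = vgm.init)
pred.krige <- krige(z ~ 1, locations = data.akima.sp, newdata = data.pred.sp,
                    model = vgm.fit)
# predictions are in pred.krige$var1.pred, kriging variance in pred.krige$var1.var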
Interpolate using gam or glm
However, if you want to find a formula like z = f(x,y), you should use a GLM or a GAM with high degrees of freedom, depending on how much smoothing you hope to see. Another advantage is that you can add other covariates, not only x and y. The model needs to be fitted with an x/y interaction.
Here is an example with a simple GAM smooth:
# Approximation with a gam model
library(mgcv)
gam1 <- gam(z ~ te(x, y), data = data.akima)
summary(gam1)
plot(gam1)
data.pred$pred <- predict(gam1, data.pred)
data.spread <- tidyr::spread(data.pred, key = y, value = pred) %>%
dplyr::select(-x)
surface3d(seq_x, seq_y, as.matrix(data.spread), col = "blue")
rgl.spheres(akima$x, akima$y , akima$z, 0.5, color = "red")
rgl.bbox()
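If you prefer an explicit parametric formula rather than a spline smooth, a polynomial GLM with an x/y interaction is another option; the degree below is an arbitrary choice for illustration:
# Parametric alternative: polynomial surface with an x/y interaction (degree chosen arbitrarily)
glm1 <- glm(z ~ poly(x, 3) * poly(y, 3), data = data.akima)
summary(glm1)
data.pred$pred.glm <- predict(glm1, newdata = data.pred)
# the fitted coefficients give an explicit z = f(x, y), at the cost of some residuals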
Does this answer go in the right direction for you?

Related

How to get Principal Component Data in PAM in R

I created a graph with the autoplot function using the mtcars data and got a graph like this.
Here is my code:
library(cluster)
library(NbClust)
library(ggplot2)
library(ggfortify)
x <- mtcars
number.cluster <- NbClust(x, distance = "euclidean", min.nc = 1, max.nc = 5, method = "complete", index = "ch")
best.cluster <- as.numeric(number.cluster$Best.nc[1])
x.pam <- pam(x, best.cluster)
autoplot(x.pam, data = x, frame = T) + ggtitle("PAM MTCARS")
My question is: how do I get the PC1 and PC2 coordinates shown in this graph?
Thank you.
You can use layer_data() to get the data used for a ggplot object:
p <- autoplot(x.pam, data = x, frame = T) + ggtitle("PAM MTCARS")
layer_data(p, 1L) # coordinates of all points
layer_data(p, 2L) # coordinates of points that contribute to polygons
Your entire process is flawed. First you use complete linkage to estimate the number of clusters; but rather than using the "best" clustering found, you then cluster again with PAM.
You use Euclidean distance, but in Euclidean space k-means will usually work better than PAM - PAM shines when you don't have Euclidean geometry and cannot use k-means.
And then you want to use this PCA plot, which is heavily distorted (almost the entire variance is in the first component; the y axis visualizes pretty much random deviation). Just use PCA directly if you want these coordinates, rather than reconstructing them from the plot.
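If all you need are the PC1/PC2 coordinates, you can also compute the PCA directly instead of reading it back from the plot; a short sketch (whether the variables should be scaled needs to match what the plotting function did):
# Compute the principal component coordinates directly
pca <- prcomp(mtcars, scale. = TRUE) # use scale. = FALSE for an unscaled PCA
head(pca$x[, 1:2]) # PC1 and PC2 coordinates of each car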

Identify all local extrema of a fitted smoothing spline via R function 'smooth.spline'

I have a 2-dimensional data set.
I use R's smooth.spline function to smooth my points graph, following an example in this article:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.smooth.spline.html
That way I get a spline graph similar to the green line in this picture.
I'd like to know the X values, where the first derivative of the smoothing spline equals zero (to determine exact minimum or maximum).
My problem is that my initial dataset (or a dataset that I could auto-generate) to feed into the predict() function does not contain the exact X values that correspond to the smoothing spline extrema.
How can I find such X values?
Here is the picture of the first derivative of the green spline line above
But the X coordinates of the extrema read off this way are still not exact.
My approximate R script to generate the pictures looks like the following:
sp1 <- smooth.spline(df)
pred.prime <- predict(sp1, deriv=1)
pred.second <- predict(sp1, deriv=2)
d1 <- data.frame(pred.prime)
d2 <- data.frame(pred.second)
dfMinimums <- d1[abs(d1$y) < 1e-4, c('x','y')]
I think that there are two problems here:
1. You are using the original x-values, and they are spaced too far apart.
2. Because of the wide spacing of the x's, your threshold for where you consider the derivative "close enough" to zero is too high.
Here is basically your code but with many more x values and requiring smaller derivatives. Since you do not provide any data, I made a coarse approximation to it that should suffice for illustration.
## Coarse approximation of your data
x = runif(300, 0,45000)
y = sin(x/5000) + sin(x/950)/4 + rnorm(300, 0,0.05)
df = data.frame(x,y)
sp1 <- smooth.spline(df)
Spline code
Sx = seq(0,45000,10)
pred.spline <- predict(sp1, Sx)
d0 <- data.frame(pred.spline)
pred.prime <- predict(sp1, Sx, deriv=1)
d1 <- data.frame(pred.prime)
Mins = which(abs(d1$y) < mean(abs(d1$y))/150)
plot(df, pch=20, col="navy")
lines(sp1, col="darkgreen")
points(d0[Mins,], pch=20, col="red")
The extrema look pretty good.
plot(d1, type="l")
points(d1[Mins,], pch=20, col="red")
The points identified look like zeros of the derivative.
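If you need the zeros more precisely than the grid spacing allows, one option is to locate sign changes of the predicted first derivative and refine each of them with uniroot(); a sketch building on the objects above:
# Refine each sign change of the first derivative to a root of f'(x)
f.prime <- function(x) predict(sp1, x, deriv = 1)$y
sign.change <- which(diff(sign(d1$y)) != 0) # grid intervals that bracket a zero
roots <- sapply(sign.change, function(i)
  uniroot(f.prime, interval = c(d1$x[i], d1$x[i + 1]))$root)
roots # x positions of the extrema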
You can use my R package SplinesUtils: https://github.com/ZheyuanLi/SplinesUtils, which can be installed by
devtools::install_github("ZheyuanLi/SplinesUtils")
The functions to be used are SmoothSplineAsPiecePoly and solve. I will just use the example under the documentation.
library(SplinesUtils)
## a toy dataset
set.seed(0)
x <- 1:100 + runif(100, -0.1, 0.1)
y <- poly(x, 9) %*% rnorm(9)
y <- y + rnorm(length(y), 0, 0.2 * sd(y))
## fit a smoothing spline
sm <- smooth.spline(x, y)
## coerce "smooth.spline" object to "PiecePoly" object
oo <- SmoothSplineAsPiecePoly(sm)
## plot the spline
plot(oo)
## find all stationary / saddle points
xs <- solve(oo, deriv = 1)
#[1] 3.791103 15.957159 21.918534 23.034192 25.958486 39.799999 58.627431
#[8] 74.583000 87.049227 96.544430
## predict the "PiecePoly" at stationary / saddle points
ys <- predict(oo, xs)
#[1] -0.92224176 0.38751847 0.09951236 0.10764884 0.05960727 0.52068566
#[7] -0.51029209 0.15989592 -0.36464409 0.63471723
points(xs, ys, pch = 19)
One caveat I found in @G5W's implementation is that it sometimes returns multiple records clustered around an extremum instead of a single one. They cannot be seen on the diagram, since they effectively all fall into one point.
The following snippet from here filters them down to a single extremum point, keeping the record with the minimum absolute value of the first derivative:
library(tidyverse)
# assumes 'df' holds the spline output with a column y (fitted value) and a column d1
# (first derivative); select(-5) drops the grouping column that group_by() adds,
# assuming df has four columns before grouping
df2 <- df %>%
  group_by(round(y, 4)) %>%
  filter(abs(d1) == min(abs(d1))) %>%
  ungroup() %>%
  select(-5)

Finding the parts of the ellipse from my data in R

Given a set of data I have calculated an ellipse that fits them using the following command:
eli<-ellipse(cor(x,y),scale=c(sd(x),sd(y)), centre=c(mean(x), mean(y)), level = 0.95)
Where "x" and "y" are the columns of my bivariate data. I would like to know how to find the elements of my ellipse (in red), say: the foci and the a" and "b" values.
In an attempt to find the semi-axis distance I tried to get a lineal regression of the data but I truly doubt of my method
How can I find those parameters? Or get the equation of the ellipse?
Since ellipse generates 100 points by default, this approach may be accurate enough. Of course you could set npoints to a higher value to increase accuracy. I've also made plots to explain.
#rm(list = ls()) #Remove everything from the environment
#Generate some points
set.seed(42)
x = rnorm(20,5,1)
y = rnorm(20,5,2)
#Fit Ellipse
require(ellipse)
eli = ellipse(cor(x,y),scale=c(sd(x),sd(y)), centre=c(mean(x), mean(y)), level = 0.95, npoints = 250)
#Draw ellipse and points
plot(eli[,1], eli[,2], type = "l", asp = 1)
points(x,y)
#Calculate the center of ellipse
eli_center = c(mean(eli[,1]), mean(eli[,2]))
#Plot eli_center
points(eli_center[1], eli_center[2], pch = 19, cex = 1.5)
#A function to calculate distance between points 'x1' and 'x2'
dist_2_points <- function(x1, x2) {
  return(sqrt(sum((x1 - x2)^2)))
}
#Compute distance of each point in ellipse from eli_center
distance = numeric(0)
for (i in 1:nrow(eli)){
  distance[i] = dist_2_points(eli_center, eli[i,])
}
#The maximum distance from eli_center is 'a'
a = distance[which.max(distance)]
a_point = eli[ which.max(distance), ]
#Draw 'a'
points(a_point[1],a_point[2], pch = 5)
lines(rbind(eli_center, a_point))
#The minimum distance from eli_center is 'b'
b = distance[which.min(distance)]
b_point = eli[ which.min(distance), ]
#Draw 'b'
points(b_point[1],b_point[2], pch = 5)
lines(rbind(eli_center, b_point))
#find the focal distance: c = sqrt(a^2 - b^2) is the distance from the centre to each focus
foci = sqrt(a^2 - b^2)
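Note that sqrt(a^2 - b^2) is the distance from the centre to each focus, not the foci themselves; the two foci lie on the major axis at that distance from the centre, and can be drawn from the quantities already computed:
#The foci lie on the major axis, at distance 'foci' from eli_center
major_dir = (a_point - eli_center)/a #unit vector along the semi-major axis
focus1 = eli_center + foci*major_dir
focus2 = eli_center - foci*major_dir
points(rbind(focus1, focus2), pch = 3)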
This is the code that the car:::ellipse function uses after doing some error checking and other "housekeeping":
angles <- (0:segments) * 2 * pi/segments
unit.circle <- cbind(cos(angles), sin(angles))
Q <- chol(shape, pivot = TRUE)
order <- order(attr(Q, "pivot"))
ellipse <- t(center + radius * t(unit.circle %*% Q[, order]))
colnames(ellipse) <- c("x", "y")
You will notice that the regression line you drew was a bit "off-axis". If you drew the line for X regressed on Y it would also be "off-axis", in the other direction. Do a search on "total least squares regression" or "Deming regression" (you'll find some other names that I'm not coming up with off the top of my head). Regression lines determined by ordinary least squares do not run along the major axis of the ellipse that you are calculating.
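Related to this, the semi-axis lengths and the major-axis direction can also be read directly from the eigendecomposition of the covariance matrix, using the same 95% chi-square radius that ellipse() uses by default; a sketch with the x and y from above:
#Semi-axes and orientation straight from the covariance matrix
S <- cov(cbind(x, y))
ev <- eigen(S)
radius <- sqrt(qchisq(0.95, df = 2)) #same scaling as level = 0.95 in ellipse()
a_len <- radius * sqrt(ev$values[1]) #semi-major axis length
b_len <- radius * sqrt(ev$values[2]) #semi-minor axis length
major_axis_dir <- ev$vectors[, 1] #direction of the major axis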

Spatial correlogram using the raster package

Dear Crowd
Problem
I tried to calculate a spatial correlogram with the packages ncf, pgirmess, SpatialPack and spdep. However, I had trouble defining the start and end points of the distance bins: I'm only interested in the spatial autocorrelation at smaller distances, but with finer bins there. Additionally, as the raster is quite large (1.8 megapixels), I ran into memory trouble with all of these packages except SpatialPack.
So I tried to write my own code, using the function Moran from the raster package. But I must have made some error, as the result for the complete dataset is somewhat different from the one given by the other packages. If there is no error in my code, it might at least help others with similar problems.
Question
I'm not sure whether my focal matrix is erroneous. Could you please tell me whether the central pixel needs to be included? Using the test data I can't show the differences between the methods, but on my complete dataset there are visible differences, as shown in the image below. However, the bins are not exactly the same (50 m vs. 69 m), so this might explain part of the differences; at the first bin, though, this explanation does not seem plausible to me. Or might the irregular shape of my raster, and the different ways of handling NAs, cause the difference?
Comparison of Own method with the one from SpatialPack
Runnable Example
Test data
The code for generating the test data is taken from http://www.petrkeil.com/?p=1050#comment-416317
# packages used for the data generation
library(raster)
library(vegan) # will be used for PCNM
# empty matrix and spatial coordinates of its cells
side=30
my.mat <- matrix(NA, nrow=side, ncol=side)
x.coord <- rep(1:side, each=side)*5
y.coord <- rep(1:side, times=side)*5
xy <- data.frame(x.coord, y.coord)
# all pairwise euclidean distances between the cells
xy.dist <- dist(xy)
# PCNM axes of the dist. matrix (from 'vegan' package)
pcnm.axes <- pcnm(xy.dist)$vectors
# using the 8th PCNM axis as my artificial z variable
z.value <- pcnm.axes[,8]*200 + rnorm(side*side, 0, 1)
# plotting the artificial spatial data
r <- rasterFromXYZ(xyz = cbind(xy,z.value))
plot(r, axes=F)
Own Code
library(raster)
sp.Corr <- matrix(nrow = 0,ncol = 2)
formerBreak <- 0 # important for the first run
for (i in seq(10, 200, 10)) # calculate Moran's I for these bins
{
  cat(paste0("..", i)) # print the bin which is currently calculated
  w <- focalWeight(r, d = i, type = 'circle')
  wTemp <- w # temporarily saves the weight matrix
  if (formerBreak > 0) # from the second run onwards
  {
    midpoint <- ceiling(ncol(w)/2) # get the midpoint
    # set the focal weights already covered by the previous bin to 0
    w[(midpoint - formerBreak):(midpoint + formerBreak),
      (midpoint - formerBreak):(midpoint + formerBreak)] <-
      w[(midpoint - formerBreak):(midpoint + formerBreak),
        (midpoint - formerBreak):(midpoint + formerBreak)] * (wOld == 0)
    w <- w * (1/sum(w)) # renormalize so that the weights sum to 1
  }
  wOld <- wTemp # save this weight matrix for the next run
  mor <- Moran(r, w = w)
  sp.Corr <- rbind(sp.Corr, c(Moran = mor, Distance = i))
  formerBreak <- i/res(r)[1] # divide the break by the raster resolution to translate it to the focal window
}
plot(x=sp.Corr[,2],y = sp.Corr[,1],type = "l",ylab = "Moran's I",xlab="Upper bound of distance")
Other methods to calculate the Spatial Correlogram
library(SpatialPack)
sp.Corr <- summary(modified.ttest(z.value, z.value, coords = xy, nclass = 21))
plot(x = sp.Corr$coef[,1], y = sp.Corr$coef[,4], type = "l",
     ylab = "Moran's I", xlab = "Upper bound of distance")
library(ncf)
ncf.cor <- correlog(x.coord, y.coord, z.value,increment=10, resamp=1)
plot(ncf.cor)
In order to compare the results of the correlogram in your case, two things should be considered. (i) Your code only works for bins proportional to the resolution of your raster; a small difference in the bins can include or exclude a substantial number of pairs. (ii) The irregular shape of the raster has a strong impact on which pairs are considered when computing the correlation for a given distance interval. So your code should deal with both: allow any value for the bin length, and account for the irregular shape of the raster. A small modification of your code to tackle those problems is below.
# SpatialPack correlation
library(SpatialPack)
test <- modified.ttest(z.value,z.value,coords = xy,nclass = 21)
# Own correlation
bins <- test$upper.bounds
library(raster)
sp.Corr <- matrix(nrow = 0,ncol = 2)
for (i in bins) {
  cat(paste0("..", i)) # print the bin which is currently calculated
  w <- focalWeight(r, d = i, type = 'circle')
  wTemp <- w # temporarily saves the weight matrix
  if (i > bins[1]) {
    midpoint <- ceiling(dim(w)/2) # get the midpoint
    half_range <- floor(dim(wOld)/2)
    w[(midpoint[1] - half_range[1]):(midpoint[1] + half_range[1]),
      (midpoint[2] - half_range[2]):(midpoint[2] + half_range[2])] <-
      w[(midpoint[1] - half_range[1]):(midpoint[1] + half_range[1]),
        (midpoint[2] - half_range[2]):(midpoint[2] + half_range[2])] * (wOld == 0)
    w <- w * (1/sum(w)) # renormalize so that the weights sum to 1
  }
  wOld <- wTemp # save this weight matrix for the next run
  mor <- Moran(r, w = w)
  sp.Corr <- rbind(sp.Corr, c(Moran = mor, Distance = i))
}
# Comparing
plot(x=test$upper.bounds, test$imoran[,1], col = 2,type = "b",ylab = "Moran's I",xlab="Upper bound of distance", lwd = 2)
lines(x=sp.Corr[,2],y = sp.Corr[,1], col = 3)
points(x=sp.Corr[,2],y = sp.Corr[,1], col = 3)
legend('topright', legend = c('SpatialPack', 'Own code'), col = 2:3, lty = 1, lwd = 2:1)
The image shows that the results from the SpatialPack package and from the modified code are the same.

spatial clustering in R (simple example)

I have this simple data.frame
lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
data=data.frame(lat,lon)
The idea is to find the spatial clusters based on the distance
First, I plot the map (lon,lat) :
plot(data$lon,data$lat)
so clearly I have three clusters based on the distance between the points.
For this aim, I've tried this code in R :
d <- as.matrix(dist(cbind(data$lon, data$lat))) # create distance matrix
d <- ifelse(d < 5, d, 0) # keep only distances < 5
d=as.dist(d)
hc<-hclust(d) # hierarchical clustering
plot(hc)
data$clust <- cutree(hc,k=3) # cut the dendrogram to generate 3 clusters
This gives :
Now I try to plot the same points but with colors from clusters
plot(data$lon, data$lat, col = c("red","blue","green")[data$clust], pch = 19)
Here the results
Which is not what I'm looking for.
Actually, I want to find something like this plot
Thank you for help.
What about something like this:
lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
km <- kmeans(cbind(lat, lon), centers = 3)
plot(lon, lat, col = km$cluster, pch = 20)
Here's a different approach. First it assumes that the coordinates are WGS-84 and not UTM (flat). Then it clusters all neighbors within a given radius to the same cluster using hierarchical clustering (with method = single, which adopts a 'friends of friends' clustering strategy).
In order to compute the distance matrix, I'm using the rdist.earth method from the package fields. The default earth radius for this package is 6378.388 (the equatorial radius) which might not be what one is looking for, so I've changed it to 6371. See this article for more info.
library(fields)
lon = c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat = c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)
threshold.in.km <- 40
coors <- data.frame(lon,lat)
#distance matrix
dist.in.km.matrix <- rdist.earth(coors,miles = F,R=6371)
#clustering
fit <- hclust(as.dist(dist.in.km.matrix), method = "single")
clusters <- cutree(fit,h = threshold.in.km)
plot(lon, lat, col = clusters, pch = 20)
This could be a good solution if you don't know the number of clusters (like the k-means option), and is somewhat related to the dbscan option with minPts = 1.
---EDIT---
With the original data:
lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
data=data.frame(lat,lon)
dist <- rdist.earth(data,miles = F,R=6371) #dist <- dist(data) if data is UTM
fit <- hclust(as.dist(dist), method = "single")
clusters <- cutree(fit,h = 1000) #h = 2 if data is UTM
plot(lon, lat, col = clusters, pch = 20)
As you have spatial data to cluster, DBSCAN is well suited to your data.
You can do this clustering with the dbscan() function provided by fpc, an R package.
library(fpc)
lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
DBSCAN <- dbscan(cbind(lat, lon), eps = 1.5, MinPts = 3)
plot(lon, lat, col = DBSCAN$cluster, pch = 20)
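If you are unsure about eps, a common heuristic is to plot the sorted k-nearest-neighbour distances (with k equal to MinPts) and pick eps near the knee of the curve; the dbscan package (a different package from fpc) provides a helper for this:
# Heuristic for choosing eps: look for the knee in the sorted k-NN distances
library(dbscan) # note: a different package from fpc
kNNdistplot(cbind(lat, lon), k = 3) # k matches MinPts used above
abline(h = 1.5, lty = 2) # the eps used above, for comparison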
