Given a set of data, I have calculated an ellipse that fits them using the following command:
eli<-ellipse(cor(x,y),scale=c(sd(x),sd(y)), centre=c(mean(x), mean(y)), level = 0.95)
Here "x" and "y" are the columns of my bivariate data. I would like to know how to find the elements of my ellipse (in red), namely the foci and the "a" and "b" values.
In an attempt to find the semi-axis lengths I tried fitting a linear regression to the data, but I doubt my method is correct.
How can I find those parameters, or get the equation of the ellipse?
Since ellipse generates 100 points by default, this numerical approach may be accurate enough. You can set npoints to a higher value to increase accuracy. I've also added plots to illustrate each step.
#rm(list = ls()) #Remove everything from the environment
#Generate some points
set.seed(42)
x = rnorm(20,5,1)
y = rnorm(20,5,2)
#Fit Ellipse
require(ellipse)
eli = ellipse(cor(x,y),scale=c(sd(x),sd(y)), centre=c(mean(x), mean(y)), level = 0.95, npoints = 250)
#Draw ellipse and points
plot(eli[,1], eli[,2], type = "l", asp = 1)
points(x,y)
#Calculate the center of ellipse
eli_center = c(mean(eli[,1]), mean(eli[,2]))
#Plot eli_center
points(eli_center[1], eli_center[2], pch = 19, cex = 1.5)
#A function to calculate distance between points 'x1' and 'x2'
dist_2_points <- function(x1, x2) {
return(sqrt(sum((x1 - x2)^2)))
}
#Compute distance of each point in ellipse from eli_center
distance = numeric(0)
for (i in 1:nrow(eli)){
distance[i] = dist_2_points(eli_center, eli[i,])
}
#The maximum distance from eli_center is 'a'
a = distance[which.max(distance)]
a_point = eli[ which.max(distance), ]
#Draw 'a'
points(a_point[1],a_point[2], pch = 5)
lines(rbind(eli_center, a_point))
#The minimum distance from eli_center is 'b'
b = distance[which.min(distance)]
b_point = eli[ which.min(distance), ]
#Draw 'b'
points(b_point[1],b_point[2], pch = 5)
lines(rbind(eli_center, b_point))
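#Distance from eli_center to each focus (the foci lie on the major axis, i.e. on the line through eli_center and a_point)
foci = sqrt(a^2 - b^2)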
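The same quantities can also be obtained in closed form rather than from the sampled outline. This is a minimal sketch, assuming that the ellipse drawn by ellipse() with level = 0.95 is the contour (p - centre)' S^-1 (p - centre) = qchisq(0.95, 2), where S is the sample covariance matrix of x and y. The names S, t_sq, c_dist, focus1, focus2 and on_ellipse are introduced here for illustration only.

```r
# Same simulated data as above
set.seed(42)
x <- rnorm(20, 5, 1)
y <- rnorm(20, 5, 2)

centre <- c(mean(x), mean(y))
S <- cov(cbind(x, y))                 # sample covariance matrix
t_sq <- qchisq(0.95, df = 2)          # squared "radius" of the 95% contour

eig <- eigen(S, symmetric = TRUE)     # eigenvalues come back in decreasing order
a <- sqrt(t_sq * eig$values[1])       # semi-major axis
b <- sqrt(t_sq * eig$values[2])       # semi-minor axis
c_dist <- sqrt(a^2 - b^2)             # distance from the centre to each focus

major_dir <- eig$vectors[, 1]         # unit vector along the major axis
focus1 <- centre + c_dist * major_dir
focus2 <- centre - c_dist * major_dir

# Implicit equation: a point p lies on the ellipse when this equals t_sq
on_ellipse <- function(p) mahalanobis(p, centre, S)
vertex <- centre + a * major_dir      # end point of the major axis
on_ellipse(vertex) - t_sq             # numerically zero
```

The values of a and b found from the sampled outline should approach these as npoints grows.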
This is the code that the car:::ellipse function uses after doing some error checking and other "housekeeping":
angles <- (0:segments) * 2 * pi/segments
unit.circle <- cbind(cos(angles), sin(angles))
Q <- chol(shape, pivot = TRUE)
order <- order(attr(Q, "pivot"))
ellipse <- t(center + radius * t(unit.circle %*% Q[, order]))
colnames(ellipse) <- c("x", "y")
You will notice that the regression line you drew is a bit "off-axis". If you drew the line from X regressed on Y, it would also be "off-axis", in the other direction. Search for "total least squares regression" or "Deming regression" (you'll find some other names that I can't recall off the top of my head). Regression lines determined by ordinary least squares do not run along the major axis of the ellipse that you are calculating.
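To make the "off-axis" point concrete, here is a small sketch on hypothetical simulated data. It compares the two ordinary least squares slopes with the slope of the first principal axis, which is the direction of the major axis of a covariance ellipse and of the orthogonal (total least squares) fit when both variables are treated alike. All variable names are made up for this example.

```r
set.seed(1)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)             # hypothetical, positively correlated data

slope_y_on_x <- unname(coef(lm(y ~ x))[2])       # ordinary least squares, y on x
slope_x_on_y <- 1 / unname(coef(lm(x ~ y))[2])   # x on y, expressed as dy/dx

pc1 <- prcomp(cbind(x, y))$rotation[, 1]         # first principal axis
slope_major <- unname(pc1[2] / pc1[1])           # slope of the major axis

c(y_on_x = slope_y_on_x, major_axis = slope_major, x_on_y = slope_x_on_y)
```

For positively correlated data the major-axis slope falls between the two regression slopes, which is why neither regression line lies along the axis.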
I would like to fit a line through two points chosen from a random distribution of points, then calculate, for each point, the location along that line where the point projects onto it orthogonally. I am not interested in the residual distance of each point from the line (points above/below the line are treated equally); I am only interested in the position along the line of each point's orthogonal projection (e.g. points at different distances from the line but at the same orthogonal location would have the same value). The data aren't connected to the line explicitly, as the abline is drawn from the location of only 2 points, so I can't extract these values in a classic residual-type way. I don't think this is difficult, but I can't wrap my head around how to calculate it and it's really bugging me!
I have explored the dist2d function, but that calculates the orthogonal distance of each point to the line. Is there a way to use that value to then calculate the hypotenuse from the data point to some fixed point on the line, and in turn calculate the adjacent distance from that fixed point? I would really appreciate any help!
#here is some example starter code to visualise what I mean
#get random data
r = rnorm(100)
t = rnorm(100)
#bind and turn into a df
data = cbind(r,t)
data = as.data.frame(data)
head(data)
#plot
plot(data)
#want to draw abline between 2 points
#isolate points of interest
#here randomly select first two rows
d = data[c(1:2),]
head(d)
#calculate abline through selected points
fit = lm(t ~ r, d) #named 'fit' so that the lm() function is not masked
abline(fit)
#draw points to see which ones they cut through
points(d$r, d$t, bg = "red", pch = 21)
This code below works.
# Create dataframe
data = data.frame(x = rnorm(100), y = rnorm(100))
plot(data, xlim=c(-3, 3), ylim=c(-3, 3))
# Select two points
data$x1_red <- data[1,1]; data$y1_red <- data[1,2]; data$x2_red <- data[2,1]; data$y2_red <- data[2,2];
points(data$x1_red, data$y1_red, bg = "red", pch = 21); points(data$x2_red, data$y2_red, bg = "red", pch = 21);
# Show a red line where the points intersect
# Get its slope (m_red) and intercept (b_red)
data$m_red <- (data[2,2] - data[1,2]) / (data[2,1] - data[1,1])
data$b_red <- data$y1_red - data$m_red * data$x1_red
abline(data$b_red, data$m_red, col='red')
# Calculate the orthogonal slope
data$m_blue <- (-1/data$m_red)
abline(0, data$m_blue, col='blue')
# Solve for each point's b-intercept (if using the blue slope)
# y = m_blue * x + b
# b = y - m_blue * x
data$b <- data$y - data$m_blue * data$x
# Solve for where each point (using the m_blue slope) intersects the red line (x' and y')
# y' = m_blue * x' + b
# y' = m_red * x' + b_red
# Set those equations equal to each other and solve for x'
data$x_intersect <- (data$b_red - data$b) / (data$m_blue - data$m_red)
# Then solve for y'
data$y_intersect <- data$m_blue * data$x_intersect + data$b
# Calculate the distance between the point and where it intersects the red line
data$dist <- sqrt( (data$x - data$x_intersect)^2 + (data$y - data$y_intersect)^2 )
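The code above ends with the perpendicular distance to the red line. The position along the line, which is what the question asks for, can be read off as a scalar projection onto the line's unit direction vector. This is a minimal sketch on hypothetical data; pts, p1, p2, u, along and foot are names introduced here, and the origin of the "along" scale is arbitrarily taken to be the first selected point.

```r
set.seed(1)
pts <- cbind(x = rnorm(100), y = rnorm(100))   # hypothetical points

p1 <- pts[1, ]                                 # the two points defining the line
p2 <- pts[2, ]
u <- (p2 - p1) / sqrt(sum((p2 - p1)^2))        # unit vector along the line

rel <- sweep(pts, 2, p1)                       # coordinates relative to p1
along <- as.vector(rel %*% u)                  # signed position along the line

# Foot of the perpendicular for each point (where it meets the line)
foot <- outer(along, u) + matrix(p1, nrow(pts), 2, byrow = TRUE)

head(cbind(pts, along, foot))
```

Points at different distances from the line but with the same foot get the same value of along, and the value is negative for points that project behind p1.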
Using the interp function (Akima package), it is possible to draw the surface corresponding to the bivariate interpolation of a data set, see example below (from interp documentation):
library(rgl)
library(akima)
data(akima)
# data visualisation
rgl.spheres(akima$x,akima$z , akima$y,0.5,color="red")
rgl.bbox()
# bivariate linear interpolation
# interp:
akima.li <- interp(akima$x, akima$y, akima$z,
xo=seq(min(akima$x), max(akima$x), length = 100),
yo=seq(min(akima$y), max(akima$y), length = 100))
# interp surface:
rgl.surface(akima.li$x,akima.li$y,akima.li$z,color="green",alpha=c(0.5))
However, the output is only a list describing a set of points, not a general function.
Question: is there any method to obtain a function z = f(x,y) that matches the previously obtained surface? I know that it works using interp(akima$x, akima$y, akima$z, xo=A, yo=B), but it is very slow.
For a single variable, the approxfun() function would do the job, but I could not find the equivalent for interpolation over several variables.
If you want a linear interpolation so that the surface passes through all points, you will not be able to express it as a single formula z = f(x,y), unless the dataset was simulated from that kind of function.
If you are looking for a function z = f(x,y) that approximates your point set, you will have to build a model, with a GLM or GAM for instance. However, this implies that the surface will not pass through all data points and there will be some residuals.
As I usually work with spatial datasets, that is x and y coordinates with a z observation, I will give you some pointers from that perspective.
First, I prepare a dataset for interpolation:
library(rgl)
library(akima)
library(dplyr)
library(tidyr)
data(akima)
data.akima <- as.data.frame(akima)
# data visualisation
rgl.spheres(akima$x, akima$z , akima$y,0.5,color="red")
rgl.bbox()
# Dataset for interpolation
seq_x <- seq(min(akima$x) - 1, max(akima$x) + 1, length.out = 20)
seq_y <- seq(min(akima$y) - 1, max(akima$y) + 1, length.out = 20)
data.pred <- expand.grid(x = seq_x, y = seq_y, KEEP.OUT.ATTRS = FALSE) # all x/y combinations of the grid
Then, I use your akima interpolation function:
# bivariate linear interpolation
# interp:
akima.li <- interp(akima$x, akima$y, akima$z,
xo=seq_x,
yo=seq_y)
# interp surface:
rgl.surface(akima.li$x,akima.li$y,akima.li$z,color="green",alpha=c(0.5))
rgl.spheres(akima$x, akima$z , akima$y,0.5,color="red")
rgl.bbox()
Using rasters
From here, if you want interpolated values at specific points, you can reuse the interp function or decide to work with a rasterized image. Using rasters, you can increase the resolution and extract the value at any spatial position.
# Using rasters
library(raster)
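# interp returns z[i, j] at (x[i], y[j]); raster expects rows running from ymax down to ymin,
# hence the transpose and the row flip
r.pred <- raster(t(akima.li$z)[length(seq_y):1, ], xmn = min(seq_x), xmx = max(seq_x),
ymn = min(seq_y), ymx = max(seq_y))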
plot(r.pred)
## Further bilinear interpolations
## Double raster resolution
r.pred.2 <- disaggregate(r.pred, fact = 2, method = "bilinear")
plot(r.pred.2)
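The gridded result of interp (vectors x and y plus a matrix z) can also be wrapped in a closure to get something close to a two-dimensional approxfun(). This is a minimal base R sketch, assuming a regular grid without NA cells; make_bilinear and the toy grid are hypothetical and not part of akima.

```r
# Build f(x, y) by bilinear interpolation on a grid where gz[i, j] is the
# value at (gx[i], gy[j]). Points outside the grid are extrapolated from
# the nearest cell rather than returned as NA.
make_bilinear <- function(gx, gy, gz) {
  function(x, y) {
    i <- findInterval(x, gx, all.inside = TRUE)   # cell index along x
    j <- findInterval(y, gy, all.inside = TRUE)   # cell index along y
    tx <- (x - gx[i]) / (gx[i + 1] - gx[i])       # relative position in the cell
    ty <- (y - gy[j]) / (gy[j + 1] - gy[j])
    (1 - tx) * (1 - ty) * gz[cbind(i, j)] +
      tx * (1 - ty) * gz[cbind(i + 1, j)] +
      (1 - tx) * ty * gz[cbind(i, j + 1)] +
      tx * ty * gz[cbind(i + 1, j + 1)]
  }
}

# Toy grid sampled from a plane, which bilinear interpolation reproduces exactly
gx <- seq(0, 1, length.out = 5)
gy <- seq(0, 2, length.out = 9)
gz <- outer(gx, gy, function(x, y) 2 * x + 3 * y)

f <- make_bilinear(gx, gy, gz)
f(0.33, 1.27)   # equals 2 * 0.33 + 3 * 1.27
```

With the interpolation result from above, the call would be make_bilinear(akima.li$x, akima.li$y, akima.li$z); cells that interp left as NA (outside the convex hull of the data) would yield NA.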
Spatial interpolation (inverse distance interpolation or kriging)
When thinking about spatial interpolation, I first think of kriging. This will smooth your surface, so it will not pass through every data point. The example below uses inverse distance weighting.
# Spatial inverse distance interpolation
library(sp)
library(gstat)
# Transform data as spatial objects
data.akima.sp <- data.akima
coordinates(data.akima.sp) <- ~x+y
data.pred.sp <- data.pred
coordinates(data.pred.sp) <- ~x+y
# Inverse distance interpolation
# idp is set to 2 as weight for interpolation is :
# w = 1/dist^idp
# nmax is set to 3, so that only the 3 closest points are used for interpolation
pred.idw <- idw(
formula = as.formula("z~1"),
locations = data.akima.sp,
newdata = data.pred.sp,
idp = 2,
nmax = 3)
data.spread.idw <- data.pred %>%
dplyr::select(x, y) %>% # keep only the grid coordinates; dplyr:: prefix because raster also exports select()
mutate(idw = pred.idw$var1.pred) %>%
tidyr::spread(key = y, value = idw) %>%
dplyr::select(-x)
surface3d(seq_x, seq_y, as.matrix(data.spread.idw), col = "green")
rgl.spheres(akima$x, akima$y , akima$z, 0.5, color = "red")
rgl.bbox()
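To show what the idp and nmax settings do, here is a hand-written inverse distance weighting sketch in base R. It is not gstat's implementation; idw_predict and the three sample points are hypothetical.

```r
# Predict z at (x0, y0) from the nmax nearest observations,
# each weighted by 1 / distance^idp
idw_predict <- function(x0, y0, x, y, z, idp = 2, nmax = 3) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  if (any(d == 0)) return(z[which(d == 0)[1]])   # exactly on an observation
  nn <- order(d)[seq_len(min(nmax, length(d)))]  # indices of the nearest points
  w <- 1 / d[nn]^idp
  sum(w * z[nn]) / sum(w)
}

# Hypothetical observations on a line
x <- c(0, 2, 10)
y <- c(0, 0, 0)
z <- c(1, 3, 100)

idw_predict(1, 0, x, y, z, nmax = 2)   # midway between the first two points
idw_predict(0, 0, x, y, z)             # on the first observation
```

A larger idp gives the nearest observations more influence, and a smaller nmax makes the prediction more local.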
Interpolate using gam or glm
However, if you want to find a formula like z = f(x,y), you should use a GLM or GAM, with degrees of freedom chosen according to the smoothness you hope to see. Another advantage is that you can add other covariates, not only x and y. The model needs to be fitted with an x/y interaction.
Here is an example with a simple GAM smooth:
# Approximation with a gam model
library(mgcv)
gam1 <- gam(z ~ te(x, y), data = data.akima)
summary(gam1)
plot(gam1)
data.pred$pred <- predict(gam1, data.pred)
data.spread <- tidyr::spread(data.pred, key = y, value = pred) %>%
dplyr::select(-x)
surface3d(seq_x, seq_y, as.matrix(data.spread), col = "blue")
rgl.spheres(akima$x, akima$y , akima$z, 0.5, color = "red")
rgl.bbox()
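Once a model is fitted, wrapping predict() in a closure gives a callable z = f(x, y). The sketch below uses glm on hypothetical simulated data so that it runs with base R alone; the same wrapper should work with the gam1 object above, assuming predict() accepts a newdata data frame with columns x and y.

```r
set.seed(1)
# Hypothetical data from a known surface plus a little noise
d <- data.frame(x = runif(200), y = runif(200))
d$z <- 1 + 2 * d$x - 3 * d$y + 4 * d$x * d$y + rnorm(200, sd = 0.01)

fit <- glm(z ~ x * y, data = d)        # linear terms plus the x/y interaction

# f(x, y): vectorised wrapper around predict()
f <- function(x, y) unname(predict(fit, newdata = data.frame(x = x, y = y)))

f(0.5, 0.5)                            # close to 1 + 1 - 1.5 + 1 = 1.5
f(c(0, 1), c(0, 1))                    # several points at once
```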
Does this answer go in the right direction for you?
Dear Crowd
Problem
I tried to calculate a spatial correlogram with the packages ncf, pgirmess, SpatialPack and spdep. However, I had trouble defining the start and end point of the distance range. I'm only interested in the spatial autocorrelation at smaller distances, but with smaller bins there. Additionally, as the raster is quite large (1.8 megapixels), I run into memory problems with all of these packages except SpatialPack.
So I tried to write my own code, using the function Moran from the package raster. But I must have an error somewhere, as the result for the complete dataset is somewhat different from the one given by the other packages. If there is no error in my code, it might at least help others with similar problems.
Question
I'm not sure whether my focal matrix is erroneous. Could you please tell me whether the central pixel needs to be included? Using the test data I can't show the differences between the methods, but on my complete dataset there are visible differences, as shown in the image below. The bins are not exactly the same (50 m vs. 69 m), which might explain part of the differences. However, for the first bin this explanation does not seem plausible to me. Or might the irregular shape of my raster, and different ways of handling NAs, cause the difference?
Comparison of Own method with the one from SpatialPack
Runnable Example
Testdata
The code for calculating the testdata is taken from http://www.petrkeil.com/?p=1050#comment-416317
# packages used for the data generation
library(raster)
library(vegan) # will be used for PCNM
# empty matrix and spatial coordinates of its cells
side=30
my.mat <- matrix(NA, nrow=side, ncol=side)
x.coord <- rep(1:side, each=side)*5
y.coord <- rep(1:side, times=side)*5
xy <- data.frame(x.coord, y.coord)
# all pairwise euclidean distances between the cells
xy.dist <- dist(xy)
# PCNM axes of the dist. matrix (from 'vegan' package)
pcnm.axes <- pcnm(xy.dist)$vectors
# using 8th PCNM axis as my artificial z variable
z.value <- pcnm.axes[,8]*200 + rnorm(side*side, 0, 1)
# plotting the artificial spatial data
r <- rasterFromXYZ(xyz = cbind(xy,z.value))
plot(r, axes=F)
Own Code
library(raster)
sp.Corr <- matrix(nrow = 0,ncol = 2)
formerBreak <- 0 #for the first run important
for (i in c(seq(10,200,10))) #Calculate the Morans I for these bins
{
cat(paste0("..",i)) #print the bin, which is currently calculated
w = focalWeight(r,d = i,type = 'circle')
wTemp <- w #temporarily saves the weight matrix
if (formerBreak>0) #from the second run onwards
{
midpoint <- ceiling(ncol(w)/2) # get the midpoint
idx <- (midpoint-formerBreak):(midpoint+formerBreak) #rows/columns covered by the previous, smaller window
w[idx,idx] <- w[idx,idx]*(wOld==0) #set the previous focal weights to 0
w <- w*(1/sum(w)) #normalizes the weights so that they sum to 1
}
wOld <- wTemp #save this weight matrix for the next run
mor <- Moran(r,w = w)
sp.Corr <- rbind(sp.Corr,c(Moran =mor,Distance = i))
formerBreak <- i/res(r)[1]#divides the breaks by the resolution of the raster to be able to translate them to the focal window
}
plot(x=sp.Corr[,2],y = sp.Corr[,1],type = "l",ylab = "Moran's I",xlab="Upper bound of distance")
Other methods to calculate the Spatial Correlogram
library(SpatialPack)
sp.Corr <- summary(modified.ttest(z.value,z.value,coords = xy,nclass = 21))
plot(x=sp.Corr$coef[,1],y = sp.Corr$coef[,4],type = "l",ylab = "Moran's I",xlab="Upper bound of distance")
library(ncf)
ncf.cor <- correlog(x.coord, y.coord, z.value,increment=10, resamp=1)
plot(ncf.cor)
To compare the correlogram results in your case, two things should be considered. (i) Your code only works for bins that are multiples of the resolution of your raster. A small difference in the bins can then include or exclude a substantial number of pairs. (ii) The irregular shape of the raster has a strong impact on which pairs are considered when computing the correlation for a given distance interval. So your code should deal with both: allow any value for the bin length and take the irregular shape of the raster into account. A small modification of your code to tackle those problems is below.
# SpatialPack correlation
library(SpatialPack)
test <- modified.ttest(z.value,z.value,coords = xy,nclass = 21)
# Own correlation
bins <- test$upper.bounds
library(raster)
sp.Corr <- matrix(nrow = 0,ncol = 2)
for (i in bins) {
cat(paste0("..",i)) #print the bin, which is currently calculated
w = focalWeight(r,d = i,type = 'circle')
wTemp <- w #temporarily saves the weight matrix
if (i > bins[1]) {
midpoint <- ceiling(dim(w)/2) # get the midpoint
half_range <- floor(dim(wOld)/2)
w[(midpoint[1] - half_range[1]):(midpoint[1] + half_range[1]),
(midpoint[2] - half_range[2]):(midpoint[2] + half_range[2])] <-
w[(midpoint[1] - half_range[1]):(midpoint[1] + half_range[1]),
(midpoint[2] - half_range[2]):(midpoint[2] + half_range[2])]*(wOld==0)
w <- w * (1/sum(w)) #normalizes the vector to sum the weights to 1
}
wOld <- wTemp #save this weight matrix for the next run
mor <- Moran(r,w=w)
sp.Corr <- rbind(sp.Corr,c(Moran =mor,Distance = i))
}
# Comparing
plot(x=test$upper.bounds, test$imoran[,1], col = 2,type = "b",ylab = "Moran's I",xlab="Upper bound of distance", lwd = 2)
lines(x=sp.Corr[,2],y = sp.Corr[,1], col = 3)
points(x=sp.Corr[,2],y = sp.Corr[,1], col = 3)
legend('topright', legend = c('SpatialPack', 'Own code'), col = 2:3, lty = 1, lwd = 2:1)
The image shows that the results from the SpatialPack package and from the custom code are the same.
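For reference, Moran's I for a single distance bin can be written out directly in base R. This is only a sketch under the conventional definition, in which a cell is not its own neighbour (the weight w_ii is 0), with binary weights for all pairs whose distance falls in (lower, upper]. moran_band and the toy grid are hypothetical, and the full n x n distance matrix makes this unsuitable for a raster with millions of cells.

```r
moran_band <- function(z, coords, lower, upper) {
  d <- as.matrix(dist(coords))            # all pairwise distances
  w <- (d > lower & d <= upper) * 1       # binary weights for this distance bin
  diag(w) <- 0                            # a cell is not its own neighbour
  zc <- z - mean(z)                       # centred values
  (length(z) / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

# Hypothetical 6 x 6 grid with unit spacing
grid <- expand.grid(x = 1:6, y = 1:6)
checker <- (grid$x + grid$y) %% 2         # alternating 0/1 pattern
gradient <- grid$x                        # smooth trend along x

moran_band(checker, grid, 0, 1.1)         # direct neighbours always differ
moran_band(gradient, grid, 0, 1.1)        # direct neighbours are similar
```

The checkerboard gives the most negative value possible for this bin and the gradient gives a positive one, which is a quick sanity check for any custom weight matrix.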
I am using the taylor.diagram function in the plotrix package, e.g.
obs = runif(100,1,100)
mod1 = runif(100,1,100)
mod2 = runif(100,1,100)
mod3 = runif(100,1,100)
taylor.diagram(obs,mod1)
taylor.diagram(obs,mod2,add=TRUE)
taylor.diagram(obs,mod3,add=TRUE)
The conventional Taylor diagram does not show bias, but in his paper (Taylor, K.E., 2001, Summarizing multiple aspects of model performance in a single diagram, JGR, 106, 7183-7192) Taylor says that
"Although the diagram has been designed to convey information about centered pattern differences it is also possible to indicate differences in overall means (i.e., the bias). This can be done on the diagram by attaching to each plotted point a line segment drawn at a right angle to the straight line defined by the point and the reference point. If the length of the attached line segment is equal to the bias, then the distance from the reference point to the end of the line segment will be equal to the total (uncentered) RMS error"
I admit that I don't know where to start to try and do this. Has anyone succeeded at adding this information on the plot?
If I understand correctly, the bias is the difference in means between the model vector and the observation vector. The problem is then to (a) find the line between the observation and model points, (b) find a line perpendicular to this line, and (c) find a point along the perpendicular line at a distance from the model point equal to the bias.
One possible solution is:
taylor.bias <- function(ref, model, normalize = FALSE){
R <- cor(model, ref, use = "pairwise")
sd.f <- sd(model)
sd.r <- sd(ref)
m.f <- mean(model)
m.r <- mean(ref)
## normalize if requested
if (normalize) {
m.f <- m.f/sd.r
m.r <- m.r/sd.r
sd.f <- sd.f/sd.r
sd.r <- 1
}
## calculate bias
bias <- m.f - m.r
## coordinates for model and observations
dd <- rbind(mp = c(sd.f * R, sd.f * sin(acos(R))), rp = c(sd.r, 0))
## find equation of line passing through pts
v1 <- solve(cbind(1, dd[,1])) %*% dd[,2]
## find perpendicular line
v2 <- c(dd[1,2] + dd[1,1]/v1[2], -1/v1[2])
## find point defined by bias
nm <- dd[1,] - c(0, v2[1])
nm <- nm / sqrt(sum(nm^2))
bp <- dd[1,] + bias*nm
## plot lines
arrows(x0 = dd[1,1], x1 = bp[1], y0 = dd[1,2], y1 = bp[2], col = "red", length = 0.05, lwd = 1.5)
lines(rbind(dd[2,], bp), col = "red", lty = 3)
lines(dd, col = "red", lty = 3)
}
Then,
library(plotrix)
obs = runif(100,1,100)
mod1 = runif(100,1,100)
taylor.diagram(obs,mod1)
taylor.bias(obs,mod1)
Here the length of the red vector indicates the bias, and the length of the dotted line joining the vector's tip to the reference point is the RMS error. The direction of the red vector indicates the sign of the bias; in the picture below, the bias is negative.
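The right-angle construction works because the total RMS error, the bias and the centered RMS error satisfy a Pythagorean relation. A quick numerical check on hypothetical data (the names crmse and rmse are introduced here):

```r
set.seed(1)
obs <- runif(100, 1, 100)   # hypothetical observations
mod <- runif(100, 1, 100)   # hypothetical model output

bias <- mean(mod) - mean(obs)

# Centered RMS error: each series has its own mean removed first
crmse <- sqrt(mean(((mod - mean(mod)) - (obs - mean(obs)))^2))

# Total (uncentered) RMS error
rmse <- sqrt(mean((mod - obs)^2))

c(total = rmse^2, bias_plus_centered = bias^2 + crmse^2)   # the two agree
```

On the diagram, the centered RMS error is the distance between the model point and the reference point, so a segment of length |bias| drawn at a right angle from the model point ends at a distance equal to the total RMS error from the reference point.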
Is there a way to draw a simple ellipse based on the following definition (instead of eigenvalue) in R?
The definition I want to use is that an ellipse is the set of points in a plane for which the sum of the distances to two fixed points F1 and F2 is a constant.
Should I just use polar coordinates?
This may be more of an algorithmic question.
As @DWin suggested, there are several implementations for plotting ellipses (such as function draw.ellipse in package plotrix). To find them:
RSiteSearch("ellipse", restrict="functions")
That being said, implementing your own function is fairly simple if you know a little geometry. Here is an attempt:
ellipse <- function(xf1, yf1, xf2, yf2, k, new=TRUE,...){
# xf1 and yf1 are the coordinates of your focus F1
# xf2 and yf2 are the coordinates of your focus F2
# k is your constant (sum of distances to F1 and F2 of any points on the ellipse)
# new is a logical saying if the function needs to create a new plot or add an ellipse to an existing plot.
# ... is any arguments you can pass to functions plot or lines (col, lwd, lty, etc.)
t <- seq(0, 2*pi, by=pi/100) # Change the by parameters to change resolution
a <- k/2 # Semi-major axis
xc <- (xf1+xf2)/2
yc <- (yf1+yf2)/2 # Coordinates of the center
dc <- sqrt((xf1-xf2)^2 + (yf1-yf2)^2)/2 # Distance of the foci to the center
b <- sqrt(a^2 - dc^2) # Semi-minor axis
phi <- atan2(yf1-yf2, xf1-xf2) # Angle between the major axis and the x-axis (atan2 keeps the sign, so foci on a descending line are handled too)
xt <- xc + a*cos(t)*cos(phi) - b*sin(t)*sin(phi)
yt <- yc + a*cos(t)*sin(phi) + b*sin(t)*cos(phi)
if(new){ plot(xt,yt,type="l",...) }
if(!new){ lines(xt,yt,...) }
}
An example:
F1 <- c(2,3)
F2 <- c(1,2)
plot(rbind(F1, F2), xlim=c(-1,5), ylim=c(-1, 5), pch=19)
abline(h=0, v=0, col="grey90")
ellipse(F1[1], F1[2], F2[1], F2[2], k=2, new=FALSE, col="red", lwd=2)
points((F1[1]+F2[1])/2, (F1[2]+F2[2])/2, pch=3)
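As a check of the definition, the sketch below returns the points instead of plotting them and verifies that the sum of the distances to the two foci equals k everywhere on the curve. ellipse_points is a hypothetical helper written for this example; it uses atan2 for the orientation so that foci lying on a descending line are handled as well.

```r
ellipse_points <- function(f1, f2, k, n = 200) {
  t <- seq(0, 2 * pi, length.out = n)
  a <- k / 2                                   # semi-major axis
  centre <- (f1 + f2) / 2
  dc <- sqrt(sum((f1 - f2)^2)) / 2             # distance from centre to a focus
  stopifnot(a > dc)                            # otherwise no ellipse exists
  b <- sqrt(a^2 - dc^2)                        # semi-minor axis
  phi <- atan2(f1[2] - f2[2], f1[1] - f2[1])   # orientation of the major axis
  cbind(x = centre[1] + a * cos(t) * cos(phi) - b * sin(t) * sin(phi),
        y = centre[2] + a * cos(t) * sin(phi) + b * sin(t) * cos(phi))
}

F1 <- c(0, 1)
F2 <- c(1, 0)                                  # foci on a descending line
pts <- ellipse_points(F1, F2, k = 2)

d_sum <- sqrt((pts[, 1] - F1[1])^2 + (pts[, 2] - F1[2])^2) +
  sqrt((pts[, 1] - F2[1])^2 + (pts[, 2] - F2[2])^2)
range(d_sum)                                   # both ends equal k = 2
```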