I would like to fit a line through two points from a random distribution of points, then calculate the location along that line that each point intersects it orthogonally. I am not interested in the residual distance of each point from the line (points above/below the line are treated equally), I am only interested in calculating the location along the line of where that point would intersect (e.g. points at different distances from the line but at the same orthogonal location would have the same value). The data aren't connected to the line explicitly as the abline is drawn from the location of only 2 points, and so i can't extract these values in a classic residual type way. I don't think this is difficult, but I can't wrap by head around how to calculate it and it's really bugging me!
I have explored the dist2d function but that calculates the orthogonal distance of each point to the line. Is there a way to use that value to the then calculate the hypotenuse from the data point to some fixed constant point on the line, and then in turn calculate the adjacent distance from that constant? I would really appreciate any help!
#here is some example starter code here to visualise what I mean
#get random data
r = rnorm(100)
t = rnorm(100)
#bind and turn into a df
data = cbind(r,t)
data = as.data.frame(data)
head(data)
#plot
plot(data)
#want to draw abline between 2 points
#isolate points of interest
#here randomly select first two rows
d = data[c(1:2),]
head(d)
#calculate abline through selected points
lm = lm(t ~ r, d)
abline(lm)
#draw points to see which ones they cut through
points(d$r, d$t, bg = "red", pch = 21)
This code below works.
# Create dataframe
data = data.frame(x = rnorm(100), y = rnorm(100))
plot(data, xlim=c(-3, 3), ylim=c(-3, 3))
# Select two points
data$x1_red <- data[1,1]; data$y1_red <- data[1,2]; data$x2_red <- data[2,1]; data$y2_red <- data[2,2];
points(data$x1_red, data$y1_red, bg = "red", pch = 21); points(data$x2_red, data$y2_red, bg = "red", pch = 21);
# Show a red line where the points intersect
# Get its slope (m_red) and intercept (b_red)
data$m_red <- (data[2,2] - data[1,2]) / (data[2,1] - data[1,1])
data$b_red <- data$y1_red - data$m * data$x1_red
abline(data$b_red, data$m_red, col='red')
# Calculate the orthogonal slope
data$m_blue <- (-1/data$m_red)
abline(0, data$m_blue, col='blue')
# Solve for each point's b-intercept (if using the blue slope)
# y = m_blue * x + b
# b = y - m_blue * x
data$b <- data$y - data$m_blue * data$x
# Solve for where each point (using the m_blue slope) intersects the red line (x' and y')
# y' = m_blue * x' + b
# y' = m_red * x' + b_red
# Set those equations equal to each other and solve for x'
data$x_intersect <- (data$b_red - data$b) / (data$m_blue - data$m_red)
# Then solve for y'
data$y_intersect <- data$m_blue * data$x_intersect + data$b
# Calculate the distance between the point and where it intersects the red line
data$dist <- sqrt( (data$x - data$x_intersect)^2 + (data$y - data$y_intersect)^2 )
Related
I have some code which plots two lines from a set of points.
I need to find the intersect of the two.
red <- read.table("red.txt", header=TRUE) #Get's table
mag <- red$T #T values
outer <- red$O #outer values
inner <- red$I #inner values
plot(outer, mag) #plot outer points
abline(lm(mag~outer)) #line of best fit
abline(inner, mag) #add inner line
It produces a graph which is similar to what I want.
(I am aware it is not the most stylish). How can I find the coordinates of the lines crossing?
(sorry about the dodgy probably disgusting code I am just learning how to use it)
You can solve this system of equations as #SteveM said using some linear algebra, which is done below using the solve function.
lm gives you the coefficients for the line of best fit you just have to store it, which is done below in the object fit.
You have coded the slope and intercept for your other line. The slope that you are plotting is mag[1] and the intercept is inner[1]. Note: you are passing abline the vectors mag and inner, but this function takes single values so it is only using the first element of each vector.
Using the form mx - y = -b put the x and negated y coefficients into a matrix A. These coefficients are m and -1 for x and y. Put their negated intercepts into a vector b.
The output of solve will give you the x and y values (in that order) where the two lines intersect.
fit <- lm(mag~outer)
plot(outer, mag) #plot outer points
abline(fit) #line of best fit
abline(inner[1], mag[1]) #add inner line
A <- matrix(c(coef(fit)[2], -1,
mag[1], -1), byrow = T, nrow = 2)
b <- c(-coef(fit)[1], -inner[1])
(coord <- solve(A, b))
[1] 1.185901 239.256914
points(coord[1], coord[2], col = "red")
Generate the equations of the lines from the slope/intercept (m/b) regression outputs. So
reg1 <- lm(y1 ~ x1)
reg2 <- lm(y2 ~ x2)
Provide the slopes (m1, m2) and the intercepts (b1, b2)
y1 = m1x1 + b1
y2 = m2x2 + b2
Since x1 = x2 = x and y1 = y2 = y then
m1x - y = -b1
m2x - y = -b2
You can solve the equations for the x,y unknowns.
I was wondering how I can efficiently (using short R code) fill a curve with points that can fill up the area under my curve?
I have tried something without success, here is my R code:
data = rnorm(1000) ## random data points to fill the curve
curve(dnorm(x), -4, 4) ## curve to be filled by "data" above
points(data) ## plotting the points to fill the curve
Here's a method that uses interpolation to ensure that the plotted points won't exceed the height of the curve (although, if you want the actual point markers to not stick out above the curve, you'll need to set the threshold slightly below the height of the curve):
# Curve to be filled
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
# Generate 1000 random points in the same x-interval and with y value between
# zero and the maximum y-value of the curve
set.seed(2)
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))
# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(c.pts$x,c.pts$y,xout=pts$x)$y, ]
# Plot the points
points(pts, pch=16, col="red", cex=0.7)
A method for plotting exactly a desired number of points under a curve
Responding to #d.b's comment, here's a way to get exactly a desired number of points plotted under a curve:
First, let's figure out how many random points we need to generate over the entire plot region in order to get (roughly) a target number of points under the curve. We do this as follows:
Calculate the area under the curve as a fraction of the area of the rectangle bounded by zero and the maximum height of the curve on the vertical axis, and by the width of the curve on the horizontal axis.
The number of random points we need to generate is the target number of points, divided by the area ratio calculated above.
# Area ratio
aa = sum(c.pts$y*median(diff(c.pts$x)))/(diff(c(-4,4))*max(c.pts$y))
# Target number of points under curve
n.target = 1000
# Number of random points to generate
n = ceiling(n.target/aa)
But we need more points than this to ensure we get at least n.target, because random variation will result in fewer than n.target points about half the time, once we limit the plotted points to those below the curve. So we'll add an excess.factor in order to generate more points under the curve than we need, then we'll just randomly select n.target of those points to plot. Here's a function that takes care of the entire process for a general curve.
# Plot a specified number of points under a curve
pts.under.curve = function(data, n.target=1000, excess.factor=1.5) {
# Area under curve as fraction of area of plot region
aa = sum(data$y*median(diff(data$x)))/(diff(range(data$x))*max(data$y))
# Number of random points to generate
n = excess.factor*ceiling(n.target/aa)
# Generate n random points in x-range of the data and with y value between
# zero and the maximum y-value of the curve
pts = data.frame(x=runif(n,min(data$x),max(data$x)), y=runif(n,0,max(data$y)))
# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(data$x,data$y,xout=pts$x)$y, ]
# Randomly select only n.target points
pts = pts[sample(1:nrow(pts), n.target), ]
# Plot the points
points(pts, pch=16, col="red", cex=0.7)
}
Let's run the function for the original curve:
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
pts.under.curve(c.pts)
Now let's test it with a different distribution:
# Curve to be filled
c.pts = as.data.frame(curve(df(x, df1=100, df2=20),0,5,n=1001))
pts.under.curve(c.pts, n.target=200)
n_points = 10000 #A large number
#Store curve in a variable and plot
cc = curve(dnorm(x), -4, 4, n = n_points)
#Generate 1000 random points
p = data.frame(x = seq(-4,4,length.out = n_points), y = rnorm(n = n_points))
#OR p = data.frame(x = runif(n_points,-4,4), y = rnorm(n = n_points))
#Find out the index of values in cc$x closest to p$x
p$ind = findInterval(p$x, cc$x)
#Only retain those points within the curve whose p$y are smaller than cc$y
p2 = p[p$y >= 0 & p$y < cc$y[p$ind],] #may need p[p$y < 0.90 * cc$y[p$ind],] or something
#Plot points
points(p2$x, p2$y)
Given a set of data I have calculated an ellipse that fit to them using the next command:
eli<-ellipse(cor(x,y),scale=c(sd(x),sd(y)), centre=c(mean(x), mean(y)), level = 0.95)
Where "x" and "y" are the columns of my bivariate data. I would like to know how to find the elements of my ellipse (in red), say: the foci and the a" and "b" values.
In an attempt to find the semi-axis distance I tried to get a lineal regression of the data but I truly doubt of my method
How can I find those parameters? Or get the equation of the ellipse?
Since ellipse generates 100 points, this approach may be accurate enough. Of course you could set npoints to higher value to increase accuracy. I've also made plots to explain.
#rm(list = ls()) #Remove everything from the environment
#Generate some points
set.seed(42)
x = rnorm(20,5,1)
y = rnorm(20,5,2)
#Fit Ellipse
require(ellipse)
eli = ellipse(cor(x,y),scale=c(sd(x),sd(y)), centre=c(mean(x), mean(y)), level = 0.95, npoints = 250)
#Draw ellipse and points
plot(eli[,1], eli[,2], type = "l", asp = 1)
points(x,y)
#Calculate the center of ellipse
eli_center = c(mean(eli[,1]), mean(eli[,2]))
#Plot eli_center
points(eli_center[1], eli_center[2], pch = 19, cex = 1.5)
#A function to calculate distance between points 'x1' and 'x2'
dist_2_points <- function(x1, x2) {
return(sqrt(sum((x1 - x2)^2)))
}
#Compute distance of each point in ellipse from eli_center
distance = numeric(0)
for (i in 1:nrow(eli)){
distance[i] = dist_2_points(eli_center, eli[i,])
}
#The maximum distance from eli_center is 'a'
a = distance[which.max(distance)]
a_point = eli[ which.max(distance), ]
#Draw 'a'
points(a_point[1],a_point[2], pch = 5)
lines(rbind(eli_center, a_point))
#The minimum distance from eli_center is 'b'
b = distance[which.min(distance)]
b_point = eli[ which.min(distance), ]
#Draw 'b'
points(b_point[1],b_point[2], pch = 5)
lines(rbind(eli_center, b_point))
#find foci
foci = sqrt(a^2 - b^2)
This is the code that the car:::ellipse function uses after doing some error checking and other "housekeeping":
angles <- (0:segments) * 2 * pi/segments
unit.circle <- cbind(cos(angles), sin(angles))
Q <- chol(shape, pivot = TRUE)
order <- order(attr(Q, "pivot"))
ellipse <- t(center + radius * t(unit.circle %*% Q[, order]))
colnames(ellipse) <- c("x", "y")
You will notice that the regression line you drew was a bit "off-axis". If you drew in the line from X regressed on Y it would also be "off-axis" in the other direction. Do a search on "total least squares regression" or "Deming regression" (and you'll find some other names that I'm not coming up with off the top of my head.) Regression lines determined by ordinary least squares lines do not go through the major axis of the ellipse that that you are calculating.
I am using the taylor.diagram function in the plotrix package e.g.
obs = runif(100,1,100)
mod1 = runif(100,1,100)
mod2 = runif(100,1,100)
mod3 = runif(100,1,100)
taylor.diagram(obs,mod1)
taylor.diagram(obs,mod2,add=TRUE)
taylor.diagram(obs,mod3,add=TRUE)
In the conventional Taylor diagram there is no bias but in his paper (Taylor, 2001, K.E. Summarizing multiple aspects of model performance in a single diagram Taylor JGR, 106, 7183-7192) Taylor says that
"Although the diagram has been designed to convey information about centered pattern differences it is also possible to indicate differences in overall means (i.e., the bias). This can be done on the diagram by attaching to each plotted point a line segment drawn at a right angle to the straight line defined by the point and the reference point. If the length of the attached line segment is equal to the bias, then the distance from the reference point to the end of the line segment will be equal to the total (uncentered) RMS error"
I admit that I don't know where to start to try and do this. Has anyone succeeded at adding this information on the plot?
If I understand correctly the bias is the difference in means between the model vector and the observation vector. Then, the problem is to, (a) find the line between the observation and model points, (b) find a line perpendicular to this line, (c) find a point along the perpendicular line, at a distance from the model point equal to the bias.
One possible solution is:
taylor.bias <- function(ref, model, normalize = FALSE){
R <- cor(model, ref, use = "pairwise")
sd.f <- sd(model)
sd.r <- sd(ref)
m.f <- mean(model)
m.r <- mean(ref)
## normalize if requested
if (normalize) {
m.f <- m.f/sd.r
m.r <- m.r/sd.r
sd.f <- sd.f/sd.r
sd.r <- 1
}
## calculate bias
bias <- m.f - m.r
## coordinates for model and observations
dd <- rbind(mp = c(sd.f * R, sd.f * sin(acos(R))), rp = c(sd.r, 0))
## find equation of line passing through pts
v1 <- solve(cbind(1, dd[,1])) %*% dd[,2]
## find perpendicular line
v2 <- c(dd[1,2] + dd[1,1]/v1[2], -1/v1[2])
## find point defined by bias
nm <- dd[1,] - c(0, v2[1])
nm <- nm / sqrt(sum(nm^2))
bp <- dd[1,] + bias*nm
## plot lines
arrows(x0 = dd[1,1], x1 = bp[1], y0 = dd[1,2], y1 = bp[2], col = "red", length = 0.05, lwd = 1.5)
lines(rbind(dd[2,], bp), col = "red", lty = 3)
lines(dd, col = "red", lty = 3)
}
Then,
library(plotrix)
obs = runif(100,1,100)
mod1 = runif(100,1,100)
taylor.diagram(obs,mod1)
taylor.bias(obs,mod1)
Where the length of the red vector indicates the bias and the length of dotted line joining the vector's tip to the reference point is the RMS error. The direction of the red vector indicates the sign of the bias -- in the picture below, negative bias.
Is there a way to draw a simple ellipse based on the following definition (instead of eigenvalue) in R?
The definition I want to use is that an ellipse is the set of points in a plane for which the sum of the distances to two fixed points F1 and F2 is a constant.
Should I just use a polar cordinate?
This may be more algorithmic question.
As #DWin suggested, there are several implementations for plotting ellipses (such as function draw.ellipse in package plotrix). To find them:
RSiteSearch("ellipse", restrict="functions")
That being said, implementing your own function is fairly simple if you know a little geometry. Here is an attempt:
ellipse <- function(xf1, yf1, xf2, yf2, k, new=TRUE,...){
# xf1 and yf1 are the coordinates of your focus F1
# xf2 and yf2 are the coordinates of your focus F2
# k is your constant (sum of distances to F1 and F2 of any points on the ellipse)
# new is a logical saying if the function needs to create a new plot or add an ellipse to an existing plot.
# ... is any arguments you can pass to functions plot or lines (col, lwd, lty, etc.)
t <- seq(0, 2*pi, by=pi/100) # Change the by parameters to change resolution
k/2 -> a # Major axis
xc <- (xf1+xf2)/2
yc <- (yf1+yf2)/2 # Coordinates of the center
dc <- sqrt((xf1-xf2)^2 + (yf1-yf2)^2)/2 # Distance of the foci to the center
b <- sqrt(a^2 - dc^2) # Minor axis
phi <- atan(abs(yf1-yf2)/abs(xf1-xf2)) # Angle between the major axis and the x-axis
xt <- xc + a*cos(t)*cos(phi) - b*sin(t)*sin(phi)
yt <- yc + a*cos(t)*sin(phi) + b*sin(t)*cos(phi)
if(new){ plot(xt,yt,type="l",...) }
if(!new){ lines(xt,yt,...) }
}
An example:
F1 <- c(2,3)
F2 <- c(1,2)
plot(rbind(F1, F2), xlim=c(-1,5), ylim=c(-1, 5), pch=19)
abline(h=0, v=0, col="grey90")
ellipse(F1[1], F1[2], F2[1], F2[2], k=2, new=FALSE, col="red", lwd=2)
points((F1[1]+F2[1])/2, (F1[2]+F2[2])/2, pch=3)