I have some code which plots two lines from a set of points.
I need to find the intersect of the two.
red <- read.table("red.txt", header=TRUE) #Get's table
mag <- red$T #T values
outer <- red$O #outer values
inner <- red$I #inner values
plot(outer, mag) #plot outer points
abline(lm(mag~outer)) #line of best fit
abline(inner, mag) #add inner line
It produces a graph which is similar to what I want.
(I am aware it is not the most stylish). How can I find the coordinates of the lines crossing?
(sorry about the dodgy probably disgusting code I am just learning how to use it)
You can solve this system of equations as #SteveM said using some linear algebra, which is done below using the solve function.
lm gives you the coefficients for the line of best fit you just have to store it, which is done below in the object fit.
You have coded the slope and intercept for your other line. The slope that you are plotting is mag[1] and the intercept is inner[1]. Note: you are passing abline the vectors mag and inner, but this function takes single values so it is only using the first element of each vector.
Using the form mx - y = -b put the x and negated y coefficients into a matrix A. These coefficients are m and -1 for x and y. Put their negated intercepts into a vector b.
The output of solve will give you the x and y values (in that order) where the two lines intersect.
fit <- lm(mag~outer)
plot(outer, mag) #plot outer points
abline(fit) #line of best fit
abline(inner[1], mag[1]) #add inner line
A <- matrix(c(coef(fit)[2], -1,
mag[1], -1), byrow = T, nrow = 2)
b <- c(-coef(fit)[1], -inner[1])
(coord <- solve(A, b))
[1] 1.185901 239.256914
points(coord[1], coord[2], col = "red")
Generate the equations of the lines from the slope/intercept (m/b) regression outputs. So
reg1 <- lm(y1 ~ x1)
reg2 <- lm(y2 ~ x2)
Provide the slopes (m1, m2) and the intercepts (b1, b2)
y1 = m1x1 + b1
y2 = m2x2 + b2
Since x1 = x2 = x and y1 = y2 = y then
m1x - y = -b1
m2x - y = -b2
You can solve the equations for the x,y unknowns.
Related
I would like to fit a line through two points from a random distribution of points, then calculate the location along that line that each point intersects it orthogonally. I am not interested in the residual distance of each point from the line (points above/below the line are treated equally), I am only interested in calculating the location along the line of where that point would intersect (e.g. points at different distances from the line but at the same orthogonal location would have the same value). The data aren't connected to the line explicitly as the abline is drawn from the location of only 2 points, and so i can't extract these values in a classic residual type way. I don't think this is difficult, but I can't wrap by head around how to calculate it and it's really bugging me!
I have explored the dist2d function but that calculates the orthogonal distance of each point to the line. Is there a way to use that value to the then calculate the hypotenuse from the data point to some fixed constant point on the line, and then in turn calculate the adjacent distance from that constant? I would really appreciate any help!
#here is some example starter code here to visualise what I mean
#get random data
r = rnorm(100)
t = rnorm(100)
#bind and turn into a df
data = cbind(r,t)
data = as.data.frame(data)
head(data)
#plot
plot(data)
#want to draw abline between 2 points
#isolate points of interest
#here randomly select first two rows
d = data[c(1:2),]
head(d)
#calculate abline through selected points
lm = lm(t ~ r, d)
abline(lm)
#draw points to see which ones they cut through
points(d$r, d$t, bg = "red", pch = 21)
This code below works.
# Create dataframe
data = data.frame(x = rnorm(100), y = rnorm(100))
plot(data, xlim=c(-3, 3), ylim=c(-3, 3))
# Select two points
data$x1_red <- data[1,1]; data$y1_red <- data[1,2]; data$x2_red <- data[2,1]; data$y2_red <- data[2,2];
points(data$x1_red, data$y1_red, bg = "red", pch = 21); points(data$x2_red, data$y2_red, bg = "red", pch = 21);
# Show a red line where the points intersect
# Get its slope (m_red) and intercept (b_red)
data$m_red <- (data[2,2] - data[1,2]) / (data[2,1] - data[1,1])
data$b_red <- data$y1_red - data$m * data$x1_red
abline(data$b_red, data$m_red, col='red')
# Calculate the orthogonal slope
data$m_blue <- (-1/data$m_red)
abline(0, data$m_blue, col='blue')
# Solve for each point's b-intercept (if using the blue slope)
# y = m_blue * x + b
# b = y - m_blue * x
data$b <- data$y - data$m_blue * data$x
# Solve for where each point (using the m_blue slope) intersects the red line (x' and y')
# y' = m_blue * x' + b
# y' = m_red * x' + b_red
# Set those equations equal to each other and solve for x'
data$x_intersect <- (data$b_red - data$b) / (data$m_blue - data$m_red)
# Then solve for y'
data$y_intersect <- data$m_blue * data$x_intersect + data$b
# Calculate the distance between the point and where it intersects the red line
data$dist <- sqrt( (data$x - data$x_intersect)^2 + (data$y - data$y_intersect)^2 )
I want to see how three variables x, y, and z respond to a function f using R.
I've searched for R solutions (e.g. rgl using 4d plots) but none seem to allow the input of a function as the fourth variable while allowing manipulation of x, y, and z across their full range of values.
# First I create three variables that each have a domain 0 to 4
x
y
z
# Then I create a function from those three variables
f <- sqrt(x^2 + y^2 + z^2)
EDIT: I originally stated that I wanted x, y, and z to be seq(0, 4, 0.01) but in fact I only want them to range from 0 to 4, and do so independently of other variables. In other words, I want to plot the function across a range of values letting x move independently of y and z and so forth, rather than plotting a 3-D line. The result should be a 3-D surface.
I want to:
a) see how the function f responds to all possible combinations of x, y, and z across a range of x, y, and z values 0 to 4, and
b) find what maxima/minima exist especially when holding one variable constant.
This is rather a mathematical questions. Unfortunately, our computer screens are not really made fro 4D, neither our brains. So what you ask wont be possible as if. Indeed, you want to show a dense set of data (a cube between 0 and 4), and we can not display what is "inside" the cube.
To come back to R, you can always display a slice of it, for example fixing z and plot sqrt(x^2 + y^2 + z^2) for x and y. Here you have two examples:
# Points where the function should be evaluated
x <- seq(0, 4, 0.01)
y <- seq(0, 4, 0.01)
z <- seq(0, 4, 0.01)
# Compute the distance from origin
distance <- function(x,y,z) {
sqrt(x^2 + y^2 + z^2)
}
# Matrix to store the results
slice=matrix(0, nrow=length(x),ncol=length(y))
# Fill the matrix with a slice at z=3
i=1
for (y_val in y)
{
slice[,i]=distance(x,y_val,3)
i=i+1
}
# PLot with plot3D library
require(plot3D)
persp3D(z = slice, theta = 100,phi=50)
# PLot with raster library
library(raster)
plot(raster(slice,xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y)))
If you change your z values, you will not really change the shape (just making it "flatter" for bigger z). Note that the function being symmetric in x, y and z, the same plots are produced if you keep xor y constant.
For your last question about the maximum, you can re-use the slice matrix and do:
max_ind=which(slice==max(slice),arr.ind = TRUE)
x[max_ind[,1]]
y[max_ind[,2]]
(see Get the row and column name of the minimum element of a matrix)
But again with math we can see from your equation that the maximum will always be obtained by maxing x, y and z. Indeed, the function simply measure the distance from the origin.
I have a 2-dimensional data set.
I use the R's smooth.spline function to smooth my points graph following an example in this article:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.smooth.spline.html
So that I get the spline graph similar to the green line on this picture
I'd like to know the X values, where the first derivative of the smoothing spline equals zero (to determine exact minimum or maximum).
My problem is that my initial dataset (or a dataset that I could auto-generate) to feed into the predict() function does not contain such exact X values that correspond to the smoothing spline extrema.
How can I find such X values?
Here is the picture of the first derivative of the green spline line above
But exact X coordinate of extremums are still not exact.
My approximate R script to generate the pictures looks like the following
sp1 <- smooth.spline(df)
pred.prime <- predict(sp1, deriv=1)
pred.second <- predict(sp1, deriv=2)
d1 <- data.frame(pred.prime)
d2 <- data.frame(pred.second)
dfMinimums <- d1[abs(d1$y) < 1e-4, c('x','y')]
I think that there are two problems here.
You are using the original x-values and they are spaced too far apart AND
Because of the wide spacing of the x's, your threshold for where you consider the derivative "close enough" to zero is too high.
Here is basically your code but with many more x values and requiring smaller derivatives. Since you do not provide any data, I made a coarse approximation to it that should suffice for illustration.
## Coarse approximation of your data
x = runif(300, 0,45000)
y = sin(x/5000) + sin(x/950)/4 + rnorm(300, 0,0.05)
df = data.frame(x,y)
sp1 <- smooth.spline(df)
Spline code
Sx = seq(0,45000,10)
pred.spline <- predict(sp1, Sx)
d0 <- data.frame(pred.spline)
pred.prime <- predict(sp1, Sx, deriv=1)
d1 <- data.frame(pred.prime)
Mins = which(abs(d1$y) < mean(abs(d1$y))/150)
plot(df, pch=20, col="navy")
lines(sp1, col="darkgreen")
points(d0[Mins,], pch=20, col="red")
The extrema look pretty good.
plot(d1, type="l")
points(d1[Mins,], pch=20, col="red")
The points identified look like zeros of the derivative.
You can use my R package SplinesUtils: https://github.com/ZheyuanLi/SplinesUtils, which can be installed by
devtools::install_github("ZheyuanLi/SplinesUtils")
The function to be used are SmoothSplinesAsPiecePoly and solve. I will just use the example under the documentation.
library(SplinesUtils)
## a toy dataset
set.seed(0)
x <- 1:100 + runif(100, -0.1, 0.1)
y <- poly(x, 9) %*% rnorm(9)
y <- y + rnorm(length(y), 0, 0.2 * sd(y))
## fit a smoothing spline
sm <- smooth.spline(x, y)
## coerce "smooth.spline" object to "PiecePoly" object
oo <- SmoothSplineAsPiecePoly(sm)
## plot the spline
plot(oo)
## find all stationary / saddle points
xs <- solve(oo, deriv = 1)
#[1] 3.791103 15.957159 21.918534 23.034192 25.958486 39.799999 58.627431
#[8] 74.583000 87.049227 96.544430
## predict the "PiecePoly" at stationary / saddle points
ys <- predict(oo, xs)
#[1] -0.92224176 0.38751847 0.09951236 0.10764884 0.05960727 0.52068566
#[7] -0.51029209 0.15989592 -0.36464409 0.63471723
points(xs, ys, pch = 19)
One caveat in the #G5W implementation that I found is that it sometimes returns multiple records close around extrema instead of a single one. On the diagram they cannot be seen, since they all fall into one point effectively.
The following snippet from here filters out single extrema points with the minimum value of the first derivative:
library(tidyverse)
df2 <- df %>%
group_by(round(y, 4)) %>%
filter(abs(d1) == min(abs(d1))) %>%
ungroup() %>%
select(-5)
I'm trying to make a 3D scatterplot with boudaries or zones based on combinations of 3 variables that return certain values. The variables each range between 0:1, and combine to make an index that ranges from -1:1 as follows:
f(x,y,z) = (x*y)-z
I'd like to create a visual representation that will highlight all combinations of variables that return a certain index value. As an example, I can easily show those variables where index > 0 using scatterplot3d (rgl would also work):
# Create imaginary dataset of 50 observations for each variable
x<-runif(50,0,1)
y<-runif(50,0,1)
z<-runif(50,0,1)
# Create subset where f(x,y,z) > 0
x1<-y1<-z1<-1
for (i in 1:length(x)){ if ((x[i]*y[i])-z[i] > 0) {
x1<-rbind(x1, x[i])
y1<-rbind(y1, y[i])
z1<-rbind(z1, z[i])}
}
s3d<-scatterplot3d(x,y,z) # Plot entire dataset
s3d$points3d(x1,y1,z1,pch=19, col="red") # Highlight subset where f(x,y,z) > 0
This gives me the following graph:
It seems fairly intuitive that there should be an easy way to plot either the surface (extending from top/right/back to bottom/left/front) separating the subset from the rest of the data, or else a volume/3D area within which these plots lie. E.g. my first instinct was to use something like surface3d, persp3d or planes3d. However, all attempts so far have only yielded error messages. Most solutions seem to use some form of z<-lm(y~x) but I obviously need something like q<-func((x*y)-z) for all values of x, y and z that yield q > 0.
I know I could calculate extreme points and use them as vertices for a 3D polygon, but that seems too "manual". It feels like I'm overlooking something fairly simple and obvious. I've looked at many similar questions on Stack but can't seem to find one that fits my particular problem. If I've missed any and this question has been answered already, please do point me in the right direction!
Here is a suggestion for an interactive 3D plot that is based on an example from the "R Graphics Cookbook" by Winston Chang.
set.seed(4321)
library(rgl)
interleave <- function(v1,v2) as.vector(rbind(v1,v2))
x <- runif(50)
y <- runif(50)
z <- runif(50)
plot3d(x, y, z, type="s", size=0.6, col=(2+(x*y<z)))
x0 <- y0 <- seq(0, 1, 0.1)
surface3d(x0, y0, outer(x0, y0), alpha=0.4) #plot the surface f(x,y)=x*y
x1 <- x[x * y > z] #select subset that is below the separating surface
y1 <- y[x * y > z]
z1 <- z[x * y > z]
segments3d(interleave(x1, x1), #highlight the distance of the points below the surface
interleave(y1, y1),
interleave(x1 * y1, z1), col="red", alpha=0.4)
If you don't like the red lines and only want the surface and the colored points, this will be sufficient:
plot3d(x,y,z,type="s",size=0.6,col=(2+(x*y<z)))
x0 <- y0 <- seq(0,1,0.1)
surface3d(x0,y0,outer(x0,y0),alpha=0.4)
Does this representation provide the information that you wanted to highlight?
The first thought was to see if the existing functions within scatterplot3d could handle the problem but I think not:
my.lm <- lm(z ~ I(x) * I(y)+0)
s3d$plane3d(my.lm, lty.box = "solid", col="red")
pkg:scatterplot3d doesn't really have a surface3d function so you will need to choose a package that provides that capability; say 'rgl', 'lattice', or 'plot3d'. Any of them should provide the needed facilities.
I am using the taylor.diagram function in the plotrix package e.g.
obs = runif(100,1,100)
mod1 = runif(100,1,100)
mod2 = runif(100,1,100)
mod3 = runif(100,1,100)
taylor.diagram(obs,mod1)
taylor.diagram(obs,mod2,add=TRUE)
taylor.diagram(obs,mod3,add=TRUE)
In the conventional Taylor diagram there is no bias but in his paper (Taylor, 2001, K.E. Summarizing multiple aspects of model performance in a single diagram Taylor JGR, 106, 7183-7192) Taylor says that
"Although the diagram has been designed to convey information about centered pattern differences it is also possible to indicate differences in overall means (i.e., the bias). This can be done on the diagram by attaching to each plotted point a line segment drawn at a right angle to the straight line defined by the point and the reference point. If the length of the attached line segment is equal to the bias, then the distance from the reference point to the end of the line segment will be equal to the total (uncentered) RMS error"
I admit that I don't know where to start to try and do this. Has anyone succeeded at adding this information on the plot?
If I understand correctly the bias is the difference in means between the model vector and the observation vector. Then, the problem is to, (a) find the line between the observation and model points, (b) find a line perpendicular to this line, (c) find a point along the perpendicular line, at a distance from the model point equal to the bias.
One possible solution is:
taylor.bias <- function(ref, model, normalize = FALSE){
R <- cor(model, ref, use = "pairwise")
sd.f <- sd(model)
sd.r <- sd(ref)
m.f <- mean(model)
m.r <- mean(ref)
## normalize if requested
if (normalize) {
m.f <- m.f/sd.r
m.r <- m.r/sd.r
sd.f <- sd.f/sd.r
sd.r <- 1
}
## calculate bias
bias <- m.f - m.r
## coordinates for model and observations
dd <- rbind(mp = c(sd.f * R, sd.f * sin(acos(R))), rp = c(sd.r, 0))
## find equation of line passing through pts
v1 <- solve(cbind(1, dd[,1])) %*% dd[,2]
## find perpendicular line
v2 <- c(dd[1,2] + dd[1,1]/v1[2], -1/v1[2])
## find point defined by bias
nm <- dd[1,] - c(0, v2[1])
nm <- nm / sqrt(sum(nm^2))
bp <- dd[1,] + bias*nm
## plot lines
arrows(x0 = dd[1,1], x1 = bp[1], y0 = dd[1,2], y1 = bp[2], col = "red", length = 0.05, lwd = 1.5)
lines(rbind(dd[2,], bp), col = "red", lty = 3)
lines(dd, col = "red", lty = 3)
}
Then,
library(plotrix)
obs = runif(100,1,100)
mod1 = runif(100,1,100)
taylor.diagram(obs,mod1)
taylor.bias(obs,mod1)
Where the length of the red vector indicates the bias and the length of dotted line joining the vector's tip to the reference point is the RMS error. The direction of the red vector indicates the sign of the bias -- in the picture below, negative bias.