Let's say I have 3 points (5,0), (0,5), (-5,0) and I want to find a point equidistant from those 3 points (in short, find the center of the circle passing through those 3 points). I know from geometry that if my answer is (a,b) then I can compute the distance between (a,b) and each of the 3 points, equate them, and solve the resulting simultaneous equations. How can I quickly do this in R? I know the equations will be linear because all the square terms cancel out.
_____________________________update1
I tried searching Google for how to solve linear equations in R, but I didn't get good results: all the links expect me to provide the LHS coefficients and the RHS value for all 3 equations, and I don't have the RHS. I have to take 2 equations at a time and move terms around to find the RHS. Is there an R package that will do this for me?
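(For illustration, a minimal sketch of that setup using nothing beyond base R's solve(): subtracting the squared-distance equations pairwise cancels the a^2 and b^2 terms and leaves a 2x2 linear system.)
# Sketch: subtract the squared-distance equations pairwise so the quadratic
# terms cancel, then solve the remaining 2x2 linear system with solve()
pts <- matrix(c(5, 0,
                0, 5,
               -5, 0), ncol = 2, byrow = TRUE)
A <- 2 * rbind(pts[2, ] - pts[1, ],
               pts[3, ] - pts[1, ])
rhs <- c(sum(pts[2, ]^2) - sum(pts[1, ]^2),
         sum(pts[3, ]^2) - sum(pts[1, ]^2))
center <- solve(A, rhs)  # gives c(0, 0) for these three points
center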
You've probably moved on by now, but I'm going to add to this thread for posterity's sake. I suppose people were just mad that you didn't post an attempt? I stumbled upon this post looking for help, and some of the respondents up there were completely unhelpful. I tried your approach without much success for my application and wrote the following function, circlefit(), to perform a least-squares fit of a circle to points along an arc when supplied a data frame with column 1 = X and column 2 = Y. I'm pretty sure the code above failed in my application because I had >100 points along a "fuzzy" edge, so my application was better suited to a least-squares approach.
cheers!
library(dplyr)    # circlefit() below uses %>% and mutate()
library(ggplot2)  # and ggplot()/annotate() for the plots
x_x<-c(0, 0.5, 1, 1.5, 2, 2.5, 3)
y_x<-c(0, 0.25, 1, 2.25, 4, 6.25, 9)
df<-as.data.frame(cbind(x_x, y_x))
circlefit<-function (df){
names(df)<-c("X", "Y")
#find mean x and y so we can calculate the difference of each X, Y from its respective mean
xmean<-mean(df$X)
ymean<-mean(df$Y)
#adds needed columns for summations required to perform least squares
mat2<-df%>%
mutate(a=X-xmean)%>%
mutate(b=Y-ymean)%>%
mutate(aa=a^2)%>%
mutate(ab=a*b)%>%
mutate(bb=b^2)%>%
mutate(aaa=a^3)%>%
mutate(abb=a*b^2)%>%
mutate(baa=b*a^2)%>%
mutate(bbb=b^3)
#column sums for construction of linear system of equations
Saa<-sum(mat2$aa)
Sab<-sum(mat2$ab)
Sbb<-sum(mat2$bb)
Saaa<-sum(mat2$aaa)
Sbbb<-sum(mat2$bbb)
Sabb<-sum(mat2$abb)
Sbaa<-sum(mat2$baa)
#linear system of equations
sums_row1<-c(Saa, Sab)
sums_row2<-c(Sab, Sbb)
sum_mat<-as.matrix(rbind(sums_row1, sums_row2), nrow=2)
#gauss elimination ratio
gauss_ratio<-sum_mat[1,2]/sum_mat[1,1]
#new eliminated row
elim_row2<-sums_row2-(sums_row1*gauss_ratio)
#right-hand side of the linear system for (A, B)
Ac<-0.5*(Saaa+Sabb)
Bc<-0.5*(Sbbb+Sbaa)
#result of Bc after elimination
elim_Bc<-Bc-(gauss_ratio*Ac)
#final deviation of (A, B) from mean
fin_Bc<-elim_Bc/elim_row2[2]
fin_Ac<-(Ac-(fin_Bc*sum_mat[1,2]))/sum_mat[1,1]
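#(equivalently, solve(sum_mat, c(Ac, Bc)) would return c(fin_Ac, fin_Bc) in one step)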
#center of least squares fit of circle (xc,yc)
Xc<-xmean+fin_Ac
yc<-ymean+fin_Bc
alpha<-fin_Ac^2+fin_Bc^2+((Saa+Sbb)/nrow(mat2))
#radius of circle
radius<-sqrt(alpha)
#temporarily stores circle parameters, names them and then puts them in the globalEnv
circle_parms<-c(Xc, yc, radius)
names(circle_parms)<-c("Xc", "Yc", "Radius")
circle_parms<<-circle_parms
#generates a ggplot of the input data and the approximated circle; puts the plot in the globalEnv as circleplot
circleplot<<-ggplot(df, aes(x=X, y=Y))+geom_point()+
geom_point(aes(x=Xc, y=yc), color="Red", size=4)+theme(aspect.ratio = 1)
#defines a circle fit function such that it can be added to the circleplot above; this function is available in globalEnv
gg_circle <<- function(r, xc, yc, color="blue", fill=NA, ...) {
x <- xc + r*cos(seq(0, pi, length.out=100))
ymax <- yc + r*sin(seq(0, pi, length.out=100))
ymin <- yc + r*sin(seq(0, -pi, length.out=100))
annotate("ribbon", x=x, ymin=ymin, ymax=ymax, color=color, fill=fill, ...)
}
#adds the fit circle to the data.
circleplot+gg_circle(r=radius, xc=Xc, yc=yc)
}
circlefit(df)
circle_parms
I used the link that was given in the comments. My code is below; for the three points above it returns center (0, 0) and radius 5.
#finding circles center
p3=c(0,5,5,0,-5,0)  #coordinates of the points in (x1,y1,x2,y2,x3,y3) format
mat1=matrix(c(p3[1]^2+p3[2]^2,p3[2],1,p3[3]^2+p3[4]^2,p3[4],1,p3[5]^2+p3[6]^2,p3[6],1),nrow=3,ncol=3,byrow=TRUE)
mat2=matrix(c(p3[1],p3[1]^2+p3[2]^2,1,p3[3],p3[3]^2+p3[4]^2,1,p3[5],p3[5]^2+p3[6]^2,1),nrow=3,ncol=3,byrow=TRUE)
mat3=matrix(c(p3[1],p3[2],1,p3[3],p3[4],1,p3[5],p3[6],1),nrow=3,ncol=3,byrow=TRUE)
mat1
mat2
mat3
xcenter=det(mat1)/(2*det(mat3))
ycenter=det(mat2)/(2*det(mat3))
radius=sqrt((p3[1]-xcenter)^2+(p3[2]-ycenter)^2)
Related
I'm working on calculating the robust working range of a process. For this purpose I'm building models from DOE data and simulating data with a Monte Carlo approach. Filtering the data with a criterion on the response leads to an allowed space (see plots for better visualization).
In the example below there are 3 variables, and the goal is to calculate the biggest possible axis-parallel square (a cube, in this 3-variable case) within the allowed region. This would describe the working range of the process. The coding is just to get every variable into the same range (-1 to 1).
library(tidyverse)
library(MASS)
library(ggplot2)
library(gridExtra)
library(rgl)
df<-data.frame(
X1=runif(100,0,2),
X2=runif(100,10,30),
X3=runif(100,5,75))%>%
mutate(Y1=2*X1-2*X2+X3)
f1<-Y1~X1+X2+X3
model1<- lm(f1, data=df)
m.c <- NULL
n=10000
for (k in 1:n)
{
X1=runif(1,0,2)
X2=runif(1,10,30)
X3=runif(1,5,75)
m.c = rbind(m.c, data.frame(X1, X2, X3))
}
m.c_coded<-m.c%>%
mutate(predict1=predict(model1, newdata = .))%>%
mutate(X1=(X1-1)/1)%>%
mutate(X2=(X2-20)/10)%>%
mutate(X3=(X3-40)/35)
Space<- m.c_coded%>%
filter(predict1<=0)
p1<-ggplot(Space)+
geom_point(aes(X1, X2))+
xlim(-1,1)+
ylim(-1,1)
p2<-ggplot(Space)+
geom_point(aes(X1, X3))+
xlim(-1,1)+
ylim(-1,1)
p3<-ggplot(Space)+
geom_point(aes(X2, X3))+
xlim(-1,1)+
ylim(-1,1)
grid.arrange(arrangeGrob(p1,p2,p3, nrow = 1), nrow = 1)
MODR_plot3D<-plot3d( x=Space$X1, y=Space$X2, z=Space$X3, type = "p",
xlim = c(-1,1), ylim = c(-1,1), zlim = c(-1,1)
)
There are specialized programs (DOE software) which can calculate this so-called design space, but I want to implement it in my R script. Sadly I do not have any idea how to calculate the position (the edges) of this square. My approach would be to find the point with the maximum distance to the boundary surface, which would be the center of the square.
Does anyone have an idea how I can calculate this cube in a proper way? If possible I want to extend this to the n-dimensional case as well.
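(To illustrate the "maximum distance to the surface" idea, here is a rough sketch under the assumption that the Monte Carlo sample is dense enough: among the feasible points, pick the center whose Chebyshev (max-coordinate) distance to the nearest infeasible point is largest; an axis-aligned cube with that half-side around it stays inside the allowed region. It reuses m.c_coded from above and gives an approximation, not a guaranteed optimum.)
#sketch: largest axis-aligned cube via Chebyshev distance to infeasible points
feasible   <- as.matrix(m.c_coded[m.c_coded$predict1 <= 0, c("X1","X2","X3")])
infeasible <- as.matrix(m.c_coded[m.c_coded$predict1 >  0, c("X1","X2","X3")])
cheb_to_nearest <- function(p, pts) {
  #Chebyshev distance from point p to the closest row of pts
  min(apply(abs(sweep(pts, 2, p)), 1, max))
}
half_sides <- apply(feasible, 1, function(p) {
  #half-side is limited by the nearest infeasible point and by the coded box [-1,1]^3
  min(cheb_to_nearest(p, infeasible), 1 - max(abs(p)))
})
best   <- which.max(half_sides)
center <- feasible[best, ]
half   <- half_sides[best]
center; half   #cube corners are center +/- half in each coordinate
Because the Chebyshev distance generalizes directly, the same sketch works unchanged for more than three coded variables.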
How do I plot a decision boundary from a weight vector?
My original data is 2-dimensional but not linearly separable, so I used a polynomial transformation of order 2 and therefore ended up with a 6-dimensional weight vector.
Here's the code I used to generate my data:
polar2cart <- function(theta, R, x, y){
  #convert polar offsets (theta, R) around the center (x, y) to Cartesian and
  #return a 2 x 1000 matrix: first row x coordinates, second row y coordinates
  x = x + cos(theta) * R
  y = y + sin(theta) * R
  c = matrix(x, ncol = 1000)
  c = rbind(c, y)
}
cart2polar <- function(x, y)
{
  r <- sqrt(x^2 + y^2)
  t <- atan2(y, x)  #atan2 keeps the correct quadrant, unlike atan(y/x)
  c(r, t)
}
R=5
eps=5
sep=-5
c1<-polar2cart(pi*runif(1000,0,1),runif(1000,0,eps)+R,0,0)
c2<-polar2cart(-pi*runif(1000,0,1),runif(1000,0,eps)+R,R+eps/2,-sep)
data <- data.frame("x" = append(c1[1,], c2[1,]), "y" = append(c1[2,], c2[2,]))
labels <- append(rep(1,1000), rep(-1, 1000))
and here's how it is displayed (using ggplot2):
Thank you in advance.
EDIT: I'm sorry if I didn't provide enough information about the weight vector. The algorithm I'm using is pocket, which is a variation of the perceptron, which means that the output weight vector is the normal vector that determines the hyper-plane in the feature space, plus the bias w0. Therefore, the hyper-plane equation is w0 + w1*z1 + ... + w5*z5 = 0, where z1,...,z5 are the feature-space variables. Now, since I used a polynomial transformation of order 2 to go from the 2-dimensional space to the 5-dimensional feature space, my variables are (z1, z2, z3, z4, z5) = (x1, x2, x1^2, x1*x2, x2^2), and thus the equation for my decision boundary is:
w0 + w1*x1 + w2*x2 + w3*x1^2 + w4*x1*x2 + w5*x2^2 = 0
So basically, my question is: how do I go about drawing my decision boundary given these six weights w0,...,w5?
PS: I've found a solution while waiting, it might not be the best approach but, it gives the expected results. I'll share it as soon as I finish my project if anyone is interested. Meanwhile, I'd love to hear a better alternative.
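(For anyone landing here, a minimal sketch of one common way to draw such a boundary, assuming a length-6 weight vector ordered as (bias, x1, x2, x1^2, x1*x2, x2^2); the w below is a made-up example, not the output of the pocket algorithm. The idea is to evaluate the polynomial on a grid and draw its zero contour over the data generated above.)
library(ggplot2)
w <- c(-25, 0, 0, 1, 0, 1)  #hypothetical weights: x1^2 + x2^2 - 25 = 0, i.e. a circle
grid_x <- seq(min(data$x), max(data$x), length.out = 300)
grid_y <- seq(min(data$y), max(data$y), length.out = 300)
g <- expand.grid(x1 = grid_x, x2 = grid_y)
g$f <- w[1] + w[2]*g$x1 + w[3]*g$x2 + w[4]*g$x1^2 + w[5]*g$x1*g$x2 + w[6]*g$x2^2
df_plot <- cbind(data, label = factor(labels))
ggplot() +
  geom_point(aes(x = x, y = y, color = label), data = df_plot) +
  geom_contour(aes(x = x1, y = x2, z = f), data = g, breaks = 0, color = "black")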
I have a problem I wish to solve in R, with example data below. I know this must have been solved many times before, but I have not been able to find a solution that works for me in R.
The core of what I want to do is to find out how to translate a set of 2D coordinates so that it best fits into another, larger, set of 2D coordinates. Imagine, for example, having a Polaroid photo of a small piece of the starry sky with you out at night, and wanting to hold it up in a position so that it matches the stars' current positions.
Here is how to generate data similar to my real problem:
# create reference points (the "starry sky")
set.seed(99)
ref_coords = data.frame(x = runif(50,0,100), y = runif(50,0,100))
# take a subset of the coordinates to serve as the points we
# are looking for ("the Polaroid")
my_coords_final = ref_coords[c(5,12,15,24,31,34,48,49),]
# add a little bit of variation compared to the reference points
# (the data should be very similar, but with a little bit of noise)
set.seed(100)
my_coords_final$x = my_coords_final$x+rnorm(8,0,.1)
set.seed(101)
my_coords_final$y = my_coords_final$y+rnorm(8,0,.1)
# create "start values" by, e.g., translating the points we are
# looking for to start at (0,0)
my_coords_start =apply(my_coords_final,2,function(x) x-min(x))
# Plot of example data, goal is to find the dotted vector that
# corresponds to the translation needed
plot(ref_coords, cex = 1.2) # "Starry sky"
points(my_coords_start,pch=20, col = "red") # start position of "Polaroid"
points(my_coords_final,pch=20, col = "blue") # corrected position of "Polaroid"
segments(my_coords_start[1,1],my_coords_start[1,2],
my_coords_final[1,1],my_coords_final[1,2],lty="dotted")
Plotting the data as above should yield:
The result I want is basically what the dotted line in the plot above represents, i.e. a delta in x and y that I could apply to the start coordinates to move them to their correct position in the reference grid.
Details about the real data
There should be close to no rotational or scaling difference between my points and the reference points.
My real data is around 1000 reference points and up to a few hundred points to search for (I could use fewer if that is more efficient).
I expect to have to search about 10 to 20 sets of reference points to find my match, as many of the reference sets will not contain my points.
Thank you for your time, I'd really appreciate any input!
EDIT: To clarify, the right plot represents the reference data. The left plot represents the points that I want to translate across the reference data in order to find a position where they best match the reference. That position, in this case, is represented by the blue dots in the previous figure.
Finally, any working strategy must not use the data in my_coords_final, but rather reproduce that set of coordinates starting from my_coords_start using ref_coords.
So, the previous approach I posted (see edit history), which used optim() to minimize the sum of distances between points, will only work in the limited circumstance where the point distribution used as reference data is in the middle of the point field. The solution that satisfies the question, and still seems workable for a few thousand points, is a brute-force delta-and-comparison algorithm: it calculates the difference between each point in the field and a single point of the reference data, and then determines how many of the rest of the reference data are within a minimum distance threshold (which is needed to account for the noise in the data):
## A brute-force approach where min_dist can be used to
## ameliorate some random noise:
min_dist <- 5
win_thresh <- 0
win_thresh_old <- 0
for(i in 1:nrow(ref_coords)) {
x2 <- my_coords_start[,1]
y2 <- my_coords_start[,2]
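## Shift the reference points so that reference point i coincides with
## the first point of the data we are trying to place: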
x1 <- ref_coords[,1] + (x2[1] - ref_coords[i,1])
y1 <- ref_coords[,2] + (y2[1] - ref_coords[i,2])
## Calculate all pairwise distances between reference and field data:
dists <- dist( cbind( c(x1, x2), c(y1, y2) ), "euclidean")
## Only take distances for the sampled data:
dists <- as.matrix(dists)[-1*1:length(x1),]
## Calculate the number of distances within the minimum
## distance threshold minus the diagonal portion:
win_thresh <- sum(rowSums(dists < min_dist) > 1)
## If we have more "matches" than our best then calculate a new
## dx and dy:
if (win_thresh > win_thresh_old) {
win_thresh_old <- win_thresh
dx <- (x2[1] - ref_coords[i,1])
dy <- (y2[1] - ref_coords[i,2])
}
}
## Plot estimated correction (your delta x and delta y) calculated
## from the brute force calculation of shifts:
points(
x=ref_coords[,1] + dx,
y=ref_coords[,2] + dy,
cex=1.5, col = "red"
)
I'm very interested to know if there's anyone that solves this in a more efficient manner for the number of points in the test data, possibly using a statistical or optimization algorithm.
I am using R to visualize some data. I have found RGL to be a great library for plotting points.
points3d(x,y,z)
where x = c(x1,x2, ...), y = c(y1,y2,...), z = c(z1,z2, ...) and x,y,z have the same length, is a great function for plotting large sets of data.
Now, I would like to plot ellipses, mixed in with the data. I have a characterization of ellipses by a center point C, a vector describing the major axis U, and a vector describing the minor axis V. I obtain points P on the boundary of the ellipse by
P = C + U*cos(t) + V*sin(t) (t ranges between 0 and 2*pi)
obtaining vectors, xt, yt, and zt. Then I can plot the ellipse with
polygon3d(xt,yt,zt)
It works fine, but I'm guessing everyone reading this is cringing and will tell me that it is a bad way to do this. Indeed, it takes a couple of seconds to render each ellipse this way.
I don't think the ellipse3d function from the RGL package works here; at the very least, I am not working with a covariance matrix, nor do I understand how to get the ellipse I want from that function. Also, it returns an ellipsoid, not an ellipse.
****** EDIT ************
For a concrete example that takes a while:
library(rgl)
open3d()
td <- c(0:359)
t <- td*pi/180
plotEllipseFromVector <- function(c,u,v){
xt <- c[1] + u[1]*cos(t) + v[1]*sin(t)
yt <- c[2] + u[2]*cos(t) + v[2]*sin(t)
zt <- c[3] + u[3]*cos(t) + v[3]*sin(t)
polygon3d(xt,yt,zt)
}
Call it with the center point, major axis, and minor axis you want. It takes just over 2 seconds per ellipse for me.
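For instance, a call with made-up values (the center and axis vectors here are purely illustrative) would look like:
plotEllipseFromVector(c(0, 0, 0), c(2, 0, 0), c(0, 1, 0))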
On the other hand, if I change t to be 0,20,40,... 340, then it works quite fast.
First, this is my first Stack Overflow question, so I apologize for any breach of decorum. Second, I realize this will be very trivial, but I'm stumped. I'm trying to figure out how to find the minimum and maximum gradients on a sigmoidal curve.
I have a function that generates a vector of y values that form a sigmoidal curve:
#function to generate Sigmoid curves - works better with enough Xs to be smooth
genSigmoid = function(a, b, c, theta){
y = c + ((1-c) / (1 + exp(-a*(theta-b))))
return(y)
}
x<-c(1:100)
y<-genSigmoid(.25, .50, 0, x)
plot(x, y, type="n")
lines(x, y)
What I would like to do is find the points along this curve where the gradient is smallest or zero, and the points where the gradient is largest. My ultimate goal is to plot the different sections of this curve with different line styles according to the strength of the gradient along the curve. I can generate these different styles by 'eye-balling' it, but it would be nice to have something that does this more precisely.
You could do this using the grad(...) function in package numDeriv.
genSigmoid = function(theta,pars){
y <- with(pars,c + ((1-c) / (1 + exp(-a*(theta-b)))))
return(y)
}
x<-c(1:100)
pars <- list(a=0.25, b=0.50, c=0.0)
y<-genSigmoid(x, pars)
plot(x, y, type="l", ylim=c(0,1), col="blue")
library(numDeriv)
z<-grad(genSigmoid,x,pars=pars)
lines(x,z,col="red")
Here z is a vector of the derivative of genSigmoid(...) with respect to theta.
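If you specifically want the locations of the largest and the (near-)zero gradient, a quick follow-up on the z vector above (just a sketch over the evaluated grid, not an analytic solution):
x[which.max(z)]        # x value where the gradient is largest
x[which.min(abs(z))]   # x value where the gradient is closest to zero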
I redefined your function a bit to make the calling sequence simpler (combined the parameters into a named list, and reversed the order of the arguments).
Plotting segments of the curve with different line styles is a bit trickier:
lt <- as.integer(3*(z-min(z))/diff(range(z))+1)
df <- data.frame(x,y,z,lt)
plot(x,y,type="n")
lapply(split(df,df$lt),function(df)with(df,lines(x,y,lty=lt)))
So this creates a vector of line types (1,2,3, or 4) based on the value of the derivative, then splits the data based on line type, and plots the segments.