I have a pair of coordinate vectors where each (x, y) pair designates the centre of a circle:
x <- runif(5,0,2)
y <- runif(5,0,2)
As you can see, the circle centres all lie within the square (0,2).
Each circle has radius 0.2. I want to randomly shift each circle's centre within the bounds of the original circle. I figured I could do this:
radii <- sample(20, 5, replace = TRUE)/100
angles <- sample(360, 5, replace = TRUE) * pi/180  # R's trig functions expect radians
newx <- x + radii*cos(angles)
newy <- y + radii*sin(angles)
However, I realise that doing this I could technically get circle centres that fall outside of the square (0,2). I could write a loop that rejects negative newx and newy values, but I have to do this for tens of thousands of rows and am worried about speed. Is it possible to run this conditional coordinate shift without resorting to a loop?
My rule set is as follows:
Pick a new centre for each circle.
The new centre must fall within the area of the original circle (at most the radius, 0.2, from the original centre).
The new centre must lie within the original square.
If the shift path meets the border of the square, it should be reflected according to the law of reflection (travelling the remaining length of the randomly selected radius after the bounce).
Something like this:
# let's do only one point first
x <- runif(1,0,2)
y <- runif(1,0,2)
randomwalk <- function(pos) {
    x <- pos[1]
    y <- pos[2]
    radius <- sample(20, 1, replace = TRUE)/100
    angle <- sample(360, 1, replace = TRUE) * pi/180  # radians
    newx <- x + radius*cos(angle)
    newy <- y + radius*sin(angle)
    if (newy > 2) { # check the geometric calculations
        r2 <- (2 - y)/sin(angle)  # distance travelled before hitting the top border
        hitx <- x + r2*cos(angle)
        hity <- 2
        # reflect off the horizontal border: the x-component of the
        # direction is unchanged, the y-component flips sign
        newx <- hitx + (radius - r2)*cos(angle)
        newy <- hity - (radius - r2)*sin(angle)
    }
    # implement the other borders yourself,
    # include a check for which border is hit first,
    # and include the possibility of multiple hits
    # (e.g., left border and then top border)
    cbind(newx, newy)
}
resx <- numeric(50)
resy <- numeric(50)
res <- cbind(resx, resy)
res[1, ] <- c(x, y)
for (i in 2:50) {
    res[i, ] <- randomwalk(res[i - 1, ])
}
I suspect this still contains some geometric errors, but don't have time to check.
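In fact, because bouncing off a straight wall for the remainder of the step is the same as mirroring the unreflected endpoint across that wall, the whole update can be vectorized with no loop at all. A sketch (my reformulation, not verified against every edge case; it assumes the maximum shift, 0.2, is smaller than the square, so one fold per axis suffices, and folding x and y independently also covers corner hits):
reflect_into <- function(v, lo = 0, hi = 2) {
    # fold values back into [lo, hi]; valid when the overshoot is < (hi - lo)
    v <- ifelse(v > hi, 2*hi - v, v)
    ifelse(v < lo, 2*lo - v, v)
}

radii  <- runif(length(x), 0, 0.2)
angles <- runif(length(x), 0, 2*pi)
newx <- reflect_into(x + radii*cos(angles))
newy <- reflect_into(y + radii*sin(angles))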
The functions inpip and inout from the package splancs are quite useful; they check whether points fall inside a polygon. You just need a two-column matrix representing the polygon (such as a square). These functions are fast, as they call compiled C and Fortran code.
If your square is:
square <- cbind(c(0, 10, 10, 0), c(0, 0, 10, 10)) # In case side = 10
Then create all the new centers (I suggest using runif instead of sample for the radii and angles, but that's up to you). Then check whether those centers fall inside the square with one line:
inside <- inout(newCenters, square)
newCenters <- newCenters[inside, ]  # note the comma: newCenters is a matrix
Afterwards, redo the necessary steps to recreate the centers that were selected out, repeating as many times as needed until they all fall inside the square. Note that this needs a while loop (or equivalent).
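For concreteness, here is a hedged sketch of that resample-until-inside loop (my own illustration, assuming the (0,2) square and 0.2 shift radius from the question; it implements rejection, not the reflection rule):
library(splancs)

square <- cbind(c(0, 2, 2, 0), c(0, 0, 2, 2))

shift_centres <- function(x, y, r_max = 0.2) {
    n <- length(x)
    radii  <- runif(n, 0, r_max)
    angles <- runif(n, 0, 2*pi)
    newC <- cbind(x + radii*cos(angles), y + radii*sin(angles))
    bad <- !inout(newC, square)
    while (any(bad)) {          # redraw only the offending centres
        m <- sum(bad)
        radii  <- runif(m, 0, r_max)
        angles <- runif(m, 0, 2*pi)
        newC[bad, ] <- cbind(x[bad] + radii*cos(angles),
                             y[bad] + radii*sin(angles))
        bad[bad] <- !inout(newC[bad, , drop = FALSE], square)
    }
    newC
}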
Note also that the same package (splancs) has a function csr that creates random points inside a polygon. So in principle you could cut off the piece of every circle that falls outside the square and then use the resulting polygons (the cut circles) as input to this function. This can become slow because you have to use a loop (or maybe lapply) over all the cut circles.
As a last idea, maybe you can combine the two strategies: first use your initial idea for all circles that fall completely inside the square (or equivalently, all centres that are at least one shift radius, 0.2 here, from the perimeter), then use the csr function for the rest of the circles.
Hope this helps!
I want to visualize proportions using points inside a circle. For example, let's say that I have 100 points that I wish to scatter (somewhat randomly jittered) in a circle.
Next, I want to use this diagram to represent the proportions of people who voted Biden/Harris in 2020 US presidential elections, in each state.
Example #1 -- Michigan
Biden got 50.62% of Michigan's votes. I'm going to draw a horizontal diameter that splits the circle into two halves, and then color the points under the diameter in blue (the Democrats' color).
Example #2 -- Wyoming
Unlike Michigan, in Wyoming Biden got only 26.55% of the votes, which is approximately a quarter of the vote. In this case I'd draw a horizontal chord that divides the circle such that the disk's area under the chord is 25% of the entire disk area. Then I'll color the respective points in that area in blue. Since I have 100 points in total, 25 points represent the 25% who voted Biden in Wyoming.
My question: How can I do this with ggplot? I researched this issue, and there's a lot of geometry going on here. First, the kind of area I'm talking about is called a "circular segment". Second, there are many formulas to calculate its area, if we know some other parameters about the shape (such as the radius length, etc.). See this nice demo.
However, my goal isn't to solve geometry problems, but just to represent proportions in a very specific way:
draw a circle
sprinkle X number of points inside
draw a (real or invisible) horizontal line that divides the circle/disk area according to a given proportion
ensure that the points are arranged in accordance with the split. That is, if we want to represent a 30%-70% split, then 30% of the points should be under the line that divides the disk.
color the points under the line.
I understand that this is somewhat an exotic visualization, but I'll be thankful for any help with this.
EDIT
I've found a reference to a JavaScript package that does something very similar to what I'm asking.
I took a crack at this for fun. There's a lot more that could be done. I agree that this is not a great way to visualize proportions, but if it's engaging your audience ...
Formulas for determining appropriate heights are taken from Wikipedia. In particular we need the formulas
a/A = (theta - sin(theta))/(2*pi)
h = 1 - cos(theta/2)
where a is the area of the segment; A is the whole area of the circle; theta is the angle subtended by the arc that defines the segment (see Wikipedia for pictures); and h is the height of the segment, for a circle of unit radius.
Machinery for finding heights.
afun <- function(x) (x - sin(x))/(2*pi)
## curve(afun, from = 0, to = 2*pi)
find_a <- function(a) {
    uniroot(
        function(x) afun(x) - a,
        interval = c(0, 2*pi))$root
}
find_h <- function(a) {
    1 - cos(find_a(a)/2)
}
vfind_h <- Vectorize(find_h)
## find_a(0.5)
## find_h(0.5)
## curve(vfind_h(x), from = 0, to= 1)
set up a circle
dd <- data.frame(x=0,y=0,r=1)
library(ggforce)
library(ggplot2); theme_set(theme_void())
gg0 <- ggplot(dd) + geom_circle(aes(x0=x,y0=y,r=r)) + coord_fixed()
finish
props <- c(0.2,0.5,0.3) ## proportions
n <- 100 ## number of points to scatter
cprop <- cumsum(props)[-length(props)]
h <- vfind_h(cprop)
set.seed(101)
r <- runif(n)
th <- runif(n, 0, 2 * pi)
## sqrt(r) makes the points uniform over the disk (th is already in radians)
dd2 <- data.frame(x = sqrt(r) * cos(th),
                  y = sqrt(r) * sin(th))
dd2$g <- cut(dd2$y, c(1, 1-h, -1))
gg0 + geom_point(data=dd2, aes(x, y, colour = g), size=3)
There are a bunch of tweaks that would make this better: meaningful names for the categories; reversing the axis order to match the plot; maybe adding segments delimiting the sections, or (more work) polygons so you can shade the sections.
You should definitely check this for mistakes — e.g. there are places where I may have used a set of values where I should have used their first differences, or vice versa (values vs cumulative sum). But this should get you started.
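One caveat: sprinkling points uniformly only matches the proportions in expectation. If you need exactly round(n * props) points per section (e.g. exactly 25 blue points for Wyoming), a hedged alternative is to assign the groups by y-rank rather than by the chord position (a sketch of mine reusing props, n, dd2, and gg0 from above; with uniform points the rank split approximates the area-based chords anyway):
## Split the points by their y-rank so the group sizes are exactly
## props * n (20/50/30 here) rather than only correct on average.
dd2$g_exact <- cut(rank(-dd2$y, ties.method = "first"),
                   breaks = c(0, round(cumsum(props) * n)),
                   labels = paste0("grp", seq_along(props)))
gg0 + geom_point(data = dd2, aes(x, y, colour = g_exact), size = 3)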
I have a problem I wish to solve in R with example data below. I know this must have been solved many times but I have not been able to find a solution that works for me in R.
The core of what I want to do is to find how to translate a set of 2D coordinates to best fit into an other, larger, set of 2D coordinates. Imagine for example having a Polaroid photo of a small piece of the starry sky with you out at night, and you want to hold it up in a position so they match the stars' current positions.
Here is how to generate data similar to my real problem:
# create reference points (the "starry sky")
set.seed(99)
ref_coords = data.frame(x = runif(50,0,100), y = runif(50,0,100))
# take a subset of the coordinates to serve as the points we
# are looking for ("the Polaroid")
my_coords_final = ref_coords[c(5,12,15,24,31,34,48,49),]
# add a little bit of variation as compared to reference points
# (the data should be very similar, but with a little noise)
set.seed(100)
my_coords_final$x = my_coords_final$x+rnorm(8,0,.1)
set.seed(101)
my_coords_final$y = my_coords_final$y+rnorm(8,0,.1)
# create "start values" by, e.g., translating the points we are
# looking for to start at (0,0)
my_coords_start <- apply(my_coords_final, 2, function(x) x - min(x))
# Plot of example data, goal is to find the dotted vector that
# corresponds to the translation needed
plot(ref_coords, cex = 1.2) # "Starry sky"
points(my_coords_start,pch=20, col = "red") # start position of "Polaroid"
points(my_coords_final,pch=20, col = "blue") # corrected position of "Polaroid"
segments(my_coords_start[1,1],my_coords_start[1,2],
my_coords_final[1,1],my_coords_final[1,2],lty="dotted")
Plotting the data as above should yield a figure showing the reference points as open circles, the start position of the "Polaroid" in red, and its corrected position in blue, with a dotted segment marking the translation.
The result I want is basically what the dotted line in the plot above represents, i.e. a delta in x and y that I could apply to the start coordinates to move them to their correct position in the reference grid.
Details about the real data
There should be close to no rotational or scaling difference between my points and the reference points.
My real data has around 1000 reference points and up to a few hundred points to search for (I could use fewer if that is more efficient).
I expect to have to search about 10 to 20 sets of reference points to find my match, as many of the reference sets will not contain my points.
Thank you for your time, I'd really appreciate any input!
EDIT: To clarify, the right plot represents the reference data. The left plot represents the points that I want to translate across the reference data in order to find the position where they best match the reference. That position, in this case, is represented by the blue dots in the previous figure.
Finally, any working strategy must not use the data in my_coords_final, but rather reproduce that set of coordinates starting from my_coords_start using ref_coords.
So, the approach I posted previously (see the edit history), using optim() to minimize the sum of distances between points, only works in the limited circumstance where the point distribution used as reference data sits in the middle of the point field. A solution that satisfies the question, and still seems workable for a few thousand points, is a brute-force delta-and-compare algorithm: for each point in the field, compute the shift that aligns it with a single point of the reference data, then count how many of the remaining reference points fall within a minimum distance threshold (which is needed to account for the noise in the data):
## A brute-force approach where min_dist can be used to
## ameliorate some random noise:
min_dist <- 5
win_thresh <- 0
win_thresh_old <- 0
for (i in 1:nrow(ref_coords)) {
    x2 <- my_coords_start[, 1]
    y2 <- my_coords_start[, 2]
    x1 <- ref_coords[, 1] + (x2[1] - ref_coords[i, 1])
    y1 <- ref_coords[, 2] + (y2[1] - ref_coords[i, 2])
    ## Calculate all pairwise distances between reference and field data:
    dists <- dist(cbind(c(x1, x2), c(y1, y2)), "euclidean")
    ## Only keep the rows for the sampled data:
    dists <- as.matrix(dists)[-(1:length(x1)), ]
    ## Count the sampled points that lie within the minimum distance
    ## threshold of some other point (the "> 1" discounts each point's
    ## zero distance to itself on the diagonal):
    win_thresh <- sum(rowSums(dists < min_dist) > 1)
    ## If we have more "matches" than our best so far, store a new dx and dy:
    if (win_thresh > win_thresh_old) {
        win_thresh_old <- win_thresh
        dx <- (x2[1] - ref_coords[i, 1])
        dy <- (y2[1] - ref_coords[i, 2])
    }
}
## Plot estimated correction (your delta x and delta y) calculated
## from the brute force calculation of shifts:
points(
x=ref_coords[,1] + dx,
y=ref_coords[,2] + dy,
cex=1.5, col = "red"
)
I'm very interested to know if there's anyone that solves this in a more efficient manner for the number of points in the test data, possibly using a statistical or optimization algorithm.
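As one possible speed-up (a sketch of my own, not benchmarked): the same brute-force search, but counting matches with a kd-tree lookup via RANN::nn2 instead of a full distance matrix, so each candidate shift costs roughly O(m log n). Note that dx/dy are defined here as the shift to apply to my_coords_start, i.e. with the opposite sign to the code above:
library(RANN)  # nn2() does fast nearest-neighbour queries

best <- list(score = -1, dx = 0, dy = 0)
for (i in seq_len(nrow(ref_coords))) {
    ## candidate shift: align the first sample point onto reference point i
    dx <- ref_coords[i, 1] - my_coords_start[1, 1]
    dy <- ref_coords[i, 2] - my_coords_start[1, 2]
    shifted <- cbind(my_coords_start[, 1] + dx, my_coords_start[, 2] + dy)
    ## distance from each shifted sample point to its nearest reference point
    nn <- nn2(ref_coords, query = shifted, k = 1)
    score <- sum(nn$nn.dists < min_dist)
    if (score > best$score) best <- list(score = score, dx = dx, dy = dy)
}
points(my_coords_start[, 1] + best$dx, my_coords_start[, 2] + best$dy,
       cex = 1.5, col = "red")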
I'm wondering: is there some function or clever way to find the smallest ellipse covering a given fraction of a set of 2D points in R? By smallest I mean the ellipse with the smallest area.
Clarification: I'm fine with an approximately correct solution if the number of points is large (as I guess an exact solution would have to try all combinations of subsets of points).
This question might sound like a duplicate of Ellipse containing percentage of given points in R, but as that question is phrased, its answer does not produce the smallest ellipse. For example, using the solution given there:
require(car)
x <- runif(6)
y <- runif(6)
dataEllipse(x,y, levels=0.5)
The resulting ellipse is clearly not the smallest ellipse containing half of the points, which, I guess, would be a small ellipse covering the three points up in the top-left corner.
I think I have a solution which requires two functions: cov.rob from the MASS package and ellipsoidhull from the cluster package. cov.rob(xy, quantile.used = 50, method = "mve") finds approximately the "best" 50 points, out of the total number of 2D points in xy, that are contained in the minimum volume ellipse. However, cov.rob does not directly return this ellipse but rather some other ellipse estimated from the best points (the goal being to robustly estimate the covariance matrix). To find the actual minimum ellipse we can give the best points to ellipsoidhull, which finds the minimum ellipse, and then use predict.ellipsoid to get the coordinates of the path defining the hull of the ellipse.
I'm not 100% certain this method is the easiest and/or that it works 100% (it feels like it should be possible to avoid the second step of using ellipsoidhull, but I haven't figured out how). It seems to work on my toy example at least...
Enough talking, here is the code:
library(MASS)
library(cluster)
# Using the same six points as in the question
xy <- cbind(x, y)
# Finding the 3 points in the smallest ellipse (not finding
# the actual ellipse though...)
fit <- cov.rob(xy, quantile.used = 3, method = "mve")
# Finding the minimum volume ellipse that contains these three points
best_ellipse <- ellipsoidhull( xy[fit$best,] )
plot(xy)
# The predict() function returns a 2d matrix defining the coordinates of
# the hull of the ellipse
lines(predict(best_ellipse), col="blue")
Looks pretty good! You can also inspect the ellipse object for more info
best_ellipse
## 'ellipsoid' in 2 dimensions:
## center = ( 0.36 0.65 ); squared ave.radius d^2 = 2
## and shape matrix =
## x y
## x 0.00042 0.0065
## y 0.00654 0.1229
## hence, area = 0.018
Here is a handy function which adds an ellipse to an existing base graphics plot:
plot_min_ellipse <- function(xy, points_in_ellipse, color = "blue") {
fit <- cov.rob(xy, quantile.used = points_in_ellipse, method = "mve")
best_ellipse <- ellipsoidhull( xy[fit$best,] )
lines(predict(best_ellipse), col=color)
}
Let's use it on a larger number of points:
x <- runif(100)
y <- runif(100)
xy <- cbind(x, y)
plot(xy)
plot_min_ellipse(xy, points_in_ellipse = 50)
This sounds very much like a 2D confidence region. Try http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/ellipsoidhull.html. You will probably need to run it on each combination of N points and then choose the smallest result.
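For small point sets, that exhaustive idea can be sketched directly (a brute-force illustration of my suggestion, untested; cluster's volume() returns the area for a 2D ellipsoid):
library(cluster)

## Try every subset of k points, fit the minimum ellipse around each,
## and keep the ellipse with the smallest area. Only feasible for
## small n, since there are choose(n, k) subsets.
smallest_ellipse <- function(xy, k) {
    best <- NULL
    for (idx in combn(nrow(xy), k, simplify = FALSE)) {
        e <- ellipsoidhull(xy[idx, , drop = FALSE])
        if (is.null(best) || volume(e) < volume(best)) best <- e
    }
    best
}

## e.g. for the six points in the question:
## lines(predict(smallest_ellipse(cbind(x, y), 3)), col = "red")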
Here is my problem.
I have a 3D grid of points (a cube) that I built using the following code:
X <- seq(-1/sqrt(2), 1/sqrt(2), length.out = 100)
Y <- seq(-sqrt(2)/(2*sqrt(3)), sqrt(2)/sqrt(3), length.out = 100)
Z <- seq(-1/(2*sqrt(3)), sqrt(3)/2, length.out = 100)
grid <- data.frame(expand.grid(X = X, Y = Y, Z = Z))
Then I would like to delete from the grid data frame all the points that are not located within the tetrahedron defined by the following vertices:
w : (0,0,sqrt(3)/2)
x : (0,sqrt(2)/sqrt(3),-1/(2*sqrt(3)))
y : (-1/sqrt(2),-sqrt(2)/(2*sqrt(3)),-1/(2*sqrt(3)))
z : (1/sqrt(2),-sqrt(2)/(2*sqrt(3)),-1/(2*sqrt(3)))
I cannot find a way to do this without awfully long code. Can anyone help me, please?
Thanks!
Package ptinpoly has a function pip3d to test whether a point lies inside a polyhedron.
library(ptinpoly)
X <- seq(-1/sqrt(2),1/sqrt(2),length.out=10) #I used a smaller dataset here
Y <- seq(-sqrt(2)/(2*sqrt(3)),sqrt(2)/sqrt(3),length.out=10)
Z <- seq(-1/(2*sqrt(3)),sqrt(3)/2,length.out=10)
# The query points have to be supplied as a matrix.
grid <- as.matrix(expand.grid(X=X,Y=Y,Z=Z))
w <- c(0,0,sqrt(3)/2)
x <- c(0,sqrt(2)/sqrt(3),-1/(2*sqrt(3)))
y <- c(-1/sqrt(2),-sqrt(2)/(2*sqrt(3)),-1/(2*sqrt(3)))
z <- c(1/sqrt(2),-sqrt(2)/(2*sqrt(3)),-1/(2*sqrt(3)))
# The matrix of vertices
tetra_vert <- matrix(c(w,x,y,z),byrow=TRUE,nrow=4)
# The matrix of faces (each row gives the indices of the vertices joined by one face)
tetra_faces <- matrix(c(1,2,3,
1,2,4,
1,3,4,
2,3,4),byrow=TRUE,nrow=4)
inout <- pip3d(tetra_vert, tetra_faces, grid)
The result is a vector of integers: 0 means the point falls on a face, 1 that it is inside the polyhedron, and -1 that it is outside.
The solution of your problem is therefore:
grid[inout%in%c(0,1),]
Make planes which form the tetrahedron and check whether a point is on the right side of each of the planes.
Pointers: think of calculating dot products with the plane normals and the like. One option is to draw a vector from one tetrahedron vertex to each of the other corners, plus one vector from that vertex to the query point, and then use dot products to see whether the point-to-point vector lies within the others.
The point is inside the tetrahedron if the vector to it can be expressed as a sum of non-negative multiples of the corner vectors and the vector is short enough.
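A sketch of that idea in R (my own illustration, reusing w, x, y, z and grid from the answer above): solve for the barycentric coefficients of each point in the basis of the three edge vectors from w; the point is inside exactly when all coefficients are non-negative and sum to at most 1.
# Barycentric test: p = w + l1*(x-w) + l2*(y-w) + l3*(z-w) with
# l1, l2, l3 >= 0 and l1 + l2 + l3 <= 1 iff p is inside the tetrahedron.
in_tetrahedron <- function(p, w, x, y, z) {
    M <- cbind(x - w, y - w, z - w)  # edge vectors as columns
    lambda <- solve(M, p - w)        # barycentric coefficients
    all(lambda >= 0) && sum(lambda) <= 1
}

# inside <- apply(grid, 1, in_tetrahedron, w = w, x = x, y = y, z = z)
# grid[inside, ]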
My question is this: I am working on some clustering algorithms, and to start I am experimenting with 2D shapes.
Given a particular area, say 500 sq units, I need to generate random shapes with that particular area:
say a rectangle, square, or triangle of 500 sq units, etc. Any suggestions on how I should go about this problem? I am using the R language.
It's fairly straightforward to do this for a regular polygon.
The area of an n-sided regular polygon with a circumscribed circle of radius R is
A = (1/2) * n * R^2 * sin(2*pi/n)
Therefore, knowing n and A, you can easily find R:
R = sqrt((2*A) / (n * sin(2*pi/n)))
So you can pick the centre, go out a distance R, and generate n points at angle increments of 2*pi/n.
In R:
regular.poly <- function(nSides, area)
{
    # Find the radius of the circumscribed circle
    radius <- sqrt((2*area)/(nSides*sin((2*pi)/nSides)))

    # Centre at (0,0); the points lie at angles 2*pi/nSides, 4*pi/nSides, ..., 2*pi
    points <- list(x = NULL, y = NULL)
    angles <- (2*pi)/nSides * 1:nSides
    points$x <- cos(angles) * radius
    points$y <- sin(angles) * radius

    return(points)
}
# Some examples
par(mfrow=c(3,3))
for (i in 3:11)
{
p <- regular.poly(i, 100)
plot(0, 0, "n", xlim=c(-10, 10), ylim=c(-10, 10), xlab="", ylab="", main=paste("n=", i))
polygon(p)
}
We can extend this to a generic convex polygon.
The area of a convex polygon can be found with the shoelace formula:
A = 1/2 * [(x1*y2 + x2*y3 + ... + xn*y1) - (y1*x2 + y2*x3 + ... + yn*x1)]
We generate the polygon as above, but perturb the angles and radii away from those of the regular polygon.
We then scale the points to get the desired area.
convex.poly <- function(nSides, area)
{
# Find the radius of the circumscribed circle, and the angle of each point if this was a regular polygon
radius <- sqrt((2*area)/(nSides*sin((2*pi)/nSides)))
angle <- (2*pi)/nSides
# Randomize the radii/angles
radii <- rnorm(nSides, radius, radius/10)
angles <- rnorm(nSides, angle, angle/10) * 1:nSides
angles <- sort(angles)
points <- list(x=NULL, y=NULL)
points$x <- cos(angles) * radii
points$y <- sin(angles) * radii
# Find the area of the polygon
m <- matrix(unlist(points), ncol=2)
m <- rbind(m, m[1,])
current.area <- 0.5 * (sum(m[1:nSides,1]*m[2:(nSides+1),2]) - sum(m[1:nSides,2]*m[2:(nSides+1),1]))
points$x <- points$x * sqrt(area/current.area)
points$y <- points$y * sqrt(area/current.area)
return (points)
}
A random square of area 500 m^2 is easy: it's a square of side sqrt(500) m. Do you care about rotations? Then rotate it by runif(1, 0, 2*pi). Do you care about its location? Add an (x, y) offset computed from runif or whatever.
A rectangle? Given the length of one pair of sides, you only have the freedom to choose the length of the other two. How do you choose the length of the first pair? You might want runif() between some 'sensible' limits for your application. You could use rnorm(), but that might give you negative lengths, so maybe rnorm squared. Once you've got that side, the other side length is 500/L. Rotate, translate, and add salt and pepper to taste, as in the sketch below.
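As a rough R sketch of that rectangle recipe (the side-length limits and the 100x100 placement window are arbitrary assumptions of mine):
## Pick one side at random, derive the other from the area,
## then rotate and translate; returns the 4 corners as rows.
random_rect <- function(area, lim = c(1, 50)) {
    w <- runif(1, lim[1], lim[2])   # first pair of sides
    h <- area / w                   # the other pair is then fixed
    theta <- runif(1, 0, 2*pi)      # random rotation
    ctr <- runif(2, 0, 100)         # random location
    corners <- cbind(c(-w, w, w, -w)/2, c(-h, -h, h, h)/2)
    R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
    sweep(corners %*% t(R), 2, ctr, "+")
}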
For triangles, the area formula is half base times height. So generate a base length (again, runif, rnorm, etc.), then choose another point giving the required height. Rotate, etc.
In summary, a shape has a number of "degrees of freedom", and constraining the area to be fixed will use up at least one of those freedoms[1], so if you start building a shape from random numbers you'll come to a point where you have to put in a computed value.
[1] exactly one? I'm not sure - these aren't degrees of freedom in the statistical sense...
I would suggest coding a random walk of adjacent tiny squares, so that the aggregation of the tiny squares can take an arbitrary shape with a known area (a sketch follows the link below).
http://en.wikipedia.org/wiki/File:Random_walk_in2D.png
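A rough sketch of that idea in R (my own illustration; the cell size and starting point are arbitrary): walk on a unit grid until the required number of distinct cells has been visited, so the union of the visited cells has exactly the target area.
## Random-walk blob: returns the lower-left corners of the cells
## making up a shape of area n_cells * cell^2.
random_blob <- function(area, cell = 1) {
    n_cells <- round(area / cell^2)
    visited <- matrix(c(0, 0), ncol = 2)
    pos <- c(0, 0)
    steps <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
    while (nrow(visited) < n_cells) {
        pos <- pos + steps[sample(4, 1), ]
        if (!any(visited[, 1] == pos[1] & visited[, 2] == pos[2]))
            visited <- rbind(visited, pos)
    }
    visited * cell
}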
It would be very tough to make a generic method, but you could code up examples for 3-, 4-, and 5-sided shapes.
Here is an example of a random triangle (in C#):
using System;

class Triangle
{
    public double Angle1;  // degrees
    public double Angle2;  // degrees; Angle3 = 180 - Angle1 - Angle2
    public double Base;
}

static class TriangleFactory
{
    static readonly Random Rng = new Random();

    public static Triangle RandomTriangle(double area)
    {
        // A = (base * height) / 2
        double angle1 = Rng.NextDouble() * 180.0;
        double angle2 = Rng.NextDouble() * (180.0 - angle1);
        // Trig gives height = k * base with k = 1 / (cot(angle1) + cot(angle2)),
        // so base = sqrt(2 * area / k). (Degenerate near-zero angles not guarded.)
        double t1 = Math.Tan(angle1 * Math.PI / 180.0);
        double t2 = Math.Tan(angle2 * Math.PI / 180.0);
        double k = (t1 * t2) / (t1 + t2);
        double baseLength = Math.Sqrt(2.0 * area / k);
        return new Triangle { Angle1 = angle1, Angle2 = angle2, Base = baseLength };
    }
}