I have a grid of rectangles whose coordinates are stored in a variable, say 'gridPoints', as shown below:
gridData.Grid=GridTopology(c(min(data$LATITUDE),min(data$LONGITUDE)),c(0.005,0.005),c(32,32));
gridPoints = as.data.frame(coordinates(gridData.Grid))[1:1000,];
names(gridPoints) = c("LATITUDE","LONGITUDE");
plot(gridPoints,col=4);
points(data,col=2);
When plotted, these are the black points in the image.
Now, I have another data set of points called, say, 'data', which when plotted are the blue points above.
I would want a count of how many blue points fall within each rectangle in the grid. Each rectangle can be represented by the center of the rectangle, along with the corresponding count of blue points within it in the output. Also, if the blue point lies on any of the sides of the rectangle, it can be considered as lying within the rectangle while making the count. The plot has the blue and black points looking like circles, but they are just standard points/coordinates and hence, much smaller than the circles. In a special case, the rectangle can also be a square.
Try this,
x <- seq(0,10,by=2)
y <- seq(0, 30, by=10)
grid <- expand.grid(x, y)
N <- 100
points <- cbind(runif(N, 0, 10), runif(N, 0, 30))
plot(grid, t="n", xaxs="i", yaxs="i")
points(points, col="blue", pch="+")
abline(v=x, h=y)
binxy <- data.frame(x=findInterval(points[,1], x),
y=findInterval(points[,2], y))
(results <- table(binxy))
d <- as.data.frame.table(results)
xx <- x[-length(x)] + 0.5*diff(x)
d$x <- xx[d$x]
yy <- y[-length(y)] + 0.5*diff(y)
d$y <- yy[d$y]
with(d, text(x, y, label=Freq))
A more general approach (may be overkill for this case, but if you generalize to arbitrary polygons it will still work) is to use the over function in the sp package. This will find which polygon each point is contained in (then you can count them up).
You will need to do some conversions up front (to spatial objects) but this method will work with more complicated polygons than rectangles.
If all the rectangles are exactly the same size, then you could use k nearest neighbor techniques using the centers of the rectangles, see the knn and knn1 functions in the class package.
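A minimal sketch of the nearest-centre idea, assuming a regular grid of identical rectangles and using knn1 from the class package. The names xb, yb, centres, pts and cell_id are made up for this illustration, and the toy grid mirrors the one in the first answer. Each point is assigned to its nearest centre, which for a regular grid is the centre of the rectangle containing it (points exactly on a shared edge are ties and may go to either neighbour).

```r
library(class)  # recommended package that ships with standard R installations

# Cell boundaries and cell centres of a regular grid (hypothetical toy data)
xb <- seq(0, 10, by = 2)
yb <- seq(0, 30, by = 10)
centres <- expand.grid(x = xb[-length(xb)] + diff(xb) / 2,
                       y = yb[-length(yb)] + diff(yb) / 2)

set.seed(1)
N <- 100
pts <- cbind(x = runif(N, 0, 10), y = runif(N, 0, 30))

# Label every centre with its own row number, then look up the nearest centre
cell_id <- factor(seq_len(nrow(centres)))
nearest <- knn1(train = centres, test = pts, cl = cell_id)

# One row per rectangle: its centre and the number of points assigned to it
centres$count <- tabulate(as.integer(as.character(nearest)),
                          nbins = nrow(centres))
centres
```

The result has one row per rectangle, which is the "centre plus count" output the question asks for.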
I want to visualize proportions using points inside a circle. For example, let's say that I have 100 points that I wish to scatter (somewhat randomly jittered) in a circle.
Next, I want to use this diagram to represent the proportions of people who voted Biden/Harris in 2020 US presidential elections, in each state.
Example #1 -- Michigan
Biden got 50.62% of Michigan's votes. I'm going to draw a horizontal diameter that splits the circle to two halves, and then color the points under the diameter in blue (Democrats' color).
Example #2 -- Wyoming
Unlike Michigan, in Wyoming Biden got only 26.55% of the votes, which is approximately a quarter of the vote. In this case I'd draw a horizontal chord that divides the circle such that the disk's area under the chord is 25% of the entire disk area. Then I'll color the respective points in that area in blue. Since I have 100 points in total, 25 points represent the 25% who voted Biden in Wyoming.
My question: How can I do this with ggplot? I researched this issue, and there's a lot of geometry going on here. First, the kind of area I'm talking about is called a "circular segment". Second, there are many formulas to calculate its area, if we know some other parameters about the shape (such as the radius length, etc.). See this nice demo.
However, my goal isn't to solve geometry problems, but just to represent proportions in a very specific way:
draw a circle
sprinkle X number of points inside
draw a (real or invisible) horizontal line that divides the circle/disk area according to a given proportion
ensure that the points are arranged respective to the split. That is, if we want to represent a 30%-70% split, then have 30% of the points under the line that divides the disk.
color the points under the line.
I understand that this is somewhat an exotic visualization, but I'll be thankful for any help with this.
EDIT
I've found a reference to a JavaScript package that does something very similar to what I'm asking.
I took a crack at this for fun. There's a lot more that could be done. I agree that this is not a great way to visualize proportions, but if it's engaging your audience ...
Formulas for determining appropriate heights are taken from Wikipedia. In particular we need the formulas
a/A = (theta - sin(theta))/(2*pi)
h = 1-cos(theta/2)
where a is the area of the segment; A is the whole area of the circle; theta is the angle described by the arc that defines the segment (see Wikipedia for pictures); and h is the height of the segment.
Machinery for finding heights.
afun <- function(x) (x-sin(x))/(2*pi)
## curve(afun, from=0, to = 2*pi)
find_a <- function(a) {
uniroot(
function(x) afun(x) -a,
interval=c(0, 2*pi))$root
}
find_h <- function(a) {
1- cos(find_a(a)/2)
}
vfind_h <- Vectorize(find_h)
## find_a(0.5)
## find_h(0.5)
## curve(vfind_h(x), from = 0, to= 1)
Set up a circle:
dd <- data.frame(x=0,y=0,r=1)
library(ggforce)
library(ggplot2); theme_set(theme_void())
gg0 <- ggplot(dd) + geom_circle(aes(x0=x,y0=y,r=r)) + coord_fixed()
Finish:
props <- c(0.2,0.5,0.3) ## proportions
n <- 100 ## number of points to scatter
cprop <- cumsum(props)[-length(props)]
h <- vfind_h(cprop)
set.seed(101)
r <- runif(n)
th <- runif(n, 0, 2 * pi)
## sqrt(r) spreads the points uniformly over the disk's area;
## th is already an angle in [0, 2*pi), so it must not be multiplied by 2*pi again
dd2 <- data.frame(x = sqrt(r) * cos(th),
y = sqrt(r) * sin(th))
dd2$g <- cut(dd2$y, c(1, 1-h, -1))
gg0 + geom_point(data=dd2, aes(x, y, colour = g), size=3)
There are a bunch of tweaks that would make this better: meaningful names for the categories; reversing the axis order to match the plot; maybe adding segments delimiting the sections, or (more work) polygons so you can shade the sections.
You should definitely check this for mistakes — e.g. there are places where I may have used a set of values where I should have used their first differences, or vice versa (values vs cumulative sum). But this should get you started.
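One way to do that check is a quick simulation. The sketch below re-defines the two helper functions so that it runs on its own, scatters many points uniformly over the unit disk, and compares the share of points lying above the height 1 - h with the target area fraction. The variable names target and observed are made up for this illustration.

```r
# Area fraction of a circular segment with central angle theta (unit circle)
afun <- function(theta) (theta - sin(theta)) / (2 * pi)

# Height of the segment whose area is a fraction a of the whole disk
find_h <- function(a) {
  theta <- uniroot(function(x) afun(x) - a,
                   interval = c(0, 2 * pi), tol = 1e-10)$root
  1 - cos(theta / 2)
}

set.seed(101)
n  <- 1e5
r  <- runif(n)
th <- runif(n, 0, 2 * pi)
y  <- sqrt(r) * sin(th)            # y-coordinates of points uniform on the disk

target   <- c(0.2, 0.5, 0.7)       # cumulative proportions to test
h        <- sapply(target, find_h)
observed <- sapply(h, function(hh) mean(y > 1 - hh))
round(cbind(target, h, observed), 3)
```

If the formulas are applied correctly, the observed column should sit close to the target column. Note that with only 100 points the counts per section will match the proportions only approximately.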
I have a bunch of points in 2D space and have calculated a convex hull for them. I would now like to "tighten" the hull so that it no longer necessarily encompasses all points. In the typical nails-in-board-with-rubber-band analogy, what I'd like to achieve is to be able to tune the elasticity of the rubber band and allow nails to bend at pressure above some limit. That's just an analogy; there is no real physics here. This is loosely related to the reduction in hull area if a given point were removed, but not quite, because there could be two points that are very close to each other. This is not necessarily related to outlier detection, because you could imagine a pattern where a large fraction of the nails would bend if they are on a narrow line (imagine a hammer shape, for example). All of this has to be reasonably fast for thousands of points. Any hints on where I should look in terms of algorithms? An implementation in R would be perfect, but is not required.
EDIT AFTER COMMENT: The three points I've labelled are those with largest potential for reducing the hull area if they are excluded. In the plot there is no other set of three points that would result in a larger area reduction. A naive implementation of what I'm looking for would maybe be to randomly sample some fraction of the points, calculate the hull area, remove each point on the hull iteratively, recalculate the area, repeat many times and remove points that tend to lead to high area reduction. Maybe this could be implemented in some random forest variant? It's not quite right though, because I would like the points to be removed one by one so that you get the following result. If you looked at all points in one go it would possibly be best to trim from the edges of the "hammer head".
Suppose I have a set of points like this:
set.seed(69)
x <- runif(20)
y <- runif(20)
plot(x, y)
Then it is easy to find the subset of points that sit on the convex hull by doing:
ss <- chull(x, y)
This means we can plot the convex hull by doing:
lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "red")
Now we can randomly remove one of the points that sits on the convex hull (i.e. "bend a nail") by doing:
bend <- ss[sample(length(ss), 1)] # pick one hull vertex at random
x <- x[-bend]
y <- y[-bend]
And we can then repeat the process of finding the convex hull of this new set of points:
ss <- chull(x, y)
lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "blue", lty = 2)
To get the point which will, on removal, cause the greatest reduction in area, one option would be the following function:
library(sp)
shrink <- function(coords)
{
ss <- chull(coords[, 1], coords[, 2])
outlier <- ss[which.min(sapply(seq_along(ss),
function(i) Polygon(coords[ss[-i], ], hole = FALSE)@area))]
coords[-outlier, ]
}
So you could do something like:
coords <- cbind(x, y)
new_coords <- shrink(coords)
new_chull <- new_coords[chull(new_coords[, 1], new_coords[, 2]),]
new_chull <- rbind(new_chull, new_chull[1,])
plot(x, y)
lines(new_chull[,1], new_chull[, 2], col = "red")
Of course, you could do this in a loop so that new_coords is fed back into shrink multiple times.
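A minimal sketch of such a loop in base R, without the sp dependency. The helpers poly_area, hull_area and shrink_once are hypothetical names introduced here; the area comes from the shoelace formula rather than from sp's Polygon, and, unlike the function above, the hull is recomputed after each candidate removal.

```r
# Shoelace formula: area of a polygon given as a two-column matrix of vertices
poly_area <- function(p) {
  nxt <- c(2:nrow(p), 1)
  abs(sum(p[, 1] * p[nxt, 2] - p[nxt, 1] * p[, 2])) / 2
}

# Area of the convex hull of a set of points
hull_area <- function(coords) poly_area(coords[chull(coords), , drop = FALSE])

# Drop the hull vertex whose removal leaves the smallest hull
shrink_once <- function(coords) {
  ss <- chull(coords)
  areas <- sapply(ss, function(i) hull_area(coords[-i, , drop = FALSE]))
  coords[-ss[which.min(areas)], , drop = FALSE]
}

set.seed(69)
coords  <- cbind(x = runif(20), y = runif(20))
trimmed <- coords
areas   <- hull_area(coords)
for (k in 1:3) {                       # remove three points, one at a time
  trimmed <- shrink_once(trimmed)
  areas   <- c(areas, hull_area(trimmed))
}
areas                                  # hull area after 0, 1, 2, 3 removals
```

Each pass costs one hull computation per hull vertex, so for thousands of points it may be worth stopping as soon as the area reduction falls below some threshold.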
Calculate a robust center and covariance using cov.mcd in MASS, then the Mahalanobis distance of each point from that center (using mahalanobis, which is in the stats package that ships with R). We then show a quantile plot of the Mahalanobis distances using PlotMD from modi and also show the associated outliers in red in the second plot. (There are other functions in modi that may be of interest as well.)
library(MASS)
library(modi)
set.seed(69)
x <- runif(20)
y <- runif(20)
m <- cbind(x, y)
mcd <- cov.mcd(m)
md <- mahalanobis(m, mcd$center, mcd$cov)
stats <- PlotMD(md, 2, alpha = 0.90)
This gives the quantile plot (screenshot omitted). We then show the convex hull using lines and the outliers in red:
plot(m)
ix <- chull(m)
lines(m[c(ix, ix[1]), ])
wx <- which(md > stats$halpha)
points(m[wx, ], col = "red", pch = 20)
Thank you both! I've tried various methods for outlier detection, but it's not quite what I was looking for. They have worked badly due to weird shapes of my clusters. I know I talked about convex hull area, but I think filtering on segment lengths yields better results and is closer to what I really wanted. Then it would look something like this:
shrink <- function(xy, max_length = 30){
to_keep <- 1:(dim(xy)[1])
centroid <- c(mean(xy[,1]), mean(xy[,2]))
while (TRUE){
ss <- chull(xy[,1], xy[,2])
ss <- c(ss, ss[1])
# Squared lengths of the hull segments, so max_length is on the squared scale
lengths <- sapply(1:(length(ss)-1), function(i) sum((xy[ss[i+1],] - xy[ss[i],])^2))
# This gets the point with the longest convex hull segment. chull returns points
# in clockwise order, so the point to remove is either this one or the one
# after it. Remove the one furthest from the centroid.
max_point <- which.max(lengths)
if (lengths[max_point] < max_length) return(to_keep)
if (sum((xy[ss[max_point],] - centroid)^2) > sum((xy[ss[max_point + 1],] - centroid)^2)){
xy <- xy[-ss[max_point],]
to_keep <- to_keep[-ss[max_point]]
}else{
xy <- xy[-ss[max_point + 1],]
to_keep <- to_keep[-ss[max_point + 1]]
}
}
}
It's not optimal because it factors in the distance to the centroid, which I would have liked to avoid, and there is a max_length parameter that should be calculated from the data instead of being hard-coded.
No filter:
It looks like this because there are 500 000 cells in here, and there are many that end up "wrong" when projecting from ~20 000 dimensions to 2.
Filter:
Note that it filters out points at tips of some clusters. This is less-than-optimal but ok. The overlap between some clusters is true and should be there.
I have two matrices which are built as follows
X1=cbind(V1,V2,ID)
X2=cbind(V1,V2,ID)
X3=rbind(X1,X2)
ID takes only the values "red" and "blue"
when I plot X1 and X2 I have the following plot
I want to select the data points which are within 1 unit of distance (Euclidean distance) of a point of the other colour, that is, keep only the red points which overlap or nearly overlap a blue point, and vice versa.
Red overlapping red and blue overlapping blue is not interesting for me.
Thanks a lot for your assistance.
You definitely need to provide a reproducible example for this one to get the best answer; however, I think below script will serve the purpose:
library(spatstat)
# setting seeds
set.seed(222)
# two different point patterns
X <- runifpoint(15)
Y <- runifpoint(20)
plot(X, pch=19, main="")
plot(Y, col="red", pch=19, add=T)
#you can get both which and dist from nncross
#N.which <- nncross(X,Y, k=1:20, what="which")
#N.dist <- nncross(X,Y, k=1:20, what="dist")
out <- subset(X, nncross(X, Y, what="dist") < 0.1) # distance to the nearest point of Y; you may change 0.1
plot(out, col="blue", pch=19, add=T)
In the plot above, black points represent X and red points represent Y. Blue points are the points of X that lie within 0.1 units of a point of Y; this threshold can be changed as needed. For more details, see the spatstat documentation for nncross, which computes nearest-neighbour distances between two different point patterns.
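If adding a dependency is not desirable, the same filter can be sketched in base R with a full cross-distance matrix. This is only an illustration: red and blue are made-up coordinate matrices standing in for the numeric columns of X1 and X2, and the approach assumes the two sets are small enough for an n-by-m matrix to fit in memory.

```r
set.seed(222)
# Hypothetical stand-ins for the coordinate columns of X1 (red) and X2 (blue)
red  <- cbind(x = runif(15, 0, 10), y = runif(15, 0, 10))
blue <- cbind(x = runif(20, 0, 10), y = runif(20, 0, 10))

# d[i, j] is the Euclidean distance between red point i and blue point j
d <- sqrt(outer(red[, 1], blue[, 1], "-")^2 +
          outer(red[, 2], blue[, 2], "-")^2)

near <- d <= 1                                         # pairs within 1 unit
red_near  <- red[rowSums(near) > 0, , drop = FALSE]    # red close to some blue
blue_near <- blue[colSums(near) > 0, , drop = FALSE]   # blue close to some red

plot(rbind(red, blue), type = "n")
points(red, col = "red")
points(blue, col = "blue")
points(rbind(red_near, blue_near), pch = 19, cex = 0.6)
```

Only red-to-blue distances are computed, so red points near other red points (or blue near blue) are never selected.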
I want to create 50 concentric circles. I did it with Python but now I want to do this in R. I have tried the symbols function but with no result. I want my circles to start from x,y coordinates and the radius of each circle to be 3 times bigger than the previous.
step=1
for(i in seq(1,50,1)){
symbols (x, y, circles=50, col="grey")
step=step+3
}
From this I get one circle as a result.
I am new in programming so it is probably very simple. Should I use a specific package?
The beauty of R is that many things can be vectorized, including the input to the 'symbols' function. Here's an example for you:
#vector of radii
#written in a way that's easily changeable
n_circles <- 50
my_circles <- seq(1,by=1,length.out = n_circles)
#generate x and y
x <- rep(1,n_circles)
y <- rep(1, n_circles)
#plot
symbols(x, y, circles = my_circles)
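A variation closer to the loop in the question, as a sketch under two assumptions: the radius grows by 3 at each step (as in step=step+3; for a radius that triples each time, 3^(0:(n_circles - 1)) could be swapped in), and the radii are meant in axis units, which is what inches = FALSE requests. The centre x0, y0 is a made-up placeholder.

```r
x0 <- 0; y0 <- 0                                   # hypothetical common centre
n_circles <- 50
radii <- seq(1, by = 3, length.out = n_circles)    # 1, 4, 7, ...

# Empty plot that is large enough for the biggest circle, with a 1:1 aspect ratio
lim <- max(radii)
plot(NA, xlim = x0 + c(-lim, lim), ylim = y0 + c(-lim, lim),
     asp = 1, xlab = "x", ylab = "y")

# One vectorized call draws all circles; inches = FALSE keeps radii in axis units
symbols(rep(x0, n_circles), rep(y0, n_circles), circles = radii,
        inches = FALSE, add = TRUE, fg = "grey")
```

The original loop drew the same circle of radius 50 on every pass because step was never used in the call to symbols.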
I have a vector of coordinates where each row designates the centre of a circle:
x <- runif(5,0,2)
y <- runif(5,0,2)
As you can see, the circle centres all lie within the square from (0,0) to (2,2).
Each circle has a radius 0.2. I want to randomly shift the centre of the circles within the bounds of the original circle. I figured I could do this:
radii <- (sample(20,5,replace=TRUE))/100
angles <- sample(360,5,replace=TRUE)
newx <- x + radii*(cos(angles))
newy <- y + radii*(sin(angles))
However, I realise that by doing this I could technically get circle centres that fall outside of the square. I could try to write a loop that rejects newx and newy values that are negative or greater than 2, but I have to do this for tens of thousands of rows and am worried about the speed. Is it possible to run this conditional coordinate shift without resorting to a loop?
My rule set is as follows:
pick a new circle centre for each centre.
The new centres must fall within the area of each circle (radius 0.2 distance from the original centre)
The new centres must lie within the original square.
If a centre meets the border of the square, it should be reflected according to the law of reflection (travelling the remaining length of the randomly selected radius after the bounce).
Something like this:
#lets do only one point first
x <- runif(1,0,2)
y <- runif(1,0,2)
randomwalk <- function (pos) {
x <- pos[1]
y <- pos[2]
radius <- (sample(20,1,replace=TRUE))/100
angle <- sample(360,1,replace=TRUE)*pi/180 # convert degrees to radians for cos() and sin()
newx <- x + radius*(cos(angle))
newy <- y + radius*(sin(angle))
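if (newy > 2) { # reflect off the top border y = 2
r2 <- (2-y)/sin(angle) # distance travelled before hitting the border
hitx <- x + r2*(cos(angle))
hity <- 2
# after the hit the x-direction is unchanged and the y-direction flips
newx <- hitx + (radius-r2)*cos(angle)
newy <- hity - (radius-r2)*sin(angle)
}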
#implement other borders yourself
#and include a check, which border is hit first
#and include the possibility for multiple hits
#(e.g., left border and then top border)
cbind(newx,newy)
}
resx <- vector(50,mode="numeric")
resy <- vector(50,mode="numeric")
res <- cbind(resx,resy)
res[1,] <- cbind(x,y)
for (i in 2:50) {
res[i,] <- randomwalk(res[i-1,])
}
I suspect this still contains some geometric errors, but don't have time to check.
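For the single-shift case in the question (one new centre per circle, no loop), the reflection can be sketched in vectorized form. The assumptions here are mine: radii and angles are drawn with runif instead of sample, sqrt() is applied so that the new centres are uniform over each disk, and because the radius 0.2 is much smaller than the side of the square, one reflection per axis is enough. The helper reflect is a hypothetical name.

```r
set.seed(1)
n <- 10000
x <- runif(n, 0, 2)
y <- runif(n, 0, 2)

radius <- 0.2 * sqrt(runif(n))     # sqrt gives points uniform over the disk area
angle  <- runif(n, 0, 2 * pi)      # radians
newx <- x + radius * cos(angle)
newy <- y + radius * sin(angle)

# Mirror any coordinate that left [lo, hi] back across the border it crossed
reflect <- function(v, lo = 0, hi = 2) {
  v <- ifelse(v < lo, 2 * lo - v, v)
  ifelse(v > hi, 2 * hi - v, v)
}
newx <- reflect(newx)
newy <- reflect(newy)
```

Mirroring a coordinate across a border never moves the point further from the original centre, so the reflected centres still lie within 0.2 of where they started.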
The functions inpip and inout from the package splancs are quite useful; they can be used to check if points fall inside a polygon. You just need a matrix with 2 columns which represents any polygon (such as a square). These functions are made to be fast, using C and Fortran code.
If your square is:
square <- cbind(c(0, 10, 10, 0), c(0, 0, 10, 10)) # In case side = 10
Then create all the new centers (I suggest using runif instead of sample for the radii and angle, but that's up to you). Then check if those centers fall inside the square with one line:
inside <- inout(newCenters, square)
newCenters <- newCenters[inside, ] # keep only the rows (centers) that fall inside
And afterwards you should do all the necessary steps to recreate the newCenters that were filtered out, as many times as needed until they fall inside the square. Note that this needs a while loop (or equivalent).
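A minimal sketch of that resampling loop in base R. To keep it free of dependencies, the hypothetical helper in_square stands in for inout from splancs, and draw is a made-up helper that proposes one new centre per circle. Only the centres that fell outside are redrawn on each pass, so the loop usually finishes after a handful of iterations even for tens of thousands of rows.

```r
set.seed(42)
n  <- 10000
cx <- runif(n, 0, 2)               # original centres
cy <- runif(n, 0, 2)

# Propose one new centre inside each circle of radius r (uniform over the disk)
draw <- function(cx, cy, r = 0.2) {
  rad <- r * sqrt(runif(length(cx)))
  ang <- runif(length(cx), 0, 2 * pi)
  cbind(cx + rad * cos(ang), cy + rad * sin(ang))
}

# Stand-in for splancs::inout() when the polygon is the square [0, 2] x [0, 2]
in_square <- function(p) p[, 1] >= 0 & p[, 1] <= 2 & p[, 2] >= 0 & p[, 2] <= 2

newCenters <- draw(cx, cy)
bad <- !in_square(newCenters)
while (any(bad)) {
  # redraw only the rejected centres, then re-check only those rows
  newCenters[bad, ] <- draw(cx[bad], cy[bad])
  bad[bad] <- !in_square(newCenters[bad, , drop = FALSE])
}
```

The loop terminates because at least a quarter of every disk lies inside the square, so each redraw succeeds with probability of at least 0.25.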
Note also that the same package (splancs) has a function csr that creates random points inside a polygon. So in principle you could cut off the part of every circle that falls outside the square and then use the resulting polygons (the cut circles) as input to this function. This can become slow because you have to use a loop (or maybe lapply) over all the cut circles.
As a last idea, maybe you can combine the two strategies. First apply your initial idea to all circles that fall completely inside the square (or equivalently, all the centers that are at a distance of at least one radius, 0.2 in your case, from the perimeter). Then use the csr function for the rest of the circles.
Hope this helps!