Starting from the centre of this hexagonal grid and working outwards, I would like to generate a list of 100000 pairs of coordinates (without any coordinates duplicated). Looking for suggestions on the best way to do this. If possible I'd like them ordered from closest to the centre to furthest from the centre.
The following function will find the centroids of hexagons in a regular grid, centered at x = 0, y = 0, where each hexagon is 1 unit wide:
hex_rings_at_d <- function(d) {
if(d == 0) return(data.frame(x = 0, y = 0))
d2 <- sqrt(d^2 - (0.5 * d)^2)
d3 <- 0.5 * d
f <- function(a, b) seq(a, b, length.out = d + 1)[-1]
data.frame(x = c(f( d, d3), f( d3, -d3), f(-d3, -d),
f(-d, -d3), f(-d3, d3), f( d3, d)),
y = c(f(0, d2), f(d2, d2), f(d2, 0),
f(0, -d2), f(-d2, -d2), f(-d2, 0)))
}
If we get all centroids out to a distance of 577 we will have 100,000 of them:
df <- do.call(rbind, lapply(0:577, hex_rings_at_d))
nrow(df)
#> [1] 1000519
These will be ordered from inside out:
head(df)
#> x y
#> 1 0.0 0.0000000
#> 2 0.5 0.8660254
#> 3 -0.5 0.8660254
#> 4 -1.0 0.0000000
#> 5 -0.5 -0.8660254
#> 6 0.5 -0.8660254
And we can confirm they are arranged in a regular hexagonal grid. Here are the first 169 centroids:
plot(df[1:169,])
Created on 2022-06-24 by the reprex package (v2.0.1)
Related
I have data frame, for example
df <- data.frame(x = 1:1e3, y = rnorm(1e3))
I need to split points on N (in my case N = 6, 12 and 24) rectangles with equal number of points. How to split my df using R-tree algorithm?
For uniformely distributed data on the x axis, kmeans clustering works (without surprise) well:
library(dplyr)
library(ggplot2)
set.seed(1)
df <- data.frame(x = 1:1e3, y = rnorm(1e3))
N <- 10
df$cluster <- kmeans(df,N)$cluster
cluster_rectangles <- df %>% group_by(cluster) %>%
summarize(xmin = min(x),
xmax = max(x),
ymin = min(y),
ymax = max(y),
n = n())
ggplot() + geom_rect(data = cluster_rectangles, mapping=aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, fill=cluster)) +
geom_point(data = df,mapping=aes(x,y),color='white')
It also works if x distribution is normal :
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
Drawback is that the number of points for each rectangle varies :
> cluster_rectangles %>% select(cluster,n)
# A tibble: 10 x 2
cluster n
<int> <int>
1 1 137
2 2 58
3 3 121
4 4 61
5 5 72
6 6 184
7 7 78
8 8 70
9 9 126
10 10 93
For an uniform distribution, the result is quite good (with N=9):
In case that all the points have different x coordinates, as it is the case in your example, sort the points increasingly according to the x coordinate. Note that, in this case, your problem of finding a covering with rectangles (with equal number of points) for the 2d points can be simplified to finding a covering with segments for 1d points (i.e. you can ignore the height of the rectangles).
Here how you can find the points in each rectangle:
num_rect <- 7 # In your example 6, 12 or 24
num_points <- 10 # In your example 1e3
# Already ordered according to x
df <- data.frame(x = 1:num_points, y = rnorm(num_points))
# Minimum number of points in the rectangles to cover all of them
points_in_rect <- ceiling(num_points/num_rect)
# Cover the first points using non-overlaping rectangles
breaks <- seq(0,num_points, by=points_in_rect)
cover <- split(seq(num_points), cut(seq(num_points), breaks))
names(cover) <- paste0("rect", seq(length(cover)))
# Cover the last points using overlaping rectangles
cur_num <- length(cover)
if (num_points < num_rect*points_in_rect ) {
# To avoid duplicate rectangles
last <- num_points
if (num_points %% 1 == 0)
last <- last -1
while (cur_num < num_rect) {
cur_num <- cur_num + 1
new_rect <- list(seq(last-points_in_rect+1, last))
names(new_rect) <- paste0("rect", cur_num)
cover <- c(cover,new_rect)
last <- last - points_in_rect
}
}
The points in the rectangles are:
$rect1
[1] 1 2
$rect2
[1] 3 4
$rect3
[1] 5 6
$rect4
[1] 7 8
$rect5
[1] 9 10
$rect6
[1] 8 9
$rect7
[1] 6 7
The minimum bounding rectangles (parallel to the axes) that enclose those set of points are the ones that you are finding.
Duplicated coordinate values in both axes
Randomly rotate the points (save the rotation angle) and check if there are not duplicate x (or y) coordinates. If this is the case, use the above strategy with the rotated coordinates (remember to sort before the rotated points according to the new x coordinates), and then rotate back the obtained rectangles in the opposite direction. If duplicated coordinates remain in both axes, rotate the points again with a different (random) angle. Since you have a finite number of points, you can always find a rotation angle that separates de x (or y) coordinates.
I am trying to solve the dispersion equation:
w^2 = k*g * tanh(kh)
I have a vector (25x1 element) input for w and want a vector output for k.
I've tried below, but is highly dependent on the tolerance values for uniroot.all:
g = 9.81 #m/s^2
h = 8 #m
w = c(0.1,0.2,0.3) # 3 element vector for ease
dispersion <- function(k) { 0 == k*g*tanh(k*h)^0.5-w }
k1 <- uniroot.all(function(k) dispersion(1e4), c(-10,10), tol = 1e-100, maxiter = 1000)
You can use an apply function, like so:
# Parameters
g <- 9.81
h <- 8
w <- c(0.1, 0.2, 0.3)
# Function to find root of
disp_root <- function(k, w) {k * g * tanh(k * h) - w^2 }
# Apply for each w
res <- sapply(w, function(x)rootSolve::uniroot.all(disp_root, c(-1,1), w = x))
# Repackage results
df_res <- data.frame(w, t(res))
# Fix names
names(df_res)[2:3] <- c("first_root", "second_root")
# Examine results
df_res
#> w first_root second_root
#> 1 0.1 -0.01126658 0.01126658
#> 2 0.2 -0.02272161 0.02272161
#> 3 0.3 -0.03419646 0.03419646
Created on 2020-01-31 by the reprex package (v0.3.0)
I am trying to create or find a function that calculates the distance and angle between two points, the idea is that I can have two data.frames with x, y coordinates as follows:
Example dataset
From <- data.frame(x = c(0.5,1, 4, 0), y = c(1.5,1, 1, 0))
To <- data.frame(x =c(3, 0, 5, 1), y =c(3, 0, 6, 1))
Current function
For now, I've managed to develop the distance part using Pythagoras:
distance <- function(from, to){
D <- sqrt((abs(from[,1]-to[,1])^2) + (abs(from[,2]-to[,2])^2))
return(D)
}
Which works fine:
distance(from = From, to = To)
[1] 2.915476 1.414214 5.099020 1.414214
but I can't figure out how to get the angle part.
What I tried so far:
I tried adapting the second solution of this question
angle <- function(x,y){
dot.prod <- x%*%y
norm.x <- norm(x,type="2")
norm.y <- norm(y,type="2")
theta <- acos(dot.prod / (norm.x * norm.y))
as.numeric(theta)
}
x <- as.matrix(c(From[,1],To[,1]))
y <- as.matrix(c(From[,2],To[,2]))
angle(t(x),y)
But I am clearly making a mess of it
Desired output
I would like having the angle part of the function added to my first function, where I get both the distance and angle between the from and to dataframes
By angle between two points, I am assuming you mean angle between two vectors
defined by endpoints (and assuming the start is the origin).
The example you used was designed around only a single pair of points, with the transpose used only on this principle. It is however robust enough to work in more than 2 dimensions.
Your function should be vectorised as your distance function is, as it is expecting a number of pairs of points (and we are only considering 2 dimensional points).
angle <- function(from,to){
dot.prods <- from$x*to$x + from$y*to$y
norms.x <- distance(from = `[<-`(from,,,0), to = from)
norms.y <- distance(from = `[<-`(to,,,0), to = to)
thetas <- acos(dot.prods / (norms.x * norms.y))
as.numeric(thetas)
}
angle(from=From,to=To)
[1] 0.4636476 NaN 0.6310794 NaN
The NaNs are due to you having zero-length vectors.
how about:
library(useful)
df=To-From
cart2pol(df$x, df$y, degrees = F)
which returns:
# A tibble: 4 x 4
r theta x y
<dbl> <dbl> <dbl> <dbl>
1 2.92 0.540 2.50 1.50
2 1.41 3.93 -1.00 -1.00
3 5.10 1.37 1.00 5.00
4 1.41 0.785 1.00 1.00
where r us the distance and theta is the angle
I have an R matrix which is very data dense. It has 500,000 rows. If I plot 1:500000 (x axis) to the third column of the matrix mat[, 3] it takes too long to plot, and sometimes even crashes. I've tried plot, matplot, and ggplot and all of them take very long.
I am looking to group the data by 10 or 20. ie, take the first 10 elements from the vector, average that, and use that as a data point.
Is there a fast and efficient way to do this?
We can use cut and aggregate to reduce the number of points plotted:
generate some data
set.seed(123)
xmat <- data.frame(x = 1:5e5, y = runif(5e5))
use cut and aggregate
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
xmat.agg <- aggregate(y ~ cutx, data = xmat, mean)
make plot
plot(xmat.agg, pch = ".")
more than 1 column solution:
Here, we use the data.table package to group and summarize:
generate some more data
set.seed(123)
xmat <- data.frame(x = 1:5e5,
u = runif(5e5),
z = rnorm(5e5),
p = rpois(5e5, lambda = 5),
g = rbinom(n = 5e5, size = 1, prob = 0.5))
use data.table
library(data.table)
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
setDT(xmat) #convert to data.table
#for each level of cutx, take the mean of each column
xmat[,lapply(.SD, mean), by = cutx] -> xmat.agg
# xmat.agg
# cutx x u z p g
# 1: 1 5.5 0.5782475 0.372984058 4.5 0.6
# 2: 2 15.5 0.5233693 0.032501186 4.6 0.8
# 3: 3 25.5 0.6155837 -0.258803746 4.6 0.4
# 4: 4 35.5 0.5378580 0.269690334 4.4 0.8
# 5: 5 45.5 0.3453964 0.312308395 4.8 0.4
# ---
# 49996: 49996 499955.5 0.4872596 0.006631221 5.6 0.4
# 49997: 49997 499965.5 0.5974486 0.022103345 4.6 0.6
# 49998: 49998 499975.5 0.5056578 -0.104263093 4.7 0.6
# 49999: 49999 499985.5 0.3083803 0.386846148 6.8 0.6
# 50000: 50000 499995.5 0.4377497 0.109197095 5.7 0.6
plot it all
par(mfrow = c(2,2))
for(i in 3:6) plot(xmat.agg[,c(1,i), with = F], pch = ".")
I'd like to snap single points to other points based on their distance in R.
In detail, I have a bunch of points defined by X and Y coordinate pairs.
In addition I have single different points, that I want to snap to
the closest neighbors (euclidean distance).
# target points
df <- data.frame(X=c(1,2,2,2,3,4),Y=c(1,2,2,3,3,4))
# points that need snapping
point1 <- data.frame(X=2.1, Y=2.3)
point2 <- data.frame(X=2.5, Y=2.5)
plot(df)
points(point1$X,point1$Y,pch=20,col="red")
points(point2$X,point2$Y,pch=20,col="blue")
But how to proceed with the snapping of the points?
How can I snap the points and assign new coordinate pairs to the single points?
Is there a simple function available in R? Or do I need to apply
the dist() function to obtain a distance matrix and search for
the closest distance? Maybe there is a more straight forward way.
Thats how it should look like:
1) snap to the closest (euclidean distance) point (clear solution for point 1)
point1$X_snap <- 2
point1$Y_snap <- 2
2) if two or more points similarily close than
snap to that which is more "north-east"
a) snap first to the one which is more north (Y direction)
b) if there are more than one that a similarly distant in Y direction snap
to the one that is more east
point2$X_snap <- 3
point2$Y_snap <- 3
For a graphical illustration how the results should look like
#plot snapped points:
points(point1$X_snap,point1$Y_snap,pch=8,col="red")
points(point2$X_snap,point2$Y_snap,pch=8,col="blue")
It's possible using dist:
dist(rbind(point1,df))
1 2 3 4 5
2 1.7029386
3 0.3162278 1.4142136
4 0.3162278 1.4142136 0.0000000
5 0.7071068 2.2360680 1.0000000 1.0000000
6 1.1401754 2.8284271 1.4142136 1.4142136 1.0000000
7 2.5495098 4.2426407 2.8284271 2.8284271 2.2360680
6
2
3
4
5
6
7 1.4142136
So the row with the minimum value (distance) in the first column identifies the point in df which is closest to point1 . In your example, you have a repeated location. Repeat for each of your point_x .
I found another solution using the matchpt() function
from Biobase (Bioconductor):
# target points
df <- data.frame(X=c(1,2,2,2,3,4),Y=c(1,2,2,3,3,4))
# points that need snapping
point1 <- data.frame(X=2.1, Y=2.3)
point2 <- data.frame(X=2.5, Y=2.5)
snap <- function(df,point){
require(Biobase)
d <- matchpt(as.matrix(df),
as.matrix(data.frame(X=point$X+0.0001,Y=point$Y+0.0001))) # to the "northwest" criteria correct
min_row <- as.numeric(rownames(d[d$distance==min(d$distance),]))
point$X_snap <- unique(df[min_row,"X"])
point$Y_snap <- unique(df[min_row,"Y"])
point
}
snap(df,point2)
You might also want to try the RANN package for Fast Nearest Neighbour Search:
# your data
df <- data.frame(X=c(1,2,2,2,3,4),Y=c(1,2,2,3,3,4))
pts <- data.frame(X=c(2.1, 2.5), Y=c(2.3, 2.5))
library(RANN)
# for each point in pts, find the nearest neighbor from df
closest <- RANN::nn2(data = df, query = pts, k = 1)
# argument k sets the number of nearest neighbours, here 1 (the closest)
closest
# $nn.idx
# [,1]
# [1,] 3
# [2,] 5
#
# $nn.dists
# [,1]
# [1,] 0.3162278
# [2,] 0.7071068
# Get coordinates of nearest neighbor
pts$X_snap <- df[closest$nn.idx, "X"]
pts$Y_snap <- df[closest$nn.idx, "Y"]
pts
# X Y X_snap Y_snap
# 1 2.1 2.3 2 2
# 2 2.5 2.5 3 3
I would put the criteria (distance, "southness", "westness") in a dataframe and then sort this dataframe along these criteria:
# input data
df <- data.frame(X=c(1,2,2,2,3,4),Y=c(1,2,2,3,3,4))
point1 <- data.frame(X=2.1, Y=2.3)
point2 <- data.frame(X=2.5, Y=2.5)
df.res[with(df.res, order(dst, dy, dx)), ]
# function that sorts all potential snapping points according to distance, "westness", "southness"
snap.xy <- function(point, other.points) {
df.res <- data.frame(X = other.points$X, # to later access the coordinates to snap to
Y = other.points$Y, # dto
dx <- point$X - other.points$X, # "westness" (the higher, the more "west")
dy <- point$Y - other.points$Y, # "southness"
dst = sqrt(dx^2 + dy^2)) # distance
# print(df.res[with(df.res, order(dst, dy, dx)), ]) # just for checking the results
return(df.res[with(df.res, order(dst, dy, dx)), ][1,c("X", "Y")]) # return only the X/Y coordinates
}
# examples
snap.xy(point1, df) # 2/2
snap.xy(point2, df) # 3/3
snap.xy(point2, df)$X # 3
snap.xy(point2, df)$Y # 3