I am using the akima package and bilinear function to interpolate z values (temperatures) from a coarse coordinate grid (2.5° x 2.5°) to a finer grid (0.5° x 0.5°). The bilinear function works as follows:
Usage
bilinear(x, y, z, x0, y0)
Arguments
x a vector containing the x coordinates of the rectangular data grid.
y a vector containing the y coordinates of the rectangular data grid.
z a matrix containing the z[i,j] data values for the grid points (x[i],y[j]).
x0 vector of x coordinates used to interpolate at.
y0 vector of y coordinates used to interpolate at.
Value
This function produces a list of interpolated points:
x vector of x coordinates.
y vector of y coordinates.
z vector of interpolated data z.
Given the following data:
x <- c(0, 2.5, 5, 7.5, 10)            # coarse grid longitudes
y <- c(50, 55, 60, 65, 70)            # coarse grid latitudes
z <- c(10.5, 11.1, 12.4, 9.8, 10.6)   # temperatures
x0 <- c(0, 0.5, 1, 1.5, 2)            # fine grid longitudes
y0 <- c(50, 50.5, 51, 51.5, 52)       # fine grid latitudes
I tried the function:
bilinear(x = x, y = y, z = z, x0 = x0, y0 = y0)
But I get the following:
Error in if (dim(z)[1] != nx) stop("dim(z)[1] and length of x differs!") :
argument is of length zero
I clearly don't fully understand how this function works, and I would really appreciate any suggestions if somebody knows what I'm doing wrong. I'm also open to an alternative solution using a different package.
Read the description of the function carefully: z needs to be a matrix with dimensions length(x) by length(y):
library(akima)
x <- c(0, 2.5, 5, 7.5, 10)
y <- c(50, 55, 60, 65, 70)
z <- matrix(rnorm(25), 5, 5)
x0 <- seq(0, 10, .5)
y0 <- seq(50, 70, length = length(x0))
> bilinear(x, y, z, x0, y0)
$x
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
[16] 7.5 8.0 8.5 9.0 9.5 10.0
$y
[1] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
$z
[1] 1.14880762 1.08789150 0.88252672 0.53271328 0.03845118 -0.60025959
[7] -0.13758256 0.17947029 0.35089894 0.37670342 0.25688371 -0.06736752
[13] -0.42197570 -0.80694083 -1.22226291 -1.66794194 -1.38940279 -1.08889523
[19] -0.76641923 -0.42197481 -0.05556197
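To adapt this to your temperature data, note that z must hold one value per (x[i], y[j]) node, so the 5 x 5 coarse grid needs 25 temperatures, not 5. A minimal sketch, where temps is a hypothetical vector of 25 temperatures ordered with longitude varying fastest (so that column j of the matrix corresponds to y[j]):
temps <- rnorm(25, mean = 10)   # placeholder values for illustration only
z <- matrix(temps, nrow = length(x), ncol = length(y))
bilinear(x, y, z, x0 = c(0, 0.5, 1, 1.5, 2), y0 = c(50, 50.5, 51, 51.5, 52))
Note that bilinear() interpolates at the point pairs (x0[i], y0[i]); to fill every cell of the fine grid you would pass the two columns of expand.grid(x0, y0) as x0 and y0.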
Related
I'm trying to interpolate data from ERA5 Interim to a finer resolution (from 0.1° to 0.05°). I'm using the bilinear function from the akima package in Rstudio to do so, but whenever I try to compute it I only get zeros. What might be the reason behind this?
As a simplification of my data, I run this line
test_bilinear = bilinear(c(-70.6, -70.5, -70.4, -70.3, -70.2),
c(-34.0, -34.1, -34.2, -34.3, -34.4),
matrix(c(151, 151, 154, 162, 171, 142, 142, 146, 155, 167, 137, 135.09, 141.57, 153.08, 164.60, 139.00, 139.22, 144.59, 154.31, 163.37, 143.13, 144.59, 149.90, 158.23, 164.99), nrow = 5, ncol = 5),
c(-70.6, -70.35, -70.65, -71.85),
c(-34.0, -34.05, -34.37, -33.88))[["z"]]
> test_bilinear
[1] 0 0 0 0
But when I use an example of the bilinear function, it works as intended.
x <- c(0, 2.5, 5, 7.5, 10)
y <- c(50, 55, 60, 65, 70)
z <- matrix(rnorm(25), 5, 5)
x0 <- seq(0, 10, .5)
y0 <- seq(50, 70, length = length(x0))
bilinear(x, y, z, x0, y0)[["z"]]
> bilinear(x, y, z, x0, y0)[["z"]]
[1] 0.5212130 0.4629981 0.6316416 1.0271434 1.6495036 2.4987221 1.6069258 0.9774763 0.6103738
[10] 0.5056182 0.6632096 0.6206286 0.6874867 0.8637841 1.1495206 1.5446963 0.7932024 0.3964096
[19] 0.3543180 0.6669276 1.3342383
What might be the problem?
The documentation doesn't say this, but apparently akima::bilinear assumes that both x and y are increasing, and that x0 and y0 fall in their range.
In your data, y is c(-34.0, -34.1, -34.2, -34.3, -34.4) which is decreasing. So put it in increasing order, and modify z in a corresponding way.
You'll still have the problem that only the first two of your (x0, y0) points fall in the range of your data. To fix that, you'll need more data, or fewer points to interpolate.
Here are some calculations to illustrate:
# Your original data:
x <- c(-70.6, -70.5, -70.4, -70.3, -70.2)
y <- c(-34.0, -34.1, -34.2, -34.3, -34.4)
z <- matrix(c(151, 151, 154, 162, 171, 142, 142, 146, 155, 167, 137, 135.09, 141.57, 153.08, 164.60, 139.00, 139.22, 144.59, 154.31, 163.37, 143.13, 144.59, 149.90, 158.23, 164.99), nrow = 5, ncol = 5)
x0 <- c(-70.6, -70.35, -70.65, -71.85)
y0 <- c(-34.0, -34.05, -34.37, -33.88)
library(akima)
# Completely bad results:
bilinear(x, y, z, x0, y0)
#> $x
#> [1] -70.60 -70.35 -70.65 -71.85
#>
#> $y
#> [1] -34.00 -34.05 -34.37 -33.88
#>
#> $z
#> [1] 0 0 0 0
# Fix y and z:
y <- rev(y)
z <- z[, 5:1]
bilinear(x, y, z, x0, y0)
#> $x
#> [1] -70.60 -70.35 -70.65 -71.85
#>
#> $y
#> [1] -34.00 -34.05 -34.37 -33.88
#>
#> $z
#> [1] 151.00 154.25 0.00 0.00
x0 < min(x)
#> [1] FALSE FALSE TRUE TRUE
Created on 2022-02-08 by the reprex package (v2.0.1.9000)
We now have reasonable interpolated values for the first two points, but not for the last two. But look at x0 < min(x): it is TRUE for the last two points. They are out of range.
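A more general version of the same fix (just a sketch) is to sort both axes into increasing order and permute z to match, which works whether x, y, or both come in decreasing order:
ord_x <- order(x)
ord_y <- order(y)
bilinear(x[ord_x], y[ord_y], z[ord_x, ord_y], x0, y0)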
I have a grid of 10x10m coordinates that I extracted from a raster. I have a set of 'starting points'. For each starting point, I want to find the location (coordinates) of cells within a 10-50m radius around it.
I am aware of functions to do this with a raster starting point, but additional analyses that I have not included here require that I perform the search from a grid of coordinates in the format shown below.
The code below achieves my aim; however, the outer function produces vectors that are far too large (> 10 Gb) on my actual dataset (which is a grid of 9 million 10x10m cells, with 3,000 starting points).
I am looking for alternatives that achieve the same result as the following (simplified) code, but do not require large vector storage or looping over each starting point separately.
library(raster)
library(tidyverse)
#Set up the mock raster
orig=raster(nrows=100, ncols=100)
res(orig)=10
vals <- rep(c(1, 2, 3, 1, 2, 3, 1, 3, 2), times = c(72, 72, 72, 72, 72, 72, 72, 72, 72))
setValues(orig, vals)
values(orig) <- vals
xygrid <- as.data.frame(orig, xy = TRUE) %>% .[,1:2]
head(xygrid)
x y
1 -175 85
2 -165 85
3 -155 85
4 -145 85
5 -135 85
6 -125 85
#the initial starting points
init_locs <- c(5, 10, 15, 20)
#calculate the distance to every surrounding cell from starting point
Rx <- outer(xygrid[init_locs, 1], xygrid[, 1], "-")
Ry <- outer(xygrid[init_locs, 2], xygrid[, 2], "-")
R <- sqrt(Rx^2+Ry^2) #overall distance
for (i in 1:length(R[,1])) {
expr2 <- (R[i,] > 10 & R[i,] <= 50) #extract the location of cells within 10-50m
inv <- xygrid[expr2,] #extract the coordinates of these cells
}
head(inv)
x y
15 -35 85
16 -25 85
17 -15 85
18 -5 85
22 35 85
23 45 85
(Raster and spatial data are not my specialty, but this made me think of a naive approach that might work acceptably. I don't know anything about the methods @Robert Hijmans mentioned; those are likely much more performant. I just thought this sounded like an interesting question to explore with basic methods.)
Approach
The main challenge here is that you have 9 million cells, but only around 80 of those will be within 50m of any given point. If you calculate all those cells' distances to 3,000 starting points and then filter for those under 50m, that's 9M x 3k = 27 billion calculations, and a gigantic data structure, almost all of which is unnecessary.
We can quickly get ~1,000x more efficient by splitting this into two problems -- first, what general region of potentially-within-50m-points should we look at, and second, what is the actual distance to the points in those regions?
We can precalculate a modestly sized <2MB hash table for step 1. Then, by joining it to our locations (a very fast operation), we can focus our calculations on the 1/1000th of points that have a chance of being within 50m. I arbitrarily split the original cells into 100 x 100 = 10k sectors, each sector holding 30x30 cells.
1. Creating hash table
For the hash table, I'll assign each point to a sector, somewhat arbitrarily as 30x30 cells, so we have 100x100 = 10k sectors. This could be tuned based on speed vs. memory tradeoffs.
max_dist = 30 # sector width, in cells
xygrid2 <- expand_grid(
x = seq(0, 2999, by = 1), # 3000x3000 location grid
y = seq(0, 2999, by = 1))
xygrid2$sector_x = xygrid2$x %/% max_dist # 100 x 100 sectors
xygrid2$sector_y = xygrid2$y %/% max_dist
y_range = max(xygrid2$sector_y) + 1
xygrid2$sector_num = xygrid2$sector_x*y_range + xygrid2$sector_y
We now have 10,000 sectors assigned. Now which sectors are adjacent to which others? In every case, the adjacent sectors follow the same pattern. In this case, I have 100 sectors across x, so the sectors adjacent to sector S will have sector numbers that vary from S by -101 -100 -99 -1 0 1 99 100 101. We can use this pattern to assign all the adjacencies instantaneously. For simplicity, I leave in sectors outside our range; they will be ignored later anyway.
sector_num_deltas <- rep(-1:1, times = 3) + rep(-1:1, each = 3) * y_range
distinct(xygrid2, sector_num) %>%
uncount(9) %>% # copy each row 9 times, one for each adjacency
mutate(sector_num_adj = sector_num + rep(sector_num_deltas, length.out = n())) -> adjacencies
2. Join and calculate
Now that we have the adjacency table, the rest goes much faster, since we only do calculations on the roughly 1/1,000th of cells that are nearby. With it, we can identify the ~240,000 cells that are within 50m of the 3,000 starting positions in under 4 seconds:
# Here are 3,000 random starting locations
set.seed(42)
sample_starts <- xygrid2 %>%
slice_sample(n = 3000) %>%
mutate(sample_num = row_number())
# Join each location to all the adjacent sectors, and then add all the
# locations within those sectors, and then calculate distances.
sample_starts %>% # 3,000 starting points...
# join each position to the nine adjacent sectors = ~27,000 rows
left_join(adjacencies, by = "sector_num") %>%
# join each sector to the (30x30 = 900) cells in those sectors --> 24 million rows
# That's a lot, but it's only 1/1000th of the starting problem with
# 3k x 9M = 27 billion comparisons!
left_join(xygrid2, by = c("sector_num_adj" = "sector_num")) %>%
select(-contains("sector")) %>%
mutate(dist = sqrt((x.x-x.y)^2 + (y.x-y.y)^2)) %>%
filter(dist <= 5) -> result  # 5 grid units of 10m each = 50m
The result tells us that our 3,000 sample starting points are within 5 decameters (50m) of 242,575 cells, about 80 for each starting point.
result
# A tibble: 242,575 x 6
x.x y.x sample_num x.y y.y dist
<dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1069 140 1 1064 140 5
2 1069 140 1 1065 137 5
3 1069 140 1 1065 138 4.47
4 1069 140 1 1065 139 4.12
5 1069 140 1 1065 140 4
6 1069 140 1 1065 141 4.12
7 1069 140 1 1065 142 4.47
8 1069 140 1 1065 143 5
9 1069 140 1 1066 136 5
10 1069 140 1 1066 137 4.24
# … with 242,565 more rows
Here's a sample to see how that's working in a small corner of our data:
# Note: `a` here is assumed to be the joined table from above, rebuilt without the
# select(-contains("sector")) step, so that sector_num_adj is still available.
ggplot(a %>% mutate(sample_grp = sector_num_adj %% 8 %>% as.factor),
aes(x.y, y.y, color = sample_grp)) +
geom_point(data = adjacencies %>% filter(sector_num_adj == 5864) %>%
left_join(xygrid2) %>% distinct(x, y, sector_num),
color = "gray80", shape = 21,
aes(x, y)) +
geom_point(data = adjacencies %>% filter(sector_num == 5864) %>%
left_join(xygrid2) %>% distinct(x, y, sector_num),
color = "gray70", shape = 21,
aes(x, y)) +
annotate("text", alpha = 0.5,
x = c(1725, 1750),
y = c(1960, 1940),
label = c("Lookup area", "sector of\nstarting location")) +
geom_point(size = 1) +
scale_color_discrete(guide = FALSE) +
coord_equal() -> my_plot
library(gganimate)
animate(
my_plot +
gganimate::view_zoom_manual(pan_zoom = -1, ease = "quadratic-in-out",
xmin = c(0, 1700),
xmax = c(3000, 1800),
ymin = c(0, 1880),
ymax = c(3000, 1980)),
duration = 3, fps = 20, width = 300)
Example data --- you were using a lon/lat example, but based on your code, I am assuming that you are using planar data.
library(raster)
r <- raster(nrows=100, ncols=100, xmn=0, xmx=100, ymn=0, ymx=100, crs="+proj=utm +zone=1 +datum=WGS84")
values(r) <- 1:ncell(r) # for display only
xygrid <- as.data.frame(r, xy = TRUE)[,1:2]
locs <- c(8025, 1550, 5075)
dn <- 2.5 # min dist
dx <- 5.5 # max dist
The simplest approach would be to use pointDistance
p <- xyFromCell(r, locs)
d <- pointDistance(xygrid, p, lonlat=FALSE)
u <- unique(which(d>dn & d<dx) %% nrow(d))
pts <- xygrid[u,]
plot(r)
points(pts)
But you will probably run out of memory with that, and it is inefficient to compute all distances. Instead, you may intersect the points with a buffer around the points of interest:
b1 <- buffer(SpatialPoints(p, proj4string=crs(r)), dx)
b2 <- buffer(SpatialPoints(p, proj4string=crs(r)), dn)
b <- erase(b1, b2)
x <- intersect(SpatialPoints(xygrid, proj4string=crs(r)), b)
plot(r)
points(x, cex=.5)
points(xyFromCell(r, locs), col="red", pch="x")
With terra it goes like this -- and works well for large datasets in version 1.1-11 that should be on CRAN this week
library(terra)
rr <- rast(r)
pp <- xyFromCell(rr, locs)
bb1 <- buffer(vect(pp), dx)
bb2 <- buffer(vect(pp), dn)
bb <- erase(bb1, bb2)
xx <- intersect(vect(as.matrix(xygrid)), bb)
You can do similar things with sf.
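For example, a rough sf sketch of the same buffer-and-erase idea (untested; the object names are placeholders):
library(sf)
grid_sf <- st_as_sf(xygrid, coords = c("x", "y"))
pts_sf <- st_as_sf(as.data.frame(p), coords = c("x", "y"))
# ring-shaped search area: outer buffer minus inner buffer
ring <- st_difference(st_union(st_buffer(pts_sf, dx)),
                      st_union(st_buffer(pts_sf, dn)))
sel <- grid_sf[ring, ]  # grid points that fall inside the ring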
Given that you have so many data points, you might want to start by removing all points that are clearly not of interest:
xySel <- lapply(locs, function(i) {
xy <- xygrid[i,]
s <- xygrid[,1] > xy[,1]-dx & xygrid[,1] < xy[,1]+dx & xygrid[,2] > xy[,2]-dx & xygrid[,2] < xy[,2]+dx
xygrid[s,]
})
xySel = do.call(rbind, xySel)
dim(xySel)
# [1] 363 2
dim(xygrid)
#[1] 10000 2
And now you could run pointDistance as above on all data (or else inside the lapply function)
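For example, a sketch of the second option, with the distance test moved inside the lapply so that only each point's candidate cells are ever compared:
xySel <- lapply(locs, function(i) {
  xy <- xygrid[i,]
  s <- xygrid[,1] > xy[,1]-dx & xygrid[,1] < xy[,1]+dx & xygrid[,2] > xy[,2]-dx & xygrid[,2] < xy[,2]+dx
  cand <- xygrid[s,]                          # candidate cells in the bounding box
  d <- pointDistance(cand, xy, lonlat=FALSE)  # distances to this starting point only
  cand[as.vector(d > dn & d < dx), ]
})
xySel <- do.call(rbind, xySel)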
You say that you need to use points, and not a raster. I have seen that idea many times, and 9 out of 10 times that is wrong. Maybe it is true in your case. For others who stumble upon this question, here are two raster-based approaches.
With the raster package you could use extract(..., cellnumbers=TRUE) or adjacent. With adjacent, you would first make a weights matrix using one of the buffers made above:
buf <- disaggregate(b)[2,]
rb <- crop(r, buf)
w <- as.matrix(rasterize(buf, rb, background=NA) )
w[6,6]=0
And then use the weight matrix like this
a <- adjacent(r, locs, w, pairs=FALSE)
pts <- xyFromCell(r, a)
plot(r)
points(pts)
With terra you could use the cells method
d <- cells(rr, bb)
xy <- xyFromCell(rr, d[,2])
plot(rr)
points(xy, cex=.5)
lines(bb, col="red", lwd=2)
I have a vector of numbers that I need to round according to the rules in the image below:
Consider the following examples:
0.5 -> 0.5 (no rounding)
1.2 -> 1.0
3.7 -> 4.0
18.9 -> 20.0
28.1 -> 30.0
110 -> 120
I could in theory write a series of conditional statements to achieve this task; however, that would be tedious and inefficient. Is there a way to achieve the desired outcome in an efficient manner?
Thank you
You could use the floor of the base-10 logarithm to calculate powers of 10. Then divide the vector by those powers, round, and multiply by the powers of 10 again.
tens <- 10^floor(log10(abs(x)))
round(x/tens)*tens
# [1] NaN 0.5 1.0 4.0 -4.0 20.0 30.0 100.0
Note that this won't work for zero, so you should handle that case separately.
(However, 110 -> 120 is not obvious to me.)
Data:
x <- c(0, .5, 1.2, 3.7, -3.7, 18.9, 28.1, 110)
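A small wrapper for the zero case mentioned above could look like this (just a sketch; the name round_to_magnitude is my own):
round_to_magnitude <- function(x) {
  out <- x                          # zeros (and NAs) pass through unchanged
  nz <- !is.na(x) & x != 0
  tens <- 10^floor(log10(abs(x[nz])))
  out[nz] <- round(x[nz] / tens) * tens
  out
}
round_to_magnitude(x)
# [1]   0.0   0.5   1.0   4.0  -4.0  20.0  30.0 100.0
As noted above, the last value still comes out as 100 rather than 120.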
This solution uses findInterval to get which of the rounding functions is to be applied to the vector's elements.
roundSpecial <- function(x){
round_funs <- list(
no_round <- function(x) x,
round_by_1 <- function(x) round(x),
round_to_20 <- function(x) 20,
round_by_10 <- function(x) 10*round(x / 10),
round_by_15 <- function(x) 15*round(x / 15),
round_by_30 <- function(x) 30*round(x / 30)
)
lims <- c(0, 1, 17, 20, 30, 90, Inf)
which_fun <- findInterval(x, lims)
sapply(seq_along(which_fun), function(i) {
round_funs[[ which_fun[i] ]](x[i])
})
}
roundSpecial(x)
#[1] 0.5 1.0 4.0 20.0 30.0 120.0
Data
x <- c(0.5, 1.2, 3.7, 18.9, 28.1, 110)
I am trying to create or find a function that calculates the distance and angle between two points. The idea is that I have two data.frames with x, y coordinates as follows:
Example dataset
From <- data.frame(x = c(0.5,1, 4, 0), y = c(1.5,1, 1, 0))
To <- data.frame(x =c(3, 0, 5, 1), y =c(3, 0, 6, 1))
Current function
For now, I've managed to develop the distance part using Pythagoras:
distance <- function(from, to){
D <- sqrt((abs(from[,1]-to[,1])^2) + (abs(from[,2]-to[,2])^2))
return(D)
}
Which works fine:
distance(from = From, to = To)
[1] 2.915476 1.414214 5.099020 1.414214
but I can't figure out how to get the angle part.
What I tried so far:
I tried adapting the second solution of this question
angle <- function(x,y){
dot.prod <- x%*%y
norm.x <- norm(x,type="2")
norm.y <- norm(y,type="2")
theta <- acos(dot.prod / (norm.x * norm.y))
as.numeric(theta)
}
x <- as.matrix(c(From[,1],To[,1]))
y <- as.matrix(c(From[,2],To[,2]))
angle(t(x),y)
But I am clearly making a mess of it
Desired output
I would like to have the angle part added to my first function, so that I get both the distance and the angle between the from and to data frames.
By the angle between two points, I am assuming you mean the angle between the two vectors
defined by those endpoints (taking the origin as the starting point).
The example you adapted was designed around a single pair of vectors, which is why it uses the transpose; it is, however, robust enough to work in more than two dimensions.
Your function should instead be vectorised like your distance function, since it receives a number of pairs of points (and here we only consider 2-dimensional points).
angle <- function(from, to){
  # row-wise dot product of the two sets of 2-d vectors
  dot.prods <- from$x*to$x + from$y*to$y
  # vector norms, reusing distance() with a zero data frame of the same shape
  # (`[<-`(from,,,0) replaces every value in `from` with 0)
  norms.x <- distance(from = `[<-`(from,,,0), to = from)
  norms.y <- distance(from = `[<-`(to,,,0), to = to)
  thetas <- acos(dot.prods / (norms.x * norms.y))
  as.numeric(thetas)
}
angle(from=From,to=To)
[1] 0.4636476 NaN 0.6310794 NaN
The NaNs are due to some of your vectors having zero magnitude, for which the angle is undefined.
how about:
library(useful)
df=To-From
cart2pol(df$x, df$y, degrees = F)
which returns:
# A tibble: 4 x 4
r theta x y
<dbl> <dbl> <dbl> <dbl>
1 2.92 0.540 2.50 1.50
2 1.41 3.93 -1.00 -1.00
3 5.10 1.37 1.00 5.00
4 1.41 0.785 1.00 1.00
where r is the distance and theta is the angle.
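If you'd rather stay in base R, here is a sketch that adds the angle to the original distance function using atan2() on the displacement To - From (the name dist_angle and the choice of atan2() are my own):
dist_angle <- function(from, to){
  dx <- to[,1] - from[,1]
  dy <- to[,2] - from[,2]
  data.frame(distance = sqrt(dx^2 + dy^2),
             angle = atan2(dy, dx))  # radians, counter-clockwise from the positive x-axis
}
dist_angle(From, To)
# distance: 2.915476  1.414214 5.099020 1.414214
# angle:    0.5404195 -2.3561945 1.3734008 0.7853982
atan2() returns angles in (-pi, pi], so the second row comes out as -3*pi/4 rather than cart2pol's 5*pi/4; both describe the same direction.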
I am not totally sure how to describe my problem, so I might just need help to find the right keywords to search for.
Here are some dummy data that resembles my own data, there are x and y co-ordinates and a z value:
require(data.table)
example <- data.table(x = c(-3, -4, -2, -1, -1, 0, 0, 0, 1, 4, 4, 5),
y = c(2, -2, -2, -3, -0, 3, 4, 4, -1, 4, 4, 4),
z = c(10, 10, 20, 30, 40, 40, 50, 70, 70, 80, 90, 90))
There are some duplicate co-ordinates in there, e.g. at (4,4) so the next step is to average the z values for the duplicate points:
example <- as.data.table(aggregate(z ~ x + y, data = example, FUN = "mean"))
Next, I would like to add z = 0 values to all of the coordinates that I don't have data for, e.g. (x = 0, y = 0), (x = 1, y = 1) etc. for the range -5:5 in both x and y axes.
How do I go about this?
To clarify: I have z values for specific x and y coordinates, I'd like to create a data table (or matrix) which has all x,y coordinates from -5,-5 to 5,5 with z = 0 except for the specific z values I already have.
Thanks!
Maybe this is what you are looking for.
example[, .(z=mean(z)), by=.(x, y)][CJ(x=-5:5, y=-5:5), on=c("x", "y")][is.na(z), z:=0][]
x y z
1: -5 -5 0
2: -5 -4 0
3: -5 -3 0
4: -5 -2 0
5: -5 -1 0
---
117: 5 1 0
118: 5 2 0
119: 5 3 0
120: 5 4 90
121: 5 5 0
Here, example[, .(z=mean(z)), by=.(x, y)] performs the data.table equivalent of your aggregate call. The result is then joined to the Cartesian product of -5:5 with itself (11^2 = 121 rows), CJ(x=-5:5, y=-5:5), in the second chain [CJ(x=-5:5, y=-5:5), on=c("x", "y")]. The join fills in NA for x/y combinations not present in the aggregated data, so in the final chain the NA values of z are set to 0 with [is.na(z), z:=0]. The trailing [] prints the output.
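The same logic, unchained for readability (using only steps already shown above):
agg <- example[, .(z = mean(z)), by = .(x, y)]         # average duplicate coordinates
full <- agg[CJ(x = -5:5, y = -5:5), on = c("x", "y")]  # join onto the full 11 x 11 grid
full[is.na(z), z := 0]                                 # fill the missing cells with 0
full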