Find coordinates within a radius around many starting points in a grid - R

I have a grid of 10x10m coordinates that I extracted from a raster. I have a set of 'starting points'. For each starting point, I want to find the location (coordinates) of cells within a 10-50m radius around it.
I am aware of functions to do this with a raster starting point, but additional analyses that I have not included here require that I perform the search from a grid of coordinates in the format shown below.
The code below achieves my aim; however, the outer function produces vectors that are far too large (> 10 GB) on my actual dataset (a grid of 9 million 10x10m cells, with 3000 starting points).
I am looking for alternatives that achieve the same result as the following (simplified) code, but do not require large vector storage or looping over each starting point separately.
library(raster)
library(tidyverse)
#Set up the mock raster
orig <- raster(nrows=100, ncols=100)
res(orig) <- 10 # after changing the resolution the raster has 36 x 18 = 648 cells
vals <- rep(c(1, 2, 3, 1, 2, 3, 1, 3, 2), times = c(72, 72, 72, 72, 72, 72, 72, 72, 72))
values(orig) <- vals
xygrid <- as.data.frame(orig, xy = TRUE) %>% .[,1:2]
head(xygrid)
x y
1 -175 85
2 -165 85
3 -155 85
4 -145 85
5 -135 85
6 -125 85
#the initial starting points
init_locs <- c(5, 10, 15, 20)
#calculate the distance to every surrounding cell from starting point
Rx <- outer(xygrid[init_locs, 1], xygrid[, 1], "-")
Ry <- outer(xygrid[init_locs, 2], xygrid[, 2], "-")
R <- sqrt(Rx^2+Ry^2) #overall distance
for (i in 1:nrow(R)) {
expr2 <- (R[i,] > 10 & R[i,] <= 50) #cells within 10-50m of starting point i
inv <- xygrid[expr2,] #coordinates of these cells (overwritten on each iteration)
}
head(inv)
x y
15 -35 85
16 -25 85
17 -15 85
18 -5 85
22 35 85
23 45 85

(Raster and spatial data are not my specialty, but this made me think of a naive approach that might work acceptably. I don't know anything about the methods @Robert Hijmans mentioned; those are likely much more performant. I just thought this sounded like an interesting question to explore with basic methods.)
Approach
The main challenge here is that you have 9 million cells, but only around 80 of those will be within 50m of any given point. If you calculate all those cells' distances to 3,000 starting points and then filter for those under 50m, that's 9M x 3k = 27 billion calculations, and a gigantic data structure, almost all of which is unnecessary.
We can quickly get ~1,000x more efficient by splitting this into two problems -- first, what general region of potentially-within-50m-points should we look at, and second, what is the actual distance to the points in those regions?
We can precalculate a modestly sized <2MB hash table for step 1. Then, by joining it to our locations (a very fast operation), we can focus our calculations on the 1/1000th of points that have a chance of being within 50m. I arbitrarily split the original cells into 100 x 100 = 10k sectors, each sector holding 30x30 cells.
1. Creating hash table
For the hash table, I'll assign each point to a sector, somewhat arbitrarily as 30x30 cells, so we have 100x100 = 10k sectors. This could be tuned based on speed vs. memory tradeoffs.
max_dist = 30 # sector width, in cells
xygrid2 <- expand_grid(
x = seq(0, 2999, by = 1), # 3000x3000 location grid
y = seq(0, 2999, by = 1))
xygrid2$sector_x = xygrid2$x %/% max_dist # 100 x 100 sectors
xygrid2$sector_y = xygrid2$y %/% max_dist
y_range = max(xygrid2$sector_y) + 1
xygrid2$sector_num = xygrid2$sector_x*y_range + xygrid2$sector_y
We now have 10,000 sectors assigned. Now, which sectors are adjacent to which others? The adjacent sectors always follow the same pattern: here there are 100 sectors along y (y_range = 100), so stepping one sector in x changes the sector number by 100, and the sectors adjacent to sector S therefore have numbers that differ from S by -101, -100, -99, -1, 0, 1, 99, 100, 101. We can use this pattern to assign all the adjacencies at once. For simplicity, I leave in sectors outside our range; they will be ignored later anyway.
sector_num_deltas <- rep(-1:1, times = 3) + rep(-1:1, each = 3) * y_range
distinct(xygrid2, sector_num) %>%
uncount(9) %>% # copy each row 9 times, one for each adjacency
mutate(sector_num_adj = sector_num + sector_num_deltas) -> adjacencies
2. Join and calculate
Now that we have that, the rest goes much faster, since we can do the calculations only on the 1/1000th of sectors that are nearby. With that, we can now identify the 240,000 points that are within 50m of the 3,000 starting positions in under 4 seconds:
# Here are 3,000 random starting locations
set.seed(42)
sample_starts <- xygrid2 %>%
slice_sample(n = 3000) %>%
mutate(sample_num = row_number())
# Join each location to all the adjacent sectors, and then add all the
# locations within those sectors, and then calculate distances.
sample_starts %>% # 3,000 starting points...
# join each position to the nine adjacent sectors = ~27,000 rows
left_join(adjacencies, by = "sector_num") %>%
# join each sector to the (30x30 = 900) cells in those sectors --> 24 million rows
# That's a lot, but it's only 1/1000th of the starting problem with
# 3k x 9M = 27 billion comparisons!
left_join(xygrid2, by = c("sector_num_adj" = "sector_num")) %>%
select(-contains("sector")) %>%
mutate(dist = sqrt((x.x-x.y)^2 + (y.x-y.y)^2)) %>%
filter(dist <= 5) -> result
The result tells us that our 3,000 sample starting points are within 5 grid units (i.e. 50m, since each unit stands for a 10m cell) of 242,575 cells, about 80 for each starting point.
result
# A tibble: 242,575 x 6
x.x y.x sample_num x.y y.y dist
<dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1069 140 1 1064 140 5
2 1069 140 1 1065 137 5
3 1069 140 1 1065 138 4.47
4 1069 140 1 1065 139 4.12
5 1069 140 1 1065 140 4
6 1069 140 1 1065 141 4.12
7 1069 140 1 1065 142 4.47
8 1069 140 1 1065 143 5
9 1069 140 1 1066 136 5
10 1069 140 1 1066 137 4.24
# … with 242,565 more rows
Here's a sample to see how that's working in a small corner of our data. (To reproduce this plot, build result without the select(-contains("sector")) step, so that sector_num_adj is still available.)
ggplot(result %>% mutate(sample_grp = sector_num_adj %% 8 %>% as.factor),
aes(x.y, y.y, color = sample_grp)) +
geom_point(data = adjacencies %>% filter(sector_num_adj == 5864) %>%
left_join(xygrid2) %>% distinct(x, y, sector_num),
color = "gray80", shape = 21,
aes(x, y)) +
geom_point(data = adjacencies %>% filter(sector_num == 5864) %>%
left_join(xygrid2) %>% distinct(x, y, sector_num),
color = "gray70", shape = 21,
aes(x, y)) +
annotate("text", alpha = 0.5,
x = c(1725, 1750),
y = c(1960, 1940),
label = c("Lookup area", "sector of\nstarting location")) +
geom_point(size = 1) +
scale_color_discrete(guide = FALSE) +
coord_equal() -> my_plot
library(gganimate)
animate(
my_plot +
gganimate::view_zoom_manual(pan_zoom = -1, ease = "quadratic-in-out",
xmin = c(0, 1700),
xmax = c(3000, 1800),
ymin = c(0, 1880),
ymax = c(3000, 1980)),
duration = 3, fps = 20, width = 300)

Example data --- you were using a lon/lat example, but based on your code, I am assuming that you are using planar data.
library(raster)
r <- raster(nrows=100, ncols=100, xmn=0, xmx=100, ymn=0, ymx=100, crs="+proj=utm +zone=1 +datum=WGS84")
values(r) <- 1:ncell(r) # for display only
xygrid <- as.data.frame(r, xy = TRUE)[,1:2]
locs <- c(8025, 1550, 5075)
dn <- 2.5 # min dist
dx <- 5.5 # max dist
The simplest approach would be to use pointDistance
p <- xyFromCell(r, locs)
d <- pointDistance(xygrid, p, lonlat=FALSE)
u <- unique(which(d>dn & d<dx) %% nrow(d))
pts <- xygrid[u,]
plot(r)
points(pts)
But you will probably run out of memory with that, and it is inefficient to compute all distances. Instead, you may intersect the points with a buffer around the points of interest
b1 <- buffer(SpatialPoints(p, proj4string=crs(r)), dx)
b2 <- buffer(SpatialPoints(p, proj4string=crs(r)), dn)
b <- erase(b1, b2)
x <- intersect(SpatialPoints(xygrid, proj4string=crs(r)), b)
plot(r)
points(x, cex=.5)
points(xyFromCell(r, locs), col="red", pch="x")
With terra it goes like this -- and works well for large datasets in version 1.1-11 that should be on CRAN this week
library(terra)
rr <- rast(r)
pp <- xyFromCell(rr, locs)
bb1 <- buffer(vect(pp), dx)
bb2 <- buffer(vect(pp), dn)
bb <- erase(bb1, bb2)
xx <- intersect(vect(as.matrix(xygrid)), bb)
You can do similar things with sf.
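For instance, here is a rough sketch of how that could look with sf (my addition, not part of the answer above; it reuses xygrid, p, dn and dx from the example and assumes planar coordinates):
library(sf)
pts <- st_as_sf(xygrid, coords = c("x", "y")) # all grid cells as points
ctr <- st_as_sf(as.data.frame(p), coords = c("x", "y")) # the points of interest
ring <- st_difference(st_union(st_buffer(ctr, dx)), # outer buffer
                      st_union(st_buffer(ctr, dn))) # minus the inner buffer
sel <- pts[ring, ] # points falling inside the ring (st_intersects by default)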
Given that you have so many data points, you might want to start by removing all points that are clearly not of interest
xySel <- lapply(locs, function(i) {
xy <- xygrid[i,]
s <- xygrid[,1] > xy[,1]-dx & xygrid[,1] < xy[,1]+dx & xygrid[,2] > xy[,2]-dx & xygrid[,2] < xy[,2]+dx
xygrid[s,]
})
xySel = do.call(rbind, xySel)
dim(xySel)
# [1] 363 2
dim(xygrid)
#[1] 10000 2
And now you could run pointDistance as above on all the selected data (or else inside the lapply function).
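A rough sketch of the second option (my addition; xyNear is just an illustrative name, and xygrid, locs, dn and dx come from the code above):
xyNear <- lapply(locs, function(i) {
  xy <- xygrid[i, ]
  # quick rectangular pre-selection, as above
  s <- xygrid[, 1] > xy[, 1] - dx & xygrid[, 1] < xy[, 1] + dx &
       xygrid[, 2] > xy[, 2] - dx & xygrid[, 2] < xy[, 2] + dx
  cand <- xygrid[s, ]
  # exact distances only for the few remaining candidates
  d <- as.vector(pointDistance(cand, xy, lonlat = FALSE))
  cand[d > dn & d < dx, ]
})
xyNear <- do.call(rbind, xyNear)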
You say that you need to use points, and not a raster. I have seen that idea many times, and 9 out of 10 times it is wrong. Maybe it is true in your case. For others who stumble upon this question, here are two raster-based approaches.
With the raster package you could use extract(..., cellnumbers=TRUE) or adjacent. With adjacent, you would first make a weights matrix using one of the buffers made above
buf <- disaggregate(b)[2,]
rb <- crop(r, buf)
w <- as.matrix(rasterize(buf, rb, background=NA) )
w[6,6]=0
And then use the weight matrix like this
a <- adjacent(r, locs, w, pairs=FALSE)
pts <- xyFromCell(r, a)
plot(r)
points(pts)
With terra you could use the cells method
d <- cells(rr, bb)
xy <- xyFromCell(rr, d[,2])
plot(rr)
points(xy, cex=.5)
lines(bb, col="red", lwd=2)

Related

R library for R-tree implementation

I have a data frame, for example
df <- data.frame(x = 1:1e3, y = rnorm(1e3))
I need to split the points into N rectangles (in my case N = 6, 12 and 24), each containing an equal number of points. How can I split my df using an R-tree algorithm?
For uniformly distributed data on the x axis, kmeans clustering unsurprisingly works well:
library(dplyr)
library(ggplot2)
set.seed(1)
df <- data.frame(x = 1:1e3, y = rnorm(1e3))
N <- 10
df$cluster <- kmeans(df,N)$cluster
cluster_rectangles <- df %>% group_by(cluster) %>%
summarize(xmin = min(x),
xmax = max(x),
ymin = min(y),
ymax = max(y),
n = n())
ggplot() + geom_rect(data = cluster_rectangles, mapping=aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, fill=cluster)) +
geom_point(data = df,mapping=aes(x,y),color='white')
It also works if the x distribution is normal:
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
The drawback is that the number of points in each rectangle varies:
> cluster_rectangles %>% select(cluster,n)
# A tibble: 10 x 2
cluster n
<int> <int>
1 1 137
2 2 58
3 3 121
4 4 61
5 5 72
6 6 184
7 7 78
8 8 70
9 9 126
10 10 93
For a uniform distribution, the result is quite good (with N = 9):
In case that all the points have different x coordinates, as it is the case in your example, sort the points increasingly according to the x coordinate. Note that, in this case, your problem of finding a covering with rectangles (with equal number of points) for the 2d points can be simplified to finding a covering with segments for 1d points (i.e. you can ignore the height of the rectangles).
Here is how you can find the points in each rectangle:
num_rect <- 7 # In your example 6, 12 or 24
num_points <- 10 # In your example 1e3
# Already ordered according to x
df <- data.frame(x = 1:num_points, y = rnorm(num_points))
# Minimum number of points in the rectangles to cover all of them
points_in_rect <- ceiling(num_points/num_rect)
# Cover the first points using non-overlapping rectangles
breaks <- seq(0,num_points, by=points_in_rect)
cover <- split(seq(num_points), cut(seq(num_points), breaks))
names(cover) <- paste0("rect", seq(length(cover)))
# Cover the last points using overlapping rectangles
cur_num <- length(cover)
if (num_points < num_rect * points_in_rect) {
  # Avoid duplicating the last non-overlapping rectangle
  last <- num_points
  if (num_points %% points_in_rect == 0)
    last <- last - 1
  while (cur_num < num_rect) {
    cur_num <- cur_num + 1
    new_rect <- list(seq(last - points_in_rect + 1, last))
    names(new_rect) <- paste0("rect", cur_num)
    cover <- c(cover, new_rect)
    last <- last - points_in_rect
  }
}
The points in the rectangles are:
$rect1
[1] 1 2
$rect2
[1] 3 4
$rect3
[1] 5 6
$rect4
[1] 7 8
$rect5
[1] 9 10
$rect6
[1] 8 9
$rect7
[1] 6 7
The minimum bounding rectangles (parallel to the axes) that enclose those sets of points are the ones you are looking for.
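A small sketch of that last step (my addition; it uses the df and cover objects from the code above):
rects <- do.call(rbind, lapply(cover, function(idx) {
  pts <- df[idx, ]
  data.frame(xmin = min(pts$x), xmax = max(pts$x),
             ymin = min(pts$y), ymax = max(pts$y))
}))
rects # one axis-parallel bounding rectangle per rect1 ... rect7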
Duplicated coordinate values in both axes
Randomly rotate the points (saving the rotation angle) and check whether any duplicate x (or y) coordinates remain. If there are none, use the above strategy with the rotated coordinates (remember to first sort the rotated points according to the new x coordinate), and then rotate the obtained rectangles back in the opposite direction. If duplicated coordinates remain in both axes, rotate the points again with a different (random) angle. Since you have a finite number of points, you can always find a rotation angle that separates the x (or y) coordinates.
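A rough sketch of that rotation idea (my own illustration; rotate_points is a hypothetical helper, and df is the data frame from above):
rotate_points <- function(df, theta) {
  data.frame(x = df$x * cos(theta) - df$y * sin(theta),
             y = df$x * sin(theta) + df$y * cos(theta))
}
theta <- runif(1, 0, pi)
rot <- rotate_points(df, theta)
while (anyDuplicated(rot$x) > 0) { # retry until the x coordinates are unique
  theta <- runif(1, 0, pi)
  rot <- rotate_points(df, theta)
}
rot <- rot[order(rot$x), ] # sort by the new x before building the covering
# ...apply the covering strategy above, then rotate the resulting rectangle
# corners back by -theta to return to the original coordinates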

How do I calculate the mean of two different variables taking into account the values of latitude and longitude in R?

I am currently trying to obtain some data in R from a table.
I have a dataset with two different variables, the annual range and the annual mean, of the worldwide sea surface temperature (SST). I have these values for each latitude (from 90 to -90) and longitude (from 180 to -180) level.
I would like to obtain the mean of the aforementioned variables (annual range and annual mean) for 5x5 grid cells of latitude/longitude. For example, I would need to know the "annual range" mean for a longitude between -180 and -176 and a latitude between 90 and 86, and so on until I get the mean of this variable for all the possible 5x5 grid cells.
My data looks like:
lon lat ANNUAL_MEAN ANNUAL_RANGE
1 0.5 89.5 -1.8 0
2 1.5 89.5 -1.8 0
3 2.5 89.5 -1.8 0
4 3.5 89.5 -1.8 0
5 4.5 89.5 -1.8 0
6 5.5 89.5 -1.8 0
...
52001 354.5 -89.5 -1.8 0
52002 355.5 -89.5 -1.8 0
52003 356.5 -89.5 -1.8 0
52004 357.5 -89.5 -1.8 0
52005 358.5 -89.5 -1.8 0
52006 359.5 -89.5 -1.8 0
Thank you in advance
You can use the raster package and its focal function for computations with a moving window.
First, I will create a dummy data.frame which represents your data:
# Prepare dummy data.frame
set.seed(2222)
lonlat <- expand.grid(1:10, 1:10)
df <- data.frame( lon = lonlat[, 1],
lat = lonlat[, 2],
ANNUAL_MEAN = rnorm(100),
ANNUAL_RANGE = runif(100, 1, 5)
)
Now we have to convert the data frame into a raster and perform moving window averaging.
library(raster)
# Convert data frame to raster object
rdf <- df
coordinates(rdf) <- ~ lon + lat
gridded(rdf) <- TRUE
rdf <- brick(rdf) # our raster brick
## Perform moving window averaging
# prepare weights matrix (5*5)
w <- matrix(1, ncol = 5, nrow = 5)
# perform moving window averaging
ANNUAL_MEAN_AVG <- focal(rdf[[1]], w, mean, pad = TRUE, na.rm = TRUE)
ANNUAL_RANGE_AVG <- focal(rdf[[2]], w, mean, pad = TRUE, na.rm = TRUE)
# Append the averaged values back to the initial data.frame;
# extracting at the original coordinates keeps the rows aligned
df$ANNUAL_MEAN_AVG <- extract(ANNUAL_MEAN_AVG, df[, c("lon", "lat")])
df$ANNUAL_RANGE_AVG <- extract(ANNUAL_RANGE_AVG, df[, c("lon", "lat")])
Now each value in df$ANNUAL_MEAN_AVG and df$ANNUAL_RANGE_AVG is the mean of the corresponding 5x5 window.
UPD 1. 5x5 downsampling
If you need fixed 5x5 grid cells with mean values per cell, you can use the raster::aggregate function.
This works with the rdf raster brick from the previous example:
# perform an aggregation with given downsampling factor
rdf_d <- aggregate(rdf, fact=5, fun = mean)
# Now each pixel in the raster `rdf_d` contains a mean value of 5x5 pixels from initial `rdf`
# we need to get pixels coordinates and their values
coord <- coordinates(rdf_d)
vals <- as.data.frame(rdf_d)
colnames(coord) <- c("lon", "lat")
colnames(vals) <- c("ANNUAL_MEAN_AVG", "ANNUAL_RANGE_AVG")
res <- cbind(coord, vals)
This is a solution that uses the dplyr package, included in tidyverse. It should be easy to follow, step by step.
library(tidyverse)
# set.seed() assures reproducability of the example with identical random numbers
set.seed(42)
# build a simulated data set as described in the question
lats <- seq(from = -90, to = 90, by = 0.5)
lons <- seq(from = -180, to = 179.5, by = 0.5) # we must omit +180 or we would
# double count those points
# since they coincide with -180
# combining each latitude point with each longitude point
coord <- merge(lats, lons) %>%
rename(lat = x) %>%
rename(lon = y) %>%
# adding simulated values
mutate(annual_mean = runif(n = nrow(.), min = -2, max = 2)) %>%
mutate(annual_range = runif(n = nrow(.), min = 0, max = 3)) %>%
# defining bands of 5 latitude and 5 longitude points by using integer division
mutate(lat_band = lat%/%5) %>%
mutate(lon_band = lon%/%5) %>%
# creating a name label for each unique 5x5 gridcell
mutate(gridcell_5x5 = paste(lat_band, lon_band, sep = ",")) %>%
# group-by instruction, much like in SQL
group_by(lat_band, lon_band, gridcell_5x5) %>%
# sorting to get a nice order
arrange(lat_band, lon_band) %>%
# calculating minimum and maximum latitude and longitude for each gridcell
# calculating the mean values per gridcell
summarize(gridcell_min_lat = min(lat),
gridcell_max_lat = max(lat),
gridcell_min_lon = min(lon),
gridcell_max_lon = max(lon),
gridcell_mean_annual_mean = round(mean(annual_mean), 3),
gridcell_mean_annual_range = round(mean(annual_range), 3) )

Link segments matched by column value in R

Hello
I am attempting to plot segmented lines and connect them by matching values.
I have already plotted segments by the "Start" and "End" values as x coordinates and the Group as the y coordinates in R. I would like to connect these segments with a line if they share the same "id", as indicated by my sample dataset data:
Name Start End Group ID
TP1 363248 366670 7 98
TP2 365869 369291 11 98
TP3 366459 369881 1 98
AB1 478324 481599 11 134
AB2 478855 482130 1 134
AB3 480681 483956 10 134
JD1 166771 169764 6 214
JD2 386419 389244 7 214
JD2 389025 391850 11 214
What I have so far using data is:
x <- seq(0, 4100000, length = 200)
y <- seq(0, 15, length = 200)
plot(x,y,type="n");
start.x <- (data[,2])
end.x <- (data[,3])
end.y <- start.y <- (data[,4]) # from and to y coords the same
segments(x0 = start.x, y0 = start.y, x1 = end.x, y1 = end.y)
lines(data[,1], data[,5])
My segments are plotted just fine, but my connecting lines do not appear. Any suggestions as to how I can draw connecting lines? Thank you very much.
In my code below I zoomed in the plot using the xlim and ylim parameters so we can get a better look at the plotted data.
As you can see, I'm using a for loop to iterate over each unique ID value. For each value, I get the combinations of all pairs of records in the group using combn(). I then iterate over each combination using apply(). For each combination I call segments() to draw a segment between the centers of the two (original) segments. I use a different color for each group so they can easily be distinguished.
df <- data.frame(Name=c('TP1','TP2','TP3','AB1','AB2','AB3','JD1','JD2','JD2'),Start=c(363248,365869,366459,478324,478855,480681,166771,386419,389025),End=c(366670,369291,369881,481599,482130,483956,169764,389244,391850),Group=c(7,11,1,11,1,10,6,7,11),ID=c(98,98,98,134,134,134,214,214,214));
xlim <- c(min(df$Start),max(df$End));
ylim <- c(min(df$Group),max(df$Group));
plot(NA,xlim=xlim,ylim=ylim,xlab='x',ylab='y');
start.x <- df[,'Start'];
end.x <- df[,'End'];
end.y <- start.y <- df[,'Group'];
segments(start.x,start.y,end.x,end.y);
uid <- unique(df$ID);
cols <- rainbow(length(uid));
for (i in seq_along(uid)) {
df.sub <- subset(df,ID==uid[i]);
col <- cols[i];
apply(combn(nrow(df.sub),2),2,function(ris) {
r1 <- df.sub[ris[1],];
r2 <- df.sub[ris[2],];
segments(mean(c(r1$Start,r1$End)),r1$Group,mean(c(r2$Start,r2$End)),r2$Group,col=col);
});
};

Analyze data for spatial join (points to grid) and generate new dataset in R

I've got a dataset with longitude/latitude points and an outcome value for each set of coordinates. I would like to create a spatial grid, average the outcomes for coordinates that fall in the same grid cell, and generate a new data frame in which each coordinate is assigned a grid number and the averaged outcome. For example, starting with this code:
require(sp)
require(raster)
frame <- data.frame(x = c(7.5, 8.2, 8.3), y = c(1,4,4.5), z = c(10,15,30))
coordinates(frame) <- c("x", "y")
proj4string(frame) <- CRS("+proj=longlat")
grid <- GridTopology(cellcentre.offset= c(0,0), cellsize = c(2,2), cells.dim = c(5,5))
sg <- SpatialGrid(grid)
poly <- as.SpatialPolygons.GridTopology(grid)
proj4string(poly) <- CRS("+proj=longlat")
plot(poly)
text(coordinates(poly), labels = row.names(poly), col = "gray", cex = .6)
points(frame$x, frame$y, col = "blue", cex = .8)
I would then like to average the outcomes (z) within grid cells and produce a data frame that looks like this (one row per observation):
x y z grid grid_mean
1 7.5 1.0 10 g20 10
2 8.2 4.0 15 g15 22.5
3 8.3 4.5 30 g15 22.5
Thanks for any and all help.
You can use the over(...) function in package sp for this. You don't need package raster at all, as far as I can see.
require(sp)
frame <- data.frame(x = c(7.5, 8.2, 8.3), y = c(1,4,4.5), z = c(10,15,30))
points <- SpatialPoints(frame)
proj4string(points) <- CRS("+proj=longlat")
grid <- GridTopology(cellcentre.offset= c(0,0), cellsize = c(2,2), cells.dim = c(5,5))
sg <- SpatialGrid(grid)
poly <- as.SpatialPolygons.GridTopology(grid)
proj4string(poly) <- CRS("+proj=longlat")
# identify grids...
result <- data.frame(frame,grid=over(points,poly))
# calculate means...
result <- merge(result,aggregate(z~grid,result,mean),by="grid")
# rename and reorder columns to make it look like your result
colnames(result) <- c("grid","x","y","z","grid_mean")
result <- result[,c(2,3,4,1,5)]
result
# x y z grid grid_mean
# 1 8.2 4.0 15 15 22.5
# 2 8.3 4.5 30 15 22.5
# 3 7.5 1.0 10 25 10.0
The over(x,y,...) function compares two Spatial* objects as overlays and returns a vector with the index into y of each geometry in x. In this case x is a SpatialPoints object and y is a SpatialPolygons object. So over(...) identifies the polygon ID (grid cell) in y associated with each point in x. The rest just calculates the means, merges the means with the original data frame, and renames and reorders the columns so the result looks like your result.
I tweaked your code a bit because it didn't make sense: you create a data frame with z-values, then convert it to a SpatialPoints object, which discards the z-values. (If you do want to keep them on the spatial object, see the small sketch below.)
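A small sketch of that alternative (my addition, not needed for the solution above):
spdf <- SpatialPointsDataFrame(coords = frame[, c("x", "y")],
                               data = frame["z"],
                               proj4string = CRS("+proj=longlat"))
spdf$z # the z-values travel with the point geometry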

Create Spatial Data in R

I have a dataset of species and their rough locations in a 100 x 200 meter area. The location part of the data frame is not in a format that I find to be usable. In this 100 x 200 meter rectangle, there are two hundred 10 x 10 meter squares named A through CV. Within each 10 x 10 square there are four 5 x 5 meter squares named 1, 2, 3, and 4, respectively (1 is south of 2 and west of 3; 4 is east of 2 and north of 3). I want to let R know that A is the square with corners at (0,0), (10,0), (10,10), and (0,10), that B is just north of A and has corners (0,10), (0,20), (10,10), and (10,20), and K is just east of A and has corners at (10,0), (10,10), (20,0), and (20,10), and so on for all the 10 x 10 meter squares. Additionally, I want to let R know where each 5 x 5 meter square is in the 100 x 200 meter plot.
So, my data frame looks something like this
10x10 5x5 Tree Diameter
A 1 tree1 4
B 1 tree2 4
C 4 tree3 6
D 3 tree4 2
E 3 tree5 3
F 2 tree6 7
G 1 tree7 12
H 2 tree8 1
I 2 tree9 2
J 3 tree10 8
K 4 tree11 3
L 1 tree12 7
M 2 tree13 5
Eventually, I want to be able to plot the 100 x 200 meter area and have each 10 x 10 meter square show up with the number of trees, or number of species, or total biomass
What is the best way to turn the data I have into spatial data that R can use for graphing and perhaps analysis?
Here's a start.
## set up a vector of all 10x10 position tags
tags10 <- c(LETTERS,
paste0("A",LETTERS),
paste0("B",LETTERS),
paste0("C",LETTERS[1:22]))
A function to convert (e.g.) {"J",3} to the center of the corresponding sub-square.
convpos <- function(pos10,pos5) {
## convert letters to major (x,y) positions
p1 <- as.numeric(factor(pos10,levels=tags10)) ## or use match()
p1.x <- ((p1-1) %% 10) *10+5 ## %% is modulo operator
p1.y <- ((p1-1) %/% 10)*10+5 ## %/% is integer division
## sort out sub-positions
p2.x <- ifelse(pos5 <=2,2.5,7.5) ## {1,2} vs {3,4} values
p2.y <- ifelse(pos5 %%2 ==1 ,2.5,7.5) ## odd {1,3} vs even {2,4} values
c(p1.x+p2.x,p1.y+p2.y)
}
usage:
convpos("J",2)
convpos(mydata$tenbytenpos,mydata$fivebyfivepos)
Important notes:
this is a proof of concept; I can pretty much guarantee I haven't got the correspondence of x and y coordinates quite right. But you should be able to trace through this line-by-line and see what it's doing ...
it should work correctly on vectors (see second usage example above): I switched from switch to ifelse for that reason
your column names (10x10) are likely to get mangled into something like X10.10 when reading data into R: see ?data.frame and ?check.names
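For example, to keep the original names when reading the data (a hypothetical illustration; "trees.txt" stands in for your actual file):
dat <- read.table("trees.txt", header = TRUE, check.names = FALSE)
names(dat) # keeps "10x10" and "5x5" instead of syntactically mangled versions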
Similar to what @Ben Bolker has done, here's a lookup function (though you may need to transpose something to make the labels match what you describe).
tenbyten <- c(LETTERS[1:26],
paste0("A",LETTERS[1:26]),
paste0("B",LETTERS[1:26]),
paste0("C",LETTERS[1:22]))
tenbyten <- matrix(rep(tenbyten, each = 2), ncol = 10)
tenbyten <- t(apply(tenbyten, 1, function(x){rep(x, each = 2)}))
# the 1234 squares
squares <- matrix(c(rep(c(1,2),10),rep(c(4,3),10)), nrow = 20, ncol = 20)
# stick together into a reference grid
my.grid <- matrix(paste(tenbyten, squares, sep = "-"), nrow = 20, ncol = 20)
# a lookup function for the site grid
coordLookup <- function(tbt, fbf, .my.grid = my.grid){
x <- col(.my.grid) * 5 - 2.5
y <- row(.my.grid) * 5 - 2.5
marker <- .my.grid == paste(tbt, fbf, sep = "-")
list(x = x[marker], y = y[marker])
}
coordLookup("BB",2)
$x
[1] 52.5
$y
[1] 37.5
If this isn't what you're looking for, then maybe you'd prefer a SpatialPolygonsDataFrame, which has proper polygon IDs, and you attach data to, etc. In that case just Google around for how to make one from scratch, and manipulate the row() and col() functions to get your polygon corners, similar to what's given in this lookup function, which only returns centroids.
Edit: getting SPDF started:
This is modified from the function example and can hopefully be a good start:
library(sp)
# really you have a 20x20 grid, counting the small (5 x 5 m) squares:
# cell centres start at c(2.5, 2.5) and each cell is 5 m on a side
grd <- GridTopology(c(2.5, 2.5), c(5, 5), c(20, 20))
grd <- as.SpatialPolygons.GridTopology(grd)
# get centroids
coords <- coordinates(grd)
# make SPDF, with an extra column for your grid codes, taken from the above.
# you can add further columns to this data.frame(), using polys@data
polys <- SpatialPolygonsDataFrame(grd,
    data = data.frame(x = coords[, 1], y = coords[, 2], my.ID = as.vector(my.grid),
                      row.names = getSpPPolygonsIDSlots(grd)))
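As a possible next step (my own sketch; it assumes a data frame mydata with columns tenbytenpos and fivebyfivepos, as in the convpos() usage above), you could count trees per 5 x 5 cell and map them:
mydata$cell <- paste(mydata$tenbytenpos, mydata$fivebyfivepos, sep = "-")
polys$n_trees <- sapply(polys$my.ID, function(id) sum(mydata$cell == id))
spplot(polys, "n_trees") # shade each square by its number of trees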
