R - Linking two different sets of coordinates

I have two data sets - let's call them 'plot' (734 rows) and 'stations' (62 rows). A while ago I worked out that this code should let me link each 'plot' to its nearest 'station' based on their coordinates.
The data sets look a little like this (but without the Long and Lat headers):
plot            stations
Long  Lat       Long  Lat
13.2  60.5      14.6  55.4
15.4  62.6      15.5  62.9
15.6  62.4      16.4  58.9
16.5  58.7      19.3  64.0
16.5  58.5
library(sp)   # SpatialPoints(), coordinates()
library(FNN)  # get.knnx()
# identify long + lat coords of each data set
p_coord <- SpatialPoints(plot[, c(1, 2)])
s_coord <- SpatialPoints(stations[, c(1, 2)])
# link coordinates: for each plot, index of and distance to the nearest station
g <- get.knnx(coordinates(s_coord), coordinates(p_coord), k = 1)
str(g)
# draw stations (red), plots, and a segment from each plot to its nearest station
plot(s_coord, col = 2)
plot(p_coord, add = TRUE)
segments(coordinates(p_coord)[,1], coordinates(p_coord)[,2], coordinates(s_coord[g$nn.index[,1]])[,1], coordinates(s_coord[g$nn.index[,1]])[,2])
# print result to "results.csv", then restore console output
sink("results.csv")
print(g)
sink()
I've since realised that the results I get are slightly wrong. For example, plots #3 and #4 are linked to station #4, when it would make more sense for plots #4 and #5 to be linked to station #4.
This leads me to think that something in the code is slightly off, but only by one row.
I would appreciate any comments on my code, and am equally interested in suggestions for simpler ways to connect two series of coordinates.
Thanks

What is your coordinate reference system? Are these points in Scandinavia?
Anyway, you could go with the geosphere package and use distHaversine or distVincentyEllipsoid (more precise) to get the distances:
library(sp)
library(geosphere)
plot <- data.frame(Lon = c(13.2,15.4,15.6,16.5,16.5),
                   Lat = c(60.5,62.6,62.4,58.7,58.5))
stations <- data.frame(Lon = c(14.6,15.5,16.4,19.3),
                       Lat = c(55.4,62.9,58.9,64))
p_coord <- SpatialPoints(plot[,c(1,2)])
s_coord <- SpatialPoints(stations[,c(1,2)])
# for each plot, index of the station with the smallest great-circle distance
apply(p_coord@coords, 1, function(x) {
  which.min(distHaversine(p1 = x, p2 = s_coord@coords))
})
The output will be
[1] 3 2 2 3 3
which means that plot 1 is closest to station 3, plot 2 is closest to station 2, and so on.
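The same lookup can also be sketched in base R, which may help when checking the indices by hand. This is only an illustration: the helper names haversine_km and nearest_station, the data frame name plots (chosen so that it does not shadow the plot() function), and the 6371 km mean Earth radius are assumptions of this sketch, not something taken from the question or from geosphere.

```r
# Sample coordinates from the question (degrees).
plots <- data.frame(Lon = c(13.2, 15.4, 15.6, 16.5, 16.5),
                    Lat = c(60.5, 62.6, 62.4, 58.7, 58.5))
stations <- data.frame(Lon = c(14.6, 15.5, 16.4, 19.3),
                       Lat = c(55.4, 62.9, 58.9, 64))

# Great-circle distance in km (haversine formula); vectorised over lon2/lat2.
# radius_km = 6371 is an assumed mean Earth radius.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# For every row of `plots`, the row index of the closest station.
nearest_station <- function(plots, stations) {
  vapply(seq_len(nrow(plots)), function(i) {
    which.min(haversine_km(plots$Lon[i], plots$Lat[i],
                           stations$Lon, stations$Lat))
  }, integer(1))
}

nearest <- nearest_station(plots, stations)
print(nearest)
# [1] 3 2 2 3 3
```

One possible cause of an off-by-one result that is worth checking: if the input files really have no header row, reading them with read.csv(..., header = FALSE) keeps the first data row from being consumed as column names, which would otherwise shift every row index by one.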

Related

Cutting raster data in R

I have an interesting case to solve, as I don't have that much experience with this type of data I am asking for some help.
I have about 1000 TIF files, each over 30 MB. Each of those files contains at least one house roof I am interested in. I would like to cut out of the TIF only some specific locations, based on a file where I store lon/lat data. I managed to plot the TIF file, but I am struggling to plot the lon/lat points. As a next step I would like to cut out only the area near each point. Is that possible?
For importing the TIF files I am using stars.
y <- read_stars('data/file_1.tif')
y_ll <- st_transform(y, "+proj=longlat +ellps=WGS84 +datum=WGS84")
coordinates(lon_lat_data) = ~lon+lat
plot(y_ll[,,,1], axes = TRUE)
points(lon_lat_data$lon, lon_lat_data$lat, pch=19, col="red", cex=0.5)
The above is not working. Any idea why?
After plotting the data points, is there a function in raster to extract only the area around each point?
Here you can find the files:
https://drive.google.com/drive/folders/1UCgcqCKHQHc5PsbPv95zEAmQ0qwKoYY0?usp=sharing
Example data (files that ship with R):
library(terra)
f <- system.file("ex/elev.tif", package="terra")
r <- rast(f)
xy <- spatSample(r, 10, xy=TRUE, na.rm=TRUE)
v <- vect(as.matrix(xy[,1:2]), crs=crs(r))
plot(r)
points(v)
In this example the raster and point data have the same CRS. If that is not the case for you, you should always transform the points (exact and lossless), not the raster data.
v <- project(v, crs(r))
I do not know what exactly you mean with "cutting data", but you can use extract to get the raster cell values for the point locations.
extract(r, v)
#   ID elevation
#1   1       325
#2   2       281
#3   3       324
#4   4       353
#5   5       271
#6   6       306
#7   7       332
#8   8       260
#9   9       270
#10 10       280
You could also make a buffer around the points, and perhaps use crop.
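A minimal sketch of that buffer-and-crop idea, using the elevation file that ships with terra. The point location (6.1, 49.8) and the 5000 m buffer width are made-up values for illustration, and the object names pt, b and clip are hypothetical; for lon/lat data terra interprets the buffer width in metres.

```r
library(terra)

# Example raster that ships with terra (lon/lat, WGS84).
f <- system.file("ex/elev.tif", package = "terra")
r <- rast(f)

# One hypothetical point of interest inside the raster, as lon/lat.
pt <- vect(cbind(6.1, 49.8), crs = crs(r))

# Circle of 5000 m around the point (assumed size; adjust to the roof size).
b <- buffer(pt, width = 5000)

# Keep only the raster cells inside the bounding box of the buffer.
clip <- crop(r, b)

plot(clip)
points(pt)
```

For many files and many points, the same three steps could be wrapped in a loop that writes each cropped piece with writeRaster(); that part is left out here.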

Adding labels to points plotted on world map in R

I have plotted a series of latitude/longitude coordinates on a World Map using R. I would like to add labels to these points that I have plotted.
Currently my code is:
library(maps)
cities<-read.csv("cities.csv", header=T)
cities
        id  lat   lon
1  Nigeria  7.0   6.0
2   Gambia 13.3  16.0
3 Cambodia 12.0 105.0
4   France 46.0   2.0
5   Greece 38.0  23.7
map(database="world")
points(x = cities$lon, y = cities$lat, col = "red", pch=20)
I want to add either a label (cities$id) or number each of the points in order so I know which point corresponds to which of my data entries.
I have seen code that uses ggplot2, but I cannot install ggmap (perhaps my version 1.0.44 is too old), so I am trying to stay away from this approach.
Any advice would be greatly appreciated. Many thanks in advance!
[Image: world map with unlabeled points]
Using base R text function, add after the points call:
text(x = cities$lon, y = cities$lat, labels = cities$id, pos = 4)
You can change the pos to suit: 1 shows text below the point, 2 to the left, 3 above and 4 to the right.
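To number the points in order instead of printing the names, the labels can be a simple sequence. The sketch below rebuilds the cities data frame from the question and uses plain plot() rather than map() so that it runs without the maps package; the variable name point_numbers is hypothetical.

```r
# Data from the question, rebuilt by hand instead of read from cities.csv.
cities <- data.frame(
  id  = c("Nigeria", "Gambia", "Cambodia", "France", "Greece"),
  lat = c(7.0, 13.3, 12.0, 46.0, 38.0),
  lon = c(6.0, 16.0, 105.0, 2.0, 23.7)
)

# Row numbers 1..n, used as point labels.
point_numbers <- seq_len(nrow(cities))

# Stand-in for map(database = "world"): an empty lon/lat frame with the points.
plot(cities$lon, cities$lat, col = "red", pch = 20,
     xlim = c(-180, 180), ylim = c(-90, 90),
     xlab = "Longitude", ylab = "Latitude")

# Number each point; pos = 3 puts the label above the point.
text(x = cities$lon, y = cities$lat, labels = point_numbers, pos = 3)

# Key that maps each number back to its data entry.
print(paste(point_numbers, cities$id))
```

With the maps package loaded, the plot() call would be replaced by map(database = "world") followed by the original points() call; the text() line stays the same.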

R - Spatial Join Between SpatialPoints (GPS coordinates) and SpatialLinesDataFrame

I am working on a university project which combines data science and GIS. We need to find an open-source solution capable of obtaining additional information from a massive dataset of GPS coordinates. Clearly, I cannot use any API with a daily request limit.
THE DATA
Here you can find a sample of the dataset the Professor provided us:
longitude <- c(10.86361, 10.96062, 10.93032, 10.93103, 10.93212)
latitude <- c(44.53355, 44.63234, 44.63470, 44.63634, 44.64559)
longlat <- data.frame(longitude, latitude)
ID <- seq.int(1, 5)
FIRST TASK: Already Accomplished!
The first step was joining my SpatialPoints with a SpatialPolygonsDataFrame using over() from sp. The SpatialPolygonsDataFrame was obtained through getData('GADM', country='ITA', level=3) from raster.
For this first task, the objective was to associate each pair of GPS coordinates with the City and Region it belongs to.
An example of the result I was able to obtain is:
require(sp)
require(rgeos)
require(raster) # getData()
# the data argument must be a data.frame with one row per point
my_spdf <- SpatialPointsDataFrame(coords = longlat, data = data.frame(ID), proj4string = CRS(" +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 "))
italy_administrative_boundaries_level3 <- getData('GADM', country='ITA', level=3)
result <- over(my_spdf, italy_administrative_boundaries_level3)[, c("NAME_0", "NAME_1", "NAME_2", "NAME_3")]
result$ID <- ID
print(result)
SECOND TASK: MY QUESTION
Now things become tricky, because I need to associate additional and more detailed information such as road_name and road_type.
This information is contained in the shapefiles created from OpenStreetMap and available at: download.geofabrik.de/europe/italy.html.
I loaded the shapefile in R obtaining a SpatialLinesDataFrame:
require(rgdal)
shapefile_roads <- readOGR(dsn = "./road", layer = "roads")
Then, I naively tried to apply the same technique as for joining SpatialPoints and SpatialPolygonsDataFrame:
result <- over(my_spdf, shapefile_roads)
Clearly, the result is just NA. One possible reason that came to mind is that the coordinates of my_spdf are not exactly on the Lines in shapefile_roads, so I would need some kind of radius parameter. However, I am not really sure.
Can you suggest the correct approach to perform this spatial join between my SpatialPoints and the attributes of the SpatialLinesDataFrame obtained from the OpenStreetMap road shapefile?
Please do not hesitate to ask if something is unclear.
Your example data
library(raster)
longitude <- c(10.86361, 10.96062, 10.93032, 10.93103, 10.93212)
latitude <- c(44.53355, 44.63234, 44.63470, 44.63634, 44.64559)
longlat <- data.frame(longitude, latitude)
ID <- data.frame(ID=1:5)
ita_gadm3 <- getData('GADM', country='ITA', level=3)[, c("NAME_0", "NAME_1", "NAME_2", "NAME_3")]
#use `sp::over` or `raster::extract`
result <- extract(ita_gadm3, longlat)
Some roads:
road <- spLines(cbind(longitude+.1, latitude), cbind(longitude-.1, rev(latitude)), cbind(longitude-.1, latitude+1), crs=crs(ita_gadm3))
Now find the nearest road segment. You can use geosphere::dist2Line because you are using angular (lon/lat) coordinates.
library(geosphere)
geosphere::dist2Line(longlat, road)
#     distance      lon      lat ID
#[1,] 2498.825 10.83212 44.53355  2
#[2,] 5527.646 11.03032 44.63470  1
#[3,] 5524.227 10.86062 44.63634  2
#[4,] 5577.372 10.86062 44.63634  2
#[5,] 5756.113 10.86062 44.63634  2
Note the variable ID, which refers back to the roads. The problem is that dist2Line is currently slow and you have a large data set.
The alternative is to transform your spatial data to a planar coordinate system appropriate for Italy and use gDistance.
library(rgeos)
sp <- SpatialPoints(longlat, proj4string=crs(ita_gadm3))
spita <- spTransform(sp, "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m")
rdita <- spTransform(road, "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m")
gd <- gDistance(rdita, spita, byid=TRUE)
a <- apply(gd, 1, which.min)
a
#1 2 3 4 5
#2 1 2 2 2
That is, point 2 is closest to road 1. The other points are closest to road 2.
You probably need to do that in batches of points or tiles to avoid getting a distance matrix that is too large.
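The batching idea can be sketched in base R as follows. To stay self-contained, the sketch measures planar distances to a set of target points (for example road vertices) rather than to line geometries, so it is a simplified stand-in for gDistance; the function name nearest_in_batches, the batch_size parameter and the toy coordinates are all assumptions for illustration.

```r
# For each row of `pts`, return the row index of the nearest row in `targets`.
# Both are two-column numeric matrices of planar x/y coordinates.
# Only `batch_size` points are held in the distance matrix at any one time.
nearest_in_batches <- function(pts, targets, batch_size = 1000) {
  n <- nrow(pts)
  if (n == 0) return(integer(0))
  out <- integer(n)
  for (start in seq(1, n, by = batch_size)) {
    idx <- start:min(start + batch_size - 1, n)
    # Squared Euclidean distances: rows = points in this batch, cols = targets.
    d2 <- outer(pts[idx, 1], targets[, 1], "-")^2 +
      outer(pts[idx, 2], targets[, 2], "-")^2
    out[idx] <- apply(d2, 1, which.min)
  }
  out
}

# Toy data (made up): five points and three targets.
pts <- cbind(x = c(0, 10, 4, 9, 1), y = c(0, 10, 5, 1, 8))
targets <- cbind(x = c(0, 10, 10), y = c(0, 0, 10))

nearest_small_batches <- nearest_in_batches(pts, targets, batch_size = 2)
print(nearest_small_batches)
# [1] 1 3 1 2 1
```

With the objects from the answer, the body of the loop would presumably become gDistance(rdita, spita[idx, ], byid = TRUE) followed by the same apply(..., 1, which.min); the batching logic itself would not change.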
The buffer solution suggested by Sébastien could work in principle, but gets really complicated as there is no good buffer size. On the one hand, points may be outside any buffer and, on the other hand, they may overlap with several buffers. If you use buffers, sp::over returns an arbitrary match if there are multiple matches, whereas raster::extract will return them all. Neither is pretty, and I would avoid this approach. Illustrated here:
b <- buffer(road, width=.15, dissolve=F)
plot(b)
lines(road, col='red', lwd=2)
points(longlat, pch=20, col='blue')
extract(b, longlat)
#   point.ID poly.ID
#1         1       1
#2         1       2
#3         2       2
#4         2       1
#5         3       2
#6         3       1
#7         4       2
#8         4       1
#9         5       1
#10        5       2
over(sp, b)
#1 2 3 4 5
#2 2 2 2 2
You need to join polygons with your points, not Lines. To do so, you can create a buffer area around your Lines using rgeos::gBuffer(). Be careful, because the buffer width is given in the coordinate system of your Lines, which is probably degrees (WGS84) in your case (verify it). Choose the correct distance (width) according to your case.
LinesBuffer <- rgeos::gBuffer(shapefile_roads, width = 0.01)
Then you will be able to join points with "LinesBuffer" using over (if they are in the same coordinates system).
result <- over(my_spdf, LinesBuffer)

Aggregate raster by a non-integer factor with arbitrary function

I would like to aggregate a population raster by a factor of 1.5, summing the values of the cells.
While aggregate() allows me to sum values when aggregating, its factor parameter accepts only integer values. projectRaster() and resample() allow me to adjust the resolution precisely, but (as far as I know) I am restricted to the prepackaged bilinear-interpolation and nearest-neighbor computation methods.
Is there a way to aggregate a raster by a non-integer factor AND specify the function to use when aggregating?
library(raster)
set.seed(10)
proj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
r <- raster(resolution = 1, nrow = 100, crs = proj)
r[] <- round(rnorm(ncell(r), 100, 10))
# Doesn't accept non-integer factors
aggregate(r, fact = 1.5, fun = sum)
template <- raster(extent(r), crs = crs(r), resolution = 1.5)
# Correct resolution, but incorrect / impossible values for population
projectRaster(r, to = template, method = "ngb")
projectRaster(r, to = template, method = "bilinear")
Possible workaround
So far, the only method I've been able to come up with is to coerce the template to a SpatialPoints object; extract values from the original, higher-resolution raster; and rasterize() the result:
pts <- as(template, "SpatialPoints")
vals <- extract(r, pts)
pts2 <- SpatialPointsDataFrame(pts, data.frame(vals))
rasterize(pts2, template, field = "vals", fun = sum)
However, if the points are created at the centroids of the raster cells, I'm not sure how they are handled when extracting with a resolution of 1.5x the original raster. My preferred method would be to create a SpatialPolygonsDataFrame and rasterize with fun = mean, but (in my experience) extracting raster values using polygons is very inefficient.
Here is a workaround:
#first resample to higher resolution
template <- raster(extent(r), crs = crs(r), resolution = .5)
detailedRas <- projectRaster(r, to = template, method = "ngb")
#then use an integer as a factor (in this case 3)
aggRas <- aggregate(detailedRas, fact=3, fun=sum)
Note however that sum in this case won't return the sum of people who are living in a certain aggregated area.
I.e.: Let's say we have four cells and these values with a resolution of 1 m:
10 15
12 18
after resampling to 0.5 using NN:
10 10 15 15
10 10 15 15
12 12 18 18
12 12 18 18
Then aggregating by sum to 1.5m we get for the first pixel:
10+10+15+10+10+15+12+12+18 = 112
While in fact it should be something like:
10 + 15/2 + 12/2 + 18/4 = 28 (if we assume an equal population distribution over each pixel.)
I would recommend using the focal raster function with a custom / user defined function for summing up population values as you wish.
Or you divide the resampled raster by 4 and then take the sum:
2.5 2.5 3.75 3.75
2.5 2.5 3.75 3.75
3 3 4.5 4.5
3 3 4.5 4.5
2.5 + 2.5 + 3.75 + 2.5 + 2.5 + 3.75 + 3 + 3 + 4.5 = 28
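The arithmetic above can be reproduced with plain matrices, which makes it easy to check both the inflated sum and the corrected one. This is only a sketch of the worked example: kronecker() stands in for the nearest-neighbour resampling to half the cell size, and the object names are made up.

```r
# The four original cells at 1 m resolution.
cells <- matrix(c(10, 15,
                  12, 18), nrow = 2, byrow = TRUE)

# Nearest-neighbour resampling to 0.5 m: every cell becomes a 2 x 2 block.
fine <- kronecker(cells, matrix(1, nrow = 2, ncol = 2))

# The first 1.5 m pixel covers the top-left 3 x 3 block of 0.5 m cells.
naive_sum <- sum(fine[1:3, 1:3])          # counts every person four times over
share_sum <- sum((fine / 4)[1:3, 1:3])    # each sub-cell holds a quarter

print(naive_sum)
# [1] 112
print(share_sum)
# [1] 28
```

With the raster objects from the workaround, the corresponding call would presumably be aggregate(detailedRas / 4, fact = 3, fun = sum), again under the assumption that the population is spread evenly within each original cell.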

Visualize correlations on a map in R

I calculated correlations between temperatures and the date of grape harvest. I stored the results as a matrix:
             32.5         37.5        42.5       47.5          52.5        57.5        62.5
-12.5 -0.05783118 -0.001655467 -0.07857098 -0.1526494 -0.0007327898 -0.02078552  0.06121682
 -7.5 -0.23219824 -0.059952117 -0.06895444 -0.1674386 -0.1311612338 -0.08476390  0.09831010
 -2.5 -0.11040995 -0.147325160 -0.15016740 -0.1796807 -0.1819844495 -0.14472899 -0.03550576
  2.5 -0.20577359 -0.180857373 -0.15077067 -0.2293366 -0.2577666092 -0.21645676 -0.13044584
  7.5 -0.44526971 -0.176224708 -0.15114994 -0.2459971 -0.2741514139 -0.19281484 -0.15683870
 12.5 -0.12481683 -0.121675085 -0.16011098 -0.2288839 -0.2503969467 -0.26616721 -0.23089796
 17.5 -0.15352693 -0.220012419 -0.11456690 -0.2314059 -0.2194705426 -0.20557053 -0.22529422
Now I want to visualize the results on a map. It should look like this:
[Image: example of visualizing correlations on a map]
Only my longitude and latitude are different. My latitude ranges from 32.5°N to 62.5°N and my longitude goes from -12.5°E to 17.5°E.
I have absolutely no idea how this is done! It would be nice if someone could help me.
Regards.
This is one way. Your grid is rather coarse (increments of 5°, or ~350 miles at the equator), and of course you did not provide an actual map, but this will plot a "heat map" of correlation at the coordinates you provided.
df <- cbind(lon=rownames(df),df)
library(reshape2)
library(ggplot2)
library(RColorBrewer)
gg <- melt(df,id="lon",variable.name="lat",value.name="corr")
gg$lat <- as.numeric(substring(gg$lat,2)) # remove pre-pended "X"
gg$lon <- as.numeric(as.character(gg$lon)) # convert factor to numeric
ggplot(gg)+
geom_tile(aes(x=lon,y=lat, fill=corr))+
scale_fill_gradientn(colours=rev(brewer.pal(9,"Spectral")))+
coord_fixed()
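If ggplot2 and reshape2 are not available, roughly the same picture can be sketched with base graphics. The example below uses only the first three rows and first two columns of the matrix from the question to stay short; the object names m and long are made up, and it assumes the row names are longitudes and the column names latitudes, as in the answer above.

```r
# Corner of the correlation matrix from the question:
# rows = longitude, columns = latitude.
lon <- c(-12.5, -7.5, -2.5)
lat <- c(32.5, 37.5)
m <- matrix(c(-0.05783118, -0.001655467,
              -0.23219824, -0.059952117,
              -0.11040995, -0.147325160),
            nrow = 3, byrow = TRUE,
            dimnames = list(as.character(lon), as.character(lat)))

# Long format (one row per grid cell), comparable to the melt() step.
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("lon", "lat", "corr")
long$lon <- as.numeric(long$lon)
long$lat <- as.numeric(long$lat)
print(long)

# Heat map: image() expects z with one row per x value and one column per y value.
image(x = lon, y = lat, z = m, asp = 1,
      xlab = "Longitude", ylab = "Latitude")
```

Country outlines could then be drawn on top, for example with map("world", add = TRUE) from the maps package, which is not loaded here.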
The corrplot() function from corrplot R package can be used to plot a correlogram.
library(corrplot)
M<-cor(mtcars) # compute correlation matrix
corrplot(M, method="circle")
This is described here:
http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
Online R software is also available to compute and visualize a correlation matrix with a single click, without any installation:
http://www.sthda.com/english/rsthda/correlation-matrix.php
