Rounding R simple features linestring geometry coordinates - r

I need to round the coordinates of a simple features object (containing approx. 1,000,000 linestring features) to 5 decimal places. The code below does this correctly, but it is very slow because the last line of the for loop (indata$geometry[i] <- st_linestring(coords)) takes several seconds per iteration.
Does anyone know a more efficient way to code this?
library(sf)
indata <- st_read(dsn=dir, layer=layer)
indata <- st_cast(indata,"LINESTRING")
for (i in 1:nrow(indata)) {
coords <- st_coordinates(indata$geometry[i])
coords <- round(coords, 5)
indata$geometry[i] <- st_linestring(coords) #This is the slow part
}

I don't think you can improve much on what you have without writing out a shapefile. The limitation seems to be in dealing with linestrings. However, you can use the st_set_precision function to set the precision and then write out a file. It doesn't change the geometry precision until you write the file. You can read more about how precision works in the st_as_binary section of the sf manual (page 48). Basically, the precision acts as a scale factor, so the number of zeros gives the number of decimal places kept (10^5 keeps 5).
outdata <- st_set_precision(indata, precision=10^5)
st_write(outdata, "/path/to/file.shp")
indata <- st_read("/path/to/file.shp")
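To make the "number of zeros" remark concrete, here is a minimal base R sketch of the arithmetic. It assumes, as I understand the sf documentation, that a positive precision is used as a scale factor (multiply, round, divide). The coordinate values are made up for illustration.

```r
# Assumption: a positive precision p rounds a coordinate x as round(x * p) / p.
precision <- 10^5                           # five zeros -> five decimal places
x <- c(-83.6312345678, 10.3900004999)       # made-up lon/lat values

rounded <- round(x * precision) / precision # scale, round, unscale
print(format(rounded, nsmall = 5))

# Under that assumption the result agrees with plain round(x, 5)
print(isTRUE(all.equal(rounded, round(x, 5))))
```

So precision=10^5 in st_set_precision should correspond to the round(coords, 5) call in the question.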

Related

How to streamline and speed up loop with getData package in R

I am trying to download high-resolution climate data for a bunch of lat/long coordinates, and combine them into a single dataframe. I've come up with a solution (below), but it will take forever with the large list of coordinates I have. I asked a related question on the GIS StackExchange to see if anyone knew of a better approach for downloading and merging the data, but I'm wondering if I could somehow just speed up the operation of the loop? Does anyone have any suggestions on how I might do that? Here is a reproducible example:
# Download and merge 0.5 minute MAT/MAP data from WorldClim for a list of lon/lat coordinates
# This is based on https://emilypiche.github.io/BIO381/raster.html
# Make a dataframe with coordinates
coords <- data.frame(Lon = c(-83.63, 149.12), Lat=c(10.39,-35.31))
# Load package
library(raster)
# Make an empty dataframe for dumping data into
coords3 <- data.frame(Lon=integer(), Lat=integer(), MAT_10=integer(), MAP_MM=integer())
# Get WorldClim data for all the coordinates, and dump into coords 3
for(i in seq_along(coords$Lon)) {
r <- getData("worldclim", var="bio", res=0.5, lon=coords[i,1], lat=coords[i,2]) # Download the tile containing the lat/lon
r <- r[[c(1,12)]] # Reduce the layers in the RasterStack to just the variables we want to look at (MAT*10 and MAP_mm)
names(r) <- c("MAT_10", "MAP_mm") # Rename the columns to something intelligible
points <- SpatialPoints(na.omit(coords[i,1:2]), proj4string = r@crs) #give lon,lat to SpatialPoints
values <- extract(r,points)
coords2 <- cbind.data.frame(coords[i,1:2],values)
coords3 <- rbind(coords3, coords2)
}
# Convert MAT*10 from WorldClim into MAT in Celsius
coords3$MAT_C <- coords3$MAT_10/10
Edit: Thanks to advice from Dave2e, I've first made a list, then put intermediate results in the list, and rbind it at the end. I haven't timed this yet to see how much faster it is than my original solution. If anyone has further suggestions on how to improve the speed, I'm all ears! Here is the new version:
coordsList <- list()
for(i in seq_along(coordinates$lon_stm)) {
r <- getData("worldclim", var="bio", res=0.5, lon=coordinates[i,7], lat=coordinates[i,6]) # Download the tile containing the lat/lon
r <- r[[c(1,12)]] # Reduce the layers in the RasterStack to just the variables we want to look at (MAT*10 and MAP_mm)
names(r) <- c("MAT_10", "MAP_mm") # Rename the columns to something intelligible
points <- SpatialPoints(na.omit(coordinates[i,7:6]), proj4string = r@crs) #give lon,lat to SpatialPoints
values <- extract(r,points)
coordsList[[i]] <- cbind.data.frame(coordinates[i,7:6],values)
}
coords_new <- dplyr::bind_rows(coordsList) # bind_rows() comes from the dplyr package
Edit2: I used system.time() to time the execution of both of the above approaches. When I did the timing, I had already downloaded all of the data, so the download time isn't included in my time estimates. My first approach took 45.01 minutes, and the revised approach took 44.15 minutes, so I'm not really seeing a substantial time savings by doing it the latter way. Still open to advice on how to revise the code so I can improve the speed of the operations!

Normalize RasterLayer as Matrix to use as Clip Frame

I was assigned the task to clip a raster from .nc file from a .tif file.
edit (from comment):
I want to extract temperature information from the .nc file because I need to check the yearly mean temperature of a specific region. To be comparable, the comparison has to be made over exactly the same area. The .nc file covers a larger area than the one previously checked, so I need to "clip" it to the extent of a .tif I have. The .tif data has the form 0|1; wherever it is 0 (or wherever the .tif is smaller than the .nc), the .nc data should be clipped. In the end I want to keep the .nc data, but at the extent of the .tif, while still retaining its resolution and projection. (The .tif and .nc have different projections and pixel sizes.)
Ordinarily that wouldn't be a problem, as I could use raster::crop. That doesn't deal with different projections and different pixel sizes/resolutions, though. (I still used it to generate an approximation, but it is not precise enough for the final information, as can be seen in the code snippet below.) The obvious way to generate a more reliable raster dataset would be to first homogenize the datasets with a method like raster::projectRaster or sp::spTransform (the latter was added in an edit to the original question), but this approach takes too much time, as I have to do this for quite a few .nc files.
I was told the best method would be to generate a normalized matrix from the smaller raster "clip_frame" and then just multiply it with the "nc_to_clip" raster. Doing so should prevent any errors through map projections or other factors. This makes a lot of sense to me in theory but I have no idea how to do this in practice. I would be very grateful to any kind of hint/code snippet or any other help.
I have looked at similar problems on StackOverflow (and other sites) like:
convert matrix to raster in R
Convert raster into matrix with R
https://www.researchgate.net/post/Hi_Is_there_a_way_to_multiply_Raster_value_by_Raster_Latitude
As I am not even sure how to frame the question correctly, I might have overlooked an answer to this problem, if so please point me there!
My (working) code so far, just to give you an idea of how I want to approach the topic (here using the crop-function).
#library(ncdf4)
library(raster)
library(rgdal)
library(tidyverse)
nc_list<-list.files(pattern = ".*0.nc$") # list of .nc files containing raster and temperature information
#nc_to_clip <- lapply(nc_list, raster, varname="GST") # read in as raster
nc_to_clip <- raster("ABC.nc", varname="GST") # read a single .nc file as a raster
clip_frame <- raster("XYZ.tif") # read in .tif for further use as frame
mean_temp_from_raster<-function(input_clip_raster, input_clip_frame){ # input_clip_raster= raster to clip, input_clip_frame
r2_coord<-rasterToPoints(input_clip_raster, spatial = TRUE) # step 1 to extract coordinates
map_clip <- crop(input_clip_raster, extent(input_clip_frame)) # use crop to cut the input_clip_raster (this being the function I have to extend on)
temp<-raster::extract(map_clip, r2_coord@coords) # step 2 to extract coordinates
temp_C<-temp*0.01-273.15 # convert kelvin*100 to celsius
temp_C<-na.omit(temp_C)
mean(temp_C)
return_list<-list(map_clip, mean(temp_C))
return(return_list)
}
mean_tempC<-lapply(nc_to_clip, mean_temp_from_raster,clip_frame)
Thanks!
PS:
I don't have much experience working with .nc files and/or RasterLayers in R as I used to work with ArcGIS/Python (arcpy) for problems like this, which is not an option right now.
Perhaps something like this?
library(raster)
nc <- raster("ABC.nc", varname="GST")
clip <- raster("XYZ.tif")
x <- as(extent(clip), "SpatialPolygons")
crs(x) <- crs(clip)
y <- sp::spTransform(x, crs(nc))
clipped <- crop(nc, y)
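The "normalized matrix" idea from the question can be illustrated with plain matrices. This is only a sketch under a strong assumption: the 0|1 mask has already been brought onto the same grid as the .nc data, which is the expensive reprojection step and is not shown here. All values and names below are made up.

```r
# Assumption: temp_K100 and mask01 are already on the same grid (same
# dimensions, same cell positions). Values are invented for illustration.
temp_K100 <- matrix(c(28815, 28915, 29015, 29115), nrow = 2)  # Kelvin * 100
mask01    <- matrix(c(1, 0, 1, 1), nrow = 2)                  # 1 = keep, 0 = drop

# Turn the zeros into NA so that multiplying removes those cells
mask_na <- ifelse(mask01 == 1, 1, NA)
clipped <- temp_K100 * mask_na

# Convert Kelvin * 100 to Celsius and average over the kept cells only
mean_C <- mean(clipped * 0.01 - 273.15, na.rm = TRUE)
print(mean_C)
```

If all the .nc files share one grid, the mask would only need to be reprojected once and could then be reused for every file.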

Repeat loop having no effect in R

I'm trying to access elevation data using the "elevatr" package in R. I have a large set of coordinates (over 5,000 points) that I have put in a data frame, which is how the package is supposed to work. However, I have found that the connection gets interrupted regularly and the entire calculation stops.
I want to use a repeat loop to make sure the calculation runs its entirety, and when it fails, it starts over again on its own. However, the first few times I have tried, the repeat loop seems to have no effect and the calculation doesn't start over again.
I left a sample of my code. "n" is my desired length, which is pre-set. Everything in the code I provided works. It's just that sometimes get_elev_point does not complete the calculation, and putting it in a repeat loop does not seem to have any effect.
Any advice on the best way to solve this problem?
library(elevatr)
library(rgdal)
e <- 0
elevation <- 0
ll_prj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
repeat{
coordinates <- data.frame(lon,lat)
e <- get_elev_point(locations = coordinates[m,], units="feet", prj = ll_prj)
elevation <- e$elevation
if (length(elevation) == n) {
break
}}
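One likely reason the repeat loop has no effect is that an uncaught error inside it stops the whole loop, not just the current pass. A minimal sketch of catching the error and retrying is shown below. The helper fetch_with_retry and the stand-in function flaky_fetch are hypothetical, used here in place of the real get_elev_point call so that the example runs without a network connection.

```r
# Hypothetical helper: call fetch() until it succeeds or max_tries is reached.
fetch_with_retry <- function(fetch, max_tries = 5, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    # tryCatch turns an error into NULL instead of aborting the loop
    result <- tryCatch(fetch(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(wait)  # pause before the next attempt
  }
  stop("all ", max_tries, " attempts failed")
}

# Hypothetical stand-in for a flaky network call: fails twice, then succeeds.
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection interrupted")
  c(120.5, 98.2)  # made-up elevations
}

elevation <- fetch_with_retry(flaky_fetch)
print(elevation)
```

In the real case, fetch would wrap the get_elev_point call, and splitting the 5,000 points into smaller batches would mean a failure only repeats one batch.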

efficient use of raster functions in r

I have 500+ points in a SpatialPointsDataFrame object; I have a 1.7GB (200,000 rows x 200,000 cols) raster object. I want to have a tabulation of the values of the raster cells within a buffer around each of the 500+ points.
I have managed to achieve that with the code below (I got a lot of inspiration from here). However, it is slow to run and I would like to make it run faster. It actually runs OK for buffers with "small" widths, say 5km or even 15km (~1 million cells), but it becomes super slow when the buffer increases to, say, 100km (~42 million cells).
I could easily improve on the loop below by using something from the apply family and/or a parallel loop. But my suspicion is that it is slow because the raster package writes 400MB+ temporary files for each iteration of the loop.
# packages
library(rgeos)
library(raster)
library(rgdal)
myPoints = readOGR(points_path, 'myLayer')
myRaster = raster(raster_path)
myFunction = function(polygon_obj, raster_obj) {
# this function return a tabulation of the values of raster cells
# inside a polygon (buffer)
# crop to extent of polygon
clip1 = crop(raster_obj, extent(polygon_obj))
# crops to polygon edge & converts to raster
clip2 = rasterize(polygon_obj, clip1, mask = TRUE)
# much faster than extract
ext = getValues(clip2)
# tabulates the values of the raster in the polygon
tab = table(ext)
return(tab)
}
# loop over the points
ids = unique(myPoints$ID)
for (id in ids) {
# select point
myPoint = myPoints[myPoints$ID == id, ]
# create buffer
myPolygon = gBuffer(spgeom = myPoint, byid = FALSE, width = myWidth)
# extract the data I want (projections, etc are fine)
tab = myFunction(myPolygon, myRaster)
# do stuff with tab ...
}
My questions:
Am I right to partially blame the writing operations? If I managed to avoid all those writing operations, would this code run faster? I have access to a machine with 32GB of RAM -- so I guess it is safe to assume I could load the raster to the memory and need not to write temporary files?
What else could I do to improve efficiency in this code?
I think you should approach it like this
library(raster)
library(rgdal)
myPoints <- readOGR(points_path, 'myLayer')
myRaster <- raster(raster_path)
e <- extract(myRaster, myPoints, buffer=myWidth)
And then something like
etab <- sapply(e, table)
It is hard to answer your question #1 as we do not know enough about your data (we do not know how many cells are covered by a "100 km" buffer). But you can set options about when to write to file with the rasterOptions function. You notice that getValues is faster than extract, based on the post you link to, but I think that is wrong, or at least not very important. The combination of crop, rasterize and getValues should have a similar performance as extract (which does almost exactly that under the hood). If you go this route anyway, you should pass an empty RasterLayer, created by raster(myRaster) for faster cropping.
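To show what the tabulation step does, here is a small base R sketch. The list e below is made up and stands in for the result of extract with a buffer, which returns one vector of cell values per point. Padding every table to the same set of classes is my addition, not part of the answer above.

```r
# Made-up stand-in for extract(myRaster, myPoints, buffer = myWidth):
# one vector of raster cell values per point
e <- list(c(1, 1, 2, 3), c(2, 2, NA, 3, 3, 3), c(1))

# One table per point; lapply always returns a list, whereas sapply may
# simplify to a matrix only when all tables have the same length
tabs <- lapply(e, table)

# Pad every table to the same classes to get a points x classes matrix
classes <- sort(unique(unlist(e)))  # sort() drops the NA
counts <- t(sapply(e, function(v) table(factor(v, levels = classes))))
print(counts)
```

Each row of counts corresponds to one point and each column to one raster value, with zeros where a value does not occur inside that buffer.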

writing a loop for upscaling precipitation for USA

I am writing code to calculate the mean precipitation for different regions of the conterminous USA. My data has 300 by 120 (lon*lat) grid cells in NetCDF format. I want to write a loop in R that takes the average of each 10 by 10 block of grid cells, assigns that average to all of the cells inside the block, and repeats this for the next block. At the end, instead of a 120 by 300 grid, I will have a 12 by 30 grid. So this is a kind of upscaling method I want to apply to my data. I could use a separate for-loop for each region, but that would make my code huge and I don't want to do that. Any ideas would be appreciated. Thanks.
P.S: Here is the function I have written for one region (10by10) lat*lon.
upscaling <- function(file, variable, start.time=1, count.time=1)
{
library(ncdf) # load ncdf library to manipulate ncdf data
ncdata <- open.ncdf(file); # open ncdf file
lon <- get.var.ncdf(ncdata, "lon");
lat <- get.var.ncdf(ncdata, "lat");
time <- get.var.ncdf(ncdata, "time");
start.lon <- 1
end.lon <- length(lon)
start.lat <- 1
end.lat <- length(lat)
count.lon <- end.lon - start.lon + 1; # count number of longitude
count.lat <- end.lat - start.lat + 1; # count number of latitude
dat <- get.var.ncdf(ncdata, variable, start=c(start.lon, start.lat, 1),
count=c(count.lon, count.lat, 1))
temp.data<- array(0,dim=c(10,10))
for (i in 1:10)
{
for (j in 1:10)
{
temp.data <- mean(dat[i,j,])
}
}
}
There is no need to make a messy loop to spatially aggregate your data. Just use the aggregate function in the raster package:
library(raster)
a=matrix(data=c(1:100),nrow=10,ncol=10)
a=raster(a)
ra <- aggregate(a, fact=5, fun=mean) #fact=5 will aggregate using a 5x5 window
ra=as.matrix(ra)
ra
Now for your netcdf data, use raster's rasterFromXYZ to create the raster that can then be aggregated with the above method. Bonus includes the option to define your projection as an argument in the function so you end up with a georeferenced object at the end. This is important because if you aggregate your data without it you will then have to figure out by hand how to georeference the resulting matrix.
EDIT: If you want a resulting raster with the same dimensions as the original one, disaggregate the data right after aggregating it. While this seems redundant, these raster methods are very fast.
library(raster)
a=matrix(data=c(1:100),nrow=10,ncol=10)
a=raster(a)
ra <- aggregate(a, fact=5, fun=mean) #fact=5 will aggregate using a 5x5 window
ra <- disaggregate(ra, fact=5)
ra=as.matrix(ra)
ra
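For a plain matrix read from the NetCDF file, the same aggregate-then-disaggregate idea can also be sketched in base R. The helper block_mean is hypothetical, and the 300 by 120 data matrix is random, matching only the dimensions given in the question.

```r
# Hypothetical helper: mean of every fact x fact block of a matrix
block_mean <- function(m, fact) {
  stopifnot(nrow(m) %% fact == 0, ncol(m) %% fact == 0)
  row_block <- (row(m) - 1) %/% fact  # block index of each cell along rows
  col_block <- (col(m) - 1) %/% fact  # block index of each cell along columns
  unname(tapply(m, list(row_block, col_block), mean))
}

set.seed(1)
dat <- matrix(runif(300 * 120), nrow = 300, ncol = 120)  # lon x lat, made up

coarse <- block_mean(dat, 10)  # 30 x 12 block means

# Copy each block mean back to all cells of its block (300 x 120 again)
filled <- coarse[rep(seq_len(nrow(coarse)), each = 10),
                 rep(seq_len(ncol(coarse)), each = 10)]
print(dim(coarse))
print(dim(filled))
```

This loses the georeferencing that the raster objects carry, so it is mainly useful when the result stays a matrix.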
If your grid definitions follow standard netcdf conventions, then you might be able to remap using the CDO remapping functions. For first order conservative remapping you can try
cdo remapcon,grid_specification_here in.nc out.nc
Note that the answer given above is approximate, and not quite correct as the grid cell size is not the same as a function of latitude. The size of the error is likely small for this particular task as the cell sizes are fine, but nevertheless the answer will be slightly off.
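One way to see the size of that effect is to weight each cell by the cosine of its latitude before averaging. The sketch below assumes a regular lon-lat grid where cell area is roughly proportional to cos(latitude). It is not what cdo remapcon does internally, and the helper name, latitudes and data are all made up.

```r
# Hypothetical helper: block means weighted by cos(latitude).
# Assumes dat is lon x lat and lat holds the latitude of each column in degrees.
weighted_block_mean <- function(dat, lat, fact) {
  stopifnot(ncol(dat) == length(lat),
            nrow(dat) %% fact == 0, ncol(dat) %% fact == 0)
  # Same weight for every longitude, varying along the latitude dimension
  w <- matrix(cos(lat * pi / 180), nrow = nrow(dat), ncol = ncol(dat),
              byrow = TRUE)
  lon_block <- (row(dat) - 1) %/% fact
  lat_block <- (col(dat) - 1) %/% fact
  num <- tapply(dat * w, list(lon_block, lat_block), sum)
  den <- tapply(w, list(lon_block, lat_block), sum)
  unname(num / den)
}

set.seed(1)
lat <- seq(25, 50, length.out = 120)                     # made-up latitudes
dat <- matrix(runif(300 * 120), nrow = 300, ncol = 120)  # made-up field

weighted <- weighted_block_mean(dat, lat, 10)
print(dim(weighted))
```

Comparing this result with the unweighted block means on the real data would show how large the latitude effect is for a given grid.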
