I'm trying to access elevation data using the "elevatr" package in R. I have a large set of coordinates (over 5,000 points) stored in a data frame, which is how the package is supposed to work. However, the connection gets interrupted regularly and the entire calculation stops.
I want to use a repeat loop so that the calculation runs in its entirety and, when it fails, starts over on its own. However, in my first few attempts the repeat loop seems to have no effect and the calculation doesn't start over.
I left a sample of my code. "n" is my desired length, which is pre-set. Everything in the code I provided works. It's just that sometimes get_elev_point does not complete the calculation, and putting it in a repeat loop does not seem to have any effect.
Any advice on the best way to solve this problem?
library(elevatr)
library(rgdal)
e <- 0
elevation <- 0
ll_prj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
repeat {
  coordinates <- data.frame(lon, lat)
  e <- get_elev_point(locations = coordinates[m,], units = "feet", prj = ll_prj)
  elevation <- e$elevation
  if (length(elevation) == n) {
    break
  }
}
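One likely reason the repeat loop never retries is that an interrupted download raises an R error, which aborts the loop before the length test is reached. Below is a minimal sketch of catching that error with tryCatch and trying again. The names `with_retry`, `max_tries` and `wait` are hypothetical, and `flaky_lookup` is only a stand-in for the get_elev_point() call, not part of elevatr.

```r
# Call f() until it succeeds or max_tries attempts have been used.
with_retry <- function(f, max_tries = 5, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(err) {
      message("Attempt ", attempt, " failed: ", conditionMessage(err))
      NULL  # signal failure to the loop instead of stopping the script
    })
    if (!is.null(result)) return(result)
    Sys.sleep(wait)  # optional pause before the next attempt
  }
  stop("All ", max_tries, " attempts failed")
}

# Stand-in for the network call (assumption): fails twice, then succeeds.
calls <- 0
flaky_lookup <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection interrupted")
  c(120.5, 98.2, 310.0)
}

elevation <- with_retry(flaky_lookup)
```

In the real script the function passed in would wrap the actual request, for example `function() get_elev_point(locations = coordinates, units = "feet", prj = ll_prj)`. Sending the 5,000 points in smaller chunks might also help, since a failure would then only repeat one chunk.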
I am trying to download high-resolution climate data for a bunch of lat/long coordinates, and combine them into a single dataframe. I've come up with a solution (below), but it will take forever with the large list of coordinates I have. I asked a related question on the GIS StackExchange to see if anyone knew of a better approach for downloading and merging the data, but I'm wondering if I could somehow just speed up the operation of the loop? Does anyone have any suggestions on how I might do that? Here is a reproducible example:
# Download and merge 0.5 minute MAT/MAP data from WorldClim for a list of lon/lat coordinates
# This is based on https://emilypiche.github.io/BIO381/raster.html
# Make a dataframe with coordinates
coords <- data.frame(Lon = c(-83.63, 149.12), Lat=c(10.39,-35.31))
# Load package
library(raster)
# Make an empty dataframe for dumping data into
coords3 <- data.frame(Lon=integer(), Lat=integer(), MAT_10=integer(), MAP_MM=integer())
# Get WorldClim data for all the coordinates, and dump into coords3
for (i in seq_along(coords$Lon)) {
  r <- getData("worldclim", var="bio", res=0.5, lon=coords[i,1], lat=coords[i,2]) # Download the tile containing the lat/lon
  r <- r[[c(1,12)]] # Reduce the layers in the RasterStack to just the variables we want to look at (MAT*10 and MAP_mm)
  names(r) <- c("MAT_10", "MAP_mm") # Rename the columns to something intelligible
  points <- SpatialPoints(na.omit(coords[i,1:2]), proj4string = r@crs) # give lon,lat to SpatialPoints
  values <- extract(r, points)
  coords2 <- cbind.data.frame(coords[i,1:2], values)
  coords3 <- rbind(coords3, coords2)
}
# Convert MAT*10 from WorldClim into MAT in Celsius
coords3$MAT_C <- coords3$MAT_10/10
Edit: Thanks to advice from Dave2e, I've first made a list, then put intermediate results in the list, and rbind it at the end. I haven't timed this yet to see how much faster it is than my original solution. If anyone has further suggestions on how to improve the speed, I'm all ears! Here is the new version:
coordsList <- list()
for (i in seq_along(coordinates$lon_stm)) {
  r <- getData("worldclim", var="bio", res=0.5, lon=coordinates[i,7], lat=coordinates[i,6]) # Download the tile containing the lat/lon
  r <- r[[c(1,12)]] # Reduce the layers in the RasterStack to just the variables we want to look at (MAT*10 and MAP_mm)
  names(r) <- c("MAT_10", "MAP_mm") # Rename the columns to something intelligible
  points <- SpatialPoints(na.omit(coordinates[i,7:6]), proj4string = r@crs) # give lon,lat to SpatialPoints
  values <- extract(r, points)
  coordsList[[i]] <- cbind.data.frame(coordinates[i,7:6], values)
}
coords_new <- dplyr::bind_rows(coordsList) # bind_rows() comes from the dplyr package
Edit2: I used system.time() to time the execution of both of the above approaches. When I did the timing, I had already downloaded all of the data, so the download time isn't included in my time estimates. My first approach took 45.01 minutes, and the revised approach took 44.15 minutes, so I'm not really seeing a substantial time savings by doing it the latter way. Still open to advice on how to revise the code so I can improve the speed of the operations!
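Both loops above call getData and extract once per point, so every point repeats the work of loading its tile. One possible way to cut that down is to group the points by tile first and then handle each tile once. The sketch below shows only the grouping step in base R. It assumes the 0.5-minute data come in 30 x 30 degree tiles, which should be checked against the files actually downloaded, and `extract_for_tile` is a hypothetical placeholder for the getData/extract step.

```r
# Example coordinates (made up for illustration)
coords <- data.frame(Lon = c(-83.63, 149.12, -84.10, 148.00),
                     Lat = c(10.39, -35.31, 9.93, -33.90))

# Assumption: tiles are 30 x 30 degrees; verify against the downloaded tiles.
tile_size <- 30
coords$tile <- paste(floor((coords$Lon + 180) / tile_size),
                     floor((90 - coords$Lat) / tile_size), sep = "_")

# Hypothetical placeholder: real code would load the tile once here
# and extract the values for every point in pts in a single call.
extract_for_tile <- function(pts) {
  cbind(pts, n_points_in_tile = nrow(pts))
}

# One call per tile instead of one call per point
by_tile <- split(coords, coords$tile)
result <- do.call(rbind, lapply(by_tile, extract_for_tile))
```

Whether this helps depends on how many points share a tile. With thousands of points spread over a handful of tiles, the number of tile loads drops from one per point to one per tile.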
I need to round the coordinates of a simple features object (containing approx. 1,000,000 linestring features) to 5 decimal places. The code below does this correctly, but runtime is very slow because the last line of the for loop (indata$geometry[i] <- st_linestring(coords)) takes several seconds for each iteration.
Does anyone know a more efficient way to code this?
indata <- st_read(dsn=dir, layer=layer)
indata <- st_cast(indata,"LINESTRING")
for (i in 1:nrow(indata)) {
  coords <- st_coordinates(indata$geometry[i])
  coords <- round(coords, 5)
  indata$geometry[i] <- st_linestring(coords) #This is the slow part
}
I don't think you can improve much upon what you have without writing out a shapefile. The limitation seems to be dealing with linestrings. However, you can use the st_set_precision function to set the precision and then write out a file. The geometry precision doesn't change until you write the file. You can read more about how precision works in the st_as_binary section of the sf manual (page 48). Basically, the precision is a scale factor, so the number of zeros sets the number of decimal places that are kept: 10^5 keeps 5.
outdata <- st_set_precision(indata, precision=10^5)
st_write(outdata, "/path/to/file.shp")
indata <- st_read("/path/to/file.shp")
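To see why precision=10^5 corresponds to 5 decimal places, here is a minimal base R sketch of the scale-factor arithmetic as the sf documentation describes it: multiply by the precision, round to the nearest integer, divide again. The coordinate values are made up for illustration, and this only mimics the arithmetic, it does not call sf.

```r
precision <- 10^5                       # scale factor: five zeros, five decimals
x <- c(-83.6312345678, 10.3900004321)   # made-up coordinate values

# Scale, round to the nearest integer, scale back
rounded <- round(x * precision) / precision

# For these values the result matches ordinary rounding to 5 decimals
same_as_round <- isTRUE(all.equal(rounded, round(x, 5)))
```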
I'm still new to R and don't know how to create a loop to make my workflow more efficient.
I have a Digital Elevation Model (raster Barrow_5m.tif) and two shapefiles, one for lakes and one for buffers, each with 10 IDs (one per row of its attribute table).
In the script below I created a new raster file for all values of the lake and the buffer shape file with the data from the DEM raster. This works fine.
setwd("...")
Barrow_5m <- raster("Barrow_5m.tif")
Barrow_DTLB <- st_read("Barrow_DTLB.shp")
Barrow_DTLB_Buffer <- st_read("Barrow_DTLB_BufferOUT.shp")
Barrow_lake <- crop(Barrow_5m, extent(Barrow_DTLB))
raster_lake <- rasterize(Barrow_DTLB, Barrow_lake, mask = TRUE)
Barrow_buffer <- crop(Barrow_5m, extent(Barrow_DTLB_Buffer))
raster_buffer <- rasterize(Barrow_DTLB_Buffer, Barrow_buffer, mask = TRUE)
writeRaster(raster_lake, "raster_lake.tif")
writeRaster(raster_buffer, "raster_buffer.tif")
But now I want a raster file for every ID of the lake and the buffer shapefile separately, so 2x10 files.
I thought it would be best to write a loop for this, but my skills are not yet good enough to do so.
Other questions I looked at for help have not led me to a solution so far.
Alternatively, I could take the final tif from the script above and split it into one file per ID.
I want to write a loop rather than do it by hand for every ID, because afterwards I am going to do the same with an even bigger shapefile with more values.
I have found a solution now, by extracting data by ID.
It creates a large list with 11 elements holding all values for each ID, which is sufficient for my further work. You can also directly compute the mean, max, min, etc. of each element (so of each ID).
k <- Barrow_DTLB$ID # k = the IDs, one per row
LakesA <- extract(raster_lakeA, Barrow_DTLB[k, ])
LakesA_mean <- extract(raster_lakeA, Barrow_DTLB[k, ], fun=mean)
Maybe this solution is also helpful for others who have already viewed the question.
I think this should work:
for (i in unique(raster_lake)) {
  r <- raster_lake
  r[!(values(r) == i)] <- NA
  r <- trim(r)
  writeRaster(r, paste0("raster_lake_", i, ".tif"))
}
I have 500+ points in a SpatialPointsDataFrame object; I have a 1.7GB (200,000 rows x 200,000 cols) raster object. I want to have a tabulation of the values of the raster cells within a buffer around each of the 500+ points.
I have managed to achieve that with the code below (I got a lot of inspiration from here). However, it is slow to run and I would like to make it run faster. It actually runs OK for buffers with "small" widths, say 5km or even 15km (~1 million cells), but it becomes super slow when the buffer increases to, say, 100km (~42 million cells).
I could easily improve on the loop below by using something from the apply family and/or a parallel loop. But my suspicion is that it is slow because the raster package writes 400MB+ temporary files for each iteration of the loop.
# packages
library(rgeos)
library(raster)
library(rgdal)
myPoints = readOGR(points_path, 'myLayer')
myRaster = raster(raster_path)
myFunction = function(polygon_obj, raster_obj) {
  # this function returns a tabulation of the values of raster cells
  # inside a polygon (buffer)
  # crop to extent of polygon
  clip1 = crop(raster_obj, extent(polygon_obj))
  # crops to polygon edge & converts to raster
  clip2 = rasterize(polygon_obj, clip1, mask = TRUE)
  # much faster than extract
  ext = getValues(clip2)
  # tabulates the values of the raster in the polygon
  tab = table(ext)
  return(tab)
}
# loop over the points
ids = unique(myPoints$ID)
for (id in ids) {
  # select point
  myPoint = myPoints[myPoints$ID == id, ]
  # create buffer
  myPolygon = gBuffer(spgeom = myPoint, byid = FALSE, width = myWidth)
  # extract the data I want (projections, etc are fine)
  tab = myFunction(myPolygon, myRaster)
  # do stuff with tab ...
}
My questions:
1. Am I right to partially blame the writing operations? If I managed to avoid all those writing operations, would this code run faster? I have access to a machine with 32GB of RAM, so I guess it is safe to assume I could load the raster into memory and would not need to write temporary files?
2. What else could I do to improve efficiency in this code?
I think you should approach it like this
library(raster)
library(rgdal)
myPoints <- readOGR(points_path, 'myLayer')
myRaster <- raster(raster_path)
e <- extract(myRaster, myPoints, buffer=myWidth)
And then something like
etab <- sapply(e, table)
It is hard to answer your question #1 as we do not know enough about your data (we do not know how many cells are covered by a "100 km" buffer). But you can set options about when to write to file with the rasterOptions function. You note that getValues is faster than extract, based on the post you link to, but I think that is wrong, or at least not very important. The combination of crop, rasterize and getValues should perform similarly to extract (which does almost exactly that under the hood). If you go this route anyway, you should pass an empty RasterLayer, created by raster(myRaster), for faster cropping.
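To make the tabulation step concrete: extract with a buffer and no summary function returns a list with one vector of cell values per point. The sketch below uses a small made-up list in place of that result (the values are an assumption for illustration) and tabulates each element.

```r
# Stand-in for the list returned by extract(myRaster, myPoints, buffer = myWidth);
# each element holds the cell values inside one point's buffer (made-up values).
e <- list(c(1, 1, 2, 3, 3, 3),
          c(2, 2, NA, 4))

# One frequency table per point; table() drops NA by default
etab <- lapply(e, table)

# Count of cells with value 3 around the first point
n_class3 <- etab[[1]][["3"]]
```

lapply is used here because it always returns a list. sapply gives the same tables but may simplify them into a matrix when every table happens to have the same length.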
I am writing code to calculate the mean amount of precipitation for different regions of the conterminous USA. My data cover a 300 by 120 (lon by lat) grid in NetCDF format. I want to write a loop in R that takes the average of each 10 by 10 block of grid cells, assigns that average to all of the cells inside the block, and repeats this for the next block. In the end, instead of a 120 by 300 grid I will have a 12 by 30 grid. So this is a kind of upscaling method I want to apply to my data. I could use a separate for-loop for each region, but that would make my code very long and I don't want to do that. Any ideas would be appreciated. Thanks.
P.S.: Here is the function I have written for one region (10 by 10, lat by lon).
upscaling <- function(file, variable, start.time=1, count.time=1)
{
  library(ncdf) # load ncdf library to manipulate ncdf data
  ncdata <- open.ncdf(file); # open ncdf file
  lon <- get.var.ncdf(ncdata, "lon");
  lat <- get.var.ncdf(ncdata, "lat");
  time <- get.var.ncdf(ncdata, "time");
  start.lon <- 1
  end.lon <- length(lon)
  start.lat <- 1
  end.lat <- length(lat)
  count.lon <- end.lon - start.lon + 1; # count number of longitude
  count.lat <- end.lat - start.lat + 1; # count number of latitude
  dat <- get.var.ncdf(ncdata, variable, start=c(start.lon, start.lat, 1),
                      count=c(count.lon, count.lat, 1))
  temp.data <- array(0, dim=c(10,10))
  for (i in 1:10)
  {
    for (j in 1:10)
    {
      temp.data <- mean(dat[i,j,])
    }
  }
}
There is no need to make a messy loop to spatially aggregate your data. Just use the aggregate function in the raster package:
library(raster)
a=matrix(data=c(1:100),nrow=10,ncol=10)
a=raster(a)
ra <- aggregate(a, fact=5, fun=mean) #fact=5 will aggregate using a 5x5 window
ra=as.matrix(ra)
ra
Now for your netcdf data, use raster's rasterFromXYZ to create a raster that can then be aggregated with the above method. As a bonus, you can define your projection as an argument to that function, so you end up with a georeferenced object. This is important because if you aggregate your data without it, you will then have to figure out by hand how to georeference the resulting matrix.
EDIT: If you want a resulting raster with the same dimensions as the original one, disaggregate the data right after aggregating it. While this seems redundant, these raster methods are very fast.
library(raster)
a=matrix(data=c(1:100),nrow=10,ncol=10)
a=raster(a)
ra <- aggregate(a, fact=5, fun=mean) #fact=5 will aggregate using a 5x5 window
ra <- disaggregate(ra, fact=5)
ra=as.matrix(ra)
ra
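The aggregate and disaggregate steps have a direct analogue on a plain matrix, which may help to see what they compute. The base R sketch below takes block means on a 120 x 300 matrix and copies each mean back to every cell of its block. The random data and the names `precip`, `row_block` and `col_block` are assumptions for illustration, and rows are taken to be latitude.

```r
set.seed(1)
nlat <- 120; nlon <- 300; block <- 10
precip <- matrix(runif(nlat * nlon), nrow = nlat, ncol = nlon)  # made-up data

# Block number (1..12 and 1..30) of every row and every column
row_block <- (seq_len(nlat) - 1) %/% block + 1
col_block <- (seq_len(nlon) - 1) %/% block + 1

# Mean of each 10 x 10 block: a 12 x 30 matrix
coarse <- tapply(precip,
                 list(row_block[row(precip)], col_block[col(precip)]),
                 mean)

# Copy each block mean back to all of its cells: 120 x 300 again
filled <- coarse[row_block, col_block]
```

This avoids writing one loop per region, but unlike the raster route it carries no georeferencing, so the coordinates of the coarse cells have to be tracked separately.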
If your grid definitions follow standard netcdf conventions, then you might be able to remap using the CDO remapping functions. For first-order conservative remapping you can try
cdo remapcon,grid_specification_here in.nc out.nc
Note that the answer given above is approximate and not quite correct, because grid cell area varies with latitude. The size of the error is likely small for this particular task as the cells are small, but nevertheless the answer will be slightly off.
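To get a feel for the size of that error, a plain block mean can be compared with an area-weighted one. The sketch below is not the CDO conservative remapping itself. It assumes a regular lon-lat grid on which cell area is roughly proportional to the cosine of the cell-centre latitude, and the latitudes and data are made up.

```r
set.seed(1)
nlat <- 20; nlon <- 20; block <- 10
lat <- seq(30.25, by = 0.5, length.out = nlat)      # made-up cell-centre latitudes
precip <- matrix(runif(nlat * nlon), nrow = nlat)   # made-up data, rows = latitude

# Approximate area weight of each cell; the same in every column
w <- matrix(cos(lat * pi / 180), nrow = nlat, ncol = nlon)

# Block membership of every cell
row_block <- (seq_len(nlat) - 1) %/% block + 1
col_block <- (seq_len(nlon) - 1) %/% block + 1
groups <- list(row_block[row(precip)], col_block[col(precip)])

# Area-weighted and unweighted block means
weighted <- tapply(precip * w, groups, sum) / tapply(w, groups, sum)
plain <- tapply(precip, groups, mean)

# Largest difference between the two approaches
max_diff <- max(abs(weighted - plain))
```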