Question
I am trying to perform a number of polygon clips using the gIntersection function with R in a loop. I can obtain the correct clips and re-enter data manually (so I can turn the resulting SpatialPolygons object back into a SpatialPolygonsDataFrame object). What I can't do is get this working in a loop with for() or apply().
At the moment this isn't a problem. I have 9 English regions (with London), so it's not a huge challenge to set each clip up manually. But I eventually want to clip LSOAs within LADs, which essentially means setting up >400 clips.
So, my question is, how do I turn my manual clips into a working loop?
Minimal Reproducible Example
To keep things simple, let's use the English regions (n = 9). For each of the 9 regions, I'm going to clip the counties. The following code loads the appropriate shapefiles and assigns them the British National Grid CRS:
require(rgdal)
require(rgeos)
# English counties shapefile (~ 10MB zipped)
download.file(
"https://dl.dropboxusercontent.com/s/6o0mi28pjo1kh9k/england-counties.zip",
"ec", method = "wget")
unzip("ec")
ec <- readOGR("england-counties", "england_ct_2011")
proj4string(ec) <- CRS("+init=epsg:27700")
# English regions (~6MB zipped)
download.file(
"https://dl.dropboxusercontent.com/s/p69m0vk2fh4xe3o/england-regions-2011.zip",
"er", method = "wget")
unzip("er")
er <- readOGR("england-regions-2011", "England_gor_2011")
proj4string(er) <- CRS("+init=epsg:27700")
You should be left with two objects, er (English regions) and ec (English counties). Both are SpatialPolygonsDataFrame objects.
Taking the first region - East of England E12000006 - let's clip the counties and turn the result back in to a SpatialPolygonsDataFrame object:
ee <- gIntersection(ec, er[er$CODE == "E12000006", ],
byid = T, drop_not_poly = T)
row.names(ee) <- as.character(gsub(" 0", "", row.names(ee)))
# gIntersection adds ' 0' to each row.name?
ee <- SpatialPolygonsDataFrame(ee, ec@data[row.names(ee), ])
A plot of ee confirms this worked:
As you can see, this is a nice workflow for just a few shapes, but I really want to loop through all regions and, ultimately, many more polygons.
What I've Tried
I'm not very good with apply() loops, so what I've tried so far is a for() loop (which I know is relatively slow, but still quicker than typing everything out!):
regions <- as.character(er$CODE) # length = 9 as expected
for(i in 1:length(regions)){
as.name(paste0(regions[i], "c")) <-
gIntersection(ec, er[er$CODE == regions[1], ], byid = T, drop_not_poly = T)
}
Rather than the expected behaviour I get the following error:
Error in as.name(paste0(regions[1], "c")) <- gIntersection(ec, er[er$CODE == :
could not find function "as.name<-"
I also tried wrapping the object name in an eval() but get the following error:
Error in eval(as.name(paste0(regions[1], "c"))) <- gIntersection(ec, er[er$CODE == :
could not find function "eval<-"
What am I missing?
In addition to the gIntersection, I would like to re-create a SpatialPolygonsDataFrame object if possible. I've tried the following code, having done one gIntersection manually, but again it doesn't work:
for(i in 1:length(regions)){
row.names(as.name(paste0(regions[i], "c"))) <- as.character(gsub(" 0", "",
row.names(as.name(paste0(regions[i], "c")))))
}
I get the following error:
Error in `rownames<-`(x, value) :
attempt to set 'rownames' on an object with no dimensions
I'm also not sure how to handle the incrementing " 0" suffix, as it increases by one for each new region (" 1", " 2", etc.).
Again, setting the first example up manually I also can't perform the final SpatialPolygonsDataFrame step:
for(i in 1:length(regions)){
as.name(regions[i]) <- SpatialPolygonsDataFrame(regions[i],
ec@data[row.names(regions[i], )])
}
For this I get the following error:
Error in stopifnot(length(Sr#polygons) == nrow(data)) :
trying to get slot "polygons" from an object of a basic class ("character") with no
slots
Where I've looked
The following SO examples are related but do not seem to help, or at least I can't see how to apply them to this example:
rgeos gIntersection in loop takes too long to clip path network
How to clip WorldMap with polygon in R?
https://gis.stackexchange.com/questions/33278/no-intersection-found-between-polygons-i-know-intersect
Thanks for taking the time to read this.
Does this help?
ee <- lapply(regions, function(x)
gIntersection(ec, er[er$CODE == x, ], byid = TRUE, drop_not_poly = TRUE))
This gives you a list of SpatialPolygons objects, one for each region, which you can access in the usual way, e.g.
ee[[1]]
plot(ee[[1]]) # to plot the first region with counties
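As a small convenience (using the regions vector from the question), you can also name the list by region code so elements can be looked up directly:
names(ee) <- regions
plot(ee[["E12000006"]]) # East of England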
Edit
Your original code should work with a slight modification (see below).
res <- list()
for (i in 1:length(regions)) {
ee <- gIntersection(ec, er[er$CODE == regions[i], ],
byid = TRUE, drop_not_poly = TRUE)
row.names(ee) <- as.character(gsub(paste0(" ", i-1), "", row.names(ee)))
ee <- SpatialPolygonsDataFrame(ee, ec@data[row.names(ee), ])
res[[i]] <- ee
}
If that solves the problem, then the issue was that the suffix gIntersection appends to the row names increments by one for each region, and your code did not account for this.
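If you'd rather not track the incrementing suffix at all, an alternative (assuming the byid row names always have the form "countyID regionIndex", separated by a single space) is to keep only the part before the space:
# inside the loop, instead of the gsub() line:
row.names(ee) <- sapply(strsplit(row.names(ee), " "), `[`, 1)
This works unchanged for any number of regions.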
Related
/EDIT: My problem has been resolved. It turns out it was indeed an error unrelated to parallel.
Hi StackOverflow Community,
today I have a rather odd problem. I have a function which I run on 8 cores, and it has a parameter called type. Running it with type = "points" works perfectly, but the trouble starts when running it with type = "lines". Even though it follows almost the same code, at some point the script crashes, and all I get is (after almost 2 days of running!):
Warning message:
In mclapply(cluster_times, the_nt_function, all_trips, ellipses, :
all scheduled cores encountered errors in user code
Error: unexpected symbol in:
"
endtime"
Execution halted
The first thing I checked is, of course, the code of the_nt_function. I provide a summary of the code below.
The important things here, are:
The code for type = "points" runs perfectly and gives the expected output
The code for type = "lines" seems to run perfectly for nearly 2 days until the error stated above appears
The error stated above is inexplicable: there is no 'endtime' anywhere in the code below!
the_nt_function takes some inputs, and for each row in times_df it creates a kind of sample of dat. If type="points", it then counts either how many (geographical) points in dat are in the ell object, using st_intersection from sf. If type="lines", it basically does the same, but instead of counting points intersecting with the ell object, it counts lines intersecting with the ell object.
I let the code run on a high performance cluster (but only using 8 cores). On my computer, it runs fine on 7 cores, at least for a test set of seven entries in the times_df object. It does not crash but gives the expected output.
I believe that the error is not in the_nt_function. I've googled it and looked for similar problems here on stackoverflow, but the only thing I found so far is this (which I am now trying out of desperation).
Do you have any ideas, what this error wants to tell me?
/Edit: I call the function in parallel by:
system.time(
out <- mclapply(cluster_times, the_nt_function, all_trips, ellipses, TRUE, opts$type, mc.cores = cpus)
)
where I give the arguments in correct order.
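In case it helps others with the same symptom: mclapply suppresses errors raised inside workers, so one way to surface the real error is to wrap the function in tryCatch. This is just a sketch; safe_nt is a hypothetical wrapper, the other names come from my call above.
safe_nt <- function(...) {
  tryCatch(the_nt_function(...), error = function(e) conditionMessage(e))
}
out <- mclapply(cluster_times, safe_nt, all_trips, ellipses, TRUE,
                opts$type, mc.cores = cpus)
# elements that come back as bare character strings hold the error message
failed <- vapply(out, is.character, logical(1))
out[failed]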
the_nt_function <- function(times_df, dat, sf_object, type) {
times_df$N_t = as.numeric(NA)
# Here is code that performs some kind of preprocessing and filtering down data, basically creating
# the data.table dat_filt_spc
for (i in seq_len(nrow(times_df))) { # iterate over the rows of times_df
# do some more filtering and end up with a data table called dat_filt_time
if(type == "points"){
# Convert filtered DT to sf object (using startloc)
suppressWarnings(
dat_start_sf <- st_as_sf(
as.data.frame(dat_filt_time),
coords = c("startloclon", "startloclat"),
crs = 4326
)
)
} else if(type == "lines") {
suppressWarnings(
dat_start_sf <- st_as_sf(
as.data.frame(dat_filt_time),
crs = 4326
)
)
}
# Intersect startlocs with ellipse
suppressMessages(
start_intersect <- st_intersection(dat_start_sf, ell)$tripid
)
# Convert filtered DT to sf object (using endloc) and filter out trips which are already intersected
if(type == "points"){
suppressWarnings(
dat_end_sf <- st_as_sf(
as.data.frame(dat_filt_time[!tripid %in% start_intersect]),
coords = c("endloclon", "endloclat"),
crs = 4326
)
)
# Intersect endlocs with ellipse
suppressMessages(
end_intersect <- st_intersection(dat_end_sf, ell)$tripid
)
# concatenate start and endloc intersections, assess Nt and add to times_df
trips_intersect <- unique(c(start_intersect, end_intersect))
} else if(type == "lines"){
trips_intersect <- start_intersect
}
times_df[i, "N_t"] <- length(trips_intersect)
}
return(times_df)
}
I have several rasters, 343 to be exact, from Cropscape. I need to get the locations (centroids) and area measurements of pixels that represent potatoes and tomatoes based on the associated values in the rasters. The pixel values are 43 and 54, respectively. Cropscape provides rasters separated by year and state, except for 2016, which has the lower 48 states combined. The rasters are saved as GeoTiffs on a Google Drive and I am using Google File Stream to connect to the rasters locally.
I want to create a SpatialPointsDataFrame from the centroids of each pixel or group of adjacent pixels for tomatoes and potatoes in all the rasters. Right now, my code will
Subset the rasters to potatoes and tomatoes
Change the raster subsets to polygons, one for potatoes and one for tomatoes
Create centroids from each polygon
Create a SpatialPointsDataFrame based on the centroids
Extract the area measurement for each area of interest with SpatialPointsDataFrame
Write the raster subsets and each polygon to a file.
Code:
library(raster)
library(rgdal)
library(rgeos)
dat_dir2 = getwd()
mepg <- make_EPSG()
ae_pr <- mepg[mepg$code == "5070", "prj4"]
# Toy raster list for use with code
# I use `list.files()` with the directories that hold
# the rasters and use list that is generated from
# that to read in the files to raster. My list is called
# "tiflist". Not used in the code, but mentioned later.
rmk1 <- function(x, ...) {
r1 = raster(ncol = 1000, nrow = 1000)
r1[] = sample(1:60, 1000000, replace = T)
proj4string(r1) = CRS(ae_pr)
return(r1)
}
rlis <- lapply(1:5, rmk1)
#Pixel values needed
ptto <- c(43, 54)
# My function to go through rasters for locations and area measurements.
# This code is somewhat edited to work with the demo raster list.
# It produces the same output as what I wanted, but with the demo list.
pottom <- function(x, ...) {
# Next line is not necessary with the raster list created above.
# temras = raster(x)
now = format(Sys.time(), "%b%d%H%M%S")
nwnm = paste0(names(x), now)
rasmatx = match(x = x, table = ptto)
writeRaster(rasmatx, file.path( dat_dir2, paste0(nwnm,"ras")), format = "GTiff")
tempol = rasterToPolygons(rasmatx, fun = function(x) { x > 0 & x < 4}, dissolve = T)
tempol2 = disaggregate(tempol)
# for potatoes
tempol2p = tempol2[tempol2$layer == '1',]
if (nrow(tempol2p) > 0) {
temcenp = gCentroid(tempol2p, byid = T)
temcenpdf = SpatialPointsDataFrame(temcenp, data.frame(ID = 1:length(temcenp) , temcenp))
temcenpdf$pot_p = extract(rasmatx, temcenpdf)
temcenpdf$areap_m = gArea(tempol2p, byid = T)
# writeOGR(temcenpdf, dsn=file.path(dat_dir2), paste0(nwnm, "p"), driver = "ESRI Shapefile")
}
# for tomatoes
tempol2t = tempol2[tempol2$layer == '2',]
if (nrow(tempol2t) > 0) {
temcent = gCentroid(tempol2t, byid = T)
temcentdf = SpatialPointsDataFrame(temcent, data.frame(ID = 1:length(temcent) , temcent))
temcentdf$tom_t = extract(rasmatx, temcentdf)
temcentdf$areat_m = gArea(tempol2t, byid = T)
writeOGR(temcentdf, dsn=file.path(dat_dir2), paste0(nwnm,"t"), driver = "ESRI Shapefile")
}
}
lapply(rlis, pottom)
I know I should provide some toy data and I created some, but I don't know if they exactly recreate my problem, which follows.
Besides my wonky code, which seems to work, I have a bigger problem: a lot of memory is used when this code runs. Processing tiflist only gets through the first 4 files before my laptop's 16 GB of RAM is completely consumed. I'm pretty sure it's the connections to the Google Drive, since the cache for the drive stream is at least 8 GB. I guess each raster is staying open after being connected to in the Google Drive? I don't know how to confirm that.
I think I need to get the function to clear out all of the objects that are created, e.g. temras, rasmatx, tempol, etc., after processing each raster, but I'm not sure how to do that. I did try adding rm(temras ...) to the end of the function, but when I did that, there was no output at all from the function after 10 minutes and by then, I've usually got the first 3 rasters processed.
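For what it's worth, a sketch of what that cleanup could look like at the end of the function body (the object names match pottom() above; removeTmpFiles() deletes the temporary files the raster package writes to disk):
# at the end of pottom(), before it returns
rm(rasmatx, tempol, tempol2)
gc()                      # release memory back to the OS
removeTmpFiles(h = 0)     # remove raster's on-disk temp files immediately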
27/Oct EDIT after comments from RobertHijmans. It seems that the states with large geographic extents are causing problems with rasterToPolygons(). I edited the code from the way it works for me locally to work with the demo data I included, since RobertHijmans pointed out it wasn't functional. So I hope this is now reproducible.
I feel silly answering my own question, but here it is: the rasterToPolygons function is notoriously slow. I was unaware of this issue; in one of my attempts I waited 30 minutes before killing the process with no result. It works under the conditions I require for rasters for Alabama and Arkansas, for example, but not for California.
A submitted solution, which I am in the process of testing, comes from this GitHub repo. The test is ongoing at 12 minutes, so I don't know if it works for an object as large as California. I don't want to copy and paste someone else's code in an answer to my own question.
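For anyone finding this later, a commonly suggested faster route (not the repo's code, and untested here on a California-sized raster) is the polygonizer in the terra package, sketched below. The file name is hypothetical; 43 and 54 are the potato and tomato pixel values from above.
library(terra)
r  <- rast("cropscape_state_year.tif")                    # hypothetical file
rc <- classify(r, cbind(c(43, 54), c(1, 2)), others = NA) # 43 -> 1, 54 -> 2
p  <- disagg(as.polygons(rc)) # polygons per value, split into single patches
cen <- centroids(p)           # centroid points
p$area_m <- expanse(p)        # patch areas in square metres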
One of the comments suggested using profvis, but I couldn't really make sense of the output, and it hung along with the process too.
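For reference, a minimal profvis invocation on one raster from the toy list would be:
library(profvis)
profvis(pottom(rlis[[1]])) # opens an interactive flame graph in RStudio/browser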
I have a dataset that contains lines and I have imported them into R. I want to take a close look at their coordinates and determine, for each line, whether its first and last coordinates are identical (i.e., whether the line could close into a polygon). To do this, I am using slot(), which lets me inspect the details of the object in question.
My final goal is to count the lines whose first and last point coordinates are identical, in order to discover the number of potential polygons in my data.
To recap, my difficulty boils down to the following question:
How many of the line objects have the potential to be a polygon?
To answer it, I have taken several steps:
First, I read my data into R.
Second, I used slot() to look closely at the coordinates of each line (a sequence of points represents a line object).
Third, I tried to count the identical coordinates, but I ran into an error saying object 'crds' not found.
The code follows:
library(maptools)
#Read data directly from National Geophysical Data Center (NGDC) coastline
#extractor.
shorelinedat <- "http://www.asdar-book.org/RC1/datasets/auckland_mapgen.dat"
download.file(shorelinedat, "auckland_mapgen.dat")
#Assign CRS
llCRS <- CRS("+proj=longlat +ellps=WGS84")
#Read data from mapgen into a SpatialLines object.
auck_shore <- MapGen2SL("auckland_mapgen.dat", llCRS)
#Required code to identify the lines.
lns <- slot(auck_shore, "lines")
table(sapply(lns, function(x) length(slot(x, "Lines"))))
Here is the code in which I faced with the error
#identifying the number of identical coordinates
islands_auck <- sapply(lns, function(x) {
+ crds <- slot(slot(x, "Lines")[[1]], "coords")
+ identical(crds[1, ], crds[nrow(crds), ])
+ })
This is the error
Error in +crds <- slot(slot(x, "Lines")[[1]], "coords") :
object 'crds' not found
I would appreciate if anyone can give a hint.
This is just a guess, but are those +s actually in your code? If I try to assign to a variable with a + before it (as you do in the code example you posted), I get the same error you got:
+ crds <- 5
Error in +crds <- 5 : object 'crds' not found
When you run a multi-line block of code, R prints + in the console to mark continuation lines, but that is just a visual effect; the + characters are not part of the code and are illegal if typed into your script.
The problem was the syntax in the final block of code, which I have corrected below:
islands_auck <- sapply(lns, function(x) {
crds <- slot(slot(x, "Lines")[[1]], "coords")
identical(crds[1, ], crds[nrow(crds), ])
})
table(islands_auck)
so the final result will be
islands_auck
FALSE TRUE
16 64
16 lines whose first and last coordinates differ (open lines, not polygons)
64 lines whose first and last coordinates are identical (potential closed polygons)
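As a possible next step towards the polygon goal, the closed lines can be promoted to sp polygons. A sketch using lns, islands_auck and llCRS from above:
closed <- lns[islands_auck]
polys <- lapply(seq_along(closed), function(i) {
  crds <- slot(slot(closed[[i]], "Lines")[[1]], "coords")
  Polygons(list(Polygon(crds)), ID = as.character(i))
})
islands_sp <- SpatialPolygons(polys, proj4string = llCRS)
plot(islands_sp)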
I have some large shapefiles with multiple millions of polygons that I need to dissolve. Depending on the shapefile, I need to either dissolve by group or just use st_union on everything. I have been using the st_par function and it has been working great for most sf applications. However, when I use it with st_union, it returns a list and I cannot figure out how to parallelize the sf dissolve function st_union.
Any suggestions would be most helpful! Here is a small code snippet to illustrate my point.
library(sf)
library(dplyr) # for filter() and %>% used below
library(assertthat)
library(parallel)
us_shp <- "data/cb_2016_us_state_20m/cb_2016_us_state_20m.shp"
if (!file.exists(us_shp)) {
loc <- "https://www2.census.gov/geo/tiger/GENZ2016/shp/cb_2016_us_state_20m.zip"
dest <- paste0("data/cb_2016_us_state_20m", ".zip")
download.file(loc, dest)
unzip(dest, exdir = "data/cb_2016_us_state_20m")
unlink(dest)
assert_that(file.exists(us_shp))
}
usa <- st_read("data/cb_2016_us_state_20m/cb_2016_us_state_20m.shp", quiet= TRUE) %>%
filter(!(STUSPS %in% c("AK", "HI", "PR")))
test <- usa %>%
st_par(., st_union, n_cores = 2)
I think you can solve your specific problem with a small modification of the original st_par function.
However, this is just a quick and dirty fix, and it might break the code for other uses of the function.
The author of the function could certainly provide a better fix...
library(parallel)
library(magrittr) # provides the %>% pipe used below
# Paralise any simple features analysis.
st_par <- function(sf_df, sf_func, n_cores, ...){
# Create a vector to split the data set up by.
split_vector <- rep(1:n_cores, each = nrow(sf_df) / n_cores, length.out = nrow(sf_df))
# Perform GIS analysis
split_results <- split(sf_df, split_vector) %>%
mclapply(function(x) sf_func(x), mc.cores = n_cores)
# Combine results back together. Method of combining depends on the output from the function.
if ( length(class(split_results[[1]]))>1 | class(split_results[[1]])[1] == 'list' ){
result <- do.call("c", split_results)
names(result) <- NULL
} else {
result <- do.call("rbind", split_results)
}
# Return result
return(result)
}
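Note that with this fix each chunk is unioned independently, so one final st_union() is still needed to dissolve across chunk boundaries. Usage with the usa object from the question might look like:
test <- st_par(usa, st_union, n_cores = 2) # union within each chunk
usa_dissolved <- st_union(test)            # final union across chunks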
I was trying to use this for st_join and was running into problems with the returned data type. In looking at the result more closely it became evident that the split_results was just a list of sf objects. I ended up modifying the code to use dplyr::bind_rows() to get what I wanted.
There probably needs to be some more logic around the "combine" to deal with different return types but this works for the st_join function.
# Parallelise any simple features analysis.
st_par <- function(sf_df, sf_func, n_cores, ...) {
# Create a vector to split the data set up by.
split_vector <- rep(1:n_cores, each = nrow(sf_df) / n_cores, length.out = nrow(sf_df))
# Perform GIS analysis
split_results <- split(sf_df, split_vector) %>%
mclapply(function(x) sf_func(x, ...), mc.cores = n_cores)
# Combine results back together. Method of combining probably depends on the
# output from the function. For st_join it is a list of sf objects. This
# satisfies my needs for reverse geocoding
result <- dplyr::bind_rows(split_results)
# Return result
return(result)
}
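For example, a reverse-geocoding style call could look like the sketch below, where points and polys are hypothetical sf objects; the ... forwards y = polys to st_join inside each worker.
# hypothetical usage: spatially join point chunks to polygons in parallel
joined <- st_par(points, st_join, n_cores = 4, y = polys)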
I've been trying to find a time-efficient way to merge multiple raster images in R. These are adjacent ASTER scenes from the southern Kilimanjaro region, and my target is to put them together to obtain one large image.
This is what I got so far (object 'ast14dmo.sd' representing a list of RasterLayer objects):
# Loop through single ASTER scenes
for (i in seq(ast14dmo.sd)) {
if (i == 1) {
# Merge current with subsequent scene
ast14dmo.sd.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
} else if (i > 1 && i < length(ast14dmo.sd)) {
tmp.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
ast14dmo.sd.mrg <- merge(ast14dmo.sd.mrg, tmp.mrg, tolerance = 1)
} else {
# Save merged image
writeRaster(ast14dmo.sd.mrg, paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg", sep = ""), format = "GTiff", overwrite = TRUE)
}
}
As you surely guess, the code works. However, merging takes quite long considering that each raster object is some 70 MB. I also tried Reduce and do.call, but those failed since I couldn't pass the argument 'tolerance', which works around the differing origins of the raster files.
Anybody got an idea of how to speed things up?
You can use do.call
ast14dmo.sd$tolerance <- 1
ast14dmo.sd$filename <- paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg.tif", sep = "")
ast14dmo.sd$overwrite <- TRUE
mm <- do.call(merge, ast14dmo.sd)
Here it is with some data, from the example in raster::merge:
r1 <- raster(xmx=-150, ymn=60, ncols=30, nrows=30)
r1[] <- 1:ncell(r1)
r2 <- raster(xmn=-100, xmx=-50, ymx=50, ymn=30)
res(r2) <- c(xres(r1), yres(r1))
r2[] <- 1:ncell(r2)
x <- list(r1, r2)
names(x) <- c("x", "y")
x$filename <- 'test.tif'
x$overwrite <- TRUE
m <- do.call(merge, x)
The 'merge' function from the raster package is a little slow. For large projects a faster option is to work with gdal commands in R.
library(gdalUtils)
library(rgdal)
Build a list of all the raster files you want to join (in your current working directory).
all_my_rasts <- c('r1.tif', 'r2.tif', 'r3.tif')
Make a template raster file to build onto. Think of this as a big blank canvas to add tiles to.
e <- extent(-131, -124, 49, 53)
template <- raster(e)
projection(template) <- '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
writeRaster(template, file="MyBigNastyRasty.tif", format="GTiff")
Merge all raster tiles into one big raster.
mosaic_rasters(gdalfile=all_my_rasts,dst_dataset="MyBigNastyRasty.tif",of="GTiff")
gdalinfo("MyBigNastyRasty.tif")
This should work pretty well for speed (faster than merge in the raster package), but if you have thousands of tiles you might even want to look into building a vrt first.
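If you do go the VRT route, a sketch with gdalUtils (reusing all_my_rasts from above) could be:
gdalbuildvrt(gdalfile = all_my_rasts, output.vrt = "mosaic.vrt")
gdal_translate(src_dataset = "mosaic.vrt",
               dst_dataset = "MyBigNastyRasty.tif", of = "GTiff")
The VRT is just a small XML index of the tiles, so building it is nearly instant; the expensive write happens once in gdal_translate.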
You can use Reduce like this for example :
Reduce(function(...) merge(..., tolerance = 1), ast14dmo.sd)
SAGA GIS mosaicking tool (http://www.saga-gis.org/saga_tool_doc/7.3.0/grid_tools_3.html) gives you maximum flexibility for merging numeric layers, and it runs in parallel by default! You only have to translate all rasters/images to SAGA .sgrd format first, then run the command line saga_cmd.
I have tested the solution using gdalUtils as proposed by Matthew Bayly. It works quite well and fast (I have about 1000 images to merge). However, after checking the documentation of the mosaic_rasters function, I found that it works without making a template raster before mosaicking the images. I pasted the example code from the documentation below:
outdir <- tempdir()
gdal_setInstallation()
valid_install <- !is.null(getOption("gdalUtils_gdalPath"))
if(require(raster) && require(rgdal) && valid_install)
{
layer1 <- system.file("external/tahoe_lidar_bareearth.tif", package="gdalUtils")
layer2 <- system.file("external/tahoe_lidar_highesthit.tif", package="gdalUtils")
mosaic_rasters(gdalfile=c(layer1,layer2),dst_dataset=file.path(outdir,"test_mosaic.envi"),
separate=TRUE,of="ENVI",verbose=TRUE)
gdalinfo("test_mosaic.envi")
}
I was faced with this same problem and used the following:
#Read desired files into R
data_name1<-'file_name1.tif'
r1=raster(data_name1)
data_name2<-'file_name2.tif'
r2=raster(data_name2)
#Merge files
new_data <- raster::merge(r1, r2)
Although it did not produce a new merged raster file on disk, the merged raster was stored in the R environment and produced a merged map when plotted.
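If you also want the merged result on disk, a writeRaster() call will do it:
writeRaster(new_data, filename = "merged_raster.tif", format = "GTiff",
            overwrite = TRUE)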
I ran into the following problem when trying to mosaic several rasters on top of each other:
In vv[is.na(vv)] <- getValues(x[[i]])[is.na(vv)] :
number of items to replace is not a multiple of replacement length
As @Robert Hijmans pointed out, it was likely because of misaligned rasters. To work around this, I had to resample the rasters first
library(raster)
x <- raster("Base_raster.tif")
r1 <- raster("Top1_raster.tif")
r2 <- raster("Top2_raster.tif")
# Resample
x1 <- resample(r1, crop(x, r1))
x2 <- resample(r2, crop(x, r2))
# Merge rasters. Make sure to use the right order
m <- merge(merge(x1, x2), x)
# Write output
writeRaster(m,
filename = file.path("Mosaic_raster.tif"),
format = "GTiff",
overwrite = TRUE)