I am trying to loop several functions from the 'raster' package, namely, crop(), mask(), reclassify() and unstack()/as.list(). I have ten raster layers that share the same extent and data type; they correspond to land cover over 10 time-points. I want to create individual list variables for each output of crop() -> mask() -> reclassify() -> as.list(). I was able to pipe the process for 1 polygon feature, but I need to be able to loop it for each of the 10 polygon features stored in the multipolygon Shapefile, such that I can save each output list according to a specified naming convention.
Thank you and please advise. I share my code below.
EDIT: I am wondering if a for-loop is the right way to go about this, or would an lapply approach be better?
# Load libraries
library(raster) # for raster processing
library(rgdal) # for raster/vector processing
library(sf) # for Shapefile processing
library(magrittr) # for the %>% pipe used below
# Stack 10 rasters together
raster.stack = stack(
raster("path/raster1.tif"),
raster("path/raster2.tif"),
raster("path/raster3.tif"),
raster("path/raster4.tif"),
raster("path/raster5.tif"),
raster("path/raster6.tif"),
raster("path/raster7.tif"),
raster("path/raster8.tif"),
raster("path/raster9.tif"),
raster("path/raster10.tif")
)
# Prepare reclassification codes from 9-class raster to 3-class raster
reclasscodes = c(
0,0, # no data
1,1,
2,1,
3,1,
4,1,
5,2,
6,2,
7,3,
8,3,
9,3
)
# Convert reclass codes vector into n x 2 matrix
reclassmatrix = matrix(reclasscodes, ncol=2, byrow = TRUE)
# Load multipolygon vector Shapefile
multipolygon = shapefile("path/multipolygon.shp") # Shapefile is made of n polygons
# Example subset Shapefile to polygon_1 using attribute "ID"
polygon_1 = subset(multipolygon,ID=="D-4")
# Create output for polygon_1
list_polygon_1 =
raster.stack %>%
crop(y = polygon_1) %>% # crop to bounds
mask(mask = polygon_1) %>% # mask to polygon cutline
reclassify(rcl = reclassmatrix) %>% # reclassify to 3-class
as.list() # functions the same as unstack() where raster brick is converted to list of raster layers
# I use %>% because I do not want to save any of the intermediate outputs.
# Resultant output is a variable list for polygon_1 named 'list_polygon_1' which is exactly what I want.
# Worked perfectly.
# How do I repeat this process for polygon_1 to polygon n?
# My attempt
for (i in 1:nrow(multipolygon)) {
raster.stack %>%
crop(y = multipolygon[i,]) %>%
mask(mask = multipolygon[i,]) %>%
reclassify(rcl = reclassmatrix) %>%
as.list() %>% # up till here it is the same steps as before for polygon_1
# now I want to save each list output as a separate variable according to i, e.g. list_polygon_2, list_polygon_3 etc.
assign(paste(multipolygon$ID, i, sep = '_')) # assign a naming convention for each output variable
}
# Does not work. Even without the last line of code "..assign(paste(...))" there is no output variable from the as.list() line.
Here is a minimal self-contained reproducible example.
Example data
library(raster)
s <- stack(system.file("external/rlogo.grd", package="raster"))
xy1 <- xy2 <- xy3 <- matrix(c(10,17, 6,10,71,60,62,71), ncol=2)
xy2[,1] <- xy2[,1] + 30
xy3[,2] <- xy3[,2] - 30
p <- spPolygons(xy1, xy2, xy3)
#plot(s, 1)
#lines(p)
What you are after
rm = matrix(c(0,100,0,100,150,2,150,255,3), ncol=3, byrow=TRUE)
out <- list()
for (i in 1:length(p)) {
x <- crop(s, p[i,])
x <- mask(x, p[i,])
out[[i]] <- reclassify(x, rm)
}
What you are saying about unstack does not make sense (and unlist does not work). I would advise against it, but you could do
out2 <- lapply(out, unstack)
I am not sure what you are really after. If you want the cell values you can make it much simpler (no need for a loop) and do
r <- reclassify(s, rm)
e <- extract(r, p)
To your question about lapply vs a loop. In terms of performance that rarely matters. lapply can be concise, but in cases like this, writing a loop is better as it is easier to read, and write, especially if you do not use %>%.
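For comparison, the same loop can be written with lapply so that the result comes back as one named list rather than as separate variables created with assign(). This is only a sketch on the example data above; the element names list_polygon_1, list_polygon_2, ... follow the naming convention from the question, and rcl is simply the reclassification matrix under a name that does not shadow base::rm.

```r
library(raster)

# Example data, as in the reproducible example above
s <- stack(system.file("external/rlogo.grd", package = "raster"))
xy1 <- xy2 <- xy3 <- matrix(c(10, 17, 6, 10, 71, 60, 62, 71), ncol = 2)
xy2[, 1] <- xy2[, 1] + 30
xy3[, 2] <- xy3[, 2] - 30
p <- spPolygons(xy1, xy2, xy3)

# Reclassification matrix with columns: from, to, becomes
rcl <- matrix(c(0, 100, 0, 100, 150, 2, 150, 255, 3), ncol = 3, byrow = TRUE)

# One list element per polygon: crop -> mask -> reclassify
out <- lapply(seq_len(length(p)), function(i) {
  x <- crop(s, p[i, ])
  x <- mask(x, p[i, ])
  reclassify(x, rcl)
})

# Name the elements instead of creating one variable per polygon
names(out) <- paste0("list_polygon_", seq_len(length(p)))
```

A single polygon's result is then available as out[["list_polygon_2"]], and the whole collection can be passed on to further lapply calls.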
Basically I have 12 multispectral images, and I want to mask them using 2 polygons (small waterbodies). The 2 polygons are in one shapefile, but I can break them up if it would make the process easier. With the help of some nice users on here, I tested this all out using the 12 images on one polygon and it works just fine, but I'll eventually need to do this for multiple polygons so I want to adapt my code.
The loop to crop all rasters using a single polygon:
#The single polygon
mask <- st_read(here::here("data", "mask.shp")) %>%
st_as_sf()
#Creates list of input files and their paths
crop_in <- list.files(here::here("data", "s2_rasters"), pattern="tif$", full.names=TRUE)
#Creates list of output files and their directory.
crop_out <- gsub(here::here("data", "s2_rasters"), here::here("data", "s2_cropped"), crop_in)
for (i in seq_along(crop_in)) {
b <- brick(crop_in[i])
crop(b, mask, filename = crop_out[i])
}
Like I said this works just fine, but I want to mask instead of crop. Additionally, I need to mask using multiple polygons.
My working loop to do the same thing but for multiple (2) polygons:
masks_2 <- st_read(here::here("data", "multiple_masks.shp")) %>%
st_as_sf()
for (i in seq_along(crop_in)) {
b <- brick(crop_in[i])
mask(b, masks_2, filename = crop_out[i], overwrite = TRUE)
}
This took around 2 hours (which makes me suspicious) and I think it lost the polygon id somewhere along the way. When I tried plotting the results the plot was empty. My final output should be 24 rasterstacks, 12 for each polygon. I will need to do further image analysis so I will need to keep the names. I hope this makes sense and thank you!
Here is a minimal, self-contained, reproducible example using terra because it is much faster than raster (make sure you are using the current version)
Raster dataset with 12 layers
library(terra)
f <- system.file("ex/elev.tif", package="terra")
r <- rast(f)
r <- rep(r, 12) * 1:12
names(r) <- paste0("band", 1:12)
Two "lakes"
v <- vect(system.file("ex/lux.shp", package="terra"))
v <- v[c(1,12)]
Solution:
x <- mask(r, v)
And always try things for a single case before running the loop.
So if you have 12 files, you can do something like
inf <- list.files("data/s2_rasters", pattern="tif$", full.names=TRUE)
outf <- gsub("\\.tif$", "_masked.tif", inf)
for (i in seq_along(inf)) {
r <- rast(inf[i])
m <- mask(r, v, filename=outf[i])
}
It might be a little faster to instead do this (only rasterize the polygons once)
msk <- rast(inf[1])
msk <- rasterize(v, msk)
for (i in seq_along(inf)) {
r <- rast(inf[i])
m <- mask(r, msk, filename=outf[i])
}
Or make one object/file, if that is practical.
rr <- rast(inf)
mm <- mask(rr, v)
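If you need one separate output per polygon (as in the question, where the polygon identity must be kept), a possible sketch is to loop over the polygons and crop and mask for each one. This uses the example data from above; using the NAME_2 attribute of the example polygons as list names is an assumption made for illustration, so substitute your own ID field.

```r
library(terra)

# Example raster with 12 layers, as above
r <- rast(system.file("ex/elev.tif", package = "terra"))
r <- rep(r, 12) * 1:12
names(r) <- paste0("band", 1:12)

# Two example polygons, as above
v <- vect(system.file("ex/lux.shp", package = "terra"))
v <- v[c(1, 12)]

# One 12-layer SpatRaster per polygon, named by a polygon attribute
out <- vector("list", nrow(v))
names(out) <- v$NAME_2
for (j in seq_len(nrow(v))) {
  x <- crop(r, v[j])         # reduce to the bounding box of polygon j
  out[[j]] <- mask(x, v[j])  # set cells outside polygon j to NA
}
```

Cropping first keeps each output small, which can matter when the polygons cover only a small part of the image.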
Summary: Despite a complicated lead-up, the solution was very simple: In order to plot a row of a dataframe as a line instead of a lattice, I needed to transpose the data in order to invert from x obs of y variables to y obs of x variables.
I am using RStudio on a Windows 10 computer.
I am using scientific equipment to write measurements to a csv file. Then I ZIP several files and read to R using read.csv. However, the data frame behaves strangely. Commands "length" and "dim" disagree and the "plot" function throws errors. Because I can create simulated data that doesn't throw the errors, I think the problem is either in how the machine wrote the data or in my loading and processing of the data.
Two ZIP files are located in my stackoverflow repository (with "Monterey Jack" in the name):
https://github.com/baprisbrey/stackoverflow
Here is my code for reading and processing them:
library(stringr) # for str_remove()
library(magrittr) # for the %>% pipe
# Unzip the folders
unZIP <- function(folder){
orig.directory <- getwd()
setwd(folder)
zipped.folders <- list.files(pattern = ".*zip")
for (i in zipped.folders){
unzip(i)}
setwd(orig.directory)
}
folder <- "C:/Users/user/Documents/StackOverflow"
unZIP(folder)
# Load the data into a list of lists
pullData <- function(folder){
orig.directory <- getwd()
setwd(folder)
#zipped.folders <- list.files(pattern = ".*zip")
#unzipped.folders <- list.files(folder)[!(list.files(folder) %in% zipped.folders)]
unzipped.folders <- list.dirs(folder)[-1] # Removing itself as the first directory.
oData <- vector(mode = "list", length = length(unzipped.folders))
names(oData) <- str_remove(unzipped.folders, paste(folder,"/",sep=""))
for (i in unzipped.folders) {
filenames <- list.files(i, pattern = "*.csv")
#setwd(paste(folder, i, sep="/"))
setwd(i)
files <- lapply(filenames, read.csv, skip = 5, header = TRUE, fileEncoding = "UTF-16LE") #Note unusual encoding
oData[[str_remove(i, paste(folder,"/",sep=""))]] <- vector(mode="list", length = length(files))
oData[[str_remove(i, paste(folder,"/",sep=""))]] <- files
}
setwd(orig.directory)
return(oData)
}
theData <- pullData(folder) #Load the data into a list of lists
# Process the data into frames
bigFrame <- function(bigList) {
#where bigList is theData is the result of pullData
#initialize the holding list of frames per set
preList <- vector(mode="list", length = length(bigList))
names(preList) <- names(bigList)
# process the data
for (i in 1:length(bigList)){
step1 <- lapply(bigList[[i]], t) # transpose each data
step2 <- do.call(rbind, step1) # roll it up into its own matrix #original error that wasn't reproduced: It showed length(step2) = 24048 when i = 1 and dim(step2) = 48 501. Any comments on why?
firstRow <- step2[1,] #holding onto the first row to become the names
step3 <- as.data.frame(step2) # turn it into a frame
step4 <- step3[grepl("µA", rownames(step3)),] # Get rid of all those excess name rows
rownames(step4) <- 1:(nrow(step4)) # change the row names to rowID's
colnames(step4) <- firstRow # change the column names to the first row steps
step4$ID <- rep(names(bigList[i]),nrow(step4)) # Add an I.D. column
step4$Class[grepl("pos",tolower(step4$ID))] <- "Yes" # Add "Yes" class
step4$Class[grepl("neg",tolower(step4$ID))] <- "No" # Add "No" class
preList[[i]] <- step4
}
# bigFrame <- do.call(rbind, preList) #Failed due to different number of measurements (rows that become columns) across all the data sets
# return(bigFrame)
return(preList) # Works!
}
frameList <- bigFrame(theData)
monterey <- rbind(frameList[[1]],frameList[[2]])
# Odd behaviors
dim(monterey) #48 503
length(monterey) #503 #This is not reproducing my original error of length = 24048
rowOne <- monterey[1,1:(ncol(monterey)-2)]
plot(rowOne) #Error in plot.new() : figure margins too large
#describe the data
quantile(rowOne, seq(0, 1, length.out = 11) )
quantile(rowOne, seq(0, 1, length.out = 11) ) %>% plot #produces undesired lattice plot
# simulate the data
doppelganger <- sample(1:20461,501,replace = TRUE)
names(doppelganger) <- names(rowOne)
# describe the data
plot(doppelganger) #Successful scatterplot. (With my non-random data, I want a line where the numbers in colnames are along the x-axis)
quantile(doppelganger, seq(0, 1, length.out = 11) ) #the random distribution is mildly different
quantile(doppelganger, seq(0, 1, length.out = 11) ) %>% plot # a simple line of dots as desired
# investigating structure
str(rowOne) # results in a dataframe of 1 observation of 501 variables. This is a correct interpretation.
str(as.data.frame(doppelganger)) # results in 501 observations of 1 variable. This is not a correct interpretation but creates the plot that I want.
How do I convert the rowOne to plot like doppelganger?
It looks like one of my errors is not reproducing, where calls to "dim" and "length" apparently disagree.
However, I'm confused as to why the "plot" function is producing a lattice plot on my processed data and a line of dots on my simulated data.
What I would like is to plot each row of data as a line. (Next, and out of the scope of this question, is I would like to classify the data with adaboost. My concern is that if "plot" behaves strangely then the classifier won't work.)
Any tips or suggestions or explanations or advice would be greatly appreciated.
Edit: Investigating the structure with ("str") of the two examples explains the difference between plots. I guess my modified question is, how do I switch between the two structures to enable plotting a line (like doppelganger) instead of a lattice (like rowOne)?
I am answering my own question.
I am leaving behind the part about the discrepancy between "length" and "dim" since I can't provide a reproducible example. However, I'm happy to leave up for comment.
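For what it is worth, one possible explanation for that discrepancy, assuming step2 was still a matrix when length() was called: length() of a matrix counts all of its elements, while length() of a data frame counts its columns. A small sketch with placeholder values of the same shape:

```r
# A 48 x 501 matrix, the same shape as the reported dim(step2)
m <- matrix(0, nrow = 48, ncol = 501)
dim(m)      # 48 501
length(m)   # 24048, i.e. 48 * 501 elements

# The same values as a data frame
df <- as.data.frame(m)
dim(df)     # 48 501
length(df)  # 501, i.e. the number of columns
```

This would match the numbers in the question, since 48 * 501 = 24048, and it would also explain why the later call on the data frame returned 503.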
The answer is that in order to produce my plot, I simply have to transpose the row as follows:
rowOne %>% t() %>% as.data.frame() %>% plot
This inverts the structure from one observation of 501 variables to 501 obs of one variable as follows:
rowOne %>% t() %>% as.data.frame() %>% str()
#'data.frame': 501 obs. of 1 variable:
# $ 1: num 8712 8712 8712 8712 8712 ...
Because of the unusual encoding I used, and the strange "length" result, I failed to see a simple solution to my "plot" problem.
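To get an actual line with the column names along the x-axis, a minimal sketch is shown below. The values in row_one are made-up stand-ins for rowOne, and it assumes the column names can be converted to numbers.

```r
# Stand-in for rowOne: one observation of five variables,
# with column names that hold the x positions
row_one <- data.frame(t(c(8712, 8712, 8650, 8400, 8100)))
names(row_one) <- as.character(1:5)

# Flatten the single row into a plain numeric vector
y <- unlist(row_one)
x <- as.numeric(names(row_one))

# Draw the row as a line instead of a scatterplot matrix
plot(x, y, type = "l", xlab = "step", ylab = "measurement")
```

Passing a data frame with many columns to plot() produces a pairs-style grid, which is why the one-row data frame did not plot as expected; a numeric vector avoids that.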
I have some large shapefiles with multiple millions of polygons that I need to dissolve. Depending upon the shapefile, I need to either dissolve by group or just use st_union for all. I have been using the st_par function and it has been working great for most sf applications. However, when I use this function with st_union, it returns a list, and I cannot figure out how to parallelize the sf dissolve function st_union.
Any suggestions would be most helpful! Here is a small code snippet to illustrate my point.
library(sf)
library(dplyr) # for filter() and %>%
library(assertthat)
library(parallel)
us_shp <- "data/cb_2016_us_state_20m/cb_2016_us_state_20m.shp"
if (!file.exists(us_shp)) {
loc <- "https://www2.census.gov/geo/tiger/GENZ2016/shp/cb_2016_us_state_20m.zip"
dest <- paste0("data/cb_2016_us_state_20m", ".zip")
download.file(loc, dest)
unzip(dest, exdir = "data/cb_2016_us_state_20m")
unlink(dest)
assert_that(file.exists(us_shp))
}
usa <- st_read("data/cb_2016_us_state_20m/cb_2016_us_state_20m.shp", quiet= TRUE) %>%
filter(!(STUSPS %in% c("AK", "HI", "PR")))
test <- usa %>%
st_par(., st_union, n_cores = 2)
I think you can solve your specific problem with a small modification of the original st_par function.
However, this is just a quick and bold fix, and it might break the code for other uses of the function.
The author of the function could certainly provide a better fix...
library(parallel)
# Parallelise any simple features analysis.
st_par <- function(sf_df, sf_func, n_cores, ...){
# Create a vector to split the data set up by.
split_vector <- rep(1:n_cores, each = nrow(sf_df) / n_cores, length.out = nrow(sf_df))
# Perform GIS analysis
split_results <- split(sf_df, split_vector) %>%
mclapply(function(x) sf_func(x), mc.cores = n_cores)
# Combine results back together. Method of combining depends on the output from the function.
if ( length(class(split_results[[1]]))>1 | class(split_results[[1]])[1] == 'list' ){
result <- do.call("c", split_results)
names(result) <- NULL
} else {
result <- do.call("rbind", split_results)
}
# Return result
return(result)
}
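Note that for a full dissolve, the chunk results still need one final st_union, because each chunk is only dissolved within itself. The sketch below illustrates that two-step idea on the nc dataset shipped with sf (an assumption for illustration, not the data from the question), and uses lapply as a stand-in for mclapply so that it also runs on Windows.

```r
library(sf)

# Example polygons shipped with sf (stand-in for the real data)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Assign every row to one of n_chunks groups
n_chunks <- 2
split_vector <- rep(seq_len(n_chunks), length.out = nrow(nc))

# Step 1: dissolve each chunk on its own; lapply() stands in for mclapply()
partial <- lapply(split(nc, split_vector), st_union)

# Step 2: combine the partial geometries and dissolve once more
combined <- do.call(c, unname(partial))
dissolved <- st_union(combined)
```

Whether this is faster than a single st_union call depends on the data, so it is worth timing both on a subset first.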
I was trying to use this for st_join and was running into problems with the returned data type. In looking at the result more closely it became evident that the split_results was just a list of sf objects. I ended up modifying the code to use dplyr::bind_rows() to get what I wanted.
There probably needs to be some more logic around the "combine" to deal with different return types but this works for the st_join function.
# Parallelise any simple features analysis.
st_par <- function(sf_df, sf_func, n_cores, ...) {
# Create a vector to split the data set up by.
split_vector <- rep(1:n_cores, each = nrow(sf_df) / n_cores, length.out = nrow(sf_df))
# Perform GIS analysis
split_results <- split(sf_df, split_vector) %>%
mclapply(function(x) sf_func(x, ...), mc.cores = n_cores)
# Combine results back together. Method of combining probably depends on the
# output from the function. For st_join it is a list of sf objects. This
# satisfies my needs for reverse geocoding
result <- dplyr::bind_rows(split_results)
# Return result
return(result)
}
I have a question about stack() Raster Layers.
Usually I stack() Raster Layers like that:
stack(RasterLayer1,RasterLayer2,RasterLayer3) # e.g. for 3 Layers
My Question is, how can I stack() Raster Layers without typing in every Raster Layer?
For example: n is the amount of Raster Layers (e.g. 12), all named band.
I created n-Raster Layers and now I want to stack all without typing n-times the Name of the Raster Layers. So instead of typing:
stack(band1,band2,band3,band4,band5,band6,band7,band8,band9,band10,band11,band12)
I want to shorten that to stack(band[n]), but that doesn't work.
And if I create a list of all bands, I can't stack that list, because they don't appear in my Working Directory because I just created them.
Can anyone help me, please?
If your data is in a directory, you can use a search pattern (for example: *.tif, *.grd,...) and store it in a variable.
bands <- list.files(path=".",pattern="\\.tif$",full.names=TRUE,recursive=TRUE) # pattern is a regular expression, not a glob
now assume that your data is called:
band_01.tif
band_02.tif
band_03.tif
band_04.tif
band_05.tif
band_06.tif
band_07.tif
then you can stack for example:
data_stack <- stack(bands) #stack all data
data_stack <- stack(bands[1:3]) #stack 1,2 and 3 data
data_stack <- stack(bands[c(1,3,5,7)])
I would recommend not to save them under separate variables like band1,band2,... but instead store them in a list. Here an example:
#Create empty rasters
ras1<- raster()
ras2<- raster()
#Initialise and append to list
list_ras <- list()
list_ras[[1]] <- ras1
list_ras[[2]] <- ras2
#Stack single bands
ras_stack <- stack(list_ras[[1]], list_ras[[2]])
#Stack all bands
ras_stack <- stack(list_ras)
Here is another approach using mget:
# Generate some data
library(raster)
r <- raster()
r[] <- runif(ncell(r))
for (i in 1:10) assign(paste0("r", i), r)
# create a stack
stack(mget(ls(pattern = "^r.+")))
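One caveat with the ls(pattern = ...) variant: ls() returns names in alphabetical order (r1, r10, r2, ...), and the pattern also matches any other object whose name starts with "r". A sketch that requests the objects by exact name, in numeric order, could look like this; layer_names is a hypothetical helper variable.

```r
library(raster)

# Generate some data, as above
r <- raster()
r[] <- runif(ncell(r))
for (i in 1:10) assign(paste0("r", i), r)

# Ask for the objects by exact name, in numeric order
layer_names <- paste0("r", 1:10)
s <- stack(mget(layer_names))
nlayers(s)
```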
I've been trying to find a time-efficient way to merge multiple raster images in R. These are adjacent ASTER scenes from the southern Kilimanjaro region, and my target is to put them together to obtain one large image.
This is what I got so far (object 'ast14dmo.sd' representing a list of RasterLayer objects):
# Loop through single ASTER scenes
for (i in seq(ast14dmo.sd)) {
if (i == 1) {
# Merge current with subsequent scene
ast14dmo.sd.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
} else if (i > 1 && i < length(ast14dmo.sd)) {
tmp.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
ast14dmo.sd.mrg <- merge(ast14dmo.sd.mrg, tmp.mrg, tolerance = 1)
} else {
# Save merged image
writeRaster(ast14dmo.sd.mrg, paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg", sep = ""), format = "GTiff", overwrite = TRUE)
}
}
As you can surely guess, the code works. However, merging takes quite long considering that each single raster object is some 70 MB in size. I also tried Reduce and do.call, but that failed since I couldn't pass the argument 'tolerance', which circumvents the different origins of the raster files.
Anybody got an idea of how to speed things up?
You can use do.call
ast14dmo.sd$tolerance <- 1
ast14dmo.sd$filename <- paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg.tif", sep = "")
ast14dmo.sd$overwrite <- TRUE
mm <- do.call(merge, ast14dmo.sd)
Here with some data, from the example in raster::merge
r1 <- raster(xmx=-150, ymn=60, ncols=30, nrows=30)
r1[] <- 1:ncell(r1)
r2 <- raster(xmn=-100, xmx=-50, ymx=50, ymn=30)
res(r2) <- c(xres(r1), yres(r1))
r2[] <- 1:ncell(r2)
x <- list(r1, r2)
names(x) <- c("x", "y")
x$filename <- 'test.tif'
x$overwrite <- TRUE
m <- do.call(merge, x)
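If you would rather not add the extra arguments to the list of rasters itself, a variation is to build the argument list on the fly with c(). This is a sketch using the same example rasters; the name rasters is hypothetical, and tolerance = 1 is taken from the question.

```r
library(raster)

# Two example rasters with the same resolution, as above
r1 <- raster(xmx = -150, ymn = 60, ncols = 30, nrows = 30)
r1[] <- 1:ncell(r1)
r2 <- raster(xmn = -100, xmx = -50, ymx = 50, ymn = 30)
res(r2) <- c(xres(r1), yres(r1))
r2[] <- 1:ncell(r2)

rasters <- list(r1, r2)

# Rasters are passed positionally, extra options as named arguments;
# the original list is left untouched
m <- do.call(merge, c(unname(rasters), list(tolerance = 1)))
```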
The 'merge' function from the raster package is a little slow. For large projects a faster option is to work with gdal commands in R.
library(raster) # for extent(), raster(), projection() and writeRaster()
library(gdalUtils)
library(rgdal)
Build list of all raster files you want to join (in your current working directory).
all_my_rasts <- c('r1.tif', 'r2.tif', 'r3.tif')
Make a template raster file to build onto. Think of this as a big blank canvas to add tiles to.
e <- extent(-131, -124, 49, 53)
template <- raster(e)
projection(template) <- '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
writeRaster(template, file="MyBigNastyRasty.tif", format="GTiff")
Merge all raster tiles into one big raster.
mosaic_rasters(gdalfile=all_my_rasts,dst_dataset="MyBigNastyRasty.tif",of="GTiff")
gdalinfo("MyBigNastyRasty.tif")
This should work pretty well for speed (faster than merge in the raster package), but if you have thousands of tiles you might even want to look into building a vrt first.
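As a rough illustration of the VRT idea, here is a sketch that uses the terra package instead of gdalUtils (a swapped-in tool, not what the answer above uses). The tiles are generated on the fly as stand-ins for real scenes, and all file names are temporary files.

```r
library(terra)

# Build a few small tiles on disk to stand in for real scenes
r <- rast(ncols = 100, nrows = 100)
values(r) <- 1:ncell(r)
tile_grid <- rast(ncols = 2, nrows = 2)
tile_files <- makeTiles(r, tile_grid, paste0(tempfile(), "_.tif"))

# A VRT is a small text file that references the tiles without copying them
vrt_file <- paste0(tempfile(), ".vrt")
mosaic <- vrt(tile_files, vrt_file)

# Write an actual GeoTIFF only when one is needed
out_file <- paste0(tempfile(), ".tif")
writeRaster(mosaic, out_file)
```

Because the VRT only points at the tiles, creating it is cheap even for many files; the cost is paid when the data is read or written out.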
You can use Reduce like this for example :
Reduce(function(...)merge(...,tolerance=1),ast14dmo.sd)
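A small sketch of that pattern on made-up rasters (r3 is a hypothetical third tile, added for illustration):

```r
library(raster)

# Three example tiles with the same resolution
r1 <- raster(xmx = -150, ymn = 60, ncols = 30, nrows = 30)
r1[] <- 1:ncell(r1)
r2 <- raster(xmn = -100, xmx = -50, ymx = 50, ymn = 30)
res(r2) <- c(xres(r1), yres(r1))
r2[] <- 1:ncell(r2)
r3 <- raster(xmn = -50, xmx = 0, ymx = 50, ymn = 30)
res(r3) <- c(xres(r1), yres(r1))
r3[] <- 1:ncell(r3)

# Merge pairwise from left to right, passing tolerance to every call
m <- Reduce(function(a, b) merge(a, b, tolerance = 1), list(r1, r2, r3))
```

Reduce merges two rasters at a time, so the growing result is rewritten at every step; with many large tiles, a single merge call via do.call is likely to do less work.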
SAGA GIS mosaicking tool (http://www.saga-gis.org/saga_tool_doc/7.3.0/grid_tools_3.html) gives you maximum flexibility for merging numeric layers, and it runs in parallel by default! You only have to translate all rasters/images to SAGA .sgrd format first, then run the command line saga_cmd.
I have tested the solution using gdalUtils as proposed by Matthew Bayly. It works well and is fast (I have about 1000 images to merge). However, after checking the documentation of the mosaic_rasters function, I found that it works without creating a template raster before mosaicking the images. I have pasted the example code from the documentation below:
outdir <- tempdir()
gdal_setInstallation()
valid_install <- !is.null(getOption("gdalUtils_gdalPath"))
if(require(raster) && require(rgdal) && valid_install)
{
layer1 <- system.file("external/tahoe_lidar_bareearth.tif", package="gdalUtils")
layer2 <- system.file("external/tahoe_lidar_highesthit.tif", package="gdalUtils")
mosaic_rasters(gdalfile=c(layer1,layer2),dst_dataset=file.path(outdir,"test_mosaic.envi"),
separate=TRUE,of="ENVI",verbose=TRUE)
gdalinfo("test_mosaic.envi")
}
I was faced with this same problem and I used
#Read desired files into R
data_name1<-'file_name1.tif'
r1=raster(data_name1)
data_name2<-'file_name2.tif'
r2=raster(data_name2)
#Merge files
new_data <- raster::merge(r1, r2)
Although it did not produce a new merged raster file, the result was stored in the environment and produced a merged map when plotted.
I ran into the following problem when trying to mosaic several rasters on top of each other
In vv[is.na(vv)] <- getValues(x[[i]])[is.na(vv)] :
number of items to replace is not a multiple of replacement length
As @Robert Hijmans pointed out, it was likely because of misaligned rasters. To work around this, I had to resample the rasters first
library(raster)
x <- raster("Base_raster.tif")
r1 <- raster("Top1_raster.tif")
r2 <- raster("Top2_raster.tif")
# Resample
x1 <- resample(r1, crop(x, r1))
x2 <- resample(r2, crop(x, r2))
# Merge rasters. Make sure to use the right order
m <- merge(merge(x1, x2), x)
# Write output
writeRaster(m,
filename = file.path("Mosaic_raster.tif"),
format = "GTiff",
overwrite = TRUE)
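To check whether resampling is needed in the first place, one option is to compare the grids before and after with compareRaster. The sketch below uses two made-up rasters (base and top are hypothetical stand-ins for the files above), where top is shifted by 0.3 of a cell.

```r
library(raster)

# Base grid and a second grid that is shifted by 0.3 units
base <- raster(xmn = 0, xmx = 10, ymn = 0, ymx = 10, resolution = 1)
base[] <- 1:ncell(base)
top <- raster(xmn = 0.3, xmx = 5.3, ymn = 0.3, ymx = 5.3, resolution = 1)
top[] <- 100

# Compare resolution and origin only; return FALSE instead of stopping
aligned_before <- compareRaster(base, top, extent = FALSE, rowcol = FALSE,
                                res = TRUE, orig = TRUE, stopiffalse = FALSE)

# Resample onto the matching part of the base grid, as in the answer above
top_aligned <- resample(top, crop(base, top))

aligned_after <- compareRaster(base, top_aligned, extent = FALSE, rowcol = FALSE,
                               res = TRUE, orig = TRUE, stopiffalse = FALSE)
```

Keep in mind that resampling interpolates cell values, so for categorical data method = "ngb" is probably the safer choice.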