How to Loop over Rasters to convert them to data.frames in R

I have some rasters that I would like to transform into data frames. I can do it manually one by one, but that is inefficient. When I try to make a loop (using a list or a vector of names), the code doesn't work and R reports: "Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class ‘structure("RasterLayer", package = "raster")’ to a data.frame"
I have also tried using the function assign(), but that doesn't work either. When using a vector of names, I can only get R to make a data frame with one single observation containing the name from the vector.
When I do it one by one, R does exactly what I want. My code for one raster is just:
#"a" is the name of the raster
r_1 <- as.data.frame(a, xy=TRUE, na.rm=TRUE, centroids=TRUE)
I have tried several things to make a loop, but all have failed. First, I tried creating a vector of names and looping with the function assign():
# "a" and "b" are the names of my rasters
o2 <- c("a","b")
for(i in 1:length(o2)){
nam <- substr(o2[i],1,nchar(o2))
assign(nam,as.data.frame(o2[i], xy=TRUE, na.rm=TRUE, centroids=TRUE))
}
But this only creates a data frame named a1 with one observation, "a1", and one variable.
I have also tried using a list:
o4 <- list(a,b)
for(i in 1:length(o4)){
nam <- substr(o4[i],1,nchar(o4))
r_i <- as.data.frame(o4[i], xy=TRUE, na.rm=TRUE, centroids=TRUE)
}
The error this time says: " Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class ‘structure("RasterLayer", package = "raster")’ to a data.frame"
I expect to get a data frame with three columns and as many rows as there are cells in my raster: the latitude and longitude of the centroid of each cell, plus a column with each cell's value. I don't see any mistake in my code; maybe someone can help me.
I created the rasters myself from different shapefiles. I have more than 40 rasters, and all of them have the same characteristics: width: 8806, height: 10389, origin: -77.6699, 4.94778, pixel size: 0.001041666, CRS: EPSG:4326 - WGS 84 - Geographic.

When asking a question like this, always include some example data (normally not your own data). Here we use three (identical) raster files:
f <- system.file("external/test.grd", package="raster")
ff <- c(f,f,f)
Now use lists to accomplish what you want.
r <- lapply(ff, raster)
x <- lapply(r, function(i) as.data.frame(i, xy=TRUE, na.rm=TRUE))
Never use assign().
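If you want to keep track of which raster each data frame came from, name the elements of the list instead of creating separate objects with assign(). A minimal sketch, continuing from the objects r and x above (the element names and the identifier column are just illustrative):
# keep the results in one named list instead of separate objects
names(x) <- paste0("r_", seq_along(x))
str(x[["r_1"]])   # access a single result by name
# optionally bind everything into one data frame with an identifier column
xall <- do.call(rbind, Map(cbind, layer = names(x), x))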

Instead of a loop you can use lapply():
s <- c(raster1, raster2, raster3)
lapply(s, as.data.frame)
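Because lapply() forwards additional arguments to the function it applies, the same options used in the question can be passed along; a small sketch using the list s defined above:
# extra arguments are passed straight through to as.data.frame()
x <- lapply(s, as.data.frame, xy = TRUE, na.rm = TRUE)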

Related

Extract raster by a list of SpatialPolygonsDataFrame objects in R

I am trying to extract summed raster cell values from a single big file for various SpatialPolygonsDataFrames (SPDF) objects in R stored in a list, then add the extracted values to the SPDF objects attribute tables. I would like to iterate this process, and have no idea how to do so. I have found an efficient solution for multiple polygons stored in a single SPDF object (see: https://gis.stackexchange.com/questions/130522/increasing-speed-of-crop-mask-extract-raster-by-many-polygons-in-r), but do not know how to apply the crop>mask>extract procedure to a LIST of SPDF objects, each containing multiple polygons. Here is a reproducible example:
library(maptools) ## For wrld_simpl
library(raster)
## Example SpatialPolygonsDataFrame
data(wrld_simpl) #polygon of world countries
bound <- wrld_simpl[1:25,] #country subset 1
bound2 <- wrld_simpl[26:36,] #subset 2
## Example RasterLayer
c <- raster(nrow=2e3, ncol=2e3, crs=proj4string(wrld_simpl), xmn=-180,
xmx=180, ymn=-90, ymx=90)
c[] <- 1:length(c)
#plot, so you can see it
plot(c)
plot(bound, add=TRUE)
plot(bound2, add=TRUE, col=3)
#make list of two SPDF objects
boundl <- list()
boundl[[1]] <- bound
boundl[[2]] <- bound2
#confirm creation of SPDF list
boundl
The following is what I would like to run for the entire list, in a for-loop format. For a single SPDF from the list, this series of functions seems to work:
clip1 <- crop(c, extent(boundl[[1]])) #crops the raster to the extent of the polygon, I do this first because it speeds the mask up
clip2 <- mask(clip1, boundl[[1]]) #crops the raster to the polygon boundary
extract_clip <- extract(clip2, boundl[[1]], fun=sum)
#add column + extracted raster values to polygon dataframe
boundl[[1]]#data["newcolumn"] = extract_clip
But when I try to isolate the first function for the SPDF list (raster::crop), it does not return a raster object:
crop1 <- crop(c, extent(boundl[[1]])) #correctly returns object class 'RasterLayer'
cropl <- lapply(boundl, crop, c, extent(boundl)) #incorrectly returns objects of class 'SpatialPolygonsDataFrame'
When I try to isolate the mask function for the SPDF list (raster::mask), it returns an error:
maskl <- lapply(boundl, mask, c)
#Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘mask’ for signature ‘"SpatialPolygonsDataFrame", "RasterLayer"’
I would like to correct these errors and efficiently iterate the entire procedure within a single loop (i.e., crop > mask > extract > add extracted values to SPDF attribute tables). I am really new to R and don't know where to go from here. Please help!
One approach is to take what is working and simply put the desired "crop -> mask -> extract -> add" into a for loop:
for(i in seq_along(boundl)) {
  clip1 <- crop(c, extent(boundl[[i]]))
  clip2 <- mask(clip1, boundl[[i]])
  extract_clip <- extract(clip2, boundl[[i]], fun=sum)
  boundl[[i]]@data["newcolumn"] <- extract_clip
}
One can speed up the loop with parallel execution, e.g., with the R package foreach, as sketched below. In contrast, the speed gain from using lapply() instead of the for loop would be small.
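For example, a hedged sketch with foreach and doParallel (the cluster size is arbitrary; the body is the same crop/mask/extract sequence as in the loop above, and it relies on foreach automatically exporting the raster object c to the workers):
library(raster)
library(foreach)
library(doParallel)

cl <- makeCluster(2)        # number of workers is just an example
registerDoParallel(cl)

# same crop -> mask -> extract sequence as above, run on the workers
extracted <- foreach(b = boundl, .packages = "raster") %dopar% {
  clip1 <- crop(c, extent(b))
  clip2 <- mask(clip1, b)
  extract(clip2, b, fun = sum)
}

stopCluster(cl)

# results come back in the same order as boundl
for(i in seq_along(boundl)) boundl[[i]]@data["newcolumn"] <- extracted[[i]]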
Why the error occurs:
cropl <- lapply(boundl, crop, c, extent(boundl))
applies the function crop() to each element of the list boundl. The performed operation is
tmp <- crop(boundl[[1]], c)
## test if equal to first element
all.equal(cropl[[1]], tmp)
[1] TRUE
To get the desired result use
cropl <- lapply(boundl, function(x, c) crop(c, extent(x)), c=c)
## test if the first element is as expected
all.equal(cropl[[1]], crop(c, extent(boundl[[1]])))
[1] TRUE
Note:
Using c as the name of an R object is a bad choice, because it is easily confused with the function c().

Splitting an ffdf object

I'm using the ff and ffbase packages to manage a big CSV file (~40 GB and 275e6 observations). I'd like to split/partition this file according to one of its columns (a factor column).
With a normal data frame, I would do something like that:
a <- data.frame(rnorm(10000,0,1),
sample(1:100,10000,replace=T),
sample(letters,10000,replace = T))
names(a) <- c('V1','V2','V3')
a_partition <- split(a,a$V3)
names(a_partition) <- paste("df",names(a_partition),sep = "_")
list2env(a_partition,globalenv())
but ff and ffbase don't have a split function. So, looking in the ffbase documentation, I found ffdfdply and tried to use it as follows:
ffa <- as.ffdf(a)
ffa_partition <- ffdfdply(x = ffa, split = ffa$V3)
Alas, I get this log message:
calculating split sizes
building up split locations
working on split 1/1, extracting data in RAM of 26 split elements,
totalling, 0.00015 GB, while max specified
data specified using BATCHBYTES is 0.01999 GB
... applying FUN to selected data
Error: argument "FUN" is missing, with no default
I tried FUN = as.data.frame (since the result of the function must be a data frame) with no luck: doing so just makes ffa_partition a copy of ffa.
How can I partition my ffdf?
Two years late, but I believe this does what you needed:
result_list <- list()
for(letter in letters){
  result_list[[letter]] <- subset(ffa, V3 == letter)
}
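If you want the same workspace layout as in the data.frame example from the question, the pieces can be named and pushed to the global environment afterwards (a small sketch based on the loop above):
# name the partitions like in the data.frame example and expose them as objects
names(result_list) <- paste("df", names(result_list), sep = "_")
list2env(result_list, globalenv())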

Calculating percentile across netcdf files fails

I have five netcdf files where each file contains data for a time section. I want to calculate the 98th percentile for the whole timespan for each cell individually.
The accumulated file size for the netcdf files is around 250 MB.
My approach is this:
library(raster)
fileType="\\.nc$"
filenameList <- list.files(path=getwd(), pattern=fileType, full.names=F, recursive=FALSE)
#rasterStack for all layers
rasterStack <- stack()
#stack all data
for(i in 1:length(filenameList)){
  filename <- filenameList[i]
  stack.temp <- stack(filename)
  rasterStack <- stack(rasterStack, stack.temp)
}
#calculate raster containing the 98th percentiles
result <- calc(rasterStack, fun = function(x) {quantile(x,probs = .98,na.rm=TRUE)} )
However, I get this error:
Error in ncdf4::nc_close(x@file@con) :
no slot of name "con" for this object of class ".RasterFile"
The stacking section of my code works; the crash happens during the calc() call.
Do you have any idea where this might come from? Is it maybe an issue of where the data is stored (memory/disk)?
Strange; I generated some dummy data and it seems to work just fine, so the problem does not seem to be your method. 250 MB is not overly huge. I would clip a small piece of each raster and test if it works.
dat<-matrix(rnorm(16), 4, 4)
r1<-raster(dat)
r2<-r1*2
r3<-r2+1
r4<-r3+4
rStack <- stack(r1,r2,r3,r4)
result <- calc(rStack, fun = function(x) {quantile(x,probs = .98)} )
Perhaps this is related to the odd way you create a RasterStack. You should simply do:
filenames <- list.files(pattern="\\.nc$")
rasterStack <- stack(filenames)
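The percentile calculation from the question can then be applied to that stack unchanged:
result <- calc(rasterStack, fun = function(x) quantile(x, probs = 0.98, na.rm = TRUE))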

Merging multiple rasters in R

I've been trying to find a time-efficient way to merge multiple raster images in R. These are adjacent ASTER scenes from the southern Kilimanjaro region, and my target is to put them together to obtain one large image.
This is what I have got so far (the object 'ast14dmo.sd' represents a list of RasterLayer objects):
# Loop through single ASTER scenes
for (i in seq(ast14dmo.sd)) {
  if (i == 1) {
    # Merge current with subsequent scene
    ast14dmo.sd.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
  } else if (i > 1 && i < length(ast14dmo.sd)) {
    tmp.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
    ast14dmo.sd.mrg <- merge(ast14dmo.sd.mrg, tmp.mrg, tolerance = 1)
  } else {
    # Save merged image
    writeRaster(ast14dmo.sd.mrg, paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg", sep = ""), format = "GTiff", overwrite = TRUE)
  }
}
As you can surely guess, the code works. However, merging takes quite long considering that each single raster object is some 70 MB. I also tried Reduce and do.call, but those failed since I couldn't pass the argument 'tolerance', which works around the different origins of the raster files.
Anybody got an idea of how to speed things up?
You can use do.call:
ast14dmo.sd$tolerance <- 1
ast14dmo.sd$filename <- paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg.tif", sep = "")
ast14dmo.sd$overwrite <- TRUE
mm <- do.call(merge, ast14dmo.sd)
Here it is with some data, from the example in raster::merge:
r1 <- raster(xmx=-150, ymn=60, ncols=30, nrows=30)
r1[] <- 1:ncell(r1)
r2 <- raster(xmn=-100, xmx=-50, ymx=50, ymn=30)
res(r2) <- c(xres(r1), yres(r1))
r2[] <- 1:ncell(r2)
x <- list(r1, r2)
names(x) <- c("x", "y")
x$filename <- 'test.tif'
x$overwrite <- TRUE
m <- do.call(merge, x)
The merge() function from the raster package is a little slow. For large projects, a faster option is to work with GDAL commands from R.
library(gdalUtils)
library(rgdal)
Build a list of all the raster files you want to join (in your current working directory).
all_my_rasts <- c('r1.tif', 'r2.tif', 'r3.tif')
Make a template raster file to build onto. Think of this as a big blank canvas to add tiles to.
e <- extent(-131, -124, 49, 53)
template <- raster(e)
projection(template) <- '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
writeRaster(template, file="MyBigNastyRasty.tif", format="GTiff")
Merge all raster tiles into one big raster.
mosaic_rasters(gdalfile=all_my_rasts,dst_dataset="MyBigNastyRasty.tif",of="GTiff")
gdalinfo("MyBigNastyRasty.tif")
This should work pretty well for speed (faster than merge in the raster package), but if you have thousands of tiles you might even want to look into building a VRT first, as sketched below.
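A hedged sketch of that VRT route with gdalUtils, reusing the all_my_rasts list from above (the argument names follow the gdalUtils wrappers; check the package documentation if your version differs):
# build a lightweight virtual mosaic first, then translate it into a single GeoTIFF
gdalbuildvrt(gdalfile = all_my_rasts, output.vrt = "MyBigMosaic.vrt")
gdal_translate(src_dataset = "MyBigMosaic.vrt", dst_dataset = "MyBigMosaic.tif", of = "GTiff")
gdalinfo("MyBigMosaic.tif")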
You can use Reduce, for example like this:
Reduce(function(...) merge(..., tolerance = 1), ast14dmo.sd)
The SAGA GIS Mosaicking tool (http://www.saga-gis.org/saga_tool_doc/7.3.0/grid_tools_3.html) gives you maximum flexibility for merging numeric layers, and it runs in parallel by default. You only have to translate all rasters/images to the SAGA .sgrd format first, then run it from the command line with saga_cmd.
I have tested the solution using gdalUtils as proposed by Matthew Bayly. It works quite well and fast (I have about 1000 images to merge). However, after checking the documentation of the mosaic_rasters function, I found that it works without making a template raster before mosaicking the images. I pasted the example code from the documentation below:
outdir <- tempdir()
gdal_setInstallation()
valid_install <- !is.null(getOption("gdalUtils_gdalPath"))
if(require(raster) && require(rgdal) && valid_install)
{
layer1 <- system.file("external/tahoe_lidar_bareearth.tif", package="gdalUtils")
layer2 <- system.file("external/tahoe_lidar_highesthit.tif", package="gdalUtils")
mosaic_rasters(gdalfile=c(layer1,layer2),dst_dataset=file.path(outdir,"test_mosaic.envi"),
separate=TRUE,of="ENVI",verbose=TRUE)
gdalinfo("test_mosaic.envi")
}
I was faced with this same problem and I used
#Read desired files into R
data_name1<-'file_name1.tif'
r1=raster(data_name1)
data_name2<-'file_name2.tif'
r2=raster(data_name2)
#Merge files
new_data <- raster::merge(r1, r2)
Although this did not produce a new merged raster file on disk, the result was stored in the R environment and produced a merged map when plotted. If you need a file as well, see the sketch below.
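writeRaster() from the raster package can save the merged object to disk (the output filename here is just an example):
# write the merged raster to a GeoTIFF on disk
writeRaster(new_data, filename = "merged_output.tif", format = "GTiff", overwrite = TRUE)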
I ran into the following problem when trying to mosaic several rasters on top of each other
In vv[is.na(vv)] <- getValues(x[[i]])[is.na(vv)] :
number of items to replace is not a multiple of replacement length
As @Robert Hijmans pointed out, it was likely because of misaligned rasters. To work around this, I had to resample the rasters first:
library(raster)
x <- raster("Base_raster.tif")
r1 <- raster("Top1_raster.tif")
r2 <- raster("Top2_raster.tif")
# Resample
x1 <- resample(r1, crop(x, r1))
x2 <- resample(r2, crop(x, r2))
# Merge rasters. Make sure to use the right order
m <- merge(merge(x1, x2), x)
# Write output
writeRaster(m,
filename = file.path("Mosaic_raster.tif"),
format = "GTiff",
overwrite = TRUE)

Merge data vector to shapefile data slot

I am trying to add economic data to a shapefile using merge() and the 2-digit ISO code as the ID. The code looks somewhat like this:
library(maptools)
library(foreign)
library(sp)
library(lattice)
library(shapefiles)
world.shp<-readShapePoly("world_shapefile.shp")
world.shp@data <- merge(world.shp@data, data.frame(country=iso.code.vector, net=country.data.vector), by.x="ISO2", by.y="country", all.x=TRUE, sort=FALSE)
Unfortunately this ruins the order of the .shp file even though I set the sort argument. A plot afterwards shows me that the data does not match the polygons like it should. What am I doing wrong?
I got the world map data from thematicmapping.org.
Thanks for your help.
Merge will always break the sp object. Here are two ways to merge a data frame into the @data data frame of an sp object.
shape@data = data.frame(shape@data, OtherData[match(shape@data$IDS, OtherData$IDS),])
Here shape is your shapefile, IDS is the identifier you want to merge on, and OtherData is the data frame that you want to combine with shape. Note that IDS can have different names in the two datasets, but the values need to actually be the same (not fuzzy matches).
Alternatively you can use this function.
join.sp.df <- function(x, y, xcol, ycol) {
  x$sort_id <- 1:nrow(as(x, "data.frame"))
  x.dat <- as(x, "data.frame")
  x.dat2 <- merge(x.dat, y, by.x = xcol, by.y = ycol)
  x.dat2.ord <- x.dat2[order(x.dat2$sort_id), ]
  x2 <- x[x$sort_id %in% x.dat2$sort_id, ]
  x2.dat <- as(x2, "data.frame")
  row.names(x.dat2.ord) <- row.names(x2.dat)
  x2@data <- x.dat2.ord
  return(x2)
}
Here x is the sp SpatialPolygonsDataFrame object, y is the data frame to merge with x, xcol is the merge column name in the sp object (quoted), and ycol is the merge column name in the data frame (quoted).
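A hedged usage sketch for the original question, building the lookup data frame from the question's iso.code.vector and country.data.vector (the object names here are just placeholders):
# merge the economic data onto the world shapefile without breaking the polygon order
economic_data <- data.frame(country = iso.code.vector, net = country.data.vector)
world.merged <- join.sp.df(world.shp, economic_data, xcol = "ISO2", ycol = "country")
spplot(world.merged, "net")   # quick visual check that values line up with the polygons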
I found the same problem when using R versions 2.12.x and 2.13.x, but the problem appears to have been resolved in version 2.15.1.
I found a workaround. It is not very elegant and it takes some time to execute, but it works:
world.shp <- readShapePoly("world_shapefile.shp")
net <- rep(NA, length(world.shp@data$NAME))
for(i in 1:length(net))
{
  for(j in 1:length(iso.code.vector))
  {
    if(!is.na(world.shp@data$ISO2[i])){
      if(world.shp@data$ISO2[i] == iso.code.vector[j]){net[i] <- country.data.vector[j]}
    }
  }
}
world.shp@data <- data.frame(world.shp@data, net)
