Iterating over a function to combine many raster stacks into one - r

Been stuck on this for a while now. I've looked everywhere for an answer, but I can't seem to find anything on Stack Overflow. Any help you all can give would be very much appreciated.
My main issue is that I need to import many netCDF4 files, create a raster brick from each, and then combine the bricks into one "master brick" per variable. To give a clearer example: I have 40 years of data (one netCDF file per year) for many climate variables (n = 15) at daily resolution. The goal is to aggregate to monthly, but first I need a function that reads in all years of netCDFs for one variable and combines them into one large stack.
What I have now reads as follows:
# Libraries --------------------------------------------------------------
library(raster)
library(ncdf4)
# Directories -------------------------------------------------------------
tmp_dl <- list.files("/Users/NateM", pattern = "nc",
                     full.names = TRUE)
# Functions ---------------------------------------------------------------
rstlist = stack()
netcdf_import <- function(file) {
  nc <- nc_open(file)
  nc_att <- attributes(nc$var)$names
  ncvar <- ncvar_get(nc, nc_att)
  rm(nc)
  proj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
  rbrck <- brick(ncvar, crs = proj)
  rm(ncvar)
  extent(rbrck) <- c(-124.772163391113, -67.06383005778, 25.0626894632975,
                     49.3960227966309)
}
t <- for(i in 1:length(tmp_dl)) {
  x <- netcdf_import(tmp_dl[i])
  rstlist <- stack(rstlist, x)
}
allyears <- stack(t)
Two years of the data can be found here:
https://www.northwestknowledge.net/metdata/data/pdsi_2016.nc
https://www.northwestknowledge.net/metdata/data/pdsi_2015.nc
Any help would be most welcome. Thank you all in advance, and if this is a duplicate post I apologize; I looked far and wide to no avail.

Your code is fine, you just need to return the loaded brick rbrck from your function; otherwise the function returns the result of its last expression, the extent assignment, rather than the brick.
As for loading and stacking, I'd suggest using lapply to apply the function to each data file. This will give you a neat list with one year per item. There you could do some more processing and finally just call stack on the list to produce your "master brick".
Mind that I only did this with two files, so I'm not sure about the size of the whole thing when you do it with all 40.
Here's your modified code:
# Libraries --------------------------------------------------------------
library(raster)
library(ncdf4)
# Directories -------------------------------------------------------------
tmp_dl <- list.files("/Users/NateM", pattern = "nc",
                     full.names = TRUE)
# Functions ---------------------------------------------------------------
netcdf_import <- function(file) {
  nc <- nc_open(file)
  nc_att <- attributes(nc$var)$names
  ncvar <- ncvar_get(nc, nc_att)
  rm(nc)
  proj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
  rbrck <- brick(ncvar, crs = proj)
  rm(ncvar)
  extent(rbrck) <- c(-124.772163391113, -67.06383005778, 25.0626894632975,
                     49.3960227966309)
  return(rbrck)
}
# apply the function to each file
allyrs <- lapply(tmp_dl, netcdf_import)
# stack the list into one master brick
allyrs <- do.call(stack, allyrs)
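Since the stated goal is monthly aggregation, here is a minimal, untested sketch of that next step. It assumes the layers of allyrs are consecutive daily values and that you know the start date (the 2015-01-01 below is just a placeholder for your files):
# Aggregate the daily master brick to monthly means
dates   <- seq(as.Date("2015-01-01"), by = "day", length.out = nlayers(allyrs))
yrmonth <- format(dates, "%Y-%m")
monthly <- stackApply(allyrs, indices = as.integer(factor(yrmonth)), fun = mean)
names(monthly) <- unique(yrmonth)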
HTH

Related

I want to extract data from multiple netcdf files

I have code in R that extracts daily values of every month from a single .nc4 file. I have 49 netCDF files. I want to extract the data from all those files using a loop and write them to a single CSV file.
I have this code for a single file, but I need help with multiple files.
flux1701 <- nc_open(list[14])
GPP.array1701 <- ncvar_get(flux1701, "GPP")
fillvalue1701 <- ncatt_get(flux1701, "GPP", "_FillValue")
nc_close(flux1701)
GPP.array1701[GPP.array1701 == fillvalue1701$value] <- NA
rbrick1701 <- brick(GPP.array1701, xmn=min(lat), xmx=max(lat), ymn=min(lon), ymx=max(lon), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0")) ## convert the entire 3d array of data to a raster brick
rbrick1701 <- flip(t(rbrick1701), direction='y')
qro_lon <- -99.941
qro_lat <- 20.717
qro_series1701 <- extract(rbrick1701, SpatialPoints(cbind(qro_lon, qro_lat)), method='simple')
qro_df1701 <- data.frame(day = seq(from=1, to=31, by=1), GPP = t(qro_series1701))
write.csv(qro_df1701, file="qro201701.csv")
I do not think your code gives you the correct extent. I would suggest doing:
library(raster)
rbrick1701 <- brick(list[14], varname = "GPP")
And here is one of many examples/answers on how to write a loop
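For completeness, here is a rough, untested sketch of what such a loop could look like, assuming the .nc4 files sit in the working directory, all contain the GPP variable, and you want the point used above (qro_lon, qro_lat):
library(raster)
library(ncdf4)
files <- list.files(pattern = "\\.nc4?$", full.names = TRUE)
pt    <- SpatialPoints(cbind(-99.941, 20.717))   # qro_lon, qro_lat from above
res <- lapply(files, function(f) {
  b    <- brick(f, varname = "GPP")              # brick reads extent/CRS from the file
  vals <- as.vector(extract(b, pt, method = "simple"))
  data.frame(file = basename(f), day = seq_along(vals), GPP = vals)
})
out <- do.call(rbind, res)
write.csv(out, "GPP_all_files.csv", row.names = FALSE)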

How to streamline and speed up loop with getData package in R

I am trying to download high-resolution climate data for a bunch of lat/long coordinates, and combine them into a single dataframe. I've come up with a solution (below), but it will take forever with the large list of coordinates I have. I asked a related question on the GIS StackExchange to see if anyone knew of a better approach for downloading and merging the data, but I'm wondering if I could somehow just speed up the operation of the loop? Does anyone have any suggestions on how I might do that? Here is a reproducible example:
# Download and merge 0.5 minute MAT/MAP data from WorldClim for a list of lon/lat coordinates
# This is based on https://emilypiche.github.io/BIO381/raster.html
# Make a dataframe with coordinates
coords <- data.frame(Lon = c(-83.63, 149.12), Lat=c(10.39,-35.31))
# Load package
library(raster)
# Make an empty dataframe for dumping data into
coords3 <- data.frame(Lon=integer(), Lat=integer(), MAT_10=integer(), MAP_MM=integer())
# Get WorldClim data for all the coordinates, and dump into coords 3
for(i in seq_along(coords$Lon)) {
  r <- getData("worldclim", var="bio", res=0.5, lon=coords[i,1], lat=coords[i,2]) # Download the tile containing the lat/lon
  r <- r[[c(1,12)]] # Reduce the layers in the RasterStack to just the variables we want (MAT*10 and MAP_mm)
  names(r) <- c("MAT_10", "MAP_mm") # Rename the layers to something intelligible
  points <- SpatialPoints(na.omit(coords[i,1:2]), proj4string = r@crs) # give lon,lat to SpatialPoints
  values <- extract(r, points)
  coords2 <- cbind.data.frame(coords[i,1:2], values)
  coords3 <- rbind(coords3, coords2)
}
# Convert MAT*10 from WorldClim into MAT in Celsius
coords3$MAT_C <- coords3$MAT_10/10
Edit: Thanks to advice from Dave2e, I now make a list first, put the intermediate results into the list, and bind the rows at the end. I haven't timed this yet to see how much faster it is than my original solution. If anyone has further suggestions on how to improve the speed, I'm all ears! Here is the new version:
library(dplyr) # for bind_rows()
coordsList <- list()
for(i in seq_along(coordinates$lon_stm)) {
  r <- getData("worldclim", var="bio", res=0.5, lon=coordinates[i,7], lat=coordinates[i,6]) # Download the tile containing the lat/lon
  r <- r[[c(1,12)]] # Reduce the layers in the RasterStack to just the variables we want (MAT*10 and MAP_mm)
  names(r) <- c("MAT_10", "MAP_mm") # Rename the layers to something intelligible
  points <- SpatialPoints(na.omit(coordinates[i,7:6]), proj4string = r@crs) # give lon,lat to SpatialPoints
  values <- extract(r, points)
  coordsList[[i]] <- cbind.data.frame(coordinates[i,7:6], values)
}
coords_new <- bind_rows(coordsList)
Edit2: I used system.time() to time the execution of both of the above approaches. When I did the timing, I had already downloaded all of the data, so the download time isn't included in my time estimates. My first approach took 45.01 minutes, and the revised approach took 44.15 minutes, so I'm not really seeing a substantial time savings by doing it the latter way. Still open to advice on how to revise the code so I can improve the speed of the operations!
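For reference, the timing was done by simply wrapping each version in system.time(); roughly this pattern (loop body omitted here, it is exactly the code shown above):
timing <- system.time({
  coordsList <- list()
  for(i in seq_along(coords$Lon)) {
    # ... loop body as in the versions above ...
  }
})
timing["elapsed"] / 60   # elapsed wall-clock time in minutes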

convert multiple .csv files to .shp files in R

I should preface that I am terrible at loops in R, and I recognize this question is similar to the post Batch convert .csv files to .shp files in R. However, I was not able to leave a comment on that thread to ask whether the user found a solution because I do not have enough reputation points, and the suggested solutions did not help me.
I have multiple .csv files that contain GPS points of animals, and I would like to create multiple shapefiles for spatial analysis. I have tried writing a loop that reads in each .csv file, makes spatial data from its latitudes and longitudes, transforms the spatial data frame to a UTM projection so that I can calculate distances, and then writes the file as a shapefile. Here is the loop I have tried, but I think my indexing of out and utm_out is incorrect.
Here is some test data; remember to set your working directory before writing the .csv:
#write sample .csv for animal 1
ID1<-rep(1, 3)
Latitude1<-c(25.48146, 25.49211, 25.47954)
Longitude1<-c(-84.66530, -84.64892, -84.69765)
df1<-data.frame(ID1, Latitude1, Longitude1)
colnames(df1)<-c("ID", "Latitude", "Longitude")
write.csv(df1, "df1.csv", row.names = FALSE)
#write sample .csv for animal 2
ID2<-rep(2, 2)
Latitude2<-c(28.48146, 28.49211)
Longitude2<-c(-88.66530, -88.64892)
df2<-data.frame(ID2, Latitude2, Longitude2)
colnames(df2)<-c("ID", "Latitude", "Longitude")
write.csv(df2, "df2.csv", row.names = FALSE)
#create a list of file names in my working directory where .csv files are located
all.files<-list.files(pattern="\\.csv")
#set the points geographic coordinate system
points_crs <- crs("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
#write a loop to read in each file, specify the file as a spatial points data frame & project then write as .shp file
for(i in 1:length(all.files)) {
  file <- read.csv(all.files[i], header=TRUE, sep = ",", stringsAsFactors = FALSE) #read files
  coords <- file[c("Longitude", "Latitude")] #select coordinates from the file
  out <- SpatialPointsDataFrame(
    coords = coords,
    file,
    proj4string = points_crs) #create SpatialPointsDataFrame and specify geographic coordinate system = points_crs
  utm_out <- spTransform(out, crs("+init=epsg:32616")) #transform to UTM
  writeOGR(utm_out[[i]], dsn="C:/Users/Desktop/Shapefile_test",
           "*", driver="ESRI Shapefile")
}
This gives me the following: Error: inherits(obj, "Spatial") is not TRUE
I've also tried:
for(i in 1:length(all.files)) {
  file <- read.csv(all.files[i], header=TRUE, sep = ",", stringsAsFactors = FALSE)
  coords <- file[c("Longitude", "Latitude")]
  out <- SpatialPointsDataFrame(
    coords = coords,
    file,
    proj4string = points_crs)
  utm_out <- spTransform(out[[i]], crs("+init=epsg:32616"))
  writeOGR(utm_out[[i]], dsn="C:/Users/Desktop/Shapefile_test", "*", driver="ESRI Shapefile")
}
This produces: Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘spTransform’ for signature ‘"integer", "CRS"’
Ideally, the output file will be something like "animal1.shp" "animal2.shp"...etc.
Alternatively, I do have animal 1 and 2 in one file. I could bring in this file, set the projection and then create multiple subsets for each unique animal ID and write the subset to a .shp file, but I am having issues with subsetting the spatial data and I think that is a topic for another thread.
Thanks in advance for your assistance.
Here is a minor variation on the solution by Mako212
library(raster)
all.files <- list.files(pattern="\\.csv$")
out.files <- gsub("\\.csv$", ".shp", all.files)
crs <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
for(i in 1:length(all.files)) {
  d <- read.csv(all.files[i], stringsAsFactors = FALSE)
  sp <- SpatialPointsDataFrame(coords = d[c("Longitude", "Latitude")], d, proj4string = crs)
  utm <- spTransform(sp, CRS("+proj=utm +zone=16 +datum=WGS84"))
  shapefile(utm, out.files[i])
}
Expanding on my comment, it's important to test batch processing operations like this on a single file first, and then adapt your solution as necessary to process the batch. My first step in troubleshooting your issue was to strip away the for loop and run the code with the first file, all.files[1]; it still failed, indicating there was at least one issue not related to the loop.
Try this out. I've changed crs to CRS because the function in sp is capitalized. Your loop range can be simplified to for(i in all.files), and I removed the attempts to access non-existent lists with out and utm_out.
require(sp)
require(rgdal)
points_crs <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
#write a loop to read in each file, specify the file as a spatial points data frame & project, then write as a .shp file
for(i in all.files) {
  file <- read.csv(i, header=TRUE, sep = ",", stringsAsFactors = FALSE) #read files
  coords <- file[c("Longitude", "Latitude")] #select coordinates from the file
  out <- SpatialPointsDataFrame(
    coords = coords,
    file,
    proj4string = points_crs) #create SpatialPointsDataFrame and specify geographic coordinate system = points_crs
  names <- substr(i, 1, nchar(i)-4) #layer name taken from the file name, e.g. "df1"
  utm_out <- spTransform(out, CRS("+init=epsg:32616")) #transform to UTM
  writeOGR(utm_out, dsn="/path/Shapefile_test", layer=names, driver="ESRI Shapefile")
}
Edit:
I had to modify the writeOGR line by specifying a layer name:
writeOGR(utm_out,dsn="/path/Shapefile_test", layer="test", driver="ESRI Shapefile")

R: create a loop for importing, manipulating (spatial join) of multiple files

I am a basic R user who needs your help. I have multiple data files that I want to process in a loop: basically, import one or two files, process them, remove them, and repeat this several times. However, I am stuck on what is probably simple code for many of you. Please kindly help me solve this.
I can import and process the data for a single file as follows:
test <- read.table("test.txt", header = FALSE, sep='\t', stringsAsFactors = FALSE)
test <- as.data.frame(test)
## prepared for spatial joining with polygon
coordinates(test) = ~ lon + lat
proj4string(test) = CRS("+proj=longlat +datum=NAD83")
## Import gis polygon shapefile
ZIPshp <- readShapeSpatial("D:/data/gis/Zipcode.shp",
proj4string=CRS("+proj=longlat +datum=NAD83"))
## spatial join b/w point and polygon
test_zip <- over(test, ZIPshp[,"zipc"])
test_zip <- subset(test_zip, zipc != "")
write.table(test_zip, "test_zip.csv", sep = ",", na = "NA", row.names = FALSE)
However, I could not figure out how to create a loop that repeats this process multiple times, and especially how to remove the processed data frame once processing is complete. Here is my attempt, but it still misses a key portion, which is where I really need your help. (I also thought about do.call and lapply, but failed to come up with a working solution.)
files <- list.files(pattern='*.txt')
ldf <- list()
for (i in 1:length(files)) {
  ldf[[i]] <- read.table(files[[i]], header=FALSE, sep='\t',
                         stringsAsFactors = FALSE)
  coordinates(ldf[[i]]) = ~ lon + lat
  proj4string(ldf[[i]]) = CRS("+proj=longlat +datum=NAD83")
}
## (missing parts are spatial join, removal of processed
## data frame, and repeating this process with new data)
Please help me! Thanks
You can use the code below as a skeleton to complete your solution. Because each file is read and processed inside the function passed to lapply, the intermediate objects are local to that function and are discarded after every iteration, so there is no need to remove them explicitly.
options(stringsAsFactors=FALSE)
library(sp)       # coordinates(), proj4string(), over()
library(maptools) # readShapeSpatial()
## Import gis polygon shapefile
ZIPshp <- readShapeSpatial("D:/data/gis/Zipcode.shp",
                           proj4string=CRS("+proj=longlat +datum=NAD83"))
## read in each file and process it
lapply(list.files(pattern='*.txt'), function(txtfile) {
  test <- read.table(txtfile, header=FALSE, sep='\t')
  ## prepare for spatial joining with the polygon
  coordinates(test) <- ~lon+lat
  proj4string(test) <- CRS("+proj=longlat +datum=NAD83")
  ## spatial join b/w point and polygon
  test_zip <- over(test, ZIPshp[,"zipc"])
  test_zip <- subset(test_zip, zipc!="")
  ## output processed file as a csv
  write.csv(test_zip,
            paste0(tools::file_path_sans_ext(txtfile), ".csv"),
            row.names = FALSE)
})

Merging multiple rasters in R

I've been trying to find a time-efficient way to merge multiple raster images in R. These are adjacent ASTER scenes from the southern Kilimanjaro region, and my target is to put them together to obtain one large image.
This is what I got so far (the object 'ast14dmo.sd' represents a list of RasterLayer objects):
# Loop through single ASTER scenes
for (i in seq(ast14dmo.sd)) {
  if (i == 1) {
    # Merge current with subsequent scene
    ast14dmo.sd.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
  } else if (i > 1 && i < length(ast14dmo.sd)) {
    tmp.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
    ast14dmo.sd.mrg <- merge(ast14dmo.sd.mrg, tmp.mrg, tolerance = 1)
  } else {
    # Save merged image
    writeRaster(ast14dmo.sd.mrg, paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg", sep = ""), format = "GTiff", overwrite = TRUE)
  }
}
As you surely guess, the code works. However, merging takes quite long considering that each single raster object is some 70 MB in size. I also tried Reduce and do.call, but that failed since I couldn't pass the argument 'tolerance', which is needed to cope with the different origins of the raster files.
Anybody got an idea of how to speed things up?
You can use do.call
ast14dmo.sd$tolerance <- 1
ast14dmo.sd$filename <- paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg.tif", sep = "")
ast14dmo.sd$overwrite <- TRUE
mm <- do.call(merge, ast14dmo.sd)
Here with some data, from the example in raster::merge
r1 <- raster(xmx=-150, ymn=60, ncols=30, nrows=30)
r1[] <- 1:ncell(r1)
r2 <- raster(xmn=-100, xmx=-50, ymx=50, ymn=30)
res(r2) <- c(xres(r1), yres(r1))
r2[] <- 1:ncell(r2)
x <- list(r1, r2)
names(x) <- c("x", "y")
x$filename <- 'test.tif'
x$overwrite <- TRUE
m <- do.call(merge, x)
The merge function from the raster package is a little slow. For large projects a faster option is to work with gdal commands in R.
library(gdalUtils)
library(rgdal)
Build list of all raster files you want to join (in your current working directory).
all_my_rasts <- c('r1.tif', 'r2.tif', 'r3.tif')
Make a template raster file to build onto. Think of this as a big blank canvas to add tiles to.
e <- extent(-131, -124, 49, 53)
template <- raster(e)
projection(template) <- '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
writeRaster(template, file="MyBigNastyRasty.tif", format="GTiff")
Merge all raster tiles into one big raster.
mosaic_rasters(gdalfile=all_my_rasts,dst_dataset="MyBigNastyRasty.tif",of="GTiff")
gdalinfo("MyBigNastyRasty.tif")
This should work pretty well for speed (faster than merge in the raster package), but if you have thousands of tiles you might even want to look into building a vrt first.
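As a rough, untested sketch of that VRT route (re-using the all_my_rasts vector from above; the output file names are just placeholders):
library(gdalUtils)
# Build a lightweight virtual mosaic first, then write the real GeoTIFF in one pass
gdalbuildvrt(gdalfile = all_my_rasts, output.vrt = "mosaic.vrt")
gdal_translate(src_dataset = "mosaic.vrt", dst_dataset = "MyBigNastyRasty.tif", of = "GTiff")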
You can use Reduce, for example like this:
Reduce(function(...) merge(..., tolerance = 1), ast14dmo.sd)
SAGA GIS mosaicking tool (http://www.saga-gis.org/saga_tool_doc/7.3.0/grid_tools_3.html) gives you maximum flexibility for merging numeric layers, and it runs in parallel by default! You only have to translate all rasters/images to SAGA .sgrd format first, then run the command line saga_cmd.
I have tested the solution using gdalUtils as proposed by Matthew Bayly. It works quite well and fast (I have about 1000 images to merge). However, after checking the documentation of the mosaic_rasters function here, I found that it works without making a template raster before mosaicking the images. I pasted the example code from the documentation below:
outdir <- tempdir()
gdal_setInstallation()
valid_install <- !is.null(getOption("gdalUtils_gdalPath"))
if(require(raster) && require(rgdal) && valid_install)
{
  layer1 <- system.file("external/tahoe_lidar_bareearth.tif", package="gdalUtils")
  layer2 <- system.file("external/tahoe_lidar_highesthit.tif", package="gdalUtils")
  mosaic_rasters(gdalfile=c(layer1,layer2), dst_dataset=file.path(outdir,"test_mosaic.envi"),
                 separate=TRUE, of="ENVI", verbose=TRUE)
  gdalinfo("test_mosaic.envi")
}
I was faced with this same problem and I used
#Read desired files into R
data_name1<-'file_name1.tif'
r1=raster(data_name1)
data_name2<-'file_name2.tif'
r2=raster(data_name2)
#Merge files
new_data <- raster::merge(r1, r2)
Although it did not produce a new merged raster file on disk, the result was stored in the R environment and produced a merged map when plotted.
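If a file on disk is needed, the in-memory result can simply be written out afterwards, for example:
writeRaster(new_data, filename = "merged_output.tif", format = "GTiff", overwrite = TRUE)  # file name is just an example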
I ran into the following problem when trying to mosaic several rasters on top of each other
In vv[is.na(vv)] <- getValues(x[[i]])[is.na(vv)] :
number of items to replace is not a multiple of replacement length
As @Robert Hijmans pointed out, it was likely because of misaligned rasters. To work around this, I had to resample the rasters first:
library(raster)
x <- raster("Base_raster.tif")
r1 <- raster("Top1_raster.tif")
r2 <- raster("Top2_raster.tif")
# Resample
x1 <- resample(r1, crop(x, r1))
x2 <- resample(r2, crop(x, r2))
# Merge rasters. Make sure to use the right order
m <- merge(merge(x1, x2), x)
# Write output
writeRaster(m,
            filename = file.path("Mosaic_raster.tif"),
            format = "GTiff",
            overwrite = TRUE)
