Subsetting and saving netcdf in R - r

I am trying to subset 8 netcdf files (one of them is here) in a for loop to a shorter time period and then save each subset as a new netcdf file.
I saw that other people have already asked how to subset netcdf files by time period (here or here), but when I do it on my netcdf files the for loop keeps running without finishing (not even the first netcdf file) and I can't figure out why.
Here is the code I use:
library(raster)
library(ncdf4)
library(lubridate)
#setting wd=indir containing netcdf files
setwd(indir)
files=list.files(pattern="nc")
for (j in seq_along(files)){
#setting wd containing netcdf files in the loop
setwd(indir)
b<-brick(files[j])
nc<-nc_open(files[j])
#variable
varname<-names(nc[['var']][3])
varunits <- ncatt_get(nc,varname,"units")[[2]]
lon<-ncvar_get(nc,"lon")
lat<-ncvar_get(nc,"lat", verbose = F)
time<-ncvar_get(nc, "time")
tunits <- ncatt_get(nc, "time", "units")[[2]]
dlname <- ncatt_get(nc, varname,"long_name")[[2]]
nc_close(nc)
#assigning a crs
proj4string(b)<-"+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
#setting time as.Date
tm<-ymd(getZ(b))
#setting time to rasterBrick
b<-setZ(b, tm)
# subsetting
b2<-subset(b, which(tm < as.Date('2006-01-01')))
#setting wd where I want to save the "new" netcdf files
setwd(outdir)
writeRaster(b2, filename = paste0(varname, "_1971_2006_Noce.nc"),
format="CDF", varname=varname, varunit=varunits, longname=dlname,
xname="lon", yname="lat", zname="time", zunit=tunits, overwrite=TRUE)
}
Any help on how to get the loop working would be very much appreciated!
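One way to narrow the problem down is to check the date selection and the output names on their own, before any file is read or written. The sketch below uses only base R; the daily time axis and the file names are made-up stand-ins (in the real loop the dates would come from getZ(b)). It does not explain the hang by itself, but it does show one thing worth checking: the loop builds the output name from varname alone, so if several input files hold the same variable, each iteration would overwrite the previous output.

```r
# Hypothetical daily time axis standing in for getZ(b).
dates <- seq(as.Date("1971-01-01"), as.Date("2010-12-31"), by = "day")

# Indices of the layers to keep: everything before 2006-01-01.
keep <- which(dates < as.Date("2006-01-01"))

# Hypothetical input names. One output name is derived per input file,
# so files that share a variable name cannot overwrite each other.
files <- c("pr_model_a.nc", "pr_model_b.nc")
out_names <- paste0(tools::file_path_sans_ext(basename(files)), "_subset.nc")

print(range(dates[keep]))
print(out_names)
```

If the index looks right, running the loop body once with j <- 1 outside the loop shows which single call is the slow one.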

Related

Upload a Large Batch (70+) of MODIS netCDF Files into R to Convert into Raster Files for QGIS

Issue:
I have a large batch of 70+ Aqua Modis files from Ocean Colour, NASA in .nc format containing coordinates for monthly data specifically for one variable called sea surface temperature (SST) and I need to convert them into Raster files.
My data frame has approximately 650 rows that represent the GPS locations (latitude and longitude coordinates) of dolphin IDs recorded between 2016 and 2021.
Aim:
My ultimate aim is to convert the MODIS files into raster files, stack them using raster::stack(), interpolate and extract the SST values per data point (dolphin IDs) for our research dates using the extract() function, calculate the average SST across all 70+ raster files between 2016-2022, convert the final averaged map into a .tiff file, and write the values into a .csv file.
I am a complete novice with GIS methods and I have been dabbling but I can't find an accurate solution regarding how to perform these actions. I've been uploading and converting the MODIS files individually, and extracting the values using the 'Points to Path' function in QGIS; however, it's very time consuming and I won't have enough time to finish before the deadline using this method.
I have been thinking that there must be a quicker and easier way to analyse my data by using R to convert large batches (70+) of files (e.g. convert netCDF files to raster files and raster files to .tiff files) at the same time between the different file types for further analysis.
So far, I have managed to open one 'Aqua Modis' file using this code:
##Open packages
library(ncdf4)
library(terra)
library('RNetCDF')
#Open the netCDF files
ncin<-nc_open("AQUA_MODIS.20160901_20160930.L3m.MO.SST.sst.4km.nc")
##Check the information on the netCDF file
print(ncin)
##Get the variables from the netCDF file
SST_Variables<-ncvar_get(ncin, "sst")
lon <- ncvar_get(ncin, 'lon')
lat <- ncvar_get(ncin, 'lat')
##Get information about each variables dimensions
dim(SST_Variables) # 8640 4320
dim(lon) # 8640
dim(lat) # 4320
However, I want to open 70+ Aqua Modis files all at once contained in the same folder.
##Open all netCDF files in the folder
filenames = list.files('~/Documents/Ocean_ColorSST_2016_2021',pattern='*.nc',full.names=TRUE)
# Loop over files
for(i in seq_along(filenames)) {
nc = open.nc(filenames[i])
# Read the whole nc file and read the length of the varying dimension
lw = var.get.nc(nc,'sst')
x=dim(lw)
# Vary the time dimension for each file as required
lw = var.get.nc(nc,'sst')
# Add the values from each file to a single data.frame
rbind(df, data.frame(lw))-> SST_Files
}
Error Message
Error in rep(xi, length.out = nvar) :
attempt to replicate an object of type 'closure'
> SST_Files
Error: object 'SST_Files' not found
Extract the values from each netCDF file for the variable "sst"
ncin_SST <- raster::stack(filenames,varname = "sst")
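The error message is consistent with df never having been created: R then finds the built-in function df (the F distribution density from the stats package), which is a closure, and rbind() cannot combine a function with a data frame. The result is also assigned to SST_Files without feeding it back into the next iteration. A minimal base R sketch of the usual pattern follows: collect one piece per file in a list, then bind once at the end. The file names and the read_sst() helper are hypothetical stand-ins for list.files() and the open.nc() / var.get.nc() calls, and summarising each grid to a single mean is an assumption made to keep the result small.

```r
# Hypothetical file names; in practice these come from list.files().
filenames <- c("a.nc", "b.nc", "c.nc")

# Hypothetical reader standing in for open.nc() + var.get.nc():
# returns a small numeric matrix instead of a real SST grid.
read_sst <- function(path) matrix(seq_len(6), nrow = 2)

# Pre-allocate a list with one slot per file.
pieces <- vector("list", length(filenames))
for (i in seq_along(filenames)) {
  m <- read_sst(filenames[i])
  # One summary row per file; a full 8640 x 4320 grid per file would be
  # very large if bound row by row.
  pieces[[i]] <- data.frame(file = filenames[i], mean_sst = mean(m, na.rm = TRUE))
}

# Bind all rows once, after the loop.
SST_Files <- do.call(rbind, pieces)
print(SST_Files)
```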

I want to extract data from multiple netcdf files

I have code in R that extracts the daily values of every month from a single .nc4 file. I have 49 netcdf files. I want to extract the data from all of those files using a loop and write them to a single csv file.
I have this code for a single file but I need help for multiple files.
flux1701 <- nc_open(list[14])
GPP.array1701 <- ncvar_get(flux1701, "GPP")
fillvalue1701 <- ncatt_get(flux1701, "GPP", "_FillValue")
nc_close(flux1701)
GPP.array1701[GPP.array1701 == fillvalue1701$value] <- NA
rbrick1701 <- brick(GPP.array1701, xmn=min(lat), xmx=max(lat), ymn=min(lon), ymx=max(lon), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0")) ##to convert the entire 3d array of data to a raster brick
rbrick1701 <- flip(t(rbrick1701), direction='y')
qro_lon <- -99.941
qro_lat <- 20.717
qro_series1701 <- extract(rbrick1701, SpatialPoints(cbind(qro_lon,qro_lat)), method='simple')
qro_df1701 <- data.frame(day= seq(from=1, to=31, by=1), GPP=t(qro_series1701))
write.csv(qro_df1701, file="qro201701.csv")
I do not think your code gives you the correct extent. I would suggest doing
library(raster)
rbrick1701 <- brick(list[14], varname="GPP")
And here is one of many examples/answers on how to write a loop
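A rough sketch of such a loop, assuming the raster and ncdf4 packages are installed and that every file can be read with raster::brick(). The function names extract_point_series() and extract_all() and the output column names are made up for illustration. The raster functions are called with the raster:: prefix so nothing has to be attached, and the extractor is passed in as an argument so the combining step can be tried without any netCDF files at hand.

```r
# Hypothetical helper: read one file as a brick and pull the series at one point.
# Assumes the raster package (and ncdf4, which it uses for netCDF) is installed.
extract_point_series <- function(nc_file, lon, lat, varname = "GPP") {
  b <- raster::brick(nc_file, varname = varname)
  vals <- raster::extract(b, cbind(lon, lat), method = "simple")
  data.frame(file = basename(nc_file),
             day = seq_len(raster::nlayers(b)),
             value = as.vector(vals))
}

# Apply the extractor to every file and bind the pieces into one data frame.
extract_all <- function(nc_files, lon, lat, varname = "GPP",
                        extractor = extract_point_series) {
  pieces <- lapply(nc_files, extractor, lon = lon, lat = lat, varname = varname)
  do.call(rbind, pieces)
}

# Usage (not run here; the file pattern and output name are assumptions):
# all_series <- extract_all(list.files(pattern = "\\.nc4$"), -99.941, 20.717)
# write.csv(all_series, "qro_all.csv", row.names = FALSE)
```

Keeping the file name as a column means the single csv still records which month each row came from.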

Why does cropping a raster stack change the names of layers?

I'm processing yearly multilayer netCDF files with daily precipitation data from CHIRPS. I have the files for the whole world, each about 1.2 GB in size. I need to calculate indices from the precipitation data for each cell in the raster for a specific region. To do that, I'm trying to crop the files to a rectangle covering my area of interest using the raster R package.
This is the code I'm using, shown here for the first file.
library(ncdf4)
library(raster)
library(rgdal)
# Crop extent
crop_extent <- as(raster::extent(79, 89, 25, 31), "SpatialPolygons")
proj4string(crop_extent) <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
# Set directory with original files
setwd("~/data")
# Read file
chirps81 <- stack("chirps-v2.0.1981.days_p05.nc")
chirps81crop <- crop(chirps81, crop_extent)
# Write cropped file back to different folder
setwd("~/croppeddata")
writeRaster(chirps81crop, "chirps81crop.nc", overwrite=TRUE)
For some reason, however, the layers lose their names when the file is written. In the original files and after cropping, the layer names have the format "X1981.01.01". But after writing and reading the netCDF file with newfile <- stack("chirps81crop.nc"), the layer names are changed to the format 'X1' up to 'X365'. I think it should be fine to work with, assuming the order of the layers didn't get mixed up, but I don't understand what is happening to the layer names or whether this happens because something is wrong with the code.
It's the writeRaster() function that is losing the layer names, not the crop operation. It is possible to use lower level ncdf functions to assign a numeric value (not a string unfortunately) to each layer which will then show up in the name of the layers after reading. Taking inspiration from the example here, I created some code that shows this.
library(ncdf4)
library(raster)
library(rgdal)
# Crop extent
crop_extent <- as(raster::extent(5.74, 5.75, 50.96, 50.97), "SpatialPolygons")
proj4string(crop_extent) <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
# make a sample file
r <- raster(system.file("external/test.grd", package="raster"))
r.latlon <- projectRaster(r, crs = proj4string(crop_extent))
writeRaster(x=r.latlon, filename = 'test.nc', format = 'CDF', overwrite=TRUE)
# read the sample as a 2 layer stack and crop it
test <- stack('test.nc', 'test.nc')
writeRaster(test, 'teststack.nc', overwrite=TRUE, format='CDF')
testcrop <- crop(test, crop_extent)
names(testcrop)
# [1] "test.1" "test.2"
# write the cropped file and make the zname equal to Layer
writeRaster(testcrop, 'testcrop.nc', overwrite=TRUE, format='CDF', zname='Layer')
# open the cdf file directly
nc <- nc_open('testcrop.nc', write = T)
# give the layers numbers starting from 10 so
# we can see them easily
layers = 1:nlayers(testcrop) + 10
layers
# [1] 11 12
ncvar_put(nc, 'Layer', layers)
nc_close(nc)
newtestcrop <- stack('testcrop.nc')
names(newtestcrop)
# [1] "X11" "X12"
nc <- nc_open('testcrop.nc', write = F)
layers = ncvar_get(nc, 'Layer')
layers
# [1] 11 12
nc_close(nc)
So it is possible to get names with numbers under your control when writing the raster, but I don't know enough about your environment to determine if this will help since it might be tricky to map the names you need to a single unambiguous number.
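For date-style names like "X1981.01.01", one candidate for that unambiguous number is the day count since 1970-01-01, which base R can convert in both directions. This is only a sketch of the mapping. The sample names are made up to match the format in the question, and whether the numbers survive the write/read cycle as shown above is assumed rather than tested here.

```r
# Layer names in the format reported in the question (sample values).
layer_names <- c("X1981.01.01", "X1981.01.02", "X1981.12.31")

# Parse the names as dates, then turn them into days since 1970-01-01.
dates <- as.Date(layer_names, format = "X%Y.%m.%d")
z_values <- as.numeric(dates)

# After reading the file back, the numbers can be mapped to the original names.
restored <- format(as.Date(z_values, origin = "1970-01-01"), "X%Y.%m.%d")

print(z_values)
print(restored)
```

The z_values vector would take the place of layers in the ncvar_put() call above.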
I hope you don't mind me offering a non-R solution, but this task is much easier from the command line using CDO:
cdo sellonlatbox,79,89,25,31 chirps-v2.0.1981.days_p05.nc cropped_file.nc
Which indices did you want to calculate? I suspect it is possible to calculate those quickly and easily with CDO functions too...

convert multiple .csv files to .shp files in R

I should preface that I am terrible at loops in R and I recognize this question is similar to this post Batch convert .csv files to .shp files in R. However, I was not able to leave a comment to see if this user found a solution on this thread because I do not have enough reputation points and the suggested solutions did not help me.
I have multiple .csv files that contain GPS points of animals. I would like to create multiple shapefiles for spatial analysis. I have tried creating a loop to read in the .csv file, make spatial data from csv file with latitudes and longitudes, transform the spatial data frame to a UTM projection so that I can calculate distances and then write the file as a shapefile. Here is the loop I have tried, but I think my indexing in the out and utm_out is incorrect.
Here is some test data; remember to set your working directory before writing the .csv:
#write sample .csv for animal 1
ID1<-rep(1, 3)
Latitude1<-c(25.48146, 25.49211, 25.47954)
Longitude1<-c(-84.66530, -84.64892, -84.69765)
df1<-data.frame(ID1, Latitude1, Longitude1)
colnames(df1)<-c("ID", "Latitude", "Longitude")
write.csv(df1, "df1.csv", row.names = FALSE)
#write sample .csv for animal 2
ID2<-rep(2, 2)
Latitude2<-c(28.48146, 28.49211)
Longitude2<-c(-88.66530, -88.64892)
df2<-data.frame(ID2, Latitude2, Longitude2)
colnames(df2)<-c("ID", "Latitude", "Longitude")
write.csv(df2, "df2.csv", row.names = FALSE)
#create a list of file names in my working directory where .csv files are located
all.files<-list.files(pattern="\\.csv")
#set the points geographic coordinate system
points_crs <- crs("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
#write a loop to read in each file, specify the file as a spatial points data frame & project then write as .shp file
for(i in 1:length(all.files)) {
file<-read.csv(all.files[i], header=TRUE, sep = ",", stringsAsFactors = FALSE) #read files
coords<-file[c("Longitude", "Latitude")] #select coordinates from the file
out<-SpatialPointsDataFrame(
coords = coords,
file,
proj4string = points_crs) #create Spatial Points Data Frame and specify geographic coordinate system = points_crs
utm_out<-spTransform(out, crs("+init=epsg:32616")) #transform to UTM
writeOGR(utm_out[[i]],dsn="C:/Users/Desktop/Shapefile_test",
"*", driver="ESRI Shapefile")
}
This gives me the following: Error: inherits(obj, "Spatial") is not TRUE
I've also tried:
for(i in 1:length(all.files)) {
file<-read.csv(all.files[i], header=TRUE, sep = ",", stringsAsFactors = FALSE)
coords<-file[c("Longitude", "Latitude")]
out<-SpatialPointsDataFrame(
coords = coords,
file,
proj4string = points_crs)
utm_out<-spTransform(out[[i]], crs("+init=epsg:32616"))
writeOGR(utm_out[[i]],dsn="C:/Users/Desktop/Shapefile_test", "*", driver="ESRI Shapefile")
}
This produces: Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘spTransform’ for signature ‘"integer", "CRS"’
Ideally, the output file will be something like "animal1.shp" "animal2.shp"...etc.
Alternatively, I do have animal 1 and 2 in one file. I could bring in this file, set the projection and then create multiple subsets for each unique animal ID and write the subset to a .shp file, but I am having issues with subsetting the spatial data and I think that is a topic for another thread.
Thanks in advance for your assistance.
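The single-file alternative mentioned above can be sketched in base R with split(), which breaks a data frame into one piece per animal ID before anything is converted to a spatial object. The data frame below simply stacks the two sample animals from the test data, and the "animal<ID>.shp" naming follows the desired output. The conversion and write step is left as a comment because it depends on the spatial packages in use.

```r
# The two sample animals from the test data, combined into one data frame.
all_animals <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  Latitude = c(25.48146, 25.49211, 25.47954, 28.48146, 28.49211),
  Longitude = c(-84.66530, -84.64892, -84.69765, -88.66530, -88.64892)
)

# One data frame per animal, named by ID.
by_animal <- split(all_animals, all_animals$ID)

# One output file name per animal.
out_files <- paste0("animal", names(by_animal), ".shp")

for (k in seq_along(by_animal)) {
  d <- by_animal[[k]]
  message(out_files[k], ": ", nrow(d), " points")
  # Build the spatial points object from d and write it to out_files[k] here.
}
```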
Here is a minor variation on the solution by Mako212
library(raster)
all.files <- list.files(pattern="\\.csv$")
out.files <- gsub("\\.csv$", ".shp", all.files)
crs <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
for(i in 1:length(all.files)) {
d <- read.csv(all.files[i], stringsAsFactors = FALSE)
sp <- SpatialPointsDataFrame(coords = d[c("Longitude", "Latitude")], d, proj4string = crs)
utm <- spTransform(sp, CRS("+proj=utm +zone=16 +datum=WGS84"))
shapefile(utm, out.files[i])
}
Expanding on my comment, it's important to test batch processing operations like this on a single file first, and then adapt your solution as necessary to process the batch. My first step in troubleshooting your issue was to strip away the for loop and try running the code with the first file, all.files[1]. It still failed, indicating there was at least one issue not related to the loop.
Try this out. I've changed crs to CRS because the function in sp is capitalized. Your loop range can be simplified to for(i in all.files), and I removed the attempts to access non-existent lists with out and utm_out
require(sp)
require(rgdal)
points_crs <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
#write a loop to read in each file, specify the file as a spatial points data frame & project then write as .shp file
for(i in all.files) {
file <- read.csv(i, header=TRUE, sep = ",", stringsAsFactors = FALSE) #read files
coords <- file[c("Longitude", "Latitude")] #select coordinates from the file
out <- SpatialPointsDataFrame(
coords = coords,
file,
proj4string = points_crs) #create Spatial Points Data Frame and specify geographic coordinate system = points_crs
names <- substr(i, 1, nchar(i)-4) #file name without the ".csv" extension
utm_out <- spTransform(out, CRS("+init=epsg:32616")) #transform to UTM
writeOGR(utm_out,dsn="/path/Shapefile_test", layer=names, driver="ESRI Shapefile")
}
Edit:
I had to modify the writeOGR line by specifying a layer name:
writeOGR(utm_out,dsn="/path/Shapefile_test", layer="test", driver="ESRI Shapefile")

Iterating over a function to combine many raster stacks into one

Been stuck on this for a while now. Looked everywhere for an answer, but I can't seem to find anything on Stack. Any help you all can give would be much appreciated.
My main issue is that I need to import many, many netcdf4 files, create a raster brick from each, then combine the bricks into a "master brick" per variable. To give you a clearer example, I have 40 years (netcdf = 40) of many climate variables (n = 15) at a daily resolution. The goal is to aggregate to monthly, but first I have to get this function to read in all years of netcdf files for one variable and combine them into one large stack.
What I have now reads as follows:
# Libraries --------------------------------------------------------------
library(raster)
library(ncdf4)
# Directories -------------------------------------------------------------
tmp_dl <- list.files("/Users/NateM", pattern = "nc",
full.names = TRUE)
# Functions ---------------------------------------------------------------
rstlist = stack()
netcdf_import <- function(file) {
nc <- nc_open(file)
nc_att <- attributes(nc$var)$names
ncvar <- ncvar_get(nc, nc_att)
rm(nc)
proj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
rbrck <- brick(ncvar, crs= proj)
rm(ncvar)
extent(rbrck) <- c(-124.772163391113, -67.06383005778, 25.0626894632975,
49.3960227966309)
}
t <- for(i in 1:length(tmp_dl)) {
x <- netcdf_import(tmp_dl[i])
rstlist <- stack(rstlist, x)
}
allyears <- stack(t)
Two years of the data can be found here:
https://www.northwestknowledge.net/metdata/data/pdsi_2016.nc
https://www.northwestknowledge.net/metdata/data/pdsi_2015.nc
Any help would be most welcomed. Thank you all in advance, and if this is a duplicate post I apologize; I looked far and wide to no avail.
Your code is fine, you just need to return the loaded brick rbrck from your function, otherwise you'll get the extent.
As for loading and stacking, I'd suggest using lapply to apply the function to each datafile. This will give you a neat list with a year per item. There you could do some more processing and finally just call stack on the list to produce your "master brick".
Mind that I only did this with two files, so I'm not sure about the size of the whole thing when you do it with 40.
Here's your modified code:
# Libraries --------------------------------------------------------------
library(raster)
library(ncdf4)
# Directories -------------------------------------------------------------
tmp_dl <- list.files("/Users/NateM", pattern = "nc",
full.names = TRUE)
# Functions ---------------------------------------------------------------
netcdf_import <- function(file) {
nc <- nc_open(file)
nc_att <- attributes(nc$var)$names
ncvar <- ncvar_get(nc, nc_att)
nc_close(nc) # close the file handle rather than only removing the variable
proj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
rbrck <- brick(ncvar, crs= proj)
rm(ncvar)
extent(rbrck) <- c(-124.772163391113, -67.06383005778, 25.0626894632975,
49.3960227966309)
return(rbrck)
}
# apply function
allyrs <- lapply(tmp_dl,netcdf_import)
# stack to master brick
allyrs <- do.call(stack,allyrs)
HTH
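The reason the original function returned the extent can be reproduced in base R without any raster data. A function returns the value of its last expression, and the value of a replacement call such as extent(x) <- value is the assigned value, not the modified object. The sketch below uses names(x) <- ... as a stand-in for extent(rbrck) <- ...; the function names are made up.

```r
# Last expression is a replacement call: the function returns the
# assigned value (the names), not the modified vector.
without_return <- function() {
  x <- 1:3
  names(x) <- c("a", "b", "c")
}

# Ending with the object itself returns the modified vector.
with_return <- function() {
  x <- 1:3
  names(x) <- c("a", "b", "c")
  x
}

bad_result <- without_return()
good_result <- with_return()
print(bad_result)
print(good_result)
```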

Resources