How to merge netcdf files separately in R?

I am an R user and would like some help with the following:
I have two netcdf files (each with dimensions 30x30x365) and one more with dimensions 30x30x366. Each file contains a year's worth of daily data, where the last dimension is time. I want to concatenate them along the time dimension, i.e. I want the output file to contain 30x30x1096.
Note: I have seen a similar question, but its output is an average (i.e. 30x30x3), which I do not want.

From the comment below, you seem to want to merge 3 files in the time dimension. As an alternative to R, you could do this quickly from the command line using cdo (Climate Data Operators):
cdo mergetime file1.nc file2.nc file3.nc mergedfile.nc
or using wildcards:
cdo mergetime file?.nc mergedfile.nc
cdo is easy to install under Ubuntu:
sudo apt install cdo
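If you would rather stay inside R, you can issue the same command through system(); this assumes cdo is installed and on your PATH:
system("cdo mergetime file1.nc file2.nc file3.nc mergedfile.nc")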

Without knowing exactly what dimensions and variables you have, this may be enough to get you started:
library(ncdf4)
output_data <- array(dim = c(30, 30, 1096))
files <- c('file1.nc', 'file2.nc', 'file3.nc')
days <- c(365, 365, 366)
# Open each file and add it to the final output array
for (i in seq_along(files)) {
  nc <- nc_open(files[i])
  input_arr <- ncvar_get(nc, varid = 'var_name')
  nc_close(nc)
  # Calculate the indices where this file's data should go
  if (i > 1) {
    day_idx <- (1:days[i]) + sum(days[1:(i - 1)])
  } else {
    day_idx <- 1:days[i]
  }
  output_data[, , day_idx] <- input_arr
}
# Write out output_data to a NetCDF. How exactly this should be done depends on what
# dimensions and variables you have.
# See here for more:
# https://publicwiki.deltares.nl/display/OET/Creating+a+netCDF+file+with+R
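For instance, here is a minimal sketch of that write step, assuming a single variable named var_name as above; the coordinate values, units, and fill value are placeholders to swap for the ones in your input files:
# Dimension and variable definitions -- placeholder coordinates and units
lon_dim  <- ncdim_def("lon",  units = "degrees_east",  vals = 1:30)
lat_dim  <- ncdim_def("lat",  units = "degrees_north", vals = 1:30)
time_dim <- ncdim_def("time", units = "days since 2000-01-01", vals = 0:1095)
var_def  <- ncvar_def("var_name", units = "", dim = list(lon_dim, lat_dim, time_dim), missval = -9999)
# Create the file, write the merged array, and close
nc_out <- nc_create("mergedfile.nc", vars = list(var_def))
ncvar_put(nc_out, var_def, output_data)
nc_close(nc_out)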

Related

Retaining original file names when processing multiple raster files using R

I have the following problem: I need to process multiple raster files using the same function from the R package landscapemetrics. My raster files are parts of a country map, all of the same shape and size (i.e. quadrants). I figured out the code for one file, but I have to do the same with more than 600 rasters, so doing it manually is very irrational. The steps in my code are the following:
# 1. I load "raster" and "landscapemetrics" packages:
library(raster)
library(landscapemetrics)
# 2. I read in my quadrant:
Quadrant <- raster("C:\\Users\\customer\\Documents\\ ... \\2434-44.tif")
# 3. I process the raster to get landscape metrics tibble:
LS_metrics <- calculate_lsm(landscape = Quadrant)
# 4. Finally, I write it into a csv:
write.csv(LS_metrics, file = "2434-44.csv")
I need to keep the same file name for my csv files as for the tif files (e.g. results from processing quadrant "2434-44.tif" need to be stored in "2434-44.csv", possibly in a folder in the working directory).
I am new to R. I tried to use list.files() and then apply a for loop, but my code did not work.
I need your advice.
Yours faithfully,
Denis
Your question is really about iteration and character (filename) manipulation, not about landscapemetrics etc. There are many similar questions on this site and resources elsewhere that you can consult. The basic approach can be like this:
# get input filenames
inf <- list.files("/my/path", pattern = "\\.tif$", full.names = TRUE)
# create output filenames (escape the dot so only the extension matches)
outf <- gsub("\\.tif$", ".csv", basename(inf))
# perhaps put output files in a particular folder
dir.create("out", showWarnings = FALSE)
outf <- file.path("out", outf)
# iterate
for (i in seq_along(inf)) {
  # read input
  input <- raster(inf[i])
  # do something
  output <- data.frame(id = 1)
  # write output
  write.csv(output, outf[i])
}
It's very hard to help without further information. What was the issue with your approach of looping through all files using list.files()? In general, this should work.
Furthermore, you most likely don't want to calculate all available landscape metrics, but rather specify a subselection in the calculate_lsm() call, as sketched below.
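A sketch combining the loop above with calculate_lsm(); the metric names passed to what are examples only (total area and number of patches at the landscape level), so pick the ones you actually need:
library(raster)
library(landscapemetrics)
inf  <- list.files("/my/path", pattern = "\\.tif$", full.names = TRUE)
outf <- file.path("out", gsub("\\.tif$", ".csv", basename(inf)))
dir.create("out", showWarnings = FALSE)
for (i in seq_along(inf)) {
  Quadrant   <- raster(inf[i])
  # compute only a subselection of metrics
  LS_metrics <- calculate_lsm(landscape = Quadrant, what = c("lsm_l_ta", "lsm_l_np"))
  write.csv(LS_metrics, file = outf[i])
}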

Finding two directories (which are in ten min bins) based on a time. A diabolical directory disaster

I have looked all round and can't find a working solution. A bit of background:
I am using R to find raw images based on a validated image name (all this bit works). The issue is there are at least 30 date directories with each of these having a large number of time directories, these are divided up into 10 min bins. Looking in all the bins or just the parent directory is asking a bit too much computationally. An example format of the bin would be
R_Experiments\RawImageFinder\Raw\2016-10-08\1536
R_Experiments\RawImageFinder\Raw\2016-10-08\1546
It is important to note that the bins are not consistent in their starting minutes; the start can vary, and herein lies the problem.
I know what time the image was taken from the file name using the following bit of code
SingleImage <- "Pia1.2016-10-08.1103+N2353_hc.tif"
TimeDir <- sub('.*?\\.\\d{4}-\\d{2}-\\d{2}\\.(\\d{2})(\\d{2}).*', '\\1:\\2', SingleImage)
TimeDir <- sub(':','', TimeDir)
#
> print(TimeDir)
[1] "1103"
So the image could belong in any of the following bins:
\1053,\1054,\1055,..you get the idea...,\1112,\1113
it just depends when the bin was started. So I want the "finder" code to look in all possible bins within ten mins either side (as per the example above); obviously some of them will not exist.
I thought about doing:
TimeDir1 <- as.numeric(TimeDir)+1
TimeDir2 <- as.numeric(TimeDir)+2
but the issue arises when we get to 59 mins, because there is no such thing as 61 mins in the hour (haha).
I then use the following to tell which directories to search, although I am a bit stuck also on how to tell it to look in multiple directories.
Directorytosearch <- ParentDirectory
#this has the \ in it, same for time, it works
Directorytosearch <- sub('$',paste(DateDir), Directorytosearch)
Directorytosearch <- sub('$', paste(TimeDir), Directorytosearch)
IMAGEtocopy <- list.files(
  path = c(Directorytosearch),
  recursive = TRUE,
  include.dirs = FALSE,
  full.names = FALSE,
  pattern = SingleImagePattern)
Any help really would be great!
Could I be using the strptime function?
Many thanks
Jim
Update for @Nya:
test <- strptime("1546", format = "%H%M")
dirs[select.image.dir(test, dirs.time)]
# [1] "test/1546"
To list directories, you are looking for the list.dirs() function. Let's assume that the following example was obtained from such a search through all the directories.
# directories possibly obtained with list.dirs
dirs <- c("test/1536", "test/1546", "test/1556", "test/1606")
A good practice then would be to extract both the date and time components from the directories and image file names. Here, I will only use the time since that was the original request.
# convert times
dirs.time <- sub(".*/(\\d+)$", "\\1", dirs)
dirs.time <- strptime(dirs.time, format="%H%M")
# test data, in your case from image file names
test <- strptime(c("1538", "1559", "1502"), format="%H%M")
The following function selects the desired directories by checking whether the time from the image file is within the 10 minute interval either side of the directory time. It then provides the indices where the image could be located.
select.image.dir <- function(i, dt) {
  res <- NULL
  # adding and subtracting 10 minutes, converted to seconds
  ik <- c(i - 600, i + 600)
  condition <- c(ik[1] <= dt & ik[2] >= dt)
  if (any(condition)) {
    res <- which(condition)
  } else {
    res <- NA
  }
  res
}
Note that the updated function accepts a single image file time to test in each round. The indices can then be used to extract the path to the image directory. The last time is outside the range of the directories and thus the function returns NA.
dirs[select.image.dir(test[1], dirs.time)]
# [1] "test/1536" "test/1546"
dirs[select.image.dir(test[2], dirs.time)]
# [1] "test/1556" "test/1606"
dirs[select.image.dir(test[3], dirs.time)]
# [1] NA NA NA NA
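Since list.files() accepts a vector of paths, the matched directories can then be searched directly for the image; a sketch reusing the names from the question (SingleImagePattern is assumed to be defined as in the original code):
candidate.dirs <- dirs[select.image.dir(test[1], dirs.time)]
IMAGEtocopy <- list.files(
  path = candidate.dirs,
  recursive = TRUE,
  full.names = TRUE,
  pattern = SingleImagePattern)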

Reading hdf files into R and converting them to geoTIFF rasters

I'm trying to read MODIS 17 data files into R, manipulate them (cropping etc.) and then save them as GeoTIFFs. The data files come in .hdf format and there doesn't seem to be an easy way to read them into R.
Compared to other topics there isn't a lot of advice out there and most of it is several years old. Some of it also advises using additional programmes but I want to stick with just using R.
What package/s do people use for dealing with .hdf files in R?
OK, so my MODIS hdf files were hdf4 rather than hdf5 format. It was surprisingly difficult to discover this; MODIS don't mention it on their website, but there are a few hints in various blogs and Stack Exchange posts. In the end I had to download HDFView to find out for sure.
R doesn't do hdf4 files, and pretty much all the packages (like rgdal) only support hdf5 files. There are a few posts about downloading drivers and compiling rgdal from source, but it all seemed rather complicated, and the posts were for Mac or Unix and I'm using Windows.
Basically, gdal_translate from the gdalUtils package is the saving grace for anyone who wants to use hdf4 files in R. It converts hdf4 files into GeoTIFFs without reading them into R. This means you can't manipulate them at all, e.g. by cropping them, so it's worth getting the smallest tiles you can (for MODIS data, through something like Reverb) to minimise computing time.
Here's an example of the code:
library(gdalUtils)
library(raster)
# Provides detailed data on hdf4 files but takes ages
gdalinfo("MOD17A3H.A2000001.h21v09.006.2015141183401.hdf")
# Tells me what subdatasets are within my hdf4 MODIS files and makes them into a list
sds <- get_subdatasets("MOD17A3H.A2000001.h21v09.006.2015141183401.hdf")
sds
[1] "HDF4_EOS:EOS_GRID:MOD17A3H.A2000001.h21v09.006.2015141183401.hdf:MOD_Grid_MOD17A3H:Npp_500m"
[2] "HDF4_EOS:EOS_GRID:MOD17A3H.A2000001.h21v09.006.2015141183401.hdf:MOD_Grid_MOD17A3H:Npp_QC_500m"
# I'm only interested in the first subdataset and I can use gdal_translate to convert it to a .tif
gdal_translate(sds[1], dst_dataset = "NPP2000.tif")
# Load and plot the new .tif
rast <- raster("NPP2000.tif")
plot(rast)
# If you have lots of files then you can make a loop to do all this for you
files <- dir(pattern = "\\.hdf$")
files
[1] "MOD17A3H.A2000001.h21v09.006.2015141183401.hdf" "MOD17A3H.A2001001.h21v09.006.2015148124025.hdf"
[3] "MOD17A3H.A2002001.h21v09.006.2015153182349.hdf" "MOD17A3H.A2003001.h21v09.006.2015166203852.hdf"
[5] "MOD17A3H.A2004001.h21v09.006.2015099031743.hdf" "MOD17A3H.A2005001.h21v09.006.2015113012334.hdf"
[7] "MOD17A3H.A2006001.h21v09.006.2015125163852.hdf" "MOD17A3H.A2007001.h21v09.006.2015169164508.hdf"
[9] "MOD17A3H.A2008001.h21v09.006.2015186104744.hdf" "MOD17A3H.A2009001.h21v09.006.2015198113503.hdf"
[11] "MOD17A3H.A2010001.h21v09.006.2015216071137.hdf" "MOD17A3H.A2011001.h21v09.006.2015230092603.hdf"
[13] "MOD17A3H.A2012001.h21v09.006.2015254070417.hdf" "MOD17A3H.A2013001.h21v09.006.2015272075433.hdf"
[15] "MOD17A3H.A2014001.h21v09.006.2015295062210.hdf"
filename <- substr(files,11,14)
filename <- paste0("NPP", filename, ".tif")
filename
[1] "NPP2000.tif" "NPP2001.tif" "NPP2002.tif" "NPP2003.tif" "NPP2004.tif" "NPP2005.tif" "NPP2006.tif" "NPP2007.tif" "NPP2008.tif"
[10] "NPP2009.tif" "NPP2010.tif" "NPP2011.tif" "NPP2012.tif" "NPP2013.tif" "NPP2014.tif"
for (i in 1:15) {
  sds <- get_subdatasets(files[i])
  gdal_translate(sds[1], dst_dataset = filename[i])
}
Now you can read your .tif files into R using, for example, raster from the raster package and work as normal. I've checked the resulting files against a few I converted manually using QGIS and they match, so I'm confident the code is doing what I think it is. Thanks to Loïc Dutrieux and this for the help!
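For example, cropping one of the converted tiles is then straightforward (the extent values below are made up):
library(raster)
r <- raster("NPP2000.tif")
# extent(xmin, xmax, ymin, ymax) -- placeholder coordinates
r_crop <- crop(r, extent(30, 35, -5, 0))
writeRaster(r_crop, "NPP2000_crop.tif", overwrite = TRUE)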
These days you can use the terra package with HDF files.
Either get the sub-datasets:
library(terra)
s <- sds("file.hdf")
s
Each of these can be extracted as a SpatRaster like this:
s[1]
Or create a SpatRaster of all subdatasets like this:
r <- rast("file.hdf")
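As a sketch, the gdal_translate conversion from the answer above could then be done with terra alone (file names assumed from that answer):
library(terra)
s <- sds("MOD17A3H.A2000001.h21v09.006.2015141183401.hdf")
npp <- s[1]   # first subdataset (NPP) as a SpatRaster
writeRaster(npp, "NPP2000.tif", overwrite = TRUE)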
The following worked for me. It's a short program that just takes the input folder name. Make sure you know which subdataset you want; I was interested in subdataset 1.
library(raster)
library(gdalUtils)
inpath <- "E:/aster200102/ast_200102"
setwd(inpath)
filenames <- list.files(pattern = "\\.hdf$", full.names = FALSE)
for (filename in filenames) {
  sds <- get_subdatasets(filename)
  gdal_translate(sds[1], dst_dataset = paste0(substr(filename, 1, nchar(filename) - 4), ".tif"))
}
Use the HEG toolkit provided by NASA to convert your hdf file to GeoTIFF, then use any package ("raster", for example) to read the file. I do the same for both old and new hdf files.
Here's the link: https://newsroom.gsfc.nasa.gov/sdptoolkit/HEG/HEGHome.html
Take a look at the NASA products supported here: https://newsroom.gsfc.nasa.gov/sdptoolkit/HEG/HEGProductList.html
Hope this helps.
This script has been very useful and I managed to convert a batch of 36 files with it. However, my problem is that the conversion does not seem correct. When I do it using the ArcGIS 'Make NetCDF Raster Layer' tool, I get different results, and I am able to convert the numbers from Kelvin to Celsius using the simple formula RasterValue * 0.02 - 273.15. With the results from the R conversion I don't get the right values after that conversion, which leads me to believe the ArcGIS conversion is good and the R conversion returns an error.
library(gdalUtils)
library(raster)
setwd("D:/Data/Climate/MODIS")
# Get a list of sds names
sds <- get_subdatasets('MOD11C3.A2009001.006.2016006051904.hdf')
# Isolate the name of the first sds
name <- sds[1]
filename <- 'Rasterinr.tif'
gdal_translate(sds[1], dst_dataset = filename)
# Load the Geotiff created into R
r <- raster(filename)
# Identify files to read:
# Identify files to read:
rlist <- list.files(getwd(), pattern = "hdf$", full.names = FALSE)
# Extract the last 9 characters of each MODIS filename for use in the new .img filenames
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}
filenames0 <- substrRight(rlist, 9)
# The first 5 of those characters are the identifying suffix for each MODIS file:
filenamessuffix <- substr(filenames0, 1, 5)
listofnewnames <- c("2009.01.MODIS_","2009.02.MODIS_","2009.03.MODIS_","2009.04.MODIS_","2009.05.MODIS_",
"2009.06.MODIS_","2009.07.MODIS_","2009.08.MODIS_","2009.09.MODIS_","2009.10.MODIS_",
"2009.11.MODIS_","2009.12.MODIS_",
"2010.01.MODIS_","2010.02.MODIS_","2010.03.MODIS_","2010.04.MODIS_","2010.05.MODIS_",
"2010.06.MODIS_","2010.07.MODIS_","2010.08.MODIS_","2010.09.MODIS_","2010.10.MODIS_",
"2010.11.MODIS_","2010.12.MODIS_",
"2011.01.MODIS_","2011.02.MODIS_","2011.03.MODIS_","2011.04.MODIS_","2011.05.MODIS_",
"2011.06.MODIS_","2011.07.MODIS_","2011.08.MODIS_","2011.09.MODIS_","2011.10.MODIS_",
"2011.11.MODIS_","2011.12.MODIS_")
# Final new names for converted files:
newnames <- vector()
for (i in 1:length(listofnewnames)) {
  newnames[i] <- paste0(listofnewnames[i], filenamessuffix[i], ".img")
}
# Loop converting the first subdataset of each HDF file to a raster
for (i in 1:length(rlist)) {
  sds <- get_subdatasets(rlist[i])
  gdal_translate(sds[1], dst_dataset = newnames[i])
}

Read multiple ncdf files and make an average in R

Using R, I am trying to open my NetCDF data, which contains a 5-dimensional space with 15 variables (the variable used for calculation is a 1000x920 matrix).
This problem actually looks similar to an earlier question.
I got an explanation from there and elsewhere.
At first I used the RNetCDF package, but after some trials I found inconsistencies in how the package read my data, and things went better after switching to the ncdf package.
There is no problem opening the data in a single file, but when I try looping over more than a hundred files in a folder for a specific variable (for example, variable no. 15), the program fails.
days <- formatC(001:004, width = 3, flag = "0")
ncfiles <- lapply(days, function(d) {
  filename <- paste("data", d, ".nc", sep = "")
  open.ncdf(filename)
})
The same happens when I try a command like this for a specific variable:
sapply(ncfiles, function(file) { get.var.ncdf(file, "var15") })
So my question is: is there any solution to read all the netcdf files, extract a specific variable, and then do the calculation in one frame? With the earlier solutions I failed to extract variable no. 15 from all the netcdf data.
Thanks for any solution to this problem.
UPDATE:
This is what I have done so far. When I write:
library(ncdf)
files <- list.files("allnc/", pattern = "\\.nc$", full.names = TRUE)
for (i in seq_along(files)) {
  nc <- open.ncdf(files[i])
  lw <- get.var.ncdf(nc, "var15")
  x <- dim(lw)
  df <- rbind(df, data.frame(lw))
}
I can get all the netcdf data by running nc.
So how can I get the variable data with new names automatically, like lw1, lw2, ...etc.?
I can't apply
var1 <- lapply(files, FUN = get.var.ncdf, variable = "var15")
and then do the calculation with all the data.
The other technique I tried used the RNetCDF package and a loop:
# Declare data frame
df <- NULL
# Open all files
files <- list.files("allnc/", pattern = "\\.nc$", full.names = TRUE)
# Loop over files
for (i in seq_along(files)) {
  nc <- open.nc(files[i])
  # Read the whole variable and the length of the varying dimension
  # (here the 3rd dimension, specifically time)
  lw <- var.get.nc(nc, "var15")
  x <- dim(lw)
  # Add the values from each file to a single data.frame
  df <- rbind(df, data.frame(lw))
}
I can get the variable data, but I only get data from one of my nc files.
Note: my data files are named like data20150102001.nc, data20150102002.nc, ...etc.
This solution uses NCO, not R. You may use it to check your R solution:
ncra -v var15 data20150102*.nc out.nc
That is all.
Full documentation is in the NCO User Guide.
You can use the ensemble statistics capabilities of CDO, but note that on some systems the number of files is limited to 256:
cdo ensmean data20150102*.nc ensmean.nc
You can replace "mean" with the statistic of your choice: max, std, var, min, etc.
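For comparison, a pure-R sketch of the same ensemble mean using the ncdf4 package (the question used the older ncdf package); this assumes every file shares the same grid and contains the variable var15:
library(ncdf4)
files <- list.files("allnc/", pattern = "\\.nc$", full.names = TRUE)
arrs <- lapply(files, function(f) {
  nc <- nc_open(f)
  on.exit(nc_close(nc))
  ncvar_get(nc, "var15")
})
# element-wise mean across all files
ens_mean <- Reduce(`+`, arrs) / length(arrs)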

Opening and reading multiple netcdf files with RnetCDF

Using R, I am trying to open all the netcdf files I have in a single folder (e.g. 20 files), read a single variable, and create a single data.frame combining the values from all files. I have been using RNetCDF to read netcdf files. For a single file, I read the variable with the following commands:
library('RNetCDF')
nc = open.nc('file.nc')
lw = var.get.nc(nc,'LWdown',start=c(414,315,1),count=c(1,1,240))
where 414 & 315 are the longitude and latitude of the value I would like to extract and 240 is the number of timesteps.
I have found this thread which explains how to open multiple files. Following it, I have managed to open the files using:
filenames= list.files('/MY_FOLDER/',pattern='*.nc',full.names=TRUE)
ldf = lapply(filenames,open.nc)
but now I'm stuck. I tried
var1= lapply(ldf, var.get.nc(ldf,'LWdown',start=c(414,315,1),count=c(1,1,240)))
but it doesn't work.
The added complication is that every nc file has a different number of timesteps. So I have 2 questions:
1: How can I open all files, read the variable in each file and combine all values in a single data frame?
2: How can I set the last dimension in count to vary for all files?
Following @mdsummer's comment, I have tried a do loop instead and have managed to do everything I needed:
# Declare data frame
df = NULL
# Open all files
files = list.files('MY_FOLDER/', pattern = '*.nc', full.names = TRUE)
# Loop over files
for (i in seq_along(files)) {
  nc = open.nc(files[i])
  # Read the whole nc file and get the length of the varying dimension
  # (here the 3rd dimension, specifically time)
  lw = var.get.nc(nc, 'LWdown')
  x = dim(lw)
  # Vary the time dimension for each file as required
  lw = var.get.nc(nc, 'LWdown', start = c(414, 315, 1), count = c(1, 1, x[3]))
  # Add the values from each file to a single data.frame
  df <- rbind(df, data.frame(lw))
}
There may be a more elegant way but it works.
You're passing the additional function parameters incorrectly; you should use ... for that. Here's a simple example of how to pass na.rm to mean.
x.var <- 1:10
x.var[5] <- NA
x.var <- list(x.var)
x.var[[2]] <- 1:10
lapply(x.var, FUN = mean)
lapply(x.var, FUN = mean, na.rm = TRUE)
Edit: for your specific example, this would be something along the lines of
var1 <- lapply(ldf, FUN = var.get.nc, variable = 'LWdown', start = c(414, 315, 1), count = c(1, 1, 240))
though this is untested.
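For question 2 (a varying time dimension), RNetCDF allows NA in count to mean "read to the end of that dimension", so something like this sketch should avoid reading each file twice:
var1 <- lapply(ldf, FUN = var.get.nc, variable = 'LWdown',
               start = c(414, 315, 1), count = c(1, 1, NA))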
I think this is much easier to do with CDO, as you can select the varying timestep easily using the date or time stamp and pick out the desired nearest grid point. This would be an example bash script:
# I don't know what your time axis is like;
# you may need to use a date with a time stamp too if your data is not e.g. daily.
# See the CDO manual for how to define dates.
date=20090101
lat=10
lon=50
files=`ls MY_FOLDER/*.nc`
for file in $files ; do
  # select the nearest grid point and the date slice desired;
  # ${file%???} strips the .nc from the file name
  cdo seldate,$date -remapnn,lon=$lon/lat=$lat $file ${file%???}_${lat}_${lon}_${date}.nc
done
Then you can use an R script to read in the resulting files.
It is possible to merge all the new files with cdo, but you would need to be careful if the time stamp is the same. You could try cdo merge or cdo cat; that way you can read a single file into R, rather than having to loop over and open each file separately.
