How to append new data into existing netcdf file - netcdf

I have a multivariable ncdf that I created and wanted to add additional data to each variable. the lat and long dimensions will remain the same but I want to extend the time dimension by appending new data to each variable. The new data set has the same number of variables, dim1 and dim2 but its dim3 starts where the time of dim3 of the first dataset ends.
Existing ncdf (has 42 variable):
Here is startup code:
library(ncdf4)
dim1 = ncdim_def("lat")
dim2 = ncdim_def( "long")
dim3 = ncdim_def( "time", "days since 2004-01-01", as.integer(time))
Var<-c("a","b","c","d",.....) # variables of existing "merged.nc" file
unit<-c("aa","ab","ac","ad",...)
mat<-(n by m data matrix)
mync = nc_open('merged.nc', write=TRUE)
for (k in 2:length(var)){
ncvar_put(mync,var[k],mat[,k])
}
nc_close(mync)

You might try NCO's ncrcat
ncrcat in1.nc in2.nc out.nc

you can merge in time using
cdo mergetime in1.nc in2.nc out.nc

Related

How to merge netcdf files separately in R?

I am a user of R and would like some help in the following:
I have two netcdf files (each of dimensions 30x30x365) and one more with 30x30x366. These 3 files contain a year's worth of daily data, where the last dimension refers to the time dimension. I wanted to combine them separately i.e. I wanted the output file to contain 30x30x1096.
Note: I have seen a similar question but the output results in an average (i.e. 30x30x3) which I do not want.
from the comment I see below you seem to want to merge 3 files in the time dimension. As an alternative to R, you could do this quickly from the command line using cdo (climate data operators):
cdo mergetime file1.nc file2.nc file3.nc mergedfile.nc
or using wildcards:
cdo mergetime file?.nc mergedfile.nc
cdo is easy to install under ubuntu:
sudo apt install cdo
Without knowing exactly what dimensions and variables you have, this may be enough to get you started:
library(ncdf4)
output_data <- array(dim = c(30, 30, 1096))
files <- c('file1.nc', 'file2.nc', 'file3.nc')
days <- c(365, 365, 366)
# Open each file and add it to the final output array
for (i in seq_along(files)) {
nc <- nc_open(files[i])
input_arr <- ncvar_get(nc, varid='var_name')
nc_close(nc)
# Calculate the indices where each file's data should go
if (i > 1) {
day_idx <- (1:days[i]) + sum(days[1:(i-1)])
} else {
day_idx <- 1:days[i]
}
output_data[ , , day_idx] <- input_arr
}
# Write out output_data to a NetCDF. How exactly this should be done depends on what
# dimensions and variables you have.
# See here for more:
# https://publicwiki.deltares.nl/display/OET/Creating+a+netCDF+file+with+R

Writing a loop to store many data variables in r

I want to begin by saying that I am not a programmer, I'm just trying to store data so it's easily readable to myself.
I just downloaded a large .nc file of weather data and I am trying to take data from the file and store it in .csv format so I can easily view it in excel. The problem with the data is that it contains 53 variables with three 'dimensions': latitude, longitude, and time. I have written some code to only take a certain latitude and longitude and every timestamp so I get one nice column for each variable (with a single latitude and longitude but every timestamp). My problem is that I want to have the loop store a column for every variable to a different (arbitrary) object in R so that I just have to run it once and then write all the data to one .csv file with the write.csv function.
Here's the code I've written so far, where janweather is the .nc file.
while( j <= 53){
v1 <- janweather$var[[j]]
varsize <- v1$varsize
ndims <- v1$ndims
nt <- varsize[ndims] # Remember timelike dim is always the LAST dimension!
j <- j +1;
for( i in 1:nt ) {
# Initialize start and count to read one timestep of the variable.
start <- rep(1,ndims) # begin with start=(1,1,1,...,1)
start[1] <- i
start[2] <- i# change to start=(i,i,1
count <- varsize # begin w/count=(nx,ny,nz,...), reads entire var
count[1] <- 1
count[2] <- 1
data3 <- get.var.ncdf( janweather, v1, start=start, count=count )
}
}
Here are the details of the nc file from print.ncdf(janweather):
[1] "file netcdf-atls04-20150304074032-33683-0853.nc has 3 dimensions:" [1] "longitude Size: 240" [1] "latitude Size: 121" [1] "time Size: 31" [1] "------------------------" [1] "file netcdf-atls04-20150304074032-33683-0853.nc has 53 variables:"
My main goal is to have all the variables stored under a different name by the get.var.ncdf function. Right now I realize that it just keeps overwritting 'data3' until it reaches the last variable so all I've accomplished is getting data3 written to the last variable. I'd like to think there is an easy solution to this but I'm not exactly sure how to generate strings to store the variables under.
Again, I'm not a programmer so I'm sorry if anything I've said doesn't make any sense, I'm not very well versed in the lingo or anything.
Thanks for any and all help you guys bring!
If your not a programmer and want only to get variables in csv format, you can use the NCO commands. With this commands you can do multiple operations on netcdf files.
So with the command ncks you can output the data from a variable with and specific dimensions slice.
ncks -H -v latitude janweather.nc
This command will list on the screen the values in the latitude variable.
ncks -s '%f ,' -H -v temperature janweather.nc
This command will list the values of the variable temperature, with the format specified with the -p argument (sprintf style).
So just pass the output to a file and there you have the contents of a variables in a text file.
ncks -s '%f ,' -H -v temperature janweather.nc > temperature.csv

Read Multiple ncdf files and make average in R

By using R ill try to open my NetCDF data that contain 5 dimensional space with 15 variables. (variable for calculation is in matrix 1000X920 )
This problem actually look like the same with the other question before.
I got explanation from here and the others
At first I used RNetCDF package, but after some trial i found unconsistensy when the package read my data. And then finally better after used ncdf package.
there is no problem for opening data in a single file, but after ill try for looping in more than hundred data inside folder for a spesific variable (for example: var no 15) the program was failed.
> days = formatC(001:004, width=3, flag="0")
> ncfiles = lapply (days,
> function(d){ filename = paste("data",d,".nc",sep="")
> open.ncdf(filename) })
also when i try the command like this for a spesific variable
> sapply(ncfiles,function(file,{get.var.ncdf(file,"var15")})
so my question is, any solution to read all netcdf file with special variable then make calculation in one frame. From the solution before i was failed for generating the variable no 15 on whole netcdf data.
thanks for any solution to this problem.
UPDATE:
this is the last what i have done
when i write
library(ncdf)
files=list.files("allnc/",pattern='*nc',full.names=TRUE)
for(i in seq_along(files)) {
nc <- lapply(files[i],open.ncdf)
lw = get.var.ncdf(nc,"var15")
x=dim(lw)
rbind(df,data.frame(lw))->df
}
i can get all netcdf data by > nc
so i how i can get variable data with new name automatically like lw1,lw2...etc
i cant apply
var1 <- lapply(files, FUN = get.var.ncdf, variable = "var15")
then i can do calculation with all data.
the other technique i try used RNetCDF package n doing a looping
# Declare data frame
df=NULL
#Open all files
files= list.files("allnc/",pattern='*.nc',full.names=TRUE)
# Loop over files
for(i in seq_along(files)) {
nc = open.nc(files[i])
# Read the whole nc file and read the length of the varying dimension (here, the 3rd dimension, specifically time)
lw = var.get.nc(nc,'DBZH')
x=dim(lw)
# Vary the time dimension for each file as required
lw = var.get.nc(nc,'var15')
# Add the values from each file to a single data.frame
}
i can take a variable data but i just got one data from my all file nc.
note: sampe of my data name ( data20150102001.nc,data20150102002.nc.....etc)
This solution uses NCO, not R. You may use it to check your R solution:
ncra -v var15 data20150102*.nc out.nc
That is all.
Full documentation in NCO User Guide.
You can use the ensemble statistics capabilities of CDO, but note that on some systems the number of files is limited to 256:
cdo ensmean data20150102*.nc ensmean.nc
you can replace "mean" with the statistic of your choice, max, std, var, min etc...

How can I export data from GrADs to a .csv file or from NetCDF to .csv?

I'm having real difficulty with exporting data from GrADS to a .csv file although it should be really easy. The file in question is from the APHRODITE project relating to rainfall over Asia. Basically I can read this file into GrADS using:
open d:/aphro/aphro.ctl
and it tells me that:
Data file d:/aphro/APHRO_MA_025deg_V1101R2.%y4 is open as file 1
Lon set to 60.125 149.875
Lat set to -14.875 54.875
Lev set to 1 1
Time values set: 1961:1:1:0 1961:1:1:0
E set to 1 1
If I execute:
q ctlinfo
it also tells me that I have three variables:
precip 1 0 daily precipitation analysis
rstn 1 0 ratio of 0.05 degree grids with station
flag 1 0 ratio of 0.05 degree grids with snow
Okay, now all I want to do is produce a list in a .csv file (or .txt) file with the following information:
Precipitation Lon Lat Time(date)
It sounds really easy but I just can't do it. One method is to use:
fprintf precip d:/output.csv %g 1
This gives me an .csv file with the entire data for that day in one long column (which is what I want). I can also do the same for lon and lat in different files and combine them. The problem is that this takes for ages for the output file - it is much faster if you don't mind lots of columns but this becomes a pain to manage. Basically, this method is too slow.
Another method is to export the data as a NetCDF file by:
Set sdfwrite -4d d:/output.nc
define var = precip
sdfwrite precip
This then very quickly writes a file called output.nc which contains all the data I need. Using R I can then read all the variables individually e.g.
f <- open.ncdf("D:/aphro/test.nc")
A <- get.var.ncdf(nc=f,varid="time")
B <- get.var.ncdf(nc=f,varid="rain")
D <- get.var.ncdf(nc=f,varid="lon")
E <- get.var.ncdf(nc=f,varid="lat")
But what I want is to make an output file where each row tells me the time, rain amount, lon and lat. I tried rbind but it doesn't associate the correct time(date) with the right rain amount, and similarly messes up the lon and lat as there are hundreds of thousand of rain data but only a few dates and only 360 lon points and 280 lat points (i.e. the rain data is a grid of data for each day over several days). I'm sure this should be easy but how to do it?
Please help
Tony
Up to my knowledge, you can change the GrAD file to NetCDF file by using climate data operator and R together. Details can be found here. Further a NetCDF file can be converted in to a .csv file. For this I am providing a dummy code.
library(ncdf)
nc <- open.ncdf("foo.nc") #open ncdf file and read variables
lon <- get.var.ncdf(nc, "lon") # Lon lat and Time
lat <- get.var.ncdf(nc, "lat")
time <- get.var.ncdf(nc, "time")
dname <- "t" # name of variable which can be found by using print(nc)
nlon <- dim(lon)
nlat<- dim(lat)
nt<- dim(time)
lonlat <- expand.grid(lon, lat) # make grid of given longitude and latitude
mintemp.array <- get.var.ncdf(nc, dname)
dlname <- att.get.ncdf(nc, dname, "long_name")
dunits <- att.get.ncdf(nc, dname, "units")
fillvalue <- att.get.ncdf(nc, dname, "_FillValue")
mintemp.vec.long <- as.vector(mintemp.array)
mintemp.mat <- matrix(mintemp.vec.long, nrow = nlon * nlat, ncol = nt)
mintemp.df <- data.frame(cbind(lonlat, mintemp.mat))
options(width = 110)
write.csv(mintemp.df, "mintemp_my.csv")
I hope, it explains your question.

Opening and reading multiple netcdf files with RnetCDF

Using R, I am trying to open all the netcdf files I have in a single folder (e.g 20 files) read a single variable, and create a single data.frame combining the values from all files. I have been using RnetCDF to read netcdf files. For a single file, I read the variable with the following commands:
library('RNetCDF')
nc = open.nc('file.nc')
lw = var.get.nc(nc,'LWdown',start=c(414,315,1),count=c(1,1,240))
where 414 & 315 are the longitude and latitude of the value I would like to extract and 240 is the number of timesteps.
I have found this thread which explains how to open multiple files. Following it, I have managed to open the files using:
filenames= list.files('/MY_FOLDER/',pattern='*.nc',full.names=TRUE)
ldf = lapply(filenames,open.nc)
but now I'm stuck. I tried
var1= lapply(ldf, var.get.nc(ldf,'LWdown',start=c(414,315,1),count=c(1,1,240)))
but it doesn't work.
The added complication is that every nc file has a different number of timestep. So I have 2 questions:
1: How can I open all files, read the variable in each file and combine all values in a single data frame?
2: How can I set the last dimension in count to vary for all files?
Following #mdsummer's comment, I have tried a do loop instead and have managed to do everything I needed:
# Declare data frame
df=NULL
#Open all files
files= list.files('MY_FOLDER/',pattern='*.nc',full.names=TRUE)
# Loop over files
for(i in seq_along(files)) {
nc = open.nc(files[i])
# Read the whole nc file and read the length of the varying dimension (here, the 3rd dimension, specifically time)
lw = var.get.nc(nc,'LWdown')
x=dim(lw)
# Vary the time dimension for each file as required
lw = var.get.nc(nc,'LWdown',start=c(414,315,1),count=c(1,1,x[3]))
# Add the values from each file to a single data.frame
rbind(df,data.frame(lw))->df
}
There may be a more elegant way but it works.
You're passing the additional function parameters wrong. You should use ... for that. Here's a simple example of how to pass na.rm to mean.
x.var <- 1:10
x.var[5] <- NA
x.var <- list(x.var)
x.var[[2]] <- 1:10
lapply(x.var, FUN = mean)
lapply(x.var, FUN = mean, na.rm = TRUE)
edit
For your specific example, this would be something along the lines of
var1 <- lapply(ldf, FUN = var.get.nc, variable = 'LWdown', start = c(414, 315, 1), count = c(1, 1, 240))
though this is untested.
I think this is much easier to do with CDO as you can select the varying timestep easily using the date or time stamp, and pick out the desired nearest grid point. This would be an example bash script:
# I don't know how your time axis is
# you may need to use a date with a time stamp too if your data is not e.g. daily
# see the CDO manual for how to define dates.
date=20090101
lat=10
lon=50
files=`ls MY_FOLDER/*.nc`
for file in $files ; do
# select the nearest grid point and the date slice desired:
# %??? strips the .nc from the file name
cdo seldate,$date -remapnn,lon=$lon/lat=$lat $file ${file%???}_${lat}_${lon}_${date}.nc
done
Rscript here to read in the files
It is possible to merge all the new files with cdo, but you would need to be careful if the time stamp is the same. You could try cdo merge or cdo cat - that way you can read in a single file to R, rather than having to loop and open each file separately.

Resources