How to regrid variables in a NetCDF file to another grid such that fluxes, mass and other fields remain conserved

I want to regrid a variable onto the grid of another file in such a way that fluxes and other conserved quantities are preserved and no conservation laws are violated.

You can use cdo in the following way:
cdo remapcon,newgrid.nc input_file.nc output_file.nc
where newgrid.nc can be any file that contains the target grid and input_file.nc is the file to be regridded. The result is written to output_file.nc. One thing to note is that the metadata/attributes should be CF compliant for cdo to understand the files and data.
remapcon performs first-order conservative remapping, so area-integrated quantities such as fluxes and mass are preserved on the new grid.
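To make the idea of conservative remapping concrete, here is a minimal one-dimensional sketch in plain Python. It is not how cdo implements remapcon; the function names and the sample grids are hypothetical, and the sketch assumes cell-mean values on grids defined by their cell edges. Each target cell receives the overlap-weighted average of the source cells, which keeps the integral over the domain unchanged.

```python
def overlap(a0, a1, b0, b1):
    """Length of the overlap between the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def remap_conservative_1d(src_edges, src_values, dst_edges):
    """First-order conservative remap of cell means between two 1-D grids."""
    dst_values = []
    for j in range(len(dst_edges) - 1):
        d0, d1 = dst_edges[j], dst_edges[j + 1]
        # Accumulate the part of each source cell's integral that falls
        # inside this target cell.
        total = 0.0
        for i, value in enumerate(src_values):
            total += value * overlap(src_edges[i], src_edges[i + 1], d0, d1)
        # Convert the accumulated integral back to a cell mean.
        dst_values.append(total / (d1 - d0))
    return dst_values


def integral(edges, values):
    """Integral of a piecewise-constant field (cell mean times cell width)."""
    return sum(v * (edges[i + 1] - edges[i]) for i, v in enumerate(values))


# Hypothetical sample grids: four unit cells remapped onto two uneven cells.
src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_values = [2.0, 4.0, 6.0, 8.0]
dst_edges = [0.0, 1.5, 4.0]
dst_values = remap_conservative_1d(src_edges, src_values, dst_edges)
```

Comparing integral(src_edges, src_values) with integral(dst_edges, dst_values) shows the conserved total, which is the property a plain interpolation would not guarantee.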

Related

Merge Two Datasets with Different Number of Variables per Timestep Using CDO

I am analyzing a time series of data that is split by time into two NetCDF files (infiles). These files have a different number of variables/fields, by design. Traditionally I have been using Climate Data Operators (CDO) to easily merge two datasets sorted by time using the following command in a terminal:
cdo mergetime <infiles> <outfile>
This command merges any number of input files ("infiles"), sorted by time, and writes a new "outfile" containing a time series of all the data in each. However, this doesn't appear to work by default for my files, as cdo kicks back the following:
cdo select (Abort): Input streams have different number of variables per timestep!
The statement is true: each file does have a different number of variables per timestep. But it prevents me from looking at the dataset as a whole. I have also tried the following modifications to the cdo command I use to merge the time series, without success:
cdo mergetime -select,name=<variable> <infiles> <outfile>
cdo -select,name=<variable> <infiles> <outfile>
I have read through the CDO Userguide and have not found any alternative solutions yet. I would be very grateful if anyone could offer a workaround for joining the two files into a single time-series of data (preferably in cdo but not necessarily) as I am running out of ideas.
I'm on my phone, but you could delete the extra variables from the files with NCO like this:
ncks -x -v var1,var2 in.nc out.nc
And then merge as usual. I think you can use the cdo delete operator to do the same thing.
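The same "drop the extra variables, then merge by time" logic can be sketched in plain Python. This is only an illustration of the idea, not what ncks or cdo do internally: each file is modelled as a hypothetical list of (time, {variable: value}) pairs, and the names file_a, file_b and the variables are made up.

```python
def common_variables(*datasets):
    """Names of the variables present in every timestep of every dataset."""
    names = None
    for dataset in datasets:
        for _, fields in dataset:
            keys = set(fields)
            names = keys if names is None else names & keys
    return names or set()


def merge_time(*datasets):
    """Keep only the shared variables, then concatenate and sort by time."""
    keep = common_variables(*datasets)
    merged = []
    for dataset in datasets:
        for time, fields in dataset:
            merged.append((time, {k: v for k, v in fields.items() if k in keep}))
    # ISO-style time strings sort chronologically as plain strings.
    return sorted(merged, key=lambda step: step[0])


# Hypothetical input: the second file carries one extra variable.
file_a = [("2001-01", {"t": 1.0, "u": 2.0}), ("2001-02", {"t": 1.5, "u": 2.5})]
file_b = [("2001-03", {"t": 2.0, "u": 3.0, "extra": 9.0})]
merged = merge_time(file_b, file_a)
```

After the call, every timestep in merged carries the same set of variables, which is the condition the mergetime error message complains about.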

How can I split a netCDF file with conditions

I'm working with a netCDF file with a spatially averaged wind variable, which is a function of time only.
I would like to split the file into years with east wind and years with west wind.
I thought I would do it with cdo but I don't know how to write the condition.
Anything with splityear, 'u <0'?
I do not think this is advisable, as you will split the data into two different NetCDF files with incompatible grids. In my view this would defeat the purpose of storing the data in NetCDF files.
But, if you wish to do it, there is a way within CDO. As you haven't provided files I can outline a strategy.
First create a mask file in which cells with u between -10000 and 0 (that is, u<=0) are set to missing:
cdo -setrtomiss,-10000,0 -selname,u infile.nc mask.nc
Then apply reducegrid to the infile using this mask:
cdo -reducegrid,mask.nc infile.nc outfile.nc
That should do it for the u condition. Just test that and modify it for the other variables.
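For the time-only case described in the question, the condition can also be worked out outside CDO first. The sketch below is an assumption about what "years with east wind" means: it classifies each year by the sign of its annual mean u, with u<0 taken as easterly in the usual convention. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def split_years_by_sign(series):
    """Split years by the sign of the annual mean of u.

    series is a list of (year, u) pairs; returns two sorted lists of years:
    those with mean u < 0 and those with mean u >= 0.
    """
    by_year = defaultdict(list)
    for year, u in series:
        by_year[year].append(u)

    negative_u_years, non_negative_u_years = [], []
    for year in sorted(by_year):
        mean_u = sum(by_year[year]) / len(by_year[year])
        if mean_u < 0:
            negative_u_years.append(year)  # easterly on average
        else:
            non_negative_u_years.append(year)  # westerly (or calm) on average
    return negative_u_years, non_negative_u_years


# Hypothetical sample: a few values per year.
series = [(2001, -3.0), (2001, -1.0), (2002, 4.0), (2002, -1.0), (2003, -0.5)]
east_years, west_years = split_years_by_sign(series)
```

The two year lists could then be passed to an operator such as cdo selyear to write one file per group.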

How to remove seasonality from time series data?

How can I remove the seasonal cycle from a time series stored in a NetCDF file? I would like to find a solution on Linux; so far I have used GrADS and Ferret for visualization.
Thanks a lot!
You can use CDO to calculate the average for each day/month of the year and subtract it from the original file:
If the file contains daily data:
cdo sub in.nc -ydaymean in.nc deseasonalized.nc
Likewise if the data is monthly:
cdo sub in.nc -ymonmean in.nc deseasonalized.nc
The ydaymean and ymonmean commands calculate the annual cycle over the dataset in.nc, i.e. ymonmean returns 12 time slices, the average of all the Januaries, all the Februaries and so on, which is then subtracted from the original file using sub. I've used piping, but it may be easier to understand on two separate lines:
cdo ymonmean in.nc annual_cycle.nc
cdo sub in.nc annual_cycle.nc deseasonalized.nc
This does exactly the same thing: deseasonalized.nc will be identical (well, almost; there will be a few bytes of difference due to the different "history" log in the NetCDF global metadata header), but you will also have a new file, annual_cycle.nc, containing the annual cycle (which might also be useful).
When doing the subtraction, CDO detects that the number of timeslices is smaller in the second file to be subtracted and thus loops/cycles through it. Note that, as the seasonal cycle is calculated from the same file as the original data, it is fine to simply use "sub": if the data starts in e.g. April, the results of ymonmean will also start from April. However, if you want to remove a seasonal cycle calculated from a different source, the start day/month may be different and you could end up subtracting e.g. the April mean from January! To avoid this, you can use the ymonsub command instead:
cdo ymonsub full_timeseries.nc seasonal_file.nc deseasonalised.nc
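The arithmetic behind the monthly case can be sketched in plain Python. This is only an illustration of the idea of a monthly climatology subtracted by calendar month (the matching that ymonsub is described as doing above), not CDO's implementation; the function names and the synthetic series, which deliberately starts in April, are hypothetical.

```python
from collections import defaultdict


def monthly_climatology(series):
    """Mean value for each calendar month; series is [((year, month), value)]."""
    by_month = defaultdict(list)
    for (_, month), value in series:
        by_month[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in by_month.items()}


def remove_seasonal_cycle(series, climatology):
    """Subtract the climatological mean of the matching calendar month."""
    return [((y, m), value - climatology[m]) for (y, m), value in series]


# Synthetic two-year monthly series starting in April 2000: the value is the
# month number, plus an offset of 10 during the second twelve months.
series = []
for index in range(24):
    year = 2000 + (3 + index) // 12
    month = (3 + index) % 12 + 1
    offset = 0.0 if index < 12 else 10.0
    series.append(((year, month), month + offset))

climatology = monthly_climatology(series)
anomalies = remove_seasonal_cycle(series, climatology)
```

Because the subtraction is keyed on the calendar month rather than on position in the file, it does not matter that the series starts in April; the anomalies contain only the offset between the two years.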
In addition, there are now also packages in both R and python to allow you to access the full functionality of cdo from within those languages without having to resort to using shell access tools.
Edit 2021: I now have a video on this topic, which you can view here: https://youtu.be/jKlA1ouoQIs

R save as NetCDF file after simple calculation

I want to do something (apparently) simple, but didn't yet find the right way to do it:
I read a netcdf file (wind speed from the ERA5 reanalysis) on a grid.
From this, I use the wind speed to calculate a wind capacity factor (using a given power curve).
I then want to write a new netcdf file, with exactly the same structure as the input file, but just replacing the input wind speed by the new variable (wind capacity factor).
Is there a simple/fast way to do this without having to redefine all the dims, vars ... with ncvar_def and ncdim_def?
Thanks in advance for your replies!
Writing a NetCDF file in R is not overly complicated; there is a nice example online here:
http://geog.uoregon.edu/GeogR/topics/netCDF-write-ncdf4.html
You could copy the dimensions from the input file.
However if your wind power curve is a simple analytical expression then you could perform this task in one line from the command line in bash/linux using climate data operators (cdo).
For example, if you have two variables 10u and 10v in the file (I don't recall the reanalysis names exactly) then you could make a new variable wcf = sqrt(u^2 + v^2) in the following way:
cdo expr,'wcf=sqrt(10u**2+10v**2)' input.nc output.nc
See an example here:
https://code.mpimet.mpg.de/boards/53/topics/1622
So if your wind power function is an analytical expression, you can define it this way without using R at all or worrying about dimensions etc.; the new file will have a variable wcf added. You should then probably use NCO to alter the metadata (units etc.) to ensure they are appropriate.
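As a sketch of what "an analytical power curve" could look like, the Python below computes the wind speed from the two components and maps it to a capacity factor. The power curve here is entirely hypothetical (a cubic ramp between an assumed cut-in speed of 3 m/s and rated speed of 12 m/s, with an assumed cut-out at 25 m/s); the real curve would come from the turbine in question.

```python
import math


def wind_speed(u, v):
    """Wind speed from the eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def capacity_factor(speed, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Hypothetical idealized power curve, returning a value in [0, 1].

    Zero below cut-in and at or above cut-out, one between rated and cut-out,
    and a cubic ramp in between. All thresholds are illustrative assumptions.
    """
    if speed < cut_in or speed >= cut_out:
        return 0.0
    if speed >= rated:
        return 1.0
    return (speed**3 - cut_in**3) / (rated**3 - cut_in**3)


# Example: a grid point with u = 3 m/s and v = 4 m/s has a speed of 5 m/s.
speed = wind_speed(3.0, 4.0)
wcf = capacity_factor(speed)
```

A piecewise curve like this needs conditional logic, so it is worth checking whether it can be written as a single expression before choosing between the command-line route and R.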

How to get RPKM value from bed file or wig files? And what's the difference between these two type of files?

I want to download fastq raw file from RNAseq to get gene expression values. But GEO only provides .bed.gz and .wig.gz formats. What can I do to get the RPKM values? Thank you very much!
In order to calculate RPKM, you need (mapped) raw reads as contained in BAM/SAM or even CRAM files. Wiggle, BED and their derivatives such as bigWig are compressed versions of those, containing only the coverage (mainly used for plotting); that is, they have lost the read information needed for counting and therefore for calculating RPKM (or FPKM/TPM for that matter).
The standard approach is to start from a BAM file, extract the read counts for the regions of interest and calculate RPKM etc. There are many pipelines out there for this.
If BAM files are not available, GEO usually has at least the raw fastq files (or SRA files that can be converted to fastq) as a basis for mapping to obtain a BAM file. Also have a look at ArrayExpress; they could have the raw files for that project since it mirrors GEO.
As a word of warning: if you intend to do differential expression analysis, you need to start from the raw counts, not the RPKM values.
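Once read counts are available, the RPKM calculation itself is short. The sketch below uses the standard definition (reads per kilobase of transcript per million mapped reads); the function name and the example numbers are made up for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2000 bp gene, 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
```

The inputs are exactly the quantities that coverage-only formats no longer carry: a per-gene read count and the total number of mapped reads in the library.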
