The ERA5 climate dataset is, in general, defined as a [lat, lon, time] matrix. For recent data, however, it adds the exp_ver variable, which indicates whether the data are provisional (recent data, up to about 3 months before the present, coded as "05") or final (the stable version, coded as "01"), so only for recent data is the matrix defined as [lat, lon, time, exp_ver].
The exp_ver variable has only two values: "01" (old data) and "05" (recent data), and wherever "01" holds data the corresponding time has missing values in the "05" field, and vice versa. I would like to merge "01" and "05" into a single [lat, lon, time] matrix (i.e., remove the exp_ver variable), but I don't know how to do this. Maybe this is an option:
A. Split the file with cdo splitlevel, obtaining separate files for the "01" and "05" exp_ver values.
B. Remove the missing-value segment, or select only the segment with data, from both files (I don't know how to do this!).
C. Remove the redundant exp_ver variable from both files (with CDO's --reduce_dim option).
Any help with this? Thank you in advance!
Diego
Can you use ncwa to average over and remove the dimension in your second step, and then use mergetime?
cdo splitlevel era5.nc v   # as you were suggesting; rename the two per-level output files to v1.nc and v5.nc
ncwa -a exp_ver v1.nc v1rd.nc
ncwa -a exp_ver v5.nc v5rd.nc
cdo mergetime v1rd.nc v5rd.nc out.nc
Or, thinking about it, what if you use ncwa directly on the original file? It averages over that dimension. I don't recall how missing_value is handled, though.
ncwa -a exp_ver era5.nc out.nc
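Since at each timestep only one of the two exp_ver slices actually holds data, averaging over exp_ver with missing values skipped should simply return that value. As a rough sanity check afterwards (a sketch, assuming the merged file is called out.nc as above):
ncdump -h out.nc   # exp_ver should no longer appear as a dimension
cdo infon out.nc   # per-timestep statistics, including the count of missing values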
Check out this link too, which could be relevant: Can ncwa (NCO) understand missing_value
(Preliminary answer posted from phone, will revise tomorrow.)
I have a .nc file containing 6-hourly precipitation for one full year. I want to calculate the daily precipitation and compare it with observed data, so the two must coincide temporally. To achieve this, the precipitation should be accumulated between 12 UTC of one day and 12 UTC of the next day. Does anyone have a suggestion on how to achieve this with CDO?
Thank you!
Well, if the first slice covers 12-18 UTC, then essentially you want to average the timeseries 4 slices at a time (right?), in which case you can use this:
cdo timselmean,4 infile.nc outfile.nc
If the timeseries starts instead at 00, you may need to remove the first two timeslices before you start (cdo seltimestep)
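If you are unsure where the series starts, you can list the timestamps first:
cdo showtimestamp infile.nc   # the first entries show whether the series starts at 00 or 12 UTC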
Another method is a bit more of a fudge, in that you can shift the series by 12 hours, and then use the day mean function. This would have the advantage of working for any frequency of data (i.e. you don't hardwire the factor "4" based on the data frequency)
cdo daymean -shifttime,-12hours infile.nc outfile.nc
The answer Adrian Tompkins gives should work well. One additional point to note is that you can remove time steps in CDO. So, if your time starts at 0 UTC and ends at 24 UTC, you do not want the first and last time steps going into Adrian's first answer, but you could modify it as follows:
cdo -timselmean,4 -delete,timestep=-1,-2,1,2 infile.nc outfile.nc
This will require a 2.x version of CDO.
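As a rough sanity check of the bookkeeping (a sketch, assuming a non-leap year of 6-hourly slices starting at 00 UTC), you can count the timesteps before and after:
cdo ntime infile.nc    # 1460 input timesteps (4 x 365)
cdo ntime outfile.nc   # 364 daily means, since the first and last half-days are dropped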
I have a netCDF file of temperature going back over the last 22 thousand years as decadal averages (the TraCE dataset). I want to calculate 100- or 1000-year averages.
I am really stuck; if anyone could help, that would be great. I am mostly using R, but if it is simple in CDO then I can try that too.
I don't have any code to show as I really don't know where to start. Most examples I have seen have been for daily or yearly data, not decadal.
Your data is decadal averages, so it should be easy to do this in CDO. You want to calculate a rolling average over every 10 time steps, and for this runmean is your friend. Just do the following:
cdo runmean,10 infile.nc outfile.nc
You might need to subset time afterwards, depending on the exact output you want. It sounds like the time you have may be non-standard, but runmean should still be OK.
Robert's solution is useful if you want a smoothed output at the 100 or 1000 year timescale. Your original dataset has 2200 timesteps, and runmean,10 smooths this and produces an output with 2200-9=2191 timesteps, each of which is an average over a 100 year window either centered on the slice itself or lagged/lead, depending on the option used.
However, from your question, I think you are more likely to want an output where the first slice is the average over the first century, the second over the second century, and so on; that is, an output with 220 timeslices, each a century average (or 22 timeslices of 1000-year averages). In other words, you want a command analogous to daymean, monmean and yearmean, but as there is no command called centurymean, you can instead resort to the more generic timselmean and manually define your window length:
# Century (100-year) average:
cdo timselmean,10 infile.nc outfile.nc
# Millennial (1000-year) average:
cdo timselmean,100 infile.nc outfile.nc
I think this should still work despite the non-CF compliant time units you mention in the comment (but let us know if it doesn't)
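One quick, rough check that the windowing behaved as expected on the non-standard time axis is to count the output timesteps (assuming the input really has 2200 decadal slices):
cdo ntime outfile.nc   # expect 220 for the centennial case, 22 for the millennial one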
I have a netCDF file with daily precipitation (for a whole decade) at every latitude and longitude; it is in the form (lon, lat, time). I want to get the monthly average for longitude = -118.25:-84.75 and latitude = 13.25:33.25. I need to write another netCDF file in which the variable is monthly precipitation given by (lon, lat, time), but I don't know how to extract the ranges or how to obtain the monthly average, since the months are repeated each year.
Just use the CDO tool and its sellonlatbox operator:
cdo sellonlatbox,-118.25,-84.75,13.25,33.25 filein fileout
filein is the name of your input file and fileout is the name of the output.
Afterwards you can use the monmean operator to calculate monthly means:
cdo -monmean fileout final_file
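You can also chain the two operators into a single call and skip the intermediate file:
cdo -monmean -sellonlatbox,-118.25,-84.75,13.25,33.25 filein final_file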
I have climate data with a daily temporal resolution and would like a count of days that have precipitation (e.g., greater than 1mm/day) by month and by year.
I've tried eca_pd,1 and eca_rr1, but these commands return wet-day totals for all years.
For example, cdo eca_pd,1 infile outfile
Is there a command to return wet-days for each month and/or year?
You can accomplish this task with CDO's masking functionality; for more details beyond the answer below, you can also refer to my video guide on masking using CDO.
The first step is to make an equivalent file with 1 where P >= threshold (1 mm/day in your case) and 0 otherwise. For this we use the "greater than or equal to a constant" operator gec (or gtc = "strictly greater than a constant", if you prefer):
cdo gec,1 input.nc mask.nc
(assuming units are mm/day in your input file).
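If your input were instead in metres per day (as the cf-python answer further down also notes for some datasets), the same approach works; you just rescale the threshold:
cdo gec,0.001 input.nc mask.nc   # 1 mm/day expressed as 0.001 m/day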
Then you can simply sum this mask over the periods (months, years, etc.) for which you want your statistic:
cdo monsum mask.nc nwetdays_mon.nc
cdo yearsum mask.nc nwetdays_year.nc
Of course you can pipe this if you like to do it all on one line, e.g.
cdo monsum -gec,1 input.nc nwetdays_mon.nc
We can take this even further if you want to work out the climatology for a particular month. If you have a multiyear dataset then you can use the wonderful "ymonstat" commands. So for example, once you have calculated your monthly series of wet days above, you can calculate the average for each month with
cdo ymonmean nwetdays_mon.nc nwetdays_mon_clim.nc
You can then difference the series from this monthly climatology to give you the anomaly of wet days in each month over the series
cdo ymonsub nwetdays_mon.nc nwetdays_mon_clim.nc nwetdays_mon_anom.nc
I hope that helps!
(PS: I usually find it easier to calculate these kinds of statistics directly with CDO in this way; I rarely find that the built-in climate functions calculate exactly the statistic I want.)
With NCO's ncap2, create a binary flag then total it in the desired dimension(s):
ncap2 -s 'rainy=(precip > 1);rainy_days=rainy.total($time)' in.nc out.nc
You can also do this in cf-python, essentially using the same methodology as the CDO example above, but in a Python environment, using the where and collapse methods:
import cf
# Read the dataset
f = cf.read('filename.nc')[0]
# Mask out dry days (assuming that your data
# units are 'mm day-1' or 'kg m-2 day-1', etc.)
wet = f.where(cf.le(1), cf.masked)
# If the data are in units of 'metres/day', say, then you could do:
# wet = f.where(cf.le(0.001), cf.masked)
# or
# wet = f.where(cf.le(1, 'mm day-1'), cf.masked)
# etc.
# Count the wet day occurrences by month
count_monthly = wet.collapse('T: sample_size', group=cf.M())
# Count the wet day occurrences by year
count_yearly = wet.collapse('T: sample_size', group=cf.Y())
# Get the data as numpy arrays
print(count_monthly.array)
print(count_yearly.array)
# Count the wet day totals by month
wet_day_sum_monthly = wet.collapse('T: sum', group=cf.M())
# Count the wet day totals by year
wet_day_sum_yearly = wet.collapse('T: sum', group=cf.Y())
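If you then want the counts back on disk as netCDF files, cf-python's cf.write function can save the same objects (a small follow-on sketch; the output filenames are just placeholders):
# Write the monthly and yearly counts to new netCDF files
cf.write(count_monthly, 'wetdays_monthly.nc')
cf.write(count_yearly, 'wetdays_yearly.nc')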
There is a data frame like this:
The first two columns in the df describe the start date (month and year) and the end date (month and year). The names of the remaining columns correspond to every single month and year of a certain time period.
I need a function/loop that inserts "1" or "0" in each cell: "1" when the date from a given column name is within the period described by the first two columns, and "0" if not.
I would appreciate any help.
You want to do two different things: (a) create a dummy variable, and (b) see if a particular date falls in an interval.
Making a dummy variable is the easier part; in base R you can use ifelse. For example, in the iris data frame:
iris$dummy <- ifelse(iris$Sepal.Width > 2.5, 1, 0)
Working with dates is more complicated. In this answer we will use the lubridate library. First you need to convert all those 'Month Year' dates into something that R can understand. For example, for February you could do:
library(lubridate)
new_format_february_2016 <- interval(ymd('2016-02-01'), ymd('2016-03-01') - dseconds(1))
#[1] 2016-02-01 UTC--2016-02-29 23:59:59 UTC
This is February: the interval of time from the 1st of February to one second before the 1st of March. You can do the same with your start date column and your end date column.
To compare two intervals of time (that is, to see whether a particular month falls within your other intervals), you can do:
int_overlaps(new_format_february_2016, other_interval)
If this returns TRUE, the two intervals (a particular month and another one) overlap. This is not the same as one being contained inside the other, but in your case it will work, since your start and end dates are whole months. Using this you can iterate over the columns and rows and build your dummy variable, as in the sketch below.
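Here is a minimal sketch of that loop, assuming (hypothetically) that the first two columns are called start and end, hold strings like "Jan 2016", and that the remaining column names use the same "Month Year" format; adjust the names and parsing to your actual data:
library(lubridate)

# Toy data with the assumed layout (replace with your own data frame)
df <- data.frame(start = c("Jan 2016", "Mar 2016"),
                 end   = c("Feb 2016", "May 2016"))
month_cols <- c("Jan 2016", "Feb 2016", "Mar 2016", "Apr 2016", "May 2016")

# One interval per row: from the first day of the start month to one second
# before the first day after the end month
row_int <- interval(my(df$start), my(df$end) + months(1) - dseconds(1))

for (col in month_cols) {
  # Interval covered by the month in this column's name
  col_int <- interval(my(col), my(col) + months(1) - dseconds(1))
  # 1 if the column's month overlaps the row's period, 0 otherwise
  df[[col]] <- ifelse(int_overlaps(col_int, row_int), 1, 0)
}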
But before doing so, I would recommend cleaning your data, as your current format is complicated to work with. To get the full power of R's vectorised types, ideally you want one row per observation and one variable per column, which does not seem to be the case with your data frame. Take a look at the 'Tidy data' chapter of 'R for Data Science', especially the spreading and gathering subsection:
Tidy data