How to convert annual netCDF data to daily from the command line?

Before I resort to using Python, I would like to know if there is a simple way from the command line to convert an annual netCDF file into a file with daily data simply by duplication (including leap years), i.e. each annual value is duplicated 365 (366) times with the appropriate date stamp.
In the file, each data value is date-stamped with the first day of its year:
cdo showdate population_ssp2soc_0p5deg_annual_2006-2100_density.nc4
2006-01-01 2007-01-01 2008-01-01 2009-01-01 2010-01-01 ...etc
I know it seems like a strange thing to do (the file size will be 365.25 times bigger!), but I want to read the data into a Fortran program that uses a daily timestep and do not want to mess around with dates in the Fortran code.

There might be a more efficient way of doing this, but you could first merge the original file with a second copy of it that has been time-shifted to the final day of each year, and then temporally interpolate:
cdo -mergetime population_ssp2soc_0p5deg_annual_2006-2100_density.nc4 -shifttime,-1day -shifttime,1year population_ssp2soc_0p5deg_annual_2006-2100_density.nc4 temp.nc
cdo inttime,2006-01-01,12:00:00,1day temp.nc outfile.nc
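This works because the shifted copy places each year's value on 31 December of the same year, so the linear interpolation between 1 January and 31 December is flat: every day of a year simply repeats that year's value. As a quick sanity check (the filename and the 2006-2100 range are taken from the question), the output should contain one timestep per day:
# expect roughly 95 x 365, i.e. about 34,700 timesteps
cdo ntime outfile.nc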

Related

Accumulated precipitation between 12 UTC and 12 UTC of the next day

I have a .nc file that contains 6-hourly precipitation data for one full year. I want to calculate the daily precipitation and compare it with observed data, so the two series must coincide temporally. To achieve this, the precipitation should be accumulated between 12 UTC of one day and 12 UTC of the next day. Does anyone have a suggestion on how to achieve this with CDO?
Thank you!
Well, if the first slice covers 12-18 UTC, then essentially you want to average the timeseries four slices at a time (right?), in which case you can use this:
cdo timselmean,4 infile.nc outfile.nc
If the timeseries starts instead at 00 UTC, you may need to remove the first two timeslices before averaging (cdo seltimestep).
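For example, a sketch assuming a non-leap year of 6-hourly data starting at 00 UTC (1460 timesteps): steps 1-2 cover 00-12 UTC of the first day, and the last two steps cover the incomplete 12-24 UTC window of the final day, so select steps 3 through 1458 before averaging:
# 1456 selected steps = 364 complete 12-to-12 UTC days
cdo -timselmean,4 -seltimestep,3/1458 infile.nc outfile.nc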
Another method is a bit more of a fudge: you can shift the series back by 12 hours and then use the day-mean function. This has the advantage of working for any frequency of data (i.e. you don't hardwire the factor "4" based on the data frequency):
cdo daymean -shifttime,-12hours infile.nc outfile.nc
The answer Adrian Tompkins gives should work well. One additional point to note is that you can remove time steps in CDO. So, if your time series starts at 00 UTC and ends at 24 UTC, you do not want the first and last time steps of Adrian's first answer, but you can modify it as follows:
cdo -timselmean,4 -delete,timestep=-1,-2,1,2 infile.nc outfile.nc
This will require a 2.x version of CDO.

Calculate a 1000 year mean using decadal data from NetCDF file in R or CDO

I have a netCDF file of temperature covering the last 22 thousand years as decadal averages (the TraCE dataset). I want to calculate 100- or 1000-year averages.
I am really stuck; if anyone could help, that would be great. I mostly use R, but if it is simple in CDO then I can try that too.
I don't have any code to show as I really don't know where to start. Most examples I have seen are for daily or yearly data, not decadal.
Your data are decadal averages, so this should be easy to do in CDO. You want to calculate a rolling average over every 10 time steps; for this, runmean is your friend. Just do the following:
cdo runmean,10 infile.nc outfile.nc
You might need to subset time afterwards, depending on the exact output you want. It sounds like the time you have may be non-standard, but runmean should still be OK.
Robert's solution is useful if you want a smoothed output at the 100 or 1000 year timescale. Your original dataset has 2200 timesteps, and runmean,10 smooths this and produces an output with 2200-9=2191 timesteps, each of which is an average over a 100 year window either centered on the slice itself or lagged/lead, depending on the option used.
However, from your question, I think you are more likely to want an output where the first slice is the average over the first century, the second over the second century, and so on: an output with 220 timeslices, each a century average (or 22 timeslices of 1000-year averages). In other words, you want a command analogous to daymean, monmean and yearmean. There is no command called centurymean, but you can resort to the more generic timselmean and define the window length manually:
# Centurial average:
cdo timselmean,10 infile.nc outfile.nc
# Millennial average:
cdo timselmean,100 infile.nc outfile.nc
I think this should still work despite the non-CF-compliant time units you mention in the comment (but let us know if it doesn't).
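As a quick check, cdo ntime prints the number of timesteps: with the 2200 decadal input steps mentioned above, you would expect 220 for the centurial average and 22 for the millennial one:
cdo ntime outfile.nc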

Comparing dates in a dataframe and appending info based on comparison result in R

So, I am lost with the following problem:
I have a dataframe in which one column (STARTED) contains the starting time of a survey, and several others contain information on that participant's survey schedule (D5 to D10: the planned survey dates only; D17 to D50: the planned send-out times of each measurement per day). I'd like to create two columns that indicate which survey day (1-6) and which measurement per day (1-6) this survey corresponds to.
The first problem is the format (!)...
STARTED has the format %Y-%m-%d %H:%M:%S, D5 to D10 %d.%m.%Y and D17 to D50 %d.%m.%Y %H:%M.
I tried dmy_hms() from lubridate, parse_date_time(), and simply as.POSIXct(), but I always fail to get STARTED and the D17 to D50 section into a comparable format. Any solutions on this one?
After just separating STARTED into date and time columns, I was able to compare it with D5 to D10 using ifelse() and to create the day column running from 1 to 6.
This might be more elegant with something like which(), but I was not able to create a vectorized version, as which(<<D5:D10>> == STARTED) would need to compare per row. Does anyone have a solution for this?
And lastly, how on earth can I set up the second column indicating the measurement time? The first and last survey of the day are easy, as they are also uniquely labelled, but for the other four I would need to compare, per day, whether the starting time is before the planned time of the following survey. I could imagine just checking whether STARTED falls between two adjacent planned survey times; as POSIXct objects that might work, if I can parse the different formats.
Help is greatly appreciated, thanks!
A screenshot from the beginning of the data:
[Screenshot of the dataframe, viewed with View() in R]
For these first few rows, the intended variable day would need to be c(1,2,1,1,1,2,2) and measurement c(3,2,4,2,1,2,3).
Your other columns are not formatted with %d.%m.%Y but with either %d.%m.%y (date only) or %d.%m.%y %H:%M. Note the change from %Y (four-digit year) to %y (two-digit year).
Try:
as.Date("20.05.22", format = "%d.%m.%y")
# [1] "2022-05-20"
as.POSIXct("20.05.22 06:00", format = "%d.%m.%y %H:%M")
# [1] "2022-05-20 06:00:00 EDT"

Days between dates calculation

I imported date variables as strings from SQL (date1) into Stata and then created a new date variable (date2) like this:
gen double date2 = clock(date1, "YMDhms")
format date2 %tc
However, now I want to calculate the number of days between two dates (date3-date2), formatted as above, but I can't seem to do it.
I don't care about the hms, so perhaps I should strip that out first? And then deconstruct the date into YYYY MM DD as separate variables? Nothing I do seems to work right now.
It sounds like by dates you actually mean timestamp (aka datetime) variables. In my experience, there's usually no need to cast dates/timestamps as strings since ODBC and Stata will handle the conversion to SIF td/tc formats nicely.
But perhaps you exported to a text file and then read the data in instead. Here are a couple of solutions.
tc timestamps are in milliseconds since 01jan1960 00:00:00.000, assuming 86,400 seconds per day (that is, ignoring leap seconds). This means that you need to divide your difference by 1000*60*60*24 = 86,400,000 milliseconds to get elapsed days.
For example, 2016 was a leap year:
. display (tc(01jan2017 00:00:00) - tc(01jan2016 00:00:00))/(1000*60*60*24)
366
You can also use the dofc() function to make dates out of timestamps and omit the division:
. display (dofc(tc(01jan2018 00:00:00)) - dofc(tc(01jan2016 00:00:00)))
731
2017 is not a leap year, so 366 + 365 = 731 days.
You can use generate with all these functions, though display is often easier for debugging initial attempts.
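For example, a sketch using the variable names from the question (date2 and date3 are both tc timestamps):
* elapsed days, keeping the fraction contributed by the time of day
gen double days = (date3 - date2)/(1000*60*60*24)
* whole days only, via conversion to td dates
gen days2 = dofc(date3) - dofc(date2)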

R: subsetting timestamped dataframe periodically

I have a csv file that contains many thousands of timestamped data points. The file includes the following columns: Date, Tag, East, North & DistFromMean.
[Sample of the data omitted.]
The data are recorded approximately every 15 minutes for 12 tags over a month. What I want to do is select subsets of the data, starting from the first date entry, e.g. every 3 hours; but because the tags transmit at slightly different rates, I need minimum and maximum bounds around each start and end time.
I have found a related previous question, but I don't understand the answer well enough to implement it.
The solution could first ask for the Tag number, then the period required, perhaps in minutes from the start time (i.e. every 3 hrs or 180 minutes), and then the minimum and maximum time range, both of which would be constant for whatever period was used. The minimum and maximum would probably need to be plus and minus 6 minutes from the selected period.
As the code below shows, I've managed to read in the file, change the Date column to POSIXlt, and extract data within a specific time frame, but the bit I'm stuck on is extracting the data every nth minute and within a range.
TestData <- read.csv("TestData.csv", header = TRUE, as.is = TRUE)
TestData$Date <- strptime(TestData$Date, "%d/%m/%Y %H:%M")
TestData[TestData$Date >= as.POSIXlt("2014-02-26 07:10:00") & TestData$Date < as.POSIXlt("2014-02-26 07:18:00"), ]
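One way to do the final step, sketched under the assumptions above (3-hour period, plus/minus 6-minute window): compute each row's offset from the first timestamp and keep rows that fall within the window around the nearest 3-hour mark.
TestData$Date <- as.POSIXct(TestData$Date)   # POSIXct is easier for arithmetic than POSIXlt
period <- 180 * 60                           # 3 hours, in seconds
window <- 6 * 60                             # +/- 6 minutes
offset  <- as.numeric(TestData$Date - min(TestData$Date), units = "secs")
nearest <- round(offset / period) * period   # nearest 3-hour mark for each row
Subset3h <- TestData[abs(offset - nearest) <= window, ]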
