Xarray time coordinate from integer to datetime - netcdf

I have a NetCDF datafile which I read in with xarray.
import pandas as pd
import xarray as xr
import numpy as np
dataDIR = myNetCDFdatafile
DS = xr.open_dataset(dataDIR)
DS
shows the dataset's structure:
As you can see, time is of int32 type. This integer represents a timestamp in seconds since 1970-01-01. The timestamps are not equally spaced.
How can I convert it to datetime type?
If I try the numpy way
DS.time = DS.time.astype('datetime64')
I do get a
ValueError: datetime64/timedelta64 must have a unit specified
Any hint?

Do you know what those time values stand for, 1647475208 for example?
Values like this usually represent seconds or days since some reference date. If you know the min, max and frequency of the time variable, you can use pd.date_range(),
something like this:
ds['time'] = pd.date_range('2001-01-01', '2010-12-31', freq = 'D') #for a daily interval for example
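The question states that the integers are seconds since 1970-01-01 and are not equally spaced, so a regular pd.date_range would not reproduce them. A minimal sketch of decoding the values directly instead, assuming a small made-up dataset in place of the real file (the variable name value and the sample timestamps are hypothetical):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the real file: an int32 time coordinate holding
# unevenly spaced seconds since 1970-01-01.
seconds = np.array([1647475208, 1647475268, 1647475400], dtype="int32")
DS = xr.Dataset(
    {"value": ("time", np.array([1.0, 2.0, 3.0]))},
    coords={"time": seconds},
)

# Interpret the integers as seconds since the Unix epoch. The unit has to be
# given explicitly, which is what the ValueError in the question asks for.
decoded = pd.to_datetime(DS["time"].values.astype("int64"), unit="s")

# Replace the integer coordinate with the decoded datetimes.
DS = DS.assign_coords(time=decoded)

print(DS["time"].values)
```

With this reading, 1647475208 corresponds to 2022-03-17 00:00:08 UTC. If the file used a different unit or reference date, the unit argument (and an origin) would need to change accordingly.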

Related

How to convert a date/time object into a decimal?

I have an object in R that I have converted to a POSIXct object:
data<- data.frame(date_time= c('2021-06-24 18:37:00', '2021-06-24 19:07:00', '2021-06-24 19:37:00', '2021-06-24 20:07:00','2021-06-24 20:37:00'))
data$date_time<- as.POSIXct(data$date_time, format = "%Y-%m-%d %H:%M:%S")
I want to convert this column to a decimal that gets bigger as the time passes. For example, '2021-06-24 18:37:00' should be smaller than '2021-06-24 19:07:00' and so on. However everything that I have tried so far does yield a decimal, but it does not get bigger as the time goes on. I have tried this:
data$date_time2<- yday(data$date_time) + hour(data$date_time)/24 + minute(data$date_time)/60
However this yields:
[1] 176.3667 175.9083 176.4083 175.9500 176.4500
I need the numbers to increase incrementally as minutes go by. Any help?
A POSIXct object stores the number of seconds since 1970-01-01, so as.integer(data$date_time) returns that count as an integer. Note that the count is referenced to the GMT (UTC) time zone.
To get the date as a decimal, some integer math is needed. The end result is the number of days since 1970-01-01, with the time of day as the fractional part.
data<- data.frame(date_time= c('2021-06-24 18:37:00', '2021-06-24 19:07:00', '2021-06-24 19:37:00', '2021-06-24 20:07:00','2021-06-24 20:37:00'))
data$date_time<- as.POSIXct(data$date_time, format = "%Y-%m-%d %H:%M:%S", tz="GMT")
intvalue <- as.integer(data$date_time)
#numbers of seconds, take the MOD with the seconds per day
#divide the result by seconds per day to make the decimal part
decfraction <- intvalue%%(3600*24)/(3600*24)
#perform integer division to get the number of days
days <- intvalue%/%(3600*24)
# or as.integer(as.Date(data$date_time))
#put together for the final answer
dateAsDecimal <- days + decfraction
#result
#18802.78 18802.80 18802.82 18802.84 18802.86
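For comparison, the same arithmetic can be sketched in Python using only the standard library. This is an illustration of the days-plus-fraction idea, not a translation of any particular R function; the helper name decimal_days is made up for this example, and the timestamps are treated as UTC as in the R code above.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 3600

def decimal_days(text):
    """Return days since 1970-01-01 (UTC) with the time of day as a fraction."""
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    seconds = int(dt.timestamp())
    # Integer division gives whole days, the remainder gives the time of day.
    days, remainder = divmod(seconds, SECONDS_PER_DAY)
    return days + remainder / SECONDS_PER_DAY

stamps = [
    "2021-06-24 18:37:00",
    "2021-06-24 19:07:00",
    "2021-06-24 19:37:00",
    "2021-06-24 20:07:00",
    "2021-06-24 20:37:00",
]
values = [decimal_days(s) for s in stamps]
print([round(v, 2) for v in values])
```

The values increase with time because the fractional part is seconds divided by seconds per day. The attempt in the question divides minutes by 60 instead of by 24*60, which is why its results jump around.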
If you are only concerned that the number mapped to preserves order then xtfrm will map objects to order preserving numbers. In the case of POSIXct objects it just returns the internal numeric representation, i.e. seconds since the UNIX Epoch.
xtfrm(data$date_time)

Saving NetCDF as subsets of the time dimension using xarray

I have a NetCDF with one variable (front) and four dimensions (time, altitude, lat and lon). Downloaded from https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdGAtfnt10day_LonPM180.html
It is a monthly composite, i.e. <xarray.DataArray 'time' (time: 151)>
array(['2001-01-16T12:00:00.000000000', '2001-02-16T00:00:00.000000000', ...'2013-12-16T12:00:00.000000000'], dtype='datetime64[ns]').
I would like to create a single file (either NetCDF or GeoTIFF, it doesn't matter which) for each timestamp.
I've tried:
ds = xr.open_dataset("Front_monthly2001to2013.nc", decode_times=True)
months, datasets = zip(*ds.groupby("time.month")) #.groupby("time.month")
paths = ["test%s.nc" % m for m in months]
xr.save_mfdataset(datasets, paths)
But then it groups all the months and I get 12 output files instead of one for each year-month.
How do I save by year-month (which is the same, in my case, as each timestamp)?
Thanks in advance
You can use the datetime components of your time dimension to re-format the timestamp to year & month and use it for grouping:
months, datasets = zip(*ds.groupby(ds.time.dt.strftime("%Y%m")))
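A minimal sketch of how this grouping might be used end to end, assuming a small synthetic dataset in place of the downloaded file (the data values, the 24 monthly timestamps and the front_ file prefix are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: 24 monthly timestamps and a tiny lat/lon grid.
time = pd.date_range("2001-01-01", periods=24, freq="MS") + pd.Timedelta(days=15)
ds = xr.Dataset(
    {"front": (("time", "lat", "lon"), np.zeros((24, 2, 3)))},
    coords={"time": time, "lat": [10.0, 11.0], "lon": [100.0, 101.0, 102.0]},
)

# Group by a year-month label such as "200101" instead of the month number,
# so January 2001 and January 2002 end up in different groups.
labels, datasets = zip(*ds.groupby(ds.time.dt.strftime("%Y%m")))

# One output path per year-month label.
paths = ["front_%s.nc" % label for label in labels]

# Writing is left commented out so the sketch runs without a NetCDF backend:
# xr.save_mfdataset(datasets, paths)
print(len(paths), paths[0], paths[-1])
```

Each element of datasets keeps a time dimension of length one here, since every timestamp falls in its own year-month.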

How to set datetime index frequency in workdays time series pandas

When dealing with a time series pandas dataframe with date index (working days from Monday to Friday), I tried to change the index frequency from None to 'D' for daily time series. I got the following error:
ValueError: Inferred frequency None from passed values does not conform to passed frequency D
This is what my dataframe looks like:
And this is my code for setting the frequency:
df.index.freq = 'D'
df = df.asfreq('B')
'B' stands for business day frequency (Mon-Fri).
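A minimal sketch of the difference, assuming a made-up dataframe whose index holds six consecutive working days (the column name x and the dates are hypothetical):

```python
import pandas as pd

# Monday to Friday, then the following Monday: a weekend gap in the index.
idx = pd.DatetimeIndex(
    ["2021-01-04", "2021-01-05", "2021-01-06",
     "2021-01-07", "2021-01-08", "2021-01-11"]
)
df = pd.DataFrame({"x": range(6)}, index=idx)
print(df.index.freq)  # no frequency is attached to an index built this way

# Forcing a calendar-day frequency is expected to fail as in the question,
# because the weekend gap does not fit 'D'.
error_message = None
try:
    df.index.freq = "D"
except ValueError as exc:
    error_message = str(exc)

# Conforming the frame to business days attaches the 'B' frequency instead.
df = df.asfreq("B")
print(df.index.freqstr)
```

Note that asfreq reindexes the frame, so any business day missing from the original index (a public holiday, for instance) would appear as a row of NaN values.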

Reading Time series data in R

I am trying to import time series data in R with the code below. The data runs from 1-7-2014 to 30-4-2017, making 1035 data points. But when I use the code below, it gives 1093 observations.
series <- ts(data1, start=c(2014,7,1), end=c(2017,4,30), frequency = 365)
Can someone help me understand where I am going wrong?
ts doesn't accept start and end in this form. Each must be either a single number or a vector of two integers. In the second case the values are the year and, with frequency = 365, the day number within that year, counting from 1 January.
With the help of lubridate you can use the following. decimal_date converts a date to a decimal year, which is suitable for ts.
library(lubridate)
series <- ts(data1, start=decimal_date(as.Date("2014-07-01")), end=decimal_date(as.Date("2017-04-30") + 1), frequency = 365)
> length(series)
[1] 1035
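The two counts can be checked with a little arithmetic, sketched here in Python. This assumes that ts reads only the first two elements of start and end, as (year, period within the year), so c(2014,7,1) is taken as period 7 of 2014 and c(2017,4,30) as period 4 of 2017; the helper name ts_length is made up for this example.

```python
from datetime import date

def ts_length(start, end, frequency):
    """Number of observations between two (year, period) pairs, inclusive."""
    return (end[0] - start[0]) * frequency + (end[1] - start[1]) + 1

# What the original call is assumed to describe: period 7 of 2014 through
# period 4 of 2017, with 365 periods per year.
misread = ts_length((2014, 7), (2017, 4), 365)

# The actual number of calendar days from 2014-07-01 to 2017-04-30, inclusive.
calendar_days = (date(2017, 4, 30) - date(2014, 7, 1)).days + 1

print(misread, calendar_days)
```

Under that assumption the first number matches the 1093 observations reported in the question, and the second matches the expected 1035 data points.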

Plotting millisecond range in pandas

I am trying to create a plot with an x range of e.g. 500 milliseconds.
rng = date_range(s,periods=500,freq="U")
df = DataFrame(randn(500),index=rng,columns=["A"])
to plot column A:
df["A"].plot()
The whole plot will be squeezed into a single spike because the x range is defined from Jan-2011 until Jul-2014.
Is there a way to change this?
I made a github issue regarding your problem: https://github.com/pydata/pandas/issues/1599 Please check back next week for a bug-fix release of pandas.
Also, the offset alias for millisecond frequency in pandas is 'L'. 'U' is the microsecond frequency alias.
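A minimal sketch of building the 500 millisecond range itself, assuming a made-up start time since s is not shown in the question. It uses the spelled-out alias "ms" for milliseconds, which pandas also accepts:

```python
import numpy as np
import pandas as pd

# Hypothetical start time; the question's `s` is not shown.
start = "2012-07-10 12:00:00"

# 500 points spaced one millisecond apart.
rng = pd.date_range(start, periods=500, freq="ms")
df = pd.DataFrame(np.random.randn(500), index=rng, columns=["A"])

# Total span covered by the index.
span = rng[-1] - rng[0]
print(span)
```

With microsecond spacing instead, the same 500 points would cover only 499 microseconds, so the alias matters when the goal is a 500 millisecond x range.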

Resources