Directional statistics in R - r

I need to create a function for some work I'm doing on directional statistics. I want to show the distribution of flood events using a circle and calculate the mean direction and variance.
I need to calculate the angular value by multiplying the Julian date (day of the year) by (360/365), which gives degrees (use 2*pi/365 for radians). I am having problems because I need a function that takes account of the leap years in the 40-year record I am considering, i.e. IF leap year, angular value = Julian date x (360/366).
The data I am using are peaks above threshold, so I do not have a data point for every year, and in some years I have more than one entry:
Date Time Flow
04/05/1973 00:00 44.67
22/06/1974 00:00 128.38
22/11/1974 23:45 129.15
26/09/1976 22:00 89.51
15/10/1976 00:00 139.35
24/02/1978 19:30 183.69
27/12/1978 04:00 229.65
18/03/1980 09:15 117.7
02/03/1981 22:00 262.39
Many thanks
Rich

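There may be a more elegant way to do this, but try
df$Year <- format(as.Date(df$Date, "%d/%m/%Y"), "%Y")
That should put just the year in a single column (the as.Date() call assumes Date is still stored as dd/mm/yyyy text). Then make a new column to indicate whether it is a leap year
df$Leap <- 0
df$Leap[df$Year %in% c("1972", "1976", "1980")] <- 1
extending the list to every leap year in your record. Depending on your data, you may find it easier to convert the year to a number and then use %% to check whether it divides evenly by 4. That shortcut holds from 1901 to 2099, including 2000; century years such as 1900 and 2100 are not leap years.
Then, because if only tests a single value, you can use the vectorised ifelse() to the effect of
df$Angle <- ifelse(df$Leap == 0, df$Julian * 360/365, df$Julian * 360/366)
where df$Julian is a placeholder name for the column holding the day of the year.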
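The steps above can be pulled together in a short base R sketch. The function names (is_leap, flood_angles, circ_summary) are made up for illustration, and the dates are assumed to be dd/mm/yyyy text as in the question. Angles are returned in radians, and the circular variance is taken as one minus the mean resultant length, which is one common definition rather than the only one.

```r
# Hypothetical helpers, not from any package.

# Full Gregorian leap-year rule (handles 1900, 2000, 2100 correctly)
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

# Convert dd/mm/yyyy strings to angles in radians, using 365 or 366
# days depending on the year of each event
flood_angles <- function(dates) {
  d    <- as.Date(dates, format = "%d/%m/%Y")
  year <- as.integer(format(d, "%Y"))
  doy  <- as.integer(format(d, "%j"))      # day of the year, 1..366
  days <- ifelse(is_leap(year), 366, 365)
  doy * 2 * pi / days
}

# Mean direction and circular variance from a vector of angles
circ_summary <- function(theta) {
  C <- mean(cos(theta))
  S <- mean(sin(theta))
  R <- sqrt(C^2 + S^2)                     # mean resultant length, 0..1
  list(mean_direction   = atan2(S, C) %% (2 * pi),
       resultant_length = R,
       variance         = 1 - R)
}
```

For example, circ_summary(flood_angles(c("04/05/1973", "22/06/1974", "22/11/1974"))) returns the three summary values for the first three events in the question.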

Related

Accumulated precipitation between 12 UTC of one day and 12 UTC of the next day

I have a .nc file that contains precipitation data every 6 hours for 1 full year. My interest is to calculate the daily precipitation and compare it with observed data, and for this I must make them coincide temporally. To achieve this, the precipitation should accumulate between 12 UTC of one day and 12 UTC of the next day. Does anyone have a suggestion on how to achieve this with CDO?
Thank you!
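Well, if the first slice covers 12-18 UTC, then essentially you want to average the timeseries 4 slices at a time (right?), in which case you can use the command below. (If the variable holds a 6-hourly accumulation rather than a rate, timselsum gives the daily total instead of the mean.)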
cdo timselmean,4 infile.nc outfile.nc
If the timeseries starts instead at 00, you may need to remove the first two timeslices before you start (cdo seltimestep)
Another method is a bit more of a fudge, in that you can shift the series by 12 hours, and then use the day mean function. This would have the advantage of working for any frequency of data (i.e. you don't hardwire the factor "4" based on the data frequency)
cdo daymean -shifttime,-12hours infile.nc outfile.nc
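To see why shifting by 12 hours works, here is a small base R sketch of the same idea on synthetic data. It is not a replacement for the CDO command. It assumes each 6-hourly value is stamped at the start of its interval and is an accumulation, so the daily figure is a sum; the variable names are made up for illustration.

```r
# Synthetic 6-hourly series: 3 days, one unit of precipitation per slice
time   <- seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
              by = "6 hours", length.out = 12)
precip <- rep(1, 12)

# Shift the time axis back by 12 hours so that 12 UTC becomes 00 UTC
shifted <- time - 12 * 3600

# Group by the shifted calendar day; the label "2020-01-01" now covers
# 12 UTC on 1 Jan up to 12 UTC on 2 Jan
day   <- format(shifted, "%Y-%m-%d", tz = "UTC")
daily <- tapply(precip, day, sum)
```

The first and last groups contain only two slices each, which is the same edge effect that trimming the first and last time steps is meant to remove.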
The answer Adrian Tompkins gives should work well. One additional point to note is that you can remove time steps in CDO. So, if your time starts at 0 UTC and ends at 24 UTC, you do not want the first two and last two time steps in Adrian's first answer, and you could modify it as follows:
cdo -timselmean,4 -delete,timestep=-1,-2,1,2 infile.nc outfile.nc
This will require a 2.x version of CDO.

How do I transform half-hourly data that does not span the whole day to a Time Series in R?

This is my first question on stackoverflow, sorry if the question is poorly put.
I am currently developing a project where I predict how much a person drinks each day. I currently have data that looks like this:
The menge column represents how much water a person has actually drunk in 30 minutes (So first value represents amount from 8:00 till before 8:30 etc..). This is a 1 day sample from 3 months of data. The day starts at 8 AM and ends at 8 PM.
I am trying to forecast the Time Series for each day. For example, given the first one or two time steps, we would predict the whole day and then we know how much in total the person has drunk until 8 PM.
I am trying to model this data as a Time Series object in R (Google Colab), in order to use Croston's Method for the forecasting. Using the ts() function, what should I set the frequency to knowing that:
The data is half-hourly
The data is from 8:00 till 20:00 each day (Does not span the whole day)
Would I need to make the data span the whole day by adding 0 values? Are there maybe better approaches for this? Thank you in advance.
When using the ts() function, the frequency defines the number of (usually regularly spaced) observations within a given time period. In your example, the observations are every 30 minutes between 8 AM and 8 PM, and the time period is 1 day. Choosing a period of 1 day assumes that the pattern within each day is of most interest; you could also use 1 week.
So within each day of your data (8AM-8PM) you have 24 observations (24 half hours). So a suitable frequency for this data would be 24.
You can also pad the data with 0 values, however this isn't necessary and would complicate the model. If you padded the data so that it has observations for all half-hours of the day, the frequency would then be 48.
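A minimal sketch of the unpadded option, using synthetic counts in place of the real menge column (the Poisson values and the three-day length are assumptions for illustration only):

```r
set.seed(1)

# Three synthetic days of 24 half-hourly values each (08:00 to 20:00)
menge <- rpois(24 * 3, lambda = 0.5)

# One "period" is one day of 24 observations, so frequency = 24
y <- ts(menge, frequency = 24)

# cycle() gives the position of each observation within its day (1..24)
slot <- cycle(y)
```

Here observation 25 is the first half-hour of the second day, so its cycle position is 1 again.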

Is IDL able to add / subtract from date?

As the title says, I was wondering whether IDL is able to add or subtract days / months / years to or from a given date.
For example:
given_date = anytim('01-jan-2000')
print, given_date
1-Jan-2000 00:00:00.000
When I add 2 weeks to given_date, this date should appear:
15-Jan-2000 00:00:00.000
I was already looking for a solution for this problem, but I unfortunately couldn't find any solution.
Note:
I am using a normal calendar date, not the julian date.
Are you only concerned with dates after 1582? Is accuracy to the second important?
The ANYTIM routine is not part of the IDL distribution. There may be third-party routines to handle time increments, but I don't know of any built in to the IDL library.
By default, which is what you are using, ANYTIM returns seconds from Jan 1, 1979. So to add or subtract some number of days, weeks, or years, you could calculate the number of seconds in the time interval. Of course, this does not take into account leap seconds or leap years (leap years are fairly easy to take into account, but leap seconds require a database of when they were added). Adding months also requires working out which month you are in, so that you know how many days it has.
IDL can convert to and from Julian dates using JULDAY and CALDAT.
You may also read and write Julian dates (which are doubles or long integers) to and from strings using the format keyword to PRINT, STRING, and READS.
You'll want to use the (C()) calendar date format code.
format='(c(cdi0,"-",cMoa,"-"cyi04," ",cHi02,":",cmi02,":",csf06.3))'
date = julday(1, 1, 2000, 0, 0, 0)  ; explicit midnight, so the time prints as 00:00:00
print, date, format=format
; 1-Jan-2000 00:00:00.000
date = date + 14
print, date, format=format
; 15-Jan-2000 00:00:00.000
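For comparison with the R used elsewhere on this page, the seconds-since-epoch arithmetic described above can be sketched as follows. This is not IDL and does not call ANYTIM; the 1979 epoch is taken from the answer, and the variable names are made up for illustration.

```r
# Epoch used by ANYTIM by default, according to the answer above
epoch <- as.POSIXct("1979-01-01 00:00:00", tz = "UTC")
given <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")

# Seconds from the epoch to the given date
secs <- as.numeric(difftime(given, epoch, units = "secs"))

# Adding two weeks is plain arithmetic on the second count
later <- epoch + secs + 14 * 86400

# Adding a month is not a fixed number of seconds; seq() on Date
# objects steps by calendar month instead
next_month <- seq(as.Date("2000-01-01"), by = "month", length.out = 2)[2]
```

Stepping by calendar month from a day that does not exist in the target month (for example the 31st) needs extra care and is left out of this sketch.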

Determine week number from date over several years

I'm looking for a way to determine the week number (week beginning on Monday) over several years. That means I don't want to have 0-53 but if, let's say I have 2 years of dates, I want them to be numbered with 0-106 in R.
I tried strftime(Datum, format ="%W") but then I only get the annual week number and not as a whole.
Given that you did not provide any data, I took the liberty of creating some:
#create data
Datum<-c("2013-03-01", "2014-06-02", "2013-06-01")
# format data to year-month-day with strptime
Datum<-strptime(Datum, "%Y-%m-%d")
You now need to identify the origin year. As I'm sure you are aware, not all years have the same number of weeks (52.29 in a leap year vs. 52.14 in a standard calendar year), but as this is unlikely to matter for only 2 years, we can use the number of weeks returned by the strftime function.
origin.year=as.numeric(min(substring(Datum,1,4)))
# number of weeks in first year (offset for second year)
n.weeks<-52
Now we can create a vector containing the number of weeks to offset each week in Datum (X).
X<-as.numeric(substring(Datum,1,4)!=origin.year)*n.weeks
We can then simply add this vector to the number of weeks returned by strftime when it is applied to Datum
week.vec<-as.numeric(strftime(Datum, "%W"))+X
This will work for 2 years, but if you have more years than this, you will need to modify the offsets to account for this.
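An alternative that avoids per-year offsets altogether is to count whole weeks from the Monday on or before the earliest date. This is a sketch of a different technique from the one above, not a modification of it; the names first and origin are made up for illustration, and week 0 is the week containing the earliest date.

```r
Datum <- as.Date(c("2013-03-01", "2014-06-02", "2013-06-01"))

# Monday on or before the earliest date; %u is the ISO weekday,
# 1 for Monday through 7 for Sunday
first  <- min(Datum)
origin <- first - (as.integer(format(first, "%u")) - 1)

# Whole weeks elapsed since that Monday, across any number of years
week.vec <- as.integer(Datum - origin) %/% 7
week.vec
# 0 66 13
```

Because the count is based on elapsed days, it keeps increasing across year boundaries and does not depend on how many weeks a particular year has.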

How do I extract a specific, recurring time from 1 minute tick data in R?

For instance, let's say I want to extract the price at 09:04:00 everyday from a timeseries that is formatted as:
DateTime | Price
2011-04-09 09:01:00 | 100.00
2011-04-09 09:02:00 | 100.10
2011-04-09 09:03:00 | 100.13
(NB: there is no | in the actual data, i've just included it here to illustrate that the DateTime is the index and Price is the coredata and that the two are distinct within the xts object)
and put those extracted values into an xts vector...what is the most efficient way to do this?
Also, if i have a five year time series of a cross-border spread, where - due to time differences - the spread opens at different times during the year (say 9am during winter, and 10am during summer) how can I get R to take account of those time differences and recognise either 9am-16:30 or 10am-16:30 as the same "day" interval.
In other words, I want to convert an intraday, 1m tick data file to daily OHLC data. Normally would just use xts and to.period to do this, but - given the time difference noted above - gives odd / strange day start/end times due
Any advice greatly appreciated!
You can use the "T" prefix with xts subsetting to specify a time interval for each day. You must specify an interval; a single time will not work.
library(xts)
set.seed(21)
# two days of synthetic one-minute prices
x <- xts(cumprod(1+rnorm(2*60*24)/100),
         as.POSIXct("2011-04-09 09:01:00")+60*(1:(2*60*24)))
x["T09:01:59/T09:02:01"]
# [,1]
# 2011-04-09 09:02:00 0.9980737
# 2011-04-10 09:02:00 1.0778835
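If xts is not available, the same extraction can be sketched in base R by matching on the formatted time of day. The data below are synthetic, the UTC time zone is an assumption, and the column names simply mirror the layout in the question.

```r
# Two days of synthetic one-minute data starting at 09:01:00
idx   <- as.POSIXct("2011-04-09 09:01:00", tz = "UTC") +
         60 * (0:(2 * 24 * 60 - 1))
price <- 100 + seq_along(idx) / 100

# Keep only the rows whose clock time is exactly 09:04:00
at_0904 <- format(idx, "%H:%M:%S") == "09:04:00"
daily   <- data.frame(DateTime = idx[at_0904], Price = price[at_0904])
```

This returns one row per day. It relies on the tick being stamped exactly on the minute, which is why the xts approach above uses a short interval around the target time.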
