I have a dataset with intraday data where three important variables are Country, Datetime, Price.
An example could be:
Sweden, 2019-12-23 09:08:00, 105.31
This data is downloaded from Bloomberg, and it looks like it uses my local time (Denmark). For example, for Australia I have a market which opens at 23:00, which does not make sense unless it is European time. I would like to convert the time in the data to the local time of that particular country. Of course, I could add or subtract some hours, but the time difference is not fixed: some countries have summer/winter time while others don't, and countries which do have summer/winter time may change on different days (for example, I think there is about one week between the time change in the US and in Europe). Do you have any advice on how to transform my dataset into the local time zone? So, if it says "2019-12-23 09:08:00", then I would like to know that it was 09:08 in the morning in that particular country (and not in my country). I really hope there is a smart R function for this.
Thanks in advance!
You could use lubridate::force_tz and lubridate::with_tz:
dat <- as.POSIXct("2021-05-01 12:00:00",tz = "UTC")
# force_tz keeps the clock time and changes the zone (a different instant)
lubridate::force_tz(dat,tz="CET")
#> [1] "2021-05-01 12:00:00 CEST"
# with_tz keeps the instant and shows it in another zone
lubridate::with_tz(dat,tzone="CET")
#> [1] "2021-05-01 14:00:00 CEST"
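For the per-country case in the question, a minimal base R sketch might look like the following. It assumes the timestamps really are Copenhagen wall-clock times, and the `tz_lookup` table mapping countries to time zone names is hypothetical (you would need to supply the right zone per market). Because a POSIXct vector carries a single time zone attribute, the local times are stored here as strings.

```r
# Hypothetical lookup table: country -> Olson time zone name (an assumption)
tz_lookup <- c(Sweden = "Europe/Stockholm", Australia = "Australia/Sydney")

df <- data.frame(
  Country  = c("Sweden", "Australia"),
  Datetime = c("2019-12-23 09:08:00", "2019-12-23 23:00:00"),
  Price    = c(105.31, 100.00)
)

# Interpret the recorded clock times as Danish local time
dt <- as.POSIXct(df$Datetime, tz = "Europe/Copenhagen")
zones <- unname(tz_lookup[as.character(df$Country)])

# Render each instant in the time zone of its own country;
# daylight saving rules are handled by the time zone database
df$LocalTime <- vapply(
  seq_along(dt),
  function(i) format(dt[i], format = "%Y-%m-%d %H:%M:%S", tz = zones[i]),
  character(1)
)
```

With this sample data, 23:00 in Copenhagen corresponds to 09:00 the next morning in Sydney, which matches the market open that looked odd in the raw data.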
Related
I am working with financial asset pricing data and I am interested in finding the time difference between the dates of two consecutive operations on the market.
A simple example could be:
first operation -> client X buys stock A on 2020-01-02 09:00:00
second operation -> client X buys stock B on 2020-01-03 09:00:00
Here is my problem:
I am looking for a function that computes the FINANCIAL time difference between these two datetime objects.
Hence, I am not interested in a simple calendar time difference (which can be computed in R using the well-known difftime() function), but in a time difference that considers the financial or trading day, that is (roughly speaking) a day that starts at 09:00 and ends at 18:00.
Therefore, this function should give a result of 9 hours (if the unit reference is hour) to the simple example above, instead of 24 hours as the usual calendar time difference would suggest.
A more complex version of this function would also take into account the market holidays and exclude weekends from computation.
Here is an example in R:
d1 <- as.POSIXct("2020-01-02 09:00:00", tz = "UTC")
d2 <- as.POSIXct("2020-01-03 09:00:00", tz = "UTC")
difftime(d2, d1, units = "hours")
This produces a time difference of 24 hours.
However, the expected result would be just 9 hours, since the financial (or market trading) day ends on 2020-01-02 at 18:00 and starts again the day after at 09:00, hence there should be only 9 hours of trading between the two.
I mainly work in R, so I would appreciate advice for that language, but pointers to something similar in other languages would also be very useful.
Thank you a lot for your help and time.
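One possible approach, sketched in base R under stated assumptions: walk over each calendar day between the two timestamps, skip weekends and any supplied holidays, and sum the overlap with that day's trading session. The function name `trading_hours_between` and its arguments are hypothetical, the session is assumed to run on whole hours (09:00 to 18:00 by default), and `start` is assumed not to be later than `end`.

```r
trading_hours_between <- function(start, end, open = 9, close = 18,
                                  holidays = as.Date(character(0)),
                                  tz = "UTC") {
  # All calendar days touched by the interval
  days <- seq(as.Date(start, tz = tz), as.Date(end, tz = tz), by = "day")
  # Drop Saturdays (wday 6), Sundays (wday 0) and listed holidays
  wday <- as.POSIXlt(days)$wday
  days <- days[!(wday %in% c(0, 6)) & !(days %in% holidays)]

  total <- 0
  for (d in as.character(days)) {
    session_open  <- as.POSIXct(paste(d, sprintf("%02d:00:00", open)),  tz = tz)
    session_close <- as.POSIXct(paste(d, sprintf("%02d:00:00", close)), tz = tz)
    # Overlap of [start, end] with this day's session, in hours
    overlap <- as.numeric(difftime(min(end, session_close),
                                   max(start, session_open),
                                   units = "hours"))
    total <- total + max(0, overlap)
  }
  total
}

d1 <- as.POSIXct("2020-01-02 09:00:00", tz = "UTC")
d2 <- as.POSIXct("2020-01-03 09:00:00", tz = "UTC")
trading_hours_between(d1, d2)
```

For the example in the question this gives 9 hours rather than 24. Market holidays can be passed as a Date vector through `holidays`.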
What is the correct way to deal with datetimes in ggplot ?
I have data at several different dates and I would like to facet each date by the same time of day, e.g. between 1:30PM and 1:35PM, and plot the points between this time frame, how can I achieve this?
My data looks like:
datetime col1
2015-01-02 00:00:01 20
... ...
2015-01-02 11:59:59 34
2015-02-19 00:00:03 12
... ...
2015-02-19 11:59:58 27
I find myself often wanting to ggplot time series using datetime objects as the x-axis but I don't know how to use times only when dates aren't of interest.
The lubridate package will do the trick. There are commands you could use, specifically floor_date or ceiling_date to transform your datetime array.
I always use the chron package for times. Its times class disregards dates and stores the time of day numerically as a fraction of a day (e.g. 1:30 PM is stored as 0.5625, because it's 13.5 hours into a 24-hour day). That allows you to perform math on times, which is great for a lot of reasons, including calculating average time, the time between two points, etc.
For specific help with your plot you'll need to share a sample data frame in an easily copy-able format, and show the code you've tried so far.
This is a question I'd asked previously regarding the chron package, and it also gives an idea of how to share your data/ask a question that's easier for folks to reproduce and therefore answer:
Clear labeling of times class data on horizontal barplot/geom_segment
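As a rough sketch of the faceting idea from the question, using base R for the preparation: split each datetime into a date and a time of day, put every time of day on the same dummy date so the x-axis is comparable across days, and keep only the 1:30 PM to 1:35 PM window. The sample data below is made up for illustration, and the plotting step only runs if ggplot2 is installed.

```r
# Made-up sample data (an assumption, not the asker's data)
df <- data.frame(
  datetime = as.POSIXct(c("2015-01-02 13:29:00", "2015-01-02 13:31:00",
                          "2015-01-02 13:34:30", "2015-02-19 13:30:00",
                          "2015-02-19 13:36:00"), tz = "UTC"),
  col1 = c(20, 34, 25, 12, 27)
)

# Separate the calendar date from the time of day (seconds since midnight)
df$date <- as.Date(df$datetime, tz = "UTC")
lt <- as.POSIXlt(df$datetime)
df$secs <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Same dummy date for every row, so only the clock time differs
df$time_of_day <- as.POSIXct("1970-01-01", tz = "UTC") + df$secs

# Keep observations between 13:30:00 and 13:35:00 (inclusive)
in_window <- df[df$secs >= 13.5 * 3600 & df$secs <= (13 * 60 + 35) * 60, ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(in_window, ggplot2::aes(time_of_day, col1)) +
    ggplot2::geom_point() +
    ggplot2::facet_wrap(~ date) +
    ggplot2::scale_x_datetime(date_labels = "%H:%M")
}
```

Printing `p` should give one panel per date with a shared clock-time axis.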
I am a GIS analyst and using R for a project. I am a bit rusty with R code. I have data in csv format from radio collared foxes with datetime stamps and GPS locations. However, throughout our study the time interval changed so some of the dates have 3 records per day and some have only one. For example:
[1] 2014-12-24 03:00:00
[2] 2014-12-24 12:00:00
[3] 2014-12-24 22:00:00.
There are duplicate datetimes as well that I need to thin, but they have different GPS locations:
[55] 2015-11-03 12:00:00
[56] 2015-11-03 12:00:00.
Ultimately I need just one record per day and I would like it to randomly choose which one is deleted so that I end up with a mix of time values. For example:
[1] 2014-12-24 12:00:00
[2] 2014-12-25 22:00:00.
I tried the !duplicated function with the date only in a separate column, but the problem is that it only returns the first value, so all the times would be at 3:00 am. Example code:
oneaday6730 <- xFox6730[!duplicated(xFox6730$Date), , drop = FALSE]
With dplyr, assuming mydf is your data:
library(dplyr)
# keep one randomly chosen row per date
mydf %>%
group_by(Date) %>%
sample_n(1) -> result
Note that I'm making some assumptions on the structure of your data, in particular that the Date column contains the date you want to sample on.
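If the data only has a datetime column, the same idea can be sketched in base R: derive the date first, then pick one random row index per date. The data frame `fox` and its coordinate values below are made up for illustration. Note that `sample()` on a single number samples from 1 up to that number, so days with only one record are handled separately.

```r
set.seed(42)  # only so the random choice is reproducible

# Made-up example data (an assumption, not the asker's file)
fox <- data.frame(
  datetime = as.POSIXct(c("2014-12-24 03:00:00", "2014-12-24 12:00:00",
                          "2014-12-24 22:00:00", "2014-12-25 22:00:00",
                          "2015-11-03 12:00:00", "2015-11-03 12:00:00"),
                        tz = "UTC"),
  lon = c(10.1, 10.2, 10.3, 10.4, 10.5, 10.6),
  lat = c(55.1, 55.2, 55.3, 55.4, 55.5, 55.6)
)

# Date-only column to group on
fox$Date <- as.Date(fox$datetime, tz = "UTC")

# Pick one row index at random within each date
pick_one <- function(i) if (length(i) == 1) i else sample(i, 1)
keep <- vapply(split(seq_len(nrow(fox)), fox$Date), pick_one, integer(1))

one_per_day <- fox[keep, ]
```

Duplicate datetimes with different GPS locations are thinned the same way, since only one row per date survives.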
I have chemistry water data taken from a river. Normally, the sample dates were on a Wednesday every two weeks. The data record starts in 1987 and ends in 2013.
Now, I want to re-check whether there are any inconsistencies within the data, that is, whether the samples were really taken every 14 days. For that task I want to use the R function difftime, but I have no idea how to do that for multiple dates.
Here is some data:
Date Value
1987-04-16 12:00:00 1,5
1987-04-30 12:00:00 1,2
1987-06-25 12:00:00 1,7
1987-07-14 12:00:00 1,3
Can you tell me how to use the function difftime properly in this case, or suggest any other function that does the job? The result should be the number of days between the samplings and/or a TRUE/FALSE flag for the 14 days.
Thanks to you guys in advance. Any google-fu was to no avail!
Assuming your data.frame is named dd, you'll want to verify that the Date column is being treated as a date. Most of the time R will read such values as character, which may get converted to a factor in a data.frame. If class(dd$Date) is "character" or "factor", run
dd$Date<-as.POSIXct(as.character(dd$Date), format="%Y-%m-%d %H:%M:%S")
Then you can do a simple diff() to get the time difference in days
diff(dd$Date)
# Time differences in days
# [1] 14 56 19
# attr(,"tzone")
# [1] ""
so you can check which ones are over 14 days.
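To get the TRUE/FALSE flag asked for in the question, a small sketch along the same lines is shown below. It fixes the time zone to UTC (an assumption, so that daylight saving changes cannot shift a gap by an hour) and requests the unit explicitly instead of relying on the unit diff() picks automatically.

```r
dd <- data.frame(
  Date  = c("1987-04-16 12:00:00", "1987-04-30 12:00:00",
            "1987-06-25 12:00:00", "1987-07-14 12:00:00"),
  Value = c(1.5, 1.2, 1.7, 1.3)
)
dd$Date <- as.POSIXct(dd$Date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Gap to the previous sample, always expressed in days
gap_days <- as.numeric(diff(dd$Date), units = "days")

# The first sample has no predecessor, hence NA
dd$gap_days <- c(NA, gap_days)
dd$is_14    <- dd$gap_days == 14
```

Rows where `is_14` is FALSE are the samplings that broke the two-week rhythm.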
For instance, let's say I want to extract the price at 09:04:00 everyday from a timeseries that is formatted as:
DateTime | Price
2011-04-09 09:01:00 | 100.00
2011-04-09 09:02:00 | 100.10
2011-04-09 09:03:00 | 100.13
(NB: there is no | in the actual data, I've just included it here to illustrate that the DateTime is the index and Price is the coredata and that the two are distinct within the xts object)
and put those extracted values into an xts vector...what is the most efficient way to do this?
Also, if i have a five year time series of a cross-border spread, where - due to time differences - the spread opens at different times during the year (say 9am during winter, and 10am during summer) how can I get R to take account of those time differences and recognise either 9am-16:30 or 10am-16:30 as the same "day" interval.
In other words, I want to convert an intraday, 1-minute tick data file to daily OHLC data. Normally I would just use xts and to.period to do this, but - given the time difference noted above - this gives odd / strange day start/end times due
Any advice greatly appreciated!
You can use the "T" prefix with xts subsetting to specify a time interval for each day. You must specify an interval; a single time will not work.
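library(xts)
set.seed(21)
# two days of simulated one-minute prices
x <- xts(cumprod(1+rnorm(2*60*24)/100),
as.POSIXct("2011-04-09 09:01:00")+60*(1:(2*60*24)))
# a narrow interval around 09:02:00, applied to every day
x["T09:01:59/T09:02:01"]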
# [,1]
# 2011-04-09 09:02:00 0.9980737
# 2011-04-10 09:02:00 1.0778835
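For the second part of the question (a session that opens at a different UTC hour in winter and summer), one possible approach is to assign each observation to a day and session using the exchange's own local time. The sketch below uses base R rather than xts and to.period, and the function name `daily_ohlc`, the London time zone, and the sample prices are all assumptions for illustration. It expects the timestamps to be sorted.

```r
daily_ohlc <- function(time, price, tz,
                       open = "09:00:00", close = "16:30:00") {
  # Local calendar day and local clock time in the exchange's zone
  local_day   <- format(time, "%Y-%m-%d", tz = tz)
  local_clock <- format(time, "%H:%M:%S", tz = tz)
  # "HH:MM:SS" strings compare correctly as text
  in_session <- local_clock >= open & local_clock <= close
  p <- split(price[in_session], local_day[in_session])
  data.frame(
    day   = names(p),
    open  = vapply(p, function(v) v[1], numeric(1)),
    high  = vapply(p, max, numeric(1)),
    low   = vapply(p, min, numeric(1)),
    close = vapply(p, function(v) v[length(v)], numeric(1)),
    row.names = NULL
  )
}

# Timestamps recorded in UTC: the same local session starts at
# 09:00 UTC in winter and 08:00 UTC in summer
time <- as.POSIXct(c("2020-01-15 08:30:00", "2020-01-15 09:00:00",
                     "2020-01-15 12:00:00", "2020-01-15 16:30:00",
                     "2020-07-15 07:30:00", "2020-07-15 08:00:00",
                     "2020-07-15 12:00:00", "2020-07-15 15:30:00"),
                   tz = "UTC")
price <- c(99, 100, 103, 101, 199, 200, 198, 202)

ohlc <- daily_ohlc(time, price, tz = "Europe/London")
```

Both days end up with the same local 09:00 to 16:30 session, so the pre-open observations (99 and 199) are excluded in winter and summer alike.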