R: ggplot with datetime on x axis

I'm trying to plot 5 days of historical stock data in R using ggplot, with the datetime on the x-axis and the stock values ('close') on the y-axis. I only want to show the minutes of the day when the stock market is open, and my data set contains only 7 hours of values per day for 5 days.
But when I plot it with ggplot, the x-axis covers all 24 hours of every day, including the hours when the market is closed.
ggplot(data = df_stock, aes(x = datetime, y = close)) +
geom_line()
I've tried googling this and using the R help function. I'm quite new to R so my apologies if this is very easy to solve. I hope someone can guide me in the right direction.

A minimal reproducible example would help to address this question, but a more general answer that could be useful can still be given.
ggplot2 does not have a facility for axis breaks, as they are not considered good practice. However, what you're doing here is actually a transformation of the variable: hours open rather than hours of the day. So you'll have to transform the variable itself. You can do this using the lubridate package. Let's take two days and imagine the market is open between 10am and 5pm.
library(lubridate)
dates <- c("2014-01-01 10:00:00 UTC", "2014-01-01 13:00:00 UTC", "2014-01-01 17:00:00 UTC", "2014-01-02 13:00:00 UTC", "2014-01-02 17:00:00 UTC")
dates <- ymd_hms(dates)
Now you need to rescale the data so that it only covers the times you want. You can extract the parts of the date with lubridate and divide by the number of hours in the trading day, which here is 7 rather than 24.
hour(dates) <- hour(dates)-10
scaleddates <- day(dates)-1 + hour(dates)/7 + minute(dates)/60/7 + second(dates)/60/60/7
scaleddates
[1] 0.0000000 0.4285714 1.0000000 1.4285714 2.0000000
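Now you can plot the graph, with the x-axis reading 'Days' rather than 'Dates'. The x-axis now measures how far you are through the trading days. Because day(dates) - 1 uses the day of the month, this simple version assumes consecutive calendar days within one month; a weekend or a month boundary in the data would reappear as a gap or jump on the axis.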
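A minimal base R sketch of the same rescaling that also skips weekends: each trading day is numbered by its rank among the dates present, rather than by its day of the month. The sample prices, the dates, and the `open_hour` / `session_hours` values are made up for illustration, and the ggplot2 part only runs if that package is installed.

```r
# Hypothetical prices: Friday 2014-01-03 and Monday 2014-01-06, no weekend rows
times <- as.POSIXct(c("2014-01-03 10:00:00", "2014-01-03 13:30:00",
                      "2014-01-03 17:00:00", "2014-01-06 10:00:00",
                      "2014-01-06 13:30:00", "2014-01-06 17:00:00"),
                    tz = "UTC")
close_px <- c(100, 101, 102, 101, 103, 104)

open_hour     <- 10  # assumed market open (hour of day)
session_hours <- 7   # assumed session length, 10:00 to 17:00

# Trading-day number: 0 for the first date present, 1 for the next, and so on
day_key <- as.Date(times, tz = "UTC")
day_idx <- match(day_key, sort(unique(day_key))) - 1

# Fraction of the session elapsed at each timestamp
hours_in <- as.numeric(format(times, "%H", tz = "UTC")) +
  as.numeric(format(times, "%M", tz = "UTC")) / 60
trading_time <- day_idx + (hours_in - open_hour) / session_hours

df_stock <- data.frame(trading_time = trading_time, close = close_px)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Label each day start with its calendar date instead of a bare number
  p <- ggplot(df_stock, aes(x = trading_time, y = close)) +
    geom_line() +
    scale_x_continuous(breaks = sort(unique(day_idx)),
                       labels = format(sort(unique(day_key)), "%Y-%m-%d")) +
    labs(x = "Trading day", y = "Close")
  print(p)
}
```

Friday's 17:00 close and Monday's 10:00 open both land on x = 1, so the weekend takes up no space on the axis.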

Related

R, intraday data, convert datetime to a different timezone

I have a dataset with intraday data where three important variables are Country, Datetime, Price.
An example could be:
Sweden, 2019-12-23 09:08:00, 105.31
This data is downloaded from Bloomberg, and it appears to use my local time (Denmark). For example, for Australia I have a market that opens at 23:00, which does not make sense unless it is European time. I would like to convert the times in the data to the local time of each country. I could add or subtract some hours, but the time difference is not fixed: some countries have summer/winter time while others don't, and countries that do have it may switch on different days (for example, I think there is about one week between the time change in the US and in Europe). Do you have any advice on how to transform my dataset into the local time zones? So, if it says "2019-12-23 09:08:00", I would like to know that it was 09:08 in the morning in that particular country (and not in my country). I really hope there is a smart R function for this.
Thanks in advance!
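You could use lubridate::force_tz and lubridate::with_tz. force_tz keeps the clock time and attaches a different time zone (so it describes a different instant), while with_tz keeps the instant and displays it in another time zone: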
dat <- as.POSIXct("2021-05-01 12:00:00",tz = "UTC")
lubridate::force_tz(dat,tz="CET")
#> [1] "2021-05-01 12:00:00 CEST"
lubridate::with_tz(dat, tzone = "CET")
#> [1] "2021-05-01 14:00:00 CEST"
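For the original question the two steps can be combined: first declare that the exported clock times are Danish local time (what force_tz does), then display each instant in its market's time zone (what with_tz does). Below is a minimal base R sketch; the sample rows, the country-to-time-zone table, and the assumption that Bloomberg exported Europe/Copenhagen time are all illustrative. Because a POSIXct vector carries a single time zone, the local clock time is stored as text, one row at a time. IANA names such as "Australia/Sydney" let the time zone database apply each country's daylight saving rules and switch dates.

```r
# Hypothetical rows; clock times assumed to be Danish local time
dat <- data.frame(
  Country  = c("Sweden", "Australia", "United States"),
  Datetime = c("2019-12-23 09:08:00", "2019-12-23 23:00:00",
               "2019-12-23 15:30:00"),
  Price    = c(105.31, 50.10, 20.00),
  stringsAsFactors = FALSE
)

# Assumed mapping from country to IANA time zone name
country_tz <- c("Sweden"        = "Europe/Stockholm",
                "Australia"     = "Australia/Sydney",
                "United States" = "America/New_York")

# Step 1 (like lubridate::force_tz): read the clock times as Copenhagen time
dat$Datetime <- as.POSIXct(dat$Datetime, tz = "Europe/Copenhagen")

# Step 2 (like lubridate::with_tz): show each instant in its market's zone
dat$LocalTime <- vapply(seq_len(nrow(dat)), function(i) {
  format(dat$Datetime[i], format = "%Y-%m-%d %H:%M:%S",
         tz = country_tz[[dat$Country[i]]])
}, character(1))

dat
```

With these rows, the Australian 23:00 becomes 09:00 the next morning in Sydney, and the US row becomes 09:30 in New York.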

Can I specify the dates and times of a time series in R?

I have a dataset that contains times and dates in the first column, and the stock prices in the second column.
I used the following format.
Time Price
2015-02-01 10:00 50
I want to turn this into a time series object. I tried the ts(data) function, but when I plot the data I cannot see the dates on the x-axis. I also tried the ts(data, start=) function. Because some hours have missing prices and those hours are not included in my data set, setting a start date and frequency would make my plot misleading.
Here is the sample data that I have. It is called df.
time price
1 2013-05-01 00:00:00 124.30
2 2013-05-01 01:00:00 98.99
3 2013-05-01 02:00:00 64.00
4 2013-05-01 03:00:00 64.00
This is the code that I used
Time1 <- ts(df)
autoplot(Time1)
I also tried this:
Time1 <- zoo(Time_series_data[,2], order.by = Time_series_data[,1])
Time_n <- ts(Time1)
autoplot(Time1)
However, when I plot the graph with autoplot(Time1), the x-axis doesn't show the times I specified but numbers from 0 to 4. I want a plot of a ts object with the dates on the x-axis and the values on the y-axis.
Is there any way to convert the data to such a time series object in R? Thanks.
Try the following:
Create some data using the nifty tribble function from the tibble package.
library(tibble)
df <- tribble(~time, ~price,
"2013-05-01 00:00:00", 124.30,
"2013-05-01 01:00:00", 98.99,
"2013-05-01 02:00:00", 64.00,
"2013-05-01 03:00:00", 64.00)
The time column is of class character and cannot be plotted as a time in the usual way, so convert it using as.POSIXct. I'll use the dplyr package here, but that's optional.
library(dplyr)
df <- df %>%
mutate(time=as.POSIXct(time))
Next, convert the data to a time series object. This requires the xts package, although I'm sure there are other options including zoo.
library(xts)
df.ts <- xts(df[, -1], order.by=df$time)
Now you can visualise the data.
plot(df.ts) # This should call the `plot.xts` method
And if you prefer ggplot2:
library(ggplot2)
autoplot(df.ts)
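The asker's concern about missing hours can also be handled before building a regular ts: ts() assumes equally spaced observations, so one option is to merge the data onto a complete hourly grid and leave NA where no price was recorded. A minimal base R sketch, assuming hourly data and a made-up gap at 02:00:

```r
# Hypothetical hourly prices with the 02:00 observation missing
df <- data.frame(
  time  = as.POSIXct(c("2013-05-01 00:00:00", "2013-05-01 01:00:00",
                       "2013-05-01 03:00:00", "2013-05-01 04:00:00"),
                     tz = "UTC"),
  price = c(124.30, 98.99, 64.00, 64.00)
)

# Complete hourly grid from the first to the last timestamp
hourly <- data.frame(time = seq(min(df$time), max(df$time), by = "hour"))

# Left join: every grid hour is kept, and missing prices become NA
filled <- merge(hourly, df, by = "time", all.x = TRUE)

# The spacing is now genuinely regular, so 24 observations per day is honest
price_ts <- ts(filled$price, frequency = 24)

# Plotting against the POSIXct column puts dates on the x-axis;
# the line breaks at the NA instead of bridging the gap
plot(filled$time, filled$price, type = "l", xlab = "Time", ylab = "Price")
```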

Binning and making histogram for dates in R

I am new to using dates in R so sorry if this is a basic question. I have a data set that has the name of fracking wells and their job end date as listed below:
df = as.data.frame(df)
head(df)
WellName JobEndDate
1 WILLIAM VALENTINE 1 5/19/1982 12:00:00 AM
2 LIZARD HEAD 1-8H RE 2/7/1995 12:00:00 AM
3 North Westbrook Unit/Well No. 3032 6/11/1996 12:00:00 AM
4 Olene Reagan 3-1 12/13/2001 12:00:00 AM
5 CNX3 9/22/2008 12:00:00 AM
7 CNX2 1/22/2009 12:00:00 AM
It is a large file with about 100,000 entries that go until 2017. I want to create a histogram based on the job end date. To do that, I figured I would place the dates into bins, breaking by months. However, I am struggling with placing them into bins so that each month has a number corresponding to how many wells were finished in each month. Therefore, I am also struggling with the histogram. I would appreciate any help!! Thank you!
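First, parse the JobEndDate strings into date-times, then extract the month from every date:
library(data.table)
# JobEndDate is text such as "5/19/1982 12:00:00 AM"; %p (AM/PM) assumes an English locale
df$JobEndDate <- as.POSIXct(df$JobEndDate, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
df$months <- month(df$JobEndDate)
Then, make your plot:
library(ggplot2)
# Map the months column itself (not the string 'months'); geom_bar counts rows per month
ggplot(df, aes(x = factor(months))) + geom_bar()
# alternative in base R: one bar per month, height = number of wells
barplot(table(df$months))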
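If the bins should be calendar months across the whole 1982 to 2017 range (rather than pooling every May together, as month() does), base R's cut() method for dates accepts breaks = "month" and keeps empty months as zero-count bins. A minimal sketch with a few made-up JobEndDate strings in the question's format; as.Date() stops reading after the date part, so the trailing time is ignored:

```r
# Hypothetical sample in the same text format as JobEndDate
job_end <- c("5/19/1982 12:00:00 AM", "5/30/1982 12:00:00 AM",
             "7/2/1982 12:00:00 AM",  "2/7/1995 12:00:00 AM")

# Parse month/day/year; the trailing " 12:00:00 AM" is ignored
end_dates <- as.Date(job_end, format = "%m/%d/%Y")

# One bin per calendar month, including months with no wells
month_bin <- cut(end_dates, breaks = "month")
counts <- table(month_bin)

# Bar heights are the number of wells finished in each month
barplot(counts, las = 2, ylab = "Wells finished")
```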

xts - how to subset on each day of the week

I understand similar questions have been answered. My problem is that I have time series data for 2033 days at 15-minute intervals. I would like to plot the series for each day of the week (Mon-Sun), for instance what an average Monday looks like.
I tried to subset by using .indexwday, but the series for the day starts at 13:00.
I am somewhat of a novice, so please let me know if I need to provide additional details.
Sample data (xts)
2008-01-01 00:00:00 16
2008-01-01 00:15:00 56
2008-01-01 00:30:00 136
2008-01-01 00:45:00 170
2008-01-01 01:00:00 132
....
2013-07-25 22:30:00 95
2013-07-25 22:45:00 82
2013-07-25 23:00:00 66
2013-07-25 23:15:00 65
2013-07-25 23:30:00 66
2013-07-25 23:45:00 46
The plot I have in mind shows the average of all Mondays.
Here's another solution, which does not depend on packages other than xts and zoo.
library(xts)  # xts also attaches zoo
# example data
ix <- seq(as.POSIXct("2008-01-01"), as.POSIXct("2013-07-26"), by="15 min")
set.seed(21)
x <- xts(sample(200, length(ix), TRUE), ix)
# aggregate by 15-minute observations for each weekday
a <- lapply(split.default(x, format(index(x), "%A")), # split by weekday
function(x) aggregate(x, format(index(x), "%H:%M"), mean)) # aggregate by 15-min
# merge aggregated data into one zoo object, ordering columns
z <- do.call(merge, a)[,c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")]
# convert index to POSIXct to make plotting easier
index(z) <- as.POSIXct(index(z), format="%H:%M")
# plot
plot(z, type="l", nc=1, ylim=range(z), main="Average daily volume", las=1)
Setting ylim forces each plot to have the same y-axis range. Otherwise they would depend on each individual series, which may make them difficult to compare if the values vary greatly.
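Because format(..., "%A") returns weekday names in the current locale, the column names above can differ on a non-English system. A minimal base R sketch of the same average-day profile using the locale-independent "%u" code (1 = Monday ... 7 = Sunday); the two weeks of data are made up, and UTC is chosen to avoid daylight-saving gaps:

```r
# Hypothetical 15-minute data for two full weeks, starting Monday 2008-01-07;
# each day repeats the values 0..95 so the averages are easy to check
ix <- seq(as.POSIXct("2008-01-07 00:00", tz = "UTC"),
          by = "15 min", length.out = 14 * 96)
df <- data.frame(time = ix, value = (seq_along(ix) - 1) %% 96)

# Locale-independent grouping keys
df$weekday <- as.integer(format(df$time, "%u", tz = "UTC"))
df$slot    <- format(df$time, "%H:%M", tz = "UTC")

# Mean value for every weekday and time-of-day combination
profile <- aggregate(value ~ weekday + slot, data = df, FUN = mean)

# The average Monday, in time-of-day order
monday <- profile[profile$weekday == 1, ]
monday <- monday[order(monday$slot), ]

plot(as.POSIXct(monday$slot, format = "%H:%M", tz = "UTC"), monday$value,
     type = "l", xlab = "Time of day", ylab = "Mean value",
     main = "Average Monday")
```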
Try this:
#Get necessary packages (only needed once; dplyr also provides the %>% pipe)
install.packages(c("lubridate", "dplyr", "ggplot2", "scales"))
#Load packages
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(scales)
#Getting the data
tstart = as.POSIXct('2008-01-01 00:00:00')
tend = as.POSIXct('2013-07-25 23:45:00')
ttimes <- seq(from = tstart,to=tend,by='15 mins')
tvals <- sample(seq(1,200),length(ttimes),TRUE)
tsdata <- data.frame(Dates=ttimes,Vals=tvals)
tsdata <- tsdata %>% mutate(DayofWeek = wday(Dates,label=TRUE), Hours = as.POSIXct(strftime(Dates,format="%H:%M:%S"),format="%H:%M:%S"))
#Pick a day at a time. I am using Mondays for this example.
tsdata_monday <- tsdata %>% filter(DayofWeek=='Mon') %>% group_by(Hours) %>% summarise(meanVals=mean(Vals)) %>% as.data.frame()
#Plotting the graph of mean values versus times for Monday:
ggplot(tsdata_monday) + aes(x=Hours,y=meanVals) + geom_line() + scale_x_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M"))
#If you want, you can go ahead and plot all the days. But please keep in mind
#that this does not look good at all: there are too many plots for the plot
#window to display nicely.
alltsdata <- tsdata %>% group_by(DayofWeek, Hours) %>% summarise(MeanVals=mean(Vals)) %>% as.data.frame()
ggplot(alltsdata) + aes(x=Hours,y=MeanVals) + geom_line() + scale_x_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) + facet_grid(.~DayofWeek)
I recommend you plot one day at a time or use a for loop or one of the apply function variations to get the plots.
Also, when filtering by day of the week, keep in mind that the day labels are abbreviated as follows (the exact abbreviations depend on your lubridate version and locale):
unique(tsdata$DayofWeek)
[1] Tues Wed Thurs Fri Sat Sun Mon
Hope it helps.
apply.daily applies a function to each day of the data (assuming your data is an xts object called d.xts):
apply.daily(d.xts,sum)
Another solution is to use aggregate:
aggregate(d.xts,as.Date(index(d.xts)),sum)
Note that the answers are slightly different: apply.daily starts from start(d.xts) to end(d.xts) whereas aggregate goes by day from midnight to midnight.
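A daily total can also be sketched in base R, without xts, by grouping on the calendar date. The tz argument of as.Date() decides where each day's midnight boundary falls, which is one reason two daily aggregates of the same data can disagree. The data below are made up (15-minute observations in UTC, each equal to 1), and this sketch does not reproduce either xts function exactly:

```r
# Hypothetical: three days of 15-minute observations, each equal to 1
ix   <- seq(as.POSIXct("2008-01-01 00:00", tz = "UTC"),
            by = "15 min", length.out = 3 * 96)
vals <- rep(1, length(ix))

# Sum per calendar day, with days cut at midnight UTC
daily <- tapply(vals, as.Date(ix, tz = "UTC"), sum)

# Same data, with days cut at midnight New York time instead
daily_ny <- tapply(vals, as.Date(ix, tz = "America/New_York"), sum)

daily
daily_ny
```

With UTC boundaries every day has 96 observations; with New York boundaries the first and last days are partial (20 and 76 observations).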

ggplot2 and chron barplot of time data scale_x_chron

I have a number of times and want to plot the frequency of each time in a barplot
library(ggplot2)
library(chron)
test <- data.frame(times = c(rep("7:00:00",4), rep("8:00:00",3),
rep("12:00:00",1), rep("13:00:00",5)))
test$times <- times(test$times)
test
times
1 07:00:00
2 07:00:00
3 07:00:00
4 07:00:00
5 08:00:00
6 08:00:00
7 08:00:00
8 12:00:00
9 13:00:00
10 13:00:00
11 13:00:00
12 13:00:00
13 13:00:00
The value of binwidth is chosen to represent minutes
p <- ggplot(test, aes(x = times)) + geom_bar(binwidth=1/24/60)
p + scale_x_chron(format="%H:%M")
As you see the scales are plus one hour in the x-axis:
I have the feeling that it has something to do with the time zone, but I can't really place it:
Sys.timezone()
[1] "CET"
Edit:
Thanks @shadow for the comment.
UPDATE:
If I run Sys.setenv(TZ='GMT') first, it works perfectly. The problem is in the times() function: it automatically sets the time zone to GMT, and when plotting the x-axis, ggplot notices that my system time zone is CET and adds one hour.
If I set my system time zone to GMT, ggplot doesn't add an hour.
The problem is that times(...) assumes the timezone is GMT, and then ggplot compensates for your actual timezone. This is fair enough: times are meaningless unless you specify timezone. The bigger problem is that it does not seem possible to tell times(...) what the actual timezone is (if someone else knows how to do this I'd love to know).
A workaround is to use POSIXct and identify your timezone (mine is EST).
test <- data.frame(times = c(rep("7:00:00",4), rep("8:00:00",3),
rep("12:00:00",1), rep("13:00:00",5)))
test$times <- as.POSIXct(test$times,format="%H:%M:%S",tz="EST")
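p <- ggplot(test, aes(x = times)) + geom_histogram(binwidth = 60)
binwidth = 60 is 60 seconds, since POSIXct values are stored in seconds. In current ggplot2, binwidth is an argument of geom_histogram rather than geom_bar.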
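Another way to sidestep the shift, assuming only the count per clock time is needed, is to count the times as text labels, so no time zone conversion happens at plotting time. A minimal base R sketch with the question's test data (UTC is used only for parsing and is an arbitrary choice):

```r
# The question's times, as character strings
times <- c(rep("7:00:00", 4), rep("8:00:00", 3),
           rep("12:00:00", 1), rep("13:00:00", 5))

# Parse once and reformat as zero-padded HH:MM labels
parsed      <- as.POSIXct(times, format = "%H:%M:%S", tz = "UTC")
time_labels <- format(parsed, "%H:%M", tz = "UTC")

# Frequency of each clock time; text labels are never shifted by a time zone
counts <- table(time_labels)
barplot(counts, xlab = "Time", ylab = "Count")
```

The trade-off is that the x-axis becomes categorical, so the gap between 08:00 and 12:00 is not drawn to scale.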
It has nothing to do with the time zone; the only problem is that in format, %m represents the month and %M represents the minute. So the following will work:
p + scale_x_chron(format="%H:%M")
