ggplot2 and chron barplot of time data scale_x_chron - r

I have a number of times and want to plot the frequency of each time in a barplot:
library(ggplot2)
library(chron)
test <- data.frame(times = c(rep("7:00:00",4), rep("8:00:00",3),
rep("12:00:00",1), rep("13:00:00",5)))
test$times <- times(test$times)
test
times
1 07:00:00
2 07:00:00
3 07:00:00
4 07:00:00
5 08:00:00
6 08:00:00
7 08:00:00
8 12:00:00
9 13:00:00
10 13:00:00
11 13:00:00
12 13:00:00
13 13:00:00
The value of binwidth is chosen to represent minutes
p <- ggplot(test, aes(x = times)) + geom_bar(binwidth=1/24/60)
p + scale_x_chron(format="%H:%M")
In the resulting plot, the labels on the x-axis are shifted forward by one hour.
I have the feeling that it has something to do with the time zone, but I can't really place it:
Sys.timezone()
[1] "CET"
Edit:
Thanks @shadow for the comment.
UPDATE:
If I run Sys.setenv(TZ='GMT') first, it works perfectly. The problem is in the times() function. It automatically sets the time zone to GMT, and when the x-axis is drawn, ggplot notices that my system time zone is CET and adds one hour to the labels.
If I set my system time zone to GMT, ggplot doesn't add an hour.

The problem is that times(...) assumes the timezone is GMT, and then ggplot compensates for your actual timezone. This is fair enough: times are meaningless unless you specify timezone. The bigger problem is that it does not seem possible to tell times(...) what the actual timezone is (if someone else knows how to do this I'd love to know).
A workaround is to use POSIXct and identify your timezone (mine is EST).
test <- data.frame(times = c(rep("7:00:00",4), rep("8:00:00",3),
rep("12:00:00",1), rep("13:00:00",5)))
test$times <- as.POSIXct(test$times,format="%H:%M:%S",tz="EST")
p <- ggplot(test, aes(x = times)) + geom_bar(binwidth=60,width=.01)
binwidth=60 is 60 seconds.
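For readers on a recent ggplot2, here is a sketch of the same workaround. As far as I know, geom_bar() now counts rows per distinct x value and no longer takes a binwidth, so it is dropped here. Using UTC for both parsing and labelling is my assumption, chosen so that no offset can be applied at either step; any time zone works as long as the two agree.

```r
library(ggplot2)

test <- data.frame(times = c(rep("7:00:00", 4), rep("8:00:00", 3),
                             rep("12:00:00", 1), rep("13:00:00", 5)))

# Parse in UTC (assumption); today's date is attached but never displayed
test$times <- as.POSIXct(test$times, format = "%H:%M:%S", tz = "UTC")

# Label the axis in the same time zone the values were parsed in
p <- ggplot(test, aes(x = times)) +
  geom_bar() +
  scale_x_datetime(date_labels = "%H:%M", timezone = "UTC")
p
```

The timezone argument of scale_x_datetime() is what keeps the labels from being shifted to the session time zone.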

It has nothing to do with time zones; the only problem is that in the format string, %m represents the month and %M represents the minute. So the following will work:
p + scale_x_chron(format="%H:%M")

Related

How can I parse string values to a workable POSIXct format when the duration of some of the values are over 24 hours?

Please see this previous question I asked.
I have a data set in R that has values in hours, minutes, and seconds format. However, some values only have hours and minutes, some only have minutes and seconds, some only have minutes, and some only have seconds. It's also not formatted very favorably. When I use parse_date_time() on it, the parsing only works for values that are less than 24 hours. Sample data can be found below:
example <- as.data.frame(c("25h28m", "17m7s", "15m", "14s"))
Using parse_date_time(), I get this:
Column Title
NA
00:17:07
00:15:00
00:00:14
But would like to get this:
Column Title
25:28:00
00:17:07
00:15:00
00:00:14
We can do this by converting to the Period class from lubridate and then applying hms from the hms package:
library(lubridate)
hms::hms(as.period(toupper(example)))
-output
25:28:00
00:17:07
00:15:00
00:00:14
data
example <- c("25h28m", "17m7s", "15m", "14s")
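If you would rather avoid extra packages, a base R sketch along these lines should give the same strings. parse_duration and format_hms are hypothetical helper names, and the sketch assumes each unit appears at most once, written as digits followed by h, m or s.

```r
# Hypothetical helper: total seconds for strings such as "25h28m" or "17m7s"
parse_duration <- function(x) {
  grab <- function(unit) {
    out <- numeric(length(x))                  # a missing unit counts as 0
    hit <- grepl(paste0("[0-9]+", unit), x)
    out[hit] <- as.numeric(sub(paste0(".*?([0-9]+)", unit, ".*"), "\\1",
                               x[hit], perl = TRUE))
    out
  }
  grab("h") * 3600 + grab("m") * 60 + grab("s")
}

# Hypothetical helper: format seconds as HH:MM:SS; hours may exceed 23
format_hms <- function(secs) {
  sprintf("%02d:%02d:%02d",
          secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
}

example <- c("25h28m", "17m7s", "15m", "14s")
result <- format_hms(parse_duration(example))
result
# [1] "25:28:00" "00:17:07" "00:15:00" "00:00:14"
```

The result is a character vector, so it is fine for display; keep the output of parse_duration if you need to do arithmetic on the durations.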

xts - how to subset on each day of the week

I understand similar questions have been answered. My problem is that I have time series data covering 2033 days at 15-minute intervals. I would like to plot the series for each day of the week (Mon-Sun), for instance what an average Monday looks like.
I tried to subset using .indexwday, but the series for each day then starts at 13:00.
I am something of a novice, so please let me know if I need to provide additional details.
Sample data (xts)
2008-01-01 00:00:00 16
2008-01-01 00:15:00 56
2008-01-01 00:30:00 136
2008-01-01 00:45:00 170
2008-01-01 01:00:00 132
....
2013-07-25 22:30:00 95
2013-07-25 22:45:00 82
2013-07-25 23:00:00 66
2013-07-25 23:15:00 65
2013-07-25 23:30:00 66
2013-07-25 23:45:00 46
The plot below might make it clearer what I want (this is the average of all Mondays):
Here's another solution, which does not depend on packages other than xts and zoo.
library(xts) # also attaches zoo
# example data
ix <- seq(as.POSIXct("2008-01-01"), as.POSIXct("2013-07-26"), by="15 min")
set.seed(21)
x <- xts(sample(200, length(ix), TRUE), ix)
# aggregate by 15-minute observations for each weekday
a <- lapply(split.default(x, format(index(x), "%A")), # split by weekday
function(x) aggregate(x, format(index(x), "%H:%M"), mean)) # aggregate by 15-min
# merge aggregated data into one zoo object, ordering columns
z <- do.call(merge, a)[,c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")]
# convert index to POSIXct to make plotting easier
index(z) <- as.POSIXct(index(z), format="%H:%M")
# plot
plot(z, type="l", nc=1, ylim=range(z), main="Average daily volume", las=1)
Setting ylim forces each plot to have the same y-axis range. Otherwise they would depend on each individual series, which may make them difficult to compare if the values vary greatly.
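Since the question mentions .indexwday, here is a minimal sketch of that route for a single weekday. The data are made up, and the index is put in UTC explicitly through tzone, which is my assumption. I have not seen the asker's data, but a mismatch between the time zone of the index and the session time zone is one possible reason for a day appearing to start at an odd hour.

```r
library(xts)

# Made-up data: two weeks of 15-minute observations, starting on a Monday, in UTC
ix <- seq(as.POSIXct("2008-01-07 00:00:00", tz = "UTC"),
          by = "15 min", length.out = 14 * 96)
x <- xts(seq_along(ix), order.by = ix, tzone = "UTC")

# .indexwday() counts from 0 (Sunday), so 1 selects Mondays
mondays <- x[.indexwday(x) == 1]

# Average each 15-minute slot across all Mondays
monday_avg <- aggregate(mondays, format(index(mondays), "%H:%M"), mean)
head(monday_avg)
```

The aggregation step is the same one used in the answer above, applied to one weekday instead of all seven.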
Try this:
#Get necessary packages
install.packages("lubridate")
install.packages("magrittr")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("scales")
#Import packages
library(lubridate,warn=F)
library(dplyr,warn=F)
library(magrittr)
library(ggplot2,warn=F)
library(scales, warn=F)
#Getting the data
tstart = as.POSIXct('2008-01-01 00:00:00')
tend = as.POSIXct('2013-07-25 23:45:00')
ttimes <- seq(from = tstart,to=tend,by='15 mins')
tvals <- sample(seq(1,200),length(ttimes),T)
tsdata <- data.frame(Dates=ttimes,Vals=tvals)
tsdata <- tsdata %>% mutate(DayofWeek = wday(Dates,label=T), Hours = as.POSIXct(strftime(Dates,format="%H:%M:%S"),format="%H:%M:%S"))
#Pick a day at a time. I am using Mondays for this example.
tsdata_monday <- tsdata %>% filter(DayofWeek=='Mon') %>% group_by(Hours) %>% summarise(meanVals=mean(Vals)) %>% as.data.frame()
#Plotting the graph of mean values versus times for Monday:
ggplot(tsdata_monday) + aes(x=Hours,y=meanVals) + geom_line() + scale_x_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M"))
#If you want you can go ahead and plot all the days. But please keep in mind
#that this does not look good at all. Too many plots for the plot window to
#Display nicely.
alltsdata <- tsdata %>% group_by(DayofWeek, Hours) %>% summarise(MeanVals=mean(Vals)) %>% as.data.frame()
ggplot(alltsdata) + aes(x=Hours,y=MeanVals) + geom_line() + scale_x_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) + facet_grid(.~DayofWeek)
I recommend you plot one day at a time or use a for loop or one of the apply function variations to get the plots.
Also when filtering by day of the week, please keep in mind that the days are shortened as follows:
unique(tsdata$DayofWeek)
[1] Tues Wed Thurs Fri Sat Sun Mon
Hope it helps.
apply.daily does exactly what you want (assuming your data is an xts object called d.xts):
apply.daily(d.xts,sum)
Another solution would be to use aggregate:
aggregate(d.xts,as.Date(index(d.xts)),sum)
Note that the answers are slightly different: apply.daily starts from start(d.xts) to end(d.xts) whereas aggregate goes by day from midnight to midnight.

Adding number of time steps at the end/start of a zoo time series

I have a zoo time series and I want to add some dummy time steps with same time interval at the end/start. For example, I have the following time series and I want to add two more time steps at the end, for times ......21:00:00 BST and ......21:30:00 BST where all observations are zero.
my.zoo.ts = zoo(matrix(c(1:8),ncol=2),
c("2012-07-05 19:00:00 BST", "2012-07-05 19:30:00 BST",
"2012-07-05 20:00:00 BST", "2012-07-05 20:30:00 BST"))
What is the easiest way to do it? (Apart from editing the above code, of course.)
The series is currently using character strings for times which is not likely what you want so first convert them to POSIXct date/time objects:
time(my.zoo.ts) <- as.POSIXct(time(my.zoo.ts))
The times seem to be spaced by 30 minutes so suppose we want to append 100 and 101 in the two columns at 30 minutes past the last time:
z <- zoo(cbind(100, 101), end(my.zoo.ts) + 30 * 60)
rbind(my.zoo.ts, z)
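To get what the question literally asks for (two extra steps, all zeros), the same idea can be generalised. This is only a sketch: n_extra and step are names I made up, and I am assuming the times should be read as Europe/London, since BST is not something as.POSIXct parses from the string.

```r
library(zoo)

# Same series as in the question, with a proper POSIXct index
# (Europe/London is an assumption standing in for "BST")
my.zoo.ts <- zoo(matrix(1:8, ncol = 2),
                 as.POSIXct(c("2012-07-05 19:00:00", "2012-07-05 19:30:00",
                              "2012-07-05 20:00:00", "2012-07-05 20:30:00"),
                            tz = "Europe/London"))

step    <- 30 * 60   # spacing in seconds (30 minutes)
n_extra <- 2         # number of dummy time steps to append

# Times that continue the series after its last observation
new_times <- end(my.zoo.ts) + step * seq_len(n_extra)

# Zero-filled rows with the same number of columns as the original
padding <- zoo(matrix(0, nrow = n_extra, ncol = ncol(my.zoo.ts)), new_times)
padded  <- rbind(my.zoo.ts, padding)
padded
```

To pad at the start instead, build the new times with start(my.zoo.ts) - step * rev(seq_len(n_extra)); rbind sorts by index, so the order of its arguments does not matter.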

R: ggplot with datetime on x axis

I'm trying to plot 5 days of historical stock data in R using ggplot. Datetime on the x axis, and the stock values ('close') on the y axis. I only want to show the minutes of the day when the stock market is open, and my data set is limited to 7 hours a day of values x 5 days.
But when I plot it with ggplot, the scale is changed so I get all the hours per day.
ggplot(data = df_stock, aes(x = datetime, y = close)) +
geom_line()
I've tried googling this and using the R help function. I'm quite new to R so my apologies if this is very easy to solve. I hope someone can guide me in the right direction.
While a minimal example would be helpful to address this question, a more general answer which could be useful can be given.
ggplot2 does not have a facility for axis-breaking, as it's not considered good practice. However what you're doing here is actually a transformation of the variable --- hours open rather than hours of the day. So you'll have to transform the variable itself. You can do this using the lubridate package. Let's take two days and imagine the market is open between 10am and 5pm.
require(lubridate)
dates <- c("2014-01-01 10:00:00 UTC", "2014-01-01 13:00:00 UTC", "2014-01-01 17:00:00 UTC", "2014-01-02 13:00:00 UTC", "2014-01-02 17:00:00 UTC")
dates <- ymd_hms(dates)
Now you will need to scale the data so that only the opening hours count. You can extract the parts of each date with lubridate and divide by the number of hours in the trading day, which here is 7 rather than 24.
hour(dates) <- hour(dates)-10
scaleddates <- day(dates)-1 + hour(dates)/7 + minute(dates)/60/7 + second(dates)/60/60/7
scaleddates
[1] 0.0000000 0.4285714 1.0000000 1.4285714 2.0000000
Now you can plot the graph, with the x-axis reading 'Days' rather than 'Dates'. The x-axis is now how far you are through the working day.
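A sketch of that final plotting step follows. The close values are invented purely for illustration, open_hour and hours_open are names I introduced for the assumed 10am to 5pm session, and the day index is taken relative to the first date in the data. Note that calendar days with no trading (weekends, holidays) would still take up space with this simple day index.

```r
library(lubridate)
library(ggplot2)

dates <- ymd_hms(c("2014-01-01 10:00:00", "2014-01-01 13:00:00",
                   "2014-01-01 17:00:00", "2014-01-02 13:00:00",
                   "2014-01-02 17:00:00"))
close <- c(10, 11, 12, 11.5, 13)   # made-up prices, for illustration only

open_hour  <- 10   # assumed market open (10am)
hours_open <- 7    # assumed session length (10am to 5pm)

# Whole days since the first observation, plus the fraction of the session elapsed
day_index <- as.numeric(as.Date(dates) - as.Date(min(dates)))
session_fraction <- (hour(dates) - open_hour +
                     minute(dates) / 60 + second(dates) / 3600) / hours_open

df_stock <- data.frame(scaled = day_index + session_fraction, close = close)

p <- ggplot(df_stock, aes(x = scaled, y = close)) +
  geom_line() +
  labs(x = "Trading days elapsed", y = "close")
p
```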

Binning time series in R?

I'm new to R. My data has 600k objects defined by three attributes: Id, Date and TimeOfCall.
TimeOfCall has a 00:00:00 format and ranges from 00:00:00 to 23:59:59.
I want to bin the TimeOfCall attribute, into 24 bins, each one representing hourly slot (first bin 00:00:00 to 00:59:59 and so on).
Can someone talk me through how to do this? I tried using cut() but apparently my format is not numeric. Thanks in advance!
While you could convert to a formal time representation, in this case it might be easier to just use substr:
test <- c("00:00:01","02:07:01","22:30:15")
as.numeric(substr(test,1,2))
#[1] 0 2 22
Using a POSIXct time to deal with it would also work, and might be handy if you plan on further calculations (differences in time etc):
testtime <- as.POSIXct(test,format="%H:%M:%S")
#[1]"2013-12-09 00:00:01 EST" "2013-12-09 02:07:01 EST" "2013-12-09 22:30:15 EST"
as.numeric(format(testtime,"%H"))
#[1] 0 2 22
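Once the hour is numeric, cut() does work, which gets you the 24 labelled bins the question asks for. A small sketch, where the label format is just my choice:

```r
test <- c("00:00:01", "02:07:01", "22:30:15", "23:59:59")

# Hour of day as an integer, taken from the first two characters
hours <- as.integer(substr(test, 1, 2))

# 24 left-closed bins: [0,1), [1,2), ..., [23,24)
bins <- cut(hours, breaks = 0:24, right = FALSE,
            labels = sprintf("%02d:00-%02d:59", 0:23, 0:23))

# Calls per hourly slot, including empty slots
table(bins)
```

Using right = FALSE matters here: with the default right-closed intervals, hour 0 would fall outside the first bin and come back as NA.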
You can use the cut.POSIXt function, but you should first coerce your data to a valid time object. Here I am using the handy hms from lubridate, and strftime to get the time format.
library(lubridate)
x <- c("09:10:01", "08:10:02", "08:20:02","06:10:03 ", "Collided at 9:20:04 pm")
x.h <- strftime(cut(as.POSIXct(hms(x),origin=Sys.Date()),'hours'),
format='%H:%M:%S')
data.frame(x,x.h)
x x.h
1 09:10:01 10:00:00
2 08:10:02 09:00:00
3 08:20:02 09:00:00
4 06:10:03 07:00:00
5 Collided at 9:20:04 pm 22:00:00

Resources