R: how to ggplot frequency every 2 hours in a dataframe

I have the following dataset:
time tta
08:20:00 1
21:30:00 5
22:00:00 1
22:30:00 1
00:25:00 1
17:00:00 5
I would like to plot a bar chart using ggplot so that the x-axis has a break every 2 hours (00:00:00, 02:00:00, 04:00:00 and so on) and the y-axis shows the frequency of each level of the factor tta (1 and 5).
The x-axis labels should look like 00-01, 01-02, and so on.

I first approached this with the xts package, but found that it does not offer a way to floor times. I therefore find lubridate more practical here, also because ggplot does not understand xts objects directly. Both packages help you transform time data in many ways.
Use xts::align.time to shift your times to the next full hour/day/etc., or lubridate::floor_date to shift them back to the previous one.
Either way, aggregate the data before you pass it to ggplot. You can use sum to add up tta, or length to count the number of occurrences; in the latter case you could also use geom_histogram on the times alone. You can shift the bars with position_nudge so that each bar covers a period instead of sitting centered on a single point in time. Since the times are POSIXct, set the breaks and labels with scale_x_datetime(breaks = ..., date_labels = ...) in the plot.
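The aggregation step described above can also be sketched in base R, without xts or lubridate. This is a minimal sketch under the assumption that the clock times can be given one arbitrary date and read as UTC; the names slots_df, slot_start, by_sum and by_count are illustrative only. It computes both options: the sum of tta per 2-hour slot and the number of occurrences per slot.

```r
# Sketch: aggregate clock times into 2-hour slots with base R only.
# Assumption: all times get the same arbitrary date and are read as UTC,
# so the result does not depend on the local time zone.
times <- c("08:20:00", "21:30:00", "22:00:00", "22:30:00", "00:25:00", "17:00:00")
tta <- c(1, 5, 1, 1, 1, 5)
tm <- as.POSIXct(paste("2024-01-01", times), tz = "UTC")

# Seconds since midnight, then the hour at which each 2-hour slot starts
secs_of_day <- as.numeric(tm) %% 86400
slot_start <- (secs_of_day %/% 7200) * 2

slots_df <- data.frame(slot_start = slot_start, tta = tta)
by_sum   <- aggregate(tta ~ slot_start, data = slots_df, FUN = sum)     # sum of tta
by_count <- aggregate(tta ~ slot_start, data = slots_df, FUN = length)  # occurrences

# Human-readable slot labels such as "22-24"
by_sum$label <- sprintf("%02d-%02d", by_sum$slot_start, by_sum$slot_start + 2)
by_sum
```

by_sum could then be plotted with ggplot(by_sum, aes(x = label, y = tta)) + geom_col(); working with the hour of the day rather than full date-times keeps all times on one 24-hour axis.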
Data:
time <- c(
"08:20:00",
"21:30:00",
"22:00:00",
"22:30:00",
"00:25:00",
"17:00:00"
)
time <- as.POSIXct(time, format = "%H:%M:%S")
tta <- c(1, 5, 1, 1, 1, 5)
Using xts:
library(xts)
library(ggplot2)
myxts <- xts(tta, order.by = time)
# shift all times to the next full 2 hours
myxts_aligned <- align.time(myxts, n = 60*60*2)
# sum up tta within every 2-hour period
myxts_agg <- period.apply(myxts_aligned,
                          INDEX = endpoints(myxts, "hours", 2),
                          FUN = sum)
# convert to a plain data frame, so ggplot does not have to deal with the xts object
agg_df <- data.frame(time = index(myxts_agg),
                     tta_sum = as.numeric(coredata(myxts_agg)))
ggplot(agg_df, aes(x = time, y = tta_sum)) +
  geom_col(width = 60*60*2,                       # one bar is 2 hours wide
           position = position_nudge(x = -60*60), # shift one hour to the left
                                                  # so that the bar represents the actual period
           colour = "black") +
  scale_x_datetime(breaks = agg_df$time,          # add more breaks manually if you like
                   date_labels = "%H:%M")
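The two alignment rules explain why the xts version nudges the bars one hour to the left while the lubridate version nudges them to the right: align.time labels each period by its end, floor_date by its start. Below is a base R sketch of both rules, assuming UTC times. The "next boundary" arithmetic is my reading of how align.time behaves (a time already on a boundary moves forward a full period), not code taken from xts.

```r
# Sketch: two ways to snap a time onto a 2-hour grid (base R, UTC).
n <- 2 * 60 * 60                      # period length in seconds
tm <- as.POSIXct(c("2024-01-01 21:30:00", "2024-01-01 22:00:00"), tz = "UTC")
s <- as.numeric(tm)

# Floor, as lubridate::floor_date() does: start of the containing period
floored <- as.POSIXct(s - s %% n, origin = "1970-01-01", tz = "UTC")

# Next boundary (assumed to match xts::align.time): a time that is already
# on a boundary still moves forward by a full period
aligned <- as.POSIXct(s + (n - s %% n), origin = "1970-01-01", tz = "UTC")

format(floored, "%H:%M")  # "20:00" "22:00"
format(aligned, "%H:%M")  # "22:00" "00:00"
```

With floored times a bar must be nudged forward by half its width to cover [start, start + 2h); with aligned times it must be nudged back to cover [end - 2h, end).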
Using lubridate:
library(lubridate)
library(tidyverse)  # dplyr and ggplot2
mydf <- data.frame(time = time, tta = tta)
mydf_agg <- mydf %>%
  group_by(time = floor_date(time, "2 hours")) %>%
  summarise(tta_sum = sum(tta), tta_freq = n())
ggplot(mydf_agg, aes(x = time, y = tta_sum)) +
  geom_col(width = 60*60*2,                      # one bar is 2 hours wide
           position = position_nudge(x = 60*60), # shift one hour to the *right*
                                                 # so that the bar represents the actual period
           colour = "black") +
  scale_x_datetime(breaks = mydf_agg$time,       # add more breaks manually if you like
                   date_labels = "%H:%M")
Both approaches give almost the same plot.

Use the floor_date function from lubridate:
library(tidyverse)
library(lubridate)
your_df %>% group_by(slot = floor_date(time, "2 hours")) %>% count(tta)
and then plot the result with ggplot and geom_col.
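For comparison, the same count per 2-hour slot and tta level can be sketched in base R with table(), assuming UTC times on an arbitrary date. Unlike count() on a numeric column, table() also returns the combinations that never occur (with a count of 0), which keeps the bars aligned when they are dodged.

```r
# Sketch: count each tta level per 2-hour slot with table() (base R, UTC).
times <- c("08:20:00", "21:30:00", "22:00:00", "22:30:00", "00:25:00", "17:00:00")
tta <- factor(c(1, 5, 1, 1, 1, 5))
s <- as.numeric(as.POSIXct(paste("2024-01-01", times), tz = "UTC"))
slot <- as.POSIXct(s - s %% 7200, origin = "1970-01-01", tz = "UTC")  # floor to 2 h

# One row per (slot, tta) combination, including combinations that never occur
counts <- as.data.frame(table(slot = format(slot, "%H:%M"), tta = tta))
counts
```

The result could then be drawn with ggplot(counts, aes(x = slot, y = Freq, fill = tta)) + geom_col(position = "dodge").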

library(lubridate)
library(ggplot2)
library(scales)  # date_breaks() and date_format()
Make sure the class of your timestamp is POSIXct:
> class(df$timestamp)
[1] "POSIXct" "POSIXt"
Then use the scale_x_datetime function as follows, where gg is your existing plot:
gg +
  scale_x_datetime(expand = c(0, 0), breaks = date_breaks("1 hour"), labels = date_format("%H:%M"))
In this case, the breaks on the x-axis are spaced one hour apart and the labels look like 09:00. Note that date_format() formats in UTC unless you pass its tz argument.
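The question asks for a break every 2 hours. Changing "1 hour" to "2 hours" in date_breaks() is the simplest option; if you prefer to control exactly where the breaks fall, you can build them yourself. A base R sketch, assuming UTC times; brks and labs are illustrative names that could be passed as scale_x_datetime(breaks = brks, labels = labs).

```r
# Sketch: explicit breaks every 2 hours that cover the data (base R, UTC).
tm <- as.POSIXct(c("2024-01-01 00:25:00", "2024-01-01 22:30:00"), tz = "UTC")
n <- 2 * 60 * 60
lo <- floor(as.numeric(min(tm)) / n) * n    # last boundary at or before the data
hi <- ceiling(as.numeric(max(tm)) / n) * n  # first boundary at or after the data
brks <- as.POSIXct(seq(lo, hi, by = n), origin = "1970-01-01", tz = "UTC")
labs <- format(brks, "%H:%M")
labs
```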

Related

Change ggplot2 point color based on date occurring less than 4 weeks after previous date

I have an example dataframe with a Date column and a count column, Ccount.
I have used ggplot2 to plot dates on the x-axis with a count on the y-axis:
library(lubridate)
library(ggplot2)
df_ggplot <- read.csv("ggplot_ex.csv", header = TRUE, na.strings = "", fileEncoding = "UTF-8-BOM")
df_ggplot$Date <- mdy(df_ggplot$Date)
df_ggplot$Ccount <- as.numeric(as.character(df_ggplot$Ccount))
ggplot(df_ggplot, aes(x = Date, y = Ccount)) +
  geom_line() +
  geom_point()
I want points that occur less than 4 weeks after the previous point to turn red. Can anyone help? In this example, the second point would be red, as it occurs about 2 weeks after the previous point.
You probably have to do the calculation in the dataframe before plotting (make sure your Date column is in the correct date format).
One option you can try:
library(dplyr)
df_ggplot <- df_ggplot %>%
  mutate(time_diff = difftime(time1 = Date, time2 = lag(x = Date, n = 1), units = "weeks"),
         is_red = as.factor(time_diff < 4))
This gives you the points that must be flagged:
Date Ccount time_diff is_red
1 2019-08-17 20000 NA weeks <NA>
2 2019-08-30 15000 1.857143 weeks TRUE
3 2019-09-30 25000 4.285714 weeks FALSE
Then you can plot them, using whichever colors you want:
ggplot(df_ggplot, aes(x = Date, y = Ccount)) +
geom_line() +
geom_point(aes(color = is_red)) +
scale_color_manual(values = c("black", "red"), na.value = "black")
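The flag itself does not need dplyr. A minimal base R sketch using the three dates from the example output; diff() on a Date vector returns a difftime that can be converted to weeks.

```r
# Sketch: flag points less than 4 weeks after the previous one (base R).
dates <- as.Date(c("2019-08-17", "2019-08-30", "2019-09-30"))
# diff() on Dates gives a difftime; convert it to weeks.
# The first point has no predecessor, hence the leading NA.
weeks_since_prev <- c(NA, as.numeric(diff(dates), units = "weeks"))
is_red <- weeks_since_prev < 4
is_red  # NA TRUE FALSE
```

The leading NA is why the plot needs na.value = "black" in scale_color_manual().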

R histogram of timeseries data with duration on y-axis

I'm trying to create a histogram from time-series data in R, similar to this question. Each bin should show the total duration for the values falling within the bin. I have non-integer sample times in a zoo object with thousands of rows. The timestamps are irregular, and the data is assumed to be constant between timestamps (sample-and-hold).
Example data:
library(zoo)
library(ggplot2)
library(scales)  # date_format()
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5", "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3", "2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
ggplot(x.df, aes(x = Date, y = Value)) + geom_step() + scale_x_datetime(labels = date_format("%H:%M:%OS"))
Creating a histogram with hist(z, freq = TRUE) ignores the timestamps: every sample counts once, no matter how long it lasted.
My desired output is a histogram with the duration in seconds on the y-axis, i.e. bars whose heights are (non-integer) total durations.
Edit:
I should point out that the data values are not integers, and that I want to be able to control the bin width(s). I could use diff(timestamp) to create a (non-integer) column showing the duration of each point, and plot a bar graph as suggested by @MKR:
x.df = data.frame(DurationSecs = as.numeric(diff(timestamp)), Value = data[-length(data)])
ggplot(x.df, aes(x = Value, y = DurationSecs)) + geom_bar(stat = "identity")
This gives a histogram with the right bar heights for the example, but it fails when the values are floating-point numbers.
Since you want the duration (in seconds) on the y-axis, add a duration column to x.df. A histogram with stat = "sum" then fits the OP's needs. The steps are:
library(zoo)
library(dplyr)
library(ggplot2)
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
  "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3",
  "2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
# DurationSecs is added as numeric: the time until the next sample (0 for the last one).
x.df <- x.df %>% arrange(Date) %>%
  mutate(DurationSecs = coalesce(as.numeric(difftime(lead(Date), Date, units = "secs")), 0))
# Draw the plot now
ggplot(x.df, aes(x = Value, y = DurationSecs)) + geom_histogram(stat = "sum")
# The data
# Date Value DurationSecs
#1 2018-02-21 15:00:00 0 2.5
#2 2018-02-21 15:00:02 3 2.7
#3 2018-02-21 15:00:05 5 1.8
#4 2018-02-21 15:00:07 1 2.3
#5 2018-02-21 15:00:09 3 0.7
#6 2018-02-21 15:00:10 0 2.0
#7 2018-02-21 15:00:12 2 0.0
After some trial and error I found a solution. The answer provided by MKR sort of works, but I could not set the number of bins and it failed for floating-point values.
I came across the wonderful functions cut and xtabs in this question: How to plot an histogram with y as a sum of the x values for every bin in ggplot2. The solution provided there was painfully slow, drawing each data-point duration as stacked bars.
I don't need separate bars for each data-point, I just need the sum of the durations within each bin. This is my solution:
library(dplyr)
library(magrittr)
library(zoo)
library(ggplot2)
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
"2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3",
"2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
# DurationSecs is added as numeric: the time until the next datapoint (0 for the last one).
x.df <- x.df %>% arrange(Date) %>%
  mutate(DurationSecs = coalesce(as.numeric(difftime(lead(Date), Date, units = "secs")), 0))
# Adding a column of bins to the dataframe:
BinCount <- 7
x.df$bins = cut(x.df$Value, pretty(x.df$Value, n = BinCount), include.lowest = TRUE, right = FALSE)
# Creating a new dataframe containing bins and the sum of DurationSecs for each bin.
y.df = data.frame(xtabs(DurationSecs ~ bins, x.df))
# Ready to plot
ggplot(y.df, aes(x = bins, y = Freq)) +
geom_bar(stat = "identity") +
ylab("Duration") +
xlab("Value") +
scale_x_discrete(drop = FALSE) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.3, hjust = 1)) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10))
As a bonus, the labels on the x-axis are really beautiful, and I have the frequency table available for further analysis.
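The core of this solution, a duration per sample, bins from cut() and a sum per bin, can be sketched without dplyr or zoo. This assumes the timestamps have already been converted to seconds since the first sample (the values below are the example timestamps), fixes the bin breaks explicitly instead of using pretty(), and uses tapply() in place of xtabs(). Because cut() works on any numeric values, floating-point data are handled the same way.

```r
# Sketch: duration-weighted histogram for sample-and-hold data (base R).
# Times are seconds since the first sample (from the example timestamps).
t_secs <- c(0, 2.5, 5.2, 7.0, 9.3, 10.0, 12.0)
value  <- c(0, 3, 5, 1, 3, 0, 2)

# Each value holds until the next timestamp; the last one gets duration 0
dur <- c(diff(t_secs), 0)

# Bins [0,1), [1,2), ..., [4,5]; include.lowest closes the last bin on the right
bins <- cut(value, breaks = 0:5, right = FALSE, include.lowest = TRUE)

# Total duration per bin; empty bins get 0 rather than NA
dur_per_bin <- tapply(dur, bins, sum, default = 0)
dur_per_bin
```

barplot(dur_per_bin) gives a quick base-graphics version of the same chart.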

R - extracting time only from xts, zoo and POSIXct

I am analyzing day-to-day data to see when the value tends to be lower. I set each day as a categorical variable so I can tell the days apart. But I want each day plotted on top of the others instead of as one continuous graph.
Data set:
Value Day
2013-01-03 01:55:00 0.35435715 1
2013-01-03 02:00:00 0.33018654 1
2013-01-03 02:05:00 0.38976118 1
2013-01-04 02:10:00 0.45583868 2
2013-01-04 02:15:00 0.29290860 2
My current ggplot code is as follows:
g <- ggplot(data = Data, aes(x = Index, color = Dates)) +
geom_line(y = Data$Value) +
scale_x_datetime(date_breaks = TimeIntervalForGraph, date_labels = "%H") +
xlab("Time") +
ylab("Random value")
I would really appreciate it if anyone could guide me on how to turn my x-axis into a 24-hour time axis, so that I can plot each day on the same graph and see when the value is lower during the 24 hours. Thanks in advance.
Method tried:
I tried creating a third column with the time only, but for some reason the following code didn't work:
time <- format(index(x), format = "%H:%M"))
data <- cbind(data, time)
You need a way of summarising the data for each hour of the day. Here are some approaches you're probably looking for:
library(xts)
library(data.table)
library(ggplot2)
tm <- seq(as.POSIXct("2017-08-08 17:30:00"), by = "5 mins", length.out = 10000)
z <- xts(runif(10000), tm, dimnames = list(NULL, "vals"))
DT <- data.table(time = index(z), coredata(z))
# note the data.table syntax is different:
DT[, hr := hour(time)]
# Plot the average value by hour:
datByHour <- DT[, list(avgval = mean(vals)), by = c("hr")]
# Use line plot if you have one point per hour:
g <- ggplot(data = datByHour, aes(x = hr, y = avgval, colour = avgval)) +
geom_line()
# visualise the distribution by hour:
g2 <- ggplot(data = DT, aes(x = hr, y = vals, group = hr)) +
geom_boxplot()
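If you do not use data.table, the per-hour summary can be sketched with base R's aggregate(). The series below is synthetic (two days of half-hourly values in UTC) so that the result is easy to check; the hour of day comes from format(..., "%H").

```r
# Sketch: average value per hour of day with base R's aggregate().
# The series is synthetic: two days of half-hourly values in UTC.
tm <- seq(as.POSIXct("2017-08-08 00:00:00", tz = "UTC"), by = "30 min", length.out = 96)
hr <- as.integer(format(tm, "%H"))
vals <- hr * 10 + as.integer(format(tm, "%M")) / 30   # e.g. 13:30 -> 131
by_hour <- aggregate(vals, by = list(hr = hr), FUN = mean)
head(by_hour)
```

by_hour has one row per hour (columns hr and x) and can be plotted like datByHour above.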
Please try the following and let me know if it works (here I assume the time column is called tm):
Data$tm = strftime(Data$tm, format="%H:%M:%S")
library(ggplot2)
ggplot(Data, aes(x = tm, y = Value, group = Day, colour = Day)) +
geom_line() +
theme_classic()
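Converting the time with strftime() returns a character vector, so ggplot treats the x-axis as discrete. An alternative sketch keeps a continuous time axis by moving every timestamp onto one arbitrary reference date while keeping its clock time. It assumes the timestamps are in UTC and uses the example rows from the question; the names overlay, day and time_of_day are illustrative.

```r
# Sketch: overlay days by mapping every timestamp onto one reference date.
ts <- as.POSIXct(c("2013-01-03 01:55:00", "2013-01-03 02:00:00", "2013-01-03 02:05:00",
                   "2013-01-04 02:10:00", "2013-01-04 02:15:00"), tz = "UTC")
value <- c(0.35435715, 0.33018654, 0.38976118, 0.45583868, 0.29290860)
overlay <- data.frame(
  day = factor(format(ts, "%Y-%m-%d")),   # one colour per day
  # same clock time, moved to the arbitrary date 1970-01-01
  time_of_day = as.POSIXct(paste("1970-01-01", format(ts, "%H:%M:%S")), tz = "UTC"),
  value = value
)
overlay
```

Plotting with ggplot(overlay, aes(time_of_day, value, colour = day)) + geom_line() + scale_x_datetime(date_labels = "%H:%M") then draws one line per day on a shared 24-hour axis.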

How to create a time vector excluding the system date when using R?

I want to create a time vector that starts at 0:05:00 A.M. and ends at 0:00:00 A.M. the next day. The interval between consecutive time points is 5 minutes.
Then I want a y-t line plot with qplot().
Here is my R code:
library(ggplot2)
t <- strptime('0:05:00', '%H:%M:%S') + (0:287)*300
y<-rnorm(288,5,1)
qplot(t,y,geom = 'line')
In the resulting plot, the system date ('Aug 05') is added to t. What I want is 'hour:minute' only.
What should I do with my code?
Here is a solution using ggplot2 and POSIXct date-times, which are easy to manipulate with ggplot (all times are in GMT, so that the labels match the data):
library(ggplot2)
library(scales)  # date_format() and date_breaks()
df = data.frame(
  t = seq(as.POSIXct("2016-01-01 00:05:00", tz = "GMT"), as.POSIXct("2016-01-02 00:00:00", tz = "GMT"), by = '5 min'),
  y = rnorm(288, 5, 1))
ggplot(df, aes(t, y)) + geom_line() +
  scale_x_datetime(labels = date_format('%H:%M', tz = "GMT"), breaks = date_breaks('2 hours'))
One suggestion is to set the tick labels manually. Note that in the snippet below I slightly amended your code for t and y, so that they start and end at 0:00:00 (instead of starting at 0:05:00).
library(ggplot2)
t <- strptime('0:00:00','%H:%M:%S')+(0:288)*300
y <- c(NA, rnorm(288,5,1))
tlabs <- format(t, "%H:%M")
breaks <- seq(1, 289, 72)
qplot(as.numeric(t),y,geom = 'line') +
scale_x_continuous(labels=tlabs[breaks], breaks=as.numeric(t)[breaks]) +
xlab("t")
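A POSIXct value always carries a date; what both answers do is hide it when formatting the axis. A minimal base R sketch of the vector the question asks for, from 00:05 to midnight of the next day in 5-minute steps, assuming UTC to avoid daylight-saving gaps; the date 2016-01-01 is arbitrary.

```r
# Sketch: 5-minute time points from 00:05 to midnight of the next day.
# A POSIXct value always carries a date; only the labels hide it.
start <- as.POSIXct("2016-01-01 00:05:00", tz = "UTC")
tpts <- seq(start, by = "5 min", length.out = 288)
labs <- format(tpts, "%H:%M")
labs[c(1, 288)]  # "00:05" "00:00"
```

labs can be used as tick labels, or scale_x_datetime(date_labels = "%H:%M") can format the POSIXct values directly.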

How do I scale an axis for time intervals using ggplot2?

Say I want to plot a histogram of a bunch of time intervals:
library(ggplot2)
library(dplyr)
# Generate 1000 random time difftime values
set.seed(919)
center <- as.POSIXct(as.Date("2014-12-18"))
df <- data.frame(
center,
noise = center + rnorm(1000, mean = 86400, sd = 86400 * 3)
) %>%
mutate(diff = center - noise)
# Plot histogram of the difftime values --
# coerce to numeric, because otherwise it won't plot
qplot(data = df, x = as.numeric(diff), geom = "histogram")
This gives a histogram with plain numbers on the x-axis.
Is there a way to change the x-axis to be reasonable date-time values? (That is, I'd want 86400 to be labelled as "1 day", -86400 to be labelled as "- 1 day", etc.) I could do this manually by setting breaks and labels, but I'm hoping that ggplot has a way to handle difftime values automatically.
Instead of subtracting the dates directly, you can use difftime() with days as the units.
library(ggplot2)
library(dplyr)
# Generate 1000 random time difftime values
set.seed(919)
center <- as.POSIXct(as.Date("2014-12-18"))
df <- data.frame(
center,
noise = center + rnorm(1000, mean = 86400, sd = 86400 * 3)
) %>%
mutate(diff = difftime(center, noise, units = "days"))
# Plot histogram of the difftime values --
# coerce to numeric, because otherwise it won't plot
qplot(data = df, x = as.numeric(diff), geom = "histogram") +
xlab("Days") + ylab("Count")
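Besides converting to days, scale_x_continuous() accepts functions for both breaks and labels, so a small pair of helpers can turn a seconds axis into labels such as "1 day" and "-1 day". label_days() and day_breaks() are hypothetical names, and the sketch assumes the x values are in seconds, e.g. as.numeric(diff, units = "secs").

```r
# Hypothetical helpers for an x-axis measured in seconds.
# label_days(): 86400 -> "1 day", -86400 -> "-1 day", 172800 -> "2 days"
label_days <- function(x) {
  d <- x / 86400
  paste(d, ifelse(abs(d) == 1, "day", "days"))
}
# day_breaks(): whole-day breaks covering the axis limits c(min, max)
day_breaks <- function(lims) {
  seq(floor(lims[1] / 86400), ceiling(lims[2] / 86400)) * 86400
}
label_days(c(-86400, 0, 86400, 172800))  # "-1 day" "0 days" "1 day" "2 days"
```

For example: qplot(data = df, x = as.numeric(diff, units = "secs"), geom = "histogram") + scale_x_continuous(breaks = day_breaks, labels = label_days).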
