I have the following dataset:
https://app.box.com/s/au58xaw60r1hyeek5cua6q20byumgvmj
I want to create a density plot based on the time of the day. Here is what I've done so far:
library("ggplot2")
library("scales")
library("lubridate")
timestamp_df$timestamp_time <- format(ymd_hms(hn_tweets$timestamp), "%H:%M:%S")
ggplot(timestamp_df, aes(timestamp_time)) +
geom_density(aes(fill = ..count..)) +
scale_x_datetime(breaks = date_breaks("2 hours"),labels=date_format("%H:%M"))
It gives the following error:
Error: Invalid input: time_trans works with objects of class POSIXct only
If I convert that to POSIXct, it adds dates to the data.
Update 1
The following converted the data to NA:
timestamp_df$timestamp_time <- as.POSIXct(timestamp_df$timestamp_time, format = "%H:%M%:%S", tz = "UTC")
Update 2
Following is what I want to achieve:
One problem with the solutions posted here is that they ignore the fact that this data is circular/polar (i.e. 00hrs == 24hrs). You can see in the plots in the other answer that the two ends of the curve don't match up with each other. This won't make much of a difference with this particular dataset, but for events that happen near midnight, an ordinary density estimate can be severely biased. Here's my solution, taking into account the circular nature of time data:
# modified code from https://freakonometrics.hypotheses.org/2239
library(dplyr)
library(ggplot2)
library(lubridate)
library(circular)
df = read.csv("data.csv")
datetimes = df$timestamp %>%
lubridate::parse_date_time("%m/%d/%Y %H:%M")
times_in_decimal = lubridate::hour(datetimes) + lubridate::minute(datetimes) / 60
times_in_radians = 2 * pi * (times_in_decimal / 24)
# Doing this just for bandwidth estimation:
basic_dens = density(times_in_radians, from = 0, to = 2 * pi)
res = circular::density.circular(circular::circular(times_in_radians,
type = "angle",
units = "radians",
rotation = "clock"),
kernel = "wrappednormal",
bw = basic_dens$bw)
time_pdf = data.frame(time = as.numeric(24 * (2 * pi + res$x) / (2 * pi)), # Convert from radians back to 24h clock
likelihood = res$y)
p = ggplot(time_pdf) +
geom_area(aes(x = time, y = likelihood), fill = "#619CFF") +
scale_x_continuous("Hour of Day", labels = 0:24, breaks = 0:24) +
scale_y_continuous("Likelihood of Data") +
theme_classic()
Note that the values and slopes of the density plot match up at the 00h and 24h points.
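As a rough cross-check of the wrap-around idea without the circular package, a plain kernel density can be made periodic by copying the data one period to the left and one to the right before smoothing. The sketch below uses simulated event times (not the question's dataset), and the helper name `periodic_density` is made up for illustration.

```r
# Hypothetical helper: periodic kernel density on a 24-hour clock,
# built by tiling the data at -24h, 0h and +24h before smoothing.
periodic_density <- function(hours, bw = 0.5, n = 512) {
  tiled <- c(hours - 24, hours, hours + 24)
  d <- density(tiled, bw = bw, from = 0, to = 24, n = n)
  # The tiled sample is three times as large, so rescale the estimate
  # to integrate to roughly 1 over [0, 24].
  data.frame(time = d$x, density = 3 * d$y)
}

set.seed(1)
# Simulated events clustered around midnight (assumed data, illustration only).
hours <- rnorm(500, mean = 0, sd = 1.5) %% 24
pd <- periodic_density(hours)

# The curve should take (nearly) the same value at 00h and 24h.
c(start = pd$density[1], end = pd$density[nrow(pd)])
```

This is a different technique from the wrapped-normal kernel used above, but it gives the same qualitative behaviour: the two ends of the curve join up.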
Here is one approach:
library(ggplot2)
library(lubridate)
library(scales)
df <- read.csv("data.csv") #given in OP
Convert the character column to POSIXct:
df$timestamp <- as.POSIXct(strptime(df$timestamp, "%m/%d/%Y %H:%M", tz = "UTC"))
library(hms)
Extract the hour, minute and second:
df$time <- hms::hms(second(df$timestamp), minute(df$timestamp), hour(df$timestamp))
Convert to POSIXct again, since ggplot2 does not work with class hms:
df$time <- as.POSIXct(df$time)
ggplot(df, aes(time)) +
geom_density(fill = "red", alpha = 0.5) + #also play with adjust such as adjust = 0.5
scale_x_datetime(breaks = date_breaks("2 hours"), labels=date_format("%H:%M"))
To plot it scaled to 1:
ggplot(df) +
geom_density( aes(x = time, y = ..scaled..), fill = "red", alpha = 0.5) +
scale_x_datetime(breaks = date_breaks("2 hours"), labels=date_format("%H:%M"))
where ..scaled.. is a computed variable for stat_density made during plot creation.
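In ggplot2 3.3.0 and later, the `..scaled..` notation can also be written as `after_stat(scaled)`. Here is a minimal sketch using simulated times instead of the question's data.csv (the `df` built here is an assumption for illustration):

```r
library(ggplot2)

set.seed(42)
# Simulated times of day on a dummy date (1970-01-01), in UTC.
df <- data.frame(
  time = as.POSIXct(runif(200, 0, 86400), origin = "1970-01-01", tz = "UTC")
)

p <- ggplot(df, aes(x = time)) +
  # after_stat(scaled) rescales the density estimate so that its maximum is 1.
  geom_density(aes(y = after_stat(scaled)), fill = "red", alpha = 0.5) +
  scale_x_datetime(date_breaks = "2 hours", date_labels = "%H:%M", timezone = "UTC")

print(p)
```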
Related
I'm trying to visualize time series data from measurements with animals, in ggplot2.
On the y-axis I have groups of days for the different sets of data, on the x-axis I have the time of day (in hour:min:sec format), and the color is mapped to a factor indicating whether each measurement was registered in daylight or in darkness.
My problem is that I need to set the measurement starting point at 7:45am every day and ending point at 7:45am of the following day.
I succeeded in grouping the data into days with for loops like these (repeated for every day of the experiment):
df$Day <- 0
for(i in 1:nrow(df)) {
if((ymd_hms(df$Take[i]) - ymd_hms(df$Take[1]) < as_hms(86400))) {
df$Day[i] <- 1
} else df$Day[i] <- df$Day[i]
}
for(i in 1:nrow(df)) {
if((ymd_hms(df$Take[i]) - ymd_hms(df$Take[1]) < as_hms(86400*2)) & (ymd_hms(df$Take[i]) - ymd_hms(df$Take[1]) > as_hms(86400))) {
df$Day[i] <- 2
} else df$Day[i] <- df$Day[i]
}
And I also managed to assign the daylight and darkness categories with this dplyr code:
light_a <- as.hms('07:45:00')
light_b <- as.hms('19:45:00')
library(dplyr)
df2 <- df %>%
mutate(Light_dark = if_else(df$Take_Time > light_a & df$Take_Time < light_b, 'Light', 'Dark'))%>%
mutate(Light_dark2 = if_else(df$Take_Time > light_a & df$Take_Time < light_b, '1', '0'))
I also defined categories and grouped the data by day:
df2$Day <- as.factor(df2$Day)
df2 <- df2 %>%
group_by(Day, Light_dark)
However, I cannot get the plot to begin at the starting point. What I obtain is a graph with the data out of order:
ggplot(data = df2, aes(x = Take_Time,y= as.factor(Day))) +
geom_point(aes(col = as.factor(Light_dark)),alpha= 0.6, size = 2)+
labs(subtitle="Feeding system.",y="Day",x="Time (h:m:s)",title="Distribution of the data", color = "Lightness intervals")+
scale_color_manual(values = c('aquamarine3', 'chocolate')) +
theme_minimal() + theme(axis.text.x = element_text(angle = 45,hjust = 1))
I tried to add this scale_x_time(), but then the points disappear:
scale_x_time(
labels = "%H:%M",
breaks = "2 hours",
limits = c(as.hms('07:45:00'),
as.hms('07:44:00')),
timezone = Sys.timezone(),
expand = c(0,0)
) +
Any ideas on how to fix the plot are welcome.
I managed to get a proper plot by comparing each time with the cutoff.
If the difference between midnight and the value is no larger than the difference between midnight and my cutoff (07:45:00), the value lies between the starting point and midnight, and I leave it as it is.
If the difference is larger, the value lies between midnight and the cutoff of the following day, so I add 24 hours to it.
Afterwards, I change the axis labels in the ggplot2 command.
df2$crono <- 0
for(i in 1:nrow(df2)) {
if((as_hms('24:00:00') - as_hms(df2$Take_Time[i])) <= (as_hms('24:00:00') -
as_hms('07:45:00'))) {
df2$crono[i] <- ((as_hms(df2$Take_Time[i])))
} else df2$crono[i] <- ((as_hms('24:00:00') + as_hms(df2$Take_Time[i])))
}
ggplot(data = df2, aes(x = crono,y= as.factor(Day))) +
geom_point(aes(col = as.factor(Light_dark)),alpha= 0.6, size = 2)+
labs(subtitle="Feeding system.",y="Day",x="Time (h:m:s)",title="Distribution of the data",
color = "Lightness intervals")+
scale_color_manual(values = c('aquamarine3', 'chocolate')) +
scale_x_continuous( limits = c( 25000, 115000),
breaks= c(28800,43200,57600,72000,86400,100800,115200),
labels=c("08:00","12:00","16:00","20:00","24:00","04:00","08:00"))+
theme_minimal() + theme(axis.text.x = element_text(angle = 45,hjust = 1))
Plot of the distribution data with fixed x-axis
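The loop above can also be written without iterating row by row. The sketch below is a vectorised version of the same idea, assuming the times are available as seconds since midnight. The vector `secs` and the helper `shift_after_cutoff` are hypothetical names, not part of the original data.

```r
# Hypothetical helper: move times that fall before the cutoff to the
# "next" day, so the axis runs from the cutoff to cutoff + 24 h.
shift_after_cutoff <- function(secs, cutoff = 7 * 3600 + 45 * 60) {
  ifelse(secs >= cutoff, secs, secs + 86400)
}

# Example times as seconds since midnight: 08:00, 23:30, 00:15, 07:44.
secs <- c(8 * 3600, 23.5 * 3600, 15 * 60, 7 * 3600 + 44 * 60)
crono <- shift_after_cutoff(secs)

# Axis breaks every 4 hours starting at 08:00, labelled with clock time.
breaks <- seq(8 * 3600, 32 * 3600, by = 4 * 3600)
labels <- sprintf("%02d:00", (breaks %/% 3600) %% 24)
```

The `breaks` and `labels` vectors can then be passed to scale_x_continuous() in place of the hand-typed ones. Note that this sketch labels midnight as "00:00" rather than "24:00".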
I am currently trying to generate NOAA tide prediction charts (x = datetime, y = water level) with the dawn/sunrise/dusk/sunset times as vertical lines along the x axis timeline.
The rnoaa package calls the data and gives me the prediction date times in POSIXct. The suncalc library provides me a data frame with each date in the range's sunrise, sunset, etc. in POSIXct format as well.
library(rnoaa)
library(tidyverse)
library(ggplot2)
library(suncalc)
march.tides <- as.data.frame(coops_search(station_name = 8551762,
begin_date = 20200301, end_date = 20200331,
datum = "mtl", product = "predictions"))
march.tides <- march.tides %>%
mutate(yyyy.mm.dd = as.Date(predictions.t))
dates <- unique(march.tides$yyyy.mm.dd)
sunlight.times <- getSunlightTimes(date = seq.Date(as.Date("2020/3/1"), as.Date("2020/3/31"), by = 1),
lat = 39.5817, lon = -75.5883, tz = "EST")
I then have a loop that spits out separate plots for each calendar date - which works hunky dory. The vertical lines are drawing on the graph without an error, but are definitely in the wrong spot (sunrise is being drawn around 11am when it should be 06:30).
for (i in seq_along(dates)) {
plot <- ggplot(subset(march.tides, march.tides$yyyy.mm.dd==dates[i])) +
aes(x = predictions.t, y = predictions.v) +
geom_line(size = 1L, colour = "#0c4c8a") +
theme_bw() +
geom_vline(xintercept = sunlight.times$sunrise) +
geom_vline(xintercept = sunlight.times$sunset) +
geom_vline(xintercept = sunlight.times$dawn, linetype="dotted") +
geom_vline(xintercept = sunlight.times$dusk, linetype="dotted") +
ggtitle(dates[i])
print(plot)
}
I could alternatively facet the separate dates instead of this looping approach. Even when I subset the data to a single date, the vertical lines still did not draw correctly.
I wondered if maybe the issue was a time zone one. If I try to stick a time zone argument onto the tide prediction data call, I get the error:
Error in if (!repeated && grepl("%[[:xdigit:]]{2}", URL, useBytes = TRUE)) return(URL) :
missing value where TRUE/FALSE needed
It looks like you want to use EST as your timezone, so you could include it in your conversion of predictions.t.
I would also be explicit about what you want labeled on your x-axis in ggplot by using scale_x_datetime, including the timezone.
library(rnoaa)
library(tidyverse)
library(ggplot2)
library(suncalc)
library(scales)
march.tides <- as.data.frame(coops_search(station_name = 8551762,
begin_date = 20200301, end_date = 20200331,
datum = "mtl", product = "predictions"))
march.tides <- march.tides %>%
mutate(yyyy.mm.dd = as.Date(predictions.t, tz = "EST"))
dates <- unique(march.tides$yyyy.mm.dd)
sunlight.times <- getSunlightTimes(date = seq.Date(as.Date("2020/3/1"), as.Date("2020/3/31"), by = 1),
lat = 39.5817, lon = -75.5883, tz = "EST")
for (i in seq_along(dates)) {
plot <- ggplot(subset(march.tides, march.tides$yyyy.mm.dd==dates[i])) +
aes(x = predictions.t, y = predictions.v) +
geom_line(size = 1L, colour = "#0c4c8a") +
theme_bw() +
geom_vline(xintercept = sunlight.times$sunrise) +
geom_vline(xintercept = sunlight.times$sunset) +
geom_vline(xintercept = sunlight.times$dawn, linetype="dotted") +
geom_vline(xintercept = sunlight.times$dusk, linetype="dotted") +
ggtitle(dates[i]) +
scale_x_datetime(labels = date_format("%b %d %H:%M", tz = "EST"))
print(plot)
}
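To see why a time zone mismatch moves the lines by several hours, it can help to print one instant in two zones, and to keep only the current day's sun times inside the loop. The sketch below uses a made-up sunrise value rather than real suncalc output, and the small `sun` data frame is an assumption for illustration.

```r
# One instant, stored internally as seconds since 1970-01-01 UTC.
sunrise_utc <- as.POSIXct("2020-03-01 11:35:00", tz = "UTC")

# The same instant shown in two time zones: the clock time differs by 5 hours.
format(sunrise_utc, "%H:%M", tz = "UTC")  # "11:35"
format(sunrise_utc, "%H:%M", tz = "EST")  # "06:35"

# Hypothetical table of sun times, one row per date.
sun <- data.frame(
  date    = as.Date(c("2020-03-01", "2020-03-02")),
  sunrise = as.POSIXct(c("2020-03-01 06:35:00", "2020-03-02 06:33:00"), tz = "EST")
)

# Keep only the row for the day being plotted before passing it to geom_vline().
today <- as.Date("2020-03-01")
sun_today <- sun[sun$date == today, ]
```

Filtering like this is optional, but it avoids drawing the lines for all 31 days on every single-day plot.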
I am trying to plot some data, fitted values and forecasts in a nice ggplot format, but when I plot my data the way I think should work, I get a gap between the real data and the forecast. The gap is meaningless, but it would be nice if it was gone.
Some R code you can use to recreate my problem is:
library(xts)
library(tidyverse)
library(forecast)
dates <- seq(as.Date("2016-01-01"), length = 100, by = "days")
realdata <- arima.sim(model = list(ar = 0.7, order = c(1,1,0)), n = 99)
data <- xts(realdata, order.by = dates)
user_arima <- arima(data, order = c(1,1,0))
user_arimaf <- forecast(user_arima)
fits <- xts(user_arimaf$fitted, order.by = dates)
fcastdates <- as.Date(dates[100]) + 1:10
meancast <- xts(user_arimaf$mean[1:10], order.by = fcastdates)
lowercast95 <- xts(user_arimaf$lower[1:10], order.by = fcastdates)
uppercast95 <- xts(user_arimaf$upper[1:10], order.by = fcastdates)
frame <- merge(data, fits, meancast, uppercast95, lowercast95, all = TRUE, fill = NA)
frame <- as.data.frame(frame) %>%
mutate(date = as.Date(dates[1] + 0:(109)))
frame %>%
ggplot() +
geom_line(aes(date, data, color = "Data")) +
geom_line(aes(date, fits, color = "Fitted")) +
geom_line(aes(date, meancast, color = "Forecast")) +
geom_ribbon(aes(date, ymin=lowercast95,ymax=uppercast95),alpha=.25) +
scale_color_manual(values = c(
'Data' = 'black',
'Fitted' = 'red',
'Forecast' = 'darkblue')) +
labs(color = 'Legend') +
theme_classic() +
ylab("some data") +
xlab("Date") +
labs(title = "chart showing a gap",
subtitle = "Shaded area is the 95% CI from the ARIMA")
And the chart is below
I know there is a geom_forecast in ggplot now, but I would like to build this particular plot the way I'm doing it. If there's no other solution to the gap, though, I'll use geom_forecast.
Closing the gap requires providing a data point in the meancast column for the blank area. I guess it makes sense just to use the value for the last "real" data point.
# Grab the y-value corresponding to the date just before the gap.
last_data_value = frame[frame$date == as.Date("2016-04-09"), "data"]
# Construct a one-row data.frame.
extra_row = data.frame(data=NA_real_,
fits=NA_real_,
meancast=last_data_value,
uppercast95=last_data_value,
lowercast95=last_data_value,
date=as.Date("2016-04-09"))
# Add extra row to the main data.frame.
frame = rbind(frame, extra_row)
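The same bridging point can be found without hard-coding the date by looking up the last row that has an observed value. Here is a minimal sketch on a small made-up data frame (`toy` stands in for `frame`, with only the columns needed here). It fills the value in place instead of appending a row:

```r
# Toy stand-in for `frame`: three observed points followed by two forecasts.
toy <- data.frame(
  date     = as.Date("2016-01-01") + 0:4,
  data     = c(1.0, 1.5, 2.0, NA, NA),
  meancast = c(NA, NA, NA, 2.2, 2.4)
)

# Index of the last observed value.
last_obs <- max(which(!is.na(toy$data)))

# Start the forecast line at that point so the two lines touch.
toy$meancast[last_obs] <- toy$data[last_obs]
toy
```

The same assignment could be repeated for the upper and lower bound columns if the ribbon should also start at the last observation.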
Suppose we have a series of positions in a 2D space that change over time:
start.time <- strptime("2016-11-22_15-44-24",
format = "%Y-%m-%d_%H-%M-%S",
tz = "UTC")
end.time <- strptime("2016-11-22_17-25-12",
format = "%Y-%m-%d_%H-%M-%S",
tz = "UTC")
date <- seq.POSIXt(from = start.time, to = end.time, length.out = 100)
a <- seq(0, 10, length.out = 100)
x <- a * cos(a)
y <- a * sin(a)
my.df <- data.frame(date, x, y)
ggplot(my.df, aes(x, y, color = date)) + geom_point()
This gives the following graph:
Now I would like to change the color palette to something more "dynamic", say the Spectral palette of the color brewer.
Following this answer, I used these commands:
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_colour_gradientn(colours = myPalette(100))
ggplot(my.df, aes(x, y, color = date)) + geom_point() + sc
But I have an error:
Error in Ops.POSIXt((x - from[1]), diff(from)) :
'/' not defined for "POSIXt" objects
(Rough translation from the actual error I get, which is half English/half French).
I guess at some point, scale_color_gradientn tries to divide my time scale in equal parts, and fails to do so because it doesn't know how to divide dates.
What can I do then?
You could use a small numeric workaround like this with scale_color_distiller:
library(lubridate)
ggplot(my.df, aes(x, y, color = as.numeric(date))) +
geom_point() +
scale_color_distiller(palette = "Spectral",
breaks = as.numeric(my.df$date[c(1,50,100)]),
labels = paste0(hour(my.df$date[c(1,50,100)]), ":", minute(my.df$date[c(1,50,100)])),
name = "date")
scale_color_brewer only works for discrete data.
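A variation on the same workaround keeps scale_colour_gradientn from the question and converts the numeric breaks back to clock times with a labelling function. This is a sketch under the assumption that the times are in UTC. The helper `time_labels` is a made-up name.

```r
library(ggplot2)
library(RColorBrewer)

# Same spiral as in the question, with times stored as POSIXct in UTC.
date <- seq(as.POSIXct("2016-11-22 15:44:24", tz = "UTC"),
            as.POSIXct("2016-11-22 17:25:12", tz = "UTC"),
            length.out = 100)
a <- seq(0, 10, length.out = 100)
my.df <- data.frame(date, x = a * cos(a), y = a * sin(a))

# Hypothetical helper: turn numeric breaks (seconds since 1970-01-01)
# back into clock times for the legend.
time_labels <- function(b) {
  format(as.POSIXct(b, origin = "1970-01-01", tz = "UTC"), "%H:%M")
}

p <- ggplot(my.df, aes(x, y, color = as.numeric(date))) +
  geom_point() +
  scale_colour_gradientn(colours = rev(brewer.pal(11, "Spectral")),
                         labels = time_labels, name = "date")
print(p)
```

Using format() for the labels also keeps minutes zero-padded, which paste0(hour(...), ":", minute(...)) does not.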
I want to create a time vector which starts at 0:05:00 A.M. and ends at 0:00:00 A.M. the next day. The interval between consecutive time points is 5 minutes.
Then I want a y-t line plot with qplot().
Here is my R code:
t<-strptime('0:05:00','%H:%M:%S')+(0:287)*300
y<-rnorm(288,5,1)
qplot(t,y,geom = 'line')
the outcome is like this:
As you can see, the system date ('Aug 05') has been added to 't'. What I want is 'hour:minute' only.
What should I change in my code?
Here is a solution using ggplot2 and POSIX formatting for dates which is easy to manipulate with ggplot:
library(ggplot2)
library(scales)
# Build the times in UTC so they match the time zone used for the axis labels
t = seq(as.POSIXct("2016-01-01 00:05:00", tz = "UTC"),
        as.POSIXct("2016-01-02 00:00:00", tz = "UTC"), by = '5 min')
df = data.frame(t = t, y = rnorm(length(t), 5, 1))
ggplot(df, aes(t, y)) + geom_line() +
scale_x_datetime(labels = date_format('%H:%M', tz = "UTC"), breaks = date_breaks('2 hours'))
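As an alternative spelling of the same axis, scale_x_datetime() also accepts `date_breaks`, `date_labels` and `timezone` arguments directly, which avoids the scales helpers. This is a minimal sketch, assuming times from 00:05 to midnight in UTC as described in the question:

```r
library(ggplot2)

# Times every 5 minutes from 00:05 to midnight of the next day, in UTC.
t <- seq(as.POSIXct("2016-01-01 00:05:00", tz = "UTC"),
         as.POSIXct("2016-01-02 00:00:00", tz = "UTC"), by = "5 min")
df <- data.frame(t = t, y = rnorm(length(t), 5, 1))

p <- ggplot(df, aes(t, y)) +
  geom_line() +
  # The dummy date never shows on the axis: only %H:%M is printed.
  scale_x_datetime(date_breaks = "2 hours", date_labels = "%H:%M", timezone = "UTC")
print(p)
```

The date used to build the vector is arbitrary. What matters is that the time zone of the data and the time zone of the labels agree.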
One suggestion is to manually set the tick labels. Note that in the snippet below, I amended slightly your code for t and y, so that they start and end at 0:00:00 (instead of starting at 0:05:00).
t <- strptime('0:00:00','%H:%M:%S')+(0:288)*300
y <- c(NA, rnorm(288,5,1))
tlabs <- format(t, "%H:%M")
breaks <- seq(1, 289, 72)
qplot(as.numeric(t),y,geom = 'line') +
scale_x_continuous(labels=tlabs[breaks], breaks=as.numeric(t)[breaks]) +
xlab("t")
Output: