I have data such that
TIME<- c(36655,14330,23344,9992,...)
which represents seconds after midnight. I want to produce a histogram using ggplot2 so that I can see the distribution of the data, but with the time of day on the x-axis rather than seconds.
So far I have:
ggplot(data=df1, aes(TIME)) + geom_histogram(col="red", fill="green", alpha = .2,bins=9)+
labs(title="Histogram for call time") +labs(x="Time", y="Count")
But this just gives the seconds. However, if I then convert it with:
TIME <- as.POSIXct(strptime(TIME, format="%R"))
this includes today's date, which I do NOT want. I just want times, i.e. breaks such as c(0:00, 9:00, 12:00, 18:00, 24:00). Is this possible?
As per comments, this works:
TIME <- as.POSIXct(strptime(TIME, format="%R"))
ggplot(data=df1, aes(TIME)) + geom_histogram(col="red", fill="green",
alpha = .2,bins=9)+ scale_x_datetime(date_labels = "%R")
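If TIME really holds numeric seconds after midnight, as in the question, one option is to turn the seconds into a POSIXct offset from an arbitrary date in UTC and then label the axis with the time only. The following is a minimal sketch under that assumption; the sample values and the clock column name are made up for illustration.

```r
library(ggplot2)

# Hypothetical sample data: seconds after midnight
df1 <- data.frame(TIME = c(36655, 14330, 23344, 9992, 61200, 80000))

# Interpret the seconds as an offset from an arbitrary date, in UTC,
# so that no local time zone shift is applied
df1$clock <- as.POSIXct(df1$TIME, origin = "1970-01-01", tz = "UTC")

p <- ggplot(df1, aes(clock)) +
  geom_histogram(col = "red", fill = "green", alpha = 0.2, bins = 9) +
  # Show only hours and minutes; the dummy date never appears on the axis
  scale_x_datetime(date_labels = "%H:%M", timezone = "UTC") +
  labs(title = "Histogram for call time", x = "Time", y = "Count")
```

Printing p draws the histogram; passing date_breaks = "3 hours" to scale_x_datetime would give evenly spaced time-of-day breaks.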
In ggplot2, I have a question about appropriate scales for making POSIXct datetimes into time-of-day in an axis. Consider:
library(tidyverse)
library(lubridate)
library(hms)
library(patchwork)
test <- tibble(
dates = c(ymd_hms("2022-01-01 6:00:00"),
ymd_hms("2023-01-01 19:00:00")),
x = c(1, 2),
hms_dates = as_hms(dates)
)
plot1 <- ggplot(test) + geom_point(aes(x = x, y = dates)) +
scale_y_time()
plot2 <- ggplot(test) + geom_point(aes(x = x, y = hms_dates)) +
scale_y_time()
plot1 + plot2
Plot 1's y-axis includes dates and times, but Plot 2 shows just the time of day. That's what I want! I'd like to generate plots like Plot 2 without having to use the hms::as_hms approach. This seems to imply some options for scale_y_datetime (or similar) that I can't discover. I'd welcome suggestions.
Does someone have an example of how to use the limits option in scale_*_time, or (see question #1) limits for a scale_y_datetime that specifies hours within the day? For example, limits(c(8,22)) predictably fails.
For your second question: when dealing with dates, datetimes or times, you have to set the limits and/or breaks as dates, datetimes or times too, i.e. use limits = as_hms(c("8:00:00", "22:00:00")):
library(tidyverse)
library(lubridate)
library(hms)
ggplot(test) + geom_point(aes(x = x, y = hms_dates)) +
scale_y_time(limits = as_hms(c("8:00:00", "22:00:00")))
#> Warning: Removed 1 rows containing missing values (`geom_point()`).
Concerning your first question: to the best of my knowledge, this cannot be achieved via scale_..._datetime. If you just want to show the time part of your dates, then converting to an hms object is IMHO the easiest way to achieve that. You could of course set the format of the axis text via the date_labels argument, e.g. date_labels="%H:%M:%S", to show only the time of day. However, as your dates variable is still a datetime, the scale, breaks and limits will still reflect that, i.e. you only change the format of the labels, and for your example data you end up with an axis showing the same time for each break, i.e. the start of the day.
ggplot(test) + geom_point(aes(x = x, y = dates)) +
scale_y_datetime(date_labels = "%H:%M:%S")
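If hms has to be avoided, one possible workaround (not part of the answer above) is to collapse every datetime onto the same dummy date, so that the datetime scale effectively spans a single day. The sketch below assumes the datetimes are stored in UTC, as ymd_hms produces by default; the time_of_day column name is made up for illustration.

```r
library(ggplot2)

test <- data.frame(
  x = c(1, 2),
  dates = as.POSIXct(c("2022-01-01 06:00:00", "2023-01-01 19:00:00"), tz = "UTC")
)

# Seconds since midnight; the modulo trick is only valid because the data are in UTC
secs <- as.numeric(test$dates) %% 86400

# Re-anchor every time on the same dummy date (1970-01-01)
test$time_of_day <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

p <- ggplot(test) +
  geom_point(aes(x = x, y = time_of_day)) +
  scale_y_datetime(
    date_labels = "%H:%M",
    timezone = "UTC",
    # Limits must be datetimes too: here the whole dummy day, 00:00 to 24:00
    limits = as.POSIXct(c(0, 24) * 3600, origin = "1970-01-01", tz = "UTC")
  )
```

Because all values now share one date, breaks and limits fall on times of day rather than on calendar dates.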
The company I work for has certain COVID infection rate targets before letting people return from home. One of those targets is daily new infections per unit population to be below 10 per 100k. How can I determine when the upper and lower confidence intervals hit that target? See image with annotation in red.
Right now, the two vertical lines are entered manually, but I'd like to add them automatically at the points where the upper and lower confidence limits intersect the target line.
Data: https://raw.githubusercontent.com/robhanssen/covid19-v3/main/data/sc-casesdeath.csv (filtered after Jan 21, 2021 in image).
Code example (from https://github.com/robhanssen/covid19-v3/blob/main/process_us_data.r)
casesdeathsbylocation %>% filter(date > as.Date("2021-01-21")) %>%
ggplot + aes(x=date, y=casesper100k) + geom_point() + geom_smooth(method="lm", fullrange=TRUE) +
scale_y_continuous(limits=c(-50,100), breaks=seq(0,100,10)) +
scale_x_date(breaks="2 weeks", date_labels="%b %d", limits=as.Date(c("2021-01-21","2021-04-07"))) +
labs(x="Date", y="Cases per 100k population", title="Cases in South Carolina", subtitle="Cases per 100,000") +
geom_hline(yintercept=10, lty=2) +
geom_hline(yintercept=5, lty=3) +
geom_hline(yintercept=0, lty=1) +
geom_vline(xintercept=as.Date("2021-03-09"),lty=2) + geom_vline(xintercept=as.Date("2021-04-02"),lty=2)
You can do this by creating a linear approximation of the inverse of a specified confidence limit (which is linear in this case anyway!) and using it to interpolate the value at which the line hits a specified threshold.
Note that here we are approximating x as a function of y (e.g. date as a function of lower CI):
find_value <- function(x,y,target=10) {
aa <- approx(y,x,xout=target)$y
as.Date(aa,origin="1970-01-01") ## convert back to a date (ugh)
}
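As a quick check of the helper, here is a small sketch on toy data. The dates and the lower values are invented so that the series crosses the target of 10 exactly on one day; they are not taken from the COVID data set.

```r
# Same helper as above, repeated so the example is self-contained
find_value <- function(x, y, target = 10) {
  aa <- approx(y, x, xout = target)$y
  as.Date(aa, origin = "1970-01-01")
}

# Hypothetical, linearly decreasing confidence limit over ten days
dates <- as.Date("2021-03-01") + 0:9
lower <- seq(18, 0, by = -2)   # 18, 16, ..., 0; equals 10 on the fifth day

hit <- find_value(dates, lower)
```

Here hit is the date on which the toy series reaches 10, i.e. 2021-03-05. When the target falls between two days, approx interpolates linearly and the result is a fractional date.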
Once we have this helper function, we can use it in a tidy workflow that uses broom::augment to generate the confidence intervals (cdbl below is presumably the filtered casesdeathsbylocation data from the question).
library(broom)
lims <- (cdbl
## fit linear model
%>% lm(formula=casesper100k~date)
## predict/add confidence intervals
%>% augment(interval="confidence",
newdata=data.frame(date=
seq.Date(from=min(cdbl$date),to=max(cdbl$date)+20,
by="1 day")))
%>% select(date,.lower,.upper)
## interpolate to find date corresponding to target value (10)
## should use across() but I can't get it working
%>% summarise(lwr=find_value(date,.lower),
upr=find_value(date,.upper))
## convert to useful data frame for ggplot
%>% pivot_longer(cols=everything(),names_to="limit",values_to="date")
)
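Regarding the across() remark in the code comment, one form that should work is sketched below. It assumes dplyr 1.0 or later, and it uses an invented tibble in place of the augment() output so that it runs on its own.

```r
library(dplyr)

find_value <- function(x, y, target = 10) {
  as.Date(approx(y, x, xout = target)$y, origin = "1970-01-01")
}

# Hypothetical stand-in for the date/.lower/.upper columns returned by augment()
ci <- tibble(
  date   = as.Date("2021-03-01") + 0:9,
  .lower = seq(18, 0, by = -2),
  .upper = seq(24, 6, by = -2)
)

# Apply find_value() to both limit columns; `date` is looked up in the data
lims_wide <- ci %>%
  summarise(across(c(.lower, .upper), ~ find_value(date, .x)))
```

The result keeps the original column names (.lower and .upper), so a following pivot_longer(cols=everything(), ...) step works as before.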
Now you have a lims data frame that you can use for whatever you want. Using it in the plotting context:
(ggplot(cdbl)
+ aes(x=date, y=casesper100k)
+ geom_point()
+ expand_limits(x=max(cdbl$date+20))
+ geom_smooth(method="lm", fullrange=TRUE)
+ scale_y_continuous(limits=c(-50,100), breaks=seq(0,100,10))
+ scale_x_date(breaks="2 weeks", date_labels="%b %d")
+ geom_hline(yintercept=10,lty=2)
+ geom_vline(data=lims,aes(xintercept=date),lty=2)
)
As pointed out in the comments, you will get a more reliable answer if you use a more sophisticated forecasting method. As long as you get the confidence intervals returned by augment, the code here will work.
I have plotted water meter averages for different dates. I want to colour the averages that were measured on weekends. How do I do this, please?
plot <- ggplot(DF, aes(Date, Measurement)) +
geom_point() +
ggtitle('Water Meter Averages') +
xlab('Day No') +
ylab('Measurement in Cubic Feet')
Date <- c("2018-06-25", "2018-06-26", "2018-06-27", "2018-06-28", "2018-06-29", "2018-06-30", "2018-07-01")
Measurement <- c("1","3","5","2","4","5","7")
DF <- data.frame(Date, Measurement)
"2018-06-30" and "2018-07-01" are weekend dates with the corresponding values 5 and 7 respectively. How can I adapt my ggplot code so that R recognizes these dates as weekends and colors the points related to these dates in my plot?
First, make sure your data values are actually coded as date/time values in R and not strings or factors. Then you can do
# Make sure class(DF$Date)=="Date"; Measurement is also converted from strings to numbers
DF <- data.frame(Date=as.Date(Date), Measurement=as.numeric(Measurement))
ggplot(DF, aes(Date, Measurement, color=weekdays(Date) %in% c("Saturday","Sunday")))+geom_point() +
ggtitle('Water Meter Averages') +
xlab('Day No') +
ylab('Measurement in Cubic Feet') +
scale_color_discrete(name="Is Weekend")
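Note that weekdays() returns day names in the current locale, so the comparison with "Saturday" and "Sunday" only works in an English locale. A locale-independent variant, sketched here with the question's data, uses the numeric weekday from as.POSIXlt (0 is Sunday, 6 is Saturday); the is_weekend column name is made up for illustration.

```r
library(ggplot2)

DF <- data.frame(
  Date = as.Date(c("2018-06-25", "2018-06-26", "2018-06-27", "2018-06-28",
                   "2018-06-29", "2018-06-30", "2018-07-01")),
  Measurement = c(1, 3, 5, 2, 4, 5, 7)
)

# wday is 0 for Sunday and 6 for Saturday, regardless of locale
DF$is_weekend <- as.POSIXlt(DF$Date)$wday %in% c(0, 6)

p <- ggplot(DF, aes(Date, Measurement, color = is_weekend)) +
  geom_point() +
  scale_color_discrete(name = "Is Weekend")
```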
I have a set of data showing patients arrival and departure in a hospital:
arrival<-c("12:00","12:30","14:23","16:55","00:04","01:00","03:00")
departure<-c("13:00","16:00","17:38","00:30","02:00","07:00","23:00")
I want to produce a histogram counting the number of patients at each time band (00:00-01:00; 01:00-02:00 etc) in the hospital.
So I would get something like between 12:00- 12:59 there is 2 patients etc.
You can try the following. I changed the example data a little to ensure that each departure time is later than its arrival time (ideally you would have both date and time for the arrival and departure). In the resulting figure, the time label 10:00 actually represents the band 10:00-10:59; you can change the labels if you want.
arrival<-c("12:00","12:30","14:23","16:55","00:04","01:00","03:00")
departure<-c("13:00","16:00","17:38","23:30","02:00","07:00","11:00")
df <- data.frame(arrival=strptime(arrival, '%H:%M'),departure=strptime(departure, '%H:%M'))
hours_present <- do.call('c', apply(df, 1, function(x) seq(from=as.POSIXct(x[1], tz='UTC'),
to=as.POSIXct(x[2], tz='UTC'), by="hour")))
library(ggplot2)
qplot(hours_present, geom='bar') +
scale_x_datetime(date_breaks= "1 hour", date_labels = "%H:%M",
limits = as.POSIXct(c(strptime("0:00", "%H:%M"), strptime("23:00", "%H:%M")), tz='UTC')) +
scale_y_continuous(breaks=1:5) +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
Alternatively, pass geom = 'histogram' to qplot to draw a histogram instead of bars.
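For comparison, the per-band counts can also be tabulated in base R without building datetime sequences. This is a sketch under two assumptions that are not in the answer above: every stay lies within a single day, and a patient who leaves exactly on the hour is not counted in the band that starts at that moment. The helper to_minutes is hypothetical.

```r
arrival   <- c("12:00", "12:30", "14:23", "16:55", "00:04", "01:00", "03:00")
departure <- c("13:00", "16:00", "17:38", "23:30", "02:00", "07:00", "11:00")

# Hypothetical helper: "HH:MM" -> minutes after midnight
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

# First and last hour band (0-23) in which each patient is present
first_band <- to_minutes(arrival) %/% 60L
last_band  <- (to_minutes(departure) - 1L) %/% 60L

# One entry per patient per occupied band, then count per band
bands  <- unlist(Map(seq, first_band, last_band))
counts <- table(factor(bands, levels = 0:23))
```

With these data, the 12:00-12:59 band holds 2 patients, matching the example in the question; barplot(counts) gives a quick base-graphics view of the result.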
I have a data frame which contains time sequence, like this:
example <- data.frame(
Date=seq(
from=as.POSIXct("2012-1-1 0:00", tz="UTC"),
to=as.POSIXct("2012-1-31 23:00", tz="UTC"),
by="10 min"),
frequency=runif(4459, min=12, max=26))
I would like to compute the min value, mean, max value etc. (as in a summary table) by day: for example, a summary table for 2012-01-01 (using only the first 144 rows), 2012-01-02 (using rows 145 to 288), 2012-01-03 (using rows 289 to 432) etc.
How can I get this table? I have tried this:
summary(example$frequency, example$Date, by="day")
How can I draw a boxplot for every day separately? I have tried this:
boxplot(example$frequency, example$Date, by="day")
How can I select time data within days? I also want to calculate a summary table by day, but in this case using only the data on the hour (e.g. 0:00, 1:00, 2:00 etc.).
Can somebody help me?
To get summary of frequency by day, you could use aggregate from base R in combination with strftime():
# tz = "UTC" matches the data; strftime() otherwise formats in the local time zone
aggregate(frequency ~ strftime(Date, "%d", tz = "UTC"),
FUN = summary, data = example)
To get a boxplot per day, we just need to create a $day column for the x-axis in ggplot2.
library(ggplot2)
example$day <- strftime(example$Date, "%d", tz = "UTC")
ggplot(example, aes(x = factor(day), y = frequency)) + geom_boxplot()
Try this simply:
within days:
example$str.date <- substring(as.character(example$Date),1,10)
summary.example <- aggregate(frequency~str.date, example, FUN = summary)
library(ggplot2)
ggplot(example, aes(str.date, frequency, group=str.date, fill=str.date)) + geom_boxplot() +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
within hours (within each day):
example$str.date.hrs <- substring(as.character(example$Date),1,13)
summary.example <- aggregate(frequency~str.date.hrs, example, FUN = summary)
library(ggplot2)
ggplot(example[example$str.date=='2012-01-01',], aes(str.date.hrs, frequency, group=str.date.hrs, fill=str.date.hrs)) + geom_boxplot() +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
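The question also asks for a daily summary that uses only the readings taken on the hour (0:00, 1:00, ...). One way to do that, sketched here in base R, is to keep the rows whose minute part is "00" before aggregating; the on_hour and daily names are made up for illustration.

```r
set.seed(1)
example <- data.frame(
  Date = seq(from = as.POSIXct("2012-01-01 00:00", tz = "UTC"),
             to   = as.POSIXct("2012-01-31 23:00", tz = "UTC"),
             by   = "10 min")
)
example$frequency <- runif(nrow(example), min = 12, max = 26)

# Keep only the on-the-hour readings; format() respects the UTC time zone of Date
on_hour <- example[format(example$Date, "%M") == "00", ]

# Group by calendar day and summarise
on_hour$day <- format(on_hour$Date, "%Y-%m-%d")
daily <- aggregate(frequency ~ day, data = on_hour, FUN = summary)
```

This leaves 24 readings per day, and daily has one row per day with the summary statistics stored as a matrix column.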