Producing a histogram - occurrence of events across the hours - r

I have a set of data showing patient arrival and departure times at a hospital:
arrival<-c("12:00","12:30","14:23","16:55","00:04","01:00","03:00")
departure<-c("13:00","16:00","17:38","00:30","02:00","07:00","23:00")
I want to produce a histogram counting the number of patients in the hospital during each time band (00:00-01:00, 01:00-02:00, etc.).
For example, between 12:00 and 12:59 there are 2 patients.

You can try the following. I changed the example data slightly so that each departure time is later than its arrival time; ideally the arrival and departure values would contain both date and time. In the resulting figure, the time label 10:00 represents the band 10:00-10:59; you can change the labels if you want.
arrival<-c("12:00","12:30","14:23","16:55","00:04","01:00","03:00")
departure<-c("13:00","16:00","17:38","23:30","02:00","07:00","11:00")
df <- data.frame(arrival=strptime(arrival, '%H:%M'),departure=strptime(departure, '%H:%M'))
hours_present <- do.call('c', apply(df, 1, function(x) seq(from=as.POSIXct(x[1], tz='UTC'),
to=as.POSIXct(x[2], tz='UTC'), by="hour")))
library(ggplot2)
qplot(hours_present, geom='bar') +
scale_x_datetime(date_breaks= "1 hour", date_labels = "%H:%M",
limits = as.POSIXct(c(strptime("0:00", "%H:%M"), strptime("23:00", "%H:%M")), tz='UTC')) +
scale_y_continuous(breaks=1:5) +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
You can use 'histogram' instead of 'bar' as the geom in qplot to draw a histogram instead.
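As a cross-check on the plot, the per-band counts can also be computed directly. The following is a minimal base R sketch, assuming same-day stays as in the modified data above; the helper `to_minutes` and the overlap rule (a patient counts in every band the stay overlaps) are illustrative assumptions, and this rule differs slightly from the hourly `seq` steps used above.

```r
arrival   <- c("12:00","12:30","14:23","16:55","00:04","01:00","03:00")
departure <- c("13:00","16:00","17:38","23:30","02:00","07:00","11:00")

# Hypothetical helper: convert "HH:MM" strings to minutes after midnight
to_minutes <- function(hhmm) {
  parts <- strsplit(hhmm, ":", fixed = TRUE)
  sapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]))
}
arr <- to_minutes(arrival)
dep <- to_minutes(departure)

# A patient is counted in band h (h:00 to h:59) if the stay overlaps that band
hours  <- 0:23
counts <- sapply(hours, function(h) sum(arr < (h + 1) * 60 & dep > h * 60))
names(counts) <- sprintf("%02d:00", hours)
counts
```

The named vector can be passed to barplot(counts, las = 2) for a quick base graphics version of the chart.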

Related

How to properly plot a histogram with dates using ggplot?

I would like to create an interactive histogram with dates on the x-axis.
I have used ggplot+ggplotly.
I have read that I need to pass the proper information using the "text=as.character(mydates)" option and sometimes "tooltip=mytext".
This trick works for other kinds of plots, but there is a problem with histograms: instead of getting a single bar with a single value, I get many sub-bars stacked.
I guess the reason is that passing "text=as.character(fechas)" produces many values instead of just the single class value defining that bar.
How can I solve this problem?
I have tried filtering the data myself, but I don't know how to make my parameters match the ones used by the histogram, such as where the dates start for each bar.
library(lubridate)
library(ggplot2)
library(plotly) # ggplotly() is provided by the plotly package
Ejemplo <- data.frame(fechas = dmy("1-1-20")+sample(1:100,100, replace=T),
valores=runif(100))
dibujo <- ggplot(Ejemplo, aes(x=fechas, text=as.character(fechas))) +
theme_bw() + geom_histogram(binwidth=7, fill="darkblue",color="black") +
labs(x="Fecha", y="Nº casos") +
theme(axis.text.x=element_text(angle=60, hjust=1)) +
scale_x_date(date_breaks = "weeks", date_labels = "%d-%m-%Y",
limits=c(dmy("1-1-20"), dmy("1-4-20")))
ggplotly(dibujo)
ggplotly(dibujo, tooltip = "text")
As you can see, the bars are not regular histogram bars but something more complex.
Using just ggplot instead of ggplotly shows the same problem, though then you wouldn't need the extra "text" parameter.
Presently, feeding as.character(fechas) to the text = ... argument inside of aes() will display the relative counts of distinct dates within each bin. Note the height of the first bar is simply a count of the total number of dates between 6th of January and the 13th of January.
After a thorough reading of your question, it appears you want the maximum date within each weekly interval. In other words, one date should hover over each bar. If you're partial to converting ggplot objects into plotly objects, then I would advise pre-processing the data frame before feeding it to the ggplot() function. First, group by week. Second, pull the desired date by each weekly interval to show as text (i.e., end date). Next, feed this new data frame to ggplot(), but now layer on geom_col(). This will achieve similar output since you're grouping by weekly intervals.
library(dplyr)
library(lubridate)
library(ggplot2)
library(plotly)
set.seed(13)
Ejemplo <- data.frame(fechas = dmy("1-1-20") + sample(1:100, 100, replace = T),
valores = runif(100))
Ejemplo_stat <- Ejemplo %>%
arrange(fechas) %>%
filter(fechas >= ymd("2020-01-01"), fechas <= ymd("2020-04-01")) %>% # specify the limits manually
mutate(week = week(fechas)) %>% # create a week variable
group_by(week) %>% # group by week
summarize(total_days = n(), # number of observations (rows) in each week
last_date = max(fechas)) # pull the maximum date within each weekly interval
dibujo <- ggplot(Ejemplo_stat, aes(x = factor(week), y = total_days, text = as.character(last_date))) +
geom_col(fill = "darkblue", color = "black") +
labs(x = "Fecha", y = "Nº casos") +
theme_bw() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
scale_x_discrete(label = function(x) paste("Week", x))
ggplotly(dibujo) # add more text (e.g., week id, total unique dates, and end date)
ggplotly(dibujo, tooltip = "text") # only the end date is revealed
The "end date" is displayed once you hover over each bar, as requested. Note, the value "2020-01-12" is not the last day of the second week. It is the last date observed in the second weekly interval.
The benefit of the preprocessing approach is your ability to modify your grouped data frame, as needed. For example, feel free to limit the date range to a smaller (or larger) subset of weeks, or start your weeks on a different day of the week (e.g., Sunday). Furthermore, if you want more textual options to display, you could also display your total number of unique dates next to each bar, or even display the date ranges for each week.
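If the weekly bins should start on a fixed weekday, note that lubridate's week() counts seven-day periods from January 1 rather than calendar weeks. A minimal base R sketch of calendar-week grouping follows, assuming Monday as the first day of the week; the names `weekly`, `week_start` and `week_end` are illustrative, not part of the answer above.

```r
set.seed(13)
fechas <- as.Date("2020-01-01") + sample(1:100, 100, replace = TRUE)

# cut() on Date objects labels each bin by its first day;
# weeks start on Monday by default (start.on.monday = TRUE)
week_start <- cut(fechas, breaks = "week")

# One row per calendar week, including weeks with zero observations
weekly <- as.data.frame(table(week_start), responseName = "n")
weekly$week_end <- as.Date(as.character(weekly$week_start)) + 6
head(weekly)
```

Passing start.on.monday = FALSE to cut() starts the weeks on Sunday instead. The resulting data frame can be fed to geom_col() in the same way as Ejemplo_stat, with week_start or week_end used as the tooltip text.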

How to plot time from seconds to HH/MM

I have data such that
TIME<- c(36655,14330,23344,9992,...)
which represents seconds after midnight. I want to produce a histogram using ggplot2 so that I can see the distribution of the data, but with the time of day on the x-axis rather than seconds.
So far I have:
ggplot(data=df1, aes(TIME)) + geom_histogram(col="red", fill="green", alpha = .2,bins=9)+
labs(title="Histogram for call time") +labs(x="Time", y="Count")
But this just gives the seconds. However, if I then convert it by:
TIME <- as.POSIXct(strptime(TIME, format="%R"))
this includes today's date, which I do NOT want. I just want times, i.e. the axis split into c(0:00, 9:00, 12:00, 18:00, 24:00). Is this possible?
As per comments, this works:
TIME <- as.POSIXct(strptime(TIME, format="%R"))
ggplot(data=df1, aes(TIME)) + geom_histogram(col="red", fill="green",
alpha = .2,bins=9)+ scale_x_datetime(date_labels = "%R")
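Note that strptime with format "%R" expects text such as "10:10". If TIME holds numeric seconds after midnight, as described in the question, one alternative is to treat the seconds as an offset from an arbitrary midnight in UTC. This is a minimal sketch under that assumption, using only the four sample values shown in the question; the names `time_of_day` and `labels` are illustrative.

```r
# Seconds after midnight (the sample values visible in the question)
TIME <- c(36655, 14330, 23344, 9992)

# Interpret the seconds as an offset from an arbitrary UTC midnight;
# the date part is a placeholder and is never displayed
time_of_day <- as.POSIXct(TIME, origin = "1970-01-01", tz = "UTC")

# Time-of-day labels without the date
labels <- format(time_of_day, "%H:%M")
labels
```

In ggplot2, time_of_day can then be mapped to x, with scale_x_datetime(date_labels = "%R", timezone = "UTC") so that the axis labels are formatted in the same time zone.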

Show weekdays and times on the x-axis in ggplot

I have a dataset with events. These events have a start time and a duration. I want to create a scatter plot with the start time on the x-axis and the duration on the y-axis, but I want to alter the x-axis so that it displays the course of a week. That is, I want the x-axis to start on Monday 00:00 and run through Sunday 23:59.
All the solutions I've found online show me how to perform group-by-and-sum over weekdays, which is not what I want to do. I want to plot all data points individually, I simply want to reduce the date-axis to weekday and time.
Any suggestions?
This does what you need. It creates a new variable that maps every observation into a single week, and then generates a scatter plot in the required format.
library(lubridate)
library(dplyr)
set.seed(1)
tmp <- data.frame(st_time = mdy("01-01-2018") + minutes(sample(1e5, size = 100)))
tmp <- tmp %>%
mutate(st_week = floor_date(st_time, unit = 'week')) %>% # calculate the start of week
mutate(st_time_inweek = st_time - st_week) %>% # calculate the time elapsed from the start of the week
mutate(st_time_all_in_oneweek = st_week[1] + st_time_inweek) %>% # put every obs in one week
mutate(duration = runif(100, 0, 100)) # generate a random duration variable
This is how to generate the plot. The part "%a %H:%M:%S" could be just "%a" as the time portion is not informative.
library(ggplot2)
ggplot(tmp) + aes(x = st_time_all_in_oneweek, y = duration) +
geom_point() + scale_x_datetime(date_labels = "%a %H:%M:%S", date_breaks = "1 day")
With "%a" the plot looks like this:
Maybe late, but for others searching: there is also a solution using
scale_x_date(date_labels = '%a')
described in "Weekdays below date on x-axis in ggplot2".
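The question asks for an axis running from Monday 00:00 to Sunday 23:59, whereas floor_date(unit = 'week') starts weeks on Sunday unless its week_start argument is set. The following base R sketch maps timestamps into one reference week starting on a Monday; the two sample timestamps and the names `ref_monday` and `in_one_week` are assumptions made for illustration.

```r
# Two sample start times (hypothetical data): a Wednesday and a Sunday
st_time <- as.POSIXct(c("2018-01-03 14:30:00", "2018-01-14 23:59:00"), tz = "UTC")
lt <- as.POSIXlt(st_time)

# POSIXlt stores wday as 0 for Sunday; shift so that Monday becomes 0
days_since_monday <- (lt$wday + 6) %% 7
secs_in_week <- days_since_monday * 86400 + lt$hour * 3600 + lt$min * 60 + lt$sec

# Place every observation in one reference week (2018-01-01 was a Monday)
ref_monday  <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
in_one_week <- ref_monday + secs_in_week
format(in_one_week, "%a %H:%M")
```

The in_one_week column plays the same role as st_time_all_in_oneweek in the answer above and can be used as the x aesthetic with scale_x_datetime(date_labels = "%a", date_breaks = "1 day").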

ggplot2: adjusting the number of points on a line graph

I would like to lower the number of points on the lines on my plot.
For example,
date <- c("2017-04-15","2017-04-16","2017-04-17","2017-04-18","2017-04-19","2017-04-20","2017-04-21")
x <- c(1,3,3,4,3,5,2)
df <- data.frame(date,x)
Rather than having a point located at every vertex, I would like one located at every other vertex. The first, third, fifth and seventh vertices would have points while the others would not.
ggplot(df, aes(date,x,group=1)) +
geom_line(size=.4) +
geom_point(size=.7)
This seems simple enough, but I have been unable to find any information on how to do it.
You can use scale_x_date to scale your x axis dates
date <- c("2017-04-15","2017-04-16","2017-04-17","2017-04-18","2017-04-19","2017-04-20","2017-04-21")
x <- c(1,2,3,4,3,5,2)
#Convert date to DATE format using as.Date()
df <- data.frame(date = as.Date(date),x)
ggplot(df, aes(date,x,group=1)) +
geom_line(size=.4) +
geom_point(size=.7) +
scale_x_date(date_breaks = "2 day", date_labels = "%d-%b") # use scale_x_date to change the break spacing and label format
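Changing the axis breaks does not by itself remove any points. To draw points only at every other vertex, one option is to give the point layer a subset of the rows while the line layer keeps all of them. This is a minimal sketch using the question's data; `df_points` is an illustrative name, and the plotting step is skipped if ggplot2 is not installed.

```r
date <- c("2017-04-15","2017-04-16","2017-04-17","2017-04-18",
          "2017-04-19","2017-04-20","2017-04-21")
x  <- c(1, 3, 3, 4, 3, 5, 2)
df <- data.frame(date = as.Date(date), x = x)

# Keep rows 1, 3, 5 and 7 for the point layer; the line still uses every row
df_points <- df[seq(1, nrow(df), by = 2), ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(date, x, group = 1)) +
    geom_line() +
    geom_point(data = df_points, size = 2)
  # print(p) draws the plot
}
```

Changing the by argument of seq() adjusts how many vertices are skipped between points.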

How can I get a summary table/boxplot from a time-sequence data frame?

I have a data frame which contains time sequence, like this:
example <- data.frame(
Date=seq(
from=as.POSIXct("2012-1-1 0:00", tz="UTC"),
to=as.POSIXct("2012-1-31 23:00", tz="UTC"),
by="10 min"),
frequency=runif(4459, min=12, max=26))
I would like to compute the minimum, mean, maximum, etc. (a summary table) by day: for example, a summary for 2012-01-01 (using only the first 144 rows), for 2012-01-02 (rows 145 to 288), for 2012-01-03 (rows 289 to 432), and so on.
How can I get this table? I have tried this:
summary(example$frequency, example$Date, by="day")
How can I draw a boxplot for every day separately? I have tried this:
boxplot(example$frequency, example$Date, by="day")
How can I select time data within days? I also want to calculate the summary table by day, but in this case using only the observations on the hour (e.g. 0:00, 1:00, 2:00 etc.).
Can somebody help me?
To get summary of frequency by day, you could use aggregate from base R in combination with strftime():
aggregate(frequency ~ strftime(Date, "%d"),
FUN = summary, data = example)
To get a boxplot per day, we just need to create a $day column for the x-axis in ggplot2.
library(ggplot2)
example$day <- strftime(example$Date, "%d")
ggplot(example, aes(x = factor(day), y = frequency)) + geom_boxplot()
You can also simply try this.
Within days:
example$str.date <- substring(as.character(example$Date),1,10)
summary.example <- aggregate(frequency~str.date, example, FUN = summary)
library(ggplot2)
ggplot(example, aes(str.date, frequency, group=str.date, fill=str.date)) + geom_boxplot() +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
within hours (within each day):
example$str.date.hrs <- substring(as.character(example$Date),1,13)
summary.example <- aggregate(frequency~str.date.hrs, example, FUN = summary)
library(ggplot2)
ggplot(example[example$str.date=='2012-01-01',], aes(str.date.hrs, frequency, group=str.date.hrs, fill=str.date.hrs)) + geom_boxplot() +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
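The last part of the question (a daily summary using only the observations on the hour) can be read as a filter rather than a grouping by hour. Under that reading, a minimal base R sketch is shown below; the names `on_the_hour` and `daily_hourly` are illustrative, and the time zone is passed explicitly so that days are split in UTC rather than in the local time zone.

```r
set.seed(1)
example <- data.frame(
  Date = seq(from = as.POSIXct("2012-01-01 00:00", tz = "UTC"),
             to   = as.POSIXct("2012-01-31 23:00", tz = "UTC"),
             by   = "10 min"))
example$frequency <- runif(nrow(example), min = 12, max = 26)

# Day label computed in UTC
example$day <- format(example$Date, "%Y-%m-%d", tz = "UTC")

# Keep only the rows that fall exactly on the hour (minute part equal to 00)
on_the_hour <- example[format(example$Date, "%M", tz = "UTC") == "00", ]

# One summary() per day, arranged as a matrix with one row per day
daily_hourly <- t(sapply(split(on_the_hour$frequency, on_the_hour$day), summary))
head(daily_hourly)
```

The same on_the_hour data frame can be passed to the boxplot code above in place of example.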

Resources