I'm using ggplot to plot various events as a function of the date (x-axis) and start time (y-axis) on which they began. The data/code are as follows:
date<-c("2013-06-05","2013-06-05","2013-06-04","2013-06-04","2013-06-04","2013-06-04","2013-06-04",
"2013-06-04","2013-06-04","2013-06-03","2013-06-03","2013-06-03","2013-06-03","2013-06-03",
"2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02")
start <-c("07:36:00","01:30:00","22:19:00","22:12:00","20:16:00","19:19:00","09:00:00",
"06:45:00","01:03:00","22:15:00","19:05:00","08:59:00","08:01:00","07:08:00",
"23:24:00","20:39:00","18:53:00","16:57:00","15:07:00","14:33:00","13:24:00")
duration <-c(0.5,6.1,2.18,0.12,1.93,0.95,10.32,
2.25,5.7,2.78,3.17,9.03,0.95,0.88,
7.73,2.75,1.77,1.92,1.83,0.57,1.13)
event <-c("AF201","SS431","BE201","CD331","HG511","CD331","WQ115",
"CD331","SS431","WQ115","HG511","WQ115","CD331","AF201",
"SS431","WQ115","HG511","WQ115","CD331","AS335","CD331")
df<-data.frame(date,start,duration,event)
library(ggplot2)
library(scales)
p <- ggplot(df, aes(as.Date(date),as.POSIXct(start,format='%H:%M:%S'),color=event))
p <- p+geom_point(alpha = I(6/10),size=5)
p + ylab("time (hr)") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
scale_y_datetime(labels = date_format("%H"))+
scale_colour_hue(h=c(360, 90)) +
theme(axis.text.x = element_text(hjust=1, angle=0))
The resulting plot looks like this:
Question: Instead of simply indicating the start time of the event with a single point (shown above), how can I plot a bar that spans the time duration of the event? As shown in the data frame above I have this duration data (in hours). Alternatively, I could supply a 'stop time' (not shown).
I'm imagining the solution would look something like a stacked bar chart. However, a bar chart isn't quite right as it assumes the bar starts at the bottom of the plot and that the vertically stacked events have no gaps between them. My events may be non-contiguous -- 'starting' and 'stopping' at various positions along the y-axis. The solution will also have to take into consideration that 1) some events may ultimately be concurrent (overlap in time) and 2) some events will span multiple days.
I'd be very grateful for any suggestions!
It's a bit unclear exactly what you want. @Michele's answer seemed good; I wasn't sure whether you wanted to use geom_rect because it would make for thicker lines (if so, just change the line width), or whether there was another reason. I decided to give it a go using geom_rect to enable dodging. I've plotted it with the starting date on the x axis, and the start and end times on y. I've set up the data slightly differently to enable that. If you're after something different, try to make it explicit, but at least here's another option:
df<-data.frame(date,start,duration,event)
df <- transform(df,
start = as.POSIXct(paste(date, start)),
end = as.POSIXct(paste(date, start)) + duration*3600)
df <- df[c("event", "start", "end")]
df$date <- strptime(df$start, "%Y-%m-%d")
df$start.new <- format(df$start, format = "%H:%M:%S")
df$end.new <- format(df$end, format = "%H:%M:%S")
df$day <- as.numeric(factor(as.POSIXct(df$date)))  # day index 1..4
df$event.int <- as.numeric(factor(df$event))       # event index 1..7; works whether event is character or factor
p <- ggplot(df, aes(day, start)) + geom_rect(aes(ymin = start, ymax = end,
xmin = (day - 0.45) + event.int/10,
xmax = (day - 0.35) + event.int/10,
fill = event)) +
scale_x_discrete(limits = 1:4,breaks = 1:4, labels = sort(unique(date)),
name = "Start date") + ylab("Duration")
Thanks (+1s) to @Michele and @alexwhan for your input. Using geom_rect I was able to get all of the events that occur on the same date at the same point on the x axis. (I'm anticipating that this data set may ultimately include many months of events.)
df<-data.frame(date,start,duration,event)
library(ggplot2)
library(scales)  # for date_format()
p <- ggplot(df, aes(xmin=as.Date(date),xmax=as.Date(date)+1,
ymin=as.POSIXct(start,format='%H:%M:%S'),
ymax=as.POSIXct(start,format='%H:%M:%S')+duration*3600,
fill=event))
p <- p+geom_rect(alpha = I(8/10))
p + ylab("time") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
scale_y_datetime(labels = date_format("%H"))+
scale_colour_hue(h=c(360, 90)) +
theme(axis.text.x = element_text(hjust=1, angle=0))
... resulting in this:
This is pretty close to what I was aiming for.
I think I can deal with the potential overplotting issue by adjusting the alpha.
Ideally I'd like the y axis to include just a single day (00 to 00). To do this I guess I'll probably need to reformat the data such that events with durations that extend beyond midnight are reallocated to the next day. (Not sure how to do this in R.)
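One possible way to do that reallocation, as a minimal base R sketch: cut each event at every midnight it crosses, so that each piece lies within a single day. The helper name split_at_midnight is hypothetical (it is not from any package), and the sketch assumes the times are in UTC so that every day is exactly 24 hours long.

```r
# Hypothetical helper (not from the thread): split one event into per-day pieces.
split_at_midnight <- function(start, end) {
  day_start <- as.POSIXct(trunc(start, "days"))  # midnight at or before the start
  cuts <- seq(day_start, end, by = "day")         # every midnight up to the end
  cuts <- cuts[cuts > start & cuts < end]         # keep only midnights inside the event
  data.frame(start = c(start, cuts), end = c(cuts, end))
}

# Assumption: times are in UTC, so there are no daylight-saving jumps.
s <- as.POSIXct("2013-06-04 22:19:00", tz = "UTC")
pieces <- split_at_midnight(s, s + 2.18 * 3600)  # the BE201 event from the data above
pieces
```

Applying this to every row and binding the results would give one rectangle per event per day, which could then be drawn with the same geom_rect call.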
Try this method. It's probably different from what you planned, but I think it's quite a clear way to show your data:
df<-data.frame(date,start,duration,event)
df <- transform(df,
start = as.POSIXct(paste(date, start)),
end = as.POSIXct(paste(date, start)) + duration*3600)
df <- df[c("event", "start", "end")]
library(reshape2)
df <- melt(df, id.vars="event")
df$value <- as.POSIXct(df$value, origin=as.Date("1970-01-01"))
df <- df[order(df$event, df$value),]
df$eventID <- rep(seq(1, nrow(df)/2, 1), each=2)
library(ggplot2)
ggplot(df) +
geom_line(aes(value, event, group=eventID, color=event))
This approach combines the benefits of: (i) a y-axis containing a single ~24 hour period; (ii) events not overlapping; (iii) events labelled within the graph in addition to the legend; and (iv) concise code.
library(dplyr)
library(lubridate)
# Re-create data frame
df <- tibble(date, start, duration, event) %>%
mutate(start_dt = as.POSIXct(paste(date, start), tz = 'UTC'),
start_hr = hour(start_dt),
end_dt = start_dt + duration * 3600,
end_hr = hour(end_dt) + as.numeric(as.Date(end_dt) - as.Date(start_dt)) * 24)
# Plot
df %>% ggplot() +
geom_segment(aes(x = event, y = start_hr, xend = event, yend = end_hr,
color = event, size = 1)) +
facet_wrap(~ date, nrow = 1) +
guides(size = 'none')
Image of plot:
Using the R programming language, I create some time series data (daily measurements, over a period of 20 years). I aggregated this data at monthly time periods and then produced a graph:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
day <- format(as.Date(day), "%Y/%m/%d")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
ggplot(y.mon, aes(x = d, y=amount))+
geom_line(aes(group=1))
Right now, the x-axis is completely unreadable. Is there a way to "unclutter" the x-axis? Perhaps "slant" the dates or show the dates at intervals of 4 month periods? I can completely delete the x-axis but ideally I would like to keep it there for reference.
At the end of the graph, there is a huge downwards "spike". I think this is because the data is aggregated every month - and since the last day the data is available at is "Jan-01-2020", this causes the "downwards spike". Is it possible to "query" the "y.mon" object so that the graph is made only until the last "complete" time period? This "spike" is deceiving, someone might look at the graph and think a big anomaly happened in Jan-2020, but it's actually because there is only 1 measurement at this time.
Thanks
You can also try:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
#Data
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
#Aggregate
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
#Count days
y.mon2<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data,
FUN=function(x) length(x))
names(y.mon2)[2]<-'N'
#Format and merge to add N
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
mmon <- merge(y.mon,y.mon2)
#Add a dummy date
mmon$d <- as.Date(paste0(mmon$d,'/01'),'%Y/%m/%d')
#Plot
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '4 month',date_labels = '%Y-%m',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
Output:
Update: Using previous code and only changing for labels:
#Plot Update
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '12 month',date_labels = '%Y',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
Output:
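The N != 1 filter above relies on the incomplete month having exactly one observation. A slightly more general sketch, assuming one observation per calendar day as in the question, is to keep only the months whose observation count equals the length of that calendar month. The variable names below are illustrative, and the date range is shortened to keep the example small.

```r
# Assumption: one observation per calendar day, as in the question's data.
day <- seq(as.Date("2019-11-01"), as.Date("2020-01-01"), by = "day")
n_obs <- table(format(day, "%Y/%m"))                       # observations per month
first <- as.Date(paste0(names(n_obs), "/01"), "%Y/%m/%d")  # first day of each month
# Length of each calendar month: first day of the next month minus first day of this one
days_in_month <- sapply(seq_along(first), function(i)
  as.numeric(diff(seq(first[i], by = "month", length.out = 2))))
complete <- names(n_obs)[as.vector(n_obs) == days_in_month]
complete
```

The resulting month labels could then be used to subset the aggregated data before plotting.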
I need to visualize and compare the difference in two equally long sales periods. 2018/2019 and 2019/2020. Both periods begin at week 44 and end at week 36 of the following year. If I create a graph, both periods are continuous and line up. If I use only the week number, the values are sorted as continuum and the graph does not make sense. Can you think of a solution?
Thank You
Data:
set.seed(1)
df1 <- data.frame(sells = runif(44),
week = c(44:52,1:35),
YW = yearweek(seq(as.Date("2018-11-01"), as.Date("2019-08-31"), by = "1 week")),
period = "18/19")
df2 <- data.frame(sells = runif(44),
week = c(44:52,1:35),
YW = yearweek(seq(as.Date("2019-11-01"), as.Date("2020-08-31"), by = "1 week")),
period = "19/20")
# Yearweek on x axis, when both periods are separated
ggplot(df1, aes(YW, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
labs(color="Legend text")
# week on x axis, when weeks form a continuum and are not split by year
ggplot(df1, aes(week, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
labs(color="Legend text")
Another alternative is to facet it. This'll require combining the two sets into one, preserving the data source. (This is commonly a better way of dealing with it in general, anyway.)
(I don't have tsibble, so my YW just has seq(...), no yearweek. It should translate.)
ggplot(dplyr::bind_rows(tibble::lst(df1, df2), .id = "id"), aes(YW, sells)) +
geom_line(aes(color = id)) +
facet_wrap(id ~ ., scales = "free_x", ncol = 1)
In place of dplyr::bind_rows, one might also use data.table::rbindlist(..., idcol="id"), or do.call(rbind, ...), though with the latter you will need to assign id externally.
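For the do.call(rbind, ...) route, a minimal base R sketch of assigning id externally might look like the following. The two small data frames are stand-ins for df1 and df2, with made-up values.

```r
# Stand-in data (made-up values), shaped like df1 and df2 in the question
df1 <- data.frame(week = c(44, 45), sells = c(0.2, 0.5))
df2 <- data.frame(week = c(44, 45), sells = c(0.3, 0.1))

dfs <- list(df1 = df1, df2 = df2)
# Prepend each list element's name as an id column, then stack the pieces
combined <- do.call(rbind, Map(function(d, nm) cbind(id = nm, d), dfs, names(dfs)))
rownames(combined) <- NULL  # drop the "df1.1"-style row names that rbind creates
combined
```

The combined data frame can then be passed to ggplot in place of the dplyr::bind_rows(...) call.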
One more note: the default formatting of the x-axis is obscuring the "year" of the data. If this is relevant/important (and not apparent elsewhere), then use ggplot2's normal mechanism for forcing labels, e.g.,
... +
scale_x_date(labels = function(z) format(z, "%Y-%m"))
If you don't have tibble::lst available (unlikely, if you have dplyr), you can replace it with list(df1=df1, df2=df2) or similar.
If you want to keep the x axis as a numeric scale, you can do:
ggplot(df1, aes((week + 9) %% 52, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
scale_x_continuous(breaks = 1:52,
labels = function(x) ifelse(x == 9, 52, (x - 9) %% 52),
name = "week") +
labs(color="Legend text")
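To see why the shift works, the mapping can be checked on its own: (week + 9) %% 52 moves week 44 to position 1 and week 35 to position 44, and the labels function inverts it. This is just a sanity check of the arithmetic above; pos and back are illustrative names.

```r
week <- c(44:52, 1:35)                    # weeks in the order they occur in each period
pos  <- (week + 9) %% 52                  # plotting position: 44 -> 1, 52 -> 9, 1 -> 10, 35 -> 44
back <- ifelse(pos == 9, 52, (pos - 9) %% 52)  # the labels function, mapping positions back to weeks

all(pos == 1:44)   # positions are consecutive
all(back == week)  # labels recover the original week numbers
```

The special case for position 9 is needed because (9 - 9) %% 52 would give 0 rather than 52.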
Try this. You can format your week variable as a factor and keep the desired order. Here is the code:
library(ggplot2)
library(tsibble)
#Data
df1$week <- factor(df1$week,levels = unique(df1$week),ordered = T)
df2$week <- factor(df2$week,levels = unique(df2$week),ordered = T)
#Plot
ggplot(df1, aes(week, sells)) +
geom_line(aes(color="Period 18/19",group=1)) +
geom_line(data=df2, aes(color="Period 19/20",group=1)) +
labs(color="Legend text")
Output:
I need help with an R graphics issue in ggplot2.
Let's take an example:
date <- c("oct", "dec")
min.national <- c(17, 20)
min.international <- c(11, 12)
min.roaming <- c(5, 7)
mb.national <- c(115, 150)
mb.international <- c(72, 75)
mb.roaming <- c(30, 40)
df <- data.frame(min.national, min.international, min.roaming, mb.national, mb.international, mb.roaming)
What I want is two graphs side by side, one for the minutes and one for the megabytes, each with bars for the three variables (for example, minutes in national, international and roaming) on the same graph, with fill = date.
Is that clear?
Thanks
I appreciate there may be a language challenge here, and it sounds like you're just getting started with ggplot2 so not sure how to get started on this, so I hope you find this useful.
It makes sense to treat the minutes and mb separately; they're different units. So I'll just use the minutes as an example. What I understand you're trying to achieve is easy with the right approach and the tidyr library.
library(tidyr)
library(ggplot2)
#first get your data in a data frame
min.df <- data.frame(national = min.national, international = min.international, roaming = min.roaming, month = date)
#now use the tidyr function to create a long data frame, you should recognize that this gives you a data structure readily suited to what you want to plot
min.df.long <- gather(min.df, "region", "minutes", 1:3)
ggplot(min.df.long) + geom_bar(aes(x = region, y = minutes, fill = month), stat = "identity")
If you want the months side by side, as I understand your question, then you could do:
ggplot(min.df.long) + geom_bar(aes(x = region, y = minutes, fill = factor(month, levels = c("oct", "dec"))), position = "dodge", stat = "identity") + labs(fill = "month")
The key parameter is the position keyword, the rest is just to make it neater.
library(reshape2)  # for melt()
library(ggplot2)
df <- data.frame(date, min.national, min.international, min.roaming, mb.national, mb.international, mb.roaming)
df.stk <- tidyr::separate(melt(df), col="variable", into=c("min_byte", "type"), sep="\\.")
plt <- ggplot(df.stk, aes(type, value, fill = date)) +
geom_bar(stat = "identity") +
facet_grid(.~min_byte)
print(plt)
I want the x-axis in the following graph to start at 06:00 and end at 22:00, with breaks at every 4 hours. I can't figure out the following, however.
a) How to make the x-axis start at 06:00 without any empty space before 06:00.
b) How to make the x-axis end at 22:00 without any empty space after 22:00. Right now it doesn't even show 22:00
c) How to have breaks at every 4 hours.
d) How to assign a label to the y-axis (currently it's simply X4, the column name).
I've tried several things, but without success. Some example data:
range <- seq(as.POSIXct("2015/4/18 06:00"),as.POSIXct("2015/4/18 22:00"),"mins")
df <- data.frame(matrix(nrow=length(range),ncol=4))
df[,1] <- c(1:length(range))
df[,2] <- 2*c(1:length(range))
df[,3] <- 3*c(1:length(range))
df[,4] <- range
Reshape:
library(reshape2)
df2 <- melt(df,id="X4")
Graph:
library(ggplot2)
ggplot(data=df2,aes(x=X4,y=value,color=variable)) + geom_line()+
scale_y_continuous(expand=c(0,0)) +
coord_cartesian(xlim=c(as.POSIXct("2015/4/18 06:00:00"),as.POSIXct("2015/4/18 22:00:00")))
Which makes the graph look like this:
Any ideas?
Here is some code that should help you. This can easily be done using scale_x_datetime.
## desired start and end points
st <- as.POSIXct("2015/4/18 06:00:00")
nd <- as.POSIXct("2015/4/18 22:00:00")
## display data for given time range
ggplot(data = df2, aes(x = X4, y = value, color = variable)) +
geom_line() +
scale_y_continuous("Some name", expand = c(0, 0)) +
scale_x_datetime("Some name", expand = c(0, 0), limits = c(st, nd),
breaks = seq(st, nd, "4 hours"),
labels = strftime(seq(st, nd, "4 hours"), "%H:%M"))
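The break positions and their labels can be inspected outside of ggplot, which is a quick way to confirm that 22:00 is included. This sketch fixes the time zone to UTC as an assumption so that the result does not depend on the local setting; brks is an illustrative name.

```r
# Assumption: UTC, so the example does not depend on the local time zone
st <- as.POSIXct("2015-04-18 06:00:00", tz = "UTC")
nd <- as.POSIXct("2015-04-18 22:00:00", tz = "UTC")

brks <- seq(st, nd, by = "4 hours")  # one break every 4 hours, both endpoints included
format(brks, "%H:%M")                # hour:minute labels for the axis
```

Because 06:00 to 22:00 spans exactly 16 hours, the sequence ends exactly on the upper limit.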
I've been trying to add appropriate dates on the x-axis of my graph, but can't figure out how to do it in a sane way. What I want is pretty simple: a date at every January 1st in between the minimum and maximum of my data set.
I don't want to include the month - just '2008' or '2009' or whatever is fine. A great example would be this graph:
example graph
Except I want the date on every year, rather than every other year.
I can't seem to figure this out. My dates are defined as days since 1/1/1970, and I've included a method dateEPOCH_formatter which converts the epoch format to a format using the chron package. I've figured out how to make a tick mark and date at the origin of the graph and every 365 days thereafter, but that's not quite the same thing.
Another minor problem is that, mysteriously, the line chron(floor(y), out.format="mon year",origin.=epoch) outputs a graph with axis markers like 'Mar 2008', but changing the line to chron(floor(y), out.format="year",origin.=epoch) doesn't give me a result like '2008' - it just results in the error:
Error in parse.format(format[1]) : unrecognized format year
Calls: print ... as.character.times -> format -> format.dates -> parse.format
Execution halted
Here's my code - thanks for the help.
library(ggplot2)
library(chron)
argv <- commandArgs(trailingOnly = TRUE)
mydata = read.csv(argv[1])
png(argv[2], height=300, width=470)
timeHMS_formatter <- function(x) { # Takes time in seconds from midnight, converts to HH:MM:SS
h <- floor(x/3600) # Whole hours
m <- floor((x %% 3600)/60) # Whole minutes within the hour
s <- round(x %% 60) # Remaining seconds, rounded to the nearest second
lab <- sprintf('%02d:%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
lab <- gsub('^00:', '', lab) # Remove leading 00: if present
lab <- gsub('^0', '', lab) # Remove leading 0 if present
lab # Return the labels
}
dateEPOCH_formatter <- function (y){
epoch <- c(month=1,day=1,year=1970)
chron(floor(y), out.format="mon year",origin.=epoch)
}
p= ggplot() +
coord_cartesian(xlim=c(min(mydata$day),max(mydata$day)), ylim=c(0,86400)) + # displays data from first email through present
scale_color_hue() +
xlab("Date") +
ylab("Time of Day") +
scale_y_continuous(label=timeHMS_formatter, breaks=seq(0, 86400, 14400)) + # adds tick marks every 4 hours
scale_x_continuous(label=dateEPOCH_formatter, breaks=seq(min(mydata$day), max(mydata$day), 365) ) +
ggtitle("Email Sending Times") + # adds graph title
theme_bw() +
theme( legend.position = "none", axis.title.x = element_text(vjust=-0.3)) + # must come after theme_bw(), which would otherwise reset these settings
layer(
data=mydata,
mapping=aes(x=mydata$day, y=mydata$seconds),
stat="identity",
stat_params=list(),
geom="point",
geom_params=list(alpha=5/8, size=2, color="#A9203E"),
position=position_identity(),
)
print(p)
dev.off()
I think it will be much easier to use the built-in function scale_x_date with date_format and date_breaks from the scales package. These should work with most date classes in R, such as Date, chron, etc.
For example:
library(ggplot2)
library(chron)
library(scales)
# some example data
days <- seq(as.Date('01-01-2000', format = '%d-%m-%Y'),
as.Date('01-01-2010', format = '%d-%m-%Y'), by = 1)
days_chron <- as.chron(days)
mydata <- data.frame(day = days_chron, y = rnorm(length(days)))
# the plot
ggplot(mydata, aes(x=days, y= y)) + geom_point() +
scale_x_date(breaks = date_breaks('year'), labels = date_format('%Y'))
To show how intuitive and easy these functions are: if you wanted month-year labels every 6 months (note that this requires a very wide plot or very small axis labels):
ggplot(mydata, aes(x=days, y= y)) + geom_point() +
scale_x_date(breaks = date_breaks('6 months'), labels = date_format('%b-%Y'))
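Since the day column in the question is stored as days since 1970-01-01, it would need converting to Date before scale_x_date can be used. A minimal base R sketch of that conversion, and of finding every January 1st inside the data range, is below. The day numbers are made up and stand in for mydata$day.

```r
# Hypothetical day numbers (days since 1970-01-01), standing in for mydata$day
day_num <- c(14000, 14500, 14700)
day <- as.Date(day_num, origin = "1970-01-01")

# Every January 1st between the first and last observation
jan1 <- seq(as.Date(format(min(day), "%Y-01-01")), max(day), by = "year")
jan1 <- jan1[jan1 >= min(day)]  # drop the January 1st that precedes the data
format(jan1, "%Y")              # year-only labels
```

A vector like jan1 could also be passed directly as the breaks argument of scale_x_date if explicit control over the tick positions is wanted.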