How to unclutter the x-axis in a plot - r

Using the R programming language, I created some time series data (daily measurements over a period of 20 years). I aggregated this data into monthly periods and then produced a graph:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
day <- format(as.Date(day), "%Y/%m/%d")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
ggplot(y.mon, aes(x = d, y=amount))+
geom_line(aes(group=1))
Right now, the x-axis is completely unreadable. Is there a way to "unclutter" the x-axis? Perhaps "slant" the dates or show the dates at intervals of 4 month periods? I can completely delete the x-axis but ideally I would like to keep it there for reference.
At the end of the graph, there is a huge downwards "spike". I think this is because the data is aggregated every month - and since the last day the data is available at is "Jan-01-2020", this causes the "downwards spike". Is it possible to "query" the "y.mon" object so that the graph is made only until the last "complete" time period? This "spike" is deceiving, someone might look at the graph and think a big anomaly happened in Jan-2020, but it's actually because there is only 1 measurement at this time.
Thanks

You can also try:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
#Data
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
#Aggregate
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
#Count days
y.mon2<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data,
FUN=function(x) length(x))
names(y.mon2)[2]<-'N'
#Format and merge to add N
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
mmon <- merge(y.mon,y.mon2)
#Add a dummy date
mmon$d <- as.Date(paste0(mmon$d,'/01'),'%Y/%m/%d')
#Plot
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '4 month',date_labels = '%Y-%m',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
Output:
Update: Using previous code and only changing for labels:
#Plot Update
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '12 month',date_labels = '%Y',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
Output:
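The `N != 1` filter works here because the only partial month happens to contain a single day. A more general check, sketched below under the assumption that a month counts as "complete" when it has one observation for every calendar day, compares the observed day count with the month's length. The column names `month`, `n_obs`, and `days_in_month` are made up for this illustration.

```r
library(ggplot2)

set.seed(123)
day <- seq(as.Date("2000-01-01"), as.Date("2020-01-01"), by = "day")
data <- data.frame(day = day, amount = rnorm(length(day), 100, 10))

# First day of each observation's month, kept as a real Date
data$month <- as.Date(format(data$day, "%Y-%m-01"))

# Monthly totals and the number of observed days in each month
monthly <- aggregate(amount ~ month, data = data, FUN = sum)
monthly$n_obs <- aggregate(amount ~ month, data = data, FUN = length)$amount

# Calendar length of each month: first day + 31 always lands in the next month
next_month <- as.Date(format(monthly$month + 31, "%Y-%m-01"))
monthly$days_in_month <- as.integer(next_month - monthly$month)

# Keep only months with an observation for every calendar day
complete <- monthly[monthly$n_obs == monthly$days_in_month, ]

p <- ggplot(complete, aes(x = month, y = amount)) +
  geom_line() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```

Because `month` is a real Date rather than a formatted string, `scale_x_date()` can place the breaks without the dummy-date step.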

Related

Is there a way to adjust the order of time of day on an axis in ggplot2?

I am running an analysis looking at the diel and seasonal activity patterns of bats. I'm hoping to plot time on the y-axis centered around midnight but am not sure how. Right now the y-axis runs 0-24 hours. I am using ggplot2. The code is below along with a picture of the plot.
library(ggplot2)
library(ggExtra) #for ggMarginal()
p <- ggplot(LBB, aes(x = RecordingDate, y = Time)) +
geom_point(size = .5) +
theme(legend.position="none")
p2 <- ggMarginal(p, type = "density")
I do not know of a way to center Time only data on midnight in a ggplot, but as a fairly simple workaround you can convert your times to datetimes, with all times before midnight coded as one day (you can choose any arbitrary day, in the code here I used today), and all times after midnight coded as the next day. You can then make a ggplot with a datetime axis centered on the midnight between those two days, and do some axis label formatting to make it look like a time only axis.
library(tidyverse)
library(hms) #package for handling time data
#Generate date and time data
LBB <- data.frame(RecordingDate =
rep(seq.Date(as.Date("2021-04-01"), as.Date("2021-04-20"),
"days"), 25),
Time = hms(sample(0:59, replace = TRUE, 500),
sample(0:59, replace = TRUE, 500),
sample(c(0:5, 20:23), replace = TRUE, 500)))
# Make a graphing.datetime column that assigns all times before midnight to the same
# date, and all times after midnight to the next day (you can choose any day)
LBB <- LBB %>%
mutate(graphing.date = if_else(
Time > parse_hms("12:00:00"), as.Date("2021-10-19"), #times 12 to 24 are this date
as.Date("2021-10-20")), #other times (0-12) are this date
#combine date and time to get a graphing datetime spread over two days
#(now you have data that is centered on midnight)
graphing.datetime = as.POSIXct(paste(graphing.date, Time)))
#Graph, but use graphing.datetime, not Time as y variable
p <- ggplot(LBB, aes(x = RecordingDate, y = graphing.datetime)) +
geom_point() +
#format the y axis so it looks like Time even though it is datetime
scale_y_datetime("Time", #change axis title back to time
#Set limits to show a full day
limits = c(as.POSIXct("2021-10-19 12:00:00"),
as.POSIXct("2021-10-20 12:00:00")),
expand = c(0,0), #turn off axis limit expansion
date_breaks = "4 hours", #set custom axis breaks
date_labels = "%H:%M") #Hours:minutes only for labels
p
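A different way to get the same effect, shown here only as a sketch and not as part of the answer above, is to work with decimal hours, shift the evening hours to negative values so that 0 is midnight, and relabel the axis. The data and the names `hour_of_day`, `centered`, and `label_clock` are hypothetical.

```r
library(ggplot2)

set.seed(42)
# Hypothetical recordings: a date plus a time of day in decimal hours (0-24)
bats <- data.frame(
  RecordingDate = sample(seq(as.Date("2021-04-01"), as.Date("2021-04-20"), by = "day"),
                         300, replace = TRUE),
  hour_of_day = c(runif(150, 0, 6), runif(150, 20, 24))
)

# Shift evening times to negative values so that 0 is midnight
bats$centered <- ifelse(bats$hour_of_day > 12, bats$hour_of_day - 24, bats$hour_of_day)

# Turn a centered value back into a clock label, e.g. -4 becomes "20:00"
label_clock <- function(x) sprintf("%02d:00", as.integer(x %% 24))

p <- ggplot(bats, aes(x = RecordingDate, y = centered)) +
  geom_point(size = 0.5) +
  scale_y_continuous("Time", limits = c(-12, 12),
                     breaks = seq(-12, 12, by = 4), labels = label_clock)
p
```

This avoids the dummy dates, at the cost of converting the times to numbers first.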

R: ggplot of average daily counts by month

I am trying to plot average daily trip counts by month. However, I am struggling in finding how I can only include the mean number of trips per day by month in the plot instead of the total monthly trips.
The days of the week and months have already been converted from numeric type to abbreviations and have also been ordered (type: <ord>).
Here's what I've tried for the plot.
by_day <- df_temp %>%
group_by(Start.Day)
ggplot(by_day, aes(x=Start.Month,
fill=Start.Month)) +
geom_bar() +
scale_fill_brewer(palette = "Paired") +
labs(title="Number of Daily Trips by Month",
x=" ",
y="Number of Daily Trips")
Here's the plot I am trying to replicate:
You are almost there. Since you did not share a reproducible example, I simulate your data. You may need to adapt the variable naming and/or correct my assumptions.
{lubridate} is a powerful package for date-time crunching. It comes in handy when working with dates, binning dates for summaries, etc.
library(dplyr)    # for %>%, mutate(), group_by(), summarise()
library(ggplot2)
# simulating your data
## a series of dates from June through October
days <- seq(from = lubridate::ymd("2020-06-01")
,to = lubridate::ymd("2020-10-30")
,by = "1 day")
## random trips on each day
set.seed(666)
trips <- sample(2000:5000, length(days), replace = TRUE)
# putting things together in a data frame
df_temp <- data.frame(date = days, counts = trips) %>%
# I assume the variable Start.Month is the monthly bin
# let's use lubridate to "bin" the month from the date
mutate(Start.Month = lubridate::floor_date(date, unit = "month"))
# aggregate trips for each month, calculate average daily trips
by_month <- df_temp %>%
group_by(Start.Month) %>% # group by the binning variable
summarise(Avg.Trips = mean(counts)) # calculate the mean for each group
ggplot( data = by_month
, aes(x = Start.Month, y = Avg.Trips
, fill=as.factor(Start.Month)) # to work with a discrete palette, factorise
) +
# ------------ bar layer -----------------------------------------
## instead of geom_bar(... stat = "identity"), you can use geom_col()
## and define the fill colour
geom_col() +
scale_fill_brewer(palette = "Paired") +
# ------------ if you like provide context with annotation -------
geom_text(aes(label = Avg.Trips %>% round(2)), vjust = 1) +
# ------------ finalise plot with labels, theme, etc.
labs(title="Number of Daily Trips by Month",
x=NULL, # setting an unused lab to NULL is better than printing empty " "!
y="Number of Daily Trips"
) +
theme_minimal() +
theme(legend.position = "none") # to suppress colour legend
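The simulation above already has one row per day. If the real data instead has one row per trip (which the use of `geom_bar()` in the question suggests, though that is an assumption), the daily counts have to be built first. A minimal sketch with hypothetical data and a hypothetical `start_date` column:

```r
library(dplyr)

set.seed(1)
# Hypothetical trip-level data: one row per trip
trips <- data.frame(
  start_date = sample(seq(as.Date("2020-06-01"), as.Date("2020-08-31"), by = "day"),
                      5000, replace = TRUE)
)

# Step 1: count trips on each day
daily <- trips %>% count(start_date, name = "n_trips")

# Step 2: average the daily counts within each month
by_month <- daily %>%
  mutate(Start.Month = format(start_date, "%Y-%m")) %>%
  group_by(Start.Month) %>%
  summarise(Avg.Trips = mean(n_trips))

by_month
```

Days without any trips do not appear in `daily`, so they are not averaged in as zeros. Whether that matters depends on the data.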

How do i reshape my data from y-m-d into years?

I want to plot a frequency of topics over years.
However, my variable containing dates has the following structure, for example: 2016-01-01. This means that the data is structured in days.
However, I want the data to be visualized on a monthly basis.
The data is structured in a data.frame
I tried to visualize my topic frequency over the dates as such:
ggplot(data = dat,
aes(x = date,
fill = Topics[1])) +
geom_freqpoly(binwidth = 30)
However, when I execute the command, my visualization only shows every third month, like: January, April, July, etc.
How do I get the dates on the x-axis to show all the months (January, February, March, April, etc.)?
You can edit your x axis labels
library(tidyverse)
library(lubridate)
last_month <- Sys.Date() - 0:199
df <- data.frame(
date = last_month,
price = runif(200)
)
base <- ggplot(df, aes(date, price)) +
geom_line()
base + scale_x_date(date_breaks = "months", date_labels="%B")
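If the counts themselves should be monthly and not just the labels, one option is to collapse each date to the first day of its month before counting. This is a sketch on made-up data. The `dat` frame here is hypothetical and only mimics a `date` column like the one in the question.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: one row per document, dated within 2016
dat <- data.frame(
  date = sample(seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day"),
                300, replace = TRUE)
)

# Collapse each date to the first day of its month
dat$month <- as.Date(cut(dat$date, breaks = "month"))

# Count rows per month
dat$n <- 1
counts <- aggregate(n ~ month, data = dat, FUN = sum)

p <- ggplot(counts, aes(x = month, y = n)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%B")
p
```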

faceting monthly data by year in r - months with no data appear

I'm trying to create a grouped bar chart of monthly data, aggregated from daily data, over multiple years. I have accomplished what I wanted from my x-axis from faceting, using faceting as a way to apply a secondary sort (on year and month). Now that I've faceted by year, ggplot is showing all months - even when there's no data. This is wasting space and my actual data set has years of data and I want to add labels, so space is an issue.
How can I accomplish this without the wasted space? Is there a way to add the secondary sort (year,month) on the x-axis without faceting?
# create data set
date = seq(as.Date("2014-05-01"),as.Date("2015-05-10"), "day")
revenue = runif(375, min = 0, max = 200)
cost = runif(375, min = 0, max = 100)
df = data.frame(date,revenue,cost)
head(df)
# adding month and year column, then aggregating to monthly revenue and cost
library(plyr)
df$month <- month(df$date, label=TRUE)
df$year <- year(df$date)
df <- as.data.frame(ddply(df, .(month,year), numcolwise(sum)))
# melting the data for a 'grouped chart' in ggplot
library(reshape)
df <-melt(df, id = c("month","year"))
#create chart
library(ggplot2)
g <-ggplot(df, aes(x=month, y=value, fill=variable))
g + geom_bar(stat="identity", position="dodge") + facet_wrap(~ year)
I feel certain that there's a more elegant way to do this from within ggplot. Am I right?
The key is to use scales = "free" in facet_wrap() (the argument is named scales; the shorter spelling scale only works through R's partial argument matching). By following your code (with a revision), you'll see the graphic below.
# packages used in the question's code
library(lubridate) # month(), year()
library(plyr)      # ddply(), numcolwise()
library(reshape)   # melt()
library(ggplot2)
set.seed(222)
date = seq(as.Date("2014-05-01"),as.Date("2015-05-10"), "day")
revenue = runif(375, min = 0, max = 200)
cost = runif(375, min = 0, max = 100)
mydf = data.frame(date,revenue,cost)
mydf$month <- month(mydf$date, label=TRUE)
mydf$year <- year(mydf$date)
mydf2 <- as.data.frame(ddply(mydf, .(month,year), numcolwise(sum)))
mydf3 <- melt(mydf2, id = c("month","year"))
ggplot(mydf3, aes(x=month, y=value, fill=variable)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ year, scales = "free")
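The question also asks whether the year/month ordering can be had without faceting. One possibility, sketched here with only the revenue column and a hypothetical `ym` label, is to put a combined year-month string on the x-axis. A "YYYY-MM" string sorts chronologically as plain text, so no extra ordering step is needed, and only months that occur in the data get a bar.

```r
library(ggplot2)

set.seed(222)
date <- seq(as.Date("2014-05-01"), as.Date("2015-05-10"), by = "day")
df <- data.frame(date = date, revenue = runif(length(date), min = 0, max = 200))

# Combined year-month label; text order equals chronological order
df$ym <- format(df$date, "%Y-%m")

# Monthly revenue totals, one row per month that actually has data
monthly <- aggregate(revenue ~ ym, data = df, FUN = sum)

p <- ggplot(monthly, aes(x = ym, y = revenue)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```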

ggplot2: plotting non-contiguous time durations as a bar chart

I'm using ggplot to plot various events as a function of the date (x-axis) and start time (y-axis) on which they began. The data/code are as follows:
date<-c("2013-06-05","2013-06-05","2013-06-04","2013-06-04","2013-06-04","2013-06-04","2013-06-04",
"2013-06-04","2013-06-04","2013-06-03","2013-06-03","2013-06-03","2013-06-03","2013-06-03",
"2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02","2013-06-02")
start <-c("07:36:00","01:30:00","22:19:00","22:12:00","20:16:00","19:19:00","09:00:00",
"06:45:00","01:03:00","22:15:00","19:05:00","08:59:00","08:01:00","07:08:00",
"23:24:00","20:39:00","18:53:00","16:57:00","15:07:00","14:33:00","13:24:00")
duration <-c(0.5,6.1,2.18,0.12,1.93,0.95,10.32,
2.25,5.7,2.78,3.17,9.03,0.95,0.88,
7.73,2.75,1.77,1.92,1.83,0.57,1.13)
event <-c("AF201","SS431","BE201","CD331","HG511","CD331","WQ115",
"CD331","SS431","WQ115","HG511","WQ115","CD331","AF201",
"SS431","WQ115","HG511","WQ115","CD331","AS335","CD331")
df<-data.frame(date,start,duration,event)
library(ggplot2)
library(scales)
p <- ggplot(df, aes(as.Date(date),as.POSIXct(start,format='%H:%M:%S'),color=event))
p <- p+geom_point(alpha = I(6/10),size=5)
p + ylab("time (hr)") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
scale_y_datetime(labels = date_format("%H"))+
scale_colour_hue(h=c(360, 90)) +
theme(axis.text.x = element_text(hjust=1, angle=0))
The resulting plot looks like this:
Question: Instead of simply indicating the start time of the event with a single point (shown above), how can I plot a bar that spans the time duration of the event? As shown in the data frame above I have this duration data (in hours). Alternatively, I could supply a 'stop time' (not shown).
I'm imagining the solution would look something like a stacked bar chart. However, a bar chart isn't quite right as it assumes the bar starts at the bottom of the plot and that the vertically stacked events have no gaps between them. My events may be non-contiguous -- 'starting' and 'stopping' at various positions along the y-axis. The solution will also have to take into consideration that 1) some events may ultimately be concurrent (overlap in time) and 2) some events will span multiple days.
I'd be very grateful for any suggestions!
It's a bit unclear exactly what you want. @Michele's answer seemed good; I wasn't clear if you wanted to use geom_rect because it would make for thicker lines (if so, just change the line width), or if there was another reason. I decided to give it a go using geom_rect to enable dodging. I've plotted it with the starting date on the x axis, and the start and end times on y. I've set up the data slightly differently to enable that. If you're after something different, try to make it explicit, but at least here's another option:
df<-data.frame(date,start,duration,event)
df <- transform(df,
start = as.POSIXct(paste(date, start)),
end = as.POSIXct(paste(date, start)) + duration*3600)
df <- df[c("event", "start", "end")]
df$date <- strptime(df$start, "%Y-%m-%d")
df$start.new <- format(df$start, format = "%H:%M:%S")
df$end.new <- format(df$end, format = "%H:%M:%S")
df$day <- factor(as.POSIXct(df$date))
levels(df$day) <- 1:4
df$day <- as.numeric(as.character(df$day))
# integer code per event (works whether event is stored as character or factor)
df$event.int <- as.numeric(factor(df$event))
p <- ggplot(df, aes(day, start)) + geom_rect(aes(ymin = start, ymax = end,
xmin = (day - 0.45) + event.int/10,
xmax = (day - 0.35) + event.int/10,
fill = event)) +
scale_x_discrete(limits = 1:4,breaks = 1:4, labels = sort(unique(date)),
name = "Start date") + ylab("Duration")
Thanks (+1s) to @Michele and @alexwhan for your input. Using geom_rect I was able to get all of the events which occur on the same date on the same point on the x axis. (I'm anticipating that this data set may ultimately include many months of events.)
df<-data.frame(date,start,duration,event)
library(ggplot2)
library(scales) # for date_format()
p <- ggplot(df, aes(xmin=as.Date(date),xmax=as.Date(date)+1,
ymin=as.POSIXct(start,format='%H:%M:%S'),
ymax=as.POSIXct(start,format='%H:%M:%S')+duration*3600,
fill=event))
p <- p+geom_rect(alpha = I(8/10))
p + ylab("time") + xlab("date") + scale_x_date(labels = date_format("%m/%d")) +
scale_y_datetime(labels = date_format("%H"))+
scale_fill_hue(h=c(360, 90)) + # the rectangles are mapped to fill, not colour
theme(axis.text.x = element_text(hjust=1, angle=0))
... resulting in this:
This is pretty close to what I was aiming for.
I think I can deal with the potential overplotting issue by adjusting the alpha.
Ideally I'd like the y axis to include just a single day (00 to 00). To do this I guess I'll probably need to reformat the data such that events with durations that extend beyond midnight are reallocated to the next day. (Not sure how to do this in R.)
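One way to do that reallocation, given as a rough sketch and not a tested solution for the full data set, is to cut every event that runs past midnight into two pieces at the midnight boundary. It assumes no event lasts longer than 24 hours and uses UTC to sidestep daylight-saving shifts. The data frame names `events`, `first_part`, `second_part`, and `pieces` are made up for the example.

```r
# Three of the events from the question; the last one runs past midnight
events <- data.frame(
  event = c("AF201", "SS431", "BE201"),
  start = as.POSIXct(c("2013-06-05 07:36:00", "2013-06-05 01:30:00",
                       "2013-06-04 22:19:00"), tz = "UTC"),
  duration = c(0.5, 6.1, 2.18)  # hours
)
events$end <- events$start + events$duration * 3600

# Midnight that follows each start time
next_midnight <- as.POSIXct(trunc(events$start, "days")) + 86400
crosses <- events$end > next_midnight

# Piece 1: every event, cut off at midnight if it crosses
first_part <- events
first_part$end[crosses] <- next_midnight[crosses]

# Piece 2: the remainder of crossing events, starting at midnight
second_part <- events[crosses, ]
second_part$start <- next_midnight[crosses]

pieces <- rbind(first_part, second_part)
pieces$duration <- as.numeric(difftime(pieces$end, pieces$start, units = "hours"))
pieces
```

Each row of `pieces` now starts and ends within one calendar day, so the date can go on the x-axis and the time of day (00 to 24) on the y-axis.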
Try this method. It's probably different from what you planned, but I think it's quite a clear way to show your data:
df<-data.frame(date,start,duration,event)
df <- transform(df,
start = as.POSIXct(paste(date, start)),
end = as.POSIXct(paste(date, start)) + duration*3600)
df <- df[c("event", "start", "end")]
library(reshape2)
df <- melt(df, id.vars="event")
df$value <- as.POSIXct(df$value, origin=as.Date("1970-01-01"))
df <- df[order(df$event, df$value),]
df$eventID <- rep(seq(1, nrow(df)/2, 1), each=2)
library(ggplot2)
ggplot(df) +
geom_line(aes(value, event, group=eventID, color=event))
This version combines the benefits of: (i) a y-axis containing a single ~24 hour period; (ii) events not overlapping; (iii) events labelled within the graph in addition to the legend; and (iv) concise code.
library(dplyr)
library(lubridate)
library(ggplot2)
# Re-create data frame (tibble() replaces the deprecated data_frame())
df <- tibble(date, start, duration, event) %>%
mutate(start_dt = as.POSIXct(paste(date, start), tz = 'UTC'),
start_hr = hour(start_dt),
end_dt = start_dt + duration * 3600,
# as.numeric() turns the day difference into a plain number of days
end_hr = hour(end_dt) + as.numeric(as.Date(end_dt) - as.Date(start_dt)) * 24)
# Plot
df %>% ggplot() +
geom_segment(aes(x = event, y = start_hr, xend = event, yend = end_hr,
color = event, size = 1)) +
facet_wrap(~ date, nrow = 1) +
guides(size = 'none')
Image of plot:
