I'm trying to plot time series data by week and month; ideally, I think, I'd like to use boxplots to visualise daily data binned by week. While I can change the labels and gridlines on the x-axis using scale_x_date, that won't affect the points in the plot.
Here's a demonstration of the problem and my current (clumsy) solution.
library(zoo)
library(ggplot2)
d = as.Date(c(as.Date("2007-06-01"):as.Date("2008-05-31"))) # using zoo to reformat numeric
x = runif(366, min = 0, max = 100)
df = data.frame(d,x)
# PROBLEM #
p = ggplot(df, aes(d, x))
p + geom_point()
p + geom_boxplot() # more or less useless
# CURRENT FIX #
df$Year.Month <- format(df$d, "%Y-%m")
p = ggplot(df, aes(Year.Month, x))
p + geom_point(alpha = 0.75)
p + geom_boxplot() # where I'm trying to get to...
I feel certain that there's a more elegant way to do this from within ggplot. Am I right?
@shadow's answer below is much neater. But is there a way to do this using binning? Using stats in some form, perhaps?
You can treat Dates as dates in R, and use scale_x_date() in ggplot to get the x-labels you want.
Also, I find it easier to create a new factor variable called "Month" to group the boxplots by month. In this case I used lubridate to accomplish the task.
If you do not go to the trouble of creating the new variable "Month", each boxplot will be drawn on the 15th of the month, which makes the plot a bit harder to read.
library(magrittr)
library(lubridate)
library(dplyr)
df %>%
mutate(Date2 = as.Date(paste0("2000-", month(d), "-", "01"))) %>%
mutate(Month = lubridate::month(d)) %>%
ggplot(aes(Date2, x, group=Month)) +
geom_boxplot() +
scale_x_date(date_breaks="1 month", date_labels = "%b")
If you do not create the variable "Month", boxplots won't align nicely with the x tick marks:
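The same grouping idea extends to weekly bins, which is what the question originally asked about. The following is a minimal sketch, assuming base R's cut() with breaks = "week" is an acceptable way to bin; the column name week and the seed are illustrative choices, not part of the original answers.

```r
library(ggplot2)

set.seed(1)  # reproducible example data
df <- data.frame(d = seq(as.Date("2007-06-01"), as.Date("2008-05-31"), by = "day"))
df$x <- runif(nrow(df), min = 0, max = 100)

# cut() on a Date vector with breaks = "week" labels each day with the
# Monday that starts its week; converting back to Date keeps a date axis
df$week <- as.Date(cut(df$d, breaks = "week"))

# group = week gives one box per week while x stays a real date
p <- ggplot(df, aes(week, x, group = week)) +
  geom_boxplot() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
p
```

Because week is still a Date, scale_x_date() keeps working and the boxes sit at the start of each week rather than at arbitrary positions.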
I have been working on a plot in R using ggplot and plotting dates on the x axis. I have noticed that R does not recognize them as dates, and so the order on the x axis is wrong. I have tried many different things such as using as.Date(), manually editing levels and ordering the x axis, but nothing has worked. Here's my code:
library(dplyr)
library(ggplot2)
library(hrbrthemes)
calories_data = read.csv('dailyCalories_clean.csv',header = TRUE, sep=",")
ggplot(calories_data, aes(x= ActivityDay, y=Calories, group=Id, color = Id))+
geom_line()
Here's the plot
I appreciate any help, I'm new at this and have been researching for hours with no success. Thank you!
One option to fix your issue would be to convert your dates to proper dates to fix the order and use the date_labels argument of scale_x_date to format your dates. To convert to dates you have to add a fake year to your ActivityDay, e.g. "2022":
Using some fake random data to mimic your real data:
library(ggplot2)
set.seed(123)
calories_data <- data.frame(
ActivityDay = rep(c("4/1", "4/10", "5/11", "5/1"), 3),
Id = rep(1:3, each = 4),
Calories = runif(12, 1000, 3000)
)
calories_data$ActivityDay <- as.Date(paste("2022", calories_data$ActivityDay, sep = "/"), format = "%Y/%m/%d")
ggplot(calories_data, aes(x= ActivityDay, y=Calories, group=Id, color = Id))+
geom_line() +
scale_x_date(date_breaks = "5 day", date_labels = "%m/%d")
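To see why the conversion matters, here is a small base R sketch comparing how day/month strings sort as text and as dates. The sample values and the year 2022 are assumptions for illustration only.

```r
# Day/month strings as they might appear in the CSV (hypothetical values)
activity_day <- c("4/12", "4/2", "5/1", "4/30")

# As text, values are compared character by character,
# so "4/12" is placed before "4/2"
text_order <- sort(activity_day, method = "radix")
text_order

# Prepending an assumed year and parsing gives real dates that sort by time
activity_date <- as.Date(paste("2022", activity_day, sep = "/"), format = "%Y/%m/%d")
date_order <- sort(activity_date)
date_order
```

ggplot2 treats a character column as a discrete variable, so the axis follows the text ordering unless the column is converted first.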
I'm putting together some functions to help summarize temporal data in fiscal quarters. The function I have takes a date (e.g. 2017-01-01) and returns the corresponding quarter as a factor (e.g. "1Q2017"). I'm using my data to create graphs in ggplot. But since the quarters are factors, I can't use geoms like geom_line() to connect my data points, as you would for dates.
Can I create a data type for quarters that displays as quarters but behaves like dates? How would I do this?
The "yearqtr" class in zoo represents year/quarters but behaves somewhat like dates. Internally, such objects are stored numerically as year + frac, where frac is 0, 1/4, 2/4 or 3/4, so one can perform arithmetic on them. They format as meaningful year/quarter strings and work with lines in ggplot2 (as well as classic graphics and lattice graphics). See ?yearqtr and ?scale_x_yearqtr.
library(ggplot2)
library(zoo)
# test data
dates <- c("2017-01-01", "2017-04-01")
values <- 1:2
z <- zoo(values, as.yearqtr(dates)) # test zoo object
# 1. classic graphics
plot(z, xaxt = "n")
axis(1, at = time(z), labels = format(time(z), "%YQ%q"))
# 2. ggplot2 graphics
autoplot(z) + scale_x_yearqtr()
# 3. ggplot2 graphics using data frame with yearqtr
DF <- fortify.zoo(z) # test data frame
sapply(DF, class)
## Index z
## "yearqtr" "integer"
ggplot(DF, aes(Index, z)) + geom_line() + scale_x_yearqtr()
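The numeric representation described above can be inspected directly. This is a minimal sketch using zoo's yearqtr class; the variable name q and the "%qQ%Y" format string (chosen to mimic the "1Q2017" style from the question) are illustrative.

```r
library(zoo)

# Two quarter starts converted from Date to yearqtr
q <- as.yearqtr(as.Date(c("2017-01-01", "2017-04-01")))

as.numeric(q)        # stored as year + fraction of a year
format(q, "%qQ%Y")   # quarter-first labels such as "1Q2017"
q + 0.25             # adding 1/4 moves forward by one quarter
as.Date(q)           # first day of each quarter as a Date
```

Since the values are numeric underneath, geom_line() can connect them without any group workaround.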
Taking the comment from @Jaap and incorporating it into an example graph:
library(ggplot2)
library(zoo)
df <- data.frame(date1 = c("2017-01-01", "2016-10-01", "2016-07-01"),
v1 = c(2, 4, 3))
df$date1 <- as.Date(df$date1)
ggplot(df, aes(x = date1, y = v1)) +
geom_line() +
scale_x_date(name = "quarters",
             labels = function(d) format(as.yearqtr(d), "%Y Q%q")) # labels takes a function; date_labels expects a format string
You just need to specify group=1 in aes.
library(tidyverse) # install.packages('tidyverse') if needed
dat = tibble(date = seq.Date(as.Date('2017-01-01'), # tibble() replaces the deprecated data_frame()
as.Date('2017-12-31'),
length.out=365),
x = rnorm(365))
dat = mutate(dat, qtr = paste0(lubridate::quarter(date), 'Q', lubridate::year(date)))
dat$qtr = as.factor(dat$qtr) # for similarity to your situation
dat %>%
group_by(qtr) %>%
summarise(n = sum(x)) %>%
ggplot(aes(x=qtr, y=n, group=1)) +
geom_line()
I need your help.
I am trying to make a stacked bar plot in R and have not succeeded so far. I have read several posts, but without success.
As I am a newbie, this is the chart I want (I made it in Excel):
And this is how my data is laid out:
Thank you in advance.
I would use the package ggplot2 to create this plot, as it makes positioning text labels easier than the base graphics package does:
# First we create a dataframe using the data taken from your excel sheet:
myData <- data.frame(
Q_students = c(1000,1100),
Students_with_activity = c(950, 10000),
Average_debt_per_student = c(800, 850),
Week = c(1,2))
# The data in the dataframe above is in 'wide' format, to use ggplot
# we need to use the tidyr package to convert it to 'long' format.
library(tidyr)
myData <- gather(myData,
Condition,
Value,
Q_students:Average_debt_per_student)
# To add the text labels we calculate the midpoint of each bar and
# add this as a column to our dataframe using the package dplyr:
library(dplyr)
myData <- group_by(myData,Week) %>%
mutate(pos = cumsum(Value) - (0.5 * Value))
#We pass the dataframe to ggplot2 and then add the text labels using the positions which
#we calculated above to place the labels correctly halfway down each
#column using geom_text.
library(ggplot2)
# plot bars and add text
p <- ggplot(myData, aes(x = Week, y = Value)) +
geom_bar(aes(fill = Condition),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 3)
#Add title
p <- p + ggtitle("My Plot")
#Plot p
p
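Depending on the ggplot2 version, the order in which the bars are stacked may not match the row order used for the cumsum() midpoints, so labels can end up in the wrong segment. As a hedged alternative, the sketch below lets ggplot2 compute the label positions itself with position_stack(vjust = 0.5). The names long and p2 are illustrative, and the figures are the same ones used above, typed directly in long format.

```r
library(ggplot2)

# Same figures as above, entered directly in long format
long <- data.frame(
  Week      = rep(c(1, 2), times = 3),
  Condition = rep(c("Q_students", "Students_with_activity",
                    "Average_debt_per_student"), each = 2),
  Value     = c(1000, 1100, 950, 10000, 800, 850)
)

# Bars and labels share the same grouping, so both are stacked in the
# same order; vjust = 0.5 centres each label inside its segment
p2 <- ggplot(long, aes(x = Week, y = Value, group = Condition)) +
  geom_col(aes(fill = Condition)) +
  geom_text(aes(label = Value), position = position_stack(vjust = 0.5), size = 3)
p2
```

With this approach no separate midpoint column is needed.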
so <- data.frame(week1 = c(1000, 950, 800), week2 = c(1100, 10000, 850), row.names = c("Q students", "students with Activity", "average debt per student"))
barplot(as.matrix(so))
I'm attempting to use ggplot and R for analysing some epidemiologic data, and I'm continuing to struggle with getting an epidemic curve to appear properly.
Data is here
attach(epicurve)
head(epicurve)
onset age
1 21/12/2012 18
2 14/06/2013 8
3 10/06/2013 64
4 28/05/2013 79
5 14/04/2013 56
6 9/04/2013 66
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y")
ggplot(epicurve, aes(onset)) + geom_histogram() + scale_x_date(breaks=date_breaks("1 year"), minor_breaks=date_breaks("1 month"), labels = date_format("%b-%Y"))
gives this graph. This is fine, but the binwidths are not related to any time period of note, and adjusting them is a bit trial and error.
For this particular dataset, I'd like to display the cases by month of onset.
One way I worked out how to do this is:
epicurve$monyr <- format(epicurve$onset, "%b-%Y")
epicurve$monyr <- as.factor(epicurve$monyr)
ggplot(epicurve, aes(monyr)) + geom_histogram()
Outputs a graph I can't post because of the reputation system. The bars represent something meaningful, but the axis labels are a bomb-site. I can't format the axes using scale_x_date because they aren't dates and I can't work out what arguments to pass to scale_x_discrete to give useful labels.
I have a feeling there should be an easier way to do this by doing an operation on the onset column. Can anyone give me any pointers, please?
One option is to aggregate the data outside ggplot and then use geom_bar. This will produce counts by month.
edited Sept. 21 2013. Altered plot to show months with no counts.
epicurve <- read.csv("epicurve.csv", sep=",", header=T)
# initial formatting
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y") # convert to Date class
epicurve$onset <- strftime(epicurve$onset, format="%Y/%m") # convert to Year-month
epicurve$onset <- paste(epicurve$onset, "/01", sep = "") # add arbitrary day on to end to make compatible w/ ggplot2
# aggregate by month
onset_counts <- aggregate(epicurve$onset, by = list(date = epicurve$onset), length) # aggregate by month
onset_counts$date = as.Date(onset_counts$date, format = "%Y/%m/%d") # convert to Date class
# plot
library(ggplot2)
library(scales)
ggplot(onset_counts, aes(x=date, y=x)) + geom_bar(stat="identity") + theme_bw() + theme(axis.text.x = element_text(angle=90, hjust = 1, vjust = 1)) +
ylab("Frequency") + xlab(NULL) + scale_x_date(breaks="month", labels=date_format("%Y-%m"))
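One way to make sure months without any cases still appear is to count against an explicit sequence of months. This is a base R sketch under that assumption; it uses only the six onset dates visible in the head(epicurve) output from the question, and the names month_start, all_months and n are illustrative.

```r
# Onset dates copied from the head(epicurve) output in the question
onset <- as.Date(c("21/12/2012", "14/06/2013", "10/06/2013",
                   "28/05/2013", "14/04/2013", "9/04/2013"),
                 format = "%d/%m/%Y")

# Snap each date to the first day of its month
month_start <- as.Date(format(onset, "%Y-%m-01"))

# Every month between the first and last case, whether or not it has cases
all_months <- seq(min(month_start), max(month_start), by = "month")

# A factor with one level per month makes table() report empty months as 0
counts <- table(factor(format(month_start), levels = format(all_months)))
onset_counts <- data.frame(date = all_months, n = as.vector(counts))
onset_counts
```

The resulting data frame has a Date column, so it can be passed to geom_bar(stat = "identity") with scale_x_date() as in the plot above, using y = n.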
I've also just happened across another way of making it look pretty, although it feels like a bit of a kludge.
#read data
epicurve <- read.csv("epicurve.csv", sep=",", header=T)
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y")
#load libraries
library(ggplot2)
library(scales)
#plot
ggplot(epicurve, aes(onset)) + geom_histogram(colour="white", binwidth=30.4375) +
scale_x_date(breaks=date_breaks("1 year"), minor_breaks=date_breaks("1 month"), labels=date_format("%b-%Y")) +
scale_y_continuous(breaks=0:10, minor_breaks=NULL) +
theme(axis.text.x = element_text(angle=45, vjust=0.5))
# binwidth = (365.25/12) = 30.4375, the average month length in days, which makes the bins fit the scale nicely
Which gives this (notice the beautiful alignment of the bins!):
Many thanks to Nate for the help, and hopefully this will be useful!
I've been trying to add appropriate dates on the x-axis of my graph, but can't figure out how to do it in a sane way. What I want is pretty simple: a date at every January 1st in between the minimum and maximum of my data set.
I don't want to include the month - just '2008' or '2009' or whatever is fine. A great example would be this graph:
example graph
Except I want the date on every year, rather than every other year.
I can't seem to figure this out. My dates are defined as days since 1/1/1970, and I've included a method dateEPOCH_formatter which converts the epoch format to a format using the chron package. I've figured out how to make a tick mark and date at the origin of the graph and every 365 days thereafter, but that's not quite the same thing.
Another minor problem is that, mysteriously, the line chron(floor(y), out.format="mon year",origin.=epoch) outputs a graph with axis markers like 'Mar 2008', but changing the line to chron(floor(y), out.format="year",origin.=epoch) doesn't give me a result like '2008' - it just results in the error:
Error in parse.format(format[1]) : unrecognized format year
Calls: print ... as.character.times -> format -> format.dates -> parse.format
Execution halted
Here's my code - thanks for the help.
library(ggplot2)
library(chron)
argv <- commandArgs(trailingOnly = TRUE)
mydata = read.csv(argv[1])
png(argv[2], height=300, width=470)
timeHMS_formatter <- function(x) { # Takes time in seconds from midnight, converts to HH:MM:SS
  h <- floor(x/3600)                          # whole hours
  m <- floor((x %% 3600)/60)                  # whole minutes within the hour
  s <- round(x %% 60)                         # Round to nearest second
  lab <- sprintf('%02d:%02d:%02d', h, m, s)   # Format the strings as HH:MM:SS
  lab <- gsub('^00:', '', lab)                # Remove leading 00: if present
  lab <- gsub('^0', '', lab)                  # Remove leading 0 if present
  lab
}
dateEPOCH_formatter <- function (y){
epoch <- c(month=1,day=1,year=1970)
chron(floor(y), out.format="mon year",origin.=epoch)
}
p= ggplot() +
coord_cartesian(xlim=c(min(mydata$day),max(mydata$day)), ylim=c(0,86400)) + # displays data from first email through present
scale_color_hue() +
xlab("Date") +
ylab("Time of Day") +
scale_y_continuous(label=timeHMS_formatter, breaks=seq(0, 86400, 14400)) + # adds tick marks every 4 hours
scale_x_continuous(label=dateEPOCH_formatter, breaks=seq(min(mydata$day), max(mydata$day), 365) ) +
ggtitle("Email Sending Times") + # adds graph title
theme( legend.position = "none", axis.title.x = element_text(vjust=-0.3)) +
theme_bw() +
layer(
data=mydata,
mapping=aes(x=mydata$day, y=mydata$seconds),
stat="identity",
stat_params=list(),
geom="point",
geom_params=list(alpha=5/8, size=2, color="#A9203E"),
position=position_identity(),
)
print(p)
dev.off()
I think it will be much easier to use the built-in function scale_x_date with date_format and date_breaks from the scales package. These work with Date columns; values stored in other date classes, such as chron, may need converting to Date first.
For example:
library(ggplot2)
library(chron)
library(scales)
# some example data
# scale_x_date() expects a Date column, so the data frame stores the
# dates as Date and the plot maps that column by name
days <- seq(as.Date('01-01-2000', format = '%d-%m-%Y'),
            as.Date('01-01-2010', format = '%d-%m-%Y'), by = 1)
mydata <- data.frame(day = days, y = rnorm(length(days)))
# the plot
ggplot(mydata, aes(x = day, y = y)) + geom_point() +
  scale_x_date(breaks = date_breaks('year'), labels = date_format('%Y'))
To show how intuitive and easy these functions are, here are month-year labels every 6 months. Note that this requires a very wide plot or very small axis labels.
ggplot(mydata, aes(x = day, y = y)) + geom_point() +
  scale_x_date(breaks = date_breaks('6 months'), labels = date_format('%b-%Y'))
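The question stores dates as a count of days since 1/1/1970. As a rough sketch of how that could be fed into scale_x_date, the day counts can be converted with as.Date() and an explicit origin. The data below is a hypothetical stand-in for the asker's CSV (only the column names day and seconds come from the question), and epoch_data and date are made-up names.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: day = days since 1970-01-01
set.seed(42)
epoch_data <- data.frame(day = 13879 + 0:1460,   # 13879 corresponds to 2008-01-01
                         seconds = runif(1461, min = 0, max = 86400))

# Convert the day counts to Date so that scale_x_date() can be used
epoch_data$date <- as.Date(epoch_data$day, origin = "1970-01-01")

# One labelled break per year, showing only the year
p <- ggplot(epoch_data, aes(date, seconds)) +
  geom_point(alpha = 5/8) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y")
p
```

The date_breaks and date_labels arguments are shorthand for the date_breaks() and date_format() helpers used above.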