I'm working with a data frame that includes both the day and month of the observation:
Day Month
1 January
2 January
3 January
...
32 February
33 February
34 February
...
60 March
61 March
And so on. I am creating a line graph in ggplot that reflects the day by day value of column wp (which can just be assumed to be random values between 0 and 1).
Because there are so many days in the data set, I don't want them to be reflected in the x-axis tick marks. I need to refer to the day column in creating the plot so I can see the day-by-day change in wp, but I just want month to be shown on the x-axis labels. I can easily rename the x-axis title with xlab("Month"), but I can't figure out how to change the tick marks to only show the month. So ideally, I want to see "January", "February", and "March" as the labels along the x-axis.
test <- ggplot(data = df, aes(x=day, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
theme_fivethirtyeight() + theme(axis.title = element_text())
When I treat the day as an actual date, I get the desired result:
library(ggplot2)
library(tibble)
library(lubridate)
set.seed(21540)
dat <- tibble(
date = seq(mdy("01-01-2020"), mdy("01-01-2020")+75, by=1),
day = seq_along(date),
wp = runif(length(day), 0, 1)
)
ggplot(data = dat, aes(x=date, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
theme(axis.title = element_text())
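If the data frame only has a running day index, the dates can be rebuilt from it first. The sketch below is one way to do that in base R; the start date of 2020-01-01 is an assumption borrowed from the example above, and day_to_date is a hypothetical helper name.

```r
# Hypothetical helper: turn a 1-based day index into a Date.
# The origin (2020-01-01) is an assumption; use the real first day of the data.
day_to_date <- function(day, origin = as.Date("2020-01-01")) {
  origin + (day - 1)
}

day  <- c(1, 31, 32, 60, 61)
date <- day_to_date(day)

# month.name is built into R and does not depend on the locale
month_label <- month.name[as.integer(format(date, "%m"))]
print(data.frame(day, date, month_label))
```

With such a date column on the x-axis, scale_x_date(date_breaks = "1 month", date_labels = "%B") should place one full month name at each month boundary (the names produced by "%B" follow the session locale).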
More generally, though, you could use scale_x_continuous() to do this:
library(ggplot2)
library(dplyr)
library(tibble)
library(lubridate)
set.seed(21540)
dat <- tibble(
date = seq(mdy("01-01-2020"), mdy("01-01-2020")+75, by=1),
day = seq_along(date),
wp = runif(length(day), 0, 1),
month = as.character(month(date, label=TRUE))
)
firsts <- dat %>%
group_by(month) %>%
slice(1)
ggplot(data = dat, aes(x=day, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
scale_x_continuous(breaks=firsts$day, labels=firsts$month) +
theme(axis.title = element_text())
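When the data already carry a Day and a Month column, as in the question, the breaks can also be found without any date handling. A minimal base R sketch, using made-up data shaped like the table in the question (the column names Day, Month and wp follow the question; the values and the non-leap year are assumptions):

```r
# Made-up data shaped like the table in the question (non-leap year assumed)
df <- data.frame(
  Day   = 1:90,
  Month = rep(c("January", "February", "March"), times = c(31, 28, 31)),
  wp    = runif(90),
  stringsAsFactors = FALSE
)

# Keep the first row of each month: its Day is the break, its Month the label
firsts <- df[!duplicated(df$Month), c("Day", "Month")]
print(firsts)
```

These two columns can then be passed as scale_x_continuous(breaks = firsts$Day, labels = firsts$Month).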
I have a dataframe which contains a variable for week-since-2017. So, it counts up from 1 to 313 in that column. I mutated another variable into the dataframe to indicate the year. So, in my scatterplot, I have each week as a point, but the x-axis is horrid, counting up from 1 to 313. Is there a way I can change the scale at the bottom to instead display the variable year, possibly even adding vertical lines in between to show when the year changes?
Currently, I have this:
ggplot(HS, aes(as.integer(Obs), Total)) + geom_point(aes(color=YEAR)) + geom_smooth() + labs(title="Weekly Sales since 2017",x="Week",y="Written Sales") + theme(axis.line = element_line(colour = "orange", size = 1, linetype = "solid"))
You can convert the number of weeks to a number of days using 7 * Obs and add this value on to the start date (as.Date('2017-01-01')). This gives you a date-based x axis which you can format as you please.
Here, we set the breaks at the turn of each year so the grid fits to them:
ggplot(HS, aes(as.Date('2017-01-01') + 7 * Obs, Total)) +
geom_point(aes(color = YEAR)) +
geom_smooth() +
labs(title = "Weekly Sales since 2017", x = "Week", y = "Written Sales") +
theme(axis.line = element_line(colour = "orange", size = 1)) +
scale_x_date('Year', date_breaks = 'year', date_labels = '%Y')
Data used
Obviously, we don't have your data, so I had to create a reproducible set with the same names and similar values to yours for the above example:
set.seed(1)
HS <- data.frame(Obs = 1:312,
Total = rnorm(312, seq(1200, 1500, length = 312), 200)^2,
YEAR = rep(2017:2022, each = 52))
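The question also asks about vertical lines where the year changes. One possible way to get their positions, assuming the same start date and week index as in the answer (year_lines is a hypothetical name):

```r
# Same conversion as in the answer: week index -> date
Obs <- 1:312
week_date <- as.Date("2017-01-01") + 7 * Obs

# One date per year boundary inside the plotted range
year_lines <- data.frame(
  date = seq(as.Date("2018-01-01"), max(week_date), by = "year")
)
print(year_lines)
```

Adding geom_vline(data = year_lines, aes(xintercept = date), linetype = "dashed") to the plot above would be one way to draw them.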
I am practicing with R and have hit a speedbump while trying to create a graph of airline passengers per month.
I want to show a separate monthly line graph for each year from 1949 to 1960, the period for which data were recorded. To do this I have used ggplot to create a line graph with the values per month. This works fine; however, when I try to separate this by year using facet_wrap() and formatting the current month field, facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y")), it returns this:
Graph returned
I have also tried to format the facet by inputting my own sequence for the years: rep(c(1949:1960), each = 12). This returns a different result which is better but still wrong:
Second graph
Here is my code:
air = data.frame(
month = seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by="months"),
air = as.vector(AirPassengers)
)
ggplot(air, aes(x = month, y = air)) +
geom_point() +
labs(x = "Month", y = "Passengers (in thousands)", title = "Total passengers per month, 1949 - 1960") +
geom_smooth(method = lm, se = F) +
geom_line() +
scale_x_date(labels = date_format("%b"), breaks = "12 month") +
facet_wrap(format(air$month[seq(1, length(air$month), 12)], "%Y"))
#OR
facet_wrap(rep(c(1949:1960), each = 12))
So how do I make an individual graph per year?
Thanks!
You were really close with the second try. The main problem is that you are trying to facet a plot whose x-axis values differ between panels (the dates include the year). An easy fix is to transform the dates onto a common x-axis scale and then facet. The following code should produce the desired plot.
library(tidyverse)
library(lubridate)
air %>%
# Get the year value to use it for the faceted plot
mutate(year = year(month),
# Get the month-day dates and set all dates with a dummy year (2021 in this case)
# This will get all your dates in a common x axis scale
month_day = as_date(paste(2021,month(month),day(month), sep = "-"))) %>%
# Do the same plot, just change the x variable to month_day
ggplot(aes(x = month_day,
y = air)) +
geom_point() +
labs(x = "Month",
y = "Passengers (in thousands)",
title = "Total passengers per month, 1949 - 1960") +
geom_smooth(method = lm,
se = F) +
geom_line() +
# Set the breaks to 1 month
scale_x_date(labels = scales::date_format("%b"),
breaks = "1 month") +
# Use the year variable to do the faceted plot
facet_wrap(~year) +
# You could rotate the x-axis labels by 90° to get a cleaner plot
theme(axis.text.x = element_text(angle = 90,
vjust = 0.5,
hjust = 1))
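To see what the dummy-year step does on its own, here is a small base R sketch on the same date sequence. It is only an illustration of the idea, not a replacement for the pipeline above.

```r
month <- seq(as.Date("1949-01-01"), as.Date("1960-12-01"), by = "month")

# Facet variable: the real year of each observation
year <- as.integer(format(month, "%Y"))

# Shared x variable: same month and day, but always in the dummy year 2021
month_day <- as.Date(format(month, "2021-%m-%d"))

print(head(data.frame(month, year, month_day), 3))
# Number of distinct x positions after the mapping (one per calendar month)
print(length(unique(month_day)))
```

The choice of dummy year only matters for daily data: 2021 is not a leap year, so a 29 February would have no valid counterpart in it, whereas a leap year such as 2000 avoids that.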
I'm trying to plot the monthly sales data with RStudio, but the dates on the x-axis are not showing correctly.
My code:
uc_ts_plot <- ggplot(monthly_sales, aes(DATE,DAUTONSA)) + geom_line(na.rm=TRUE) +
xlab("Month") + ylab("Auto Sales in Thousands") +
scale_x_date(labels = date_format(format= "%b-%Y"),
breaks = date_breaks("1 year")) +
stat_smooth(colour = "green")
uc_ts_plot
I expect the dates on the x-axis to be displayed as Jan-2011, Jan-2012, as shown here.
All I'm getting is a 0001-01 at the left end and a 0002-01 at the right end of the x-axis.
The plot shown there covers only 2011 through 2017, whereas your data start in 1967.
The code below converts DATE to a Date, filters to those years, and reproduces that plot:
library(tidyverse)
library(scales)
library(lubridate)
monthly_sales %>%
mutate(DATE = as.Date(DATE)) %>%
filter(year(DATE) >= 2011 & year(DATE) < 2018) %>%
ggplot() + aes(DATE,DAUTONSA) +
geom_line(na.rm=TRUE) +
xlab("Month") + ylab("Auto Sales in Thousands") +
scale_x_date(labels = date_format(format= "%b-%Y"),
breaks = date_breaks("1 year")) +
stat_smooth(colour = "green")
You can remove the filter step to plot all the years, but that clutters the x-axis with a lot of labels.
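The as.Date() step in the answer suggests that DATE was not stored as a Date, which scale_x_date() needs. A quick way to check and convert, using made-up values and assuming the column holds ISO-formatted strings (the real column type is not shown in the question):

```r
# Made-up stand-in for monthly_sales; the values are illustrative only
monthly_sales <- data.frame(
  DATE     = c("2011-01-01", "2011-02-01", "2011-03-01"),
  DAUTONSA = c(9.6, 9.9, 10.1),
  stringsAsFactors = FALSE
)

# Before conversion the column is plain text
print(class(monthly_sales$DATE))

# After conversion it has class Date and works with scale_x_date()
monthly_sales$DATE <- as.Date(monthly_sales$DATE)
print(class(monthly_sales$DATE))
```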
I am trying to create a circular plot to display the frequency/counts of months in my dataset, but I would also like to group the months by season. Here is a similar plot for time of day; I would like to use the same approach to plot months/seasons. However, I can't seem to find the right options to break the scale into non-overlapping month categories. Any suggestions are much appreciated.
library(lubridate)
library(ggplot2) # use at least 0.9.3 for theme_minimal()
library(circular)
### PLOT FOR HOURS ###
## generate random data in POSIX date-time format
set.seed(44)
N=500
events <- as.POSIXct("2011-01-01", tz="GMT") +
days(floor(365*runif(N))) +
hours(floor(24*rnorm(N))) + # using rnorm here
minutes(floor(60*runif(N))) +
seconds(floor(60*runif(N)))
# extract hour with lubridate function
hour_of_event <- hour(events)
# make a dataframe
eventdata <- data.frame(datetime = events, eventhour = hour_of_event)
# determine if event is in business hours
eventdata$Workday <- eventdata$eventhour %in% seq(6, 18)
ra<-length(eventdata[,2])
for (i in 1:ra){
if(eventdata[,3][i]=="TRUE"){eventdata$diel[i]<-"day"}else{eventdata$diel[i]<-"night"}
}
# Plot
ggplot(eventdata, aes(x = eventhour, fill = diel)) +
geom_histogram(breaks = seq(0,24), width = 2, colour = "grey") +
coord_polar(start = 0) + theme_minimal() +
scale_fill_brewer() + ylab("Count") + ggtitle("Events by Time of day") +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0,24))
This is my attempt to do a plot by month/season:
### PLOT FOR MONTHS ###
head(events)
# extract month with lubridate function
month_of_event <- month(events)
# make a dataframe
eventdata <- data.frame(datetime = events, months = month_of_event)
# classify months into seasons
summer<-c(1,2,12)
fall<-c(3,4,5)
winter<-c(6,7,8)
spring<-c(9,10,11)
season.names <- rep("",12)
season.names[summer] <- "Summer"
season.names[fall] <- "Fall"
season.names[winter] <- "Winter"
season.names[spring] <- "Spring"
season.names
eventdata$season<-season.names[eventdata$months]
str(eventdata)
# Plot
ggplot(eventdata, aes(x = months, fill = season)) +
geom_histogram(breaks = seq(0,12, by=1), width = 4) +
coord_polar(start = 0) + theme_minimal() +
scale_fill_brewer() + ylab("Count") +
scale_x_continuous("", limits = c(0, 12), breaks = seq(0, 12), labels = seq(0,12))
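The following simpler version works. Treating the month as a factor gives one bar per month (geom_bar() is used here because current ggplot2 no longer accepts a discrete x in geom_histogram()):
ggplot(eventdata, aes(x = factor(months), fill = season)) +
geom_bar() +
coord_polar()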
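To get month names on the axis and keep the calendar order, the month numbers can be turned into a factor with fixed levels. A minimal sketch, reusing the season grouping from the question; the month values are made up and month_f is a hypothetical name:

```r
# Season lookup indexed by month number, using the grouping from the question
season_names <- character(12)
season_names[c(12, 1, 2)]  <- "Summer"
season_names[c(3, 4, 5)]   <- "Fall"
season_names[c(6, 7, 8)]   <- "Winter"
season_names[c(9, 10, 11)] <- "Spring"

months <- c(1, 4, 7, 10, 12)   # made-up month numbers
season <- season_names[months]

# Fixed levels keep every month available as a category, in calendar order
month_f <- factor(months, levels = 1:12, labels = month.abb)
print(data.frame(months, month_f, season))
```

Mapping x to month_f with geom_bar() and adding scale_x_discrete(drop = FALSE) should then give one labelled wedge per month, including months without events.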
I am trying to plot the change in a time series for each calendar year using ggplot and I am having problems with the fine control of the x-axis. If I do not use scale="free_x" then I end up with an x-axis that shows several years as well as the year in question, like this:
If I do use scale="free_x" then, as one would expect, I end up with tick labels for each plot, which in some cases vary by plot, and I do not want that:
I have made various attempts to define the x-axis using scale_x_date etc but without any success. My question is therefore:
Q. How can I control the x-axis breaks and labels on a ggplot facet grid so that the (time series) x-axis is identical for each facet, shows only at the bottom of the panel and is in the form of months formatted 1, 2, 3 etc or as 'Jan','Feb','Mar'?
Code follows:
require(lubridate)
require(ggplot2)
require(plyr)
# generate data
df <- data.frame(date=seq(as.Date("2009/1/1"), by="day", length.out=1115),price=runif(1115, min=100, max=200))
# remove weekend days
df <- df[!(weekdays(as.Date(df$date)) %in% c('Saturday','Sunday')),]
# add some columns for later
df$year <- as.numeric(format(as.Date(df$date), format="%Y"))
df$month <- as.numeric(format(as.Date(df$date), format="%m"))
df$day <- as.numeric(format(as.Date(df$date), format="%d"))
# calculate change in price since the start of the calendar year
df <- ddply(df, .(year), transform, pctchg = ((price/price[1])-1))
p <- ggplot(df, aes(date, pctchg)) +
geom_line( aes(group = 1, colour = pctchg),size=0.75) +
facet_wrap( ~ year, ncol = 2,scale="free_x") +
scale_y_continuous(formatter = "percent") +
opts(legend.position = "none")
print(p)
Here is an example:
df <- transform(df, doy = as.Date(paste(2000, month, day, sep="/")))
p <- ggplot(df, aes(doy, pctchg)) +
geom_line( aes(group = 1, colour = pctchg),size=0.75) +
facet_wrap( ~ year, ncol = 2) +
scale_x_date(format = "%b") +
scale_y_continuous(formatter = "percent") +
opts(legend.position = "none")
p
Is this what you want?
The trick is to map every date onto the same dummy year (2000 here), so that all facets share one x-axis.
UPDATED
Here is an example for the dev version (i.e., ggplot2 0.9), where the format and formatter arguments are replaced by labels:
library(scales) # provides date_format() and percent_format()
p <- ggplot(df, aes(doy, pctchg)) +
geom_line( aes(group = 1, colour = pctchg), size=0.75) +
facet_wrap( ~ year, ncol = 2) +
scale_x_date(labels = date_format("%b"), breaks = seq(min(df$doy), max(df$doy), "month")) +
scale_y_continuous(labels = percent_format()) +
opts(legend.position = "none")
p
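The formatter argument, opts() and scale_x_date(format = ...) no longer exist in current ggplot2 releases. A sketch of the same plot for a recent ggplot2 follows, assuming ggplot2 2.0 or later with the scales package installed. The data preparation is redone in base R instead of plyr, and the line width setting is left out; both are choices made for this sketch.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(date = seq(as.Date("2009-01-01"), by = "day", length.out = 1115))
df$price <- runif(nrow(df), min = 100, max = 200)

# Drop weekends; %u is the ISO weekday number (6 = Saturday, 7 = Sunday)
df <- df[!(format(df$date, "%u") %in% c("6", "7")), ]

df$year <- as.integer(format(df$date, "%Y"))
# Change in price since the first observation of each calendar year
df$pctchg <- ave(df$price, df$year, FUN = function(p) p / p[1] - 1)
# Same month and day in the dummy (leap) year 2000, shared by all facets
df$doy <- as.Date(format(df$date, "2000-%m-%d"))

p <- ggplot(df, aes(doy, pctchg)) +
  geom_line(aes(group = 1, colour = pctchg)) +
  facet_wrap(~ year, ncol = 2) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  scale_y_continuous(labels = scales::percent) +
  theme(legend.position = "none")
p
```

Because facet_wrap() keeps fixed scales by default, every panel gets the same January to December axis, and the tick labels appear only on the bottom row of panels.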