I'd like to make a plot for 2019 and 2020, but I'm running into a problem with the month() function from lubridate.
If I run this:
library(dplyr)
library(ggplot2)
library(lubridate)

df %>%
  mutate(Month = month(Date, label = TRUE)) %>%
  group_by(Month, Var1) %>%
  summarize(sum = sum(numeric_variable)) %>%
  ggplot(aes(Month, sum)) +
  geom_col() +
  facet_wrap(. ~ Var1, scales = "free_y")
The data for January 2019 and January 2020 (and likewise for the other months) are combined in the plot. That makes sense, since both are labelled 'Jan' in the Month variable.
How can I best separate the months of 2019 and 2020 while still keeping the labels ('Jan', 'Feb') and the order of my Month variable? Do I have to reorder them as a factor manually, or is there a better way?
Lubridate is nice for some things, but I much prefer zoo::as.yearmon for months and years. There is even a nice scale_x_yearmon function for ggplot:
library(zoo)
library(dplyr)
library(ggplot2)

df %>%
  mutate(Month = zoo::as.yearmon(Date)) %>%
  group_by(Month, Var1) %>%
  summarize(sum = sum(numeric_variable)) %>%
  ggplot(aes(Month, sum)) +
  geom_col() +
  facet_wrap(. ~ Var1, scales = "free_y") +
  zoo::scale_x_yearmon(format = "%b")
Sample data:
set.seed(123)
df <- data.frame(Date = rep(seq(as.Date("2019-01-01"),as.Date("2020-12-31"), by = "day"),2),
Var1 = rep(LETTERS[1:2],each = 731),
numeric_variable = round(runif(2*731,1,100)))
Thanks Ian, you set me on the right path. scale_x_yearmon didn't work for me, though. But since the order was right, I could just convert the output of zoo's as.yearmon() to a factor and work from there.
df %>%
mutate (Month = as.factor(zoo::as.yearmon(Date))) %>%
group_by(Month, Var1) %>%
ggplot(aes(Month, numeric_variable)) +
geom_col() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
I now have 'Jan 2019' ... 'Dec 2019' and 'Jan 2020' ... 'Dec 2020' on the x-axis, which is good enough for now. Issue solved.
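If you would rather not depend on zoo, a similarly ordered month-year factor can be built in base R. This is a minimal sketch, assuming a Date column like the sample data above; the helper name `month_year_factor` is hypothetical. It uses the built-in `month.abb` constant instead of the `%b` format, so the labels stay English regardless of locale.

```r
# Hypothetical helper: label each date as "Mon YYYY" and order the
# factor levels chronologically instead of alphabetically.
month_year_factor <- function(dates) {
  labels <- paste(month.abb[as.integer(format(dates, "%m"))], format(dates, "%Y"))
  # Levels follow the sorted dates, so "Jan 2019" comes before "Feb 2019", etc.
  level_order <- unique(labels[order(dates)])
  factor(labels, levels = level_order)
}

dates <- seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day")
Month <- month_year_factor(dates)
head(levels(Month), 3)
```

The resulting factor could then be mapped to the x aesthetic in place of `month(Date, label = TRUE)`, keeping January 2019 and January 2020 apart.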
I have a dataset:
TimeSeries <- data.frame(date=c("2013","2013","2013","2014","2014","2015","2015","2016", "2016","2017","2017"),
score=c(-0.333, 0.500, 0.000, 0.333, -0.500, 0.777, -0.450, 0.667, -0.011, 0.111, -0.145))
Now I simply want to plot this as a time series that shows all observations for every year. I used this code:
ggplot(TimeSeries, aes(x = date, y = score)) +
geom_line() + scale_x_discrete(breaks = c(2013,2014,2015,2016))
This gives me the following:
But instead, I want the 2013 observations to appear in the order they come in the dataset, between "2013" and "2014" on the x-axis. So this would be a line going from -0.333 to 0.500 to 0.000. The same applies to all other years and observations: they need to be connected in the order they appear in the dataset, so the first 2013 observation first, then the second, and so on. Can anyone help?
I'd suggest making the year and observation into a fractional year measurement so that you can benefit from the convenience of a continuous axis:
library(dplyr)
library(ggplot2)

TimeSeries %>%
  group_by(date) %>%
  mutate(obs_num = row_number(),
         # spread each year's observations evenly between that year and the next
         year_dec = as.numeric(date) + (obs_num - 1)/max(obs_num)) %>%
  ungroup() %>%
  ggplot(aes(year_dec, score)) +
  geom_line()
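The same fractional-year idea can be sketched in base R without dplyr. This minimal version uses `ave()` to number the observations within each year; the names `obs_num` and `n_in_year` simply mirror the code in this thread, and nothing else is assumed.

```r
TimeSeries <- data.frame(
  date = c("2013", "2013", "2013", "2014", "2014", "2015", "2015",
           "2016", "2016", "2017", "2017"),
  score = c(-0.333, 0.500, 0.000, 0.333, -0.500, 0.777, -0.450,
            0.667, -0.011, 0.111, -0.145)
)

year <- as.numeric(TimeSeries$date)
# Position of each row within its year (1, 2, 3, ...) and the year's row count
obs_num <- ave(year, year, FUN = seq_along)
n_in_year <- ave(year, year, FUN = length)
# Spread a year's observations evenly across [year, year + 1)
TimeSeries$year_dec <- year + (obs_num - 1) / n_in_year
TimeSeries$year_dec
```

Because `year_dec` increases strictly in dataset order, a line drawn on a continuous x-axis joins the points in exactly that order.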
You'll find the functions in the lubridate package to be helpful.
Basically, we create an artificial date based on the year plus the order of the observations within that year. The order is converted to a decimal fraction, added to the year, and then converted to an actual date for plotting.
library(dplyr)
library(ggplot2)
library(lubridate)

TimeSeries %>%
mutate(date = as.numeric(date)) %>%
group_by(date) %>%
mutate(date_order = row_number()) %>%
mutate(n_in_year = n()) %>%
mutate(decimal = date_order / n_in_year) %>%
mutate(full_date_decimal = date + decimal) %>%
mutate(plot_date = date_decimal(full_date_decimal)) %>%
mutate(plot_date = as_date(plot_date)) %>%
ggplot(aes(x = plot_date, y = score)) +
geom_line() +
scale_x_date()
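To see roughly what the `date_decimal()` step does, here is a base-R approximation. It is a sketch, not lubridate's implementation: the hypothetical helper `decimal_year_to_date()` rounds to whole days, whereas `date_decimal()` works at the level of seconds and returns a date-time.

```r
# Hypothetical helper: map a fractional year (e.g. 2013.5) to a calendar Date
# by taking that fraction of the days in the year, rounded to whole days.
decimal_year_to_date <- function(x) {
  year <- floor(x)
  start <- as.Date(paste0(year, "-01-01"))
  end <- as.Date(paste0(year + 1, "-01-01"))
  days_in_year <- as.numeric(end - start)  # 365 or 366
  start + round((x - year) * days_in_year)
}

decimal_year_to_date(c(2013, 2013 + 1/3, 2016))
```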
I am trying to make a faceted plot of a grouped data frame with ggplot2, using geom_line(). My data frame has a Date column and I would like to have dates on the horizontal axis. If I just use Date in aes(x=Date, ...), I get nice labels on the horizontal axis. However, the line has an almost horizontal section where the date jumps from the end of one group to the beginning of the next group. The following code and chart show this:
library(dplyr)
library(ggplot2)
library(lubridate)

dts <- seq.Date(as.Date("2020-01-01"), as.Date("2021-12-31"), by="day")
mos <- month(dts)
df <- data.frame(Date=dts, Month=mos)
nr <- nrow(df)
df$X <- rep(1, nr)
df %>%
group_by(Month) -> dfgrp
dfgrp %>%
group_by(Month) %>%
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=Date, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
I would not like my chart to have those almost-horizontal lines when the date changes by a large amount. I was able to generate a chart without those lines using integers on aes() as follows:
dfgrp %>%
mutate(Time = 1:n() %>% as.integer(),
Z = cumsum(X)) %>%
ggplot(aes(x=Time, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
scale_x_continuous(breaks = seq(from=1, to=nr, by=10) %>% as.integer(),
labels = function(x) as.character(dfgrp$Date[x])) +
theme(axis.text.x = element_text(angle=45, size=7))
The line on the chart looks like I want it but the dates on the horizontal axis are not correct: they end in February 2020 in every facet while the dates in the dataframe end in December 2021 and the dates in the first chart begin and end on different months in different facets.
I tried many things but nothing worked. Any suggestions on how to have a chart with dates like in the first chart above and lines like in the second chart above?
Help will be much appreciated.
You may want to shift all dates into the same year while keeping the original year as a separate variable:
library(dplyr)
library(ggplot2)
library(lubridate)
dfgrp %>%
group_by(Month) %>%
mutate(year = year(Date),
adj_date = ymd(paste(2020, month(Date), day(Date)))) %>%
# 2020 was leap year so 2/29 won't be lost
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=adj_date, y=Z, color = year, group = year)) +
geom_line(size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
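The core trick here is moving every date onto the same leap year while keeping the real year in its own column. In base R this can be sketched with `format()`, because a literal `2020-` in the format string replaces the year. The column names `year` and `adj_date` mirror the answer; the sample dates are made up for illustration.

```r
dates <- as.Date(c("2020-02-29", "2020-07-04", "2021-02-28", "2021-07-04"))

shifted <- data.frame(
  Date = dates,
  year = as.integer(format(dates, "%Y")),
  # Keep month and day, force the year to 2020 (a leap year, so Feb 29 survives)
  adj_date = as.Date(format(dates, "2020-%m-%d"))
)
shifted
```

Mapping `x = adj_date` with `group = year` should then draw one line per year on a shared January-to-December axis, with no connecting segment between the years.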
Starting with the following dataset:
$ Orders,Year,Date
1608052.2,2019,2019-08-02
1385858.4,2018,2018-07-27
1223593.3,2019,2019-07-25
1200356.5,2018,2018-01-20
1198226.3,2019,2019-07-15
837866.1,2019,2019-07-02
I'm trying to make a chart in a similar format, with these criteria: the x-axis shows days or months, the y-axis shows the sum of Orders, and the grouping/colors are by year.
Attempts:
1) No overlay
dataset %>%
ggplot( aes(x=`Merge Date`, y=`$ Orders`, group=`Merge Date (Year)`, color=`Merge Date (Year)`)) +
geom_line()
2) ggplot month grouping
dataset %>%
mutate(Date = as.Date(Date)) %>%
mutate(Year = format(Date,'%Y')) %>%
mutate(Month = format(Date,'%b')) -> dataset2
ggplot(data=dataset2, aes(x=Month, y=`$ Orders`, group=Year, color=factor(Year))) +
geom_line(size=.75) +
ylab("Volume")
The lubridate package is your answer. Extract month from the Date field and turn it into a variable. This code worked for me:
library(tidyverse)
library(lubridate)
dataset <- read_delim(I("OrderValue,Year,Date\n1608052.2,2019,2019-08-02\n1385858.4,2018,2018-07-27\n1223593.3,2019,2019-07-25\n1200356.5,2018,2018-01-20\n1198226.3,2019,2019-07-15\n837866.1,2019,2019-07-02"), delim = ",")
dataset <- dataset %>%
mutate(theMonth = month(Date))
ggplot(dataset, aes(x = as.factor(theMonth), y = OrderValue, group = as.factor(Year), color = as.factor(Year))) +
geom_line()
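The question asks for the y-axis to show the sum of orders, while the code above plots individual order values. A minimal base-R sketch of that aggregation step, using the six sample rows from the question (with the column renamed `OrderValue`, as in the answer), might look like this:

```r
orders <- data.frame(
  OrderValue = c(1608052.2, 1385858.4, 1223593.3, 1200356.5, 1198226.3, 837866.1),
  Year = c(2019, 2018, 2019, 2018, 2019, 2019),
  Date = as.Date(c("2019-08-02", "2018-07-27", "2019-07-25",
                   "2018-01-20", "2019-07-15", "2019-07-02"))
)
# Numeric month (1-12) so the x-axis sorts in calendar order
orders$Month <- as.integer(format(orders$Date, "%m"))
# One row per Year/Month combination holding the total order value
monthly <- aggregate(OrderValue ~ Year + Month, data = orders, FUN = sum)
monthly
```

`monthly` could then be plotted with `x = Month`, `y = OrderValue` and `colour = factor(Year)`.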
I have a dataframe in R where:
Date MeanVal
2002-01 37.70722
2002-02 43.50683
2002-03 45.31268
2002-04 14.96000
2002-05 29.95932
2002-09 52.95333
2002-10 12.15917
2002-12 53.55144
2003-03 41.15083
2003-04 21.26365
2003-05 33.14714
2003-07 66.55667
.
.
2011-12 40.00518
And when I plot a time series using ggplot with:
ggplot(mean_data, aes(Date, MeanVal, group = 1)) + geom_line() + xlab("") +
  ylab("Mean Value")
I am getting:
But as you can see, the x-axis is not very neat. Is there any way to scale it by year only (2002, 2003, 2004, ..., 2011)?
Let's use lubridate's parse_date_time() to convert your Date column to a date-time (POSIXct) class:
library(tidyverse)
library(lubridate)
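mean_data %>%
  mutate(Date = parse_date_time(as.character(Date), "Y-m")) %>%
  ggplot(aes(Date, MeanVal)) +
  geom_line() +
  # one tick per year, labelled with the year only
  scale_x_datetime(date_breaks = "1 year", date_labels = "%Y")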
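If lubridate isn't available, a "YYYY-MM" string can also be turned into a Date in base R by appending a day. This minimal sketch assumes Date is stored as text, as in the question; the rows are a few copied from the sample. The yearly break vector is the kind of thing you could pass to `scale_x_date(breaks = ...)` to get one tick per year.

```r
mean_data <- data.frame(
  Date = c("2002-01", "2002-02", "2002-12", "2003-03", "2011-12"),
  MeanVal = c(37.70722, 43.50683, 53.55144, 41.15083, 40.00518)
)
# "2002-01" -> "2002-01-01" -> Date
mean_data$Date <- as.Date(paste0(mean_data$Date, "-01"))

# One break on January 1 of each year covered by the data
first_year <- as.integer(format(min(mean_data$Date), "%Y"))
last_year <- as.integer(format(max(mean_data$Date), "%Y"))
year_breaks <- seq(as.Date(paste0(first_year, "-01-01")),
                   as.Date(paste0(last_year, "-01-01")), by = "year")
format(year_breaks, "%Y")
```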
Similarly, we can convert to an xts and use autoplot():
library(timetk)
mean_data %>%
mutate(Date = parse_date_time(as.character(Date), "Y-m")) %>%
tk_xts(silent = TRUE) %>%
autoplot()
This produces a similar plot.
library(dplyr)
library(ggplot2)

mean_data %>%
  # remove the month and cast the remaining year value as an integer
  mutate(Date = as.integer(gsub('-.*', '', Date))) %>%
  ggplot(aes(Date, MeanVal, group = 1)) + geom_line() + xlab("") +
  ylab("Mean Value")
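Dropping the month, as in the code above, places all of a year's values at the same x position, so the line runs vertically within each year. An alternative base-R sketch, using the same "YYYY-MM" input, keeps the months apart by converting them to a fractional year. A plain numeric axis can then still show only whole-year ticks.

```r
ym <- c("2002-01", "2002-02", "2002-12", "2003-03", "2011-12")
year <- as.integer(substr(ym, 1, 4))
month <- as.integer(substr(ym, 6, 7))
# January sits on the year itself; December sits 11/12 of the way to the next year
year_dec <- year + (month - 1) / 12
year_dec
```

With `aes(year_dec, MeanVal)`, adding `scale_x_continuous(breaks = 2002:2011)` would then label each year once.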
I am trying to create a plot to compare year to year revenue, but I can't get it to work and don't understand why.
Consider my df:
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
group_by(Year, Month) %>%
ggplot(aes(x = Month, y = rev, fill = Year)) +
geom_line()
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I don't really understand why this isn't working. What I want is two lines that go from January to October.
This should work for you:
library(tidyverse)
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year, group = Year)) +
geom_line()
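It was just the grouping that went wrong, because of the variable types. After separate(), Month and Year are character columns. As a result, ggplot2 treats Month as a discrete x-axis and, without an explicit group, puts each Month/Year combination in its own one-point group. It might also be useful to use lubridate for the dates (also a tidyverse package):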
library(lubridate)
df %>%
mutate(Year = as.factor(year(date)), Month = month(date)) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year)) +
geom_line()
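The data-preparation part of the lubridate answer can be checked without plotting. In this base-R sketch, the seed is an assumption added only for reproducibility. It confirms that with a numeric Month the filter keeps January to October for both years, giving two groups of ten points each:

```r
set.seed(1)  # assumption: added only so the sketch is reproducible
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
                 rev = rnorm(22, 150, sd = 20))
df$Year <- factor(format(df$date, "%Y"))
# Numeric month, so comparisons and the x-axis use calendar order
df$Month <- as.integer(format(df$date, "%m"))
last_month_2017 <- max(df$Month[df$Year == "2017"])
same_months <- df[df$Month <= last_month_2017, ]
table(same_months$Year)
```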
I think ggplot2 is confused because it doesn't recognise the format of your Month column, which is a character in this case. Try converting it to numeric:
... +
ggplot(aes(x = as.numeric(Month), y = rev, colour = Year)) +
....
Note that I replaced fill with colour, which I believe makes more sense for a line chart.
Btw, I'm not sure the group_by statement is adding anything. I get the same chart with or without it.