I am trying to create a plot to compare year to year revenue, but I can't get it to work and don't understand why.
Consider my df:
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
group_by(Year, Month) %>%
ggplot(aes(x = Month, y = rev, fill = Year)) +
geom_line()
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I don't really understand why this isn't working. What I want is two lines that go from January to October.
This should work for you:
library(tidyverse)
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year, group = Year)) +
geom_line()
It was just the grouping that went wrong because of the variable types. It might be useful to use lubridate for the dates (also a tidyverse package):
library(lubridate)
df %>%
mutate(Year = as.factor(year(date)), Month = month(date)) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year)) +
geom_line()
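To see why the lubridate version gets away without an explicit group, here is a minimal sketch (the date is just an illustrative value, not taken from the data above). month() returns a number, so the x axis is continuous, and only the factor Year is left as a discrete variable for ggplot2 to group by.

```r
library(lubridate)

d <- as.Date("2017-10-01")  # illustrative date (an assumption for this sketch)

year(d)    # numeric year
month(d)   # numeric month, so the x axis is continuous

# Wrapping the year in as.factor() turns it into a discrete colour/grouping variable
levels(as.factor(year(d)))
```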
I think ggplot2 is confused because it doesn't recognise the format of your Month column, which is a character in this case. Try converting it to numeric:
... %>%
ggplot(aes(x = as.numeric(Month), y = rev, colour = Year)) +
....
Note that I replaced fill with colour, which I believe makes more sense for this chart.
Btw, I'm not sure the group_by statement is adding anything. I get the same chart with or without it.
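If you want to check the grouping yourself, the following sketch counts the groups ggplot2 assigns with and without group = Year. It rebuilds the character Year and Month columns with format() instead of separate(), and n_groups is a hypothetical helper name; both are my assumptions for the illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
  rev  = rnorm(22, 150, sd = 20)
)
# Character Year and Month columns, like the ones separate() produces
df$Year  <- format(df$date, "%Y")
df$Month <- format(df$date, "%m")

p_default <- ggplot(df, aes(x = Month, y = rev, colour = Year)) + geom_line()
p_grouped <- ggplot(df, aes(x = Month, y = rev, colour = Year, group = Year)) + geom_line()

# Hypothetical helper: number of distinct groups ggplot2 assigns in the first layer
n_groups <- function(p) length(unique(layer_data(p, 1)$group))

n_groups(p_default)  # one group per Year/Month combination
n_groups(p_grouped)  # one group per Year
```

With a discrete x and no explicit group, every Year/Month pair ends up as its own one-row group, which is what the geom_path message is complaining about.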
I have the following code for a stacked bar graph for the period 1970-2020, which is the X-axis label.
The graph is generated from a dataframe with 3 columns and 51 rows representing various years.
Year Active New
1970 1 1
......
2020 268 60
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
As my X-axis is not readable, I would like to remove most of the years from the X-axis and keep e.g. 1970, 1980, 1990, 2000, 2020 at their corresponding positions. I'm not sure whether scale_x_discrete can do this?
The obvious answer in this specific case is to convert Year to a numeric variable, which will make the breaks pretty by default.
This sample data allows us to run your code and reproduce your issue:
set.seed(1)
df <- data.frame(Year = factor(1970:2020),
Active = cumsum(rnorm(51, 4, 2)),
New = cumsum(rnorm(51, 1, 1)))
Using your exact plotting code produces a similar plot with unreadable axis labels:
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
But if we simply convert Year to numeric values, we get the same plot with pretty breaks:
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = as.numeric(as.character(Year)), y = Count,
fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
xlab("Year") +
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
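The as.character() step in the conversion matters. As a small base R sketch (year_factor is just an illustrative name), calling as.numeric() directly on a factor returns the internal level codes rather than the years:

```r
year_factor <- factor(1970:2020)

# as.numeric() on a factor returns the internal level codes, not the labels
head(as.numeric(year_factor), 3)
# 1 2 3

# Going through as.character() first recovers the actual years
head(as.numeric(as.character(year_factor)), 3)
# 1970 1971 1972
```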
If for whatever reason it needs to be a factor, you can leave specific years blank using a labeling function.
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New")) +
scale_x_discrete(labels = function(x) ifelse(as.numeric(x) %% 10, "", x)) +
theme(axis.ticks.length.x = unit(0, "mm"))
Created on 2022-08-19 with reprex v2.0.2
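To see what the labelling function does on its own, here is a minimal sketch. The name decade_labels is hypothetical, and I have written the test as an explicit == 0 comparison instead of relying on a non-zero remainder being treated as TRUE; the result is the same.

```r
# Hypothetical helper: keep the label only for years that are multiples of 10
decade_labels <- function(x) ifelse(as.numeric(x) %% 10 == 0, x, "")

# scale_x_discrete() hands the labelling function a character vector of breaks
decade_labels(as.character(1968:1972))
# "" "" "1970" "" ""
```

It could then be passed as scale_x_discrete(labels = decade_labels).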
I am trying to plot daily data with days of the week (Monday:Sunday) on the y-axis, week of the year on the x-axis with monthly labels (January:December), and facet by year with each facet as its own row. I want the week of the year to align between the facets. I also want each tile to be square.
Here is a toy dataset to work with:
my_data <- tibble(Date = seq(
as.Date("1/11/2013", "%d/%m/%Y"),
as.Date("31/12/2014", "%d/%m/%Y"),
"days"),
Value = runif(length(Date)))
One solution I came up with is to use lubridate::week() to number the weeks and plot by week. This correctly aligns the x-axis between the facets. The problem is, I can't figure out how to label the x-axis with monthly labels.
my_data %>%
mutate(Week = week(Date)) %>%
mutate(Weekday = wday(Date, label = TRUE, week_start = 1)) %>%
mutate(Year = year(Date)) %>%
ggplot(aes(fill = Value, x = Week, y = Weekday)) +
geom_tile() +
theme_bw() +
facet_grid(Year ~ .) +
coord_fixed()
Alternatively, I tried plotting by the first day of the week using lubridate::floor_date and lubridate::round_date. In this solution, the x-axis is correctly labeled, but the weeks don't align between the two years. Also, the tiles aren't perfectly square, though I think this could be fixed by playing around with the coord_fixed ratio.
my_data %>%
mutate(Week = floor_date(Date, week_start = 1),
Week = round_date(Week, "week", week_start = 1)) %>%
mutate(Weekday = wday(Date, label = TRUE, week_start = 1)) %>%
mutate(Year = year(Date)) %>%
ggplot(aes(fill = Value, x = Week, y = Weekday)) +
geom_tile() +
theme_bw() +
facet_grid(Year ~ .) +
scale_x_datetime(name = NULL, labels = label_date("%b")) +
coord_fixed(7e5)
Any suggestions of how to get the columns to align correctly by week of the year while labeling the months correctly?
The concept is a little flawed, since the same week of the year is not guaranteed to fall in the same month. However, you can get a "close enough" solution by using the labels and breaks arguments of scale_x_continuous. The idea here is to write a function which takes a number of weeks, adds 7 times this value as a number of days onto an arbitrary year's 1st January, then formats the result as a month name only using strftime:
my_data %>%
mutate(Week = week(Date)) %>%
mutate(Weekday = wday(Date, label = TRUE, week_start = 1)) %>%
mutate(Year = year(Date)) %>%
ggplot(aes(fill = Value, x = Week, y = Weekday)) +
geom_tile() +
theme_bw() +
facet_grid(Year ~ .) +
coord_fixed() +
scale_x_continuous(labels = function(x) {
strftime(as.Date("2000-01-01") + 7 * x, "%B")
}, breaks = seq(1, 52, 4.2))
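A possible variation on the same idea, not the code above: instead of evenly spaced breaks, you could place one break at the week in which each month starts and label it with the built-in month.abb. This is only a sketch; the reference year 2001 is an arbitrary non-leap year I picked, and in a leap year the month starts shift by a day.

```r
library(lubridate)

# First day of each month in an arbitrary non-leap reference year (an assumption)
month_starts <- as.Date(sprintf("2001-%02d-01", 1:12))

# week() counts the complete 7-day periods since January 1st, plus one
month_breaks <- week(month_starts)
month_breaks

# Possible use in the plot above:
# scale_x_continuous(breaks = month_breaks, labels = month.abb)
```

Because month.abb is a fixed English constant, the labels do not depend on the session locale the way strftime's "%B" does.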
Another option, if you're sick of reinventing the wheel, is to use the calendarHeat function in the GitHub-only makeR package:
devtools::install_github("jbryer/makeR") # install_github() comes from devtools (or remotes)
library(makeR)
calendarHeat(my_data$Date, my_data$Value)
I have some time series data plotted using ggplot. I'd like the legend, which appears to the right of the plot, to be in the same order as the line on the most recent date/value on the plot's x-axis. I tried using the case_when function, but I'm obviously using it wrong. Here is an example.
df <- tibble(
x = runif(100),
y = runif(100),
z = runif(100),
year = sample(seq(1900, 2010, 10), 100, T)
) %>%
gather(variable, value,-year) %>%
group_by(year, variable) %>%
summarise(mean = mean(value))
df %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
## does not work
df %>%
mutate(variable = fct_reorder(variable, case_when(mean ~ year == 2010)))
ggplot(aes(year, mean, color = variable)) +
geom_line()
We may add one extra line
ungroup() %>% mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE))
before plotting, or use
df %>%
ungroup() %>% # df is still grouped by year after summarise(); reorder across all years
mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE)) %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
In this way we look at the last values of mean and reorder variable accordingly.
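As a small sanity check of what fct_reorder() does with tail, here is a sketch on a hand-made data frame (toy and its values are made up for the illustration). The rows are sorted by year, so tail(mean, n = 1) picks each variable's value in the last year.

```r
library(forcats)

# Made-up data: three variables observed in two years, rows sorted by year
toy <- data.frame(
  year     = rep(c(2000, 2010), each = 3),
  variable = rep(c("x", "y", "z"), times = 2),
  mean     = c(0.1, 0.2, 0.3, 0.5, 0.9, 0.7)
)

# Order the levels by each variable's last value, largest first
reordered <- fct_reorder(toy$variable, toy$mean, tail, n = 1, .desc = TRUE)
levels(reordered)
# "y" "z" "x"
```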
There's another way without adding a new column using fct_reorder2():
library(tidyverse)
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean))) +
geom_line() +
labs(color = "variable")
Although it's not advisable in your case, to order the legend by the first (earliest) values in your plot you can set .fun = first2:
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean, .fun = first2))) +
geom_line() +
labs(color = "variable")
The default is .fun = last2 (see also https://forcats.tidyverse.org/reference/fct_reorder.html)
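To compare the two summary functions side by side, here is a sketch on a made-up data frame (toy and its values are assumptions for the illustration). last2 uses the mean at the largest year, first2 the mean at the smallest year, and fct_reorder2() sorts in descending order by default.

```r
library(forcats)

# Made-up data: three variables observed in two years
toy <- data.frame(
  year     = rep(c(2000, 2010), each = 3),
  variable = rep(c("x", "y", "z"), times = 2),
  mean     = c(0.1, 0.2, 0.3, 0.5, 0.9, 0.7)
)

# Default (.fun = last2): order by the mean at the latest year
by_last <- fct_reorder2(toy$variable, toy$year, toy$mean)
levels(by_last)
# "y" "z" "x"

# .fun = first2: order by the mean at the earliest year
by_first <- fct_reorder2(toy$variable, toy$year, toy$mean, .fun = first2)
levels(by_first)
# "z" "y" "x"
```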
I am trying to plot my data as a stacked bar chart using the ggplot2 package. I want to:
get the dataframe's row names on the x axis;
sum up the values by month and show the split by each column as well;
order the values in decreasing order for every month.
My data:
neg.trans <- data.frame( Fraud = c(1.686069964, 2.95565648,
1.170119649,0.429596978),
DeviceDeposit= c( 0.86629,0.61366,0.97226,0.42835),
Usagefees= c(2.2937235,2.294725,2.587091,1.841178),
SecurityDeposit= c(1.616816492, 3.036161258,5.820125209, 2.62082681),
row.names=c("2018-Oct","2018-Nov","2018-Dec","2019-Jan"))
I'd like to generate a chart that looks like below:
Is it possible to do this with R?
Here is an improved handling of the dates, first with the tidyverse and then as a more base R (well, still using ggplot2...) solution:
library(tidyverse)
my.df <- neg.trans %>%
# Convert the row.names to a proper column so it can be the x-axis
rownames_to_column("Date") %>%
# Format the Date column with parse_date; %Y is the symbol for year, %b for abbreviated months
mutate(Date = parse_date(Date, format = "%Y-%b")) %>%
# Transform the data from wide to long format
gather("type", "value", -Date)
ggplot(my.df, aes(Date, value, fill = type)) +
geom_col() +
scale_x_date(date_labels = "%Y-%b") # Take care of the correct date-labels
library(ggplot2)
# Convert the row.names to a proper column so it can be the x-axis
neg.trans$Date <- row.names(neg.trans)
# Columns which should be gathered into one
ids <- c("Fraud", "DeviceDeposit", "Usagefees", "SecurityDeposit")
# Transform the data from wide to long format
my.df <- reshape(neg.trans, idvar = "Date", varying = list(ids),
times = ids, v.names = "value", direction = "long")
row.names(my.df) <- NULL
# Add a day to each Date so we can transform it
my.df$Date <- paste0(my.df$Date, "-01")
# Format the Date column with as.Date; %Y is for year, %b for abbreviated months, %d for day
my.df$Date <- as.Date(my.df$Date, format = "%Y-%b-%d")
ggplot(my.df, aes(Date, value, fill = time)) +
geom_col() +
scale_x_date(date_labels = "%Y-%b")
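As a side note, gather() still works, but the tidyr documentation marks it as superseded by pivot_longer(). A minimal sketch of the same wide-to-long step with pivot_longer(), using the data from the question (the name long is just illustrative, and the Date column is left as text here):

```r
library(tidyr)
library(tibble)

neg.trans <- data.frame(
  Fraud           = c(1.686069964, 2.95565648, 1.170119649, 0.429596978),
  DeviceDeposit   = c(0.86629, 0.61366, 0.97226, 0.42835),
  Usagefees       = c(2.2937235, 2.294725, 2.587091, 1.841178),
  SecurityDeposit = c(1.616816492, 3.036161258, 5.820125209, 2.62082681),
  row.names       = c("2018-Oct", "2018-Nov", "2018-Dec", "2019-Jan")
)

# Move the row names into a Date column, then stack the four value columns
wide <- rownames_to_column(neg.trans, "Date")
long <- pivot_longer(wide, cols = -Date, names_to = "type", values_to = "value")

head(long, 4)
```

The result has one row per month and type, with the column names type and value used by the tidyverse code above.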
Descending ordering
If you want to order your columns individually you can do the following (adapted from https://stackoverflow.com/a/53598064/5892059)
my.df <- my.df %>%
arrange(Date, type) %>%
mutate(type = factor(type)) %>%
arrange(Date, -value)
aux <- with(my.df, match(sort(unique(type)), type))
ggplot(my.df, aes(Date, value, fill = interaction(-value, Date))) +
geom_col() +
scale_fill_manual(values = scales::hue_pal()(4)[my.df$type],
labels = with(my.df, type[aux]),
breaks = with(my.df, interaction(-value, Date)[aux])) +
scale_x_date(date_labels = "%Y-%b")
In my opinion that looks confusing.
This? Hopefully someone suggests an edit. The way I've handled the date is really not the best.
library(tidyverse)
df<-neg.trans %>%
mutate(Date=row.names(.),Day=rep(1,nrow(.)),Date=paste(Date,Day,sep="-0"))
df<-df %>%
mutate(Date=as.factor(Date))
levels(df$Date)<-c("2018-Oct-01","2018-Nov-01","2018-Dec-01","2019-Jan-01")
df%>%
gather("ID","Value",-Date,-Day) %>%
select(-Day) %>%
ggplot(aes(Date,Value,fill=ID)) + geom_col()
NOTE:
Months<-sapply(strsplit(as.character(df$Date),"-"),"[[",2)
Months<-recode(Months,"Dec"=12,"Nov"=11,"Oct"=10,"Jan"=1)
df %>%
mutate(Months=Months,Date=str_remove_all(df$Date,"-.*"),
Date=make_date(Date,Months,Day),Date=as.factor(Date)) %>%
gather("ID","Value",-Date,-Day,-Months) %>%
arrange(Date) %>%
select(-Day,-Months) %>%
ggplot(aes(Date,Value,fill=ID)) + geom_col()
I have data similar to the example below:
library(dplyr)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance)
Now I need to build a plot with stacked sums of distance in each day, where subsequent layers would refer to different carriers. I mean something similar to
ggplot(diamonds, aes(x = price, fill = cut)) + geom_area(stat = "bin")
but with sum as a stat.
I have tried with
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier, group = carrier), stat = 'sum')
but it didn't do the trick, resulting in
Error in f(...) : Aesthetics can not vary with a ribbon
It's pretty easy with geom_bar, but any ideas how to make a stacked geom_area plot?
library(dplyr)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'),
format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
group_by(date, carrier) %>%
summarise(distance = sum(distance)) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier,
group = carrier), stat = 'identity')
should do the trick.
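Here is the same aggregate-then-plot idea as a sketch on a tiny made-up data set, so you can see what the summarised table looks like without nycflights13. flights_toy and its values are assumptions for the illustration. The complete() step is my addition: it fills missing date/carrier pairs with zero, which may help because stacked areas can look odd when a carrier has no row for some dates.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up flights: carrier AA has no flight on the third day
flights_toy <- data.frame(
  date     = as.Date("2013-01-01") + c(0, 0, 0, 1, 1, 2),
  carrier  = c("AA", "AA", "UA", "AA", "UA", "UA"),
  distance = c(100, 200, 300, 400, 500, 600)
)

daily <- flights_toy %>%
  group_by(date, carrier) %>%
  summarise(distance = sum(distance), .groups = "drop") %>%  # one row per date and carrier
  complete(date, carrier, fill = list(distance = 0))         # add missing pairs as zero

daily

# One stacked area per carrier, stacked on the daily sums
p <- ggplot(daily, aes(date, distance, fill = carrier)) +
  geom_area()
```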