I have what I think is a version of remove data points when using stat_summary to generate mean and confidence band or How to set multiple colours in a ggplot2 stat_summary plot? and may also relate to this bug report relating to the SE parameter https://github.com/tidyverse/ggplot2/issues/1546, but I can't seem to figure out what I am doing wrong.
I have weekly data and I am trying to plot current year, previous year, 5 year average, and 5 year range. I can get the plot and all the elements that I want, but I can't get the fill in the range to relate to my scale_fill command.
Here is the code I am using:
library(plyr)
require(dplyr)
require(tidyr)
library(ggplot2)
library(lubridate)
library(zoo)
library(viridis)
ggplot(df1,aes(week,value)) +
geom_point(data=subset(df1,year(date)==year(Sys.Date()) ),size=1.7,aes(colour="1"))+
geom_line(data=subset(df1,year(date)==year(Sys.Date()) ),size=1.7,aes(colour="1"))+
geom_line(data=subset(df1,year(date)==year(Sys.Date())-1 ),size=1.7,aes(colour="2"))+
geom_point(data=subset(df1,year(date)==year(Sys.Date())-1 ),size=1.7,aes(colour="2"))+
#stat_summary(data=subset(df1,year(date)<year(Sys.Date()) &year(date)>year(Sys.Date())-6),geom = 'smooth', alpha = 0.2,size=1.7,
# fun.data = median_hilow,aes(colour=c("1","2","3"),fill="range"))+
stat_summary(data=subset(df1,year(date)<year(Sys.Date()) &year(date)>year(Sys.Date())-6),geom="smooth",fun.y = mean, fun.ymin = min, fun.ymax = max,size=1.7,aes(colour="c",fill="b"))+
#stat_summary(fun.data=mean_cl_normal, geom='smooth', color='black')+
scale_color_viridis("",discrete=TRUE,option="C",labels=c(year(Sys.Date()), year(Sys.Date())-1,paste(year(Sys.Date())-6,"-",year(Sys.Date())-1,"\naverage",sep ="")))+
scale_fill_viridis("",discrete=TRUE,option="C",labels=paste(year(Sys.Date())-6,"-",year(Sys.Date())-1,"\nrange",sep =""))+
#scale_fill_continuous()+
scale_x_continuous(limits=c(min(df1$week),max(df1$week)),expand=c(0,0))+
theme_minimal()+theme(
legend.position = "bottom",
legend.margin=margin(c(0,0,0,0),unit="cm"),
legend.text = element_text(colour="black", size = 12),
plot.caption = element_text(size = 14, face = "italic"),
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(size = 14, face = "italic"),
#panel.grid.minor = element_blank(),
text = element_text(size = 14,face = "bold"),
axis.text.y =element_text(size = 14,face = "bold", colour="black"),
axis.text.x=element_text(size = 14,face = "bold", colour="black",angle=90, hjust=1),
)+
labs(y="Crude Oil Imports \n(Weekly, Thousands of Barrels per Day)",x="Week",
title=paste("US Imports of Crude Oil",sep=""),
caption="Source: EIA API, graph by Andrew Leach.")
I have placed a test.Rdata file here with the df1 data frame: https://drive.google.com/file/d/1aMt4WQaOi1vFJcMlgXFY7dzF_kjbgBiU/view?usp=sharing
Ideally, I'd like to have a fill legend item that looks like this, only with the text as I have it in my graph:
Any help would be much appreciated.
The short answer is that you seem to be misunderstanding how ggplot's scale_xx_xx commands are meant to be used (this trips up a lot of people). Whenever possible, the intention is for the aesthetics (the aes() bit inside most geoms) to be mapped to the scale functions. For example, the following code maps year to line color:
plot.simple <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line()
print(plot.simple)
Since we specified that year (converted to a factor) should be used to define line color, ggplot defaults to using scale_color_hue. We could use a different scale:
plot.gray <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line() +
scale_color_grey()
print(plot.gray)
If we don't want to tie aesthetics such as color or fill to values in the data, we can just specify them outside of the call to aes(). Typically you only do this if you don't have multiple values for an aesthetic:
plot.simple <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line(alpha = 0.2)
print(plot.simple)
But you're in the unenviable position of wanting both of these things at once. For your 2017 and 2018 lines, color is meaningful. For the summary ribbon and its associated line, color is just decorative. In such cases, I usually avoid ggplot's built-in summary functions, since they can often "help" in ways that end up confusing or cumbersome.
I would suggest creating two data sets, one containing the 2017 and 2018 years, and the other containing the summary statistics for the ribbon:
df.years <- df1 %>%
mutate(year = year(date)) %>%
filter(year >= year(Sys.Date()) - 1)
df.year.range <- df1 %>%
mutate(year = year(date)) %>%
filter(year >= year(Sys.Date()) - 5 & year <= year(Sys.Date()) - 1) %>%
group_by(week) %>%
summarize(mean = mean(value), min = min(value), max = max(value))
We can then trick ggplot into printing a nice title for the fill on the legend, by setting fill inside aes to the intended string. Because fill is set in aes(), we control its color with scale_fill_manual.
the.plot <- ggplot() +
geom_ribbon(data = df.year.range, aes(x = week, ymin = min, ymax = max, fill = 'Previous 5 Year Range\nof Weekly Imports')) +
geom_line(data = df.year.range, aes(x = week, y = mean), color = 'purple') +
geom_line(data = df.years, aes(x = week, y = value, color = as.factor(year))) +
geom_point(data = filter(df.years, year == year(Sys.Date())), aes(x = week, y = value, color = as.factor(year))) +
scale_fill_manual(values = '#ffccff')
print(the.plot)
This is still rather cumbersome, because you have quite a few different elements tied to various different sources of data (lines for some years, points for others, a ribbon for a summary, etc). But it gets the job done!
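For anyone who wants to check the constant-in-aes() trick without the original data, here is a minimal sketch on made-up numbers. The toy data frame, its column names (lo, hi) and the label text are assumptions for illustration only; layer_data() is used just to inspect which colour ends up on the ribbon.

```r
library(ggplot2)

# Made-up weekly range (hypothetical; stands in for df.year.range)
toy <- data.frame(week = 1:10, lo = 1:10, hi = (1:10) + 5)

p <- ggplot(toy, aes(x = week)) +
  # A constant string inside aes() creates a one-level discrete fill scale,
  # so the string shows up as the legend entry
  geom_ribbon(aes(ymin = lo, ymax = hi, fill = "5-year range")) +
  # The manual scale then decides which colour that single level gets
  scale_fill_manual(name = NULL, values = c("5-year range" = "#ffccff"))

# Inspect the computed layer data: every row carries the manual colour
ribbon_fill <- unique(layer_data(p)$fill)
ribbon_fill
```

Naming the value in scale_fill_manual() is optional with a single level, but it keeps the colour attached to the right label if more fill levels are added later.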
I made a map in R and was wondering how to label the States Codes (variable which is in my dataset) appropriately. Using the simple geom_text or even geom_text_repel I get a lot of labels for each State (I can actually understand why), as I proceed to show:
Map
How can I solve it so each State gets 1 and only 1 text abbreviation (these State Codes are in my dataset as a variable under the name State Codes)? Thanks in advance.
Code below:
library(tidyverse)
library(maps)
library(wesanderson)
library(hrbrthemes)
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(aes(label = black_percentage)) +
theme_void() +
theme(legend.position = "bottom",
legend.title = element_blank(),
plot.title = element_text(hjust = 0.5, family = "Times", face = "bold"),
plot.subtitle = element_text(hjust = 0.5, family = "Times", face = "italic"),
plot.caption = element_text(family = "Times", face = "italic"),
legend.key.height = unit(0.85, "cm"),
legend.key.width = unit(0.85, "cm")) +
scale_fill_gradient(low = "#E6A0C4",
high = "#7294D4") +
labs(title = "Percentage of Black People, US States 2018",
subtitle = "Pink colors represent lower percentages. Light-blue colors represent higher percentages")
ggsave("failed_map.png")
Can you provide the/some sample data?
One possible reason for multiple labels is that each state has multiple rows in the data, so ggplot thinks it needs to plot multiple labels. If you only need a single label, a solution is to create a separate summary dataset, which has only one row for each state/label. You then provide this summary data to geom_text() rather than the original data. Although not the problem in this instance, this is also a solution to the common problem of 'blurry' labels: when tens or hundreds of labels are printed on top of one another they appear blurry, but when a single label is printed it appears fine.
Looking at your code and mapping aesthetics, it looks like geom_text() is inheriting the x and y aesthetics from the first ggplot() line. Therefore geom_text() will make a label for every value of x and y (long and lat) per state. This also explains why the labels all appear to follow the state borders.
I would suggest that you summarise each state to a single (x, y) coordinate (e.g. the middle of the state), and give this to geom_text(). Again, without some sample data it may be hard to explain, but something like:
# make the summary label dataframe
state_labels <- your_data %>%
group_by(state) %>%
summarise(
long = mean(long),
lat = mean(lat),
mean_black = mean(black_percentage)
)
# then we plot it
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(data = state_labels, aes(label = mean_black))
Because the x and y coordinate columns have the same names in your data and in the new state_labels summary (long and lat), geom_text() will 'inherit' (assume/use) the same x and y aesthetics that you supplied inside the first line of ggplot(). This is convenient, but it can cause you grief when the two datasets do not share column names or when you want to assign different aesthetics. Here, geom_text() also inherits group = group and fill = black_percentage, and because state_labels has neither column, ggplot2 will usually stop with an "object not found" error, even though geom_text() has no fill aesthetic. To disable aesthetic inheritance, simply provide inherit.aes = FALSE to the geom. In this instance it would look like this; note how we now provide geom_text() with x and y aesthetics.
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(data = state_labels, aes(x = long, y = lat, label = mean_black), inherit.aes = FALSE)
EDIT If you want a single label, but the label is not a numeric value and you can't calculate a summary statistic using mean or similar, then the same principles apply: you want to create a summarised version of the data, with a single coordinate pair and a single label for each state (one row per state). There are many ways to do this, but my go-to would be something like dplyr::first or similar.
# make the summary label dataframe
state_labels <- your_data %>%
group_by(state) %>%
summarise(
long = mean(long),
lat = mean(lat),
my_label = first(`State Codes`)
)
# then we plot it
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(data = state_labels, aes(x = long, y = lat, label = my_label), inherit.aes = FALSE)
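Without sample data it is hard to test this, so here is a small sketch on two made-up square "states". The toy_map data frame and the state codes AA and BB are hypothetical; only the column layout (long, lat, group) follows the question. In this sketch group and fill are mapped inside geom_polygon() rather than in ggplot(), so the text layer inherits only x and y.

```r
library(dplyr)
library(ggplot2)

# Two made-up square polygons, four vertices each (hypothetical data)
toy_map <- data.frame(
  state = rep(c("AA", "BB"), each = 4),
  group = rep(1:2, each = 4),
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  black_percentage = rep(c(10, 30), each = 4)
)

# One row per state: the mean of the vertices is a rough label position
state_labels <- toy_map %>%
  group_by(state) %>%
  summarise(long = mean(long), lat = mean(lat), .groups = "drop")

p <- ggplot(toy_map, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = black_percentage), colour = "black") +
  geom_text(data = state_labels, aes(label = state))

# The text layer now draws exactly one label per state
n_labels <- nrow(layer_data(p, 2))
n_labels
```

The mean of the border vertices is not a true centroid, so for irregular shapes the label may sit off-centre; it is usually close enough for state-sized polygons.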
Finally, ggplot has several built-in functions to plot and map spatial data. It is a good idea to use these where possible, as it will make your life a lot easier. A great 3-part tutorial can be found here, and it even includes an example of exactly what you are trying to do.
I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so
df2019 <- data.frame(Role = c("A","B","C"),Women_percent = c(65,50,70),Men_percent = c(35,50,30), Women_total =
c(130,100,140), Men_total = c(70,100,60))
df2016 <- data.frame(Role= c("A","B","C"),Women_percent = c(70,45,50),Men_percent = c(30,55,50),Women_total =
c(140,90,100), Men_total = c(60,110,100))
all_melted <- reshape2::melt(
rbind(cbind(df2019, year=2019), cbind(df2016, year=2016)),
id=c("year", "Role"))
There's no reason I need the data in melted form; I just did it because I was plotting bar graphs with it. Now I need a line graph, and I don't know how to make line graphs from melted data, or how to keep the 2019/2016 tag if the data is not melted. When I try to make a line graph I don't know how to specify which "variable" will be used. I want the lines to be the Women and Men percent values, and the labels to be the totals. (In this picture the geom_text is the percent values; I want it to use the total values.)
Crucially, I want the linetype to be dotted in 2016 and for the legend to show that.
I think it would be simplest to rbind the two frames after labelling them with their year, then reshape the result so that you have columns for role, year, gender, percent and total.
I would then use a bit of alpha scale trickery to hide the points and labels from 2016:
library(dplyr)
library(tidyr)
library(ggplot2)
df2016$year <- 2016
df2019$year <- 2019
rbind(df2016, df2019) %>%
pivot_longer(cols = 2:5, names_sep = "_", names_to = c("Gender", "Type")) %>%
pivot_wider(names_from = Type) %>%
ggplot(aes(Role, percent, color = Gender,
linetype = factor(year),
group = paste(Gender, year))) +
geom_line(size = 1.3) +
geom_point(size = 10, aes(alpha = year)) +
geom_text(aes(label = total, alpha = year), colour = "black") +
scale_colour_manual(values = c("#07aaf6", "#ef786f")) +
scale_alpha(range = c(0, 1), guide = guide_none()) +
scale_linetype_manual(values = c(2, 1)) +
labs(y = "Percent", color = "Gender", linetype = "Year")
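To see what the reshaping step produces before it reaches ggplot(), here is the same idea run on its own, using the two data frames from the question. The name long_df is just for illustration, and I select the value columns by name instead of by position, which is an assumption about what is most robust rather than a requirement.

```r
library(dplyr)
library(tidyr)

df2019 <- data.frame(Role = c("A", "B", "C"),
                     Women_percent = c(65, 50, 70), Men_percent = c(35, 50, 30),
                     Women_total = c(130, 100, 140), Men_total = c(70, 100, 60))
df2016 <- data.frame(Role = c("A", "B", "C"),
                     Women_percent = c(70, 45, 50), Men_percent = c(30, 55, 50),
                     Women_total = c(140, 90, 100), Men_total = c(60, 110, 100))

long_df <- bind_rows(mutate(df2016, year = 2016),
                     mutate(df2019, year = 2019)) %>%
  # "Women_percent" is split at "_" into Gender = "Women", Type = "percent"
  pivot_longer(cols = -c(Role, year), names_sep = "_",
               names_to = c("Gender", "Type")) %>%
  # percent and total become separate columns again, one row per Role/year/Gender
  pivot_wider(names_from = Type, values_from = value)

long_df
```

Each row now holds one point on the chart (Role, year, Gender) together with both the percent to plot and the total to print as its label.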
I am trying to find accumulated values for each year of variables A to Z in myData. I have tried a few things but didn't succeed. Once I do that, I would then need to compute maximum, minimum, median, upper and lower quartile average across all those years. Here is my laborious code so far; I don't have any idea how to proceed further, and in fact the current code is not giving me what I am after.
library(tidyverse)
mydate <- as.data.frame(seq(as.Date("2000-01-01"), to= as.Date("2019-12-31"), by="day"))
colnames(mydate) <- "Date"
Data <- data.frame(A = runif(7305,0,10),
J = runif(7305,0,8),
X = runif(7305,0,12),
Z = runif(7305,0,10))
DF <- data.frame(mydate, Data)
myData <- DF %>% separate(Date, into = c("Year","Month","Day")) %>%
sapply(as.numeric) %>%
as.data.frame() %>%
mutate(Date = DF$Date) %>%
filter(Month > 4 & Month < 11) %>%
mutate(DOY = format(Date, "%j")) %>%
group_by(Year) %>%
mutate(cumulativeSum = accumulate(DOY))
I am trying to get a figure like the one below for A, J, X, Z. Any help would be appreciated.
Update (EDIT)
My question is pretty confusing, so I decided to break it down into steps using Excel. Here I am using only one variable, which in this case is A (note: in my question I have multiple variables). I am accumulating data from May to October each year, which is reflected in the cumulative sum column. In the second step (Step-2), I re-arrange the data by day of the year (May to October). In Step-3, I take the statistics I mentioned earlier across all the years for every day of the year. I have tried to clarify as much as I could, but this is probably a bit of a strange question.
Ultimate Figure
Here is an example Figure that i would like to derive as a result of this exercise.
So, if I understand correctly, you are trying to plot descriptive statistics of the cumulative values of each variable between May and October for the years 2000 to 2019.
Here is a possible solution that first calculates descriptive statistics for each variable (using the dplyr, lubridate, and tidyr packages). I encourage you to break this code into several parts in order to understand all the steps.
Basically, I isolate the month and year of the date, pivot the dataframe into a longer format, filter to keep only values in the period of interest (May to October), and calculate the cumulative sum of values grouped by variable and year. Then, I create a fake date (by pasting a consistent year onto the real month and day) in order to calculate descriptive statistics as a function of this date and variable.
Altogether, it gives something like that:
library(lubridate)
library(dplyr)
library(tidyr)
mydata <- DF %>% mutate(Year = year(Date), Month = month(Date)) %>%
pivot_longer(-c(Date,Year,Month), names_to = "variable", values_to = "values") %>%
filter(between(Month,5,10)) %>%
group_by(Year, variable) %>%
mutate(Cumulative = cumsum(values)) %>%
mutate(NewDate = ymd(paste("2020", Month,day(Date), sep = "-"))) %>%
ungroup() %>%
group_by(variable, NewDate) %>%
summarise(Median = median(Cumulative),
Maximum = max(Cumulative),
Minimum = min(Cumulative),
Upper = quantile(Cumulative,0.75),
Lower = quantile(Cumulative, 0.25))
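If you want to convince yourself that the cumulative sum restarts every year and that the fake date lines the years up, here is a small sketch on made-up data (two years, a constant value of 1 per day, so the running total is simply a day count). The names toy and cum are hypothetical, and I use lubridate's make_date() instead of pasting strings, which should be equivalent here.

```r
library(dplyr)
library(lubridate)

# Made-up data: every day of 2000 and 2001 with a value of 1 (assumption)
toy <- data.frame(Date = seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day"))
toy$A <- 1

cum <- toy %>%
  filter(between(month(Date), 5, 10)) %>%       # keep May to October only
  group_by(Year = year(Date)) %>%               # so cumsum() restarts each year
  mutate(Cumulative = cumsum(A),
         # same month and day, but one shared year, so years can be compared
         NewDate = make_date(2020, month(Date), day(Date))) %>%
  ungroup()

# One row per year on the first shared date
filter(cum, NewDate == as.Date("2020-05-01"))
```

Since the period runs from May to October, the leap day never appears, so any fixed year works for the fake date.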
Then, you can get a similar plot to your example by doing:
library(ggplot2)
ggplot(mydata, aes(x = NewDate))+
geom_ribbon(aes(ymin = Lower, ymax = Upper), color = "grey", alpha =0.5)+
geom_line(aes(y = Median), color = "darkblue")+
geom_line(aes(y = Maximum), color = "red", linetype = "dashed", size = 1.5)+
geom_line(aes(y = Minimum), color ="red", linetype = "dashed", size = 1.5)+
facet_wrap(~variable, scales = "free")+
scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month")+
ylab("Daily Cumulative Precipitation (mm)")
Does it look like what you are trying to achieve?
EDIT: Adding Legends
Adding a legend here is not easy, as you are using different geoms (ribbon, line) with different colors, shapes, ...
So, one way is to regroup the statistics that can be plotted with the same geom and do:
mydata %>% pivot_longer(cols = c(Median, Minimum,Maximum), names_to = "Statistic",values_to = "Value") %>%
ggplot(aes(x = NewDate))+
geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = "Upper / Lower"), alpha =0.5)+
geom_line(aes(y = Value, color = Statistic, linetype = Statistic, size = Statistic))+
facet_wrap(~variable, scales = "free")+
scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month")+
ylab("Daily Cumulative Precipitation (mm)")+
scale_size_manual(values = c(1.5,1,1.5))+
scale_linetype_manual(values = c("dashed","solid","dashed"))+
scale_color_manual(values = c("red","darkblue","red"))+
scale_fill_manual(values = "grey", name = "")
So, it looks good, but as you can see, it's a little bit weird, as the Upper / Lower entry sits slightly apart from the main legend.
Another solution is to add legends as labeling on the last date. For that, you can create a second dataframe by subsetting only the last date of your first dataframe:
mydata_label <- mydata %>% filter(NewDate == max(NewDate)) %>%
pivot_longer(cols = Median:Lower, names_to = "Stat",values_to = "val")
Then, without changing much the plotting part, you can do:
ggplot(mydata, aes(x = NewDate))+
geom_ribbon(aes(ymin = Lower, ymax = Upper), alpha =0.5)+
geom_line(aes(y = Median), color = "darkblue")+
geom_line(aes(y = Maximum), color = "red", linetype = "dashed", size = 1.5)+
geom_line(aes(y = Minimum), color ="red", linetype = "dashed", size = 1.5)+
facet_wrap(~variable, scales = "free")+
scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month", limits = c(min(mydata$NewDate),max(mydata$NewDate)+25))+
ylab("Daily Cumulative Precipitation (mm)")+
geom_text(data = mydata_label,
aes(x = NewDate+5, y = val, label = Stat, color = Stat), size = 2, hjust = 0, show.legend = FALSE)+
scale_color_manual(values = c("Median" = "darkblue","Maximum" = "red","Minimum" = "red","Upper" = "black", "Lower" = "black"))
I deliberately reduced the size of the text labels because of space constraints, so that you can see all of them. But based on the figure you attached to your question, you should have plenty of space to make it work.
Edit: This question has been marked as duplicated, but the responses here have been tried and did not work because the case in question is a line chart, not a bar chart. Applying those methods produces a chart with 5 lines, 1 for each year - not useful. Did anyone who voted to mark as duplicate actually try those approaches on the sample dataset supplied with this question? If so please post as an answer.
Original Question:
There's a feature in Excel pivot charts which allows multilevel categorical axes. I'm trying to find a way to do the same thing with ggplot (or any other plotting package in R).
Consider the following dataset:
set.seed(1)
df=data.frame(year=rep(2009:2013,each=4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
sales=40:59+rnorm(20,sd=5))
If this is imported to an Excel pivot table, it is straightforward to create the following chart:
Note how the x-axis has two levels, one for quarter and one for the grouping variable, year. Are multilevel axes possible with ggplot?
NB: There is a hack with facets that produces something similar, but this is not what I'm looking for.
library(ggplot2)
ggplot(df) +
geom_line(aes(x=quarter,y=sales,group=year))+
facet_grid(.~year,scales="free")
New labels are added using annotate(geom = "text", ...). Turn off clipping of the x-axis labels with clip = "off" in coord_cartesian.
Use theme to add extra margins (plot.margin) and to remove (element_blank()) the x-axis title and text (axis.title.x, axis.text.x) and the vertical grid lines (panel.grid.major.x, panel.grid.minor.x).
library(ggplot2)
ggplot(data = df, aes(x = interaction(year, quarter, lex.order = TRUE),
y = sales, group = 1)) +
geom_line(colour = "blue") +
annotate(geom = "text", x = seq_len(nrow(df)), y = 34, label = df$quarter, size = 4) +
annotate(geom = "text", x = 2.5 + 4 * (0:4), y = 32, label = unique(df$year), size = 6) +
coord_cartesian(ylim = c(35, 65), expand = FALSE, clip = "off") +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
See also the nice answer by @eipi10 here: Axis labels on two lines with nested x variables (year below months)
The code suggested by Henrik does work and helped me a lot! I think the solution has a high value. But please be aware that there is a small mistake in the first line of the code, which results in the wrong order of the data.
Instead of
... aes(x = interaction(year,quarter), ...
it should be
... aes(x = interaction(quarter,year), ...
The resulting graphic has the data in the right order.
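The ordering can be checked on the factor levels directly. This is only a sketch using the df from the question; it compares the default interaction() ordering with lex.order = TRUE and with the arguments swapped. The lv_* names are just for illustration.

```r
set.seed(1)
df <- data.frame(year = rep(2009:2013, each = 4),
                 quarter = rep(c("Q1", "Q2", "Q3", "Q4"), 5),
                 sales = 40:59 + rnorm(20, sd = 5))

# Default: the first factor varies fastest, so all the Q1s come first
lv_default <- levels(interaction(df$year, df$quarter))
# lex.order = TRUE: the last factor varies fastest, so quarters run within each year
lv_lex <- levels(interaction(df$year, df$quarter, lex.order = TRUE))
# Swapped arguments: quarter varies fastest, again grouped by year
lv_swapped <- levels(interaction(df$quarter, df$year))

head(lv_default, 4)  # "2009.Q1" "2010.Q1" "2011.Q1" "2012.Q1"
head(lv_lex, 4)      # "2009.Q1" "2009.Q2" "2009.Q3" "2009.Q4"
head(lv_swapped, 4)  # "Q1.2009" "Q2.2009" "Q3.2009" "Q4.2009"
```

So either lex.order = TRUE or swapping the arguments gives a chronological axis; the plain interaction(year, quarter) does not.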
P.S. I suggested an edit (which was rejected until now) and, due to a small lack of reputation, I am not allowed to comment, what I rather would have done.
User Tung had a great answer on this thread
library(tidyverse)
library(lubridate)
library(scales)
set.seed(123)
df <- tibble(
date = as.Date(41000:42000, origin = "1899-12-30"),
value = c(rnorm(500, 5), rnorm(501, 10))
)
# create year column for facet
df <- df %>%
mutate(year = as.factor(year(date)))
p <- ggplot(df, aes(date, value)) +
geom_line() +
geom_vline(xintercept = as.numeric(df$date[yday(df$date) == 1]), color = "grey60") +
scale_x_date(date_labels = "%b",
breaks = pretty_breaks(),
expand = c(0, 0)) +
# switch the facet strip label to the bottom
facet_grid(.~ year, space = 'free_x', scales = 'free_x', switch = 'x') +
labs(x = "") +
theme_classic(base_size = 14, base_family = 'mono') +
theme(panel.grid.minor.x = element_blank()) +
# remove facet spacing on x-direction
theme(panel.spacing.x = unit(0,"line")) +
# switch the facet strip label to outside
# remove background color
theme(strip.placement = 'outside',
strip.background.x = element_blank())
p
I am trying to make some changes to my plot, but am having difficulty doing so.
(1) I would like warm, avg, and cold to be filled in as the colors red, yellow, and blue, respectively.
(2) I am trying to make the y-axis read "Count" and have it be horizontally written.
(3) In the legend, I would like the title to be Temperatures, rather than variable
Any help making these changes would be much appreciated along with other suggestions to make the plot look nicer.
df <- read.table(textConnection(
'Statistic Warm Avg Cold
Homers(Away) 1.151 1.028 .841
Homers(Home) 1.202 1.058 .949'), header = TRUE)
library(ggplot2)
library(reshape2)
df <- melt(df, id = 'Statistic')
ggplot(
data = df,
aes(
y = value,
x = Statistic,
group = variable,
shape = variable,
fill = variable
)
) +
geom_bar(stat = "identity")
You are on the right lines by trying to reshape the data into long format. My preference is to use gather from the tidyr package for that. You can also create the variable names Temperatures and Count in the gather step.
The next step is to turn the 3 classes of temperature into a factor, ordered from cold, through average, to warm.
Now you can plot. You want position = "dodge" to get the bars side by side, since it makes no sense to stack the values in a single bar. Fill colours you specify using scale_fill_manual.
You rotate the y-axis title by manipulating axis.title.y.
So putting all of that together (plus a black/white theme):
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
gather(Temperatures, Count, -Statistic) %>%
mutate(Temperatures = factor(Temperatures, c("Cold", "Avg", "Warm"))) %>%
ggplot(aes(Statistic, Count)) +
geom_col(aes(fill = Temperatures), position = "dodge") +
scale_fill_manual(values = c("blue", "yellow", "red")) +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
Result:
I'd question whether Count is a sensible variable name in this case.
You are almost there. To map specific colors to specific factor levels you can use scale_fill_manual and create your own scale:
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
Changing the y-axis title is also easy in ggplot:
ylab("Count") +
And to change the legend title you can use:
labs(fill='TEMPERATURE') +
Giving us:
ggplot(df, aes(y = value, x = Statistic, group= variable, fill = variable)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
labs(fill='TEMPERATURE') +
ylab("Count") +
xlab("") +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
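One difference between the two answers is worth checking: an unnamed values vector is matched to the factor levels by position, while a named one is matched by level name. Here is a minimal sketch on a made-up three-row data frame (toy is hypothetical, with the values copied from the Homers(Away) row); layer_data() is only used to read back the colours.

```r
library(ggplot2)

# Made-up data: one statistic, three temperature classes
toy <- data.frame(
  Statistic = "Homers(Away)",
  Temperatures = factor(c("Cold", "Avg", "Warm"), levels = c("Cold", "Avg", "Warm")),
  Count = c(0.841, 1.028, 1.151)
)

p <- ggplot(toy, aes(Statistic, Count, fill = Temperatures)) +
  geom_col(position = "dodge") +
  # Named values: the order written here does not have to follow the level order
  scale_fill_manual(values = c("Warm" = "red", "Avg" = "yellow", "Cold" = "blue"))

# Read back the fill per group (group 1 = Cold, 2 = Avg, 3 = Warm)
ld <- layer_data(p)
fills_by_level <- ld$fill[order(ld$group)]
fills_by_level
```

With named values the colours stay attached to the right class even if the factor levels are reordered later, which is why I would lean towards the named form.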