This is my code:
df %>%
select(`City Name`, `Real Wage 1`, `Real Wage 2`) %>%
pivot_longer(-c(`City Name`)) %>%
group_by(`City Name`) %>%
ggplot() +
aes(x = reorder(`City Name`, value), y = value, fill = name) +
geom_col() +
theme_classic()+
coord_flip()
I get this kind of plot, which is correct.
However, I need the bars of "Real Wage 1" (the red ones) on the left and the bars of "Real Wage 2" (the blue ones) on the right. I want to swap the positions of the two variables, not their colors.
How can I solve this?
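One possible fix is sketched below. The question does not show df, so the three cities and their values are made up. By default, geom_col() stacks the first factor level on top of the stack. After coord_flip(), that puts "Real Wage 1" on the right. position_stack(reverse = TRUE) reverses only the stacking order, so each variable keeps its fill color.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for df: the real data are not shown in the question
df <- tibble(
  `City Name`   = c("City A", "City B", "City C"),
  `Real Wage 1` = c(10, 14, 8),
  `Real Wage 2` = c(5, 3, 9)
)

p <- df %>%
  select(`City Name`, `Real Wage 1`, `Real Wage 2`) %>%
  pivot_longer(-`City Name`) %>%
  ggplot(aes(x = reorder(`City Name`, value), y = value, fill = name)) +
  # reverse = TRUE puts the first level ("Real Wage 1") next to the axis,
  # i.e. on the left once the plot is flipped; the fill colors are unchanged
  geom_col(position = position_stack(reverse = TRUE)) +
  theme_classic() +
  coord_flip()
```

Print p to draw the plot. If the legend should also list the variables in the opposite order, guides(fill = guide_legend(reverse = TRUE)) can be added. The legend order is independent of the stacking order.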
I'd like to shade the area under a geom_step() curve on a plot with a discrete, ordered x-axis, e.g. to show the cumulative distribution for some frequency-ordered categories.
The basic geom_step() curve could be created like this:
library(dplyr)
library(ggplot2)
library(forcats)
diamonds %>%
group_by(color) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf=cumsum(frac_of_tot),
color=fct_reorder(color, ecdf)) %>%
ggplot(aes(x=color, y=ecdf, group=0)) +
geom_step() +
expand_limits(y=0) +
labs(title="a pareto-style cumulative distribution chart",
subtitle="with x-axis ordered by decreasing frequency",
y="cumulative fraction of total") +
theme_minimal()
However, adding the shaded area with geom_rect(), as suggested in this answer, seems to reorder the x-axis, resulting in a nonsensical plot:
diamonds %>%
group_by(color) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf=cumsum(frac_of_tot),
color=fct_reorder(color, ecdf)) %>%
ggplot(aes(x=color, y=ecdf, group=0)) +
geom_step() +
geom_rect(aes(xmin=color, xmax=lead(color), ymin=0, ymax=ecdf), alpha=0.3) +
expand_limits(y=0) +
labs(title="A sudden mess after adding geom_rect",
subtitle="with x-axis surprisingly back in alpha order",
y="cumulative fraction of total") +
theme_minimal()
Why is the geom_rect() layer causing the x-axis to be re-ordered?
How can I produce a plot that looks just like the first one, but with the area under the curve shaded?
It seems to me that doing this with geom_rect() is doing it the hard way. With some minor data reshaping (giving each category a second row that holds the previous cumulative value), you can simply use geom_area():
library(dplyr)
library(ggplot2)
library(forcats)
library(tidyr)
diamonds %>%
group_by(color) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf = cumsum(frac_of_tot),
ecd = lag(ecdf),
color = fct_reorder(color, ecdf)) %>%
pivot_longer(starts_with("ecd")) %>%
arrange(color, name) %>%
ggplot(aes(x = color, y = value, group = 0)) +
geom_area(position = "identity", color = "black", alpha = 0.5) +
expand_limits(y = 0) +
labs(title = "a pareto-style cumulative distribution chart",
subtitle = "with x-axis ordered by decreasing frequency",
y = "cumulative fraction of total") +
theme_minimal()
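If you would rather keep geom_rect(), one workaround is to give the rectangles numeric positions instead of factor values. This is a swapped-in technique, not taken from the answers above. A discrete position scale places numeric values at positions 1, 2, 3, and so on. With inherit.aes = FALSE, the rectangle layer never touches the factor column, so the x scale learns its level order only from the geom_step() layer. The summary object name dat is my own.

geom_step() draws horizontally first. The rectangle from category i to category i + 1 therefore has height ecdf_i. The last category has no lead() partner, so its row is dropped via na.rm = TRUE.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Same summary as in the question, stored in a (hypothetical) object `dat`
dat <- diamonds %>%
  group_by(color) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  mutate(frac_of_tot = count / sum(count),
         ecdf = cumsum(frac_of_tot),
         color = fct_reorder(color, ecdf))

p <- ggplot(dat, aes(x = color, y = ecdf, group = 0)) +
  geom_step() +
  # numeric xmin/xmax: rectangle i spans positions i to i + 1 at height ecdf_i;
  # inherit.aes = FALSE keeps this layer from retraining the discrete x scale
  geom_rect(aes(xmin = as.numeric(color), xmax = lead(as.numeric(color)),
                ymin = 0, ymax = ecdf),
            inherit.aes = FALSE, alpha = 0.3, na.rm = TRUE) +
  expand_limits(y = 0) +
  labs(y = "cumulative fraction of total") +
  theme_minimal()
```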
This question already has an answer here: Align violin plots with dodged box plots (1 answer). Closed 2 years ago.
I'm trying to show my data as a violin plot with an overlaid boxplot. I have four groups, split by two independent factors, so I put in the commands below.
The table has X1, Category, and Area columns.
ggplot(malbdata,aes(x=X1,y=Area,fill=Category))+
geom_violin()+
geom_boxplot(width=.1)
What I get is the attached graph, which places the boxplots next to the violins rather than inside them. I'm very new to working with R; any ideas on what might be going wrong?
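I believe the issue is the width = .1 parameter. Each geom is dodged according to its own width, so the narrow boxplots are shifted by a different amount than the default-width violins, e.g.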
library(tidyverse)
library(palmerpenguins)
penguins %>%
na.omit() %>%
select(species, island, bill_length_mm) %>%
ggplot(aes(x = island, y = bill_length_mm, fill = species)) +
geom_boxplot(width=.1) +
geom_violin()
If you make the widths the same, they line up as expected:
library(tidyverse)
library(palmerpenguins)
penguins %>%
na.omit() %>%
select(species, island, bill_length_mm) %>%
ggplot(aes(x = island, y = bill_length_mm, fill = species)) +
geom_boxplot(width=.2) +
geom_violin(width=.2)
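A common alternative, sketched with the same palmerpenguins data, keeps the boxplots narrow but dodges them by the violins' width. position_dodge(width = 0.9) centers each box inside its violin. The value 0.9 assumes the violins keep geom_violin()'s default width.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

p <- penguins %>%
  na.omit() %>%
  ggplot(aes(x = island, y = bill_length_mm, fill = species)) +
  geom_violin() +
  # narrow boxes, but dodged by the violins' default width (0.9) so they line up
  geom_boxplot(width = 0.1, position = position_dodge(width = 0.9))
```

The design choice here is to separate the two widths: the geom's width controls how wide each box is drawn, while the dodge width controls where it is placed.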
Also, instead of using boxplots and violins (both of which illustrate the distribution of values), it might be better to plot the individual values together with the distribution, e.g.
library(tidyverse)
library(palmerpenguins)
library(ggbeeswarm)
penguins %>%
na.omit() %>%
select(species, island, bill_length_mm) %>%
rename(Species = species, Island = island) %>%
ggplot(aes(x = Island, y = bill_length_mm, fill = Species)) +
geom_boxplot(width=.4, outlier.shape = NA,
position = position_dodge2(preserve = "single")) +
# groupOnX is deprecated in recent ggbeeswarm versions; the grouping axis is detected automatically
geom_quasirandom(aes(colour = Species),
width=.2, alpha = 0.5, dodge.width = 0.4) +
theme_bw(base_size = 16) +
ylab("Bill Length (mm)")
I would like to create a line plot with ggplot's geom_line() in which all years are equally spaced, regardless of the actual values the year variable takes. The points drawn by geom_point() should be connected only when consecutive years are at most two years apart, and not when the temporal distance is larger.
Example:
my.data<-data.frame(
year=c(2001,2003,2005,NA,NA,NA,NA,NA,NA,2019),
value=c(runif(10)))
As for the plot, I have tried two different things, neither of which is ideal:
Plotting year as a continuous variable with breaks at the observed years and no minor breaks. Obviously, the distances between the first three observations are much smaller than the distance between 2005 and 2019, and, unfortunately, all dots are connected:
library(ggplot2)
library(dplyr)
my.data %>%
ggplot(aes(x=year,y=value)) +
geom_line() +
geom_point() +
scale_x_continuous(breaks=c(2001,2003,2005,2019), minor_breaks=NULL) +
theme_minimal()
Removing the NAs and plotting year as a factor gives equal spacing between the years, but removes the lines between the data points. With a discrete x-axis each year becomes its own group, so geom_line() has nothing to connect:
my.data %>%
filter(!is.na(year)) %>%
ggplot(aes(x=factor(year),y=value)) +
geom_line() +
geom_point() +
theme_minimal()
Are there any solutions to these issues? What am I overlooking?
(Images not shown: the result of the first attempt, the result of the second attempt, and the plot I need, mocked up in Paint; ideally I would produce it without Paint.)
Maybe something like this would work:
my.data %>%
ggplot(aes(x=year)) +
# connect only the points up to 2005; later years get NA, which breaks the line
geom_line(aes(y = ifelse(year <= 2005,value,NA))) +
geom_point(aes(y = value)) +
scale_x_continuous(breaks=c(2001,2003,2005,2019), minor_breaks=NULL) +
theme_minimal()
I came up with a somewhat convoluted and not very clean solution, but it might get the job done. I use lead() to check whether each year should be connected to the next one, and "remove" the unwanted connections by coloring them white. The dummy column puts all years into one line instead of two.
library(dplyr)
library(tidyr)  # for fill()
my.data = data.frame(year=c(2001,2003,2005,2008,2009,2012,2015,2016,NA,2019),
value=c(runif(10))) %>%
filter(!is.na(year)) %>%
mutate(grouped = if_else(lead(year) - year <= 2, "yes", "no")) %>%
fill(grouped, .direction = "down") %>%
mutate(dummy = "all")
my.data %>%
ggplot(aes(x = factor(year),y = value)) +
geom_line(aes(y = value, group = dummy, color = grouped), show.legend = FALSE) +
geom_point() +
scale_color_manual(values = c("yes" = "black", "no" = "white")) +
theme_classic()
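A variation on this idea splits the line into separate groups instead of hiding segments by coloring them white. cumsum() over the condition "gap to the previous year is larger than 2" gives every run of closely spaced years its own group id. Plotting factor(year) keeps the years equally spaced. The column name segment and the set.seed() call are my own additions; the years are the same made-up example as above.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # only so the random example values are reproducible
my.data <- data.frame(
  year  = c(2001, 2003, 2005, 2008, 2009, 2012, 2015, 2016, NA, 2019),
  value = runif(10)
)

plot_data <- my.data %>%
  filter(!is.na(year)) %>%
  arrange(year) %>%
  # start a new segment whenever the gap to the previous year exceeds 2
  mutate(segment = cumsum(c(TRUE, diff(year) > 2)))

p <- ggplot(plot_data, aes(x = factor(year), y = value)) +
  # one line per segment; single-year segments are shown as points only
  geom_line(aes(group = segment)) +
  geom_point() +
  theme_minimal()
```

For these years the segments are 2001-2005, 2008-2009, 2012, 2015-2016 and 2019. Only the first, second and fourth segments get a line.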
This question already has an answer here: ggplot2: Change factor order in legend (1 answer). Closed 3 years ago.
I have the dataframe below:
Target_Category<-c("Adhesion","Cytochrome")
Validated<-c(5,12)
Candidate<-c(10,23)
Exploratory<-c(7,6)
Unknown<-c(9,4)
dataf<-data.frame(Target_Category,Validated,Candidate,Exploratory,Unknown)
and I create the stacked barplot below with:
library(tidyverse)
d<-dataf %>%
gather(col, value, -Target_Category) %>%
ggplot() +
geom_bar(aes(Target_Category, value, fill = col), stat="identity")
d+scale_fill_manual(values=c("orange","gray48","black","green4"),
breaks = c("Validated", "Candidate",
"Exploratory", "Unknown"))
The issue is that the colors and the breaks do not correspond as intended. The correct output should be stacked green, orange, grey, black, with the correspondence shown in the image below. The order of the legend names is correct, but the order of the colors in the plot is not.
Even when I use
d<-dataf %>%
gather(col, value, -Target_Category) %>%
mutate(col=factor(col, levels = c("Validated", "Candidate",
"Exploratory", "Unknown"))) %>%
ggplot() +
geom_bar(aes(Target_Category, value, fill = col), stat="identity")
d+scale_fill_manual(values=c("orange","gray48","black","green4"),
breaks = c("Validated", "Candidate",
"Exploratory", "Unknown"))
the output still does not look like the expected one in the second image.
As camille explained, you can use factor() to control the order of the colors:
d<-dataf %>%
gather(col, value, -Target_Category) %>%
mutate(col=factor(col, levels = c("Unknown", "Exploratory", "Candidate", "Validated"))) %>%
ggplot() +
geom_bar(aes(Target_Category, value, fill = col), stat="identity")
d+scale_fill_manual(values=c("black", "gray48", "orange", "green4"),
breaks = c("Validated", "Candidate",
"Exploratory", "Unknown"))
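A sketch that does not depend on how unnamed colors are matched to categories passes scale_fill_manual() a named values vector, which it matches by name. The factor levels then control only the stacking order. With the default position_stack(), the first level is drawn on top and the last level at the bottom, so Validated (green4) sits at the bottom here. The name fill_colors is my own.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

Target_Category <- c("Adhesion", "Cytochrome")
Validated <- c(5, 12)
Candidate <- c(10, 23)
Exploratory <- c(7, 6)
Unknown <- c(9, 4)
dataf <- data.frame(Target_Category, Validated, Candidate, Exploratory, Unknown)

# named vector: each color is tied to its category, whatever the level order
fill_colors <- c(Validated = "green4", Candidate = "orange",
                 Exploratory = "gray48", Unknown = "black")

p <- dataf %>%
  gather(col, value, -Target_Category) %>%
  # levels set the stacking order: first level on top, last level at the bottom
  mutate(col = factor(col, levels = c("Unknown", "Exploratory", "Candidate", "Validated"))) %>%
  ggplot(aes(Target_Category, value, fill = col)) +
  geom_col() +
  scale_fill_manual(values = fill_colors,
                    breaks = c("Validated", "Candidate", "Exploratory", "Unknown"))
```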
Take the following straightforward plot of two time series from the economics{ggplot2} dataset:
require(dplyr)
require(ggplot2)
require(lubridate)
require(tidyr)
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
group_by(indicator, Y2K) %>%
ggplot(aes(date, percentage, group = indicator, colour = indicator)) + geom_line(size=1)
I would like to change the linetype from "solid" to "dashed" (and possibly also the line size) for all points in the 21st century, i.e. for those observations for which Y2K equals TRUE.
I used group_by(indicator, Y2K), but inside the ggplot() call it appears I cannot use group = with multiple variables, so the line properties now differ only by indicator.
Question: How can I achieve this segmented line appearance?
UPDATE: my preferred solution is a slight tweak of the one by @sahoang:
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
ggplot(aes(date, percentage, colour = indicator)) +
geom_line(size=1, aes(linetype = year(date) >= 2000)) +
scale_linetype(guide = "none")
This eliminates the group_by, as commented by @Roland, and the filter steps make sure that the time series will be connected at the Y2K point (if the data were yearly, there could otherwise be a visual discontinuity).
Even easier than @Roland's suggestion:
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
group_by(indicator, Y2K) -> econ
ggplot(econ, aes(date, percentage, group = indicator, colour = indicator)) +
geom_line(data = filter(econ, !Y2K), size=1, linetype = "solid") +
geom_line(data = filter(econ, Y2K), size=1, linetype = "dashed")
P.S. Alter plot width to remove spike artifacts (red line).
require(dplyr)
require(ggplot2)
require(lubridate)
require(tidyr)
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
ggplot(aes(date, percentage, colour = indicator)) +
geom_line(size=1, aes(linetype = Y2K))
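Two points from this thread can be combined in one sketch. First, grouping by several variables is possible with group = interaction(indicator, Y2K). Second, the gap between the solid and the dashed piece can be closed by copying each indicator's first observation from 2000 onward into the pre-2000 group, so both pieces share that point. The bridge object and the use of pivot_longer() instead of gather() are my own choices, not taken from the answers above. The default linetype scale draws the first level (FALSE) solid and the second (TRUE) dashed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)

econ <- economics %>%
  select(date, psavert, uempmed) %>%
  pivot_longer(-date, names_to = "indicator", values_to = "percentage") %>%
  mutate(Y2K = year(date) >= 2000)

# Copy each indicator's first observation from 2000 onward into the
# pre-2000 group so the solid and dashed pieces meet at that point
bridge <- econ %>%
  filter(Y2K) %>%
  group_by(indicator) %>%
  filter(date == min(date)) %>%
  ungroup() %>%
  mutate(Y2K = FALSE)

econ_bridged <- bind_rows(econ, bridge)

p <- ggplot(econ_bridged,
            aes(date, percentage, colour = indicator, linetype = Y2K,
                group = interaction(indicator, Y2K))) +
  geom_line()
```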