I have a Lorenz Curve graph that I filled by factor variables (male and female). This was done simply enough and overlapping was not an issue because there were only two factors.
Wage %>%
ggplot(aes(x = salary, fill = gender)) +
stat_lorenz(geom = "polygon", alpha = 0.65) +
geom_abline(linetype = "dashed") +
coord_fixed() +
scale_fill_hue() +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by sex")
This provides the following graph:
However, when I have more than two factors (in this case four), the overlapping becomes a serious problem even if I use contrasting colors. Changing alpha does not do much at this stage. Have a look:
Wage %>%
ggplot(aes(x = salary, fill = Diploma)) +
stat_lorenz(geom = "polygon", alpha = 0.8) +
geom_abline(linetype = "dashed") +
coord_fixed() +
scale_fill_manual(values = c("green", "blue", "black", "white")) +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by diploma")
At this point I've tried all kinds of color palettes: hue, brewer, manual scales and so on. I've also tried reordering the factors but, as you can imagine, that did not work either.
What I need is probably a single argument or function to stack all these areas on top of each other so they all have their distinct colors. Funny enough, I've failed to find what I'm looking for and decided to ask for help.
Thanks a lot.
The problem was solved by a dear friend. This was done by adding the categorical variables layer by layer, without defining the Lorenz Curve as a whole.
ggplot() + scale_fill_manual(values = wes_palette("GrandBudapest2", n = 4)) +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[3],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[3],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[4],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[4],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[2],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[2],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[1],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[1],]$Diploma), geom = "polygon") +
geom_abline(linetype = "dashed") +
coord_fixed() +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by diploma")
Which yields:
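The same layer-by-layer idea can be written more compactly by building the layers in a loop. This is only a sketch: the Wage data below is simulated because the original is not shown, and the draw order is an assumption chosen so the widest polygon is drawn first. It assumes the gglorenz package is installed and passes each subset through the data argument of stat_lorenz().

```r
library(ggplot2)
library(gglorenz)  # provides stat_lorenz()

# Hypothetical stand-in for the Wage data frame, which is not shown in the post.
set.seed(1)
Wage <- data.frame(
  salary  = c(rlnorm(200, 10, 0.2), rlnorm(200, 10, 0.5),
              rlnorm(200, 10, 0.8), rlnorm(200, 10, 1.1)),
  Diploma = factor(rep(c("A", "B", "C", "D"), each = 200))
)

# Widest polygon first, so the narrower ones are drawn on top of it.
# This order fits the simulated data only; the post used levels 3, 4, 2, 1.
draw_order <- c("D", "C", "B", "A")

# One stat_lorenz() layer per level, each with its own subset of the data.
lorenz_layers <- lapply(draw_order, function(lv) {
  stat_lorenz(
    data = Wage[Wage$Diploma == lv, ],
    mapping = aes(x = salary, fill = Diploma),
    geom = "polygon"
  )
})

# A list of layers can be added to a plot in one step.
p <- ggplot() +
  lorenz_layers +
  geom_abline(linetype = "dashed") +
  coord_fixed() +
  theme(legend.title = element_blank())
```

Printing p draws the plot. Because layers are drawn in the order they are added, changing draw_order is all that is needed to change which polygon ends up on top.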
Is it possible to make the transparency (alpha) appear as a continuous gradient in the ggplot legend? Currently, the plot looks like this:
Here, the different values for alpha are represented by dots in a different transparency. I would like it to be represented as a bar with a continuous transparency gradient as illustrated below (but as a gradient of transparency instead of color):
The code I use for making the plot is this:
df %>%
ggplot(aes(x = intraEU_trade_bymemberstate_pct,
y = gini_eurostat,
alpha = year,
size = GDP_percap_currentUSD,
color = as.factor(lowGDP_percap_currentUSD))) +
geom_point() +
geom_smooth(method="lm", formula = y ~ x, show.legend = FALSE, color = "#6c757d") +
theme_few() +
scale_colour_manual(name="GDP per capita \n(dummy)",
labels = c("Above-average", "Below-average"),
values = c("#046d9a", "#ce5348")) +
scale_alpha_continuous(range = c(0.1, 1)) +
scale_size(range=c(0.3, 4)) + # control the size of the dots
guides(alpha = guide_legend(order = 1),
size = guide_legend(order = 2),
color = guide_legend(order = 3)) +
labs(x = "Intra-EU trade (% of total trade)",
y = "Gini (%)",
alpha = "Year",
size = "GDP per capita \n(current USD)",
color = "GDP per capita")
You can't have a color bar as the guide for an alpha scale. However, you can set a color gradient in which one of the two colours is fully transparent, which amounts to the same thing.
If you are already using the color scale (as in your example), it would be best to have an alpha color bar for each of your two dummy variables. For this you need the ggnewscale package.
Obviously, I don't have your data, so here's a working example with the built-in mtcars data set.
library(ggplot2)
ggplot(mtcars[1:16, ], aes(wt, disp)) +
geom_point(aes(color = mpg), size = 3) +
scale_color_gradient(low = alpha("navy", 0), high = "navy",
name = "Below average") +
ggnewscale::new_scale_color() +
geom_point(aes(color = mpg), data = mtcars[17:32,], size = 3) +
scale_color_gradient(low = alpha("red3", 0), high = "red3",
name = "Above average") +
theme_light(base_size = 16)
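To see why a gradient from a fully transparent colour to the opaque version of the same colour behaves like an alpha scale, here is a small base R sketch. It uses adjustcolor() and colorRampPalette() as stand-ins; ggplot2 does its own interpolation internally, so treat this as an illustration of the idea rather than of what the scale does step by step.

```r
# Base R only; grDevices is attached by default.
low  <- adjustcolor("navy", alpha.f = 0)  # same hue, fully transparent
high <- adjustcolor("navy", alpha.f = 1)  # same hue, fully opaque

# Interpolate five steps between the two, keeping the alpha channel.
ramp  <- colorRampPalette(c(low, high), alpha = TRUE)
steps <- ramp(5)

# Rows: red, green, blue, alpha (0-255); columns: the five steps.
channels <- col2rgb(steps, alpha = TRUE)
print(channels)
```

Only the alpha row changes from one step to the next, which is why the colour bar reads as a transparency gradient.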
I am trying to add a legend for the mean and median to my histogram. I am also trying to change the scale on the y-axis that is labeled count. It is currently showing the density scale. I want the density plot but the count scale. Alternatively, I would be fine with a second scale or the counts at the end of the histogram. I am just not sure how to go about it. Below is some data and the current code. Thank you in advance.
studyData=data.frame(X=rchisq(1:100000, df=3))
colnames(studyData) <- "hoursstudying"
mu <- data.frame(mean(studyData$hoursstudying))
colnames(mu) <- "Mean"
med <- data.frame(median(studyData$hoursstudying))
colnames(med) <- "Median"
p <- ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..density..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(data = mu, aes(xintercept = Mean),
color = "red", linetype = "dashed", size = 1) +
geom_vline(data = med, aes(xintercept = median(Median)),
color = "purple", size = 1) +
labs(title = "Hours Spent Completing Course Work") +
ylab("Count") +
xlab("Hours Studying") +
theme(plot.title = element_text(hjust = 0.5))
p
You can put the count on the y axis instead of the density in much the same way you reference the internally calculated density, using the "..XXXX.." notation. In this case, use ..count.. instead.
You will need to change both y aesthetics for geom_histogram() and geom_density():
ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..count..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(aes(y=..count..), alpha=.2, fill="#FF6666") +
# ... everything else is the same
Note: also, I echo the comment from u/Limey. The median and mean values in your original plot shared are clearly wrong... yet when I run the code I am getting the values looking correct. Not sure what that's about, OP, but perhaps that's a different question.
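The relationship between the two y scales discussed above is simple: for a histogram, count = density * n * binwidth. Here is a quick check in base R using hist(), with a smaller simulated sample than in the question. The variable names are made up for the illustration.

```r
set.seed(42)
x <- rchisq(1000, df = 3)   # simulated, like the question's data but smaller
binwidth <- 1
breaks <- seq(0, ceiling(max(x)), by = binwidth)

h <- hist(x, breaks = breaks, plot = FALSE)

# Rebuild the counts from the densities: count = density * n * binwidth
rebuilt_counts <- h$density * length(x) * binwidth
print(all.equal(rebuilt_counts, as.numeric(h$counts)))
```

For the density curve, ggplot2 computes count as density times the number of points, so it lines up with the histogram when binwidth = 1, as in the question. With another bin width the curve would need to be multiplied by the bin width as well. In ggplot2 3.4.0 and later, the ..count.. notation is deprecated and written after_stat(count).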
Since @chemdork123 answered the question about the y-axis scale, I won't say anything about it. To add the median and mean to the legend, you need to map them as aesthetics.
p <- ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..density..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(data = mu, aes(xintercept = Mean,
color = "red"),
linetype = "dashed", size = 1) +
geom_vline(data = med, aes(xintercept = Median,
color = "purple"),
size = 1) +
scale_color_manual(values = c("purple", "red"),
labels = c("Median", "Mean")) +
labs(title = "Hours Spent Completing Course Work") +
ylab("Count") +
xlab("Hours Studying") +
theme(plot.title = element_text(hjust = 0.5))
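A variation on the same idea, in case it is useful: map the colour to the label itself and give scale_colour_manual() a named vector, so the pairing of label and colour does not depend on alphabetical order. This is a sketch and not the code from the answer above. The stats data frame is a made-up helper, the sample is smaller than in the question, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

set.seed(1)
studyData <- data.frame(hoursstudying = rchisq(1000, df = 3))  # smaller sample

# Hypothetical helper: one row per reference line, labelled by statistic.
stats <- data.frame(
  stat  = c("Mean", "Median"),
  value = c(mean(studyData$hoursstudying), median(studyData$hoursstudying))
)

p <- ggplot(studyData, aes(x = hoursstudying)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                 colour = "black", fill = "lightblue") +
  # Mapping colour inside aes() is what creates the legend entries.
  geom_vline(data = stats, aes(xintercept = value, colour = stat),
             linewidth = 1) +
  # Named values tie each colour to its label regardless of sort order.
  scale_colour_manual(name = NULL,
                      values = c(Mean = "red", Median = "purple"))
```

Printing p draws the plot with one legend entry per line.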
I'm making depth profiles with ggplot. Some of the lines are drawn between the variable points using geom_path but some are not, even when I try adding "group=1" (which was the only solution I've found for this problem). I'm doing multiple plots for different lakes and for each lake there is one or multiple variables not getting a line by using geom_path. For the code below only the Chl.a variable is not drawing a line, all the others do. What could this depend on?
I also tried geom_line instead, but this only worked for some variables, since it draws the line following the x-axis and I want the line to go vertically, following the y-axis. Can I achieve this using geom_line, since geom_path doesn't seem to work for all variables?
gs <- ggplot(goodspirit, aes(y=Depth.m)) +
geom_point(aes(x=Temp, colour= "Temp")) +
geom_path(aes(x=Temp, color = "Temp"), size=1.5) +
geom_point(aes(x=zDOmg, color ="z(DO mg/L)")) +
geom_path(aes(x=zDOmg, color ="z(DO mg/L)"), size=1.5) +
geom_point(aes(x=Chl.a, color ="Chl.a"), na.rm = TRUE) +
geom_path(aes(x=Chl.a, color ="Chl.a"), na.rm = TRUE, size=1.5) +
geom_point(aes(x=zN2O, color ="z(N2O.nM)"), na.rm = TRUE) +
geom_line(aes(x=zN2O, color ="z(N2O.nM)"), na.rm = TRUE, size=1.5) +
geom_point(aes(x=Sal.ppt, color ="Salinity.ppt"), na.rm = TRUE) +
geom_line(aes(x=Sal.ppt, color ="Salinity.ppt"), na.rm = TRUE, size=1.5)+
geom_point(aes(x=zph, color ="z(pH)")) +
geom_path(aes(x=zph, color ="z(pH)"), size=1.5) +
scale_x_continuous(position = "top", limits=c(-3,5), expand = c(0,0))+
scale_y_reverse(expand = c(0.05,0))+
ylab("Depth (m)") + xlab("x") + ggtitle("Good spirit lake") + labs(colour = "Parameters") +
theme_light() + theme(plot.title = element_text(hjust = 0.5))
gs
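One possible cause, stated as an assumption because the data is not shown: if Chl.a was only measured at some depths, the rows in between hold NA, and an NA in the middle of a path breaks the line even with na.rm = TRUE. A point with NA on both sides then has nothing to connect to. The sketch below uses a made-up profile to show one way around it: drop the NA rows for that variable before drawing, and use geom_line(orientation = "y") (ggplot2 3.3.0 or later) so points are connected in order of depth.

```r
library(ggplot2)

# Hypothetical profile: Temp measured at every depth, Chl.a only at every other.
profile <- data.frame(
  Depth.m = 1:6,
  Temp    = c(18, 17, 15, 12, 10, 9),
  Chl.a   = c(2.1, NA, 2.8, NA, 1.5, NA)
)

# Keep only rows where Chl.a was measured, so consecutive points are adjacent.
chl <- profile[!is.na(profile$Chl.a), ]

p <- ggplot(profile, aes(y = Depth.m)) +
  geom_path(aes(x = Temp, colour = "Temp")) +
  # orientation = "y" makes geom_line connect points in order of depth.
  geom_line(data = chl, aes(x = Chl.a, colour = "Chl.a"), orientation = "y") +
  geom_point(data = chl, aes(x = Chl.a, colour = "Chl.a")) +
  scale_y_reverse() +
  labs(x = "x", y = "Depth (m)", colour = "Parameters")
```

Whether this matches the actual data can be checked with something like sum(is.na(goodspirit$Chl.a)).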
After searching the web both yesterday and today, the only way I got a legend working was to follow the solution by 'Brian Diggs' in this post:
Add legend to ggplot2 line plot
Which gives me the following code:
library(ggplot2)
ggplot()+
geom_line(data=myDf, aes(x=count, y=mean, color="TrueMean"))+
geom_hline(yintercept = myTrueMean, color="SampleMean")+
scale_colour_manual("",breaks=c("SampleMean", "TrueMean"),values=c("red","blue"))+
labs(title = "Plot showing convergence of Mean", x="Index", y="Mean")+
theme_minimal()
Everything works just fine if I remove the color of the hline, but if I add a value in the color of hline that is not an actual color (like "SampleMean") I get an error that it's not a color (only for the hline).
How can adding such a common thing as a legend be such a big problem? There must be an easier way?
To create the original data:
#Initial variables
myAlpha=2
myBeta=2
successes=14
n=20
fails=n-successes
#Posterior values
postAlpha=myAlpha+successes
postBeta=myBeta+fails
#Calculating the mean and SD
myTrueMean=(myAlpha+successes)/(myAlpha+successes+myBeta+fails)
myTrueSD=sqrt(((myAlpha+successes)*(myBeta+fails))/((myAlpha+successes+myBeta+fails)^2*(myAlpha+successes+myBeta+fails+1)))
#Simulate the data
simulateBeta=function(n,tmpAlpha,tmpBeta){
tmpValues=rbeta(n, tmpAlpha, tmpBeta)
tmpMean=mean(tmpValues)
tmpSD=sd(tmpValues)
returnVector=c(count=n, mean=tmpMean, sd=tmpSD)
return(returnVector)
}
#Make a df for the data
myDf=data.frame(t(sapply(2:10000, simulateBeta, postAlpha, postBeta)))
The given solution works in most cases, but not for geom_hline (or geom_vline). You usually don't need aes for them, but when you want them to generate a legend you have to wrap the values in aes:
library(ggplot2)
ggplot() +
geom_line(aes(count, mean, color = "TrueMean"), myDf) +
geom_hline(aes(yintercept = myTrueMean, color = "SampleMean")) +
scale_colour_manual(values = c("red", "blue")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = NULL) +
theme_minimal()
Looking at the original data, you could use geom_point for a better visualisation (I also added some theme changes):
ggplot() +
geom_point(aes(count, mean, color = "Observed"), myDf,
alpha = 0.3, size = 0.7) +
geom_hline(aes(yintercept = myTrueMean, color = "Expected"),
linetype = 2, size = 0.5) +
scale_colour_manual(values = c("blue", "red")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = "Mean type") +
theme_minimal() +
guides(color = guide_legend(override.aes = list(
linetype = 0, size = 4, shape = 15, alpha = 1))
)
I am making a plot with data from an incomplete factorial design. Due to the design, I have different length for the manual scale for colour and the manual scale for fill. Thus, I get two legends. How could I delete one of them or even better combine them?
I have looked at those questions:
Merge separate size and fill legends in ggplot
How to merge color, line style and shape legends in ggplot
How to combine scales for colour and size into one legend?
However, the answers did not help me as they did not handle incomplete designs.
Here is some example data and the plot I produced so far:
#Example data
Man1 <- c(25,25,30,30,30,30,35,35,40,40,40,40,45,45)
Man2 <- c(25,25,30,30,40,40,35,35,40,40,30,30,45,45)
DV <- c(24.8,25.2,29.9,30.3,35.2,35.7,34,35.1,40.3,39.8,35.8,35.9,44,44.8)
Data <- data.frame(Man1,Man2,DV)
#Plot
ggplot(data = Data, aes(x = Man1, y = DV, group=as.factor(Man2), colour=as.factor(Man2))) +
theme_bw() +
geom_abline(intercept = 0, slope = 1, linetype = "longdash") +
geom_point(position = position_dodge(1)) +
geom_smooth(method = "lm", aes(x = Man1, y = DV, group=as.factor(Man2), fill=as.factor(Man2))) +
scale_colour_manual(name = "Man2", values=c('grey20', 'blue','grey20','tomato3', 'grey20')) +
scale_fill_manual(name = "Man2", values=c('blue','tomato3'))
This gives me the following picture:
ggplot of incomplete design with two legends
Could someone give me a hint how to delete one of the legends or even better combine them? I would appreciate it!
By default the scale drops unused factor levels, which is relevant here because you can only get lines for a couple of your groups.
You can use drop = FALSE to change this in the appropriate scale_*_manual() (which is for fill here).
Then use the same vector of colors for both the fill and color scales. I usually make a named vector for this.
# Make vector of colors
colors = c("25" = 'grey20', "30" = 'blue', "35" = 'grey20', "40" = 'tomato3', "45" = 'grey20')
#Plot
ggplot(data = Data, aes(x = Man1, y = DV, group=as.factor(Man2), colour= as.factor(Man2))) +
theme_bw() +
geom_abline(intercept = 0, slope = 1, linetype = "longdash") +
geom_point(position = position_dodge(1)) +
geom_smooth(method = "lm", aes(fill=as.factor(Man2))) +
scale_colour_manual(name = "Man2", values = colors) +
scale_fill_manual(name = "Man2", values = colors, drop = FALSE)
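To see which groups can get a line at all, it helps to count the distinct x positions per group. As far as I can tell, a group with a single Man1 value gives the linear fit nothing to work with, so it never reaches the fill scale. A small base R check on the example data (distinct_x and fit_groups are just names for this sketch):

```r
Man1 <- c(25,25,30,30,30,30,35,35,40,40,40,40,45,45)
Man2 <- c(25,25,30,30,40,40,35,35,40,40,30,30,45,45)

# Number of distinct x positions (Man1) within each Man2 group.
distinct_x <- tapply(Man1, Man2, function(x) length(unique(x)))
print(distinct_x)

# Groups with at least two distinct x positions, i.e. candidates for a fit.
fit_groups <- names(distinct_x)[distinct_x >= 2]
print(fit_groups)
```

Only groups 30 and 40 qualify, which is why a two-colour fill scale works in the original plot while the colour scale needs five values.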
Alternatively, use guide = "none" to remove the fill legend altogether.
ggplot(data = Data, aes(x = Man1, y = DV, group=as.factor(Man2), colour= as.factor(Man2))) +
theme_bw() +
geom_abline(intercept = 0, slope = 1, linetype = "longdash") +
geom_point(position = position_dodge(1)) +
geom_smooth(method = "lm", aes(fill=as.factor(Man2))) +
scale_colour_manual(name = "Man2", values = colors) +
scale_fill_manual(name = "Man2", values=c('blue','tomato3'), guide = "none")