Annotating figure changes legend in ggplot2 - r

I've been trying to gain some experience with ggplot2 using Kieran Healy's excellent online book as a starting point, but I've run into a quirk I can't figure out. Using Gapminder data, I'm trying to create a scatterplot showing life expectancy vs GDP per capita. I'd like to include two years of data, distinguishing the years using both color and shape. Finally, I would like to label an outlier, Kuwait in 1952.
I know I could use annotate to do this manually, but I was hoping someone might have a more elegant solution. In addition, I'd love to know why this code, which seems perfectly legitimate to this novice, is not working the way it seems it should. Thanks very much!
library(ggplot2)
library(ggrepel)   # provides geom_text_repel()
library(gapminder)
gap <- subset(gapminder, year == min(year) | year == max(year))
gap$year <- as.character(gap$year)
p <- ggplot(data = gap,
            mapping = aes(y = lifeExp,
                          x = gdpPercap,
                          col = year))
p + geom_point(aes(shape = year)) + theme_classic() +
  scale_x_log10(labels = scales::dollar) +
  geom_text_repel(data = subset(gap, gdpPercap > 100000),
                  mapping = aes(label = country)) +
  labs(title = "Life expectancy by output per capita",
       y = "", x = "GDP per capita")

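I personally prefer using annotate() for annotations, because you won't get the kind of surprise you get when using a geom: a geom layer inherits the plot's aesthetic mappings (here col = year), so its key is also added to the colour legend. Also, a geom draws once for every row of the data it receives, which can create ugly effects on the fonts or shapes (for example, text overplotted on itself).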
library(ggplot2)
library(ggrepel)
library(gapminder)
gap <- subset(gapminder, year == min(year) | year == max(year))
gap$year <- as.character(gap$year)
ggplot(data = gap, aes(y = lifeExp, x = gdpPercap, col = year)) +
  geom_point(aes(shape = year)) +
  theme_classic() +
  scale_x_log10(labels = scales::dollar) +
  annotate(geom = "label_repel",
           x = gap$gdpPercap[gap$gdpPercap > 100000],
           y = gap$lifeExp[gap$gdpPercap > 100000],
           label = gap$country[gap$gdpPercap > 100000])
Created on 2020-04-25 by the reprex package (v0.3.0)
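If you would rather keep the geom-based approach from the question, a minimal sketch (assuming ggrepel and gapminder are installed) is to switch the text layer's legend off with show.legend = FALSE, an argument that ggplot2 layers, including geom_text_repel(), accept. The gdpPercap > 100000 cut-off is the one used in the question; with these two years it selects only Kuwait in 1952.

```r
library(ggplot2)
library(ggrepel)
library(gapminder)

# Keep only the first and last survey years; year as character gives a discrete legend
gap <- subset(gapminder, year == min(year) | year == max(year))
gap$year <- as.character(gap$year)

p <- ggplot(gap, aes(x = gdpPercap, y = lifeExp, col = year)) +
  geom_point(aes(shape = year)) +
  geom_text_repel(
    data = subset(gap, gdpPercap > 100000),
    mapping = aes(label = country),
    show.legend = FALSE  # keep this layer's key out of the colour legend
  ) +
  scale_x_log10(labels = scales::dollar) +
  labs(title = "Life expectancy by output per capita",
       y = "", x = "GDP per capita") +
  theme_classic()
p
```

The text layer still inherits col = year, so the label is drawn in the colour of its year; only its legend entry is suppressed.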

Related

ggplot - is it at all possible to draw multiple lines without grouping data

I am currently writing a theoretical article that uses no data, and unfortunately I find ggplot hard to use for showing theoretical examples in this kind of application. I've been using ggplot for years on real, empirical data, and there I liked it very much. However, consider my current case: I am trying to plot two exponential functions, 0.5^x and 0.8^x, together on one graph. To produce a ggplot graph, I have to do the following:
library(ggplot2)
library(tidyr)  # pivot_longer() and the %>% pipe
x <- 1:20
a <- 0.5^x
b <- 0.8^x
data.frame(x, a, b) %>%
  pivot_longer(c(a, b)) %>%
  ggplot(aes(x = x, y = value, color = name, group = name)) +
  geom_line()
Output: [image: resulting plot]
This doesn't correspond at all to the mental process I go through when creating such a graph, mainly because I have to convert the data to long format just to be able to group it.
In my head, I am creating two simple but distinct curves on the same canvas. So I should be able to use something like:
qplot(x, 0.5^x, geom = "line")+
qplot(x, 0.8^x, geom = "line")
However, that doesn't work; it fails with:
Can't add `qplot(x, 0.8^x, geom = "line")` to a ggplot object.
Any help with how to create such a simple graph without reshaping the data would be appreciated, thanks.
Using geom_function() (available since ggplot2 3.3.0), you could do:
library(ggplot2)
ggplot() +
  geom_function(fun = ~ 0.5^.x, mapping = aes(color = "a")) +
  geom_function(fun = ~ 0.8^.x, mapping = aes(color = "b")) +
  xlim(1, 20)
Created on 2022-05-08 by the reprex package (v2.0.1)
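With more than two curves, the geom_function() layers can be generated in a loop, since ggplot2 accepts a list of layers on the right-hand side of +. A sketch; the bases vector is only an illustration:

```r
library(ggplot2)

bases <- c(0.5, 0.8)  # illustrative values; add more to draw more curves

# One geom_function() layer per base; each closure keeps its own b
curves <- lapply(bases, function(b) {
  force(b)
  geom_function(fun = function(x) b^x,
                mapping = aes(color = paste0(b, "^x")))
})

p <- ggplot() + curves + xlim(1, 20) + labs(color = "Function")
p
```

The legend labels come from the strings built in aes(), so nothing has to be reshaped.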
Maybe something like this. It is possible to keep the data in wide format, but generally it is better to bring it into long format:
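library(ggplot2)
x <- 1:20  # same x as in the question
ggplot() +
  geom_line(aes(x, 0.5^x, color = "red")) +
  geom_line(aes(x, 0.8^x, color = "blue")) +
  # use the strings themselves as colours (no legend is drawn by default)
  scale_color_identity()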
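A variant of the wide-format idea that still gets a legend: put descriptive strings rather than colour names inside aes(color = ...) and let the default colour scale pick the colours. A sketch that wraps the question's x <- 1:20 in a one-column data frame:

```r
library(ggplot2)

df <- data.frame(x = 1:20)  # same x values as in the question

p <- ggplot(df, aes(x = x)) +
  # constant strings inside aes() become legend entries
  geom_line(aes(y = 0.5^x, color = "0.5^x")) +
  geom_line(aes(y = 0.8^x, color = "0.8^x")) +
  labs(y = "value", color = NULL)
p
```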

Bug in ggplot2?

I'm currently making simple plots with ggplot2.
The graph looks good, but there is one tiny detail I can't fix.
When you look at the legend, it says "Low n" twice. One of the entries should be "High n".
Here is my code:
half_plot <- ggplot() +
  ggtitle(plot_title) +
  geom_line(data = plot_dataframe_SD1, mapping = aes(x = XValues, y = YValues_SD1, color = "blue")) +
  geom_line(data = plot_dataframe_SD2, mapping = aes(x = XValues, y = YValues_SD2, color = "green")) +
  xlim(1, 2) +
  ylim(1, 7) +
  xlab("Standard Deviation") +
  ylab(AV_column_name) +
  scale_fill_identity(name = 'the fill', guide = 'legend', labels = c('m1')) +
  scale_colour_manual(name = 'Legend',
                      values = c('blue' = 'blue', 'green' = 'green'),
                      labels = c(paste("Low ", Mod_column_name), paste("High ", Mod_column_name)))
Here is the graph I get: [image: output plot whose legend shows "Low n" twice]
So do you know how to fix this?
And there is one more thing that makes me curious: I don't remember changing anything in this code, but I know the legend worked fine a few days ago. I saved pictures I made with this code, and they look all right.
Also, if you have any further suggestions for improving the graph, they are very welcome too.
When asking questions, it will help us if you provide a reproducible example including the data. With some sample data, there are a couple ways to fix it.
Sample data
library(dplyr)
plot_dataframe_SD1 = data.frame(XValues = seq(1, 2, by = .2)) %>%
  mutate(YValues_SD1 = XValues * 2)
plot_dataframe_SD2 = data.frame(XValues = seq(1, 2, by = .2)) %>%
  mutate(YValues_SD2 = XValues * 5)
The simplest way to modify your code is to supply the desired color label in the aesthetic.
library(ggplot2)
Mod_column_name = 'n'
half_plot <- ggplot() +
  # put the desired label name in the aesthetic
  # link describing the bang bang operator (!!) https://www.r-bloggers.com/2019/07/bang-bang-how-to-program-with-dplyr/
  geom_line(data = plot_dataframe_SD1, mapping = aes(x = XValues, y = YValues_SD1, color = !!paste('Low', Mod_column_name))) +
  geom_line(data = plot_dataframe_SD2, mapping = aes(x = XValues, y = YValues_SD2, color = !!paste('High', Mod_column_name))) +
  # name the colours by label: unnamed values/labels are matched to the
  # alphabetically sorted breaks ("High n" before "Low n"), which swaps them
  scale_color_manual(values = setNames(c('blue', 'green'),
                                       c(paste('Low', Mod_column_name), paste('High', Mod_column_name))))
A more general approach is to join the dataframes and pivot the joined df to have a column with the SD values and another to specify how to separate the colors. This makes it easier to plot without having to make multiple calls to geom_line.
# Join the dfs, pivot the SD columns longer, and make a new column with your desired labels
joined_df = plot_dataframe_SD1 %>%
  full_join(plot_dataframe_SD2, by = 'XValues') %>%
  tidyr::pivot_longer(cols = contains('YValues'), names_to = 'df_num', values_to = 'SD') %>%
  mutate(label_name = if_else(df_num == 'YValues_SD1', paste('Low', Mod_column_name), paste('High', Mod_column_name)))
# Simplified plot; colours are named by label so they cannot end up swapped
ggplot(data = joined_df, aes(x = XValues, y = SD, color = label_name)) +
  geom_line() +
  scale_color_manual(values = setNames(c('blue', 'green'),
                                       c(paste('Low', Mod_column_name), paste('High', Mod_column_name))))
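Discrete scales list their legend keys in alphabetical order by default, so "High n" appears above "Low n". A self-contained sketch that recreates the sample data directly in long format and sets the order explicitly with breaks; low_lab and high_lab are just helper names introduced here:

```r
library(ggplot2)

Mod_column_name <- "n"
low_lab  <- paste("Low", Mod_column_name)
high_lab <- paste("High", Mod_column_name)

# Same sample data as above, built directly in long format
x <- seq(1, 2, by = 0.2)
long_df <- data.frame(
  XValues = c(x, x),
  SD = c(x * 2, x * 5),
  label_name = rep(c(low_lab, high_lab), each = length(x))
)

p <- ggplot(long_df, aes(x = XValues, y = SD, color = label_name)) +
  geom_line() +
  scale_color_manual(
    name = "Legend",
    values = setNames(c("blue", "green"), c(low_lab, high_lab)),  # colour per label
    breaks = c(low_lab, high_lab)  # legend order: Low first
  )
p
```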

Increase spaces between x values of boxplot (overlapping x labels)

Hello, I am very new to coding and recently made my first couple of figures in R. I used the code below to make them, and they turned out well, except that the labels on the x-axis overlap.
library(ggplot2)
ggplot(LR_density, aes(x = Plant_Lines, y = `Lateral_Root_Density.(root/cm)`, fill = Expression_Type)) +
  geom_boxplot() +
  geom_jitter(color = "black", size = 0.4, alpha = 0.9) +
  ggtitle("Lateral root density across plant expression types")
[image: the figure produced by the code above]
I was wondering if anyone knows how to space out the x-axis labels in ggplot2 boxplots. I have been looking around but haven't found a clear answer. Any help on what to do or where to look would be great!
As suggested in the comments, this thread shows another option for dealing with overlapping x-axis labels, available since ggplot2 3.3.0: staggering the labels over several rows with guide_axis(n.dodge = ...).
I included a second graph that "squeezes" the axis a bit, which roughly simulates the effect of changing the viewport/file size.
library(ggplot2)
ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2))
ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  coord_fixed(1/10^3.4)
Created on 2020-04-30 by the reprex package (v0.3.0)
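guide_axis() can also rotate the labels instead of staggering them, and the scales package can wrap long labels onto several lines. A sketch on the same diamonds example; the 45-degree angle and the wrap width of 6 characters are arbitrary choices:

```r
library(ggplot2)

# Rotate the x-axis labels
p_rot <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  scale_x_discrete(guide = guide_axis(angle = 45))

# Wrap long labels onto several lines ("Very Good" becomes two lines)
p_wrap <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  scale_x_discrete(labels = scales::label_wrap(6))

p_rot
p_wrap
```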

How to format the scatterplots of data series in R

I have been struggling to create a decent-looking scatterplot in R. I didn't expect it to be so difficult.
After some research, it seemed that ggplot would be a good choice, since it allows plenty of formatting. However, I'm struggling to understand how it works.
I'd like to create a scatterplot of two data series, displaying the points with two different colours, and perhaps different shapes, and a legend with series names.
Here is my attempt, based on this:
library(ggplot2)  # also provides the mpg data set
year1 <- mpg[which(mpg$year == 1999), ]
year2 <- mpg[which(mpg$year == 2008), ]
ggplot() +
  geom_point(data = year1, aes(x = cty, y = hwy, color = "yellow")) +
  geom_point(data = year2, aes(x = cty, y = hwy, color = "green")) +
  xlab('cty') +
  ylab('hwy')
Now, this looks almost OK, but the colors don't match the ones I specified (unless I suddenly became color-blind). Why is that?
Also, how can I add series names and change symbol shapes?
Don't build two different data frames; keep one and map year to both color and shape:
library(ggplot2)
df <- mpg[which(mpg$year %in% c(1999, 2008)), ]
df$year <- as.factor(df$year)
ggplot() +
  geom_point(data = df, aes(x = cty, y = hwy, color = year, shape = year)) +
  xlab('cty') +
  ylab('hwy') +
  scale_color_manual(values = c("green", "yellow")) +
  scale_shape_manual(values = c(2, 8)) +
  guides(colour = guide_legend("Year"),
         shape = guide_legend("Year"))
This will work with the way you currently have it set up; colours set outside aes() are used literally, though no legend is drawn:
ggplot() +
  geom_point(data = year1, aes(x = cty, y = hwy), col = "yellow", shape = 1) +
  geom_point(data = year2, aes(x = cty, y = hwy), col = "green", shape = 2) +
  xlab('cty') +
  ylab('hwy')
You want:
library(ggplot2)
ggplot(mpg, aes(cty, hwy, color=as.factor(year)))+geom_point()
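Why the colours in the question did not match: inside aes(), color = "yellow" is a data value (a category label), not a colour, so both labels go through the default colour scale. To choose the colours and shapes and still get a single merged legend, one option is to map year once and pin each value with named vectors. A sketch; the particular colours and shape codes are arbitrary:

```r
library(ggplot2)

# mpg ships with ggplot2; a factor year is treated as discrete
df <- subset(mpg, year %in% c(1999, 2008))
df$year <- factor(df$year)

p <- ggplot(df, aes(x = cty, y = hwy, color = year, shape = year)) +
  geom_point() +
  # named vectors pin each colour/shape to a specific year;
  # the shared name "Year" merges the two legends into one
  scale_color_manual(name = "Year", values = c("1999" = "goldenrod", "2008" = "darkgreen")) +
  scale_shape_manual(name = "Year", values = c("1999" = 16, "2008" = 17))
p
```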

3-variables plotting heatmap ggplot2

I'm currently working on a very simple data.frame, containing three columns:
x contains x-coordinates of a set of points,
y contains y-coordinates of the set of points, and
weight contains a value associated with each point.
Now, working in ggplot2, I seem to be able to plot contour levels for these data, but I can't find a way to fill the plot according to the variable weight. Here's the code I used:
ggplot(df, aes(x, y, fill = weight)) +
  geom_density_2d() +
  coord_fixed(ratio = 1)
You can see that there's no filling whatsoever, sadly.
I've been trying for three days now, and I'm starting to get depressed.
Specifying fill = weight and/or color = weight in the general ggplot call resulted in nothing. I've tried different geoms (tile, raster, polygon...), still nothing. I also tried specifying the aes directly in the geom layer, which didn't work either.
I tried converting the object to a ppp (a spatstat point pattern), but ggplot can't handle those, and base-R plotting didn't work either. I honestly have no idea what's wrong!
I'm attaching the first 10 points' data, which is spaced on an irregular grid:
x = c(-0.13397460,-0.31698730,-0.13397460,0.13397460,-0.28867513,-0.13397460,-0.31698730,-0.13397460,-0.28867513,-0.26794919)
y = c(-0.5000000,-0.6830127,-0.5000000,-0.2320508,-0.6547005,-0.5000000,-0.6830127,-0.5000000,-0.6547005,0.0000000)
weight = c(4.799250e-01,5.500250e-01,4.799250e-01,-2.130287e+12,5.798250e-01,4.799250e-01,5.500250e-01,4.799250e-01,5.798250e-01,6.618956e-01)
Any advice? The desired output would be something along these lines: [linked image of the desired output]
Thank you in advance.
From your description, geom_density_2d doesn't sound right.
You could try geom_raster:
ggplot(df, aes(x, y, fill = weight)) +
  geom_raster() +  # note: geom_raster() assumes the points lie on a regular grid
  coord_fixed(ratio = 1) +
  scale_fill_gradientn(colours = rev(rainbow(7)))  # colourmap
Here is a second-best option using fill = ..level.. (it fills by the estimated density of the points, not by weight). There is a good explanation of ..level.. here.
# load libraries
library(ggplot2)
library(RColorBrewer)
library(ggthemes)
# build your data.frame
df <- data.frame(x = x, y = y, weight = weight)
# build color Palette
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")), space = "Lab")
# Plot (in ggplot2 >= 3.3.0, ..level.. can also be written as after_stat(level))
ggplot(df, aes(x, y, fill = ..level..)) +
  stat_density_2d(bins = 11, geom = "polygon") +
  scale_fill_gradientn(colours = myPalette(11)) +
  theme_minimal() +
  coord_fixed(ratio = 1)
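Because the points lie on an irregular grid, another option is to bin them with stat_summary_2d(), which computes a summary of the z aesthetic (here the mean of weight) inside each rectangular bin and fills the bin with it. A sketch under these assumptions: x, y and weight are the ten points posted in the question, bins = 10 is an arbitrary choice, and the single extreme weight (-2.13e12) is dropped first because it would otherwise swamp the colour scale:

```r
library(ggplot2)

# First 10 points posted in the question
x <- c(-0.13397460, -0.31698730, -0.13397460, 0.13397460, -0.28867513,
       -0.13397460, -0.31698730, -0.13397460, -0.28867513, -0.26794919)
y <- c(-0.5000000, -0.6830127, -0.5000000, -0.2320508, -0.6547005,
       -0.5000000, -0.6830127, -0.5000000, -0.6547005, 0.0000000)
weight <- c(4.799250e-01, 5.500250e-01, 4.799250e-01, -2.130287e+12, 5.798250e-01,
            4.799250e-01, 5.500250e-01, 4.799250e-01, 5.798250e-01, 6.618956e-01)
df <- data.frame(x = x, y = y, weight = weight)

# Drop the single extreme value so it does not dominate the fill scale
df_ok <- subset(df, abs(weight) < 1e6)

# Bin the irregular points and fill each bin with the mean weight inside it
p <- ggplot(df_ok, aes(x = x, y = y, z = weight)) +
  stat_summary_2d(fun = mean, bins = 10) +
  scale_fill_gradientn(colours = rev(rainbow(7))) +
  coord_fixed(ratio = 1)
p
```

Unlike the density-based version, this fills by weight itself; with only a handful of points most bins stay empty, so the result becomes smoother as more data are added.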
