R plot_model - Change both font size and legend labels - r

I created a plot of an interaction using plot_model. The problem is that plot_model doesn't have a way to change the legend labels, so I had to add that using scale_color_manual. But that seems to override the changes I want to make to the size of the axis labels.
The data file df1 looks like this:
X Groups DV SK
1 1 Group 1 5.00 1.1813822
2 2 Group 1 1.00 1.1813822
3 3 Group 1 2.00 1.1813822
4 4 Group 1 1.75 0.2678133
5 5 Group 1 3.75 0.2678133
6 6 Group 1 2.50 0.7245978
...
360 360 Group 3 6.00 0.2678133
361 361 Group 3 1.75 -1.5593244
362 362 Group 3 3.25 -0.6457555
363 363 Group 3 5.25 -1.1025399
364 364 Group 3 3.00 -2.9296776
365 365 Group 3 2.75 0.2678133
And here is my attempt to create the plot:
library(sjPlot)
library(ggplot2)
fit1 = lm(DV~Groups*SK, data=df1)
gg_colors <- c("blue", "red")
p1 <- plot_model(fit1, type = "pred", terms = c("Groups", "SK[-0.6457555 ,0.7245978]"), title= "", axis.title = c("","Y TITLE"), legend.title="SK")
p1 +
scale_color_manual(labels = c("25 Percentile", "75th Percentile"), values=gg_colors) + font_size(axis_title.y = 12, labels.x=12) + legend_style(pos="top")
The output looks like this:
plot
The problem is that the font size didn't change to 12 as I specified in the code.
Ideally I'd also like to change the axis x labels to black instead of grey.

This might be most clearly handled using a theme element and giving a global text size:
library(sjPlot)
#> Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!
library(tidyverse)
df1 <- tibble(Groups = rep(c("Group 1", "Group 2", "Group 3"), each = 100),
DV = rnorm(300, 5),
SK = rnorm(300, 0, 2))
fit1 <- lm(DV~Groups*SK, data=df1)
gg_colors <- c("blue", "red")
p1 <- plot_model(fit1, type = "pred", terms = c("Groups", "SK[-0.6457555 ,0.7245978]"), title= "", axis.title = c("","Y TITLE"), legend.title="SK")
p1 +
scale_color_manual(labels = c("25 Percentile", "75th Percentile"), values=gg_colors) +
theme(text = element_text(size = 14),
axis.text = element_text(colour = "black"),
legend.position = "top")
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
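If you would rather change only the two sizes from the question instead of one global text size, theme() can also target individual elements. The following is a minimal sketch with plain ggplot2, assuming that p1 behaves like an ordinary ggplot object (the scale_color_manual call above suggests it does). The data frame pred and its numbers are hypothetical stand-ins for what plot_model would draw.

```r
library(ggplot2)

# Hypothetical stand-in for the predictions that plot_model() would draw;
# the numbers are made up for illustration.
pred <- data.frame(
  Groups    = rep(c("Group 1", "Group 2", "Group 3"), times = 2),
  SK        = factor(rep(c(-0.65, 0.72), each = 3)),
  predicted = c(3.1, 3.4, 2.9, 3.6, 3.0, 3.8)
)

p <- ggplot(pred, aes(Groups, predicted, colour = SK)) +
  geom_point(size = 3, position = position_dodge(width = 0.3)) +
  # Relabel the legend entries and set the colours in a single scale
  scale_colour_manual(labels = c("25th Percentile", "75th Percentile"),
                      values = c("blue", "red")) +
  labs(x = "", y = "Y TITLE", colour = "SK") +
  # Target individual theme elements instead of one global text size
  theme(axis.title.y    = element_text(size = 12),
        axis.text.x     = element_text(size = 12, colour = "black"),
        legend.position = "top")

p
```

The same theme() call can be added to p1 after scale_color_manual; whether it should replace or complement font_size() and legend_style() is a matter of taste.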

Related

label mean lines in ggplot that are mapped in a group

I have density plots for each shift and year. The means are plotted by grouping in a df called mu. I also add vertical reference lines which I can label without issue but I cannot seem to get the labels on the grouped vertical lines. You will see my latest attempt which throws an error "Aesthetics must be either length 1 or the same as the data (134): x"
My code
library(ggplot2)
library(dplyr)
df <- read.csv("f4_bna_no_cup.csv")
head(df)
ï..n yr s ys x
1 1 2021 1 2021-1 116.83
2 2 2021 1 2021-1 114.83
3 3 2021 1 2021-1 115.50
4 4 2021 1 2021-1 115.42
5 5 2021 1 2021-1 115.58
6 6 2021 1 2021-1 115.58
#summarize means by ys (year-shift)
mu <- df %>%
group_by(ys,s) %>%
summarise(grp.mean = mean(x))
mu
ys s grp.mean
<chr> <int> <dbl>
1 2021-1 1 116.
2 2021-2 2 117.
3 2022-1 1 114.
4 2022-2 2 115.
llab<-mu
shift <- c("Shift 1", "Shift 2")
#density charts on df
ggplot(data=df, aes(x=x,group =ys, fill = yr, color = yr)) +
geom_density(alpha = 0.4) +
scale_x_continuous(limits=c(112,120))+
geom_vline(aes(xintercept = grp.mean), data = mu, linetype = "dashed", size = 0.5) +
geom_text(aes(x=llab$grp.mean, y=.6), label = llab$ys) + #this throws the error
geom_vline(aes(xintercept=114.8), linetype="dashed", size=0.5, color = 'green3') +
geom_text(aes(x=114.8, y=.6), label = "Target", angle = 90, color="black",size=3) +
geom_vline(aes(xintercept=114.1), linetype="solid", size=0.5, color = 'limegreen') +
geom_text(aes(x=114.1, y=.55), label = "Potential", angle = 90, color="black",size=3 ) +
geom_vline(aes(xintercept=113.4), linetype="solid", size=0.5, color = 'firebrick3') +
geom_text(aes(x=113.4, y=.62), label = "Label wt", angle = 90,
color="black",size=3, family = "Times New Roman", vjust=0) +
facet_grid(
.~s,
labeller = labeller(
s = c(`1` = "Shift 1", `2` = "Shift 2")
))+
theme_light()+
theme(legend.position = "none")
Output so far...I'm so close.
Persistence pays off. I figured it out and thought I would share it in case someone else has a similar problem:
All code remains the same as in my question except a slight change to grouping for the mu df, AND replace the line that I noted as throwing the error as follows:
#small change to group_by, retaining yr
mu <- df %>%
group_by(yr,s,ys) %>%
summarise(grp.mean = mean(x))
Replace: geom_text(aes(x=llab$grp.mean, y=.6), label = llab$ys), with
geom_text(data = mu, aes(label = yr), x = mu$grp.mean, y = .60, color = "black", angle = 90, vjust = 0)
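A variant that may be easier to maintain maps the label position inside aes(), so ggplot2 takes x from the same mu row as the label and no mu$ vector is needed. This is only a sketch: the CSV from the question is not available, so the data below is synthetic and the numbers are made up.

```r
library(ggplot2)
library(dplyr)

# Synthetic stand-in for f4_bna_no_cup.csv (values are made up)
set.seed(1)
df <- data.frame(
  yr = rep(c(2021, 2022), each = 200),
  s  = rep(rep(1:2, each = 100), times = 2)
)
df$ys <- paste(df$yr, df$s, sep = "-")
df$x  <- rnorm(nrow(df), mean = 115 + df$s - (df$yr - 2021), sd = 0.8)

# One mean per year and shift, keeping yr so it can be used as the label
mu <- df %>%
  group_by(yr, s, ys) %>%
  summarise(grp.mean = mean(x), .groups = "drop")

p <- ggplot(df, aes(x = x, group = ys, fill = factor(yr), color = factor(yr))) +
  geom_density(alpha = 0.4) +
  geom_vline(aes(xintercept = grp.mean), data = mu, linetype = "dashed") +
  # x, y and label all come from mu, so each facet gets its own labels
  geom_text(aes(x = grp.mean, y = 0.6, label = yr), data = mu,
            color = "black", angle = 90, vjust = 0) +
  facet_grid(. ~ s) +
  theme_light() +
  theme(legend.position = "none")

p
```

Because mu keeps the s column, facet_grid places each label in the panel of its own shift.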

Why does gganimate fail to order lines correctly by date with transition_reveal?

I'm aiming to reproduce an animated figure by Ed Hawkins on climate change in R with gganimate. The figure is called climate spiral. While a static ggplot figure shows the correct order of lines by year (the most recent data on top), the animated plot with transition_reveal() results in a wrong order of the lines.
Here is a reproducible example code with synthetic data:
library(tidyverse)
library(lubridate)
library(gganimate)
library(RColorBrewer)
# Create monthly data from 1950 to 2020 (and a component for rising values with time)
df <- tibble(year = rep(1950:2020, each = 12),
month = rep(month.abb, 2020-1950+1)) %>%
mutate(date = dmy(paste("01",month,year)),
value = rnorm(n(), 0, 2) + row_number()*0.005) %>%
with_groups(year, mutate, value_yr = mean(value))
temp <- df %>%
ggplot(aes(x = month(date, label=T), y = value, color = value_yr)) +
geom_line(size = 0.6, aes(group = year)) +
geom_hline(yintercept = 0, color = "white") +
geom_hline(yintercept = c(-4,4), color = c("skyblue3","red1"), size = 0.2) +
geom_vline(xintercept = 1:12, color = "white", size = 0.2) +
annotate("label", x = 12.5, y = c(-4,0,4), label = c("-4°C","0°C","+4°C"),
color = c("skyblue3","white","red1"), size = 2.5, fill = "#464950",
label.size = NA, label.padding = unit(0.1, "lines")) +
geom_point(x = 1, y = -11, size = 15, color = "#464950") +
geom_label(aes(x = 1, y = -11, label = year),
color = "white", size = 4,
fill = "#464950", label.size = NA) +
coord_polar(start = 0) +
scale_color_gradientn(colors = rev(brewer.pal(n=11, name = "RdBu")),
limits = range(df$value_yr)) +
labs(x = "", y = "") +
theme_bw() +
theme(panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major = element_blank(),
plot.background=element_rect(fill="#464950", color="#464950"),
axis.text.x = element_text(margin = margin(t = -20, unit = "pt"),
color = "white"),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
legend.position = "none")
Now, we can either save the plot as PNG or animate and save as GIF:
ggsave(temp, filename = "test.png", width = 5, height = 5, dpi = 320)
# Animate by date:
anim <- temp +
transition_reveal(date) +
ease_aes('linear')
output <- animate(anim, nframes = 100, end_pause = 30,
height = 5, width = 5, units = "in", res = 300)
anim_save("test.gif", output)
Let's see the results!
Static PNG:
Animated GIF:
At first glance, the results look equal, however, the detail shows differences (for instance, the marked blue line).
In this example code with synthetic data, the differences are minor. But with real data, the figures look pretty different as many red lines (recent data points with high temperatures) disappear in the background. So, how can you retain the order in transition_reveal() by date? Any help appreciated, thanks a lot!
This isn't the answer per se. This is the why. You'll have to tell me what you prefer given this information for me to give you a solution.
I tried a few things—each of which I was just sure would work but did not. So, I wanted to see what was happening in ggplot. My hunch proved correct. Your data is in order of value_yr in the png, not year.
I repeat this question at the end:
Either you can put the animation in order of value_yr or you can put the color in ggplot in order by year. Which would you prefer?
How do I know? I extracted the assigned colors in the object.
tellMe <- ggplot_build(temp)$data[[1]]
head(tellMe)
# colour x y group PANEL flipped_aes size linetype alpha
# 1 #1E60A4 1 -1.75990067 1 1 FALSE 0.6 1 NA
# 2 #1E60A4 2 -0.08968196 1 1 FALSE 0.6 1 NA
# 3 #1E60A4 3 -0.69657130 1 1 FALSE 0.6 1 NA
# 4 #1E60A4 4 -0.10777727 1 1 FALSE 0.6 1 NA
# 5 #1E60A4 5 1.57710505 1 1 FALSE 0.6 1 NA
# 6 #1E60A4 6 1.63277369 1 1 FALSE 0.6 1 NA
gimme <- tellMe %>% group_by(group) %>%
summarise(color = unique(colour)) %>%
print(n = 100) # there are fewer than 100, I just want them all
head(gimme)
# # A tibble: 6 × 2
# group color
# <int> <chr>
# 1 1 #1E60A4
# 2 2 #114781
# 3 3 #175290
# 4 4 #053061
# 5 5 #1C5C9E
# 6 6 #3E8BBF
To me, this indicated that the colors weren't in group order, so I wanted to see the colors to visualize the order.
I used this function. I know it came from a demo, but I don't remember which one. I looked just so I could include that here, but I didn't find it.
# this is from a demo (not sure which one anymore!)
showCols <- function(cl=colors(), bg = "lightgrey",
cex = .75, rot = 20) {
m <- ceiling(sqrt(n <-length(cl)))
length(cl) <- m*m; cm <- matrix(cl, m)
require("grid")
grid.newpage(); vp <- viewport(w = .92, h = .92)
grid.rect(gp=gpar(fill=bg))
grid.text(cm, x = col(cm)/m, y = rev(row(cm))/m, rot = rot,
vp=vp, gp=gpar(cex = cex, col = cm))
}
showCols(gimme$color)
The top left color is the oldest year, the value below it is the following year, and so on. The most recent year is the bottom value in the right-most column.
df %>% group_by(year) %>% summarise(value_yr = unique(value_yr))
# they are in 'value_yr' order in ggplot, not year
# # A tibble: 71 × 2
# year value_yr
# <int> <dbl>
# 1 1950 0.0380
# 2 1951 -0.215
# 3 1952 -0.101
# 4 1953 -0.459
# 5 1954 -0.00130
# 6 1955 0.559
# 7 1956 -0.457
# 8 1957 -0.251
# 9 1958 1.10
# 10 1959 0.282
# # … with 61 more rows
Either you can put the animation in order of value_yr or you can put the color in ggplot in order by year. Which would you prefer?
Update
It seems you can't use transition_reveal to group and transition by the same variable. Unfortunately, I can't tell you why, but it seems to get stuck at 1958!
To make this gif on the left match that ggplot png on the right:
First, I modified the calls to ggplot and geom_line
ggplot(aes(x = month(date, label = T), y = value,
group = year, color = year)) +
geom_line(size = .6)
Then I tried to use transition_reveal but noticed that subsequent years were layered underneath other years. I can't account for that odd behavior. When I ran showCols after changing temp, the colors were in order. That ruled out what I had thought the problem was initially.
I modified the object anim, using transition_manual to force the order of the plot layers.
anim <- temp +
transition_manual(year, cumulative = T) +
ease_aes('linear')
That's it. Now the layers match.
As to whether this would have worked before you changed the color assignment: original plot with manual transitions of the year on the left, ggplot png on the right:
It looks like that would have worked as well. So, my original drawn-out explanation wasn't nearly as useful as I thought, but at least you have a working solution now. (Sigh.)
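If you want to check the layer order of the static plot yourself, without involving gganimate, you can compare the built layer data with the original data. The sketch below uses a small synthetic data set (five years, made-up values) and assumes that ggplot2 numbers groups by the sorted values of the grouping variable and draws them in that order.

```r
library(ggplot2)

# Small synthetic data set: five years of monthly values (made up)
set.seed(1)
df <- data.frame(year  = rep(1950:1954, each = 12),
                 month = rep(1:12, times = 5))
df$value    <- rnorm(nrow(df))
df$value_yr <- ave(df$value, df$year)  # yearly mean, used only for colour

p <- ggplot(df, aes(month, value, colour = value_yr)) +
  geom_line(aes(group = year))

ld <- layer_data(p)

# Pair each group id with the year at the same rank
order_check <- data.frame(group = sort(unique(ld$group)),
                          year  = sort(unique(df$year)))

# TRUE for a group when its y values are exactly those of the paired year
order_check$matches <- sapply(seq_len(nrow(order_check)), function(i) {
  isTRUE(all.equal(ld$y[ld$group == order_check$group[i]],
                   df$value[df$year == order_check$year[i]]))
})

order_check
```

If every row reports TRUE, the groups in the static plot follow year, even though the colours (mapped to value_yr) do not change monotonically from group to group.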

Plotting a line in ggplot with three different subsets with different colors

I'm trying to create a plot with 3 colors on the same line based on the "Recorte" column, which you can check in the sample below:
Legenda `Hora da publicaçã~ Alcance `Curtidas e rea~ Comentários Compartilhament~ Resultados
<chr> <dttm> <dbl> <dbl> <dbl> <dbl> <lgl>
1 "Lucielle Lauren~ 2020-10-13 05:49:48 93 0 0 0 NA
2 "Lucielle Lauren~ 2020-10-13 05:57:12 2827 164 10 0 NA
3 "Assista agora a~ 2020-10-13 15:34:43 2641 68 1 0 NA
4 "Acompanhe agora~ 2020-10-23 09:07:32 3063 65 1 0 NA
5 "\U0001f535 Esta~ 2020-11-12 16:29:55 4578 105 17 0 NA
6 "Em respeito ao ~ 2020-10-23 16:48:41 4891 227 17 6 NA
This is the column I want to use to split the line into 3 different labels, representing 3 stages in the timeline the graph shows.
> head(campanhalutudo$Recorte)
[1] "Periodo 1" "Periodo 1" "Periodo 1" "Periodo 2" "Periodo 3" "Periodo 2"
This line of code produces the base graph without any attempt to subset the line
graflutudo <- ggplot(campanhalutudo, aes(x = `Hora da publicação`, y = Alcance), label = "periodo") +
geom_line(color = "Purple") +
geom_line(aes(y = mean(Alcance)), color = "red", linetype = "dotted") +
scale_y_continuous(breaks = seq(0, 85000, by = 10000)) +
scale_x_datetime(labels = date_format("%m-%d"),
date_breaks = "4 days") +
theme_light()
I've tried using the following code to color the line:
graflutudo<- ggplot(campanhalutudo, aes(x=`Hora da publicação`, y = Alcance, factor= (Recorte)), label = "periodo") +
geom_line(color = "Purple") +
geom_line(aes(y = mean(Alcance)), color = "red", linetype = "dotted") +
scale_x_datetime(labels = date_format("%m-%d"),
date_breaks = "4 days") +
scale_y_continuous(breaks = seq(0, 85000, by = 10000)) +
theme_light()
It returned the same graph without any coloring other than the already used purple.
I was also thinking that maybe this isn't even the best way to showcase the 3 different time periods in the timeline. Something like bars that divide the line could also do the trick. Any tips are welcome.
If you want separate purple lines for each series distinguished by the Recorte column, you can use geom_line(aes(group = Recorte), color = "purple")
If you want separate lines with different colors, you could use geom_line(aes(color = Recorte)) and combine that with something like scale_color_manual(values = c("Periodo 1" = "purple", "Periodo 2" = "darkorchid", "Periodo 3" = "darkmagenta")) if you want to specify different purple colors for the various series.
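If the goal is a single connected line whose color changes with the period, one option is to map color to Recorte and force everything into one group. This is a sketch on synthetic data: the data frame campanha, its hora column, and the color choices are hypothetical stand-ins for campanhalutudo and `Hora da publicação`.

```r
library(ggplot2)

# Hypothetical stand-in for campanhalutudo (values are made up)
set.seed(1)
campanha <- data.frame(
  hora    = as.POSIXct("2020-10-01 12:00:00", tz = "UTC") + (0:29) * 86400,
  Alcance = round(runif(30, 1000, 8000)),
  Recorte = rep(c("Periodo 1", "Periodo 2", "Periodo 3"), each = 10)
)

p <- ggplot(campanha, aes(x = hora, y = Alcance)) +
  # group = 1 keeps one connected line; colour varies along it
  geom_line(aes(colour = Recorte, group = 1)) +
  # Overall mean as a horizontal reference line
  geom_hline(yintercept = mean(campanha$Alcance),
             colour = "red", linetype = "dotted") +
  scale_colour_manual(values = c("Periodo 1" = "purple",
                                 "Periodo 2" = "darkorange",
                                 "Periodo 3" = "steelblue")) +
  scale_x_datetime(date_labels = "%m-%d", date_breaks = "4 days") +
  theme_light()

p
```

Without group = 1, ggplot2 draws one line per value of Recorte, so the line breaks wherever the period changes.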

Barplots in pairs for each row of a dataframe

I'm fairly new to R so please excuse me for the noob question.
I have a dataframe that looks like this:
gene ctrl treated
gene_1 100 37.5
gene_2 100 20.2
... ... ...
For each row (ie each gene) in the df, I want to plot the values in such a way that ctrl and treated are one next to the other.
The code below gives something close to what I want, but the output is not grouped as it should be: the bars for controls are plotted before the ones for treated samples.
barplot(height = c(df$df.ctrl1, df$df.avg_treated), names.arg = df$df.gene)
I know there are many similar questions, but I've gone through them with no success.
Can anyone help me understand what I am doing wrong?
Second (optional) question: what if I want to color-code the bars according to the gene id?
Many thanks.
I would use ggplot for this. Let's start with a slightly expanded example:
df <- data.frame(genes = c("gene_1", "gene_2", "gene_3", "gene_4"),
ctrl = c(50, 60, 70, 80),
treated = c(55, 64, 75, 83))
df
#> genes ctrl treated
#> 1 gene_1 50 55
#> 2 gene_2 60 64
#> 3 gene_3 70 75
#> 4 gene_4 80 83
The first thing we need to do is switch the dataframe to long format using tidyr::pivot_longer to put all your values in one column, and the labels "ctrl" and "treated" in another column. Then we can use ggplot to build our output:
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(cols = c("ctrl", "treated")) %>%
ggplot(aes(name, value, fill = genes, alpha = name)) +
geom_col(position = position_dodge(), color = "black") +
scale_alpha_manual(values = c(0.5, 1), guide = guide_none()) +
facet_grid(~genes, scales = "free_x", switch = "x") +
theme(strip.placement = "outside",
panel.spacing = unit(0, "points"),
strip.background = element_blank(),
strip.text = element_text(face = "bold", size = 12)) +
labs(x = "Gene")
Created on 2020-08-22 by the reprex package (v0.3.0)
Consider transposing your data, converting into matrix with dimnames. Then run barplot with legend. Below demonstrates with random data. Note: ylim is adjusted for pretty range limit.
set.seed(92220)
df <- data.frame(gene = paste("gene", 1:30),
ctrl = runif(30, 50, 100),
treated = runif(30, 50, 100))
head(df)
# gene ctrl treated
# 1 gene 1 75.74607 76.15832
# 2 gene 2 61.73860 70.19874
# 3 gene 3 56.57906 63.67602
# 4 gene 4 60.23045 80.21108
# 5 gene 5 62.52773 60.86909
# 6 gene 6 85.71849 61.25974
# TRANSPOSE INTO MATRIX WITH DIMNAMES
dat <- `dimnames<-`(t(as.matrix(df[c("ctrl", "treated")])),
list(c("ctrl", "treated"), df$gene))
barplot(dat, beside=TRUE, col=c("blue", "red"), las=3,
main="Control Vs. Treatment",
ylim=range(pretty(c(0, dat*1.05))))
legend("top", legend=row.names(dat),
fill=c("blue", "red"), ncol=2, cex=0.75)

ggplot2 - create stacked histogram of proportions for individuals, and separate them by population

Essentially, I have a dataset in which I have 4 columns containing the following information: individuals ("Ind"), the geographic population to which those individuals belong ("Pop"), the proportion of their genome that belongs to cluster1 and the proportion of their genome that belongs to cluster2 (these last two add up to 1).
Example:
Ind <- c(1:20)
Pop <- rep(1:2, each = 10)
set.seed(234)
Cluster1 <- runif(20, 0.0, 1.0)
Cluster2 <- 1-Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)
Data:
Ind Pop Cluster1 Cluster2
1 1 1 0.745619998 0.25438000
2 2 1 0.781712425 0.21828758
3 3 1 0.020037114 0.97996289
4 4 1 0.776085387 0.22391461
5 5 1 0.066910093 0.93308991
6 6 1 0.644795124 0.35520488
7 7 1 0.929385959 0.07061404
8 8 1 0.717642189 0.28235781
9 9 1 0.927736510 0.07226349
10 10 1 0.284230120 0.71576988
11 11 2 0.555724930 0.44427507
12 12 2 0.547701653 0.45229835
13 13 2 0.582847855 0.41715215
14 14 2 0.582989913 0.41701009
15 15 2 0.001198341 0.99880166
16 16 2 0.441117854 0.55888215
17 17 2 0.313152501 0.68684750
18 18 2 0.740014466 0.25998553
19 19 2 0.138326844 0.86167316
20 20 2 0.871777777 0.12822222
I want to try and produce a plot using ggplot2 that resembles the "A" panel in this figure. In this figure, each individual is a bar with the proportion of each cluster, but the x ticks are the populations and the vertical grids separate these populations. I know that I can easily produce a stacked histogram if I ignore Pop and use melt(). But I would like to know how to incorporate Pop to produce an elegant plot such as the one in the link above.
Thanks!
How about melting with both Ind and Pop as id variables and graphing it with a facet_grid? It's not 100% like the plot you were looking for but gets pretty close with a few theme adjustments:
library(ggplot2)
library(reshape2) # for melt()
dfm <- melt(df, id = c("Ind", "Pop"))
ggplot(dfm, aes(Ind, value, fill = variable)) +
geom_bar(stat="identity", width = 1) +
facet_grid(~Pop, scales = "free_x") +
scale_y_continuous(name = "", expand = c(0, 0)) +
scale_x_continuous(name = "", expand = c(0, 0), breaks = dfm$Ind) +
theme(
panel.border = element_rect(colour = "black", size = 1, fill = NA),
strip.background = element_rect(colour = "black", size = 1),
panel.spacing = unit(0, "cm"), # panel.margin in older ggplot2 versions
axis.text.x = element_blank()
)
UPDATE: my example fails to cover the more complex case of multiple populations with uneven numbers of individuals. A quick amendment handles this case with the space = "free_x" argument of facet_grid. Complete code for the example:
require(ggplot2)
require(reshape2)
require(grid)
Ind <- c(1:30)
Pop <- rep(paste("Pop", 1:3), times = c(5, 15, 10))
set.seed(234)
Cluster1 <- runif(30, 0.0, 1.0)
Cluster2 <- 1-Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)
dfm <- melt(df, id = c("Ind", "Pop"))
ggplot(dfm, aes(Ind, value, fill = variable)) +
geom_bar(stat="identity", width = 1) +
facet_grid(~Pop, scales = "free_x", space = "free_x") +
scale_y_continuous(name = "", expand = c(0, 0)) +
scale_x_continuous(name = "", expand = c(0, 0), breaks = dfm$Ind) +
theme(
panel.border = element_rect(colour = "black", size = 1, fill = NA),
strip.background = element_rect(colour = "black", size = 1),
panel.spacing = unit(0, "cm"), # panel.margin in older ggplot2 versions
axis.text.x = element_blank()
)
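For readers who prefer tidyr over reshape2, the reshaping step can also be sketched with pivot_longer. This is an alternative to the melt() call above, not the answer's original method. The column names variable and value are chosen here only to mirror what melt() produces.

```r
library(tidyr)
library(ggplot2)

# Same example data as above
Ind <- 1:30
Pop <- rep(paste("Pop", 1:3), times = c(5, 15, 10))
set.seed(234)
Cluster1 <- runif(30)
Cluster2 <- 1 - Cluster1
df <- data.frame(Ind, Pop, Cluster1, Cluster2)

# Long format: one row per individual and cluster
dfl <- pivot_longer(df, cols = c(Cluster1, Cluster2),
                    names_to = "variable", values_to = "value")

p <- ggplot(dfl, aes(Ind, value, fill = variable)) +
  geom_col(width = 1) +
  # space = "free_x" makes panel width proportional to the number of individuals
  facet_grid(~Pop, scales = "free_x", space = "free_x") +
  scale_y_continuous(name = "", expand = c(0, 0)) +
  scale_x_continuous(name = "", expand = c(0, 0)) +
  theme(panel.spacing = unit(0, "cm"),
        axis.text.x   = element_blank())

p
```

geom_col() is shorthand for geom_bar(stat = "identity"), so the bars are the same as in the code above.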
