How to produce a similar plot? [R]

The authors of this paper (https://www.sciencedirect.com/science/article/pii/S0092867415006418) mention in their supplementary file that these were produced in Matlab. Due to lack of proficiency, time to learn it, and the license, I was trying to replicate the figure below (Figure 2 of the paper, specifically figure 2A on the left) in R:
Any suggestions? What is this plot called more generally?
Thank you!

To me it looks like a classic point plot! You can reproduce this kind of plot in R with ggplot:
# Fake dataframe with xy coordinates, type of data (for the coloring), pvalue (for size), and different panel
df <- data.frame(
x = rep(1:20, 10),
y = rnorm(200, mean = 0, sd = 2),
type = rep(rep(LETTERS[1:5], each = 4), 10),
pvalue = sample(0:50, size = 200, replace = TRUE)/1000,
panel = sample(rep(paste0("panel", 1:4), each = 50), 200, replace = FALSE))
# plot
library(ggplot2)
ggplot(df, aes(x, y*x , color = type, size = pvalue)) + geom_hline(yintercept = 0) + geom_point() + facet_wrap(~panel, ncol = 2)
ggsave("demo.png")

Related

Get sjPlot in R to show and sort estimates

I am trying to make an interaction plot in sjPlot showing percent probabilities of my outcome under two conditions of my predictive variable. Everything works perfectly, except the show.values = T and sort.est = T arguments, which don't seem to do anything. Is there a way to get this to work? Or, if not, how can I extract the data frame sjPlot is using to create this figure? I am looking for some way to either label or tabulate the displayed probability values. Thank you!
Here is some example data and what I have so far:
set.seed(100)
dat <- data.frame(Species = rep(letters[1:10], each = 5),
threat_cat = rep(c("recreation", "climate", "pollution", "fire", "invasive_spp"), 10),
impact.pres = sample(0:1, size = 50, replace = T),
threat.pres = sample(0:1, size = 50, replace = T))
mod <- glm(impact.pres ~ 0 + threat_cat/threat.pres,
data = dat, family = "binomial")
library(sjPlot)
library(ggpubr)
plot_model(mod, type = "int",
title = "",
axis.title = c("Threat category", "Predicted probabilities of threat being observed"),
legend.title = "Threat predicted",
colors = c("#f2bf10",
"#4445ad"),
line.size = 2,
dot.size = 4,
sort.est = T,
show.values = T)+
coord_flip()+
theme_pubr(legend = "right", base_size = 30)
sjPlot produces a ggplot object, so you can examine the aesthetic mappings and underlying data. After a bit of digging around you will find the default mapping is already correct for the x, y placements of text labels, so all you need to do is add a geom_text to the plot, and only need to specify the labels as an aesthetic mapping. You can get the labels from a column called predicted stored in the ggplot object.
The upshot is that if you add the following layer to your plot:
geom_text(aes(label = scales::percent(predicted)),
position = position_dodge(width = 1), size = 8)
You get
Getting the labels in order is trickier. You have to fiddle with the internal components of the plot to do this. Suppose we store the above plot as p, then we can sort by the predicted percentages by doing:
p$data <- as.data.frame(p$data)
ord <- p$data$x[p$data$group == 1][order(p$data$predicted[p$data$group == 1])]
p$data$x <- match(p$data$x, ord)
p$scales$scales[[1]]$labels <- p$scales$scales[[1]]$labels[ord]
p
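The same kind of inspection works on any ggplot object, which may help when tabulating the plotted values. Below is a minimal sketch on a made-up boxplot of the built-in mtcars data (not the sjPlot output above); the names p and layer1 are only illustrative. p$data holds the data frame supplied to the plot, and ggplot_build() returns the values computed for each layer.

```r
library(ggplot2)

# Illustrative plot built from the built-in mtcars data (an assumption, not the sjPlot figure)
p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot()

# The data frame and default aesthetic mappings stored in the plot object
head(p$data)
p$mapping

# The values ggplot2 actually computes for the first layer
# (for a boxplot: ymin, lower, middle, upper, ymax per group)
layer1 <- ggplot_build(p)$data[[1]]
layer1[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```

For an sjPlot figure the column names will differ (the answer above uses x, predicted and group), so check names(p$data) first.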

Advice on how to plot side-by-side histograms with a line graph going through in ggplot2

I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line for the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like one graph, but I have had issues with the endpoints matching up, since each facet is a separate panel. Can anyone give me some advice on how to do this? I am not particularly partial to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
Result. You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
# plyr is not attached, so summarise must be namespaced as well
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median")  # 'fun' replaces the deprecated 'fun.y' (ggplot2 >= 3.3.0)
This produces a violin for each value of rho, and joins the medians for each violin.
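If the medians also need to be tabulated, they can be computed once and reused for the trend line. The following is a rough sketch using the simulated data from the question; it swaps the histograms for boxplots on a continuous rho axis, which is an assumption about what would be acceptable, and the name med is only illustrative.

```r
library(ggplot2)

set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
                            l_1024 = rnorm(900))

# One median per value of rho; columns are named rho and l_1024
med <- aggregate(l_1024 ~ rho, data = opt_lens_geom, FUN = median)

# Boxplots placed at their numeric rho value, with a line through the medians
p <- ggplot(opt_lens_geom, aes(rho, l_1024)) +
  geom_boxplot(aes(group = rho)) +
  geom_line(data = med, colour = "red") +
  geom_point(data = med, colour = "red")
p
```

Because everything sits in a single panel, there are no facet endpoints to line up.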

How to annotate lines like this in ggplot2?

See example:
I hope I don't need to manually assign the coordinates of the text labels. If this is too complicated to achieve in ggplot2, what are the alternatives in R? Or maybe even outside R?
As @Axeman says, ggrepel is a decent option. Unfortunately it only avoids overlap with other labels, not with the lines, so the solution isn't quite perfect.
library(ggplot2)
# install.packages("ggrepel")  # run once if ggrepel is not installed
library(ggrepel)
set.seed(50)
d <- data.frame(y = c(rnorm(50), rnorm(50, 5), rnorm(50, 10)),
x = rep(seq(50), times = 3),
group = rep(LETTERS[seq(3)], each = 50))
ggplot(d, aes(x, y, group = group, label = group)) +
geom_line() +
geom_text_repel(data = d[d$x == sample(d$x, 1), ], size = 10)
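If placing each label at the right-hand end of its line is acceptable, plain geom_text may be enough. This is a minimal sketch on the same simulated data; the helper data frame ends and the nudge and expansion amounts are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(50)
d <- data.frame(y = c(rnorm(50), rnorm(50, 5), rnorm(50, 10)),
                x = rep(seq(50), times = 3),
                group = rep(LETTERS[seq(3)], each = 50))

# Last observation of each line: one row per group
ends <- d[d$x == max(d$x), ]

p <- ggplot(d, aes(x, y, group = group)) +
  geom_line() +
  # Left-align each label just past the end of its line
  geom_text(data = ends, aes(label = group), hjust = 0, nudge_x = 0.5) +
  # Leave room on the right so the labels are not clipped
  expand_limits(x = max(d$x) + 3)
p
```

Labels can still collide when two lines end close together, in which case ggrepel remains the better choice.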

R -- paired dot plot and box plot on same graph: is there a template in ggplot2 or another package?

I am a new R user and found graphs I would like to replicate with my data. From the look of the plot, it looks as though it was made in ggplot2. I've searched and searched and can't find a template within ggplot2 or another package. Just wondering if anyone has seen template code for this?
See attached image and paper here: http://ehp.niehs.nih.gov/1205963/
Perhaps this will get you started:
library(ggplot2)
d <- data.frame(y = rnorm(20, 9, 2),
group = as.factor(rep(c('Post-FAP', 'Post-DEP'), each = 10)),
id = rep(1:10, 2))
ggplot(d, aes(y = y)) +
geom_boxplot(aes(x = rep(c(-3, 3), each = 10), group = group), fill = 'steelblue') +
geom_point(aes(x = rep(c(-1, 1), each = 10)), size = 5) +
geom_line(aes(x = rep(c(-1, 1), each = 10), group = id))

Plot jitter points for certain boxplots only

I have five files with data in matrix form and I am plotting them using geom_boxplot. Each boxplot corresponds to a file.
What I want to achieve is to overlay the data points on the boxplots for only certain files, say div1, div3 and div5. I could add data points using geom_jitter, but I had to separate the plots with data points from the boxplot-only plots.
Since I want to preserve the plotting order of the files (i.e. div0, div1, etc.), I could not plot data points for only certain boxplots.
How can I overlay data points on only certain boxplots and not all of them?
files <- c(div0,div1,div2,div3,div4,div5)
p1 <- ggplot(moltenNew,aes(x=L1,y=value,colour=L1))+ ylim(0.3,0.8) +
geom_boxplot() + facet_wrap(~variable,nrow=1) + scale_x_discrete(limits = basename(files) ,labels = basename(files))
You could use subset:
set.seed(1)
moltenNew <- rbind(
data.frame(value = rnorm(20, 50, 20), L1 = gl(2, 10), variable = 1),
data.frame(value = rnorm(20, 100, 100), L1 = gl(2, 10), variable = 2),
data.frame(value = rnorm(20, 75, 10), L1 = gl(2, 10), variable = 3)
)
moltenNew
library(ggplot2)
ggplot(moltenNew,aes(x=L1,y=value,colour=L1)) +
geom_boxplot() +
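facet_wrap(~variable, nrow = 1, scales = "free_y") +
# Restrict the point layer by subsetting its data; the older
# geom_point(subset = .(variable == 2)) form was removed in ggplot2 2.0.0
geom_point(data = subset(moltenNew, variable == 2), position = position_jitter(width = .2))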
