For some reason I am getting two legends in my dot-whisker plot.
The plot is produced by the code below.
The data are available here.
#first importing data
Q2a<-read.table("~/Q2a.txt", header=T)
# Optionally, read in data directly from figshare.
# Q2a <- read.table("https://ndownloader.figshare.com/files/13283882?private_link=ace5b44bc12394a7c46d", header=TRUE)
library(dplyr)
#splitting into female and male
F2female<-Q2a %>%
filter(sex=="F")
F2male<-Q2a %>%
filter(sex=="M")
library(lme4)
#Female models
ab_f2_f_LBS = lmer(LBS ~ ft + grid + (1|byear), data = F2female)
ab_f2_f_surv = glmer.nb(age ~ ft + grid + (1|byear), data = F2female, control=glmerControl(tol=1e-6,optimizer="bobyqa",optCtrl=list(maxfun=1e19)))
#Male models
ab_f2_m_LBS = lmer(LBS ~ ft + grid + (1|byear), data = F2male)
ab_f2_m_surv = glmer.nb(age ~ ft + grid + (1|byear), data = F2male, control=glmerControl(tol=1e-6,optimizer="bobyqa",optCtrl=list(maxfun=1e19)))
I only plot two of the variables (ft2 and gridSU) from each model.
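library(broom)  # provides tidy(); load it before use (newer broom versions need broom.mixed to tidy lmer/glmer fits)
ab_f2_f_LBS <- tidy(ab_f2_f_LBS) %>% filter(!grepl('sd_Observation.Residual', term)) %>% filter(!grepl('byear', group)) %>% mutate(model = "ab_f2_f_LBS")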
ab_f2_m_LBS <- tidy(ab_f2_m_LBS) %>% filter(!grepl('sd_Observation.Residual', term)) %>% filter(!grepl('byear', group)) %>% mutate(model = "ab_f2_m_LBS")
ab_f2_f_surv <- tidy(ab_f2_f_surv)%>% filter(!grepl('sd_Observation.Residual', term)) %>% filter(!grepl('byear', group)) %>% mutate(model = "ab_f2_f_surv")
ab_f2_m_surv <- tidy(ab_f2_m_surv) %>% filter(!grepl('sd_Observation.Residual', term)) %>% filter(!grepl('byear', group)) %>% mutate(model = "ab_f2_m_surv")
tidy_mods <- bind_rows(ab_f2_f_LBS, ab_f2_m_LBS, ab_f2_f_surv, ab_f2_m_surv)
I am then ready to make a dot-whisker plot.
#required packages
library(dotwhisker)
library(broom)
dwplot(tidy_mods,
vline = geom_vline(xintercept = 0, colour = "black", linetype = 2), # plot line at zero _behind_ coefs
conf.int=TRUE,
dodge_size=0.2, #space between the CIs
dot_args = list(aes(shape = model), size = 3), #maps point shape to model (this adds a shape legend) and sets the point size
style="dotwhisker") %>%
relabel_predictors(c(DamDisFate2= "Immigrant mothers",
gridSU = "Grid (SU)")) +
theme_classic() +
xlab("Coefficient estimate (+/- CI)") +
ylab("") +
scale_color_manual(values=c("#000000", "#666666", "#999999", "#CCCCCC"),
labels = c("Daughter LBS", "Son LBS", "Daughter longevity", "Son longevity"),
name = "First generation models, maternity known") +
theme(axis.title=element_text(size=15),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15, angle=90, hjust=.5),
legend.position = c(0.7, 0.7),
legend.justification = c(0, 0),
legend.title=element_text(size=15),
legend.text=element_text(size=13),
legend.key = element_rect(size = 0),
legend.key.size = unit(0.5, "cm"))+
guides(colour = guide_legend(override.aes=list(shape=c(16,17,15,3)))) #changes shape of points in legend
I am encountering this problem:
As the plot shows, I have two legends: one that is unmodified and one that is modified.
I can't find a shortcut within theme(), and the dotwhisker package (dwplot()) doesn't offer a solution either.
How can I suppress the unmodified legend (bottom one) and only keep my modified legend (top one)?
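dwplot() returns a ggplot object, so the usual ggplot2 guide settings apply. The unmodified legend comes from aes(shape = model) in dot_args, which creates a separate shape scale; add shape="none" to your guides() to suppress it: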
guides(colour = guide_legend(override.aes=list(shape=c(16,17,15,3))), shape="none")
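To see why this works, here is a minimal sketch in plain ggplot2 (not dwplot()), using a hypothetical toy data frame in place of tidy_mods. When colour and shape are both mapped to model, ggplot2 only merges them into one legend if the two scales share the same title and labels; customising only the colour scale therefore leaves a second, default shape legend, which shape = "none" removes:

```r
library(ggplot2)

# Hypothetical stand-in for tidy_mods: one made-up ft2 estimate per model
model_order <- c("ab_f2_f_LBS", "ab_f2_m_LBS", "ab_f2_f_surv", "ab_f2_m_surv")
toy <- data.frame(
  model    = factor(model_order, levels = model_order),  # levels fix colour/label/shape order
  term     = "ft2",
  estimate = c(0.20, -0.10, 0.40, 0.05)
)

p <- ggplot(toy, aes(x = estimate, y = term, colour = model, shape = model)) +
  geom_point(size = 3) +
  scale_colour_manual(
    values = c("#000000", "#666666", "#999999", "#CCCCCC"),
    labels = c("Daughter LBS", "Son LBS", "Daughter longevity", "Son longevity"),
    name   = "First generation models, maternity known"
  ) +
  # Show the point shapes inside the colour legend, then drop the default shape legend
  guides(colour = guide_legend(override.aes = list(shape = c(16, 17, 15, 3))),
         shape  = "none")
```

Making model a factor with an explicit level order keeps the manual colours, the labels and the default shapes (16, 17, 15, 3) aligned, since all of them are applied in level order.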
I am interested in two things: 1) a summary for multiple subgroups in the same table, and 2) a dot plot for the subgroups based on the summary generated in step 1.
For example, if this is my dataset:
library(survival)  # provides the pbc data set
data("pbc")
I would like to generate a summary of cholesterol (chol) by sex, stage, ascites and spiders for the two treatment levels, 1 and 2:
table(pbc$trt)
1 2
158 154
I can do this separately like this.
library(Hmisc)
summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=1))
summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=2))
This creates two separate summaries and two corresponding plots:
plot(summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=1)))
plot(summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=2)))
I would like the summaries to be in one table with two columns, one for trt=1 and one for trt=2:
                N    chol (trt=1)    chol (trt=2)
sex
    m           ..   ...             ...
    f           ..   ...             ...
I would also like the plots side by side: the first plot for trt=1, the second for trt=2.
Please suggest how to extend the Hmisc:::summary.formula summary function to 1) show summaries by subgroups side by side and 2) plot the summaries side by side. Thanks.
Please note that your current summaries and plots are identical. In subset(pbc, trt=1), trt=1 is passed as an extra named argument, which subset() silently ignores, rather than as the condition trt == 1, so both calls use all rows. Use subset(pbc, trt == 1), or filter(), to filter by the levels of trt.
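A quick check of the difference, using the pbc data from the survival package (the expected row counts are the ones in the question's table(pbc$trt) output):

```r
library(survival)   # provides the pbc data set
pbc_df <- survival::pbc

# `trt = 1` is a named argument, not a comparison: it lands in subset()'s `...` and is ignored
all_rows <- subset(pbc_df, trt = 1)

# `trt == 1` is a logical condition; subset() also drops rows where trt is NA
trt1 <- subset(pbc_df, trt == 1)
trt2 <- subset(pbc_df, trt == 2)

c(all = nrow(all_rows), trt1 = nrow(trt1), trt2 = nrow(trt2))
```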
First, I prefer gtsummary for tables, since tbl_continuous can build a single table instead of you having to combine two. Second, you will likely have difficulty combining your two plots, since you're using base R plotting functions on Hmisc summary objects; even saving each plot to an object returns NULL. In the long run, it may be easier to recreate each plot with ggplot and combine them with cowplot::plot_grid.
library(survival)
library(Hmisc)
# create combined summary
library(gtsummary)
library(tidyverse)
data(pbc)
df <- pbc %>%
select(id, trt, chol, sex, stage, ascites, spiders) %>%
mutate(across(c(sex, stage, ascites, spiders), as.factor)) %>%
mutate(trt = factor(trt)) %>%
mutate(chol = as.numeric(chol))
dftrt1 <- df %>% filter(trt == 1)
dftrt2 <- df %>% filter(trt == 2)
df %>%
select(trt, chol, sex, stage, ascites, spiders) %>%
tbl_continuous(variable = chol,
digits = everything() ~ 2,
statistic = everything() ~ "{mean}",
label = list(sex ~ "Sex",
stage ~ "Stage",
ascites ~ "Ascites",
spiders ~ "Spiders"),
by = trt)
# create combined plot
library(cowplot)
p1 <- dftrt1 %>%
select(-trt) %>% pivot_longer(cols = -c(id, chol)) %>% group_by(name, value) %>%
summarise(chol = mean(chol, na.rm = TRUE)) %>%
ggplot(aes(x = value, y = chol, fill = factor(value))) +
geom_point() + coord_flip() +
facet_wrap(~name, scales = "free_y", nrow = 4, strip.position = "top") +
theme(panel.spacing = unit(0, "lines"),
panel.border = element_rect(fill = NA),
strip.background = element_blank(),
axis.title.y = element_blank(),
legend.position = "none",
strip.placement = "outside") +
ggtitle("trt = 1") + theme(plot.title = element_text(hjust = 0.5))
p2 <- dftrt2 %>%
select(-trt) %>% pivot_longer(cols = -c(id, chol)) %>% group_by(name, value) %>%
summarise(chol = mean(chol, na.rm = TRUE)) %>%
ggplot(aes(x = value, y = chol, fill = factor(value))) +
geom_point() + coord_flip() +
facet_wrap(~name, scales = "free_y", nrow = 4, strip.position = "top") +
theme(panel.spacing = unit(0, "lines"),
panel.border = element_rect(fill = NA),
strip.background = element_blank(),
axis.title.y = element_blank(),
legend.position = "none",
strip.placement = "outside") +
ggtitle("trt = 2") + theme(plot.title = element_text(hjust = 0.5))
plot_grid(p1, p2, ncol = 2)
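The p1 and p2 blocks differ only in the treatment arm and the title, so a small helper can build both. This is a sketch rather than a drop-in replacement: plot_arm() is a hypothetical name, the grouping columns are converted to character so pivot_longer() stacks them into one column, and chol goes on the x axis directly, which removes the need for coord_flip() and lets scales = "free_y" show only the categories present in each facet:

```r
library(survival)  # provides the pbc data set
library(dplyr)
library(tidyr)
library(ggplot2)
library(cowplot)

pbc_df <- survival::pbc

# Hypothetical helper: mean cholesterol per level of each grouping variable, for one trt arm
plot_arm <- function(data, arm) {
  data %>%
    filter(trt == arm) %>%                       # also drops rows with trt = NA
    select(id, chol, sex, stage, ascites, spiders) %>%
    mutate(across(c(sex, stage, ascites, spiders), as.character)) %>%
    pivot_longer(cols = -c(id, chol)) %>%        # columns: id, chol, name, value
    group_by(name, value) %>%
    summarise(chol = mean(chol, na.rm = TRUE), .groups = "drop") %>%
    ggplot(aes(x = chol, y = value)) +
    geom_point() +
    facet_wrap(~name, scales = "free_y", ncol = 1) +
    labs(title = paste("trt =", arm), x = "Mean cholesterol", y = NULL)
}

p1 <- plot_arm(pbc_df, 1)
p2 <- plot_arm(pbc_df, 2)
plot_grid(p1, p2, ncol = 2)
```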
I have some example code below. I have built a figure with ggplot and it is almost there, but I would like to add an additional curve, computed from y, across all facets. The final output should look like the attached image. I'm not sure how to do this.
library(ggplot2)
x <- iris[-1:-3]
bw <- 1
nbin <- 100
y <- head(iris, 50)[2]
ggplot(x, aes(x = Petal.Width)) +
geom_density(aes(y = bw *..count.., fill = Species), size = 1, alpha = 0.4) +
facet_wrap(~Species)+
scale_x_continuous(labels = scales::math_format(10^.x), limits = c(0, 5), expand = c(0,0)) +
scale_y_continuous(expand = c(0,0), limits = c(0, NA)) +
annotation_logticks(sides = "b", short=unit(-1,"mm"), mid=unit(-2,"mm"), long=unit(-3,"mm")) +
coord_cartesian(clip='off') + theme(panel.background = element_blank(),
panel.border = element_rect(colour = "black", fill=NA))
Does this do what you are looking to achieve?
I'm sure there are better ways; basically I've generated three versions of the y data and shuffled the grouping variables to allow ggplot's facet_wrap and fill to manage the appearance.
It would be great if there is way to make one set of data appear in all facets without this repetition.
library(ggplot2)
library(dplyr)
library(tidyr)
x1 <-
x %>%
mutate(var = "Petal width") %>%
rename(val = Petal.Width)
df <-
y %>%
mutate(var = "Sepal width",
spp1 = "setosa",
spp2 = "versicolor",
spp3 = "virginica") %>%
pivot_longer(cols = starts_with("spp"), names_to = "temp", values_to = "Species") %>%
select(-temp) %>%
rename(val = Sepal.Width )%>%
bind_rows(x1) %>%
mutate(g1 = case_when(var == "Sepal width" ~ "all species: sepal width",
TRUE ~ paste0(Species, ": petal width")))
ggplot(df, aes(x = val)) +
geom_density(aes(y = bw *..count.., fill = g1), size = 1, alpha = 0.4) +
facet_wrap(~Species)+
scale_x_continuous(labels = scales::math_format(10^.x), limits = c(0, 5), expand = c(0,0)) +
scale_y_continuous(expand = c(0,0), limits = c(0, NA)) +
annotation_logticks(sides = "b", short=unit(-1,"mm"), mid=unit(-2,"mm"), long=unit(-3,"mm")) +
coord_cartesian(clip='off') + theme(panel.background = element_blank(),
panel.border = element_rect(colour = "black", fill=NA))
Created on 2020-07-01 by the reprex package (v0.3.0)
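On the wish to avoid the repetition: ggplot2 draws a layer in every facet when that layer's data does not contain the faceting variable, so y can be used once without copying it per species. A minimal sketch, using after_stat(count) (the ggplot2 3.3.0+ spelling of ..count..) and leaving out the axis and theme tweaks:

```r
library(ggplot2)

x  <- iris[-1:-3]          # Petal.Width and Species
y  <- head(iris, 50)[2]    # Sepal.Width of the first 50 rows; no Species column
bw <- 1

p <- ggplot(x, aes(x = Petal.Width)) +
  geom_density(aes(y = bw * after_stat(count), fill = Species), alpha = 0.4) +
  # y lacks the faceting variable, so ggplot2 repeats this layer in every panel
  geom_density(data = y, aes(x = Sepal.Width, y = bw * after_stat(count)),
               fill = NA, colour = "black") +
  facet_wrap(~Species)
```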
My question is similar to those posted here and here.
I am working on creating a graph in ggplot where I have one bar plot and then want to overlay multiple line graphs. For the purposes of this question, I have reproduced my code for three barplots (one that includes all years, 2007-2015, and two for specific years, 2007 and 2015), but ultimately I will be overlaying data from 10 different years. The data used can be found here.
library(dplyr)
library(tidyr)
library(gridExtra)
library(ggplot2)
overallpierc<-data[(data$item=="piercing"),]
overp<-overallpierc %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
p07<-data[(data$yy=="2007") & (data$item=="piercing"),]
summary(p07)
subp07<-p07 %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
p15<-data[(data$yy=="2015") & (data$item=="piercing"),]
subp15<-p15 %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
grid.arrange(overp, subp07, subp15)
The code I have posted gives me the following figure.
What I am trying to do is plot the frequencies for females in 2007 and 2015 and males in 2007 and 2015 on top of the barplot for total frequencies (where this is also reflected in the legend). Is there a way to do that in R using ggplot2?
UPDATE: I tried using geom_smooth and geom_line to add the lines to my ggplot, as suggested in the comments and in solutions to other users' questions, but I get the following error:
Error: Discrete value supplied to continuous scale
I created a new data frame for a subset that I would like to plot:
df<-data.frame(age=c(15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,40,50,60), val=c(0,5,13,77,70,106,62,51,46,27,46,16,22,16,14,48,21, 3,4))
And then added it to the ggplot code:
overallpierc %>%
filter(age != "15") %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) +
geom_line(data=df,aes(x=as.numeric(age),y=val),colour="blue") +
geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
Others have encountered similar issues and used as.numeric to solve the problem. However, age needs to be treated as a factor for the purposes of plotting.
Based on our discussion in the comments, let's try stacked bars and facets. I think it works but you can decide for yourself.
The stacked bar has the advantage of showing both proportions and total count in the same bar. To compare years, a facet grid places years in rows, so the eye can scan downwards to compare the same age in different years. Note that I kept age as a continuous variable here, rather than a factor.
library(dplyr)
library(ggplot2)
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(age, n)) +
geom_col(aes(fill = sex)) +
facet_grid(yy ~ .) +
theme_bw() +
scale_fill_manual(values = c("#000000", "#cccccc"))
Not bad - I can see straight away, for example, an increase in both total and female count at age 30 over time, but perhaps a little small and crowded.
We can use a facet wrap instead of a grid to make the bars clearer, but at the expense of quick visual comparison across years.
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(age, n)) +
geom_col(aes(fill = sex)) +
facet_wrap(~yy, ncol = 2) +
theme_bw() +
scale_fill_manual(values = c("#000000", "#cccccc"))
One more example which does not address your question in terms of total counts or barplots - but I thought it might be of interest. This code generates a "heatmap" style of plot which is poor for quantitative comparison, but can sometimes give a quick visual impression of interesting features. I think it shows, for example, that females aged 20 in 2014 have the highest total count.
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(factor(age), yy)) +
geom_tile(aes(fill = n)) +
facet_grid(sex ~ .) +
scale_fill_gradient2() +
scale_y_reverse(breaks = 2006:2015) +
labs(x = "age", y = "Year")
EDIT:
Based on further discussions in the comments, here is one way to plot age as a factor, using bars for sexes, overlaid with a line for the totals and split by year.
overallpierc %>%
count(yy, sex, age) %>%
ggplot() +
geom_col(aes(factor(age), n, fill = sex), position = "dodge") +
stat_summary(aes(factor(age), n, group = 1), fun = sum, geom = "line") +
facet_grid(yy ~ .)
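The "Discrete value supplied to continuous scale" error in the UPDATE comes from mixing a factor x (age in geom_col()) with a numeric x (as.numeric(age) in geom_line()) on the same axis. A minimal sketch that keeps age as a factor in both layers: the line values are the question's df entries for ages 18 to 21, while the split of each total between the sexes is made up for illustration. group = 1 lets geom_line() join points across discrete x positions:

```r
library(ggplot2)

# Hypothetical bar counts by age and sex (made-up split of the question's totals)
bars <- data.frame(
  age = factor(rep(c(18, 19, 20, 21), each = 2)),
  sex = rep(c("F", "M"), times = 4),
  n   = c(40, 37, 35, 35, 60, 46, 30, 32)
)

# Line data from the question's df (ages 18-21)
line_df <- data.frame(age = c(18, 19, 20, 21), val = c(77, 70, 106, 62))
# Same factor levels as the bars, so both layers share one discrete x scale
line_df$age <- factor(line_df$age, levels = levels(bars$age))

p <- ggplot(bars, aes(age, n)) +
  geom_col(aes(fill = sex), position = "dodge") +
  # group = 1 makes geom_line() connect the points across the discrete ages
  geom_line(data = line_df, aes(x = age, y = val, group = 1), colour = "blue") +
  scale_fill_manual(values = c("#000000", "#CCCCCC"), name = "Sex") +
  labs(x = "Age", y = "Number of observations")
```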
I am trying to use facet_wrap to plot individual plots.
library(lme4)
library(ggplot2)
library(dplyr)
library(tibble)
# Convert to tibble for better printing. Convert factors to strings
sleepstudy <- sleepstudy %>%
as_tibble() %>%
mutate(Subject = as.character(Subject))
xlab <- "Days of sleep deprivation"
ylab <- "Average reaction time (ms)"
ggplot(sleepstudy) +
aes(x = Days, y = Reaction) +
stat_smooth(method = "lm", se = FALSE) +
# Put the points on top of lines
geom_point() +
facet_wrap("Subject") +
labs(x = xlab, y = ylab) +
theme(axis.text=element_text(size=0.02),
axis.title=element_text(size=0.02,face="bold"),
plot.title = element_text(size=0.02)) +
theme(strip.text.x = element_text(size = 8),
strip.background = element_rect(fill="lightblue", colour="black",size=0.2)) +
theme(strip.text.x = element_text(margin = margin(0.02, 0, 0.02, 0, "cm")))
I only want to visualise selected subjects with facet_wrap. At the moment, it plots every Subject. How do I plot only, say, subjects 308 and 352?
Thanks
You just want to filter your data before plotting
library(lme4)
library(dplyr)
library(tibble)
library(ggplot2)
# Convert to tibble for better printing. Convert factors to strings
sleepstudy <- sleepstudy %>%
as_tibble() %>%
mutate(Subject = as.character(Subject))
xlab <- "Days of sleep deprivation"
ylab <- "Average reaction time (ms)"
sleepstudy %>%
filter(Subject %in% c("308", "352")) %>%
ggplot(.) +
aes(x = Days, y = Reaction) +
stat_smooth(method = "lm", se = FALSE) +
# Put the points on top of lines
geom_point() +
facet_wrap("Subject") +
labs(x = xlab, y = ylab) +
theme(axis.text=element_text(size=0.02),
axis.title=element_text(size=0.02,face="bold"),
plot.title = element_text(size=0.02)) +
theme(strip.text.x = element_text(size = 8),
strip.background = element_rect(fill="lightblue", colour="black",size=0.2)) +
theme(strip.text.x = element_text(margin = margin(0.02, 0, 0.02, 0, "cm")))
I plan to build a customized ACF and PACF plot for a simulated time series
ts <- arima.sim(n=5300,list(order=c(2,0,1), ar=c(0.4,0.3), ma=-0.2))
Below is the code I wrote to produce the plot with ggplot2:
library(ggplot2)
library(dplyr)    # %>% and mutate()
library(tibble)   # as_tibble()
library(gridExtra)
theme_setting <- theme(
panel.background = element_blank(),
panel.grid.major.y = element_line(color="grey90", size=0.5),
panel.grid.major.x = element_blank(),
panel.border = element_rect(fill=NA, color="grey20"),
axis.text = element_text(family="Times"),
axis.title = element_text(family="Times"),
plot.title = element_text(size=10, hjust=0.5, family="Times"))
acf_ver_conf <- acf(ts, plot=FALSE)$acf %>%
as_tibble() %>% mutate(lags = 0:(n() - 1)) %>%  # acf() output starts at lag 0
ggplot(aes(x=lags, y = V1)) + scale_x_continuous(breaks=seq(0,41,4)) +
labs(y="Autocorrelations", x="Lag", title= "Time Series, ACF") +
geom_segment(aes(xend=lags, yend=0)) +geom_point() + theme_setting
pacf_ver_conf <- pacf(ts, main=NULL,plot=FALSE)$acf %>%
as_tibble() %>% mutate(lags = 1:n()) %>%
ggplot(aes(x=lags, y = V1)) +
geom_segment(aes(xend=lags, yend=0)) +geom_point() + theme_setting +
scale_x_continuous(breaks=seq(0,41,4))+
labs(y="Partial Autocorrelations", x="Lag", title= "Time Series, PACF")
grid.arrange(acf_ver_conf, pacf_ver_conf, ncol=2)
While this is exactly what I want, I am not sure how to produce the confidence intervals in acf(ts) and pacf(ts):
So, my question has two parts:
How do I statistically derive the upper and lower bounds of the confidence intervals for the autocorrelation and partial autocorrelation functions in R?
How would I plot them onto the first graph? I was thinking about geom_ribbon, but any other ideas would be appreciated!
This may work. The limits are ±qnorm((1 + ci)/2)/sqrt(n.used), the same white-noise bounds that plot.acf() draws by default (formula taken from https://stats.stackexchange.com/questions/211628/how-is-the-confidence-interval-calculated-for-the-acf-function; it may need some tweaking):
ts.acf <- acf(ts, plot=TRUE)
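alpha <- 0.95  # confidence level (0.95 gives +/- 1.96/sqrt(n)), not a significance level
conf.lims <- c(-1,1)*qnorm((1 + alpha)/2)/sqrt(ts.acf$n.used)
ts.acf$acf %>%
as_tibble() %>% mutate(lags = 0:(n() - 1)) %>%  # acf() output starts at lag 0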
ggplot(aes(x=lags, y = V1)) + scale_x_continuous(breaks=seq(0,41,4)) +
geom_hline(yintercept=conf.lims, lty=2, col='blue') +
labs(y="Autocorrelations", x="Lag", title= "Time Series, ACF") +
geom_segment(aes(xend=lags, yend=0)) +geom_point() + theme_setting
ts.pacf <- pacf(ts, main=NULL,plot=TRUE)
alpha <- 0.95  # confidence level, as for the ACF bounds
conf.lims <- c(-1,1)*qnorm((1 + alpha)/2)/sqrt(ts.pacf$n.used)
ts.pacf$acf %>%
as_tibble() %>% mutate(lags = 1:n()) %>%
ggplot(aes(x=lags, y = V1)) +
geom_segment(aes(xend=lags, yend=0)) +geom_point() + theme_setting +
scale_x_continuous(breaks=seq(0,41,4))+
geom_hline(yintercept=conf.lims, lty=2, col='blue') +
labs(y="Partial Autocorrelations", x="Lag", title= "Time Series, PACF")
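acf() returns the correlation at lag 0 as its first element, while pacf() starts at lag 1, so numbering rows with 1:n() shifts the ACF by one lag. A sketch that instead reads the lags from each object's own lag component and reuses the bound above; the helper names (acf_to_df, white_noise_bound, plot_corr) are hypothetical, and set.seed() is added only for reproducibility:

```r
library(ggplot2)

set.seed(1)  # only so the simulated series is reproducible
sim <- arima.sim(n = 5300, list(order = c(2, 0, 1), ar = c(0.4, 0.3), ma = -0.2))

# Hypothetical helper: flatten an "acf" object (from acf() or pacf()) into a data frame,
# taking the lags from the object itself (lag 0 first for acf(), lag 1 first for pacf())
acf_to_df <- function(obj) {
  data.frame(lag = as.vector(obj$lag), value = as.vector(obj$acf))
}

# Hypothetical helper: the white-noise bound plot.acf() draws by default (ci = 0.95)
white_noise_bound <- function(obj, ci = 0.95) {
  qnorm((1 + ci) / 2) / sqrt(obj$n.used)
}

# Hypothetical helper: lollipop plot with dashed +/- bounds
plot_corr <- function(obj, ylab, title) {
  df  <- acf_to_df(obj)
  lim <- white_noise_bound(obj)
  ggplot(df, aes(x = lag, y = value)) +
    geom_hline(yintercept = c(-lim, lim), linetype = 2, colour = "blue") +
    geom_segment(aes(xend = lag, yend = 0)) +
    geom_point() +
    labs(x = "Lag", y = ylab, title = title)
}

acf_obj  <- acf(sim, plot = FALSE)
pacf_obj <- pacf(sim, plot = FALSE)
p_acf  <- plot_corr(acf_obj, "Autocorrelations", "Time Series, ACF")
p_pacf <- plot_corr(pacf_obj, "Partial Autocorrelations", "Time Series, PACF")
```

The two plots can then be combined with grid.arrange(p_acf, p_pacf, ncol = 2) as in the question, with theme_setting added to each.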