Combining 2 different graph outputs in R into one graph

I used the following code to generate graphs, where apple and appl are two separate plots, and now I want to combine them into a single graph:
data <- ddply(data, .(Value), summarise,
N = length(means),
mean = mean(means),
sd = sd(means),
se = sd(means) / sqrt(length(means)) )
apple=ggplot(data, aes(x=Value, y=mean)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1) +
geom_ribbon(aes(ymin=mean-se, ymax=mean+se),alpha=0.5) +
geom_line() +
geom_point()
dat <- ddply(dat1, .(Value), summarise,
N = length(means),
mean = mean(means),
sd = sd(means),
se = sd(means) / sqrt(length(means)))
appl=ggplot(dat, aes(x=Value, y=mean)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1) +
geom_ribbon(aes(ymin=mean-se, ymax=mean+se),alpha=0.5) +
geom_line() +
geom_point()

The answer involves combining the datasets into one big one, with an additional column specifying to which dataset that subset belonged. There is no need for creating plots separately and combining them. Let's assume that column is named id, then you can use an additional argument in aes to get the plot to work, i.e. aes(x=Value, y=mean, color=id). Combining the datasets can be done using rbind.
A code example:
library(plyr) # ddply() and summarise()
library(ggplot2)
df1 = data.frame(Value = sample(LETTERS[1:8], 1000, replace = TRUE),
means = runif(1000))
df2 = data.frame(Value = sample(LETTERS[1:8], 1000, replace = TRUE),
means = runif(1000) + 0.5)
df1 = ddply(df1, .(Value), summarise,
N = length(means),
mean = mean(means),
sd = sd(means),
se = sd(means) / sqrt(length(means)))
df1$id = "ID1"
df2 = ddply(df2, .(Value), summarise,
N = length(means),
mean = mean(means),
sd = sd(means),
se = sd(means) / sqrt(length(means)))
df2$id = "ID2"
df_all = rbind(df1, df2)
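# Value is discrete, so group = id is needed for the lines and ribbons to connect points within each dataset
ggplot(df_all, aes(x=Value, y=mean, color = id, group = id)) +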
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1) +
geom_ribbon(aes(ymin=mean-se, ymax=mean+se),alpha=0.5) +
geom_line() +
geom_point()
Which results in the following graph:
Note that I have had to invent some data due to the lack of example data from your side, so this might not entirely fit your situation. However, it nicely illustrates the approach.
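As a variation on the same idea, the raw datasets can be stacked first and summarised once per id and Value. This is only a sketch under assumptions: raw1, raw2 and summarise_group are made-up names, the data are simulated, and base R aggregate() is swapped in for ddply.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for the two raw datasets
raw1 <- data.frame(Value = sample(LETTERS[1:8], 1000, replace = TRUE),
                   means = runif(1000))
raw2 <- data.frame(Value = sample(LETTERS[1:8], 1000, replace = TRUE),
                   means = runif(1000) + 0.5)

# Label each dataset, then stack them into one data frame
raw_all <- rbind(cbind(raw1, id = "ID1"), cbind(raw2, id = "ID2"))

# Mean and standard error for every (id, Value) combination
summarise_group <- function(x) c(mean = mean(x), se = sd(x) / sqrt(length(x)))
agg <- aggregate(means ~ id + Value, data = raw_all, FUN = summarise_group)
# aggregate() returns the two statistics as a matrix column; spread it out
df_all <- data.frame(agg[c("id", "Value")], agg$means)

# group = id connects the points of each dataset across the discrete x axis
p <- ggplot(df_all, aes(x = Value, y = mean, colour = id, group = id)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1) +
  geom_line() +
  geom_point()
```

Printing p draws both series in one panel, one colour per dataset.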

Related

Assigning many line colors based on group in ggplot

Suppose I have some code like the following, generating a lineplot with a considerable number of lines (example taken from here)
library(ggplot2)
library(reshape2)
n = 1000
set.seed(123)
mat = matrix(rnorm(n^2), ncol=n)
cmat = apply(mat, 2, cumsum)
cmat = t(cmat)
rownames(cmat) = paste("trial", seq(n), sep="")
colnames(cmat) = paste("time", seq(n), sep="")
dat = as.data.frame(cmat)
dat$trial = rownames(dat)
mdat = melt(dat, id.vars="trial")
mdat$time = as.numeric(gsub("time", "", mdat$variable))
p = ggplot(mdat, aes(x=time, y=value, group=trial)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.1)
So here, "trial number" is my group producing all of these lines, and there are 1000 trials.
Suppose I want to "group my grouping variable" now - that is, I want to see the exact same lines in this plot, but I want the first 500 trial lines to be one color and the next 500 trial lines to be another. How can I do this with ggplot? I've been poking around for some time and I can't figure out how to manually set the colors per group.
Add a variable splitting the data into two groups, then use it to color the lines in ggplot:
dat = as.data.frame(cmat)
dat$trial = rownames(dat)
dat$group = rep(c("a","b"), each = n/2)
mdat = melt(dat, id.vars=c("trial", "group"))
mdat$time = as.numeric(gsub("time", "", mdat$variable))
p = ggplot(mdat, aes(x=time, y=value, group=trial, color = group)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.1)
One possible solution is to create a new column with the trial number and then, using an ifelse condition, assign a group based on that number and pass the grouping variable as color in aes, such as:
library(dplyr)
mdat %>% mutate(Trial = as.numeric(sub("trial","",trial))) %>%
mutate(Group = ifelse(Trial < 51,"A","B")) %>%
ggplot(aes(x=time, y=value, group=trial, color = Group)) +
theme_bw() +
theme(panel.grid=element_blank()) +
geom_line(size=0.2, alpha=0.8)
Is this what you are looking for?
NB: I only used n = 100 to get a smaller data frame, hence the cutoff at Trial < 51.
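Since the question also asks about setting the colors by hand, here is a minimal sketch that combines a grouping column with scale_colour_manual(). The group labels, the color names and the small n are arbitrary choices for illustration, and the long data frame is built directly rather than with melt.

```r
library(ggplot2)

set.seed(123)
n <- 20  # small n keeps the sketch fast
walks <- apply(matrix(rnorm(n^2), ncol = n), 2, cumsum)  # one random walk per column

# Long format: as.vector() is column-major, so each trial's times are contiguous
mdat <- data.frame(
  trial = rep(seq_len(n), each = n),
  time  = rep(seq_len(n), times = n),
  value = as.vector(walks)
)

# First half of the trials in one group, second half in the other
mdat$group <- ifelse(mdat$trial <= n / 2, "first half", "second half")

p <- ggplot(mdat, aes(x = time, y = value, group = trial, colour = group)) +
  geom_line(alpha = 0.5) +
  # Named vector: one hand-picked colour per group label
  scale_colour_manual(values = c("first half" = "steelblue",
                                 "second half" = "firebrick")) +
  theme_bw()
```

Keeping group = trial draws one line per trial, while colour = group only decides which of the two colors each line gets.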

Grouped bar plot using replicates

I can't find the answer looking in other group bar plot conversations. Each rename (or site name) should add up to 100% but the bars add up to more than that. I am wondering if I have my data set up incorrectly.
I also want to add error bars, but maybe once I get the replicates correct I can figure that out.
testData <- read.csv("composition.csv")
testData$id <- as.factor(testData$rename)
testDataMelt <- reshape2::melt(testData, rename.vars = "rename")
ggplot(testDataMelt,
aes(x = rename, y =value, group = replicate, fill = replicate)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Lake") +
ylab("% of Sediment Mass") +
labs(fill = "") +
scale_fill_grey()
As suggested by @PoGibas, here is an example that summarizes your data before passing it to ggplot.
Because I do not have your data in an easy-to-use format, I'll make some fake data for 3 sites; gravel, sand, silt & clay sum up to 100% for each row, as in your original data.
set.seed(2018)
df <- data.frame(rename = c("HOG", "MAR", "MO BH"),
gravel = sample(20:40, 9),
sand = sample(40:50, 9),
silt = sample(0:10, 9))
df$clay = as.integer(100 - rowSums(df[,2:4]))
Here is a solution with data.table (this package needs far more advertising) for computing the means and standard errors (to be used for error bars).
library(ggplot2)
library(data.table) # for aggregations
# Convert to data.table object and
# calculate the means and standard errors of each variable per site.
setDT(df)
testDataMelt <- melt(df, id.vars = "rename")
testDataMelt_agg <- testDataMelt[, .(mean = mean(value),
se = sd(value)/sqrt(.N)),
by = .(rename, variable)]
# The mean percent of sediments sum up to 100% for each site.
# We are ready to make the graph.
ggplot(testDataMelt_agg,
aes(x = rename, y = mean, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +
# Add error bars (here +/- 1.96 SE)
geom_errorbar(aes(ymax = mean + 1.96*se,
ymin = mean - 1.96*se),
width = 0.2, position = position_dodge(width = 0.9)) +
xlab("Lake") +
ylab("% of Sediment Mass") +
labs(fill = "") +
scale_fill_grey()
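To see why summarising first resolves the "more than 100%" problem, here is a small base R check. It is a sketch on made-up data: site_means and se are illustrative names, and the three replicates per site are an assumption.

```r
set.seed(2018)
# Hypothetical data: 3 sites with 3 replicates each, every row sums to 100%
df <- data.frame(rename = rep(c("HOG", "MAR", "MO BH"), times = 3),
                 gravel = sample(20:40, 9),
                 sand   = sample(40:50, 9),
                 silt   = sample(0:10, 9))
df$clay <- 100 - rowSums(df[, c("gravel", "sand", "silt")])

# Average the replicates within each site
site_means <- aggregate(cbind(gravel, sand, silt, clay) ~ rename,
                        data = df, FUN = mean)

# Standard error of the mean: sd divided by the square root of the replicate count
se <- function(x) sd(x) / sqrt(length(x))
site_se <- aggregate(cbind(gravel, sand, silt, clay) ~ rename,
                     data = df, FUN = se)

# Because the mean is linear, the averaged fractions still sum to 100 per site
rowSums(site_means[, -1])
```

Plotting the raw rows instead stacks or dodges every replicate, which is why the bars in the question appear to exceed 100%.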

position_dodge when using separate datasets

I am attempting to produce a graph that shows two groups of error bars, but the different error bars represent different estimates of central tendency/variability (e.g., mean with sd and median with quantiles). I'm trying to use position_dodge, but it's not working, and I suspect this is because I'm feeding it values from a different dataset. Here's a reproducible example:
library(dplyr)
library(ggplot2)
#### simulate dosages
dose = factor(rep(c("small", "medium", "large"), times=10))
dose = relevel(dose, "small")
#### simulate fevers, based on dosage (but highly skewed)
fever = rnorm(length(dose), 100, 1)
betas = matrix(c(0, -3, -6), nrow=1)
fever = fever + as.numeric(betas%*%t(model.matrix(fever~dose)))
#### put into data frame
d = data.frame(dose=dose, fever=fever)
#### compute means and standard errors
means = d %>% group_by(dose) %>% summarise(mean=mean(fever), lower=mean - sd(fever), upper = mean + sd(fever))
medians = d %>% group_by(dose) %>% summarise(median=median(fever), lower=quantile(fever, .25), upper = quantile(fever, .75))
#### put all into a ggplot
ggplot(d, aes(x=dose, y=fever)) +
geom_jitter(alpha=.2, width=.2) +
geom_point(data=means, aes(x=dose, y=mean)) +
geom_point(data=medians, aes(x=dose, y=median), col="red") +
geom_errorbar(data=means, aes(y=mean, ymin=lower, ymax=upper), width=.2, position=position_dodge(width=.2)) +
geom_errorbar(data= medians, aes(y=median, ymin=lower, ymax=upper), width=.2, position=position_dodge(width=.2), col="red")
Which gives the results of the following image:
Notice dodging isn't working.
Let's assume I can't just use stat_summary (I can't...I'm actually comparing means with some robust estimates from another package). Is there any way to offset the error bars/dots so they can be better seen?
Combine the data frames for both statistics so that you can map the kind of statistic to group; position_dodge can then offset the points and error bars within a single layer:
# d is the data frame from the question
means <- d %>%
  group_by(dose) %>%
  summarise(Statistic = "Mean", Value = mean(fever), lower = mean(fever) - sd(fever), upper = mean(fever) + sd(fever))
medians <- d %>%
  group_by(dose) %>%
  summarise(Statistic = "Median", Value = median(fever), lower = quantile(fever, 0.25), upper = quantile(fever, 0.75))
df2 <- bind_rows(means, medians)
#### put all into a ggplot
ggplot(d, aes(x = dose, y = fever)) +
  geom_jitter(alpha = .2, width = .2) +
  # dodge the points by the same width as the error bars so they stay aligned
  geom_point(data = df2, aes(x = dose, y = Value, group = Statistic, color = Statistic),
             position = position_dodge(width = .2)) +
  geom_errorbar(data = df2, aes(y = Value, ymin = lower, ymax = upper,
                                group = Statistic, color = Statistic),
                width = .2, position = position_dodge(width = .2))
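If the two summaries have to stay in separate data frames, a fixed horizontal offset per layer is another option. This is a minimal sketch, assuming ggplot2's position_nudge() and geom_pointrange() in place of position_dodge; the helper by_dose, the column name center and the simulated data are made up for illustration.

```r
library(ggplot2)

set.seed(42)
lv <- c("small", "medium", "large")
d <- data.frame(dose  = factor(rep(lv, times = 10), levels = lv),
                fever = rnorm(30, 100, 1))

# Apply a summary function to fever within each dose level (in level order)
by_dose <- function(f) as.numeric(tapply(d$fever, d$dose, f))
dose_levels <- factor(lv, levels = lv)

means <- data.frame(dose   = dose_levels,
                    center = by_dose(mean),
                    lower  = by_dose(function(x) mean(x) - sd(x)),
                    upper  = by_dose(function(x) mean(x) + sd(x)))
medians <- data.frame(dose   = dose_levels,
                      center = by_dose(median),
                      lower  = by_dose(function(x) quantile(x, 0.25)),
                      upper  = by_dose(function(x) quantile(x, 0.75)))

# Each layer keeps its own data frame and is shifted by a constant amount
p <- ggplot(d, aes(x = dose, y = fever)) +
  geom_jitter(alpha = 0.2, width = 0.2) +
  geom_pointrange(data = means,
                  aes(y = center, ymin = lower, ymax = upper),
                  position = position_nudge(x = -0.1)) +
  geom_pointrange(data = medians,
                  aes(y = center, ymin = lower, ymax = upper),
                  colour = "red", position = position_nudge(x = 0.1))
```

The drawback compared with the combined data frame is that there is no automatic legend, because the color is set as a constant rather than mapped.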

Annotate x-axis with N in faceted plot, but preserve empty facets

I asked a question yesterday about annotating the x-axis with N in a faceted plot using a minimal example that turns out to be too simple, relative to my real problem. The answer given there works in the case of complete data, but if you have missing facets you would like to preserve, the combination of facet_wrap options drop=FALSE and scales="free_x" triggers an error: "Error in if (zero_range(from) || zero_range(to)) { : missing value where TRUE/FALSE needed"
Here is a new, less-minimal example. The goal here is to produce a large graph with two panels using grid.arrange; the first showing absolute values over time by treatment group; the second showing the change from baseline over time by treatment group. In the second panel, we need a blank facet when vis=1.
# setup
library(ggplot2)
library(plyr)
library(gridExtra)
trt <- factor(rep(LETTERS[1:2],150),ordered=TRUE)
vis <- factor(c(rep(1,150),rep(2,100),rep(3,50)),ordered=TRUE)
id <- c(c(1:150),c(1:100),c(1:50))
val <- rnorm(300)
data <- data.frame(id,trt,vis,val)
base <- with(subset(data,vis==1),data.frame(id,trt,baseval=val))
data <- merge(data,base,by=c("id","trt")) # merge on trt as well, otherwise it is split into trt.x and trt.y
data <- transform(data,chg=ifelse(vis==1,NA,val-baseval))
data.sum <- ddply(data, .(vis, trt), summarise, N=length(na.omit(val)))
data <- merge(data,data.sum)
data <- transform(data, trtN=paste(trt,N,sep="\n"))
mytheme <- theme_bw() + theme(panel.spacing = unit(0, "lines"), strip.background = element_blank())
# no missing facets
plot.a <- ggplot(data) + geom_boxplot(aes(x=trtN,y=val,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1, scales="free_x") +
labs(x="Visit") + mytheme
# first facet should be blank
plot.b <- ggplot(data) + geom_boxplot(aes(x=trtN,y=chg,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1, scales="free_x") +
labs(x="Visit") + mytheme
grid.arrange(plot.a,plot.b,nrow=2)
You can add a blank layer to draw all the facets in your second plot. The key is that you need a variable that exists for every level of vis to use as your y variable. In your case you can simply use the variable you used in your first plot.
ggplot(data) +
geom_boxplot(aes(x = trtN, y = chg, group = trt, colour = trt), show.legend = FALSE) +
geom_blank(aes(x = trtN, y = val)) +
facet_wrap(~ vis, switch = "x", nrow = 1, scales = "free_x") +
labs(x="Visit") + mytheme
If your variables have different ranges, you can set the y limits using the overall min and max of your boxplot y variable.
+ scale_y_continuous(limits = c(min(data$chg, na.rm = TRUE), max(data$chg, na.rm = TRUE)))
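The blank-layer idea can be tried on a much smaller dataset. The following is a sketch only, with made-up data in which chg is missing for the whole first visit; dat and the constant shift of 0.5 are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 'chg' is undefined (NA) at the first visit
dat <- data.frame(
  vis = factor(rep(1:3, each = 20)),
  trt = rep(c("A", "B"), times = 30),
  val = rnorm(60)
)
dat$chg <- ifelse(dat$vis == 1, NA, dat$val - 0.5)

p <- ggplot(dat, aes(x = trt)) +
  # Boxes are drawn only where chg is available
  geom_boxplot(aes(y = chg, colour = trt), show.legend = FALSE, na.rm = TRUE) +
  # geom_blank draws nothing but trains the scales in every facet,
  # so the first facet keeps its axis and stays empty
  geom_blank(aes(y = val)) +
  facet_wrap(~ vis, nrow = 1, scales = "free_x") +
  labs(x = "Visit")
```

Note that the y axis now covers the range of val as well as chg, which is where the scale_y_continuous limits come in if the two ranges differ.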

How to show directlabels after geom_smooth and not after geom_line?

I'm using directlabels to annotate my plot. As you can see in this picture the labels are after geom_line but I want them after geom_smooth. Is this supported by directlabels? Or any other ideas how to achieve this? Thanks in advance!
This is my code:
library(ggplot2)
library(directlabels)
set.seed(124234345)
# Generate data
df.2 <- data.frame("n_gram" = c("word1"),
"year" = rep(100:199),
"match_count" = runif(100 ,min = 1000 , max = 2000))
df.2 <- rbind(df.2, data.frame("n_gram" = c("word2"),
"year" = rep(100:199),
"match_count" = runif(100 ,min = 1000 , max = 2000)) )
# plot
ggplot(df.2, aes(year, match_count, group=n_gram, color=n_gram)) +
geom_line(alpha = I(7/10), color="grey", show_guide=F) +
stat_smooth(size=2, span=0.3, se=F, show_guide=F) +
geom_dl(aes(label=n_gram), method = "last.bumpup", show_guide=F) +
xlim(c(100,220))
This answer takes the basic concept of @celt-Ail's answer and, rather than a function, base R, and directlabels, attempts a tidyverse approach, stealing some code (linked in the code comments) for the multiple loess models.
Happy to hear suggested improvements.
set.seed(124234345)
# Generate data
df.2 <- data.frame("n_gram" = c("word1"),
"year" = rep(100:199),
"match_count" = runif(100 ,min = 1000 , max = 2000))
df.2 <- rbind(df.2, data.frame("n_gram" = c("word2"),
"year" = rep(100:199),
"match_count" = runif(100 ,min = 1000 , max = 2000)) )
#example of loess for multiple models
#https://stackoverflow.com/a/55127487/4927395
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
models <- df.2 %>%
  tidyr::nest(data = -n_gram) %>%
  dplyr::mutate(
    # Perform loess calculation on each n_gram group
    m = purrr::map(data, loess,
                   formula = match_count ~ year, span = .3),
    # Retrieve the fitted values from each model
    fitted = purrr::map(m, `[[`, "fitted")
  )
# Apply fitted y's as a new column
results <- models %>%
  dplyr::select(-m) %>%
  tidyr::unnest(cols = c(data, fitted))
#find final x values for each group
my_last_points <- results %>% group_by(n_gram) %>% summarise(year = max(year, na.rm=TRUE))
#Join dataframe of predictions to group labels (adds match_count and fitted)
my_last_points <- left_join(my_last_points, results, by = c("n_gram", "year"))
# Plot with loess line for each group
ggplot(results, aes(x = year, y = match_count, group = n_gram, colour = n_gram)) +
  geom_line(alpha = I(7/10), color="grey", show.legend=F) +
  #stat_smooth(size=2, span=0.3, se=F, show_guide=F)
  geom_point() +
  geom_line(aes(y = fitted))+
  geom_text(data = my_last_points, aes(x=year+5, y=fitted, label = n_gram))
# use stat smooth with geom_dl to get matching direct labels.
span <- 0.3
ggplot(df.2, aes(year, match_count, group=n_gram, color=n_gram)) +
geom_line(alpha = I(7/10), color="grey") +
stat_smooth(size=2, span=span, se=F) +
geom_dl(aes(label=n_gram), method = "last.qp", stat="smooth", span=span) +
xlim(c(100,220))+
guides(colour="none")
This is not what you asked for as I don't know how to do that, but this might be more useful to you as you will lose less plotting area to labels:
PLOT <- ggplot(df.2, aes(year, match_count, group=n_gram, color=n_gram)) +
geom_line(alpha = I(7/10), color="grey", show_guide=F) +
stat_smooth(size=2, span=0.3, se=F, show_guide=F)
mymethod <- list(
"top.points",
dl.move("word1", hjust=-6.65, vjust=13),
dl.move("word2", hjust =-7.9, vjust=20.25)
)
direct.label(PLOT, mymethod)
which yields:
You could also try:
mymethod <- list(
"top.points",
dl.move("word1", hjust=-6, vjust=14),
dl.move("word2", hjust =-7.1, vjust=19.5)
)
ggplot(df.2, aes(year, match_count, group=n_gram, color=n_gram)) +
geom_line(alpha = I(7/10), color="grey", show_guide=F) +
xlim(c(100,220))+
stat_smooth(size=2, span=0.3, se=F, show_guide=F) +
geom_dl(aes(label=n_gram), method = mymethod, show_guide=F)
which yields:
NOTE: to print to other graphics devices (this was the windows rgui) you'll need to tweak the vjust and hjust to suit. But if there's a more direct way that would be nicer.
I'm gonna answer my own question here, since I figured it out thanks to a response from Tyler Rinker.
This is how I solved it using loess() to get label positions.
# Function to get last Y-value from loess
funcDlMove <- function (n_gram) {
model <- loess(match_count ~ year, df.2[df.2$n_gram==n_gram,], span=0.3)
Y <- model$fitted[length(model$fitted)]
Y <- dl.move(n_gram, y=Y,x=200)
return(Y)
}
index <- unique(df.2$n_gram)
mymethod <- list(
"top.points",
lapply(index, funcDlMove)
)
# Plot
PLOT <- ggplot(df.2, aes(year, match_count, group=n_gram, color=n_gram)) +
geom_line(alpha = I(7/10), color="grey", show_guide=F) +
stat_smooth(size=2, span=0.3, se=F, show_guide=F)
direct.label(PLOT, mymethod)
Which will generate this plot: http://i.stack.imgur.com/FGK1w.png
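The same idea (take the last fitted loess value per group and put the label there) can also be sketched without directlabels, using geom_text. This assumes that stat_smooth with method = "loess" and the same span fits essentially the same curve as a direct loess() call; last_fit is a made-up name.

```r
library(ggplot2)

set.seed(124234345)
df.2 <- rbind(
  data.frame(n_gram = "word1", year = 100:199,
             match_count = runif(100, min = 1000, max = 2000)),
  data.frame(n_gram = "word2", year = 100:199,
             match_count = runif(100, min = 1000, max = 2000))
)

# One row per n_gram: the loess prediction at that group's final year
last_fit <- do.call(rbind, lapply(split(df.2, df.2$n_gram), function(g) {
  fit <- loess(match_count ~ year, data = g, span = 0.3)
  data.frame(n_gram = g$n_gram[1],
             year = max(g$year),
             match_count = predict(fit, newdata = data.frame(year = max(g$year))))
}))

p <- ggplot(df.2, aes(year, match_count, group = n_gram, colour = n_gram)) +
  geom_line(alpha = 0.7, colour = "grey") +
  stat_smooth(method = "loess", formula = y ~ x, span = 0.3, se = FALSE) +
  # Labels sit just to the right of where each smoothed curve ends
  geom_text(data = last_fit, aes(label = n_gram), hjust = 0, nudge_x = 2) +
  xlim(c(100, 220)) +
  guides(colour = "none")
```

Because the label positions come from the model rather than from hand-tuned hjust and vjust values, they should not need adjusting per graphics device.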

Resources