Add points to grouped bar plot (ggplot2) - r

I have a dataset containing measurements at two timepoints. I have made a barplot of the data of the first timepoint, and would like to add points for the second timepoint (this is just meant as a reference). As you can see below the points are there, but they are not in the correct x-axis position (i.e. they are all on the same x-value, unlike the bars).
How to solve this?
library(ggplot2)
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
p <- ggplot(data = MyData,
aes(x=lab)) +
geom_bar(aes(y=time1,fill=method),
stat="identity",
position="dodge",
alpha=.7
) +
geom_point(aes(y=time2,group=method),
stat="identity",
position="dodge",
alpha=.8,
size=3) +
scale_fill_brewer(palette=3) +
facet_grid(. ~ cat)
p

Use position = position_dodge(width = .9) in geom_point.

Related

Three level group to geom_col plot using facet_wrap [duplicate]

I have the following dataset:
subj <- c(rep(11,3),rep(12,3),rep(14,3),rep(15,3),rep(17,3),rep(18,3),rep(20,3))
group <- c(rep("u",3),rep("t",6),rep("u",6),rep("t",6))
time <- rep(1:3,7)
mean <- c(0.7352941, 0.8059701, 0.8823529, 0.9264706, 0.9852941, 0.9558824, 0.7941176, 0.8676471, 0.7910448, 0.7058824, 0.8382353, 0.7941176, 0.9411765, 0.9558824, 0.9852941, 0.7647059, 0.8088235, 0.7968750, 0.8088235, 0.8500000, 0.8412698)
df <- data.frame(subj,group,time,mean)
df$subj <- as.factor(df$subj)
df$time <- as.factor(df$time)
And now I create a barplot with ggplot2:
library(ggplot2)
qplot(x=subj, y=mean*100, fill=time, data=df, geom="bar",stat="identity",position="dodge") +
facet_wrap(~ group)
How do I make it so that the x-axis labels that are not present in each facet are not shown? How do I get equal distances between each subj (i.e. get rid of the bigger gaps)?
You can use scale="free":
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~ group, scale="free")
Another option with slightly different aesthetics using facet_grid. In contrast to the plots above, the panels aren't the same width here, but due to "space="free_x", the bars are the same widths.
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_grid(~ group, scale="free", space="free_x")

Connecting the points of a dodged geom_point by group

Consider this nested data set, in which i aim at plotting the two nested factor variables on the x-axis:
df <- data.frame(X=c(rep("A",9), rep("B",9), rep("C",9)),
nested=c(rep(c(rep("X",3), rep("Y",3), rep("Z",3)),3)),
response=runif(27))
ggplot(df) +
geom_point(aes(x=X, y=response, col=nested, group=nested, shape=nested), position=position_dodge(width=1))
I want to connect the dots in each level of nested in each level of X to have vertical, parallel lines in the plot, spanning from the maximum to the minimum of response in each nested level. (much alike i would use fill=nested if would go for boxplots), but my first approach was not satisfactory:
ggplot(df) +
geom_point(aes(x=X, y=response, col=nested, group=nested, shape=nested), position=position_dodge(width=0.3))+
geom_line(aes(x=X, y=response, col=nested, group=nested))
I can imagine using geom_errorbar, but this means i would need to create a separate dataframe with min and max values, correct?
You can dodge the lines too. Just make sure that the group aesthetic is mapped to the interaction between nested and X:
ggplot(df) +
geom_point(aes(x = X, y = response, col = nested, shape = nested),
position=position_dodge(width = 0.3)) +
geom_line(aes(x = X, y = response, col = nested,
group = interaction(nested, X)),
position = position_dodge(width = 0.3))
Maybe try this approach with facet_wrap():
library(ggplot2)
#Code
ggplot(df) +
geom_point(aes(x=X, y=response,
col=nested, group=nested, shape=nested),
position=position_dodge(width=1))+
geom_line(aes(x=X, y=response,
col=nested,group=nested),position=position_dodge(width=1))+
facet_wrap(.~X,scales='free_x')+
theme(strip.background = element_blank(),
strip.text = element_blank())
Output:

Plotting two data series in overlapping barplot (ggplot2)

I would like to plot two data series (same type and number of measures, but measured at two timepoints) in the same barplot. Preferably the first series is plotted in grey, with the second series plotted in colours with transparency such that the series 1 data is still visible.
The data I have is of the following format:
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
To show the type of plot I'm looking for I have added the code for plotting data series 1 below:
p <- ggplot(data = MyData,
aes(x=lab,
y=time1,
fill=method))
p + geom_bar(stat="identity",
position="dodge",
alpha=.3) +
facet_grid(. ~ cat)
In the end it doesn't really matter which one of the data series is in grey and which is in colour, as long as they are plotted on top of each other, and both are visible.
All suggestions are welcome!
There can only be one active fill_scale, so we need to map the variable method to something else, either group or color.
library(ggplot2)
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
p <- ggplot(data = MyData,
aes(x=lab)) +
geom_bar(aes(y=time2,fill=method),
stat="identity",
position="dodge",
alpha=.3
) +
geom_bar(aes(y=time1,group=method),
stat="identity",
position="dodge",
alpha=.3) +
scale_fill_discrete() +
facet_grid(. ~ cat)
p
I have been thinking about a different way to add the second data series. I can add the second series using geom_point instead of geom_bar, as this gives less clutter. However, how do I position the points on the corresponding bar? (i.e. right now the points are all on the same x-axis position).
library(ggplot2)
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
p <- ggplot(data = MyData,
aes(x=lab)) +
geom_bar(aes(y=time1,fill=method),
stat="identity",
position="dodge",
alpha=.7
) +
geom_point(aes(y=time2,group=method),
stat="identity",
position="dodge",
alpha=.8,
size=3) +
scale_fill_brewer(palette=3) +
facet_grid(. ~ cat)
p

Subgroup Boxplots in R

I'm trying to make a graphic that will show three things side-by-side. First is to show change in the individual over time. Next is to show change in their peer group over time. Last is to show change in the overall population over time.
I have four time points on each observation. What I'd like to see is two sets of boxplots next to each other, one for the peer group and one for the population. Overlaid on each of these would the datapoints for a given individual. Each set would show data at time1, time2, time3, and time4. The overlayed points would convey where the individuals had been at each time, so the information can be conveyed in two sets of boxplots.
Here is code to simulate the sort of data I am working with, and my ineffective attempt at creating my plot.
peer <- c(rep(1, 15), rep(2, 41))
year <- rep(c(1, 2), 28)
pct <- rep(1:8, 7)
dat <- data.frame(cbind(peer, year, pct))
ggplot(dat, aes(peer==1, pct)) + geom_boxplot() + facet_grid(. ~ year)
I don't think my ggplot approach is even close to correct. Please help!
Here's a sketch of what I'm trying to do.
Is this close to what you had in mind? There's a boxplot for each value of peer for each year. I've also included the mean values for each group.
# Boxplots for each combination of year and peer, with means superimposed
ggplot(dat, aes(year, pct, group=interaction(year,peer), colour=factor(peer))) +
geom_boxplot(position=position_dodge(width=0.4), width=0.4) +
stat_summary(fun.y=mean, geom="line", position=position_dodge(width=0.4),
aes(group=peer)) +
stat_summary(fun.y=mean, geom="point", position=position_dodge(width=0.4), size=4,
aes(group=peer)) +
scale_x_continuous(breaks=unique(dat$year))
You can add a population boxplot, but then the plot starts to look cluttered:
# Add population boxplot (not grouped by peer)
ggplot(dat, aes(year, pct, group=interaction(year,peer), colour=factor(peer))) +
geom_boxplot(aes(group=year), width=0.05, colour="grey60", fill="#FFFFFF90") +
geom_boxplot(position=position_dodge(width=0.4), width=0.2) +
stat_summary(fun.y=mean, geom="line", position=position_dodge(width=0.4),
aes(group=peer)) +
stat_summary(fun.y=mean, geom="point", position=position_dodge(width=0.4), size=4,
aes(group=peer)) +
scale_x_continuous(breaks=unique(dat$year))
UPDATE: Based on your comment, maybe something like this:
# Add an ID variable to the data
dat$id = rep(1:(nrow(dat)/2), each=2)
library(gridExtra) # For grid.arrange function
pdf("plots.pdf", 7, 5)
for (i in unique(dat$id)) {
p1 = ggplot() +
geom_boxplot(data=dat[dat$peer==unique(dat$peer[dat$id==i]),],
aes(year, pct, group=year)) +
geom_point(data=dat[dat$id==i,], aes(year, pct),
pch=8, colour="red", size=5) +
ggtitle("Your Peers")
p2 = ggplot() +
geom_boxplot(data=dat, aes(year, pct, group=year)) +
geom_point(data=dat[dat$id==i,], aes(year, pct),
pch=8, colour="red", size=5) +
ggtitle("All Participants")
grid.arrange(p1, p2, ncol=2, main=paste0("ID = ", i))
}
dev.off()
Here's what the first plot looks like:

Plotting means as a line plot onto a scatter plot with ggplot

I have this simple data frame holding three replicates (value) for each factor (CT). I would like to plot it as geom_point and than the means of the point as geom_line.
gene <- c("Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5")
value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656)
CT <- c("ET","ET","ET", "HP","HP","HP","HT","HT","HT", "LT","LT","LT","P","P","P")
df<- cbind(gene,value,CT)
df<- data.frame(df)
So, I can make the scatter plot.
ggplot(df, aes(x=CT, y=value)) + geom_point()
How do I get a geom_line representing the means for each factor. I have tried the stat_summary:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group = CT), fun.y=mean, colour="red", geom="line")
But it does not work.
"geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
But each group has three observations, what is wrong?
Ps. I am also interested in a smooth line.
You should set the group aes to 1:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group=1), fun.y=mean, colour="red", geom="line",group=1)
You can use the dplyr package to get the means of each factor.
library(dplyr)
group_means <- df %>%
group_by(CT) %>%
summarise(mean = mean(value))
Then you will need to convert the factors to numeric to let you plot lines on the graph using the geom_segment function. In addition, the scale_x_continuous function will let you set the labels for the x axis.
ggplot(df, aes(x=as.numeric(CT), y=value)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=as.character(df$CT), breaks=as.numeric(df$CT))
Following on from hrbrmstr's comment you can add the smooth line using the following:
ggplot(df, aes(x=as.numeric(CT), y=value, group=1)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=as.character(df$CT), breaks=as.numeric(df$CT)) +
geom_smooth()

Resources