Plotting means as a line plot onto a scatter plot with ggplot - r

I have this simple data frame holding three replicates (value) for each factor level (CT). I would like to plot the replicates with geom_point and then the mean of each group's points with geom_line.
gene <- c("Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5")
value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656)
CT <- c("ET","ET","ET", "HP","HP","HP","HT","HT","HT", "LT","LT","LT","P","P","P")
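# Build the data frame directly: cbind() would coerce value to character.
# stringsAsFactors=TRUE keeps CT a factor, which as.numeric(CT) below relies on.
df <- data.frame(gene, value, CT, stringsAsFactors=TRUE)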
So, I can make the scatter plot.
ggplot(df, aes(x=CT, y=value)) + geom_point()
How do I get a geom_line representing the means for each factor level? I have tried stat_summary:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group = CT), fun.y=mean, colour="red", geom="line")
But it does not work:
"geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?"
But each group has three observations. What is wrong?
P.S. I am also interested in a smooth line.

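You should set the group aesthetic to 1. With a discrete x, ggplot2 puts each CT level in its own group, so after summarising every group holds a single mean and there is nothing to connect. group=1 treats all the means as one series:
# fun was called fun.y before ggplot2 3.3.0; use fun.y=mean on older versions
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(group=1), fun=mean, colour="red", geom="line")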
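To check what group=1 changes, the data computed by the summary layer can be inspected. This is a minimal sketch, assuming ggplot2 3.3.0 or later (where the argument is spelled fun rather than fun.y); the names p and line_data are illustrative only.

```r
library(ggplot2)

# Same values as in the question, built so that value stays numeric
df <- data.frame(
  value = c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221,
            0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028,
            0.83242, 0.74016, 0.86656),
  CT = rep(c("ET", "HP", "HT", "LT", "P"), each = 3)
)

p <- ggplot(df, aes(x = CT, y = value)) +
  geom_point() +
  stat_summary(aes(group = 1), fun = mean, geom = "line", colour = "red")

# Layer 2 is the summary layer: one mean per CT level, all in a single group,
# which is what lets geom_line connect them
line_data <- layer_data(p, 2)
line_data[, c("x", "group", "y")]
```

Without group = 1 the same inspection should show five groups of one row each, which matches the "only one observation" message.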

You can use the dplyr package to get the means of each factor.
library(dplyr)
group_means <- df %>%
group_by(CT) %>%
summarise(mean = mean(value))
Then convert the factor to numeric so that geom_segment can draw a short horizontal line at each group mean. The scale_x_continuous function then lets you put the factor labels back on the x axis.
ggplot(df, aes(x=as.numeric(CT), y=value)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=levels(df$CT), breaks=seq_along(levels(df$CT)))
Following on from hrbrmstr's comment you can add the smooth line using the following:
ggplot(df, aes(x=as.numeric(CT), y=value, group=1)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=levels(df$CT), breaks=seq_along(levels(df$CT))) +
geom_smooth()
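If the numeric conversion feels awkward, a different technique (not the one used in this answer) may also work: stat_summary can draw a flat crossbar at each mean directly on the discrete axis. A sketch, assuming ggplot2 3.3.0 or later for the fun, fun.min and fun.max arguments; the names p and bars are illustrative.

```r
library(ggplot2)

df <- data.frame(
  value = c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221,
            0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028,
            0.83242, 0.74016, 0.86656),
  CT = rep(c("ET", "HP", "HT", "LT", "P"), each = 3)
)

# Setting ymin = ymax = mean collapses the crossbar box into a single
# horizontal bar at the group mean; width controls how long the bar is.
p <- ggplot(df, aes(x = CT, y = value)) +
  geom_point() +
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.8, colour = "red")

# Inspect the summary layer: one row per CT level
bars <- layer_data(p, 2)
bars[, c("x", "y", "ymin", "ymax")]
```

This keeps CT discrete, so no scale_x_continuous relabelling is needed, at the cost of less control over each segment's endpoints.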

Related

Three level group to geom_col plot using facet_wrap [duplicate]

I have the following dataset:
subj <- c(rep(11,3),rep(12,3),rep(14,3),rep(15,3),rep(17,3),rep(18,3),rep(20,3))
group <- c(rep("u",3),rep("t",6),rep("u",6),rep("t",6))
time <- rep(1:3,7)
mean <- c(0.7352941, 0.8059701, 0.8823529, 0.9264706, 0.9852941, 0.9558824, 0.7941176, 0.8676471, 0.7910448, 0.7058824, 0.8382353, 0.7941176, 0.9411765, 0.9558824, 0.9852941, 0.7647059, 0.8088235, 0.7968750, 0.8088235, 0.8500000, 0.8412698)
df <- data.frame(subj,group,time,mean)
df$subj <- as.factor(df$subj)
df$time <- as.factor(df$time)
And now I create a barplot with ggplot2:
library(ggplot2)
qplot(x=subj, y=mean*100, fill=time, data=df, geom="bar",stat="identity",position="dodge") +
facet_wrap(~ group)
How do I make it so that the x-axis labels that are not present in each facet are not shown? How do I get equal distances between each subj (i.e. get rid of the bigger gaps)?
You can use scales="free", which lets each facet keep only the x values that occur in it:
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~ group, scales="free")
Another option, with slightly different aesthetics, is facet_grid. In contrast to the plot above, the panels aren't the same width here, but because of space="free_x" the bars all have the same width.
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_grid(~ group, scales="free", space="free_x")
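The gaps in the original plot come from the fact that, with fixed scales, every facet reserves a slot for all seven subj levels even though each subject belongs to only one group. A small base R sketch makes that visible; the names tab and present are illustrative.

```r
# Rebuild only the two columns that matter for the x axis
subj <- factor(c(rep(11, 3), rep(12, 3), rep(14, 3), rep(15, 3),
                 rep(17, 3), rep(18, 3), rep(20, 3)))
group <- c(rep("u", 3), rep("t", 6), rep("u", 6), rep("t", 6))
df <- data.frame(subj, group)

# Cross-tabulate: a zero means that subject has no bars in that facet
tab <- table(df$group, df$subj)
tab

# Subjects that actually occur in each facet; a free x scale keeps only these
present <- lapply(split(df$subj, df$group), function(s) levels(droplevels(s)))
present
```

Each panel with a free x scale should therefore show four subjects for group t and three for group u.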

I am drawing a plot with ggplot in R, and all I get is a vertical line of points with no x-axis labels or markings

ggplot(data=df, aes(x='Matching_Probability', y=Locus_Name, group=1)) +
geom_line(color="#aa0022", size=1.75) +
geom_point(color="#aa0022", size=3.5)
This is the graph I am getting from the code.
Inside aes(), refer to columns of your dataset by their unquoted names. A quoted string is not looked up in the data; ggplot2 treats it as a constant, so every row gets the same x value, which is why the points stack in a single vertical line. So:
# your original
ggplot(data=df, aes(x='Matching_Probability', y=Locus_Name, group=1))
# change to this:
ggplot(data=df, aes(x=Matching_Probability, y=Locus_Name, group=1))
The following example makes the difference clearer:
# this works fine
df <- data.frame(x=1:10, y=1:10)
ggplot(df, aes(x=x,y=y)) + geom_point()
# this doesn't:
ggplot(df, aes(x="x",y=y)) + geom_point()
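The effect can be confirmed by looking at the x positions each plot actually computes. The sketch below also shows one way to map a column whose name is held in a string, using the .data pronoun that aes() understands in ggplot2 3.0.0 and later; col_name and the p_* names are illustrative.

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10)

p_column <- ggplot(df, aes(x = x, y = y)) + geom_point()
p_quoted <- ggplot(df, aes(x = "x", y = y)) + geom_point()

# Unquoted name: ten distinct x positions
n_column <- length(unique(layer_data(p_column)$x))
# Quoted string: a constant, so all points share one x position
n_quoted <- length(unique(layer_data(p_quoted)$x))
c(column = n_column, quoted = n_quoted)

# When the column name really is a string, index the .data pronoun with it
col_name <- "x"
p_pronoun <- ggplot(df, aes(x = .data[[col_name]], y = y)) + geom_point()
```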

Calculating means with stat_summary for two different groupings and plotting in one plot

I am having issues with plotting two calculated means using stat_summary in the same figure.
I am using ggplot and stat_summary to plot means of a dataset grouped by variable A, which can take the values 1, 2, 3 and 4. The same data also have a variable B, which can take the values 1 and 2.
So, I can make a plot with the means of the data grouped by variable A, and I get 4 lines.
I can also make a plot with the means of the data grouped by variable B, where I get 2 lines.
But how can I plot them in the same figure, so that I get 6 lines? I have made a somewhat similar example using the mtcars dataset:
library(ggplot2)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars
plot1 <- ggplot(mtcars, aes(x=gear, y=hp, color=cyl, fill=cyl)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun.y = mean, size=1)
plot1
plot2 <- ggplot(mtcars, aes(x=gear, y=hp, color=vs, fill=vs)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun.y = mean, size=1)
plot2
So far I have the impression, that since I start with ggplot(xxx), where xxx defines the data and grouping, I can't combine it with another ggplot with another grouping. If I could initiate ggplot() without defining anything in the argument, but only defining data and grouping in the argument for stat_summary, I feel like that would be the solution. But I can't figure out how to use stat_summary like that, if even possible.
You can just add more layers, defining the aes for each separately:
# mean_cl_normal needs the Hmisc package; fun was called fun.y before ggplot2 3.3.0
ggplot(mtcars) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl), fill = paste('cyl:', cyl)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl)), geom='line', fun = mean, size=1) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs), fill=paste('vs:', vs)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs)), geom='line', fun = mean, size=1)
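A different route to the same picture (an alternative to per-layer aes, not what this answer does) is to stack the data once per grouping so that a single grouping column drives one set of layers. A minimal sketch with only the mean lines, assuming ggplot2 3.3.0 or later; long and grp are hypothetical names.

```r
library(ggplot2)

# One copy of the rows per grouping variable, each labelled by its level
long <- rbind(
  data.frame(gear = mtcars$gear, hp = mtcars$hp,
             grp = paste("cyl:", mtcars$cyl)),
  data.frame(gear = mtcars$gear, hp = mtcars$hp,
             grp = paste("vs:", mtcars$vs))
)

# A single stat_summary now yields one mean line per level of either variable
p <- ggplot(long, aes(x = gear, y = hp, colour = grp)) +
  stat_summary(geom = "line", fun = mean)

# Number of distinct lines that will be drawn
n_lines <- length(unique(layer_data(p)$group))
n_lines
```

With mtcars this gives three cyl lines plus two vs lines; the ribbons could be added the same way with fill = grp.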

Draw interval on geom_density

How do I draw a horizontal line indicating the Highest (Posterior) Density interval for faceted density plots in ggplot2? This is what I have tried:
# Functions to calculate lower and upper part of HPD.
hpd_lower = function(x) coda::HPDinterval(coda::as.mcmc(x))[1]
hpd_upper = function(x) coda::HPDinterval(coda::as.mcmc(x))[2]
# Data: two groups with different means
df = data.frame(value=c(rnorm(500), rnorm(500, mean=5)), group=rep(c('A', 'B'), each=500))
# Plot it
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(aes(x=hpd_lower(value), xend=hpd_upper(value), y=0, yend=0), size=3)
As you can see, geom_segment computes on all data for both facets whereas I would like it to respect the faceting. I would also like a solution where HPDinterval is only run once per facet.
Pre-calculate the HPD intervals. ggplot evaluates expressions inside aes() on the layer's entire data frame, not separately per facet or group, so hpd_lower(value) sees both groups at once.
# Plot it
library(dplyr)
df_hpd <- group_by(df, group) %>% summarize(x=hpd_lower(value), xend=hpd_upper(value))
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(data = df_hpd, aes(x=x, xend=xend, y=0, yend=0), size=3)
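The question also asked for HPDinterval to run only once per facet, whereas calling separate lower and upper helpers runs it twice per group. One possible way around that, sketched here in base R under the assumption that the coda package is installed, is to split the values by group and keep both bounds from a single call. The names hpd_rows and p, and the seed, are illustrative.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the sketch is reproducible
df <- data.frame(value = c(rnorm(500), rnorm(500, mean = 5)),
                 group = rep(c("A", "B"), each = 500))

# One HPDinterval call per group; the result is a 1 x 2 matrix (lower, upper)
hpd_rows <- lapply(split(df$value, df$group), function(x) {
  iv <- coda::HPDinterval(coda::as.mcmc(x))
  data.frame(x = iv[1, 1], xend = iv[1, 2])
})
df_hpd <- do.call(rbind, hpd_rows)
df_hpd$group <- names(hpd_rows)  # facet_wrap matches the rows to panels by this column

# size is kept from the original post; newer ggplot2 versions prefer linewidth
p <- ggplot(df, aes(x = value)) +
  geom_density() +
  facet_wrap(~group) +
  geom_segment(data = df_hpd, aes(x = x, xend = xend, y = 0, yend = 0), size = 3)
```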

How to plot two histograms on the same axis scale?

I have two data frames: dataf1 and dataf2. They have the same structure and columns.
Each has three columns named A, B and C, and 50 rows.
I would like to plot the histogram of column B for dataf1 and dataf2. I can plot the two histograms separately, but they are not on the same scale. How can I either put them on the same histogram using different colors, or plot two histograms with the same scale?
ggplot() + aes(dataf1$B)+ geom_histogram(binwidth=1, colour="black",fill="white")
ggplot() + aes(dataf2$B)+ geom_histogram(binwidth=1, colour="black", fill="white")
Combine your data into a single data frame with a new column marking which data frame each row originally came from. Then map that new column to the fill aesthetic in your plot.
dataf1$source="Data 1"
dataf2$source="Data 2"
dat_combined = rbind(dataf1, dataf2)
You haven't provided sample data, so here are a few examples of possible plots, using the built-in iris data frame. In the plots below, dat is analogous to dat_combined, Petal.Width is analogous to B, and Species is analogous to source.
dat = subset(iris, Species != "setosa") # We want just two species
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", alpha=0.5, binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="dodge", binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", binwidth=0.1) +
facet_grid(Species ~ .)
As Zheyuan says, you just need to set the y limits for each plot to get them on the same scale. With ggplot2, one way to do this is with the lims command (scale_y_continuous and coord_cartesian also work, though slightly differently: lims and scale_y_continuous drop data outside the limits, whereas coord_cartesian only zooms). You should also avoid data$column inside aes(). Instead, pass the data frame through the data argument and use unquoted column names inside aes(). Here's an example with some built-in data.
p1 = ggplot(mtcars, aes(x = mpg)) + geom_histogram() + lims(y = c(0, 13))
p2 = ggplot(iris, aes(x = Sepal.Length)) + geom_histogram() + lims(y = c(0, 13))
gridExtra::grid.arrange(p1, p2, nrow = 1)
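The upper limit does not have to be hard-coded. One option is to read the tallest bin count from each plot and reuse the larger one as the shared limit. A sketch, with bins = 30 written out explicitly (it is also the geom_histogram default) and y_max as an illustrative name:

```r
library(ggplot2)

p1 <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 30)
p2 <- ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(bins = 30)

# layer_data() returns the binned counts that each histogram will draw
y_max <- max(layer_data(p1)$count, layer_data(p2)$count)

# Apply the same y range to both plots
p1_scaled <- p1 + lims(y = c(0, y_max))
p2_scaled <- p2 + lims(y = c(0, y_max))
```

The two scaled plots can then be placed side by side with gridExtra::grid.arrange as above.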
To get two histograms on the same plot, the best way is to combine your data frames. A guess, without seeing what your data looks like:
dataf = rbind(dataf1["B"], dataf2["B"])
dataf$source = c(rep("f1", nrow(dataf1)), rep("f2", nrow(dataf2)))
ggplot(dataf, aes(x = B, fill = source)) +
geom_histogram(position = "identity", alpha = 0.7)
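Since the question includes no sample data, here is a self-contained version of the same idea. The two data frames are made-up stand-ins with the shape described in the question (50 rows, columns A, B and C); the distributions, seed and binwidth are arbitrary assumptions.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed
# Hypothetical stand-ins for the asker's dataf1 and dataf2
dataf1 <- data.frame(A = 1:50, B = rnorm(50, mean = 10, sd = 2), C = runif(50))
dataf2 <- data.frame(A = 1:50, B = rnorm(50, mean = 13, sd = 2), C = runif(50))

# Keep only column B and tag each row with the data frame it came from
dataf <- rbind(
  cbind(dataf1["B"], source = "f1"),
  cbind(dataf2["B"], source = "f2")
)

# Overlaid histograms share both axes automatically
p <- ggplot(dataf, aes(x = B, fill = source)) +
  geom_histogram(position = "identity", alpha = 0.7, binwidth = 1)
```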
