How do I draw a horizontal line indicating the Highest (Posterior) Density interval for faceted density plots in ggplot2? This is what I have tried:
# Functions to calculate lower and upper part of HPD.
hpd_lower = function(x) coda::HPDinterval(coda::as.mcmc(x))[1]
hpd_upper = function(x) coda::HPDinterval(coda::as.mcmc(x))[2]
# Data: two groups with different means
df = data.frame(value=c(rnorm(500), rnorm(500, mean=5)), group=rep(c('A', 'B'), each=500))
# Plot it
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(aes(x=hpd_lower(value), xend=hpd_upper(value), y=0, yend=0), size=3)
As you can see, geom_segment computes on all data for both facets whereas I would like it to respect the faceting. I would also like a solution where HPDinterval is only run once per facet.
Pre-calculate the HPD intervals. ggplot2 evaluates the expressions inside aes() on the entire data frame passed to the layer, not separately per facet, so hpd_lower(value) and hpd_upper(value) see both groups at once.
# Plot it
library(dplyr)
df_hpd <- group_by(df, group) %>% summarize(x=hpd_lower(value), xend=hpd_upper(value))
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(data = df_hpd, aes(x=x, xend=xend, y=0, yend=0), size=3)
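The summary above still calls HPDinterval twice per group (once for each endpoint). A minimal base R sketch of computing both endpoints in a single call per group is shown below. Note that hpd_interval is a hypothetical helper written for this illustration, not part of coda; it assumes the usual empirical definition of an HPD interval as the shortest interval containing a given share of the sorted samples.

```r
# Hypothetical helper (not from coda): shortest interval that contains
# `prob` of the sorted samples. Returns both endpoints from one call.
hpd_interval <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  gap <- max(1, min(n - 1, round(n * prob)))  # index distance between endpoints
  widths <- x[(gap + 1):n] - x[1:(n - gap)]   # width of every candidate interval
  i <- which.min(widths)                      # the shortest one
  c(lower = x[i], upper = x[i + gap])
}

set.seed(1)
df <- data.frame(value = c(rnorm(500), rnorm(500, mean = 5)),
                 group = rep(c("A", "B"), each = 500))

# One call per group; each call yields lower and upper together
intervals <- lapply(split(df$value, df$group), hpd_interval)
df_hpd <- data.frame(group = names(intervals),
                     x     = vapply(intervals, `[[`, numeric(1), "lower"),
                     xend  = vapply(intervals, `[[`, numeric(1), "upper"),
                     row.names = NULL)
```

The resulting df_hpd has the same columns (group, x, xend) as the one passed to geom_segment(data = df_hpd, ...) above, so it could be dropped into that plot unchanged.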
I have the following dataset:
subj <- c(rep(11,3),rep(12,3),rep(14,3),rep(15,3),rep(17,3),rep(18,3),rep(20,3))
group <- c(rep("u",3),rep("t",6),rep("u",6),rep("t",6))
time <- rep(1:3,7)
mean <- c(0.7352941, 0.8059701, 0.8823529, 0.9264706, 0.9852941, 0.9558824, 0.7941176, 0.8676471, 0.7910448, 0.7058824, 0.8382353, 0.7941176, 0.9411765, 0.9558824, 0.9852941, 0.7647059, 0.8088235, 0.7968750, 0.8088235, 0.8500000, 0.8412698)
df <- data.frame(subj,group,time,mean)
df$subj <- as.factor(df$subj)
df$time <- as.factor(df$time)
And now I create a barplot with ggplot2:
library(ggplot2)
qplot(x=subj, y=mean*100, fill=time, data=df, geom="bar",stat="identity",position="dodge") +
facet_wrap(~ group)
How do I make it so that the x-axis labels that are not present in each facet are not shown? How do I get equal distances between each subj (i.e. get rid of the bigger gaps)?
You can use scales="free":
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~ group, scales="free")
Another option, with slightly different aesthetics, uses facet_grid. In contrast to the plot above, the panels aren't the same width here, but because of space="free_x" the bars are all the same width.
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_grid(~ group, scales="free", space="free_x")
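To see why a free x scale removes the unwanted labels, it can help to list which subj levels actually occur in each group. This is only an illustrative base R sketch that rebuilds the two relevant columns from the question; levels_per_panel is a name introduced here.

```r
# Rebuild the two relevant columns from the question
subj  <- factor(rep(c(11, 12, 14, 15, 17, 18, 20), each = 3))
group <- c(rep("u", 3), rep("t", 6), rep("u", 6), rep("t", 6))

# Levels of subj that are present in each group (i.e. in each panel)
levels_per_panel <- lapply(split(subj, group),
                           function(s) levels(droplevels(s)))
levels_per_panel
```

With a fixed x scale every panel is drawn with all seven levels, which produces the empty slots; with a free x scale each panel is drawn only with the levels listed for it.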
I am trying to make a scatter plot with ggplot to show time watching TV on x axis and immigrant sentiment on y axis.
The code I am using is
ggplot(totalTV,
aes(x = dfnew.TV.watching..total.time.on.average.weekday,
y = dfnew.Immigrant.Sentiment)) +
geom_point()
I am getting this output
My table is so, with first variable being character, and subsequent two being numeric:
Any idea on how to produce a representative scatter of the outcome?
Cheers
Here are some examples using the mtcars dataset.
library(ggplot2)
# Original
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_point()
# Jitter
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_jitter(width = .2) # Control spread with width
# Violin plot
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_violin()
# Boxplot
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_boxplot()
# Remember that different geoms can be combined
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_violin() +
geom_jitter(width = .2)
# Or something more exotic ala Raincloud-plots
# https://micahallen.org/2018/03/15/introducing-raincloud-plots/
I have two dataframes: dataf1, dataf2. They have the same structure and columns.
3 columns names are A,B,C. And they both have 50 rows.
I would like to plot the histogram of column B on dataf1 and dataf2. I can plot two histograms separately but they are not of the same scale. I would like to know how to either put them on the same histogram using different colors or plot two histograms of the same scale?
ggplot() + aes(dataf1$B)+ geom_histogram(binwidth=1, colour="black",fill="white")
ggplot() + aes(dataf2$B)+ geom_histogram(binwidth=1, colour="black", fill="white")
Combine your data into a single data frame with a new column marking which data frame the data originally came from. Then use that new column for the fill aesthetic for your plot.
dataf1$source="Data 1"
dataf2$source="Data 2"
dat_combined = rbind(dataf1, dataf2)
You haven't provided sample data, so here are a few examples of possible plots, using the built-in iris data frame. In the plots below, dat is analogous to dat_combined, Petal.Width is analogous to B, and Species is analogous to source.
dat = subset(iris, Species != "setosa") # We want just two species
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", alpha=0.5, binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="dodge", binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", binwidth=0.1) +
facet_grid(Species ~ .)
As Zheyuan says, you just need to set the y limits for each plot to get them on the same scale. With ggplot2, one way to do this is with the lims command (though scale_y_continuous and coord_cartesian also work, albeit slightly differently). You also should never use data$column inside aes(). Instead, use the data argument for the data frame and unquoted column names inside aes(). Here's an example with some built-in data.
p1 = ggplot(mtcars, aes(x = mpg)) + geom_histogram() + lims(y = c(0, 13))
p2 = ggplot(iris, aes(x = Sepal.Length)) + geom_histogram() + lims(y = c(0, 13))
gridExtra::grid.arrange(p1, p2, nrow = 1)
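Rather than hard-coding the upper limit, it could be derived from the data. The sketch below is an assumption-laden illustration in base R: b1 and b2 are simulated stand-ins for dataf1$B and dataf2$B (the real data were not posted), and the bins are counted with hist() on a shared set of breaks.

```r
set.seed(42)
b1 <- rnorm(50, mean = 10, sd = 2)  # stand-in for dataf1$B (simulated)
b2 <- rnorm(50, mean = 12, sd = 3)  # stand-in for dataf2$B (simulated)

# Shared bin edges of width 1 that cover both samples
breaks <- seq(floor(min(b1, b2)), ceiling(max(b1, b2)), by = 1)

# Count per bin without drawing anything
counts1 <- hist(b1, breaks = breaks, plot = FALSE)$counts
counts2 <- hist(b2, breaks = breaks, plot = FALSE)$counts

# Largest bar across both histograms: a common upper y limit
y_max <- max(counts1, counts2)
```

Passing the same breaks to both plots via geom_histogram(breaks = breaks) together with lims(y = c(0, y_max)) should then put them on a common scale. If ggplot2 is left to choose its own bins, the counts can differ from those computed here, so some headroom on the limit may be needed.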
To get two histograms on the same plot, the best way is to combine your data frames. A guess, without seeing what your data looks like:
dataf = rbind(dataf1["B"], dataf2["B"])
dataf$source = c(rep("f1", nrow(dataf1)), rep("f2", nrow(dataf2)))
ggplot(dataf, aes(x = B, fill = source)) +
geom_histogram(position = "identity", alpha = 0.7)
I am trying to plot combined graphs for logistic regressions as the function logi.hist.plot but I would like to do it using ggplot2 (aesthetic reasons).
The problem is that only one of the histograms should have the scale_y_reverse().
Is there any way to specify this in a single plot (see code below) or to overlap the two histograms by using coordinates that can be passed to the previous plot?
ggplot(dat) +
geom_point(aes(x=ind, y=dep)) +
stat_smooth(aes(x=ind, y=dep), method=glm, method.args=list(family="binomial"), se=FALSE) +
geom_histogram(data=dat[dat$dep==0,], aes(x=ind)) +
geom_histogram(data=dat[dat$dep==1,], aes(x=ind)) ## + scale_y_reverse()
This final plot is what I have been trying to achieve:
We use geom_segment to create the "bars" for the histogram and also to create the rug plots. Adjust the size parameter to change the "bar" widths in the histogram. In the example below, the bar heights are equal to the percentage of values within a given x range. If you want to change the absolute heights of the bars, just multiply n/sum(n) by a scaling factor when you create the h data frame of histogram counts.
To generate histogram counts for the plot, we pre-summarize the data to create the histogram values. Note the ifelse statement in the mutate function, which adjusts the values of pct in order to get the upward and downward bars in the plot, depending on whether y is 0 or 1, respectively. You can do this in the plot code itself, but then you need two separate calls to geom_segment.
library(dplyr)
library(ggplot2)
# Fake data
set.seed(1926)
dat = data.frame(y = sample(0:1, 1000, replace=TRUE))
dat$x1 = rnorm(1000, 5, 2) * (dat$y+1)
# Summarise data to create histogram counts
h = dat %>% group_by(y) %>%
mutate(breaks = cut(x1, breaks=seq(-2,20,0.5), labels=seq(-1.75,20,0.5),
include.lowest=TRUE),
breaks = as.numeric(as.character(breaks))) %>%
group_by(y, breaks) %>%
summarise(n = n()) %>%
mutate(pct = ifelse(y==0, n/sum(n), 1 - n/sum(n)))
ggplot() +
geom_segment(data=h, size=4, show.legend=FALSE,
aes(x=breaks, xend=breaks, y=y, yend=pct, colour=factor(y))) +
geom_segment(data=dat[dat$y==0,], aes(x=x1, xend=x1, y=0, yend=-0.02), size=0.2, colour="grey30") +
geom_segment(data=dat[dat$y==1,], aes(x=x1, xend=x1, y=1, yend=1.02), size=0.2, colour="grey30") +
geom_line(data=data.frame(x=seq(-2,20,0.1),
y=predict(glm(y ~ x1, family="binomial", data=dat),
newdata=data.frame(x1=seq(-2,20,0.1)),
type="response")),
aes(x,y), colour="grey50", lwd=1) +
scale_y_continuous(limits=c(-0.02,1.02)) +
scale_x_continuous(limits=c(-1,20)) +
theme_bw(base_size=12)
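As a small worked example of the pct adjustment and the optional scaling factor described above, the sketch below uses made-up bin counts (not taken from the simulated data) and a hypothetical scale_factor of 0.5.

```r
# Made-up bin counts for illustration only
n0 <- c(5, 15, 30)    # counts per bin where y == 0
n1 <- c(10, 40, 50)   # counts per bin where y == 1
scale_factor <- 0.5   # hypothetical scaling factor for the bar heights

# y == 0: segments start at 0 and extend upward to yend0
yend0 <- scale_factor * n0 / sum(n0)

# y == 1: segments start at 1 and extend downward to yend1
yend1 <- 1 - scale_factor * n1 / sum(n1)

yend0  # 0.05 0.15 0.30
yend1  # 0.95 0.80 0.75
```

With a scaling factor of 1 this reduces to the ifelse expression in the mutate call; smaller factors shorten both sets of bars so they are less likely to overlap in the middle of the plot.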
I have this simple data frame holding three replicates (value) for each factor (CT). I would like to plot it as geom_point and then the means of the points as geom_line.
gene <- c("Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5")
value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656)
CT <- c("ET","ET","ET", "HP","HP","HP","HT","HT","HT", "LT","LT","LT","P","P","P")
# Build the data frame directly: cbind() would coerce value to character
df <- data.frame(gene, value, CT = factor(CT))
So, I can make the scatter plot.
ggplot(df, aes(x=CT, y=value)) + geom_point()
How do I get a geom_line representing the means for each factor? I have tried stat_summary:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group = CT), fun.y=mean, colour="red", geom="line")
But it does not work.
"geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
But each group has three observations, what is wrong?
Ps. I am also interested in a smooth line.
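You should set the group aesthetic to 1. Because x is discrete, each level of CT otherwise forms its own group containing a single mean, so there is nothing to connect:
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(group=1), fun=mean, colour="red", geom="line")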
You can use the dplyr package to get the means of each factor.
library(dplyr)
group_means <- df %>%
group_by(CT) %>%
summarise(mean = mean(value))
Then convert the factor to numeric so that you can draw the lines with geom_segment. Note that as.numeric(CT) returns the level codes only if CT is a factor. In addition, the scale_x_continuous function lets you set the labels for the x axis.
ggplot(df, aes(x=as.numeric(CT), y=value)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=as.character(df$CT), breaks=as.numeric(df$CT))
Following on from hrbrmstr's comment you can add the smooth line using the following:
ggplot(df, aes(x=as.numeric(CT), y=value, group=1)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=as.character(df$CT), breaks=as.numeric(df$CT)) +
geom_smooth()
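The group means used above can also be computed in base R, which makes the assumptions explicit: value must be numeric and CT a factor. This is a sketch only. Because cbind() on vectors of mixed type returns a character matrix, the data frame is built with data.frame() directly here.

```r
value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071,
           0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016,
           0.86656)
CT <- rep(c("ET", "HP", "HT", "LT", "P"), each = 3)

# cbind() coerces everything to character, so mean() would not work on it
is.character(cbind(value, CT))  # TRUE

# Keep value numeric and make CT a factor
df <- data.frame(value, CT = factor(CT))

# One mean per level of CT
group_means <- aggregate(value ~ CT, data = df, FUN = mean)

# Numeric positions of the levels, as used for the segment endpoints
group_means$pos <- as.numeric(group_means$CT)
```

Here group_means$value plays the role of the mean column from the dplyr summary, and pos is a name introduced for this sketch.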