I would like to plot two data series (same type and number of measures, but measured at two timepoints) in the same barplot. Preferably the first series is plotted in grey, with the second series plotted in colours with transparency such that the series 1 data is still visible.
The data I have is of the following format:
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
To show the type of plot I'm looking for, here is the code for plotting data series 1:
library(ggplot2)
p <- ggplot(data = MyData,
aes(x=lab,
y=time1,
fill=method))
p + geom_bar(stat="identity",
position="dodge",
alpha=.3) +
facet_grid(. ~ cat)
In the end it doesn't really matter which one of the data series is in grey and which is in colour, as long as they are plotted on top of each other, and both are visible.
All suggestions are welcome!
There can only be one active fill scale per plot, so for the second layer we need to map the variable method to a different aesthetic: either group (as below) or colour.
library(ggplot2)
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
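p <- ggplot(data = MyData,
aes(x=lab)) +
# series 2: coloured, semi-transparent bars (the only fill scale)
geom_bar(aes(y=time2,fill=method),
stat="identity",
position="dodge",
alpha=.3
) +
# series 1: default grey bars; group=method dodges them like the coloured bars
geom_bar(aes(y=time1,group=method),
stat="identity",
position="dodge",
alpha=.3) +
scale_fill_discrete() +
facet_grid(. ~ cat)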
p
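The answer above maps method to group for the grey series. The other option it mentions, mapping method to colour, might look like the minimal sketch below. It assumes ggplot2 3.x; set.seed() is added only to make the random data reproducible, and the grey shade and the choice of coloured outlines on unfilled bars are illustrative assumptions rather than part of the original answer.

```r
library(ggplot2)

set.seed(1)  # reproducible random data; any seed works
MyData <- data.frame(
  method = rep(c("A", "B", "C", "D", "E"), times = 3),
  time1 = rnorm(30, 10, 3),
  time2 = rnorm(30, 8, 2),
  lab = rep(rep(c(1, 2, 3), each = 5), times = 2),
  cat = rep(c(1, 2), each = 15)
)

p <- ggplot(MyData, aes(x = lab)) +
  # Series 1: solid grey bars; group = method makes them dodge
  # exactly like the coloured layer without needing a fill scale
  geom_bar(aes(y = time1, group = method),
           stat = "identity", position = "dodge", fill = "grey70") +
  # Series 2: unfilled bars whose outline colour encodes method,
  # so the grey bars underneath stay visible
  geom_bar(aes(y = time2, colour = method),
           stat = "identity", position = "dodge", fill = NA) +
  facet_grid(. ~ cat)
p
```

Because both layers dodge the same five method groups with the same default bar width, each outlined bar sits exactly on top of its grey counterpart.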
I have been thinking about a different way to add the second data series. I can add the second series using geom_point instead of geom_bar, as this gives less clutter. However, how do I position the points on the corresponding bar? (i.e. right now the points are all on the same x-axis position).
library(ggplot2)
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
p <- ggplot(data = MyData,
aes(x=lab)) +
geom_bar(aes(y=time1,fill=method),
stat="identity",
position="dodge",
alpha=.7
) +
geom_point(aes(y=time2,group=method),
stat="identity",
position="dodge",
alpha=.8,
size=3) +
scale_fill_brewer(palette=3) +
facet_grid(. ~ cat)
p
I have the following dataset:
subj <- c(rep(11,3),rep(12,3),rep(14,3),rep(15,3),rep(17,3),rep(18,3),rep(20,3))
group <- c(rep("u",3),rep("t",6),rep("u",6),rep("t",6))
time <- rep(1:3,7)
mean <- c(0.7352941, 0.8059701, 0.8823529, 0.9264706, 0.9852941, 0.9558824, 0.7941176, 0.8676471, 0.7910448, 0.7058824, 0.8382353, 0.7941176, 0.9411765, 0.9558824, 0.9852941, 0.7647059, 0.8088235, 0.7968750, 0.8088235, 0.8500000, 0.8412698)
df <- data.frame(subj,group,time,mean)
df$subj <- as.factor(df$subj)
df$time <- as.factor(df$time)
And now I create a barplot with ggplot2:
library(ggplot2)
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~ group)
How do I make it so that the x-axis labels that are not present in each facet are not shown? How do I get equal distances between each subj (i.e. get rid of the bigger gaps)?
You can use scales="free":
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~ group, scales="free")
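One way to see what the free x scale does is to look at the bar positions ggplot2 computes for each panel with layer_data(). The sketch below rebuilds df from the question and uses scales = "free_x", which frees only the x axis so both panels keep a common y axis. It assumes a recent ggplot2 3.x; the check itself is illustrative.

```r
library(ggplot2)

# Same data as in the question, built directly into one data frame
df <- data.frame(
  subj = factor(c(rep(11, 3), rep(12, 3), rep(14, 3), rep(15, 3),
                  rep(17, 3), rep(18, 3), rep(20, 3))),
  group = c(rep("u", 3), rep("t", 6), rep("u", 6), rep("t", 6)),
  time = factor(rep(1:3, 7)),
  mean = c(0.7352941, 0.8059701, 0.8823529, 0.9264706, 0.9852941, 0.9558824,
           0.7941176, 0.8676471, 0.7910448, 0.7058824, 0.8382353, 0.7941176,
           0.9411765, 0.9558824, 0.9852941, 0.7647059, 0.8088235, 0.7968750,
           0.8088235, 0.8500000, 0.8412698)
)

p <- ggplot(df, aes(x = subj, y = mean * 100, fill = time)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ group, scales = "free_x")

# Dodged bar centres are offset by less than 0.5 from their subj slot,
# so rounding recovers the slot each bar belongs to
ld <- layer_data(p)
x_slot <- round(as.numeric(ld$x))
slots_per_panel <- tapply(x_slot, ld$PANEL, function(x) length(unique(x)))
max_slot_per_panel <- tapply(x_slot, ld$PANEL, max)
slots_per_panel
```

Panel "t" gets slots 1 to 4 and panel "u" slots 1 to 3, so no empty subj positions (and no extra gaps) remain in either panel.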
Another option, with slightly different aesthetics, uses facet_grid. Unlike the facet_wrap version above, the panels here are not the same width, but because of space="free_x" the bars all have the same width.
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_grid(~ group, scales="free", space="free_x")
I have two data frames, dataf1 and dataf2, with the same structure and columns: the three columns are named A, B and C, and both have 50 rows.
I would like to plot the histogram of column B for dataf1 and dataf2. I can plot the two histograms separately, but they are not on the same scale. How can I either put them on the same histogram using different colors, or plot two histograms on the same scale?
ggplot() + aes(dataf1$B)+ geom_histogram(binwidth=1, colour="black",fill="white")
ggplot() + aes(dataf2$B)+ geom_histogram(binwidth=1, colour="black", fill="white")
Combine your data into a single data frame with a new column marking which data frame the data originally came from. Then use that new column for the fill aesthetic for your plot.
dataf1$source="Data 1"
dataf2$source="Data 2"
dat_combined = rbind(dataf1, dataf2)
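Since no sample data were posted, here is a minimal, self-contained sketch of that approach. dataf1 and dataf2 below are hypothetical stand-ins filled with random numbers (the means, spreads and binwidth are arbitrary choices); swap in your own data frames.

```r
library(ggplot2)

set.seed(42)  # reproducible fake data
# Hypothetical stand-ins for the real data frames: columns A, B, C, 50 rows each
dataf1 <- data.frame(A = rnorm(50), B = rnorm(50, mean = 5, sd = 2), C = runif(50))
dataf2 <- data.frame(A = rnorm(50), B = rnorm(50, mean = 7, sd = 2), C = runif(50))

# Tag each row with its origin, then stack the two frames
dataf1$source <- "Data 1"
dataf2$source <- "Data 2"
dat_combined <- rbind(dataf1, dataf2)

# One histogram per source on a shared scale, overlaid with transparency
p <- ggplot(dat_combined, aes(x = B, fill = source)) +
  geom_histogram(position = "identity", binwidth = 1,
                 colour = "grey40", alpha = 0.5)
p
```

Because both sources are binned in the same layer, they share the x bins and the y axis automatically.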
You haven't provided sample data, so here are a few examples of possible plots, using the built-in iris data frame. In the plots below, dat is analogous to dat_combined, Petal.Width is analogous to B, and Species is analogous to source.
dat = subset(iris, Species != "setosa") # We want just two species
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", alpha=0.5, binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="dodge", binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", binwidth=0.1) +
facet_grid(Species ~ .)
As Zheyuan says, you just need to set the y limits for each plot to get them on the same scale. With ggplot2, one way to do this is with the lims() function. scale_y_continuous(limits = ...) behaves the same way and drops anything outside the range, while coord_cartesian(ylim = ...) only zooms the view. You also should never use data$column inside aes(). Instead, pass the data frame to the data argument and use unquoted column names inside aes(). Here's an example with some built-in data:
p1 = ggplot(mtcars, aes(x = mpg)) + geom_histogram() + lims(y = c(0, 13))
p2 = ggplot(iris, aes(x = Sepal.Length)) + geom_histogram() + lims(y = c(0, 13))
gridExtra::grid.arrange(p1, p2, nrow = 1)
To get two histograms on the same plot, the best way is to combine your data frames. Here is a guess, since I haven't seen what your data look like:
dataf = rbind(dataf1["B"], dataf2["B"])
dataf$source = c(rep("f1", nrow(dataf1)), rep("f2", nrow(dataf2)))
ggplot(dataf, aes(x = B, fill = source)) +
geom_histogram(position = "identity", alpha = 0.7)
I have a dataset containing measurements at two timepoints. I have made a barplot of the data of the first timepoint, and would like to add points for the second timepoint (this is just meant as a reference). As you can see below, the points are there, but they are not at the correct x-axis positions (they all share the same x value, unlike the bars).
How to solve this?
library(ggplot2)
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
p <- ggplot(data = MyData,
aes(x=lab)) +
geom_bar(aes(y=time1,fill=method),
stat="identity",
position="dodge",
alpha=.7
) +
geom_point(aes(y=time2,group=method),
stat="identity",
position="dodge",
alpha=.8,
size=3) +
scale_fill_brewer(palette=3) +
facet_grid(. ~ cat)
p
Use position = position_dodge(width = .9) in geom_point.
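Applied to the code in the question, that could look like the sketch below. The 0.9 matches the default bar width (90% of the spacing between lab values), so each point lands on the centre of its bar. set.seed() is added only to make the random data reproducible.

```r
library(ggplot2)

set.seed(1)  # reproducible random data
MyData <- data.frame(
  method = rep(c("A", "B", "C", "D", "E"), times = 3),
  time1 = rnorm(30, 10, 3),
  time2 = rnorm(30, 8, 2),
  lab = rep(rep(c(1, 2, 3), each = 5), times = 2),
  cat = rep(c(1, 2), each = 15)
)

p <- ggplot(data = MyData, aes(x = lab)) +
  geom_bar(aes(y = time1, fill = method),
           stat = "identity", position = "dodge", alpha = .7) +
  # Points have no width of their own, so give the dodge the bar width
  geom_point(aes(y = time2, group = method),
             position = position_dodge(width = .9), alpha = .8, size = 3) +
  scale_fill_brewer(palette = 3) +
  facet_grid(. ~ cat)
p
```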
I have this simple data frame holding three replicates (value) for each level of the factor CT. I would like to plot the replicates with geom_point and then the mean of each level as a geom_line.
gene <- c("Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5")
value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656)
CT <- c("ET","ET","ET", "HP","HP","HP","HT","HT","HT", "LT","LT","LT","P","P","P")
# data.frame() keeps value numeric; cbind() would first coerce every column to character
df <- data.frame(gene, value, CT = factor(CT))
So, I can make the scatter plot.
ggplot(df, aes(x=CT, y=value)) + geom_point()
How do I get a geom_line representing the mean of each factor level? I have tried stat_summary:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group = CT), fun.y=mean, colour="red", geom="line")
But it does not work.
"geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
But each group has three observations, so what is wrong?
P.S. I am also interested in a smooth line.
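You should set the group aesthetic to 1. stat_summary computes one mean per CT level, and with group = CT each of those means becomes its own single-point group, so geom_line has nothing to connect. group = 1 puts all the means into one group, which the line then joins (in ggplot2 versions before 3.3.0 the argument is fun.y instead of fun):
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(group=1), fun=mean, colour="red", geom="line")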
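A quick way to see what group = 1 changes is to inspect the data the summary layer hands to geom_line, using layer_data(). The sketch below assumes ggplot2 3.3.0 or later (for fun =) and builds df with value as a number and CT as a factor. It checks that the layer holds a single group of five points, one per CT level, whose y values are the per-level means.

```r
library(ggplot2)

df <- data.frame(
  gene = rep("Ckap5", 15),
  value = c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071, 0.87170,
            0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656),
  CT = factor(rep(c("ET", "HP", "HT", "LT", "P"), each = 3))
)

p <- ggplot(df, aes(x = CT, y = value)) + geom_point() +
  stat_summary(aes(group = 1), fun = mean, colour = "red", geom = "line")

# Data of the summary layer (layer 2): one row per CT level, all in one group
summary_data <- layer_data(p, 2)
summary_data[, c("x", "group", "y")]
```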
You can use the dplyr package to get the means of each factor.
library(dplyr)
group_means <- df %>%
group_by(CT) %>%
summarise(mean = mean(value))
Then convert the factor to numeric so that you can draw a short horizontal line at each mean with the geom_segment function. The scale_x_continuous function then lets you put the CT labels back on the x axis.
ggplot(df, aes(x=as.numeric(CT), y=value)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("CT", breaks=seq_along(levels(df$CT)), labels=levels(df$CT))
Following on from hrbrmstr's comment, you can add the smooth line like this:
ggplot(df, aes(x=as.numeric(CT), y=value, group=1)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("CT", breaks=seq_along(levels(df$CT)), labels=levels(df$CT)) +
geom_smooth()
I am trying to plot the following datasets side by side:
dataset1=data.frame(obs=runif(20,min=1,max=10))
dataset2=data.frame(obs=runif(20,min=1,max=20))
dataset3=data.frame(obs=runif(20,min=5,max=10))
dataset4=data.frame(obs=runif(20,min=8,max=10))
I've tried adding position="dodge" to geom_histogram with no luck. How can I change the following code so that the histogram bars are drawn side by side without overlapping?
ggplot(data = dataset1,aes_string(x = "obs",fill="dataset")) +
geom_histogram(binwidth = 1,colour="black", fill="blue")+
geom_histogram(data=dataset2, aes_string(x="obs"),binwidth = 1,colour="black",fill="green")+
geom_histogram(data=dataset3, aes_string(x="obs"),binwidth = 1,colour="black",fill="red")+
geom_histogram(data=dataset4, aes_string(x="obs"),binwidth = 1,colour="black",fill="orange")
ggplot2 works best with "long" data, where all the data is in a single data frame and the groups are identified by other variables in that data frame. To that end, combine the four datasets:
DF <- rbind(data.frame(fill="blue", obs=dataset1$obs),
data.frame(fill="green", obs=dataset2$obs),
data.frame(fill="red", obs=dataset3$obs),
data.frame(fill="orange", obs=dataset4$obs))
where I've added a fill column which has the values that you used in your histograms. Given that, the plot can be made with:
ggplot(DF, aes(x=obs, fill=fill)) +
geom_histogram(binwidth=1, colour="black", position="dodge") +
scale_fill_identity()
where position="dodge" now works.
You don't have to use the literal fill color as the distinction. Here is a version that uses the dataset number instead.
DF <- rbind(data.frame(dataset=1, obs=dataset1$obs),
data.frame(dataset=2, obs=dataset2$obs),
data.frame(dataset=3, obs=dataset3$obs),
data.frame(dataset=4, obs=dataset4$obs))
DF$dataset <- as.factor(DF$dataset)
ggplot(DF, aes(x=obs, fill=dataset)) +
geom_histogram(binwidth=1, colour="black", position="dodge") +
scale_fill_manual(breaks=1:4, values=c("blue","green","red","orange"))
This is the same except for the legend.
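With more than a handful of data frames, writing out each rbind() piece by hand invites copy-paste slips. A minimal sketch that stacks any list of data frames sharing an obs column, tagging each with its position in the list, might look like this. It uses only base R for the reshaping; set.seed() is added just to make the random data reproducible.

```r
library(ggplot2)

set.seed(1)  # reproducible random data
dataset1 <- data.frame(obs = runif(20, min = 1, max = 10))
dataset2 <- data.frame(obs = runif(20, min = 1, max = 20))
dataset3 <- data.frame(obs = runif(20, min = 5, max = 10))
dataset4 <- data.frame(obs = runif(20, min = 8, max = 10))

datasets <- list(dataset1, dataset2, dataset3, dataset4)

# Tag each data frame with its index in the list, then stack them
DF <- do.call(rbind, lapply(seq_along(datasets), function(i) {
  data.frame(dataset = i, obs = datasets[[i]]$obs)
}))
DF$dataset <- factor(DF$dataset)

p <- ggplot(DF, aes(x = obs, fill = dataset)) +
  geom_histogram(binwidth = 1, colour = "black", position = "dodge") +
  scale_fill_manual(values = c("blue", "green", "red", "orange"))
p
```

Adding a fifth dataset then only means appending it to the list (and a fifth colour to values).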