multiple histograms with ggplot2 - position - r

I am trying to plot side by side the following datasets
dataset1=data.frame(obs=runif(20,min=1,max=10))
dataset2=data.frame(obs=runif(20,min=1,max=20))
dataset3=data.frame(obs=runif(20,min=5,max=10))
dataset4=data.frame(obs=runif(20,min=8,max=10))
I've tried adding the option position="dodge" to geom_histogram with no luck. How can I change the following code to plot the histogram columns side by side without overlap?
ggplot(data = dataset1,aes_string(x = "obs",fill="dataset")) +
geom_histogram(binwidth = 1,colour="black", fill="blue")+
geom_histogram(data=dataset2, aes_string(x="obs"),binwidth = 1,colour="black",fill="green")+
geom_histogram(data=dataset3, aes_string(x="obs"),binwidth = 1,colour="black",fill="red")+
geom_histogram(data=dataset4, aes_string(x="obs"),binwidth = 1,colour="black",fill="orange")

ggplot2 works best with "long" data, where all the data is in a single data frame and different groups are described by other variables in the data frame. To that end
DF <- rbind(data.frame(fill="blue", obs=dataset1$obs),
data.frame(fill="green", obs=dataset2$obs),
data.frame(fill="red", obs=dataset3$obs),
data.frame(fill="orange", obs=dataset4$obs))
where I've added a fill column which has the values that you used in your histograms. Given that, the plot can be made with:
ggplot(DF, aes(x=obs, fill=fill)) +
geom_histogram(binwidth=1, colour="black", position="dodge") +
scale_fill_identity()
where position="dodge" now works.
You don't have to use the literal fill color as the distinction. Here is a version that uses the dataset number instead.
DF <- rbind(data.frame(dataset=1, obs=dataset1$obs),
data.frame(dataset=2, obs=dataset2$obs),
data.frame(dataset=3, obs=dataset3$obs),
data.frame(dataset=4, obs=dataset4$obs))
DF$dataset <- as.factor(DF$dataset)
ggplot(DF, aes(x=obs, fill=dataset)) +
geom_histogram(binwidth=1, colour="black", position="dodge") +
scale_fill_manual(breaks=1:4, values=c("blue","green","red","orange"))
This is the same except for the legend.
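If the number of datasets grows, the long data frame can also be built from a named list instead of typing out each rbind() term. This is only a sketch under assumptions: the list name `datasets` and the `set.seed()` call are additions for illustration and are not part of the question.

```r
library(ggplot2)

set.seed(42)  # assumption: only here so the random example data are reproducible

# Named list holding the four data frames from the question
datasets <- list(
  "1" = data.frame(obs = runif(20, min = 1, max = 10)),
  "2" = data.frame(obs = runif(20, min = 1, max = 20)),
  "3" = data.frame(obs = runif(20, min = 5, max = 10)),
  "4" = data.frame(obs = runif(20, min = 8, max = 10))
)

# Stack them into one long data frame, tagging each row with its source name
DF <- do.call(rbind, Map(function(d, name) {
  data.frame(dataset = name, obs = d$obs)
}, datasets, names(datasets)))
DF$dataset <- factor(DF$dataset, levels = names(datasets))

p <- ggplot(DF, aes(x = obs, fill = dataset)) +
  geom_histogram(binwidth = 1, colour = "black", position = "dodge") +
  scale_fill_manual(values = c("1" = "blue", "2" = "green",
                               "3" = "red", "4" = "orange"))
p
```

Naming the colour vector by factor level ties each colour to a dataset explicitly, so the mapping does not depend on the order of the levels.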

Related

How to plot two histograms on the same axis scale?

I have two data frames: dataf1 and dataf2. They have the same structure and columns.
The three columns are named A, B, and C, and both data frames have 50 rows.
I would like to plot a histogram of column B for dataf1 and for dataf2. I can plot the two histograms separately, but they are not on the same scale. How can I either put them on the same plot using different colors, or plot two histograms with the same scale?
ggplot() + aes(dataf1$B)+ geom_histogram(binwidth=1, colour="black",fill="white")
ggplot() + aes(dataf2$B)+ geom_histogram(binwidth=1, colour="black", fill="white")
Combine your data into a single data frame with a new column marking which data frame the data originally came from. Then use that new column for the fill aesthetic for your plot.
dataf1$source="Data 1"
dataf2$source="Data 2"
dat_combined = rbind(dataf1, dataf2)
You haven't provided sample data, so here are a few examples of possible plots, using the built-in iris data frame. In the plots below, dat is analogous to dat_combined, Petal.Width is analogous to B, and Species is analogous to source.
dat = subset(iris, Species != "setosa") # We want just two species
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", alpha=0.5, binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="dodge", binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", binwidth=0.1) +
facet_grid(Species ~ .)
As Zheyuan says, you just need to set the y limits for each plot to get them on the same scale. With ggplot2, one way to do this is with the lims command (though scale_y_continuous and coord_cartesian also work, albeit slightly differently). You should also never use data$column inside aes(). Instead, use the data argument for the data frame and unquoted column names inside aes(). Here's an example with some built-in data.
p1 = ggplot(mtcars, aes(x = mpg)) + geom_histogram() + lims(y = c(0, 13))
p2 = ggplot(iris, aes(x = Sepal.Length)) + geom_histogram() + lims(y = c(0, 13))
gridExtra::grid.arrange(p1, p2, nrow = 1)
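The "slightly differently" above can be made concrete with a small sketch. The limit of 5 and the object names `base`, `p_lims` and `p_zoom` are chosen here purely for illustration. Scale limits (lims, scale_y_continuous) treat values outside the range as missing, so bars taller than the limit are dropped, while coord_cartesian only zooms the view.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 2.5)

# Scale limits: bars taller than the upper limit fall outside the scale
# and are removed from the plot (ggplot2 warns about the removed rows)
p_lims <- base + lims(y = c(0, 5))

# Coordinate limits: the plot is only zoomed, so tall bars are still drawn
# and simply run off the top of the panel
p_zoom <- base + coord_cartesian(ylim = c(0, 5))

gridExtra::grid.arrange(p_lims, p_zoom, nrow = 1)
```

With limits that cover every bar, as in the answer's c(0, 13), both approaches should give the same picture.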
To get two histograms on the same plot, the best way is to combine your data frames. A guess, without seeing what your data looks like:
dataf = rbind(dataf1["B"], dataf2["B"])
dataf$source = c(rep("f1", nrow(dataf1)), rep("f2", nrow(dataf2)))
ggplot(dataf, aes(x = B, fill = source)) +
geom_histogram(position = "identity", alpha = 0.7)
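Since no sample data was posted, here is a self-contained sketch with made-up stand-ins for dataf1 and dataf2 (50 rows, columns A, B, C as described; the values themselves are invented). It combines the two B columns and uses facets, which share the same x and y scales by default, as one way to get both histograms on a common scale.

```r
library(ggplot2)

set.seed(1)  # made-up stand-ins for the asker's data frames
dataf1 <- data.frame(A = rnorm(50), B = rnorm(50, mean = 10, sd = 2), C = rnorm(50))
dataf2 <- data.frame(A = rnorm(50), B = rnorm(50, mean = 12, sd = 3), C = rnorm(50))

# Keep only column B and record which data frame each row came from
dataf <- rbind(dataf1["B"], dataf2["B"])
dataf$source <- rep(c("f1", "f2"), times = c(nrow(dataf1), nrow(dataf2)))

# Facets use scales = "fixed" by default, so both panels share axes
ggplot(dataf, aes(x = B, fill = source)) +
  geom_histogram(binwidth = 1, colour = "black") +
  facet_wrap(~ source, ncol = 1)
```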

Plotting two data series in overlapping barplot (ggplot2)

I would like to plot two data series (same type and number of measures, but measured at two timepoints) in the same barplot. Preferably the first series is plotted in grey, with the second series plotted in colours with transparency such that the series 1 data is still visible.
The data I have is of the following format:
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
To show the type of plot I'm looking for I have added the code for plotting data series 1 below:
p <- ggplot(data = MyData,
aes(x=lab,
y=time1,
fill=method))
p + geom_bar(stat="identity",
position="dodge",
alpha=.3) +
facet_grid(. ~ cat)
In the end it doesn't really matter which one of the data series is in grey and which is in colour, as long as they are plotted on top of each other, and both are visible.
All suggestions are welcome!
There can only be one active fill scale, so we need to map the variable method to something else for the second layer, either group or color.
library(ggplot2)
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
p <- ggplot(data = MyData,
aes(x=lab)) +
geom_bar(aes(y=time2,fill=method),
stat="identity",
position="dodge",
alpha=.3
) +
geom_bar(aes(y=time1,group=method),
stat="identity",
position="dodge",
alpha=.3) +
scale_fill_discrete() +
facet_grid(. ~ cat)
p
I have been thinking about a different way to add the second data series. I can add the second series using geom_point instead of geom_bar, as this gives less clutter. However, how do I position the points on the corresponding bar? (i.e. right now the points are all on the same x-axis position).
library(ggplot2)
MyData = data.frame(
method=rep(c("A","B","C","D","E"),times=3),
time1=rnorm(30,10,3),
time2=rnorm(30,8,2),
lab=rep(rep(c(1,2,3),each=5),times=2),
cat=rep(c(1,2),each=15)
)
p <- ggplot(data = MyData,
aes(x=lab)) +
geom_bar(aes(y=time1,fill=method),
stat="identity",
position="dodge",
alpha=.7
) +
geom_point(aes(y=time2,group=method),
stat="identity",
position="dodge",
alpha=.8,
size=3) +
scale_fill_brewer(palette=3) +
facet_grid(. ~ cat)
p
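One possible way to line the points up with their bars, offered as a sketch rather than a definitive answer: give both layers the same position_dodge() object with an explicit width. The width of 0.9 is an assumption that matches the default bar width when the x values are one unit apart, as they are for lab here. The object name `dodge` and the `set.seed()` call are additions, and method is repeated 6 times so that all columns have 30 rows without relying on recycling.

```r
library(ggplot2)

set.seed(1)  # assumption: only for reproducible example data
MyData <- data.frame(
  method = rep(c("A", "B", "C", "D", "E"), times = 6),
  time1  = rnorm(30, 10, 3),
  time2  = rnorm(30, 8, 2),
  lab    = rep(rep(c(1, 2, 3), each = 5), times = 2),
  cat    = rep(c(1, 2), each = 15)
)

# Shared dodge so bars and points are shifted by the same amount
dodge <- position_dodge(width = 0.9)

ggplot(MyData, aes(x = lab, group = method)) +
  geom_bar(aes(y = time1, fill = method),
           stat = "identity", position = dodge, alpha = 0.7) +
  geom_point(aes(y = time2),
             position = dodge, alpha = 0.8, size = 3) +
  scale_fill_brewer(palette = 3) +
  facet_grid(. ~ cat)
```

Moving group = method into the top-level aes() means both layers dodge by the same grouping.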

ggplot transparency on individual bar

I am currently attempting to use ggplot to create a bar chart with a single bar that is partially transparent.
I have the following code:
library(data.table)
dt1 <- data.table(yr=c(2010,2010,2011,2011),
val=c(1500,3000,2000,1100),
x=c("a","b","a","b"))
ggplot() + geom_bar(data=dt1, aes(x=yr, y=val,fill=x),stat="identity") +
scale_x_continuous(breaks=dt1$yr)
This will create a simple chart with 2 columns with stacked data. I have tried the following code to adjust the 2011 value to have transparency, however I am not having much luck. Any pointers?
dt1[,alphayr:=ifelse(yr==2011,.5,1)]
ggplot() + geom_bar(data=dt1, aes(x=yr, y=val,fill=x),stat="identity", alpha=dt1$alphayr) +
scale_x_continuous(breaks=dt1$yr)
First, put the alpha inside aes() as suggested by @jazzurro. However, you should convert it to a factor to get a discrete scale. Then you can manually adjust the alpha scale.
ggplot() + geom_bar(data=dt1, aes(x=yr, y=val, fill=x, alpha=factor(alphayr)), stat="identity") +
scale_x_continuous(breaks=dt1$yr) +
scale_alpha_manual(values = c("0.5"=0.5, "1"=1), guide='none')
An instructive question and answer. Other readers may not use data.table syntax and may want to see the result, so I simply revised @shadow's answer to create the factor with a data frame, and display the plot below.
dt1 <- data.frame(yr=c(2010,2010,2011,2011), val=c(1500,3000,2000,1100), x=c("a","b","a","b"))
# create the factor
dt1$alphayr <- as.factor(ifelse(dt1$yr == 2011, 0.5, 1))
ggplot() + geom_bar(data=dt1, aes(x=yr, y=val, fill=x, alpha=factor(alphayr)), stat="identity") +
scale_x_continuous(breaks=dt1$yr) +
scale_alpha_manual(values = c("0.5"=0.5, "1"=1), guide='none')
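A variation on the same idea, sketched with a logical flag instead of a factor built from the alpha values. The column name `faded` is made up for this example. A logical column is treated as discrete, so scale_alpha_manual can assign one alpha to FALSE and another to TRUE.

```r
library(ggplot2)

dt1 <- data.frame(yr  = c(2010, 2010, 2011, 2011),
                  val = c(1500, 3000, 2000, 1100),
                  x   = c("a", "b", "a", "b"))

# Logical flag marking the bar that should be partially transparent
dt1$faded <- dt1$yr == 2011

ggplot(dt1, aes(x = yr, y = val, fill = x, alpha = faded)) +
  geom_bar(stat = "identity") +
  scale_x_continuous(breaks = unique(dt1$yr)) +
  scale_alpha_manual(values = c("FALSE" = 1, "TRUE" = 0.5), guide = "none")
```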

How to get boxplots using ggplot to overlap instead of faceting

I am new to ggplot and am using it to show box plots of my data for different types. There are four types. I found that I can use facet_wrap to generate four separate graphs.
ggplot(o.xp.sample, aes(power, reduction, fill=interaction(type,power), dodge=type)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot() +
facet_wrap(~type)
My question is, I want to combine all the four graphs into one graph such that each type has a different color (and slightly transparent to show other plots through). Is this possible?
Here is the data https://gist.github.com/anonymous/9589729
Try this:
library(ggplot2)
o.xp.sample = read.csv("C:\\...\\data.csv",sep=",")
ggplot(o.xp.sample, aes(factor(power), reduction, fill=interaction(type,power), dodge=type)) +
stat_boxplot(geom ='errorbar') +
geom_boxplot() +
theme_bw() +
guides(fill = guide_legend(ncol = 3)) #added line as suggested by Paulo Cardoso
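If the boxes should actually sit on top of each other with transparency, as the question asks, one option is position = "identity" together with alpha. The sketch below uses synthetic data as a stand-in: the column names power, reduction and type follow the question, but the values and the type labels t1 to t4 are invented here, and the real data is in the linked gist.

```r
library(ggplot2)

set.seed(1)  # synthetic stand-in data; not the asker's values
o.xp.sample <- expand.grid(type  = c("t1", "t2", "t3", "t4"),
                           power = c(1, 2, 3),
                           rep   = 1:10)
# Give each type a different mean so the overlapping boxes are distinguishable
o.xp.sample$reduction <- rnorm(nrow(o.xp.sample),
                               mean = as.integer(o.xp.sample$type) * 2,
                               sd = 1)

# position = "identity" stops the boxes from being dodged side by side;
# alpha makes the overlapping boxes see-through
ggplot(o.xp.sample, aes(factor(power), reduction, fill = type)) +
  geom_boxplot(position = "identity", alpha = 0.4) +
  theme_bw()
```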

Splitting distribution visualisations on the y-axis in ggplot2 in r

The most commonly cited example of how to visualize a logistic fit using ggplot2 seems to be something very much like this:
data("kyphosis", package="rpart")
ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
geom_point() +
stat_smooth(method="glm", method.args=list(family="binomial"))
This visualisation works great if you don't have too much overlapping data, and the first suggestion for crowded data seems to be to use injected jitter in the x and y coordinates of the points then adjust the alpha value of the points. When you get to the point where individual points aren't useful but distributions of points are, is it possible to use geom_density(), geom_histogram(), or something else to visualise the data but continue to split the categorical variable along the y-axis as it is done with geom_point()?
From what I have found, geom_density() and geom_histogram() can easily be split/grouped by the categorical variable and both levels can easily be reversed using scale_y_reverse() but I can't figure out if it is even possible to move only one of the categorical variable distributions to the top of the plot. Any help/suggestions would be appreciated.
The annotate() function in ggplot allows you to add geoms to a plot with properties that "are not mapped from the variables of a data frame, but are instead passed in as vectors," meaning that you can add layers that are unrelated to your data frame. In this case your two density curves are related to the data frame (since the variables are in it), but because you're trying to position them differently, using annotate() is useful.
Here's one way to go about it:
data("kyphosis", package="rpart")
model.only <- ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
stat_smooth(method="glm", method.args=list(family="binomial"))
absents <- subset(kyphosis, Kyphosis=="absent")
presents <- subset(kyphosis, Kyphosis=="present")
dens.absents <- density(absents$Age)
dens.presents <- density(presents$Age)
scaling.factor <- 10 # Make the density plots taller
model.only + annotate("line", x=dens.absents$x, y=dens.absents$y*scaling.factor) +
annotate("line", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1)
This adds two annotated layers with scaled density plots for each of the kyphosis groups. For the presents variable, y is scaled and increased by 1 to shift it up.
You can also fill the density plots instead of just using a line. Instead of annotate("line"...) you need to use annotate("polygon"...), like so:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green", colour="black", alpha=0.4)
Technically you could use annotate("density"...), but that won't work when you shift the present plot up by one. Instead of shifting, it fills the whole plot:
model.only + annotate("density", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red") +
annotate("density", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green")
The only way around that problem is to use a polygon instead of a density geom.
One final variant: flipping the top density plot about the line y = 1:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=(1 - dens.presents$y*scaling.factor), fill="green", colour="black", alpha=0.4)
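The scaling, shifting and flipping steps can also be wrapped in a small helper that returns a data frame, which can then be drawn with geom_polygon() instead of annotate(). This is a sketch only: the helper name `scaled_density` and its arguments are made up for this example, and the family is passed through method.args, which is how current ggplot2 versions expect it.

```r
library(ggplot2)
data("kyphosis", package = "rpart")

# Hypothetical helper: kernel density of x, scaled, then shifted to `offset`;
# with flip = TRUE the curve hangs downward from `offset`
scaled_density <- function(x, offset = 0, scaling = 10, flip = FALSE) {
  d <- density(x)
  y <- d$y * scaling
  data.frame(x = d$x, y = if (flip) offset - y else offset + y)
}

ages    <- split(kyphosis$Age, kyphosis$Kyphosis)
absent  <- scaled_density(ages$absent)
present <- scaled_density(ages$present, offset = 1, flip = TRUE)

ggplot(kyphosis, aes(x = Age, y = as.numeric(Kyphosis) - 1)) +
  stat_smooth(method = "glm", method.args = list(family = "binomial")) +
  geom_polygon(data = absent, aes(x = x, y = y), inherit.aes = FALSE,
               fill = "red", colour = "black", alpha = 0.4) +
  geom_polygon(data = present, aes(x = x, y = y), inherit.aes = FALSE,
               fill = "green", colour = "black", alpha = 0.4)
```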
I am not sure I get your point, but here is an attempt:
dat <- rbind(kyphosis,kyphosis)
dat$grp <- factor(rep(c('smooth','dens'),each = nrow(kyphosis)),
levels = c('smooth','dens'))
ggplot(dat,aes(x=Age)) +
facet_grid(grp~.,scales = "free_y") +
#geom_point(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1)) +
stat_smooth(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1),
method="glm", method.args=list(family="binomial")) +
geom_density(data=subset(dat,grp=='dens'))
