I am currently attempting to use ggplot to create a bar chart with a single bar that is partially transparent.
I have the following code:
dt1 <- data.table(yr=c(2010,2010,2011,2011),
val=c(1500,3000,2000,1100),
x=c("a","b","a","b"))
ggplot() + geom_bar(data=dt1, aes(x=yr, y=val,fill=x),stat="identity") +
scale_x_continuous(breaks=dt1$yr)
This creates a simple chart with two stacked columns. I tried the following code to make the 2011 bar partially transparent, but I am not having much luck. Any pointers?
dt1[,alphayr:=ifelse(yr==2011,.5,1)]
ggplot() + geom_bar(data=dt1, aes(x=yr, y=val,fill=x),stat="identity", alpha=dt1$alphayr) +
scale_x_continuous(breaks=dt1$yr)
First, put alpha inside aes() as suggested by @jazzurro. However, you should convert it to a factor to get a discrete scale. Then you can manually adjust the alpha scale.
ggplot() + geom_bar(data=dt1, aes(x=yr, y=val, fill=x, alpha=factor(alphayr)), stat="identity") +
scale_x_continuous(breaks=dt1$yr) +
scale_alpha_manual(values = c("0.5"=0.5, "1"=1), guide='none')
An instructive question and answer. Other readers may not use data.table syntax and may want to see the result, so I simply revised @shadow's answer to create the factor with a data frame, and display the plot below.
dt1 <- data.frame(yr=c(2010,2010,2011,2011), val=c(1500,3000,2000,1100), x=c("a","b","a","b"))
# create the factor (yr is numeric, so compare against the number 2011)
dt1$alphayr <- as.factor(ifelse(dt1$yr == 2011, 0.5, 1))
ggplot() + geom_bar(data=dt1, aes(x=yr, y=val, fill=x, alpha=factor(alphayr)), stat="identity") +
scale_x_continuous(breaks=dt1$yr) +
scale_alpha_manual(values = c("0.5"=0.5, "1"=1), guide='none')
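As an alternative sketch, not taken from either answer above: because alphayr already holds the literal alpha values, it may be simpler to map the numeric column directly and let scale_alpha_identity() use those values as given. This assumes a ggplot2 version that provides geom_col() and scale_alpha_identity(); the object name p_alpha is made up for illustration.

```r
library(ggplot2)

dt1 <- data.frame(yr  = c(2010, 2010, 2011, 2011),
                  val = c(1500, 3000, 2000, 1100),
                  x   = c("a", "b", "a", "b"))

# Keep alpha as a plain number: 0.5 for 2011, fully opaque otherwise
dt1$alphayr <- ifelse(dt1$yr == 2011, 0.5, 1)

p_alpha <- ggplot(dt1, aes(x = yr, y = val, fill = x, alpha = alphayr)) +
  geom_col() +                                   # same as geom_bar(stat = "identity")
  scale_x_continuous(breaks = unique(dt1$yr)) +
  scale_alpha_identity()                         # use the numbers as alpha, no legend

p_alpha
```

With the identity scale there is no factor conversion and no manual value lookup, at the cost of having to store valid alpha values (between 0 and 1) in the data.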
I have been trying to look for an answer to my particular problem but I have not been successful, so I have just made a MWE to post here.
I tried the answers here with no success.
The task I want to do seems easy enough, but I cannot figure it out, and the results I get are making me have some fundamental questions...
I just want to overlay points and error bars on a bar plot, using ggplot2.
I have a long format data frame that looks like the following:
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
rep=paste0("rep", rep(1:3, 12)),
value=runif(36)*100)
I have attempted to get the plot I want the following way:
library(ggplot2)
library(RColorBrewer)  # provides brewer.pal()
myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print(#or ggsave()
ggplot(mydf, aes(cell, value, fill=scientist )) +
geom_bar(stat="identity", position=position_dodge(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_manual(values=myPal) +
scale_color_manual(values=myPal2)
)
dev.off()
But I obtain this:
The problem is, there should be 3 "rep" values per "scientist" bar, but the values are ordered by "rep" instead (they should be 1,2,3,1,2,3, instead of 1,1,2,2,3,3).
Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...
Furthermore, overlaying the actual value points on the bars makes me wonder what is actually being plotted here: whether the values are taken properly for each bar, and why the max value (or so it seems) is plotted by default.
The way I think this should be properly plotted is with the median (or mean), adding the error bars like the whiskers in a boxplot (min and max value).
Any idea how to...
... have the "rep" value points appear in proper order?
... change the value shown by the bars from max to median?
... add error bars with max and min values?
I restructured your plotting code a little to make things easier.
The secret is to use proper grouping (which is otherwise inferred from fill and color). Also, since you're dodging on multiple levels, dodge2 has to be used.
When you are unsure about "what is plotted where" in bar/column charts, it's always helpful to add the option color="black", which reveals that things are still stacked on top of each other because of your use of dodge instead of dodge2.
p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist,rep))) +
geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
ggsave(filename = outfile, plot=p, height = 10, width = 10)
gives:
Regarding error bars
Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake I also added a geom_errorbar.
ggplot(mydf, aes(x=cell, y=value,group=paste(cell,scientist))) +
geom_violin(aes(fill=scientist),position=position_dodge(),color="black") +
geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
geom_errorbar(stat="summary",position=position_dodge())+
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
gives
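The question also asked for bars at the median with error bars spanning min to max. One possible sketch, assuming the same mydf layout as in the question (regenerated here with a fixed seed), is to summarise first and plot the summary. The names sumdf and p_summary are made up for illustration, and this is not the answerer's own code.

```r
library(ggplot2)

set.seed(1)
mydf <- data.frame(cell      = paste0("cell", rep(1:3, each = 12)),
                   scientist = paste0("scientist", rep(rep(rep(1:2, each = 3), 2), 3)),
                   timepoint = paste0("time", rep(rep(1:2, each = 6), 3)),
                   rep       = paste0("rep", rep(1:3, 12)),
                   value     = runif(36) * 100)

# One row per cell/scientist/timepoint with median, min and max of the 3 replicates
sumdf <- aggregate(value ~ cell + scientist + timepoint, data = mydf,
                   FUN = function(v) c(med = median(v), lo = min(v), hi = max(v)))
sumdf <- do.call(data.frame, sumdf)  # flatten to value.med, value.lo, value.hi

p_summary <- ggplot(sumdf, aes(x = cell, fill = scientist)) +
  geom_col(aes(y = value.med), position = position_dodge(0.9), colour = "black") +
  geom_errorbar(aes(ymin = value.lo, ymax = value.hi),
                position = position_dodge(0.9), width = 0.3) +
  # raw replicates on top, dodged by scientist so they sit over their own bar
  geom_point(data = mydf, aes(y = value, colour = rep, group = scientist),
             position = position_dodge(0.9), size = 3) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells")

p_summary
```

Summarising outside ggplot makes it explicit which statistic each bar shows, which addresses the "what is actually being plotted" doubt in the question.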
Update after comment
As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.
ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
geom_bar(aes(fill=rep),stat="identity", position=position_stack(),color="black") +
geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
I am trying to color bars in ggplot but having issues. Can someone explain how to correctly use the fill parameter and the scale_colour parameters?
library(ggplot2)
df<-data.frame(c(80,33,30),c("Too militarized","Just doing their job","Unfairly tarnished by a few"),c("57%","23%","21%"))
colnames(df)<-c("values","names","percentages")
ggplot(df,aes(names,values))+
geom_bar(stat = "identity",position = "dodge",fill=names)+
geom_text(aes(label=percentages), vjust=0)+
ylab("percentage")+
xlab("thought")+
scale_colour_manual(values = rainbow(nrow(df)))
Working barplot example
barplot(c(df$values),names=c("Too militarized","Just doing their job","Unfairly tarnished by a few"),col = rainbow(nrow(df)))
The main issue is that you don't have fill inside a call to aes in geom_bar(). When mapping from data to visuals like colors, it has to be inside aes(). You can fix this by either wrapping fill=names with aes() or by just specifying fill colors directly, instead of using names:
Option 1 (no legend):
ggplot(df, aes(names, values)) +
geom_bar(stat="identity", fill=rainbow(nrow(df))) +
ylab("percentage") +
xlab("thought")
Option 2 (legend, because mapping from data to colors):
ggplot(df, aes(names, values)) +
geom_bar(stat="identity", aes(fill=names)) +
ylab("percentage") +
xlab("thought") +
scale_fill_manual(values=rainbow(nrow(df)))
Note that in both cases you might want to explicitly factor df$names ahead of the call to ggplot in order to get the bars in the order you want.
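A minimal sketch of that last note, assuming you want the bars in the same order as the rows of df (the object name p_ordered is made up for illustration):

```r
library(ggplot2)

df <- data.frame(values = c(80, 33, 30),
                 names = c("Too militarized", "Just doing their job",
                           "Unfairly tarnished by a few"),
                 percentages = c("57%", "23%", "21%"))

# Without this, ggplot2 orders the character column alphabetically on the x axis.
# Setting the levels explicitly keeps the row order of the data frame.
df$names <- factor(df$names, levels = df$names)

p_ordered <- ggplot(df, aes(names, values)) +
  geom_col(aes(fill = names)) +
  geom_text(aes(label = percentages), vjust = -0.3) +
  ylab("percentage") +
  xlab("thought")

p_ordered
```

Any other ordering works the same way: pass the desired order as the levels argument.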
I'm creating a plot with ggplot that uses colored points, vertical lines, and horizontal lines to display the data. Ideally, I'd like to use two different color or linetype scales for the geom_vline and geom_hline layers, but ggplot discourages/disallows multiple variables mapped to the same aesthetic.
# Create example data
library(tidyverse)
library(lubridate)
set.seed(1234)
example.df <- data_frame(dt = seq(ymd("2016-01-01"), ymd("2016-12-31"), by="1 day"),
value = rnorm(366),
grp = sample(LETTERS[1:3], 366, replace=TRUE))
date.lines <- data_frame(dt = ymd(c("2016-04-01", "2016-10-31")),
dt.label = c("April Fools'", "Halloween"))
value.lines <- data_frame(value = c(-1, 1),
value.label = c("Threshold 1", "Threshold 2"))
If I set linetype aesthetics for both geom_*lines, they get put in the linetype legend together, which doesn't necessarily make logical sense:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, linetype=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
Alternatively, I could set one of the lines to use a colour aesthetic, but then that again puts the legend lines in an illogical legend grouping:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
The only partial solution I've found is to use a fill aesthetic instead of colour in geom_point and set shape=21 to use a fillable shape, but that forces a black border around the points. I can get rid of the border by manually setting colour="white", but then the white border covers up points. If I set colour=NA, no points are plotted.
ggplot(example.df, aes(x=dt, y=value, fill=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(shape=21, size=2, colour="white") +
scale_x_date() +
theme_minimal()
This might be a case where ggplot's "you can't have two variables mapped to the same aesthetic" rule can/should be broken, but I can't figure out a clean way around it. Using fill with geom_point shows the most promise, but there's no way to remove the point borders.
Any ideas for plotting two different color or linetype aesthetics here?
Does anyone know how to create a scatterplot in R that looks like the column scatter plots produced by GraphPad Prism:
I tried using boxplots but they don't display the data the way I want. The column scatterplots that GraphPad can generate show the data better for me.
Any suggestions would be appreciated.
As @smillig mentioned, you can achieve this using ggplot2. The code below reproduces the plot that you are after pretty well, but be warned that it is quite tricky. First load the ggplot2 package and generate some data:
library(ggplot2)
dd = data.frame(values=runif(21), type = c("Control", "Treated", "Treated + A"))
Next change the default theme:
theme_set(theme_bw())
Now we build the plot.
Construct a base object - nothing is plotted:
g = ggplot(dd, aes(type, values))
Add on the points: adjust the default jitter and change glyph according to type:
g = g + geom_jitter(aes(pch=type), position=position_jitter(width=0.1))
Add on the "box": calculate where the box ends. In this case, I've chosen the average value. If you don't want the box, just omit this step.
g = g + stat_summary(fun = mean,  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  geom="bar", fill="white", colour="black")
Add on some error bars: calculate the upper/lower bounds and adjust the bar width:
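g = g + stat_summary(
  # 95% t-interval around the mean: standard error is sd/sqrt(n), with n - 1 degrees of freedom
  # (`fun.max`/`fun.min` replace the deprecated `fun.ymax`/`fun.ymin`)
  fun.max=function(i) mean(i) + qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
  fun.min=function(i) mean(i) - qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
  geom="errorbar", width=0.2)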
Display the plot
g
In my R code above I used stat_summary to calculate the values needed on the fly. You could also create separate data frames and use geom_errorbar and geom_bar.
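A rough sketch of that separate-data-frame route, assuming a 95% t-interval around the mean is the error bar you want. The names ci_half, summ and g2 are made up for illustration.

```r
library(ggplot2)

set.seed(42)
dd <- data.frame(values = runif(21),
                 type = c("Control", "Treated", "Treated + A"))

# Half-width of a 95% t-interval for the mean (assumption: this is the desired bar)
ci_half <- function(v) qt(0.975, df = length(v) - 1) * sd(v) / sqrt(length(v))

# Summarise once, up front, instead of inside stat_summary()
means  <- tapply(dd$values, dd$type, mean)
halves <- tapply(dd$values, dd$type, ci_half)
summ <- data.frame(type = names(means),
                   mean = as.numeric(means),
                   half = as.numeric(halves))

g2 <- ggplot(summ, aes(type, mean)) +
  geom_col(fill = "white", colour = "black") +
  geom_errorbar(aes(ymin = mean - half, ymax = mean + half), width = 0.2) +
  # raw points come from the original data; jitter horizontally only
  geom_jitter(data = dd, aes(y = values, shape = type), width = 0.1, height = 0) +
  theme_bw()

g2
```

Having summ as its own data frame also makes it easy to inspect or export the numbers behind the bars.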
To use base R, have a look at my answer to this question.
If you don't mind using the ggplot2 package, there's an easy way to make similar graphics with geom_boxplot and geom_jitter. Using the mtcars example data:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter() + theme_bw()
which produces the following graphic:
The documentation can be seen here: http://had.co.nz/ggplot2/geom_boxplot.html
I recently faced the same problem and found my own solution, using ggplot2.
As an example, I created a subset of the chickwts dataset.
library(ggplot2)
library(dplyr)
data(chickwts)
Dataset <- chickwts %>%
filter(feed == "sunflower" | feed == "soybean")
Since geom_dotplot() does not allow changing the dots to other symbols, I used geom_jitter() as follows:
Dataset %>%
ggplot(aes(feed, weight, fill = feed)) +
geom_jitter(aes(shape = feed, col = feed), size = 2.5, width = 0.1)+
stat_summary(fun = mean, geom = "crossbar", width = 0.7,
col = c("#9E0142","#3288BD")) +
scale_fill_manual(values = c("#9E0142","#3288BD")) +
scale_colour_manual(values = c("#9E0142","#3288BD")) +
theme_bw()
This is the final plot:
For more details, you can have a look at this post:
http://withheadintheclouds1.blogspot.com/2021/04/building-dot-plot-in-r-similar-to-those.html?m=1
I am trying to plot side by side the following datasets
dataset1=data.frame(obs=runif(20,min=1,max=10))
dataset2=data.frame(obs=runif(20,min=1,max=20))
dataset3=data.frame(obs=runif(20,min=5,max=10))
dataset4=data.frame(obs=runif(20,min=8,max=10))
I've tried to add the option position="dodge" to geom_histogram with no luck. How can I change the following code to plot the histogram columns side by side without overlap?
ggplot(data = dataset1,aes_string(x = "obs",fill="dataset")) +
geom_histogram(binwidth = 1,colour="black", fill="blue")+
geom_histogram(data=dataset2, aes_string(x="obs"),binwidth = 1,colour="black",fill="green")+
geom_histogram(data=dataset3, aes_string(x="obs"),binwidth = 1,colour="black",fill="red")+
geom_histogram(data=dataset4, aes_string(x="obs"),binwidth = 1,colour="black",fill="orange")
ggplot2 works best with "long" data, where all the data is in a single data frame and different groups are described by other variables in the data frame. To that end:
DF <- rbind(data.frame(fill="blue", obs=dataset1$obs),
data.frame(fill="green", obs=dataset2$obs),
data.frame(fill="red", obs=dataset3$obs),
data.frame(fill="orange", obs=dataset4$obs))
where I've added a fill column which has the values that you used in your histograms. Given that, the plot can be made with:
ggplot(DF, aes(x=obs, fill=fill)) +
geom_histogram(binwidth=1, colour="black", position="dodge") +
scale_fill_identity()
where position="dodge" now works.
You don't have to use the literal fill color as the distinction. Here is a version that uses the dataset number instead.
DF <- rbind(data.frame(dataset=1, obs=dataset1$obs),
data.frame(dataset=2, obs=dataset2$obs),
data.frame(dataset=3, obs=dataset3$obs),
data.frame(dataset=4, obs=dataset4$obs))
DF$dataset <- as.factor(DF$dataset)
ggplot(DF, aes(x=obs, fill=dataset)) +
geom_histogram(binwidth=1, colour="black", position="dodge") +
scale_fill_manual(breaks=1:4, values=c("blue","green","red","orange"))
This is the same except for the legend.
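If there are many datasets, writing one data.frame() call per dataset invites copy-paste slips. A possible sketch, assuming the datasets can be kept in a list (the names datasets and p_hist are made up for illustration), builds the long data frame in a loop-free way:

```r
library(ggplot2)

set.seed(1)
datasets <- list(
  data.frame(obs = runif(20, min = 1, max = 10)),
  data.frame(obs = runif(20, min = 1, max = 20)),
  data.frame(obs = runif(20, min = 5, max = 10)),
  data.frame(obs = runif(20, min = 8, max = 10))
)

# Tag each dataset with its position in the list, then stack them into long format
DF <- do.call(rbind,
              Map(function(d, i) data.frame(dataset = i, obs = d$obs),
                  datasets, seq_along(datasets)))
DF$dataset <- factor(DF$dataset)

p_hist <- ggplot(DF, aes(x = obs, fill = dataset)) +
  geom_histogram(binwidth = 1, colour = "black", position = "dodge") +
  scale_fill_manual(values = c("blue", "green", "red", "orange"))

p_hist
```

Adding a fifth dataset then only requires another list element and another colour.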