I want to make a plot that looks like this in ggplot2:
Here's a dummy dataset:
set.seed(1)
dat <- data.frame(x = exp(rnorm(6)),
f1 = factor(c("a","b","c","c","d","d")),
f2 = factor(c("","","1","2","1","2")))
And a non-working example:
ggplot(data=dat, aes(x=f1, y=x, fill= f2)) +
geom_bar(stat="identity", width=0.9)+
scale_fill_manual(values=c("red", "blue", rep("green",4)))
I could do this by brute force in base graphics, but I'm unsure how to go about it in ggplot2, and I need to use ggplot2 so as to keep this plot consistent with the theme I'm using throughout the project.
So how do I make the factors 1 and 2 go side by side? And how do I center the labels like I have in the drawing? And how do I make the colors behave?
You are trying to make a barplot with grouped bars, a question that was answered here: Grouped bar plot in ggplot
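fill sets the colors (and, since f2 is discrete, the groups), but not how those groups are arranged: geom_bar stacks groups by default, which is why you get a blue and a green bar stacked on top of each other for c and d.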
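Here is a minimal sketch of that idea for the dummy data above. It assumes the intended colours are red for a, blue for b, and green for both c bars and both d bars. Map fill to f1 to set the colours, set group = f2 so that c and d each get two groups, and use position = "dodge" to place those groups side by side. Bars with a single group (a and b) keep the full width, and each bar or pair stays centred on its x label.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x  = exp(rnorm(6)),
                  f1 = factor(c("a", "b", "c", "c", "d", "d")),
                  f2 = factor(c("", "", "1", "2", "1", "2")))

p <- ggplot(dat, aes(x = f1, y = x, fill = f1, group = f2)) +
  # group = f2 splits c and d into two bars each; fill = f1 only sets colours
  geom_col(position = position_dodge(width = 0.9), width = 0.9) +
  # one colour per level of f1; c and d share green
  scale_fill_manual(values = c(a = "red", b = "blue", c = "green", d = "green"))
p

# Bar extents: a and b span the full 0.9 width, each c and d bar half of it
bars <- layer_data(p)
bars$xmax - bars$xmin
```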
I want to create a stacked barplot where I can have distinct colours for the categories along the x axis (for communication purposes as they relate to a set of strongly colour coded items), but I also want to distinguish between the stacked parts of the bar with two categories, ideally using a pattern. I've found a partial solution using colour transparency, but it's not exactly what I want.
I found some solutions using extra packages to give complex fill patterns using image fills, and some workarounds to plot lines to sit over the bars to create an artificial fill effect, and some that allowed bars to have a colour and pattern fill, but only splitting either based on bars or stacks, not both. So far I have found nothing that just allows a simple pattern fill for the stacks while also allowing the bars to be different colours.
Example:
mydata <- data.frame(letters = c("a","b","c","d","e","f","a","b","c","d","e","f"),
split = c("yes","yes","yes","yes","yes","yes","no","no","no","no","no","no"),
amount = c(2,3,5,3,4,6,7,2,5,7,2,4))
colfill <- c("red","blue","green","orange","magenta","purple") ## fill for letters variable
## stackfill <- c("solid","striped") ## example of the type of fill variable I want for 'split'
## Make the barplots:
# this one colours bars by 'split':
ggplot(data=mydata, aes(x=letters, y=amount, fill=split)) +
geom_bar(stat="identity",position="stack")+
scale_fill_manual(values=colfill)
# while this one distinguished based on 'letters'
ggplot(data=mydata, aes(x=letters, y=amount, fill=letters)) +
geom_bar(stat="identity",position="stack")+
scale_fill_manual(values=colfill)
# I want to combine both to get something like this, with colour coded 'letters' and pattern coded 'split':
ggplot(data=mydata)+
geom_col(aes(x=letters, y=amount, fill=letters, alpha=split)) +
scale_alpha_discrete(range=c(1,0.5))+
scale_fill_manual(values=colfill)
Appreciate any suggestions!
Thanks,
Try this:
library(ggplot2)
#install.packages("ggpattern")  # or the development version: remotes::install_github("coolbutuseless/ggpattern")
library(ggpattern)
#Data
mydata <- data.frame(letters = c("a","b","c","d","e","f","a","b","c","d","e","f"),
split = c("yes","yes","yes","yes","yes","yes","no","no","no","no","no","no"),
amount = c(2,3,5,3,4,6,7,2,5,7,2,4))
colfill <- c("red","blue","green","orange","magenta","purple") ## fill for letters variable
## Make the barplots:
ggplot(data=mydata)+
geom_col(aes(x=letters, y=amount, fill=letters)) +
geom_col_pattern(
aes(letters, amount, pattern_fill = split,fill=letters),
pattern = 'stripe',
colour = 'black'
)+
scale_fill_manual(values=colfill)
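Avoid building mydata with as.data.frame(cbind(...)). cbind() on character vectors returns a character matrix, so amount is stored as text, and ggplot2 then treats y as a discrete variable. Calling data.frame() directly keeps amount numeric. A minimal base R comparison:

```r
# cbind() coerces every column to a common type (character here)
via_cbind <- as.data.frame(cbind(letters = c("a", "b"),
                                 amount  = c(2, 3)))
# data.frame() keeps each column's own type
via_df <- data.frame(letters = c("a", "b"),
                     amount  = c(2, 3))

class(via_cbind$amount)  # "character" in R >= 4.0 (a factor in older versions)
class(via_df$amount)     # "numeric"
```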
I was creating histograms with ggplot2 in R whose bins are separated with colors and noticed one thing. When the bins of a histogram are separated by colors with fill option, the density value of the histogram turns funny.
Here is the data.
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
This is a histogram without fill.
ggplot(df, aes(x = x)) +
geom_histogram(aes(y = after_stat(density)))
This is a histogram with fill.
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(aes(y = after_stat(density)))
You can see the latter is pretty crazy. The left side of the bins is sticking out. The density values of the bins of each color are obviously wrong.
I thought over this issue for a while. The data can't be wrong, because the first histogram was normal. It should be something in ggplot2 or the geom_histogram function. I googled "geom_histogram density fill" and couldn't find much help.
I want the end product to look like:
Separated by colors as you see in the second histogram
Size and shape identical to the first histogram
The vertical axis being density
How would you deal with this issue?
I think what you may want is this:
ggplot(df, aes(x = x, fill=b)) +
geom_histogram()
rather than the density. Density requires an extra calculation, and ggplot2 performs it separately for each fill group.
One thing that is important (in my opinion) is that histograms are graphs of one variable. As soon as you start adding data from other variables, you start to turn them into bar charts or something similar.
You will need to set the axis limits manually if you want the y axis to range from 0 to 0.4.
The solution is to hand-compute density like this (instead of using the built-in ggplot2 version):
library(ggplot2)
# Generate test data
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(mapping = aes(y = after_stat(count / (sum(count) * width))))
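The built-in density misbehaves with fill = b because stat_bin() computes density separately for each fill group. Each group's bars therefore integrate to 1 on their own, and those bars are then stacked. The hand-computed version divides every group's counts by the total number of observations instead. The same arithmetic can be checked in base R with hist(). The 0.25-wide breaks are an arbitrary choice for illustration:

```r
set.seed(42)
x <- rnorm(10000, 0, 1)
b <- x > 1

# Shared breaks for all groups (bin width 0.25, chosen for illustration)
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.25)
width  <- diff(breaks)

h_all   <- hist(x,     breaks = breaks, plot = FALSE)
h_true  <- hist(x[b],  breaks = breaks, plot = FALSE)
h_false <- hist(x[!b], breaks = breaks, plot = FALSE)

# Per-group density (what fill = b gives): each group integrates to 1 by itself
sum(h_true$density * width)
sum(h_false$density * width)

# Shared denominator: count / (total n * bin width), as in the answer above
d_true  <- h_true$counts  / (length(x) * width)
d_false <- h_false$counts / (length(x) * width)

# Stacking the shared-denominator densities reproduces the unsplit histogram
all.equal(d_true + d_false, h_all$density)
```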
When you map a column to the fill parameter in ggplot, it groups the data and plots each group in its own color.
If you want a single color for the plot, just specify the color you want (fixed version):
ggplot(df, aes(x = x)) +
geom_histogram(aes(y = after_stat(density)), fill = "Blue")
In trying to color my stacked histogram according to a factor column; all the bars have a "green" roof? I want the bar-top to be the same color as the bar itself. The figure below shows clearly what is wrong. All the bars have a "green" horizontal line at the top?
Here is a dummy data set :
BodyLength <- rnorm(100, mean = 50, sd = 3)
vector <- c(80, 10, 5, 5)
colors <- c("black","blue","red","green")
color <- rep(colors,vector)
data <- data.frame(BodyLength,color)
And the program I used to generate the plot below :
plot <- ggplot(data = data, aes(x = BodyLength, color = factor(color)))
plot <- plot + geom_histogram(fill = "transparent")
plot <- plot + scale_colour_manual(values = c("Black","blue","red","green"))
Also, since the data column itself contains color names, is there any way to avoid specifying them again in scale_color_manual? Can ggplot identify them from the data itself? But I would really like help with the first problem right now... Thanks.
Here is a quick way to get your colors to scale_colour_manual without writing out a vector:
data <- data.frame(BodyLength,color)
data$color<- factor(data$color)
and then later,
scale_colour_manual(values = levels(data$color))
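For the second question, ggplot2 also provides scale_colour_identity(), which uses the values in the mapped column directly as colours, so no vector has to be repeated. Here is a minimal sketch with the same dummy data. Identity scales draw no legend unless guide = "legend" is set, which is why it is passed here:

```r
library(ggplot2)

set.seed(1)
BodyLength <- rnorm(100, mean = 50, sd = 3)
color <- rep(c("black", "blue", "red", "green"), times = c(80, 10, 5, 5))
data <- data.frame(BodyLength, color)

p <- ggplot(data, aes(x = BodyLength, colour = color)) +
  geom_histogram(fill = NA, bins = 30) +
  # treat the strings in `color` as literal colour names
  scale_colour_identity(guide = "legend")

# The built layer carries the colour names through unchanged
unique(layer_data(p)$colour)
```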
Now, with respect to your first problem, I don't know exactly why your bars have green roofs. However, you may want to look at other options for the position argument in geom_histogram, such as
plot + geom_histogram(position="identity")
...or position="dodge". The identity option is closer to what you want, but since green is the last line drawn, it overwrites the previous colors.
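One likely source of the coloured roofs is that stat_bin() returns bins for every group, including bins where that group has no observations. This sketch assumes a recent ggplot2. With the default position = "stack", those empty bins become zero-height rectangles inside each stack. When one of them lands on top of a bar and is drawn after it, its outline appears as a flat line of a different colour. You can see these rectangles by inspecting the built layer:

```r
library(ggplot2)

set.seed(1)
BodyLength <- rnorm(100, mean = 50, sd = 3)
color <- factor(rep(c("black", "blue", "red", "green"), times = c(80, 10, 5, 5)))
data <- data.frame(BodyLength, color)

p <- ggplot(data, aes(x = BodyLength, colour = color)) +
  geom_histogram(fill = NA, bins = 30)

built <- layer_data(p)

# Rows per colour group: groups get rows even for bins where they have no data
table(built$group)

# Empty bins are rectangles with zero height (ymin == ymax) inside the stack
empty <- built[built$count == 0, ]
nrow(empty)
all(empty$ymin == empty$ymax)
```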
I like density plots better for these problems myself.
ggplot(data=data, aes(x=BodyLength, color=color)) + geom_density()
ggplot(data=data, aes(x=BodyLength, fill=color)) + geom_density(alpha=.3)
I am trying to plot a 5 dimensional plot in R. I am currently using the rgl package to plot my data in 4 dimensions, using 3 variables as the x,y,z, coordinates, another variable as the color. I am wondering if I can add a fifth variable using this package, like for example the size or the shape of the points in the space. Here's an example of my data, and my current code:
set.seed(1)
df <- data.frame(replicate(4, sample(1:200, 1000, replace = TRUE)))
addme <- data.frame(replicate(1, sample(0:1, 1000, replace = TRUE)))
df <- cbind(df,addme)
colnames(df) <- c("var1","var2","var3","var4","var5")
require(rgl)
plot3d(df$var1, df$var2, df$var3, col=as.numeric(df$var4), size=0.5, type='s',xlab="var1",ylab="var2",zlab="var3")
I hope it is possible to do the 5th dimension.
Many thanks,
Here is a ggplot2 option. I usually shy away from 3D plots as they are hard to interpret properly. I also almost never put in 5 continuous variables in the same plot as I have here...
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12))
While this is a bit messy, you can actually reasonably read all 5 dimensions for most points.
A better approach to multi-dimensional plotting opens up if some of your variables are categorical. If all your variables are continuous, you can turn some of them into categorical ones with cut and then use facet_wrap or facet_grid to plot those.
For example, here I break up var3 and var4 into quintiles and use facet_grid on them. Note that I also keep the color aesthetics as well to highlight that most of the time turning a continuous variable to categorical in high dimensional plots is good enough to get the key points across (here you'll notice that the fill and border colors are pretty uniform within any given grid cell):
df$var4.cat <- cut(df$var4, quantile(df$var4, (0:5)/5), include.lowest = TRUE)
df$var3.cat <- cut(df$var3, quantile(df$var3, (0:5)/5), include.lowest = TRUE)
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12)) +
facet_grid(var3.cat ~ var4.cat)
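To stay with rgl for the 3D view, you can map the fifth variable to sphere size. With type = "s", plot3d() accepts a radius argument, which can be a vector with one value per point. Here is a minimal sketch. The two radii (3 and 6, in data units) are arbitrary choices for the 0/1 var5, and var4 is mapped to a red-to-green palette to mirror the ggplot2 version. The requireNamespace() guard only keeps the snippet runnable where rgl is not installed:

```r
set.seed(1)
df <- data.frame(replicate(4, sample(1:200, 1000, replace = TRUE)))
colnames(df) <- c("var1", "var2", "var3", "var4")
df$var5 <- sample(0:1, 1000, replace = TRUE)

# var4 (values 1..200) indexes a 200-step red-to-green palette
pal  <- colorRampPalette(c("red", "green"))(200)
cols <- pal[df$var4]

# var5 (0/1) selects one of two sphere radii (arbitrary values, in data units)
radius <- ifelse(df$var5 == 1, 6, 3)

# Draw only when rgl is available
if (requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(df$var1, df$var2, df$var3, col = cols, type = "s",
              radius = radius, xlab = "var1", ylab = "var2", zlab = "var3")
}
```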
I would like to use ggplot and faceting to construct a series of density plots grouped by a factor. Additionally, I would like to layer another density plot on each of the facets that is not subject to the constraints imposed by the facet.
For example, the faceted plot would look like this:
require(ggplot2)
ggplot(diamonds, aes(price)) + facet_grid(.~clarity) + geom_density()
and then I would like to have the following single density plot layered on top of each of the facets:
ggplot(diamonds, aes(price)) + geom_density()
Furthermore, is ggplot with faceting the best way to do this, or is there a preferred method?
One way to achieve this is to make a new data frame, diamonds2, that contains just the price column, and then use two geom_density() calls: one with the original diamonds data and a second with diamonds2. Because diamonds2 has no clarity column, all of its values are used in every facet.
diamonds2<-diamonds["price"]
ggplot(diamonds, aes(price)) + geom_density()+facet_grid(.~clarity) +
geom_density(data=diamonds2,aes(price),colour="blue")
UPDATE: as suggested by @BrianDiggs, the same result can be achieved without making a new data frame, by transforming the data inside the geom_density() call.
ggplot(diamonds, aes(price)) + geom_density()+facet_grid(.~clarity) +
geom_density(data=transform(diamonds, clarity=NULL),aes(price),colour="blue")
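You can confirm that the layer without a clarity column is repeated in every facet by inspecting the built plot with layer_data(). The sketch below drops clarity with base subsetting, which should be equivalent to the transform(diamonds, clarity = NULL) call above:

```r
library(ggplot2)

# diamonds without its clarity column
no_clarity <- diamonds[, setdiff(names(diamonds), "clarity")]

p <- ggplot(diamonds, aes(price)) +
  geom_density() +
  facet_grid(. ~ clarity) +
  geom_density(data = no_clarity, colour = "blue")

# Built data for the second (overall) density layer
overall <- layer_data(p, 2)

# One copy per clarity panel...
length(unique(overall$PANEL))
# ...and the same curve in each of them
all.equal(overall$y[overall$PANEL == 1], overall$y[overall$PANEL == 2])
```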
Another approach is to plot the data without faceting. Add two calls to geom_density(): in the first, add aes(color=clarity) to get a density line in a different color for each level of clarity, and leave the second geom_density() empty to add the overall black density line.
ggplot(diamonds,aes(price))+geom_density(aes(color=clarity))+geom_density()