I am trying to make one figure with two categories of data, which looks like:
A comparison between two groups (indicated by pink and black) concerning various different species
It seems the author of this figure put two boxplot pictures into one figure. I constructed similar boxplot by R, codes like below:
{library(reshape2)
species_melt <- melt(species, "Species")
library(ggplot2)
p<-ggplot(species_melt, aes(Species, value),color="Red") + geom_boxplot()
windowsFonts(myFont1=windowsFont("Arial"),myFont2=windowsFont("Times New Roman"))
p+scale_y_log10()}
Which generate a boxplot like below (partly):
enter image description here
Thus I wonder how I could add another layer of boxplot on it, yet it seems difficult with R.
It's hard to test without having your data, but something like this should work:
library(ggplot2)
ggplot() +
geom_boxplot(data = species_melt_1,
aes(Species, value),
fill = "#ff84b3", color = "#994f6b") +
geom_boxplot(data = species_melt_2,
aes(Species, value),
alpha = 0, color = "black")
I'm using two geom_boxplot's with different datasets (species_melt_1 and species_melt_2). First one is reddish and second one is transparent.
Related
So I'm self-teaching myself R right now using this online resource: "https://r4ds.had.co.nz/data-visualisation.html#facets"
This particular section is going over the use of facet_wrap and facet_grid. It's clear to me that facet_grid is primarily used when wanting to visualize a plot along two additional dimensions, rather than just one. What I don't understand is why you can use facet_grid(.~variable) or facet_grid(variable~.) to basically achieve the same result as facet_wrap. Putting a "." in place of a variable results in just not faceting along the row or column dimension, or in other words showing 1 additional variable just as facet_wrap would do.
If anyone can shed some light on this, thank you!
If you use facet_grid, the facets will always be in one row/column. They will never wrap to make a rectangle. But really if you just have one variable with few levels, it doesn't much matter.
You can also see that facet_grid(.~variable) and facet_grid(variable~.) will put the facet labels in different places (row headings vs column headings)
mg <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()
mg + facet_grid(vs~ .) + labs(title="facet_grid(vs~ .)"),
mg + facet_grid(.~ vs) + labs(title="facet_grid(.~ vs)")
So in the most simple of cases, there's nothing that different between them. The main reason to use facet_grid is to have a single, common axis for all facets so you can easily scan across all panels to make a direct comparison of data.
Actually, the same result is not produced all the time...
The number of facets which appear across the graphs pane is fixed with facet_grid (always the number of unique values in the variable) where as facet_wrap, like its name suggests, wraps the facets around the graphics pane. In this way the functions only result in the same graph when the number of facets produced is small.
Both facet_grid and facet_wrap take their arguments in the form row~columns, and nowdays we don't need to use the dot with facet_grid.
In order to compare their differences let's add a new variable with 8 unqiue values to the mtcars data set:
library(tidyverse)
mtcars$example <- rep(1:8, length.out = 32)
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_grid(~example, labeller = label_both)
Which results in a cluttered plot:
Compared to:
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_wrap(~example, labeller = label_both)
Which results in:
I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that these variables are too many (about a dozen) to use a single boxplot.
Additionaly the levels are too different to use one plot. I need to use facets to make things more organised:
However with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only this green line, which it is in the first plot. Any ideas why it changes? Is there any good workaround? I tried the options of facet_wrap/facet_grid, but nothing there delivered the right result.
I also tried combining a bar plot with bm and three empty categories with the boxplot. But firstly it looked terrible and secondly it got similarly screwed up in the facetting. Basically any work around would help.
Below the code to create the minimal example displayed here:
# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)
# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)
# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
geom_boxplot() + scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))
# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")
The problem only lies int the use of test_dat$y inside the color aes. Never use $ in aes, ggplot will mess up.
Anyway, I think you plot would improve if you use a geom_hline for the benchmark, instead of hacking in a single value boxplot:
library(ggplot2)
library(RColorBrewer)
ggplot(tidyr::gather(df,key,value,-y)) +
geom_boxplot(aes(x=key, y=value, color=as.factor(y))) +
geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', size = 1) +
scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1")) +
facet_wrap(~key,scales = "free",drop = FALSE) +
theme(legend.position = "bottom")
I have a dataframe with Wikipedia edits, with information about the number of edit for the user (1st edit, 2nd edit and so on), the timestamp when the edit was made, and how many words were added.
In the actual dataset, I have up to 20.000 edits per user and in some edits, they add up to 30.000 words.
However, here is a downloadable small example dataset to exemplify my problem. The header looks like this:
I am trying to plot the distribution of added words across the Edit Progression and across time. If I use the regular R barplot, i works just like expected:
barplot(UserFrame3$NoOfAdds,UserFrame3$EditNo)
But I want to do it in ggplot for nicer graphics and more customizing options.
If I plot this as a scatterplot, I get the same result:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) + geom_point(size = 0.1)
Same for a linegraph:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) +geom_line(size = 0.1)
But when I try to plot it as a bargraph in ggplot, I get this result:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) + geom_bar(stat = "identity", position = "dodge")
There appear to be a lot more holes on the X-axis and the maximum is nowhere close to where it should be (y = 317).
I suspect that ggplot somehow groups the bars and uses means instead of the actual values despite the "dodge" parameter? How can I avoid this? and how would I go about plotting the time progression as a bargraph aswell without ggplot averaging over multiple edits?
You should expect more x-axis "holes" using bars as compared with lines. Lines connect the zero values together, bars do not.
I used geom_col with your data download, it looks as expected:
UserFrame3 %>%
ggplot(aes(EditNo, NoOfAdds)) + geom_col()
In trying to color my stacked histogram according to a factor column; all the bars have a "green" roof? I want the bar-top to be the same color as the bar itself. The figure below shows clearly what is wrong. All the bars have a "green" horizontal line at the top?
Here is a dummy data set :
BodyLength <- rnorm(100, mean = 50, sd = 3)
vector <- c("80","10","5","5")
colors <- c("black","blue","red","green")
color <- rep(colors,vector)
data <- data.frame(BodyLength,color)
And the program I used to generate the plot below :
plot <- ggplot(data = data, aes(x=data$BodyLength, color = factor(data$color), fill=I("transparent")))
plot <- plot + geom_histogram()
plot <- plot + scale_colour_manual(values = c("Black","blue","red","green"))
Also, since the data column itself contains color names, any way I don't have to specify them again in scale_color_manual? Can ggplot identify them from the data itself? But I would really like help with the first problem right now...Thanks.
Here is a quick way to get your colors to scale_colour_manual without writing out a vector:
data <- data.frame(BodyLength,color)
data$color<- factor(data$color)
and then later,
scale_colour_manual(values = levels(data$color))
Now, with respect to your first problem, I don't know exactly why your bars have green roofs. However, you may want to look at some different options for the position argument in geom_histogram, such as
plot + geom_histogram(position="identity")
..or position="dodge". The identity option is closer to what you want but since green is the last line drawn, it overwrites previous the colors.
I like density plots better for these problems myself.
ggplot(data=data, aes(x=BodyLength, color=color)) + geom_density()
ggplot(data=data, aes(x=BodyLength, fill=color)) + geom_density(alpha=.3)
Summary: I want to choose the colors for a ggplot2() density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces the following output:
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the dataProblems:c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
As you can see in the image above, that last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails I will accept with an answer that tells me how to get the legend back on to the 2nd plot. However, if someone can tell me how to do it properly I will be very happy.
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
gives me same plots with different sets of colors.
Provide vector containing colours for the "values" argument to map discrete values to manually chosen visual ones:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green