want to layer aes in ggplot2 - r

I would like to plot another series of data on top of a current graph. The additional data only contains information for 3 (out of 6) spp, which are used in the facet_wrap call.
The other series of data is currently a column (in the same data file).
Current graph:
ped.num <- ggplot(data, aes(ped.length, seeds.inflorstem))
ped.num + geom_point(size=2) + theme_bw() + facet_wrap(~spp, scales = "free_y")
Additional layer would be:
aes(ped.length, seeds.filled)
I feel I should be able to plot them using the same y-axis, because they have just slightly smaller values. How do I go about adding this layer?

@ialm's solution should work fine, but I recommend calling the aes function separately in each geom_* because it makes the code easier to read.
ped.num <- ggplot(data) +
geom_point(aes(x=ped.length, y=seeds.inflorstem), size=2) +
theme_bw() +
facet_wrap(~spp, scales="free_y") +
geom_point(aes(x=ped.length, y=seeds.filled))

(You'll always get better answers if you include example data, but I'll take a shot in the dark)
Since you want to plot two variables that are in the same data.frame, it's probably easiest to reshape the data before feeding it into ggplot:
library(reshape2)
# Melting data gives you exactly one observation per row - ggplot likes that
dat.melt <- melt(dat,
                 id.vars = c("spp", "ped.length"),
                 measure.vars = c("seeds.inflorstem", "seeds.filled")
)
# Plotting is slightly different - instead of explicitly naming each variable,
# you'll refer to "variable" and "value"
ggplot(dat.melt, aes(x = ped.length, y = value, color = variable)) +
geom_point(size=2) +
theme_bw() +
facet_wrap(~spp, scales = "free_y")
The seeds.filled values should plot only on the facets for the corresponding species.
I prefer this to Drew's (totally valid) approach of explicitly mapping different layers because you only need a single geom_point() whether you have two variables or twenty and it's easy to map a variety of aesthetics to variable.
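Since the question came without example data, here is a minimal sketch of the melt approach on made-up data. The data frame dat, the species names sp1 to sp6 and all values are assumptions for illustration only; seeds.filled is set to NA for three of the six species, and na.rm = TRUE drops those rows while melting.

```r
library(ggplot2)
library(reshape2)

# Hypothetical toy data: six species, seeds.filled recorded for only three
set.seed(1)
dat <- data.frame(
  spp = rep(paste0("sp", 1:6), each = 5),
  ped.length = runif(30, 1, 10)
)
dat$seeds.inflorstem <- round(dat$ped.length * 3 + rnorm(30))
dat$seeds.filled <- ifelse(dat$spp %in% c("sp1", "sp2", "sp3"),
                           dat$seeds.inflorstem - 2, NA)

# na.rm = TRUE removes the rows where seeds.filled is missing
dat.melt <- melt(dat,
                 id.vars = c("spp", "ped.length"),
                 measure.vars = c("seeds.inflorstem", "seeds.filled"),
                 na.rm = TRUE)

# One geom_point() covers both series; colour tells them apart
p <- ggplot(dat.melt, aes(x = ped.length, y = value, color = variable)) +
  geom_point(size = 2) +
  theme_bw() +
  facet_wrap(~spp, scales = "free_y")
```

Printing p should show two colours in the sp1 to sp3 facets and a single colour in the rest.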

Related

What is the purpose of using facet_grid(variable ~ .) instead of just using facet_wrap?

So I'm teaching myself R right now using this online resource: "https://r4ds.had.co.nz/data-visualisation.html#facets"
This particular section is going over the use of facet_wrap and facet_grid. It's clear to me that facet_grid is primarily used when wanting to visualize a plot along two additional dimensions, rather than just one. What I don't understand is why you can use facet_grid(.~variable) or facet_grid(variable~.) to basically achieve the same result as facet_wrap. Putting a "." in place of a variable results in just not faceting along the row or column dimension, or in other words showing 1 additional variable just as facet_wrap would do.
If anyone can shed some light on this, thank you!
If you use facet_grid, the facets will always be in one row/column. They will never wrap to make a rectangle. But really if you just have one variable with few levels, it doesn't much matter.
You can also see that facet_grid(.~variable) and facet_grid(variable~.) will put the facet labels in different places (row headings vs column headings):
mg <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()
mg + facet_grid(vs~ .) + labs(title="facet_grid(vs~ .)")
mg + facet_grid(.~ vs) + labs(title="facet_grid(.~ vs)")
So in the most simple of cases, there's nothing that different between them. The main reason to use facet_grid is to have a single, common axis for all facets so you can easily scan across all panels to make a direct comparison of data.
Actually, the same result is not produced all the time...
The number of facets which appear across the graphics pane is fixed with facet_grid (always the number of unique values in the variable), whereas facet_wrap, like its name suggests, wraps the facets around the graphics pane. In this way the functions only result in the same graph when the number of facets produced is small.
Both facet_grid and facet_wrap take their arguments in the form row~columns, and nowadays we don't need to use the dot with facet_grid.
In order to compare their differences let's add a new variable with 8 unique values to the mtcars data set:
library(tidyverse)
mtcars$example <- rep(1:8, length.out = 32)
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_grid(~example, labeller = label_both)
Which results in a cluttered plot:
Compared to:
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_wrap(~example, labeller = label_both)
Which results in:
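The difference can also be checked without the images. The following is a rough sketch that inspects the panel layout computed by ggplot_build(); the names mtcars2, grid_layout and wrap_layout are made up here, and the $layout$layout element is an internal structure that may differ between ggplot2 versions.

```r
library(ggplot2)

mtcars2 <- mtcars
mtcars2$example <- rep(1:8, length.out = 32)
base <- ggplot(mtcars2, aes(x = mpg, y = wt)) + geom_point()

# Each layout table has one row per panel, with ROW and COL positions
grid_layout <- ggplot_build(base + facet_grid(~example))$layout$layout
wrap_layout <- ggplot_build(base + facet_wrap(~example))$layout$layout

# facet_grid keeps all eight panels in a single row ...
c(rows = max(grid_layout$ROW), cols = max(grid_layout$COL))
# ... while facet_wrap spreads the same panels over several rows
c(rows = max(wrap_layout$ROW), cols = max(wrap_layout$COL))
```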

Difference between putting aes(x=…) in ggplot() or in geom()

What is the difference between putting aes(x=…) in ggplot() or in geom() (e.g. geom_histogram() below):
1. in the geom():
ggplot(diamonds) +
  geom_histogram(binwidth=500, aes(x=diamonds$price)) +
  xlab("Diamond Price U$") + ylab("Frequency") +
  ggtitle("Diamond Price Distribution")
2. in ggplot():
ggplot(diamonds, aes(x=diamonds$price)) +
  geom_histogram(binwidth=500) +
  xlab("Price") + ylab("Frequency") +
  ggtitle("Diamonds Price distribution")
Whether you put x = price in the original ggplot() call or in a specific geom only really matters if you have multiple geoms with different mappings. The mapping you specify in the ggplot() call will be applied to all geoms, so it's often best to put the mapping in the top level like that, if only to save you having to type it out again for each individual geom. Specify mappings in the individual geoms if they only apply to that specific geom.
Also note that it should just be aes(x = price), not aes(x = diamonds$price). ggplot knows to look in the dataframe you're using as your data argument. If you pass a vector manually like diamonds$price you might mess up faceting or grouping in a more complex plot.
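A small sketch of both points, assuming a reasonably recent ggplot2 (the object names are made up for illustration). The first two plots produce the same bins wherever the mapping is placed; the third shows one way a `$` mapping can go wrong, namely when a layer brings its own, shorter data.

```r
library(ggplot2)

# Same mapping, once at the top level and once inside the geom
p_top <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)
p_layer <- ggplot(diamonds) +
  geom_histogram(aes(x = price), binwidth = 500)

# Both plots compute identical bin counts
same_bins <- identical(layer_data(p_top)$count, layer_data(p_layer)$count)

# With `$`, the full-length vector is used even though this layer only
# has the "Ideal" rows, so building the plot fails
p_bad <- ggplot(diamonds, aes(x = diamonds$price)) +
  geom_histogram(data = subset(diamonds, cut == "Ideal"), binwidth = 500)
bad <- tryCatch(ggplot_build(p_bad), error = function(e) e)
inherits(bad, "error")
```

With the bare column name, aes(x = price), the third plot would build fine, because the mapping is looked up in the layer's own data.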

Strings in ggplot x-axis

I'm trying to create a graph in R like this:
I have three columns (online, offline and routes). However, when I add the following code:
library(ggplot2)
ggplot(coefroute, aes(routes,offline)) + geom_line()
I get the following message:
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
sample of coefroute:
routes online offline
(Intercept) 210.4372 257.215
route10 7.543 30.0182
route100 18.3794 1.5313
route11 38.6537 78.8655
route12 66.501 94.8838
route13 -22.2391 -25.8448
route14 24.3652 177.7728
route15 48.5464 51.126 ...
routes: char, online and offline: num
Can anybody help me with putting strings in x-axis in R?
Thank you!
In the absence of sample data, here's some toy data that has the same structure as yours:
coefroute <- data.frame(routes = c("A","B","C","D","E"),
online = c(21,26,30,15,20),
offline = c(15,20,7,12,15))
To replicate your example graph in ggplot2 you would want your data in a long format, so that you can group on offline/online. See more here: Plotting multiple lines from a data frame with ggplot2 and http://ggplot2.tidyverse.org/reference/aes.html.
You can rearrange your data into a long format very easily with lots of different functions or packages, but a standard approach is to use gather from tidyr and group your series for online and offline into something called, say, status or whatever you want.
library(tidyr)
coefroute <- gather(coefroute, key = status, value = coef, online:offline)
Then you can plot this easily in ggplot:
library(ggplot2)
ggplot(coefroute, aes(x = routes, y = coef, group = status, colour = status)) +
  geom_line() + scale_x_discrete()
That should create something like your example graph. You may want to modify the colours, captions, etc. There's lots of documentation about these things that's easy enough to find. I've added scale_x_discrete() here so that ggplot knows to treat your x variable as a discrete one.
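As a side note, gather() still works but tidyr now documents it as superseded by pivot_longer(). A minimal sketch of the same reshaping with pivot_longer(), assuming tidyr 1.0.0 or later; the name coef_long is made up here, and the row order differs from what gather() returns:

```r
library(tidyr)
library(ggplot2)

# Same toy data as above, in its original wide format
coefroute <- data.frame(routes = c("A", "B", "C", "D", "E"),
                        online = c(21, 26, 30, 15, 20),
                        offline = c(15, 20, 7, 12, 15))

# Stack online/offline into one value column plus a status label
coef_long <- pivot_longer(coefroute,
                          cols = c(online, offline),
                          names_to = "status",
                          values_to = "coef")

p <- ggplot(coef_long, aes(x = routes, y = coef, group = status, colour = status)) +
  geom_line()
```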
Secondly, my suspicion is that a line plot may be less effective than other geoms at communicating what you're trying to show here. I would perhaps use geom_bar(stat = "identity", position = "dodge") in place of geom_line. That would draw a vertical bar for each coefficient, with the offline and online coefficients side by side.
ggplot(coefroute, aes(x = routes, y = coef, group = status, fill = status)) +
  geom_bar(stat = "identity", position = "dodge") + scale_x_discrete()
There are two approaches:
Plotting the data in wide format (quick & dirty, not recommended)
Plotting the data after reshaping from wide to long format (as shown by dshkol, but using a different approach)
Plotting the data in wide format
# using dshkol's toy data
coefroute <- data.frame(routes = c("A","B","C","D","E"),
online = c(21,26,30,15,20),
offline = c(15,20,7,12,15))
library(ggplot2)
# plotting data in wide format (not recommended)
ggplot(coefroute, aes(x = routes, group = 1L)) +
geom_line(aes(y = online), colour = "blue") +
geom_line(aes(y = offline), colour = "orange")
This approach has several drawbacks. Each variable needs its own call to geom_line() and there is no legend.
Plotting reshaped data
For reshaping, melt() is used, which is available from the reshape2 package (the predecessor of the tidyr/dplyr packages) or in a faster implementation from the data.table package.
# convert to a data.table first, since data.table::melt() expects one
ggplot(data.table::melt(data.table::as.data.table(coefroute), id.vars = "routes"),
       aes(x = routes, y = value, group = variable, colour = variable)) +
  geom_line()
Note that in both cases the group aesthetic has to be specified because the x-axis is discrete. This tells ggplot to consider the data points belonging to one series despite the discrete x values.

compare boxplots with a single value

I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that these variables are too many (about a dozen) to use a single boxplot.
Additionally, the levels are too different to use one plot. I need to use facets to make things more organised:
However with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only this green line, which it is in the first plot. Any ideas why it changes? Is there any good workaround? I tried the options of facet_wrap/facet_grid, but nothing there delivered the right result.
I also tried combining a bar plot with bm and three empty categories with the boxplot. But firstly it looked terrible and secondly it got similarly screwed up in the faceting. Basically any workaround would help.
Below the code to create the minimal example displayed here:
# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)
# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)
# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
geom_boxplot() + scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))
# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")
The problem lies only in the use of test_dat$y inside the color aes. Never use $ in aes; refer to the column by its bare name (here as.factor(y)) so that ggplot looks it up in the data for each layer and facet.
Anyway, I think your plot would improve if you used a geom_hline for the benchmark, instead of hacking in a single-value boxplot:
library(ggplot2)
library(RColorBrewer)
ggplot(tidyr::gather(df,key,value,-y)) +
geom_boxplot(aes(x=key, y=value, color=as.factor(y))) +
geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', size = 1) +
scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1")) +
facet_wrap(~key,scales = "free",drop = FALSE) +
theme(legend.position = "bottom")
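The benchmark line ends up only in its own facet because the benchmark data frame carries the faceting column key. A minimal sketch that checks this, using made-up long-format data (the data frame long is an assumption standing in for the gathered df; the benchmark values are the ones from the question):

```r
library(ggplot2)

# Hypothetical long-format data standing in for tidyr::gather(df, key, value, -y)
set.seed(42)
long <- data.frame(
  key = rep(c("X1", "X2"), each = 20),
  y = rep(rep(c(-1, 1), each = 10), times = 2),
  value = c(rnorm(20), rnorm(20, mean = 20))
)
benchmark <- data.frame(key = c("X1", "X2"),
                        value = c(-0.216936, 20.526312))

p <- ggplot(long) +
  geom_boxplot(aes(x = key, y = value, color = as.factor(y))) +
  geom_hline(data = benchmark, aes(yintercept = value)) +
  facet_wrap(~key, scales = "free")

# Layer 2 is the hline layer: one row per benchmark value,
# each assigned to a different panel
hline_data <- layer_data(p, 2)
hline_data[, c("PANEL", "yintercept")]
```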

ggplot2 stacked barplots, formatting, and grids

In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are a couple of things I need to accomplish, and I don't know where to start.
I would like the bars not to be placed at its corresponding nseqs value, rather placed next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried to set scales and size for facet_grid to free_x, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
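To see why the factor conversion puts the bars next to each other in ascending order, here is a small sketch on made-up data. The contents of fdata (including the taxonA and taxonB columns) are assumptions for illustration, not the asker's real data.

```r
library(ggplot2)
library(reshape2)

# Hypothetical stand-in for fdata: two taxa proportions per sample
fdata <- data.frame(
  group = c("a", "a", "b", "b"),
  nseqs = c(120, 15, 300, 42),
  sample = paste0("s", 1:4),
  taxonA = c(0.6, 0.3, 0.5, 0.8),
  taxonB = c(0.4, 0.7, 0.5, 0.2)
)
ggfdata <- melt(fdata, id.vars = c("group", "nseqs", "sample"))

# factor() sorts the numeric values before turning them into labels,
# so the levels come out in ascending nseqs order
ggfdata$nseqs <- factor(ggfdata$nseqs)
levels(ggfdata$nseqs)

# Discrete x: bars sit side by side; free_x drops unused levels per facet
p <- ggplot(ggfdata, aes(x = nseqs, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_wrap(~group, scales = "free_x")
```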

Resources