'Merging' two plots using ggplot2 and R - r

Given the following dataset:
data = cbind(1:10,c('open','reopen','closed'),letters[1:3],1:10)
data = rbind(data,cbind(1:10,c('open','closed','reopen'),letters[1:3],5:10))
data = rbind(data,cbind(1:10,c('closed','open','reopen'),letters[1:3],3:10))
data = data.frame(data);
colnames(data) <- c("id","status","author","when")
I'd like to get a plot similar to the following:
ggplot(data, aes(when,id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id,colour = author))
But, as such I get a single legend by 'author' with the status and author values. How can I get the same result but with a legend for author and other for status? My rationale is that I want to layer two plots of the same dataset on top of each other.

I don't think you can have different color scales / legends for one ggplot. You could hack something together (see this question for legend hacking), but in this case where one of your geom's is point, you could just use fill and one of the point options that are filled in.
ggplot(data, aes(when,id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id, fill = author),
shape = 21, color = NA, size = 4)
Here the colors used are the same for each, but you can edit the color or fill scales individually, e.g., adding
scale_fill_brewer(type = "qual") +
scale_color_brewer(type = "qual", palette = 2)
I do agree with AndyClifton that using color in two ways will be hard to distinguish. You could also experiment with line types, point shapes, or even plotting with geom_text using a word, a letter, or a number as a label instead of points. You say you have more than 6 values for author, but it will be very difficult to distinguish more than 6 colors for author, especially when color is also being used for status.

Let's take your data. First you should be aware that you have a problem that your when and id column is a string, so you are plotting 1, 10, 2, 3, ... not 1,...9,10. We can fix that:
data$when.num <-as.numeric(as.character(data$when))
data$id.num <-as.numeric(as.character(data$id))
Then we'll plot it but use different shapes to get two different legends:
require(ggplot2)
p <- ggplot(data, aes(x = when.num, y = id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id,shape = author))
print(p)
And you get this:
I think this is much clearer than using coloured points for the author, but this is a question of taste.

Related

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what is your desidered output. Maybe, if I well understood your question I Think that you want somthing like this.
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
#Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason why your code is not working is that you mix the layer call and or do not really specify (and even mix) what is the scatter and line visualisation you want.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting antisocal data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antiscoial data set using the scatter, i.e. geom_point() layer.
Note that I put Race as a factor to have a categorical colour scheme otherwise you might end up with a continous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I save showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot-layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend to keep the data= and aes() call in the geom_xxxx. This avoids confustion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note, that for the grouped boxplot, you also have to treat your integer variable YOI, I coerced into a categorical factor. Boxplot works with fill for the body (colour sets only the outer line). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary - in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!

Legend type when plotting different geometries [duplicate]

Let's say I don't need a 'proper' variable mapping but still would like to have legend keys to help the chart understanding. My actual data are similar to the following df
df <- data.frame(id = 1:10, line = rnorm(10), points = rnorm(10))
library(ggplot2)
ggplot(df) +
geom_line(aes(id, line, colour = "line")) +
geom_point(aes(id, points, colour = "points"))
Basically, I would like the legend key relative to points to be.. just a point, without the line in the middle. I got close to that with this:
library(reshape2)
df <- melt(df, id.vars="id")
ggplot() +
geom_point(aes(id, value, shape = variable), df[df$variable=="points",]) +
geom_line(aes(id, value, colour = variable), df[df$variable=="line",])
but it defines two separate legends. Fixing the second code (and having to reshape my data) would be fine too, but I'd prefer a way (if any) to manually change any legend key (and keep using the first approch). Thanks!
EDIT :
thanks #alexwhan you refreshed my memory about variable mapping. However, the easiest way I've got so far is still the following (very bad hack!):
df <- data.frame(id = 1:10, line = rnorm(10), points = rnorm(10))
ggplot(df) +
geom_line(aes(id, line, colour = "line")) +
geom_point(aes(id, points, shape = "points")) +
theme(legend.title=element_blank())
which is just hiding the title of the two different legends.
Other ideas more than welcome!!!
You can use override.aes= inside guides() function to change default appearance of legend. In this case your guide is color= and then you should set shape=c(NA,16) to remove shape for line and then linetype=c(1,0) to remove line from point.
ggplot(df) +
geom_line(aes(id, line, colour = "line")) +
geom_point(aes(id, points, colour = "points"))+
guides(color=guide_legend(override.aes=list(shape=c(NA,16),linetype=c(1,0))))
I am not aware of any way to do this easily, but you can do a hack version like this (using your melted dataframe):
p <- ggplot(df.m, aes(id, value)) +
geom_line(aes(colour = variable, linetype = variable)) + scale_linetype_manual(values = c(1,0)) +
geom_point(aes(colour = variable, alpha = variable)) + scale_alpha_manual(values = c(0,1))
The key is that you need to get the mapping right to have it displayed correctly in the legend. In this case, getting it 'right', means fooling it to look the way you want it to. It's probably worth pointing out this only works because you can set linetype to blank (0) and then use the alpha scale for the points. You can't use alpha for both, because it will only take one scale.

Plotting with secondary axis

I am trying to overlay a bar graph (primary axis) and line (secondary axis), but I keep getting an error that I don't understand how to fix. I have tried to follow multiple examples from other questions, but I'm still not getting the result I need.
Here's my code:
ggplot(data = MRIP, aes(x = Length_mm)) +
geom_bar(aes(y = Perc.of.Fish), stat="identity", width = 10, fill = "black") +
geom_line(aes(y = Landings), stat = "identity", size = 2, color = "red") +
scale_y_continuous(name = "Percentage", sec.axis = sec_axis (~./Landings, name = "Landings"))
How do I fix this error: "Error in f(...) : object 'Landings' not found"?
Try this:
coef <- 4000
MRIP %>%
mutate(LandingsPlot=Landings/coef) %>%
ggplot(aes(x = Length_mm)) +
geom_col(aes(y = Perc.of.Fish), width = 10, fill = "black") +
geom_line(aes(y = LandingsPlot), size = 2, color = "red") +
scale_y_continuous(
name = "Percentage",
sec.axis = sec_axis (trans= ~.*coef, name = "Landings")
)
Giving
Why does this work? The scale factor used to define the secondary axis cannot be part of the input data.frame - because if it were, it could potentially vary across rows (as it does here). That would mean you had a separate scale for each row of the input data.frame. That doesn't make sense. So, you scale the secondary variable to take a similar range to that of the primary variable. I chose coef <- 4000 by eye. The exact value doesn't matter, so long as it's sensible.
Having divided by the scale factor to obtain the plotted values, you need to multiply by the scale factor in the transformation in order to get the correct labels on the secondary axis.
Thank you for providing a good MWE. But next time, for extra marks, please post the results of dput() in your question rather than in the comments...
Update
To answer OP's follow up question in the comments: legends are linked to aesthetics. So to get a legend, move the attribute that you want to label inside aes(). You can then define and customise the legend using the appropriate scale_<aesthetic>_<type>. However it's worth noting that if you write, say, aes(colour="black") then "black" is just a character string. It doesn't define a colour. (Using the standard defaults, it will in fact appear as a slightly pinkish red, labelled "black"!) This can be confusing, so it might be a good idea to use arbitrary strings like "a", "b" and "c" (or "Landings" and "Percentage") in the aesthetics. Anyway...
coef <- 4000
#Note fill and color have moved inside aes()
MRIP %>%
mutate(LandingsPlot=Landings/coef) %>%
ggplot(aes(x = Length_mm)) +
geom_col(aes(y = Perc.of.Fish, fill = "black"), width = 10,) +
geom_line(aes(y = LandingsPlot, color = "red"), size = 2) +
scale_y_continuous(
name = "Percentage",
sec.axis = sec_axis (trans= ~.*coef, name = "Landings")
) +
scale_color_manual(values=c("red"), labels=c("Landings"), name=" ") +
scale_fill_manual(values=c("black"), labels=c("Percentage"), name=" ")
Gives

How to label stacked histogram in ggplot

I am trying to add corresponding labels to the color in the bar in a histogram. Here is a reproducible code.
ggplot(aes(displ),data =mpg) + geom_histogram(aes(fill=class),binwidth = 1,col="black")
This code gives a histogram and give different colors for the car "class" for the histogram bars. But is there any way I can add the labels of the "class" inside corresponding colors in the graph?
The inbuilt functions geom_histogram and stat_bin are perfect for quickly building plots in ggplot. However, if you are looking to do more advanced styling it is often required to create the data before you build the plot. In your case you have overlapping labels which are visually messy.
The following codes builds a binned frequency table for the dataframe:
# Subset data
mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)
melt(table(mpg_df[, c("displ", "class")]))
# Bin Data
breaks <- 1
cuts <- seq(0.5, 8, breaks)
mpg_df$bin <- .bincode(mpg_df$displ, cuts)
# Count the data
mpg_df <- ddply(mpg_df, .(mpg_df$class, mpg_df$bin), nrow)
names(mpg_df) <- c("class", "bin", "Freq")
You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
position=position_stack(vjust=0.5), colour="black")
I don't think it makes a lot of sense duplicating the labels, but it may be more useful showing the frequency of each group:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
position=position_stack(vjust=0.5), colour="black")
Update
I realised you can actually selectively filter a label using the internal ggplot function ..count... No need to preformat the data!
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", position=position_stack(vjust=0.5), aes(label=ifelse(..count..>4, ..count.., "")))
This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)
This second approach will only work if you want to label the dataset with the counts. If you want to label the dataset by the class or another parameter, you will have to prebuild the data frame using the first method.
Looking at the examples from the other stackoverflow links you shared, all you need to do is change the vjust parameter.
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", vjust=1.5)
That said, it looks like you have other issues. Namely, the labels stack on top of each other because there aren't many observations at each point. Instead I'd just let people use the legend to read the graph.

Best way to calculate number of facets in geom_hline/_vline

When I combine geom_vline() with facet_grid() like so:
DATA <- data.frame(x = 1:6,y = 1:6, f = rep(letters[1:2],3))
ggplot(DATA,aes(x = x,y = y)) +
geom_point() +
facet_grid(f~.) +
geom_vline(xintercept = 2:3,
colour =c("goldenrod3","dodgerblue3"))
I get an error message stating Error: Aesthetics must be either length 1 or the same as the data (4): colour because there are two lines in each facet and there are two facets. One way to get around this is to use rep(c("goldenrod3","dodgerblue3"),2), but this requires that every time I change the faceting variables, I also have to calculate the number of facets and replace the magic number (2) in the call to rep(), which makes re-using ggplot code so much less nimble.
Is there a way to get the number of facets directly from ggplot for use in this situation?
You could put the xintercept and colour info into a data.frame to pass to geom_vline and then use scale_color_identity.
ggplot(DATA, aes(x = x, y = y)) +
geom_point() +
facet_grid(f~.) +
geom_vline(data = data.frame(xintercept = 2:3,
colour = c("goldenrod3","dodgerblue3") ),
aes(xintercept = xintercept, color = colour) ) +
scale_color_identity()
This side-steps the issue of figuring out the number of facets, although that could be done by pulling out the number of unique values in the faceting variable with something like length(unique(DATA$f)).

Resources