Discrete values and geom_ribbon and geom_lines + problems with "discrete" scale - r

I have got a file like this one:
Month,Open,Closed
2017-08,53,38
2017-09,102,85
2017-10,58,38
2017-11,51,42
2017-12,32,24
2018-01,24,30
2018-02,56,46
2018-03,82,74
2018-04,95,89
2018-05,16,86
I want to plot both lines, and also shade the difference between them. So this works:
library(ggplot2)
library(ggthemes)  # for theme_tufte()
ggplot() +
  geom_line(data = issues.m, aes(x = Month, y = Open, group = 1)) +
  geom_line(data = issues.m, aes(x = Month, y = Closed, group = 1)) +
  geom_ribbon(data = issues.m, aes(x = Month, ymin = Closed, ymax = Open, color = Open - Closed)) +
  theme_tufte() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
This produces a plot (image not included). The first problem is that I would like the whole area between the two lines shaded, not a single line. How can I do that?
But I would also like to color the two lines. If I add a color to one of them:
ggplot() +
  geom_line(data = issues.m, aes(x = Month, y = Open, group = 1, color = 'open')) +
  geom_line(data = issues.m, aes(x = Month, y = Closed, group = 1)) +
  geom_ribbon(data = issues.m, aes(x = Month, ymin = Closed, ymax = Open, color = Open - Closed)) +
  theme_tufte() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
I get the error:
Error: Continuous value supplied to discrete scale
So, can this be done at all? And is it possible to change the colour palette of the ribbon too?

The error occurs because you map Open - Closed onto colour, and that is a continuous variable (the difference between the two values for each month). But you also assigned "open" to colour inside aes() in one of your geom_line() calls, which is a discrete value. That means you are trying to put both continuous and discrete values on the same colour scale, and that doesn't work.
If all you need is two colours, one for each line, you can do it in one of two ways; the second fits better with the ggplot/tidyverse way of doing things.
First, I turned your dates into Date objects to clean up the x-axis and avoid rotating the labels; feel free to experiment with the date breaks in scale_x_date(). This also takes care of the shading problem: because Month is a character column, ggplot treats the x-axis as discrete and puts each month in its own group, so the ribbon cannot stretch from one month to the next. With a continuous date axis, all rows belong to a single group.
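If you prefer to keep Month as text, a minimal sketch (the values are the ones from the question, typed in so the snippet runs on its own) suggests that setting group = 1 for the whole plot is enough to let the ribbon span all months. It also builds the ribbon without group = 1 so you can compare how many groups ggplot creates in each case. theme_tufte() is left out to avoid the ggthemes dependency.

```r
library(ggplot2)

# The data from the question, typed in so the example is self-contained
df <- data.frame(
  Month  = c("2017-08", "2017-09", "2017-10", "2017-11", "2017-12",
             "2018-01", "2018-02", "2018-03", "2018-04", "2018-05"),
  Open   = c(53, 102, 58, 51, 32, 24, 56, 82, 95, 16),
  Closed = c(38, 85, 38, 42, 24, 30, 46, 74, 89, 86)
)

# Month is character, so the x axis is discrete. group = 1 puts every row
# into one group, so the ribbon (and both lines) run across all months.
p_grouped <- ggplot(df, aes(x = Month, group = 1)) +
  geom_ribbon(aes(ymin = Closed, ymax = Open), fill = "lightblue2") +
  geom_line(aes(y = Open), colour = "green3") +
  geom_line(aes(y = Closed), colour = "red")

# The same ribbon without an explicit group, for comparison:
# here each month becomes its own group
p_ungrouped <- ggplot(df, aes(x = Month)) +
  geom_ribbon(aes(ymin = Closed, ymax = Open), fill = "lightblue2")

# ggplot_build() returns the computed layer data, including the group column
ribbon_grouped   <- ggplot_build(p_grouped)$data[[1]]
ribbon_ungrouped <- ggplot_build(p_ungrouped)$data[[1]]
```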
The less "tidy" way is to just make two geom_lines, one for Open and one for Closed, and assign a color to each.
library(tidyverse)
df_dated <- df %>%
  mutate(month2 = sprintf("%s-01", Month) %>% lubridate::ymd())
ggplot(df_dated, aes(x = month2)) +
  geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
  geom_line(aes(y = Open), color = "green3") +
  geom_line(aes(y = Closed), color = "red") +
  ggthemes::theme_tufte()
But the more idiomatically "tidy" way is to make a long-shaped version of the data, so that you can map a variable (in this case, whether an observation is the opening or the closing value) onto an aesthetic such as colour. This lets you set a scale for the colours instead of hard-coding them in each geom_line(). It also gives you a legend; if you don't want one, you can remove it in the theme with theme(legend.position = "none").
df_date_long <- df_dated %>%
  gather(key, value, -month2, -Month)
ggplot(df_dated, aes(x = month2)) +
  geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
  geom_line(aes(y = value, color = key), data = df_date_long) +
  scale_color_manual(values = c(Open = "green3", Closed = "red")) +
  ggthemes::theme_tufte()
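tidyr's gather() still works but has been superseded by pivot_longer(). Here is a sketch of the same long-format plot using pivot_longer(), assuming the question's data typed in as df. It uses base as.Date() instead of lubridate::ymd() to turn "2017-08" into the first day of that month (either should work), and it leaves out theme_tufte() to keep the dependencies small.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(
  Month  = c("2017-08", "2017-09", "2017-10", "2017-11", "2017-12",
             "2018-01", "2018-02", "2018-03", "2018-04", "2018-05"),
  Open   = c(53, 102, 58, 51, 32, 24, 56, 82, 95, 16),
  Closed = c(38, 85, 38, 42, 24, 30, 46, 74, 89, 86)
)

# "2017-08" -> Date for 2017-08-01, giving a continuous x axis
df$month2 <- as.Date(paste0(df$Month, "-01"))

# One row per month and series: key is "Open" or "Closed", value is the count
df_long <- pivot_longer(df, cols = c(Open, Closed),
                        names_to = "key", values_to = "value")

p <- ggplot(df, aes(x = month2)) +
  geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
  geom_line(aes(y = value, colour = key), data = df_long) +
  scale_colour_manual(values = c(Open = "green3", Closed = "red"))

# Computed data of the line layer (layer 2), to check the colours used
line_data <- ggplot_build(p)$data[[2]]
```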

Related

R code of scatter plot for three variables

Hi, I am trying to make a scatter plot of three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
  geom_point(colour = "Black", size = 2) +
  geom_line(data = Table_1, aes(YOI, ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is, but if I understood your question correctly, I think you want something like this:
g_Antisocial <- Antisocial %>%
  group_by(Race) %>%
  summarise(ASB = mean(ASB),
            YOI = mean(YOI))
Antisocial %>%
  ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
  geom_point(alpha = .4) +
  geom_point(data = g_Antisocial, size = 4) +
  theme_bw() +
  guides(color = guide_legend("Race"), shape = guide_legend("Race"))
This produces the following output (plot not included).
@Maninder: there are a few things you need to look at.
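First of all: ggplot() implements the grammar of graphics, which works with layers. You can add layers that use different data (frames) for the different geoms you want to plot.
Your code does not work because it mixes up the layers: it does not clearly specify which data and aesthetics belong to the scatter visualisation and which to the line. Also, group_by() is a dplyr function for data frames and has no meaning inside aes(); to split the plot by Race, map Race to an aesthetic such as colour or group. Note that Table_1 is summarised by YOI only, so it has no Race column to group by in the first place.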
(I) Use ggplot() + geom_point() for a scatter plot
The very first layer is ggplot(). Think of it as your drawing canvas.
You then say you want a scatter plot, but you never add one for your raw data: your geom_point() inherits Table_1 from ggplot() and therefore only plots the three means.
For example:
# plotting the Antisocial data set
ggplot() +
  geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
This plots your Antisocial data set as a scatter plot, i.e. with a geom_point() layer.
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you end up with a continuous palette.
(II) line plot
Analogously, for the line plot you would use:
# plotting Table_1
ggplot() +
  geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I skip showing the line plot here.
(III) combining different layers
# putting both together
ggplot() +
  geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
  geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
  ## this is to set the legend title and have a nice(r) name in your colour legend
  labs(colour = "Race")
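This yields a plot of the raw points coloured by Race, with the overall mean drawn as a line (image not included).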
That should explain how ggplot layering works. Keep an eye on the data sets and geoms that you want to use. Before relying on inheritance of aes() from the ggplot() call, I recommend keeping the data = and aes() arguments in each geom_xxxx() call. This avoids confusion.
You may want to experiment with geom_jitter() instead of geom_point() to get a better presentation of your data set. The "few" points you see are really many data points plotted on top of each other at the same positions (overplotting).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully understand what you mean by that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and making it dashed. You can play around with the colours etc. to get the look you are aiming for.
Note that for the grouped boxplot you also have to treat the integer variable YOI as categorical, so I coerced it to a factor. geom_boxplot() uses fill for the body of the box (colour only sets the outline). In this setup you also need to supply a group value to geom_line(), because with a factor on the x-axis each level would otherwise form its own group and no line would be drawn. I just set it to 1, which is arbitrary; in other contexts you can assign another variable here.
ggplot() +
  geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
  geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1),
            size = 2, linetype = "dashed") +
  labs(x = "YOI", fill = "Race")
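If "mean ASB vs YOI grouped by Race" means one mean per combination of Race and YOI, a sketch along these lines may be closer. The real Antisocial.csv is not available here, so the data frame below is randomly generated and purely hypothetical; only the column names Race, YOI and ASB come from the question. The key difference from Table_1 is that the summary is grouped by both YOI and Race, so there is a Race column to map to colour.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for Antisocial.csv (random values, same column names)
set.seed(1)
Antisocial <- data.frame(
  Race = rep(c(0, 1), each = 30),
  YOI  = rep(c(90, 92, 94), times = 20),
  ASB  = round(runif(60, min = 1, max = 3), 2)
)

# One mean per YOI and Race combination (3 years x 2 groups = 6 rows)
asb_means <- Antisocial %>%
  group_by(YOI, Race) %>%
  summarise(ASB_mean = mean(ASB), .groups = "drop")

# factor(Race) gives a discrete colour scale and one line per Race
p <- ggplot(asb_means, aes(x = YOI, y = ASB_mean, colour = factor(Race))) +
  geom_point() +
  geom_line() +
  labs(colour = "Race")

# Computed data of the line layer, to check that two lines are drawn
line_layer <- ggplot_build(p)$data[[2]]
```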
Hope this gets you going!

GGPlot is returning different colours to what I specify

I'm somewhat new to R and am having a really strange issue while trying to produce the following plot:
worst_death <- df_clean %>%
  group_by(event_cat) %>%
  summarise(Deaths = sum(FATALITIES),
            Injuries = sum(INJURIES)) %>%
  ggplot() +
  geom_segment(aes(x = reorder(event_cat, Injuries), xend = reorder(event_cat, Injuries), y = Deaths, yend = Injuries, color = "black")) +
  geom_point(aes(x = reorder(event_cat, Injuries), y = Deaths, color = "yellow", size = 1)) +
  geom_point(aes(x = reorder(event_cat, Injuries), y = Injuries, color = "white", size = 1)) +
  coord_flip() +
  theme_ipsum() +
  theme(legend.position = "none",) +
  xlab("Event Type") +
  ylab("Human Impact")
worst_death
The graph renders fine, except that the colours and other aesthetic options (size etc.) are not what I specified.
Strangely enough, the colours are red, blue and green rather than yellow, black and white.
Does anyone know why this might be happening?
Thanks
I can't test this without your data, but the following should work for you:
worst_death <- df_clean %>%
  group_by(event_cat) %>%
  summarise(Deaths = sum(FATALITIES),
            Injuries = sum(INJURIES)) %>%
  ggplot(aes(x = reorder(event_cat, Injuries), y = Deaths)) +
  geom_segment(aes(xend = reorder(event_cat, Injuries),
                   y = Deaths, yend = Injuries)) +
  geom_point(color = "yellow", size = 1) +
  geom_point(aes(y = Injuries), color = "white", size = 1) +
  coord_flip() +
  theme_ipsum() +
  theme(legend.position = "none") +
  xlab("Event Type") +
  ylab("Human Impact")
worst_death
There are a couple of points to note:
When you use a character string for color inside aes(), ggplot treats it as a data value (a factor level with that label) and maps it through the default colour scale; it does not interpret it as a literal colour. Your three strings "black", "white" and "yellow" become three levels, which the default palette colours (in alphabetical order) red, green and blue. If you had a legend in your plot, the key would show those labels against red, green and blue dots. You can either add + scale_color_identity() to your plot if you want these values to be interpreted as literal colours or, more commonly, just move color = outside of aes(), where it is interpreted as an actual colour assignment. This is the easiest way to do it if you don't want a legend.
You should probably bring size = outside the aes call too, effectively for the same reason. ggplot is mapping the number 1 to its default size scale rather than literally making the points size 1.
The geom_segment is black by default, so it doesn't need a color assignment.
You can save some typing (and hence reduce the risk of bugs and make maintenance easier) if you include the default x and y aesthetics in the original ggplot() call. These are inherited by any subsequent geoms, but can be overridden if required.
When posting a question on SO, please include data as well as code, otherwise no-one can reproduce your problem or test / demonstrate possible solutions. The easiest way to do this in your case is to copy and paste the output of dput(df_clean) into your question.
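A small self-contained check of the first point, using a made-up three-row data frame: it builds three versions of the same plot and pulls out the colour ggplot actually uses in each (ggplot_build() returns the computed layer data).

```r
library(ggplot2)

# Made-up data, only used to compare three ways of specifying colour
d <- data.frame(x = 1:3, y = c(2, 4, 3))

# 1. String inside aes(): treated as a data value and run through the default hue scale
p_mapped <- ggplot(d, aes(x, y)) + geom_point(aes(colour = "yellow"))

# 2. Same mapping, but scale_colour_identity() uses the value as a literal colour
p_identity <- p_mapped + scale_colour_identity()

# 3. Colour set outside aes(): a fixed, literal colour and no legend
p_set <- ggplot(d, aes(x, y)) + geom_point(colour = "yellow")

mapped_col   <- unique(ggplot_build(p_mapped)$data[[1]]$colour)
identity_col <- unique(ggplot_build(p_identity)$data[[1]]$colour)
set_col      <- unique(ggplot_build(p_set)$data[[1]]$colour)
```

With the default scale, mapped_col should be the first default hue (a pinkish red) rather than "yellow", while the other two are "yellow".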

Plotting with secondary axis

I am trying to overlay a bar graph (primary axis) and line (secondary axis), but I keep getting an error that I don't understand how to fix. I have tried to follow multiple examples from other questions, but I'm still not getting the result I need.
Here's my code:
ggplot(data = MRIP, aes(x = Length_mm)) +
  geom_bar(aes(y = Perc.of.Fish), stat = "identity", width = 10, fill = "black") +
  geom_line(aes(y = Landings), stat = "identity", size = 2, color = "red") +
  scale_y_continuous(name = "Percentage", sec.axis = sec_axis(~./Landings, name = "Landings"))
How do I fix this error: "Error in f(...) : object 'Landings' not found"?
Try this:
coef <- 4000
MRIP %>%
  mutate(LandingsPlot = Landings/coef) %>%
  ggplot(aes(x = Length_mm)) +
  geom_col(aes(y = Perc.of.Fish), width = 10, fill = "black") +
  geom_line(aes(y = LandingsPlot), size = 2, color = "red") +
  scale_y_continuous(
    name = "Percentage",
    sec.axis = sec_axis(trans = ~.*coef, name = "Landings")
  )
This gives the bar chart with the Landings line on the secondary axis (image not included).
Why does this work? The scale factor used to define the secondary axis cannot be part of the input data.frame - because if it were, it could potentially vary across rows (as it does here). That would mean you had a separate scale for each row of the input data.frame. That doesn't make sense. So, you scale the secondary variable to take a similar range to that of the primary variable. I chose coef <- 4000 by eye. The exact value doesn't matter, so long as it's sensible.
Having divided by the scale factor to obtain the plotted values, you need to multiply by the scale factor in the transformation in order to get the correct labels on the secondary axis.
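If you'd rather not pick the factor by eye, one option is to derive it from the data so that the largest value of the secondary variable lines up with the largest value of the primary one. The MRIP data frame below is a hypothetical stand-in (the real data were only posted in the comments), so the numbers are illustrative only. The transformation is passed as the first, unnamed argument of sec_axis(); recent ggplot2 releases call that argument transform rather than trans, and passing it positionally works with either.

```r
library(ggplot2)

# Hypothetical stand-in for MRIP: made-up values, same column names
MRIP <- data.frame(
  Length_mm    = seq(300, 600, by = 50),
  Perc.of.Fish = c(2, 8, 15, 25, 20, 10, 5),
  Landings     = c(5000, 30000, 60000, 100000, 80000, 40000, 10000)
)

# Scale factor chosen so max(Landings) maps onto max(Perc.of.Fish)
coef <- max(MRIP$Landings) / max(MRIP$Perc.of.Fish)

# Divide to get values on the primary scale...
MRIP$LandingsPlot <- MRIP$Landings / coef

p <- ggplot(MRIP, aes(x = Length_mm)) +
  geom_col(aes(y = Perc.of.Fish), width = 10, fill = "black") +
  geom_line(aes(y = LandingsPlot), colour = "red") +
  scale_y_continuous(
    name = "Percentage",
    # ...and multiply in the axis transformation to label the original units
    sec.axis = sec_axis(~ . * coef, name = "Landings")
  )

built <- ggplot_build(p)
```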
Thank you for providing a good MWE. But next time, for extra marks, please post the results of dput() in your question rather than in the comments...
Update
To answer OP's follow up question in the comments: legends are linked to aesthetics. So to get a legend, move the attribute that you want to label inside aes(). You can then define and customise the legend using the appropriate scale_<aesthetic>_<type>. However it's worth noting that if you write, say, aes(colour="black") then "black" is just a character string. It doesn't define a colour. (Using the standard defaults, it will in fact appear as a slightly pinkish red, labelled "black"!) This can be confusing, so it might be a good idea to use arbitrary strings like "a", "b" and "c" (or "Landings" and "Percentage") in the aesthetics. Anyway...
coef <- 4000
# Note fill and color have moved inside aes()
MRIP %>%
  mutate(LandingsPlot = Landings/coef) %>%
  ggplot(aes(x = Length_mm)) +
  geom_col(aes(y = Perc.of.Fish, fill = "black"), width = 10) +
  geom_line(aes(y = LandingsPlot, color = "red"), size = 2) +
  scale_y_continuous(
    name = "Percentage",
    sec.axis = sec_axis(trans = ~.*coef, name = "Landings")
  ) +
  scale_color_manual(values = c("red"), labels = c("Landings"), name = " ") +
  scale_fill_manual(values = c("black"), labels = c("Percentage"), name = " ")
This gives the same plot with a legend for the bars and the line (image not included).

geom_ribbon different colours - R

I am using the following code to plot my data, but I cannot manage to set the colours of geom_ribbon properly.
My graph contains 4 lines, each with a different colour. I want the geom_ribbon of each line to have the same colour as its line (with transparency, i.e. alpha).
In addition, when I change the value of alpha (e.g. from 0.1 to 0.9) I don't see any change in the transparency. Finally, an extra class is added to the legend, and I would like to remove it. Any help with this basic ggplot?
ggplot(dfmean_forplot, aes(x = image, y = value, group = ID)) +
  geom_line(aes(colour = factor(ID))) +
  scale_x_discrete(breaks = 1:21,
                   labels = c("19/1","7/2","17/2","18/3","17/4","27/4","17/5","27/5","7/6","16/6","26/6","5/7","16/7","6/8","15/8","25/8","4/9","25/9","4/10","14/10","22/11")) +
  xlab("# reference") +
  ylab("value") +
  scale_colour_discrete(name = "class") +
  ylim(0, 0.9) +
  geom_ribbon(aes(ymin = dfmean_forplot$value - dfsd_forplot$value, ymax = dfmean_forplot$value + dfsd_forplot$value, alpha = 0.3))
EDIT
What about the legend? Ideally, I would like to combine them so that there is a square for each color crossed by a line of the same color
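You need to add the fill aesthetic and take alpha outside aes(), both in geom_ribbon(). Inside aes(), alpha = 0.3 is treated as data rather than as a setting: the single constant is passed through the alpha scale, so changing it from 0.1 to 0.9 makes no visible difference, and that mapping is also what adds the extra entry to the legend. The following code should solve both problems.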
ggplot(dfmean_forplot, aes(x = image, y = value, group = ID)) +
  geom_line(aes(colour = factor(ID))) +
  scale_x_discrete(breaks = 1:21,
                   labels = c("19/1","7/2","17/2","18/3","17/4","27/4","17/5","27/5","7/6","16/6","26/6","5/7","16/7","6/8","15/8","25/8","4/9","25/9","4/10","14/10","22/11")) +
  xlab("# reference") +
  ylab("value") +
  scale_colour_discrete(name = "class") +
  ylim(0, 0.9) +
  geom_ribbon(aes(ymin = dfmean_forplot$value - dfsd_forplot$value,
                  ymax = dfmean_forplot$value + dfsd_forplot$value,
                  fill = factor(ID)), alpha = 0.3)
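On the EDIT about the legend: ggplot merges the colour and fill legends into one when they have the same title and the same labels, which gives a filled square crossed by a line of the same colour. Below is a sketch with hypothetical data (the real dfmean_forplot and dfsd_forplot are not shown). It also keeps the mean and sd in one data frame, so aes() can refer to columns by name instead of dfmean_forplot$value.

```r
library(ggplot2)

# Hypothetical data: mean and sd for two classes at five dates, in ONE data frame
df <- data.frame(
  image = rep(1:5, times = 2),
  ID    = rep(c("A", "B"), each = 5),
  value = c(0.20, 0.30, 0.40, 0.35, 0.30, 0.50, 0.55, 0.60, 0.65, 0.60),
  sd    = c(0.05, 0.04, 0.06, 0.05, 0.04, 0.03, 0.05, 0.04, 0.06, 0.05)
)

p <- ggplot(df, aes(x = image, group = ID)) +
  # fill is mapped to ID; alpha is a fixed setting outside aes()
  geom_ribbon(aes(ymin = value - sd, ymax = value + sd, fill = ID), alpha = 0.3) +
  geom_line(aes(y = value, colour = ID)) +
  # Same title for colour and fill, so the two legends are merged into one
  labs(colour = "class", fill = "class")

b <- ggplot_build(p)
ribbon <- b$data[[1]]  # computed ribbon layer
lines  <- b$data[[2]]  # computed line layer
```

Because colour and fill both use the default discrete palette for the same levels, each ribbon gets the same hue as its line.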

combined bar, line, and point geom in ggplots2- how to change fill on point and dashed line?

I've worked on this from a previous post, "Combined line & bar geoms: How to generate proper legend?", and have gotten close. Here is the code I used, which adds a line and a point geom to the bar plot:
mort12 = data.frame(
  Adj.S = c(.68,.33,.66,.62,.6,.51,.6,.76,.51,.5),
  QTL = c(1:10),
  Cum.M = c(.312,.768,NA,.854,NA,.925,.954,NA,NA,.977)
)
ggplot(data = mort12, aes(QTL)) +
  geom_bar(aes(y = Adj.S, color = "Adj.S"), stat = "identity", fill = "red") +
  geom_point(data = mort12[!is.na(mort12$Cum.M),], aes(y = Cum.M, group = 1, size = 4, color = "Cum.M")) +
  geom_line(data = mort12[!is.na(mort12$Cum.M),], aes(y = Cum.M, linetype = "dotted", group = 1))
(Note: I have some missing data for Cum.M, so to connect those points I added code to drop the missing values.)
When I run this, I get this figure (I can't post pictures here, so it's linked):
https://docs.google.com/uc?export=view&id=0B-6a5UsIa6UpZnRZTy1OZmxrY1E
I'd like to control the appearance of the line and points. But attempts to make the line dotted (linetype="dotted") did not change it, and when I attempt to change the fill of the dots (fill="white") I get this error:
Error: A continuous variable can not be mapped to shape
Any suggestions on how to alter the attributes of the line and points?
This worked for me:
ggplot(data = mort12, aes(QTL)) +
  geom_bar(aes(y = Adj.S, color = "Adj.S"), stat = "identity", fill = "white") +
  geom_point(data = mort12[!is.na(mort12$Cum.M),], aes(y = Cum.M, group = 1, size = 4, color = "Cum.M")) +
  geom_line(data = mort12[!is.na(mort12$Cum.M),], aes(y = Cum.M, group = 1), linetype = "dotted")
Apart from setting the bar fill to white, all I did was move linetype outside of aes(). Generally speaking, aesthetics that are not driven by your data should not be inside aes(). For example, size should probably not be in aes() either.
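A sketch along the same lines, using the question's mort12 data, that also shows how to get white-filled points: fill only has an effect for point shapes 21 to 25, which have a separate border (colour) and interior (fill). Here linetype, fill, shape and size are all fixed settings outside aes(), while the question's colour mappings are kept for the legend. geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

mort12 <- data.frame(
  Adj.S = c(.68, .33, .66, .62, .6, .51, .6, .76, .51, .5),
  QTL   = 1:10,
  Cum.M = c(.312, .768, NA, .854, NA, .925, .954, NA, NA, .977)
)
# Rows with a Cum.M value, so the line connects across the gaps
cum <- mort12[!is.na(mort12$Cum.M), ]

p <- ggplot(mort12, aes(QTL)) +
  geom_col(aes(y = Adj.S, colour = "Adj.S"), fill = "red") +
  # linetype is a fixed setting, so it goes outside aes()
  geom_line(data = cum, aes(y = Cum.M), linetype = "dotted") +
  # shape 21 has a separate interior, so fill = "white" is visible
  geom_point(data = cum, aes(y = Cum.M, colour = "Cum.M"),
             shape = 21, fill = "white", size = 4)

b <- ggplot_build(p)
```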
