How to add labels and points to each geom_line in ggplot2? - r

I have a dataframe called (casos_obitos) that looks something like this:
EPI_WEEK CASES DEATHS
SE 51 1053 19
SE 52 1384 21
SE 53 1892 25
SE 01/21 1806 43
I'm making a plot with ggplot that places both cases and deaths in two different geom_lines. This is my code:
scl = 10
ggplot(data = casos_obitos, aes(x = EPI_WEEK, y = CASES, fill = CASES, group =1))+
scale_y_continuous(limits = c(0, max(casos_obitos$CASES)+10), expand = expansion(mult = c(0, .1)),
sec.axis = sec_axis(~./scl, name = "Nº de Óbitos"))+
geom_line(aes(x = EPI_WEEK, y = CASES, color = "CASES"), size = 1)+
geom_line(aes(x = EPI_WEEK, y = DEATHS*scl, color = "DEATHS"), size = 1) +
geom_text(aes(label= CASES), hjust= 0.5, vjust = -2, size= 2.0, color= "black") +
labs(x = "Semana Epidemiológica", y = "Nº de Casos") +
scale_colour_manual(" ", values=c("CASES" = "blue", "DEATHS" = "red"))+
theme_minimal(base_size = 10) +
theme(legend.position = "bottom", axis.line = element_line(colour = "black"),
axis.text.x=element_text(angle = 90, vjust = 0.5, hjust=1, color="black"),
axis.text.y=element_text(color="black"))
For now, my plot looks like this:
Where the blue line is the cases column and the red one is the deaths column. I need to put labels on the red line but I can't seem to find answers for that. I also want to place the labels in a "nice looking" way so I can read the numbers and they don't look messy like they do right now.
Thanks!

You should be able to add the following to get labels on the bottom line:
geom_text(aes(y = DEATHS*scl, label= DEATHS), hjust= 0.5, vjust = -2, size= 2.0, color= "black") +
You might also consider reshaping your data into a long format so that the CASES and DEATHS (after scaling) values are combined into the same column, with another column distinguishing which series is related to each value. ggplot2 generally works more smoothly with data in that form -- you would map the color aesthetic to the column specifying which series, and then you'd only need one geom_line and one geom_text to get both series. In this case, with only two series, and one of them scaled, it might not be worth the trouble to switch.
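A minimal sketch of that long-format approach, using the four sample rows from the question. The names long, series, n and y_plot are made up for illustration, and tidyr::pivot_longer is just one way to do the reshaping:

```r
library(ggplot2)
library(tidyr)

scl <- 10  # same scaling factor as in the question

# Sample rows copied from the question
casos_obitos <- data.frame(
  EPI_WEEK = c("SE 51", "SE 52", "SE 53", "SE 01/21"),
  CASES    = c(1053, 1384, 1892, 1806),
  DEATHS   = c(19, 21, 25, 43)
)
# Keep the weeks in data order rather than alphabetical order
casos_obitos$EPI_WEEK <- factor(casos_obitos$EPI_WEEK,
                                levels = casos_obitos$EPI_WEEK)

# One row per week and series ('series' and 'n' are hypothetical names)
long <- pivot_longer(casos_obitos, cols = c(CASES, DEATHS),
                     names_to = "series", values_to = "n")

# Plotting height: deaths are scaled up, but the label keeps the raw count
long$y_plot <- ifelse(long$series == "DEATHS", long$n * scl, long$n)

p <- ggplot(long, aes(x = EPI_WEEK, y = y_plot,
                      color = series, group = series)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = n), vjust = -1, size = 3, show.legend = FALSE) +
  scale_y_continuous(sec.axis = sec_axis(~ . / scl, name = "Nº de Óbitos")) +
  scale_colour_manual(NULL, values = c(CASES = "blue", DEATHS = "red")) +
  labs(x = "Semana Epidemiológica", y = "Nº de Casos")
```

With the data in this shape, one geom_line, one geom_point and one geom_text cover both series, and the labels show the unscaled counts.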
"Nice looking labels" is subjective and a harder problem than it might sound. There are a few options, including:
use a function like ggrepel::geom_text_repel to automatically shift labels so they don't overlap each other. It works by starting from an initial point and iteratively nudging until the labels have as much separation as you've specified. There are many options for adjusting the initial starting position and how the nudging should work.
manually nudge the labels you need to using code, e.g. by adjusting vjust for certain points. You might, for instance, use vjust to place the labels under the line for the points that are lower than neighboring points, by pre-calculating a moving average and comparing values to that.
manually nudge the labels afterward, e.g. by using officer/rvg to output to a vector file you can edit in PowerPoint, for instance.
avoid persistent labels altogether by shifting to an interactive option like ggplotly and see the labels upon hover instead of all the time.
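As a rough illustration of the second option, the sketch below puts a label under the line when a point is lower than the mean of its immediate neighbours, and above the line otherwise. The neighbour mean is a simple stand-in for the moving average mentioned above, and the column names neighbour_mean and label_vjust are made up:

```r
library(ggplot2)

casos <- data.frame(
  EPI_WEEK = factor(c("SE 51", "SE 52", "SE 53", "SE 01/21"),
                    levels = c("SE 51", "SE 52", "SE 53", "SE 01/21")),
  CASES = c(1053, 1384, 1892, 1806)
)

# Mean of the previous and next value; end points only have one neighbour
n <- nrow(casos)
prev_val <- c(NA, casos$CASES[-n])
next_val <- c(casos$CASES[-1], NA)
casos$neighbour_mean <- rowMeans(cbind(prev_val, next_val), na.rm = TRUE)

# vjust > 1 draws the label below the point, vjust < 0 draws it above
casos$label_vjust <- ifelse(casos$CASES < casos$neighbour_mean, 1.8, -0.8)

p <- ggplot(casos, aes(x = EPI_WEEK, y = CASES, group = 1)) +
  geom_line(color = "blue") +
  geom_point(color = "blue") +
  geom_text(aes(label = CASES, vjust = label_vjust), size = 3)
```

The two vjust values are arbitrary and will probably need tuning for the real plot size.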
You might also take a look at functions like scales::comma to control how the labels themselves appear. I'm anticipating that your Deaths labels will have many digits of decimals but you probably just will want the integer part of that...
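For example, a small sketch of formatting label text with scales::comma (the accuracy = 1 argument rounds to whole numbers; the input values are only illustrative):

```r
library(scales)

# Thousands separators, rounded to whole numbers
labs_fmt <- comma(c(1053, 1806, 12345.6), accuracy = 1)
print(labs_fmt)

# Inside a plot this would be used as, e.g.:
# geom_text(aes(label = scales::comma(CASES, accuracy = 1)))
```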

Related

Adding space *just* on right size of x-axis, color based on relative position, specify labels

I have a time series graph of 49 countries, and I'd like to do three things: (1) prevent the country label name from being cut off, (2) specify so that the coloring is based on the position in the graph rather than alphabetically, and (3) specify which countries I would like to label (49 labels in one graph is too many).
library(ggplot2)
library(directlabels)
library(zoo)
library(RColorBrewer)
library(viridis)
colourCount = length(unique(df$newCol))
getPalette = colorRampPalette(brewer.pal(11, "Paired"))
## Yearly Incorporation Rates
ggplot(df,aes(x=year2, y=total_count_th, group = newCol, color = newCol)) +
geom_line() +
geom_dl(aes(label = newCol),
method= list(dl.trans(x = x + 0.1),
"last.points", cex = 0.8)) +
scale_color_manual(values = getPalette(colourCount)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
legend.position = "none") +
labs(title = "Title",
x = "Year",
y = "Count")
This code works -- there are 49 lines, and each of them is labelled. But it just so happens that all the countries with the highest y-values have the same/similar colors (red/orange). So is there a way to specify the colors dynamically (maybe with scale_color_identity)? And how do I add space just on the right side of the labels? I found the expand = expand_scale, but it added space on both sides (though I did read that in the new version, it should be possible to do so.)
I am also fine defining a list of 49 manually-defined colors rather than using the color ramp.
One way to do it is to limit the x axis by adding something like
coord_cartesian(xlim = c(1,44), expand = TRUE)
In this case, I had 41 years of observations on the axis, so by specifying 44, I added space to the x-axis.
Thank you to @JonSpring for the help and getting me to the right answer!
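An alternative sketch that touches all three parts of the question, on a toy data set standing in for the 49 countries (the values and the choice of which series to label are made up). It assumes ggplot2 3.3.0 or later, where expansion() takes separate left and right multipliers:

```r
library(ggplot2)

# Toy stand-in for the question's data: 3 series over 5 years
df <- data.frame(
  year2 = rep(2001:2005, times = 3),
  newCol = rep(c("A", "B", "C"), each = 5),
  total_count_th = c(1:5, 11:15, 6:10)
)

# (2) Order the factor levels by each series' final value, so the
# palette follows vertical position instead of alphabetical order
last_vals <- df[df$year2 == max(df$year2), ]
df$newCol <- factor(df$newCol,
                    levels = last_vals$newCol[order(last_vals$total_count_th)])

# (3) Label only a chosen subset, at the last observed year
labelled <- df[df$year2 == max(df$year2) & df$newCol %in% c("B", "C"), ]

p <- ggplot(df, aes(x = year2, y = total_count_th,
                    group = newCol, color = newCol)) +
  geom_line() +
  geom_text(data = labelled, aes(label = newCol), hjust = -0.3) +
  # (1) mult = c(left, right): 2% padding on the left, 20% on the right
  scale_x_continuous(expand = expansion(mult = c(0.02, 0.2))) +
  theme(legend.position = "none")
```

Unlike a fixed xlim, the multiplier keeps working if more years are added later.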

ggplot2 geom_linerange remove whitespace between rows

I am attempting to create a plot similar to a strip chart recorder showing outage data. Outage Severity is either Major or Minor. The plot has a large amount of vertical white space between the two rows, as well as above and below them, that I would like to remove to create a compact two-row chart.
dataframe is:
> head(dfsub)
StartDateTime EndDateTime Outage.DUR Outage.Severity
1 2021-07-01T00:23:33.0000000 2021-07-01T00:25:26.0000000 1.8833333 Minor
2 2021-07-01T00:25:26.0000000 2021-07-01T00:31:33.0000000 6.1166667 Major
3 2021-07-01T00:31:33.0000000 2021-07-01T00:40:34.0000000 9.0166667 Major
4 2021-07-01T00:40:34.0000000 2021-07-01T00:42:57.0000000 2.3833333 Minor
5 2021-07-01T00:42:57.0000000 2021-07-01T00:43:49.0000000 0.8666667 Minor
6 2021-07-01T00:43:49.0000000 2021-07-01T00:45:35.0000000 1.7666667 Minor
R Code I am running
ggplot(dfsub) +
geom_linerange(aes(y = Outage.Severity,
xmin = StartDateTime,
xmax = EndDateTime,
colour = as.factor(Outage.Severity)
),
show.legend = FALSE,
size = 50) +
scale_color_manual(values = c("red", "yellow")) +
theme(legend.position = "none") +
theme_test()
generates this plot
Two suggestions.
You didn't ask about this, but your x-axis is broken, using time (which is a continuous thing) in a categorical sense. Note that R and ggplot2 are treating the current columns as strings not timestamps. This is easily resolved:
dfsub[c("StartDateTime", "EndDateTime")] <-
lapply(dfsub[c("StartDateTime", "EndDateTime")], as.POSIXct, format="%Y-%m-%dT%H:%M:%OS", tz="UTC")
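As a quick sanity check of that format string, the sketch below parses the first row's timestamps from the question and recomputes the duration, which should agree with the Outage.DUR column (1.8833333 minutes):

```r
# First row of the question's data, as character strings
start_chr <- "2021-07-01T00:23:33.0000000"
end_chr   <- "2021-07-01T00:25:26.0000000"

fmt <- "%Y-%m-%dT%H:%M:%OS"  # %OS reads seconds including the fractional part
start <- as.POSIXct(start_chr, format = fmt, tz = "UTC")
end   <- as.POSIXct(end_chr,   format = fmt, tz = "UTC")

# Duration in minutes: 113 seconds
dur_min <- as.numeric(difftime(end, start, units = "mins"))
print(dur_min)
```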
I don't think you're going to get fine control over the blank space between the reds and yellows using geom_linerange, so I suggest geom_rect as an option. With that, remove size=, and control ymin= and ymax= instead. This benefits from setting Outage.Severity to a factor; while not strictly necessary, it's common for this kind of work to come back with "how do I change the order of the y-axis categories?", for which the only (sane) response is to convert them to factors and control their levels=. We also need to add fill=, which geom_linerange did not need.
dfsub$Outage.Severity <- factor(dfsub$Outage.Severity) # add 'levels=' if you want to control the order
From here, knowing that categorical data are plotted on integers, we'll fill the gap between them by extending their rectangles +/- 0.48 (arbitrary, but should likely be close to but not at/beyond 0.5).
ggplot(dfsub) +
geom_rect(aes(ymin = as.numeric(Outage.Severity)-0.48,
ymax = as.numeric(Outage.Severity)+0.48,
xmin = StartDateTime,
xmax = EndDateTime,
colour = Outage.Severity,
fill = Outage.Severity),
show.legend = FALSE) +
scale_y_continuous(breaks = unique(as.numeric(dfsub$Outage.Severity)), labels = unique(dfsub$Outage.Severity)) +
scale_color_manual(values = c("Major"="red", "Minor"="yellow")) +
scale_fill_manual(values = c("Major"="red", "Minor"="yellow")) +
theme_test() +
theme(legend.position = "none") # applied after theme_test(), which would otherwise reset it
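The factor-to-integer mapping this relies on can be checked in isolation. A small sketch with made-up severity values:

```r
sev <- factor(c("Minor", "Major", "Major", "Minor"))

levels(sev)               # default levels are sorted: "Major", "Minor"
pos <- as.numeric(sev)    # integer position of each level on the y-axis

ymin <- pos - 0.48
ymax <- pos + 0.48

# Vertical gap left between the top of the "Major" row and the
# bottom of the "Minor" row
gap <- min(ymin[sev == "Minor"]) - max(ymax[sev == "Major"])
print(gap)
```

Raising 0.48 toward 0.5 shrinks that gap further; at 0.5 the two rows touch.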

I'm using ggplot in R version 3.5.3 trying to add to annotate text to a faceted grid but I keep getting an error about the aesthetics length

Tr<-c("Sorghum Male \n Sorghum Female","Sorghum Male \n Wheat Female","Wheat Male \n Sorghum Female","Wheat Male \n Wheat Female")
Treatment<-c(rep(Tr,3))
Matingdiet<-c(rep(c("Same diet","Cross diet","Cross diet", "Same diet"),3))
Rejection<-c(0.05, 0.00, 0.10, 0.00, 0.00, 0.05, 0.05, 0.00, 0.05, 0.05, 0.05, 0.05)
d<-as.data.frame(cbind(Treatment,Rejection, Matingdiet))
d$pop<-c(rep("JN200A-OBL",4),rep("JN200B-OBL",4),rep("JN200C-OBL",4))
d$Rejection<-as.numeric(as.character(d$Rejection))
d$pop<-as.factor(d$pop)
datatxt<-as.data.frame(cbind(labels = rep("N = 20 per treatment",3)),pop=c("JN200A-OBL","JN200B-OBL","JN200C-OBL"))
pl<-ggplot(data = d, aes(x=Treatment, y=Rejection, fill=Matingdiet))+geom_col()+facet_wrap(~pop)
pl<-pl+labs(fill="Mating pair type", y = "Proportion of mates rejected")+ylim(0,1)+theme(axis.text.x = element_text(angle = -60, hjust = 1, vjust = -1))
pl<-pl+theme(plot.background = element_blank(),panel.grid.major = element_blank(), panel.grid.minor = element_blank())
pl+geom_text(data=datatxt,aes(label = labels))
Which gives this error
Error: Aesthetics must be either length 1 or the same as the data (9): x, y and fill
When I run it without adding the geom_text() function I get my desired graph but I want to annotate it.
Well, first of all, it looks like there is a misplaced parenthesis in your code where you define datatxt, which results in that data frame having only one column, called labels. You're also using as.data.frame() when it makes much more sense to simply use data.frame(), where you do not have to use cbind(), but rather just list the columns as named vectors, separated by commas.
datatxt <- data.frame(
labels = rep("N = 20 per treatment",3),
pop=c("JN200A-OBL","JN200B-OBL","JN200C-OBL")
)
As for placing the text on your plot, if you are using geom_text it will map the data just as ggplot does for all the other geoms. That is, what is drawn on the plot is based on the data itself and the mapping you define in aes(), which is linked to the columns of that data. For geom_text, it will look through each observation in the dataset you give it (in this case, datatxt) and look for the values pertaining to x, y, and also fill (because this was defined in the overall call to ggplot() in your plot). The error message is due to not finding those columns in the dataset; in fact, you do not have columns that are mapped to x, y, or fill in datatxt at all.
The first fix is to remove the fill aesthetic from the overall call to ggplot(). If it is used in all geoms, it makes sense to put it here, but I like to define the aesthetics that are used only for particular geoms inside the geom call itself. Hence, I'm moving fill=Matingdiet inside the aes() for geom_col() where it is used. We can get around another way, but this is simplest.
Second, you presumably want the text to appear in the same location for each facet, right? Since it's not going to move with the data, we should define where it goes outside the mapping= specification of geom_text(), in other words, outside aes(). I also change a few other aesthetics so you can see what else you may want to specify here.
Here's the result:
pl<-
ggplot(data = d, aes(x=Treatment, y=Rejection)) +
geom_col(aes(fill=Matingdiet)) +
facet_wrap(~pop) +
labs(fill="Mating pair type", y = "Proportion of mates rejected") +
ylim(0,1) +
theme(
axis.text.x = element_text(angle = -60, hjust = 1, vjust = -1),
plot.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()
)
pl +
geom_text(
data=datatxt, aes(label = labels),
x=1, y=0.95, hjust=0, fontface='italic', color='red')
Remember that geom_text is really supposed to be suited for the case where the value of whatever you assign inside aes() changes with respect to the data. For example, you might have N=20 for two of the facets, but N=30 for another one. If that's the case, you can use that approach above. If you need to have the text remain the same regardless of the data, an easier approach might be to use annotate() instead:
pl +
annotate(
geom='text', x=1, y=0.95, color='red', fontface='italic',
label='N = 20 per treatment', hjust=0
)
Ultimately, it's up to you, as both work here. The above code gives you the same plot as using geom_text.
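The answer mentions that there is another way around the inherited fill aesthetic without spelling it out. One possibility (an assumption about what was meant) is to leave fill in the top-level ggplot() call and tell the text layer not to inherit it, via inherit.aes = FALSE. A sketch on a cut-down version of the data, with hypothetical per-facet labels:

```r
library(ggplot2)

# Cut-down stand-in for the question's data
d <- data.frame(
  Treatment  = rep(c("T1", "T2"), times = 3),
  Rejection  = c(0.05, 0.00, 0.10, 0.00, 0.05, 0.05),
  Matingdiet = rep(c("Same diet", "Cross diet"), times = 3),
  pop        = rep(c("JN200A-OBL", "JN200B-OBL", "JN200C-OBL"), each = 2)
)

# Hypothetical labels that differ between facets
datatxt <- data.frame(
  labels = c("N = 20", "N = 20", "N = 30"),
  pop    = c("JN200A-OBL", "JN200B-OBL", "JN200C-OBL")
)

p <- ggplot(d, aes(x = Treatment, y = Rejection, fill = Matingdiet)) +
  geom_col() +
  facet_wrap(~pop) +
  ylim(0, 1) +
  # inherit.aes = FALSE: this layer ignores x, y and fill from ggplot()
  geom_text(data = datatxt, aes(label = labels),
            x = 1, y = 0.95, hjust = 0, inherit.aes = FALSE)

built <- ggplot_build(p)  # building fails if required aesthetics are missing
```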

RColorbrewer not changing plot colors on geom_bar - R

I am working on a plot where my dataset is a simple column of numbers,
count1 <- c(10,10,10,10,9,8,6,4,4,3,3,3)
for the y-axis, and just a list of room numbers for the x-axis. The data is arranged so that count1's highest values are placed on the left since the room numbers are irrelevant. I am trying to use a color palette which will color the bars based upon their number, i.e. 10 is more important than 8, more important than 4, etc.
I have set the column as.numeric(count1), and tried a few things. RColorBrewer is loaded in, and I just get some random, bad looking colors that indicate RColorBrewer isn't active at all.
Here is my plot code, maybe someone can find my mistake? I imagine the datatype of the room numbers (x-axis) shouldn't matter.
ggplot(fumeNumber3, aes(x = reorder(Location, -count1), y = count1)) +
theme(axis.text.x = element_text(angle = 85, hjust = 1)) +
geom_bar(stat = "identity", fill = (fumeNumber3$count1)) +
labs(x = "Lab Location", y = "Number of Fume Hoods in lab") +
scale_fill_brewer(palette = "Blues")
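No answer is recorded here for this question. A likely cause, offered as a sketch rather than a confirmed diagnosis: fill is set as a constant outside aes(), so no fill scale is involved and scale_fill_brewer has nothing to act on. In addition, scale_fill_brewer is meant for discrete variables, while scale_fill_distiller applies the same ColorBrewer palettes to continuous ones. The room names below are made up:

```r
library(ggplot2)

fumeNumber3 <- data.frame(
  Location = paste0("Room ", 101:112),  # hypothetical room names
  count1   = c(10, 10, 10, 10, 9, 8, 6, 4, 4, 3, 3, 3)
)

p <- ggplot(fumeNumber3,
            aes(x = reorder(Location, -count1), y = count1, fill = count1)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  # direction = 1: larger counts get the darker blues
  scale_fill_distiller(palette = "Blues", direction = 1) +
  labs(x = "Lab Location", y = "Number of Fume Hoods in lab") +
  theme(axis.text.x = element_text(angle = 85, hjust = 1))

built <- ggplot_build(p)
```

If discrete color steps are preferred, mapping fill = factor(count1) together with scale_fill_brewer(palette = "Blues") should also work.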

Discrete values and geom_ribbon and geom_lines + problems with "discrete" scale

I have got a file like this one:
Month,Open,Closed
2017-08,53,38
2017-09,102,85
2017-10,58,38
2017-11,51,42
2017-12,32,24
2018-01,24,30
2018-02,56,46
2018-03,82,74
2018-04,95,89
2018-05,16,86
I want to plot both lines, and also shade the difference between them. So this works:
ggplot() +
geom_line(data=issues.m,aes(x=Month,y=Open,group=1)) +
geom_line(data=issues.m,aes(x=Month,y=Closed,group=1)) +
geom_ribbon(data=issues.m, aes(x=Month,ymin=Closed,ymax=Open,color=Open-Closed)) +
theme_tufte() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
producing this
First problem here is that I would like the whole area between the two lines shaded if possible, not a single line. How can I do that?
But I would also like to color the two lines. If I add a color to one of them:
ggplot() +
geom_line(data=issues.m,aes(x=Month,y=Open,group=1,color='open')) +
geom_line(data=issues.m,aes(x=Month,y=Closed,group=1)) +
geom_ribbon(data=issues.m, aes(x=Month,ymin=Closed,ymax=Open,color=Open-Closed)) +
theme_tufte() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I get the error:
Error: Continuous value supplied to discrete scale
So, can what I want to do be done at all? Would it be possible to change the colour palette of the ribbon too?
Your error was because you were mapping Open - Closed onto the color, which will be a continuous variable, i.e. the difference between those two values for each month. But you also assigned "open" to color inside the aes in one of your geom_lines. That means you're trying to assign both continuous values and discrete values to the same scale, and that's not going to work.
If all you need to do is get 2 colors, one for each line, you can do this one of two ways, the second of which fits more into the ggplot/tidyverse way of doing things.
First off I turned your dates into date objects to clean up the x-axis and avoid rotating the labels—feel free to experiment with the date breaks that work well in scale_x_date.
The less "tidy" way is to just make two geom_lines, one for Open and one for Closed, and assign a color to each.
library(tidyverse)
df_dated <- df %>%
mutate(month2 = sprintf("%s-01", Month) %>% lubridate::ymd())
ggplot(df_dated, aes(x = month2)) +
geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
geom_line(aes(y = Open), color = "green3") +
geom_line(aes(y = Closed), color = "red") +
ggthemes::theme_tufte()
But the more idiomatically "tidy" way is to make a long-shaped version of the data so you can map a variable—in this case whether an observation is the opening or closing value—onto an aesthetic such as color. This also gives you a legend—if you don't want it, you can get rid of it in the theme. This lets you set a scale for the colors, instead of hard-coding into each geom_line.
df_date_long <- df_dated %>%
gather(key, value, -month2, -Month)
ggplot(df_dated, aes(x = month2)) +
geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
geom_line(aes(y = value, color = key), data = df_date_long) +
scale_color_manual(values = c(Open = "green3", Closed = "red")) +
ggthemes::theme_tufte()
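In current tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshaping with the newer function, on the first three rows of the question's data (the data frame names issues and issues_long are made up):

```r
library(tidyr)

issues <- data.frame(
  Month  = c("2017-08", "2017-09", "2017-10"),
  Open   = c(53, 102, 58),
  Closed = c(38, 85, 38)
)
# Turn "YYYY-MM" into a real date on the first of the month
issues$month2 <- as.Date(paste0(issues$Month, "-01"))

# Same result shape as gather(key, value, -month2, -Month)
issues_long <- pivot_longer(issues, cols = c(Open, Closed),
                            names_to = "key", values_to = "value")
print(issues_long)
```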
