ggplot: Adding labels to lines, not endpoints - r

Looking for some ggplot help here, for a pretty non-standard plot type. Here is code for one sample graph, but the final product will have dozens like this.
library(ggplot2)
library(data.table)
SOURCE <- c('SOURCE.1','SOURCE.1','SOURCE.2','SOURCE.2')
USAGE <- rep(c('USAGE.1','USAGE.2'),2)
RATIO <- c(0.95,0.05,0.75,0.25)
x <- data.table(SOURCE,USAGE,RATIO)
ggplot(x, aes(x=SOURCE,y=RATIO,group=USAGE)) +
geom_point() +
geom_line() +
geom_label(aes(label=USAGE))
This produces a graph with two lines, as desired. But the label geom adds text to the endpoints. What we want is the text to label the line (appearing once per line, around the middle). The endpoints will always have the same labels so it just creates redundancy and clutter (each graph will have different labels). See the attached file, mocked up in a paint programme (ignore the font and size):
I know we could use geom_line(aes(linetype=USAGE)), but we prefer not to rely on legends due to the sheer number of graphs required and because each graph is quite minimal as the vast majority will have just the two lines and the most extreme cases will only have four.
(Use of stacked bars deliberately avoided.)

You can achieve this with annotate and can move the label around by changing the x and y values.
library(ggplot2)
#library(data.table)
SOURCE<-c('SOURCE.1','SOURCE.1','SOURCE.2','SOURCE.2')
USAGE<-rep(c('USAGE.1','USAGE.2'),2)
RATIO<-c(0.95,0.05,0.75,0.25)
#x<-data.table(SOURCE,USAGE,RATIO)
df <- data.frame(SOURCE, USAGE, RATIO)
ggplot(df, aes(x=SOURCE,y=RATIO,group=USAGE)) +
geom_point() +
geom_line() +
#geom_label(aes(label=USAGE))+
annotate('text', x=1.5, y=1, label = USAGE[1])+
annotate('text', x=1.5, y=0.25, label = USAGE[2])
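Since the final product will have dozens of graphs, hard-coding x and y for every annotate call may get tedious. A minimal sketch of one alternative, assuming every line has exactly two endpoints: compute one midpoint per USAGE and hand that small data frame to geom_label. The names mid and p are made up for this illustration.

```r
library(ggplot2)

df <- data.frame(
  SOURCE = c("SOURCE.1", "SOURCE.1", "SOURCE.2", "SOURCE.2"),
  USAGE  = rep(c("USAGE.1", "USAGE.2"), 2),
  RATIO  = c(0.95, 0.05, 0.75, 0.25)
)

# One row per line: the mean of the two endpoint ratios lies on the
# straight segment joining them (assumption: two endpoints per line).
mid <- aggregate(RATIO ~ USAGE, data = df, FUN = mean)

# Discrete x positions are 1, 2, ..., so the horizontal middle is (n + 1) / 2.
mid$x <- (length(unique(df$SOURCE)) + 1) / 2

p <- ggplot(df, aes(x = SOURCE, y = RATIO, group = USAGE)) +
  geom_point() +
  geom_line() +
  # Label each line once, at its midpoint, instead of at both endpoints.
  geom_label(data = mid, aes(x = x, y = RATIO, label = USAGE))
```

Printing p draws the plot. Because the label positions come from the data, the same code can be wrapped in a function and reused for each graph.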

Related

Why does changing the label mess up my plot?

I have recently been playing around with various plot types using fictitious data to get my head around how I could display various pieces of information. One plot type that is gaining popularity is the so called individual differences dot plot which shows the change in each subjects score pre-post. The plot is fairly easy to produce, but my issue is that when I go to change the labels using either the labs or xlab ylab functions in ggplot, the plot itself becomes messed up. Below I have attached the fictitious data, the code used and the results.
Data
df<- data.frame(Participant<- c(rep(1:10,2)), Score<- c(rnorm(20,100,5)), Session<- c(1,1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2,2))
colnames(df) <- c("Participant", "Score", "Session")
Code for plot
p<- ggplot(df, aes(x=df$Session, y=df$Score, colour=df$Participant))+ geom_point()+
geom_line(group=df$Participant)+
theme_classic()
Plot
Individual difference plot
My dilemma is that anytime I try to change the label names, the plot messes up as per below.
Problem
p + xlab("Session") + ylab("Score")
Plot after relabelling
The same thing happens if I try the labs function, i.e. p + labs(x= "Session", y= "Score"). You can see that the labels themselves do actually change, but for some reason this messes up the actual plot. Does anyone have any ideas as to what could be going wrong here?
The issue appears to be the grouping is undone when the label functions are called. Instead, issue the grouping as an aesthetic mapping:
library(dplyr); library(ggplot2)
df %>% mutate(across(c(Session,Participant),factor)) -> df
p <- ggplot(df, aes(x=Session, y=Score, colour=Participant))+ geom_point()+
geom_line(aes(group=Participant))+
theme_classic()
p + xlab("Session") + ylab("Score")
I suspect this is probably a bug.
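For anyone who wants to check that the grouping survives relabelling, here is a small sketch without dplyr. It assumes the same column names as the question, uses bare column names inside aes() rather than df$..., and inspects the built line layer with layer_data(). The names p2 and n_groups are made up for this illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  Participant = factor(rep(1:10, 2)),
  Score       = rnorm(20, 100, 5),
  Session     = factor(rep(1:2, each = 10))
)

# Map everything, including group, inside aes() using bare column names.
p <- ggplot(df, aes(x = Session, y = Score,
                    colour = Participant, group = Participant)) +
  geom_point() +
  geom_line() +
  theme_classic()

p2 <- p + labs(x = "Session", y = "Score")

# Layer 2 is geom_line; count the distinct groups it will draw.
n_groups <- length(unique(layer_data(p2, 2)$group))
```

If n_groups equals the number of participants, each subject still gets their own line after the labels are changed.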

How to make stacked bar chart with count values on y axis?

I'm trying to create a stacked barchart with gene sequencing data, where for each gene there is a tRF.type and Amino.Acid value. An example data set looks like this:
tRF <- c('tRF-26-OB1690PQR3E', 'tRF-27-OB1690PQR3P', 'tRF-30-MIF91SS2P46I')
tRF.type <- c('5-tRF', 'i-tRF', '3-tRF')
Amino.Acid <- c('Ser', 'Lys', 'Ser')
tRF.data <- data.frame(tRF, tRF.type, Amino.Acid)
I would like the x-axis to represent the amino acid type, the y-axis the number of counts of each tRF type and the fill of the bars to represent each tRF type.
My code is:
ggplot(chart_data, aes(x = Amino.Acid, y = tRF.type, fill = tRF.type)) +
geom_bar(stat="identity") +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("tRF type")
However, it generates this graph, where the y-axis is labelled with the categories of tRF type. How can I change my code so that the y-axis scale is numerical and represents the counts of each tRF type?
Barchart
OP and Welcome to SO. In future questions, please, be sure to provide a minimal reproducible example - meaning provide code, an image (if possible), and at least a representative dataset that can demonstrate your question or problem clearly.
TL;DR - don't use stat="identity", just use geom_bar() without providing a stat, since default is to use the counts. This should work:
ggplot(chart_data, aes(x = Amino.Acid, fill = tRF.type)) + geom_bar()
The dataset provided doesn't adequately demonstrate your issue, so here's one that can work. The example data herein consists of 100 observations and two columns: one called Capitals for randomly-selected uppercase letters and one Lowercase for randomly-selected lowercase letters.
library(ggplot2)
set.seed(1234)
df <- data.frame(
Capitals=sample(LETTERS, 100, replace=TRUE),
Lowercase=sample(letters, 100, replace=TRUE)
)
If I plot similar to your code, you can see the result:
ggplot(df, aes(x=Capitals, y=Lowercase, fill=Lowercase)) +
geom_bar(stat="identity")
You can see, the bars are stacked, but the y axis is all smooshed down. The reason is related to understanding the difference between geom_bar() and geom_col(). Checking the documentation for these functions, you can see that the main difference is that geom_col() will plot bars with heights equal to the y aesthetic, whereas geom_bar() plots by default according to stat="count". In fact, using geom_bar(stat="identity") is really just a complicated way of saying geom_col().
Since your y aesthetic is not numeric, ggplot still tries to treat the discrete levels numerically. It doesn't really work out well, and it's the reason why your axis gets smooshed down like that. What you want is geom_bar(stat="count"), which is the same as just using geom_bar() without providing a stat=.
The one problem is that geom_bar() only accepts an x or a y aesthetic. This means you should only give it one of them. This fixes the issue and now you get the proper chart:
ggplot(df, aes(x=Capitals, fill=Lowercase)) + geom_bar()
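To make the geom_bar() versus geom_col() distinction concrete, here is a sketch that does the counting by hand with table() and then plots the precomputed counts with geom_col(). It should give the same stacked bars as the geom_bar() call above. The name counts is made up for this illustration.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(
  Capitals  = sample(LETTERS, 100, replace = TRUE),
  Lowercase = sample(letters, 100, replace = TRUE)
)

# Count each Capitals/Lowercase combination ourselves.
counts <- as.data.frame(table(Capitals = df$Capitals, Lowercase = df$Lowercase))
counts <- counts[counts$Freq > 0, ]  # drop combinations that never occur

# geom_col() uses the y aesthetic as the bar height, so y must be numeric.
p <- ggplot(counts, aes(x = Capitals, y = Freq, fill = Lowercase)) +
  geom_col()
```

This form is mainly useful when the counts already exist in a summary table. When you have the raw observations, plain geom_bar() is shorter.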
You want your y-axis to be a count, not tRF.type. This code should give you the correct plot: I've removed the y = tRF.type from ggplot(), and stat = "identity" from geom_bar() (it is using the default value of stat = "count" instead).
ggplot(tRF.data, aes(x = Amino.Acid, fill = tRF.type)) +
geom_bar() +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("Count")

Customize linetype in ggplot2 OR add automatic arrows/symbols below a line

I would like to use customized linetypes in ggplot. If that is impossible (which I believe to be true), then I am looking for a smart hack to plot arrowlike symbols above, or below, my line.
Some background:
I want to plot some water quality data and compare it to the standard (set by the European Water Framework Directive) in a red line. Here's some reproducible data and my plot:
df <- data.frame(datum = seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y=rnorm(53,mean=100,sd=40))
(plot1 <-
ggplot(df, aes(x=datum,y=y)) +
geom_line() +
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
However, in this plot it is completely unclear if the Standard is a maximum value (as it would be for example Chloride) or a minimum value (as it would be for Oxygen). So I would like to make this clear by adding small pointers/arrows Up or Down. The best way would be to customize the linetype so that it consists of these arrows, but I couldn't find a way.
Q1: Is this at all possible, defining custom linetypes?
All I could think of was adding extra points below the line:
extrapoints <- data.frame(datum2 = seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y2=68)
plot1 + geom_point(data=extrapoints, aes(x=datum2,y=y2),
shape=">",size=5,colour="red",rotate=90)
However, I can't seem to rotate these symbols pointing downward. Furthermore, this requires calculating the right spacing of X and distance to the line (Y) every time, which is rather inconvenient.
Q2: Is there any way to achieve this, preferably as automated as possible?
I'm not sure what is requested, but it sounds as though you want arrows that point up or down based on whether the y-value is greater or less than some expected value. If that's the case, then this does it using geom_segment:
require(grid) # as noted by ?geom_segment
(plot1 <-
ggplot(df, aes(x=datum,y=y)) + geom_line()+
geom_segment(data = data.frame(datum = df$datum, y = 70, up = df$y > 70),
aes(xend = datum , yend =70 + c(-1,1)[1+up]*5), #select up/down based on 'up'
arrow = arrow(length = unit(0.1,"cm"))
) + # adjust units to modify size or arrow-heads
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
If I'm wrong about what was desired and you only wanted a bunch of down arrows, then just take out the stuff about creating and using "up" and use a minus-sign.
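A sketch of that simpler variant, with every arrow pointing down from the standard line. The names standard, arrows and plot_down, and the 5-unit arrow length, are assumptions chosen for illustration.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(
  datum = seq.Date(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "week"),
  y     = rnorm(53, mean = 100, sd = 40)
)

standard <- 70  # the limit value drawn as a red line

# One short segment per date, starting on the standard and ending 5 units below.
arrows <- data.frame(datum = df$datum, y = standard, yend = standard - 5)

plot_down <- ggplot(df, aes(x = datum, y = y)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = standard, colour = "red") +
  geom_segment(data = arrows,
               aes(xend = datum, yend = yend),
               colour = "red",
               arrow = grid::arrow(length = grid::unit(0.1, "cm"))) +
  theme_classic()
```

Changing standard - 5 to standard + 5 flips the arrows upward. Thinning arrows (for example keeping every fourth row) gives a sparser marker pattern.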

Can the minimum y-value be adjusted when using scales = "free" in ggplot?

Using the following data set:
day <- gl(8,1,48,labels=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Avg"))
day <- factor(day, levels=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Avg"))
month<-gl(3,8,48,labels=c("Jan","Mar","Apr"))
month<-factor(month,levels=c("Jan","Mar","Apr"))
snow<-gl(2,24,48,labels=c("Y","N"))
snow<-factor(snow,levels=c("Y","N"))
count <- c(.94,.95,.96,.98,.93,.94,.99,.9557143,.82,.84,.83,.86,.91,.89,.93,.8685714,1.07,.99,.86,1.03,.81,.92,.88,.9371429,.94,.95,.96,.98,.93,.94,.99,.9557143,.82,.84,.83,.86,.91,.89,.93,.8685714,1.07,.99,.86,1.03,.81,.92,.88,.9371429)
d <- data.frame(day=day,count=count,month=month,snow=snow)
I like the y-scale in this graph, but not the bars:
ggplot()+
geom_line(data=d[d$day!="Avg",],aes(x=day, y=count, group=month, colour=month))+
geom_bar(data=d[d$day=="Avg",],aes(x=day, y=count, fill=month),position="dodge", group=month)+
scale_x_discrete(limits=levels(d$day))+
facet_wrap(~snow,ncol=1,scales="free")+
scale_y_continuous(labels = percent_format())
I like the points, but not the scale:
ggplot(data=d[d$day=="Avg",],aes(x=day, y=count, fill=month,group=month,label=month),show_guide=F)+
facet_wrap(~snow,ncol=1,scales="free")+
geom_line(data=d[d$day!="Avg",],aes(x=day, y=count, group=month, colour=month), show_guide=F)+
scale_x_discrete(limits=levels(d$day))+
scale_y_continuous(labels = percent_format())+
geom_point(aes(colour = month),size = 4,position=position_dodge(width=1.2))
How to combine the desirable qualities in the above graphs?
Essentially, I'm asking: How can I graph the points with a varied y-max while setting the y-min to zero?
Note: The solution that I'm aiming to find will apply to about 27 graphs built from one dataframe. So I'll vote up those solutions that avoid alterations to individual graphs. I'm hoping for a solution that applies to all the facet wrapped graphs.
Minor Questions (possibly for a separate post):
- How can I add a legend to each of the facet wrapped graphs?
- How can I change the title of the legend to read "Weekly Average"?
- How can the shape/color of the lines/points be varied and then reported in one single legend?
There's expand_limits(y=0), which essentially adds a dummy layer with an invisible geom_blank only to stretch the scales.
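A minimal sketch of how that might look with free y scales per facet. It uses a small made-up subset of the question's data, so the values and the name p are illustrative only.

```r
library(ggplot2)

# Hypothetical cut-down data: two days, two facets.
d <- data.frame(
  day   = factor(rep(c("Mon", "Tues"), 2), levels = c("Mon", "Tues")),
  count = c(0.94, 0.95, 0.82, 0.84),
  snow  = factor(rep(c("Y", "N"), each = 2), levels = c("Y", "N"))
)

p <- ggplot(d, aes(x = day, y = count, group = snow)) +
  geom_line() +
  geom_point() +
  # Each panel keeps its own y maximum...
  facet_wrap(~snow, ncol = 1, scales = "free_y") +
  # ...while y = 0 is forced into every panel's range.
  expand_limits(y = 0)
```

Because expand_limits() is added once to the plot, it applies to all facets, which fits the requirement of not altering individual graphs.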

How can I change the colors in a ggplot2 density plot?

Summary: I want to choose the colors for a ggplot2() density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces the following output:
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the data
Problems:c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
As you can see in the image above, that last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails I will accept an answer that tells me how to get the legend back on to the 2nd plot. However, if someone can tell me how to do it properly I will be very happy.
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
gives me the same plots with different sets of colors.
Provide a vector containing colours for the "values" argument to map discrete values to manually chosen visual ones:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green
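With seven groups, it can be safer to pass a named vector so that each colour is tied to a specific factor level rather than to its position. A sketch using simulated data in place of the scanned files; the values and the names grp_colours and p are assumptions for illustration.

```r
library(ggplot2)

set.seed(45)
# Simulated stand-in for the cands/non vectors read with scan().
df <- data.frame(
  val = c(rnorm(100), rnorm(100, mean = 2, sd = 2)),
  grp = factor(rep(c("1. Candidates", "2. NonCands"), each = 100))
)

# Names must match the factor levels exactly.
grp_colours <- c("1. Candidates" = "red", "2. NonCands" = "blue")

p <- ggplot(df, aes(x = val, colour = grp)) +
  geom_density() +
  # The legend is kept; name sets its title.
  scale_colour_manual(values = grp_colours, name = "Group")
```

The legend is still generated automatically. Adding a group then only requires one more entry in grp_colours.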
