ggplot - annotate() - "Discrete value supplied to continuous scale" - r

I've read many SO answers about what can cause the error "Discrete value supplied to continuous scale", but I still can't solve the following issue. In my case, the error is caused by using annotate(). If I get rid of + annotate(...), everything works well. Otherwise the error is raised.
My code is as follows:
base <- ggplot() +
annotate(geom = "rect", ymin = -Inf , ymax = 0, xmax = 0, xmin = Inf, alpha = .1)
annotated <- base +
geom_boxplot(outlier.shape=NA, data = technicalsHt, aes(x = name, y = px_last))
> base # fine
> annotated
Error: Discrete value supplied to continuous scale
Unfortunately, I cannot give the code leading to the data frame used here (viz. technicalsHt) because it is very long and reliant on APIs. A description of it:
> str(technicalsHt)
'data.frame': 512 obs. of 3 variables:
$ date : Date, format: "2016-11-14" "2016-11-15" ...
$ px_last: num 1.096 0.365 -0.067 0.796 0.281 ...
$ name : Factor w/ 4 levels "Stock Price Strength",..: 1 1 1 1 1 1 1 1 1 1 ...
> head(technicalsHt)
date px_last name
1 2016-11-14 1.09582090 Stock Price Strength
2 2016-11-15 0.36458685 Stock Price Strength
3 2016-11-16 -0.06696111 Stock Price Strength
4 2016-11-17 0.79613481 Stock Price Strength
5 2016-11-18 0.28067475 Stock Price Strength
6 2016-11-21 1.10780834 Stock Price Strength
The code without annotate works perfectly:
base <- ggplot()
annotated <- base +
geom_boxplot(outlier.shape=NA, data = technicalsHt, aes(x = name, y = px_last))
> annotated # fine
I tried playing around with technicalsHt, e.g. doing the following:
technicalsHt[,3] <- "hi"
technicalsHt[,2] <- rnorm(nrow(technicalsHt), 2,3)
but no matter what, using an annotate() statement raises the error.
EDIT:
Following the answer below, I tried to put the data and aes in the initial ggplot call, and to have geom_boxplot from the outset:
base <- ggplot(data = technicalsHt, aes(x = name, y = px_last)) + geom_boxplot(outlier.shape=NA)
# also tried: base <- ggplot(data = technicalsHt, aes(x = factor(name), y = px_last)) + geom_boxplot(outlier.shape=NA)
annotated <- base +
annotate(geom = "rect", ymin = -Inf , ymax = 0, xmax = 0, xmin = Inf, alpha = .1)
This works, but it is not really satisfactory since the annotation layer (shading part of the coordinate system) then covers the boxes.
(While, e.g., that link also mentions this error in connection with annotate, the answer given there does not solve my issue, so I would be extremely grateful for help. First of all, which of the variables is causing the problem?)

I had this issue and did not find the answer I wanted, so here is my solution. It is a bit prettier than plotting the boxplot twice.
If you want to draw an annotation rectangle underneath the boxes when the x scale is discrete, you need to tell ggplot that the scale is discrete before adding the annotation:
ggplot(mtcars, aes(factor(cyl), mpg)) +
scale_x_discrete() +
annotate(geom = "rect", ymin = -Inf , ymax = 10, xmax = 0, xmin = Inf, alpha = .1) +
geom_boxplot()
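For the data frame in the question, the same idea can be sketched as follows. This is only an illustration under assumptions: stand_in is a made-up substitute for technicalsHt (which was never posted), three of the four level names are hypothetical, and the plot is only constructed if ggplot2 happens to be installed.

```r
# Stand-in for technicalsHt: random values, NOT the asker's real data.
set.seed(1)
level_names <- c("Stock Price Strength", "Level B", "Level C", "Level D")  # last three are hypothetical
stand_in <- data.frame(
  px_last = rnorm(512),
  name    = factor(rep(level_names, each = 128), levels = level_names)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stand_in, aes(x = name, y = px_last)) +
    scale_x_discrete() +                      # declare the discrete x scale first
    annotate(geom = "rect", ymin = -Inf, ymax = 0,
             xmax = 0, xmin = Inf, alpha = .1) +
    geom_boxplot(outlier.shape = NA)          # added last, so drawn above the rectangle
  # print(p)  # uncomment to render
}
```

Because the rectangle layer is added before geom_boxplot(), the boxes end up on top of the shading rather than underneath it.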

Switch around the order, and bring the data and main aesthetics into your ggplot call. You are basically writing this:
p1 <- ggplot() +
annotate(geom = "rect", ymin = -Inf , ymax = 10, xmax = 0, xmin = Inf, alpha = .1)
At this point, p1 has a continuous x axis, since you provided numbers here.
p2 <- p1 + geom_boxplot(aes(factor(cyl), mpg), mtcars)
Now you add another layer that has a discrete x axis, and this yields the error.
If you write it the 'proper' way, everything is OK:
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
annotate(geom = "rect", ymin = -Inf , ymax = 10, xmax = 0, xmin = Inf, alpha = .1)
p.s.: Also it's not that hard to make a reproducible minimal example that accurately shows your problem, as you can see.

In response to the layering problem, the easiest workaround I have found is simply plotting the same boxplot twice. I am aware that it is redundant code, but it is a very quick fix for the layering issue.
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
annotate(geom = "rect", ymin = -Inf , ymax = 10, xmax = 0, xmin = Inf, alpha = .1) +
geom_boxplot()
I cannot notice any image degradation from it as the pixels perfectly overlap. Feel free to correct me if anyone has a UHD monitor.

Related

Alternating bar colours by group while alternating background colour and maintaining x-axis label order (ggplot2)

General Info
I am using the ggplot2 package in R to plot some data where I am interested in plotting each row of my data frame as a separate bar using geom_col. Each bar is related to a group that should be plotted in the same colour and on top of that I would like to alternate the colour of the background using geom_rect where the background will span multiple groups.
I have also not figured out how to plot the background without having a separate geom_rect call for each background. In my real example I would need to perform 24 geom_rect calls and surely there must be a better way (see attempt under title "Example geom_rect in one call").
Below is my code on some test data where I almost have it working.
I am actually quite happy with the outcome but I have two issues.
There needs to be a better way of making the geom_rect call without losing the order on the x-axis, which is why the solution under the title "Example geom_rect in one call" is not fit for purpose just yet.
I want to be able to control the colour of the bars for each group. Normally I would use scale_fill_manual(values=c("my_colors")) but that overrides the colour of the rectangles in the background.
##Generate simul data
gene_list <- c("CHEK2", "AML", "TP53", "AKT1", "ATRX", "CDK4")
df <- data.frame(x = gene_list, y = rnorm(6))
df$grp <- c( rep(c(1),3), rep(c(2), 3) )
df$col <- c( rep(c("grey"),3), rep(c("blue"), 3) )
df_rec <- data.frame(xmin = c("CHEK2", "AKT1"), xmax = c("TP53", "CDK4"), ymin = c(-Inf, -Inf), ymax=c(Inf, Inf), col=c("red", "blue"))
##Reformat order
my_factor <- factor(gene_list, levels = gene_list)
df$x <- my_factor
##Create barplot
ggplot() +
geom_rect(data = df_rec,
aes(
fill=df_rec$col[1],
alpha=1),
xmin = as.numeric(df$x[df$x == as.character(df_rec$xmin[1])]) - 0.45,
xmax = as.numeric(df$x[df$x == as.character(df_rec$xmax[1])]) + 0.55,
ymin = -Inf,
ymax = Inf) +
geom_rect(data = df_rec,
aes(
fill=df_rec$col[2],
alpha=1),
xmin = as.numeric(df$x[df$x == as.character(df_rec$xmin[2])]) - 0.45,
xmax = as.numeric(df$x[df$x == as.character(df_rec$xmax[2])]) + 0.45,
ymin = -Inf,
ymax = Inf) +
geom_col(df, mapping = aes(x = x, y=y, fill = as.character(grp)))
Example geom_rect in one call
Using the same data as outlined in code above.
Whenever I tried to use df_rec, which has the limits for each rectangle, I lost the order of my labels on the x-axis and control of the colouring of the rectangles, and I couldn't adjust the geom_rect xmin and xmax positions as I did in the code above without prompting the error "Discrete value supplied to continuous scale".
(Edited) I forgot to add that setting the levels of the factors to have the order that I want for both xmin and xmax still does not order the bars correctly.
##Reformat order
my_factor <- factor(gene_list, levels = gene_list)
df$x <- my_factor
ggplot() +
geom_rect(data = df_rec,
aes(
alpha=1,
xmin = factor(df_rec$xmin, levels=levels(df$x)),
xmax = factor(df_rec$xmax, levels=levels(df$x)),
ymin = -Inf,
ymax = Inf),
fill = df_rec$col
) +
geom_col(df, mapping = aes(x = x, y=y), fill = df$col)
Thanks for anyone taking the time to look at this.
I managed to solve it in the end. Below is an explanation of how I solved it. It turned out to be quite an easy fix after all.
I do recommend reading up on the aes function used for mapping the data, as there is clearly a difference between setting a colour inside and outside the aes function. Unfortunately I cannot give you an exact account of how aes works, but I can tell you how I fixed it in my case.
The simple answer is: if you want to control the colour of each separate bar/rectangle (as I do in my case), you are better off explicitly defining the colours as a column in your data frame and passing that column to the fill argument outside of aes.
Also, if you want to force a specific order of your discrete labels on the x-axis, you can do so with the scale_x_discrete function by passing a vector with the desired order to limits, as seen in the code below.
Successful example
##Generate simul data
gene_list <- c("CHEK2", "AML", "TP53", "AKT1", "ATRX", "CDK4")
df <- data.frame(x = gene_list, y = rnorm(6))
df$grp <- c( rep(c(1),3), rep(c(2), 3) )
df$col <- c( rep(c("grey"),3), rep(c("blue"), 3) )
df_rec <- data.frame(xmin = c("CHEK2", "AKT1"), xmax = c("TP53", "CDK4"), ymin = c(-Inf, -Inf), ymax=c(Inf, Inf), col=c("red", "blue"))
##Reformat order
my_factor <- factor(gene_list, levels = gene_list)
df$x <- my_factor
##Successful test of barplot in the configuration I wanted
ggplot() +
geom_rect(data = df_rec,
aes(alpha=1),
xmin = as.numeric(factor(df_rec$xmin, levels=levels(df$x))) - 0.45,
xmax = as.numeric(factor(df_rec$xmax, levels=levels(df$x))) + 0.55,
ymin = -Inf,
ymax = Inf,
fill = df_rec$col
) +
geom_col(df, mapping = aes(x = x, y=y), fill = df$col) +
scale_x_discrete(limits=c("CHEK2","AML","TP53","AKT1","ATRX","CDK4"))
Thanks to everyone who had a look at this.
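The xmin/xmax arithmetic in the successful example relies on the fact that a label's position on a discrete axis is its index among the factor levels. Here is a small base-R sketch of just that step. The names rect_start, rect_end and half_width are mine, and the padding of 0.5 is an assumed value (the code above uses 0.45 and 0.55).

```r
gene_list  <- c("CHEK2", "AML", "TP53", "AKT1", "ATRX", "CDK4")
rect_start <- c("CHEK2", "AKT1")   # first gene covered by each rectangle
rect_end   <- c("TP53", "CDK4")    # last gene covered by each rectangle

# Index of each label among the ordered levels = its x position on the axis.
pos_start <- as.numeric(factor(rect_start, levels = gene_list))
pos_end   <- as.numeric(factor(rect_end,   levels = gene_list))

# Pad so each rectangle spans whole bars (assumed padding, adjust to taste).
half_width <- 0.5
bounds <- data.frame(xmin = pos_start - half_width,
                     xmax = pos_end   + half_width)
print(bounds)
```

Since these bounds are plain numbers, they have to be passed outside of aes() (or the axis order has to be fixed with limits), which is what the code above does.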

Creating a ggplot() from scratch in R to illustrate results

I'm a bit new to R and this is the first time I'd like to use ggplot(). My aim is to create a few plots that will look like the template below, which is an output from the package effects for those who know it:
Given this data:
Average Error Area
1: 0.4407528 0.1853854 Loliondo
2: 0.2895050 0.1945540 Seronera
How can I replicate the plot seen in the image with labels, error bars as in Error and the line connecting both Average points?
I hope somebody can put me on the right track and then I will go from there for other data I have.
Any help is appreciated!
Using ggplot2::geom_errorbar you can add error bars by first deriving your ymin and ymax.
library(ggplot2)
library(dplyr)
df <- tibble::tribble(~Average, ~Error, ~Area,
0.4407528, 0.1853854, "Loliondo",
0.2895050, 0.1945540, "Seronera")
dfnew <- df %>%
mutate(ymin = Average - Error,
ymax = Average + Error)
p <- ggplot(data = dfnew, aes(x = Area, y = Average)) +
geom_point(colour = "blue") + geom_line(aes(group = 1), colour = "blue") +
geom_errorbar(aes(x = Area, ymin = ymin, ymax = ymax), colour = "purple")
Here's a quick and dirty one that is similar to what was just posted:
library(tidyverse)
df <-
tibble(
average = c(0.44, 0.29),
error = c(0.185, 0.195),
area = c("Loliondo", "Seronera")
)
df %>%
ggplot(aes(x = area)) +
geom_line(
aes(y = average, group = 1),
color = "blue"
) +
geom_errorbar(
aes(ymin = average - 0.5 * error, ymax = average + 0.5 * error),
color = "purple",
width = 0.1
)
The trickiest part here is the group = 1 segment, which you need for the line to be drawn with factors on the x axis.
The aes(x = area) goes up top because it's used in both geoms, while the y, group, ymin, and ymax are used only locally. The color and width arguments appear outside of the aes() call since they are used for appearance modifications.
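Note that the two answers treat Error differently: the first draws bars of Average ± Error, the second of average ± 0.5 * error. Which one is right depends on what Error means in your data. The sketch below assumes Error is the half-width of the bar (an assumption, since the question does not say), and also adds the points, which the second snippet leaves out. The ggplot2 part only runs if the package is installed.

```r
df <- data.frame(
  Average = c(0.4407528, 0.2895050),
  Error   = c(0.1853854, 0.1945540),
  Area    = c("Loliondo", "Seronera")
)

# Assumption: Error is a half-width, so the bar runs from Average - Error to Average + Error.
df$ymin <- df$Average - df$Error
df$ymax <- df$Average + df$Error

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Area, y = Average)) +
    geom_line(aes(group = 1), colour = "blue") +   # group = 1 joins points across the discrete axis
    geom_point(colour = "blue") +
    geom_errorbar(aes(ymin = ymin, ymax = ymax), colour = "purple", width = 0.1) +
    labs(x = "Area", y = "Average")
  # print(p)  # uncomment to render
}
```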

Implementing paired lines into boxplot.ggplot2

I have a set of paired data, and I'm using ggplot2.boxplot (of the easyGgplot2 package) with added (jittered) individual data points:
ggplot2.boxplot(data=INdata,xName='condition',yName='pain',groupName='condition',showLegend=FALSE,
position="dodge",
addDot=TRUE,dotSize=3,dotPosition=c("jitter", "jitter"),jitter=0.2,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intensity",mainTitle="Pain intensity",
brewerPalette="Paired")
INdata:
ID,condition,pain
1,Treatment,4.5
3,Treatment,12.5
4,Treatment,16
5,Treatment,61.75
6,Treatment,23.25
7,Treatment,5.75
8,Treatment,5.75
9,Treatment,5.75
10,Treatment,44.5
11,Treatment,7.25
12,Treatment,40.75
13,Treatment,17.25
14,Treatment,2.75
15,Treatment,15.5
16,Treatment,15
17,Treatment,25.75
18,Treatment,17
19,Treatment,26.5
20,Treatment,27
21,Treatment,37.75
22,Treatment,26.5
23,Treatment,15.5
25,Treatment,1.25
26,Treatment,5.75
27,Treatment,25
29,Treatment,7.5
1,No Treatment,34.5
3,No Treatment,46.5
4,No Treatment,34.5
5,No Treatment,34
6,No Treatment,65
7,No Treatment,35.5
8,No Treatment,48.5
9,No Treatment,35.5
10,No Treatment,54.5
11,No Treatment,7
12,No Treatment,39.5
13,No Treatment,23
14,No Treatment,11
15,No Treatment,34
16,No Treatment,15
17,No Treatment,43.5
18,No Treatment,39.5
19,No Treatment,73.5
20,No Treatment,28
21,No Treatment,12
22,No Treatment,30.5
23,No Treatment,33.5
25,No Treatment,20.5
26,No Treatment,14
27,No Treatment,49.5
29,No Treatment,7
The resulting plot looks like this:
However, since this is paired data, I want to represent this in the plot - specifically to add lines between paired datapoints. I've tried adding
... + geom_line(aes(group = ID))
..but I am not able to implement this into the ggplot2.boxplot code. Instead, I get this error:
Error in if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
argument is not interpretable as logical
In addition: Warning message:
In if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
the condition has length > 1 and only the first element will be used
Grateful for any input on this!
I do not know the package that ggplot2.boxplot comes from, but I will show you how to perform the requested operation in ggplot.
The requested output is a bit problematic for ggplot, since you want both the points and the lines connecting them to be jittered by the same amount. One way to do that is to jitter the points prior to making the plot. But the x axis is discrete, so here is a workaround:
b <- runif(nrow(df), -0.1, 0.1)
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition), y = pain, group = condition))+
geom_point(aes(x = as.numeric(condition) + b, y = pain)) +
geom_line(aes(x = as.numeric(condition) + b, y = pain, group = ID)) +
scale_x_continuous(breaks = c(1,2), labels = c("No Treatment", "Treatment"))+
xlab("condition")
First I made a vector called b to jitter by, and converted the x axis to numeric so I could add b to the x coordinates. Later I relabeled the x axis.
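To make the shared-jitter idea explicit, here is a minimal sketch that stores the jittered position as a column, so that the points and the lines are guaranteed to read the same x value. The toy data frame uses the first three IDs from the question; the column name x_jit and the seed are my own choices. The ggplot2 part only runs if the package is installed.

```r
set.seed(42)  # arbitrary seed, only so the offsets are reproducible

toy <- data.frame(
  ID        = rep(c(1, 3, 4), times = 2),
  condition = factor(rep(c("Treatment", "No Treatment"), each = 3)),
  pain      = c(4.5, 12.5, 16, 34.5, 46.5, 34.5)
)

# One offset per row; factor levels map to x positions 1 and 2.
b <- runif(nrow(toy), -0.1, 0.1)
toy$x_jit <- as.numeric(toy$condition) + b

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy) +
    geom_boxplot(aes(x = as.numeric(condition), y = pain, group = condition)) +
    geom_point(aes(x = x_jit, y = pain)) +
    geom_line(aes(x = x_jit, y = pain, group = ID)) +
    scale_x_continuous(breaks = c(1, 2), labels = levels(toy$condition)) +
    xlab("condition")
  # print(p)  # uncomment to render
}
```

Taking the labels from levels(toy$condition) rather than typing them by hand keeps the axis labels in step with the numeric coding of the factor.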
I do agree with eipi10's comment that the plot works better without jitter:
ggplot(df, aes(condition, pain)) +
geom_boxplot(width=0.3, size=1.5, fatten=1.5, colour="grey70") +
geom_point(colour="red", size=2, alpha=0.5) +
geom_line(aes(group=ID), colour="red", linetype="11") +
theme_classic()
and the updated plot with jittered points eipi10 style:
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition),
y = pain,
group = condition),
width=0.3,
size=1.5,
fatten=1.5,
colour="grey70")+
geom_point(aes(x = as.numeric(condition) + b,
y = pain),
colour="red",
size=2,
alpha=0.5) +
geom_line(aes(x = as.numeric(condition) + b,
y = pain,
group = ID),
colour="red",
linetype="11") +
scale_x_continuous(breaks = c(1,2),
labels = c("No Treatment", "Treatment"),
expand = c(0.2,0.2))+
xlab("condition") +
theme_classic()
Although I like the old-school way of plotting with ggplot as shown by @missuse's answer, I wanted to check whether this was also possible using your ggplot2.boxplot-based code.
I loaded your data:
'data.frame': 52 obs. of 3 variables:
$ ID : int 1 3 4 5 6 7 8 9 10 11 ...
$ condition: Factor w/ 2 levels "No Treatment",..: 2 2 2 2 2 2 2 2 2 2 ...
$ pain : num 4.5 12.5 16 61.8 23.2 ...
And called your code, adding geom_line at the end as you suggested yourself:
ggplot2.boxplot(data = INdata,xName = 'condition', yName = 'pain', groupName = 'condition',showLegend = FALSE,
position = "dodge",
addDot = TRUE, dotSize = 3, dotPosition = c("jitter", "jitter"), jitter = 0,
ylim = c(0,100),
backgroundColor = "white",xtitle = "",ytitle = "Pain intensity", mainTitle = "Pain intensity",
brewerPalette = "Paired") + geom_line(aes(group = ID))
Note that I set jitter to 0. The resulting graph looks like this:
If you don't set jitter to 0, the lines still run from the middle of each boxplot, ignoring the horizontal location of the dots.
Not sure why your call gives an error. I thought it might be a factor issue, but I see that my ID variable is not factor class.
I implemented missuse's jitter solution into the ggplot2.boxplot approach in order to align the dots and lines. Instead of using "addDot", I had to instead add dots using geom_point (and lines using geom_line) after, so I could apply the same jitter vector to both dots and lines.
b <- runif(nrow(df), -0.2, 0.2)
ggplot2.boxplot(data=df,xName='condition',yName='pain',groupName='condition',showLegend=FALSE,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intensity",mainTitle="Pain intensity",
brewerPalette="Paired") +
geom_point(aes(x=as.numeric(condition) + b, y=pain),colour="black",size=3, alpha=0.7) +
geom_line(aes(x=as.numeric(condition) + b, y=pain, group=ID), colour="grey30", linetype="11", alpha=0.7)

ggplot2 create barplot with CIs with descriptive data (i.e., without raw data)

In base R it is easy (but cumbersome) to create a plot with error bars based on descriptive data. With ggplot2 I am struggling to do so, and all the examples I have found are based on the raw data.
Specifically, how can I create a barplot with confidence intervals for a simple two-group design? M1 = 3, M2 = 4, SD1 = 1, SD2 = 1.2, n1 = 111, n2 = 222? I started off simply with
ggplot(aes(x=c(1:2), y=c(3, 4))) + geom_bar()
# or
ggplot(aes(y=c(3, 4))) + geom_bar()
but not even this seems to work to create a barplot.
Any suggestions?
What about using ggplot2::stat_summary()? You can let it take care of your mean and se calculations (it relies on library(Hmisc) for most of these summary functions, so look there for more help).
library(ggplot2)
ggplot(mtcars, aes(cyl, mpg)) +
stat_summary(geom = "bar", fun = mean) +  # ggplot2 >= 3.3.0; older versions use fun.y
stat_summary(geom = "errorbar", fun.data = mean_se)
Adjust width = for skinnier bars or error bars.
You can also use a true confidence interval with mean_cl_normal or mean_cl_boot and for a better visualization of the data dispersion:
ggplot(mtcars, aes(cyl, mpg)) +
stat_summary(geom = "crossbar", fun.data = mean_cl_normal)
Edit:
If you want to recreate a plot from a published paper, just roll the summary statistics into a data.frame first:
datf <- data.frame(
group = c("1", "2"),
means = c(3,4),
sds = c(1,1.2),
ns = c(111, 222)
)
# add your CI calcs as column called upr and lwr
library(tidyverse)
datf <- datf %>% mutate(lwr = means - (qnorm(.975)*(sds/sqrt(ns))),
upr = means + (qnorm(.975)*(sds/sqrt(ns))))
ggplot(datf, aes(group, y = means, ymin = lwr, ymax = upr)) +
geom_crossbar()
Or, if you must, the traditional columns with error bars:
ggplot(datf, aes(group, y = means, ymin = lwr, ymax = upr)) +
geom_col() +
geom_errorbar()
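The interval above uses the normal quantile qnorm(.975). As a rough sketch of the alternative, the same summary statistics can be turned into a t-based interval with n - 1 degrees of freedom. The column names with _z and _t suffixes are mine, and which interval is appropriate depends on your design; with samples this large the two are nearly identical.

```r
datf <- data.frame(
  group = c("1", "2"),
  means = c(3, 4),
  sds   = c(1, 1.2),
  ns    = c(111, 222)
)

se     <- datf$sds / sqrt(datf$ns)        # standard error of each mean
z_crit <- qnorm(0.975)                    # normal critical value, as in the answer
t_crit <- qt(0.975, df = datf$ns - 1)     # t critical value, one per group

datf$lwr_z <- datf$means - z_crit * se
datf$upr_z <- datf$means + z_crit * se
datf$lwr_t <- datf$means - t_crit * se
datf$upr_t <- datf$means + t_crit * se
print(datf)
```

Either pair of columns can be mapped to ymin and ymax in geom_errorbar() or geom_crossbar() exactly as shown above.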
You can draw an error bar to whatever values you want. Error bars have aesthetics called ymin and ymax that you can set. Here I draw the bars +/- 1 standard deviation from the mean:
dd<-read.table(text="sample mean sd n
1 3 1 111
2 4 1.2 222", header=T)
ggplot(dd, aes(sample)) +
geom_col(aes(y=mean)) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd))

Is it possible to expand geom_ribbon to xlimits?

I have the following code (as an example) which I would like to adapt so that the ribbon extends across the entire x range, as geom_hline() does. The ribbon indicates which values are within accepted bounds. My real application sometimes has no upper or lower bound, so the hline by itself is not enough to determine whether values are within bounds.
library(ggplot2)
set.seed(2016-12-19)
dates <- seq(as.Date('2016-01-01'), as.Date('2016-12-31'), by = 1)
values <- rexp(length(dates), 1)
groups <- rpois(length(dates), 5)
temp <- data.frame(
date = dates,
value = values,
group = groups,
value.min = 0,
value.max = 2
)
ggplot(temp, aes(date, value)) +
geom_ribbon(aes(ymin = value.min, ymax = value.max), fill = '#00cc33', alpha = 0.6) +
geom_hline(aes(yintercept = value.min)) +
geom_hline(aes(yintercept = value.max)) +
geom_point() +
facet_wrap(~group)
I tried setting the x in geom_ribbon to dates as well, but then only fractions of the range are filled.
Also I tried this:
geom_ribbon(aes(ymin = -Inf, ymax = 2, x = dates), data = data.frame(), fill = '#00cc33', alpha = 0.6)
but then the data seems to be overwritten for the entire plot and I get the error Error in eval(expr, envir, enclos) : object 'value' not found. Even if it worked, the range would still be too narrow, as the x limits are expanded.
Here's one way to do it:
ggplot(temp, aes(as.numeric(date), value)) +
geom_rect(aes(xmin=-Inf, xmax=Inf, ymin = value.min, ymax = value.max), temp[!duplicated(temp$group),], fill = '#00cc33', alpha = 0.6) +
geom_hline(aes(yintercept = value.min)) +
geom_hline(aes(yintercept = value.max)) +
geom_point() +
scale_x_continuous(labels = function(x) format(as.Date(x, origin = "1970-01-01"), "%b %y")) +
facet_wrap(~group)
Note that I used as.numeric(date), because otherwise Inf and -Inf yield
Error: Invalid input: date_trans works with objects of class Date only
To get date labels for numeric values, I adjusted the scale_x_continuous labels accordingly. (Although they are not exact here. You may want to adjust it by using the exact dates instead of month/year, or alternatively set manual breaks using the breaks argument and for example seq.Date.)
Also note that I used temp[!duplicated(temp$group),] to avoid overplotting and thus maintaining the desired alpha transparency.
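Two of the steps above can be checked in isolation with base R: the Date/numeric round trip behind as.numeric(date) and the custom labeller, and the !duplicated() filter that keeps one rectangle row per facet. The small temp data frame here is a made-up example, not the data from the question.

```r
# A Date is stored as the number of days since 1970-01-01.
d      <- as.Date("2016-01-01")
d_num  <- as.numeric(d)
d_back <- as.Date(d_num, origin = "1970-01-01")   # what the labeller does before format()
print(format(d_back, "%b %y"))

# Made-up data: several rows per group, identical bounds within a group.
temp <- data.frame(
  group     = c(3, 5, 3, 7, 5),
  value.min = 0,
  value.max = 2
)

# Keep only the first row of each group, so each facet gets exactly one rectangle.
one_per_group <- temp[!duplicated(temp$group), ]
print(one_per_group)
```

Without the filter, every row would draw its own semi-transparent rectangle and the stacked layers would look darker than the requested alpha.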
Based on lukeA's answer I produced the following code, which I think is a little simpler:
library(ggplot2)
set.seed(2016-12-19)
dates <- seq(as.Date('2016-01-01'), as.Date('2016-12-31'), by = 1)
values <- rexp(length(dates), 1)
groups <- rpois(length(dates), 5)
temp <- data.frame(
date = dates,
value = values,
group = groups,
value.min = 1,
value.max = 2
)
bounds <- data.frame(
xmin = -Inf,
xmax = Inf,
ymin = temp$value.min[1],
ymax = temp$value.max[1]
)
ggplot(temp, aes(date, value)) +
geom_rect(
aes(
xmin = as.Date(xmin, origin = '1970-01-01'),
xmax = as.Date(xmax, origin = '1970-01-01'),
ymin = ymin,
ymax = ymax
),
data = bounds,
fill = '#00cc33',
alpha = 0.3,
inherit.aes = FALSE
) +
geom_point() +
facet_wrap(~group)
I created a temporary data frame containing the bounds for the rectangle, and added inherit.aes = FALSE, since otherwise the layer apparently inherits the aesthetics from the main ggplot() call and looks for them in bounds (this still seems like a bug to me). By converting -Inf and Inf to the correct data type I didn't need the custom labeller (if you're dealing with POSIXt, use the appropriate as.POSIXct/as.POSIXlt conversion, as automatic transformation fails).

Resources