This is a follow-on to a previous question about getting some custom error bars.
The look of the plot is how I need it, so there is no need to comment solely on that (though I'm happy to hear opinions attached to other help).
Because these plots are generated in a loop, and the error bars are only added if a condition is met, I can't simply merge all the data up front. For the purpose of this exercise, assume the plot data and the error bar data come from different data frames.
I have a ggplot, to which I attempt to add some error bars using a different data frame. When I call the plot, it says that it cannot find the y values from the parent plot, even though I'm just trying to add error bars using new data. I know this has to be a syntax error, but I am stumped...
First, let's generate the data and the plot:
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# plot 2015 data
c <- ggplot(data.2015, aes(x = area, y = score, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor,
breaks = breaks.major)
Calling the plot (c) produces a nice plot as expected. Now let's set up the error bars and attempt to add them as a new layer in the plot c:
# get the error bar values
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
#add error bars to original plot
c <- c+
geom_errorbar(data=alldat, aes(ymin = plotscore, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE)
When I call c now, I get
"Error in eval(expr, envir, enclos) : object 'score' not found"
Why does it look for data.2015$score when I just want it to overlay the geom_errorbar using the second alldat dataframe?
EDIT: I've tried specifying the ymin/ymax values for the error bars using alldat$plotscore and alldat$score.2014 (which I am sure is bad practice). It plots, but the bars are in the wrong positions or out of order with the plot (e.g. swapped around, on the benchmark bars instead, etc.).
In my experience, this error about some variable not being found tells me that R went to look in a data.frame for a variable and it wasn't there. Sometimes the solution is as simple as fixing a typo, but in your case the score variable isn't in the dataset you used to make your error bars.
names(alldat)
[1] "area" "group" "score.2015" "score.2014" "plotscore" "direction"
The y variable is a required aesthetic for geom_errorbar. Because you set a y variable globally within ggplot, the other geoms inherit the global y unless you specifically map it to a different variable. In the current dataset, you'll need to map y to the 2015 score variable.
geom_errorbar(data=alldat, aes(y = score.2015, ymin = plotscore,
ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE)
In your comment you indicated you also had to add fill to geom_errorbar, but I didn't find that necessary when I ran the code (you can see above that group is a variable in the second dataset in the example you give).
The other option would be to make sure the 2015 score variable is still named score after merging. This can be done by changing the suffixes argument in merge. Then score will be in the second dataset and you won't have to set your y variable in geom_errorbar.
alldat2 = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c("", ".2014"))
...
names(alldat2)
[1] "area" "group" "score" "score.2014" "plotscore" "direction"
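For reference, here is a minimal end-to-end sketch of the first option, reusing the data from the question. The object name p is just an illustrative choice, and the line width and axis scale settings are left out to keep it short. Rows where plotscore is NA (the Benchmark rows) are dropped from the error bar layer with a warning.

```r
library(ggplot2)

data.2015 <- data.frame(score = c(-50, 20, 15, -40, -10, 60),
                        area  = rep(c("first", "second", "third"), 2),
                        group = rep(c("Findings", "Benchmark"), each = 3))
data.2014 <- data.frame(score = c(-30, 40, -15),
                        area  = c("first", "second", "third"),
                        group = "Findings")

# After this merge there is no column called "score" any more
alldat <- merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
                suffixes = c(".2015", ".2014"))
alldat$plotscore <- with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction <- with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] <- "absent"

p <- ggplot(data.2015, aes(x = area, y = score, fill = group)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  # Override the inherited y so the layer does not look for "score" in alldat
  geom_errorbar(data = alldat,
                aes(y = score.2015, ymin = plotscore, ymax = score.2014,
                    color = direction),
                position = position_dodge(width = 0.9), show.legend = FALSE) +
  coord_flip()
p
```

x and fill are still inherited from the global mapping, which is why the error bars are dodged onto the matching bars.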
I have a boxplot which summarizes ~60000 turbidity data points into quartiles, median, whiskers and sometimes outliers. Often a few outliers are so high up that the whole plot is compressed at the bottom, and I therefore choose to omit the outliers. However, I have also added averages to the plots as points, and I want these to always be plotted. The problem is that the y-axis of the boxplot does not adjust to the added average points, so when averages are far above the box they are simply plotted outside the chart window (see X-point for 2020, but none for 2021 or 2022). Normally with this parameter, the average will be between the whisker end and the most extreme outliers. This is normal, and expected in the data.
I have tried to capture the boxplot y-axis range to compare with the average, and then setting the ylim if needed, but I just don't know how to retrieve these axis ranges.
My code is just
boxplot(...)
points(...)
and works as far as plotting the points. Just not adjusting the y-axis.
Question 1: is it not possible to get the boxplot to redraw with the new points data? I thought this was standard in R plots.
Question 2: if not, how can I dynamically adjust the y-axis range?
Let's try to show a concrete example of the problem with some simulated data:
set.seed(1)
df <- data.frame(y = c(rexp(99), 150), x = rep(c("A", "B"), each = 50))
Here, group "B" has a single outlier at 150, even though most values are a couple of orders of magnitude lower. That means that if we try to draw a boxplot, the boxes get squished at the bottom of the plot:
boxplot(y ~ x, data = df, col = "lightblue")
If we remove outliers, the boxes plot nicely:
boxplot(y ~ x, data = df, col = "lightblue", outline = FALSE)
The problem comes when we want to add a point indicating the mean value for each boxplot, since the mean of "B" lies outside the plot limits. Let's calculate and plot the means:
mean_vals <- sapply(split(df$y, df$x), mean)
mean_vals
#> A B
#> 0.9840417 4.0703334
boxplot(y ~ x, data = df, col = "lightblue", outline = FALSE)
points(1:2, mean_vals, cex = 2, pch = 16, col = "red")
The mean for "B" is missing because it lies above the upper range of the plot.
The secret here is to use boxplot.stats to get the limits of the whiskers. By concatenating our vector of means to this vector of stats and getting its range, we can set our plot limits exactly where they need to be:
y_limits <- range(c(boxplot.stats(df$y)$stats, mean_vals))
Now we apply these limits to a new boxplot and draw it with the points:
boxplot(y ~ x, data = df, outline = FALSE, ylim = y_limits, col = "lightblue")
points(1:2, mean_vals, cex = 2, pch = 16, col = "red")
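One caveat: boxplot.stats(df$y) summarises the pooled data, not each group separately. If the groups differ a lot, a variation you could try is to take the per-group whisker limits from boxplot() itself with plot = FALSE. This is a sketch on the same simulated data; bp is just an illustrative name.

```r
set.seed(1)
df <- data.frame(y = c(rexp(99), 150), x = rep(c("A", "B"), each = 50))
mean_vals <- sapply(split(df$y, df$x), mean)

# plot = FALSE returns the statistics without drawing anything;
# bp$stats holds one column per group (lower whisker, hinges, median, upper whisker)
bp <- boxplot(y ~ x, data = df, plot = FALSE)

# Cover both the whiskers of every group and the means
y_limits <- range(bp$stats, mean_vals)

boxplot(y ~ x, data = df, outline = FALSE, ylim = y_limits, col = "lightblue")
points(seq_along(mean_vals), mean_vals, cex = 2, pch = 16, col = "red")
```

This way the limits cannot cut off a whisker of one group just because the pooled data has shorter whiskers.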
For comparison, you could do the whole thing in ggplot like this:
library(ggplot2)
ggplot(df, aes(x, y)) +
geom_boxplot(fill = "lightblue", outlier.shape = NA) +
geom_point(size = 3, color = "red", stat = "summary", fun = mean) +
coord_cartesian(ylim = y_limits) +
theme_classic(base_size = 16)
Created on 2023-02-05 with reprex v2.0.2
I'm trying to plot an adjusted survival curve, but struggling with changing the line types by group. I'm able to customise other aspects of the plot using typical ggplot2 language, but I've hit a wall with changing line type.
Example:
library(survival)
library(survminer)
fit2 <- coxph( Surv(stop, event) ~ size + strata(rx), data = bladder )
ggadjustedcurves(fit2,
variable = "rx",
data = bladder,
method = "average",
palette = c("#E69F00", "#56B4E9"),
size = 1.3,
legend = "right",
legend.title = expression(bold("Legend title")),
xlab = "Time",
font.legend = 12) +
theme(legend.text.align = 0.5)
I've tried adding in:
geom_line(aes(linetype = c(1, 2)))
add.params = list(linetype = c(1, 2))
and just
linetype = c(1, 2)
but nothing seems to work.
First you need to look at the code.
ggadjustedcurves
It appears that ggadjustedcurves passes all its arguments on to helper functions that depend on the "method" argument, in this case "average", so now look at that (hidden) function:
getAnywhere( ggadjustedcurves.average )
And note that there is no provision to accept additional arguments beyond the few that are defined in the "master function", i.e. no use of R's ellipsis mechanism or specifications of other possible aes-arguments besides size. (It's also not using geom_line.) So you need to change both the master function and the helper function to accept a "linetype" argument. Here I show how to modify the helper function (although this needs to be done to the ggadjustedcurves function as well and maybe the rest of the helper functions if you want this to be completely general):
assignInNamespace('ggadjustedcurves.average',
function (data, fit, variable, size = 1, ..., linetype = 1) # a default of linetype = linetype would refer to itself
{
time <- surv <- NULL
lev <- sort(unique(data[, variable]))
pred <- survexp(as.formula(paste("~", variable)), data = data,
ratetable = fit)
curve <- data.frame(time = rep(c(0, pred$time), length(lev)),
variable = factor(rep(lev, each = 1 + length(pred$time))),
surv = c(rbind(1, pred$surv)))
ggplot(curve, aes(x = time, y = surv, color = variable)) +
geom_step(size = size, ..., linetype=linetype) # not geom_line
},
pos="package:survminer")
If you do an SO search on "geom_segment linetype" you find that geom_segment (which is what geom_step uses) is not constructed in a manner that makes it easy to give it short vectors to modify "contiguous" lengths of step function results. See "ggplot error using linetype and group aesthetics". This means you would need to use a for-loop or lapply to build separate "step-curves" if you need different line types.
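If the line type only needs to differ between groups (not along a single curve), it may be enough to map linetype to the grouping variable inside aes() instead of passing a vector. Here is a sketch on a made-up data frame shaped like the curve object built in the helper above. The numbers are invented for illustration and this does not touch survminer's internals.

```r
library(ggplot2)

# Toy stand-in for the "curve" data frame built inside the helper (values invented)
curve <- data.frame(
  time     = rep(c(0, 10, 20, 30), times = 2),
  surv     = c(1, 0.9, 0.7, 0.6, 1, 0.8, 0.5, 0.3),
  variable = factor(rep(c("1", "2"), each = 4))
)

p <- ggplot(curve, aes(x = time, y = surv,
                       color = variable, linetype = variable)) +
  geom_step() +
  # One line type per group, keyed by factor level
  scale_linetype_manual(values = c("1" = "solid", "2" = "dashed"))
p
```

Because the line type is constant within each group, geom_step accepts it without complaint.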
I've searched all over for an answer to this issue. I have a plot created in visreg, and I have successfully changed the line and point colors with the following code.
visreg(RTmodel, "Verb_Type",
by = "Trust_Untrust", ylab = "LogRT", gg = T,
points = F, overlay = T) +
ggtitle("LogRT LME") +
scale_colour_manual(values = cb)
Awesome, just the colors I wanted! But the legend to the side is shaded with the original colors used by base Visreg. So, it doesn't really work with the colors I've chosen.
I've tried the following solutions:
Entering legend=FALSE to get rid of the legend.
Adding guides(color=guide_legend(title="Trustworthiness")) to the end, but that only creates a legend with the correct manually entered colors AND keeps an empty legend with the filled in boxes! (I've also tried this with legends=F and no dice).
visreg(RTmodel, "Verb_Type",
by = "Trust_Untrust", ylab = "LogRT", gg = T,
points = F, overlay = T, legend = F) +
ggtitle("LogRT ~ Verb Type by Trustworthiness") +
scale_colour_manual(values = cb) +
guides(color=guide_legend(title = "Trustworthiness"))
Entering plot=FALSE and creating a ggplot of the data visreg returns, but that comes with its own issues!
A previously given suggestion of adding one or a combination of the following:
fill=list(col=c("#56B4E9", "#D55E00")),
point=list(col=c("#56B4E9", "#D55E00")),
line=list(col=c("#56B4E9", "#D55E00"))
But I get the following error.
visreg(RTmodel, "Verb_Type",
by = "Trust_Untrust", ylab = "LogRT", gg = F,
points = T, overlay = T, legend = F,
fill = list(col = c("#56B4E9", "#D55E00")),
point = list(col = c("#56B4E9", "#D55E00")),
line = list(col = c("#56B4E9", "#D55E00")))
Error in plot.visreg(v, ...) :
formal argument "points.par" matched by multiple actual arguments
I've tried every combination I can in the visreg code T/F changes (gg, plots, legend etc). And the only two packages I have loaded are Tidyverse and Visreg. I think ultimately, this might be a bug in the system.
I want to show the added line via geom_abline in the legend since the bar chart is denoted in the x axis labels.
How embarrassing, not sure how I forgot toy data. I also cleaned up the example, making sure I was running the most up-to-date version of R and ggplot (and reshape!). I forgot how it can make a difference sometimes.
The end product is a bar chart with the added line (indicating the average) with this information showing in the legend, so a red dotted line that says "County Average".
library(ggplot2)
DataToPlot.. <- data.frame(UGB = c("EUG","SPR","COB","VEN"),
Rate = c( 782, 798,858,902))
ggplot(DataToPlot.. ,y = Rate, x = UGB) +
geom_bar(aes(x=UGB,y=Rate, fill = UGB),stat="identity",show.legend = FALSE) +
scale_fill_brewer(palette="Set3") +
geom_abline(aes(intercept = 777, slope = 0), colour = "red",
size = 1.25, linetype="dashed",show.legend = TRUE)
After playing around for a while (it was not as easy as I expected) I used this:
library(ggplot2)
DataToPlot.. <- data.frame(UGB = c("EUG","SPR","COB","VEN"),
Rate = c( 782, 798,858,902))
x <- c(0.5,nrow(DataToPlot..)+0.5)
AvgLine.. <- data.frame(UGB=x,Rate=777,avg="777")
ggplot(DataToPlot.., aes(x = UGB, y = Rate)) +
geom_bar(aes(x=UGB,y=Rate, fill = UGB),stat="identity",show.legend=TRUE ) +
scale_fill_brewer(palette="Set3") +
geom_line(data=AvgLine..,aes(x=UGB,y=Rate,linetype=avg),
colour = "red", size = 1.25) +
scale_linetype_manual(values=c("777"="dashed")) +
# make the guide wider and specify the order
guides(linetype=guide_legend(title="County Average",order=1,keywidth = 3),
fill=guide_legend(title="UGB",order=2)) # the bars use the fill aesthetic, not color
Note I couldn't coerce geom_abline to make its own guide. I had to create a dataframe. The x-coordinates for that line are basically the factor values, and I adjusted them to reach beyond the edges of the plot.
To get this:
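Another route that may be simpler, if a horizontal reference line is all you need: map linetype to a label inside aes() on a geom_hline layer, so that ggplot2 builds the legend itself. This is only a sketch. The label column and the p object are illustrative names, and geom_col is used as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

DataToPlot.. <- data.frame(UGB  = c("EUG", "SPR", "COB", "VEN"),
                           Rate = c(782, 798, 858, 902))
# One-row data frame for the reference line; "label" becomes the legend entry
AvgLine.. <- data.frame(Rate = 777, label = "County Average")

p <- ggplot(DataToPlot.., aes(x = UGB, y = Rate)) +
  geom_col(aes(fill = UGB), show.legend = FALSE) +
  scale_fill_brewer(palette = "Set3") +
  # Mapping linetype (rather than setting it) is what creates the legend
  geom_hline(data = AvgLine.., aes(yintercept = Rate, linetype = label),
             colour = "red") +
  scale_linetype_manual(name = NULL, values = c("County Average" = "dashed"))
p
```

The line spans the full panel width automatically, so no manual x-coordinates are needed.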
How can I show the dots colored using the mosaic package to do a dotplot?
library(mosaic)
n=500
r =rnorm(n)
d = data.frame( x = sample(r ,n= 1,size = n, replace = TRUE), color = c(rep("red",n/2), rep("green",n/2)))
dotPlot(d$x,breaks = seq(min(d$x)-.1,max(d$x)+.1,.1))
Right now all the dots are blue, but I would like them to be colored according to the color column in the data table.
If you are still interested in a mosaic/lattice solution rather than a ggplot2 solution, here you go.
dotPlot( ~ x, data = d, width = 0.1, groups = color,
par.settings=list(superpose.symbol = list(pch = 16, col=c("green", "red"))))
resulting plot
Notice also:
As with ggplot2, the colors are not determined by the values in your color variable but by the theme. You can use par.settings to modify this on the level of a plot or trellis.par.set() to change the defaults.
It is preferable to use a formula and data = and to avoid the $ operator.
You can use the width argument rather than breaks if you want to set the bin width. (You can use the center argument to control the centers of the bins if that matters to you. By default, 0 will be the center of a bin.)
You need to add stackgroups=TRUE so that the two different colors aren't plotted on top of each other.
n=20
set.seed(15)
d = data.frame(x = sample(seq(1,10,1), n, replace = TRUE),
color = c(rep("red",n/2), rep("green",n/2)))
table(d$x[order(d$x)])
length(d$x[order(d$x)])
binwidth= 1
ggplot(d, aes(x = x)) +
geom_dotplot(breaks = seq(0.5,10.5,1), binwidth = binwidth,
method="histodot", aes(fill = color),
stackgroups=TRUE) +
scale_x_continuous(breaks=1:10)
Also, ggplot uses its internal color palette for the fill aesthetic. You'd get the same colors regardless of what you called the values of the "color" column in your data. Add scale_fill_manual(values=c("green","red")) if you want to set the colors manually.
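As a sketch of that last point, giving scale_fill_manual a named vector ties each colour to a value of the "color" column explicitly, so the result does not depend on the order of the factor levels. The data is the same simulated data as above; p is just an illustrative name.

```r
library(ggplot2)

set.seed(15)
n <- 20
d <- data.frame(x     = sample(1:10, n, replace = TRUE),
                color = rep(c("red", "green"), each = n / 2))

p <- ggplot(d, aes(x = x, fill = color)) +
  geom_dotplot(binwidth = 1, method = "histodot", stackgroups = TRUE) +
  # Names are values of the "color" column; entries are the fills to draw
  scale_fill_manual(values = c(red = "red", green = "green")) +
  scale_x_continuous(breaks = 1:10)
p
```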