I am trying to use geom_ribbon to fill the area under a geom_smooth line in ggplot, but there are gaps under the curve where the color is not shaded. My data consist of six discrete proportion values on the y axis. Is there a way to use ymax in geom_ribbon differently so that the fill meets the curved line?
Here is the reproducible code for the data:
q1 <- structure(list(Session = 1:6, Counts = c(244L, 358L, 322L, 210L,
156L, 100L), Density_1000 = c(NA, NA, NA, NA, NA, NA), Proportion_Activity = c(0.175539568,
0.257553957, 0.231654676, 0.151079137, 0.112230216, 0.071942446
), Lifestage = structure(c(3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Adult",
"Nymph", "Larvae"), class = "factor")), .Names = c("Session",
"Counts", "Density_1000", "Proportion_Activity", "Lifestage"), row.names = 13:18, class = "data.frame")
Here is the ggplot code:
ggplot(q1,aes(x=Session, y=Proportion_Activity, col = Lifestage,fill=Lifestage)) +
geom_smooth(method = 'loess') +
geom_ribbon(data = q1,aes(x = Session, ymin=0, ymax=Proportion_Activity, alpha=0.5))
You can just use the area geom with the stat_smooth layer. For example:
ggplot(q1,aes(x=Session, y=Proportion_Activity, col = Lifestage,fill=Lifestage)) +
geom_smooth(method = 'loess') +
stat_smooth(se=FALSE, geom="area", method = 'loess', alpha=.5)
Though I really think smoothing should be used when you have a lot of data and want to show a general pattern. Using it like this to "smooth" the line to make it look pretty doesn't make it clear that you have modeled the results, and it shows data in places where you did not observe it.
You can do something like this.
p1 <- ggplot(q1,aes(x=Session, y=Proportion_Activity)) +
geom_smooth(method = 'loess', aes(color = Lifestage))
g1 <- ggplot_build(p1)
p2 <- data.frame(Session = g1$data[[1]]$x,
Proportion_Activity = g1$data[[1]]$y,
Lifestage = structure(g1$data[[1]]$group, .Label = c("Larvae", "Nymph", "Adult"), class = "factor"))
p1 + geom_ribbon(data = p2, aes(x = Session, ymin = 0, ymax = Proportion_Activity, fill = Lifestage), alpha = 0.5)
You can also use geom_line instead of geom_smooth.
geom_line(stat = "smooth", method = 'loess', alpha = 0.5, aes(color = Lifestage))
You can also remove the color legend from geom_smooth/geom_line if you want: just add guides(color = "none") (older ggplot2 versions also accept color = FALSE), or use fill instead of color to remove that legend.
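The gaps in the question come from the ribbon following the six raw points while the smooth is drawn on a much finer grid. As a minimal sketch of the same idea without ggplot_build, the loess fit can be evaluated by hand and fed to geom_ribbon. This assumes that stats::loess with its default settings matches what geom_smooth(method = 'loess') fits; the names fit and grid_df and the rounded data values are only for illustration.

```r
library(ggplot2)

# The six proportions from the question, rounded for brevity.
q1 <- data.frame(
  Session = 1:6,
  Proportion_Activity = c(0.1755, 0.2576, 0.2317, 0.1511, 0.1122, 0.0719)
)

# Fit loess directly (assumption: default span and degree, as geom_smooth uses).
# With only six points loess may emit warnings, as it does inside geom_smooth.
fit <- suppressWarnings(loess(Proportion_Activity ~ Session, data = q1))

# Evaluate the fit on a fine grid so the ribbon edge follows the curve.
grid_df <- data.frame(Session = seq(1, 6, length.out = 80))
grid_df$Proportion_Activity <- suppressWarnings(predict(fit, newdata = grid_df))

p <- ggplot(q1, aes(Session, Proportion_Activity)) +
  geom_ribbon(data = grid_df, aes(ymin = 0, ymax = Proportion_Activity),
              fill = "steelblue", alpha = 0.5) +
  geom_line(data = grid_df, colour = "steelblue") +
  geom_point()
```

Printing p draws the shaded area, the fitted curve, and the observed points, which also makes it visible where the curve is modeled rather than observed.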
I am still working on finalizing a reproducible figure for publication. Reviewers would like the y-axis of the plot below to start at 0 and include an axis break ("//"). The y-axis needs to cover a large range (think 1500 units) while staying zoomed in tightly (think 300 units), so the reviewers want a break mark to show that the axis starts at 0 but skips ahead.
Example of what I can create:
Example of what I want (note the y axis; this was done manually in powerpoint in a similar figure):
My code:
ggplot(data = quad2,
aes(x, predicted, group = group)) +
geom_point(aes(shape = group), size = 6) +
scale_shape_manual(values=c(19, 1)) +
geom_line(size = 2,
aes(linetype = group),
color = "black") +
scale_linetype_manual(values = c("solid", "dashed")) +
geom_linerange(size = 1,
aes(ymin = predicted - conf.low,
ymax = predicted + conf.high),
color = "black",
alpha = .8) +
geom_segment(aes(xend = x,
yend = ifelse(group == "Control", conf.high, conf.low)),
arrow = arrow(angle = 90), color = "red")+
labs(x = "Time",
y = expression(bold("QUAD Volume (cm"^"3"*")")),
linetype = "",
shape = "") + #Legend title
scale_y_continuous(limits =c(1500, 2000))
Reproducible data:
dput(quad2)
structure(list(x = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L,
5L, 5L), .Label = c("PRE", "MID1", "MID2", "MID3", "POST"), class = "factor"),
predicted = c(1666.97185871754, 1660.27445165342, 1743.2831065274,
1678.48945165342, 1788.50605542978, 1637.40907049806, 1807.55826371403,
1639.78265640012, 1865.8766220711, 1652.91070173056), std.error = c(88.8033117577884,
91.257045996107, 92.9973963841595, 95.3834973421298, 95.0283457128716,
97.3739053806999, 95.6466346849776, 97.9142418717957, 93.3512943191676,
95.5735155125126), conf.low = c(0, 91.257045996107, 0, 95.3834973421298,
0, 97.3739053806999, 0, 97.9142418717957, 0, 95.5735155125126
), conf.high = c(88.8033117577884, 0, 92.9973963841595, 0,
95.0283457128716, 0, 95.6466346849776, 0, 93.3512943191676,
0), group = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("Intervention", "Control"), class = "factor")), class = "data.frame", row.names = c(NA,
-10L))
Plotting a discontinuous axis is made difficult for a reason: you should avoid doing it whenever possible. While I disagree with your reviewers, you can get down and dirty with the underlying grid graphics if you truly want a y-axis break.
First make your plot. The only thing I added was y-axis formatting and an axis line theme. We'll just label the bottom tick with "0".
plt <- ggplot(data = quad2,
aes(x, predicted, group = group)) +
geom_point(aes(shape = group), size = 6) +
scale_shape_manual(values=c(19, 1)) +
geom_line(size = 2,
aes(linetype = group),
color = "black") +
scale_linetype_manual(values = c("solid", "dashed")) +
geom_linerange(size = 1,
aes(ymin = predicted - conf.low,
ymax = predicted + conf.high),
color = "black",
alpha = .8) +
geom_segment(aes(xend = x,
yend = ifelse(group == "Control", conf.high, conf.low)),
arrow = arrow(angle = 90), color = "red")+
labs(x = "Time",
y = expression(bold("QUAD Volume (cm"^"3"*")")),
linetype = "",
shape = "") + #Legend title
scale_y_continuous(limits =c(1400, 2000),
breaks = seq(1400, 2000, by = 200),
labels = c(0, seq(1600, 2000, by = 200)),
expand = c(0,0,0.05,0)) +
theme(axis.line = element_line())
Then, we'll make this into a gtable and grab the y-axis line:
gt <- ggplotGrob(plt)
is_yaxis <- which(gt$layout$name == "axis-l")
yaxis <- gt$grobs[[is_yaxis]]
# You should grab the polyline child
yline <- yaxis$children[[1]]
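The hard-coded children[[1]] assumes the axis line is the first child of the axis grob, which may differ between ggplot2 versions. A hedged way to check before editing is to list the class of each child; the sketch below uses mtcars as a stand-in plot, and the names child_classes and p are illustrative only.

```r
library(ggplot2)

# Stand-in plot with an axis line, so the left axis grob contains a line child.
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme(axis.line = element_line())

gt <- ggplotGrob(p)
is_yaxis <- which(gt$layout$name == "axis-l")
yaxis <- gt$grobs[[is_yaxis]]

# First class of every child; the axis line is expected to be a "polyline".
child_classes <- vapply(yaxis$children, function(g) class(g)[1], character(1))
print(child_classes)
```

If the polyline is not at position 1 in your version, use the index reported here in place of [[1]].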
Now we can edit the line as we see fit:
yline$x <- unit(rep(1, 4), "npc")
yline$y <- unit(c(0, 0.1, 1, 0.15), "npc")
yline$id <- c(1, 1, 2, 2)
yline$arrow <- arrow(angle = 90)
Place it back into the gtable object and plot it:
yaxis$children[[1]] <- yline
gt$grobs[[is_yaxis]] <- yaxis
# grid plotting syntax (grid.newpage and grid.draw come from the grid package)
library(grid)
grid.newpage(); grid.draw(gt)
You can make stylistic choices at the line editing step as you see fit.
To my knowledge ggplot2 doesn't support axis breaks. There is a solution here with facet_grid.
I am trying to get the colours of a confusion matrix to correspond to the percent value in the middle of each matrix.
I have tried adjusting the geom_tile section fill to various options of Freq, or percentage, but with no luck.
valid_actualFunc <- as.factor(c(conf$ObsFunc))
valid_predFunc <- as.factor(c(conf$PredFunc))
cfmFunc <- confusionMatrix(valid_actualFunc, valid_predFunc)
ggplotConfusionMatrix <- function(m){
mytitle <- paste("Accuracy", percent_format()(m$overall[1]),
"Kappa", percent_format()(m$overall[2]))
data_c <- mutate(group_by(as.data.frame(m$table), Prediction ),
percentage=percent(Freq/sum(Freq)))
p <-
ggplot(data = data_c,
aes(x = Reference, y = Prediction)) +
geom_tile(aes(fill = Freq/sum(Freq)), colour = "white") +
scale_fill_gradient(low = "white", high = "red", na.value="white") +
geom_text(aes(x = Reference, y = Prediction, label = percentage)) +
theme(axis.text.x=element_text(angle = -90, hjust = 0),
axis.ticks=element_blank(), legend.position="none") +
ggtitle(mytitle)+
scale_y_discrete(limits = rev(levels(as.factor(valid_predFunc))))
return(p)
}
conf2Func=ggplotConfusionMatrix(cfmFunc)
conf2Func
Currently the fill does not match the value in the middle; e.g., a tile with 89% is lighter than one with 70%.
As per the comment the return is
dput(head(cfmFunc))
list(positive = NULL, table = structure(c(2331L, 102L, 262L,
52L, 290L, 1986L, 178L, 89L, 495L, 74L, 2966L, 52L, 189L, 58L,
92L, 800L), .Dim = c(4L, 4L), .Dimnames = list(Prediction = c("Algae",
"Hard Coral", "Other", "Other Inv"), Reference = c("Algae", "Hard Coral",
"Other", "Other Inv")), class = "table"), overall = c(Accuracy =
0.807008785942492,
Kappa = 0.730790156424558, AccuracyLower = 0.799141307917932,
AccuracyUpper = 0.814697342402988, AccuracyNull = 0.358126996805112,
AccuracyPValue = 0, McnemarPValue = 6.95780670112837e-62), byClass =
structure(c(0.848562067710229,
0.780967361384192, 0.826874825759688, 0.702370500438982,
0.866006328243225,
0.968687274187073, 0.917249961113703, 0.978258420637603,
0.705295007564297,
0.894594594594595, 0.847913093196112, 0.805639476334341,
0.938012218745343,
0.928553104155977, 0.904725375882172, 0.962429347223761,
0.705295007564297,
0.894594594594595, 0.847913093196112, 0.80563947633434, 0.848562067710229,
0.780967361384192, 0.826874825759688, 0.702370500438982,
0.770323859881031,
0.833928196514802, 0.837261820748059, 0.75046904315197, 0.274261182108626,
0.253893769968051, 0.358126996805112, 0.113718051118211,
0.232727635782748,
0.198282747603834, 0.296126198083067, 0.0798722044728434,
0.329972044728434,
0.221645367412141, 0.349241214057508, 0.0991413738019169,
0.857284197976727,
0.874827317785633, 0.872062393436696, 0.840314460538292), .Dim = c(4L,
11L), .Dimnames = list(c("Class: Algae", "Class: Hard Coral",
"Class: Other", "Class: Other Inv"), c("Sensitivity", "Specificity",
"Pos Pred Value", "Neg Pred Value", "Precision", "Recall", "F1",
"Prevalence", "Detection Rate", "Detection Prevalence", "Balanced Accuracy"
))), mode = "sens_spec", dots = list())
If you check the structure of your dataset to be plotted str(data_c), you will see that percentage is a character vector, and needs to be converted to numeric to be used as continuous input to the fill gradient.
data_c$percentage.numeric <- as.numeric(gsub("%", "", data_c$percentage))
You can use percentage.numeric for aes fill and percentage for aes label.
ggplot(data = data_c,
aes(x = Reference, y = Prediction)) +
geom_tile(aes(fill = percentage.numeric), colour = "white") +
scale_fill_gradient(low = "white", high = "red", na.value="white") +
geom_text(aes(x = Reference, y = Prediction, label = percentage)) +
theme(axis.text.x=element_text(angle = -90, hjust = 0),
axis.ticks=element_blank(), legend.position="none") +
ggtitle(mytitle)
Note that scale_y_discrete(limits = rev(levels(as.factor(valid_predFunc)))) produces an error in your example:
Error in as.factor(valid_predFunc) : object 'valid_predFunc' not found
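One more thing worth checking: in the question's code the fill is Freq/sum(Freq) evaluated over the whole layer data (the grand total), while the label is computed per Prediction group, so the two need not agree. A minimal base-R sketch that keeps one numeric column for the fill and one character column for the label is shown below; the 2x2 table and the column names share and label are hypothetical stand-ins for m$table.

```r
library(ggplot2)

# Hypothetical 2x2 confusion table standing in for m$table.
tab <- as.table(matrix(
  c(40, 10, 5, 45), nrow = 2,
  dimnames = list(Prediction = c("A", "B"), Reference = c("A", "B"))
))
df <- as.data.frame(tab)

# Numeric share within each Prediction row: this drives the fill gradient.
df$share <- df$Freq / ave(df$Freq, df$Prediction, FUN = sum)

# Character label derived from the same number: this is only for geom_text.
df$label <- sprintf("%.0f%%", 100 * df$share)

p <- ggplot(df, aes(Reference, Prediction)) +
  geom_tile(aes(fill = share), colour = "white") +
  geom_text(aes(label = label)) +
  scale_fill_gradient(low = "white", high = "red")
```

Because fill and label come from the same share column, a tile with a higher percentage is always darker.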
I've created a bar graph in R and now I am trying to add significance annotations to it.
I've tried using geom_signif from the ggsignif package and stat_compare_means from the ggpubr package (based on these suggestions/examples: Put stars on ggplot barplots and boxplots - to indicate the level of significance (p-value) or https://cran.r-project.org/web/packages/ggsignif/vignettes/intro.html)
I was only able to add the significance levels when using geom_signif and choose the parameters as in https://cran.r-project.org/web/packages/ggsignif/vignettes/intro.html.
This is an example of what I would like to get:
And this is what I get:
So when I want to add the asterisks, it shifts the bars from the bar graph. I don't know how to change it...
This is a part of what I wrote:
bargraph = ggplot(dataPlotROI, aes(x = ROI, y=mean, fill = Group))
bargraph +
geom_bar(position = position_dodge(.5), width = 0.5, stat = "identity") +
geom_errorbar(position = position_dodge(width = 0.5), width = .2,
aes(ymin = mean-SEM, ymax = mean+SEM)) +
geom_signif(y_position = c(4.5,10,10), xmin=c(0.85,0.85,4.3), xmax = c(5,4,7.45),
annotation=c("***"), tip_length = 0.03, inherit.aes = TRUE) +
facet_grid(.~ROI, space= "free_x", scales = "free_x", switch = "x")
This is the output from dput(dataPlotROI):
> Dput <- dput(dataPlotROI)
structure(list(Group = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor"), ROI = structure(c(1L, 2L, 3L, 1L, 2L,
3L), .Label = c("LOT", "MO", "ROT"), class = "factor"), mean = c(2.56175803333696,
7.50825658538044, 3.34290874605435, 2.41750375190217, 6.90310020776087,
3.03040666678261), SD = c(1.15192431061913, 4.30564383354597,
2.01581544982848, 1.11404900115086, 3.35276625079825, 1.23786817391241
), SEM = c(0.120096411333424, 0.448894400545147, 0.210163288684092,
0.11614763735292, 0.349550045127766, 0.129056678481624)), class = "data.frame", row.names = c(NA,
-6L))
> Dput
  Group ROI     mean       SD       SEM
1     1 LOT 2.561758 1.151924 0.1200964
2     1  MO 7.508257 4.305644 0.4488944
3     1 ROT 3.342909 2.015815 0.2101633
4     2 LOT 2.417504 1.114049 0.1161476
5     2  MO 6.903100 3.352766 0.3495500
6     2 ROT 3.030407 1.237868 0.1290567
Does anyone know what I am doing wrong and how I can fix it?
Thanks!
I don't think geom_signif is meant to span across the facets, but in your case, I don't see any real need for facets anyway. See if the following works for you:
ggplot(dataPlotROI,
aes(x = ROI, y = mean, fill = Group)) +
# geom_col is equivalent to geom_bar(stat = "identity")
geom_col(position = position_dodge(0.5), width = 0.5) +
geom_errorbar(position = position_dodge(0.5), width = 0.2,
aes(ymin = mean - SEM, ymax = mean + SEM)) +
# xmin / xmax positions should match the x-axis labels' positions
geom_signif(y_position = c(4.5, 10, 10),
xmin = c(1, 1, 2.05),
xmax = c(3, 1.95, 3),
annotation = "***",
tip_length = 0.03)
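To pick xmin / xmax values that sit on individual dodged bars rather than on the axis labels, it helps to know where the bars end up. Discrete x levels are placed at 1, 2, 3, and so on; the sketch below assumes position_dodge splits the dodge width evenly among the groups. The helper dodge_centres is hypothetical, not part of ggplot2 or ggsignif.

```r
# Hypothetical helper: centre of each dodged bar around discrete position x,
# assuming the dodge width is shared equally by n_groups bars.
dodge_centres <- function(x, n_groups, dodge_width) {
  x - dodge_width / 2 + (seq_len(n_groups) - 0.5) * dodge_width / n_groups
}

# Two groups with position_dodge(0.5) at the second x level ("MO"):
centres <- dodge_centres(2, n_groups = 2, dodge_width = 0.5)
print(centres)  # 1.875 2.125
```

These values can then be passed to xmin and xmax to bracket the two bars within one ROI.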
How do I plot a curve for a line of best fit using ggplot? My best guess is that I need to change the stat_smooth parameter somehow but I have no clue how. My goal is something like the black line in the image below.
vv<-structure(list(X = 16:19, school = structure(c(3L, 3L, 3L, 3L), .Label = c("UCB", "UCD", "UIUC"), class = "factor"), year = 2009:2012, mean = c(15.60965, 16.785, 16.77725, 15.91729), sd = c(6.483547,6.852999, 6.327013, 6.74991)), .Names = c("X", "school", "year", "mean", "sd"), row.names = 16:19, class = "data.frame")
ggplot(vv, aes(x = year, y = mean)) +
ggtitle("scores")+
geom_point() +
stat_smooth(method = "lm", col = "red")
You can try changing the formula:
ggplot(vv, aes(x = year, y = mean)) +
ggtitle("scores")+
geom_point() +
stat_smooth(method = "lm", formula = y ~ splines::bs(x, 3), col = "red")
Or simply use the default smoother (loess for small data). You will get some warnings because the span is too small for such a small data set, but it works:
ggplot(vv, aes(x = year, y = mean)) +
ggtitle("scores") +
geom_point(size=3) +
stat_smooth(col = "red")
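Another option in the same spirit, shown here as a sketch rather than a recommendation, is a quadratic fit through the formula argument. With only four points any curve is close to interpolating, so treat the band with caution. The data frame repeats the year and mean columns from the question; the name fit is illustrative.

```r
library(ggplot2)

# year and mean columns from the question's data.
vv <- data.frame(
  year = 2009:2012,
  mean = c(15.60965, 16.785, 16.77725, 15.91729)
)

# Quadratic polynomial fitted by lm inside stat_smooth.
p <- ggplot(vv, aes(x = year, y = mean)) +
  ggtitle("scores") +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), col = "red")

# The same model fitted directly, e.g. to inspect the coefficients.
fit <- lm(mean ~ poly(year, 2), data = vv)
```

Calling summary(fit) shows the same model that the red curve represents.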
I am trying to graph the following data:
to_graph <- structure(list(Teacher = c("BS", "BS", "FA"
), Level = structure(c(2L, 1L, 1L), .Label = c("BE", "AE", "ME",
"EE"), class = "factor"), Count = c(2L, 25L, 28L)), .Names = c("Teacher",
"Level", "Count"), row.names = c(NA, 3L), class = "data.frame")
and want to add labels in the middle of each piece of the bars that are the percentage for that piece. Based on this post, I came up with:
ggplot(data=to_graph, aes(x=Teacher, y=Count, fill=Level), ordered=TRUE) +
geom_bar(aes(fill = Level), position = 'fill') +
opts(axis.text.x=theme_text(angle=45)) +
scale_y_continuous("",formatter="percent") +
opts(title = "Score Distribution") +
scale_fill_manual(values = c("#FF0000", "#FFFF00","#00CC00", "#0000FF")) +
geom_text(aes(label = Count), size = 3, hjust = 0.5, vjust = 3, position = "stack")
But it:
1. Doesn't have any effect on the graph
2. Probably doesn't display the percentage if it did (although I'm not entirely sure of this point)
Any help is greatly appreciated. Thanks!
The y-coordinate of the text is the actual count (2, 25 or 28), whereas the y-coordinates in the plot panel range from 0 to 1, so the text is being printed off the top.
Calculate the fraction of counts using ddply (or tapply or whatever).
library(plyr)  # provides ddply() and .()
graph_avgs <- ddply(
to_graph,
.(Teacher),
summarise,
Count.Fraction = Count / sum(Count)
)
to_graph <- cbind(to_graph, graph_avgs$Count.Fraction)
A simplified version of your plot. I haven't bothered to play about with factor orders so the numbers match up to the bars yet.
ggplot(to_graph, aes(Teacher), ordered = TRUE) +
geom_bar(aes(y = Count, fill = Level), stat = "identity", position = 'fill') +
scale_fill_manual(values = c("#FF0000", "#FFFF00","#00CC00", "#0000FF")) +
geom_text(
aes(y = graph_avgs$Count.Fraction, label = graph_avgs$Count.Fraction),
size = 3
)
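For current ggplot2 versions, a hedged sketch of the same plot that centres each label inside its segment is given below. It assumes geom_col and position_fill(vjust = 0.5) are available (recent ggplot2) and computes the fractions in base R instead of ddply; the column name Fraction is illustrative.

```r
library(ggplot2)

# Data from the question.
to_graph <- data.frame(
  Teacher = c("BS", "BS", "FA"),
  Level = factor(c("AE", "BE", "BE"), levels = c("BE", "AE", "ME", "EE")),
  Count = c(2L, 25L, 28L)
)

# Fraction of each teacher's total count.
to_graph$Fraction <- to_graph$Count / ave(to_graph$Count, to_graph$Teacher, FUN = sum)

p <- ggplot(to_graph, aes(Teacher, Count, fill = Level)) +
  geom_col(position = "fill") +
  # position_fill(vjust = 0.5) stacks the labels like the bars and centres them.
  geom_text(aes(label = sprintf("%.0f%%", 100 * Fraction)),
            position = position_fill(vjust = 0.5), size = 3) +
  scale_fill_manual(values = c("#FF0000", "#FFFF00", "#00CC00", "#0000FF"))
```

Because the text layer uses the same fill grouping and position as the bars, the labels line up with their segments without reordering factors by hand.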