I have three regressions in one plot, and I am trying to display the equation for each of them. I've been working off of this question to try to do this. However, the filtering doesn't seem to do anything, and the same equation is displayed three times.
The end goal is to compare cpue in relation to veg while controlling for location (block), and to get the slope and r^2 value for each of the three regression lines.
Data
cpue<- structure(list(lake = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L), veg = c(254.8026498, 219.9422136, 450.9662078, 484.8605026,
407.1662151, 286.7015617, 351.6441798, 179.9959443, 340.4276843,
247.2907435, 502.4119071, 336.4259995, 349.1543197, 281.7493811,
201.8284859, 325.6380404, 288.3855723, 230.8755861, 214.8890894,
326.6376698, 214.7468224, 132.0511504, 335.2727641, 336.8727253,
143.8923225, 277.3053436, 302.7005649, 355.0332852, 307.5736711,
371.8407176, 168.7645221, 365.9156811, 349.205548, 273.8392697,
171.4513348, 197.1067049, 350.5833827, 202.9605797, 365.3415045,
413.2762633, 329.8539209, 377.1415341, 180.8524994, 217.4007852,
258.5909286, 146.7092479, 258.7440138, 393.2014549, 492.6719497,
208.5002392, 219.1466664, 182.1366352, 308.0534171, 317.6037795,
131.7534807, 324.0011761, 469.5861988, 237.4492916, 318.6897863,
47.94967582, 223.5382632, 386.2227607, 343.7657123, 493.6393726,
204.2960349, 294.4218332, 178.7555635, 454.0358039, 207.1363947,
364.6063223, 462.8508521, 292.8613255, 330.3893897, 209.1769838,
237.4264742, 427.8856667), cpue = c(32.63512612, 47.98168449,
33.26735173, 14.41435377, 30.94664495, 40.26817963, 41.26204388,
31.63227286, 36.97932408, 21.54620143, 34.27556883, 6.506644061,
32.24677471, 38.24536746, 30.95968644, 24.86408391, 31.15438304,
21.69779047, 39.86223079, 27.92263229, 23.55684281, 34.6157024,
42.06943746, 24.70597527, 28.36396188, 50.34591832, 55.06361184,
48.69468021, 26.00084784, 44.77320597, 14.56328001, 33.29291085,
21.55078237, 29.95980975, 40.61006429, 43.46931237, 26.26407484,
15.87009067, 39.47297313, 20.50811378, 35.66157343, 35.64563497,
44.47319537, 42.06574907, 40.16356125, 35.57462201, 32.10051291,
34.1254268, 34.21084448, 28.18410732, 32.11249307, 38.39890418,
31.24778375, 29.76951583, 41.52508487, 34.48914051, 28.30923803,
29.33886042, 37.57268795, 59.29849175, 28.9317113, 41.27342427,
38.44878019, 44.53768204, 44.48611219, 33.15553274, 34.48894561,
34.86722967, 31.92515626, 50.04825584, 53.67528105, 37.53150868,
33.16255301, 33.22374846, 28.28172263, 42.5795616), block = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1",
"2", "3"), class = "factor")), row.names = c(NA, -76L), class = "data.frame")
Code
# Make lm() with blocking variable----------
lm_eqn2 <- function(df2){
m2 <- lmer(cpue ~ veg + (1|block), cpue);
eq2 <- substitute(italic(CPUE) == a + b*","~~italic(r)^2~"="~r2, # Write CPUE = a+b, r^2 = x
list(a = format(unname(coef(m2)[1]), digits = 4), # define 'a'
b = format(unname(coef(m2)[2]), digits = 2), # define 'b'
r2 = format(summary(m)$r.squared, digits = 3))) # define 'r2'
as.character(as.expression(eq)); # declare expression as a character
}
ggplot(cpue, aes(x=veg, y=cpue, col=block))+
geom_point()+
geom_smooth(method="lm", show.legend=F, se=F)+
annotate("text", x=100, y=20, label= lm_eqn2(cpue %>% filter(block==1)), parse=T)+
annotate("text", x=200, y=30, label= lm_eqn2(cpue %>% filter(block==2)), parse=T)+
annotate("text", x=300, y=40, label= lm_eqn2(cpue %>% filter(block==3)), parse=T)
When I try to view the equation for each line with the following code:
lm_eqn2(cpue %>% filter(block==2))
it returns the same equation for every block number I filter by. This makes me think something is wrong with the code I used to make the model and the equation. The only difference (that I can tell) from the linked question is that my model has a blocking variable, though I'm not sure whether that would actually affect anything.
Any help would be greatly appreciated.
You have a few problems here.
Firstly, it isn't good practice to use the same name for a data frame and for a column within it. It makes lines like lmer(cpue ~ veg + (1|block), cpue); and ggplot(cpue, aes(x=veg, y=cpue, col=block))+ confusing to many readers.
More importantly, because your function refers to the data frame as cpue instead of using its argument, it ignores whatever you pass to it. The line m2 <- lmer(cpue ~ veg + (1|block), cpue); fits the same model every time, hence the same equation is produced. cpue %>% filter(block==2) is ignored as an argument because df2 is never used inside your function. So you need something like this:
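library(lme4)  ## provides lmer()
lm_eqn2 <- function(df2){
m2 <- lmer(cpue ~ veg + (1|block), df2); ## note the change to df2 here
## caveat: coef() on an lmer fit returns per-group coefficient tables (fixef() gives
## the fixed effects), and summary() of an lmer fit has no r.squared element
eq2 <- substitute(italic(CPUE) == a + b*","~~italic(r)^2~"="~r2,
list(a = format(unname(coef(m2)[1]), digits = 4),
b = format(unname(coef(m2)[2]), digits = 2),
r2 = format(summary(m2)$r.squared, digits = 3)))
as.character(as.expression(eq2));
}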
Also note that m and eq were not defined in your original code, so I changed them to m2 and eq2, respectively.
This gives the error:
Error: grouping factors must have > 1 sampled level
which makes sense, because you've fit block as a random intercept in your model, yet you are filtering your data by that same blocking factor. Each of the subsets cpue %>% filter(block==1), cpue %>% filter(block==2), and cpue %>% filter(block==3) therefore contains only one level of block. That means (1|block) adds no information to your regression, since block is now a constant.
You might want to explain what you are hoping to do with this blocking factor. Some relevant posts: https://stats.stackexchange.com/q/4700/238878 and https://stats.stackexchange.com/q/31569/238878
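If the goal is simply to label the three lines that geom_smooth(method="lm") draws (with col=block, it fits a separate ordinary least-squares line to each block), one option is to drop the random effect and fit a plain lm() to each subset. This is a sketch under that assumption, not a mixed-model solution: the helper name lm_eqn_block and the toy data are hypothetical, and the label follows the substitute() approach from the question.

```r
# Hypothetical helper: fit an ordinary lm() of cpue on veg to ONE subset
# (the argument df, not a global object) and return a plotmath string.
lm_eqn_block <- function(df) {
  m  <- lm(cpue ~ veg, data = df)
  cf <- unname(coef(m))  # cf[1] = intercept, cf[2] = slope
  eq <- substitute(italic(CPUE) == a + b ~ italic(veg) * "," ~~ italic(r)^2 ~ "=" ~ r2,
                   list(a  = format(cf[1], digits = 4),
                        b  = format(cf[2], digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# Hypothetical toy data with a different intercept and slope in each block
set.seed(1)
toy <- data.frame(veg = runif(30, 50, 500),
                  block = factor(rep(1:3, each = 10)))
idx <- as.integer(toy$block)
toy$cpue <- c(30, 20, 40)[idx] + c(0.02, -0.01, 0.05)[idx] * toy$veg + rnorm(30, sd = 2)

# One label per block: split() hands each block's rows to the helper
eqs <- sapply(split(toy, toy$block), lm_eqn_block)
eqs
```

With the question's data, lm_eqn_block(cpue %>% filter(block==1)) could stand in for lm_eqn2(...) in each annotate() call. Whether unpooled per-block fits are the right model is the statistical question raised in this answer.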
I am trying to make multiple faceted plots in R/ggplot2 that show R^2 and P values generated using the ggpmisc package, and to format each line differently depending on whether the P-value is below a certain threshold.
I can successfully do this for a single character (the "R" and the "P"), but I cannot get the superscript 2 in R^2, the equals sign, or the value itself formatted. Possibly complicating things, I'd like the value rounded to 3 decimal places, which I can do, but again I can't get the formatting to apply. Here is the problem code and output:
Edit: added the my.formula <- y ~ x line at the beginning of the code to make it work.
library(tidyverse)
library(ggpmisc)
##edit: I left out this very important line previously (Thank you for pointing this out)
my.formula <- y ~ x
## data (sorry for all the extra whitespace; I have never known how to remove it)
ex1 <- structure(list(time = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
.Label = c("early", "late"), class = "factor"),
x = c(0.321386372587633, 0.321386372587633, 0.321386372587633, 0.321386372587633, 0.321386372587633,
0.344034242910647, 0.344034242910647, 0.344034242910647, 0.344034242910647, 0.344034242910647,
0.339242868568382, 0.339242868568382, 0.339242868568382, 0.339242868568382, 0.339242868568382,
0.319449901768173, 0.319449901768173, 0.319449901768173, 0.319449901768173, 0.319449901768173,
0.355824915824916, 0.355824915824916, 0.355824915824916, 0.355824915824916, 0.355824915824916,
0.343082264957265, 0.343082264957265, 0.343082264957265, 0.343082264957265, 0.343082264957265,
0.328739896647675, 0.328739896647675, 0.328739896647675, 0.328739896647675, 0.328739896647675,
0.321470937129300, 0.321470937129300, 0.321470937129300, 0.321470937129300, 0.321470937129300,
0.329134067099854, 0.329134067099854, 0.329134067099854, 0.329134067099854, 0.329134067099854,
0.303929221962009, 0.303929221962009, 0.303929221962009, 0.303929221962009, 0.303929221962009,
0.318415163880479, 0.318415163880479, 0.318415163880479, 0.318415163880479, 0.318415163880479,
0.299444516212376, 0.299444516212376, 0.299444516212376, 0.299444516212376, 0.299444516212376,
0.343325715822019, 0.343325715822019, 0.343325715822019, 0.343325715822019, 0.343325715822019,
0.372169617126390, 0.372169617126390, 0.372169617126390, 0.372169617126390, 0.372169617126390,
0.370415982484948, 0.370415982484948, 0.370415982484948, 0.370415982484948, 0.370415982484948,
0.356533513879486, 0.356533513879486, 0.356533513879486, 0.356533513879486, 0.356533513879486,
0.388973753645327, 0.388973753645327, 0.388973753645327, 0.388973753645327, 0.388973753645327,
0.372479078062834, 0.372479078062834, 0.372479078062834, 0.372479078062834, 0.372479078062834,
0.379030035822541, 0.379030035822541, 0.379030035822541, 0.379030035822541, 0.379030035822541,
0.407584269662921, 0.407584269662921, 0.407584269662921, 0.407584269662921, 0.407584269662921,
0.376392901361948, 0.376392901361948, 0.376392901361948, 0.376392901361948, 0.376392901361948,
0.317804974338729, 0.317804974338729, 0.317804974338729, 0.317804974338729, 0.317804974338729,
0.364539393114710, 0.364539393114710, 0.364539393114710, 0.364539393114710, 0.364539393114710,
0.379058888277380, 0.379058888277380, 0.379058888277380, 0.379058888277380, 0.379058888277380),
fctr = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
.Label = c("a", "b", "c", "d", "e"), class = "factor"),
y = c(4.04851970360232, -0.102591188819765, 3.73315302756709, 0.340504779534468, 0.237913590714702,
4.06911664322439, -0.0987598497016705, 5.54001741914177, 0.373135505872404, 0.274375656170733,
3.67443913261548, -0.0875837250365816, 3.03232358376749, 0.326682585279794,
0.239098860243213, 4.17944244767142, -0.0889017722380819, 3.1663639174688,
0.242471128656955, 0.153569356418873, 3.21516073180644, -0.0825729432584231,
6.99230459733604, 0.41716036686455, 0.334587423606127, 3.9367867766317,
-0.100174510844092, 2.72154053689335, 0.328889772105954, 0.228715261261863,
3.99729911731233, -0.100900126928578, 4.53860828306993, 0.366724076035605,
0.265823949107028, 4.56693109900323, -0.126308420437427, 5.10762234664757,
0.395180555800261, 0.268872135362834, 3.51448237912049, -0.0878426078144829,
5.65594606508526, 0.248915416023726, 0.161072808209243, 3.1383099374462,
-0.0732066492304829, 4.64242423111922, 0.217790427134848, 0.144583777904365,
4.49027171118563, -0.113362808190544, 4.80536464343379, 0.36523834263989,
0.251875534449346, 3.45497839018504, -0.0846503583222099, 3.45264289330118,
0.286441498422214, 0.201791140100004, 2.09894643097191, -0.0181292647706588,
3.15649251734621, 0.487077302298916, 0.468948037528257, 2.88011613789647,
0.0332011441108258, 2.17243045829905, 0.473011569737156, 0.506212713847982,
3.34598270139375, -0.0610926284972918, 2.64804765938524, 0.50849363186508,
0.447401003367788, 2.50787708448308, -0.0538689982930191, 2.38574021348553,
0.484438241081951, 0.430569242788932, 3.50259329310981, -0.0851862773159426,
3.14484623466867, 0.675500135099749, 0.590313857783806, 5.7188910696372,
-0.0954071517814848, 0.583392623483105, 0.624411329255129, 0.529004177473644,
2.67827579027081, 0.0460408230771886, 2.52145840963862, 0.625860271515617,
0.671901094592805, 3.13901517396219, -0.0693247901648161, 3.2356649048874,
0.668874437622921, 0.599549647458105, 2.51959640215471, -0.049164926875836,
2.23187740027734, 0.492702963511537, 0.443538036635701, 2.42625504206661,
0.0874001285858868, 2.8286583173362, 0.545165351011274, 0.632565479597161,
3.24857901035993, -0.0689851948115451, 2.91350545205866, 0.569549019107752,
0.500563824296207, 2.87505027703064, -0.0611185132088805, 3.2680427609413,
0.552874791715019, 0.491756278506139)), row.names = c(NA, -120L), class = "data.frame")
gtest <- ggplot(data = ex1, aes(x, y)) +
geom_hline(yintercept = -Inf, size = 0.6) +
geom_vline(xintercept = -Inf, size = 0.8) +
scale_x_continuous(expand = expansion(mult = c(.06, .1))) +
scale_y_continuous(expand = expansion(mult = c(.1, .6))) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, formula = my.formula) +
stat_poly_eq(formula = my.formula,
output.type = "expression", parse = TRUE,
aes(label = ifelse(stat(p.value) < 0.05,
paste("bold(R)^2 == ", round(stat(r.squared), digits = 3)), ## at least shows the r.squared value, but I can't get the rest bold
ifelse(stat(p.value) < 0.1,
paste("plain(R)^2 == ", round(stat(r.squared), digits = 3)), ##this contains plain(R) because the default is italic, which I don't want
""))),
size = 3,
label.x = .1,
label.y = .9) +
stat_poly_eq(formula = my.formula,
output.type = "expression", parse = TRUE,
aes(label = ifelse(stat(p.value) < 0.001,
paste("italic(P) < 0.001"),
ifelse(stat(p.value) < 0.05,
paste("bolditalic(P) == ", round(stat(p.value), digits = 3)), ##at least shows the p.value, but I can't get the rest bold
paste("italic(P) == ", round(stat(p.value), digits = 3))))),
size = 3,
label.x = .1,
label.y = .76) +
theme(axis.title.x = element_text(colour = "black", face = "bold", size = 14, margin = margin(10,0,0,0)),
axis.title.y = element_blank(),
legend.key = element_blank(),
legend.title = element_blank(),
plot.margin = unit(c(0.2,0.4,.2,.2), "cm"),
plot.title = element_text(colour = "black", face = "bold", size = 16, hjust = 0.5,margin = margin(0,0,10,0)),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.spacing.x = unit(2, "pt"),
panel.spacing.y = unit(4, "pt"),
panel.background = element_blank(),
strip.placement = "outside",
strip.text.x = element_text(colour = "black", face = "bold", size = 12, margin = margin(0,0,6,0)),
strip.text.y = element_text(colour = "black", face = "bold", size = 12, margin = margin(0,3,0,0))) +
facet_grid(fctr ~ time, scales = "free", switch = "y")
gtest
I'm assuming that the paste() in aes(label = ...) is not base paste but rather plotmath paste, since output.type = "expression" and parse = TRUE in the stat_poly_eq function call. Furthermore, I've seen numerous examples using bquote(), mtext(), substitute(), etc. instead of paste(), but I cannot get them to do what I'd like. Also, I don't think this has anything to do with the ggpmisc package; rather, it reflects my lack of understanding of how to format a string containing a changing variable using plotmath expressions.
Here is what I generate with the code above. It's hard to see, but the R and P are in bold in panels (early, d) and (early, e), as they should be:
And here is what I'd like to see, with the entire line in bold:
Any help would be greatly appreciated!
bolditalic() and bold() work like functions, applying the formatting to their argument. So, the code below should do what you want. I haven't tested this on your example as it does not run as is.
paste0("bolditalic(P) == bold(\"", round(stat(p.value), digits = 3), "\")")
We need to enclose the number in quotation marks for bold() to apply to it. These quotation marks must be escaped as \" so that they are recognized as part of the character string. In R's plotmath expressions, numbers and Greek symbols are never typeset in bold. So, by embedding quotation marks, the numbers are seen as character strings rather than numeric constants when the text is parsed. The key question here, then, is how to format numbers as bold in R plotmath expressions.
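Building on this, here is a minimal sketch of label builders that put the whole line in bold. Note that paste() inside aes() is ordinary base R paste(): it only builds a string, which stat_poly_eq() then parses as plotmath because parse = TRUE. The helper names fmt3, r2_label and p_label are hypothetical. Following the reasoning above, the "2" in R^2 and the "= value" part are placed inside quoted strings so that bold() applies to them, and sprintf("%.3f", ...) replaces round() so trailing zeros are kept (round(0.1, 3) prints as 0.1). This has not been tested inside ggpmisc itself.

```r
# Format to exactly 3 decimals, keeping trailing zeros (unlike round())
fmt3 <- function(x) sprintf("%.3f", x)

# R^2 line: whole line bold if p < 0.05, plain if p < 0.1, empty otherwise.
# Vectorised with ifelse() so it also works on a column of values.
r2_label <- function(r2, p) {
  ifelse(p < 0.05,
         paste0("bold(R)^bold(\"2\")~bold(\"= ", fmt3(r2), "\")"),
         ifelse(p < 0.1,
                paste0("plain(R)^2~\"= ", fmt3(r2), "\""),
                ""))
}

# P line: same thresholds as in the question's nested ifelse() calls
p_label <- function(p) {
  ifelse(p < 0.001,
         "italic(P)~\"< 0.001\"",
         ifelse(p < 0.05,
                paste0("bolditalic(P)~bold(\"= ", fmt3(p), "\")"),
                paste0("italic(P)~\"= ", fmt3(p), "\"")))
}

p_label(c(0.0005, 0.0123, 0.2))
r2_label(0.5, 0.01)
```

In the question's code, these would be called as aes(label = r2_label(stat(r.squared), stat(p.value))) and aes(label = p_label(stat(p.value))) in place of the nested ifelse()/paste() calls.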
UPDATED:
Data has now been updated to full chemistry values as opposed to mean values.
I am attempting to create a box-and-whisker plot in R on a very small dataset. Either my data is not behaving itself or I am missing some glaringly obvious error.
This is the code I have for making the plot:
library(ggplot2)
Methanogenesis_Data=read.csv("CO2-CH4 Rates.csv")
attach(Methanogenesis_Data)
summary(Methanogenesis_Data)
str(Methanogenesis_Data)
boxplot(CH4rate~Patch+Temperature, data = Methanogenesis_Data,
xlab="Patch", ylab="CH4 Production")
cols<-c("red", "blue")
It uses this small dataset:
structure(list(Patch = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Gravel", "Macrophytes",
"Marginal"), class = "factor"), Temperature = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("Cold",
"Warm"), class = "factor"), CH4rate = c(0.001262595, 0.00138508,
0.001675944, 0.001592354, 0.002169233, 0.001772964, 0.002156633,
0.002864403, 0.002301383, 0.002561042, 0.005189598, 0.004557227,
0.008484851, 0.006867866, 0.007438633, 0.005405327, 0.006381582,
0.008860084, 0.007615417, 0.007705906, 0.009198508, 0.00705233,
0.007943024, 0.008319768, 0.010362114, 0.007822153, 0.010339339,
0.009252302, 0.008249555, 0.008197657), CO2rate = c(0.002274825,
0.002484866, 0.003020209, 0.00289133, 0.003927232, 0.003219346,
0.003922613, 0.005217026, 0.00418674, 0.00466427, 0.009427322,
0.008236453, 0.015339532, 0.012494729, 0.013531303, 0.009839847,
0.011624428, 0.016136746, 0.0138831, 0.014051034, 0.016753211,
0.012780956, 0.01445912, 0.01515584, 0.01883252, 0.014249452,
0.018849478, 0.016863299, 0.015045964, 0.014941168)), .Names = c("Patch",
"Temperature", "CH4rate", "CO2rate"), class = "data.frame", row.names =
c(NA,
-30L))
The plot I get as output is good; however, I would like the x axis to display simply "Gravel", "Macrophytes" and "Marginal", rather than each of those combined with Warm and Cold. Thanks for any assistance.
THIS IS WHAT I AM TRYING TO ACHIEVE: Exact Boxplot I want to create
Following your update with an example graph:
I have also included the formatting for the legend position. If you want to edit the y-axis label to include a subscript, I would suggest you read over this. I have included a placeholder title for relabelling.
test <- structure(list(Patch = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Gravel", "Macrophytes",
"Marginal"), class = "factor"), Temperature = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("Cold",
"Warm"), class = "factor"), CH4rate = c(0.001262595, 0.00138508,
0.001675944, 0.001592354, 0.002169233, 0.001772964, 0.002156633,
0.002864403, 0.002301383, 0.002561042, 0.005189598, 0.004557227,
0.008484851, 0.006867866, 0.007438633, 0.005405327, 0.006381582,
0.008860084, 0.007615417, 0.007705906, 0.009198508, 0.00705233,
0.007943024, 0.008319768, 0.010362114, 0.007822153, 0.010339339,
0.009252302, 0.008249555, 0.008197657), CO2rate = c(0.002274825,
0.002484866, 0.003020209, 0.00289133, 0.003927232, 0.003219346,
0.003922613, 0.005217026, 0.00418674, 0.00466427, 0.009427322,
0.008236453, 0.015339532, 0.012494729, 0.013531303, 0.009839847,
0.011624428, 0.016136746, 0.0138831, 0.014051034, 0.016753211,
0.012780956, 0.01445912, 0.01515584, 0.01883252, 0.014249452,
0.018849478, 0.016863299, 0.015045964, 0.014941168)), .Names = c("Patch",
"Temperature", "CH4rate", "CO2rate"), class = "data.frame", row.names =
c(NA,
-30L))
Now I will create two data sets, one for each graph, just for simplicity. You could leave them combined and facet instead, but for formatting purposes this might be easier.
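library(dplyr)    # %>%, filter()
library(tidyr)    # gather()
library(ggplot2)  # ggplot() for the plots below
CH4rate <- test %>%
gather("id", "value", 3:4) %>%
filter(id == "CH4rate")
CO2rate <- test %>%
gather("id", "value", 3:4) %>%
filter(id == "CO2rate")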
First plot:
ggplot(CH4rate) +
geom_boxplot(mapping = aes(x = Patch, y = value, fill=factor(Temperature, levels = c("Warm", "Cold")))) +
theme(legend.position = c(0.15, 0.9), panel.background = element_rect(fill = "white", colour = "grey50")) +
labs(title = "Title of graph", x="Patch Type", y = "CH4rate") +
scale_fill_manual(name = "", values = c("orange", "light blue")
, labels = c("Cold" = "Incubated at 10°C", "Warm" = "Incubated at 26°C"))
Second plot:
ggplot(CO2rate) +
geom_boxplot(mapping = aes(x = Patch, y = value, fill=factor(Temperature, levels = c("Warm", "Cold")))) +
theme(legend.position = c(0.15, 0.9), panel.background = element_rect(fill = "white", colour = "grey50")) +
labs(title = "Title of graph", x="Patch Type", y = "CO2rate") +
scale_fill_manual(name = "", values = c("orange", "light blue")
, labels = c("Cold" = "Incubated at 10°C", "Warm" = "Incubated at 26°C"))
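Since the question itself used base graphics, here is a base-R alternative (a different technique from the ggplot2 answer): draw the boxes in pairs with boxplot()'s at argument, suppress the default x axis, and add one Patch label per pair with axis(). The CH4rate values are copied from the question's dput(); the box positions in pos and the colours are arbitrary choices.

```r
# Data copied from the question's dput() (CH4rate only)
d <- data.frame(
  Patch = factor(rep(c("Gravel", "Marginal", "Macrophytes"), each = 10)),
  Temperature = factor(rep(rep(c("Warm", "Cold"), each = 5), times = 3)),
  CH4rate = c(0.001262595, 0.00138508, 0.001675944, 0.001592354, 0.002169233,
              0.001772964, 0.002156633, 0.002864403, 0.002301383, 0.002561042,
              0.005189598, 0.004557227, 0.008484851, 0.006867866, 0.007438633,
              0.005405327, 0.006381582, 0.008860084, 0.007615417, 0.007705906,
              0.009198508, 0.00705233, 0.007943024, 0.008319768, 0.010362114,
              0.007822153, 0.010339339, 0.009252302, 0.008249555, 0.008197657)
)

cols <- c("lightblue", "orange")  # Cold, Warm (factor levels are alphabetical)
pos  <- c(1, 2, 4, 5, 7, 8)       # two boxes per Patch, with a gap between pairs

# Temperature varies fastest, so the boxes are Cold.Gravel, Warm.Gravel, ...
b <- boxplot(CH4rate ~ Temperature + Patch, data = d, at = pos, col = cols,
             xaxt = "n", xlab = "Patch", ylab = "CH4 Production")
axis(1, at = c(1.5, 4.5, 7.5), labels = levels(d$Patch))  # one label per Patch
legend("topleft", fill = cols, legend = levels(d$Temperature), bty = "n")
```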
I have a data frame, tag, with a 51 x 5 structure:
structure(list(Tagging = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CIRCLE CAMPIAGN",
"NATIONAL CAMPIAGN"), class = "factor"), Status = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Negative", "Positive"), class = "factor"),
Month = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), .Label = c("JUL",
"JUN", "MAY"), class = "factor"), Category = structure(c(1L,
4L, 6L, 1L, 2L, 4L, 6L, 1L, 2L, 4L, 5L, 6L, 1L, 2L, 4L, 5L,
6L, 1L, 2L, 4L, 5L, 6L, 1L, 2L, 4L, 6L, 1L, 4L, 6L, 2L, 3L,
4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L,
3L, 4L, 5L, 6L, 6L), .Label = c("Data", "Other", "Roaming",
"Unlimited", "VAS", "Voice"), class = "factor"), count = c(3L,
2L, 1L, 4L, 5L, 2L, 1L, 2L, 6L, 7L, 2L, 3L, 4L, 9L, 6L, 2L,
3L, 3L, 3L, 10L, 2L, 5L, 5L, 5L, 4L, 3L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 4L, 1L, 1L, 3L, 3L, 2L,
1L, 1L, 1L, 3L, 4L, 2L)), class = "data.frame", row.names = c(NA,
-51L))
I want to create a bar plot (ggplot) in which each bar is labelled with the sum of count over all categories, month-wise. I am using the code below:
ggplot(data = tag, aes(x = Tagging, y = count, fill = Status)) +
geom_col() +
labs(x = "Tagging", y = "Count", title = "FlyTxt ROI", subtitle = "Statistics") +
geom_text(aes(label = count), color = "white", size = 3, position = position_stack(vjust = 0.5)) +
theme_minimal()+facet_wrap(~Month)
But I am getting split count values:
How can I show only the sum of count for each status?
The problem is that geom_col stacks the counts of all categories into one bar, but geom_text does not add them up: it draws one label per row, i.e. per category.
One option is to pre-summarize the data (to get rid of the category split) and then plot the graph.
library(tidyverse)
tag_sum <- tag %>%
group_by(Tagging, Status, Month) %>%
summarise(count_sm = sum(count))
ggplot(data = tag_sum, aes(x = Tagging, y = count_sm, fill = Status)) +
geom_col() +
geom_text(aes(label = count_sm), color = "white", size = 3,
position = position_stack(vjust = 0.5)) +
facet_wrap(~Month) +
labs(x = "Tagging", y = "Count", title = "FlyTxt ROI", subtitle = "Statistics") +
theme_minimal()
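The same pre-summarizing step can be done in base R with aggregate() if you would rather not load the tidyverse. A minimal sketch on the first 12 rows of tag (CIRCLE CAMPIAGN, Negative), copied from the question's dput(), shows the per-month sums that would become the bar labels; the object names tag12 and tag_sum_base are just for illustration.

```r
# First 12 rows of the question's `tag` data (CIRCLE CAMPIAGN / Negative)
tag12 <- data.frame(
  Tagging  = "CIRCLE CAMPIAGN",
  Status   = "Negative",
  Month    = rep(c("JUL", "JUN", "MAY"), times = c(3, 4, 5)),
  Category = c("Data", "Unlimited", "Voice",
               "Data", "Other", "Unlimited", "Voice",
               "Data", "Other", "Unlimited", "VAS", "Voice"),
  count    = c(3, 2, 1, 4, 5, 2, 1, 2, 6, 7, 2, 3)
)

# Sum count over Category within each Tagging x Status x Month combination;
# the result has one row per bar segment, so geom_text() draws one label each
tag_sum_base <- aggregate(count ~ Tagging + Status + Month, data = tag12, FUN = sum)
tag_sum_base
```

Run on the full tag data, the result could be plotted with the answer's ggplot() call, using y = count and label = count instead of count_sm.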
I have a factor comp_id with 4 levels (comp1 to comp4). Within each level, I want to order the values from highest to lowest in a geom_line plot.
I got this plot
using this script
library(data.table)
library(ggplot2)
dat <- as.data.table(df)
dat[, ord := sprintf("%02i", frank(dat, comp_id, -value, ties.method = "first"))]
ggplot(dat, aes(x = ord, y = value , group = comp_id , colour = comp_id))+
geom_line()+
facet_wrap(~comp_id, ncol = 1, scales = "free_x", labeller = label_parsed, drop = TRUE)+
theme(axis.text.x=element_text(angle=35, vjust=1, hjust=1))
and this to replace the x-axis labels:
+scale_x_discrete(labels = dat[, setNames(as.character(predictor), ord)])
As you can see, it worked fine for all levels except comp3, where the variables ranked 100 to 105 were plotted at the start of the facet instead of at the end, where they belong. I wonder what went wrong. Any suggestions will be appreciated.
DATA
> dput(df)
structure(list(predictor = c("c_C2", "c_C3", "c_C4", "d_D2",
"d_D3", "d_D4", "d_D5", "h_BF", "h_BFI", "h_ER", "h_f", "h_PET",
"h_QuFl", "h_Ra", "l_Da", "l_NaCo", "l_ShBe", "m_a", "m_DrDe",
"m_ElRa", "m_MeElm", "m_MeSlPe", "Mr_Co", "Mr_GRAv", "Mr_GREy",
"Mr_Mu", "Mr_Sa", "s_SaLo", "s_SiLo", "s_sSiLo", "s_Stl", "Sr_Li",
"Sr_SaCoCoTe", "Sr_SaLoSi", "Sr_SaMubcl", "c_C2", "c_C3", "c_C4",
"d_D2", "d_D3", "d_D4", "d_D5", "h_BF", "h_BFI", "h_ER", "h_f",
"h_PET", "h_QuFl", "h_Ra", "l_Da", "l_NaCo", "l_ShBe", "m_a",
"m_DrDe", "m_ElRa", "m_MeElm", "m_MeSlPe", "Mr_Co", "Mr_GRAv",
"Mr_GREy", "Mr_Mu", "Mr_Sa", "s_SaLo", "s_SiLo", "s_sSiLo", "s_Stl",
"Sr_Li", "Sr_SaCoCoTe", "Sr_SaLoSi", "Sr_SaMubcl", "c_C2", "c_C3",
"c_C4", "d_D2", "d_D3", "d_D4", "d_D5", "h_BF", "h_BFI", "h_ER",
"h_f", "h_PET", "h_QuFl", "h_Ra", "l_Da", "l_NaCo", "l_ShBe",
"m_a", "m_DrDe", "m_ElRa", "m_MeElm", "m_MeSlPe", "Mr_Co", "Mr_GRAv",
"Mr_GREy", "Mr_Mu", "Mr_Sa", "s_SaLo", "s_SiLo", "s_sSiLo", "s_Stl",
"Sr_Li", "Sr_SaCoCoTe", "Sr_SaLoSi", "Sr_SaMubcl", "c_C2", "c_C3",
"c_C4", "d_D2", "d_D3", "d_D4", "d_D5", "h_BF", "h_BFI", "h_ER",
"h_f", "h_PET", "h_QuFl", "h_Ra", "l_Da", "l_NaCo", "l_ShBe",
"m_a", "m_DrDe", "m_ElRa", "m_MeElm", "m_MeSlPe", "Mr_Co", "Mr_GRAv",
"Mr_GREy", "Mr_Mu", "Mr_Sa", "s_SaLo", "s_SiLo", "s_sSiLo", "s_Stl",
"Sr_Li", "Sr_SaCoCoTe", "Sr_SaLoSi", "Sr_SaMubcl"), comp_id = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("comp1",
"comp2", "comp3", "comp4"), class = "factor"), value = c(0.0633325075111356,
-0.0193713154441617, 0.000785081075580719, 0.287610195287972,
-0.0913783988809322, -0.122928438782758, 0.305621459875726, 0.0356570047659489,
0.367574915852176, -0.240835821698893, 0.0035597425358522, 0.295952594554233,
-0.0439920206129066, -0.235580426938533, 0.191947159509267, -0.132931615006652,
0.065155805120025, 0.038311284807646, 0.187182963731454, 0.120969596703282,
-0.118935354491654, -0.173851183397175, 0.125870264508295, 0.158977975187947,
-0.209351605852615, -0.0231602829054583, 0.078383405846316, 0.0959455355349004,
0.238306328058919, -0.188667962455942, -0.138302814516594, -0.0586994514783439,
0.019524606432138, 0.210636138928319, -0.204454169255484, -0.149879080476447,
0.282741114373524, -0.272911905666994, 0.102508662574812, -0.35056583225677,
0.257262737814283, 0.202117594283655, 0.191773977367133, 0.298513575892895,
0.139576016330362, 0.165641757285727, -0.071542760140058, 0.116819894570386,
0.145104320521166, 0.126636637925691, 0.0810830011112734, -0.0949935353116725,
0.0785254958291791, 0.0326439188223452, 0.065833153228218, 0.155405435626813,
0.128737420120173, 0.214943178842044, -0.0210359058420932, 0.0117832135586799,
0.0762824228178598, -0.29145271973574, -0.17089908579109, -0.0992003952524557,
0.163749177828358, 0.196561728687348, 0.0951493527111932, 0.17238711709624,
0.0638301486629609, -0.0351097560634362, 0.0647994534663104,
-0.154895398844537, 0.186448424833243, 0.240881706707846, -0.241364320964797,
-0.089459273670017, 0.0491598702691844, -0.200660845431752, -0.0339722426751736,
0.131396251991635, -0.195471026941394, -0.05919918680627, -0.184160478394361,
0.129464190293723, 0.193021703469902, 0.178985522376368, -0.245966624042807,
-0.23478025602535, 0.198620462933836, -0.157573246492692, -0.00808698000885529,
0.0413693509741982, -0.121020524702316, 0.105148862728949, 0.214386790903084,
-0.204515275979768, -0.0906160054540168, -0.276985960928353,
0.0768294557774406, -0.074181085595352, 0.138680723918144, -0.119684214245213,
-0.0919678069134681, 0.322602153170851, 0.228878715511945, -0.433082572929477,
0.05754301130056, 0.130719232236558, 0.253999327778221, 0.0469683234741709,
-0.0258294537417061, -0.258318910865727, -0.00406472629347961,
-0.165003562015847, -0.0292142578447021, 0.00862320222199929,
0.0875367120866572, 0.0331716236283754, -0.0418387105725687,
-0.12523142839593, -0.200857915084298, 0.138378222132672, 0.00992811008724002,
-0.0201043482518474, -0.148894977354092, -0.323240591170999,
-0.0556713655820164, 0.379033571103569, -0.264420286734383, 0.127560649906739,
-0.00546455207923468, -0.203293330594455, -0.122085266718802,
-0.0970860819632599, -0.173818516285048, -0.0585031143296301,
0.125084378608705, 0.0655074180474436, 0.254339734692359, 0.00114212078410835
)), class = "data.frame", .Names = c("predictor", "comp_id",
"value"), row.names = c(NA, -140L))
Here is an approach using the tidyverse and a continuous scale:
library(tidyverse)
df %>%
arrange(comp_id, desc(value)) %>% #arrange by comp_id and descending value
mutate(ord = 1:n()) -> dat #create the x scale
ggplot(dat, aes(x = ord, y = value , group = comp_id , colour = comp_id))+
geom_line()+
facet_wrap(~comp_id, ncol = 1, scales = "free_x", drop = TRUE)+
theme(axis.text.x=element_text(angle=35, vjust=1, hjust=1)) +
scale_x_continuous(labels = dat$predictor, breaks = dat$ord, expand = c(0.02, 0.02))
In addition to the nice answer by @missuse, there was another way that gave me what I wanted:
using as.factor / as.numeric / as.character for the x axis
aes(x = as.factor(as.numeric(as.character(ord))))
and using as.numeric / as.character when replacing the x-axis labels
as.numeric(as.character(ord))
The final script is
ggplot(dat, aes(x = as.factor(as.numeric(as.character(ord))), y = value , group = comp_id , colour = comp_id))+
geom_line()+
facet_wrap(~comp_id, ncol = 1, scales = "free_x", labeller = label_parsed, drop = TRUE)+
theme(axis.text.x=element_text(angle=35, vjust=1, hjust=1))+
scale_x_discrete(labels = dat[, setNames(as.character(predictor), as.numeric(as.character(ord)))])
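As for why comp3 went wrong in the original script, the likely cause is the two-digit padding in sprintf("%02i", ...). With 140 rows, frank() produces ranks up to 140, and comp3 spans ranks 71 to 105. Because ord is a character column, the discrete x scale sorts it alphabetically, so "100" to "105" come before "71". comp4 (106 to 140) is unaffected because all of its labels have three digits. This is also why the as.numeric() round trip works: a factor built from numbers sorts its levels numerically. A small base-R illustration, with example rank values chosen from that range:

```r
# Example ranks spanning the two- to three-digit boundary, as in comp3
ranks <- c(71L, 99L, 100L, 105L)

pad2 <- sprintf("%02i", ranks)  # "71" "99" "100" "105": padding too short
pad3 <- sprintf("%03i", ranks)  # "071" "099" "100" "105"

levels(factor(pad2))  # character sort puts "100" and "105" before "71"
levels(factor(pad3))  # equal-width strings sort in numeric order
```

So using "%03i" in the original dat[, ord := ...] line should also fix it, while the continuous-scale approach avoids string sorting altogether.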