Data labels for mean and percentiles in a distribution chart - r

I'm creating a custom chart to visualize a variable's distribution using geom_density. I added 3 vertical lines for a custom value, the 5th percentile and the 95th percentile.
How do I add labels for those lines?
I tried using geom_text, but I don't know how to set its x and y parameters.
library(ggplot2)
ggplot(dataset, aes(x = dataset$`Estimated percent body fat`)) +
geom_density() +
geom_vline(aes(xintercept = dataset$`Estimated percent body fat`[12]),
color = "red", size = 1) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.05, na.rm = TRUE)),
color = "grey", size = 0.5) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.95, na.rm = TRUE)),
color="grey", size=0.5) +
geom_text(aes(x = dataset$`Estimated percent body fat`[12],
label = "Custom", y = 0),
colour = "red", angle = 0)
I'd like to obtain the following:
For the custom value, I'd like to add the label at the top of the chart, just to the right of the line.
For the percentile labels, I'd like to add them in the middle of the chart: to the left of the line for the 5th percentile and to the right of the line for the 95th percentile.
Here is what I was able to obtain: https://i.imgur.com/thSQwyg.png
And these are the first 50 rows of my dataset:
structure(list(`Respondent sequence number` = c(21029L, 21034L,
21043L, 21056L, 21067L, 21085L, 21087L, 21105L, 21107L, 21109L,
21110L, 21125L, 21129L, 21138L, 21141L, 21154L, 21193L, 21195L,
21206L, 21215L, 21219L, 21221L, 21232L, 21239L, 21242L, 21247L,
21256L, 21258L, 21287L, 21310L, 21325L, 21367L, 21380L, 21385L,
21413L, 21418L, 21420L, 21423L, 21427L, 21432L, 21437L, 21441L,
21444L, 21453L, 21466L, 21467L, 21477L, 21491L, 21494L, 21495L
), `Estimated percent body fat` = c(NA, 7.2, NA, NA, 24.1, 25.1,
30.2, 23.6, 24.3, 31.4, NA, 14.1, 20.5, NA, 23.1, 30.6, 21, 20.9,
NA, 24, 26.7, 16.6, NA, 26.9, 16.9, 21.3, 15.9, 27.4, 13.9, NA,
20, NA, 12.8, NA, 33.8, 18.1, NA, NA, 28.4, 10.9, 38.1, 33, 39.3,
15.9, 32.7, NA, 20.4, 16.8, NA, 29)), row.names = c(NA, 50L), class =
"data.frame")

First, I recommend cleaning up the column names.
dat <- dataset
names(dat) <- tolower(gsub("\\s", "\\.", names(dat)))
With base R plots you could do the following. The key point is that you can store the quantiles and the custom value and reuse them later as coordinates, which gives you dynamic positioning. I'm not sure if/how this is possible with ggplot.
plot(density(dat$estimated.percent.body.fat, na.rm=TRUE), ylim=c(0, .05),
main="Density curve")
abline(v=c1 <- dat$estimated.percent.body.fat[12], col="red")
abline(v=q1 <- quantile(dat$estimated.percent.body.fat, .05, na.rm=TRUE), col="grey")
abline(v=q2 <- quantile(dat$estimated.percent.body.fat, .95, na.rm=TRUE), col="grey")
text(c1 + 4, .05, c(expression("" %<-% "custom")), cex=.8)
text(q1 - 5.5, .025, c(expression("5% percentile" %->% "")), cex=.8)
text(q2 + 5.5, .025, c(expression("" %<-% "95% percentile")), cex=.8)
Note: In case you don't like the arrows, just use e.g. "5% percentile" instead of c(expression("5% percentile" %->% "")).
Or, in ggplot, you could use annotate():
library(ggplot2)
ggplot(dataset, aes(x = dataset$`Estimated percent body fat`)) +
geom_density() +
geom_vline(aes(xintercept = dataset$`Estimated percent body fat`[12]),
color = "red", size = 1) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.05, na.rm = TRUE)),
color = "grey", size = 0.5) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.95, na.rm = TRUE)),
color="grey", size=0.5) +
annotate("text", x=16, y=.05, label="custom") +
annotate("text", x=9.5, y=.025, label="5% percentile") +
annotate("text", x=38, y=.025, label="95% percentile")
Note that in either solution the result (i.e. the exact label positions) depends on your export size. To learn how to control this, take a look at e.g. "How to save a plot as image on the disk?".
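The hard-coded x and y values above have to be tuned by hand. Here is a sketch of the dynamic approach in ggplot2, assuming ggplot2 >= 3.4.0 (for linewidth; use size on older versions): store the line positions once, pass them to annotate(), and let hjust decide which side of the line the text sits on. The column name body_fat is a stand-in for the original name. Setting y = Inf with vjust = 1 pins the custom label to the top of the panel. Placing the percentile labels at half the density peak is an arbitrary choice.

```r
library(ggplot2)

# Body-fat values copied from the question's dataset (column renamed to body_fat)
bf <- c(NA, 7.2, NA, NA, 24.1, 25.1, 30.2, 23.6, 24.3, 31.4, NA, 14.1, 20.5,
        NA, 23.1, 30.6, 21, 20.9, NA, 24, 26.7, 16.6, NA, 26.9, 16.9, 21.3,
        15.9, 27.4, 13.9, NA, 20, NA, 12.8, NA, 33.8, 18.1, NA, NA, 28.4, 10.9,
        38.1, 33, 39.3, 15.9, 32.7, NA, 20.4, 16.8, NA, 29)
dat <- data.frame(body_fat = bf)

# Store the line positions once and reuse them as label coordinates
custom <- dat$body_fat[12]
q05 <- unname(quantile(dat$body_fat, 0.05, na.rm = TRUE))
q95 <- unname(quantile(dat$body_fat, 0.95, na.rm = TRUE))
d_max <- max(density(dat$body_fat, na.rm = TRUE)$y)  # approximate peak of the curve

p <- ggplot(dat, aes(x = body_fat)) +
  geom_density(na.rm = TRUE) +
  geom_vline(xintercept = custom, colour = "red", linewidth = 1) +
  geom_vline(xintercept = c(q05, q95), colour = "grey", linewidth = 0.5) +
  # hjust = 0: text starts at x (right of the line); y = Inf + vjust = 1: top of panel
  annotate("text", x = custom, y = Inf, label = " custom",
           hjust = 0, vjust = 1, colour = "red") +
  # hjust = 1: text ends at x (left of the line)
  annotate("text", x = q05, y = d_max / 2, label = "5th percentile ", hjust = 1) +
  annotate("text", x = q95, y = d_max / 2, label = " 95th percentile", hjust = 0)
p
```

The labels are anchored at the stored values rather than at hand-picked offsets, so they stay next to their lines when the data change. The leading and trailing spaces in the labels add a small gap between the text and the line.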
Data
dataset <- structure(list(`Respondent sequence number` = c(21029L, 21034L,
21043L, 21056L, 21067L, 21085L, 21087L, 21105L, 21107L, 21109L,
21110L, 21125L, 21129L, 21138L, 21141L, 21154L, 21193L, 21195L,
21206L, 21215L, 21219L, 21221L, 21232L, 21239L, 21242L, 21247L,
21256L, 21258L, 21287L, 21310L, 21325L, 21367L, 21380L, 21385L,
21413L, 21418L, 21420L, 21423L, 21427L, 21432L, 21437L, 21441L,
21444L, 21453L, 21466L, 21467L, 21477L, 21491L, 21494L, 21495L
), `Estimated percent body fat` = c(NA, 7.2, NA, NA, 24.1, 25.1,
30.2, 23.6, 24.3, 31.4, NA, 14.1, 20.5, NA, 23.1, 30.6, 21, 20.9,
NA, 24, 26.7, 16.6, NA, 26.9, 16.9, 21.3, 15.9, 27.4, 13.9, NA,
20, NA, 12.8, NA, 33.8, 18.1, NA, NA, 28.4, 10.9, 38.1, 33, 39.3,
15.9, 32.7, NA, 20.4, 16.8, NA, 29)), row.names = c(NA, 50L), class =
"data.frame")

Related

How to plot fitted, observed and forecast values in the same graph using R

So, I have a nnetar model and I want to plot my fitted, observed (actual) and forecast values in the same plot, with different colors and a legend. It is time-series data, and "y" is my ts() object.
library(forecast) # provides nnetar() and forecast()
fit <- nnetar(y, xreg = train_reg)
results <- forecast(fit, xreg = test_reg)
plot(results)
With this code I only get the forecast values and their visualization. I know that I can access the fitted values with results$fitted and the forecasts with results$mean.
Thank you!
This is very simple: you are looking for the points() command. When you call plot(), set type="n", then fill in the data with points() as shown below.
Please provide data in your next post.
Here, I have a pretend data frame df that has two columns of data:
People that voted yes (percent)
Percent employment (meaning, how high employment is in the area)
The variables are arbitrary here, but it's just to illustrate what you want to do.
Here is the df
df <- structure(list(employment = c(23, 14.6, 9.9, 20.1, 34.4, 13.8,
20.6, 37.2, 21.8, 17.3, 13.1, 16.8, 24.6, 12.6, 13.6, 24.4, 19.3,
20, 22.6, 27.4, 23.1, 10.7, 32.1, 22, 25.6, 25), yes = c(55.2,
54.4, 63.5, 50.6, 39, 51.1, 48.5, 39.1, 59.4, 50.6, 44.1, 53.3,
39.3, 58.8, 59.1, 58.1, 63.1, 54.6, 55.9, 68.2, 57.8, 58.2, 38.9,
48.3, 49.9, 47.3)), class = "data.frame", row.names = c(NA, -26L
))
Fit the model and add the fitted values to df:
model <- lm(yes ~ employment, data = df)
df$fit <- predict(model) # add fitted, aka predicted, values to df
We can also add confidence intervals to the plot using the code below, and you can shade them in with the polygon() command if you wish.
Let's make a custom transparent color using library(yarrr) and the function yarrr::transparent().
library(yarrr)
par(mfrow = c(1, 1), cex = 2, pch = 10) # set plot window to desired conditions
plot(yes ~ employment, data = df, ylab = "Yes [%]", xlab = "Employment [%]", type = "n")
points(yes ~ employment, data = df)
points(fit ~ employment, data = df, col = "red") # Note, these are the fitted values in red
newx <- seq(min(df$employment), max(df$employment), length.out = 100)
preds <- predict(model, newdata = data.frame(employment = newx), interval = "confidence")
polygon(c(rev(newx), newx), c(rev(preds[, 3]), preds[, 2]), col = yarrr::transparent("583", trans.val = .5), border = NA)
abline(model, col = "blue", lwd = 2) # regression line
lines(newx, preds[, 3], lty = "dashed", col = "blue", lwd = 2) # upper confidence bound
lines(newx, preds[, 2], lty = "dashed", col = "blue", lwd = 2) # lower confidence bound
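The answer above uses a regression example. For a time series like the one in the question, the same "empty frame, then one layer at a time" idea can be sketched with lines() and legend(). This sketch uses stats::arima() on the built-in AirPassengers series as a stand-in for nnetar(), so it runs without the forecast package. With the question's objects you would use results$fitted (or fitted(fit)) in place of fitted_vals and results$mean in place of fc.

```r
# Stand-in model (assumption): the classic airline ARIMA on log(AirPassengers),
# trained on data up to the end of 1958 so that 1959-1960 can be forecast.
y <- log(AirPassengers)
train <- window(y, end = c(1958, 12))
fit <- arima(train, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fitted_vals <- train - residuals(fit)    # in-sample fitted values
fc <- predict(fit, n.ahead = 24)$pred    # 24-month point forecast (a ts object)

# Empty frame spanning the whole observed series, then one layer per colour
plot(y, type = "n", ylab = "log(passengers)",
     main = "Observed, fitted and forecast")
lines(y, col = "black")
lines(fitted_vals, col = "red", lty = 2)
lines(fc, col = "blue", lwd = 2)
legend("topleft", legend = c("Observed", "Fitted", "Forecast"),
       col = c("black", "red", "blue"), lty = c(1, 2, 1), lwd = c(1, 1, 2),
       bty = "n")
```

Because all three objects are ts objects, lines() places them on the shared time axis automatically, with no manual x coordinates.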

How to get rid of annotations on faceted graph?

Problem
I am trying to annotate the left facet of my graph while leaving the right facet without annotations.
Data
Here are my libraries and data:
#### Libraries ####
library(tidyverse)
library(ggpubr)
library(plotly)
#### Dput ####
emlit <- structure(list(X = 1:20, Ethnicity = c("Asian (other than Chinese)",
"Filipino", "Indonesian", "Thai", "Japanese", "Korean", "South Asian",
"Indian", "Nepalese", "Pakistani", "Other South Asian", "Other Asian",
"White", "Mixed", "With Chinese parent", "Other mixed", "Others",
"All ethnic minorities", "All ethnic minorities, excluding\n foreign domestic helpers",
"Whole population"), Age_5.14 = c(65.8, 72.2, 69.4, 83.1, 26.6,
52.4, 67.4, 60.4, 69.5, 71.5, 92.5, 92, 34.8, 76.6, 84.2, 45.3,
51.3, 64.3, 64.3, 94.8), Age_15.24 = c(28.1, 29.2, 4.4, 72.9,
34.8, 50.3, 38.7, 41.4, 22.2, 54.3, 41.9, 64.7, 24.4, 82.9, 90.7,
37.4, 53.2, 40.6, 52.9, 96.9), Age_25.34 = c(4.5, 1.8, 4.6, 20,
17.2, 26.8, 6.6, 4.2, 6.4, 11.9, 12, 33.9, 15, 60.5, 82, 6.7,
11.2, 7.8, 21.8, 84.9), Age_35.44 = c(6.3, 2, 6.1, 35.7, 36.5,
25.5, 9.4, 6.2, 10.5, 10.1, 22.4, 35.7, 8.6, 63, 83.2, 4.5, 12.2,
9.5, 23.4, 84.6), Age_45.54 = c(8.1, 2.3, 8, 23.2, 43.4, 59.6,
7.5, 6.3, 3.9, 13.5, 28.3, 47.5, 13.1, 72.1, 84, 4.4, 22.4, 14.2,
27.7, 92.5), Age_55.64 = c(15.9, 4.4, 44, 27, 41.7, 52.8, 11.8,
7.4, 9.5, 2, 54.2, 39.6, 12.7, 75.3, 80.1, 2.6, 20.6, 25, 32.4,
94.8), Age_65. = c(31.1, 11.9, 82.6, 39, 46.4, 57, 9.5, 3.9,
NA, 11.4, 66.5, 74.5, 14.5, 80.5, 81, 57.5, 13.6, 42.7, 44, 82.3
), Age_Overall = c(10.1, 3.5, 6.4, 31.4, 35.1, 39.8, 20.4, 15.3,
16.4, 33.8, 30.4, 46.3, 15.4, 72.7, 83.9, 19.4, 19.8, 16.9, 35.2,
89.4)), class = "data.frame", row.names = c(NA, -20L))
I have also pivoted the data for my graph:
#### Pivot Data ####
emlitpivot <- emlit %>%
pivot_longer(cols = contains("Age"),
names_to = "Age_Range",
values_to = "Percent")
Plot
Here is my plot so far, a faceted graph that breaks down literacy by age with some notes on some important points on the left:
#### EM vs all ####
# Order
order <- c("5-14", "15-24", "25-34", "35-44", "45-54", "55-64", "65+", "Overall",
"5-14", "15-24", "25-34", "35-44", "45-54", "55-64", "65+", "Overall")
# Plot
plot <- emlitpivot %>%
filter(Ethnicity %in% c("All ethnic minorities",
"Whole population")) %>%
ggbarplot(x="Age_Range",
y="Percent",
fill = "Ethnicity",
label = T,
palette = "jco",
facet.by = "Ethnicity",
title = "EM x Native Chinese Literacy by Age",
xlab = "Age Range",
ylab = "Literacy in Chinese (By Percent)",
caption = "*Data obtained from Census and Statistics Department Hong Kong SAR, 2016.")+
theme_cleveland()+
theme(axis.text.x = element_text(angle = 45,
hjust = .5,
vjust = .5),
legend.position = "none",
plot.caption = element_text(face = "italic"))+
scale_x_discrete(labels=order)+
geom_segment(aes(x = 3, y = 15, xend = 3, yend = 48))+
geom_segment(aes(x = 1, y = 71, xend = 1, yend = 80))+
geom_segment(aes(x = 7, y = 50, xend = 7, yend = 65))+
annotate("text",
x=4,
y=53,
label = "Post-college workers can't read.")+
annotate("text",
x=3.5,
y=85,
label = "School age supports seem to boost initial literacy.")+
annotate("text",
x=6,
y=70,
label = "Increase due to generational literacy?")
# Print plot:
plot
However, you can probably guess what the problem is:
How do I get rid of the annotations on the right? I'm not sure if there is a simple way of getting rid of them, but it would be helpful to only have text on the left side.
In this case I'll use geom_text() instead of annotate(), since geom_text() accepts a subset of your data.
library(tidyverse)
library(ggpubr)
emlitpivot %>%
filter(Ethnicity %in% c(
"All ethnic minorities",
"Whole population"
)) %>%
ggbarplot(
x = "Age_Range",
y = "Percent",
fill = "Ethnicity",
label = T,
palette = "jco",
facet.by = "Ethnicity",
title = "EM x Native Chinese Literacy by Age",
xlab = "Age Range",
ylab = "Literacy in Chinese (By Percent)",
caption = "*Data obtained from Census and Statistics Department Hong Kong SAR, 2016."
) +
theme_cleveland() +
theme(
axis.text.x = element_text(
angle = 45,
hjust = .5,
vjust = .5
),
legend.position = "none",
plot.caption = element_text(face = "italic")
) +
scale_x_discrete(labels = order) +
geom_segment(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(x = 3, y = 15, xend = 3, yend = 48)) +
geom_segment(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(x = 1, y = 71, xend = 1, yend = 80)) +
geom_segment(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(x = 7, y = 50, xend = 7, yend = 65)) +
geom_text(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(4, 53), label = "Post-college workers can't read.", check_overlap = T) +
geom_text(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(3.5, 85), label = "School age supports seem to boost initial literacy.", check_overlap = T) +
geom_text(data = subset(emlitpivot, Ethnicity == "All ethnic minorities"), aes(6, 70), label = "Increase due to generational literacy?", check_overlap = T)
Update: removing the lines in the second facet as well.
Create a data frame with your text labels and their positions and add it to the plot; to remove the lines, follow the same procedure:
ann_text holds the text labels
segm holds the line segments
ann_text <- data.frame(x = c(4, 3.5, 6),
y = c(53, 85, 70),
lab = c("Post-college workers can't read.", "School age supports seem to boost initial literacy.",
"Increase due to generational literacy?"),
Ethnicity = rep("All ethnic minorities", 3))
segm <- data.frame(x = c(3,1,7),
y = c(15, 71, 50),
xend = c(3,1,7),
yend = c(48,80,65),
Ethnicity = rep("All ethnic minorities", 3))
plot1 <- plot +
geom_text(
data = ann_text,
mapping = aes(x = x, y = y, label = lab)
)
plot1 + geom_segment(
data = segm,
mapping = aes(x = x, y = y, xend = xend, yend = yend)
)
Remove the following from your code (and likewise the three geom_segment() calls, since segm now draws the lines):
annotate("text",
x=4,
y=53,
label = "Post-college workers can't read.")+
annotate("text",
x=3.5,
y=85,
label = "School age supports seem to boost initial literacy.")+
annotate("text",
x=6,
y=70,
label = "Increase due to generational literacy?")
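The data-frame approach works because a layer whose data contains the faceting column is drawn only in the matching panel, whereas annotate() has no such column and is repeated in every panel. Here is a minimal, self-contained sketch with facet_wrap(). The bars and notes data frames and their values are made up for illustration.

```r
library(ggplot2)

# Toy data (hypothetical): two groups, each shown in its own panel
bars <- data.frame(
  group = rep(c("A", "B"), each = 3),
  x     = rep(c("low", "mid", "high"), 2),
  y     = c(10, 40, 25, 80, 85, 90)
)

# The annotation data carries the facet variable, so this row lands in panel A only
notes <- data.frame(group = "A", x = "mid", y = 55, label = "Only in panel A")

p <- ggplot(bars, aes(x, y)) +
  geom_col() +
  geom_text(data = notes, aes(label = label)) +
  facet_wrap(~ group)
p
```

Adding more rows to notes (with group set to "A" or "B") places further labels in the corresponding panels. The same pattern works for geom_segment() with xend/yend columns.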

Add labels to scatter plot

I cannot seem to add labels to my scatter plot.
I want to label the points with the states variable. It currently doesn't work, either because of a conflict with annotate or with some other geom function.
Here is my data:
datatrials <- structure(list(states = c(
"AP", "AR", "AS", "BR", "CH", "GJ",
"HR", "HP", "JK", "JH", "KA", "KL", "MP", "NL", "OR", "PY", "PB",
"RJ", "SK", "TG", "TR", "UP", "UT"
), cured = c(
60.44117647, 2.631578947,
24.70095694, 47.31207289, 71.09634551, 67.46961326, 36.86526743,
41.78272981, 35.24676234, 42.68617021, 37.26310608, 43.57429719,
63.40242198, 0, 59.29648241, 30.48780488, 85.3956229, 69.87152922,
0, 51.52317881, 36.96581197, 25.99078341, 59.29659755
), Total = c(
4080L,
38L, 1672L, 4390L, 301L, 18100L, 2954L, 359L, 2857L, 752L, 4063L,
1494L, 8588L, 58L, 2388L, 82L, 2376L, 9652L, 2L, 3020L, 468L,
1085L, 8729L
), deaths = c(
1.666666667, 0, 0.23923445, 0.569476082,
1.661129568, 6.198895028, 0.778605281, 1.39275766, 1.190059503,
0.664893617, 1.304454836, 0.736278447, 4.319981369, 0, 0.293132328,
0, 1.978114478, 2.165354331, 0, 3.278145695, 0, 0.737327189,
2.623439111
), SLT_preval = c(
7.1, 39.3, 41.7, 23.5, 6.1, 19.2,
6.3, 3.1, 4.3, 35.4, 16.3, 5.4, 28.1, 39, 42.9, 4.7, 8, 14.1,
9.7, 10.1, 48.5, 29.4, 12.4
), smoking_preval = c(
14.2, 22.7,
13.3, 5.1, 9.4, 7.7, 19.7, 14.2, 20.8, 11.1, 8.8, 9.3, 10.2,
13.2, 7, 7.2, 7.3, 13.2, 10.9, 8.3, 27.7, 13.5, 18.1
), density_Population = c(
330.7605972,
18.75330475, 453.9513884, 1325.360556, 10162.04386, 325.839688,
637.9420067, 133.8522264, 108.3866651, 484.1552049, 352.2724528,
918.5972004, 276.9192201, 135.6954581, 297.7151573, 2951.02714,
598.4943608, 236.7722235, 97.27325254, 351.2115064, 397.6534427,
987.360228, 210.3632556
), avg_tobacco_use = c(
10.65, 31, 27.5,
14.3, 7.75, 13.45, 13, 8.65, 12.55, 23.25, 12.55, 7.35, 19.15,
26.1, 24.95, 5.95, 7.65, 13.65, 10.3, 9.2, 38.1, 21.45, 15.25
), urbanization = c(
29.47, 22.94, 14.1, 11.29, 97.25, 42.6, 34.88,
10.03, 26.11, 24.05, 38.67, 47.7, 27.63, 28.86, 16.69, 68.33,
37.48, 24.87, 25.15, 38.88, 26.17, 22.27, 30.23
), gats2_tobacco_india = c(
20,
45.5, 48.2, 25.9, 13.7, 25.1, 23.6, 16.1, 23.7, 38.9, 22.8, 12.7,
34.2, 43.3, 45.6, 11.2, 13.4, 24.7, 17.9, 17.8, 64.5, 35.5, 26.5
), Cases_per_pop = c(
7.56909681, 2.419676298, 4.695700757, 3.517630291,
25.98247866, 28.33774883, 10.4734347, 4.817527749, 20.99759524,
1.948492028, 6.013674471, 4.184939244, 10.06104045, 2.578127257,
5.151399591, 5.80103032, 7.882852583, 11.91124239, 0.289749671,
7.672231694, 11.22357603, 0.456107101, 77.58519395
)), class = "data.frame", row.names = c(
NA,
-23L
))
This is my code:
library(ggplot2)
library(ggExtra)
library(gridExtra)
#working plot
plot1 <- ggplot(datatrials, aes(SLT_preval,urbanization)) + geom_point(color = '#CC9933') +
geom_smooth(fullrange=TRUE,method = "lm", level=0.95) +
ylab("Urbanization %") +
xlab("Smokeless Tobacco Use %") +
theme(axis.text=element_text(size=14),
axis.title=element_text(size=14)) +
scale_x_continuous(expand=c(0,0), limits=c(0,100)) +
scale_y_continuous(expand=c(0,0), limits=c(-50,100)) +
coord_cartesian(xlim = c(0, 70), ylim = c(0, 100)) +
theme(axis.title.y = element_text(margin=margin (t=0, r=5, b=0, l=0))) +
geom_label(x = 0.95*max(SLT_preval), y = 0.92*max(urbanization), size = 4.3, label = "n = 32; p-value = 0.015; \n CI = -0.799:-0.050; rho = -0.426")
plot2 <- ggplot(datatrials, aes(smoking_preval,urbanization)) + geom_point(color = '#615513') +
geom_smooth(fullrange=TRUE,method = "lm", se=FALSE) +
ylab("Urbanization %") +
xlab("Smoking %") +
theme(axis.text=element_text(size=14),
axis.title=element_text(size=14)) +
scale_x_continuous(expand=c(0,0), limits=c(0,100)) +
scale_y_continuous(expand=c(0,0), limits=c(-50,100)) +
coord_cartesian(xlim = c(0, 70), ylim = c(0, 100)) +
geom_label(x = 1.35*max(smoking_preval), y = 0.92*max(urbanization), size = 4.3, label = "n = 32; p-value = 0.186; \n CI = -0.641:0.165; rho = -0.239")
p1 <- ggMarginal(plot1, type="histogram", colour = '#FF0000', fill = '#FAC95F')
p2 <- ggMarginal(plot2, type="histogram", colour = '#FF0000', fill = '#615513')
grid.arrange(p1, p2, ncol=2)
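Try this. First, I added geom_text(aes(label = states)). Second, there was an error in your use of geom_label(): its x and y are computed outside the plot data, so max(SLT_preval) and max(urbanization) need the datatrials$ prefix. I also switched to annotate(), which works fine if you set geom = "label".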
library(ggplot2)
library(ggExtra)
# working plot
plot1 <- ggplot(datatrials, aes(SLT_preval, urbanization)) +
geom_point(color = "#CC9933") +
geom_smooth(fullrange = TRUE, method = "lm", level = 0.95) +
# Add geom_text
geom_text(aes(label = states)) +
ylab("Urbanization %") +
xlab("Smokeless Tobacco Use %") +
theme(
axis.text = element_text(size = 14),
axis.title = element_text(size = 14)
) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 100)) +
scale_y_continuous(expand = c(0, 0), limits = c(-50, 100)) +
coord_cartesian(xlim = c(0, 70), ylim = c(0, 100)) +
theme(axis.title.y = element_text(margin = margin(t = 0, r = 5, b = 0, l = 0))) +
# Switch to annotate. Add geom. Add datatrials$
annotate(
geom = "label", x = 0.95 * max(datatrials$SLT_preval),
y = 0.92 * max(datatrials$urbanization), size = 4.3, label = "n = 32; p-value = 0.015; \n CI = -0.799:-0.050; rho = -0.426"
)
plot2 <- ggplot(datatrials, aes(smoking_preval, urbanization)) +
geom_point(color = "#615513") +
geom_smooth(fullrange = TRUE, method = "lm", se = FALSE) +
# Add geom_text
geom_text(aes(label = states)) +
ylab("Urbanization %") +
xlab("Smoking %") +
theme(
axis.text = element_text(size = 14),
axis.title = element_text(size = 14)
) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 100)) +
scale_y_continuous(expand = c(0, 0), limits = c(-50, 100)) +
coord_cartesian(xlim = c(0, 70), ylim = c(0, 100)) +
# Switch to annotate. Add geom. Add datatrials$
annotate(geom = "label", x = 1.35 * max(datatrials$smoking_preval), y = 0.92 * max(datatrials$urbanization), size = 4.3, label = "n = 32; p-value = 0.186; \n CI = -0.641:0.165; rho = -0.239")
p1 <- ggMarginal(plot1, type = "histogram", colour = "#FF0000", fill = "#FAC95F")
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
p2 <- ggMarginal(plot2, type = "histogram", colour = "#FF0000", fill = "#615513")
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
#par(mfrow = c(1, 2))
gridExtra::grid.arrange(p1, p2, ncol = 2)
Created on 2020-06-23 by the reprex package (v0.3.0)
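With geom_text(aes(label = states)) each label is centred on its point, so the text covers the marker. Here is a small sketch using geom_text()'s nudge_y argument to lift the labels, on a three-row excerpt of datatrials. The offset of 2 and the annotation text are arbitrary choices.

```r
library(ggplot2)

# Three-row excerpt of datatrials (values copied from the question)
pts <- data.frame(
  states       = c("AP", "AR", "AS"),
  SLT_preval   = c(7.1, 39.3, 41.7),
  urbanization = c(29.47, 22.94, 14.1)
)

p <- ggplot(pts, aes(SLT_preval, urbanization)) +
  geom_point(color = "#CC9933") +
  # label comes from the data; nudge_y shifts each label 2 units above its point
  geom_text(aes(label = states), nudge_y = 2, size = 3.5) +
  # x and y are computed outside the plot data, hence the explicit pts$ prefix
  annotate(geom = "label",
           x = 0.95 * max(pts$SLT_preval),
           y = 0.92 * max(pts$urbanization),
           label = "annotation text")
p
```

nudge_y is expressed in data units, so on a 0-100 axis an offset of about 2 keeps the label just above the marker.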

geom_col renders an invisible plot

While trying to generate an error plot, I found the following undesirable behavior:
# sample data (please excuse the length, you'll see it's important!)
a <- structure(list(valor = c(22.35, 23.9, 32, 36.2, 35.2, 24.3, 42,
36.4, 16.65, 40.95, 21, 33.2, 32, 33, 28.9, 28, 40.9, 28.4, 32.5,
24.9, 28.35, 36.4, 31.05, 28.4, 37.9, 35.9, 24, 27.6, 28.6, 37.4,
31.6, 31.9, 28.6, 33.9, 31.2, 27, 25.6, 31.2, 32.5, 26.4, 40,
32.9, 32.9, 31.5, 24.9, 21.9, 33.4, 31.8, 31.1, 29.6, 31.5, 29.8,
32.9, 26.6, 24.6, 35.9), error = c(-18.7573531872138, 1.31066637545209,
NA, 0.277829536700779, -2.64925385673394, -11.8996081065239,
-2.60692704590275, -1.33715023334453, NA, -7.61175343400322,
2.55982080155896, 4.4863429357563, 4.16085789426681, -3.90087313834282,
-1.8721045665811, 0.369086865146173, 12.2927002385953, -0.848796857979458,
4.13045179906004, 4.28348313246773, 3.05347592474616, -5.33715023334453,
-1.68395695575215, 5.15120314202054, -3.45360182568537, 0.700053120316895,
4.50817359293553, 1.58628172614129, 7.54200618644399, 7.58601073994592,
-6.61548902751109, -1.03317248113754, 3.54200618644399, 1.84047336001635,
3.60755820405548, 1.41075911687027, 0.661540377840424, 6.60755820405548,
-15.86954820094, 4.2336254711588, -15.4283737200925, 1.90546464068269,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-56L), class = "data.frame")
This plot works:
p32 <- ggplot(a[1:32,], aes(x = valor, y = error))+geom_col(position = "jitter")
p32
This doesn't work:
p33 <- ggplot(a[1:33,], aes(x = valor, y = error))+geom_col(position = "jitter")
p33
My research
I now understand that this is caused by the bars having zero width (see this closed issue), as can be seen here:
# Notice xmin == xmax:
head(ggplot_build(p33)$data[[1]], 3)
x y PANEL group ymin ymax xmin xmax colour
1 22.35 -18.752212 1 -1 -1.875920e+01 0.002793391 22.35 22.35 NA
2 23.90 1.315615 1 -1 -3.452852e-05 1.304455085 23.90 23.90 NA
3 32.00 NA 1 -1 NA NA 32.00 32.00 NA
fill size linetype alpha
1 grey35 0.5 1 NA
2 grey35 0.5 1 NA
3 grey35 0.5 1 NA
What I tried:
I know I can set the width manually:
ggplot(a[1:33,], aes(x = valor, y = error))+geom_col(position = "jitter", width = 0.1)
But the problem is that I'm using the ggplot call inside a function that takes the data.frame (a in my example) as an argument. It can have any length and the data may be quite different, so fixing the width manually produces some plots with very thin bars and others with very thick ones:
ggplot(a, aes(x = valor, y = error))+geom_col(position = "jitter", width = .1)
ggplot(b, aes(x = valor, y = error))+geom_col(position = "jitter", width = .1)
# with b=
b <- structure(list(valor = c(1.03, 0.43, 1.25, 1.2, 0.74, 2.33, 1.49,
1.5, 0.3, 0.96, 0.81, 1.13, 0.83, 0.68, 2.22, 0.68, 0.9, 1.03,
0.39, 0.84, 1.4, 0.85, 0.7, 1.55, 1.08, 0.37, 0.66, 0.67, 1.36,
0.97, 1.03, 0.64, 1, 0.78, 0.62, 0.5, 0.94, 0.56, 2.09, 1.01),
error = c(2.23998224289866, 0.224579421022632, -0.637159523178084,
-2.74850423807165, -2.69675570480791, 4.59843342442166, 2.34260767883423,
-12.4611961378406, 1.02484359455743, 2.46750883802447, 0.376157081501411,
-1.354215218894, 0.947671489701406, 0.426378012316505, 19.9389705823977,
-1.5736219876835, -22.1173385165668, 5.23998224289866, -0.540189922794198,
7.23019854807831, -3.46146029781903, -2.48937236945532, 5.06312219297025,
-1.49229963183367, -3.53967661036512, 0.534698732147042,
1.77779803536164, 6.10360322576836, 6.71339758402689, -5.27443362843563,
1.23998224289866, 1.11679330753741, -0.510113509024535, 0.502074779997471,
1.44272604499763, -0.91952618750328, -17.0537006712522, 3.33491106257746,
-8.09000221353266, 1.7414648468139)), row.names = c(NA, -40L
), class = "data.frame")
Per this post I tried adding + scale_x_continuous(oob = scales::rescale_none) and/or + scale_y_continuous(oob = scales::rescale_none), but neither worked (neither thickened the thin bars nor thinned the thick ones).
How can I address this with a single call that handles both a and b and produces bars that look similar?
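A bar plot is a poor fit for a continuous x scale: by default geom_col() sets the bar width to 0.9 times the resolution of x (the smallest gap between distinct x values), so the width depends on the data and can become tiny. However, you can use a different geom whose appearance does not depend on that gap:
ggplot(a, aes(x = valor, y = error)) + geom_segment(aes(xend = valor, yend = 0))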
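If bars are still preferred, one way to get comparable bars for both a and b is to derive the width from the data instead of fixing it. This is a sketch under that assumption. error_col_plot() is a hypothetical helper that sets the width to a fixed fraction (1% by default) of the x range. It drops position = "jitter" because jitter shifts the bars randomly. Bars can still overlap where x values are closer together than the width.

```r
library(ggplot2)

# Hypothetical helper: bar width as a fixed fraction of the x range,
# so bars look alike whatever the scale or spacing of `valor`
error_col_plot <- function(data, frac = 0.01) {
  w <- frac * diff(range(data$valor, na.rm = TRUE))
  ggplot(data, aes(x = valor, y = error)) +
    geom_col(width = w, na.rm = TRUE)  # na.rm drops rows with missing error quietly
}

# Toy data (hypothetical): same shape, x values on two very different scales
d1 <- data.frame(valor = c(1, 2, 3, 11), error = c(1, -1, 2, 0.5))
d2 <- transform(d1, valor = valor / 10)

p1 <- error_col_plot(d1)  # width 0.01 * 10 = 0.1
p2 <- error_col_plot(d2)  # width 0.01 * 1  = 0.01
```

With the question's data this would be error_col_plot(a) and error_col_plot(b). In both plots each bar then spans the same share of the panel width.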

Vertical gradient color with geom_area [duplicate]

This question already has answers here: How to make gradient color filled timeseries plot in R (4 answers)
I'm having a hard time finding a way to create a gradient fill.
This is how it should look (don't mind the blue bars):
Something similar to How to make gradient color filled timeseries plot in R, but that example is a bit too advanced for me to reuse. I don't have any negative values, and the maximum is 80. I tried the answer offered by nograpes: my PC froze for some 6-7 minutes and then I got this message:
Error in rowSums(na) :
'Calloc' could not allocate memory (172440001 of 16 bytes)
The data below is only a subset; the full data has 841 rows (some containing NAs), so the solution in the earlier answer hardly works for me.
df <- structure(list(date = structure(c(1497178800, 1497182400, 1497186000,
1497189600, 1497193200, 1497196800, 1497200400, 1497204000, 1497207600,
1497211200, 1497214800, 1497218400, 1497222000, 1497225600, 1497229200,
1497232800, 1497236400, 1497240000, 1497243600, 1497247200, 1497250800,
1497254400, 1497258000, 1497261600, 1497265200, 1497268800, 1497272400,
1497276000, 1497279600, 1497283200, 1497286800, 1497290400, 1497294000,
1497297600, 1497301200, 1497304800, 1497308400, 1497312000, 1497315600,
1497319200, 1497322800, 1497326400, 1497330000, 1497333600, 1497337200,
1497340800, 1497344400, 1497348000, 1497351600, 1497355200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), dk_infpressure = c(22, 21.6, 21.2,
20.9, 20.5, 20.1, 19.8, 19.4, 19, 18.6, 18.2, 17.9, 17.5, 17.1,
16.8, 16.4, 16, 15.6, 15.2, 14.9, 14.5, 14.1, 13.8, 13.4, 13,
12.5, 11.9, 11.4, 10.8, 10.3, 9.8, 9.2, 8.7, 8.1, 7.6, 7, 6.5,
6, 5.4, 4.9, 4.3, 3.8, 3.2, 2.7, 2.2, 1.6, 1.1, 0.5, 0, 0)), .Names = c("date",
"dk_infpressure"), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
Code to get basic plot:
ggplot()+
geom_area(data=df, aes(x = date, y= dk_infpressure ) )+
scale_y_continuous(limits = c(0, 80))
Because geom_area can't take a gradient fill, it's a somewhat hard problem.
Here's a decidedly hacky but possibly sufficient option: it builds a raster (using geom_tile, since the x and y sizes differ) and covers the ragged edges with cropping and ggforce::geom_link (a version of geom_segment that can plot a gradient):
library(tidyverse)
df %>%
mutate(dk_infpressure = map(dk_infpressure, ~seq(0, .x, .05))) %>% # make grid of points
unnest(cols = dk_infpressure) %>% # expand the list column into one row per grid point
ggplot(aes(date, dk_infpressure, fill = dk_infpressure)) +
geom_tile(width = 3600, height = 0.05) +
# hide square tops
ggforce::geom_link(aes(color = dk_infpressure, xend = lag(date), yend = lag(dk_infpressure)),
data = df, size = 2.5, show.legend = FALSE) +
scale_x_datetime(expand = c(0, 0)) + # hide overplotting of line
scale_y_continuous(expand = c(0, 0))
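The answer's grid has one tile per 0.05 units of height at every time point, so its size grows quickly with the data. Here is a sketch of the same grid idea with a configurable, coarser step, written in base R plus ggplot2. gradient_grid() is a hypothetical helper, and the toy data and the yellow-to-red palette are made up. coord_cartesian() is used for the 0-80 range because limits set on the scale may drop tiles whose lower edge falls below 0.

```r
library(ggplot2)

# Hypothetical helper: expand each (date, value) pair into a column of points
# from 0 up to the value, `step` units apart (one tile per point)
gradient_grid <- function(df, step = 0.5) {
  df <- df[!is.na(df$dk_infpressure), ]               # drop missing readings
  ys <- lapply(df$dk_infpressure, function(v) seq(0, v, by = step))
  data.frame(date = rep(df$date, lengths(ys)),        # one date per generated point
             dk_infpressure = unlist(ys))
}

# Toy input with hourly timestamps (hypothetical values)
toy <- data.frame(
  date = as.POSIXct("2017-06-11 11:00", tz = "UTC") + 3600 * 0:3,
  dk_infpressure = c(2, 1.5, NA, 0)
)
tiles <- gradient_grid(toy)

p <- ggplot(tiles, aes(date, dk_infpressure, fill = dk_infpressure)) +
  geom_tile(width = 3600, height = 0.5) +             # tile height matches step
  scale_fill_gradient(low = "yellow", high = "red") +
  coord_cartesian(ylim = c(0, 80), expand = FALSE)
p
```

Raising step trades smoothness of the gradient for fewer tiles. The tops of the columns stay ragged, so the geom_link trick from the answer is still needed to hide them.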
