I'd like to make a boxplot with the mean instead of the median. Moreover, I would like the whiskers to stop at the 5% (lower) and 95% (upper) quantiles. Here is the code:
ggplot(data, aes(x=Cement, y=Mean_Gap, fill=Material)) +
geom_boxplot(fatten = NULL,aes(fill=Material), position=position_dodge(.9)) +
xlab("Cement") + ylab("Mean cement layer thickness") +
stat_summary(fun=mean, geom="point", aes(group=Material), position=position_dodge(.9),color="black")
I'd like to change the geom to errorbar, but this doesn't work. I tried middle = mean(Mean_Gap), but this doesn't work either. I tried ymin = quantile(y,0.05), but nothing changed. Can anyone help me?
The standard boxplot using ggplot. fill is Material:
Here is how you can create the boxplot using custom parameters for the box and whiskers. It's the solution shown by @lukeA in stackoverflow.com/a/34529614/6288065, but this one also shows how to draw several boxes per group.
The R built-in data set called "ToothGrowth" is similar to your data structure so I will use that as an example. We will plot the length of tooth growth (len) for each vitamin C supplement group (supp), separated/filled by dosage level (dose).
# "ToothGrowth" at a glance
head(ToothGrowth)
# len supp dose
#1 4.2 VC 0.5
#2 11.5 VC 0.5
#3 7.3 VC 0.5
#4 5.8 VC 0.5
#5 6.4 VC 0.5
#6 10.0 VC 0.5
library(dplyr)
library(ggplot2)
# recreate the data structure with specific "len" coordinates to plot for each group
df <- ToothGrowth %>%
group_by(supp, dose) %>%
summarise(
y0 = quantile(len, 0.05),
y25 = quantile(len, 0.25),
y50 = mean(len),
y75 = quantile(len, 0.75),
y100 = quantile(len, 0.95))
df
## A tibble: 6 x 7
## Groups: supp [2]
# supp dose y0 y25 y50 y75 y100
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 OJ 0.5 8.74 9.7 13.2 16.2 19.7
#2 OJ 1 16.8 20.3 22.7 25.6 26.9
#3 OJ 2 22.7 24.6 26.1 27.1 30.2
#4 VC 0.5 4.65 5.95 7.98 10.9 11.4
#5 VC 1 14.0 15.3 16.8 17.3 20.8
#6 VC 2 19.8 23.4 26.1 28.8 33.3
# boxplot using the mean for the middle line and the 5%/95% quantiles for the whiskers
ggplot(df, aes(supp, fill = as.factor(dose))) +
geom_boxplot(
aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
stat = "identity"
) +
labs(y = "len", title = "Boxplot with Mean Middle Line") +
theme(plot.title = element_text(hjust = 0.5))
In the figure above, the boxplot on the left is the standard boxplot with regular median line and regular min/max whiskers. The boxplot on the right uses the mean middle line and 5%/95% quantile whiskers.
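If pre-summarising the data is inconvenient, the same five numbers can probably also be computed inside ggplot2 itself. Below is a minimal sketch, assuming stat_summary() is handed a helper that returns the ymin/lower/middle/upper/ymax columns the boxplot geom expects. The name five_num is made up for this example and is not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): returns the five values
# the boxplot geom needs, with the mean as the middle line and the
# 5%/95% quantiles as the whisker ends.
five_num <- function(x) {
  q <- quantile(x, c(0.05, 0.25, 0.75, 0.95), names = FALSE)
  data.frame(ymin = q[1], lower = q[2], middle = mean(x),
             upper = q[3], ymax = q[4])
}

# stat_summary() applies five_num() once per supp/dose group
p <- ggplot(ToothGrowth, aes(supp, len, fill = factor(dose))) +
  stat_summary(fun.data = five_num, geom = "boxplot",
               position = position_dodge(0.9)) +
  labs(fill = "dose", title = "Boxplot with Mean Middle Line")
p
```

With the data from the question, the call would presumably become aes(x = Cement, y = Mean_Gap, fill = Material) with the same stat_summary() layer.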
I want to plot a facet_matrix showing scatter plots and autodensity plots on the diagonal. However, for some reason it does not show the density plot for a certain variable (gini_eurostat). I assume this is because there are some missing values for gini_eurostat. How can I make it show the density plot, even though there are some missing values?
This is the code I used:
ggplot(df_Q2, aes(x = .panel_x, y = .panel_y)) +
geom_autodensity() +
geom_point(alpha = 1, shape = 16, size = 0.5) +
facet_matrix(vars(c(intraEU_trade_bymemberstate_pct, gini_eurostat, exports_currentUSD)),
layer.upper = 2, layer.diag=1, layer.lower = 2) +
theme_few()
The data frame looks like this:
head(df_Q2[,c("intraEU_trade_bymemberstate_pct", "gini_eurostat", "exports_currentUSD")])
# A tibble: 6 × 3
# intraEU_trade_bymemberstate_pct gini_eurostat exports_currentUSD
# <dbl> <dbl> <dbl>
# 1 8.6 NA 96701496330.
# 2 8.8 27.4 116638893905.
# 3 8.8 25.8 141025428359.
# 4 8.4 26.3 153625979356.
# 5 8 25.3 170827273868.
# 6 8.1 26.2 204299603066.
I am developing an EDA (Estimation of Distribution Algorithm). I'm collecting all the measures of the Pareto front's solutions for distinct configurations.
I have a structure with all values:
> metrics20
# A tibble: 320 x 6
File Hypervolume `Modified Hypervolume` Spread Spacing Time
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 001-unif-0.csv 25771 26294. 391. 30.1 16.8
2 002-unif-0.csv 27481 28416. 534. 41.1 16.5
3 003-unif-0.csv 26394 26842. 356. 29.6 16.5
4 004-unif-0.csv 30828 31696 418. 38.0 16.5
5 005-unif-0.csv 28146 28727 444. 34.2 16.6
6 006-unif-0.csv 30176 31006 451. 50.1 16.6
7 007-unif-0.csv 29374 30216 537. 35.8 16.5
8 008-unif-0.csv 27434 28156. 439. 31.4 16.5
9 009-unif-0.csv 28944 29426 471. 33.7 16.4
10 010-unif-0.csv 28339 29302. 576. 44.3 16.4
I want to visualize the values in the following way. Taking the Hypervolume column as an example, I split the data by the distribution named in the File column (-unif-, -sat-, -eff- and -prod-) and, for each distribution, show the values for -0.csv, -0.25.csv, -0.5.csv and -0.75.csv on the x axis.
Reproducible example:
library(readr)
metrics20 <- read_csv("./metrics20.csv")
Data: Link
Hopefully this is a step towards what you're looking for:
library(readr)
library(dplyr)
library(ggplot2)
metrics20 <- read_csv("metrics20.csv")
metrics20 %>%
mutate(tag = factor(gsub("(^\\d+-)(\\w+)(-.*$)", "\\2", .$File), levels = c("unif", "sat", "eff", "prod")),
level = gsub("(^\\d+-\\w+-)(.*)(\\.csv$)", "\\2", .$File)) %>%
ggplot(aes(x = level, y = Hypervolume)) +
geom_boxplot() +
facet_wrap(~tag, nrow = 1)+
theme_minimal() +
theme(panel.border = element_rect(colour = "black", fill = NA),
panel.grid = element_blank())
From here there may be other things you want to tweak if you need to adjust it to be more like the example plot. You should be able to find all next steps in the help for the functions used.
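To see what the two gsub() calls extract, here is a small base R check on a few file names. Only the first name appears in the question's output. The other three are assumed to follow the same pattern, based on the distributions and suffixes the question lists.

```r
# File names: the first is from the question, the rest are assumed examples
files <- c("001-unif-0.csv", "017-sat-0.25.csv",
           "102-eff-0.5.csv", "300-prod-0.75.csv")

# Middle part of the name: the distribution
tag <- gsub("(^\\d+-)(\\w+)(-.*$)", "\\2", files)

# Part between the second dash and ".csv": the level shown on the x axis
level <- gsub("(^\\d+-\\w+-)(.*)(\\.csv$)", "\\2", files)

data.frame(files, tag, level)
#               files  tag level
# 1    001-unif-0.csv unif     0
# 2  017-sat-0.25.csv  sat  0.25
# 3   102-eff-0.5.csv  eff   0.5
# 4 300-prod-0.75.csv prod  0.75
```

Note that level stays a character vector, so ggplot treats it as a discrete x axis, which is what the boxplots need.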
I am trying to generate a cumulative gain plot using ggplot2 in R. Basically, I want to replicate the following using ggplot2.
My Data is this
df
# A tibble: 10 x 6
Decile resp Cumresp Gain Cumlift
<int> <dbl> <dbl> <dbl> <dbl>
1 8301 8301 57.7 5.77
2 2449 10750 74.8 3.74
3 1337 12087 84.0 2.80
4 751 12838 89.3 2.23
5 462 13300 92.5 1.85
6 374 13674 95.1 1.58
7 252 13926 96.8 1.38
8 195 14121 98.2 1.23
9 136 14257 99.1 1.10
10 124 14381 100 1
## Cumulative Gains Plot
ggplot(df, aes(Decile, Gain)) +
geom_point() +
geom_line() +
geom_abline(intercept = 52.3 , slope = 4.77)
scale_y_continuous(breaks = seq(0, 100, by = 20)) +
scale_x_continuous(breaks = c(1:10)) +
labs(title = "Cumulative Gains Plot",
y = "Cumulative Gain %")
However, I am not able to get the diagonal line, even though I tried geom_abline, and my y-axis is not right either: I could not get it to run from 0 to 100.
I would really appreciate if someone can get me the plot as in picture using ggplot2.
Thanks in advance
library(dplyr); library(ggplot2)
df2 <- df %>%
add_row(Decile = 0, Gain =0) %>%
arrange(Decile)
ggplot(df2, aes(Decile, Gain)) +
geom_point() +
geom_line() +
# This makes another geom_line that only sees the first and last row of the data
geom_line(data = df2 %>% slice(1, n())) +
scale_y_continuous(breaks = seq(0, 100, by = 20), limits = c(0,100)) +
scale_x_continuous(breaks = c(1:10)) +
labs(title = "Cumulative Gains Plot",
y = "Cumulative Gain %")
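As an alternative sketch, assuming the diagonal is meant to be the baseline from 0% gain at decile 0 to 100% gain at decile 10, the line can also be drawn with annotate(). That baseline has intercept 0 and slope 10, which is why geom_abline(intercept = 52.3, slope = 4.77) does not produce it. The data frame gains is rebuilt here from the Decile and Gain columns in the question, plus the added (0, 0) row.

```r
library(ggplot2)

# Decile and Gain copied from the question, with an extra origin row
gains <- data.frame(
  Decile = 0:10,
  Gain = c(0, 57.7, 74.8, 84.0, 89.3, 92.5, 95.1, 96.8, 98.2, 99.1, 100)
)

p <- ggplot(gains, aes(Decile, Gain)) +
  geom_point() +
  geom_line() +
  # Baseline drawn as a single segment from (0, 0) to (10, 100)
  annotate("segment", x = 0, y = 0, xend = 10, yend = 100,
           linetype = "dashed") +
  scale_y_continuous(breaks = seq(0, 100, by = 20), limits = c(0, 100)) +
  scale_x_continuous(breaks = 0:10) +
  labs(title = "Cumulative Gains Plot", y = "Cumulative Gain %")
p
```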
I'm trying to plot a geom_tile plot for a dataset, where I need to highlight the max and min values in every row (with the colour palette going from green to red).
Dataset:
draft_mean trim rf_pwr
1 12.0 1.0 12253
2 12.0 0.8 12052
3 12.0 0.6 12132
4 12.0 0.4 12280
5 12.0 0.2 11731
6 12.0 0.0 11317
7 12.0 -0.2 12126
8 12.0 -0.4 12288
9 12.0 -0.6 12461
10 12.0 -0.8 12791
11 12.0 -1.0 12808
12 12.2 1.0 12346
13 12.2 0.8 12041
14 12.2 0.6 12345
15 12.2 0.4 12411
16 12.2 0.2 12810
17 12.2 0.0 12993
18 12.2 -0.2 12796
19 12.2 -0.4 12411
20 12.2 -0.6 12342
21 12.2 -0.8 12671
22 12.2 -1.0 13161
ggplot(dataset, aes(trim, draft_mean)) +
geom_tile(aes(fill=rf_pwr), color="black") +
scale_fill_gradient(low= "green", high= "red") +
scale_x_reverse() +
scale_y_reverse()
This plot (image) is taking the minimum values and plotting them as green and maximum values as red. What I need help with is that I need colour palette to go from green to red (minimum to maximum) for every row of the plot (2 rows in this plot) rather than the whole plot.
For draft_mean=12.2, rf_pwr should be colour formatted from minimum to maximum for trim values.
For every value of draft_mean, I should be able to tell the trim values with lowest and highest rf_pwr.
I can plot individual draft_mean values to check, but all draft_mean values need to be visualized together.
You can create a scaled variable where min = 0 and max = 1 per group like this:
library(tidyverse)
# create toy data
set.seed(1)
df <- data.frame(
  draft_mean = sort(rep(c(12, 12.2), 11)),
  trim = rep(sample(seq(-1, 1, length.out = 11), replace = FALSE), 2),
  rf_pwr = sample(11000:13000, 22)
)
# create a scaled variable per unique draft_mean (min = 0 and max = 1)
df <- df %>%
  group_by(draft_mean) %>%
  mutate(rf_scl = (rf_pwr - min(rf_pwr)) / (max(rf_pwr) - min(rf_pwr)))
ggplot(df, aes(trim, draft_mean)) +
geom_tile(aes(fill=rf_scl), color="black") +
scale_fill_gradient(low= "green", high= "red") +
scale_x_reverse() +
scale_y_reverse()
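To check the per-row scaling against the numbers in the question, here is a base R sketch on the 22 rows posted there. The helper name rescale01 is made up for this example, and ave() is used instead of group_by()/mutate() only to keep it free of package dependencies.

```r
# The 22 rows from the question
dataset <- data.frame(
  draft_mean = rep(c(12.0, 12.2), each = 11),
  trim = rep(seq(10, -10, by = -2) / 10, times = 2),
  rf_pwr = c(12253, 12052, 12132, 12280, 11731, 11317, 12126, 12288,
             12461, 12791, 12808,
             12346, 12041, 12345, 12411, 12810, 12993, 12796, 12411,
             12342, 12671, 13161)
)

# Hypothetical helper: rescale a vector so that min = 0 and max = 1
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Apply the rescaling separately within each draft_mean
dataset$rf_scl <- ave(dataset$rf_pwr, dataset$draft_mean, FUN = rescale01)

# Rows holding the lowest (0) and highest (1) rf_pwr of each draft_mean
dataset[dataset$rf_scl %in% c(0, 1), ]
#    draft_mean trim rf_pwr rf_scl
# 6        12.0  0.0  11317      0
# 11       12.0 -1.0  12808      1
# 13       12.2  0.8  12041      0
# 22       12.2 -1.0  13161      1
```

Mapping fill to rf_scl then colours each row from green (its own minimum) to red (its own maximum), so the legend shows relative position within a row rather than absolute rf_pwr.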
I have data that looks something like this:
time level strain
<dbl> <dbl> <chr>
1 0.0 0.000 M12-611020
2 1.0 0.088 M12-611020
3 3.0 0.211 M12-611020
4 4.0 0.278 M12-611020
5 4.5 0.404 M12-611020
6 5.0 0.606 M12-611020
7 5.5 0.778 M12-611020
8 6.0 0.902 M12-611020
9 6.5 1.024 M12-611020
10 8.0 1.100 M12-611020
11 0.0 0.000 M12-611025
12 1.0 0.077 M12-611025
13 3.0 0.088 M12-611025
14 4.0 0.125 M12-611025
15 5.0 0.304 M12-611025
16 5.5 0.421 M12-611025
17 6.0 0.518 M12-611025
18 6.5 0.616 M12-611025
19 7.0 0.718 M12-611025
I can easily graph it using ggplot, asking ggplot to look at the strains separately and using stat_smooth to fit a curve:
ggplot(data = data, aes(x = time, y = level), group = strain) + stat_smooth(aes(group=strain,fill=strain, colour = strain) ,method = "loess", se = F, span = 0.8) +
theme_gray()+xlab("Time(h)") +
geom_point(aes(fill=factor(strain)),alpha=0.5 , size=3,shape = 21,colour = "black", stroke = 1)+
theme(legend.position="right")
I would then like to predict using the loess curve that was fitted. I do so as follows:
# define the model
model <- loess(time ~ strain,span = 0.8, data = data)
# Predict the time (y) for a given level (x)
predict(model, newdata = 0.3, se = FALSE)
However, I do not know how to predict for one or the other of my strains set out above (i.e. the red or blue lines in the plot).
Additionally, is there a simple way to plot this prediction on the graph, for example as a dotted line going across at 0.3 and down to the predicted time as above?
Do you mean something like this?
p <- ggplot(data = dat, aes(x = time, y = level, fill = strain)) +
geom_point(alpha=0.5 , size=3,shape = 21, colour = "black", stroke = 1) +
stat_smooth(aes(group=strain, colour=strain) ,method = "loess", se = F, span = 0.8)
newdat <- split(dat, dat$strain)
mod <- lapply(newdat, function(x)loess(level ~ time,span = 0.8, data = x))
predict(mod[["M12-611020"]], newdata = 2, se = FALSE)
p +
geom_segment(aes(x=2, xend=2, y=0, yend=0.097), linetype="dashed") +
geom_segment(aes(x=0, xend=2, y=0.097, yend=0.097), linetype="dashed")
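The code above predicts the level at a given time. If the goal is the reverse, as in the question (the time at which a strain reaches level 0.3), one possible approach is to search the fitted curve numerically. This is only a sketch: time_at_level is a made-up helper name, and it assumes the loess curve crosses the target level somewhere within the observed time range.

```r
# Data for strain M12-611020, copied from the question
d <- data.frame(
  time  = c(0, 1, 3, 4, 4.5, 5, 5.5, 6, 6.5, 8),
  level = c(0, 0.088, 0.211, 0.278, 0.404, 0.606, 0.778, 0.902, 1.024, 1.1)
)
fit <- loess(level ~ time, span = 0.8, data = d)

# Hypothetical helper: find the time at which the fitted curve equals
# `target` by root-finding on (prediction - target) over `range`
time_at_level <- function(fit, target, range) {
  f <- function(t) predict(fit, newdata = data.frame(time = t)) - target
  uniroot(f, interval = range, tol = 1e-6)$root
}

t_hat <- time_at_level(fit, target = 0.3, range = range(d$time))
t_hat
```

The value of t_hat could then replace the hard-coded coordinates in the geom_segment() calls, with the horizontal segment drawn at y = 0.3 and the vertical one at x = t_hat. If the fitted curve is not monotone, uniroot() returns just one of the crossings.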