Changing default quantiles (hinges) in boxplot - r

I've been learning R for the past few months and I've struggled with something that I couldn't figure out.
I have a simple question: how do I display the 20th and 80th percentiles instead of the 25th and 75th (Q1/Q3) in a boxplot while using the tidyverse?
I have looked for documentation in the R Graph Gallery, the tidyverse help, and many other sites, but I couldn't reproduce the examples. They usually show only one box, while I have seven to display.
Here is a sample of my data:
dataset <- structure(
list(
PM1 = c(0.4, 6.2, 5.1, 7.8, 8, NA, NA, 5.2),
PM2 = c(2, 8, 5.6, 8, NA, 6.4, 10.3, 7),
PM3 = c(NA, 7.2, 4.8, 4.4, NA, NA, 10.3, 5.9),
PM4 = c(1.2, 8.7, 5.4, NA, NA, NA, NA, NA),
PM5 = c(3.5, NA, 1.9, 2.2, NA, 3.5, 9.4, 0.3),
PM6 = c(1.3, NA, 1.1, NA, NA, 2.8, NA, NA),
PM7 = c(NA, NA, NA, 0.4, NA, NA, 8.8, 0.6)),
row.names = c(NA, -8L),
class = c("tbl_df", "tbl", "data.frame")
)
I can make the boxplot with these different quantiles using qboxplot. Here's the code that I used:
library(qboxplot)
dataset %>%
qboxplot(
main = "Dissolved Oxygen",
probs = c(0.20, 0.50, 0.80),
ylim = c(0, 12),
ylab = "mg/L",
xlab = "Monitoring Points"
)
I have searched ggplot2 for something similar to probs = c(0.20, 0.50, 0.80) from the qboxplot package, but I only found different approaches that I couldn't reproduce, like here, here and here.
library(tidyverse)
dataset %>%
pivot_longer(
cols = everything(),
names_to = "monitoring_point",
values_to = "oxigenio_dissolvido"
) %>%
ggplot(
aes(x = monitoring_point,
y = oxigenio_dissolvido)
)+
stat_boxplot(
geom = "errorbar",
width = 0.3,
position = position_dodge(width = 0.65)
)+
geom_boxplot()+
labs(title = "Dissolved Oxygen",
y = "oxigenio_dissolvido (mg/L)")+
scale_y_continuous(
expand = expansion(mult = c(0,0)),
limits = c(0, 12)
)+
theme_bw()+
theme(
plot.title = element_text(hjust = 0.5)
)
I think I'm close to my desired output, but I really didn't get how to change the hinges. Thank you very much in advance for helping me!

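Define a function that returns the five values needed to draw each box. Here the hinges are the 20th and 80th percentiles, and the whiskers end at the 10th and 90th percentiles:
f <- function(x) {
  # ymin/ymax are the whisker ends, lower/upper the hinges, middle the median
  r <- quantile(x, probs = c(0.1, 0.2, 0.5, 0.8, 0.9), na.rm = TRUE)
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}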
Then plot it with stat_summary(), without a separate geom_boxplot() layer (that layer would draw the default 25/75 box underneath):
dataset %>%
  pivot_longer(
    cols = everything(),
    names_to = "monitoring_point",
    values_to = "oxigenio_dissolvido"
  ) %>%
  ggplot(aes(monitoring_point,
             oxigenio_dissolvido)) +
  stat_summary(fun.data = f, geom = "boxplot")
Code modified from a previous related question
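To check which values the custom box will use, the same quantiles can be tabulated per monitoring point. This is a minimal sketch that reuses two columns from the sample data in the question; the helper name box_stats and the column name value are only illustrative.

```r
library(dplyr)
library(tidyr)

# Two columns copied from the sample data in the question
sample_data <- data.frame(
  PM1 = c(0.4, 6.2, 5.1, 7.8, 8, NA, NA, 5.2),
  PM2 = c(2, 8, 5.6, 8, NA, 6.4, 10.3, 7)
)

# Hypothetical helper: same probabilities as f(), returned as a one-row data frame
box_stats <- function(x) {
  q <- quantile(x, probs = c(0.1, 0.2, 0.5, 0.8, 0.9), na.rm = TRUE)
  data.frame(ymin = q[[1]], lower = q[[2]], middle = q[[3]],
             upper = q[[4]], ymax = q[[5]])
}

# One row of box statistics per monitoring point
quantile_table <- sample_data %>%
  pivot_longer(everything(), names_to = "monitoring_point", values_to = "value") %>%
  group_by(monitoring_point) %>%
  summarise(box_stats(value), .groups = "drop")

print(quantile_table)
```

A table like this could also be drawn directly with geom_boxplot(stat = "identity") by mapping the five columns to the ymin, lower, middle, upper and ymax aesthetics.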


R: `mapped_discrete` objects can only be created from numeric vectors

I have the following data in R:
df <- structure(list(t0 = c(3.82, -4.88, NA, -3.83, -3.99, NA, NA,
NA, 6.35, 2.47, 0.28, 0.3, NA, 8.31, NA, NA, NA, 2.76, NA, 1.38
), t1 = c(NA, NA, NA, NA, NA, NA, -1.23, 2.19, 4.13, 3.49, -0.42,
NA, 3.78, 2.7, 1.17, NA, NA, NA, NA, NA), t2 = c(-1.85, NA, 1.46,
0.17, NA, NA, -2.81, 1.75, NA, 2.32, -3.08, -1.39, NA, 7.53,
1.77, NA, 0.1, NA, NA, -2.61), t3 = c(-2.05, 3.73, -2.04, -0.22,
-4.29, NA, NA, -0.11, 0.43, NA, -0.78, 3.24, NA, NA, -1.13, 1.09,
NA, NA, 2.7, NA), t4 = c(1.01, -2.77, NA, -3.05, -2.33, 3.78,
NA, NA, NA, NA, -2.04, -4.01, -2.32, 4, -0.28, NA, NA, 9.04,
NA, -4.12), t5 = c(1.56, NA, 4.89, NA, NA, NA, NA, NA, 0.88,
3.15, NA, NA, 2.59, NA, 2.04, NA, NA, NA, -0.26, NA), t6 = c(0.34,
-0.99, NA, 1.93, NA, NA, NA, NA, 0.35, NA, -6.46, NA, NA, NA,
2.57, NA, NA, 4.89, NA, -5.63), t7 = c(0.52, NA, 0.5, 1.85, -6.23,
NA, NA, 1.59, 7.82, 0.82, NA, NA, -1.77, NA, NA, NA, 2.01, NA,
0.7, -1.55), t8 = c(NA, NA, 4.9, -3.93, -8.13, 3.14, 0.03, 1.67,
3.55, NA, -1.55, 2.57, -0.87, NA, 0.71, -0.1, NA, NA, 2.04, NA
), t9 = c(-1.09, NA, -0.52, NA, NA, NA, NA, NA, NA, 2.05, -5.21,
-0.89, -0.03, NA, 0.66, 3.72, -1.96, NA, NA, NA)), row.names = c(NA,
20L), class = "data.frame")
Using the following tutorial (https://jenslaufer.com/data/analysis/visualize_missing_values_with_ggplot.html), I am trying to make a visualization that shows the percentage of missing data:
library(dplyr)
library(ggplot2)
library(tidyverse)
row.plot <- df %>%
mutate(id = row_number()) %>%
gather(-id, key = "key", value = "val") %>%
mutate(isna = is.na(val)) %>%
ggplot(aes(key, id, fill = isna)) +
geom_raster(alpha=0.8) +
scale_fill_manual(name = "",
values = c('steelblue', 'tomato3'),
labels = c("Present", "Missing")) +
scale_x_discrete(limits = levels) +
labs(x = "Variable",
y = "Row Number", title = "Missing values in rows") +
coord_flip()
When I try to see the results, this is the error that I get:
row.plot
Error in `new_mapped_discrete()`:
! `mapped_discrete` objects can only be created from numeric vectors
Run `rlang::last_error()` to see where the error occurred.
Warning messages:
1: In structure(in_domain, pos = match(in_domain, breaks)) :
Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
Consider 'structure(list(), *)' instead.
2: In structure(in_domain, pos = match(in_domain, breaks)) :
Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
Consider 'structure(list(), *)' instead.
3: Removed 200 rows containing missing values (geom_raster).
My Question: Can someone please show me what I am doing wrong and how can I fix this error? In the end, I would like to get this kind of picture:
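The error is caused by scale_x_discrete(limits = levels). No object called levels is defined in your code, so R most likely picks up the base function levels() instead, and that returns NULL for a plain character vector such as key.
You don't need the scale here at all, so the simplest fix is to drop that line: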
df %>%
mutate(id = row_number()) %>%
gather(-id, key = "key", value = "val") %>%
mutate(isna = is.na(val)) %>%
ggplot(aes(key, id, fill = isna)) +
geom_raster(alpha=0.8) +
scale_fill_manual(name = "",
values = c('steelblue', 'tomato3'),
labels = c("Present", "Missing")) +
#scale_x_discrete(limits = levels)
labs(x = "Variable",
y = "Row Number", title = "Missing values in rows") +
coord_flip()
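A quick way to see what limits = levels receives in the question's code: no object named levels was created there, so the name resolves to the base R function. The sketch below only illustrates that lookup; the vector key is a made-up stand-in for the variable names.

```r
# No variable called `levels` has been defined, so the name refers to base::levels()
is.function(levels)

# Made-up stand-in for the x-axis values (plain character, not a factor)
key <- c("t0", "t1", "t2")

# A character vector carries no levels attribute, so this is NULL
levels(key)

# A factor does have levels
levels(factor(key))
```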
It looks like you were wanting to produce this plot for missing data on each row rather than for each variable (though I've provided both here). The main issue is that levels is not provided, so we can create that here, then provide as a factor to scale_x_discrete.
library(tidyverse)
output <- df %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_to = "key", values_to = "val") %>%
select(-key) %>%
group_by(id) %>%
mutate(isna = is.na(val),
total = n()) %>%
group_by(id, total, isna) %>%
summarise(num.isna = n()) %>%
mutate(pct = num.isna / total * 100)
levels <- output %>% filter(isna == T) %>% arrange(desc(pct)) %>% pull(id)
row.plot <- output %>%
ggplot() +
geom_bar(aes(
x = reorder(id, desc(pct)),
y = pct,
fill = isna
),
stat = 'identity',
alpha = 0.8) +
scale_x_discrete(limits = factor(levels)) +
scale_fill_manual(
name = "",
values = c('steelblue', 'tomato3'),
labels = c("Present", "Missing")
) +
coord_flip() +
labs(title = "Percentage of missing values", x =
'Row Number', y = "% of missing values")
Output
Or if you want to do it by variable, then:
output <- df %>%
pivot_longer(everything(), names_to = "key", values_to = "val") %>%
group_by(key) %>%
mutate(isna = is.na(val),
total = n()) %>%
group_by(key, total, isna) %>%
summarise(num.isna = n()) %>%
mutate(pct = num.isna / total * 100)
levels <- output %>% filter(isna == T) %>% arrange(desc(pct)) %>% pull(key)
row.plot <- output %>%
ggplot() +
geom_bar(aes(
x = reorder(key, desc(pct)),
y = pct,
fill = isna
),
stat = 'identity',
alpha = 0.8) +
scale_x_discrete(limits = levels) +
scale_fill_manual(
name = "",
values = c('steelblue', 'tomato3'),
labels = c("Present", "Missing")
) +
coord_flip() +
labs(title = "Percentage of missing values", x =
'Variable', y = "% of missing values")
Output
When I run the code from your tutorial with your data, there is no error. Maybe you want something like this:
library(tidyverse)
missing.values <- df %>%
gather(key = "key", value = "val") %>%
mutate(isna = is.na(val)) %>%
group_by(key) %>%
mutate(total = n()) %>%
group_by(key, total, isna) %>%
summarise(num.isna = n()) %>%
mutate(pct = num.isna / total * 100)
levels <- (missing.values %>% filter(isna == T) %>% arrange(desc(pct)))$key
percentage.plot <- missing.values %>%
ggplot() +
geom_bar(aes(x = reorder(key, desc(pct)), y = pct, fill=isna), stat = 'identity', alpha=0.8, width = 1) +
scale_x_discrete(limits = levels) +
scale_fill_manual(name = "", values = c('goldenrod3', 'firebrick3'), labels = c("Present", "Missing")) +
coord_flip() +
labs(title = "Percentage of missing values", x = 'Variable', y = "% of missing values") +
theme_bw() +
theme(panel.grid = element_blank(),
panel.border = element_blank())
Output:
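As a cross-check for the percentages in these bar charts, base R can compute the share of missing values per variable and per row directly. A minimal sketch on a small made-up data frame (toy is not the data from the question):

```r
# Made-up data frame with a few missing values
toy <- data.frame(
  t0 = c(1, NA, 3, NA),
  t1 = c(NA, NA, NA, 4),
  t2 = c(1, 2, 3, 4)
)

# is.na() gives a logical matrix; the mean of a logical column is the share of TRUE
pct_by_variable <- colMeans(is.na(toy)) * 100
pct_by_row <- rowMeans(is.na(toy)) * 100

# Variables ordered from most to least missing
sort(pct_by_variable, decreasing = TRUE)
```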

Change font of specific rows to bold in forestplot

I wrote a script using the "forestplot" package. I want to group the variables into categories and show the category rows in bold to accentuate them. How can I adjust my script so that only certain rows (i.e. "Risk factor / OR (95% CI)", "Patient characteristics", "Medication history", "Comorbidities", "Surgical history" and "Other") are shown in bold? I have two columns and 18 rows. Can someone help me? I would be very grateful!
My script is as below:
tabletext <- cbind(
c("Risk factor" ,"Patient characteristics","Sex, male*", "Bmi (5 points)",
"Alcohol (5 units)", "Smoking*","Medication history",
"Steroid use", "Anticoagulant use*","Comorbidities",
"COPD GOLD 1/2", "COPD GOLD 3/4", "Other pulmonary disease",
"Surgical history",
"Previous colorectal surgery*",
"Previous abdominal surgery (other)","Other", "HIPEC*"),
c("OR (95% CI)",NA, "1.78 (1.20-2.68)", "1.15 (0.95-1.38)", "1.04 (0.94-1.14)",
"1.78 (1.11-2.80)", NA," 1.40 (0.68-2.67)", "1.55 (1.02-2.32)",NA,
"1.40 (0.70-2.61)", "1.56 (0.42-4.67)", "1.78 (0.63-4.28)",NA,
"1.61 (1.03-2.49)", "0.80 (0.47-1.32)",NA, "4.14 (2.14-7.73)"))
?fpTxtGp
require(forestplot)
forestplot(tabletext,
txt_gp = fpTxtGp(label = list(gpar(fontfamily = "Times",
fontface="bold"),
gpar(fontfamily = "",
col = "black"))),
df_c,new_page = TRUE,
boxsize = 0.2,
is.summary = c(rep(FALSE,32)),
clip = c(0,17),
xlab = 'Odds ratio with 95% confidence interval
* indicates significance',
xlog = FALSE,
zero = 1,
plotwidth=unit(12, "cm"),
colgap=unit(2, "mm"),
col = fpColors(box = "royalblue",
line = "darkblue",
summary = "royalblue"))
It's not clear what df_c is, so I just created it based on your tabletext matrix:
df_c <- data.frame(mean = c(NA, NA, 1.78, 1.15, 1.04, 1.78, NA, 1.4, 1.55,
NA, 1.4, 1.56, 1.78, NA, 1.61, 0.8, NA, 4.14),
lower = c(NA, NA, 1.2, 0.95, 0.94, 1.11, NA, 0.68, 1.02, NA, 0.7,
0.42, 0.63, NA, 1.03, 0.47, NA, 2.14),
upper = c(NA, NA, 2.68, 1.38,1.14, 2.8, NA, 2.67,2.32, NA,
2.61, 4.67, 4.28, NA, 2.49, 1.32, NA, 7.73))
From there, it's just a matter of setting is.summary to TRUE for the rows that should stand out; forestplot styles those as summary rows, which are bold by default:
forestplot(tabletext,
txt_gp = fpTxtGp(label = list(gpar(fontfamily = "Times"),
gpar(fontfamily = "",
col = "black"))),
df_c,new_page = TRUE,
boxsize = 0.2,
is.summary = c(TRUE, TRUE, rep(FALSE, 4),
TRUE, FALSE, FALSE, TRUE,
rep(FALSE,3), TRUE, FALSE, FALSE, TRUE, FALSE),
clip = c(0,17),
xlab = 'Odds ratio with 95% confidence interval
* indicates significance',
xlog = FALSE,
zero = 1,
plotwidth=unit(12, "cm"),
colgap=unit(2, "mm"),
col = fpColors(box = "royalblue",
line = "darkblue",
summary = "royalblue"))
Which generates the following figure:
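Counting TRUE/FALSE positions by hand is easy to get wrong with 18 rows. One possible alternative, sketched here with base R only, is to derive the flags from the labels themselves. The vector names row_labels, header_rows and is_summary are illustrative, not part of forestplot.

```r
# First column of tabletext from the question
row_labels <- c("Risk factor", "Patient characteristics", "Sex, male*", "Bmi (5 points)",
                "Alcohol (5 units)", "Smoking*", "Medication history",
                "Steroid use", "Anticoagulant use*", "Comorbidities",
                "COPD GOLD 1/2", "COPD GOLD 3/4", "Other pulmonary disease",
                "Surgical history",
                "Previous colorectal surgery*",
                "Previous abdominal surgery (other)", "Other", "HIPEC*")

# Rows that should be styled as headers (exact matches only)
header_rows <- c("Risk factor", "Patient characteristics", "Medication history",
                 "Comorbidities", "Surgical history", "Other")

# TRUE for header rows, FALSE for everything else
is_summary <- row_labels %in% header_rows
which(is_summary)
```

The resulting logical vector could then be passed as is.summary = is_summary in the forestplot() call.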

Ordering matrix plot using ggplot2

I am trying to plot a matrix plot using ggplot2. I am using the following code
library(tidyverse)
library(RColorBrewer)
df %>%
mutate(Models = factor(Models, labels = c("NDVI","SR","WBI","NWI-1","NWI-2","NWI-3","NWI-4","1650/2220 nm ratio"))) %>%
pivot_longer(-Models) %>%
mutate(p.value = cut(value, c(max(value, na.rm = T), 0.05, 0.01, min(value, na.rm = T)),
labels = c("NS","< 0.05","< 0.01"))) %>%
ggplot(aes(x=Models,y=name, fill=p.value)) +
theme_bw() +
geom_tile() +
xlab("Parameters") + ylab(" ") +
theme(text=element_text(size=18, family="serif"))+
scale_colour_manual(values = c("#DAA520", "#F5DEB3", "#FFF8DC","#DCDCDC"),
aesthetics = c("colour", "fill")) +
geom_text(aes(label=format(round(value, 2), nsmall = 2)), color="black", size=2)
which returns me the following plot
As you can see from the plot the x-axis labels are ordered according to my order. But I am unable to order y-axis. So, my questions are
How can I order y-axis? and
How to remove the NAs?
Only colour values < 0.05 and < 0.01 and > 0.05, not all.
Data
df = structure(list(Models = c("NDVI", "SR", "WBI", "NWI-1", "NWI-2",
"NWI-3", "NWI-4", "1650/2220 nm ratio"), NDVI = c(NA, 0.008,
0.017, 0.58, 0.02, 0.035, 0.067, 0.027), SR = c(NA, NA, 0.203,
0.542, 0.618, 0.825, 0.007, 0.015), WBI = c(NA, NA, NA, 0.506,
0.438, 0.086, 0.035, 0.067), `NWI-1` = c(NA, NA, NA, NA, 0.912,
0.698, 0.868, 0.319), `NWI-2` = c(NA, NA, NA, NA, NA, 0.782,
0.956, 0.268), `NWI-3` = c(NA, NA, NA, NA, NA, NA, 0.825, 0.166
), `NWI-4` = c(NA, NA, NA, NA, NA, NA, NA, 0.052), `1650/2220.nm.ratio` = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
)), row.names = c(NA, 8L), class = "data.frame")
The output should look like the following
Try the following:
Convert the data to long format first, then turn both Models and the column names into factors with the level order you want (reversed for the y-axis). values_drop_na = TRUE removes the NA cells.
library(tidyverse)
fac_levels <- c("NDVI","SR","WBI","NWI-1","NWI-2","NWI-3","NWI-4","1650/2220 nm ratio")
df %>%
pivot_longer(-Models, values_drop_na = TRUE) %>%
mutate(Models = factor(Models, levels = fac_levels),
name = factor(name, levels = rev(fac_levels)),
# cut() sorts its breaks, so labels must be listed from the lowest interval up
p.value = cut(value, breaks = c(-Inf, 0.01, 0.05, Inf), right = FALSE,
labels = c("< 0.01", "< 0.05", "NS"))) %>%
ggplot(aes(x=Models,y=name, fill=p.value)) +
theme_bw() +
geom_tile() +
xlab("Parameters") + ylab(" ") +
theme(text=element_text(size=18, family="serif"))+
scale_colour_manual(values = c("#DAA520", "#F5DEB3", "#FFF8DC","#DCDCDC"),
aesthetics = c("colour", "fill")) +
geom_text(aes(label=format(round(value, 2), nsmall = 2)), color="black", size=2) +
scale_x_discrete(drop=FALSE) +
scale_y_discrete(drop=FALSE)
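Two base R details drive this plot: factor level order sets the axis order, and cut() assigns labels to intervals in ascending order of the break points, regardless of the order in which the breaks are written. A small sketch with made-up p-values:

```r
# Made-up p-values, one per significance band
p <- c(0.007, 0.03, 0.2)

# Breaks written in descending order give the same bins as ascending ones,
# because cut() sorts them before binning
same_bins <- identical(cut(p, c(1, 0.05, 0.01, 0)), cut(p, c(0, 0.01, 0.05, 1)))

# Labels therefore have to be listed from the lowest interval upwards
p_class <- cut(p, breaks = c(-Inf, 0.01, 0.05, Inf),
               labels = c("< 0.01", "< 0.05", "NS"), right = FALSE)

# Factor level order (not alphabetical order) decides the axis order in ggplot2
models <- factor(c("SR", "NDVI", "WBI"), levels = c("NDVI", "SR", "WBI"))
levels(models)
```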

Ylim max to change dynamically with a variable, while min is set to 0 in R

I would like my graphs to start at y= 0, but I would like the maximum to change with a multiple of the data, or somehow otherwise zoom out dynamically. I have 34 charts in this set with various ymax.
I have tried scale_y_continuous and coord_cartesian. Adding expand = expand_scale(mult = 2) does make the maximum change dynamically, but then the graphs start at negative numbers, and I want them to start at 0.
title<- c(
"Carangidae",
"Atlantic cutlassfish",
"Lizardfish",
"Sharks",
"Mackerel")
#DATA#
biomass<- structure(list(timestep = structure(c(10957, 10988, 11017, 11048,
11078, 11109, 11139, 11170, 11201, 11231, 11262, 11292), class = "Date"),
bio_pre_Carangidae = c(0.01105, 0.0199, 0.017,
0.01018, 0.0119, 0.0101, 0.009874, 0.009507,
0.009019, 0.00843, 0.00841, 0.00805), bio_obs_Carangidae = c(NA,
NA, NA, NA, NA, 0.00239, NA, NA, NA, NA, NA, NA), bio_pre_Atl_cutlassfish = c(0.078,
0.069, 0.067, 0.06872, 0.0729, 0.0769,
0.0775, 0.075, 0.0743, 0.072, 0.071,
0.069), bio_obs_Atl_cutlassfish = c(NA, NA, NA, NA, NA,
0.0325, NA, NA, NA, NA, NA, NA), bio_pre_lizardfish = c(0.0635,
0.062, 0.057, 0.0536, 0.0505, 0.0604,
0.0627, 0.068, 0.0695, 0.066, 0.0623,
0.0598), bio_obs_lizardfish = c(NA, NA, NA, NA, NA, 0.037,
NA, NA, NA, NA, NA, NA), bio_pre_sharks = c(0.025, 0.0155,
0.0148, 0.0135, 0.01379, 0.01398, 0.014,
0.0139, 0.0136, 0.0132, 0.0126, 0.011),
bio_obs_sharks = c(NA, NA, NA, NA, NA, 0.003, NA, NA,
NA, NA, NA, NA), bio_pre_mackerel = c(0.0567, 0.0459,
0.0384, 0.03, 0.0328, 0.0336, 0.0299,
0.0296, 0.02343, 0.02713, 0.0239, 0.019
), bio_obs_mackerel = c(NA, NA, NA, NA, NA, 0.055, NA,
NA, NA, NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -12L))
This is my function:
my_bio_plot <- function (biomass, .var1, .var2, .var3) {
p <- ggplot(biomass, aes(x = timestep)) +
geom_line(aes(y = .data[[.var1]], linetype = "Predicted")) + geom_point(size = 3, aes(y = .data[[.var2]], shape = "Observed")) +
ggtitle(paste0(.var3)) +
ylab(expression("biomass" ~ (t/km^2))) +
theme_classic() +
scale_y_continuous(limits = c(0, NA), expand = expand_scale(mult = 2))+
###This is the portion where I cannot figure out how to set ymin = 0 and then ymax to 2* the maximum value of a dataset.##
theme(legend.position = "right") +
theme(axis.ticks = element_line(size = 1), axis.ticks.length = unit(0.25, "cm"))
return(p)
}
## create two separate name vectors
var1_names <- colnames(biomass)[grepl("^bio_pre", colnames(biomass))]
var2_names <- colnames(biomass)[grepl("^bio_obs", colnames(biomass))]
var3_names <- title
## loop through two vectors simultaneously and save result in a list
# ..1 = var1_names, ..2 = var2_names
my_plot_b <- pmap(list(var1_names, var2_names, var3_names), ~ my_bio_plot(biomass, ..1, ..2, ..3))
## merge plots together
# https://cran.r-project.org/web/packages/cowplot/
# install.packages("cowplot", dependencies = TRUE)
dev.new(title = "Model Fit Biomass",
width = 12,
height = 6,
noRStudioGD = TRUE
)
print(my_plot_b)
I can manage to get EITHER a set ymin=0 (a) OR a dynamic ymax (b) but cannot manage to get both.
a
b
How about this? Seems to work on your data.
Define the max for each chart at the top of your function:
my_bio_plot <- function (biomass, .var1, .var2, .var3) {
max_y = 2.0 * max(biomass[[.var1]])
...
scale_y_continuous(limits = c(0, max_y)) +
...
This seems to create the requested output, with the y-axis running from 0 to twice the maximum of the predicted values.
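Filled in, the idea could look like the sketch below. It is a variation under stated assumptions, not the asker's exact function: the helper upper_limit, the function name plot_from_zero and the toy data are made up, the maximum is taken over both the predicted and the observed column, and expansion() is used as the current replacement for expand_scale().

```r
library(ggplot2)

# Hypothetical helper: upper axis limit as a multiple of the largest non-missing value
upper_limit <- function(values, mult = 2) {
  mult * max(values, na.rm = TRUE)
}

# Made-up data with the same shape as one species in the question
toy <- data.frame(
  timestep = as.Date("2000-01-01") + c(0, 31, 60, 91),
  predicted = c(0.02, 0.05, 0.10, 0.04),
  observed = c(NA, 0.03, NA, NA)
)

plot_from_zero <- function(data, pred_col, obs_col, plot_title, mult = 2) {
  y_max <- upper_limit(c(data[[pred_col]], data[[obs_col]]), mult)
  ggplot(data, aes(x = timestep)) +
    geom_line(aes(y = .data[[pred_col]], linetype = "Predicted")) +
    geom_point(aes(y = .data[[obs_col]], shape = "Observed"), size = 3, na.rm = TRUE) +
    ggtitle(plot_title) +
    # Fixed limits from 0 to y_max, with no padding that would push the axis below 0
    scale_y_continuous(limits = c(0, y_max), expand = expansion(mult = 0))
}

p <- plot_from_zero(toy, "predicted", "observed", "Toy species")
```

Because the limit is computed inside the function, each of the plots produced by pmap() gets its own maximum.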
Updated to add a substantially different approach from yours:
biomass %>%
gather(species, bio, -timestep) %>%
mutate(type = ifelse(stringr::str_detect(species, 'pre'), 'predicted', 'observed'),
species = gsub(".*_", "", species)) %>%
group_by(species) %>%
mutate(ul = max(bio, na.rm = TRUE) * 2) %>%
filter(species == "sharks") -> df
df %>%
ggplot(aes(timestep, bio, group = type)) +
geom_point(aes(shape = type)) +
geom_line(aes(linetype = type)) +
# facet_wrap(~species) +
scale_linetype_manual(name = "",
values = c("blank", 'solid')) +
scale_shape_manual(name = "",
values = c(19, NA))+
scale_y_continuous(limits = c(0, max(df$ul)))
You could remove the filter(species == "sharks") and uncomment the facet_wrap(~species) line to plot all the species at the same time.
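With facets, a single limits = c(0, max(df$ul)) applies to every panel. If each panel should instead run from 0 to twice its own maximum, one possible approach is free y scales plus an invisible geom_blank() layer that stretches each panel up to its ul value. A sketch on made-up data (toy and its values are illustrative):

```r
library(ggplot2)
library(dplyr)

# Made-up long-format data: two species, three time steps each
toy <- data.frame(
  timestep = rep(as.Date("2000-01-01") + c(0, 31, 60), times = 2),
  species = rep(c("sharks", "mackerel"), each = 3),
  bio = c(0.025, 0.015, 0.014, 0.057, 0.046, 0.038)
) %>%
  group_by(species) %>%
  mutate(ul = 2 * max(bio, na.rm = TRUE)) %>%  # per-species upper limit
  ungroup()

p <- ggplot(toy, aes(timestep, bio)) +
  geom_line() +
  # Invisible layer: only stretches each panel's y scale up to its own `ul`
  geom_blank(aes(y = ul)) +
  # Make sure every panel includes 0
  expand_limits(y = 0) +
  scale_y_continuous(expand = expansion(mult = 0)) +
  facet_wrap(~species, scales = "free_y")
```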

How to create 2D-Grid, raster or heatmap based on group values that include NAs?

Following data:
df <- data.frame(cbind("Group_ID" = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4), "WBHO" = runif(20, 1.0, 7.0), "SI" = runif(20, 1.0, 7.0), "OORT" = c(2.34, 4.64, NA, 5.32, 3.23, 6.01, 5.43, 4.78, 3.98, 3.80, 4.45, NA, NA, 3.18, 4.87, NA, NA, 5.73, 3.52, 4.89), "LMX" = runif(20, 1.0, 7.0),"RL" = runif(20, 1.0, 7.0),"AL" = c(1.54, NA, 1.08, 6.77, NA, NA, 4.56, NA, 5.34, 4.32, 2.45, 3.86, 6.21, 2.89, 7.32, 6.43, NA, 4.56, 3.89, 6.16),"SL" = runif(20, 1.0, 7.0),"RV" = runif(20, 1.0, 7.0),"PT" = runif(20, 1.0, 7.0),"SD" = runif(20, 1.0, 7.0), "HT" = runif(20, 1.0, 7.0), "RTL" = c(2.45, NA, 6.04, 2.88, 3.49, 2.30, NA, 5.32, 2.39, NA, 3.62, 3.22, 4.87, 2.91, 5.41, NA, NA, 4.78, 6.20, NA), "INB" = runif(20, 1.0, 7.0), "ETB" = runif(20, 1.0, 7.0)))
Now, I want to create a raster, 2D-Grid or Heatmap which gives a nice overview of all the variables for each group ("Group_ID") using the mean (the x-axis showing the groups and the y-axis all the variables), giving a particular field green colour for value 1 to 3, yellow for 3 to 5 and green for 5 to 7. I have the following Code to create a df that combines the variables in one column and has the values and Group-belonging in the other two:
library(dplyr)
library(tidyr)
df %>%
gather(key = "variable", value = "value", - Group_ID) -> df_new
This does not work, however, as there are NAs included. However, I want to keep those rows with NAs. Is there a way with which I can do this in the same step?
Then, I would like to create the raster concerning which I have been given the following code which I am not fully sure how to apply in this case:
library(raster)
r <- raster(ncol=nrow(df_new), nrow=15, xmn=0, xmx=4, ymn=0, ymx=15)
values(r) <- as.vector(as.matrix(df$WBHO, df$SI, df$OORT, df$LMX, df$RL, df$AL, df$SL, df$RV, df$PT, df$SD, df$HT, df$RTL,
df$INB, df$ETB)
plot(r, axes=F, box=F, asp=NA)
axis(1, at=seq(), 0:9)
axis(2, at=seq(), c("", colnames(df_new)), las=1)
Thanks for any help!
We can use dplyr and tidyr to calculate the mean per group and variable. After that, the cut function categorizes the means, and geom_tile from ggplot2 plots the heatmap. Here x is the variable, y is Group_ID (converted to a factor), and fill is based on value2. No raster package is required.
It is not clear why you want two groups (1-3 and 5-7) to both be green. My example assigns red to the 5-7 group, but you can easily change that to suit your needs.
library(dplyr)
library(tidyr)
df_new <- df %>%
gather(key = "variable", value = "value", - Group_ID) %>%
group_by(Group_ID, variable) %>%
summarise(value = mean(value, na.rm = TRUE)) %>%
mutate(value2 = cut(value, breaks = c(1, 3, 5, 7), labels = c("Low", "Medium", "High"))) %>%
ungroup()
library(ggplot2)
ggplot(df_new, aes(x = variable, y = factor(Group_ID), fill = value2)) +
geom_tile() +
scale_fill_manual(values = c("Low" = "Green", "Medium" = "Yellow", "High" = "Red")) +
labs(
y = "Group_ID"
)
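On the NA question: reshaping to long format keeps the NA rows by default, and mean(na.rm = TRUE) skips them when averaging. The sketch below shows this on a small made-up data frame, using pivot_longer() as the newer counterpart of gather(); include.lowest = TRUE is an added assumption so that a mean of exactly 1 would still land in the lowest band.

```r
library(dplyr)
library(tidyr)

# Made-up data: group 1 has no AL values at all
toy <- data.frame(
  Group_ID = c(1, 1, 2, 2),
  OORT = c(2, NA, 6, 4),
  AL = c(NA, NA, 5, 3)
)

# NA rows are kept (values_drop_na defaults to FALSE)
long <- toy %>%
  pivot_longer(-Group_ID, names_to = "variable", values_to = "value")

# Group means ignoring NAs, then binned into three bands
means <- long %>%
  group_by(Group_ID, variable) %>%
  summarise(value = mean(value, na.rm = TRUE), .groups = "drop") %>%
  mutate(band = cut(value, breaks = c(1, 3, 5, 7),
                    labels = c("Low", "Medium", "High"),
                    include.lowest = TRUE))

print(means)
```

A group/variable pair with no observed values at all gets a mean of NaN and an NA band, so geom_tile() would draw it with the scale's NA colour.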
