"Nested" barplots, with multiple levels of grouping - r

How can I group bars in a barplot by a third variable?
I would like to do this in base R, without using, for example, ggplot2 as in this related question. In another related question, the groups of groups are labeled but not visually grouped (as they are in my example above), which makes the plot difficult to read.
Sample data:
groups = c("A", "B")
choices = c("orange", "apple", "beer")
supergroups = c("fruits", "non-fruits")
dat <- data.frame(
  group = rep(groups, c(93, 94)),
  choice = factor(c(
      rep(choices, c(51, 30, 12)),
      rep(choices, c(47, 29, 18))
    ),
    levels = choices
  ),
  supergroup = c(
    rep(supergroups, c(81, 12)),
    rep(supergroups, c(76, 18))
  )
)
barplot(table(dat), beside = TRUE)
Which returns the error:
Error in barplot.default(table(dat), beside = TRUE) :
'height' must be a vector or a matrix
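The error occurs because table(dat) cross-tabulates all three columns into a 3-D array (group x choice x supergroup), while barplot() only accepts a vector or a matrix. Below is a minimal base R sketch, assuming each choice belongs to exactly one supergroup (true for the sample data): tabulate only group x choice, then pass a per-bar space vector so that a wider gap separates the supergroups, and label them with mtext(). The gap width 2.5 and the names tab2, sg, between and bar_space are illustrative choices, not part of the question.

```r
# Sample data from the question
groups <- c("A", "B")
choices <- c("orange", "apple", "beer")
supergroups <- c("fruits", "non-fruits")
dat <- data.frame(
  group = rep(groups, c(93, 94)),
  choice = factor(c(rep(choices, c(51, 30, 12)),
                    rep(choices, c(47, 29, 18))),
                  levels = choices),
  supergroup = c(rep(supergroups, c(81, 12)),
                 rep(supergroups, c(76, 18)))
)

# table() over all three columns is a 3-D array, which barplot() rejects
tab3 <- table(dat)
print(dim(tab3))

# Two-way table: rows = group (bars within a cluster), columns = choice
tab2 <- with(dat, table(group, choice))

# Supergroup of each choice (assumes one supergroup per choice)
sg <- sapply(split(dat$supergroup, dat$choice), unique)

# Gap before each cluster: 1 by default, 2.5 where the supergroup changes
between <- c(1, ifelse(sg[-1] != sg[-length(sg)], 2.5, 1))

# With beside = TRUE, a full-length space vector sets the gap before every
# bar: the first bar of each cluster gets 'between', the others get 0
bar_space <- as.vector(rbind(between,
                             matrix(0, nrow(tab2) - 1, ncol(tab2))))

# barplot() returns the bar midpoints as a matrix when beside = TRUE
mp <- barplot(tab2, beside = TRUE, space = bar_space,
              legend.text = TRUE, ylim = c(0, 60))

# Label each supergroup under the centre of its clusters
sg_at <- tapply(colMeans(mp), sg, mean)
mtext(names(sg_at), side = 1, line = 2.5, at = as.vector(sg_at))
```

Changing 2.5 widens or narrows the gap between supergroups without affecting the spacing inside each one.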


ggplot2: heatmap customize legend

I am trying to plot a heatmap (colored by odds ratios) using ggplot2. The odds ratio values range from 0 to 200. I would like the heatmap legend to show markings at certain values (0.1, 1, 10, 50, 100, 200). This is the code I am using, but my legend does not label all the values (see below).
map is a sample data frame with the columns segments, OR and tissue:
library(dplyr)
library(ggplot2)
library(scales)  # rescale()
segments <- c("TssA", "TssBiv", "BivFlnk", "EnhBiv","ReprPC", "ReprPCWk", "Quies", "TssAFlnk", "TxFlnk", "Tx", "TxWk", "EnhG", "Enh", "ZNF/Rpts", "Het")
OR <- c(1.4787622, 46.99886002, 11.74417278, 4.49223136, 204.975818, 1.85228517, 0.85762414, 0.67926846, 0.33696213, 0.06532777, 0.10478027, 0.07462983, 0.06501252, 1.32922162, 0.32638438)
df <- data.frame(segments, OR)
map <- df %>% mutate(tissue = 'colon')
ggplot(map, aes(tissue, segments, fill = OR)) + geom_tile(colour = "gray80") +
  theme_bw() + coord_equal() +
  scale_fill_gradientn(colours = c("lightskyblue1", "white", "navajowhite", "lightsalmon", "orangered2", "indianred1"),
    values = rescale(c(0.1, 1, 10, 50, 100, 200)), guide = "colorbar", breaks = c(0.1, 1, 10, 50, 150, 200))
I would like my legend to look similar to this (using the values I specified):
With your map data, first transform OR to log(OR).
If you also want white to correspond to OR = 1, this approach can achieve that as well; you may need to try different limits values with your real data.
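library(dplyr)
library(ggplot2)
library(scales)  # rescale()
map_1 <- map %>% mutate(OR = log(OR))
OR_max <- max(map$OR, na.rm = TRUE)
# break positions on the log scale; these must match the labels below
log_list <- c(0.1, 1, 10, 50, 200) %>% log
ggplot(map_1, aes(tissue, segments, fill = OR)) + geom_tile(colour = "gray80") +
  theme_bw() + coord_equal() +
  scale_fill_gradientn(
    colours = c("red3", "white", "navy"),
    values = rescale(log_list),
    guide = "colorbar",
    breaks = log_list,
    # symmetric limits on the log scale, so log(1) = 0 sits in the middle
    limits = c(1/OR_max, OR_max) %>% log,
    labels = c("0.1", "1", "10", "50", "200")
  )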
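To see why the log transform matters, this base R sketch compares where the break values land on a 0 to 1 colour gradient under a linear scale and under a log scale. rescale01() is a hypothetical helper written to mimic the default behaviour of scales::rescale(), which maps the range of its input onto [0, 1].

```r
breaks <- c(0.1, 1, 10, 50, 100, 200)

# Hypothetical helper: map the range of x linearly onto [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

lin_pos <- rescale01(breaks)       # positions on a linear scale
log_pos <- rescale01(log(breaks))  # positions after log-transforming

round(rbind(linear = lin_pos, log = log_pos), 3)
```

On the linear scale, 0.1, 1 and 10 all fall within roughly the bottom 5% of the colour bar, which is likely why their legend labels crowd together or disappear; on the log scale they are spread out evenly.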

Adding the split according to the specific rowname in a circular heatmap using R

I am new to R. I would like to create a circular heatmap and set some splits, following https://jokergoo.github.io/2020/05/21/make-circular-heatmaps/, which says:
If the value for split argument is a factor, the order of the factor levels controls the order of heatmaps. If split is a simple vector, the order of heatmaps is unique(split).
# note since circos.clear() was called in the previous plot,
# now the layout starts from theta = 0 (the first sector is 'e')
circos.heatmap(mat1, split = factor(split, levels = c("e", "d", "c", "b", "a")),
col = col_fun1, show.sector.labels = TRUE)
Reference result plot
My data looks like this:
esters.csv
This is my code:
library(circlize)
library(ComplexHeatmap)
library(dendextend)
mat1=read.csv("esters.csv")
row.names(mat1)<-mat1[,1]#
mat2<-mat1[,-1]##remove the first column
mat3<-mat1[-1,]##remove the first row
#Draw circoheatmap
col_fun1 = colorRamp2(c(0, 0.00001, 0.0001, 0.001, 0.01,0.1, 0.4, 0.8), c("#FAFAFA", "#EAF7E7", "#E0F3DC", "#D7F0D1", "#CDEBC6", "#D5E4FD", "#8CACE3", "#5E7192"))##
circos.par(start.degree = 90, gap.degree = 10, gap.after = c(10))##
mat1 = mat1[sample(165, 165), ] # randomly permute rows
split = sample(letters[1:5], 165, replace = TRUE)
splits = factor(split, levels = letters[1:5])
circos.heatmap(mat2, col = col_fun1, split = splits,
  dend.track.height = 0.15,
  dend.side = "inside",
  rownames.side = "outside",
  dend.callback = function(dend, m, si) {
    color_branches(dend, k = 4, col = 1:4)
  }
)
#By default, the numeric matrix is clustered on rows.
#Used to draw legend
lgd = Legend(title = "Relative abundance", col_fun = col_fun1)
grid.draw(lgd)
circos.clear()
I want to define the splits by specific row names, such as "ester40", "ester80" and "ester128". For example, the first split (sector) should contain the 40 rows named "ester1", "ester2", "ester3", ..., "ester40" and all columns from "H6d_T" to "M10d_P".
I tried my best to understand the documentation, but I still could not get it to work.
Could anyone tell me what I should put in
split = ???
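One way to build such a split, sketched in base R under the assumption that the row names of mat2 have the form "ester<N>": extract the number N from each row name and cut() it at the boundaries 40, 80 and 128. The sector labels "a" to "d" are arbitrary, and the row names below are generated only for illustration; with the real data you would use rownames(mat2) instead.

```r
# Illustrative row names; with the real data use rownames(mat2)
row_ids <- paste0("ester", 1:165)

# Numeric part of each row name, e.g. "ester128" -> 128
ester_num <- as.integer(sub("^ester", "", row_ids))

# Sectors: ester1-40, ester41-80, ester81-128, ester129 onward.
# cut() intervals are right-closed, so 40 falls in the first sector.
split <- cut(ester_num,
             breaks = c(0, 40, 80, 128, Inf),
             labels = c("a", "b", "c", "d"))

table(split)
```

The result is a factor with one entry per row, in the same order as the matrix rows, so it could be passed as split = split to circos.heatmap() in place of the randomly sampled split and splits. Per the quoted documentation, the order of the factor levels then controls the order of the sectors.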

gt R package: Giving a different color to a table's cells according to numerical threshold(s)

Aim
Giving a different color to a table's cells according to numerical threshold(s).
R Package
gt
Reproducible example
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10,
24, 6), medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2
)), row.names = c("SM", "JM", "SE", "JE", "SC"), class = "data.frame")
Using the above dataset, I can produce a table (however crude), using the following code:
mytable <- gt::gt(mydata)
Where I got stuck
It must be really easy, but I can't wrap my head around how to assign (say) red to the cells whose value is larger than 20 and blue to the cells whose value is smaller than 10. I have been searching Google for days (example HERE), but I could not find a solution. My best guess is that I need the tab_style() function, but I am at a loss as to how to tune its parameters to get what I am after.
This isn't ideal if you have an arbitrarily large data frame, but for an example of your size it's certainly manageable, imo. I generalized the tests as separate functions to reduce additional code duplication and make it easier to adjust your conditional parameters.
If you're looking for a more generalized solution, it would be to loop over a vector of columns, as described here.
library(gt)
isHigh <- function(x) {
  x > 20
}
isLow <- function(x) {
  x < 10
}
mydata %>%
  gt() %>%
  tab_style(
    style = list(
      cell_fill(color = 'red'),
      cell_text(weight = 'bold')
    ),
    locations = list(
      cells_body(columns = none, rows = isHigh(none)),
      cells_body(columns = light, rows = isHigh(light)),
      cells_body(columns = medium, rows = isHigh(medium)),
      cells_body(columns = heavy, rows = isHigh(heavy))
    )
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = 'lightblue'),
      cell_text(weight = 'bold')
    ),
    locations = list(
      cells_body(columns = none, rows = isLow(none)),
      cells_body(columns = light, rows = isLow(light)),
      cells_body(columns = medium, rows = isLow(medium)),
      cells_body(columns = heavy, rows = isLow(heavy))
    )
  )
On the basis of the comment I got, and after having read the earlier post here on SO, I came up with the following:
Create a dataset to work with:
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10,
24, 6), medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2
)), row.names = c("SM", "JM", "SE", "JE", "SC"), class = "data.frame")
Create a 'gt' table:
mytable <- gt::gt(mydata)
Create a vector of column names to be used later inside the 'for' loops:
col.names.vect <- colnames(mydata)
Create two 'for' loops, one for each threshold at which we want values to be given a different color (say, red text for values > 20 and blue text for values < 5):
for(i in seq_along(col.names.vect)) {
  mytable <- gt::tab_style(mytable,
    style = gt::cell_text(color = "red"),
    locations = gt::cells_body(
      columns = col.names.vect[i],
      # logical vector from the source data (same row order as the table)
      rows = mydata[[col.names.vect[i]]] > 20))
}
for(i in seq_along(col.names.vect)) {
  mytable <- gt::tab_style(mytable,
    style = gt::cell_text(color = "blue"),
    locations = gt::cells_body(
      columns = col.names.vect[i],
      rows = mydata[[col.names.vect[i]]] < 5))
}
This seems to achieve the goal I had in mind.
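Before styling, it can help to check which cells each rule will touch. In base R, comparing a data frame with a number returns a logical matrix, and which(..., arr.ind = TRUE) lists the matching row/column positions. This sketch uses the thresholds from the question (> 20 and < 10); it only checks the conditions and does not call gt.

```r
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10,
  24, 6), medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2
  )), row.names = c("SM", "JM", "SE", "JE", "SC"), class = "data.frame")

high <- mydata > 20  # cells to colour red
low  <- mydata < 10  # cells to colour blue

# Row/column positions of the cells selected by the "high" rule
which(high, arr.ind = TRUE)

# Number of "low" cells in each column
colSums(low)
```

Each column of high (or low) is the same logical vector that the first answer passes as rows = isHigh(...) (or isLow(...)) for that column, so this is a quick way to confirm the thresholds before building the table.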

{gtExtras} column showing in wrong order in {gt} table when grouped

I am making a gt table showing the progress of individuals towards a goal. The table includes a column showing a horizontal bar graph of progress towards that goal (if the goal is 50 and the score is 40, the bar is at 80%).
However, when I change the order of the gt rows by using the groupname_col argument, the other cells are reordered but the gtExtras gt_plt_bar_pct column is not. That column always appears in the row order of the input data, so it shows the wrong bar for the name and score in each row.
I understand that I can fix this by using arrange() on the data frame before building the gt table, but that doesn't seem like a good solution, since I'm going to want to change the order of the rows to view them by different groups. Is this a flaw in gtExtras? Is there a better fix?
Thanks!
reprex:
library(tibble)
library(gt)
library(gtExtras)
library(dplyr)
# make dataframe of individuals and their goals
df <- tribble(
  ~name, ~group, ~score, ~goal,
  "Bob", "C", 20, 40,
  "Chris", "A", 50, 40,
  "Dale", "B", 30, 50,
  "Jay", "A", 0, 40,
  "Ben", "B", 10, 20
) %>%
  # calculate percent towards goal, and cap at 100%
  mutate(percent_to_goal = score/goal * 100,
         percent_to_goal = case_when(percent_to_goal >= 100 ~ 100,
                                     TRUE ~ percent_to_goal))
df %>%
  # this fixes the issue, but doesn't seem like a permanent solution
  #arrange(group, name) %>%
  # make gt table
  gt(rowname_col = "name", groupname_col = "group") %>%
  # order groups
  row_group_order(groups = c("A","B","C")) %>%
  # add bar chart column
  gt_plt_bar_pct(column = percent_to_goal) %>%
  # highlight blue if person reaches their goal
  tab_style(
    style = list(
      cell_fill(color = "lightcyan"),
      cell_text(weight = "bold")),
    locations = cells_body(
      columns = c(goal, score, percent_to_goal),
      rows = score >= goal
    )
  )
Here is the output from the above code: notice that the lengths of the bars do not always reflect the values of the rows they appear in. Instead, they reflect the order of the original dataset.
EDIT: removing row_group_order. If I run the above code again but comment out the line that rearranges the groups, the groups appear in a different order (their order of appearance in the original dataset), and the names and the first two columns sort into these groups accordingly, but the bar chart column still does not and remains in the original order of the dataset. Image below:
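A base R sketch of the mismatch described above, assuming gt shows the groups in the row_group_order() order and keeps rows within each group in their input order: the displayed rows are a reordering of the data, so any column rendered from the data in its original order no longer lines up. Pre-sorting with arrange() works because it makes the input order equal to the display order. This only mimics the symptom; it does not reproduce gtExtras internals.

```r
df <- data.frame(
  name  = c("Bob", "Chris", "Dale", "Jay", "Ben"),
  group = c("C", "A", "B", "A", "B"),
  score = c(20, 50, 30, 0, 10),
  goal  = c(40, 40, 50, 40, 20)
)
# Percent towards goal, capped at 100
df$percent_to_goal <- pmin(df$score / df$goal * 100, 100)

# Display order: groups A, B, C; order() is stable, so input order is
# kept within each group
display_order <- order(match(df$group, c("A", "B", "C")))
displayed <- df[display_order, ]

# Value that belongs to each displayed row vs. a bar taken from the
# unsorted column (the symptom in the question)
data.frame(name = displayed$name,
           row_value = displayed$percent_to_goal,
           stale_bar = df$percent_to_goal)
```

In the printed comparison, Chris, Jay and Dale's group-mate Ben illustrate the problem: wherever row_value and stale_bar differ, the bar would be drawn for a different person.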
As of gtExtras v0.2.4, this bug has been fixed. Thanks for raising it, and for the great reprex!
library(tibble)
library(gt)
library(gtExtras)
library(dplyr)
# make dataframe of individuals and their goals
df <- tribble(
  ~name, ~group, ~score, ~goal,
  "Bob", "C", 20, 40,
  "Chris", "A", 50, 40,
  "Dale", "B", 30, 50,
  "Jay", "A", 0, 40,
  "Ben", "B", 10, 20
) %>%
  # calculate percent towards goal, and cap at 100%
  mutate(percent_to_goal = score/goal * 100,
         percent_to_goal = case_when(percent_to_goal >= 100 ~ 100,
                                     TRUE ~ percent_to_goal))
df %>%
  # make gt table
  gt(rowname_col = "name", groupname_col = "group") %>%
  # order groups
  row_group_order(groups = c("A","B","C")) %>%
  # add bar chart column
  gt_plt_bar_pct(column = percent_to_goal) %>%
  # highlight blue if person reaches their goal
  tab_style(
    style = list(
      cell_fill(color = "lightcyan"),
      cell_text(weight = "bold")),
    locations = cells_body(
      columns = c(goal, score, percent_to_goal),
      rows = score >= goal
    )
  )

Soil profiles with coloured volume fractions with "aqp" in R

I am trying to plot a soil profile in R using the aqp package (Algorithms for Quantitative Pedology). The profile should represent the matrix colour plus the mottling colour and percentage. For that purpose, I am using the function addVolumeFraction, which works well up to a point: it plots points on the profile that correspond to the right mottling percentage for each horizon, but it doesn't assign the corresponding colours. Here is an example:
#Variables for the soil profile
id <- rep(1, 4)
hor <- c("H1", "H2", "H3", "H4")
tops <- c(0,15,35,60)
bottoms <- c(15, 35, 60, 95)
mx_Hex <- c("#695F59FF", "#A59181FF", "#9E9388FF", "#A59181FF")
mot_Hex <- c("#EEB422","#EEB422", "#CD4F39", "#CD4F39")
mot_perc <- c(5, 10, 40, 8)
#Soil profile df
soildf <- data.frame(id,hor,tops,bottoms, mx_Hex, mot_Hex, mot_perc)
soildf$mx_Hex <- as.character(mx_Hex) #the class "SoilProfileCollection" needs colors as characters
soildf$mot_Hex <- as.character(mot_Hex)
# Transform df to "SoilProfileCollection"
depths(soildf) <- id ~ tops + bottoms
#Plot
plot(soildf, name = "hor", color = "mx_Hex", divide.hz = TRUE)
addVolumeFraction(soildf, "mot_perc",pch = 19, cex.min = 0.4, cex.max = 0.5, col = soildf$mot_Hex)
Soil profile plot
As you can see on the plot, the mottles' colours are mixed along the profile. Instead, I would like the mottles in each horizon to have that horizon's mottle colour. Can anybody help me solve this problem?
Thanks!!
This works as expected in the current version of aqp available on CRAN (v1.19 released in January 2020).
I modified your example below to use alternating black and white mottles in each horizon.
library(aqp)
#Variables for the soil profile
id <- rep(1, 4)
hor <- c("H1", "H2", "H3", "H4")
tops <- c(0,15,35,60)
bottoms <- c(15, 35, 60, 95)
mx_Hex <- c("#695F59FF", "#A59181FF", "#9E9388FF", "#A59181FF")
# change mottle colors to something obviously different in each horizon
mot_Hex <- c("#FFFFFF", "#000000", "#FFFFFF","#000000")
mot_perc <- c(5, 10, 40, 8)
#Soil profile df
soildf <- data.frame(id, hor, tops, bottoms, mx_Hex, mot_Hex, mot_perc)
#the class "SoilProfileCollection" needs colors as characters
soildf$mx_Hex <- as.character(mx_Hex)
soildf$mot_Hex <- as.character(mot_Hex)
# Transform df to "SoilProfileCollection"
depths(soildf) <- id ~ tops + bottoms
#Plot
plot(soildf,
name = "hor",
color = "mx_Hex",
divide.hz = TRUE)
addVolumeFraction(
soildf,
"mot_perc",
pch = 19,
cex.min = 0.4,
cex.max = 0.5,
col = soildf$mot_Hex
)
alternating mottles
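For comparison, here is a base R sketch (not aqp's implementation) of the same idea, in which each horizon's mottles are drawn in that horizon's own colour: draw the horizons as rectangles filled with the matrix colour, then scatter points inside each one. The point count per horizon (percentage x thickness / 5) is an arbitrary, hypothetical density rule chosen only for illustration.

```r
tops     <- c(0, 15, 35, 60)
bottoms  <- c(15, 35, 60, 95)
mx_Hex   <- c("#695F59FF", "#A59181FF", "#9E9388FF", "#A59181FF")
mot_Hex  <- c("#EEB422", "#EEB422", "#CD4F39", "#CD4F39")
mot_perc <- c(5, 10, 40, 8)

# Hypothetical density rule: points proportional to % x horizon thickness
n_pts <- round(mot_perc * (bottoms - tops) / 5)

set.seed(1)  # reproducible point positions
# Empty plot with depth increasing downwards
plot(NA, xlim = c(0, 1), ylim = c(max(bottoms), 0),
     xaxt = "n", xlab = "", ylab = "Depth (cm)")
rect(0, tops, 1, bottoms, col = mx_Hex)  # matrix colour per horizon

for (i in seq_along(tops)) {
  # Mottles for horizon i use mot_Hex[i] only, so colours cannot mix
  points(runif(n_pts[i], 0.05, 0.95),
         runif(n_pts[i], tops[i] + 0.5, bottoms[i] - 0.5),
         pch = 19, cex = 0.5, col = mot_Hex[i])
}
```

Indexing the colour by horizon inside the loop is what keeps each mottle colour confined to its own horizon.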
