I want to add a geom_text() layer containing a LaTeX formula to the plot, to show the mean percentage of each value across the 2 matrices:
library(latex2exp)
library(ggplot2)
library(tidyverse)
percentage <- matrix(c(10,100,90,80,100,97,80,19,90,82,9,87),nrow=2)
colnames(percentage) <- c("value1","value2","value3","value4","value5","value6")
rownames(percentage) <- c("matrix1", "matrix2")
mean_p <- apply(percentage,2,mean)
mat <- c("matrix1", "matrix2")
percentage %>%
as_data_frame() %>%
gather(., Value , Percentage) %>%
ggplot(., aes(x=Value,y=Percentage,color=rep(mat,ncol(percentage)))) +
geom_bar(position = position_dodge(width = 0.8), stat = "identity", fill = "white")
I tried to add
lab <- character()
for(i in 1:ncol(percentage)){
lab <- c(lab,"",sprintf('$\\oslash%s$',mean_p[i]))
}
geom_text(aes(label=lapply(lab,TeX)),vjust=0,show.legend = FALSE,color="lightblue")
but this doesn't convert the LaTeX expression correctly. Does anybody have an idea how to fix this problem?
The output I want to generate should look like this:
I propose a solution using annotate instead of geom_text. It is largely inspired by the following solution:
Annotate a plot made with ggplot2 with an equation using latex2exp::TeX
lab <- character()
for(i in 1:ncol(percentage)){
lab <- c(lab, paste('$\\oslash$', mean_p[i], '$\\%$', sep = " "))
}
percentage %>%
as_data_frame() %>%
gather(., Value , Percentage) %>%
ggplot(., aes(x=Value,y=Percentage,color=rep(mat,ncol(percentage)))) +
geom_bar(position = position_dodge(width = 0.8), stat = "identity", fill = "white") +
annotate('text', x = 1:6, y = percentage[2,], label = lapply(lab, function(x){TeX(x, output = "character")}), hjust=0, size = 4, parse = TRUE)
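If the only LaTeX needed is the "average" sign, a rough alternative is to skip latex2exp and put a Unicode character straight into the label. The sketch below is an assumption, not part of the answer above: it uses U+00D8 (Ø) as a stand-in for \oslash, assumes the graphics device font contains that glyph, and builds the long data frame by hand (the names bars and Matrix are made up for illustration).

```r
library(ggplot2)

percentage <- matrix(c(10, 100, 90, 80, 100, 97, 80, 19, 90, 82, 9, 87), nrow = 2)
colnames(percentage) <- paste0("value", 1:6)
mean_p <- colMeans(percentage)

# Plain-text labels: U+00D8 stands in for \oslash (assumes the font has the glyph)
lab <- sprintf("\u00d8 %s %%", mean_p)

# Long format built by hand; as.vector() walks the matrix column by column
bars <- data.frame(
  Value = rep(colnames(percentage), each = 2),
  Matrix = rep(c("matrix1", "matrix2"), times = 6),
  Percentage = as.vector(percentage)
)

p <- ggplot(bars, aes(Value, Percentage, colour = Matrix)) +
  geom_col(position = position_dodge(width = 0.8), fill = "white") +
  # one label per value, placed just above the taller of the two bars
  annotate("text", x = 1:6, y = apply(percentage, 2, max), label = lab,
           vjust = -0.5, size = 4)
```

Because the labels are ordinary strings, parse = TRUE is not needed here; the TeX() route above remains the better choice for real formulas.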
I am trying to add a number label on each cell of a heatmap. Because it also needs marginal barcharts I have tried two packages. iheatmapr and ComplexHeatmap.
(1st try) iheatmapr makes it easy to add bars as below, but I couldn't see how to add labels inside the heatmap on individual cells.
library(tidyverse)
library(iheatmapr)
library(RColorBrewer)
in_out <- data.frame(
'Economic' = c(2,1,1,3,4),
'Education' = c(0,3,0,1,1),
'Health' = c(1,0,1,2,0),
'Social' = c(2,5,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
GreenLong <- colorRampPalette(brewer.pal(9, 'Greens'))(12)
lowGreens <- GreenLong[0:5]
in_out_matrix <- as.matrix(in_out)
main_heatmap(in_out_matrix, colors = lowGreens)
in_out_plot <- iheatmap(in_out_matrix,
colors=lowGreens) %>%
add_col_labels() %>%
add_row_labels() %>%
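# bcio and total were not defined in the post; assumed here to be in_out_matrix and its grand total
add_col_barplot(y = colSums(in_out_matrix) / sum(in_out_matrix)) %>%
add_row_barplot(x = rowSums(in_out_matrix) / sum(in_out_matrix))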
in_out_plot
I then used save_iheatmap(in_out_plot, "iheatmapr_test.png"), because I couldn't use ggsave(device = ragg::agg_png, ...) with an iheatmapr object.
The iheatmapr object's apparent incompatibility with ggsave() (maybe I am wrong) is a problem for me, because I normally use the ragg package to export images with AGG to preserve font sizes. I suspect that some other heatmap packages also create custom objects that may be incompatible with patchwork and ggsave. This is the call I would like to use:
ggsave("png/iheatmapr_test.png", plot = in_out_plot,
device = ragg::agg_png, dpi = 72,
units="in", width=3.453, height=2.5,
scaling = 0.45)
(2nd try) ComplexHeatmap makes it easy to label individual number "cells" inside a heatmap, and also offers marginal bars among its "Annotations". I have tried it, but its colour palette system (which uses integers to refer to a set of colours) doesn't suit my RGB vector colour gradient, and overall it is a sophisticated package clearly designed for more advanced graphics than what I am doing.
I am aiming for the style shown in the screenshot example below, which was made in Excel.
Please can anyone suggest a more suitable R package for a simple heatmap like this with marginal bars, and number labels inside?
Instead of relying on packages which offer out-of-the-box solutions one option to achieve your desired result would be to create your plot from scratch using ggplot2 and patchwork which gives you much more control to style your plot, to add labels and so on.
Note: The issue with iheatmapr is that it does not return a ggplot object (its heatmaps are rendered through plotly). That's why you can't use ggsave.
library(tidyverse)
library(patchwork)
in_out <- data.frame(
'Economic' = c(1,1,1,5,4),
'Education' = c(0,0,0,1,1),
'Health' = c(1,0,1,0,0),
'Social' = c(1,1,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
in_out_long <- in_out %>%
mutate(y = rownames(.)) %>%
pivot_longer(-y, names_to = "x")
# Summarise data for marginal plots
yin <- in_out_long %>%
group_by(y) %>%
summarise(value = sum(value)) %>%
mutate(value = value / sum(value))
xin <- in_out_long %>%
group_by(x) %>%
summarise(value = sum(value)) %>%
mutate(value = value / sum(value))
# Heatmap
ph <- ggplot(in_out_long, aes(x, y, fill = value)) +
geom_tile() +
geom_text(aes(label = value), size = 8 / .pt) +
scale_fill_gradient(low = "#F7FCF5", high = "#00441B") +
theme(legend.position = "bottom") +
labs(x = NULL, y = NULL, fill = NULL)
# Marginal plots
py <- ggplot(yin, aes(value, y)) +
geom_col(width = .75) +
geom_text(aes(label = scales::percent(value)), hjust = -.1, size = 8 / .pt) +
scale_x_continuous(expand = expansion(mult = c(.0, .25))) +
theme_void()
px <- ggplot(xin, aes(x, value)) +
geom_col(width = .75) +
geom_text(aes(label = scales::percent(value)), vjust = -.5, size = 8 / .pt) +
scale_y_continuous(expand = expansion(mult = c(.0, .25))) +
theme_void()
# Glue plots together
px + plot_spacer() + ph + py + plot_layout(ncol = 2, widths = c(2, 1), heights = c(1, 2))
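As a quick sanity check on the marginal bars, the shares computed in yin and xin can be reproduced in base R from the same matrix. This is only a sketch; the names m, row_share and col_share are introduced here for illustration.

```r
in_out <- data.frame(
  Economic = c(1, 1, 1, 5, 4),
  Education = c(0, 0, 0, 1, 1),
  Health = c(1, 0, 1, 0, 0),
  Social = c(1, 1, 0, 3, 1),
  row.names = c("Habitat", "Resource", "Combined", "Protected", "Livelihood")
)
m <- as.matrix(in_out)

# Each margin is a row or column total divided by the grand total
row_share <- rowSums(m) / sum(m)
col_share <- colSums(m) / sum(m)

round(100 * row_share, 1)
round(100 * col_share, 1)
```

Both vectors sum to 1, so the percentages on the two marginal plots should each add up to 100% (up to rounding in the labels).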
I am trying to create a custom histogram with a rug plot showing the original values on the X axis.
I am going to use the mtcars dataset to illustrate. It's not the best dataset for this question, but hopefully the reader will understand what I am trying to achieve.
Below shows the basic histogram, without any rug plot attempt.
I want to create the histogram using geom_bar as this allows for more flexibility with custom bins.
I also want a small gap between the histogram bars (i.e. width = 0.95), which adds to this problem's complexity.
library(dplyr)
library(ggplot2)
# create custom bins
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
mtcars$bin <- cut(mtcars$mpg, breaks = vct_seq)
# create data.frame for the ggplot graph..using bins above
df_mtcars_count <- mtcars %>% group_by(bin) %>% summarise(count = n())
# indicative labels
vct_labels <- c("bin 1", "bin 2", "bin 3", "bin 4")
# attempt 1 - basic plot -- no rug plot
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p
Next, try and add a basic rug plot on the X axis. This obviously doesn't work as the geom_bar and geom_rug have completely different scales.
# attempt 2 with no scaling.... doesn't work as x scale for ordinal (bins) and
# x scale for continuous (mpg) do not match
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg), inherit.aes = F, alpha = 0.3)
p
Now, try and rescale the mpg column to match with the ordinal scale....
First define a linear mapping function...
fn_linear_map <- function(vct_existing_val, vct_new_range) {
# example....converts 1:20 into the range 1 to 10 like this:
# fn_linear_map(1:20, c(1, 10))
fn_r_diff <- function(x) x %>% range() %>% diff()
flt_ratio <- fn_r_diff(vct_new_range) / fn_r_diff(vct_existing_val)
vct_old_min_offset <- vct_existing_val - min(vct_existing_val)
vct_new_range_val <- (vct_old_min_offset * flt_ratio) + min(vct_new_range)
return(vct_new_range_val)
}
Now apply the function. We try to map mpg to the range 1 to 4 (an attempt to match the ordinal scale):
mtcars$mpg_remap <- fn_linear_map(mtcars$mpg, c(1, 4))
Try the plot again.... getting closer ... but not really accurate...
# attempt 3: getting closer but doesn't really match the ordinal scale
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg_remap), inherit.aes = F, alpha = 0.3)
p
The graph above is getting close to what I want, but the rug plot does not line up with the actual data. For example, the max observation (33.9) should be displayed almost aligned with the right-hand side of the last bar. See below:
mtcars %>% filter(bin == "(25,34]") %>% arrange(mpg) %>% dplyr::select(mpg, mpg_remap)
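One likely reason the single linear map misaligns: on the discrete axis bin i is centred at position i and drawn 0.95 wide regardless of how wide the bin is in mpg, so the mapping has to be linear within each bin rather than over the whole range. A minimal sketch of such a piecewise map follows; fn_piecewise_map and its bar_width argument are hypothetical names, not from the post.

```r
# Hypothetical helper: map a continuous value onto the discrete axis, where
# bin i is centred at i and drawn bar_width wide.
fn_piecewise_map <- function(x, breaks, bar_width = 0.95) {
  idx <- cut(x, breaks = breaks, labels = FALSE)  # bin index, right-closed like cut()'s default
  lower <- breaks[idx]
  upper <- breaks[idx + 1]
  frac <- (x - lower) / (upper - lower)           # relative position inside the bin, 0..1
  idx + (frac - 0.5) * bar_width                  # spread that position across the bar
}

vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
mpg_pos <- fn_piecewise_map(mtcars$mpg, vct_seq)
```

Passing mpg_pos to geom_rug() in place of mpg_remap should place 33.9 just inside the right edge of the last bar. Note that the rug is then no longer on a uniform scale, because the last bin covers 9 mpg in the same bar width as the 5 mpg bins.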
Your scale makes no sense to me, as you are showing a bin that is almost twice as wide using the same bar width. Doing that in combination with a rug strikes me as confusing at best and misleading at worst. I suggest you plot the bars with their correct widths, after which the rug is trivial.
I think the best solution is to just use geom_histogram:
ggplot(mtcars, aes(mpg)) +
geom_histogram(breaks = vct_seq, col = 'grey80') +
geom_rug(aes(mpg, y = NULL))
If you really want the gaps between the bars you'll have to do more work:
library(tidyr)
d <- mtcars %>%
count(bin) %>%
separate(bin, c('min', 'max'), sep = ',', remove = FALSE) %>%
mutate_at(vars('min', 'max'), readr::parse_number) %>%
mutate(
middle = min + (max - min) / 2,
width = 0.9 * (max - min)
)
ggplot(d, aes(middle, n)) +
geom_col(width = d$width) +
geom_rug(aes(mpg, y = NULL), mtcars)
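A possible shortcut, sketched here as an assumption rather than as part of the answer above: since the breaks are already available in vct_seq, the bin edges can be taken from that vector directly instead of being parsed back out of the cut() labels. The object names d_alt and bin are introduced only for this example.

```r
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
bin <- cut(mtcars$mpg, breaks = vct_seq)

# One row per bin; edges come straight from the break vector
d_alt <- data.frame(
  bin = levels(bin),
  n = as.vector(table(bin)),
  min = head(vct_seq, -1),
  max = tail(vct_seq, -1)
)
d_alt$middle <- (d_alt$min + d_alt$max) / 2    # bar centre on the mpg scale
d_alt$width <- 0.9 * (d_alt$max - d_alt$min)   # leave a 10% gap between bars
```

d_alt can then be passed to the same ggplot() call as d, with geom_col(width = d_alt$width).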
I am attempting to produce a ggplot from within a function. I can do so using the sample data and code below.
If I produce the plot (p) outside of the function, I can revise it with no problem to add a title, subtitle, axis labels, etc. (e.g., p + labs(title = "Most frequent words, by gender")).
However, if I produce the plot from within the function and then attempt to modify it, I get the following error: non-numeric argument to binary operator.
In both cases, the object "p" shows up under Values.
I would of course like to use a function because I have a number of different group_by variables to test, and I want to eliminate typing mistakes (e.g., forgetting to change "gender" to "income" on a later analysis).
Can someone explain why the error arises only after modifying a ggplot created in a function? And of course I would be grateful for advice about how to eliminate the source of the error.
# sample data of favorite activities
df <- tibble(
word = c("walk","hike","garden","garden","walk","hike", "garden","hike","hike","hike","walk"),
gender = c("Male","Female","Female","Female","Male","Male","Male", "Male","Male","Female","Female")
)
df
# function to figure out the proportions of the activities
sum_text_prop <- function(df, groupbyvar) {
groupbyvar <- enquo(groupbyvar)
df %>%
count(!!groupbyvar, word, sort = TRUE) %>%
group_by(groupbyvar = !!groupbyvar) %>%
mutate(proportion = n / sum(n)) %>%
top_n(proportion, n = 5) %>%
ungroup()
}
# function to plot the most common words
plot_text_prop <- function(df) {
p <- ggplot(data = df, aes(x = word, y = proportion, fill = groupbyvar)) +
geom_bar(stat = "identity", alpha = 0.8, show.legend = FALSE) +
facet_wrap(~ groupbyvar, ncol = 2, scales = "free") +
coord_flip()
print(p)
}
# deploy the functions
df %>%
sum_text_prop(groupbyvar = gender) %>%
plot_text_prop()
# add a title to the plot
p + labs(title = "Most frequent words, by gender")
# Error in p + labs(title = "Most frequent words, by gender") :
#   non-numeric argument to binary operator
Update
Thanks to the helpful responses, my revised code is as follows:
plot_text_prop <- function(df) {
ggplot(data = df, aes(reorder_within(word, proportion, groupbyvar),
proportion, fill = groupbyvar)) +
geom_bar(stat = "identity", alpha = 0.8, show.legend = FALSE) +
scale_x_reordered() +
facet_wrap(~ groupbyvar, ncol = 2, scales = "free") +
coord_flip()
}
p <- tidy_infl %>%
sum_text_prop(groupbyvar = gender) %>%
plot_text_prop()
p + labs(title = "Most frequent words, by gender")
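The difference between the two versions comes down to scope. In the first version p is assigned inside plot_text_prop(), so it is local to the function; the p seen at top level is presumably some other object left over from earlier work, and adding labs() to it fails. A stripped-down sketch of the working pattern, with made-up names (make_plot, toy) and toy data:

```r
library(ggplot2)

# The plot object exists only inside the function unless it is returned.
make_plot <- function(df) {
  ggplot(df, aes(x = word, y = proportion)) +
    geom_col()   # last expression, so it becomes the function's return value
}

toy <- data.frame(word = c("walk", "hike", "garden"),
                  proportion = c(0.3, 0.5, 0.2))

p <- make_plot(toy)   # assign the returned plot in the calling environment
p_titled <- p + labs(title = "Most frequent words")
```

Returning the plot and letting the caller decide when to print it also keeps the function usable inside pipelines and reports.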
I have data with lots of factor variables that I am visualising to get a feel for each of the variables. I am reproducing a lot of the code with minor tweaks for variable names etc., so I decided to write a function to simplify things. I just can't get it to work...
Dummy Data
ID <- sample(1:32, 128, replace = TRUE)
AgeGrp <- sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE)
ID <- factor(ID)
AgeGrp <- factor(AgeGrp)
data <- data_frame(ID, AgeGrp)
data
Basically what I am trying to do with each factor variable is produce a bar chart with labels of percentages inside the bars. For example with the dummy data.
plotstats <- #Create a table with pre-summarised percentages
data %>%
group_by(AgeGrp) %>%
summarise(count = n()) %>%
mutate(pct = count/sum(count)*100)
age_plot <- #Plot the data
ggplot(data,aes(x = AgeGrp)) +
geom_bar() + #Add the percentage labels using pre-summarised table
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),y=pct),
size=3.5, vjust = -1, colour = "sky blue") +
ggtitle("Count of Age Group")
age_plot
This works fine with the dummy data - but when I try to create a function...
basic_plot <-
function(df, x){
plotstats <-
df %>%
group_by_(x) %>%
summarise_(
count = ~n(),
pct = ~count/sum(count)*100)
plot <-
ggplot(df,aes(x = x)) +
geom_bar() +
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),
y=pct), size=3.5, vjust = -1, colour = "sky blue")
plot
}
basic_plot(data, AgeGrp)
I get the error code :
Error in UseMethod("as.lazy") : no applicable method for 'as.lazy' applied to an object of class "factor"
I have looked at questions here, here, and here and also looked at the NSE Vignette but can't find my fault.
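A likely cause of the error: basic_plot(data, AgeGrp) passes the global factor AgeGrp itself to group_by_(), which expects a string or formula, hence the as.lazy complaint. The underscore verbs are also deprecated in current dplyr. A minimal sketch of one way to write this with the embrace operator, assuming dplyr 1.0 or later; summarise_pct and dummy are names introduced here, and the plot keeps y = pct as in the original code:

```r
library(dplyr)
library(ggplot2)

# Counts and percentages for one column, passed unquoted
summarise_pct <- function(df, x) {
  df %>%
    count({{ x }}, name = "count") %>%
    mutate(pct = count / sum(count) * 100)
}

basic_plot <- function(df, x) {
  plotstats <- summarise_pct(df, {{ x }})
  ggplot(df, aes(x = {{ x }})) +
    geom_bar() +
    # labels come from the pre-summarised table, positioned at y = pct
    geom_text(data = plotstats,
              aes(label = paste0(round(pct, 1), "%"), y = pct),
              size = 3.5, vjust = -1, colour = "sky blue")
}

set.seed(1)
dummy <- tibble(
  AgeGrp = factor(sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE))
)
stats <- summarise_pct(dummy, AgeGrp)
age_plot <- basic_plot(dummy, AgeGrp)
```

The {{ x }} syntax forwards the unquoted column name into both the dplyr verbs and aes(), so the same call works for any factor column in the data.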
I have a list of data.frames that need to be classified. I manipulated these lists and finally exported them as CSV files to the default folder. However, to make the exported data more informative, I think it would be better to generate a grouped bar plot or a pie chart for each data.frame object. As a beginner, I am still learning the features of the ggplot2 package, so I have little idea how to do this easily. Can anyone suggest how to generate a grouped bar plot, and how to produce an informative bar plot for a list of files? Thanks in advance :)
reproducible data :
savedDF <- list(
bar.saved = data.frame(start=sample(100, 15), stop=sample(150, 15), score=sample(36, 15)),
cat.saved = data.frame(start=sample(100, 20), stop=sample(100,20), score=sample(45,20)),
foo.saved = data.frame(start=sample(125, 24), stop=sample(140, 24), score=sample(32, 24))
)
dropedDF <- list(
bar.droped = data.frame(start=sample(60, 12), stop=sample(90,12), score=sample(35,12)),
cat.droped = data.frame(start=sample(75, 18), stop=sample(84,18), score=sample(28,18)),
foo.droped = data.frame(start=sample(54, 14), stop=sample(72,14), score=sample(25,14))
)
I get the list of CSV files from this pipeline:
comb <- do.call("rbind", c(savedDF, dropedDF))
cn <- c("letter", "saved","seq")
DF <- cbind(read.table(text = chartr("_", ".", rownames(comb)), sep = ".", col.names = cn), comb)
DF <- transform(DF, updown = ifelse(score>= 12, "stringent", "weak"))
by(DF, DF[c("letter", "saved", "updown")],
function(x) write.csv(x[-(1:3)],
sprintf("%s_%s_%s.csv", x$letter[1], x$updown[1], x$saved[1])))
To better understand the exported data, I think generating a grouped bar plot and a pie chart for each data.frame object would be much more informative.
In the desired plot, I intend to show the number of features in each CSV file for each data.frame object. Can anyone give me ideas for this task?
How can I do this easily with the ggplot2 package? Is there a more efficient way to get this done? Thanks a lot.
If I understand correctly, this may work for you as a rough solution. Please comment to let me know if this is acceptable. In the future, if you can provide a rough sketch along with your data to show what you're trying to achieve that would be a good idea.
library(dplyr)
library(ggplot2)
plot_data <- DF %>%
group_by(letter, saved, updown) %>%
tally %>%
group_by(saved, updown) %>%
mutate(percentage = n/sum(n))
ggplot(plot_data, aes(x = saved, y = n, fill = saved)) +
geom_bar(stat = "identity") +
facet_wrap(~ letter + updown, ncol = 2)
You can always change the facet_wrap(~ letter + updown, ncol = 2) to an explicit facet_grid(letter ~ updown) if you wish.
Or you could view it this way:
ggplot(plot_data, aes(x = letter, y = n)) +
geom_bar(stat = "identity") +
facet_wrap(~updown+saved, ncol = 2)
For a pie (cleaning up and labeling is up to you):
ggplot(plot_data, aes(x = 1, y = percentage, fill = letter)) +
geom_bar(stat = "identity", width =1) +
facet_wrap(~updown+saved, ncol = 2) +
coord_polar(theta = "y") +
theme_void()
The pie for "bar" with its 4 saved/updown interactions just requires some manipulation of your data:
library(dplyr)
library(tidyr)
library(ggplot2)
plot_data <- DF %>%
unite(interaction, saved, updown, sep = "-") %>%
group_by(letter, interaction) %>%
tally %>%
mutate(percentage = n/sum(n)) %>%
filter(letter == "bar")
ggplot(plot_data, aes(x = 1, y = percentage, fill = interaction)) +
geom_bar(stat = "identity", width =1) +
coord_polar(theta = "y") +
theme_void()
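If a grouped (side-by-side) bar chart is wanted rather than facets alone, one option is to dodge the bars by a second variable. The sketch below uses a made-up toy_counts table in place of plot_data, so the numbers are purely illustrative:

```r
library(ggplot2)

# Toy counts standing in for plot_data (values are invented for illustration)
toy_counts <- data.frame(
  letter = rep(c("bar", "cat", "foo"), each = 4),
  saved  = rep(rep(c("saved", "droped"), each = 2), times = 3),
  updown = rep(c("stringent", "weak"), times = 6),
  n      = c(10, 5, 8, 4, 14, 6, 12, 6, 15, 9, 9, 5)
)

# Bars for "stringent" and "weak" sit side by side within each letter
p_grouped <- ggplot(toy_counts, aes(x = letter, y = n, fill = updown)) +
  geom_col(position = position_dodge(width = 0.9)) +
  facet_wrap(~ saved)
```

With the real data, plot_data from the first snippet should work in place of toy_counts, since it has the same letter, saved, updown and n columns.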
You should really look into the dplyr, tidyr and ggplot2 packages. Read their documentation and vignettes and work through the examples. The best way to learn is by doing.