ggplot2 heatmap total number as Fill value and two - r

I have a table with pokemon data that can be found in Kaggle: Link
I'm trying to produce a heatmap using ggplot2 but can't figure out how to use the sum of each pokemon type in each generation as the fill value. The total value should be calculated from two columns, "Type" and "Other Type"
This is what I tried but it doesn't seem to work.
ggplot(pokemon_mod, aes(x= Generation, y= Type, z= (Type, Other.Type)) +
geom_tile()

One issue in your code is that the color of the tile is specified with the fill aesthetic, not z. Also in general it's better to do feature engineering outside of ggplot2 and then pass the data in.
Your {dplyr} syntax from the comment is not quite right, but you're close with count().
With dplyr::count() you don't need to call group_by() first, which saves a step: it is shorthand for dplyr::group_by(...) %>% dplyr::summarize(n = n()).
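As a small illustration of that shorthand, the sketch below runs both forms on the built-in mtcars data (a stand-in chosen here, not the pokemon table) and compares the results. The object names via_count and via_group are made up for the example.

```r
library(dplyr)

# count() groups, tallies and ungroups in one call; the tally column is named n
via_count <- mtcars %>%
  count(cyl, gear)

# the longer equivalent written out by hand
via_group <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")

# compare the two results after dropping the tibble class
all.equal(as.data.frame(via_count), as.data.frame(via_group))
```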
If you want to just combine the counts of Type and Other Type, you can concatenate into a new column and then use tidyr::separate_rows() to essentially append them. Then you just have to remove the "NA" values and I think you'll get what you're after:
library(tidyverse)
library(vroom)
d <- vroom("pokemon-data.csv") # downloaded from [Kaggle](https://www.kaggle.com/datasets/swashbuckler1/pokemon-gen1gen8?resource=download)
d %>%
  mutate(types = paste(Type, `Other Type`, sep = "_")) %>%
  separate_rows(types, sep = "_") %>%
  count(Generation, types) %>%
  filter(types != "NA") %>%
  ggplot(aes(Generation, types)) +
  geom_tile(aes(fill = n)) +
  scale_x_continuous(breaks = 1:8)
Created on 2022-11-09 with reprex v2.0.2
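To see what the concatenate-and-split step does without downloading the Kaggle file, here is a minimal sketch on a three-row toy table. The data are invented for illustration; only the column names follow the code above.

```r
library(dplyr)
library(tidyr)

# invented toy data with the same column names as the pokemon table
toy <- tibble(
  Generation   = c(1, 1, 2),
  Type         = c("Grass", "Fire", "Fire"),
  `Other Type` = c("Poison", NA, "Flying")
)

counts <- toy %>%
  # paste() turns a missing second type into the literal string "NA"
  mutate(types = paste(Type, `Other Type`, sep = "_")) %>%
  # one row per type, so both columns contribute to the tally
  separate_rows(types, sep = "_") %>%
  filter(types != "NA") %>%
  count(Generation, types)

counts
```

Each pokemon with two types ends up contributing two rows, which is why the tally covers both columns.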

Related

ggplotly: how to use a special variable with a standard variable in the text aesthetic?

I'm creating an interactive plot and I want the tooltip to show the time (a variable in the dataset) and the count (a variable calculated by ggplot). Using the advice here I can set one or the other, but not both... Here's an example:
library(tidyverse)
data(ChickWeight)
ChickWeight %>%
ggplot(aes(x = Time,
text = paste(format(Time), "-", ..count..))) +
geom_bar()
plotly::ggplotly(tooltip = "text")
This fails with the error object 'Time' not found, but both text = format(Time) and text = ..count.. work individually. How can I include both values in the text aesthetic?
The ..count.. notation has been superseded by after_stat(count), but in either case, this can only be used in a layer employing this statistical transformation, not the global plot aesthetic mapping. The problem is, you can't use the text aesthetic anywhere other than the global aesthetic mapping.
The obvious way round this is to manipulate the data before plotting, which is just as straightforward as getting ggplot to do it for you:
library(tidyverse)
data(ChickWeight)
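(ChickWeight %>%
  count(Time) %>%
  # the counts are precomputed, so map them to y and draw them with geom_col()
  ggplot(aes(x = Time, y = n, text = paste(Time, "-", n))) +
  geom_col()) %>%
  plotly::ggplotly(tooltip = "text")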
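For comparison, after_stat(count) does resolve when it is mapped inside a layer that computes the count statistic itself. The sketch below is plain ggplot2 (no plotly) and uses a geom_text() layer with stat = "count" purely as an illustration; it is not part of the answer above.

```r
library(ggplot2)

p <- ggplot(ChickWeight, aes(x = Time)) +
  geom_bar() +
  # this layer runs stat_count, so after_stat(count) is available to its aes()
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.5)

p
```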

Sort dataset for grouped boxplot

I have a rather untidy dataset and can't wrap my head around how to do this in R. Alternative would be to do this in Excel but since I have several of these, this would take forever.
So what I need is to create a grouped boxplot.
For this I think I need a dataset that consists of 4 columns: species, group (A or B), variable, value.
But what I have at the moment is only:
variable and species_group (together in one column),
Here is a reproducible example:
variable <- c('precipitation','soil','land use')
species1_A <- c(10000, 500, 1322)
species1_B <- c(11500, 200, 600)
species2_A <- c(10000, 500, 1489)
species2_B <- c(15687, 800, 587)
df <- data.frame(variable, species1_A, species1_B,species2_A, species2_B)
So I guess I have to create a whole new column "group" with A or B and somehow tell R to take that information from the "species1_A" name.
Can anyone help me please? Thank you!
I'd suggest the following:
library(tidyverse)
df %>%
pivot_longer(contains("species"), names_to = "name", values_to = "value") %>%
separate(name, c("species", "group"), "_") %>%
ggplot() +
facet_wrap(~variable) +
aes(x = species, y = value, color = group) +
geom_point()
Sorry, I'm not sure how you want things laid out, and your example dataset has only one value per group. You can change geom_point to geom_boxplot once you have more values per group. Spacing between the boxes can be adjusted with position_dodge. HTH.
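A minimal sketch of that geom_boxplot variant, assuming several values per species and group. The values here are simulated with rnorm() just to have something to draw, and the dodge width of 0.9 is an arbitrary choice.

```r
library(ggplot2)

set.seed(1)
# simulated long-format data: 10 made-up values per species/group combination
long <- expand.grid(
  species = c("species1", "species2"),
  group   = c("A", "B"),
  rep     = 1:10
)
long$value <- rnorm(nrow(long), mean = 100, sd = 15)

p <- ggplot(long, aes(x = species, y = value, color = group)) +
  # position_dodge() controls the spacing between the A and B boxes
  geom_boxplot(position = position_dodge(width = 0.9))

p
```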

How to compare 2 categories to the whole categories using facet in ggplot

Hi, I would like to compare two categories to the whole set of categories using facet_grid, facet_wrap or another function in ggplot. For example, I would like to compare the statistics of Hospitals 3 and 4 to those of all hospitals combined.
Hospital<-c("Hosp1","Hosp1","Hosp1","Hosp1","Hosp1",
"Hosp2","Hosp2","Hosp2","Hosp2","Hosp2",
"Hosp3","Hosp3","Hosp3","Hosp3","Hosp3",
"Hosp4","Hosp4","Hosp4","Hosp4","Hosp4")
Disease<-c("D1","D1","D2","D2","D3",
"D1","D1","D1","D3","D3",
"D3","D3","D2","D2","D3",
"D1","D1","D2","D2","D2")
data<-data.frame(Hospital,Disease)
plot<-ggplot(data, aes(x=Disease,fill=Disease))+
geom_bar()+facet_grid(~Hospital)+coord_flip()
Using facet_grid, I have a graph that compares the four hospitals, which I do not want.
I rather want something like this with facets without going through "grid.arrange", because I want to display all disease categories (even if they are null) for all graphs (in order to easily compare) and I don't want the x.axis label to be displayed for each graph because it takes a lot of space
wh<-ggplot(data, aes(x=Disease,fill=Disease))+
geom_bar()+coord_flip()+labs(title = "whole hospital")
H3<-ggplot(data[data$Hospital=="Hosp3",], aes(x=Disease,
fill=Disease))+ geom_bar()+coord_flip()+
labs(title = "hospital3")
H4<-ggplot(data[data$Hospital=="Hosp4",], aes(x=Disease,
fill=Disease))+ geom_bar()+coord_flip()+
labs(title = "hospital4")
grid.arrange(wh,H3,H4,ncol=3)
How about this based on gghighlight
library(ggplot2)
library(dplyr)
data_all <-
  data %>%
  mutate(Hospital = "Hosp_all") %>%
  group_by(Disease) %>%
  summarise(total = n())
data %>%
  filter(Hospital %in% c("Hosp3", "Hosp4")) %>%
  ggplot(aes(x = Disease, fill = Disease))+
  geom_col(data = data_all, aes(Disease, total), fill = "gray80")+
  geom_bar()+
  coord_flip()+
  facet_wrap(~Hospital)+
  theme(legend.position = "bottom")
Created on 2020-06-23 by the reprex package (v0.3.0)
If your data is not too large, one way is to bind the data frames together, add another column that indicates the dataset (or hospital), and then plot with facets:
library(dplyr)
library(ggplot2)
rbind(data,subset(data,Hospital == "Hosp3"),subset(data,Hospital == "Hosp4")) %>%
mutate(hospital=rep(c("whole hospital","Hosp3","Hosp4"),
c(nrow(data),sum(data$Hospital == "Hosp3"),sum(data$Hospital == "Hosp4")))
) %>%
mutate(hospital=factor(hospital,levels=c("whole hospital","Hosp3","Hosp4"))) %>%
ggplot(aes(x=Disease,fill=Disease))+ geom_bar()+coord_flip()+
facet_wrap(~hospital,scale="free_y")
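The same stacking idea can also be sketched with dplyr::bind_rows() and its .id argument, which fills the indicator column from the list names instead of building it with rep(). This is an alternative formulation, not the code above. The object name stacked is made up, and the facets keep the default shared scale so every disease category appears in each panel.

```r
library(dplyr)
library(ggplot2)

# the example data from the question, rebuilt compactly
data <- data.frame(
  Hospital = rep(c("Hosp1", "Hosp2", "Hosp3", "Hosp4"), each = 5),
  Disease  = c("D1", "D1", "D2", "D2", "D3",
               "D1", "D1", "D1", "D3", "D3",
               "D3", "D3", "D2", "D2", "D3",
               "D1", "D1", "D2", "D2", "D2")
)

# the list names become the values of the new "hospital" column
stacked <- bind_rows(
  list(`whole hospital` = data,
       Hosp3 = filter(data, Hospital == "Hosp3"),
       Hosp4 = filter(data, Hospital == "Hosp4")),
  .id = "hospital"
) %>%
  mutate(hospital = factor(hospital,
                           levels = c("whole hospital", "Hosp3", "Hosp4")))

p <- ggplot(stacked, aes(x = Disease, fill = Disease)) +
  geom_bar() +
  coord_flip() +
  facet_wrap(~hospital)

p
```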

How to assign unique title and text labels to ggplots created in lapply loop?

I've tried about every iteration I can find on Stack Exchange of for loops and lapply loops to create ggplots and this code has worked well for me. My only problem is that I can't assign unique titles and labels. From what I can tell in the function i takes the values of my response variable so I can't index the title I want as the ith entry in a character string of titles.
The example I've supplied creates plots with the correct values but the 2nd and 3rd plots in the plot lists don't have the correct titles or labels.
Mock dataset:
library(ggplot2)
nms=c("SampleA","SampleB","SampleC")
measr1=c(0.6,0.6,10)
measr2=c(0.6,10,0.8)
measr3=c(0.7,10,10)
qual1=c("U","U","")
qual2=c("U","","J")
qual3=c("J","","")
df=data.frame(nms,measr1,qual1,measr2,qual2,measr3,qual3,stringsAsFactors = FALSE)
identify columns in dataset that contain response variable
measrsindex=c(2,4,6)
Create list of plots that show all samples for each measurement
plotlist=list()
plotlist=lapply(df[,measrsindex], function(i) ggplot(df,aes_string(x="nms",y=i))+
geom_col()+
ggtitle("measr1")+
geom_text(aes(label=df$qual1)))
Create list of plots that show all measurements for each sample
plotlist2=list()
plotlist2=lapply(df[,measrsindex],function(i)ggplot(df,aes_string(x=measrsindex, y=i))+
geom_col()+
ggtitle("SampleA")+
geom_text(aes(label=df$qual1)))
The problem is that I can't create a unique title for each plot. (All plots in the example have the title "measr1" or "SampleA".)
Additionally, I can't apply unique labels (from the qual columns) to each bar. (For example, the letter from qual2 should appear on top of the column for measr2 for each sample.)
Additionally, in the second plot list the x-values aren't "measr1", "measr2", "measr3"; they're the index values for those columns, which isn't ideal.
I'm relatively new to R and have never posted on Stack Overflow before so any feedback about my problem or posting questions is welcomed.
I've found lots of questions and answers about this sort of topic but none that have a data structure or desired plot quite like mine. I apologize if this is a redundant question but I have tried to find the solution in previous answers and have been unable.
This is where I got the original code to make my loops, however this example doesn't include titles or labels:
Looping over ggplot2 with columns
You could loop over the names of the columns instead of the column itself and then use some non-standard evaluation to get column values from the names. Also, I have included label in aes.
library(ggplot2)
library(rlang)
plotlist3 <- purrr::map(names(df)[measrsindex],
~ggplot(df, aes(nms, !!sym(.x), label = qual1)) +
geom_col() + ggtitle(.x) + geom_text(vjust = -1))
plotlist3[[1]]
plotlist3[[2]]
The same can be achieved with lapply as well
plotlist4 <- lapply(names(df)[measrsindex], function(x)
ggplot(df, aes(nms, !!sym(x), label = qual1)) +
geom_col() + ggtitle(x) + geom_text(vjust = -1))
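The two versions above give each plot its own title but still label every plot with qual1. One possible way to pair each measurement with its own qual column is to iterate over both name vectors at once, sketched here with base R's Map() and the .data pronoun. The names measr_cols, qual_cols and plotlist5 are made up for this example.

```r
library(ggplot2)

# the mock dataset from the question
df <- data.frame(
  nms    = c("SampleA", "SampleB", "SampleC"),
  measr1 = c(0.6, 0.6, 10), qual1 = c("U", "U", ""),
  measr2 = c(0.6, 10, 0.8), qual2 = c("U", "", "J"),
  measr3 = c(0.7, 10, 10),  qual3 = c("J", "", ""),
  stringsAsFactors = FALSE
)

measr_cols <- c("measr1", "measr2", "measr3")
qual_cols  <- c("qual1", "qual2", "qual3")

# walk over the measurement and qualifier names in parallel
plotlist5 <- Map(function(m, q) {
  ggplot(df, aes(x = nms, y = .data[[m]], label = .data[[q]])) +
    geom_col() +
    geom_text(vjust = -1) +
    ggtitle(m)
}, measr_cols, qual_cols)

plotlist5[["measr2"]]
```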
I would recommend putting your data in long format prior to using ggplot2, it makes plotting a much simpler task. I also recoded some variables to facilitate constructing the plot. Here is the code to construct the plots with lapply.
library(tidyverse)
#Change from wide to long format
df1<-df %>%
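pivot_longer(cols = -nms,
names_to = c(".value", "obs"),
# a single regex replaces the recycled names_sep vector: "measr1" -> meas / 1, "qual1" -> qua / 1
names_pattern = "(meas|qua)[rl](\\d+)") %>%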
#Separate Sample column into letters
separate(col = nms,
sep = "Sample",
into = c("fill","Sample"))
#Change measures index to 1-3
measrsindex=c(1,2,3)
plotlist=list()
plotlist=lapply(measrsindex, function(i){
#Subset by measrsindex (numbers) and plot
df1 %>%
filter(obs == i) %>%
ggplot(aes_string(x="Sample", y="meas", label="qua"))+
geom_col()+
labs(x = "Sample") +
ggtitle(paste("Measure",i, collapse = " "))+
geom_text()})
#Get the letters A : C
samplesvec<-unique(df1$Sample)
plotlist2=list()
plotlist2=lapply(samplesvec, function(i){
#Subset by samplesvec (letters) and plot
df1 %>%
filter(Sample == i) %>%
ggplot(aes_string(x="obs", y = "meas",label="qua"))+
geom_col()+
labs(x = "Measure") +
ggtitle(paste("Sample",i,collapse = ", "))+
geom_text()})
Looking at the final plots, I think it might be useful to build them with facet_wrap instead. I added the code for that below.
#Plot for Measures
ggplot(df1, aes(x = Sample,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ obs) +
ggtitle("Measures")+
labs(x="Samples")+
geom_text()
#Plot for Samples
ggplot(df1, aes(x = obs,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ Sample) +
ggtitle("Samples")+
labs(x="Measures")+
geom_text()
Here is a sample of the plots using facet_wrap.

ggplot labels for melted dataframes

I often melt my dataframes to show multiple variables on one barplot. The goal is to create a geom_bar with one bar for each variable, and one summary label for each bar.
For example, I'll do this:
mtcars$id<-rownames(mtcars)
tt<-melt(mtcars,id.vars = "id",measure.vars = c("cyl","vs","carb"))
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(aes(label=value),color='blue')
The result is a barplot in which the label for each bar is repeated for each case (it seems):
What I want to have is one label for each bar, like this:
A common solution is to create aggregated values to place on the graph, like this:
aggr<-tt %>% group_by(variable) %>% summarise(aggrLABEL=mean(value))
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(aes(label=aggr$aggrLABEL),color='blue')
or
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(label=dplyr::distinct(tt,value),color='blue')
However, these attempts result in errors, respectively:
For solution 1: Error: Aesthetics must be either length 1 or the same as the data (96): label, x, y
For solution 2: Error in [<-.data.frame(*tmp*, aes_params, value = list(label = list( : replacement element 1 is a matrix/data frame of 7 rows, need 96
So, what to do? Setting geom_text to stat="identity" does not help either.
What I would do is create another dataframe with the summary values of your columns. I would then refer to that dataframe in the geom_text line. Like this:
library(tidyverse) # need this for the %>%
tt_summary <- tt %>%
group_by(variable) %>%
summarize(total = sum(value))
ggplot(tt, aes(variable, value)) +
geom_col() +
geom_text(data = tt_summary, aes(label = total, y = total), nudge_y = 1) # using nudge_y bc it looks better.
