Making groups from 2 different datasets in ggboxplot using R

I have been making box plots to show the difference in RATE between two treatments (CALC_ACT = Yes or No), but only for one dataset at a time.
I now have two datasets that I want to compare, but I don't know how to put them in one box plot.
Below is how I use ggboxplot to plot a single dataset (PatientData).
What I would like is for that dataset to be grouped together, with the same data from my second dataset (PatientData2) next to it on the plot, and the name of each dataset shown underneath its section.
Hopefully this makes sense... any tips?
library(ggpubr)
PatientData <- data.frame(PATIENT_ID = c(1,1,2,2,3,3,4,4), CALC_ACT = c("No","Yes","No","Yes","No","Yes","No","Yes"), RATE = c(1,0.1,0.5,0.6,0.8,1,0.5,0.4))
PatientData2 <- data.frame(PATIENT_ID = c(5,5,6,6,7,7,8,8), CALC_ACT = c("No","Yes","No","Yes","No","Yes","No","Yes"), RATE = c(.4,1,0.5,0.6,0.3,0.8,0.6,0.4))
ggboxplot(PatientData, x = "CALC_ACT", y = "RATE",
color = "CALC_ACT", palette = c("#00AFBB", "#E7B800"),
order = c("No", "Yes"),
ylab = "Rate", xlab = "Calcium")

You can combine the two datasets into one data frame, labelling each row with the dataset it came from, and then plot a single box plot:
library(tidyverse)
bind_rows(lst(PatientData, PatientData2), .id = 'Dataset') %>%
unite('CALC_ACT', Dataset, CALC_ACT) %>%
ggplot(aes(CALC_ACT,RATE, color = CALC_ACT)) + geom_boxplot()
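To see what the combining step does without dplyr, here is a minimal base-R sketch of the same idea, assuming only the two example data frames from the question: tag each row with the name of its source data frame, stack the frames with rbind(), and build the same combined label that unite() produces (its default separator is "_"). The column name Group is an illustrative choice, not part of the answer above.

```r
# Example data from the question
PatientData <- data.frame(PATIENT_ID = c(1,1,2,2,3,3,4,4),
                          CALC_ACT = c("No","Yes","No","Yes","No","Yes","No","Yes"),
                          RATE = c(1,0.1,0.5,0.6,0.8,1,0.5,0.4))
PatientData2 <- data.frame(PATIENT_ID = c(5,5,6,6,7,7,8,8),
                           CALC_ACT = c("No","Yes","No","Yes","No","Yes","No","Yes"),
                           RATE = c(.4,1,0.5,0.6,0.3,0.8,0.6,0.4))

# Record which data frame each row came from (what bind_rows(.id = ...) does)
PatientData$Dataset  <- "PatientData"
PatientData2$Dataset <- "PatientData2"
combined <- rbind(PatientData, PatientData2)

# Combined dataset/treatment label, like unite() with its default sep = "_"
combined$Group <- paste(combined$Dataset, combined$CALC_ACT, sep = "_")

# Four groups (two datasets x two treatments), four rows each
table(combined$Group)
```

In base graphics, boxplot(RATE ~ Group, data = combined) then draws one box per dataset/treatment pair; with ggplot2 you would map the combined label to x, as in the answer.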

If I understand you correctly, I believe this is the solution:
library(tidyverse)
#install.packages("ggpubr")
library(ggpubr)
PatientData <- data.frame(
PATIENT_ID = c(1,1,2,2,3,3,4,4),
CALC_ACT = c("No","Yes","No","Yes","No","Yes","No","Yes"),
RATE = c(1,0.1,0.5,0.6,0.8,1,0.5,0.4)
)
PatientData2 <- data.frame(
PATIENT_ID = c(5,5,6,6,7,7,8,8),
CALC_ACT = c("No","Yes","No","Yes","No","Yes","No","Yes"),
RATE = c(.4,1,0.5,0.6,0.3,0.8,0.6,0.4)
)
combined <- bind_rows(list(PatientData = PatientData,
PatientData2 = PatientData2),
.id = "Source")
ggboxplot(combined, x = "CALC_ACT", y = "RATE",
facet.by = "Source", color = "CALC_ACT",
palette = c("#00AFBB", "#E7B800"),
order = c("No", "Yes"),
ylab = "Rate", xlab = "Calcium",
strip.position = "bottom")

Related

Creating 4 yes/no bar charts on a single plot from frequency tables in R

I have four variables as columns in my data set:
whether the person had free school meals when they were younger
whether the person's parents attended university
whether the person studied A-level drama at school
whether their school offered A-level drama
Each value in the column is either "yes", "no" or "not applicable".
I want to put four bar charts on a single plot (which I can then save as a .png), each with a yes bar and a no bar.
I have used the below to create a frequency table for each of the variables. Here I've used the example of whether the person received free school meals (FSM) when they were younger:
library(dplyr)
library(scales)
FSM_df <- champions %>% count(FSM, sort = TRUE) %>% mutate(pct = prop.table(n))
percentage = label_percent()(FSM_df$pct)
FSM_df$percentage = percentage
I can use the code below to create a single bar chart, but I'm not sure how to do this for multiple plots:
library(ggplot2)
# A fixed colour is set outside aes(); inside aes() it would be mapped as a data value
ggplot(FSM_df, aes(x = FSM, y = n)) +
geom_col(fill = "#fe8080", show.legend = FALSE) +
coord_flip() +
labs(x = "FSM", y = "Number of Champions") +
geom_text(aes(label = percentage), color = "#662483")
Generating Random Data
library(tidyverse)
lunch <- sample(0:1, 100, replace = TRUE, prob = c(0.7,0.3))
parents <- sample(0:1, 100, replace = TRUE, prob = c(0.5,0.5))
drama_major <- sample(0:1, 100, replace = TRUE, prob = c(0.9,0.1))
drama_offered <- sample(0:1, 100, replace = TRUE, prob = c(0.8,0.2))
Creating the Tibble
df <- tibble(lunch = lunch,
parents = parents,
drama_major = drama_major,
drama_offered)
pivot_longer
df %>%
pivot_longer(cols = 1:4,
names_to = "measure",
values_to = "measure_is_true_1") %>%
mutate(is_true = if_else(measure_is_true_1 == 0, "no", "yes")) %>%
ggplot(aes(x = measure)) +
geom_bar(aes(fill = is_true), position = "dodge", alpha = 0.7) +
coord_flip() +
theme_bw()
In this example, you convert your data to long format and then set the grouping aesthetic with the fill parameter. The ggplot logic is: plot the measures along the x axis and count how often the response column is 0 or 1 for each of them (whether or not the person had free school meals, studied drama, etc.). This is how you can show all four on the same plot.
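The two steps described above (reshape to long format, then count the 0/1 responses per measure) can be checked without ggplot2. A minimal base-R sketch, assuming randomly generated 0/1 data like the answer's (the seed is an illustrative addition, only for reproducibility): stack() plays the role of pivot_longer(), and table() gives the counts that geom_bar(position = "dodge") draws as bar heights.

```r
set.seed(42)  # illustrative seed, only for reproducibility
df <- data.frame(lunch         = sample(0:1, 100, replace = TRUE, prob = c(0.7, 0.3)),
                 parents       = sample(0:1, 100, replace = TRUE, prob = c(0.5, 0.5)),
                 drama_major   = sample(0:1, 100, replace = TRUE, prob = c(0.9, 0.1)),
                 drama_offered = sample(0:1, 100, replace = TRUE, prob = c(0.8, 0.2)))

# Long format: one row per (respondent, measure); stack() returns columns "values" and "ind"
long <- stack(df)
names(long) <- c("measure_is_true_1", "measure")
long$is_true <- ifelse(long$measure_is_true_1 == 0, "no", "yes")

# One count per measure x answer: the heights of the dodged bars
counts <- table(long$measure, long$is_true)
counts

# Row percentages, similar to the per-variable frequency tables in the question
round(100 * prop.table(counts, margin = 1), 1)
```

In base graphics, barplot(t(counts), beside = TRUE, horiz = TRUE, legend.text = TRUE) draws a comparable dodged, flipped chart from the same table.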

Order by a value within a "fill" variable ggplot - Bar Chart R

I have the following dataset:
Data:
test <- data.frame(
cluster = c("1", "2", "3","1", "2", "3","1", "2", "3"),
variable = c("age", "age", "age", "speed", "speed", "speed", "price","price","price"),
value = c(0.33,0.12,0.98,0.77,0.7,0.6,0.11,0.04,0.15))
test$variable <- factor(test$variable, levels = c("age","speed","price"))
Code
library(dplyr)
library(ggplot2)
test %>%
ggplot(aes(x = cluster, y = value ,fill = variable ,group = (cluster))) +
geom_col(position = "stack", color = "black", alpha = .75) +
coord_flip()
I am trying to order the bar chart by the value of one level of variable, for example "age". This is the code I used to visualize the chart. I have already tried the order function, but that doesn't seem to work within the "fill" argument.
I think the problem is that "age" itself is just a value of "variable".
The bars should be sorted by their "age" value.
Is it at all possible to display something like this with ggplot, or do I need another package?
You've adjusted the level order of variable, which affects the order of the fill colors within each bar. To change the order of the axis where you mapped x = cluster, you need to adjust the order of the levels of cluster. As a one-off, you can do this manually. It's a little more work to do it programmatically:
Manually:
test$cluster = factor(test$cluster, levels = c(2, 1, 3))
Calculating the right order:
library(dplyr)
level_order = test %>%
filter(variable == "age") %>%
group_by(cluster) %>%
summarize(val = sum(value), .groups = "drop") %>%
arrange(val) %>%
pull(cluster)
test = mutate(test, cluster = factor(cluster, levels = level_order))
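The same ordering can be computed in base R, which also makes the logic easy to check: keep only the "age" rows, order the clusters by their value, and use that as the level order. A minimal sketch, assuming the example test data from the question (with the trailing commas removed):

```r
test <- data.frame(
  cluster  = c("1", "2", "3", "1", "2", "3", "1", "2", "3"),
  variable = c("age", "age", "age", "speed", "speed", "speed", "price", "price", "price"),
  value    = c(0.33, 0.12, 0.98, 0.77, 0.7, 0.6, 0.11, 0.04, 0.15)
)

# Only the rows for the level we want to sort by
age <- test[test$variable == "age", ]

# Cluster labels sorted by their "age" value (ascending)
level_order <- age$cluster[order(age$value)]

test$cluster <- factor(test$cluster, levels = level_order)
levels(test$cluster)  # "2" "1" "3"
```

Use order(-age$value) for descending order. Because of coord_flip(), the first level is drawn at the bottom of the axis.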

How to specify groups with colors in qqplot()?

I have created a qqplot (with quantiles of beta distribution) from a dataset including two groups. To visualize, which points belong to which group, I would like to color them. I have tried the following:
res <- beta.mle(data$values) #estimate parameters of beta distribution
qqplot(qbeta(ppoints(500),res$param[1], res$param[2]),data$values,
col = data$group,
ylab = "Quantiles of data",
xlab = "Quantiles of Beta Distribution")
I have seen solutions that pass a "col" vector to qqnorm; however, this does not seem to work with qqplot: half of the points are simply colored in each color, regardless of group. Is there a way to fix this?
I simulated some data just to show how to add color in ggplot.
Libraries
library(tidyverse)
# install.packages("Rfast")
Data
#Simulating data from beta distribution
x <- rbeta(n = 1000,shape1 = .5,shape2 = .5)
#Estimating parameters
res <- Rfast::beta.mle(x)
data <-
tibble(
simulated_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2])
) %>%
#Creating a group variable using quartiles
mutate(group = cut(x = simulated_data,
quantile(simulated_data,seq(0,1,.25)),
include.lowest = T))
Code
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = simulated_data, col = group))+
geom_point()
For those wondering how to work with pre-defined groups, this is the code that worked for me (x holds the data values and g the group of each value):
library(tidyverse)
library(Rfast)
res <- beta.mle(x)
# make sure groups are not numerical
# (otherwise the color scale may turn out continuous)
g <- plyr::mapvalues(g, c("1", "2"), c("Group1", "Group2"))
data <-
tibble(
my_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2]),
group = g[order(x)]
)
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = my_data, col = group))+
geom_point()
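The col vector in the question does not line up because qqplot() sorts both of its inputs before plotting, so the points are no longer in the original row order; the answer above handles this with g[order(x)]. The same fix works with base graphics. A minimal sketch, assuming two simulated groups and fixed beta shape parameters (2, 5) in place of Rfast::beta.mle(), so it has no extra dependencies. It also uses one theoretical quantile per observation, because qqplot() interpolates when its two inputs differ in length, which would break the one-to-one match between points and groups.

```r
set.seed(1)  # illustrative simulated data
values <- c(rbeta(30, 2, 5), rbeta(30, 5, 2))
group  <- factor(rep(c("Group1", "Group2"), each = 30))

# Theoretical quantiles, one per observation (assumed shape parameters 2 and 5)
theo <- qbeta(ppoints(length(values)), 2, 5)

# qqplot() sorts both inputs; get its coordinates without drawing
qq <- qqplot(theo, values, plot.it = FALSE)

# Reorder the group labels exactly the way the data were sorted
group_sorted <- group[order(values)]

plot(qq$x, qq$y, col = as.integer(group_sorted), pch = 19,
     xlab = "Quantiles of Beta Distribution", ylab = "Quantiles of data")
legend("topleft", legend = levels(group), col = seq_along(levels(group)), pch = 19)
```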

How to make and customize sections within the bars of a barplot created with ggplot2?

I have a data frame structured like data created here:
set.seed(123)
data <- data.frame(Loc = paste("Loc", seq(1:20), sep = ""),
A = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5)),
B = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5)),
C = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5))
)
data$D <- 100-(data[,2]+data[,3]+data[,4])
data$total <- sample(c(10:20), replace = T, length(data[,1]))
Here, Loc is a grouping variable with 20 levels. Each Loc represents a location from which samples were taken (the actual "samples" are not included here). A, B, C and D represent clusters that observations were assigned to. For each Loc, the values in columns A, B, C and D give the percentage of observations from that Loc that were assigned to each cluster. The total column gives the total number of observations taken from each Loc. For instance, there were 14 observations for Loc1; 25% of those were assigned to cluster B and 75% to cluster D.
I have made a bar plot that shows Loc on the x-axis and total on the y-axis. Assuming each cluster will be given a unique "color", I am trying to color the bars in such a way that for a given Loc the colors will represent the percentage of observations that were assigned to each cluster. For instance, say cluster B is yellow and cluster D is blue, then the bar for Loc1 will be 25% yellow and 75% blue.
I have tried several variants of this:
library(tidyverse)
data%>%
pivot_longer(-c(Loc,total), names_to= "Group", values_to = "val")%>%
ggplot(., aes(x=Loc, y=total, col = Group))+
geom_bar(stat = "identity", aes(fill = val))+
geom_text(aes(label = total))
This is close, but not what I want. How can I make this kind of plot? If possible, I would also like to move the value of total to the top of each bar, and to place the percentage for each cluster in the center of that cluster's colored section within each bar.
Try this. I added a variable, val1, with the number of observations per group (the percentage multiplied by total and divided by 100).
library(ggplot2)
library(dplyr)
library(tidyr)
set.seed(123)
data <- data.frame(Loc = paste("Loc", seq(1:20), sep = ""),
A = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5)),
B = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5)),
C = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5))
)
data$D <- 100-(data[,2]+data[,3]+data[,4])
data$total <- sample(c(10:20), replace = T, length(data[,1]))
data1 <- data %>%
pivot_longer(-c(Loc,total), names_to= "Group", values_to = "val") %>%
# Number per Group
mutate(val1 = val * total / 100)
data1 %>%
# Map val1 on y, Group on fill
ggplot(., aes(x=Loc, y=val1, fill = Group))+
geom_bar(stat = "identity")+
# Make label only for the first group. Here: A
geom_text(aes(y = total, label = ifelse(Group == "A", total, "")), nudge_y = 1, size = 3) +
# Add percentages
geom_text(aes(y = val1,
label = ifelse(val > 0, scales::percent(val, scale = 1, accuracy = 1), "")),
position = position_stack(vjust = .5), size = 3)
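The key step in this answer is val1 = val * total / 100: each percentage becomes a number of observations, so the stacked segments of a bar add up to that location's total (D is defined as 100 minus the other three clusters, so the percentages always sum to 100). A minimal base-R check, using a hypothetical two-location data frame in which Loc1 mirrors the example from the question (14 observations, 25% in B and 75% in D):

```r
# Hypothetical data; Loc1 follows the example described in the question
wide <- data.frame(Loc = c("Loc1", "Loc2"),
                   A = c(0, 15), B = c(25, 20), C = c(0, 0), D = c(75, 65),
                   total = c(14, 10))

# Long format: one row per Loc x cluster (base-R equivalent of pivot_longer)
long <- data.frame(Loc   = rep(wide$Loc, times = 4),
                   total = rep(wide$total, times = 4),
                   Group = rep(c("A", "B", "C", "D"), each = nrow(wide)),
                   val   = c(wide$A, wide$B, wide$C, wide$D))

# Convert each percentage into a number of observations
long$val1 <- long$val * long$total / 100

# The stacked segments of each bar sum to that Loc's total
tapply(long$val1, long$Loc, sum)  # Loc1 14, Loc2 10
```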
You can try this:
library(reshape2)
library(tidyverse)
#Format Loc
data$Loc <- factor(data$Loc,levels = paste0('Loc',1:dim(data)[1]),ordered = T)
#Melt
df <- melt(data,id.vars = c('Loc','total'))
#Create label
df$Label <- ifelse(df$value==0,NA,paste0(df$value,'%'))
#Plot
ggplot(df,aes(x=Loc,y=value,color=variable,group=variable,label=Label,fill=variable))+
geom_bar(stat='identity')+
geom_text(position=position_stack(vjust=0.5),color='black')+
geom_text(inherit.aes = FALSE, data = data,
aes(x = Loc, y = 100, label = total), vjust = -0.25)

Specify the order for groups when using unite from tidyr for plotting with ggplot

I wanted to do something like the question "Add multiple comparisons using ggsignif or ggpubr for subgroups with no labels on x-axis".
I got this far:
Packages and Example data
library(tidyverse)
library(ggpubr)
library(ggpol)
library(ggsignif)
example.df <- data.frame(species = sample(c("primate", "non-primate"), 50, replace = TRUE),
treated = sample(c("Yes", "No"), 50, replace = TRUE),
gender = sample(c("male", "female"), 50, replace = TRUE),
var1 = rnorm(50, 100, 5))
Levels
example.df$species <- factor(example.df$species,
levels = c("primate", "non-primate"), labels = c("p", "np"))
example.df$treated <- factor(example.df$treated,
levels = c("No", "Yes"), labels = c("N","Y"))
example.df$gender <- factor(example.df$gender,
levels = c("male", "female"), labels = c("M", "F"))
I have had no luck getting either ggsignif or ggpubr to place the significance bars correctly when the groups they refer to are not explicitly named on the x-axis (they are subgroups of each x-axis variable and are shown only in the fill legend, not on the x-axis), so I tried this instead.
example.df %>%
unite(groups, species, treated, remove = F, sep= "\n") %>%
{ggplot(., aes(groups, var1, fill= treated)) +
geom_boxjitter() +
facet_wrap(~ gender, scales = "free") +
ggsignif::geom_signif(comparisons = combn(sort(unique(.$groups)), 2, simplify = F),
step_increase = 0.1)}
I get a faceted plot with significance values computed for every group.
However, the order of the combined groups on the x-axis is not what I want. I want the order p/N, np/N, p/Y, np/Y in each facet.
How do I do this? Any help is greatly appreciated.
Edit: Creating a new variable with mutate and making it an ordered factor with my preferred plotting order solves the ordering problem:
example.df %>%
unite(groups, species, treated, remove = F, sep= "\n") %>%
mutate(groups2 = factor(groups, levels = c("p\nN", "np\nN", "p\nY", "np\nY"),
ordered = TRUE)) %>%
{ggplot(., aes(groups2, var1, fill= treated)) +
geom_boxjitter() +
facet_wrap(~gender,scales = "free") +
ggsignif::geom_signif(comparisons = combn(sort(unique(.$groups2)), 2, simplify = F),
step_increase = 0.1)}
However, I am still looking for a solution that avoids unite altogether, keeps the original factors, and still plots significance values with ggsignif or ggpubr.
The default parameters for interaction (from the base package) appear to give the factor ordering you are looking for:
example.df %>%
mutate(groups = interaction(species, treated, sep = "\n")) %>%
{ggplot(., aes(groups, var1, fill= treated)) +
geom_boxjitter() +
facet_wrap(~ gender, scales = "free") +
geom_signif(comparisons = combn(sort(as.character(unique(.$groups))), 2, simplify = F),
step_increase = 0.1)}
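Why interaction() gives this order: it builds its levels from the existing factor levels, with the first factor varying fastest, whereas pasting the columns together (as unite() does) yields a character vector that ggplot2 orders alphabetically. A small base-R sketch, using hypothetical miniature versions of the species and treated factors from the question:

```r
# Hypothetical miniature factors with the same levels as example.df
species <- factor(c("p", "np", "p", "np"), levels = c("p", "np"))
treated <- factor(c("N", "N", "Y", "Y"), levels = c("N", "Y"))

# unite()/paste() give plain strings; factor() then sorts them alphabetically
pasted <- factor(paste(species, treated, sep = "\n"))
levels(pasted)

# interaction() keeps the factor-level order, first factor varying fastest
grp <- interaction(species, treated, sep = "\n")
levels(grp)  # "p\nN" "np\nN" "p\nY" "np\nY"

# lex.order = TRUE makes the first factor vary slowest instead
levels(interaction(species, treated, sep = "\n", lex.order = TRUE))
```

Because interaction() also keeps unused combinations by default (drop = FALSE), every facet gets the same set of x positions.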
