Error in ggplot violin graphs: trouble overlaying boxplot with violin [duplicate] - r

This question already has an answer here:
Align violin plots with dodged box plots
(1 answer)
Closed 2 years ago.
I'm trying to show my data as a violin plot with an overlaid boxplot. I have four groups, split by two independent factors, so I put in the commands below.
The table has X1, Category, and Area columns.
ggplot(malbdata,aes(x=X1,y=Area,fill=Category))+
geom_violin()+
geom_boxplot(width=.1)
What I get is the attached graph, which places the boxplots next to the violins rather than inside them. I'm very new to working with R; any ideas on what might be going wrong?

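I believe the issue is the width = 0.1 parameter. When no dodge width is given, each layer is dodged according to the width of its own elements, so the narrow boxplots are packed together near the center of each x position while the violins (default width 0.9) are spread much further apart, e.g.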
library(tidyverse)
library(palmerpenguins)
penguins %>%
na.omit() %>%
select(species, island, bill_length_mm) %>%
ggplot(aes(x = island, y = bill_length_mm, fill = species)) +
geom_boxplot(width=.1) +
geom_violin()
If you make the widths the same they line up as expected:
library(tidyverse)
library(palmerpenguins)
penguins %>%
na.omit() %>%
select(species, island, bill_length_mm) %>%
ggplot(aes(x = island, y = bill_length_mm, fill = species)) +
geom_boxplot(width=.2) +
geom_violin(width=.2)
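Another option keeps the boxplots narrow but dodges them by the same amount as the violins. It does this by passing one explicit position_dodge(width = 0.9) to both layers, with 0.9 chosen to match geom_violin()'s default width. The sketch below assumes a recent ggplot2 (3.x) and uses the built-in ToothGrowth data as a stand-in for the question's malbdata, with dose on the x-axis and supp as the fill.

```r
library(ggplot2)

# Built-in data as a stand-in: dose (x) split by supplement type (fill)
tg <- transform(ToothGrowth, dose = factor(dose))

# One shared dodge width for both layers; 0.9 matches geom_violin()'s default width
dodge <- position_dodge(width = 0.9)

p <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_violin(position = dodge) +
  # Narrow boxes, shifted by the same amount as the violins so they sit inside them
  geom_boxplot(width = 0.2, position = dodge, outlier.shape = NA)

p
```

Because the dodge width is now set separately, the boxplot width can be changed without moving the boxes away from the violin centers.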
Also, instead of using boxplots and violins (both of which illustrate the distribution of values), it might be better to plot the individual values alongside the distribution, e.g.
library(tidyverse)
library(palmerpenguins)
library(ggbeeswarm)
penguins %>%
na.omit() %>%
select(species, island, bill_length_mm) %>%
rename(Species = species, Island = island) %>%
ggplot(aes(x = Island, y = bill_length_mm, fill = Species)) +
geom_boxplot(width=.4, outlier.shape = NA,
position = position_dodge2(preserve = "single")) +
geom_quasirandom(aes(colour = Species), groupOnX = TRUE,
width=.2, alpha = 0.5, dodge.width = 0.4) +
theme_bw(base_size = 16) +
ylab("Bill Length (mm)")
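If installing ggbeeswarm is not an option, a similar plot can be sketched with ggplot2 alone by using position_jitterdodge() for the points. Note that this swaps the quasirandom layout for random jitter. The built-in ToothGrowth data again stands in for penguins, and dodge.width = 0.4 is chosen to match the boxplot width so that each cloud of points sits over its own box.

```r
library(ggplot2)

tg <- transform(ToothGrowth, dose = factor(dose))

p <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  # Hide outliers here because every observation is drawn by geom_point() below
  geom_boxplot(width = 0.4, outlier.shape = NA,
               position = position_dodge2(preserve = "single")) +
  # Jitter within each box; dodge.width matches the boxplot width above
  geom_point(aes(colour = supp), alpha = 0.5,
             position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.4, seed = 1)) +
  theme_bw(base_size = 16) +
  ylab("Tooth length")

p
```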

Related

shading under geom_step with discrete x-axis, respecting the factor order

I'd like to shade the area under a geom_step() curve on a plot with a discrete and ordered x-axis, e.g. to show the cumulative distribution for some frequency-ordered categories.
The basic geom_step() curve could be created like this:
library(dplyr)
library(ggplot2)
library(forcats)
diamonds %>%
group_by(color) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf=cumsum(frac_of_tot),
color=fct_reorder(color, ecdf)) %>%
ggplot(aes(x=color, y=ecdf, group=0)) +
geom_step() +
expand_limits(y=0) +
labs(title="a pareto-style cumulative distribution chart",
subtitle="with x-axis ordered by decreasing frequency",
y="cumulative fraction of total") +
theme_minimal()
but adding the shaded area using geom_rect() as taught by this answer seems to re-order the x-axis, resulting in a nonsensical plot:
diamonds %>%
group_by(color) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf=cumsum(frac_of_tot),
color=fct_reorder(color, ecdf)) %>%
ggplot(aes(x=color, y=ecdf, group=0)) +
geom_step() +
geom_rect(aes(xmin=color, xmax=lead(color), ymin=0, ymax=ecdf), alpha=0.3) +
expand_limits(y=0) +
labs(title="A sudden mess after adding geom_rect",
subtitle="with x-axis surprisingly back in alpha order",
y="cumulative fraction of total") +
theme_minimal()
Why is the geom_rect() layer causing the x-axis to be re-ordered?
How can I produce a plot that looks just like the first one, but with the area under the curve shaded?
It seems to me that doing this with geom_rect is doing it the hard way. With some minor data reshaping you can simply use geom_area:
library(dplyr)
library(ggplot2)
library(forcats)
library(tidyr)
diamonds %>%
group_by(color) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf = cumsum(frac_of_tot),
ecd = lag(ecdf),
color = fct_reorder(color, ecdf)) %>%
pivot_longer(starts_with("ecd")) %>%
arrange(color, name) %>%
ggplot(aes(x = color, y = value, group = 0)) +
geom_area(position = "identity", color = "black", alpha = 0.5) +
expand_limits(y = 0) +
labs(title = "a pareto-style cumulative distribution chart",
subtitle = "with x-axis ordered by decreasing frequency",
y = "cumulative fraction of total") +
theme_minimal()
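To see why this produces steps, it may help to stop the same pipeline just before ggplot() and inspect the reshaped data (a sketch; only the plotting call is left out). Each colour gets two rows at the same x position: the cumulative fraction before that category (ecd) followed by the cumulative fraction up to it (ecdf). Two points sharing an x give the vertical jump, and the flat segments join the ecdf of one category to the equal ecd of the next. The first category's ecd is NA because there is no previous category.

```r
library(dplyr)
library(forcats)
library(tidyr)
library(ggplot2)  # provides the diamonds data

steps <- diamonds %>%
  group_by(color) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  mutate(frac_of_tot = count / sum(count),
         ecdf = cumsum(frac_of_tot),   # cumulative fraction up to this category
         ecd = lag(ecdf),              # cumulative fraction before this category
         color = fct_reorder(color, ecdf)) %>%
  pivot_longer(starts_with("ecd")) %>%
  arrange(color, name)                 # "ecd" sorts before "ecdf" within each colour

print(steps, n = Inf)
```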

Multi-row labels in ggplot2

I have a plot which contains multiple entries of the same items along the x-axis. I have a total of 45 items grouped according to the groups below.
pvalall$Group<-c(rep("Physical",5*162),rep("Perinatal",11*162),rep("Developmental",3*162),
rep("Lifestyle-Life Events",5*162),rep("Parental-Family",13*162),rep("School",3*162),
rep("Neighborhood",5*162))
pvalall$Group <- factor(pvalall$Group,
levels = c("Physical", "Perinatal", "Developmental",
"Lifestyle-Life Events", "Parental-Family",
"School","Neighborhood"))
So essentially there are 162*45 = 7290 points along the x-axis, and each set of 162 corresponds to one of the variables of interest. How do I get geom_point to plot only one label for each of these sets of 162, given a list of the variable names c("var1","var2",....,"var45")?
A reprex would be nice, but generally the solution is to create a separate dataframe with one row per group indicating where the labels should go, and to add a geom_text() layer to your plot that uses this dataframe.
My guess is that the code should look like this:
# create a dataframe for the labels
pvalall %>%
group_by(Group) %>%
summarize(Domains = mean(Domains),
`-log10(P-Values)` = mean(`-log10(P-Values)`)) -> label_df
# now make the plot
pvalall %>%
ggplot(aes(x = Domains, y = `-log10(P-Values)`)) +
geom_point(aes(col = Group)) + # putting col aesthetic in here so that the labels are not colored
geom_text(data =label_df, aes(label = Group))
Here is an example with mtcars:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarize(mpg = mean(mpg),
disp = mean(disp)) %>%
mutate(cyl_label = str_c(cyl, "\ncylinders")) -> label_df
mtcars %>%
ggplot(aes(x = mpg, y = disp)) +
geom_point(aes(col = factor(cyl)), show.legend = F) +
geom_text(data = label_df, aes(label = cyl_label))
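This produces a scatter plot with a single label per cylinder group, placed at that group's mean mpg and disp.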
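For the original question, where the x-axis is a running index with 162 points per variable, another option is to skip geom_text() and put one axis label at the center of each block through the breaks and labels arguments of scale_x_continuous(). The sketch below uses hypothetical toy data (3 variables with 5 points each instead of 45 × 162); the column name Domains is borrowed from the guess above, while the variable and pval columns and their values are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data: 3 variables x 5 points each (the real data has 45 x 162)
n_per_var <- 5
var_names <- c("var1", "var2", "var3")
toy <- data.frame(
  Domains = seq_len(n_per_var * length(var_names)),   # running x index
  variable = rep(var_names, each = n_per_var),
  pval = rep(c(1, 3, 2), each = n_per_var)             # hypothetical y values
)

# Center of each block of n_per_var consecutive x positions
centres <- (seq_along(var_names) - 1) * n_per_var + (n_per_var + 1) / 2

p <- ggplot(toy, aes(x = Domains, y = pval)) +
  geom_point() +
  scale_x_continuous(breaks = centres, labels = var_names) +
  # Vertical labels, which helps once there are 45 of them
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

p
```

With the real data, n_per_var would be 162 and var_names the vector of 45 variable names.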

ggplot2 barplot - adding percentage labels inside the stacked bars but retaining counts on the y-axis

I have created a stacked barplot with the counts of a variable. I want to keep these as counts, so that the different bar sizes represent different group sizes. However, inside the bars I would like to add labels that show the proportion of each stack segment as a percentage.
I managed to create the stacked plot of counts for every group. I have also created the labels and they are placed correctly. What I struggle with is how to calculate the percentage there.
I have tried this, but I get an error:
dataex <- iris %>%
dplyr::group_by(group, Species) %>%
dplyr::summarise(N = n())
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
geom_bar(position="stack", stat="identity") +
geom_text(aes(label = ifelse((..count..)==0,"",scales::percent((..count..)/sum(..count..)))), position = position_stack(vjust = 0.5), size = 3) +
theme_pubclean()
Error in (count) == 0 : comparison (1) is possible only for atomic and list types
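Well, I just found an answer, or at least a workaround. Maybe this will help someone in the future: calculate the percentage before calling ggplot() and then use that vector as the labels. Because summarise() drops the last grouping level, N/sum(N) below is computed within each group.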
dataex <- iris %>%
dplyr::group_by(group, Species) %>%
dplyr::summarise(N = n()) %>%
dplyr::mutate(pct = paste0((round(N/sum(N)*100, 2))," %"))
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
geom_bar(position="stack", stat="identity") +
geom_text(aes(label = pct), position = position_stack(vjust = 0.5), size = 3) + # map the column inside aes(), not dataex$pct
theme_pubclean()
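Since iris has no group column, the code above will not run as posted. Here is a self-contained sketch of the same idea, using mtcars as a stand-in (an assumption: cyl plays the role of group and gear the role of Species) and leaving out theme_pubclean(), which comes from ggpubr. The percentages are computed before plotting and mapped inside aes(), so the bar heights remain counts.

```r
library(dplyr)
library(ggplot2)

dataex <- mtcars %>%
  count(cyl, gear, name = "N") %>%   # one row per bar segment
  group_by(cyl) %>%
  mutate(pct = N / sum(N)) %>%       # proportion within each bar
  ungroup()

p <- ggplot(dataex, aes(x = factor(cyl), y = N, fill = factor(gear))) +
  geom_col() +                       # bar heights remain counts
  geom_text(aes(label = scales::percent(pct, accuracy = 0.1)),
            position = position_stack(vjust = 0.5), size = 3)

p
```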

Adding labels to individual % inside geom_bar() using R / ggplot2 [duplicate]

This question already has answers here:
Add percentage labels to a stacked barplot
(2 answers)
Closed 3 years ago.
bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
success is a factor with four categories, one for each of the four possible outcomes in the data set. I could calculate the percentages separately, but as the plot is currently constructed, the bar segments are generated by geom_bar(aes(fill = success)).
data <- as.data.frame(c(1,1,1,1,1,1,2,2,3,3,3,3,4,4,4,4,4,4,
4,4,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7))
data[["success"]] <- c("a","b","c","c","d","d","a","b","b","b","c","d",
"a","b","b","b","c","c","c","d","a","b","c","d",
"a","b","c","c","d","d","a","b","b","c","d")
names(data) <- c("location","success")
bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
bgraph
How do I add labels showing the individual percentages? More specifically, I want four percentages for each bar, one each for yellow, light orange, orange and red, and within each bar they should add up to 1 (100%).
Maybe there is a way to do this in ggplot directly, but with some pre-processing in dplyr you can achieve the desired output.
library(dplyr)
library(ggplot2)
data %>%
count(location, success) %>%
group_by(location) %>%
mutate(n = n/sum(n) * 100) %>% # percentage within each location
ggplot(aes(x = location, y = n, fill = success, label = paste0(round(n, 2), "%"))) +
geom_col() +
geom_text(position = position_stack(vjust = 0.5))
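As for doing it in ggplot directly: one option, sketched below, is to give geom_text() stat = "count" so that it sees the same per-segment counts as geom_bar(), and to compute the within-bar proportion with after_stat() (available since ggplot2 3.3.0). Base R's ave() sums the counts for each x value, i.e. for each location. The data frame is rebuilt from the question's vectors.

```r
library(ggplot2)

data <- data.frame(
  location = rep(1:7, times = c(6, 2, 4, 8, 4, 6, 5)),
  success = c("a","b","c","c","d","d","a","b","b","b","c","d",
              "a","b","b","b","c","c","c","d","a","b","c","d",
              "a","b","c","c","d","d","a","b","b","c","d")
)

p <- ggplot(data, aes(x = location, fill = success)) +
  geom_bar() +
  # ave(count, x, FUN = sum) is the total height of the bar each segment belongs to
  geom_text(aes(label = after_stat(scales::percent(count / ave(count, x, FUN = sum),
                                                   accuracy = 1))),
            stat = "count", position = position_stack(vjust = 0.5))

p
```

Because fill is mapped globally, geom_text() is grouped by success just like geom_bar(), so both layers stack their segments in the same order.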
How about creating a summary frame with the relative frequencies within location and then using that with geom_col() and geom_text()?
library(dplyr)
library(ggplot2)
# Create summary stats
tots <-
data %>%
group_by(location, success) %>%
summarise(n = n()) %>% # result stays grouped by location
mutate(rel = round(100*n/sum(n))) # percentage within each location
# Plot: giving geom_text() the same grouping as the fill makes both layers stack in the same order
ggplot(data = tots, aes(x = location, y = n)) +
geom_col(aes(fill = success)) +
geom_text(aes(label = rel, group = success), position = position_stack(vjust = 0.5))

Make multiple smoothed lines more visible in relation to confidence interval fills using ggplot geom_smooth

I'm making a graph of the expression of multiple genes among multiple subjects, displaying the data points and smoothed conditional means with their confidence intervals, but the points and lines are obscured by the fill of the confidence intervals. Is there a way to bring the points and lines to the foreground, or to make the confidence-interval fill lighter, so that the points and lines are more visible?
data1
library(forcats)
library(ggplot2)
library(tidyr)
tbl_long <- data1 %>%
gather(gene, expression, -X)
tbl_long %>%
ggplot(aes(x = fct_inorder(X), y = expression, color = gene, group = gene)) +
geom_point() +
geom_smooth(aes(fill=gene)) +
theme_classic()
I'm a beginner R user, so any help would be much appreciated.
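Layers are drawn in the order they are added, so adding geom_point() after geom_smooth() draws the points on top of the ribbons, and a low alpha in geom_smooth() makes the confidence-interval fill more transparent. For example:
library(dplyr)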
library(forcats)
library(ggplot2)
library(readr)
library(tidyr)
"X,ALDOA,ALDOC,GPI,GAPDHS,LDHA,PGK1,PKLR
C1,-0.643185598,-0.645053078,-0.087097464,-0.343085671,-0.770712771,0.004189881,0.088937264
C2,-0.167424935,-0.414607255,0.049551335,-0.405339423,-0.182211808,-0.127414498,-0.313125427
C3,-0.81858642,-0.938110755,-1.141371324,-0.212165875,-0.582733509,-0.299505078,-0.417053296
C4,-0.83403929,-0.36359332,-0.731276681,-1.173581357,-0.42953985,-0.14434282,-0.861271021
C5,-0.689384044,-0.833311409,-0.622961915,-1.13983245,0.479864518,-0.353765462,-0.787467172
C6,-0.465153207,-0.740128773,-0.05430084,0.499455778,-0.692945684,-0.215067456,-0.460695935
S2,0.099525323,0.327565645,-0.315537278,0.065457821,0.78394394,0.189251447,0.11684847
S3,0.33216583,0.190001824,0.749459725,0.224739679,-0.138610536,-0.420150288,0.919318891
S4,0.522281547,0.278411886,1.715325626,0.534957031,1.130054777,-0.129296273,1.803756399
S5,0.691225088,0.665540011,1.661124529,0.662320212,0.267803229,0.853683613,1.105808889
S6,1.269616976,1.86390714,2.069219749,1.312324149,1.498836807,1.794147633,0.842335285
S7,1.254166133,1.819075004,0.44893804,0.438435159,0.482694339,0.446939822,0.802671992
S8,0.751743085,0.702057721,0.657752337,1.668582798,-0.186354601,1.214976683,0.287904556
S9,0.091028475,-0.214746307,0.037471169,-0.90747123,-0.172209571,0.062382102,0.136354703
S10,1.5792826,1.736452158,0.194961866,0.706323594,1.396245579,0.208168636,0.883114282
R2,-0.36289097,-0.252649755,0.026497148,-0.026676693,-0.720750516,-0.087657548,0.390400605
R3,0.106992251,0.290831853,-0.815393104,-0.020562949,-0.579128953,-0.222087138,0.603723294
R4,0.208230649,0.533552023,-0.116632671,1.126588341,-0.09646495,0.157577458,-0.402493353
R5,-0.10781116,0.436174594,-0.969979695,-1.298192703,0.541570124,-0.07591813,-0.704663307
R6,-0.282867322,-0.960902616,0.184185506,-1.215118472,0.856165556,-0.256458847,-1.528611038
R7,-0.300331377,-0.918484952,0.191947526,-0.895049036,1.200294702,0.7120941,-0.047383224
R8,0.278804568,-0.07335879,0.300083636,0.37631121,-0.288228181,0.427576413,0.631281194
R9,0.393632652,0.228379711,-0.201269856,1.731887958,0.141541807,0.242716283,0.154875397
R10,0.731821818,0.058779515,-0.310899832,0.578285435,-0.474621274,0.126920851,0.017104493" %>%
read_csv() -> tbl_wide
tbl_long <- tbl_wide %>%
gather(gene, expression, -X)
tbl_long %>%
ggplot(aes(x = fct_inorder(X), y = expression, color = gene, fill = gene, group = gene)) +
geom_smooth(method = "loess", alpha = 0.1) +
geom_point() +
labs(x = "Location",
y = "Expression",
color = "Gene",
fill = "Gene") +
theme_classic()
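A variation on the same idea splits the smoother into two layers, so that all the ribbons are drawn first, then all the fitted lines, then the points; that way no ribbon can cover another gene's line. The sketch below uses ggplot2's built-in mpg data instead of the gene table, and hides the line in the ribbon-only layer with colour = NA; both are choices made for this illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv, fill = drv)) +
  # 1) ribbons only: colour = NA hides the fitted line in this layer
  geom_smooth(method = "loess", formula = y ~ x, alpha = 0.1, colour = NA) +
  # 2) fitted lines only, drawn above every ribbon
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  # 3) points on top of everything
  geom_point() +
  theme_classic()

p
```

With the gene data, the same three layers would use x = fct_inorder(X), y = expression and colour/fill = gene.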
