Plot variable with column chart with ggplot with data from read.csv2 - r

My data contains the heights of males and females, grouped into 10 cm bins. I want to plot them together, side by side.
My graph looks somewhat like what I want, but the x-axis says factor(Male). It should be height in cm.
Also, I get three bars, but there should be two: one for male and one for female.
# Library
library(ggplot2)
library(tidyverse) # function "%>%"
# 1. Define data
data = read.csv2(text = "Height;Male;Female
160-170;5;2
170-180;5;5
180-190;6;5
190-200;2;2")
# 2. Print table
df <- as.data.frame(data)
df
# 3. Plot Variable with column chart
ggplot(df, aes(factor(Male),
fill = factor(Male))) +
geom_bar(position = position_dodge(preserve = "single")) +
theme_classic()

Use pivot_longer to bring the data into long format.
Then use geom_bar with fill mapped to the new Gender column:
library(tidyverse)
df1 <- df %>% pivot_longer(
cols = c(Male, Female),
names_to = "Gender",
values_to = "N"
)
# 3. Plot Variable with column chart
ggplot(df1, aes(x=Height, y=N)) +
geom_bar(aes(fill = Gender), position = "dodge", stat="identity") +
theme_classic()

One solution would be:
df %>%
pivot_longer(cols = 2:3, names_to = "gender", values_to = "count") %>%
ggplot(aes(x = Height, y = count, fill = gender)) +
geom_bar(stat = "identity", position = "dodge") +
theme_classic()
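To see why the reshaping step fixes both problems, it may help to look at the long-format table on its own. This is only a minimal sketch, assuming tidyr is installed; the name `long` is made up for illustration and is not part of the answers above.

```r
library(tidyr)

# Same data as in the question
df <- read.csv2(text = "Height;Male;Female
160-170;5;2
170-180;5;5
180-190;6;5
190-200;2;2")

# One row per Height/Gender pair; the counts move into a single column N
long <- pivot_longer(df, cols = c(Male, Female),
                     names_to = "Gender", values_to = "N")

dim(long)            # 8 rows, 3 columns
unique(long$Gender)  # "Male" "Female"
```

With the data in this shape, Height can go on the x-axis, N on the y-axis, and Gender on fill, which gives exactly two bars per height bin.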

Related

Sorting Y-axis of barplot based on the decreasing value of last facet grid in ggplot2

Question:
I am trying to sort the y-axis of the barplot by the decreasing value of the last facet group, "Step4", while keeping a common y-axis label. There are suggestions for ordering each facet group within itself, but how can I do this with a common y-axis and the values of only one facet group? I have attached sample data and the code for the initial plot to illustrate the question.
Thanks in advance.
Data:
Download the sample data here
Code:
library(ggplot2)
library(reshape2)
library(ggpubr) # provides theme_pubr()
#reading data
data <- read.csv(file = "./sample_data.csv", stringsAsFactors = TRUE)
#reshaping data into long format using reshape2::melt
data.melt <- melt(data)
#plotting the data in multi-panel barplot
ggplot(data.melt, aes(x= value, y=reorder(variable, value))) +
geom_col(aes(fill = Days), width = 0.7) +
facet_grid(.~step, scales = "free")+
theme_pubr() +
labs(x = "Number of Days", y = "X")
Graph: Barplot Graph for the sample data
Summarise the values for the last 'step' and extract the factor levels from that summary.
library(dplyr)
library(ggplot2)
library(ggpubr) # provides theme_pubr()
lvls <- data.melt %>%
arrange(step) %>%
filter(step == last(step)) %>%
#Or
#filter(step == 'Step4') %>%
group_by(variable) %>%
summarise(sum = sum(value)) %>%
arrange(sum) %>%
pull(variable)
data.melt$variable <- factor(data.melt$variable, lvls)
ggplot(data.melt, aes(x= value, y= variable)) +
geom_col(aes(fill = Days), width = 0.7) +
facet_grid(.~step, scales = "free")+
theme_pubr() +
labs(x = "Number of Days", y = "X")
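Since the sample file is not included here, the same idea can be tried on a small made-up data frame. This is only a sketch in base R under that assumption: `toy`, `last_step` and `totals` are hypothetical names, and the values are invented.

```r
# Hypothetical toy data standing in for sample_data.csv (not the asker's data)
toy <- data.frame(
  variable = c("a", "b", "c", "a", "b", "c"),
  step     = c("Step1", "Step1", "Step1", "Step4", "Step4", "Step4"),
  value    = c(9, 1, 5, 2, 8, 4)
)

# Sum the values per variable within the last step only
last_step <- toy[toy$step == "Step4", ]
totals <- tapply(last_step$value, last_step$variable, sum)

# Ascending order, so the largest Step4 bar ends up at the top of the y-axis
lvls <- names(sort(totals))
toy$variable <- factor(toy$variable, levels = lvls)

levels(toy$variable)  # "a" "c" "b"
```

Because ggplot2 draws the first factor level at the bottom of a discrete y-axis, sorting ascending is what makes the plot read as decreasing from top to bottom.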

Multi-row labels in ggplot2

I have a plot which contains multiple entries of the same items along the x-axis. I have a total of 45 items grouped according to the groups below.
pvalall$Group<-c(rep("Physical",5*162),rep("Perinatal",11*162),rep("Developmental",3*162),
rep("Lifestyle-Life Events",5*162),rep("Parental-Family",13*162),rep("School",3*162),
rep("Neighborhood",5*162))
pvalall$Group <- factor(pvalall$Group,
levels = c("Physical", "Perinatal", "Developmental",
"Lifestyle-Life Events", "Parental-Family",
"School","Neighborhood"))
So essentially there are 162*45=7290 points along the x-axis, and each set of 162 corresponds to one of the variables of interest. How do I get geom_point to plot only one label for each of these sets of 162, given a list of the variable names c("var1","var2",....,"var45")?
A reprex would be nice, but generally the solution is to create a separate dataframe with one row per group indicating where the labels should go, and to add a geom_text() layer to your plot that uses this dataframe.
My guess is that the code should look like this:
# create a dataframe for the labels
pvalall %>%
group_by(Group) %>%
summarize(Domains = mean(Domains),
`-log10(P-Values)` = mean(`-log10(P-Values)`)) -> label_df
# now make the plot
pvalall %>%
ggplot(aes(x = Domains, y = `-log10(P-Values)`)) +
geom_point(aes(col = Group)) + # putting col aesthetic in here so that the labels are not colored
geom_text(data =label_df, aes(label = Group))
Here is an example with mtcars:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarize(mpg = mean(mpg),
disp = mean(disp)) %>%
mutate(cyl_label = str_c(cyl, "\ncylinders")) -> label_df
mtcars %>%
ggplot(aes(x = mpg, y = disp)) +
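geom_point(aes(col = factor(cyl)), show.legend = FALSE) +
geom_text(data = label_df, aes(label = cyl_label))
This produces a scatter plot with one text label per cylinder group, placed at that group's mean mpg and disp.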

Reorder vertical axis alphabetically and change position of binary variable of stacked percent bar graph (ggplot2)

I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).
I would like to use ggplot2 to do the following:
1. Create a stacked percentage bar graph with the various IDs on the vertical axis (arranged alphabetically, with A on top) and the percent on the horizontal axis. I can't seem to find code that will automatically sort the IDs alphabetically; my original dataset has quite a number of categories, so arranging them manually would be difficult.
2. Show the infected category (1) in red and to the left of the blue non-infected category (0). Is it also possible to change the legend label from "Non_infected" to "Non-infected"?
3. Have the ID displayed in the plot include the number of times that ID appears in the dataset, e.g. "A (n=6)", "B (n=3)".
My sample code is as follows:
ID <- c("A","A","A","A","A","A",
"B","B","B",
"C","C","C","C","C","C","C",
"D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = T)
df <- data.frame(ID, Infection)
library(ggplot2)
library(dplyr)
library(reshape2)
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected)
df.plot %>%
melt() %>%
ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
coord_flip()
Right now I managed to get this output:
I hope to get this:
Thank you so much!
First, we need to add a count column to your summarised data frame.
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected,
count = n())
Then, we augment our ID column, turn the Infection Status into a factor variable, use forcats::fct_rev to reverse the ID ordering, and use scale_fill_manual to control your legend.
df.plot %>%
mutate(ID = paste0(ID, " (n=", count, ")")) %>%
select(-count) %>%
melt() %>%
mutate(variable = factor(variable, levels = c("Non_Infected", "Infected"))) %>%
ggplot(aes(x = forcats::fct_rev(ID), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_manual("Infection Status",
values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
labels = c("Non-Infected", "Infected"))+
coord_flip()
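If it is unclear what the label and ordering steps do, they can be checked in isolation. The following is a rough base-R sketch on a shortened ID vector; `n_per_id`, `labels` and `f` are names invented here, and reversing the levels by hand is only meant to mimic what forcats::fct_rev does in the answer.

```r
ID <- c("A", "A", "A", "B", "B", "C")

# Count how often each ID occurs, then paste the count into the label
n_per_id <- table(ID)
labels <- paste0(names(n_per_id), " (n=", as.vector(n_per_id), ")")
labels  # "A (n=3)" "B (n=2)" "C (n=1)"

# Reversed levels: after coord_flip() the first level is drawn at the bottom,
# so putting "A" last places it at the top of the vertical axis
f <- factor(labels, levels = rev(labels))
levels(f)  # "C (n=1)" "B (n=2)" "A (n=3)"
```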

How to change the order of fill aesthetic in faceted ggplot?

I have a faceted ggplot that is all but done. I cannot seem to get the fill aesthetic to be in descending order within each group in the dodged plot and across facets. The idea is to look at the plot and quickly recognise the top three categories within each group on the y-axis, which means the colors will be ordered differently for each group. Here is some code to get a representative graph.
library(tidyverse)
set.seed(123)
#using crossing from tidyr
df <- crossing(
mean = 1:8,
cats = sample(letters[1:3], 8, T),
gender = c('Male', 'Female')) %>%
mutate(vary_x = sample(seq(1,3,.1),nrow(.), T))
df %>%
ggplot(aes(mean, vary_x, fill = cats))+
geom_bar(stat = 'identity',
position = 'dodge') +
facet_grid(.~gender) +
coord_flip()
Something like this maybe:
df %>%
ggplot(aes(mean, reorder(vary_x,mean), fill = cats))+
geom_bar(stat = 'identity',
position = 'dodge') +
facet_grid(.~gender) +
coord_flip()

Order x axis in stacked bar by subset of fill

There are multiple questions (here for instance) on how to arrange the x axis by frequency in a bar chart with ggplot2. However, my aim is to arrange the categories on the X-axis in a stacked bar chart by the relative frequency of a subset of the fill. For instance, I would like to sort the x-axis by the percentage of category B in variable z.
This was my first try using only ggplot2
library(ggplot2)
library(tibble)
library(scales)
factor1 <- as.factor(c("ABC", "CDA", "XYZ", "YRO"))
factor2 <- as.factor(c("A", "B"))
set.seed(43)
data <- tibble(x = sample(factor1, 1000, replace = TRUE),
z = sample(factor2, 1000, replace = TRUE))
ggplot(data = data, aes(x = x, fill = z, order = z)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent)
When that didn't work, I created a summarised data frame using dplyr, spread the data, sorted it by B, and gathered it again. But plotting that didn't work either.
library(dplyr)
library(tidyr)
data %>%
group_by(x, z) %>%
count() %>%
spread(z, n) %>%
arrange(-B) %>%
gather(z, n, -x) %>%
ggplot(aes(x = reorder(x, n), y = n, fill = z)) +
geom_bar(stat = "identity", position = "fill") +
scale_y_continuous(labels = percent)
I would prefer a solution with ggplot only, so as not to depend on the row order of the data frame created by dplyr/tidyr. However, I'm open to anything.
If you want to sort by absolute frequency:
lvls <- names(sort(table(data[data$z == "B", "x"])))
If you want to sort by relative frequency:
lvls <- names(sort(tapply(data$z == "B", data$x, mean)))
Then you can create the factor on the fly inside ggplot:
ggplot(data = data, aes(factor(x, levels = lvls), fill = z)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent)
A solution using tidyverse would be:
data %>%
mutate(x = forcats::fct_reorder(x, as.numeric(z), .fun = mean)) %>%
ggplot(aes(x, fill = z)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent)
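To check the relative-frequency ordering without plotting, the level computation can be run on a tiny example. This is a sketch with made-up data; `toy` and `share_b` are hypothetical names used only here.

```r
# Hypothetical toy data: x is the category, z the fill variable
toy <- data.frame(
  x = c("P", "P", "P", "P", "Q", "Q", "R", "R", "R", "R"),
  z = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B")
)

# Share of "B" within each x: the mean of a logical vector is a proportion
share_b <- tapply(toy$z == "B", toy$x, mean)

# Levels sorted from the smallest to the largest share of B
lvls <- names(sort(share_b))
lvls  # "P" "R" "Q"
```

Passing `lvls` to factor(x, levels = lvls) then fixes the x-axis order regardless of how the rows of the data frame happen to be sorted.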
