How to plot a multi-level variable in multiple graphs - r

I would like to make plots with three colors per graph, so my sample data would need two graphs. Also, is it possible to plot only the top 3 colors with the most observations? What should I do? Currently I have all 6 in one graph. This is just sample data; my real data has about 50 levels, and my code can't produce anything readable. It is too crowded.
The code is:
ID<- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18")
Group<-c("A","B","C","D","D","D","A","B","D","C","B","D","A","A","C","B","B","B")
Color<-c("Green","Blue","Red","Red","Black","Yellow","Green","Green","Yellow","Purple","Red","Yellow","Yellow","Yellow","Green","Red","Red","Green")
Realy_Love<-c("Y","N","Y","Y","N","N","Y","Y","Y","N","N","Y","N","Y","N","Y","N","Y")
Sample.data <- data.frame(ID, Group, Color, Realy_Love)
# load the packages before the first use of %>% and count()
library(dplyr)
library(ggplot2)
Sample<-Sample.data %>%
count(Group, Color, sort = TRUE)
Sample<-Sample.data %>%
count(Group, Color, Realy_Love, sort = TRUE)
Sample.data %>%
count(Group, Color, sort = TRUE) %>%
ggplot(aes(x = Group, y = n, fill = Color)) +
geom_col() +
facet_wrap(~ Color)
Thanks.

For your first question (two graphs with 3 colors each), you need to create a variable that assigns each color to the facet it should appear in. We can use the case_when() function for this:
Sample.data %>%
count(Group, Color, sort = TRUE) %>%
mutate(Facet = case_when(Color %in% c("Black", "Blue", "Green") ~ "Group 1",
Color %in% c("Purple", "Red", "Yellow") ~ "Group 2")) %>%
ggplot(aes(x = Group, y = n, fill = Color)) +
geom_col() +
facet_wrap(~ Facet)
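With about 50 levels, typing every color into case_when() gets tedious. As a rough sketch (not part of the original answer), the facet variable could instead be derived from each color's rank by frequency. The names colors_per_facet and facet_lookup are made up for illustration, and the zero-padded labels are only there so that "Group 10" sorts after "Group 02".

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the question (only the columns used here)
Group <- c("A","B","C","D","D","D","A","B","D","C","B","D","A","A","C","B","B","B")
Color <- c("Green","Blue","Red","Red","Black","Yellow","Green","Green","Yellow",
           "Purple","Red","Yellow","Yellow","Yellow","Green","Red","Red","Green")
Sample.data <- data.frame(Group, Color)

colors_per_facet <- 3  # hypothetical setting: how many colors share one panel

# Rank colors by total count, then cut the ranking into chunks of 3
facet_lookup <- Sample.data %>%
  count(Color, sort = TRUE, name = "total") %>%
  mutate(Facet = sprintf("Group %02d", ceiling(row_number() / colors_per_facet))) %>%
  select(Color, Facet)

# Attach the facet label to the per-Group counts
plot_data <- Sample.data %>%
  count(Group, Color) %>%
  left_join(facet_lookup, by = "Color")

p <- ggplot(plot_data, aes(x = Group, y = n, fill = Color)) +
  geom_col() +
  facet_wrap(~ Facet)
print(p)
```

Because the chunks follow the frequency ranking, the first panel holds the most common colors. Ties between equally frequent colors are broken by the order in which count() returns them.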
For your second request, plotting only the 3 colors with the most observations, we can use the fct_lump() function from the forcats package:
library(forcats) # provides fct_lump()
Sample.data %>%
mutate(Color = fct_lump(f = Color,
n = 3,
other_level = "Other colors")) %>%
filter(Color != "Other colors") %>%
count(Group, Color, sort = TRUE) %>%
ggplot(aes(x = Group, y = n, fill = Color)) +
geom_col() +
facet_wrap(~ Color)
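If you would rather not depend on forcats, a similar result can probably be had with dplyr alone by counting first and filtering on the most frequent colors. This is only a sketch under that assumption; top_colors is a made-up name, and head(3) cuts ties arbitrarily, so it may not match fct_lump() when several colors share the third-highest count.

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the question (only the columns used here)
Group <- c("A","B","C","D","D","D","A","B","D","C","B","D","A","A","C","B","B","B")
Color <- c("Green","Blue","Red","Red","Black","Yellow","Green","Green","Yellow",
           "Purple","Red","Yellow","Yellow","Yellow","Green","Red","Red","Green")
Sample.data <- data.frame(Group, Color)

# The three colors with the most rows overall
top_colors <- Sample.data %>%
  count(Color, sort = TRUE) %>%
  head(3) %>%
  pull(Color)

# Keep only those colors, then count and plot as before
p <- Sample.data %>%
  filter(Color %in% top_colors) %>%
  count(Group, Color) %>%
  ggplot(aes(x = Group, y = n, fill = Color)) +
  geom_col() +
  facet_wrap(~ Color)
print(p)
```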

Related

Multi-row labels in ggplot2

I have a plot which contains multiple entries of the same items along the x-axis. I have a total of 45 items grouped according to the groups below.
pvalall$Group<-c(rep("Physical",5*162),rep("Perinatal",11*162),rep("Developmental",3*162),
rep("Lifestyle-Life Events",5*162),rep("Parental-Family",13*162),rep("School",3*162),
rep("Neighborhood",5*162))
pvalall$Group <- factor(pvalall$Group,
levels = c("Physical", "Perinatal", "Developmental",
"Lifestyle-Life Events", "Parental-Family",
"School","Neighborhood"))
So essentially there are 162*45=7290 points along the x-axis, and each set of 162 corresponds to one of the variables of interest. How do I get the plot to show only one label for each set of 162 points, given a list of the variable names c("var1","var2",....,"var45")?
A reprex would be nice, but generally the solution is to create a separate dataframe with one row per group indicating where the labels should go, and to add a geom_text() layer to your plot that uses this dataframe.
My guess is that the code should look like this:
# create a dataframe for the labels
pvalall %>%
group_by(Group) %>%
summarize(Domains = mean(Domains),
`-log10(P-Values)` = mean(`-log10(P-Values)`)) -> label_df
# now make the plot
pvalall %>%
ggplot(aes(x = Domains, y = `-log10(P-Values)`)) +
geom_point(aes(col = Group)) + # putting col aesthetic in here so that the labels are not colored
geom_text(data =label_df, aes(label = Group))
Here is an example with mtcars:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarize(mpg = mean(mpg),
disp = mean(disp)) %>%
mutate(cyl_label = str_c(cyl, "\ncylinders")) -> label_df
mtcars %>%
ggplot(aes(x = mpg, y = disp)) +
geom_point(aes(col = factor(cyl)), show.legend = F) +
geom_text(data = label_df, aes(label = cyl_label))
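This produces a scatter plot with one text label per cylinder group, placed at that group's mean mpg and disp.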

Reorder vertical axis alphabetically and change position of binary variable of stacked percent bar graph (ggplot2)

I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).
I would like to use ggplot2 to
Create a stacked percentage bar graph with the IDs on the vertical axis (arranged alphabetically, with A at the top) and the percentage on the horizontal axis. I can't find code that sorts the IDs alphabetically automatically; my original dataset has many categories, so arranging them manually would be difficult.
I would also like the infected category (1) to be red and to the left of the blue non-infected category (0). Is it also possible to change the legend label from "Non_infected" to "Non-infected"?
I would like the IDs displayed in the plot to include the number of times each ID appears in the dataset, e.g. "A (n=6)", "B (n=3)".
My sample code is as follows:
ID <- c("A","A","A","A","A","A",
"B","B","B",
"C","C","C","C","C","C","C",
"D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = T)
df <- data.frame(ID, Infection)
library(ggplot2)
library(dplyr)
library(reshape2)
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected)
df.plot %>%
melt() %>%
ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
coord_flip()
Right now I managed to get this output:
I hope to get this:
Thank you so much!
First, we need to add a count to your original data.frame.
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected,
count = n())
Then, we augment our ID column, turn the Infection Status into a factor variable, use forcats::fct_rev to reverse the ID ordering, and use scale_fill_manual to control your legend.
df.plot %>%
mutate(ID = paste0(ID, " (n=", count, ")")) %>%
select(-count) %>%
melt() %>%
mutate(variable = factor(variable, levels = c("Non_Infected", "Infected"))) %>%
ggplot(aes(x = forcats::fct_rev(ID), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_manual("Infection Status",
values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
labels = c("Non-Infected", "Infected"))+
coord_flip()
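The same steps can also be sketched without reshape2 and forcats, using tidyr::pivot_longer() for the reshape and a plain factor() call for the reversed ID order. This is an alternative under those assumptions, not the answer's own code; df_long is a made-up name and set.seed() is added only to make the random sample reproducible.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
ID <- rep(c("A", "B", "C", "D"), times = c(6, 3, 7, 9))
df <- data.frame(ID, Infection = sample(c(1, 0), size = length(ID), replace = TRUE))

df_long <- df %>%
  group_by(ID) %>%
  summarize(Infected = mean(Infection),
            Non_Infected = 1 - Infected,
            count = n()) %>%
  mutate(ID = paste0(ID, " (n=", count, ")")) %>%
  select(-count) %>%
  pivot_longer(c(Infected, Non_Infected),
               names_to = "variable", values_to = "value") %>%
  mutate(
    # Non_Infected first, so Infected is stacked next to the axis origin
    variable = factor(variable, levels = c("Non_Infected", "Infected")),
    # reversed alphabetical levels put "A" at the top after coord_flip()
    ID = factor(ID, levels = rev(sort(unique(ID))))
  )

p <- ggplot(df_long, aes(x = ID, y = value, fill = variable)) +
  geom_col() +
  labs(x = "ID", y = "Percent Infection") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual("Infection Status",
                    values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
                    breaks = c("Non_Infected", "Infected"),
                    labels = c("Non-infected", "Infected")) +
  coord_flip()
print(p)
```

Passing breaks together with labels pairs each legend label with its level explicitly, rather than relying on the two vectors happening to be in the same order.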

ggplot prioritize line overlap

library(tidyverse)
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line() +
theme_classic()
In the image below the red 'violation' line is broken up into segments. I assume this is because ggplot plots lines sequentially, and one of the grey lines is plotted after the red line with the same coordinates. How do I stop the grey lines from overlapping the red?
As referenced in other Stack Overflow questions, I would add a separate line like this:
geom_line(df %>% filter(violation == TRUE), aes(color = "red")) +
but this causes problems when there are no violations in my data frames. I do monthly analyses and some months contain violations, some months do not. If I add this single line above I get an error "must be length greater than 0" for the months absent violations, so this one-liner approach probably won't work.
You might use the following code (with only 2 minor changes as compared to your code)
library(tidyverse)
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line(alpha = .5, size= 1.2) + ### changes in transparency and thickness ###
theme_classic()
Yielding this plot:
"H 1"'s suggestion is an alternative approach which changes the sequence of line drawings:
mtcars %>%
mutate(ID = row_number()) %>%
select(ID, vs, am, gear, carb) %>%
gather(key, value, 2:5) %>%
mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
ggplot(aes(ID, value, group = key, color = violation)) +
scale_color_manual(values = c("grey", "red")) +
geom_line(aes(group = rev(key))) + ### changes sequence of plotting of the lines ###
theme_classic()
This produces the following plot:
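Reversing the keys works here because the flagged series happens to sort last after the reversal. A sketch of a variant that does not depend on the key names: build the grouping factor with reorder(), so that its levels are sorted by the share of violations and any flagged series is drawn last. This relies on the assumption that geom_line() draws groups in factor-level order; draw_order and lines_df are made-up names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

lines_df <- mtcars %>%
  mutate(ID = row_number()) %>%
  select(ID, vs, am, gear, carb) %>%
  gather(key, value, 2:5) %>%
  mutate(violation = c(rep(FALSE, 96), rep(TRUE, 32))) %>%
  # levels of draw_order are sorted by mean(violation): flagged keys come last
  mutate(draw_order = reorder(key, violation))

p <- ggplot(lines_df, aes(ID, value, group = draw_order, color = violation)) +
  # named values keep FALSE grey even in months without any TRUE rows
  scale_color_manual(values = c("FALSE" = "grey", "TRUE" = "red")) +
  geom_line() +
  theme_classic()
print(p)
```

Since nothing is filtered, this should also run for months without violations: every key then has a mean of 0 and all lines are simply drawn in grey.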

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequencies of the data so that it counts across the row (not down a column) for each index (s1,s2,s3,s4) and then for each group (g1,g2) of each taxa. I'm only able to figure out how to graph the species of one taxa but not all three stacked on each other.
Here are some examples of what I'm trying to make:
These were made in Google Sheets, so they don't look like ggplot output, but it would be easier to make them in R with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
##You need to reshape data before doing that.
library(reshape2) # provides melt()
library(ggplot2)
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) +
theme_gray() + # complete theme goes first, otherwise it resets the axis-text tweak below
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))
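For reference, the reshape can also be sketched with tidyr::pivot_longer(), which supersedes gather() in current tidyr. Unlike the first answer, this version sums the counts within each group before taking proportions; the stacked bars should look the same. The names long, by_index and by_group are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = TRUE, sep = ",")

# One row per index/taxa combination
long <- dataf %>%
  pivot_longer(taxa1:taxa3, names_to = "taxa", values_to = "value")

# Proportions across each row (per index)
by_index <- long %>%
  group_by(index) %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()

# Proportions per group: sum the counts first, then divide by the group total
by_group <- long %>%
  group_by(group, taxa) %>%
  summarize(value = sum(value), .groups = "drop_last") %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()

p_index <- ggplot(by_index, aes(x = index, y = prop, fill = taxa)) + geom_col()
p_group <- ggplot(by_group, aes(x = group, y = prop, fill = taxa)) + geom_col()
print(p_index)
print(p_group)
```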

How to get legend labels that differ from fill aesthetic in R ggplot2

In ggplot2, how do I get the legend labels for the fill aesthetic to differ from the variable actually used as the fill aesthetic? What I'd like to see from the following plot is the legend labels reflecting the name variable. I know I could just use the name itself as the fill aesthetic; however, in the following example, it's more convenient to set up the colour vector storm_cols (used for ggplot2::scale_fill_manual) using the id column as the vector names rather than typing out each name.
library(dplyr)
library(ggplot2)
dat <-
storms %>%
filter(year >= 2015) %>%
group_by(name, year) %>%
summarize(avg_wind = mean(wind)) %>%
ungroup() %>%
mutate(id = as.character(row_number())) %>%
slice(1:4)
storm_cols <- c("1" = "red", "2" = "blue", "3" = "green", "4" = "yellow")
dat %>%
ggplot(aes(id, avg_wind, fill = id)) +
geom_col() +
scale_fill_manual(values = storm_cols)
You don't need to explicitly type out the names for the color vector. Instead, you can create it programmatically, making it easier to create the desired color assignments and use name directly as the fill aesthetic. For example, in this case you can use the set_names function from the purrr package (or the base R setNames function).
library(tidyverse)
dat %>%
ggplot(aes(id, avg_wind, fill = name)) +
geom_col() +
scale_fill_manual(values = c("red","blue","green","yellow") %>% set_names(dat$name))
With your original example, you could change the legend labels with the labels argument to scale_fill_manual:
dat %>%
ggplot(aes(id, avg_wind, fill = id)) +
geom_col() +
scale_fill_manual(values = storm_cols,
labels = dat$name)
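One caveat with passing labels alone is that it is matched to the fill levels by position, so it depends on the rows of dat being in the same order as the sorted id values (with character ids, "10" sorts before "2"). A minimal sketch that pairs them explicitly through breaks, built on a small stand-in data frame; the storm names and wind values below are invented for illustration and are not taken from the storms dataset.

```r
library(ggplot2)

# Stand-in for the summarised storms data (values are made up)
dat <- data.frame(
  id = c("1", "2", "3", "4"),
  name = c("Ana", "Bill", "Claudette", "Danny"),
  avg_wind = c(40, 45, 35, 50)
)

# Build the named colour vector programmatically with base R
storm_cols <- setNames(c("red", "blue", "green", "yellow"), dat$id)

p <- ggplot(dat, aes(id, avg_wind, fill = id)) +
  geom_col() +
  # breaks and labels are paired element by element: id "1" is shown as "Ana", etc.
  scale_fill_manual(values = storm_cols,
                    breaks = dat$id,
                    labels = dat$name)
print(p)
```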
