Change the boxplot background based on x-variable (ggplot2) - r

I want to change the background of a boxplot based on the x-variable. My code is very simple:
ggplot(data = df, aes(x = variable, y = value)) +
  geom_boxplot()
So I have 17 x-variables, which give 17 boxplots in the same picture. I want to make the background grey behind boxplots 1 to 4 and 11 to 14. I don't know how to do that.
Thanks.

You need to create a factor that groups the boxplots. In the example I created a new column (tales) in df and mapped it to the fill colour.
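library(tidyverse)
df <- data.frame(variable = rep(base::LETTERS[1:17], 5),
                 value = runif(17*5, 0, 100))
# Label boxplots 1-4 and 11-14 as "x" and all the others as "y"
df <- df %>%
  dplyr::mutate(tales = rep(c(rep("x", 4), rep("y", 6), rep("x", 4), rep("y", 3)), 5))
ggplot(data = df, aes(x = variable, y = value)) +
  geom_boxplot(aes(fill = tales)) +
  # Grey for group "x", white for group "y"
  scale_fill_manual(values = c(x = "grey70", y = "white"))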
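The answer above colours the boxes themselves (fill) rather than the panel background. If you want the background behind positions 1-4 and 11-14 to be grey, one option is to draw rectangles underneath the boxplots. This is a minimal sketch, assuming ggplot2 is installed and that x is a discrete variable whose k-th level is drawn at x = k; the bands data frame and the grey85 colour are illustrative choices, not part of the question.

```r
library(ggplot2)

set.seed(1)
# Example data in the shape of the question: 17 categories (A..Q), 5 values each
df <- data.frame(variable = rep(LETTERS[1:17], 5),
                 value = runif(17 * 5, 0, 100))

# On a discrete axis the k-th category is drawn at x = k, so the band over
# categories 1-4 spans 0.5 to 4.5 and the band over 11-14 spans 10.5 to 14.5
bands <- data.frame(xmin = c(0.5, 10.5), xmax = c(4.5, 14.5))

p <- ggplot(df, aes(x = variable, y = value)) +
  # Declare the discrete x scale explicitly so the numeric band limits
  # in the first layer do not make ggplot2 pick a continuous x scale
  scale_x_discrete() +
  # Layers are drawn in order, so the rectangles end up behind the boxes
  geom_rect(data = bands,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "grey85") +
  geom_boxplot()

p
```

Using -Inf and Inf for ymin and ymax makes the bands fill the full panel height whatever the y range of the data is.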

Related

How to create a box plot + line plot in a single plot using ggplot2

I want to create a box plot + line plot in a single plot using ggplot2
This is my code so far:
library(ggplot2)
dat <- data.frame(day = c(0,0,0,0,0,0,10,10,10,10,10,10,14,14,14,14,14,14,21,21,21,21,21,21,28,28,28,28,28,28,35,35,35,35,35,35,42,42,42,42,42,42), group = c('Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP'), score = c(37.5,43,7,63,26,15,17,16,43,26,53,26,26,26,43,10,6,15,18,9,10,4,8,18,60,26,20,12.5,9,43,43,43,11,10,7,60,43,43,32,10.5,8,57.5))
g1 = ggplot(data = dat, aes(x = factor(day), y = score)) +
geom_boxplot(aes(fill = group))
g1
For the box plot, I want the scores of the different treatments (groups) to be shown separately, so I use x = factor(day).
But for the line plot, I want each day's score to be the average of the two treatments (groups) on that day.
This is what my plot looks like now: [image]
This is how I want my plot to look: [image]
How can I do this? Thank you so much!
#Libraries
library(tidyverse)
#Data
dat <- data.frame(day = c(0,0,0,0,0,0,10,10,10,10,10,10,14,14,14,14,14,14,21,21,21,21,21,21,28,28,28,28,28,28,35,35,35,35,35,35,42,42,42,42,42,42), group = c('Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP'), score = c(37.5,43,7,63,26,15,17,16,43,26,53,26,26,26,43,10,6,15,18,9,10,4,8,18,60,26,20,12.5,9,43,43,43,11,10,7,60,43,43,32,10.5,8,57.5))
#How to
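dat %>%
  ggplot(aes(x = factor(day), y = score)) +
  geom_boxplot(aes(fill = group)) +
  geom_line(
    # One row per day: the average score across both groups
    data = dat %>%
      group_by(day) %>%
      summarise(score = mean(score, na.rm = TRUE)),
    # group = 1 joins the points across the discrete x axis
    aes(group = 1),
    linewidth = 1,  # ggplot2 >= 3.4.0; use size = 1 on older versions
    col = "red"
  )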
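The per-day average can also be computed by ggplot2 itself instead of building a separate summarised data frame. A minimal sketch, assuming ggplot2 >= 3.4.0 (for the fun = and linewidth arguments) and taking "average" to mean the arithmetic mean; the data are the question's, written more compactly with rep().

```r
library(ggplot2)

# The question's data: 7 days x 6 observations, group pattern repeated each day
dat <- data.frame(
  day = rep(c(0, 10, 14, 21, 28, 35, 42), each = 6),
  group = rep(c("Saline", "RP", "Saline", "Saline", "RP", "RP"), times = 7),
  score = c(37.5, 43, 7, 63, 26, 15, 17, 16, 43, 26, 53, 26, 26, 26, 43, 10,
            6, 15, 18, 9, 10, 4, 8, 18, 60, 26, 20, 12.5, 9, 43, 43, 43, 11,
            10, 7, 60, 43, 43, 32, 10.5, 8, 57.5)
)

p <- ggplot(dat, aes(x = factor(day), y = score)) +
  geom_boxplot(aes(fill = group)) +
  # stat_summary() computes the mean score at each x across both groups;
  # group = 1 makes geom_line connect those means across the discrete axis
  stat_summary(aes(group = 1), fun = mean, geom = "line",
               colour = "red", linewidth = 1)

p
```

Because fill = group is set only on the boxplot layer, the summary layer is not split by treatment, so each point on the red line averages all six scores of that day.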

How to ggplot the CDF of multiple variables in R?

I have the DF data frame. I want to plot the cumulative distribution function (CDF) of the variables in DF using ggplot. The following code produces the plot, but because the variables span very different ranges I can't see the plot well. I don't want to use multiple facets; I would like to have all of the variables plotted on a single panel.
library(ggplot2)
set.seed(123)
# melt() comes from the reshape2 package
DF <- reshape2::melt(data.frame(p1 = runif(200,1,10), p2 = runif(200,-2,1), p3 = runif(200,0,0.05),p4 = runif(200,100,4000)))
ggplot(DF, aes(x = value, col = variable)) +
  stat_ecdf(lwd = 1.2)
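We can use facet_wrap to give each variable its own panel; scales = "free_x" lets each panel use its own x range:
ggplot(DF, aes(x = value, col = variable)) +
  stat_ecdf(lwd = 1.2) +
  facet_wrap(~ variable, scales = "free_x")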
If you don't want to use facets, you could use a log scale:
library(ggplot2)
set.seed(123)
DF <- reshape::melt(data.frame(p1 = runif(200,1,10), p2 = runif(200,-2,1), p3 = runif(200,0,0.05),p4 = runif(200,100,4000)))
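# log10 is undefined for values <= 0, so the negative values of p2
# (drawn from -2 to 1) are dropped with a warning
ggplot(DF, aes(x = value, col = variable)) +
  stat_ecdf(lwd = 1.2) +
  scale_x_log10()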
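Another way to keep all four variables on a single panel, without dropping the negative values of p2 as a log scale does, is to rescale each variable to [0, 1] before computing its ECDF. This is a swapped-in technique (per-variable min-max rescaling), not one of the answers above, and the trade-off is that the x axis then shows the position within each variable's own range rather than the original units. The sketch assumes ggplot2 >= 3.4.0 (for linewidth) and uses base R's stack() in place of melt() so no extra package is needed.

```r
library(ggplot2)

set.seed(123)
wide <- data.frame(p1 = runif(200, 1, 10), p2 = runif(200, -2, 1),
                   p3 = runif(200, 0, 0.05), p4 = runif(200, 100, 4000))
# Base-R wide-to-long reshape: stack() returns columns 'values' and 'ind'
DF <- stack(wide)
names(DF) <- c("value", "variable")

# Min-max rescale each variable separately so every ECDF runs over [0, 1]
DF$scaled <- ave(DF$value, DF$variable,
                 FUN = function(v) (v - min(v)) / (max(v) - min(v)))

p <- ggplot(DF, aes(x = scaled, colour = variable)) +
  stat_ecdf(linewidth = 1.2) +
  labs(x = "value rescaled to [0, 1] within each variable")

p
```

Since min-max rescaling is a monotone transformation, the shape of each ECDF is unchanged; only the x axis is stretched so the four curves become comparable.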

geom_bar overlapping labels

For simplicity, let's suppose we have a data frame like this:
  A
1 1
2 2
3 2
4 2
5 3
We have a categorical variable "A" with 3 possible values (1, 2, 3), and I'm trying this code:
ggplot(df, aes(x = "", y = A, fill = A)) +
  geom_bar(width = 1, stat = "identity")
The problem is that the labels are overlapping. I also want to change the labels for 1, 2, 3 to x, y, z.
Here is a picture of what is happening: [image]
And here is a link to the actual data that I'm using:
https://a.uguu.se/anKhhyEv5b7W_Data.csv
Your graph does not correspond to the sample of data you are showing, so it is hard to be sure that the structure of your real data is actually the same.
Using a random example, I get the following plot:
df <- data.frame(A = sample(1:3,20, replace = TRUE))
library(ggplot2)
ggplot(df, aes(x="A", y=A, fill=as.factor(A)))+
geom_bar(width = 1, stat = "identity") +
scale_fill_discrete(labels = c("x","y","z"))
EDIT: Using data provided by the OP
Using your data, you should get the following plot:
ggplot(df, aes(x = "A",y = A, fill = as.factor(A)))+
geom_col()
Or, if you want the count of each individual value of A, you can do:
library(dplyr)
library(ggplot2)
df %>% group_by(A) %>% count() %>%
ggplot(aes(x = "A", y = n, fill = as.factor(A)))+
geom_col()
Is this what you are looking for?
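If the overlapping labels are the per-category labels on a single stacked bar, one option is to let geom_bar() count the rows itself, recode A as a factor whose levels display as x, y, z, and put one count label at the centre of each segment with position_stack(vjust = 0.5). A sketch assuming ggplot2 >= 3.3.0 (for after_stat()) and random stand-in data, as in the answer above, since the real file's structure is not confirmed.

```r
library(ggplot2)

set.seed(42)
# Stand-in data: a categorical A with values 1-3
df <- data.frame(A = sample(1:3, 20, replace = TRUE))
# Recode A as a factor whose levels 1, 2, 3 are displayed as x, y, z
df$A <- factor(df$A, levels = 1:3, labels = c("x", "y", "z"))

p <- ggplot(df, aes(x = "A", fill = A)) +
  # geom_bar() with its default stat counts the rows in each level of A
  geom_bar(width = 1) +
  # One label per segment, centred inside it so the labels do not overlap
  geom_text(aes(label = after_stat(count)), stat = "count",
            position = position_stack(vjust = 0.5))

p
```

Recoding the factor labels, rather than relabelling only the legend with scale_fill_discrete(labels = ...), keeps the new names attached to the data if you reuse it in other plots.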

How to plot a multi-facet histogram with ggplot in R?

I have a data frame structured like this:
Elem Category SEZa SEZb SEZc
A    ONE      1    3    4
B    TWO      4    5    6
I want to plot three histograms in three different facets (SEZa, SEZb, SEZc) with ggplot, where the x values are the category values (ONE and TWO) and the y values are the numbers in columns SEZa, SEZb and SEZc.
Something like this: [image]
How can I do this? Thank you for your suggestions!
Assuming df is your data.frame, I would first convert it from wide to long format:
new_df <- reshape2::melt(df, id.vars = c("Elem", "Category"))
Then make the plot with geom_col() instead of geom_histogram(): it seems you've already computed the y-values, so you don't need ggplot to calculate them for you.
ggplot(new_df, aes(x = Category, y = value, fill = Elem)) +
geom_col() +
facet_grid(variable ~ .)
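The answer above assumes df already exists. Here is a self-contained sketch that rebuilds the question's table (dropping the trailing dots, which look like extraction residue) and does the wide-to-long step with base R's reshape() instead of reshape2::melt(). That substitution is only to avoid an extra package; the plotting part is the same as in the answer.

```r
library(ggplot2)

# The question's table
df <- data.frame(Elem = c("A", "B"),
                 Category = c("ONE", "TWO"),
                 SEZa = c(1, 4), SEZb = c(3, 5), SEZc = c(4, 6))

# Base-R wide-to-long reshape: one row per Elem and SEZ column,
# with the column name stored in 'variable' and its number in 'value'
long <- reshape(df, direction = "long",
                varying = c("SEZa", "SEZb", "SEZc"),
                v.names = "value", timevar = "variable",
                times = c("SEZa", "SEZb", "SEZc"), idvar = "Elem")

p <- ggplot(long, aes(x = Category, y = value, fill = Elem)) +
  geom_col() +
  facet_grid(variable ~ .)

p
```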
I think what you are looking for is something like this:
library(ggplot2)
library(reshape2)
df <- data.frame(Category = c("One", "Two"),
SEZa = c(1, 4),
SEZb = c(3, 5),
SEZc = c(4, 6))
df <- melt(df)
ggplot(df, aes(x = Category, y = value)) +
geom_col(aes(fill = variable)) +
facet_grid(variable ~ .)
My inspiration is:
http://felixfan.github.io/stacking-plots-same-x/

ggplot faceted cumulative histogram

I have the following data
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(100, 6, 1))
gender = rep(c("Male", "Female"), each=100)
mydata = data.frame(x=x, gender=gender)
and I want to plot two cumulative histograms (one for males and the other for females) with ggplot.
I have tried the code below
ggplot(data=mydata, aes(x=x, fill=gender)) + stat_bin(aes(y=cumsum(..count..)), geom="bar", breaks=1:10, colour=I("white")) + facet_grid(gender~.)
but I get this chart: [image]
which, obviously, is not correct.
How can I get the correct one, like this? [image]
Thanks!
I would pre-compute the cumulative sums per bin and per group, and then plot the precomputed values as bars.
library(dplyr)
library(tidyr)
library(ggplot2)

mydata %>%
  mutate(x = cut(x, breaks = 1:10, labels = FALSE)) %>%  # Bin x (breaks 1:10 give 9 bins)
  count(gender, x) %>%                                   # Counts per bin per gender
  mutate(x = factor(x, levels = 1:9)) %>%                # x as factor
  complete(x, gender, fill = list(n = 0)) %>%            # Fill missing bins with 0
  group_by(gender) %>%                                   # Group by gender ...
  arrange(x, .by_group = TRUE) %>%                       # ... order the bins ...
  mutate(y = cumsum(n)) %>%                              # ... and calculate cumsum
  ggplot(aes(x, y, fill = gender)) +                     # The rest is (gg)plotting
  geom_col(colour = "white") +
  facet_grid(gender ~ .)
Like @Edo, I also came here looking for exactly this, and @Edo's solution was the key for me. Below are a few additions that increase the information density and allow comparisons across different situations.
library(ggplot2)
set.seed(123)
x = c(rnorm(100, 4, 1), rnorm(50, 6, 1))
gender = c(rep("Male", 100), rep("Female", 50))
grade = rep(1:3, 50)
mydata = data.frame(x=x, gender=gender, grade = grade)
ggplot(mydata, aes(x,
y = ave(after_stat(density), group, FUN = cumsum)*after_stat(width),
group = interaction(gender, grade),
color = gender)) +
geom_line(stat = "bin") +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~grade)
I rescale y so that each cumulative curve always ends at 100%. Otherwise, if the groups differ in size (in the question's data they are equal, but here there are 100 males and 50 females), the cumulative curves end at different heights, which obscures their relative distributions.
Second, I use geom_line(stat = "bin") instead of geom_histogram() so that I can put more than one line on a panel and compare them easily.
Finally, because I also want to compare across facets, the ggplot group variable must contain more than just color = gender; otherwise the cumulative sum would run across grades. I set it manually with group = interaction(gender, grade).
Answering a million years later...
I was looking for a solution to the same problem and ended up here.
Eventually I figured it out by myself, so I'll drop it here in case other people will ever need it.
As required: no pre-work is necessary!
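ggplot(mydata) +
  # after_stat(count) replaces the deprecated ..count.. notation;
  # ave() restarts the running total within each group
  geom_histogram(aes(x = x, y = ave(after_stat(count), group, FUN = cumsum),
                     fill = gender, group = gender),
                 colour = "gray70", breaks = 1:10) +
  facet_grid(rows = vars(gender))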
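The last two answers can be combined: bars drawn straight from the raw data, as in the answer just above, but normalised within each group so that every gender's cumulative histogram ends at 100%, as in the geom_line answer. A sketch assuming ggplot2 >= 3.3.0; dividing the running count by the group's total count is my substitution for the density * width rescaling used earlier, and it gives the same proportions for bins that cover all the data.

```r
library(ggplot2)

# The question's data
set.seed(123)
x <- c(rnorm(100, 4, 1), rnorm(100, 6, 1))
gender <- rep(c("Male", "Female"), each = 100)
mydata <- data.frame(x = x, gender = gender)

p <- ggplot(mydata, aes(x = x, fill = gender, group = gender)) +
  # Within each group: running total of the bin counts divided by the
  # group's total, so each gender's bars climb to 1 in the last bin
  geom_histogram(aes(y = ave(after_stat(count), group,
                             FUN = function(n) cumsum(n) / sum(n))),
                 breaks = 1:10, colour = "white") +
  facet_grid(rows = vars(gender)) +
  scale_y_continuous(labels = scales::percent_format())

p
```

Values outside the breaks are not counted, so with breaks = 1:10 the 100% refers to the observations that fall between 1 and 10.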
