R multiple boxplots in one plot - r

I have a question regarding multiple boxplots. Assume we have data structures like this:
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
My task is to create a boxplot of a and b for each group of c. However, it needs to be in the same plot. Ideally: Boxplot for a and b side by side for group 0 and next to it boxplot for a and b for group 1 and all together in one graphic.
I tried several things, but only the separate plots are working:
boxplot(a~as.factor(c))
boxplot(b~as.factor(c))
But that's not what I'm looking for, as it has to be one plot.

You can use the tidyverse package for this. Transform your data into long format so that you get three variables: "names", "values" and "group". After that you can plot your boxplots with ggplot():
value_a <- rnorm(100, 0, 1)
value_b <- rnorm(100, 0, 1)
group <- as.factor(rbinom(100, 1, 0.5))
data <- data.frame(value_a,value_b,group)
library(tidyverse)
data %>%
pivot_longer(value_a:value_b, names_to = "names", values_to = "values") %>%
ggplot(aes(y = values, x = group, fill = names))+
geom_boxplot()
Created on 2022-08-19 with reprex v2.0.2
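To see what the reshaping step does on its own, here is a minimal sketch that only runs pivot_longer() and inspects the result. The seed and the object name long are my own additions, not part of the answer above.

```r
library(tidyr)

set.seed(42)  # assumed seed, only to make the sketch reproducible
data <- data.frame(
  value_a = rnorm(100, 0, 1),
  value_b = rnorm(100, 0, 1),
  group   = as.factor(rbinom(100, 1, 0.5))
)

# one row per (observation, variable) pair: 100 observations x 2 variables
long <- pivot_longer(data, value_a:value_b,
                     names_to = "names", values_to = "values")

dim(long)          # 200 rows, 3 columns
table(long$names)  # 100 rows each for value_a and value_b
```

Each original row now appears twice, once per variable, which is what lets ggplot map the variable name to fill.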

Another option using lattice package with bwplot function:
library(tidyr)
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
df <- data.frame(a = a,
b = b,
c = c)
# make longer dataframe
df_long <- pivot_longer(df, cols = -c)
library(lattice)
bwplot(value ~ name | as.factor(c), df_long)
Created on 2022-08-19 with reprex v2.0.2

Noah has already given the ggplot2 answer, which would also be my go-to option. Since you used the boxplot function in the question, this is how to approach it with boxplot. You should probably stay consistently within base graphics or within ggplot2 for your publication/presentation.
First we transform the data to a long format (here an option without additional packages):
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
d <- data.frame(a, b, c)
d <- cbind(stack(d, select = c("a", "b")), c)
giving us
> head(d)
values ind c
1 -0.66905293 a 0
2 -0.28778381 a 0
3 0.29148347 a 1
4 0.81380406 a 0
5 -0.85681913 a 0
6 -0.02566758 a 0
With which we can then call boxplot:
boxplot(values ~ ind + c, data = d, at = c(1, 2, 4, 5))
The at argument controls the grouping and placement of the boxes. Unlike in ggplot2, you have to choose the positions manually, but in return you get very fine control over the spacing.
Slightly refined version of the plot:
boxplot(values ~ ind + c, data = d, at = c(1, 2, 4, 5),
col = c(2, 4), show.names = FALSE,
xlab = "")
axis(1, labels= c("c = 0", "c = 1"), at = c(1.5, 4.5))
legend("topright", fill = c(2, 4), legend = c("a", "b"))
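If typing the at positions by hand gets tedious, they can be computed. This is only a sketch under my own assumptions: the grouping vector is renamed to grp (so it does not mask c()), the seed is arbitrary, and the helper names n_vars, n_groups and offsets are hypothetical.

```r
set.seed(1)  # assumed seed
a   <- rnorm(100, 0, 1)
b   <- rnorm(100, 0, 1)
grp <- rbinom(100, 1, 0.5)  # called c in the question

# long format: columns values, ind (a or b) and grp (recycled for a and b)
d <- data.frame(stack(data.frame(a, b)), grp = grp)

n_vars   <- nlevels(d$ind)         # boxes per cluster
n_groups <- length(unique(d$grp))  # number of clusters

# each cluster occupies n_vars slots followed by one empty slot
offsets <- (seq_len(n_groups) - 1) * (n_vars + 1)
at <- as.vector(outer(seq_len(n_vars), offsets, "+"))
at  # 1 2 4 5

boxplot(values ~ ind + grp, data = d, at = at,
        col = c(2, 4), show.names = FALSE, xlab = "")
# put each cluster label at the centre of its cluster
axis(1, at = offsets + (n_vars + 1) / 2,
     labels = paste("c =", sort(unique(d$grp))))
legend("topright", fill = c(2, 4), legend = levels(d$ind))
```

With two variables and two groups this reproduces the hand-written c(1, 2, 4, 5) and the label positions 1.5 and 4.5.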

Related

How to make an R plot with values on x-axis values not in order

I have the following example dataframe and want to plot columns b (x-axis) and d (y-axis) but based on column a. Meaning I want the rows in b and d that correspond to value 1 in column a to be plotted next to each other (as points or vertical lines) and then similarly for value 2 in column a etc. (all in one graph).
I am having trouble doing this as it would mean the values on the x-axis will increase and decrease/fluctuate.
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame("a" = a, "b" = b, "d" = d)
plot(df$b, df$d)
I have tried basic plotting as shown above (not the results I want) and have tried other methods to relabel the values on the x-axis but that is also incorrect.
Lastly, I am unable to use any R libraries as I cannot download them on the computer I am using.
Thank you in advance for any help!
Not sure if I understand the question properly, but perhaps plotting the interaction of "a" and "b" would work?
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame("a" = a, "b" = interaction(b, a), "d" = d)
df$b <- factor(df$b, levels = df$b)
png("test.png", width = 900, height = 400, units = "px")
plot(df$b, df$d)
dev.off()
Edit
Or maybe remove the x-axis labels and add them in 'manually'?
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame("a" = a, "b" = b, "d" = d)
png("test.png", width = 900, height = 400, units = "px")
plot(seq_along(df$b), df$d, xaxt = "n")
axis(1, at = seq_along(df$b), labels = df$b)
dev.off()
Edit 2
Another approach that may better "preserve the spacing" of the x-axis is to split the dataframe and plot each group separately:
a <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame("a" = a, "b" = b, "d" = d)
split_df <- split(df, df$a)
par(mfrow = c(1, 3), mar = c(2, 2, 1, 1))
plot(split_df$`1`$b, split_df$`1`$d, ylim = c(0, 1))
plot(split_df$`2`$b, split_df$`2`$d, ylim = c(0, 1))
plot(split_df$`3`$b, split_df$`3`$d, ylim = c(0, 1))
Created on 2021-10-15 by the reprex package (v2.0.1)
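The three plot() calls in the last approach can also be written as a loop, so the code does not need to change when the number of groups does. A sketch using base R only; the panel titles and the saved/restored par settings are my additions.

```r
a <- rep(1:3, each = 7)
b <- c(1,4,5,7,8,10,13, 5,7,10,11,14,17,23, 3,7,11,16,19,26,29)
d <- c(.4,.15,.76,.07,.18,.11,.12, .23,.45,.25,.11,.16,.2,.5, .48,.9,.13,.75,.4,.98,.3)
df <- data.frame(a = a, b = b, d = d)

split_df <- split(df, df$a)

# one panel per group; keep the old settings so they can be restored
op <- par(mfrow = c(1, length(split_df)), mar = c(2, 2, 2, 1))
for (g in names(split_df)) {
  with(split_df[[g]], plot(b, d, ylim = c(0, 1), main = paste("a =", g)))
}
par(op)
```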

A simple plot for many curves with different colors

I have the following data frame which contains 4 columns of data in addition to the vector of labels c.
Time <-c(1:4)
d<-data.frame(Time,
x1= rpois(n = 4, lambda = 10),
x2= runif(n = 4, min = 1, max = 10),
x3= rpois(n = 4, lambda = 5),
x4= runif(n = 4, min = 1, max = 5),
c=c(1,1,2,3))
I would like to use ggplot to plot the 4 curves x1, ..., x4 on top of each other, with each curve colored according to its label. So curves x1 and x2 get the same color since they have the same label, whereas curves x3 and x4 get different colors.
I did the following
d %>% pivot_longer(-c(Time,x1,x2,x3,x4))%>%
rename(class=value) %>% select(-name) %>%
pivot_longer(-c(Time,class)) %>%
mutate(Label=ifelse(Time==max(Time,na.rm = T),name,NA),
Label=ifelse(duplicated(Label),NA,Label)) %>%
ggplot(aes(x=Time,y=value,color=factor(class),group=name))+
geom_line()+
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))+
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE,show.legend = F,color='black')
but I don't get the plot I need: the resulting curves are not colored according to the label. I want x1 and x2 in red, x3 in blue and x4 in green.
To add: I would like to obtain the same plot above in the following general case, where I can't add the vector c to the data frame as length(c) is not equal to length(x1)=...=length(x4)
Time <-c(1:5)
d<-data.frame(Time,
x1= rpois(n = 5, lambda = 10),
x2= runif(n = 5, min = 1, max = 10),
x3= rpois(n = 5, lambda = 5),
x4= runif(n = 5, min = 1, max = 5))
and c=c(1,1,2,3)
As you point out in your comments, it is only possible to put the vector of colors as a column in the original data.frame because it happens to be square, but this is a dangerous way to store the information because the colors really belong to the columns rather than the rows. It's better to assign the colors separately and then join into the long format data by variable name prior to plotting.
Below is an example of how I'd do this with your data.
First, prepare the data without the color mapping for each variable, we'll do that next:
# load necessary packages
library(tidyverse)
library(ggrepel)
# set seed to make simulated data reproducible
set.seed(1)
# simulate data
Time <-c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5))
Next, make a separate data.frame that maps the color grouping to the variable names. At some point you'll want to make this a factor (i.e. discrete rather than continuous) to map it to color so I just do it here but it can be done later in the ggplot call if you prefer. Per your request, this solution easily scales with your dataset without needing to manually set each level, but it requires that your vector of color mappings is in the same order and the same length as the variable names in d unless you have some other way to establish that relationship.
# create separate df with color groupings for variable in d
color_grouping <- data.frame(var = names(d)[-1],
color_group = factor(c(1, 1, 2, 3)))
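Because the mapping relies on the class vector having the same length and order as the variable columns, it may be worth guarding that assumption explicitly. A small sketch for the five-row case from the question; the names classes and vars are hypothetical.

```r
set.seed(1)
Time <- 1:5
d <- data.frame(Time,
                x1 = rpois(n = 5, lambda = 10),
                x2 = runif(n = 5, min = 1, max = 10),
                x3 = rpois(n = 5, lambda = 5),
                x4 = runif(n = 5, min = 1, max = 5))

classes <- c(1, 1, 2, 3)           # called c in the question
vars    <- setdiff(names(d), "Time")

# fail early if the labels do not line up one-to-one with the columns
stopifnot(length(classes) == length(vars))

color_grouping <- data.frame(var = vars, color_group = factor(classes))
color_grouping
```

The lookup table has one row per variable, so it works no matter how many time points d has.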
Then you pivot_longer and do a join to merge the color mapping with the data for plotting.
# pivot d to long and merge in color codes
d_long <- d %>%
pivot_longer(cols = -Time, names_to = "var", values_to = "value") %>%
left_join(color_grouping, by = "var")
# inspect final table prior to plotting to confirm color mappings
head(d_long, 4)
# # A tibble: 4 x 4
# Time var value color_group
# <int> <chr> <dbl> <fct>
# 1 1 x1 8 1
# 2 1 x2 1.56 1
# 3 1 x3 4 2
# 4 1 x4 4.97 3
Finally, generate line plot where color is mapped to the color_group variable. To ensure you get one line per original variable you also need to set group = var. For more info on this check the documentation on grouping.
# plot data adding labels for each line
p <- d_long %>%
ggplot(aes(x = Time, y = value, group = var, color = color_group)) +
geom_line() +
labs(color='class') +
scale_color_manual(values=c('red','blue','green')) +
geom_label_repel(aes(label = var),
data = d_long %>% slice_max(order_by = Time, n = 1),
nudge_x = 1.5,
na.rm = TRUE,
show.legend = F,
color='black')
p
This produces this plot:
In your comment you suggested wanting to separate out and stack the plots. I'm not sure I fully understood, but one way to accomplish this is with faceting.
For example if you wanted to facet out separate panels by color_group, you could add this line to the plot above:
p + facet_grid(rows = "color_group")
Which gives this plot:
Note that when the faceting variable is passed this way it must be put in quotes (or wrapped in vars()).
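For comparison, the unquoted form goes through vars(). The sketch below builds a small stand-in for d_long with simulated values (my assumption, not the data from the answer) so that it runs on its own.

```r
library(ggplot2)

set.seed(1)  # assumed seed
# hypothetical stand-in for d_long: 4 variables x 4 time points
d_long <- data.frame(
  Time        = rep(1:4, times = 4),
  var         = rep(c("x1", "x2", "x3", "x4"), each = 4),
  value       = runif(16, min = 1, max = 10),
  color_group = factor(rep(c(1, 1, 2, 3), each = 4))
)

p <- ggplot(d_long, aes(x = Time, y = value, group = var, color = color_group)) +
  geom_line()

# one row of panels per color group, variable passed unquoted via vars()
p_facets <- p + facet_grid(rows = vars(color_group))
p_facets
```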
You were on the right path, but you need a little bit of a different structure to use ggplot:
# delete old color column
d$c <- NULL
# reshape df
plot.d <- reshape2::melt(d, id.vars = c("Time"))
# create new, correct color column
plot.d$c <- NA
plot.d$c[plot.d$variable == "x1"] <- 1
plot.d$c[plot.d$variable == "x2"] <- 1
plot.d$c[plot.d$variable == "x3"] <- 2
plot.d$c[plot.d$variable == "x4"] <- 3
# plot
ggplot(plot.d, aes(x=Time, y=value, color=as.factor(c), group = variable))+
geom_line() +
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))
Note that I omitted the labels for brevity, but you can add them back in using the same logic. The code above gives the following result:
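The four assignments can be collapsed into one lookup with a named vector. This is a sketch only: it uses base R's stack() as a stand-in for reshape2::melt, and the names long and class_of are hypothetical.

```r
set.seed(1)
d <- data.frame(Time = 1:4,
                x1 = rpois(n = 4, lambda = 10),
                x2 = runif(n = 4, min = 1, max = 10),
                x3 = rpois(n = 4, lambda = 5),
                x4 = runif(n = 4, min = 1, max = 5))

# base-R long format standing in for reshape2::melt(d, id.vars = "Time")
long <- data.frame(Time = rep(d$Time, times = ncol(d) - 1), stack(d[-1]))
names(long) <- c("Time", "value", "variable")

# named vector: variable name -> class label
class_of <- c(x1 = 1, x2 = 1, x3 = 2, x4 = 3)
long$c <- unname(class_of[as.character(long$variable)])

table(long$variable, long$c)
```

Adding a fifth curve then only means adding one entry to class_of.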
Here is a solution for how I understood your question.
The data frame is brought into long format, and the variable c is overwritten with mutate / case_when using the number codes you have used.
I have set a seed for better reproducibility.
library(tidyverse)
library(ggrepel)
set.seed(1)
# YOUR DATA
Time <- c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5),
c = c(1, 1, 2, 3)
)
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(c = as.factor(case_when( # replace consistently
name == "x1" | name == "x2" ~ 1, # according to YOUR DATA
name == "x3" ~ 2,
name == "x4" ~ 3
))) %>%
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = c, group = name)) +
geom_line() +
labs(color = "class") +
scale_color_manual(values = c("red", "blue", "green")) + # YOUR CHOICE
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)
ADDED
You could leave the c out and color according to name.
The color code was necessary because you wanted 2 names with the same color. If that is not needed, the following code can do it.
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = name, group = name)) +
geom_line() +
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)

How to scatter plot using facet_wrap of ggplot in R?

I need to scatter plot Observed Vs Predicted data of each Variable using facet_wrap functionality of ggplot. I might be close but not there yet. I use some suggestion from an answer to my previous question to gather the data to automate the plotting process. Here is my code so far- I understand that the aes of my ggplot is wrong but I used it purposely to make my point clear. I would also like to add geom_smooth to have the confidence interval.
library(tidyverse)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1,DF2)
DF_long = gather(DF, key = "Variable", value = "Value", -df)
ggplot(DF_long, aes(x = Observed, y = Predicted))+
geom_point() + facet_wrap(Variable~.)+ geom_smooth()
I should see a plot like below, comparing Observed Vs Predicted for each Variable.
We will need to convert each dataframe separately then cbind as x is Observed and y is Predicted, then facet, see this example:
library(ggplot2)
library(tidyr) # provides gather()
# reproducible data with seed
set.seed(1)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))
DF1_long <- gather(DF1, key = "group", "Observed")
DF2_long <- gather(DF2, key = "group", "Predicted")
plotDat <- cbind(DF1_long, DF2_long[, -1, drop = FALSE])
head(plotDat)
# group Observed Predicted
# 1 A 3.389578 10.590824
# 2 A 4.349115 10.234584
# 3 A 6.155680 8.298577
# 4 A 9.173870 11.750885
# 5 A 2.815137 7.942874
# 6 A 9.085507 6.203175
ggplot(plotDat, aes(x = Observed, y = Predicted))+
geom_point() +
facet_wrap(group~.) +
geom_smooth()
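One caveat with the cbind step is that it pairs rows purely by position, so it is only valid if both long tables list the groups in the same order. A sketch of a guard for that, reusing the object names from the code above:

```r
library(tidyr)

set.seed(1)
DF1 <- data.frame(A = runif(12, 1, 10), B = runif(12, 5, 10),
                  C = runif(12, 3, 9),  D = runif(12, 1, 12))
DF2 <- data.frame(A = runif(12, 4, 13), B = runif(12, 6, 14),
                  C = runif(12, 3, 12), D = runif(12, 4, 8))

DF1_long <- gather(DF1, key = "group", value = "Observed")
DF2_long <- gather(DF2, key = "group", value = "Predicted")

# stop if the two tables are not aligned row by row
stopifnot(identical(DF1_long$group, DF2_long$group))

plotDat <- cbind(DF1_long, DF2_long[, -1, drop = FALSE])
str(plotDat)
```

If the two data frames could have different column orders or row counts, a merge on an explicit id would be the safer route.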
We can use ggpubr to add P and R values to the plot; see the answers in this post:
Similarly, consider merge on reshaped data frames using base R's reshape (avoiding any tidyr dependencies in case you are a package author). Below lapply + Reduce dynamically merges to bypass helper objects, DF1_long and DF2_long, in global environment:
Data
set.seed(10312019)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10),
C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14),
C = runif(12, 3,12), D = runif(12, 4,8))
Plot
library(ggplot2) # ONLY IMPORTED PACKAGE
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1, DF2)
DF_long <- Reduce(function(x,y) merge(x, y, by=c("Variable", "id")),
lapply(list(DF1, DF2), function(df)
reshape(df, varying=names(DF)[1:(length(names(DF))-1)],
times=names(DF)[1:(length(names(DF))-1)],
v.names=df$df[1], timevar="Variable", drop="df",
new.row.names=1:1E5, direction="long")
)
)
head(DF_long)
# Variable id Observed Predicted
# 1 A 1 6.437720 11.338586
# 2 A 10 4.690934 9.861456
# 3 A 11 6.116200 9.020343
# 4 A 12 6.499371 5.904779
# 5 A 2 6.779087 5.901970
# 6 A 3 6.499652 8.557102
ggplot(DF_long, aes(x = Observed, y = Predicted)) +
geom_point() + geom_smooth() + facet_wrap(Variable~.)

Can you facet on an indicator variable in ggplot2?

types = c("A", "B", "C")
df = data.frame(n = rnorm(100), type=sample(types, 100, replace = TRUE))
ggplot(data=df, aes(n)) + geom_histogram() + facet_grid(~type)
Above is how I normally use faceting. But can I use it when, instead of a categorical variable, I have a set of columns that are indicator variables, such as:
df = data.frame(n = rnorm(100), A=rbinom(100, 1, .5), B=rbinom(100, 1, .5), C=rbinom(100, 1, .5))
Now the "Type" variable from my previous example isn't mutually exclusive. An observation can be "A and B" or "A and B and C" for example. However, I'd still like an individual histogram for any observation that has the presence of A, B, or C?
I would reshape the data with tidyr so that observations in more than one category are duplicated, then filter to remove unwanted cases.
df <- data.frame(
n = rnorm(100),
A = rbinom(100, 1, .5),
B = rbinom(100, 1, .5),
C = rbinom(100, 1, .5)
)
library("tidyr")
library("dplyr")
library("ggplot2")
df %>% gather(key = "type", value = "value", -n) %>%
filter(value == 1) %>%
ggplot(aes(x = n)) +
geom_histogram() +
facet_wrap(~type)
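To check that observations really are duplicated once per indicator they carry, you can compare the row count of the reshaped data with the total number of 1s. A sketch; the seed and the name long are my additions.

```r
library(tidyr)
library(dplyr)

set.seed(1)  # assumed seed
df <- data.frame(
  n = rnorm(100),
  A = rbinom(100, 1, .5),
  B = rbinom(100, 1, .5),
  C = rbinom(100, 1, .5)
)

long <- df %>%
  gather(key = "type", value = "value", -n) %>%
  filter(value == 1)

# one row per (observation, indicator) pair where the indicator is 1
nrow(long) == sum(df$A, df$B, df$C)  # TRUE

# how many observations end up in each facet
count(long, type)
```

An observation with A = 1 and B = 1 therefore contributes to both the A and the B histogram, and one with no indicator set is dropped.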
I've always despised gather, so I'll add another method and one for the data.table fans.
library(data.table)
DT <- melt(setDT(df), id.vars = "n", variable.name = "type")[value > 0]
ggplot(DT,aes(n)) + geom_histogram() + facet_grid(~type)
#tidyland
library(reshape2)
library(dplyr)
library(ggplot2)
df %>%
melt(id.vars = "n", variable.name = "type") %>%
filter(value > 0) %>%
ggplot(aes(n)) + geom_histogram() + facet_grid(~type)

How to plot on two Y axis based on X value in R?

Here is my question, I have a data like this
   A B C D
a 24 1 2 3
b 26 2 3 1
c 25 3 1 2
Now I would like to plot A on one Y axis (0 to 30) and B~D on another Y axis (0 to 5) in one graph. Also, I want each of the rows a, b, c to have a line linking its values together (let's say a, b, c represent mouse IDs). Could anyone come up with ideas on how to do it? I prefer using R. Thanks in advance!
# create some data
data = as.data.frame(list(A = c(24,26,25),
B = c(1,2,3),
C = c(2,3,1),
D = c(3,1,2)))
# adjust your margins to allow room for your second axis
par(mar=c(5, 4, 4, 4) + 0.1)
# create your first plot
plot(1:3,data$A,pch = 19,ylab = "1st ylab",xlab="index")
# set par to new so you don't overwrite your current plot
par(new=T)
# set axes = F, set your ylim and remove your labels
plot(1:3,data$B,ylim = c(0,5), pch = 19, col = 2,
xlab="", ylab="",axes = F)
# add your points
points(1:3,data$C,pch = 19,col = 3)
points(1:3,data$D, pch = 19,col = 4)
# set the placement for your axis and add text
axis(4) # the range comes from the ylim of the second plot() call
mtext("2nd ylab",side=4,line=2.5)
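I greatly prefer using ggplot2 for plotting. Sadly, ggplot2 deliberately does not support two independent y axes; since version 2.2.0 it only offers sec_axis() for a secondary axis that is a one-to-one transformation of the primary one.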
I would like to propose an alternative which uses facets, i.e. subplots. Note that to be able to plot the data using ggplot2, we need to change the data structure. We do this using gather from the tidyr package. In addition, I use the programming style as defined in dplyr (which uses piping a lot):
library(ggplot2)
library(dplyr)
library(tidyr)
df = data.frame(A = c(24, 26, 25), B = 1:3, C = c(2, 3, 1), D = c(3, 1, 2))
plot_data = df %>% mutate(x_value = rownames(df)) %>% gather(variable, value, -x_value)
ggplot(plot_data) + geom_line(aes(x = x_value, y = value, group = variable)) +
facet_wrap(~ variable, scales = 'free_y')
Here, each subplot has its own y-axis.
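If a single panel with a second axis is really needed, newer ggplot2 versions (2.2.0 and later) provide sec_axis(), which draws a secondary axis as a fixed transformation of the primary one. The sketch below rescales B, C and D onto the 0 to 30 range by hand; the scale factor k, the id column and the colour labels are my own assumptions, not something from the question.

```r
library(ggplot2)

df <- data.frame(id = c("a", "b", "c"),
                 A = c(24, 26, 25), B = 1:3, C = c(2, 3, 1), D = c(3, 1, 2))

k <- 6  # assumed scale factor: primary range 30 / secondary range 5

p <- ggplot(df, aes(x = id, group = 1)) +
  geom_line(aes(y = A, colour = "A")) +
  # B, C, D are stretched by k so they share the primary scale
  geom_line(aes(y = B * k, colour = "B")) +
  geom_line(aes(y = C * k, colour = "C")) +
  geom_line(aes(y = D * k, colour = "D")) +
  # the secondary axis undoes the stretch so its ticks read 0 to 5
  scale_y_continuous(name = "A", limits = c(0, 30),
                     sec.axis = sec_axis(~ . / k, name = "B, C, D")) +
  labs(x = NULL, colour = NULL)
p
```

The two axes are not independent here: the right-hand axis is just the left-hand one divided by k, which is the only kind of dual axis ggplot2 allows.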
