How to scatter plot using facet_wrap of ggplot in R?

I need to draw scatter plots of Observed vs. Predicted data for each Variable using the facet_wrap functionality of ggplot. I might be close, but I am not there yet. I used a suggestion from an answer to my previous question to gather the data and automate the plotting process. Here is my code so far. I understand that the aes of my ggplot is wrong, but I wrote it that way on purpose to make my point clear. I would also like to add geom_smooth to show the confidence interval.
library(tidyverse)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1,DF2)
DF_long = gather(DF, key = "Variable", value = "Value", -df)
ggplot(DF_long, aes(x = Observed, y = Predicted))+
geom_point() + facet_wrap(Variable~.)+ geom_smooth()
I should see a plot like below, comparing Observed Vs Predicted for each Variable.

We need to reshape each data frame separately and then cbind them, because x is Observed and y is Predicted; after that we can facet. See this example:
library(ggplot2)
library(tidyr) # for gather()
# reproducible data with seed
set.seed(1)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))
DF1_long <- gather(DF1, key = "group", value = "Observed")
DF2_long <- gather(DF2, key = "group", value = "Predicted")
plotDat <- cbind(DF1_long, DF2_long[, -1, drop = FALSE])
head(plotDat)
# group Observed Predicted
# 1 A 3.389578 10.590824
# 2 A 4.349115 10.234584
# 3 A 6.155680 8.298577
# 4 A 9.173870 11.750885
# 5 A 2.815137 7.942874
# 6 A 9.085507 6.203175
ggplot(plotDat, aes(x = Observed, y = Predicted))+
geom_point() +
facet_wrap(group~.) +
geom_smooth()
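For readers who would rather not depend on tidyr for this step, the same reshape-then-bind idea can be sketched with base R's stack(). This is only an illustration: it assumes, as in the example above, that DF1 and DF2 have the same columns in the same order and that row i of each describes the same case. The name plotDat_base is made up here.

```r
set.seed(1)
DF1 <- data.frame(A = runif(12, 1, 10), B = runif(12, 5, 10),
                  C = runif(12, 3, 9),  D = runif(12, 1, 12))
DF2 <- data.frame(A = runif(12, 4, 13), B = runif(12, 6, 14),
                  C = runif(12, 3, 12), D = runif(12, 4, 8))

# stack() returns two columns: values, and ind (the name of the source column)
obs  <- stack(DF1)
pred <- stack(DF2)

# Guard the assumption that both stacks line up row by row
stopifnot(identical(obs$ind, pred$ind))

# plotDat_base is a hypothetical name for the combined plotting data
plotDat_base <- data.frame(group     = obs$ind,
                           Observed  = obs$values,
                           Predicted = pred$values)
head(plotDat_base)
```

plotDat_base can then be passed to the same ggplot() call as plotDat.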
We can use ggpubr to add P and R values to the plot; see the answers in this post:
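As a rough sketch of what those P and R values are, the correlation and its p-value can be computed per facet in base R with cor.test(). The data below is a small stand-in with the same column names as plotDat, and stats_by_group is a hypothetical name. In ggpubr itself, stat_cor() is, as far as I know, the layer normally used to print these numbers on the panels.

```r
set.seed(1)
# Stand-in data with the same columns as plotDat (two groups only)
plotDat <- data.frame(
  group     = rep(c("A", "B"), each = 12),
  Observed  = runif(24, 1, 10),
  Predicted = runif(24, 4, 13)
)

# One row per group: Pearson correlation (R) and its p-value (P)
stats_by_group <- do.call(rbind, lapply(split(plotDat, plotDat$group), function(d) {
  ct <- cor.test(d$Observed, d$Predicted)  # Pearson is the default method
  data.frame(group = d$group[1], R = unname(ct$estimate), P = ct$p.value)
}))
stats_by_group
```

A table like this could also be fed to geom_text() if custom label placement is needed.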

Similarly, consider merging the reshaped data frames using base R's reshape (which avoids any tidyr dependency, useful if you are a package author). Below, lapply + Reduce merge the pieces dynamically, so the helper objects DF1_long and DF2_long are never created in the global environment:
Data
set.seed(10312019)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10),
C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14),
C = runif(12, 3,12), D = runif(12, 4,8))
Plot
library(ggplot2) # ONLY IMPORTED PACKAGE
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1, DF2)
DF_long <- Reduce(function(x, y) merge(x, y, by = c("Variable", "id")),
                  lapply(list(DF1, DF2), function(df)
                    reshape(df, varying = names(DF)[1:(length(names(DF))-1)],
                            times = names(DF)[1:(length(names(DF))-1)],
                            v.names = df$df[1], timevar = "Variable", drop = "df",
                            new.row.names = 1:1E5, direction = "long")
                  )
           )
head(DF_long)
# Variable id Observed Predicted
# 1 A 1 6.437720 11.338586
# 2 A 10 4.690934 9.861456
# 3 A 11 6.116200 9.020343
# 4 A 12 6.499371 5.904779
# 5 A 2 6.779087 5.901970
# 6 A 3 6.499652 8.557102
ggplot(DF_long, aes(x = Observed, y = Predicted)) +
geom_point() + geom_smooth() + facet_wrap(Variable~.)

Related

R multiple boxplots in one plot

I have a question regarding multiple boxplots. Assume we have data structures like this:
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
My task is to create a boxplot of a and b for each group of c. However, it needs to be in the same plot. Ideally: Boxplot for a and b side by side for group 0 and next to it boxplot for a and b for group 1 and all together in one graphic.
I tried several things, but only the separate plots work:
boxplot(a~as.factor(c))
boxplot(b~as.factor(c))
But that's not what I'm looking for, as it has to be one plot.
You can use the tidyverse packages for this. Transform your data into long format so that you get three variables: "names", "values" and "group". After that, you can plot your boxplots with ggplot():
value_a <- rnorm(100, 0, 1)
value_b <- rnorm(100, 0, 1)
group <- as.factor(rbinom(100, 1, 0.5))
data <- data.frame(value_a,value_b,group)
library(tidyverse)
data %>%
pivot_longer(value_a:value_b, names_to = "names", values_to = "values") %>%
ggplot(aes(y = values, x = group, fill = names))+
geom_boxplot()
Created on 2022-08-19 with reprex v2.0.2
Another option using lattice package with bwplot function:
library(tidyr)
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
df <- data.frame(a = a,
b = b,
c = c)
# make longer dataframe
df_long <- pivot_longer(df, cols = -c)
library(lattice)
bwplot(value ~ name | as.factor(c), df_long)
Created on 2022-08-19 with reprex v2.0.2
Noah has already given the ggplot2 answer, which would also be my go-to option. Since you used the boxplot function in the question, here is how to approach it with boxplot. You should probably stay consistently within base graphics or within ggplot2 for your publication or presentation.
First we transform the data to a long format (here an option without additional packages):
a <- rnorm(100, 0, 1)
b <- rnorm(100, 0, 1)
c <- rbinom(100, 1, 0.5)
d <- data.frame(a, b, c)
d <- cbind(stack(d, select = c("a", "b")), c)
giving us
> head(d)
values ind c
1 -0.66905293 a 0
2 -0.28778381 a 0
3 0.29148347 a 1
4 0.81380406 a 0
5 -0.85681913 a 0
6 -0.02566758 a 0
With which we can then call boxplot:
boxplot(values ~ ind + c, data = d, at = c(1, 2, 4, 5))
The at argument controls the grouping and placement of the boxes. Unlike in ggplot2, you need to choose the placement manually, but in return you get very fine control over spacing.
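If the number of boxes or groups changes, the positions for at do not have to be typed by hand. A minimal sketch, assuming one empty slot between groups; the names n_vars, n_groups, pos, at_boxes and at_labels are made up for illustration:

```r
n_vars   <- 2  # boxes per group (a and b)
n_groups <- 2  # number of groups (c = 0 and c = 1)

# One column per group; each group is shifted by n_vars + 1,
# which leaves one empty slot between neighbouring groups
pos <- outer(seq_len(n_vars), (seq_len(n_groups) - 1) * (n_vars + 1), "+")

at_boxes  <- as.vector(pos)  # positions for the `at` argument of boxplot()
at_labels <- colMeans(pos)   # centre of each group, e.g. for axis()

at_boxes
at_labels
```

For the two-by-two case this gives the positions 1, 2, 4, 5 and the group centres 1.5 and 4.5.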
Slightly refined version of the plot:
boxplot(values ~ ind + c, data = d, at = c(1, 2, 4, 5),
col = c(2, 4), show.names = FALSE,
xlab = "")
axis(1, labels= c("c = 0", "c = 1"), at = c(1.5, 4.5))
legend("topright", fill = c(2, 4), legend = c("a", "b"))

How to specify multiple xlims for facetted data in ggplot2 R?

The data is faceted by two variables (see graph). Each variable has a different range. I want to specify the range so that all plots in var1 and var2 are bounded by the min and max values of those variables. See the sample code attached. I don't want to use scales = "free" on facet_wrap.
library(tidyr)   # gather()
library(ggplot2)
var1 <- rnorm(100, 6, 2)
var2 <- rnorm(100,15,2)
spp.val <- rnorm(100,10,2)
spp <- rep(c("A","B","C","D"), 25)
df <- data.frame(var1, var2,spp, spp.val)
df <- gather(df,
key = "var",
value = "var.val",
var1,var2)
df$var <- as.factor(as.character(df$var))
df$spp <- as.factor(as.character(df$spp))
ggplot(aes(x = var.val, y = spp.val), data = df) +
geom_point() +
facet_grid(spp~var)
# I want the limits for each facet_grid column to be set as follows (pseudocode):
# xlim(min(df$var.val[df$var == "var1"]), max(df$var.val[df$var == "var1"]))
# xlim(min(df$var.val[df$var == "var2"]), max(df$var.val[df$var == "var2"]))
Is this what you want?
library(tidyverse)
tibble(
var1 = rnorm(100, 6, 2),
var2 = rnorm(100, 15, 2),
spp.val = rnorm(100, 10, 2),
spp = rep(c("A", "B", "C", "D"), 25)
) |>
pivot_longer(starts_with("var"), names_to = "var", values_to = "var.val") |>
mutate(across(c(spp, var), factor)) |>
ggplot(aes(var.val, spp.val)) +
geom_point() +
facet_grid(spp ~var, scales = "free_x")
Created on 2022-04-23 by the reprex package (v2.0.1)
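A note on why scales = "free_x" can match the request: in facet_grid, all panels in the same column share one x scale, so each var column is trained only on the values of that variable. The sketch below computes those per-variable ranges in base R; ggplot2 then adds its default axis expansion on top. The data frame is a regenerated stand-in with an arbitrary seed, and limits is a hypothetical name.

```r
set.seed(42)
# Stand-in for the long data: one row per observation, var marks the facet column
df <- data.frame(
  var     = rep(c("var1", "var2"), each = 100),
  var.val = c(rnorm(100, 6, 2), rnorm(100, 15, 2))
)

# Per-variable min and max: the data range each facet column covers under free_x
limits <- t(sapply(split(df$var.val, df$var), range))
colnames(limits) <- c("min", "max")
limits
```

These are the same quantities as the min() and max() calls sketched in the question, computed once per facet column.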

How to group datasets in a boxplot?

I have been trying to figure out how to group 9 datasets into 3 different groups (1, 2, and 3).
I have 3 different data frames that look like this:
ID1 ID2 dN dS Omega Label_ID1 Label_ID2 Group
QJY77946 NP_073551 0.0293 0.0757 0.3872 229E-CoV 229E-CoV Intra
QJY77954 NP_073551 0.0273 0.0745 0.3668 229E-CoV 229E-CoV Intra
...
So, the only columns I'm interested in are three: dN, dS, and Omega.
My main goal is to take these three columns from my data frames and plot them in a boxplot using RStudio.
To do that, I first extract the 3 columns from each data frame with these lines:
dN_1 <- df_1$dN
dS_1 <- df_1$dS
Omega_1 <- df_1$Omega
Then, to generate the plot I use this line (option 1):
boxplot(dN_S, dS_S, Omega_S, dN_M, dS_M, Omega_M, dN_E, dS_E, Omega_E,
main = "Test",
xlab = "Frames",
ylab = "Distribution",
col = "red")
My goal is to group these 9 boxes into 3 separate groups:
I know that using ggplot2 could be easier, so my option 2 is to use these lines (option 2):
df_1 %>%
ggplot(aes(y=dN_S)) +
geom_boxplot(
color = "blue",
fill = "blue",
alpha = 0.2,
notch = T,
notchwidth = 0.8)
However, as you can see, I couldn't find a way to plot all groups in the same plot.
So how can I group my data in the boxplot using option 1 or option 2? The second option is less developed, but perhaps someone could help with that too.
library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)
set.seed(123)
df_s <- data.frame(dN = runif(20),
dS = runif(20),
Omega = runif(20))
df_m <- data.frame(dN = runif(20),
dS = runif(20),
Omega = runif(20))
df_e <- data.frame(dN = runif(20),
dS = runif(20),
Omega = runif(20))
df <-
list(df_s, df_m, df_e) %>%
set_names(c("S", "M", "E")) %>%
map_dfr(bind_rows, .id = "df") %>%
pivot_longer(-df)
ggplot(df)+
geom_boxplot(aes(x = name, y = value))+
facet_wrap(~df, nrow = 1)
Created on 2021-09-24 by the reprex package (v2.0.0)
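For option 1 (base boxplot), a grouped version can be sketched along the same lines: stack the three data frames into one long table and let the at argument leave a gap between groups. This is an illustration on random stand-in data, not the asker's values; frames, long and at are made-up names, and S, M and E are taken from the variable suffixes in the question.

```r
set.seed(123)
# Stand-in data: one data frame per group, same three columns in each
frames <- list(
  S = data.frame(dN = runif(20), dS = runif(20), Omega = runif(20)),
  M = data.frame(dN = runif(20), dS = runif(20), Omega = runif(20)),
  E = data.frame(dN = runif(20), dS = runif(20), Omega = runif(20))
)

# Long format: frame (S/M/E), values, ind (dN/dS/Omega)
long <- do.call(rbind, lapply(names(frames), function(nm) {
  cbind(frame = nm, stack(frames[[nm]]))
}))
long$frame <- factor(long$frame, levels = names(frames))

# Three boxes per group, one empty slot between groups
at <- c(1:3, 5:7, 9:11)
bp <- boxplot(values ~ ind + frame, data = long, at = at,
              col = c(2, 3, 4), show.names = FALSE,
              main = "Test", xlab = "", ylab = "Distribution")
axis(1, at = c(2, 6, 10), labels = levels(long$frame))
legend("topright", fill = c(2, 3, 4), legend = levels(long$ind))
```

The colours are recycled across the nine boxes, so each of dN, dS and Omega keeps the same colour in every group.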
One way to accomplish this is by providing ggplot() another aesthetic, like fill. Here's a small reproducible example:
library(tidyverse)
df <- tibble(category = rep(letters[1:4], 5),
time = c(rep("before", 10), rep("after", 10)),
num = rnorm(20))
df %>%
ggplot() +
geom_boxplot(aes(x=category, y=num, fill = time))
Let me know if you're looking for something else.

scatter plots for all pairwise columns between two data frames

Let's say I have 2 data frames:
df1 = data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
df2 = data.frame(d = rnorm(10), e = rnorm(10))
I would like to look at all pairwise scatter plots between the two data frames,
i.e. the six scatter plots: a vs d, a vs e, b vs d, b vs e, c vs d, c vs e.
How could I achieve this? I notice that pairs does this for a single data.frame.
Use cbind to combine the two data frames and then use plot():
df1 = data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
df2 = data.frame(d = rnorm(10), e = rnorm(10))
df <- cbind(df1, df2)
plot(df)
If you want to create only plots between the two data.frames (no self comparison), you can loop them:
par(mfrow = c(ncol(df1), ncol(df2)))
for(i in 1:ncol(df1)){
for(j in 1:ncol(df2)){
plot(df1[,i], df2[,j], main = paste(names(df1)[i], "vs", names(df2)[j]),
ylab = names(df2)[j],
xlab = names(df1)[i])
}
}
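The set of cross-data-frame pairs that the loop walks through can also be listed explicitly, which is handy for checking the count or attaching a numeric summary to each panel. A small sketch, assuming only base R; pairs_tbl is a hypothetical name and the correlation column is an optional extra, not part of the original answer:

```r
set.seed(1)
df1 <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
df2 <- data.frame(d = rnorm(10), e = rnorm(10))

# Every column of df1 paired with every column of df2 (no self comparisons)
pairs_tbl <- expand.grid(x = names(df1), y = names(df2),
                         stringsAsFactors = FALSE)

# Optional: Pearson correlation for each pair
pairs_tbl$r <- mapply(function(x, y) cor(df1[[x]], df2[[y]]),
                      pairs_tbl$x, pairs_tbl$y, USE.NAMES = FALSE)
pairs_tbl
```

The same correlations are available in matrix form from cor(df1, df2).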
A pretty (unnecessarily complicated?) tidyverse/ggplot2 solution.
Reorganize data:
library(dplyr)
library(tidyr)
mfun <- function(x,label="df1") {
x %>%
mutate(obs=seq(n())) %>% ## add obs numbers
gather(key=var,value=value,-obs) ## reshape
}
## combine
df12 <- mfun(df1) %>% full_join(mfun(df2),by="obs")
Plot:
library(ggplot2); theme_set(theme_bw())
ggplot(df12,aes(value.x,value.y)) +
geom_point()+
facet_grid(var.x~var.y)+
theme(panel.spacing=grid::unit(0,"lines")) ## squash panels together

How to properly create this kind of plot [duplicate]

I want to show the connections between a number of people, organizations or whatever:
Var1 Var2 Freq
1 F A 5
2 F B 38
3 B C 10
4 E C 28
5 A D 8
6 B D 21
7 A E 50
8 A F 34
9 D F 50
10 E F 14
I couldn't find any examples for this kind of plot, so I started from scratch. However, I'm struggling with the labels for the frequency values. Any ideas how to fix that?
MWE:
### Sample data ###
# Generate names
names <- LETTERS[1:6]
# Generate all possible permutations
df = expand.grid(rep(list(names), 2))
rownames(df) <- NULL
# Drop some of the permutations
df <- df[df$Var1 != df$Var2, ]
df <- df[-sample(1:nrow(df), nrow(df) * 2/3), ]
# Add a column with random frequency values
df$Freq <- sample(1:50, nrow(df), replace=TRUE)
### Prepare sample data for ggplot ####
# Add a column with the row numbers (used for grouping)
df$Pair <- 1:nrow(df)
# Convert data frame to long format
df.from <- df[, -which(names(df) %in% c("Var2"))]
df.from$Type <- "From"
colnames(df.from) <- c("Name", "Freq", "Pair", "Type")
df.to <- df[, -which(names(df) %in% c("Var1"))]
df.to$Type <- "To"
colnames(df.to) <- c("Name", "Freq", "Pair", "Type")
df2 <- rbind(df.from, df.to)
### Plot ###
library(ggplot2)
library(scales)
p <- ggplot()
p <- p + geom_text(aes(x = "From", y = names, label = names), hjust = 1, vjust = 0.5)
p <- p + geom_text(aes(x = "To", y = names, label = names), hjust = 0, vjust = 0.5)
p <- p + geom_line(data = df2, aes(x = Type, y = Name, group = Pair))
p <- p + geom_text(data = df2[df2$Type == "To", ], aes(x = Type, y = Name, group = Pair, label = Freq), hjust = 3, vjust = 0.5)
p <- p + scale_y_discrete(name = "", limits = rev(factor(names, levels = sort(names))))
p <- p + scale_x_discrete(name = "", limits = c("From", "To"))
p
To me, the request
to show the connections between a number of people, organizations or whatever
sounds like a desire to draw a network plot. Using the network package:
library(network)
#Construct a sparse graph
m<-matrix(rbinom(100,1,1.5/9),10)
diag(m)<-0
g<-network(m)
#Plot the graph
plot(g)
You could get the following
Alternatively, and perhaps more relevant to your problem, you may consider using the qgraph package. For example, the code below:
require(qgraph)
set.seed(1)
adj = matrix(sample(0:1, 10^2, TRUE, prob = c(0.8, 0.2)), nrow = 10, ncol = 10)
qgraph(adj)
title("Unweighted and directed graphs", line = 2.5)
would return this beautiful network graph:
If you are looking for more examples, just refer to this excellent page by Sacha Epskamp on how to use qgraph.
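Both network examples above start from a random adjacency matrix, whereas the question has an edge list with frequencies. A minimal base R sketch of turning that table into a weighted adjacency matrix follows; edges, nodes and adj are made-up names, and the commented-out qgraph call rests on the assumption that qgraph() accepts such a weights matrix and an edge.labels argument.

```r
# Edge list copied from the table in the question
edges <- data.frame(
  Var1 = c("F", "F", "B", "E", "A", "B", "A", "A", "D", "E"),
  Var2 = c("A", "B", "C", "C", "D", "D", "E", "F", "F", "F"),
  Freq = c(5, 38, 10, 28, 8, 21, 50, 34, 50, 14),
  stringsAsFactors = FALSE
)

# Square matrix with one row and one column per node, zero where there is no edge
nodes <- sort(unique(c(edges$Var1, edges$Var2)))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(from = nodes, to = nodes))

# Fill in the weights: row = Var1 (from), column = Var2 (to)
adj[cbind(edges$Var1, edges$Var2)] <- edges$Freq
adj

# Assumption, not run here:
# qgraph::qgraph(adj, edge.labels = TRUE)
```

Rows are the "from" side and columns the "to" side, so the matrix is directed.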
