How to properly create this kind of plot [duplicate] - r

This question already has answers here:
Simplest way to plot changes in ranking between two ordered lists in R?
(4 answers)
Closed 7 years ago.
I want to show the connections between a number of people, organizations or whatever:
Var1 Var2 Freq
1 F A 5
2 F B 38
3 B C 10
4 E C 28
5 A D 8
6 B D 21
7 A E 50
8 A F 34
9 D F 50
10 E F 14
I couldn't find any examples for this kind of plot, so I started from scratch. However, I'm struggling with the labels for the frequency values. Any ideas how to fix that?
MWE:
### Sample data ###
# Generate names
names <- LETTERS[1:6]
# Generate all possible permutations
df = expand.grid(rep(list(names), 2))
rownames(df) <- NULL
# Drop some of the permutations
df <- df[df$Var1 != df$Var2, ]
df <- df[-sample(1:nrow(df), nrow(df) * 2/3), ]
# Add a column with random frequency values
df$Freq <- sample(1:50, nrow(df), replace = TRUE)
### Prepare sample data for ggplot ####
# Add a column with the row numbers (used for grouping)
df$Pair <- 1:nrow(df)
# Convert data frame to long format
df.from <- df[, -which(names(df) %in% c("Var2"))]
df.from$Type <- "From"
colnames(df.from) <- c("Name", "Freq", "Pair", "Type")
df.to <- df[, -which(names(df) %in% c("Var1"))]
df.to$Type <- "To"
colnames(df.to) <- c("Name", "Freq", "Pair", "Type")
df2 <- rbind(df.from, df.to)
### Plot ###
library(ggplot2)
library(scales)
p <- ggplot()
p <- p + geom_text(aes(x = "From", y = names, label = names), hjust = 1, vjust = 0.5)
p <- p + geom_text(aes(x = "To", y = names, label = names), hjust = 0, vjust = 0.5)
p <- p + geom_line(data = df2, aes(x = Type, y = Name, group = Pair))
p <- p + geom_text(data = df2[df2$Type == "To", ], aes(x = Type, y = Name, group = Pair, label = Freq), hjust = 3, vjust = 0.5)
p <- p + scale_y_discrete(name = "", limits = rev(factor(names, levels = sort(names))))
p <- p + scale_x_discrete(name = "", limits = c("From", "To"))
p

To me, the request
to show the connections between a number of people, organizations or whatever
sounds like a request for a network plot. Using the network package:
library(network)
# Construct a sparse graph
m<-matrix(rbinom(100,1,1.5/9),10)
diag(m)<-0
g<-network(m)
#Plot the graph
plot(g)
This produces a plot of the network.
Alternatively, and perhaps more relevant to your problem, you could use the qgraph package. For example:
require(qgraph)
set.seed(1)
adj = matrix(sample(0:1, 10^2, TRUE, prob = c(0.8, 0.2)), nrow = 10, ncol = 10)
qgraph(adj)
title("Unweighted and directed graphs", line = 2.5)
This returns a network graph.
If you are looking for more examples, refer to the excellent page by Sacha Epskamp on how to use qgraph.
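To apply either package to the data in the question, the Var1/Var2/Freq edge list first has to become a weighted adjacency matrix. The following is a minimal base R sketch with the edge list retyped from the question. The names edges, nodes and adj are illustrative, and the commented-out qgraph call assumes that package is installed:

```r
# Edge list retyped from the question (Var1 = from, Var2 = to, Freq = weight)
edges <- data.frame(
  Var1 = c("F", "F", "B", "E", "A", "B", "A", "A", "D", "E"),
  Var2 = c("A", "B", "C", "C", "D", "D", "E", "F", "F", "F"),
  Freq = c(5, 38, 10, 28, 8, 21, 50, 34, 50, 14),
  stringsAsFactors = FALSE
)

# All node names that occur on either side of an edge
nodes <- sort(unique(c(edges$Var1, edges$Var2)))

# Empty square matrix, then fill one cell per edge by (row name, column name)
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$Var1, edges$Var2)] <- edges$Freq

# Assumption: with qgraph installed, the weights can be shown as edge labels
# qgraph::qgraph(adj, edge.labels = TRUE)
```

Rows are the "from" nodes and columns the "to" nodes, so adj["F", "A"] holds the frequency of the F to A connection.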


Loop printing lots of graphs in order (PDF) using ggplot2 in R

I have a large dataset as a result of a Bayesian logistic regression. The dataset contains parameter estimates, confidence intervals, etc. (see below for the head).
mean sd confint_2.5 confint_97.5 Rhat median spec Errorup Errordown
1 -0.7897597 0.18668304 -1.1759960 -0.4517294 1.002211 -0.7811156 Marvulg -0.3293862 -1.957112
2 -0.7891327 0.08145761 -0.9570086 -0.6380287 1.000155 -0.7861764 Viotric -0.1481477 -1.743185
3 -0.6619662 0.26049168 -1.2203315 -0.2059030 1.045208 -0.6440501 Antdioi -0.4381470 -1.864382
4 -0.6571516 0.17940842 -1.0417642 -0.3364415 1.008100 -0.6470382 Eleacic -0.3105968 -1.688802
5 -0.6526717 0.20005184 -1.0816375 -0.2968111 1.005126 -0.6394952 Antcotu -0.3426842 -1.721133
6 -0.6497648 0.16620699 -1.0081607 -0.3555847 1.003738 -0.6384035 Triflav -0.2828188 -1.646564
I have a total of 714 rows of data, sorted by mean from low to high. I use this code to plot 50 rows at a time, where a3_sort is a subset of 50 rows (so I manually run a3_sort <- a3[n:n, ], print the plot for that subset, and proceed to the next 50):
ggplot2::ggplot(data = a3_sort, mapping = aes(x = reorder(spec, mean), y = mean, ymin = confint_2.5, ymax = confint_97.5))+
geom_pointrange()+
geom_hline(yintercept = 0, lty = 2)+
coord_flip()+
xlab ("species") +ylab ("mean (credibility interval)")+
theme_bw()
This works, and I get what I want, but there must be a less labour-intensive way to do this.
My question: Is there a way to loop this procedure, automatically saving the PDF in the working directory?
Below is an example of what one plot looks like:
You can try this solution. I tested it with a dummy data frame DF with 714 rows and the same columns as yours. In your case, DF is your sorted data frame of 714 rows. The window size is set in one place, so you can change it if you need something other than 50.
library(zoo)
library(ggplot2)
# Create keys; change 50 if you want a larger window
keys <- seq(1, nrow(DF), 50)
vals <- seq_along(keys)
#Flag to allocate the position and values
#na.locf is used to complete NA so that we have same index
DF$Flag <- NA
DF$Flag[keys]<-vals
DF$Flag <- na.locf(DF$Flag)
#Then split by flag
ListData <- split(DF,DF$Flag)
#Function to create plot
myplot <- function(x)
{
tplot <- ggplot2::ggplot(data = x, mapping = aes(x = reorder(spec, mean), y = mean, ymin = confint_2.5, ymax = confint_97.5))+
geom_pointrange()+
geom_hline(yintercept = 0, lty = 2)+
coord_flip()+
xlab ("species") +ylab ("mean (credibility interval)")+
theme_bw()
return(tplot)
}
#Replicate plots
LPlots <- lapply(ListData,myplot)
#Export to pdf
pdf('Myplots.pdf',width = 14)
for (i in seq_along(LPlots))
{
print(LPlots[[i]])
}
dev.off()
In the end, you will have your plots in pdf. I hope this helps. Let me know if you have any doubt.
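The block index can also be computed without zoo. The sketch below is a base R alternative to the keys/na.locf step above, not the answer's own method. It uses a placeholder data frame DF with 714 rows, and chunk_size is an illustrative name:

```r
# Placeholder data: 714 rows, standing in for the sorted data frame
DF <- data.frame(id = seq_len(714))

# Rows 1-50 get index 1, rows 51-100 get index 2, and so on
chunk_size <- 50
DF$Flag <- ceiling(seq_len(nrow(DF)) / chunk_size)

# One data frame per block, ready for lapply(ListData, myplot)
ListData <- split(DF, DF$Flag)

# Number of rows in each block; the last block holds the remainder
block_sizes <- vapply(ListData, nrow, integer(1))
```

With 714 rows and a window of 50, this gives 14 full blocks and a final block of 14 rows.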
This approach could be adapted to your case:
# Some dummy data:
df <- data.frame(g = letters[1:24],
min = sample(0:10, 24, replace = TRUE),
mid = sample(11:20, 24, replace = TRUE),
max = sample(21:30, 24, replace = TRUE))
library(ggplot2)
library(purrr)
# list of the rows you want printing, this could be automated
plot_range <- list(p1_6 = 1:6, p7_12 = 7:12, p13_18 = 13:18, p19_24 = 19:24)
# plotting function which also sets a title and plot name
gg_plot <- function(df, plot_rows){
title <- paste("Automatic plot rows: ", min(plot_rows), "to", max(plot_rows))
plot_nm <- paste("plots", min(plot_rows), max(plot_rows), sep = "_")
p <- ggplot(df[plot_rows, ])+
geom_segment(aes(x = min , xend = max, y = g, yend = g))+
geom_point(aes(x = mid, y = g))+
ggtitle(title)
print(ggsave(paste0(plot_nm, ".pdf"), p, device = "pdf"))
}
# purrr function which acts as a loop to print each graph and allows a different data frame to be used.
walk(plot_range, ~gg_plot(df = df, plot_rows = .x))
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
Created on 2020-07-11 by the reprex package (v0.3.0)
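The comment in the code notes that building plot_range could be automated. One possible sketch in base R follows; make_plot_range is a hypothetical helper, not part of the answer, and it reproduces the naming scheme used there:

```r
# Hypothetical helper: split 1..n_rows into consecutive blocks of chunk_size
make_plot_range <- function(n_rows, chunk_size) {
  starts <- seq(1, n_rows, by = chunk_size)
  # The last block is cut off at n_rows if it would run past the end
  ends <- pmin(starts + chunk_size - 1, n_rows)
  ranges <- Map(seq, starts, ends)
  names(ranges) <- paste0("p", starts, "_", ends)
  ranges
}

# Same four blocks as the hand-written list in the answer
plot_range <- make_plot_range(24, 6)

# For the question's data this would be make_plot_range(714, 50)
big_range <- make_plot_range(714, 50)
```

The result can be passed to walk() exactly like the hand-written list.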

Visualising diagonal in asymmetric matrix plot

I have a number of symmetric matrices of the same dimensionality, and I wish to visualise the mean and variance of the values in each cell across these matrices in an elegant way (which I will make more precise below) that makes use of the symmetric character.
Let me start by making some data to illustrate. The following creates 10 9x9 matrices, aggregates the mean and variance, and transforms to long format in preparation for plotting:
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
make_matrix <- function(n) {
m <- matrix(NA, nrow = n, ncol = n)
m[lower.tri(m)] <- runif((n^2 - n) / 2)
m <- pmax(m, t(m), na.rm = TRUE)
diag(m) <- runif(n)
rownames(m) <- colnames(m) <- letters[1:n]
m
}
matrices <- replicate(10, make_matrix(9))
means <- apply(matrices, 1:2, mean) %>%
as_tibble(rownames = "row") %>%
pivot_longer(-1, names_to = "col", values_to = "mean")
vars <- apply(matrices, 1:2, var) %>%
as_tibble(rownames = "row") %>%
pivot_longer(-1, names_to = "col", values_to = "var")
df <- full_join(means, vars, by = c("row", "col"))
head(df)
#> # A tibble: 6 x 4
#> row col mean var
#> <chr> <chr> <dbl> <dbl>
#> 1 a a 0.548 0.111
#> 2 a b 0.507 0.0914
#> 3 a c 0.374 0.105
#> 4 a d 0.350 0.0976
#> 5 a e 0.525 0.0752
#> 6 a f 0.452 0.0887
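For readers unfamiliar with the aggregation step: replicate() stacks the matrices into a three-dimensional array, and apply() over margins 1:2 collapses the third dimension cell by cell. A tiny worked sketch with two hand-made 2x2 matrices (m1, m2 and arr are illustrative names):

```r
# Two small symmetric matrices with known values
m1 <- matrix(c(1, 2, 2, 3), nrow = 2)
m2 <- matrix(c(3, 4, 4, 5), nrow = 2)

# Stack them along a third dimension, as replicate() does in the question
arr <- array(c(m1, m2), dim = c(2, 2, 2))

# Margins 1:2 keep rows and columns; the function sees the values
# of one cell across all matrices
cell_means <- apply(arr, 1:2, mean)
cell_vars <- apply(arr, 1:2, var)
```

Here every cell differs by exactly 2 between the two matrices, so each cell has sample variance 2, and the means are 2, 3, 3 and 4.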
Now, I could simply use geom_tile to make one plot of the means, and one plot of the variances. However, considering that both of these are symmetric, this wastes quite a lot of space, and also fails to communicate the symmetric character to the audience.
To address this problem, I have been playing around with the ggasym package to create an asymmetric matrix plot. The following is a slight modification from the ggasym vignette:
library(ggasym)
library(ggplot2)
ggplot(df, aes(x = col, y = row)) +
geom_asymmat(aes(fill_diag = mean, fill_tl = mean, fill_br = var)) +
scale_fill_diag_gradient(limits = c(0, 1), low = "lightpink", high = "tomato") +
scale_fill_tl_gradient(limits = c(0, 1), low = "lightpink", high = "tomato") +
scale_fill_br_gradient(low = "lightblue1", high = "dodgerblue") +
geom_text(data = filter(df, row == col), aes(label = signif(var, 2)))
Created on 2020-06-27 by the reprex package (v0.3.0)
What bothers me about this is the diagonal. In the above, I have mapped the fill of the diagonal to the means, and overlaid the variance by text, which works, but doesn't seem great. Specifically, I would like to map all the information here to fill, so as to get rid of the text. I see a couple of options for how to do this, but I am not sure how to implement any of them:
1. Split the fill of the diagonal cells, so that (in the example above) the lower right of each cell on the diagonal is an appropriate shade of blue, while the upper left is some shade of red.
2. Plot the upper and lower matrices separately (each with the diagonal), and then somehow "overlay" these plots so that they end up next to each other in an appropriate way. In other words, this would plot the diagonal twice.
I am open to other suggestions for how to accomplish this in a clean way. Let me emphasise that I do not require a solution building on ggasym, this was simply the closest I have been able to get so far. However, I would like some kind of ggplot-based solution.
So here is my take on the 'split-the-fill' strategy. You can plot most of the things you would want in ggplot if you don't mind parameterising your stuff as polygons. We let the ggnewscale package handle the double fill mapping for us.
First off, we no longer autoname the matrices, as we will not use the dimnames.
suppressPackageStartupMessages({
library(ggplot2)
library(tidyr)
library(dplyr)
library(ggnewscale)
})
make_matrix <- function(n) {
m <- matrix(NA, nrow = n, ncol = n)
m[lower.tri(m)] <- runif((n^2 - n) / 2)
m <- pmax(m, t(m), na.rm = TRUE)
diag(m) <- runif(n)
# rownames(m) <- colnames(m) <- letters[1:n]
m
}
Below is a function that takes a matrix, parameterises it as a polygon and cuts off one half.
halfmat <- function(mat, side) {
side <- match.arg(side, c("upper", "lower", "both"))
# Convert to long format
dat <- data.frame(
x = as.vector(row(mat)),
y = as.vector(col(mat)),
id = seq_along(mat),
value = as.vector(mat)
)
# Parameterise as polygon
poly <- with(dat, data.frame(
x = c(x - 0.5, x + 0.5, x + 0.5, x - 0.5),
y = c(y - 0.5, y - 0.5, y + 0.5, y + 0.5),
id = rep(id, 4),
value = rep(value, 4)
))
# Slice off one of the triangles
if (side == "upper") {
poly <- filter(poly, y >= x)
} else if (side == "lower") {
poly <- filter(poly, x >= y)
}
poly
}
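To see why filtering corner points slices the diagonal cells into triangles, it helps to look at single cells. The sketch below repeats the corner construction and the "upper" filter from halfmat for three cells; cell_corners and keep_upper are illustrative helper names:

```r
# Four corners of the unit square centred on (x, y), as in halfmat
cell_corners <- function(x, y) {
  data.frame(
    x = c(x - 0.5, x + 0.5, x + 0.5, x - 0.5),
    y = c(y - 0.5, y - 0.5, y + 0.5, y + 0.5)
  )
}

# The "upper" filter: keep only corners on or above the line y = x
keep_upper <- function(corners) corners[corners$y >= corners$x, ]

on_diagonal <- keep_upper(cell_corners(2, 2))     # 3 corners: a triangle
above_diagonal <- keep_upper(cell_corners(1, 3))  # 4 corners: full square
below_diagonal <- keep_upper(cell_corners(3, 1))  # 0 corners: nothing drawn
```

Cells directly next to the diagonal on the removed side keep a single corner, which geom_polygon cannot fill, so they also disappear from that layer.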
Then we generate the data, compute the means and variances and reparameterise them.
matrices <- replicate(10, make_matrix(9))
means <- apply(matrices, 1:2, mean) %>% halfmat("upper")
vars <- apply(matrices, 1:2, var) %>% halfmat("lower")
Then we put in the means and variances as two separate polygon layers, since we need to separate the fill mappings with new_scale_fill(). There is a bit of extra fiddling with the scales, as these are now continuous instead of discrete, but it is not that bad.
ggplot(means, aes(x, y, fill = value, group = id)) +
geom_polygon() +
scale_fill_distiller(palette = "Reds", name = "Mean") +
# Be sure to call new_scale_fill() only after you've set up a fill scale
# for the upper part
new_scale_fill() +
geom_polygon(data = vars, aes(fill = value)) +
scale_fill_distiller(palette = "Blues", name = "Variance") +
scale_x_continuous(breaks = function(x){seq(x[1] + 0.5, x[2] - 0.5, by = 1)},
labels = function(x){letters[x]},
expand = c(0,0), name = "col") +
scale_y_continuous(breaks = function(x){seq(x[1] + 0.5, x[2] - 0.5, by = 1)},
labels = function(x){letters[x]},
expand = c(0,0), name = "row")
Created on 2020-06-27 by the reprex package (v0.3.0)

geom_density plots with nested vectors

I have a data frame with a nested vector in one column. Any ideas how to ggplot a geom_density using the values from the nested vector?
If I use pivot_longer on the entire data frame, I get 25 million rows, so I'd prefer to avoid that if possible.
library(ggplot2)
df = data.frame(a = rep(letters[1:5],length.out = 100), b = sample(LETTERS, 100, replace = T))
df[["c"]] = purrr::map(1:100, function(x) rnorm(100))
# works but too heavy for the actual implementation
ggplot(tidyr::unnest(df, c), aes(c, group = a)) + geom_density() + facet_wrap(vars(b))
# doesn't work
ggplot(df, aes(c, group = a)) + geom_density() + facet_wrap(vars(b))
Different solution: Prepare each plot separately and rearrange your plots afterwards using gridExtra package.
library(ggplot2)
df = data.frame(a = rep(letters[1:5],length.out = 100), b = sample(LETTERS, 100, replace = T))
df[["c"]] = purrr::map(1:100, function(x) rnorm(100))
lst_plot <- lapply(sort(unique(df$b)), function(x){
data <- df[df$b == x, ]
data <- purrr::map_dfr(seq(length(data$a)), ~ data.frame(a = data$a[.x], c = data$c[.x][[1]]))
gg <- ggplot(data) +
geom_density(aes(c, group = a)) +
ylab(NULL)
return(gg)
})
gridExtra::grid.arrange(grobs = lst_plot, ncol = 6, left = "density")
To be honest, I'm not sure how well this works with your massive dataset...
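If the size of the unnested data is the main concern, another option is to compute the density curves yourself and plot only the resulting curve points. This is a sketch under that assumption, not a tested solution for the 25-million-row case. It pools the nested values per combination of a and b, calls stats::density() with its default bandwidth, and keeps 128 points per curve. The names idx and dens are illustrative, and the plotting part only runs if ggplot2 is installed:

```r
set.seed(1)
df <- data.frame(a = rep(letters[1:5], length.out = 100),
                 b = sample(LETTERS, 100, replace = TRUE),
                 stringsAsFactors = FALSE)
df$c <- lapply(1:100, function(i) rnorm(100))

# Row indices for every observed combination of a and b (names like "a.K")
idx <- split(seq_len(nrow(df)), list(df$a, df$b), drop = TRUE)

# One small data frame of curve points per combination
dens <- do.call(rbind, lapply(names(idx), function(key) {
  values <- unlist(df$c[idx[[key]]])
  d <- density(values, n = 128)
  parts <- strsplit(key, ".", fixed = TRUE)[[1]]
  data.frame(a = parts[1], b = parts[2], x = d$x, y = d$y)
}))

# Plot the precomputed curves as lines instead of calling geom_density
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dens, aes(x, y, group = a)) +
    geom_line() +
    facet_wrap(vars(b))
}
```

The plotted data then has 128 rows per curve regardless of how many raw values each nested vector holds.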

Filter in ggplot2's geoms using common aesthetics and data frames across geoms

Say I have the following data frame:
# Dummy data frame
df <- data.frame(x = rep(1:5, 2), y = runif(10), z = rep(c("A", "B"), each = 5))
# x y z
# 1 1 0.92024937 A
# 2 2 0.37246007 A
# 3 3 0.76632809 A
# 4 4 0.03418754 A
# 5 5 0.33770400 A
# 6 1 0.15367174 B
# 7 2 0.78498276 B
# 8 3 0.03341913 B
# 9 4 0.77484244 B
# 10 5 0.13309999 B
I'd like to plot cases where z == "A" as points and cases where z == "B" as lines. Simple enough.
library(ggplot2)
library(dplyr)
# Plot data
g <- ggplot()
g <- g + geom_point(data = df %>% filter(z == "A"), aes(x = x, y = y))
g <- g + geom_line(data = df %>% filter(z == "B"), aes(x = x, y = y))
g
My data frame and aesthetic for the points and lines are identical, so this seems a bit verbose – especially if I want to do this lots of times (e.g., z == "A" through z == "Z"). Is there a way that I could state ggplot(df, aes(x = x, y = y)) and then subsequently state my filtering or subsetting criteria within the appropriate geoms?
I find the example in the question itself the most readable, although it is verbose. The second part of the question, about dealing with more cases, just requires a more sophisticated test in filter, for example %in% (or grep, grepl, etc.) to match multiple values. Layers can access the plot's default data, and, as mentioned by @MrFlick, moving the aesthetic mapping out of the individual layers into ggplot() makes the code more concise. All earlier answers get the plot done, so in this respect my answer is not better than any of them.
library(ggplot2)
library(dplyr)
df <- data.frame(x = rep(1:5, 4),
y = runif(20),
z = rep(c("A", "B", "C", "Z"), each = 5))
g <- ggplot(data = df, aes(x = x, y = y)) +
geom_point(data = . %>% filter(z %in% c("A", "B", "C"))) +
geom_line(data = . %>% filter(z == "Z"))
g
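The data argument of a layer also accepts a plain function, which ggplot2 calls with the plot's default data. That allows the same pattern without dplyr. In this sketch only_z is a hypothetical helper name, and the plot itself is only built if ggplot2 is installed:

```r
df <- data.frame(x = rep(1:5, 2),
                 y = runif(10),
                 z = rep(c("A", "B"), each = 5))

# Hypothetical helper: returns a function that subsets its input on z
only_z <- function(values) {
  function(d) d[d$z %in% values, ]
}

# The helper can be checked on its own, independent of any plot
rows_a <- only_z("A")(df)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Each layer filters the default data through its own function
  g <- ggplot(df, aes(x = x, y = y)) +
    geom_point(data = only_z("A")) +
    geom_line(data = only_z("B"))
}
```

Because only_z takes a vector, a call such as only_z(c("A", "B", "C")) covers the many-cases scenario as well.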
Another option would be to spread the data and then just supply the y aesthetic.
library(tidyverse)
df %>% spread(z,y) %>%
ggplot(aes(x = x))+
geom_point(aes(y = A))+
geom_line(aes(y = B))
You can map linetype and shape to z for all records, and then suppress the unwanted lines and points by passing NA to scale_linetype_manual and scale_shape_manual:
library(ggplot2)
ggplot(df, aes(x, y, linetype = z, shape = z)) +
geom_line() +
geom_point() +
scale_linetype_manual(values = c(NA, 1)) + # no line for A, solid line for B
scale_shape_manual(values = c(16, NA)) # points for A, no points for B

ggplot display geom_segment as a sequence of points

I am trying to display some data, where I don't only need to display a point using geom_point, but also want to trace a line to it from the axis. I figured I can do it with geom_segment, but I want to display a sequence of discrete dots instead.
Say I have a data like this:
df2 <- tibble::tibble(x = c("a", "b", "c", "d"), y = 3:6)
# A tibble: 4 × 2
x y
<chr> <int>
1 a 3
2 b 4
3 c 5
4 d 6
What I want is like the graph below, but with a dot at each step between 0 and the value for each of the 4 categories (the desired points are marked manually in red):
ggplot(df2, aes(x=x)) + geom_point(aes(y=y)) + geom_point(aes(y=0))
This works... you could wrap it up in a function to make it more generalizable if needed.
First we use expand.grid to create all combinations of x and 1:(max(y) - 1), join it to the original data, and filter out the unnecessary ones.
library(dplyr)
df3 = left_join(expand.grid(x = unique(df2$x), i = 1:max(df2$y - 1)),
df2) %>%
filter(i < y)
Once the data is constructed, the plotting is easy:
ggplot(df2, aes(x=x)) +
geom_point(aes(y=y)) +
geom_point(y = 0) +
geom_point(data = df3, aes(y = i), color = "red") +
expand_limits(y = 0)
I'm not sure if you actually want the dots to be red. If you want them all to look the same, you could use 1:max(df2$y) (omit the -1), use <= in the filter, and then plot only the resulting data frame.
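The intermediate points can also be generated directly, without building the full grid and filtering it. A minimal base R sketch that produces the same x and i values as the join-and-filter step above (the row order may differ), using rep() and sequence():

```r
df2 <- data.frame(x = c("a", "b", "c", "d"), y = 3:6,
                  stringsAsFactors = FALSE)

# For each x, the integers 1, 2, ..., y - 1
df3 <- data.frame(
  x = rep(df2$x, times = df2$y - 1),
  i = sequence(df2$y - 1)
)
```

Here sequence(c(2, 3, 4, 5)) concatenates 1:2, 1:3, 1:4 and 1:5, so df3 has 14 rows and can be passed to the red geom_point layer unchanged.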
If you wanted to use a data.table approach, using a similar expansion methodology you could use:
library(data.table)
dt <- setDT(df2)
dt_expand<-dt[rep(seq(nrow(dt)),dt$y),]
dt_expand[,y2:=(1:.N),by=.(x)]
ggplot(dt_expand, aes(x=x)) + geom_point(aes(y=y2)) + geom_point(aes(y=0))
Note that I didn't include the red coloring, but that is easily done if you want it.
Here is a solution that prepares the data in base R. The idea is to create two different datasets, one for the red points:
dat1 <- do.call(rbind,Map(function(x,y)data.frame(x=x,y=seq(0,y)),df2$x,df2$y))
And another for the black points:
dat2 <- do.call(rbind,Map(function(x,y)data.frame(x=x,y=c(0,y)),df2$x,df2$y))
Then the plot is just the superposition of two layers of the same plot, each with different data:
library(ggplot2)
ggplot(data=dat1,aes(x=x,y=y)) +
geom_point(col="red") +
geom_point(data=dat2)
Yet another option, which is similar to @Gregor's in that it creates a new data vector.
d <- data.frame(x = c("a", "b", "c" ,"d"), y = c(3:6))
new_points <- mapply(seq, 0, d$y)
new <- data.frame(new = unlist(lapply(new_points, as.data.frame)),
x = rep(letters[1:4], d$y + 1),
group = 1)
d <- merge(d, new, by = "x")
d$group <- as.factor(ifelse(d$y == d$new|d$new == 0, 2, d$group))
ggplot(d, aes(x, new, color = group)) +
geom_point() +
scale_color_manual(values = c("red", "black")) +
theme(legend.position = "none")
