Visualising diagonal in asymmetric matrix plot - r

I have a number of symmetric matrices of the same dimensionality, and I wish to visualise the mean and variance of the values in each cell across these matrices in an elegant way (which I will make more precise below) that makes use of the symmetric character.
Let me start by making some data to illustrate. The following creates 10 9x9 matrices, aggregates the mean and variance, and transforms to long format in preparation for plotting:
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
make_matrix <- function(n) {
m <- matrix(NA, nrow = n, ncol = n)
m[lower.tri(m)] <- runif((n^2 - n) / 2)
m <- pmax(m, t(m), na.rm = TRUE)
diag(m) <- runif(n)
rownames(m) <- colnames(m) <- letters[1:n]
m
}
matrices <- replicate(10, make_matrix(9))
means <- apply(matrices, 1:2, mean) %>%
as_tibble(rownames = "row") %>%
pivot_longer(-1, names_to = "col", values_to = "mean")
vars <- apply(matrices, 1:2, var) %>%
as_tibble(rownames = "row") %>%
pivot_longer(-1, names_to = "col", values_to = "var")
df <- full_join(means, vars, by = c("row", "col"))
head(df)
#> # A tibble: 6 x 4
#> row col mean var
#> <chr> <chr> <dbl> <dbl>
#> 1 a a 0.548 0.111
#> 2 a b 0.507 0.0914
#> 3 a c 0.374 0.105
#> 4 a d 0.350 0.0976
#> 5 a e 0.525 0.0752
#> 6 a f 0.452 0.0887
Now, I could simply use geom_tile to make one plot of the means, and one plot of the variances. However, considering that both of these are symmetric, this wastes quite a lot of space, and also fails to communicate the symmetric character to the audience.
To address this problem, I have been playing around with the ggasym package to create an asymmetric matrix plot. The following is a slight modification from the ggasym vignette:
library(ggasym)
library(ggplot2)
ggplot(df, aes(x = col, y = row)) +
geom_asymmat(aes(fill_diag = mean, fill_tl = mean, fill_br = var)) +
scale_fill_diag_gradient(limits = c(0, 1), low = "lightpink", high = "tomato") +
scale_fill_tl_gradient(limits = c(0, 1), low = "lightpink", high = "tomato") +
scale_fill_br_gradient(low = "lightblue1", high = "dodgerblue") +
geom_text(data = filter(df, row == col), aes(label = signif(var, 2)))
Created on 2020-06-27 by the reprex package (v0.3.0)
What bothers me about this is the diagonal. In the above, I have mapped the fill of the diagonal to the means, and overlaid the variance by text, which works, but doesn't seem great. Specifically, I would like to map all the information here to fill, so as to get rid of the text. I see a couple of options for how to do this, but I am not sure how to implement any of them:
Split the fill of the diagonal cells, so that (in the example above) the lower right of each cell on the diagonal is an appropriate shade of blue, while the upper left is some shade of red.
Plot the upper and lower matrices separately (each with the diagonal), and then somehow "overlay" these plots so that they end up next to each other in an appropriate way. In other words, this would plot the diagonal twice.
I am open to other suggestions for how to accomplish this in a clean way. Let me emphasise that I do not require a solution building on ggasym, this was simply the closest I have been able to get so far. However, I would like some kind of ggplot-based solution.

So here is my take on the 'split-the-fill' strategy. You can plot most of the things you would want in ggplot if you don't mind parameterising your stuff as polygons. We let the ggnewscale package handle the double fill mapping for us.
First off, we no longer autoname the matrices, as we will not use the dimnames.
suppressPackageStartupMessages({
library(ggplot2)
library(tidyr)
library(dplyr)
library(ggnewscale)
})
make_matrix <- function(n) {
m <- matrix(NA, nrow = n, ncol = n)
m[lower.tri(m)] <- runif((n^2 - n) / 2)
m <- pmax(m, t(m), na.rm = TRUE)
diag(m) <- runif(n)
# rownames(m) <- colnames(m) <- letters[1:n]
m
}
Below is a function that takes a matrix, parameterises it as a polygon and cuts off one half.
halfmat <- function(mat, side) {
side <- match.arg(side, c("upper", "lower", "both"))
# Convert to long format
dat <- data.frame(
x = as.vector(row(mat)),
y = as.vector(col(mat)),
id = seq_along(mat),
value = as.vector(mat)
)
# Parameterise as polygon
poly <- with(dat, data.frame(
x = c(x - 0.5, x + 0.5, x + 0.5, x - 0.5),
y = c(y - 0.5, y - 0.5, y + 0.5, y + 0.5),
id = rep(id, 4),
value = rep(value, 4)
))
# Slice off one of the triangles
if (side == "upper") {
poly <- filter(poly, y >= x)
} else if (side == "lower") {
poly <- filter(poly, x >= y)
}
poly
}
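To see why the slicing step splits a diagonal cell in two, here is a minimal base R sketch of what happens to the four corners of a single diagonal cell. It reuses the same comparisons as halfmat(); the cell position (2, 2) is an arbitrary choice for illustration.

```r
# Corners of the unit square centred on the diagonal cell (2, 2),
# in the same order that halfmat() builds them
cell <- data.frame(
  x = 2 + c(-0.5, 0.5, 0.5, -0.5),
  y = 2 + c(-0.5, -0.5, 0.5, 0.5)
)

upper <- cell[cell$y >= cell$x, ]  # same test as side = "upper"
lower <- cell[cell$x >= cell$y, ]  # same test as side = "lower"

nrow(upper)  # 3 corners remain, so the square becomes a triangle
nrow(lower)  # 3 corners remain, the complementary triangle
```

Cells fully above or below the diagonal keep all four corners on their own side, so only the diagonal cells are cut.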
Then we generate the data, compute the means and variances and reparameterise them.
matrices <- replicate(10, make_matrix(9))
means <- apply(matrices, 1:2, mean) %>% halfmat("upper")
vars <- apply(matrices, 1:2, var) %>% halfmat("lower")
Then we put in the means and variances as two separate polygon layers, since we need to separate the fill mappings with new_scale_fill(). There is a bit of extra fiddling with the scales, as these are now continuous instead of discrete, but it is not that bad.
ggplot(means, aes(x, y, fill = value, group = id)) +
geom_polygon() +
scale_fill_distiller(palette = "Reds", name = "Mean") +
# Be sure to call new_scale_fill() only after you've set up a fill scale
# for the upper part
new_scale_fill() +
geom_polygon(data = vars, aes(fill = value)) +
scale_fill_distiller(palette = "Blues", name = "Variance") +
scale_x_continuous(breaks = function(x){seq(x[1] + 0.5, x[2] - 0.5, by = 1)},
labels = function(x){letters[x]},
expand = c(0,0), name = "col") +
scale_y_continuous(breaks = function(x){seq(x[1] + 0.5, x[2] - 0.5, by = 1)},
labels = function(x){letters[x]},
expand = c(0,0), name = "row")
Created on 2020-06-27 by the reprex package (v0.3.0)

Related

How to draw a multi-colored dashed line (alternating colors for visual effect) [duplicate]

I was wondering if it is possible to create a multicolored dashed line in ggplot.
Basically I have a plot displaying savings based on two packages.
An orange line with savings based on package A
A green line with savings based on package B
I also have a third line, and I would like that one to be dashed, alternating between orange and green. Is that something that somebody has been able to do?
Here is an example:
library(tidyverse)
S <- seq(0, 5, by = 0.05)
a <- S ^ 2
b <- S
a_b = a + b #This data should have the dashed multicolor line, since it is the sum of the other two lines.
S <- data.frame(S)
temp <- cbind(S, a, b, a_b)
temp <- gather(temp, variable, value, -S)
desiredOrder <- c("a", "b", "a_b")
temp$variable <- factor(temp$variable, levels = desiredOrder)
temp <- temp[order(temp$variable),]
p <- ggplot(temp, aes(x = S, y = value, colour = variable)) +
theme_minimal() +
geom_line(size = 1) +
scale_color_manual(name = "Legend", values = c("orange", "green", "#0085bd"),
breaks = c("a", "b", "a_b"))
p
I basically want to have a multicolored (dashed or dotted) line for "a_b"
This is, to my best knowledge, currently only possible via creation of new segments for each alternate color. This is fiddly.
Below I've tried a largely programmatic approach in which you can define the size of the repeating segment (based on your x unit). The positioning of y values is slightly convoluted, and it will also result in slightly irregular segment lengths when dealing with different slopes. I also haven't tested it on much data. But I guess it's a good start :)
For the legend, I'm taking the same approach, by creating a fake legend and stitching it onto the other plot. The challenges here include:
positioning of legend elements relative to the plot
relative distance between the legend elements
update
For a much neater way to create those segments and a Stat implementation see this thread
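As a rough illustration of the segment idea, here is a minimal base R sketch that cuts a curve into pieces of equal x width and gives consecutive pieces alternating colours. It swaps in linear interpolation with approx() rather than the if/else logic used in this answer, and the names new_x, new_y and segs are made up for the example.

```r
# Curve to be drawn with alternating colours (same a_b as in the question)
x <- seq(0, 5, by = 0.05)
y <- x^2 + x

# One break every 0.5 x units; interpolate y at each break
new_x <- seq(min(x), max(x), by = 0.5)
new_y <- approx(x, y, xout = new_x)$y

# Each row is one segment from (x, y) to (xend, yend)
n <- length(new_x)
segs <- data.frame(
  x = new_x[-n], y = new_y[-n],
  xend = new_x[-1], yend = new_y[-1]
)

# Recycle the colours along the segments
segs$color <- rep_len(c("orange", "green"), nrow(segs))
head(segs)
```

A data frame like segs could then be passed to geom_segment() together with scale_color_identity(). Because the pieces are straight, a coarse spacing will visibly cut corners on a strongly curved line.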
library(tidyverse)
library(patchwork)
S <- seq(0, 5, by = 0.05)
a <- S^2
b <- S
a_b <- a + b
df <- data.frame(x = S, a, b, a_b) %>%
pivot_longer(-x, names_to = "variable", values_to = "value")
## a function to create modifiable cuts in order to get segments.
## this looks convoluted - and it is! there are a few if/else statements.
## Why? The assignment of new y to x values depends on how many original values
## you have.
## There might be more direct ways to get there
alt_colors <- function(df, x, y, seg_length, my_cols) {
x <- df[[x]]
y <- df[[y]]
## create new x for each tiny segment
length_seg <- seg_length / length(my_cols)
new_x <- seq(min(x, na.rm = TRUE), x[length(x)], length_seg)
## now we need to interpolate y values for each new x
## This is different depending on how many x and new x you have
if (length(new_x) < length(x)) {
ind_int <- findInterval(new_x, x)
new_y <- sapply(seq_along(ind_int), function(i) {
if (y[ind_int[i]] == y[ind_int[length(ind_int)]]) {
y[ind_int[i]]
} else {
seq_y <- seq(y[ind_int[i]], y[ind_int[i] + 1], length.out = length(my_cols))
head(seq_y, -1)
}
})
} else {
ind_int <- findInterval(new_x, x)
rle_int <- rle(ind_int)
new_y <- sapply(rle_int$values, function(i) {
if (y[i] == y[max(rle_int$values)]) {
y[i]
} else {
seq_y <- seq(y[i], y[i + 1], length.out = rle_int$lengths[i] + 1)
head(seq_y, -1)
}
})
}
## This is also a bit painful and might cause other bugs that I haven't
## discovered yet.
if (length(unlist(new_y)) < length(new_x)) {
newdat <- data.frame(
x = new_x,
y = rep_len(unlist(new_y), length.out = length(new_x))
)
} else {
newdat <- data.frame(x = new_x, y = unlist(new_y))
}
newdat <- newdat %>%
mutate(xend = lead(x), yend = lead(y)) %>%
drop_na(xend)
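## recycle the colors explicitly so that an odd number of segments also works
newdat$color <- rep_len(my_cols, nrow(newdat))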
newdat
}
## the below is just a demonstration of how the function would work
## using different segment widths
df_alt1 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", 1, c("orange", "green"))
df_alt.5 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", .5, c("orange", "green"))
df_ab <-
df %>%
filter(variable != "a_b") %>%
# for the identity mapping
mutate(color = ifelse(variable == "a", "green", "orange"))
## create data frame for the legend, also using the alt_colors function as per above
## the amount of x is a bit of trial and error, this is just a quick hack
## this is a trick to center the legend more or less relative to the main plot
y_leg <- ceiling(mean(range(df$value, na.rm = TRUE)))
dist_y <- 2
df_legend <-
data.frame(
variable = rep(unique(df$variable), each = 2),
x = 1:2,
y = rep(seq(y_leg - dist_y, y_leg + dist_y, by = dist_y), each = 2)
)
df_leg_onecol <-
df_legend %>%
filter(variable != "a_b") %>%
mutate(color = ifelse(variable == "a", "green", "orange"))
df_leg_alt <-
df_legend %>%
filter(variable == "a_b") %>%
alt_colors("x", "y", .5, c("orange", "green"))
## I am mapping the colors globally using identity mapping (see scale_identity).
p1 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt1, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every 1 unit")
p.5 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt.5, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every .5 unit")
p_leg <-
ggplot(mapping = aes(x, y, colour = color)) +
theme_void() +
geom_line(data = df_leg_onecol, size = 1) +
geom_segment(data = df_leg_alt, aes(xend = xend, yend = yend), size = 1) +
scale_color_identity() +
annotate(
geom = "text", y = unique(df_legend$y), label = unique(df_legend$variable),
x = max(df_legend$x + 1), hjust = 0
)
## set y limits to the range of the main plot
## in order to make the labels visible you need to adjust the plot margin and
## turn clipping off
p1 + p.5 +
(p_leg + coord_cartesian(ylim = range(df$value), clip = "off") +
theme(plot.margin = margin(r = 20, unit = "pt"))) +
plot_layout(widths = c(1, 1, .2))
Created on 2022-01-18 by the reprex package (v2.0.1)
(Copied this over from Alternating color of individual dashes in a geom_line)
Here's a ggplot hack that is simple, but works for two colors only. It results in two lines being overlayed, one a solid line, the other a dashed line.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
melt(id.vars = "x_value", variable.name = "group", value.name ="y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels=c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x=x_value, y=y_value)) +
geom_line(aes(color=group, linetype=group), size=1) +
scale_color_manual(values=c("black", "red", "black")) +
scale_linetype_manual(values=c("solid", "solid", "dashed"))
Unfortunately the legend still needs to be edited by hand. Here's the example plot.

Simulate a two-dimensional random walk in a grid in R and plot with ggplot

I was looking for a simple code that could simulate a two-dimensional random walk in a grid (using R), and then plot the data using ggplot.
In particular, I was interested in a random walk from a few positions (5 points) in a 2D grid to the center of the square grid. It is just for visualisation purposes.
My idea was then to plot the results with ggplot on a discrete grid (like the one simulated), maybe using the function geom_tile.
Do you have any suggestions for pre-existing code that I could easily manipulate?
Here is a small example with a for loop. From here, you can simply adjust how X_t and Y_t are defined:
library(ggplot2)
Xt = 0; Yt = 0
for (i in 2:1000)
{
Xt[i] = Xt[i-1] + rnorm(1,0,1)
Yt[i] = Yt[i-1] + rnorm(1,0,1)
}
df <- data.frame(x = Xt, y = Yt)
ggplot(df, aes(x=x, y=y)) + geom_path() + theme_classic() + coord_fixed(1)
EDIT ----
After chatting with OP I've revised the code to include a step probability. This may result in the walk being stationary much more frequently. In higher dimensions, you will need to scale your prob factor lower in order to compensate for more options.
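To make the effect of the step probability concrete, here is a small sketch of the weight that the current position receives, using the same formula as the case_when() call in move_step(). The helper name stay_weight is hypothetical and only used for this illustration.

```r
# Weight of staying put when every other candidate cell gets weight `prob`;
# negative values are clamped to 0, as in move_step()
stay_weight <- function(n_opts, prob = 0.04) {
  max(0, 1 - prob * (n_opts - 1))
}

stay_weight(9)    # 3 x 3 neighbourhood in 2D: 0.68
stay_weight(125)  # 5 x 5 x 5 neighbourhood (3D, size = 2): 0
```

With 125 candidate cells the weight is clamped to zero, so the walk always moves. This is why prob has to be scaled down in higher dimensions if the walk should be able to pause.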
Finally, my function does not account for absolute distance; it only considers grid points that are within a certain step size in every dimension. For example, from position c(0,0) you could move to c(1,1) with this function. But I guess this is relative to the grid's connectivity.
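The difference between the two neighbourhood rules can be sketched in base R on a 3 x 3 grid. The variable names box and disc are made up for this example; box mimics the per-dimension test and disc the Euclidean distance test.

```r
grid <- expand.grid(x = -1:1, y = -1:1)
cur  <- c(x = 0, y = 0)
size <- 1

# Within `size` in every dimension separately (includes the diagonals)
box <- abs(grid$x - cur[["x"]]) <= size & abs(grid$y - cur[["y"]]) <= size

# Within `size` by Euclidean distance (excludes the diagonals)
disc <- sqrt((grid$x - cur[["x"]])^2 + (grid$y - cur[["y"]])^2) <= size

sum(box)   # 9 candidate cells
sum(disc)  # 5 candidate cells
```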
If the OP wants to only consider nodes that are within 1 (by distance) of the current position, then use the following version of move_step()
move_step <- function(cur_pos, grid, prob = 0.04, size = 1){
opts <- grid %>%
rowwise() %>%
mutate(across(.fns = ~(.x-.env$cur_pos[[cur_column()]])^2,
.names = '{.col}_square_diff')) %>%
filter(sqrt(sum(c_across(ends_with("_square_diff"))))<=.env$size) %>%
select(-ends_with("_square_diff")) %>%
left_join(y = mutate(cur_pos, current = TRUE), by = names(grid))
new_pos <- opts %>%
mutate(weight = case_when(current ~ 1-(prob*(n()-1)), #calculate chance to move,
TRUE ~ prob), #in higher dimensions, we may have more places to move
weight = if_else(weight<0, 0, weight)) %>% #thus depending on prob, we may always move.
sample_n(size = 1, weight = weight) %>%
select(-weight, -current)
new_pos
}
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
library(gganimate)
move_step <- function(cur_pos, grid, prob = 0.04, size = 1){
opts <- grid %>%
filter(across(.fns = ~ between(.x, .env$cur_pos[[cur_column()]]-.env$size, .env$cur_pos[[cur_column()]]+.env$size))) %>%
left_join(y = mutate(cur_pos, current = TRUE), by = names(grid))
new_pos <- opts %>%
mutate(weight = case_when(current ~ 1-(prob*(n()-1)), #calculate chance to move,
TRUE ~ prob), #in higher dimensions, we may have more places to move
weight = if_else(weight<0, 0, weight)) %>% #thus depending on prob, we may always move.
sample_n(size = 1, weight = weight) %>%
select(-weight, -current)
new_pos
}
sim_walk <- function(cur_pos, grid, grid_prob = 0.04, steps = 50, size = 1){
iterations <- cur_pos
for(i in seq_len(steps)){
cur_pos <- move_step(cur_pos, grid, prob = grid_prob, size = size)
iterations <- bind_rows(iterations, cur_pos)
}
iterations$i <- 1:nrow(iterations)
iterations
}
origin <- data.frame(x = 0, y =0)
small_grid <- expand.grid(x = -1:1, y = -1:1)
small_walk <- sim_walk(cur_pos = origin,
grid = small_grid)
ggplot(small_walk, aes(x, y)) +
geom_path() +
geom_point(color = "red") +
transition_reveal(i) +
labs(title = "Step {frame_along}") +
coord_fixed()
large_grid <- expand.grid(x = -10:10, y = -10:10)
large_walk <- sim_walk(cur_pos = origin,
grid = large_grid,
steps = 100)
ggplot(large_walk, aes(x,y)) +
geom_path() +
geom_point(color = "red") +
transition_reveal(i) +
labs(title = "Step {frame_along}") +
xlim(c(-10,10)) + ylim(c(-10,10))+
coord_fixed()
large_walk %>%
count(x, y) %>%
right_join(y = expand.grid(x = -10:10, y = -10:10), by = c("x","y")) %>%
mutate(n = if_else(is.na(n), 0L, n)) %>%
ggplot(aes(x,y)) +
geom_tile(aes(fill = n)) +
coord_fixed()
multi_dim_walk <- sim_walk(cur_pos = data.frame(x = 0, y = 0, z = 0),
grid = expand.grid(x = -20:20, y = -20:20, z = -20:20),
steps = 100, size = 2)
library(cowplot)
plot_grid(
ggplot(multi_dim_walk, aes(x, y)) + geom_path(),
ggplot(multi_dim_walk, aes(x, z)) + geom_path(),
ggplot(multi_dim_walk, aes(y, z)) + geom_path())
Created on 2021-05-06 by the reprex package (v1.0.0)
Here is a base R option using Reduce + replicate + plot for 2D random walk process
set.seed(0)
plot(
setNames(
data.frame(replicate(
2,
Reduce(`+`, rnorm(99), init = 0, accumulate = TRUE)
)),
c("X", "Y")
),
type = "o"
)
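For readers less familiar with Reduce(), the accumulating sum above gives the same path as cumsum() with a leading zero. A quick check, assuming nothing beyond base R:

```r
set.seed(0)
steps <- rnorm(99)

# Running total via Reduce(), as in the answer above
walk_reduce <- Reduce(`+`, steps, accumulate = TRUE, init = 0)

# Same running total via cumsum(), starting at the origin
walk_cumsum <- c(0, cumsum(steps))

all.equal(walk_reduce, walk_cumsum)
```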

Is there a programmatic way to pass specific ranges for the y-axis on a ggplot2 plot?

I've got plots that are being generated automatically based on some user inputs. Most of the time, the plots work fine. However, some users have requested to ensure that there is always an axis label on each end of the plotted data. For example, this plot:
sample_data <-
data.frame(
x = rep(LETTERS[1:3], each = 3)
, y = 1:9 + 0.5
)
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
)
Has no label above the top point or below the bottom point. I can add them easily enough with expand_limits:
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
expand_limits(y = c(2, 10))
However, because these plots are being automatically generated, I cannot manually add the next axis point each time. I've tried passing only.loose = TRUE to labeling::extended, but that still doesn't change the displayed values (any more than entering the values that I want would):
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(breaks = breaks_extended(only.loose = TRUE))
In addition, some of the plots are more complex than this (e.g., with or without confidence intervals, additional grouping, etc.), and the data is prepared for the plot using dplyr and piped directly into ggplot (with %>%). So, even something like recalculating the values is non-trivial.
In fact, even in this case, it fails because adding the expanded points to capture the next set of labels changes the labeling.
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(breaks = breaks_extended(n = 5
, only.loose = TRUE)) +
expand_limits(y =
sample_data %>%
group_by(x) %>%
summarise(my_mean = mean(y)) %>%
pull(my_mean) %>%
range() %>%
{labeling::extended(.[1], .[2], 5
, only.loose = TRUE)}
)
It appears that this happens because
labeling::extended(2.5, 8.5, 5, only.loose = TRUE)
returns the range 2 to 9 by 1's, while:
labeling::extended(2, 9, 5, only.loose = TRUE)
returns the range 2 to 10 by 2's. Somehow, breaks_extended is throwing in some added variation, though whether I track it down or not doesn't change much. I could work around this by calculating the breaks first, but (again) this is for a fairly complicated set of plots.
I feel like I am missing some sort of obvious point, but it keeps eluding me.
Yes, there is a programmatic way to set the limits on y-scales: provide a function to the limits argument. It is given the natural data limits as input, which you can then edit programmatically. The same goes for breaks, except that its input is the limits.
Example below, how this code should look exactly is up to your specifications.
library(ggplot2)
sample_data <- data.frame(
x = rep(LETTERS[1:3], each = 3),
y = 1:9 + 0.5
)
ggplot(sample_data,
aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(
limits = function(x) {
lower <- floor(x[1])
lower <- ifelse(x[1] - lower < 0.5, lower - 1, lower)
upper <- ceiling(x[2])
upper <- ifelse(upper - x[2] <= 0.5, upper + 1, upper)
c(lower, upper)
},
breaks = function(x) {
scales::breaks_pretty()(x)
}
)
#> Warning: Removed 3 rows containing missing values (geom_segment).
Created on 2021-03-23 by the reprex package (v1.0.0)
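To inspect what such a limits function returns, it can be pulled out and called on its own. The sketch below copies the body used above into a function with the hypothetical name widen_limits and applies it to two made-up ranges.

```r
# Same logic as the anonymous limits function above
widen_limits <- function(x) {
  lower <- floor(x[1])
  lower <- ifelse(x[1] - lower < 0.5, lower - 1, lower)
  upper <- ceiling(x[2])
  upper <- ifelse(upper - x[2] <= 0.5, upper + 1, upper)
  c(lower, upper)
}

widen_limits(c(2.5, 8.5))  # 2 10
widen_limits(c(2.2, 8.9))  # 1 10
```

Note that the lower bound uses a strict comparison and the upper bound does not, so a value exactly half a unit from an integer is treated differently at the two ends.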
Inspired by teunbrand, I built a function that generates the limits, then checks to ensure that the expansion (including the 5% buffer) does not change the output of pretty:
my_lims_expand <- function(x){
prev_pass <-
range(pretty(x))
curr_pass <-
pretty(c(prev_pass[1] - 0.05 * diff(prev_pass)
, prev_pass[2] + 0.05 * diff(prev_pass)))
last_under <-
tail(which(curr_pass < min(x)), 1)
first_over <-
head(which(curr_pass > max(x)), 1)
out <-
range(curr_pass[last_under:first_over])
confirm_out <-
range(pretty(out))
while(!all(out == confirm_out)){
prev_pass <- curr_pass
curr_pass <-
pretty(c(prev_pass[1] - 0.05 * diff(prev_pass)
, prev_pass[2] + 0.05 * diff(prev_pass)))
last_under <-
tail(which(curr_pass < min(x)), 1)
first_over <-
head(which(curr_pass > max(x)), 1)
out <-
range(curr_pass[last_under:first_over])
confirm_out <-
range(pretty(out))
}
return(out)
}
Then, I can use that function for limits:
ggplot(sample_data,
aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(
limits = my_lims_expand
, breaks = pretty
)
to generate the desired plot:

Loop printing lots of graphs in order (PDF) using ggplot2 in R

I have a large dataset as a result of a Bayesian logistic regression. The dataset contains parameter estimates, confidence intervals, etc. (see below for head).
mean sd confint_2.5 confint_97.5 Rhat median spec Errorup Errordown
1 -0.7897597 0.18668304 -1.1759960 -0.4517294 1.002211 -0.7811156 Marvulg -0.3293862 -1.957112
2 -0.7891327 0.08145761 -0.9570086 -0.6380287 1.000155 -0.7861764 Viotric -0.1481477 -1.743185
3 -0.6619662 0.26049168 -1.2203315 -0.2059030 1.045208 -0.6440501 Antdioi -0.4381470 -1.864382
4 -0.6571516 0.17940842 -1.0417642 -0.3364415 1.008100 -0.6470382 Eleacic -0.3105968 -1.688802
5 -0.6526717 0.20005184 -1.0816375 -0.2968111 1.005126 -0.6394952 Antcotu -0.3426842 -1.721133
6 -0.6497648 0.16620699 -1.0081607 -0.3555847 1.003738 -0.6384035 Triflav -0.2828188 -1.646564
I have a total of 714 rows of data, sorted (mean) from low to high. I use this code to plot 50 at a time, where a3_sort is a subset of 50 rows of data (so manually doing a3_sort <- a3[n:n, ], after which I print the subset and proceed to the next 50):
ggplot2::ggplot(data = a3_sort, mapping = aes(x = reorder(spec, mean), y = mean, ymin = confint_97.5, ymax = confint_2.5))+
geom_pointrange()+
geom_hline(yintercept = 0, lty = 2)+
coord_flip()+
xlab ("species") +ylab ("mean (credibility interval)")+
theme_bw()
This works, and I get what I want, but there must be a less manual labour way to do this?
My question: Is there a way to loop this procedure, automatically saving the PDF in the working directory?
Below an example of what one plot looks like:
You can try this solution. I tested with dummy data DF with 714 rows and same columns as you have. DF in your case is your sorted dataframe of 714 rows and the variables you have. I have set the code so that you can change if you require a width larger than 50.
library(ggplot2) # needed for aes(), geom_pointrange(), etc. used in myplot()
library(zoo)
#Create keys; change 50 if you want a larger window
keys <- seq(1, nrow(DF), 50)
vals=1:length(keys)
#Flag to allocate the position and values
#na.locf is used to complete NA so that we have same index
DF$Flag <- NA
DF$Flag[keys]<-vals
DF$Flag <- na.locf(DF$Flag)
#Then split by flag
ListData <- split(DF,DF$Flag)
#Function to create plot
myplot <- function(x)
{
tplot <- ggplot2::ggplot(data = x, mapping = aes(x = reorder(spec, mean), y = mean, ymin = confint_97.5, ymax = confint_2.5))+
geom_pointrange()+
geom_hline(yintercept = 0, lty = 2)+
coord_flip()+
xlab ("species") +ylab ("mean (credibility interval)")+
theme_bw()
return(tplot)
}
#Replicate plots
LPlots <- lapply(ListData,myplot)
#Export to pdf
pdf('Myplots.pdf',width = 14)
for(i in c(1:length(LPlots)))
{
plot(LPlots[[i]])
}
dev.off()
In the end, you will have your plots in pdf. I hope this helps. Let me know if you have any doubt.
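The grouping step (keys, Flag and na.locf) can also be sketched without zoo by computing a chunk index directly from the row number. The data frame below is a hypothetical stand-in with 714 rows, not the original data.

```r
# Hypothetical stand-in for the sorted data frame of 714 rows
DF <- data.frame(id = seq_len(714))

# Rows 1-50 get index 1, rows 51-100 get index 2, and so on
chunk <- ceiling(seq_len(nrow(DF)) / 50)
ListData <- split(DF, chunk)

length(ListData)        # 15 chunks
nrow(ListData[[15]])    # the last chunk holds the remaining 14 rows
```

The resulting list can be passed to lapply(ListData, myplot) in the same way as before.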
This approach could be adapted to your case:
# Some dummy data:
df <- data.frame(g = letters[1:24],
min = sample(0:10, 24, replace = TRUE),
mid = sample(11:20, 24, replace = TRUE),
max = sample(21:30, 24, replace = TRUE))
library(ggplot2)
library(purrr)
# list of the rows you want printing, this could be automated
plot_range <- list(p1_6 = 1:6, p7_12 = 7:12, p13_18 = 13:18, p19_24 = 19:24)
# plotting function which also sets a title and plot name
gg_plot <- function(df, plot_rows){
title <- paste("Automatic plot rows: ", min(plot_rows), "to", max(plot_rows))
# add the .pdf extension so the saved files open directly in a PDF viewer
plot_nm <- paste0(paste("plots", min(plot_rows), max(plot_rows), sep = "_"), ".pdf")
p <- ggplot(df[plot_rows, ])+
geom_segment(aes(x = min , xend = max, y = g, yend = g))+
geom_point(aes(x = mid, y = g))+
ggtitle(title)
print(ggsave(plot_nm, p, device = "pdf"))
}
# purrr function which acts as a loop to print each graph and allows a different data frame to be used.
walk(plot_range, ~gg_plot(df = df, plot_rows = .x))
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
Created on 2020-07-11 by the reprex package (v0.3.0)
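The comment in the code notes that the list of row ranges could be automated. One possible sketch, assuming a fixed window width (the names n_rows, width and starts are made up here):

```r
n_rows <- 24  # number of rows in the data frame
width  <- 6   # rows per plot

# First row of each window, then the full row range for each window
starts <- seq(1, n_rows, by = width)
ends   <- pmin(starts + width - 1, n_rows)
plot_range <- Map(function(s, e) s:e, starts, ends)
names(plot_range) <- paste0("p", starts, "_", ends)

names(plot_range)
```

For the 714-row case in the question, setting n_rows to 714 and width to 50 would give 15 ranges, with the last one shorter than the rest.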

How is the binning done in stat_summary_bin in ggplot2?

I'm trying to add some custom features to a bin-scatter plot using ggplot2. The original way that I was doing the bin-scatter was with stat_summary_bin(fun.y="mean"). This seems to produce a reasonable binning, but when I try to reproduce it by binning manually, I keep getting slightly different results -- especially at the right tail.
Can anyone help me figure out how the binning in stat_summary_bin is done? I need to figure out if this is a reliable form of bin-scattering that I can use...
library(tidyverse)
library(mltools)
#>
#> Attaching package: 'mltools'
#> The following object is masked from 'package:tidyr':
#>
#> replace_na
x = runif(1000, 0, 10)
y = x + rnorm(1000, 0.5, 2)
plot(x,y)
df <- data.frame(x = x, y = y)
p <- df %>%
ggplot(aes(x = x, y = y)) +
stat_summary_bin(aes(color ="stat summary"),fun.y = "mean", size = 2.5, geom="point", bins=20)
p
## Attempt 1 at binning
df$x_bin <- mltools::bin_data(df$x, bins=20, binType = "explicit")
df_binned <- df %>%
group_by(x_bin) %>%
mutate(
x_binned = mean(x),
y_binned = mean(y)
) %>%
ungroup()
p <- p + geom_point(aes(x = df_binned$x_binned, y = df_binned$y_binned, color = "manual bin"), size = 2.5)
p
## Attempt 2 at binning
xbreaks = quantile(df$x, probs = seq(0,1,0.05))
df_binned$x_bin_2 <- cut(df$x, xbreaks, include.lowest = T)
df_binned <- df_binned %>%
group_by(x_bin_2) %>%
mutate(
x_binned2 = mean(x),
y_binned2 = mean(y)
) %>%
ungroup()
p <- p + geom_point(aes(x = df_binned$x_binned2, y = df_binned$y_binned2, color = "2nd manual bin"), size = 2.5)
p
Created on 2018-09-09 by the reprex package (v0.2.0).
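For comparison with the quantile breaks in attempt 2, here is a base R sketch of equal-width binning in which each bin is placed at its midpoint rather than at the mean of x. Whether this matches stat_summary_bin exactly is an assumption that is not confirmed here; the sketch only shows how the choice of breaks and of x position changes the result.

```r
set.seed(42)
x <- runif(1000, 0, 10)
y <- x + rnorm(1000, 0.5, 2)

# 20 bins of equal width spanning the observed range of x
breaks <- seq(min(x), max(x), length.out = 21)
bin <- cut(x, breaks, include.lowest = TRUE)

# One point per bin: bin midpoint on x, mean of y within the bin
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
binned <- data.frame(
  x_mid  = mids,
  y_mean = as.vector(tapply(y, bin, mean))
)
head(binned)
```

Quantile breaks put roughly the same number of points in every bin, while equal-width breaks do not, so the two approaches tend to differ most where the data are sparse.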
