Clip {ggdist} ccdf-barplot

I'm making a complementary cumulative distribution function barplot with {ggdist}. When I export the plot to SVG (or another vector format), I notice a zero-width stripe protruding from the polygon (see attached image). I'd rather not have this protruding stripe.
library(ggplot2)
library(ggdist)
df <- data.frame(
x = rep(c("A", "B"), each = 10),
y = c(rnorm(20, mean = rep(c(5, 7), each = 10)))
)
p <- ggplot(df, aes(x, y)) +
stat_ccdfinterval(geom = "slab") +
lims(y = c(0, NA))
ggsave("test.svg", plot = p, device = svglite::svglite)
The code above results in an svg file below, wherein I've highlighted the outline of the polygon with the stripe.
I'd like to get rid of that zero-width stripe. I tried setting the relevant aesthetic to NA at these points, but that also deletes one of the corners that is not part of the stripe.
ggplot(df, aes(x, y)) +
stat_ccdfinterval(
geom = "slab",
aes(thickness = after_stat(ifelse(f == 0, NA, f)))
) +
lims(y = c(0, NA))
Created on 2022-03-10 by the reprex package (v2.0.1)

I found a solution specific to this problem, but it might pan out differently if the orientation is horizontal or the cdf instead of the ccdf is used. In brief, we're still setting 0-thickness datapoints to NA, but we now do this only where the y aesthetic exceeds the groupwise maximum.
library(ggplot2)
library(ggdist)
df <- data.frame(
x = rep(c("A", "B"), each = 10),
y = c(rnorm(20, mean = rep(c(5, 7), each = 10)))
)
helper <- function(f, y, group) {
split(f, group) <- Map(
function(value, y) {
f <- value
max_y <- max(y[f != 0])
f[f == 0 & y > max_y] <- NA
f
},
value = split(f, group),
y = split(y, group)
)
f
}
ggplot(df, aes(x, y)) +
stat_ccdfinterval(
geom = "slab",
aes(thickness = after_stat(helper(f, y, group)))
) +
lims(y = c(0, NA))
Created on 2022-03-13 by the reprex package (v2.0.1)
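To see what the helper does without drawing anything, it can be run on a small hand-made input. The vectors below are made up for illustration and only mimic the shape of the f, y and group columns available in after_stat(); they are not real {ggdist} output.

```r
# helper() copied from the answer above
helper <- function(f, y, group) {
  split(f, group) <- Map(
    function(value, y) {
      f <- value
      max_y <- max(y[f != 0])
      f[f == 0 & y > max_y] <- NA
      f
    },
    value = split(f, group),
    y = split(y, group)
  )
  f
}

# Made-up values: thickness is zero at the bottom and at the top of each group
f     <- c(0, 1, 0.5, 0,   0, 1, 1, 0.3, 0)
y     <- c(0, 1, 2,   3,   0, 1, 2, 3,   4)
group <- c(1, 1, 1,   1,   2, 2, 2, 2,   2)

out <- helper(f, y, group)
print(out)
```

Only the zero that lies above the largest non-zero y in each group becomes NA. The zero at y = 0 is kept, which is why the corner at the base of the bar survives.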

Related

Reverse legend in geom_contour

I'm struggling to find the right solution to reverse the legend, so that red is at the bottom and greens at the top. Here's a simple example.
library(ggplot2)
library(dplyr)
x = seq(0.01,1,0.01)
y = seq(0.01,1,0.01)
df <- expand.grid(x = x, y = y)
df <- df %>% mutate(z = x*y/(1 + x))
ggplot(df, aes(x = x, y = y, z = z)) +
geom_contour_filled(bins = 10) +
geom_contour(bins = 20, colour = "grey") +
scale_fill_manual(values = rainbow(20))

Couple of issues:
You're using 20 colours to describe 10 bins.
You're using the entire rainbow for a red-green gradient.
Suggested fix is to use the end and rev arguments of the rainbow() function.
library(ggplot2)
library(dplyr)
x = seq(0.01,1,0.01)
y = seq(0.01,1,0.01)
df <- expand.grid(x = x, y = y)
df <- df %>% mutate(z = x*y/(1 + x))
ggplot(df, aes(x = x, y = y, z = z)) +
geom_contour_filled(bins = 10) +
geom_contour(bins = 20, colour = "grey") +
scale_fill_manual(values = rainbow(10, end = 0.4, rev = TRUE))
Created on 2022-05-15 by the reprex package (v2.0.1)
Aside from that, you might want to consider a palette with better visual properties than a rainbow. For example, can you really discriminate the 2nd-4th green bins visually? A similar palette with better (but not perfect) properties is viridisLite::turbo(10, begin = 0.5).
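The effect of the end and rev arguments can be checked in base R without plotting. This is a small sketch; the variable names are only for illustration.

```r
pal_full <- rainbow(20)                          # original: 20 colours over the whole hue circle
pal_fix  <- rainbow(10, end = 0.4, rev = TRUE)   # fix: 10 colours, hues from red to green, reversed

length(pal_fix)                                  # one colour per bin
col2rgb(pal_fix[length(pal_fix)])                # the last colour is pure red (hue 0)
identical(pal_fix, rev(rainbow(10, end = 0.4)))  # rev = TRUE only reverses the order
```

Because rev = TRUE puts red last, the reversed palette assigns red to the opposite end of the bins compared with the original call.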

Simulate a two-dimensional random walk in a grid in R and plot with ggplot

I am looking for simple code that simulates a two-dimensional random walk on a grid (using R) and then plots the data using ggplot.
In particular, I am interested in a random walk from a few starting positions (5 points) in a 2D grid to the center of the square grid. It is just for visualisation purposes.
My idea is then to plot the results with ggplot on a discrete grid (like the one simulated), maybe using the function geom_tile.
Do you have any suggestions for existing code that I could easily adapt?

Here is a small example with a for loop. From here, you can simply adjust how X_t and Y_t are defined:
library(ggplot2)
Xt = 0; Yt = 0
for (i in 2:1000)
{
Xt[i] = Xt[i-1] + rnorm(1,0,1)
Yt[i] = Yt[i-1] + rnorm(1,0,1)
}
df <- data.frame(x = Xt, y = Yt)
ggplot(df, aes(x=x, y=y)) + geom_path() + theme_classic() + coord_fixed(1)

EDIT ----
After chatting with OP I've revised the code to include a step probability. This may result in the walk being stationary much more frequently. In higher dimensions, you will need to scale your prob factor lower in order to compensate for more options.
Finally, my function does not account for absolute distance; it only considers points on the grid that are within a certain step size in every dimension. For example, from position c(0,0) you could move diagonally to c(1,1) with this function. Whether that is acceptable depends on how connected the grid is meant to be.
If the OP wants to only consider nodes that are within 1 (by distance) of the current position, then use the following version of move_step()
move_step <- function(cur_pos, grid, prob = 0.04, size = 1){
opts <- grid %>%
rowwise() %>%
mutate(across(.fns = ~(.x-.env$cur_pos[[cur_column()]])^2,
.names = '{.col}_square_diff')) %>%
filter(sqrt(sum(c_across(ends_with("_square_diff"))))<=.env$size) %>%
select(-ends_with("_square_diff")) %>%
ungroup() %>% #drop the rowwise grouping so n() and sample_n() see all candidate rows
left_join(y = mutate(cur_pos, current = TRUE), by = names(grid))
new_pos <- opts %>%
mutate(weight = case_when(current ~ 1-(prob*(n()-1)), #calculate chance to move,
TRUE ~ prob), #in higher dimensions, we may have more places to move
weight = if_else(weight<0, 0, weight)) %>% #thus depending on prob, we may always move.
sample_n(size = 1, weight = weight) %>%
select(-weight, -current)
new_pos
}
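The difference between the two neighbourhood rules can be counted in base R. This is a sketch on a 3 x 3 grid around the origin; the variable names are illustrative and not part of the answer's functions.

```r
grid <- expand.grid(x = -1:1, y = -1:1)
cur  <- c(x = 0, y = 0)
size <- 1

# Box rule (within `size` in every dimension): diagonal neighbours are allowed
in_box <- abs(grid$x - cur[["x"]]) <= size & abs(grid$y - cur[["y"]]) <= size

# Distance rule (Euclidean distance <= size): diagonal neighbours are excluded
in_dist <- sqrt((grid$x - cur[["x"]])^2 + (grid$y - cur[["y"]])^2) <= size

sum(in_box)   # 9 candidate positions, including the current one
sum(in_dist)  # 5 candidate positions, including the current one
```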
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
library(gganimate)
move_step <- function(cur_pos, grid, prob = 0.04, size = 1){
opts <- grid %>%
filter(across(.fns = ~ between(.x, .env$cur_pos[[cur_column()]]-.env$size, .env$cur_pos[[cur_column()]]+.env$size))) %>%
left_join(y = mutate(cur_pos, current = TRUE), by = names(grid))
new_pos <- opts %>%
mutate(weight = case_when(current ~ 1-(prob*(n()-1)), #calculate chance to move,
TRUE ~ prob), #in higher dimensions, we may have more places to move
weight = if_else(weight<0, 0, weight)) %>% #thus depending on prob, we may always move.
sample_n(size = 1, weight = weight) %>%
select(-weight, -current)
new_pos
}
sim_walk <- function(cur_pos, grid, grid_prob = 0.04, steps = 50, size = 1){
iterations <- cur_pos
for(i in seq_len(steps)){
cur_pos <- move_step(cur_pos, grid, prob = grid_prob, size = size)
iterations <- bind_rows(iterations, cur_pos)
}
iterations$i <- 1:nrow(iterations)
iterations
}
origin <- data.frame(x = 0, y =0)
small_grid <- expand.grid(x = -1:1, y = -1:1)
small_walk <- sim_walk(cur_pos = origin,
grid = small_grid)
ggplot(small_walk, aes(x, y)) +
geom_path() +
geom_point(color = "red") +
transition_reveal(i) +
labs(title = "Step {frame_along}") +
coord_fixed()
large_grid <- expand.grid(x = -10:10, y = -10:10)
large_walk <- sim_walk(cur_pos = origin,
grid = large_grid,
steps = 100)
ggplot(large_walk, aes(x,y)) +
geom_path() +
geom_point(color = "red") +
transition_reveal(i) +
labs(title = "Step {frame_along}") +
xlim(c(-10,10)) + ylim(c(-10,10))+
coord_fixed()
large_walk %>%
count(x, y) %>%
right_join(y = expand.grid(x = -10:10, y = -10:10), by = c("x","y")) %>%
mutate(n = if_else(is.na(n), 0L, n)) %>%
ggplot(aes(x,y)) +
geom_tile(aes(fill = n)) +
coord_fixed()
multi_dim_walk <- sim_walk(cur_pos = data.frame(x = 0, y = 0, z = 0),
grid = expand.grid(x = -20:20, y = -20:20, z = -20:20),
steps = 100, size = 2)
library(cowplot)
plot_grid(
ggplot(multi_dim_walk, aes(x, y)) + geom_path(),
ggplot(multi_dim_walk, aes(x, z)) + geom_path(),
ggplot(multi_dim_walk, aes(y, z)) + geom_path())
Created on 2021-05-06 by the reprex package (v1.0.0)
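The remark about scaling prob in higher dimensions follows from the weight formula in move_step(). A minimal sketch, assuming an interior position and the box neighbourhood; stay_weight() is a hypothetical helper written for this illustration, not part of the code above.

```r
# Weight of staying put, following the case_when() in move_step():
# 1 - prob * (number of candidates - 1), clamped at 0.
stay_weight <- function(n_dim, prob = 0.04, size = 1) {
  n_candidates <- (2 * size + 1)^n_dim  # box neighbourhood, current position included
  max(0, 1 - prob * (n_candidates - 1))
}

stay_weight(1)            # 0.92
stay_weight(2)            # 0.68
stay_weight(3)            # 0: 26 neighbours * 0.04 exceeds 1, so the walk always moves
stay_weight(3, size = 2)  # 0: the setting used for multi_dim_walk
```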

Here is a base R option using Reduce + replicate + plot for 2D random walk process
set.seed(0)
plot(
setNames(
data.frame(replicate(
2,
Reduce(`+`, rnorm(99), init = 0, accumulate = TRUE)
)),
c("X", "Y")
),
type = "o"
)
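For reference, the accumulating Reduce() call computes a running sum, so the same path can be obtained with cumsum(). A short check, using illustrative variable names:

```r
set.seed(0)
steps <- rnorm(99)

walk_reduce <- Reduce(`+`, steps, init = 0, accumulate = TRUE)
walk_cumsum <- c(0, cumsum(steps))  # start at 0, then add the steps one by one

all.equal(walk_reduce, walk_cumsum)
length(walk_cumsum)  # 100 positions: the start plus 99 steps
```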

Define colors depending on difference to previous value - points and lines

I wish to create a plot in which the colour of each point depends on the previous point. In other words, if the y value at x is greater than the y value at x-1, the point should be green, whereas if it is smaller it should be red.
Is there an easy way to do it with ggplot2?

An option using the sign of the difference, assuming data is sorted on x.
library(ggplot2)
df <- data.frame(x = 1:20, y = runif(20, min = 0, max = 2))
ggplot(df, aes(x, y)) +
geom_point(aes(colour = as.factor(c(0, sign(diff(y)))))) +
scale_colour_manual(values = c("red", "grey", "green"))
Created on 2021-01-26 by the reprex package (v0.3.0)
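How the colour variable is built can be seen on a tiny vector. The values are made up, and the named colour vector at the end is a suggestion rather than part of the answer: scale_colour_manual() matches a named values vector by name, which keeps the mapping stable when one of the three levels does not occur.

```r
y <- c(1, 3, 3, 2)             # made-up values: up, tie, down
change <- c(0, sign(diff(y)))  # the first point has no predecessor, so it gets 0
change                         # 0  1  0 -1
levels(as.factor(change))      # "-1" "0" "1", matched in this order to red, grey, green

# Suggested variant: name the colours after the factor levels
cols <- c("-1" = "red", "0" = "grey", "1" = "green")
unname(cols[as.character(change)])  # "grey" "green" "grey" "red"
```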
EDIT:
To do this for lines, it is often best to reparametrize data to geom_segment()s, since I don't think geom_line() handles multiple colours per line very well. Example below of converting the economics data to segments.
library(ggplot2)
df <- economics[, c("date", "unemploy")]
df$date_end <- c(tail(df$date, -1), NA)
df$unemploy_end <- c(tail(df$unemploy, -1), NA)
df <- head(df, -1)
ggplot(df, aes(date, unemploy,
xend = date_end,
yend = unemploy_end)) +
geom_segment(aes(colour = unemploy > unemploy_end))
Created on 2021-01-26 by the reprex package (v0.3.0)
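The same reparametrisation on a four-point toy data set shows what the segment table looks like. The data and the falling column name are made up for illustration.

```r
df <- data.frame(x = 1:4, y = c(2, 5, 3, 4))  # made-up data

df$xend <- c(tail(df$x, -1), NA)  # x of the next point
df$yend <- c(tail(df$y, -1), NA)  # y of the next point
df <- head(df, -1)                # the last point starts no segment

df$falling <- df$y > df$yend      # same comparison as unemploy > unemploy_end
df
```

Four points give three segments, and each row carries both of its end points, so every segment can get its own colour.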

This is an example:
library(ggplot2)
df <- data.frame(x = 1:20, y = runif(20, min = 0, max = 2))
# colour points 2..20 by comparing each value with the previous one
v <- ifelse(diff(df$y) > 0, "green", "red")
# add your condition on first element (e.g "blue")
v <- c("blue", v)
ggplot(df) +
geom_point(aes(x, y), color = v)

ggplot: Drawing tiles / rectangles with discrete variables

I'm attempting to draw tiles / rectangles to get the following result:
library(tidyverse)
library(plotly)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% tidyr::pivot_longer(cols = -case_id)
plot <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
group = case_id
)
) + geom_point()
plot_boxes_y <- seq(from = 0, to = 1, by = .2)
plot_boxes_x <- unique(df$name) %>% length()
for (x in 1:plot_boxes_x) {
for (y in plot_boxes_y) {
plot <- plot + geom_rect(
mapping = aes_(
xmin = x - .5,
xmax = x + .5,
ymin = y - .5,
ymax = y + .5
),
color = "red",
fill = NA
)
}
}
plotly::ggplotly(plot)
As you can see, I currently do this by looping through coordinates and drawing each rectangle individually. The problem is, that this generates many layers which makes plotly::ggplotly() really slow on large datasets.
Therefore, I'm looking for a more efficient way. Please note, that I cannot use the panel.grid, since I intend to visualize z-data by filling rectangles later on.
My approach was to draw geom_tile() on top of the scatter plot:
# my attempt
df$z <- rep(0, nrow(df))
plot2 <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id
)
) + geom_point() + geom_tile()
I assume that this fails because name is a discrete variable? So, how can I efficiently draw tiles in addition to my scatterplot?
Thanks

Here is a solution using the geom_tile option. The key here is creating a data frame to hold the coordinates of the grid and then specifying the aesthetics individually in each of the function calls.
library(ggplot2)
library(tidyr)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% pivot_longer(cols = -case_id)
df$z <- rep(0, nrow(df))
#make data frame for the grid coordinates
grid<-data.frame(x=factor( ordered( 1:4), labels = c("a", "b", "c", "d" )),
y=rep(seq(0, 1, .1), each=4))
#plot using geom_tile & geom_point
plot2 <- ggplot2::ggplot() + geom_tile(data=grid, aes(x=x, y=y), fill=NA, col="red") +
geom_point(data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id))
print(plot2)
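The grid data frame above relies on data.frame() recycling the four x values against the 44 y values. A sketch of an alternative that spells out the cross product with expand.grid(); this variant is an assumption for illustration and not part of the answer.

```r
# Construction from the answer: x has 4 values, y has 44, so x is recycled
grid_a <- data.frame(x = factor(ordered(1:4), labels = c("a", "b", "c", "d")),
                     y = rep(seq(0, 1, .1), each = 4))

# Alternative: build every x/y combination directly
grid_b <- expand.grid(x = factor(c("a", "b", "c", "d")), y = seq(0, 1, .1))

nrow(grid_a)  # 44
nrow(grid_b)  # 44
```

Both tables hold the same 4 x 11 combinations in the same row order; grid_b can be passed to geom_tile(data = ...) in the same way.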

If you don't mind the lines extending beyond the axes:
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_vline(xintercept=seq(0.5,4.5,by=1)) +
geom_hline(yintercept=seq(0,2,by=.2))
Otherwise:
#make a new data frame
GRIDS = rbind(
# the vertical lines
data.frame(x=seq(0.5,4.5,by=1),xend=seq(0.5,4.5,by=1),y=0,yend=2),
# the horizontal lines
data.frame(x=0.5,xend=4.5,y=seq(0,2,by=.2),yend=seq(0,2,by=.2))
)
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_segment(data=GRIDS,aes(x=x,y=y,xend=xend,yend=yend),col="red")

Superscripts within ggplot2's axis text

I would like to create a graph that has superscripts on the axis instead of displaying unformatted numbers using ggplot2. I know that there are a lot of answers which change the axis label, but not the axis text. I am not trying to change the label of the graph, but the text on the axis.
Example:
x<-c('2^-5','2^-3','2^-1','2^1','2^2','2^3','2^5','2^7','2^9','2^11','2^13')
y<-c('2^-5','2^-3','2^-1','2^1','2^2','2^3','2^5','2^7','2^9','2^11','2^13')
df<-data.frame(x,y)
p<-ggplot()+
geom_point(data=df,aes(x=x,y=y),size=4)
p
So I would like the x-axis to display the same numbers but without the caret.

EDIT:
A mostly base approach (dplyr is used only to convert the columns to character):
df %>%
mutate_all(as.character)->new_df
res<-unlist(Map(function(x) eval(parse(text=x)),new_df$x))#replace with y for y
to_use<-unlist(lapply(res,as.expression))
split_text<-strsplit(gsub("\\^"," ",names(to_use))," ")
join_1<-as.numeric(sapply(split_text,"[[",1)) #tidyr::separate might help, less robust for numeric(I think)
join_2<-as.numeric(sapply(split_text,"[[",2))
to_use_1<-sapply(seq_along(join_1),function(x) parse(text=paste(join_1[x],"^",
join_2[x])))
The above can be reduced to fewer steps; I posted the stepwise approach I took. The result is shown for x only; the same can be done for y:
new_df %>%
ggplot()+
geom_point(aes(x=x,y=y),size=4)+
scale_x_discrete(breaks=df$x,labels=to_use_1)#replace with y and scale_y_discrete for y
Plot:
Original and erroneous answer:
I have deviated from standard tidyverse practice by using $. You can replace it with . and it might work, although in this case it's not really important since the focus is on labels:
library(dplyr)
df %>%
mutate(new_x=gsub("\\^"," ",x),
new_y=gsub("\\^"," ",y))->new_df
new_df %>%
ggplot()+
geom_point(aes(x=x,y=y),size=4)+
scale_x_discrete(breaks=x,labels=new_df$new_x)+
scale_y_discrete(breaks=y,labels=new_df$new_y)

This can be done with the functions scale_x_log2 and scale_y_log2 from the GitHub package jrnoldmisc.
First, install the package.
devtools::install_github("jrnold/rubbish")
Then, coerce the variables to numeric. I will work with a copy of the original dataframe.
df1 <- df
df1[] <- lapply(df1, function(x){
x <- as.character(x)
sapply(x, function(.x)eval(parse(text = .x)))
})
Now, graph it.
library(jrnoldmisc)
library(ggplot2)
library(MASS)
library(scales)
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^x, n = 10)) +
scale_y_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^x, n = 10))
a + annotation_logticks(base = 2)
Edit.
Following the discussion in the comments, here are the two other ways that were seen to give different axis labels.
Axis labels every tick mark. Set limits = c(1.01, NA) and function argument n = 11, an odd number.
Axis labels on odd number exponents. Keep limits = c(0.01, NA), change to function(x) 2^(x - 1), n = 11.
Just the instructions, no plots.
The first.
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(1.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x), n = 11)) +
scale_y_log2(limits = c(1.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x), n = 11))
a + annotation_logticks(base = 2)
And the second.
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x - 1), n = 11)) +
scale_y_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x - 1), n = 11))
a + annotation_logticks(base = 2)

You can provide a function to the labels argument of the scale_x_*** and scale_y_*** functions to generate labels with superscripts (or other formatting). See examples below.
library(jrnoldmisc)
library(ggplot2)
df<-data.frame(x=2^seq(-5,5,2),
y=2^seq(-5,5,2))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_log2(breaks=2^seq(-5,5,2),
labels=function(x) parse(text=paste("2^",round(log2(x),2))))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_continuous(breaks=c(2^-5, 2^seq(1,5,2)),
labels=function(x) parse(text=paste("2^",round(log2(x),2))))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_log10(breaks=10^seq(-1,1,1),
labels=function(x) parse(text=paste("10^",round(log10(x),2))))
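What such a labels function returns can be inspected in base R. The function body is the one used in the examples above; the name label_pow2 and the break values are chosen here for illustration.

```r
# Same body as the labels function passed to scale_x_log2() above
label_pow2 <- function(x) parse(text = paste("2^", round(log2(x), 2)))

lab_expr <- label_pow2(c(0.5, 2, 8))

is.expression(lab_expr)  # TRUE: plotmath expressions, which are drawn with superscripts
length(lab_expr)         # 3, one label per break
eval(lab_expr[[3]])      # 8, because the third label is the call 2^3
```

Because the labels are expressions rather than character strings, the exponent is typeset as a superscript and no caret appears on the axis.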
