Is it possible to generate a gganimate plot with random numbers? [closed] - r

I wish to generate a gganimate object that shows n random points in a specific range. The other limitation is that it should plot 2^n points, meaning 2, 4, 8, 16, 32, 64... points. I did this to calculate the decimals of pi but I wish to plot this animation so I can show how it improves the results given more random numbers in a nicer way.
This is what I have so far:
results <- c()
for(i in c(1:20)) {
r <- 1
limit <- 2^i
points <- data.frame(
x = runif(limit, -r, r),
y = runif(limit, -r, r))
points$d <- sqrt(points$x^2 + points$y^2)
points$type <- ifelse(points$d < r, "c", "s")
picalc <- 4 * length(points$type[points$type=="c"]) / limit
error <- pi - picalc
label <- paste0('Pi calc : ', round(picalc, 6), '\nError : ', round(error, 6))
iter <- data.frame(n = limit, picalc = picalc, error = error, label = label)
results <- rbind(results, iter)
}
# GGANIMATE
library(ggplot2)
library(gganimate)
p <- ggplot(results, aes(x = runif(n, -1, 1), y = runif(n, -1, 1))) +
geom_point(lwd = 2, alpha = 0.3) +
theme_minimal() +
geom_text(aes(x = 0, y = 0, label = label), size = 5) +
labs(caption = 'Number of random points : {frame_time}') +
transition_time(n)
animate(p, nframes = nrow(results), fps = 5)
Any suggestions?

Here's how I would "show how it improves the results given more random numbers in a nicer way."
library(ggplot2)
library(gganimate)
p <- ggplot(results, aes(x = n, y = error)) +
geom_point(lwd = 2, alpha = 0.3) +
theme_minimal() +
geom_text(aes(x = 0, y = 0, label = label), size = 5, hjust = 0) +
scale_x_log10(breaks = c(2^(1:4), 4^(2:10)), minor_breaks = NULL) +
labs(caption = 'Number of random points : {2^frame}') +
transition_manual(n) +
shadow_trail(exclude_layer = 2)
animate(p, nframes = nrow(results), fps = 5)
To show the kind of picture described in the question, you'd need to have the points labeled with the frame they belong to. (Also, as currently written, the points are randomly assigned afresh each iteration. It would be better to set all your points first, and then stick with those, in growing window sizes, to calculate the results.)
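A minimal sketch of that parenthetical suggestion, assuming a fixed seed and a cap of 2^15 points (both are illustrative choices, not from the question): draw every point once, then estimate pi from the first 2, 4, 8, ... points of the same sample. The names pts, hits and results_fixed are hypothetical.

```r
set.seed(42)            # assumption: fixed seed so the run is repeatable
max_exp <- 15           # assumption: stop at 2^15 points to keep it fast
r <- 1
n_total <- 2^max_exp

# Draw all points once instead of redrawing them in every iteration
pts <- data.frame(x = runif(n_total, -r, r),
                  y = runif(n_total, -r, r))
pts$inside <- sqrt(pts$x^2 + pts$y^2) < r

# Running count of points inside the circle, read off at each window size
sizes <- 2^(1:max_exp)
hits <- cumsum(pts$inside)
results_fixed <- data.frame(n = sizes,
                            picalc = 4 * hits[sizes] / sizes)
results_fixed$error <- pi - results_fixed$picalc
print(results_fixed)
```

Because each window reuses the points of the previous one, the estimate changes only through the newly added points, which tends to make the convergence easier to follow than with fresh draws per iteration.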
To do this quickly, I'm taking the last points frame (as it exists when i is at the end of the loop), and adding a number for what frame it should belong to. Then I can plot the points of each frame using transition_manual, and keep the past frames using shadow_trail.
Note, ggplot will be slower than I cared to wait if you run it with 1M points, so I did an abridged version up to 2^15 = 32k.
# Note, I only ran the orig loop for 2^(1:15), lest it get too slow
library(dplyr)  # provides %>%, mutate() and row_number()
points2 <- points %>%
mutate(row = row_number(),
count = 2^ceiling(log2(row)))
point_plot <- ggplot(points2,
aes(x = x, y = y, color = type, group = count)) +
geom_point(alpha = 0.6, size = 0.1) +
theme_minimal() +
labs(caption = 'Number of random points : {2^(frame-1)}') +
transition_manual(count) +
shadow_trail()
animate(point_plot, nframes = 15, fps = 2)
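To see how the expression 2^ceiling(log2(row)) used above assigns rows to frames, here is a small base R check on 16 rows (the size 16 is only for illustration). Each row lands in the smallest power of two that is at least its row number, so every frame adds the rows needed to double the total.

```r
row <- 1:16
count <- 2^ceiling(log2(row))

# Row number next to the frame (total point count) it first appears in
frames <- data.frame(row = row, count = count)
print(frames)

# Number of new rows introduced by each frame
print(table(frames$count))
```

Row 1 gets its own group (count 1), so a data set of 2^15 rows produces 16 distinct groups rather than 15.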

Related

Is there a better representation which shows multiple lines in ggplot [closed]

I have plotted 100 lines. Each line is colored by its score. It is very difficult to see any pattern in the figure because the lines overlap.
Is there a better representation that shows how the lines and scores are linked to one another?
I believe some kind of density plot could show the pattern.
library(tidyverse)
x <- rep(seq(0, 3.2, 0.01), times = 100)
score <- rep(1:100, each = 321)
y = runif(32100) * score * 0.01  # one draw per row: 100 lines x 321 points
df <- tibble(x = x,
score = score,
y = y)
ggplot(data = df,
aes(x = x,
y = y,
group = score,
color = score)) +
geom_line(size = 0.15) +
theme_bw() +
theme(aspect.ratio = 0.5) +
# legend.position="none") +
scale_color_gradient(low = 'blue', high = 'yellow')
The sample data is simply too messy and complex to show in an unfiltered line plot. One option is to show a summary of each line via geom_smooth. Although you lose details in the data, it allows you to convey the message that you want the plot to show.
library(tidyverse)
x <- rep(seq(0, 3.2, 0.01), times = 100)
score <- rep(1:100, each = 321)
y = runif(32100) * score * 0.01
df <- tibble(x = x,
score = score,
y = y)
ggplot(data = df,
aes(x = x,
y = y,
group = score,
color = score)) +
geom_smooth(linewidth = 0.5, se = FALSE) +
theme_bw() +
theme(aspect.ratio = 0.5) +
scale_color_gradient(low = 'blue', high = 'yellow')
What about a heat map - which you could make by categorizing both x and y and then taking the average score in each x-y combination.
library(tidyverse)
x <- rep(seq(0, 3.2, 0.01), times = 100)
score <- rep(1:100, each = 321)
y = runif(32100) * score * 0.01
df <- tibble(x = x,
score = score,
y = y) %>%
mutate(x_cat = cut(x, breaks=11),
y_cat = cut(y, breaks=11)) %>%
group_by(x_cat, y_cat) %>%
summarise(score = mean(score),
x = median(range(x)),
y=median(range(y)))
#> `summarise()` has grouped output by 'x_cat'. You can override using the
#> `.groups` argument.
ggplot(df, aes(x=x_cat, y=y_cat, fill=score)) +
geom_tile() +
scale_fill_gradient(low = 'blue', high = 'yellow') +
scale_x_discrete(labels=sprintf("%.2f", sort(unique(df$x)))) +
scale_y_discrete(labels=sprintf("%.6f", sort(unique(df$y)))) +
theme_classic() +
theme(axis.text.x = element_text(angle=45, hjust=1)) +
labs(x="X", y="Y", fill="Average\nScore")
Created on 2023-01-19 by the reprex package (v2.0.1)
If the x pattern isn't too important, we could just focus on score and the average y for each. This shows that relationship more clearly.
library(dplyr)
df |>
group_by(score) |>
summarize(avg_y = mean(y)) |>
ggplot(aes(score, avg_y)) +
geom_point()
Or perhaps there's another salient feature of each score line, like "average slope" or "spikiness" or "variability vs. linear regression." You could code that to color in this plot. More ideas for time series features here: https://github.com/tidyverts/feasts
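As a rough sketch of that idea, the snippet below computes a few per-line features in base R. The feature definitions are assumptions for illustration: "slope" is the coefficient from a simple linear fit of y on x, and "variability" is the standard deviation of y. The seed and the name features are also hypothetical.

```r
set.seed(1)  # assumption: fixed seed for repeatability
x <- rep(seq(0, 3.2, 0.01), times = 100)
score <- rep(1:100, each = 321)
y <- runif(32100) * score * 0.01
df <- data.frame(x = x, score = score, y = y)

# One row of summary features per score line
features <- do.call(rbind, lapply(split(df, df$score), function(d) {
  data.frame(score = d$score[1],
             avg_y = mean(d$y),
             slope = unname(coef(lm(y ~ x, data = d))[2]),
             variability = sd(d$y))
}))
head(features)
```

Any of these columns could then be mapped to color or to the y axis in the score plot above.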

How to add tick marks on a plot that is not from plot() in R

I use an R package, SetMethods, to get the fsQCA results of panel data. The package provides the cluster.plot() function to generate a plot.
However, I have a hard time getting the x-axis of the graph to show the number of units as tick marks. For example, I want it to show 10, 20, 30, ..., 140 on the x-axis so I can tell how many units have a consistency score lower than a certain point.
Is there any method to add tick marks to a plot that is not generated by the plot() function? Thanks in advance.
Here I use the dataset in the package as an example.
install.packages("SetMethods")
library(SetMethods)
data("PAYF")
PS <- minimize(data = PAYF,
outcome = "HL",
conditions = c("HE","GG","AH","HI","HW"),
incl.cut = 0.9,
n.cut = 2,
include = "?",
details = TRUE,
show.cases = TRUE)
PS
# Perform cluster diagnostics:
CB <- cluster(data = PAYF,
results = PS,
outcome = "HL",
unit_id = "COUNTRY",
cluster_id = "REGION",
necessity=FALSE,
wicons = FALSE)
CB
# Plot pooled, between, and within consistencies:
cluster.plot(cluster.res = CB,
labs = TRUE,
size = 8,
angle = 6,
wicons = TRUE)
Finally, I get a graph as follows.
If you look inside the cluster.plot function definition (in RStudio, press F2 while the pointer is on it), you will see that it uses ggplot2 under the hood. However, it doesn't return ggplot2 objects; it just prints them one after another. Because of this, it's not really possible to modify the output afterwards in any convenient manner.
But you can always copy the function code and rewrite it for your own need. The part that prints the final plot in your case is
CTw <- list()
ticklabw = unique(as.character(cluster.res$unit_ids))
xtickw <- seq(1, length(ticklabw), by = 1)
if (class(cluster.res) == "clusterminimize") {
for (i in 1:length(cluster.res$output)) {
CTw[[i]] <- cluster.res$output[[i]]$WICONS
dtw <- data.frame(x = xtickw, y = CTw[[i]])
dtw <- dtw[order(dtw$y), ]
dtw$xr <- reorder(dtw$x, 1 - dtw$y)
pw <- ggplot(dtw, aes(y = dtw[, 2], x = dtw[,
3])) + geom_point() + ylim(0, 1) + theme_classic(base_size = 16) +
geom_hline(yintercept = cluster.res$output[[i]]$POCOS) +
labs(title = names(cluster.res$output[i]),
x = "Units", y = "Consistency") + theme(axis.text.x = element_blank())
suppressWarnings(print(pw))
}
}
You can modify the ggplot2 construction part to something like this (packages ggplot2 and dplyr need to be loaded):
pw <-
dtw %>%
mutate(x_ind = as.numeric(xr)) %>%
ggplot(aes(x_ind, y)) +
geom_point() +
ylim(0, 1) +
theme_classic(base_size = 16) +
geom_hline(yintercept = cluster.res$output[[i]]$POCOS) +
scale_x_continuous(breaks = seq(from = 0, to = 140, by = 10)) +
labs(title = names(cluster.res$output[i]),
x = "Units", y = "Consistency")
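The key step in the modified code is as.numeric(xr): converting the reordered factor to numbers gives each unit its position on the x-axis, which is what lets scale_x_continuous() place numeric breaks. A small base R sketch with made-up consistency values (the five values are hypothetical) shows the effect:

```r
# Hypothetical consistency scores for five units
dtw <- data.frame(x = 1:5, y = c(0.9, 0.4, 1.0, 0.7, 0.2))

# Same reordering as in cluster.plot: levels sorted by 1 - y,
# i.e. from the highest consistency to the lowest
dtw$xr <- reorder(dtw$x, 1 - dtw$y)

# Position of each unit along the x-axis (1 = highest consistency)
dtw$x_ind <- as.numeric(dtw$xr)
print(dtw[order(dtw$x_ind), ])
```

With a numeric x_ind, a break at 10 marks the tenth unit in consistency order, and so on.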

Dropping data outside valid range when using geom_ma in scatterplot

I have four categories that I am plotting here using ggplot. I would like to add a moving average using geom_ma, but I have too few of the green dots to get a good moving average (I would prefer a period of at least 20). How can I keep the scatterplot as is and only add an MA for the purple and blue dots, which have enough points for a 20-period moving average?
Example:
ggplot(data, aes(x, y, color=Str)) + geom_point(stat="identity") + geom_ma(ma_fun = SMA, n = 20, linetype=1, size=1, na.rm=TRUE)
I get the error: "Warning message:
Computation failed in stat_sma():
n = 20 is outside valid range: [1, 10]"
This is a great example of why it helps to provide a minimal reproducible example. You have provided the code that produced the error, but there is nothing wrong with the code on its own: it will only cause this error with certain inputs. Given suitable data, your code is fine.
Let's make a dummy data frame with the same name and column names as your data frame. We will make data for the first 330 days of 2020, and we will have 4 groups in Str, so a total of 1320 rows:
library(tidyquant)
library(ggplot2)
set.seed(1)
data <- data.frame(x = rep(seq(as.Date("2020-01-01"),
by = "day", length.out = 330), 4),
y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
Str = rep(c("A", "B", "C", "D"), each = 330))
Now if we use your exact plotting code, we can see that the plot is fine:
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)
But if one or more of our Str groups has fewer than 20 measurements, then we get your error. Let's remove most of the Str == "A" and Str == "B" cases, and repeat the plot:
data <- data[c(1:20 * 33, 661:1320),]
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)
#> Warning: Computation failed in `stat_sma()`:
#> n = 20 is outside valid range: [1, 10]
We get your exact warning, and the MA lines disappear from all the groups. Clearly we cannot get a 20-measurement moving average if we only have 10 data points, so geom_ma just gives up.
The fix here is to use the data = argument in geom_ma to filter out any groups with fewer than 20 data points:
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE,
data = data[data$Str %in% names(table(data$Str)[table(data$Str) >= 20]),])
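The group filter can also be written as a separate step, which some readers may find easier to follow. This is a sketch in base R on a stand-in data frame; the names dat, n_ma, group_size and ma_dat are hypothetical, and the group sizes mimic the reduced data above (10, 10, 330, 330).

```r
n_ma <- 20  # moving-average window

# Stand-in data with the same group sizes as the reduced example
set.seed(1)
dat <- data.frame(Str = rep(c("A", "B", "C", "D"), times = c(10, 10, 330, 330)),
                  y = rnorm(680))

# Size of the group each row belongs to
group_size <- ave(seq_along(dat$Str), dat$Str, FUN = length)

# Keep only rows from groups that can support an n_ma-point average
ma_dat <- dat[group_size >= n_ma, ]
print(table(ma_dat$Str))
```

The filtered frame would then be passed as the data argument of geom_ma, while geom_point keeps using the full data.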

stat_function not transitioning over transition_states

I'm trying to write my own Central Limit Theorem demonstration using ggplot2 and gganimate, and I am unable to get my stat_function to display a changing normal distribution.
Below is my code. I want the normal distribution in stat_function to transition through different states; specifically, I'm hoping for its standard deviation to change to correspond with each value in dataset. Any help would be greatly appreciated.
#library defs
library(gganimate)
library(ggplot2)
library(transformr)
#initialization for distribution, rolls, and vectors
k = 2
meanr = 1/k
sdr = 1/k
br = sdr/10
rolls <- 200
avg <- 1
dataset <- 1
s <- 1
#loop through to create vectors of sample statistics from 200 samples of size i
#avg is sample average, s is standard deviations of sample means, and dataset is the indexes to run the transition states
for (i in c(1:40)){
for (j in 1:rolls){
avg <- c(avg,mean(rexp(i,k)))
}
dataset <- c(dataset, rep(i,rolls))
s <- c(s,rep(sdr/sqrt(i),rolls))
}
#remove initialized vector information as it was only created to start loops
avg <- avg[-1]
dataset <- dataset[-1]
s <- s[-1]
#dataframe
a <- data.frame(avgf=avg, datasetf = dataset,sf = s)
#plot histogram, density function, and normal distribution
ggplot(a,aes(x=avg,y=s))+
geom_histogram(aes(y = ..density..), binwidth = br,fill='beige',col='black')+
geom_line(aes(y = ..density..,colour = 'Empirical'),lwd=2, stat = 'density') +
stat_function(fun = dnorm, aes(colour = 'Normal', y = s),lwd=2,args=list(mean=meanr,sd = mean(s)))+
scale_y_continuous(labels = scales::percent_format()) +
scale_color_discrete(name = "Densities", labels = c("Empirical", "Normal"))+
labs(x = 'Sample Average',title = 'Sample Size: {closest_state}')+
transition_states(dataset,4,4)+ view_follow(fixed_x = TRUE)
I think it's difficult to use stat_function here because the dnorm function that you are passing includes a grouped variable (mean(s)). There is no way to indicate that you wish to group s by the dataset column, and the transition_states function doesn't filter the whole data frame. You could use transition_filter to filter the whole data frame, but this would be laborious.
It's not much work to just add a dnorm to your input data frame and plot it as a line, particularly since the rest of your code can be simplified substantially. Here's a fully reproducible example:
library(gganimate)
library(ggplot2)
library(transformr)
k <- 2
meanr <- sdr <- 1/k
br <- sdr/10
rolls <- 200
a <- do.call(rbind, lapply(1:40, function(i){
data.frame(avg = replicate(rolls, mean(rexp(i, k))),
dataset = rep(i, rolls),
x = seq(0, 2, length.out = rolls),
s = dnorm(seq(0, 2, length.out = rolls),
meanr, sdr/sqrt(i))) }))
ggplot(a, aes(x = avg, group = dataset)) +
geom_histogram(aes(y = ..density..), fill = 'beige',
colour = "black", binwidth = br) +
geom_line(aes(y = ..density.., colour = 'Empirical'),
lwd = 2, stat = 'density', alpha = 0.5) +
geom_line(aes(x = x, y = s, colour = "Normal"), size = 2, alpha = 0.5) +
scale_y_continuous(labels = scales::percent_format()) +
coord_cartesian(xlim = c(0, 2)) +
scale_color_discrete(name = "Densities", labels = c("Empirical", "Normal")) +
labs(x = 'Sample Average', title = 'Sample Size: {closest_state}') +
transition_states(dataset, 4, 4) +
view_follow(fixed_x = TRUE, fixed_y = TRUE)
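The normal curve in the example above uses a standard deviation of sdr/sqrt(i), which is the standard deviation of the mean of i independent draws. A quick base R check, assuming a sample size of 25 and 20000 replications (both arbitrary choices for illustration), compares that value with a simulated one:

```r
set.seed(123)   # assumption: fixed seed for repeatability
k <- 2          # rate of the exponential distribution, as in the example
sdr <- 1 / k    # standard deviation of a single exponential draw
i <- 25         # assumption: one illustrative sample size
reps <- 20000   # assumption: number of simulated sample means

# Simulate many sample means of size i and compare their spread
means <- replicate(reps, mean(rexp(i, k)))
comparison <- c(empirical = sd(means), theoretical = sdr / sqrt(i))
print(comparison)
```

The two numbers should be close, which is why the animated normal curve narrows as the sample size grows.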

Retaining trailing zeros from string plotted with `geom_text()` [duplicate]

This question already has an answer here:
Stop parsing out zeros after decimals in ggplot2's annotate
(1 answer)
Closed 2 years ago.
I have seen similar questions and solutions but none as far as I can see relatable to geom_text() in particular. Any guidance is greatly appreciated.
Say I want to plot point estimates and confidence intervals of:
library(tidyverse)  # tribble(), mutate(), %>% and ggplot()
# create tbl
ni <- tribble(
~ method, ~ mean_difference, ~ lo95, ~ hi95,
"NC", 3.235762, -0.5063099, 6.977835,
"IPTW", 3.256231, -0.5063099, 6.977835,
"EM", 5.642857, -1.995181, 13.280896,
)
Next I create a string variable by pasting together the [rounded] mean_difference, lo95, and hi95, which will be specified as the label for geom_text.
# convert point estimates and confidence intervals to strings (to keep trailing zeros for plot)
to_string <- function(
var,
n_digits = 1,
n_small = 1){
as.character(format(round(var, digits = n_digits), nsmall = n_small))
}
ni <- ni %>%
mutate(
mean_difference_lab = to_string(mean_difference),
lo95_lab = to_string(lo95),
hi95_lab = to_string(hi95),
lab = paste(
mean_difference_lab,
" (",
lo95_lab,
"-",
hi95_lab,
")",
sep = "")
)
This prints correctly in the console.
print(ni$lab)
And yet, the trailing zeros are removed from the string when I plot it as:
ni %>%
ggplot(aes(x = mean_difference, y = method)) +
geom_point(
size = 6,
shape = 18) +
geom_errorbarh(aes(
xmin = lo95,
xmax = hi95,
height = 0
)) +
geom_text(aes(
family = 'Courier',
label = lab),
parse = TRUE,
nudge_y = -0.2) +
scale_x_continuous(breaks = seq(- 6, 14, 2))
Can anyone help spare my blushes, please?
If I understand your description, you're getting labels with the trailing zeros dropped, but you want labels that keep them.
The only thing I changed was the argument parse = TRUE to parse = FALSE, i.e.
ni %>%
ggplot(aes(x = mean_difference, y = method)) +
geom_point(
size = 6,
shape = 18) +
geom_errorbarh(aes(
xmin = lo95,
xmax = hi95,
height = 0
)) +
geom_text(aes(
family = 'Courier',
label = lab),
parse = FALSE, # changed
nudge_y = -0.2) +
scale_x_continuous(breaks = seq(- 6, 14, 2))
(Note that there are some awkward spaces when using parse = FALSE. These, however, are already in the data, i.e. what's shown in the plot is the same as what you get when looking at ni$lab.)
Does this answer your question?
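A short base R sketch of both effects discussed above. With parse = TRUE the label is read as a plotmath expression, so a string like "3.0" becomes the number 3 and loses its zero. The extra spaces come from format(), which pads all elements of a vector to a common width. Using sprintf() or trimws() as shown is one possible way around the padding, offered as an assumption rather than as part of the original answer; the values in est are illustrative.

```r
est <- c(3.0, -0.5063099, 13.280896)

# Parsing the string as an R expression drops the trailing zero
parsed <- deparse(parse(text = "3.0")[[1]])
print(parsed)

# format() keeps the zero but pads every element to the same width
padded <- format(round(est, digits = 1), nsmall = 1)
print(padded)

# sprintf() keeps the zero without padding; trimws() removes existing padding
fixed <- sprintf("%.1f", est)
trimmed <- trimws(padded)
print(fixed)
```

Either fixed or trimmed could replace the output of to_string() before the label is pasted together, while keeping parse = FALSE in geom_text.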
