Consider the following two graphs:
require(ggplot2)
set.seed(5276)
my_mtcars = mtcars
my_mtcars$rand = rnorm(nrow(mtcars))
ggplot(my_mtcars, aes(rand, mpg)) + geom_point()
ggplot(my_mtcars, aes(wt, mpg)) + geom_point()
I would like to make a movie (or any animated picture that can easily be inserted into a .ppt) that transitions smoothly from the first graph to the second. During the transition, the dots should move horizontally and smoothly to their destinations. The x-axis scale and label should also change (possibly smoothly as well).
I could create many intermediate .png files and then assemble them into a GIF with ImageMagick (Bash), but I am pretty sure there is a better solution. I used ggplot2 to produce the plots, but I am not set on it. Here is what I have tried so far:
IntFormat = function(x,nbdigits=0){
cx = paste0(x)
l = length(strsplit(cx,"")[[1]])
if (nbdigits < l){nbdigits = l; print("WARNING: Parameter `nbdigits` too small")}
before = paste(rep("0",nbdigits-l), collapse="")
paste0(before, cx)
}
SmoothGraph = function(x1, x2, y1, y2, times = "NOTSET", ...){
path = "/Users/remi/Desktop/"
if (times == "NOTSET") {
times = c(seq(0,0.1,length.out=10), seq(0,0.2, length.out=5), seq(0.2,0.8, length.out=6), seq(0.8,0.9, length.out=5), seq(0.9,1,length.out=10))
}
nbdigits = length(times)
count = 0
x1_rel = x1 / (max(x1) - min(x1))
x1_rel = x1_rel - max(x1_rel) + 1
x2_rel = x2 / (max(x2) - min(x2))
x2_rel = x2_rel - max(x2_rel) + 1
y1_rel = y1 / (max(y1) - min(y1)) - min(y1)
y1_rel = y1_rel - max(y1_rel) + 1
y2_rel = y2 / (max(y2) - min(y2)) - min(y2)
y2_rel = y2_rel - max(y2_rel) + 1
x_diff = x2_rel - x1_rel
y_diff = y2_rel - y1_rel
for (time in times){
count = count + 1
xtmp = x1_rel + x_diff * time
ytmp = y1_rel + y_diff * time
print(count)
png(paste0(path, "SmoothGraph_", IntFormat(count, nbdigits=nbdigits), ".png"))
plot(x=xtmp, y=ytmp, ...)
dev.off()
}
system (command="
cd /Users/remi/Desktop/
convert SmoothGraph_*.png -delay 1 SmoothGraph.gif
rm SmoothGraph_*.png
")
}
SmoothGraph(x1 = rnorm(nrow(mtcars)), y1 = mtcars$mpg, x2 = mtcars$wt, y2 = mtcars$mpg)
This seems like a good fit for the animation package. You could do something like this. First, define a function that can interpolate between the two plots
framedata<-function(x) {
subset(transform(my_mtcars,
x=rand + x*(wt-rand),
y=mpg
), select=c(x,y))
}
Then you can animate them with
library(animation)
frame <- seq(0, 1, length.out=20)
saveGIF(lapply(frame, function(f) {
print( ggplot(framedata(f), aes(x, y)) + geom_point() )
}), "go.gif", interval = 0.05, loop=1)
This will create a 20-frame animation with a 0.05-second delay between frames that plays once.
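If the motion should speed up and slow down rather than advance at a constant rate, the frame positions can be passed through an easing function first. The following is a minimal sketch in base R; `ease` and `interpolate` are hypothetical helper names, the smoothstep curve is just one possible choice, and the coordinates are made up for illustration.

```r
# Smoothstep easing: maps t in [0, 1] to [0, 1] with zero slope at both ends
# (an illustrative choice, not part of the answer above).
ease <- function(t) t^2 * (3 - 2 * t)

# Linear interpolation between two coordinate vectors at progress s in [0, 1].
interpolate <- function(from, to, s) from + s * (to - from)

# Eased progress values that could replace the evenly spaced `frame` vector.
frame <- ease(seq(0, 1, length.out = 20))

# Made-up start and end coordinates for three points.
x_start <- c(-1, 0, 2)
x_end   <- c(3, 4, 2)

# Halfway through, each point sits at the midpoint of its path.
interpolate(x_start, x_end, ease(0.5))
```

Using these eased values in place of `frame` in the `saveGIF()` call should make the points start slowly, move faster in the middle, and settle gently.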
Very new programmer here. I'm struggling with this graph; I am trying to have one y axis reversed (for precipitation) and one y axis not reversed (for depth). I finally got both axes to work, but I can't figure out how to invert just one of them and the data too. The precipitation should be coming down from the top of the graph while the depth should stay how it is on the graph I shared here.
Here is what I tried:
ggp1 <-ggplot(BeachSensors) + geom_bar(aes(x=Date, y=Precip),stat="identity", fill="cyan",colour="#006000") + scale_y_reverse()
ggp1
ggp2 <- ggp1 + geom_line(aes(x=Date, y=CorrDepth, color=Sensor),stat="identity",size=1)+
labs(title= "GLSM Precipitation and Depth",
x="Date",y="Precipitation (in.)")+
scale_y_continuous(sec.axis=sec_axis(~.*2,name="Depth (ft.)")) + theme_minimal()
But I keep inverting both y axes at the same time, which is not what I want.
The "trick" (and often nightmares of inverting algebraic equations in high school) is knowing how to change the transforms for sec_axis, and knowing that you need to change the raw data itself.
I'm going to assume your data is in a "wide" format, looking somewhat like this:
set.seed(42)
dat <- data.frame(precip = runif(50, 0, 1.5)) |>
transform(
time = 1:50,
precip = ifelse(runif(50) < 0.8, 0, precip),
sensor = sample(c("inside", "outside"), 50, replace = TRUE)
) |>
transform(depth = 3.6 + cumsum(precip) - time/10)
head(dat)
#   precip time  sensor depth
# 1      0    1 outside   3.5
# 2      0    2 outside   3.4
# 3      0    3  inside   3.3
# 4      0    4 outside   3.2
# 5      0    5  inside   3.1
# 6      0    6 outside   3.0
From here, we need to know how to algebraically convert depth to the correct location and scale of the precip data.
The conversion is a "simple" scaling, meaning we need to adjust it to be completely within [0,1] and then adjust to the new scale. Using some prep objects,
rngdepth <- range(dat$depth)
diffdepth <- diff(rngdepth)
maxprecip <- max(dat$precip)
diffprecip <- diff(range(dat$precip))
We now have a simple relationship and the inversion. The two formulas we're interested in are the first and last:
y = maxprecip - diffprecip * (depth - rngdepth[1]) / diffdepth
maxprecip - y = diffprecip * (depth - rngdepth[1]) / diffdepth
diffdepth * (maxprecip - y) = diffprecip * (depth - rngdepth[1])
diffdepth * (maxprecip - y) / diffprecip = depth - rngdepth[1]
diffdepth * (maxprecip - y) / diffprecip + rngdepth[1] = depth
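A quick numerical check can confirm that the first and last formulas really are inverses of each other. This is a minimal sketch in base R with made-up `depth` and `precip` vectors standing in for the columns of `dat`; `to_precip_scale` and `to_depth_scale` are hypothetical helper names.

```r
# Made-up values standing in for dat$depth and dat$precip.
depth  <- c(3.5, 3.0, 2.4, 4.1, 3.8)
precip <- c(0, 0.7, 0, 1.2, 0)

rngdepth   <- range(depth)
diffdepth  <- diff(rngdepth)
maxprecip  <- max(precip)
diffprecip <- diff(range(precip))

# First formula: place depth on the (reversed) precipitation scale.
to_precip_scale <- function(depth) {
  maxprecip - diffprecip * (depth - rngdepth[1]) / diffdepth
}

# Last formula: map a position on the precipitation scale back to depth.
to_depth_scale <- function(y) {
  diffdepth * (maxprecip - y) / diffprecip + rngdepth[1]
}

y <- to_precip_scale(depth)
range(y)                              # spans the precipitation range
all.equal(to_depth_scale(y), depth)   # the round trip recovers the depths
```

The first function corresponds to what goes into the data transform, and the second to what goes into `sec_axis()`.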
I prefer to use ~-function transforms in the data, so that the original frame is not encumbered (confusingly, sometimes) with the variables. Here's the plotting code:
ggplot(dat, aes(x = time)) +
geom_col(aes(y = precip)) +
geom_line(
aes(x = time, y = depth2, color = sensor, group = sensor),
data = ~ transform(., depth2 = maxprecip - diffprecip * (depth - rngdepth[1]) / diffdepth)
) +
scale_y_reverse(
name = "Precipitation",
sec.axis = sec_axis(
~ diffdepth * (maxprecip - .) / diffprecip + rngdepth[1],
name = "Depth"
)
)
I have the following task statement:
In this task we want to simulate random variables with density f(x) = exp(-|x|)/2.
To do this, write a function r_density(n) that simulates n of such random variables.
Then use this function to simulate N = 1000 of such random variables. Using geom_density() you can now estimate the density from the simulated random variables. We can compare this estimate with the real density. To do this, create a graph that looks like this:
The problem is that I don't understand why my output looks like this:
Why is the estimated density plotted in such a weird way? Can someone explain why it looks like that and how to get the estimated density shown in the expected image?
This is the corresponding code I wrote for the above plot:
library(tidyverse)
N <- 1000
r_density <- function(n){
exp(-abs(n))/2
}
x <- runif(N)
tb <- tibble(
x = x,
density_fkt = r_density(x)
)
ggplot() +
geom_density(
data = tb,
mapping = aes(
x = density_fkt,
y = ..scaled..
)
) +
geom_function(
fun = r_density,
xlim = c(-6,6),
color = "red",
size = 1
) +
theme_minimal() +
labs(
x = "x",
y = "Dichtefunktion f(x)",
title = "Geschätzte (schwarz) vs echte (rot) Dichte"
)
You can use inverse transform sampling or rejection sampling. I chose rejection sampling.
library(tidyverse)
N <- 1000
r_density <- function(n){
exp(-abs(n))/2
}
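x = c()
while (length(x) < N) {
# Propose uniformly on [-6, 6] instead of from rnorm(): a normal proposal
# has lighter tails than the target, so no constant envelope can bound it
y = runif(1, -6, 6)
u = runif(1)
# Accept with probability f(y) / max(f), where max(f) = r_density(0) = 0.5
if (u < r_density(y) / r_density(0)) {
x=append(x, y)
}
}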
tb <- tibble(
x = x,
density_fkt = r_density(x)
)
ggplot() +
geom_density(
data = tb,
mapping = aes(
x = x,
y = ..density..
)
) +
geom_function(
fun = r_density,
xlim = c(-6,6),
color = "red",
size = 1
) +
theme_minimal() +
labs(
x = "x",
y = "Dichtefunktion f(x)",
title = "Geschätzte (schwarz) vs echte (rot) Dichte"
)
Here's the inverse transform sampling method (this involves some difficult integration, so perhaps not what your teacher intended)
r_density <- function(n) {
cdf <- function(x) {
1/4 * exp(-x) * (-1 + 2 * exp(x) + exp(2*x) - (-1 + exp(x))^2 * sign(x))
}
sapply(runif(n), function(i) {
uniroot(function(x) cdf(x) - i, c(-30, 20))$root
})
}
Plotting gives:
ggplot() +
geom_density(aes(r_density(1000))) +
geom_function(
fun = function(x) exp(-abs(x))/2,
xlim = c(-6,6),
color = "red",
size = 1
) +
theme_minimal() +
labs(
x = "x",
y = "Dichtefunktion f(x)",
title = "Geschätzte (schwarz) vs echte (rot) Dichte"
)
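For this particular density the inverse of the CDF can also be written in closed form, which avoids the numerical root search. The following is a minimal sketch, assuming the target density is exp(-|x|)/2 as in the code above; `q_laplace` and `r_laplace` are hypothetical names, not functions from any package.

```r
# Quantile function: the CDF is exp(x)/2 for x < 0 and 1 - exp(-x)/2 for x >= 0,
# so it can be inverted separately on each half of the unit interval.
q_laplace <- function(u) {
  ifelse(u < 0.5, log(2 * u), -log(2 * (1 - u)))
}

# Inverse transform sampling: push uniform draws through the quantile function.
r_laplace <- function(n) q_laplace(runif(n))

set.seed(1)
draws <- r_laplace(1000)
summary(draws)
```

Passing `draws` to `geom_density()` in place of `r_density(1000)` should give a comparable plot.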
I am trying to make a graph in ggplot2 of 7 points in the shape of a circle, but trying to graph them only shows me 6 and I don't know why this happens.
The code is the following:
# Function for the points
circleFun <- function(center = c(-1, 1), diameter = 1, npoints = 7) {
r <- diameter / 2
tt <- seq(0, 2 * pi, length.out = npoints)
xx <- center[1] + r * cos(tt)
yy <- center[2] + r * sin(tt)
return(data.frame(x = xx, y = yy))
}
# example with 7 points
ej <-
circleFun(diameter = 50, center = c(50,50), npoints = 7)
# plot
ej |>
ggplot(aes(x = x, y = y)) +
geom_point(alpha = 0.4) +
theme_bw()
Does anyone know why this happens?
Rows 1 and 7 are identical, so their points overlap. The dot is a bit darker (per your alpha = 0.4). You can make this obvious by adding x = jitter(x) (for demonstration only, not something you'd do in production). I'm not sure what you're expecting to see given the identical data.
If you want 7 distinct points, then I suggest you create n+1 and remove the last (or first) point.
circleFun <- function(center = c(-1, 1), diameter = 1, npoints = 7) {
r <- diameter / 2
tt <- seq(0, 2 * pi, length.out = npoints + 1) # changed
xx <- center[1] + r * cos(tt)
yy <- center[2] + r * sin(tt)
data.frame(x = xx, y = yy)[-1,,drop = FALSE] # changed
}
## unchanged from here on
ej <-
circleFun(diameter = 50, center = c(50,50), npoints = 7)
ej |>
ggplot(aes(x = x, y = y)) +
geom_point(alpha = 0.4) +
theme_bw()
(BTW, there is no need for an explicit call to return(.), especially when it is the only end-point of the function and "obvious" based on data flow. It certainly doesn't hurt, but it adds one step on the call stack that adds no value. It may be declarative/self-documenting, and as such this is a style/subjective point.)
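The overlap can also be confirmed without plotting. The sketch below uses base R only and the same centre and diameter as the example; `pts`, `tt7`, and `pts7` are illustrative names. Note that it drops the closing angle (the last one) rather than the first, which is the other option mentioned above.

```r
# Seven angles from 0 to 2*pi inclusive: the last angle closes the circle.
tt  <- seq(0, 2 * pi, length.out = 7)
pts <- data.frame(x = 50 + 25 * cos(tt), y = 50 + 25 * sin(tt))

# Rows 1 and 7 coincide up to floating-point error.
isTRUE(all.equal(unlist(pts[1, ]), unlist(pts[7, ])))

# Asking for one extra angle and dropping the closing one leaves 7 distinct points.
tt7  <- head(seq(0, 2 * pi, length.out = 8), -1)
pts7 <- data.frame(x = 50 + 25 * cos(tt7), y = 50 + 25 * sin(tt7))
nrow(unique(round(pts7, 8)))
```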
Given a family of functions f(x;q) (x is the argument and q is a parameter), I'd like to visualize this family for x in the interval [0,1] and for 9 values of q (from 0.1 to 0.9). So far my solution is:
f = function(p,q=0.9) {1-(1-(p*q)^3)^1024}
x = seq(0.0,0.99,by=0.01)
q = seq(0.1,0.9,by=0.1)
qplot(rep(x,9), f(rep(x,9),rep(q,each=100)), colour=factor(rep(q,each=100)),
geom="line", size=I(0.9), xlab="x", ylab=expression("y=f(x)"))
I get a quick and easy visual with qplot:
My concern is that this method is rather memory hungry, as I need to duplicate x for each parameter value and duplicate each parameter value over the whole x range. What would be an alternative way to produce the same graph without these duplications?
At some point ggplot will need to have the data available to plot it and the way that package works prohibits simply doing what you want. I suppose you could set up a blank plot if you know the x and y axis limits, and then loop over the 9 values of q, generating the data for that q, and adding a geom_line layer to the existing plot object. However, you'll have to produce the colours for each layer yourself.
If this is representative of the size of problem you have, I wouldn't worry too much about the memory footprint. We're only talking about two vectors of length 900
> object.size(rnorm(900))
7240 bytes
and the 100 values over the range of x appears sufficient to give a smooth plot.
for loop to add layers to ggplot
require("ggplot2")
## something to replicate ggplot's colour palette, sure there is something
## to do this already in ggplot now...
ggHueColours <- function(n, h = c(0, 360) + 15, l = 65, c = 100,
direction = 1, h.start = 0) {
turn <- function(x, h.start, direction) {
(x + h.start) %% 360 * direction
}
if ((diff(h) %% 360) < 1) {
h[2] <- h[2] - 360 / n
}
hcl(h = turn(seq(h[1], h[2], length = n), h.start = h.start,
direction = direction), c = c, l = l)
}
f = function(p,q=0.9) {1-(1-(p*q)^3)^1024}
x = seq(0.0,0.99,by=0.01)
q = seq(0.1,0.9,by=0.1)
cols <- ggHueColours(n = length(q))
for(i in seq_along(q)) {
df <- data.frame(y = f(x, q[i]), x = x)
if(i == 1) {
plt <- ggplot(df, aes(x = x, y = y)) + geom_line(colour = cols[i])
} else {
plt <- plt + geom_line(data = df, colour = cols[i])
}
}
plt
which gives:
I'll leave the rest to you - I'm not familiar enough with ggplot to draw a legend manually.
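If the memory footprint is acceptable, as argued above, one way to avoid drawing a legend by hand is to keep the data in long format and let the colour aesthetic carry q. The following is a minimal sketch in base R; the data frame name `curves` is an assumption made for illustration.

```r
f <- function(p, q = 0.9) 1 - (1 - (p * q)^3)^1024
x <- seq(0, 0.99, by = 0.01)
q <- seq(0.1, 0.9, by = 0.1)

# One row per (x, q) combination; expand.grid() does the repetition.
curves <- expand.grid(x = x, q = q)
curves$y <- f(curves$x, curves$q)

dim(curves)
print(object.size(curves), units = "Kb")
```

With this layout, something like `ggplot(curves, aes(x, y, colour = factor(q))) + geom_line()` should produce one coloured line per value of q together with an automatic legend.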
For very heavy-tailed data of both positive and negative sign, I sometimes like to see all the data on a plot without hiding structure in the unit interval.
When plotting with Matplotlib in Python, I can achieve this by selecting a symlog scale, which uses a logarithmic transform outside some interval, and linear plotting inside it.
Previously in R I have constructed similar behavior by transforming the data with an arcsinh on a one-off basis. However, tick labels and the like are very tricky to do right (see below).
Now, I am faced with a bunch of data where the subsetting in lattice or ggplot would be highly convenient. I don't want to use Matplotlib because of the subsetting, but I sure am missing symlog!
Edit:
I see that ggplot uses a package called scales, which solves a lot of this problem (if it works). Automatically choosing tick mark and label placing still looks pretty hard to do nicely though. Some combination of log_breaks and cbreaks perhaps?
Edit 2:
The following code is not too bad
sinh.scaled <- function(x,scale=1){ sinh(x)*scale }
asinh.scaled <- function(x,scale=1) { asinh(x/scale) }
asinh_breaks <- function (n = 5, scale = 1, base=10)
{
function(x) {
log_breaks.callable <- log_breaks(n=n,base=base)
rng <- range(x, na.rm = TRUE)
minx <- floor(rng[1])
maxx <- ceiling(rng[2])
if (maxx == minx)
return(sinh.scaled(minx, scale=scale))
big.vals <- 0
if (minx < (-scale)) {
big.vals = big.vals + 1
}
if (maxx>scale) {
big.vals = big.vals + 1
}
brk <- c()
if (minx < (-scale)) {
rbrk <- log_breaks.callable( c(-min(maxx,-scale), -minx ) )
rbrk <- -rev(rbrk)
brk <- c(brk,rbrk)
}
if ( !(minx>scale | maxx<(-scale)) ) {
rng <- c(max(minx,-scale), min(maxx,scale))
minc <- floor(rng[1])
maxc <- ceiling(rng[2])
by <- floor((maxc - minc)/(n-big.vals)) + 1
cb <- seq(minc, maxc, by = by)
brk <- c(brk,cb)
}
if (maxx>scale) {
brk <- c(brk,log_breaks.callable( c(max(minx,scale), maxx )))
}
brk
}
}
asinh_trans <- function(scale = 1) {
trans <- function(x) asinh.scaled(x, scale)
inv <- function(x) sinh.scaled(x, scale)
trans_new(paste0("asinh-", format(scale)), trans, inv,
asinh_breaks(scale = scale),
domain = c(-Inf, Inf))
}
A solution based on the package scales and inspired by Brian Diggs' post mentioned by @Dennis:
symlog_trans <- function(base = 10, thr = 1, scale = 1){
trans <- function(x)
ifelse(abs(x) < thr, x, sign(x) *
(thr + scale * suppressWarnings(log(sign(x) * x / thr, base))))
inv <- function(x)
ifelse(abs(x) < thr, x, sign(x) *
base^((sign(x) * x - thr) / scale) * thr)
breaks <- function(x){
sgn <- sign(x[which.max(abs(x))])
if(all(abs(x) < thr))
pretty_breaks()(x)
# log_breaks() takes n as its first argument, so base is passed by name
else if(prod(x) >= 0){
if(min(abs(x)) < thr)
sgn * unique(c(pretty_breaks()(c(min(abs(x)), thr)),
log_breaks(base = base)(c(max(abs(x)), thr))))
else
sgn * log_breaks(base = base)(sgn * x)
} else {
if(min(abs(x)) < thr)
unique(c(sgn * log_breaks(base = base)(c(max(abs(x)), thr)),
pretty_breaks()(c(sgn * thr, x[which.min(abs(x))]))))
else
unique(c(-log_breaks(base = base)(c(thr, -x[1])),
pretty_breaks()(c(-thr, thr)),
log_breaks(base = base)(c(thr, x[2]))))
}
}
trans_new(paste("symlog", thr, base, scale, sep = "-"), trans, inv, breaks)
}
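A transformation is only usable as a scale if the inverse undoes the forward mapping, so a round-trip check is a cheap sanity test. The sketch below copies the `trans` and `inv` bodies from the function above into base R with the default parameters (base 10, threshold 1, scale 1); it does not test other parameter values or the breaks function.

```r
base <- 10; thr <- 1; scale <- 1

# Forward mapping: linear inside (-thr, thr), logarithmic outside.
trans <- function(x)
  ifelse(abs(x) < thr, x, sign(x) *
    (thr + scale * suppressWarnings(log(sign(x) * x / thr, base))))

# Inverse mapping: linear inside (-thr, thr), exponential outside.
inv <- function(x)
  ifelse(abs(x) < thr, x, sign(x) *
    base^((sign(x) * x - thr) / scale) * thr)

x <- c(-1000, -10, -1, -0.5, 0, 0.5, 1, 10, 1000)
trans(x)                      # approximately -4, -2, -1, -0.5, 0, 0.5, 1, 2, 4
all.equal(inv(trans(x)), x)   # the round trip recovers the input
```

The scales package also exports `pseudo_log_trans()`, a smooth transformation with similar behaviour that may be worth comparing against.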
I am not sure whether the impact of a parameter scale is the same as in Python, but here are a couple of comparisons (see Python version here):
data <- data.frame(x = seq(-50, 50, 0.01), y = seq(0, 100, 0.01))
data$y2 <- sin(data$x / 3)
# symlogx
ggplot(data, aes(x, y)) + geom_line() + theme_bw() +
scale_x_continuous(trans = symlog_trans())
# symlogy
ggplot(data, aes(y, x)) + geom_line() + theme_bw() +
scale_y_continuous(trans="symlog")
# symlog both, threshold = 0.015 for y
# not too pretty because of too many breaks in short interval
ggplot(data, aes(x, y2)) + geom_line() + theme_bw() +
scale_y_continuous(trans=symlog_trans(thr = 0.015)) +
scale_x_continuous(trans = "symlog")
# Again symlog both, threshold = 0.15 for y
ggplot(data, aes(x, y2)) + geom_line() + theme_bw() +
scale_y_continuous(trans=symlog_trans(thr = 0.15)) +
scale_x_continuous(trans = "symlog")