How to simulate data and visualize in a single R function - r

I'm using replicate to simulate distributions in R and visualize how they change with different parameters (e.g., rbinom(100,1,0.5) vs. rbinom(100,1,0.01)).
I'd like to do all of this within a single function that 1. simulates replicates, 2. sets the plotting dimensions and parameters, and 3. loops through and draws density curves.
In separate pieces, this code works no problem:
n <- 100
d <- as.data.frame(
replicate(n,
expr = rbinom(n, 1, 0.5),
simplify = F)
)
colnames(d) <- 1:n
plot( NULL, xlim = c( min(d)-0.5, max(d)+0.5), ylim = c(0,2))
for(i in 1:n) lines( density( d[,i]) )
But inside a function, only a single density curve is returned:
plotcurves <- function(n, distr, ymax) {
d <- as.data.frame(
replicate(n,
expr = distr,
simplify = F)
)
colnames(d) <- 1:n
plot( NULL, xlim = c( min(d)-0.5, max(d)+0.5), ylim = c(0,ymax))
for(i in 1:n) lines( density( d[,i]) )
}
plotcurves(n = 100, distr = rbinom(100, 1, 0.5), ymax = 2)
The solution seems like it would be very simple but I cannot seem to find it.
What do I need to do to fix the code OR does a function like this already exist that I am unaware of?

The problem is that inside your function, distr is evaluated only once. R passes arguments lazily, and once the argument has been evaluated its value is cached, so every replicate reuses the same draw. You can see this if you make a variation of the function that just returns the data frame d instead of plotting it:
show_d <- function(n, distr, ymax)
{
d <- as.data.frame(
replicate(n,
expr = distr,
simplify = F)
)
return(d)
}
show_d(n = 3, distr = rbinom(5, 1, 0.5), ymax = 2)
#> c.1L..0L..1L..1L..1L. c.1L..0L..1L..1L..1L..1 c.1L..0L..1L..1L..1L..2
#> 1 1 1 1
#> 2 0 0 0
#> 3 1 1 1
#> 4 1 1 1
#> 5 1 1 1
You'll notice the columns are all the same. Effectively, the call to rbinom was evaluated then passed to replicate, which is the same as calling replicate(3, c(1, 0, 1, 1, 1)). So you are plotting all the lines - it's just that the lines are all the same.
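The same effect can be checked without any plotting. This is only a minimal sketch; n_distinct_draws is a helper name made up for the illustration, and it simply counts how many different vectors replicate produces:

```r
# Hypothetical helper (not from the original post): how many distinct draws
# does replicate() produce when the expression arrives as a function argument?
n_distinct_draws <- function(n, distr) {
  draws <- replicate(n, distr, simplify = FALSE)
  length(unique(draws))
}

set.seed(42)
# Inside a function: distr is evaluated once and the same vector is reused
inside <- n_distinct_draws(5, rnorm(3))
# At top level: the expression is re-evaluated for every replicate
outside <- length(unique(replicate(5, rnorm(3), simplify = FALSE)))

print(c(inside = inside, outside = outside))
```

If the two counts differ, the expression was only evaluated once inside the function.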
What you need to do inside the function is capture distr as an unevaluated call and evaluate it afresh on every replicate, rather than letting it be evaluated once and reused as a vector. You can do this using match.call() and extracting the third element (which is the second parameter, since the first element is the function name), then wrapping it in eval() inside replicate:
plotcurves <- function(n, distr, ymax) {
  mc <- match.call()[[3]]  # the unevaluated call supplied as distr
  d <- as.data.frame(
    replicate(n,
              expr = eval(mc),  # re-run the call for each replicate
              simplify = FALSE)
  )
  colnames(d) <- 1:n
  plot(NULL, xlim = c(min(d) - 0.5, max(d) + 0.5), ylim = c(0, ymax))
  for (i in 1:n) lines(density(d[, i]))
}
plotcurves(n = 100, distr = rbinom(100, 1, 0.5), ymax = 2)
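An alternative that avoids inspecting the call at all is to pass a zero-argument function that draws one sample and to call it inside replicate. The following is a sketch under that assumption; plotcurves_fn, rdist and the draw switch are names invented here, not part of the original code:

```r
# Sketch: pass a sampler function instead of an already-written call.
# `plotcurves_fn`, `rdist` and `draw` are illustrative names.
plotcurves_fn <- function(n, rdist, ymax, draw = interactive()) {
  # rdist() is called anew for every replicate, so each element is a fresh sample
  d <- replicate(n, rdist(), simplify = FALSE)
  if (draw) {
    rng <- range(unlist(d))
    plot(NULL, xlim = c(rng[1] - 0.5, rng[2] + 0.5), ylim = c(0, ymax),
         xlab = "value", ylab = "density")
    for (draws in d) lines(density(draws))
  }
  invisible(d)  # return the simulated samples for further inspection
}

set.seed(1)
sims <- plotcurves_fn(n = 100, rdist = function() rbinom(100, 1, 0.5), ymax = 2)
```

The draw argument defaults to interactive(), so the curves are only drawn in an interactive session; the simulated samples are returned either way.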

Draw a vector field from matrix multiplication r

I'm trying to plot a vector field based on a matrix multiplication. The problem is that the function that produces the values for the matrix multiplication can only take a single number. When a range of numbers is passed to the all.p function, the output cannot be used in the matrix multiplication. Is there a way to change all.p so that, with multiple inputs, the matrix multiplication is still valid and the vector field can be computed? The code fails at the vectorfield call because vectorfield will pass in values over the range 0 to 1, but all.p can't take multiple inputs.
geno.fit = matrix(c(0.791,1.000,0.834,
0.670,1.006,0.901,
0.657,0.657,1.067),
nrow = 3,
ncol = 3,
byrow = T)
all.p <- function(p) {
if (length(p)>1) {
stop("More numbers in input than expected")
}
P = p^2
PQ = 2*p*(1-p)
Q = (1-p)^2
return(list=c(P=P,PQ=PQ,Q=Q))
}
library(pracma)
f <- function(x, y) all.p(x) %*% geno.fit %*% all.p(y)
xx <- c(0, 1); yy <- c(0, 1)
vectorfield(fun = f, xlim = xx, ylim = yy, scale = 0.1)
for (xs in seq(0, 1, by = 0.25)) {
sol <- rk4(f, 0, 1, xs, 100)
lines(sol$x, sol$y, col="darkgreen")
}
grid()
I also tried to use a for loop.
f <- function(x, y, n = 16) {
space3 = matrix(NA,nrow = n,ncol = n)
for (i in 1:(length(x))) {
for (j in 1:(length(y))) {
# Calculate mean fitness
space3[i,j] = all.p(x[i]) %*% geno.fit %*% all.p(y[j])
}
}
return(space3)
}
xx <- c(0, 1); yy <- c(0, 1)
f(seq(0,1,length.out = 16), seq(0,1,length.out = 16))
vectorfield(fun = f, xlim = xx, ylim = yy, scale = 0.1)
Below is the code to make the gradient ascent plot (without the vectors).
library(fields) # for image.plot
res = 0.01
seq.x = seq(0,1,by = res)
space = outer(seq.x,seq.x,"*")
space2 = space
for (i in 1:length(seq.x)) {
for (j in 1:length(seq.x)) {
space[i,j] = all.p(1-seq.x[i]) %*% geno.fit %*% all.p(1-seq.x[j])
}
}
round(t(space),3)
new.space = t(space)
image.plot(new.space)
by.text = 8
for (i in seq(1,length(seq.x),by = by.text)) {
for (j in seq(1,length(seq.x),by = by.text)) {
text(seq.x[i],seq.x[j],
labels = round(new.space[i,j],4),
cex = new.space[i,j]/2,
col = "black")
}
}
contour(new.space,ylim=c(1,0),add = T, nlevels = 50)
I was able to make the vector field function work, but it's not showing what I was expecting from the previous gradient ascent plot.
How can the two be reconciled? (i.e., plotting the vectors on the gradient ascent image, which would show the proper direction of the vectors along the steepest ascent)
Here is my solution:
library(fields) # for image.plot
library(plotly)
library(raster)
# Genotype fitness matrix -------------------------------------------------
geno.fit = matrix(c(0.791,1.000,0.834,
0.670,1.006,0.901,
0.657,0.657,1.067),
nrow = 3,
ncol = 3,
byrow = T)
# Resolution
res = 0.01
# Sequence of X
seq.x = seq(0,1,by = res)
# Make a matrix
space = outer(seq.x,seq.x,"*")
# Function to calculate the AVERAGE fitness for a given frequency of an allele to get the expected frequency of genotypes in a population
all.p <- function(p) { # Takes frequency of an allele in the population
if (length(p)>1) { # Has to be only 1 number
stop("More numbers in input than expected")
}
P = p^2 # Gets the AA
PQ = 2*p*(1-p) # gets the Aa
Q = (1-p)^2 # Gets the aa
return(list=c(P=P, # Return the values
PQ=PQ,
Q=Q))
}
# Examples
all.p(0)
all.p(1)
# Plot the matrix of all combinations of genotype frequencies
image.plot(space,
ylim=c(1.05,-0.05),
ylab= "Percentage of Chromosome EF of TD form",
xlab= "Percentage of Chromosome CD of BL form")
# Backup the data
space2 = space
# calculate the average fitness for EVERY combination of frequency of 2 genotypes
for (i in 1:length(seq.x)) {
for (j in 1:length(seq.x)) {
# Calculate mean fitness
space[i,j] = all.p(1-seq.x[i]) %*% geno.fit %*% all.p(1-seq.x[j])
}
}
# Show the result
round(t(space),3)
# Transform the space
new.space = t(space)
image.plot(new.space,
# ylim=c( 1.01,-0.01),
ylab= "Percentage of Chromosome EF of TD (Tidbinbilla) form",
xlab= "Percentage of Chromosome CD of BL (Blundell) form")
# Add the numbers to get a better sense of the average fitness values at each point
by.text = 8
for (i in seq(1,length(seq.x),by = by.text)) {
for (j in seq(1,length(seq.x),by = by.text)) {
text(seq.x[i],seq.x[j],
labels = round(new.space[i,j],4),
cex = new.space[i,j]/2,
col = "black") # col = "gray70"
}
}
# Add contour lines
contour(new.space,ylim=c(1,0),add = T, nlevels = 50)
# Plotly 3D graph --------------------------------------------------------
# To get the 3D plane in an INTERACTIVE graph
xyz=cbind(expand.grid(seq.x,
seq.x),
as.vector(new.space))
plot_ly(x = xyz[,1],y = xyz[,2],z = xyz[,3],
color = xyz[,3])
# Vector field on the Adaptive landscape ----------------------------------
library(tidyverse)
library(ggquiver)
raster2quiver <- function(rast, aggregate = 50, colours = terrain.colors(6), contour.breaks = 200)
{
names(rast) <- "z"
quiv <- aggregate(rast, aggregate)
terr <- terrain(quiv, opt = c('slope', 'aspect'))
quiv$u <- -terr$slope[] * sin(terr$aspect[])
quiv$v <- -terr$slope[] * cos(terr$aspect[])
quiv_df <- as.data.frame(quiv, xy = TRUE)
rast_df <- as.data.frame(rast, xy = TRUE)
print(ggplot(mapping = aes(x = x, y = y, fill = z)) +
geom_raster(data = rast_df, na.rm = TRUE) +
geom_contour(data = rast_df,
aes(z=z, color=..level..),
breaks = seq(0,3, length.out = contour.breaks),
size = 1.4)+
scale_color_gradient(low="blue", high="red")+
geom_quiver(data = quiv_df, aes(u = u, v = v), vecsize = 1.5) +
scale_fill_gradientn(colours = colours, na.value = "transparent") +
theme_bw())
return(quiv_df)
}
r <-raster(
space,
xmn=range(seq.x)[1], xmx=range(seq.x)[2],
ymn=range(seq.x)[1], ymx=range(seq.x)[2],
crs=CRS("+proj=utm +zone=11 +datum=NAD83")
)
# Draw the adaptive landscape
raster2quiver(rast = r, aggregate = 2, colours = tim.colors(100))
Not exactly what I wanted, but it does what I was looking for!
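On the original question of making all.p accept several frequencies at once, one possible approach is to return one row per frequency, so the whole fitness grid comes out of a single matrix product. This is a sketch only; all_p_matrix and mean_fitness are hypothetical names, and the fitness matrix is copied from the post:

```r
# Genotype fitness matrix, copied from the post
geno.fit <- matrix(c(0.791, 1.000, 0.834,
                     0.670, 1.006, 0.901,
                     0.657, 0.657, 1.067),
                   nrow = 3, ncol = 3, byrow = TRUE)

# Hypothetical vectorized variant of all.p: one row per allele frequency,
# with columns for the AA, Aa and aa genotype frequencies
all_p_matrix <- function(p) {
  cbind(P = p^2, PQ = 2 * p * (1 - p), Q = (1 - p)^2)
}

# Mean fitness for every pair (p[i], q[j]) in one matrix product
mean_fitness <- function(p, q) {
  all_p_matrix(p) %*% geno.fit %*% t(all_p_matrix(q))
}

p <- seq(0, 1, by = 0.25)
w <- mean_fitness(p, p)
print(round(w, 3))
```

Entry w[i, j] equals all.p(p[i]) %*% geno.fit %*% all.p(p[j]) from the loop version. The post evaluates all.p(1 - seq.x[i]), so passing 1 - seq.x for both arguments should reproduce its grid.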

How can I plot this 'integral' code in R?

f1 <- function(x){integrate(f = function(t){
sqrt(t^3-1)
}, lower = 1, upper = x)}
The domain of x is 1 to 4. f1 always returns an object of class 'integrate' rather than a plain number, and I don't know how to plot this integral function in R.
Thanks to anyone who can help me.
You need to evaluate f1 at each point of a grid, extracting the value component (for example with sapply), and then plot the resulting numbers:
f1 <- function(x) {
integrate( function(t) sqrt(t^3-1), lower = 1, upper = x)
}
u <- seq(1, 4, by = 0.1) # Defining a vector of values from 1 to 4 in steps of .1
f1u <- sapply(u, function(x) f1(x)$value) #computing the values of f1 over u
plot(u,f1u, type = "l", xlab = "x", ylab = "f1(x)") # your plot
You can vectorize the upper argument to integrate like so:
vintegrate <- Vectorize(integrate, "upper")
f1 <- function(x) {
unlist(vintegrate(function(t) sqrt(t^3-1), lower = 1, upper = x)[1,])
}
Then you can plot using the curve function in base R:
curve(f1(x), from = 1, to = 4)
Or using ggplot2:
library(ggplot2)
ggplot(data.frame(x = 0)) +
geom_function(fun = f1) +
xlim(1, 4)
Without vectorizing, the upper argument expects a vector of length 1, otherwise integrate throws an error:
integrate(function(t) sqrt(t^3-1), lower = 1, upper = 1:4)
Error in integrate(function(t) sqrt(t^3 - 1), lower = 1, upper = 1:4) :
length(upper) == 1 is not TRUE
After vectorizing:
vintegrate(function(t) sqrt(t^3-1), lower = 1, upper = 1:4)
             [,1]       [,2]         [,3]         [,4]
value        0          1.515927     5.356315     11.84309
abs.error    0          0.0001312847 0.0003641383 0.0006563824
subdivisions 1          5            5            5
message      "OK"       "OK"         "OK"         "OK"
call         Expression Expression   Expression   Expression
We then use [1,] to pick out the value row and unlist to turn it into a plain numeric vector.
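As a quick cross-check, the sapply approach and the Vectorize approach can be compared side by side. This is a small sketch; f1_sapply and f1_vec are names chosen here, and the value row is selected by name rather than by position:

```r
integrand <- function(t) sqrt(t^3 - 1)

# Approach 1: loop over the upper limits with sapply
f1_sapply <- function(x) {
  sapply(x, function(u) integrate(integrand, lower = 1, upper = u)$value)
}

# Approach 2: vectorize integrate over `upper`, then pick the "value" row by name
vintegrate <- Vectorize(integrate, "upper")
f1_vec <- function(x) {
  unlist(vintegrate(integrand, lower = 1, upper = x)["value", ])
}

u <- 1:4
print(f1_sapply(u))
print(all.equal(f1_sapply(u), f1_vec(u)))
```

Selecting the row as "value" instead of 1 is a matter of taste; it just makes the intent explicit.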

Plotting multiple rasters: restriction on number of panels

Is there a restriction on how many rasters I can plot using the plot function? For example:
library(raster)
mystack <- stack()
for(i in 1:25){
df <- data.frame( x = rep( 0:1, each=2 ),
y = rep( 0:1, 2),
l = rnorm( 4 ))
dfr <- rasterFromXYZ(df)
mystack <- stack(mystack, dfr)
}
plot(mystack)
It plots only 16 plots no matter how many iterations I do.
You were on the right track, from ?`plot,Raster,ANY-method`:
## S4 method for signature 'Raster,ANY'
plot(x, y, maxpixels=500000, col, alpha=NULL, colNA=NA, add=FALSE, ext=NULL, useRaster=TRUE,
     interpolate=FALSE, addfun=NULL, nc, nr, maxnl=16, main, npretty=0,
     ...)
Notice maxnl = 16. All you need is to change that to the desired number of layers:
plot(mystack, maxnl=25)

Vectorized R function to produce sets of histograms

I have a vectorized R function (see below). On each run, the function plots two histograms. My goal is that when the argument n is a vector (see the example of use below), the function plots length(n) separate sets of these histograms. For example, if n is a vector of length 2, I expect two sets of histograms, i.e., 4 individual histograms.
I have tried the following with no success. Is there a way to do this?
t.sim = Vectorize(function(n, es, n.sim){
d = numeric(n.sim)
p = numeric(n.sim)
for(i in 1:n.sim){
N = sqrt((n^2)/(2*n))
x = rnorm(n, es, 1)
y = rnorm(n, 0, 1)
a = t.test(x, y, var.equal = TRUE)
d[i] = a[[1]]/N
p[i] = a[[3]]
}
par(mfcol = c(2, length(n)))
hist(p) ; hist(d)
}, "n")
# Example of use:
t.sim(n = c(30, 300), es = .1, n.sim = 1e3) # `n` is a vector of `2` so I expect
# 4 histograms in my graphical device
Vectorize is built on mapply, which calls the function once for each element of the vectorized argument. Inside each call n therefore has length 1, so par(mfcol = c(2, length(n))) resets the layout to 2 x 1 every time. The easiest way out is to set the layout once, outside the function:
t.sim = Vectorize(function(n, es, n.sim){
d = numeric(n.sim)
p = numeric(n.sim)
for(i in 1:n.sim){
N = sqrt((n^2)/(2*n))
x = rnorm(n, es, 1)
y = rnorm(n, 0, 1)
a = t.test(x, y, var.equal = TRUE)
d[i] = a[[1]]/N
p[i] = a[[3]]
}
# par(mfcol = c(2, npar))
hist(p) ; hist(d)
}, "n")
#inputs
data <- c(30,300)
par(mfcol = c(2, length(data)))
t.sim(n = data, es = c(.1), n.sim = 1e3)
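The reason the layout has to be set outside can be seen with a tiny sketch that reports what the function body actually receives. Here len_inside is a made-up name used only for this illustration:

```r
# Each call made by a Vectorize()d function receives one element of n,
# so length(n) inside the function body never exceeds 1.
len_inside <- Vectorize(function(n, es) length(n), "n")

lens <- len_inside(n = c(30, 300), es = 0.1)
print(lens)
```

Because the body never sees the full vector, anything that depends on length(n), such as the number of plot columns, has to be computed before calling the vectorized function.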

Graphical output of density for the function gammamixEM (package mixtools)

I'm using the function gammamixEM from the package mixtools. How can I return the graphical output of density as in the function normalmixEM (i.e., the second plot in plot(...,which=2)) ?
Update:
Here is a reproducible example for the function gammamixEM:
x <- c(rgamma(200, shape = 0.2, scale = 14), rgamma(200,
shape = 32, scale = 10), rgamma(200, shape = 5, scale = 6))
out <- gammamixEM(x, lambda = c(1, 1, 1)/3, verb = TRUE)
Here is a reproducible example for the function normalmixEM:
data(faithful)
attach(faithful)
out <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
plot(out, which=2)
I would like to obtain this graphical output of density from the function gammamixEM.
Here you go.
out <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
x <- out
whichplots <- 2
density = 2 %in% whichplots
loglik = 1 %in% whichplots
def.par <- par(ask=(loglik + density > 1), "mar") # only ask and mar are changed
mix.object <- x
k <- ncol(mix.object$posterior)
x <- sort(mix.object$x)
a <- hist(x, plot = FALSE)
maxy <- max(max(a$density), .3989*mix.object$lambda/mix.object$sigma)
The code above comes from digging into the source code of plot.mixEM.
So, now to do this with gammamixEM:
x <- c(rgamma(200, shape = 0.2, scale = 14), rgamma(200,
shape = 32, scale = 10), rgamma(200, shape = 5, scale = 6))
gammamixEM.out <- gammamixEM(x, lambda = c(1, 1, 1)/3, verb = TRUE)
mix.object <- gammamixEM.out
k <- ncol(mix.object$posterior)
x <- sort(mix.object$x)
a <- hist(x, plot = FALSE)
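# A gammamixEM fit has no sigma component, so scale the y-axis from the histogram alone
maxy <- max(a$density)
main2 <- "Density Curves"
xlab2 <- "Data"
col2 <- 2:(k+1)
hist(x, prob = TRUE, main = main2, xlab = xlab2,
     ylim = c(0,maxy))
# Assumes gamma.pars is the 2 x k matrix of estimates described in the mixtools
# documentation: row 1 = shape (alpha), row 2 = scale (beta)
for (i in 1:k) {
  lines(x, mix.object$lambda[i] *
          dgamma(x,
                 shape = mix.object$gamma.pars[1, i],
                 scale = mix.object$gamma.pars[2, i]),
        col = col2[i])
}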
I believe it should be pretty straightforward to extend this example if you want to add labels, smooth lines, etc. The source of the plot.mixEM function is a useful reference for that.
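The overlay step itself can be sketched in base R without mixtools. The parameters below are made up for the illustration and are not EM estimates; component_density is a hypothetical helper:

```r
# Sketch with made-up parameters (not EM estimates): a two-component gamma mixture
lambda <- c(0.3, 0.7)  # mixing proportions, must sum to 1
shape  <- c(2, 5)      # hypothetical component shapes
scale  <- c(1, 3)      # hypothetical component scales

# Density of component i at x, weighted by its mixing proportion
component_density <- function(x, i) {
  lambda[i] * dgamma(x, shape = shape[i], scale = scale[i])
}

# Simulate from the mixture: pick a component, then draw from it
set.seed(123)
comp <- sample(seq_along(lambda), 500, replace = TRUE, prob = lambda)
x <- sort(rgamma(500, shape = shape[comp], scale = scale[comp]))

if (interactive()) {
  hist(x, prob = TRUE, main = "Density Curves", xlab = "Data")
  for (i in seq_along(lambda)) lines(x, component_density(x, i), col = i + 1)
}

# Sanity check with a simple Riemann sum: each weighted curve should have
# an area close to its mixing proportion
grid <- seq(0, 100, by = 0.01)
areas <- sapply(seq_along(lambda), function(i) sum(component_density(grid, i)) * 0.01)
print(round(areas, 3))
```

With a fitted model, the made-up lambda, shape and scale vectors would be replaced by the estimates stored in the fit object.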
