Simulating a Poisson process using R and ggplot2

Using a simulation of a Poisson process with rate lambda = 0.7, show a sample run of the process with N(t) on the vertical axis and time t on the horizontal axis, for t in [0, 100]. The first graph should contain 10 trajectories and the second graph 100 trajectories.
I have tried the following code but I cannot generate both graphs.
library(plyr)
library(ggplot2)
Process_poisson <- function(t, lambda){
  distr_poisson <- rpois(1, t*lambda)
  s_poisson <- sort(runif(distr_poisson, 0, t))
  data.frame(x = c(0, 0, s_poisson), y = c(0, 0:distr_poisson))
}
N_simulations <- function(n, t, lambda){
  s_poisson <- lapply(1:n, function(n) data.frame(Process_poisson(t, lambda), simulation = n))
  s_poisson <- ldply(s_poisson, data.frame)
  s_poisson$simulation <- factor(s_poisson$simulation)
}
t <- 0:100
lambda <- 0.7
N_simulations(10, t, lambda)
N_simulations(100, t, lambda)
par(mfrow = c(1,2))
matplot(x, y, type = "l", lty = 0:5, lwd = 1, lend = par("lend"),
        pch = NULL, col = simulation, cex = 0.5, bg = NA,
        main = sprintf("Nº simulations of trajectories of Poisson Process", 10, lambda),
        xlab = "Time", ylab = "N(t)", xlim = c(0,100), ylim = c(-10,0))
matplot(Process_poisson(t, lambda), n, y, type = "l", lty = 0:5, lwd = 1, lend = par("lend"),
        pch = NULL, col = simulation, cex = 0.5, bg = NA,
        main = sprintf("Nº simulations of trajectories of Poisson Process", 10, lambda),
        xlab = "Time", ylab = "N(t)", xlim = c(0,100), ylim = c(-10,0))
How could I do it?
Thanks so much!

I think you could make this simpler. Here's a ggplot solution.
First, create a function that will simulate a Poisson process by taking samples drawn from an exponential distribution with the appropriate lambda. In this example I have used a while loop that starts with a vector x whose first element is 0. The function grows this vector by adding random samples until its sum reaches the target duration tmax. This is not the most efficient way to do it, but it should make the example clearer.
When the target is reached, the function returns the cumulative sum of the vector, which represents the arrival times of a Poisson process of the appropriate lambda. Note that to make plotting easier, it actually returns a data frame with the cumulative times, the cumulative count, and a grouping variable run that will allow us to plot several runs easily on a single plot.
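make_sample_df <- function(run, tmax, lambda)
{
  x <- 0
  # Append exponential inter-arrival gaps until the total time passes tmax
  while(sum(x) < tmax) x <- c(x, rexp(1, lambda))
  # N starts at 0 at time 0 and increases by 1 at each arrival
  data.frame(t = cumsum(x), N = seq_along(x) - 1, run = rep(run, length(x)))
}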
We can now use this function inside our actual plotting function:
library(ggplot2)

plot_poisson <- function(runs, tmax, lambda)
{
  # Creates one data frame for each run, then sticks them all together:
  df <- do.call("rbind", lapply(seq_len(runs), make_sample_df, tmax, lambda))
  ggplot(df, aes(t, N, group = run)) +
    geom_step(alpha = 0.25) +
    labs(title = paste(runs, "runs of Poisson process with lambda", lambda)) +
    theme(legend.position = "none") +
    coord_cartesian(xlim = c(0, tmax))
}
So you can do:
plot_poisson(runs = 10, tmax = 100, lambda = 0.7)
plot_poisson(runs = 100, tmax = 100, lambda = 0.7)
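The while loop in make_sample_df grows x one element at a time. Below is a minimal base-R sketch of a vectorized alternative; the helper names simulate_arrivals, trajectory and plot_panel are hypothetical, not part of the answer. It draws a batch of exponential gaps at once and only tops up in the rare case that the batch falls short of tmax, then draws the 10- and 100-trajectory panels side by side with par(mfrow), which is what the question was attempting. Drawing n <- rpois(1, lambda * tmax) and then sort(runif(n, 0, tmax)), as the question's Process_poisson does, is an equivalent way to get the arrival times.

```r
# Hypothetical helpers (names are illustrative, not from the answer above)

# Arrival times of a homogeneous Poisson process with rate lambda on [0, tmax],
# built from exponential inter-arrival gaps drawn in batches.
simulate_arrivals <- function(tmax, lambda) {
  batch <- ceiling(lambda * tmax + 5 * sqrt(lambda * tmax) + 10)
  arrivals <- cumsum(rexp(batch, rate = lambda))
  # Top up only if the batch did not reach tmax (rare with this batch size)
  while (arrivals[length(arrivals)] < tmax) {
    last <- arrivals[length(arrivals)]
    arrivals <- c(arrivals, last + cumsum(rexp(batch, rate = lambda)))
  }
  arrivals[arrivals <= tmax]
}

# Step-function coordinates: N(0) = 0, +1 at each arrival, flat until tmax
trajectory <- function(tmax, lambda) {
  a <- simulate_arrivals(tmax, lambda)
  list(t = c(0, a, tmax), N = c(0, seq_along(a), length(a)))
}

# One panel with `runs` semi-transparent trajectories
plot_panel <- function(runs, tmax = 100, lambda = 0.7) {
  paths <- replicate(runs, trajectory(tmax, lambda), simplify = FALSE)
  ymax <- max(vapply(paths, function(p) max(p$N), numeric(1)))
  plot(NA, xlim = c(0, tmax), ylim = c(0, ymax), xlab = "Time", ylab = "N(t)",
       main = sprintf("%d trajectories, lambda = %.1f", runs, lambda))
  for (p in paths) lines(p$t, p$N, type = "s", col = adjustcolor("black", 0.3))
  invisible(paths)
}

set.seed(2024)
par(mfrow = c(1, 2))
plot_panel(10)
plot_panel(100)
```

type = "s" draws each step horizontally first and then vertically, so N(t) stays flat between arrivals and jumps at each arrival time.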

Related

Density plot of the F-distribution (df1=1). Theoretical or simulated?

I am plotting the density of F(1,49) in R. It seems that the simulated plot does not match the theoretical plot when values approach zero.
set.seed(123)
val <- rf(1000, df1=1, df2=49)
plot(density(val), yaxt = "n", ylab = "", xlab = "Observation",
     main = expression(paste("Density plot (", italic(n), "=1000, ", italic(df)[1], "=1, ", italic(df)[2], "=49)")),
     lwd = 2)
curve(df(x, df1 = 1, df2 = 49), from = 0, to = 10, add = TRUE, col = "red", lwd = 2, lty = 2)
legend("topright", c("Theoretical", "Simulated"),
       col = c("red", "black"), lty = c(2, 1), bty = "n")
Using density(val, from = 0) gets you much closer, although still not perfect. Densities near boundaries are notoriously difficult to calculate in a satisfactory way.
By default, density uses a Gaussian kernel to estimate the probability density at a given point. Effectively, a normal density curve is centred on each observation, all these normal curves are added up, and the result is scaled so that the area under the curve is 1.
This works well if observations have a central tendency, but gives unrealistic results when there are sharp boundaries (Try plot(density(runif(1000))) for a prime example).
When you have a very high density of points close to zero but none below zero, the left tails of the normal kernels centred near zero "spill over" into negative values. That mass is missing from the region just above zero, so the estimate there is pulled down into a rounded, Gaussian-like shape that doesn't match the theoretical density (for df1 = 1 the F density actually goes to infinity as x approaches 0).
This means that if you have a sharp boundary at 0, you should discard the part of the estimated density that lies between zero and about two standard deviations of the smoothing kernel (the bandwidth) above it; anything in that range will be misleading.
Since we can control the standard deviation of our smoothing kernel with the bw parameter of density, and easily control which x values are plotted using ggplot, we will get a more sensible result by doing something like this:
library(ggplot2)
ggplot(as.data.frame(density(val, bw = 0.1)[c("x", "y")]), aes(x, y)) +
geom_line(aes(col = "Simulated"), na.rm = TRUE) +
geom_function(fun = ~ df(.x, df1 = 1, df2 = 49),
aes(col = "Theoretical"), lty = 2) +
lims(x = c(0.2, 12)) +
theme_classic(base_size = 16) +
labs(title = expression(paste("Density plot (",italic(n),"=1000, ",
italic(df)[1],"=1, ",italic(df)[2],"=49)")),
x = "Observation", y = "") +
scale_color_manual(values = c("black", "red"), name = "")
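The spill-over argument above can also be checked numerically. The sketch below assumes the same val as in the question and uses bw.nrd0(), the bandwidth rule density() uses by default. Because the Gaussian KDE is an average of normal densities, the share of its mass that falls below zero is exactly mean(pnorm(0, val, h)). The last part shows reflection, a standard boundary correction that is not used in the answer above: adding the kernels of the mirrored data (-val) folds the spilled mass back onto [0, Inf). The helper names kde and kde_reflect are hypothetical.

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)
h <- bw.nrd0(val)  # density()'s default bandwidth

# Share of the Gaussian-KDE mass below 0: kernel N(val_i, h^2) puts
# pnorm(0, val_i, h) of its mass there, and the KDE averages the kernels
spill <- mean(pnorm(0, mean = val, sd = h))
spill

# Gaussian KDE evaluated directly as an average of normal densities
kde <- function(x, obs, h) {
  vapply(x, function(xi) mean(dnorm(xi, mean = obs, sd = h)), numeric(1))
}

# Reflection boundary correction: add the kernels of the mirrored data (-obs)
# and set the estimate to 0 below the boundary
kde_reflect <- function(x, obs, h) {
  ifelse(x < 0, 0, kde(x, obs, h) + kde(-x, obs, h))
}

xs <- seq(0.01, 3, length.out = 300)
plot(xs, df(xs, df1 = 1, df2 = 49), type = "l", lty = 2, col = "red",
     ylim = c(0, 3), xlab = "Observation", ylab = "Density")
lines(xs, kde(xs, val, h))                        # plain Gaussian KDE
lines(xs, kde_reflect(xs, val, h), col = "blue")  # reflected KDE
legend("topright", c("Theoretical", "Gaussian KDE", "Reflected KDE"),
       col = c("red", "black", "blue"), lty = c(2, 1, 1), bty = "n")
```

The plain KDE keeps only 1 - spill of its mass on [0, Inf), while the reflected version keeps all of it there; neither can follow the theoretical density all the way to its infinite value at 0.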
The kde1d and logspline packages handle such densities reasonably well, since both let you declare the lower bound (xmin and lbound, respectively):
sims <- rf(1500, 1, 49)
library(kde1d)
kd <- kde1d(sims, bw = 1, xmin = 0)
plot(kd, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)
library(logspline)
fit <- logspline(sims, lbound = 0, knots = c(0, 0.5, 1, 1.5, 2))
plot(fit, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)

Calculate intersection point of two density curves in R

I have two vectors of 1000 values (a and b), from which I created density plots and histograms. I would like to retrieve the coordinates (or just the y value) where the two plots cross (it does not matter if it detects several crossings; I can discriminate between them afterwards). The data are available at the following link: Sample Data
xlim = c(min(c(a,b)), max(c(a,b)))
hist(a, breaks = 100,
     freq = FALSE,
     xlim = xlim,
     xlab = 'Test Subject',
     main = 'Difference plots',
     col = rgb(0.443137, 0.776471, 0.443137, 0.5),
     border = rgb(0.443137, 0.776471, 0.443137, 0.5))
lines(density(a))
hist(b, breaks = 100,
     freq = FALSE,
     col = rgb(0.529412, 0.807843, 0.921569, 0.5),
     border = rgb(0.529412, 0.807843, 0.921569, 0.5),
     add = TRUE)
lines(density(b))
Using locator() is not optimal, since I need to retrieve this from several plots (but I will use that approach if nothing else is viable). Thanks for your help.
We calculate the density curves for both series, taking care to use the same range: with the same from and to and the default n = 512, both curves are evaluated on exactly the same x grid, which is what lets us merge them by x. Then we compare whether the y value for a is greater than the one for b at each x value. When the outcome of this comparison flips, we know the lines have crossed.
df <- merge(
as.data.frame(density(a, from = xlim[1], to = xlim[2])[c("x", "y")]),
as.data.frame(density(b, from = xlim[1], to = xlim[2])[c("x", "y")]),
by = "x", suffixes = c(".a", ".b")
)
df$comp <- as.numeric(df$y.a > df$y.b)
df$cross <- c(NA, diff(df$comp))
points(df[which(df$cross != 0), c("x", "y.a")])
which adds a point to the existing plot at each grid position where the two curves cross.
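Because the comparison flags the grid point just after each sign change, the crossings found this way are only accurate to the grid spacing, (to - from) / 511 with the default n = 512. A minimal sketch that linearly interpolates between the two grid points around each sign change is below. The linked sample data are not available here, so a and b are simulated placeholders, and density_crossings is a hypothetical helper name.

```r
# Placeholder data: the question's linked sample data are not available here
set.seed(1)
a <- rnorm(5000, mean = 0)
b <- rnorm(5000, mean = 2)
xlim <- range(c(a, b))

# Same from/to and default n = 512, so both curves share one x grid
da <- density(a, from = xlim[1], to = xlim[2])
db <- density(b, from = xlim[1], to = xlim[2])

# Hypothetical helper: x and y where ya - yb changes sign, located by linear
# interpolation between the two grid points that bracket each sign change
density_crossings <- function(x, ya, yb) {
  d <- ya - yb
  i <- which(diff(sign(d)) != 0)
  frac <- d[i] / (d[i] - d[i + 1])
  data.frame(x = x[i] + frac * (x[i + 1] - x[i]),
             y = ya[i] + frac * (ya[i + 1] - ya[i]))
}

cr <- density_crossings(da$x, da$y, db$y)
cr

plot(da, xlim = xlim, main = "Density crossings")
lines(db, col = "blue")
points(cr$x, cr$y, pch = 19, col = "red")
```

For these two placeholder samples the curves should cross once, near x = 1, halfway between the two means.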

How to replicate a figure describing standard error of the mean in R?

The first figure at the linked page shows a very nice example of how to visualise the standard error, and I would like to replicate it in R.
I'm getting there with the following:
set.seed(1)
pop<-rnorm(1000,175,10)
mean(pop)
hist(pop)
#-------------------------------------------
# Plotting Standard Error for small Samples
#-------------------------------------------
smallSample <- replicate(10,sample(pop,3,replace=TRUE)) ; smallSample
smallMeans<-colMeans(smallSample)
par(mfrow=c(1,2))
x<-c(1:10)
plot(x,smallMeans,ylab="",xlab = "",pch=16,ylim = c(150,200))
abline(h=mean(pop))
#-------------------------------------------
# Plotting Standard Error for Large Samples
#-------------------------------------------
largeSample <- replicate(10,sample(pop,20,replace=TRUE))
largeMeans<-colMeans(largeSample)
x<-c(1:10)
plot(x,largeMeans,ylab="",xlab = "",pch=16,ylim = c(150,200))
abline(h=mean(pop))
But I'm not sure how to plot the raw data as they have with the X symbols. Thanks.
Using base plotting, you need to use the arrows function.
As far as I know, base R has no function that computes the standard error of the mean, so try this:
sem <- function(x){
sd(x) / sqrt(length(x))
}
Then plot, using pch = 4 for the x symbols:
plot(x, largeMeans, ylab = "", xlab = "", pch = 4, ylim = c(150,200))
abline(h = mean(pop))
largeSE <- apply(largeSample, 2, sem)  # one SE per sample; sem(largeSample) would pool all 200 values
arrows(x0 = 1:10, x1 = 1:10, y0 = largeMeans - largeSE * 5, y1 = largeMeans + largeSE * 5, code = 0)
Note: the SEs from the data you provided were quite small, so I multiplied them by 5 to make them more obvious.
Edit
Ah, to plot all the points, perhaps ?matplot and ?matpoints would be helpful. Something like:
matplot(t(largeSample), ylab = "", xlab = "", pch = 4, cex = 0.6, col = 1)
abline(h = mean(pop))
points(largeMeans, pch = 19, col = 2)
Is this more the effect you're after?
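The figure in the question shows each sample's raw values next to its mean and an error bar. A minimal base-R sketch that combines the pieces from this answer is below: raw values as x symbols (pch = 4), means as filled points, and bars of plus or minus 1 SE computed column by column with apply(). The helper plot_samples_with_se is a hypothetical name, and plus or minus 1 SE is an assumption about what the bars in the linked figure represent.

```r
sem <- function(x) sd(x) / sqrt(length(x))

# Hypothetical helper: `samples` has one sample per column
plot_samples_with_se <- function(samples, pop_mean, ylim = c(140, 210), ...) {
  k <- ncol(samples)
  means <- colMeans(samples)
  ses <- apply(samples, 2, sem)  # one SE per sample, not one pooled SE
  # as.vector() stacks the columns, so each sample index repeats nrow() times
  plot(rep(seq_len(k), each = nrow(samples)), as.vector(samples), pch = 4,
       cex = 0.6, ylim = ylim, xlab = "Sample", ylab = "", ...)
  abline(h = pop_mean)
  arrows(x0 = seq_len(k), y0 = means - ses, y1 = means + ses,
         angle = 90, code = 3, length = 0.04, col = "red")
  points(seq_len(k), means, pch = 19, col = "red")
  invisible(data.frame(mean = means, se = ses))
}

set.seed(1)
pop <- rnorm(1000, 175, 10)
par(mfrow = c(1, 2))
small <- plot_samples_with_se(replicate(10, sample(pop, 3, replace = TRUE)),
                              mean(pop), main = "n = 3")
large <- plot_samples_with_se(replicate(10, sample(pop, 20, replace = TRUE)),
                              mean(pop), main = "n = 20")
```

Returning the means and SEs invisibly makes it easy to check the bar lengths against the raw data.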

Why doesn't my GAM fit seem to have the correct intercept? [R]

My GAM curves are being shifted downwards. Is there something wrong with the intercept? I'm using the same code as in An Introduction to Statistical Learning. Any help is appreciated.
Here's the code. I simulated some data (a straight line with noise) and fitted a GAM multiple times using the bootstrap.
(It took me a while to figure out how to plot multiple GAM fits in one graph; thanks to Sam's answer to this post, and to this post.)
library(gam)
N = 1e2
set.seed(123)
dat = data.frame(x = 1:N,
                 y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))
plot(dat$x, dat$y, xlim = c(1,100), ylim = c(-5,10))
gamFit = vector('list', 5)
for (ii in 1:5){
  ind = sample(1:N, N, replace = TRUE) # bootstrap
  gamFit[[ii]] = gam(y ~ s(x, 10), data = dat, subset = ind)
  par(new = TRUE)
  plot(gamFit[[ii]], col = 'blue',
       xlim = c(1,100), ylim = c(-5,10),
       axes = FALSE, xlab = '', ylab = '')
}
The issue is with plot.gam. If you take a look at the help page (?plot.gam), there is a parameter called scale, which states:
a lower limit for the number of units covered by the limits on the ‘y’ for each plot. The default is scale=0, in which case each plot uses the range of the functions being plotted to create their ylim. By setting scale to be the maximum value of diff(ylim) for all the plots, then all subsequent plots will produced in the same vertical units. This is essential for comparing the importance of fitted terms in additive models.
This is an issue because plot.gam draws the fitted smooth term, which is centred around zero (the intercept is not included), so the range of the function being plotted is not the -5 to 10 range of y. So what you need to do is change
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
to
plot(gamFit[[ii]], col = 'blue',
scale = 15,
axes = F, xlab='', ylab='')
Alternatively, you can simply remove the xlim and ylim arguments from both calls to plot; letting plot choose its limits from the full range of what it is drawing will make everything line up.
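On the intercept question itself: the curve drawn for s(x) is the centred smooth term, and the intercept is reported separately. A minimal sketch of that decomposition is below. It assumes the mgcv package (shipped with R as a recommended package) rather than the gam package used in the question, so treat it as an illustration of the idea rather than a drop-in fix, and run it in a fresh session because both packages define gam(). smooth_part and intercept are just illustrative names.

```r
library(mgcv)  # assumption: mgcv, not the gam package used in the question

set.seed(123)
N <- 100
dat <- data.frame(x = 1:N,
                  y = seq(0, 5, length.out = N) + rnorm(N, mean = 0, sd = 2))
fit <- gam(y ~ s(x, k = 10), data = dat)

# Contribution of s(x) on the linear-predictor scale (here the scale of y);
# mgcv constrains it to average 0 over the data, so the intercept
# carries the overall level of y
smooth_part <- predict(fit, type = "terms")[, 1]
intercept <- unname(coef(fit)["(Intercept)"])

plot(dat$x, dat$y, xlab = "x", ylab = "y")
lines(dat$x, smooth_part, lty = 2)                   # centred smooth: sits around 0
lines(dat$x, smooth_part + intercept, col = "blue")  # smooth + intercept: follows the data
```

The dashed curve is what a term plot shows, which is why it appears shifted down relative to the data; mgcv's plot method also has a shift argument, so plot(fit, shift = coef(fit)[1]) draws the smooth on the scale of y.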

Plot normal, left and right skewed distribution in R

I want to create 3 plots for illustration purposes:
- normal distribution
- right skewed distribution
- left skewed distribution
This should be an easy task, but I found only this link, which only shows a normal distribution. How do I do the rest?
If you are not too tied to the normal distribution, I suggest you use the beta distribution, which can be symmetric, right skewed or left skewed depending on the shape parameters.
hist(rbeta(10000,5,2))
hist(rbeta(10000,2,5))
hist(rbeta(10000,5,5))
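To see which shape parameters give which skew, here is a small base-R sketch that plots the three beta densities side by side and labels each with its theoretical skewness, 2(b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(ab)) for Beta(a, b). This is negative (left skewed) when a > b, positive (right skewed) when a < b, and zero when a = b. beta_skewness is a hypothetical helper name.

```r
# Hypothetical helper: theoretical skewness of Beta(a, b)
beta_skewness <- function(a, b) {
  2 * (b - a) * sqrt(a + b + 1) / ((a + b + 2) * sqrt(a * b))
}

shapes <- list("Left skewed" = c(5, 2), "Symmetric" = c(5, 5), "Right skewed" = c(2, 5))
xs <- seq(0, 1, length.out = 201)
par(mfrow = c(1, 3))
for (nm in names(shapes)) {
  p <- shapes[[nm]]
  plot(xs, dbeta(xs, p[1], p[2]), type = "l", lwd = 2, xlab = "x", ylab = "Density",
       main = sprintf("%s\nBeta(%g, %g), skewness %.2f",
                      nm, p[1], p[2], beta_skewness(p[1], p[2])))
}
```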
Finally I got it working, with help from both of you, though I was mainly relying on this site.
N <- 10000
x <- rnbinom(N, 10, .5)
hist(x,
     xlim = c(min(x), max(x)), probability = TRUE, nclass = max(x) - min(x) + 1,
     col = 'lightblue', xlab = ' ', ylab = ' ', axes = FALSE,
     main = 'Positive Skewed')
lines(density(x, bw = 1), col = 'red', lwd = 3)
This is also a valid solution:
curve(dbeta(x,8,4),xlim=c(0,1))
title(main = "posterior distribution of p")
Just use the fGarch package and these functions:
dsnorm(x, mean = 0, sd = 1, xi = 1.5, log = FALSE)
psnorm(q, mean = 0, sd = 1, xi = 1.5)
qsnorm(p, mean = 0, sd = 1, xi = 1.5)
rsnorm(n, mean = 0, sd = 1, xi = 1.5)
Here mean is the location parameter, sd the scale parameter, and xi the skewness parameter.
Examples
## snorm -
library(fGarch)
# Random numbers:
par(mfrow = c(2, 2))
set.seed(1953)
r = rsnorm(n = 1000)
plot(r, type = "l", main = "snorm", col = "steelblue")
# Plot empirical density and compare with true density:
hist(r, n = 25, probability = TRUE, border = "white", col = "steelblue")
box()
x = seq(min(r), max(r), length = 201)
lines(x, dsnorm(x), lwd = 2)
# Plot df and compare with true df:
plot(sort(r), (1:1000/1000), main = "Probability", col = "steelblue",
ylab = "Probability")
lines(x, psnorm(x), lwd = 2)
# Compute quantiles:
round(qsnorm(psnorm(q = seq(-1, 5, by = 1))), digits = 6)
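As I understand it, fGarch's snorm family is a standardized version (mean 0, sd 1 for any xi) of the Fernández-Steel two-piece skew normal; treat that link as an assumption and check ?dsnorm. The unstandardized version is short enough to write in base R and shows what xi does: the right half of a standard normal is stretched by xi and the left half by 1/xi, so xi > 1 gives right skew, xi < 1 gives left skew, and xi = 1 gives the normal. dtwopiece is a hypothetical name, not an fGarch function.

```r
# Hypothetical helper (not part of fGarch): two-piece skew normal density.
# Right half uses scale xi, left half scale 1/xi; 2 / (xi + 1/xi) makes it integrate to 1.
dtwopiece <- function(x, xi = 1.5) {
  k <- 2 / (xi + 1 / xi)
  ifelse(x >= 0, k * dnorm(x / xi), k * dnorm(x * xi))
}

xs <- seq(-4, 4, length.out = 401)
plot(xs, dtwopiece(xs, xi = 1), type = "l", lwd = 2, xlab = "x", ylab = "Density",
     ylim = c(0, 0.5), main = "Two-piece skew normal")
lines(xs, dtwopiece(xs, xi = 1.5), col = "red")       # xi > 1: right skewed
lines(xs, dtwopiece(xs, xi = 1 / 1.5), col = "blue")  # xi < 1: left skewed
legend("topright", c("xi = 1", "xi = 1.5", "xi = 1/1.5"),
       col = c("black", "red", "blue"), lwd = c(2, 1, 1), bty = "n")
```

Without standardization the mean moves with xi, to 2(xi - 1/xi) / sqrt(2 pi); shifting and rescaling to remove that dependence is what makes mean and sd separate location and scale parameters.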
