I am currently running a simulation using a normal distribution. It simulates the times between events and is based on an analysis of given data (not relevant to the problem). The simulation is created like this:
SimProcess <- function(mu, sigma, T) {
ctimes <- c() # Array of arrival times, initially empty
t <- rnorm(1,mu, sqrt(sigma)) # Time of next arrival
while(t < T) {
ctimes <- c(ctimes, t)
dt = rnorm(1, mu, sqrt(sigma))
if (dt<0){dt = 0}
t <- t + dt # sampling from the dataset
}
return(ctimes)
}
# Create a sample path of one run
T <- 10
# arrival times
arrivals <- SimProcess(mu_t, var_t, T)
Now I would like to run several of these random trials and plot them in one figure so that we can compare them with the given data. Ten trials would be ideal. I tried plotting them as shown below, but unfortunately it doesn't work. I am afraid I'll have to use reshape2 to melt the data of the 10 trials, because the vectors do not all have the same length. This is the code I use to try to plot all the lines; it clearly doesn't work the way it should.
x <- c(0, arrivals, T,rep(0,500-length(arrivals)))
y <- c(0:length(arrivals), length(arrivals),rep(0,500-length(arrivals)))
plotdataNT = data.frame(x,y)
p = ggplot(plotdataNT,aes(x,y))
plot(x,y,type = 's')
j = 1
for (j in 10){
arrivals <- SimProcess(mu_t,var_t,T)
x <- c(0, arrivals, T,rep(0,500-length(arrivals)))
y <- c(0:length(arrivals), length(arrivals),rep(0,500-length(arrivals)))
p = p + geom_step(mapping = aes (x,y))
}
Edit:
In the end I figured it out: because I used 10 instead of 1:10, the loop ran only once (for j = 10), and I also had a few other small mistakes. This ended up being the solution:
arrivals <- SimProcess(mu_t,var_t,T)
NT <- length(arrivals)
x <- c(0, arrivals, T,rep(0,correction-length(arrivals)))
y <- c(0:length(arrivals), length(arrivals),rep(0,correction-length(arrivals)))
plotdataNT = data.frame(x,y)
p = ggplot(plotdataNT,aes(x,y)) + geom_step(mapping = aes (x,y))
jk = 1
runs = 25
colourvec = rainbow(runs)
for (jk in 1:runs){
arrivals <- SimProcess(mu_t,var_t,T)
x <- c(0, arrivals, T,rep(0,correction-length(arrivals)))
y <- c(0:length(arrivals), length(arrivals),rep(0,correction-length(arrivals)))
newdata = data.frame(x,y)
p = p + geom_step(mapping = aes (x,y),newdata,colour = colourvec[jk])
}
p = p + scale_x_continuous(name = "Time in days") + scale_y_continuous(name = "Amount of claims")
p
This results in 26 random samples plotted in one graph in several colors. It represents a process with random time steps drawn from a gamma, normal or lognormal distribution. The answer below is a cleaner example of what I meant. If anyone knows how to do this more efficiently with reshape2, I'd also be glad to know.
Two solutions:
for (j in 1:10) {
arrivals <- SimProcess(mu_t,var_t,T)
x <- c(0, arrivals, T,rep(0,500-length(arrivals)))
y <- c(0:length(arrivals), length(arrivals),rep(0,500-length(arrivals)))
xy <- data.frame(x,y)
p = p + geom_step(data=xy, mapping=aes(x,y))
}
print(p)
for (j in 1:10) {
arrivals <- SimProcess(mu_t,var_t,T)
x <- c(0, arrivals, T,rep(0,500-length(arrivals)))
y <- c(0:length(arrivals), length(arrivals),rep(0,500-length(arrivals)))
xy <- data.frame(x,y)
p = p + geom_step(mapping=aes_string(x,y))
}
print(p)
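As for the reshape2 question in the edit above, one possible alternative is to skip the zero padding entirely and stack all runs into one long data frame with a run identifier, then draw a single geom_step layer. The following is only a sketch: sim_process is a compact copy of SimProcess, and the values of mu_t, var_t, horizon and runs are assumptions chosen for illustration, not taken from the original data.

```r
library(ggplot2)

# Compact copy of SimProcess: normal inter-arrival times, negative steps set to 0
sim_process <- function(mu, sigma2, horizon) {
  ctimes <- numeric(0)
  t <- rnorm(1, mu, sqrt(sigma2))
  while (t < horizon) {
    ctimes <- c(ctimes, t)
    t <- t + max(rnorm(1, mu, sqrt(sigma2)), 0)
  }
  ctimes
}

# Assumed illustration values (hypothetical, not from the question)
mu_t <- 0.5
var_t <- 0.04
horizon <- 10
runs <- 10

set.seed(1)
# One data frame per run, stacked row-wise; no padding needed
paths <- do.call(rbind, lapply(seq_len(runs), function(run) {
  arrivals <- sim_process(mu_t, var_t, horizon)
  n <- length(arrivals)
  data.frame(run = run,
             x = c(0, arrivals, horizon),
             y = c(0:n, n))
}))

p <- ggplot(paths, aes(x, y, group = run, colour = factor(run))) +
  geom_step() +
  labs(x = "Time in days", y = "Amount of claims", colour = "Run")
print(p)
```

Because each run carries its own rows, vectors of different lengths are no longer a problem, and the colour per run comes from the data rather than from a separate colour vector.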
Related
I am having trouble saving the results of a for loop in the way that I want.
The loop I'm currently running looks like this:
# Setup objects
n = 100
R = (1:1000)
P = seq(-.9, .9, .1)
betahat_OLS = rep(NA, 1000)
Bhat_OLS = rep(NA, 19)
# Calculate betahat_OLS for each p in P and each r in R
for (p in P) {
for (r in R) {
# Simulate data
v = rnorm(n)
e = rnorm(n)
z = rnorm(n)
u = p*v+e
x = z+v
y = 0*x+u
#Calculate betahat_OLS
betahat_OLS[r] = sum(x*y)/sum(x^2)
}
#Calculate Bhat_OLS
Bhat_OLS = sum(betahat_OLS)/1000-0
}
# Make a scatterplot with p on the x-axis and Bhat_OLS on the y-axis
plot(P, Bhat_OLS)
The loop seems to be working correctly, except that I would like to end up with 19 values of Bhat_OLS and currently get only 1 value. I want a Bhat_OLS value for each value of p in P so that I can plot Bhat_OLS against p.
You can write your results into a data frame with two columns, containing P and Bhat_OLS.
# Setup objects
n = 100
R = (1:1000)
P = seq(-.9, .9, .1)
betahat_OLS = rep(NA, 1000)
Bhat_OLS = rep(NA, 19)
# initialize result data frame
results <- data.frame(matrix(ncol = 2, nrow = 0,
dimnames = list(NULL, c("P", "Bhat_OLS"))))
# Calculate betahat_OLS for each p in P and each r in R
for (p in P) {
for (r in R) {
# Simulate data
v = rnorm(n)
e = rnorm(n)
z = rnorm(n)
u = p*v+e
x = z+v
y = 0*x+u
#Calculate betahat_OLS
betahat_OLS[r] = sum(x*y)/sum(x^2)
}
#Calculate Bhat_OLS
Bhat_OLS = sum(betahat_OLS)/1000-0
# insert P and Bhat_OLS into results
results[nrow(results) + 1,] = c(p, Bhat_OLS)
}
# Make a scatterplot with p on the x-axis and Bhat_OLS on the y-axis
plot(results$P, results$Bhat_OLS)
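Growing the data frame one row at a time works, but a possible variation is to preallocate the results with one row per value of P and fill the Bhat_OLS column by position. This is a sketch of that idea, not the answer's own code; the names n_rep and betahat are introduced here for illustration.

```r
set.seed(42)
n <- 100
n_rep <- 1000                      # number of replications per value of p
P <- seq(-0.9, 0.9, 0.1)

# Preallocate: one row per p, Bhat_OLS filled in below
results <- data.frame(P = P, Bhat_OLS = NA_real_)

for (i in seq_along(P)) {
  # replicate() returns the n_rep slope estimates as a numeric vector
  betahat <- replicate(n_rep, {
    v <- rnorm(n)
    e <- rnorm(n)
    z <- rnorm(n)
    u <- P[i] * v + e
    x <- z + v
    y <- 0 * x + u
    sum(x * y) / sum(x^2)
  })
  # Average estimate minus the true slope (0)
  results$Bhat_OLS[i] <- mean(betahat) - 0
}

plot(results$P, results$Bhat_OLS, xlab = "p", ylab = "Bhat_OLS")
```

In this setup E[x*y] = p and E[x^2] = 2, so the plotted points should lie close to the line p/2.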
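Looping over the values of P directly makes indexing awkward, because p is then a value such as -0.9 rather than a position. You could loop over seq(P) instead and use P[i] inside the loop. You also need to assign to Bhat_OLS[i] at the end of each outer iteration; otherwise the single value is overwritten every time. Then it works.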
# Setup objects
n <- 100
R <- (1:1000)
P <- seq(-.9, .9, .1)
betahat_OLS <- rep(NA, length(R))
Bhat_OLS <- rep(NA, length(P))
set.seed(42) ## for sake of reproducibility
# Calculate betahat_OLS for each p in P and each r in R
for (i in seq(P)) {
for (r in R) {
# Simulate data
v <- rnorm(n)
e <- rnorm(n)
z <- rnorm(n)
u <- P[i]*v + e
x <- z + v
y <- 0*x + u
#Calculate betahat_OLS
betahat_OLS[r] <- sum(x*y)/sum(x^2)
}
#Calculate Bhat_OLS
Bhat_OLS[i] <- sum(betahat_OLS)/1000 - 0
}
# Make a scatterplot with p on the x-axis and Bhat_OLS on the y-axis
plot(P, Bhat_OLS, xlim=c(-1, 1))
Alternative solution: vapply
In a more R-ish way (right now it is more C-ish), you could define the simulation in a function sim() and use vapply for the outer loop. (You could also use it for the inner loop, but I've tested it and this way is faster.)
sim <- \(p, n=100, R=1:1000) {
r <- rep(NA, max(R))
for (i in R) {
v <- rnorm(n)
e <- rnorm(n)
z <- rnorm(n)
u <- p*v + e
x <- z + v
y <- 0*x + u
r[i] <- sum(x*y)/sum(x^2)
}
return(sum(r/1000 - 0))
}
set.seed(42)
Bhat_OLS1 <- vapply(seq(-.9, .9, .1), \(p) sim(p), 0)
stopifnot(all.equal(Bhat_OLS, Bhat_OLS1))
Note:
R.version.string
# [1] "R version 4.1.2 (2021-11-01)"
Could someone explain to me why this code doesn't produce a contour plot as I expected?
I have a bivariate normal distribution with:
means = c(5,1)
var_cov = matrix(c(2,1,1,1),2)
I'd like to draw its contour plot. I can get the result, but I'd like to ask why in one case I don't get the expected result.
Working Example:
library(MASS)
library(ggplot2)
N <- 100
set.seed(123)
var_cov_matrix <- matrix(c(2,1,1,1),2)
mean <- c(5,1)
bivariate_points <- expand.grid(s.1 = seq(-25, 25, length.out=N), s.2 = seq(-25, 25, length.out=N))
z <- mvtnorm::dmvnorm(bivariate_points, mean = mean, sigma = var_cov_matrix)
data <- cbind(bivariate_points,z)
colnames(data) <- c("X1","X2","Z")
data.df <- as.data.frame(data)
ggplot() +
geom_contour(data=data.df,aes(x=X1,y=X2,z=Z))
Non Working Example:
library(MASS)
library(ggplot2)
N <- 100
set.seed(123)
var_cov_matrix <- matrix(c(2,1,1,1),2)
mean <- c(5,1)
bivariate_points <- mvrnorm(N, mu = mean, Sigma = var_cov_matrix ) # <---- EDITED
z <- mvtnorm::dmvnorm(bivariate_points, mean = mean, sigma = var_cov_matrix)
data <- cbind(bivariate_points,z)
colnames(data) <- c("X1","X2","Z")
data.df <- as.data.frame(data)
ggplot() +
geom_contour(data=data.df,aes(x=X1,y=X2,z=Z))
In your non-working example the points are random draws rather than a regular grid, which geom_contour requires. You can use stat_density2d instead, which estimates the density from the scattered points, i.e.,
ggplot(data.df, aes(x = X1, y = X2, z = Z)) +
geom_point(aes(colour = Z)) +
stat_density2d()
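If the goal is contours of the true density rather than a kernel estimate, another option is to keep the random sample for the points but evaluate the density on a regular grid that spans the sample. The sketch below assumes this is acceptable for your use case; the names pts and grid and the grid resolution of 100 are illustrative choices.

```r
library(MASS)
library(ggplot2)

set.seed(123)
N <- 100
mu <- c(5, 1)
Sigma <- matrix(c(2, 1, 1, 1), 2)

# Random sample, used only for the scatter layer
pts <- as.data.frame(mvrnorm(N, mu = mu, Sigma = Sigma))
names(pts) <- c("X1", "X2")

# Regular grid covering the range of the sample
grid <- expand.grid(
  X1 = seq(min(pts$X1), max(pts$X1), length.out = 100),
  X2 = seq(min(pts$X2), max(pts$X2), length.out = 100)
)
# True bivariate normal density at every grid node
grid$Z <- mvtnorm::dmvnorm(as.matrix(grid), mean = mu, sigma = Sigma)

p <- ggplot() +
  geom_contour(data = grid, aes(x = X1, y = X2, z = Z)) +
  geom_point(data = pts, aes(x = X1, y = X2), alpha = 0.5)
print(p)
```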
I want to draw multiple simulated paths from an arbitrary distribution (lognormal in the present case) on the same plot using ggplot2.
Using print(ggplot()) inside a for loop does not show the paths all together.
library(ggplot2)
t <- 1000 # length of a simulation
time <- seq(0,t-1,by = 1) # make vector of time points
s <- cumsum(rlnorm(t, meanlog = 0, sdlog = 1)) # simulate trajectory of lognormal variable
df <- data.frame(cbind(time,s)) # make dataframe
colnames(df) <- c("t","s") # colnames
ggplot(df, aes(t,s )) + geom_line() # Get one trajectory
Now I want (say) 100 such paths in the same plot:
nsim <- 100 # number of paths
for (i in seq(1,nsim, by =1)) {
s <- cumsum(rlnorm(t, meanlog = 0, sdlog = 1))
df <- data.frame(cbind(time,s))
colnames(df) <- c("t","s")
print(ggplot(df, aes(t,s, color = i)) + geom_line())
}
The above loop obviously cannot do the job.
Is there a way to visualize such simulations using simple R with ggplot2?
Instead of adding each line iteratively, you could iteratively simulate in a loop, collect all results in a data.frame, and plot all lines at once.
library(ggplot2)
nsim <- 100
npoints <- 1000
sims <- lapply(seq_len(nsim), function(i) {
data.frame(x = seq_len(npoints),
y = cumsum(rlnorm(npoints, meanlog = 0, sdlog = 1)),
iteration = i)
})
sims <- do.call(rbind, sims)
ggplot(sims, aes(x, y, colour = iteration, group = iteration)) +
geom_line()
Created on 2019-08-13 by the reprex package (v0.3.0)
In ggplot2, one way to achieve this is to add an extra layer to the plot at each iteration. A simple change to your code should be sufficient.
library(ggplot2)
nsim <- 100 # number of paths
dat <- vector("list", nsim)
p <- ggplot()
t <- 1000 # length of a simulation
time <- seq(0, t-1, by = 1)
for (i in seq(nsim)) {
s <- cumsum(rlnorm(t, meanlog = 0, sdlog = 1))
dat[[i]] <- data.frame(t = time, s = s)
p <- p + geom_line(data = dat[[i]], mapping = aes(x = t, y = s), col = i)
}
p #or print(p)
Note how I initialise the plot before the loop, just as I initialise a list to hold the data frames. The loop then builds the plot layer by layer, but nothing is drawn until I print the plot after the loop, at which point every layer is evaluated (so it can take a bit longer than standard R plots).
Additionally, because I want to set the colour of each line individually, the col argument has to be moved outside aes.
I am trying to create a plot where, for each i, a density plot and a histogram appear side by side. In this instance i = 1, ..., 3.
The problem I have is creating the list to pass to grid.arrange. However I do it, the plots seem to repeat themselves somehow.
df:
x1 x2 x3
1 108.28 17.05 1484.10
2 152.36 16.59 750.33
3 95.04 10.91 766.42
4 65.45 14.14 1110.46
5 62.97 9.52 1031.29
6 263.99 25.33 195.26
7 265.19 18.54 193.83
8 285.06 15.73 191.11
9 92.01 8.10 1175.16
10 165.68 11.13 211.15
X <- df
mu.X <- colMeans(X)
cov.X <- cov(X)
eg <- eigen(cov.X)
myprinboot = function(
X,
iter = 10000,
alpha = 0.05,
prettyPlot = T
){
# Find the dimensions of X
nrX <- dim(X)[1]
nx <- dim(X)[2]
# Make matrices of suitable sizes to hold the booted parameter estimates
# lambda
# each cov matrix will have nx lambdas
lambda.mat <- matrix(NA, nr = nx, nc = iter)
# e vectors nx components each and one vector per eigen value
# Each cov matrix will therefore produce a nx X nx matrix of components
Y.mat <- matrix(NA, nr = nx, nc = iter * nx)
# For loop to fill the matrices created above
for (i in 1:iter)
{
# ind will contain random integers used to make random samples of the X matrix
# Must use number of rows nrX to index
ind <- sample(1:nrX,nrX,replace=TRUE)
# eigen will produce lambdas in decreasing order of size
# make an object then remove extract the list entries using $
eigvalvec <- eigen(cov(X[ind,]))
lambda.mat[,i] <- eigvalvec$values
colstart <- 1 + nx * (i - 1)
colend <- colstart + nx - 1
Y.mat[,colstart:colend] = eigvalvec$vectors
}
if(prettyPlot){
p <- list()
i <- 0
for(j in 1:(2*nx))
{
if (j %% 2 == 0){
p[[j]] <- ggplot(NULL, aes(lambda.mat[i,])) +
geom_histogram(color = 'black', fill = 'green', alpha = .5) +
xlab(substitute(lambda[i])) +
ggtitle(substitute(paste("Histogram of the pc variance ", lambda[i])))
} else {
i <- i + 1
p[[j]] <- ggplot(NULL, aes(lambda.mat[i,])) +
geom_density(fill = 'blue', alpha = .5) +
xlab((substitute(lambda[i]))) +
ggtitle(substitute(paste("Density plot of the pc variance ", lambda[i])))
}
do.call(grid.arrange, p)
}
do.call(grid.arrange, p)
} else {
layout(matrix(1:(2*nx),nr=nx,nc=2,byrow=TRUE))
for(i in 1:nx)
{
plot(density(lambda.mat[i,]),xlab=substitute(lambda[i]),
main=substitute(paste("Density plot of the pc variance ", lambda[i])
))
hist(lambda.mat[i,],xlab=substitute(lambda[i]),
main=substitute(paste("Histogram of the pc variance ", lambda[i])))
}
}
library(rgl)
plot3d(t(lambda.mat))
list(lambda.mat = lambda.mat, Y.mat = Y.mat)
}
pc <- myprinboot(X = Y, iter=1000, alpha=0.5)
Output
Anyone have any clue what I am doing wrong or is this just not possible?
I don't understand your code, Jay, as it seems to do lots of things and use both base and ggplot plotting, but if all you want is to create a combined histogram and density plot for each j, why not loop over j and inside that for j loop do something like this:
d <- your density plot created so that it depends on j only
h <- your histogram plot created so that it depends on j only
p[[j]] <- grid.arrange(d,h,ncol=2)
Then, when you come out of the loop, you'll have an object p which consists of a list of plots, with each plot consisting of a combination of density plot and histogram.
Then you could use the cowplot package (after installing it) to do something like this:
cowplot::plot_grid(plotlist = p, ncol = 2)
where the number of columns may need to be changed. See here for other ways to plot a list of plots: How do I arrange a variable list of plots using grid.arrange?
I don't know enough about your problem to understand why you treat the case of j even and j odd differently. But the underlying idea should be the same as what I suggested here.
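A likely reason for the repetition is that aes(lambda.mat[i,]) is not evaluated when the plot object is created but when it is drawn, by which time i holds its final value. The outline above can be sketched as follows, with each plot getting its own data frame so that nothing depends on the loop variable later. The matrix lambda.mat here is a hypothetical stand-in filled with random numbers, and make_pair is a helper name introduced for illustration.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)
# Hypothetical stand-in for lambda.mat: 3 eigenvalues x 1000 bootstrap draws
lambda.mat <- matrix(rexp(3 * 1000), nrow = 3)

make_pair <- function(j) {
  # Copy row j into its own data frame, so the plot carries its data with it
  dat <- data.frame(lambda = lambda.mat[j, ])
  d <- ggplot(dat, aes(lambda)) +
    geom_density(fill = "blue", alpha = 0.5) +
    ggtitle(paste("Density plot of the pc variance, lambda", j))
  h <- ggplot(dat, aes(lambda)) +
    geom_histogram(bins = 30, colour = "black", fill = "green", alpha = 0.5) +
    ggtitle(paste("Histogram of the pc variance, lambda", j))
  arrangeGrob(d, h, ncol = 2)   # builds the pair without drawing it
}

p <- lapply(seq_len(nrow(lambda.mat)), make_pair)
grid.arrange(grobs = p, ncol = 1)   # one density/histogram pair per row
```

Using arrangeGrob inside the helper avoids drawing a page for every pair; only the final grid.arrange call draws.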
I eventually got this working as follows.
getHist <- function(x, i){
lam <- paste('$\\lambda_', i, '$', sep='')
p <- qplot(x[i,],
geom="histogram",
fill = I('green'),
color = I('black'),
alpha = I(.5),
main=TeX(paste("Histogram of the pc variance ", lam, sep='')),
xlab=TeX(lam),
ylab="Count",
show.legend=F)
return(p)
}
getDens <- function(x, i){
lam <- paste('$\\lambda_', i, '$', sep='')
p <- qplot(x[i,],
geom="density",
fill = I('blue'),
alpha = I(.5),
main=TeX(paste("Density plot of the pc variance ", lam, sep='')),
xlab=TeX(lam),
ylab="Density",
show.legend=F)
return(p)
}
fp <- lapply(1:3, function(x) arrangeGrob(getHist(lambda.mat, x), getDens(lambda.mat, x), ncol=2))
print(marrangeGrob(fp, nrow = 3, ncol=1, top = textGrob("Lambda.mat Histogram and Density Plot",gp=gpar(fontsize=18))))
I am running a simulation of mixture data. My actual distribution is more complex than a Gaussian, so I have simplified the question to the Gaussian case. Suppose I simulate mixture data like this:
N=2000
U=runif(N, min=0,max=1)
X = matrix(NA, nrow=N, ncol=2)
for (i in 1:N){
if(U[i] < 0.7){
X[i,] <- rnorm(1,0.5,1)
} else {
X[i,] <- rnorm(1,3,5)
}
}
How can I have a scatter plot with different colour and shape (type of the plot point) for each cluster or distribution? I would like to have this manually since my function is hard and complex. I tried plot(X[,1],X[,2],col=c("red","blue")) but it does not work.
I think this is what you want. Note that I had to do a bit of guesswork to figure out what was going on, because your example code seems to have an error in it (you weren't generating different x1 and x2 values in each row):
N=2000
U=runif(N, min=0,max=1)
X = matrix(NA, nrow = N, ncol=2)
for (i in 1:N){
if(U[i] < 0.7){
# You had rnorm(n=1, ...) which gives 2 identical values in each row
# Change that to 2 and you get different X1 and X2 values
X[i,] <- rnorm(2, 0.5, 1)
} else {
X[i,] <- rnorm(2, 3, 5)
}
}
df = data.frame(
source = ifelse(U < 0.7, "dist1", "dist2"),
x = X[, 1],
y = X[, 2]
)
library(ggplot2)
ggplot(df, aes(x = x, y = y, colour = source, shape = source)) +
geom_point()
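If you prefer to stay with base graphics, as in your plot() attempt, the key is to index the colour and symbol vectors by cluster membership. Passing col=c("red","blue") on its own just recycles the two colours alternately over the points. The sketch below also draws the sample without a loop; the names cluster, means and sds and the chosen plotting symbols are assumptions for illustration.

```r
set.seed(1)
N <- 2000
U <- runif(N)
cluster <- ifelse(U < 0.7, 1L, 2L)   # 1 = first component, 2 = second

means <- c(0.5, 3)
sds <- c(1, 5)
# Vectorised draw: each point uses the mean and sd of its own cluster
X <- cbind(rnorm(N, means[cluster], sds[cluster]),
           rnorm(N, means[cluster], sds[cluster]))

plot(X[, 1], X[, 2],
     col = c("red", "blue")[cluster],   # colour chosen per point
     pch = c(1, 17)[cluster],           # symbol chosen per point
     xlab = "X1", ylab = "X2")
legend("topright", legend = c("dist1", "dist2"),
       col = c("red", "blue"), pch = c(1, 17))
```

For a more complex distribution the same pattern should still apply: keep a vector that records which component generated each point and use it to index col and pch.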
Result:
Here's what I got, but I'm not sure if this is what you are looking for: the locations of the observations are exactly the same for both clusters.
library(tidyverse)
df <- data.frame(X = X, U = U)
df <- gather(df, key = cluster, value = X, -U)
ggplot(df, aes(x = X, y = U, colour = cluster)) + geom_point() + facet_wrap(~cluster)
EDIT: I don't seem to be understanding what you are looking to map onto a scatter plot, so I'll indicate how you need to shape your data in order to create a chart like the above with the proper X and Y coordinates:
head(df)
U cluster X
1 0.98345408 X.1 2.3296047
2 0.33939935 X.1 -0.6042917
3 0.66715421 X.1 -2.2673422
4 0.06093674 X.1 2.4007376
5 0.48162959 X.1 -2.3118850
6 0.50780007 X.1 -0.7307929
So you want one variable for the Y coordinate (I'm using variable U here), one variable for the X coordinate (using X here), and a 3rd variable that indicates whether the observation belongs to cluster 1 or cluster 2 (variable cluster here).