Plotting a function to its limit in R

Say I have a simple mathematical function n1 = m1 * n1 (each new value of n1 is the previous value multiplied by m1) and I want to plot this function as the number of iterations approaches infinity. Is there a quick way to do that?
m1 = 0.1
initial n1 = 0.1
Or do I have to use deSolve and set up a differential equation? There must be a quick way to do this.

If you mean the next value in this equation depends on the last value, you would set up something like this:
m1 <- 0.1
x <- seq(0.1, 1000, 0.1)
y <- c(0.1, rep(NA, length(x) - 1))
for (i in 2:length(x)) {
  y[i] <- y[i - 1] * m1
}
plot(y ~ x, type = "l")
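Since |m1| < 1 here, each iteration multiplies the previous value by 0.1, so the sequence decays geometrically to 0. If helpful, a minimal closed-form sketch (assuming the recurrence the loop above implements, with starting value 0.1):
# Closed form of the loop above: the i-th value is 0.1 * m1^(i - 1)
m1 <- 0.1
curve(0.1 * m1^(x - 1), from = 1, to = 20,
      xlab = "iteration", ylab = "n1")  # flattens onto its limit, 0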

Related

When do I have to set a seed?

I think this is a very basic question.
I am doing simulations, so I write functions to recreate, for example, a random walk, which mathematically takes this form:
y_t = y_{t-1} + e_t, where e_t ~ N(0, sigma_e^2)
So to simulate it I make my function:
ar_1 <- function(iter, y0, sigma_e){
  e <- rnorm(iter, sd = sigma_e)
  y <- numeric(iter)
  y[1] <- y0
  for (t in 2:iter) {
    y[t] <- y[t - 1] + e[t]
  }
  result <- data.frame(iteration = seq(1, iter), y = y)
  print(plot(result$iteration, result$y, type = "l"))
  return(result)
}
try1 <- ar_1(iter = 100, y0 = 2, sigma_e = 0.0003)
So the thing is that the e vector is filled with random numbers.
I want to replicate the same graph and values anywhere, so I know I have to use a seed.
So my question is: does the seed go inside the function or at the very start of the script?
Furthermore, I would like to know why.
If you call set.seed once at the top of the script, it fixes the state of the random number generator, so only the first call to rnorm is guaranteed to be reproducible from that seed. Every subsequent call that consumes random numbers advances the generator's state and will not reuse the initial seed.
So really the answer is: do you intend to call the function more than once? If so, then set the seed inside the function.
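For example, a minimal sketch of the difference (nothing here is specific to your function; set.seed and rnorm are base R):
f <- function() {
  set.seed(1)  # seed reset inside: every call returns the same draws
  rnorm(2)
}
f()  # identical output
f()  # identical output

set.seed(1)  # seed set once outside
rnorm(2)     # reproducible after a fresh set.seed(1)
rnorm(2)     # different: the generator state has moved on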
Note that you do not need a for loop in your function. Because R is vectorized, loops can usually be avoided. Random walk values can be calculated using the base R cumsum function. For example:
set.seed(7)
y1 <- pi
rand_vals <- rnorm(10, 0, 5)
path <- c(y1, rand_vals)
walk <- cumsum(path)
rand_vals
[1] 11.4362358 -5.9838584 -3.4714626 -2.0614648 -4.8533667 -4.7363997 3.7406967 -0.5847761 0.7632881 10.9498905
path
[1] 3.1415927 11.4362358 -5.9838584 -3.4714626 -2.0614648 -4.8533667 -4.7363997 3.7406967 -0.5847761 0.7632881
[11] 10.9498905
walk
[1] 3.141593 14.577828 8.593970 5.122507 3.061043 -1.792324 -6.528724 -2.788027 -3.372803 -2.609515 8.340376
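Applied to your ar_1() function, a vectorised version might look like this (a sketch keeping your interface: iter draws, start value y0, innovation standard deviation sigma_e):
ar_1_vec <- function(iter, y0, sigma_e) {
  e <- rnorm(iter - 1, sd = sigma_e)  # iter - 1 innovations after the start
  y <- cumsum(c(y0, e))               # cumulative sum = random walk
  result <- data.frame(iteration = seq_len(iter), y = y)
  plot(result$iteration, result$y, type = "l")
  result
}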

How to draw the log-likelihood graph

I am learning how to draw a log-likelihood graph. Please allow me briefly introduce what I want to do specifically:
Assume we have the data/vector as below:
set.seed(123)
sample <- rpois(50, 1.65)
And the log_like function is given as below:
log_like_graph <- function(lambda){
  X <- as.matrix(sample)  # not sure whether this is necessary for a one-parameter distribution
  N <- nrow(X)
  logLik <- N*log(lambda) - lambda*N*mean(X)
  return(loglik)
}
log_like_graph <- Vectorize(log_like_graph)
# set range of lambda
lambda_vals <- seq(-10,10,by=1)
log_vals <- outer(lambda_vals,log_like_graph)
Based on the above lambda_vals and log_vals, I expect to produce a plot of the log-likelihood curve against lambda.
However, when I execute the last command, log_vals <- outer(lambda_vals, log_like_graph), I get the error:
Error in as.vector(x, mode) :
cannot coerce type 'closure' to vector of type 'any'
Could you please help me solve this problem? Thank you very much!
(FYI: I mainly follow the youtube video https://www.youtube.com/watch?v=w3drLH-DFpE&ab_channel=CalebLikesR that teaches to draw the curve for a log-likelihood function, although it uses normal distribution for demonstration.)
A couple of things I see. There is no need to vectorise log_like_graph, since you can pass lambda values into it with sapply rather than outer. You are passing lambda_vals < 0, but the support of lambda is >= 0. And I don't think your log-likelihood function is correct: it should be -N * lambda - sum(lfactorial(X)) + log(lambda) * sum(X), with X your data vector, but it is easier (and more accurate) to use dpois(..., log=TRUE).
So fixing these things
# data
set.seed(123)
samples <- rpois(50, 1.65)
# The log-likelihood becomes
log_like_graph <- function(X, lambda){
  N <- NROW(X)
  logLik <- -N * lambda - sum(lfactorial(X)) + log(lambda) * sum(X)
  return(logLik)
}
# set lambda >= 0 and take smaller steps (0.01) for a smoother curve
lambda_vals <- seq(0,10,by=0.01)
# loop through lambda values, calculating the log-likelihood at each value
ll1 <- sapply(lambda_vals, function(i) log_like_graph(samples, i))
plot(lambda_vals, ll1, type="l")
This can also be done with dpois(..., log=TRUE) :
ll2 <- sapply(lambda_vals, function(i) sum(dpois(samples, lambda=i, log=TRUE)))
all.equal(ll1, ll2)
# [1] TRUE
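As an extra check (relying on the standard fact that the maximum-likelihood estimate of a Poisson rate is the sample mean), the curve should peak at mean(samples):
lambda_vals[which.max(ll1)]  # location of the maximum on the grid
mean(samples)                # Poisson MLE; should agree up to the 0.01 grid step
abline(v = mean(samples), lty = 2)  # mark the MLE on the plot from above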

Trying to replicate the rgeom() function

As an exercise, I'm trying to write a function which replicates the rgeom() function. I want it to have the same arguments and return values. I've started out by using runif to generate a vector with x elements, but I'm not sure how to apply the probability distribution:
rgeometric <- function(x, prob) {
  outcomes <- runif(x)
  P <- (1 - prob)^length(x) * prob
  return(P)
}
Would it be something like the following? How can I check that the distribution is geometric?
set.seed(0)
rgeometric <- function(x, prob) {
  outcomes <- runif(x)
  P <- (1 - prob)^length(x) * prob
  for (i in x) {
    x[i] <- x[i] * P
  }
  return(outcomes)
}
rgeometric(5, 0.4)
We can accomplish this task using Inverse Transform Sampling.
First, let's clear up some of your notation.
In the rgeom() function, we'll want that first argument to be n, an integer vector of length one giving the number of samples to generate:
rgeometric <- function(n, prob) {
  u <- runif(n)
  ## do stuff
}
So how does inverse transform sampling work?
First we generate a vector u of standard uniform deviates, as shown above.
Then, for each element ui of u, we find the value of the inverse of the cumulative distribution function at ui.
For the geometric distribution, the CDF is 1 - (1 - prob)^(x+1); the inverse of the CDF is ceiling(log(1-u) / log(1-prob)) - 1 (link to derivation, p. 11).
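Spelling the inversion out (a quick derivation from the CDF above; note the inequality flips because log(1 - prob) is negative):
u <= 1 - (1 - prob)^(x + 1)
(1 - prob)^(x + 1) <= 1 - u
(x + 1) * log(1 - prob) <= log(1 - u)
x + 1 >= log(1 - u) / log(1 - prob)
The smallest integer x satisfying this is ceiling(log(1 - u) / log(1 - prob)) - 1.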
So, we can complete the function like so:
rgeometric <- function(n, prob) {
  u <- runif(n)
  return(ceiling(log(1 - u) / log(1 - prob)) - 1)
}
Your last question is: how can we test whether the resulting samples follow a geometric distribution?
I don't know of a formal test that will help, but we can see it appears to work when we compare the density of 1 million random draws from this custom function to the density of 1 million random draws from base R's rgeom() function:
n <- 1e6
p <- 0.25
set.seed(0)
x <- rgeometric(n, p)
y <- rgeom(n, p)
png("so-answer.png", width = 960)
opar <- par(mfrow = c(1, 2))
plot(density(x), main = "Draws from custom function")
plot(density(y), main = "Draws from base R function")
par(opar)
dev.off()
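As an extra numeric check (using the standard fact that R's parameterisation, failures before the first success, has mean (1 - p) / p):
mean(x)      # empirical mean of the draws from the custom function
(1 - p) / p  # theoretical mean: 3 when p = 0.25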
Note that for the definition of the geometric distribution implemented by R, the random variable is the number of failures until the first success. Therefore you could do:
my_rgeom <- function(n, p){
  fun <- function(p){
    n <- 0
    stopifnot(p > 0)
    while (runif(1) > p) n <- n + 1
    n
  }
  replicate(n, fun(p))
}
Now test the function:
n <- 100000
p <- 0.25
X <- rgeom(n, p)
Y <- my_rgeom(n, p)
You could run ks.test on X and Y, though strictly speaking it is meant for continuous variables. A better fit here is chisq.test, to determine whether the two count distributions are similar.
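A rough sketch of that chi-squared comparison (my own layout: tabulate both samples over a common set of values so the two rows line up; expect a warning if the tail cells have small counts):
vals <- 0:max(X, Y)
tab <- rbind(table(factor(X, levels = vals)),
             table(factor(Y, levels = vals)))
chisq.test(tab)  # homogeneity test on the two sets of counts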
Lastly, we could use graphical methods, e.g. superimposed bar plots:
barplot(table(X), col = rgb(0.5, 1, 0.5, 0.4))
barplot(table(Y), add = TRUE, col = rgb(1, 0.5, 0, 0.3))
From the resulting plot you can see that the two are nearly identical.

How to run monte carlo simulation from a custom distribution in R

I would like to pull 1000 samples from a custom distribution in R. I have the following custom distribution:
library(gamlss)
mu <- 1
sigma <- 2
tau <- 3
kappa <- 3
rate <- 1
Rmax <- 20
x <- seq(1, 2e1, 0.01)
points <- Rmax * dexGAUS(x, mu = mu, sigma = sigma, nu = tau) * pgamma(x, shape = kappa, rate = rate)
plot(points ~ x)
How can I randomly sample via Monte Carlo simulation from this distribution?
My first attempt was the following code which produced a histogram shape I did not expect.
hist(sample(points, 1000), breaks = 51)
This is not what I was looking for as it does not follow the same distribution as the pdf.
If you want a Monte Carlo simulation, you'll need to sample from the distribution a large number of times, not take a large sample one time.
Your object, points, holds values that increase with the index up to a peak (around index 400), level off, and then decrease. That is what plot(points ~ x) shows. It may describe a distribution, but the actual distribution of the values stored in points is different: a histogram shows how often values fall within a certain range. You'll notice that the x axis of your histogram is similar to the y axis of the plot(points ~ x) plot. The actual distribution of the values in the points object is easy enough to see, and it is similar to what you're seeing when you sample 1000 values at random, without replacement, from an object holding roughly 1900 values. Here's the distribution of values in points (no simulation required):
hist(points, 100)
I used 100 breaks on purpose so you could see some of the fine details.
Notice the little bump in the tail at the top, which you may not expect if you want the histogram to look like the plot of the values against the index (or some increasing x). It means there are more values in points around 2 than around 1. See how the curve of plot(points ~ x) flattens where the value is around 2, and how steep it is between 0.5 and 1.5. Notice also the large hump at the low end of the histogram, then look at the plot(points ~ x) curve again: most of the values, whether at the low or the high end of that curve, are close to 0, or at least less than 0.25. If you look at those details, you may be able to convince yourself that the histogram is, in fact, exactly what you should expect :)
If you want a Monte Carlo simulation of a sample from this object, you might try something like:
samples <- replicate(1000, sample(points, 100, replace = TRUE))
If you want to generate data using points as a probability density function, that question has been asked and answered here
Let's define your (not normalized) probability density function as a function:
library(gamlss)
fun <- function(x, mu = 1, sigma = 2, tau = 3, kappa = 3, rate = 1, Rmax = 20)
Rmax * dexGAUS(x, mu = mu, sigma = sigma, nu = tau) *
pgamma(x, shape = kappa, rate = rate)
Now one approach is to use some MCMC (Markov chain Monte Carlo) method. For instance,
simMCMC <- function(N, init, fun, ...) {
  out <- numeric(N)
  out[1] <- init
  for (i in 2:N) {
    pr <- out[i - 1] + rnorm(1, ...)    # random-walk proposal
    r <- fun(pr) / fun(out[i - 1])      # acceptance ratio
    out[i] <- ifelse(runif(1) < r, pr, out[i - 1])
  }
  out
}
It starts from the point init and gives N draws. The approach can be improved in many ways, but I'm simply going to start from init = 5, include a burn-in period of 20000 draws, and keep every second draw to reduce the correlation between consecutive samples:
d <- tail(simMCMC(20000 + 2000, init = 5, fun = fun), 2000)[c(TRUE, FALSE)]
plot(density(d))
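To check the draws against the target shape, one option (my own addition: normalise the density numerically with integrate) is to overlay the scaled target on the density estimate:
const <- integrate(fun, -Inf, Inf)$value    # normalising constant
plot(density(d))
curve(fun(x) / const, add = TRUE, lty = 2)  # normalised target density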
You invert the ECDF of the distribution:
ecd.points <- ecdf(points)
invecdfpts <- with( environment(ecd.points), approxfun(y,x) )
samp.inv.ecd <- function(n=100) invecdfpts( runif(n) )
plot(density (samp.inv.ecd(100) ) )
plot(density(points) )
png(); layout(matrix(1:2,1)); plot(density (samp.inv.ecd(100) ),main="The Sample" )
plot(density(points) , main="The Original"); dev.off()
Here's another way to do it that draws from R: Generate data from a probability density distribution and How to create a distribution function in R?:
x <- seq(1, 2e1, 0.01)
points <- 20 * dexGAUS(x, mu = 1, sigma = 2, nu = 3) * pgamma(x, shape = 3, rate = 1)
f <- function(x) 20 * dexGAUS(x, mu = 1, sigma = 2, nu = 3) * pgamma(x, shape = 3, rate = 1)
C <- integrate(f, -Inf, Inf)
C$value
# [1] 11.50361
# normalize by C$value
f <- function(x)
  20 * dexGAUS(x, mu = 1, sigma = 2, nu = 3) * pgamma(x, shape = 3, rate = 1) / 11.50361
# discretise the normalised density so its CDF can be inverted with approx()
pdf <- list(x = x, y = f(x))
random.points <- approx(cumsum(pdf$y) / sum(pdf$y), pdf$x, runif(10000))$y
hist(random.points, 1000)
hist(random.points * 40, 1000) will get the scaling of your original function.

Illustrating the LLN (Law of Large Numbers)

I have to illustrate the Law of Large Numbers through simulations in R.
More precisely.
I would like to illustrate that the cumulative distribution function of the mean converges to the function f given by
f(x) = 0 if x ≤ μ and f(x) = 1 if x > μ.
In my case, I have to use a die. That is, each Xi is uniformly distributed on {1, 2, 3, 4, 5, 6}, so μ = 3.5.
Using R, I have tried to proceed in the following way:
n <- 100
N <- 10000
mu <- 3.5
V <- numeric(N)  # means of N samples of n dice rolls
for (j in 1:N) {
  V[j] <- mean(sample(1:6, n, replace = TRUE))
}
f <- function(x) {
  if (x <= 3.5) {
    y <- 0
  } else {
    y <- 1
  }
  y
}
Vf <- Vectorize(f, "x")
So my idea was to compare the cumulative distribution function of the mean with the function f using a plot. How can I implement this properly in R? That is, I have to plot the empirical cumulative distribution function and the function f in one plot.
You can simulate dice-rolls like this
set.seed(1)
n.rolls <- 100
dicerolls <- sample(1:6, n.rolls, replace=TRUE)
mean(dicerolls)
As for the rest of your question I'm afraid I'd need some further explanation. Maybe you can draw an image of what kind of plot you want?
If this is homework you should tag your question accordingly, and read the info for the tag.
As you can see this site doesn't support MathJax/LaTeX equation mark-up. If you want to include equations you can do it through something like codecogs.
Maybe it's something like this you're thinking of?
dicerolls <- function(rolls=2, reps=10^4) {
  replicate(reps, mean(sample(1:6, rolls, replace=TRUE)))
}
set.seed(1)
dice.seq <- c(1:6, 20, 100)
opar <- par(no.readonly=TRUE)
par(mar=c(2, 2.5, 1, 0.1), mfrow=c(length(dice.seq), 2),
cex=0.5, mgp=c(1.5, 0.5, 0))
for (i in dice.seq) {
  hist(dicerolls(i), breaks=50, col="darkgrey",
       xlim=c(1, 6), ylim=c(0, 3), freq=FALSE, main="", xlab="")
  legend("topleft", paste(i, "dice"), bty="n")
  plot(ecdf(dicerolls(i)), xlim=c(1, 6), main="", frame.plot=FALSE)
}
par(opar)
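To put the limiting step function f on the same picture as the empirical CDF (my own sketch in base R, simulating means of n = 100 rolls as in the question):
set.seed(1)
V <- replicate(10^4, mean(sample(1:6, 100, replace = TRUE)))
plot(ecdf(V), main = "ECDF of the mean vs. the limit f")
curve(as.numeric(x > 3.5), from = 1, to = 6, n = 1001,
      add = TRUE, lty = 2, col = "red")  # f jumps from 0 to 1 at mu = 3.5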
Consider a dice-rolling experiment and its expected value, E[X] = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.
Suppose we perform the experiment of throwing the die n times, recording the number observed each time; let the observations be X1, X2, ..., Xn.
We compute the mean, Xbar = (X1 + X2 + ... + Xn) / n.
If n is large, then Xbar should tend to E[X].
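A minimal base R sketch of that convergence (my own illustration: the running mean of a long sequence of rolls plotted against E[X] = 3.5):
set.seed(1)
rolls <- sample(1:6, 10^4, replace = TRUE)
running_mean <- cumsum(rolls) / seq_along(rolls)
plot(running_mean, type = "l", xlab = "n", ylab = "mean of first n rolls")
abline(h = 3.5, lty = 2)  # the running mean settles at E[X]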
For better understanding, I have a blog where the intuition and the mathematical part are explained, along with an interactive simulation you can play with and the Python code: https://statisticsexplained.blogspot.com/2020/06/law-of-large-numbers-explained-using.html
