When do I have to set a seed? - r

I think this is a very basic question.
I am doing simulations, so I write functions to recreate, for example, a random walk, which mathematically takes this form:
y_t = y_{t-1} + e_t, with e_t ~ N(0, sigma_e^2)
So to simulate it I wrote this function:
ar_1 <- function(iter, y0, sigma_e){
  e <- rnorm(iter, sd = sigma_e)
  y <- numeric(iter)
  y[1] <- y0
  for(t in 2:iter){
    y[t] = y[t-1]+e[t]
  }
  result <- data.frame(iteration = seq(1,iter), y = y)
  print(plot(result$iteration, result$y, type="l"))
  return(result)
}
try1 <- ar_1(iter = 100, y0 = 2, sigma_e = 0.0003)
So the thing is that the e vector takes random numbers.
I want to replicate the same graph and values anywhere, so I know I have to use a seed.
So my question is: does the seed go inside the function or at the very start of the script?
Furthermore, I would like to know why.

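set.seed initializes R's random number generator once. Every later call to rnorm (or any other function that draws random numbers) continues from wherever the generator was left, so successive calls return different draws. If you call set.seed once at the top of the script and always run the script from the top in the same order, the whole run is reproducible, but a second call to your function will not return the same walk as the first.
So really the answer is: do you intend to call the function more than once and get the same result each time? If so, then set the seed inside the function (or immediately before each call).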
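A minimal sketch of that behaviour, using only base R. The seed value 42 and the helper name draw_seeded are arbitrary choices for illustration:

```r
# Seeding once: later draws continue the same stream and therefore differ.
set.seed(42)
a <- rnorm(3)
b <- rnorm(3)   # not the same numbers as a

# Re-seeding with the same value replays the stream from the start.
set.seed(42)
a2 <- rnorm(3)  # same as a
b2 <- rnorm(3)  # same as b

# Hypothetical helper: seeding inside the function makes every call repeatable.
draw_seeded <- function(n, seed) {
  set.seed(seed)
  rnorm(n)
}
first_call  <- draw_seeded(3, seed = 42)
second_call <- draw_seeded(3, seed = 42)
```

One thing to keep in mind with the seed-inside-the-function design: the function resets the global generator state as a side effect, so passing the seed as an argument keeps that visible to the caller.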

Note that you do not need a for loop in your function. Because R is vectorized, loops can usually be avoided. Random walk values can be calculated using the base R cumsum function. For example:
set.seed(7)
y1 <- pi
rand_vals <- rnorm(10, 0, 5)
path <- c(y1, rand_vals)
walk <- cumsum(path)
rand_vals
[1] 11.4362358 -5.9838584 -3.4714626 -2.0614648 -4.8533667 -4.7363997 3.7406967 -0.5847761 0.7632881 10.9498905
path
[1] 3.1415927 11.4362358 -5.9838584 -3.4714626 -2.0614648 -4.8533667 -4.7363997 3.7406967 -0.5847761 0.7632881
[11] 10.9498905
walk
[1] 3.141593 14.577828 8.593970 5.122507 3.061043 -1.792324 -6.528724 -2.788027 -3.372803 -2.609515 8.340376
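Applied to the ar_1 function from the question, the same idea might look like the sketch below. The name ar_1_vec is hypothetical, and the plotting call is left out so that the function only returns the data. Like the original loop, it starts at y0 and does not use the first element of e.

```r
# Vectorized version of the question's random walk (no plotting).
ar_1_vec <- function(iter, y0, sigma_e) {
  e <- rnorm(iter, sd = sigma_e)
  # y[1] = y0, y[t] = y[t-1] + e[t] for t >= 2
  y <- cumsum(c(y0, e[-1]))
  data.frame(iteration = seq_len(iter), y = y)
}

set.seed(7)
vec_result <- ar_1_vec(iter = 100, y0 = 2, sigma_e = 0.0003)

# Reference: the original loop, run with the same seed.
set.seed(7)
e <- rnorm(100, sd = 0.0003)
y_loop <- numeric(100)
y_loop[1] <- 2
for (t in 2:100) {
  y_loop[t] <- y_loop[t - 1] + e[t]
}

# The two paths agree up to floating-point rounding.
same_path <- isTRUE(all.equal(vec_result$y, y_loop))
```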

Related

Summing N normal distributions

I am trying to determine the distribution of the sum of N univariate distributions.
Can you suggest a function that allows me to dynamically input any N number of distributions?
This works:
library(distr)
var1 <- Norm(mean=14, sd=1)
var2 <- Norm(mean=10, sd=1)
var3 <- Norm(mean=9, sd=1)
conv <- convpow(var1+var2+var3,1)
This (obviously) doesn't work since pasting the list together creates a messy character string, however this is the framework for my ideal function:
convolution_multi <- function(mean_list = c(14,10,9,10,50)){
distribution_list <- lapply(X = mean_list, Norm, sd=1)
conv_out <- convpow(paste(distribution_list,collapse="+"),1)
return(conv_out)
}
Thanks for your help!
You can use Reduce to repeatedly add each RV to one another. After that you can use convpow
new_var <- Reduce("+", distribution_list)
convpow(new_var, 1)
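With that being said, the call to convpow does absolutely nothing here: convpow(X, N) gives the distribution of the sum of N independent copies of X, so with N = 1 it returns X itself.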
> identical(convpow(new_var, 1), new_var)
[1] TRUE
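For independent normal variables, the result can be cross-checked without distr, because the sum is again normal, with mean equal to the sum of the means and variance equal to the sum of the variances. The sketch below assumes independence and uses only base R. The name sum_of_normals is a hypothetical helper, not part of distr.

```r
# Closed-form parameters of a sum of independent normals.
sum_of_normals <- function(mean_list = c(14, 10, 9, 10, 50),
                           sd_list = rep(1, length(mean_list))) {
  list(mean = sum(mean_list), sd = sqrt(sum(sd_list^2)))
}

total <- sum_of_normals()

# Monte Carlo check: add up one simulated sample per distribution with Reduce.
set.seed(1)
samples <- lapply(c(14, 10, 9, 10, 50), function(m) rnorm(1e5, mean = m, sd = 1))
draws <- Reduce(`+`, samples)

# Empirical mean and sd, to compare against total$mean and total$sd.
empirical <- c(mean = mean(draws), sd = sd(draws))
```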

How to generate samples from MVN model?

I am trying to run some code on R based on this paper here through example 5.1. I want to simulate the following:
My background on R isn't great so I have the following code below, how can I generate a histogram and samples from this?
xseq<-seq(0, 100, 1)
n<-100
Z<- pnorm(xseq,0,1)
U<- pbern(xseq, 0.4, lower.tail = TRUE, log.p = FALSE)
Beta <- (-1)^U*(4*log(n)/(sqrt(n)) + abs(Z))
Some demonstrations of tools that will be of use:
rnorm(1) # generates one standard normal variable
rnorm(10) # generates 10 standard normal variables
rnorm(1, 5, 6) # generates 1 normal variable with mu = 5, sigma = 6
# not needed for this problem, but perhaps worth saying anyway
rbinom(5, 1, 0.4) # generates 5 Bernoulli variables that are 1 w/ prob. 0.4
So, to generate one instance of a beta:
n <- 100 # using the value you gave; I have no idea what n means here
u <- rbinom(1, 1, 0.4) # make one Bernoulli variable
z <- rnorm(1) # make one standard normal variable
beta <- (-1)^u * (4 * log(n) / sqrt(n) + abs(z))
But now, you'd like to do this many times for a Monte Carlo simulation. One way you might do this is by building a function, having beta be its output, and using the replicate() function, like this:
n <- 100 # putting this here because I assume it doesn't change
genbeta <- function(){ # output of this function will be one copy of beta
  u <- rbinom(1, 1, 0.4)
  z <- rnorm(1)
  return((-1)^u * (4 * log(n) / sqrt(n) + abs(z)))
}
# note that we don't need to store beta anywhere directly;
# rather, it is just the return()ed value of the function we defined
betadraws <- replicate(5000, genbeta())
hist(betadraws)
This will have the effect of making 5000 copies of your beta variable and putting them in a histogram.
There are other ways to do this -- for instance, one might just make a big matrix of the random variables and work directly with it -- but I thought this would be the clearest approach for starting out.
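As a sketch of the "work directly with the random variables" alternative, the same draws can be generated in one vectorized step, since rbinom and rnorm both accept a count. The names n_draws and betadraws_vec and the seed are illustrative choices:

```r
n <- 100          # same n as above
n_draws <- 5000   # number of Monte Carlo copies of beta

set.seed(1)       # arbitrary seed, only for reproducibility
u <- rbinom(n_draws, 1, 0.4)  # 5000 Bernoulli(0.4) variables at once
z <- rnorm(n_draws)           # 5000 standard normal variables at once

# Element-wise arithmetic gives one beta per (u, z) pair.
betadraws_vec <- (-1)^u * (4 * log(n) / sqrt(n) + abs(z))

# hist(betadraws_vec)  # same histogram call as in the replicate() version
```

Because abs(z) is non-negative, every draw is at least 4 * log(n) / sqrt(n) in absolute value, so the histogram has two lobes with a gap around zero.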
EDIT: I realized that I ignored the second equation entirely, which you probably didn't want.
We've now made a vector of beta values, and you can control the length of the vector in the first parameter of the replicate() function above. I'll leave it as 5000 in my continued example below.
To get random samples of the Y vector, you could use something like:
x <- replicate(5000, rnorm(17))
# makes a 17 x 5000 matrix of independent standard normal variables
epsilon <- rnorm(17)
# vector of 17 standard normals
y <- x %*% betadraws + epsilon
# y is now a 17 x 1 matrix (morally equivalent to a vector of length 17)
and if you wanted to get many of these, you could wrap that inside another function and replicate() it.
Alternatively, if you didn't want the Y vector, but just a single Y_i component:
x <- rnorm(5000)
# x is a vector of 5000 iid standard normal variables
epsilon <- rnorm(1)
# epsilon_i is a single standard normal variable
y <- t(x) %*% betadraws + epsilon
# t() is the transpose function; y is now a 1 x 1 matrix
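Wrapping the Y-vector draw in a function and calling replicate() on it might look like the following sketch. The name gen_y, its argument m, and the choice to hold the beta vector fixed across draws are assumptions for illustration, since the model equations themselves are not reproduced here.

```r
n <- 100
genbeta <- function() {
  u <- rbinom(1, 1, 0.4)
  z <- rnorm(1)
  (-1)^u * (4 * log(n) / sqrt(n) + abs(z))
}

# One draw of the Y vector for a given beta vector (hypothetical helper).
gen_y <- function(beta, m = 17) {
  p <- length(beta)
  x <- matrix(rnorm(m * p), nrow = m)  # m x p matrix of standard normals
  epsilon <- rnorm(m)                  # m standard normal errors
  drop(x %*% beta + epsilon)           # drop() turns the m x 1 matrix into a vector
}

set.seed(123)
betadraws <- replicate(5000, genbeta())
y_draws <- replicate(100, gen_y(betadraws))  # one column per simulated Y vector
dim(y_draws)
```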

I am beginner in R and I'm trying to solve a system of equations but when i run i get error in R [duplicate]

# my error : Error in F[1] <- n/(X[0]) - sum(log(1 + Y^exp(X[1] + X[2] * x))) : replacement has length zero
set.seed(16)
#Inverse Transformation on CDF
n=100
SimRRR.f <- function(100, lambda=1,tau)) {
x= rnorm(100,0,1)
tau= exp(-1-x)
u=runif(100)
y= (1/(u^(1/lambda)-1))^(1/tau)
y
}
Y<-((1/u)-1)^exp(-1-x)
# MLE for Simple Linear Regresion
# System of equations
library(rootSolve)
library(nleqslv)
model <- function(X){
F <- numeric(length(X))
F[1] <- n/(X[0])-sum(log(1+Y^exp(X[1]+X[2]*x)))
F[2] <- 2*n -(X[0]+1)*sum(exp(X[1]+X[2]*x))*Y^( exp(X[1]+X[2]*x))*log(Y)/(1+ Y^( exp(X[1]+X[2]*x)))
F[3] <- sum(x) + sum(x*log(Y))*exp(X[1]+X[2]*x) -(X[0]+1)*X[1]*sum(exp(X[1]+X[2]*x)*Y^(exp(X[1]+X[2]*x)*log(Y)))/(1+ Y^( exp(X[1]+X[2]*x)))
# Solution
F
}
startx <- c(0.5,3,1) # start the answer search here
answers<-as.data.frame(nleqslv(startx,model))
The problem is that you define x, u, tau and y inside the SimRRR function, but are trying to define Y in terms of them outside the function.
Using a function, you give it input, and you get back output. All the other variables defined in the course of the function doing its job go away at the end. As it stands, the line defining Y should fail with an "object not found" error, because u and x do not exist outside the function (unless you defined them in the global environment as you were working on your function).
Try the following functions, see if they do the job:
# I usually put all my library calls together at the beginning of the script.
library(rootSolve)
library(nleqslv)
set.seed(16)
n <- 100
x <- rnorm(n, 0, 1) # see below for why this is pulled out.
# 100 can't be by itself in the argument list; everything in there needs to be a named argument.
# tau is computed from x inside the function, so it does not need to be an argument.
SimRRR.f <- function(x, lambda = 1) {
  n <- length(x)
  tau <- exp(-1 - x)
  u <- runif(n)
  # Note: u^(1/lambda) - 1 is negative for u in (0, 1), so this expression returns NaN;
  # check it against the inverse CDF you intended.
  y <- (1 / (u^(1 / lambda) - 1))^(1 / tau)
  y
}
Y_sim <- SimRRR.f(x, lambda = 1)
Your second function has more issues. Namely, it relies on x, which is not defined anywhere that can be found. Either you need x from the previous function, or you really meant X. I'm going to assume you do need the values of x, since X is only of length 3. This is why I pulled it out of the last function call - we need it now.
[Update]
It's also been pointed out in the comments that the indexing here is wrong. I didn't catch that previously (and the F elements are defined correctly). I think I've fixed the indexing issues too now:
model <- function(X, Y, x){ # If you use x and Y in the function, define them here.
n <- length(x)
F <- numeric(length(X))
F[1] <- n/(X[1])-sum(log(1+Y^exp(X[2]+X[3]*x)))
F[2] <- 2*n -(X[1]+1)*sum(exp(X[2]+X[3]*x))*Y^( exp(X[2]+X[3]*x))*log(Y)/(1+ Y^( exp(X[2]+X[3]*x)))
F[3] <- sum(x) + sum(x*log(Y))*exp(X[2]+X[3]*x) -(X[1]+1)*X[2]*sum(exp(X[2]+X[3]*x)*Y^(exp(X[2]+X[3]*x)*log(Y)))/(1+ Y^( exp(X[2]+X[3]*x)))
# Solution
F
}
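The "replacement has length zero" error quoted in the question follows from R's 1-based indexing: X[0] is a zero-length vector, so any expression built from it is also zero-length and cannot be assigned to a single element. A small base R illustration (the values are arbitrary):

```r
X <- c(0.5, 3, 1)

zero_len <- X[0]   # numeric(0): index 0 selects nothing in R
first    <- X[1]   # 0.5: the first element lives at index 1

# Reproduce the error message from the question in a controlled way.
msg <- tryCatch({
  F <- numeric(3)
  F[1] <- 100 / X[0]   # right-hand side has length zero
  "no error"
}, error = function(e) conditionMessage(e))
msg
```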
I'm not familiar with the nleqslv package, but unless there is a method defined to convert it to a data frame, that might not go so well. I'd make sure everything else is working before the conversion.
startx <- c(0.5,3,1) # start the answer search here
answers <- nleqslv(startx,model, Y = Y_sim, x = x)
answer_df <- as.data.frame(answers)

plot function to limit in R

Say I have a simple mathematical function n1=m1*n1 and I want to plot this function as n1 approaches infinity. Is there a quick way to do that?
m1=0.1
initial n1=0.1
Or do I have to use deSolve and set up a differential equation? There must be a quick way to do this.
If you mean the next value in this equation depends on the last value you would set up something like this:
m1 <- 0.1
x <- seq(0.1, 1000, 0.1)
y <- c(0.1, rep(NA, length(x)-1))
for(i in 2:length(x)){
  y[i] <- y[i-1] * m1
}
plot(y~x, type = "l" )
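Since each value is just the previous one times m1, the recursion has the closed form y[i] = y[1] * m1^(i - 1), which can be computed without a loop. With |m1| < 1 the sequence shrinks toward 0. A short sketch follows. The names n0 and steps are illustrative, and only 20 steps are used because the values become tiny very quickly.

```r
m1 <- 0.1
n0 <- 0.1     # initial value
steps <- 20

# Recursive version, as in the loop above.
y_loop <- numeric(steps)
y_loop[1] <- n0
for (i in 2:steps) {
  y_loop[i] <- y_loop[i - 1] * m1
}

# Closed form: a geometric sequence.
y_closed <- n0 * m1^(seq_len(steps) - 1)

agree <- isTRUE(all.equal(y_loop, y_closed))
# plot(seq_len(steps), y_closed, type = "l")  # decays toward 0
```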

Uniform kernel function returning NaN

I'm writing my own uniform kernel function like so:
uniform.kernel <- function(data, predict.at, iv.name, dv.name, bandwidth){
  #Load in the DV/IV and turn them into vectors
  iv <- data$iv.name
  dv <- data$dv.name
  #Given the point we're predicting,
  #what kernel weights does each observation of the iv receive?
  kernelvalue <- ifelse(abs((iv - predict.at)/bandwidth)<= 1, 0.5,0)
  #Given these kernel values and the dv,
  #what is our estimate of the conditional expectation?
  conditional.expectation <-sum(kernelvalue*dv)/sum(kernelvalue)
  #Return the expectation
  return(conditional.expectation)
}
And then applying it to this data:
set.seed(101)
x <- seq(from=0, to=100, by=.1)
errors <- runif(min=.5, max=5000, n=length(x))
y <- x^2 - 3*x + errors^1.1
combo.frame <- cbind.data.frame(x,y)
Only, when I apply the function to the data (like below), I get "NaN".
uniform.kernel(combo.frame, 20, "x","y", 4)
However, when I just write out the steps within my function to the data set directly (without using the function), I get the correct answer. For example, I do the following and get the correct results:
kernelvalue <- ifelse(abs((combo.frame$x - 20)/4)<= 1, 0.5,0)
conditional.expectation <- sum(kernelvalue*combo.frame$y)/sum(kernelvalue)
Why am I getting NaN when I use the function?
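You can't use the $ operator with a column name stored in a character variable like that: data$iv.name looks for a column literally named iv.name, finds none, and returns NULL, so both sums are 0 and the result is 0/0 = NaN. Use the [ operator instead. Replace the first two lines in your function like this: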
iv <- data[,iv.name]
dv <- data[,dv.name]
and it works as expected.
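Putting the fix together with the data from the question gives the sketch below. It uses [[ ]] rather than [, ], which is an equivalent way to pull one column out by name. The variable names est and direct are illustrative.

```r
uniform.kernel <- function(data, predict.at, iv.name, dv.name, bandwidth) {
  # Look the columns up by the names stored in iv.name / dv.name
  iv <- data[[iv.name]]
  dv <- data[[dv.name]]
  # Uniform kernel: weight 0.5 within one bandwidth of predict.at, else 0
  kernelvalue <- ifelse(abs((iv - predict.at) / bandwidth) <= 1, 0.5, 0)
  # Weighted average of the dv
  sum(kernelvalue * dv) / sum(kernelvalue)
}

set.seed(101)
x <- seq(from = 0, to = 100, by = 0.1)
errors <- runif(min = 0.5, max = 5000, n = length(x))
y <- x^2 - 3 * x + errors^1.1
combo.frame <- cbind.data.frame(x, y)

# Why the original version failed: there is no column called "iv.name".
missing_col <- combo.frame$iv.name   # NULL

est <- uniform.kernel(combo.frame, 20, "x", "y", 4)

# Same computation written out directly, for comparison.
kv <- ifelse(abs((combo.frame$x - 20) / 4) <= 1, 0.5, 0)
direct <- sum(kv * combo.frame$y) / sum(kv)
```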

Resources