Adding random numbers to the end of a vector - r

I am trying to repeatedly compute the median of a random vector, constructing a vector of all the medians. Why do I get a NULL result from my code below, in which I repeatedly compute the median and add it to the end of the vector m?
medainfunc<-function(n,mu,sigma,iterate){
m<-c()
for (i in itreate){
x<-rnorm(n,mu,sigma)
y<-median(x)
m<-c(m,y)
}
m
}
medianfunc(10,3,15,10000)
NULL

There are multiple typos in the OP's code: iterate vs. itreate, and the function is defined as medainfunc but called as medianfunc. In addition, only a single value is supplied for 'iterate', so seq(iterate) is probably what we want inside the function. Otherwise the loop runs only once and we get a single value as output.
medianfunc<-function(n,mu,sigma,iterate){
m<-c()
for (i in seq(iterate)){
x<-rnorm(n,mu,sigma)
y<-median(x)
m<-c(m,y)
}
m
}
set.seed(25)
medianfunc(10,3,15, 5)
#[1] 0.9770646 -6.4852741 4.6768291 -6.4167869 5.3176253
This could be vectorized by getting the rnorm of 'n*iterate' values. Convert this to a matrix and use colMedians from library(matrixStats).
medianfunc1 <- function(n, mu, sigma, iterate){
m1 <- matrix(rnorm(n*iterate, mu, sigma), ncol=iterate)
library(matrixStats)
colMedians(m1)
}
set.seed(25)
medianfunc1(10,3,15, 5)
#[1] 0.9770646 -6.4852741 4.6768291 -6.4167869 5.3176253
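If matrixStats is not installed, the same matrix idea can be sketched in base R with apply over the columns. The name medianfunc2 is only illustrative (it is not from the original answer), and the loop version is repeated here just so the two results can be compared.

```r
# Loop version, repeated here only as a reference for the comparison
medianfunc <- function(n, mu, sigma, iterate) {
  m <- c()
  for (i in seq_len(iterate)) {   # seq_len() also copes with iterate = 0
    m <- c(m, median(rnorm(n, mu, sigma)))
  }
  m
}

# Base-R variant (hypothetical name): one column per iteration,
# then the median of each column
medianfunc2 <- function(n, mu, sigma, iterate) {
  m1 <- matrix(rnorm(n * iterate, mu, sigma), ncol = iterate)
  apply(m1, 2, median)
}

set.seed(25)
res_loop <- medianfunc(10, 3, 15, 5)
set.seed(25)
res_apply <- medianfunc2(10, 3, 15, 5)

# Both versions consume the random stream in the same order
all.equal(res_loop, res_apply)
```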

Building a vector one-by-one is both inefficient (see the second circle of the R inferno) and code-intensive. Instead, you can repeat a randomized operation a certain number of times in a single line of code with the replicate function:
medianfunc <- function(n, mu, sigma, iterate) {
replicate(iterate, median(rnorm(n, mu, sigma)))
}
set.seed(25)
medianfunc(10,3,15, 5)
# [1] 0.9770646 -6.4852741 4.6768291 -6.4167869 5.3176253
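If an explicit loop is still preferred, the growing-vector problem can also be avoided by allocating the result once and filling it by index. A minimal sketch, where medianfunc_prealloc is an illustrative name and not part of the original answers:

```r
# Pre-allocate the result and fill it in place instead of growing it with c()
medianfunc_prealloc <- function(n, mu, sigma, iterate) {
  m <- numeric(iterate)              # full-length vector of zeros
  for (i in seq_len(iterate)) {      # seq_len(0) is empty, so iterate = 0 is safe
    m[i] <- median(rnorm(n, mu, sigma))
  }
  m
}

set.seed(25)
res_prealloc <- medianfunc_prealloc(10, 3, 15, 5)
set.seed(25)
res_replicate <- replicate(5, median(rnorm(10, 3, 15)))

# Same seed and same order of draws, so the two approaches should agree
all.equal(res_prealloc, res_replicate)
```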

I don't know if this is just the code you wrote in the post, but the function is defined as medainfunc and then you are calling medianfunc. That is one error; the loop also refers to itreate instead of iterate. The other thing I noticed is that you could add an explicit return at the end of your function (R already returns the last evaluated expression, so this only makes the intent clearer), like so:
medianfunc<-function(n,mu,sigma,iterate){
m<-c()
for (i in seq(iterate)){
x<-rnorm(n,mu,sigma)
y<-median(x)
m<-c(m,y)
}
return(m)
}

Related

Vectorize a two argument function

I have a covariance-type function of two lags, h1 and h2. I am trying to avoid for loops when creating the covariance function matrix.
When I type cov1 it does not give me a matrix, just a vector, for example if I type covmatrix(h1=1:5,h2=1:5). How can I obtain, for example, the whole 5 by 5 matrix?
I tried all the apply functions, and the new vectorize function (with lower case v).
R code:
n=100
x=arima.sim(n = n , list(ar = .5))
cov=function(h1,h2){
(1/n)*sum((x[1:(n-h1-h2)]-mean(x))*(x[(1+h1):(n-h2)]-mean(x))*(x[(1+h1+h2):n]-mean(x)))
}
covmatrix=Vectorize(cov)
A simple double-apply should get you what you are looking for. Note how the return value of the vectorized function is equal to the diagonal of the covmatrix.
test <- sapply(1:5, function(x) sapply(1:5, function(y) cov(x, y)))
all.equal(diag(test), covmatrix(1:5, 1:5))
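The same matrix can also be sketched with outer, which expects a function that is vectorized over both arguments; wrapping the scalar function in Vectorize provides that. In this sketch the function is called cum3 (a hypothetical name, chosen to avoid masking stats::cov) and n is taken from the length of the simulated series, which is an assumption since the original code does not define n.

```r
set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 100))
n <- length(x)
xc <- x - mean(x)   # centre once instead of three times per call

# Scalar version: h1 and h2 are single lags
cum3 <- function(h1, h2) {
  sum(xc[1:(n - h1 - h2)] * xc[(1 + h1):(n - h2)] * xc[(1 + h1 + h2):n]) / n
}

lags <- 1:5

# outer() passes whole vectors to FUN, so the scalar function is wrapped first;
# entry [i, j] corresponds to cum3(lags[i], lags[j])
m_outer <- outer(lags, lags, Vectorize(cum3))

# Double sapply for comparison: the outer loop builds the columns (h2)
m_apply <- sapply(lags, function(h2) sapply(lags, function(h1) cum3(h1, h2)))

all.equal(m_outer, m_apply)
```

Vectorize on its own pairs the arguments element by element, which is why calling it with 1:5 and 1:5 returns only the five diagonal entries.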

Problems with Gaussian Quadrature in R

I'm using the gaussquad package to evaluate some integrals numerically.
I thought the ghermite.h.quadrature command worked by evaluating a function f(x) at points x1, ..., xn and then constructing the sum w1*f(x1) + ... + wn*f(xn), where x1, ..., xn and w1, ..., wn are nodes and weights supplied by the user.
Thus I thought the commands
ghermite.h.quadrature(f,rule)
sum(sapply(rule$x,f)*rule$w)
would yield the same output for any function f, where "rule" is a data frame which stores the nodes in a column labelled "x" and the weights in a column labelled "w". For many functions the output is indeed the same, but for some functions I get very different results. Can someone please help me understand this discrepancy?
Thanks!
Code:
library(gaussquad)
n.quad = 50
rule = hermite.h.quadrature.rules(n.quad)[[n.quad]]
f <- function(z){
f1 <- function(x,y) pnorm(x+y)
f2 <- function(y) ghermite.h.quadrature(f1,rule,y = y)
g <- function(x,y) x/(1+y) / f2(y)*pnorm(x+y)
h <- function(y) ghermite.h.quadrature(g,rule,y=y)
h(z)
}
ghermite.h.quadrature(f,rule)
sum(sapply(rule$x,f)*rule$w)
OK, that problem got me interested.
I've looked into the gaussquad sources, and the author is clearly not running sapply internally: every integrand is expected to return a vector when given a vector argument.
This is clearly stated in the documentation:
functn an R function which should take a numeric argument x and possibly some parameters.
The function returns a numerical vector value for the given argument x
If your integrand only uses built-in vectorized functions, they are already written that way, so everything works.
Otherwise you have to rewrite your function so that it accepts a vector argument and returns a vector.
UPDATE
Vectorize() fixes the problem for me, as does a simple wrapper with sapply:
vf <- function(z) {
sapply(z, f)
}
After either of those changes, results are identical: 0.2029512
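The effect can be reproduced without the package. The sketch below uses a hand-written three-point rule and a stand-in routine, quad_vectorised, that calls the integrand once on the whole node vector; both are assumptions made for illustration and are not gaussquad's actual code or output.

```r
# Hand-written 3-point Gauss-Hermite rule for the weight exp(-x^2 / 2);
# it mimics a rule data frame with columns x (nodes) and w (weights)
rule <- data.frame(
  x = c(-sqrt(3), 0, sqrt(3)),
  w = sqrt(2 * pi) * c(1 / 6, 2 / 3, 1 / 6)
)

# Stand-in quadrature routine: f is called ONCE with all nodes
quad_vectorised <- function(f, rule) sum(rule$w * f(rule$x))

# An integrand that is not vectorized: max() collapses a vector to one number
f <- function(z) max(z, 0)

direct    <- quad_vectorised(f, rule)             # f sees all nodes at once
pointwise <- sum(sapply(rule$x, f) * rule$w)      # f sees one node at a time
fixed     <- quad_vectorised(Vectorize(f), rule)  # wrapper restores agreement

c(direct = direct, pointwise = pointwise, fixed = fixed)
```

Here direct differs from pointwise because the single number returned by f is recycled across all weights, while the Vectorize wrapper makes the two computations agree.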

Linear regression using a list of function

I've a dataset with X and Y value obtained from a calibration and I have to interpolate them with a predefined list of polynomial functions and choose the one with the best R2.
The most silly function should be
try<-function(X,Y){
f1<- x + I(x^2.0) - I(x^3.0)
f2<- x + I(x^1.5) - I(x^3.0)
...
f20<- I(x^2.0) - I(x^2.5) + I(x^0.5)
r1<- lm(y~f1)
r2<- lm(y~f2)
...
r20<-lm(y~f20)
v1<-summary(r1)$r.squared
v2<-summary(r2)$r.squared
...
v20<-summary(r20)$r.squared
v<-c(v1,v2,...,v20)
return(v)
}
I'd like then to make this function shorter and smarter (especially from the definition of r1 to the end). I'd also like to give the user the possibility to choose a function among f1 to f20 (typing the desired row number of v) and see the output of the function print and plot on it.
Please, could you help me?
Thank you.
@mso: the idea of using sapply is nice, but unfortunately this way I don't use a polynomial for the regression: my x vector is transformed into the f1 vector according to the formula and then used for the regression. I obtain just one parameter instead of 3 (in this case).
Create F as a list and proceed:
F = list(f1, f2, ...., f20)
r = lapply(F, function(x) lm(y~x))
v = sapply(r, function(x) summary(x)$r.squared)
return(v)
lapply will take each element of F, fit lm against y and put the fitted models in the list r (sapply is not suitable for this step, because it tries to simplify the lm objects). In the next line, sapply takes every element of r, computes its summary and puts the R-squared values in the vector v. Hopefully, it should work.
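To keep each power of x as a separate regressor, which is the concern raised in the comment above, one option is to store formulas rather than pre-computed vectors. This is only a sketch under assumptions: the data are simulated, only the three candidates spelled out in the question are included, and the terms are joined with + because a minus sign inside a formula removes a term (the sign of each coefficient is estimated by lm).

```r
# Simulated calibration data (an assumption; replace with the real X and Y)
set.seed(42)
x <- seq(1, 10, length.out = 50)
y <- 2 + 0.5 * x + 0.3 * x^2 - 0.01 * x^3 + rnorm(50, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Candidate models stored as formulas, named after the question's f1, f2, f20
forms <- list(
  f1  = y ~ x + I(x^2.0) + I(x^3.0),
  f2  = y ~ x + I(x^1.5) + I(x^3.0),
  f20 = y ~ I(x^2.0) + I(x^2.5) + I(x^0.5)
)

fits <- lapply(forms, lm, data = dat)                      # list of lm objects
v <- sapply(fits, function(fit) summary(fit)$r.squared)    # named R-squared vector

best <- names(which.max(v))   # name of the model with the largest R-squared
chosen <- fits[[best]]        # a single lm object, or pick one by position: fits[[2]]
```

Because chosen is an ordinary lm object, print(chosen), summary(chosen) and plot(chosen) can be applied to whichever model the user selects.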

Optimizing a function in a loop

I am trying to deal with a likelihood function from an AR(2) model.
I have to maximize a function with respect to two variables, alpha1 and alpha2.
Since it is about time series, I have the variable x stored in a matrix for 200 time periods.
I have 10000 simulations of this. So I have the x values in a 200x10000 matrix.
I want to directly have a matrix of 2x10000 in which the results of the optimization for every simulation are stored. I have set a for loop and I have specified the function inside of it, but it is not working, i.e. when I run it it tells me:
Error in A <- -optim(c(1.5, 0.75), log_lik) : argument not valid for operator
I attach here my code. I have created a function to contain the results before running the loop and I have called it A:
for (i in 1:R) {
for (t in 3:N) {
log_lik <- function (α) {
α1 <- α[1]
α2 <- α[2]
L = -1/2*((N-2)*log(pi*2)+(N-2)*log(1)+sum((x[t,i]-c-α1*x[t-1,i]-α2*x[t-2,i])^2))
}
A <- -optim(c(1.5, 0.75), log_lik)$par
}
return(A)
}
Thanks a lot!!
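One possible way to organise this computation is sketched below, under several assumptions: the data are simulated, the intercept is a fixed number called c0 (the original c is not defined in the post), the error variance is fixed at 1 as in the log(1) term, and the sum over t is taken inside the likelihood rather than in a loop. Since optim minimises by default, the function returns the negative log-likelihood, so the result of optim itself does not need to be negated.

```r
set.seed(1)
N  <- 200   # time periods
R  <- 5     # number of simulations (10000 in the question; small here for speed)
c0 <- 0     # assumed intercept

# Simulated N x R matrix, one AR(2) series per column (illustrative data)
x <- replicate(R, as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = N)))

# Negative conditional log-likelihood for one series xi
neg_log_lik <- function(alpha, xi) {
  t_idx <- 3:length(xi)
  res <- xi[t_idx] - c0 - alpha[1] * xi[t_idx - 1] - alpha[2] * xi[t_idx - 2]
  0.5 * (length(t_idx) * log(2 * pi) + sum(res^2))
}

# One optimisation per column; sapply binds the two estimates into a 2 x R matrix
A <- sapply(seq_len(R), function(i) {
  optim(c(1.5, 0.75), neg_log_lik, xi = x[, i])$par
})

dim(A)
```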

R: looping to search for max of non-monotonic function

Refer to the R code below. The function (someRfunction) operates on a vector and returns a scalar value. The data are pairs (x,y), where x and y are vectors of length n, which may be large.
I want to know the value of x* such that the result of someRfunction on y where {x>x*} is maximized. The function operates on y values and is non-monotonic in x*. I need to evaluate for all x* (i.e. each element of x). Speed is not an issue if executed once, but the code would be executed many times in a simulation. Is there any way to make this code more efficient/faster?
### x and y are vectors of length n
### sort x and y such that they are ordered by descending x
xord <- x[order(-x)]
yord <- y[order(-x)]
maxf <- -99999
maxcut <- NA
for (i in 1:n) {
### yi is a subvector of y that corresponds to y[x>x{i}]
### where x{i} is the (n-i+1)th order statistic of x
yi <- yord[1:(i-1)]
fxi <- someRfunction(yi)
if (fxi>maxf) {
maxf <- fxi
maxcut <- xord[i]
}
}
Thanks.
Edit: let someRfunction(yi)=t.test(yi)$statistic.
If you can say anything more about the function, particularly whether it is smooth and whether its gradient can be determined, you will get a better answer. At the moment the only increase in speed will be modest: pre-allocate a vector to hold the results, omit the if-max clause and then use which.max() on the vector. You might also want to look at the function optimx in the package "optimx".
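The pre-allocation idea described above could look roughly like this. The data are simulated, and the loop starts at i = 3 because t.test needs at least two observations; both are assumptions made so that the sketch runs on its own.

```r
set.seed(7)
n <- 30
x <- rnorm(n)
y <- rnorm(n, mean = 0.3)

# Sort both vectors by descending x, computing the ordering only once
ord  <- order(-x)
xord <- x[ord]
yord <- y[ord]

# The function from the question's edit, returning a plain number
someRfunction <- function(v) unname(t.test(v)$statistic)

# Pre-allocate the results; entries that are never computed stay NA
fx <- rep(NA_real_, n)
for (i in 3:n) {
  fx[i] <- someRfunction(yord[seq_len(i - 1)])   # y values with x > xord[i]
}

best   <- which.max(fx)   # which.max() skips NA entries
maxf   <- fx[best]
maxcut <- xord[best]
```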

Resources