General code for a summation in R

I'm writing some code in R and I came across the following problem:
Basically, I want to calculate a variable X[k] that takes a value for each k, like this:
X[k] = (1/k) * sum over j = 1..k of A[n-k] / A[n-j+1]
where A is a known vector which takes a different value for each index.
For the moment, I have something like this:
k <- NULL
X <- NULL
z <- 1:n
for (k in seq(along = z)){
  for (j in seq(along = 1:k)){
    X[k] = 1/k * sum(A[n-k]/A[n-j+1])
  }
}
which can't be right. Any idea on how to fix this one?
As always, any help would be dearly appreciated.

Try this
# define A
A <- c(1, 2, 3, 4)
n <- length(A)
z <- 1:n
# predefine X (don't worry, all values will be overwritten, but it will have the same length as A)
X <- A
for (k in z) {
  for (j in 1:k) {
    X[k] = 1/k * sum(A[n-k]/A[n-j+1])
  }
}
You don't need to define z; it is only used inside the for loop. In this case, just write for(k in 1:n){, as sketched below.
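For reference, here is a sketch of that simplification, which also folds the inner j loop into a single vectorized sum (assuming the intended formula is X[k] = (1/k) * sum over j = 1..k of A[n-k]/A[n-j+1]):
# sketch: one loop over k, with the inner sum over j vectorized
A <- c(1, 2, 3, 4)
n <- length(A)
X <- numeric(n)
for (k in 1:n) {
  X[k] <- (1/k) * sum(A[n - k] / A[n - (1:k) + 1])
}
X
# note: for k = n, A[n - k] is A[0], which is empty in R, so X[n] comes out as 0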

You can do the following
set.seed(42)
A <- rnorm(10)
k <- sample(length(A), 4)
calc_x <- function(A, k){
  n <- length(A)
  c_sum <- cumsum(1/rev(A)[1:max(k)])
  A[n-k]/k * c_sum[k]
}
calc_x(A, k)
which returns:
[1] 0.07775603 2.35789999 -0.45393983 0.13323284
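The trick in calc_x() is that rev(A)[j] equals A[n - j + 1], so cumsum(1/rev(A)) accumulates the inner sums over j. A quick illustration on a small made-up vector (B here is purely an example, not part of the question):
# check that rev(B)[j] is B[n - j + 1], the term appearing in the inner sum
B <- c(2, 4, 8, 16)
n <- length(B)
j <- 1:n
all.equal(rev(B)[j], B[n - j + 1])   # TRUE
cumsum(1 / rev(B))                   # running sums of 1/B[n - j + 1] for j = 1..k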

Related

loop in loop to calculate a sum of a "continuous product" sequence function in R

I am trying to calculate the value of the following equation in R.
I have the dataset and the value for each corresponding F_x, F_{x+1}, ...
However, since both Q and s take many values, I am considering a loop within a loop, which is a bit confusing: the loop over Q seems to conflict with the loop over s.
But if I write the loop as below, it seems I would need to calculate Q by hand 100 times to get all of the answers. My loop also looks wrong. How can I fix this problem? Thank you so much.
Y <- function(x, s, Q){
  n <- length(s-x)-1
  Q <- c(1:100)
  for(s in seq(1:n)){
    Y[s] <- sum(s*Q[s]*cumprod(Fx[1:s]))
  }
  return(Y)
}
I am not sure if the code below reaches your objective
Y <- function(x, s) {
  Q <- 1:100
  S <- 1:(s - x)
  # wrap FUN in Vectorize(): outer() expects a function that is vectorized over its arguments
  outer(Q, S, FUN = Vectorize(function(q, s) q * sum((1:s) * cumprod(Fx[1:s]))))
}
for loop version
Y <- function(x, s) {
  nr <- 100
  nc <- s - x
  y <- matrix(NA_real_, nrow = nr, ncol = nc)  # preallocate an nr-by-nc result matrix
  for (Q in 1:nr) {
    for (S in 1:nc) {
      y[Q, S] <- Q * sum((1:S) * cumprod(Fx[1:S]))
    }
  }
  y
}
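Fx is not shown in the question, so here is a hypothetical usage sketch (the Fx values below are made up purely for illustration); either version returns a 100-by-(s - x) matrix with rows indexed by Q and columns by s:
# hypothetical F values, only for illustration; the real Fx comes from the question's data
Fx <- c(0.99, 0.97, 0.95, 0.92, 0.88)
res <- Y(x = 0, s = 5)
dim(res)      # 100 rows (one per Q), 5 columns (one per s)
res[1:3, ]    # first three rows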

Is it possible to use vector math in R for a summation involving intervals?

Title's a little rough, open to suggestions to improve.
I'm trying to calculate time-average covariances for a 500 length vector.
This is the equation we're using:
gamma(k) = (1/(N - k)) * sum over t = k+1..N of (x_t - xbar) * (x_{t-k} - xbar)
The result I'm hoping for is a vector with an entry for k from 0 to 500 (0 would just be the variance of the whole set).
I've started with something like this, but I know I'll need to reference the gap (i) in the first mean comparison as well:
x <- rnorm(500)
xMean <- mean(x)
i <- seq(1, 500)
dfGam <- data.frame(i)
dfGam$gamma <- (1/(500 - dfGam$i)) * (sum((x - xMean) * (x[-dfGam$i] - xMean)))
Is it possible to do this using vector math or will I need to use some sort of for loop?
Here's the for loop that I've come up with for the solution:
gamma_func <- function(input_vec) {
  output_vec <- c()
  input_mean <- mean(input_vec)
  iter <- seq(1, length(input_vec) - 1)
  for (val in iter) {
    iter2 <- seq((val + 1), length(input_vec))
    gamma_sum <- 0
    for (val2 in iter2) {
      gamma_sum <- gamma_sum + (input_vec[val2] - input_mean) * (input_vec[val2 - val] - input_mean)
    }
    output_vec[val] <- (1/length(iter2)) * gamma_sum
  }
  return(output_vec)
}
Thanks
Using data.table, mostly for the shift function to make x_{t - k}, you can do this:
library(data.table)
gammabar <- function(k, x){
  xbar <- mean(x)
  n <- length(x)
  df <- data.table(xt = x, xtk = shift(x, k))[!is.na(xtk)]
  df[, sum((xt - xbar) * (xtk - xbar)) / n]
}
gammabar(k = 10, x)
# [1] -0.1553118
The filter [!is.na(xtk)] starts the sum at t = k + 1, because xtk will be NA for the first k indices due to being shifted by k.
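To see why those first k entries get dropped, here is shift() on a small toy vector (just an illustration, not part of the answer's data):
library(data.table)
shift(1:6, 3)   # lag by 3: the first 3 entries become NA
# [1] NA NA NA  1  2  3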
Reproducible x
x <- c(0.376972124936433, 0.301548373935665, -1.0980231706536, -1.13040590360378,
-2.79653431987176, 0.720573498411587, 0.93912102300901, -0.229377746707471,
1.75913134696347, 0.117366786802848, -0.853122822287008, 0.909259181618213,
1.19637295955276, -0.371583903741348, -0.123260233287436, 1.80004311672545,
1.70399587729432, -3.03876460529759, -2.28897494991878, 0.0583034949929225,
2.17436525195634, 1.09818265352131, 0.318220322390854, -0.0731475581637693,
0.834268741278827, 0.198750636733429, 1.29784138432631, 0.936718306241348,
-0.147433193833294, 0.110431994640128, -0.812504663900505, -0.743702167768748,
1.09534507180741, 2.43537370755095, 0.38811846676708, 0.290627670295127,
-0.285598287083935, 0.0760147178373681, -0.560298603759627, 0.447188372143361,
0.908501134499943, -0.505059597708343, -0.301004012157305, -0.726035976548133,
-1.18007702699501, 0.253074712637114, -0.370711296884049, 0.0221795637601637,
0.660044122429767, 0.48879363533552)

Arguments of a function where another function will be called

Consider a hypothetical example:
sim <- function(n, p){
  x <- rbinom(n, 1, p)
  y <- (x == 0) * rnorm(n)
  z <- (x == 1) * rnorm(n, 5, 2)
  dat <- data.frame(x, y, z)
  return(dat)
}
Now I want to write another function simfun where I will call the above sim function and check whether the y and z columns of the data frame are less than a value k.
simfun <- function(n, p, k){
  dat <- sim(n, p)
  dat$threshold <- (dat$y <= k & dat$z <= k)
  return(dat$threshold)
}
But is it standard to use the argument of sim as the argument of simfun? Can I write simfun <- function(k) and call the sim function inside simfun?
I'd say it's fairly standard to do this sort of thing in R. A few pointers to consider:
Usually you should explicitly declare the argument names so as not to create any unwanted behaviour if changes are made. I.e., instead of sim(n, p), write sim(n = n, p = p).
To get simfun() down to just a k argument will require default values for n and p. There are lots of ways to do this. One way would be to hardcode them inside simfun itself. E.g.:
simfun <- function(k) {
  dat <- sim(n = 100, p = c(.4, .6))
  dat$threshold <- (dat$y <= k & dat$z <= k)
  return(dat$threshold)
}
simfun(.5)
A more flexible way would be to add default values in the function declaration. When you do this, it's good practice to put variables with default values AFTER variables without default values. So k would come first, as follows:
simfun <- function(k, n = 100, p = c(.4, .6)){
  dat <- sim(n, p)
  dat$threshold <- (dat$y <= k & dat$z <= k)
  return(dat$threshold)
}
simfun(.5)
The second option is generally preferable because you can still change n or p if you need to.
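For example, with the second version you can still override the defaults at the call site:
simfun(.5)                  # uses n = 100 and p = c(.4, .6)
simfun(.5, n = 500)         # same k, larger sample
simfun(.5, n = 500, p = .2) # override both defaults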
While not great, you could define n and p separately
n <- 1
p <- .5
simfun <- function(k){
  dat <- sim(n, p)
  dat$threshold <- (dat$y <= k & dat$z <= k)
  return(dat$threshold)
}
You can read more about R Environments here: http://adv-r.had.co.nz/Environments.html

R: How could I change this loop to apply?

I'm currently working on an R program in which one part computes, in a loop, two values that are interdependent. Since I have to do 100,000 iterations, it takes a very long time.
I would like to replace this for loop with an apply call or some more efficient approach, but I don't know how to do it. Could someone help me?
p <- c()
for(i in 1:n) {
  if(i == 1) {
    x <- b[i]
  }
  else {
    x <- c(x, max(h[i - 1], p[i]))
  }
  h <- c(h, x[i] + y[i])
}
Thank you very much!!
You don't seem to have a full working example here, but the main problem is that building up the x and h vectors with the c() function is very slow. It's better to preallocate them:
x <- numeric(n) # allocate vector of size n
h <- numeric(n)
and then fill them in as you go by assigning to x[i] and h[i]. For example, the following loop:
x <- c(); for (i in 1:100000) x <- c(x,1)
takes about 10 seconds to run on my laptop, but this version:
x <- numeric(100000); for (i in 1:100000) x[i] <- 1
does the same thing while running almost instantly.
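Applied to the loop in the question, a preallocated sketch might look like the following; it assumes b, p, and y are numeric vectors of length n that are defined elsewhere (the question does not show them):
# sketch with preallocation; b, p and y are assumed to exist and have length n
x <- numeric(n)
h <- numeric(n)
x[1] <- b[1]
h[1] <- x[1] + y[1]
for (i in 2:n) {
  x[i] <- max(h[i - 1], p[i])   # same interdependence as in the question
  h[i] <- x[i] + y[i]
}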

Simulations in R

I'm trying to write a function that obtains a test statistic for a vector of size n over 10 simulations. I wrote the following code, but I'm not getting the result I need. How can I fix this?
skw = function(n, nsims){
  t = numeric(nsims)
  for (i in 1:nsims) {
    x = rnorm(n)
    t[i] = skwness(x)
    zscore = t/(6/n)
    return(zscore)
  }
}
where:
skwness = function(x){
  n = length(x)
  skew.stat = (1/n)*(1/(sd(x)^3))*(sum((x - mean(x))^3))
  return(skew.stat)
}
Thanks!
You have a couple of issues. The major one is that return should be outside the for loop. Also, you should define t and zscore as vectors, and x as a list.
I think this will work.
Side note: t seems unnecessary in this function. You could get away with using zscore[i] <- skwness(x[[i]])/(6/n) and get rid of t altogether.
skwness <- function(x){
  n <- length(x)
  skew.stat <- (1/n)*(1/(sd(x)^3))*(sum((x - mean(x))^3))
  return(skew.stat)
}
skw <- function(n, nsims){
  t <- zscore <- numeric(nsims)
  x <- vector("list", nsims)
  for (i in 1:nsims) {
    x[[i]] <- rnorm(n)
    t[i] <- skwness(x[[i]])
    zscore[i] <- t[i]/(6/n)
  }
  return(zscore)
}
Giving it a go:
> x <- rnorm(100)
> skwness(x)
[1] 0.2332121
> skw(100, 10)
[1] 0.6643582 -1.6963196 -2.9192317 -2.7166170 4.9255001 0.0773482 3.9171435
[8] -3.3993994 -2.4258642 0.7075989
