I made a user defined function...
Given vectors x and y, f(x,y) returns a list of (x,y,z)...
Now I want to do iterations of
data1 <- f(x,y)
data2 <- f(data1$x, data1$y)
data3 <- f(data2$x, data2$y)
data4 <- f(data3$x, data3$y)
and so on...
Is there a way to make a loop for this?
I tried to use the paste function:
data1 <- f(x,y)
for (i in 2:10) {
assign(paste("data",i,sep=""), f(paste("data",i-1,"$x",sep=""), paste("data",i-1,"$y",sep="")))
}
but it throws an error because the input becomes the string "data1$x" rather than a numeric vector.
As Vincent just replied, you can make a list, a list of lists, and so on. This makes it easier to produce what you want.
I made an example for you:
x <- 1:10; y <- 11:20
f <- function(x, y) {return(list(x = x+1, y = y+1))}
data <- vector("list", 10)  # preallocate a list to hold the 10 results
data[[1]] <- f(x, y)
for(i in 2:10){
data[[i]] <- f(data[[i-1]]$x, data[[i-1]]$y)
}
You can then get x from iteration i with data[[i]]$x.
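If you'd rather not manage the index yourself, the same chain can be sketched with base R's Reduce() and accumulate = TRUE, which keeps every intermediate result. This reuses the toy f from the example; the name results is only for illustration.

```r
x <- 1:10; y <- 11:20
f <- function(x, y) list(x = x + 1, y = y + 1)  # toy f from the example above

# Start from f(x, y), then feed each result back into f nine more times.
# The second argument of the step function (the counter i) is ignored.
results <- Reduce(function(prev, i) f(prev$x, prev$y),
                  2:10, accumulate = TRUE, init = f(x, y))

length(results)    # 10 elements, one per iteration
results[[3]]$x     # x after the third application of f
```

As with the loop, results[[i]]$x gives x after iteration i.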
Title's a little rough, open to suggestions to improve.
I'm trying to calculate time-average covariances for a vector of length 500.
This is the equation we're using
The result I'm hoping for is a vector with an entry for k from 0 to 500 (0 would just be the variance of the whole set).
I've started with something like this, but I know I'll need to reference the gap (i) in the first mean comparison as well:
x <- rnorm(500)
xMean <-mean(x)
i <- seq(1, 500)
dfGam <- data.frame(i)
dfGam$gamma <- (1/(500-dfGam$i))*(sum((x-xMean)*(x[-dfGam$i]-xMean)))
Is it possible to do this using vector math or will I need to use some sort of for loop?
Here's the for loop that I've come up with for the solution:
gamma_func <- function(input_vec) {
output_vec <- c()
input_mean <- mean(input_vec)
iter <- seq(1, length(input_vec)-1)
for(val in iter){
iter2 <- seq((val+1), length(input_vec))
gamma_sum <- 0
for(val2 in iter2){
gamma_sum <- gamma_sum + (input_vec[val2]-input_mean)*(input_vec[val2-val]-input_mean)
}
output_vec[val] <- (1/length(iter2))*gamma_sum
}
return(output_vec)
}
Thanks
Using data.table, mostly for the shift function to make x_{t - k}, you can do this:
library(data.table)
gammabar <- function(k, x){
xbar <- mean(x)
n <- length(x)
df <- data.table(xt = x, xtk = shift(x, k))[!is.na(xtk)]
df[, sum((xt - xbar)*(xtk - xbar))/n]
}
gammabar(k = 10, x)
# [1] -0.1553118
The filter [!is.na(xtk)] starts the sum at t = k + 1, because xtk will be NA for the first k indices due to being shifted by k.
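If you want to avoid the data.table dependency, the same lagged sum can be sketched in base R by pairing x[(k+1):n] with x[1:(n-k)]. The function name gamma_lag and the by_overlap switch are made up for this sketch; the switch exists because gammabar divides by n, while the for loop in the question divides by the number of overlapping pairs, n - k.

```r
# Lag-k time-average covariance without data.table.
# by_overlap = FALSE divides by n (as in gammabar);
# by_overlap = TRUE divides by n - k (as in the question's for loop).
gamma_lag <- function(k, x, by_overlap = FALSE) {
  n <- length(x)
  stopifnot(k >= 0, k < n)
  xbar <- mean(x)
  xt  <- x[(k + 1):n]   # x_t for t = k+1, ..., n
  xtk <- x[1:(n - k)]   # x_{t-k} for the same t
  s <- sum((xt - xbar) * (xtk - xbar))
  if (by_overlap) s / (n - k) else s / n
}

set.seed(1)             # arbitrary seed, only for reproducibility
x <- rnorm(500)
gam <- sapply(0:499, gamma_lag, x = x)   # one value per lag k = 0, ..., 499
```

This still loops over k via sapply, but each lag is a single vectorised sum. For the divide-by-n version, stats::acf(x, type = "covariance", plot = FALSE) should report the same values for the lags it covers.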
Reproducible x
x <- c(0.376972124936433, 0.301548373935665, -1.0980231706536, -1.13040590360378,
-2.79653431987176, 0.720573498411587, 0.93912102300901, -0.229377746707471,
1.75913134696347, 0.117366786802848, -0.853122822287008, 0.909259181618213,
1.19637295955276, -0.371583903741348, -0.123260233287436, 1.80004311672545,
1.70399587729432, -3.03876460529759, -2.28897494991878, 0.0583034949929225,
2.17436525195634, 1.09818265352131, 0.318220322390854, -0.0731475581637693,
0.834268741278827, 0.198750636733429, 1.29784138432631, 0.936718306241348,
-0.147433193833294, 0.110431994640128, -0.812504663900505, -0.743702167768748,
1.09534507180741, 2.43537370755095, 0.38811846676708, 0.290627670295127,
-0.285598287083935, 0.0760147178373681, -0.560298603759627, 0.447188372143361,
0.908501134499943, -0.505059597708343, -0.301004012157305, -0.726035976548133,
-1.18007702699501, 0.253074712637114, -0.370711296884049, 0.0221795637601637,
0.660044122429767, 0.48879363533552)
This is the code I tried to use to solve the problem, but my results are not accurate. Any suggestions would be appreciated. Thanks.
fun1 <- function(x,y){
##input of the function
x<- matrix(data = x, nrow=20,ncol =0 )
##get mean of rows and columns
mean_row<- rowMeans(x)
mean_col<- colMeans(x)
##create the return value y as a list
y<- list(x, mean_row, mean_col)
return(y)
}
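One thing that stands out, without knowing the original problem statement: ncol = 0 asks for a matrix with no columns, so there is nothing to average. A minimal sketch, assuming the goal is a 20-row matrix with as many columns as the data fills (the unused y argument is dropped and the list elements are named purely for illustration):

```r
fun1 <- function(x) {
  # 20 rows; the number of columns is inferred from length(x)
  m <- matrix(data = x, nrow = 20)
  # return the matrix together with its row and column means
  list(matrix = m, mean_row = rowMeans(m), mean_col = colMeans(m))
}

res <- fun1(1:40)   # 20 x 2 matrix, filled column by column
res$mean_col        # 10.5 30.5
```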
Consider a hypothetical example:
sim <- function(n,p){
x <- rbinom(n,1,p)
y <- (x==0) * rnorm(n)
z <- (x==1) * rnorm(n,5,2)
dat <- data.frame(x, y, z)
return(dat)
}
Now I want to write another function, simfun, in which I call the above sim function and check whether the y and z columns of the data frame are both less than or equal to a value k.
simfun <- function(n, p, k){
dat <- sim(n, p)
dat$threshold <- (dat$y<=k & dat$z<=k)
return(dat$threshold)
}
But is it standard to use the arguments of sim as arguments of simfun? Can I write simfun <- function(k) and call the sim function inside simfun?
I'd say it's fairly standard to do this sort of thing in R. A few pointers to consider:
Usually you should explicitly declare the argument names so as not to create any unwanted behaviour if changes are made. I.e., instead of sim(n, p), write sim(n = n, p = p).
To get simfun() down to just a k argument will require default values for n and p. There are lots of ways to do this. One way would be to hardcode inside simfun itself. E.g.:
simfun <- function(k) {
dat <- sim(n = 100, p = c(.4, .6))
dat$threshold <- (dat$y<=k & dat$z<=k)
return(dat$threshold)
}
simfun(.5)
A more flexible way would be to add default values in the function declaration. When you do this, it's good practice to put arguments with default values AFTER arguments without default values. So k would come first, as follows:
simfun <- function(k, n = 100, p = c(.4, .6)){
dat <- sim(n = n, p = p)
dat$threshold <- (dat$y<=k & dat$z<=k)
return(dat$threshold)
}
simfun(.5)
The second option is generally preferable because you can still change n or p if you need to.
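To make that concrete, here is a short usage sketch with sim from the question and the default-argument version of simfun; the values n = 10 and p = .3 are arbitrary.

```r
sim <- function(n, p) {        # sim() as defined in the question
  x <- rbinom(n, 1, p)
  y <- (x == 0) * rnorm(n)
  z <- (x == 1) * rnorm(n, 5, 2)
  data.frame(x, y, z)
}

simfun <- function(k, n = 100, p = c(.4, .6)) {
  dat <- sim(n = n, p = p)
  dat$y <= k & dat$z <= k      # logical vector, one entry per simulated row
}

length(simfun(.5))                    # 100, using the defaults
length(simfun(.5, n = 10, p = .3))    # 10, overriding both defaults
```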
While not great, you could also define n and p separately, outside the function, and let simfun find them in the enclosing environment:
n <- 1
p <- .5
simfun <- function(k){
dat <- sim(n, p)
dat$threshold <- (dat$y<=k & dat$z<=k)
return(dat$threshold)
}
You can read more about R Environments here: http://adv-r.had.co.nz/Environments.html
I am trying to create a data frame that I do not know the size of. Is there a way to create a data frame that adapts to your variables?
Am I able to do something like this?
df <- function(n){
x <- numeric(0)
y <- numeric(0)
z <- numeric(0)
i <- 0
repeat{
i <- i + 1
x[i] <- value1(...)
y[i] <- value2(...)
z[i] <- value3(...)
if(i >= n){
break
}
}
df <- data.frame(val1 = x, val2 = y, val3 = z)
}
For the sake of this example, let's assume that value1(), value2(), and value3() each just return some numeric value.
I think you can do it this way:
Initialize your data frame as empty using empty vectors:
df <- data.frame(val1=value1(),
val2=value2(),
val3=value3())
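When the final size really isn't known up front, one common pattern is to collect one-row data frames in a list and bind them once at the end. The sketch below assumes value1(), value2() and value3() each return a single number; the stand-in definitions and the name build_df are hypothetical.

```r
# Hypothetical stand-ins: each returns one numeric value
value1 <- function() runif(1)
value2 <- function() rnorm(1)
value3 <- function() rexp(1)

build_df <- function(n) {
  rows <- list()                        # grows by one element per iteration
  repeat {
    rows[[length(rows) + 1]] <- data.frame(val1 = value1(),
                                           val2 = value2(),
                                           val3 = value3())
    if (length(rows) >= n) break        # any stopping rule could go here
  }
  do.call(rbind, rows)                  # stack the one-row data frames
}

out <- build_df(5)
dim(out)   # 5 3
```

Binding once at the end avoids copying the whole data frame on every iteration, which is what repeated rbind calls inside the loop would do.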
I want to make a boolean column which states whether or not each sample is a maximum.
I made this function and used it with tapply:
is.max <- function(x){
x <- data.frame(x)
x$x <- round(x$x,5)
x_max <- round(max(x),5)
for(i in 1:nrow(x)) {
if(x$x[i] == x_max) x$is.max[i] <- T
else x$is.max[i] <- F
}
return(x$is.max)
}
y <- c(rnorm(10), runif(10), rnorm(10,1))
f <- gl(3,10)
m <- tapply(y,f,is.max)
But is there a better, more efficient way to do that?
{P.S. Actually with my real data I used sapply, e.g. is.maxes<-sapply(s, function(x) is.max(x[,"Sum"]),simplify=F)}
Yeah, you can do this in one line with tapply:
tapply(y,f,function(x) round(x,5)==round(max(x),5))
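tapply returns one logical vector per group. If what you need is a single column lined up with y, a sketch along these lines with ave() might fit better; the helper name is_max_by_group and its digits argument are made up for this example.

```r
is_max_by_group <- function(y, f, digits = 5) {
  # ave() applies FUN within each level of f and returns a vector aligned with y;
  # the logical result is stored as 0/1, so compare with 1 to get TRUE/FALSE
  ave(y, f, FUN = function(v) round(v, digits) == round(max(v), digits)) == 1
}

y <- c(1, 3, 2, 5, 4, 4)
f <- gl(2, 3)
is_max_by_group(y, f)   # FALSE TRUE FALSE TRUE FALSE FALSE
```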