How could I avoid the for loop when calculating an xts weighted sum, as I am trying to do below?
library(xts)
thetaSum <- function(theta, w=c(1, 1, 1)) {
  # weighted sum of the window's values; weights are applied in reverse order
  sum(coredata(theta) * rev(w))
}
n <- 10
tmpVec <- rep(1, n)
tmpDates <- seq(as.Date("2000-01-01"), length = n, by = "day")
theta <- xts(tmpVec, order.by=tmpDates)
N <- 3
thetaSummed <- xts(rep(NA, n), order.by=tmpDates)
for (i in N:n) {
  thetaTemp <- theta[(i-N+1):i, ]
  thetaSummed[i] <- thetaSum(thetaTemp, w=rep(1, N))
}
thetaSummed
N is a look-back period smaller than n.
What are some fast alternatives to the for loop?
You can use rollapply; rollapplyr is its right-aligned variant.
rollapplyr(theta, width=3, FUN=thetaSum, fill=NA)
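To reproduce the loop exactly, including the weight argument and the leading NAs, note that rollapply forwards extra arguments to FUN; a sketch under that assumption:
# w is passed through ... to thetaSum; fill=NA keeps the first N-1 entries NA
rollapplyr(theta, width = N, FUN = thetaSum, w = rep(1, N), fill = NA)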
I wrote this function, which outputs a frequency table divided into k classes, the mean of the data, and the mean of the frequency table.
Data_Frame <- function(x, k) {
  if (k <= 1) {
    print("insert a number greater than 1")
  } else {
    data_range <- range(x)
    interval_width <- (max(data_range) - min(data_range)) / k
    cutting_values <- seq(from = min(data_range),
                          to = max(data_range),
                          by = interval_width)
    lower_bounds <- cutting_values[1:k]
    upper_bounds <- cutting_values[2:(k + 1)]
    counts <- numeric(length = k)
    # use a separate loop index so the parameter k is not shadowed
    for (j in seq_along(counts)) {
      counts[j] <- length(
        x[which((x >= cutting_values[j]) & (x <= cutting_values[j + 1]))])
    }
    DF <- data.frame(low.bounds = lower_bounds,
                     up.bounds = upper_bounds,
                     freq = counts)
    Data_m <- mean(x)
    DF_m <- sum((DF$low.bounds + DF$up.bounds) / 2 * DF$freq) /
      sum(DF$freq)
    result <- list(DF, Data_m = Data_m, DF_m = DF_m)
    return(result)
  }
}
### Code for using functions
set.seed(4321)
x <- rnorm(1000, 10, 2)
k <- 5L
result_function_1 <- Data_Frame(x, k)
print(result_function_1)
I have to write a second function that searches for the number of classes k minimizing the absolute error between the mean of the data (Data_m) and the mean of the frequency table (DF_m). Starting from k = 2 up to k = max_K, I have to return the optimal k.
Could someone help me out?
thanks
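A minimal sketch of such a search, assuming the Data_Frame() function above and measuring the error as abs(Data_m - DF_m) (the name best_k is hypothetical):
best_k <- function(x, max_K) {
  ks <- 2:max_K
  # absolute error between data mean and frequency-table mean for each k
  errors <- sapply(ks, function(k) {
    res <- Data_Frame(x, k)
    abs(res$Data_m - res$DF_m)
  })
  ks[which.min(errors)]  # the k with the smallest error
}
best_k(x, max_K = 20)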
I've tried to optimize a function I wrote a few weeks ago.
It got better, but it is still slow. So I used Rprof() and found that split() takes most of the time, which makes me think this function can be a lot better.
Can it be done?
normDist_V2 <- function(size=1e5, precision=1, ...) {
  data <- rnorm(size)
  roundedData <- round(data, precision)
  framedData <- data.frame(cbind(data, roundedData))
  # split() is the bottleneck Rprof() points at
  factoredData <- split(framedData$data, framedData$roundedData)
  actualsize <- size/10^precision
  X <- names(factoredData)
  Probability <- sapply(factoredData, length) / actualsize
  plot(X, Probability, ...)
}
Current speed:
system.time(normDist_V2(size=1e7, precision = 2)) #11.14 sec
Yes. Counting group sizes with data.table avoids split() entirely:
normDist_V2 <- function(size = 1e5, precision = 1, ...) {
  require(data.table)
  data <- rnorm(size)
  roundedData <- round(data, precision)
  framedData <- data.table(data, roundedData)
  actualsize <- size/10^precision
  # grouped row count per rounded value replaces split() + sapply(length)
  dt <- framedData[, .N, keyby = roundedData]
  X <- dt$roundedData
  Probability <- dt$N/actualsize
  plot(X, Probability, ...)
}
system.time(normDist_V2(size=1e7, precision = 2)) # 1.26 sec
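A base-R variant of the same idea (a sketch, not the answer above) replaces split() with table(), which is typically much faster than split() though likely still slower than data.table at this size:
normDist_V2_base <- function(size = 1e5, precision = 1, ...) {
  data <- rnorm(size)
  counts <- table(round(data, precision))  # one count per rounded value
  actualsize <- size/10^precision
  plot(as.numeric(names(counts)), as.vector(counts)/actualsize, ...)
}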
Title's a little rough, open to suggestions to improve.
I'm trying to calculate time-average covariances for a 500-element vector.
This is the equation we're using:
$$\hat{\gamma}(k) = \frac{1}{n-k}\sum_{t=k+1}^{n}(x_t - \bar{x})(x_{t-k} - \bar{x})$$
The result I'm hoping for is a vector with an entry for k from 0 to 500 (0 would just be the variance of the whole set).
I've started with something like this, but I know I'll need to reference the gap (i) in the first mean comparison as well:
x <- rnorm(500)
xMean <-mean(x)
i <- seq(1, 500)
dfGam <- data.frame(i)
dfGam$gamma <- (1/(500-dfGam$i))*(sum((x-xMean)*(x[-dfGam$i]-xMean)))
Is it possible to do this using vector math or will I need to use some sort of for loop?
Here's the for loop that I've come up with for the solution:
gamma_func <- function(input_vec) {
  output_vec <- c()
  input_mean <- mean(input_vec)
  iter <- seq(1, length(input_vec)-1)
  for (val in iter) {
    # sum over all pairs of observations separated by lag 'val'
    iter2 <- seq((val+1), length(input_vec))
    gamma_sum <- 0
    for (val2 in iter2) {
      gamma_sum <- gamma_sum + (input_vec[val2]-input_mean)*(input_vec[val2-val]-input_mean)
    }
    output_vec[val] <- (1/length(iter2))*gamma_sum
  }
  return(output_vec)
}
Thanks
Using data.table, mostly for the shift function to make x_{t - k}, you can do this:
library(data.table)
gammabar <- function(k, x){
  xbar <- mean(x)
  n <- length(x)
  df <- data.table(xt = x, xtk = shift(x, k))[!is.na(xtk)]
  df[, sum((xt - xbar)*(xtk - xbar))/n]
}
gammabar(k = 10, x)
# [1] -0.1553118
The filter [!is.na(xtk)] starts the sum at t = k + 1, because xtk will be NA for the first k indices due to being shifted by k.
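To get all lags at once, you could sapply gammabar over k; base R's acf() with type = "covariance" uses the same 1/n divisor, so it should agree (a sketch under that assumption):
sapply(1:10, gammabar, x = x)  # lags 1..10
drop(acf(x, lag.max = 10, type = "covariance", plot = FALSE)$acf)[-1]  # same values, lag 0 dropped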
Reproducible x
x <- c(0.376972124936433, 0.301548373935665, -1.0980231706536, -1.13040590360378,
-2.79653431987176, 0.720573498411587, 0.93912102300901, -0.229377746707471,
1.75913134696347, 0.117366786802848, -0.853122822287008, 0.909259181618213,
1.19637295955276, -0.371583903741348, -0.123260233287436, 1.80004311672545,
1.70399587729432, -3.03876460529759, -2.28897494991878, 0.0583034949929225,
2.17436525195634, 1.09818265352131, 0.318220322390854, -0.0731475581637693,
0.834268741278827, 0.198750636733429, 1.29784138432631, 0.936718306241348,
-0.147433193833294, 0.110431994640128, -0.812504663900505, -0.743702167768748,
1.09534507180741, 2.43537370755095, 0.38811846676708, 0.290627670295127,
-0.285598287083935, 0.0760147178373681, -0.560298603759627, 0.447188372143361,
0.908501134499943, -0.505059597708343, -0.301004012157305, -0.726035976548133,
-1.18007702699501, 0.253074712637114, -0.370711296884049, 0.0221795637601637,
0.660044122429767, 0.48879363533552)
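A quick consistency check, if desired: the loop in the question divides by n - k while gammabar divides by n, so the two should agree up to that factor:
n <- length(x)
all.equal(gammabar(k = 10, x), gamma_func(x)[10] * (n - 10) / n)  # TRUE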
I'm trying to create a simulation to calculate the confidence interval for a binomial proportion. So far I have a function that calculates the lower and upper bounds, and I have generated and stored the kind of data I want (in a matrix; I'm not sure about that).
How can I create a loop that generates samples of different sizes? I'd like to test how the formula performs when calculating the intervals with sample sizes n = 10, 11, 12, ... up to 100.
My code so far:
## functions that calculate lower and upper bounds
ll <- function(x, cl=0.95) {
  n <- length(x)
  p.est <- mean(x)
  z <- abs(qnorm((1-cl)/2))
  return(p.est - z*sqrt(p.est*(1-p.est)/n))
}
ul <- function(x, cl=0.95) {
  n <- length(x)
  p.est <- mean(x)
  z <- abs(qnorm((1-cl)/2))
  return(p.est + z*sqrt(p.est*(1-p.est)/n))
}
## my simulation for n=10 and 200 repetitions.
p <- 0.4
n <- 10
rep <- 200
dat <- rbinom(rep*n,1,p)
x <- matrix(dat, ncol=rep)
ll.res <- apply(x, 2, ll)
ul.res <- apply(x, 2, ul)
hits <- ll.res <= p & p <= ul.res
sum(hits==1)/rep
I'm not sure which values you want to compare between different sample sizes, but wrapping your simulation in a for loop and using lists to store the results should work:
p <- 0.4
rep <- 200
ns <- 10:100
hits <- list()
ll.res <- list()
ul.res <- list()
value <- numeric(length(ns))
for (i in seq_along(ns)) {
  n <- ns[i]
  dat <- rbinom(rep*n, 1, p)
  x <- matrix(dat, ncol=rep)
  ll.res[[i]] <- apply(x, 2, ll)
  ul.res[[i]] <- apply(x, 2, ul)
  hits[[i]] <- ll.res[[i]] <= p & p <= ul.res[[i]]
  value[i] <- sum(hits[[i]])/rep
}
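To see how coverage behaves across sample sizes, one could then plot value against ns (a usage sketch, not part of the original answer):
plot(ns, value, type = "l", xlab = "sample size n", ylab = "empirical coverage")
abline(h = 0.95, lty = 2)  # nominal 95% level for reference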
I have to solve the following exercise.
(1) Create 100 Poisson-distributed r.v.'s with lambda = 4.
(2) Calculate the mean of the sample generated in (1).
(3) Repeat (1) and (2) 10,000 times.
(4) Create a vector containing the 10,000 means.
(5) Plot the vector in a histogram.
Is the following solution right?
as.numeric(x)
for(i in 1:10000){
  p <- rpois(100, lambda = 4)
  m <- mean(p)
  append(x, m)
}
hist(x, breaks = 20)
Your attempt is a little off. You can do what you ask in more legible ways. For example:
L <- 10000
emptyvector <- rep(NA, L)
for(i in 1:L){
  emptyvector[i] <- mean(rpois(100, lambda = 4))
}
hist(emptyvector)
I would have taken advantage of the replicate() function, which creates a matrix of results, and then run colMeans to quickly get my vector.
meanvector <- colMeans(replicate(10000, rpois(100, lambda = 4)))
hist(meanvector, main = "Mean values from 10,000 runs of \nPoisson n = 100")
hist(replicate(10000, mean(rpois(100, lambda = 4))))
You need to assign the result back to x: append() returns a new vector rather than modifying x in place.
x1 <- x <- NULL
for(i in 1:10000){
  p <- rpois(100, lambda = 4)
  m <- mean(p)
  x[length(x) + 1] <- m
  x1 <- append(x1, m)
  ## either x or x1 will do for the histogram
}
hist(x1, breaks = 20)