Coding a simple loop for a sliding window [closed]

I have the following problem. I have a time series of 2659 observations. I need to perform a statistical test (a runs test) over a sliding window of length 256, extract the p-value from each test, and gather the p-values into a time series vector. As the threshold for the runs test I want a moving average that moves along with the rolling window. Here is my attempt (in R):
x<- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
library(randtests)
for(i in 1:2404){
  runs <- runs.test(x[i:i+255], threshold = mean(x[i:i+255]))
  ret[i] <- runs$p.value
}
The index starts at 1 but stops at 2404 because each window contains 256 observations: the first window covers observations 1 to 256, the second 2 to 257, and so on, until the last covers 2404 to 2404 + 255 = 2659. I hope I have made the problem clear; I do not understand why this does not work. I also need to plot the p-values over time, so that they all appear in a single plot. I hope you can help me.
PS: Please set a seed if you propose an example, so that I can reproduce your results.

Use rollapplyr with the indicated function.
library(zoo)
pv <- function(xx) runs.test(xx, threshold = mean(xx))$p.value
out <- rollapplyr(x, 256, pv, fill = NA)
Note
library(randtests)
set.seed(123)
x <- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
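To plot the p-values over time, as the question asks, a minimal sketch using the out series computed above (the labels are just suggestions):
plot(out, type = "l", xlab = "time", ylab = "runs test p-value")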

Two changes to your existing code should make it work:
set.seed(0)
x <- ts(rnorm(2659, mean = 0.0001, sd = 0.0001))
library(randtests)
ret <- rep(NA, length(x))
for(i in 1:2404){
  runs <- runs.test(x[i:(i+255)], threshold = mean(x[i:(i+255)]))
  ret[i] <- runs$p.value
}
The first change is to initialize the ret variable before the loop: ret <- rep(NA, length(x)).
The second change is to add parentheses, i.e. x[i:(i+255)]. Because : binds more tightly than +, x[i:i+255] evaluates as x[(i:i)+255], which is the single value x[i+255].
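You can see the precedence issue directly at the console (a quick illustrative check):
i <- 1
i:i+255            # parsed as (i:i) + 255, a single index: 256
i:(i+255)          # the intended window: 1, 2, ..., 256
length(i:(i+255))  # 256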

Related

How does one calculate the correct amplitude with R's fft()-function? [closed]

In R, I am generating uncorrelated values in the time domain with rnorm(). Then I apply fft() to these values; however, the mean amplitude of the resulting spectrum is only about 0.88 instead of 1. Is there anything I am not aware of?
Here is a MWE:
# dt <- 0.01 # time step
nSteps <- 100000 # Number of time steps
# df <- 1/(nSteps*dt) # frequency resolution
# t <- 0:(nSteps-1)*dt #
y <- rnorm(nSteps, mean=0, sd=1) # generate uncorrelated data. Should result in a white noise spectrum with sd=1
y_sq_sum <- sum(y^2)
# We ignore cutting to the Nyquist frequency.
# f <- 0:(nSteps-1)*df
fft_y <- abs(fft(y))/sqrt(length(y))
fft_y_sq_sum <- sum(fft_y^2)
print(paste("Check for Parseval's theorem: y_sq_sum = ", y_sq_sum, "; fft_y_sq_sum = ", fft_y_sq_sum, sep=""))
print(paste("Mean amplitude of my fft spectrum: ", mean(fft_y)))
print(paste("The above is typically around 0.88, why is it not 1?"))
This question doesn't really belong on Stack Overflow; it's more of a Cross Validated kind of thing. But here's an answer anyway:
Parseval's theorem says that the mean of fft_y^2 should be 1. The square root function is a concave function, so Jensen's inequality says the mean of sqrt(fft_y^2) will be less than 1. Since fft_y is positive in your definition, fft_y = sqrt(fft_y^2).
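To put a number on "less than 1": for white Gaussian input, the normalized FFT magnitudes are (ignoring the DC and Nyquist bins) approximately Rayleigh distributed, and a Rayleigh variable with mean-square 1 has mean sqrt(pi)/2 ≈ 0.886, consistent with the observed 0.88. A quick numerical check of this claim:
set.seed(1)
y <- rnorm(100000)
fft_y <- abs(fft(y)) / sqrt(length(y))
mean(fft_y)    # about 0.886
sqrt(pi) / 2   # 0.8862269, the Rayleigh mean when the mean-square is 1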

Why does my function only sometimes work?

I have the following function:
library(dplyr)  # needed for between()

samp315 <- function(n = 30, desmean = 86, distance = 3.4995) {
  x <- seq(from = 0, to = 100, by = 0.1)
  samp <- 0
  # Keep resampling until the sample mean lands in desmean +/- distance
  while (!between(mean(samp), desmean - distance, desmean + distance))
    samp <- sample(x, n, replace = TRUE)
  samp
}
percent <- samp315()
So pretty much I want to generate 30 numbers within 0-100 that have a mean of 86 +/- 3.4995. However, whenever I run the last line it either loads forever or, when I am lucky, generates a list of the desired results. Any idea how I could change the function to improve its functionality?
As suggested by Parfait in the comments, you're using a randomization strategy that has a low probability of satisfying the condition you're interested in. Did no other answer to this question help you out?
Some other possible strategies for you to try out.
n = 30
# Using truncated normal
library(truncnorm)
x = round(rtruncnorm(n, a = -0.0495, b = 100.0495, mean = 85, sd = 3.5*2), 1)
# Using beta
sig = 3
x = round(100*rbeta(n, (0.85)*sig, (1-0.85)*sig), 1)
The round(..., 1) is meant to align with your vector x. Both of these methods will produce very few values far from 85; that is the trade-off to consider. If you want the mean to be within 85 +/- 3.5, then you can't have too many values below 10, for example, so you have to lower the probability of such values being selected. With your own function, once it completes, you'll probably also find that values closer to 85 are over-represented.
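As a quick sanity check on the truncated-normal strategy (a sketch; the seed is arbitrary):
library(truncnorm)
set.seed(42)
x <- round(rtruncnorm(30, a = -0.0495, b = 100.0495, mean = 85, sd = 7), 1)
mean(x)  # typically lands well within 86 +/- 3.4995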

Why is this causing my program to crash? [closed]

I have created the following user-defined function, which generates random variables using the Acceptance-Rejection Method. However, whenever it is called, my program runs on and eventually crashes, or I have to force quit. I have gone through it several times. Any ideas as to what could be causing this?
I am aware that this may not be the best way to have written this (yesterday was the first time I used R), so any extra tips are a bonus!
acceptReject <- function(){
  Z <- 0
  Y <- c(0, 0)
  while(Y[2] < .5*(Y[1]-1)**2){
    U <- runif(2, 0, 1)
    Y <- log(U)
  }
  Z <- Y[1]
  U <- runif(1, 0, 1)
  if(U <= .5){
    Z <- abs(Z)
  }
  else{
    Z <- -abs(Z)
  }
  Z
}
You have an infinite loop.
If you assume that Y ~ log([0,1]) (mathematically), that means it always ranges between log(0) and log(1), i.e. between -Inf and 0. (Bottom line: it is always less than or equal to zero.)
Now let's look at your loop condition, .5*(Y[1]-1)**2. Given that the domain of Y is c(-Inf, 0), the range of this formula is
.5*(c(-Inf,0)-1)**2
# [1] Inf 0.5
(This is always greater-than-or-equal-to 0.5.)
Since Y is always <= 0 and the formula is always >= 0.5, your conditional will mathematically always be true. Infinite loop.
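A minimal fix, assuming the intent was the classic exponential-rejection sampler for the standard normal: draw Exp(1) variates by inverse transform, which requires -log(U) rather than log(U):
acceptReject <- function() {
  Y <- c(0, 0)
  while (Y[2] < 0.5 * (Y[1] - 1)^2) {
    U <- runif(2)
    Y <- -log(U)  # Exp(1) draws; log(U) alone is always <= 0
  }
  Z <- Y[1]                        # accepted value is half-normal
  if (runif(1) <= 0.5) Z else -Z   # attach a random sign
}
set.seed(1)
hist(replicate(10000, acceptReject()))  # should look standard normal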

'Non-conformable arguments' in R code [closed]

: ) I previously wrote an R function ("LeastSquaresDegreeN.R") that computes a least-squares polynomial of arbitrary order to fit whatever data I put into it. The code works: I can reproduce results I got previously. However, when I try to put new data into it I get a "non-conformable arguments" error:
"Error in Conj(t(Q))%*%t(b) : non-conformable arguments"
An extremely simple example of data that should work:
t <- seq(1,100,1)
fifthDegree <- t^5
LeastSquaresDegreeN(t,fifthDegree,5)
This should output and plot a polynomial f(t) = t^5 (up to rounding errors).
However, I get the "non-conformable arguments" error even if I explicitly make these vectors:
t <- as.vector(t)
fifthDegree <- as.vector(fifthDegree)
LeastSquaresDegreeN(t,fifthDegree,5)
I've tried putting in the transpose of these vectors too - but nothing works.
Surely the solution is really simple. Help!? Thank you!
Here's the function:
LeastSquaresDegreeN <- function(t, b, deg) {
  # Usage: t is the independent variable vector, b is the function data,
  # i.e., b = f(t); deg is the desired polynomial order.
  # deg <- deg + 1 is a little adjustment to make the R loops index correctly.
  deg <- deg + 1
  t <- t(t)
  dataSize <- length(b)
  # mat.or.vec is a built-in R function that creates a zero matrix
  # or zero vector of arbitrary size
  A <- mat.or.vec(dataSize, deg)
  # Given basis phi(z) = 1 + z + z^2 + z^3 + ...
  # define matrix A
  for (i in 0:deg-1) {
    A[1:dataSize, i+1] = t^i
  }
  # Compute QR decomposition of A; pull Q and R out of QRdecomp
  QRdecomp <- qr(A)
  Q <- qr.Q(QRdecomp, complete=TRUE)
  R <- qr.R(QRdecomp, complete=TRUE)
  # Perform Q^* b^T (conjugate transpose of Q)
  c <- Conj(t(Q)) %*% t(b)
  # Find x. R isn't square, so we have to use qr.solve
  x <- qr.solve(R, c)
  # Create xPlot (general enough to plot any degree polynomial output)
  xPlot = x[1,1]
  for (i in 1:deg-1) {
    xPlot = xPlot + x[i+1,1]*t^i
  }
  # Now plot it: least squares "l" plot first, then the points in red.
  plot(t, xPlot, type='l', xlab="independent variable t",
       ylab="function values f(t)",
       main="Data Plotted with Nth Degree Least Squares Polynomial",
       col="blue")
  points(t, b, col="red")
} # End
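For future readers, a plausible culprit (an educated guess, since the failing data aren't shown): when b arrives as a plain vector, t(b) is a 1 x n row matrix, so Conj(t(Q)) %*% t(b) tries to multiply an n x n matrix by a 1 x n one and fails; it only works if b happens to already be a column matrix. A minimal reproduction of the error:
Q <- diag(3)  # stand-in for the n x n Q from qr.Q(..., complete = TRUE)
b <- 1:3      # a plain vector, as seq() and ^ produce
try(Conj(t(Q)) %*% t(b))  # non-conformable: (3 x 3) %*% (1 x 3)
Conj(t(Q)) %*% b          # works: a bare vector is treated as a column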

How could I speed up my R code? [closed]

Disclaimer
Hello everyone! I recently started programming in R. My code works just fine, but in terms of speed some of it takes way too long to be of practical use. I hope someone can help me make this code run faster, either by optimising the code or by using one of the multicore packages.
About the code
I have large datasets, each containing about 15000 numeric values. The code takes two parameters (p, n), where p >= n, and makes subsets of the data. It applies the zyp.yuepilon function (from the zyp package) to each row of the subsets. Then the parameter n is used to apply the same function to an n-sized subset.
The problem is that I run this code in a nested for loop (p in 10:40 and n in 10:40), so it takes an eternity to get the results, and this is just one dataset among many others.
sp <- function(p, n){
  library(zyp)
  data <- runif(15000, 1, 4)
  lower <- seq(80 - p + 1, by=1, length.out=length(data)-81)
  upper <- lower + p - 1
  subsets <- matrix(nrow=length(lower), ncol=p)
  for(j in 1:length(lower)){
    subsets[j, ] = data[lower[j] : upper[j]]
  }
  ret <- apply(subsets, 1, zyp.yuepilon)
  subset_n <- subsets[, 1:n]
  ret2 <- apply(subset_n, 1, zyp.yuepilon)
  return(list(ret, ret2))
}
Benchmark results in seconds:
expr min lq median uq max neval
sp(7, 6) 92.77266 94.24901 94.53346 95.10363 95.64914 10
Here is a series of comments, rather than an answer.
Looking at the zyp.yuepilon function body (print it by typing the function name without parentheses in an R session), you can see that this function, and the function zyp.sen, are written in plain R code (as opposed to compiled code).
The biggest speed-up is likely attained by using the Rcpp package, which facilitates calling (compiled) C++ code within R. In fact, there is a small linear model example here: Fast LM model using Rcpp/RcppArmadillo.
I would be inclined to rewrite the two functions zyp.yuepilon and zyp.sen in C++ using Rcpp, including the loop over subset vectors (which you are currently doing with apply).
For general R speed-up issues see this question R loop performance, as well as the R package plyr, which may provide an entry point for taking a map-reduce type of approach to your problem.
If you want to steer clear of C++, then a series of micro-optimisations would be your quickest win. To speed up the apply aspect of your code, you could use something like this:
library(doParallel)
library(parallel)
library(foreach)
library(zyp)

cl <- makeCluster(4)
registerDoParallel(cl)

sp_1 <- function(p=7, n=6){
  N_ob <- 15000
  off_set <- 81
  N_ob_o <- N_ob - off_set
  am <- matrix(runif(N_ob*p), ncol=p)
  subsets <- am[-(1:off_set), ]
  ret <- matrix(unlist(foreach(i=1:N_ob_o) %dopar% zyp::zyp.yuepilon(subsets[i,]),
                       use.names=FALSE), ncol=11, byrow=TRUE)
  subset_n <- subsets[, 1:n]
  ret2 <- matrix(unlist(foreach(i=1:N_ob_o) %dopar% zyp::zyp.yuepilon(subset_n[i,]),
                        use.names=FALSE), nrow=11)
  return(list(ret, ret2))
}
sp <- function(p=7, n=6){
  data <- runif(15000, 1, 4)
  lower <- seq(80 - p + 1, by=1, length.out=length(data)-81)
  upper <- lower + p - 1
  subsets <- matrix(nrow=length(lower), ncol=p)
  for(j in 1:length(lower)){
    subsets[j, ] = data[lower[j] : upper[j]]
  }
  ret <- apply(subsets, 1, zyp.yuepilon)
  subset_n <- subsets[, 1:n]
  ret2 <- apply(subset_n, 1, zyp.yuepilon)
  return(list(ret, ret2))
}
system.time(sp_1())
system.time(sp())
This gives me a speed-up of around a factor of 2, but this will depend on your platform, etc. Check out the help files for the functions and packages above, and tune the number of workers passed to makeCluster to see what works best for your set-up (in the absence of any information about your particular hardware).
Another route might be to make use of the byte-code compiler via library(compiler) to see whether the various functions can be optimised this way:
library(compiler)
enableJIT(3)
zyp_comp <- cmpfun(zyp.yuepilon)
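The compiled function is then a drop-in replacement for the original; a sketch would be to substitute it in the apply calls inside sp:
ret <- apply(subsets, 1, zyp_comp)  # inside sp, instead of zyp.yuepilon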
