Detecting cycle maxima (peaks) in noisy time series (in R?) [closed]

This question is about an algorithm for determining the number and location of maxima in a sequence of numbers. There is a statistical flavor to the question, but it leans more towards programming, because I am not interested in specific statistical properties and the solution needs to be in R. Using statistics to answer the question is fine, but not a requirement.
I want to extract the maxima of cycles in time series data (i.e., an ordered sequence of numbers). An example of such data is the solar flare time series (a roughly 11-year cycle, with the period varying between 9 and 14 years). The cycles don't repeat at a perfect interval, and the peaks aren't always the same height.
I found a recent paper describing an algorithm for this, and the paper actually uses solar flares as an example (Figure 5, Scholkmann et al. 2012, Algorithms). I was hoping that this algorithm, or an equally effective algorithm, was available as an R package.
Link to Scholkmann paper on "automatic multiscale-based peak detection"
http://www.mdpi.com/1999-4893/5/4/588
I've tried the "turningpoints" function in the "pastecs" package but it seemed to be too sensitive (i.e., detected too many peaks). I thought of trying to smooth the time series first, but I'm not sure if this is the best approach (I'm no expert).
Thanks for any pointers.

If the peaks are almost periodic (with a slowly fluctuating period), as in the sunspot example,
you can use the Hilbert transform or the empirical mode decomposition to smooth the time series.
library(EMD)
x <- as.vector(sunspots)
r <- emd(x)
# Drop the first few high-frequency IMFs and keep the slower components
# (here IMFs 5 to 10); the number of IMFs depends on the data, so adjust as needed.
y <- rowSums(r$imf[, 5:10]) + mean(r$residue)
plot(x, type="l", col="grey")
lines(y, type="l", lwd=2)
# Interior points that are larger than both neighbours are local maxima.
n <- length(y)
i <- which(y[2:(n-1)] > y[1:(n-2)] & y[2:(n-1)] > y[3:n]) + 1
points(i, y[i], pch=15)
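The same local-maximum test works on any smoothed version of the series; here is a minimal sketch using a loess fit instead of EMD (the span is illustrative and would need tuning for your data):
sm <- loess(x ~ seq_along(x), span = 0.02)$fitted
# Indices where the slope of the smooth changes from positive to negative:
j <- which(diff(sign(diff(sm))) == -2) + 1
points(j, sm[j], pch = 17)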

Here is a solution involving the wmtsa package in R. I added my own little function to refine the maxima once wmtsa::wavCWTPeaks() has located them approximately.
PeakCycle <- function(Data=as.vector(sunspots), SearchFrac=0.02){
  # uses package "wmtsa"
  # SearchFrac controls how far to look on either side of
  # wavCWTPeaks()'s estimated maxima for a larger value (see dRange)
  Wave <- wavCWT(Data)
  WaveTree <- wavCWTTree(Wave)
  WavePeaks <- wavCWTPeaks(WaveTree, snr.min=5)
  WavePeaks_Times <- attr(WavePeaks, which="peaks")[,"iendtime"]
  NewPeakTimes <- c()
  dRange <- round(SearchFrac*length(Data))
  for(i in 1:length(WavePeaks_Times)){
    NewRange <- max(c(WavePeaks_Times[i]-dRange, 1)):min(c(WavePeaks_Times[i]+dRange, length(Data)))
    NewPeakTimes[i] <- which.max(Data[NewRange])+NewRange[1]-1
  }
  return(matrix(c(NewPeakTimes, Data[NewPeakTimes]), ncol=2,
                dimnames=list(NULL, c("PeakIndices", "Peaks"))))
}
dev.new(width=6, height=4)
par(mar=c(4,4,0.5,0.5))
plot(seq_along(as.vector(sunspots)), as.vector(sunspots), type="l")
Sunspot_Ext <- PeakCycle()
points(Sunspot_Ext, col="blue", pch=20)

Related

Coding a plot with functions [closed]

I've probably searched the entire internet trying to make this problem work, and haven't managed it; that's how I found Stack Overflow. As I said, I've been learning R for less than two weeks.
This is the problem:
Let f(x) = sqrt((x^3 + 3x^2 + 1)/(x^4 + 5x^3 + 7x + 9)), for x ≥ 0.
(a) Draw a line graph of (x, f(x)) for 0 ≤ x ≤ 10 with increments of 0.01.
(b) Find numerically the maximum value of f(x) and the maximizer x (report x to the second decimal place; for instance, x = 1.23).
So basically I've been writing x = x and y = the sqrt(...) expression, then calling plot(x, y, type="l"), and usually it just doesn't work.
Also, how do I do the increment part? I'm sorry for the lack of explanation, but I have no idea what most of this means.
First thing to do would be to define the function:
equation <- function(x){
sqrt((x^3+3*x^2+1)/(x^4+5*x^3+7*x+9))
}
Then, define the values you want to apply the function to, and store them in vector input
input<-seq(0,10,0.01)
Apply the equation function to input, and store the values in vector results
results <- sapply(input, equation)
Produce a line plot:
plot(input,results,type="l")
Print the value of x which maximises f(x)
maxx<-input[which.max(results)]
maxx
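As a small addition (not part of the original answer), the maximum value itself and x rounded to two decimals, as the exercise asks, can be printed with:
max(results)
round(maxx, 2)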
I would suggest a ggplot2 approach. First create a random x variable and then compute y; I will add the code for those variables and the plot.
To find the maximum of f(x) you can either use calculus or a visual approach. Here is the code:
library(ggplot2)
library(dplyr)
set.seed(123)
First we create a random variable x with the limits you mentioned:
#Data
x <- runif(100,0,10)
Now, we save the variable in a dataframe and compute y:
#Allocate data in a dataframe
df <- data.frame(x=x)
#Compute variable
df$y <- sqrt(((df$x^3)+3*((df$x)^2)+1)/((df$x^4)+5*(df$x)^3+7*(df$x)+9))
Finally, we plot:
#Plot
ggplot(df,aes(x=x,y=y))+
geom_point()+
scale_x_continuous(limits = c(0,10))
Output: a scatter plot of y against x (image not reproduced here).
Values for x are randomly generated here; if you have real values for x, you should use those instead.
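As a sketch of the same idea with the evenly spaced grid the exercise asks for (increments of 0.01) instead of random draws, geom_line() gives the requested line graph (df2 is just an illustrative name):
df2 <- data.frame(x = seq(0, 10, 0.01))
df2$y <- sqrt((df2$x^3 + 3*df2$x^2 + 1)/(df2$x^4 + 5*df2$x^3 + 7*df2$x + 9))
ggplot(df2, aes(x = x, y = y)) + geom_line()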

In which cases is it most efficient to write a part of R code in C++ or parallelise it (in R or Rcpp)? [closed]

I have some data with a special structure that requires me to write my own k-means function. Not far into this, I have already noticed the extremely high computation time when calculating the distance of a center to all data points. Because my data is going to be about 60 times larger in the future and I'll need to do runs with many different cluster sizes, I am very concerned with speed.
I have attached a minimal example of calculating the distance from one randomly sampled center to each data point. I am not yet experienced with C++ or parallel computation in R, and I am unsure which of these approaches best fits my problem (here and there some people claim you should always parallelize, some claim it is almost never necessary, some advise for and some against using Rcpp). As with most things in life, I'm sure there are cases where each of these answers is correct. However, what are the general circumstances for choosing one approach over another?
(I have profiled this code and couldn't find anything I could improve just within the R code for speed. If you have any suggestions however, please let me know as well!)
x <- matrix(runif(15000*34),nrow = 15000, ncol = 34)
w <- matrix(runif(15000*17),nrow = 15000, ncol = 17)
k <- 3
i <- 1
centers <- x[sample.int(nrow(x), size = k),]
weighted_matching <- function(point, center, weight){
  point  <- matrix(point,  ncol = 2, nrow = 17, byrow = TRUE)
  center <- matrix(center, ncol = 2, nrow = 17, byrow = TRUE)
  1/sum(weight) * sum(weight * apply(point, 1, function(x, y) sqrt(sum((x - y)^2)), y = center))
}
system.time(
apply(x, 1, weighted_matching, weight = w, center = centers[i,])
)
There are two cases where I use C++ as a replacement for R:
iterate over lots of elements (large for-loops)
want to reduce memory-footprint
In your case, you are already using vectorized code instead of loops, so the first point does not apply.
The second point, however, could help: you're computing (x-y)^2, which creates two new temporary vectors.
It would be beneficial to rewrite this in C++ to use less memory, and you might get a 2-3 fold improvement in computation time.
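As an illustration (a sketch only; the function name sqdist_cpp and its interface are made up for this example, not from the original post), here is an Rcpp version of the squared-distance computation that avoids allocating the temporary (x - y) vector:
library(Rcpp)
cppFunction('
double sqdist_cpp(NumericVector x, NumericVector y) {
  double total = 0.0;
  for (int j = 0; j < x.size(); ++j) {
    double d = x[j] - y[j];
    total += d * d;   // accumulate without creating temporary vectors
  }
  return total;
}')
sqdist_cpp(runif(34), runif(34))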
But when I hear about "computing distances", I would usually first try to express the computation with matrix operations (linear algebra).
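As a sketch of that idea, using the x, centers and i from the question (and ignoring the question's weighting scheme for simplicity), the Euclidean distance from one center to every row of x can be computed with vectorized matrix operations:
# Subtract the center from every row, square, and sum across columns:
d <- sqrt(rowSums(sweep(x, 2, centers[i, ])^2))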

In R, find non-linear curves from two sets of points and then find the intersection of those curves

Using R, I want to estimate two curves using points from two vectors, and then find the x and y coordinates where those estimated curves intersect.
In a strategic setting (game theory) with players "t" and "p", I am simulating each player's best response to what the other would pick. The problem is that I don't have functions or lines; I have two sets of points originating from simulation, each set corresponding to one player's best response to given actions by the other player. The actual math was too difficult for me (or Matlab) to solve, which is why I'm using this simulated, visual approach. I want to estimate the best-response functions (i.e., fit non-linear curves through the points), and then find where the two estimated curves intersect, in order to identify the Nash equilibrium (where the best-response curves cross).
As an example, here are two such vectors I am working with:
t=c(10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0)
p=c(12.3,12.3,12.3,12.3,12.3,12.3,12.4,12.4,12.4,12.5,12.5,12.5,12.6,12.6,12.7,12.7,12.8,12.8,12.9,12.9,13.0,13.1,13.1,13.2,13.3,13.4,13.5,13.4,13.5,13.6,13.6,13.7,13.8,13.8,13.9,13.9,13.9,14.0,14.0,14.0,14.0)
For the first line, the sample is made up of (t,a), and for the second line, the sample is made up of (a,p) where a is a third vector given by
a = seq(10, 14, by = 0.1)
For example, the first point for the sample corresponding to the first vector would be (10.0,10.0) and the second point would be (10.0,10.1). The first point for the sample corresponding to the second vector would be (10.0,12.3) and the second point would be (10.1,12.3).
What I originally tried to do is estimate the lines using polynomials produced by lm models, but those don't seem to always work:
plot(a,t, xlim=c(10,14), ylim=c(10,14), col="purple")
points(p,a, col="red")
fit4p <- lm(a~poly(p,3,raw=TRUE))
fit4t <- lm(t~poly(a,3,raw=TRUE))
lines(a, predict(fit4t, data.frame(x=a)), col="purple", xlim=c(10,14), ylim=c(10,14),type="l",xlab="p",ylab="t")
lines(p, predict(fit4p, data.frame(x=a)), col="green")
fit4pCurve <- function(x) coef(fit4p)[1] +x*coef(fit4p)[2]+x^2*coef(fit4p)[3]+x^3*coef(fit4p)[4]
fit4tCurve <- function(x) coef(fit4t)[1] +x*coef(fit4t)[2]+x^2*coef(fit4t)[3]+x^3*coef(fit4t)[4]
a_opt1 = optimise(f=function(x) abs(fit4pCurve(x)-fit4tCurve(x)), c(10,14))$minimum
b_opt1 = as.numeric(fit4pCurve(a_opt1))
EDIT:
After fixing the typo, I get the correct answer, but it doesn't always work if the samples don't come out as cleanly.
So my question can be broken down a few ways. First, is there a better way to accomplish what I'm trying to do? I know what I'm doing isn't perfectly accurate by any means, but it seems like a decent approximation for my purposes. Second, if there isn't a better way, is there a way to improve on the methodology I have listed above?
Restart your R session, make sure all variables are cleared, and copy/paste this code. I found a few mistakes in the referenced variables. Also note that R is case-sensitive; my suspicion is that you've been overwriting variables.
plot(a, t, xlim=c(10,14), ylim=c(10,14), col="purple")
points(p, a, col="red")
fit4p <- lm(a~poly(p,3,raw=TRUE))
fit4t <- lm(t~poly(a,3,raw=TRUE))
lines(a, predict(fit4t, data.frame(a=a)), col="purple")
lines(p, predict(fit4p, data.frame(p=p)), col="green")
fit4pCurve <- function(x) coef(fit4p)[1] + x*coef(fit4p)[2] + x^2*coef(fit4p)[3] + x^3*coef(fit4p)[4]
fit4tCurve <- function(x) coef(fit4t)[1] + x*coef(fit4t)[2] + x^2*coef(fit4t)[3] + x^3*coef(fit4t)[4]
a_opt = optimise(f=function(x) abs(fit4pCurve(x)-fit4tCurve(x)), c(10,14))$minimum
b_opt = as.numeric(fit4pCurve(a_opt))
As you will see:
> a_opt
[1] 12.24213
> b_opt
[1] 10.03581
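As an alternative for the intersection step (a sketch reusing fit4pCurve and fit4tCurve from above), uniroot() pins down the crossing directly, provided the difference between the two fitted curves changes sign on the interval:
crossing <- uniroot(function(x) fit4pCurve(x) - fit4tCurve(x), interval = c(10, 14))
a_opt2 <- crossing$root
b_opt2 <- as.numeric(fit4pCurve(a_opt2))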

Dataset with NaN values for a filled contour plot with R [closed]

I cannot get my data plotted properly with R. I have measurements from a football field, but I did not take a measurement in every grid cell.
Here is my dataset contour_map_R.csv at https://db.tt/1L7cxilB
It looks like this when plotted with the image() function.
Can anyone provide an example of how to create a filled contour plot?
Thanks a lot!
As stated in the comments, you need complete data before you can calculate contours. Therefore you have to interpolate or replace your missing values in some way that makes sense for your case. I've provided a couple of options below, but you'd need to come up with a rationale for using one method over another, and consider whether a more sophisticated geostatistical approach is warranted. Furthermore, you could interpolate to a finer grid than you currently have to produce a smoother result (at the expense of potentially making up data).
d <- read.csv("contour_map_R.csv")
library(raster)
r <- raster(as.matrix(d))
contour(r)
v <- getValues(r)
xy <- xyFromCell(r, 1:ncell(r))
## Interpolate using a thin-plate spline, fitting to the non-missing cells:
library(fields)
ok <- !is.na(v)
tps <- Tps(xy[ok, ], v[ok])
tp <- interpolate(r, tps)
plot(tp)
contour(tp, add=T)
## Alternatively, interpolate using inverse distance weighting (IDW) via gstat:
library(gstat)
dxy <- data.frame(x=xy[,1], y=xy[,2], v)
dxy <- dxy[complete.cases(dxy),]
id <- gstat(formula = v~1, locations = ~x+y, data=dxy)
ip <- interpolate(r, id)
plot(ip)
contour(ip, nlevels=5, add=T)
If that's what you were looking for, you can get filled contours by applying the filledContour() function to the interpolated rasters (tp or ip).
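For example, a one-line sketch continuing from the tp raster created above:
filledContour(tp)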

R Time series - having trouble making bollinger lines - need simple example please

I'm learning R. I know how to compute a moving average, but I need to do more; I'm not a statistician, and unfortunately all the docs seem to be written for statisticians.
I do this in Excel a lot; it's really handy for analysing operational activities.
Here are the fields on each row needed to make Bollinger bands (Value could be # of calls, a complaint ratio, anything):
TimeStamp | Value | Moving Average | Moving STDEVP | Lower Control | Upper Control
Briefly, the moving average and moving STDEVP are computed over the prior 8 or so values in the series. Lower control at a given point in time = moving average - 2*moving STDEVP, and upper control = moving average + 2*moving STDEVP.
This can easily be done in Excel for a single file, but if I can find a way to do it in R, that will suit my needs better; hopefully it will also be faster and more reliable when automated.
Links or tips would be appreciated.
You could use the function rollapply() from the zoo package, provided you work with a zoo series:
TimeSeries <- cumsum(rnorm(1000))
ZooSeries <- as.zoo(TimeSeries)
BollLines <- rollapply(ZooSeries, 9, function(x){
  M <- mean(x)
  SD <- sd(x)
  c(M, M+SD*2, M-SD*2)
})
Now you have to remember that rollapply uses a centered frame, meaning that it takes the values to the left and the right of the current day. This is also more convenient and more true to the definition of the Bollinger Band than your suggestion of taking x prior values.
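(A small sketch, not part of the original answer: if you do prefer a trailing window of prior values, as described in the question, rollapply() accepts an align argument.)
BollTrailing <- rollapply(ZooSeries, 9, function(x){
  M <- mean(x)
  SD <- sd(x)
  c(M, M + 2*SD, M - 2*SD)
}, align = "right")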
If you don't want to convert to zoo, you can use plain vectors as well and write your own function. I added an S3-based plotting function that lets you easily plot the calculations as well. With these functions, you could do something like:
TimeSeries <- cumsum(rnorm(1000))
X <- BollingerBands(TimeSeries,80)
plot(X,TimeSeries,type="l",main="An Example")
to get a plot of the series with the three Bollinger lines overlaid.
The function code:
BollingerBands <- function(x, width){
  Start <- width + 1
  Stop <- length(x)
  Trail <- rep(NA, ceiling(width/2))
  Tail <- rep(NA, floor(width/2))
  Lines <- sapply(Start:Stop, function(i){
    M <- mean(x[(i-width):i])
    SD <- sd(x[(i-width):i])
    c(M, M+2*SD, M-2*SD)
  })
  Lines <- apply(Lines, 1, function(i) c(Trail, i, Tail))
  Out <- data.frame(Lines)
  names(Out) <- c("Mean", "Upper", "Lower")
  class(Out) <- c("BollingerBands", class(Out))
  Out
}
plot.BollingerBands <- function(x, data, lcol=c("red","blue","blue"), ...){
  plot(data, ...)
  for(i in 1:3){
    lines(x[,i], col=lcol[i])
  }
}
There is an illustration in the R Graph Gallery (65) giving code both for calculating the bands and for plotting share prices.
The 2005 code still seems to work six years later and will plot IBM's share price, current and going back several months.
The most obvious bug is the width of the lower charts (bandwidth and volume), which have been narrowed; there may be another issue with the number of days covered.
