How to detect peaks in a particular range in R

I have a time series and would like to detect and identify peaks in R, but only those whose values fall within a particular range.
Here is an example:
## generate test data with 3 peaks
set.seed(123)
x <- seq(0, 360, length = 20)
y <- abs(rnorm(20, mean = 1, sd = 0.1))
y[5:10] <- c(2, 4, 7, 3, 4, 2)
y <- c(y, 0.8 * y, 1.2 * y)
x <- seq(0, 360, along = y)
y[6] <- y[7] # test case with 2 neighbouring equal points
plot(x, y, type="b")
In that example, let's say I want to select only the peaks whose y value lies between 6 and 9 (2 peaks) or only between 2 and 4 (also 2 peaks).
I am aware of several packages in R detecting peaks (e.g. Peaks, pastecs, quantmod, pracma, splus2R) but none seems to have this feature, usually only having a minimum threshold.
Any advice would be appreciated.
thank you
Martin
Edit: The code provided by Eric works perfectly, but with my own datasets I have a small problem. How would you detect only one peak when the same value occurs twice within a certain window (x)? Basically, I would like a condition saying that a certain number of points (x) must lie between two peaks for them to be considered distinct peaks.

Something like this gets close (I'm not sure whether you mind the peak with two equal values being detected twice).
# Reproduce your data
set.seed(123)
x <- seq(0, 360, length = 20)
y <- abs(rnorm(20, mean = 1, sd = 0.1))
y[5:10] <- c(2, 4, 7, 3, 4, 2)
y <- c(y, 0.8 * y, 1.2 * y)
x <- seq(0, 360, along = y)
y[6] <- y[7] # test case with 2 neighbouring equal points
plot(x, y, type="b")
# shift y up and down a position (for peak identification)
yu <- c(tail(y, -1), NA)
yd <- c(NA, head(y, -1))
# identify peaks that are in the correct range
# where y is at least as high as the points before and after
high <- which(y - yu >= 0 & y - yd >= 0 & y > 6 & y < 9)
low <- which(y - yu >= 0 & y - yd >= 0 & y >= 2 & y <= 4) # one peak is at 4
# plot lines at peaks
abline(v = x[high], col = 'blue')
abline(v = x[low], col = 'red')
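For the follow-up in the question's edit (counting tied or nearby maxima as a single peak), one option is to filter the candidate indices by a minimum spacing. The following is a minimal sketch rather than a function from any package; `peaks_in_range()` and its `min_gap` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper (not from any package): indices of local maxima of y
# whose height lies in [lower, upper], with at least `min_gap` index
# positions between any two reported peaks.
peaks_in_range <- function(y, lower, upper, min_gap = 1) {
  yu <- c(tail(y, -1), NA)   # next value
  yd <- c(NA, head(y, -1))   # previous value
  # candidates: at least as high as both neighbours and inside the range;
  # which() drops the NA comparisons at the two ends
  idx <- which(y >= yu & y >= yd & y >= lower & y <= upper)
  keep <- integer(0)
  # greedy pass: visit the tallest candidates first and keep a candidate
  # only if it is far enough from every peak already kept
  for (i in idx[order(y[idx], decreasing = TRUE)]) {
    if (all(abs(i - keep) >= min_gap)) keep <- c(keep, i)
  }
  sort(keep)
}

# toy series with a tied maximum at positions 3 and 4
y_toy <- c(1, 2, 7, 7, 3, 4, 2, 1)
peaks_in_range(y_toy, 6, 9)               # both tied points are reported
peaks_in_range(y_toy, 6, 9, min_gap = 2)  # the tie collapses to one peak
peaks_in_range(y_toy, 2, 4)               # the smaller peak at position 6
```

Note that `min_gap` is counted in index positions, not in units of x, so for unevenly spaced x values the spacing test would have to be rewritten in terms of `x[i]`.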


Find an approximate value for the probability P(log(Y) > sin(X)) using simulation

I have simulated from a given distribution in R, and now I have to find an approximate value for the probability P(log(Y) > sin(X)). How can I do that in R? Can anyone help me?
I am hiding my own simulation so that others with the same problem cannot copy it. But I have this simulation from another post that also works:
n <- 1e4
X <- data.frame(x = runif(n, -1, 1), y = runif(n, 0, 1), z = runif(n, 0, 3/2))
i <- with(X, 0 < y & x^2 + y^2 < 1 & z <= (3/2)*y)
X <- X[i, ]
How can I, for example, use this simulation to find the probability P(log(Y) > sin(X)) in R?
I do not know how to post the solution without your classmates seeing it as well ... ;-)
# part 1: prepare the probability density on the rectangle [-1, 1] x [-1, 1]
n <- 1e4
X <- data.frame(x = runif(n, -1, 1), y = runif(n, -1, 1), h=1)
X$h <- 3/2*X$y # set probability density h = 3/2*y
head(X)
# part 2: restrict to half disk and normalize probability h to equal 1
i <- with(X, 0 < y & x^2 + y^2 < 1)
X <- X[i, ]
X$h <- X$h / sum(X$h)
plot(X[, 1:2], asp=1, pch='.')
# measure probability for points with log(y) > sin(x)
ii <- with(X, log(y) > sin(x))
points(X[ii, 1:2], pch='.', col="red")
p <- sum(X[ii, "h"])
p
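The rejection-sampling simulation quoted in the question can also be used directly: once the accepted points are draws from the target distribution, the probability is approximately the fraction of them that satisfy the condition. This is a minimal sketch, assuming the quoted simulation matches the intended distribution; `keep`, `p_hat` and `se` are names introduced here.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 1e5
X <- data.frame(x = runif(n, -1, 1), y = runif(n, 0, 1), z = runif(n, 0, 3/2))
# accept points in the upper half disk with probability proportional to y
keep <- with(X, 0 < y & x^2 + y^2 < 1 & z <= (3/2) * y)
X <- X[keep, ]

# Monte Carlo estimate: share of accepted points with log(y) > sin(x)
p_hat <- mean(log(X$y) > sin(X$x))
# binomial standard error of the estimate
se <- sqrt(p_hat * (1 - p_hat) / nrow(X))
c(estimate = p_hat, std.error = se)
```

The weighted approach above and this unweighted one target the same probability, so their results should agree up to simulation noise.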

Understanding "levels" in R's contour function for a bivariate distribution

I have trouble understanding how to set the levels in the plot of a bivariate distribution in R. The documentation states that I can choose the levels by setting a
numeric vector of levels at which to draw contour lines
Now I would like the contour to show the limit containing 95% of the density or mass. But if, in the example below (adapted from here) I set the vector as a <- c(.95,.90) the code runs without error but the plot is not displayed. If instead, I set the vector as a <- c(.01,.05) the plot is displayed. But I am not sure I understand what the labels "0.01" and "0.05" mean with respect to the density.
library(mnormt)
x <- seq(-5, 5, 0.25)
y <- seq(-5, 5, 0.25)
mu1 <- c(0, 0)
sigma1 <- matrix(c(2, -1, -1, 2), nrow = 2)
f <- function(x, y) dmnorm(cbind(x, y), mu1, sigma1)
z <- outer(x, y, f)
a <- c(.01,.05)
contour(x, y, z, levels = a)
But I am not sure I understand what the labels "0.01" and "0.05" mean with respect to the density.
They mark the points where the density equals 0.01 and 0.05. From help("contour"):
numeric vector of levels at which to draw contour lines.
So the levels are the function values (in this case density values) at which the contour lines are drawn. A simple example that may help is x + y:
y <- x <- seq(0, 1, length.out = 50)
z <- outer(x, y, `+`)
par(mar = c(5, 5, 1, 1))
contour(x, y, z, levels = c(0.5, 1, 1.5))
Now I would like the contour to show the limit containing 95% of the density or mass.
In your example, you can follow my answer here and draw the exact points:
# input
mu1 <- c(0, 0)
sigma1 <- matrix(c(2, -1, -1, 2), nrow = 2)
# we start from points on the unit circle
n_points <- 100
xy <- cbind(sin(seq(0, 2 * pi, length.out = n_points)),
cos(seq(0, 2 * pi, length.out = n_points)))
# then we scale the dimensions
ev <- eigen(sigma1)
xy[, 1] <- xy[, 1] * 1
xy[, 2] <- xy[, 2] * sqrt(min(ev$values) / max(ev$values))
# then rotate
phi <- atan(ev$vectors[2, 1] / ev$vectors[1, 1])
R <- matrix(c(cos(phi), sin(phi), -sin(phi), cos(phi)), 2)
xy <- tcrossprod(R, xy)
# find the right length. You can change .95 to whichever
# quantile you want
chi_vals <- qchisq(.95, df = 2) * max(ev$values)
s <- sqrt(chi_vals)
par(mar = c(5, 5, 1, 1))
plot(s * xy[1, ] + mu1[1], s * xy[2, ] + mu1[2], lty = 1,
type = "l", xlab = "x", ylab = "y")
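For a bivariate normal, the density level that encloses a given share of the mass has a closed form, so the 95% contour can also be drawn with contour() itself. The squared Mahalanobis distance follows a chi-squared distribution with 2 degrees of freedom, so the density on the 95% boundary is the peak density times exp(-q/2) with q = qchisq(0.95, 2). The sketch below relies on that normality assumption and uses base R only; `peak`, `level95` and the hand-written density `f` are names introduced here, not part of mnormt.

```r
mu1 <- c(0, 0)
sigma1 <- matrix(c(2, -1, -1, 2), nrow = 2)

# density at the mean, i.e. the largest value z can take
peak <- 1 / (2 * pi * sqrt(det(sigma1)))
# density value on the boundary of the region holding 95% of the mass
level95 <- peak * exp(-qchisq(0.95, df = 2) / 2)

# bivariate normal density evaluated on the same grid as in the question
x <- seq(-5, 5, 0.25)
y <- seq(-5, 5, 0.25)
sinv <- solve(sigma1)
f <- function(x, y) {
  d <- cbind(x - mu1[1], y - mu1[2])
  peak * exp(-0.5 * rowSums((d %*% sinv) * d))  # quadratic form per row
}
z <- outer(x, y, f)

contour(x, y, z, levels = level95, labels = "95%")
```

Because qchisq(p, 2) equals -2 * log(1 - p), the level works out to (1 - p) times the peak density, which is far below values such as 0.95.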
The levels indicate where the lines are drawn, in terms of the 'z' value of the bivariate normal density. Since max(z) is 0.09188815, levels of a <- c(.95,.90) can't be drawn.
To draw the line delimiting 95% of the mass I used the ellipse() function as suggested in this post (second answer from the top).
library(mixtools)
library(mnormt)
x <- seq(-5, 5, 0.25)
y <- seq(-5, 5, 0.25)
mu1 <- c(0, 0)
sigma1 <- matrix(c(2, -1, -1, 2), nrow = 2)
f <- function(x, y) dmnorm(cbind(x, y), mu1, sigma1)
z <- outer(x, y, f)
a <- c(.01,.05)
contour(x, y, z, levels = a)
ellipse(mu=mu1, sigma=sigma1, alpha = .05, npoints = 250, col="red")
I also found another solution in the book "Applied Multivariate Statistics with R" by Daniel Zelterman.
# Figure 6.5: Bivariate confidence ellipse
library(datasets)
library(MASS)
library(MVA)
#> Loading required package: HSAUR2
#> Loading required package: tools
biv <- swiss[, 2 : 3] # Extract bivariate data
bivCI <- function(s, xbar, n, alpha, m)
# returns m (x,y) coordinates of 1-alpha joint confidence ellipse of mean
{
  x <- sin(2 * pi * (0 : (m - 1)) / (m - 1)) # m points on a unit circle
  y <- cos(2 * pi * (0 : (m - 1)) / (m - 1))
  cv <- qchisq(1 - alpha, 2) # chisquared critical value
  cv <- cv / n # value of quadratic form
  for (i in 1 : m)
  {
    pair <- c(x[i], y[i]) # ith (x,y) pair
    q <- pair %*% solve(s, pair) # quadratic form
    x[i] <- x[i] * sqrt(cv / q) + xbar[1]
    y[i] <- y[i] * sqrt(cv / q) + xbar[2]
  }
  return(cbind(x, y))
}
### pdf(file = "bivSwiss.pdf")
plot(biv, col = "red", pch = 16, cex.lab = 1.5)
lines(bivCI(var(biv), colMeans(biv), dim(biv)[1], .01, 1000), type = "l",
col = "blue")
lines(bivCI(var(biv), colMeans(biv), dim(biv)[1], .05, 1000),
type = "l", col = "green", lwd = 1)
lines(colMeans(biv)[1], colMeans(biv)[2], pch = 3, cex = .8, type = "p",
lwd = 1)
Created on 2021-03-15 by the reprex package (v0.3.0)

Smooth change of day length

I want to model what it might look like to have the day length change smoothly over time (but remain sinusoidal). The formula for a "chirp", to change the instantaneous frequency is given at https://en.wikipedia.org/wiki/Chirp but it doesn't look right when coded for a 24h period over 5 days and then a transition to 12h over another 5 days:
period = list( c(24,24,5), c(24,12,5) )
alpha = list( c(0,5), c(0,5) )
s_samples = 100
A=50
O=50
simulatedData = data.frame(t=numeric(), v=numeric()) #initialise the output
daySteps = c(0, cumsum(unlist(period)[seq(3,length(unlist(period)), by=3)])) #set up the period starts and ends to set over, starting at 0
##Cycle over each of the items in the list
for(set in seq(period) ){
  t_points = s_samples*period[[set]][3]
  t = seq(daySteps[set], daySteps[set+1], length.out=t_points) #make the time
  slope = (24/period[[set]][2]-24/period[[set]][1])/(max(t)-min(t)) # get the slope
  f0 = 24/period[[set]][1] - slope*(min(t)) # find the freq when t0
  c = (24/period[[set]][2]-f0)/(max(t)) #calculate the chirp see https://en.wikipedia.org/wiki/Chirp and https://dsp.stackexchange.com/questions/57904/chirp-after-t-seconds
  wt = ((c*(t^2))/2) + f0*(t) # calc the freq
  a = alpha[[set]][1]
  v = A * cos(2*pi*wt - a) + O
  simulatedData = rbind(simulatedData, data.frame(t, v) )
}
plot(simulatedData, type="l", lwd=2)
t = seq(0,sum(unlist(period)[seq(3,length(unlist(period)), by=3)]), by=1/24)
points(t, A*cos(2*pi*t)+O, col=3, type="l", lty=2)
points(t, A*cos(2*(24/12)*pi*t)+O, col=4, type="l", lty=2)
The first five days (24 h period) look perfect, as expected, and the last part of the second five days matches a 12 h cycle, but the first part of that period looks 180 degrees out of phase. What's wrong?
I think you're making this a lot more complex than it needs to be. Remember that many R functions are already vectorized. The following function will produce a linear chirp between frequencies f0 and f1 between t0 and t1, with an optional phi parameter to specify at what point on the cycle you want your sequence to begin:
chirp <- function(f0, f1, t0, t1, phi = 0, n_steps = 1000)
{
  C <- (f1 - f0)/(t1 - t0)
  x <- seq(t0, t1, length.out = n_steps)
  y <- sin(2 * pi * (C / 2 * (x - t0)^2 + f0 * (x - t0)) + phi) # Ref Wikipedia
  data.frame(x, y)
}
Of course, it can also produce the static first half of your plot by "chirping" between two identical frequencies, so we can get a data frame of x, y points on the plot by doing
df <- rbind(chirp(1, 1, 0, 5), chirp(1, 2, 5, 10))
Which results in:
plot(df$x, df$y, type = "l")
Note that between 5 and 10 days there are 7.5 cycles, so if you wanted to smoothly continue frequency 2, you would need to set the phi parameter to a half cycle (i.e. to pi):
df <- rbind(df, chirp(2, 2, 10, 15, phi = pi))
plot(df$x, df$y, type = "l")
Note that the phases of the chirped signal and a 2 Hz signal will only match after n seconds if the chirp occurs over an even number of periods of the original signal. For an odd number, the phase will be out by 180 degrees. This is a mathematical consequence of a linear chirp. To see this, let's use our function to chirp over 6 seconds so the phases match at 10 seconds:
plot(df$x, df$y, type = "l")
lines(df2$x, df2$y, lty = 2, col = "green")
lines(df3$x, df3$y, lty = 2, col = "blue")
lines(df$x, df$y)
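The plotting calls above use df2 and df3, whose definitions are not shown. The following sketch is one way to set this up, assuming df2 and df3 are constant reference signals at the starting and final frequencies and that the chirp runs from t = 4 to t = 10. These choices, and the helper `cycles()`, are guesses made for illustration and not the answerer's code.

```r
# chirp() repeated from the answer so the block runs on its own
chirp <- function(f0, f1, t0, t1, phi = 0, n_steps = 1000) {
  C <- (f1 - f0) / (t1 - t0)
  x <- seq(t0, t1, length.out = n_steps)
  y <- sin(2 * pi * (C / 2 * (x - t0)^2 + f0 * (x - t0)) + phi)
  data.frame(x, y)
}

# cycles completed during a linear chirp = mean frequency * duration
cycles <- function(f0, f1, t0, t1) (f0 + f1) / 2 * (t1 - t0)
cycles(1, 2, 5, 10)  # 7.5: ends half a cycle out, hence phi = pi earlier
cycles(1, 2, 4, 10)  # 9: a whole number, so no phase correction is needed

# assumed set-up: hold frequency 1, chirp over 6 time units, hold frequency 2
df  <- rbind(chirp(1, 1, 0, 4), chirp(1, 2, 4, 10), chirp(2, 2, 10, 15))
df2 <- chirp(1, 1, 0, 15)   # assumed reference at the starting frequency
df3 <- chirp(2, 2, 0, 15)   # assumed reference at the final frequency

plot(df$x, df$y, type = "l")
lines(df2$x, df2$y, lty = 2, col = "green")
lines(df3$x, df3$y, lty = 2, col = "blue")
lines(df$x, df$y)
```

With this set-up the last segment needs no phi argument, because the chirp completes a whole number of cycles and so ends in phase with the reference at the final frequency.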

Finding points in a coordinate system based on equations

I have a number of inequalities that I would like to translate into points in a 3-D coordinate system with axes x, y and z.
I need to find all possible points that fulfill all my inequalities, save each point, and then plot it as a 3d-scatterplot.
Simplified data example:
#Inequalities (I have several more)
df <- data.frame(
x = c(0.5, 0.4, 0.1),
y = c(0.7, 0.11, -0.25),
z = c(-0.5, -0.02, 1),
v = c(90, 2500, 350))
# Limits of the coordinate system
x.lim <- seq(-100, 100, by = 1)
y.lim <- seq(-100, 100, by = 1)
z.lim <- seq(-50, 50, by = 1)
# Basic check - must be true for all points:
df$x + df$y + df$z < df$v
# Looping through all points of the coordinate system
# no need to test row 2, # if the first row is false
df$x*-100 + df$y*-100 + df$z*-50 < df$v
# if all conditions are true, save the point to a list/matrix to be able to plot it
df$x*-99 + df$y*-100 + df$z*-50 < df$v
#...
df$x*-100 + df$y*-99 + df$z*-50 < df$v
df$x*-99 + df$y*-98 + df$z*-50 < df$v
#...
df$x*100 + df$y*100 + df$z*50 < df$v
So, in the end I would have a matrix m containing every point for which all tests are TRUE, which would look something like this:
m
x y z
-100 -100 -50
-99 -100 -50
...
100 100 50
In the end I would be able to plot all my TRUE-TRUE-TRUE results into a 3d-scatterplot such as:
plot3d(x = m[, 1], y = m[, 2], z = m[, 3], col = "blue", size = 4,
xlim = c(-100, 100), ylim = c(-100, 100), zlim = c(-50, 50))
I potentially have many more inequalities and a much larger coordinate system, so speed is also an issue. My main problem is how to write a conditional loop that goes through all the available values and saves the ones that pass into a matrix.
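One way to avoid an explicit loop over points is to build the whole grid with expand.grid() and test every inequality with a single matrix product. This is a sketch under the assumption that each row of df encodes the inequality x*X + y*Y + z*Z < v for a grid point (X, Y, Z); `feasible_points()` and `A` are hypothetical names introduced here.

```r
# inequalities from the question: x*X + y*Y + z*Z < v, one per row
df <- data.frame(
  x = c(0.5, 0.4, 0.1),
  y = c(0.7, 0.11, -0.25),
  z = c(-0.5, -0.02, 1),
  v = c(90, 2500, 350))

# Hypothetical helper: returns the grid points satisfying every inequality
feasible_points <- function(A, rhs, x.lim, y.lim, z.lim) {
  # all (X, Y, Z) combinations, one per row
  grid <- as.matrix(expand.grid(x = x.lim, y = y.lim, z = z.lim))
  lhs <- grid %*% t(A)   # column j holds the left side of inequality j
  ok <- rep(TRUE, nrow(grid))
  for (j in seq_along(rhs)) ok <- ok & (lhs[, j] < rhs[j])
  grid[ok, , drop = FALSE]
}

A <- as.matrix(df[, c("x", "y", "z")])
# a coarser grid (step 5) keeps the example quick; by = 1 works the same way
m <- feasible_points(A, df$v,
                     x.lim = seq(-100, 100, by = 5),
                     y.lim = seq(-100, 100, by = 5),
                     z.lim = seq(-50, 50, by = 5))
head(m)
nrow(m)
```

The only loop left runs over the inequalities, not over the grid points, and the resulting matrix m has columns x, y and z that can be passed to plot3d() as in the question.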

Plot 3D data in R

I have a 3D dataset:
data = data.frame(
x = rep( c(0.1, 0.2, 0.3, 0.4, 0.5), each=5),
y = rep( c(1, 2, 3, 4, 5), 5)
)
data$z = runif(
25,
min = (data$x*data$y - 0.1 * (data$x*data$y)),
max = (data$x*data$y + 0.1 * (data$x*data$y))
)
data
str(data)
And I want to plot it, but the built-in functions of R always give the error
increasing 'x' and 'y' values expected
# ### 3D Plots ######################################################
# built-in function always give the error
# "increasing 'x' and 'y' values expected"
demo(image)
image(x = data$x, y = data$y, z = data$z)
demo(persp)
persp(data$x,data$y,data$z)
contour(data$x,data$y,data$z)
When I searched on the internet, I found that this message happens when combinations of X and Y values are not unique. But here they are unique.
I tried some other libraries and there it works without problems. But I don't like the default style of the plots (the built-in functions should fulfill my expectations).
# ### 3D Scatterplot ######################################################
# Nice plots without surface maps?
install.packages("scatterplot3d", dependencies = TRUE)
library(scatterplot3d)
scatterplot3d(x = data$x, y = data$y, z = data$z)
# ### 3D Scatterplot ######################################################
# Only to play around?
install.packages("rgl", dependencies = TRUE)
library(rgl)
plot3d(x = data$x, y = data$y, z = data$z)
lines3d(x = data$x, y = data$y, z = data$z)
surface3d(x = data$x, y = data$y, z = data$z)
Why are my datasets not accepted by the built-in functions?
I use the lattice package for almost everything I plot in R, and it has a plot corresponding to persp called wireframe. Let data be the way Sven defined it.
library(lattice)
wireframe(z ~ x * y, data=data)
Or how about this (modification of fig 6.3 in Deepayan Sarkar's book):
p <- wireframe(z ~ x * y, data=data)
npanel <- c(4, 2)
rotx <- c(-50, -80)
rotz <- seq(30, 300, length = npanel[1]+1)
update(p[rep(1, prod(npanel))], layout = npanel,
       panel = function(..., screen) {
         panel.wireframe(..., screen = list(z = rotz[current.column()],
                                            x = rotx[current.row()]))
       })
Update: Plotting surfaces with OpenGL
Since this post continues to draw attention I want to add the OpenGL way to make 3-d plots too (as suggested by @tucson below). First we need to reformat the dataset from xyz-triplets to axis vectors x and y and a matrix z.
x <- 1:5/10
y <- 1:5
z <- x %o% y
z <- z + .2*z*runif(25) - .1*z
library(rgl)
persp3d(x, y, z, col="skyblue")
This image can be freely rotated and scaled using the mouse, or modified with additional commands, and when you are happy with it you save it using rgl.snapshot.
rgl.snapshot("myplot.png")
Adding to the solutions of others, I'd like to suggest using the plotly package for R, as this has worked well for me.
Below, I'm using the reformatted dataset suggested above, from xyz-triplets to axis vectors x and y and a matrix z:
x <- 1:5/10
y <- 1:5
z <- x %o% y
z <- z + .2*z*runif(25) - .1*z
library(plotly)
plot_ly(x=x,y=y,z=z, type="surface")
The rendered surface can be rotated and scaled using the mouse. This works fairly well in RStudio.
You can also try it with the built-in volcano dataset from R:
plot_ly(z=volcano, type="surface")
If you're working with "real" data for which the grid intervals and sequence cannot be guaranteed to be increasing or unique (hopefully the (x,y,z) combinations are unique at least, even if these triples are duplicated), I would recommend the akima package for interpolating from an irregular grid to a regular one.
Using your definition of data:
library(akima)
im <- with(data,interp(x,y,z))
with(im,image(x,y,z))
And this should work not only with image but similar functions as well.
Note that the default grid to which akima::interp maps your data is a 40 x 40 grid of equally spaced points spanning the ranges of the x and y values:
> formals(akima::interp)[c("xo","yo")]
$xo
seq(min(x), max(x), length = 40)
$yo
seq(min(y), max(y), length = 40)
But of course, this can be overridden by passing arguments xo and yo to akima::interp.
I think the following code is close to what you want
x <- c(0.1, 0.2, 0.3, 0.4, 0.5)
y <- c(1, 2, 3, 4, 5)
# outer() calls zfun once with vectors a and b of equal length
zfun <- function(a, b) a * b * (0.9 + 0.2 * runif(length(a)))
z <- outer(x, y, FUN = zfun)
It gives data like this (note that x and y are both increasing)
> x
[1] 0.1 0.2 0.3 0.4 0.5
> y
[1] 1 2 3 4 5
> z
[,1] [,2] [,3] [,4] [,5]
[1,] 0.1037159 0.2123455 0.3244514 0.4106079 0.4777380
[2,] 0.2144338 0.4109414 0.5586709 0.7623481 0.9683732
[3,] 0.3138063 0.6015035 0.8308649 1.2713930 1.5498939
[4,] 0.4023375 0.8500672 1.3052275 1.4541517 1.9398106
[5,] 0.5146506 1.0295172 1.5257186 2.1753611 2.5046223
and a graph like
persp(x, y, z)
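image(), persp() and contour() expect x and y to be the increasing grid coordinates (one value per row or column of z) and z to be a matrix, whereas the data frame in the question stores one row per point. A sketch of the reshaping with base R, assuming every (x, y) pair occurs exactly once; `xs`, `ys` and `zmat` are names introduced here.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
data <- data.frame(
  x = rep(c(0.1, 0.2, 0.3, 0.4, 0.5), each = 5),
  y = rep(c(1, 2, 3, 4, 5), 5)
)
data$z <- runif(25,
                min = 0.9 * data$x * data$y,
                max = 1.1 * data$x * data$y)

# grid coordinates: unique values in increasing order
xs <- sort(unique(data$x))
ys <- sort(unique(data$y))

# place each z value at the row of its x and the column of its y
zmat <- matrix(NA_real_, nrow = length(xs), ncol = length(ys))
zmat[cbind(match(data$x, xs), match(data$y, ys))] <- data$z

persp(xs, ys, zmat, theta = 30, phi = 25)
image(xs, ys, zmat)
contour(xs, ys, zmat, add = TRUE)
```

Passing data$x and data$y directly fails because those vectors repeat values and are not strictly increasing, even though the (x, y) combinations themselves are unique.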
Not sure why the code above did not work for the library rgl, but the following link has a great example with the same library.
Run the code in R and you will obtain a beautiful 3d plot that you can turn around in all angles.
http://statisticsr.blogspot.de/2008/10/some-r-functions.html
########################################################################
## another example of 3d plot from my personal research, use rgl library
########################################################################
# 3D visualization device system
library(rgl);
data(volcano)
dim(volcano)
peak.height <- volcano;
ppm.index <- (1:nrow(volcano));
sample.index <- (1:ncol(volcano));
zlim <- range(peak.height)
zlen <- zlim[2] - zlim[1] + 1
colorlut <- terrain.colors(zlen) # height color lookup table
col <- colorlut[(peak.height-zlim[1]+1)] # assign colors to heights for each point
open3d()
ppm.index1 <- ppm.index*zlim[2]/max(ppm.index);
sample.index1 <- sample.index*zlim[2]/max(sample.index)
title.name <- paste("plot3d ", "volcano", sep = "");
surface3d(ppm.index1, sample.index1, peak.height, color=col, back="lines", main = title.name);
grid3d(c("x", "y+", "z"), n =20)
sample.name <- paste("col.", 1:ncol(volcano), sep="");
sample.label <- as.integer(seq(1, length(sample.name), length = 5));
axis3d('y+',at = sample.index1[sample.label], sample.name[sample.label], cex = 0.3);
axis3d('y',at = sample.index1[sample.label], sample.name[sample.label], cex = 0.3)
axis3d('z',pos=c(0, 0, NA))
ppm.label <- as.integer(seq(1, length(ppm.index), length = 10));
axes3d('x', at=c(ppm.index1[ppm.label], 0, 0), abs(round(ppm.index[ppm.label], 2)), cex = 0.3);
title3d(main = title.name, sub = "test", xlab = "ppm", ylab = "samples", zlab = "peak")
rgl.bringtotop();
