I have a zoo object that contains velocity data from two different points (V1 and V2), as well as particle data from the same two points. The distance between the two points is 170 m.
library(zoo)

Date <- as.POSIXct("2012-01-01 08:00:00") + 1:120
V1 <- rnorm(120, mean = 5)  # velocity at point 1 in m/sec
R <- rnorm(120, mean = 3)
V2 <- V1 + R                # velocity at point 2 in m/sec
Data1 <- rnorm(120, mean = 20)
Data2 <- rnorm(120, mean = 25)
V <- data.frame(V1, V2, Data1, Data2)
z <- zoo(as.matrix(V), order.by = Date)
L <- 170                    # distance between the points = 170 m
If I average the velocity data,

z$Avg_Vel <- rowMeans(z[, 1:2])

I should have a pretty good idea of how fast the particles are traveling, and since I know the distance, a good idea of how long the particles take to travel from Point 1 to Point 2 over the course of the time series.

z$Off <- L / z$Avg_Vel # travel time in seconds
But I can't figure out how to offset my zoo object to account for the time delay it takes for particles to travel between the two points. So if I am interested in finding the difference between Data1 and Data2, I don't want to do

Diff <- z$Data1 - z$Data2

as this does not include the offset.
If it takes 2 minutes for the particles to travel from point 1 to point 2, then I would want

Diff <- z$Data1 - z$Data2(+2 min) # pseudo-code

so that I am looking at the difference between Data1 at time x and Data2 at time x + 2 min.
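(Side note: for a fixed offset like this, zoo's lag() appears to do the job; a minimal sketch, assuming the regular 1-second spacing of the example data, since lag() shifts by observations rather than by time:

# Data1[t] - Data2[t + 120 sec] on a regular 1-second index
Diff <- z$Data1 - lag(z$Data2, k = 120, na.pad = TRUE)

This does not cover the rolling case described next.)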
To clarify in response to an answer: the end result would be a rolling offset, so that

Offset <- z$Off

Looking at this kind of offset,

round(as.numeric(z$Off))

the result would look like this:

Diff <- z$Data1 - z$Data2(+22 sec)
Diff <- z$Data1 - z$Data2(+23 sec)
Diff <- z$Data1 - z$Data2(+32 sec)
...
This is a way to include an offset:
offset <- 120 # 2 minutes in seconds
ix <- index(z) + offset # new time index
Calculate the difference with a 2-minute offset:
# rev() flips the logical match so the *early* observations of Data1 are
# selected; this relies on the index being regularly spaced
z$Data1[rev(index(z) %in% ix)] -
  as.numeric(z$Data2[index(z) %in% ix])
Your example time series is too short for an offset of 2 minutes, so I tested it with a 1-minute offset instead (offset <- 60).
If you want to use a vector of offsets, use this:
offsets <- sample(1:5, nrow(z), TRUE) # some example offsets (in seconds)
# alternatively you could use:
# round(as.numeric(z$Off))
ixs <- index(z) + offsets
ixs_num <- match(ixs, index(z), nomatch = NA) # position of each shifted time in the index
z$Data1[seq_along(ixs_num)[!is.na(ixs_num)]] -
  as.numeric(z$Data2)[na.omit(ixs_num)]
Note: this procedure works for both positive and negative offsets.
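If this comes up often, the matching logic can be wrapped in a small helper; offset_diff below is a hypothetical convenience function of my own, not part of the answer above:

# hypothetical helper: returns Data1[t] - Data2[t + offsets[t]]
# wherever the shifted time exists in the index
offset_diff <- function(z, offsets) {
  ixs <- index(z) + offsets
  pos <- match(ixs, index(z), nomatch = NA)
  z$Data1[seq_along(pos)[!is.na(pos)]] -
    as.numeric(z$Data2)[na.omit(pos)]
}

# e.g. with the rolling travel-time offsets from the question:
# offset_diff(z, round(as.numeric(z$Off)))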
I am trying to find the duration in seconds of a signal; to do this I use an approach from Finding start and end of a peak in time series in R.
y <- signal
y$startEndPeak <- NA
which.min(abs(diff(y$peak))) # (leftover from experimenting; not used below)

# local minima: the sign of the first difference changes from - to +
i.mins <- which(diff(sign(diff(c(Inf, y$peak, Inf)))) == 2)
i.mins

# global maximum = the peak
i.mx <- which.max(y$peak)
i.mx
y$startEndPeak[i.mx] <- "peak"

# the two minima that bracket the peak
ind <- sort(c(i.mins, i.mx))
ix <- ind[which(ind == i.mx) + c(-1, 1)]
ix
y$startEndPeak[ix] <- c("start", "end")

## start, end, peak
ix2 <- c(ix, i.mx)
ix2
dat <- y[ix2, ]
dat
The output:

  Time      Peak startEndPeak
  4.46  -649.774        start
  5.86 19791.226          end
  5.00 48655.226         peak
I was able to detect the maximum point (the peak), but the code fails to correctly detect the minimum points on either side of the peak, so it is impossible to determine the duration of the signal correctly.
This is what I want to identify (x = time, y = signal):
I would like to know: is there any way to find these points accurately? I have tried multiple functions and packages in R, and even Matlab, and I have not been able to achieve the needed precision. It may be possible to draw a line at the minimum point on the left of the peak and find the intersection point on the other side of the signal.
You can download one example of the signal here
Thank you for your time!
Here is a simple solution using pspline and the derivative of the time series. You may have to tune the cutoff values and the ignore region by looking at the other peaks as well:
library(pspline)

get_duration <- function(df) {
  # first derivative of a smoothing spline fitted to the signal
  deriv <- as.vector(predict(sm.spline(df$time, df$peak), df$time, 1))
  # start: first point where the derivative exceeds the cutoff
  start <- which.max(deriv > 7000)
  # skip a region after the start so the peak itself is not taken as the end
  ignore_region <- 60
  # end: first point after the ignore region where the derivative climbs back above -100
  end <- start + ignore_region +
    which.max(deriv[(start + ignore_region):length(deriv)] > -100)
  df$time[end] - df$time[start]
}
get_duration(df)
#[1] 3.22
Taking the ideas from the following links:
the local minimum between the two peaks
How to explain ...
I look for the local minimum or minima, avoiding functions already written for this purpose (local or global max/min).
My progress so far:
#DATA
simulate <- function(lambda=0.3, mu=c(0, 4), sd=c(1, 1), n.obs=10^5) {
x1 <- rnorm(n.obs, mu[1], sd[1])
x2 <- rnorm(n.obs, mu[2], sd[2])
return(ifelse(runif(n.obs) < lambda, x1, x2))
}
data <- simulate()
hist(data)
d <- density(data)
#
#https://stackoverflow.com/a/25276661/8409550
##Since the x-values are equally spaced, we can estimate dy using diff(d$y)
d$x[which.min(abs(diff(d$y)))]
#With our data we did not obtain the expected value
#
d$x[which(diff(sign(diff(d$y)))>0)+1]#pit
d$x[which(diff(sign(diff(d$y)))<0)+1]#peak
#we check
#1
optimize(approxfun(d$x,d$y),interval=c(0,4))$minimum
optimize(approxfun(d$x,d$y),interval=c(0,4),maximum = TRUE)$maximum
#2
tp <- pastecs::turnpoints(d$y)
summary(tp)
ind <- seq_along(d$y)[pastecs::extract(tp, no.tp = FALSE, peak = TRUE, pit = TRUE)]
d$x[ind[2]] # pit
d$x[ind[1]] # first peak
d$x[ind[3]] # second peak
My questions and request for help:
Why did this command fail:
d$x[which.min(abs(diff(d$y)))]
Is it possible to eliminate the need to add one to the index in these commands:
d$x[which(diff(sign(diff(d$y)))>0)+1]#pit
d$x[which(diff(sign(diff(d$y)))<0)+1]#peak
How to get the optimize function to return the two expected maximum values?
Question 1
The answer to the first question is straightforward. The line d$x[which.min(abs(diff(d$y)))] asks for the x value at which there was the smallest change in y between two consecutive points. The answer is that this happened at the extreme right of the plot, where the density curve is essentially flat:
which.min(abs(diff(d$y)))
#> [1] 511
length(abs(diff(d$y)))
#> [1] 511
This is not only smaller than the difference at your local maxima/minima points; it is orders of magnitude smaller. Let's zoom in on the peak value of d$y, including only the peak and the point on each side:
which.max(d$y)
#> [1] 324
plot(d$x[323:325], d$y[323:325])
We can see that the smallest difference is around 0.00005 (5e-05) between two consecutive points. Now look at the end of the plot, where it is flattest:
plot(d$x[510:512], d$y[510:512])
The difference here is about 1e-07, which is why this is the flattest point.
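You can inspect those neighbouring differences directly (exact values will vary with the random data):

diff(d$y)[323:324] # either side of the peak
diff(d$y)[510:511] # at the flat right-hand tail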
Question 2
The answer to your second question is "no, not really". You are taking a double diff, which is two elements shorter than x, and if x is n elements long, a double diff will correspond to elements 2 to (n - 1) in x. You can remove the +1 from the index, but you will have an off-by-one error if you do that. If you really wanted to, you could concatenate dummy zeros at each stage of the diff, like this:
d$x[which(c(0, diff(sign(diff(c(d$y, 0))))) > 0)]
which gives the same result, but this is longer, harder to read and harder to justify, so why would you?
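A quick sanity check (my addition, not in the original answer) confirms that the padded and unpadded index expressions select the same pits for this density:

identical(d$x[which(diff(sign(diff(d$y))) > 0) + 1],
          d$x[which(c(0, diff(sign(diff(c(d$y, 0))))) > 0)])
#> [1] TRUE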
Question 3
The answer to the third question is that you could use the "pit" as the dividing point between the minimum and maximum value of d$x to find the two "peaks". If you really want a single call to get both at once, you could do it inside an sapply:
pit <- optimize(approxfun(d$x, d$y), interval = c(0, 4))$minimum
peaks <- sapply(1:2, function(i) {
  optimize(approxfun(d$x, d$y),
           interval = c(min(d$x), pit, max(d$x))[i:(i + 1)],
           maximum = TRUE)$maximum
})
pit
#> [1] 1.691798
peaks
#> [1] -0.02249845 3.99552521
I have data in the following form (two examples):
p1 <- structure(c(1.38172177074188, 1.18601365390563, 1.25131938561825,
1.07175353794277, 0.887770295772917, 0.806599968169486, 0.843543355495394,
0.889051695167723, 0.764131945540256, 0.699309441111923, 0.945165791967098,
1.31310409471336), .Dim = 12L)
p2 <- structure(c(1.24801075135611, 1.06280347993594, 1.21410288703334,
1.36797720634294, 1.07291218307332, 0.936924063490867, 0.819723966406961,
0.854960740335283, 0.718565087630857, 0.649827141012991, 0.785853771875901,
1.04368795443605), .Dim = 12L)
These are standardized monthly means of hydrological time series; so-called Pardé regimes that give some indication of annual seasonality. For further analysis, I need to derive the 3 highest and 3 lowest months from these Pardé series. Because seasonality can be bimodal, I need to identify the 3 highest/lowest consecutive data points (which are most often not the three absolute highest/lowest data points, see the examples) to derive the timing of the wettest and driest periods. Up to now I have failed because of the circular character of the time series, which poses a special challenge.
Any suggestions?
You could use filter(). It sums consecutive values and, with circular = TRUE, can deal with circular time series.
f1 <- stats::filter(p1, c(1, 1, 1), circular = TRUE, sides = 1)
#Time Series:
# Start = 1
#End = 12
#Frequency = 1
#[1] 3.639992 3.880840 3.819055 3.509087 3.210843 2.766124 2.537914 2.539195 2.496727 2.352493 2.408607 2.957579
# the three consecutive months with the highest sum (the wettest period)
((which.max(f1) - (3:1)) %% 12) + 1
#[1] 12 1 2
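The same trick gives the three consecutive driest months; the line below is my addition, simply mirroring the answer's logic with which.min:

((which.min(f1) - (3:1)) %% 12) + 1
#[1] 8 9 10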
I have data which is movement of an object in 3D space at regular time intervals. Data is as below:
Time  X  Y  Z
   1  1  1  1
   2  2  1  2
   3  2  0  1
   4  3  2  1
 ...
(x, y, z) is the position of the object at time t. I want to plot a 3D graph that shows the complete movement of the object in 3D space, with a slider (or something of that sort) to select a time range (say 500 to 750) and see the movement of the object within it. So there are four dimensions here: x, y, z as positions and time as the fourth, with the slider controlling which time window is plotted. (The Mathematica example below gives a good idea of this.)
To make it clearer: we first draw the complete movement of the object in 3D space from time 1 to N. Then, by moving the slider, we draw the movement of the same object between timestamps t1 and t2. It is also important to display what time the slider is at, as I have to make a note of some interesting timestamps based on the movement.
I have Googled this, but no example was close enough to what I want. All of them bind the slider to one of the axis variables (say x or y, which might be time), whereas I need to bind it to the fourth dimension, time. dygraphs was promising, but I had similar issues as discussed above (and I didn't find any 3D support).
This one in Mathematica is interesting, but I don't have a license for it. It just moves a point along the traced 3D path. That could also solve my problem, but I would need to know the timestamp values when I pause it.
A solution in R is good for me because it has no licensing issues; Matlab works too if it does not require any advanced visualization toolboxes. Or Python.
Thanks in Advance.
This is a raw example that can be customized as desired. It uses manipulate and plot3D
library(manipulate)
library(plot3D)
min_time <- 1
max_time <- 100
time_interval <- min_time:max_time
# Create data frame
DF <- data.frame(t = time_interval)
# Time parametric functions
X <- function(t) {
return(2 * t)
}
Y <- function(t) {
return(t^2)
}
Z <- function(t) {
return(10 * cos(t / 100))
}
# Update data frame
DF$x <- sapply(DF$t, X)
DF$y <- sapply(DF$t, Y)
DF$z <- sapply(DF$t, Z)
# Use manipulate with RStudio
manipulate({
# draw the full trajectory, then overlay the position at time t
lines3D(x = DF$x, y = DF$y, z = DF$z)
scatter3D(
x = DF$x[t],
y = DF$y[t],
z = DF$z[t],
add = TRUE
)
}, t = slider(min_time, max_time))
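The question also asks for a time range rather than a single instant. A hedged extension of the same sketch (my addition; it assumes t1 <= t2 is chosen on the sliders) highlights the selected window and prints the slider times in the plot title:

manipulate({
  # full path in grey, with the current window in the title so that
  # interesting timestamps can be noted down
  lines3D(x = DF$x, y = DF$y, z = DF$z, col = "grey",
          main = paste("time:", t1, "to", t2))
  # overlay only the selected time window
  scatter3D(
    x = DF$x[t1:t2],
    y = DF$y[t1:t2],
    z = DF$z[t1:t2],
    add = TRUE
  )
}, t1 = slider(min_time, max_time, initial = min_time),
   t2 = slider(min_time, max_time, initial = max_time))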
I'm new to R and I am attempting to take a set of time series and run them through a Conditional Inference Tree to help classify the shape of each series. The problem is that not all of the time series have the same number of periods. I am trying to expand each time series to be 30 periods long while still maintaining the same "shape". This is as far as I have gotten:
library(zoo)
test<-c(606,518,519,541,624,728,560,512,777,728,1014,1100,930,798,648,589,680,635,607,544,566)
accordion <- function(A, N) {
  x <- ts(scale(A), start = c(1, 1), frequency = 1)
  X1 <- zoo(x, seq(from = 1, to = N, by = (N - 1) / (length(x) - 1)))
  X2 <- merge(X1, zoo(order.by = seq(start(X1), end(X1) - 1,
                                     by = ((N - 1) / length(x)) / (N / length(x)))))
  X3 <- na.approx(X2)
  return(X3)
}
expand.test<-accordion(test,30)
plot(expand.test); lines(scale(test))
length(expand.test)
The above code scales the time series, spaces it out evenly over 30 periods, and interpolates the missing values. However, the returned series is 42 units long, not 30, although it retains the same "shape" as the original time series. Does anyone know how to modify this so that the result produced by accordion() is 30 periods long while the shape of the series remains relatively unchanged?
I think there's a base R solution here. Check out approx(), which does linear (or constant) interpolation with as many points n as you specify. Here I think you want n = 30.
test2 <- approx(test, n=30)
plot(test2)
points(test, pch="*")
This returns a list test2 whose second element y contains your interpolated values. I haven't used your time series object, but it seems that was entirely internal to your function, correct?
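If you also want the scaling step from your accordion() function, a minimal sketch of the combination (my addition, using only base R):

expand30 <- approx(as.numeric(scale(test)), n = 30)$y # scale, then interpolate to 30 points
length(expand30)
#[1] 30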