I have a smoothed time series and want to find the instantaneous velocity of the function at any point along the line.
What I want to do is take a series of values, e.g. c(1,6,5,4,3,5,6,7,1),
and return the derivative of each relative to the function of the entire series, such that at every point in time, I know what direction the line is trending.
I am new to R, but know there must be a way.
Any tips?
Ex:
library(smoother)
data(BJsales)
m <- data.frame(BJsales)
x.smth <- as.data.frame(smth.gaussian(m$BJsales,tails=TRUE,alpha = 5))
x.smth.ts <- cbind(seq_len(nrow(m)),x.smth)
colnames(x.smth.ts) <- c("x","y")
x.smth.ts
plot(x.smth.ts$y~x.smth.ts$x)
Desired output:
df with 2 columns: x, deriv.of.y
Edit: Final Result thanks to G5W
[Figure: time series colored by derivative]
Your proposed example using the BJsales data is decidedly not differentiable,
so instead I will show the derivative of a much smoother function. If your real data is smooth, this should work for you.
The simplest way to approximate the derivative is simply to use finite differences.
f'(x) ≈ (f(x+h) - f(x))/h
## Smooth sample function
x = seq(0,10,0.1)
y = x/2 + sin(x)
plot(x,y, pch=20)
## Simplest - first difference
d1 = diff(y)/diff(x)
d1 = c(d1[1],d1)
Let's use it to plot a tangent line as an error check. I picked a place to draw the tangent line arbitrarily: the 18th point, x=1.7
plot(x,y, type="l")
abline(y[18]-x[18]*d1[18], d1[18])
To get the data.frame that you requested, you just need
Derivative = data.frame(x, d1)
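The error of a one-sided difference shrinks in proportion to the step h, while a central difference, which uses the neighbours on both sides, is usually noticeably closer on smooth data. Here is a minimal sketch on the same sample function; the names d_central, d_forward, err_central and err_forward are only illustrative.

```r
## Same smooth sample function as above
x <- seq(0, 10, 0.1)
y <- x/2 + sin(x)
n <- length(x)

## Central differences for interior points, one-sided differences at the ends
d_central <- numeric(n)
d_central[2:(n - 1)] <- (y[3:n] - y[1:(n - 2)]) / (x[3:n] - x[1:(n - 2)])
d_central[1] <- (y[2] - y[1]) / (x[2] - x[1])
d_central[n] <- (y[n] - y[n - 1]) / (x[n] - x[n - 1])

## One-sided differences, padded the same way as d1 above
d_forward <- diff(y) / diff(x)
d_forward <- c(d_forward[1], d_forward)

## Compare both against the exact derivative 1/2 + cos(x) on interior points
exact <- 1/2 + cos(x)
interior <- 2:(n - 1)
err_central <- max(abs(d_central[interior] - exact[interior]))
err_forward <- max(abs(d_forward[interior] - exact[interior]))

Derivative <- data.frame(x, deriv.of.y = d_central)
```

On this example err_central should come out far smaller than err_forward, so the central version is probably the better default if your data really is smooth.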
I am trying to find the x-values of the inflection points in the curve of a Kernel density plot that I computed with the density() function.
I found the following answered question helpful in finding the turning points:
How to find all the turning points on a kernel density curve when window width varies.
So I would think there must be a way to find the x-values of the inflection points, too.
It would be great if someone has a tip.
By definition, an inflection point is a point where the second derivative of the function changes sign (so it equals zero there). In practice, this means that an inflection point is a point where the slope passes from increasing to decreasing, or vice versa. Using this definition, I came up with this approximate and non-automatic approach:
Let's say that you have a dataframe, that I will call all, which contains the x-values in the first column, and the result of the density computation in the second one. From this dataframe, we can calculate the slope of two consecutive points like this :
slopes <- vector()
for(i in (2:nrow(all))){
x1 <- all[i-1, 1]
x2 <- all[i, 1]
y1 <- all[i-1, 2]
y2 <- all[i, 2]
slope_i <- (y2-y1)/(x2-x1)
slopes <- append(slopes, slope_i)
}
By the definition of inflection point, we can now calculate if, from one point to another, the slope gets larger or smaller:
increment <- vector()
for(j in 2:length(slopes)){
increment_j <- slopes[j] - slopes[j-1]
increment <- append(increment, increment_j)
}
The inflection points will be those points where this increment passes from positive to negative, or vice versa.
Now, let's separate these increments in positive and negative:
pos <- which(increment>0)
neg <- which(increment<0)
Now, whenever there is a jump in these pos or neg vectors, it means we have an inflection point. So, once again:
steps_p <- vector()
for(k in 2:length(pos)){
steps_k <- pos[k] - pos[k-1]
steps_p <- append(steps_p, steps_k)
}
steps_n <- vector()
for(k in 2:length(neg)){
steps_k <- neg[k] - neg[k-1]
steps_n <- append(steps_n, steps_k)
}
Now, just ask R:
which(steps_p>1)
which(steps_n>1)
These are the indices of your inflection points; now just go to your original dataframe and ask for the x value:
all[pos[which(steps_p>1)],1]
all[neg[which(steps_n>1)],1]
Keep in mind that the x value will be close to exact, but not quite, as during every loop we lose one index, but it will still be a very close solution.
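The same idea can be written without loops by using diff() and sign(). This is only a sketch: find_inflections is a hypothetical helper name, and it is demonstrated on the exact normal density (whose inflection points are at -1 and 1) so that the result can be checked. On real density() output, tiny numerical noise in the flat tails may produce spurious sign changes, so you might need to restrict the x range first.

```r
## Hypothetical helper: x positions where the change in slope switches sign
find_inflections <- function(x, y) {
  slopes <- diff(y) / diff(x)           # slope on each interval
  curv_sign <- sign(diff(slopes))       # sign of the slope change at x[2..(n-1)]
  flips <- which(diff(curv_sign) != 0)  # places where that sign switches
  ## the switch happens between x[flips + 1] and x[flips + 2]; report the midpoint
  (x[flips + 1] + x[flips + 2]) / 2
}

## Check on a curve with known inflection points (standard normal density)
x <- seq(-4, 4, by = 0.01)
infl <- find_inflections(x, dnorm(x))
```

With a dataframe like all above, the call would be find_inflections(all[, 1], all[, 2]).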
I have a 2-dimensional data set.
I use R's smooth.spline function to smooth my points graph following an example in this article:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.smooth.spline.html
So that I get the spline graph similar to the green line on this picture
I'd like to know the X values, where the first derivative of the smoothing spline equals zero (to determine exact minimum or maximum).
My problem is that my initial dataset (or a dataset that I could auto-generate) to feed into the predict() function does not contain such exact X values that correspond to the smoothing spline extrema.
How can I find such X values?
Here is the picture of the first derivative of the green spline line above
But the X coordinates of the extrema are still not exact.
My approximate R script to generate the pictures looks like the following
sp1 <- smooth.spline(df)
pred.prime <- predict(sp1, deriv=1)
pred.second <- predict(sp1, deriv=2)
d1 <- data.frame(pred.prime)
d2 <- data.frame(pred.second)
dfMinimums <- d1[abs(d1$y) < 1e-4, c('x','y')]
I think that there are two problems here.
You are using the original x-values and they are spaced too far apart AND
Because of the wide spacing of the x's, your threshold for where you consider the derivative "close enough" to zero is too high.
Here is basically your code but with many more x values and requiring smaller derivatives. Since you do not provide any data, I made a coarse approximation to it that should suffice for illustration.
## Coarse approximation of your data
x = runif(300, 0,45000)
y = sin(x/5000) + sin(x/950)/4 + rnorm(300, 0,0.05)
df = data.frame(x,y)
sp1 <- smooth.spline(df)
Spline code
Sx = seq(0,45000,10)
pred.spline <- predict(sp1, Sx)
d0 <- data.frame(pred.spline)
pred.prime <- predict(sp1, Sx, deriv=1)
d1 <- data.frame(pred.prime)
Mins = which(abs(d1$y) < mean(abs(d1$y))/150)
plot(df, pch=20, col="navy")
lines(sp1, col="darkgreen")
points(d0[Mins,], pch=20, col="red")
The extrema look pretty good.
plot(d1, type="l")
points(d1[Mins,], pch=20, col="red")
The points identified look like zeros of the derivative.
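If you want the zeros themselves rather than grid points that happen to be close to zero, one option is to bracket each sign change of the derivative on a grid and refine it with uniroot(). The following is a rough sketch on made-up data (a noisy sine, not your data); dspline, brackets, extrema_x and extrema_y are names chosen here for illustration.

```r
## Toy data: noisy sine with extrema near pi/2, 3*pi/2 and 5*pi/2
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, 0, 0.02)
sp <- smooth.spline(x, y)

## First derivative of the fitted spline as an ordinary function
dspline <- function(z) predict(sp, z, deriv = 1)$y

## Find grid intervals where the derivative changes sign
grid <- seq(min(x), max(x), length.out = 1000)
s <- sign(dspline(grid))
brackets <- which(s[-1] * s[-length(s)] < 0)

## Refine each bracket to a root of the derivative
extrema_x <- vapply(brackets, function(i) {
  uniroot(dspline, lower = grid[i], upper = grid[i + 1], tol = 1e-10)$root
}, numeric(1))
extrema_y <- predict(sp, extrema_x)$y
```

Each sign change yields exactly one root, so this also avoids getting several nearly identical points per extremum. A zero that the derivative touches without crossing would be missed by this approach.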
You can use my R package SplinesUtils: https://github.com/ZheyuanLi/SplinesUtils, which can be installed by
devtools::install_github("ZheyuanLi/SplinesUtils")
The functions to be used are SmoothSplineAsPiecePoly and solve. I will just use the example under the documentation.
library(SplinesUtils)
## a toy dataset
set.seed(0)
x <- 1:100 + runif(100, -0.1, 0.1)
y <- poly(x, 9) %*% rnorm(9)
y <- y + rnorm(length(y), 0, 0.2 * sd(y))
## fit a smoothing spline
sm <- smooth.spline(x, y)
## coerce "smooth.spline" object to "PiecePoly" object
oo <- SmoothSplineAsPiecePoly(sm)
## plot the spline
plot(oo)
## find all stationary / saddle points
xs <- solve(oo, deriv = 1)
#[1] 3.791103 15.957159 21.918534 23.034192 25.958486 39.799999 58.627431
#[8] 74.583000 87.049227 96.544430
## predict the "PiecePoly" at stationary / saddle points
ys <- predict(oo, xs)
#[1] -0.92224176 0.38751847 0.09951236 0.10764884 0.05960727 0.52068566
#[7] -0.51029209 0.15989592 -0.36464409 0.63471723
points(xs, ys, pch = 19)
One caveat I found in @G5W's implementation is that it sometimes returns several records clustered around an extremum instead of a single one. They cannot be told apart on the diagram, since they effectively fall on one point.
The following snippet keeps, for each group of such near-duplicate points, only the one with the smallest absolute first derivative:
library(tidyverse)
df2 <- df %>%
group_by(round(y, 4)) %>%
filter(abs(d1) == min(abs(d1))) %>%
ungroup() %>%
select(-5)
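Grouping by the rounded y value could, in principle, merge two different extrema that happen to have the same height. An alternative, sketched below in base R, is to group candidates whose indices are consecutive and keep the one with the smallest absolute derivative in each run. collapse_runs is a hypothetical helper, and the numbers are made up for illustration.

```r
## Hypothetical helper: from runs of consecutive candidate indices,
## keep the index with the smallest absolute derivative in each run
collapse_runs <- function(idx, absd) {
  if (length(idx) == 0) return(integer(0))
  run_id <- cumsum(c(TRUE, diff(idx) > 1))  # new run whenever there is a gap
  picks <- tapply(idx, run_id, function(ix) ix[which.min(absd[ix])])
  as.integer(picks)
}

## Made-up derivative values with two clusters of near-zero entries
d1y <- c(5, 4, 0.2, 0.1, 0.3, 4, 5, 0.4, 0.05, 6)
cand <- which(abs(d1y) < 1)            # candidates: 3 4 5 8 9
kept <- collapse_runs(cand, abs(d1y))  # 4 9
```

With the objects from G5W's answer, this would be called as collapse_runs(Mins, abs(d1$y)).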
I need help! I have a list of points (mydata X, Y = latitude, longitude) which I want to relate to a single reference point (for example 287, 347). I just want to measure (and plot) the distance from each point in the list to this reference point, but I can't work out how. I tried PCA, but there I can't set my specific point of reference.
Does anyone have an idea how to do this?
mydata:
X,Y
260,204
409,208
260,60
272,181
367,257
315,208
381,294
66,333
246,220
330,207
342,304
...
If all you want is the euclidean distance, why not just make your own function to do it?
distance <- function(x, y, home= c(0,0)) sqrt((x-home[1])^2+(y-home[2])^2)
plot(distance(400*runif(100), 400*runif(100), c(287, 347)))
Using the distance formula, this is a trivial calculation:
mydata$distance = sqrt((mydata$X - 287)^2 + (mydata$Y - 347)^2)
I'm not sure what sort of plot you would like of distance - perhaps a histogram?
hist(mydata$distance)
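If a picture of the distances themselves is wanted, another possibility is to draw each point with a line to the reference point. This sketch uses the eleven rows shown in the question and the example reference point (287, 347); ref and closest are names introduced here.

```r
## The rows shown in the question
mydata <- data.frame(
  X = c(260, 409, 260, 272, 367, 315, 381, 66, 246, 330, 342),
  Y = c(204, 208, 60, 181, 257, 208, 294, 333, 220, 207, 304)
)
ref <- c(287, 347)  # reference point from the question

## Euclidean distance of every point to the reference point
mydata$distance <- sqrt((mydata$X - ref[1])^2 + (mydata$Y - ref[2])^2)
closest <- which.min(mydata$distance)

## Points, reference point, and one segment per point
plot(mydata$X, mydata$Y, pch = 20, xlab = "X", ylab = "Y",
     xlim = range(c(mydata$X, ref[1])), ylim = range(c(mydata$Y, ref[2])))
segments(mydata$X, mydata$Y, ref[1], ref[2], col = "grey")
points(ref[1], ref[2], pch = 4, col = "red", cex = 1.5)
```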
I solved it by computing the distance like Georg said and then using the density function.
mydata <- read.table("blabla.csv", sep=",")
mydata$distance = sqrt((mydata$V1 -555)^2 + (mydata$V2 - 85)^2)
d <- density(mydata$distance)
plot(d)
I think this is something like a kernel density estimate (KDE)?
I have a problem I wish to solve in R with example data below. I know this must have been solved many times but I have not been able to find a solution that works for me in R.
The core of what I want to do is to find how to translate a set of 2D coordinates to best fit into an other, larger, set of 2D coordinates. Imagine for example having a Polaroid photo of a small piece of the starry sky with you out at night, and you want to hold it up in a position so they match the stars' current positions.
Here is how to generate data similar to my real problem:
# create reference points (the "starry sky")
set.seed(99)
ref_coords = data.frame(x = runif(50,0,100), y = runif(50,0,100))
# take a subset of the reference coordinates to serve as the points we
# are looking for ("the Polaroid")
my_coords_final = ref_coords[c(5,12,15,24,31,34,48,49),]
# add a little bit of variation as compared to reference points
# (data should be very similar, but have a little bit of noise)
set.seed(100)
my_coords_final$x = my_coords_final$x+rnorm(8,0,.1)
set.seed(101)
my_coords_final$y = my_coords_final$y+rnorm(8,0,.1)
# create "start values" by, e.g., translating the points we are
# looking for to start at (0,0)
my_coords_start =apply(my_coords_final,2,function(x) x-min(x))
# Plot of example data, goal is to find the dotted vector that
# corresponds to the translation needed
plot(ref_coords, cex = 1.2) # "Starry sky"
points(my_coords_start,pch=20, col = "red") # start position of "Polaroid"
points(my_coords_final,pch=20, col = "blue") # corrected position of "Polaroid"
segments(my_coords_start[1,1],my_coords_start[1,2],
my_coords_final[1,1],my_coords_final[1,2],lty="dotted")
Plotting the data as above should yield:
The result I want is basically what the dotted line in the plot above represents, i.e. a delta in x and y that I could apply to the start coordinates to move them to their correct position in the reference grid.
Details about the real data
There should be close to no rotational or scaling difference between my points and the reference points.
My real data is around 1000 reference points and up to a few hundred points to search (could use less if more efficient)
I expect to have to search about 10 to 20 sets of reference points to find my match, as many of the reference sets will not contain my points.
Thank you for your time, I'd really appreciate any input!
EDIT: To clarify, the right plot represents the reference data. The left plot represents the points that I want to translate across the reference data in order to find a position where they best match the reference. That position, in this case, is represented by the blue dots in the previous figure.
Finally, any working strategy must not use the data in my_coords_final, but rather reproduce that set of coordinates starting from my_coords_start using ref_coords.
So, the previous approach I posted (see edit history) using optim() to minimize the sum of distances between points will only work in the limited circumstance where the point distribution used as reference data is in the middle of the point field. The solution that satisfies the question and seems to still be workable for a few thousand points, would be a brute-force delta and comparison algorithm that calculates the differences between each point in the field against a single point of the reference data and then determines how many of the rest of the reference data are within a minimum threshold (which is needed to account for the noise in the data):
## A brute-force approach where min_dist can be used to
## ameliorate some random noise:
min_dist <- 5
win_thresh <- 0
win_thresh_old <- 0
for(i in 1:nrow(ref_coords)) {
x2 <- my_coords_start[,1]
y2 <- my_coords_start[,2]
x1 <- ref_coords[,1] + (x2[1] - ref_coords[i,1])
y1 <- ref_coords[,2] + (y2[1] - ref_coords[i,2])
## Calculate all pairwise distances between reference and field data:
dists <- dist( cbind( c(x1, x2), c(y1, y2) ), "euclidean")
## Only take distances for the sampled data:
dists <- as.matrix(dists)[-1*1:length(x1),]
## Calculate the number of distances within the minimum
## distance threshold minus the diagonal portion:
win_thresh <- sum(rowSums(dists < min_dist) > 1)
## If we have more "matches" than our best then calculate a new
## dx and dy:
if (win_thresh > win_thresh_old) {
win_thresh_old <- win_thresh
dx <- (x2[1] - ref_coords[i,1])
dy <- (y2[1] - ref_coords[i,2])
}
}
## Plot estimated correction (your delta x and delta y) calculated
## from the brute force calculation of shifts:
points(
x=ref_coords[,1] + dx,
y=ref_coords[,2] + dy,
cex=1.5, col = "red"
)
I'm very interested to know if there's anyone that solves this in a more efficient manner for the number of points in the test data, possibly using a statistical or optimization algorithm.
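One candidate for a cheaper approach, given that only a translation is needed, is to let every (reference point, query point) pair vote for the offset it implies and take the most popular offset. The sketch below is an assumption-laden illustration, not a tested solution for the real data: estimate_shift is a hypothetical helper, the bin width has to be chosen comfortably larger than twice the spread of the noise, and it regenerates the example data from the question so that it runs on its own.

```r
## Example data generated as in the question
set.seed(99)
ref_coords <- data.frame(x = runif(50, 0, 100), y = runif(50, 0, 100))
my_coords_final <- ref_coords[c(5, 12, 15, 24, 31, 34, 48, 49), ]
set.seed(100)
my_coords_final$x <- my_coords_final$x + rnorm(8, 0, .1)
set.seed(101)
my_coords_final$y <- my_coords_final$y + rnorm(8, 0, .1)
my_coords_start <- apply(my_coords_final, 2, function(x) x - min(x))

## Hypothetical helper: vote for the most common (ref - point) offset
estimate_shift <- function(ref, pts, bin = 1) {
  ## offsets implied by every reference/query pair
  dx <- as.vector(outer(ref[, 1], pts[, 1], "-"))
  dy <- as.vector(outer(ref[, 2], pts[, 2], "-"))
  best <- NULL
  best_n <- 0
  ## four half-bin-staggered grids, so a tight cluster of votes
  ## is not split in two by a bin edge
  for (ox in c(0, 0.5)) for (oy in c(0, 0.5)) {
    key <- paste(floor(dx / bin + ox), floor(dy / bin + oy))
    tab <- table(key)
    if (max(tab) > best_n) {
      best_n <- max(tab)
      best <- key == names(tab)[which.max(tab)]
    }
  }
  ## median offset within the winning bin, plus the number of votes it got
  c(dx = median(dx[best]), dy = median(dy[best]), votes = best_n)
}

est_shift <- estimate_shift(ref_coords, my_coords_start)
```

Adding est_shift["dx"] and est_shift["dy"] to my_coords_start should move the points onto their matches. The work grows with the number of reference points times the number of query points, and a low vote count would suggest that the reference set does not contain the points at all.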
I am using R to visualize some data. I have found RGL to be a great library for plotting points.
points3d(x,y,z)
where x = c(x1,x2, ...), y = c(y1,y2,...), z = c(z1,z2, ...) and x,y,z have the same length, is a great function for plotting large sets of data.
Now, I would like to plot ellipses, mixed in with the data. I have a characterization of ellipses by a center point C, a vector describing the major axis U, and a vector describing the minor axis V. I obtain points P on the boundary of the ellipse by
P = U*cos(t) + V*sin(t) (t ranges between 0 and 2*pi)
obtaining vectors, xt, yt, and zt. Then I can plot the ellipse with
polygon3d(xt,yt,zt)
It works fine, but I'm guessing everyone reading is cringing, and will tell me that this is a bad way to do this. Indeed it takes a couple seconds to render each ellipse this way.
I don't think the ellipse3d function from the RGL package works here; at the very least, I am not working with a matrix of covariances, nor do I understand how to get the ellipse I want from this function. Also, it returns an ellipsoid, not an ellipse.
****** EDIT ************
For a concrete example that takes a while:
library(rgl)
open3d()
td <- c(0:359)
t <- td*pi/180
plotEllipseFromVector <- function(c,u,v){
xt <- c[1] + u[1]*cos(t) + v[1]*sin(t)
yt <- c[2] + u[2]*cos(t) + v[2]*sin(t)
zt <- c[3] + u[3]*cos(t) + v[3]*sin(t)
polygon3d(xt,yt,zt)
}
Input center point, major, and minor axis you want. It takes just over 2 seconds for me.
On the other hand, if I change t to be 0,20,40,... 340, then it works quite fast.
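The slowdown likely comes from polygon3d() having to triangulate an arbitrary polygon with 360 vertices. Since an ellipse is convex and its center is known, one possible workaround is to build the triangles directly as a fan around the center, or to draw only the outline. The sketch below only computes the coordinates; ellipse_points and fan_triangles are hypothetical helpers, and the rgl calls are left as comments because they need an open rgl device.

```r
## Hypothetical helper: n boundary points of the ellipse center + u*cos(t) + v*sin(t)
ellipse_points <- function(center, u, v, n = 72) {
  t <- seq(0, 2 * pi, length.out = n + 1)[-(n + 1)]  # n angles, end not repeated
  cbind(center[1] + u[1] * cos(t) + v[1] * sin(t),
        center[2] + u[2] * cos(t) + v[2] * sin(t),
        center[3] + u[3] * cos(t) + v[3] * sin(t))
}

## Hypothetical helper: triangle fan, one triangle (center, p_i, p_next)
## per boundary segment, stacked as consecutive rows of three vertices
fan_triangles <- function(center, pts) {
  n <- nrow(pts)
  nxt <- c(2:n, 1)
  tri <- matrix(NA_real_, nrow = 3 * n, ncol = 3)
  tri[seq(1, 3 * n, by = 3), ] <- matrix(center, n, 3, byrow = TRUE)
  tri[seq(2, 3 * n, by = 3), ] <- pts
  tri[seq(3, 3 * n, by = 3), ] <- pts[nxt, ]
  tri
}

pts <- ellipse_points(c(0, 0, 0), c(2, 0, 0), c(0, 1, 0))
tri <- fan_triangles(c(0, 0, 0), pts)

## With rgl loaded and a device open, either of these draws the ellipse:
## rgl::lines3d(rbind(pts, pts[1, ]))   # outline only
## rgl::triangles3d(tri)                # filled ellipse
```

Using fewer boundary points (the default here is 72 rather than 360) also helps, as already noted above for the 20-degree steps.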