I have two vectors x and y. How can I calculate the derivative of y with respect to x in R?
x<-rnorm(1000)
y<-x^2+x
I want to calculate the derivative of y with respect to x, dy/dx, assuming I don't know the underlying function relating x and y. There should be one derivative value corresponding to each x.
The only problem with your data is that it is not sorted.
set.seed(2017)
x<-rnorm(1000)
y<-x^2+x
y = y[order(x)]
x = sort(x)
plot(x,y)
Now you can divide the differences in y by the differences in x:
plot(x[-1],diff(y)/diff(x))
abline(1,2)
The result agrees well with the theoretical derivative dy/dx = 2x + 1.
If you want to get your hands on a function for the derivative, just use approxfun on all of the points that you have:
deriv = approxfun(x[-1], diff(y)/diff(x))
Once again, plotting this agrees well with the expected derivative.
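A minimal sketch of the same idea, assuming noise-free data as in the question. As a variant, each slope is placed at the midpoint of its interval rather than at x[-1], and the result is given the hypothetical name dydx_fun so that it does not mask the existing stats::deriv function.

```r
set.seed(2017)
x <- sort(rnorm(1000))
y <- x^2 + x

# Finite-difference slopes between consecutive (sorted) points
slope <- diff(y) / diff(x)
# Each slope belongs to the midpoint of its interval, not to x[-1]
xmid <- (x[-1] + x[-length(x)]) / 2

# Linear interpolation of the slopes gives a derivative function
dydx_fun <- approxfun(xmid, slope)

dydx_fun(c(-1, 0, 1))  # compare with 2x + 1 = -1, 1, 3
```

For this quadratic the secant slope between two points equals 2 * midpoint + 1 exactly, so the midpoint placement removes the small offset you get from attaching each slope to x[-1].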
To find the derivative, use the numerical approximation Δy/Δx = (y2 - y1)/(x2 - x1). In R, use the diff function to calculate the difference between consecutive points:
x<-sort(rnorm(100)) # sort so that consecutive points are neighbours in x
y<-x^2+x
#find the midpoint x between consecutive points
avex<-x[-1]-diff(x)/2
#find the numerical approximation
#delta-y/delta-x
dydx<-diff(y)/diff(x)
#plot numeric approximation
plot(x=avex, dydx)
#plot analytical answer
lines(x=avex, y=2*avex+1)
Say I have vectors x and y and want to calculate the second derivative of y with respect to x using finite differences.
I'd do
x <- rnorm(2000)
y <- x^2
y = y[order(x)]
x = sort(x)
dydx = diff(y) / diff(x)
d2ydx2 = c(NA, NA, diff(dydx) / diff(x[-1]))
plot(x, d2ydx2)
As you can see, a few points are wildly inaccurate. I believe the problem arises because the values in dydx do not correspond exactly to those of x[-1], so the second differentiation gives inaccurate results. Since the step in x is not constant, second-order differentiation is not straightforward. How can I do this?
Each time you take a numerical derivative, you lose one value from the vector and the output shifts over by one position. You are correct that the error is due to the uneven spacing of the x values: the slopes in dydx belong to the midpoints between x values, so dividing by diff(x[-1]) in the d2ydx2 calculation uses the wrong spacing.
To correct this, compute a new set of x values at the midpoint between adjacent x values at each differentiation step. This is the point at which the slope is actually estimated.
Thus y'_1 ≈ f'((x_1 + x_2)/2).
This method is not perfect, but the resulting error is much smaller.
#create the input
x <- sort(rnorm(2000))
y <- x^2
#calculate the first derivative and the new mean x value
xprime <- x[-1] - diff(x)/2
dydx <- diff(y)/diff(x)
#calculate the 2nd derivative and the new mean x value
xpprime <- xprime[-1] - diff(xprime)/2
d2ydx2 <- diff(dydx)/diff(xprime)
plot(xpprime, d2ydx2)
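The midpoint bookkeeping above can be wrapped in a small helper and applied repeatedly. This is a sketch; the name mid_deriv is hypothetical, and the deterministic uneven grid and sin(x) test function are assumptions chosen only because their derivatives are known.

```r
# Hypothetical helper: one finite-difference step that also returns the
# midpoints at which each slope applies
mid_deriv <- function(x, y) {
  list(x = (x[-1] + x[-length(x)]) / 2,
       y = diff(y) / diff(x))
}

# Unevenly spaced, sorted test grid (gaps of 0.05 and 0.1)
x <- sort(c(seq(-2, 2, by = 0.1), seq(-1.95, 1.95, by = 0.3)))
y <- sin(x)

d1 <- mid_deriv(x, y)        # approximates cos() at d1$x
d2 <- mid_deriv(d1$x, d1$y)  # approximates -sin() at d2$x

max(abs(d2$y - (-sin(d2$x))))  # small residual error from the uneven spacing
```

For y = x^2 this scheme returns exactly 2 (up to rounding) even on an uneven grid, which matches the splinefun remark below.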
Another way is using splinefun, which returns a function from which you can calculate cubic spline derivatives.
Of course, for your example function y = x^2 the second derivative is always 2.
x <- rnorm(2000)
y <- x^2
y = y[order(x)]
x = sort(x)
fun = splinefun(x,y)
plot(x,fun(x,deriv=2))
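A sketch showing that splinefun also handles unevenly spaced, unsorted x and gives first and second derivatives at arbitrary points. The sin(x) data on a fixed uneven grid are an assumption used only because the exact derivatives are known.

```r
# Deterministic, unevenly spaced (and unsorted) sample of sin(x)
x <- c(seq(-3, 3, by = 0.2), seq(-2.9, 2.9, by = 0.6))
y <- sin(x)

fun <- splinefun(x, y)        # cubic interpolating spline; sorts x internally

xx <- seq(-2, 2, by = 0.25)   # evaluate away from the end points
err1 <- max(abs(fun(xx, deriv = 1) - cos(xx)))
err2 <- max(abs(fun(xx, deriv = 2) + sin(xx)))
c(err1, err2)
```

Interpolating splines assume noise-free data; with noisy measurements a smoothing spline is usually the safer choice before differentiating.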
I am trying to simulate values from an unknown integral (to create a climatological forecaster).
My function is: $\int_{0}^{0.25} 4\,y^{-1/x}\,dx$
Normally one inputs the variable y and gets a value as output.
However, I want to input the value this integral is equal to and get the value of y as an output.
I have three runif vectors of length 1,000, 10,000 and 100,000 (with values between 0 and 1), which I use as my input values.
Say the first value is 0.3 and the second value is 0.78.
I want to find the y for which the integral above equals 0.3 (or 0.78 for the second value).
How can I do this in R?
I've tried a few things with the integrate function, but then I need a value for y to make it work.
You are trying to solve a non-linear equation with an integral inside.
Intuitively, you need to start with an interval that contains the desired y. Then try different values of y, calculate the integral for each, and narrow the interval based on the result.
You can implement that in R using integrate and optimize as below:
f <- function(x, y) {
4*y^(-1/x)
}
intf <- function(y) {
integrate(f, 0, 0.25, y=y)
}
objective <- function(y, value) {
abs(intf(y)$value - value)
}
optimize(objective, c(1, 10), value=0.3)
#$minimum
#[1] 1.14745
#
#$objective
#[1] 1.540169e-05
optimize(objective, c(1, 10), value=0.78)
#$minimum
#[1] 1.017891
#
#$objective
#[1] 0.0001655954
Here, f is the function to be integrated, intf calculates the integral for a given y, and objective measures the distance between the value of the integral and the desired value.
Since the optimize function finds the minimum of a function, it finds the y for which the integral is closest to the target value.
Note that non-linear equations with an integral inside are hard to solve in general. This case is manageable because the integral is continuous and monotonic in y, so the solution y should be unique and easy to find by narrowing down the interval.
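Because the integral is monotonic in y, a root finder is a natural alternative to minimizing an absolute difference. This sketch swaps optimize for uniroot on intf(y) - value; the helper name solve_y and the bracket c(1, 10) are assumptions. At y = 1 the integrand is the constant 4, so the integral equals 1, and for y > 1 it decreases towards 0, which brackets any target in (0, 1) that is not extremely close to 0.

```r
f <- function(x, y) 4 * y^(-1/x)
intf <- function(y) integrate(f, 0, 0.25, y = y)$value

# Hypothetical helper: integral(y) - value changes sign once on [1, 10]
# for targets that are not extremely small
solve_y <- function(value, interval = c(1, 10)) {
  uniroot(function(y) intf(y) - value, interval, tol = 1e-8)$root
}

roots <- sapply(c(0.3, 0.78), solve_y)
roots
```

Very small targets (for example runif values around 1e-5 or below) need a larger upper bound, because the integral at y = 10 is already of that order. The precision is ultimately limited by the default tolerance of integrate.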
This question already has answers here: How do I best simulate an arbitrary univariate random variate using its probability function? Closed 9 years ago.
How can I generate random sample data from the quantiles of the unknown density f(x) for x between 0 and 4 in R?
f = function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
If I understand you correctly, you want to generate random samples from the distribution whose density function is f(x). One way to do this is to generate a random sample from a uniform distribution, U[0,1], and then transform this sample to your density. This is done using the inverse CDF of f, a method that has been described before, here.
So, let
f(x) = your density function,
F(x) = cdf of f(x), and
F.inv(y) = inverse cdf of f(x).
In R code:
f <- function(x) {((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))}
F <- function(x) {integrate(f,0,x)$value}
F <- Vectorize(F)
F.inv <- function(y){uniroot(function(x){F(x)-y},interval=c(0,10))$root}
F.inv <- Vectorize(F.inv)
x <- seq(0,5,length.out=1000)
y <- seq(0,1,length.out=1000)
par(mfrow=c(1,3))
plot(x,f(x),type="l",main="f(x)")
plot(x,F(x),type="l",main="CDF of f(x)")
plot(y,F.inv(y),type="l",main="Inverse CDF of f(x)")
In the code above, since f(x) is only defined on [0, ∞), we calculate F(x) as the integral of f(x) from 0 to x. Then we invert that by applying uniroot(...) to F(x) - y. Vectorize(...) is needed because, unlike most R functions, integrate(...) and uniroot(...) do not operate on vectors. See the help files for these functions for more information.
Now we just generate a random sample X from U[0,1] and transform it with Z = F.inv(X):
X <- runif(1000,0,1) # random sample from U[0,1]
Z <- F.inv(X)
Finally, we demonstrate that Z is indeed distributed as f(x).
par(mfrow=c(1,2))
plot(x,f(x),type="l",main="Density function")
hist(Z, breaks=20, xlim=c(0,5))
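For this particular f the numerical integration and root finding can be cross-checked against a closed form. Writing g(x) = x^3/3 - x^2 + x = ((x-1)^3 + 1)/3, we have g'(x) = (x-1)^2, so f(x) = g'(x) exp(-g(x)). Hence F(x) = 1 - exp(-g(x)) on [0, ∞), the density already integrates to 1, less than 1e-4 of the mass lies beyond x = 4, and F(x) = u inverts to x = 1 + cbrt(-3 log(1 - u) - 1). A sketch of the same inverse-CDF method using that closed form (the names F.exact, cbrt and F.inv.exact are hypothetical):

```r
f <- function(x) ((x-1)^2) * exp(-(x^3/3 - 2*x^2/2 + x))

# CDF in closed form: F(x) = 1 - exp(-g(x)), with g(x) = ((x - 1)^3 + 1)/3
F.exact <- function(x) 1 - exp(-((x - 1)^3 + 1) / 3)

# Real cube root (z^(1/3) returns NaN for negative z in R)
cbrt <- function(z) sign(z) * abs(z)^(1/3)

# Solving F(x) = u for x gives the inverse CDF directly
F.inv.exact <- function(u) 1 + cbrt(-3 * log(1 - u) - 1)

Z <- F.inv.exact(runif(1000))   # inverse-CDF sampling without integrate/uniroot
```

This avoids calling integrate and uniroot once per sample, which matters for large samples.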
Rejection sampling is easy enough:
drawF <- function(n) {
f <- function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
x <- runif(n, 0, 4) # proposals, uniform on [0, 4]
z <- runif(n) # valid because f(x) <= 1 on [0, 4]
subset(x, z < f(x)) # rejection: keep x with probability f(x)
}
It's not the most efficient approach, and it returns fewer than n values (only the accepted draws), but it gets the job done.
Use sample. Generate a vector of probabilities from your existing function f, normalized properly. From the help page:
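If you need exactly n draws, one option is to keep drawing batches until enough proposals have been accepted. This is a sketch around the same acceptance rule; the name drawF_n is hypothetical. It relies on f(x) <= 1 on [0, 4] (the maximum there is f(0) = 1), which is what makes comparing a U(0, 1) draw with f(x) valid.

```r
f <- function(x) ((x-1)^2) * exp(-(x^3/3 - 2*x^2/2 + x))

# Hypothetical wrapper: repeat rejection sampling until n draws are accepted
drawF_n <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, 0, 4)           # uniform proposals on [0, 4]
    z <- runif(n)                 # U(0, 1), since f(x) <= 1 on [0, 4]
    out <- c(out, x[z < f(x)])    # accept x with probability f(x)
  }
  out[seq_len(n)]                 # drop the surplus accepted draws
}

Z <- drawF_n(1000)
```

Roughly a quarter of the proposals are accepted, since the area under f on [0, 4] is close to 1 and the bounding box has area 4.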
sample(x, size, replace = FALSE, prob = NULL)
Arguments
x Either a vector of one or more elements from which to choose, or a positive integer. See ‘Details.’
n a positive number, the number of items to choose from. See ‘Details.’
size a non-negative integer giving the number of items to choose.
replace Should sampling be with replacement?
prob A vector of probability weights for obtaining the elements of the vector being sampled.
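A sketch of that approach, assuming a fine grid on [0, 4] is an acceptable discretization of the density. The grid size is an arbitrary choice, and every draw is a grid point, so the resolution is limited by the grid spacing.

```r
f <- function(x) ((x-1)^2) * exp(-(x^3/3 - 2*x^2/2 + x))

# Discretize [0, 4] on a fine grid and use f as sampling weights
grid <- seq(0, 4, length.out = 10001)
p <- f(grid)
p <- p / sum(p)   # normalize (sample() would also rescale prob on its own)

Z <- sample(grid, size = 1000, replace = TRUE, prob = p)
```

replace = TRUE is needed here, because without it each grid point could be drawn at most once.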
Is there an easy way to calculate the derivative of non-linear functions that are given by data?
For example:
x = 1 / c(1000:1)
y = x^-1.5
ycs = cumsum(y)
plot (x, ycs, log="xy")
How can I calculate the derivative function from the function given by `x` and `ycs`?
I was also going to suggest a smoothing spline fit followed by prediction of the derivative. In this case, the results are very similar to the diff calculation described by @dbaupp:
spl <- smooth.spline(x, y=ycs)
pred <- predict(spl)
plot (x, ycs, log="xy")
lines(pred, col=2)
ycs.prime <- diff(ycs)/diff(x)
pred.prime <- predict(spl, deriv=1)
plot(ycs.prime)
lines(pred.prime$y, col=2)
Generating derivatives from raw data is risky unless you are very careful. Not for nothing is this process known as "error multiplier." Unless you know the noise content of your data and take some action (e.g. spline) to remove the noise prior to differentiation, you may well end up with a scary curve indeed.
The derivative of a function is dy/dx, which can be approximated by Δy/Δx, that is, "change in y over change in x". This can be written in R as
ycs.prime <- diff(ycs)/diff(x)
and now ycs.prime contains an approximation to the derivative of the function at each x. However, it is a vector of length 999, so you will need to shorten x (i.e. use x[1:999] or x[2:1000]) when doing any analysis or plotting.
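If you would rather have one derivative estimate per x than shorten x, a base-R sketch is to use centered differences at interior points and one-sided differences at the two ends. The helper name diff_same_length is hypothetical, and this is not claimed to reproduce any package function exactly.

```r
# Hypothetical helper: derivative estimate with one value per x.
# Interior points use (y[i+1] - y[i-1]) / (x[i+1] - x[i-1]);
# the two end points fall back to one-sided differences.
diff_same_length <- function(x, y) {
  n <- length(x)
  d <- numeric(n)
  d[1] <- (y[2] - y[1]) / (x[2] - x[1])
  d[n] <- (y[n] - y[n - 1]) / (x[n] - x[n - 1])
  if (n > 2) {
    i <- 2:(n - 1)
    d[i] <- (y[i + 1] - y[i - 1]) / (x[i + 1] - x[i - 1])
  }
  d
}

x <- 1 / c(1000:1)
y <- x^-1.5
ycs <- cumsum(y)
ycs.prime2 <- diff_same_length(x, ycs)   # length 1000, aligned with x
```

On an evenly spaced grid the centered difference is exact for quadratics; on an uneven grid like this x it is only an approximation, like the plain diff version.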
There is also gradient from the pracma package.
grad <- pracma::gradient(ycs, h1 = x)
plot(grad, col = 1)
Refer to the R code below. The function someRfunction operates on a vector and returns a scalar value. The data are pairs (x, y), where x and y are vectors of length n, which may be large.
I want to find the value x* that maximizes someRfunction applied to the y values with x > x*. The function operates on y values and is non-monotonic in x*. I need to evaluate it for all x* (i.e. each element of x). Speed is not an issue if the code runs once, but it will run many times in a simulation. Is there any way to make this code more efficient/faster?
### x and y are vectors of length n
### sort x and y such that they are ordered by descending x
ord <- order(-x)
xord <- x[ord]
yord <- y[ord]
maxf <- -Inf
maxcut <- NA
for (i in 1:n) {
### yi is a subvector of y that corresponds to y[x>x{i}]
### where x{i} is the (n-i+1)th order statistic of x
### (seq_len(i-1) is empty for i = 1, whereas 1:0 would select yord[1])
yi <- yord[seq_len(i-1)]
fxi <- someRfunction(yi)
if (fxi>maxf) {
maxf <- fxi
maxcut <- xord[i]
}
}
Thanks.
Edit: let someRfunction(yi)=t.test(yi)$statistic.
If you can say anything more about the function, particularly whether it is smooth and whether its gradient can be determined, you will get a better answer. At the moment, the only speed-up available is modest: pre-allocate a vector to hold the results, omit the if-max clause, and then use which.max() on the vector. You might also want to look at the function optimx in the package "optimx".
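For the specific case in the edit, someRfunction(yi) = t.test(yi)$statistic, a different technique avoids calling t.test n times: the one-sample t statistic (mu = 0) of every prefix yord[1:k] can be computed from running sums, after which which.max picks the best cut. This is a sketch; prefix_t and best_cut are hypothetical names, it assumes no ties in x and n >= 3, and the running-sum variance can lose precision when the mean is large relative to the spread.

```r
# Hypothetical helper: t statistics (mu = 0) for every prefix yord[1:k]
prefix_t <- function(yord) {
  k  <- seq_along(yord)
  m  <- cumsum(yord) / k                          # running means
  v  <- (cumsum(yord^2) - k * m^2) / (k - 1)      # running sample variances
  v[k < 2] <- NA                                  # undefined for a single value
  m / sqrt(v / k)
}

# Hypothetical helper: cut x* = xord[k + 1] keeps exactly yord[1:k]
best_cut <- function(x, y) {
  ord  <- order(-x)
  xord <- x[ord]
  yord <- y[ord]
  n    <- length(x)
  tk   <- prefix_t(yord)
  k    <- 2:(n - 1)                               # prefixes t.test can handle
  best <- k[which.max(tk[k])]
  list(maxf = tk[best], maxcut = xord[best + 1])
}

set.seed(1)
x <- runif(50)
y <- rnorm(50)
best_cut(x, y)
```

This replaces n calls to t.test with a few vectorized passes over the data, so the cost grows linearly in n.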