I want to calculate the density of a multivariate normal distribution manually. As inputs to my function, I have x, which is an n*p matrix of data points, a vector mu with p means, and a covariance matrix sigma of dimension p*p.
I wrote the following function for this:
`dmnorm <- function(mu, sigma, x){
k <- ncol(sigma)
x <- t(x)
dmn <- exp((-1/2)*t(x-mu)%*%solve(sigma)%*%(x-mu))/sqrt(((2*pi)^k)*det(sigma))
return(dmn)
}`
My own function gives me a matrix of n*n. However, I should get a vector of length n.
In the end, I want the same results as I get from using the dmvnorm() function from the mvtnorm package. What's wrong with my code?
The expression t(x-mu)%*%solve(sigma)%*%(x-mu) is n x n (after the transpose, x-mu is p x n), so that's why your result is that size. You want the diagonal of that matrix, which you can get using
diag(t(x-mu)%*%solve(sigma)%*%(x-mu))
so the full function should be
dmnorm <- function(mu, sigma, x){
k <- ncol(sigma)
x <- t(x)
dmn <- exp((-1/2)*diag(t(x-mu)%*%solve(sigma)%*%(x-mu)))/sqrt(((2*pi)^k)*det(sigma))
dmn
}
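If the n x n intermediate matrix becomes a concern for large n, one possible alternative is to compute only the n quadratic forms. The following is a minimal sketch, assuming base R's mahalanobis() from the stats package; the name dmnorm_fast and the example data are made up for illustration and are not part of the answer above.

```r
# Hypothetical alternative (the name dmnorm_fast is illustrative only).
# mahalanobis() returns the squared distance (x_i - mu)' sigma^{-1} (x_i - mu)
# for each row x_i of x, so no n x n matrix is ever formed.
dmnorm_fast <- function(mu, sigma, x) {
  k <- ncol(sigma)
  d2 <- mahalanobis(x, center = mu, cov = sigma)
  exp(-d2 / 2) / sqrt((2 * pi)^k * det(sigma))
}

# Small made-up example: 5 points in 2 dimensions
set.seed(1)
x <- matrix(rnorm(10), ncol = 2)
mu <- c(0, 1)
sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
dens <- dmnorm_fast(mu, sigma, x)
```

If mvtnorm is installed, the result can be compared with mvtnorm::dmvnorm(x, mean = mu, sigma = sigma).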
I am trying to create a linear regression function in R for n parameters, but I don't know how to proceed. I have created a function for two variables.
new_lm <- function(y,x){
z=cbind(1,x)
k= solve(t(z)%*%z) %*% t(z) %*% y
return(k)
}
But in this case I pass the values directly. Suppose I want to use it with n parameters, so that the same function works for n = 1, 2, etc.
Something like this might work:
new_lm <- function(y,...){
x <- do.call(cbind, list(...))
z <- cbind(1,x)
k <- solve(t(z)%*%z) %*% t(z) %*% y
return(k)
}
By the way, computing linear regressions this way is terrible in practice (although for small problems it will work fine); you should use QR or singular value decomposition, or some other more sophisticated bit of linear algebra ...
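As a rough illustration of the QR route mentioned above, here is a minimal sketch using base R's qr() and qr.coef(). The name new_lm_qr and the simulated data are assumptions made for the example only.

```r
# Hypothetical QR-based variant (the name new_lm_qr is illustrative only).
new_lm_qr <- function(y, ...) {
  z <- cbind(1, ...)     # design matrix with an intercept column
  qr.coef(qr(z), y)      # least-squares coefficients via the QR decomposition
}

# Made-up data with two predictors
set.seed(1)
x1 <- rnorm(20)
x2 <- rnorm(20)
y <- 1 + 2 * x1 - 3 * x2 + rnorm(20, sd = 0.1)
beta <- new_lm_qr(y, x1, x2)
```

This avoids forming and inverting t(z) %*% z explicitly, which is where the normal-equations approach tends to lose accuracy for ill-conditioned problems.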
As an exercise, I'm trying to write a function which replicates the rgeom() function. I want it to have the same arguments and return values. I've started out by using runif to generate a vector with x elements, but I'm not sure how to apply the probability distribution:
rgeometric <- function(x, prob) {
outcomes <- runif(x)
P <- (1 - prob)^length(x) * prob
return (P)
}
Would it be something like the following? How can I check that the distribution is geometric?
set.seed(0)
rgeometric <- function(x, prob) {
outcomes <- runif(x)
P <- (1 - prob)^length(x) * prob
for (i in x) {
x[i] <- x[i]*P
}
return (outcomes)
}
rgeometric(5, 0.4)
We can accomplish this task using Inverse Transform Sampling.
First, let's clear up some of your notation.
In the rgeom() function, we'll want that first argument to be n, an integer vector of length one giving the number of samples to generate:
rgeometric <- function(n, prob) {
u <- runif(n)
## do stuff
}
So how does inverse transform sampling work?
First we generate a vector u of standard uniform deviates, as shown above.
Then, for each element ui of u, we find the value of the inverse of the cumulative distribution function (CDF) at ui.
For the geometric distribution, the CDF is 1 - (1 - prob)^(x+1); the inverse of the CDF is ceiling(log(1-u) / log(1-prob)) - 1 (link to derivation, p. 11).
So, we can complete the function like so:
rgeometric <- function(n, prob) {
u <- runif(n)
return(ceiling(log(1-u) / log(1-prob)) - 1)
}
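One way to sanity-check the inverse-CDF expression is to compare it with base R's quantile function qgeom() at a few probabilities. This is a small sketch; the values in u and prob are arbitrary choices for illustration.

```r
# Compare the closed-form inverse CDF with R's built-in geometric quantiles.
u <- c(0.1, 0.3, 0.5, 0.9)   # arbitrary probabilities
prob <- 0.25                 # arbitrary success probability
inv_manual <- ceiling(log(1 - u) / log(1 - prob)) - 1
inv_builtin <- qgeom(u, prob)
agree <- isTRUE(all.equal(inv_manual, inv_builtin))
```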
Your last question is how we can test whether the resulting samples are geometrically distributed.
I don't know of a formal test that will help, but we can see it appears to work when we compare the density of 1 million random draws from this custom function to the density of 1 million random draws from base R's rgeom() function:
n <- 1e6
p <- 0.25
set.seed(0)
x <- rgeometric(n, p)
y <- rgeom(n, p)
png("so-answer.png", width = 960)
opar <- par(mfrow = c(1, 2))
plot(density(x), main = "Draws from custom function")
plot(density(y), main = "Draws from base R function")
par(opar)
dev.off()
Note that under the definition of the geometric distribution implemented in R, the random variable is the number of failures before the first success. Therefore you could do:
my_rgeom <- function(n, p){
fun <- function(p){
n <- 0
stopifnot(p>0)
while(runif(1)>p) n <- n+1
n
}
replicate(n, fun(p))
}
Now test the function:
n <- 100000
p <- 0.25
X <- rgeom(n, p)
Y <- my_rgeom(n, p)
You can do a ks.test on X and Y, though this is for continuous variables. The best thing to do is the chisq.test to determine whether the two are similar.
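One possible way to set up the chi-squared check is as a goodness-of-fit test against the theoretical geometric probabilities. The sketch below uses draws from rgeom() as a stand-in sample (replace it with the output of your own sampler); the cutoff kmax, which pools the right tail into a single cell, is an arbitrary choice made for illustration.

```r
set.seed(123)
n <- 1e5
p <- 0.25
draws <- rgeom(n, p)   # stand-in sample; substitute your own sampler's output

kmax <- 15             # arbitrary cutoff: values >= kmax are pooled into one cell
observed <- table(factor(pmin(draws, kmax), levels = 0:kmax))

# Theoretical cell probabilities: P(X = 0), ..., P(X = kmax - 1), P(X >= kmax)
expected <- c(dgeom(0:(kmax - 1), p),
              pgeom(kmax - 1, p, lower.tail = FALSE))

res <- chisq.test(as.vector(observed), p = expected)
res$p.value            # a small p-value would suggest a mismatch
```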
Lastly, we could use graphical methods, e.g. superimposed bar plots:
barplot(table(X), col = rgb(0.5, 1, 0.5, 0.4))
barplot(table(Y), add = TRUE, col = rgb(1, 0.5, 0, 0.3))
In the resulting plot you can see that the two are nearly identical.
For lm or glm type objects, or even lmer type objects, you can extract the hat values from the model by using the R function hatvalues(). However, this doesn't work with nls objects, apparently. I have Googled every which way, but I can't find a way to get these values. Does nls simply not create a hat matrix, or are the hat values produced from a non-linear least squares model just not reliable somehow?
Reproducible example:
xs = rep(1:10, times = 10)
ys = 3 + 2*exp(-0.5*xs)
for (i in 1:100) {
xs[i] = rnorm(1, xs[i], 2)
}
df1 = data.frame(xs, ys)
nls1 = nls(ys ~ a + b*exp(d*xs), data=df1, start=c(a=3, b=2, d=-0.5))
There's a nice article (On the Outlier Detection in Nonlinear Regression) in which the hat matrix is approximated using the gradient matrix computed at the estimated parameter values.
In your case:
# gradient of the model function at the current parameter values
V <- nls1$m$gradient()
# tangent plane leverage matrix (it plays a similar role as the Hat matrix)
H <- V %*% solve(t(V) %*% V) %*% t(V)
# 'hat' values for nls
nls1.hat_values <- diag(H)
And if you follow this article you can calculate H a little bit faster:
Q1 <- qr.Q(qr(V)) # V is the same matrix as above
H <- Q1 %*% t(Q1)
Since H can be quite big, if you only want the hat values you can skip the matrix multiplication altogether. We only need the diagonal of the H matrix.
###
#' Approximation of hat values for nls.
#'
#' @param model An 'nls' object
#' @param ... Additional parameters (ignored)
#' @return Vector of approximated hat values
###
hatvalues.nls <- function(model, ...) {
stopifnot(is(model, 'nls'))
list(...) # ignore additional parameters
V <- model$m$gradient()
Q1 <- qr.Q(qr(V))
rowSums(Q1*Q1)
}
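A quick sanity check for leverage values of this kind: the diagonal of a projection matrix sums to its rank, which here should be the number of model parameters. The sketch below assumes simulated data (the seed, noise level, and variable names are made up) and compares the QR shortcut with the explicit projection.

```r
# Simulated data for illustration only
set.seed(42)
x <- rep(1:10, times = 10)
y <- 3 + 2 * exp(-0.5 * x) + rnorm(100, sd = 0.05)
fit <- nls(y ~ a + b * exp(d * x), start = c(a = 3, b = 2, d = -0.5))

V <- fit$m$gradient()                    # n x 3 gradient matrix at the estimates
h_qr <- rowSums(qr.Q(qr(V))^2)           # diagonal of Q1 %*% t(Q1), without forming it
h_full <- diag(V %*% solve(crossprod(V)) %*% t(V))  # explicit projection, for comparison

sum(h_qr)                                # should be close to the number of parameters
```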
So I know you can find the derivative of something like: "x^3-6*x^2" by doing: D(expression(x^3-6*x^2), 'x'), but what if I need to find the first derivative maximum of a list of values such as:
value <- c(610,618,627,632,628,634,634,628,634,642,637,643,653,666,684,717,787,923,1197,1716,2638,4077,5461,7007,8561,9994,11278,12382,13382,14252)
These values are the y coordinates; the x coordinate starts at 1 and increments by 1, i.e. the first point is (1, 610), the second is (2, 618), etc.
Consider using the package numDeriv from CRAN. It has a function grad that computes the derivative of a function at a point. Example:
f = function(x) x^3 - 6*x^2
library(numDeriv)
grad(f, 1) #derivative of f at x=1
To solve your problem with a list of values, use a for loop:
xval <- c(YOUR VALUES HERE)
xval.derivatives <- c() #empty vector to hold
for(i in 1:length(xval)) xval.derivatives[i] <- grad(f,xval[i])
The gradient function from the pracma package calculates the derivative from a vector of values.
library(pracma)
value <- c(610,618,627,632,628,634,634,628,634,642,637,643,653,666,684,717,787,923,1197,1716,2638,4077,5461,7007,8561,9994,11278,12382,13382,14252)
value_prime <- pracma::gradient(value, h1 = 1)
plot(value_prime)
Alternatively, fit a spline.
spl <- smooth.spline(1:length(value), y=value)
pred <- predict(spl)
pred.prime <- predict(spl, deriv=1)
plot(pred.prime, type = 'b')
If you are interested in higher derivatives, check the pspline package.
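To locate where the first derivative is largest without any extra packages, finite differences in base R may be enough. This is a minimal sketch assuming unit spacing in x, as stated in the question; the variable names are made up for the example.

```r
value <- c(610,618,627,632,628,634,634,628,634,642,637,643,653,666,684,
           717,787,923,1197,1716,2638,4077,5461,7007,8561,9994,11278,
           12382,13382,14252)
n <- length(value)

# Forward differences: d_forward[i] approximates the slope between x = i and x = i + 1
d_forward <- diff(value)
i_forward <- which.max(d_forward)

# Central differences at interior points x = 2, ..., n - 1
d_central <- (value[3:n] - value[1:(n - 2)]) / 2
x_central <- which.max(d_central) + 1   # shift by one to get the x coordinate
```

Here the steepest rise is between x = 24 and x = 25 (a forward difference of 1554), and the central difference also peaks at x = 24.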
How can I generate random sample data from the quantiles of the unknown density f(x) for x between 0 and 4 in R?
f = function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
If I understand you correctly, you want to generate random samples from the distribution whose density function is given by f(x). One way to do this is to generate a random sample from a uniform distribution, U[0,1], and then transform this sample to your density. This is done using the inverse CDF of f, a methodology that has been described before.
So, let
f(x) = your density function,
F(x) = cdf of f(x), and
F.inv(y) = inverse cdf of f(x).
In R code:
f <- function(x) {((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))}
F <- function(x) {integrate(f,0,x)$value}
F <- Vectorize(F)
F.inv <- function(y){uniroot(function(x){F(x)-y},interval=c(0,10))$root}
F.inv <- Vectorize(F.inv)
x <- seq(0,5,length.out=1000)
y <- seq(0,1,length.out=1000)
par(mfrow=c(1,3))
plot(x,f(x),type="l",main="f(x)")
plot(x,F(x),type="l",main="CDF of f(x)")
plot(y,F.inv(y),type="l",main="Inverse CDF of f(x)")
In the code above, since f(x) is only defined on [0, Inf), we calculate F(x) as the integral of f(x) from 0 to x. Then we invert that by applying uniroot(...) to F(x) - y. Vectorize(...) is needed because, unlike most R functions, integrate(...) and uniroot(...) do not accept vectors for their limits or targets. You should look up the help files on these functions for more information.
Now we just generate a random sample X drawn from U[0,1] and transform it with Z = F.inv(X)
X <- runif(1000,0,1) # random sample from U[0,1]
Z <- F.inv(X)
Finally, we demonstrate that Z is indeed distributed as f(x).
par(mfrow=c(1,2))
plot(x,f(x),type="l",main="Density function")
hist(Z, breaks=20, xlim=c(0,5))
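For this particular density the numerical steps can be cross-checked analytically, because (x-1)^2 is the derivative of the exponent x^3/3 - x^2 + x, so the CDF works out to 1 - exp(-(x^3/3 - x^2 + x)), and since that exponent equals ((x-1)^3 + 1)/3 the inverse has a closed form as well. The sketch below relies on that observation; the names F_closed and F_inv_closed are made up for illustration.

```r
f <- function(x) ((x - 1)^2) * exp(-(x^3 / 3 - 2 * x^2 / 2 + x))

# Closed-form CDF on [0, Inf), using the identity described above
F_closed <- function(x) 1 - exp(-(x^3 / 3 - x^2 + x))

# Closed-form inverse: solve ((x - 1)^3 + 1) / 3 = -log(1 - u) for x
F_inv_closed <- function(u) {
  a <- -3 * log(1 - u) - 1
  1 + sign(a) * abs(a)^(1 / 3)   # real cube root, valid for negative a too
}

# Draw a sample without integrate() or uniroot()
set.seed(1)
Z2 <- F_inv_closed(runif(1000))
```

This is much faster than root finding on a numerical integral, but it only works because this density happens to have an invertible closed-form CDF.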
Rejection sampling is easy enough:
drawF <- function(n) {
f <- function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
x <- runif(n, 0 ,4)
z <- runif(n)
subset(x, z < f(x)) # Rejection
}
Not the most efficient but it gets the job done.
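Note that the function above returns fewer than n values, since rejected proposals are dropped. It also relies on f(x) <= 1 on [0, 4], which holds for this density (its maximum is f(0) = 1). If exactly n draws are needed, one option is to keep proposing until enough have been accepted. A minimal sketch, with the hypothetical name drawF_n:

```r
# Hypothetical wrapper (the name drawF_n is illustrative only):
# repeat the propose/accept step until n draws have been accepted.
drawF_n <- function(n) {
  f <- function(x) ((x - 1)^2) * exp(-(x^3 / 3 - 2 * x^2 / 2 + x))
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, 0, 4)          # proposals from U[0, 4]
    z <- runif(n)                # uniform heights; valid because f <= 1 on [0, 4]
    out <- c(out, x[z < f(x)])   # keep accepted proposals
  }
  out[seq_len(n)]
}

set.seed(1)
draws <- drawF_n(500)
```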
Use sample(). Generate a vector of probabilities from your existing function f, normalized properly. From the help page:
sample(x, size, replace = FALSE, prob = NULL)
Arguments
x Either a vector of one or more elements from which to choose, or a positive integer. See ‘Details.’
n a positive number, the number of items to choose from. See ‘Details.’
size a non-negative integer giving the number of items to choose.
replace Should sampling be with replacement?
prob A vector of probability weights for obtaining the elements of the vector being sampled.
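A minimal sketch of this idea, assuming a fine grid on [0, 4] as a discrete approximation to the continuous density; the grid resolution and variable names are arbitrary choices made for illustration.

```r
f <- function(x) ((x - 1)^2) * exp(-(x^3 / 3 - 2 * x^2 / 2 + x))

grid <- seq(0, 4, length.out = 4001)   # candidate values (arbitrary resolution)
w <- f(grid)                           # unnormalized weights from the density
prob <- w / sum(w)                     # normalize so the weights sum to 1

set.seed(1)
draws <- sample(grid, size = 1000, replace = TRUE, prob = prob)
```

The draws can only take values on the grid, so this is an approximation whose quality depends on the grid spacing.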