So I am sampling from a multivariate normal distribution in R, and am trying to figure out how to calculate its 95% confidence ellipse using the ellipse() function in the package car.
Here is some code I am running:
mu = c(0,0)
sigma = matrix(c(20,0,0,45),nrow=2)
z = rmvnorm(10000,mu,sqrt(sigma))
par(mfrow=c(1,2))
plot(z)
ellipse(mu,sqrt(sigma*qchisq(.05,2)),radius=1)
dataEllipse(z,levels=.95)
So basically I want the ellipse command to replicate the dataEllipse command. If anyone has any suggestions that would be greatly appreciated!
Edit: Using Dwin's code combined with my own:
library(car)
library(mvtnorm)
mu = c(0,0)
sigma = matrix(c(20,0,0,45),nrow=2)
z = rmvnorm(10000,mu,sqrt(sigma))
dataEllipse(z,levels=.95)
car::ellipse(mu, sigma*qchisq(.05,2), col="blue",
radius=sqrt(2 * qf(.975, 2, 9998)) )
So as you can see, the ellipses are still not the same...
I'm guessing (although you should not have made me do so) that rmvnorm is from 'mixtools' and that it was loaded after 'car'. I do not think the sqrt() function is needed, since the argument to ellipse is supposed to be a covariance matrix. Also, at the moment it is plotting but you cannot see it, because you didn't color it red (or anything). Furthermore, both 'mixtools' and 'car' have ellipse functions, so if you want the car version (which does have a radius argument, unlike the mixtools version) then you need to call it with the double-colon convention:
library(car); library(mixtools)
car::ellipse(mu, sigma*qchisq(.05,2), col="red",
radius=sqrt(2 * qf(.975, 2, 9998)) )
Since this post is still getting views, I'll provide the actual answer. The last three lines of this code snippet replicate car::dataEllipse exactly:
library(car)
library(mvtnorm)
mu = c(0,0)
sigma = matrix(c(20,0,0,45),nrow=2)
z = rmvnorm(10000,mu,sigma)
dataEllipse(z,levels=.95)
center <- apply(z, 2, mean)
cov_mat <- cov(z)
ellipse(center, cov_mat, col="red", radius=sqrt(2 * qf(.95, 2, 9999)))
Note that both car::dataEllipse and car::ellipse return the coordinates of the points silently, so one can confirm that the points are indeed equal.
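To see why that radius gives a 95% ellipse, here is a minimal base-R sketch of the same construction. It is an illustration under assumptions, not car's actual source: the names unit_circle, ell and coverage are made up here, and the sample is drawn with rnorm() and chol() instead of mvtnorm so that the block needs no extra packages.

```r
set.seed(42)
n     <- 10000
mu    <- c(0, 0)
sigma <- matrix(c(20, 0, 0, 45), nrow = 2)

# Draw from N(mu, sigma): standard normals times the Cholesky factor of sigma
z <- sweep(matrix(rnorm(2 * n), ncol = 2) %*% chol(sigma), 2, mu, "+")

center  <- colMeans(z)
cov_mat <- cov(z)
radius  <- sqrt(2 * qf(0.95, 2, n - 1))   # same radius as in the answer above

# Map a unit circle through the Cholesky factor of the sample covariance
angles      <- seq(0, 2 * pi, length.out = 52)
unit_circle <- cbind(cos(angles), sin(angles))
ell <- sweep(radius * unit_circle %*% chol(cov_mat), 2, center, "+")

# Share of sample points whose Mahalanobis distance puts them inside the ellipse
coverage <- mean(mahalanobis(z, center, cov_mat) <= radius^2)
coverage
```

Every row of ell sits at squared Mahalanobis distance radius^2 from the center, and coverage should come out close to 0.95 for a sample of this size.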
Related
I'm trying to find a root of the following function (based on the Gamma (gamma()) function) using the uniroot() function:
cv = 0.056924/1.024987^2
fx2 = function(theta, eta){
p1 = 1 - 2/(theta*(1-eta))
p2 = 1 - 1/(theta*(1-eta))
return(( gamma(p1)/(gamma(p2))^2 ) - (cv+1) )
}
This function gives me the following plot:
v = seq(0, 1, 0.01)
plot(v, fx2(3.0, v), type='l' )
It seems to me that the root of this function is close to 0.33, but the uniroot() function doesn't find the root, returning the following result:
uniroot(fx2, interval = c(0,0.3), theta=3 )
Error in uniroot(fx2, interval = c(0, 0.3), theta = 3) :
f() values at end points not of opposite sign
How do I find the root of this function? Are there any other packages with a more accurate algorithm?
I first rewrote your function to (optionally) express gamma(p1)/gamma(p2)^2 in terms of a computation that's first done on the log scale (via lgamma()) and then exponentiated. This is more numerically stable, and the consequences will become clear below ... (It's possible that I screwed up the log-scale computation; you should double-check it.) Update/warning: reading the documentation more carefully (!!), lgamma() evaluates to the log of the absolute value of the gamma function. So there may be some weird sign stuff going on in the answer below. The fact remains that if you are evaluating ratios of gamma functions for x<0 (i.e. in the regime where the value can go negative), Bad Stuff is very likely going to happen.
cv = 0.056924/1.024987^2
fx3 <- function(theta, eta, lgamma = FALSE) {
  p1 <- 1 - 2/(theta*(1-eta))
  p2 <- 1 - 1/(theta*(1-eta))
  if (lgamma) {
    ## log-scale version; note that lgamma() is the log of |gamma()|
    val <- exp(lgamma(p1) - 2*lgamma(p2)) - (cv+1)
  } else {
    val <- ( gamma(p1)/(gamma(p2))^2 ) - (cv+1)
  }
  val  ## return the value visibly
}
Compute the function with and without log-scaling:
x <- seq(0, 1, length.out = 20001)
v <- sapply(x, fx3, theta = 3.0, lgamma = TRUE)
v2 <- sapply(x, fx3, theta = 3.0, lgamma = FALSE)
Find root (more explanation below):
uu <- uniroot(function(eta) fx3(3.0, eta, lgamma = TRUE),
c(0.4, 0.5))
Plot it:
par(las=1, bty="l")
plot(x, abs(v), col = as.numeric(v<0) + 1, type="p", log="y",
pch=".", cex=3)
abline(v = uu$root, lty=2)
cvec <- sapply(c("blue","magenta"), adjustcolor, alpha.f = 0.2)
points(x, abs(v2), col=cvec[as.numeric(v2<0) + 1], pch=".", cex=3)
Here I'm plotting the absolute value on a log scale, with sign indicated by colour (black/blue >0, red/magenta <0). Black/red is the log-scale calculation, blue/magenta is the original calculation. I also plotted the function at very high resolution to try to avoid missing or mischaracterizing features.
There's a lot of weird stuff going on here:
- Both versions of the function do something interesting near x=1/3; the original version looks like a pole (value diverges to +∞, "returns" from -∞), while the log-scale computation goes up to +∞ and returns without changing sign.
- The log-scale computation has a root near x=0.45 (absolute value becomes small while the sign flips), but the original computation doesn't, presumably because of some kind of catastrophic loss of precision? If we give uniroot bounds that don't include the pole, it can find this root.
- There are further poles and/or roots at larger values of x that I didn't explore.
All of this basically says that it's pretty dangerous to mess around with this function without knowing what its mathematical properties are. I discovered some stuff by numerical exploration, but it would be best to analyze the function so that you really know what's happening; any numerical exploration can be fooled if the function is sufficiently strangely behaved.
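Following up on the lgamma() sign warning in this answer, here is a minimal sketch that puts the dropped sign back. It assumes theta = 3 and the cv from the question; gamma_sign() and fx_signed() are names made up for this illustration. It relies on the fact that for a negative non-integer p, gamma(p) has sign (-1)^ceiling(-p).

```r
cv <- 0.056924/1.024987^2

# Sign of gamma(p): +1 for p > 0, alternating between the poles for p < 0
gamma_sign <- function(p) ifelse(p > 0, 1, (-1)^ceiling(-p))

# Log-scale evaluation with the sign of gamma(p1) restored.
# gamma(p2)^2 is never negative, so only p1 contributes a sign.
fx_signed <- function(theta, eta) {
  p1 <- 1 - 2/(theta*(1-eta))
  p2 <- 1 - 1/(theta*(1-eta))
  gamma_sign(p1) * exp(lgamma(p1) - 2*lgamma(p2)) - (cv + 1)
}

# Grid strictly between the poles at eta = 1/3 (p1 = 0) and eta = 2/3 (p2 = 0)
eta  <- seq(0.34, 0.66, by = 0.01)
vals <- fx_signed(3, eta)
range(vals)
```

On this grid the signed values all stay below zero, which suggests the apparent root near 0.45 comes from the absolute value taken by lgamma() rather than from the function itself. On [0, 1/3) both gamma(p1) and gamma(p2) are positive, so it is still worth checking the sign there analytically before handing any interval to uniroot().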
I have a smoothed time series and want to find the instantaneous velocity of the function at any point along the line.
What I want to do is take a series of values: ex(1,6,5,4,3,5,6,7,1)
and return the derivative of each relative to the function of the entire series, such that at every point in time, I know what direction the line is trending.
I am new to R, but know there must be a way.
Any tips?
Ex:
library(smoother)
data(BJsales)
m <- data.frame(BJsales)
x.smth <- as.data.frame(smth.gaussian(m$BJsales,tails=TRUE,alpha = 5))
x.smth.ts <- cbind(seq(1:nrow(m)),x.smth)
colnames(x.smth.ts) <- c("x","y")
x.smth.ts
plot(x.smth.ts$y~x.smth.ts$x)
Desired output:
df with 2 columns: x, deriv.of.y
Edit: Final Result thanks to G5W
[Figure: TS with Color by Derivative]
Your proposed example using the BJSales data is decidedly not differentiable,
so instead I will show the derivative of a much smoother function. If your real data is smooth, this should work for you.
The simplest way to approximate the derivative is simply to use finite differences.
f'(x) ≈ (f(x+h) - f(x))/h
## Smooth sample function
x = seq(0,10,0.1)
y = x/2 + sin(x)
plot(x,y, pch=20)
## Simplest - first difference
d1 = diff(y)/diff(x)
d1 = c(d1[1],d1)
Let's use it to plot a tangent line as an error check. I picked a place to draw the tangent line arbitrarily: the 18th point, x=1.7
plot(x,y, type="l")
abline(y[18]-x[18]*d1[18], d1[18])
To get the data.frame that you requested, you just need
Derivative = data.frame(x, d1)
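As a possible refinement of the first difference, a central difference uses the neighbours on both sides of each point and is usually more accurate for smooth data. This is only a sketch on the same sample function; d_central is a name introduced here, and the two end points fall back to one-sided differences.

```r
## Same smooth sample function as above
x <- seq(0, 10, 0.1)
y <- x/2 + sin(x)
n <- length(x)

## Central difference for interior points: (y[i+1] - y[i-1]) / (x[i+1] - x[i-1])
d_central <- numeric(n)
d_central[2:(n-1)] <- (y[3:n] - y[1:(n-2)]) / (x[3:n] - x[1:(n-2)])

## One-sided differences at the two ends
d_central[1] <- (y[2] - y[1]) / (x[2] - x[1])
d_central[n] <- (y[n] - y[n-1]) / (x[n] - x[n-1])

Derivative <- data.frame(x, deriv.of.y = d_central)
head(Derivative)
```

For this function the exact derivative is 1/2 + cos(x), so the approximation can be checked directly, for example with max(abs(d_central - (0.5 + cos(x)))).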
I want to plot a univariate normal density onto an (x,y,z) coordinate system.
The code I am using is:
library(rgl)
open3d()
x <- seq(0, 10, length=100)
y <- seq(0, 10, length=100)
z = outer(x,y, function(x,y) dnorm(x,2.5,1)*dnorm(y,2.5,1))
persp3d(x, y, z,col = rainbow(100))
The problem I am encountering is that I want the normal distribution not to be around its mean only but also to be on a straight line or a circle. In the latter case, I would expect the output to be similar to a volcano. I guess I must first create some probabilities within a loop. How can I do this? Or should I also use some surface command to plot the output? I am pretty sure this has nothing to do with a bivariate normal though.
Best
Fuji
The first part is easy: just don't let your z depend on y for instance:
z = outer(x,y, function(x,y) dnorm(x,2.5,1))
persp3d(x, y, z,col = rainbow(100))
For the second part, you can imagine that the means of the normal distribution lie on the x^2+y^2=1 circle. You will have infinite normal distributions with radial directions. Try this:
#define the volcano function
volcano<-function(x,y,sigma=1/2) {
alpha<-atan(y/x)+pi*(x<0)
d<-sqrt((cos(alpha)-x)^2 + (sin(alpha)-y)^2)
dnorm(d,0,sigma)
}
x<-seq(-2,2,length.out=100)
y<-seq(-2,2,length.out=100)
z<-outer(x,y,volcano)
persp3d(x, y, z,col = rainbow(100))
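The distance d in the volcano function is the distance from (x, y) to the nearest point of the unit circle, which equals |sqrt(x^2 + y^2) - 1|. A sketch of an equivalent version that skips the angle (volcano_r is a name made up here) could look like this; it also avoids the division y/x.

```r
# Same surface, written in terms of the radius r = sqrt(x^2 + y^2)
volcano_r <- function(x, y, sigma = 1/2) {
  r <- sqrt(x^2 + y^2)
  dnorm(abs(r - 1), 0, sigma)   # normal density of the distance to the unit circle
}

x <- seq(-2, 2, length.out = 100)
y <- seq(-2, 2, length.out = 100)
z <- outer(x, y, volcano_r)

# Plot as before (needs the rgl package and a display):
# persp3d(x, y, z, col = rainbow(100))
```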
I'm a new R user, trying to generate a k-moving-average graph of a sine function with random noise (in the range [-0.5, +0.5]).
What I have to do is calculate the mean of (2*k+1) consecutive elements of the noisy sine vector. However, the line marked "HELP" in the code below is not working as I expected... :(
The code seems to calculate the mean of the 1st through (i-k)th elements.
What's wrong with it? Help please!
set.seed(1)
x = seq(0,2*pi,pi/50)
sin_graph <- sin(x)
noise <- runif(101, -0.5, 0.5)
sin_noise <- sin_graph + noise
plot(x,sin_noise, ylim=c(-2,2))
lines(x,sin_graph, col="red")
k<-1
MA<-0
while (k<=1){
i <- k+1
MA_vector <- rep(NA, times=101)
while (i<=101-k){
MA_vector[i] <- mean(sin_noise[i-k:i+k]) #HELP!
i <- i+1
}
print(MA_vector)
plot(x, MA_vector, ylim=c(-2,2))
lines(x,sin_graph, col="red")
k<-k+1
}
As it stands, it's subtracting the vector k:i from i and then adding k, because : takes precedence over the arithmetic operators. With parentheses (see the code below), i-k and i+k are evaluated first, and : then builds the index vector running from i-k to i+k. With this fix I get a smooth function.
MA_vector[i] <- mean(sin_noise[(i-k):(i+k)])
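A small sketch to make the precedence issue concrete, followed by one possible loop-free alternative using stats::filter(). The values i = 5 and k = 1 are picked only for illustration, and the names wrong_idx, right_idx, ma_loop and ma_filter are made up here.

```r
set.seed(1)
x <- seq(0, 2*pi, pi/50)
sin_noise <- sin(x) + runif(length(x), -0.5, 0.5)

i <- 5
k <- 1
wrong_idx <- i - k:i + k      # parsed as i - (k:i) + k, i.e. 5 4 3 2 1
right_idx <- (i - k):(i + k)  # 4 5 6

# Loop-style moving average with the corrected index
n <- length(sin_noise)
ma_loop <- rep(NA_real_, n)
for (j in (k + 1):(n - k)) ma_loop[j] <- mean(sin_noise[(j - k):(j + k)])

# The same centred moving average with equal weights via stats::filter()
ma_filter <- stats::filter(sin_noise, rep(1/(2*k + 1), 2*k + 1), sides = 2)
```

Both versions leave NA at the first and last k positions, where the window does not fit.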
Suppose that $Y_1$ has a gamma distribution with parameters $\alpha_1$ and $\beta$, that $Y_2$ is gamma distributed with parameters $\alpha_2$ and $\beta $ , and that $Y_1$ and
$Y_2$ are independent. Let $U_1 = Y_1/(Y_1 + Y_2)$ and $U_2 = Y_1 + Y_2$.
1) Plot the marginal density functions of $U_1$ and $U_2$ for $\alpha_1 = 3$, $\alpha_2 = 5$ and $\beta = 2$ using R.
2) A more challenging question: plot the joint density of $U_1$ and $U_2$ (a 3D plot) in R.
For 1), I have the following commands for $U_1$. Are they correct?
alpha_1=3
alpha_2=5
beta = 2
t=seq(0,1, 0.01)
den=dbeta(t,alpha_1 ,alpha_2, beta)
plot(t, den)
For 1), for $U_2$, I found that it has a gamma density.
However, the commands for it are the same as above, except for:
den=dgamma(t,alpha_1, alpha, beta)
R keeps complaining that I should not specify both scale and rate, which confuses me. How do I fix it?
Another thing that is not clear to me is the range of t:
the support of $U_2$ is all values greater than 0, so what is the command for infinity?
t=seq(0,1 (this should be infinity right), 0.01)
For question 2, I typed rotate as a command to try to rotate the graph, but the command does not work. Do I need to use some other software, or are there other commands that will work?
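One possible way to set this up, offered as a sketch rather than a checked answer. It assumes $\beta$ is a scale parameter (the question does not say whether it is a scale or a rate), and it relies on the standard result that $U_1 \sim \text{Beta}(\alpha_1, \alpha_2)$ and $U_2 \sim \text{Gamma}(\alpha_1 + \alpha_2, \beta)$ are then independent, so the joint density is the product of the two marginals. Naming the arguments avoids the rate/scale clash: in dgamma() the third and fourth positional arguments are rate and scale, and in dbeta() the fourth is a non-centrality parameter. The upper end of the u2 grid is an arbitrary cut-off at the 99.9% quantile, since seq() cannot run to infinity.

```r
alpha_1 <- 3
alpha_2 <- 5
beta    <- 2   # assumed to be a scale parameter

# Marginal of U1 = Y1/(Y1 + Y2): Beta(alpha_1, alpha_2) on [0, 1]
u1     <- seq(0, 1, by = 0.01)
den_u1 <- dbeta(u1, shape1 = alpha_1, shape2 = alpha_2)

# Marginal of U2 = Y1 + Y2: Gamma(alpha_1 + alpha_2, scale = beta) on (0, Inf)
upper  <- qgamma(0.999, shape = alpha_1 + alpha_2, scale = beta)
u2     <- seq(0, upper, length.out = 200)
den_u2 <- dgamma(u2, shape = alpha_1 + alpha_2, scale = beta)

# Joint density on the grid: product of the marginals (independence)
joint <- outer(den_u1, den_u2)

if (interactive()) {
  plot(u1, den_u1, type = "l")
  plot(u2, den_u2, type = "l")
  # theta and phi set the viewing angles, which is how persp() "rotates" the plot
  persp(u1, u2, joint, theta = 30, phi = 25)
}
```

If $\beta$ is meant as a rate instead, replacing scale = beta with rate = beta in both gamma calls is the only change needed.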