I am trying to plot in 3D with wireframe; here is my code:
mecdf = function (nr, nc, x, u) # mecdf is the bivariate empirical cumulative distribution function
{ k = rep (TRUE, nr)
for (j in 1:nc) k = k & (x [,j] <= u [j])
sum (k) / nr
}
xc = round(runif(100), 2)
yc = round(runif(100), 2)
Da = cbind(xc, yc)
bcdfa<-rep(NA,100)
for (i in 1:100) {bcdfa[i]=mecdf(nrow(Da),ncol(Da),Da,Da[i,])}
bcdfa
x<-cbind(xc,yc,bcdfa) # bcdfa is the value of the empirical CDF at each (xc,yc) point
...........
Is it possible to use wireframe or another function with these data to represent the bivariate cumulative distribution function?
Here's one way.
library(akima)
library(plot3D)
# interpolate the scattered (x, y, z) values onto a regular grid;
# duplicate = "mean" guards against repeated (x, y) pairs from the rounding
bcfoo <- interp(xc, yc, bcdfa, duplicate = "mean")
persp3D(bcfoo$x, bcfoo$y, bcfoo$z)
# plot points to verify overall shape
scatter3D(xc, yc, bcdfa)
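If you want wireframe specifically, lattice can draw the same interpolated grid. A minimal sketch, assuming the bcfoo grid from above (interp returns z as a matrix, and as.vector(z) matches expand.grid's ordering, with x varying fastest):
library(lattice)
grd <- expand.grid(x = bcfoo$x, y = bcfoo$y)  # long format for the formula interface
grd$z <- as.vector(bcfoo$z)
wireframe(z ~ x * y, data = grd, drape = TRUE)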
The task:
Eric the fly has a friend, Ernie. Assume that the two flies sit at independent locations, uniformly distributed on the globe’s surface. Let D denote the Euclidean distance between Eric and Ernie (i.e., on a straight line through the interior of the globe).
Make a conjecture about the probability density function of D and give an estimate of its expected value, E(D).
So far I have made a function to generate two points on the globe's surface, but I am unsure what to do next:
sample3d <- function(n)
{
  df <- data.frame()
  while(n > 0){
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)
    z <- runif(1, -1, 1)
    r <- x^2 + y^2 + z^2
    if (r < 1){
      u <- sqrt(x^2 + y^2 + z^2)
      vector = data.frame(x = x/u, y = y/u, z = z/u)
      df <- rbind(vector, df)
      n <- n - 1
    }
  }
  df
}
E <- sample3d(2)
This is an interesting problem. I'll outline a computational approach; I'll leave the math up to you.
First we fix a random seed for reproducibility.
set.seed(2018);
We sample 10^4 points from the unit sphere surface.
sample3d <- function(n = 100) {
  df <- data.frame();
  while(n > 0) {
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)
    z <- runif(1, -1, 1)
    r <- x^2 + y^2 + z^2
    if (r < 1) {
      u <- sqrt(x^2 + y^2 + z^2)
      vector = data.frame(x = x/u, y = y/u, z = z/u)
      df <- rbind(vector, df)
      n <- n - 1
    }
  }
  df
}
df <- sample3d(10^4);
Note that sample3d is not very efficient, but that's a different issue.
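As an aside, a rejection-free alternative: normalizing three iid standard normal coordinates gives points uniformly distributed on the sphere. A vectorized sketch (the function name is mine):
sample3d_fast <- function(n) {
  m <- matrix(rnorm(3 * n), ncol = 3)    # one row per point, iid N(0,1) coordinates
  as.data.frame(m / sqrt(rowSums(m^2)))  # radial projection onto the unit sphere
}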
We now randomly sample 2 points from df, calculate the Euclidean distance between those two points (using dist), and repeat this procedure N = 10^4 times.
# Sample 2 points randomly from df, repeat N times
N <- 10^4;
dist <- replicate(N, dist(df[sample(1:nrow(df), 2), ]));
As pointed out by @JosephWood, the number N = 10^4 is somewhat arbitrary. We are using a bootstrap to derive the empirical distribution. As N -> infinity, one can show that the empirical bootstrap distribution is the same as the (unknown) population distribution (bootstrap theorem). The error between the empirical and population distributions is of the order 1/sqrt(N), so N = 10^4 should lead to an error of around 1%.
We can plot the resulting probability distribution as a histogram:
# Let's plot the distribution
library(ggplot2)
ggplot(data.frame(x = dist), aes(x)) + geom_histogram(bins = 50);
Finally, we can get empirical estimates for the mean and median.
# Mean
mean(dist);
#[1] 1.333021
# Median
median(dist);
#[1] 1.41602
These values are close to the theoretical values:
mean.th = 4/3
median.th = sqrt(2)
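Those theoretical values follow from the conjectured density f(d) = d/2 on [0, 2] (so F(d) = d²/4): integrating d·f(d) over [0, 2] gives 4/3, and solving F(d) = 1/2 gives sqrt(2). A quick visual check of the conjecture against the simulated distances:
hist(dist, breaks = 50, freq = FALSE, main = "simulated D vs. f(d) = d/2")
curve(x/2, from = 0, to = 2, add = TRUE, col = "red")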
I have an exercise in which I have to create an algorithm as follows:
The Ratio-of-Uniforms method is based on the fact that, for a random variable X with density f(x), we can generate X from the desired density by calculating X = U/V for a pair (U, V) uniformly distributed in the set
Af = {(u, v) : 0 < v ≤ √f(u/v)}
Random points can be sampled uniformly in Af by rejection from the minimal bounding rectangle, i.e., the smallest possible rectangle that contains Af. It is given by (u−, u+) × (0, v+), where
v+ = max_x √f(x),
u− = min_x x·√f(x),
u+ = max_x x·√f(x)
Then the Ratio-of-Uniforms method consists of the following simple steps:
Generate random number U uniformly in (u−, u+).
Generate random number V uniformly in (0, v+).
Set X ← U/V .
If V² ≤ f(X), accept and return X.
Else try again.
My code so far:
x <- rnorm(1, mean = 0, sd = 1)
myrnorm <- function(pdf){
## call rou() n times
pdf <- function(x) {exp(-x^2/2)}
}
rou <- function(u, v) {
uplus <- 1
vplus <- 1
n <- 100
u <- runif(n, min=0, max=uplus)
v <- runif(n, min=0, max=vplus)
xi <- v/u
while(v < sqrt(xi)) {
if(v^2 <= xi)
return(xi)
}
}
myx <- myrnorm(1000)
hist(myx)
But I really don't know how to go on. I'm lost with this exercise. I would be really grateful for any advice.
Following example 1 on page 8 of this link and your sample code, I came up with this solution:
ratioU <- function(nvals)
{
  h_x <- function(x) exp(-x)  # target density: standard exponential
  # u- is b-, u+ is b+ and v+ is a in the example:
  uminus <- 0          # min of x*sqrt(h_x(x))
  uplus <- 2/exp(1)    # max of x*sqrt(h_x(x)), attained at x = 2
  vplus <- 1           # max of sqrt(h_x(x)), attained at x = 0
  X.vals <- NULL
  repeat {
    u <- runif(1, uminus, uplus)  # U uniform in (u-, u+)
    v <- runif(1, 0, vplus)       # V uniform in (0, v+)
    X <- u/v
    if (v^2 <= h_x(X)) {          # accept if (u, v) falls inside A_f
      X.vals <- c(X.vals, X)
    }
    if (length(X.vals) >= nvals) break
  }
  X.vals
}
sol = ratioU(1000)
par(mfrow = c(1, 2))
hist(sol, breaks = 50, main = "using ratioU", freq = FALSE)
hist(rexp(1000), breaks = 50, main = "using rexp from R", freq = FALSE)
par(mfrow = c(1, 1))
par(mfrow = c(1, 2))
plot(density(sol))
plot(density(rexp(1000)))
par(mfrow = c(1, 1))
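To compare against the exact density rather than a second random sample, you can also overlay dexp directly on the histogram:
hist(sol, breaks = 50, freq = FALSE, main = "ratioU sample vs. dexp")
curve(dexp(x), from = 0, to = max(sol), add = TRUE, col = "red")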
Much of the code could be optimized, but I think it is good enough for this purpose. I hope this helps.
I am doing cluster analysis of several time series in R (the sales of a product in different stores).
I am using the first order temporal correlation coefficient CORT(S1,S2), in package TSclust, where S1 and S2 are two time series.
The literature (https://cran.r-project.org/web/packages/TSclust/TSclust.pdf) explains that CORT belongs to the interval [-1,1]: when CORT(S1,S2)=1 both series show similar dynamic behavior, and when CORT(S1,S2)=-1 they have opposite behavior.
I would like to know how to see the results of CORT, in order to observe the values of CORT for each pair of time series.
Consider the following example from the TSclust package:
library(TSclust)
## Create three sample time series
x <- cumsum(rnorm(100))
y <- cumsum(rnorm(100))
z <- sin(seq(0, pi, length.out=100))
## Compute the distance and check for coherent results
diss.CORT(x, y, 2)
diss.CORT(x, z, 2)
diss.CORT(y, z, 2)
So with the above code we can calculate the dissimilarity index that uses the CORT(S1,S2) coefficient, but we cannot consult the values of the CORT coefficient itself.
So, does anyone know how to see the values of the CORT coefficient in R?
Thanks in advance.
I am not sure if this is what you want, but anyhow, this is what I did:
View(diss.CORT)
where R shows:
function (x, y, k = 2, deltamethod = "Euclid")
{
    .ts.sanity.check(x, y)
    .check.equal.length.ts(x, y)
    corrt <- corrtemporder1(x, y)
    type <- pmatch(deltamethod, c("Euclid", "Frechet", "DTW"))
    typedist <- 0
    if (is.na(type)) {
        stop(paste("Unknown method", deltamethod))
    }
    else if (type == 1) {
        typedist <- as.numeric(dist(rbind(x, y)))
    }
    else if (type == 2) {
        typedist <- diss.FRECHET(x, y)
    }
    else if (type == 3) {
        typedist <- dtw(x, y, dist.method = "Manhattan", distance.only = T)$distance
    }
    (2/(1 + exp(k * corrt))) * typedist
}
Now if you go through that and start reading the script, it seems that you are looking for the line corrt <- corrtemporder1(x, y). Google it and you get to https://github.com/cran/TSclust/blob/master/R/diss.R:
#############################################################################
################# Temporal Correlation Distance #########################
#############################################################################
##CHOUAKRIA-DOUZAL
corrtemporder1 <- function (x, y) {
    p <- length(x)
    sum((x[2:p] - x[1:(p-1)]) * (y[2:p] - y[1:(p-1)])) /
        (sqrt(sum((x[2:p] - x[1:(p-1)])^2)) * sqrt(sum((y[2:p] - y[1:(p-1)])^2)))
}
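To see the CORT values themselves, you can define this function in your own session and call it on each pair of series. A small sketch using the x, y, z series from the example above (diff(x) is equivalent to x[2:p] - x[1:(p-1)]):
cort <- function(x, y) {
  dx <- diff(x)  # first-order differences
  dy <- diff(y)
  sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}
cort(x, y)
cort(x, z)
cort(y, z)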
Now, I think this is what you are looking for.
This is my first post to the R community, so pardon me if it is silly. I would like to use the functions geom_density2d and stat_density2d in ggplot2 to plot kernel density estimates, but the problem is that they can't handle weighted data. From what I understand, these two functions call the function kde2d from the MASS package to make the kernel density estimate, and kde2d doesn't take weights as a parameter.
Now, I have found this altered version of kde2d at http://www.inside-r.org/node/226757, which takes weights as a parameter and is based on the source code of kde2d. The code of this function:
kde2d.weighted <- function (x, y, w, h, n = 25, lims = c(range(x), range(y))) {
  nx <- length(x)
  if (length(y) != nx)
    stop("data vectors must be the same length")
  if (missing(w))                          # default: equal weights
    w <- numeric(nx) + 1;                  # (must be set before length(w) is used)
  if (length(w) != nx & length(w) != 1)
    stop("weight vectors must be 1 or length of data")
  gx <- seq(lims[1], lims[2], length = n)  # gridpoints x
  gy <- seq(lims[3], lims[4], length = n)  # gridpoints y
  if (missing(h))
    h <- c(bandwidth.nrd(x), bandwidth.nrd(y));
  h <- h/4
  ax <- outer(gx, x, "-")/h[1]  # distance of each point to each grid point in x-direction
  ay <- outer(gy, y, "-")/h[2]  # distance of each point to each grid point in y-direction
  z <- (matrix(rep(w, n), nrow = n, ncol = nx, byrow = TRUE) *
        matrix(dnorm(ax), n, nx)) %*%
       t(matrix(dnorm(ay), n, nx)) / (sum(w) * h[1] * h[2])  # z is the weighted density
  return(list(x = gx, y = gy, z = z))
}
I would like to make the functions geom_density2d and stat_density2d call kde2d.weighted instead of kde2d, thereby making them accept weighted data.
I have never changed any functions in existing R packages, so my question is: what is the easiest way of doing this?
You can actually pass your own density data to geom_contour, which is probably the easiest approach. Let's start with a sample dataset by adding weights to the geyser data.
library("MASS")
data(geyser, "MASS")
geyserw <- transform(geyser,
weight = sample(1:5, nrow(geyser), replace=T)
)
Now we use your weighted function to calculate the density and turn it into a data.frame
dens <- kde2d.weighted(geyserw$duration, geyserw$waiting, geyserw$weight)
dfdens <- data.frame(expand.grid(x=dens$x, y=dens$y), z=as.vector(dens$z))
Now we plot the data:
library(ggplot2)
ggplot(geyserw, aes(x = duration, y = waiting)) +
  geom_point() + xlim(0.5, 6) + ylim(40, 110) +
  geom_contour(aes(x = x, y = y, z = z), data = dfdens)
And that should do it
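If you want to see what the weights actually change, you can overlay the unweighted estimate that geom_density_2d computes internally via MASS::kde2d (a sketch reusing the objects above):
ggplot(geyserw, aes(x = duration, y = waiting)) +
  geom_point() + xlim(0.5, 6) + ylim(40, 110) +
  geom_density_2d(colour = "blue") +
  geom_contour(aes(x = x, y = y, z = z), data = dfdens, colour = "red")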
I would like to calculate the density function of a distribution whose characteristic function is known. As a simple example, take the normal distribution:
norm.char<-function(t,mu,sigma) exp((0+1i)*t*mu-0.5*sigma^2*t^2)
and then I would like to use R's fft function, but I don't get the multiplicative constants right and I have to reorder the result (take the second half and then the first half of the values). I tried something like:
mu = 0; sigma = 1  # parameters for norm.char
xmax = 5
xmin = -5
deltat = 2*pi/(xmax-xmin)
N = 2^8
deltax = (xmax-xmin)/(N-1)
x = xmin + deltax*seq(0,N-1)
t = deltat*seq(0,N-1)
density = Re(fft(norm.char(t*2*pi,mu,sigma)))
density = c(density[(N/2+1):N],density[1:(N/2)])
But this is still not correct. Does anybody know a good reference on the fft in R in the context of density calculations? Obviously the problem is the mixture of the continuous Fourier transform and the discrete one. Can anybody recommend a procedure?
Thanks
It is just cumbersome: take pen and paper, write the integral you want to compute (the Fourier transform of the characteristic function), discretize it, and rewrite the terms so that they look like a discrete Fourier transform (the FFT assumes that the interval starts at zero).
Note that fft is an unnormalized transform: there is no 1/N factor.
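A quick throwaway check of that convention: the forward and inverse transforms compose to length(z) times the identity, so we must divide by n ourselves.
z <- rnorm(8)
max(abs(fft(fft(z), inverse = TRUE)/length(z) - z))  # effectively zero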
characteristic_function_to_density <- function(
phi, # characteristic function; should be vectorized
n, # Number of points, ideally a power of 2
a, b # Evaluate the density on [a,b[
) {
i <- 0:(n-1) # Indices
dx <- (b-a)/n # Step size, for the density
x <- a + i * dx # Grid, for the density
dt <- 2*pi / ( n * dx ) # Step size, frequency space
c <- -n/2 * dt # Evaluate the characteristic function on [c,d]
d <- n/2 * dt # (center the interval on zero)
t <- c + i * dt # Grid, frequency space
phi_t <- phi(t)
X <- exp( -(0+1i) * i * dt * a ) * phi_t
Y <- fft(X)
density <- dt / (2*pi) * exp( - (0+1i) * c * x ) * Y
data.frame(
i = i,
t = t,
characteristic_function = phi_t,
x = x,
density = Re(density)
)
}
d <- characteristic_function_to_density(
function(t,mu=1,sigma=.5)
exp( (0+1i)*t*mu - sigma^2/2*t^2 ),
2^8,
-3, 3
)
plot(d$x, d$density, las=1)
curve(dnorm(x,1,.5), add=TRUE)
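The same helper works for other characteristic functions. For instance, the standard Cauchy distribution has characteristic function exp(-|t|); its heavy tails make it a decent stress test, and overlaying dcauchy confirms the result:
d2 <- characteristic_function_to_density(
  function(t) exp(-abs(t)),  # standard Cauchy characteristic function
  2^8,
  -10, 10
)
plot(d2$x, d2$density, type = "l", las = 1)
curve(dcauchy(x), add = TRUE, col = "red")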