I'm trying to write an equation in R (see code below). I was wondering how I can correctly use +- before sqrt() in my code?
x <- seq(0,1,by=0.01)
y <- %+-%sqrt((.5^2)-(x-.5)^2)+.5
You need to plot the two branches separately, but the %+-% operator can be used in plotmath expressions. It must be flanked by two values, however, hence the non-printing phantom():
x <- c( seq(0,1,by=0.01) )
y <- c( sqrt((.5^2)-(x-.5)^2)+.5, -sqrt((.5^2)-(x-.5)^2)+.5)
plot( rep(x,times=2), y)
title(main= bquote( phantom(0) %+-% sqrt((.5^2)-(x-.5)^2)+.5))
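R has no plus-or-minus operator for arithmetic, so the two branches have to be computed explicitly; %+-% only exists for plotmath labels. A minimal sketch of that split, where the names r, upper, lower, and label are illustrative and not from the original post:

```r
x <- seq(0, 1, by = 0.01)

# Half-width of the circle at each x; pmax() clips tiny negative
# rounding errors at the endpoints so sqrt() never returns NaN.
r <- sqrt(pmax(0.5^2 - (x - 0.5)^2, 0))

upper <- 0.5 + r   # the "+" branch
lower <- 0.5 - r   # the "-" branch

# plotmath label; phantom(0) supplies the invisible left operand of %+-%
label <- quote(phantom(0) %+-% sqrt(0.5^2 - (x - 0.5)^2) + 0.5)

if (interactive()) {
  plot(x, upper, type = "l", ylim = c(0, 1), asp = 1, ylab = "y")
  lines(x, lower)
  title(main = label)
}
```

Drawing the branches with lines() rather than points keeps the circle continuous near x = 0 and x = 1, where the points are otherwise sparse.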
You may want to have the equation in parametric form, without requiring +- of sqrt.
theta <- seq(0,2*pi,0.01)
x <- 0.5 + 0.5*sin(theta)
y <- 0.5 + 0.5*cos(theta)
plot(x, y)
title(main= substitute(paste('x=(1+sin',theta,')/2, y=(1+cos', theta, ')/2')))
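As a quick check that the parametric form traces the same circle, and as an alternative title that lets plotmath render theta as a Greek letter, something along these lines should work. The label built with list() is a suggestion, not part of the answer above, and the names err and lab are arbitrary:

```r
theta <- seq(0, 2 * pi, by = 0.01)
x <- 0.5 + 0.5 * sin(theta)
y <- 0.5 + 0.5 * cos(theta)

# Every point should satisfy (x - 0.5)^2 + (y - 0.5)^2 = 0.5^2
err <- (x - 0.5)^2 + (y - 0.5)^2 - 0.5^2
max(abs(err))   # should be at rounding-error level

# plotmath's list() typesets comma-separated items; theta is drawn as a Greek letter
lab <- expression(list(x == (1 + sin(theta)) / 2, y == (1 + cos(theta)) / 2))

if (interactive()) {
  plot(x, y, type = "l", asp = 1, main = lab)
}
```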
Try this:
draw.circle <- function(stepsize=.01) {
theta <- seq(0,2*pi,by=stepsize)
x <- 0.5 + 0.5*sin(theta)
y <- 0.5 + 0.5*cos(theta)
plot(x, y,type="n",xlim = c(0,1),ylim = c(0,1))
segments(x,y,.5,.5)
}
draw.circle(.01)
draw.circle(.02)
draw.circle(.05)
Does anybody know if it is possible to insert a "is proportional to" symbol in an expression string in R?
Use something like this:
expression(x %~~% y)
expression(x %prop% y)
An example
# Approximately equal
x <- 1:10
y <- x + rnorm(10,0,.01)
plot(x, y, main = expression(y %~~% x))
# Proportional to...
x <- 1:10
y <- 3*x
plot(x, y, main = expression(y %prop% x))
Take a look at ?plotmath for documentation and more examples.
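A few other binary operators from ?plotmath follow the same %name% pattern. The sketch below just collects some of them in an expression vector and draws each one in an empty plot; the object name ops is arbitrary:

```r
ops <- expression(
  x %+-% y,     # plus or minus
  x %~~% y,     # approximately equal
  x %prop% y,   # proportional to
  x %<=% y,     # less than or equal to
  x %->% y      # right arrow
)

if (interactive()) {
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, length(ops) + 1))
  for (i in seq_along(ops)) text(0.5, i, ops[i])
}
```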
This is my first post to the R-community, so pardon me if it is silly. I would like to use the functions geom_density2d and stat_density2d in ggplot2 to plot kernel density estimates, but the problem is that they can't handle weighted data. From what I understand, these two functions call the function kde2d from package MASS to make the kernel density estimate. And the kde2d doesn't take data weights as a parameter.
Now, I have found this altered version of kde2d http://www.inside-r.org/node/226757, which takes weights as a parameter and is based on the source code of kde2d. The code of this function:
kde2d.weighted <- function (x, y, w, h, n = 25, lims = c(range(x), range(y))) {
nx <- length(x)
if (length(y) != nx)
stop("data vectors must be the same length")
if (missing(w))
w <- rep(1, nx) # default to equal weights; must be set before length(w) is checked
if (length(w) != nx && length(w) != 1)
stop("weight vectors must be 1 or length of data")
gx <- seq(lims[1], lims[2], length = n) # gridpoints x
gy <- seq(lims[3], lims[4], length = n) # gridpoints y
if (missing(h))
h <- c(bandwidth.nrd(x), bandwidth.nrd(y))
h <- h/4
ax <- outer(gx, x, "-")/h[1] # distance of each point to each grid point in x-direction
ay <- outer(gy, y, "-")/h[2] # distance of each point to each grid point in y-direction
z <- (matrix(rep(w,n), nrow=n, ncol=nx, byrow=TRUE)*matrix(dnorm(ax), n, nx)) %*% t(matrix(dnorm(ay), n, nx))/(sum(w) * h[1] * h[2]) # z is the density
return(list(x = gx, y = gy, z = z))
}
I would like to make geom_density2d and stat_density2d call kde2d.weighted instead of kde2d, so that they accept weighted data.
I have never changed any functions in existing R packages, so my question is: what is the easiest way to do this?
You can actually pass your own density data to geom_contour which would probably be the easiest. Let's start with a sample dataset by adding weights to the geyser data.
library("MASS")
library("ggplot2")
data(geyser, package = "MASS")
geyserw <- transform(geyser,
weight = sample(1:5, nrow(geyser), replace = TRUE)
)
Now we use your weighted function to calculate the density and turn it into a data.frame
dens <- kde2d.weighted(geyserw$duration, geyserw$waiting, geyserw$weight)
dfdens <- data.frame(expand.grid(x=dens$x, y=dens$y), z=as.vector(dens$z))
Now we plot the data
ggplot(geyserw, aes(x = duration, y = waiting)) +
geom_point() + xlim(0.5, 6) + ylim(40, 110) +
geom_contour(aes(x=x, y=y, z=z), data= dfdens)
And that should do it.
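To convince yourself that the weights do what you expect, one check is that integer weights should give the same surface as repeating each observation that many times, provided the bandwidth is held fixed. A self-contained sketch of that check, using a compact re-implementation (wkde2d is a hypothetical name, and sweep() stands in for the rep()/matrix() construction in kde2d.weighted):

```r
# Compact weighted 2-D Gaussian KDE, following the same steps as kde2d.weighted
wkde2d <- function(x, y, w, h, n = 25, lims = c(range(x), range(y))) {
  gx <- seq(lims[1], lims[2], length.out = n)
  gy <- seq(lims[3], lims[4], length.out = n)
  h <- h / 4                              # same bandwidth convention as MASS::kde2d
  kx <- dnorm(outer(gx, x, "-") / h[1])   # n x length(x) kernel values in x
  ky <- dnorm(outer(gy, y, "-") / h[2])   # n x length(y) kernel values in y
  # multiply column j of kx by w[j], then combine the two directions
  z <- (sweep(kx, 2, w, "*") %*% t(ky)) / (sum(w) * h[1] * h[2])
  list(x = gx, y = gy, z = z)
}

set.seed(42)
x <- rnorm(30)
y <- rnorm(30)
w <- sample(1:5, 30, replace = TRUE)
h <- c(1, 1)   # fixed bandwidth so both calls smooth identically

weighted   <- wkde2d(x, y, w, h)
replicated <- wkde2d(rep(x, w), rep(y, w), rep(1, sum(w)), h)

all.equal(weighted$z, replicated$z)   # TRUE if weights act like repeat counts
```

The bandwidth is fixed here on purpose: a data-driven bandwidth such as bandwidth.nrd() would change when the points are replicated, and the two surfaces would no longer be comparable.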
I would like to compute the convolution of two probability distributions in R and I need some help. For the sake of simplicity, let's say I have a variable x that is normally distributed with mean = 1.0 and stdev = 0.5, and y that is log-normally distributed with mean = 1.5 and stdev = 0.75. I want to determine z = x + y. I understand that the distribution of z is not known a priori.
As an aside, the real-world example I am working with requires adding two random variables that follow a number of different distributions.
Does anyone know how to add two random variables by convolving the probability density functions of x and y?
I have tried generating n normally distributed random values (with above parameters) and adding them to n log-normally distributed random values. However, I wish to know if I can use the convolution method instead. Any help would be greatly appreciated.
EDIT
Thank you for these answers. I defined a pdf and tried to compute the convolution integral, but R complains at the integration step. My pdfs are Log Pearson 3 and are as follows:
dlp3 <- function(x, a, b, g) {
p1 <- 1/(x*abs(b) * gamma(a))
p2 <- ((log(x)-g)/b)^(a-1)
p3 <- exp(-1* (log(x)-g) / b)
d <- p1 * p2 * p3
return(d)
}
f.m <- function(x) dlp3(x,3.2594,-0.18218,0.53441)
f.s <- function(x) dlp3(x,9.5645,-0.07676,1.184)
f.t <- function(z) integrate(function(x,z) f.s(z-x)*f.m(x),-Inf,Inf,z)$value
f.t <- Vectorize(f.t)
integrate(f.t, lower = 0, upper = 3.6)
R complains at the last step since the f.t function is bounded and my integration limits are probably not correct. Any ideas on how to solve this?
Here is one way.
f.X <- function(x) dnorm(x,1,0.5) # normal (mu=1, sigma=0.5)
f.Y <- function(y) dlnorm(y,1.5, 0.75) # log-normal (meanlog=1.5, sdlog=0.75)
# convolution integral
f.Z <- function(z) integrate(function(x,z) f.Y(z-x)*f.X(x),-Inf,Inf,z)$value
f.Z <- Vectorize(f.Z) # need to vectorize the resulting fn.
set.seed(1) # for reproducible example
X <- rnorm(1000,1,0.5)
Y <- rlnorm(1000,1.5,0.75)
Z <- X + Y
# compare the methods
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
Same thing using package distr.
library(distr)
N <- Norm(mean=1, sd=0.5) # N is signature for normal dist
L <- Lnorm(meanlog=1.5,sdlog=0.75) # same for log-normal
conv <- convpow(L+N,1) # object of class AbscontDistribution
f.Z <- d(conv) # density function
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
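Either approach can be sanity-checked on a case with a known answer: the sum of two independent normals N(m1, s1) and N(m2, s2) is normal with mean m1 + m2 and standard deviation sqrt(s1^2 + s2^2). The parameters below are made up for the check and are not from the question:

```r
f.X <- function(x) dnorm(x, 1, 0.5)
f.Y <- function(y) dnorm(y, 2, 1.2)

# Same convolution integral as above, evaluated one z at a time
f.Z <- Vectorize(function(z) {
  integrate(function(x) f.Y(z - x) * f.X(x), -Inf, Inf)$value
})

z <- seq(-1, 7, by = 0.5)
exact <- dnorm(z, 1 + 2, sqrt(0.5^2 + 1.2^2))   # closed form, sd = 1.3

max(abs(f.Z(z) - exact))   # should be small
```

If the numerical and closed-form densities disagree noticeably, the integration limits or tolerances are the first thing to look at.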
I was having trouble getting integrate() to work for different density parameters, so I came up with an alternative to @jlhoward's using a Riemann-sum approximation:
set.seed(1)
#densities to be convolved. could also put these in the function below
d1 <- function(x) dnorm(x,1,0.5) #
d2 <- function(y) dlnorm(y,1.5, 0.75)
#Riemann approximation of convolution
conv <- function(t, a, b, d) { #a to b needs to cover the range of densities above. d needs to be small for accurate approx.
z <- numeric(length(t))
x <- seq(a, b, d)
for (i in seq_along(t)) {
z[i] <- sum(d1(x)*d2(t[i]-x)*d)
}
return(z)
}
#check against sampled convolution
X <- rnorm(1000, 1, 0.5)
Y <- rlnorm(1000, 1.5, 0.75)
Z <- X + Y
t <- seq(0, 50, 0.05) #range to evaluate t, smaller increment -> smoother curve
hist(Z, breaks = 50, freq = F, xlim = c(0,30))
lines(t, conv(t, -100, 100, 0.1), type = "s", col = "red")
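For the Log Pearson 3 densities in the question's edit, one likely cause of the integrate() failure is that with b < 0 the density is only defined for 0 < x < exp(g); outside that range ((log(x)-g)/b)^(a-1) is NaN. A possible workaround, sketched below under that assumption, is to return 0 outside the support and integrate only over the overlap of the two supports. The helper names up.m and up.s and the rewritten dlp3 are illustrative, not from the original posts:

```r
# Log Pearson 3 density that returns 0 (not NaN) outside its support
dlp3 <- function(x, a, b, g) {
  d <- numeric(length(x))
  pos <- which(x > 0)
  u <- (log(x[pos]) - g) / b
  keep <- u > 0                      # inside the support
  ui <- u[keep]
  xi <- x[pos[keep]]
  d[pos[keep]] <- ui^(a - 1) * exp(-ui) / (xi * abs(b) * gamma(a))
  d
}
f.m <- function(x) dlp3(x, 3.2594, -0.18218, 0.53441)
f.s <- function(x) dlp3(x, 9.5645, -0.07676, 1.184)
up.m <- exp(0.53441)   # upper end of the support of f.m (because b < 0)
up.s <- exp(1.184)     # upper end of the support of f.s (because b < 0)

# Convolution over the x-range where both f.m(x) and f.s(z - x) are nonzero
f.t <- Vectorize(function(z) {
  lo <- max(0, z - up.s)
  hi <- min(z, up.m)
  if (lo >= hi) return(0)
  integrate(function(x) f.s(z - x) * f.m(x), lo, hi)$value
})

total <- integrate(f.t, 0, up.m + up.s)$value   # should be close to 1
p36   <- integrate(f.t, 0, 3.6)$value           # P(Z <= 3.6), as in the question
```

Checking that total is close to 1 is a cheap way to confirm the limits cover the whole support before trusting p36.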
I would like to plot:
production.ts(31, .002, 10,12,125313.93,211,95,x,"2014-02-01","2014-05-14",z,y) as function of x,y,z
As something like this plot from Mathematica, (if possible in R):
http://i.stack.imgur.com/3PRaf.png
I have a function:
library("lubridate"); library("rgl")
production.ts <- function(a, b, z, c, d, e,
f, g, h, j, r, k) {
elapsed <- (4-z)*10 + (4-c)
un.days <- 100 - elapsed
gone.days <- day(as.Date(h))
rem.days <- day(as.Date(j))
r.days <- as.numeric(as.Date(j) - as.Date(h))
m.r <- f/100*d
inputs <- d * a * (gone.days - 1)/365 + r
prin <- m.r + inputs
costs <- (r.days/365 * r + 1) * prin
added.p <- a/100*d + r
due <- d * 1-un.days
tomr.f <- 1- due + k^2
acct.paid <- (d - due)*tomr.f
net <- added.p + due + acct.paid
pv.net <- net/(1+r*(e-30-day(as.Date(j)))/365)
end <- d - due - acct.paid
more.add.p <- end*a*(rem.days-1)/365
rem <- (f-g)/100 * end
total.fv <- pv.net + rem + more.add.p
out <- costs - total.fv
out
}
x<-seq(-10,10,by=.1)
y<-seq(0,1000,by=.1)
z<-seq(0,90,by=.1)
I have tried:
func.3d<-Vectorize(production.ts(31, .002, 10,12,125313.93,211,95,x,"2014-02-01","2014-05-14",z,y))
c <- func.3d; c <- cut(c,breaks=64); cols <- rainbow(64)[as.numeric(c)]
open3d()
plot3d(x, y, z, col=cols,type="s",size=1)
But this plots lines and the colors don't line up with the values the function should output.
Does anyone know how I could do this? Thanks, I really appreciate your time!
Like this?
x<-seq(-10,10,length=100)
y<-seq(0,1000,length=100)
z<-seq(0,90,length=100)
df <- expand.grid(x=x,y=y,z=z)
f <- function(x,y,z) {production.ts(31, .002, 10,12,125313.93,211,95,x,"2014-02-01","2014-05-14",z,y)}
df$c <- f(df$x,df$y,df$z)
c <- cut(df$c,breaks=64)
cols <- rainbow(64)[as.numeric(c)]
open3d()
plot3d(df$x, df$y, df$z, col=cols,type="p",size=1)
Your code was not plotting lines. When you pass x, y, and z like that to plot3d(...) it cycles through all the elements together, so x[1],y[1],z[1] is a point, x[2],y[2],z[2] is another point, and so on. Since the vectors are different lengths, the shorter ones are recycled to fill out to the length of the longest. The visual effect of this is that the points lie on a line.
You want to plot every combination of x, y, and z, and give each point a color based on that combination. The code above does that. The plot does not quite look like yours, but I can't tell if that is because of the way you have defined your function.
Also, the way you defined x, y, and z there would be 201 X 10001 X 901 = 1,811,191,101 points, which is too many to handle. The code above plots 1,000,000 points.
Finally, plotting spheres (type="s") is very expensive and unnecessary in this case.
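The difference between element-wise recycling and a full grid is easy to see on a toy example (the vectors and object names here are made up for illustration):

```r
x <- 1:3
y <- c(10, 20)

# Element-wise pairing with recycling: only as many points as the longest vector
paired <- cbind(x = x, y = rep_len(y, length(x)))
nrow(paired)                      # 3

# Every combination of the inputs
grid <- expand.grid(x = x, y = y)
nrow(grid)                        # 6

# Size of the full grid for the original step of 0.1
n_full <- prod(lengths(list(seq(-10, 10, by = 0.1),
                            seq(0, 1000, by = 0.1),
                            seq(0, 90, by = 0.1))))
n_full                            # 1811191101
```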
I'm trying to simulate some data (x1 and x2 - my explanatory variables), calculate y using a specified function + random noise and plot the resulting observations AND the true regression surface. Here's what I have so far:
set.seed(1)
library(rgl)
# Simulate some data
x1 <- runif(50)
x2 <- runif(50)
y <- sin(x1)*x2+x1*x2 + rnorm(50, sd=0.3)
# 3D scatterplot of observations
plot3d(x1,x2,y, type="p", col="red", xlab="X1", ylab="X2", zlab="Y", size=5, lwd=15)
Now I'm not sure how I can add the "true" regression plane. I'm basically looking for something like curve() where I can plug in my (true) model formula.
Thanks!
If you wanted a plane, you could use planes3d.
Since your model is not linear, it is not a plane: you can use surface3d instead.
my_surface <- function(f, n=10, ...) {
ranges <- rgl:::.getRanges()
x <- seq(ranges$xlim[1], ranges$xlim[2], length=n)
y <- seq(ranges$ylim[1], ranges$ylim[2], length=n)
z <- outer(x,y,f)
surface3d(x, y, z, ...)
}
library(rgl)
f <- function(x1, x2)
sin(x1) * x2 + x1 * x2
n <- 200
x1 <- 4*runif(n)
x2 <- 4*runif(n)
y <- f(x1, x2) + rnorm(n, sd=0.3)
plot3d(x1,x2,y, type="p", col="red", xlab="X1", ylab="X2", zlab="Y", size=5, lwd=15)
my_surface(f, alpha=.2 )
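The key step in my_surface is outer(x, y, f), which evaluates f at every combination of the grid values and returns the matrix that surface3d() expects; f has to be vectorized for this to work. A small check of that layout, independent of rgl (gx and gy are arbitrary grid names):

```r
f <- function(x1, x2) sin(x1) * x2 + x1 * x2

gx <- seq(0, 4, length.out = 10)
gy <- seq(0, 4, length.out = 15)
z <- outer(gx, gy, f)

dim(z)                                  # 10 15: rows follow gx, columns follow gy
all.equal(z[3, 5], f(gx[3], gy[5]))     # TRUE
```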
Apologies: I didn't read the question very carefully and now see that I rushed into estimation when you wanted to plot the Truth.
Here's an approach to estimation followed by surface plotting using loess:
mod2 <- loess(y~x1+x2)
gx1 <- seq(range(x1)[1],range(x1)[2],len=20)
gx2 <- seq(range(x2)[1],range(x2)[2],len=20)
grd <- expand.grid(x1=gx1, x2=gx2) # full 20 x 20 grid; x1 varies fastest
pred <- as.vector(predict(mod2, newdata=grd)) # predict.loess may return an array for expand.grid input
surface3d(gx1, gx2, z=matrix(pred,length(gx1),length(gx2)) )
IRTFM's somewhat imperfect answers above led me to a thread on the R-help mailing list: https://stat.ethz.ch/pipermail/r-help/2013-December/364037.html
I extracted the relevant bits of code and turned them into a function like so:
library(rgl)
pred.surf.3d <- function(df, x.nm,y.nm,z.nm, ...){
x <- df[,x.nm]; y <- df[,y.nm]; z<-df[,z.nm]
fit <- lm(z ~ x + y + x*y + I(x^2) + I(y^2)) # I() is needed for squared terms in a formula
xnew <- seq(range(x)[1],range(x)[2],len=20)
ynew <- seq(range(y)[1],range(y)[2],len=20)
df <- expand.grid(x=xnew, y=ynew)
df$z <- predict(fit, newdata=df)
# surface3d() needs z as a length(xnew) x length(ynew) matrix
surface3d(xnew, ynew, z=matrix(df$z, length(xnew), length(ynew)), ...)
}
I may end up bundling this into my CRAN utility package at some point.
In the meantime, I hope you find it useful! (Run it on IRTFM's first code chunk like so:)
pred.surf.3d(data.frame(x1,x2,y),'x1','x2','y')
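One detail worth knowing when fitting quadratic surfaces with lm(): inside a formula, ^ is a formula operator (crossing), so x^2 is reduced to just x; a squared term has to be wrapped in I(). A small illustration on simulated data (all names and values here are made up):

```r
set.seed(1)
x <- runif(100)
y <- runif(100)
z <- 1 + 2 * x^2 - y^2 + rnorm(100, sd = 0.05)

fit_plain <- lm(z ~ x + y + x^2 + y^2)         # x^2 and y^2 collapse to x and y
fit_quad  <- lm(z ~ x + y + I(x^2) + I(y^2))   # genuine quadratic terms

names(coef(fit_plain))   # "(Intercept)" "x" "y"
names(coef(fit_quad))    # "(Intercept)" "x" "y" "I(x^2)" "I(y^2)"
```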