Conway Maxwell Distribution Density Plot - r

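I have written my own code to simulate samples from the Conway-Maxwell distribution.
This is the pmf (Guikema & Goffelt, 2008), as implemented in dcomp below: P(X = x) = ((lamb^x / x!)^v) / Z for x = 0, 1, 2, ..., where Z is the sum over j >= 0 of (lamb^j / j!)^v.
However, I have run into a problem when plotting the density.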
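rcomp <- function(n, lamb, v) {
  u <- runif(n)
  w <- integer(n)
  # Normalizing constant, truncated at j = 100; computed once, outside the loop
  z <- sum(sapply(0:100, function(j) ((lamb^j) / factorial(j))^v))
  x <- seq(0, 50, 1) # support 0 to 50 in steps of 1 (the pmf starts at 0, not 1)
  px <- (((lamb^x) / factorial(x))^v) / z
  # px is the pmf of the re-parameterized Conway-Maxwell distribution
  cpx <- cumsum(px)
  for (i in 1:n) {
    # Inverse-transform sampling: the draw is the number of CDF values <= u[i]
    w[i] <- if (u[i] < px[1]) 0L else max(which(cpx <= u[i]))
  }
  return(w)
}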
dcomp <- function(x,lamb,v) {
z=sum(sapply( 0:100, function(j) (( ((lamb)^j) / (factorial(j)) )^v) ))
px <- (((lamb^x)/factorial(x))^v)/z
return(px)
}
I want to plot the density to check whether lamb or v is the location parameter, but the plot I get looks weird.
x = rcomp(100,6,0.2); pdf = dcomp(x,6,0.2)
x1 = rcomp(100,6,0.5); pdf1 = dcomp(x1,6,0.5)
x2 = rcomp(100,6,0.7); pdf2 = dcomp(x2,6,0.7)
plot(x2, pdf2, type="l", lwd=1,lty=1,col="blue")
How could I solve this problem?
Source: Guikema & Goffelt (2008), A Flexible Count Data Regression Model for Risk Analysis. Risk Analysis 28(1): 215.

You have to sort the x values if you want the line to connect the points in order along the axis.
Note, however, that there may be better ways to graph the density you want; see the red curve produced by the code below. I first create a vector x of values within a certain range and then compute the PDF for those values. These (x, y) pairs are what the function lines plots.
set.seed(2673) # Make the results reproducible
x2 <- rcomp(100, 6, 0.7)
x2 <- sort(x2)
pdf2 <- dcomp(x2, 6, 0.7)
plot(x2, pdf2, type = "l", lwd = 1, lty = 1, col = "blue")
x <- seq(0, 50, length.out = 100)
y <- dcomp(x, 6, 0.2)
lines(x, y, type = "l", col = "red")
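Because the Conway-Maxwell distribution is discrete, another option is to evaluate the pmf on the integer support only and compare several values of v in one plot. This is a minimal sketch: dcomp is repeated from the question so the block runs on its own, and the support 0:50 and the colours are assumptions chosen for illustration.

```r
# pmf from the question, repeated so this block is self-contained
dcomp <- function(x, lamb, v) {
  z <- sum(sapply(0:100, function(j) ((lamb^j) / factorial(j))^v))
  (((lamb^x) / factorial(x))^v) / z
}

k <- 0:50                       # integer support (assumed wide enough for lamb = 6)
vs <- c(0.2, 0.5, 0.7)          # the values of v used in the question
cols <- c("red", "darkgreen", "blue")
pm <- sapply(vs, function(v) dcomp(k, 6, v))  # one column of pmf values per v

# Spikes for the first v, then points joined by lines for all three
plot(k, pm[, 1], type = "h", col = cols[1], ylim = c(0, max(pm)),
     xlab = "x", ylab = "P(X = x)")
for (i in seq_along(vs)) {
  lines(k, pm[, i], type = "b", pch = 19, cex = 0.5, col = cols[i])
}
legend("topright", legend = paste("v =", vs), col = cols, lty = 1)
```

In this parameterization the ratio P(X = x) / P(X = x - 1) equals (lamb / x)^v, so each curve peaks around lamb (here at 5 and 6) whatever v is, while v changes how spread out the pmf is.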


This is my goal: Plot the average of z according to bins formed by x and y in R

I came across another answer that bins two variables using cut and table. My question is: if I have three variables and I want to use x and y to create the bins, as that answer does with cut and table, how can I then plot z as the average of all the z values that fall into each bin?
This is what I have:
library(plot3D)
x <- data$OPEXMKUP_PT_1d
y <- data$prod_opex
z <- data$ab90_ROIC_wogw3
x_c <- cut(x, 20)
y_c <- cut(y, 20)
cutup <- table(x_c, y_c)
mat <- data.frame(cutup)
hist3D(z = cutup, border="black", bty ="g",
main = "Data", xlab = "Markup",
ylab ="Omega", zlab = "Star")
But it shows z as the frequency, and when I try
hist3D(x, y, z, phi = 0, bty = "g", type = "h", main = 'NEWer',
ticktype = "detailed", pch = 19, cex = 0.5,
xlim=c(0,3),
ylim=c(-10,20),
zlim=c(0,1))
It thinks for a long time and throws an error,
Error: protect(): protection stack overflow
Graphics error: Plot rendering error
It will do the 3D scatter fine, but the result doesn't make sense: the z variable is a ratio that falls mostly between 0 and 1, so you get a bunch of tall lines and a bunch of short lines. I would like the values averaged by bin, to show how the average ratio changes as x and y change. Please let me know if there is a way to do this.
Not sure exactly what your data looks like, so I made some up. You should be able to adjust to your needs. It's a bit hacky/brute force-ish, but could work just fine if your data isn't too large to slow down the loop.
library(plot3D)
# Fake it til you make it
n = 5000
x = runif(n)
y = runif(n)
z = x + 2*y + sin(x*2*pi)
# Divide into bins
x_c = cut(x, 20)
y_c = cut(y, 20)
x_l = levels(x_c)
y_l = levels(y_c)
# Compute the mean of z within each x,y bin
z_p = matrix(0, 20, 20)
for (i in 1:length(x_l)){
for (j in 1:length(y_l)){
z_p[i,j] = mean(z[x_c %in% x_l[i] & y_c %in% y_l[j]])
}
}
# Get the middle of each bin
x_p = sapply(strsplit(gsub('\\(|]', '', x_l), ','), function(x) mean(as.numeric(x)))
y_p = sapply(strsplit(gsub('\\(|]', '', y_l), ','), function(x) mean(as.numeric(x)))
# Plot
hist3D(x_p, y_p, z_p, bty = "g", type = "h", main = 'NEWer',
ticktype = "detailed", pch = 19, cex = 0.5)
Basically, we're just manually computing the average bin height z by looping over the bins. There may be a better way to do the computation.
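One candidate for a more compact computation is tapply, which applies a function within every combination of two factors. The sketch below rebuilds the made-up data (the seed is an assumption added for reproducibility) and spot-checks one cell against the loop-style calculation.

```r
set.seed(1)
n <- 5000
x <- runif(n)
y <- runif(n)
z <- x + 2 * y + sin(x * 2 * pi)

x_c <- cut(x, 20)
y_c <- cut(y, 20)

# Mean of z in every (x bin, y bin) cell; empty cells come back as NA
z_mean <- tapply(z, list(x_c, y_c), mean)

# Spot check of one cell against the explicit subsetting used in the loop
i <- 3
j <- 7
loop_val <- mean(z[x_c %in% levels(x_c)[i] & y_c %in% levels(y_c)[j]])
all.equal(unname(z_mean[i, j]), loop_val)
```

The resulting 20 x 20 matrix can be passed to hist3D in place of z_p.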

Persp in r, how to fix surface extends beyond the box

I am trying to create a clean perspective plot. I am able to create a nice plot with a predictive "mesh" based on my data, however it extends past my x and y limits. My code is below. I apologize for the lack of reproducible data.
dat<-data.frame(x,y,z);rm(x,y,z)
m1i<-(lm(z~poly(y,2)*x, data=dat))
xr<-range(dat$x)
xseq<-seq((xr[1]-1),xr[2], length=30)#the subtraction just made my prediction limits larger than what my data has- so it predicts for data I don't have
yr<-range(dat$y)
yseq<-seq((yr[1]-0.5),yr[2], length=30)#same as above, just so my predictions started at 0
zp<-outer(xseq,yseq, function(a,b) predict(m1i, newdata=data.frame(x=a,y=b)))
nrz<-nrow(zp)
ncz<-ncol(zp)
jet.colors<-colorRampPalette(c("grey60","white"))
nbcol<-100
color<-jet.colors(nbcol)
zfacet<-zp[-1,-1]+zp[-1,-ncz]+zp[-nrz,-1]+zp[-nrz,-ncz]
facetcol<-cut(zfacet,nbcol)
res<-persp(x=xseq,y=yseq,z=zp, col=color[facetcol],theta=40, phi=10,
ylab="Set Time (hr)", xlab="Distance (m)",
zlab="Proportion Captured", nticks=5, ticktype="detailed",
xlim=c(0,5),
ylim=c(0,4), zlim=c(0,1.1))
I get a warning when I run the code
In persp.default(x = xseq, y = yseq, z = zp, col = color[facetcol], :
surface extends beyond the box
I would like to cut off the surface so it ends at my box limits.
There are several possible reasons for this warning: your x, y or z values extend beyond the limits you set. The fix for x and y is to subset them to the plotting range. Values of z outside the range can be set to NA. Below is a reproducible example.
# generate data
N <- 100
x <- rnorm(N, 2, 0.5)
y <- rnorm(N, 2, 0.4)
z <- 0.1*y^2 * x + rnorm(N)
dat<-data.frame(x,y,z);rm(x,y,z)
# run code from question (not copied)
# set z beyond limit to NA
zp[zp < 0] <- NA
zp[zp > 1.1] <- NA
# plot
persp(x = xseq[xseq > 0 & xseq < 5],
y = yseq[yseq > 0 & yseq < 4],
z = zp[xseq > 0 & xseq < 5, yseq > 0 & yseq < 4],
theta = 40, phi = 10,
ylab = "Set Time (hr)", xlab = "Distance (m)",
zlab = "Proportion Captured", nticks = 5, ticktype = "detailed",
xlim = c(0,5),
ylim = c(0,4),
zlim = c(0,1.1))
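For completeness, here is a self-contained variant that does not depend on the question's code being run first. It is only a sketch: the data are simulated, the noise level (sd = 0.1) is an assumption chosen so the fitted surface is smooth, and the prediction grid is built directly inside the box instead of being subset afterwards.

```r
set.seed(42)
N <- 100
dat <- data.frame(x = rnorm(N, 2, 0.5), y = rnorm(N, 2, 0.4))
dat$z <- 0.1 * dat$y^2 * dat$x + rnorm(N, sd = 0.1)  # assumed noise level

fit <- lm(z ~ poly(y, 2) * x, data = dat)

# Box limits as in the question; the grid never leaves them
xlim <- c(0, 5)
ylim <- c(0, 4)
zlim <- c(0, 1.1)
xseq <- seq(xlim[1], xlim[2], length.out = 30)
yseq <- seq(ylim[1], ylim[2], length.out = 30)
zp <- outer(xseq, yseq,
            function(a, b) predict(fit, newdata = data.frame(x = a, y = b)))

# Blank out predictions that fall outside the z limits
zp[zp < zlim[1] | zp > zlim[2]] <- NA

persp(xseq, yseq, zp, theta = 40, phi = 10,
      xlim = xlim, ylim = ylim, zlim = zlim, ticktype = "detailed")
```

Facets that touch an NA value are not drawn, so the surface stops short of the box walls wherever the prediction leaves the z range.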

Overlay many plots with a different range of x

I would like to make a plot like the one in the image of what I want, but I don't know how. I wrote the code below, but I can't find a way to obtain that plot. The point is to add density lines to my original plot (Relacion Masa-SFR); the density should be computed for every 0.3 in x, that is, one line from 7 to 7.3, the next from 7.3 to 7.6, and so on. With the code below (continued until x = 12), I obtain a different plot.
plot(SFsl$MEDMASS, SFR_SalpToMPA, xlim = c(7, 12),
     ylim = c(-3, 2.5), ylab = "log(SFR (M(sun)/yr))",
     xlab = "log(M(star)/M(sun))")
title("Relacion Masa-SFR")
par(new=TRUE)
# Select points with mass in [7, 7.3] and SFR inside the plotted range
FCUTsfrsl1=(SFsl$MEDMASS >= 7 & SFsl$MEDMASS <=7.3 &
            SFR_SalpToMPA < 2 & SFR_SalpToMPA > -3)
x <- SFR_SalpToMPA[FCUTsfrsl1]
y <- density(x)
# ylim must match the first plot so both layers share the same vertical scale
plot(y$y, y$x, type='l', ylim = c(-3, 2.5), col="red",
     ylab="", xlab="", axes=FALSE)
I did what you said, but I obtained this plot. I don't know if I did something wrong.
Since I don't have your data, I had to make some up. If this does what you want, I think you can adapt it to your actual data.
set.seed(7)
x <- runif(1000, 7, 12)
y <- runif(1000, -3, 3)
DF <- data.frame(x = x, y = y)
plot(DF$x, DF$y)
# Cut the x axis into segments of about 0.3 units (0.333 here), compute the density and plot
br <- seq(7, 12, 0.333)
intx <- cut(x, br) # intervals
intx2 <- as.factor(cut(x, br, labels = FALSE)) # intervals by code
intx3 <- split(x, intx) # x values
inty <- split(y, intx2) # corresponding y values for density calc
for (i in 1:length(intx3)) {
xx <- seq(min(intx3[[i]]), max(intx3[[i]]), length.out = 512)
lines(xx, density(inty[[i]])$y, col = "red")
}
This produces a plot with a separate density curve for each interval; you need to look closely to see them.
EDIT: Change the dimension that is used to compute the density.
set.seed(7)
x <- runif(1000, 7, 12)
y <- runif(1000, -3, 3)
DF <- data.frame(x = x, y = y)
plot(DF$x, DF$y, xlim = c(7, 15))
# Cut the x axis into segments of about 0.3 units (0.333 here), compute the density and plot
br <- seq(7, 12, 0.333)
intx <- cut(x, br) # intervals
intx2 <- as.factor(cut(x, br, labels = FALSE)) # intervals by code
intx3 <- split(x, intx) # x values
inty <- split(y, intx2) # corresponding y values
# This gives the density values in the horizontal direction (desired)
# This is the change, the above is unchanged.
for (i in 1:length(intx3)) {
yy <- seq(min(inty[[i]]), max(inty[[i]]), length.out = 512)
offset <- min(intx3[[i]])
lines(density(intx3[[i]])$y + offset, yy, col = "red")
}
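A variant worth considering, assuming the goal is the density of y within each x bin drawn sideways inside that bin: compute density() on the y values of each bin, rescale the density heights to fit the bin width, and shift each curve to the bin's left edge. The data, the trend in y and the 90% scaling factor below are all made up for illustration.

```r
set.seed(7)
x <- runif(1000, 7, 12)
y <- rnorm(1000, mean = 0.5 * (x - 9.5), sd = 0.7)  # made-up data with a trend

plot(x, y, pch = 16, cex = 0.4, col = "grey60")

width <- 0.3                          # bin width from the question
br <- seq(7, 12, by = width)          # bin edges; x above the last edge is dropped
bins <- cut(x, br, include.lowest = TRUE)
y_by_bin <- split(y, bins)

dens <- lapply(y_by_bin, density)     # density of y within each x bin
peak <- max(sapply(dens, function(d) max(d$y)))

for (i in seq_along(dens)) {
  d <- dens[[i]]
  # Rescale so the tallest curve spans 90% of a bin, then shift to the left edge
  lines(br[i] + 0.9 * width * d$y / peak, d$x, col = "red")
}
```

Because every curve is divided by the same peak value, the relative heights of the densities stay comparable across bins.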

Visualize a function using double integration in R - Wacky Result

I am trying to visualize a curve for pollination distribution. I am very new to R so please don't be upset by my stupidity.
llim <- 0
ulim <- 6.29
f <- function(x,y) {(.156812/((2*pi)*(.000005^2)*(gamma(2/.156812)))*exp(-((sqrt(x^2+y^2))/.000005)^.156812))}
integrate(function(y) {
sapply(y, function(y) {
integrate(function(x) f(x,y), llim, ulim)$value
})
}, llim, ulim)
fv <- Vectorize(f)
curve(fv, from=0, to=1000)
And I get:
Error in y^2 : 'y' is missing
I'm not quite sure what you're asking to plot. But I know you want to visualise your scalar function of two arguments.
Here are some approaches. First we define your function.
llim <- 0
ulim <- 6.29
f <- function(x,y) {
(.156812/((2*pi)*(.000005^2)*(gamma(2/.156812)))*exp(-((sqrt(x^2+y^2))/.000005)^.156812))
}
From your title I thought of the following. The function intf defined below integrates your function over the square [0, ul] x [0, ul] and returns the value. We then vectorise it and plot the integral as a function of the side length of the square.
intf <- function(ul) {
integrate(function(y) {
sapply(y, function(y) {
integrate(function(x) f(x,y), 0, ul)$value
})
}, 0, ul)$value
}
fv <- Vectorize(intf)
curve(fv, from=0, to=1000)
If f is a distribution, I guess you can give this curve a (somewhat) nice probability interpretation, e.g. roughly 20% probability of pollination(?) within the 200 by 200 meter square.
However, you can also do a contour plot (of the log-transformed values), which illustrates the function we are integrating above:
logf <- function(x, y) log(f(x, y))
x <- y <- seq(llim, ulim, length.out = 100)
contour(x, y, outer(x, y, logf), lwd = 2, drawlabels = FALSE)
You can also plot some profiles of the surface:
plot(1, xlim = c(llim, ulim), ylim = c(0, 0.005), xlab = "x", ylab = "f")
y <- seq(llim, ulim, length.out = 6)
for (i in seq_along(y)) {
tmp <- function(x) f(x, y = y[i])
curve(tmp, llim, ulim, add = TRUE, col = i)
}
legend("topright", lty = 1, col = seq_along(y),
legend = as.expression(paste("y = ",y)))
They need to be modified a bit to make them publication worthy, but you get the idea. Lastly, you can do some 3d plots as others have suggested.
EDIT
As per your comments, you can also do something like this:
# Define the function times radius (this time with general a and b)
# The default of a and b is as before
g <- function(z, a = 5e-6, b = .156812) {
z * (b/(2*pi*a^2*gamma(2/b)))*exp(-(z/a)^b)
}
# A function that integrates g from 0 to Z and rotates
# As g is not dependent on the angle we just multiply by 2pi
intg <- function(Z, ...) {
2*pi*integrate(g, 0, Z, ...)$value
}
# Vectorize the Z argument of intg
gv <- Vectorize(intg, "Z")
# Plot
Z <- seq(0, 1000, length.out = 100)
plot(Z, gv(Z), type = "l", lwd = 2)
lines(Z, gv(Z, a = 5e-5), col = "blue", lwd = 2)
lines(Z, gv(Z, b = .150), col = "red", lwd = 2)
lines(Z, gv(Z, a = 1e-4, b = .2), col = "orange", lwd = 2)
You can then plot the curves for the a and b you want. If either is not specified, the default is used.
Disclaimer: my calculus is rusty and I just did this off the top of my head. You should verify that I've done the rotation of the function around the axis properly.
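One way to check the rotation step is to try it on a radially symmetric density whose integral over a disc is known in closed form. The sketch below uses the standard bivariate normal as a stand-in (an assumption for illustration, not the pollination kernel): the integral over a disc of radius R is 1 - exp(-R^2 / 2), and both the polar shortcut and a brute-force Cartesian double integral should reproduce it.

```r
# Radially symmetric test density: standard bivariate normal
f <- function(x, y) exp(-(x^2 + y^2) / 2) / (2 * pi)
g <- function(r) r * exp(-r^2 / 2) / (2 * pi)   # density times radius

R <- 1.5
polar <- 2 * pi * integrate(g, 0, R)$value      # integrate over r, multiply by 2*pi
exact <- 1 - exp(-R^2 / 2)                      # closed form for the disc of radius R

# Same disc via a double integral in Cartesian coordinates
cart <- integrate(function(y) {
  sapply(y, function(yy) {
    h <- sqrt(R^2 - yy^2)                       # half-width of the disc at height yy
    integrate(function(x) f(x, yy), -h, h)$value
  })
}, -R, R)$value

c(polar = polar, exact = exact, cartesian = cart)
```

If the three numbers agree, multiplying the radial integral of z * f(z) by 2*pi is the right construction for a function that depends only on the distance from the origin.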
Several functions can help you draw 3-dimensional plots, including wireframe() from the lattice package and persp() from base graphics. If you prefer not to use a 3D plot, you can create a contour plot using contour().
Note: I don't know if this is intentional, but your function produces a very large spike in one corner of the plot. The result is a plot that is for all intents and purposes flat, with a barely noticeable spike in one corner. This is particularly problematic with the contour plot below.
library(lattice)
x <- seq(0, 1000, length.out = 50)
y <- seq(0, 1000, length.out = 50)
First the wire frame plot:
df <- expand.grid(x=x, y=y)
df$z <- with(df, f(x, y))
wireframe(z ~ x * y, data = df)
Next the perspective plot:
dm <- outer(x, y, FUN=f)
persp(x, y, dm)
The contour plot:
contour(x, y, dm)

Getting values from kernel density estimation in R

I am trying to get density estimates for the log of stock prices in R. I know I can plot it using plot(density(x)). However, I actually want values for the function.
I'm trying to implement the kernel density estimation formula. Here's what I have so far:
a <- read.csv("boi_new.csv", header=FALSE)
S = a[,3] # takes column of increments in stock prices
dS=S[!is.na(S)] # omits first empty field
N = length(dS) # Sample size
rseed = 0 # Random seed
x = rep(c(1:5),N/5) # Inputted data
set.seed(rseed) # Sets random seed for reproducibility
QL <- function(dS){
h = density(dS)$bandwidth
r = log(dS^2)
f = 0*x
for(i in 1:N){
f[i] = 1/(N*h) * sum(dnorm((x-r[i])/h))
}
return(f)
}
QL(dS)
Any help would be much appreciated. Been at this for days!
You can pull the values directly from the density function:
x = rnorm(100)
d = density(x, from=-5, to = 5, n = 1000)
d$x
d$y
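If you need the estimate at arbitrary points and not only on the grid that density() returns, one option is to interpolate the grid with approxfun. A small sketch (the seed and the evaluation points are arbitrary choices for illustration):

```r
set.seed(1)
x <- rnorm(100)
d <- density(x, from = -5, to = 5, n = 1000)

dens_fun <- approxfun(d$x, d$y)   # linear interpolation between grid points
dens_fun(c(-1, 0, 0.5))           # estimated density at chosen points

# Sanity check: the grid values should integrate to roughly 1
sum(d$y) * diff(d$x[1:2])
```

Outside the range [from, to] the interpolating function returns NA, so choose the range wide enough to cover the points you care about.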
Alternatively, if you really want to write your own kernel density function, here's some code to get you started:
Set the points z and x range:
z = c(-2, -1, 2)
x = seq(-5, 5, 0.01)
Now we'll add the points to a graph
plot(0, 0, xlim=c(-5, 5), ylim=c(-0.02, 0.8),
pch=NA, ylab="", xlab="z")
for(i in 1:length(z)) {
points(z[i], 0, pch="X", col=2)
}
abline(h=0)
Put Normal densities around each point:
## Now we combine the kernels,
x_total = numeric(length(x))
for(i in 1:length(x_total)) {
for(j in 1:length(z)) {
x_total[i] = x_total[i] +
dnorm(x[i], z[j], sd=1)
}
}
and add the curves to the plot:
lines(x, x_total, col=4, lty=2)
Finally, calculate the complete estimate:
## Just as a histogram is the sum of the boxes,
## the kernel density estimate is just the sum of the bumps.
## All that's left to do, is ensure that the estimate has the
## correct area, i.e. in this case we divide by $n=3$:
plot(x, x_total/3,
xlim=c(-5, 5), ylim=c(-0.02, 0.8),
ylab="", xlab="z", type="l")
abline(h=0)
This corresponds to
density(z, adjust=1, bw=1)
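The double loop can also be written without loops, which makes the comparison with density() easy to run. This is a sketch under the same settings as above (Gaussian kernel, bandwidth 1); the names kern and kde_manual are introduced here for illustration.

```r
z <- c(-2, -1, 2)
x <- seq(-5, 5, 0.01)
h <- 1                                    # bandwidth (sd of each Normal kernel)

# Rows index grid points, columns index data points
kern <- outer(x, z, function(xi, zj) dnorm(xi, mean = zj, sd = h))
kde_manual <- rowMeans(kern)              # sum of the bumps divided by n

# Compare with density() evaluated on the same grid
d <- density(z, bw = h, from = -5, to = 5, n = length(x))
max(abs(kde_manual - d$y))                # should be small
```

density() uses a binned approximation internally, so the two results agree closely but not to machine precision.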
