Cook's distance plot with R

Does anybody know how to get the single Cook's distance plot that you get from this code:
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2), labels = c("placebo","treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)), levels = c(1, 2,3),labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 5)+1
healthvalue <- rpois(84,5)
y <- data.frame(healthvalue, numberofdrugs, treatment, improved)
test <- glm(healthvalue~numberofdrugs+treatment+improved, y, family=poisson)
par(mfrow=c(2,2))
plot(test) # how to grab plot 2.1 ?
What I don't want is this:
par(mfrow=c(1, 1))
plot(test, which=c(4))
because it doesn't have residuals on the y axis and leverage on the x axis!
Thanks guys

I'm not quite sure what your problem is. You seem to want the plot with residuals on the y axis and leverage on the x axis. Isn't that just the 5th of the 6 plots that plot() can generate for the model:
plot(test,which=5)
You can read more about this at ?plot.lm
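For reference, here is the question's model rebuilt with a seed (set.seed is an addition; the original data are unseeded), followed by the single panel. plot.lm also takes a cook.levels argument that sets the Cook's distance contour levels (0.5 and 1 by default), and cooks.distance() returns the values themselves if you want to inspect them rather than read them off the plot.

```r
# Rebuild the question's model; the seed is added so the data are repeatable
set.seed(1)
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3), labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 5) + 1
healthvalue <- rpois(84, 5)
y <- data.frame(healthvalue, numberofdrugs, treatment, improved)
test <- glm(healthvalue ~ numberofdrugs + treatment + improved, y,
            family = poisson)

par(mfrow = c(1, 1))
# Plot 5: standardized Pearson residuals vs leverage, with Cook's distance
# contours drawn at the levels given in cook.levels
plot(test, which = 5, cook.levels = c(0.5, 1))

# The underlying numbers, largest Cook's distance first
cd <- cooks.distance(test)
head(sort(cd, decreasing = TRUE))
```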
Edit to address OP's question about setting y axis labels:
Usually, simply adding ylab="My Label" to the plot() call would work, but these graphs are designed to be produced "automatically", so certain graphical parameters are hard-coded. If you pass your own ylab value, you'll get an error, because plot.lm() will be presented with two ylab arguments and won't know which one to use. If you really don't like the y axis label, one option is to grab the plot.lm code (just type plot.lm at the console and hit Enter), copy and paste it into a text file, and look for this section:
if (show[5L]) {
    ylab5 <- if (isGlm)
        "Std. Pearson resid."
    else "Standardized residuals"
    r.w <- residuals(x, "pearson")
    if (!is.null(w))
        r.w <- r.w[wind]
    rsp <- dropInf(r.w/(s * sqrt(1 - hii)), hii)
    ylim <- range(rsp, na.rm = TRUE)
    if (id.n > 0) {
        ylim <- extendrange(r = ylim, f = 0.08)
        show.rsp <- order(-cook)[iid]
    }
and modify it with your own y axis label. Rename the function (say, plotLMCustomY, or something) and it should work.
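If editing plot.lm feels heavy, a small hand-rolled version of plot 5 is another route. The sketch below uses a hypothetical helper, resid_leverage_plot (not part of any package), and assumes a glm fit, because rstandard(type = "pearson") is an argument of the glm method. The dashed contours use the relation Cook's D = r^2 * h / (p * (1 - h)) between the standardized Pearson residual r, the leverage h and the model rank p, solved for r; plot.lm places its contours the same way.

```r
# Hypothetical helper: residuals vs leverage with a user-chosen y label and
# Cook's distance contours; assumes 'model' is a glm fit
resid_leverage_plot <- function(model, ylab = "Std. Pearson resid.",
                                cook.levels = c(0.5, 1)) {
  h <- hatvalues(model)
  r <- rstandard(model, type = "pearson")   # standardized Pearson residuals
  plot(h, r, xlab = "Leverage", ylab = ylab, main = "Residuals vs Leverage")
  abline(h = 0, v = 0, lty = 3, col = "grey")
  p <- model$rank
  hh <- seq(min(h), max(h), length.out = 101)
  for (d in cook.levels) {
    # Solve D = r^2 * h / (p * (1 - h)) for r at a fixed Cook's distance D
    cl <- sqrt(d * p * (1 - hh) / hh)
    lines(hh, cl, lty = 2, col = "red")
    lines(hh, -cl, lty = 2, col = "red")
  }
  invisible(data.frame(leverage = h, std.resid = r,
                       cook = cooks.distance(model)))
}

# Usage with simulated Poisson data similar to the question's
set.seed(1)
d <- data.frame(n = rpois(84, 5) + 1, h = rpois(84, 5))
fit <- glm(h ~ n, data = d, family = poisson)
out <- resid_leverage_plot(fit, ylab = "My Label")
```

Because the helper returns the plotted values invisibly, you can also check which points lie beyond a contour without reading them off the graph.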

Related

How to plot a surface in rgl plot3d

So I have this code that produces the exact surface
f = function(x, y){
z = ((x^2)+(3*y^2))*exp(-(x^2)-(y^2))
}
plot3d(f, col = colorRampPalette(c("blue", "white")),
xlab = "X", ylab = "Y", zlab = "Z",
xlim = c(-3, 3), ylim = c(-3, 3),
aspect = c(1, 1, 0.5))
This gives the following plot:
Now I have some code that runs a random-walk Metropolis algorithm to reproduce the above image. I think it works, because when I plot the sampled values I get the next image, with 500 points. Here is the plotting code:
open3d()
plot3d(x0, y0, f(x0, y0), type = "p")
Which gives the following plot:
I know it's hard to judge from this still image, but when I rotate it, the sampling appears to be working.
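The Metropolis code itself isn't shown in the question. For anyone wanting to reproduce x0 and y0, a random-walk Metropolis sampler could look roughly like the sketch below. It is an assumption, not the asker's code: it treats f as an unnormalised density restricted to the square [-3, 3] x [-3, 3], uses a Gaussian proposal with a hypothetical step size of 0.5, and starts at (1, 1), where f is positive.

```r
f <- function(x, y) ((x^2) + (3 * y^2)) * exp(-(x^2) - (y^2))

# Hypothetical random-walk Metropolis sampler targeting f (unnormalised)
# on the square [-3, 3] x [-3, 3]
rw_metropolis <- function(n, step = 0.5, start = c(1, 1)) {
  out <- matrix(NA_real_, nrow = n, ncol = 2)
  cur <- start
  f_cur <- f(cur[1], cur[2])
  for (i in seq_len(n)) {
    prop <- cur + rnorm(2, sd = step)       # symmetric Gaussian proposal
    f_prop <- if (all(abs(prop) <= 3)) f(prop[1], prop[2]) else 0
    if (runif(1) < f_prop / f_cur) {        # Metropolis acceptance step
      cur <- prop
      f_cur <- f_prop
    }
    out[i, ] <- cur                         # rejected moves repeat the point
  }
  out
}

set.seed(1)
s <- rw_metropolis(500)
x0 <- s[, 1]
y0 <- s[, 2]
```

Since the proposal is symmetric, the acceptance ratio reduces to f_prop / f_cur; proposals outside the square get density 0 and are never accepted.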
Now here is my question: how can I use plot3d() to get a surface that connects all these points and gives a more jagged representation of the exact plot? Or how can I draw each point as a vertical bar from the xy plane up to its z value? I just want something more three-dimensional than points, and I can't find how to do this.
Thanks for your help
You can do this by triangulating the surface. You don't give us your actual data, but I can create some similar data using
f = function(x, y){
z = ((x^2)+(3*y^2))*exp(-(x^2)-(y^2))
}
x <- runif(500, -3, 3)
y <- runif(500, -3, 3)
z <- f(x, y)
Then the plotting is done using the method in ?persp3d.deldir:
library(deldir)
library(rgl)
col <- colorRampPalette(c("blue", "white"))(20)[1 + round(19*(z - min(z))/diff(range(z)))]
dxyz <- deldir::deldir(x, y, z = z, suppressMsge = TRUE)
persp3d(dxyz, col = col, front = "lines", back = "lines")
This might need some cosmetic fixes, e.g.
aspect3d(2, 2, 1)
After some rotation, this gives me the following plot:
I'm not sure I understand what you want. If my understanding is correct, here is a solution. Define a parametric representation of your surface:
fx <- function(u,v) u
fy <- function(u,v) v
fz <- function(u,v){
((u^2)+(3*v^2))*exp(-(u^2)-(v^2))
}
Let's say you have these points:
x0 <- seq(-3, 3, length.out = 20)
y0 <- seq(-3, 3, length.out = 20)
Then you can use the function parametric3d of the misc3d package, with the option fill=FALSE to get a wireframe:
library(misc3d)
parametric3d(fx, fy, fz, u=x0, v=y0,
color="blue", fill = FALSE)
Is this what you want?
To get some vertical bars, use the function segments3d of rgl:
i <- 8
bar <- rbind(c(x0[i],y0[i],0),c(x0[i],y0[i],f(x0[i],y0[i])))
segments3d(bar, color="red")
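To draw a bar for every point at once rather than one at a time, note that segments3d() joins consecutive pairs of rows (1-2, 3-4, ...), so interleaving a base point and a top point per observation is enough. bar_segments below is a hypothetical helper; x0 and y0 here are simulated stand-ins for the sampled points, and the rgl call is only run in an interactive session with rgl installed.

```r
# Hypothetical helper: build the 2n x 3 matrix of segment endpoints,
# one vertical segment per point from z = 0 up to its z value
bar_segments <- function(x, y, z) {
  n <- length(x)
  m <- matrix(NA_real_, nrow = 2 * n, ncol = 3)
  m[seq(1, 2 * n, by = 2), ] <- cbind(x, y, 0)  # odd rows: base on xy plane
  m[seq(2, 2 * n, by = 2), ] <- cbind(x, y, z)  # even rows: top at height z
  m
}

f <- function(x, y) ((x^2) + (3 * y^2)) * exp(-(x^2) - (y^2))
set.seed(1)
x0 <- runif(50, -3, 3)   # stand-ins for the sampled points
y0 <- runif(50, -3, 3)
bars <- bar_segments(x0, y0, f(x0, y0))

if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::open3d()
  rgl::segments3d(bars, color = "red")
}
```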
Here is a plot with only 50 points using my original code.
When I then apply Stéphane Laurent's suggestion, I get this plot, which looks too accurate given the actual points I have.
Perhaps you could explain what is actually happening in the function parametric3d?
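As far as I understand the misc3d documentation, parametric3d() does not use sampled points at all: it evaluates fx, fy and fz at every combination of the u and v values and draws the surface through those exact values. That is why the result looks smoother than the sampled data. The grid evaluation amounts roughly to the following sketch (an illustration with base R's outer(), not misc3d's internal code):

```r
fz <- function(u, v) ((u^2) + (3 * v^2)) * exp(-(u^2) - (v^2))
x0 <- seq(-3, 3, length.out = 20)
y0 <- seq(-3, 3, length.out = 20)
# Every (u, v) pair on the 20 x 20 grid, evaluated with the exact function
zgrid <- outer(x0, y0, fz)
dim(zgrid)  # 20 20
```

To get a surface that follows the sampled points instead, the deldir triangulation approach shown earlier in this thread uses only the points you actually have.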

How to make logarithmic axes in plot3d (library("rgl")) in R?

I am having extreme difficulty making my axes logarithmic and adding custom tick marks in plot3d from the rgl package. I've tried using the log='xy' argument in my code, just as you would in the base plot function, and I have tried to create custom tick marks using rgl.bbox. My y axis plots fine, but my x and z axes are not cooperating, and I cannot get anything to work. Any ideas? Below are my data, code, and a picture of the result I'm getting. I should also add that I'm basically plotting multiple 2d scatterplots in 3d, using an arbitrary z value to separate the individual 2d plots.
https://www.dropbox.com/s/wv24rmnyalm3vvc/scattertest.csv?dl=0
#!/usr/bin/env Rscript
library("rgl")
data <- read.csv("~/Desktop/scattertest.csv", header=TRUE, fill=TRUE, sep=',')
x <- names(data[2])
y <- names(data[3])
z <- names(data[4])
plot3d(data[[x]], data[[z]], data[[y]], type="s", size=0.75, lit=FALSE, axes=FALSE,
xlab="rpmn", ylab="round", zlab="rpmt", log="xz",
xmin=c(0.1, 10^6), ymin=c(1,4), zmin=c(0.1, 10^6))
rgl.bbox(color="grey50", emission="grey50",
xat = c(0.1, 1, 10, 100, 10^3, 10^4, 10^5, 10^6), yat = c(1, 2, 3, 4), zat = c(0.1, 1, 10, 100, 10^3, 10^4, 10^5, 10^6),
xlen=8, ylen=4, zlen=8)
There's no support for log="xy" in plot3d(); you'll need to do the transformation yourself.
Your code asks for logarithmic labels, but you aren't doing the logarithmic transformation, so it's not working. You need to rescale the data as well.
You didn't post a reproducible example, but it's easy to create one:
library(rgl)
x <- rlnorm(20, 2, 6)
y <- runif(20, 1, 4)
z <- rlnorm(20, 2, 6)
xyz <- cbind(log(x), y, log(z))
plot3d(xyz, axes = FALSE)
ticks <- 10^((-1):6)
bbox3d(xat = log(ticks), xlab = ticks, yat = pretty(1:4),
zat = log(ticks), zlab = ticks,
color="grey50", emission="grey50")
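The fixed ticks 10^((-1):6) only cover data in that range. One way to choose them from the data instead is to take the powers of ten spanning each axis; log10_ticks below is a hypothetical helper. The labels are the original values, while the positions are on the log scale, exactly as in the answer (natural log is used for both the data and the tick positions, so they stay consistent).

```r
# Hypothetical helper: powers of ten spanning the range of positive data
log10_ticks <- function(v) {
  stopifnot(all(v > 0))
  10^seq(floor(log10(min(v))), ceiling(log10(max(v))))
}

set.seed(1)
x <- rlnorm(20, 2, 6)
y <- runif(20, 1, 4)
z <- rlnorm(20, 2, 6)
xt <- log10_ticks(x)
zt <- log10_ticks(z)

if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(log(x), y, log(z), axes = FALSE,
              xlab = "x", ylab = "y", zlab = "z")
  # Tick positions on the log scale, labelled with the untransformed values
  rgl::bbox3d(xat = log(xt), xlab = format(xt), yat = pretty(1:4),
              zat = log(zt), zlab = format(zt),
              color = "grey50", emission = "grey50")
}
```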

R legend for color density scatterplot produced using smoothScatter

I am producing a color density scatterplot in R using the smoothScatter() function.
Example:
## A largish data set
n <- 10000
x1 <- matrix(rnorm(n), ncol = 2)
x2 <- matrix(rnorm(n, mean = 3, sd = 1.5), ncol = 2)
x <- rbind(x1, x2)
oldpar <- par(mfrow = c(2, 2))
smoothScatter(x, nrpoints = 0)
Output:
The issue I am having is that I am unsure how to add a legend/color scale that describes the relative difference in numeric terms between different shades. For example, there is no way to tell whether the darkest blue in the figure above is 2 times, 10 times or 100 times as dense as the lightest blue without some sort of legend or color scale. Is there any way in R to retrieve the requisite information to make such a scale, or anything built in that can produce a color scale of this nature automatically?
Here is an answer that relies on fields::image.plot and some fiddling with par(mar) to get the margins correct:
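fudgeit <- function(){
  # smoothScatter() calls this hook with no arguments, so fetch the grid,
  # the (transformed) density and the colour ramp from its frame
  xm <- get('xm', envir = parent.frame(1))
  ym <- get('ym', envir = parent.frame(1))
  z <- get('dens', envir = parent.frame(1))
  colramp <- get('colramp', parent.frame(1))
  # draw only the colour bar legend
  fields::image.plot(xm, ym, z, col = colramp(256), legend.only = TRUE, add = FALSE)
}
# widen the right margin to make room for the legend
par(mar = c(5,4,4,5) + .1)
smoothScatter(x, nrpoints = 0, postPlotHook = fudgeit)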
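You can fiddle around with image.plot to get what you want, and look at ?bkde2D and the transformation argument to smoothScatter to get an idea of what the colours represent. By default, transformation = function(x) x^.25, so the legend shows the fourth root of the estimated density, not the density itself.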
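Under the default fourth-root transformation, ratios on the legend scale translate into much larger density ratios: a shade twice as far up the scale is 2^4 = 16 times as dense. The sketch below shows that arithmetic and a variant of fudgeit whose legend ticks are labelled in untransformed density. Both density_ratio and fudgeit_raw are hypothetical names; the variant assumes the default transformation, that the fields package is installed, and that image.plot's axis.args argument is passed on to the legend axis.

```r
# Hypothetical helper: density ratio between two shades at transformed
# levels t1 and t2, assuming smoothScatter's default x^0.25 transformation
density_ratio <- function(t1, t2) (t2 / t1)^4
density_ratio(0.2, 0.4)  # 16: twice as far up the scale, 16x as dense

# Variant of fudgeit(): legend positions stay on the transformed scale,
# but the tick labels are back-transformed (t^4) to raw density
fudgeit_raw <- function() {
  env <- parent.frame(1)
  z <- get("dens", envir = env)          # density after x^0.25
  colramp <- get("colramp", envir = env)
  at <- pretty(range(z))                 # tick positions, transformed scale
  fields::image.plot(get("xm", envir = env), get("ym", envir = env), z,
                     col = colramp(256), legend.only = TRUE,
                     axis.args = list(at = at, labels = signif(at^4, 2)))
}
```

It is used the same way as fudgeit: smoothScatter(x, nrpoints = 0, postPlotHook = fudgeit_raw).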

How to plot a normal distribution by labeling specific parts of the x-axis?

I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. The default ticks then fall at -3 to 3, so the +/- 1 to 3 sigma points are labeled as such, and the mean is labeled 0, indicating 0 deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
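If you prefer Greek letters to "s" and "mean", plotmath expressions can be passed as axis labels. A sketch using the question's x and y; the mu/sigma wording is an illustrative choice, not taken from the answers above.

```r
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l", lwd = 2, xaxt = "n", xlab = "", ylab = "density")
# plotmath expressions render mu and sigma as tick labels
axis(1, at = -3:3,
     labels = expression(mu - 3*sigma, mu - 2*sigma, mu - sigma, mu,
                         mu + sigma, mu + 2*sigma, mu + 3*sigma))
```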
If you like doing things the hard way, without R's built-in dnorm() function, or you want to do this outside R, you can use the density formula directly:
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
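A quick way to check the hand-written formula is to compare it with dnorm() for a non-standard mean and standard deviation. The values mu = 2 and s = 1.5 are arbitrary choices for the check.

```r
mu <- 2
s <- 1.5
x <- seq(mu - 4 * s, mu + 4 * s, length = 200)
y_formula <- (1 / (s * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * s^2))
# Agrees with R's built-in density up to floating-point error
all.equal(y_formula, dnorm(x, mean = mu, sd = s))  # TRUE
```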
An extremely inefficient and unusual, but beautiful solution, which works based on the ideas of Monte Carlo simulation, is this:
simulate many draws (or samples) from a given distribution (say the normal) using rnorm. The rnorm function takes the arguments (n, mean, sd) and returns a vector of n samples from a normal distribution with the given mean and standard deviation.
plot the density of these draws using density().
Thus to take a sample of size 50,000 from a standard normal (i.e., a normal with mean 0 and standard deviation 1) and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws goes to infinity, this kernel density estimate converges to the normal density. To illustrate this, see the image below, which shows (from left to right and top to bottom) 5000, 50000, 500000 and 5 million samples.
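The convergence can also be checked numerically by overlaying the exact density and measuring the largest gap between the two curves. The seed and sample size below are arbitrary; the gap shrinks, slowly, as the number of draws grows.

```r
set.seed(123)
x <- rnorm(50000, mean = 0, sd = 1)
d <- density(x)                    # kernel density estimate of the draws
plot(d, main = "KDE of 50,000 N(0, 1) draws")
curve(dnorm(x), add = TRUE, col = "red", lty = 2)  # exact density
# Largest absolute gap between the estimate and the true density
max_err <- max(abs(d$y - dnorm(d$x)))
max_err
```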
In the general case, for example Normal(2, 1):
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is very general: f can be defined freely, with any given parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly like lattice for this. It makes it easy to add graphical information such as specific areas under a curve, which you usually need for probability problems such as finding P(a < X < b).
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up our normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) #set z and lines
xx <- c(1, x[x>=1 & x<=1.5], 1.5) #Color area
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I make the area between z = 1 and z = 1.5 stand out. You can easily change these parameters to suit your problem.
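The shaded region corresponds to a probability that can be computed directly, either from the cumulative distribution function or by numerically integrating the density over the same interval:

```r
# P(1 < Z < 1.5) for Z ~ N(0, 1), i.e. the red area
p <- pnorm(1.5) - pnorm(1)
round(p, 4)  # 0.0918
# Same area by integrating the density between the two z values
integrate(dnorm, 1, 1.5)$value
```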
Axis labels are automatic.
This is how to wrap it in a function:
normalCriticalTest <- function(mu, s) {
  x <- seq(mu - 4*s, mu + 4*s, length=200) # x extends 4 sd either side of the mean
  # y follows the formula of the normal density f(x)
  y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
  plot(x, y, type="l", lwd=2, xlim = mu + c(-3.5, 3.5)*s)
  # vertical lines at the critical values, leaving 2.5% of the area in each tail
  abline(v = mu + c(-1.96, 1.96)*s, col="red")
}
normalCriticalTest(0, 1) # draw a normal distribution with vertical lines.
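The value 1.96 is the rounded 97.5% quantile of the standard normal. qnorm() gives it exactly, and for a normal with mean mu and standard deviation s the critical points are mu + z * s. The values mu = 10 and s = 2 below are arbitrary.

```r
# Two-sided 5% critical values of N(0, 1), instead of hard-coding 1.96
z_crit <- qnorm(c(0.025, 0.975))
round(z_crit, 2)       # -1.96  1.96
2 * pnorm(z_crit[1])   # 0.05, the total area in both tails
# Critical points for a normal with mean mu and standard deviation s
mu <- 10
s <- 2
mu + z_crit * s
```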
Final result:

Plotting multiple curves same graph and same scale

This is a follow-up of this question.
I want to plot multiple curves on the same graph so that the new curves respect the y-axis scale generated by the first curve.
Notice the following example:
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
# first plot
plot(x, y1)
# second plot
par(new = TRUE)
plot(x, y2, axes = FALSE, xlab = "", ylab = "")
That actually plots both sets of values on the same coordinates of the graph (because I'm hiding the new y-axis that would be created with the second plot).
My question then is how to maintain the same y-axis scale when plotting the second graph.
(The typical method would be to call plot just once to set up the limits, possibly including the range of all series combined, and then to use points and lines to add the separate series.) To use plot multiple times with par(new=TRUE), you need to make sure that your first plot has a ylim that accommodates all the series (and in other situations you may need the same strategy for xlim):
# first plot
plot(x, y1, ylim=range(c(y1,y2)))
# second plot EDIT: needs to have same ylim
par(new = TRUE)
plot(x, y2, ylim=range(c(y1,y2)), axes = FALSE, xlab = "", ylab = "")
This next code does the task more compactly. By default matplot uses column numbers as plotting symbols; the second call gives you typical R-style points:
matplot(x, cbind(y1,y2))
matplot(x, cbind(y1,y2), pch=1)
points or lines come in handy if
y2 is generated later, or
the new data does not have the same x but still should go into the same coordinate system.
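The plot-once-then-add approach can be wrapped up so the shared limits are computed automatically. plot_series below is a hypothetical helper, not a base R function; it assumes all series share the same x.

```r
# Hypothetical helper: plot several y series against one x on a shared scale
plot_series <- function(x, ..., type = "b") {
  ys <- list(...)
  ylim <- range(unlist(ys), na.rm = TRUE)   # limits covering every series
  plot(x, ys[[1]], ylim = ylim, type = type, xlab = "x", ylab = "y")
  for (i in seq_along(ys)[-1]) {
    lines(x, ys[[i]], type = type, col = i) # add the rest to the same axes
  }
  invisible(ylim)
}

y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
lim <- plot_series(x, y1, y2)
```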
As your ys share the same x, you can also use matplot:
matplot (x, cbind (y1, y2), pch = 19)
(without the pch argument, matplot will plot the column numbers of the y matrix instead of dots).
You aren't being very clear about what you want here, since I think @DWin's answer is technically correct given your example code. I think what you really want is this:
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
# first plot
plot(x, y1,ylim = range(c(y1,y2)))
# Add points
points(x, y2)
DWin's solution was operating under the implicit assumption (based on your example code) that you wanted to plot the second set of points overlaid on the original scale. That's why his image looks like the points are plotted at 1, 101, etc. Calling plot a second time isn't what you want; you want to add to the plot using points. So the above code on my machine produces this:
But DWin's main point about using ylim is correct.
My solution is to use ggplot2. It takes care of these things automatically. The main thing is to arrange the data appropriately.
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
df <- data.frame(x=rep(x,2), y=c(y1, y2), class=c(rep("y1", 5), rep("y2", 5)))
Then use ggplot2 to plot it
library(ggplot2)
ggplot(df, aes(x=x, y=y, color=class)) + geom_point()
This is saying plot the data in df, and separate the points by class.
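Base R's stack() can build the same long-format data frame without typing out the class labels by hand. A sketch with the question's vectors; stack() names its columns values and ind, so the aes() mapping changes accordingly, and the ggplot2 call only runs if the package is installed.

```r
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
# stack() turns the two columns into 'values' plus a factor 'ind' (y1/y2)
df <- stack(data.frame(y1, y2))
df$x <- rep(x, 2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  print(ggplot2::ggplot(df, ggplot2::aes(x = x, y = values, color = ind)) +
          ggplot2::geom_point())
}
```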
The plot generated is
I'm not sure what you want, but I'll use lattice.
x = rep(x,2)
y = c(y1,y2)
fac.data = as.factor(rep(1:2,each=5))
df = data.frame(x=x,y=y,z=fac.data)
# this creates a data frame with a factor variable, z, that tells me which series each row belongs to (y1 or y2)
Then, just plot:
library(lattice)
xyplot(y ~x|z, df)
# or maybe
xyplot(x ~y|z, df)
