Find y-value given x-value on a beta distribution - r

I am trying to find the y-values of points on a beta curve.
This is my beta; let's say I would like to find the point whose x-value is 0.6, for example:
x=seq(0,1,length=100)
y=dbeta(x,7,2)
plot(x,y, type="l", col="blue")
abline(v=0.6)
I have tried to add the corresponding point, but for some reason it does not work:
points(0.6, beta(7, 2), cex=3, pch=20, col="black")
Once this problem is fixed, how can I find the y-value?
I looked online; I found some examples using approxfun but I don't know how to apply it to this problem.

You need to use dbeta() instead of beta() (assuming that's not a typo), and specify all three of x, shape1, and shape2. I think you want
points(0.6, dbeta(0.6, shape1=7, shape2=2),
cex=3, pch=20, col="black")
If you want to store the actual y-value in a variable, use
bval <- dbeta(0.6, shape1=7, shape2=2)
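To see why beta(7, 2) put the point in the wrong place: beta() is the Beta function B(a, b), a single constant, whereas dbeta() is the density evaluated at x. As for approxfun, mentioned in the question, it is not really needed here because dbeta() can be evaluated at any x directly. Still, a minimal sketch of both routes (the variable names are my own) might look like this:

```r
x <- seq(0, 1, length = 100)
y <- dbeta(x, 7, 2)

# Exact density at x = 0.6
exact <- dbeta(0.6, shape1 = 7, shape2 = 2)   # about 1.045

# The same value from the definition: x^(a-1) * (1-x)^(b-1) / B(a, b)
by_hand <- 0.6^6 * 0.4^1 / beta(7, 2)

# Interpolating the plotted points instead; only approximate,
# since it joins neighbouring grid points with straight lines
f <- approxfun(x, y)
interp <- f(0.6)
```

The interpolated value agrees closely with the exact one on this grid, so approxfun is mainly useful when you only have the plotted points and not the function that produced them.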

Related

Specificity/Sensitivity vs cut-off points using pROC package

I need to plot the following graph so I can choose the optimal threshold for a logistic regression model.
However, I can't use the packages (epi and roc) that appear in most of the examples I have found. I do have the package pROC. Is there any way to plot the graph using this package? Also, how else could I choose the optimal threshold? How does it work using only the ROC curve?
If you are using the pROC package, the first step is to extract the coordinates of the curve. For instance:
library(pROC)
data(aSAH)
myroc <- roc(aSAH$outcome, aSAH$ndka)
mycoords <- coords(myroc, "all", transpose=TRUE) # pROC >= 1.15: transpose=TRUE returns a matrix with one row per coordinate
Once you have that you can plot anything you like. This should be somewhat close to your example.
plot(mycoords["threshold",], mycoords["specificity",], type="l",
col="red", xlab="Cutoff", ylab="Performance")
lines(mycoords["threshold",], mycoords["sensitivity",], type="l",
col="blue")
legend(100, 0.4, c("Specificity", "Sensitivity"),
col=c("red", "blue"), lty=1)
Choosing the "optimal" cutpoint is as difficult as defining what is optimal in the first place. It highly depends on the context and your application.
A common shortcut is to use the Youden index, which picks the cutoff that maximizes specificity + sensitivity. Again with pROC:
best.coords <- coords(myroc, "best", best.method="youden", transpose=TRUE) # pROC >= 1.15: transpose=TRUE returns a named vector
abline(v=best.coords["threshold"], lty=2, col="grey")
abline(h=best.coords["specificity"], lty=2, col="red")
abline(h=best.coords["sensitivity"], lty=2, col="blue")
With pROC you can change the criteria for the "best" threshold. See the ?coords help page and the best.method and best.weights arguments for quick ways to tune it. You may want to look at the OptimalCutpoints package for more advanced ways to select your own optimum.
[Figure: specificity (red) and sensitivity (blue) plotted against the cutoff, with dashed lines marking the Youden threshold.]
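For intuition about what the Youden criterion computes, here is a rough base-R sketch that does it by hand, without pROC. The data are made up for illustration, and I am assuming that higher marker values indicate the positive class; pROC handles direction and candidate thresholds more carefully than this.

```r
# Hypothetical toy data: marker values and true class (1 = case, 0 = control)
marker  <- c(0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8)
outcome <- c(0,   0,   0,   1,    0,   0,    1,   1,   1,   1)

# Candidate cutoffs: midpoints between consecutive sorted marker values
s <- sort(unique(marker))
cutoffs <- (head(s, -1) + tail(s, -1)) / 2

# Classify as positive when marker > cutoff
sens <- sapply(cutoffs, function(t) mean(marker[outcome == 1] > t))
spec <- sapply(cutoffs, function(t) mean(marker[outcome == 0] <= t))

# Youden's J = sensitivity + specificity - 1; take the cutoff that maximizes it
youden <- sens + spec - 1
best_cutoff <- cutoffs[which.max(youden)]   # 0.475 for this toy data
```

Plotting sens and spec against cutoffs gives the same kind of picture as the pROC code above.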

Multiple plots using curve() function (e.g. normal distribution)

I am trying to plot multiple functions using curve(). My example tries to plot multiple normal distributions with different means and the same standard deviation.
png("d:/R/standardnormal-different-means.png",width=600,height=300)
#First normal distribution
curve(dnorm,
from=-2,to=2,ylab="d(x)",
xlim=c(-5,5))
abline(v=0,lwd=4,col="black")
#Only second normal distribution is plotted
myMean <- -1
curve(dnorm(x,mean=myMean),
from=myMean-2,to=myMean+2,
ylab="d(x)",xlim=c(-5,5), col="blue")
abline(v=-1,lwd=4,col="blue")
dev.off()
As the curve() function creates a new plot each time, only the second normal distribution is plotted.
I reopened this question because the ostensible duplicates focus on plotting two different functions or two different y-vectors with separate calls to curve. But since we want the same function, dnorm, plotted for different means, we can automate the process (although the answers to the other questions could also be generalized and automated in a similar way).
For example:
my_curve = function(m, col) {
curve(dnorm(x, mean=m), from=m - 3, to=m + 3, col=col, add=TRUE)
abline(v=m, lwd=2, col=col)
}
plot(NA, xlim=c(-10,10), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, seq(-6,6,2), rainbow(7))
Or, to generalize still further, let's allow multiple means and standard deviations and provide an option regarding whether to include a mean line:
my_curve = function(m, sd, col, meanline=TRUE) {
curve(dnorm(x, mean=m, sd=sd), from=m - 3*sd, to=m + 3*sd, col=col, add=TRUE)
if(meanline==TRUE) abline(v=m, lwd=2, col=col)
}
plot(NA, xlim=c(-10,10), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, rep(0,4), 4:1, rainbow(4), MoreArgs=list(meanline=FALSE))
You can also use line segments that start at zero and stop at the top of the density distribution, rather than extending all the way from the bottom to the top of the plot. For a normal distribution the mean is also the point of highest density. However, I've used the which.max approach below as a more general way of identifying the x-value at which the maximum y-value occurs. I've also added arguments for line width (lwd) and line end cap style (lend=1 means flat rather than rounded):
my_curve = function(m, sd, col, meanline=TRUE, lwd=1, lend=1) {
x=curve(dnorm(x, mean=m, sd=sd), from=m - 3*sd, to=m + 3*sd, col=col, add=TRUE)
if(meanline==TRUE) segments(m, 0, m, x$y[which.max(x$y)], col=col, lwd=lwd, lend=lend)
}
plot(NA, xlim=c(-10,20), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, seq(-5,5,5), c(1,3,5), rainbow(3))
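If you would rather avoid repeated curve() calls altogether, another option is to evaluate dnorm on a shared grid and draw all the columns at once with matplot. This is only a sketch; plot_normals and its arguments are names I made up for illustration.

```r
# Evaluate dnorm on one grid for several means, then draw every curve in one call
plot_normals <- function(means, sd = 1, n = 201) {
  x <- seq(min(means) - 3 * sd, max(means) + 3 * sd, length.out = n)
  # One column of densities per mean
  dens <- outer(x, means, function(x, m) dnorm(x, mean = m, sd = sd))
  matplot(x, dens, type = "l", lty = 1, col = seq_along(means), ylab = "d(x)")
  abline(v = means, col = seq_along(means), lty = 2)  # dashed line at each mean
  invisible(list(x = x, dens = dens))
}
```

Calling plot_normals(c(0, -1)) draws the two curves from the question on a single set of axes, with the axis limits chosen from the data rather than set by hand.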

R Beta function - relative y scale

I am having trouble understanding the Beta function in R. I want the y scale to display a relative value (from 0 to 1). How do I achieve this while keeping the same shape of the graph?
x = seq(0,1,0.001)
plot(x,dbeta(x,10,40), type="l", col="red", xlab="time", ylab="frequency")
It sounds like you're looking for the beta density, normalized so the maximum value is 1. This could be accomplished with:
x = seq(0,1,0.001)
density = dbeta(x, 10, 40)
plot(x, density/max(density), type="l", col="red", xlab="time", ylab="frequency")
I am sure you have looked at the help page; perhaps its Value section has what you are looking for:
dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.
I think you want to plot pbeta.
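The two suggestions give different curves, so it may help to compute both side by side. A small sketch (variable names are mine): rescaling the density keeps the bell shape with a peak of exactly 1, while pbeta is the cumulative probability, which rises from 0 to 1 and does not keep that shape.

```r
x <- seq(0, 1, 0.001)
dens <- dbeta(x, 10, 40)

rel <- dens / max(dens)        # same shape as the density, peak rescaled to 1
mode_x <- x[which.max(dens)]   # grid point nearest the mode (a-1)/(a+b-2) = 0.1875

cdf <- pbeta(x, 10, 40)        # cumulative probability, increasing from 0 to 1
```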

filled.contour() in R: nonlinear key range

I am using filled.contour() to plot data stored in a matrix. The data is generated by a (highly) non-linear function, hence its distribution is not uniform at all and the range is very large.
Consequently, I have to use the option "levels" to fine tune the plot. However, filled.contour() does not use these custom levels to make an appropriate color key for the heat map, which I find quite surprising.
Here is a simple example of what I mean:
x = c(20:200/100)
y = c(20:200/100)
z = as.matrix(exp(x^2)) %*% exp(y^2)
filled.contour(x=x,y=y,z=z,color.palette=colorRampPalette(c('green','yellow','red')),levels=c(1:60/3,30,50,150,250,1000,3000))
As you can see, the color key produced with the code above is pretty much useless. I would like to use some sort of projection (perhaps sin(x) or tanh(x)?), so that the upper range is not over-represented in the key (in a linear way).
At this point, I would like to:
1) know if there is something very simple/obvious I am missing, e.g.: an option to make this "key range adapting" automagically;
2) seek suggestions/help on how to do it myself, should the answer to 1) be negative.
Thanks a lot!
PS: I apologize for my English, which is far from perfect. Please let me know if you need me to clarify anything.
I feel your frustration. I never found a way to do this with filled.contour, so I have usually reverted to using image and then adding my own scale as a separate plot. I wrote the function image.scale to help with this (the URL is in the comment on the source() line below). Below is an example of how you can apply a log transform to your scale in order to stretch out the small values, and then label the scale with the untransformed values:
Example:
source("image.scale.R") # http://menugget.blogspot.de/2011/08/adding-scale-to-image-plot.html
x = c(20:200/100)
y = c(20:200/100)
z = as.matrix(exp(x^2)) %*% exp(y^2)
pal <- colorRampPalette(c('green','yellow','red'))
breaks <- c(1:60/3,30,50,150,250,1000,3000)
ncolors <- length(breaks)-1
labs <- c(0.5, 1, 3,30,50,150,250,1000,3000)
#x11(width=6, height=6)
layout(matrix(1:2, nrow=1, ncol=2), widths=c(5,1), heights=c(6))
layout.show(2)
par(mar=c(5,5,1,1))
image(x=x,y=y,z=log(z), col=pal(ncolors), breaks=log(breaks))
box()
par(mar=c(5,0,1,4))
image.scale(log(z), col=pal(ncolors), breaks=log(breaks), horiz=FALSE, xlab="", ylab="", xaxt="n", yaxt="n")
axis(4, at=log(labs), labels=labs)
box()
Result: [Figure: image plot of log(z) with a colour key labelled in the original, untransformed units.]
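For what it's worth, a similar effect may be possible while staying with filled.contour: plot log(z) against log-transformed levels, and use the key.axes argument to label the key in the original units. This is a sketch under that assumption rather than a tested replacement for the image.scale approach; draw_log_key is a hypothetical wrapper name.

```r
x <- 20:200 / 100
y <- 20:200 / 100
z <- outer(exp(x^2), exp(y^2))   # same matrix as in the question

breaks <- c(1:60 / 3, 30, 50, 150, 250, 1000, 3000)
labs   <- c(0.5, 1, 3, 30, 50, 150, 250, 1000, 3000)
pal    <- colorRampPalette(c("green", "yellow", "red"))

draw_log_key <- function() {
  # Contour on the log scale so the colours are spread evenly over the key,
  # then place the key's tick marks at log(labs) but print the raw values
  filled.contour(x, y, log(z), levels = log(breaks),
                 col = pal(length(breaks) - 1),
                 key.axes = axis(4, at = log(labs), labels = labs))
}
```

Calling draw_log_key() draws the plot. The key is still linear in log(z), so the closely spaced levels below 20 take up most of it and the large values are compressed toward the top.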

Error: "Hit <Return> to see next plot: " in r

I have the following code
frame()
Y = read.table("Yfile.txt",header=T,row.names=NULL,sep='')
X = read.table("Xfile.txt",header=F,sep='')
plot(Y$V1~X$V1,pch=20,xlim=c(0,27))
par(new=T)
plot(Y$V1~X$V2,pch=20,xlim=c(0,27),col='red')
par(new=T)
plot(Y$V1~Y$V3,pch=20,xlim=c(0,27),col='blue')
par(new=T)
All is well and I get the 3 graphs on the same plot. However, when I divide the x variables by Y$V2 to normalise the data, like this:
plot(Y$V1~X$V1/Y$V2,pch=20,xlim=c(0,27))
par(new=T)
plot(Y$V1~X$V2/Y$V2,pch=20,xlim=c(0,27),col='red')
par(new=T)
plot(Y$V1~Y$V3/Y$V2,pch=20,xlim=c(0,27),col='blue')
par(new=T)
I get the message
Hit Return to see next plot:
and the graphs just won't show in the same plot. Could anybody tell me what is happening and how to solve it?
If you want to use arithmetic operations in a formula, you have to wrap them in the I() function. So
plot(Y$V1~I(X$V1/Y$V2),pch=20,xlim=c(0,27))
par(new=T)
plot(Y$V1~I(X$V2/Y$V2),pch=20,xlim=c(0,27),col='red')
par(new=T)
plot(Y$V1~I(Y$V3/Y$V2),pch=20,xlim=c(0,27),col='blue')
par(new=T)
works.
From the help page for formula:
To avoid this confusion, the function I() can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as the sum of b and c.
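The reason for the prompt, as far as I can tell, is that inside a formula / is not division: a/b is shorthand for a + a:b, so the right-hand side has two terms and plot() draws one plot per term, pausing between them. A small check with placeholder names y, a and b (not the asker's data):

```r
f_raw   <- y ~ a / b      # formula operator: expands to a + a:b
f_arith <- y ~ I(a / b)   # arithmetic division, kept as a single term

terms_raw   <- attr(terms(f_raw), "term.labels")    # "a" "a:b"
terms_arith <- attr(terms(f_arith), "term.labels")  # "I(a/b)"
```

Two terms means two plots, hence the "Hit <Return>" message; with I() there is only one.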
Edit. You could do it without formula in one command:
plot(c(X$V1/Y$V2, X$V2/Y$V2, Y$V3/Y$V2), rep(Y$V1, 3),
pch=20, xlim=c(0,27),
col=rep(c("black", "red", "blue"), each=nrow(Y)) # one colour per series
)
I'm not sure why you get the error, but using points instead of plot for the second and third graphs is a much more elegant solution (and gets rid of those par calls).
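A minimal sketch of that approach, using made-up data frames as stand-ins for Yfile.txt and Xfile.txt (the real files are not shown, so the column ranges here are assumptions):

```r
# Hypothetical stand-ins for the two input files
set.seed(1)
Y <- data.frame(V1 = runif(30, 0, 10), V2 = runif(30, 1, 2), V3 = runif(30, 0, 27))
X <- data.frame(V1 = runif(30, 0, 27), V2 = runif(30, 0, 27))

overlay <- function() {
  # The first call sets up the axes; points() then adds to the same plot
  plot(X$V1 / Y$V2, Y$V1, pch = 20, xlim = c(0, 27))
  points(X$V2 / Y$V2, Y$V1, pch = 20, col = "red")
  points(Y$V3 / Y$V2, Y$V1, pch = 20, col = "blue")
}
```

Calling overlay() draws all three series on one set of axes. Because x and y are passed as plain vectors rather than as a formula, the division is ordinary arithmetic and no I() is needed.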
