Plot a regression equation with mean standard error in R - r

I would like to plot values from the image from a model regression, with R.
https://imgur.com/DDbP29T
my plan is to create the equation and plot with the curve function
eqn = function(x){ZZZ}
curve(eqn, from=0, to=50, n=50)
I expect a logarithmic growth curve

Something along these "lines" as it were; Consult ?curve, ?plot for further details about the arguments used, and your stats books for why I picked 1.96 as a multiplier for the sum (roughly) of the se values
eqn = function(x){62.259+11.395*log(x) -10.268}
curve(eqn, from=0, to=50, n=50, ylim=c(0,100))
lines(x=seq(0,50),y=eqn(seq(0,50))+ 1.96*8.5, lty=3)
lines(x=seq(0,50),y=eqn(seq(0,50))- 1.96*8.5, lty=3)
(I'm guessing you are trying to reproduce a plot you have seen constructed in the SAS product named "JMP". I used it for a while around 2008-2009, but found it too limited in its capabilities, not to mention becoming quite expensive.)

Related

kernel density estimator in R

I'm using the last column from the following data,
Data
And I'm trying to apply the idea of a kernel density estimator to this dataset which is represented by
where k is some kernal, normally a normal distribution though not necessarily., h is the bandwidth, n is the length of the data set, X_i is each data point and x is a fitted value. So using this equation I have the following code,
AstroData=read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/galaxy.dat"),
header=FALSE)
x=AstroData$V3
xsorted=sort(x)
x_i=xsorted[1:1266]
hist(x_i, nclass=308)
n=length(x_i)
h1=.002
t=seq(min(x_i),max(x_i),0.01)
M=length(t)
fhat1=rep(0,M)
for (i in 1:M){
fhat1[i]=sum(dnorm((t[i]-x_i)/h1))/(n*h1)}
lines(t, fhat1, lwd=2, col="red")
Which produces a the following plot,
which is actually close to what I want as the final result should appear as this once I remove the histograms,
Which if you noticed is finer tuned and the red lines which should represent the density are rather rough and are not scaled as high. The final plot that you see is run using the density function in R,
plot(density(x=y, bw=.002))
Which is what I want to get to without having to use any additional packages.
Thank you
After some talk with my roommate he gave me the idea to go ahead and decrease the interval of the t-values (x). In doing some I changed it from 0.01 to 0.001. So the final code for this plot is as appears,
AstroData=read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/galaxy.dat"),
header=FALSE)
x=AstroData$V3
xsorted=sort(x)
x_i=xsorted[1:1266]
hist(x_i, nclass=308)
n=length(x_i)
h1=.002
t=seq(min(x_i),max(x_i),0.001)
M=length(t)
fhat1=rep(0,M)
for (i in 1:M){
fhat1[i]=sum(dnorm((t[i]-x_i)/h1))/(n*h1)}
lines(t, fhat1, lwd=2, col="blue")
Which in terms gives the following plot, which is the one that I wanted,

Specificity/Sensitivity vs cut-off points using pROC package

I need to plot the following graph so I can choose the optimal threshold for a logistic regression model.
However I can't use the packages (epi and roc) which are used in many of the research I have done. I do have the package pROC. Is there anyway to plot the graph using this package. Also how else could I choose the optimal threshold? How does it work using only the ROC curve?
If you are using the pROC package, the first step is to extract the coordinates of the curve. For instance:
library(pROC)
data(aSAH)
myroc <- roc(aSAH$outcome, aSAH$ndka)
mycoords <- coords(myroc, "all")
Once you have that you can plot anything you like. This should be somewhat close to your example.
plot(mycoords["threshold",], mycoords["specificity",], type="l",
col="red", xlab="Cutoff", ylab="Performance")
lines(mycoords["threshold",], mycoords["sensitivity",], type="l",
col="blue")
legend(100, 0.4, c("Specificity", "Sensitivity"),
col=c("red", "blue"), lty=1)
Choosing the "optimal" cutpoint is as difficult as defining what is optimal in the first place. It highly depends on the context and your application.
A common shortcut is to use the Youden index, which is simply the point with the cutoff with max(specificity + sensitivity). Again with pROC:
best.coords <- coords(myroc, "best", best.method="youden")
abline(v=best.coords["threshold"], lty=2, col="grey")
abline(h=best.coords["specificity"], lty=2, col="red")
abline(h=best.coords["sensitivity"], lty=2, col="blue")
With pROC you can change the criteria for the "best" threshold. See the ?coords help page and the best.method and best.weights arguments for quick ways to tune it. You may want to look at the OptimalCutpoints package for more advanced ways to select your own optimum.
The output plot should look something like this:

How do I plot an abline() when I don't have any data points (in R)

I have to plot a few different simple linear models on a chart, the main point being to comment on them. I have no data for the models. I can't get R to create a plot with appropriate axes, i.e. I can't get the range of the axes correct. I think I'd like my y-axis to 0-400 and x to be 0-50.
Models are:
$$
\widehat y=108+0.20x_1
$$$$
\widehat y=101+2.15x_1
$$$$
\widehat y=132+0.20x_1
$$$$
\widehat y=119+8.15x_1
$$
I know I could possibly do this much more easily in a different software or create a dataset from the model and estimate and plot the model from that but I'd love to know if there is a better way in R.
As #Glen_b noticed, type = "n" in plot produces a plot with nothing on it. As it demands data, you have to provide anything as x - it can be NA, or some data. If you provide actual data, the plot function will figure out the plot margins from the data, otherwise you have to choose the margins by hand using xlim and ylim arguments. Next, you use abline that has parameters a and b for intercept and slope (or h and v if you want just a horizontal or vertical line).
plot(x=NA, type="n", ylim=c(100, 250), xlim=c(0, 50),
xlab=expression(x[1]), ylab=expression(hat(y)))
abline(a=108, b=0.2, col="red")
abline(a=101, b=2.15, col="green")
abline(a=132, b=0.2, col="blue")
abline(a=119, b=8.15, col="orange")

How to scale/transform graphics::plot() axes with any transformation, not just logarithmic (for Weibull plots)?

I am building an R package to display Weibull plots (using graphics::plot) in R. The plot has a log-transformed x-axis and a Weibull-transformed y-axis (for lack of a better description). The two-parameter Weibull distribution can thus be represented as a straight line on this plot.
The logarithmic transformation of the x-axis is as simple as adding the log="x" parameter to plot() or curve(). How can I supply the y-axis transformation in an elegant way, so that all graphics-related plotting will work on my axis-transformed plot? To demonstrate what I need, run the following example code:
## initialisation ##
beta <- 2;eta <- 1000
ticks <- c(seq(0.01,0.09,0.01),(1:9)/10,seq(0.91,0.99,0.01))
F0inv <- function (p) log(qweibull(p, 1, 1))
# this is the transformation function
F0 <- function (q) exp(-exp(q))
# this is the inverse of the transformation function
weibull <- function(x)pweibull(x,beta,eta)
# the curve of this function represents the weibull distribution
# as a straight line on weibull paper
weibull2 <- function(x)F0inv(weibull(x))
First an example of a Weibull distribution with beta=2 and eta=1000 on a regular, untransformed plot:
## untransformed axes ##
curve(weibull ,xlim=c(100,1e4),ylim=c(0.01,0.99))
abline(h=ticks,col="lightgray")
This plot is useless for Weibull analysis. Here is my currently implemented solution that transforms the data with function F0inv() and modifies the y-axis of the plot. Notice that I have to use F0inv() on all y-axis related data.
## transformed axis with F0inv() ##
curve(weibull2,xlim=c(100,1e4),ylim=F0inv(c(0.01,0.99)),log="x",axes=F)
axis(1);axis(2,at=F0inv(ticks),labels=ticks)
abline(h=F0inv(ticks),col="lightgray")
This works, but this is not very user-friendly: when the user wants to add annotations, one must always use F0inv():
text(300,F0inv(0.4),"at 40%")
I found that you can achieve a solution to my problem using ggplot2 and scales, but I don't want to change to a graphics package unless absolutely necessary since a lot of other code needs to be rewritten.
## with ggplot2 and scales ##
library(ggplot2)
library(scales)
weibull_trans <- function()trans_new("weibull", F0inv, F0)
qplot(c(100,1e4),xlim=c(100,1e4),ylim=c(0.01,0.99),
stat="function",geom="line",fun=weibull) +
coord_trans(x="log10",y = "weibull")
I think that if I could dynamically replace the code for applying the logarithmic transformation with my own, my problem would be solved.
I tried to find more information by Googling "R axis transformation", "R user coordinates", "R axis scaling" without useful results. Almost everything I have found dealt with logarithmic scales.
I tried to look into plot() at how the log="x" parameter works, but the relevant code for plot.window is written in C – not my strongest point at all.
While it doesn't appear to be possible in base graphics, you can make this function do what you want so that you can call it more simply:
F0inv <- function (p) log(qweibull(p, 1, 1))
## this is the transformation function
F0 <- function (q) exp(-exp(q))
weibullplot <- function(eta, beta,
ticks=c(seq(0.01,0.09,0.01),(1:9)/10,seq(0.91,0.99,0.01)),
...) {
## the curve of this function represents the weibull distribution
## as a straight line on weibull paper
weibull2 <- function(x)
F0inv(pweibull(x, beta, eta))
curve(weibull2, xlim=c(100, 1e4), ylim=F0inv(c(0.01, 0.99)), log="x", axes=FALSE)
axis(1);
axis(2, at=F0inv(ticks), labels=ticks)
abline(h=F0inv(ticks),col="lightgray")
}
weibullplot(eta=1000, beta=2)

How do I plot the 'inverse' of a survival function?

I am trying to plot the inverse of a survival function, as the data I'm is actually an increase in proportion of an event over time. I can produce Kaplan-Meier survival plots, but I want to produce the 'opposite' of these. I can kind of get what I want using the following fun="cloglog":
plot(survfit(Surv(Days_until_workers,Workers)~Queen_Number+Treatment,data=xdata),
fun="cloglog", lty=c(1:4), lwd=2, ylab="Colonies with Workers",
xlab="Days", las=1, font.lab=2, bty="n")
But I don't understand quite what this has done to the time (i.e. doesn't start at 0 and distance decreases?), and why the survival lines extend above the y axis.
Would really appreciate some help with this!
Cheers
Use fun="event" to get the desired output
fit <- survfit(Surv(time, status) ~ x, data = aml)
par(mfrow=1:2, las=1)
plot(fit, col=2:3)
plot(fit, col=2:3, fun="event")
The reason for fun="cloglog" screwing up the axes is that it does not plot a fraction at all. It is instead plotting this according to ?plot.survfit:
"cloglog" creates a complimentary log-log survival plot (f(y) = log(-log(y)) along with log scale for the x-axis)
Moreover, the fun argument is not limited to predefined functions like "event" or "cloglog", so you can easily give it your own custom function.
plot(fit, col=2:3, fun=function(y) 3*sqrt(1-y))

Resources