I am using the following code to plot the ROC curve after having run the logistic regression.
fit1 <- glm(formula=GB160M3~Behvscore, data=eflscr,family="binomial", na.action = na.exclude)
prob1=predict(fit1, type=c("response"))
eflscr$prob1 = prob1
library(pROC)
g1 <- roc(GB160M3~prob1, data=eflscr, plot=TRUE, grid=TRUE, print.auc=TRUE)
The ROC curves plotted look like this (see link below)
The x-axis scale does not fill the who chart.
How can I change the x axis to report 1 - specifically?
By default pROC sets asp = 1 to ensure the plot is square and both sensitivity and specificity are on the same scale. You can set it to NA or NULL to free the axis and fill the chart, but your ROC curve will be misshaped.
plot(g1, asp = NA)
Using par(pty="s") as suggested by Joe is probably a better approach
This is purely a labeling problem: note that the x axis goes decreasing from 1 to 0, which is exactly the same as plotting 1-specificity on an increasing axis. You can set the legacy.axes argument to TRUE to change the behavior if the default one bothers you.
plot(g1, legacy.axes = TRUE)
A good shortcut to getting a square plot is to run the following before plotting:
par(pty="s")
This forces the shape of the plot region to be square. Set the plotting region back to maximal by simply resetting the graphics device and clearing the plot.
dev.off()
As pointed out by #Calimo, there is the legacy.axes argument to reverse the x-axis and the label is also changed automatically. You can run ?plot.roc to see all the pROC plotting options.
Example
# Get ROC object
data(aSAH)
roc1 <- roc(aSAH$outcome, aSAH$s100b)
# Plot
par(pty="s")
plot(roc1, grid = TRUE, legacy.axes = TRUE)
# Reset graphics device and clear plot
dev.off()
Related
I'm doing stochastic dominance analysis with diferent income distributions using Pen's Parade. I can plot a single Pen's Parade using Pen function from ineq package, but I need a visual comparison and I want multiple lines in the same image. I don't know how extract values from the function, so I can't do this.
I have the following reproducible example:
set.seed(123)
x <- rnorm(100)
y <- rnorm(100, mean = 0.2)
library(ineq)
Pen(x)
Pen(y)
I obtain the following plots:
I want obtain sometime as the following:
You can use add = TRUE:
set.seed(123)
x <- rnorm(100)
y <- rnorm(100, mean = 0.2)
library(ineq)
Pen(x); Pen(y, add = TRUE)
From help("Pen"):
add logical. Should the plot be added to an existing plot?
While the solution mentioned by M-M in the comments is a more general solution, in this specific case it produces a busy Y axis:
Pen(x)
par(new = TRUE)
Pen(y)
I would generalize the advice for plotting functions in this way:
Check the plotting function's help file. If it has an add argument, use that.
Otherwise, use the par(new = TRUE) technique
Update
As M-M helpfully mentions in the comments, their more general solution will not produce a busy Y axis if you manually suppress the Y axis on the second plot:
Pen(x)
par(new = TRUE)
Pen(y, yaxt = "n")
Looking at ?ineq::Pen() it seems to work like plot(); therefore, followings work for you.
Pen(x)
Pen(y, add=T)
Note: However, add=T cuts out part of your data since second plot has points which fall out of the limit of the first.
Update on using par(new=T):
Using par(new=T) basically means overlaying two plots on top of each other; hence, it is important to make them with the same scale. We can achieve that by setting the same axis limits. That said, while using add=T argument it is desired to set limits of the axis to not loose any part of data. This is the best practice for overlaying two plots.
Pen(x, ylim=c(0,38), xlim=c(0,1))
par(new=T)
Pen(y, col="red", ylim=c(0,38), xlim=c(0,1), yaxt='n', xaxt='n')
Essentially, you can do the same with add=T.
I have to plot a few different simple linear models on a chart, the main point being to comment on them. I have no data for the models. I can't get R to create a plot with appropriate axes, i.e. I can't get the range of the axes correct. I think I'd like my y-axis to 0-400 and x to be 0-50.
Models are:
$$
\widehat y=108+0.20x_1
$$$$
\widehat y=101+2.15x_1
$$$$
\widehat y=132+0.20x_1
$$$$
\widehat y=119+8.15x_1
$$
I know I could possibly do this much more easily in a different software or create a dataset from the model and estimate and plot the model from that but I'd love to know if there is a better way in R.
As #Glen_b noticed, type = "n" in plot produces a plot with nothing on it. As it demands data, you have to provide anything as x - it can be NA, or some data. If you provide actual data, the plot function will figure out the plot margins from the data, otherwise you have to choose the margins by hand using xlim and ylim arguments. Next, you use abline that has parameters a and b for intercept and slope (or h and v if you want just a horizontal or vertical line).
plot(x=NA, type="n", ylim=c(100, 250), xlim=c(0, 50),
xlab=expression(x[1]), ylab=expression(hat(y)))
abline(a=108, b=0.2, col="red")
abline(a=101, b=2.15, col="green")
abline(a=132, b=0.2, col="blue")
abline(a=119, b=8.15, col="orange")
I am facing a probably pretty easy-to-solve issue: adding a log- curve to a scatter plot.
I have already created the corresponding model and now only need to add the respective curve/line.
The current model is as follows:
### DATA
SpStats_urbanform <- c (0.3702534,0.457769,0.3069843,0.3468263,0.420108,0.2548158,0.347664,0.4318018,0.3745645,0.3724192,0.4685135,0.2505839,0.1830535,0.3409849,0.1883303,0.4789871,0.3979671)
co2 <- c (6.263937,7.729964,8.39634,8.12979,6.397212,64.755192,7.330138,7.729964,11.058834,7.463414,7.196863,93.377393,27.854284,9.081405,73.483949,12.850917,12.74407)
### Plot initial plot
plot (log10 (1) ~ log10 (1), col = "white", xlab = "PUSHc values",
ylab = "Corrected GHG emissions [t/cap]", xlim =c(0,xaxes),
ylim =c(0,yaxes), axes =F)
axis(1, at=seq(0.05, xaxes, by=0.05), cex.axis=1.1)
axis(2, at=seq(0, yaxes, by=1), cex.axis=1.1 )
### FIT
fit_co2_urbanform <- lm (log10(co2) ~ log10(SpStats_urbanform))
### Add data points (used points() instead of simple plot() bc. of other code parts)
points (co2_cap~SpStats_urbanform, axes = F, cex =1.3)
Now, I've already all the fit_parameters and are still not able to construct the respective fit-curve for co2_cap (y-axis)~ SpStats_urbanform (x-axis)
Can anyone help me finalizing this little piece of code ?
First, if you want to plot in a log-log space, you have to specify it with argument log="xy":
plot (co2~SpStats_urbanform, log="xy")
Then if you want to add your regression line, then use abline:
abline(fit_co2_urbanform)
Edit: If you don't want to plot in a log-log scale then you'll have to translate your equation log10(y)=a*log10(x)+b into y=10^(a*log10(x)+b) and plot it with curve:
f <- coefficients(fit_co2_urbanform)
curve(10^(f[1]+f[2]*log10(x)),ylim=c(0,100))
points(SpStats_urbanform,co2)
So I have a barplot in which the y axis is the log (frequencies). From just eyeing it, it appears that bars decrease exponentially, but I would like to know this for sure. What I want to do is also plot an exponential on this same graph. Thus, if my bars fall below the exponential, I would know that my bars to decrease either exponentially or faster than exponential, and if the bars lie on top of the exponential, I would know that they dont decrease exponentially. How do I plot an exponential on a bar graph?
Here is my graph if that helps:
If you're trying to fit density of an exponential function, you should probably plot density histogram (not frequency). See this question on how to plot distributions in R.
This is how I would do it.
x.gen <- rexp(1000, rate = 3)
hist(x.gen, prob = TRUE)
library(MASS)
x.est <- fitdistr(x.gen, "exponential")$estimate
curve(dexp(x, rate = x.est), add = TRUE, col = "red", lwd = 2)
One way of visually inspecting if two distributions are the same is with a Quantile-Quantile plot, or Q-Q plot for short. Typically this is done when inspecting if a distribution follows standard normal.
The basic idea is to plot your data, against some theoretical quantiles, and if it matches that distribution, you will see a straight line. For example:
x <- qnorm(seq(0,1,l=1002)) # Theoretical normal quantiles
x <- x[-c(1, length(x))] # Drop ends because they are -Inf and Inf
y <- rnorm(1000) # Actual data. 1000 points drawn from a normal distribution
l.1 <- lm(sort(y)~sort(x))
qqplot(x, y, xlab="Theoretical Quantiles", ylab="Actual Quantiles")
abline(coef(l.1)[1], coef(l.1)[2])
Under perfect conditions you should see a straight line when plotting the theoretical quantiles against your data. So you can do the same plotting your data against the exponential function you think it will follow.
I am trying to plot the inverse of a survival function, as the data I'm is actually an increase in proportion of an event over time. I can produce Kaplan-Meier survival plots, but I want to produce the 'opposite' of these. I can kind of get what I want using the following fun="cloglog":
plot(survfit(Surv(Days_until_workers,Workers)~Queen_Number+Treatment,data=xdata),
fun="cloglog", lty=c(1:4), lwd=2, ylab="Colonies with Workers",
xlab="Days", las=1, font.lab=2, bty="n")
But I don't understand quite what this has done to the time (i.e. doesn't start at 0 and distance decreases?), and why the survival lines extend above the y axis.
Would really appreciate some help with this!
Cheers
Use fun="event" to get the desired output
fit <- survfit(Surv(time, status) ~ x, data = aml)
par(mfrow=1:2, las=1)
plot(fit, col=2:3)
plot(fit, col=2:3, fun="event")
The reason for fun="cloglog" screwing up the axes is that it does not plot a fraction at all. It is instead plotting this according to ?plot.survfit:
"cloglog" creates a complimentary log-log survival plot (f(y) = log(-log(y)) along with log scale for the x-axis)
Moreover, the fun argument is not limited to predefined functions like "event" or "cloglog", so you can easily give it your own custom function.
plot(fit, col=2:3, fun=function(y) 3*sqrt(1-y))