R: plotting predicted probabilities with plot(effect()) - how to customise plot appearance?

I would like to plot the predicted probabilities of Y (binary outcome) over the range of observed x values (x=age). I use the following code to produce the plot:
(1) I calculate the predicted probabilities of Y over a specified range of x-values (xlevels = x.list) for my independent variable (age), and save it in an object.
prob <- effect(c("age"),M, xlevels = x.list)
(2) Then I plot that object, customising certain plot appearances (such as axis labels, color of confidence intervals, etcetera).
plot(prob,
xlab="x",
ylab="Pred. prob.",
confint=list(col="red", alpha=0.3),
lines=list(col="red"),
rug=FALSE, main="")
This produces a plot that looks almost like the one I want. However, when I try to customise the main title, the x- and y-axis limits, and the axis ticks, the plot is still produced but is messed up (the y-axis does not range from 0 to 1, and the line with its confidence band gets pushed outside the plot's margins).
plot(prob,
xlab="x",
ylab="Pred. prob.",
confint=list(col="red", alpha=0.3),
lines=list(col="red"),
rug=TRUE,
axes=list(y=lim={c(0, 1, 0.1)})))
In particular, I would like to change the y-axis so that it ...
(a) ranges from 0-1
(b) has ticks in increments of 0.1.
(c) I would also like to get rid of the box around the plot, i.e. have only the x and y axes drawn.
I have been trying to read up on ?plot.eff, but unfortunately cannot get the legacy arguments to work. Any input on how the code should be modified to get it to work would be much appreciated.

axes=list(y=lim={c(0, 1, 0.1)})) looks very strange; it does not look like valid R syntax. Have you tried just ylim=c(0,1)?
Also set main="".
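If the effects plot keeps resisting, one alternative is to draw the same picture with base graphics, where the y-axis range, the tick positions and the box are all under direct control. The sketch below is not the effects API: it swaps in predict() on a glm and base plot(). The data, the model M and the column names are simulated stand-ins, not the asker's. (Recent versions of effects appear to expect nested lists for the axes argument, something like axes=list(y=list(lim=..., ticks=list(at=...))); check ?plot.eff for your installed version.)

```r
# Simulated data standing in for the asker's model M (all names are hypothetical)
set.seed(1)
dat <- data.frame(age = round(runif(200, 18, 80)))
dat$y <- rbinom(200, 1, plogis(-3 + 0.05 * dat$age))
M <- glm(y ~ age, family = binomial, data = dat)

# Predict on the link scale, then back-transform so the band stays inside [0, 1]
newd <- data.frame(age = seq(min(dat$age), max(dat$age), length.out = 100))
pr <- predict(M, newdata = newd, type = "link", se.fit = TRUE)
newd$fit <- plogis(pr$fit)
newd$lwr <- plogis(pr$fit - 1.96 * pr$se.fit)
newd$upr <- plogis(pr$fit + 1.96 * pr$se.fit)

# Empty plot with fixed y limits; axes = FALSE drops both the default axes and the box
plot(newd$age, newd$fit, type = "n", ylim = c(0, 1), axes = FALSE,
     xlab = "x", ylab = "Pred. prob.", main = "")
polygon(c(newd$age, rev(newd$age)), c(newd$lwr, rev(newd$upr)),
        col = adjustcolor("red", alpha.f = 0.3), border = NA)
lines(newd$age, newd$fit, col = "red")
axis(1)                                  # x-axis only
axis(2, at = seq(0, 1, 0.1), las = 1)    # y-axis ticks every 0.1
```

Computing the interval on the link scale and back-transforming with plogis() is a design choice here; it guarantees that the band cannot leave the 0 to 1 range.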

Related

plot function in R with log scale parameters shows negative values

I have a table with 2 columns: time and distance. Both are > 0 (in minutes and meters, respectively). When I do:
plot(dist, time, main="Distance vs Time", xlab="Distance (meters)", ylab="Time (min)")
I get the following plot:
It is not very readable, so I use a log scale instead:
plot(log(dist), log(time), main="Distance vs Time",
xlab="Distance (meters), log scale", ylab="Time (min), log scale")
And I get the following plot:
My question is: why does the plot show negative values as well? None of my values are less than 0.
You might prefer
plot(dist, time, log="xy", ...)
The reason you are getting negative values in the plot is that you have explicitly taken the logarithm of your data. Values less than 1 will be transformed to negative values - that's just the way the math works ... using log="xy" instead will plot the points in the same locations, but will change the scales so that they show the original values.
set.seed(101)
x <- rlnorm(10)
y <- rlnorm(10)
par(mfrow=c(2,2),las=1,bty="l")
Plot on original scale:
plot(x,y)
Plot logged data, labeled by log values (which will be negative when the original values are <1):
plot(log(x),log(y))
Plot logged data, labeled by original values:
plot(x,y,log="xy")
Recreate the same plot (almost) from scratch by specifying the axis label ticks at the log positions but using the original values as labels:
plot(log(x),log(y),axes=FALSE)
brkpos <- c(0.2,0.5,1.0,2,3)
axis(side=1,at=log(brkpos),labels=brkpos)
axis(side=2,at=log(brkpos),labels=brkpos)
box()
(I should have used axis labels "x" and "y" in this last subplot rather than "log(x)" and "log(y)" ...)

R: why is boxplot(x,log="y") different from boxplot(log(x))?

delme <- exp(rnorm(1000,1.5,0.3))
boxplot(delme,log="y")
boxplot(log10(delme))
Why are the whiskers different in these 2 plots?
Thanks
Agus
I would say that in your first plot you only changed the y-axis to a log scale, so the values you plot still range between roughly 1 and 10. The whiskers look different on this axis because the space between ticks (i.e. axis breaks) is not constant (there is more space between 2 and 4 than between 8 and 10).
In the second plot, you take the log of the values and then plot them, so they range from about 0.2 to 1 and are drawn on a linear y-axis.
Compare the summaries of the original and the log-transformed data.
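The comparison suggested above can be made concrete with boxplot.stats(), which returns the five numbers that boxplot() draws (lower whisker, lower hinge, median, upper hinge, upper whisker). With log="y" those numbers are computed on the raw values and only then drawn on a log axis, whereas boxplot(log10(x)) computes them on the logged values, so the 1.5 x IQR rule picks different whisker ends. A minimal sketch, with a seed added for reproducibility (the seed is an assumption, not part of the question):

```r
set.seed(42)
delme <- exp(rnorm(1000, 1.5, 0.3))

# Five-number summary behind boxplot(delme, log = "y"): computed on the raw data
raw_stats <- boxplot.stats(delme)$stats

# Summary behind boxplot(log10(delme)), back-transformed so both rows are comparable
log_stats <- 10^boxplot.stats(log10(delme))$stats

print(round(rbind(raw = raw_stats, logged = log_stats), 2))
```

The hinges and median should nearly agree between the two rows, while the whisker ends (first and last columns) should not.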

R Polygon Plot Not Shading to X Axis

Using R and polygon I'm trying to shade the area under the line of a plot from the line to the x-axis and I'm not sure what I am doing wrong here.
The shading is using some point in the middle of the y range to shade from, not 0, the x-axis.
The data set ratioresults is a zoo object but I don't think that's the issue since I tried coercing the y values to as.numeric and as.vector and got the same results.
Code:
plot(index(ratioresults),ratioresults$ratio, type="o", col="red")
polygon(c(1,index(ratioresults),11),c(0, ratioresults$ratio, 0) , col='red')
What's index(ratioresults)? For a simple zoo object I see:
> index(x)
[1] "2003-02-01" "2003-02-03" "2003-02-07" "2003-02-09" "2003-02-14"
which is a vector of Date objects. You are trying to prepend/append values of 1 and 11 to this vector. It's not going to work.
Here's a reproducible example:
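library(zoo)  # provides zoo() and index()
x=zoo(matrix(runif(11),ncol=1),as.Date("2012-08-01") + 0:10)
colnames(x)="ratio"
plot(index(x),x$ratio,type="o",col="red",ylim=c(0,1))
# close the polygon at y=0 below the first and last x-values
polygon(index(x)[c(1,1:11,11)],c(0,x$ratio,0),col="red")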
Differences from yours:
I call my thing x.
I set ylim on the plot - I don't know how your plot managed to start at 0 on the Y axis.
I complete the polygon using the x-values of the first and 11th (last) point, rather than 1 and 11 themselves.
#With an example dataset: please provide one when you need help!
ratioresults<-as.zoo(runif(10,0,1))
plot(index(ratioresults),ratioresults, type="o", col="red",
xaxs="i",yaxs="i", ylim=c(0,2))
polygon(c(index(ratioresults),rev(index(ratioresults))),
c(as.vector(ratioresults),rep(0,length(ratioresults))),col="red")
The issue with your question is that, by default, the x-axis is not a line defined by a given y value. One way to fill under a curve down to the x-axis using polygon is therefore to define a y value for the x-axis using ylim (here I chose 0). Whatever value you choose, you will want to specify that the plot stops exactly at that value using yaxs="i".
You also have to construct your polygon with the value you chose for your x-axis.
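The recipe from both answers can be condensed into a small helper. This is only a sketch: area_polygon is a hypothetical name, and plain numeric x values stand in for the zoo index so that no extra package is needed.

```r
# Hypothetical helper: vertices of the region between a curve and a horizontal baseline.
# The first and last x-values are repeated so the polygon closes on the baseline.
area_polygon <- function(x, y, baseline = 0) {
  list(x = c(x[1], x, x[length(x)]),
       y = c(baseline, y, baseline))
}

set.seed(1)
x <- 1:10
y <- runif(10)
p <- area_polygon(x, y)

# ylim starts at the baseline and yaxs = "i" makes the axis sit exactly on it
plot(x, y, type = "o", col = "red", ylim = c(0, 1), yaxs = "i")
polygon(p$x, p$y, col = "red")
```

Because the helper reuses x[1] and x[length(x)] rather than literal numbers, it should work the same way when x is a vector of dates.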

Axis-labeling in R histogram and density plots; multiple overlays of density plots

I have two related problems.
Problem 1: I'm currently using the code below to generate a histogram overlaid with a density plot:
hist(x,prob=T,col="gray")
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(density(x))
I've pasted the data (i.e. x above) here.
I have two issues with the code as it stands:
the last tick and label (100) of the x-axis does not appear on the histogram/plot. How can I put these on?
I'd like the y-axis to be of count or frequency rather than density, but I'd like to retain the density plot as an overlay on the histogram. How can I do this?
Problem 2: using a similar solution to problem 1, I now want to overlay three density plots (not histograms), again with frequency on the y-axis instead of density. The three data sets are at:
http://pastebin.com/z5X7yTLS
http://pastebin.com/Qg8mHg6D
http://pastebin.com/aqfC42fL
Here's your first 2 questions:
myhist <- hist(x,prob=FALSE,col="gray",xlim=c(0,100))
dens <- density(x)
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(dens$x,dens$y*(1/sum(myhist$density))*length(x))
The histogram has a bin width of 5, which is also equal to 1/sum(myhist$density), because the bar densities multiplied by the bin width sum to 1. The density(x)$x values, by contrast, advance in small steps, around 0.2 in your case (512 evenly spaced points). sum(density(x)$y) is therefore definitely not 1; it is approximately 1 only once the step size is taken into account: sum(density(x)$y)/(1/diff(density(x)$x)[1]). You don't need to make that correction when drawing the curve, because it is already plotted against its own finely spaced x values. So scale the density 1) by the bin width of hist() and 2) by the number of observations, length(x), as DWin says. The last axis tick became visible after setting the xlim argument.
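The scaling argument can be checked on simulated data (the asker's pasted data is not reproduced here, so x below is an assumed stand-in). The sketch draws a count histogram with bins of width 5 and overlays the density multiplied by bin width and sample size:

```r
set.seed(1)
x <- rbeta(300, 2, 3) * 100   # simulated stand-in for the asker's data, always in (0, 100)

# Equal-width breaks, so hist() puts counts on the y-axis by default
h <- hist(x, breaks = seq(0, 100, 5), col = "gray", xlim = c(0, 100),
          main = "", xlab = "x")
bin_width <- diff(h$breaks)[1]
dens <- density(x)

# density * bin width * n is the expected count in a bin of that width
lines(dens$x, dens$y * bin_width * length(x))

# Checks of the two claims: 1/sum(density) equals the bin width,
# and the density integrates to roughly 1 over its own x grid
print(all.equal(bin_width, 1 / sum(h$density)))
print(sum(dens$y) * diff(dens$x)[1])
```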
To do your problem 2, set up a plot with the correct dimensions (xlim and ylim), with type = "n", then draw 3 lines for the densities, scaled using something similar to the density line above. Think however about whether you want those semi continuous lines to reflect the heights of imaginary bars with bin width 5... You see how that might make the density lines exaggerate the counts at any particular point?
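A minimal sketch of that approach for problem 2, using three simulated samples in place of the pastebin data and an assumed reference bin width of 5 (both are assumptions for illustration only):

```r
set.seed(2)
# Three simulated samples standing in for the three data sets
samples <- list(s1 = rnorm(150, 30, 8),
                s2 = rnorm(200, 50, 10),
                s3 = rnorm(100, 70, 6))
bin_width <- 5   # assumed width of the imaginary bars the count scale refers to

# Scale each density to "expected count per bin of width bin_width"
scaled <- lapply(samples, function(s) {
  d <- density(s)
  list(x = d$x, y = d$y * bin_width * length(s))
})

# Plot limits that cover all three curves
xlim <- range(unlist(lapply(scaled, `[[`, "x")))
ylim <- c(0, max(unlist(lapply(scaled, `[[`, "y"))))

# Empty plot with the right dimensions, then one line per sample
plot(NA, xlim = xlim, ylim = ylim, type = "n", xlab = "x", ylab = "Frequency")
cols <- c("black", "red", "blue")
for (i in seq_along(scaled)) lines(scaled[[i]]$x, scaled[[i]]$y, col = cols[i])
```

Changing bin_width rescales all three curves by the same factor, which illustrates the caveat above: the count scale only has meaning relative to the chosen bar width.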
Although this is an old thread, in case anyone comes across it: whether it is a 'good idea' to forgo translating the y density to a count scale depends on what the user is attempting to do.
There are perfectly good reasons for using frequency as the y value. One idea in particular that comes to mind is that using counts for the y scale value can give an analyst a good idea about where to begin the 'data hunt' for stratifying heterogeneous data, if a mixed distribution model cannot soundly or intuitively be applied.
In practice, overlaying a density estimate over the observed histogram can be very useful in data quality checks. For example, in the above, if I were looking at the above graphic as a single source of data with the assumption that it describes "1 thing" and I wish to model this as "1 thing", I have an issue. That is, I have heterogeneous data which may require some level of stratification. The density overlay then becomes a simple visual tool for detecting heterogeneity (apart from using log transformations to smooth between-interval variation), and a direction (locations of the mixed distributions) for stratifying the data.

How to plot density of two datasets on same scale in one figure?

How to plot the density of a single column dataset as dots? For example
x <- c(1:40)
Using the same x-axis and y-axis scales, how do I add a line showing the density of a second data set, defined by the equation
y = exp(-x)
to the plot?
The equation is corrected to be y = exp(-x).
So, by doing plot(density(x)) or plot(density(y)), I get two separate figures. How do I put them on the same axes, using dots for x and a smoothed line for y?
You can add a line to a plot with the lines() function. Your code, modified to do what you asked for, is the following:
x <- 1:40
y <- exp(-x)
plot(density(x), type = "p")
lines(density(y))
Note that we specified the plot to give us points with the type parameter and then added the density curve for y with lines. The help pages for ?plot, ?par, ?lines would be some insightful reading. Also, check out the R Graph Gallery to view some more sophisticated graphs that generally have the source code attached to them.
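One caveat: plot() sets the axis limits from the first density only, so the second curve may fall partly or wholly outside the plotting region. A minimal sketch that computes shared limits from both densities first (the axis labels are illustrative choices):

```r
x <- 1:40
y <- exp(-x)

dx <- density(x)
dy <- density(y)

# Axis limits that cover both curves
xlim <- range(dx$x, dy$x)
ylim <- range(dx$y, dy$y)

# Points for the density of x, a line for the density of y, on shared axes
plot(dx$x, dx$y, type = "p", xlim = xlim, ylim = ylim,
     xlab = "value", ylab = "Density", main = "")
lines(dy$x, dy$y)
```

With these particular data the two curves may differ in height by orders of magnitude, because the values of y are packed tightly near 0, so one of them can look flat on a shared scale.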
