R - logistic curve plot with aggregate points - r

Let's say I have the following dataset
bodysize=rnorm(20,30,2)
bodysize=sort(bodysize)
survive=c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat=as.data.frame(cbind(bodysize,survive))
I'm aware that the plot function for glm objects offers several nice plots to show you the fit,
but I'd nevertheless like to create an initial plot with:
1) raw data points
2) the logistic curve
3) predicted points
4) aggregate points for a number of predictor levels
library(Hmisc)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
All fine up to here.
Now I want to plot the observed survival rates for given levels of the predictor (body size):
dat$bd<-cut2(dat$bodysize,g=5,levels.mean=T)
AggBd<-aggregate(dat$survive,by=list(dat$bd),data=dat,FUN=mean)
plot(AggBd,add=TRUE)
#Doesn't work
I've tried to match AggBd to the dataset used for the model, and all sorts of other things, but I simply can't plot the two together. Is there a way around this?
I basically want to superimpose the last plot on the same axes.
Besides this specific task, I often wonder how to superimpose plots of different variables that share a similar scale/range on a two-dimensional plot. I would really appreciate your help.

The first column of AggBd is a factor, so you need to convert its levels to numeric before you can add the points to the plot:
AggBd$size <- as.numeric (levels (AggBd$Group.1))[AggBd$Group.1]
To add the points to the existing plot, use points():
points (AggBd$size, AggBd$x, pch = 3)
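Putting the pieces together, here is a minimal end-to-end sketch of the overlay. It is an illustration under assumptions rather than the exact code from the question: it uses base R `cut()` with quantile breaks as a stand-in for `Hmisc::cut2(..., g=5)`, and the names `breaks`, `bin` and `agg` are made up for the example.

```r
# Data in the same shape as the question (body sizes are random)
set.seed(42)
bodysize <- sort(rnorm(20, 30, 2))
survive <- c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat <- data.frame(bodysize, survive)

g <- glm(survive ~ bodysize, family = binomial, data = dat)

# Five quantile-based bins (base R stand-in for cut2(..., g = 5))
breaks <- quantile(dat$bodysize, probs = seq(0, 1, by = 0.2))
dat$bin <- cut(dat$bodysize, breaks = breaks, include.lowest = TRUE)

# Numeric x (mean body size) and y (observed survival rate) for each bin
agg <- data.frame(
  size = as.numeric(tapply(dat$bodysize, dat$bin, mean)),
  rate = as.numeric(tapply(dat$survive, dat$bin, mean))
)

# Raw data, fitted curve, fitted points, then the aggregate points on the same axes
plot(survive ~ bodysize, data = dat,
     xlab = "Body size", ylab = "Probability of survival")
curve(predict(g, data.frame(bodysize = x), type = "response"), add = TRUE)
points(dat$bodysize, fitted(g), pch = 20)
points(agg$size, agg$rate, pch = 3, col = "red")
```

Because `points()` draws into the coordinate system set up by the first `plot()` call, the only requirement is that both coordinates are numeric and on the same scale as the existing axes.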

It is best to specify the axis limits explicitly. You could also use par(new=TRUE):
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
#then
par(new=TRUE)
#
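plot(as.numeric(as.character(AggBd$Group.1)),AggBd$x,pch=3,xlim=range(bodysize),ylim=c(0,1))
Remove or change the axis ticks and labels to prevent overlap, e.g.
plot(as.numeric(as.character(AggBd$Group.1)),AggBd$x,pch=3,xlim=range(bodysize),ylim=c(0,1),xaxt="n",yaxt="n",xlab="",ylab="")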

Related

ggplot2 Line graphs in R: Plotting dependent variable on y axis

I am trying to plot the vertical concentration profile of a pollutant. By convention, altitude is plotted on the vertical axis, and concentration is on the x (even though altitude is the independent variable). When plotting the concentrations for pollutants that do not fit a one-to-one function, R connects the points in a most annoying zig-zag pattern, instead of connecting them in order by altitude.
I tried changing the concentration values to factors, with levels based on altitude values:
concSummary$value <- factor(concSummary$value, levels =
concSummary$value[order(concSummary$altitude)])
But this didn't seem to work.
Does anyone know how to get around this problem?
Update: Someone posted a useful solution here: controlling order of points in ggplot2 in R?
Using geom_path() instead of geom_line() tells ggplot2 to connect the points in the order in which they appear in the data frame, rather than in order of the x variable. This happened to work for me because the data were already ordered by altitude.
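A minimal sketch of the geom_path() approach, using made-up numbers. The column names `altitude` and `value` follow the question, but the data are hypothetical, and the rows are sorted explicitly so the result does not depend on how the data happened to arrive. The plotting step only runs if ggplot2 is installed.

```r
# Hypothetical profile: concentration is not a one-to-one function of altitude
concSummary <- data.frame(
  altitude = c(500, 0, 1500, 1000, 2000),
  value    = c(12, 8, 9, 15, 4)
)

# Put the rows in altitude order; geom_path() connects points in row order
concSummary <- concSummary[order(concSummary$altitude), ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(concSummary, aes(x = value, y = altitude)) +
    geom_path() +   # joins points in data order, i.e. by altitude
    geom_point()
  print(p)
}
```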

How to deal with all data as non-outliers for boxplot in R?

I am new to R and have to use the boxplot function to plot my data.
When I use it, boxplot automatically treats some points as outliers.
But in my case, none of the points are outliers. I just want to show the min/max, the 25th/75th percentiles and the median. I've searched the boxplot documentation and haven't found an option that treats every point as a non-outlier.
Is there any way to do what I want?
You should try using range=0. For example:
x <- rlnorm(1000)
boxplot(x, range = 0)
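To check what range = 0 changes without drawing anything, you can compare the statistics that boxplot() returns when called with plot = FALSE. This is a small sketch; the names `b_default` and `b_range0` are just for illustration.

```r
set.seed(1)
x <- rlnorm(1000)

# plot = FALSE returns the box statistics instead of drawing
b_default <- boxplot(x, plot = FALSE)
b_range0  <- boxplot(x, range = 0, plot = FALSE)

length(b_default$out)   # number of points flagged as outliers by default
length(b_range0$out)    # no points are flagged when range = 0

# With range = 0 the whiskers extend to the data extremes
b_range0$stats[c(1, 5)]
range(x)
```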

Creating a Bland-Altman plot for data in two columns in data frame

I have a data frame data_2 and wish to create a Bland-Altman plot to compare the differences between the data in the columns alog1 vs. dig1.
Please help with the function for this and how to execute this. Would the function be barplot()?
Thanks for your time.
Another name for a Bland-Altman plot is a Tukey mean-difference plot. (I have nothing against Bland and Altman, but I think 'mean-difference' is more descriptive.) Note that this is different from a boxplot (compare the pictures on the two Wikipedia pages). The mean-difference plot is simply a regular scatterplot, except that instead of plotting x versus y, you plot the difference x-y against the mean of x and y (or in your case, alog1 and dig1). Probably the easiest way to make this is to form these two new variables first, and then plot them as you would any other scatterplot. Here is some sample code:
mn <- (data_2$alog1 + data_2$dig1)/2
dif <- data_2$alog1 - data_2$dig1
plot(mn, dif)
If you wanted to add arguments to customize your plot, you could do that just as you normally would, for example:
plot(mn, dif, main="Bland-Altman plot", xlab="mean of alog1 & dig1",
ylab="difference between alog1 & dig1")
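Bland-Altman plots are conventionally drawn with a horizontal line at the mean difference and dashed lines at the mean plus or minus 1.96 standard deviations (the "limits of agreement"). The sketch below adds those lines. The data frame is simulated because the real data_2 is not shown, and the names `bias` and `loa` are illustrative.

```r
# Simulated stand-in for data_2 (the real data are not available here)
set.seed(1)
data_2 <- data.frame(alog1 = rnorm(50, mean = 100, sd = 10))
data_2$dig1 <- data_2$alog1 + rnorm(50, mean = 1, sd = 3)

mn  <- (data_2$alog1 + data_2$dig1) / 2
dif <- data_2$alog1 - data_2$dig1

bias <- mean(dif)                          # mean difference
loa  <- bias + c(-1.96, 1.96) * sd(dif)    # limits of agreement

# Make sure the limits fit inside the plotting region
plot(mn, dif, ylim = range(dif, loa), main = "Bland-Altman plot",
     xlab = "mean of alog1 & dig1", ylab = "difference between alog1 & dig1")
abline(h = bias)
abline(h = loa, lty = 2)
```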

How to plot density plots with proportions on the y-axis?

I am using the sm package in R to draw a density plot of several variables with different sample sizes, like this:
var1 <- density(vars1[,1])
var2 <- density(vars2[,1])
var3 <- density(vars3[,1])
pdf(file="density.pdf",width=8.5,height=8)
plot(var1,col="BLUE")
par(new=T)
plot(var2,axes=FALSE,col="RED")
par(new=T)
plot(var3,axes=FALSE,col="GREEN")
dev.off()
The problem I'm having, is that I want the y-axis to show the proportions so I can compare the different variables with each other in a more meaningful way. The maxima of all three density plots are now exactly the same, and I'm pretty sure that they wouldn't be if the y-axis showed proportions. Any suggestions? Many thanks!
Edit:
I just learned that I should not plot on top of an existing plot, so now the plotting part of the code looks like this:
pdf(file="density.pdf",width=8.5,height=8)
plot(var1,col="BLUE")
lines(var2,col="RED")
lines(var3,col="GREEN")
dev.off()
The maxima of those lines however are now very much in line with the sample size differences. Is there a way to put the proportions on the y-axis for all three variables, so the area under the curve is equal for all three variables? Many thanks!
Don't plot on top of an existing plot, because the axes may be different. Instead, use lines() to plot the second and third densities after plotting the first. If necessary, adjust the ylim parameter in plot() so that they all fit.
An example for how sample size ought not matter:
set.seed(1)
D1 <- density(rnorm(1000))
D2 <- density(rnorm(10000))
D3 <- density(rnorm(100000))
plot(D1$x,D1$y,type='l',col="red",ylim=c(0,.45))
lines(D2$x,D2$y,lty=2,col="blue")
lines(D3$x,D3$y,lty=3,col="green")
You could make tim's solution a little more flexible by not hard-coding in the limits.
plot(D1$x,D1$y,type='l',col="red",ylim=c(0, max(sapply(list(D1, D2, D3),
function(x) {max(x$y)}))))
This would also cater for Vincent's point that the density functions are not necessarily constrained in their range.
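To confirm that the curves returned by density() already enclose the same area regardless of sample size, you can integrate each one numerically. This is a rough sketch using the trapezoid rule; `trapz_area` is a helper defined here for illustration, not a base R function.

```r
set.seed(1)
samples <- list(rnorm(1000), rnorm(10000), rnorm(100000))
dens <- lapply(samples, density)

# Trapezoid-rule area under a density estimate
trapz_area <- function(d) {
  sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
}

area <- sapply(dens, trapz_area)
area   # each value is close to 1, whatever the sample size
```

If the areas are all close to 1, the y-axis is already on a comparable scale, and differences in peak height reflect differences in spread rather than in sample size.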

Plotting More than 2 Factors

Suppose I ran a factor analysis and got 5 relevant factors. Now I want to graphically represent the loadings of these factors on the variables. Can anybody please tell me how to do it? I can do it with 2 factors, but not when the number of factors is more than 2.
The two-factor plot is given in "Modern Applied Statistics with S", Fig. 11.13, where the x and y axes are the two factors. I want to create a similar graph but with more than 2 factors.
Regards,
Ari
Beware: this is not the answer you are looking for and it might also be incorrect; it is my subjective opinion.
I think you have run into the problem of sketching several dimensions on a two-dimensional screen/paper.
I would say there is no sense in plotting more factors' or PCs' loadings, but if you really insist: display the first two (based on eigenvalues) or create only 2 factors. Or you could reduce the dimension by other methods (e.g. MDS).
Displaying 3 factors' loadings in a 3-dimensional graph would hardly be clear, let alone more factors.
UPDATE: I had a dream about trying to be more on topic :)
You could easily show projections of each pair of factors, as @joran pointed out (I am not dealing with rotation here):
f <- factanal(mtcars, factors=3)
pairs(f$loadings)
This way you could show even more factors and be able to tweak the plot also, e.g.:
f <- factanal(mtcars, factors=5)
pairs(f$loadings, col=1:ncol(mtcars), upper.panel=NULL, main="Factor loadings")
par(xpd=TRUE)
legend('topright', bty='n', pch='o', col=1:ncol(mtcars), attr(f$loadings, 'dimnames')[[1]], title="Variables")
Of course, you could also add rotation vectors by customizing the lower triangle, or show them in the upper one and attach the legend on the right/below, etc.
Or just plot the variables on a 3D scatterplot if you have no more than 3 factors:
library(scatterplot3d)
f <- factanal(mtcars, factors=3)
scatterplot3d(as.data.frame(unclass(f$loadings)), main="3D factor loadings", color=1:ncol(mtcars), pch=20)
Note: in my humble opinion, variable names should not be put on the plots as labels but should go in a separate legend, especially with 3D plots.
It looks like there's a package for this:
http://factominer.free.fr/advanced-methods/multiple-factor-analysis.html
Comes with sample code, and multiple factors. Load the FactoMineR package and play around.
Good overview here:
http://factominer.free.fr/docs/article_FactoMineR.pdf
You can also look at the factor analysis object and see if you can't extract the values and plot them manually using ggplot2 or base graphics.
As daroczig mentions, each set of factor loadings gets its own dimension. So plotting in five dimensions is not only difficult, but often inadvisable.
You can, though, use a scatterplot matrix to display each pair of factor loadings. Using the example you cite from Venables & Ripley:
#Reproducing factor analysis from Venables & Ripley
#Note I'm only doing three factors, not five
library(MASS) #for eqscplot()
data(ability.cov)
ability.FA <- factanal(covmat = ability.cov,factors = 3, rotation = "promax")
load <- loadings(ability.FA)
rot <- ability.FA$rotmat
#Pairs of factor loadings to plot
ind <- combn(1:3,2)
par(mfrow = c(2,2))
nms <- row.names(load)
#Loop over pairs of factors and draw each plot
for (i in 1:3){
eqscplot(load[,ind[1,i]],load[,ind[2,i]],xlim = c(-1,1),
ylim = c(-0.5,1.5),type = "n",
xlab = paste("Factor",as.character(ind[1,i])),
ylab = paste("Factor",as.character(ind[2,i])))
text(load[,ind[1,i]],load[,ind[2,i]],labels = nms)
arrows(c(0,0),c(0,0),rot[ind[,i],ind[,i]][,1],
rot[ind[,i],ind[,i]][,2],length = 0.1)
}
For me, this resulted in one panel for each pair of factors.
Note that I had to play a little with the x and y limits, as well as various other fiddly bits. Your data will be different and will require different adjustments. Also, plotting each pair of factor loadings with five factors will make for a rather busy collection of scatterplots.
