Plotting custom confidence intervals around curve fits in R

Say I have some relationships between x and some outcome variable, for three different groups, A,B and C:
x<-c(0:470)/1000
#3 groups, each has a different v-max parameter value.
v.A<-5
v.B<-4
v.C<-3
C<- (v.C*x)/(0.02+x)
B<- (v.B*x)/(0.02+x)
A<-(v.A*x)/(0.02+x)
d.curve<-data.frame(x,A,B,C)
The estimates of the v. parameter also have associated errors:
err.A<-0.24
err.B<-0.22
err.C<-0.29
I'd like to plot these curve fits, as well as shaded error regions around each curve, based on the uncertainty in the v. parameter. So, the shaded region would be +/- one error value. I can generate the plot of the 3 curves easily enough:
limx<-c(0,0.47)
limy<-c(0,5.5)
plot(A~x,data=d.curve,xlim=limx,ylim=limy,col=NA)
lines(smooth.spline(d.curve$x,d.curve$A),col='black',lwd=3)
par(new=T)
plot(B~x,data=d.curve,xlim=limx,ylim=limy,col=NA,ylab=NA,xlab=NA,axes=F)
lines(smooth.spline(d.curve$x,d.curve$B),col='black',lwd=3,lty=2)
par(new=T)
plot(C~x,data=d.curve,xlim=limx,ylim=limy,col=NA,ylab=NA,xlab=NA,axes=F)
lines(smooth.spline(d.curve$x,d.curve$C),col='black',lwd=3,lty=3)
But how can I add custom shaded regions around them, based on specified error terms?

You can add the following code to your current code. The calculation of the error of the line is based on the error (assumed standard error) of the coefficients. You can change the calculation for the error of the line to something else, if desired. The order of plotting might need to be changed to make the polygons appear behind the lines.
# calculating the standard error of the line based on the standard errors of v.A, v.B, v.C
# could substitute another calculation
se.line.A <- ((x)/(0.02+x))*err.A
se.line.B <- ((x)/(0.02+x))*err.B
se.line.C <- ((x)/(0.02+x))*err.C
# polygon() is in the graphics package, which R attaches by default
library(graphics)
# plotting polygons
# colors can be changed
# polygons will be drawn over the existing lines
# may change the order of plotting for the shaded regions to be behind line
polygon(c(x,rev(x))
,c(A+se.line.A,rev(A-se.line.A))
,col='gray'
,density=100)
polygon(c(x,rev(x))
,c(B+se.line.B,rev(B-se.line.B))
,col='blue'
,density=100)
polygon(c(x,rev(x))
,c(C+se.line.C,rev(C-se.line.C))
,col='green'
,density=100)
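As a minimal sketch of the reordering mentioned above, the shaded bands can be drawn first and the curves drawn on top of them. The named vectors v, err and fill, the constant k, and the use of adjustcolor() for semi-transparent fills are choices made here for illustration and are not part of the original code. The band is still +/- one error value on the v parameter.

```r
x    <- (0:470) / 1000
k    <- 0.02                                   # constant from the question's curves
v    <- c(A = 5, B = 4, C = 3)                 # v-max estimates
err  <- c(A = 0.24, B = 0.22, C = 0.29)        # errors on those estimates
fill <- c(A = "gray40", B = "blue", C = "darkgreen")  # illustrative colours

# Set up empty axes so nothing is drawn yet
plot(range(x), c(0, 5.5), type = "n", xlab = "x", ylab = "outcome")

# Bands first, so they end up behind the curves
for (g in names(v)) {
  upper <- (v[[g]] + err[[g]]) * x / (k + x)
  lower <- (v[[g]] - err[[g]]) * x / (k + x)
  polygon(c(x, rev(x)), c(upper, rev(lower)),
          col = adjustcolor(fill[[g]], alpha.f = 0.3), border = NA)
}

# Curves on top, one line type per group
for (i in seq_along(v)) {
  lines(x, v[[i]] * x / (k + x), lwd = 3, lty = i)
}
```

Because the curves are computed exactly, they are drawn directly with lines() here rather than through smooth.spline().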

Related

R: Could not get two graphs plot with the same starting X values

I'm plotting density probabilities against predicted probabilities with poisson distribution in R, this is the code I used:
dens=density(data)
plot(dens$x,dens$y,type="l",xlab="Value",
ylab="Count estimate",ylim=c(0,0.2),xlim=range(0:22),col=4,lwd=2)
lines(dpois(0:22,lambda=estimated_lambda), col=2,lwd=2)
And this is the result:
For some reason I couldn't get both lines to line up, even though both have the same x-axis range (as shown in the code above). It should work the same given that both sets of values are discrete, and if I plot the dpois values without using lines, the other line does not show up at all.
Any help is appreciated.
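One likely cause, offered as a sketch rather than a confirmed diagnosis: when lines() receives a single vector, it plots that vector against its index (1, 2, 3, ...), so the values of dpois(0:22, ...) land at x = 1 to 23 instead of 0 to 22. Passing the x values explicitly removes that one-unit shift. The simulated data and the use of mean() to obtain estimated_lambda are assumptions standing in for the asker's objects.

```r
set.seed(1)
data <- rpois(200, lambda = 6)        # hypothetical stand-in for the asker's data
estimated_lambda <- mean(data)        # assumed estimator of lambda
k <- 0:22

# With only y supplied, the x positions default to the index 1, 2, ..., 23
implied_x <- xy.coords(dpois(k, lambda = estimated_lambda))$x

dens <- density(data)
plot(dens$x, dens$y, type = "l", xlab = "Value", ylab = "Density",
     ylim = c(0, 0.2), xlim = range(k), col = 4, lwd = 2)
# Supplying k as x puts each probability at its own count value
lines(k, dpois(k, lambda = estimated_lambda), col = 2, lwd = 2)
```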

How to smooth non-linear regression curve in R

So I'm asked to obtain an estimate theta for the variable Length in the muscle data set from the MASS package. The code I used is shown below, as well as the resulting curve. Somehow, I don't end up with a smooth curve, but with a very "blocky" one, as well as some lines between points on the curve. Can anyone help me to get a smooth curve?
utils::data(muscle,package = "MASS")
Length.fit<-nls(Length~t1+t2*exp(-Conc/t3),muscle,
start=list(t1=3,t2=-3,t3=1))
plot(Length~Conc,data=muscle)
lines(muscle$Conc, predict(Length.fit))
Edit: as a follow-up question:
If I want to more accurately predict the curve, I use nonlinear regression to predict the curve for each of the 21 species. This gives me a vector
theta=(T11,T12,...,T21,T22,...,T3).
I can create a for-loop that plots all of the graphs, but like before, I end up with the blocky curve. However, seeing as I have to plot these curves as follows:
for(i in 1:21) {
# the for loop advances i itself, so no manual increment is needed
lines(muscle$Conc,theta[i]+theta[i+21]*
exp(-muscle$Conc/theta[43]), col=color[i])
}
I don't see how I can use the same trick to smooth out these curves, since muscle$Conc still only has 4 values.
Edit 2:
I figured it out, and changed it to the following:
lines(seq(0,4,0.1),theta[i]+theta[i+21]*exp(-seq(0,4,0.1)/theta[43]), col=color[i])

If you look at the output of cbind(muscle$Conc, predict(Length.fit)), you'll see that many points are repeated and that they're not sorted in order of Conc. lines just plots the points in order and connects the points, giving you multiple back-and-forth lines. The code below runs predict on a unique set of ordered values for Conc.
plot(Length ~ Conc,data=muscle)
lines(seq(0,4,0.1),
predict(Length.fit, newdata=data.frame(Conc=seq(0,4,0.1))))
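A variant of the same idea, as a sketch: instead of hard-coding seq(0,4,0.1), the grid can be taken from the observed range of Conc, and the predictions kept next to the grid in one data frame. The names grid and conc_unsorted and the choice of 200 grid points are illustrative assumptions. The fit is the one from the question.

```r
library(MASS)  # provides the muscle data set

Length.fit <- nls(Length ~ t1 + t2 * exp(-Conc / t3), muscle,
                  start = list(t1 = 3, t2 = -3, t3 = 1))

# TRUE if the observed concentrations are not in increasing order,
# which is what makes lines() zig-zag
conc_unsorted <- is.unsorted(muscle$Conc)

# Evenly spaced, sorted grid over the observed range of Conc
grid <- data.frame(Conc = seq(min(muscle$Conc), max(muscle$Conc),
                              length.out = 200))
grid$Length <- predict(Length.fit, newdata = grid)

plot(Length ~ Conc, data = muscle)
lines(grid$Conc, grid$Length)
```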

Labelling the residuals on diagnostic plots

I have made a linear regression model in R with 3 continuous independent variables and one continuous dependent variable. I have generated the diagnostic plots.
I would now like to label/colour the data points for each residual on my diagnostic plots according to the binary categorical independent variable that was not included in the model;
i.e. when this variable = A, I want a blue dot on my diagnostic plot,
and when this variable = B, I want a red dot, so there will be red and blue dots on my diagnostic plots.
I would love some advice on how to do this.

[You don't specify what diagnostic plots you're trying to do this to. You also haven't given a minimal reproducible example, which makes it difficult to alter what you were doing to do what you want.]
I'll give an example of the kind of command that does what you need and you may be able to adapt it to whatever displays you need.
library(MASS)
catsmdl <- lm(Hwt~Bwt,cats)
plot(residuals(catsmdl)~fitted(catsmdl), col=cats$Sex)
abline(h=0, col=8, lty=3)
which gives a plot of residuals against fitted values, with the points coloured by the levels of Sex.
This even works with plot.lm, because it has a ... argument to pass information along to the lower level plotting functions. So for example:
# par() returns the old values of the settings it changes, so only mfrow is restored
opar <- par(mfrow=c(2,2))
plot(catsmdl,col=c("blue","darkorange")[as.numeric(cats$Sex)])
par(opar)
If you replace c("blue","darkorange") with whatever colours you like, it should work. (There are a variety of ways to specify colours in R.)
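To make this closer to the setup in the question (three continuous predictors and a binary variable left out of the model), here is a minimal sketch using the built-in mtcars data. The choice of mtcars, the model formula, and the labels "A" and "B" for the am variable are assumptions made purely for illustration.

```r
# mtcars ships with R; am (0/1) is deliberately left out of the model
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Hypothetical group labels: am == 0 becomes "A", am == 1 becomes "B"
grp  <- factor(mtcars$am, labels = c("A", "B"))
cols <- unname(c(A = "blue", B = "red")[as.character(grp)])

opar <- par(mfrow = c(2, 2))
plot(fit, col = cols)   # col is passed on to the point-drawing calls
par(opar)
```

Looking colours up by level name rather than by integer code keeps the mapping explicit if the factor levels are ever reordered.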

Uniform plot points in R -- Research / HW

This is for research I am doing for my Masters Program in Public Health.
I am plotting two variables against each other, a standard x,y type deal, and plotting a predicted line over the top. I get what I think is the most funky-looking point/boxplot thing ever, with an x axis that is only half filled out, and I don't understand why, as I do not call a boxplot function. It is my understanding that when I call the plot function only the points will be plotted.
The data I am plotting looks like this
TOTAL.LACE | DAYS.TO.FAILURE
9 | 15
16 | 7
... | ...
The range of the TOTAL.LACE is from 0 to 19 and DAYS.TO.FAILURE is 0 - 30
My code is as follows, maybe it is something before the plot but I don't think it is:
# To control the type of symbol we use we will use psymbol, it takes
# value 1 and 2
psymbol <- unique(FAILURE + 1)
library(survival)  # provides Surv() and survreg()
# Build a test frame that will predict values of the lace score due to
# a patient being in a state of failure
test <- survreg(Surv(time = DAYS.TO.FAILURE, event = FAILURE) ~ TOTAL.LACE,
dist = "logistic")
pred <- predict(test, type="response")  # produces numbers from about 14 to 23
summary(pred)
ord <- order(TOTAL.LACE)
tl_ord <- TOTAL.LACE[ord]
pred_ord <- pred[ord]
plot(TOTAL.LACE, DAYS.TO.FAILURE, pch=unique(psymbol))  # produces goofy graph
lines(tl_ord, pred_ord)  # this produces the line, not boxplots
Here is the resulting picture
Not too sure how to proceed from here. This is an offshoot of another problem I had with the same data set. I do not understand why boxplots are being drawn: I did not call boxplot() explicitly, so I don't know why they appeared along with the point plots. When I issue plot(DAYS.TO.FAILURE, TOTAL.LACE) I get only points on the resulting plot, as I expected, but when I swap what is plotted on x and y the boxplots show up, which to me is unexpected.
Here is a link to sample data that will hopefully help in reproducing the problem, as pointed out by @Dwin et al.: Some Sample Data
Thank you,

Since you don't have a reproducible example, it is a little hard to provide an answer that deals with your situation. Here I generate some vaguely similar-looking data:
set.seed(4)
TOTAL.LACE <- rep(1:19, each=1000)
zero.prob <- rbinom(19000, size=1, prob=.01)
DAYS.TO.FAILURE <- rpois(19000, lambda=15)
DAYS.TO.FAILURE <- ifelse(zero.prob==1, DAYS.TO.FAILURE, 0)
And here is the plot:
First, some of the categories are not printed on the x-axis because the labels don't fit. When you have so many categories, you have to display them in a smaller font to make them all fit. To do this, use cex.axis and set the value below 1 (see ?par):
boxplot(DAYS.TO.FAILURE~TOTAL.LACE, cex.axis=.8)
As to the question of why your plot is "goofy" or "funky-looking", it is a bit hard to say, because those terms are rather nebulous. My guess is that you need to more clearly understand how boxplots work, and then understand what these plots are telling you about the distribution of your data. In a boxplot, the midline of the box is the 50th percentile of your data, while the bottom and top of the box are the 25th and 75th percentiles. Typically, the 'whiskers' will extend out to the furthest datapoint that is at most 1.5 times the inter-quartile range beyond the ends of the box. In your case, for the first 9 TOTAL.LACEs, more than 75% of your data are 0's, so there is no box and thus no whiskers are possible. Everything beyond the whisker limits is plotted as an individual point. I don't think your plots are "funky" (although I'll admit I have no idea what you mean by that), I think your data may be "funky" and your boxplots are representing the distributions of your data accurately according to the rules by which boxplots are constructed.
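The rules described above can be checked numerically with boxplot.stats(), which returns the five numbers a boxplot draws together with the points plotted individually. This is a small sketch on a made-up sample (80 zeros plus the values 1 to 20), not the asker's data. With more than 75% zeros the hinges and the median are all 0, the inter-quartile range is 0, and every non-zero value falls beyond the whisker limits.

```r
# Hypothetical sample: 80 zeros and the values 1 to 20
x <- c(rep(0, 80), 1:20)

s <- boxplot.stats(x)   # default whisker coefficient is 1.5

s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out     # values that would be drawn as individual points

boxplot(x)  # the box collapses to a line at 0, with points above it
```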
In the future (and I mean this politely), it will help you get more useful and faster answers if you can write questions that are more clearly specified, and contain a reproducible example.
Update: Thanks for providing more information. I gather by "funky" you mean that it is a boxplot, rather than a typical scatterplot. The thing to realize is that plot() is a generic function that will call different methods depending on what you pass to it. If you pass two continuous variables, it will produce a scatterplot, but if you pass a factor as x and continuous data as y, then it will produce a boxplot, even if you don't call boxplot explicitly. Consider:
plot(TOTAL.LACE, DAYS.TO.FAILURE)
plot(as.factor(TOTAL.LACE), DAYS.TO.FAILURE)
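Evidently, TOTAL.LACE (the variable you pass as x) has become a factor without your meaning it to; class(TOTAL.LACE) will confirm this. The plotting symbols are a separate issue: psymbol <- unique(FAILURE + 1) keeps only the distinct values, so pch=unique(psymbol) recycles them across the points instead of supplying one symbol per observation. Although I haven't had time to try this, I suspect eliminating that line of code and using pch=(FAILURE + 1) will give you the symbols you want.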
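A quick way to check for, and undo, an accidental factor is sketched below. The vectors lace_num, lace_fac and days are simulated stand-ins invented for this example. The conversion goes through as.character() because calling as.numeric() directly on a factor returns the internal level codes rather than the original numbers.

```r
set.seed(4)
lace_num <- sample(0:19, 200, replace = TRUE)  # hypothetical numeric scores
days     <- rpois(200, lambda = 15)            # hypothetical outcome
lace_fac <- as.factor(lace_num)                # the accidental factor

class(lace_num)   # "integer": plot(lace_num, days) gives a scatterplot
class(lace_fac)   # "factor":  plot(lace_fac, days) gives boxplots

# Convert back via character so the original values are recovered
lace_back <- as.numeric(as.character(lace_fac))

plot(lace_back, days)  # scatterplot again
```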

Plotting histograms with R; y axis keeps changing to frequency from proportion/probability

I am trying to overlay two histograms in the same plot, but the option probability=TRUE (relative frequencies) in hist() has no effect with the code below. This is a problem because the two samples have very different sizes (length(cl1)=9 and length(cl2)=339) and, with this script, I cannot visualize the differences between the histograms because each shows frequencies. How can I overlay two histograms with the same bin width, showing relative frequencies?
c1<-hist(dataList[["cl1"]],xlim=range(minx,maxx),breaks=seq(minx,maxx,pasx),col=rgb(1,0,0,1/4),main=paste(paramlab,"Group",groupnum,"cl1",sep=" "),xlab="",probability=TRUE)
c2<-hist(dataList[["cl2"]],xlim=range(minx,maxx),breaks=seq(minx,maxx,pasx),col=rgb(0,0,1,1/4),main=paste(paramlab,"Group",groupnum,"cl2",sep=" "),xlab="",probability=TRUE)
plot(c1, col=rgb(1,0,0,1/4), xlim=c(minx,maxx), main=paste(paramlab,"Group",groupnum,sep=" "),xlab="")# first histogram
plot(c2, col=rgb(0,0,1,1/4), xlim=c(minx,maxx), add=T)
cl1Col <- rgb(1,0,0,1/4)
cl2Col <- rgb(0,0,1,1/4)
legend('topright',c('Cl1','Cl2'),
fill = c(cl1Col , cl2Col ), bty = 'n',
border = NA)
Thanks in advance for your help!

When you call plot on an object of class histogram (like c1), it dispatches to the S3 method for that class, namely plot.histogram. You can see the code for this function if you type graphics:::plot.histogram and you can see its help under ?plot.histogram. The help file for that function states:
freq logical; if TRUE, the histogram graphic is to present a
representation of frequencies, i.e, x$counts; if FALSE, relative
frequencies (probabilities), i.e., x$density, are plotted. The default
is true for equidistant breaks and false otherwise.
So, when plot renders a histogram it doesn't use the previously specified probability or freq arguments; it works out the default for itself. The reason becomes clear if you dig around inside c1: it contains all of the data necessary for the plot, but does not record how it should be rendered.
So, the solution is to reiterate the argument freq=FALSE when you run the plot functions. Notably, freq=FALSE works whereas probability=TRUE does not because plot.histogram does not have a probability option. So, your plot code will be:
plot(c1, col=rgb(1,0,0,1/4), xlim=c(minx,maxx), main=paste(paramlab,"Group",groupnum,sep=" "),xlab="",freq=FALSE)# first histogram
plot(c2, col=rgb(0,0,1,1/4), xlim=c(minx,maxx), add=T, freq=FALSE)
This all seems like an oversight/idiosyncratic decision (or lack thereof) on the part of the R devs. To their credit it is appropriately documented and is not "unexpected behavior" (although I certainly didn't expect it). I wonder where such oddness should be reported, if it should be reported at all.
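For a self-contained illustration of the fix, here is a sketch with simulated samples of the same sizes as in the question. The data, the break points brks and the ylim computation are assumptions made for this example. The histograms are computed with plot=FALSE and only rendered by plot(), with freq=FALSE given at that point.

```r
set.seed(42)
cl1 <- rnorm(9, mean = 1)      # hypothetical small sample
cl2 <- rnorm(339, mean = 0)    # hypothetical large sample

# Shared, equally spaced breaks that cover both samples
brks <- seq(floor(min(cl1, cl2)), ceiling(max(cl1, cl2)), by = 0.5)

h1 <- hist(cl1, breaks = brks, plot = FALSE)
h2 <- hist(cl2, breaks = brks, plot = FALSE)

# Leave room for the taller of the two density histograms
ymax <- max(h1$density, h2$density)

plot(h1, freq = FALSE, col = rgb(1, 0, 0, 1/4),
     xlim = range(brks), ylim = c(0, ymax), main = "cl1 vs cl2", xlab = "")
plot(h2, freq = FALSE, col = rgb(0, 0, 1, 1/4), add = TRUE)
```

With freq=FALSE each histogram has total area 1, so the two samples can be compared despite their different sizes.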
