How to smooth a non-linear regression curve in R

I've been asked to obtain an estimate theta for the variable Length in the muscle data set from the MASS package. The code I used is shown below, along with the resulting curve. Instead of a smooth curve, I end up with a very "blocky" one, plus some extra lines between points on the curve. Can anyone help me get a smooth curve?
utils::data(muscle,package = "MASS")
Length.fit<-nls(Length~t1+t2*exp(-Conc/t3),muscle,
start=list(t1=3,t2=-3,t3=1))
plot(Length~Conc,data=muscle)
lines(muscle$Conc, predict(Length.fit))
[Image of the plot]
Edit: as a follow-up question:
If I want to more accurately predict the curve, I use nonlinear regression to predict the curve for each of the 21 species. This gives me a vector
theta=(T11,T12,...,T21,T22,...,T3).
I can create a for-loop that plots all of the graphs, but like before, I end up with the blocky curve. However, seeing as I have to plot these curves as follows:
for(i in 1:21) {
lines(muscle$Conc,theta[i]+theta[i+21]*
exp(-muscle$Conc/theta[43]), col=color[i])
}
I don't see how I can use the same trick to smooth out these curves, since muscle$Conc still only has 4 values.
Edit 2:
I figured it out, and changed it to the following:
lines(seq(0,4,0.1),theta[i]+theta[i+21]*exp(-seq(0,4,0.1)/theta[43]), col=color[i])

If you look at the output of cbind(muscle$Conc, predict(Length.fit)), you'll see that many points are repeated and that they're not sorted in order of Conc. lines simply connects the points in the order they are given, so you get multiple back-and-forth segments. The code below runs predict on a unique, ordered set of values for Conc.
plot(Length ~ Conc,data=muscle)
lines(seq(0,4,0.1),
predict(Length.fit, newdata=data.frame(Conc=seq(0,4,0.1))))
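As a rough sketch of the same idea, assuming the muscle data and the model from the question: the check below confirms that Conc is unsorted and repeated, then predicts once per distinct concentration, in increasing order. The names conc_obs, fit_obs and draw_sorted_fit are only illustrative and are not part of the original code.

```r
library(MASS)  # provides the muscle data used in the question

# Same model and starting values as in the question.
Length.fit <- nls(Length ~ t1 + t2 * exp(-Conc / t3), data = muscle,
                  start = list(t1 = 3, t2 = -3, t3 = 1))

# Diagnosis: the x values passed to lines() are neither sorted nor unique.
conc_unsorted <- is.unsorted(muscle$Conc)
conc_repeated <- anyDuplicated(muscle$Conc) > 0

# Predict once per distinct concentration, in increasing order.
conc_obs <- sort(unique(muscle$Conc))
fit_obs <- predict(Length.fit, newdata = data.frame(Conc = conc_obs))

# Illustrative helper (hypothetical name): scatter plot plus the ordered fit.
draw_sorted_fit <- function() {
  plot(Length ~ Conc, data = muscle)
  lines(conc_obs, fit_obs, lty = 2)
}
```

Calling draw_sorted_fit() removes the back-and-forth lines, but the curve is still made of straight segments between the observed concentrations. The finer seq(0,4,0.1) grid is what makes it look smooth.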

Related

R: Could not get two graphs plot with the same starting X values

I'm plotting a density estimate against the probabilities predicted by a Poisson distribution in R. This is the code I used:
dens=density(data)
plot(dens$x,dens$y,type="l",xlab="Value",
ylab="Count estimate",ylim=c(0,0.2),xlim=range(0:22),col=4,lwd=2)
lines(dpois(0:22,lambda=estimated_lambda), col=2,lwd=2)
And this is the result:
For some reason I couldn't get the two curves to line up, even though both use the same x-axis range (as shown in the code above). I expected them to match, given that both sets of values are discrete. If I plot the dpois values without using lines, the other curve does not show up at all.
Any help is appreciated.
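One likely cause, offered as a sketch rather than a confirmed diagnosis: when lines() is given a single vector, it uses the index 1, 2, ..., n as the x values, so the probability for a count of 0 is drawn at x = 1. Passing the counts explicitly as x avoids the shift. The simulated counts below are a hypothetical stand-in for the asker's data, and draw_aligned is an illustrative name.

```r
set.seed(1)
counts <- rpois(500, lambda = 4)   # hypothetical stand-in for `data`
estimated_lambda <- mean(counts)   # assumed estimate of lambda
k <- 0:22

# What lines(dpois(k, ...)) effectively plots: x is the index, not the count.
implicit_xy <- xy.coords(dpois(k, lambda = estimated_lambda))
range(implicit_xy$x)

# Illustrative helper: density estimate plus Poisson probabilities at x = k.
draw_aligned <- function() {
  dens <- density(counts)
  plot(dens$x, dens$y, type = "l", xlab = "Value", ylab = "Density",
       ylim = c(0, 0.25), xlim = c(0, 22), col = 4, lwd = 2)
  lines(k, dpois(k, lambda = estimated_lambda), col = 2, lwd = 2)
}
```

The implicit x values run from 1 to 23 rather than 0 to 22, which would explain a one-unit horizontal offset between the two curves.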

kernel density estimator in R

I'm using the last column of the data set read in by the code below,
and I'm trying to apply a kernel density estimator to it, which is given by
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$
where K is some kernel (usually, though not necessarily, a normal density), h is the bandwidth, n is the length of the data set, X_i is each data point and x is the point at which the density is evaluated. Using this equation I have the following code,
AstroData=read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/galaxy.dat"),
header=FALSE)
x=AstroData$V3
xsorted=sort(x)
x_i=xsorted[1:1266]
hist(x_i, nclass=308)
n=length(x_i)
h1=.002
t=seq(min(x_i),max(x_i),0.01)
M=length(t)
fhat1=rep(0,M)
for (i in 1:M){
fhat1[i]=sum(dnorm((t[i]-x_i)/h1))/(n*h1)}
lines(t, fhat1, lwd=2, col="red")
Which produces the following plot,
which is actually close to what I want. The final result should look like this once I remove the histogram,
which, as you can see, is more finely resolved, whereas the red line that should represent the density is rather rough and does not reach as high. The final plot that you see was produced with the density function in R,
plot(density(x=x_i, bw=.002))
Which is what I want to get to without having to use any additional packages.
Thank you
After some talk with my roommate, he gave me the idea of decreasing the spacing of the t-values (the x grid). In doing so I changed it from 0.01 to 0.001. The final code for this plot is as follows,
AstroData=read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/galaxy.dat"),
header=FALSE)
x=AstroData$V3
xsorted=sort(x)
x_i=xsorted[1:1266]
hist(x_i, nclass=308)
n=length(x_i)
h1=.002
t=seq(min(x_i),max(x_i),0.001)
M=length(t)
fhat1=rep(0,M)
for (i in 1:M){
fhat1[i]=sum(dnorm((t[i]-x_i)/h1))/(n*h1)}
lines(t, fhat1, lwd=2, col="blue")
Which in turn gives the following plot, which is the one that I wanted,
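For what it's worth, the same estimator can be written without the explicit loop. The sketch below is a minimal vectorised version of the Gaussian kernel estimate; the function name kde_gauss and the simulated sample are assumptions made for illustration, not part of the original code. It also shows why the finer grid helped: the grid step should be small relative to the bandwidth, and 0.01 is five times larger than h1 = 0.002.

```r
# Gaussian kernel density estimate evaluated on a grid (hypothetical helper).
kde_gauss <- function(grid, obs, h) {
  # outer() builds a length(grid) x length(obs) matrix of scaled differences
  u <- outer(grid, obs, "-") / h
  rowSums(dnorm(u)) / (length(obs) * h)
}

set.seed(42)
obs <- rnorm(200)   # hypothetical sample standing in for x_i
h <- 0.3            # bandwidth chosen only for this illustration
step <- 0.01        # grid step, much smaller than the bandwidth

grid <- seq(min(obs) - 4 * h, max(obs) + 4 * h, by = step)
fhat <- kde_gauss(grid, obs, h)

# Sanity check: a density estimate should integrate to roughly 1.
area <- sum(fhat) * step
```

With the galaxy data this would be called as kde_gauss(t, x_i, h1). Note that outer() holds the whole matrix in memory, so for very long grids the loop version may be the safer choice.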

plotting custom confidence intervals around curve fits R

Say I have some relationships between x and some outcome variable, for three different groups, A,B and C:
x<-c(0:470)/1000
#3 groups, each has a different v-max parameter value.
v.A<-5
v.B<-4
v.C<-3
C<- (v.C*x)/(0.02+x)
B<- (v.B*x)/(0.02+x)
A<-(v.A*x)/(0.02+x)
d.curve<-data.frame(x,A,B,C)
The estimates of the v. parameter also have associated errors:
err.A<-0.24
err.B<-0.22
err.C<-0.29
I'd like to plot these curve fits, as well as shaded error regions around each curve, based on the uncertainty in the v. parameter. So, the shaded region would be +/- one error value. I can generate the plot of the 3 curves easily enough:
limx<-c(0,0.47)
limy<-c(0,5.5)
plot(A~x,data=d.curve,xlim=limx,ylim=limy,col=NA)
lines(smooth.spline(d.curve$x,d.curve$A),col='black',lwd=3)
par(new=T)
plot(B~x,data=d.curve,xlim=limx,ylim=limy,col=NA,ylab=NA,xlab=NA,axes=F)
lines(smooth.spline(d.curve$x,d.curve$B),col='black',lwd=3,lty=2)
par(new=T)
plot(C~x,data=d.curve,xlim=limx,ylim=limy,col=NA,ylab=NA,xlab=NA,axes=F)
lines(smooth.spline(d.curve$x,d.curve$C),col='black',lwd=3,lty=3)
But how can I add custom shaded regions around them, based on specified error terms?
You can add the following code to your current code. The calculation of the error of the line is based on the error (assumed standard error) of the coefficients. You can change the calculation for the error of the line to something else, if desired. The order of plotting might need to be changed to make the polygons appear behind the lines.
# calculating the standard error of the line based on the standard error of A,B,C
# could substitute another calculation
se.line.A <- ((x)/(0.02+x))*err.A
se.line.B <- ((x)/(0.02+x))*err.B
se.line.C <- ((x)/(0.02+x))*err.C
# polygon() comes from the graphics package, which R attaches by default
library(graphics)
# plotting polygons
# colors can be changed
# polygons will be drawn over the existing lines
# change the order of plotting if the shaded regions should sit behind the lines
polygon(c(x,rev(x))
,c(A+se.line.A,rev(A-se.line.A))
,col='gray'
,density=100)
polygon(c(x,rev(x))
,c(B+se.line.B,rev(B-se.line.B))
,col='blue'
,density=100)
polygon(c(x,rev(x))
,c(C+se.line.C,rev(C-se.line.C))
,col='green'
,density=100)
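To get the shaded regions behind the lines, one option is to set up an empty plot, draw the polygons first, and add the curves last. The sketch below assumes the same parameter values and the same error calculation as above. The names band and draw_bands, and the semi-transparent gray fill, are illustrative choices only.

```r
x <- (0:470) / 1000
v <- c(A = 5, B = 4, C = 3)             # v-max values from the question
err <- c(A = 0.24, B = 0.22, C = 0.29)  # their errors from the question
shape <- x / (0.02 + x)

# One data frame per group: fitted curve plus/minus one error value.
band <- lapply(names(v), function(g) {
  data.frame(x = x,
             fit = v[[g]] * shape,
             lower = (v[[g]] - err[[g]]) * shape,
             upper = (v[[g]] + err[[g]]) * shape)
})
names(band) <- names(v)

# Illustrative helper: polygons first, so the lines end up on top.
draw_bands <- function() {
  plot(NULL, xlim = c(0, 0.47), ylim = c(0, 5.5), xlab = "x", ylab = "outcome")
  for (b in band) {
    polygon(c(b$x, rev(b$x)), c(b$upper, rev(b$lower)),
            col = adjustcolor("gray", alpha.f = 0.5), border = NA)
  }
  for (i in seq_along(band)) {
    lines(band[[i]]$x, band[[i]]$fit, lwd = 3, lty = i)
  }
}
```

Because the fill is semi-transparent, overlapping regions stay visible. On a device without transparency support, a solid light color would be needed instead.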

lines() not properly displaying quadratic fit

I'm simply trying to display the fit I've generated using lm(), but the lines function is giving me a weird result in which there are multiple lines coming out of one point.
Here is my code:
library(ISLR)
data(Wage)
lm.mod<-lm(wage~poly(age, 4), data=Wage)
Wage$lm.fit<-predict(lm.mod, Wage)
plot(Wage$age, Wage$wage)
lines(Wage$age, Wage$lm.fit, col="blue")
I've tried resetting my plot with dev.off(), but I've had no luck. I'm using RStudio. FWIW, the line shows up perfectly fine if I make the regression linear only, but as soon as I make it quadratic or higher (using I(age^2) or poly()), I get a weird graph. Also, the points() function works fine with poly().
Thanks for the help.
Because you forgot to order the points by age first, the lines are going to random ages. This is happening for the linear regression too; the reason it looks fine for a straight-line fit is that traveling between any set of points along a line...stays on the line!
plot(Wage$age, Wage$wage)
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue')
Consider increasing the line width for a better view:
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue', lwd = 3)
Just to add another more general tip on plotting model predictions:
An often used strategy is to create a new data set (e.g. newdat) which contains a sequence of values for your predictor variables across a range of possible values. Then use this data to show your predicted values. In this data set, you have a good spread of predictor variable values, but this may not always be the case. With the new data set, you can ensure that your line represents evenly distributed values across the variable's range:
Example
newdat <- data.frame(age=seq(min(Wage$age), max(Wage$age),length.out=1000))
newdat$pred <- predict(lm.mod, newdata=newdat)
plot(Wage$age, Wage$wage, col=8, ylab="Wage", xlab="Age")
lines(newdat$age, newdat$pred, col="blue", lwd=2)

Extended Survival Plot Lines in R

I've obtained a survival plot from the following code:
s = Surv(outcome.[,1], outcome.[,2])
survplot= (survfit(s ~ person.list[,1]))
plot(survplot, mark.time = FALSE)
person.list is just a list of 15 people.
When I plot this, the lines on my plot all end at different time points. Is there a way to extend all the lines so that they end at a certain time point? (That is, outcome.[,1] is a time-to-event variable and I would like the survival lines on the plot to extend out to, say, 5 years.)
Thanks,
Matt
This isn't an answer on how to do what you ask, but rather an explanation of why you should not do what you ask.
The lines stop where the data stops. Beyond that time, you have no information in order to make an estimate of the survival (this is in a traditional Kaplan-Meier survival analysis, as you have set it up). Therefore, the Kaplan-Meier estimate is not well defined beyond that time, and so extending that curve does not have any particular meaning. While graphically you could just draw a horizontal line at the same level as the last survival value, this is not really meaningful.
This is code I posted to a similar question on rhelp a while ago:
http://finzi.psych.upenn.edu/Rhelp10/2010-September/253817.html
library(survival) # provides survfit(), Surv() and the aml data
?survfit # to get a working example since you did not provide one
lsurv2 <- survfit(Surv(time, status) ~ x, aml, type='fleming')
plot(lsurv2, lty=2:3, xmax=300) # drats, no effect of xmax
str(lsurv2) # so see the structure of the survfit object
lsurv2$time[21] <- 300 #add a time value
lsurv2$n.censor[21] <- 1 # mark as censoring time
lsurv2$strata[2] <- 11 # add to count of group 2
plot(lsurv2, lty=2:3, xmax=300) # horizontal line to 300 for group 2
And this was Therneau's later response (presumably better than mine): http://finzi.psych.upenn.edu/Rhelp10/2010-September/253879.html
plot(surv, mark.time=F, fun='event', xlim=c(0, 54))
for (i in 1:length(surv$strata)) { #number of curves
temp <- surv[i]
lines(c(max(temp$time), 54), 1- rep(min(temp$surv),2))
}
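The snippet above draws on the event scale (fun='event'). A similar sketch on the survival scale, using the aml data from the survival package, might look like the following. The end time t_end and the helper name draw_extended are assumptions for illustration. The extensions are drawn dotted to signal that, as explained above, they are not estimates.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)
t_end <- 200  # hypothetical common end time, beyond the last follow-up

# Last follow-up time and last estimated survival for each curve.
curve_ids <- seq_along(fit$strata)
last_time <- sapply(curve_ids, function(i) max(fit[i]$time))
last_surv <- sapply(curve_ids, function(i) min(fit[i]$surv))

# Illustrative helper: Kaplan-Meier curves plus dotted horizontal extensions.
draw_extended <- function() {
  plot(fit, mark.time = FALSE, lty = 1:2, xlim = c(0, t_end))
  segments(last_time, last_surv, t_end, last_surv, lty = 3)
}
```

This only changes the picture. The fitted object is left untouched, unlike the approach that edits the time and n.censor components directly.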
