Plot a least-squares line in a scatter plot

I have created a scatter plot with pandas, but I don't know how to add the regression line, i.e. the least-squares fit to the points.
Looking for examples on http://matplotlib.org, I haven't found any similar graph.
Thanks a lot in advance!

Pandas has an ordinary least squares (ols) function, and there is a very detailed example in the 0.10.1 docs of how to plot the result. Here's a snippet:
model = ols(y=rets['AAPL'], x=rets.ix[:, ['GOOG']], window=250)
# just plot the coefficient for GOOG
model.beta['GOOG'].plot()
Note: this example is no longer in the docs (since 0.10.1), I'm not sure why.

Related

R: Could not get two graphs plot with the same starting X values

I'm plotting a density estimate against the probabilities predicted by a Poisson distribution in R. This is the code I used:
dens=density(data)
plot(dens$x,dens$y,type="l",xlab="Value",
ylab="Count estimate",ylim=c(0,0.2),xlim=range(0:22),col=4,lwd=2)
lines(dpois(0:22,lambda=estimated_lambda), col=2,lwd=2)
And this is the result:
For some reason I couldn't get the two lines to line up, even though both have the same x-axis range (as shown in the code above). It should work the same, given that both sets of values are discrete. If I plot the dpois values without using lines, the other line does not show up at all.
Any help is appreciated.

How to smooth non-linear regression curve in R

I'm asked to obtain an estimate theta for the variable Length in the muscle data set from the MASS package. The code I used is shown below, along with the resulting curve. Instead of a smooth curve, I end up with a very "blocky" one, plus some extra lines between points on the curve. Can anyone help me get a smooth curve?
utils::data(muscle,package = "MASS")
Length.fit<-nls(Length~t1+t2*exp(-Conc/t3),muscle,
start=list(t1=3,t2=-3,t3=1))
plot(Length~Conc,data=muscle)
lines(muscle$Conc, predict(Length.fit))
Edit: as a follow-up question:
If I want to more accurately predict the curve, I use nonlinear regression to predict the curve for each of the 21 species. This gives me a vector
theta=(T11,T12,...,T21,T22,...,T3).
I can create a for-loop that plots all of the curves but, as before, I end up with blocky curves. I have to plot them as follows:
for(i in 1:21) {
lines(muscle$Conc,theta[i]+theta[i+21]*
exp(-muscle$Conc/theta[43]), col=color[i])
i = i+1
}
I don't see how I can use the same trick to smooth out these curves, since muscle$Conc still only has 4 values.
Edit 2:
I figured it out, and changed it to the following:
lines(seq(0,4,0.1),theta[i]+theta[i+21]*exp(-seq(0,4,0.1)/theta[43]), col=color[i])
If you look at the output of cbind(muscle$Conc, predict(Length.fit)), you'll see that many points are repeated and that they're not sorted in order of Conc. lines just plots the points in order and connects the points, giving you multiple back-and-forth lines. The code below runs predict on a unique set of ordered values for Conc.
plot(Length ~ Conc,data=muscle)
lines(seq(0,4,0.1),
predict(Length.fit, newdata=data.frame(Conc=seq(0,4,0.1))))

Different lowess curves in plot and qplot in R

I am comparing two graphs with a non-parametric lo(w)ess curve superimposed in each case. The problem is that the curves look very different, despite the fact that their arguments, such as span, are identical.
y<-rnorm(100)
x<-rgamma(100,2,2)
qplot(x,y)+stat_smooth(span=2/3,se=F)+theme_bw()
plot(x,y)
lines(lowess(y~x))
There seems to be a lot more curvature in the graph generated by qplot(). As you know, detecting curvature is very important in regression diagnostics, and I fear that if I use ggplot2 I will reach erroneous conclusions.
Could you please tell me how I could produce the same curve in ggplot2?
Thank you
Or, you can use loess(..., degree=1). This produces a very similar, but not quite identical, result to lowess(...):
set.seed(1) # for reproducibility
y<-rnorm(100)
x<-rgamma(100,2,2)
plot(x,y)
points(x,loess(y~x,data.frame(x,y),degree=1)$fitted,pch=20,col="red")
lines(lowess(y~x))
With ggplot
qplot(x,y)+stat_smooth(se=F,degree=1)+
theme_bw()+
geom_point(data=as.data.frame(lowess(y~x)),aes(x,y),col="red")
Here is a new stat function for use with ggplot2 that uses lowess(): https://github.com/harrelfe/Hmisc/blob/master/R/stat-plsmo.r. You need to load the proto package for this to work. I like using lowess because it is fast for any sample size and allows outlier detection to be turned off for binary Y. But it doesn't provide confidence bands.

qqline() equivalent for a normal probability plot of edf

I made a plot of an empirical distribution function (EDF) using plot.ecdf(x, ...).
In order to visualize normality, I'm looking for an R equivalent of qqline that draws a simple diagonal line in my plot.
The normplot() function in MATLAB is doing the same thing (See the red line in the plot on this link: http://www.mathworks.de/de/help/stats/normplot.html). Thanks.
As mentioned in the comments, just call qqline():
x <- ecdf(rnorm(10))
plot.ecdf(x)
qqline(x)

In R and ggplot2 package, How to Add Lines?

I am doing a survival analysis and have produced the survival graph using:
the plot function, to plot the Kaplan-Meier estimate (the KA variable) as the y value against time;
the lines function, to plot the step lines between estimate i and i+1, for each i=1,2,....
The code is as follows:
plot(KA)
for( i in 1:(length(KA)-1)){
lines(c(i,i+1),c(KA[i],KA[i])) # The horizontal step lines
lines(c(i+1,i+1),c(KA[i],KA[i+1])) # The vertical step lines
}
Now I want to make a nicer-looking survival graph using the ggplot2 package.
The question is: how do I add the step lines to the graph?
I'm sorry, I cannot post graphs because my reputation is less than 10.
Have a look at either geom_step, geom_path or geom_segment.
They might be helpful in what you are trying to achieve.
http://had.co.nz/ggplot2/geom_step.html
http://had.co.nz/ggplot2/geom_path.html
http://had.co.nz/ggplot2/geom_segment.html
