Regression Line to Data Points: How to create vertical lines?

How can I get the following kind of visualization in R?
Let's consider a simple case of three points.
# Define two vectors
x <- c(12,21,54)
y <- c(2, 7, 11)
# OLS regression
ols <- lm(y ~ x)
# Visualisation
plot(x,y, xlim = c(0,60), ylim =c(0,15))
abline(ols, col="red")
What I want is to draw vertical lines from the OLS line (the red line) to each point.

You can do this really nicely with ggplot2
library(ggplot2)
set.seed(1)
x<-1:10
y<-3*x + 2 + rnorm(10)
m<-lm(y ~ x)
yhat<-m$fitted.values
diff<-y-yhat
qplot(x=x, y=y)+geom_line(y=yhat)+
geom_segment(aes(x=x, xend=x, y=y, yend=yhat, color="error"))+
labs(title="regression errors", color="series")

There is a much simpler solution:
segments(x, y, x, predict(ols))

If you construct a matrix of points, you can use apply to plot the lines like this:
Create a matrix of coordinates:
cbind(x,x,y,predict(ols))
# x x y
#1 12 12 2 3.450920
#2 21 21 7 5.153374
#3 54 54 11 11.395706
This can be plotted as:
apply(cbind(x,x,y,predict(ols)),1,function(coords){lines(coords[1:2],coords[3:4])})
This is effectively a for loop running over the rows of the matrix and plotting one line for each row.
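For comparison, the same drawing can be written as an explicit loop. This is only a minimal sketch that reuses the three points from the question; the name yhat is introduced here purely for illustration.

```r
# Data and fit from the question
x <- c(12, 21, 54)
y <- c(2, 7, 11)
ols <- lm(y ~ x)

# Fitted values on the regression line (yhat is a name made up for this sketch)
yhat <- predict(ols)

plot(x, y, xlim = c(0, 60), ylim = c(0, 15))
abline(ols, col = "red")

# One vertical line per point, from the observed y to the fitted value
for (i in seq_along(x)) {
  lines(c(x[i], x[i]), c(y[i], yhat[i]))
}
```

The vectorised segments() call gives the same picture; the loop only makes the row-by-row logic visible.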

Related

How to draw a graph with both x-axis and y-axis are functions in R?

I have two functions,
x= (z-z^2.5)/(1+2*z-z^2)
y = z-z^2.5
where z is the only variable. How do I draw a graph where the x-axis shows the value of x and the y-axis shows the value of y as z ranges from 0 to 5?
You can get a very basic plot by simply following your own instructions.
## z ranges from 0 to 5
z = seq(0,5,0.01)
## x and y are functions of z
x = (z-z^2.5)/(1+2*z-z^2)
y = z-z^2.5
##plot
plot(x,y, pch=20, cex=0.5)
If you want a smooth curve it is a little trickier. There is a discontinuity in the curve at
z = 1 + sqrt(2) ~ 2.414. If you just draw the curve as one piece, you get an unwanted line connecting across the discontinuity. So, in two pieces,
plot(x[1:242],y[1:242], type='l', xlab='x', ylab='y',
xlim=range(x), ylim=range(y))
lines(x[243:501],y[243:501])
But be careful about interpreting this. From z=0 to z=1, x and y are both 0 at the two endpoints, so the curve leaves the origin and returns to it in a small loop.
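One way to avoid hard-coding the indices 242 and 243 is to locate the sign change of the denominator and insert an NA there, since lines are not drawn across missing values. A rough sketch, where den, brk, x_gap and y_gap are names made up for this example:

```r
z <- seq(0, 5, 0.01)
den <- 1 + 2 * z - z^2          # denominator of x; it changes sign at z = 1 + sqrt(2)
x <- (z - z^2.5) / den
y <- z - z^2.5

# Index of the last grid point before the denominator changes sign
brk <- which(diff(sign(den)) != 0)

# Insert a missing value at the break so the line is interrupted there
x_gap <- append(x, NA, after = brk)
y_gap <- append(y, NA, after = brk)

plot(x_gap, y_gap, type = "l", xlab = "x", ylab = "y")
```

This assumes a single sign change on the grid, which holds for z between 0 and 5.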
Using ggplot2
# z ranges from -1000 to 1000 (The range can be arbitrary)
z = seq(-1000,1000,.25)
# x as a function of z
x = (z-z^2.5) / ((1+2*z)-z^2)
# y as a function of z
y = z-z^2.5
# make a dataframe of x,y and z
df <- data.frame(x=x, y=y, z=z)
# subset the df where z is between 0 and 5
df_5 <- subset(df, (df$z>=0 & df$z<=5))
# plot the graph
library(ggplot2)
ggplot(df_5, aes(x,y))+ geom_point(color="red")
The only additions to @G5W's answer are the use of subset() to keep the values of z between 0 and 5 and the use of ggplot2.

Plot decision boundaries with ggplot2?

How do I plot the equivalent of contour (base R) with ggplot2? Below is an example with linear discriminant function analysis:
require(MASS)
iris.lda<-lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
datPred<-data.frame(Species=predict(iris.lda)$class,predict(iris.lda)$x) #create data.frame
#Base R plot
eqscplot(datPred[,2],datPred[,3],pch=as.double(datPred[,1]),col=as.double(datPred[,1])+1)
#Create decision boundaries
iris.lda2 <- lda(datPred[,2:3], datPred[,1])
x <- seq(min(datPred[,2]), max(datPred[,2]), length.out=30)
y <- seq(min(datPred[,3]), max(datPred[,3]), length.out=30)
Xcon <- matrix(c(rep(x,length(y)),
rep(y, rep(length(x), length(y)))),,2) #Set all possible pairs of x and y on a grid
iris.pr1 <- predict(iris.lda2, Xcon)$post[, c("setosa","versicolor")] %*% c(1,1) #posterior probabilities of a point belonging to each class
contour(x, y, matrix(iris.pr1, length(x), length(y)),
levels=0.5, add=T, lty=3,method="simple") #Plot contour lines in the base R plot
iris.pr2 <- predict(iris.lda2, Xcon)$post[, c("virginica","setosa")] %*% c(1,1)
contour(x, y, matrix(iris.pr2, length(x), length(y)),
levels=0.5, add=T, lty=3,method="simple")
#Equivalent plot with ggplot2 but without decision boundaries
ggplot(datPred, aes(x=LD1, y=LD2, col=Species) ) +
geom_point(size = 3, aes(pch = Species))
It is not possible to use a matrix when plotting contour lines with ggplot. The matrix can be rearranged to a data-frame using melt. In the data-frame below the probability values from iris.pr1 are displayed in the first column along with the x and y coordinates in the following two columns. The x and y coordinates form a grid of 30 x 30 points.
df <- transform(melt(matrix(iris.pr1, length(x), length(y))), x=x[X1], y=y[X2])[,-c(1,2)]
I would like to plot the coordinates (preferably connected by a smoothed curve) where the posterior probabilities are 0.5 (i.e. the decision boundaries).
You can use geom_contour in ggplot to achieve a similar effect. As you correctly assumed, you do have to transform your data. I ended up just doing
pr<-data.frame(x=rep(x, length(y)), y=rep(y, each=length(x)),
z1=as.vector(iris.pr1), z2=as.vector(iris.pr2))
And then you can pass that data.frame to the geom_contour and specify you want the breaks at 0.5 with
ggplot(datPred, aes(x=LD1, y=LD2) ) +
geom_point(size = 3, aes(pch = Species, col=Species)) +
geom_contour(data=pr, aes(x=x, y=y, z=z1), breaks=c(0,.5)) +
geom_contour(data=pr, aes(x=x, y=y, z=z2), breaks=c(0,.5))
and that gives the LD1/LD2 scatter plot with both decision boundaries drawn as contour lines.
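The grid part of that data.frame can also be built with expand.grid(), which varies its first argument fastest and so produces the same row order as the rep() calls. A small sketch on a stand-in grid (the values of x and y here are made up; grid_manual and grid_auto are illustrative names):

```r
# Small stand-in grid; in the answer above x and y each have 30 values
x <- seq(-2, 2, length.out = 4)
y <- seq(0, 1, length.out = 3)

# Construction used in the answer: x cycles fastest, y is repeated in blocks
grid_manual <- data.frame(x = rep(x, length(y)),
                          y = rep(y, each = length(x)))

# Same grid, same row order, from expand.grid()
grid_auto <- expand.grid(x = x, y = y)

all(grid_auto$x == grid_manual$x)
all(grid_auto$y == grid_manual$y)
```

Because the row order matches, columns such as z1 and z2 could be attached with as.vector() in the same way as before.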
The partimat function in the klaR package does what you want for observed predictors. If you want the same for the LDA projections, you can build a data frame that augments the original with the LD1...LDk projections and then call partimat with the formula Group~LD1+...+LDk and method='lda'. You then see the "LD-plane" that you intended to see, nicely partitioned for you. This seemed easier to me, at least for explaining to students newer to R, since it just reuses an existing function in a way it wasn't quite intended for.

Plotting a line of best fit from where data starts to where data ends in R

I am trying to plot a line of best fit on my dataset in R:
abline(lm(y~x))
However, the line goes all the way across the entire graph. Is there any way that I can configure the line so that it only covers the area where the data points are (similar to what you get in Excel)?
Many thanks!
One solution is to use lines() with predictions at the two extremes of x.
See this example:
x <- rnorm(20)
y <- 5 + 0.4*x + rnorm(20)/10
dt <- data.frame(x=x, y=y)
ols1 <- lm(y ~ x, data=dt)
nd <- data.frame(x=range(x)) ## generate new data with the two extremes of x
plot(x, y) ## original scatter plot
lines(nd$x, predict(ols1, newdata=nd), col='orange') ## line from two points
I hope that helps.
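The same two endpoints can also be computed directly from the fitted coefficients and drawn with segments(). A small sketch under the same kind of simulated setup; the seed and the names fit, x_ends and y_ends are assumptions of this example:

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- rnorm(20)
y <- 5 + 0.4 * x + rnorm(20) / 10
fit <- lm(y ~ x)

# Endpoints of the fitted line at the smallest and largest observed x
x_ends <- range(x)
y_ends <- coef(fit)[1] + coef(fit)[2] * x_ends

plot(x, y)
segments(x_ends[1], y_ends[1], x_ends[2], y_ends[2], col = "orange")
```

For a straight-line fit this is equivalent to calling predict() on the two extremes; predict() generalises better if the model has more terms.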

How to achieve a graph like this in R

This sample from a study is very close to what I need. The question is, how do I achieve the conditional background color like in the chart below. This chart has two categories, I have three, so I would use some texture for the third.
The categories for the condition that changes over time are in a vector with names CL, C, and CR.
Here's some sample data: an index and a category giving the government type (center-left, center, center-right). The real data cover 72 government terms, so there are 72 consecutive runs, and drawing each one by hand with rect would be cumbersome. I understand that I first need to plot the categories and then add the line to the plot; I'll worry about the axes afterwards and add them last.
shareindex categ
100 C
103 C
104 C
102 CL
99 CL
98 CR
99 CR
101 CL
104 CL
105 CR
104 CR
102 C
103 C
Here's some example data and a call to plot using the panel.first argument to draw the rectangles. I've suggested using an lapply call to simplify drawing the many rectangles.
# data
set.seed(1)
x <- rnorm(1000)
x2 <- cumsum(x)
y <- rnorm(1000)
y2 <- cumsum(y)-5
ranges <- list(c(5,10), c(20,100), c(200,250), c(500,600), c(800,820), c(915,930))
# expression to be used for plotting gray boxes
boxes <- expression(lapply(ranges, function(z) rect(z[1],-100,z[2],100, col='gray', border=NA)))
# the actual plotting
plot(1:1000, x2, type='l', xlab='time', panel.first = eval(boxes))
lines(1:1000, y2, col='red')
You can use rect to draw the background rectangles and then plot the line on top of them.
An example with simulated data:
set.seed(1)
x <- 1:100
y <- cumsum(rnorm(100))
z <- c(rep(1, 10), rep(2,20), rep(1,40), rep(3,30))
plot(x, y, type="n")
rect(xleft = x - 1, xright = x, ybottom=par("usr")[3], ytop=par("usr")[4], col=z, border=NA )
lines(x, y, col="white")
Edit for your data:
## Data frame with the data
dat <- data.frame(shareindex=c(100,103,104,102, 99,98,99,101,104,105,104,102,103),
categ=c("C","C","C","CL","CL","CR","CR", "CL", "CL","CR", "CR","C", "C"))
## Add index column
dat$id <- seq(along.with=dat$shareindex)
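# Add your background colors here, named by category so that the lookup
# works whether categ is a character vector or a factor
cols <- c(C="lightgray", CL="grey", CR="lightblue")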
## Just an empty plot
plot(dat$id, dat$shareindex, type="n", ylab="Share index", xlab="id")
## Plot the rectangles for the background
rect(xleft =dat$id - 1 , xright = dat$id,
ybottom=par("usr")[3], ytop=par("usr")[4],
col=cols[dat$categ], border=NA )
## Plot the line
lines(dat$id, dat$shareindex, lwd=2)
Cheers,
alex
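Since the question mentions 72 consecutive runs, it may help to collapse the category vector into runs first, so that one rectangle is drawn per run instead of one per observation. A rough sketch using rle() on the sample data; the names runs, ends, starts and spans are made up here, and the colour names simply repeat those used above:

```r
categ <- c("C", "C", "C", "CL", "CL", "CR", "CR",
           "CL", "CL", "CR", "CR", "C", "C")
shareindex <- c(100, 103, 104, 102, 99, 98, 99, 101, 104, 105, 104, 102, 103)

# Collapse consecutive identical categories into runs
runs <- rle(categ)
ends <- cumsum(runs$lengths)      # right edge of each run
starts <- ends - runs$lengths     # left edge of each run
spans <- data.frame(categ = runs$values, xleft = starts, xright = ends)

cols <- c(C = "lightgray", CL = "grey", CR = "lightblue")

plot(seq_along(shareindex), shareindex, type = "n",
     xlab = "id", ylab = "Share index")
rect(xleft = spans$xleft, xright = spans$xright,
     ybottom = par("usr")[3], ytop = par("usr")[4],
     col = cols[spans$categ], border = NA)
lines(seq_along(shareindex), shareindex, lwd = 2)
```

The rectangle edges follow the same convention as above (each observation i covers the interval from i - 1 to i), so the picture should be the same, just with fewer rectangles.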

Plot 95% confidence limits in scatterplot

I need to plot several data points that are defined as
c(x,y, stdev_x, stdev_y)
as a scatter plot with a representation of their 95% confidence limits, for example showing the point and one contour around it. Ideally I'd like to plot an oval around the point, but I don't know how to do it. I was thinking of building samples and plotting them with stat_density2d(), but I would need to limit the number of contours to 1 and could not figure out how to do it.
require(ggplot2)
n=10000
d <- data.frame(id=rep("A", n),
se=rnorm(n, 0.18,0.02),
sp=rnorm(n, 0.79,0.06) )
g <- ggplot (d, aes(se,sp)) +
scale_x_continuous(limits=c(0,1))+
scale_y_continuous(limits=c(0,1)) +
theme(aspect.ratio=0.6)
g + geom_point(alpha=I(1/50)) +
stat_density2d()
First, save the whole plot as an object (I changed the limits).
g <- ggplot (d, aes(se,sp, group=id)) +
scale_x_continuous(limits=c(0,0.5))+
scale_y_continuous(limits=c(0.5,1)) +
theme(aspect.ratio=0.6) +
geom_point(alpha=I(1/50)) +
stat_density2d()
Then use ggplot_build() to save all the information used for the plot. The contours are stored in data[[2]].
gg<-ggplot_build(g)
str(gg$data)
head(gg$data[[2]])
#   level         x         y piece group PANEL
# 1    10 0.1363636 0.7390318     1   1-1     1
# 2    10 0.1355521 0.7424242     1   1-1     1
# 3    10 0.1347814 0.7474747     1   1-1     1
# 4    10 0.1343692 0.7525253     1   1-1     1
# 5    10 0.1340186 0.7575758     1   1-1     1
# 6    10 0.1336037 0.7626263     1   1-1     1
There are 12 contour lines in total. To keep only the outer one, subset the rows with group=="1-1" and replace the original data.
gg$data[[2]]<-subset(gg$data[[2]],group=="1-1")
Then use ggplot_gtable() and grid.draw() (from the grid package) to draw your plot.
library(grid)
p1<-ggplot_gtable(gg)
grid.draw(p1)
latticeExtra provides panel.ellipse, a lattice panel function that computes and draws a confidence ellipsoid from bivariate data, possibly grouped by a third variable.
Here I draw the 0.65 and 0.95 levels using your data.
library(latticeExtra)
xyplot(sp~se,data=d,groups=id,
par.settings = list(plot.symbol = list(cex = 1.1, pch=16)),
panel = function(x,y,...){
panel.xyplot(x, y,alpha=0.2)
panel.ellipse(x, y, lwd = 2, col="green", robust=FALSE, level=0.65,...)
panel.ellipse(x, y, lwd = 2, col="red", robust=TRUE, level=0.95,...)
})
Looks like the stat_ellipse function that you found is really a great solution, but here's another one (non-ggplot), just for the record, using dataEllipse from the car package.
# some sample data
n=10000
g=4
d <- data.frame(ID = unlist(lapply(letters[1:g], function(x) rep(x, n/g))))
d$x <- unlist(lapply(1:g, function(i) rnorm(n/g, runif(1)*i^2)))
d$y <- unlist(lapply(1:g, function(i) rnorm(n/g, runif(1)*i^2)))
# plot points with 95% normal-probability contour
# default settings...
library(car)
with(d, dataEllipse(x, y, ID, level=0.95, fill=TRUE, fill.alpha=0.1))
# with a little more effort...
# random colours with alpha-blending
d$col <- unlist(lapply(1:g, function (x) rep(rgb(runif(1), runif(1), runif(1), runif(1)),n/g)))
# plot points first
with(d, plot(x,y, col=col, pch="."))
# then ellipses over the top
with(d, dataEllipse(x, y, ID, level=0.95, fill=TRUE, fill.alpha=0.1, plot.points=FALSE, add=TRUE, col=unique(col), ellipse.label=FALSE, center.pch="+"))
I just found the function stat_ellipse(), and it takes care of this beautifully.
g + geom_point(alpha=I(1/10)) +
stat_ellipse(aes(group=id), color="black")
I don't know anything about the ggplot2 library, but you can draw ellipses with plotrix. Does this plot look anything like what you're asking for?
library(plotrix)
n=10
d <- data.frame(x=runif(n,0,2),y=runif(n,0,2),seX=runif(n,0,0.1),seY=runif(n,0,0.1))
plot(d$x,d$y,pch=16,ylim=c(0,2),xlim=c(0,2))
draw.ellipse(d$x,d$y,d$seX,d$seY)
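If the errors in x and y are assumed to be independent and normally distributed (an assumption that the question does not state), the 95% region around a point is an axis-aligned ellipse whose semi-axes are the standard deviations multiplied by sqrt(qchisq(0.95, 2)), roughly 2.45. A minimal base-R sketch under that assumption; ellipse_points is a hypothetical helper written for this example, and the point (0.18, 0.79) with standard deviations 0.02 and 0.06 is taken from the simulated data in the question:

```r
# Hypothetical helper: outline of the confidence ellipse for one point,
# assuming independent normal errors in x and y
ellipse_points <- function(cx, cy, sd_x, sd_y, level = 0.95, n = 100) {
  k <- sqrt(qchisq(level, df = 2))           # scaling factor, about 2.45 for 95%
  theta <- seq(0, 2 * pi, length.out = n)
  data.frame(x = cx + k * sd_x * cos(theta),
             y = cy + k * sd_y * sin(theta))
}

e <- ellipse_points(0.18, 0.79, 0.02, 0.06)

plot(e$x, e$y, type = "l", xlab = "se", ylab = "sp")
points(0.18, 0.79, pch = 16)
```

Passing the raw standard deviations as the radii, without the scaling factor, would draw a smaller one-standard-deviation ellipse rather than a 95% region.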
