Combining two plots in R - r

I want to compare observed values with fitted ones by plotting X vs Y and X vs Y.fitted on the same plot in R. I have written some code, but it is incomplete. The plot should look like the one below, with circles for the observed values and crosses for the fitted values.
set.seed(1)
x <- runif(8,0,1)
y <- runif(8,0,1)
y.fitted <- runif(8,0,1)
plot(x,y,pch=1)
plot(x,y.fitted,pch=5)

In your code, the second call to plot does not add points to the existing plot; it creates a new one. You can use the function points to add points to the existing plot.
plot(x, y, pch = 1)
points(x, y.fitted, pch = 4)
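
One thing to watch for: plot() fixes the axis limits from the first series, so fitted values that fall outside the range of y are clipped without warning. A minimal sketch, assuming the same x, y and y.fitted as in the question (the name y_limits is only for illustration), sets ylim from both series and adds a legend:

```r
set.seed(1)
x <- runif(8, 0, 1)
y <- runif(8, 0, 1)
y.fitted <- runif(8, 0, 1)

# Axis limits that cover both the observed and the fitted values
y_limits <- range(y, y.fitted)

plot(x, y, pch = 1, ylim = y_limits, ylab = "y")   # circles: observed
points(x, y.fitted, pch = 4)                       # crosses: fitted
legend("topright", legend = c("observed", "fitted"), pch = c(1, 4))
```

If the two series are known to share a range, the ylim argument can be dropped.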

Calling plot a second time creates a new plot. You could use points instead:
set.seed(1)
x <- runif(8,0,1)
y <- runif(8,0,1)
y.fitted <- runif(8,0,1)
plot(x,y,pch=1)
points(x,y.fitted,pch=4) # pch=4 draws crosses; pch=5 would draw diamonds

A solution with ggplot2, which gives a neater-looking graph:
library(ggplot2)
library(reshape2) # provides melt()
df = data.frame(x=runif(8,0,1),y=runif(8,0,1),y.fitted=runif(8,0,1))
df = melt(df, id=c('x')) # long format: columns x, variable, value
ggplot() + geom_point(aes(x=x,y=value, shape=variable, colour=variable), df)
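
To see what the melt step does, here is a rough sketch that builds the same long format by hand in base R. The data frame name long is only for illustration, and the ggplot2 call is guarded so the snippet still runs where the package is not installed:

```r
set.seed(1)
df <- data.frame(x = runif(8), y = runif(8), y.fitted = runif(8))

# Long format by hand: one row per (x, series) pair, the same shape
# that melt(df, id = "x") returns
long <- data.frame(
  x        = rep(df$x, times = 2),
  variable = rep(c("y", "y.fitted"), each = nrow(df)),
  value    = c(df$y, df$y.fitted)
)
str(long)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = x, y = value, shape = variable, colour = variable)) +
    geom_point() +
    # circles for observed, crosses for fitted, as asked in the question
    scale_shape_manual(values = c(y = 1, y.fitted = 4))
  print(p)
}
```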

Related

Change colors of select lines in ggplot2 coefficient plot in R

I would like to change the color of coefficient lines based on whether the point estimate is negative or positive in a ggplot2 coefficient plot in R. For example:
require(coefplot)
set.seed(123)
dat <- data.frame(y1 = rnorm(100), x = rnorm(100), z = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
coefplot.lm(mod1)
Which produces the following plot:
In this plot, I would like to change the "x" variable to red when plotted. Any ideas? Thanks.
I don't think you can do this with a plot produced by coefplot.lm. The coefplot package uses ggplot2 as its plotting system, which is good in itself, but it does not let you control colors as easily as you would like. To get the desired colors, you need a variable in your dataset that color-codes the values, and you need to map it with color = <that variable> inside aes() in the layer that draws the dots with CE. Apparently this is not possible with the output of the coefplot.lm function. You might be able to change the colors using ggplot2's ggplot_build() function, but I would say it's easier to write your own function for this task.
I've done this once to plot odds. If you want, you may use my code. Feel free to change it. The idea is the same as in coefplot. First, we extract coefficients from a model object and prepare the data set for plotting; second, actually plot.
The code for extracting coefficients and data set preparation
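df_plot_odds <- function(x){
  # Exponentiated coefficients with Wald confidence intervals
  tmp <- data.frame(cbind(exp(coef(x)), exp(confint.default(x))))
  odds <- tmp[-1, ] # drop the intercept
  names(odds) <- c('OR', 'lower', 'upper')
  odds$vars <- row.names(odds)
  # Colour code: blue when OR > 1 (positive effect), red otherwise
  odds$col <- ifelse(odds$OR > 1, 'blue', 'red')
  # "Pr(>|t|)" matches lm output; binomial glm summaries use "Pr(>|z|)"
  odds$pvalue <- summary(x)$coef[-1, "Pr(>|t|)"]
  return(odds)
}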
Plot the output of the extract function
plot_odds <- function(df_plot_odds, xlab="Odds Ratio", ylab="", asp=1){
  require(ggplot2)
  # The aspect ratio is set in theme() below; ggplot() itself has no asp argument
  p <- ggplot(df_plot_odds, aes(x=vars, y=OR, ymin=lower, ymax=upper)) +
    geom_errorbar(aes(color=col), width=0.1) +
    geom_point(aes(color=col), size=3) +
    geom_hline(yintercept = 1, linetype=2) +
    scale_color_manual('Effect', labels=c('Positive','Negative'),
                       values=c('blue','red')) +
    coord_flip() +
    theme_bw() +
    theme(legend.position="none", aspect.ratio = asp) +
    ylab(xlab) +
    xlab(ylab) # switched because of the coord_flip() above
  return(p)
}
Plotting your example
set.seed(123)
dat <- data.frame(x = rnorm(100),y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)
df <- df_plot_odds(mod1)
p <- plot_odds(df) # named p so it does not mask base R's plot()
p
Which yields
Note that I chose theme_bw() as the default. The output is a ggplot2 object, so you can change it quite a lot.
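
If you would rather stay on the coefficient scale, since the question is about negative versus positive estimates, the same idea can be sketched more compactly. This is only an illustration: the data frame coefs and the column direction are names made up here, and the plot is guarded in case ggplot2 is not installed.

```r
set.seed(123)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)

# One row per coefficient (intercept dropped) with a 95% confidence interval
ci <- confint(mod1)[-1, , drop = FALSE]
coefs <- data.frame(
  term     = rownames(ci),
  estimate = unname(coef(mod1)[-1]),
  lower    = ci[, 1],
  upper    = ci[, 2]
)
# The colour-coding variable: sign of the point estimate
coefs$direction <- ifelse(coefs$estimate < 0, "negative", "positive")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(coefs, aes(x = term, y = estimate, ymin = lower, ymax = upper,
                         colour = direction)) +
    geom_hline(yintercept = 0, linetype = 2) +
    geom_pointrange() +
    scale_colour_manual(values = c(negative = "red", positive = "blue")) +
    coord_flip()
  print(p)
}
```

Naming the entries of values ties each colour to a level explicitly, so the mapping does not depend on which signs happen to occur in the model.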

How to plot deviation from mean

In R I have created a simple matrix of one column yielding a list of numbers with a set mean and a given standard deviation.
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
r <- rnorm2(100,4,1)
I now would like to plot how these numbers differ from the mean. I can do this in Excel as shown below:
But I would like to use ggplot2 to create the graph in R. In the Excel graph I have cheated by using a line graph, but it would be better if I could do this as columns. I have tried using a scatter plot, but I can't work out how to turn it into deviations from the mean.
Perhaps you want:
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(100,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
par(las=1,bty="l") ## cosmetic preferences
plot(x, r, col = "green", pch=16) ## draws the points
## if you don't want points at all, use
## plot(x, r, type="n")
## to set up the axes without drawing anything inside them
segments(x0=x, y0=4, x1=x, y1=r, col="green") ## connects them to the mean line
abline(h=4)
If you were plotting around 0 you could do this automatically with type="h":
plot(x,r-4,type="h", col="green")
To do this in ggplot2:
library("ggplot2")
theme_set(theme_bw()) ## my cosmetic preferences
ggplot(data.frame(x,r))+
geom_segment(aes(x=x,xend=x,y=mean(r),yend=r),colour="green")+
geom_hline(yintercept=mean(r))
Ben's answer using ggplot2 works great, but if you don't want to manually adjust the line width, you could do this:
# Half of Ben's data
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(50,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
# New variable for the difference between each value and the mean
value <- r - mean(r)
ggplot(data.frame(x, value)) +
# geom_bar anchors each bar at zero (which is the mean minus the mean)
geom_bar(aes(x, value), stat = "identity"
, position = "dodge", fill = "green") +
# but you can change the y-axis labels with a function, to add the mean back on
scale_y_continuous(labels = function(x) {x + mean(r)})
In base R it's quite simple; just do
plot(r, col = "green", type = "l")
abline(4, 0)
You also tagged ggplot2, so in that case it will be a bit more complicated, because ggplot requires creating a data frame and then melting it.
library(ggplot2)
library(reshape2)
df <- melt(data.frame(x = 1:100, mean = 4, r = r), 1)
ggplot(df, aes(x, value, color = variable)) +
geom_line()
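
For the column version asked about in the question, a rough sketch with geom_col, coloured by whether a value lies above or below the mean, could look like this. The data frame dev and its columns are names chosen here for illustration; the plot is guarded in case ggplot2 is not installed.

```r
rnorm2 <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
set.seed(101)
# scale() returns a one-column matrix, so flatten it to a plain vector
r <- as.numeric(rnorm2(100, 4, 1))

dev <- data.frame(
  x         = seq_along(r),
  deviation = r - mean(r)
)
dev$side <- ifelse(dev$deviation >= 0, "above", "below")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dev, aes(x = x, y = deviation, fill = side)) +
    geom_col() +                    # bars start at 0, i.e. at the mean
    geom_hline(yintercept = 0)
  print(p)
}
```

Because rnorm2 rescales the draws with scale(), the sample mean and standard deviation of r match the requested values up to floating-point error, so the deviations sum to zero.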

Plot decision boundaries with ggplot2?

How do I plot the equivalent of contour (base R) with ggplot2? Below is an example with linear discriminant function analysis:
require(MASS)
iris.lda<-lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
datPred<-data.frame(Species=predict(iris.lda)$class,predict(iris.lda)$x) #create data.frame
#Base R plot
eqscplot(datPred[,2],datPred[,3],pch=as.double(datPred[,1]),col=as.double(datPred[,1])+1)
#Create decision boundaries
iris.lda2 <- lda(datPred[,2:3], datPred[,1])
x <- seq(min(datPred[,2]), max(datPred[,2]), length.out=30)
y <- seq(min(datPred[,3]), max(datPred[,3]), length.out=30)
Xcon <- matrix(c(rep(x,length(y)),
rep(y, rep(length(x), length(y)))),,2) #Set all possible pairs of x and y on a grid
iris.pr1 <- predict(iris.lda2, Xcon)$post[, c("setosa","versicolor")] %*% c(1,1) #posterior probabilities of a point belonging to each class
contour(x, y, matrix(iris.pr1, length(x), length(y)),
levels=0.5, add=T, lty=3,method="simple") #Plot contour lines in the base R plot
iris.pr2 <- predict(iris.lda2, Xcon)$post[, c("virginica","setosa")] %*% c(1,1)
contour(x, y, matrix(iris.pr2, length(x), length(y)),
levels=0.5, add=T, lty=3,method="simple")
#Equivalent plot with ggplot2 but without decision boundaries
ggplot(datPred, aes(x=LD1, y=LD2, col=Species) ) +
geom_point(size = 3, aes(pch = Species))
It is not possible to use a matrix when plotting contour lines with ggplot. The matrix can be rearranged to a data-frame using melt. In the data-frame below the probability values from iris.pr1 are displayed in the first column along with the x and y coordinates in the following two columns. The x and y coordinates form a grid of 30 x 30 points.
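# X1 and X2 are the index columns created by reshape::melt; reshape2::melt names them Var1 and Var2
df <- transform(melt(matrix(iris.pr1, length(x), length(y))), x=x[X1], y=y[X2])[,-c(1,2)]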
I would like to plot the coordinates (preferably connected by a smoothed curve) where the posterior probabilities are 0.5 (i.e. the decision boundaries).
You can use geom_contour in ggplot to achieve a similar effect. As you correctly assumed, you do have to transform your data. I ended up just doing
pr<-data.frame(x=rep(x, length(y)), y=rep(y, each=length(x)),
z1=as.vector(iris.pr1), z2=as.vector(iris.pr2))
And then you can pass that data.frame to the geom_contour and specify you want the breaks at 0.5 with
ggplot(datPred, aes(x=LD1, y=LD2) ) +
geom_point(size = 3, aes(pch = Species, col=Species)) +
geom_contour(data=pr, aes(x=x, y=y, z=z1), breaks=c(0,.5)) +
geom_contour(data=pr, aes(x=x, y=y, z=z2), breaks=c(0,.5))
and that gives
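
The grid data frame relies on a particular ordering: x varies fastest and y slowest, which is also how a matrix with length(x) rows is laid out when flattened. A small sketch with made-up stand-in axes (not the iris data) shows that expand.grid gives the same ordering and that the flattened matrix lines up with it:

```r
x <- seq(-1, 1, length.out = 4)   # stand-ins for the LD1 / LD2 grid axes
y <- seq(0, 2, length.out = 3)

# Grid as built in the answer: x varies fastest, y slowest
grid_manual <- data.frame(x = rep(x, length(y)), y = rep(y, each = length(x)))

# expand.grid() produces the same ordering
grid_eg <- expand.grid(x = x, y = y)

# A matrix with length(x) rows, flattened column by column, matches that ordering
z <- matrix(seq_len(length(x) * length(y)), nrow = length(x), ncol = length(y))
grid_eg$z <- as.vector(z)

# The value stored at row 2, column 3 ends up next to x[2], y[3]
grid_eg[grid_eg$x == x[2] & grid_eg$y == y[3], ]
```

If the grid were built with y varying fastest instead, the contour would come out transposed, so it is worth checking one cell like this before plotting.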
The partimat function in the klaR library does what you want for observed predictors, but if you want the same for the LDA projections, you can build a data frame augmenting the original with the LD1...LDk projections, then call partimat with formula Group~LD1+...+LDk, method='lda' - then you see the "LD-plane" that you intended to see, nicely partitioned for you. This seemed easier to me, at least to explain to students newer to R, since I'm just reusing a function already provided in a way in which it wasn't quite intended.

Plotting a line of best fit from where data starts to where data ends in R

I am trying to plot a line of best fit on my dataset in R:
abline(lm(y~x))
However, the line goes all the way across the entire graph. Is there any way to configure the line so that it only covers the area where the data points are (similar to what you get in Excel)?
Many thanks!
A solution would be to use lines() and have two predictions for both extremes of x.
See this example:
x <- rnorm(20)
y <- 5 + 0.4*x + rnorm(20)/10
dt <- data.frame(x=x, y=y)
ols1 <- lm(y ~ x, data=dt)
nd <- data.frame(x=range(x)) ## generate new data with the two extremes of x
plot(x, y) ## original scatter plot
lines(nd$x, predict(ols1, newdata=nd), col='orange') ## line from two points
I hope that helps.
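
The two-point idea can be wrapped in a small helper if you need it for several models. This is only a sketch: fit_endpoints is a hypothetical name, and it assumes the predictor is called x in the model formula. ggplot2 users get the same behaviour from geom_smooth, which by default draws the fit only over the range of the data; that part is guarded in case the package is not installed.

```r
set.seed(42)
x <- rnorm(20)
y <- 5 + 0.4 * x + rnorm(20) / 10
fit <- lm(y ~ x)

# Hypothetical helper: endpoints of the fitted line over the observed x range
fit_endpoints <- function(model, x) {
  nd <- data.frame(x = range(x))
  nd$y <- unname(predict(model, newdata = nd))
  nd
}

ends <- fit_endpoints(fit, x)
plot(x, y)
segments(ends$x[1], ends$y[1], ends$x[2], ends$y[2], col = "orange")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x, y), aes(x, y)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
  print(p)
}
```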

Plotting CCDF of walking durations

I plotted the CCDF as described in the question part of the post "maximum plot points in R?", which gave the plot in Image1, using this code:
ccdf<-function(duration,density=FALSE)
{
freqs = table(duration)
X = rev(as.numeric(names(freqs)))
Y =cumsum(rev(as.list(freqs)));
data.frame(x=X,count=Y)
}
qplot(x,count,data=ccdf(duration),log='xy')
Now, on the basis of answer by teucer on Howto Plot “Reverse” Cumulative Frequency Graph With ECDF I tried to plot a CCDF using the commands below:
f <- ecdf(duration)
plot(1-f(duration),duration)
I got a plot like image2.
I also read in the comments on one of the answers to "Plotting CDF of a dataset in R?" that the CCDF is nothing but 1-ECDF.
I am totally confused about how to get the CCDF of my data.
Image1
Image2
Generate some data and find the ecdf function.
x <- rlnorm(1e5, 5)
ecdf_x <- ecdf(x)
Generate vector at regular intervals over range of x. (EDIT: you want them evenly spaced on a log scale in this case; if you have negative values, then use sample over a linear scale.)
xx <- seq(min(x), max(x), length.out = 1e4)
#or
log_x <- log(x)
xx <- exp(seq(min(log_x), max(log_x), length.out = 1e3))
Create data with x and y coordinates for plot.
dfr <- data.frame(
x = xx,
ecdf = ecdf_x(xx),
ccdf = 1 - ecdf_x(xx)
)
Draw plot.
p_ccdf <- ggplot(dfr, aes(x, ccdf)) +
geom_line() +
scale_x_log10()
p_ccdf
(Also take a look at aes(x, ecdf).)
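
If you want the CCDF evaluated exactly at the observed values rather than on a grid, a base R sketch could look like the following. Here duration is simulated as a stand-in for the real walking durations. Note that the data values go on the x axis and 1 - ECDF on the y axis, which is the reverse of plot(1-f(duration),duration).

```r
set.seed(1)
duration <- rlnorm(1000, 5)        # stand-in for the walking durations

f <- ecdf(duration)
d <- sort(unique(duration))
ccdf <- 1 - f(d)                   # estimated P(duration > d)

# The CCDF is 0 at the maximum, which a log axis cannot show, so drop it
keep <- ccdf > 0
plot(d[keep], ccdf[keep], log = "xy", type = "s",
     xlab = "duration", ylab = "P(X > x)")
```

The cumulative count in the question's ccdf function counts values greater than or equal to x, whereas 1 - ECDF is the fraction strictly greater than x, so the two differ by more than just the division by n.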
I used ggplot to get the desired CCDF plot of my data, as shown below:
ecdf_x <- ecdf(x)
dfr <- data.frame(x = x, # keep x in the data frame so aes(x, ccdf) finds it there
ecdf = ecdf_x(x),
ccdf = 1 - ecdf_x(x) )
p_ccdf <- ggplot(dfr, aes(x, ccdf)) + geom_line() + scale_x_log10()
p_ccdf
Sorry for posting it so late.
Thank you all!
