Adding a regression line to a facet_grid with qplot in R - r

I am trying to add a regression line to this facet_grid, but I seem to be having problems.
qplot(date, data=ua, geom="bar",
weight=count,
ylab="New User Count",
xlab='Date',
main='New Users by Type Last 3 Months',
colour=I('lightgreen'),
fill=I('grey')) +
facet_grid(un_region~role, scales='free', space='free_y') +
opts(axis.text.x =theme_text(angle=45, size=5))
Here's a sample of the data I am working with. Note that the counts need to be summed, which is why I am using weight=count; I'm not sure if there is a better way.
date role name un_region un_subregion us_state count
1 2012-06-21 ENTREPRENEUR Australia Oceania Australia and New Zealand 2
2 2012-06-21 ENTREPRENEUR Belgium Europe Western Europe 1
Thanks.

I'm not sure what you're drawing the slope of, since it seems that you're using a bar plot. There are three ways you can do it; I'll illustrate two. If you have actual xy data and you want to fit individual regression lines by facet, just use stat_smooth(method="lm"). Here's some code:
library(ggplot2)
x <- rnorm(100)
y <- 0.7*x + rnorm(100)
f1 <- as.factor(c(rep("A",50),rep("B",50)))
f2 <- as.factor(rep(c(rep("C",25),rep("D",25)),2))
df <- data.frame(cbind(x,y))
df$f1 <- f1
df$f2 <- f2
ggplot(df,aes(x=x,y=y))+geom_point()+facet_grid(f1~f2)+stat_smooth(method="lm",se=FALSE)
Or you can use geom_abline() to set your own intercept and slope. An example using the diamonds data set, overlaid on a bar plot, may be what you're looking for.
You can see an example with just the scatterplots with this snippet:
ggplot(df,aes(x=x,y=y))+geom_point()+facet_grid(f1~f2)+geom_abline(intercept = 0, slope = 1 )
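For the bar plot in the question, one option is to sum the counts per date and facet first, so that the regression has explicit y values to fit. The following is only a minimal sketch under assumptions: the data frame ua is simulated here, the column names follow the question, and geom_col() with geom_smooth() stands in for the original qplot() call.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (an assumption, not the real data)
set.seed(42)
ua <- data.frame(
  date  = rep(seq(as.Date("2012-06-01"), by = "day", length.out = 30), times = 4),
  role  = rep(c("ENTREPRENEUR", "INVESTOR"), each = 30, times = 2),
  count = rpois(120, lambda = 3)
)

# Sum counts per date within each facet so there is one y value per bar
daily <- aggregate(count ~ date + role, data = ua, FUN = sum)

p <- ggplot(daily, aes(x = date, y = count)) +
  geom_col(fill = "grey", colour = "lightgreen") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # one fit per facet
  facet_grid(. ~ role)
p
```

Aggregating up front replaces weight=count, and the fitted line is then estimated separately inside each facet.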

Related

Autoplot time series - set fixed months/years to plot

library(fpp)
library(forecast)
ausbeer.train <- window(ausbeer, end=c(1999,4))
ausbeer.test <- window(ausbeer, start=c(2000,1))
autoplot(ausbeer.train, xlab="Rok", ylab="beer") +
autolayer(snaive(ausbeer.train, h=32), PI=FALSE, series="snaive") +
autolayer(meanf(ausbeer.train, h=32), PI=FALSE, series="meanf") +
autolayer(ausbeer.test)
What if I wanted to plot only data from 1995 up to 2008? Can I somehow limit the range on the x axis? I don't want to subset my data (as snaive and meanf and probably other methods will need the entire train data), I only need to limit what I draw on the plot.
If p is the value of the autoplot statement in the question then this will plot only 1995 to the end of the series.
library(ggplot2)
p + xlim(1995, NA)
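Note that xlim() drops observations that fall outside the limits, while coord_cartesian() only zooms the view and leaves the data untouched. A minimal sketch of the zoom for the 1995 to 2008 range, assuming a plain ggplot2 line plot over a numeric year axis; the beer data frame below is simulated and merely stands in for the ausbeer plot:

```r
library(ggplot2)

# Hypothetical quarterly series standing in for ausbeer (values are simulated)
set.seed(1)
beer <- data.frame(
  year  = seq(1990, 2010.75, by = 0.25),
  value = 400 + rnorm(84, sd = 20)
)

p <- ggplot(beer, aes(x = year, y = value)) + geom_line()

# Zoom to 1995-2008 without removing any rows from the plotted data
p_zoom <- p + coord_cartesian(xlim = c(1995, 2008))
p_zoom
```

Since autoplot() returns a ggplot object, the same coord_cartesian() call should be addable to the p from the question.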

Barplot with continuous x axis using base r graphics

I am looking to scale the x axis on my barplot to time, so as to accurately represent when measurements were taken.
I have these data frames:
> Botcv
Date Average SE
1 2014-09-01 4.0 1.711307
2 2014-10-02 5.5 1.500000
> Botc1
Date Average SE
1 2014-10-15 2.125 0.7180703
2 2014-11-12 1.000 0.4629100
3 2014-12-11 0.500 0.2672612
> Botc2
Date Average SE
1 2014-10-15 3.375 1.3354708
2 2014-11-12 1.750 0.4531635
3 2014-12-11 0.625 0.1829813
I use this code to produce a grouped barplot:
covaverage <- c(Botcv$Average,NA,NA,NA)
c1average <- c(NA,NA, Botc1$Average)
c2average <- c(NA,NA, Botc2$Average)
date <- c(Botcv$Date, Botc1$Date)
averagematrix <- matrix(c(covaverage,c1average, c2average), nrow=3, ncol=5, byrow=TRUE)
barplot(averagematrix,date, xlab="Date", ylab="Average", axis.lty=1, space=NULL,width=3,beside=T, ylim=c(0.00,6.00))
R plots the bars equal distances apart by default and I have been trying to find a workaround for this. I have seen several other solutions that utilise ggplot2 but I am producing plots for my masters thesis and would like to keep the appearance of my barplots in line with other graphs that I have created using base R graphics. I also want to add error bars to the plot. If anyone could provide a solution then I would be very grateful!! Thanks!
Perhaps you can use this as a start. It is probably easier to use boxplots, as they can be put at a given x position by using the at argument. For base barplots this cannot be done, but you can use rect() instead to replicate the barplot look. Error bars can be added using arrows() or segments().
bar_w = 1 # width of bars
offset = c(-1,1) # offset to avoid overlapping
cols = grey.colors(2) # colors for different types
# combine into a single data frame
d = data.frame(rbind(Botc1, Botc2), 'type' = c(1,1,1,2,2,2))
# set up empty plot with sensible x and y lims
plot(as.Date(d$Date), d$Average, type='n', ylim=c(0,4))
# draw data of data frame 1 and 2
for (i in unique(d$type)){
  dd = d[d$type==i, ]
  x = as.Date(dd$Date)
  y = dd$Average
  # rectangles
  rect(xleft=x-bar_w+offset[i], ybottom=0, xright=x+bar_w+offset[i], ytop=y, col=cols[i])
  # error bars
  arrows(x0=x+offset[i], y0=y-0.5*dd$SE, x1=x+offset[i], y1=y+0.5*dd$SE, col=1, angle=90, code=3, length = 0.1)
}
If all you want is a theme that matches the look of base graphics, adding + theme_bw() in ggplot2 will achieve this:
data(mtcars)
require(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
theme_bw()
Alternatively, in base graphics:
boxplot(mpg~cyl,data=mtcars)
If, as you said, the only thing you want to achieve is a similar look, and you already have a working plot in ggplot2, using theme_bw() should produce plots that look very close to what the standard plotting mechanism would give. If you feel so inclined, you may tweak details such as font sizes, the thickness of graph borders, or the display of outliers.

plotCI: how to overlay plots of two variables

I am trying to plot populations of predators and of prey over time, with confidence intervals. I can plot these two separately, but how do I plot them on the same graph?
#take mean, number, and create se of prey(d)
d.means=tapply(mydata$prey,mydata$week, mean)
d.n=tapply(mydata$prey,mydata$week, length)
d.se=tapply(mydata$prey,mydata$week, sd)/sqrt(d.n)
#plot with se using plotrix
plotCI(as.numeric(row.names(d.means)),d.means,d.se,ylim=c(0,400),pch=19,gap=0,xlab="Week",ylab="d, w population")
#take mean, number, and create se of predator(w)
w.means=tapply(mydata$pred,mydata$week, mean)
w.n=tapply(mydata$pred,mydata$week, length)
w.se=tapply(mydata$pred,mydata$week, sd)/sqrt(w.n)
#plot with se using plotrix
plotCI(as.numeric(row.names(w.means)),w.means,w.se,ylim=c(0,400),pch=19,gap=0,xlab="Week",ylab="d, w population")
After the first plot, use the code below before plotting the next plot:
par(new=T)
Make sure that you set the same xlim and ylim in both calls, wide enough to accommodate both plots. In the second call you will also need the options axes=F and ann=F, so that the axes and labels are not drawn twice.
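The par(new=T) approach can be sketched as follows. This is only an illustration in base graphics: the weekly means and standard errors are made-up values, and arrows() is used for the error bars instead of plotCI() so that the example runs without plotrix.

```r
# Simulated weekly means and standard errors (assumed values, for illustration)
week    <- 1:5
d.means <- c(300, 250, 200, 220, 260); d.se <- c(20, 15, 18, 12, 25)
w.means <- c(50, 80, 120, 100, 70);    w.se <- c(8, 10, 12, 9, 7)

# Shared limits so both layers use the same coordinate system
xl <- range(week)
yl <- c(0, 400)

# First layer: prey, drawn with axes and labels
plot(week, d.means, pch = 19, xlim = xl, ylim = yl,
     xlab = "Week", ylab = "d, w population")
arrows(week, d.means - d.se, week, d.means + d.se,
       angle = 90, code = 3, length = 0.05)

# Second layer: predators, drawn on top without redrawing axes or labels
par(new = TRUE)
plot(week, w.means, pch = 1, xlim = xl, ylim = yl, axes = FALSE, ann = FALSE)
arrows(week, w.means - w.se, week, w.means + w.se,
       angle = 90, code = 3, length = 0.05)
```

plotCI() in plotrix also has an add argument, so a second call with add=TRUE may be a simpler route than par(new=T).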
These graphical features are discussed in detail in the ebook "R Fundamentals & Graphics". You might want to use it as a desk reference.
#take mean, number, and create se of prey(d)
d.means=tapply(mydata$prey,mydata$week, mean)
d.n=tapply(mydata$prey,mydata$week, length)
d.se=tapply(mydata$prey,mydata$week, sd)/sqrt(d.n)
#take mean, number, and create se of predator(w)
w.means=tapply(mydata$pred,mydata$week, mean)
w.n=tapply(mydata$pred,mydata$week, length)
w.se=tapply(mydata$pred,mydata$week, sd)/sqrt(w.n)
Here you have created all the variables you need, but to plot them using ggplot you need them to be in a tall dataset with a variable indicating whether they are predator or prey. I also added a time variable; I think yours would be week.
x=data.frame(means=c(w.means,d.means),
n=c(w.n,d.n),
se=c(w.se,d.se),
role=c(rep("pred",length(w.n)),rep("prey",length(d.n))),
time=c(1:length(w.n),1:length(d.n))
)
I don't know exactly what your data look like so here is a fake one I cooked up just to illustrate the format.
means n se role time
1 0.9874234 10 0.16200575 pred 1
2 1.4120207 12 0.08895026 pred 2
3 2.7352516 8 0.07991036 pred 3
4 1.1301248 11 0.05481813 prey 1
5 2.4810040 13 0.28682585 prey 2
6 3.1546947 9 0.22126054 prey 3
Once the data are in this nice format using ggplot is really pretty easy.
ggplot(x, aes(x=time, y=means, colour=role)) +
geom_errorbar(aes(ymin=means-se, ymax=means+se), width=.1) +
geom_line()

Create faceted graph in R, keeping other points as greyed out

I have some data (AllPCA) that is divided by site. I have used qplot (PC1, PC2, data=AllPCA, colour=Population, facets=~Population) + scale_colour_manual (values=cbbPalette) to facet a scatterplot of two variables by site.
Example AllPCA:
ID PC1 PC2 Population
Syd1 0.0185 0.0426 Sydney
Was1 0.0167 0.0415 Washington
Rea1 0.0182 0.0431 Reading
Aar1 0.0183 0.0427 Aarhus
This works fine, but only gives the data from each site in each of the windows.
I would like to create the same plot, but keeping the rest of the data in each facetted plot, just greyed out. Can you help?
One way would be to use two geom_point() calls. In the first, I use data=AllPCA[,-4], which is your data without the Population column, and set color="grey". Because this layer has no facetting variable, all points are plotted in grey in every facet. Then I add a second geom_point() with the full data and color=Population. This adds only the points belonging to each Population to the corresponding facet, in separate colors (when facet_wrap() is used).
ggplot()+
geom_point(data=AllPCA[,-4],aes(PC1,PC2),color="grey")+
geom_point(data=AllPCA,aes(PC1,PC2,color=Population))+
facet_wrap(~Population)
Duplicate your data several times:
n <- length(unique(AllPCA[["Population"]]))
dat <- do.call(rbind, rep(list(AllPCA), n))
Create new columns for (a) facetting and (b) colour:
dat[["Population2"]] <- rep(unique(AllPCA[["Population"]]), each = nrow(AllPCA))
dat[["PopulationMatch"]] <- with(dat, Population == Population2)
Plot:
library(ggplot2)
qplot(PC1, PC2, data = dat, colour = PopulationMatch, facets = ~ Population2) +
scale_colour_manual(values = c("grey", "black"))

Stack different plots in a facet manner

To practise with ggplot and to improve my skills in writing R functions, I decided to build a series of functions that produce survival plots, with all kinds of extras. I managed to build a good working function for the basic survival plot, and now I am getting to the extras. One thing I would like to add is an option that stacks an area plot of the number at risk at a given time point on top of the survival plot. I would like it to look just like the facet_grid option of ggplot, but I did not manage to do it with this function. I do not want the two plots bound together, as we can do with grid.arrange, but rather to have them share the same x-axis.
The following code produces the two (simplified) plots that I would like to stack on top of each other. I tried to do this with facet_grid, but I don't think the solution lies there.
library(survival)
library(ggplot2)
data(lung)
s <- survfit(Surv(time, status) ~ 1, data = lung)
dat <- data.frame(time = c(0, s$time),
surv = c(1, s$surv),
nr = c(s$n, s$n.risk))
pl1 <- ggplot(dat, aes(time, surv)) + geom_step()
pl2 <- ggplot(dat, aes(time, nr)) + geom_area()
First, melt your data to long format.
library(reshape2)
dat.long<-melt(dat,id.vars="time")
head(dat.long)
time variable value
1 0 surv 1.0000000
2 5 surv 0.9956140
3 11 surv 0.9824561
4 12 surv 0.9780702
5 13 surv 0.9692982
6 15 surv 0.9649123
Then use subset() so that geom_step() gets only the surv data and geom_area() gets only the nr data. With facet_grid(), each layer ends up in its own facet, because variable is used both for subsetting and for facetting. scales="free_y" gives each facet its own y-axis range.
ggplot()+geom_step(data=subset(dat.long,variable=="surv"),aes(time,value))+
geom_area(data=subset(dat.long,variable=="nr"),aes(time,value))+
facet_grid(variable~.,scales="free_y")
