I need to plot several data points that are defined as
c(x,y, stdev_x, stdev_y)
as a scatter plot with a representation of their 95% confidence limits, for example showing each point with one contour around it. Ideally I'd like to plot an oval around each point, but I don't know how to do it. I was thinking of building samples and plotting them with stat_density2d(), but I would need to limit the number of contours to one and could not figure out how.
require(ggplot2)
n=10000
d <- data.frame(id=rep("A", n),
se=rnorm(n, 0.18,0.02),
sp=rnorm(n, 0.79,0.06) )
g <- ggplot (d, aes(se,sp)) +
scale_x_continuous(limits=c(0,1))+
scale_y_continuous(limits=c(0,1)) +
theme(aspect.ratio=0.6)
g + geom_point(alpha=I(1/50)) +
stat_density2d()
First, save the whole plot as an object (I changed the axis limits).
g <- ggplot (d, aes(se,sp, group=id)) +
scale_x_continuous(limits=c(0,0.5))+
scale_y_continuous(limits=c(0.5,1)) +
theme(aspect.ratio=0.6) +
geom_point(alpha=I(1/50)) +
stat_density2d()
Use ggplot_build() to extract all the information used to draw the plot. The contours are stored in the second element of its data list, data[[2]].
gg<-ggplot_build(g)
str(gg$data)
head(gg$data[[2]])
level x y piece group PANEL
1 10 0.1363636 0.7390318 1 1-1 1
2 10 0.1355521 0.7424242 1 1-1 1
3 10 0.1347814 0.7474747 1 1-1 1
4 10 0.1343692 0.7525253 1 1-1 1
5 10 0.1340186 0.7575758 1 1-1 1
6 10 0.1336037 0.7626263 1 1-1 1
There are 12 contour lines in total. To keep only the outer one, subset the rows with group=="1-1" and overwrite the original data.
gg$data[[2]]<-subset(gg$data[[2]],group=="1-1")
Then use ggplot_gtable() and grid.draw() (from the grid package) to draw the plot.
library(grid)
p1<-ggplot_gtable(gg)
grid.draw(p1)
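Note that the outermost contour of a density plot is simply the lowest density level that was drawn, not a 95% region. If a single contour enclosing roughly 95% of the estimated density is wanted, one possible sketch (my own assumption, not part of the approach above) is to estimate the density with MASS::kde2d() and pick the level whose enclosed share of the mass is 0.95. The names dens, share, lvl and cl are only illustrative.

```r
library(MASS)  # for kde2d()

set.seed(1)
se <- rnorm(10000, 0.18, 0.02)
sp <- rnorm(10000, 0.79, 0.06)

# 2D kernel density estimate on a 100 x 100 grid
dens <- kde2d(se, sp, n = 100)

# Sort the grid densities from highest to lowest and accumulate their share
# of the total mass; the 95% level is where the running share reaches 0.95
dz <- sort(as.vector(dens$z), decreasing = TRUE)
share <- cumsum(dz) / sum(dz)
lvl <- dz[which(share >= 0.95)[1]]

# Contour line(s) at that single level
cl <- contourLines(dens$x, dens$y, dens$z, levels = lvl)

plot(se, sp, pch = ".", col = "grey60")
for (piece in cl) lines(piece$x, piece$y, lwd = 2)
```

Because the level is read off a finite grid, the enclosed mass is only approximately 95%.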
latticeExtra provides panel.ellipse, a lattice panel function that computes and draws a confidence ellipsoid from bivariate data, possibly grouped by a third variable.
Here I draw the 0.65 and 0.95 levels using your data.
library(latticeExtra)
xyplot(sp~se,data=d,groups=id,
par.settings = list(plot.symbol = list(cex = 1.1, pch=16)),
panel = function(x,y,...){
panel.xyplot(x, y,alpha=0.2)
panel.ellipse(x, y, lwd = 2, col="green", robust=FALSE, level=0.65,...)
panel.ellipse(x, y, lwd = 2, col="red", robust=TRUE, level=0.95,...)
})
Looks like the stat_ellipse function that you found is really a great solution, but here's another one (non-ggplot), just for the record, using dataEllipse from the car package.
# some sample data
n=10000
g=4
d <- data.frame(ID = unlist(lapply(letters[1:g], function(x) rep(x, n/g))))
d$x <- unlist(lapply(1:g, function(i) rnorm(n/g, runif(1)*i^2)))
d$y <- unlist(lapply(1:g, function(i) rnorm(n/g, runif(1)*i^2)))
# plot points with 95% normal-probability contour
# default settings...
library(car)
with(d, dataEllipse(x, y, ID, level=0.95, fill=TRUE, fill.alpha=0.1))
# with a little more effort...
# random colours with alpha-blending
d$col <- unlist(lapply(1:g, function (x) rep(rgb(runif(1), runif(1), runif(1), runif(1)),n/g)))
# plot points first
with(d, plot(x,y, col=col, pch="."))
# then ellipses over the top
with(d, dataEllipse(x, y, ID, level=0.95, fill=TRUE, fill.alpha=0.1, plot.points=FALSE, add=TRUE, col=unique(col), ellipse.label=FALSE, center.pch="+"))
Just found the function stat_ellipse() here (and here) and it takes care of this beautifully.
g + geom_point(alpha=I(1/10)) +
stat_ellipse(aes(group=id), color="black")
Different data set, of course:
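For reference, here is a minimal base-R sketch of a normal-theory data ellipse built from the sample mean and covariance, which is roughly the kind of curve these ellipse functions draw. It assumes the data are bivariate normal; data_ellipse() is a hypothetical helper of mine, and the defaults of stat_ellipse() may well differ from it.

```r
# Hypothetical helper: points on the `level` ellipse of bivariate data
data_ellipse <- function(x, y, level = 0.95, n = 100) {
  ctr <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  r <- sqrt(qchisq(level, df = 2))        # radius in Mahalanobis units
  theta <- seq(0, 2 * pi, length.out = n)
  unit <- cbind(cos(theta), sin(theta))   # points on the unit circle
  # chol(S) is upper triangular with t(R) %*% R == S, so unit %*% R maps
  # the circle onto an ellipse with the shape of the covariance
  pts <- sweep(r * unit %*% chol(S), 2, ctr, "+")
  data.frame(x = pts[, 1], y = pts[, 2])
}

set.seed(1)
se <- rnorm(10000, 0.18, 0.02)
sp <- rnorm(10000, 0.79, 0.06)
e <- data_ellipse(se, sp)

plot(se, sp, pch = ".", col = "grey60")
lines(e$x, e$y, lwd = 2)
```

Every point of e sits at the same Mahalanobis distance from the centre, and about 95% of the sample should fall inside the curve.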
I don't know anything about the ggplot2 library, but you can draw ellipses with plotrix. Does this plot look anything like what you're asking for?
library(plotrix)
n=10
d <- data.frame(x=runif(n,0,2),y=runif(n,0,2),seX=runif(n,0,0.1),seY=runif(n,0,0.1))
plot(d$x,d$y,pch=16,ylim=c(0,2),xlim=c(0,2))
draw.ellipse(d$x,d$y,d$seX,d$seY)
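The call above uses the standard deviations themselves as semi-axes. If x and y are assumed to be independent and normally distributed, a 95% region needs the semi-axes scaled by sqrt(qchisq(0.95, 2)), which is about 2.45. Here is a sketch in base R; ellipse_points() is a hypothetical helper, and the second point in pts is made up for illustration.

```r
# Hypothetical helper: outline of the `level` region around one point,
# assuming independent normal errors in x and y
ellipse_points <- function(x, y, sd_x, sd_y, level = 0.95, n = 100) {
  k <- sqrt(qchisq(level, df = 2))   # about 2.45 for level = 0.95
  theta <- seq(0, 2 * pi, length.out = n)
  data.frame(x = x + k * sd_x * cos(theta),
             y = y + k * sd_y * sin(theta))
}

# One row per point: c(x, y, stdev_x, stdev_y)
pts <- data.frame(x = c(0.18, 0.40), y = c(0.79, 0.60),
                  sd_x = c(0.02, 0.03), sd_y = c(0.06, 0.04))

plot(pts$x, pts$y, pch = 16, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "x", ylab = "y")
for (i in seq_len(nrow(pts))) {
  e <- ellipse_points(pts$x[i], pts$y[i], pts$sd_x[i], pts$sd_y[i])
  lines(e$x, e$y)
}
```

If the errors in x and y are correlated, this axis-aligned outline is no longer appropriate and the full covariance matrix is needed.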
Related
I have a function,
x= (z-z^2.5)/(1+2*z-z^2)
y = z-z^2.5
where z is the only variable. How to draw a graph where x-axis shows value of function x, and y-axis shows value of function y as z range from 0 to 5?
You can get a very basic plot by simply following your own instructions.
## z ranges from 0 to 5
z = seq(0,5,0.01)
## x and y are functions of z
x = (z-z^2.5)/(1+2*z-z^2)
y = z-z^2.5
##plot
plot(x,y, pch=20, cex=0.5)
If you want a smooth curve it is a little trickier. There is a discontinuity in the curve at
z = 1 + sqrt(2) ~ 2.414. If you just draw the curve as one piece, you get an unwanted line connecting across the discontinuity. So, in two pieces,
plot(x[1:242],y[1:242], type='l', xlab='x', ylab='y',
xlim=range(x), ylim=range(y))
lines(x[243:501],y[243:501])
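But be careful about interpreting this. Something tricky happens from z=0 to z=1: both x and y are 0 at z=0 and again at z=1, and positive in between, so the curve leaves the origin and then returns to it in a small loop.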
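The split index does not have to be hard-coded. A sketch that locates the zero of the denominator with uniroot() and splits the curve there (den, z0 and left are just illustrative names):

```r
z <- seq(0, 5, 0.01)
den <- function(z) 1 + 2 * z - z^2   # denominator of x
x <- (z - z^2.5) / den(z)
y <- z - z^2.5

# den(0) > 0 and den(5) < 0, so there is a sign change on [0, 5]
z0 <- uniroot(den, c(0, 5))$root     # close to 1 + sqrt(2)
left <- z < z0

# Draw the two branches separately so no line crosses the discontinuity
plot(x[left], y[left], type = "l", xlab = "x", ylab = "y",
     xlim = range(x), ylim = range(y))
lines(x[!left], y[!left])
```

This works for any step size in seq(), as long as no grid point falls exactly on the root.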
Using ggplot2
# z ranges from -1000 to 1000 (The range can be arbitrary)
z = seq(-1000,1000,.25)
# x as a function of z
x = (z-z^2.5) / ((1+2*z)-z^2)
# y as a function of z
y = z-z^2.5
# make a dataframe of x,y and z
df <- data.frame(x=x, y=y, z=z)
# subset the df where z is between 0 and 5
df_5 <- subset(df, (df$z>=0 & df$z<=5))
# plot the graph
library(ggplot2)
ggplot(df_5, aes(x,y))+ geom_point(color="red")
The only additions to @G5W's answer are the subset() of values between 0 and 5 and the use of ggplot2.
In R I have created a simple matrix of one column yielding a list of numbers with a set mean and a given standard deviation.
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
r <- rnorm2(100,4,1)
I now would like to plot how these numbers differ from the mean. I can do this in Excel as shown below:
But I would like to use ggplot2 to create the graph in R. In the Excel graph I have cheated by using a line graph, but it would be better if I could do this as columns. I have tried using a scatter plot, but I can't work out how to turn this into deviations from the mean.
Perhaps you want:
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(100,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
par(las=1,bty="l") ## cosmetic preferences
plot(x, r, col = "green", pch=16) ## draws the points
## if you don't want points at all, use
## plot(x, r, type="n")
## to set up the axes without drawing anything inside them
segments(x0=x, y0=4, x1=x, y1=r, col="green") ## connects them to the mean line
abline(h=4)
If you were plotting around 0 you could do this automatically with type="h":
plot(x,r-4,type="h", col="green")
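The hard-coded 4 works here because rnorm2() standardises the draws with scale() before shifting them, so the sample mean and standard deviation are exactly the requested values rather than only approximately so. A quick check (dev is just an illustrative name):

```r
rnorm2 <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(100, 4, 1)

# r is a one-column matrix carrying scale() attributes
dim(r)
c(mean = mean(r), sd = sd(r))

# as.vector() drops the matrix structure; the deviations sum to zero
dev <- as.vector(r) - mean(r)
sum(dev)
```

For data that were not generated this way, use mean(r) instead of a fixed number.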
To do this in ggplot2:
library("ggplot2")
theme_set(theme_bw()) ## my cosmetic preferences
ggplot(data.frame(x,r))+
geom_segment(aes(x=x,xend=x,y=mean(r),yend=r),colour="green")+
geom_hline(yintercept=mean(r))
Ben's answer using ggplot2 works great, but if you don't want to manually adjust the line width, you could do this:
# Half of Ben's data
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(50,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
# New variable for the difference between each value and the mean
value <- r - mean(r)
ggplot(data.frame(x, value)) +
# geom_bar anchors each bar at zero (which is the mean minus the mean)
geom_bar(aes(x, value), stat = "identity"
, position = "dodge", fill = "green") +
# but you can change the y-axis labels with a function, to add the mean back on
scale_y_continuous(labels = function(x) {x + mean(r)})
In base R it's quite simple, just do
plot(r, col = "green", type = "l")
abline(4, 0)
You also tagged ggplot2, so in that case it will be a bit more complicated, because ggplot requires creating a data frame and then melting it.
library(ggplot2)
library(reshape2)
df <- melt(data.frame(x = 1:100, mean = 4, r = r), 1)
ggplot(df, aes(x, value, color = variable)) +
geom_line()
I'm working with circular data and I wanted to reproduce this kind of plot using ggplot2:
library(circular)
data1 <- rvonmises(1000, circular(0), 10, control.circular=list(units="radians")) ## sample
quantile.circular(data1,c(0.05,.95)) ## for interval
data2 <- mean(data1)
dens <- density(data1, bw=27)
p<-plot(dens, points.plot=TRUE, xlim=c(-1,2.1),ylim=c(-1.0,1.2),
main="Circular Density", ylab="", xlab="")
points(circular(0), plot.info=p, col="blue",type="o")
arrows.circular(c(5.7683795,0.5151433 )) ## confidence interval
arrows.circular(data2, lwd=3) ## circular mean
The thinnest arrows are the extremes of my interval
I suppose blue point is forecast
The third arrow is circular mean
I need circular density
I've been looking for something similar but have not found anything.
Any suggestion?
Thanks
To avoid running in the wrong direction, would you quickly check whether this code is on the right track? The arrows can be added easily using geom_segment(..., arrow = arrow(...)); arrow() comes from the grid package.
EDIT: One remark to the complicated way of attaching density values - ggplot's geom_density does not seem to like coord_polar (at least the way I tried it).
#create some dummy radial data and wrap it in a dataframe
d1<-runif(100,min=0,max=120)
df <- data.frame(d1 = d1)
#estimate kernel density and then derive an approximate function to attach density values to the radial values in the dataframe
data_density <- density(d1)
density_function <- with(data_density, approxfun(x, y, rule=1))
df$density <- density_function(df$d1)
#order dataframe to facilitate geom_line in polar coordinates
df <- df[order(df$density,df$d1),]
#ggplot object
require(ggplot2)
g = ggplot(df,aes(x=d1,y=density))
#Radial observations on unit circle
g = g + geom_point(aes(x=d1,y=min(df$density)))
#Density function
g = g + geom_line()
g = g + ylim(0,max(df$density))
g = g + xlim(0,360)
#polar coordinates
g = g + coord_polar()
g
Uniform random variables sampled from (0,120):
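One caveat with density() on angles is that it treats them as linear, so the estimate does not wrap around from 360 back to 0. A possible workaround, sketched under my own assumptions (angles in degrees, bandwidth chosen from the original data), is to replicate the sample one full turn to each side before estimating. The names ang, wrapped and dens are illustrative.

```r
set.seed(1)
ang <- runif(100, min = 0, max = 120)   # dummy angles in degrees

# Choose the bandwidth from the original sample, not the replicated one
bw <- bw.nrd0(ang)

# Copies shifted by one full turn let kernel mass spill across 0/360
wrapped <- c(ang - 360, ang, ang + 360)
dens <- density(wrapped, bw = bw, from = 0, to = 360, n = 361)

# Only a third of the replicated sample lies in [0, 360), so rescale
dens$y <- 3 * dens$y

plot(dens$x, dens$y, type = "l", xlab = "angle (degrees)", ylab = "density")
```

The resulting x and y vectors can be put in a data frame and drawn with geom_line() and coord_polar() in the same way as above.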
How can one get the following visualization in R (see below)?
Let's consider a simple case of three points.
# Define two vectors
x <- c(12,21,54)
y <- c(2, 7, 11)
# OLS regression
ols <- lm(y ~ x)
# Visualisation
plot(x,y, xlim = c(0,60), ylim =c(0,15))
abline(ols, col="red")
What I want is to draw the vertical distance lines from the OLS line (red line) to the points.
You can do this really nicely with ggplot2
library(ggplot2)
set.seed(1)
x<-1:10
y<-3*x + 2 + rnorm(10)
m<-lm(y ~ x)
yhat<-m$fitted.values
diff<-y-yhat
ggplot(data.frame(x, y, yhat), aes(x=x, y=y))+geom_point()+geom_line(aes(y=yhat))+
geom_segment(aes(xend=x, yend=yhat, color="error"))+
labs(title="regression errors", color="series")
There is a much simpler solution:
segments(x, y, x, predict(ols))
If you construct a matrix of points, you can use apply to plot the lines like this:
Create a matrix of coordinates:
cbind(x,x,y,predict(ols))
# x x y
#1 12 12 2 3.450920
#2 21 21 7 5.153374
#3 54 54 11 11.395706
This can be plotted as:
apply(cbind(x,x,y,predict(ols)),1,function(coords){lines(coords[1:2],coords[3:4])})
This is effectively a for loop running over the rows of the matrix and plotting one line for each row.
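Written out as an explicit loop, with the question's three points, the same idea looks like this (fitted_y is just an illustrative name):

```r
x <- c(12, 21, 54)
y <- c(2, 7, 11)
ols <- lm(y ~ x)
fitted_y <- predict(ols)   # points on the regression line at each x

plot(x, y, xlim = c(0, 60), ylim = c(0, 15))
abline(ols, col = "red")

# One vertical line per observation, from the point to the fitted value
for (i in seq_along(x)) {
  lines(c(x[i], x[i]), c(y[i], fitted_y[i]))
}
```

The signed lengths of these lines, y - fitted_y, are the residuals that resid(ols) returns.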
I have data conditioned on two variables, one major condition and one minor condition. I want an xyplot (lattice) with points and lines (type='b') in one panel, so that the major condition determines the color and the minor condition is used for drawing the lines.
Here is an example that is representative of my problem (see the code below to produce the data frame). d is the major condition, and c is the minor condition.
> dat
x y c d
1 1 0.9645269 a A
2 2 1.4892217 a A
3 3 1.4848654 a A
....
10 10 2.4802803 a A
11 1 1.5606218 b A
12 2 1.5346806 b A
....
98 8 2.0381943 j B
99 9 2.0826099 j B
100 10 2.2799917 j B
The way to get the connecting lines to be conditioned on c is to use groups=c in the plot. Then the way to tell them apart is to use a formula conditioned on d:
xyplot(y~x|d, data=dat, type='b', groups=c)
However, I want the plots in the same panel. Removing the formula condition on d produces one panel, but when groups=d is specified, there are "retrace" lines drawn:
xyplot(y~x, data=dat, type='b', groups=d, auto.key=list(space='inside'))
What I want looks very like the above plot, only without these "retrace" lines.
It's possible to set the colors explicitly in this example, as I know that there are five lines of category 'A' followed by five of category 'B', but this won't easily work for my real problem. In addition, auto.key is useless when setting the colors this way:
xyplot(y~x, data=dat, type='b', groups=c, col=rep(5:6, each=5))
The data:
set.seed(1)
dat <- do.call(
rbind,
lapply(1:10,
function(x) {
firsthalf <- x < 6
data.frame(x=1:10, y=log(1:10 + rnorm(10, .25) + 2 * firsthalf),
c=letters[x],
d=LETTERS[2-firsthalf]
)
}
)
)
The default graphical parameters are taken from the superpose.symbol and superpose.line settings. One solution is to set them using the par.settings argument.
## I compute the color by group
col <-by(dat,dat$c,
FUN=function(x){
v <- ifelse(x$d=='A','darkgreen','orange')
v[1] ## I return one parameter , since I need one color
}
)
xyplot(y~x, data=dat, type='b', groups=c,
auto.key = list(text = levels(factor(dat$d)), points = FALSE),
par.settings=
list(superpose.line = list(col = col), ## color of lines
superpose.symbol = list(col=col), ## colors of points
add.text = list(col=c('darkgreen','orange')))) ## color of text in the legend
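The colour-per-group step can also be written as a small lookup table, which may be easier to extend to more than two major categories. This is a sketch on simplified stand-in data with the same c and d layout; key, pal and line_col are illustrative names.

```r
# Stand-in data: ten minor groups (c), the first five in major group A
dat <- data.frame(
  x = rep(1:10, times = 10),
  c = rep(letters[1:10], each = 10),
  d = rep(c("A", "B"), each = 50)
)

# One row per minor group, in the order lattice uses for the group levels
key <- unique(dat[, c("c", "d")])
key <- key[order(key$c), ]

# Named palette for the major groups, looked up once per minor group
pal <- c(A = "darkgreen", B = "orange")
line_col <- unname(pal[as.character(key$d)])
line_col
```

line_col can then be supplied as the col entry of superpose.line and superpose.symbol in par.settings.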
Does it have to be lattice? In ggplot it is rather easy:
library(ggplot2)
ggplot(dat, aes(x=x,y=y,colour=d)) + geom_line(aes(group=c),size=0.8) + geom_point(shape=1)
This is a quick and dirty example. You can customize the colour of the lines, the legend, the axes, the background, ...