geom_polygon to draw normal and logistic distributions - r

UPDATE:
I have solved my problem. I was looking for
coord_cartesian(xlim = c(800, 2100), ylim = c(0, 0.0021))
Thanks to every one who tried to help!
QUESTION WAS:
I would like to draw a nice picture of what is the difference between normal and logistic distributions. I have reached that point :
x=seq(1000,2000,length=200)
dat <- data.frame(
norm = dnorm(x,mean=1500,sd=200),
logistic = dlogis(x,location=1500,scale=200), x = x
)
ggplot(data=dat, aes(x=x)) +
geom_polygon(aes(y=norm), fill="red", alpha=0.6) +
geom_polygon(aes(y=logistic), fill="blue", alpha=0.6) +
xlab("") + ylab("") +
opts(title="Logistic and Normal Distributions") +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0))
However the logistic one is "cut" at the bottom. I think what I should do is to draw this distribution from 0 to 3000 for example but show only 1000-2000.
Any clues how to do this?
I tried scale_x_continuous(limits = c(1000, 2000)) but this does not work
UPDATE:
I have updated my code so I have legend, now it looks like this:
x=seq(700,2300,length=200)
dat2 <- data.frame(x=x)
dat2$value <- dnorm(x,mean=1500,sd=200)
dat2$type <- "Normal"
dat1 <- data.frame(x=x)
dat1$value <- dlogis(x,location=1500,scale=200)
dat1$type <- "Logistic"
dat <- rbind(dat1, dat2)
ggplot(data=dat, aes(x=x, y=value, colour=type, fill=type)) + geom_polygon(alpha=0.6) + scale_y_continuous(expand = c(0, 0))

I would draw it using z-scores, from [-2 ; +2]. This has the side benefit that your problem goes away.
x=seq(-2,2,length=200)
dat <- data.frame(
norm = dnorm(x,mean=0,sd=0.2),
logistic = dlogis(x,location=0,scale=0.2), x = x
)
p <- ggplot(data=dat, aes(x=x)) +
geom_polygon(aes(y=norm), fill="red", alpha=0.6) +
geom_polygon(aes(y=logistic), fill="blue", alpha=0.6) +
xlab("z") + ylab("") +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0)) +
opts(title="Logistic and Normal Distributions")
print(p)

The reason it cuts off the bottom is because geom_polygon literally draws the polygon consisting of lines connecting the points you give it. So the flat line across the bottom of the distribution is just connecting the first and last value in your data frame. If you want it to extend to the bottom you can add the appropriate points to your data frame:
ggplot(data=dat, aes(x=x)) +
geom_polygon(aes(y=norm), fill="red", alpha=0.6) +
geom_polygon(data = rbind(c(NA,0,1000),dat,c(NA,0,2000)),aes(y=logistic), fill="blue", alpha=0.6) + xlab("") + ylab("") +
opts(title="Logistic and Normal Distributions")+
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0))
Edited for clarity
You can tinker with this to get it to go down only as far as you want by adding points with the right values. For instance, I forced the logistic distribution to fill all the way down to zero. You could make it level with the normal distribution by rbinding the minimum normal density value instead. Also, be careful where you add them in your data frame. geom_polygon will connect the dots in the order they appear. That's why I added one at the beginning of the data frame and one at the end.
Edit 2
Based on your revised code, my solution still works fine:
x=seq(700,2300,length=200)
dat2 <- data.frame(x=x)
dat2$value <- dnorm(x,mean=1500,sd=200)
dat2$type <- "Normal"
dat1 <- data.frame(x=x)
dat1$value <- dlogis(x,location=1500,scale=200)
dat1$type <- "Logistic"
#Append extra points at the top/bottom to
# complete the polygon
dat1 <- rbind(data.frame(x=700,value=0,type = "Logistic"),dat1,
data.frame(x=2300,value=0,type = "Logistic"))
dat <- rbind(dat1, dat2)
ggplot(data=dat, aes(x=x, y=value, colour=type, fill=type)) +
geom_polygon(alpha=0.6) +
scale_y_continuous(expand = c(0, 0))
And personally, I would prefer this over coord_cartesian, since I'm a stickler about starting my axes from zero.

The solution is to use
+ coord_cartesian(xlim = c(800, 2100), ylim = c(0, 0.0021))

I ran your code, and then analyzed the values of norm and logistic:
Rgames: mystat(dat$logistic)
min max mean median
3.51e-04 1.25e-03 8.46e-04 8.63e-04
sdev skew kurtosis
2.96e-04 -1.33e-01 -1.4
Rgames: mystat(dat$norm)
min max mean median
8.76e-05 1.99e-03 9.83e-04 9.06e-04
sdev skew kurtosis
6.62e-04 1.67e-01 -1.48
So your logistic values are in fact correctly plotted. As the other answers showed, there are preferable ways to create your underlying data.

Related

Facet_wrap and scale="free" unexpectedly centers y-axis at zero in ggplot2

From this dataframe
df <- data.frame(cat=c(rep("X", 20),rep("Y", 20), rep("Z",20)),
value=c(runif(20),runif(20)*100, rep(0, 20)),
var=rep(LETTERS[1:5],12))
i want to create facetted boxplots.
library(ggplot2)
p1 <- ggplot(df, aes(var,value)) + geom_boxplot() + facet_wrap(~cat, scale="free")
p1
The results is aesthetically dissactisfactory as it centers the y-axis of the empty panel at zero. I want to start all y-scales at zero. I tried several answers from this earlier question:
p1 + scale_y_continuous(expand = c(0, 0)) # not working
p1 + expand_limits(y = 0) #not working
p1 + scale_y_continuous(limits=c(0,NA)) ## not working
p1 + scale_y_continuous(limits=c(0,100)) ## partially working, but defeats scale="free"
p1 + scale_y_continuous(limits=c(0,max(df$value))) ## partially working, see above
p1 + scale_y_continuous(limits=c(0,max(df$value))) + expand_limits(y = 0)## partially working, see above
One solution would possibly be to replace the zero's with very tiny values, but maybe you can find a more straightforward solution. Thank you.
A simpler solution would be to pass a function as the limits argument:
p1 <- ggplot(df, aes(var,value)) + geom_boxplot() + facet_wrap(~cat, scale="free") +
scale_y_continuous(limits = function(x){c(0, max(0.1, x))})
The function takes per facet the automatically calculated limits as x argument, where you can apply any transformation on them, such as for example choosing the maximum between 0.1 and the true maximum.
The result is still subject to scale expansion though.
This might be a bit of a work around, but you could use geom_blank() to help set your axis dimension. For example:
df <- data.frame(cat=c(rep("X", 20),rep("Y", 20), rep("Z",20)),
value=c(runif(20),runif(20)*100, rep(0, 20)),
var=rep(LETTERS[1:5],12))
# Use this data frame to set min and max for each category
# NOTE: If the value in this DF is smaller than the max in df it will be overridden
# by the max(df$value)
axisData <- data.frame(cat = c("X", "X", "Y", "Y", "Z", "Z"),
x = 'A', y = c(0, 1, 0, 100, 0, 1))
p1 <- ggplot(df, aes(var,value)) +
geom_boxplot() +
geom_blank(data = axisData, aes(x = x, y = y)) +
facet_wrap(~cat, scale="free")
p1

ggplot2 and regression lines and R^2 values

I know there have been a number of entries with regards to adding R^2 values to plots, but I am having trouble following the codes. I am graphing a scatter plot with three categories. I have added a linear regression line for each one. I would now like to add r^2 values for each but I can't figure out how to do this.
My code:
veg <- read.csv("latandwtall2.csv", header=TRUE)
library("ggplot2")
a <- ggplot(veg, aes(x=avglat, y=wtfi, color=genus)) + geom_point(shape=19, size=4)
b <- a + scale_colour_hue(l=50) + stat_smooth(method = "lm", formula = y ~ x, size = 1, se = FALSE)
c <- b + labs(x="Latitude", y="Weight (g)")
d <- c + theme_bw()
e <- d + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())
#changes size of text
f <- e + theme(
axis.title.x = element_text(color="black", vjust=-0.35, size=15, face="bold"),
axis.title.y = element_text(color="black" , vjust=0.35, size=15, face="bold")
)
g <- e+theme(legend.key=element_rect(fill='white'))
g
Any help with how to add R^2 values would be greatly appreciated. Thanks!
If you build a data frame with the r-squared values, you might be able to (mostly) automate the positioning of the annotation text by including it as a call to geom_text.
Here's a toy example. The rsq data frame is used in geom_text to place the r-squared labels. In this case, I've set it up to put the labels just after the highest x-value and the predict function gets the y-value. It's probably too much work for a single plot, but if you're doing this a lot, you can turn it into a function so that you don't have to repeat the set-up code every time, and maybe add some fancier logic to make label placement more flexible:
library(reshape2) # For melt function
# Fake data
set.seed(12)
x = runif(100, 0, 10)
dat = data.frame(x, y1 = 2*x + 3 + rnorm(100, 0, 5),
y2 = 4*x + 20 + rnorm(100, 0, 10))
dat.m = melt(dat, id.var="x")
# linear models
my1 = lm(y1 ~ x, data=dat)
my2 = lm(y2 ~ x, data=dat)
# Data frame for adding r-squared values to plot
rsq = data.frame(model=c("y1","y2"),
r2=c(summary(my1)$adj.r.squared,
summary(my2)$adj.r.squared),
x=max(dat$x),
y=c(predict(my1, newdata=data.frame(x=max(dat$x))),
predict(my2, newdata=data.frame(x=max(dat$x)))))
ggplot() +
geom_point(data=dat.m, aes(x, value, colour=variable)) +
geom_smooth(data=dat.m, aes(x, value, colour=variable),
method="lm", se=FALSE) +
geom_text(data=rsq, aes(label=paste("r^2 == ", round(r2,2)),
x=1.05*x, y=y, colour=model, hjust=0.5),
size=4.5, parse=TRUE)
I can't really reproduce what you're doing but you need to use annotate()
Something that could work (puting the R2 on the 10th point) would be :
R2 = 0.4
i = 10
text = paste("R-squared = ", R2, sep="")
g = g + annotate("text", x=avglat[i], y=wtfi[i], label=text, font="Calibri", colour="red", vjust = -2, hjust = 1)
Use vjust and hjust to adjust the position of the text to the point (change the i), and just fill the variable R2 with your computed rsquared. You can choose the point you like or manually enter the x,y coordinate it's up to you. Does that help ?
PS. I put extra parameters (font, colours) so that you have the flexibility to change them.
Build the model separately, get the R^2 from there, and add it to the plot. I'll give you some dummy code, but it would be of better quality if you had given us a sample data frame.
r2 = summary(lm(wtfi ~ avglat, data=veg))$r.squared
#to piggyback on Romain's code...
i=10
g = g + annotate("text", x=avglat[i], y=wtfi[i], label=round(r2,2), font="Calibri", colour="red", vjust = -2, hjust = 1)
The way I wrote it here you don't need to hard-code the R^2 value in.

How to highlight an item of time-series in a ggplot2 plot

I wish to highlight segments above or below a certain value in a time series by a unique colour or a shape. In the example data I am decomposing a mortality time series into its components. My goal is to highlight the segments when the mortality in the trend component falls below 35 (deep between 1997 and 2000) and when the residual component is above 100 (the spike). I have tried to use annotate, but that did not produce what I wanted.
#Load library and obtain data
library(gamair)
library(tsModel)
library(ggplot2)
library(reshape2)
data<-data(chicago)
## create variables, decompose TS
chicago$date<-seq(from=as.Date("1987-01-01"), to=as.Date("2000-12-31"),length=5114)
data<- chicago[,c("date","death")]
mort <- tsdecomp(data$death, c(1, 2, 15, 5114))
## Convert matrix to df, rename, melt
df<-as.data.frame(mort)
names(df)[1] <- "Trend"
names(df)[2] <- "Seasonal"
names(df)[3] <- "Residual"
df$date<-seq(as.Date("1987-01-01"), as.Date("2000-12-31"), "day")
meltdf <- melt(df,id="date")
## Plot
ggplot(meltdf,aes(x=date,y=value,colour=variable,group=variable)) + geom_line() +
theme_bw() +
ylab("") + xlab("") +
facet_grid(variable ~ . , scales = "free") +
theme(legend.position = "none")
annotate("rect", xmin=1995-01-01,xmax=1996-01-01,ymin= 10, ymax=300, alpha = .2,fill="blue")
Well, this works but I must admit it's more work that I'd hoped.
get.box <- function(data) {
rng <- range(data$date) + c(-50,50)
z <- meltdf[meltdf$date>=rng[1] & meltdf$date <=rng[2] & meltdf$variable==unique(data$variable),]
data.frame(variable=unique(z$variable),
xmin=min(z$date),xmax=max(z$date),ymin=min(z$value),ymax=max(z$value))
}
hilight.trend <- get.box(with(meltdf,meltdf[variable=="Trend" & value<35,]))
hilight.resid <- get.box(with(meltdf,meltdf[variable=="Residual" & value>100,]))
ggplot(meltdf,aes(colour=variable,group=variable)) +
geom_line(aes(x=date,y=value)) +
theme_bw() +
ylab("") + xlab("") +
facet_grid(variable ~ . , scales = "free") +
theme(legend.position = "none") +
geom_rect(data=hilight.trend, alpha=0.2, fill="red",
aes(xmax=xmax,xmin=xmin,ymax=ymax,ymin=ymin)) +
geom_rect(data=hilight.resid, alpha=0.2, fill="blue",
aes(xmax=xmax,xmin=xmin,ymax=ymax,ymin=ymin))
You can't really use annotate(...) with facets, because you will get the same annotation on all the facets. So you're left with something like geom_rect(...). The problem here is that geom_rect(...) draws a rectangle for every row in the data. So you need to create an auxiliary dataset with just one row for each variable, containing the x- and y- min and max.

ggplot2 - draw logistic distribution with small part of the area colored

I have following code to draw my logistic distribution:
x=seq(-2000,2000,length=1000)
dat <- data.frame(x=x)
dat$value <- dlogis(x,location=200,scale=400/log(10))
dat$type <- "Expected score"
p <- ggplot(data=dat, aes(x=x, y=value)) + geom_line(col="blue", size=1) +
coord_cartesian(xlim = c(-500, 900), ylim = c(0, 0.0016)) +
scale_x_continuous(breaks=c(seq(-500, 800, 100)))
pp <- p + geom_line(aes(x = c(0,0), y = c(0,0.0011)), size=0.9, colour="green", linetype=2, alpha=0.7)
Now what I would like to do is to highlight the area to the left of x = 0.
I tried to do it like this:
x = seq(-500, 0, length=10)
y = dlogis(x,location=200,scale=400/log(10))
pol <- data.frame(x = x, y = y)
pp + geom_polygon(aes(data=pol,x=x, y=y), fill="light blue", alpha=0.6)
But this does not work. Not sure what I am doing wrong. Any help?
I haven't diagnosed the problem with your polygon (although I think you would need to give the full path around the outside, i.e. attach rep(0,length(x)) to the end of y and rev(x) to the end of x), but geom_ribbon (as in Shading a kernel density plot between two points. ) seems to do the trick:
pp + geom_ribbon(data=data.frame(x=x,y=y),aes(ymax=y,x=x,y=NULL),
ymin=0,fill="light blue",alpha=0.5)

Error with ggplot2

I don't know what am I missing in the code?
set.seed(12345)
require(ggplot2)
AData <- data.frame(Glabel=LETTERS[1:7], A=rnorm(7, mean = 0, sd = 1), B=rnorm(7, mean = 0, sd = 1))
TData <- data.frame(Tlabel=LETTERS[11:20], A=rnorm(10, mean = 0, sd = 1), B=rnorm(10, mean = 0, sd = 1))
i <- 2
j <- 3
p <- ggplot(data=AData, aes(AData[, i], AData[, j])) + geom_point() + theme_bw()
p <- p + geom_text(aes(data=AData, label=Glabel), size=3, vjust=1.25, colour="black")
p <- p + geom_segment(data = TData, aes(xend = TData[ ,i], yend=TData[ ,j]),
x=0, y=0, colour="black",
arrow=arrow(angle=25, length=unit(0.25, "cm")))
p <- p + geom_text(data=TData, aes(label=Tlabel), size=3, vjust=1.35, colour="black")
Last line of the code produces the error. Please point me out how to figure out this problem. Thanks in advance.
I have no idea what you are trying to do, but the line that fails is the last line, because you haven't mapped new x and y variables in the mapping. geom_text() needs x and y coords but you only provide the label argument, so ggplot takes x and y from p, which has only 7 rows of data whilst Tlabel is of length 10. That explains the error. I presume you mean to plot at x = A and y = B of TData? If so, this works:
p + geom_text(data=TData, mapping = aes(A, B, label=Tlabel),
size=3, vjust=1.35, colour="black")
(This might get a better answer on the ggplot mailing list.)
It looks like you're trying to display some kind of biplot ... the root of your problem is that you're violating the idiom of ggplot, which wants you to specify variables in a way that's consistent with the scope of the data.
Maybe this does what you want, via some aes_string trickery that substitutes the names of the desired columns ...
varnames <- colnames(AData)[-1]
v1 <- varnames[1]
v2 <- varnames[2]
p <- ggplot(data=AData,
aes_string(x=v1, y=v2)) + geom_point() + theme_bw()
## took out redundant 'data', made size bigger so I could see the labels
p <- p + geom_text(aes(label=Glabel), size=7, vjust=1.25, colour="black")
p <- p + geom_segment(data = TData, aes_string(xend = v1, yend=v2),
x=0, y=0, colour="black",
arrow=arrow(angle=25, length=unit(0.25, "cm")))
## added colour so I could distinguish this second set of labels
p <- p + geom_text(data=TData,
aes(label=Tlabel), size=10, vjust=1.35, colour="blue")

Resources