How to graph multiple vertically offset density functions in R

I have a time series of univariate distributions that I'd like to visualize more compactly. I know how to add multiple density functions to the same set of axes, but I'd like to vertically offset each function to show the evolution of the distribution through time.

ggplot2 is great for this kind of thing (here I just use the same distribution for every curve). The toy data I used:
library(ggplot2)
df <- stack(setNames(replicate(5, dnorm((-30:30)/10), simplify=FALSE), letters[1:5]))
df$x <- ave(df$values, df$ind, FUN=seq_along)
And the plot, with one panel per distribution stacked vertically:
ggplot(df, aes(x=x, y=values, color=ind)) + geom_line() + facet_wrap(~ ind, ncol=1)
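If you want the curves offset on a single set of axes rather than in separate panels, one option is to add a per-group offset to the y values before plotting. This is a minimal sketch using the same toy data; the `step` size of 0.5 and the `offset`/`y_shifted` column names are arbitrary choices for illustration, not anything ggplot2 requires.

```r
library(ggplot2)

# Toy data as above: five copies of a standard normal density at 61 points each
df <- stack(setNames(replicate(5, dnorm((-30:30) / 10), simplify = FALSE), letters[1:5]))
df$x <- ave(df$values, df$ind, FUN = seq_along)

# Shift each curve up by a fixed amount per group (step = 0.5 is arbitrary)
step <- 0.5
df$offset <- (as.integer(df$ind) - 1) * step
df$y_shifted <- df$values + df$offset

# geom_ribbon fills from each curve's own baseline, giving a ridgeline-style look
p <- ggplot(df, aes(x = x, group = ind, fill = ind)) +
  geom_ribbon(aes(ymin = offset, ymax = y_shifted), colour = "black", alpha = 0.6) +
  labs(y = "density (offset by group)")
print(p)
```

Choosing `step` smaller than the peak height makes neighbouring curves overlap, which is usually what makes this kind of plot compact.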

Related

plotting multiple plots in ggplot2 on same graph that are unrelated

How would one use the smooth.spline() method in a ggplot2 scatterplot?
If my data is in the data frame called data, with two columns, x and y.
The smooth.spline would be sm <- smooth.spline(data$x, data$y). I believe I should use geom_line(), with sm$x and sm$y as the xy coordinates. However, how would one plot a scatterplot and a lineplot on the same graph that are completely unrelated? I suspect it has something to do with the aes() but I am getting a little confused.
You can pass different data frames to different geoms and map the relevant variables with aes(), or you can combine the relevant variables from the output of smooth.spline into one data frame:
# example data
set.seed(1)
dat <- data.frame(x = rnorm(20, 10,2))
dat$y <- dat$x^2 - 20*dat$x + rnorm(20,10,2)
# spline
s <- smooth.spline(dat)
# plot - combine the original x & y and the fitted values returned by
# smooth.spline into a data.frame
library(ggplot2)
ggplot(data.frame(x=s$data$x, y=s$data$y, xfit=s$x, yfit=s$y)) +
  geom_point(aes(x, y)) + geom_line(aes(xfit, yfit))
# or you could use geom_smooth (note: with fewer than 1000 points its default
# method is loess, not a smoothing spline)
ggplot(dat, aes(x, y)) + geom_point() + geom_smooth()
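The first option, giving each geom its own data frame, can be sketched like this with the same example data. `fit` is just an illustrative name for a data frame holding the fitted spline: `s$x` (the distinct x values in increasing order) and `s$y` (the fitted values at those points).

```r
library(ggplot2)

# Same example data as above
set.seed(1)
dat <- data.frame(x = rnorm(20, 10, 2))
dat$y <- dat$x^2 - 20 * dat$x + rnorm(20, 10, 2)

# Fit the spline and keep its fitted curve in a separate data frame
s <- smooth.spline(dat$x, dat$y)
fit <- data.frame(x = s$x, y = s$y)

# Each layer gets its own data; no top-level data or aes() is needed
p <- ggplot() +
  geom_point(data = dat, aes(x = x, y = y)) +
  geom_line(data = fit, aes(x = x, y = y), colour = "blue")
print(p)
```

Because the two data frames are never merged, this also works when smooth.spline() collapses tied x values and the fitted curve has fewer rows than the raw data.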

How to reverse axis order and use a predefined scale in ggplot?

I've read a past post asking about using scale_reverse and scale_log10 at the same time. I have a similar issue, except my scale I'm seeking to "reverse" is a pre-defined scale in the "scales" package. Here is my code:
##Defining y-breaks for probability scale
ybreaks <- c(1,2,5,10,20,30,40,50,60,70,80,90,95,98,99)/100
#Random numbers, and their corresponding weibull probability values (which I'm trying to plot)
x <- c(.3637, .1145, .8387, .9521, .330, .375, .139, .662, .824, .899)
p <- c(.647, .941, .255, .059, .745, .549, .853, .451, .352, .157)
df <- data.frame(x, p)
require(scales)
require(ggplot2)
ggplot(df)+
geom_point(aes(x=x, y=p), size=2)+
stat_smooth(method="lm", se=FALSE, linetype="dashed", aes(x=x, y=p))+
scale_x_continuous(trans='probit',
breaks=ybreaks,
minor_breaks=qnorm(ybreaks))+
scale_y_log10()
For more information, the scale I'm trying to achieve is the probability plotting scale, which has finer resolution on either end of the scale (at 0 and 1) to show extreme events, with ever-decreasing resolution toward the median value (0.5).
I want to be able to use scale_x_reverse concurrently with my scale_x_continuous probability scale, but I don't know how to build that in any sort of custom scale. Any guidance on this?
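Arguments in scale_(x|y)_reverse() are passed to scale_(x|y)_continuous(), so you might try:
scale_x_reverse(trans='probit', breaks = ybreaks, minor_breaks=qnorm(ybreaks))
Be aware, though, that scale_x_reverse() already passes its own reversing transformation to scale_x_continuous(), so depending on your ggplot2 version the extra trans argument may fail with a "matched by multiple actual arguments" error or replace the reversal rather than combine with it.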
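Another way to get a reversed probability axis is to define a single combined transformation with scales::trans_new(): apply the probit (qnorm) and then flip the sign. This is a sketch using the question's data and breaks; `reverse_probit_trans` is a hypothetical name. In ggplot2 3.5.0 and later the `trans` argument is being renamed to `transform`, so you may see a deprecation message.

```r
library(ggplot2)
library(scales)

ybreaks <- c(1, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 98, 99) / 100
x <- c(.3637, .1145, .8387, .9521, .330, .375, .139, .662, .824, .899)
p <- c(.647, .941, .255, .059, .745, .549, .853, .451, .352, .157)
df <- data.frame(x, p)

# Hypothetical combined transform: probit, then negate so the axis runs high to low
reverse_probit_trans <- trans_new(
  name = "reverse_probit",
  transform = function(x) -qnorm(x),
  inverse = function(x) pnorm(-x)
)

plt <- ggplot(df, aes(x = x, y = p)) +
  geom_point(size = 2) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed") +
  scale_x_continuous(trans = reverse_probit_trans, breaks = ybreaks) +
  scale_y_log10()
print(plt)
```

Since the breaks are given in data space (probabilities), ggplot2 maps them through the transform itself, and breaks outside the data range are simply dropped.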
Rather than try to combine two transformations, why not transform your existing data and then plot it?
The following looks like it should be right.
#http://r.789695.n4.nabble.com/Inverse-Error-Function-td802691.html
erf.inv <- function(x) qnorm((x + 1)/2)/sqrt(2)
#http://en.wikipedia.org/wiki/Probit#Computation
probit <- function(x) sqrt(2)*erf.inv((2*x)-1)
# probit(0.3637)
df$z <- probit(df$x)
ggplot(df)+
geom_point(aes(x=z, y=p), size=2)+
stat_smooth(method="lm", se=FALSE, linetype="dashed", aes(x=z, y=p))+
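# z is already on the probit scale, so put the ticks at probit(ybreaks)
# and label them with the original probabilities
scale_x_reverse(breaks = probit(ybreaks), labels = ybreaks)+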
scale_y_log10()
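As a side note, the probit() helper defined above reduces algebraically to base R's qnorm(): sqrt(2) * erf.inv(2x - 1) = sqrt(2) * qnorm(2x/2) / sqrt(2) = qnorm(x). A quick numerical check, so `df$z <- qnorm(df$x)` should give the same result:

```r
erf.inv <- function(x) qnorm((x + 1)/2)/sqrt(2)
probit <- function(x) sqrt(2)*erf.inv((2*x)-1)

# Compare the hand-built probit with the normal quantile function
ps <- c(0.01, 0.1145, 0.3637, 0.5, 0.9521, 0.99)
same <- isTRUE(all.equal(probit(ps), qnorm(ps)))
print(same)
```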

log-scaled density plot: ggplot2 and freqpoly, but with points instead of lines

What I really want to do is plot a histogram with the y-axis on a log scale. Obviously this is a problem with ggplot2's geom_histogram, since the bottom of each bar is at zero, and the log of zero gives you trouble.
My workaround is to use the freqpoly geom, and that more-or less does the job. The following code works just fine:
ggplot(zcoorddist) +
geom_freqpoly(aes(x=zcoord,y=..density..),binwidth = 0.001) +
scale_y_continuous(trans = 'log10')
The issue is that at the edges of my data I get a couple of garish vertical lines that really throw you off visually when combining a bunch of these freqpoly curves in one plot. What I'd like to be able to do is use points at every vertex of the freqpoly curve, with no lines connecting them. Is there a way to do this easily?
The easiest way to get the desired plot is to just recast your data. Then you can use geom_point. Since you don't provide an example, I used the standard example for geom_histogram to show this:
# load packages
library(ggplot2)
library(ggplot2movies)  # the movies data set moved here from ggplot2
library(reshape)
# get data
movies <- as.data.frame(movies[, c("title", "rating")])
# here's the equivalent of your plot
ggplot(movies) + geom_freqpoly(aes(x=rating, y=..density..), binwidth=.001) +
scale_y_continuous(trans = 'log10')
# recast the data
df1 <- recast(movies, value~., measure.var="rating")
names(df1) <- c("rating", "number")
# alternative way to recast data
df2 <- as.data.frame(table(movies$rating))
names(df2) <- c("rating", "number")
df2$rating <- as.numeric(as.character(df2$rating))
# plot
p <- ggplot(df1, aes(x=rating)) + scale_y_continuous(trans="log10", name="density")
# with lines
p + geom_linerange(aes(ymax=number, ymin=.9))
# only points
p + geom_point(aes(y=number))
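If you would rather not reshape the data, another option is to let ggplot2 draw the binned densities (the values geom_freqpoly() connects, minus its zero padding at the ends) as points, by pairing stat_bin() with geom = "point". This sketch uses made-up stand-in data, since the question's zcoorddist isn't available, and after_stat() requires ggplot2 3.3.0 or later. Empty bins have density 0, so on a log scale they become -Inf and ggplot2 drops them with a warning.

```r
library(ggplot2)

# Hypothetical stand-in for the question's zcoorddist data frame
set.seed(42)
zcoorddist <- data.frame(zcoord = rnorm(5000, mean = 0, sd = 0.01))

# stat_bin() computes the binned densities; geom = "point" draws one point
# per bin instead of joining them with lines
p <- ggplot(zcoorddist, aes(x = zcoord)) +
  stat_bin(aes(y = after_stat(density)), geom = "point", binwidth = 0.001) +
  scale_y_log10()
print(p)
```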

Scatterplot with ugly margins when using log scale

I have a somewhat "weird" two-dimensional distribution (not normal with some uniform values, but it roughly looks like this; this is just a minimal reproducible example), and I want to log-transform the values and plot them.
library("ggplot2")
library("scales")
df <- data.frame(x = c(rep(0,200),rnorm(800, 4.8)), y = c(rnorm(800, 3.2),rep(0,200)))
Without the log transformation, the scatterplot (incl. rug plot which I need) works (quite) well, apart from a marginally narrower rug plot on the x axis:
p <- ggplot(df, aes(x, y)) + geom_point() + geom_rug(alpha = I(0.5)) + theme_minimal()
p
When plotting the same with a log10 transform, though, the points at the margin (at x = 0 and y = 0, respectively) are plotted outside the rug plot or right on the axis (with other data, only half of a point is visible).
p + scale_x_log10() + scale_y_log10()
How can I "rescale" the axes so that all the points are contained fully within the grid and the rug plots are unaffected, as in the first example?
Maybe you want
p + scale_x_log10(oob=squish_infinite) + scale_y_log10(oob=squish_infinite)
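For reference, scales::squish_infinite() only touches infinite values: -Inf is replaced by the lower end of `range` and Inf by the upper end (the default range is c(0, 1)), while finite values are left alone even if they fall outside the range. Used as `oob` on a position scale, it receives the scale limits as the range, so the -Inf produced by log10(0) lands at the edge of the data range, inside the axis expansion, instead of on the panel border.

```r
library(scales)

v <- c(-Inf, 0.25, 2, Inf)
# Default range c(0, 1): infinities are squished, the finite 2 is untouched
print(squish_infinite(v))
# With an explicit range, as a scale would supply its limits
print(squish_infinite(v, range = c(-1, 3)))
```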
I don't really know what you expect to happen to values that become negative or infinite after the transformation, but one general piece of advice when transformations don't do what you want is to perform them outside of ggplot2. Something like this might be useful:
library(plyr)
df2 <- colwise(log10)(df) # log transform columns
df2 <- colwise(squish_infinite)(df2) # do something with infinites
p %+% df2 # plot the transformed data
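If you'd rather not load plyr, the same idea can be written with base R's lapply(). This sketch makes one particular choice, mapping the -Inf produced by log10(0) to the smallest finite log value in each column; `log_squish` is a hypothetical helper name, and other choices for the infinite values are equally valid.

```r
library(scales)

# Same example data as in the question
set.seed(1)
df <- data.frame(x = c(rep(0, 200), rnorm(800, 4.8)),
                 y = c(rnorm(800, 3.2), rep(0, 200)))

# Hypothetical helper: log10-transform, then move -Inf/Inf to the finite range
log_squish <- function(v) {
  lv <- suppressWarnings(log10(v))  # log10(0) is -Inf; negative values give NaN
  squish_infinite(lv, range = range(lv[is.finite(lv)]))
}

df2 <- as.data.frame(lapply(df, log_squish))
summary(df2)
```

The result can be plotted with `p %+% df2` exactly as in the plyr version.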

ggplot2 clustering in R

Can someone point me in the right direction for making a plot like this one with ggplot2? Even just the function type would help.
I've been looking around in ggplot2 and can't find anything like this.
I'm assuming that the essential features of the plot are that:
a.) the x-axis is categorical,
b.) the x-positions of the points are varied slightly, and
c.) a summary statistic is shown for each group (I used medians).
If that's what you're looking for,
require(ggplot2)
require(plyr)
#define the data
lev <- gl(2, 10, 20, labels=c("I", "II"))
y <- runif(20)
df <- data.frame(lev, y)
#calculate the medians - I'm guessing that's what the horiz lines are?
meds <- ddply(df, .(lev), summarise, med = median(y))
ggplot(df, aes(x=lev, y=y, colour=lev)) +
geom_point(position=position_jitter(width=0.2, height=0)) +  # jitter x only, so y values stay exact
theme_bw() +
scale_colour_manual(values=c("red", "darkblue")) +
geom_errorbar(data=meds, aes(x=lev, y=med, ymin=med, ymax=med))
You can use annotate() to add the numbers and the little bracket if that is important.
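A sketch of the annotate() idea on the same kind of data: a bracket built from "segment" annotations with a text label above it. The bracket height and the label content (here simply the difference between the two group medians) are placeholders; substitute whatever number you need to show. Medians are computed with base R's aggregate() instead of ddply() so the example runs without plyr.

```r
library(ggplot2)

# Same structure as the data above
set.seed(1)
lev <- gl(2, 10, 20, labels = c("I", "II"))
y <- runif(20)
df <- data.frame(lev, y)

# Group medians (base R equivalent of the ddply() call)
meds <- aggregate(y ~ lev, data = df, FUN = median)
names(meds)[2] <- "med"

top <- max(df$y) + 0.05   # bracket height, chosen arbitrarily just above the data
label <- sprintf("diff = %.2f", diff(meds$med))

p <- ggplot(df, aes(x = lev, y = y, colour = lev)) +
  geom_point(position = position_jitter(width = 0.2, height = 0)) +
  geom_errorbar(data = meds, aes(x = lev, ymin = med, ymax = med, colour = lev),
                inherit.aes = FALSE, width = 0.5) +
  # horizontal bar of the bracket, then its two short vertical ends
  annotate("segment", x = 1, xend = 2, y = top, yend = top) +
  annotate("segment", x = c(1, 2), xend = c(1, 2), y = top - 0.02, yend = top) +
  annotate("text", x = 1.5, y = top + 0.04, label = label) +
  theme_bw() +
  scale_colour_manual(values = c("red", "darkblue"))
print(p)
```

Numeric x positions work in annotate() on a discrete axis because the categories sit at 1, 2, and so on.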
