I'm trying to plot something like this, where alpha is intended to be a mean of dat$c per bin:
library(ggplot2)
set.seed(1)
dat <- data.frame(a = rnorm(1000), b = rnorm(1000), c = 1/rnorm(1000),
d = as.factor(sample(c(0, 1), 1000, replace=TRUE)))
# plot
p <- ggplot(dat, environment = environment()) +
geom_bin2d(aes(x=a, y=b, alpha=c, fill=d),
binwidth = c(1.0/10, 1.0/10))
but it doesn't look like alpha is correct. Please help
I'm not sure what you're expecting to see, but this will calculate mean(dat$c) in each bin and plot the result.
library(ggplot2)
brks <- seq(-5,5,0.1)
lbls <- brks[-1]-0.05
gg.df <- aggregate(c~cut(a, brks, lbls)+cut(b,brks, lbls)+d,dat,FUN=mean)
names(gg.df)[1:2] <- c("a","b")
gg.df$a <- as.numeric(as.character(gg.df$a))
gg.df$b <- as.numeric(as.character(gg.df$b))
ggplot(gg.df, aes(x=a, y=b, alpha=c, fill=d)) + geom_raster() + coord_fixed()
Edit: Response to OP's comment.
You could try:
dat$c <- with(dat,1/(a^2+b^2))
This makes dat$c inversely proportional to the radius (distance from (0,0) to the point). Now running the same code as above:
gg.df <- aggregate(c~cut(a, brks, lbls)+cut(b,brks, lbls)+d,dat,FUN=mean)
names(gg.df)[1:2] <- c("a","b")
gg.df$a <- as.numeric(as.character(gg.df$a))
gg.df$b <- as.numeric(as.character(gg.df$b))
ggplot(gg.df, aes(x=a, y=b, alpha=c, fill=d)) + geom_raster() + coord_fixed() +
scale_alpha_continuous(trans="log",breaks=10^(0:3))
Produces this, as expected: a plot having tiles with higher alpha (less transparent) near the center.
I needed to use a log scale for alpha because the values range over several orders of magnitude.
Related
I would like to add decile breaks to the following plot.
ggplot(data = data.frame(x = c(-0, 1)), aes(x)) +
stat_function(fun = dexp, n = 101,
args = list(rate=3)
)
As an example of what I would like to achieve, here is another plot but I would like to find a way to stick to stat_function since I can easily change the underlying distribution and parameters and also the plot is always more smooth.
vals <-rexp(1001,rate=3)
dens <- density(vals)
plot(dens)
df <- data.frame(x=dens$x, y=dens$y)
probs <-seq(0,1,0.1)
quantiles <- quantile(vals, prob=probs)
df$quant <- factor(findInterval(df$x,quantiles))
ggplot(df, aes(x,y)) + geom_line() + geom_ribbon(aes(ymin=0, ymax=y, fill=quant)) +
scale_x_continuous(breaks=quantiles) + scale_fill_brewer(guide="none")
This is a bit hacky, uses pexp with some rounding to calculate which decile each x location is in. It doesn't use stat_function but has the same advantages as you mention.
# Set up range and number of x points
x=seq(0,1,l=1e4)
# Create x and y positions with dexp
df <- data.frame(x=x, y=dexp(x,rate=3))
# Identify quantiles using pexp
df$quant <- scales::number(floor(10*pexp(df$x,rate=3))*10, suffix="%")
# Find x positions for breaks
xpos <- aggregate(df, x~quant, min)
# Plot
ggplot(df, aes(x,y)) +
geom_line() +
geom_ribbon(aes(ymin=0, ymax=y, fill=quant)) +
scale_x_continuous(breaks=xpos$x, labels=xpos$quant)
ggplot2 can create a very attractive filled violin plot:
ggplot() + geom_violin(data=data.frame(x=1, y=rnorm(10 ^ 5)),
aes(x=x, y=y), fill='gray90', color='black') +
theme_classic()
I'd like to restrict the fill to the central 95% of the distribution if possible, leaving the outline intact. Does anyone have suggestions on how to accomplish this?
Does this do what you want? It requires some data-processing and the drawing of two violins.
set.seed(1)
dat <- data.frame(x=1, y=rnorm(10 ^ 5))
#calculate for each point if it's central or not
dat_q <- quantile(dat$y, probs=c(0.025,0.975))
dat$central <- dat$y>dat_q[1] & dat$y < dat_q[2]
#plot; one'95' violin and one 'all'-violin with transparent fill.
p1 <- ggplot(data=dat, aes(x=x,y=y)) +
geom_violin(data=dat[dat$central,], color="transparent",fill="gray90")+
geom_violin(color="black",fill="transparent")+
theme_classic()
Edit: the rounded edges bothered me, so here is a second approach. If I were doing this, I would want straight lines. So I did some playing with the density (which is what violin plots are based on)
d_y <- density(dat$y)
right_side <- data.frame(x=d_y$y, y=d_y$x) #note flip of x and y, prevents coord_flip later
right_side$central <- right_side$y > dat_q[1]&right_side$y < dat_q[2]
#add the 'left side', this entails reversing the order of the data for
#path and polygon
#and making x negative
left_side <- right_side[nrow(right_side):1,]
left_side$x <- 0 - left_side$x
density_dat <- rbind(right_side,left_side)
p2 <- ggplot(density_dat, aes(x=x,y=y)) +
geom_polygon(data=density_dat[density_dat$central,],fill="red")+
geom_path()
p2
Just make a selection first. Proof of concept:
df1 <- data.frame(x=1, y=rnorm(10 ^ 5))
df2 <- subset(df1, y > quantile(df1$y, 0.025) & y < quantile(df1$y, 0.975))
ggplot(mapping = aes(x = x, y = y)) +
geom_violin(data = df1, aes(fill = '100%'), color = NA) +
geom_violin(data = df2, aes(fill = '95%'), color = 'black') +
theme_classic() +
scale_fill_grey(name = 'level')
#Heroka gave a great answer. Here is a more general function based on his answer that allows to fill the violin plot according to any ranges (not just quantiles).
violincol <- function(x,from=-Inf,to=Inf,col='grey'){
d <- density(x)
right <- data.frame(x=d$y, y=d$x) #note flip of x and y, prevents coord_flip later
whichrange <- function(r,x){x <= r[2] & x > r[1]}
ranges <- cbind(from,to)
right$col <- sapply(right$y,function(y){
id <- apply(ranges,1,whichrange,y)
if(all(id==FALSE)) NA else col[which(id)]
})
left <- right[nrow(right):1,]
left$x <- 0 - left$x
dat <- rbind(right,left)
p <- ggplot(dat, aes(x=x,y=y)) +
geom_polygon(data=dat,aes(fill=col),show.legend = F)+
geom_path()+
scale_fill_manual(values=col)
return(p)
}
x <- rnorm(10^5)
violincol(x=x)
violincol(x=x,from=c(-Inf,0),to=c(0,Inf),col=c('green','red'))
r <- seq(-5,5,0.5)
violincol(x=x,from=r,to=r+0.5,col=rainbow(length(r)))
ggplot2 can create a very attractive filled violin plot:
ggplot() + geom_violin(data=data.frame(x=1, y=rnorm(10 ^ 5)),
aes(x=x, y=y), fill='gray90', color='black') +
theme_classic()
I'd like to restrict the fill to the central 95% of the distribution if possible, leaving the outline intact. Does anyone have suggestions on how to accomplish this?
Does this do what you want? It requires some data-processing and the drawing of two violins.
set.seed(1)
dat <- data.frame(x=1, y=rnorm(10 ^ 5))
#calculate for each point if it's central or not
dat_q <- quantile(dat$y, probs=c(0.025,0.975))
dat$central <- dat$y>dat_q[1] & dat$y < dat_q[2]
#plot; one'95' violin and one 'all'-violin with transparent fill.
p1 <- ggplot(data=dat, aes(x=x,y=y)) +
geom_violin(data=dat[dat$central,], color="transparent",fill="gray90")+
geom_violin(color="black",fill="transparent")+
theme_classic()
Edit: the rounded edges bothered me, so here is a second approach. If I were doing this, I would want straight lines. So I did some playing with the density (which is what violin plots are based on)
d_y <- density(dat$y)
right_side <- data.frame(x=d_y$y, y=d_y$x) #note flip of x and y, prevents coord_flip later
right_side$central <- right_side$y > dat_q[1]&right_side$y < dat_q[2]
#add the 'left side', this entails reversing the order of the data for
#path and polygon
#and making x negative
left_side <- right_side[nrow(right_side):1,]
left_side$x <- 0 - left_side$x
density_dat <- rbind(right_side,left_side)
p2 <- ggplot(density_dat, aes(x=x,y=y)) +
geom_polygon(data=density_dat[density_dat$central,],fill="red")+
geom_path()
p2
Just make a selection first. Proof of concept:
df1 <- data.frame(x=1, y=rnorm(10 ^ 5))
df2 <- subset(df1, y > quantile(df1$y, 0.025) & y < quantile(df1$y, 0.975))
ggplot(mapping = aes(x = x, y = y)) +
geom_violin(data = df1, aes(fill = '100%'), color = NA) +
geom_violin(data = df2, aes(fill = '95%'), color = 'black') +
theme_classic() +
scale_fill_grey(name = 'level')
#Heroka gave a great answer. Here is a more general function based on his answer that allows to fill the violin plot according to any ranges (not just quantiles).
violincol <- function(x,from=-Inf,to=Inf,col='grey'){
d <- density(x)
right <- data.frame(x=d$y, y=d$x) #note flip of x and y, prevents coord_flip later
whichrange <- function(r,x){x <= r[2] & x > r[1]}
ranges <- cbind(from,to)
right$col <- sapply(right$y,function(y){
id <- apply(ranges,1,whichrange,y)
if(all(id==FALSE)) NA else col[which(id)]
})
left <- right[nrow(right):1,]
left$x <- 0 - left$x
dat <- rbind(right,left)
p <- ggplot(dat, aes(x=x,y=y)) +
geom_polygon(data=dat,aes(fill=col),show.legend = F)+
geom_path()+
scale_fill_manual(values=col)
return(p)
}
x <- rnorm(10^5)
violincol(x=x)
violincol(x=x,from=c(-Inf,0),to=c(0,Inf),col=c('green','red'))
r <- seq(-5,5,0.5)
violincol(x=x,from=r,to=r+0.5,col=rainbow(length(r)))
I'd like to create a faceted plot using ggplot2 in which the minimum limit of the y axis will be fixed (say at 0) and the maximum limit will be determined by the data in the facet (as it is when scales="free_y". I was hoping that something like the following would work, but no such luck:
library(plyr)
library(ggplot2)
#Create the underlying data
l <- gl(2, 10, 20, labels=letters[1:2])
x <- rep(1:10, 2)
y <- c(runif(10), runif(10)*100)
df <- data.frame(l=l, x=x, y=y)
#Create a separate data frame to define axis limits
dfLim <- ddply(df, .(l), function(y) max(y$y))
names(dfLim)[2] <- "yMax"
dfLim$yMin <- 0
#Create a plot that works, but has totally free scales
p <- ggplot(df, aes(x=x, y=y)) + geom_point() + facet_wrap(~l, scales="free_y")
#Add y limits defined by the limits dataframe
p + ylim(dfLim$yMin, dfLim$yMax)
It's not too surprising to me that this throws an error (length(lims) == 2 is not TRUE) but I can't think of a strategy to get started on this problem.
In your case, either of the following will work:
p + expand_limits(y=0)
p + aes(ymin=0)
I'm trying to use ggplot2 to create and label a scatterplot. The variables that I am plotting are both scaled such that the horizontal and the vertical axis are plotted in units of standard deviation (1,2,3,4,...ect from the mean). What I would like to be able to do is label ONLY those elements that are beyond a certain limit of standard deviations from the mean. Ideally, this labeling would be based off of another column of data.
Is there a way to do this?
I've looked through the online manual, but I haven't been able to find anything about defining labels for plotted data.
Help is appreciated!
Thanks!
BEB
Use subsetting:
library(ggplot2)
x <- data.frame(a=1:10, b=rnorm(10))
x$lab <- letters[1:10]
ggplot(data=x, aes(a, b, label=lab)) +
geom_point() +
geom_text(data = subset(x, abs(b) > 0.2), vjust=0)
The labeling can be done in the following way:
library("ggplot2")
x <- data.frame(a=1:10, b=rnorm(10))
x$lab <- rep("", 10) # create empty labels
x$lab[c(1,3,4,5)] <- LETTERS[1:4] # some labels
ggplot(data=x, aes(x=a, y=b, label=lab)) + geom_point() + geom_text(vjust=0)
Subsetting outside of the ggplot function:
library(ggplot2)
set.seed(1)
x <- data.frame(a = 1:10, b = rnorm(10))
x$lab <- letters[1:10]
x$lab[!(abs(x$b) > 0.5)] <- NA
ggplot(data = x, aes(a, b, label = lab)) +
geom_point() +
geom_text(vjust = 0)
Using qplot:
qplot(a, b, data = x, label = lab, geom = c('point','text'))