Replicate hist breaks in ggplot without a function outside of the plot - r

Histogram breaks
In base plot, when you use the hist function, it automatically calculates the breaks using the nclass.Sturges function. In ggplot, however, you have to provide the breaks yourself.
If I plot a histogram of the classic faithful data, I get the following graph:
data("faithful")
hist(faithful$waiting)
This works
I found out in this question that you can mimic the base hist breaks in ggplot like this:
library(tidyverse)
data("faithful")
brx <- pretty(range(faithful$waiting), n = nclass.Sturges(faithful$waiting), min.n = 1)
ggplot(faithful, aes(waiting)) +
geom_histogram(color="darkgray", fill="white", breaks=brx)
But this does not
But I would like to add the breaks within my ggplot function, so I tried this:
ggplot(faithful, aes(waiting)) +
geom_histogram(color="darkgray", fill="white",
aes(breaks=pretty(range(waiting),
n = nclass.Sturges(waiting),
min.n = 1)))
Which gives me the following error:
Warning: Ignoring unknown aesthetics: breaks
Error: Aesthetics must be either length 1 or the same as the data (272): breaks, x
I understand what it means, but I can put Aesthetics of length one in aes, such as:
ggplot(faithful, aes(waiting)) +
geom_histogram(color="darkgray", fill="white", breaks=brx,
aes(alpha = 0.5))
What am I doing wrong?

geom_histogram uses the same aes as geom_bar according to the documentation and breaks isn't one of those aesthetics. (See geom_bar)
In the working code chunk you pass breaks to the function geom_histogram directly and that works, but in the problematic chunk you pass it as an aesthetic, and ggplot complains.
This works for me and I think does what you want:
ggplot(faithful, aes(x = waiting)) +
  geom_histogram(color = "darkgray", fill = "white",
                 breaks = pretty(range(faithful$waiting),
                                 n = nclass.Sturges(faithful$waiting),
                                 min.n = 1))
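To check that this really reproduces the base behaviour, here is a small sketch in base R only. It assumes hist() is called with its default breaks = "Sturges"; the variable names base_breaks and manual_breaks are just for illustration.

```r
# Compare the breaks hist() picks by default with the pretty() call used above.
data("faithful")
x <- faithful$waiting

# Breaks computed by hist() with its default settings, without drawing a plot.
base_breaks <- hist(x, plot = FALSE)$breaks

# The same calculation written out by hand, as in the ggplot call.
manual_breaks <- pretty(range(x), n = nclass.Sturges(x), min.n = 1)

print(base_breaks)
print(manual_breaks)
print(isTRUE(all.equal(base_breaks, manual_breaks)))
```

If the two vectors agree, passing manual_breaks (or the inline pretty() call) to breaks= in geom_histogram should give the same bins as hist().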

Related

plotting multiple geom-vline in a graph

I am trying to plot two geom_vline() lines in a graph.
The code below works fine for one vertical line:
x=1:7
y=1:7
df1 = data.frame(x=x,y=y)
vertical.lines <- c(2.5)
ggplot(df1,aes(x=x, y=y)) +
geom_line()+
geom_vline(aes(xintercept = vertical.lines))
However, when I add the second desired vertical line by changing
vertical.lines <- c(2.5,4), I get the error:
Error: Aesthetics must be either length 1 or the same as the data (7): xintercept
How do I fix that?
Just remove aes() when you use + geom_vline:
ggplot(df1,aes(x=x, y=y)) +
geom_line()+
geom_vline(xintercept = vertical.lines)
It's not working because everything inside aes() is treated as a mapping onto the data, so it must have either length 1 or the same length as the data (7 rows here). A vector of length 2 satisfies neither.
You should see geom_vline as a layer of annotation on the graph, not like geom_point or geom_line, which map data to the plot. (See here how they are in two different sections).
All the aesthetics need to have either length 1 or the same length as the data, as the error tells you. Values passed outside aes(), as annotations, can have different lengths.
Data:
x=1:7
y=1:7
df1 = data.frame(x=x,y=y)
vertical.lines <- c(2.5,4)
ggplot(df1, aes(x = x, y = y)) +
geom_line() +
sapply(vertical.lines, function(xint) geom_vline(aes(xintercept = xint)))
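Another option, if you do want to keep aes(), is to give the layer its own data. This is only a sketch: the data frame name vlines is made up for the example, and it relies on geom_vline accepting a data argument like other layers do.

```r
library(ggplot2)

df1 <- data.frame(x = 1:7, y = 1:7)

# Hypothetical helper data frame: one row per vertical line.
vlines <- data.frame(xintercept = c(2.5, 4))

p <- ggplot(df1, aes(x = x, y = y)) +
  geom_line() +
  # This layer maps xintercept from its own 2-row data frame, so the length
  # of the aesthetic matches the layer's data rather than df1.
  geom_vline(data = vlines, aes(xintercept = xintercept))

p
```

This keeps the "same length as the data" rule satisfied, because the data for that layer has exactly as many rows as there are lines.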

How do I have the standard errors around regression lines in R with multiple colours in the same plot?

I'm using ggplot to plot several pieces of data on the same graph. Below, each line is a time point and has different color, which I've manually selected using the scale_color_manual function. I'd like to include standard error but have the standard error colour match the color of the regression line.
Below I've changed the colour of the standard error band to red by setting 'fill' in the geom_smooth function, but I don't know how to change it so that each line and its error band match.
ggplot(data, aes(x=log10(x), y=y, color=factor(Time)))+
geom_smooth(method="loess", span=2, fill="red") +
facet_wrap(~Condition)+
scale_color_manual(name="Time",values=c("red","blue","green"))
Set the fill aesthetic. This can be done in the ggplot() call or in the geom_smooth call.
data = data.frame(x = runif(60), y = runif(60),
Time = rep(1:3, 20),
Condition = factor(rep(1:2, 30)))
ggplot(data, aes(x=log10(x), y=y, colour=factor(Time), fill = factor(Time)))+
geom_smooth( method="loess", span=2) +
facet_wrap(~Condition)+
scale_color_manual(name="Time", values=c("red","blue","green")) +
scale_fill_manual(name="Time", values=c("red","blue","green"))
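If you'd rather not repeat the palette twice, a possible variation is to let one manual scale cover both aesthetics. This is a sketch that assumes a ggplot2 version where scale_colour_manual() has the aesthetics argument; the name time_cols and the set.seed() call are only there for the example.

```r
library(ggplot2)

set.seed(1)  # only so the made-up data is reproducible
data <- data.frame(x = runif(60), y = runif(60),
                   Time = rep(1:3, 20),
                   Condition = factor(rep(1:2, 30)))

# One named palette, keyed by the levels of factor(Time).
time_cols <- c("1" = "red", "2" = "blue", "3" = "green")

p <- ggplot(data, aes(x = log10(x), y = y,
                      colour = factor(Time), fill = factor(Time))) +
  geom_smooth(method = "loess", span = 2) +
  facet_wrap(~Condition) +
  # A single scale applied to both colour and fill, so lines and bands match.
  scale_colour_manual(name = "Time", values = time_cols,
                      aesthetics = c("colour", "fill"))
```

Printing p should give one merged legend, since both aesthetics share the same scale name and values.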

Adding shaded target region to ggplot2 barchart

I have two data frames: one I am using to create the bars in a barchart and a second that I am using to create a shaded "target region" behind the bars using geom_rect.
Here is example data:
test.data <- data.frame(crop=c("A","B","C"), mean=c(6,4,12))
target.data <- data.frame(crop=c("ONE","TWO"), mean=c(31,12), min=c(24,9), max=c(36,14))
I start with the means of test.data for the bars and means of target.data for the line in the target region:
library(ggplot2)
a <- ggplot(test.data, aes(y=mean, x=crop)) + geom_hline(aes(yintercept = mean, color = crop), target.data) + geom_bar(stat="identity")
a
So far so good, but then when I try to add a shaded region to display the min-max range of target.data, there is an issue. The shaded region appears just fine, but somehow, the crops from target.data are getting added to the x-axis. I'm not sure why this is happening.
b <- a + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=min, ymax=max, fill = crop), data = target.data, alpha = 0.5)
b
How can I add the geom_rect shapes without adding those extra names to the x-axis of the bar-chart?
This is a solution to your question, but I'd like to better understand your problem because we might be able to make a more interpretable plot. All you have to do is add x = NULL inside the aes() of your geom_rect() call. I took the liberty of renaming target.data to add.data and its 'crop' variable to 'brop' to minimize any confusion.
test.data <- data.frame(crop=c("A","B","C"), mean=c(6,4,12))
add.data <- data.frame(brop=c("ONE","TWO"), mean=c(31,12), min=c(24,9), max=c(36,14))
ggplot(test.data, aes(y = mean, x = crop)) +
  geom_hline(data = add.data, aes(yintercept = mean, color = brop)) +
  geom_bar(stat = "identity") +
  # x = NULL drops the inherited x mapping for this layer only
  geom_rect(data = add.data,
            aes(xmin = -Inf, xmax = Inf, x = NULL, ymin = min, ymax = max, fill = brop),
            alpha = 0.5, show.legend = FALSE)
In ggplot calls, all of the aesthetics in aes() are inherited from the initial call:
ggplot(data, aes(x=foo, y=bar)).
That means that regardless of which layers I add (geom_rect(), geom_hline(), etc.), ggplot is looking for 'foo' to assign to x and 'bar' to assign to y, unless you specifically tell it otherwise. So, as aeosmith pointed out, you can clear all inherited aesthetics for a layer with inherit.aes = FALSE, or you can knock out single variables one at a time by reassigning them as NULL.
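For completeness, here is a rough sketch of the inherit.aes = FALSE variant mentioned above, using the same example data. The hline layer is left out just to keep it short.

```r
library(ggplot2)

test.data <- data.frame(crop = c("A", "B", "C"), mean = c(6, 4, 12))
add.data <- data.frame(brop = c("ONE", "TWO"), mean = c(31, 12),
                       min = c(24, 9), max = c(36, 14))

p <- ggplot(test.data, aes(x = crop, y = mean)) +
  geom_bar(stat = "identity") +
  # inherit.aes = FALSE: this layer ignores x = crop and y = mean from the
  # ggplot() call and uses only the aesthetics listed here.
  geom_rect(data = add.data,
            aes(xmin = -Inf, xmax = Inf, ymin = min, ymax = max, fill = brop),
            inherit.aes = FALSE, alpha = 0.5, show.legend = FALSE)

p
```

The difference from x = NULL is that every inherited mapping is dropped for that layer, so anything the layer needs has to be listed in its own aes().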

Whisker plots to compare mean and variance between clusters [duplicate]

I am trying to recreate a figure from a ggplot2 seminar http://dl.dropbox.com/u/42707925/ggplot2/ggplot2slides.pdf.
In this case, I am trying to generate Example 5, with jittered data points subject to a dodge. When I run the code, the points are centered around the correct line, but have no jitter.
Here is the code directly from the presentation.
set.seed(12345)
hillest<-c(rep(1.1,100*4*3)+rnorm(100*4*3,sd=0.2),
rep(1.9,100*4*3)+rnorm(100*4*3,sd=0.2))
rep<-rep(1:100,4*3*2)
process<-rep(rep(c("Process 1","Process 2","Process 3","Process 4"),each=100),3*2)
memorypar<-rep(rep(c("0.1","0.2","0.3"),each=4*100),2)
tailindex<-rep(c("1.1","1.9"),each=3*4*100)
ex5<-data.frame(hillest=hillest,rep=rep,process=process,memorypar=memorypar, tailindex=tailindex)
stat_sum_df <- function(fun, geom="crossbar", ...) {stat_summary(fun.data=fun, geom=geom, ...) }
dodge <- position_dodge(width=0.9)
p<- ggplot(ex5,aes(x=tailindex ,y=hillest,color=memorypar))
p<- p + facet_wrap(~process,nrow=2) + geom_jitter(position=dodge) +geom_boxplot(position=dodge)
p
In ggplot2 version 1.0.0 there is a new position named position_jitterdodge() that is made for such situations. This position should be used inside geom_point(), and fill= should be set inside aes() to show by which variable to dodge your data. To control the width of the dodging, use the dodge.width= argument.
ggplot(ex5, aes(x=tailindex, y=hillest, color=memorypar, fill=memorypar)) +
facet_wrap(~process, nrow=2) +
geom_point(position=position_jitterdodge(dodge.width=0.9)) +
geom_boxplot(fill="white", outlier.colour=NA, position=position_dodge(width=0.9))
EDIT: There is a better solution with ggplot2 version 1.0.0 using position_jitterdodge. See @Didzis Elferts' answer. Note that dodge.width controls the width of the dodging and jitter.width controls the width of the jittering.
I'm not sure how the code produced the graph in the pdf.
But does something like this get you close to what you're after?
I convert tailindex and memorypar to numeric; add them together; and the result is the x coordinate for the geom_jitter layer. There's probably a more effective way to do it. Also, I'd like to see how dodging geom_boxplot and geom_jitter, and with no jittering, will produce the graph in the pdf.
library(ggplot2)
dodge <- position_dodge(width = 0.9)
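# factor() first, so as.numeric() returns the level codes 1 and 2 (the
# positions on the discrete x axis) even when tailindex is stored as character
ex5$memorypar2 <- as.numeric(factor(ex5$tailindex)) +
  3 * (as.numeric(as.character(ex5$memorypar)) - 0.2)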
p <- ggplot(ex5,aes(x=tailindex , y=hillest)) +
scale_x_discrete() +
geom_jitter(aes(colour = memorypar, x = memorypar2),
position = position_jitter(width = .05), alpha = 0.5) +
geom_boxplot(aes(colour = memorypar), outlier.colour = NA, position = dodge) +
facet_wrap(~ process, nrow = 2)
p

ggplot not showing data

I am trying to make a nice plot with ggplot. However, I do not know why it is not showing data.
Here is some minimal code:
dummylabels <- c("A","B","C")
dummynumbers <- c(1,2,3)
dummy_frame <- data.frame(dummylabels,dummynumbers)
p= ggplot(data=dummy_frame, aes(x =dummylabels , y = dummynumbers)) + geom_bar(fill = "blue")
p + coord_flip() + labs(title = "Title")
I get the following error message, which I cannot make sense of
Error : Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Defunct; last used in version 0.9.2)
Why do I get this error?
From the error message you got:
If you want y to represent values in the data, use stat="identity".
geom_bar expects to be used like a histogram, where it bins the data itself and calculates heights based on frequency. This is the stat="bin" behaviour, and it is the default in the version you are using (current ggplot2 versions call the default stat="count"). It throws an error because you gave it a y value too. To fix it, you want stat="identity":
p <- ggplot(data = dummy_frame, aes(x = dummylabels, y = dummynumbers)) +
geom_bar(fill = "blue", stat = "identity") +
coord_flip() +
labs(title = "Title")
p
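If you are on a more recent ggplot2, geom_col() is a shorthand for the same thing: it draws bars whose heights come straight from the y values. A minimal sketch with the same dummy data, assuming a ggplot2 version that includes geom_col():

```r
library(ggplot2)

dummy_frame <- data.frame(dummylabels = c("A", "B", "C"),
                          dummynumbers = c(1, 2, 3))

# geom_col() uses the y values as bar heights, like geom_bar(stat = "identity").
p <- ggplot(dummy_frame, aes(x = dummylabels, y = dummynumbers)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Title")

p
```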
