We may want to define some global aes() for a ggplot() graphic but exclude them in some layers. For instance, consider the following example:
foo <- data.frame(x=runif(10),y=runif(10))
bar <- data.frame(x=c(0,1),ymin=c(-.1,.9),ymax=c(.1,1.1))
p <- ggplot(foo,aes(x=x,y=y))+geom_point()
So far everything works. However, adding the ribbon fails:
p <- p + geom_ribbon(data=bar, aes(x=x,ymin=ymin,ymax=ymax), alpha=.1)
# Error: Discrete value supplied to continuous scale
This error happens because y is already defined in the global aes(), which also applies to geom_ribbon(), but bar has no y column.
I have found two ways to avoid the error. One is to remove y=y from the original ggplot(foo,aes(x=x,y=y)), but then I would have to add y=y to the aes() of every layer I draw in the future, which is not good.
The other is to add a fake y column to bar:
bar = cbind(bar, y=0)
p <- p + geom_ribbon(data=bar, aes(x=x,ymin=ymin,ymax=ymax), alpha=.1)
Now it works. However, I don't like doing this, as y is a fake variable. Is there any way to temporarily disable the aes() already defined in ggplot() when calling geom_ribbon()?
As @ErnestA said in the comments, we can unmap an aesthetic by setting it to NULL:
aes(y=NULL,x=x,ymin=ymin,ymax=ymax)
PS: For the legend, you can now override aesthetics with override.aes in guide_legend().
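As a minimal sketch of the NULL approach described above, here is the question's example with the ribbon layer unmapping y. The second variant uses the layer argument inherit.aes = FALSE, which is not mentioned in the original answer but should give the same result by ignoring the global mapping for that layer altogether. The set.seed() call is added only to make the illustration reproducible.

```r
library(ggplot2)

set.seed(42)  # assumption: added only so the random points are reproducible
foo <- data.frame(x = runif(10), y = runif(10))
bar <- data.frame(x = c(0, 1), ymin = c(-0.1, 0.9), ymax = c(0.1, 1.1))

p <- ggplot(foo, aes(x = x, y = y)) + geom_point()

# Option 1: unmap the inherited y for this layer only.
p1 <- p + geom_ribbon(data = bar,
                      aes(x = x, y = NULL, ymin = ymin, ymax = ymax),
                      alpha = 0.1)

# Option 2: ignore the global mapping entirely for this layer.
p2 <- p + geom_ribbon(data = bar,
                      aes(x = x, ymin = ymin, ymax = ymax),
                      inherit.aes = FALSE, alpha = 0.1)
```

Option 1 is handy when only one inherited aesthetic gets in the way; option 2 is simpler when the layer uses a different data frame and shares nothing with the global mapping.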
I am planning to plot a bar plot/clustered column chart of time vs revenue, with a trend line connecting the tops of the bars, for the years 1981 to 1988.
I used this code to read the CSV: read.csv("file_location/Revenue.csv", header = TRUE, sep = ",", dec = ".")
for the plotting: pl <- ggplot(data,aes(x=ï..Year))
and then: pl + geom_bar(color='red',fill='blue').
Unfortunately, I end up with something like this, whereas I'd prefer something like this.
I used only the ggplot2 library in this case; should I use tidyr or dplyr additionally? Am I confusing continuous and discrete variables? Any advice on aesthetic modifications to beautify it, or solutions to this, would be really appreciated, as I am still learning the basics of ggplot and data visualization.
I have added the file in case you want to check it: Revenue.csv
Check the documentation here for some information, but the big change you should make is to use geom_col in place of geom_bar. Your current call specifies an x= aesthetic (what should be on the x axis), but not the y= aesthetic (what should be on the y axis). By default, geom_bar shows the number of cases/observations at each x value, whereas geom_col displays a bar of length y at each x value... so it needs a y aesthetic.
With all that being said, try this:
pl <- ggplot(data,aes(x=ï..Year, y=your.y.column.name)) +
  geom_col(color='red',fill='blue')
As for aesthetics, I might change the color scheme a bit and also the theme, but that's kind of personal preference. My suggestion would be to at least change your color scheme for geom_bar/col. The color= argument specifies the outline of the bars, and fill= is the color of the bars themselves. Your code would give you bright blue bars with a red outline... not awesome. I would also make the bars a bit skinnier by adjusting the width= argument from the default of 0.9 to something smaller. Here is an example with a dummy dataset. Most people (me included) would not want to download someone else's data via a link, sorry.
df <- data.frame(x=1:10, y=1:10)
ggplot(df, aes(x=x, y=y)) +
  geom_col(fill='steelblue', color='black', width=0.5) +
  theme_bw()
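The question also asked for a trend line joining the tops of the bars. A possible sketch, assuming the data has one row per year: because geom_line() and geom_point() inherit the same x and y mapping as geom_col(), they can simply be added as extra layers. The revenue data frame and its numbers below are made-up placeholders for illustration, not the contents of Revenue.csv.

```r
library(ggplot2)

# Placeholder data: the Revenue values are invented for illustration only.
revenue <- data.frame(Year = 1981:1988,
                      Revenue = c(10, 20, 30, 40, 50, 60, 70, 80))

pl <- ggplot(revenue, aes(x = Year, y = Revenue)) +
  geom_col(fill = "steelblue", color = "black", width = 0.5) +
  geom_line(color = "firebrick") +               # line through the bar tops
  geom_point(color = "firebrick") +              # marker on each bar top
  scale_x_continuous(breaks = revenue$Year) +    # one axis tick per year
  theme_bw()
```

Printing pl draws the chart. If Year were stored as a factor or character column, the line layer would additionally need aes(group = 1) so that the points are joined into a single line.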
When using facet_grid(x ~ y) with ggplot2 I've seen in various examples and read in the documentation that the x variable is laid out vertically and the y variable horizontally. However, when I run the following:
set.seed(1)
b = c(rnorm(10000,mean=0,sd=0.5),rnorm(10000,mean=5,sd=0.5),
rnorm(10000,mean=7,sd=0.5),rnorm(10000,mean=10,sd=0.5))
x = c(rep('xL', 20000), rep('xR',20000))
y = c(rep('yL',10000), rep('yR',20000), rep('yL',10000))
foo = data.frame(x=x,y=y,b=b)
ggplot(data=foo, aes(foo$b)) +
geom_histogram(aes(y=..density..),breaks=seq(-5,12,by=.2),col='steelblue',fill='steelblue2') +
geom_density(col='black') +
facet_grid(x ~ y, scales='free_y')
I get the below (sorry for the quality). Even though, from the code above, the distribution with mean 10 is the one with (x,y) of 'xR,yL', it appears in the bottom right quadrant, which has the labels 'xR,yR'. What am I doing wrong?
Change aes(foo$b) to aes(x = b) to make sure the aesthetics are mapping correctly.
You want to make sure ggplot finds the column labelled b in the correct scope, i.e. in the data that it has been passed. For example, ggplot may have rearranged your data when you passed it, so that the vector foo$b no longer lines up with the rows you expect.
I'm not saying this is what happened - just an example of why calling the aesthetic from the correct scope is important.
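For reference, here is a sketch of the question's code with that one change applied, so the column is looked up in the data frame that ggplot splits into facets. It also swaps ..density.. for after_stat(density), which is an assumption that ggplot2 3.3.0 or later is installed; on older versions keep ..density..

```r
library(ggplot2)

set.seed(1)
b <- c(rnorm(10000, mean = 0, sd = 0.5), rnorm(10000, mean = 5, sd = 0.5),
       rnorm(10000, mean = 7, sd = 0.5), rnorm(10000, mean = 10, sd = 0.5))
x <- c(rep("xL", 20000), rep("xR", 20000))
y <- c(rep("yL", 10000), rep("yR", 20000), rep("yL", 10000))
foo <- data.frame(x = x, y = y, b = b)

# aes(x = b) refers to the column by name, so each facet gets its own rows.
p <- ggplot(foo, aes(x = b)) +
  geom_histogram(aes(y = after_stat(density)), breaks = seq(-5, 12, by = 0.2),
                 colour = "steelblue", fill = "steelblue2") +
  geom_density(colour = "black") +
  facet_grid(x ~ y, scales = "free_y")
```

With this mapping, the distribution centred on 10 should land in the panel labelled xR (row) and yL (column), which matches how the data was constructed.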
I am building a barplot with a line connecting two bars in order to show that the asterisk refers to the difference between them:
Most of the plot is built correctly with the following code:
mytbl <- data.frame(
"var" =c("test", "control"),
"mean1" =c(0.019, 0.022),
"sderr"= c(0.001, 0.002)
);
mytbl$var <- relevel(mytbl$var, "test"); # without this will be sorted alphabetically (i.e. 'control', then 'test')
p <-
ggplot(mytbl, aes(x=var, y=mean1)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=mean1-sderr, ymax=mean1+sderr), width=.2)+
scale_y_continuous(labels=percent, expand=c(0,0), limits=c(NA, 1.3*max(mytbl$mean1+mytbl$sderr))) +
geom_text(mapping=aes(x=1.5, y= max(mean1+sderr)+0.005), label='*', size=10)
p
The only thing missing is the line itself. In my very old code, it was supposedly working with the following:
p +
geom_line(
mapping=aes(x=c(1,1,2,2),
y=c(mean1[1]+sderr[1]+0.001,
max(mean1+sderr) +0.004,
max(mean1+sderr) +0.004,
mean1[2]+sderr[2]+0.001)
)
)
But when I run this code now, I get an error: Error: Aesthetics must be either length 1 or the same as the data (2): x, y. By trying different things, I came to an awkward workaround: I add data=rbind(mytbl,mytbl), before mapping but I don't understand what really happens here.
P.S. additional little question (I know, I should ask in a separate SO post, sorry for that) - why in scale_y_continuous(..., limits()) I can't address data by columns and have to call mytbl$ explicitly?
Just put all that in a separate data frame:
line_data <- data.frame(x=c(1,1,2,2),
y=with(mytbl,c(mean1[1]+sderr[1]+0.001,
max(mean1+sderr) +0.004,
max(mean1+sderr) +0.004,
mean1[2]+sderr[2]+0.001)))
p + geom_line(data = line_data,aes(x = x,y = y))
In general, you should avoid using things like [ and $ when you map aesthetics inside of aes(). The intended way to use ggplot2 is usually to adjust your data into a format such that each column is exactly what you want plotted already.
You can't reference variables in mytbl in the scale_* functions because that data environment isn't passed along like it is with layers. The scales are treated separately from the data layers, so the information about them is generally assumed to live somewhere separate from the data you are plotting.
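One way to live with that, sketched here under the assumption that the limit only depends on mytbl: compute the value once as an ordinary R variable and pass it to the scale. The name y_top is hypothetical, geom_col() stands in for geom_bar(stat="identity"), and the labels=percent argument from the question is left out so the example does not depend on the scales package.

```r
library(ggplot2)

mytbl <- data.frame(
  var   = factor(c("test", "control"), levels = c("test", "control")),
  mean1 = c(0.019, 0.022),
  sderr = c(0.001, 0.002)
)

# Computed outside ggplot, because scale_*() never sees the layer data.
y_top <- 1.3 * max(mytbl$mean1 + mytbl$sderr)

p <- ggplot(mytbl, aes(x = var, y = mean1)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean1 - sderr, ymax = mean1 + sderr), width = 0.2) +
  scale_y_continuous(expand = c(0, 0), limits = c(NA, y_top))
```

Setting the factor levels explicitly in data.frame() also fixes the bar order without a separate relevel() step.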
I have been trying to work out a few things about ggplot2, and how supplementary arguments inherit from the first part, ggplot(). Specifically, whether inheritance is passed on beyond the geom_*** part.
I have a histogram of data:
ggplot(data = faithful, aes(eruptions)) + geom_histogram()
Which produces a fine chart, though the breaks are default. It appears to me (an admitted novice), that geom_histogram() is inheriting the data specification from ggplot(). If I want to have a smarter way of setting the breaks I could use a process like so:
ggplot(data = faithful, aes(eruptions)) +
geom_histogram(breaks = seq(from = min(faithful$eruptions),
to = max(faithful$eruptions), length.out = 10))
However, here I am re-specifying within the geom_histogram() function that I want faithful$eruptions. I have been unable to find a way to phrase this without re-specifying. Further, if I use the data = argument in geom_histogram(), and specify just eruptions in min and max, seq() still doesn't understand that I mean the faithful data set.
I know that seq is not part of ggplot2, but I wondered if it might be able to inherit regardless, as it is bound within geom_histogram(), which itself inherits from ggplot(). Am I just using the wrong syntax, or is this possible?
Note that the term you are looking for is not "inheritance" but non-standard evaluation (NSE). ggplot offers a couple of places where you can refer to your data items by their column names instead of a full reference (NSE), but essentially only in the mapping arguments, and even then only when you are using aes(). These work:
ggplot(faithful) + geom_point(aes(eruptions, eruptions))
ggplot(faithful) + geom_point(aes(eruptions, eruptions, size=waiting))
The following doesn't work because we are referring to waiting outside of aes and mapping (note first arg to geom_* is the mapping arg):
ggplot(faithful) + geom_point(aes(eruptions, eruptions), size=waiting)
But this works:
ggplot(faithful) + geom_point(aes(eruptions, eruptions), size=faithful$waiting)
though differently, since size is now interpreted literally instead of being rescaled as it is when part of the mapping.
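A small sketch of that difference, using layer_data() to look at the sizes ggplot actually ends up with for each plot. The variable names mapped and fixed are just labels for this illustration.

```r
library(ggplot2)

# size inside aes(): the waiting values go through the size scale
mapped <- ggplot(faithful) +
  geom_point(aes(eruptions, eruptions, size = waiting))

# size outside aes(): the waiting values are used as point sizes directly
fixed <- ggplot(faithful) +
  geom_point(aes(eruptions, eruptions), size = faithful$waiting)

range(layer_data(mapped)$size)  # rescaled by the default size scale
range(layer_data(fixed)$size)   # the raw waiting values
```

The first range should be small (the scale's output range), while the second follows the raw data, which is why the second plot draws very large points.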
In your case, since breaks is not part of the aes/mapping spec, you can't use NSE and you are left using the full reference. Some possible work-arounds:
ggplot(data = faithful, aes(eruptions)) + geom_histogram(bins=10) # not identical
ggplot(data=faithful, aes(eruptions)) +
  geom_histogram(
    breaks=with(faithful,    # use `with` to look up eruptions in faithful
      seq(from=min(eruptions), to=max(eruptions), length.out=10)
    )
  )
And no-NSE, but a little less typing:
ggplot(data=faithful, aes(eruptions)) +
geom_histogram(
breaks=do.call(seq, c(as.list(range(faithful$eruptions)), len=10))
)
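Another option, sketched here with a hypothetical helper called range_breaks(): wrap the seq() call in a small function so the data column is named only once per call. This is plain R rather than anything ggplot2 provides.

```r
library(ggplot2)

# Hypothetical helper: n equally spaced break points spanning the range of v.
range_breaks <- function(v, n) {
  seq(from = min(v), to = max(v), length.out = n)
}

eruption_breaks <- range_breaks(faithful$eruptions, 10)

p <- ggplot(faithful, aes(eruptions)) +
  geom_histogram(breaks = eruption_breaks)
```

Ten break points give nine bins, so this matches the seq() call in the question rather than bins = 10.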
Based on the ggplot2 documentation, it seems that the + operator, which is really the +.gg function, allows adding the following objects to a ggplot object:
data.frame, uneval, layer, theme, scale, coord, facet
The geom functions create layers, which inherit the data and aes from the ggplot object "above" unless stated otherwise.
However, seq() is an ordinary R function, not one of the objects listed above: it does not create a layer and is never combined with the plot through the + operator, so it inherits nothing from the ggplot object. Its arguments are evaluated immediately in the calling environment (here the global environment), which contains no object named eruptions.
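This can be checked without ggplot2 at all. The sketch below assumes a fresh session in which no global object called eruptions has been created: the bare seq() call fails, while wrapping it in with() supplies the data frame as the place to look the name up.

```r
# faithful ships with base R (datasets package); eruptions is one of its columns.

# Evaluated in the calling environment: eruptions is not found there.
bare <- tryCatch(
  seq(min(eruptions), max(eruptions), length.out = 10),
  error = function(e) conditionMessage(e)
)

# Evaluated with faithful's columns in scope: works.
scoped <- with(faithful, seq(min(eruptions), max(eruptions), length.out = 10))
```

Here bare ends up holding the error message as a string, and scoped holds the ten break points.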