Why is geom_text() plotting the text several times? - r

Please consider the following minimal example:
library(ggplot2)
library(ggrepel)
ggplot(mtcars) +
aes(x = mpg, y = qsec) +
geom_line() +
geom_text(x = 20, y = 20, label = "(20,20)")
I guess you can see pretty easily that the text "(20,20)" is heavily overplotted (actually, I don't know whether that's the correct word. I mean that the text is plotted several times at one location).
If I use annotate(), this does not happen:
ggplot(mtcars) +
aes(x = mpg, y = qsec) +
geom_line() +
annotate("text", x = 20, y = 20, label = "(20,20)")
"So, why don't you use annotate() then?" you might ask. Actually, I don't want to use text for annotation but labels. And I also want to use the {ggrepel} package to avoid overplotting. But look what happens, when I try this:
ggplot(mtcars) +
aes(x = mpg, y = qsec) +
geom_line() +
geom_label_repel(x = 20, y = 20, label = "(20,20)")
Again, many labels are plotted and {ggrepel} does a good job at preventing them from overlapping. But I want only one label pointing at a specific location. I really don't understand why this happens. I only supplied one value for x, y and label each. I also tried data = NULL and inherit.aes = F and putting the values into aes() within geom_label_repel() to no effect. I suspect that there are as many labels as there are rows in mtcars. For my real application that's really bad because I have a lot of rows in the respective dataset.
Could you help me out here and maybe give a short explanation why this happens and why your solution works? Thanks a lot!

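geom_text() draws the label once for every row of the plot data, so the 32 rows of mtcars give 32 copies stacked on top of each other. Add check_overlap = TRUE to geom_text() to suppress any label that overlaps one already drawn, which leaves a single visible copy.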
library(ggplot2)
ggplot(mtcars) +
aes(x = mpg, y = qsec) +
geom_line() +
geom_text(x = 20, y = 20, label = "(20,20)", check_overlap = TRUE)

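geom_text() and geom_label_repel() add one label per row of the layer's data, and a layer without its own data inherits the plot's data (here, the 32 rows of mtcars). You can therefore supply a separate dataset to the annotation geom so that it draws a single label. For example: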
library(ggplot2)
library(ggrepel)
ggplot(mtcars, aes(mpg, qsec)) +
geom_line() +
geom_label_repel(aes(20, 20, label = "(20,20)"), data.frame())
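To see why a separate dataset helps, here is a minimal sketch that counts the rows each text layer actually draws. It uses geom_text() from ggplot2 only, on the assumption that geom_label_repel() from {ggrepel} takes the same data and mapping arguments; the names dup, single and ann are made up for illustration.

```r
library(ggplot2)

# Layer inherits mtcars, so the constant label is repeated once per row
dup <- ggplot(mtcars, aes(mpg, qsec)) +
  geom_line() +
  geom_text(x = 20, y = 20, label = "(20,20)")

# One-row data frame (hypothetical name) that belongs to the annotation layer only
ann <- data.frame(x = 20, y = 20, label = "(20,20)")

single <- ggplot(mtcars, aes(mpg, qsec)) +
  geom_line() +
  geom_text(data = ann, aes(x = x, y = y, label = label))

# layer_data() returns the data a layer draws; layer 2 is the text layer
n_dup    <- nrow(layer_data(dup, 2))
n_single <- nrow(layer_data(single, 2))
```

Comparing n_dup with n_single shows how many labels each version draws. Swapping geom_text() for geom_label_repel() in the second plot should give one repelled label.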

Related

Coloring subset of lines in a plot using ggplot

I am trying to connect the dots on my plot using geom_path(). I also want to color certain lines(intervals) based on a group variable(t). This is what I have so far:
ggplot(data, aes(x=x, y=x)) +
geom_point() +
geom_path(color=t)
What this does is it "incorrectly" connects the points based on this group. I just want the correct connecting lines to have a separate color.
Could any one help me with this?
Since you did not share your data: you could be experiencing an edge case that occurs when you color by a boolean, e.g., by whether a variable equals a specific value.
In this case, ggplot splits your geom_path into one group per value of var == x and connects the points within each group separately. You can prevent this by adding group = 1.
Basic (somewhat contrived) example
ggplot(mtcars) +
geom_point(aes(mpg, hp)) +
geom_path(aes(mpg, hp))
Above plot with color = cyl == 4
ggplot(mtcars) +
geom_point(aes(mpg, hp)) +
geom_path(aes(mpg, hp, color = cyl == 4))
Above plot with group = 1
ggplot(mtcars) +
geom_point(aes(mpg, hp)) +
geom_path(aes(mpg, hp, color = cyl == 4, group = 1))
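As a rough check of the grouping explanation above, this sketch counts the groups ggplot2 assigns to the path layer with and without group = 1. The helper n_groups() is a made-up name, not part of ggplot2.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(mpg, hp)) + geom_point()

# Colouring by a logical creates one group per value, so two separate paths
split_path  <- base + geom_path(aes(colour = cyl == 4))

# group = 1 keeps every point on one path; colour still varies per segment
single_path <- base + geom_path(aes(colour = cyl == 4, group = 1))

# Hypothetical helper: number of distinct groups in the path layer (layer 2)
n_groups <- function(p) length(unique(layer_data(p, 2)$group))

groups_split  <- n_groups(split_path)
groups_single <- n_groups(single_path)
```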
If you pass either a single color (not what you want) or a vector of colors with one entry per row of the data, you can get ggplot to color the lines for you. So, for instance,
data <- data.frame(x = 1:10, y = 1:10)
ggplot(data, aes(x=x, y=y)) +
geom_point() +
geom_path(color=rainbow(10))

How to label stacked histogram in ggplot

I am trying to add corresponding labels to the color in the bar in a histogram. Here is a reproducible code.
ggplot(aes(displ),data =mpg) + geom_histogram(aes(fill=class),binwidth = 1,col="black")
This code gives a histogram and give different colors for the car "class" for the histogram bars. But is there any way I can add the labels of the "class" inside corresponding colors in the graph?
The inbuilt functions geom_histogram and stat_bin are perfect for quickly building plots in ggplot. However, if you are looking to do more advanced styling it is often required to create the data before you build the plot. In your case you have overlapping labels which are visually messy.
The following code builds a binned frequency table for the dataframe:
library(ggplot2) # provides the mpg dataset
library(plyr) # provides ddply() and .()
# Subset data
mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)
# Bin Data
breaks <- 1
cuts <- seq(0.5, 8, breaks)
mpg_df$bin <- .bincode(mpg_df$displ, cuts)
# Count the rows in each class/bin combination
mpg_df <- ddply(mpg_df, .(class, bin), nrow)
names(mpg_df) <- c("class", "bin", "Freq")
You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
position=position_stack(vjust=0.5), colour="black")
I don't think it makes a lot of sense duplicating the labels, but it may be more useful showing the frequency of each group:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
position=position_stack(vjust=0.5), colour="black")
Update
I realised you can actually filter labels selectively using the computed variable ..count.. (written after_stat(count) in ggplot2 3.3.0 and later). No need to preformat the data!
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", position=position_stack(vjust=0.5), aes(label=ifelse(..count..>4, ..count.., "")))
This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)
This second approach will only work if you want to label the dataset with the counts. If you want to label the dataset by the class or another parameter, you will have to prebuild the data frame using the first method.
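For newer ggplot2 versions, the count-label approach can be sketched with after_stat(), which replaces the ..count.. notation from 3.3.0 onwards. This is an illustrative rewrite rather than the answerer's own code; the threshold of 4 is carried over from the example above.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, fill = class)) +
  geom_histogram(binwidth = 1, colour = "black") +
  # Label each stacked segment with its count, but only when the count exceeds 4
  stat_bin(
    binwidth = 1, geom = "text",
    position = position_stack(vjust = 0.5),
    aes(label = after_stat(ifelse(count > 4, count, "")))
  )

# Labels computed for the text layer (layer 2): "" or the count as text
labs <- layer_data(p, 2)$label
```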
Looking at the examples from the other stackoverflow links you shared, all you need to do is change the vjust parameter.
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", vjust=1.5)
That said, it looks like you have other issues. Namely, the labels stack on top of each other because there aren't many observations at each point. Instead I'd just let people use the legend to read the graph.

How to correctly represent both hline and abline in a legend in ggplot2?

I am trying to create a legend in ggplot2 for hlines and ablines using clues from other similar questions. I am close to getting what I need with the following code (and example image) but I can't seem to get rid of the extra lines crossing the legend icons.
p <- ggplot(mtcars, aes(x = wt, y=mpg, col = factor(cyl))) + geom_point()
p + geom_hline(aes(lty="foo",yintercept=20)) +
geom_hline(aes(lty="bar",yintercept=25)) +
geom_hline(aes(lty="bar",yintercept=30)) +
geom_abline(aes(lty = "regression", intercept = 10 , slope = 1)) +
scale_linetype_manual(name="",values=c(2,3,1))
This behavior in the legend only appears when I include the abline. Without it, both hline appear as intended in the legend.
What am I missing here?
As a secondary point: both hlines (labelled "bar" here) here use the exact same configuration, but have different values for yintercept. I wasn't able to draw both of them with the same command, receiving an error (Error: Aesthetics must be either length 1 or the same as the data (32): linetype, yintercept).
Whenever I copy&paste a command like this, it feels like I'm not doing it right. Is it possible to set two yintercepts, while manually defining the linetype to create a legend?
You can use the show.legend argument in geom_abline():
ggplot(mtcars, aes(x = wt, y=mpg, col = factor(cyl))) + geom_point() +
geom_hline(aes(lty="foo",yintercept=20)) +
geom_hline(aes(lty="bar",yintercept=25)) +
geom_hline(aes(lty="bar",yintercept=30)) +
geom_abline(aes(lty = "regression", intercept = 10 , slope = 1), show.legend = FALSE) +
scale_linetype_manual(name="",values=c(2,3,1) )
If you do not define the data in the ggplot() call, you can define all the hlines in a single command:
ggplot() +
geom_point(aes(x = mtcars$wt, y=mtcars$mpg, col = factor(mtcars$cyl))) +
geom_hline(aes(lty=c("foo", "bar","bar"),yintercept=c(20,25,30))) +
geom_abline(aes(lty = "regression", intercept = 10 , slope = 1), show.legend = FALSE) +
scale_linetype_manual(name="",values=c(2,3,1) )
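Another way to draw several hlines in one command, sketched here as an alternative to the answer above, is to give the reference lines their own small data frame. The names hlines and reg are made up for illustration; the point aesthetics sit inside geom_point() so the line layers never need columns from mtcars.

```r
library(ggplot2)

# Hypothetical data frames describing the reference lines
hlines <- data.frame(y = c(20, 25, 30), type = c("foo", "bar", "bar"))
reg    <- data.frame(type = "regression")

p <- ggplot(mtcars) +
  geom_point(aes(wt, mpg, colour = factor(cyl))) +
  # One row per horizontal line; linetype is mapped so it appears in the legend
  geom_hline(data = hlines, aes(yintercept = y, linetype = type)) +
  # One-row data keeps the abline from being drawn once per mtcars row
  geom_abline(data = reg, aes(intercept = 10, slope = 1, linetype = type),
              show.legend = FALSE) +
  scale_linetype_manual(name = "", values = c(bar = 2, foo = 3, regression = 1))

n_hlines  <- nrow(layer_data(p, 2))
n_ablines <- nrow(layer_data(p, 3))
```

Naming the entries of values ties each linetype to its label explicitly instead of relying on alphabetical order.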

How would you plot a box plot and specific points on the same plot?

We can draw box plot as below:
qplot(factor(cyl), mpg, data = mtcars, geom = "boxplot")
and point as:
qplot(factor(cyl), mpg, data = mtcars, geom = "point")
How would you combine both - but just to show a few specific points(say when wt is less than 2) on top of the box?
If you are trying to plot two geoms with two different datasets (boxplot for mtcars, points for a data.frame of literal values), this is a way to do it that makes your intent clear. This works with the current (Sep 2016) version of ggplot (ggplot2_2.1.0)
library(ggplot2)
ggplot() +
# box plot of mtcars (mpg vs cyl)
geom_boxplot(data = mtcars,
aes(x = factor(cyl), y= mpg)) +
# points of data.frame literal
geom_point(data = data.frame(x = factor(c(4,6,8)), y = c(15,20,25)),
aes(x=x, y=y),
color = 'red')
I threw in color = 'red' for the set of points, so it's easy to distinguish them from the points generated as part of geom_boxplot.
Use + geom_point(...) on your qplot (just add a + geom_point() to get all the points plotted).
To plot selectively just select those points that you want to plot:
n <- nrow(mtcars)
# plot every second point
idx <- seq(1,n,by=2)
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( data=mtcars[idx, ] ) # <-- only the rows in idx; aes is inherited
If you know the points before-hand, you can feed them in directly e.g.:
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( data=data.frame(cyl=c(4,6,8), mpg=c(15,20,25)) ) # plot (4,15),(6,20),...
You can show both by using ggplot() rather than qplot(). The syntax may be a little harder to understand, but you can usually get much more done. If you want to plot both the box plot and the points you can write:
boxpt <- ggplot(data = mtcars, aes(factor(cyl), mpg))
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(aes(factor(cyl), mpg))
I don't know what you mean by only plotting specific points on top of the box, but if you want a cheap (and probably not very smart) way of showing only the points above the upper edge of the box, here it is (ddply(), .() and summarise come from the plyr package, so load it first):
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(data = ddply(mtcars, .(cyl),summarise, mpg = mpg[mpg > quantile(mpg, 0.75)]), aes(factor(cyl), mpg))
Basically it's the same thing except for the data supplied to geom_point is adjusted to include only the mpg numbers in the top quarter of the distribution by cylinder. In general I'm not sure this is good practice because I think people expect to see points beyond the whiskers only, but there you go.
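For the case named in the question (points where wt is less than 2), a minimal sketch is to pass a filtered copy of mtcars to geom_point(). The name light is made up for illustration, and the red colour is only there to make the subset stand out.

```r
library(ggplot2)

# Rows to highlight: cars with wt below 2
light <- subset(mtcars, wt < 2)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  # Same aesthetics as the boxplot, but drawn from the filtered rows only
  geom_point(data = light, colour = "red")

n_points <- nrow(layer_data(p, 2))
```

Because the point layer inherits the plot's aesthetics, any other filter (for example on hp) only requires changing the subset() call.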

What is the simplest method to fill the area under a geom_freqpoly line?

The x-axis is time broken up into time intervals. There is an interval column in the data frame that specifies the time for each row. The column is a factor, where each interval is a different factor level.
Plotting a histogram or line using geom_histogram and geom_freqpoly works great, but I'd like to have a line, like that provided by geom_freqpoly, with the area filled.
Currently I'm using geom_freqpoly like this:
ggplot(quake.data, aes(interval, fill=tweet.type)) + geom_freqpoly(aes(group = tweet.type, colour = tweet.type)) + theme(axis.text.x=element_text(angle=-60, hjust=0, size = 6))
I would prefer to have a filled area, such as that provided by geom_density, but without smoothing the line.
The geom_area has been suggested, is there any way to use a ggplot2-generated statistic, such as ..count.., for the geom_area's y-values? Or, does the count aggregation need to occur prior to using ggplot2?
As stated in the answer, geom_area(..., stat = "bin") is the solution (on ggplot2 2.0.0 and later, a factor x needs stat = "count" rather than stat = "bin"):
ggplot(quake.data, aes(interval)) + geom_area(aes(y = ..count.., fill = tweet.type, group = tweet.type), stat = "bin") + theme(axis.text.x=element_text(angle=-60, hjust=0, size = 6))
Perhaps you want:
geom_area(aes(y = ..count..), stat = "bin")
geom_ribbon can be used to produce a filled area between two lines without needing to explicitly construct a polygon; see its documentation for details.
ggplot(quake.data, aes(interval, fill=tweet.type, group = 1)) + geom_density()
But I don't think this is a meaningful graphic.
I'm not entirely sure what you're aiming for. Do you want a line or bars? You should check out geom_bar for filled bars. Something like:
p <- ggplot(data, aes(x = time, y = count))
p + geom_bar(stat = "identity")
If you want a line filled in underneath then you should look at geom_area which I haven't personally used but it appears the construct will be almost the same.
p <- ggplot(data, aes(x = time, y = count))
p + geom_area()
Hope that helps. Give some more info and we can probably be more helpful.
Actually, I would add an index column (just the row number of the data), use that as x, and then use
p <- ggplot(data, aes(x = index, y = count))
p + geom_bar(stat = "identity") + scale_x_continuous("Intervals",
breaks = data$index, labels = data$intervals)
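On the question of whether the counts have to be aggregated before plotting: one option, sketched below, is to count with table() first and plot the result with geom_area(). The tweets data frame is a made-up stand-in for quake.data, which was not shared, and stat = "identity" is spelled out so the pre-computed counts are used as they are.

```r
library(ggplot2)

# Hypothetical stand-in for quake.data: one row per tweet
set.seed(1)
tweets <- data.frame(
  interval   = factor(sample(c("10:00", "10:05", "10:10", "10:15"), 200, replace = TRUE)),
  tweet.type = sample(c("original", "retweet"), 200, replace = TRUE)
)

# Count rows per interval and type; table() keeps zero-count combinations
counts <- as.data.frame(table(interval = tweets$interval,
                              tweet.type = tweets$tweet.type))

# group is required because a factor x would otherwise split the area per level
p <- ggplot(counts, aes(x = interval, y = Freq,
                        fill = tweet.type, group = tweet.type)) +
  geom_area(stat = "identity") +
  theme(axis.text.x = element_text(angle = -60, hjust = 0, size = 6))

built <- ggplot_build(p)
```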

Resources