I've become quite fond of boxplots in which jittered points are overlaid on the boxes to show the actual data, as below:
library(ggplot2)
set.seed(7)
l1 <- gl(3, 1, length=102, labels=letters[1:3])
l2 <- gl(2, 51, length=102, labels=LETTERS[1:2]) # Will use this later
y <- runif(102)
d <- data.frame(l1, l2, y)
ggplot(d, aes(x=l1, y=y)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA)
(These are particularly helpful when there are very different numbers of data points in each box.)
I'd like to use this technique when I am also (implicitly) using position_dodge to separate boxplots by a second variable, e.g.
ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA)
However, I can't figure out how to dodge the points by the colour variable (here, l2) and also jitter them.
Here is an approach that manually performs the jittering and dodging.
# a plot with no dodging or jittering of the points
dp <- ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(alpha=0.5) +
geom_boxplot(fill=NA)
# build the plot for rendering
foo <- ggplot_build(dp)
# now replace the 'x' values in the data for layer 1 (unjittered and un-dodged points)
# with the appropriately dodged and jittered points
foo$data[[1]][['x']] <- jitter(foo$data[[2]][['x']][foo$data[[1]][['group']]],amount = 0.2)
# now draw the plot (need to explicitly load grid package)
library(grid)
grid.draw(ggplot_gtable(foo))
# note the following works without explicitly loading grid
plot(ggplot_gtable(foo))
I don't think you'll like it, but I've never found a way around this except to produce your own x values for the points. In this case:
d$l1.num <- as.numeric(d$l1)
d$l2.num <- (as.numeric(d$l2)/3)-(1/3 + 1/6)
d$x <- d$l1.num + d$l2.num
ggplot(d, aes(l1, y, colour = l2)) + geom_boxplot(fill = NA) +
geom_point(aes(x = x), position = position_jitter(width = 0.15), alpha = 0.5) + theme_bw()
It's certainly a long way from ideal, but becomes routine pretty quickly. If anyone has an alternative solution, I'd be very happy!
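If you have more than two groups, the hand-written offsets can be generalised. A minimal sketch, assuming the groups are dodged within a total width of 0.75 (which I believe is what geom_boxplot ends up using by default, so treat it as an assumption) and that every group is present at every x position. `dodge_offset` is a made-up helper name, not a ggplot2 function:

```r
# Hypothetical helper: centre offset of group k (1..n) when n groups
# share a total dodge width of `width` around each x position.
dodge_offset <- function(k, n, width = 0.75) {
  (k - (n + 1) / 2) * width / n
}

l1 <- gl(3, 1, length = 102, labels = letters[1:3])
l2 <- gl(2, 51, length = 102, labels = LETTERS[1:2])

# Numeric x position for each point: category position plus group offset
x <- as.numeric(l1) + dodge_offset(as.numeric(l2), nlevels(l2))
```

The resulting x can then be passed to geom_point(aes(x = x), ...) with position_jitter, as in the answer above.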
The new position_jitterdodge() works for this. However, it requires the fill aesthetic to tell it how to group points, so you have to specify a manual fill to get uncolored boxes:
ggplot(d, aes(x=l1, y=y, colour=l2, fill=l2)) +
geom_point(position=position_jitterdodge(width=0.2), alpha=0.5) +
geom_boxplot() + scale_fill_manual(values=rep('white', length(unique(l2))))
I'm using a newer version of ggplot2 (ggplot2_2.2.1.9000) and struggled to find an answer that worked for a similar plot of my own. @John Didon's answer produced an error for me: Error in position_jitterdodge(width = 0.2) : unused argument (width = 0.2). I also had code using geom_jitter that stopped working after I installed the newer version of ggplot2. This is how I solved it, with minimal fuss:
ggplot(d, aes(x=l1, y=y, colour=l2, fill=l2)) +
geom_point(position = position_jitterdodge(dodge.width = 1,
jitter.width = 0.5), alpha=0.5) +
geom_boxplot(position = position_dodge(width = 1), fill = NA)
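A slightly fuller sketch of the same idea, assuming a reasonably recent ggplot2 in which position_jitterdodge() accepts jitter.width, dodge.width and seed. The `dodge` variable is only there so that both layers share one dodge width, and outlier.shape = NA is my addition to stop outliers being drawn twice (once by the boxplot, once as a jittered point):

```r
library(ggplot2)

set.seed(7)
d <- data.frame(
  l1 = gl(3, 1, length = 102, labels = letters[1:3]),
  l2 = gl(2, 51, length = 102, labels = LETTERS[1:2]),
  y  = runif(102)
)

dodge <- 0.75  # shared by both layers so the points sit over their boxes

p <- ggplot(d, aes(x = l1, y = y, colour = l2, fill = l2)) +
  # Unfilled boxes; outliers hidden because every point is drawn below
  geom_boxplot(position = position_dodge(width = dodge),
               fill = NA, outlier.shape = NA) +
  # Points dodged by group and jittered; seed makes the jitter repeatable
  geom_point(position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = dodge,
                                             seed = 1),
             alpha = 0.5)
p
```

Fixing the seed means the plot looks the same each time it is redrawn.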
Another option would be to use facets:
set.seed(7)
l1 <- gl(3, 1, length=102, labels=letters[1:3])
l2 <- gl(2, 51, length=102, labels=LETTERS[1:2]) # Will use this later
y <- runif(102)
d <- data.frame(l1, l2, y)
ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA) +
facet_grid(.~l2) +
theme_bw()
Sorry, I don't have enough points to post the resulting graph.
This question already has an answer here:
draw straight line between any two point when using coord_polar() in ggplot2 (R)
I am making a polar violin plot. I would like to add lines and labels to the plot to annotate what each spoke means.
I'm running into two problems.
The first is that when I try to create line segments, if x != xend, then the segments are drawn as curves rather than as lines.
For example:
data.frame(
x = rnorm(1000),
spoke = factor(sample(1:6, 1000, replace=T))
) %>%
ggplot(aes(x = spoke, fill=spoke, y = x)) +
geom_violin() +
coord_polar() +
annotate("segment", x=1.1, xend=1.3, y=0, yend=3, color="black", size=0.6) +
theme_minimal()
The second problem that arises occurs when I try to add an annotation between the last spoke and the first. In this case, the annotation causes the coordinate scale to shift, so that spokes are no longer evenly distributed.
See here:
data.frame(
x = rnorm(1000),
spoke = factor(sample(1:5, 1000, replace=T))
) %>%
ggplot(aes(x = spoke, fill=spoke, y = x)) +
geom_violin() +
coord_polar() +
scale_x_discrete(limits = 1:5) +
annotate("segment", x=5.9, xend=5.7, y=0, yend=3, color="black", size=0.6) +
theme_minimal()
Any assistance is greatly appreciated!
(PS: I do understand that there are perceptual issues with plots like these. I have a good reason...)
You want a 'generic annotation' as shown here
You basically have to overlay your plots and not use the layer facility, if you don't want to exactly calculate the distance in radians of each x for each y.
With cowplot
require(ggplot2) #again, you should specify your required packages in your question as well
require(cowplot)
my_dat <- data.frame(x = rnorm(1000),
spoke = factor(sample(1:6, 1000, replace=T)))
my_annot <- data.frame(para = c('start','end'), x = c(0,0.4), y = c(0,0.2))
#first point x/y = c(0,0) because this makes positioning easier
When I edited your question I removed the piping. That was not only a matter of good style: it also makes it much easier to work with the separate plots. So I would suggest dropping the pipe here.
p1 <- ggplot(my_dat, aes(x = spoke, fill=spoke, y = x)) +
geom_violin() +
theme_minimal()+
coord_polar()
p2 <- ggplot(my_annot) +
geom_line(aes(x,y)) +
coord_cartesian(xlim = c(0,2), ylim =c(0,2)) +
# the limits change the length of your line too
theme_void()
ggdraw() +
draw_plot(p1) +
draw_plot(p2, x = 0.55, y = 0.6)
Obviously - you can now play around with both length of your line and its position within draw_plot()
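If you do want to go the other route and calculate the positions yourself, the geometry itself is not too bad. A minimal sketch in base R, assuming angles in radians and radii in plot units (not the data units of the violin plot; the mapping between the two is left out here). `polar_chord` is a made-up helper name, not part of ggplot2 or cowplot:

```r
# Hypothetical helper: sample n points along the straight chord between two
# polar points (theta in radians, r >= 0) and return them in polar form.
polar_chord <- function(theta0, r0, theta1, r1, n = 50) {
  t <- seq(0, 1, length.out = n)
  # Interpolate linearly in Cartesian space, where the segment is straight
  x <- (1 - t) * r0 * cos(theta0) + t * r1 * cos(theta1)
  y <- (1 - t) * r0 * sin(theta0) + t * r1 * sin(theta1)
  # Convert back to polar, keeping each angle within pi of the start angle
  theta <- theta0 + ((atan2(y, x) - theta0 + pi) %% (2 * pi)) - pi
  data.frame(theta = theta, r = sqrt(x^2 + y^2))
}

# Chord between two points on the unit circle, a quarter turn apart
chord <- polar_chord(0, 1, pi / 2, 1, n = 3)
```

In principle the theta/r pairs trace the path a line layer would have to follow to look straight under coord_polar(), once they are rescaled to the x and y ranges of the data. That rescaling is the fiddly part, which is why I'd still overlay the plots.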
I want to recreate an "image" plot in ggplot (because of some other aspects of the package). However, I'm facing a problem caused by my y-scale, which is defined by unequally but logically spaced values, e.g. I would have z values for y = 2,4,8,16,32. This causes the tiles to not be equally large, so I have these white bands in my figure. I can solve this by transforming the y values in a factor, but I don't want to do this because I'm also trying to plot other geom objects on the figure which require a numeric scale.
This clarifies my problem a bit:
# random data, with y scale numeric
d <- data.frame(Var1=rep(1901:2000,10),Var2=rep(c(2,4,8,16,32),each=100),value=rnorm(500,50,5))
line=data.frame(Var1=1901:2000,Var2=rnorm(50,1.5,0.5))
ggplot(d, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_line(data=line)
# y as factor
d2 = d
d2$Var2=as.factor(d2$Var2)
ggplot(d2, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_line(data=line)
I tried attributing the line values to the value of the nearest factor level, but this introduces a big error. Also, I tried the size option in geom_tile, but this didn't work out either.
In the example the y data is log transformed, but this is just for the ease of making a fake dataset.
Thank you.
Something like this??
ggplot(d, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_line(data=line)+
scale_y_continuous(trans="log2")
Note the addition of scale_y_continuous(trans="log2")
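A quick way to see why this helps, using the y values from the question: the gaps between them are unequal on the raw scale but equal after a log2 transform, so the tiles can all get the same height.

```r
y <- c(2, 4, 8, 16, 32)

raw_gaps <- diff(y)        # 2, 4, 8, 16: unequal spacing on the raw scale
log_gaps <- diff(log2(y))  # 1, 1, 1, 1: equal spacing after log2

raw_gaps
log_gaps
```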
EDIT Based on OP's comment below.
There is no built-in "reverse log2 transform", but it is possible to create new transformations using the trans_new(...) function in package scales. And, naturally, someone has already thought of this: ggplot2 reverse log coordinate transform. The code below is based on the link.
library(scales)
reverselog2_trans <- function(base = 2) {
trans <- function(x) -log(x, base)
inv <- function(x) base^(-x)
trans_new(paste0("reverselog-", format(base)), trans, inv, log_breaks(base = base), domain = c(1e-100, Inf))
}
ggplot(d, aes(x=Var1, y=Var2)) +
geom_tile(aes(fill=value)) +
geom_line(data=line)+
scale_y_continuous(trans="reverselog2")
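As a sanity check on the transformation pair used above, here is a small base R sketch (the function names `rev_log2` and `rev_log2_inv` are just for illustration): the forward function reverses the order of the values and the inverse recovers them.

```r
# Same pair of functions as inside reverselog2_trans(), with base fixed at 2
rev_log2     <- function(x) -log(x, base = 2)
rev_log2_inv <- function(x) 2^(-x)

y <- c(2, 4, 8, 16, 32)
z <- rev_log2(y)

all(diff(z) < 0)               # larger y maps to smaller z, so the axis flips
all.equal(rev_log2_inv(z), y)  # the inverse recovers the original values
```

If looking the transformation up by name gives you trouble, passing the object itself, i.e. scale_y_continuous(trans = reverselog2_trans()), should also work, since the trans argument accepts a transformation object as well as a name.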
Perhaps another approach using a discrete scale and facets might be a possibility:
d <- data.frame(Var1=rep(1901:2000,10),Var2=rep(c(2,4,8,16,32),each=100),value=rnorm(500,50,5), chart="tile" )
d$Var2 <- factor(d$Var2, levels=rev(unique(d$Var2)))
line <- data.frame(Var1=1901:2000,Var2=rnorm(50,1.5,0.5), chart="line")
ggplot(d, aes(x=Var1, y=Var2)) +
geom_tile(aes(y = Var2, fill=value) ) +
geom_line( data=line ) +
scale_y_discrete() +
facet_grid( chart ~ ., scale = "free_y", space="free_y")
which gives a chart like:
I am trying to improve the clarity and aspect of a histogram of discrete values which I need to represent with a log scale.
Please consider the following MWE
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram()
which produces
and then
ggplot(data, aes(x=dist)) + geom_histogram() + scale_x_log10(breaks=c(1,2,3,4,5,10,100))
which probably is even worse
since it now gives the impression that something is missing between "1" and "2", and it is also not entirely clear which bar has value "1" (the bar is to the right of the tick) and which bar has value "2" (the bar is to the left of the tick).
I understand that technically ggplot provides the "right" visual answer for a log scale. Yet as an observer I have some trouble understanding it.
Is it possible to improve something?
EDIT:
This is what happened when I applied Jaap's solution to my real data
Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why is the plot also mapping x=1.5 and x=2.5?
The first thing that comes to mind, is playing with the binwidth. But that doesn't give a great solution either:
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth=10) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0.015,0)) +
theme_bw()
gives:
In this case it is probably better to use a density plot. However, when you use scale_x_log10 you will get a warning message (Removed 524 rows containing non-finite values (stat_density)). This can be resolved by using a log plus one transformation.
The following code:
library(ggplot2)
library(scales)
ggplot(data, aes(x=dist)) +
stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
theme_bw()
will give this result:
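To see why the log plus one transformation gets rid of the warning: the data contain zeros, and log10(0) is -Inf, whereas log1p(0) is 0. A tiny base R illustration with made-up values:

```r
x <- c(0, 1, 9, 99)

finite_log10 <- is.finite(log10(x))  # FALSE for the zero, TRUE otherwise
finite_log1p <- is.finite(log1p(x))  # TRUE everywhere, since log1p(0) is 0

finite_log10
finite_log1p

# expm1() is the inverse of log1p(), so the axis labels can be mapped back
all.equal(expm1(log1p(x)), x)
```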
I wonder what happens if the y-axis is scaled instead of the x-axis. It will result in a few warnings wherever values are 0, but may serve your purpose.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()
Also you may want to display frequencies as data labels, since people might ignore the y-scale and it takes some time to realize that y scale is logarithmic.
ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
stat_bin(geom="text", size=3.5, aes(label=..count.., y=0.8*(..count..)))
A solution could be to convert your data to a factor:
library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
ggplot(data, aes(x=factor(dist))) +
geom_histogram(stat = "count") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Resulting in:
I had the same issue and, inspired by @Jaap's answer, I fiddled with the histogram binwidth using the x-axis in log scale.
If you use binwidth = 0.201, the bars will be juxtaposed as expected. However, this means you can only have up to five bars between two x coordinates.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth = 0.201, color = 'red') +
scale_x_log10()
Result:
I would like to create a function that produces a ggplot graph.
library(data.table)
library(ggplot2)
data1 <- data.table(x=1:5, y=1:5, z=c(1,2,1,2,1))
data2 <- data.table(x=1:5, y=11:15, z=c(1,2,1,2,1))
myfun <- function(data){
ggplot(data, aes(x=x, y=y)) +
geom_point() +
geom_text(aes(label=y), y=3) +
facet_grid(z~.)
}
myfun(data2)
It is supposed to put some text labels on the graph. However, without knowing the data in advance I cannot adjust the vertical position of the text by hand. In particular, I don't want the labels to move with the data: I want them to always sit about 1/4 of the way down each panel (top-mid).
How can I do that?
Is there a function that returns y.limit.up and y.limit.bottom, so that I can assign y = (y.limit.up + y.limit.bottom) / 2 or something similar?
Setting either x or y position in geom_text(...) relative to the plot scale in a facet is actually a pretty big problem. @agstudy's solution works if the y scale is the same for all facets. This is because, in calculating range (or max, or min, etc), ggplot uses the unsubsetted data, not the data subsetted for the appropriate facet (see this question).
You can achieve what you want using auxiliary tables, though.
data1 <- data.table(x=1:5, y=1:5, z=c(1,2,1,2,1))
data2 <- data.table(x=1:5, y=11:15, z=c(1,2,1,2,1))
myfun <- function(data){
label.pos <- data[,ypos:=min(y)+0.75*diff(range(y)),by=z] # 75% to the top...
ggplot(data, aes(x=x, y=y)) +
geom_point() +
# geom_text(aes(label=y), y=3) +
geom_text(data=label.pos, aes(y=ypos, label=y)) +
facet_grid(z~., scales="free") # note scales = "free"
}
myfun(data2)
Produces this.
If you want scales="fixed", then @agstudy's solution is the way to go.
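If you'd rather not depend on data.table, the per-facet label position can also be computed with base R. A minimal sketch using ave(), with the same 75% rule as above and a plain data.frame standing in for data2:

```r
data2 <- data.frame(x = 1:5, y = 11:15, z = c(1, 2, 1, 2, 1))

# For each facet (group of z), place the label 75% of the way up that
# facet's own y range rather than the range of the whole data set
data2$ypos <- ave(as.numeric(data2$y), data2$z,
                  FUN = function(v) min(v) + 0.75 * diff(range(v)))

data2
```

The ypos column can then be used in geom_text(aes(y = ypos, label = y)) in the same way as in the function above.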
You can do this for example:
ggplot(data2, aes(x=x)) +
geom_point(aes(y=y)) +
geom_text(aes(label=y, y=mean(range(y)))) +
facet_grid(z~.)
Or fix y limits manually:
scale_y_continuous(limits = c(10, 15))
@user890739:
with geom_density you can estimate a ypos variable like this:
data <- dplyr::mutate(dplyr::group_by(data, z), ypos = max(density(y)$y) * .75 * nrow(data))
Then plot the result :
ggplot(data, aes(x=x)) +
stat_density(aes(y=..density..)) +
geom_text(aes(label=y, y=ypos)) +
facet_grid(z~., scales="free")
is there a way in ggplot2 to get the plot type "b"? See example:
x <- c(1:5)
y <- x
plot(x,y,type="b")
Ideally, I want to replace the points by their values to have something similar to this famous example:
EDIT:
Here some sample data (I want to plot each "cat" in a facet with plot type "b"):
df <- data.frame(x=rep(1:5,9),y=c(0.02,0.04,0.07,0.09,0.11,0.13,0.16,0.18,0.2,0.22,0.24,0.27,0.29,0.31,0.33,0.36,0.38,0.4,0.42,0.44,0.47,0.49,0.51,0.53,0.56,0.58,0.6,0.62,0.64,0.67,0.69,0.71,0.73,0.76,0.78,0.8,0.82,0.84,0.87,0.89,0.91,0.93,0.96,0.98,1),cat=rep(paste("a",1:9,sep=""),each=5))
Set up the axes by drawing the plot without any content.
plot(x, y, type = "n")
Then use text to make your data points.
text(x, y, labels = y)
You can add line segments with lines.
lines(x, y, col = "grey80")
EDIT: Totally failed to clock the mention of ggplot in the question. Try this.
dfr <- data.frame(x = 1:5, y = 1:5)
p <- ggplot(dfr, aes(x, y)) +
geom_text(aes(x, y, label = y)) +
geom_line(col = "grey80")
p
ANOTHER EDIT: Given your new dataset and request, this is what you need.
ggplot(df, aes(x, y)) + geom_point() + geom_line() + facet_wrap(~cat)
YET ANOTHER EDIT: We're starting to approach a real question. As in 'how do you make the lines not quite reach the points'.
The short answer is that there isn't a standard way to do this in ggplot2. The proper way to do this would be to use geom_segment and interpolate between your data points. This is quite a lot of effort however, so I suggest an easier fudge: draw big white circles around your points. The downside to this is that it makes the gridlines look silly, so you'll have to get rid of those.
ggplot(df, aes(x, y)) +
facet_wrap(~cat) +
geom_line() +
geom_point(size = 5, colour = "white") +
geom_point() +
theme(panel.background = element_blank())
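For completeness, the geom_segment route mentioned above might look roughly like this. It is only a sketch: `shorten_segments` is a made-up helper, and the gap is taken as a fraction of each segment in data units, so the gaps will not look uniform on screen if segments differ much in length.

```r
# Hypothetical helper: one row per pair of consecutive points, with each
# segment pulled in from both ends by `gap` (a fraction of its length).
shorten_segments <- function(x, y, gap = 0.15) {
  n <- length(x)
  x0 <- x[-n]; x1 <- x[-1]
  y0 <- y[-n]; y1 <- y[-1]
  data.frame(
    x    = x0 + gap * (x1 - x0),
    y    = y0 + gap * (y1 - y0),
    xend = x1 - gap * (x1 - x0),
    yend = y1 - gap * (y1 - y0)
  )
}

segs <- shorten_segments(1:5, 1:5)
segs
```

The result can be drawn with geom_segment(data = segs, aes(x = x, y = y, xend = xend, yend = yend)) on top of geom_point() or geom_text(). With facets you would need to call the helper once per category.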
There's an experimental grob in gridExtra to implement this in Grid graphics,
library(gridExtra)
grid.newpage() ; grid.barbed(pch=5)
This is now easy with ggh4x::geom_pointpath. Set shape = NA and add a geom_text layer.
library(ggh4x)
#> Loading required package: ggplot2
df <- data.frame(x = rep(1:5, each = 5),
y = c(outer(seq(0, .8, .2), seq(0.02, 0.1, 0.02), `+`)),
cat = rep(paste0("a", 1:5)))
ggplot(df, aes(x, y)) +
geom_text(aes(label = cat)) +
geom_pointpath(aes(group = cat, shape = NA))
Created on 2021-11-13 by the reprex package (v2.0.1)
Another way to make great slope graphs is using the package CGPfunctions.
library(CGPfunctions)
newggslopegraph(newcancer, Year, Survival, Type)
You also have many options to choose from. You can find a good tutorial here:
https://www.r-bloggers.com/2018/06/creating-slopegraphs-with-r/