I want to show the line added via geom_abline in the legend, since the bars are already identified by the x-axis labels.
How embarrassing, not sure how I forgot the toy data. I also cleaned up the example and made sure I was running the most up-to-date versions of R and ggplot2 (and reshape!); I forgot how much of a difference that can make sometimes.
The end product is a bar chart with an added line indicating the average, with that line shown in the legend: a red dashed line labeled "County Average".
library(ggplot2)
DataToPlot.. <- data.frame(UGB = c("EUG","SPR","COB","VEN"),
Rate = c( 782, 798,858,902))
ggplot(DataToPlot.., aes(x = UGB, y = Rate)) +
geom_bar(aes(x=UGB,y=Rate, fill = UGB),stat="identity",show.legend = FALSE) +
scale_fill_brewer(palette="Set3") +
geom_abline(aes(intercept = 777, slope = 0), colour = "red",
size = 1.25, linetype="dashed",show.legend = TRUE)
After playing around for a while (it was not as easy as I expected), I used this:
library(ggplot2)
DataToPlot.. <- data.frame(UGB = c("EUG","SPR","COB","VEN"),
Rate = c( 782, 798,858,902))
x <- c(0.5,nrow(DataToPlot..)+0.5)
AvgLine.. <- data.frame(UGB=x,Rate=777,avg="777")
ggplot(DataToPlot.., aes(x = UGB, y = Rate)) +
geom_bar(aes(x=UGB,y=Rate, fill = UGB),stat="identity",show.legend=TRUE ) +
scale_fill_brewer(palette="Set3") +
geom_line(data=AvgLine..,aes(x=UGB,y=Rate,linetype=avg),
colour = "red", size = 1.25) +
scale_linetype_manual(values=c("777"="dashed")) +
# make the guide wider and specify the order
guides(linetype=guide_legend(title="County Average",order=1,keywidth = 3),
fill=guide_legend(title="UGB",order=2))
Note I couldn't coerce geom_abline to make its own guide. I had to create a dataframe. The x-coordinates for that line are basically the factor values, and I adjusted them to reach beyond the edges of the plot.
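For comparison, here is a minimal sketch of an alternative that avoids computing the x-coordinates by hand. It assumes a reasonably recent ggplot2 and relies on mapping linetype inside aes() for geom_hline, which is what makes ggplot2 build a legend entry. The one-row data frame AvgLine.. and its column names (yintercept, label) are made up for this illustration.

```r
library(ggplot2)

DataToPlot.. <- data.frame(UGB = c("EUG", "SPR", "COB", "VEN"),
                           Rate = c(782, 798, 858, 902))

# Hypothetical one-row data frame holding the average and its legend label.
AvgLine.. <- data.frame(yintercept = 777, label = "County Average")

p <- ggplot(DataToPlot.., aes(x = UGB, y = Rate)) +
  geom_col(aes(fill = UGB), show.legend = FALSE) +
  scale_fill_brewer(palette = "Set3") +
  # Mapping linetype inside aes() is what creates the legend entry.
  geom_hline(data = AvgLine..,
             aes(yintercept = yintercept, linetype = label),
             colour = "red") +
  # Give the single legend entry a dashed line and drop the legend title.
  scale_linetype_manual(name = NULL, values = c("County Average" = "dashed"))

print(p)
```

Because the line comes from geom_hline, it spans the full panel width without any adjustment of the factor positions.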
I would like to create a plot in ggplot2 with points and lines between them, but with gaps between the lines and the points. The plot has a shaded area, so some points sit on a gray background and others on a white one. I found the lemon package and its geom_pointline function.
ggplot(data = dt, aes(x = x, y = y)) +
geom_ribbon(aes(ymin = min, ymax = max), fill = "gray", alpha = 0.35) +
geom_pointline(shape = 19, linecolor = "black", size = 4, color = "blue", distance = 2)
The result I get is shown below. As one can notice, the lines don't start and end in the middle of points, but rather at the top right and bottom left of the point. It gets even worse when I shorten the lines. I tried with many parameters but couldn't solve it. I would like the lines to start and end closer to the middle than it is now.
Thanks in advance!
If switching to another package is an option for you, one way to achieve your desired result is ggh4x::geom_pointpath, which, similar to geom_pointline, adds some padding around points along a line or path. One drawback is that, to the best of my knowledge, it has no option to set different colors for the points and the lines. A hack is to draw the lines via ggh4x::geom_pointpath and then add a geom_point on top.
Using some fake example data:
set.seed(123)
dt <- data.frame(
x = seq(20, 160, 20),
y = 1:8,
min = 1:8 - runif(8),
max = 1:8 + runif(8)
)
library(ggplot2)
library(ggh4x)
ggplot(data = dt, aes(x = x, y = y)) +
geom_ribbon(aes(ymin = min, ymax = max), fill = "gray", alpha = 0.35) +
geom_pointpath(shape = 19, size = 4, color = "black", mult = .25) +
geom_point(shape = 19, size = 4, color = "blue")
I'm trying to figure out how to modify a scatter-plot that contains two groups of data along a continuum separated by a large gap. The graph needs a break on the x-axis as well as on the regression line.
This R code using the ggplot2 library accurately presents the data, but is unsightly due to the vast amount of empty space on the graph. Pearson's correlation is -0.1380438.
library(ggplot2)
p <- ggplot(, aes(x = dis, y = result[, 1])) + geom_point(shape = 1) +
xlab("X-axis") +
ylab("Y-axis") + geom_smooth(color = "red", method = "lm", se = F) + theme_classic()
p + theme(plot.title = element_text(hjust = 0.5, size = 14))
This R code uses gap.plot to produce the breaks needed, but the regression line doesn't contain a break and doesn't reflect the slope properly. As you can see, the slope of the regression line isn't as sharp as the graph above and there needs to be a visible distinction in the slope of the line between those disparate groups.
library(plotrix)
gap.plot(
x = dis,
y = result[, 1],
gap = c(700, 4700),
gap.axis = "x",
xlab = "X-Axis",
ylab = "Y-Axis",
xtics = seq(0, 5575, by = 200)
)
abline(v = seq(700, 733) , col = "white")
abline(lm(result[, 1] ~ dis), col = "red", lwd = 2)
axis.break(1, 716, style = "slash")
Using MS Paint, I created an approximation of what the graph should look like. Notice the break marks on the top as well as the discontinuity in the regression line between the two groups.
One solution is to plot the regression line in two pieces, using ablineclip to limit what's plotted each time. (Similar to @tung's suggestion, although it's clear that you want the appearance of a single graph rather than the appearance of facets.) Here's how that would work:
library(plotrix)
# Simulate some data that looks roughly like the original graph.
dis = c(rnorm(100, 300, 50), rnorm(100, 5000, 100))
result = c(rnorm(100, 0.6, 0.1), rnorm(100, 0.5, 0.1))
# Store the location of the gap so we can refer to it later.
x.axis.gap = c(700, 4700)
# gap.plot() works internally by shifting the location of the points to be
# plotted based on the gap size/location, and then adjusting the axis labels
# accordingly. We'll re-compute the second half of the regression line in the
# same way; these are the new values for the x-axis.
dis.alt = dis - x.axis.gap[1]
# Plot (same as before).
gap.plot(
x = dis,
y = result,
gap = x.axis.gap,
gap.axis = "x",
xlab = "X-Axis",
ylab = "Y-Axis",
xtics = seq(0, 5575, by = 200)
)
abline(v = seq(700, 733), col = "white")
axis.break(1, 716, style = "slash")
# Add regression line in two pieces: from 0 to the start of the gap, and from
# the end of the gap to infinity.
ablineclip(lm(result ~ dis), col = "red", lwd = 2, x2 = x.axis.gap[1])
ablineclip(lm(result ~ dis.alt), col = "red", lwd = 2, x1 = x.axis.gap[1] + 33)
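If the appearance of facets is acceptable after all, the same idea can be sketched in ggplot2. This is only an illustration under stated assumptions: the data are simulated as above, the threshold of 700 comes from the start of the gap, and the column names side and fitted are hypothetical. A single regression is fitted on all the data, and its fitted values are drawn in each panel, so the line is broken at the gap but keeps one common slope.

```r
library(ggplot2)

set.seed(1)
# Simulated data, roughly like the original graph.
df <- data.frame(
  dis    = c(rnorm(100, 300, 50), rnorm(100, 5000, 100)),
  result = c(rnorm(100, 0.6, 0.1), rnorm(100, 0.5, 0.1))
)

# Fit one regression on all the data and store its fitted values.
fit <- lm(result ~ dis, data = df)
df$fitted <- predict(fit)

# Hypothetical grouping variable: which side of the gap a point falls on.
df$side <- factor(ifelse(df$dis < 700, "below gap", "above gap"),
                  levels = c("below gap", "above gap"))

p <- ggplot(df, aes(x = dis, y = result)) +
  geom_point(shape = 1) +
  # Same fitted line in both panels, drawn only over each panel's x range.
  geom_line(aes(y = fitted), colour = "red") +
  facet_grid(. ~ side, scales = "free_x", space = "free_x") +
  theme_classic()

print(p)
```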
I am creating a number of heatmaps in R, but I am having problems keeping the colour scale consistent across graphs.
I find that the colours are scaled within each graph. Is there a way to make colours consistent across graphs, i.e. so that the colour difference between a value of 0.4 and 0.5 is always the same?
Code Example:
set.seed(123)
d1 = matrix(rnorm(9, mean = 0.2, sd = 0.1), ncol = 3)
d2 = matrix(rnorm(9, mean = 0.8, sd = 0.1), ncol = 3)
mat = list(d1, d2)
for(m in mat)
heatmap(m, Rowv = NA ,Colv = NA)
You'll note in the example that cell (2,3) in the first graph looks similar to cell (1,3) in the second, despite the values being ~0.8 apart.
Here's a way to do it with ggplot2, if you're open to not using base graphics:
library(reshape2)
library(ggplot2)
# Set common limits for color scale
limits = range(unlist(mat))
Here's the code for two separate graphs. The last line of code for each graph ensures that they use the same z limits for setting the colors:
ggplot(melt(mat[[1]]), aes(Var1, Var2, fill=value)) +
geom_tile() +
scale_fill_continuous(limits=limits)
ggplot(melt(mat[[2]]), aes(Var1, Var2, fill=value)) +
geom_tile() +
scale_fill_continuous(limits=limits)
Another option is to plot both heatmaps in a single graph using faceting, which automatically ensures both graphs are on the same color scale:
ggplot(melt(mat), aes(Var1, Var2, fill=value)) +
geom_tile() +
facet_grid(. ~ L1)
I've used the default colors here, but for either approach you can set the color scale to be anything you wish. For example:
ggplot(melt(mat), aes(Var1, Var2, fill=value)) +
geom_tile() +
facet_grid(. ~ L1) +
scale_fill_gradient(low="red", high="green")
You could use the image function directly (heatmap uses image), though it will require some extra formatting to match the output of heatmap. You can use zlim to set the color range. Quoting from the ?image page:
the minimum and maximum z values for which colors should be plotted,
defaulting to the range of the finite values of z. Each of the given
colors will be used to color an equispaced interval of this range. The
midpoints of the intervals cover the range, so that values just
outside the range will be plotted.
# define zlim min and max for all the plots
minz = Reduce(min, mat)
maxz = Reduce(max, mat)
for(m in mat) {
image( m, zlim = c(minz, maxz), col = heat.colors(20))
}
To get closer to the formatting produced by heatmap, you can just reuse some code from the heatmap function:
for(m in mat) {
labCol = dim(m)[2]
labRow = dim(m)[1]
image(seq_len(labCol), seq_len(labRow), m, zlim = c(minz, maxz),
col = heat.colors(20), axes = FALSE, xlab = "", ylab = "",
xlim = 0.5 + c(0, labCol), ylim = 0.5 + c(0, labRow))
axis(1, 1L:labCol, labels = seq_len(labCol), las = 2, line = -0.5, tick = 0)
axis(4, 1L:labRow, labels = seq_len(labRow), las = 2, line = -0.5, tick = 0)
}
Using the breaks argument to image is another option. It allows more flexibility than zlim in setting the breakpoints for colors. Quoting from the help page, breaks is
a set of finite numeric breakpoints for the colours: must have one
more breakpoint than colour and be in increasing order. Unsorted
vectors will be sorted, with a warning.
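As a rough sketch of the breaks approach, the example below builds one shared set of equally spaced breakpoints from the range of all matrices and passes it to every image call. The names n_col and brks are made up for this illustration, and the choice of 20 colours is arbitrary.

```r
set.seed(123)
d1 <- matrix(rnorm(9, mean = 0.2, sd = 0.1), ncol = 3)
d2 <- matrix(rnorm(9, mean = 0.8, sd = 0.1), ncol = 3)
mat <- list(d1, d2)

# One more breakpoint than colours, spanning the range of all matrices.
n_col <- 20
brks <- seq(min(unlist(mat)), max(unlist(mat)), length.out = n_col + 1)

for (m in mat) {
  # The same breaks and colours are used for every plot.
  image(m, breaks = brks, col = heat.colors(n_col))
}
```

Unequally spaced breakpoints work the same way, which is where breaks gives more control than zlim.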
How can I show the dots colored using the mosaic package to do a dotplot?
library(mosaic)
n=500
r =rnorm(n)
d = data.frame( x = sample(r ,n= 1,size = n, replace = TRUE), color = c(rep("red",n/2), rep("green",n/2)))
dotPlot(d$x,breaks = seq(min(d$x)-.1,max(d$x)+.1,.1))
Right now all the dots are blue, but I would like them to be colored according to the color column in the data table.
If you are still interested in a mosaic/lattice solution rather than a ggplot2 solution, here you go.
dotPlot( ~ x, data = d, width = 0.1, groups = color,
par.settings=list(superpose.symbol = list(pch = 16, col=c("green", "red"))))
Notice also:
As with ggplot2, the colors are not determined by the values in your color variable but by the theme. You can use par.settings to modify this on the level of a plot or trellis.par.set() to change the defaults.
It is preferable to use a formula and data = and to avoid the $ operator.
You can use the width argument rather than breaks if you want to set the bin width. (You can use the center argument to control the centers of the bins if that matters to you. By default, 0 will be the center of a bin.)
You need to add stackgroups=TRUE so that the two different colors aren't plotted on top of each other.
n=20
set.seed(15)
d = data.frame(x = sample(seq(1,10,1), n, replace = TRUE),
color = c(rep("red",n/2), rep("green",n/2)))
table(d$x[order(d$x)])
length(d$x[order(d$x)])
binwidth= 1
ggplot(d, aes(x = x)) +
geom_dotplot(breaks = seq(0.5,10.5,1), binwidth = binwidth,
method="histodot", aes(fill = color),
stackgroups=TRUE) +
scale_x_continuous(breaks=1:10)
Also, ggplot uses its internal color palette for the fill aesthetic. You'd get the same colors regardless of what you called the values of the "color" column in your data. Add scale_fill_manual(values=c("green","red")) if you want to set the colors manually.
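A short sketch of that last point, using the same toy data. Naming the values vector (an addition for this illustration, not part of the answer above) ties each colour to a level of the color column explicitly, so the result does not depend on the order of the levels. The origin argument is used here to start the first bin at 0.5.

```r
library(ggplot2)

set.seed(15)
n <- 20
d <- data.frame(x = sample(1:10, n, replace = TRUE),
                color = rep(c("red", "green"), each = n / 2))

p <- ggplot(d, aes(x = x, fill = color)) +
  geom_dotplot(binwidth = 1, origin = 0.5, method = "histodot",
               stackgroups = TRUE) +
  # Named values: level "green" is drawn green, level "red" is drawn red.
  scale_fill_manual(values = c(green = "green", red = "red")) +
  scale_x_continuous(breaks = 1:10)

print(p)
```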
I am creating a number of histograms and I want to add annotations towards the top of the graph. I am plotting these using a for loop so I need a way to place the annotations at the top even though my ylims change from graph to graph. If I could store the ylim for each graph within the loop I could cause the y coordinates for my annotation to vary based on the current graph. The y value I include in my annotation must change dynamically as the loop proceeds across iterations. Here is some sample code to demonstrate my issue (Notice how the annotation moves around. I need it to change based on the ylim for each graph):
library(ggplot2)
cuts <- levels(as.factor(diamonds$cut))
pdf(file = "Annotation Example.pdf", width = 11, height = 8,
family = "Helvetica", bg = "white")
for (i in 1:length(cuts)) {
by.cut<-subset(diamonds, diamonds$cut == cuts[[i]])
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
annotate ("text", label = "My annotation goes at the top", x = 10000 ,hjust = 0, y = 220, color = "darkred"))
}
dev.off()
ggplot uses Inf in its positions to represent the extremes of the plot range, without changing the plot range. So the y value of the annotation can be set to Inf, and the vjust parameter can also be adjusted to get a better alignment.
...
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
annotate("text", label = "My annotation goes at the top",
x = 10000, hjust = 0, y = Inf, vjust = 2, color = "darkred"))
...
For i = 2, this looks like:
There may be a neater way, but you can get the max count and use that to set y in the annotate call:
for (i in 1:length(cuts)) {
by.cut<-subset(diamonds, diamonds$cut == cuts[[i]])
## approximate the bins ggplot will use (geom_histogram defaults to 30 bins)
by.cut$cuts <- cut(by.cut$price, seq(min(by.cut$price), max(by.cut$price), length.out=29))
## get the highest count of prices in a given cut.
y.max <- max(tapply(by.cut$price, by.cut$cuts, length))
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
## change y = 220 to y = y.max as defined above
annotate ("text", label = "My annotation goes at the top", x = 10000 ,hjust = 0, y = y.max, color = "darkred"))
}
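A possibly neater variant, sketched under the assumption that a recent ggplot2 is available: build the histogram first, read the computed bin counts back with layer_data(), and use their maximum as the y position. This uses the bins ggplot2 actually computed rather than an approximation. Setting bins = 30 only makes the default explicit.

```r
library(ggplot2)

cuts <- levels(as.factor(diamonds$cut))

for (i in seq_along(cuts)) {
  by.cut <- subset(diamonds, cut == cuts[[i]])
  p <- ggplot(by.cut, aes(price)) +
    geom_histogram(bins = 30, fill = "steelblue", alpha = .55)
  # layer_data() returns the computed bins, including a count column.
  y.max <- max(layer_data(p)$count)
  print(p + annotate("text", label = "My annotation goes at the top",
                     x = 10000, hjust = 0, y = y.max, color = "darkred"))
}
```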