Ggplot2: Heatmap + Bar chart combination and tweaking - r

I'm new to R and am trying to visualise data broken down by company and by year for a university project. I want to add bars (as in bar chart bars) along the top and right-hand side of the heatmap I've created below, so that count data can be compared directly between companies (right y axis) and between years for the group as a whole (top x axis).
I know I could merge the images in Illustrator or similar before publication, but I know R can combine graphics (with lattice, I think?) and I would like to improve my skills with R and ggplot.
Ideally I would also like to learn to:
1) add the values of each tile superimposed on top
geom_text(aes(fill = trialx.m$value, label = trialx.m$value))
doesn't seem to work for me?
2) Move the legend so that it is out of the way of the bar plots
3) Adjust the legend size and scale (to account for the data scaling)
4) Order the heatmap rows vertically to match the bar sizes
I know this is a lot, but I would appreciate help or advice about any part.
What I'm currently doing:
Re-scaling
library(ggplot2)
library(reshape2)  # provides melt()
library(plyr)      # provides ddply() and .()
trialx.m <- melt(trialx)
trialx.m <- ddply(trialx.m, .(variable), transform, rescale = scale(value))
Plotting
(p <- ggplot(trialx.m, aes(variable, Company)) +
geom_tile(aes(fill = rescale), colour = "white") +
scale_fill_gradient(low = "ghostwhite", high = "darkblue"))
Neaten, remove background, rotate text etc.
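base_size <- 9  # assumed value; base_size was not defined in the original snippet
# opts(), theme_blank() and theme_text() were replaced by theme(), element_blank() and element_text()
p + theme_grey(base_size = base_size) + labs(x = "", y = "") +
scale_x_discrete(expand = c(0, 0)) + scale_y_discrete(expand = c(0, 0)) +
theme(legend.position = "none", axis.ticks = element_blank(),
axis.text.x = element_text(size = base_size * 0.8, angle = 90, hjust = 0,
colour = "grey50"))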
Here is a dropbox link to the data I am using:

Related

Adding space *just* on right side of x-axis, color based on relative position, specify labels

I have a time series graph of 49 countries, and I'd like to do three things: (1) prevent the country label name from being cut off, (2) specify so that the coloring is based on the position in the graph rather than alphabetically, and (3) specify which countries I would like to label (49 labels in one graph is too many).
library(ggplot2)
library(directlabels)
library(zoo)
library(RColorBrewer)
library(viridis)
colourCount = length(unique(df$newCol))
getPalette = colorRampPalette(brewer.pal(11, "Paired"))
## Yearly Incorporation Rates
ggplot(df,aes(x=year2, y=total_count_th, group = newCol, color = newCol)) +
geom_line() +
geom_dl(aes(label = newCol),
method= list(dl.trans(x = x + 0.1),
"last.points", cex = 0.8)) +
scale_color_manual(values = getPalette(colourCount)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
legend.position = "none") +
labs(title = "Title",
x = "Year",
y = "Count")
This code works -- there are 49 lines, and each of them is labelled. But it just so happens that all the countries with the highest y-values have the same/similar colors (red/orange). So is there a way to specify the colors dynamically (maybe with scale_color_identity)? And how do I add space just on the right side of the labels? I found expand = expand_scale(), but it added space on both sides (though I did read that in the newer version it should be possible to add it on one side only).
I am also fine defining a list of 49 manually-defined colors rather than using the color ramp.
One way to do it is to limit the x axis by adding something like
coord_cartesian(xlim = c(1,44), expand = TRUE)
In this case, I had 41 years of observations on the axis, so by specifying 44, I added space to the x-axis.
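Here is a minimal sketch of that approach, as far as I understand it. The data frame is made up (5 series instead of 49) and only reuses the column names from the question. The factor reordering is just one possible way to make the colours follow the position of each line at the last year, so treat it as an assumption rather than the accepted fix.

```r
library(ggplot2)

# Hypothetical stand-in for the real data: 5 series observed over 41 years
df <- expand.grid(year2 = 1:41,
                  newCol = paste0("country_", 1:5),
                  stringsAsFactors = FALSE)
df$total_count_th <- as.numeric(sub("country_", "", df$newCol)) * df$year2 / 10

# Order the factor levels by each series' value in the final year,
# so that a sequential palette follows the vertical position of the lines
last_vals <- df[df$year2 == max(df$year2), ]
df$newCol <- factor(df$newCol,
                    levels = last_vals$newCol[order(last_vals$total_count_th)])

p <- ggplot(df, aes(x = year2, y = total_count_th,
                    group = newCol, colour = newCol)) +
  geom_line() +
  scale_colour_viridis_d() +
  # 41 observed years; the limit of 44 leaves empty room on the right only
  coord_cartesian(xlim = c(1, 44), expand = TRUE) +
  theme(legend.position = "none")

p
```

If your ggplot2 is 3.3.0 or newer, scale_x_continuous(expand = expansion(mult = c(0, 0.1))) should be another way to pad only the right side.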
Thank you to @JonSpring for the help and getting me to the right answer!

Is there a way to overlay a line plot (with a finer resolution) onto a bar plot (with a lower resolution)?

So I have a dataset of performance scores with an associated difficulty value, and I want to display the average performance score per difficulty value. The difficulty values range from 0 to 10, but have up to 10 decimal points and as a result are hyper specific. To make this more legible, I've been grouping the difficulty scores into bins. I've done this at two different resolutions, bins of width 0.1, and bins of width 1.
What I would like to do is display a line plot (using the finer data points) on top of a bar plot (using the wider resolution), but I want the bar plot to maintain its structure. Right now, when I try to overlay the line plot, the x-axis seems to scale to the line plot, and the bars end up extremely narrow.
Here's the bar plot code:
g1.4 = ggplot() +
geom_bar(data = grouped_diff_wide, aes(y=mean_diff_perf, x=gr, fill=subject), stat = "identity" )+
facet_wrap(~subject)+
ggtitle("Average Performance By Difficulty")+
labs(fill = "Subject")+
ylab("Performance")+
xlab("Difficulty")+
scale_x_discrete(breaks = diff_breaks_wide, labels = seq(0, 9, 1))
g1.4
And the resulting graph: [image: just the bar plot]
Here's the line plot code:
g1.5 = ggplot() +
geom_line(data = grouped_diff_fine, aes(y=mean_diff_perf, x = gr, group = 1))+
facet_wrap(~subject)+
ggtitle("Average Performance By Difficulty")+
labs(fill = "Subject")+
ylab("Performance")+
xlab("Difficulty")+
scale_x_discrete(breaks = diff_breaks_fine, labels = seq(0, 9, 1))
g1.5
And the resulting graph: [image: just the line plot]
And here's my attempt to combine them:
g1.6 = ggplot() +
geom_bar(data = grouped_diff_wide, aes(y=mean_diff_perf, x=gr, fill=subject), stat = "identity" )+
geom_line(data = grouped_diff_fine, aes(y=mean_diff_perf, x = gr, group = 1))+
facet_wrap(~subject)+
ggtitle("Average Performance By Difficulty")+
labs(fill = "Subject")+
ylab("Performance")+
xlab("Difficulty")+
scale_x_discrete(breaks = diff_breaks_fine, labels = seq(0, 9, 1))
g1.6
And how it turns out: [image: combined plot with skinny bars]
Is there a way to maintain the proportions of the standalone bar plot but with the line plot overlaid?
You can use the width parameter of geom_bar. As a very simple example using the built-in mtcars data:
ggplot(mtcars, aes(x = mpg, y = disp)) +
geom_bar(stat = "identity", width = 1.1) +
geom_line(colour = "blue", size = 2)
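Closer to the setup in the question, here is a rough sketch that puts both layers on one continuous x axis by using the numeric bin midpoints instead of discrete bin labels. The data is simulated and bin_mean() is a helper I made up for illustration. Only the column names gr and mean_diff_perf come from the question, and the facets by subject are left out to keep it short.

```r
library(ggplot2)

# Simulated scores: difficulty in (0, 10), performance falling with difficulty
set.seed(42)
raw <- data.frame(difficulty = runif(2000, 0, 10))
raw$perf <- 1 - raw$difficulty / 12 + rnorm(2000, sd = 0.1)

# Hypothetical helper: mean performance per bin, keyed by the bin midpoint
bin_mean <- function(data, width) {
  data$gr <- (floor(data$difficulty / width) + 0.5) * width
  out <- aggregate(perf ~ gr, data = data, FUN = mean)
  names(out)[names(out) == "perf"] <- "mean_diff_perf"
  out
}

grouped_diff_wide <- bin_mean(raw, 1)    # bins of width 1 for the bars
grouped_diff_fine <- bin_mean(raw, 0.1)  # bins of width 0.1 for the line

p <- ggplot() +
  # width = 1 makes each bar span its whole bin on the numeric axis
  geom_col(data = grouped_diff_wide, aes(x = gr, y = mean_diff_perf),
           width = 1, fill = "grey70") +
  geom_line(data = grouped_diff_fine, aes(x = gr, y = mean_diff_perf),
            colour = "blue") +
  scale_x_continuous(breaks = 0:10) +
  labs(x = "Difficulty", y = "Performance")

p
```

Because both layers share the same numeric scale, the bars should keep their full width no matter how fine the line data is.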

Problem of different x-axis position when using grid.arrange and legend on bottom

I have to arrange two plots with the same axes next to each other and did this with ggplot2 and grid.arrange. For a tidier presentation, the legends have to be placed at the bottom. Unfortunately, the left plot sometimes has more legend entries than the right one and therefore needs a second line, which puts the x-axes at different y positions. This not only looks untidy, it also defeats the purpose of being able to compare the plots.
Can anybody help?
library(ggplot2)
library(grid)       # textGrob(), gpar()
library(gridExtra)  # grid.arrange()
# named plot_Volume_* to match the objects used further down
plot_Volume_left <- some_ggplot2_fct(variable,left) +
theme(legend.position = "bottom")+
theme(legend.background = element_rect(size = 0.5, linetype="solid", colour ="black"))
plot_Volume_right <- some_ggplot2_fct(variable,right,f)+
theme(legend.position = "bottom")+
theme(legend.background = element_rect(size = 0.5, linetype="solid", colour ="black"))
# adjust y axis for more easy compare
upper_lim <- max(plot_Volume_right$data$value, plot_Volume_left$data$value)
lower_lim <- min(plot_Volume_right$data$value, plot_Volume_left$data$value)
plot_Volume_left <- plot_Volume_left + ylim(c(lower_lim, upper_lim))
plot_Volume_right <- plot_Volume_right + ylim(c(lower_lim, upper_lim))
# Arrange plots in grid
grid.arrange(plot_Volume_left, plot_Volume_right,
ncol = 2,
top = textGrob(strTitle,
gp = gpar(fontfamily = "Raleway", fontsize = 15, font = 2)))
In the picture you can see the result:
Do you know an easy way to solve this without too much change in code? (The underlying framework is quite large.)

Create a concentric circle legend for a ggplot bubble chart

I am trying to recreate this visualization of a bubble chart using ggplot2 (I have found the code for doing this in R, but not with the ggplot2 package). This is what I have so far. There are some other errors with my code at the moment, but I want to have the legend show concentric circles for size, versus circles shown in rows. Thanks for your help!
Original visualization:
My reproduction:
My (simplified) code:
crime <-
read.csv("http://datasets.flowingdata.com/crimeRatesByState2005.tsv",
header=TRUE, sep="\t")
ggplot(crime,
mapping= aes(x=murder, y=burglary))+
geom_point(aes(size=population), color="red")+
geom_text(aes(label=state.name), show.legend=FALSE, size=3)+
theme(legend.position = c(0.9, 0.2))
Here's an approach where we build the legend as imagined from scratch.
1) This part slightly tweaks your base chart.
Thank you for including the source data. I missed that earlier and have edited this answer to use it. I switched to a different point shape so that we can specify both outside border (color) as well as interior fill.
gg <- ggplot(crime,
mapping= aes(x=murder, y=burglary))+
geom_point(aes(size=population), shape = 21, color="white", fill = "red")+
ggrepel::geom_text_repel(aes(label = state.name),
size = 3, segment.color = NA,
point.padding = unit(0.1, "lines")) +
theme_classic() +
# This scales area to size (not radius), specifies max size, and hides legend
scale_size_area(max_size = 20, guide = "none")
2) Here I make another table to use for the concentric legend circles
library(dplyr); library(ggplot2)
legend_bubbles <- data.frame(
label = c("3", "20", "40m"),
size = c(3E6, 20E6, 40E6)
) %>%
mutate(radius = sqrt(size / pi))
3) This section adds the legend bubbles, text, and title.
It's not ideal, since different print sizes will require placement tweaks. But it seems like it'd get complicated to get into the underlying grobs with ggplot_build to extract and use those sizing adjustments...
gg + geom_point(data = legend_bubbles,
# The "radius/50" was trial and error. Better way?
aes(x = 8.5, y = 250 + radius/50, size = size),
shape = 21, color = "black", fill = NA) +
geom_text(data = legend_bubbles, size = 3,
aes(x = 8.5, y = 275 + 2 * radius/50, label = label)) +
annotate("text", x = 8.5, y = 450, label = "Population", fontface = "bold")
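For what it's worth, here is a small sketch of the reasoning behind the radius column, using the same three legend populations as above. Since scale_size_area maps the value to the circle's area, the radius should go with the square root of the value, and circles that rest on a common baseline need their centres raised in proportion to their radius. I think that is why a y offset of radius divided by some constant works. The variable rel_radius is only there for illustration.

```r
# Populations used for the legend rings (same values as legend_bubbles above)
size <- c(3e6, 20e6, 40e6)

# Area proportional to population means radius proportional to sqrt(population)
radius <- sqrt(size / pi)

# Radius of each ring relative to the largest ring
rel_radius <- radius / max(radius)

# A circle sitting on a baseline has its centre one radius above that baseline,
# so the centre offsets have the same ratios as the radii
print(round(rel_radius, 3))
```

The constant 50 in the answer still has to be tuned by eye, because it depends on the plot's y range and on the printed size.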

How to show all data points on y axis in ggplot heatmaps?

I've created a heatmap using ggplot
library(ggplot2)
library(plotly)  # provides ggplotly()
library(plyr)
library(scales)
guide_ind <- ddply(guide_tag[company == FALSE], .(tag), transform, rescale = rescale(count))
(p <- ggplotly(ggplot(guide_ind, aes(tag, username)) +
geom_tile(aes(fill = rescale),colour = "white") +
scale_fill_gradient(low = "white",high = "steelblue") +
theme(axis.text.x = element_text(angle= 90, hjust=1), legend.position= "bottom") )
)
I have about 700 rows of usernames, and I would like all of them to be visible in the document, so that when I render this in markdown the names are shown individually instead of overlapping as in the picture below.
I've tried using fig.height and a gplot heatmap, but neither worked.
Does anyone have suggestions for how to make all data points visible on the y-axis?

Resources