I have been working on a heatmap for a few days and cannot get the final formatting of the gridlines to work. See the code and attached plots below. I am trying to align the gridlines with the edges of the tiles drawn by geom_tile(), so that each tile fills exactly one box of the grid. I was able to align the gridlines using geom_raster(), but then the y-axis ticks sit at either the top or the bottom of each tile, whereas I need them at the center (see red highlight). Also, I cannot get geom_raster() to draw a white border around the tiles, so the color blocks look a bit disorganized in my original dataset. I would be grateful for any help with the formatting code. Thanks very much!
#The data set in long format
y<- c("A","A","A","A","B","B","B","B","B","C","C","C","D","D","D")
x<- c("2020-03-01","2020-03-15","2020-03-18","2020-03-18","2020-03-01","2020-03-01","2020-03-01","2020-03-01","2020-03-05","2020-03-06","2020-03-05","2020-03-05","2020-03-20","2020-03-20","2020-03-21")
v<-data.frame(y,x)
#approach 1 using geom_tile but gridline does not align with borders of the tiles
v%>%
count(y,x,drop=FALSE)%>%
arrange(n)%>%
ggplot(aes(x=x,y=fct_reorder(y,n,sum)))+
geom_tile(aes(fill=n),color="white", size=0.25)
I have tried running similar code from another post, but I wasn't able to get it to run properly. I think this is because my x variable is a count of the y variable, so it cannot be converted to a factor to specify xmin and xmax in geom_rect().
#approach 2 using geom_raster but y-axis label can't tick at the center of tiles and there's no border around the tile to differentiate between tiles.
v%>%
count(y,x,drop=FALSE)%>%
arrange(n)%>%
ggplot()+
geom_raster(aes(x=x,y=fct_reorder(y,n,sum),fill=n),hjust=0,vjust=0)
I think it makes sense to keep the ticks and in turn the grid lines where they are. To still achieve what you're looking for, I would suggest you expand your data to include all possible combinations and simply set the na.value to a neutral fill color:
# all possible combinations
all <- v %>% expand(y, x)
# join with all, n will be NA for obs. in all that are not present in v
v = v %>% group_by_at(vars(y, x)) %>%
summarize(n = n()) %>% right_join(all)
ggplot(data = v,
aes(x=x, y=fct_reorder(y,n, function(x) sum(x, na.rm = TRUE))))+ # note that you must account for the NA values now
geom_tile(aes(fill=n), color="white",
size=0.25) +
scale_fill_continuous(na.value = 'grey90') +
scale_x_discrete(expand = c(0,0)) +
scale_y_discrete(expand = c(0,0))
This is a bit of a hack. My approach converts the categorical variables to numerics which adds minor grid lines to the plot which align with the tiles. To get rid of the major grid lines I simply use theme(). Drawback: Breaks and labels have to be set manually.
library(ggplot2)
library(dplyr)
library(forcats)
v1 <- v %>%
count(y,x,drop=FALSE)%>%
arrange(n) %>%
mutate(y = fct_reorder(y, n, sum),
y1 = as.integer(y),
x = factor(x),
x1 = as.integer(x))
labels_y <- levels(v1$y)
breaks_y <- seq_along(labels_y)
labels_x <- levels(v1$x)
breaks_x <- seq_along(labels_x)
ggplot(v1, aes(x=x1, y=y1))+
geom_tile(aes(fill=n), color="white", size=0.25) +
scale_y_continuous(breaks = breaks_y, labels = labels_y) +
scale_x_continuous(breaks = breaks_x, labels = labels_x) +
theme(panel.grid.major = element_blank())
Created on 2020-05-23 by the reprex package (v0.3.0)
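Why the minor grid lines end up on the tile borders can be checked with a little arithmetic. The following is only a sketch in base R, and the vector names are made up for illustration. It assumes tiles centred on the integer positions 1, 2, ... with a width of 1, and minor breaks placed halfway between neighbouring major breaks.

```r
# Integer positions produced by as.integer() on a four-level factor
centres <- 1:4

# Each tile runs from centre - width/2 to centre + width/2;
# with a width of 1 the edges are the half-integers
tile_edges <- sort(unique(c(centres - 0.5, centres + 0.5)))

# Midpoints between neighbouring major breaks (the interior minor breaks)
minor_inner <- head(centres, -1) + diff(centres) / 2

tile_edges
minor_inner

# Every interior minor break coincides with a tile edge
all(minor_inner %in% tile_edges)
```

Because the major breaks sit on the tile centres, hiding them with theme(panel.grid.major = element_blank()) leaves only the lines that run along the tile borders.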
Edit: Checked for long var names
y<- c("John Doe","John Doe","John Doe","John Doe","Mary Jane","Mary Jane","Mary Jane","Mary Jane","Mary Jane","C","C","C","D","D","D")
x<- c("2020-03-01","2020-03-15","2020-03-18","2020-03-18","2020-03-01","2020-03-01","2020-03-01","2020-03-01","2020-03-05","2020-03-06","2020-03-05","2020-03-05","2020-03-20","2020-03-20","2020-03-21")
v<-data.frame(y,x)
I use R for most of my data analysis. Until now I have exported the results as CSV and visualized them using Numbers on the Mac.
The reason: the graphs are embedded in documents that reserve a rather large margin on the right side for annotations (Tufte handout style). Between the actual text and the annotation column there is white space. The plot area needs to fit the width of the text, while the legend should be placed in the annotation column.
I would prefer to also create the plots within R for a better workflow and higher efficiency. Is it possible to create such a layout using plotting with R?
Here is an example of what I would like to achieve:
And here is some R Code as a starter:
library(tidyverse)
data <- midwest %>%
head(5) %>%
select(2,23:25) %>%
pivot_longer(cols=2:4,names_to="Variable", values_to="Percent") %>%
mutate(Variable=factor(Variable, levels=c("percbelowpoverty","percchildbelowpovert","percadultpoverty"),ordered=TRUE))
ggplot(data=data, mapping=aes(x=county, y=Percent, fill=Variable)) +
geom_col(position=position_dodge(width=0.85),width=0.8) +
labs(x="County") +
theme(text=element_text(size=9),
panel.background = element_rect(fill="white"),
panel.grid = element_line(color = "black",linetype="solid",size= 0.3),
panel.grid.minor = element_blank(),
panel.grid.major.x=element_blank(),
axis.line.x=element_line(color="black"),
axis.ticks= element_blank(),
legend.position = "right",
legend.title = element_blank(),
legend.box.spacing = unit(1.5,"cm") ) +
scale_y_continuous(breaks= seq(from=0, to=50,by=5),
limits=c(0,51),
expand=c(0,0)) +
scale_fill_manual(values = c("#CF232B","#942192","#000000"))
I know how to set a custom font, just left it out for easier saving.
Using ggsave
ggsave("Graph_with_R.jpeg",plot=last_plot(),device="jpeg",dpi=300, width=18, height=9, units="cm")
I get this:
This might resemble the intended result, but the layout and sizes do not fit exactly. Also note the different text sizes between the axis titles, the legend and the y-axis tick labels. In addition, I assume the legend width depends on the actual labels and is not fixed.
Update
Following the suggestion of tjebo I posted a follow-up question.
Can it be done? Yes. Is it convenient? No.
If you're working in ggplot2, you can translate the plot to a gtable, a sort of intermediate between the plot specification and the actual drawing. You can then manipulate this gtable, but it is messy to work with.
First, we need to figure out where the relevant bits of our plot are in the gtable.
library(ggplot2)
library(gtable)
library(grid)
plt <- ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
geom_bar(position = position_dodge2(preserve = "single"))
# Making gtable
gt <- ggplotGrob(plt)
gtable_show_layout(gt)
Then, we can make a new gtable with prespecified dimensions and place the bits of our old gtable into it.
# Making a new gtable
new <- gtable(widths = unit(c(12.5, 1.5, 4), "cm"),
heights = unit(9, "cm"))
# Adding main panel and axes in first cell
new <- gtable_add_grob(
new,
gt[7:9, 3:5], # If you see the layout above as a matrix, the main bits are in these rows/cols
t = 1, l = 1
)
# Finding the legend
legend <- gt$grobs[gt$layout$name == "guide-box"][[1]]
legend <- legend$grobs[legend$layout$name == "guides"][[1]]
# Adding legend in third cell
new <- gtable_add_grob(
new, legend, t = 1, l = 3
)
# Saving as raster
ragg::agg_png("test.png", width = 18, height = 9, units = "cm", res = 300)
grid.newpage(); grid.draw(new)
dev.off()
#> png
#> 2
Created on 2021-04-02 by the reprex package (v1.0.0)
The created figure should match the dimensions you're looking for.
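The hard-coded indices in gt[7:9, 3:5] were read off gtable_show_layout() and may differ between ggplot2 versions. As a hedged alternative, the rows and columns can be looked up by name in gt$layout. This is a sketch only: the element names in `wanted` are the ones ggplot2 uses for a single-panel plot as far as I know, so check gt$layout$name on your own installation.

```r
library(ggplot2)

# Same example plot as above
plt <- ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
  geom_bar(position = position_dodge2(preserve = "single"))
gt <- ggplotGrob(plt)

# Layout entries for the panel, its left/bottom axes and the axis titles
# (names assumed; inspect gt$layout$name to confirm)
wanted <- c("panel", "axis-l", "axis-b", "xlab-b", "ylab-l")
keep <- gt$layout[gt$layout$name %in% wanted, ]

# Smallest block of rows/columns that contains all of them
rows <- min(keep$t):max(keep$b)
cols <- min(keep$l):max(keep$r)

# Can stand in for gt[7:9, 3:5] in the code above
main <- gt[rows, cols]
main$layout$name
```

The same idea applies to the legend lookup: matching on the name with grepl("guide-box", gt$layout$name) instead of an exact comparison may be more robust, but that is an assumption to verify against your ggplot2 version.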
Another option is to draw the three components as separate plots and stitch them together in the desired ratio.
The below comes quite close to the desired ratio, but not exactly. I guess you'd need to fiddle around with the values given the exact saving dimensions. In the example I used figure dimensions of 7x3.5 inches (which is similar to 18x9cm), and have added the black borders just to demonstrate the component limits.
library(tidyverse)
library(patchwork)
data <- midwest %>%
head(5) %>%
select(2,23:25) %>%
pivot_longer(cols=2:4,names_to="Variable", values_to="Percent") %>%
mutate(Variable=factor(Variable, levels=c("percbelowpoverty","percchildbelowpovert","percadultpoverty"),ordered=TRUE))
p1 <-
ggplot(data=data, mapping=aes(x=county, y=Percent, fill=Variable)) +
geom_col() +
scale_fill_manual(values = c("#CF232B","#942192","#000000"))
p_legend <- cowplot::get_legend(p1)
p_main <-
ggplot(data=data, mapping=aes(x=county, y=Percent, fill=Variable)) +
geom_col(show.legend = FALSE) +
scale_fill_manual(values = c("#CF232B","#942192","#000000"))
p_main + plot_spacer() + p_legend +
plot_layout(widths = c(12.5, 1.5, 4)) &
theme(plot.margin = margin(),
plot.background = element_rect(colour = "black"))
Created on 2021-04-02 by the reprex package (v1.0.0)
Update
My solution is only semi-satisfactory as pointed out by the OP. The problem is that one cannot (to my knowledge) define the position of the grob in the third panel.
Other ideas for workarounds:
One could determine the space needed for the text (which does not seem easy) and then size the device accordingly.
Create a fake legend. However, this requires the tiles and text to be left-aligned with no margin, and this can very quickly become very hacky.
In short, I think teunbrand's solution is probably the most straightforward one.
Update 2
The problem with the left alignment should be fixed with Stefan's suggestion in this thread
I made two heatmaps with the code:
I create the first heatmap
heatmap1<-ggplot(mod_mat_constraint, aes(x=Categorie, y=label)) +
geom_tile(aes(fill=Value)) + scale_fill_manual(values = c("#86d65e","#404040","#86d65e","#40c5e8","#e84a4a","#86d65e","#404040","#e2e2e2"), breaks=label_text)
I create the second heatmap
heatmap2<-ggplot(mod_mat_gen_env, aes(x=Categorie, y=label)) +
geom_tile(aes(fill=Value)) + scale_fill_manual(values = c("#86d65e","#404040","#86d65e","#40c5e8","#e84a4a","#86d65e","#404040","#e2e2e2"), breaks=label_text)
and I add them with a tree with:
heatmap2 %>% insert_left(tree) %>% insert_right(heatmap1, width=.5)
which gives me:
and I wondered if there were a way with ggplot2 to add an additional df box at the right corner such as:
from a dataframe called DF1
COL1 COL2 COL3
0.1 Peter USA
Hard to help precisely without a dataset, but to get a table overlaid on your plot, probably the best way would be to use annotation_custom() with a tableGrob() from the gridExtra package.
Here's an example heatmap pulled right from the R Graph Gallery which I used to add in your table as a grob.
# Library
library(ggplot2)
library(gridExtra)
# Dummy data
x <- LETTERS[1:20]
y <- paste0("var", seq(1,20))
data <- expand.grid(X=x, Y=y)
data$Z <- runif(400, 0, 5)
# Heatmap
p <- ggplot(data, aes(X, Y, fill= Z)) +
geom_tile()
df <- data.frame(COL1=0.1,COL2='Peter',COL3='USA')
# adding the table
p + coord_cartesian(clip='off') +
theme(plot.margin = margin(r=140)) +
annotation_custom(
grob=tableGrob(df, theme=ttheme_default(base_size = 7)),
xmin=20, xmax=27, ymin=1, ymax=5
)
You can probably use a similar approach in your case. Note the few things that I had to do here to get this to work:
Add the grob with annotation_custom(). You will need to experiment with the positioning values (xmin, xmax, ymin, ymax). You may also want to adjust the base size so that the table has sensible proportions relative to your plot.
Extend the plot margin so that you have the real estate on that side to include the table.
Turn clipping off so that you can see things outside the plot area properly.
I am trying to generate a heatmap where I can show more than one level of information on each cell. For each cell I would like to show a different color depending on its value in one variable and then overlay this with a transparency (alpha) that shades the cell according to its value for another variable.
Similar questions have been addressed here (Place 1 heatmap on another with transparency in R) and here (Making a heatmap in R varying both color and transparency). In both cases the suggestion is to use ggplot and overlay two geom_tiles, one with the colors and one with the transparency.
I have managed to overlay two geom_tiles (see code below). However, in my case, the problem is that the shading defined by the transparency (or "alpha") geom_tile also shades some cells that should remain as white or blank according to the colors (or "fill") geom_tile. I would like these cells to remain white even after overlaying the transparency.
#Create sample dataframe
df <- data.frame("x_pos" = c("A","A","A","B","B","B","C","C","C"),
"y_pos" = c("X","Y","Z","X","Y","Z","X","Y","Z"),
"col_var"= c(1,2,NA,4,5,6,NA,8,9),
"alpha_var" = c(7,12,0,3,2,15,0,6,15))
#Convert factor columns to numeric
df$col_var<- as.numeric(df$col_var)
df$alpha_var<- as.numeric(df$alpha_var)
#Cut display variable into breaks
df$col_var_cut <- cut(df$col_var,
breaks = c(0,3,6,10),
labels = c("cat1","cat2", "cat3"))
#Plot
library(ggplot2)
ggplot(df, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile () +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")),na.value="white") +
geom_tile(aes(alpha = alpha_var), fill ="gray29")+
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')+
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I would like cells "AZ" and "CX" in the heatmap resulting from the code above to be colored white instead of grey such that the alpha transparency doesn't apply to them. In my data, these cells have NA in the color variable (col_var) and can have a value of NA or 0 (as in the example code) in the transparency/alpha variable (alpha_var).
If this is not possible, I would like to know whether there are other options to display both variables in a heatmap while keeping the NA cells in col_var white. I am happy to use other packages or alternative heatmap layouts, such as those where the size of each cell or the thickness of its border varies according to the values of alpha_var. However, I am not sure how I could achieve this either.
Thanks in advance and my apologies for the cumbersome bits in the example code (I am still learning R and this is my first time asking questions here).
You were not far. See below for a possible solution. The first plot shows an implementation of adding transparency within the geom_tile call itself - note I removed the trans = reverse specification from your plot.
Plot 2 just adds back the white tiles on top of the other plot - simple hack which you will often find necessary when wanting to plot certain data points differently.
Note I have added a few minor comments to your code below.
# creating your data frame with better name - df is a base R function and not recommended as example name.
# Also note that I removed the quotation marks in the data frame call - they were not necessary. I also called as.numeric directly.
mydf <- data.frame(x_pos = c("A","A","A","B","B","B","C","C","C"), y_pos = c("X","Y","Z","X","Y","Z","X","Y","Z"), col_var= as.numeric(c(1,2,NA,4,5,6,NA,8,9)), alpha_var = as.numeric(c(7,12,0,3,2,15,0,6,15)))
mydf$col_var_cut <- cut(mydf$col_var, breaks = c(0,3,6,10), labels = c("cat1","cat2", "cat3"))
#Plot
library(tidyverse)
library(RColorBrewer) # you forgot to add this to your reprex
ggplot(mydf, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(aes(alpha = alpha_var)) +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")), na.value="white")
#> Warning: Removed 2 rows containing missing values (geom_text).
# a bit hacky for quick and dirty solution. Note I am using dplyr::filter from the tidyverse
ggplot(mapping = aes(x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(data = filter(mydf, !is.na(col_var))) +
geom_tile(data = filter(mydf, !is.na(col_var)), aes(alpha = alpha_var), fill ="gray29")+
geom_tile(data = filter(mydf, is.na(col_var)), fill = 'white') +
geom_text(data = mydf) +
scale_fill_manual(values = (brewer.pal(3, "RdYlBu"))) +
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2019-07-04 by the reprex package (v0.2.1)
I have a dataframe with 2 columns: date and var1.
Now I want to plot these 2 variables in a ggplot and add small lines with geom_rug().
df<-tibble(date=lubridate::today() -0:14,
var1= c(1,2.5,NA,3,NA,6.5,1,NA,3,2,NA,7,3,NA,1))
df%>%ggplot(aes(x=date,y=var1))+
geom_point()+
geom_rug(sides = "tr",outside = T) +
# Need to turn clipping off if rug is outside plot area
coord_cartesian(clip = "off")
And here is my plot:
But my problem is that the small lines for var1 are on the left side. I want to have them on the top.
With the argument sides= you can change the disposition of the small lines, like here:
df%>%ggplot(aes(x=date,y=var1))+
geom_point()+
geom_rug(sides = "t",outside = T) +
# Need to turn clipping off if rug is outside plot area
coord_cartesian(clip = "off")
But in this example the small lines are representing the date and not the var1. (var1 has only 10 values, but there are 15 small lines)
Can someone help me reverse the geom_rug element and avoid this problem?
You do have 15 rows in both df$var1 and df$date. In the former, 5 are NA.
One way to approach this is to define a reduced data set that contains only the rows relevant to what is plotted (no NAs) and pass it to geom_rug. With complete.cases we can omit the rows with NAs from the data set.
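To make the filtering step concrete, here is a minimal sketch in base R. The data frame is a stand-in for the one in the question (built with data.frame() and Sys.Date() instead of tibble and lubridate), and `keep` and `rug_data` are names invented for this example.

```r
# Stand-in for the question's data (base R instead of tibble/lubridate)
df <- data.frame(
  date = Sys.Date() - 0:14,
  var1 = c(1, 2.5, NA, 3, NA, 6.5, 1, NA, 3, 2, NA, 7, 3, NA, 1)
)

# TRUE for every row that has no NA in any column
keep <- complete.cases(df)

nrow(df)   # all rows, including those with NA in var1
sum(keep)  # rows that remain once the NA rows are dropped

# Reduced data set that can be handed to geom_rug(data = ...)
rug_data <- df[keep, ]
```

Passing only the reduced rows to the rug layer means a tick is drawn only for dates that actually have a point.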
You could use the following code to achieve your wanted plot
library(ggplot2)
library(tibble)
library(dplyr)
df %>% ggplot(aes(x = date, y = var1)) +
geom_point() +
geom_rug(data = df[complete.cases(df), ] , ## selected data: rows without NA
sides = "lt",
outside = TRUE) +
coord_cartesian(clip = "off")
Please let me know whether this is what you want.
Another option if you want to drop all cases with NA values while plotting is to use the ggplot2 remove_missing() function:
df %>% remove_missing() %>% ggplot(mapping = aes(x=date,y=var1))+
geom_point()+
geom_rug(sides = "t",outside = T) +
coord_cartesian(clip = "off")
Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: to align the plots, use gtable (or see the answers to your earlier question).
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit: To change the heights of the plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot the geoms cover the x-axis from about 0.5 to 27.5, while in the other plot the data only ranges from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect. The y-axis labels take up a little more space in the upper plot, because the numbers have two digits. The plot looks as follows:
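The range mismatch behind the misalignment can also be checked numerically. This is a rough sketch that uses layer_data() from ggplot2 to read the computed positions. The names `bars`, `pts`, `bar_range` and `point_range` are invented for this example. With the default bar width of 0.9, the bars should span 0.55 to 27.45 rather than exactly 0.5 to 27.5.

```r
library(ggplot2)

# Same data and plots as in the question
data <- data.frame(x = rep(1:27, each = 5), y = rep(1:5, times = 27))
other_data <- data.frame(x = 1:27, y = 50:76)

yes <- ggplot(data, aes(x = x, y = y)) + geom_point() + geom_line()
no  <- ggplot(other_data, aes(x = x, y = y)) + geom_bar(stat = "identity")

# Computed layer data: bars carry xmin/xmax, points only x
bars <- layer_data(no, 1)
pts  <- layer_data(yes, 1)

# Horizontal extent actually covered by each geom
bar_range   <- range(c(bars$xmin, bars$xmax))
point_range <- range(pts$x)

bar_range
point_range
```

Since each scale is trained on a different extent, the default axis ranges differ, which is why setting the same limits on both plots lines them up.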
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows:
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())