Related
I'm trying to create a graph in R using the ggplot2 package. I can make the graph without any issues as per the simplified example below:
library(tidyverse)
A <- c(10,9,8,7,6,5,4,3,2,1,2,3,4,5,6,7,8,9,10)
B <- c(15,14,13,12,11,10,9,8,7,6,7,8,9,10,11,12,13,14,15)
C <- rep(5,19)
D <- c(1:19)
data1 <- tibble(A, B, C, D)
data1 <- gather(data1, Type, Value, -D)
plot <- ggplot(data = data1, aes(x = D, y = Value))+
theme_light()+
geom_line(aes(colour = Type), size=0.75)+
theme(legend.title = element_blank())
The original graphs look like this but they are made in pretty much the same way as in the example:
The issue is that I want to crop A (red in example; solid blue line in the original graphs) when it hits C (blue in example; solid red line in original graph) without affecting B (green in example; everything else in original graph).
The complication is that I don't really want to change the structure of my original dataset so I'm hoping that there's a way of doing this while I'm creating the plot?
Many thanks,
Carolina
Yes, you can do it within the ggplot call by passing a filtered version of the data frame to the data argument of geom_line:
ggplot(data = data1, aes(x = D, y = Value))+
theme_light() +
geom_line(aes(colour = Type),
data = data1 %>%
filter(!(Type == "A" & Value > mean(Value[Type == "C"]))),
size = 0.75)+
theme(legend.title = element_blank())
However, from looking at your original plot, I think the tidiest way to do this would be to narrow the x axis limits to "chop off" either end of the lower blue line where it meets the horizontal red line, since this will not affect the dashed upper blue line at all. You can cut off the data with lims(x = c(0.25, 12.5) but retain the lower limit of 0 on your axis by setting coord_cartesian(xlim = c(0, 12.5))
I have been working on creating a heatmap for a few days and I cannot get the final formating of gridlines to work. See the codes and attached plots below. What I am trying to do is to align the gridline along the tiles of the heatmap using geom_tile() so each tile fills the inside of the grid in a box way. I was able to align the gridlines using geom_raster() but the y-axis label ticks at either the top or the bottom of the tile but I need it to tick at the center (See red highlight), also I cannot get geom_raster to wrap a white line border around the tiles so the color blocks looks a bit disorganized in my original dataset. Would be grateful for any help with the formatting codes. Thanks very much!
#The data set in long format
y<- c("A","A","A","A","B","B","B","B","B","C","C","C","D","D","D")
x<- c("2020-03-01","2020-03-15","2020-03-18","2020-03-18","2020-03-01","2020-03-01","2020-03-01","2020-03-01","2020-03-05","2020-03-06","2020-03-05","2020-03-05","2020-03-20","2020-03-20","2020-03-21")
v<-data.frame(y,x)
#approach 1 using geom_tile but gridline does not align with borders of the tiles
v%>%
count(y,x,drop=FALSE)%>%
arrange(n)%>%
ggplot(aes(x=x,y=fct_reorder(y,n,sum)))+
geom_tile(aes(fill=n),color="white", size=0.25)
I have tried running similar codes from another post but I wasn't able to get it to run properly. I think because my x variable is a count variable of y variable so cannot be formatted into a factor variable to specify xmin and xmax in geom_rect()
#approach 2 using geom_raster but y-axis label can't tick at the center of tiles and there's no border around the tile to differentiate between tiles.
v%>%
count(y,x,drop=FALSE)%>%
arrange(n)%>%
ggplot()+
geom_raster(aes(x=x,y=fct_reorder(y,n,sum),fill=n),hjust=0,vjust=0)
I think it makes sense to keep the ticks and in turn the grid lines where they are. To still achieve what you're looking for, I would suggest you expand your data to include all possible combinations and simply set the na.value to a neutral fill color:
# all possible combinations
all <- v %>% expand(y, x)
# join with all, n will be NA for obs. in all that are not present in v
v = v %>% group_by_at(vars(y, x)) %>%
summarize(n = n()) %>% right_join(all)
ggplot(data = v,
aes(x=x, y=fct_reorder(y,n, function(x) sum(x, na.rm = T))))+ # note that you must account for the NA values now
geom_tile(aes(fill=n), color="white",
size=0.25) +
scale_fill_continuous(na.value = 'grey90') +
scale_x_discrete(expand = c(0,0)) +
scale_y_discrete(expand = c(0,0))
This is a bit of a hack. My approach converts the categorical variables to numerics which adds minor grid lines to the plot which align with the tiles. To get rid of the major grid lines I simply use theme(). Drawback: Breaks and labels have to be set manually.
library(ggplot2)
library(dplyr)
library(forcats)
v1 <- v %>%
count(y,x,drop=FALSE)%>%
arrange(n) %>%
mutate(y = fct_reorder(y, n, sum),
y1 = as.integer(y),
x = factor(x),
x1 = as.integer(x))
labels_y <- levels(v1$y)
breaks_y <- seq_along(labels_y)
labels_x <- levels(v1$x)
breaks_x <- seq_along(labels_x)
ggplot(v1, aes(x=x1, y=y1))+
geom_tile(aes(fill=n), color="white", size=0.25) +
scale_y_continuous(breaks = breaks_y, labels = labels_y) +
scale_x_continuous(breaks = breaks_x, labels = labels_x) +
theme(panel.grid.major = element_blank())
Created on 2020-05-23 by the reprex package (v0.3.0)
Edit: Checked for long var names
y<- c("John Doe","John Doe","John Doe","John Doe","Mary Jane","Mary Jane","Mary Jane","Mary Jane","Mary Jane","C","C","C","D","D","D")
x<- c("2020-03-01","2020-03-15","2020-03-18","2020-03-18","2020-03-01","2020-03-01","2020-03-01","2020-03-01","2020-03-05","2020-03-06","2020-03-05","2020-03-05","2020-03-20","2020-03-20","2020-03-21")
v<-data.frame(y,x)
Created on 2020-05-23 by the reprex package (v0.3.0)
I am trying to generate a heatmap where I can show more than one level of information on each cell. For each cell I would like to show a different color depending on its value in one variable and then overlay this with a transparency (alpha) that shades the cell according to its value for another variable.
Similar questions have been addressed here (Place 1 heatmap on another with transparency in R) a
and here (Making a heatmap in R varying both color and transparency). In both cases the suggestion is to use ggplot and overlay two geom_tiles, one with the colors one with the transparency.
I have managed to overlay two geom_tiles (see code below). However, in my case, the problem is that the shading defined by the transparency (or "alpha") geom_tile also shades some cells that should remain as white or blank according to the colors (or "fill") geom_tile. I would like these cells to remain white even after overlaying the transparency.
#Create sample dataframe
df <- data.frame("x_pos" = c("A","A","A","B","B","B","C","C","C"),
"y_pos" = c("X","Y","Z","X","Y","Z","X","Y","Z"),
"col_var"= c(1,2,NA,4,5,6,NA,8,9),
"alpha_var" = c(7,12,0,3,2,15,0,6,15))
#Convert factor columns to numeric
df$col_var<- as.numeric(df$col_var)
df$alpha_var<- as.numeric(df$alpha_var)
#Cut display variable into breaks
df$col_var_cut <- cut(df$col_var,
breaks = c(0,3,6,10),
labels = c("cat1","cat2", "cat3"))
#Plot
library(ggplot2)
ggplot(df, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile () +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")),na.value="white") +
geom_tile(aes(alpha = alpha_var), fill ="gray29")+
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')+
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I would like cells "AZ" and "CX" in the heatmap resulting from the code above to be colored white instead of grey such that the alpha transparency doesn't apply to them. In my data, these cells have NA in the color variable (col_var) and can have a value of NA or 0 (as in the example code) in the transparency/alpha variable (alpha_var).
If this is not possible, then I would like to know whether there are other options to display both variables in a heatmap and keep the NA cells in the col_var white? I am happy to use other packages or alternative heatmap layouts such as those where the size of each cell or the thickness of its border vary according to the values the alpha_var. However, I am not sure how I could achieve this either.
Thanks in advance and my apologies for the cumbersome bits in the example code (I am still learning R and this is my first time asking questions here).
You were not far. See below for a possible solution. The first plot shows an implementation of adding transparency within the geom_tile call itself - note I removed the trans = reverse specification from your plot.
Plot 2 just adds back the white tiles on top of the other plot - simple hack which you will often find necessary when wanting to plot certain data points differently.
Note I have added a few minor comments to your code below.
# creating your data frame with better name - df is a base R function and not recommended as example name.
# Also note that I removed the quotation marks in the data frame call - they were not necessary. I also called as.numeric directly.
mydf <- data.frame(x_pos = c("A","A","A","B","B","B","C","C","C"), y_pos = c("X","Y","Z","X","Y","Z","X","Y","Z"), col_var= as.numeric(c(1,2,NA,4,5,6,NA,8,9)), alpha_var = as.numeric(c(7,12,0,3,2,15,0,6,15)))
mydf$col_var_cut <- cut(mydf$col_var, breaks = c(0,3,6,10), labels = c("cat1","cat2", "cat3"))
#Plot
library(tidyverse)
library(RColorBrewer) # you forgot to add this to your reprex
ggplot(mydf, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(aes(alpha = alpha_var)) +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")), na.value="white")
#> Warning: Removed 2 rows containing missing values (geom_text).
# a bit hacky for quick and dirty solution. Note I am using dplyr::filter from the tidyverse
ggplot(mapping = aes(x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(data = filter(mydf, !is.na(col_var))) +
geom_tile(data = filter(mydf, !is.na(col_var)), aes(alpha = alpha_var), fill ="gray29")+
geom_tile(data = filter(mydf, is.na(col_var)), fill = 'white') +
geom_text(data = mydf) +
scale_fill_manual(values = (brewer.pal(3, "RdYlBu"))) +
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2019-07-04 by the reprex package (v0.2.1)
This has been something I've been experimenting with to find a fix for a while, but basically I was wondering if there is a quick way to "dodge" lineplots for two different data sets in ggplot2.
My code is currently:
#Example data
id <- c("A","A")
var <- c(1,10)
id_num <- c(1,1)
df1 <- data.frame(id,var,id_num)
id <- c("A","A")
var <- c(1,15)
id_num <- c(0.9,0.9)
df2 <- data.frame(id,var,id_num)
#Attempted plot
dodge <- position_dodge(width=0.5)
p<- ggplot(data= df1, aes(x=var, y=id)) +
geom_line(aes(colour="Group 1"),position="dodge") +
geom_line(data= df2,aes(x=var, y=id,colour="Group 2"),position="dodge") +
scale_color_manual("",values=c("salmon","skyblue2"))
p
Which produces:
Here the "Group 2" line is hiding all of the "Group 1" line which is not what I want. Instead, I want the "Group 2" line to be below the "Group 1" line. I've looked around and found this previous post: ggplot2 offset scatterplot points but I can't seem to adapt the code to get two geom_lines to dodge each other when using separate data frames.
I've been converting my y-variables to numeric and slightly offsetting them to get the desired output, but I was wondering if there was a faster/easier way to get the same result using the dodge functionality of ggplot or something else.
My work around code is simply:
p<- ggplot(data= df1, aes(x=var, y=id_num)) +
geom_line(aes(colour="Group 1")) +
geom_line(data= df2,aes(x=var, y=id_num,colour="Group 2")) +
scale_color_manual("",values=c("salmon","skyblue2")) +
scale_y_continuous(lim=c(0,1))
p
Giving me my desired output of:
Desired output:
The numeric approach can be a little cumbersome when I try to expand it to fit my actual data. I have to convert my y-values to factors, change them to numeric and then merge the values onto the second data set, so a quicker way would be preferable. Thanks in advance for your help!
You have actually two issues here:
If the two lines are plotted using two layers of geom_line() (because you have two data frames), then each line "does not know" about the other. Therefore, they can not dodge each other.
position_dodge() is used to dodge in horizontal direction. The standard example is a bar plot, where you place various bars next to each other (instead of on top of each other). However, you want to dodge in vertical direction.
Issue 1 is solved by combining the data frames into one as follows:
library(dplyr)
df_all <- bind_rows(Group1 = df1, Group2 = df2, .id = "group")
df_all
## Source: local data frame [4 x 4]
##
## group id var id_num
## (chr) (fctr) (dbl) (dbl)
## 1 Group1 A 1 1.0
## 2 Group1 A 10 1.0
## 3 Group2 A 1 0.9
## 4 Group2 A 15 0.9
Note how setting .id = "Group" lets bind_rows() create a column group with the labels taken from the names that were used together with df1 and df2.
You can then plot both lines with a single geom_line():
library(ggplot2)
ggplot(data = df_all, aes(x=var, y=id, colour = group)) +
geom_line(position = position_dodge(width = 0.5)) +
scale_color_manual("",values=c("salmon","skyblue2"))
I also used position_dodge() to show you issue 2 explicitly. If you look closely, you can see the red line stick out a little on the left side. This is the consequence of the two lines dodging each other (not very successfully) in vertical direction.
You can solve issue 2 by exchanging x and y coordinates. In that situation, dodging horizontally is the right thing to do:
ggplot(data = df_all, aes(y=var, x=id, colour = group)) +
geom_line(position = position_dodge(width = 0.5)) +
scale_color_manual("",values=c("salmon","skyblue2"))
The last step is then to use coord_flip() to get the desired plot:
ggplot(data = df_all, aes(y=var, x=id, colour = group)) +
geom_line(position = position_dodge(width = 0.5)) +
scale_color_manual("",values=c("salmon","skyblue2")) +
coord_flip()
I am using ggplot to generate a chart that summarises a race made up from several laps. There are 24 participants in the race,numbered 1-12, 14-25; I am plotting out a summary measure for each participant using ggplot, but ggplot assumes I want the number range 1-25, rather than categories 1-12, 14-25.
What's the fix for this? Here's the code I am using (the data is sourced from a Google spreadsheet).
sskey='0AmbQbL4Lrd61dHlibmxYa2JyT05Na2pGVUxLWVJYRWc'
library("ggplot2")
require(RCurl)
gsqAPI = function(key,query,gid){ return( read.csv( paste( sep="", 'http://spreadsheets.google.com/tq?', 'tqx=out:csv', '&tq=', curlEscape(query), '&key=', key, '&gid=', curlEscape(gid) ) ) ) }
sin2011racestatsX=gsqAPI(sskey,'select A,B,G',gid='13')
sin2011proximity=gsqAPI(sskey,'select A,B,C',gid='12')
h=sin2011proximity
k=sin2011racestatsX
l=subset(h,lap==1)
ggplot() +
geom_step(aes(x=h$car, y=h$pos, group=h$car)) +
scale_x_discrete(limits =c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL','','SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))+
xlab(NULL) + opts(title="F1 2011 Korea \nRace Summary Chart", axis.text.x=theme_text(angle=-90, hjust=0)) +
geom_point(aes(x=l$car, y=l$pos, pch=3, size=2)) +
geom_point(aes(x=k$driverNum, y=k$classification,size=2), label='Final') +
geom_point(aes(x=k$driverNum, y=k$grid, col='red')) +
ylab("Position")+
scale_y_discrete(breaks=1:24,limits=1:24)+ opts(legend.position = "none")
Expanding on my cryptic comment, try this:
#Convert these to factors with the appropriate labels
# Note that I removed the ''
h$car <- factor(h$car,labels = c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL',
'SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))
k$driverNum <- factor(k$driverNum,labels = c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL',
'SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))
l=subset(h,lap==1)
ggplot() +
geom_step(aes(x=h$car, y=h$pos, group=h$car)) +
geom_point(aes(x=l$car, y=l$pos, pch=3, size=2)) +
geom_point(aes(x=k$driverNum, y=k$classification,size=2), label='Final') +
geom_point(aes(x=k$driverNum, y=k$grid, col='red')) +
ylab("Position") +
scale_y_discrete(breaks=1:24,limits=1:24) + opts(legend.position = "none") +
opts(title="F1 2011 Korea \nRace Summary Chart", axis.text.x=theme_text(angle=-90, hjust=0)) + xlab(NULL)
Calling scale_x_discrete is no longer necessary. And stylistically, I prefer putting opts and xlab stuff at the end.
Edit
A few notes in response to your comment. Many of your difficulties can be eased by a more streamlined use of ggplot. Your data is in an awkward format:
#Summarise so we can use geom_linerange rather than geom_step
d1 <- ddply(h,.(car),summarise,ymin = min(pos),ymax = max(pos))
#R has a special value for missing data; use it!
k$classification[k$classification == 'null'] <- NA
k$classification <- as.integer(k$classification)
#The other two data sets should be merged and converted to long format
d2 <- merge(l,k,by.x = "car",by.y = "driverNum")
colnames(d2)[3:5] <- c('End of Lap 1','Final Position','Grid Position')
d2 <- melt(d2,id.vars = 1:2)
#Now the plotting call is much shorter
ggplot() +
geom_linerange(data = d1,aes(x= car, ymin = ymin,ymax = ymax)) +
geom_point(data = d2,aes(x= car, y= value,shape = variable),size = 2) +
opts(title="F1 2011 Korea \nRace Summary Chart", axis.text.x=theme_text(angle=-90, hjust=0)) +
labs(x = NULL, y = "Position", shape = "")
A few notes. You were setting aesthetics to fixed values (size = 2) which should be done outside of aes(). aes() is for mapping variables (i.e. columns) to aesthetics (color, shape, size, etc.). This allows ggplot to intelligently create the legend for you.
Merging the second two data sets and then melting it creates a grouping variable for ggplot to use in the legend. I used the shape aesthetic since a few values overlap; using color may make that hard to spot. In general, ggplot will resist mixing aesthetics into a single legend. If you want to use shape, color and size you'll get three legends.
I prefer setting labels using labs, since you can do them all in one spot. Note that setting the aesthetic label to "" removes the legend title.