I'm trying to make a heatmap with ggplot2 using the geom_tile function.
Here is my code:
p<-ggplot(data,aes(Treatment,organisms))+geom_tile(aes(fill=S))+
scale_fill_gradient(low = "black",high = "red") +
scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme(legend.position = "right",
axis.ticks = element_blank(),
axis.text.x = element_text(size = base_size, angle = 90, hjust = 0, colour = "black"),
axis.text.y = element_text(size = base_size, hjust = 1, colour = "black"))
data is my data.csv file
my X axis is types of Treatment
my Y axis is types of organisms
I'm not too familiar with commands and programming and I'm relatively new at this. I just want to be able to specify the order of the labels on the x axis. In this case, I'm trying to specify the order of "Treatment". By default, it orders alphabetically. How do I override this/keep the data in the same order as in my original csv file?
I've tried this command
scale_x_discrete(limits=c("Y","X","Z"))
where X, Y and Z are my treatment conditions in the order I want. However, it doesn't work very well and gives me missing tiles in the heatmap.
It is a little difficult to answer your specific question without a full, reproducible example. However something like this should work:
#Turn your 'treatment' column into a character vector
data$Treatment <- as.character(data$Treatment)
#Then turn it back into a factor with the levels in the correct order
data$Treatment <- factor(data$Treatment, levels=unique(data$Treatment))
In this example, the order of the factor will be the same as in the data.csv file.
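To make the re-levelling step concrete, here is a minimal sketch on a made-up data frame (the `toy` data and its values are assumptions, not your data). It shows that unique() keeps values in order of first appearance, which is what preserves the row order of the file.

```r
# Hypothetical stand-in for the asker's data; the values are made up
toy <- data.frame(Treatment = c("Y", "Y", "X", "Z", "X"),
                  stringsAsFactors = TRUE)

# Strings read in as factors get alphabetically sorted levels
default_levels <- levels(toy$Treatment)

# Re-level in order of first appearance in the data
toy$Treatment <- as.character(toy$Treatment)
toy$Treatment <- factor(toy$Treatment, levels = unique(toy$Treatment))
file_order_levels <- levels(toy$Treatment)

print(default_levels)     # "X" "Y" "Z"
print(file_order_levels)  # "Y" "X" "Z"
```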
If you prefer a different order, you can order them by hand:
data$Treatment <- factor(data$Treatment, levels=c("Y", "X", "Z"))
However, this is risky if you have a lot of levels: any value in the column that does not exactly match one of the levels you typed becomes NA.
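As a small illustration of that risk, the sketch below uses made-up treatment values and a deliberately misspelled level. The names `treatment`, `wanted` and `unmatched` are only for this example.

```r
# Made-up treatment column and a hand-typed level vector with a typo
treatment <- c("Y", "X", "Z", "X")
wanted <- c("Y", "x", "Z")   # lower-case "x" does not match "X"

bad <- factor(treatment, levels = wanted)
n_lost <- sum(is.na(bad))    # the two "X" rows silently become NA

# Guard: list data values that are missing from the level vector
unmatched <- setdiff(unique(treatment), wanted)

print(n_lost)     # 2
print(unmatched)  # "X"
```

Running a setdiff() check like this before building the factor is one cheap way to catch a misspelled level.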
One can also simply factorise within the aes() call directly. I am not sure why setting the limits doesn't work for you - I assume you get NA's because you might have typos in your level vector.
The below is certainly not much different than user Drew Steen's answer, but with the important difference of not changing the original data frame.
library(ggplot2)
## this vector might be useful for other plots/analyses
level_order <- c('virginica', 'versicolor', 'setosa')
p <- ggplot(iris)
p + geom_bar(aes(x = factor(Species, levels = level_order)))
## or directly in the aes() call without a pre-created vector:
p + geom_bar(aes(x = factor(Species, levels = c('virginica', 'versicolor', 'setosa'))))
## plot identical to the above - not shown
## or use your vector as limits in scale_x_discrete
p + geom_bar(aes(x = Species)) +
scale_x_discrete(limits = level_order)
Created on 2022-11-20 with reprex v2.0.2
When x-axis labels are rotated in ggplot2, the labels are sometimes cut off.
I looked at the posts "How can I manipulate a ggplot in R to allow extra room on lhs for angle=45 long x-axis labels?" and "ggplot2 plot area margins?". The suggestion in both cases is to use the plot.margin parameter, but I'm wondering if there's a more elegant and dynamic solution. In my application users will be allowed to change the font size of the axis labels, so hardcoding a plot margin does not seem like a good approach. Are there other ways to avoid this effect? Is it possible to manipulate the layout somehow?
Code to reproduce:
library(ggplot2)
categories <- c(
"Entertainment",
"Research",
"Development",
"Support",
"Classic",
"Old Time"
)
years <- 2020:2021
types <- c(
"Easy",
"Pro",
"Free",
"Trial",
"Subscription"
)
d <- expand.grid(category = categories,
type = types,
year = years)
counts <- sample(0:100, size = nrow(d))
d$n <- counts
ggplot(
data = d,
aes(x = category, y = n, fill = category)
) + geom_bar(stat = "identity") +
facet_grid(rows = vars(year), cols = vars(type)) +
theme(
axis.text.x = element_text(
angle = 22.5,
hjust = 1,
size = 12
)
)
I don't see any way to do this automatically natively using ggplot2 tools, so why not write a small function that sets the size of the margin based on the number of characters in the leftmost x category value?
margin_spacer <- function(x) {
# where x is the column in your dataset
left_length <- nchar(levels(factor(x)))[1]
if (left_length > 8) {
return((left_length - 8) * 4)
}
else
return(0)
}
The function can deal with a character column (or factor), and checks the number of characters in the first level (which would appear on the left in the plot). Fiddling around, it seemed anything longer than 8 posed an issue for the plot code, so this adds 4 points to the margin for every character past 8 characters.
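As a quick check of the arithmetic, the sketch below re-declares the same function in a compact form and feeds it a few first labels of different lengths. The helper data (`first_labels`, the second level "Research") are only for illustration.

```r
# Same rule as above: 4 points of margin per character beyond 8
margin_spacer <- function(x) {
  left_length <- nchar(levels(factor(x)))[1]
  if (left_length > 8) (left_length - 8) * 4 else 0
}

first_labels <- c("Enter", "Entertain", "Entertainer",
                  "Entertainment", "Quality Entertainment")

# Each label takes a turn as the first (leftmost) factor level
extra_margin <- sapply(first_labels, function(lab) {
  margin_spacer(factor(c(lab, "Research"), levels = c(lab, "Research")))
})
print(extra_margin)  # 0, 4, 12, 20, 52

# A plain character column is sorted alphabetically by factor(),
# so "Classic" (7 characters) is treated as the leftmost label here
alphabetical_case <- margin_spacer(c("Entertainment", "Classic"))
print(alphabetical_case)  # 0
```

The alphabetical case matches how ggplot2 places a character column on a discrete axis, so the function and the plot should agree on which label is leftmost.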
Note also that I changed the angle of the x-axis text on your plot: I think 22.5 is too shallow, and at your text size it gives a lot of overlap on my graphics device. This means the values 8 and 4 may not work quite as well for you, but here is how the function behaves for a few different first labels.
Here's the new plot code:
ggplot(data = d, aes(x = category, y = n, fill = category)) +
geom_bar(stat = "identity") +
facet_grid(rows = vars(year), cols = vars(type)) +
theme(
axis.text.x = element_text(angle = 40, hjust = 1, size = 12),
plot.margin = margin(l = 0 + margin_spacer(d$category))
)
I created the following plots by changing the code where categories is defined: for each one, the first entry in categories <- c(...) is changed as listed below and the plot code above is rerun. You'll note it works pretty well unless the label is extremely long. As the text gets really long, the text size may have to be adjusted as well. If you think your users are going to get carried away with labels, you can use a similar strategy to adjust text size, but that's probably overkill.
"Enter" (5 characters)
"Entertain" (9 characters)
"Entertainer" (11 characters)
"Entertainment" (13 characters)
"Quality Entertainment" (21 characters)
I want to make a Munsell color chart for the chips used by the World Color Survey (WCS). It should look like this:
The information needed can be found on the WCS page. I take the following steps:
library(munsell) # https://cran.r-project.org/web/packages/munsell/munsell.pdf
library(ggplot2)
# take the "cnum-vhcm-lab-new.txt" file from: https://www1.icsi.berkeley.edu/wcs/data.html#wmt
# change by replacing .50 with .5 removing .00 after hue values
WCS <- read.csv("cnum-vhcm-lab-new.txt", sep = "\t", header = T)
WCS$hex <- mnsl2hex(hvc2mnsl(hue = WCS$MunH, value = ceiling(WCS$MunV), chroma = WCS$C), fix = T)
# this works, but the order of tiles is messed up
ggplot(aes(x=H, y=V, fill=hex), data = WCS) +
geom_tile(aes(x=H, y=V), show.legend = F) +
scale_fill_manual(values = WCS$hex) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 40))
The result:
Clearly, the chips are not ordered by hue and value but by some other dimension, perhaps even their order in the original data frame. I also have to reverse the order on the y-axis. I guess the solution involves factor() and reorder(), but how do I do it?
OP. TL;DR - you should be using scale_fill_identity() rather than scale_fill_manual().
Now for the long description: At its core, ggplot2 functions on mapping the columns of your data to specific features on the plot, which ggplot2 refers to as "aesthetics" using the aes() function. Positioning is defined by mapping certain columns of your data to x and y aesthetics, and the different colors in your tiles are mapped to fill using aes() as well.
The mapping for fill does not specify color, but only specifies which things should be different colors. When mapped this way, it means that rows in your data (observations) that have the same value in column mapped to the fill aesthetic will be the same color, and observations that have different values in the column mapped to the fill aesthetic will be different colors. Importantly, this does not specify the color, but only specifies if colors should be different!
The default behavior is that ggplot2 will determine the colors to use by applying a default scale. For continuous (numeric) values, a continuous scale is applied, and for discrete values (like a vector of characters), a discrete scale is applied.
To see the default behavior, just remove scale_fill_manual(...) from your plot code. I've recopied your code below and added the needed revisions to programmatically remove and adjust the ".50" and ".00" changes to WCS$MunH. The code below should work entirely if you have downloaded the original .txt file from the link you provided.
library(munsell)
library(ggplot2)
WCS <- read.csv("cnum-vhcm-lab-new.txt", sep = "\t", header = T)
WCS$MunH <- gsub('.50', '.5', WCS$MunH, fixed = TRUE) # remove trailing "0" after ".50"; fixed = TRUE treats "." literally
WCS$MunH <- gsub('.00', '', WCS$MunH, fixed = TRUE) # remove ".00" altogether
WCS$V <- factor(WCS$V) # needed to flip the axis
WCS$hex <- mnsl2hex(hvc2mnsl(hue = WCS$MunH, value = ceiling(WCS$MunV), chroma = WCS$C), fix = T)
ggplot(aes(x=H, y=V, fill=hex), data = WCS) +
geom_tile(aes(x=H, y=V), show.legend = F, width=0.8, height=0.8) +
scale_y_discrete(limits = rev(levels(WCS$V))) + # flipping the axis
scale_x_continuous(breaks = scales::pretty_breaks(n = 40)) +
coord_fixed() + # force all tiles to be "square"
theme(
panel.grid = element_blank()
)
You have show.legend = F in there, but there should be 324 different values mapped to the WCS$hex column (i.e. length(unique(WCS$hex))).
When using scale_fill_manual(values=...), you are supplying the names of the colors to be used, but they are not mapped to the same positions in your column WCS$hex. They are applied according to the way in which ggplot2 decides to organize the levels of WCS$hex as if it were a factor.
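The mismatch can be simulated in base R. This is only a sketch of the pairing described above, not ggplot2's internal code, and the three colours are made up.

```r
hex <- c("#FF0000", "#0000FF", "#00FF00")  # made-up colours, one per row

# The distinct values are ordered like factor levels (sorted)
lv <- levels(factor(hex))

# Unnamed `values` are paired with those levels by position
drawn <- setNames(hex, lv)[hex]

comparison <- data.frame(wanted = hex, drawn = unname(drawn))
print(comparison)  # no row gets the colour it asked for

# Naming each value by itself restores the intended pairing
named <- setNames(unique(hex), unique(hex))
```

Passing a self-named vector like `named` to scale_fill_manual(values = ...) should match colours by name instead of by position, although scale_fill_identity() is the simpler route.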
In order to tell ggplot2 to basically ignore the mapping and just color according to the actual color name you see in the column mapped to fill, you use scale_fill_identity(). This will necessarily remove the ability to show any legend, since it kind of removes the mapping and recoloring that is the default behavior of aes(fill=...). Regardless, this should solve your issue:
ggplot(aes(x=H, y=V, fill=hex), data = WCS) +
geom_tile(aes(x=H, y=V), width=0.8, height=0.8) +
scale_fill_identity() + # assign color based on text
scale_y_discrete(limits = rev(levels(WCS$V))) + # flipping the axis
scale_x_continuous(breaks = scales::pretty_breaks(n = 40)) +
coord_fixed() + # force all tiles to be "square"
theme(
panel.grid = element_blank()
)
The main thing is to use the right color scale (scale_fill_identity). This ensures the hex values are used as the colors of the tiles.
library(munsell) # https://cran.r-project.org/web/packages/munsell/munsell.pdf
library(ggplot2)
WCS <- read.csv(url('https://www1.icsi.berkeley.edu/wcs/data/cnum-maps/cnum-vhcm-lab-new.txt'), sep = "\t", header = T)
WCS$hex <- mnsl2hex(hvc2mnsl(hue = gsub('.00', '', gsub('.50', '.5', WCS$MunH, fixed = TRUE), fixed = TRUE), value = ceiling(WCS$MunV), chroma = WCS$C), fix = T)
# scale_fill_identity() uses the hex codes in the fill column directly
ggplot(aes(x=H, y=V, fill=hex), data = WCS) +
geom_tile(aes(x=H, y=V), show.legend = F) +
scale_fill_identity() +
scale_x_continuous(breaks = scales::pretty_breaks(n = 40))
Created on 2021-10-05 by the reprex package (v2.0.1)
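A side note on the hue clean-up used in both answers: in a regular expression the "." matches any character, so a pattern such as '.50' only does the right thing here because of how the hue strings happen to look. The sketch below wraps the clean-up in a hypothetical helper, clean_hue() (not part of munsell), and uses fixed = TRUE so the patterns are matched literally.

```r
# Hypothetical helper: turn "2.50R" into "2.5R" and "10.00RP" into "10RP"
clean_hue <- function(h) {
  h <- gsub(".50", ".5", h, fixed = TRUE)  # literal ".50" only
  gsub(".00", "", h, fixed = TRUE)         # literal ".00" only
}

cleaned <- clean_hue(c("2.50R", "10.00RP", "5.00Y", "7.50YR"))
print(cleaned)  # "2.5R" "10RP" "5Y" "7.5YR"

# Why the literal match matters: as a regex, ".50" also matches "150"
regex_result <- gsub(".50", ".5", "1500")                # ".50"
fixed_result <- gsub(".50", ".5", "1500", fixed = TRUE)  # "1500"
```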
So I'm using ggplot2 to plot both a bar graph and points. I'm currently getting this:
As you can see, the bars are nicely separated and colored in the desired colors. However, my points are all uncolored and stacked on top of each other. I would like each point to sit above its designated bar and have the same color.
#Add bars
A <- A + geom_col(aes(y = w1, fill = factor(Species1)),
position = position_dodge(preserve = 'single'))
#Add colors
A <- A + scale_fill_manual(values = c("A. pelagicus"= "skyblue1","A. superciliosus"="dodgerblue","A. vulpinus"="midnightblue","Alopias sp."="black"))
#Add points
A <- A + geom_point(aes(y = f1/2.5),
shape= 24,
size = 3,
fill = factor(Species1),
position = position_dodge(preserve = 'single'))
#change x and y axis range
A <- A + scale_x_continuous(breaks = c(2000:2020), limits = c(2016,2019))
A <- A + expand_limits(y=c(0,150))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
A <- A + scale_y_continuous(sec.axis = sec_axis(~.*2.5, name = " "))
# modifying axis and title
A <- A + labs(y = " ",
x = " ")
A <- A + theme(plot.title = element_text(size = rel(4)))
A <- A + theme(axis.text.x = element_text(face="bold", size=14, angle=45),
axis.text.y = element_text(face="bold", size=14))
#A <- A + theme(legend.title = element_blank(),legend.position = "none")
#Print plot
A
When I run this code I get the following error:
Error: Unknown colour name: A. pelagicus
In addition: Warning messages:
1: Width not defined. Set with position_dodge(width = ?)
2: In max(table(panel$xmin)) : no non-missing arguments to max; returning -Inf
I've tried a couple of things, but I can't figure out why it works for geom_col and not for geom_point.
Thanks in advance
You have two basic problems: the color error and the points not dodging. The error comes from fill = factor(Species1) in your geom_point() call, which sits outside aes(), so ggplot2 tries to read the species names as literal colour names; fill has to be mapped inside aes() (or left out). The dodging is fixed by adding the group= aesthetic.
You'll see the answer to both questions using an example:
# dummy dataset
year <- c(rep(2017, 4), rep(2018, 4))
species <- rep(c('things', 'things1', 'wee beasties', 'ew'), 2)
values <- c(10, 5, 5, 4, 60, 10, 25, 7)
pt.value <- c(8, 7, 10, 2, 43, 12, 20, 10)
df <-data.frame(year, species, values, pt.value)
I made the "values" set for my column heights and I wanted to use a different y aesthetic for points for illustrative purposes, called "pt.value". Otherwise, the data setup is similar to your own. Note that df$year will be set as numeric, so it's best to change that into either Date format (kinda more trouble than it's worth here), or just as a factor, since "2017.5" isn't gonna make too much sense here :). The point is, I need "year" to be discrete, not continuous.
Solve the color error
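For the plot, I'll build it much like yours. A note on scale_fill_manual(): the values= argument accepts a named character vector such as the c("A. pelagicus" = "skyblue1", ...) in your code, so that part of your code is fine; the list in the code below is just how I happened to write it. The "Unknown colour name" error comes instead from fill = factor(Species1) being passed to geom_point() outside aes(), where ggplot2 reads the species names as literal colours. My geom_point() below leaves fill out.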
ggplot(df, aes(x=as.factor(year), y=values)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
So the colors are applied correctly and my axis is discrete, and the y values of the points are mapped to pt.value like I wanted, but why don't the points dodge?!
Solve the dodging issue
Dodging is a funny thing in ggplot2. The best reasoning here I can give you is that for columns and barplots, dodging is sort of "built-in" to the geom, since the default position is "stack" and "dodge" represents an alternative method to draw the geom. For points, text, labels, and others, the default position is "identity" and you have to be more explicit in how they are going to dodge or they just don't dodge at all.
Basically, we need to let the points know what they are dodging based on. Is it "species"? With geom_col, it's assumed to be, but with geom_point, you need to specify. We do that by using a group= aesthetic, which lets geom_point know what to use as the criterion for dodging. When you add that, it works!
ggplot(df, aes(x=as.factor(year), y=values, group=species)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
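If the points should also take the colour of their bar, as asked in the question, one option is to map fill inside aes() for the points too. The sketch below reuses the dummy data from above; `species_cols` is a name chosen for this example, and a named character vector is used for values=.

```r
library(ggplot2)

# Same dummy data as above, with year already stored as a factor
df <- data.frame(
  year     = factor(rep(c(2017, 2018), each = 4)),
  species  = rep(c("things", "things1", "wee beasties", "ew"), 2),
  values   = c(10, 5, 5, 4, 60, 10, 25, 7),
  pt.value = c(8, 7, 10, 2, 43, 12, 20, 10)
)

# Named character vector: names are species, values are colours
species_cols <- c("ew" = "skyblue1", "things" = "dodgerblue",
                  "things1" = "midnightblue", "wee beasties" = "gray")

p <- ggplot(df, aes(x = year, y = values, group = species)) +
  geom_col(aes(fill = species),
           position = position_dodge(width = 0.62), width = 0.6) +
  # fill is mapped inside aes(), so shape 24 is filled per species
  geom_point(aes(y = pt.value, fill = species), shape = 24, size = 3,
             position = position_dodge(width = 0.62)) +
  scale_fill_manual(values = species_cols) +
  theme_bw() + labs(x = "Year")
p
```

Shape 24 is a filled triangle, so it responds to the fill aesthetic; a shape without a fill would need colour= mapped instead.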
I have a bar graph coming from one set of monthly data and I want to overlay on it data from another set of monthly data in the form of a line. Here is a simplified example (in my data the second data set is not a simple manipulation of the first):
library(reshape2)
library(ggplot2)
test<-abs(rnorm(12)*1000)
test<-rbind(test, test+500)
colnames(test)<-month.abb[seq(1:12)]
rownames(test)<-c("first", "second")
otherTest<-apply(test, 2, mean)
test<-melt(test)
otherTest<-as.data.frame(otherTest)
p<-ggplot(test, aes(x=Var2, y=value, fill=Var1, order=-as.numeric(Var2))) + geom_bar(stat="identity")+
theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("Test Graph") +
scale_fill_manual(values = c(rgb(1,1,1), rgb(.9,0,0))) +
guides(fill=FALSE) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
works great to get the bar graph:
but I have tried multiple iterations to get the line on there and can't figure it out (like this):
p + geom_line(data=otherTest,size=1, color=rgb(0,.5,0))
Also, if anybody knows how I can make the bars in front of each other so that all you see is a red bar of height 500, I would appreciate any suggestions. I know I can just take the difference between the two lines of the matrix and keep it as a stacked bar but I thought there might be an easy way to put both bars on the x-axis, white in front of red. Thanks!
You have a few problems to deal with here.
Directly answering your question, if you don't provide a mapping via aes(...) in a geom call (like your geom_line...), then the mapping will come from ggplot(). Your ggplot() specifies x=Var2, y=value, fill=Var1.... All of these variable names must exist in your data frame otherTest for this to work, and they don't right now.
So, you either need to ensure that these variable names exist in otherTest, or specify mapping separately in geom_line. You might want to read up about how these layering options work. E.g., here's a post of mine that goes into some detail.
If you go for the first option, some other problems to think about:
is Var2 a factor with the same levels in both data frames? It probably should be.
to use geom_line as you are, you might need to add group = 1. See here.
Some others too, but here's a brief example of what you might do:
library(reshape2)
library(ggplot2)
test <- abs(rnorm(12)*1000)
test <- rbind(test, test+500)
colnames(test) <- month.abb[seq(1:12)]
rownames(test) <- c("first", "second")
otherTest <- apply(test, 2, mean)
test <- melt(test)
otherTest <- data.frame(
Var2 = names(otherTest),
value = otherTest
)
otherTest$Var2 = factor(otherTest$Var2, levels = levels(test$Var2))
ggplot(test, aes(x = Var2, y = value, group = 1)) +
geom_bar(aes(fill = Var1), stat="identity") +
geom_line(data = otherTest)
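For the second option, specifying the mapping separately in geom_line, a minimal sketch could look like the following. The data are made up along the lines of the question, built without reshape2, and the names `bars`, `line_df`, `m` and `avg` are assumptions for this example. inherit.aes = FALSE stops the line layer from looking for the fill variable, which does not exist in its data frame.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed so the made-up data are repeatable
first <- abs(rnorm(12) * 1000)

# Long-format bar data: two series per month
bars <- data.frame(
  month  = factor(rep(month.abb, times = 2), levels = month.abb),
  series = rep(c("first", "second"), each = 12),
  value  = c(first, first + 500)
)

# Line data with its own column names
line_df <- data.frame(
  m   = factor(month.abb, levels = month.abb),
  avg = first + 250   # mean of the two series
)

p <- ggplot(bars, aes(x = month, y = value, fill = series)) +
  geom_col() +
  # The line layer declares its own mapping and ignores the global one
  geom_line(data = line_df, aes(x = m, y = avg, group = 1),
            inherit.aes = FALSE)
p
```

Both data frames use the same factor levels for the month, so the line lands on the matching bars.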