I'm trying to make a plot that consists of two main parts: the "background" is the shape of a US state, and on top of it I'm adding measurement points (using latitude and longitude coordinates) that I want to be color-scaled according to the value of the measurement (the data comes from a data frame). I'm having a hard time changing the color of the points and customizing the legend bar. I would like the bar to also show the maximum and minimum values and to use a color scale that is more visually appealing.
m = map_data('state', region = state)
finalplot <- ggplot() +
geom_polygon( data=m, aes(x=long, y=lat), colour="black", fill="white" ) +
geom_point(data=filteredtable,aes(x=LongitudeMeasure,y=LatitudeMeasure, colour = Result)) +
ggtitle(paste0("Measurement points of ", contaminant, " in ", state)) +
theme_void()
When I add something like + scale_color_grey(start = 0.8, end = 0.2), I get the following error: Continuous value supplied to discrete scale
If you have any other ideas about the best approach to this type of plot, I would appreciate them.
I think this is a good example of why it's better to post some data in your question as well as showing us your code. However, it's possible to create some data so that your exact plotting code produces a reasonable output:
library(ggplot2)
set.seed(69)
filteredtable <- data.frame(LongitudeMeasure = runif(100, -81.5, -80.5),
LatitudeMeasure = runif(100, 26, 28),
Result = runif(100))
state <- "Florida"
contaminant <- "Dilithium"
Now let's try your plotting code:
m = map_data('state', region = state)
finalplot <- ggplot() +
geom_polygon( data=m, aes(x=long, y=lat), colour="black", fill="white" ) +
geom_point(data=filteredtable,aes(x=LongitudeMeasure,y=LatitudeMeasure, colour = Result)) +
ggtitle(paste0("Measurement points of ", contaminant, " in ", state)) +
theme_void()
So our plot looks like this:
finalplot
But if we try to add the grayscale that you wanted, we get the same error:
finalplot + scale_color_grey(start = 0.8, end = 0.2)
#> Error: Continuous value supplied to discrete scale
The reason for this is that scale_color_grey produces a discrete gray color scale, but you want a continuous one, since Result is a continuous variable. You probably wanted scale_color_gradient or scale_color_gradientn. Let's try scale_color_gradient with a grayscale palette, and set the breaks to 0.1 increments so we get the labels we want on the legend bar:
finalplot + scale_color_gradient(low = "gray20", high = "gray80", breaks = seq(0, 1, 0.1))
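The question also asked for the minimum and maximum to appear on the bar. A minimal sketch, assuming the simulated filteredtable from above (recreated here so the block runs on its own) and that five evenly spaced ticks are enough, is to build the breaks from range(): seq(..., length.out = ...) starts exactly at the lower limit and ends exactly at the upper one, so both extremes get a labelled tick. The state polygon is left out only to keep the example short.

```r
library(ggplot2)

# Simulated data, as in the answer above
set.seed(69)
filteredtable <- data.frame(LongitudeMeasure = runif(100, -81.5, -80.5),
                            LatitudeMeasure  = runif(100, 26, 28),
                            Result           = runif(100))

# Breaks that start at the observed minimum and end at the observed maximum
rng <- range(filteredtable$Result)
bar_breaks <- seq(rng[1], rng[2], length.out = 5)

p <- ggplot(filteredtable,
            aes(x = LongitudeMeasure, y = LatitudeMeasure, colour = Result)) +
  geom_point() +
  scale_color_gradient(low = "gray20", high = "gray80",
                       limits = rng, breaks = bar_breaks,
                       # round the tick labels so the extremes stay readable
                       labels = function(b) format(round(b, 2), nsmall = 2)) +
  theme_void()
```

Adding p's scale line to finalplot instead gives the same bar on top of the state outline.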
Or if we want something more colorful:
finalplot +
scale_color_gradientn(colours = c("red", "gold", "forestgreen"), breaks = seq(0, 1, 0.1))
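If the goal is simply a more readable palette, another option, assuming ggplot2 3.0.0 or later, is the built-in scale_color_viridis_c(), a continuous palette from the viridis family that is designed to be perceptually uniform and to stay readable in grayscale and for common forms of color blindness. The option = "plasma" choice below is only a matter of taste.

```r
library(ggplot2)

set.seed(69)
filteredtable <- data.frame(LongitudeMeasure = runif(100, -81.5, -80.5),
                            LatitudeMeasure  = runif(100, 26, 28),
                            Result           = runif(100))

p <- ggplot(filteredtable,
            aes(x = LongitudeMeasure, y = LatitudeMeasure, colour = Result)) +
  geom_point(size = 2) +
  # continuous viridis-family palette; option selects one of its colour maps
  scale_color_viridis_c(option = "plasma", breaks = seq(0, 1, 0.1)) +
  theme_void()
```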
I have created a bar chart using the ggplot() + geom_bar() functions from the ggplot2 package. I have also used coord_flip() to flip the orientation of the bars and geom_text() to add the values at the end of each bar. Some of the bars have different colors, so there is a legend next to the graph. What I get as a result is a picture half occupied by the graph and half by the legend, with the values at the end of the longest bars cut off because the graph is so small.
Any ideas on how to enlarge the graph and shrink the legend, so that the values on the bars are not cut off?
Thank you
This is my code with made-up data:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
# data.frame() keeps freq numeric; cbind() would coerce every column to character
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
# labels in data order, so each value sits on its own bar
geom_text(label = freq, size = 3.5, hjust = -0.2)
And this is the resulting graph:
There are a few fixes to this:
Change your Limits
As indicated by @Dave2e; see his response.
Change the size of your output
The interesting thing about graphics in R is that the aspect ratio and resolution of the graphics device change the look of a plot. When I ran your code, no clipping was observed. You can test this by creating the plot and then saving it at different sizes. Here is what I get from your default code with different width= and height= arguments to ggsave() when saving as a PNG:
ggsave('a1.png', width=10, height=5)
ggsave('a2.png', width=15, height=5)
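The same comparison can be scripted. A minimal sketch, assuming a plot object p built from the question's data and writing into a temporary directory: passing plot = explicitly (ggsave() otherwise defaults to last_plot()) and units = "in" makes the intended size unambiguous. Because the legend keeps roughly the same absolute width, a wider device leaves more horizontal room for the bars and their labels.

```r
library(ggplot2)

df <- data.frame(labels = c("A", "B", "C", "D", "E"),
                 freq = c(10.3678, 5.84554, 1.5673, 2.313, 7.111),
                 type = c("rich", "poor", "poor", "poor", "rich"))

# geom_col() is the shorthand for geom_bar(stat = "identity")
p <- ggplot(df, aes(x = reorder(labels, freq), y = freq, fill = type)) +
  geom_col(width = 0.9) +
  coord_flip() +
  geom_text(aes(label = freq), size = 3.5, hjust = -0.2)

out_dir <- tempdir()
# Same plot, two device sizes: only the width changes
ggsave(file.path(out_dir, "a1.png"), plot = p, width = 10, height = 5, units = "in", dpi = 100)
ggsave(file.path(out_dir, "a2.png"), plot = p, width = 15, height = 5, units = "in", dpi = 100)
```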
Set an Expansion
The third way is to expand the scale limits. By default, ggplot2 adds some "padding" to the ends of a continuous scale: if you set your limits from 0 to 10, the plot area actually extends a bit beyond that (5% of the range on each side by default). You can change that setting with the expand= argument of the scale_... functions, for example in the following code:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = freq, size = 3.5, hjust = -0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.15)))
You can define the lower and upper expansion of an axis separately, so in the above code I've set no expansion at the lower limit of the y scale and a multiplier of 0.15 (15% of the range) at the upper limit. The default for continuous scales is a multiplier of 0.05 (5%) on each side.
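To see what that multiplier does numerically: the expanded upper limit is the data maximum plus 15% of the data range, and since the bars start at 0 the range is 0 to max(freq). A minimal check, assuming ggplot2 3.3.0 or later (where expansion() was introduced; older versions used expand_scale()) and leaving out coord_flip() so the value axis is plainly y, compares the hand calculation with the range ggplot2 builds. That range is read from the built plot's panel parameters, an internal structure whose exact path may differ between versions.

```r
library(ggplot2)

freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
df <- data.frame(labels = c("A", "B", "C", "D", "E"), freq = freq)

# Hand calculation: no padding below, 15% of the range added above
data_range <- c(0, max(freq))
mult <- c(0, 0.15)
width <- diff(data_range)
expected <- c(data_range[1] - mult[1] * width, data_range[2] + mult[2] * width)

p <- ggplot(df, aes(x = labels, y = freq)) +
  geom_col() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15)))

# The expanded y range ggplot2 computed for the panel
built_range <- ggplot_build(p)$layout$panel_params[[1]]$y.range
```

Here expected is c(0, 10.3678 * 1.15), i.e. an upper limit of about 11.92.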
You can override the default limits of the y axis scale with the ylim() function.
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
#set the max y axis limit to allow enough room for the label
ylimitmax <- 11
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
ylim(0, ylimitmax) +
geom_text(label = freq, size = 3.5, hjust = -0.2)
The script shows how to set the limits manually, but you may want to automate the limit calculation with something like ylimitmax <- max(freq) * 1.2.
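A sketch of the automated version, reusing the question's data. One caveat worth knowing: ylim() is shorthand for scale_y_continuous(limits = ...), so a bar taller than the limit is removed (with a warning) rather than cropped; tying the limit to max(freq) avoids that. The factor 1.2 is just the suggestion from the text, not a rule.

```r
library(ggplot2)

labels <- c("A", "B", "C", "D", "E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich", "poor", "poor", "poor", "rich")
df <- data.frame(labels, freq, type)

# Leave 20% headroom above the tallest bar for its text label
ylimitmax <- max(freq) * 1.2

p <- ggplot(df, aes(x = reorder(labels, freq), y = freq, fill = type)) +
  geom_col(width = 0.9) +
  coord_flip() +
  labs(x = NULL, y = "Mean frequency") +
  scale_fill_manual(name = "Type", values = c("red", "blue")) +
  ylim(0, ylimitmax) +
  geom_text(aes(label = freq), size = 3.5, hjust = -0.2)
```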
I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
In your code, to get the colors in the right place, you need to specify fill = red_color or fill = green_color (and alpha too, since it is a constant, as @Gregor pointed out) outside of aes(), like this:
ggplot() +
geom_density(aes(x=red_variable), alpha=0.5, fill = red_color, data=red_dataframe) +
geom_density(aes(x=green_variable), alpha=0.5, fill = green_color, data=green_dataframe) +
xlab("X value") +
ylab("Density")
Alternatively, you can bind your data frames together, reshape them into a longer format (much better suited to ggplot), and then add a color column that you can use with the scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does this answer your question?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), ggplot treats it as a mapping, so it essentially creates a new column in that layer's data filled with the value of green_color, i.e., "#008000", "#008000", "#008000", .... Ditto for red_color. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
We can actually get what you want by adding the identity scale, which is designed for the (common in base graphics, rare in ggplot2) case where you actually put color values in the data.
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
scale_fill_identity() +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order in which you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created values "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame), so "#008000" came first and received the first color and label. Since you happened to add the layers in reverse alphabetical order, the colors were swapped.
dc37's answer shows a couple of better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible, (b) not put constants inside aes() (constant color, constant alpha, etc.), and (c) set colors in a scale_fill_* or scale_color_* function when they're not constant.
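Point (c) can be sketched for this exact example while keeping the two data frames. Instead of putting color codes in aes(), map fill to a descriptive constant key and give scale_fill_manual() a named values vector, so each key is matched to its color by name; neither the layer order nor the alphabetical order of the keys matters any more. The key strings "red_variable" and "green_variable" are just illustrative choices.

```r
library(ggplot2)

red_dataframe <- data.frame(red_variable = c(10, 11, 12, 13, 14))
green_dataframe <- data.frame(green_variable = c(6, 7, 8, 9, 10))
red_color <- "#FF0000"
green_color <- "#008000"

p <- ggplot() +
  # fill maps to a key, not to a color value; alpha is a constant, so it stays outside aes()
  geom_density(aes(x = red_variable, fill = "red_variable"), alpha = 0.5, data = red_dataframe) +
  geom_density(aes(x = green_variable, fill = "green_variable"), alpha = 0.5, data = green_dataframe) +
  scale_fill_manual(name = "Legend",
                    # colors are matched to keys by name, not by position
                    values = c(red_variable = red_color, green_variable = green_color),
                    breaks = c("red_variable", "green_variable"),
                    labels = c("Density of red_variable", "Density of green_variable")) +
  xlab("X value") +
  ylab("Density")
```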
I have the following data frame:
observed <- c("1000","2000","3000","4000")
simulated <- c("1100","2100","3100","4100")
error <- c("-1","-2","-0.5","-4")
Date <- c("2013-01-01","2013-01-02","2013-01-03","2013-01-04")
y <- data.frame(Date,observed,simulated,error)
y[-1] <- sapply(y[-1], as.character)
y[-1] <- sapply(y[-1], as.numeric)
y$Date <- as.Date(y$Date, format="%Y-%m-%d")
It compares observed with simulated daily river discharges on the left y axis and shows the related difference in percent on the right y axis (note that the percentages are just an example here and are not correctly calculated).
I would like to plot all three in one graph with the percentage error plotted on the secondary y axis. I used the following code:
p<-ggplot(y, aes(x=Date))
p<-p + geom_line(aes(y=observed, colour = "observed"), size=1.5)
p<-p + geom_line(aes(y=simulated, colour = "simulated"), size=1.5)
p<-p + geom_line(aes(y=error*-500, colour="red"), size=1.5)
p<-p + scale_colour_manual(name="Discharge [m3/sec]", labels=c("observed","simulated","error"), values = c("blue", "black","red"))
p <- p + scale_y_continuous(sec.axis = sec_axis(~./-500,name = "Error [%]"))
p <- p + labs(y=expression(paste('Q [',m^3~s^-1,']')),
colour = "Parameter")
p <- p + theme(legend.position = c(0.2, 0.87), legend.title=element_blank(),axis.title.x=element_blank())
My problem is that the secondary y axis has -8 at the top and 0 at the bottom. What I would like is for the secondary y axis's zero to be at the top and -8 to be at the bottom, where the zero of the primary (left) y axis is.
The reason your secondary axis looks like that is that this is how you transformed your data. Since you multiplied your error by -500 in your third geom_line, as the error gets smaller (i.e., closer to -8), the line goes up. Therefore, for the secondary axis to map correctly to your data, it must be upside down (with -8 at the top).
If you want 0 to be at the top, just use a positive factor: multiply your error by 500 in geom_line and divide by 500 in the sec_axis transformation formula:
ggplot(y, aes(x=Date)) +
geom_line(aes(y=observed, colour = "observed"), size=1.5) +
geom_line(aes(y=simulated, colour = "simulated"), size=1.5) +
geom_line(aes(y=error*500, colour = "error"), size=1.5) +
scale_colour_manual(name="Discharge [m3/sec]",
values = c('observed' = "blue",
'simulated' = "black",
'error' = "red")) +
scale_y_continuous(sec.axis = sec_axis(~./500, name = "Error [%]",
breaks = c(0, -2, -4, -6, -8))) +
labs(y=expression(paste('Q [',m^3~s^-1,']')),
colour = "Parameter") +
theme(legend.position = c(0.2, 0.87),
legend.title=element_blank(),
axis.title.x=element_blank())
And if you want the two sets of lines to overlap, you can manually add 8 to your error to shift it up, and then subtract 8 in the sec_axis formula to keep the numbers correct:
ggplot(y, aes(x=Date)) +
geom_line(aes(y=observed, colour = "observed"), size=1.5) +
geom_line(aes(y=simulated, colour = "simulated"), size=1.5) +
geom_line(aes(y=(8 + error) * 500, colour = "error"), size=1.5) +
scale_colour_manual(name="Discharge [m3/sec]",
values = c('observed' = "blue",
'simulated' = "black",
'error' = "red")) +
scale_y_continuous(sec.axis = sec_axis(~(. / 500) - 8, name = "Error [%]",
breaks = c(0, -2, -4, -6, -8))) +
labs(y=expression(paste('Q [',m^3~s^-1,']')),
colour = "Parameter") +
theme(legend.position = c(0.2, 0.87),
legend.title=element_blank(),
axis.title.x=element_blank())
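A quick way to confirm that the two transformations agree is to push the tick values through both by hand. The helper names to_primary() and to_secondary() below are hypothetical, written only for this check: they mirror the (8 + error) * 500 mapping used in geom_line() and the (. / 500) - 8 formula used in sec_axis(). An error of 0 should land at 4000 on the discharge axis and -8 at 0, and mapping back should return the original breaks.

```r
# Hypothetical helpers mirroring the two transformations in the plot above
to_primary <- function(err) (8 + err) * 500   # error [%] -> left-axis units
to_secondary <- function(q) (q / 500) - 8     # left-axis units -> error [%]

err_breaks <- c(0, -2, -4, -6, -8)
primary_pos <- to_primary(err_breaks)

# sec_axis() needs the inverse mapping, so the round trip must give the breaks back
round_trip <- to_secondary(primary_pos)
```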
Additional tips:
You can link multiple ggplot functions with the + operator like I do above instead of saving the intermediate result to a variable each time like you do in your example
The correct way to use scale_color_manual is to pass a named vector to values. This ensures that each data value (e.g. observed) is always associated with the intended color (e.g. blue).
If you want the error line to be smaller and less dominant, just reduce the transformation factor. If you multiply (in geom_line) and divide (in sec_axis) it by 100 instead of 500 you get a much flatter line. You'll have to play around with the number to get it to look like what you want. In ggplot2, the secondary axis must be a transformation of the primary axis, so you can't just pass in its own limits= argument.
I have data which comes from a statistical test (gene set enrichment analysis, but that's not important), so I obtain p-values for statistics that are normally distributed, i.e., both positive and negative values:
The test is run on several categories:
set.seed(1)
df <- data.frame(col = rep(1,7),
category = LETTERS[1:7],
stat.sign = sign(rnorm(7)),
p.value = runif(7, 0, 1),
stringsAsFactors = TRUE)
I want to present these data in a geom_tile ggplot such that I color code the df$category by their df$p.value multiplied by their df$stat.sign (i.e, the sign of the statistic)
For that, I first compute the signed -log10 of df$p.value:
df$sig <- df$stat.sign*(-1*log10(df$p.value))
Then I order the df by df$sig for each sign of df$sig:
library(dplyr)
df <- rbind(dplyr::filter(df, sig < 0)[order(dplyr::filter(df, sig < 0)$sig), ],
dplyr::filter(df, sig > 0)[order(dplyr::filter(df, sig > 0)$sig), ])
And then I ggplot it:
library(ggplot2)
df$category <- factor(df$category, levels=df$category)
ggplot(data = df,
aes(x = col, y = category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue', mid='white', high='darkred') +
theme_minimal() +
xlab("") + ylab("") + labs(fill="-log10(P-Value)") +
theme(axis.text.y = element_text(size=12, face="bold"),
axis.text.x = element_blank())
which gives me:
Is there a way to manipulate the legend such that the values of df$sig are represented by their absolute value but everything else remains unchanged? That way I still get both red and blue shades and maintain the order I want.
If you check ggplot's documentation, scale_fill_gradient2, like other continuous scales, accepts one of the following for its labels argument:
NULL for no labels
waiver() for the default labels computed by the transformation object
a character vector giving labels (must be same length as breaks)
a function that takes the breaks as input and returns labels as output
Since you only want the legend values to be absolute, I assume you're satisfied with the default breaks in the legend colour bar (-0.1 to 0.4 in increments of 0.1), so all you really need is to add a function that manipulates the labels.
I.e. instead of this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred') +
Use this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
labels = abs) +
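A self-contained sketch of that change, recreating the question's example data: only the text on the colour bar changes, while the fill is still computed from the signed sig, so the blue and red shades stay as they are. Here df is sorted with order(df$sig), which gives the same row order as the question's rbind() of the negative and positive parts as long as no sig is exactly 0.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(col = rep(1, 7),
                 category = LETTERS[1:7],
                 stat.sign = sign(rnorm(7)),
                 p.value = runif(7, 0, 1),
                 stringsAsFactors = TRUE)
# signed -log10 p-value, as in the question
df$sig <- df$stat.sign * (-1 * log10(df$p.value))
df <- df[order(df$sig), ]
df$category <- factor(df$category, levels = df$category)

p <- ggplot(df, aes(x = col, y = category)) +
  geom_tile(aes(fill = sig)) +
  # labels = abs only rewrites the legend text; the fill still uses the signed values
  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred",
                       labels = abs) +
  theme_minimal() +
  labs(x = NULL, y = NULL, fill = "-log10(P-Value)") +
  theme(axis.text.y = element_text(size = 12, face = "bold"),
        axis.text.x = element_blank())
```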
I'm not sure I understood what you're looking for. Do you mean that you want to change the labels in the legend? If so, manipulating the breaks and labels arguments of scale_fill_gradient2() should do it.
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
If you'd rather see the values themselves, you could display them as text inside the figure; try layering stat_bin_2d() like this:
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
stat_bin_2d(geom = 'text', aes(label = sig), colour = 'black', size = 16) +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
You might want to give the size and colour arguments some tries.
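If the aim is only to print each tile's value on the tile, a simpler alternative to stat_bin_2d() is plain geom_text(): geom_tile() already draws one tile per row of df, so no binning is needed. A sketch reusing the example data; printing the absolute value rounded to two decimals is an assumption about what should be shown.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(col = rep(1, 7),
                 category = LETTERS[1:7],
                 stat.sign = sign(rnorm(7)),
                 p.value = runif(7, 0, 1),
                 stringsAsFactors = TRUE)
df$sig <- df$stat.sign * (-1 * log10(df$p.value))
df <- df[order(df$sig), ]
df$category <- factor(df$category, levels = df$category)

p <- ggplot(df, aes(x = col, y = category)) +
  geom_tile(aes(fill = sig)) +
  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred",
                       labels = abs) +
  # one text label per tile, showing |sig| with two decimals
  geom_text(aes(label = sprintf("%.2f", abs(sig))), colour = "black", size = 5) +
  theme_minimal() +
  labs(x = NULL, y = NULL, fill = "-log10(P-Value)")
```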
I have data that looks like this
df = data.frame(x=sample(1:5,100,replace=TRUE),y=rnorm(100),assay=sample(c('a','b'),100,replace=TRUE),project=rep(c('primary','secondary'),50))
and am producing a plot using this code
ggplot(df,aes(project,x)) + geom_violin(aes(fill=assay)) + geom_jitter(aes(shape=assay,colour=y),height=.5) + coord_flip()
which gives me this
This is 90% of the way to what I want. But I would like each point to be plotted only on top of the violin plot for the matching assay type. That is, the jittered positions of the points should be set so that the triangles are only ever on the upper teal violin plot and the circles on the lower red violin plot for each project type.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
geom_violin() +
geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
jitter.width = 0.5,
jitter.height = 0.2),
size = 2) +
coord_flip()
which gives:
You can use the interaction between assay and project:
p <- ggplot(df,aes(x = interaction(assay, project), y=x)) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, size=4)
p + coord_flip()
The axis labels can be tidied up by using a numeric x axis:
# store the interaction as a numeric group index
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df,aes(x=group, y=x, group=cut_interval(group, n = 4))) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, size=4)
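# factor() is needed because, since R 4.0.0, data.frame() keeps strings as character, and levels() of a character vector is NULL
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(factor(df$project)))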