I have data from a statistical test (gene set enrichment analysis, but that's not important), so I obtain p-values for statistics that are normally distributed, i.e., the statistics take both positive and negative values.
The test is run on several categories:
set.seed(1)
df <- data.frame(col = rep(1,7),
category = LETTERS[1:7],
stat.sign = sign(rnorm(7)),
p.value = runif(7, 0, 1),
stringsAsFactors = TRUE)
I want to present these data in a geom_tile ggplot such that I color code the df$category by their df$p.value multiplied by their df$stat.sign (i.e., the sign of the statistic).
For that I first take the log10 of df$p.value:
df$sig <- df$stat.sign*(-1*log10(df$p.value))
Then I order the df by df$sig for each sign of df$sig:
library(dplyr)
df <- rbind(dplyr::filter(df, sig < 0)[order(dplyr::filter(df, sig < 0)$sig), ],
dplyr::filter(df, sig > 0)[order(dplyr::filter(df, sig > 0)$sig), ])
And then I ggplot it:
library(ggplot2)
df$category <- factor(df$category, levels=df$category)
ggplot(data = df,
aes(x = col, y = category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue', mid='white', high='darkred') +
theme_minimal() +
xlab("") + ylab("") + labs(fill="-log10(P-Value)") +
theme(axis.text.y = element_text(size=12, face="bold"),
axis.text.x = element_blank())
which gives me:
Is there a way to manipulate the legend such that the values of df$sig are represented by their absolute value but everything else remains unchanged? That way I still get both red and blue shades and maintain the order I want.
If you check ggplot's documentation, scale_fill_gradient2, like other continuous scales, accepts one of the following for its labels argument:
NULL for no labels
waiver() for the default labels computed by the transformation object
a character vector giving labels (must be same length as breaks)
a function that takes the breaks as input and returns labels as output
Since you only want the legend values to be absolute, I assume you're satisfied with the default breaks in the legend colour bar (-0.1 to 0.4 in increments of 0.1), so all you really need is a function that manipulates the labels.
I.e. instead of this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred') +
Use this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
labels = abs) +
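To make the mechanism concrete, here is a minimal sketch of a labels function in action. It rebuilds the question's data with the same seed; abs_labels is a hypothetical helper name (passing abs directly, as above, does the same thing), and reorder() is used only as a shortcut to sort the tiles by sig.

```r
library(ggplot2)

# Rebuild the question's data (same seed and column names as above)
set.seed(1)
df <- data.frame(col = rep(1, 7),
                 category = LETTERS[1:7],
                 stat.sign = sign(rnorm(7)),
                 p.value = runif(7, 0, 1))
df$sig <- df$stat.sign * (-log10(df$p.value))

# Hypothetical helper: receives the numeric breaks chosen by the scale
# and returns the values to print next to the colour bar
abs_labels <- function(breaks) abs(breaks)

p <- ggplot(df, aes(x = col, y = reorder(category, sig))) +
  geom_tile(aes(fill = sig)) +                       # colours still use signed values
  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred",
                       labels = abs_labels) +        # only the legend text changes
  labs(x = NULL, y = NULL, fill = "-log10(P-Value)")
print(p)
```

Because only the labels are transformed, the fill mapping and the tile order are untouched.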
I'm not sure I understood what you're looking for. Do you mean that you want to change the labels within the legend? If so, manipulating the breaks and labels arguments of scale_fill_gradient2() should do it.
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
# sort() returns the sig values themselves; order() would return their positions
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
Alternatively, you could display the values as text inside the figure; try stacking stat_bin_2d() like this:
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
# sort() returns the sig values themselves; order() would return their positions
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
stat_bin_2d(geom = 'text', aes(label = sig), colour = 'black', size = 16) +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
You may need to experiment with the size and colour arguments.
I am trying to change the labels in my plot's legend to say "yearling" and "adult" instead of "1" and "-1". I first run this code to make the plot:
cols <- c("dodgerblue", "goldenrod") #designate colors
density_age_plot <- plot_model(density_model, type = "pred",
terms = c("density_std", "ageclass_std"),
color = cols, show.data = TRUE, jitter = 0.1,
axis.title = c("Density", "Proportional Mass Gain"),
title = "Density*Age Class", alpha = 0) +
theme_classic() +
labs(color = "Age Class") +
theme(legend.position = c(0.8, 0.9)) +
theme(plot.title = element_text(hjust = 0.5))
This is the graph it makes
I then try density_age_plot + scale_color_discrete(labels = c("yearling", "adult")) but I get this error:
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.
Error: Continuous value supplied to discrete scale
Can anyone help me? ggplot2 is NOT my thing! Thanks!
OP, you can probably use scale_color_manual(). Here's a reprex, since I don't have your data. You pass a vector to the values= argument to control the color and a vector to the labels= argument to control the labelling. If you supply a named vector, you can explicitly specify which value (color) or label is associated with each factor. Otherwise, it will default to either the levels of the factor or order your items alphanumerically.
df <- data.frame(x=1:10, y=1:10, category = rep(c(1,-1), 5))
ggplot(df, aes(x,y,color=factor(category))) + geom_point() +
scale_color_manual(
values=c('1'='blue','-1'='red'),
labels=c('1'="positive",'-1'="negative"))
Note that names in a named vector cannot be bare numbers, so numeric levels such as 1 and -1 have to be written in quotes (or backticks), as above. Because the column is converted with factor(), those quoted names match the factor levels.
Without scale_color_manual(), it looks like this:
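The point about named vectors can be sketched as follows. This is an illustration on the same dummy data, not the OP's model; which of 1 and -1 corresponds to "adult" or "yearling" is an assumption made purely for the example.

```r
library(ggplot2)

# Dummy data mirroring the reprex above
df <- data.frame(x = 1:10, y = 1:10, category = rep(c(1, -1), 5))

# factor() sorts numeric values, so the levels come out as "-1", "1"
lev <- levels(factor(df$category))

# Named vectors are matched to the levels by name, not by position.
# ASSUMPTION: 1 = adult and -1 = yearling (not stated in the question).
age_labels  <- c("1" = "adult", "-1" = "yearling")
age_colours <- c("1" = "dodgerblue", "-1" = "goldenrod")

p <- ggplot(df, aes(x, y, colour = factor(category))) +
  geom_point() +
  scale_colour_manual(values = age_colours, labels = age_labels,
                      name = "Age Class")
print(p)
```

With unnamed vectors the first entry would go to "-1", since that level sorts first, which is an easy way to end up with swapped labels.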
So I'm using ggplot2 to plot both a bar graph and points. I'm currently getting this:
As you can see, the bars are nicely separated and colored in the desired colors. However, my points are all uncolored and stacked on top of each other. I would like the points to be above their designated bar and in the same color.
#Add bars
A <- A + geom_col(aes(y = w1, fill = factor(Species1)),
position = position_dodge(preserve = 'single'))
#Add colors
A <- A + scale_fill_manual(values = c("A. pelagicus"= "skyblue1","A. superciliosus"="dodgerblue","A. vulpinus"="midnightblue","Alopias sp."="black"))
#Add points
A <- A + geom_point(aes(y = f1/2.5),
shape= 24,
size = 3,
fill = factor(Species1),
position = position_dodge(preserve = 'single'))
#change x and y axis range
A <- A + scale_x_continuous(breaks = c(2000:2020), limits = c(2016,2019))
A <- A + expand_limits(y=c(0,150))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
A <- A + scale_y_continuous(sec.axis = sec_axis(~.*2.5, name = " "))
# modifying axis and title
A <- A + labs(y = " ",
x = " ")
A <- A + theme(plot.title = element_text(size = rel(4)))
A <- A + theme(axis.text.x = element_text(face="bold", size=14, angle=45),
axis.text.y = element_text(face="bold", size=14))
#A <- A + theme(legend.title = element_blank(),legend.position = "none")
#Print plot
A
When I run this code I get the following error:
Error: Unknown colour name: A. pelagicus
In addition: Warning messages:
1: Width not defined. Set with position_dodge(width = ?)
2: In max(table(panel$xmin)) : no non-missing arguments to max; returning -Inf
I've tried a couple of things, but I can't figure out why it works for geom_col and not for geom_point.
Thanks in advance
The two basic problems you have are dealing with your color error and not dodging, and they can be solved by formatting your scale_...(values= argument using a list instead of a vector, and applying the group= aesthetic, respectively.
You'll see the answer to these two questions using an example:
# dummy dataset
year <- c(rep(2017, 4), rep(2018, 4))
species <- rep(c('things', 'things1', 'wee beasties', 'ew'), 2)
values <- c(10, 5, 5, 4, 60, 10, 25, 7)
pt.value <- c(8, 7, 10, 2, 43, 12, 20, 10)
df <-data.frame(year, species, values, pt.value)
I made the "values" set for my column heights and I wanted to use a different y aesthetic for points for illustrative purposes, called "pt.value". Otherwise, the data setup is similar to your own. Note that df$year will be set as numeric, so it's best to change that into either Date format (kinda more trouble than it's worth here), or just as a factor, since "2017.5" isn't gonna make too much sense here :). The point is, I need "year" to be discrete, not continuous.
Solve the color error
For the plot, I'll try to create it similar to you. Here note that in the scale_fill_manual object, you have to set the values= argument using a list. In your example code, you are using a vector (c()) to specify the colors and naming. If you have name1=color1, name2=color2,..., this represents a list structure.
ggplot(df, aes(x=as.factor(year), y=values)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
So the colors are applied correctly and my axis is discrete, and the y values of the points are mapped to pt.value like I wanted, but why don't the points dodge?!
Solve the dodging issue
Dodging is a funny thing in ggplot2. The best reasoning here I can give you is that for columns and barplots, dodging is sort of "built-in" to the geom, since the default position is "stack" and "dodge" represents an alternative method to draw the geom. For points, text, labels, and others, the default position is "identity" and you have to be more explicit in how they are going to dodge or they just don't dodge at all.
Basically, we need to let the points know what they are dodging based on. Is it "species"? With geom_col, it's assumed to be, but with geom_point, you need to specify. We do that by using a group= aesthetic, which lets geom_point know what to use as the criterion for dodging. When you add that, it works!
ggplot(df, aes(x=as.factor(year), y=values, group=species)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
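As a complementary sketch on the same dummy data: an "Unknown colour name" error is what ggplot2 reports when a value passed outside aes() is taken as a literal colour, and in the question fill = factor(Species1) sits outside aes() in geom_point(). Assuming that is the cause, mapping fill inside aes() also colours the triangles to match their bars. A named character vector is used for values= here, and species_cols and dodge are names introduced only for this example.

```r
library(ggplot2)

# Same dummy data as above
df <- data.frame(
  year     = rep(c(2017, 2018), each = 4),
  species  = rep(c("things", "things1", "wee beasties", "ew"), 2),
  values   = c(10, 5, 5, 4, 60, 10, 25, 7),
  pt.value = c(8, 7, 10, 2, 43, 12, 20, 10)
)

# Named character vector: names are the species, values are the colours
species_cols <- c("ew" = "skyblue1", "things" = "dodgerblue",
                  "things1" = "midnightblue", "wee beasties" = "gray")

# One shared dodge so bars and points line up
dodge <- position_dodge(width = 0.62)

p <- ggplot(df, aes(x = factor(year), group = species)) +
  geom_col(aes(y = values, fill = species), position = dodge, width = 0.6) +
  # fill is mapped inside aes(), so it goes through scale_fill_manual()
  geom_point(aes(y = pt.value, fill = species), shape = 24, size = 3,
             position = dodge) +
  scale_fill_manual(values = species_cols) +
  theme_bw() + labs(x = "Year", y = NULL)
print(p)
```

Shape 24 is one of the fillable shapes, so the fill aesthetic sets the interior of each triangle.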
I am pretty sure that this is easy to do, but I can't seem to find a proper way to phrase this question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), efficiently creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
I would like to draw a chart with ggplot for a couple of model accuracies. The detail of the plotted result doesn't matter; however, I have a problem filling the geom_point objects.
A sample file can be found here: https://ufile.io/z1z4c
My code is:
library(ggplot2)
library(ggthemes)
Palette <- c('#A81D35', '#085575', '#1DA837')
results <- read.csv('test.csv', colClasses=c('factor', 'factor', 'factor', 'numeric'))
results$dates <- factor(results$dates, levels = c('01', '15', '27'))
results$pocd <- factor(results$pocd, levels = c('without POCD', 'with POCD', 'null accuracy'))
results$model <- factor(results$model, levels = c('SVM', 'DT', 'RF', 'Ada', 'NN'))
ggplot(data = results, group = pocd) +
geom_point(aes(x = dates, y = acc,
shape = pocd,
color = pocd,
fill = pocd,
size = pocd)) +
scale_shape_manual(values = c(0, 1, 3)) +
scale_color_manual(values = c(Palette[1], Palette[2], Palette[3])) +
scale_fill_manual(values = c(Palette[1], Palette[2], Palette[3])) +
scale_size_manual(values = c(2, 2, 1)) +
facet_grid(. ~ model) +
xlab('Date of knowledge') +
ylab('Accuracy') +
theme(legend.position = 'right',
legend.title = element_blank(),
axis.line = element_line(color = '#DDDDDD'))
As a result I get unfilled circles and squares. How can I fix it, so that the squares and circles are filled with the specific color?
Additional question: I would like to add a geom_line to the graph, connecting the three points in each group. However, I fail to adjust linetype and width. It always takes the values of scale_*_manual, which is a real problem, especially in the case of size.
Thanks for helping!
You need to change the shapes specified, like so:
scale_shape_manual(values = c(21,22,23)) +
For your additional question, that should be solved if you set aes(size=) in the first part of your code (under ggplot(data=...)) and then manually specify size=1 in geom_line, as in + geom_line(size=1, ...).
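A minimal sketch of both points, using made-up data because test.csv is not reproduced here (the column names follow the question; the accuracy values are random). Shapes 21 to 25 are the ones that have a separate outline (colour) and interior (fill), which is why they respond to scale_fill_manual(). In this sketch the point size is set as a constant instead of through a size scale, a simplification so that nothing but the fixed line width applies to the lines.

```r
library(ggplot2)

# Hypothetical stand-in for test.csv
results <- expand.grid(dates = c("01", "15", "27"),
                       pocd  = c("without POCD", "with POCD", "null accuracy"),
                       model = c("SVM", "DT"))
set.seed(1)
results$acc <- runif(nrow(results), 0.5, 0.9)

Palette <- c("#A81D35", "#085575", "#1DA837")

p <- ggplot(results, aes(x = dates, y = acc, group = pocd)) +
  # Fixed parameters (outside aes) are not affected by any manual scale.
  # linewidth needs ggplot2 >= 3.4.0; older versions use size = instead.
  geom_line(aes(colour = pocd), linewidth = 0.5, linetype = "dashed") +
  geom_point(aes(shape = pocd, colour = pocd, fill = pocd), size = 2) +
  scale_shape_manual(values = c(21, 22, 23)) +   # fillable circle, square, diamond
  scale_colour_manual(values = Palette) +
  scale_fill_manual(values = Palette) +
  facet_grid(. ~ model) +
  labs(x = "Date of knowledge", y = "Accuracy")
print(p)
```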
I want to create the following histogram density plot with ggplot2. In the "normal" way (base packages) it is really easy:
set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector,seq(0,1,by=0.1))
labels = 1:(length(breaks)-1)
den = density(vector)
hist(vector,
breaks=breaks,
col=rainbow(length(breaks)),
probability=TRUE)
lines(den)
With ggplot I have reached this so far:
seg <- cut(vector,breaks,
labels=labels,
include.lowest = TRUE, right = TRUE)
df = data.frame(vector=vector,seg=seg)
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..,
fill=seg)) +
geom_density(aes(x=vector,
y=..density..))
But the "y" scale has the wrong dimension. I have noticed that the following version, without fill, gets the "y" scale right:
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..)) +
geom_density(aes(x=vector,
y=..density..))
I just do not understand it. y=..density.. is there, and that should be the height. So why on earth does my scale get modified when I try to fill it?
I do need the colours. I just want a histogram where the breaks and the colours of each block are directionally set according to the default ggplot fill colours.
Manually, I added colors to your percentile bars. See if this works for you.
library(ggplot2)
ggplot(df, aes(x=vector)) +
geom_histogram(breaks=breaks,aes(y=..density..),colour="black",fill=c("red","orange","yellow","lightgreen","green","darkgreen","blue","darkblue","purple","pink")) +
geom_density(aes(y=..density..)) +
scale_x_continuous(breaks=c(-3,-2,-1,0,1,2,3)) +
ylab("Density") + xlab("df$vector") + ggtitle("Histogram of df$vector") +
theme_bw() + theme(plot.title=element_text(size=20),
axis.title.y=element_text(size = 16, vjust=+0.2),
axis.title.x=element_text(size = 16, vjust=-0.2),
axis.text.y=element_text(size = 14),
axis.text.x=element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
fill=seg results in grouping. You are actually getting a different histogram for each value of seg. If you don't need the colours, you could use this:
ggplot(df) +
geom_histogram(breaks=breaks,aes(x=vector,y=..density..), position="identity") +
geom_density(aes(x=vector,y=..density..))
If you need the colours, it might be easiest to calculate the density values outside of ggplot2.
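A rough sketch of that last suggestion, under the assumption that base R's hist() is an acceptable way to get the per-bin densities: the bins are computed once, stored in a small data frame (bins is a name introduced here), and drawn with geom_rect() so that fill no longer triggers any grouping inside the histogram stat.

```r
library(ggplot2)

set.seed(46)
vector <- rnorm(500)
breaks <- unname(quantile(vector, seq(0, 1, by = 0.1)))

# Bin the data once with base R; nothing is drawn here
h <- hist(vector, breaks = breaks, plot = FALSE)

# One row per bin: left edge, right edge, density, and a label for the fill
bins <- data.frame(xmin    = head(h$breaks, -1),
                   xmax    = tail(h$breaks, -1),
                   density = h$density,
                   seg     = factor(seq_along(h$density)))

p <- ggplot() +
  geom_rect(data = bins,
            aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = density, fill = seg),
            colour = "black") +
  # The kernel density is still estimated by ggplot2 from the raw values
  geom_density(data = data.frame(vector = vector), aes(x = vector)) +
  labs(x = "vector", y = "density")
print(p)
```

The bars use the default discrete fill colours, one per decile bin.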
Or an option with ggpubr
library(ggpubr)
gghistogram(df, x = "vector", add = "mean", rug = TRUE, fill = "seg",
palette = c("#00AFBB", "#E7B800", "#E5A800", "#00BFAB", "#01ADFA",
"#00FABA", "#00BEAF", "#01AEBF", "#00EABA", "#00EABB"), add_density = TRUE)
The confusion about interpreting the y-axis might be because density is plotted rather than count. The values on the y-axis are densities: each bar's height is its proportion of the total sample divided by the bin width, so it is the areas of the bars, not their heights, that sum to 1.
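A quick numeric check of what the density scale means, as a sketch in base R on the same simulated data as the question (widths and proportions are names introduced here):

```r
set.seed(46)
vector <- rnorm(500)
breaks <- unname(quantile(vector, seq(0, 1, by = 0.1)))

# Compute the histogram without drawing it
h <- hist(vector, breaks = breaks, plot = FALSE)

widths      <- diff(h$breaks)            # bin widths (unequal for quantile breaks)
proportions <- h$counts / sum(h$counts)  # share of the sample in each bin

# density is proportion divided by bin width
print(all.equal(h$density, proportions / widths))

# The bar areas add up to 1; the bar heights on their own generally do not
print(sum(h$density * widths))
print(sum(h$density))
```

With quantile-based breaks every bin holds roughly the same share of the data, so the narrow bins near the centre get the tallest bars.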