I am trying to 'trim' the legend of the plot below:
df <- data.frame(x = seq(0, 15 , 0.001),
y = seq(0, 15, 0.001))
ggplot(df, aes(x=x, y=y, col = y)) +
geom_line() +
scale_color_gradientn(colours = c("green", "black", "red", "red"),
values = rescale(x = c(0, 2, 4, 15), to = c(0,1), from = c(0, 15) ))
I am able to set the required breaks and labels by adding breaks = c(0,2,4), labels = c("0", "2", "4+").
But when I add the limits=c(0,4), the gradient gets messed up.
Question
Is it possible to 'trim' the legend, so that it shows the values from 0 to 4+ (i.e., and omits all values above)?
The following is what you probably want.
library(ggplot2)
library(scales)
df <- data.frame(x = seq(0, 15 , 0.001),
y = seq(0, 15, 0.001))
ggplot(df, aes(x=x, y=y, col = y)) +
geom_line() +
scale_color_gradientn(colours = c("green", "black", "red", "red"),
values = rescale(x = c(0, 2, 4, 15), from = c(0, 4)),
oob = squish,
limits = c(0, 4))
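To also keep the "4+" label from the question, `breaks` and `labels` can be combined with `limits` and `oob`. This is a minimal sketch on a reduced data frame; `df_small` and `p_trim` are names made up for illustration, and the check at the end only confirms that a value above the limit gets the same colour as the limit itself.

```r
library(ggplot2)
library(scales)

# Hypothetical reduced data: one point per colour stop
df_small <- data.frame(x = c(0, 2, 4, 15), y = c(0, 2, 4, 15))

p_trim <- ggplot(df_small, aes(x = x, y = y, col = y)) +
  geom_line() +
  scale_color_gradientn(
    colours = c("green", "black", "red", "red"),
    values  = rescale(c(0, 2, 4, 15), from = c(0, 4)),
    limits  = c(0, 4),            # the legend spans 0 to 4 only
    oob     = squish,             # values above 4 are treated like 4
    breaks  = c(0, 2, 4),
    labels  = c("0", "2", "4+")
  )

# Colours assigned to the four rows; the last two should be the same
cols_used <- layer_data(p_trim)$colour
cols_used[3] == cols_used[4]
```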
What's happening is as follows. Suppose we have some values in data space (meaning they haven't been rescaled yet).
# Colour positions in data space
print(col_pos_data <- c(0, 2, 4, 15))
#> [1] 0 2 4 15
By default, the scales::rescale() function brings everything into the [0,1] interval. However, when you set a custom range, any out of bounds values will be scaled linearly with the in bounds values. You'll notice that the 15 becomes 3.75 in this case.
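The arithmetic behind that 3.75 can be written out in base R. The helper `rescale_manual` is a made-up name and only mirrors the linear map for a fixed `from` range; it is not the scales implementation.

```r
# Linear map from the range `from` onto [0, 1]; values outside `from`
# simply land outside [0, 1]
rescale_manual <- function(x, from) {
  (x - from[1]) / (from[2] - from[1])
}

col_pos_manual <- rescale_manual(c(0, 2, 4, 15), from = c(0, 4))
print(col_pos_manual)
#> [1] 0.00 0.50 1.00 3.75
```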
# Colour positions in [0,1] interval
col_pos_scaled <- rescale(col_pos_data, from = c(0, 4))
print(col_pos_scaled)
#> [1] 0.00 0.50 1.00 3.75
However, by default ggplot enforces the limits of continuous scales by setting anything that exceeds them to NA, which is often removed later.
# Default ggplot limit enforcing
print(censor(col_pos_scaled))
#> [1] 0.0 0.5 1.0 NA
Now that is a bit too bad for your scale purposes, but one of the alternatives is to 'squish' the data. This brings any (finite) out of bounds values to the nearest limit.
Notice that the last value is no longer NA but set to the largest limit in the [0,1] interval.
print(scaled_squish <- squish(col_pos_scaled))
#> [1] 0.0 0.5 1.0 1.0
The same thing holds true for values in data space if the range is adjusted accordingly.
print(censor(col_pos_data, range = c(0, 4)))
#> [1] 0 2 4 NA
print(data_squish <- squish(col_pos_data, range = c(0, 4)))
#> [1] 0 2 4 4
Internally, ggplot rescales all data to the limits and the order of operations doesn't matter for squishing/rescaling, so the data values and the colour positions in [0,1] line up nicely.
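The claim about the order of operations can be checked on a few more data values with plain arithmetic. This is a sketch with hypothetical helpers (`squish_manual`, `rescale_manual`) that imitate the scales functions for finite inputs only; the special handling of infinite values is left out.

```r
# Clamp values to the nearest limit of `range`
squish_manual <- function(x, range = c(0, 1)) {
  pmin(pmax(x, range[1]), range[2])
}

# Linear map from `from` onto [0, 1]
rescale_manual <- function(x, from) {
  (x - from[1]) / (from[2] - from[1])
}

y_values <- c(0, 1, 2, 4, 10, 15)

# Squish in data space, then rescale
squish_first  <- rescale_manual(squish_manual(y_values, c(0, 4)), from = c(0, 4))
# Rescale first, then squish in the [0, 1] interval
rescale_first <- squish_manual(rescale_manual(y_values, from = c(0, 4)), c(0, 1))

all.equal(squish_first, rescale_first)
#> [1] TRUE
```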
# So when data values are rescaled, they match up the colours
identical(rescale(data_squish), scaled_squish)
#> [1] TRUE
Created on 2020-04-24 by the reprex package (v0.3.0)
I'm not able to correctly place the labels for this plot. By 'correctly' I mean not stuck or overlapping at the top but moved down the y-axis to an appropriate position. Please see the reproducible example and plots below. The current method of labelling means the y-axis position is ignored. How can I work around this?
library(magrittr)
library(dplyr)
library(ggpubr)
set.seed(20)
col1<-c(rep('E', each = 8))
col2<-c(rep('R', each = 8))
col3<-c(rep('S', each = 8))
behaviour<-c(col1,col2,col3)
value <- runif(length(behaviour), min=0, max=0.0006)
species <- c(rep('B_theta', each = length(behaviour)))
test.data <- data.frame(behaviour, value, species)
d <- compare_means(value~behaviour, data = test.data,method = 't.test')
d %<>% mutate(y_pos = c(1.004,1.156763e-06,5.882128e-04),labels = ifelse(p<0.15,p.format,p.signif))
d
.y.   group1 group2 p         p.adj p.format p.signif method y_pos        labels
value E      R      0.4678791 0.76  0.47     ns       T-test 1.004000e+00 ns
value E      S      0.1559682 0.47  0.16     ns       T-test 1.156763e-06 0.16
value R      S      0.3794209 0.76  0.38     ns       T-test 5.882128e-04 ns
ggplot(data=subset(test.data, !is.na(value)), aes(x=behaviour,y=value)) +
geom_boxplot(aes(fill = behaviour), width=0.4,outlier.colour = "transparent")+
geom_point(aes(fill = behaviour), size = 5, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("E" = '#3797a4', "R"= '#96bb7c',"S"= '#944e6c'))+
theme_classic()+
scale_y_log10() + annotation_logticks(sides = "l")+
geom_signif(data = as.data.frame(d), textsize=6, tip_length = 0.01,
aes(xmin=group1, xmax=group2, annotations=labels,y_position=y_pos),manual=TRUE)
Warning message:
"Ignoring unknown aesthetics: xmin, xmax, annotations, y_position"
This is what I currently get:
Desirable output:
I had some problems reproducing your example. Are you sure that the code you provided gave you these numbers? I really struggle to understand how a p-value of 0.16 can be smaller than 0.15.
Also, I am not familiar with the geom_signif function; all the information I have is what I found in its help page (?geom_signif).
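The doubt about the threshold can be checked directly against the p-values printed in the question. The vectors below are copied from that table; the object names are made up for illustration.

```r
# Values taken from the table in the question
p        <- c(0.4678791, 0.1559682, 0.3794209)
p_format <- c("0.47", "0.16", "0.38")
p_signif <- c("ns", "ns", "ns")

# With the question's threshold no p-value qualifies
labels_015 <- ifelse(p < 0.15, p_format, p_signif)
print(labels_015)
#> [1] "ns" "ns" "ns"

# With a threshold of 0.25 the E vs S comparison shows its p-value
labels_025 <- ifelse(p < 0.25, p_format, p_signif)
print(labels_025)
#> [1] "ns"   "0.16" "ns"
```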
Anyhow, with some minor changes I was able to create your desired plot:
library(magrittr)
library(dplyr)
library(ggpubr)
set.seed(20)
col1<-c(rep('E', each = 8))
col2<-c(rep('R', each = 8))
col3<-c(rep('S', each = 8))
behaviour<-c(col1,col2,col3)
value <- runif(length(behaviour), min=0, max=0.0006)
species <- c(rep('B_theta', each = length(behaviour)))
test.data <- data.frame(behaviour, value, species)
d <- compare_means(value~behaviour, data = test.data,method = 't.test')
# Here you manually specify the y-position of the significance bars
# The plot you got, was exactly what you specified here.
# But since we are using log10 scale for y Axis we should probably indicate
# The exponent i.e. -1, 0, 1 (as seen in your desired output plot)
# ALSO: I could not reproduce the significant result with your value for alpha of 0.15 -> had to use 0.25
#
# P.S. I will never use %<>% in my code. Ever! ;) (In my work, %<>% is an excellent footgun - it might save half a second of typing,
# but you can only hope that nobody ever has to read the code again)
#
d <- d %>% mutate(y_pos = c(-1, 0, 1),labels = ifelse(p < 0.25, p.format, p.signif))
ggplot(data=subset(test.data, !is.na(value)), aes(x=behaviour,y=value)) +
geom_boxplot(aes(fill = behaviour), width=0.4,outlier.colour = "transparent")+
geom_point(aes(fill = behaviour), size = 5, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("E" = '#3797a4', "R"= '#96bb7c',"S"= '#944e6c'))+
theme_classic()+
scale_y_log10() + annotation_logticks(sides = "l") +
# The syntax you used did not correspond to the example I found in
# ?geom_signif
# If we just follow the example everything works and the warnings disappear
#
geom_signif(textsize=6, tip_length = 0.01,
xmin=d$group1, xmax=d$group2, annotations=d$labels, y_position=d$y_pos)
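To see why the bars in the question ended up at the top of the panel, it may help to read the original y_pos values as exponents of 10. This sketch assumes, as the comments in the code above suggest, that y_position is interpreted in log10 units once scale_y_log10() is applied; the object names are made up.

```r
# y_pos values from the question, read as exponents of 10
y_pos_question  <- c(1.004, 1.156763e-06, 5.882128e-04)
implied_heights <- 10^y_pos_question   # roughly 10, 1 and 1

# Upper bound of the simulated data (max = 0.0006) on the log10 scale
data_max_log10 <- log10(0.0006)        # about -3.2

signif(implied_heights, 3)
round(data_max_log10, 2)
```

Under that assumption all three brackets sit several orders of magnitude above the data, which would explain labels that look stuck at the top; a position just above the data would be an exponent a little larger than -3.2.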
I am trying to plot the relative frequency of 1D data from 3 clusters. What I want is a single histogram that uses color to distinguish between the 3 clusters, and I want the height of each bin to represent the relative frequency of that value range for a particular cluster.
The code is as follows:
library(mvtnorm)
library(gtools)
library(ggplot2)
K = 3 # number of clusters
p_p = c(0.25, 0.25, 0.5) # population weights
theta_p = c(2, 5, 15) # population gamma params - shape
phi_p = c(2,2, 5) # population gamma params - scale
N_p = c(25, 25, 50) # sample size within each cluster
set.seed(1) # set seed so that the results are the same each time
y <- numeric()
## We will now sample data from all three clusters
y[1:N_p[1]] <- rgamma(N_p[1], theta_p[1], phi_p[1])
y[(N_p[1]+1): (N_p[1]+N_p[2])] <- rgamma(N_p[2], theta_p[2], phi_p[2])
y[(N_p[1]+N_p[2]+1): sum(N_p)] <- rgamma(N_p[3], theta_p[3], phi_p[3])
Data = data.frame(y = y, source = as.factor(c(rep(1,25), rep(2,25), rep(3,50))))
ggplot(Data, aes(x=y, color = source))+
geom_histogram(aes(y=..count../sum(..count..)),fill="white", position="dodge", binwidth = 0.5) +
theme(legend.position="top")+labs(title="Samples against Theoretical Dist",y="Frequency", x="Sample Value")
length(which(y[1:25]<=0.5))/length(y)
length(which(y[1:25]<=0.5))/length(y[0:25])
Now, what I want is for the first red histogram bar to have a height equal to length(which(y[1:25]<=0.5))/length(y[0:25]). I would understand if I was getting length(which(y[1:25]<=0.5))/length(y) instead, and I could work around that.
However, I'm getting a height of around 0.12, which doesn't match either of these values and has me thinking I am completely misunderstanding ..count.. and sum(..count..).
The issue isn't with your understanding of ..count.. but with your assumption of how binwidth works. You have assumed that setting it to 0.5 will put the breaks at 0, 0.5, 1, 1.5 and so on, but in fact the bins start at the lowest value in the range of your data. So the height of your first bar is length(which(y[1:25] <= (min(y) + 0.5)))/length(y), which is 0.13.
You can specify breaks in geom_histogram to work round this limitation:
ggplot(Data, aes(x = y, color = source)) +
geom_histogram(aes(y = stat(count)/length(y)), fill = "white",
position = "dodge", breaks = seq(0, 6, 0.5)) +
theme(legend.position = "top") +
labs(title = "Samples against Theoretical Dist",
y = "Frequency", x = "Sample Value")
Now each bar is 1/100th of the count since the vector is 100 long.
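The bar heights can be cross-checked without ggplot by counting per bin in base R. This is a sketch that regenerates the sample with the same seed and the same order of rgamma() calls as the question; `cluster`, `share_of_all` and `share_of_cluster` are made-up names. It also shows the per-cluster share that the question originally asked for.

```r
set.seed(1)
y <- c(rgamma(25, 2, 2), rgamma(25, 5, 2), rgamma(50, 15, 5))
cluster <- factor(rep(1:3, times = c(25, 25, 50)))

# Explicit breaks at 0, 0.5, 1, ... wide enough to cover the whole sample
breaks <- seq(0, ceiling(max(y)), by = 0.5)
counts <- table(cluster, cut(y, breaks = breaks))

# Height as a share of all 100 observations
share_of_all <- counts / length(y)

# Height as a share of each cluster's own observations
share_of_cluster <- prop.table(counts, margin = 1)

share_of_all["1", 1]      # same as sum(y[1:25] <= 0.5) / length(y)
share_of_cluster["1", 1]  # same as sum(y[1:25] <= 0.5) / 25
```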
I have a dataframe with 3 columns and several rows, with this structure
Label Year Frequency
1 a 1 86.45
2 b 1 35.32
3 c 1 10.94
4 a 2 13.55
5 b 2 46.30
6 c 2 12.70
up until 20 years. I plot it like this:
ggplot(data=df, aes(x=df$Year, y=df$Frequency, fill=df$Label))+
geom_col(position=position_dodge2(width = 0.1, preserve = "single"))+
scale_fill_manual(name=NULL,
labels=c("A", "B", "C"),
values=c("red", "cyan", "green")) +
scale_x_continuous(breaks = seq(0, 20, by = 1),
limits = c(0, 20)) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 90),
breaks = seq(0, 90, by = 10)) +
theme_bw()
What I want to do is add three normal distributions to the plot, so that each group of data (A, B, C) can be visually compared with the normal distribution most similar to its own distribution, using the same colors (the normal distribution for label A will be red, and so on).
From the data used here as an example, I would expect to see a red distribution that is higher and narrower than the green distribution, which will be shorter and wider. How can I add them to the plot?
I am using hline from ggplot to construct an axis for a data set I am looking at. Essentially I want to selectively color this axis based on a dataframe. This dataframe consists of an array of (7684, 7685,...,7853) and each corresponds to a letter "a", "b", "c", and "d". I would like to correspond each letter with a color used to color that interval on the axis.
For example row 1 of this data frame is: (7684, "c") so I would want to color the interval on the axis from 7684 to 7685 the color of "c" which could be red for instance. I have yet to think of a straightforward solution to this, I am not sure if hline would be the way to go with this.
> df
p nucleotide
1 c 7684
2 c 7685
3 t 7686
4 t 7687
5 a 7688
6 c 7689
7 a 7690
8 t 7691
9 a 7692
10 c 7693
Small snippet of what I am talking about. Basically want to associate df$p with colors. And color the interval of the corresponding df$nucleotide
You never use a for loop in ggplot and you should never use df$.. in an aesthetic.
library(dplyr)
library(ggplot2)
ggplot(df) +
geom_segment(aes(x = nucleotide, xend = lead(nucleotide), y = 1, yend = 1, color = p), size = 4)
#> Warning: Removed 1 rows containing missing values (geom_segment).
This takes us halfway. What it does is draw a segment from x to xend. x is mapped to the nucleotide value, and xend is mapped to lead(nucleotide), meaning the next value. This of course leads to the last row being left out, as it does not have a next value.
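The shift that lead() performs can be reproduced in base R, which makes the missing end point of the last row visible. A small sketch on the first three positions from the question:

```r
nucleotide <- c(7684, 7685, 7686)

# Base R equivalent of dplyr::lead(nucleotide): drop the first value
# and pad the end with NA
xend <- c(nucleotide[-1], NA)

data.frame(x = nucleotide, xend = xend)
# The last row has xend = NA, so no segment can be drawn for it
```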
The following code takes care of that, admittedly in a hackish way, by adding a row to the df and then limiting scale_x. It may not be generalizable.
It also adds some graphical embellishment.
df %>%
add_row(p = '', nucleotide = max(.$nucleotide) + 1) %>%
ggplot() +
geom_segment(aes(x = nucleotide, xend = lead(nucleotide), y = 1, yend = 1, color = p), size = 4) +
geom_text(aes(x = nucleotide, y = 1, label = nucleotide), nudge_x = .5, size = 3) +
scale_x_continuous(breaks = NULL, limits = c(min(df$nucleotide), max(df$nucleotide) + 1)) +
scale_color_brewer(palette = 'Dark2', limits = c('a', 'c', 't'), direction = 1) +
theme(aspect.ratio = .2,
panel.background = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank())
#> Warning: Removed 1 rows containing missing values (geom_segment).
#> Warning: Removed 1 rows containing missing values (geom_text).
Data
df <- read.table(text = ' p nucleotide
1 c 7684
2 c 7685
3 t 7686
4 t 7687
5 a 7688
6 c 7689
7 a 7690
8 t 7691
9 a 7692
10 c 7693', header = T)
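If the positions are consecutive integers, as in this sample, an alternative to adding a dummy row is to compute the end point as nucleotide + 1. This is a different technique from the lead() approach above, sketched here under that assumption; `df_nuc` and `p_seg` are made-up names, and `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size`).

```r
library(ggplot2)

# Same sample as in the Data block above
df_nuc <- data.frame(
  p = c("c", "c", "t", "t", "a", "c", "a", "t", "a", "c"),
  nucleotide = 7684:7693
)

p_seg <- ggplot(df_nuc) +
  # Each segment covers [nucleotide, nucleotide + 1], so the last row
  # also gets an end point and nothing is dropped
  geom_segment(aes(x = nucleotide, xend = nucleotide + 1,
                   y = 1, yend = 1, colour = p),
               linewidth = 4)

seg_data <- layer_data(p_seg)
anyNA(seg_data$xend)
```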
I'd like to make small returns in this plot more visible. The most appropriate function seems to be scale_colour_gradient2, but this washes out the small returns, which happen most often. Using limits helped but I couldn't work out how to set oob (out of bounds) so it would just have a "saturated" value rather than be grey. And the log transform just made small values stand out. Has someone else figured out how to do this elegantly?
library(zoo)
library(ggplot2)
library(tseries)
spx <- get.hist.quote(instrument="^gspc", start="2000-01-01",
end="2013-12-14", quote="AdjClose",
provider="yahoo", origin="1970-01-01",
compression="d", retclass="zoo")
spx.rtn <- diff(log(spx$AdjClose)) * 100
rtn.data <- data.frame(x=time(spx.rtn),yend=spx.rtn)
p <- ggplot(rtn.data) +
geom_segment(aes(x=x,xend=x,y=0,yend=yend,colour=yend)) +
xlab("") + ylab("S&P 500 Daily Return %") +
theme(legend.position="none",axis.title.x=element_blank())
# low returns invisible
p + scale_colour_gradient2(low="blue",high="red")
# extreme values are grey
p + scale_colour_gradient2(low="blue",high="red",limits=c(-3,3))
# log transform returns has opposite problem
max_val <- max(log(abs(spx.rtn)))
values <- seq(-max_val, max_val, length = 11)
library(RColorBrewer)
p + scale_colour_gradientn(colours = brewer_pal(type="div",pal="RdBu")(11),
values = values
, rescaler = function(x, ...) sign(x)*log(abs(x)), oob = identity)
Here is another possibility, using scale_colour_gradientn. Mapping of colours is set using values = rescale(...) so that resolution is higher for values close to zero. I had a look at some colour scales here: http://colorbrewer2.org. I chose a 5-class diverging colour scheme, RdBu, from red to blue via near-white. There might be other scales that suit your needs better, this is just to show the basic principles.
# check the colours
library(RColorBrewer)
# cols <- brewer_pal(pal = "RdBu")(5) # not valid in 1.1-2
cols <- brewer.pal(n = 5, name = "RdBu")
cols
# [1] "#CA0020" "#F4A582" "#F7F7F7" "#92C5DE" "#0571B0"
# show_col(cols) # not valid in 1.1-2
display.brewer.pal(n = 5, name = "RdBu")
Using rescale, -10 corresponds to blue #0571B0; -1 = light blue #92C5DE; 0 = light grey #F7F7F7; 1 = light red #F4A582; 10 = red #CA0020. Values between -1 and 1 are interpolated between light blue and light red, etc. Thus, the mapping is not linear and resolution is higher for small values.
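The positions that the five stops receive on the [0, 1] colour axis can be worked out by hand. This sketch uses plain arithmetic that mirrors what rescale() does by default (smallest value to 0, largest to 1); `stops` and `stop_positions` are made-up names.

```r
stops <- c(-10, -1, 0, 1, 10)

# Smallest stop goes to 0, largest to 1, the rest linearly in between
stop_positions <- (stops - min(stops)) / (max(stops) - min(stops))
print(stop_positions)
#> [1] 0.00 0.45 0.50 0.55 1.00
```

The three middle colours are therefore packed between 0.45 and 0.55, so returns between -1 and 1 (a tenth of the limits) use the middle half of the five-colour ramp.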
library(ggplot2)
library(scales) # needed for rescale
ggplot(rtn.data) +
geom_segment(aes(x = x, xend = x, y = 0, yend = yend, colour = yend)) +
xlab("") + ylab("S&P 500 Daily Return %") +
scale_colour_gradientn(colours = rev(cols), # reversed: cols runs red to blue, but -10 should be blue
values = rescale(c(-10, -1, 0, 1, 10)),
guide = "colorbar", limits=c(-10, 10)) +
theme(legend.position = "none", axis.title.x = element_blank())
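For the part of the question about getting a "saturated" colour instead of grey, one option is to pass `oob = squish` from the scales package together with `limits`. This is a sketch on synthetic data, since the original series has to be downloaded; `toy` and `p_sat` are made-up names and the values are invented.

```r
library(ggplot2)
library(scales)

# Synthetic stand-in for the return series (values are made up)
toy <- data.frame(x = 1:5, yend = c(-5, -1, 0, 3, 5))

p_sat <- ggplot(toy) +
  geom_segment(aes(x = x, xend = x, y = 0, yend = yend, colour = yend)) +
  scale_colour_gradient2(low = "blue", high = "red",
                         limits = c(-3, 3),
                         oob = squish)  # clamp to the limits instead of NA (grey)

sat_cols <- layer_data(p_sat)$colour
# The return of 5 gets the same colour as the limit of 3
sat_cols[4] == sat_cols[5]
```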
how about:
p + scale_colour_gradient2(low="blue",high="red",mid="purple")
or
p + scale_colour_gradient2(low="blue",high="red",mid="darkgrey")