I want to have some more flexibility for setting the breaks on a ggplot that has facets.
library(data.table)
library(tidyverse)
dt <- data.table(x = rnorm(1000),
group = sample(c(1,2), size=1000, replace = TRUE))
The problem is that I want to create the breaks based on all of the data for a particular facet, but the documentation for breaks says:
A function that takes the limits as input and returns breaks as output (e.g., a function returned by scales::extended_breaks()). Also accepts rlang lambda function notation.
Note that the function only receives the limits of the data for that facet. Say I want to use the output of summary() to create my breaks, e.g.:
breaks_f <- function(x){
print(x) # included this to confirm I only get limits
as.numeric(ceiling(summary(x)))
}
dt %>%
ggplot(aes(x=x)) +
geom_density(adjust=.8, color = NA, alpha = .8, fill = 'blue')+
scale_x_continuous(breaks = breaks_f) +
facet_wrap(vars(group), scales = 'free')
If you run this, you'll see that the breaks are based on summary(c(min, max)) for each facet, not on all of the data for that facet.
So is there a way to access all of the data within each facet?
Thanks!
One option would be ggh4x::facetted_pos_scales, which allows you to
... vary labels, breaks, limits, transformations and even axis guides for each panel individually.
Hence, using ggh4x::facetted_pos_scales you could apply your function and set the breaks individually for each panel:
library(data.table)
library(tidyverse)
library(ggplot2)
library(ggh4x)
set.seed(123)
dt <- data.table(
x = rnorm(1000),
group = sample(c(1, 2), size = 1000, replace = TRUE)
)
breaks_f <- function(x) {
print(x) # included this to confirm I only get limits
as.numeric(ceiling(summary(x)))
}
dt %>%
ggplot(aes(x = x)) +
geom_density(adjust = .8, color = NA, alpha = .8, fill = "blue") +
facet_wrap(vars(group), scales = "free") +
facetted_pos_scales(x = list(
scale_x_continuous(breaks = breaks_f),
scale_x_continuous(breaks = breaks_f)
))
#> [1] -2.937358 3.535249
#> [1] -2.937358 3.535249
#> [1] -3.084506 2.959591
#> [1] -3.084506 2.959591
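As the printed output shows, breaks_f still receives only the panel limits (two values per call), so summary() is computed from c(min, max). One workaround, sketched here under the assumption that the facet variable is group as in the question, is to compute the breaks from each panel's full data before plotting. The name breaks_by_group is hypothetical, and a plain data.frame is used instead of data.table so the block needs only base R.

```r
set.seed(123)
dt <- data.frame(
  x = rnorm(1000),
  group = sample(c(1, 2), size = 1000, replace = TRUE)
)

# Compute breaks from *all* observations in each facet, not just the limits.
# unique() drops duplicates that ceiling() can produce (e.g. median and mean).
breaks_by_group <- lapply(
  split(dt$x, dt$group),
  function(v) unique(as.numeric(ceiling(summary(v))))
)

breaks_by_group
```

Each element could then be handed to its own scale, e.g. facetted_pos_scales(x = lapply(unname(breaks_by_group), function(b) scale_x_continuous(breaks = b))), keeping the list in the same order as the facet levels. I have not checked this against every ggh4x version.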
I am attempting to complete a principal component analysis on a set of data containing columns of numeric data.
Assume a dataset like this (in reality I have a preconfigured data frame; this one is for reproducibility):
v1 <- c(1,2,3,4,5,6,7)
v2 <- c(3,6,2,5,2,4,9)
v3 <- c(6,1,4,2,3,7,5)
dataset <-data.frame(v1,v2,v3)
row.names(dataset) <-c('New York', 'Seattle', 'Washington DC', 'Dallas', 'Chicago','Los Angeles','Minneapolis')
I have run my principal component analysis and successfully plotted it:
pca=prcomp(dataset,scale=TRUE)
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
What I want to do, however, is colour-code my data points based on the city, which is given by the row names of my dataset. I also want to use these cities (i.e. the row names) as labels.
I've tried the following, but neither attempt has worked:
## attempt 1 - I get row labels, but no chart
plot(pca$x[,1], pca$x[,2],col=rownames(dataset),pch=rownames(dataset),
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),cex=0.7,pos=3,col="darkgrey")
## attempt 2
datasetwithcity = rownames_to_column(dataset, var = "city")
head(datasetwithcity)
OnlyCities=datasetwithcity[,1]
OnlyCities
# this didn't work:
City_Labels=as.numeric(OnlyCities)
head(City_Labels)
# gets city labels, but loses points and no colour
plot(pca$x[,1], pca$x[,2],col=City_Labels,pch=City_Labels,
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),
cex=0.7,pos=3,col="darkgrey")
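Both attempts fail for the same underlying reason: the row names are character strings. col= expects colour names or numbers, pch= with a string uses only its first character, and as.numeric() on a string such as "New York" returns NA (with a warning). Converting the names to a factor first gives integer codes that can index a palette. A base-R sketch, assuming rainbow() is an acceptable palette and using extendrange() so the labels are not clipped (every city is unique here, so every point gets its own colour):

```r
v1 <- c(1, 2, 3, 4, 5, 6, 7)
v2 <- c(3, 6, 2, 5, 2, 4, 9)
v3 <- c(6, 1, 4, 2, 3, 7, 5)
dataset <- data.frame(v1, v2, v3)
row.names(dataset) <- c("New York", "Seattle", "Washington DC", "Dallas",
                        "Chicago", "Los Angeles", "Minneapolis")
pca <- prcomp(dataset, scale = TRUE)

# as.numeric(rownames(dataset)) would be all NA; factor codes are usable
city <- factor(rownames(pca$x))
palette_cols <- rainbow(nlevels(city))
point_cols <- palette_cols[as.integer(city)]

plot(pca$x[, 1], pca$x[, 2], col = point_cols, pch = 19,
     xlab = "First PC", ylab = "Second PC",
     xlim = extendrange(pca$x[, 1], f = 0.2),
     ylim = extendrange(pca$x[, 2], f = 0.2))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3, cex = 0.7)
legend("bottomright", legend = levels(city), col = palette_cols,
       pch = 19, cex = 0.7)
```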
There are many different ways to do this.
In base R, you could do:
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC", col = seq(nrow(pca$x)),
xlim = c(-2.5, 2.5), ylim = c(-2, 2))
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
text(x = pca$x[,1], y = pca$x[,2], labels = rownames(pca$x), pos = 1)
Personally, I think the resulting aesthetics are nicer (and easier to change to suit your needs) with ggplot. The code is also a bit easier to read once you get used to the syntax.
library(ggplot2)
df <- as.data.frame(pca$x)
df$city <- rownames(df)
ggplot(df, aes(PC1, PC2, color = city)) +
geom_point(size = 3) +
geom_text(aes(label = city) , vjust = 2) +
lims(x = c(-2.5, 2.5), y = c(-2, 2)) +
theme_bw() +
theme(legend.position = "none")
Created on 2021-10-28 by the reprex package (v2.0.0)
A question posted here shows how to declare some of the values as missing. I have a similar problem, except I wish to highlight a single value with a different color, e.g. mpg = 20. Ideally, I would like it to show up in the legend as well.
To be clear, I wish to highlight a specific value on the gradient.
I am reusing the code from the other post to seed the effort. This code specifies the lower limit of the data but does not allow for an arbitrarily chosen value.
I was wondering if people know how to do this with or without using something like scale_colour_gradientn.
library(ggplot2)
dat <- head(mtcars)
dat$model <- head(colnames(mtcars))
dat$is_low <- ifelse(dat$mpg < 20, TRUE, FALSE)
ggplot(dat, aes(x = model, y = mpg, fill = mpg)) +
geom_col() +
scale_fill_continuous(limits=c(20,max(dat$mpg)))
This is adapted from the answer I gave here, but it requires some messing around with the palette.
This is a custom palette function that replaces the colours of values lying between the target values with replace_colour, but it requires you to know the range of the data first. Note that the function isn't very user-friendly, but it does the job.
library(ggplot2)
library(scales)
my_palette <- function(colours, target = c(20.5, 21.5),
range = range(target), values = NULL,
replace_colour = "green") {
target <- (target - range[1]) / diff(range)
ramp <- scales::colour_ramp(colours)
force(values)
function(x) {
# Decide what values to replace
replace <- x > target[1] & x < target[2]
if (length(x) == 0)
return(character())
if (!is.null(values)) {
xs <- seq(0, 1, length.out = length(values))
f <- stats::approxfun(values, xs)
x <- f(x)
}
out <- ramp(x)
# Actually replace values
out[replace] <- replace_colour
out
}
}
You can then use that function with a custom scale as follows. I chose to highlight around 21 because 20 doesn't occur in dat$mpg.
dat <- head(mtcars)
dat$model <- head(colnames(mtcars))
dat$is_low <- ifelse(dat$mpg < 20, TRUE, FALSE)
colours <- seq_gradient_pal("#132B43", "#56B1F7")(seq(0, 1, length.out = 12))
ggplot(dat, aes(x = model, y = mpg, fill = mpg)) +
geom_col() +
continuous_scale(
"fill", "my_pal",
my_palette(colours, range = range(dat$mpg), target = c(20.9, 21.1)),
guide = guide_colourbar(nbin = 500) # Give guide plenty bins
)
Created on 2021-04-13 by the reprex package (v1.0.0)
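The core of my_palette is: rescale the values to [0, 1] using the known data range, look them up on a colour ramp, then overwrite the colours of values that fall strictly inside the target band. A base-R sketch of just that logic makes it easy to check which bars will be highlighted. It uses grDevices::colorRamp() in place of scales::colour_ramp(), and the function name highlight_colours is hypothetical.

```r
highlight_colours <- function(x, target, from = "#132B43", to = "#56B1F7",
                              replace_colour = "green") {
  rng <- range(x)
  x01 <- (x - rng[1]) / diff(rng)           # rescale to [0, 1]
  ramp <- grDevices::colorRamp(c(from, to)) # returns RGB values in 0..255
  out <- grDevices::rgb(ramp(x01) / 255)    # convert to hex colour strings
  # Overwrite values strictly inside the target band
  out[x > target[1] & x < target[2]] <- replace_colour
  out
}

mpg <- head(mtcars)$mpg   # 21.0 21.0 22.8 21.4 18.7 18.1
cols <- highlight_colours(mpg, target = c(20.9, 21.1))
cols
```

With target = c(20.9, 21.1) only the two 21.0 values (Mazda RX4 and Mazda RX4 Wag) fall inside the band, which matches the choice of highlighting around 21 above.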
Applying this to log-scaled values requires you to log-transform all the inputs to my_palette (the range and the target) as well.
dat <- head(mtcars)
dat$model <- head(colnames(mtcars))
dat$mpg <- c(1e-6, 1e-4, 1e-2, 1e0, 1e2, 1e4)
colours <- seq_gradient_pal("#132B43", "#56B1F7")(seq(0, 1, length.out = 12))
ggplot(dat, aes(x = model, y = mpg, fill = mpg)) +
geom_col() +
scale_y_log10() +
continuous_scale(
"fill", "my_pal", trans = "log10",
my_palette(colours, range = log10(range(dat$mpg)),
target = log10(1e2) * c(0.9, 1.1)),
guide = guide_colourbar(nbin = 500) # Give guide plenty bins
)
I'm currently finishing off my Master's project and need to include some graphics for the write-up. Without boring you too much, I have some data associated with AR(1) parameters ranging from 0.1 to 0.9 in 0.1 increments. As such, I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours; it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line through the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like one graph, but I have had issues getting the endpoints to match up, since each facet is drawn as a separate panel. Can anyone give me some advice on how to do this? I am not particularly partial to faceting, so if it is not needed I will do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos should be enough to get started, e.g.:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
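If you would rather not depend on plyr (or use iris as a stand-in), the medians can be pre-computed in base R with aggregate(). The sketch below uses the simulated data from the question and draws the median trend across rho on top of per-rho boxplots. Boxplots are my substitution for the histograms, chosen because they share a single panel, which sidesteps the endpoint-matching problem described in the question.

```r
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
                            l_1024 = rnorm(900))

# One median per value of rho (groups come back sorted by rho)
med <- aggregate(l_1024 ~ rho, data = opt_lens_geom, FUN = median)

# Boxplots are drawn at x = 1, 2, ..., 9, so the trend line uses those positions
boxplot(l_1024 ~ rho, data = opt_lens_geom, xlab = "rho", ylab = "l_1024")
lines(seq_len(nrow(med)), med$l_1024, col = "red", lwd = 2, type = "b")
med
```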
You could do a variant on this using geom_violin instead of histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made-up data:
library(ggplot2)
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median")
This produces a violin for each value of rho, and joins the medians for each violin.
I'd like to be able to build a string based on the number of columns in my matrix and pass that to ggplot as an aesthetic. This doesn't seem to be covered by the aes_string() function. The reason I want this is that I'm using the ggalluvial package but the intricacies matter less than the principle. My code looks like this:
library(ggplot2)
library(ggalluvial)
my_alluvial_plot <- function(scores, n_groups = 5) {
score_names <- names(scores)
scr_mat <- data.matrix(scores)
n_cols <- ncol(scores)
# create ntiles of scores so that flow can be seen between groups
ranks <- apply(scr_mat, 2, function(x) {
rk <- dplyr::ntile(x, n_groups)
return(as.factor(rk))
})
to_plot <- data.frame(ranks)
# build the string for the aes() function
a_string <- ""
for (i in 1:n_cols) {
a_string <- paste0(a_string, "axis", i, " = to_plot[, ", i, "],")
}
# remove final comma
a_string <- substr(a_string, 1, nchar(a_string) - 1)
ggplot(to_plot,
aes(eval(a_string))) +
geom_alluvium(aes(fill = to_plot[, n_cols], width = 1/12)) +
geom_stratum(width = 1/12, fill = "black", color = "grey") +
scale_x_continuous(breaks = 1:n_cols, labels = score_names) +
scale_fill_brewer(type = "qual", palette = "Set1")
}
df <- data.frame(col1 = runif(10),
col2 = runif(10),
col3 = rnorm(10),
col4 = rnorm(10))
my_alluvial_plot(df)
This produces a blank plot with the following error:
Warning: Ignoring unknown aesthetics: width
Error: Discrete value supplied to continuous scale
Basically, I want to build an alluvial plot that can support an arbitrary number of columns, so the ggplot code as it's evaluated would end up being like
ggplot(to_plot,
aes(axis1 = data[, 1], axis2 = data[, 2], axis3 = data[, 3], ...))
But neither eval() nor parse() produces anything sensible, and aes_string() has the same problem. Is there any way to do this systematically?
The reason you can't run parse() or eval() on strings like "axis1 = col1, axis2 = col2" is that such a string is not, by itself, valid R code. But the entire ggplot call can be parsed!
If you rework the plot call like this, it produces the alluvial plot just fine:
gg_string <- paste0("ggplot(to_plot,
aes(", a_string, ")) +
geom_alluvium(aes(fill = to_plot[, n_cols], width = 1/12)) +
geom_stratum(width = 1/12, fill = 'black', color = 'grey') +
scale_x_continuous(breaks = 1:n_cols, labels = score_names) +
scale_fill_brewer(type = 'qual', palette = 'Set1')")
eval(parse(text = gg_string))
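An alternative that avoids pasting code into strings is to build the aes() call itself as a language object, with one axisN = column argument per column, and then evaluate it. The sketch below only constructs and inspects the call, so it runs without ggplot2 installed; the helper name build_axis_call is hypothetical, and it assumes the columns are referenced by name rather than as to_plot[, i].

```r
# Build aes(axis1 = col1, axis2 = col2, ...) for any number of columns
build_axis_call <- function(col_names) {
  args <- lapply(col_names, as.name)                  # column names as symbols
  names(args) <- paste0("axis", seq_along(col_names)) # axis1, axis2, ...
  as.call(c(list(as.name("aes")), args))              # aes(axis1 = ..., ...)
}

cl <- build_axis_call(c("col1", "col2", "col3"))
cl
```

Inside my_alluvial_plot, where the columns of to_plot are named col1, col2, ..., something like ggplot(to_plot, eval(build_axis_call(names(to_plot)))) should give the same mapping as the pasted string. With rlang-based ggplot2 (3.0.0 or later), aes(!!!setNames(rlang::syms(names(to_plot)), paste0("axis", seq_len(n_cols)))) should be the tidy-eval equivalent; I have not tested either against ggalluvial here.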
I need to add a legend for the two lines (the best-fit line and the 45-degree line) on TOP of my two plots. Sorry, I don't know how to add images of the plots! Please help me, I really appreciate it!
Here is an example:
library(ggplot2)
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)
Here is another approach, which maps the colour aesthetic to string constants to identify the different groups and create a legend.
First, an alternative way to create your test data (named DF instead of newdata):
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes() call, which means each of the two values used will be mapped to a different colour and a guide (legend) will be created for that mapping.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
geom_abline(aes(colour="one-to-one"), intercept =0, slope = 1, size = 1) +
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("")
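For comparison, a base-graphics sketch of the same layout: one panel per type, each with its own lm() fit and the one-to-one line, plus a single legend across the top. This is a swapped-in base-R technique, not part of the ggplot answer; it recreates DF so the block runs on its own and places the shared legend by overlaying an empty full-device plot on top of the panels.

```r
set.seed(42)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
                 xvariable = 1:15,
                 yvariable = 2 * (1:15) + rnorm(15, 0, 2))

# One lm() fit per panel, like stat_smooth(method = "lm") does per facet
fits <- lapply(split(DF, DF$type),
               function(d) lm(yvariable ~ xvariable, data = d))

op <- par(no.readonly = TRUE)  # save settings so they can be restored
par(mfrow = c(1, nlevels(DF$type)), oma = c(0, 0, 3, 0))
for (lev in levels(DF$type)) {
  d <- DF[DF$type == lev, ]
  plot(d$xvariable, d$yvariable, pch = 19, main = lev,
       xlab = "xvariable", ylab = "yvariable")
  abline(a = 0, b = 1, col = "red", lwd = 2)  # one-to-one line
  abline(fits[[lev]], col = "blue", lwd = 2)  # best-fit line
}

# Overlay an empty full-device plot so the legend can sit in the top margin
par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE)
plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n")
legend("top", legend = c("one-to-one", "best fit"), col = c("red", "blue"),
       lwd = 2, horiz = TRUE, bty = "n")
par(op)
```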
Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for abline and one-to-one line, set colours by group
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("abline", "one-to-one"),
name = "") +
theme(legend.position = "top")
# If you rather want to stack the two keys in the legend you can add:
# guide = guide_legend(direction = "vertical")
#...as argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.