In R, using ggplot2, I am plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial example (in long format, suitable for plotting) is this:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
which gives
In this example, the first variable -- shown in the left column -- has unlimited range, while the second variable -- shown in the right column -- is weakly positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns in this plot, i.e. set Y axis limits separately for the two variables plotted. However, in order to allow for easy visual comparison of the different groups for each of the two variables, I would also like to have the identical Y axes within each column.
I've looked at the scales option to facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_x" allows the X axes to vary across columns, while
passing scales = "free_y" allows the Y axes to vary across rows, but
there is no option to allow the Y axes to vary across columns (nor, presumably, the X axes across rows).
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest approach would be to create one plot per facet column and bind them together with something like {patchwork}. To keep the facet look, you can still add a faceting layer to each plot.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
  group_split(groups1) %>%
  map({
    ~ggplot(.x, aes(x = x, y = values)) +
      geom_line() +
      facet_grid(groups2 ~ groups1)
  }) %>%
  wrap_plots()
Created on 2023-01-11 with reprex v2.0.2
I want to create a combination plot using plot_grid from the cowplot package.
The two plots that I want to combine use a log scale. Of the data plotted, some is negative, which gets dropped.
I can quite easily produce a decent result using facet_wrap that looks like this:
library(tidyverse)
tibble(x = rnorm(100),
y = rnorm(100),
type = "A") %>%
bind_rows(tibble(x = rnorm(100, mean = 10),
y = rnorm(100, mean = 10),
type = "B")) %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
facet_wrap(~type)
But in my particular situation, I can't use facet_wrap because I want to give the panels A and B different x-axis labels and want to change the number format slightly (e.g. adding a $ sign to the axis ticks of panel A and a % sign to panel B).
Therefore I use plot_grid:
tibble(x = rnorm(100),
y = rnorm(100),
type = "A") %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
scale_y_log10() -> a
tibble(x = rnorm(100, mean = 10),
y = rnorm(100, mean = 10),
type = "B") %>%
ggplot(aes(y = y, x = x)) +
geom_point() +
scale_y_log10() -> b
cowplot::plot_grid(a,b)
Now the problem is that the y axes are no longer comparable, because each plot gets its own range (this is equivalent to scales = "free_y" in facet_wrap).
So therefore I attempt to set the limits/ranges for both plots manually by choosing the min and max from both plots:
lims <- c(min(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range),
max(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range))
cowplot::plot_grid(a + ylim(lims),b + ylim(lims))
But now the result is this:
So essentially I want to replicate the scales="fixed" in facet_wrap using plot_grid
Any ideas?
many thanks!
The issue is that layer_scales() returns the y range on the log10 scale, whereas ylim() expects limits in the original data units. You need to back-transform the range to actual values:
lims = 10^c(min(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range),
max(layer_scales(a)$y$range$range, layer_scales(b)$y$range$range))
Alternatively, you can compute the range of the actual data.
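As a minimal sketch of that alternative (the data frames df_a and df_b and the seed are stand-ins invented for illustration, not the asker's data), the shared limits can be taken from the positive values of both data sets and passed to scale_y_log10() directly. Passing them through the limits argument, rather than adding ylim(), also keeps the log scale in place, since ylim() adds a new continuous y scale that replaces the existing one.

```r
library(ggplot2)

set.seed(1)  # hypothetical seed, only for reproducibility of this sketch
df_a <- data.frame(x = rnorm(100), y = rnorm(100))
df_b <- data.frame(x = rnorm(100, mean = 10), y = rnorm(100, mean = 10))

# Only positive values can be shown on a log scale, so compute the
# common range from those (in original data units, not log10 units).
y_all <- c(df_a$y, df_b$y)
lims <- range(y_all[y_all > 0])

# Give both plots the same limits while keeping the log10 transformation.
a <- ggplot(df_a, aes(x = x, y = y)) +
  geom_point() +
  scale_y_log10(limits = lims)
b <- ggplot(df_b, aes(x = x, y = y)) +
  geom_point() +
  scale_y_log10(limits = lims)

# Non-positive y values are still dropped with a warning when drawn.
cowplot::plot_grid(a, b)
```

With identical limits on both scales, the two panels should behave like scales = "fixed" in facet_wrap while still allowing different labels and number formats per plot.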
Problem description
I have thousands of lines (~4000) that I want to plot. However, it is infeasible to plot all lines using geom_line() and rely on, for example, alpha = 0.1 to show where the density of lines is high and where it is not. I came across something similar in Python, and the second plot in the answers there looks really nice, but I do not know if something similar can be achieved in ggplot2. I am after something like this:
An example dataset
It would make much more sense to demonstrate this with a data set showing a pattern, but for now I just generated random sine curves:
library(dplyr)  # for %>% and bind_rows()
set.seed(1)
gen.dat <- function(key) {
  # random starting offset for each curve
  c <- sample(seq(0.1, 1, by = 0.1), 1)
  time <- seq(c * pi, length.out = 100)
  val <- sin(time)
  # replace the offset time axis with a common 1..100 index
  time <- 1:100
  data.frame(time, val, key)
}
dat <- lapply(seq(1, 10000), gen.dat) %>% bind_rows()
Tried heatmap
I tried a heatmap as suggested here; however, this heatmap does not take into account that the points are connected along the whole axis (as in a line) but rather shows the "heat" per time point.
Question
How can we, in R using ggplot2, plot a heatmap of lines similar to that shown in the first figure?
Looking closely, one can see that the graph to which you are linking consists of many, many, many points rather than lines.
The ggpointdensity package does a similar visualisation. Note that with so many data points, there are noticeable performance issues. I am using the development version because it contains the method argument, which allows using different smoothing estimators and apparently copes better with larger numbers of points. There is a CRAN version too.
You can adjust the smoothing with the adjust argument.
I have increased the density of the x intervals in your code to make the result look more like lines. I have also slightly reduced the number of 'lines' in the plot.
library(tidyverse)
#devtools::install_github("LKremer/ggpointdensity")
library(ggpointdensity)
set.seed(1)
gen.dat <- function(key) {
c <- sample(seq(0.1,1, by = 0.1), 1)
time <- seq(c*pi,length.out=500)
val <- sin(time)
time = seq(0.02,100,0.1)
data.frame(time,val,key)
}
dat <- lapply(seq(1, 1000), gen.dat) %>% bind_rows()
ggplot(dat, aes(time, val)) +
geom_pointdensity(size = 0.1, adjust = 10)
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)
Created on 2020-03-19 by the reprex package (v0.3.0)
update
Thanks to user Robert Gertenbach for creating some more interesting sample data. Here is the suggested use of ggpointdensity on this data:
library(tidyverse)
library(ggpointdensity)
gen.dat <- function(key) {
has_offset <- runif(1) > 0.5
time <- seq(1, 1000, length.out = 1000)
val <- sin(time / 100 + rnorm(1, sd = 0.2) + (has_offset * 1.5)) *
rgamma(1, 20, 20)
data.frame(time,val,key)
}
dat <- lapply(seq(1,1000), gen.dat) %>% bind_rows()
ggplot(dat, aes(time, val, group = key)) +
  stat_pointdensity(geom = "line", size = 0.05, adjust = 10) +
  scale_color_gradientn(colors = c("blue", "yellow", "red"))
Created on 2020-03-24 by the reprex package (v0.3.0)
Your data will result in a quite uniform polkadot density.
I generated some slightly more interesting data like this:
gen.dat <- function(key) {
has_offset <- runif(1) > 0.5
time <- seq(1, 1000, length.out = 1000)
val <- sin(time / 100 + rnorm(1, sd = 0.2) + (has_offset * 1.5)) *
rgamma(1, 20, 20)
data.frame(time,val,key)
}
dat <- lapply(seq(1,1000), gen.dat) %>% bind_rows()
We then compute a 2D density estimate. kde2d doesn't have a predict method, so we model its output with a LOESS fit and use that to predict the density at each data point:
dens <- MASS::kde2d(dat$time, dat$val, n = 400)
dens_df <- data.frame(with(dens, expand_grid( y, x)), z = as.vector(dens$z))
fit <- loess(z ~ y * x, data = dens_df, span = 0.02)
dat$z <- predict(fit, with(dat, data.frame(x=time, y=val)))
Plotting it then gets this result:
ggplot(dat, aes(time, val, group = key, color = z)) +
geom_line(size = 0.05) +
theme_minimal() +
scale_color_gradientn(colors = c("blue", "yellow", "red"))
This is all highly reliant on:
The number of series
The resolution of series
The density of kde2d
The span of loess
so your mileage may vary
I came up with the following solution using geom_segment(). However, I'm not sure geom_segment() is the way to go: it only counts segments whose pairwise values are exactly the same, whereas in a heatmap (as in my question) values that are merely near each other also contribute to the 'heat'.
# Simple stats to get all possible line segments
vals <- unique(dat$time)
min.val = min(vals)
max.val = max(vals)
# Get all possible line segments
comb.df <- data.frame(
time1 = min.val:(max.val - 1),
time2 = (min.val + 1): max.val
)
# Join the original data to all possible line segments
comb.df <- comb.df %>%
left_join(dat %>% select(time1 = time, val1 = val, key )) %>%
left_join(dat %>% select(time2 = time, val2 = val, key ))
# Count how often each line segment occurs in the data
comb.df <- comb.df %>%
group_by(time1, time2, val1, val2) %>%
summarise(n = n_distinct(key))
# ggplot2 to plot segments
ggplot(comb.df %>% arrange(n)) +
geom_segment(aes(x = time1, y = val1, xend = time2, yend = val2, color = n), alpha =0.9) +
scale_colour_gradient( low = 'green', high = 'red') +
theme_bw()
I recently discovered the multiplot function from the Rmisc package to produce stacked plots using ggplot2 plots/objects. What I am trying to do now is to create a multiplot of multiplots. Unfortunately, unlike the ggplot function, multiplot does not produce objects, so my issue cannot be resolved by simply nesting multiplot.
I will create a dataframe to make my point clear. In my dataframe named df, I have 3 columns: period, group and value. A certain value is recorded for each of 3 groups over 10 periods. (Note: I don't use a seed number below despite the use of the sample function because the focus is not numerical, it is graphical)
# Create a data frame for illustration purposes
df <- data.frame(period = rep(1:10, 3),
group = rep(LETTERS[1:3], each = 10),
value = sample(100, 30, replace = TRUE))
I then add a fourth column to df, which is the exponential transformation of the value column.
df$exp.value = exp(df$value)
I would like to create stacked plots allowing me to compare the values in each group to their exponential counterparts.
# Split dataframe by group
df_split <- split(df, df$group)
# Plots of values in each group
plots <- lapply(df_split, function(i){
ggplot(data = i, aes(x = period, y = value)) + geom_line()
})
# Plots of exponentiated values in each group
plots_exp <- lapply(df_split, function(i){
ggplot(data = i, aes(x = period, y = exp.value)) + geom_line()
})
plots and plots_exp are both lists of 3 elements each containing ggplot objects. The first element of each list corresponds to group A, the second element corresponds to group B and the third element corresponds to group C.
In order to compare each group's values to the exponential values, I can use the multiplot function. Following is an example with group A:
multiplot(plots[[1]], plots_exp[[1]], cols = 1)
How can I create a grid which will include the multiplot above as well as the ones for groups B and C? As if the code included ... + facet_grid(. ~ group)?
We can use the cowplot package:
library(cowplot)
plot_grid(plots[[1]], plots_exp[[1]],
plots[[2]], plots_exp[[2]],
plots[[3]], plots_exp[[3]],
labels = c("A", "A", "B", "B", "C", "C"),
ncol = 1, align = "v")
We can output to a pdf looping through plots and plots_exp list objects. Every page will contain 2 plots. This is a better option when we have a lot of groups:
pdf("myPlots.pdf")
# print() explicitly so each page is drawn even when the script is source()d
for (i in seq_along(plots)) {
  print(plot_grid(plots[[i]], plots_exp[[i]], ncol = 1, align = "v"))
}
dev.off()
Another option is to prepare the data for ggplot and use facet as usual:
library(dplyr)
library(tidyr)
library(ggplot2)
gather(df, valueType, value, -c(group, period)) %>%
mutate(myGroup = paste(group, valueType)) %>%
ggplot(aes(period, value)) +
geom_line() +
facet_grid(myGroup ~ ., scales = "free_y")
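For readers on a recent tidyr, here is a rough equivalent of the reshaping step using pivot_longer(), which supersedes gather(). This is only a sketch: the seed, the intermediate name df_long and the value column name y are choices made here for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # hypothetical seed; the question deliberately did not set one
df <- data.frame(period = rep(1:10, 3),
                 group = rep(LETTERS[1:3], each = 10),
                 value = sample(100, 30, replace = TRUE))
df$exp.value <- exp(df$value)

# Stack value and exp.value into one column, remembering which is which.
df_long <- df %>%
  pivot_longer(c(value, exp.value),
               names_to = "valueType", values_to = "y") %>%
  mutate(myGroup = paste(group, valueType))

# One row of the facet grid per group/value-type combination.
p <- ggplot(df_long, aes(period, y)) +
  geom_line() +
  facet_grid(myGroup ~ ., scales = "free_y")
p
```

If groups should sit side by side with the two value types stacked beneath each other, facet_grid(valueType ~ group, scales = "free_y") on the same df_long may be closer to the layout asked for in the question.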
I'm trying to plot distribution of species between 2 different habitat types (hab 1 and hab 2). Some of my species secondarily use some habitats, so I have a separate column for secondary hab1 (hab1.sec). To visualise their distribution across the two habitats and different depths, I am using a facet_grid between hab1 and hab2. Example code as below:
# example code
set.seed(101)
ID <- seq(1,20, by=1) ## ID for plotting
species <- sample(letters, size=20) ## arbitrary species
## different habitat types in hab.1
hab1 <- c("coastal","shelf","slope","open.ocean","seamount")
hab1.pri <- sample(hab1, size = 20, replace = T)
## secondarily used habitats, may not be present for some species
hab.sec <- c("coastal","shelf","slope","open.ocean","seamount", NA)
hab1.sec <- sample(hab.sec, size = 20, replace = T)
## habitat types for hab.2
hab2 <- c("epipelagic","benthopelagic","epibenthic","benthic")
hab.2 <- sample(hab2, size = 20, replace = T)
## arbitrary depth values
dep.min <- sample(seq(0,1000), size = 20, replace = T)
dep.max <- sample(seq(40, 1500), size = 20, replace = T)
# make data frame
dat <- data.frame(ID, species, hab1.pri, hab1.sec, hab.2,dep.min, dep.max)
# ggplot with facet grid
p <- ggplot(data = dat) +
  geom_segment(aes(x = as.factor(ID), xend = as.factor(ID), y = dep.min, yend = dep.max), size = 2) +
  scale_y_reverse(breaks = c(0, 200, 1000, 1500)) +
  facet_grid(hab.2 ~ hab1.pri, scales = "free", space = "free") +
  theme_bw()
I would like to add segments for hab1.sec within the existing facet grid. I have tried this code:
p+ geom_segment(aes(x=as.factor(ID),xend=as.factor(ID),y=dep.min, yend=dep.max),linetype=2,data = dat)+facet_wrap(~hab1.sec)
But doing this produces a new graph.
Is there a better way to add those extra lines to the existing grid (preferably as dashed lines)?
I'd be really grateful for any help with this!
Thanks a lot, in advance!
What about combining the primary and secondary habitats into one variable and mapping that variable to an aesthetic?
Note I'm using tidyr and dplyr tools here because they help a lot in cases like this.
library(dplyr)
library(tidyr)
dat %>%
gather(hab1, value, -ID, -species, -(hab.2:dep.max)) %>%
ggplot()+
geom_segment(aes(x=as.factor(ID),xend=as.factor(ID),y=dep.min, yend=dep.max, linetype=hab1),size=2) +
scale_y_reverse(breaks = c(0, 200, 1000,1500))+
facet_grid(hab.2~value, scales = "free" ,space = "free")+
theme_bw()
So I have two sets of data (of different length) that I am trying to group up and display the density plots for:
dat <- data.frame(dens = c(nEXP,nCNT),lines = rep(c("Exp","Cont")))
ggplot(dat, aes(x = dens, group=lines, fill = lines)) + geom_density(alpha = .5)
when I run the code it spits an error about the different lengths, i.e.
"arguments imply different num of rows: x, y"
I then augment the code to:
dat <- data.frame(dens = c(nEXP,nCNT),lines = rep(c("Exp","Cont"),X))
Where X is the length of the longer argument so the lengths of "lines" will match that of dens.
Now the issue is that when I go to plot the data I only get ONE density plot. I know there should be two, since plotting the densities with plot()/lines() clearly shows two unequal, overlapping distributions, so I assume the error is in the grouping.
hope that makes sense.
I am not sure why, but in the end I simply had to build the labels manually instead of relying on rep():
A <- data.frame(nEXP, key = "exp")
B <- data.frame(nCNT, key = "con")
# give both data frames the same column names so they can be stacked
colnames(A) <- c("dens", "key")
colnames(B) <- c("dens", "key")
dat <- rbind(A, B)
ggplot(dat, aes(x = dens, fill = key)) + geom_density(alpha = .5)
You need to tell rep() how many times to repeat each element so that the labels line up with the values:
dat <- data.frame(dens = c(nEXP, nCNT),
                  lines = rep(c("Exp", "Cont"), times = c(length(nEXP), length(nCNT))))
That should give you a dat you can use with your ggplot call.
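A self-contained sketch of this, with simulated vectors standing in for nEXP and nCNT (their lengths, means and the seed are assumptions made up for the example):

```r
library(ggplot2)

set.seed(1)  # hypothetical seed for reproducibility
nEXP <- rnorm(150, mean = 1)  # stand-in for the experimental values
nCNT <- rnorm(100, mean = 0)  # stand-in for the control values

# A vector-valued `times` repeats "Exp" 150 times, then "Cont" 100 times,
# so every value in dens gets the label of the vector it came from.
dat <- data.frame(
  dens  = c(nEXP, nCNT),
  lines = rep(c("Exp", "Cont"), times = c(length(nEXP), length(nCNT)))
)

# Check that the label counts match the lengths of the two inputs.
table(dat$lines)

p <- ggplot(dat, aes(x = dens, fill = lines)) +
  geom_density(alpha = 0.5)
p
```

By contrast, rep(c("Exp", "Cont"), X) with a single number X alternates the two labels, so each label ends up attached to a mix of both samples and the two densities look nearly identical.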