I have 30 heatmaps in the form of square matrices. Each heatmap is linked to a date and I would like to create an animation transitioning from one heatmap to the next based on the date. I am using gganimate in R in order to do so and I am encountering some problems. Let me show an example with some random data
# GENERATE RANDOM DATA. 30 3X3 MATRICES STORED IN list_matrices
pacman::p_load(tidyverse, ggplot2, av, gganimate)
list_matrices = list()
for(i in 1:30){list_matrices[[i]] = matrix(runif(9), nrow=3)}
# PUT ALL THE MATRICES TOGETHER INTO A TIBBLE AND
# DO A PIVOT LONGER IN ORDER TO USE GGPLOT
for(i in 1:length(list_matrices))
{
tmp_result = list_matrices[[i]] %>% as_tibble() %>%
mutate(rowname = c('a', 'b', 'c'),
frame = i) %>%
pivot_longer(-c(rowname, frame), names_to = 'colname')
if(i == 1)
{
df_result = tmp_result
} else{
df_result = rbind(df_result, tmp_result)
}
}
This is the dataframe that I will plot:
> df_result
# A tibble: 270 x 4
rowname frame colname value
<chr> <int> <chr> <dbl>
1 a 1 V1 0.456
2 a 1 V2 0.716
3 a 1 V3 0.316
4 b 1 V1 0.724
5 b 1 V2 0.766
And I create the plot:
p <- ggplot(data = df_result, aes(x = rowname, y = colname, fill = value)) +
geom_tile() +
geom_text(aes(label = round(value, 3))) +
scale_fill_gradient(low="white", high="red", limits=c(0, 1))
p + transition_time(frame) +
labs(title = "Date: {frame_time}")
I am having three problems here
I am outputing this into a video, but regardless the number of matrices in list_matrices (this is, regardless the number of heatmaps I have) the video always lasts 10 seconds, so transition is too fast. Is there a way to increase video duration?
The video quality is poor. Is there a way to increase quality?
I am including the value of the variable in the heatmap, and I have set geom_text(aes(label = round(value, 3))) to round the number of decimals, but some of the heatmaps in the video show way more than 3 decimals. Any idea why this is happening and how to fix it?
Your three problems could be addressed as
change fps to 1 to slow down (modify further to slow it down).
save as png to get better video quality
wrap your round with as.character() in the label.
p <- ggplot(data = df_result, aes(x = rowname, y = colname, fill = value)) + geom_tile() + geom_text(aes(label = as.character(round(value,3)))) + scale_fill_gradient(low="white", high="red", limits=c(0, 1))
pp <- p + transition_time(frame) + labs(title = "Date: {frame_time}")
animate(pp, fps=1, dev="png")
Related
I am currently attempting to plot densities of flies on julian dates, per year. The aim is to see when there are peak densities of flies, for two methods of data collection (group 1 and group 2). I have many rows of data, over the course of 10 years, for example, the data set looks like this:
year
julian
group
2000
214
1
2001
198
1
2001
224
1
2000
189
2
2000
214
2
2001
222
2
2001
259
2
2000
260
2
2000
212
1
Each row is a single observation.
This is my first time plotting using ggplots, so I am confused as to how to plot vertical peak lines for each year.
The code currently looks like this:
Code
data$group <- as.factor(data$group)
plots <- ggplot(data, aes(x = julian, group = group)) +
geom_density(aes(colour = group),adjust = 2) + facet_wrap(~year, ncol = 2)
I have attempted to plot peaks using this code:
geom_vline(data = vline, aes(xintercept = density(data$julian)$x[which.max(density(data$julian)$y)]))
vline <- summarise(group_by(data,year, group=group), density(ata$julian, group=group)$x[which.max(density(data$julian)$y)])
vline
However I assume it has found the peak density for all years and all groups.
Please may anyone help advise me on how to plot max densities for each year and group across each facet? Even better if there are multiple peaks, how would I find those, and a quantitative value for the peaks?
Thank you in advance, I am very new to ggplots.
Instead of trying to wrangle all computations into one line of code I would suggest to split it into steps like so. Instead of using your code to find the highest peak I make use of this answer which in principle should also find multiple peaks (see below):
library(dplyr)
library(ggplot2)
fun_peak <- function(x, adjust = 2) {
d <- density(x, adjust = adjust)
d$x[c(F, diff(diff(d$y) >= 0) < 0)]
}
vline <- data %>%
group_by(year, group) %>%
summarise(peak = fun_peak(julian))
#> `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
ggplot(data, aes(x = julian, group = group)) +
geom_density(aes(colour = group), adjust = 2) +
geom_vline(data = vline, aes(xintercept = peak)) +
facet_wrap(~year, ncol = 2)
And here is a small example with multiple peaks based on the example data in the linked answer:
x <- c(1,1,4,4,9)
data <- data.frame(
year = 2000,
julian = rep(c(1,1,4,4,9), 2),
group = rep(1:2, each = 5)
)
data$group <- as.factor(data$group)
vline <- data %>%
group_by(year, group) %>%
summarise(peak = fun_peak(julian, adjust = 1))
#> `summarise()` has grouped output by 'year', 'group'. You can override using the `.groups` argument.
ggplot(data, aes(x = julian, group = group)) +
geom_density(aes(colour = group), adjust = 1) +
geom_vline(data = vline, aes(xintercept = peak)) +
facet_wrap(~year, ncol = 2)
I have a relatively large dataset that I can share here.
I am trying to plot all the lines (not just one: e.g. a mean or a median) corresponding to the values of y over x = G, with the data grouped by I and P; so that the levels of the variable I appear with a different colour and the levels of the variable P appear with a different line type.
The problem I have is that the graph I get is a zig-zag line graph along the x-axis. The aim, obviously, is to have a line for each combination of data, avoiding the zig-zag. I have read that this problem could be related to the way the data is grouped. I have tried several combinations of data grouping using group but I can't solve the problem.
The code I use is as follows:
#Selecting colours
colours<-brewer.pal(n = 11, name = "Spectral")[c(9,11,1)]
#Creating plot
data %>%
ggplot(aes(x = G, y = y, color = I, linetype=P)) +
geom_line(aes(linetype=P,color=I),size=0.2)+
scale_linetype_manual(values=c("solid", "dashed")) +
scale_color_manual(values=colours) +
scale_x_continuous(breaks = seq(0,100, by=25), limits=c(0,100)) +
scale_y_continuous(breaks = seq(0,1, by=0.25), limits=c(0,1)) +
labs(x = "Time", y = "Value") +
theme_classic()
I also tried unsuccessfully adding group=interaction(I, P) inside ggplot(aes()), as they suggests in other forums.
Following #JonSpring's point:
dd2 <- (filter(dd,G %in% c(16,17))
%>% group_by(P,I,G)
%>% summarise(n=length(unique(y)))
)
shows that you have many different values of y for each combination of G/I/P:
# A tibble: 12 x 4
# Groups: P, I [6]
P I G n
<chr> <chr> <dbl> <int>
1 heterogeneity I005 16 34
2 heterogeneity I005 17 37
3 heterogeneity I010 16 34
... [etc.]
One way around this, if you so choose, is to use stat_summary() to have R collapse the y values in each group to their mean:
(dd %>%
ggplot(aes(x = G, y = y, color = I, linetype=P)) +
stat_summary(fun=mean, geom="line",
aes(linetype=P,color=I,group=interaction(I,P)),size=0.2) +
scale_linetype_manual(values=c("solid", "dashed")) +
scale_color_manual(values=colours) +
labs(x = "Time", y = "Value") +
theme_classic()
)
You could also do this yourself with group_by() + summarise() before calling ggplot.
There's not enough information in the data set as presented to identify individual lines. If we are willing to assume that the order of the values within a given I/G/P group is an appropriate indexing variable, then we can do this:
## add index variable
dd3 <- dd %>% group_by(P,I,G) %>% mutate(index=seq(n()))
(dd3 %>%
ggplot(aes(x = G, y = y, color = I, linetype=P)) +
geom_line(aes(group=interaction(index,I,P)), size=0.2) +
scale_linetype_manual(values=c("solid", "dashed")) +
scale_color_manual(values=colours) +
labs(x = "Time", y = "Value") +
theme_classic()
)
If this isn't what you had in mind, then you need to provide more information ...
I have no dataset just two plotted lines and I want to generate scattered y-axis data 2 standard deviations away from the mean (the plotted line). Here is my code for the line:
ggplot() +
lims(x = c(0,20), y = c(0,1)) +
annotate("segment",x = .1,xend = 5, yend = .25, y = .1) +
annotate("segment",x = 5,xend = 20, yend = .35,y = .25)
Sorry if this post is unclear but I am not sure the best way to explain it. Let me know if you have any questions or if what I am trying to do isn't possible.
Here's an example for one of the lines you have (I didn't double check whether y = 0.09*x + 0 is consistent or not with what you showed, guiding my answer from your comment).
library(ggplot2)
library(dplyr)
df <- tibble(x=1:20,
y1=0.09*x,
y2=0.0067*x)
# generate dots for y1
# mean y1 and sd = 1
sapply(df$y1, function(tt) rnorm(10, tt)) %>%
# make it into tibble
as_tibble() %>%
# pivot into longer format
tidyr::pivot_longer(everything()) %>%
# names of the columns get assigned to V1 V2 ...
# we can clean that and get the actual x
# this works nicely because your x=1:20, will fail otherwise
mutate(X=as.numeric(stringr::str_remove(name, "V"))) %>%
# plot the thing
ggplot(aes(X, value)) +
geom_point() +
# add the "mean" values from before
geom_point(data=df, aes(x, y1), col="red", size=2)
I have data like:
Name Count
Object1 110
Object2 111
Object3 95
Object4 40
...
Object2000 1
So only the first 3 objects have high counts, the rest 1996 objects have fewer than 40, with the majority less than 10. I am plotting this data with ggplot bar like:
ggplot(data=object_count, mapping = aes(x=object, y=count)) +
geom_bar(stat="identity") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
My plot is as below. As you can see, because there are so many objects with low counts, the width of the graph is very long, and the width of the bar is tiny, which is almost invisible for the hight-counts objects. Is there a better way to represent this data? My goal is to show a few top-count objects and to show there are many low-count ones. Is there a way to group the low count ones together?
My guess is that your data looks something like this:
set.seed(1)
object_count <- tibble(
obj_num = 1:2000,
object = paste0("Object", obj_num),
count = ceiling(20 * rpois(2000, 10) / obj_num)
)
head(object_count)
## A tibble: 6 x 3
# obj_num object count
# <int> <chr> <dbl>
#1 1 Object1 160
#2 2 Object2 100
#3 3 Object3 46
#4 4 Object4 55
#5 5 Object5 56
#6 6 Object6 40
Sure enough, when I plot that with ggplot(object_count, aes(object, count)) + geom_col() + [theme stuff] I get a similar figure.
Here are some strategies "to show a few top-count objects and to show there are many low-count ones."
Histogram
A vanilla histogram might not be clarifying here, since the important big values appear dramatically less often and would not be prominent enough:
ggplot(object_count, aes(count)) +
geom_histogram()
But we could change that by transforming the y axis to bring more emphasis to small values. The pseudo_log transformation is nice for that since it works like a log transform for large values, but linearly near -1 to 1. In this view, we can clearly see where the outliers with just one appearance are, but also see that there are many more small values. The binwidth = 1 here could be set to something wider if the specific values of the big values aren't as important as their general range.
ggplot(object_count, aes(count)) +
geom_histogram(binwidth = 1) +
scale_y_continuous(trans = "pseudo_log",
breaks = c(0:3, 100, 1000), minor_breaks = NULL)
Faceting
Another option could be to split your view into two pieces, one with detail on the big values, the other showing all the small values:
object_count %>%
mutate(biggies = if_else(count > 20, "Big", "Little")) %>%
ggplot(aes(obj_num, count)) +
geom_col() +
facet_grid(~biggies, scales = "free")
Lumping
Another option might be too lump together all the counts under 10. The version below emphasizes the object name and count, and the "Other" category has been labeled to show how many values it includes.
object_count %>%
mutate(group = if_else(count < 10, "Others", object)) %>%
group_by(group) %>%
summarize(avg = mean(count), count = n()) %>%
ungroup() %>%
mutate(group = if_else(group == "Others",
paste0("Others (n =", count, ")"),
group)) %>%
mutate(group = forcats::fct_reorder(group, avg)) %>%
ggplot() +
geom_col(aes(group, avg)) +
geom_text(aes(group, avg, label = round(avg, 0)), hjust = -0.5) +
coord_flip()
Cumulative count (~Pareto chart)
If you're interested in the share of total count, you might also look at the cumulative count and see how the big values make up a large share:
object_count %>%
mutate(cuml = cumsum(count)) %>%
ggplot(aes(obj_num)) +
geom_tile(aes(y = count + lag(cuml, default = 0),
height = count))
I'm working with ggplot2, stacked barplot to 100% with relative values, using the position = "fill" option in geom_bar().
Here my code:
test <- data.frame (x = c('a','a','a','b','b','b','b')
,k = c('k','j','j','j','j','k','k')
,y = c(1,3,4,2,5,9,7))
plot <- ggplot(test, aes(x =x, y = y, fill = k))
plot <- plot + geom_bar(position = "fill",stat = "identity")
plot <- plot + scale_fill_manual(values = c("#99ccff", "#ff6666"))
plot <- plot + geom_hline(yintercept = 0.50)+ggtitle("test")
plot
Here the result:
However, I need to add the labels on the various bars, also on the "sub bars". To do this, I worked with the geom_text():
plot + geom_text(aes(label=y, size=4))
But the result is not good. I tried without luck the hjust and vjust parameters, and also using something like:
plot + geom_text(aes(label=y/sum(y), size=4))
But I did not reach the result needed (I'm not adding all the tests to not overload the question with useless images, if needed, please ask!).
Any idea about to have some nice centered labels?
label specifies what to show, and y specifies where to show. Since you are using proportions for y-axis with position = "fill", you need to calculate the label positions (geom_text(aes(y = ...))) in terms of proportions for each x using cumulative sums. Additionally, to display only the total proportion of a given color, you will need to extract the Nth row for each x, k combination. Here, I am building a separate test_labels dataset for use in geom_text to display the custom labels:
test <- data.frame (x = c('a','a','a','b','b','b','b'),
k = c('k','j','j','j','j','k','k'),
y = c(1,3,4,2,5,9,7))
test_labels = test %>%
arrange(x, desc(k)) %>%
group_by(x) %>%
mutate(ylabel_pos = cumsum(y)/sum(y),
ylabel = y/sum(y)) %>%
group_by(k, add = TRUE) %>%
mutate(ylabel = sum(ylabel)) %>%
slice(n())
ggplot(test, aes(x =x, y = y, fill = k)) +
geom_bar(position = "fill", stat = "identity") +
scale_fill_manual(values = c("#99ccff", "#ff6666")) +
geom_hline(yintercept = 0.50) +
geom_text(data = test_labels,
aes(y = ylabel_pos, label=paste(round(ylabel*100,1),"%")),
vjust=1.6, color="white", size=3.5) +
ggtitle("test")
Result:
> test_labels
# A tibble: 4 x 5
# Groups: x, k [4]
x k y ylabel_pos ylabel
<fctr> <fctr> <dbl> <dbl> <dbl>
1 a j 4 1.0000000 0.8750000
2 a k 1 0.1250000 0.1250000
3 b j 5 1.0000000 0.3043478
4 b k 7 0.6956522 0.6956522