How to move dots closer together in circular dot plot in R using ggplot?

I have this code below which creates a figure that is near perfect for what I need, except I need to know how to adjust the distance between the dots. Changing the dot size will not help because some of my data sets have over 10,000 points (some type of clustering could work too). The dots end up covering the entire circle because there is so much space between them.
Any help is appreciated!
library(tidyverse)
ACT34_radarplot_input %>%
hist(breaks = seq(0, 24, 0.2), plot = FALSE) %>%
`[`(c(2, 4)) %>%
as_tibble() %>%
group_by(mids) %>%
summarize(y = -seq(counts), .groups = "drop") %>%
filter(y < 0) %>%
ggplot(aes(mids, y)) +
geom_point(col = "red2", size = 3) +
geom_hline(yintercept = 0) +
scale_x_continuous(breaks = c(0:24), limits = c(0, 24)) +
ylim(c(-20, 0)) +
coord_polar() +
ggtitle("Act 34 Significant Genes Phases") +
theme_void() +
theme(axis.text.x = element_text(size = 16),
plot.title = element_text(size = 20, hjust = 0.5))
Here is the figure as is.

Related

corrplot one column multiple groups single plot

I would like a corrplot, where I only have the first column. I got that solution from here:
Corrplot with only one variable on x axis
library(corrplot)
corrplot(cor(iris[,1:4])[1:4,1, drop=FALSE], cl.pos='n')
but I would like the first column to repeat WITHIN THE SAME PLOT by group:
I am not married to using corrplot but need some similar solution. Thanks for any and all help.
Here is a hacky way to do it without an additional package:
library(corrplot)
Dat<-cor(iris[,1:4])[1:4,1, drop=FALSE]
PlotDat<-cbind(Dat,Dat,Dat)
corrplot(PlotDat, cl.pos='n')
If you want complete control over your plot, you could do something like this using ggplot2
library(tidyverse)
as.data.frame(cor(iris[1:4])[,rep(1, 3)]) %>%
setNames(1:3) %>%
rownames_to_column() %>%
pivot_longer(-rowname) %>%
mutate(rowname = factor(rowname, rev(unique(rowname)))) %>%
ggplot(aes(name, rowname)) +
geom_tile(fill = 'white', color = 'black') +
geom_point(aes(size = abs(value), color = value)) +
coord_equal() +
scale_x_discrete(NULL, labels = rep('Sepal.Length', 3), expand = c(0, 0),
position = 'top') +
scale_y_discrete(NULL, expand = c(0, 0)) +
scale_color_gradientn(colors = c('red3', 'red', 'white', '#4080ce', '#063062'),
limits = c(-1, 1), guide = 'none') +
scale_size_continuous(limits = c(0, 1), range = c(0, 20), guide = 'none') +
theme(axis.text.x = element_text(angle = 90),
plot.margin = margin(20, 20, 20, 20))

How to fold a wide plot into multiple lines using ggplot?

I want to show the x-axis labels and the form of the line clearly on this plot. It is a point plot with a lot of categories along the x-axis which makes the plot very wide and very hard to read the x-axis.
Would it be possible to fold the plot in half and display it on two panels, one above the other? How would I do that? I thought about hacking around with facet_wrap but this got ugly with the ordered points (as I wish to maintain the order of the x-axis based on the value).
Or are there better ways of showing this data? The position of the categories along the x-axis is of interest, as is the shape of the line formed by the points.
I generated the example plot using this code:
library(stringi)
library(ggplot2)
example <- data.frame(
cat = do.call(paste0, Map(stri_rand_strings, n=150, length=c(25, 14, 13), pattern = c('[A-Z]', '[0-9]', '[A-Z]'))),
val = rnorm(150, mean = 20)
)
cat_ordered_by_val <- example$cat[order(example$val)]
example$cat = factor(example$cat, levels=cat_ordered_by_val)
ggplot(example, aes(y = val, x = cat)) +
geom_point() +
ylab("Value") + xlab("Category") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=5))
ggsave("~/Desktop/what_a_plot.jpg")
The code below puts the points into one of two facets in alternating order. You can also do mutate(facet = row_number() < nrow(example) / 2) to put the first half of the points in one facet and the other half in the other facet:
library(tidyverse)
library(stringi)
example <- data.frame(
cat = do.call(paste0, Map(stri_rand_strings, n = 150, length = c(25, 14, 13), pattern = c("[A-Z]", "[0-9]", "[A-Z]"))),
val = rnorm(150, mean = 20)
)
cat_ordered_by_val <- example$cat[order(example$val)]
example$cat <- factor(example$cat, levels = cat_ordered_by_val)
example %>%
arrange(cat) %>%
mutate(facet = row_number() %% 2) %>%
ggplot(aes(y = val, x = cat)) +
geom_point() +
ylab("Value") +
xlab("Category") +
theme_bw() +
facet_wrap(~facet, ncol = 1, scales = "free") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 5))
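The half-and-half variant mentioned above could look roughly like this. It is only a sketch: the category names (cat_001, cat_002, ...) are made-up stand-ins for the random strings, and the facet labels "lower half" and "upper half" are my own naming, not part of the original code.

```r
library(dplyr)
library(ggplot2)

set.seed(42)

# Hypothetical stand-in data: 150 made-up category names with random values
example <- data.frame(
  cat = sprintf("cat_%03d", 1:150),
  val = rnorm(150, mean = 20)
)

# Order the categories by value, as in the question
example$cat <- factor(example$cat, levels = example$cat[order(example$val)])

# Split the ordered categories into two consecutive halves
halves <- example %>%
  arrange(cat) %>%
  mutate(facet = if_else(row_number() <= n() / 2, "lower half", "upper half"))

# scales = "free" lets each panel show only its own categories on the x-axis
p <- ggplot(halves, aes(x = cat, y = val)) +
  geom_point() +
  facet_wrap(~facet, ncol = 1, scales = "free") +
  ylab("Value") +
  xlab("Category") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 5))
```

Printing p should draw the lower-valued half in the top panel and the higher-valued half below it, so the sorted curve reads left to right and then continues on the second row.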

How to automatically choose a good ylim to read geom_labels in ggplot2 in R

Suppose I write the following code with the diamonds dataset:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(total_value = sum(price, na.rm = TRUE)) %>%
arrange(total_value) %>%
mutate(cut = as_factor(cut)) %>%
mutate(across(where(is.numeric), ~round(., 1))) %>%
ggplot(aes(x = cut, y = total_value)) +
geom_col(aes(fill = cut)) +
theme(legend.position = "none") +
coord_flip() +
geom_label(aes(label = paste0("$", total_value)), size = 6) +
labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "") +
theme(axis.text = element_text(size = rel(1)))
which outputs the following plot:
As you can see, it is impossible to read the last digit(s) of the first category ("Ideal").
So my question is: I know I can simply write something like coord_flip(ylim = c(0, 80000000)) and this would solve the problem; however, what could I write instead so that ggplot2 automatically works out how much space to provide in ylim for the geom_label()s to be clearly readable, without me having to set it manually?
I'm trying to create an automatic Dashboard with multiple plots such as this, but I cannot manually tune every one of those, I need an automatic mechanism and I haven't found anything regarding this on StackOverflow for geom_label() specifically.
Thanks.
Instead of positioning the label at the end of the bar, you could move it closer to the middle and adjust its position with hjust, so it won't spill out of the plot area, which is sized to fit the bars.
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(total_value = sum(price, na.rm = TRUE)) %>%
arrange(total_value) %>%
mutate(cut = as_factor(cut)) %>%
mutate(across(where(is.numeric), ~round(., 1))) %>%
ggplot(aes(x = cut, y = total_value)) +
geom_col(aes(fill = cut)) +
theme(legend.position = "none") +
coord_flip() +
geom_label(aes(label = paste0("$", total_value), y = total_value/2), size = 6, hjust = 0.2) +
labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "") +
theme(axis.text = element_text(size = rel(1)))
That gives:
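A different option, not taken from the answer above, is to keep the labels at the bar ends and ask the scale for extra padding with expansion(). This is a minimal sketch; the 15% multiplier is an assumed starting point that you would tune, not something ggplot2 derives from the label width.

```r
library(dplyr)
library(ggplot2)

totals <- diamonds %>%
  group_by(cut) %>%
  summarize(total_value = sum(price, na.rm = TRUE))

p <- ggplot(totals, aes(x = cut, y = total_value)) +
  geom_col(aes(fill = cut)) +
  geom_label(aes(label = paste0("$", total_value)), size = 6) +
  # Pad the value axis: no padding at zero, 15% of the data range past the
  # longest bar. The 0.15 is an assumption, not a measured label width.
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  coord_flip() +
  theme(legend.position = "none") +
  labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "")
```

Because the padding is relative to the data range, the same line can be reused across dashboard plots with very different totals, although very long labels may still need a larger multiplier.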

How to label the count of each bin within ggridges package?

I have a data frame that simulates the NFL season with 2 columns: team and rank. I am trying to use ggridges to make a distribution plot of the frequency of each team at each rank from 1-10. I can get the plot working, but I'd like to display the count of each team/rank in each bin. I have been unsuccessful so far.
ggplot(results,
aes(x=rank, y=team, group = team)) +
geom_density_ridges2(aes(fill=team), stat='binline', binwidth=1, scale = 0.9, draw_baseline=T) +
scale_x_continuous(limits = c(0,11), breaks = seq(1,10,1)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F", "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL)
Which creates this plot:
I tried adding in this line to get the count added to each bin, but it did not work.
geom_text(stat='bin', aes(y = team + 0.95*stat(count/max(count)),
label = ifelse(stat(count) > 0, stat(count), ""))) +
Not the exact dataset but this should be enough to at least run the original plot:
results = data.frame(team = rep(c('Jets', 'Giants', 'Washington', 'Falcons', 'Bengals', 'Jaguars', 'Texans', 'Cowboys', 'Vikings'), 1000), rank = sample(1:20,9000,replace = T))
How about calculating the count for each bin, joining to the original data and using the new variable n as the label?
library(dplyr) # for count, left_join
library(ggplot2)
library(ggridges)
results %>%
count(team, rank) %>%
left_join(results) %>%
ggplot(aes(rank, team, group = team)) +
geom_density_ridges2(aes(fill = team),
stat = 'binline',
binwidth = 1,
scale = 0.9,
draw_baseline = TRUE) +
scale_x_continuous(limits = c(0, 11),
breaks = seq(1, 10, 1)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F",
"#0C264C", "#192E6C", "#136677", "#203731"), name = NULL) +
geom_text(aes(label = n),
color = "white",
nudge_y = 0.2)
Result:
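One possible refinement, offered as a sketch rather than as part of the answer: joining the counts back onto every row means each label is drawn once per original row. Passing the counted table only to geom_text draws each label once. The three-team data set and the name bin_counts are made up for illustration.

```r
library(dplyr)
library(ggplot2)
library(ggridges)

set.seed(1)

# Hypothetical, smaller version of the example data
results <- data.frame(
  team = rep(c("Jets", "Giants", "Washington"), 1000),
  rank = sample(1:20, 3000, replace = TRUE)
)

# One row per team/rank bin, restricted to the ranks shown on the x-axis
bin_counts <- results %>%
  count(team, rank) %>%
  filter(rank <= 10)

p <- ggplot(results, aes(rank, team, group = team)) +
  geom_density_ridges2(aes(fill = team),
                       stat = "binline",
                       binwidth = 1,
                       scale = 0.9,
                       draw_baseline = TRUE) +
  # Labels come from the summarised table, so each bin gets a single label
  geom_text(data = bin_counts, aes(label = n), nudge_y = 0.2) +
  scale_x_continuous(limits = c(0, 11), breaks = seq(1, 10, 1)) +
  theme_ridges() +
  theme(legend.position = "none")
```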
Neilfws' answer is great, but I've always found geom_ridgelines difficult to work with in circumstances like this so I usually recreate them with geom_rect:
library(dplyr)
library(ggplot2)
library(ggridges) # for theme_ridges
results %>%
count(team, rank) %>%
filter(rank<=10) %>%
mutate(team=factor(team)) %>%
ggplot() +
geom_rect(aes(xmin=rank-0.5, xmax=rank+0.5, ymin=team, fill=team,
ymax=as.numeric(team)+n*0.75/max(n))) +
geom_text(aes(x=rank, y=as.numeric(team)-0.1, label=n)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930",
"#00143F", "#0C264C", "#192E6C", "#136677",
"#203731"), name = NULL) +
ylab("team")
I especially like the level of fine control I get from geom_rect rather than ridgelines. But you do lose out on the nice bounding line drawn around each ridgeline, so if that's important then go with the other answer.

How does gganimate order an ordered bar time-series?

I have a time-series of data, where I'm plotting diagnosis rates for a disease on the y-axis DIAG_RATE_65_PLUS, and geographical groups for comparison on the x-axis NAME as a simple bar graph. My time variable is ACH_DATEyearmon, which the animation is cycling through as seen in the title.
df %>% ggplot(aes(reorder(NAME, DIAG_RATE_65_PLUS), DIAG_RATE_65_PLUS)) +
geom_bar(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22),
axis.text.x=element_blank()) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 1) +
ease_aes('linear')
I've reordered NAME so it gets ranked by DIAG_RATE_65_PLUS.
What gganimate produces:
I now have two questions:
1) How exactly does gganimate reorder the data? There is some overall general reordering, but each month has no frame where the groups are perfectly ordered by DIAG_RATE_65_PLUS from smallest to biggest. Ideally, I would like the final month "Aug 2018" to be ordered perfectly. All of the previous months can have their x-axis based on the ordered NAME for "Aug 2018".
2) Is there an option in gganimate where the groups "shift" to their correct rank for each month in the bar chart?
Plots for my comment queries:
https://i.stack.imgur.com/s2UPw.gif
https://i.stack.imgur.com/Z1wfd.gif
@JonSpring
df %>%
ggplot(aes(ordering, group = NAME)) +
geom_tile(aes(y = DIAG_RATE_65_PLUS/2,
height = DIAG_RATE_65_PLUS,
width = 0.9), alpha = 0.9, fill = "gray60") +
geom_hline(yintercept = (2/3)*25, linetype="dotdash") +
# text in x-axis (requires clip = "off" in coord_cartesian)
geom_text(aes(y = 0, label = NAME), hjust = 2) + ## trying different hjust values
theme(plot.title = element_text(hjust = 1, size = 22),
axis.ticks.y = element_blank(), ## axis.ticks.y shows the ticks on the flipped x-axis (the now metric), and hides the ticks from the geog layer
axis.text.y = element_blank()) + ## axis.text.y shows the scale on the flipped x-axis (the now metric), and hides the placeholder "ordered" numbers from the geog layer
coord_cartesian(clip = "off", expand = FALSE) +
coord_flip() +
labs(title='{closest_state}', x = "") +
transition_states(ACH_DATEyearmon,
transition_length = 2, state_length = 1) +
ease_aes('cubic-in-out')
With hjust=2, labels are not aligned and move around.
Changing the above code with hjust=1
@eipi10
df %>%
ggplot(aes(y=NAME, x=DIAG_RATE_65_PLUS)) +
geom_barh(stat = "identity", alpha = 0.66) +
geom_hline(yintercept=(2/3)*25, linetype = "dotdash") + #geom_vline(xintercept=(2/3)*25) is incompatible, but geom_hline works, but it's not useful for the plot
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22)) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 50) +
view_follow(fixed_x=TRUE) +
ease_aes('linear')
To add on to @eipi10's great answer, I think this is a case where it's worth replacing geom_bar for more flexibility. geom_bar is normally quite convenient for discrete categories, but it doesn't let us take full advantage of gganimate's silky-smooth animation glory.
For instance, with geom_tile, we can recreate the same appearance as geom_bar, but with fluid movement on the x-axis. This helps to keep visual track of each bar and to see which bars are shifting order the most. I think this addresses the 2nd part of your question nicely.
To make this work, we can add to the data a new column showing the ordering that should be used at each month. We save this order as a double, not an integer (by multiplying by 1.0). This will allow gganimate to place a bar at position 1.25 when it's animating between position 1 and 2.
df2 <- df %>%
group_by(ACH_DATEyearmon) %>%
mutate(ordering = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
ungroup()
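To see what the ordering column contains, here is a toy version with made-up numbers (three names, two months). It only illustrates the ranking step and is not the real data.

```r
library(dplyr)

# Made-up values purely for illustration
toy <- data.frame(
  NAME = rep(c("A", "B", "C"), times = 2),
  ACH_DATEyearmon = rep(c("Jan 2017", "Feb 2017"), each = 3),
  DIAG_RATE_65_PLUS = c(30, 25, 35, 20, 40, 10)
)

toy_ranked <- toy %>%
  group_by(ACH_DATEyearmon) %>%
  # min_rank() returns integers; multiplying by 1.0 converts them to doubles
  mutate(ordering = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
  ungroup()

toy_ranked$ordering
# 2 1 3 2 3 1  (rank within each month, smallest rate = 1)
```

Note that min_rank() gives tied values the same rank, so two bars with identical rates in a month would sit on top of each other; that is worth checking on real data.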
Now we can plot in similar fashion, but using geom_tile instead of geom_bar. I wanted to show the NAME both on top and at the axis, so I used two geom_text calls with different y values, one at zero and one at the height of the bar. vjust lets us align each vertically using text line units.
The other trick here is to turn off clipping in coord_cartesian, which lets the bottom text go below the plot area, into where the x-axis text would usually go.
p <- df2 %>%
ggplot(aes(ordering, group = NAME)) +
geom_tile(aes(y = DIAG_RATE_65_PLUS/2,
height = DIAG_RATE_65_PLUS,
width = 0.9), alpha = 0.9, fill = "gray60") +
# text on top of bars
geom_text(aes(y = DIAG_RATE_65_PLUS, label = NAME), vjust = -0.5) +
# text in x-axis (requires clip = "off" in coord_cartesian)
geom_text(aes(y = 0, label = NAME), vjust = 2) +
coord_cartesian(clip = "off", expand = FALSE) +
labs(title='{closest_state}', x = "") +
theme(plot.title = element_text(hjust = 1, size = 22),
axis.ticks.x = element_blank(),
axis.text.x = element_blank()) +
transition_states(ACH_DATEyearmon,
transition_length = 2, state_length = 1) +
ease_aes('cubic-in-out')
animate(p, nframes = 300, fps = 20, width = 400, height = 300)
Back to your first question, here's a color version that I made by removing fill = "gray60" from the geom_tile call. I sorted the NAME categories in order of Aug 2017, so they will look sequential for that one, as you described.
There's probably a better way to do that sorting, but I did it by joining df2 to a table with just the Aug 2017 ordering.
Aug_order <- df %>%
filter(ACH_DATEyearmon == "Aug 2017") %>%
mutate(Aug_order = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
select(NAME, Aug_order)
df2 <- df %>%
group_by(ACH_DATEyearmon) %>%
mutate(ordering = min_rank(DIAG_RATE_65_PLUS) * 1.0) %>%
ungroup() %>%
left_join(Aug_order) %>%
mutate(NAME = fct_reorder(NAME, -Aug_order))
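The bar ordering is done by ggplot and is not affected by gganimate. reorder(NAME, DIAG_RATE_65_PLUS) sorts NAME by a single summary of DIAG_RATE_65_PLUS taken over all months (the mean by default, which gives the same order as the sum when every NAME has the same number of rows), not separately within each ACH_DATEyearmon. Below I'll show how the bars are ordered and then provide code for creating the animated plot with the desired sorting from low to high in each frame.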
To see how the bars are ordered, first let's create some fake data:
library(tidyverse)
library(gganimate)
theme_set(theme_classic())
# Fake data
dates = paste(rep(month.abb, each=10), 2017)
set.seed(2)
df = data.frame(NAME=c(replicate(12, sample(LETTERS[1:10]))),
ACH_DATEyearmon=factor(dates, levels=unique(dates)),
DIAG_RATE_65_PLUS=c(replicate(12, rnorm(10, 30, 5))))
Now let's make a single bar plot. The bars are the sum of DIAG_RATE_65_PLUS for each NAME. Note the order of the x-axis NAME values:
df %>%
ggplot(aes(reorder(NAME, DIAG_RATE_65_PLUS), DIAG_RATE_65_PLUS)) +
geom_bar(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22))
You can see below that the ordering is the same when we explicitly sum DIAG_RATE_65_PLUS by NAME and sort by the sum:
df %>% group_by(NAME) %>%
summarise(DIAG_RATE_65_PLUS = sum(DIAG_RATE_65_PLUS)) %>%
arrange(DIAG_RATE_65_PLUS)
NAME DIAG_RATE_65_PLUS
1 A 336.1271
2 H 345.2369
3 B 346.7151
4 I 350.1480
5 E 356.4333
6 C 367.4768
7 D 368.2225
8 F 368.3765
9 J 368.9655
10 G 387.1523
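The same point can be checked without plotting. This sketch rebuilds the fake data and compares the factor levels produced by reorder() (which uses the mean by default) with the names sorted by their summed values. The variable names by_reorder and by_sum are just for illustration.

```r
# Rebuild the fake data: 10 names, 12 months, one value per name per month
dates <- paste(rep(month.abb, each = 10), 2017)
set.seed(2)
df <- data.frame(
  NAME = c(replicate(12, sample(LETTERS[1:10]))),
  ACH_DATEyearmon = factor(dates, levels = unique(dates)),
  DIAG_RATE_65_PLUS = c(replicate(12, rnorm(10, 30, 5)))
)

# Level order that reorder() gives the x-axis (default FUN is mean)
by_reorder <- levels(reorder(df$NAME, df$DIAG_RATE_65_PLUS))

# Names sorted by the sum over all months
by_sum <- names(sort(tapply(df$DIAG_RATE_65_PLUS, df$NAME, sum)))

identical(by_reorder, by_sum)
```

The two orderings agree here because every NAME has exactly 12 rows, so sorting by the mean and sorting by the sum are equivalent.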
Now we want to create an animation that sorts NAME by DIAG_RATE_65_PLUS separately for each ACH_DATEyearmon. To do this, let's first generate a new column called order that sets the ordering we want:
df = df %>%
arrange(ACH_DATEyearmon, DIAG_RATE_65_PLUS) %>%
mutate(order = 1:n())
Now we create the animation. transition_states generates the frames for each ACH_DATEyearmon. view_follow(fixed_y=TRUE) shows x-values only for the current ACH_DATEyearmon and maintains the same y-axis range for all frames.
Note that we use order as the x variable, but then we run scale_x_continuous to change the x-labels to be the NAME values. I've included these labels in the plot so you can see that they change with each ACH_DATEyearmon, but you can of course remove them in your actual plot as you did in your example.
p = df %>%
ggplot(aes(order, DIAG_RATE_65_PLUS)) +
geom_bar(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22)) +
scale_x_continuous(breaks=df$order, labels=df$NAME) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 50) +
view_follow(fixed_y=TRUE) +
ease_aes('linear')
animate(p, nframes=60)
anim_save("test.gif")
If you turn off view_follow(), you can see what the "whole" plot looks like (and you can, of course, see the full, non-animated plot by stopping the code before the transition_states line).
p = df %>%
ggplot(aes(order, DIAG_RATE_65_PLUS)) +
geom_bar(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22)) +
scale_x_continuous(breaks=df$order, labels=df$NAME) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 50) +
#view_follow(fixed_y=TRUE) +
ease_aes('linear')
UPDATE: To answer your questions...
To order by a given month's values, turn NAME into a factor with the levels ordered by that month's values. To plot a rotated graph, instead of coord_flip, we'll use geom_barh (horizontal bar plot) from the ggstance package. Note that we have to switch the y's and x's in aes and view_follow() and that the order of the y-axis NAME values is now constant:
library(ggstance)
# Set NAME order based on August 2017 values
df = df %>%
arrange(DIAG_RATE_65_PLUS) %>%
mutate(NAME = factor(NAME, levels=unique(NAME[ACH_DATEyearmon=="Aug 2017"])))
p = df %>%
ggplot(aes(y=NAME, x=DIAG_RATE_65_PLUS)) +
geom_barh(stat = "identity", alpha = 0.66) +
labs(title='{closest_state}') +
theme(plot.title = element_text(hjust = 1, size = 22)) +
transition_states(ACH_DATEyearmon, transition_length = 1, state_length = 50) +
view_follow(fixed_x=TRUE) +
ease_aes('linear')
animate(p, nframes=60)
anim_save("test3.gif")
For smooth transitions, it seems like @JonSpring's answer handles that well.
