How to use crosstalk with a bar + line plot in R?

I am new to crosstalk and am trying to make an R Markdown file more interactive by using it on a bar + line plot, but the line does not appear on the plot, and the plot also behaves strangely when I change the country.
library(tidyverse)
library(plotly)
library(crosstalk)
library(glue)
library(scales)
library(tidytext)
Load data:
file_url <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/test_crosswalk.csv"
test_df <- read.csv(url(file_url))
Country_selected = c("Brazil")
selected_case_type = c("Confirmed_daily")
trend_sd <- test_df %>%
  filter(Daily_Cases_type %in% selected_case_type
         # Country.Region %in% Country_selected,
  ) %>%
  select(Country.Region, date, Cases_count) %>%
  arrange(date) %>%
  group_by(Country.Region) %>%
  mutate(new_avg = cumsum(Cases_count) / seq_len(length(Cases_count))) %>%
  ungroup() %>%
  SharedData$new()
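The mutate() step above computes an expanding (running) mean of Cases_count within each country; because group_by() comes after arrange(date), each country's rows are still in date order when cumsum() runs. A minimal base-R sketch of the same calculation, using hypothetical toy data in place of test_df:

```r
# Hypothetical toy data standing in for test_df (two made-up countries)
toy <- data.frame(
  Country.Region = c("A", "A", "A", "B", "B", "B"),
  date = as.Date("2021-01-01") + c(0, 1, 2, 0, 1, 2),
  Cases_count = c(10, 20, 30, 100, 200, 300)
)

# Sort by date, as arrange(date) does in the pipeline above
toy <- toy[order(toy$date), ]

# Running mean within each country: cumsum(x) / seq_along(x) is the same as
# cumsum(Cases_count) / seq_len(length(Cases_count)) applied per group
toy$new_avg <- ave(toy$Cases_count, toy$Country.Region,
                   FUN = function(x) cumsum(x) / seq_along(x))

print(toy[order(toy$Country.Region, toy$date), ])
```

Each country's running mean starts over at its own first date, so country B's values are never averaged with country A's.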
bscols(widths = c(9, 3),
       list(
         filter_select(id = "country", label = "Country:",
                       sharedData = trend_sd, group = ~ Country.Region),
         ggplotly(
           ggplot(data = trend_sd) +
             geom_col(aes(x = date, y = Cases_count), fill = "turquoise", alpha = .3) +
             geom_point(aes(x = date, y = new_avg), col = "tomato") +
             geom_line(aes(x = date, y = new_avg), col = "tomato", size = .9, alpha = .3) +
             scale_y_continuous(labels = comma) +
             # expand_limits(y = 100000) +
             labs(title = glue("{Country_selected}'s {selected_case_type} Cases {date_from} onwards"),
                  caption = "Data source: covid19.analytics")
         )
       )
)
This doesn't draw the line correctly, and when I switch to another country the bars get distorted.
Below is the code without crosstalk and plotly, which gives the plot I expect:
Country_selected = c("India") # can be selective
selected_case_type = c("Confirmed_daily")
test_df %>%
  filter(Daily_Cases_type %in% selected_case_type,
         Country.Region %in% Country_selected) %>%
  select(Country.Region, date, Cases_count) %>%
  arrange(date) %>%
  group_by(Country.Region) %>%
  mutate(new_avg = cumsum(Cases_count) / seq_len(length(Cases_count))) %>%
  ungroup() %>%
  ggplot() +
  geom_col(aes(x = date, y = Cases_count), fill = "turquoise", alpha = .3) +
  geom_point(aes(x = date, y = new_avg), col = "tomato") +
  geom_line(aes(x = date, y = new_avg), col = "tomato", size = .9, alpha = .3) +
  scale_y_continuous(labels = comma) +
  labs(title = glue("{Country_selected}'s {selected_case_type} Cases {date_from} onwards"),
       subtitle = "With Average Daily Cases Trend line",
       caption = "Data source: covid19.analytics")
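A likely contributor to both symptoms (a hedged guess, not verified against the full dataset): because the country filter is commented out in the crosstalk version, trend_sd contains every country. In the static ggplot that ggplotly() converts, geom_col() stacks bars by default (position = "stack"), so each date's bar is the sum across countries, and geom_line() without a group aesthetic treats all rows as one series and joins them in date order, zigzagging between countries. The sketch below uses hypothetical toy data to reproduce the numbers ggplot2 would draw. Mapping group = Country.Region in geom_line(), and possibly using position = "identity" in geom_col(), are the usual ways to keep countries apart; whether that alone makes the crosstalk filter behave in ggplotly is an assumption.

```r
# Hypothetical toy data: two countries sharing the same dates
toy <- data.frame(
  Country.Region = rep(c("A", "B"), each = 3),
  date = rep(as.Date("2021-01-01") + 0:2, times = 2),
  Cases_count = c(10, 20, 30, 100, 200, 300)
)

# Height of each stacked bar when all countries are kept:
# geom_col() stacks rows that share an x value, so the top is their sum
stacked_height <- aggregate(Cases_count ~ date, data = toy, FUN = sum)
print(stacked_height)

# Order in which an ungrouped line visits the rows: sorted by x only,
# so it alternates between countries on every date
ungrouped_path <- toy[order(toy$date), c("date", "Country.Region")]
print(ungrouped_path)
```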

Related

selectize widget of ggplotly highlight not always visible (depends on order of geoms?)

I want to make an interactive scatterplot where I can:
- highlight individual points,
- see a point's id in a tooltip, and
- search for a specific id with a selectize widget.
After trying for some time with plotly, I ended up with this code:
library(tidyverse)
library(plotly)
set.seed(1)
dat <- tibble(id = LETTERS[1:10],
              trt = factor(rep(0:1, 5)),
              x = rnorm(10),
              y = x + rnorm(10, sd = 0.2)) %>%
  highlight_key(~id)
dat %>%
  {ggplot(., aes(x = x, y = y, group = id, color = trt)) +
      geom_point() +
      geom_hline(yintercept = 0, linetype = "dashed")} %>%
  ggplotly(tooltip = c("id")) %>%
  highlight(on = "plotly_hover", selectize = TRUE)
It took me a very long time to realize that the order of the geoms seems to matter:
## no color, geom order reversed
## selectize.js widget is completely missing
dat %>%
  {ggplot(., aes(x = x, y = y, group = id)) +
      geom_hline(yintercept = 0, linetype = "dashed") +
      geom_point()} %>%
  ggplotly(tooltip = c("id")) %>%
  highlight(on = "plotly_hover", selectize = TRUE)
## color by trt, geom order reversed
## selectize.js widget only works for data where trt = 0
dat %>%
  {ggplot(., aes(x = x, y = y, group = id, color = trt)) +
      geom_hline(yintercept = 0, linetype = "dashed") +
      geom_point()} %>%
  ggplotly(tooltip = c("id")) %>%
  highlight(on = "plotly_hover", selectize = TRUE)
Can somebody explain this strange behavior? And what can I do if I want to reverse the order of the geoms, i.e., have the hline plotted behind the points?

How to plot average temperature and variation?

I have hourly temperature data from 1970 to 2021. I would like to summarize the dataset in a ggplot graph showing the mean temperature of each month and year. I also want to show that both the average temperature and the temperature volatility have increased, and that the average temperature in the 2010s is higher than in the 1990s.
Here is my code for a temperature density plot comparing the 2010s and the 1990s, but I don't know how to set the x axis:
data_re <- data
data_re$Year910 <- ifelse(data$Year %in% c(1990:1999), "1990s",
                          ifelse(data$Year %in% c(2010:2020), "2010s", NA))
data_re1 <- subset(data_re, Year910 %in% c("1990s", "2010s"))
data_re1$Year910 <- factor(data_re1$Year910)
ggplot(data_re1, aes(x=, color=Year910)) +
  xlab('Temperature') +
  geom_density()
Here is the data file.
https://docs.google.com/spreadsheets/d/1HwPFJ1wKMMr0845Et60tV36WIWXM66Ig/edit?usp=sharing&ouid=111186166036061320361&rtpof=true&sd=true
Any advice on how to best go about this would be greatly appreciated.
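The nested ifelse() in the code above only labels two decades. A more general approach derives the decade arithmetically and then summarizes both the mean and the standard deviation per decade, the latter being one simple measure of the volatility the question asks about. This is a sketch that assumes a long-format data frame with one Temp value per row and a numeric Year column (as produced by the pivot_longer() step in the first answer); the values below are hypothetical toy numbers, not the linked spreadsheet. Note that the arithmetic puts 2020 into a "2020s" group, unlike the question's c(2010:2020).

```r
# Hypothetical long-format data: one temperature reading per row
temps <- data.frame(
  Year = c(1991, 1995, 1999, 2011, 2015, 2019),
  Temp = c(10, 12, 14, 11, 14, 17)
)

# Decade label from arithmetic instead of nested ifelse(): 1995 -> "1990s"
temps$Decade <- paste0(floor(temps$Year / 10) * 10, "s")

# Mean (level) and standard deviation (spread) per decade
decade_mean <- tapply(temps$Temp, temps$Decade, mean)
decade_sd   <- tapply(temps$Temp, temps$Decade, sd)
print(rbind(mean = decade_mean, sd = decade_sd))
```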
I reshaped the data to long format, converted the character dates with as.Date(), and used lubridate's month() and year() to build a month-year label for grouping:
library(tidyverse)
library(lubridate)
df = df %>%
  pivot_longer(!c(Date, Month, Year), names_to = "HourNo", values_to = "Temp")
df$Date = as.Date(df$Date, format = "%Y-%m-%d")
df = na.omit(df)
df$Abbr = paste(month.abb[month(df$Date)], "-", year(df$Date), sep = "")
df_Aggr = df %>%
  group_by(Abbr) %>%
  summarise(Avg = mean(Temp),
            Begin = min(Date))
ggplot(df_Aggr, aes(x = Begin, y = Avg)) +
  geom_line(color = "blue", size = 1) +
  scale_x_date(date_breaks = "24 month", date_labels = "%b-%Y") +
  theme(axis.text.x = element_text(vjust = 0.8, angle = 35, hjust = 0.9)) +
  ggtitle("Average Temperature")
I would first of all ensure your data is in the correct format after importing it. I am assuming you have already loaded your data frame from Excel and named it df:
df <- df[1:624,]
df[2:28] <- lapply(df[2:28], as.numeric)
df$Date <- as.POSIXct(df$Date)
Now we can load a couple of useful packages:
library(tidyverse)
library(geomtextpath)
Then we could summarize and plot with something like the following:
df %>%
  rowwise() %>%
  mutate(Temp = sum(across(starts_with('Hour')), na.rm = TRUE) / 24) %>%
  select(-starts_with('Hour')) %>%
  filter(Date > as.POSIXct('1989-12-31')) %>%
  group_by(Year) %>%
  summarize(Temp = mean(Temp), Date = median(Date)) %>% {
    ggplot(., aes(Date, Temp)) +
      geom_line(color = 'gray') +
      geom_point(color = 'gray75') +
      geom_textsegment(aes(x = as.POSIXct('1990-01-01'),
                           xend = as.POSIXct('1999-12-31'),
                           y = mean(Temp), yend = mean(Temp), color = '1990s',
                           label = '1990s'), vjust = -0.2, size = 6,
                       data = .[.$Date < as.POSIXct('2000-01-01'),], linetype = 2) +
      geom_textsegment(aes(x = as.POSIXct('2000-01-01'),
                           xend = as.POSIXct('2009-12-31'),
                           y = mean(Temp), yend = mean(Temp), color = '2000s',
                           label = '2000s'), vjust = -0.2, size = 6,
                       data = .[.$Date < as.POSIXct('2010-01-01') &
                                  .$Date > as.POSIXct('1999-12-31'),], linetype = 2) +
      geom_textsegment(aes(x = as.POSIXct('2010-01-01'),
                           xend = as.POSIXct('2019-12-31'),
                           y = mean(Temp), yend = mean(Temp), color = '2010s',
                           label = '2010s'), vjust = -0.2, size = 6,
                       data = .[.$Date < as.POSIXct('2020-01-01') &
                                  .$Date > as.POSIXct('2009-12-31'),], linetype = 2) +
      theme_light(base_size = 16) +
      scale_color_brewer(palette = 'Set1') +
      theme(legend.position = 'none') +
      labs(title = 'Annual mean temperature')
  }
Note that I have used an annual summary of the temperature here. If you use monthly temperatures, the range of temperatures becomes much larger and the message of the plot becomes much weaker:
df %>%
  rowwise() %>%
  mutate(Temp = sum(across(starts_with('Hour')), na.rm = TRUE) / 24) %>%
  select(-starts_with('Hour')) %>%
  filter(Date > as.POSIXct('1989-12-31')) %>% {
    ggplot(., aes(Date, Temp)) +
      geom_line(color = 'gray') +
      geom_point(color = 'gray75') +
      geom_textsegment(aes(x = as.POSIXct('1990-01-01'),
                           xend = as.POSIXct('1999-12-31'),
                           y = mean(Temp), yend = mean(Temp), color = '1990s',
                           label = '1990s'), vjust = -0.2, size = 6,
                       data = .[.$Date < as.POSIXct('2000-01-01'),], linetype = 2) +
      geom_textsegment(aes(x = as.POSIXct('2000-01-01'),
                           xend = as.POSIXct('2009-12-31'),
                           y = mean(Temp), yend = mean(Temp), color = '2000s',
                           label = '2000s'), vjust = -0.2, size = 6,
                       data = .[.$Date < as.POSIXct('2010-01-01') &
                                  .$Date > as.POSIXct('1999-12-31'),], linetype = 2) +
      geom_textsegment(aes(x = as.POSIXct('2010-01-01'),
                           xend = as.POSIXct('2019-12-31'),
                           y = mean(Temp), yend = mean(Temp), color = '2010s',
                           label = '2010s'), vjust = -0.2, size = 6,
                       data = .[.$Date < as.POSIXct('2020-01-01') &
                                  .$Date > as.POSIXct('2009-12-31'),], linetype = 2) +
      theme_light(base_size = 16) +
      scale_color_brewer(palette = 'Set1') +
      theme(legend.position = 'none') +
      labs(title = 'Monthly mean temperature')
  }
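One detail worth checking in both summaries: sum(..., na.rm = TRUE) / 24 still divides by 24 when some hourly values are missing, which pulls that row's average toward zero. If missing hours are possible, rowMeans(..., na.rm = TRUE) averages only the values that are present. A base-R sketch with hypothetical columns Hour1 to Hour3 (three hours instead of 24, to keep it short):

```r
# Hypothetical wide data: one row per date, one column per hour
wide <- data.frame(
  Date  = as.Date(c("1990-01-01", "1990-02-01")),
  Hour1 = c(10, 12),
  Hour2 = c(20, NA),   # one missing reading in the second row
  Hour3 = c(30, 18)
)
hour_cols <- grep("^Hour", names(wide))

# Dividing a sum by a fixed count effectively treats NA as 0
wide$Temp_fixed_n <- rowSums(wide[hour_cols], na.rm = TRUE) / length(hour_cols)

# rowMeans() divides by the number of non-missing values instead
wide$Temp_rowmeans <- rowMeans(wide[hour_cols], na.rm = TRUE)

print(wide[, c("Date", "Temp_fixed_n", "Temp_rowmeans")])
```

With no missing hours the two approaches agree; they differ only in rows that contain NA.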

ggplot2 geom_segment by group

I am trying to draw separate line segments for each of the countries (A, B, C) in the plot.
I used the variable country for the group argument (as the docs suggest), but that does not work. The line is still a continuous line connecting all the text labels, but I need 3 separate lines to be drawn, one for each country, connecting the 3 text labels across the years.
library(dplyr)
library(ggplot2)
df_p <- data.frame(
  year = rep(2019:2021, each = 3),
  country = rep(LETTERS[1:3], 3),
  var_a = c(1, 6, 10, 2, 5, 7, 3, 7, 9),
  var_b = c(2, 8, 14, 4, 9, 15, 2, 9, 19)
)
df_p %>% arrange(country, year) %>%
  ggplot(aes(x = var_a, y = var_b, color = country)) +
  geom_text(aes(label = year)) +
  geom_segment(
    aes(
      xend = c(tail(var_a, n = -1), NA),
      yend = c(tail(var_b, n = -1), NA),
      group = country
    ),
    arrow = arrow(type = "open", length = unit(0.15, "inches"))
  )
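I think you just need geom_path instead of geom_segment. geom_segment() draws each segment independently from its own x, y, xend and yend, and because xend and yend are computed with tail() over the whole data frame, the last year of one country is joined to the first year of the next; the group aesthetic has no effect on that. geom_path() instead connects the observations in data order within each group.
Try this: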
df_p %>% arrange(country, year) %>%
  ggplot(aes(x = var_a, y = var_b, color = country)) +
  geom_text(aes(label = year)) +
  geom_path(
    aes(group = country),
    arrow = arrow(type = "open", length = unit(0.15, "inches"))
  )
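If you would rather keep geom_segment(), the endpoints have to be computed within each country, so that each country's last year gets NA instead of the next country's first point. The sketch below does this with base R's ave(); in dplyr, group_by(country) %>% mutate(xend = lead(var_a), yend = lead(var_b)) would be the equivalent. The helper next_in_group() is a hypothetical name introduced here. ggplot2 drops the rows with NA endpoints (with a warning when the plot is drawn).

```r
df_p <- data.frame(
  year = rep(2019:2021, each = 3),
  country = rep(LETTERS[1:3], 3),
  var_a = c(1, 6, 10, 2, 5, 7, 3, 7, 9),
  var_b = c(2, 8, 14, 4, 9, 15, 2, 9, 19)
)

# Sort so that each country's years are consecutive
df_seg <- df_p[order(df_p$country, df_p$year), ]

# Hypothetical helper: next value within the same group, NA for the last one
next_in_group <- function(v) c(v[-1], NA)
df_seg$xend <- ave(df_seg$var_a, df_seg$country, FUN = next_in_group)
df_seg$yend <- ave(df_seg$var_b, df_seg$country, FUN = next_in_group)

# Build (but do not print) the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_seg, aes(x = var_a, y = var_b, color = country)) +
    geom_text(aes(label = year)) +
    geom_segment(aes(xend = xend, yend = yend),
                 arrow = arrow(type = "open", length = unit(0.15, "inches")))
}
```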
Another possible solution uses geom_polygon(), although it does not show the direction of the connections:
Sample data:
df_p <- data.frame(
  year = rep(2019:2021, each = 3),
  country = rep(LETTERS[1:3], 3),
  var_a = c(1, 6, 10, 2, 5, 7, 3, 7, 9),
  var_b = c(2, 8, 14, 4, 9, 15, 2, 9, 19)
)
Sample code:
library(dplyr)
library(ggplot2)
df_p %>%
  arrange(country, year) %>%
  ggplot(aes(x = var_a, y = var_b, group = country)) +
  geom_point(aes(colour = country, shape = country), size = 4) +
  geom_line(aes(colour = country), size = 1) +
  geom_text(aes(label = year)) +
  geom_polygon(aes(fill = country), alpha = .4) +
  labs(x = "Variable A", y = "Variable B") +
  theme_bw()

How to adjust ggrepel label on pie chart?

I am trying to create a pie chart to visualize percent abundance of 9 genera. However, the labels are all clumping together. How do I remedy this? Code included below:
library(tidyverse)
library(ggrepel)
generaabundance2020 <- c(883, 464, 1948, 1177, 2607, 962, 2073, 620, 2670)
genera2020 <- c("Andrena", "Ceratina", "Halictus",
                "Hesperapis", "Lasioglossum", "Melissodes",
                "Osmia", "Panurginus", "Other")
generabreakdown2020 <- data.frame(group = genera2020, value = generaabundance2020)
gb2020label <- generabreakdown2020 %>%
  group_by(value) %>% # Variable to be transformed
  count() %>%
  ungroup() %>%
  mutate(perc = `value` / sum(`value`)) %>%
  arrange(perc) %>%
  mutate(labels = scales::percent(perc))
generabreakdown2020 %>%
  ggplot(aes(x = "", y = value, fill = group)) +
  geom_col() +
  coord_polar("y", start = 0) +
  theme_void() +
  geom_label_repel(aes(label = gb2020label$labels), position = position_fill(vjust = 0.5),
                   size = 5, show.legend = F, max.overlaps = 50) +
  guides(fill = guide_legend(title = "Genera")) +
  scale_fill_manual(values = c("brown1", "chocolate1",
                               "darkgoldenrod1", "darkgreen",
                               "deepskyblue", "darkslateblue",
                               "darkorchid4", "hotpink1",
                               "lightpink"))
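Thanks for adding your data.
There are a few errors in your code. The main one is that you didn't precalculate where to place the labels (done here in the text_y variable). That variable needs to be passed as the y aesthetic for geom_label_repel. With position_fill(), the labels are stacked on a 0 to 1 scale while the slices use raw counts totalling about 13,400, which is why they all end up clumped in one thin sliver of the pie.
The second is that you no longer need group_by(value) %>% count() %>% ungroup(), because the data you provided is already aggregated. Relatedly, label = gb2020label$labels takes its labels from a different data frame that is sorted by perc, so they do not line up with the slices; computing the labels in the data frame you actually plot avoids that.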
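To see what text_y holds: once the rows are in the order in which ggplot2 stacks the slices from the bottom up (descending alphabetical here, matching arrange(desc(group)) in the code below), cumsum(value) is the top edge of each slice, and subtracting value/2 gives the slice's midpoint, which is where its label belongs. The same arithmetic in base R with the question's numbers; sprintf() stands in for scales::percent() so the sketch has no dependencies, and its rounding differs:

```r
value <- c(883, 464, 1948, 1177, 2607, 962, 2073, 620, 2670)
group <- c("Andrena", "Ceratina", "Halictus", "Hesperapis", "Lasioglossum",
           "Melissodes", "Osmia", "Panurginus", "Other")
pie_df <- data.frame(group = group, value = value)

# Same row order as arrange(desc(group))
pie_df <- pie_df[order(pie_df$group, decreasing = TRUE), ]

# Top edge of each stacked slice minus half its height = slice midpoint
pie_df$text_y <- cumsum(pie_df$value) - pie_df$value / 2

# Percent labels computed in the same data frame, so they stay aligned
pie_df$labels <- sprintf("%.1f%%", 100 * pie_df$value / sum(pie_df$value))
print(pie_df)
```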
library(tidyverse)
library(ggrepel)
generaabundance2020 <- c(883, 464, 1948, 1177, 2607, 962, 2073, 620, 2670)
genera2020 <- c("Andrena", "Ceratina", "Halictus", "Hesperapis", "Lasioglossum", "Melissodes", "Osmia", "Panurginus", "Other")
generabreakdown2020 <- data.frame(group = genera2020, value = generaabundance2020)
gb2020label <-
  generabreakdown2020 %>%
  mutate(perc = value / sum(value)) %>%
  mutate(labels = scales::percent(perc)) %>%
  arrange(desc(group)) %>% ## arrange in the order of the legend
  mutate(text_y = cumsum(value) - value/2) ### calculate where to place the text labels
gb2020label %>%
  ggplot(aes(x = "", y = value, fill = group)) +
  geom_col() +
  coord_polar(theta = "y") +
  geom_label_repel(aes(label = labels, y = text_y),
                   nudge_x = 0.6, nudge_y = 0.6,
                   size = 5, show.legend = F) +
  guides(fill = guide_legend(title = "Genera")) +
  scale_fill_manual(values = c("brown1", "chocolate1",
                               "darkgoldenrod1", "darkgreen",
                               "deepskyblue", "darkslateblue",
                               "darkorchid4", "hotpink1",
                               "lightpink"))
If you want to arrange in descending order of frequency, you should remember to also set the factor levels of the group variable to the same order.
gb2020label <-
  generabreakdown2020 %>%
  mutate(perc = value / sum(value)) %>%
  mutate(labels = scales::percent(perc)) %>%
  arrange(desc(perc)) %>% ## arrange in descending order of frequency
  mutate(group = fct_rev(fct_inorder(group))) %>% ## also arrange the groups in descending order of freq
  mutate(text_y = cumsum(value) - value/2) ### calculate where to place the text labels
gb2020label %>%
  ggplot(aes(x = "", y = value, fill = group)) +
  geom_col() +
  coord_polar(theta = "y") +
  geom_label_repel(aes(label = labels, y = text_y),
                   nudge_x = 0.6, nudge_y = 0.6,
                   size = 5, show.legend = F) +
  guides(fill = guide_legend(title = "Genera")) +
  scale_fill_manual(values = c("brown1", "chocolate1",
                               "darkgoldenrod1", "darkgreen",
                               "deepskyblue", "darkslateblue",
                               "darkorchid4", "hotpink1",
                               "lightpink"))
Created on 2021-10-27 by the reprex package (v2.0.1)
You didn't provide us with your data to work with, so I'm using ggplot2::mpg here.
library(tidyverse)
library(ggrepel)
mpg_2 <-
  mpg %>%
  slice_sample(n = 20) %>%
  count(manufacturer) %>%
  mutate(perc = n / sum(n)) %>%
  mutate(labels = scales::percent(perc)) %>%
  arrange(desc(manufacturer)) %>%
  mutate(text_y = cumsum(n) - n/2)
Chart without polar coordinates
mpg_2 %>%
  ggplot(aes(x = "", y = n, fill = manufacturer)) +
  geom_col() +
  geom_label(aes(label = labels, y = text_y))
Chart with polar coordinates and geom_label_repel
mpg_2 %>%
  ggplot(aes(x = "", y = n, fill = manufacturer)) +
  geom_col() +
  geom_label_repel(aes(label = labels, y = text_y),
                   force = 0.5, nudge_x = 0.6, nudge_y = 0.6) +
  coord_polar(theta = "y")
But maybe your data isn’t dense enough to need repelling?
mpg_2 %>%
  ggplot(aes(x = "", y = n, fill = manufacturer)) +
  geom_col() +
  geom_label(aes(label = labels, y = text_y), nudge_x = 0.6) +
  coord_polar(theta = "y")
Created on 2021-10-26 by the reprex package (v2.0.1)

How to use a condition in geom_text / nudge_y

I want the text labels to appear above or below the top of each bar, depending on where there is more space for them. Right now they are always below, which doesn't always look good.
Here is my code:
library(tidyr)
library(ggplot2)
library(dplyr)
library(stringr)
library(purrr)
numa.nodes <- tibble(
  numa_name = c("numa_01","numa_01","numa_01","numa_01","numa_01","numa_01","numa_02","numa_02","numa_02","numa_02"),
  counter_name = c("cpu01","cpu02","cpu03","cpu04","memory_used","memory_total","cpu01","cpu02","memory_used","memory_total"),
  value = c(sample(0:100,4), sample(0:32,1), 32, sample(0:100,1), sample(0:100,1), sample(0:128,1), 128)
)
numa.nodes <- numa.nodes %>% add_row(
  numa_name = c("numa_03","numa_03","numa_03","numa_03","numa_03","numa_03","numa_04","numa_04","numa_04","numa_04"),
  counter_name = c("cpu01","cpu02","cpu03","cpu04","memory_used","memory_total","cpu01","cpu02","memory_used","memory_total"),
  value = c(sample(0:100,4), sample(0:32,1), 32, sample(0:100,1), sample(0:100,1), sample(0:128,1), 128)
)
numa.nodes <- numa.nodes %>% add_row(
  numa_name = c("numa_05","numa_05","numa_05","numa_05","numa_05","numa_05","numa_05"),
  counter_name = c("cpu01","cpu02","cpu03","cpu04","cpu05","memory_used","memory_total"),
  value = c(sample(1:100,5), sample(1:64,1), 64)
)
numa.nodes <- numa.nodes %>%
  mutate(counter_name = factor(counter_name, levels = unique(counter_name), ordered = T))
memory_columns <- numa.nodes %>% filter(counter_name == 'memory_total')
memory_y_scale <- max(memory_columns$value, na.rm = TRUE) + 6
plot_numa = function(num){
  df = numa.nodes %>% filter(str_detect(numa_name, num))
  cpu_plot = df %>%
    filter(str_detect(counter_name, "cpu")) %>%
    ggplot(aes(x = counter_name)) +
    geom_col(aes(y = 100), fill = "white", color = "black") +
    geom_col(aes(y = value), fill = "#00AFBB", color = "black") +
    geom_text(aes(y = value, label = paste0(value, "%")), nudge_y = 5, color = "black") +
    theme_bw() +
    labs(x = "CPU", y = "")
  memory_plot = df %>%
    filter(str_detect(counter_name, "memory")) %>%
    pivot_wider(names_from = counter_name, values_from = value) %>%
    ggplot(aes(x = "")) +
    geom_col(aes(y = memory_total), fill = "white", color = "black") +
    geom_col(aes(y = memory_used), fill = "#FC4E07", color = "black") +
    geom_text(aes(label = paste(memory_total, "GB"), y = memory_total), nudge_y = 5, color = "black") +
    geom_text(aes(label = paste(memory_used, "GB"), y = memory_used), nudge_y = -3, color = "black") +
    theme_bw() +
    ylim(0, memory_y_scale) +
    labs(x = "Memory", y = "")
  ggpubr::ggarrange(cpu_plot, memory_plot, ncol = 2) %>%
    ggpubr::annotate_figure(top = paste("NUMA", num))
}
numa_numbers <- unique(numa.nodes$numa_name) %>% str_remove("numa_")
ggpubr::ggarrange(plotlist = map(.x = numa_numbers, .f = ~plot_numa(num = .x)))
I tried to change this line:
geom_text(aes(label = paste(memory_used, "GB"), y = memory_used), nudge_y = -3, color = "black")
to something like this:
geom_text(aes(label = paste(memory_used, "GB"), y = memory_used),
          nudge_y = ifelse((memory_total - memory_used) > 10, 5, -3),
          color = "black")
But I got an error:
Error in ifelse((memory_total - memory_used) > 10, 5, -3) :
object 'memory_total' not found
Is there a better way to position the labels optimally? What am I doing wrong?
And how can I change the label color for more contrast, i.e., black on white and white on red?
Think of it this way: the nudge value will (potentially) be different for every observation in your data frame. That means it should be handled within aes(), where things are designed to change with your data, rather than with nudge_y, which is designed to be a constant. Arguments outside aes() are evaluated in your ordinary R environment rather than inside the data frame, which is why R reports that 'memory_total' is not found.
So, the solution is to do away entirely with nudge_y and build your ifelse() statement directly into aes(y=...).
In this case, here's the replacement for that particular geom_text() line:
# to see the same plot posted here, put this at the top of your code
set.seed(7331)
...
# plot code...
... +
geom_text(aes(
label = paste(memory_used, "GB"),
y = ifelse((memory_total-memory_used > 10), memory_used + 5, memory_used - 3)),
color = "black") +
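The same idea can also cover the contrast question: decide the label position per row, then derive the color from the same condition. The sketch below assumes the wide memory data produced by pivot_wider() inside plot_numa() (columns memory_used and memory_total); the threshold of 10 and the +5/-3 offsets come from the question, while label_y and label_col are hypothetical column names. Labels placed above the red bar sit on the white background and are drawn black; labels placed inside the red bar are drawn white. The constant color = "black" has to be removed from geom_text(), otherwise it overrides the mapped color, and scale_color_identity() tells ggplot2 to use the color strings as they are.

```r
# Hypothetical wide memory rows, shaped like the pivot_wider() output
mem <- data.frame(memory_used = c(20, 30), memory_total = c(32, 32))

# Place the label above the red bar when there is room, inside it otherwise
above <- (mem$memory_total - mem$memory_used) > 10
mem$label_y   <- ifelse(above, mem$memory_used + 5, mem$memory_used - 3)
# Black text on the white background, white text on the red bar
mem$label_col <- ifelse(above, "black", "white")

# Build (but do not print) the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mem, aes(x = factor(seq_len(nrow(mem))))) +
    geom_col(aes(y = memory_total), fill = "white", color = "black") +
    geom_col(aes(y = memory_used), fill = "#FC4E07", color = "black") +
    geom_text(aes(y = label_y, label = paste(memory_used, "GB"),
                  color = label_col)) +
    scale_color_identity()   # use the color strings as-is, no legend
}
```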
