Combine Line and Headmap plots in R - r

on my research on how to plot multiple line charts I came across the following paper:
https://arxiv.org/pdf/1808.06019.pdf
It is showing a way on how huge amount of time series data is displayed by combining every line chart with a common headmap, the result looks like kind of equal to this representation:
I was looking for an R package (but could not find anything) or a nice implementation for ggplot to achieve the same result. So I am able to plot a lot of geom_lines and color them differently but I do not know how to actually apply the headmap to it.
Does anyone has a hint/idea for me?
Thanks!
Stephan

library(tidyverse)
datasets::ChickWeight # from Base R
ggplot(ChickWeight, aes(Time, weight, group = Chick)) + geom_line()
The wrangling here counts how many readings in each time / weight bucket, and normalizes to "share of most common reading" for each Time.
ChickWeight %>%
count(Time, weight = 10*floor(weight/10)) %>%
complete(Time, weight = 10*0:30, fill = list(n = 0)) %>%
group_by(Time) %>%
mutate(share = n / max(n)) %>% # weighted for num as % of max for that Time
ungroup() %>%
ggplot(aes(Time, weight, fill = share)) +
geom_tile(width = 2) +
scale_fill_viridis_c(direction = -1)
If your data has sparse time readings, it might be useful to interpolate your lines to get more resolution for binning:
ChickWeight %>%
group_by(Chick) %>%
arrange(Time) %>%
padr::pad_int("Time", step = 0.5) %>%
mutate(weight_approx = approx(Time, weight, Time)$y) %>%
ungroup() %>%
count(Time, weight_approx = 10*floor(weight_approx/10)) %>%
complete(Time, weight_approx = 10*0:60, fill = list(n = 0)) %>%
group_by(Time) %>%
mutate(share = n / sum(n)) %>% # Different weighting option
ungroup() %>%
ggplot(aes(Time, weight_approx, fill = share)) +
geom_tile() +
scale_fill_viridis_c(direction = -1)

Related

Wanting Top ten to plot in ggplot

I have 16000 ish missing persons data that I am trying to order by Count and then plot on a graph. this is the code i am using. I am wanting to plot only the top ten.
mp.city <-mp.All %>%
group_by(State, City, Sex) %>%
summarise(Count = n())
mp.city %>%
arrange(desc(Count)) %>%
slice(1:10) %>%
ggplot(aes(y = City)) +
geom_bar()
the code will run but the plot is garbage. Any help would be amazing thank you!
I think you can manage it con head():
url<-'https://raw.githubusercontent.com/kitapplegate/fall2020/master/mpAll.csv'
mp.All<-read.csv(url)
library(ggplot2)
library(dplyr)
mp.city <-mp.All %>%
group_by(State, City, Sex) %>%
summarise(Count = n())
mp.city %>%
# sort
arrange(desc(Count)) %>%
# top 10 overall
head(10) %>%
# plot ordered
ggplot(aes(x = reorder(City,Count), y = Count))+
geom_bar( stat = "identity") +
# flipped
coord_flip() +
# label for x axis (flipped)
xlab("City")
P.S.
Next time try to share your data with dput(head(yourdata)) and posting the result, it's way better.

Working with tidyverse, ggplot, and broom to add confidence interval to a proportion test (prop.test) in R

Let's say I'm working with proportions, I have two main variables (sex and pain_level). It's not difficult to plot them:
With tidyverse and broom (and thanks for this link here: Calling prop.test function in R with dplyr) I can compare if the proportions are statistically different.
Now comes the question!
I want to add to the plot, the error bar. I know it's not as difficult as I'm thinking, but I could not find a way to do it. I've tried to replicate this link here (http://www.andrew.cmu.edu/user/achoulde/94842/labs/lab07_solution.html) but I'm trying to stay at tidyverse environment.
The desired output should be something like that:
Please feel free to use the script/syntax below that simulate the original dataset.
library(tidyverse)
ds <- data.frame(sex = rep(c("M","F"), 18),
pain_level = c("High","Moderate","low"))
#plot
ds %>%
group_by(pain_level, sex) %>%
summarise(n=n()) %>%
mutate(prop = n/sum(n)*100) %>%
ggplot(., aes(x = sex, fill = pain_level, y = prop)) +
geom_bar(stat = "summary") +
facet_wrap( ~ pain_level) +
theme(legend.position = "none")
#p values of proportion test
ds %>%
rowwise %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>% #compute totals
distinct(., pain_level, .keep_all= TRUE) %>% #keep only one value of the row
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst)
I think the following might roughly resemble your desired output:
ds %>%
group_by(pain_level, sex) %>%
summarise(cases = n()) %>%
mutate(pop = sum(cases)) %>%
rowwise() %>%
mutate(tst = list(broom::tidy(prop.test(cases, pop, conf.level=0.95)))) %>%
tidyr::unnest(tst) %>%
ggplot(aes(sex, estimate, group = pain_level)) +
geom_col(aes(fill = pain_level)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
facet_wrap(~ pain_level)

make geom_bar show values in the ascending order

Although my query shows me values in descending order, ggplot then displays them alphabetically instead of ascending order.
Known solutions to this problem haven't seem to work. They suggest using Reorder or factor for values, which didn't work in this case
This is my code:
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y = movies_made, fill = studio, label = as.character(movies_made))) +
geom_bar(stat = 'identity') +
geom_label(label.size = 1, size = 5, color = "white") +
theme(legend.position = "none") +
ylab("Movies Made") +
xlab("Studio")
for those wanting a more complete example, here's where I got:
library(dplyr)
library(ggplot2)
# get some dummy data
boxoffice = boxoffice::boxoffice(dates=as.Date("2017-1-1"))
df <- (
boxoffice %>%
group_by(distributor) %>%
summarise(movies_made = n()) %>%
mutate(studio=reorder(distributor, -movies_made)) %>%
top_n(10))
ggplot(df, aes(x=distributor, y=movies_made)) + geom_col()
You'll need to convert boxoffice$studio to an ordered factor. ggplot will then respect the order of rows in the data set, rather than alphabetizing. Your dplyr chain will look like this:
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
ungroup() %>% # ungroup
mutate(studio = factor(studio, studio, ordered = T)) %>% # convert variable
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y... (rest of plotting code)

dplyr() and ggolot2()::geom_tile, filtering a group of summary statistics

I've got a data frame (df) with three categorical variables called site, purchase, and happycustomer.
I'd like to use gglot2's geom_tile function to create a heat-map of customer experience. I'd like site on the x-axis, purchase on the y-axis, and happycustomer as the fill. I'd like the heat map to feature the percentages for the happy customers grouped by site and purchase (ie the ones for which the value of happycustomer is y).
My problem's that at the moment the plot features both the happy and the unhappy customers.
Any help would be much appreciated.
Starting point (df):
df <- data.frame(site=c("GA","NY","BO","NY","BO","NY","BO","NY","BO","GA","NY","GA","NY","NY","NY"),purchase=c("a1","a2","a1","a1","a3","a1","a1","a3","a1","a2","a1","a2","a1","a2","a1"),happycustomer=c("n","y","n","y","y","y","n","y","n","y","y","y","n","y","n"))
Current code:
library(ggplot2)
library(dplyr)
df %>%
group_by(site, purchase,happycustomer) %>%
summarize(bin = sum(happycustomer==happycustomer)) %>%
group_by(site,happycustomer) %>%
mutate(bin_per = (bin/sum(bin)*100)) %>%
ggplot(aes(site,purchase)) + geom_tile(aes(fill = bin_per),colour = "white") + geom_text(aes(label = round(bin_per, 1))) +
scale_fill_gradient(low = "blue", high = "red")
Here is the solution with two data frames.
happyDF <- df %>%
filter(happycustomer == "y") %>%
group_by(site, purchase) %>%
summarise( n = n() )
totalDF <- df %>%
group_by(site, purchase) %>%
summarise( n = n() )
And the ggplot code:
merge(happyDF, totalDF, by=c("site", "purchase") ) %>%
mutate(prop = 100 * (n.x / n.y) ) %>%
ggplot(., aes(site, purchase)) +
geom_tile(aes(fill = prop),colour = "white") +
geom_text(aes(label = round(prop, 1))) +
scale_fill_gradient(low = "blue", high = "red")

ggplot2() bar chart and dplyr() grouped and overall data in R

I'd like to make a stacked proportional bar chart representing the prevalence of diabetes in a cohort of individuals residing in towns A, B, and C. I'd also like the plot to feature a bar representing the entire cohort.
I'm happy with the below plot, but I'd like to know if there is a way of incorporating the pre-processing step into the processing step, ie piping it with dplyr()?
Thanks!
Starting point (df):
dfa <- data.frame(town=c("A","A","A","B","B","C","C","C","C","C"),diabetes=c("y","y","n","n","y","n","y","n","n","y"),heartdisease=c("n","y","y","n","y","y","n","n","n","y"))
Pre-processing:
dfb <- rbind(dfa, transform(dfa, town = "ALL"))
Processing and plot:
library(dplyr)
library(ggplot)
dfc <- dfb %>%
group_by(town) %>%
count(diabetes) %>%
mutate(prop = n / sum(n))
ggplot(dfc, aes(x = town, y = prop, fill = diabetes)) +
geom_bar(stat = "identity") +
coord_flip()
Like this:
dfc <- dfa %>%
bind_rows(dfa %>%
mutate(town = "ALL")) %>%
group_by(town) %>%
count(diabetes) %>%
mutate(prop = n / sum(n)) %>%
ggplot(aes(x = town, y = prop, fill = diabetes)) +
geom_bar(stat = "identity") +
coord_flip()
EDIT: added pre-processing into pipeline using bind_rows and mutate instead of rbind and transform

Resources