A heat map animation of daily deaths - r

I have a data frame that has columns Region, Date, and Deaths, and I've imported the package "maps" and its 50 state map.
All of the examples I've seen ask me to merge() the data with the map. However, when I do this merging, I manage to end up with an object of over 4 million rows.
The daily data is in melted8 and then melted9.
Because of the huge size of the merge(), the animate() step takes a long time to run... in fact I cut it short after 10 minutes. I do not know if my ggplot() is correctly made, but it is also huge (240 mb).
Is there a more reasonably-sized object I could give to ggplot(), and am I giving ggplot() the right instructions?
# a sample
melted8[sample(nrow(melted8), 5), ]
region date deaths
<chr> <int> <dbl>
arizona 214 7.2815030
missouri 287 0.0000000
arkansas 160 0.3313668
mississippi 53 0.0000000
new jersey 300 0.7880939
library(ggplot2)
library(gganimate)
library(maps)
us.map <- map_data("state") #50 state map from library(maps)
melted9 <- merge(us.map, melted8, by="region", all.x=T)
d <- ggplot(melted9) +
geom_polygon(aes(long,lat, group = group), color='white', fill=NA, data=us.map) +
geom_polygon(aes(long,lat, group = group, fill = deaths), color = "white") +
scale_fill_gradient(low = "gray65", high = "red") +
labs(title = "Deaths per Day") +
ease_aes("linear")
a <- animate(d, duration = 30, nframes = nrow(melted9)/50, end_pause = 5)
a

You don't have to merge the dataset with the map file, if you use geom_map instead of geom_polygon.
See if this is faster for you:
layer_type.GeomMap <- function(x) 'point' # must run this line first
melted8 %>%
ggplot(aes(fill = deaths, map_id = region)) +
geom_map(map = us.map) +
expand_limits(x = us.map$long, y = us.map$lat) +
coord_fixed() +
scale_fill_gradient(low = "gray65", high = "red") +
theme(legend.position = "bottom") +
labs(title = "Deaths per Day: {closest_state}",
x = "lon", y = "lat") +
transition_states(date)
Dataset used (simulating 7 days of records for each state):
library(dplyr)
set.seed(123)
melted8 <- data.frame(region = unique(us.map$region)) %>%
mutate(date = list(seq(1, 7))) %>%
tidyr::unnest(cols = c(date)) %>%
group_by(region) %>%
mutate(deaths = abs(rnorm(n()))) %>%
ungroup()

Related

ggplot and lapply /mapply for nested list and data frames

Edit:
I did find a way to do what I need, but now I'm having trouble getting a title to appear for each of the plots that are created so I know which site I am looking at:
lapply(seq(gl), function(i){
lapply(seq(gl[[i]]), function(j){
ggplot() +
geom_point(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `actSWE_mm`, color = `swe_Res_mm`))+
geom_segment(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `actSWE_mm`, xend = `UTC_date.1`, yend = `swe_mm`), alpha=.2)+
scale_color_steps2(low = "blue", mid = "white", high = "red") +
guides(color = FALSE) + geom_point(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `swe_mm`), shape = 1) +
facet_wrap(vars(year), scales="free_x") + theme_bw()
})})
I tried adding:
theme(plot.title = paste(names(gl)[i], names(gl[[i]])[j], sep = "_"))
but that does not seem to work.
Original:
I have a list of 12 dataframes representing each month. Within each data frame are timeseries measurements of several different sites. Below is a table example (not actual data) of the data for January (monthSplit is the list - monthSplit$January):
site_id UTC_date.1 swe_mm actSWE_mm swe_Res_mm Month Year
<int> <date> <dbl> <dbl> <dbl> <chr> <num>
1003 2005-01-01 2 54.2 0.241 53.059 "January" 2005
1003 2005-01-02 2 54.2 0.241 53.059 "January" 2005
958 2005-01-01 2 154.2 0.241 153.059 "January" 2005
946 2005-01-01 2 154.2 152.25 1.95 "January" 2005
946 2005-01-02 2 500.2 550.241 50.059 "January" 2005
I'm having two problems when trying to perform ggplot over a list of dataframes that need to be further subset by the unique sites.
I tried to create a ggplot function and use mapply:
plot_fun = function(d) {
ggplot(d, aes(x = `UTC_date.1`, y = `actSWE_mm`)) +
geom_segment(aes(xend = `UTC_date.1`, yend = `swe_mm`), alpha=.2) + geom_point(aes(color = `swe_Res_mm`)) +
scale_color_steps2(low = "blue", mid = "white", high = "red") +
guides(color = FALSE) + geom_point(aes(y = `swe_mm`), shape = 1) +
facet_wrap(vars(year), scales="free_x") + theme_bw()
}
pltlist = mapply(plot_fun, d = monthSplit, SIMPLIFY=FALSE)
This yielded plot in the right format and everything, however it was not split by site_id. So it created a plot that contained several plots with the month's plot values each year. EG: September plot yielded 13 plots in one window representing each year from 2003-2015 for the month of September. The problem is, all the sites were lumped in there.
When looking at the actual data (as is the case with the above plot function), nothing meaningful is gained from the plots because the range of data varies so broadly in the y-axis.
I was wondering how I would go about splitting the list of plots further by site_id so that only one site appears in each plot for comparison.
Add group = site_id if you want to have one color point and line per site_id, e.g.
plot_fun = function(d) { ggplot(d, aes(x = UTC_date.1, y = actSWE_mm, group = site_id)) + geom_segment(aes(xend = UTC_date.1, yend = swe_mm), alpha=.2) + geom_point(aes(color = swe_Res_mm)) + scale_color_steps2(low = "blue", mid = "white", high = "red") + guides(color = FALSE) + geom_point(aes(y = swe_mm), shape = 1) + facet_wrap(vars(year), scales="free_x") + theme_bw() }
(Note I had to delete all your '`' characters as that is the code character).
Not this proposal gives not more plots, but more lines per plot.
If you want to have one plots per site_id, you might split your datasets by that variable, or include it in the facet_wrap:
facet_wrap(facets = ~ year + site_id, scales="free_x")
And if the scales are very different per site, I use log scales. However, zeros and negative values cannot be graphed then, that is a drawback.

Plotting in r by date range

I have a dataset with 4000 categoric variables which are city names arranged by date. I can do a plot of the entire dataset with an overall count.
What I need to do is be able to plot aggregates of specific cities by specific date ranges. I cannot use by quarter or anything like that because the required date ranges every year are different. I need to be able to, say, select 2016/4/1 to 2016/6/23 to get a count of how many are Denver.
How can I do this?
library(ggplot2)
library(ggpubr)
theme_set(theme_classic())
df <- log %>%
group_by(Location) %>%
summarise(counts = n())
df
ggplot(df, aes(x = Location, y = counts)) +
geom_bar(fill = "#0073C2FF", stat = "identity",width = .65) +
geom_text(aes(label = counts), vjust = -0.3) +
theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
labs(title="Locations of Library Instruction",
subtitle="2016-2020")

How to avoid overlap between segments while plotting a Gantt chart with ggplot in R

Dears,
I have a question regarding plotting a Gantt chart with ggplot in R. Whenever I plot my data there is an overlap between segments. For example (as per the attached plot), you will see that product A1 should have four segments (as per the data attached) but due to the overlap you can see that there are only two segments (as per the attached image link). I would like to plot every segment for the same product in a separate row to avoid such overlapping. Once I have each segment on a row, I would like to have the product name for the group of segments is shown only one time. I am attaching the code I used in addition to the plot.
Thanks in advance
Mohamed
Product Codes Batch Number Start End
A 1 1000 1500
A 1 1400 2000
A 1 1800 2300
A 1 6573 6905
A 2 13773 14105
A 2 5040 5372
A 2 720 1052
A 3 1921 2253
A 3 3933 4265
A 3 13441 13773
library(ggplot2)
library(grid)
library(dplyr)
library(gtable)
library(readxl)
library(readxl)
library(reshape2)
library(ggrepel)
library(RColorBrewer)
Book2 <- read_excel("C:/Users/...stack.xlsx", sheet = "Sheet1")
attach(Book2)
df2<-Book2
actcols <- c("#d95f0e","#756bb1","#0218a2","#ffb703", "#f76f73", "#027fdc", "#07c4c5","#303030","#11793b","#5d7261","#3f5f34","#905435","#997940","#ab3434","#961B4D")
myColors <- brewer.pal(5,"Set1")
start<-as.POSIXct('04/06/2020',format='%m/%d/%Y')
date<-(df2$Start)*60+start
zz2<-(df2$End)*60+start
Product<-paste(df2$`Product Codes`,df2$`Batch Number`)
plot2 <- ggplot(df2, aes(x=date,xend=zz2-900,y=Product, yend=Product,color=Product))+ geom_segment(stat = "identity")+ theme_bw()+ geom_segment(size=5)+
theme(plot.title=element_text(size=24, face="bold"),
axis.text.x=element_text(size=10),
axis.text.y=element_text(size=14),
axis.title.x=element_text(size=16),
axis.title.y=element_text(size=14),
legend.title = element_text(size=16),
legend.position="top")+scale_x_datetime(date_labels ="%a %b %d", date_breaks ="1 day") +
theme(axis.text.x = element_text(hjust=2.5))
plot2
Try this. Basic Idea is to add the segment numbers to your Product variable and map the new Product_segment variable on y:
library(RColorBrewer)
library(ggplot2)
library(dplyr)
df2 <- read.table(text = "'Product Codes' 'Batch Number' Start End
A 1 1000 1500
A 1 1400 2000
A 1 1800 2300
A 1 6573 6905
A 2 13773 14105
A 2 5040 5372
A 2 720 1052
A 3 1921 2253
A 3 3933 4265
A 3 13441 13773", header = TRUE, stringsAsFactors = FALSE)
df2 <- df2 %>%
rename(`Product Codes` = Product.Codes, `Batch Number` = Batch.Number)
actcols <- c("#d95f0e","#756bb1","#0218a2","#ffb703", "#f76f73", "#027fdc", "#07c4c5","#303030","#11793b","#5d7261","#3f5f34","#905435","#997940","#ab3434","#961B4D")
myColors <- brewer.pal(5,"Set1")
start<-as.POSIXct('04/06/2020',format='%m/%d/%Y')
# Data wrangling
df3 <- df2 %>%
mutate(
date = Start * 60 + start,
zz2 = End * 60 + start - 900,
Product = paste(`Product Codes`, `Batch Number`)
) %>%
select(-Start, -End) %>%
# Add segment number
group_by(Product) %>%
arrange(Product, date) %>%
mutate(segment = 1,
segment = cumsum(segment),
Product_segment = paste(Product, segment),
y.labels = ifelse(segment == n_distinct(segment), Product, "")) %>%
ungroup()
y.labels <- df3$y.labels
plot2 <- ggplot(df3, aes(x = date, xend = zz2, y = Product_segment, yend = Product_segment, color = Product)) +
geom_segment(size = 10) +
scale_x_datetime(date_labels ="%a %b %d", date_breaks ="1 day") +
scale_y_discrete(labels = y.labels) +
labs(x = NULL, y = NULL) +
theme_bw() +
theme(plot.title=element_text(size=24, face="bold"),
axis.text.x=element_text(size=10),
axis.text.y=element_text(size=14),
axis.title.x=element_text(size=16),
axis.title.y=element_text(size=14),
legend.title = element_text(size=16),
legend.position="top")
plot2
Edit: Subplots
subplot2 <- ggplot(df3, aes(x = date, xend = zz2, y = factor(segment), yend = factor(segment), color = Product)) +
geom_segment(size = 9) +
scale_x_datetime(date_labels ="%a %b %d", date_breaks ="1 day") +
labs(x = NULL, y = NULL) +
facet_wrap(~ Product, ncol = 1, scales = "free_y") +
theme_bw() +
theme(plot.title=element_text(size=24, face="bold"),
axis.text.x=element_text(size=10),
axis.text.y=element_text(size=14),
axis.title.x=element_text(size=16),
axis.title.y=element_text(size=14),
legend.title = element_text(size=16),
legend.position="top")
subplot2
Created on 2020-04-08 by the reprex package (v0.3.0)

Geom_text Value overlapping

I trying to create a Pie chart, but the value is overlapping each other.
Sample Data:
City_Area Age Spending
A 0-15 100
A 15-30 400
B 0-15 200
B 15-30 300
Here my code:
CA = filter(City_Area == 'A') %>% group_by(City_Area,Age,Spending)
ggplot(CA, aes(x="",y = Spending, fill = Age)) + geom_bar(stat='identity')+ coord_polar("y") + theme_void() + geom_text(aes(label = scales::percent(round((..count..)/sum(..count..),2)),y= ((..count..)/sum(..count..))), stat="count",position=position_stack(0.5))
Here without coord_polar
Using Rui Barradas code
The data preparation code seems to be wrong and so does the plotting code.
First, prepare the data. The main thing to do is to get rid of the dollar signs. I will do that with sub.
library(dplyr)
library(ggplot2)
CA2 <- CA %>%
mutate(Spending = as.numeric(sub("\\$", "", Spending))) %>%
filter(City_Area == 'A')
In the question there is a group_by line but for this example it is not needed.
Now the plot.
ggplot(CA2, aes(x = "", y = Spending, fill = Age)) +
geom_bar(stat = 'identity') +
coord_polar("y") +
theme_void() +
geom_text(aes(label = scales::percent(Spending/sum(Spending), 2)),
position = position_stack(0.5))
Data.
CA <- read.table(text = "
City_Area Age Spending
A 0-15 100$
A 15-30 400$
B 0-15 200$
B 15-30 300$
", header = TRUE)

How to make ggplot to order a stacked barchart

I have the following R code, where I transform the data and then order it by a specific column:
df2 <- df %>%
group_by(V2, news) %>%
tally() %>%
complete(news, fill = list(n = 0)) %>%
mutate(percentage = n / sum(n) * 100)
df22 <- df2[order(df2$news, -df2$percentage),]
I want to apply the ordered data "df22" in ggplot:
ggplot(df22, aes(x = V2, y = percentage, fill = factor(news, labels = c("Read","Otherwise")))) +
geom_bar(stat = "identity", position = "fill", width = .7) +
coord_flip() + guides(fill = guide_legend(title = "Online News")) +
scale_fill_grey(start = .1, end = .6) + xlab("Country") + ylab("Share")
Unfortunately, ggplot still returns me a plot without the order:
Does anyone know what is wrong with my code? This is not the same as to order bar chart with a single value per bar like here Reorder bars in geom_bar ggplot2. I try to order the cart by a specific category of a factor. In particular, I want to see countries with the largest share of Read news first.
Here is the data:
V2 news n percentage
1 United States News Read 1583 1.845139
2 Netherlands News Read 1536 1.790356
3 Germany News Read 1417 1.651650
4 Singapore News Read 1335 1.556071
5 United States Otherwise 581 0.6772114
6 Netherlands Otherwise 350 0.4079587
7 Germany Otherwise 623 0.7261665
8 Singapore Otherwise 635 0.7401536
I used the following R code:
df2 <- df %>%
group_by(V2, news) %>%
tally() %>%
complete(news, fill = list(n = 114)) %>%
mutate(percentage = n / sum(n) * 100)
df2 <- df2[order(df2$news, -df2$percentage),]
df2 <- df2 %>% group_by(news, percentage) %>% arrange(desc(percentage))
df2$V2 <- factor(df2$V2, levels = unique(df2$V2))
ggplot(df2, aes(x = V2, y = percentage, fill = news))+
geom_bar(stat = "identity", position = "stack") +
guides(fill = guide_legend(title = "Online News")) +
coord_flip() +
scale_x_discrete(limits = rev(levels(df2$V2)))
Everything was fine except some countries break the order for some reason and I do not understand why. Here is the picture:
What I did with the hints from guys, I used "arrange" command instead of dplyr
df4 <- arrange(df2, news, desc(percentage))
Here is the result:
Here's what I have - hope this is useful. As mentioned #Axeman - the trick is to reorder the labels as factors. Further, using coord_flip() reorders the labels in the opposite direction so scale_x_discrete() is needed.
I am using the small sample you provided.
library(ggplot2)
library(dplyr)
df <- read.csv("data.csv")
df <- arrange(df, news, desc(Percentage))
df$V2 <- factor(df$V2, levels = unique(df$V2))
ggplot(df, aes(x = V2, y = Percentage, fill = news))+
geom_bar(stat = "identity", position = "stack") +
guides(fill = guide_legend(title = "Online News")) +
coord_flip() +
scale_x_discrete(limits = rev(levels(df$V2)))

Resources