Having trouble making a US heatmap of states in R

I am trying to make a US state heatmap of the Tidy Tuesday data this week. I am having a lot of trouble getting it to work and I am thinking this should only take a few lines of code.
Is USMAP the best way to do mapping like this in R?
Is there a ggplot way to do this instead of using the usmap package?
What am I doing wrong in my example?
library(usmap)
library(tidyverse)
nurses <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-05/nurses.csv')
plot_data <- nurses %>%
filter(Year == 2020) %>%
select(State, `Total Employed RN`) %>%
rename("Total_Employed" = 2)
# Works with no data
plot_usmap()
# Does not work with data ??
plot_usmap(data = plot_data,
values = Total_Employed,
color = "blue") +
scale_fill_continuous(low = "white",
high = "blue")
# The column is real and I can access it
plot_data %>%
mutate(test_column = (1 + Total_Employed))
# Trying to emulate an example from : https://cran.r-project.org/web/packages/usmap/vignettes/mapping.html
statepop <- statepop %>%
rename(State = 3)
merged_df <- left_join(plot_data, statepop, by = "State") %>%
select(fips, abbr, Total_Employed, State)
merged_df
# Still does not work even though data is in the same format
plot_usmap(data = merged_df,
values = Total_Employed,
color = "blue") +
scale_fill_continuous(low = "white",
high = "blue")

Various answers in this thread: https://twitter.com/Indescribled/status/1445234775858368514
My preferred one - https://twitter.com/ivelasq3/status/1445242672021606401
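For reference, two things in the question's code look like the likely culprits: `plot_usmap()` expects `values` as a quoted column name, and it looks for a lower-case `state` (or `fips`) column in `data`. Below is a minimal sketch under those assumptions. The three rows and their numbers are made up for illustration and are not taken from the nurses file.

```r
library(usmap)
library(ggplot2)

# Hypothetical stand-in for plot_data: note the lower-case `state` column.
# The employment figures are illustrative only, not real data.
plot_data <- data.frame(
  state = c("California", "Texas", "New York"),
  Total_Employed = c(300000, 200000, 180000)
)

# `values` is passed as a string naming the column to map to fill
p <- plot_usmap(data = plot_data,
                values = "Total_Employed",
                color = "blue") +
  scale_fill_continuous(low = "white",
                        high = "blue",
                        name = "Employed RNs")
p
```

With the real data, the same idea would be `rename(state = State)` before plotting. Rows for territories may need to be filtered out first if usmap does not recognise them; that part is an assumption worth checking against the package documentation.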

Related

Plotting a line graph by datetime with a histogram/bar graph by date

I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.
I'm trying to visualize total number of submissions on a graph, showing the overall total in a line graph and the daily total in a histogram (or bar graph) on top of it. I'm not sure how to add breaks or bins to the histogram so that it takes the submission datetime column and makes each bar the daily total.
I tried adding a column that converts the datetime into just date and plots based on that, but I'd really like the line graph to include the time.
Here's what I have so far:
df <- df %>%
mutate(datetime = lubridate::mdy_hm(datetime))%>%
mutate(date = lubridate::as_date(datetime))
#sort by datetime
df <- df %>%
arrange(datetime)
#add total number of submissions
df <- df %>%
mutate(total = row_number())
#ggplot
line_plus_histo <- df%>%
ggplot() +
geom_histogram(data = df, aes(x=datetime)) +
geom_line(data = df, aes(x=datetime, y=total), col = "red") +
stat_bin(data = df, aes(x=date), geom = "bar") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
line_plus_histo
As you can see, I'm also calculating the total number of submissions by sorting by time and then adding a column with the row number. So if you can help me use a better method I'd really appreciate it.
Please, find below the line plus histogram of time v. submissions:
Here's the pastebin link with my data
You can extend your data manipulation by:
df <- df |>
mutate(datetime = lubridate::mdy_hm(datetime)) |>
arrange(datetime) |>
mutate(midday = as_datetime(floor_date(as_date(datetime), unit = "day") + 0.5)) |>
mutate(totals = row_number()) |>
group_by(midday) |>
mutate(N = n())|>
ungroup()
then use midday for bars and datetime for line:
df%>%
ggplot() +
geom_bar(data = df, aes(x = midday)) +
geom_line(data = df, aes(x=datetime, y=totals), col = "red") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
PS. Sorry for Polish locales on X axis.
PS2. With geom_bar it looks much better
Created on 2022-02-03 by the reprex package (v2.0.1)
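A variation on the same idea, as a sketch only: count the submissions per day in a separate table and draw the bars from that, so the bar layer and the line layer each get their own data. The timestamps below are simulated because the original pastebin data is not available here, and the `daily` table is a name introduced for this example.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(1)
# Simulated submission times spread over five days (stand-in data)
df <- tibble(
  datetime = as_datetime("2022-01-01 00:00:00") + sort(runif(200, 0, 5 * 86400))
)

df <- df |>
  arrange(datetime) |>
  mutate(total = row_number(),                              # running total
         midday = floor_date(datetime, "day") + hours(12))  # bar position

# One row per day with the number of submissions on that day
daily <- count(df, midday, name = "n")

p <- ggplot() +
  geom_col(data = daily, aes(x = midday, y = n)) +
  geom_line(data = df, aes(x = datetime, y = total), colour = "red") +
  labs(title = "Submissions by Day", x = "Date", y = "Submissions")
p
```

Centering each bar on midday keeps it aligned with the part of the line that covers the same day.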

plotly and ggplot legend order interaction

I have multiple graphs that I am plotting with ggplot and then sending to plotly. I set the legend order based on the most recent date, so that one can easily interpret the graphs. Everything works great in generating the ggplot, but once I send it through ggplotly() the legend order reverts to the original factor levels. I tried resetting the factors, but this creates a new problem: the colors are different in each graph.
Here's the code:
Data:
Country <- c("CHN","IND","INS","PAK","USA")
a <- data.frame("Country" = Country,"Pop" = c(1400,1300,267,233,330),Year=rep(2020,5))
b <- data.frame("Country" = Country,"Pop" = c(1270,1000,215,152,280),Year=rep(2000,5))
c <- data.frame("Country" = Country,"Pop" = c(1100,815,175,107,250),Year=rep(1990,5))
Data <- bind_rows(a,b,c)
Legend Ordering Vector - This uses 2020 as the year to determine order.
Legend_Order <- Data %>%
filter(Year==max(Year)) %>%
arrange(desc(Pop)) %>%
select(Country) %>%
unlist() %>%
as.vector()
Then I create my plot and use Legend Order as breaks
Graph <- Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, group = Country, color = Country), size = 1.2) +
scale_color_discrete(name = 'Country', breaks = Legend_Order)
Graph
But then when I pass this on to:
ggplotly(Graph)
For some reason plotly ignores the breaks argument and uses the original factor levels.
If I set the factor levels beforehand, the color scheme changes (since the factors are in a different order).
How can I keep the color scheme from graph to graph, but change the legend order when using plotly?
Simply recode your Country variable as a factor with the levels set according to Legend_Order. Try this:
library(plotly)
library(dplyr)
Country <- c("CHN","IND","INS","PAK","USA")
a <- data.frame("Country" = Country,"Pop" = c(1400,1300,267,233,330),Year=rep(2020,5))
b <- data.frame("Country" = Country,"Pop" = c(1270,1000,215,152,280),Year=rep(2000,5))
c <- data.frame("Country" = Country,"Pop" = c(1100,815,175,107,250),Year=rep(1990,5))
Data <- bind_rows(a,b,c)
Legend_Order <- Data %>%
filter(Year==max(Year)) %>%
arrange(desc(Pop)) %>%
select(Country) %>%
unlist() %>%
as.vector()
Data$Country <- factor(Data$Country, levels = Legend_Order)
Graph <- Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, group = Country, color = Country), size = 1.2)
ggplotly(Graph)
To "lock in" the color assignment you can make use of a named color vector like so (for short I only show the ggplots):
# Fix the color assignments using a named color vector which can be assigned via scale_color_manual
cols <- scales::hue_pal()(5) # Default ggplot2 colors
cols <- setNames(cols, Legend_Order) # Set names according to legend order
# Plot with unordered Countries but "ordered" color assignment
Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, color = Country), size = 1.2) +
scale_color_manual(values = cols)
# Plot with ordered factor
Data$Country <- factor(Data$Country, levels = Legend_Order)
Data %>%
ggplot() +
geom_line(aes(x = Year, y = Pop, color = Country), size = 1.2) +
scale_color_manual(values = cols)

Possible to animate polygon fill using gganimate in R?

I have county level data recording the year an invasive insect pest was first detected in that county between 2002 and 2018. I created a map using ggplot2 and the maps package that fills the county polygons with a color according to the year the pest was detected.
Is there a way to use the gganimate package to animate this map with the first frame filling in only polygons with a detection date of 2002, the second frame filling polygons with a detection date of 2003 or earlier (so 2002 and 2003), a third frame for detection dates of 2004 or earlier (2002, 2003, 2004), etc.?
Clarification: I'd like it so all the county polygons are always visible and filled in with white initially and each frame of the animation adds fills in counties based on the year of detection.
I've tried using the transition_reveal(data$detect_year) with the static plot but get an error that "along data must either be integer, numeric, POSIXct, Date, difftime, orhms".
Here's some code for a reproducible example:
library(dplyr)
library(purrr)
library(maps)
library(ggplot2)
library(gganimate)
# Reproducible example
set.seed(42)
map_df <- map_data("county") %>%
filter(region == "minnesota")
map_df$detection_year <- NA
# Add random detection year to each county
years <- 2002:2006
map_list <- split(map_df, f = map_df$subregion)
map_list <- map(map_list, function(.x) {
.x$detection_years <- mutate(.x, detection_years = sample(years, 1))
})
# collapse list back to data frame
map_df <- bind_rows(map_list)
map_df$detection_years <- as.factor(map_df$detection_years)
# Make plot
static_plot <- ggplot(map_df,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(data = map_df, color = "black", aes(fill = detection_years)) +
scale_fill_manual(values = terrain.colors(n = length(unique(map_df$detection_years))),
name = "Year EAB First Detected") +
theme_void() +
coord_fixed(1.3)
animate_plot <- static_plot +
transition_reveal(detection_years)
If it's possible to do this with gganimate, I'd like to but I'm also open to other solutions if anyone has ideas.
After getting an answer from @RLave that almost did what I wanted and spending a little time with the documentation, I was able to figure out a way to do what I want. It doesn't seem very clean, but it works.
Essentially, I created a copy of my data frame for each year that needed a frame in the animation. Then for each year of detection I wanted to animate, I edited the detection_year variable in that copy of the data frame so that any county that had a detection in the year of interest or earlier retained their values and any county that had no detection yet was converted to the value I plotted as white. This made sure all the counties were always plotted. Then I needed to use transition_manual along with a unique ID I gave to each copy of the original data frame to determine the order of the animation.
library(dplyr)
library(purrr)
library(maps)
library(ggplot2)
library(gganimate)
# Reproducible example
set.seed(42)
years <- 2002:2006
map_df <- map_data("county") %>%
filter(region == "minnesota")
map_df <- map_df %>%
group_by(subregion) %>%
mutate(detection_year = sample(years,1))
animate_data <- data.frame()
for(i in 2002:2006){
temp_dat <- map_df %>%
mutate(detection_year = as.numeric(as.character(detection_year))) %>%
mutate(detection_year = case_when(
detection_year <= i ~ detection_year,
detection_year > i ~ 2001
),
animate_id = i - 2001
)
animate_data <- bind_rows(animate_data, temp_dat)
}
animate_data$detection_year <- as.factor(as.character(animate_data$detection_year))
# Make plot
static_plot <- ggplot(animate_data,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(data = animate_data, color = "black", aes(fill = detection_year)) +
scale_fill_manual(values = c("white",
terrain.colors(n = 5)),
name = "Year First Detected") +
theme_void() +
coord_fixed(1.3)
# To inspect the frames side by side, add this to the plot instead of animating:
# + facet_wrap(~animate_id)
animate_plot <- static_plot +
transition_manual(frames = animate_id)
animate_plot
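The frame-building loop can also be written without growing a data frame inside `for`. This is only a sketch of the same idea on a toy table: `counties`, `frame_year` and `shown_year` are names made up for this example, and the three rows stand in for the real county polygons.

```r
library(dplyr)

# Toy stand-in: one row per county with its first detection year
counties <- tibble(
  subregion = c("a", "b", "c"),
  detection_year = c(2002, 2004, 2003)
)
years <- 2002:2004

# One copy of the data per frame; counties not yet detected in that
# frame get a placeholder level that can be mapped to white.
frames <- bind_rows(lapply(years, function(yr) {
  counties %>%
    mutate(frame_year = yr,
           shown_year = if_else(detection_year <= yr,
                                as.character(detection_year),
                                "not yet detected"))
}))

frames
```

`shown_year` would then be the fill variable and `frame_year` the variable handed to `transition_manual()`, in the same way `detection_year` and `animate_id` are used above.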
Possibly this, but I'm not sure that this is the expected output.
I changed your code, probably you don't need to split. I used group_by to assign a year to each region.
set.seed(42)
years <- 2002:2006
map_df <- map_data("county") %>%
filter(region == "minnesota")
map_df <- map_df %>%
group_by(subregion) %>%
mutate(detection_year = sample(years,1))
For the transition you need to define the id, here the same as the grouping (subregion or group) and a correct date format for the transition (along) variable (I used lubridate::year())
# Make plot
static_plot <- ggplot(map_df,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(color = "black", aes(fill = as.factor(detection_year))) +
scale_fill_manual(values = terrain.colors(n = length(unique(map_df$detection_year))),
name = "Year EAB First Detected") +
theme_void() +
coord_fixed(1.3)
animate_plot <- static_plot +
transition_reveal(subregion, # same as the group variable
lubridate::year(paste0(detection_year, "-01-01"))) # move along years
Does this do it for you?

Standard evaluation inside a function with dplyr

I have data with lots of factor variables that I am visualising to get a feel for each of the variables. I am reproducing a lot of the code with minor tweaks for variable names etc., so I decided to write a function to simplify things. I just can't get it to work...
Dummy Data
ID <- sample(1:32, 128, replace = TRUE)
AgeGrp <- sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE)
ID <- factor(ID)
AgeGrp <- factor(AgeGrp)
data <- data_frame(ID, AgeGrp)
data
Basically what I am trying to do with each factor variable is produce a bar chart with labels of percentages inside the bars. For example with the dummy data.
plotstats <- #Create a table with pre-summarised percentages
data %>%
group_by(AgeGrp) %>%
summarise(count = n()) %>%
mutate(pct = count/sum(count)*100)
age_plot <- #Plot the data
ggplot(data,aes(x = AgeGrp)) +
geom_bar() + #Add the percentage labels using pre-summarised table
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),y=pct),
size=3.5, vjust = -1, colour = "sky blue") +
ggtitle("Count of Age Group")
age_plot
This works fine with the dummy data - but when I try to create a function...
basic_plot <-
function(df, x){
plotstats <-
df %>%
group_by_(x) %>%
summarise_(
count = ~n(),
pct = ~count/sum(count)*100)
plot <-
ggplot(df,aes(x = x)) +
geom_bar() +
geom_text(data = plotstats, aes(label=paste0(round(pct,1),"%"),
y=pct), size=3.5, vjust = -1, colour = "sky blue")
plot
}
basic_plot(data, AgeGrp)
I get the error code :
Error in UseMethod("as.lazy") : no applicable method for 'as.lazy' applied to an object of class "factor"
I have looked at questions here, here, and here and also looked at the NSE Vignette but can't find my fault.
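One likely reading of the error: `basic_plot(data, AgeGrp)` passes the global factor `AgeGrp` itself, not a column name, so `group_by_()` receives a factor. A possible rewrite is sketched below, assuming a dplyr and ggplot2 recent enough to support the `{{ }}` (embrace) operator, which postdates the underscore verbs used in the question. It also places the labels at `y = count`, since `geom_bar()` draws counts and not percentages; that is a choice made for this sketch.

```r
library(dplyr)
library(ggplot2)

basic_plot <- function(df, x) {
  # Pre-summarised counts and percentages for the chosen column
  plotstats <- df %>%
    count({{ x }}, name = "count") %>%
    mutate(pct = count / sum(count) * 100)

  ggplot(df, aes(x = {{ x }})) +
    geom_bar() +
    # Labels sit on top of each bar (bar height is the count)
    geom_text(data = plotstats,
              aes(label = paste0(round(pct, 1), "%"), y = count),
              size = 3.5, vjust = -1, colour = "skyblue")
}

set.seed(1)
data <- tibble(
  AgeGrp = factor(sample(c("18-65", "65-75", "75-85", "85+"), 128, replace = TRUE))
)
p <- basic_plot(data, AgeGrp)
p
```

The column is passed unquoted, as in the original call, and `{{ x }}` forwards it to both `count()` and `aes()`.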

How to generate grouped bar plot or pie chart from list of csv files?

I have a list of data.frames that need to be classified. I manipulated the list and exported the results as csv files to the default folder. To make the exported data more informative, I think it would be better to also generate a grouped bar plot or a pie chart for each data.frame object. I am a beginner still learning ggplot2, so I have little idea how to do this. Can anyone suggest how to generate an informative grouped bar plot for a list of files? Thanks in advance :)
reproducible data :
savedDF <- list(
bar.saved = data.frame(start=sample(100, 15), stop=sample(150, 15), score=sample(36, 15)),
cat.saved = data.frame(start=sample(100, 20), stop=sample(100,20), score=sample(45,20)),
foo.saved = data.frame(start=sample(125, 24), stop=sample(140, 24), score=sample(32, 24))
)
dropedDF <- list(
bar.droped = data.frame(start=sample(60, 12), stop=sample(90,12), score=sample(35,12)),
cat.droped = data.frame(start=sample(75, 18), stop=sample(84,18), score=sample(28,18)),
foo.droped = data.frame(start=sample(54, 14), stop=sample(72,14), score=sample(25,14))
)
so I am getting list of csv files from this pipeline :
comb <- do.call("rbind", c(savedDF, dropedDF))
cn <- c("letter", "saved","seq")
DF <- cbind(read.table(text = chartr("_", ".", rownames(comb)), sep = ".", col.names = cn), comb)
DF <- transform(DF, updown = ifelse(score>= 12, "stringent", "weak"))
by(DF, DF[c("letter", "saved", "updown")],
function(x) write.csv(x[-(1:3)],
sprintf("%s_%s_%s.csv", x$letter[1], x$updown[1], x$saved[1])))
To better understand the exported data, I think a grouped bar plot and a pie chart for each data.frame object would be much more informative.
In the desired plot, I want to see the number of features in each csv file for each data.frame object.
How can I do this with ggplot2, and is there a more efficient way to get it done? Thanks a lot.
If I understand correctly, this may work for you as a rough solution. Please comment to let me know if this is acceptable. In the future, if you can provide a rough sketch along with your data to show what you're trying to achieve that would be a good idea.
library(dplyr)
library(ggplot2)
plot_data <- DF %>%
group_by(letter, saved, updown) %>%
tally %>%
group_by(saved, updown) %>%
mutate(percentage = n/sum(n))
ggplot(plot_data, aes(x = saved, y = n, fill = saved)) +
geom_bar(stat = "identity") +
facet_wrap(~ letter + updown, ncol = 2)
You can always change the facet_wrap(~ letter + updown, ncol = 2) to an explicit facet_grid(letter ~ updown) if you wish.
Or you could view it this way:
ggplot(plot_data, aes(x = letter, y = n)) +
geom_bar(stat = "identity") +
facet_wrap(~updown+saved, ncol = 2)
For a pie (cleaning up and labeling is up to you):
ggplot(plot_data, aes(x = 1, y = percentage, fill = letter)) +
geom_bar(stat = "identity", width =1) +
facet_wrap(~updown+saved, ncol = 2) +
coord_polar(theta = "y") +
theme_void()
A pie for the bar group alone, split into its four saved/updown combinations, just requires some manipulation of your data:
library(dplyr)
library(tidyr)
library(ggplot2)
plot_data <- DF %>%
unite(interaction, saved, updown, sep = "-") %>%
group_by(letter, interaction) %>%
tally %>%
mutate(percentage = n/sum(n)) %>%
filter(letter == "bar")
ggplot(plot_data, aes(x = 1, y = percentage, fill = interaction)) +
geom_bar(stat = "identity", width =1) +
coord_polar(theta = "y") +
theme_void()
You should really look into the dplyr, tidyr and ggplot2 packages. Read their documentation and vignettes and work through the examples. The best way to learn is by doing.
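The plots above use facets. If side-by-side (dodged) bars are closer to what "grouped bar plot" means here, a minimal sketch could look like this. The `counts` table is built by hand for the example; its numbers are the row counts of the sample lists in the question (before the stringent/weak split).

```r
library(ggplot2)

# Row counts of the example data frames in savedDF and dropedDF
counts <- data.frame(
  letter = rep(c("bar", "cat", "foo"), each = 2),
  saved  = rep(c("saved", "droped"), times = 3),
  n      = c(15, 12, 20, 18, 24, 14)
)

# One cluster of bars per letter, one bar per saved/droped status
p <- ggplot(counts, aes(x = letter, y = n, fill = saved)) +
  geom_col(position = "dodge") +
  labs(x = "Data frame", y = "Number of features", fill = "Status")
p
```

The same call should work on the `plot_data` summary computed above, with `facet_wrap(~ updown)` added to keep the stringent and weak groups apart.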
