I am attempting to create heat maps from a large data set that has several factors. I'd like to get a bird's-eye view first by plotting the heat map of all values and all factors. Then I'd like to subset the heat map plot by a variety of factors, but have ggplot2::geom_tile recalculate the heat map so that it plots the relative abundance based on whichever factors I've subset.
library(reshape2)
library(ggplot2)
library(dplyr)
#Test data
df <- data.frame(
Measurement = c(1:30),
CA = rep(rnorm(30, mean=20, sd=5)),
TX = rep(rnorm(30, mean=18, sd=5)),
NY = rep(rnorm(30, mean=34, sd=2))
)
df.melt <- melt(df,id = c("Measurement"))
Basic heat map plot code. My actual data includes several factors/columns from which I want to pull data for various comparisons.
#Basic plot
ggplot(data = df.melt,
aes(x = variable, y = Measurement, colors = value, fill = value)) +
geom_tile(color = "black") +
scale_fill_gradientn(colors = c("lightyellow", "darkred"))
I want the output colors to correspond to relative abundance by measurement, so I can look at relative changes across CA, TX, and NY. This would be my "base plot".
df.melt.reabun <- df.melt %>% group_by(Measurement) %>%
mutate(RelAbun = value/sum(value))
df.melt.reabun <- as.data.frame(df.melt.reabun)
#New plot with relative abundance
ggplot(data = df.melt.reabun,
aes(x = variable, y = Measurement, colors = RelAbun, fill = RelAbun)) +
geom_tile(color = "black") +
scale_fill_gradientn(colors = c("lightyellow", "darkred"))
What I also want is to be able to re-plot any subset I like and have the relative abundance recalculated automatically for the tiles.
#Assign plot object
heat <- ggplot(data = df.melt.reabun,
aes(x = variable, y = Measurement, colors = RelAbun, fill = RelAbun)) +
geom_tile(color = "black")+
scale_fill_gradientn(colors = c("lightyellow", "darkred"))
#Select variable to subset data
alt <- c("CA", "TX")
#Subset ggplot object
heat %+% subset(df.melt.reabun, variable %in% alt)
But this output is incorrect, because it only shows the relative abundance from the calculation that included CA, TX, and NY.
I want the relative abundance to be recalculated every time I subset the df to plot at this step: heat %+% subset()
I have a feeling I can smoothly combine group_by and geom_tile to do this automatically, but I can't quite figure it out. Any help would be appreciated. I have many, many combinations of heat maps I want to look at, and I do not want to recalculate the relative abundance "manually" each time.
It's generally advisable to do your data wrangling before passing the data frame to ggplot. In this case, something like the following could work:
subsetFun <- function(df, var.filter){
return(df %>%
filter(variable %in% var.filter) %>%
group_by(Measurement) %>%
mutate(RelAbun = value / sum(value)) %>%
ungroup())
}
heat %+% subsetFun(df.melt.reabun, alt)
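As a quick sanity check on this approach, here is a minimal sketch on a small made-up data set (the `long` table, the `rel_abun` name and the `combos` list are assumptions for illustration, not part of the original data). It applies the same filter-then-recompute logic as subsetFun above to several combinations and confirms that the shares within each Measurement sum to 1 for every subset.

```r
library(dplyr)

# Hypothetical long-format data, shaped like df.melt in the question
set.seed(1)
long <- data.frame(
  Measurement = rep(1:3, times = 3),
  variable    = rep(c("CA", "TX", "NY"), each = 3),
  value       = runif(9, min = 10, max = 40)
)

# Same logic as subsetFun: filter first, then recompute within Measurement
rel_abun <- function(df, var.filter) {
  df %>%
    filter(variable %in% var.filter) %>%
    group_by(Measurement) %>%
    mutate(RelAbun = value / sum(value)) %>%
    ungroup()
}

# Several hypothetical combinations to compare
combos  <- list(c("CA", "TX"), c("CA", "NY"), c("CA", "TX", "NY"))
results <- lapply(combos, rel_abun, df = long)

# Within each subset, the shares for every Measurement should sum to 1
sums <- lapply(results, function(d) tapply(d$RelAbun, d$Measurement, sum))
```

Each element of `results` could then be passed to the stored plot with `heat %+% results[[i]]`.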
Using the airline-safety dataset available here, I'm trying to create a heat map in R. I want to order the heat map so that the airlines with the highest number of fatal accidents are listed at the top.
I'm able to order the heat map by "value", but this orders it by value regardless of the group, i.e. incidents, fatal accidents or fatalities.
# load packages -----------
library(tidyverse)
library(ggplot2)
library(reshape2)
library(dplyr)
library(plyr)
library(scales)
library(forcats)
# read in the data
airlines <- read.csv("/Volumes/GoogleDrive/My Drive/Uni/DVN/AT2/Blog 2/airline_incidents.csv", header = TRUE)
# select relevant columns
airlines_00_14 <- airlines[,c(1,6,7,8)]
# create a long dataset
airlines_00_14.m <- melt(airlines_00_14)
# rescale values for heat map
airlines_00_14.m <- ddply(airlines_00_14.m, .(variable), transform, rescale = rescale(value))
# create heat map
(q <- airlines_00_14.m %>%
ggplot( aes(x = variable, y = reorder(airline, value))) +
geom_tile(aes(fill = rescale), colour = "white") +
scale_fill_gradient(low = "white", high = "steelblue"))
One way to do this is to create the order before you melt, like this:
# order by fatal accidents and generate air_order value
airlines_00_14 = airlines_00_14[order(airlines_00_14$fatal_accidents_00_14),]
airlines_00_14$air_order = seq_len(nrow(airlines_00_14))
Then, when you use reshape2::melt, set `id.vars = c("airline", "air_order")`:
# create a long dataset
airlines_00_14.m <- reshape2::melt(airlines_00_14,id.vars = c("airline", "air_order"))
Then, in your plot, use y = reorder(airline, air_order) instead of the current y = reorder(airline, value).
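To see what the ordering does, here is a minimal sketch on a three-row stand-in for the airline table (the rows and values are made up for illustration; only the column names follow the question). It shows that reorder() yields factor levels sorted by air_order, and since ggplot2 draws the first level at the bottom of a discrete y axis, the airline with the most fatal accidents ends up at the top.

```r
library(reshape2)

# Hypothetical stand-in for the airline data
wide <- data.frame(
  airline               = c("A", "B", "C"),
  incidents_00_14       = c(5, 1, 3),
  fatal_accidents_00_14 = c(2, 0, 1),
  stringsAsFactors      = FALSE
)

# Order by fatal accidents and record the rank before melting
wide <- wide[order(wide$fatal_accidents_00_14), ]
wide$air_order <- seq_len(nrow(wide))

# Keep both the airline name and its rank as id variables
long <- melt(wide, id.vars = c("airline", "air_order"))

# Levels sorted by air_order: fewest fatal accidents first
lev <- levels(reorder(long$airline, long$air_order))
```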
I have already drawn three plots using ggplot with geom_line, geom_ribbon, etc.
I want to merge the plots of p_min, p_max and p_mean into a single layout.
p_min, p_max and p_mean should all be on the y axis.
The x axis is number (1, 2).
Please let me know how to draw multiple y-axis variables like these together in one layout.
I think the crux here is that you should combine your data so that each geom only needs to refer to one table, with the characteristic of the source table (e.g. min vs. max vs. mean) made explicit as a variable in that combined table.
Here's a quick function to make some fake data and save it to three tables:
make_fake <- function(a, b, label) {
df <- data.frame(name = "apple", number = 1:5, value = a - b*sqrt(1:5), level = 2)
df$lower = df$value - 0.5; df$upper = df$value + 0.5; df$label = label
df
}
fake_min <- make_fake(3,.1, "min")
fake_max <- make_fake(7,1.5, "max")
fake_mean <- make_fake(5,0.8, "mean")
To plot them together, it will be simpler if they are combined such that each geom only needs to refer to one table. I've done this by adding a label variable in the fake data above.
If we use base::rbind, we can append the tables to each other and plot them, distinguishing series by having the color aesthetic connect to the label variable:
ggplot(data = rbind(fake_min, fake_max, fake_mean),
aes(x=number, y=value, group=label))+
geom_line(aes(color=label))+
geom_ribbon(aes(ymin=lower, ymax=upper, fill=label, group=label), alpha=0.3)
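If the source tables do not already carry a label column, one possible alternative is dplyr::bind_rows with its .id argument, which builds that column from the names of a list. The sketch below uses three tiny made-up tables (`tbl_min`, `tbl_max`, `tbl_mean` are hypothetical names and values) rather than the fake data above.

```r
library(dplyr)

# Hypothetical tables without a label column
tbl_min  <- data.frame(number = 1:2, value = c(3.0, 2.9))
tbl_max  <- data.frame(number = 1:2, value = c(7.0, 5.5))
tbl_mean <- data.frame(number = 1:2, value = c(5.0, 4.2))

# The list names become the values of the new "label" column
combined <- bind_rows(
  list(min = tbl_min, max = tbl_max, mean = tbl_mean),
  .id = "label"
)
```

The resulting `combined` table can be passed to ggplot in the same way as the rbind() result, with color and fill mapped to label.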
Maybe you want a combined ribbon showing the highest upper and lowest lower values. In that case you could precalculate those, here using dplyr:
library(dplyr)
rbind(fake_min, fake_max, fake_mean) %>%
group_by(number) %>%
summarize(upper = max(upper),
lower = min(lower)) -> fake_ribbon
rbind(fake_min, fake_max, fake_mean) %>%
ggplot(aes(x=number)) +
geom_line(aes(color=label, y=value))+
geom_ribbon(data = fake_ribbon, aes(ymin=lower, ymax=upper), alpha=0.2)
I have a data set with 3 columns: date, weight, and location. I want to make a graph with time on the x-axis and weight on the y-axis with a different line for each location, where each point on the line is the mean weight of all samples from that location at that date. The only ways I've been able to come up with to do this would take way too long and require more lines of code than seems reasonable just to make a graph. For instance I tried subsetting like this:
A <- df$Location == "A"
Aug10_19 <- df$Date == "2019/07/10"
ind <- Aug10_19 & A
mean(df$Weight[ind])
But then I would have to do this manually for every individual combination of date and location and then force all the means into a new data frame. What is the shorter way to accomplish this?
You can use ggplot2 to quickly create summary plots.
library(dplyr)
library(ggplot2)
df <- transmute(
iris,
Location = Species,
Date = as.Date(as.character(
cut(Sepal.Length, breaks = 3,
labels = c("2019-07-10", "2019-07-12", "2019-07-15")))),
Weight = Sepal.Width)
ggplot(data = df,
mapping = aes(x = Date, y = Weight, colour = Location)) +
stat_summary(fun = "mean", geom = "line") +
theme_bw()
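If the means themselves are needed as a data frame (for example to inspect or export them), a minimal sketch of the equivalent explicit calculation with dplyr is shown below. The four-row `df` is made up for illustration, and `MeanWeight` is a hypothetical column name.

```r
library(dplyr)

# Hypothetical data with the three columns described in the question
df <- data.frame(
  Date     = as.Date(c("2019-07-10", "2019-07-10", "2019-07-10", "2019-07-12")),
  Location = c("A", "A", "B", "A"),
  Weight   = c(10, 14, 8, 11)
)

# One row per Location/Date combination, holding the mean weight
means <- df %>%
  group_by(Location, Date) %>%
  summarise(MeanWeight = mean(Weight), .groups = "drop")
```

The `means` table could then be plotted directly with geom_line(aes(x = Date, y = MeanWeight, colour = Location)).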
I have county level data recording the year an invasive insect pest was first detected in that county between 2002 and 2018. I created a map using ggplot2 and the maps package that fills the county polygons with a color according to the year the pest was detected.
**Is there a way to use the gganimate package to animate this map with the first frame filling in only polygons with a detection date of 2002, the second frame filling polygons with a detection date of 2003 or earlier (so 2002 and 2003), a third frame for detection dates of 2004 or earlier (2002, 2003, 2004), etc.?**
Clarification: I'd like all the county polygons to be always visible and filled in with white initially, with each frame of the animation filling in counties based on the year of detection.
I've tried using transition_reveal(data$detect_year) with the static plot, but I get an error that "along data must either be integer, numeric, POSIXct, Date, difftime, or hms".
Here's some code for a reproducible example:
library(dplyr)
library(purrr)
library(maps)
library(ggplot2)
library(gganimate)
# Reproducible example
set.seed(42)
map_df <- map_data("county") %>%
filter(region == "minnesota")
map_df$detection_year <- NA
# Add random detection year to each county
years <- 2002:2006
map_list <- split(map_df, f = map_df$subregion)
map_list <- map(map_list, function(.x) {
# return the county's rows with one randomly chosen year added
mutate(.x, detection_years = sample(years, 1))
})
# collapse list back to data frame
map_df <- bind_rows(map_list)
map_df$detection_years <- as.factor(map_df$detection_years)
# Make plot
static_plot <- ggplot(map_df,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(data = map_df, color = "black", aes(fill = detection_years)) +
scale_fill_manual(values = terrain.colors(n = length(unique(map_df$detection_years))),
name = "Year EAB First Detected") +
theme_void() +
coord_fixed(1.3)
animate_plot <- static_plot +
transition_reveal(detection_years)
If it's possible to do this with gganimate, I'd like to but I'm also open to other solutions if anyone has ideas.
After getting an answer from @RLave that almost did what I wanted and spending a little time with the documentation, I was able to figure out a way to do what I want. It doesn't seem very clean, but it works.
Essentially, I created a copy of my data frame for each year that needed a frame in the animation. Then for each year of detection I wanted to animate, I edited the detection_year variable in that copy of the data frame so that any county that had a detection in the year of interest or earlier retained their values and any county that had no detection yet was converted to the value I plotted as white. This made sure all the counties were always plotted. Then I needed to use transition_manual along with a unique ID I gave to each copy of the original data frame to determine the order of the animation.
library(dplyr)
library(purrr)
library(maps)
library(ggplot2)
library(gganimate)
# Reproducible example
set.seed(42)
years <- 2002:2006
map_df <- map_data("county") %>%
filter(region == "minnesota")
map_df <- map_df %>%
group_by(subregion) %>%
mutate(detection_year = sample(years,1))
animate_data <- data.frame()
for(i in 2002:2006){
temp_dat <- map_df %>%
mutate(detection_year = as.numeric(as.character(detection_year))) %>%
mutate(detection_year = case_when(
detection_year <= i ~ detection_year,
detection_year > i ~ 2001
),
animate_id = i - 2001
)
animate_data <- bind_rows(animate_data, temp_dat)
}
animate_data$detection_year <- as.factor(as.character(animate_data$detection_year))
# Make plot
static_plot <- ggplot(animate_data,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(data = animate_data, color = "black", aes(fill = detection_year)) +
scale_fill_manual(values = c("white",
terrain.colors(n = 5)),
name = "Year First Detected") +
theme_void() +
coord_fixed(1.3)
# To inspect the individual frames, append `+ facet_wrap(~animate_id)` to the line above
animate_plot <- static_plot +
transition_manual(frames = animate_id)
animate_plot
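A more compact variant of the same idea is sketched below on a made-up three-county table (the `counties` data, the `shown_year` column and the choice of NA as the "not yet detected" marker are assumptions, not what the code above does). Each frame keeps the detection year only for counties detected in that year or earlier, and marks the rest as NA.

```r
library(dplyr)

# Hypothetical county table: one row per county
counties <- data.frame(
  subregion      = c("a", "b", "c"),
  detection_year = c(2002, 2004, 2003)
)

frames <- 2002:2004

# One copy of the table per frame; counties not yet detected get NA
animate_data <- bind_rows(lapply(frames, function(yr) {
  counties %>%
    mutate(
      shown_year = if_else(detection_year <= yr, detection_year, NA_real_),
      animate_id = yr
    )
}))
```

With this layout, mapping fill to factor(shown_year) and setting na.value = "white" in scale_fill_manual() should draw undetected counties in white, while transition_manual(frames = animate_id) steps through the years.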
Possibly this, but I'm not sure that this is the expected output.
I changed your code; you probably don't need to split. I used group_by to assign a year to each region.
set.seed(42)
years <- 2002:2006
map_df <- map_data("county") %>%
filter(region == "minnesota")
map_df <- map_df %>%
group_by(subregion) %>%
mutate(detection_year = sample(years,1))
For the transition you need to define the id, here the same as the grouping (subregion or group), and a correct date format for the transition (along) variable (I used lubridate::year()).
# Make plot
static_plot <- ggplot(map_df,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(color = "black", aes(fill = as.factor(detection_year))) +
scale_fill_manual(values = terrain.colors(n = length(unique(map_df$detection_year))),
name = "Year EAB First Detected") +
theme_void() +
coord_fixed(1.3)
animate_plot <- static_plot +
transition_reveal(subregion, # same as the group variable
lubridate::year(paste0(detection_year, "-01-01"))) # move along years
Does this do it for you?
Hi visualization lovers,
I am trying to create a color map plot, like this one:
(source: https://github.com/hrbrmstr/albersusa)
But I want the map to be distorted so that the areas of the states are proportional to a value I provide (in particular, I use GDP).
What I mean is that I want some states to look bigger and some smaller than they are in reality, while still resembling the real USA map as much as possible.
It is fine if states move or their shapes get distorted.
Any ideas? Any ready-made solutions?
Currently I use R and the albersusa package because that is what I am familiar with. Open to change!
My current code for the plot is:
gmap<-
ggplot() +
geom_map(data = counties@data, map = cmap,
aes(fill =atan(y/x),alpha=x+y, map_id = name),
color = "gray50") +
geom_map(data = smap, map = smap,
aes(x = long, y = lat, map_id = id),
color = "black", size = .5, fill = NA) +
theme_map(base_size = 12) +
theme(plot.title=element_text(size = 16, face="bold",margin=margin(b=10))) +
theme(plot.subtitle=element_text(size = 14, margin=margin(b=-20))) +
theme(plot.caption=element_text(size = 9, margin=margin(t=-15),hjust=0)) +
scale_fill_viridis()+guides(alpha=F,fill=F)
Here's a very ugly first try to get you started, using the outlines from the maps package and some data manipulation from dplyr.
library(maps)
library(dplyr)
library(ggplot2)
library(viridis) # provides scale_fill_viridis() used in the plot below
# Generate the base outlines
mapbase <- map_data("state.vbm")
# Load the centroids
data(state.vbm.center)
# Coerce the list to a dataframe, then add in state names
# Then generate some random value (or your variable of interest, like population)
# Then rescale that value to the range 0.25 to 0.95
df <- state.vbm.center %>% as.data.frame() %>%
mutate(region = unique(mapbase$region),
somevalue = rnorm(50),
scaling = scales::rescale(somevalue, to = c(0.25, 0.95)))
df
# Join your centers and data to the full state outlines
df2 <- df %>%
full_join(mapbase)
df2
# Within each state, scale the long and lat points to be closer
# to the centroid by the scaling factor
df3 <- df2 %>%
group_by(region) %>%
mutate(longscale = scaling*(long - x) + x,
latscale = scaling*(lat - y) + y)
df3
# Plot both the outlines for reference and the rescaled polygons
ggplot(df3, aes(long, lat, group = region, fill = somevalue)) +
geom_path() +
geom_polygon(aes(longscale, latscale)) +
coord_fixed() +
theme_void() +
scale_fill_viridis()
These outlines aren't the best, and the centroid positions they shrink toward cause the polygons to sometimes overlap the original state outline. But it's a start; you can find better shapes for US states and various centroid algorithms.
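One detail worth keeping in mind when choosing the scaling factor: shrinking coordinates linearly by a factor s changes the polygon's area by s^2. So if the area (rather than the width) should be proportional to the value, the square root of the normalized value is the factor to use. The sketch below checks this on a simple square; `shrink`, `poly_area` and the numbers are made up for illustration.

```r
# Move coordinates toward a center by a linear scaling factor
# (same formula as longscale / latscale above)
shrink <- function(coord, center, scaling) {
  scaling * (coord - center) + center
}

# Polygon area via the shoelace formula
poly_area <- function(x, y) {
  j <- c(2:length(x), 1)
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# A 2 x 2 square centered at (1, 1)
sq_x <- c(0, 2, 2, 0)
sq_y <- c(0, 0, 2, 2)

# Hypothetical target: the polygon should cover 25% of its original area
target_share <- 0.25
s <- sqrt(target_share)  # linear factor is the square root of the area share

new_x <- shrink(sq_x, 1, s)
new_y <- shrink(sq_y, 1, s)

area_before <- poly_area(sq_x, sq_y)
area_after  <- poly_area(new_x, new_y)
```

In the pipeline above, this would amount to rescaling sqrt(somevalue / max(somevalue)) instead of somevalue itself, assuming the values are positive.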