Hi visualization lovers,
I am trying to create a color map plot, like this one:
(source: https://github.com/hrbrmstr/albersusa)
But I want the map to be distorted so that the area of each state is proportional to a value I provide (in particular, I use GDP).
In other words, I want some states to look bigger and some smaller than they are in reality, while still resembling the real USA map as much as possible.
It is fine if states move or their shapes get distorted.
Any ideas? Any ready solutions?
Currently I use R and the albersusa package because that is what I am familiar with. I'm open to change!
My current code for the plot is:
library(ggplot2)
library(ggthemes)  # provides theme_map()

# counties, cmap and smap are assumed to have been built beforehand
ggplot() +
  geom_map(data = counties@data, map = cmap,
           aes(fill = atan(y/x), alpha = x + y, map_id = name),
           color = "gray50") +
  geom_map(data = smap, map = smap,
           aes(x = long, y = lat, map_id = id),
           color = "black", size = .5, fill = NA) +
  theme_map(base_size = 12) +
  theme(plot.title = element_text(size = 16, face = "bold", margin = margin(b = 10))) +
  theme(plot.subtitle = element_text(size = 14, margin = margin(b = -20))) +
  theme(plot.caption = element_text(size = 9, margin = margin(t = -15), hjust = 0))
Here's a very ugly first try to get you started, using the outlines from the maps package and some data manipulation from dplyr.
library(maps)
library(dplyr)
library(ggplot2)

# Generate the base outlines
mapbase <- map_data("state.vbm")

# Load the centroids
data(state.vbm.center)

# Coerce the list to a dataframe, then add in state names
# Then generate some random value (or your variable of interest, like population)
# Then rescale that value to the range 0.25 to 0.95
df <- state.vbm.center %>% as.data.frame() %>%
  mutate(region = unique(mapbase$region),
         somevalue = rnorm(50),
         scaling = scales::rescale(somevalue, to = c(0.25, 0.95)))

# Join your centers and data to the full state outlines
df2 <- df %>%
  full_join(mapbase, by = "region")

# Within each state, scale the long and lat points to be closer
# to the centroid by the scaling factor
df3 <- df2 %>%
  group_by(region) %>%
  mutate(longscale = scaling*(long - x) + x,
         latscale = scaling*(lat - y) + y)

# Plot both the outlines for reference and the rescaled polygons
ggplot(df3, aes(long, lat, group = region, fill = somevalue)) +
  geom_path() +
  geom_polygon(aes(longscale, latscale)) +
  coord_fixed() +
  theme_void()
These outlines aren't the best, and the centroid positions they shrink toward cause the polygons to sometimes overlap the original state outline. But it's a start; you can find better shapes for US states and various centroid algorithms.
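One detail worth keeping in mind if the area (rather than the edge length) should be proportional to the value: pulling every vertex toward the centroid by a linear factor s changes the polygon's area by s^2, so the linear factor would be the square root of the desired area ratio. Below is a minimal base-R sketch of that relationship. The unit square, the `target_ratio` value, and the helper `poly_area()` (shoelace formula) are made up for illustration and are not part of the answer above.

```r
# Shoelace formula for the area of a simple polygon given its vertices in order
# (hypothetical helper, not from the maps or ggplot2 packages)
poly_area <- function(x, y) {
  n <- length(x)
  nxt <- c(2:n, 1)  # index of the next vertex, wrapping around
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Made-up "state": a unit square centred on (0.5, 0.5)
sq <- data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))
cx <- 0.5
cy <- 0.5

# Desired area as a fraction of the original area
target_ratio <- 0.25
scaling <- sqrt(target_ratio)  # linear factor is the square root of the area ratio

# Same transformation as in the answer: pull each vertex toward the centroid
sq$longscale <- scaling * (sq$long - cx) + cx
sq$latscale  <- scaling * (sq$lat - cy) + cy

area_before <- poly_area(sq$long, sq$lat)
area_after  <- poly_area(sq$longscale, sq$latscale)
```

With real data, `scaling` could be derived as `sqrt(value / max(value))` instead of rescaling the raw value linearly, although that is only one possible choice.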
So I have the following code which produces:
The issue here is twofold:
1. The grouped bar chart automatically places the highest value on top (i.e. for Avenue 4, CTP is on top), whereas I always want FTP to be shown first and CTP after it (so always the blue bar, then the red bar).
2. I need all of the values to scale to 100 or 100% within their respective group (so for CTP, Avenue 4 would have a huge bar but the other avenues would be extremely tiny).
I am new to R and Stack Overflow, so sorry if anything is wrong or you need more information, but any help is greatly appreciated.
library(tidyverse)  # dplyr, tidyr (pivot_longer) and ggplot2

# function to specify decimals
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
# sample data
avenues <- c("Avenue1", "Avenue2", "Avenue3", "Avenue4")
flytip_amount <- c(1000, 2000, 1500, 250)
collection_amount <- c(5, 15, 10, 2000)
# create data frame from the sample data
df <- data.frame(avenues, flytip_amount, collection_amount)
# got it working - now to test
df3 <- df
SumFA <- sum(df3$flytip_amount)
df3$FTP <- (df3$flytip_amount/SumFA)*100
df3$FTP <- specify_decimal(df3$FTP, 1)
SumCA <- sum(df3$collection_amount)
df3$CTP <- (df3$collection_amount/SumCA)*100
df3$CTP <- specify_decimal(df3$CTP, 1)
# Now we have percentages remove whole values
df2 <- df3[,c(1,4,5)]
df2 <- df2 %>% pivot_longer(-avenues)
FTGraphPos <- df2$name
ggplot(df2, aes(x = avenues, fill = as.factor(name), y = value)) +
geom_col(position = "dodge", width = 0.75) + coord_flip() +
labs(title = "Flytipping & Collection %", x = "ward_name", y = "Percentageperward") +
geom_text(aes(x= avenues, label = value), vjust = -0.1, position = "identity", size = 5)
I have tried the above and looked at lots of tutorials, but none of them covers exactly what I need: keeping the bars within each group in the same order regardless of their values, and scaling each group to 100/100%.
As Camille notes, to handle ordering of the categories in a plot, you need to set them as factors, and then use functions from the forcats package to handle the order. Here I am using fct_relevel() (note that it will automatically convert character variables to factors).
Your numeric values are in fact set to character, so they need to be set to numeric for the chart to make sense.
To cover point #2, I'm using group_by() to calculate percentages within each name.
I have also fixed the labels so that they are properly dodged along with the bar chart. Also, note that you don't need to call ggplot2 or magrittr if you are calling tidyverse - those packages come along with it already.
library(tidyverse)

df_plot <- df2 |>
  mutate(name = fct_relevel(name, "CTP"),
         value = as.numeric(value)) |>
  group_by(name) |>
  mutate(perc = value / sum(value)) |>
  ungroup()

ggplot(df_plot, aes(x = value, y = avenues, fill = name)) +
  geom_col(position = "dodge", width = 0.75) +
  geom_text(aes(label = value), position = position_dodge(width = 0.75), size = 5) +
  labs(title = "Flytipping & Collection %", x = "Percentageperward", y = "ward_name") +
  guides(fill = guide_legend(reverse = TRUE))
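For readers who want to check the within-group percentage step in isolation, here is a small base-R sketch of the same idea. It reuses the sample amounts from the question, but the data frame name `df_long` and the use of `ave()` are assumptions for illustration rather than part of the answer above.

```r
# Same sample amounts as in the question, written directly in long format
df_long <- data.frame(
  avenues = rep(c("Avenue1", "Avenue2", "Avenue3", "Avenue4"), times = 2),
  name    = rep(c("FTP", "CTP"), each = 4),
  value   = c(1000, 2000, 1500, 250, 5, 15, 10, 2000)
)

# Percentage of each value within its own group (FTP or CTP);
# ave() returns the group sum repeated for every row of that group
df_long$perc <- 100 * df_long$value / ave(df_long$value, df_long$name, FUN = sum)

# Fix the level order explicitly so it never depends on the values
df_long$name <- factor(df_long$name, levels = c("FTP", "CTP"))

# Each group should now add up to 100
group_totals <- tapply(df_long$perc, df_long$name, sum)
```

Which level should come first depends on the orientation of the bars: with horizontal bars the first level is drawn at the bottom of each group, which is presumably why the answer moves "CTP" to the front and reverses the legend.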
I am attempting to create heat maps from a large data set that has several factors. I'd like to get a bird's-eye view first by plotting the heat map of all values and all factors. Then I'd like to subset the heat map by a variety of factors, but have ggplot2::geom_tile re-calculate the heat map so that it plots the relative abundance based on whichever factors I've subsampled.
library(reshape2)  # melt()
library(dplyr)
library(ggplot2)

#Test data
df <- data.frame(
  Measurement = c(1:30),
  CA = rep(rnorm(30, mean=20, sd=5)),
  TX = rep(rnorm(30, mean=18, sd=5)),
  NY = rep(rnorm(30, mean=34, sd=2))
)
df.melt <- melt(df, id = c("Measurement"))
Basic heat map plot code. My actual data includes several factors/columns from which I want to pull data for various comparisons.
#Basic plot
ggplot(data = df.melt,
aes(x = variable, y = Measurement, colors = value, fill = value)) +
geom_tile(color = "black") +
scale_fill_gradientn(colors = c("lightyellow", "darkred"))
I want the output colors to correspond to relative abundance by measurement, so that I can look at relative changes across CA, TX, and NY. This would be my "base plot".
df.melt.reabun <- df.melt %>% group_by(Measurement) %>%
mutate(RelAbun = value/sum(value))
df.melt.reabun <- as.data.frame(df.melt.reabun)
#New plot with relative abundance
ggplot(data = df.melt.reabun,
aes(x = variable, y = Measurement, colors = RelAbun, fill = RelAbun)) +
geom_tile(color = "black") +
scale_fill_gradientn(colors = c("lightyellow", "darkred"))
What I also want is to be able to re-plot any subset I choose and have the relative abundance re-calculated automatically within the ggplot tile plot.
#Assign plot object
heat <- ggplot(data = df.melt.reabun,
aes(x = variable, y = Measurement, colors = RelAbun, fill = RelAbun)) +
geom_tile(color = "black")+
scale_fill_gradientn(colors = c("lightyellow", "darkred"))
#Select variable to subset data
alt <- c("CA", "TX")
#Subset ggplot object
heat %+% subset(df.melt.reabun, variable %in% alt)
But this output is incorrect, because it is only showing relative abundance from the calculation that included CA, TX, and NY.
I want the relative abundance to re-calculate every time I subset the df to plot at this step: heat %+% subset()
I have a feeling I can smoothly combine group_by and geom_tile to do this automatically.. but I can't quite figure it out. Any help would be appreciated. I have MANY MANY combinations of heat maps I want to look at and I do NOT want to re-calculate the relative abundance "manually" each time.
It's generally advisable to do your data wrangling before passing the data frame to ggplot. In this case, something like the following could work:
# Filter to the chosen variables, then recompute relative abundance per Measurement
subsetFun <- function(df, var.filter){
  return(df %>%
    filter(variable %in% var.filter) %>%
    group_by(Measurement) %>%
    mutate(RelAbun = value / sum(value)) %>%
    ungroup())
}

heat %+% subsetFun(df.melt.reabun, alt)
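To see that the shares really are recomputed for each subset, here is a dependency-free sketch of the same filter-then-normalise idea. The toy data, the seed, and the helper name `rel_abun_subset()` are invented for illustration. Only the column names (`Measurement`, `variable`, `value`, `RelAbun`) follow the question.

```r
set.seed(1)  # arbitrary seed so the made-up data are reproducible

# Long-format toy data with the same column names as df.melt
toy <- data.frame(
  Measurement = rep(1:3, times = 3),
  variable    = rep(c("CA", "TX", "NY"), each = 3),
  value       = runif(9, min = 10, max = 40)
)

# Hypothetical base-R helper: subset first, then recompute RelAbun per Measurement
rel_abun_subset <- function(df, keep) {
  out <- df[df$variable %in% keep, ]
  out$RelAbun <- out$value / ave(out$value, out$Measurement, FUN = sum)
  out
}

two_states <- rel_abun_subset(toy, c("CA", "TX"))
all_states <- rel_abun_subset(toy, c("CA", "TX", "NY"))

# Within each Measurement the shares add up to 1 for whichever subset was used
sums_two <- tapply(two_states$RelAbun, two_states$Measurement, sum)
sums_all <- tapply(all_states$RelAbun, all_states$Measurement, sum)
```

Because the normalisation happens after the filter, the CA and TX shares in `two_states` differ from those in `all_states`, which is the behaviour the question asks for.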
I have a set of text labels to be printed using ggplot at (x, y) locations, where only a subset of them overlap. I would like to keep the non-overlapping ones exactly where they are and repel only the ones that overlap. I know which ones these are: for example, the names of states in New England overlap while in the west nothing overlaps, so I want to keep the western state names where they are but repel the ones in New England. When I use geom_text_repel, it repels all of the text. If I plot the non-overlapping subset with geom_text and the rest with geom_text_repel, the repelled labels do not avoid the fixed ones, because they are in different layers. Is there a way to fix some subset of the text and repel the rest using geom_text_repel, or do I need to go for a completely different solution?
Here is an example:
library(tidyverse)
library(ggrepel)

# state centers by fixing Alaska and Hawaii to look good in our maps
df = data.frame(x = state.center$x, y = state.center$y, z = state.abb)
overlaps = c('RI', 'DE', 'CT', 'MA')

df %>%
  ggplot() +
  geom_point(aes(x, y), size = 1) +
  # plot the ones I would like to keep where they are
  # I want these right centered around the points
  geom_text(aes(x, y, label = z),
            data = df %>% filter(!z %in% overlaps),
            size = 4) +
  # plot the ones I would like to repel
  geom_text_repel(aes(x, y, label = z),
                  data = df %>% filter(z %in% overlaps),
                  size = 4,
                  min.segment.length = unit(0, "npc")) +
  coord_map()

df %>%
  ggplot() +
  geom_point(aes(x, y), size = 1) +
  # if we repel all instead
  geom_text_repel(aes(x, y, label = z),
                  size = 4,
                  min.segment.length = unit(0, "npc")) +
  coord_map()
I have data for several pixels of the pacific ocean (each pixel is a 250km^2 square of the ocean represented by a lat and lon value). I have the following code in R to draw my map with the minor breaks representing each pixel I have data for:
mp1 <- fortify(map(fill=TRUE, plot=FALSE))
mp2 <- mp1
mp2$long <- mp2$long + 360
mp2$group <- mp2$group + max(mp2$group) + 1
mp <- rbind(mp1, mp2)
ggplot(aes(x = long, y = lat, group = group), data = mp) +
geom_path() +
scale_x_continuous(name="Longitude",minor_breaks=seq(170,246,2), limits = c(170, 246)) +
scale_y_continuous(name= "Latitude",minor_breaks=seq(30,64,2),limits = c(30, 64))
which gives:
For each pixel that isn't landlocked I'd initially like to colour in, say blue, to represent ocean. I'd also like a legend explaining this. Eventually, once I have processed the data, I would like to colour it by the classification it will be given. Will this be possible? If it helps I have a matrix of all possible lon/lat combinations that I have pixels for so they would be available for referencing.
mp2 looks like it's derived from a shapefile or vector data; I'm assuming you have a second raster (pixel) data layer. You could try using geom_raster or geom_tile:
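# group is mapped only in geom_path, since the raster data has no group column
ggplot(aes(x = long, y = lat), data = mp) +
  geom_raster(data = rasterdat, aes(fill = colorvar)) +
  geom_path(aes(group = group)) +
  scale_x_continuous(name = "Longitude", minor_breaks = seq(170, 246, 2),
                     limits = c(170, 246)) +
  scale_y_continuous(name = "Latitude", minor_breaks = seq(30, 64, 2),
                     limits = c(30, 64))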
where rasterdat is your dataframe of raster data and colorvar is some field that is used to specify the color. I'm also assuming that you can reshape your raster data so that it has longitude and latitude columns named like those in mp (long and lat). Note that you should add the geom_raster layer BEFORE geom_path, so that the map lines are drawn on top.
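As a rough sketch of what such a `rasterdat` could look like, the grid of lon/lat combinations can be built with `expand.grid()`. This assumes the pixel centres sit on the same 2-degree grid as the minor breaks in the question, which may not match the real data. The single "ocean" class is a placeholder.

```r
# Cell positions on the same 2-degree grid used for the minor breaks
# (assumption: the real pixel centres may be offset from these values)
rasterdat <- expand.grid(
  long = seq(170, 246, by = 2),
  lat  = seq(30, 64, by = 2)
)

# Placeholder classification: every cell labelled "ocean" for now.
# Landlocked cells would need to be dropped or relabelled from your own data.
rasterdat$colorvar <- factor("ocean")

n_cells <- nrow(rasterdat)
```

Because `colorvar` is a factor, mapping it to `fill` produces a discrete legend. Adding something like `scale_fill_manual(values = c(ocean = "blue"))` would then set the colour, and further classes can be added to the factor later.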
I would like to create a colour blind test, similar to that below, using ggplot.
The basic idea is to use geom_hex (or perhaps a voronoi diagram, or possibly even circles as in the figure above) as the starting point, and define a dataframe that, when plotted in ggplot, produces the image.
We would start by creating a dataset, such as:
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
then plot this:
ggplot(df, aes(x, y)) +
  geom_hex() +
  coord_equal() +
  scale_fill_gradient(low = "red", high = "green", guide = "none")
which gives the image below:
The main missing step is to create a dataset that actually plots a meaningful symbol (letter or number), and I'm not sure how best to go about this without painstakingly mapping the coordinates. Ideally one would be able to read in the coordinates perhaps from an image file.
Finally, a bit of tidying up could round the plot edges by removing the outlying points.
All suggestions are very welcome!
Getting a little closer to what I'm after, we can use the image below of the letter 'e':
Using the imager package, we can read this in and convert it to a dataframe:
img <- imager::load.image("e.png")
df <- as.data.frame(img)
then plot that dataframe using geom_raster:
ggplot(df, aes(x, y)) +
  geom_raster(aes(fill = value)) +
  coord_equal() +
  scale_y_continuous(trans = scales::reverse_trans()) +
  scale_fill_gradient(low = "red", high = "green", guide = "none")
If we use geom_hex instead of geom_raster, we can get the following plot:
ggplot(df %>% filter(value %in% 1), aes(x, y)) +
  geom_hex() +
  coord_equal() +
  scale_y_continuous(trans = scales::reverse_trans()) +
  scale_fill_gradient(low = "red", high = "green", guide = "none")
so, getting there but clearly still a long way off...
Here's an approach for creating this plot:
Packages you need (png and fields are also used below, called with ::):
library(tidyverse)
library(packcircles)
Get image into a 2D matrix (x and y coordinates) of values. To do this, I downloaded the .png file of the e as "e.png" and saved in my working directory. Then some processing:
img <- png::readPNG("e.png")
# From http://stackoverflow.com/questions/16496210/rotate-a-matrix-in-r
rotate <- function(x) t(apply(x, 2, rev))
# Convert to one colour layer and rotate it to be in right direction
img <- rotate(img[,,1])
# Check that matrix makes sense:
image(img)
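To make the effect of `rotate()` concrete, here is the same one-liner applied to a small made-up matrix standing in for one colour channel. It reverses each column and then transposes, which amounts to a 90-degree clockwise rotation.

```r
rotate <- function(x) t(apply(x, 2, rev))

# Small made-up matrix standing in for one colour channel of the image
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

r <- rotate(m)
#      [,1] [,2]
# [1,]    2    1
# [2,]    4    3
# [3,]    6    5
```

Note that the dimensions swap (2 x 3 becomes 3 x 2), which is worth remembering when `nrow(img)` and `ncol(img)` are used for plot limits later on.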
Next, create a whole lot of circles! I did this based on this post.
# Create random "circles"
ncircles <- 1200
offset <- 100
rmax <- 80
x_limits <- c(-offset, ncol(img) + offset)
y_limits <- c(-offset, nrow(img) + offset)
xyr <- data.frame(
x = runif(ncircles, min(x_limits), max(x_limits)),
y = runif(ncircles, min(y_limits), max(y_limits)),
r = rbeta(ncircles, 1, 10) * rmax)
# Find non-overlapping arrangement
res <- circleLayout(xyr, x_limits, y_limits, maxiter = 1000)
cat(res$niter, "iterations performed")
#> 1000 iterations performed
# Convert to data for plotting (just circles for now)
plot_d <- circlePlotData(res$layout)
# Check circle arrangement
ggplot(plot_d) +
  geom_polygon(aes(x, y, group = id), colour = "white", fill = "skyblue") +
  coord_fixed()
Finally, interpolate the image pixel values for the centre of each circle. This will indicate whether a circle is centered over the shape or not. Add some noise to get variance in colour and plot.
# Get x,y positions of centre of each circle
circle_positions <- plot_d %>%
group_by(id) %>%
summarise(x = min(x) + (diff(range(x)) / 2),
y = min(y) + (diff(range(y)) / 2))
# Interpolate on original image to get z value for each circle
circle_positions <- circle_positions %>%
  mutate(
    z = fields::interp.surface(
      list(x = seq(nrow(img)), y = seq(ncol(img)), z = img),
      as.matrix(.[, c("x", "y")])),
    z = ifelse(is.na(z), 1, round(z))  # 1 is the "empty" area shown earlier
  )
# Add a little noise to the z values
circle_positions <- circle_positions %>%
mutate(z = z + rnorm(n(), sd = .1))
# Bind z value to data for plotting and use as fill
plot_d %>%
  left_join(select(circle_positions, id, z)) %>%
  ggplot(aes(x, y, group = id, fill = z)) +
  geom_polygon(colour = "white", show.legend = FALSE) +
  scale_fill_gradient(low = "#008000", high = "#ff4040") +
  coord_fixed()
#> Joining, by = "id"
To get the colours right, tweak the low and high values in scale_fill_gradient.