How to fix overlapping hexagons with geom_hex() and ggplot()? - r

I am plotting some data: basically, I have coordinates and a value for each point. I wanted to make a hexagon map, with each hexagon showing the average of all the point values that fall inside it.
I managed to produce the map, but some of the hexagons overlap and I am not sure how to fix it.
Here is my code:
pp = ggplot(df, aes(x = lon, y = lat, fill=value, group=value)) +
  geom_hex(bins = 50, linewidth = 10)
pp
And the plot:

If you want the hexagons to be colored according to the average value, you will need stat_summary_hex, passing the numeric value to the z aesthetic, which by default is averaged in each hex bin.
Don't group by value - this effectively creates a layer of hexbins for each value, and this is what leads to the bins being in different positions in each group. Also, the values can't be averaged if they are in different groups.
library(ggplot2)
ggplot(df, aes(x = lon, y = lat)) +
  stat_summary_hex(aes(z = as.numeric(as.character(value))),
                   bins = 50, linewidth = 10) +
  scale_fill_gradientn(colors = scales::hue_pal()(5))
Note that, at the time of writing, the CRAN release of ggplot2 had an issue with hex-binning, and you needed to install the development version to get a decent result here. See this question for further details.
Created on 2023-01-04 with reprex v2.0.2
Data used
set.seed(1)
df <- data.frame(lon = rnorm(1000, 5.5), lat = rnorm(1000, 52.5),
                 value = factor(sample(0:4, 1000, TRUE)))
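Two details of the answer can be seen in a small sketch that reuses the example data. First, as.numeric() applied directly to a factor returns its internal level codes (1 to 5 here) rather than the labels 0 to 4, which is why the answer goes through as.character(). Second, stat_summary_hex() takes a fun argument if you want a summary other than the mean. The value_num column and the choice of median are illustrative only.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(lon = rnorm(1000, 5.5), lat = rnorm(1000, 52.5),
                 value = factor(sample(0:4, 1000, TRUE)))

# as.numeric() on a factor gives the level codes (1-5), not the labels (0-4)
codes  <- as.numeric(df$value)
values <- as.numeric(as.character(df$value))

# Convert once up front (illustrative column name), then pass it to z
df$value_num <- values

# fun = median replaces the default mean within each hex bin
p_median <- ggplot(df, aes(x = lon, y = lat, z = value_num)) +
  stat_summary_hex(fun = median, bins = 50) +
  labs(fill = "median value")
```

Because every hex is computed from all points in one group (there is no group = value mapping), the bins share a single grid and do not overlap.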

Related

ggplot - extract the values associated with breaks in a continuous colour scale

I would like to extract the breaks and the colour values associated with a ggplot continuous colour scale. There are multiple answers on finding the colour associated with each data point (like this), which can also be used to get discrete scale values, but I haven't seen an approach for a continuous colour scale. I don't want to force the scales, just retrieve the values that ggplot generates.
Example:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10, col = 11:20)
ggplot(df) +
  geom_point(aes(x = x, y = y, colour = col))
I would like to get a data frame showing breaks (12.5, 15, 17.5, 20) and the colour values associated with them.
Many thanks!
There are two ways of doing this: one that builds the plot and one that does not.
If we build the plot:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10, col = 11:20)
ggplot(df) +
  geom_point(aes(x = x, y = y, colour = col))
We can extract the scale and use it to retrieve the relevant information.
# Using build plot
build <- ggplot_build(last_plot())
scale <- build$plot$scales$get_scales("colour")
breaks <- scale$get_breaks()
colours <- scale$map(breaks)
data.frame(breaks = breaks, colours = colours)
#> breaks colours
#> 1 NA grey50
#> 2 12.5 #1D3F5E
#> 3 15.0 #2F638E
#> 4 17.5 #4289C1
#> 5 20.0 #56B1F7
Alternatively, we can skip building the plot and use the scale directly, provided we 'train' it by showing it the range of the data.
scale <- scale_colour_continuous()
scale$train(range(df$col))
breaks <- scale$get_breaks()
colours <- scale$map(breaks)
data.frame(breaks = breaks, colours = colours)
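As you can see, the default breaks algorithm produces a break outside the range of the data, which get_breaks() returns as NA and map() colours with the scale's na.value (grey50). If you want to use this information later on, it is probably best to filter those rows out.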
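Filtering those rows might look like the following sketch. It calls scale_colour_gradient() explicitly instead of scale_colour_continuous(); that is an assumption made so the palette is fixed on the scale object itself. With default settings both should give the same dark-to-light blue gradient.

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10, col = 11:20)

# Train the scale on the data range, then ask it for breaks and colours
scale <- scale_colour_gradient()
scale$train(range(df$col))
breaks  <- scale$get_breaks()
colours <- scale$map(breaks)

key <- data.frame(breaks = breaks, colours = colours)
# Breaks outside the trained range come back as NA: drop them
key <- key[!is.na(key$breaks), ]
key
```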

How do I correct the scale and order of the y axis in R on a barplot

I am working with borehole data and attempting to plot the cross section with R. I'm rusty and am having trouble organizing the plot the way I want. As the image shows, my bar plot does not follow the y axis values giving the depth of the borehole; instead it follows the layers (categorical data).
A very similar question was asked here, but I could not get the code to work for my situation because my data is formatted differently.
Just to clarify: I want the y axis in increasing numerical order, starting at 0, with the categorical layer data mapped to the correct part of that depth.
My code:
g2 <- ggplot(data = df3,
             mapping = aes(x = PointID, y = End_Depth,
                           fill = `Layer`)) +
  geom_col(colour = "black") +
  labs(y = "Depth")
The Data
The question you were pointing to contains a very good idea: use geom_rect instead. You could do something like the following (see the comments in the code):
library(tidyverse)
# just some fake data, similar to yours
foo <- data.frame(id = "id", layer = letters[1:6], depth = c(5, 10, 12, 15, 20, 25))
foo2 <-
  foo %>%
  # using lag to create ymin, which is needed for geom_rect
  # transforming id into integers so I can add / subtract some x for xmin/xmax
  mutate(ymin = lag(depth, default = 0),
         id_int = as.integer(factor(id)))
# I am turning off the legend and labelling the layers directly instead
# using geom_text
# this creates a "wrong" y axis title which I'm changing with labs(y = ... )
# the continuous x axis needs to be turned into a fake discrete axis by
# semi-manually setting the breaks and labels
ggplot(foo2) +
  geom_rect(aes(xmin = id_int - .5, xmax = id_int + .5,
                ymin = ymin, ymax = depth,
                fill = layer), show.legend = FALSE) +
  geom_text(aes(x = id_int, y = (depth + ymin) / 2, label = layer)) +
  scale_x_continuous(breaks = foo2$id_int, labels = foo2$id) +
  labs(y = "depth")
Created on 2021-10-19 by the reprex package (v2.0.1)
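The answer above uses a single borehole. With several boreholes (such as the PointID column in the question), lag() should run within each borehole; otherwise each bore's first layer would start at the previous bore's bottom instead of at 0. Here is a sketch under assumptions: the bores data frame and its id, layer and depth columns are hypothetical stand-ins for df3, and depth is taken to be the bottom of each layer, listed top to bottom.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: two boreholes, layers listed top to bottom,
# `depth` is the bottom of each layer
bores <- data.frame(
  id    = rep(c("BH1", "BH2"), each = 3),
  layer = c("clay", "sand", "gravel", "clay", "silt", "sand"),
  depth = c(4, 9, 15, 3, 7, 12)
)

bores2 <- bores %>%
  group_by(id) %>%
  # lag within each borehole so every bore starts at depth 0
  mutate(ymin = lag(depth, default = 0)) %>%
  ungroup() %>%
  mutate(id_int = as.integer(factor(id)))

p <- ggplot(bores2) +
  geom_rect(aes(xmin = id_int - 0.4, xmax = id_int + 0.4,
                ymin = ymin, ymax = depth, fill = layer),
            colour = "black") +
  scale_x_continuous(breaks = unique(bores2$id_int),
                     labels = levels(factor(bores2$id))) +
  # optional: put 0 at the top, as in a borehole log; drop for 0 at the bottom
  scale_y_reverse() +
  labs(x = "PointID", y = "Depth")
```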

ggplot | Legend not filled

This question is different from this one and this one in that I do get a legend. However, the legend keys only have coloured borders; their interiors are not filled.
I am fairly sure there is an easy argument I forgot to specify, but I have been looking for it for about 2 hours now and am slowly going crazy. Maybe you can help.
First, a reproducible example:
library(ggplot2)
set.seed(1234)
# This only generates coordinates: in my case 72 latitudes and 144 longitudes. Ignore it if you want.
df <- data.frame(lon = rep(1:144/144*360-180, 72),
                 lat = rep(1:72/72*180-90, each = 144),
                 val = runif(72*144))
world <- map_data("world")
# This function is supposed to colour all points where the val column is less than a specified value in blue, and the others in red.
#' @param cutoff a cutoff value: all values less than it are plotted in blue and all other values are plotted in red
#' @return A plot object of a map of the world where the colour indicates the value
#' @export
plot_with_cutoff = function(df, cutoff = quantile(df$val, 0.05)){
  df$indicator <- as.factor(ifelse(df$val < cutoff, 0, 1))
  plot <- ggplot() + geom_tile(data = df, mapping = aes(x = lon, y = lat, color = indicator)) +
    coord_fixed(1.4) + scale_color_manual("Higher or lower than 5% quantile", values = c("blue", "red")) +
    geom_polygon(data = world, aes(x = long, y = lat, group = group), fill = NA, color = "black")
  return(plot)
}
plot_with_cutoff(df)
As you can see, the function works as requested (it's not nice to look at, but it is only an example; it looks this way because the data is randomly generated, which it usually isn't). But look at the legend!
I don't know how to get the "squares" to be filled, and I honestly have no idea what else to do, so any help is greatly appreciated! Thanks in advance!
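No answer is included here, but the likely cause is the mapping itself: for geom_tile(), colour is the tile outline and fill is the interior, so mapping indicator to colour only colours the borders of the tiles and of the legend keys. A minimal sketch of the change maps indicator to fill and uses scale_fill_manual(); plot_with_cutoff_fill is a hypothetical name, and the world outline layer is left out only to keep the example free of the maps dependency (it can be added back unchanged).

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(lon = rep(1:144/144*360-180, 72),
                 lat = rep(1:72/72*180-90, each = 144),
                 val = runif(72*144))

# Hypothetical variant of plot_with_cutoff() using fill instead of colour
plot_with_cutoff_fill <- function(df, cutoff = quantile(df$val, 0.05)) {
  df$indicator <- factor(ifelse(df$val < cutoff, 0, 1))
  ggplot() +
    # fill colours the tile interior, so the legend keys are filled too
    geom_tile(data = df, aes(x = lon, y = lat, fill = indicator)) +
    scale_fill_manual("Higher or lower than 5% quantile",
                      values = c("blue", "red")) +
    coord_fixed(1.4)
}

p <- plot_with_cutoff_fill(df)
```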

Small ggplot2 plots placed on coordinates on a ggmap

I would like to first use ggmap to plot a specific area with longitude and latitude as axes.
Then I would like to put small ggplot2 plots on the specific locations, given their longitude and latitude. These can be barplots with minimal theme.
My database may have the columns:
1. town
2. longitude
3. latitude
4-6. value A, B, C
I generate a plot (pseudocode):
p <- ggmap(coordinates)
and I have my minimal ggplot2 design:
q<-ggplot2()+geom_bar(....)+ ... x-axis null y axis null minimal template
How to combine the two designs to have a ggmap with small minimal ggplot plots imposed on specific coordinates of the map?
Here's one I did using pie charts as points on a scatterplot. You can use the same concept to put barcharts on a map at specific lat/long coordinates.
R::ggplot2::geom_points: how to swap points with pie charts?
Needs further update. Some of the code used was abbreviated from another answer, which has since been deleted. If you find this answer via a search engine, drop a comment and I'll get around to fleshing it back out.
Updated:
Using mostly your adapted code from your answer, but I had to update a few lines.
p <- ggmap(Poland) + coord_quickmap(xlim = c(13, 25), ylim = c(48.8, 55.5), expand = FALSE)
This change makes a better projection and eliminates the warnings about duplicated scales.
df.grobs <- df %>%
  do(subplots = ggplot(., aes(1, value, fill = component)) +
       geom_col(position = position_dodge(width = 1),
                alpha = 0.75, colour = "white") +
       geom_text(aes(label = round(value, 1), group = component),
                 position = position_dodge(width = 1),
                 size = 3) +
       theme_void() + guides(fill = "none")) %>%
  mutate(subgrobs = list(annotation_custom(ggplotGrob(subplots),
                                           xmin = lon - 0.5, ymin = lat - 0.5,
                                           xmax = lon + 0.5, ymax = lat + 0.5)))
Here I explicitly specified the dodge width for your geom_col so I could match it with geom_text. I used round(value, 1) for the label aesthetic, and it automatically inherits the x and y aesthetics from the subplots = ggplot(...) call. I also manually set the size to be quite small, so the labels would fit, but then I increased the overall bounding box for each subgrob, from 0.35 to 0.5 in each direction.
df.grobs %>%
  {p +
      .$subgrobs +
      geom_text(data = df, aes(label = name), vjust = 3.5, nudge_x = 0.065, size = 2) +
      geom_col(data = df,
               aes(Inf, Inf, fill = component),
               colour = "white")}
The only change I made here was for the aesthetics of the "ghost" geom_col. When they were set to 0,0 they weren't plotted at all since that wasn't within the x and y limits. By using Inf,Inf they're plotted at the far upper right corner, which is enough to make them invisible, but still plotted for the legend.
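Because the original data and the ggmap() base layer are not shown (ggmap needs downloaded map tiles), here is a self-contained sketch of the same annotation_custom() idea on a plain ggplot base. The towns data frame and the make_inset() helper are hypothetical; in real use, replace the base plot with your ggmap() object.

```r
library(ggplot2)

# Hypothetical towns with three values each (columns as in the question)
towns <- data.frame(
  town = c("A", "B"),
  lon  = c(16.9, 21.0),
  lat  = c(52.4, 52.2),
  A = c(3, 5), B = c(4, 2), C = c(1, 3)
)

# Hypothetical helper: one minimal bar chart per town, wrapped as an
# annotation layer centred on that town's coordinates
make_inset <- function(row, half = 0.5) {
  vals <- data.frame(component = c("A", "B", "C"),
                     value = c(row$A, row$B, row$C))
  bar <- ggplot(vals, aes(component, value, fill = component)) +
    geom_col() +
    theme_void() +
    guides(fill = "none")
  annotation_custom(ggplotGrob(bar),
                    xmin = row$lon - half, xmax = row$lon + half,
                    ymin = row$lat - half, ymax = row$lat + half)
}

insets <- lapply(seq_len(nrow(towns)), function(i) make_inset(towns[i, ]))

# Plain base plot standing in for ggmap(); invisible points set the data,
# and adding a list of layers adds each annotation in turn
p <- ggplot(towns, aes(lon, lat)) +
  geom_point(alpha = 0) +
  coord_quickmap(xlim = c(13, 25), ylim = c(48.8, 55.5), expand = FALSE) +
  insets
```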

Coordinate points appropriately sized in ggplot2

I have sampled 10,000 coordinates (out of around 130,000 points) from my data in this file:
https://www.dropbox.com/s/40hfyx6a5hsjuv7/data.csv
I am trying to plot these points on the Americas map using ggplot2. Here is my code.
library(ggplot2)
library(maps)
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot(data = data_coords, legend = FALSE) +
  geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
  geom_point(aes(x = lon, y = lat), shape = 19, size = 0.00001,
             alpha = 0.3, colour = "red") +
  theme(panel.grid.major = element_blank()) +
  theme(panel.grid.minor = element_blank()) +
  theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
  theme(axis.ticks = element_blank()) +
  xlab("") + ylab("")
png("my_plot.png", width = 8000, height = 7000, res = 1000)
print(p)
dev.off()
The points seem to cover the whole area in which they were plotted. I would like them to be smaller, to better represent a location. You can see that I've set the size to 0.00001. I was just trying to see if it has any effect, but it doesn't seem to help beyond a certain limit. Is this the best that is possible at this resolution, or could the points be made smaller?
Previously I plotted around 400,000 points, but only on a US map, and they looked much better, as below. I am hoping to get something like this. Thanks.
https://www.dropbox.com/s/8d0niu9g6ygz0wo/Clusters_reduced.png
Try playing with very small values of alpha, instead of the point size:
http://docs.ggplot2.org/0.9.3.1/geom_point.html
# Varying alpha is useful for large datasets
d <- ggplot(diamonds, aes(carat, price))
d + geom_point(alpha = 1/1000)
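A likely reason size stops having an effect is that geom_point() adds the stroke width (0.5 by default) to the drawn size of shape 19, so the marks cannot shrink below that outline. A sketch of two ways around it: set stroke = 0, or use shape = "." which draws a tiny rectangle (typically about one pixel). data_coords here is random stand-in data, not the question's file.

```r
library(ggplot2)

# Stand-in for data_coords: random points in the question's lon/lat box
set.seed(42)
data_coords <- data.frame(lon = runif(10000, -170, -30),
                          lat = runif(10000, -60, 75))

# Option 1: remove the stroke so only `size` determines the mark
p_small <- ggplot(data_coords, aes(lon, lat)) +
  geom_point(shape = 19, size = 0.1, stroke = 0, alpha = 0.3, colour = "red")

# Option 2: shape "." draws a single tiny rectangle per point
p_pixel <- ggplot(data_coords, aes(lon, lat)) +
  geom_point(shape = ".", alpha = 0.3, colour = "red")
```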
Edit:
Additional ideas are given in the documentation. Here's a summary:
Details
The scatterplot is useful for displaying the relationship between two continuous variables, although it can also be used with one continuous and one categorical variable, or two categorical variables. See geom_jitter for possibilities.
The bubblechart is a scatterplot with a third variable mapped to the size of points. There are no special names for scatterplots where another variable is mapped to point shape or colour, however.
The biggest potential problem with a scatterplot is overplotting: whenever you have more than a few points, points may be plotted on top of one another. This can severely distort the visual appearance of the plot. There is no one solution to this problem, but there are some techniques that can help. You can add additional information with stat_smooth, stat_quantile or stat_density2d. If you have few unique x values, geom_boxplot may also be useful. Alternatively, you can summarise the number of points at each location and display that in some way, using stat_sum.
Another technique is to use transparent points, geom_point(alpha = 0.05).
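The suggestion to summarise the number of points at each location could look like the following sketch. It bins the points into a grid with stat_bin_2d() rather than using stat_sum(), which only counts points at identical coordinates; the stand-in data and bins = 100 are arbitrary choices, not taken from the question.

```r
library(ggplot2)

# Stand-in for data_coords: random points in the question's lon/lat box
set.seed(42)
data_coords <- data.frame(lon = runif(10000, -170, -30),
                          lat = runif(10000, -60, 75))

# Count points per rectangular bin; the count is mapped to fill by default
p_bins <- ggplot(data_coords, aes(lon, lat)) +
  stat_bin_2d(bins = 100) +
  labs(fill = "points per bin")
```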
Edit 2:
Combining the details from the manual with the hints in "Transparency and Alpha levels for ggplot2 stat_density2d with maps and layers in R", this might look like the solution:
library(ggplot2)
library(maps)
data_coords <- read.csv("C:/Downloads/data.csv")
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot(data = data_coords) +
  geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
  stat_density2d(data = data_coords,
                 aes(x = lon, y = lat, fill = as.factor(after_stat(level))),
                 linewidth = 1, bins = 10, geom = "polygon") +
  scale_fill_manual(values = c("yellow", "red", "green", "royalblue", "black",
                               "white", "orange", "brown", "grey"))
png("my_plot2k.png", width = 2000, height = 2000, res = 500)
print(p)
dev.off()
Resulting image (not the best colour palette used):
