Replicating a hexagon heatmap in ggmap [R]

I need to replicate the heatmap shown at https://rpubs.com/chrisbrunsdon/gwdplyr.
I tried the same approach, but I cannot reproduce the final hexagon-colored heatmap.
My main issue is that I can't even create the uncolored map: with st_buffer() and st_make_grid() it takes a very long time, and the result cannot be stored because the object is very large.
Below is an example with the locations I need to plot the heatmap for; the values are discrete and should be used as the color in the heatmap.
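library(sf)
library(dplyr)
library(ggplot2)
library(ggspatial)
# Example points: x/y are longitude/latitude in degrees (around Prague)
df <- data.frame(
  x = runif(1000, min = 14.22, max = 14.71),
  y = runif(1000, min = 49.94, max = 50.18),
  value = sample(1:8, 1000, replace = TRUE)  # one discrete value per point
)
# Note: EPSG:27700 is British National Grid (units: metres),
# which does not match the lon/lat degrees above
hp <- df %>%
  st_as_sf(coords = c("x", "y"), crs = 27700)
# Buffer every point by 1 unit, then cover the buffers' bounding box with hexagons
gl_hexes <- st_buffer(hp, 1) %>%
  st_make_grid(cellsize = c(0.0005, 0.0005), square = FALSE) %>%
  st_sf() %>%
  mutate(hex_ID = sprintf('Hex%04d', row_number()))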
Any advice would be appreciated.
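A likely reason for the slowness: with `crs = 27700` the coordinates are treated as metres, `st_buffer(hp, 1)` adds 1 unit around each point, and covering the resulting bounding box (roughly 2.5 by 2.2 units) with hexagons of cellsize 0.0005 produces on the order of tens of millions of cells. Below is a minimal sketch of an alternative. It assumes the coordinates are WGS84 longitude/latitude (EPSG:4326) and reprojects them to UTM zone 33N (EPSG:32633) so the hexagon size can be given in metres. The 1000 m cellsize and summarising each hexagon by the rounded mean of its values are illustrative choices, not taken from the linked post.

```r
library(sf)
library(dplyr)
library(ggplot2)

set.seed(42)
# Example points around Prague, as in the question
df <- data.frame(
  x = runif(1000, min = 14.22, max = 14.71),
  y = runif(1000, min = 49.94, max = 50.18),
  value = sample(1:8, 1000, replace = TRUE)
)

# Assumption: x/y are WGS84 lon/lat; project to UTM 33N so distances are metres
pts <- st_as_sf(df, coords = c("x", "y"), crs = 4326) %>%
  st_transform(32633)

# Hexagonal grid over the points' bounding box (cellsize in metres, arbitrary)
hexes <- st_make_grid(pts, cellsize = 1000, square = FALSE) %>%
  st_sf() %>%
  mutate(hex_ID = sprintf("Hex%04d", row_number()))

# Attach each point to the hexagon it falls in, then summarise per hexagon
hex_values <- st_join(pts, hexes, join = st_within) %>%
  st_drop_geometry() %>%
  group_by(hex_ID) %>%
  summarise(value = round(mean(value)), n = n(), .groups = "drop")

# Keep only hexagons that contain at least one point
hex_map <- inner_join(hexes, hex_values, by = "hex_ID")

ggplot(hex_map) +
  geom_sf(aes(fill = factor(value)), colour = NA) +
  scale_fill_viridis_d(name = "value")
```

Only a few thousand hexagons are created here, so the grid builds quickly. A mode or a continuous mean (with `scale_fill_viridis_c()`) could replace the rounded mean.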

Related

Plotting large sf dataframe

I need to work with a 10-arc-minute (1/12 degree) global grid of all land areas. The grid is generated in R using the sf package and is to be limited to the land areas of the world. A grid ID is needed for further downstream analysis. The code generating the grid is below:
library(tidyverse)
library(sf)
sf_use_s2(FALSE)
birds <- st_read('BOTW_breeding_valid_union.gpkg') ## This is just an example shapefile I use to set a bbox.
bbox <- st_bbox(birds)
bbox[1] <- -180
bbox[2] <- -90
bbox[3] <- 180
bbox[4] <- 90
bbox <- bbox %>% st_as_sfc
grid <- st_make_grid(bbox, cellsize = 1/12) %>%
  st_as_sf() %>%
  mutate(grid_ID = row_number())
land <- st_read('ne_10m_land.shp')
land_grids <- st_intersects(grid, land) %>%
  as.data.frame() %>%
  rename(grid_ID = row.id)
grid <- grid %>%
  left_join(land_grids, by = "grid_ID") %>%
  filter(col.id == "1") %>%
  select(grid_ID) %>%
  st_write('global_10m_grid.gpkg')
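One thing worth checking in the filtering step: `filter(col.id == "1")` keeps only cells that intersect the first feature of `land`. If the land layer contains more than one feature, a sketch that keeps every cell touching any land feature uses `lengths(st_intersects())` directly and needs no join. The two square "land" polygons and the small 0.5-degree grid below are hypothetical stand-ins for `ne_10m_land.shp` and the global 1/12-degree grid.

```r
library(sf)
library(dplyr)

sf_use_s2(FALSE)  # planar predicates on lon/lat, as in the question

# Hypothetical stand-in for ne_10m_land.shp: two separate land features
land <- st_sf(
  id = 1:2,
  geometry = st_sfc(
    st_polygon(list(rbind(c(0.1, 0.1), c(0.9, 0.1), c(0.9, 0.9), c(0.1, 0.9), c(0.1, 0.1)))),
    st_polygon(list(rbind(c(2.1, 2.1), c(2.9, 2.1), c(2.9, 2.9), c(2.1, 2.9), c(2.1, 2.1)))),
    crs = 4326
  )
)

# Small 0.5-degree grid over a 3 x 3 degree box
bbox <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3), crs = st_crs(4326)))
grid <- st_make_grid(bbox, cellsize = 0.5) %>%
  st_as_sf() %>%
  mutate(grid_ID = row_number())

# Keep each cell that intersects any land feature (one row per cell)
land_grid <- grid[lengths(st_intersects(grid, land)) > 0, ]
# st_filter(grid, land) is an equivalent shorthand

# For comparison: restricting to the first feature finds only part of the land
first_only <- grid[lengths(st_intersects(grid, land[1, ])) > 0, ]
```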
Now I need to plot it, both to inspect it and for further data mapping (the grid cells will have values). I use the tmap package:
grid <- st_read('global_10m_grid.gpkg')
bitmap('test_grid.png')
tm_shape(grid) + tm_fill(col = 'red')
dev.off()
However, I am struggling because of the size: on a personal machine it takes an extremely long time to load (or so I hope, as it hasn't finished loading yet), and on a cluster with an interactive shell dev.off() produced an empty file.
Is there a way to plot this more efficiently?
Yes, rasterising was indeed the solution. Keeping the resolution of the vector grid in the raster results in files just under 11 MB, which open without problems in a normal RStudio session on my desktop.
For future reference, the code looks like this:
g <- st_read('yourfile.gpkg')
library(stars)
g %>% left_join(df, by = 'grp') %>% select(value) %>% st_rasterize(n = 2773927) %>% write_stars('filename.tif')
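Since `df` and `grp` in the answer refer to the asker's own data, here is a small self-contained sketch of the same idea. It assumes one raster pixel per grid cell is wanted and passes `dx`/`dy` (the 1/12-degree cell size) through `st_rasterize()` to `st_as_stars()` instead of a total cell count `n`. The grid and its values are made up for illustration.

```r
library(sf)
library(stars)

# Made-up example: a 2 x 1 degree box covered by 1/12-degree cells, each with a value
bb <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 2, ymax = 1), crs = st_crs(4326)))
cells <- st_sf(geometry = st_make_grid(bb, cellsize = 1/12))
cells$value <- seq_len(nrow(cells))

# Rasterise the 'value' attribute; dx/dy set the pixel size to the cell size
r <- st_rasterize(cells["value"], dx = 1/12, dy = 1/12)

plot(r)  # quick visual check; write_stars(r, "filename.tif") saves it as GeoTIFF
```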

Diagram with arranged countries polygons

I am looking for a way to draw a special diagram from the following data set (below). The diagram has to contain the polygons of the countries in the data set, but (1) they have to be placed next to each other, not by their longitude/latitude coordinates; (2) the size of each country's polygon has to correspond to its relative size, which is the t$rs variable (calculated as shown below); (3) the fill color of each country depends on the value of t$value: green if it is positive and red if it is negative.
My code so far is:
library(ggmap)
library(dplyr)
library(sf)
library(tidyverse)
library(gridExtra)
library(rgdal)
library(tmap)
# The data set
t <- data.frame(id = c(136,142,172,567),
name = c("Italy","Norway","Finland","Singapore"),
value = c(-0.921253632,245.6713064,4.049413648,207.5896534))
# Min value in absolute terms
min = min(abs(t$value))
# Relative value w.r.t. min value
t$rs <- t$value / min
# Shape files for countries
# https://hub.arcgis.com/datasets/esri::world-countries-generalized/about
# Unzip files from the ArcGIS archive file
unzip("World_Countries_(Generalized).zip", exdir = ".")
shp.file <- "World_Countries__Generalized_.shx"
countries <- readOGR(shp.file)
qtm(countries[countries$COUNTRY %in% t$name,])
My output diagram is attached; it is still far from the desired shape.
I adjusted your data import slightly using st_read() from the sf package:
library(tidyverse)
library(sf)
library(tmap)
# Unzip files from the ArcGIS archive file
unzip("World_Countries_(Generalized).zip", exdir = ".")
shp.file <- "World_Countries__Generalized_.shx"
countries <- st_read(shp.file)
countries %>%
left_join(t, by = c("COUNTRY" = "name")) %>%
filter(!is.na(id)) %>%
st_as_sf() %>%
tm_shape() +
tm_fill("value") +
tm_facets(by = "COUNTRY")
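The faceted map still draws each country at its own scale and position. A rough sketch of requirements (1) to (3), assuming it is acceptable to shift and rescale the geometries themselves with sf's arithmetic on geometries: each shape is rescaled about its centroid so its area is proportional to |value| (and therefore to `t$rs`), then the shapes are placed in a row and filled green or red by sign. Four North Carolina counties from sf's bundled `nc.shp` stand in for the countries layer, and EPSG:32119 is used only so areas are in square metres. With the real data you would join `t` to `countries` and use an equal-area projection instead.

```r
library(sf)
library(ggplot2)

# Stand-in shapes: four counties from the nc.shp file shipped with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
shapes <- st_transform(nc[1:4, "NAME"], 32119)  # projected CRS, areas in m^2

# The question's values, attached to the stand-in shapes purely for illustration
shapes$value <- c(-0.921253632, 245.6713064, 4.049413648, 207.5896534)

# Target area proportional to |value|; the largest |value| gets the largest original area
areas  <- as.numeric(st_area(shapes))
target <- abs(shapes$value) / max(abs(shapes$value)) * max(areas)
scale  <- sqrt(target / areas)  # linear factor is the square root of the area factor

# Rescale each shape about its own centroid, then place the shapes in a row
geoms <- st_geometry(shapes)
cens  <- st_coordinates(st_centroid(geoms))  # matrix with X and Y columns
gap   <- 3 * sqrt(max(areas))               # spacing between shape centres; tune as needed
arranged <- lapply(seq_along(geoms), function(i) {
  cen <- unname(cens[i, c("X", "Y")])
  (geoms[[i]] - cen) * scale[i] + c(i * gap, 0)
})

out <- st_sf(name = shapes$NAME, value = shapes$value,
             geometry = st_sfc(arranged))

ggplot(out) +
  geom_sf(aes(fill = value > 0)) +
  scale_fill_manual(values = c(`TRUE` = "green", `FALSE` = "red"),
                    name = "value > 0")
```

With these values the shape for -0.92 ends up very small, since its area is about 0.4% of the largest one.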

Change position of legend in plot of pec object

I am trying to plot the prediction error curve from the pec package, but I can't change the legend position and size. Here is an example from the pec package:
library(rms)
library(pec)
data(pbc)
pbc <- pbc[sample(1:NROW(pbc),size=100),]
f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc)
f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE)
f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE)
brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1)
print(brier)
plot(brier)
But this shows a large legend in the middle of the plot.
I also tried:
plot(brier, legend = "topright")
class(brier)
But then no legend is shown.
How can I change the position of the legend? And is it possible to plot this graph using ggplot?
I think I got what you want using ggplot2. The idea is to pick the elements of your brier object that contain the data for the plot, build a data frame from them, and plot it.
library(ggplot2)
# packages for the pipe and pivot_longer; you can do it with base functions, I just prefer these
library(tidyr)
library(dplyr)
df <- do.call(cbind, brier[["AppErr"]]) # contains y values for each model
df <- cbind(brier[["time"]], df) # values of the x axis
colnames(df)[1] <- "time"
df <- as.data.frame(df) %>% pivot_longer(cols = 2:last_col(), names_to = "models", values_to = "values") # pivot table to long format makes it easier to use ggplot
ggplot(data = df, aes(x = time, y = values, color = models)) +
geom_line() # I suppose you know how to custom axis names etc.
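Once the data are in a ggplot, the legend placement is controlled by `theme(legend.position = ...)`. A minimal sketch; the small data frame here is made up only to mirror the `time` / `models` / `values` columns built above, so with real data you would use the reshaped `df` from the pec object.

```r
library(ggplot2)

# Made-up stand-in with the same columns as the reshaped pec data
df <- data.frame(
  time   = rep(seq(0, 4000, by = 500), times = 2),
  models = rep(c("CoxPH", "Weibull"), each = 9),
  values = c(seq(0, 0.20, length.out = 9), seq(0, 0.22, length.out = 9))
)

p <- ggplot(df, aes(x = time, y = values, colour = models)) +
  geom_line() +
  labs(x = "Time", y = "Prediction error", colour = NULL) +
  theme_minimal() +
  theme(legend.position = "bottom")  # or "top", "left", "right", "none"

print(p)
```

Legend text size can be adjusted in the same `theme()` call, e.g. `legend.text = element_text(size = 8)`.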
Output:

Adding text annotation to a clustering scatter plot (tSNE)

I have XY data (a 2D tSNE embedding of high-dimensional data) that I'd like to scatter plot. The data are assigned to several clusters, so I'd like to color-code the points by cluster and then add a single label for each cluster, with the same color coding as the cluster, placed away from the cluster's points as much as possible.
Any idea how to do this in R using either ggplot2 with ggrepel, or plotly?
Here's the example data (the XY coordinates and cluster assignments are in df and the labels in label.df) and the ggplot2 part of it:
library(dplyr)
library(ggplot2)
set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)
label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))
ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none")
The geom_label_repel() function in the ggrepel package lets you add labels to a plot while trying to "repel" them so they don't overlap other elements. A small addition to your existing code summarizes the data to get the coordinates where the labels should go (here I chose the upper-left region of each cluster, i.e. the min of x and the max of y) and merges them with your existing data frame of cluster labels. Pass this data frame to geom_label_repel() and map the variable that contains the label text to the label aesthetic in aes().
library(dplyr)
library(ggplot2)
library(ggrepel)
set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)
label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))
label.df_2 <- df %>%
group_by(cluster) %>%
summarize(x = min(x), y = max(y)) %>%
left_join(label.df, by = "cluster")
ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none") +
ggrepel::geom_label_repel(data = label.df_2, aes(label = label))
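If the labels should sit outside the clusters rather than at a fixed corner, one option relies on ggrepel's documented behaviour that points with an empty label are not drawn but still repel the non-empty labels: give every point a label column that is empty except for the point closest to each cluster's centre. A sketch (`box.padding = 1` and `max.overlaps = Inf` are tuning guesses, not values from the question):

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- do.call(rbind, lapply(seq(1, 20, 4), function(i)
  data.frame(x = rnorm(50, mean = i, sd = 1), y = rnorm(50, mean = i, sd = 1), cluster = i)))
df$cluster <- factor(df$cluster)

# Label text only on the point nearest each cluster centre; all other points get ""
df_lab <- df %>%
  group_by(cluster) %>%
  mutate(d2 = (x - mean(x))^2 + (y - mean(y))^2,
         label = ifelse(d2 == min(d2), paste0("cluster: ", cluster), "")) %>%
  ungroup()

# Empty labels act as obstacles, pushing each cluster label away from the points
p <- ggplot(df_lab, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  geom_label_repel(aes(label = label), box.padding = 1,
                   max.overlaps = Inf, min.segment.length = 0) +
  theme_minimal() +
  theme(legend.position = "none")
p
```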

Use conditional coloring on a plotly surface

I am using plotly via R for the first time and trying to create a surface from a grid and color it based on a calculation.
For example, I would like to use the surface from data(volcano), as in
library(plotly)
plot_ly(z = ~volcano) %>% add_surface()
But instead of coloring by the z-value (altitude), let's say I want to color by the distance from my house on the little mesa at (20, 60).
house_loc <- c(20,60,150) # (x,y,z) of my house
dist_to_house <- Vectorize(function(x,y,z){sqrt(sum( (c(x,y,z)-house_loc)^2 ))})
So far I have tried:
color_me <-function(x){
colorRampPalette(c('tan','blue')
)(24L)[findInterval(x,seq(0,1,length.out=25),
all.inside=TRUE)]
}
library(dplyr)
library(reshape2)
volcano %>%
melt( varnames=c('y','x'),value.name='z' ) %>%
mutate( d = dist_to_house(x, y, z) ,
d_rel = d/max(d),
d_color = color_me(d_rel)
) -> df
plot_ly(df,
type='scatter3d',
mode='none', # no markers, just surface
x=~x,
y=~y,
z=~z,
surfaceaxis=2,
surfacecolor=~d_color) # last argument seems not to work
Which just returns:
The desired result would color the landscape tan in the region of the house and gradually fade to blue in the regions far from the house.
A somewhat related question uses mesh3d code found elsewhere and doesn't explain how to calculate (i, j, k).
Your code has virtually everything you need: just use a surface plot and pass your distance array as the surface color.
library(plotly)
library(dplyr)
library(reshape2)
house_loc <- c(20,60,150)
dist_to_house <- Vectorize(function(x,y,z){sqrt(sum( (c(x,y,z)-house_loc)^2 ))})
volcano %>%
melt( varnames=c('y','x'),value.name='z' ) %>%
mutate( d = dist_to_house(x, y, z) ,
d_rel = d/max(d)
) -> df
color <- df$d_rel
dim(color) <- dim(volcano)
plot_ly(df,
type='surface',
z=volcano,
surfacecolor=color,
colors=c('tan','blue'))
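The melt/reshape round trip can also be skipped by computing the distance on the matrix directly. A sketch, assuming the same convention as the code above (`melt(varnames = c('y', 'x'))` makes the row index `y` and the column index `x`), so `row()` and `col()` give the same grid coordinates and the result matches `color` cell for cell:

```r
library(plotly)

house_loc <- c(20, 60, 150)  # (x, y, z) of the house, as in the question

# volcano is indexed [row, column] = [y, x]; row()/col() return each cell's indices
d <- sqrt((col(volcano) - house_loc[1])^2 +
          (row(volcano) - house_loc[2])^2 +
          (volcano      - house_loc[3])^2)
d_rel <- d / max(d)

# surfacecolor takes a matrix with the same dimensions as z
p <- plot_ly(z = volcano, surfacecolor = d_rel,
             colors = c("tan", "blue"), type = "surface")
```

Printing `p` displays the surface, tan near the house and blue far from it.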
In addition to the surface plot (see the accepted answer), we can also make a mesh3d plot, which avoids the reshaping (back to a grid) step that the surface plot requires.
However, the color bar still isn't right (it shows the range of z, not d_rel).
plot_ly(df,
type='mesh3d',
x = ~x,
y = ~y,
z = ~z,
intensity=~d_rel,
colors = colorRamp(c("tan", "blue"))
)
Counter-intuitively, it is intensity= and not color= which seems to control the conditional coloring.
I originally avoided mesh3d because I thought I had to create a triangular mesh (Delaunay something or another) and had no idea how to do that, but it seems to be handled automatically in this case.
