R leaflet/shiny: display and offset overlapping polylines based on column value

I'm trying to stretch what I can do in R and have hit a wall; I hope you can point me in the right direction on how best to accomplish what I want. I am plotting a set of polylines from a shapefile whose data looks like this:
placename,placetype,placebook,numbooks,row_number,placelinecoords
Main Street,street,"BOOKTDS",1,1,LINESTRING(-3.700559678237278 40.42098474661999,-3.698346475125229 40.42033268716025,-3.69731867182242 40.42003534594848,-3.697243299580215 40.42003534594848)
First Street,street,"BOOKESM",3,1,LINESTRING(-3.710546258545151 40.41308011176736,-3.710213664627304 40.41309440722183,-3.709234658336868 40.41341707524381,-3.708880606746902 40.4135232694443,-3.708711627578964 40.41372748858957)
First Street,street,"BOOKTDS",3,2,LINESTRING(-3.710546258545151 40.41308011176736,-3.710213664627304 40.41309440722183,-3.709234658336868 40.41341707524381,-3.708880606746902 40.4135232694443,-3.708711627578964 40.41372748858957)
First Street,street,"BOOKLDE",3,3,LINESTRING(-3.710546258545151 40.41308011176736,-3.710213664627304 40.41309440722183,-3.709234658336868 40.41341707524381,-3.708880606746902 40.4135232694443,-3.708711627578964 40.41372748858957)
Loughborough Street,street,"BOOKESM",2,1,LINESTRING(-3.707336328013795 40.42433623251054,-3.707014282978915 40.42429971916709,-3.706726498054129 40.42429971916709,-3.706281116622912 40.42409628731927,-3.705390353760477 40.42377288157678,-3.704602371228324 40.42316257940762,-3.70376642454204 40.42259400231908)
Loughborough Street,street,"BOOKTDS",2,2,LINESTRING(-3.707336328013795 40.42433623251054,-3.707014282978915 40.42429971916709,-3.706726498054129 40.42429971916709,-3.706281116622912 40.42409628731927,-3.705390353760477 40.42377288157678,-3.704602371228324 40.42316257940762,-3.70376642454204 40.42259400231908)
Oak Street,street,"BOOKLMI",2,1,LINESTRING(-3.700391803697817 40.41664973667679,-3.700384951675798 40.41673842198933,-3.699754565650076 40.4176044018386,-3.699549004989513 40.41782350340716)
Oak Street,street,"BOOKLBU",2,2,LINESTRING(-3.700391803697817 40.41664973667679,-3.700384951675798 40.41673842198933,-3.699754565650076 40.4176044018386,-3.699549004989513 40.41782350340716)
"placebook" is a unique code for a book where a particular street name appears. I have assigned each book with a color and load in the data:
books = c("OBRAESM", "OBRAHOR", "OBRAINS", "OBRALBU", "OBRALCT", "OBRALDB","OBRALDE","OBRALMI","OBRALPI","OBRATDS")
color = c("red", "orange", "yellow", "green", "blue", "pink","gray","purple","black","white")
df = cbind.data.frame(books,color)
colnames(df) = c("books","color")
library(rgdal)       # readOGR()
library(data.table)  # %like% (also provided by other packages)
placeographypaths <- readOGR("shapefiles/places_paths.shp")
placeographypathsstreets <- subset(placeographypaths, placetype %like% "street")
What I would like to do is plot these lines onto the map by book, with each appearing as a different color. When there is more than one book assigned to a particular line, I would need to offset the lines so they are all visible. These lines will overlap in their entirety and in most cases there will only be a few lines overlapping in the same location (the maximum is 5, but most are 1-3).
So "First Street" would display three lines: red, gray, and white. I see there's a PolylineOffset tool, but I can't find any examples that use column values as the criteria for offsetting--and it seems to mostly apply to more complex situations where only a part of the line overlaps)--perhaps there's a simpler solution that I'm missing.

I spent some time thinking about what I could do, since I do not know how to use PolylineOffset; it seems that feature is still on its way to the leaflet package. I want to suggest an alternative for your visualization. Reading your question, you would have at most five types of lines: each line type can represent how many times a street appears in your data set (you said the maximum overlap is five). So you can create up to five levels in a grouping variable and map them to colors in leaflet.
First, I summarized your data by grouping on placename: for each placename I counted how many rows exist and built a string of book names formatted for popups. Then I created a color palette for leaflet and drew the map. If you want fancier popups, you could use the htmlTable package, for example.
library(dplyr)
library(leaflet)
library(sf)
library(viridis)
# Aggregate the data by placename. Note your data is called mysf, which is an
# sf object.
group_by(mysf, placename) %>%
  summarize(frequency = factor(n(), levels = 1:3, labels = c("1", "2", "3")),
            books = paste0("<br/>", paste0(placebook, collapse = "<br/>"))) -> mysf2
# Create categorical colors
# I am checking colors here
previewColors(colorFactor(palette = "viridis", domain = mysf2$frequency),
              values = unique(mysf2$frequency))
# Create my own palette
mypal <- colorFactor(palette = "viridis", domain = mysf2$frequency)
# Draw a leaflet map
leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolylines(data = mysf2, color = ~mypal(mysf2$frequency),
               popup = paste("Place: ", mysf2$placename, "<br>",
                             "Book(s): ", mysf2$books, "<br>")) %>%
  addLegend(position = "bottomright", pal = mypal, values = mysf2$frequency,
            title = "Frequency", opacity = 1)
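If you do want one line per book (so First Street shows red, gray, and white side by side) rather than one line per frequency level, a rough workaround that avoids the PolylineOffset plugin is to nudge the k-th duplicate of each street sideways by a small constant before plotting. This is only a sketch, assuming the mysf object from the DATA block below and the book/colour lookup df from the question; the size of delta is a guess and needs tuning to your zoom level.
library(dplyr)
library(leaflet)
library(sf)

delta <- 0.00003   # roughly a few metres in degrees at this latitude (assumption)

mysf_off <- mysf %>%
  group_by(placename) %>%
  mutate(dup_rank = row_number() - 1) %>%   # 0 for the first book, 1 for the second, ...
  ungroup()

# shift each duplicate geometry north-east by dup_rank * delta
st_geometry(mysf_off) <- st_sfc(
  mapply(function(g, k) g + k * c(delta, delta),
         st_geometry(mysf_off), mysf_off$dup_rank, SIMPLIFY = FALSE),
  crs = 4326)

# one colour per book, using the lookup table df from the question
pal_books <- colorFactor(palette = as.character(df$color), domain = df$books)

leaflet(mysf_off) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolylines(color = ~pal_books(placebook), weight = 3, opacity = 1,
               popup = ~paste(placename, placebook, sep = "<br/>"))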
One final note. The way you provided your data unfortunately does not let anybody replicate your situation. (I invested a good amount of time manually recreating your data; I would not have done that without the time to spare, and perhaps you would not want to either.) If your data is large, consider uploading it somewhere; otherwise, you can use dput(), which produces a copy-and-pasteable representation of your data. If you look carefully at R questions, many users provide their data with dput(). I highly recommend using this function when you ask questions in the future.
DATA
mysf <- structure(list(placename = structure(c(3L, 1L, 1L, 1L, 2L, 2L,
4L, 4L), .Label = c("First Street", "Loughborough Street", "Main Street",
"Oak Street"), class = "factor"), placetype = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "placetype", class = "factor"),
placebook = structure(c(5L, 1L, 5L, 3L, 1L, 5L, 4L, 2L), .Label = c("BOOKESM",
"BOOKLBU", "BOOKLDE", "BOOKLMI", "BOOKTDS"), class = "factor"),
geometry = structure(list(structure(c(-3.70055967823728,
-3.69834647512523, -3.69731867182242, -3.69724329958022,
40.42098474662, 40.4203326871602, 40.4200353459485, 40.4200353459485
), .Dim = c(4L, 2L), class = c("XY", "LINESTRING", "sfg")),
structure(c(-3.71054625854515, -3.7102136646273, -3.70923465833687,
-3.7088806067469, -3.70871162757896, 40.4130801117674,
40.4130944072218, 40.4134170752438, 40.4135232694443,
40.4137274885896), .Dim = c(5L, 2L), class = c("XY",
"LINESTRING", "sfg")), structure(c(-3.71054625854515,
-3.7102136646273, -3.70923465833687, -3.7088806067469,
-3.70871162757896, 40.4130801117674, 40.4130944072218,
40.4134170752438, 40.4135232694443, 40.4137274885896), .Dim = c(5L,
2L), class = c("XY", "LINESTRING", "sfg")), structure(c(-3.71054625854515,
-3.7102136646273, -3.70923465833687, -3.7088806067469,
-3.70871162757896, 40.4130801117674, 40.4130944072218,
40.4134170752438, 40.4135232694443, 40.4137274885896), .Dim = c(5L,
2L), class = c("XY", "LINESTRING", "sfg")), structure(c(-3.70733632801379,
-3.70701428297891, -3.70672649805413, -3.70628111662291,
-3.70539035376048, -3.70460237122832, -3.70376642454204,
40.4243362325105, 40.4242997191671, 40.4242997191671,
40.4240962873193, 40.4237728815768, 40.4231625794076,
40.4225940023191), .Dim = c(7L, 2L), class = c("XY",
"LINESTRING", "sfg")), structure(c(-3.70733632801379,
-3.70701428297891, -3.70672649805413, -3.70628111662291,
-3.70539035376048, -3.70460237122832, -3.70376642454204,
40.4243362325105, 40.4242997191671, 40.4242997191671,
40.4240962873193, 40.4237728815768, 40.4231625794076,
40.4225940023191), .Dim = c(7L, 2L), class = c("XY",
"LINESTRING", "sfg")), structure(c(-3.70039180369782,
-3.7003849516758, -3.69975456565008, -3.69954900498951,
40.4166497366768, 40.4167384219893, 40.4176044018386,
40.4178235034072), .Dim = c(4L, 2L), class = c("XY",
"LINESTRING", "sfg")), structure(c(-3.70039180369782,
-3.7003849516758, -3.69975456565008, -3.69954900498951,
40.4166497366768, 40.4167384219893, 40.4176044018386,
40.4178235034072), .Dim = c(4L, 2L), class = c("XY",
"LINESTRING", "sfg"))), class = c("sfc_LINESTRING", "sfc"
), precision = 0, bbox = structure(c(xmin = -3.71054625854515,
ymin = 40.4130801117674, xmax = -3.69724329958022, ymax = 40.4243362325105
), class = "bbox"), crs = structure(list(epsg = 4326L, proj4string = "+proj=longlat +datum=WGS84 +no_defs"), class = "crs"), n_empty = 0L)), row.names = c(NA,
8L), sf_column = "geometry", agr = structure(c(placename = NA_integer_,
placetype = NA_integer_, placebook = NA_integer_), class = "factor", .Label = c("constant",
"aggregate", "identity")), class = c("sf", "data.frame"))

Related

Fast contiguity checks for polygons in SF/R

For my dissertation, I'm running an algorithm that creates random congressional districts by flipping precincts from one district to an adjacent district thousands of times. One important part in the analysis is that the district that gives the precinct cannot be rendered non-contiguous (i.e., it can't give up a precinct that connects one portion of a district to a different part). To check this, I utilize the following script:
library(sf)
library(tidyverse)
subtracted_district <- main_df %>%    # main_df has all of the precincts
  filter(district %in% giver) %>%     # keep only precincts in the "giver" district
  filter(!index %in% proposed) %>%    # remove the precinct proposed to flip to the "taker" district
  summarise()
foo <- st_cast(subtracted_district, "POLYGON")
d <- dim(foo)[1]
# if contiguous, it's accepted; otherwise we start over without this precinct available
ifelse(d == 1, accepted <- 1, adjoining_giver <- adjoining_giver[-a])
In short, this pulls the precincts from the shapefile that are assigned to the "giver" district, filters out the proposed donor precinct, reassembles the "giver" district without it, then checks whether the reassembled district consists of multiple polygons.
This works, but the problem is that even on my very fast desktop, recreating the district is prohibitively slow, especially if it has thousands of precincts in it. It ends up taking about an hour to run through.
I'm wondering if there is a way to do this without recreating the district. If you know the adjacency matrix for a group of polygons, is there a fast, reliable way to check to see if the group of polygons is contiguous?
Edit to address a comment: To be clear, the slow part comes with the sf/tidy command "summarise," which takes all of the precinct polygons and merges them into a district. This takes a huge amount of time, which is why I'm wondering if I can do it without having to actually create the district just by looking at the adjacency matrix.
Below is a sample main_df file, in sf format. The way it works now, a vector of indexes for the precincts in the "giver" district that are adjacent to the "taker" precinct is selected (not shown, but done through the adjacency matrix), and then a random precinct is selected from that vector as a proposed donation ("proposed").
If you were to create this shapefile and run the full code, 9, 11, 12 and 15 would be listed as possible precincts to flip from 17 to 13. It would then randomly select from that list a precinct to flip.
Let's say it proposes precinct 9 to donate. To test whether that is an acceptable proposal, the script takes main_df, keeps only the precincts currently in district 17, then removes precinct 9.
This is where things slow down: it then merges all of the remaining precincts into a single district. That is fast with this sample data, but very slow if a district has, say, 2000 precincts to merge.
It then uses st_cast to see whether we are left with a multipolygon. If the resulting data frame has more than one row, the district has been split, so the proposal is rejected and another precinct is selected from the remaining candidates. Here precinct 9 would break contiguity, so the proposal would be rejected and the script would choose from 12, 11 and 15. If it chose 12, that proposal would also fail, and it would select from 11 and 15; whichever of those it chose would be accepted, because neither breaks contiguity.
Since creating the district slows things down so much, I'm wondering if there's a fast way to do it via the adjacency matrix.
Sample adjacency matrix:
structure(list(12L, c(10L, 11L, 13L, 15L), c(8L, 10L, 12L, 15L
), c(5L, 13L), c(4L, 11L, 14L), c(9L, 11L, 14L), 14L, c(3L, 12L
), c(6L, 10L, 11L, 12L), c(2L, 3L, 9L, 11L, 12L, 15L), c(2L,
5L, 6L, 9L, 10L, 13L), c(1L, 3L, 8L, 9L, 10L), c(2L, 4L, 11L,
15L), 5:7, c(2L, 3L, 10L, 13L)), predicate = "relate_pattern", region.id = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"), remove_self = FALSE, retain_unique = FALSE, ncol = 15L, class = c("sgbp",
"list"))
Sample shapefile:
structure(list(index = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 15L, 18L, 19L), Total_pop = c(3951.064118, 37401.9916269,
1989.623112, 2679.000014, 2934.782304, 2193.129252, 3967.176508,
2916.376886, 1237.112553, 14039.899499, 9486.059285, 2805.5047867,
3240.0551608, 2230.911935, 11792.6263111), district = c(17, 13,
13, 17, 17, 17, 17, 13, 17, 13, 17, 17, 17, 17, 13), geometry = structure(list(
structure(list(structure(c(-84.320765, -84.314011, -84.309424,
-84.329215, -84.333127, -84.340251, -84.340531, -84.333975,
-84.329623, -84.323183, -84.320765, 40.859333, 40.859414,
40.847182, 40.841154, 40.829147, 40.828847, 40.859099, 40.859198,
40.859295, 40.859405, 40.859333), .Dim = c(11L, 2L))), class = c("XY",
"POLYGON", "sfg")), structure(list(structure(c(-84.108857,
-84.093944, -84.096841, -84.094248, -84.079782, -84.075252,
-84.070914, -84.094946, -84.090253, -84.075439, -84.075766,
-84.089857, -84.089814, -84.100991, -84.108197, -84.108219,
-84.108304, -84.113273, -84.123183, -84.143812, -84.152296,
-84.16154, -84.146775, -84.160499, -84.122984, -84.127923,
-84.121899, -84.10871, -84.108857, 40.795642, 40.788453,
40.77236, 40.752697, 40.745239, 40.730615, 40.717104, 40.716549,
40.709565, 40.709664, 40.702659, 40.699769, 40.688383, 40.687675,
40.695077, 40.698505, 40.709503, 40.724139, 40.7303, 40.73043,
40.726295, 40.730179, 40.737101, 40.754921, 40.751592, 40.764579,
40.769752, 40.770137, 40.795642), .Dim = c(29L, 2L))), class = c("XY",
"POLYGON", "sfg")), structure(list(structure(c(-84.340004,
-84.223661, -84.223143, -84.222799, -84.318213, -84.318511,
-84.339536, -84.340004, 40.772111, 40.773446, 40.729461,
40.685957, 40.685658, 40.714662, 40.714542, 40.772111), .Dim = c(8L,
2L))), class = c("XY", "POLYGON", "sfg")), structure(list(
structure(c(-83.994079, -83.994156, -83.879834, -83.880194,
-83.994079, 40.644132, 40.731146, 40.732443, 40.64469,
40.644132), .Dim = c(5L, 2L))), class = c("XY", "POLYGON",
"sfg")), structure(list(structure(c(-83.879834, -83.994156,
-83.994343, -83.880063, -83.879834, 40.732443, 40.731146,
40.81805, 40.819919, 40.732443), .Dim = c(5L, 2L))), class = c("XY",
"POLYGON", "sfg")), structure(list(structure(c(-83.995165,
-83.994343, -84.109248, -84.109586, -84.109516, -83.995165,
40.905066, 40.81805, 40.817277, 40.860994, 40.90473, 40.905066
), .Dim = c(6L, 2L))), class = c("XY", "POLYGON", "sfg")),
structure(list(structure(c(-83.880301, -83.890429, -83.904255,
-83.904832, -83.917586, -83.900224, -83.880402, -83.880393,
-83.880383, -83.880301, 40.881537, 40.877166, 40.879403,
40.892264, 40.898383, 40.905847, 40.905907, 40.901276, 40.898756,
40.881537), .Dim = c(10L, 2L))), class = c("XY", "POLYGON",
"sfg")), structure(list(structure(c(-84.340004, -84.339536,
-84.318511, -84.318213, -84.396778, -84.397189, -84.340016,
-84.340004, 40.772111, 40.714542, 40.714662, 40.685658, 40.684926,
40.786584, 40.786948, 40.772111), .Dim = c(8L, 2L))), class = c("XY",
"POLYGON", "sfg")), structure(list(structure(c(-84.22485,
-84.130459, -84.109586, -84.109248, -84.108904, -84.148307,
-84.22407, -84.22485, 40.859307, 40.860182, 40.860994, 40.817277,
40.802748, 40.801737, 40.801247, 40.859307), .Dim = c(8L,
2L))), class = c("XY", "POLYGON", "sfg")), structure(list(
structure(c(-84.223143, -84.223661, -84.22407, -84.148307,
-84.108904, -84.108857, -84.10871, -84.121899, -84.127923,
-84.122984, -84.160499, -84.146775, -84.16154, -84.223143,
40.729461, 40.773446, 40.801247, 40.801737, 40.802748,
40.795642, 40.770137, 40.769752, 40.764579, 40.751592,
40.754921, 40.737101, 40.730179, 40.729461), .Dim = c(14L,
2L))), class = c("XY", "POLYGON", "sfg")), structure(list(
structure(c(-84.108857, -84.108904, -84.109248, -83.994343,
-83.994156, -84.075252, -84.079782, -84.094248, -84.096841,
-84.093944, -84.108857, 40.795642, 40.802748, 40.817277,
40.81805, 40.731146, 40.730615, 40.745239, 40.752697,
40.77236, 40.788453, 40.795642), .Dim = c(11L, 2L))), class = c("XY",
"POLYGON", "sfg")), structure(list(structure(c(-84.340004,
-84.340016, -84.397189, -84.397374, -84.340101, -84.340251,
-84.333127, -84.329215, -84.309424, -84.314011, -84.22485,
-84.22407, -84.223661, -84.340004, 40.772111, 40.786948,
40.786584, 40.815941, 40.816143, 40.828847, 40.829147, 40.841154,
40.847182, 40.859414, 40.859307, 40.801247, 40.773446, 40.772111
), .Dim = c(14L, 2L))), class = c("XY", "POLYGON", "sfg")),
structure(list(structure(c(-83.994079, -84.107787, -84.107908,
-84.108197, -84.100991, -84.089814, -84.089857, -84.075766,
-84.075439, -84.090253, -84.094946, -84.070914, -84.075252,
-83.994156, -83.994079, 40.644132, 40.643069, 40.657938,
40.695077, 40.687675, 40.688383, 40.699769, 40.702659, 40.709664,
40.709565, 40.716549, 40.717104, 40.730615, 40.731146, 40.644132
), .Dim = c(15L, 2L))), class = c("XY", "POLYGON", "sfg")),
structure(list(structure(c(-83.880402, -83.900224, -83.917586,
-83.904832, -83.904255, -83.890429, -83.880301, -83.880063,
-83.994343, -83.995165, -83.995228, -83.880423, -83.880402,
40.905907, 40.905847, 40.898383, 40.892264, 40.879403, 40.877166,
40.881537, 40.819919, 40.81805, 40.905066, 40.919843, 40.920429,
40.905907), .Dim = c(13L, 2L))), class = c("XY", "POLYGON",
"sfg")), structure(list(structure(c(-84.222799, -84.223143,
-84.16154, -84.152296, -84.143812, -84.123183, -84.113273,
-84.108304, -84.108219, -84.108197, -84.107908, -84.222749,
-84.222799, 40.685957, 40.729461, 40.730179, 40.726295, 40.73043,
40.7303, 40.724139, 40.709503, 40.698505, 40.695077, 40.657938,
40.656948, 40.685957), .Dim = c(13L, 2L))), class = c("XY",
"POLYGON", "sfg"))), n_empty = 0L, crs = structure(list(input = "NAD83",
wkt = "GEOGCRS[\"NAD83\",\n DATUM[\"North American Datum 1983\",\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"latitude\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"longitude\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4269]]"), class = "crs"), class = c("sfc_POLYGON",
"sfc"), precision = 0, bbox = structure(c(xmin = -84.397374,
ymin = 40.643069, xmax = -83.879834, ymax = 40.920429), class = "bbox"))), row.names = c(NA,
15L), sf_column = "geometry", agr = structure(c(index = NA_integer_,
Total_pop = NA_integer_, district = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), class = c("sf",
"data.frame"))
This is an interesting problem, and I believe it can be solved via the adjacency matrix.
I will illustrate a possible approach on the well-known and much-loved NC shapefile that ships with {sf}.
First I construct a subset of North Carolina that makes a contiguous polygon:
library(sf)
library(dplyr)
library(sfdep)
# the one & only NC shapefile...
nc <- st_read(system.file("shape/nc.shp", package = "sf")) %>%
  # round up the usual suspects...
  filter(!NAME %in% c('Dare', 'Currituck', 'Carteret', 'Hyde'))
plot(st_geometry(nc))
Then I construct an adjacency matrix; I personally prefer sfdep::st_contiguity(), but other approaches are possible.
The next step is iterating over the matrix, looking for matches between second-degree neighbors (neighbors of a given neighbor) and first-degree neighbors (neighbors of the district/county being evaluated). Should you find a case with no common neighbor, you have your troublemaker!
You will first have to eliminate counties/districts that have only one neighbor. These cannot cause discontinuity and are safely ignored (consider Tyrrell County on what is left of the Albemarle-Pamlico Peninsula).
Once you have iterated over the entire adjacency matrix, you are done.
neighbours <- sfdep::st_contiguity(nc)
troublemakers <- rep(FALSE, times = length(neighbours)) # initialise the result set

for (i in seq_along(neighbours)) {
  first_degree <- neighbours[[i]]
  # the edge case of single-neighbour districts needs to be handled;
  # they cannot create discontinuity and are safe to ignore
  if (length(first_degree) > 1) {
    for (j in seq_along(first_degree)) {
      # j-th second-degree neighbours vs first-degree neighbours
      wrk_diff <- intersect(unlist(neighbours[first_degree[j]]),
                            first_degree)
      # if no common neighbour >> discontinuity!
      if (length(wrk_diff) == 0) troublemakers[i] <- TRUE
    }
  }
}
And because a picture is worth a thousand words, let us check the troublemakers on a plot (first all counties, then an overlay of the troublemakers in red):
plot(st_geometry(nc))
plot(st_geometry(nc[troublemakers, ]), col = "red", add = T)
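As an aside (not part of the answer above), for the narrower question of whether the giver district minus one precinct is still connected, a plain breadth-first search over the same sgbp neighbour list also works and avoids summarise()/st_cast() entirely. A hedged sketch: the indices are row positions in the sf object, which is what the sgbp list refers to, and adjacency_list / main_df stand in for the objects from the question.
# graph connectivity check over an sgbp neighbour list
# `nb` is the sgbp/list of integer neighbour vectors, `keep` the row positions
# of the polygons that remain in the district after the proposed flip
is_contiguous <- function(nb, keep) {
  if (length(keep) <= 1) return(TRUE)
  visited  <- keep[1]
  frontier <- keep[1]
  while (length(frontier) > 0) {
    nxt <- setdiff(intersect(unlist(nb[frontier]), keep), visited)
    visited  <- c(visited, nxt)
    frontier <- nxt
  }
  length(visited) == length(keep)
}

# e.g. with the sample data, dropping the row for precinct 9 from district 17
# (adjust the index lookup to however your rows map onto the sgbp list):
# giver_rows <- which(main_df$district == 17)
# is_contiguous(adjacency_list, setdiff(giver_rows, which(main_df$index == 9)))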

R create column in dataframe value name of dataframe

I have a list of dataframes (these are spatial dataframes) named for example "map_g1_r1_airport", "map_g1_r1_hotel", "map_g1_r2_bank", "map_g1_r2_market"
These are elements that were digitized from several maps. The maps were originally called "map_g1_r1", "map_g1_r2".
I am trying to add a column to each dataframe with the name of the original map using a loop.
Here is what I am trying to do:
map_g1_r1_airport$mapid<-map_g1_r1
Here is my attempt with a loop (unfortunately this does not do what I intend; instead it simply creates a "content" object in the Values pane):
list_df<-c("map_g1_r1_airport", "map_g1_r1_hotel", "map_g1_r2_bank", "map_g1_r2_market")
for (df in 1:length(list_df)){
  paste(list_df[df], "$mapid<-",
        print(content <- gsub("(.*)_.*", "\\1",
                              c(paste(list_df[df]))), sep = ""),
        quote = FALSE)
}
Any help is most welcome!
Here is one example of the data before change:
structure(list(id = c(1, 2, 3), Name = structure(c(1L, 3L, 4L
), .Label = c("A", "B", "C", "D", "E"
), class = "factor"), Year = structure(c(NA_integer_, NA_integer_,
NA_integer_), .Label = character(0), class = "factor"), geometry = structure(list(
structure(c(41.4086152370865, 2.44718243982123), class = c("XY",
"POINT", "sfg")), structure(c(45.3852740543083, -4.31103098867136
), class = c("XY", "POINT", "sfg")), structure(c(38.4200314592624,
-6.96113884231683), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT",
"sfc"), precision = 0, bbox = structure(c(xmin = 41.4086152370865,
ymin = 2.31103098867136, xmax = 45.4200314592624, ymax = -4.44718243982123
), class = "bbox"), crs = structure(list(epsg = NA_integer_,
proj4string = NA_character_), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(id = NA_integer_,
Name = NA_integer_, Year = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), row.names = c(NA,
3L), class = c("sf", "data.frame"))
This is what I would like to get (with the mapid map_g1_r1):
structure(list(id = c(1, 2, 3), Name = structure(c(1L, 3L, 4L
), .Label = c("A", "B", "C", "D", "E"
), class = "factor"), Year = structure(c(NA_integer_, NA_integer_,
NA_integer_), .Label = character(0), class = "factor"), geometry = structure(list(
structure(c(41.4086152370865, 2.44718243982123), class = c("XY",
"POINT", "sfg")), structure(c(45.3852740543083, -4.31103098867136
), class = c("XY", "POINT", "sfg")), structure(c(38.4200314592624,
-6.96113884231683), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT",
"sfc"), precision = 0, bbox = structure(c(xmin = 41.4086152370865,
ymin = 2.31103098867136, xmax = 45.4200314592624, ymax = -4.44718243982123
), class = "bbox"), crs = structure(list(epsg = NA_integer_,
proj4string = NA_character_), class = "crs"), n_empty = 0L),
mapid = c("map_g1_r1", "map_g1_r1", "map_g1_r1")), sf_column = "geometry", agr = structure(c(id = NA_integer_,
Name = NA_integer_, Year = NA_integer_, mapid = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), row.names = c(NA,
3L), class = c("sf", "data.frame"))
You can achieve this even without a loop.
I would start by creating a vector with the names you want to add to each spatial data.frame; I assume they are derived from the names of the list.
mapid = names(list_df)
Following that, you can employ mapply, which applies a function to the first elements of each list (or vector), then to the second elements, and so on. It is essentially a multiple-input version of lapply.
The function we will give to mapply is cbind, which takes two data.frames and joins them by column. In this case one data.frame will be your spatial object and the other will be a vector with a single element: the current map name. cbind will convert this name to a one-column data.frame and recycle it to match the number of rows in the spatial object.
final = mapply(cbind, list_df, mapid)
I haven't tested it, but it should work.
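For reference, here is an untested base-R sketch of the same idea that also strips the trailing element name to recover the original map id; it assumes the data frames exist in the global environment under the names in list_df:
list_df <- c("map_g1_r1_airport", "map_g1_r1_hotel", "map_g1_r2_bank", "map_g1_r2_market")
tmp    <- mget(list_df)                 # fetch the actual data frames by name
mapid  <- sub("_[^_]*$", "", list_df)   # "map_g1_r1_airport" -> "map_g1_r1"
result <- Map(function(d, id) { d$mapid <- id; d }, tmp, mapid)
# optionally write the modified data frames back to the global environment
list2env(result, .GlobalEnv)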
You can get all the individual dataframes in a list using mget and add a new column with their name using mutate.
Using tidyverse functions you can do this as follows:
library(dplyr)
library(purrr)
list_df<-c("map_g1_r1_airport", "map_g1_r1_hotel", "map_g1_r2_bank", "map_g1_r2_market")
tmp <- mget(list_df)
result <- imap(tmp, ~.x %>% mutate(map_id = .y))
result will have all the changed dataframes in a list; if you want these changes reflected in the original objects, you can use list2env:
list2env(result, .GlobalEnv)

ggraph edges are connecting wrong?

I am generating a hierarchical edge bundling plot where each edge's color/transparency/thickness varies with the pvalue column of my connect dataframe. However, the color/transparency/thickness of the edges in the plot I generated don't always map to the values in pvalue. For example, subgroup1 and subgroup4 should have the strongest, thickest connection (pvalue is 9.32e-280), but they don't; instead the connection between subgroup3 and subgroup4 looks strongest.
This data generates a reproducible example:
> dput(vertices)
structure(list(name = structure(c(3L, 1L, 2L, 4L, 5L, 6L, 7L), .Label = c("gp1",
"gp2", "origin", "subgroup1", "subgroup2", "subgroup3", "subgroup4"
), class = "factor"), id = c(NA, NA, NA, 1L, 2L, 3L, 4L), angle = c(NA,
NA, NA, 0, -90, 0, -90), hjust = c(NA, NA, NA, 1, 1, 1, 1)), row.names = c(NA,
-7L), class = "data.frame")
> dput(hierarchy)
structure(list(from = structure(c(3L, 3L, 1L, 1L, 2L, 2L), .Label = c("gp1",
"gp2", "origin"), class = "factor"), to = structure(1:6, .Label = c("gp1",
"gp2", "subgroup1", "subgroup2", "subgroup3", "subgroup4"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
> dput(connect)
structure(list(from = structure(c(1L, 1L, 2L, 3L, 1L, 2L, 3L,
1L), .Label = c("subgroup1", "subgroup2", "subgroup3"), class = "factor"),
to = structure(c(1L, 2L, 2L, 1L, 3L, 3L, 3L, 3L), .Label = c("subgroup2",
"subgroup3", "subgroup4"), class = "factor"), pvalue = c(1.68e-204,
1.59e-121, 9.32e-73, 9.32e-73, 1.59e-21, 9.32e-50, 9.32e-40,
9.32e-280)), class = "data.frame", row.names = c(NA, -8L))
and this is the code I used to make this example plot:
library(igraph)
library(ggraph)

from <- match(connect$from, vertices$name)
to   <- match(connect$to, vertices$name)
col  <- connect$pvalue

# Add label information: angle, horizontal adjustment and potential flip
# calculate the ANGLE of the labels
vertices$id <- NA
myleaves <- which(is.na(match(vertices$name, hierarchy$from)))
nleaves <- length(myleaves)
vertices$id[myleaves] <- seq(1:nleaves)
vertices$angle <- 90 - 360 * vertices$id / nleaves

# calculate the alignment of labels: right or left
# (on the left part of the plot, labels currently have an angle < -90)
vertices$hjust <- ifelse(vertices$id < 41, 1, 0)

# flip the angle to keep labels readable
vertices$angle <- ifelse(vertices$angle < -90, vertices$angle + 180, vertices$angle)

mygraph <- graph_from_data_frame(hierarchy, vertices = vertices)

ggraph(mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_node_point(aes(filter = leaf, x = x * 1.05, y = y * 1.05), size = 2, alpha = 0.8) +
  geom_conn_bundle(data = get_con(from = from, to = to, col = col),
                   aes(colour = col, alpha = col, width = col)) +
  geom_node_text(aes(x = x * 1.1, y = y * 1.1, filter = leaf, label = name,
                     angle = angle, hjust = hjust), size = 3.5, alpha = 0.6) +
  scale_edge_color_continuous(trans = "log", low = "red", high = "yellow") +
  scale_edge_alpha_continuous(trans = "log", range = c(1, 0.1)) +
  scale_edge_width_continuous(trans = "log", range = c(4, 1)) +
  theme_void()
I think there is a wrong mapping somewhere, but I can't figure out where. Thank you so much for your input!
I believe there is a bug in this library. Rearranging the input data by the column of interest (pvalue in my case) in ascending order helped but did not solve the issue:
connect_new <- arrange(connect, pvalue)
I then found the solution in a GitHub issue submitted by another user: the subgroups within each group need to be ordered alphabetically in the hierarchy and vertices files, and in the connect dataframe the subgroups need to follow the same order as in the hierarchy and vertices files. Thanks to zhuxr11.
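For completeness, a sketch of what that reordering could look like for the example data, using dplyr::arrange on the three data frames from the question (the sample data is already close to this order, so it mostly matters for larger inputs):
library(dplyr)

# keep the internal nodes (origin, gp1, gp2) first and sort the leaf subgroups alphabetically
vertices  <- rbind(vertices[is.na(vertices$id), ],
                   arrange(vertices[!is.na(vertices$id), ], name))

# order hierarchy and connect the same way
hierarchy <- arrange(hierarchy, from, to)
connect   <- arrange(connect, from, to)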

What unit is the `dist` argument in `st_buffer` set to by default?

I have the following map of Mexico. It shows all of its municipalities and around 400 weather stations.
I want to create a 10 km buffer around each station and eventually associate each municipality with a station located within that radius.
The map and the stations are stored in separate sf objects. I tried the following:
buffers <- st_buffer(stations, dist = 1)
I thought the dist argument was in kilometers, so I tried dist = 10. Unfortunately, this returned HUGE buffers for each station. That's why I am using dist = 1, but even these buffers are as big as a state! This question suggests I transform my stations to the Irish Grid, but I couldn't replicate the accepted answer. I am now wondering what unit the dist argument uses.
From the aforementioned question, I assume it's set to degrees. How can I set a 10km buffer around each station?
Additional info:
My CRS is set to 4326 on both objects (the Mexican map and the stations).
This is my stations data:
> dput(head(stations))
structure(list(station_number = c(1004L, 1005L, 1008L, 1012L,
1017L, 1018L), station_alt = c(1925, 1844, 2323, 1589, 2172,
2053), month = c(9L, 9L, 9L, 9L, 9L, 9L), Mean_min = c(11.6,
12.75, 12.25, 13.9666666666667, 12.9, 12.6833333333333), Mean_max = c(26.9333333333333,
26.85, 24.0833333333333, 29.0333333333333, 24.8666666666667,
26.1333333333333), months_observed = c(5L, 5L, 5L, 5L, 5L, 5L
), geometry = structure(list(structure(c(-102.199, 22.001), class = c("XY",
"POINT", "sfg")), structure(c(-102.372, 21.781), class = c("XY",
"POINT", "sfg")), structure(c(-102.135, 22.203), class = c("XY",
"POINT", "sfg")), structure(c(-102.802, 21.794), class = c("XY",
"POINT", "sfg")), structure(c(-102.444, 22.233), class = c("XY",
"POINT", "sfg")), structure(c(-102.415, 22.141), class = c("XY",
"POINT", "sfg"))), class = c("sfc_POINT", "sfc"), precision = 0, bbox = structure(c(xmin = -102.802,
ymin = 21.781, xmax = -102.135, ymax = 22.233), class = "bbox"), crs = structure(list(
epsg = NA_integer_, proj4string = NA_character_), class = "crs"), n_empty = 0L)), sf_column = "geometry", agr = structure(c(station_number = NA_integer_,
station_alt = NA_integer_, month = NA_integer_, Mean_min = NA_integer_,
Mean_max = NA_integer_, months_observed = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), row.names = c(NA,
6L), class = c("sf", "data.frame"))
Your coordinates are long/lat, so the distance will be in degrees. You should first project to a spatial reference with meter units and then use a distance of 10 000 meters.
The manual of st_buffer says this about the dist argument:
in case dist is a units object, it should be convertible to
arc_degree if x has geographic coordinates, and to st_crs(x)$units
otherwise
If you leave the coordinates in 4326 you should be able to take something like 0.1 which should be about 11 km for Mexico, but you will see a warning message:
In st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
endCapStyle, : st_buffer does not correctly buffer
longitude/latitude data
So first convert to another projection (in meters) and enter the distance in meters. This should work; it uses EPSG 7801:
library(sf)
pois <- st_as_sf(stations)
st_crs(pois) <- 4326
pois <- st_transform(pois, crs = 7801)
plot(st_geometry(pois))
buff <- st_buffer(pois, dist = 10000)
plot(st_geometry(buff), add = TRUE)
Check the result with leaflet and the measure tool:
buff <- st_transform(buff, crs = 4326)
library(leaflet)
leaflet() %>%
  addTiles() %>%
  addMeasure(primaryLengthUnit = "meters") %>%
  addMarkers(data = pois) %>%
  addPolygons(data = buff)
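For the next step mentioned in the question (associating each municipality with a station within the 10 km radius), a spatial join on distance does the matching. A sketch, assuming a municipalities sf object of municipality polygons, using st_is_within_distance as the join predicate:
library(sf)

# bring the municipalities into the same metric CRS as the stations
municipalities_m <- st_transform(municipalities, crs = st_crs(pois))

# attach the station number of any station within 10 km of each municipality
matched <- st_join(municipalities_m, pois["station_number"],
                   join = st_is_within_distance, dist = 10000)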

Links in table cells in R markdown pdf document

I have a table that I am trying to place in an rmarkdown pdf document. I am also trying to format the table so that the station numbers are clickable links to their associated webpages.
I have created a function to transform the original table into a table with links in rmarkdown format. All works well except that the column containing the links is so wide that none of the other columns appear on the page. I have tried changing the width of the figure in the chunk options, and changing the width of the table in the kable call with format.args = (width = 15). Neither option changes the formatting, and I have not found any similar questions/answers online. The question "Including links within Rmarkdown tables (pdf)" appears to be similar, but as my table will run to hundreds of pages, I am hoping to stick with the links table I have created.
The kable function:
kable(local.select, format = "markdown", row.names= FALSE, format.args= (width = 15))
The issue (very wide column width when links are added to station numbers, and all other columns are pushed off the page):
The original table:
structure(list(Station = structure(1:3, .Label = c("01AF002",
"01AP002", "01AP004"), class = "factor"), FLOW_TYPE = structure(c(1L,
1L, 1L), .Label = "Normal", class = "factor"), Reg. = structure(c(2L,
1L, 1L), .Label = c(" False", " True"), class = "factor"), OperSched = structure(c(1L,
1L, 1L), .Label = " Continuous", class = "factor"), DrainageArea = c(21900L,
668L, 1100L), PEARSEDA = structure(c(1L, 1L, 1L), .Label = "Saint John - St. Croix", class = "factor")), class = "data.frame", row.names = c(NA,
-3L), .Names = c("Station", "FLOW_TYPE", "Reg.", "OperSched",
"DrainageArea", "PEARSEDA"))
The link-converted table:
structure(list(`Station Number` = structure(1:3, .Label = c("[01AF002](http://wateroffice.ec.gc.ca/report/report_e.html?mode=Graph&type=h2oArc&stn=01AF002&dataType=Daily&parameterType=Flow&year=2013&scale=normal)",
"[01AP002](http://wateroffice.ec.gc.ca/report/report_e.html?mode=Graph&type=h2oArc&stn=01AP002&dataType=Daily&parameterType=Flow&year=2013&scale=normal)",
"[01AP004](http://wateroffice.ec.gc.ca/report/report_e.html?mode=Graph&type=h2oArc&stn=01AP004&dataType=Daily&parameterType=Flow&year=2013&scale=normal)"
), class = "factor"), `Water Quantity` = structure(c(1L, 1L,
1L), .Label = "Normal", class = "factor"), `Water Flow` = structure(c(2L,
1L, 1L), .Label = c(" False", " True"), class = "factor"), `Sampling Frequency` = structure(c(1L,
1L, 1L), .Label = " Continuous", class = "factor"), `Drainage Area` = c(21900L,
668L, 1100L), PEARSEDA = structure(c(1L, 1L, 1L), .Label = "Saint John - St. Croix", class = "factor")), .Names = c("Station Number",
"Water Quantity", "Water Flow", "Sampling Frequency", "Drainage Area",
"PEARSEDA"), class = "data.frame", row.names = c(NA, -3L))
The original table in the rmarkdown document (as I would like it to appear when there are links associated with the station names):
Is there a way to put links in a table without the extra spacing it seems to create in the links column?
Thanks for any suggestions.
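Not an authoritative answer, but one approach worth trying: build the links with kableExtra::cell_spec() instead of markdown link syntax and pin the column width with column_spec() when the table is rendered as LaTeX. A sketch under those assumptions, with the URL pattern taken from the link-converted table above:
library(knitr)
library(kableExtra)

stn <- as.character(local.select$Station)
url <- paste0("http://wateroffice.ec.gc.ca/report/report_e.html?mode=Graph&type=h2oArc&stn=",
              stn, "&dataType=Daily&parameterType=Flow&year=2013&scale=normal")
local.select$Station <- cell_spec(stn, "latex", link = url)

kable(local.select, format = "latex", booktabs = TRUE, escape = FALSE,
      row.names = FALSE) %>%
  column_spec(1, width = "6em")   # constrain the link column so the rest of the table fits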
