"CRS object has comment, which is lost in output" in krige - r

I am trying to krige a water quality dataset with latitude and longitude coordinates, using CRS("+init=epsg:4326").
GGT <- read.csv("C:/Users/user/Data/newdata2019.csv")
coordinates(GGT) = ~Lon+Lat
GGT <- st_as_sf(GGT)
st_crs(GGT) <- 4326
GGTgrid <- readOGR('C:/Users/user/Desktop/FisheryScience/Data/Maps/GGTgrid.shp')
GGTgrid1 <- st_as_stars(GGTgrid, crs = 4326)
st_crs(GGTgrid1) <- 4326
-This is what GGT dataset looks like
vario <- variogram(log(DO_S)~1, GGT)
model_GGT <- fit.variogram(vario, model=vgm(psill = 1, model= 'Sph', range= 200, nugget =1))
plot(vario, model = model_GGT)
It seems to work fine up to here.
Then I get an error when I run the code below:
krige_result <- krige(formula = log(DO_S)~1, GGT, GGTgrid1, model = model_GGT)
with a long series of warnings:
Warning message in proj4string(obj):
"CRS object has comment, which is lost in output"
Warning message in proj4string(obj):
"CRS object has comment, which is lost in output"
Warning message in proj4string(obj):

The warnings indicate that your work may be affected by the changes introduced with PROJ 6 (and GDAL 3), which have been adopted by R-spatial and rspatial. You can find all the details at these two links:
https://rgdal.r-forge.r-project.org/articles/CRS_projections_transformations.html
https://r-spatial.org/r/2020/03/17/wkt.html
To make these warning messages disappear, use objects of class sf and stars (by installing and loading the packages of the same names), which take these recent changes into account. I suggest using the following lines of code at the beginning of your script in place of your first lines of code. This will give you two objects: GGT of class sf and GGTgrid1 of class stars.
GGT <- read.csv("C:/Users/user/Data/newdata2019.csv")
coordinates(GGT) = ~Lon+Lat
GGT <- st_as_sf(GGT)
st_crs(GGT) <- 4326
GGTgrid <- readOGR('C:/Users/user/Desktop/maps/GGTgrid.shp')
# readOGR() already returns a Spatial* object, so coordinates() need not be set again
GGTgrid1 <- st_as_stars(GGTgrid)
st_crs(GGTgrid1) <- 4326
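As a hedged alternative to going through sp first, the points can be built as an sf object directly from the data frame. This is only a sketch: the small data frame below is made-up stand-in data (the real CSV is not available), and the column names Lon, Lat and DO_S are taken from the question.

```r
library(sf)

# Made-up stand-in for read.csv("C:/Users/user/Data/newdata2019.csv")
GGT_df <- data.frame(
  Lon  = c(100.10, 100.20, 100.30),
  Lat  = c(13.10, 13.20, 13.15),
  DO_S = c(5.1, 6.2, 5.8)
)

# Build the sf object in one step, declaring the CRS (EPSG:4326) at creation
GGT <- st_as_sf(GGT_df, coords = c("Lon", "Lat"), crs = 4326)

class(GGT)
st_crs(GGT)$epsg
```

Because no sp object is created along the way, proj4string() is never called on these points, which is where the warnings in the question originate.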
It is easier for me to work on real data than to work "virtually" with the name of your objects as I don't have your original files. So I prefer to show you how to proceed for your analysis with the "meuse" data contained in the sp package.
By analogy with the reprex I give you, I think you should be able to manage with your own files. And you will see, no more warning message will appear :-)
Please, find below my reprex.
Reprex
Loading the library and the data
library(sp)
library(sf)
library(stars)
library(gstat)
data(meuse) # loading the data (equivalent of your csv file)
coordinates(meuse) = ~x+y # you already know this step ;-)
# Just a look to the class of original data
class(meuse)
#> [1] "SpatialPointsDataFrame"
#> attr(,"package")
#> [1] "sp" # "meuse" is an object of class 'sp'
data(meuse.grid) # loading the data (equivalent of your shp file)
gridded(meuse.grid) = ~x+y
# Just a look to the class of original data
class(meuse.grid)
#> [1] "SpatialPixelsDataFrame"
#> attr(,"package")
#> [1] "sp" # "meuse.grid" is an object of class 'sp'
Converting the meuse data into sf object and the meuse.grid data into stars object
# Convert 'sp' object 'meuse' (i.e. SpatialPointsDataFrame) into 'sf' object
meuse <- st_as_sf(meuse)
class(meuse)
#> [1] "sf" "data.frame" # meuse is indeed of class 'sf'
# Convert 'sp' object 'meuse.grid' (i.e. SpatialPixelsDataFrame) into 'stars' object
meuse.grid <- st_as_stars(meuse.grid)
class(meuse.grid)
#> [1] "stars" # meuse.grid is indeed of class 'stars'
Compute and plot the variogram
vario <- variogram(log(zinc)~1, meuse)
model_meuse <- fit.variogram(vario, model = vgm(psill = 1, model = "Sph", range = 200, nugget = 1))
plot(vario, model = model_meuse)
Krige and plot predictions and variances
krige_result <- krige(formula = log(zinc)~1, meuse, meuse.grid, model = model_meuse)
#> [using ordinary kriging]
class(krige_result)
#> [1] "stars"
krige_result
#> stars object with 2 dimensions and 2 attributes
#> attribute(s):
#>                  Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
#> var1.pred  4.77655207 5.2376428 5.5728908 5.7072284 6.1717618 7.4399908 5009
#> var1.var   0.08549102 0.1372838 0.1621815 0.1853301 0.2116141 0.5002793 5009
#> dimension(s):
#>   from  to offset delta refsys point values x/y
#> x    1  78 178440    40     NA    NA   NULL [x]
#> y    1 104 333760   -40     NA    NA   NULL [y]
plot(krige_result[1]) # plot predictions
plot(krige_result[2]) # plot variances
Created on 2021-10-19 by the reprex package (v2.0.1)
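In the printed result above, refsys is NA because the meuse objects were never given a CRS. The sketch below shows one way to declare it on both inputs before kriging. It assumes EPSG:28992 (the Dutch RD New system that the sp documentation gives for meuse); the object names meuse_sf and meuse_stars are illustrative only.

```r
library(sp)
library(sf)
library(stars)

data(meuse)
data(meuse.grid)

# Points: declare the CRS while converting the data frame to sf
meuse_sf <- st_as_sf(meuse, coords = c("x", "y"), crs = 28992)

# Grid: convert to stars as in the reprex, then set the same CRS
gridded(meuse.grid) <- ~x+y
meuse_stars <- st_as_stars(meuse.grid)
st_crs(meuse_stars) <- 28992

# Both inputs should now report the same reference system
st_crs(meuse_sf) == st_crs(meuse_stars)
```

With your own data the equivalent step is the st_crs(...) <- 4326 assignment you already have; the point is only that observations and prediction grid should carry the same CRS.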

Related

Reading large shapefile in R using sf package

I have a shapefile of points with 32M observations. I want to load it in R and I have tried read_sf and st_read, but my R session keeps crashing. Another approach that came to mind was to write a for loop that selects only the columns I want, reads a fixed number of rows at a time, and then rbinds the pieces, but I cannot figure out how to make R understand the query. Here's what I have so far, which is not working:
for (i in 1:10) {
j = i-1
jj = i+1
print(i)
print(j)
print(jj)
A <- read_sf("C:\\Users\\...parcels-20210802T125336Z-001\\parcels\\join_L3_Mad_Addresses.shp", query = "SELECT FID, CENTROID_I, LOC_ID FROM join_L3_Mad_Addresses WHERE FID < "jj" AND FID > "j"")
}
I think you can adapt the following code.
Load packages
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
Define path to a shapefile
dsn <- system.file("shape/nc.shp", package="sf")
Count the number of features in dsn
st_layers(dsn, do_count = TRUE)
#> Driver: ESRI Shapefile
#> Available layers:
#> layer_name geometry_type features fields
#> 1 nc Polygon 100 14
Start a for loop to read 10 features at a time. Add the data to a list
shp_data_list <- list()
i <- 1
for (offset in seq(10, 100, by = 10)) {
query <- paste0("SELECT * FROM nc LIMIT ", 10, " OFFSET ", offset - 10)
shp_data_list[[i]] <- st_read(dsn, query = query, quiet = TRUE)
gc(verbose = FALSE)
i <- i + 1
}
Rbind the objects
shp_data <- do.call(rbind, shp_data_list)
Add an ID column (just for plotting)
shp_data$ID <- as.character(rep(1:10, each = 10))
plot(shp_data["ID"])
The only problem is that this process may not preserve the geometry type. For example,
unique(st_geometry_type(shp_data))
#> [1] MULTIPOLYGON POLYGON
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
while
unique(st_geometry_type(st_read(dsn, quiet = TRUE)))
#> [1] MULTIPOLYGON
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
You can change the geometry type with st_cast()
Created on 2021-08-03 by the reprex package (v2.0.0)
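The loop above can be wrapped into a small function that looks up the feature count itself and finishes with st_cast(). This is a minimal sketch, not part of sf: read_in_chunks is a hypothetical helper name, and it uses the nc.shp file shipped with sf as a stand-in for the real shapefile.

```r
library(sf)

# Hypothetical helper (not an sf function): read a layer chunk by chunk
# using OGR SQL LIMIT/OFFSET, then bind the pieces together.
read_in_chunks <- function(dsn, layer, chunk_size = 10) {
  lyr <- st_layers(dsn, do_count = TRUE)
  n <- lyr$features[lyr$name == layer]          # number of features in the layer
  offsets <- seq(0, n - 1, by = chunk_size)     # 0, chunk_size, 2 * chunk_size, ...
  chunks <- lapply(offsets, function(off) {
    query <- sprintf("SELECT * FROM %s LIMIT %d OFFSET %d",
                     layer, as.integer(chunk_size), as.integer(off))
    st_read(dsn, query = query, quiet = TRUE)
  })
  do.call(rbind, chunks)
}

dsn <- system.file("shape/nc.shp", package = "sf")
nc_all <- read_in_chunks(dsn, "nc", chunk_size = 25)

# Restore a single geometry type after binding the chunks
nc_all <- st_cast(nc_all, "MULTIPOLYGON")
unique(st_geometry_type(nc_all))
```

For a very large file the bound result still has to fit in memory, so it is probably worth replacing SELECT * with only the columns that are needed, as the question already attempts.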

Plot the longest transcript in GenomicRanges with ggbio

I am trying to plot a specific region using ggbio. I am using the code below, which produces my desired output, except that it contains several transcripts. Is it possible to plot only the longest transcript? I've not been able to access the genomic ranges object within Homo.sapiens that I assume contains this information.
library(ggbio)
library(Homo.sapiens)
range <- GRanges("chr10" , IRanges(start = 78000000 , end = 79000000))
p.txdb <- autoplot(Homo.sapiens, which = range)
p.txdb
Here is a solution that involves filtering TxDb.Hsapiens.UCSC.hg19.knownGene on the longest transcript by gene_id (which does remove genes without gene_id):
suppressPackageStartupMessages({
invisible(lapply(c("ggbio", "biovizBase", "data.table",
"TxDb.Hsapiens.UCSC.hg19.knownGene",
"org.Hs.eg.db"),
require, character.only = TRUE))})
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
# retrieve transcript lengths
txlen <- transcriptLengths(txdb, with.utr5_len=TRUE, with.utr3_len=TRUE)
setDT(txlen)
txlen$len <- rowSums(as.matrix(txlen[, .(tx_len, utr5_len, utr3_len)]))
setkey(txlen, gene_id, len, tx_id)
# filter longest transcript by gene_id
ltx <- txlen[!is.na(gene_id)][, tail(.SD,1), by=gene_id]$tx_id
# filter txdb object
txb <- as.list(txdb)
txb$transcripts <- txb$transcripts[txb$transcripts$tx_id %in% ltx, ]
txb$splicings <- txb$splicings[txb$splicings$tx_id %in% ltx,]
txb$genes <- txb$genes[txb$genes$tx_id %in% ltx,]
txb <- do.call(makeTxDb, txb)
# plot according to vignette, chapter 2.2.5
range <- GRanges("chr10", IRanges(start = 78000000 , end = 79000000))
gr.txdb <- crunch(txb, which = range)
#> Parsing transcripts...
#> Parsing exons...
#> Parsing cds...
#> Parsing utrs...
#> ------exons...
#> ------cdss...
#> ------introns...
#> ------utr...
#> aggregating...
#> Done
colnames(values(gr.txdb))[4] <- "model"
grl <- split(gr.txdb, gr.txdb$gene_id)
symbols <- select(org.Hs.eg.db, keys=names(grl), columns="SYMBOL", keytype="ENTREZID")
#> 'select()' returned 1:1 mapping between keys and columns
names(grl) <- symbols[match(symbols$ENTREZID, names(grl), nomatch=0),"SYMBOL"]
autoplot(grl, aes(type = "model"), gap.geom="chevron")
#> Constructing graphics...
Created on 2020-05-29 by the reprex package (v0.3.0)
Edit:
To get gene symbols instead of gene (or transcript) ids, just replace the names of grl with the associated gene symbols, e.g. via org.Hs.eg.db, or any other resource that matches them up.
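The core of the answer is the "longest transcript per gene" filter, which can be hard to follow inside the full pipeline. Here is the same data.table idiom on a toy table; the values are invented for illustration and only the column names (tx_id, gene_id, len) follow the answer.

```r
library(data.table)

# Invented transcript lengths; the last transcript has no gene_id
txlen <- data.table(
  tx_id   = 1:5,
  gene_id = c("A", "A", "B", "B", NA),
  len     = c(100, 250, 300, 120, 90)
)

# Sorting by gene_id and then len puts the longest transcript last in each gene
setkey(txlen, gene_id, len, tx_id)

# Drop transcripts without a gene_id, then keep the last row of each gene
ltx <- txlen[!is.na(gene_id)][, tail(.SD, 1), by = gene_id]$tx_id
ltx
```

Here transcript 2 is kept for gene A and transcript 3 for gene B, while transcript 5 is dropped because it has no gene_id, which is the side effect the answer mentions.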

Add new node to SpatialLinesNetwork in stplanr

How can one add a new node to a SpatialLinesNetwork?
context of my problem: I have a shapefile of a bus route and another shapefile of bus stops. I want to calculate the distance between stops along the bus route. Ideally, each stop would be a node and I would use stplanr::sum_network_routes() to calculate the distance between them. The problem is that when I convert the bus route into a SpatialLinesNetwork the network only has a few nodes that are far from each other and unrelated to bus stops locations.
reproducible dataset:
# load library and data
library(stplanr)
library(sf)
# get road data
data(routes_fast)
rnet <- overline(routes_fast, attrib = "length")
# convert to sf obj
rnet <- st_as_sf(rnet)
# convert SpatialLinesDataFrame into SpatialLinesNetwork
sln <- SpatialLinesNetwork(rnet)
# identify nodes
sln_nodes = sln2points(sln)
# Here is a bus stop which should be added as a node
new_point_coordinates = c(-1.535, 53.809)
p = sf::st_sf(geometry = sf::st_sfc(sf::st_point(new_point_coordinates)), crs = st_crs(rnet))
# plot
plot(sln, col = "gray") # network
plot(sln_nodes, col="red", add = TRUE) # nodes
plot(p, add=T, col="blue") # stop to be added as a new node
This doesn't answer your question at the outset, but I believe it does resolve your "Context" by showing how your desired network distances can be calculated. This can be done with dodgr (latest dev version) like this:
library (dodgr)
library (stplanr)
library (sf)
library (sp)
dat <- st_as_sf (routes_fast)
net <- weight_streetnet (dat, wt_profile = 1)
The net object is a simple data.frame containing all edges and vertices of the network. Then adapt your code above to get the routing points as a simple matrix
rnet <- overline(routes_fast, attrib = "length")
SLN <- SpatialLinesNetwork(rnet)
sln_nodes = sln2points(SLN)
xy <- coordinates (sln_nodes)
colnames (xy) <- c ("x", "y")
Note that sln2points simply returns "nodes" (in stplanr terminology), which are junction points. You can instead replace these with the coordinates of the bus stops, or simply add those to this matrix. The following three lines convert those coordinates to unique (nearest) vertex IDs of the dodgr net object:
v <- dodgr_vertices (net)
pts <- match_pts_to_graph (v, xy)
pts <- v$id [pts]
To calculate distances between those pts on the network, just
d <- dodgr_dists (net, from = pts, to = pts)
Thanks for the question. Thanks to it and subsequent collaboration with Andrea Gilardi, I'm happy to announce that it is now possible to add new nodes to sfNetwork objects with a new function, sln_add_node().
See below and please try to test reproducible code that demonstrates how it works:
devtools::install_github("ropensci/stplanr")
#> Skipping install of 'stplanr' from a github remote, the SHA1 (33158a5b) has not changed since last install.
#> Use `force = TRUE` to force installation
library(stplanr)
#> Registered S3 method overwritten by 'R.oo':
#> method from
#> throw.default R.methodsS3
#> Warning in fun(libname, pkgname): rgeos: versions of GEOS runtime 3.7.1-CAPI-1.11.1
#> and GEOS at installation 3.7.0-CAPI-1.11.0differ
sample_routes <- routes_fast_sf[2:6, NULL]
sample_routes$value <- rep(1:3, length.out = 5)
rnet <- overline2(sample_routes, attrib = "value")
#> 2019-09-26 16:06:18 constructing segments
#> 2019-09-26 16:06:18 building geometry
#> 2019-09-26 16:06:18 simplifying geometry
#> 2019-09-26 16:06:18 aggregating flows
#> 2019-09-26 16:06:18 rejoining segments into linestrings
plot(sample_routes["value"], lwd = sample_routes$value, main = "Routes")
plot(rnet["value"], lwd = rnet$value, main = "Route network")
sln <- SpatialLinesNetwork(rnet)
#> Linking to GEOS 3.7.1, GDAL 2.4.0, PROJ 5.2.0
new_point_coordinates <- c(-1.540, 53.826)
crs <- sf::st_crs(rnet)
p <- sf::st_sf(geometry = sf::st_sfc(sf::st_point(new_point_coordinates)), crs = crs)
p_dest <- sln2points(sln)[9, ]
# We can identify the nearest point on the network at this point
# and use that to split the associated linestring:
sln_new <- sln_add_node(sln = sln, p = p)
#> although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
route_new <- route_local(sln = sln_new, from = p, to = p_dest)
plot(sln_new)
plot(p, add = TRUE)
plot(route_new, lwd = 5, add = TRUE)
#> Warning in plot.sf(route_new, lwd = 5, add = TRUE): ignoring all but the
#> first attribute
Created on 2019-09-26 by the reprex package (v0.3.0)
In case it's of use/interest, see the source code of the new small family of functions that support this new functionality here: https://github.com/ropensci/stplanr/blob/master/R/node-funs.R

How to create an interactive plot of GTFS data in R using Leaflet?

I would like to create an interactive map showing the public transport lines of a city. I am trying to do this using Leaflet in R (but I'm open to alternatives, suggestions?)
Data: The data of the transport system is in GTFS format, organized in text files (.txt), which I read into R as a data frame.*
The Problem: I cannot find how to indicate the id of each Poly line (variable shape_id) so the plot would actually follow the route of each transit line. Instead, it is connecting the dots in a random sequence.
Here is what I've tried, so far without success:
# Download GTFS data of the Victoria Regional Transit System
tf <- tempfile()
td <- tempdir()
ftp.path <- "http://www.gtfs-data-exchange.com/agency/bc-transit-victoria-regional-transit-system/latest.zip"
download.file(ftp.path, tf)
# Read text file to a data frame
zipfile <- unzip( tf , exdir = td )
shape <- read.csv(zipfile[9])
# Create base map
basemap <- leaflet() %>% addTiles()
# Add transit layer
basemap %>% addPolylines(lng=shape$shape_pt_lon, lat=shape$shape_pt_lat,
fill = FALSE,
layerId =shape$shape_id)
I would be glad to have your comments on this.
*I know it is possible to import this data into a GIS software (e.g. QGIS) to create a shapefile and then read the shapefile into R with readOGR. Robin Lovelace has shown how to do this. BUT, I am looking for a pure R solution. ;)
ps. Kyle Walker has written a great intro to interactive maps in R using Leaflet. Unfortunately, he doesn't cover poly lines in his tutorial.
Your problem is not one of method but of data: note that you download 8 MB and that the line file you try to load into Leaflet via shiny is 5 MB. As a general principle, you should always try new methods with tiny datasets first, before scaling them up. This is what I do below to diagnose the problem and solve it.
Stage 1: Explore and subset the data
pkgs <- c("leaflet", "shiny" # packages we'll use
, "maps" # to test antiquated 'maps' data type
, "maptools" # to convert 'maps' data type to Spatial* data
)
lapply(pkgs, "library", character.only = TRUE)
class(shape)
## [1] "data.frame"
head(shape)
## shape_id shape_pt_lon shape_pt_lat shape_pt_sequence
## 1 1-39-220 -123.4194 48.49065 0
## 2 1-39-220 -123.4195 48.49083 1
## 3 1-39-220 -123.4195 48.49088 2
## 4 1-39-220 -123.4196 48.49123 3
## 5 1-39-220 -123.4197 48.49160 4
## 6 1-39-220 -123.4196 48.49209 5
object.size(shape) / 1000000 # 5 MB!!!
## 5.538232 bytes
summary(shape$shape_id)
shape$shape_id <- as.character(shape$shape_id)
ids <- unique(shape$shape_id)
shape_orig <- shape
shape <- shape[shape$shape_id == ids[1],] # subset the data
Stage 2: Convert to a Spatial* object
Is this like the data.frame objects from maps?
state.map <- map("state", plot = FALSE, fill = TRUE)
str(state.map)
## List of 4
## $ x : num [1:15599] -87.5 -87.5 -87.5 -87.5 -87.6 ...
## $ y : num [1:15599] 30.4 30.4 30.4 30.3 30.3 ...
## $ range: num [1:4] -124.7 -67 25.1 49.4
## $ names: chr [1:63] "alabama" "arizona" "arkansas" "california" ...
## - attr(*, "class")= chr "map"
Yes, it's similar, so we can use map2Spatial* to convert it:
shape_map <- list(x = shape$shape_pt_lon, y = shape$shape_pt_lat)
shape_lines <- map2SpatialLines(shape_map, IDs = ids[1])
plot(shape_lines) # success - this plots a single line!
Stage 3: Join all the lines together
A for loop will do this nicely. Note we only use the first 10 lines. Use 2:length(ids) for all lines:
for(i in 2:10){
shape <- shape_orig[shape_orig$shape_id == ids[i],]
shape_map <- list(x = shape$shape_pt_lon, y = shape$shape_pt_lat)
shape_temp <- map2SpatialLines(shape_map, IDs = ids[i])
shape_lines <- spRbind(shape_lines, shape_temp)
}
Stage 4: Plot
Using the SpatialLines object makes the code a little shorter - this will plot the first 10 lines in this case:
leaflet() %>%
addTiles() %>%
addPolylines(data = shape_lines)
Conclusion
You needed to play around with the data and manipulate it before converting it into a Spatial* data type for plotting, with the correct IDs. maptools::map2Spatial*, unique() and a clever for loop can solve the problem.
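As a hedged alternative that skips the Spatial* conversion, each shape_id can be added as its own polyline after sorting its points by shape_pt_sequence. The sketch below uses a tiny invented shapes table with the GTFS column names shown above; it is an illustration, not the method used in the answer.

```r
library(leaflet)

# Invented stand-in for the GTFS shapes table, deliberately out of order
shape <- data.frame(
  shape_id          = c("r1", "r1", "r1", "r2", "r2", "r2"),
  shape_pt_lon      = c(-123.42, -123.40, -123.41, -123.37, -123.36, -123.35),
  shape_pt_lat      = c(48.49, 48.51, 48.50, 48.43, 48.44, 48.45),
  shape_pt_sequence = c(0, 2, 1, 0, 1, 2)
)

m <- leaflet() %>% addTiles()

# One addPolylines() call per route, with points in sequence order
for (id in unique(shape$shape_id)) {
  s <- shape[shape$shape_id == id, ]
  s <- s[order(s$shape_pt_sequence), ]
  m <- addPolylines(m, lng = s$shape_pt_lon, lat = s$shape_pt_lat, group = id)
}

m
```

As in the answer, it is sensible to try this on a handful of routes before looping over every shape_id in the full feed.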

Convert latitude and longitude coordinates to country name in R

I have a list of latitude and longitude coordinates, and wish to find out which country they all reside in.
I modified an answer from this question about lat-long to US states, and have a working function, but I run into the problem that the worldHires map (from the mapdata package) is hideously out of date and contains a lot of obsolete countries such as Yugoslavia and the USSR.
How would I modify this function to use a more modern package, such as rworldmap? I have only managed to frustrate myself so far...
library(sp)
library(maps)
library(rgeos)
library(maptools)
# The single argument to this function, points, is a data.frame in which:
# - column 1 contains the longitude in degrees
# - column 2 contains the latitude in degrees
coords2country = function(points)
{
# prepare a SpatialPolygons object with one poly per country
countries = map('worldHires', fill=TRUE, col="transparent", plot=FALSE)
names = sapply(strsplit(countries$names, ":"), function(x) x[1])
# clean up polygons that are out of bounds
filter = countries$x < -180 & !is.na(countries$x)
countries$x[filter] = -180
filter = countries$x > 180 & !is.na(countries$x)
countries$x[filter] = 180
countriesSP = map2SpatialPolygons(countries, IDs=names, proj4string=CRS("+proj=longlat +datum=wgs84"))
# convert our list of points to a SpatialPoints object
pointsSP = SpatialPoints(points, proj4string=CRS("+proj=longlat +datum=wgs84"))
# use 'over' to get indices of the Polygons object containing each point
indices = over(pointsSP, countriesSP)
# Return the country names of the Polygons object containing each point
myNames = sapply(countriesSP@polygons, function(x) x@ID)
myNames[indices]
}
##
## this works... but it has obsolete countries in it
##
# set up some points to test
points = data.frame(lon=c(0, 5, 10, 15, 20), lat=c(51.5, 50, 48.5, 47, 44.5))
# plot them on a map
map("worldHires", xlim=c(-10, 30), ylim=c(30, 60))
points(points$lon, points$lat, col="red")
# get a list of country names
coords2country(points)
# returns [1] "UK" "Belgium" "Germany" "Austria" "Yugoslavia"
# number 5 should probably be in Serbia...
Thanks for the carefully constructed question.
It required just a couple of line changes to be able to use rworldmap (containing up-to-date countries) see below. I'm not an expert on CRS but I don't think the change I had to make to the proj4string makes any difference. Others might like to comment on that.
This worked for me & gave :
> coords2country(points)
[1] United Kingdom Belgium Germany Austria
[5] Republic of Serbia
All the best,
Andy
library(sp)
library(rworldmap)
# The single argument to this function, points, is a data.frame in which:
# - column 1 contains the longitude in degrees
# - column 2 contains the latitude in degrees
coords2country = function(points)
{
countriesSP <- getMap(resolution='low')
#countriesSP <- getMap(resolution='high') #you could use high res map from rworldxtra if you were concerned about detail
# convert our list of points to a SpatialPoints object
# pointsSP = SpatialPoints(points, proj4string=CRS(" +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
#setting CRS directly to that from rworldmap
pointsSP = SpatialPoints(points, proj4string=CRS(proj4string(countriesSP)))
# use 'over' to get indices of the Polygons object containing each point
indices = over(pointsSP, countriesSP)
# return the ADMIN names of each country
indices$ADMIN
#indices$ISO3 # returns the ISO3 code
#indices$continent # returns the continent (6 continent model)
#indices$REGION # returns the continent (7 continent model)
}
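The same point-in-polygon lookup can also be sketched with sf, using st_join() in place of sp::over(). This swaps in a different package from the one used in the answer, and the two square "countries" below are invented toy polygons standing in for a real country layer such as the one returned by getMap().

```r
library(sf)

# Two invented square polygons standing in for country outlines
polys <- st_sf(
  name = c("West", "East"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))),
    st_polygon(list(rbind(c(10, 0), c(20, 0), c(20, 10), c(10, 10), c(10, 0)))),
    crs = 4326
  )
)

# Points to look up: column 1 longitude, column 2 latitude, as in the question
points <- data.frame(lon = c(5, 15, 25), lat = c(5, 5, 5))
pointsSF <- st_as_sf(points, coords = c("lon", "lat"), crs = 4326)

# Attach the attributes of the polygon that contains each point;
# points outside every polygon get NA
res <- st_join(pointsSF, polys, join = st_within)
res$name
```

The third point lies outside both polygons, so its name is NA, which mirrors what over() returns for points that fall in the sea.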
You can use my geonames package to lookup from the http://geonames.org/ service:
> GNcountryCode(51.5,0)
$languages
[1] "en-GB,cy-GB,gd"
$distance
[1] "0"
$countryName
[1] "United Kingdom of Great Britain and Northern Ireland"
$countryCode
[1] "GB"
> GNcountryCode(44.5,20)
$languages
[1] "sr,hu,bs,rom"
$distance
[1] "0"
$countryName
[1] "Serbia"
$countryCode
[1] "RS"
Get it from r-forge because I'm not sure I bothered to release it to CRAN:
https://r-forge.r-project.org/projects/geonames/
Yes, it depends on an external service, but at least it knows what happened to communism... :)
