Use Rcartogram on a SpatialPolygonsDataFrame object

I'm trying to do the same thing asked in this question, Cartogram + choropleth map in R, but starting from a SpatialPolygonsDataFrame and hoping to end up with the same type of object.
I could save the object as a shapefile, use ScapeToad, reopen the result and convert back, but I'd rather have it all within R so that the procedure is fully reproducible and so that I can code dozens of variations automatically.
I've forked the Rcartogram code on github and added my efforts so far here.
Essentially what this demo does is create a SpatialGrid over the map, look up the population density at each point of the grid and convert this to a density matrix in the format required for cartogram() to work on. So far so good.
But, how to interpolate the original map points based on the output of cartogram()?
There are two problems here. The first is to get the map and grid into the same units so that interpolation is possible. The second is to access every point of every polygon, interpolate it, and keep everything in the right order.
The grid is in grid units and the map is in map units (longlat in the example). Either the grid must be projected into longlat, or the map into grid units. My thought is to make a fake CRS and use it with the spTransform() function in the rgdal package, since this handles every point in the object with minimal fuss.
Accessing every point is difficult because they are several layers down into the SpPDF object: object>polygons>Polygons>lines>coords I think. Any ideas how to access these while keeping the structure of the overall map intact?
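For concreteness, the two pieces I have in mind look roughly like this (a sketch only; map stands for the SpatialPolygonsDataFrame and the CRS string is just a placeholder, not the real grid CRS):
library(sp)
library(rgdal)
# (1) re-project the whole map in one call; the fake CRS would be built to match the grid
map_grid <- spTransform(map, CRS("+proj=longlat +datum=WGS84"))
# (2) walk every vertex of every polygon while leaving the sp structure intact
for (i in seq_along(map@polygons)) {
  for (j in seq_along(map@polygons[[i]]@Polygons)) {
    crds <- map@polygons[[i]]@Polygons[[j]]@coords  # n x 2 matrix of vertices
    # ... interpolate crds against the cartogram output here ...
    map@polygons[[i]]@Polygons[[j]]@coords <- crds
  }
}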

This problem can be solved with the getcartr package, available on Chris Brunsdon's GitHub, as beautifully explicated in this blog post.
The quick.carto function does exactly what you want -- takes a SpatialPolygonsDataFrame as input and has a SpatialPolygonsDataFrame as output.
Reproducing the essence of the example in the blog post here in case the link goes dead, with my own style mixed in & typos fixed:
(Shapefile; World Bank population data)
library(getcartr)
library(maptools)
library(data.table)
world <- readShapePoly("TM_WORLD_BORDERS-0.3.shp")
#I use data.table, see blog post if you want a base approach;
# data.table wonks may be struck by the following step as seeming odd;
# see here: http://stackoverflow.com/questions/32380338
# and here: https://github.com/Rdatatable/data.table/issues/1310
# for some background on what's going on.
world@data <- setDT(world@data)
world.pop <- fread("sp.pop.totl_Indicator_en_csv_v2.csv",
                   select = c("Country Code", "2013"),
                   col.names = c("ISO3", "pop"))
world@data[world.pop, Population := as.numeric(i.pop), on = "ISO3"]
#calling quick.carto has internal calls to the
# necessary functions from Rcartogram
world.carto <- quick.carto(world, world$Population, blur = 0)
#plotting with a color scale
x <- world@data[!is.na(Population), log10(Population)]
ramp <- colorRampPalette(c("navy", "deepskyblue"))(21L)
xseq <- seq(from = min(x), to = max(x), length.out = 21L)
#annoying to deal with NAs...
cols <- ramp[sapply(x, function(y)
  if (length(z <- which.min(abs(xseq - y)))) z else NA)]
plot(world.carto, col = cols,
     main = paste0("Cartogram of the World's",
                   " Population by Country (2013)"))
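Since quick.carto hands back a SpatialPolygonsDataFrame, the result can be dropped straight into the rest of an sp-based workflow; as a small follow-up sketch (the sf conversion is just one optional route, not part of the original post):
class(world.carto)  # "SpatialPolygonsDataFrame"
# optional: convert to sf for downstream code that expects sf objects
world.carto_sf <- sf::st_as_sf(world.carto)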

Related

Normalizing an R stars object by grid area?

first post :)
I've been transitioning my R code from sp to sf/stars, and one thing I'm still trying to grasp is accounting for cell area in my grids.
Here's an example code to explain what I mean.
library(stars)
library(tidyverse)
# Reading in an example tif file, from stars() vignette
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
x
# Get areas for each grid of the x object. Returns stars object with "area" in units of [m^2]
x_area <- st_area(x)
x_area
I tried loosely adapting code from this vignette (https://github.com/r-spatial/stars/blob/master/vignettes/stars5.Rmd) to divide each value in x by its grid-cell area, and it's not working as expected (perhaps because my objects are stars and not sf?).
x$test1 = x$L7_ETMs.tif / x_area # Some computationally intensive calculation seems to happen, but doesn't produce the results I expect?
x$test1 = x$L7_ETMs.tif / x_area$area # Throws error, "non-conformable arrays"
What does seem to work is the following.
x %>%
  mutate(test1 = L7_ETMs.tif / units::set_units(as.numeric(x_area$area), m^2))
Here are the concerns I have with this code.
I worry that as I turn x_area$area (a matrix of areas in lat/lon) into a numeric vector, I may mess up the matching between each grid cell and its area. I did some rough testing to see whether the areas match up the way I expect them to, but I can't escape the worry that this could lead to errors that are difficult to catch.
It just doesn't seem clean that I start with x_area in the correct units, only to strip and then re-set the units during the computation.
Can someone suggest a "cleaner" implementation for what I'm trying to do, i.e. multiplying or dividing a grid by its cell areas while maintaining units throughout? Or convince me that the code I have is fine?
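For what it's worth, the matching I'm relying on comes down to R's column-major recycling: x varies fastest in both the data cube and the area matrix, so flattening the areas with as.numeric() should line up with each band in turn. A minimal shape check (a sketch, nothing more):
# x has dimensions (x, y, band); x_area has only (x, y) -- the spatial dims should agree
dim(x$L7_ETMs.tif)
dim(x_area$area)
stopifnot(dim(x$L7_ETMs.tif)[1:2] == dim(x_area$area))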
Thanks!
I do not know how to improve the stars code, but you can compare the results you get with this
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
library(terra)
r <- rast(tif)
a <- cellSize(r, sum=FALSE)
x <- r / a
With planar data, when it is safe to assume there is no distortion (generally not the case, but it can be), you could simply do
y <- r / prod(res(r))
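To eyeball how this compares with the stars computation from the question (note that this snippet reuses the name x for the terra result), a couple of quick summaries can be printed; global() is terra's summary function, and the rest is just a sketch:
# mean and max of the per-area values computed above
global(x, "mean", na.rm = TRUE)
global(x, "max", na.rm = TRUE)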

How can I get the same piece (duplicate code) of an image from many different photos every time?

From 5000 photos of license plates I want to determine which duplicate code these license plates have.
Here are 2 examples of a duplicate code on a license plate.
In the first example the duplicate code is 2 and in the second example the duplicate code is 1.
With the magick and tesseract packages (see the code below) I was able to crop the part of the first example photo where the duplicate code sits and read it. But the second example and the other photos are different, so the same crop does not work for them.
So I am looking for something that can recognize where the duplicate code is and that will read the duplicate code. Note: The duplicate code is always above the 1st indent mark.
Does someone have an idea how to read the duplicate code automatically from 5000 different photos?
library(magick)
library(tesseract)
#Load foto:
foto <- image_read("C:/Users/camie/OneDrive/Documenten/kenteken3.jpg")
#Get piece of photo where duplicate code is retrieved:
foto2 <- image_crop(foto,"10X24-620-170")
#read duplicate code:
cat(ocr(foto2))
Here is an approach based on the package EBImage. ImageMagick is great for image manipulation, but I think EBImage provides more quantitative tools that are useful here. As with all image processing, the quality of the input image matters a great deal. The approach suggested here would likely benefit from noise and artifact removal, scaling, and possibly cropping.
Also, some licenses seem to have additional symbols in the position of interest that are not numbers. Clearly more pre-processing and filtering are needed for such cases.
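As a concrete (optional and untested) illustration of the scaling step mentioned above, EBImage's resize() could be slotted in right after the image is read in the code below:
# pre-processing sketch: bring the input to roughly 300 px wide before filtering
# (img0 is the object created by readImage() further down)
img0 <- resize(img0, w = 300)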
Sample image
# Starting from EBImage
if (!require(EBImage)) {
  source("http://bioconductor.org/biocLite.R")
  biocLite("EBImage")
  library(EBImage)
}
# Test images
# setwd(<image directory>)
f1 <- "license1.jpg"
f2 <- "license2.jpg"
# Read image and convert to normalized greyscale
img0 <- readImage(f1)
img <- channel(img0, "grey")
img <- normalize(img)
# plot(img) # insert plot or display commands as desired
# Rudimentary image process for ~300 pixel wide JPEG
xmf <- medianFilter(img, 1)
xgb <- gblur(xmf, 1)
xth <- xgb < otsu(xgb) # Otsu's algorithm to determine best threshold
xto <- opening(xth, makeBrush(3, shape = "diamond"))
A binary (thresholded) image has been produced and cleaned up to identify objects as shown here.
# Create object mask with unique integer for each object
xm <- bwlabel(xto)
# plot(colorLabels(xm)) # optional code to visualize the objects
In addition to the rudimentary image process, some "object processing" can be applied as shown here. Objects along the edge are not going to be of interest so they are removed. Similarly, artifacts that give rise to horizontal (wide) streaks can be removed as well.
# Drop objects touching the edge
nx <- dim(xm)[1]
ny <- dim(xm)[2]
sel <- unique(c(xm[1,], xm[nx,], xm[,1], xm[,ny]))
sel <- sel[sel != 0]
xm <- rmObjects(xm, sel, reenumerate = TRUE)
# Drop exceptionally wide objects (33% of image width)
major <- computeFeatures.moment(xm)[,"m.majoraxis"]
sel <- which(major > nx/3)
xm <- rmObjects(xm, sel, reenumerate = TRUE)
The following logic identifies the center of mass for each object with the computeFeatures.moment function of EBImage. It seems that the main symbols will be along a horizontal line while the candidate object will be above that line (lower y-value in EBImage Image object). An alternative approach would be to find objects stacked on one another, i.e., objects with similar x-values.
For the examples I explored, one standard deviation away from the median y-value of the centers of mass appears to be sufficient to identify the candidate object. This is used to determine the limit shown below. Of course, this logic should be adjusted as dictated by the actual data.
# Determine center of mass for remaining objects
M <- computeFeatures.moment(xm)
x <- M[,1]
y <- M[,2]
# Show suggested limit on image (y coordinates are inverted)
plot(img)
limit <- median(y) - sd(y)
abline(h = limit, col = "red")
# Show centers of mass on original image
ok <- y < limit
points(x[!ok], y[!ok], pch = 16, col = "blue")
points(x[ok], y[ok], pch = 16, col = "red")
The image shows the segmented objects after having discarded objects along the edge. Red shows the candidate, blue shows the non-candidates.
Because some licenses have two symbols above the dash, the following code selects the leftmost of possible candidates, expands the object mask and returns a rectangular crop of the image that can be passed to ocr().
# Accept leftmost (first) of candidate objects
left <- min(x[which(ok)])
sel <- which(x == left)
# Enlarge object mask and extract the candidate image
xm <- dilate(xm, makeBrush(7, "disc"))
ix <- range(apply(xm, 2, function(v) which(v == sel)))
iy <- range(apply(xm, 1, function(v) which(v == sel)))
xx <- ix[1]:ix[2]
yy <- iy[1]:iy[2]
# "Return" selected portion of image
ans <- img[xx, yy] # this is what can be passed to tesseract
plot(ans, interpolate = FALSE)
Here is the unscaled and extracted candidate image from example 1:
Another sample image
The same code applied to this example gives the following:
With a few more checks for errors and for illogical conditions, the code could be assembled into a single function and applied to the list of 5000 files! But of course that assumes they are properly formatted, etc.
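A rough sketch of such a wrapper; extract_code() is a hypothetical helper that would bundle the segmentation steps above and return something ocr() can consume (for example, the path to a temporary PNG of the cropped candidate), or NULL when no candidate is found:
library(tesseract)
# hypothetical driver loop over the 5000 files
files <- list.files("plates", pattern = "\\.jpg$", full.names = TRUE)
codes <- vapply(files, function(f) {
  crop <- tryCatch(extract_code(f), error = function(e) NULL)
  if (is.null(crop)) NA_character_ else ocr(crop)
}, character(1))
table(codes, useNA = "ifany")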
Given the existence of multiple layouts for Dutch license plates, I'm not sure you can just hardcode a method to extract the duplicate code. You also don't mention whether every image always has the same quality and/or orientation/scale/skew, etc.
In theory you could apply a convolutional neural network that classifies license plates into several categories (0 for n/a, 1 for 1, 2 for 2, etc.). However, I am not familiar with the related packages in R, so I won't be able to point you to one.

Fit a bezier curve to spatial lines objects in R

I'm trying to make a flow map in R, which so far I've managed to do, but because my map only covers the area of one country, gcIntermediate from the geosphere package creates spatial lines for me that show no visible curve.
I thought maybe I could add a bezier curve to my lines, but I'm having zero luck with working out how I might do that.
long <- runif(10000, 49.92332, 55.02101) #Random co-ordinates
lat <- runif(10000, -6.30217, 1.373248) # Random co-ordinates
df <- as.data.frame.matrix(data.frame(Lat.1 = sample(lat, 10),
                                      Long.1 = sample(long, 10),
                                      Lat.2 = sample(lat, 10),
                                      Long.2 = sample(long, 10))) # Dataframe of flow beginning to flow end
lines <- gcIntermediate(df[,c("Long.1", "Lat.1")], df[,c("Long.2", "Lat.2")], 500, addStartEnd = TRUE) #Create spatial lines with the geosphere package
plot(lines) #Some very straight lines
My problem comes when setting a start and end point for the bezier line: the function in the bezier package only seems to accept one value for the start and one for the end, and since each point needs two values (long, lat) to define it, I'm a bit stumped.
I won't bore you with all of the different things I've tried with the bezier package (none of them worked), but here are a couple of attempts that didn't work:
bezier(sep(0,1,100), lines, lines$Long.1~lines$Lat.1, lines$Long.2~lines$Lat.2) # Won't accept a line object and I don't think Long.1 etc exist anymore
bezier(sep(0,1,100), df, df$Long.1~df$Lat.1, df$Long.2~df$Lat.2) #Hoped that if I used a formula syntax I could combine the long/lat of the starting and ending points respectively (I can't)
Has anyone got any insight on this? It's quite frustrating being so close and yet so far.
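One direction that seems workable (an untested sketch, not an accepted answer): bezier() accepts a matrix of control points with one row per point, so the start, a bent midpoint and the end can be passed together, and the resulting coordinates wrapped back into sp lines. The bend factor below is arbitrary.
library(sp)
library(bezier)
# hypothetical helper: one curved line per origin-destination pair
curved_line <- function(lon1, lat1, lon2, lat2, n = 100, bend = 0.2) {
  mid <- c((lon1 + lon2) / 2, (lat1 + lat2) / 2)
  off <- c(-(lat2 - lat1), lon2 - lon1) * bend # offset the midpoint perpendicular to the chord
  ctrl <- rbind(c(lon1, lat1), mid + off, c(lon2, lat2)) # 3 control points, one per row
  bezier::bezier(seq(0, 1, length.out = n), ctrl) # n x 2 matrix of curve coordinates
}
ll <- lapply(seq_len(nrow(df)), function(i) {
  coords <- curved_line(df$Long.1[i], df$Lat.1[i], df$Long.2[i], df$Lat.2[i])
  Lines(list(Line(coords)), ID = as.character(i))
})
curves <- SpatialLines(ll)
plot(curves) # gently curved flow lines instead of straight ones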

Analyzing octopus catches with LinearK function in R [closed]

I hope you can help me with this problem, which I can't find a way to overcome. Sorry if I make some mistakes while writing this post; my English is a bit rusty right now.
Here is the question. I have .shp data that I want to analyze in R. The .shp files can be either lines that represent the lines of traps we set to catch octopuses, or points located directly over those lines, representing where we captured one.
The question I'm trying to answer is: are the octopuses statistically grouped or not?
After a bit of investigation it seems to me that I need to use R and its linearK function to answer that question, using the maptools, spatstat and sp packages.
Here is the code I'm using in RStudio:
Loading the libraries
library(spatstat)
library(maptools)
library(sp)
Creating a linnet object with the track
t1<- as.linnet(readShapeSpatial("./20170518/t1.shp"))
I get the following warning but it seems to work
Warning messages:
1: use rgdal::readOGR or sf::st_read
2: use rgdal::readOGR or sf::st_read
Plotting it to be sure everything is ok
plot(t1)
Creating a ppp object with the points
p1<- as.ppp(readShapeSpatial("./20170518/p1.shp"))
I get the same warning here, but the real problems start when I try to plot it:
> plot(p1)
Error in if (!is.vector(xrange) || length(xrange) != 2 || xrange[2L] < :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: Interpretation of arguments maxsize and markscale has changed (in spatstat version 1.37-0 and later). Size of a circle is now measured by its diameter.
2: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
3: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
4: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
5: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
6: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
7: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
Now what is left is to join the objects into an lpp object and analyze it with the linearK function.
> pt1 <- lpp(p1,t1)
> linearK(pt1)
Function value object (class ‘fv’)
for the function r -> K[L](r)
......................................
      Math.label      Description
r     r               distance argument r
est   {hat(K)[L]}(r)  estimated K[L](r)
......................................
Default plot formula: .~r
where “.” stands for ‘est’
Recommended range of argument r: [0, 815.64]
Available range of argument r: [0, 815.64]
This is my situation right now. What I don't know is why the plot function is not working with my ppp object, and how to interpret the return value of the linearK function. help(linearK) didn't provide any clue. Since I have a lot of tracks, each with its own set of points, my desired outcome would be some kind of summary like: x tracks analyzed, a grouped, b dispersed and c unknown.
Thank you for your time; I'll greatly appreciate it if you can help me solve this problem.
Edit: Here is a link to a zip file containing all the shp files of one day, both tracks and points, and a txt file with my code. https://drive.google.com/open?id=0B0uvwT-2l4A5ODJpOTdCekIxWUU
First two pieces of general advice: (1) each time you create a complicated object, print it at the terminal, to see if it is what you expected. (2) When you get an error, immediately type traceback() and copy the output. This will reveal exactly where the error is detected.
A ppp object must include a specification of the study region (window). In your code, the object p1 is created by converting data of class SpatialPointsDataFrame, which do not include a specification of the study region, via the function as.ppp.SpatialPointsDataFrame into an object of class ppp in which the window is guessed by taking the bounding box of the coordinates. Unfortunately, in your example, there is only one data point in p1, so the default bounding box is a rectangle of width 0 and height 0. [This would have been revealed by printing p1.] Such objects can usually be handled by spatstat, but this particular object triggers a bug in the function plot.solist, which expects windows to have non-zero size. I will fix the bug, but...
In your case, I suggest you do
Window(p1) <- Window(t1)
immediately after creating p1. This will ensure that p1 has the window that you probably intended.
If all else fails, read the spatstat vignette on shapefiles...
I have managed to find a solution. As Adrian Baddeley noticed there was a problem with the owin object. That problem seems to be bypassed (not really solved) if I create the ppp object in a manual way instead of converting my set of points.
I have also changed the readShapeSpatial function to rgdal::readOGR, since the former was deprecated, and that was the reason for the warnings I was getting.
This is the R script i'm using right now, commented to clarify:
#first install spatstat, maptools and sp
#load them
library(spatstat)
library(maptools)
library(sp)
#create an array of folders, will add more when everything works fine
folders=c("20170518")
for(f in folders){
  #read all shp from that folder, both points and tracks
  pointfiles <- list.files(paste("./",f,"/points", sep=""), pattern="*.shp$")
  trackfiles <- list.files(paste("./",f,"/tracks", sep=""), pattern="*.shp$")
  #for each point and track couple
  for(i in 1:length(pointfiles)){
    #create a linnet object with the track
    t <- as.linnet(rgdal::readOGR(paste("./",f,"/tracks/",trackfiles[i], sep="")))
    #plot(t)
    #create a ppp object for each set of points
    pre_p <- rgdal::readOGR(paste("./",f,"/points/",pointfiles[i], sep=""))
    #plot(p)
    #obtain the coordinates of the current set of points
    c <- coordinates(pre_p)
    #create vector of x coords
    xc <- c()
    #create vector of y coords
    yc <- c()
    #not a very good way to fill my vectors but it works for my study area
    for(v in c){
      if(v > 4000000){
        yc <- c(yc, v)
      } else {
        if(v < 4000000 && v > 700000){xc <- c(xc, v)}
      }
    }
    print(xc)
    print(yc)
    #create a ppp object using the vectors of x and y coords, and a window object
    #extracted from my set of points
    p <- ppp(xc, yc, Window(as.ppp(pre_p)))
    #join them into an lpp object
    pt <- lpp(p, t)
    #plot(pt)
    #analyze it with the linearK function, nsim=9 for testing purposes
    #envelope.lpp is the method for analyzing linear point patterns
    assign(paste("results", f, i, sep="_"), envelope.lpp(pt, nsim=9, fun=linearK))
  }#end for each points & track set
}#end for each day of study
So as you can see, this script tests each pair of points and track for CSR for each day, and it is working fine right now. Unfortunately I have not managed to create a report (or anything report-like) with the results yet, or even to fully understand them; I'll keep working on that. Of course I can use any advice you have, since this is my first try with R and many newbie mistakes will happen.
The script and the shp files with the updated folder structure can be found here (113 KB).
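A rough starting point for that summary (a sketch only; it assumes the default envelope columns obs, lo and hi and the results_* objects created by the loop above):
# flag, for each stored envelope, whether the observed K ever leaves the simulation band
res_names <- ls(pattern = "^results_")
summary_tab <- data.frame(
  name = res_names,
  grouped = vapply(res_names, function(nm) {
    e <- get(nm)
    any(e$obs > e$hi, na.rm = TRUE) # above the envelope -> clustering at some r
  }, logical(1)),
  dispersed = vapply(res_names, function(nm) {
    e <- get(nm)
    any(e$obs < e$lo, na.rm = TRUE) # below the envelope -> regularity at some r
  }, logical(1))
)
print(summary_tab)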

R - original colours of georeferenced raster image using ggplot2- and raster-packages

I would like to use the original colortable of a georeferenced raster image (tif file) as the colour scale in a map plotted with ggplot2.
Not having found an easier solution, I accessed the colortable slot from the legend attribute of the loaded raster object raster1 like so:
raster1 <- raster(paste(workingDir, "/HUEK200_Durchlaessigkeit001_proj001.tif", sep="", collapse=""))
raster1.pts <- rasterToPoints(raster1)
raster1.df <- data.frame(raster1.pts)
colTab <- attr(raster1, "legend")@colortable
Ok, so far so good. Now I simply need to apply colortable as a colored scale to my existing plot:
(ggplot(data=raster1.df)
+ geom_tile(aes(x, y, fill=raster1.df[[3]]))
+ scale_fill_gradientn(values=1:length(colTab), colours=colTab, guide=FALSE)
+ coord_fixed(ratio=1)
)
Unfortunately, this does not work as expected. The resulting image does not show any colors besides white and the typical ggplot grey that often appears when no custom values are defined. At the moment I am a little clueless about what is actually wrong here. I assumed that the underlying band values stored in raster1.df[[3]] are indices into the color table. This might be wrong. If it is wrong, then how are the band values connected with the colortable? And even if my assumption were right: the parameters which I have given to scale_fill_gradientn() should still result in a more colorful plot, shouldn't they? I checked what the unique values are:
sort(unique(raster1.df[[3]]))
This outputs:
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12
Apparently, not all of the 256 members of the colortable are used, which reminds me that the colors do not always need to reflect the underlying band-data distribution (especially when multiple bands are included).
I hope my last thoughts didn't obscure the fact that the objective is quite straightforward.
Thank you for your help!
Ok, I have found an answer which might not apply to every georeferenced raster image out there, but maybe almost.
First, my assumption that the data values do not exactly represent the color selection was wrong. There are 15 unique colors in the colortable of the spatial raster object. However, not all of them are used (14 and 15). Ok, now I know I have to map my values to the corresponding colors in a way that scale_fill_gradientn understands. For this I am using my previous initial code snippet and define a new variable valTab which stores all unique data values of the given band:
raster1 <- raster(paste(workingDir, "/HUEK200_Durchlaessigkeit001_proj001.tif", sep="", collapse=""))
raster1.pts <- rasterToPoints(raster1)
raster1.df <- data.frame(raster1.pts)
raster1.img <- melt(raster1)
colTab <- attr(raster1, "legend")@colortable
names(colTab) <- 0:(length(colTab) - 1)
valTab <- sort(unique(raster1.df[[3]]))
Notice how index names are defined for colTab; this will be important soon. With this, I am able to automatically relate each active color to its respective value while plotting:
(ggplot(data=raster1.df)
+ geom_tile(aes(x, y, fill=raster1.df[[3]]))
+ scale_fill_gradientn(colours=colTab[as.character(valTab)])
+ coord_fixed(ratio=1)
)
Using the valTab members as references to the corresponding color indices ensures that only the colors which are actually needed get picked. I don't know whether defining the values parameter of scale_fill_gradientn() is necessary in some cases.
I am not sure if the raster images read by raster() always define their values starting from 0. If not, names(colTab) <- 0:(length(colTab) - 1) needs to be adjusted.
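If that ever turns out to be the case, a hedged adjustment (untested) would be to offset the names by the band's minimum value:
# shift the colour-table names so they start at the band's smallest value
offset <- raster::minValue(raster1)
names(colTab) <- seq(from = offset, length.out = length(colTab))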
I hope this helps somebody in the future. At least I finally have a solution!
