Confusion about the mask() function in the raster package

I apologise in advance for the very basic nature of this question, but I'm confused about how the mask() function works in the raster package in R.
Reading the documentation for the function it sounds like cells in raster x are set to NA (the default) if these cells match a maskvalue in a mask object (the default maskvalue being NA). However, the description of the mask() function in the book Geocomputation with R by Lovelace et al. (https://geocompr.robinlovelace.net/spatial-operations.html#spatial-ras) (section 4.3.1) makes it sound as if cells in raster x are KEPT if they match a maskvalue in a mask object, and set to NA if they don't. They give this example:
mask(elev, rmask, maskvalue = TRUE)
"we only want to keep those values of elev which are TRUE in rmask"
Hence my confusion. I would be grateful if someone could clarify which interpretation is correct.
The reason I'd like to know is that I'd like to mask a raster containing MODIS data on percentage tree cover with a raster from the same MODIS product that contains data quality codes. I'd like to retain only those values in the "tree cover" raster that have "good quality" quality codes in the "quality" raster. Clarifying how the mask() function works will help me to determine whether I need to use code [1] or code [2] to achieve what I want:
[1]
good <- c(0,1,2,3,4,5...etc.) # The codes in the quality raster that represent good quality data
tree_cover_masked <- mask(tree_cover, quality, maskvalue = good, inverse = TRUE)
# i.e. set cells in tree_cover to NA if they match any value OTHER THAN the "good" values in the quality raster.
# This is the code I would use based on my interpretation of the function documentation.
[2]
tree_cover_masked <- mask(tree_cover, quality, maskvalue = good)
# i.e. keep values in tree_cover that match "good" values in the quality raster, and set all others to NA
# This is the code I would use based on my interpretation of Lovelace et al.
Apologies again if this question is very simplistic, but I'd be grateful for your help!

What stops you from making a small example and testing which approach works? In your case, neither [1] nor [2] will work, as maskvalue is a single value (the first value, if you provide a longer vector). You probably want to use reclassify first.
Example data
library(raster)
qual <- trees <- raster(nrow=4, ncol=4, xmn=0, xmx=1, ymn=0, ymx=1, crs='+proj=utm +zone=1')
values(trees) <- rep(1:4, 4)
values(qual) <- rep(1:8, 2)
Create a RasterLayer with good (5 - 8) and bad (1 - 4) values and then use mask:
good <- reclassify(qual, rbind(c(0, 4, NA), c(4, 9, 1)))
# this would also work
# good <- reclassify(qual, cbind(0, 4, NA))
x <- mask(trees, good)
Or:
good <- subs(qual, data.frame(from=c(5,6,7,8,9), 1))
x <- mask(trees, good)
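Applied to the MODIS case in the question, a minimal sketch along the same lines (assuming tree_cover and quality are RasterLayers on the same grid; the good_codes values below are placeholders for the product's actual "good quality" codes):
library(raster)
good_codes <- c(0, 1, 2, 3)  # placeholder codes; substitute the real "good quality" values
# build a mask layer: 1 where the quality code is good, NA everywhere else
keep <- calc(quality, function(v) ifelse(v %in% good_codes, 1, NA))
tree_cover_masked <- mask(tree_cover, keep)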

Related

How to visualize a distance matrix on a map by the thickness or color of the lines connecting adjacent localities in R?

Suppose I have two datasets: (1) a data frame: coordinates of localities, each with ID; and (2) a linguistic distance matrix which reflects the linguistic distance between these localities.
# My data are similar to this structure
# dataframe
id <- c("A","B","C","D","E")
x_coor <- c(0.5,1,1,1.5,2)
y_coor <- c(5.5,3,7,6.5,5)
my.data <- data.frame(id = id, x_coor = x_coor, y_coor = y_coor)
# linguistic distance matrix
             A           B           C           D
B   308.298557
C   592.555483  284.256926
D   141.421356  449.719913  733.976839
E   591.141269  282.842712    1.414214  732.562625
Now, I want to visualize the linguistic distance between every two sites on a map by the thickness or color of the lines connecting adjacent localities in R.
Just like this: [example image: a map with lines weighted by linguistic distance]
My idea is to generate the delaunay triangulation by deldir or tripack package in R.
# generate delaunay triangulation
library(deldir)
de=deldir(my.data$x_coor,my.data$y_coor)
plot.deldir(de,wlines="triang",col='blue',wpoints = "real",cex = 0.1)
text(my.data$x_coor,my.data$y_coor,my.data$id)
This is the plot: [Delaunay triangulation plot]
My question is how to reflect the linguistic distance by the thickness or color of the edges of triangles? Is there any other better method?
Thank you very much!
What you want to do in respect of the line widths can be done "fairly easily" by the deldir package. You simply call plot.deldir() with the appropriate value of "lw" (line width).
At the bottom of this answer is a demonstration script "demo.txt" which shows how to do this in the case of your example. In particular this script shows how to obtain the appropriate value of lw from the "linguistic distance matrix". I had to make some adjustments in the way this matrix was presented, i.e. I had to convert it into a proper matrix.
I have rescaled the distances to lie between 0 and 10 to obtain the corresponding values of the line widths. You might wish to rescale in a different manner.
In respect of colours, there are two issues:
(1) It is not at all clear how you would like to map the "linguistic distances" to colours.
(2) Unfortunately the code for plot.deldir() is written in a very kludgy way, whence the "col" argument to segments() cannot be appropriately passed on in the same manner that the "lw" argument can. (I wrote the plot.deldir() code a long while ago, when I knew far less about R programming than I know now! :-))
I will adjust this code and submit a new version of deldir to CRAN fairly soon.
#
# Demo script
#
# Present the linguistic distances in a usable way.
vldm <- c(308.298557,592.555483,284.256926,141.421356,449.719913,
733.976839,591.141269,282.842712,1.414214,732.562625)
ldm <- matrix(nrow=5,ncol=5)
ldm[row(ldm) > col(ldm)] <- vldm
ldm[row(ldm) <= col(ldm)] <- 0
ldm <- (ldm + t(ldm))/2
rownames(ldm) <- LETTERS[1:5]
colnames(ldm) <- LETTERS[1:5]
# Set up the example data. It makes life much simpler if
# you denote the "x" and "y" coordinates by "x" and "y"!!!
id <- c("A","B","C","D","E")
x_coor <- c(0.5,1,1,1.5,2)
y_coor <- c(5.5,3,7,6.5,5)
# Eschew nomenclature like "my.data". Such nomenclature
# is Micro$oft-ese and is an abomination!!!
demoDat <- data.frame(id = id, x = x_coor, y = y_coor)
# Form the triangulation/tessellation.
library(deldir)
dxy <- deldir(demoDat)
# Plot the triangulation with line widths proportional
# to "linguistic distances". Note that plot.deldir() is
# a *method* for plot, so you do not have to (and shouldn't)
# type the ".deldir" in the plotting command.
plot(dxy,col=0) # This, and plotting with "add=TRUE" below, is
# a kludge to dodge around spurious warnings.
ind <- as.matrix(dxy$delsgs[,c("ind1","ind2")])
lwv <- ldm[ind]
lwv <- 10*lwv/max(lwv)
plot(dxy,wlines="triang",col='grey',wpoints="none",
     lw=lwv,add=TRUE)
with(demoDat,text(x,y,id,col="red",cex=1.5))
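If you also want colours, one possible workaround (not part of the original answer) is to draw the triangulation edges yourself with segments(), mapping the distances onto a palette. A minimal sketch, assuming the dxy, ldm and demoDat objects created above:
# colour (and width) each edge according to its linguistic distance
ind2  <- as.matrix(dxy$delsgs[, c("ind1", "ind2")])
dvals <- ldm[ind2]
pal   <- colorRampPalette(c("lightblue", "darkblue"))(100)
cols  <- pal[cut(dvals, breaks = 100, labels = FALSE)]
plot(demoDat$x, demoDat$y, type = "n", xlab = "x", ylab = "y")
with(dxy$delsgs, segments(x1, y1, x2, y2, col = cols, lwd = 10 * dvals / max(dvals)))
with(demoDat, text(x, y, id, col = "red", cex = 1.5))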

How can I get the same piece (duplicate code) of an image from many different photos every time?

From 5000 photos of license plates I want to determine which duplicate code these license plates have.
Here are 2 examples of a duplicate code on a license plate.
In the first example the duplicate code is 2 and in the second example the duplicate code is 1.
With the packages magick and tesseract (see code below) I was able to retrieve the piece of the photo from the first example where the duplicate code is, and to read the duplicate code. However, in the second example and the other photos that piece sits somewhere else, so the same fixed crop does not work.
So I am looking for something that can recognize where the duplicate code is and that will read the duplicate code. Note: The duplicate code is always above the 1st indent mark.
Does someone have an idea how to read the duplicate code automatically from 5000 different photos?
library(magick)
library(tesseract)
#Load foto:
foto <- image_read("C:/Users/camie/OneDrive/Documenten/kenteken3.jpg")
#Get piece of photo where duplicate code is retrieved:
foto2 <- image_crop(foto,"10X24-620-170")
#read duplicate code:
cat(ocr(foto2))
Here is an approach based on the package EBImage. ImageMagick is great for image manipulation but I think EBImage may provide more quantitative tools that are useful here. As for all image processing, the quality of the input image matters a great deal. The approach suggested here would likely benefit from noise and artifact removal, scaling and possibly cropping.
Also, some licenses seem to have additional symbols in the position of interest that are not numbers. Clearly more pre-processing and filtering are needed for such cases.
Sample image
# Starting from EBImage
if (!require(EBImage)) {
  source("http://bioconductor.org/biocLite.R")
  biocLite("EBImage")
  library(EBImage)
}
# Test images
# setwd(<image directory>)
f1 <- "license1.jpg"
f2 <- "license2.jpg"
# Read image and convert to normalized greyscale
img0 <- readImage(f1)
img <- channel(img0, "grey")
img <- normalize(img)
# plot(img) # insert plot or display commands as desired
# Rudimentary image process for ~300 pixel wide JPEG
xmf <- medianFilter(img, 1)
xgb <- gblur(xmf, 1)
xth <- xgb < otsu(xgb) # Otsu's algorithm to determine best threshold
xto <- opening(xth, makeBrush(3, shape = "diamond"))
A binary (thresholded) image has been produced and cleaned up to identify objects as shown here.
# Create object mask with unique integer for each object
xm <- bwlabel(xto)
# plot(colorLabels(xm)) # optional code to visualize the objects
In addition to the rudimentary image process, some "object processing" can be applied as shown here. Objects along the edge are not going to be of interest so they are removed. Similarly, artifacts that give rise to horizontal (wide) streaks can be removed as well.
# Drop objects touching the edge
nx <- dim(xm)[1]
ny <- dim(xm)[2]
sel <- unique(c(xm[1,], xm[nx,], xm[,1], xm[,ny]))
sel <- sel[sel != 0]
xm <- rmObjects(xm, sel, reenumerate = TRUE)
# Drop exceptionally wide objects (33% of image width)
major <- computeFeatures.moment(xm)[,"m.majoraxis"]
sel <- which(major > nx/3)
xm <- rmObjects(xm, sel, reenumerate = TRUE)
The following logic identifies the center of mass for each object with the computeFeatures.moment function of EBImage. It seems that the main symbols will be along a horizontal line while the candidate object will be above that line (lower y-value in EBImage Image object). An alternative approach would be to find objects stacked on one another, i.e., objects with similar x-values.
For the examples I explored, one standard deviation away from the median y-value for the center of mass appears to be sufficient to identify the candidate object. This is used to determine the limit shown below. Of course, this logic should be adjusted as dictated by the actual data.
# Determine center of mass for remaining objects
M <- computeFeatures.moment(xm)
x <- M[,1]
y <- M[,2]
# Show suggested limit on image (y coordinates are inverted)
plot(img)
limit <- median(y) - sd(y)
abline(h = limit, col = "red")
# Show centers of mass on original image
ok <- y < limit
points(x[!ok], y[!ok], pch = 16, col = "blue")
points(x[ok], y[ok], pch = 16, col = "red")
The image shows the segmented objects after having discarded objects along the edge. Red shows the candidate, blue shows the non-candidates.
Because some licenses have two symbols above the dash, the following code selects the leftmost of possible candidates, expands the object mask and returns a rectangular crop of the image that can be passed to ocr().
# Accept leftmost (first) of candidate objects
left <- min(x[which(ok)])
sel <- which(x == left)
# Enlarge object mask and extract the candidate image
xm <- dilate(xm, makeBrush(7, "disc"))
ix <- range(apply(xm, 2, function(v) which(v == sel)))
iy <- range(apply(xm, 1, function(v) which(v == sel)))
xx <- ix[1]:ix[2]
yy <- iy[1]:iy[2]
# "Return" selected portion of image
ans <- img[xx, yy] # this is what can be passed to tesseract
plot(ans, interpolate = FALSE)
Here is the unscaled and extracted candidate image from example 1:
Another sample image
The same code applied to this example gives the following:
With a few more checks for errors and for illogical conditions, the code could be assembled into a single function and applied to the list of 5000 files! But of course that assumes they are properly formatted, etc. etc.
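As a hedged sketch of that batch step (extract_code() below is a hypothetical wrapper around the processing above that returns the cropped Image; tesseract's ocr() accepts a file path, so the crop is written to a temporary file first):
library(tesseract)
process_plate <- function(f) {
  ans <- extract_code(f)            # hypothetical function wrapping the steps above
  tmp <- tempfile(fileext = ".png")
  EBImage::writeImage(ans, tmp)     # write the crop so ocr() can read it from disk
  trimws(ocr(tmp, engine = tesseract("eng")))
}
files <- list.files("plates", pattern = "\\.jpg$", full.names = TRUE)  # placeholder directory
codes <- vapply(files, process_plate, character(1))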
With the existence of multiple layouts for Dutch license plates, I'm not sure you can just hardcode a method to extract a duplication value. Also, you don't mention whether every image you have always has the same quality and/or orientation/scale/skew/etc.
You could in theory apply a Convolutional Neural Network that categorizes license plates into several categories (0 for n/a, 1 for 1, 2 for 2, etc.). However, I am not familiar with related packages in R, so I won't be able to point you to any.
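For what it's worth, a minimal sketch of that CNN idea using the keras package (my own choice; the answer above names no package), assuming the plate images have been resized to 64x64 greyscale arrays and the labels are the duplicate codes 0-9:
library(keras)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(64, 64, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")  # one class per possible duplicate code
model %>% compile(optimizer = "adam",
                  loss = "sparse_categorical_crossentropy",
                  metrics = "accuracy")
# model %>% fit(x_train, y_train, epochs = 10, validation_split = 0.2)  # x_train/y_train are assumed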

R raster: extent conditional on cell value

I would like to obtain the extent of raster layer conditional on certain cell values. Consider the following example:
raster1 is a large raster object, filled with values between 1 and 1000. However, I only want to obtain the extent for pixels with value 100. Since this subset of cells should cluster in a small region, the extent should be rather narrow. Once I know the coordinates of that box, I can crop to this smaller area.
My approach so far is to replace all values != 100 with NA, as suggested in related questions. Considering the raster object's overall size, this step takes an enormous amount of time and spends a lot of computation on regions that I would like to crop away anyway.
Does anyone know how to obtain the extent conditional on a certain pixel value which does not require to reclassify the entire object beforehand?
Here is an alternative way to do that
Example data:
library(raster)
r <- raster(ncol=18,nrow=18)
values(r) <- 1
r[39:45] <- 100
r[113:115] <- 100
r[200] <- 100
"Standard" way:
x <- r == 100
s <- trim(x, values=FALSE)
Alternate route by creating an extent:
xy <- rasterToPoints(r, function(x){ x ==100 })
e <- extent(xy[,1:2])
e <- alignExtent(e, r, snap='out')
v <- crop(r, e)
Either way, all cells need to be looked at, but at least you do not need to create another large raster.
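As a quick check that the two routes agree (using the objects created above), the reported windows should be identical:
extent(s)  # extent from the trim() route
extent(v)  # extent from the rasterToPoints()/alignExtent() route; should match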

calculating area of most suitable raster habitat in R

I have run Maxent for multiple species under present conditions and also under future climate change scenarios. I was quantifying changes between present and future suitable habitat using the nicheOverlap function and Schoener's D statistic. Quite a few of the organisms in my study are just moving farther up mountains so there is a lot of overlap as the future distribution is inside the present distribution (just occupying less area at higher elevations). By looking at the ascii files in QGIS I can see that there is less suitable habitat in terms of area in the future, so I want to quantify this. I have scoured the internet for a good way to calculate area for rasters and never found anything that perfectly suited my fancy. I therefore wrote up something that is an amalgamation of bits and pieces of various scripts. It is pasted below.
Two questions:
1) do you all agree this is doing what I think it is doing (calculating area in square kilometers)?
2) is there a way to simplify this? Specifically, you'll see I go from a raster to a data frame and back to a raster; maybe I could stay in rasters?
Thanks for any input!
Rebecca
####
library(raster)
#load rasters
m <- raster("SpeciesA_avg.asc")
mf <- raster("SpeciesA_future_layers_avg.asc")
#change to dataframe
m.df <- as.data.frame(m, xy=TRUE)
#get rid of NAs
m.df1 <- na.omit(m.df)
#keep only cells that that have a suitability score above 0.5 (scores range from 0 to 1)
m.df2 <- m.df1[m.df1$SpeciesA_avg> 0.5,]
#re-rasterize just the suitable area
m.raster <- rasterFromXYZ(m.df2)
##same as above but for future projection
mf.df <- as.data.frame(mf, xy=TRUE)
mf.df1 <- na.omit(mf.df)
mf.df2 <- mf.df1[mf.df1$SpeciesA_future_layers_avg>0.5,]
mf.raster <-rasterFromXYZ(mf.df2)
#get sizes of all cells in current distribution raster
#note my original layers were 30 seconds or 1 km2.
cell_size<-area(m.raster, na.rm=TRUE, weights=FALSE)
#delete NAs from all raster cells. It looks like these come back when switching from dataframe to raster
cell_size1<-cell_size[!is.na(cell_size)]
#compute area [km2] of all cells in raster
raster_area_present<-length(cell_size1)*median(cell_size1)
raster_area_present
#get sizes of all cells in future raster [km2]
cell_size<-area(mf.raster, na.rm=TRUE, weights=FALSE)
#delete NAs from vector of all raster cells
cell_size1<-cell_size[!is.na(cell_size)]
#compute area [km2] of all cells in geo_raster
raster_area_future<-length(cell_size1)*median(cell_size1)
raster_area_future
##calculate change in area
dif_area <- raster_area_present - raster_area_future
dif_area
When you ask a question, you should provide a simple self-contained example, not just dump your script that points to files we do not have. Writing a simple example teaches you R, and often helps you solve the problem by yourself. Anyway, here is some example data and a solution to your problem, I think:
library(raster)
#example data
m <- mf <- raster(ncol=10, nrow=10, vals=0)
m[,1] <- NA
m[,3:7] <- 1
mf[,6:9] <- 1
# get rid of NAs (the example has none); should not be needed
m <- reclassify(m, cbind(NA, NA, 0))
mf <- reclassify(mf, cbind(NA, NA, 0))
# keep cells > 0.5 (scores range from 0 to 1)
m <- round(m)
mf <- round(mf)
# now combine the two layers, for example:
x <- m + mf * 10
# area of each cell
a <- area(x)
# sum area by class
z <- zonal(a, x, sum)
#      zone     value
# [1,]    0 152327547
# [2,]    1 152327547
# [3,]   10 101551698
# [4,]   11 101551698
zone 0 is "not current, nor future", 1 is "current only", 10 is "future only" and 11 is "current and future"
The areas are in km^2 (for a longitude/latitude raster like this one, area() returns cell sizes in square kilometres).
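To turn the zonal table into the present-vs-future comparison asked about in the question, a small follow-up (indexing the second column by position so it does not depend on its name):
present_area <- sum(z[z[, "zone"] %in% c(1, 11), 2])   # suitable now (current only + both)
future_area  <- sum(z[z[, "zone"] %in% c(10, 11), 2])  # suitable in future (future only + both)
present_area - future_area                             # net change in suitable area (km2)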
You may want to check out this tutorial on maxent and other spatial distribution models: http://rspatial.org/sdm/

Using R for simple image/pattern-recognition task?

I have an image with many dots, and I would like to extract from it what is the x-y location of each dot.
I already know how to do this manually (there is a package for doing it).
However, is there some way of doing it automatically ?
(My next question will be: is there a way, given an image of many lines, to detect where the lines intersect/"touch each other"?)
Due to requests in the comments, here is an example for an image to "solve" (i.e: extract the data point locations for it)
#riddle 1 (find dots):
plot(cars, pch = 19)
#riddle 2 (find empty center circles):
plot(cars, pch = 1)
#riddle 3 (find intersection points):
plot(cars, pch = 3)
#riddle 4 (find intersections between lines):
plot(cars, pch = 1, col = "white")
lines(stats::lowess(cars))
abline(v = c(5,10,15,20,25))
Thanks, Tal
(p.s: since I am unfamiliar with this field, I am sorry if I am using the wrong terminology or asking something too simple or complex. Is this OMR?)
The Medical Imaging Task View covers general image processing; this may be a start.
Following up after Dirk, yes, check the Medical Imaging task view. Also look at R-Forge: Romain Francois has an RJImage package, and another image processing package was recently registered. What you are looking for are segmentation algorithms. Your dots problem is much easier than the line problem. The first can be done with an RGB or greyscale filter, just doing some sort of radius search. Detecting linear features is harder. Once you have the features extracted you can use a sweepline algorithm to detect intersections. EBImage may have an example for detecting cells in the vignette.
Nicholas
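To make the dot-finding idea concrete, here is a minimal sketch with EBImage (the file name "dots.png" is a placeholder for the plot exported as an image with dark dots on a light background):
library(EBImage)
img <- channel(readImage("dots.png"), "grey")  # placeholder file name
bin <- img < otsu(img)                         # TRUE where the image is dark, i.e. the dots
lbl <- bwlabel(bin)                            # label each connected blob
xy  <- computeFeatures.moment(lbl)[, c("m.cx", "m.cy")]  # blob centres = dot locations (pixel coordinates)
head(xy)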
I think you could use package raster to extract xy coordinates from an image with specific values. Have a look at the package vignettes.
EDIT
Can you try this and tell me if it's in the ball park of what you're looking for?
I hope the code with comments is quite self-explanatory. Looking forward to your answer!
library(raster)
rst <- raster(nrows = 100, ncols = 100) #create a 100x100 raster
rst[] <- round(runif(ncell(rst))) #populate raster with values, for simplicity we round them to 0 and 1
par(mfrow=c(1,2))
plot(rst) #see what you've got so far
rst.vals <- getValues(rst) #extract values from rst object
rst.cell.vals <- which(rst.vals == 1) #see which cells are 1
coords <- xyFromCell(rst, rst.cell.vals) #get coordinates of ones
rst[rst.cell.vals] <- NA #set those raster cells that are 1 to NA (use rst[-rst.cell.vals] <- NA to instead blank out all other cells)
plot(rst) #a diag plot, should have only one color
