R terra issue with a multi-categorical raster: how to properly extract the categories and their values into layers without losing data?

I am working with the terra package in R and having an issue with the CONUS historical disturbance dataset from LANDFIRE, found here: https://landfire.gov/version_download.php (HDist is the name). To summarize, I want to take this dataset, crop and project it to my extent, then separate the cell values into layers: one layer for severity, one for disturbance type, and so on. The historical disturbance data has all of these in one attribute table. In terra, this attribute table is stored as categories, and that is causing a lot of problems. I have had no issues with the crop or the reprojection; the trouble is getting at the values and separating the categories into layers. I have the following code:
library(terra)
setwd("your pathway to historical disturbance tif here")
h1 <- terra::rast("LC16_HDst_200.tif") #read in the Hdist tif
h2 <- terra::project(h1, "EPSG:5070", method = "near") #project it using nearest neighbor
h3 <- crop(h2, ext(xmin, xmax, ymin, ymax)) #crop to the extent (supply your own xmin, xmax, ymin, ymax)
h3
This gives the output in the extent and projection I want, but the main focus is the categories:
categories : Count, HDIST_ID, DISTCODE_V, DIST_TYPE, TYPE_CONFI, SEVERITY, SEV_CONFID, HDIST_CAT, FDIST, R, G, B
So I learned that with these kinds of datasets, the values are stored under these categories.
If I plot with plot(h3)
I only get the first row of the Count category. To switch the active category I can use
activeCat(h3) <- 4
h3
and I would get
name : DIST_TYPE
min value : Clearcut
max value : Wildland Fire Use
The default active category was Count, but now it's DIST_TYPE, the fourth category; nothing too crazy. I try plotting
plot(h3)
I only get NoData plotted, none of the others. There is a function called catalyze() that claims to take your categories and convert them all into numerical layers:
h4 <- catalyze(h3)
which gave me a thirteen-layer dataset, which makes sense because there are 13 categories and it converts each of them into a numeric layer. I tried plotting
plot(h4, 4) #plot h4 layer 4, which would correspond to DIST_TYPE category
It only plots a value of 8, which looks like it is just the NoData value. The map is mostly green, which is in line with the NoData from HDist.
Any time I try to access the values directly, R crashes. When I look at the min and max values for that layer, I get 8 for both (name: DIST_TYPE, min value: 8, max value: 8). Other categories show a similar pattern. So it appears to take just the first row of values for each category and make that the entire layer.
In summary, terra clearly stores all of the values that would easily be seen in an attribute table if the dataset were brought into ArcGIS. However, whenever I try to plot it or work with it, even before any real manipulation, it only accesses the top row of that attribute table, and when I catalyze, it just seems to make things worse. I know this is really easy to solve in ArcGIS Pro, but I want to keep everything in R for the sake of coherent documentation. Do any terra whizzes know what to do about this? I figure it has to be something very easy, but I don't really know what else to try. Maybe it is some major issue too. I have the same issue with LANDFIRE EVT data. I have not had this issue with simple rasters such as DEM, canopy cover, etc. It only happens with rasters that have multiple categories (or columns in an attribute table).

That failed because the (ESRI) VAT IDs are not in the expected (for GDAL categories) 0..255 range. This has now been fixed and I get:
library(terra)
#terra version 1.4.6
r <- rast("LC16_HDst_200.tif")
activeCat(r) <- 4
r <- crop(r, ext(-93345, -57075, 1693125, 1716735))
plot(r)
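To experiment with multi-column categories without downloading the LANDFIRE data, a small synthetic raster can help. This is only a minimal sketch, assuming a recent terra version; the attribute table (`vat`) and its column names `DIST_TYPE` and `SEV_CODE` are made up for illustration and are not the real HDist table.

```r
library(terra)

# Small synthetic raster with cell values 11, 12 and 13
set.seed(0)
r <- rast(nrows = 10, ncols = 10)
values(r) <- sample(3, ncell(r), replace = TRUE) + 10

# Made-up attribute table: the first column holds the cell values,
# the remaining columns are the categories
vat <- data.frame(
  id        = 11:13,
  DIST_TYPE = c("Clearcut", "Fire", "Thinning"),
  SEV_CODE  = c(3, 1, 2)
)
levels(r) <- vat

activeCat(r) <- 2      # make the second category column (SEV_CODE) active
cats(r)[[1]]           # inspect the full attribute table

layers <- catalyze(r)  # numeric layers derived from the category columns
layers
```

If the layers printed here show the full range of values while the real dataset collapses to a single value, that points at the installed terra version rather than at the workflow.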

Related

Moving spatial data off grid cell corners

I have a seemingly simple question that I can't seem to figure out. I have a large dataset of millions of data points. Each data point represents a single fish with its biological information as well as when and where it was caught. I am running some statistics on these data and have been having issues, which I have finally tracked down to some data points having latitude and longitude values that fall exactly on the corners of the grid cells I am using to bin my data. When these fish are grouped into grid cells, they end up being duplicated 4 times (once for each cell that touches the corner their latitude and longitude identify).
Needless to say this is bad, and I need to force those animals to have latitudes and longitudes that don't put them exactly on a grid cell corner. I realize there are probably lots of ways to correct something like this, but what I really need is a simple way to identify latitudes and longitudes that have integer values, and then to modify them by a very small amount (randomly adding or subtracting) so as to shift them into a specific cell without creating a bias by shifting them all the same way.
I hope this explanation makes sense. I have included a very simple example in order to provide a workable problem.
fish <- data.frame(fish=1:10, lat=c(25,25,25,25.01,25.2,25.1,25.5,25.7,25,25),
long=c(140,140,140,140.23,140.01,140.44,140.2,140.05,140,140))
In this fish data frame there are 10 fish, each with an associated latitude and longitude. Fish 1, 2, 3, 9, and 10 have integer lat and long values that will place them exactly on the corners of my grid cells. I need some way of shifting just these values by something like plus or minus 0.01.
I can identify which lats or longs are integers easy enough with something like:
library(dplyr)
near(fish$lat, as.integer(fish$lat))
But I am struggling to find a way to then modify all the integer values by some small amount.
To answer my own question I was able to work this out this morning with some pretty basic code, see below. All it takes is making a function that actually looks for whole numbers, where is.integer does not.
# Used to fix the is.integer function to actually work and not just look at syntax
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
# Use ifelse to change only whole number values of lat and long
fish$jitter_lat <- ifelse(is.wholenumber(fish$lat), fish$lat+rnorm(fish$lat, mean=0, sd=0.01), fish$lat)
fish$jitter_long <- ifelse(is.wholenumber(fish$long), fish$long+rnorm(fish$long, mean=0, sd=0.01), fish$long)
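The question asked for a shift of roughly plus or minus 0.01 with a random sign, whereas the code above draws a normally distributed offset. A variant with a fixed step and a random sign could look like the sketch below. The helper names `is_whole` and `shift_off_corner` and the `delta` parameter are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: TRUE where a value is (numerically) a whole number
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) abs(x - round(x)) < tol

# Hypothetical helper: move whole-number values by exactly +delta or -delta,
# choosing the sign at random so the shift is not biased in one direction
shift_off_corner <- function(x, delta = 0.01) {
  whole <- is_whole(x)
  signs <- sample(c(-1, 1), size = sum(whole), replace = TRUE)
  x[whole] <- x[whole] + signs * delta
  x
}

fish <- data.frame(fish = 1:10,
                   lat  = c(25, 25, 25, 25.01, 25.2, 25.1, 25.5, 25.7, 25, 25),
                   long = c(140, 140, 140, 140.23, 140.01, 140.44, 140.2, 140.05, 140, 140))

set.seed(1)  # only to make the example reproducible
fish$lat_shifted  <- shift_off_corner(fish$lat)
fish$long_shifted <- shift_off_corner(fish$long)
fish
```

Values that were not whole numbers are returned unchanged, so only the fish sitting on grid corners move.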

Averaging different length vectors with same domain range in R

I have a dataset that looks like the one shown in the code.
What I am guaranteed is that the "(var)x" (domain) of the variable is always between 0 and 1. The "(var)y" (co-domain) can vary but is also bounded, but within a larger range.
I am trying to get an average over the "(var)x" but over the different variables.
I would like some kind of selective averaging, not sure how to do this in R.
ax=c(0.11,0.22,0.33,0.44,0.55,0.68,0.89)
ay=c(0.2,0.4,0.5,0.42,0.5,0.43,0.6)
bx=c(0.14,0.23,0.46,0.51,0.78,0.91)
by=c(0.1,0.2,0.52,0.46,0.4,0.41)
qx=c(0.12,0.27,0.36,0.48,0.51,0.76,0.79,0.97)
qy=c(0.03,0.2,0.52,0.4,0.45,0.48,0.61,0.9)
a<-list(ax,ay)
b<-list(bx,by)
q<-list(qx,qy)
What I would like to have is something like
avgd_x = c(0.12,0.27,0.36,0.48,0.51,0.76,0.79,0.97)
and avgd_y, whose first element would be the mean of ay and by evaluated at 0.12 together with the value of qy at 0.12, and so on for every value in the vector with the largest number of elements.
How can I do this in R ?
P.S.: This is a toy dataset. My real dataset is spread over files and I am reading them with a custom function, but the raw data is available in the form shown in the code above.
Edit:
Some clarification:
avgd_y would have the length of the largest vector. In the case above, avgd_y would be (ay' + by' + qy)/3, where ay' and by' are ay and by interpolated at the data points of qx, i.e. ay'[i] is the value of ay at qx[i] and by'[i] is the value of by at qx[i], for i from 1 to the length of qx.
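The averaging described in the clarification can be sketched with base R's approx(), which does linear interpolation. This is one possible reading of the requirement, not a definitive solution. In particular, `rule = 2` is an assumption: it repeats the nearest end value where qx falls outside the range of ax or bx (for example at 0.97). With `rule = 1` those positions would be NA instead. The list name `curves` is introduced here for illustration.

```r
ax <- c(0.11, 0.22, 0.33, 0.44, 0.55, 0.68, 0.89)
ay <- c(0.2, 0.4, 0.5, 0.42, 0.5, 0.43, 0.6)
bx <- c(0.14, 0.23, 0.46, 0.51, 0.78, 0.91)
by <- c(0.1, 0.2, 0.52, 0.46, 0.4, 0.41)
qx <- c(0.12, 0.27, 0.36, 0.48, 0.51, 0.76, 0.79, 0.97)
qy <- c(0.03, 0.2, 0.52, 0.4, 0.45, 0.48, 0.61, 0.9)

curves <- list(a = list(x = ax, y = ay),
               b = list(x = bx, y = by),
               q = list(x = qx, y = qy))

# Use the x vector with the most elements as the common grid
n_points <- sapply(curves, function(cv) length(cv$x))
avgd_x <- curves[[which.max(n_points)]]$x

# Linearly interpolate every curve onto the common grid;
# the result is a matrix with one column per curve
interp <- sapply(curves, function(cv) approx(cv$x, cv$y, xout = avgd_x, rule = 2)$y)

# Average across curves at each grid point
avgd_y <- rowMeans(interp)
avgd_y
```

If end-value extrapolation is not acceptable, switching to `rule = 1` and using `rowMeans(interp, na.rm = TRUE)` would average only the curves that actually cover each point.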

Making a histogram

This sounds pretty basic, but every time I try to make a histogram, R says that x needs to be numeric. I've been looking everywhere but can't find a question relating to my problem. I have data with 240 observations of 5 variables:
Nipper length
Number of Whiskers
Crab Carapace
Sex
Estuary location
There are 3 locations and I'm trying to make a histogram of nipper length for each.
I've tried making new factors and levels, with the 80 observations in each location, but it's not working.
Crabs.data <-read.table(pipe("pbpaste"),header = FALSE)##Mac
names(Crabs.data)<-c("Crab Identification","Estuary Location","Sex","Crab Carapace","Length of Nipper","Number of Whiskers")
Crabs.data<-Crabs.data[,-1]
attach(Crabs.data)
hist(`Length of Nipper`~`Estuary Location`)
Error in hist.default(Length of Nipper ~ Estuary Location) :
'x' must be numeric
instead of the expected result.
hist() doesn't seem to like taking more than one variable.
I think you'd have the best luck subsetting the data, that is, making a vector of nipper lengths for all crabs in a given estuary.
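crabs.data <- read.table("whatever you're calling it") # placeholder: use your own file
# names(crabs.data) <- ... (as you have it)
Estuary1 <- as.vector(unlist(subset(crabs.data, `Estuary Location` == "Location", select = `Length of Nipper`))) # replace "Location" with the name of one estuary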
hist(Estuary1)
Repeat the last two lines for your other two estuaries. You may not need the unlist() command, depending on your table. I've tended to need it for Excel files, but I don't know what format your table is in (that would've been helpful).
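The same subsetting idea can be written once for all estuaries with split(). The sketch below uses a made-up stand-in for the crab data (the data frame `crabs`, its columns `estuary` and `nipper`, and the estuary labels are all hypothetical), since the original table is not available.

```r
# Hypothetical stand-in for the crab data: 3 estuaries, 80 crabs each
set.seed(42)
crabs <- data.frame(
  estuary = rep(c("A", "B", "C"), each = 80),
  nipper  = c(rnorm(80, 12, 2), rnorm(80, 14, 2), rnorm(80, 10, 2))
)

# split() returns one numeric vector of nipper lengths per estuary
by_estuary <- split(crabs$nipper, crabs$estuary)

# One histogram per estuary, side by side
op <- par(mfrow = c(1, length(by_estuary)))
for (loc in names(by_estuary)) {
  hist(by_estuary[[loc]], main = loc, xlab = "Length of nipper")
}
par(op)  # restore the previous plotting layout
```

Each call to hist() receives a plain numeric vector, which is what avoids the "'x' must be numeric" error triggered by the formula.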

Extracting data from lower layers in a Rasterbrick

So I'm extracting data from a rasterbrick I made using the method from this question: How to extract data from a RasterBrick?
In addition to obtaining the data from the layer given by the date, I want to extract the data from months prior. In my best guess I do this by doing something like this:
sapply(1:nrow(pts), function(i){extract(b, cbind(pts$x[i],pts$y[i]), layer=pts$layerindex[i-1], nl=1)})
So the extraction should look at layerindex i-1, which should then give the data for one month earlier. A point with layerindex = 5 should look at layer 5-1 = 4.
However it doesn't do this and seems to give either some random number or a duplicate from months prior. What would be the correct way to go about this?
Your code is taking the value from the layer of the previous point, not from the previous layer.
To see that, imagine we are looking at the point in row 2 (i=2). The part of your code that indicates the layer is pts$layerindex[i-1], which is pts$layerindex[1]. In other words, the layer of the point in row 1.
The fix is easy enough. For clarity I will write the function separately:
foo = function(i) extract(b, cbind(pts$x[i],pts$y[i]), layer=pts$layerindex[i]-1, nl=1)
sapply(1:nrow(pts), foo)
I have not tested it, but this should be all.
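The difference between the two index expressions can be checked without any raster at all. The small `pts` data frame below is hypothetical and only stands in for the real points table.

```r
# Hypothetical points table: coordinates plus the layer each point belongs to
pts <- data.frame(x = c(10, 20, 30), y = c(1, 2, 3), layerindex = c(5, 8, 3))

i <- 2
pts$layerindex[i - 1]  # 5: the layer index of the point in row 1
pts$layerindex[i] - 1  # 7: one layer before the layer of the point in row 2

# For i = 1 the original expression indexes position 0, which is empty
pts$layerindex[0]      # numeric(0)

# Vectorised: the "previous month" layer for every point
prev_layer <- pts$layerindex - 1
prev_layer             # 4 7 2
```

Note that a point with layerindex 1 would give a previous layer of 0, which does not exist, so such points need separate handling.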

R package spatstat: How to use point process model covariate as factor starting with shape file

I have a question similar to this one from 2014, which was answered but the datasets are no longer available and our original data structures differ. (I'm in crunch time and stumped, so if you're able to respond quickly I would greatly appreciate it!!)
Goal: use the type of bedrock as a covariate in a Point Process Model (ppm) in spatstat with mine locations in Connecticut.
Data: the files are available from this Dropbox folder. The rock data and CT poly outline comes from UConn Magic Library, and the mine data comes from the USGS Mineral Resources Data System.
Approach:
I loaded some relevant packages and read in the shapefiles (and converted coords to match CT's system), and used the CT polygon as an owin object.
library(rgdal)
library(splancs)
library(spatstat)
library(sp)
library(raster)
library(geostatsp)
#read in shapefiles
ct <-readOGR(".","CONNECTICUT_STATE_POLY")
mrds <-readOGR(".","mrds-2017-02-20-23-30-58")
rock<-readOGR(".","bedrockpolyct_37800_0000_2000_s50_ctgnhs_1_shp_wgs84")
#convert mrds and rock to ct's coord system
tempcrs<-ct@proj4string
mrds<-spTransform(mrds,tempcrs)
rock<-spTransform(rock,tempcrs)
#turn ct shapefile into owin, call it w for window
w <-as.owin(ct)
#subset mrds data to just CT mines
mrdsCT <-subset(mrds,mrds@data$state=="Connecticut")
#ppm can't handle marked data yet, so need to unmark()
#create ppp object for mrds data, set window to w
mrdsCT.ppp <-as.ppp(mrdsCT)
Window(mrdsCT.ppp)<-w
From "Modelling Spatial Point Patterns in R" by Baddeley & Turner (page 39):
Unfortunately a pixel image in spatstat cannot have categorical (factor) values, because R refuses to create a factor-valued matrix. In order to represent a categorical variate as a pixel image, the categorical values should be encoded as integers (for efficiency’s sake) and assigned to an integer-valued pixel image. Then the model formula should invoke the factor command on this image. For example if fim is an image with integer values which represent levels of a factor, then:
ppm(X, ~factor(f), Poisson(), covariates=list(f=fim))
There are several different types of rock classification included in the shapefile. I'm interested in LITHO1, which is a factor with 27 levels. It's the sixth attribute.
litho1<-rock[,6]
My (limited but researched) understanding is that I need to convert the shapefile to a raster, and later convert it to an image in order to be used in ppm. I created a mask from ct, and used that.
ctmask<-raster(ct, resolution=2000)
ctmask[!is.na(ctmask)] <- 0
litho1rast<-rasterize(litho1,ctmask)
After this point, I've tried several approaches and haven't had success just yet. I've attempted to follow the approaches laid out in the question linked, as well as search in documentation for relevant examples to adopt (factor, ratify, levels). Unlike the prior question, my data was already a factor, so it wasn't clear why I should apply the factor function to it.
Looking at litho1rast, the @data@attributes dataframe contains the following. If I plot it, it just plots the ID; the levelplot function does plot LITHO1. When I applied the factor functions, the ID was retained but not LITHO1.
$ ID : int [1:1891] 1 2 3 4 5 6 7 8 9 10 ...
$ LITHO1: Factor w/ 27 levels "amphibolite",..: 23 16 23 16 23 16 24 23 16 24 ...
The ppm model would need an object class im, so I converted the raster to the im. I tried two ways. I can make ppm execute...but it treats every point as a factor rather than the 27 levels (with either litho1.im or litho1.im2) ...
litho1.im<-as.im(litho1rast)
litho1.im2<-as.im.RasterLayer(litho1rast)
model1=ppm(unmark(mrdsCT.ppp) ~ factor(COV1), covariates=list(COV1=litho1.im))
model1
So, I'm not quite sure where to go from here. It seems like I need to pass an argument to the as.im so that it knows to retain the LITHO1 not the ID. Clever ideas or leads to pertinent functions or approaches much appreciated!
The quoted statement from Baddeley & Turner is no longer true; that quotation is from a very old set of workshop notes.
Pixel images of class im can have factor values (since 2011). If Z is an integer-valued pixel image (of class im), you can make it into a factor-valued image by setting levels(Z) <- lev where lev is the character vector of labels for the possible values.
You should not need to use rasterize: it should be possible to convert rock[,6] directly into a pixel image using as.im (after loading the maptools package).
See the book by Baddeley, Rubak and Turner (Spatial point patterns: methodology and applications with R, CRC Press, 2016) for a full explanation.
Looking at your code you don't seem to be providing the field argument to rasterize.
From rasterize help:
field: numeric or character. The value(s) to be transferred. This can
be a single number, or a vector of numbers that has the same length as
the number of spatial features (points, lines, polygons). If x is a
Spatial*DataFrame, this can be the column name of the variable to be
transferred. If missing, the attribute index is used (i.e. numbers
from 1 to the number of features). You can also provide a vector with
the same length as the number of spatial features, or a matrix where
the number of rows matches the number of spatial features
At this line:
litho1rast<-rasterize(litho1,ctmask)
you probably have to specify which column of the litho object to use in rasterization. Something like:
litho1rast<-rasterize(litho1,ctmask, field = "LITHO1")
