algebra using rasters and dataframes - r

I want to predict vegetation health using 2 remote sensing vegetation indices (VIs) for multiple tree stands across multiple months. I previously approached this by using a for() loop to iterate through a list of multi-band rasters and calculate the two VIs for each raster (month) using a given equation. I then used raster::extract() to extract the pixels corresponding to each stand. However, I would now like to include some additional variables in my predictions of vegetation health, and I am having trouble integrating them using the same method, as they are simply columns in a dataframe and not rasters. I'm open to different ways to do this; I just can't think of any.
example:
#Part 1: Loading libraries and creating some sample data
library(sf)
library(raster)
library(terra)
#polygons to generate random points into
v <- vect(system.file("ex/lux.shp", package="terra"))
v <- v[c(1:12)]
v_sf <- st_as_sf(v) # Convert 'SpatVector' into 'sf' object
#5 rasters (months) with 5 bands each
r <- rast(system.file("ex/elev.tif", package="terra"))
r <- rep(r, 5) * 1:5
names(r) <- paste0("band", 1:5)
ras_list <- list(r,r,r,r,r)
#generating some points (10 forest stands)
pnts <- st_sample(v_sf, size = 10, type = "random")
pnts<- as_Spatial(pnts)
#Part 2: Loop to predict vegetation health using two VI variables
vis <- list() #empty list to store NDVI rasters
for (i in seq_along(ras_list)) {
b <- ras_list[[i]]
#vegetation health = 1.23 + (0.45 * VI1) - (0.67 * VI2)
vis[i] <- 1.23 + 0.45*((b[[4]] + b[[3]] - b[[1]]) / (b[[4]] + b[[3]])) - 0.67*(b[[1]] * b[[3]] - b[[4]])
}
#Part 3: Loop to extract pixel values for each forest stand
vi_vals <- list() #empty list to store extracted pixel values
for (i in 1:length(vis)) {
n <- raster(vis[[i]])
vi_vals[[i]] <- raster::extract(n, pnts, method = "bilinear")
}
This method works fine but as I mentioned above, I now need to repeat the same process using a new equation which incorporates variables that can't be calculated from a raster. These values are simply 3 columns in a dataframe that are identified by a stand ID.

Let's first simplify your example a bit
Example data
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r <- rep(r, 5) * 1:5
names(r) <- paste0("b", 1:5)
ras_list <- list(r,r,r,r,r)
set.seed(1)
pnts <- spatSample(v, 10, "random")
values(pnts) = data.frame(id=1:10, a=5:14, b=3:12, d=6:15) # one id per stand
Compute VI and extract
vis <- list()
for (i in seq_along(ras_list)) {
b <- ras_list[[i]]
vis[[i]] <- 1.23 + 0.45*((b[[4]] + b[[3]] - b[[1]]) / (b[[4]] + b[[3]])) - 0.67*(b[[1]] * b[[3]] - b[[4]])
}
vis <- rast(vis)
names(vis) = paste0("set", 1:5)
vi_vals <- extract(vis, pnts, method = "bilinear")
And now you can do something with the tree parameters
out <- t(t(vi_vals[,-1])) * pnts$a + pnts$b / pnts$d
It would be more efficient to first extract the values and then apply the function
e <- list()
for (i in seq_along(ras_list)) {
x <- extract(ras_list[[i]], pnts, method="bilinear")[,-1]
e[[i]] = (1.23 + 0.45*((x$b4 + x$b3 - x$b1) / (x$b4 + x$b3)) - 0.67*(x$b1 * x$b3 - x$b4)) * pnts$a + pnts$b / pnts$d
}
e <- do.call(cbind, e)
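The results are not exactly the same; I assume because of loss of decimal number precision in one or the other method. Note also that the equation is nonlinear in the bands, so bilinear interpolation of the computed index need not equal the index computed from bilinearly interpolated bands.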
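If the extra variables live in a separate data frame keyed by stand ID rather than in the attributes of the points, the rows need to be aligned before the arithmetic. The following is a minimal base-R sketch of that step, not part of the original answer: the objects vi_vals and stand_df, their column names, and all values are made up for illustration, and the formula simply reuses the form VI * a + b / d shown above.

```r
# Hypothetical extracted index values: one row per stand, one column per month
vi_vals <- data.frame(ID = 1:4,
                      set1 = c(0.2, 0.4, 0.6, 0.8),
                      set2 = c(0.1, 0.3, 0.5, 0.7))

# Hypothetical stand-level covariates, deliberately in a different row order
stand_df <- data.frame(id = c(3, 1, 4, 2),
                       a = c(30, 10, 40, 20),
                       b = c(3, 1, 4, 2),
                       d = c(2, 2, 2, 2))

# Align covariate rows to the order of the extracted values
idx <- match(vi_vals$ID, stand_df$id)
stopifnot(!anyNA(idx))            # every stand must have covariates
covs <- stand_df[idx, ]

# A matrix times a vector of length nrow() recycles down the columns,
# so row i (stand i) is combined with a[i], b[i] and d[i]
vi_mat <- as.matrix(vi_vals[, -1])
out <- vi_mat * covs$a + covs$b / covs$d
out
```

Using match() rather than relying on row order guards against the data frame being sorted differently from the points.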


How to create the sampling matrixes for Sobol sensitivity analysis in R (package "sensitivity")

I would like to perform a Sobol sensitivity analysis in R
The package "sensitivity" should allow me to do so, but I don't understand how to generate the sampling matrices (X1, X2). I have a model that runs outside of R. I have 6 parameters with uniform distributions.
In my textbook: N = (2k+2)*M ; M = 2^b ; b=[8,12] (new sampling method: Wu et al. 2012)
I had the feeling that I should create two sampling matrices, X1_{M,k} and X2_{M,k}, and feed both to the sobol function.
The dimension of the final sampling matrix x$X is then (k+2)*M, because:
X <- rbind(X1, X2)
for (i in 1:k) {
Xb <- X1
Xb[, i] <- X2[, i]
X <- rbind(X, Xb)
}
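The row count stated above can be checked with a small base-R sketch. It only reproduces the rbind construction from this question; X1 and X2 are placeholder uniform matrices rather than real Sobol' samples, and nothing here calls the sensitivity package.

```r
k <- 6          # number of parameters
M <- 2^8        # base sample size
set.seed(1)

# Placeholder uniform samples standing in for the real X1 and X2 (assumption)
X1 <- matrix(runif(M * k), nrow = M)
X2 <- matrix(runif(M * k), nrow = M)

# Same construction as above: X1, X2, then k copies of X1 with one column swapped
X <- rbind(X1, X2)
for (i in 1:k) {
  Xb <- X1
  Xb[, i] <- X2[, i]
  X <- rbind(X, Xb)
}

# 2*M rows from X1 and X2 plus k*M rows from the loop
nrow(X) == (k + 2) * M
```

Under this construction, starting from matrices with 2*M rows each instead of M would give (k+2)*2*M rows, which is worth keeping in mind when choosing n.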
How should I conduct my sampling to get the right number of runs as (2*k+2)*M ?
This script is for the old method, but does anyone know whether the new method is already implemented in the sensitivity package? Feel free to comment on this procedure.
name = c("a" , "b" , "c" , "d" , "e", "f")
vals <- list(list(var="a",dist="unif",params=list(min=0.1,max=1.5)),
list(var="b",dist="unif",params=list(min=-0.3,max=0.4)),
list(var="c",dist="unif",params=list(min=-0.3,max=0.3)),
list(var="d",dist="unif",params=list(min=0,max=0.5)),
list(var="e",dist="unif",params=list(min=2.4E-5,max=2.4E-3)),
list(var="f",dist="unif",params=list(min=3E-5,max=3E-3)))
k = 6
b = 8
M = 2^b
n <- 2*M
X1 <- makeMCSample(n,vals, p = 1)
X2 <- makeMCSample(n,vals, p = 2)
x <- sobol2007(model = NULL, X1, X2, nboot = 200)
If I understand correctly, I should provide a y for each x$X sampling combination.
Then I can use the function "tell", which will generate the Sobol' first-order indices as well as the total indices:
tell(x,y)
ggplot(x)
Supplemental R function SobolR
makeMCSample <- function(n, vals, p = 1) { # p selects the seed (1 or 2), as used in the calls above
# Packages to generate quasi-random sequences
# and rearrange the data
require(randtoolbox)
require(plyr)
# Generate a Sobol' sequence
if (p == 2){ sob <- sobol(n, length(vals), seed = 4321, scrambling = 1)
}else{sob <- sobol(n, length(vals), seed = 1234, scrambling = 1)}
# Fill a matrix with the values
# inverted from uniform values to
# distributions of choice
samp <- matrix(rep(0,n*(length(vals)+1)), nrow=n)
samp[,1] <- 1:n
for (i in 1:length(vals)) {
# i=1
l <- vals[[i]]
dist <- l$dist
params <- l$params
fname <- paste("q",dist,sep="")
samp[,i+1] <- do.call(fname,c(list(p=sob[,i]),params))
}
# Convert matrix to data frame and add labels
samp <- as.data.frame(samp)
names(samp) <- c("n",laply(vals, function(l) l$var))
return(samp)
}
ref: Qiong-Li Wu, Paul-Henry Cournède, Amélie Mathieu, 2012, Efficient computational method for global sensitivity analysis and its application to tree growth modelling

How to cycle through a list of functions in a for loop

I am fairly new to programming in R, so I apologize if this question is too basic. I am trying to study the properties of OLS with error terms created by three different processes (i.e., normal1, normal2, and chi-square). I include these in a list, 'fun_list'.
I would like to iterate through 1,000 (iter) regressions, each with sample size 500 (n). I would like to save all 1,000 X 500 observations in a dataset (big_data) as well as the regression results (reg_results).
At the end of the program, I would like 1,000 regressions for each of the three processes (for a total of 3,000 regressions). I have set up nested loops for the three functions on one level and the 1,000 iterations on a different (sub-) level. I am having trouble getting the program to loop through the three different functions. I am not sure how to call out each element of the list in this embedded loop. Any help would be greatly appreciated!
library(psych)
library(arm)
library(dplyr)
library(fBasics)
library(sjstats)
#set sample size and number of iterations
set.seed(12345)
n <- 500
iter <- 1000
#setting empty vectors. Probably a better way to do this. :)
bn <- rep(NA,iter)
sen <- rep(NA,iter)
#these are the three functions I want to use to generate en,
#which is the error term below. I want one loop for each of the three.
# I can get f1, f2 and f3 to work independently, but I can't get the list
#to work to cycle through all three.
f1 <- function (n) {rnorm(n, 0, 2)}
f2 <- function (n) {rnorm(n, 0, 10)}
f3 <- function (n) {rchisq(n, 2)}
fun_list <- list(f1, f2, f3)
#following line starting point for saving all iterations in one big
#dataset
datalist = list()
#if I remove the following line (for (j ....)), I can get this to work by
#referencing each function independently (i.e., using 'en <- f1(n)').
for (j in fun_list) {
for (s in 1:iter) {
# en <- f1(n)
en <- fun_list[[1]]
x <- rnorm(n, 0, .5)
yn <- .3*x + en
#this is the part that saves the data#
dat <- data.frame(yn, x, en)
dat$s <- s
datalist[[s]] <- dat
#### run model for normal data and save parameters###
lm1n <- lm(yn ~ x)
int.hatn <- coef (lm1n)[1]
b.hatn <- coef (lm1n)[2]
se.hatn <- se.coef (lm1n) [2]
##save them for each iteration
bn[s] = b.hatn
sen[s] = se.hatn
}
}
reg_results<- tibble(bn, sen)
big_data = do.call(rbind,datalist)
When using the loop, I get the following error:
Error in 0.3 * x + en : non-numeric argument to binary operator
I am assuming this is because I do not fully understand how to call out each of the three functions in the list.
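Here is a complete solution that wraps up the points discussed in the comments. The error arises because fun_list[[1]] returns the function itself rather than its output, so the function has to be called, as in en(n), before it can be added to .3*x: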
library(psych)
library(arm)
library(dplyr)
library(fBasics)
library(sjstats)
#set sample size and number of iterations
set.seed(12345)
n <- 500
iter <- 1000
#setting empty vectors. Probably a better way to do this. :)
bn <- c()
sen <- c()
#these are the three functions I want to use to generate en,
#which is the error term below. I want one loop for each of the three.
# I can get f1, f2 and f3 to work independently, but I can't get the list
#to work to cycle through all three.
f1 <- function (n) {rnorm(n, 0, 2)}
f2 <- function (n) {rnorm(n, 0, 10)}
f3 <- function (n) {rchisq(n, 2)}
fun_list <- list(f1, f2, f3)
#following line starting point for saving all iterations in one big
#dataset
datalist = list()
#if I remove the following line (for (j ....)), I can get this to work by
#referencing each function independently (i.e., using 'en <- f1(n)').
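for (j in seq_along(fun_list)) {
en <- fun_list[[j]] # pick the j-th error-generating function
for (s in 1:iter) {
x <- rnorm(n, 0, .5)
random_part <- en(n) # call the function to draw n error terms
yn <- .3*x + random_part
#this is the part that saves the data#
dat <- data.frame(yn, x, random_part)
dat$s <- s
dat$j <- j # record which function generated the errors
# append rather than index by s, so later functions do not overwrite earlier ones
datalist[[length(datalist) + 1]] <- dat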
#### run model for normal data and save parameters###
lm1n <- lm(yn ~ x)
int.hatn <- coef(lm1n)[1]
b.hatn <- coef(lm1n)[2]
se.hatn <- se.coef(lm1n)[2]
##save them for each iteration
bn = c(bn,b.hatn)
sen = c(sen,se.hatn)
}
}
reg_results<- tibble(bn, sen)
big_data = do.call(rbind,datalist)
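As an alternative sketch of the same idea, the functions can be stored in a named list and looked up by name, which also makes it easy to label each regression with the process that produced it. This is not the answer's code: the names norm2, norm10 and chisq2 are made up, n and iter are reduced so the example runs quickly, and coef(summary(lm(...))) from base R is used in place of se.coef() from arm.

```r
set.seed(12345)
n <- 50      # smaller than in the question, to keep the sketch fast
iter <- 20

# Named list, so results can be labelled by error process (names are hypothetical)
fun_list <- list(norm2  = function(n) rnorm(n, 0, 2),
                 norm10 = function(n) rnorm(n, 0, 10),
                 chisq2 = function(n) rchisq(n, 2))

# One row per (iteration, function) pair, preallocated
reg_results <- expand.grid(s = seq_len(iter), fun = names(fun_list),
                           stringsAsFactors = FALSE)
reg_results$b  <- NA_real_
reg_results$se <- NA_real_

for (r in seq_len(nrow(reg_results))) {
  gen <- fun_list[[reg_results$fun[r]]]   # look up the function by name
  x   <- rnorm(n, 0, 0.5)
  y   <- 0.3 * x + gen(n)                 # call it to get numeric errors
  cf  <- coef(summary(lm(y ~ x)))         # coefficient table of the fit
  reg_results$b[r]  <- cf["x", "Estimate"]
  reg_results$se[r] <- cf["x", "Std. Error"]
}

table(reg_results$fun)   # number of regressions per error process
```

Preallocating the result table avoids growing vectors inside the loop, and the fun column keeps the three sets of regressions distinguishable afterwards.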

Package for Divide Chain of tesselations in R, spatstat package?

I am trying to create pretty figures of clustered points. Is there a package which will create the divide chain between tessellations of points? Ideally it would be fit for plotting in ggplot.
Here is some example code:
#DivideLineExample
library(spatstat)
W=owin(c(0,1),c(0,1)) # Set up the Window
p<-runifpoint(42, win=W) # Get random points
ll=cbind(p$x,p$y) # get lat/long for each point
zclust=kmeans(ll,centers=4) # Cluster the points spatially into 4 clusters
K<-pp<-D<-list()
plot(W,main="Clustered Points")
for (i in 1:4){ # this breaks up the points into separate ppp objects for each cluster
K[[i]]=ll[zclust$cluster==i,]
pp[[i]]=as.ppp(K[[i]],W)
plot(pp[[i]],col=i,add=TRUE,cex=1.5,pch=16)
D[[i]]=dirichlet(pp[[i]]) # This performs the Dirichlet Tessellation and plots
plot(D[[i]],col=i,add=TRUE)
}
This outputs as such:
http://imgur.com/CCXeOEB
What I'm looking for is this:
http://imgur.com/7nmtXjo
I know an algorithm exists.
Any ideas/alternatives?
I have written a function that I think will do what you want:
divchain <- function (X) {
stopifnot(is.ppp(X))
if(!is.multitype(X)) {
whinge <- paste(deparse(substitute(X)),
"must be a marked pattern with",
"factor valued marks.\n")
stop(whinge)
}
X <- unique(X, rule = "deldir", warn = TRUE)
w <- Window(X)
require(deldir)
dd <- deldir(X,z=marks(X),rw=c(w$xrange,w$yrange))
if (is.null(dd))
return(NULL)
ddd <- dd$dirsgs
sss <- dd$summary
z <- sss[["z"]]
rslt <- list()
nsgs <- nrow(ddd)
K <- 0
for (i in 1:nsgs) {
i1 <- ddd[i,5]
i2 <- ddd[i,6]
c1 <- z[i1]
c2 <- z[i2]
if(c1 != c2) {
K <- K+1
rslt[[K]] <- unlist(ddd[i,1:4])
}
}
class(rslt) <- "divchain"
attr(rslt,"rw") <- dd$rw
rslt
}
I have also written a plot method for class "divchain":
plot.divchain <- function(x,add=FALSE,...){
if(!add) {
rw <- attr(x,"rw")
plot(0,0,type="n",ann=FALSE,axes=FALSE,xlim=rw[1:2],ylim=rw[3:4])
bty <- list(...)$bty
box(bty=bty)
}
lapply(x,function(u){segments(u[1],u[2],u[3],u[4],...)})
invisible()
}
E.g.:
require(spatstat)
set.seed(42)
X <- runifpoint(50)
z <- factor(kmeans(with(X,cbind(x,y)),centers=4)$cluster)
marks(X) <- z
dcX <- divchain(X)
plot(dirichlet(X),border="brown",main="")
plot(X,chars=20,cols=1:4,add=TRUE)
plot(dcX,add=TRUE,lwd=3)
Let me know whether this is satisfactory. Sorry I can't help you with ggplot stuff; I don't do ggplot.
You could try a point-in-polygon test, for example using Kirkpatrick's data structure. It is much easier to divide the polygon horizontally or vertically. Source: http://www.personal.kent.edu/~rmuhamma/Compgeometry/MyCG/Voronoi/DivConqVor/divConqVor.htm

Can't get an R loop to execute

I am just starting out in R and am trying to get this loop to execute. I want the loop to calculate consecutive distances between coordinates using a function (Vincenty's formula). 'Distfunc' is the file containing the function. The function is then called to compute 'x' below. All I want is a data frame or a list of the distances between coordinates. Grateful for any help!
Distfunc <- source("F://Distfunc.R")
for (i in length(Radians)) {
LatRad1 <- Radians[i,1]
LongRad1 <- Radians[i,2]
LatRad2 <- Radians[i+1,1]
LongRad2 <- Radians[i+1,2]
x <- gcd.vif(LongRad1, LatRad1, LongRad2, LatRad2)
print(data.frame(x[i]))
}
Well, without a good description of the problem you are facing and a proper reproducible example it is very difficult to provide any good insight. To start off, see How to make a great R reproducible example?.
There are many things that are not clear in the way you are doing things. First of all, why assign the results of source(...) to the variable Distfunc?
Anyway, here is some code that I put together in trying to understand this; it runs without problems, but it is not clear that it accomplishes what you expect (since you don't provide much information). In particular, the code uses the implementation of the function gcd.vif by Mario Pineda-Krch (http://www.r-bloggers.com/great-circle-distance-calculations-in-r/). The code below is aimed at clarity, since you mention that you are starting in R.
# Calculates the geodesic distance between two points specified by radian latitude/longitude using
# Vincenty inverse formula for ellipsoids (vif)
# By Mario Pineda-Krch (http://www.r-bloggers.com/great-circle-distance-calculations-in-r/)
gcd.vif <- function(long1, lat1, long2, lat2) {
# WGS-84 ellipsoid parameters
a <- 6378137 # length of major axis of the ellipsoid (radius at equator)
b <- 6356752.314245 # length of minor axis of the ellipsoid (radius at the poles)
f <- 1/298.257223563 # flattening of the ellipsoid
L <- long2-long1 # difference in longitude
U1 <- atan((1-f) * tan(lat1)) # reduced latitude
U2 <- atan((1-f) * tan(lat2)) # reduced latitude
sinU1 <- sin(U1)
cosU1 <- cos(U1)
sinU2 <- sin(U2)
cosU2 <- cos(U2)
cosSqAlpha <- NULL
sinSigma <- NULL
cosSigma <- NULL
cos2SigmaM <- NULL
sigma <- NULL
lambda <- L
lambdaP <- 0
iterLimit <- 100
while (abs(lambda-lambdaP) > 1e-12 & iterLimit>0) {
sinLambda <- sin(lambda)
cosLambda <- cos(lambda)
sinSigma <- sqrt( (cosU2*sinLambda) * (cosU2*sinLambda) +
(cosU1*sinU2-sinU1*cosU2*cosLambda) * (cosU1*sinU2-sinU1*cosU2*cosLambda) )
if (sinSigma==0) return(0)  # Co-incident points
cosSigma <- sinU1*sinU2 + cosU1*cosU2*cosLambda
sigma <- atan2(sinSigma, cosSigma)
sinAlpha <- cosU1 * cosU2 * sinLambda / sinSigma
cosSqAlpha <- 1 - sinAlpha*sinAlpha
cos2SigmaM <- cosSigma - 2*sinU1*sinU2/cosSqAlpha
if (is.na(cos2SigmaM)) cos2SigmaM <- 0  # Equatorial line: cosSqAlpha=0
C <- f/16*cosSqAlpha*(4+f*(4-3*cosSqAlpha))
lambdaP <- lambda
lambda <- L + (1-C) * f * sinAlpha *
(sigma + C*sinSigma*(cos2SigmaM+C*cosSigma*(-1+2*cos2SigmaM*cos2SigmaM)))
iterLimit <- iterLimit - 1
}
if (iterLimit==0) return(NA)  # formula failed to converge
uSq <- cosSqAlpha * (a*a - b*b) / (b*b)
A <- 1 + uSq/16384*(4096+uSq*(-768+uSq*(320-175*uSq)))
B <- uSq/1024 * (256+uSq*(-128+uSq*(74-47*uSq)))
deltaSigma = B*sinSigma*(cos2SigmaM+B/4*(cosSigma*(-1+2*cos2SigmaM^2) -
B/6*cos2SigmaM*(-3+4*sinSigma^2)*(-3+4*cos2SigmaM^2)))
s <- b*A*(sigma-deltaSigma) / 1000
return(s) # Distance in km
}
# Initialize the variable 'Radians' with random data
Radians <- matrix(runif(20, min = 0, max = 2 * pi), ncol = 2)
lst <- list() # temporary list to store the results
for (i in seq(1, nrow(Radians) - 1)) { # loop through each row of the 'Radians' matrix
LatRad1 <- Radians[i, 1]
LongRad1 <- Radians[i, 2]
LatRad2 <- Radians[i + 1, 1]
LongRad2 <- Radians[i + 1, 2]
gcd_vif <- gcd.vif(LongRad1, LatRad1, LongRad2, LatRad2)
# Store the input data and the results
lst[[i]] <- c(
latitude_position_1 = LatRad1,
longitude_position_1 = LongRad1,
latitude_position_2 = LatRad2,
longitude_position_2 = LongRad2,
GCD = gcd_vif
)
}
Results <- as.data.frame(do.call(rbind, lst)) # store the input data and the results in a data frame
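One reason the loop in the question runs only once is that length() of a matrix is a single number (the count of cells), so for (i in length(Radians)) visits just that one value. The sketch below illustrates this and shows the consecutive-row pairing without a loop. To keep it self-contained it uses a simple haversine function on a sphere as a stand-in for gcd.vif; haversine_km, the 6371 km radius, and the three test points are assumptions for illustration only.

```r
# Hypothetical stand-in for gcd.vif: haversine distance in km on a sphere
haversine_km <- function(long1, lat1, long2, lat2, R = 6371) {
  dlat <- lat2 - lat1
  dlon <- long2 - long1
  a <- sin(dlat / 2)^2 + cos(lat1) * cos(lat2) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))
}

# Three made-up points (latitude, longitude) in radians
Radians <- matrix(c(0,      0,
                    0,      pi / 2,
                    pi / 2, 0), ncol = 2, byrow = TRUE)

# length() of a matrix counts cells, so this is one number, not a sequence
length(Radians)

# Rows 1..(n-1) are the start points, rows 2..n the end points
from <- seq_len(nrow(Radians) - 1)
to   <- from + 1
d <- haversine_km(Radians[from, 2], Radians[from, 1],
                  Radians[to, 2],   Radians[to, 1])
res <- data.frame(from = from, to = to, km = d)
res
```

Both legs here span a quarter of a great circle, so the two distances should agree. This vectorised call only works because the stand-in function is itself vectorised; gcd.vif as written uses a while loop on scalars, so it still needs the row-by-row loop shown above.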

Calculation with Nested loops in R

I have a Geocode dataset with three columns: Latitude, Longitude and Cluster. I calculated the average center of the clusters and stored the results in two lists, Center_lat and Center_lon.
Now I want to calculate the distance from each observation (3000+) to each cluster center (30) with the Haversine formula, to get a 3000 by 30 matrix.
I tried to use a nested for loop, but I got the same distance for all clusters. Here's the code.
for (i in 1:n){
for (k in 1:c){
lat1=radians(Geocode[i,1])
lon1=radians(Geocode[i,2])
lat2=radians(Center_lat[k,2])
lon2=radians(Center_lon[k,2])
}
R <- 3958.756 # Earth mean radius [miles]
dist_mat[i,] <- acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * R
}
I'm also thinking of using lapply to replace the nested loop, but I'm not sure how to use the function. Any help is appreciated.
# Convert to radian
radians = function(theta=0){return(theta * pi / 180)}
# Calculates the geodesic distance from each property to the center of its current cluster using the
# Spherical Law of Cosines (slc)
get_dist <- function(lat1, lon1, lat2, lon2) {
R <- 3958.756 # Earth mean radius [miles]
d <- acos(sin(radians(lat1))*sin(radians(lat2)) +
cos(radians(lat1))*cos(radians(lat2)) * cos(radians(lon2)-radians(lon1))) * R
return(d) # Distance in miles
}
dist_mat<-lapply()
This is the type of computation you want to vectorize in R. Here we use outer to generate all the possible combinations of row indices from your Geocode data and Center_x data, and then apply the distance function in one fell swoop.
First, get data in easier to use form (one matrix for locations, another for centers, first column lats, second lons):
# See Data section below for actual data used
# G <- radians(Geocode)
# C <- radians(cbind(Center_lat[, 2], Center_lon[, 2]))
R <- 3958.756 # Earth mean radius [miles]
Define the function. Notice how we use the indices to look up the actual coordinates in G and C, and how the function is vectorized (i.e. we only need to call it once with all the data):
my_dist <- function(xind, yind)
acos(
sin(G[xind, 1]) * sin(C[yind, 1]) +
cos(G[xind, 1]) * cos(C[yind, 1]) * cos(C[yind, 2] - G[xind, 2])
) * R
And apply it with outer:
DISTS <- outer(seq.int(nrow(G)), seq.int(nrow(C)), my_dist)
str(DISTS)
# num [1:3000, 1:30] 4208 6500 8623 7303 3864 ...
quantile(DISTS) # to make sure stuff is reasonable:
# 0% 25% 50% 75% 100%
# 0.000 4107.574 6204.799 8333.155 12422.059
This runs in about 30ms on my system.
Data:
set.seed(1)
lats <- runif(10000, -60, 60) * pi / 180
lons <- runif(10000, -179, 180) * pi / 180
G.ind <- sample(10000, 3000)
C.ind <- sample(10000, 30)
G <- cbind(lats[G.ind], lons[G.ind])
C <- cbind(lats[C.ind], lons[C.ind])
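As a quick sanity check on the outer() approach, the sketch below runs the same distance function on a much smaller, made-up data set and compares the result with an explicit double loop that fills one cell at a time. The 5 locations and 3 centers are random values chosen only for illustration.

```r
R <- 3958.756                       # Earth mean radius [miles], as above
set.seed(1)

# Small made-up inputs in radians: 5 locations and 3 centers (lat, lon)
G <- cbind(runif(5, -1, 1), runif(5, -3, 3))
C <- cbind(runif(3, -1, 1), runif(3, -3, 3))

# Same spherical law of cosines function as above, indexed by row numbers
my_dist <- function(xind, yind)
  acos(
    sin(G[xind, 1]) * sin(C[yind, 1]) +
      cos(G[xind, 1]) * cos(C[yind, 1]) * cos(C[yind, 2] - G[xind, 2])
  ) * R

# outer() passes all index pairs at once, so my_dist must be vectorised
DISTS <- outer(seq_len(nrow(G)), seq_len(nrow(C)), my_dist)

# Same computation with explicit loops, writing one cell per (i, k) pair
LOOP <- matrix(NA_real_, nrow(G), nrow(C))
for (i in seq_len(nrow(G))) {
  for (k in seq_len(nrow(C))) {
    LOOP[i, k] <- my_dist(i, k)
  }
}

all.equal(DISTS, LOOP)
```

The two matrices should agree, which also shows where the assignment has to sit in a loop-based version: inside the inner loop, indexed by both i and k.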
It seems like you want to write to the matrix once per row and per column, so you'd want to change the matrix within both for loops, like this:
for (i in 1:n){
for (k in 1:c){
lat1=radians(Geocode[i,1])
lon1=radians(Geocode[i,2])
lat2=radians(Center_lat[k,2])
lon2=radians(Center_lon[k,2])
R <- 3958.756 # Earth mean radius [miles]
dist_mat[i,k] <- acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * R
}
}
