Moving window regression - r

I want to perform a moving window regression on every pixel of two raster stacks representing Band3 and Band4 of Landsat data. The result should be two additional stacks, one representing the Intercept and the other one representing the slope of the regression.
So layer 1 of stack "B3" and stack "B4" result in layer 1 of stack "intercept" and stack "slope". Layer 2 of stack B3 and stack B4 result in layer 2, and so on.
I already came across the gwr function, but want to stay in the raster package.
I suspect that focal must be involved in order to set my moving window (which should be 3x3 pixels), together with a linear model like lm(as.matrix(b3)~as.matrix(b4)), although I don't think that this gets me the values pixelwise.
Instead of a RasterStack, a layer-by-layer approach is also possible. (So it does not necessarily have to be a RasterStack of Band3.)
Does anyone have a clue how to program this in R?

Here is one approach, adapted from ?raster::localFun
library(raster)
set.seed(0)
b <- stack(system.file("external/rlogo.grd", package="raster"))
x <- flip(b[[2]], 'y') + runif(ncell(b))
y <- b[[1]] + runif(ncell(b))
# local regression:
rfun <- function(x, y, ...) {
d <- na.omit(data.frame(x, y))
if (nrow(d) < 3) return(NA)
m <- lm(y~x, data=d)
# return slope
coefficients(m)[2]
}
ff <- localFun(x, y, fun=rfun)
plot(ff)
Unfortunately you will have to run this twice to get both the slope and intercept (coefficients(m)[1]).
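If running the window twice is a concern, the same idea can be sketched on plain matrices so that both coefficients come out of one pass. This is only a minimal base R illustration, not part of the raster package: local_lm and its arguments are hypothetical names, and the window is simply clipped at the matrix edges.

```r
# Sketch: moving-window regression of y on x, on plain matrices.
# local_lm() is a hypothetical helper, not a raster function.
local_lm <- function(xm, ym, half = 1, min_n = 3) {
  stopifnot(identical(dim(xm), dim(ym)))
  nr <- nrow(xm)
  nc <- ncol(xm)
  intercept <- matrix(NA_real_, nr, nc)
  slope <- matrix(NA_real_, nr, nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      # (2 * half + 1)^2 window, clipped at the matrix edges
      rows <- max(1, i - half):min(nr, i + half)
      cols <- max(1, j - half):min(nc, j + half)
      xv <- as.vector(xm[rows, cols])
      yv <- as.vector(ym[rows, cols])
      ok <- !is.na(xv) & !is.na(yv)
      if (sum(ok) < min_n) next   # too few complete pairs: leave NA
      co <- coef(lm(yv[ok] ~ xv[ok]))
      intercept[i, j] <- co[1]
      slope[i, j] <- co[2]
    }
  }
  list(intercept = intercept, slope = slope)
}

# Demo with an exact linear relation, so the answer is known
set.seed(1)
xm <- matrix(runif(36), 6, 6)
ym <- 2 + 3 * xm
res <- local_lm(xm, ym)
```

The two result matrices could then be turned back into layers, for example with raster(res$slope), after copying extent and projection from the input layer.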

Related

Maximum pseudo-likelihood estimator for soft-core point process

I am trying to fit a soft-core point process model to a point pattern using maximum pseudo-likelihood. I followed the instructions given in this paper by Baddeley and Turner.
And here is the R code I came up with:
`library(deldir)
library(tidyverse)
library(fields)
#MPLE
# irregular parameter k
k <- 0.4
## Generate dummy points 50X50. "RA" and "DE" are x and y coordinates
dum.x <- seq(ramin, ramax, length = 50)
dum.y <- seq(demin, demax, length = 50)
dum <- expand.grid(dum.x, dum.y)
colnames(dum) <- c("RA", "DE")
## Combine with data and specify which is data point and which is dummy, X is the point pattern to be fitted
bind.x <- bind_rows(X, dum) %>%
mutate(Ind = c(rep(1, nrow(X)), rep(0, nrow(dum))))
## Calculate Quadrature weights using Voronoi cell area
w <- deldir(bind.x$RA, bind.x$DE)$summary$dir.area
## Response
y <- bind.x$Ind/w
# the sum of distances between all pairs of points (the sufficient statistics)
tmp <- cbind(bind.x$RA, bind.x$DE)
t1 <- rdist(tmp)^(-2/k)
t1[t1 == Inf] <- 0
t1 <- rowSums(t1)
t <- -t1
# fit the model using quasipoisson regression
fit <- glm(y ~ t, family = quasipoisson, weights = w)
`
However, the fitted parameter for t is negative, which is not a valid value for a soft-core point process. My point pattern is actually simulated from a soft-core process, so a negative estimate does not make sense. I tried my best to find bugs in the code but could not find any. The only potential issue I see is that my sufficient statistic is extremely large (on the order of 10^14), which I fear may cause numerical issues. The statistic is large because my observation window is very small in its units, and the average distance between a pair of points is around 0.006. A sufficient statistic based on such distances will certainly be very large, but my intuition tells me that this alone should not cause a numerical problem or make the fitted parameter negative.
Can anybody help and check if my code is correct? Thanks very much!
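One thing that might be worth checking (an observation about the Berman-Turner setup, not a verified diagnosis of the result above): in that approximation the conditional intensity at a quadrature point depends on the data points only, whereas rdist(tmp) sums over every pair of quadrature points, dummy points included. A minimal base R sketch of the data-only statistic follows; softcore_stat, quad and data_pts are hypothetical names.

```r
# Sketch: soft-core statistic at each quadrature point, summed over
# DATA points only. softcore_stat() is a hypothetical helper.
softcore_stat <- function(quad, data_pts, k) {
  # quad:     n_q x 2 matrix of quadrature points (data + dummy)
  # data_pts: n_d x 2 matrix of data points
  dx <- outer(quad[, 1], data_pts[, 1], "-")
  dy <- outer(quad[, 2], data_pts[, 2], "-")
  d <- sqrt(dx^2 + dy^2)
  s <- d^(-2 / k)
  # zero distance means a data point paired with itself: drop that term
  # (this would also drop a dummy point that coincides with a data point)
  s[d == 0] <- 0
  -rowSums(s)
}

# Tiny check: two data points and one dummy point, with k = 1
data_pts <- rbind(c(0, 0), c(1, 0))
dummy <- rbind(c(0, 1))
stat <- softcore_stat(rbind(data_pts, dummy), data_pts, k = 1)
```

With the objects from the question, quad would be cbind(bind.x$RA, bind.x$DE) and data_pts the rows where Ind == 1.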

How to extrapolate a raster in R

I am trying to downscale climatic conditions using the methodology in this article, using the R software. I am almost there, but I am missing a couple of steps.
Packages and data needed
For this example I uploaded some data to the archive.org website. To load the required packages and data, use the following code:
library(raster)
library(rgdal)
download.file("https://archive.org/download/Downscaling/BatPatagonia.rds", "Bat.rds")
download.file("https://archive.org/download/Downscaling/TempMinPatNow.rds", "Tmin.rds")
BatPatagonia <- readRDS("Bat.rds")
TempMinPatNow <- readRDS("Tmin.rds")
BatPatagonia is a raster file with the bathymetry and altitude of the area, extracted and transformed from the GEBCO dataset, while TempMinPatNow is the minimum temperature of the same area for January, extracted from WorldClim. The plots of the datasets are shown below:
The goal of this question
In order to downscale past data from the last glacial maximum, I need to model what the current climate would be like if the sea level were the same as it was in the past. To do that, I use the GEBCO data to figure out roughly where the coast was. According to the methodology in the article cited above, these are the first three steps to follow:
1. Create a DEM for land above 20 meters above sea level
2. Calculate a Multiple Linear Regression in a moving window
3. Extrapolate coefficients to the ocean
Point 3 is what I have been struggling with. I will show how I did the first 2 points, and what I have looked into while trying to solve point 3.
1. Create a DEM for land 20 meters above sea level
In order to do this I took the BatPatagonia raster and replaced all values at or below 20 meters with NA values, using the following code:
Elev20 <- BatPatagonia
values(Elev20) <- ifelse(values(Elev20) <= 20, NA, values(Elev20))
The resulting raster is shown in the following image
2. Calculate a Multiple Linear Regression in a moving window
According to page 2591 of the manuscript, the next step is to do a Multiple Linear Regression in a moving window, using the following formula for altitudes over 20 meters:
We already have the elevation data, but we also need rasters for latitude and longitude. The following code first creates the latitude and longitude rasters:
Latitud <- BatPatagonia
Longitud <- BatPatagonia
data_matrix <- raster::xyFromCell(BatPatagonia, 1:ncell(BatPatagonia))
values(Latitud) <- data_matrix[, 2]
values(Longitud) <- data_matrix[, 1]
We will multiply this by a raster mask of the areas that have elevations over 20 meters, so that we get only the values that we need:
Elev20Mask <- BatPatagonia
values(Elev20Mask) <- ifelse(values(Elev20Mask) <= 20, NA, 1)
Longitud <- Elev20Mask*Longitud
Latitud <- Elev20Mask*Latitud
Now I will build a stack with the response variables and the predictor variables:
Preds <- stack(Elev20, Longitud, Latitud, TempMinPatNow)
names(Preds) <- c("Elev", "Longitud", "Latitud", "Tmin")
The resulting stack is shown in the following figure:
As stated in the paper, the moving window should be 25 by 25 cells, for a total of 625 cells. However, the authors state that if the moving window has fewer than 170 cells with data, the regression should not be performed, and that it should have a maximum of 624 cells with data, to ensure that we only model areas close to the coast. The result of this multiple regression with the moving window should be a stack with the local intercept and the local estimate of each of the betas in the equation shown above. I worked out how to do this with the getValuesFocal function, using the following code (this loop takes a while):
# First we establish the 25 by 25 window
w <- c(25, 25)
# Then we create the empty layers for the resulting stack
intercept <- Preds[[1]]
intercept[] <- NA
elevationEst <- intercept
latitudeEst <- intercept
longitudeEst <- intercept
Now we start the code:
for (rl in 1:nrow(Preds)) {
v <- getValuesFocal(Preds[[1:4]], row = rl, nrows = 1, ngb = w, array = FALSE)
int <- rep(NA, nrow(v[[1]]))
x1 <- rep(NA, nrow(v[[1]]))
x2 <- rep(NA, nrow(v[[1]]))
x3 <- rep(NA, nrow(v[[1]]))
x4 <- rep(NA, nrow(v[[1]]))
for (i in 1:nrow(v[[1]])) {
xy <- na.omit(data.frame(x1 = v[[1]][i, ], x2 = v[[2]][i, ], x3 = v[[3]][i,
], y = v[[4]][i, ]))
if (nrow(xy) >= 170 && nrow(xy) <= 624) {
coefs <- coefficients(lm(as.numeric(xy$y) ~ as.numeric(xy$x1) +
as.numeric(xy$x2) + as.numeric(xy$x3)))
int[i] <- coefs[1]
x1[i] <- coefs[2]
x2[i] <- coefs[3]
x3[i] <- coefs[4]
} else {
int[i] <- NA
x1[i] <- NA
x2[i] <- NA
x3[i] <- NA
}
}
intercept[rl, ] <- int
elevationEst[rl, ] <- x1
longitudeEst[rl, ] <- x2
latitudeEst[rl, ] <- x3
message(paste(rl, "of", nrow(Preds), "ready"))
}
Coeffs <- stack(intercept, elevationEst, longitudeEst, latitudeEst,
(intercept + Preds$Elev * elevationEst + Preds$Longitud * longitudeEst + Preds$Latitud * latitudeEst),
Preds$Tmin)
# layer names must follow the same order as the layers stacked above
names(Coeffs) <- c("intercept", "elevationEst", "longitudeEst", "latitudeEst", "fitted", "Observed")
The result of this loop is the Coeffs stack, shown below:
This is where I got stuck:
3. Extrapolate coefficients to the ocean
The goal now is to extrapolate the first 4 rasters of the Coeffs stack (intercept, elevationEst, longitudeEst and latitudeEst) to where the coast should have been at the last glacial maximum, when the sea level was 120 meters lower:
MaxGlacier <- BatPatagonia
values(MaxGlacier) <- ifelse(values(MaxGlacier) < -120, NA,1)
The projected coastline is shown in the following map:
The way the authors projected the coefficients to the coast was by filling the gaps by solving Poisson's equation, using the poisson_grid_fill function of the NCL language from NCAR. But I would like to keep it simple and do everything in the same language. I also found a similar function in Python.
I would be happy with any extrapolation process that works well, I am not limiting my search to that algorithm.
I found several R packages that fill gaps, such as the Gapfill package, and even found a review of gap-filling methods, but most of them are for interpolation, and mostly for NDVI layers, where the gap is filled based on other layers.
Any ideas on how to move forward on this?
Thanks
Thinking back several decades to my physics undergrad days, we used Laplace relaxation to solve these types of Poisson equation problems. I'm not sure, but I guess that may also be how poisson_grid_fill works. The process is simple. Relaxation is an iterative process where we calculate each cell except those that form the boundary condition as the mean of the cells that are horizontally or vertically adjacent, then repeat until the result approaches a stable solution.
In your case, the cells for which you already have values provide your boundary condition, and we can iterate over the others. Something like this (demonstrated here for the intercept coefficient - you can do the others the same way):
gaps = which(is.na(intercept)[])
intercept.ext = intercept
w=matrix(c(0,0.25,0,0.25,0,0.25,0,0.25,0), nc=3, nr=3)
max.it = 1000
for (i in 1:max.it) intercept.ext[gaps] = focal(intercept.ext, w=w, na.rm=TRUE)[gaps]
intercept.ext = mask(intercept.ext, MaxGlacier)
Edit
Here's the same process embedded in a function, to demonstrate how you might use a while loop that continues until a desired tolerance is reached (or maximum number of iterations is exceeded). Note that this function is to demonstrate the principle, and is not optimised for speed.
gap.fill = function(r, max.it = 1e4, tol = 1e-2, verbose=FALSE) {
gaps = which(is.na(r)[])
r.filled = r
w = matrix(c(0,0.25,0,0.25,0,0.25,0,0.25,0), nc=3, nr=3)
i = 0
while(i < max.it) {
i = i + 1
new.vals = focal(r.filled, w=w, na.rm=TRUE)[gaps]
max.residual = suppressWarnings(max(abs(r.filled[gaps] - new.vals), na.rm = TRUE))
if (verbose) print(paste('Iteration', i, ': residual = ', max.residual))
r.filled[gaps] = new.vals
if (is.finite(max.residual) & max.residual <= tol) break
}
return(r.filled)
}
intercept.ext = gap.fill(intercept)
intercept.ext = mask(intercept.ext, MaxGlacier)
plot(stack(intercept, intercept.ext))
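To make the relaxation step itself easier to see, here is the same idea sketched on a plain matrix in base R, without focal. It is only an illustration under my own assumptions: relax_fill is a hypothetical name, gaps start from the overall mean, and edge cells are replicated outwards (a zero-gradient boundary), which is not necessarily what poisson_grid_fill does.

```r
# Sketch: Laplace relaxation on a plain matrix (hypothetical helper).
relax_fill <- function(m, max_it = 5000, tol = 1e-6) {
  gaps <- is.na(m)
  if (!any(gaps)) return(m)
  filled <- m
  filled[gaps] <- mean(m, na.rm = TRUE)   # arbitrary starting guess
  nr <- nrow(m)
  nc <- ncol(m)
  for (it in seq_len(max_it)) {
    # pad by replicating edge cells (zero-gradient boundary, an assumption)
    p <- filled[c(1, 1:nr, nr), c(1, 1:nc, nc)]
    # mean of the four horizontally/vertically adjacent cells
    nb_mean <- (p[1:nr, 2:(nc + 1)] + p[3:(nr + 2), 2:(nc + 1)] +
                p[2:(nr + 1), 1:nc] + p[2:(nr + 1), 3:(nc + 2)]) / 4
    change <- max(abs(nb_mean[gaps] - filled[gaps]))
    filled[gaps] <- nb_mean[gaps]          # known cells are never touched
    if (change <= tol) break
  }
  filled
}

# Demo: known values on the left (0) and right (10) columns only
m <- matrix(NA_real_, 5, 5)
m[, 1] <- 0
m[, 5] <- 10
out <- relax_fill(m)
```

In this demo the filled cells settle on a straight ramp between the two known columns, which is the smooth solution one would expect from this kind of fill.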

polynomial fitting on spectral data

I want to fit a polynomial function (max. 3rd order) to each raster cell over all my spectral bands (Landsat 1-7), creating a new raster (stack) representing the coefficients.
I have my data (including NA values) in a stack with 6 layers (Landsat bands 1-7, excluding 6).
I guess I should somehow tell the polynomial function at which spectral wavelengths the bands are located
Landsat7 Wavelength (micrometers)
Band 1 0.45-0.52
Band 2 0.52-0.60
Band 3 0.63-0.69
Band 4 0.77-0.90
Band 5 1.55-1.75
Band 7 2.09-2.35
so that it can fit it properly.
Has anyone an idea how to do that polynomial fitting of each cell and extracting the coefficients in R? Thanks for any help!
Your question is not very clear, as you do not specify what you are fitting. I am guessing it is band number. You can do something like this.
library(raster)
b <- brick(system.file("external/rlogo.grd", package="raster"))
b[[2]][125:225] <- NA
s <- stack(b, flip(b, 'y'))
names(s) <- paste0('b', 1:6)
bands <- 1:6
f <- function(x) {
# in case of NAs; match the number of coefficients returned
if (any(is.na(x))) return(c(NA, NA, NA))
m <- lm(x ~ bands + I(bands^2))
coefficients(m)
}
z <- calc(s, f)
z
plot(z)
If you need to speed this up you can follow the example here:
https://gis.stackexchange.com/questions/144211/want-cell-linear-regression-values-for-a-netcdf-or-multi-band-raster/144408#144408
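If the fit should be against wavelength rather than band number, the per-cell function could look roughly like the sketch below. It assumes the band-centre wavelengths are the midpoints of the ranges listed in the question (my assumption), and fit_cubic is a hypothetical name. The demo runs on a small matrix with one row per pixel so that it does not need any raster data.

```r
# Band-centre wavelengths (micrometres): midpoints of the ranges listed
# in the question. Using midpoints is an assumption.
wl <- c(0.485, 0.560, 0.660, 0.835, 1.650, 2.220)

# Hypothetical per-pixel function: cubic fit of the band values on wavelength.
fit_cubic <- function(v) {
  # in case of NAs, match the number of coefficients returned
  if (any(is.na(v))) return(rep(NA_real_, 4))
  unname(coef(lm(v ~ wl + I(wl^2) + I(wl^3))))
}

# Demo: one pixel built from a known cubic, one pixel with a missing band
truth <- c(0.1, 0.5, -0.3, 0.05)
pix <- rbind(truth[1] + truth[2] * wl + truth[3] * wl^2 + truth[4] * wl^3,
             c(0.1, NA, 0.2, 0.3, 0.25, 0.2))
coefs <- t(apply(pix, 1, fit_cubic))   # one row of 4 coefficients per pixel
```

On a six-layer stack this function could be passed to calc in the same way as f above, for example calc(s, fit_cubic).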

Observation in a bivariate Ellipse

I am trying to find the probability that a point lies within an ellipse.
For example, if I plot the bivariate data (x, y) for 300 datasets with a 95% ellipsoid region, how do I calculate how many times out of 300 my points fall inside the ellipse?
Here's the code I am using
library(MASS)
set.seed(1234)
x<-NULL
k<-1
Sigma2 <- matrix(c(.72,.57,.57,.46),2,2)
Sigma2
rho <- Sigma2[1,2]/sqrt(Sigma2[1,1]*Sigma2[2,2])
rho
eta1<-replicate(300,mvrnorm(k, mu=c(-1.59,-2.44), Sigma2))
library(car)
dataEllipse(eta1[1,],eta1[2,], levels=c(0.05, 0.95))
Thanks for your help.
I don't see why people are jumping on the OP. In context, it's clearly a programming question: it's about getting the empirical frequency of data points within a given ellipse, not a theoretical probability. The OP even posted code and a graph showing what they're trying to obtain.
It may be that they don't fully understand the statistical theory behind a 95% ellipse, but they didn't ask about that. Besides, making plots and calculating frequencies like this is an excellent way of coming to grips with the theory.
Anyway, here's some code that answers the narrowly-defined question of how to count the points within an ellipse obtained via a normal distribution (which is what underlies dataEllipse). The idea is to transform your data to the unit circle via principal components, then get the points within a certain radius of the origin.
within.ellipse <- function(x, y, plot.ellipse=TRUE)
{
if(missing(y) && is.matrix(x) && ncol(x) == 2)
{
y <- x[,2]
x <- x[,1]
}
if(plot.ellipse)
dataEllipse(x, y, levels=0.95)
d <- scale(prcomp(cbind(x, y), scale.=TRUE)$x)
rad <- sqrt(2 * qf(.95, 2, nrow(d) - 1))
mean(sqrt(d[,1]^2 + d[,2]^2) < rad)
}
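The same count can also be sketched with squared Mahalanobis distances, which avoids the principal components step. This is a base R illustration with the same threshold as the function above; within_ellipse_md is a hypothetical name, and the demo data are generated with chol rather than mvrnorm so that no extra package is needed.

```r
# Sketch: fraction of points inside the normal-theory data ellipse,
# using squared Mahalanobis distances (hypothetical helper name).
within_ellipse_md <- function(x, y, level = 0.95) {
  xy <- cbind(x, y)
  n <- nrow(xy)
  d2 <- mahalanobis(xy, colMeans(xy), cov(xy))
  # same squared radius as above: 2 * F quantile with (2, n - 1) df
  mean(d2 < 2 * qf(level, 2, n - 1))
}

# Demo: 300 bivariate normal points with the covariance from the question
set.seed(42)
n <- 300
Sigma <- matrix(c(0.72, 0.57, 0.57, 0.46), 2, 2)
z <- matrix(rnorm(2 * n), n, 2)
xy <- sweep(z %*% chol(Sigma), 2, c(-1.59, -2.44), "+")
frac <- within_ellipse_md(xy[, 1], xy[, 2])
```

For well-behaved data like this, frac should land close to, but usually not exactly at, 0.95.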
It was also commented that a 95% data ellipse contains 95% of the data by definition. This is certainly not true, at least for normal-theory ellipses. If your distribution is particularly bad, the coverage frequency may not even converge to the assumed level as the sample size increases. Consider a generalised pareto distribution, for example:
library(evd) # for rgpd
# generalised pareto has no variance for shape > 0.5
z <- sapply(1:1000, function(...) within.ellipse(rgpd(100, shape=5), rgpd(100, shape=5), FALSE))
mean(z)
[1] 0.97451
z <- sapply(1:1000, function(...) within.ellipse(rgpd(10000, shape=5), rgpd(10000, shape=5), FALSE))
mean(z)
[1] 0.9995808

Create function to automatically create plots from summary(fit <- lm( y ~ x1 + x2 +... xn))

I am running the same regression several times with small alterations of the x variables. My aim is, after having determined the fit and significance of each variable for this linear regression model, to view all major plots. Instead of having to create each plot one by one, I want a function to loop through my variables (x1...xn) from the following model.
fit <- lm(y ~ x1 + x2 + ... + xn)
The plots I want to create for all x are
1) 'x versus y' for all x in the function above
2) 'x versus predicted y'
3) x versus residuals
4) x versus time, where time is not a variable used in the regression but provided in the dataframe the data comes from.
I know how to access the coefficients from fit, however I am not able to use the coefficient names from the summary and reuse them in a function for creating the plots, as the names are characters.
I hope my question has been clearly described and hasn't been asked already.
Thanks!
Create some mock data
dat <- data.frame(x1=rnorm(100), x2=rnorm(100,4,5), x3=rnorm(100,8,27),
x4=rnorm(100,-6,0.1), t=(1:100)+runif(100,-2,2))
dat <- transform(dat, y=x1+4*x2+3.6*x3+4.7*x4+rnorm(100,3,50))
Make the fit
fit <- lm(y~x1+x2+x3+x4, data=dat)
Compute the predicted values
dat$yhat <- predict(fit)
Compute the residuals
dat$resid <- residuals(fit)
Get a vector of the variable names
vars <- names(coef(fit))[-1]
A plot can be made using this character representation of the name if you use it to build a string version of a formula and convert that with as.formula. The four plots are below, and they are wrapped in a loop over all the vars. Additionally, the loop is surrounded by setting ask to TRUE so that you get a chance to see each plot. Alternatively, you can arrange multiple plots on the screen, or write them all to files to review later.
opar <- par(ask=TRUE)
for (v in vars) {
plot(as.formula(paste("y~",v)), data=dat)
plot(as.formula(paste("yhat~",v)), data=dat)
plot(as.formula(paste("resid~",v)), data=dat)
plot(as.formula(paste("t~",v)), data=dat)
}
par(opar)
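For the "write them all to files" variant, something along these lines might work. It is a sketch under a few assumptions: plot_by_predictor and its arguments are hypothetical names, the model has no rows dropped for missing values, and reformulate is used in place of paste plus as.formula.

```r
# Sketch: four plots per predictor, one PDF page per predictor.
# plot_by_predictor() and its arguments are hypothetical names.
plot_by_predictor <- function(fit, data, time_var, file) {
  resp <- all.vars(formula(fit))[1]          # name of the response
  vars <- attr(terms(fit), "term.labels")    # predictor names as characters
  data$yhat <- fitted(fit)                   # assumes no rows were dropped
  data$resid <- residuals(fit)
  pdf(file)
  on.exit(dev.off())
  for (v in vars) {
    par(mfrow = c(2, 2))                     # 2 x 2 panel, starts a new page
    plot(reformulate(v, response = resp), data = data)
    plot(reformulate(v, response = "yhat"), data = data)
    plot(reformulate(v, response = "resid"), data = data)
    plot(reformulate(v, response = time_var), data = data)
  }
  invisible(vars)
}

# Demo with mock data, written to a temporary file
set.seed(1)
dat2 <- data.frame(x1 = rnorm(50), x2 = rnorm(50), t = 1:50)
dat2$y <- 1 + 2 * dat2$x1 - dat2$x2 + rnorm(50)
fit2 <- lm(y ~ x1 + x2, data = dat2)
out_file <- file.path(tempdir(), "predictor_plots.pdf")
plotted <- plot_by_predictor(fit2, dat2, time_var = "t", file = out_file)
```

Taking the predictor names from terms(fit) keeps everything as character strings, so the names never have to be typed into the plotting code.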
The coefficients are stored in the fit objects as you say, but you can access them generically in a function by referring to them this way:
x <- 1:10
y <- x*3 + rnorm(1)
plot(x,y)
fit <- lm(y~x)
fit$coefficients[1] # intercept
fit$coefficients[2] # slope
str(fit) # a lot of info, but you can see how the fit is stored
My guess is that when you say you know how to access the coefficients, you are getting them from summary(fit), which is a bit harder to access than taking them directly from the fit. By using fit$coefficients[1] etc. you don't have to have the name of the variable in your function.
Three options to directly answer what I think was the question: How to access the coefficients using character arguments:
x <- 1:10
y <- x*3 + rnorm(1)
fit <- lm(y~x)
# 1
fit$coefficients["x"]
# 2
coefname <- "x"
fit$coefficients[coefname]
# 3
coef(fit)[coefname]
If the question was how to plot the various functions then you should supply a sufficiently complex construction (in R) to allow demonstration of methods with a well-specified set of objects.
