Linear regression on each raster pixel to predict a future month (in R) - r

I have successfully run this code, which I found in:
Can't Calculate pixel-wise regression in R on raster stack with fun
library(raster)
# Example data
r <- raster(nrow=15, ncol=10)
set.seed(0)
# Now I make 6 rasters (1 raster per month) and assign each pixel's value randomly
s <- stack(lapply(1:6, function(i) setValues(r, rnorm(ncell(r), i, 3))))
names(s) <- paste0('Month', c(1,2,3,4,5,6))
# Extract the cell values of each pixel
x <- values(s)
# Fit a linear regression model
m <- lm(Month6 ~ ., data=data.frame(x))
# Prediction raster
p <- predict(s, m)
If you run that code, p will be a raster. But I am still confused: how do I make a raster for a future month? For example, how can I get a 'Month8' raster based on the 6 previous rasters?
What I mean is, each pixel has its own linear regression equation (fitted over X = Month1, ..., Month6). If I input X = Month8, I get 150 values of Y, one predicted value for each pixel of the 8th-month raster.
What I have done
# Let's make a data frame to get a clearer view of the data
x <- values(s)
DF <- data.frame(x)
# Use month as X and the pixel value as the target y
library(data.table)
DF_T <- transpose(DF)
Month <- seq(1,nrow(DF_T))
DF_T <- cbind(Month, DF_T)
# Make prediction for first pixel
V1_lr <- lm(V1 ~ Month, data=DF_T)
# prediction for the 8th month in one pixel
V1_p <- predict(V1_lr, data.frame(Month=8))
V1_p
This is just one pixel, though. I want the entire raster for 'Month8'.
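One way to produce that 'Month8' raster (a minimal sketch, not from the original post, assuming the stack s built above): apply the same per-pixel regression against month with raster's calc, returning each cell's prediction at month 8.
library(raster)
month <- 1:6
f <- function(y) {
  # y holds the 6 monthly values of one cell
  if (any(is.na(y))) return(NA)
  m <- lm(y ~ month)
  # predicted value for month 8
  as.numeric(predict(m, data.frame(month = 8)))
}
Month8 <- calc(s, f) # a 15 x 10 raster: one prediction per pixel
plot(Month8)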

Related

Creating a point distance component to a Monte Carlo simulation function in R

I am attempting to do some Monte Carlo simulations on a population of 325 samples in a field. I want to create a list of composite samples (samples consisting of multiple subsamples) from the dataset while increasing the sample size, repeated 100 times. I have created the code that does so, supplied below.
##Create an example data set
# x and y are coordinates
x <- c(1:100)
y <- rev(c(1:100))
## z and w are soil test values
set.seed(2345)
z <- rnorm(100,mean=50, sd=10)
set.seed(2345)
w <- rnorm(100, mean=75, sd=5)
data <- data.frame(x, y, z, w)
##Initialize list
data.step.sim.list <- list()
## Code that increases sample size
for (i in seq_len(nrow(data))) {
  thisdat <- replicate(100, data[sample(1:nrow(data), size = i, replace = FALSE), ], simplify = FALSE)
  data.step.sim.list[[i]] <- thisdat
}
The product is a list of length n (n being the number of rows in the dataset); each element is itself a list of 100 data frames (one per replication), where the i-th element's data frames have i rows each.
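For example, one replicate at sample size 5 is a 5-row data frame:
nrow(data.step.sim.list[[5]][[1]]) # 5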
I have x and y coordinates for each sample as well, and want to stipulate that each subsample collected must be at least m meters from the other subsamples.
I have created a function, shown below, that computes the distance between two points, but I cannot find a way to work this constraint into my current code. Would anyone know how to do this? (One possible approach is sketched after the function.)
# function to compute Euclidean distance between two points
calc.dist <- function(x1, y1, x2, y2) {
  d <- sqrt((x2 - x1)^2 + (y2 - y1)^2)
  return(d)
} # end function calc.dist
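One possible approach (a sketch, not from the original post; sample.min.dist is a hypothetical helper): draw candidate rows one at a time and keep a candidate only if it is at least m units from every row already selected, using calc.dist above. A greedy draw like this may return fewer than size rows when the constraint cannot be satisfied.
sample.min.dist <- function(data, size, m) {
  idx <- sample.int(nrow(data), 1)            # start from one random row
  pool <- setdiff(seq_len(nrow(data)), idx)
  while (length(idx) < size && length(pool) > 0) {
    cand <- pool[sample.int(length(pool), 1)] # random candidate row
    d <- calc.dist(data$x[idx], data$y[idx], data$x[cand], data$y[cand])
    if (all(d >= m)) idx <- c(idx, cand)      # keep only if far enough away
    pool <- setdiff(pool, cand)
  }
  data[idx, ]
}
# drop-in replacement for the replicate() call, with m = 5 as an example value:
# thisdat <- replicate(100, sample.min.dist(data, size = i, m = 5), simplify = FALSE)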

Looping through a large rasterbrick in raster R

How can I run a for-loop over a large daily rasterbrick to get annual stacks and calculate the maximum value (annual maxima) for each year (each stack of 365 layers)?
Basically, I have the same question as this one. So, taking that question as a sample: how can I write a for-loop that calculates the maximum value for each of the 46 stacks (each stack with 8 layers)?
I tried using only stackApply, but it gives all black/zero values when I run it for the whole period; however, it gives the max values if I run it for individual years (tested separately for 10 years; I have more than 100 years of data).
library(raster)
# example data
sca <- brick(nrow=108,ncol=132,nl=365)
values(sca) <- runif(ncell(sca)*nlayers(sca))
# indices grouping sets of 8
i <- rep(1:ceiling(365/8), each=8)
# the last period is not a complete set of 8 days
i <- i[1:nlayers(sca)]
# This does not work for me; the output is all zeros.
x <- stackApply(sca, i, max)
for (i in 1:nlayers(sca)) {
  x <- sca[[i]]
  xx <- stackApply(sca, i, max)
  plot(xx)
  # etc.
}
You could loop like this:
library(raster)
sca <- brick(nrow=108,ncol=132,nl=365)
values(sca) <- runif(ncell(sca)*nlayers(sca))
i <- rep(1:ceiling(365/8), each=8)
i <- i[1:nlayers(sca)]
for (j in unique(i)) {
  x <- sca[[which(j == i)]]
  xx <- max(x, na.rm=TRUE)
  # or, writing each period to disk:
  # xx <- calc(x, fun=max, na.rm=TRUE, filename = paste0(j, '.tif'))
}
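Note that xx is overwritten on each pass; if you want all the period maxima in a single object, you could collect them instead (a sketch along the same lines):
maxima <- stack(lapply(unique(i), function(j) max(sca[[which(i == j)]], na.rm = TRUE)))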

Partial Cross-correlation in R

I think the title is fairly self-explanatory. I want to compute the cross-correlation between two time series controlled for the values at other lags. I can't find any existing R code to do this, and I'm not at all confident enough in my knowledge of statistics (or R) to try to write something myself. It would be analogous to the partial autocorrelation function, just for the cross-correlation instead of the autocorrelation.
If it helps at all, my larger objective is to look for lagged correlations between different measurements of a physical system (to start with, flux and photon index from gamma ray measurements of blazars), with the goal of building a general linear model to try to predict flaring events.
Look at my answer to my own question (the same as the one you posted).
You can make use of the pacf function in R, extending it to a matrix with 2 or more time series. I have checked results between the multivariate acf and ccf functions and they yield the same results, so the same can be concluded about the multivariate pacf and the non-existing pccf.
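For example (a sketch following that suggestion, assuming x and y are two aligned numeric vectors such as the ones simulated in the test further below):
# partial auto- and cross-correlations of the bivariate series (x, y)
pacf(cbind(x = x, y = y))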
I believe this works:
# pccf needs tibble and dplyr (for tibble, arrange, and %>%)
library(dplyr)
library(tibble)
pccf <- function(x, y, nlags = 7, partial = TRUE){
  # x (numeric): variable that leads y
  # y (numeric): variable of interest
  # nlags (integer): number of lags (including zero)
  # partial (boolean): partial or absolute correlation
  # trim y
  y <- y[-(1:(nlags - 1))]
  # lagged matrix of x
  x_lagged <- embed(x, nlags)
  # process for each lag
  rho <- lag <- NULL
  for (i in 1:nlags) {
    if (partial) {
      # residuals of x at the lag of interest regressed on all other lags of x
      ex <- lm(x_lagged[, i] ~ x_lagged[, -i])$residuals
      # residuals of y regressed on all lags of x but the one of interest
      ey <- lm(y ~ x_lagged[, -i])$residuals
    } else {
      ex <- x_lagged[, i]
      ey <- y
    }
    # calculate correlation
    rho[i] <- cor(ex, ey, use = "pairwise.complete.obs")
    lag[i] <- i - 1
  }
  return(
    tibble(lag = lag, rho = rho) %>%
      arrange(lag)
  )
}
# test (requires ggplot2 for the plot)
library(ggplot2)
n <- 200   # count
nlag <- 6  # number of lags
x <- as.numeric(arima.sim(n = n, list(ar = c(phi = 0.9)), sd = 1)) # simulate time series x
y <- lag(x, nlag) + rnorm(n, 0, 0.5) # simulate y to lag x
y <- y[(nlag + 1):n] # remove NAs from lag
x <- x[(nlag + 1):n] # align with y
pccf(x, y, nlags = 10, partial = FALSE) %>%
  mutate(type = 'Cross correlation') %>%
  bind_rows(
    pccf(x, y, nlags = 10, partial = TRUE) %>%
      mutate(type = 'Partial cross correlation')
  ) %>%
  ggplot() +
  geom_col(aes(-lag, rho), width = 0.1) +
  facet_wrap(~type, scales = 'free_y', ncol = 1) +
  scale_x_continuous(breaks = -10:0) +
  theme_bw(base_size = 20)

Computing Euclidean Distance whilst holding point A constant and changing point B in R

I am currently working on a project for which I am interested in calculating the distance between the location of a basketball player and the ball during an event.
To do this I created the following function:
## Euclidean distance
distance <- function(x, y){
  x2 <- (x[i] - x[j])^2
  y2 <- (y[i] - y[j])^2
  dis <- sqrt(x2 + y2)
}
What I want to achieve is to calculate the distance between the basketball and the players, and then repeat this process for each time frame of data I have. So for each time frame, x[i] and y[i] (the ball's location) would have to stay constant whilst x[j] and y[j] keep going from 2 to 11 (the players). I thought of the nested for loop below, but it is giving me a constant result of 28.34639. I added a link to an image of a sample of my data frame. Data Frame Sample
for (i in i:length(all.movement$x_loc)) {
  for (j in j:11) {
    all.movement$distance[j] <- distance(all.movement$x_loc, all.movement$y_loc)
  }
  i <- i + 11
}
I would really appreciate some help with this problem.
I'd go about it like this:
set.seed(101)
x <- rnorm(30, 10, 5) # x coordinate
y <- rnorm(30, 15, 7) # y coordinate
df <- data.frame(x, y) # sample data.frame
# the arithmetic is vectorized, so no loop is needed
df$distance <- sqrt((x - 5)^2 + (y + 4)^2) # assume ball coordinates (5, -4)
df # output
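To apply this to the data frame described in the question (a sketch, under the assumption that all.movement holds 11 consecutive rows per time frame, with the ball in the first row of each frame):
n.frames <- nrow(all.movement) / 11
all.movement$distance <- unlist(lapply(seq_len(n.frames), function(f) {
  rows <- ((f - 1) * 11 + 1):(f * 11) # the 11 rows of frame f
  # distance from the ball (first row) to every entity in the frame
  sqrt((all.movement$x_loc[rows] - all.movement$x_loc[rows[1]])^2 +
       (all.movement$y_loc[rows] - all.movement$y_loc[rows[1]])^2)
}))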

Polynomial fitting on spectral data

I want to fit a polynomial function (max. 3rd order) to each raster cell over all my spectral bands (Landsat bands 1-7), creating a new raster (stack) representing the coefficients.
I have my data (including NA values) in a stack with 6 layers (Landsat bands 1-7, excluding band 6).
I guess I should somehow tell the polynomial function at which spectral wavelengths the bands are located, so that it can fit them properly:
Landsat 7 wavelengths (micrometers)
Band 1: 0.45-0.52
Band 2: 0.52-0.60
Band 3: 0.63-0.69
Band 4: 0.77-0.90
Band 5: 1.55-1.75
Band 7: 2.09-2.35
Has anyone an idea how to do that polynomial fitting of each cell and extracting the coefficients in R? Thanks for any help!
Your question is not very clear, as you do not specify what you are fitting. I am guessing it is band number. You can do something like this.
library(raster)
b <- brick(system.file("external/rlogo.grd", package="raster"))
b[[2]][125:225] <- NA
s <- stack(b, flip(b, 'y'))
names(s) <- paste0('b', 1:6)
bands <- 1:6
f <- function(x) {
  # in case of NAs; match the number of coefficients returned
  if (any(is.na(x))) return(c(NA, NA, NA))
  m <- lm(x ~ bands + I(bands^2))
  coefficients(m)
}
z <- calc(s, f)
z
plot(z)
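If the predictor should be wavelength rather than band number, the same pattern applies (a sketch, using the band-center wavelengths from the table in the question):
wl <- c(0.485, 0.56, 0.66, 0.835, 1.65, 2.22) # band centers in micrometers
f2 <- function(x) {
  if (any(is.na(x))) return(c(NA, NA, NA))
  coefficients(lm(x ~ wl + I(wl^2)))
}
z2 <- calc(s, f2)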
If you need to speed this up you can follow the example here:
https://gis.stackexchange.com/questions/144211/want-cell-linear-regression-values-for-a-netcdf-or-multi-band-raster/144408#144408
