Error in sample.int(length(x), size, replace, prob) : NA in probability vector - raster

I want to build a species distribution model (predicting habitat suitability for a species in Chile from occurrence records and environmental layers, the predictors). For that I need background points sampled within a kernel-density raster built from my occurrence points. When I run the last line of the code I get the error: Error in sample.int(length(x), size, replace, prob) : NA in probability vector.
presences <- which(values(occur.chile) == 1)
pres.locs <- coordinates(occur.chile)[presences, ]
dens <- kde2d(pres.locs[,1], pres.locs[,2], n = c(nrow(occur.chile), ncol(occur.chile)))
dens.ras <- raster(dens)
dens.ras2 <- resample(dens.ras, predictors)
bg <- xyFromCell(dens.ras2, sample(which(!is.na(values(subset(predictors, 1)))), 10000, prob=values(dens.ras2)[!is.na(values(subset(predictors, 1)))]))
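One possible cause, assuming resample() leaves NA in dens.ras2 for cells outside the area covered by the kernel-density grid: the prob vector is filtered only by the NA mask of predictors, so any cell that has predictor data but no density value passes an NA to sample(). Below is a minimal sketch of filtering on both layers. It uses plain vectors; pred_vals and dens_vals are made-up stand-ins for values(subset(predictors, 1)) and values(dens.ras2).

```r
# Hypothetical cell values standing in for the two rasters
pred_vals <- c(1.2, NA, 0.7, 3.1, 2.2, NA)        # predictor layer
dens_vals <- c(0.10, 0.20, NA, 0.30, 0.40, 0.05)  # resampled density

# Keep only cells where BOTH layers have a value
ok <- which(!is.na(pred_vals) & !is.na(dens_vals))

# Sample cell numbers, weighted by density; no NA reaches prob
set.seed(1)
cells <- sample(ok, size = 3, prob = dens_vals[ok])
```

With the real rasters, the sampled cell numbers would then go to xyFromCell(dens.ras2, cells) as in the original line.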

Interpolating values per group in a dataframe when predicted vector is shorter than measured

I have a data set with absorbance measured at non-integer wavelengths and want to run a linear interpolation to obtain absorbance values at the desired wavelengths. An example dataset:
#Measured wavelengths
w_measured <- c(179.856, 180.241, 180.626, 181.011, 181.396, 181.781, 182.166, 182.551,
182.936, 183.321, ...)
#Measured absorbance
a_measured <- c(0.2737, 0.6085, 0.0000, 0.0000, 0.1684, 0.3481, 0.0000, -0.0915,
-0.1206,-0.3094, ...)
samples <- the grouping variable
#dataframe with absorbance measurements per wavelength interval for a X number of samples
a <- as.data.frame(cbind(w_measured, a_measured, samples))
# the wavelengths I want absorbance measurements at
w_new <- c(180, 181, 182, 183, ... etc)
I tried using approxfun per sample ID with dplyr:
apply_fun <- function(w_new, w_measured, a_measured) {
predfunc <- approxfun(w_measured,a_measured)
predfunc(w_new)
}
test <- a %>%
group_by(samples) %>%
arrange(samples, w_measured) %>%
mutate(predint = apply_fun(w_new, a$w_measured, a$a_measured))
But I get the error
Error: Problem with mutate() column predint.
i predint = apply_fun(w_new, a$w_measured, a$a_measured).
i predint must be size 2047 or 1, not 561.
i The error occurred in group 1: SampleIDs = 02SS02BI_071717_1a_Absorbance_20-09-29-293.
In addition: Warning message:
Problem with mutate() column predint.
i predint = apply_fun(w_new, a$w_measured, a$a_measured).
i collapsing to unique 'x' values
i The warning occurred in group 1: SampleIDs = 02SS02BI_071717_1a_Absorbance_20-09-29-293.
I think this is because the vector of wavelengths at which I want to predict absorbance is shorter than the vector of measured wavelengths for each sample. Is there a better way to code this so that the length of the predicted values doesn't have to match the length of the measured values?
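Two things seem to go wrong here: mutate() has to return one value per existing row (or a single value), and a$w_measured refers to the whole data frame rather than to the current group, which would also explain the "collapsing to unique 'x' values" warning. Below is a minimal base-R sketch that interpolates each sample separately and returns one row per requested wavelength. The small data frame is made up for illustration and stands in for the real measurements.

```r
# Made-up measurements for two samples (stand-in for the real data)
a <- data.frame(
  samples    = rep(c("s1", "s2"), each = 4),
  w_measured = rep(c(179.5, 180.5, 181.5, 182.5), times = 2),
  a_measured = c(0.1, 0.3, 0.5, 0.7, 1.0, 0.8, 0.6, 0.4)
)
w_new <- c(180, 181, 182)  # wavelengths to predict at

# Split by sample, interpolate within each piece, stack the results
interp <- do.call(rbind, lapply(split(a, a$samples), function(g) {
  data.frame(
    samples = g$samples[1],
    w_new   = w_new,
    a_pred  = approx(g$w_measured, g$a_measured, xout = w_new)$y
  )
}))
```

The result has length(w_new) rows per sample, independent of how many wavelengths were measured.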

R code Gaussian mixture -- numerical expression has 2 elements: only the first used

I'm trying to create a Gaussian Mix function according to these parameters:
For each sample, roll a die with k sides
If the j-th side appears from the roll, draw a sample from Normal(muj, sdj) where muj and sdj are the mean and standard deviation for the j-th Normal distribution respectively. This means you should have k different Normal distributions to choose from. Note that muj is the mathematical form of referring to the j-th element in a vector called mus.
The resulting sample from this Normal is then from a Gaussian Mixture.
Where:
n, an integer that represents the number of independent samples you want from this random variable
mus, a numeric vector with length k
sds, a numeric vector with length k
prob, a numeric vector with length k that indicates the probability of choosing the different Gaussians. This should have a default to NULL.
This is what I came up with so far:
n <- c(1)
mus <- c()
sds <- c()
prob <- c()
rgaussmix <- function(n, mus, sds, prob = NULL){
if(length(mus) != length(sds)){
stop("mus and sds have different lengths")
}
for(i in 1:seq_len(n)){
if(is.null(prob)){
rolls <- c(NA, n)
rolls <- sample(c(1:length(mus)), n, replace=TRUE)
avg <- rnorm(length(rolls), mean=mus[rolls], sd=sds[rolls])
}else{
rolls <- c(NA, n)
rolls <- sample(c(1:length(mus), n, replace=TRUE, p=prob))
avg <- rnorm(length(rolls), mean=mus[rolls], sd=sds[rolls])
}
}
return(avg)
}
rgaussmix(2, 1:3, 1:3)
It seems to match most of the requirements, but it keeps giving me the following error:
numerical expression has 2 elements: only the first used
number of items to replace is not a multiple of replacement length
I've tried looking at the lengths of multiple variables, but I can't seem to figure out where the error is coming from!
Could someone please help me?
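seq_len(2) already returns the full index vector:
[1] 1 2
So 1:seq_len(n) becomes 1:(1:2), which doesn't make sense: the colon operator only uses the first element of each operand, which is what the first warning reports. Use seq_len(n) on its own.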
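A small sketch of the difference, assuming R's default behaviour where a length-2 operand to the colon operator produces a warning rather than an error (the variable names are only for illustration):

```r
n <- 2

# 1:seq_len(n) is 1:c(1, 2); only the first element is used, so this is 1:1
idx_bad <- suppressWarnings(1:seq_len(n))

# seq_len(n) alone gives the intended indices
idx_ok <- seq_len(n)
```

So a loop written with 1:seq_len(n) would run only once here, while seq_len(n) runs n times.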
Also you can avoid the loops in your code, by sampling the number of tries you need, for example if you do:
rnorm(3,c(0,10,20),1)
[1] -0.507961 8.568335 20.279245
It gives you 1st sample from the 1st mean, 2nd sample from 2nd mean and so on. So you can simplify your function to:
rgaussmix <- function(n, mus, sds, prob = NULL){
  if(length(mus) != length(sds)){
    stop("mus and sds have different lengths")
  }
  # Default to equal probability for every component
  if(is.null(prob)){
    prob <- rep(1/length(mus), length(mus))
  }
  # Roll the k-sided die n times, then draw one value per roll
  rolls <- sample(length(mus), n, replace=TRUE, prob=prob)
  avg <- rnorm(n, mean=mus[rolls], sd=sds[rolls])
  avg
}
You can plot the results:
plot(density(rgaussmix(10000,c(0,5,10),c(1,1,1))),main="mixture of 0,5,10")
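As a quick sanity check for non-uniform weights, the sample mean of the draws should be close to the weighted mean of the component means. The sketch below repeats the simplified function so that it runs on its own; the weights and the seed are arbitrary choices for illustration.

```r
rgaussmix <- function(n, mus, sds, prob = NULL) {
  stopifnot(length(mus) == length(sds))
  if (is.null(prob)) prob <- rep(1 / length(mus), length(mus))
  rolls <- sample(length(mus), n, replace = TRUE, prob = prob)
  rnorm(n, mean = mus[rolls], sd = sds[rolls])
}

set.seed(42)
mus  <- c(0, 5, 10)
prob <- c(0.2, 0.3, 0.5)
draws <- rgaussmix(1e5, mus = mus, sds = c(1, 1, 1), prob = prob)

expected_mean <- sum(prob * mus)  # 0.2*0 + 0.3*5 + 0.5*10 = 6.5
observed_mean <- mean(draws)      # should land close to expected_mean
```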

Random sampling with sample() gives unexpected results

Consider the following when performing random sampling in R:
n <- 10
k <- 10
p <- 0.10 # proportion of the k objects to subsample
probs <- c(0.30, 0.30, 0.30, rep(0.10/7, 7)) # probabilities for each of the k objects
Here, the roles of n and k are irrelevant; however, there is the condition that n >= k.
x <- sort(sample(k, size = ceiling(p * k), replace = FALSE)) # works
y <- sample(x, size = n, replace = TRUE, prob = probs[x]) # throws error
I am wondering why the function call assigned to y above throws an error.
The error I receive is:
Error in sample.int(x, size, replace, prob) :
incorrect number of probabilities
My thinking is that the 'size' argument to sample() (i.e., n*p) cannot evaluate to 1 in the second function call (y variable), but I haven't been able to find anything documenting this error in the help files to sample().
I know that ceiling() can act strangely in some instances, but I'm not convinced that this could be the issue.
When the above code is run, x is set to the integer data type, e.g., 1L, 2L, etc., which leads to the error in evaluating y.
Does someone have an idea on how to fix this issue?
If x is a single number, sample(x) samples from 1 through x (see the Details section of the help page), or from 1 through floor(x) if x isn't an integer. The prob argument then has to be a vector of length x. In your code ceiling(p * k) is 1, so x is a single value and probs[x] is always a vector of length 1, which causes the error whenever x is greater than 1.
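The examples on the help page for sample() include a small wrapper for this situation: it samples positions with sample.int() and indexes into x, so a length-one x is treated as a one-element set. A sketch using the probabilities from the question, with x fixed to 7L as one hypothetical outcome of the first sample() call:

```r
# Wrapper from the examples in ?sample: always treats x as a set of values
resample <- function(x, ...) x[sample.int(length(x), ...)]

probs <- c(0.30, 0.30, 0.30, rep(0.10 / 7, 7))
x <- 7L  # hypothetical single value drawn in the first step

# Only one candidate, so every draw is x itself
y <- resample(x, size = 10, replace = TRUE, prob = probs[x])
```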

Large dataset and autocorrelation computation

I have geographical data at the town level for 35 000 towns.
I want to estimate the impact of my covariates X on a dependent variable Y, taking into account autocorrelation.
I first computed a weights matrix and then used the command spautolm from the package spam, but it returned an error message because my dataset is too large.
Do you have any idea how I can fix this? Are there any equivalent commands that would work?
library(haven)
library(tibble)
library(sp)
library(data.table)
myvars <- c("longitude","latitude","Y","X")
newdata2 <- na.omit(X2000[myvars]) #drop rows with a missing value in any of the variables
df <- data.frame(newdata2)
newdata3<- unique(df) #drop duplicates in terms of longitude and latitude
coordinates(newdata3) <- c("longitude2","latitude2") #set the coordinates
coords<-coordinates(newdata3)
Sy4_nb <- knn2nb(knearneigh(coords, k = 4)) # neighbour list from the 4 nearest neighbours
Sy4_lw_idwB <- nb2listw(Sy4_nb, glist = idw, style = "B") #weights list, weighted by distance
When I try to fit the model:
spautolm(formula = Y~X, data = newdata3, listw = Sy4_lw_idwB)
it returns: Error: cannot allocate vector of size 8.3 Gb
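The size of the failed allocation is consistent with a dense n-by-n matrix of doubles, which is what one would expect if the fitting step builds the full weights matrix. This is an assumption; the question does not say which method was used, and it also assumes that R reports "Gb" in units of 1024^3 bytes. A rough back-of-the-envelope check:

```r
bytes_per_double <- 8

# Dense matrix for all 35,000 towns, in units of 1024^3 bytes
n_towns   <- 35000
gib_dense <- n_towns^2 * bytes_per_double / 2^30   # roughly 9.1

# Number of rows implied by the 8.3 Gb in the error message
n_implied <- sqrt(8.3 * 2^30 / bytes_per_double)   # roughly 33,400

# For comparison: non-zero weights with 4 neighbours per town
nonzero_weights <- 4 * n_towns
```

Under these assumptions the error corresponds to about 33,400 rows, which would fit a data set of 35,000 towns after na.omit() and unique(), while the neighbour structure itself only has 4 non-zero weights per town.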

extracting residuals from pixel by pixel regression

I am trying to extract the residuals from a regression run pixel by pixel on a raster stack of NDVI/precipitation. My script works when I run it on a small part of my data, but when I try to run it on my whole study area I get: "Error in setValues(out, x) : values must be numeric, integer, logical or factor"
The lm works, since I can extract both slope and intercept. I just can't extract the residuals.
Any idea how this could be fixed?
Here is my script:
setwd("F:/working folder/test")
gimms <- list.files(pattern="*ndvi.tif")
ndvi <- stack(gimms)
precip <- list.files(pattern="*pre.tif")
pre <- stack(precip)
s <- stack(ndvi,pre)
residualfun = function(x) {
  if (is.na(x[1])){
    NA
  } else {
    m <- lm(x[1:6] ~ x[7:12], na.action=na.exclude)
    r <- residuals.lm(m)
    return (r)
  }
}
res <- calc(s,residualfun)
And here is my data: https://1drv.ms/u/s!AhwCgWqhyyDclJRjhh6GtentxFOKwQ
Your function only tests whether the first layer is NA to decide whether to fit the model. But there may be NA values in other layers. You know that, because you added na.action = na.exclude to your lm fit.
The problem is that if the model removes some values because of NA, the residuals will only have the length of the non-NA values. This means that your resulting r vector will have different lengths depending on the number of NA values in the layers. calc is then not able to combine results of different lengths into a stack with a defined number of layers.
To avoid that, you need to fix the length of r in your function and assign residuals only to the non-NA positions.
I propose the following function, which works on the dataset you provided. I added (1) the possibility to compare more layers of each kind if you want to extend your exploration (with nlayers), (2) a check that skips the model if there are only two values to compare (perfect fit), and (3) a try in case the model cannot be fitted for any reason; this outputs values of -1e32, which are easy to find for further testing.
library(raster)
setwd("/mnt/Data/Stackoverflow/test")
gimms <- list.files(pattern="*ndvi.tif")
ndvi <- stack(gimms)
precip <- list.files(pattern="*pre.tif")
pre <- stack(precip)
s <- stack(ndvi,pre)
# Number of layers of each
nlayers <- 6
residualfun <- function(x) {
  r <- rep(NA, nlayers)
  obs <- x[1:nlayers]
  cov <- x[nlayers + 1:nlayers]
  # Remove NA values before model
  x.nona <- which(!is.na(obs) & !is.na(cov))
  # If more than 2 points proceed to lm
  if (length(x.nona) > 2) {
    m <- NA
    try(m <- lm(obs[x.nona] ~ cov[x.nona]), silent = TRUE)
    # If model worked, calculate residuals
    if (inherits(m, "lm")) {
      r[x.nona] <- residuals.lm(m)
    } else {
      # alternate value to find where model did not work
      r[x.nona] <- -1e32
    }
  }
  return(r)
}
res <- calc(s, residualfun)
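The fixed-length logic can be checked without any raster files by calling a function of the same shape on plain vectors. A minimal sketch follows, where each vector is a made-up pixel with 6 NDVI values followed by 6 precipitation values; the names n_layers, fixed_length_residuals and pixel_* are only for illustration.

```r
n_layers <- 6

fixed_length_residuals <- function(x) {
  r     <- rep(NA_real_, n_layers)     # output always has n_layers entries
  obs   <- x[1:n_layers]
  covar <- x[n_layers + 1:n_layers]
  ok    <- which(!is.na(obs) & !is.na(covar))
  if (length(ok) > 2) {
    m <- try(lm(obs[ok] ~ covar[ok]), silent = TRUE)
    if (inherits(m, "lm")) r[ok] <- residuals(m)
  }
  r
}

pixel_full  <- c(0.2, 0.3, 0.5, 0.4, 0.6, 0.7, 10, 12, 20, 15, 22, 30)
pixel_gaps  <- c(0.2, NA,  0.5, 0.4, 0.6, 0.7, 10, 12, NA, 15, 22, 30)
pixel_empty <- rep(NA_real_, 12)

res_full  <- fixed_length_residuals(pixel_full)   # 6 residuals
res_gaps  <- fixed_length_residuals(pixel_gaps)   # NA at positions 2 and 3
res_empty <- fixed_length_residuals(pixel_empty)  # 6 NA values
```

All three results have the same length, which is what calc needs in order to build an output stack with a fixed number of layers.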
