I am trying to fit a spatial panel model in R using the splm package. I first define the NxN weighting matrix as follows:
neib <- dnearneigh(coordinates(coord), 0, 50, longlat = TRUE)
dlist <- nbdists(neib, coordinates(coord))
idlist <- lapply(dlist, function(x) 1 / x)
w50 <- nb2listw(neib, zero.policy = TRUE, glist = idlist, style = "W")
Thus I define two observations to be neighbours if they are within 50 km of each other. The weight attached to each pair of neighbouring observations is the inverse of their distance, so that closer neighbours receive higher weights. I also use the option zero.policy=TRUE so that observations which do not have neighbours are assigned a vector of zero weights.
Once I have done this, I try to fit the spatial panel model in the following way:
mod <- spml(y ~ x, data = data_p, listw = w50, na.action = na.fail,
            lag = FALSE, spatial.error = "b", model = "within",
            effect = "twoways", zero.policy = TRUE)
but I get the following error and warning messages:
Error in lag.listw(listw, u) : Variable contains non-finite values
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
...
50: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
I believe this is related to the non-neighbour observations. Can anyone please help me with this? Is there any way to deal with non-neighbour observations besides the zero.policy option?
Many thanks for helping me.
You should check two things:
1) Make sure that the weight matrix is row-normalized.
2) Handle any NA values properly, both in the dataset and in the W matrix.
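A few quick checks along those lines (a sketch using the objects from the question; card() is from spdep, and y and x are assumed to be the model variables):
library(spdep)
# Observations with no neighbours within 50 km (zero weights under zero.policy)
which(card(neib) == 0)
# style = "W" row-standardises, but a weight can still be non-finite
# if any pairwise distance is zero (1/0 = Inf)
any(sapply(w50$weights, function(w) any(!is.finite(w))))
# NA values in the model variables will also end up in lag.listw()
sum(!complete.cases(data_p[, c("y", "x")]))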
I have a large dataset (~100k observations) of presence/absence data to which I am trying to fit a hierarchical GAM with individual effects that have a shared penalty (model 'S' in Pedersen et al. 2019). The data consist of temp as numeric and region (5 groups) as a factor.
Here is a simple version of the model that I am trying to fit.
modS1 <- gam(occurrence ~ s(temp, region), family = binomial,
             data = df, method = "REML")
modS2 <- gam(occurrence ~ s(temp, region, k = c(10, 4)), family = binomial,
             data = df, method = "REML")
In the first case I received the following error:
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In mean.default(xx) : argument is not numeric or logical: returning NA
2: In Ops.factor(xx, shift[i]) : ‘-’ not meaningful for factors
I assumed this was because k was set too high for region, given there are only 5 different regions in the data set.
In the second case I attempted to lower k for region and received this error:
Error in if (k < M + 1) { : the condition has length > 1
In addition: Warning messages:
1: In mean.default(xx) : argument is not numeric or logical: returning NA
2: In Ops.factor(xx, shift[i]) : ‘-’ not meaningful for factors
I can fit models G, GI, and I from Pedersen et al. 2019 with no issues. It is models GS and S where I run into issues.
If anyone has any insights I would really appreciate it!
The bs = "fs" argument in the code you're using as a guide is important. If we start at the ?s help page and click on the link to the ?smooth.terms help page, we see:
Factor smooth interactions
bs="fs" Smooth factor interactions are often produced using by variables (see gam.models), but a special smoother class (see factor.smooth.interaction) is available for the case in which a smooth is required at each of a large number of factor levels (for example a smooth for each patient in a study), and each smooth should have the same smoothing parameter. The "fs" smoothers are set up to be efficient when used with gamm, and have penalties on each null space component (i.e. they are fully ‘random effects’).
You need to use a smoothing basis appropriate for factors.
Notably, if you take your source code, remove the bs = "fs" argument, and attempt to run gam(log(uptake) ~ s(log(conc), Plant_uo, k=5, m=2), data=CO2, method="REML"), it will produce the same error that you got.
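For reference, a version of the first model from the question using the factor-smooth basis might look like this (a sketch; the k value for the continuous margin is only illustrative):
library(mgcv)
# One smooth of temp per region level, all sharing a single smoothing parameter
modS <- gam(occurrence ~ s(temp, region, bs = "fs", k = 10),
            family = binomial, data = df, method = "REML")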
I am trying to build a loop where I do non-linear regression on several variables, then create and save graphs as a PDF. Here is a fraction of the code:
library(propagate)
library(nlstools)
nls_fit_best<-nls(reformulate("a*IDV^b", i),
start = list(a = 1, b = 1),
control = list(minFactor=0, maxiter=nls_iterations),
data=df)
#calculates the values for the confidence intervals
preds <- data.frame(IDV = seq(min(IDV), max(IDV), length=30))
y.conf <- predictNLS(nls_fit_best, newdata=preds, interval="confidence", alpha=0.05, nsim=10000)$summary
best_fit_coeffs <- as.data.frame(round(coef(nls_fit_best), digits = 3))
residual_plots<-nlsResiduals(nls_fit_best)
par(mfrow = c(3,3), mar=c(5.1,4.1,2,1.1), oma=c(0,0,0,0))
layout(matrix(c(1,1,1,1,1,1,2,3,4),nrow=3, byrow = TRUE))
#plots the values into a graph with a bit of wiggle room
plot(i ~ IDV,
     data = cor_data_centered,
     ylim = c(0, max(i) + 0.2), xlim = c(0, max(IDV) + 0.2))
#plots best fit line
lines(IDV,predict(nls_fit_best),lty=2,col="black",lwd=3)
#plots 95% confidence interval and info
matlines(preds, y.conf[,c("Sim.2.5%", "Sim.97.5%")], col="black", lty="dashed")
mtext(paste("power function coeffs",best_fit_coeffs,sep=" "), side=3)
plot(residual_plots, which=2)
plot(residual_plots, which=4)
plot(residual_plots, which=6)
}
where IDV is my independent variable (i.e., X), which has 13 measurements. There are 14 variables in the df, each with 13 measurements. Let's say we have this, to make it easy:
IDV=1:13
df <- as.data.frame(matrix(1, ncol = 14, nrow = 13))
When I run the code I get the following error message:
Error in model.frame.default(formula = ~i + IDV, data = df) :
variable lengths differ (found for 'IDV')
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
It seems to me that the code is trying to correlate the actual column names of the data frame with IDV instead of correlating each of the variables of the df with IDV. I suppose the error lies in the initial loop, but I do not know how to fix it.
In case anyone has a similar problem to mine: I managed to solve the issue by using the minimum and maximum of each variable to set the plot boundaries, with the following code:
max_loop <- c()
max_loop[i] <- max(linear_plots[, i])  # maximum of the i-th variable
min_loop <- c()
min_loop[i] <- min(linear_plots[, i])  # minimum of the i-th variable
#plots the values into a graph with a bit of wiggle room
plot(reformulate("IDV", i),
     data = cor_data_centered,
     ylim = c(min_loop[i] - wiggle_room_y, max_loop[i] + wiggle_room_y),
     xlim = c(min(IDV) - wiggle_room_x, max(IDV) + wiggle_room_x))
With "wiggle room" being whatever value you want to use.
I have a dataframe where a column is a mix of positive and negative numbers, and the first entry is NA. I'm trying to run the shape function as
shape(data$col, models = 30, start = 30, end = 400, ci = 0.90, reverse = TRUE, auto.scale = TRUE)
where the data in 'col' is [NA, -0.2663194135, -3.7665034719, -0.2072122334, 1.5721742718, -9.142419, -8.954330, -5.167314, 11.805930, 9.533830, 7.065835]
but I get an error that says
Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
non-finite value supplied by optim
Can someone help me figure out what it means? I've googled it but haven't found anything concrete.
It's not clear what you are trying to do here. Calling shape allows you to see how altering the threshold or nextremes parameters in the gpd function will alter the xi parameter of the resulting generalised Pareto distribution model.
There are a few reasons why the example you supplied doesn't work. Let's first of all show an example of what does work. The exponential distribution is a special case of a GPD with mu = 0 and xi = 0, so a sample drawn from the exponential distribution should do the trick:
library(evir) # For the shape() function
set.seed(69) # Makes this example reproducible
x <- rexp(300) # Random sample of 300 elements drawn from exponential distribution
shape(x)
Fine.
However, your sample contains an NA. What happens if we make a single value NA in our sample?
x[1] <- NA
shape(x)
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite value supplied by optim
So, no NAs allowed.
Unfortunately, you will find that you still get the same error if you remove your NA value. There are two reasons for this. Firstly, you have 9 non-NA samples. What happens if we try a length-9 exponential sample?
shape(rexp(9))
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite finite-difference value [1]
We will find that the model will fail to fit with fewer than about 16 data points.
But that's not the only problem. What if we try to get a plot for data that can't be drawn from a generalized Pareto distribution?
# Maybe a uniform distribution?
shape(runif(300, 1, 10))
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite finite-difference value [1]
#> In addition: Warning message:
#> In sqrt(diag(varcov)) : NaNs produced
#>
So in effect, you need a bigger sample with no NAs, and it needs to conform approximately to a GPD, otherwise the gpd function will throw an error.
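For instance, a minimal pre-check along those lines (assuming the data frame and column from the question):
col_clean <- data$col[!is.na(data$col)]  # drop the NA
length(col_clean)  # still well short of the ~16 points the fit needs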
I might be able to help if you let us know the bigger picture of what you are trying to do.
I want to estimate the coefficients for an AR process based on weekly data where the lags occur at t-1, t-52, and t-53. I will naturally lose a year of data to do this.
I currently tried:
lags <- rep(0, 54)  # 53 AR coefficients + the intercept
lags[1] <- NA       # estimate lag 1
lags[52] <- NA      # estimate lag 52
lags[53] <- NA      # estimate lag 53
testResults <- arima(data, order = c(53, 0, 0), fixed = lags)
Basically I tried using an ARIMA and shutting off the MA/differencing parts. I used 0's for the terms I wanted to exclude (plus the intercept) and NAs for the terms I wanted to estimate.
I get the following error:
Error in optim(init[mask], armafn, method = optim.method, hessian =TRUE, :
non-finite finite-difference value [1]
In addition: Warning message:
In arima(data, order = c(53, 0, 0), fixed = lags) :
some AR parameters were fixed: setting transform.pars = FALSE
I'm hoping there is an easier method or potential solution to this error. I want to avoid creating columns with the lagged variables and simply running a regression. Thanks!
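For what it's worth, the same fixed-coefficient pattern can be illustrated on a small, feasible AR(3) with simulated data (a sketch; transform.pars = FALSE avoids the warning about fixed AR parameters):
set.seed(1)
y <- arima.sim(list(ar = c(0.5, 0, 0.3)), n = 200)
# keep lags 1 and 3 free (NA); fix lag 2 and the intercept at zero
arima(y, order = c(3, 0, 0), fixed = c(NA, 0, NA, 0), transform.pars = FALSE)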
I have a problem with spdep. Starting with a matrix of non-missing distances produced by a function
dist_m <- geoDistMatrix(data1, group = 'fips_dist')
dist_m[upper.tri(dist_m)] <- t(dist_m)[upper.tri(dist_m)]
we then turn this into weights with a linear inverse:
max_dist <- max(dist_m)
w1 <- (max_dist + 1 - dist_m)/(max_dist + 1)
and now
lw <- mat2listw(w1, row.names = rownames(w1), style = 'M')
I check to make sure there are no missing weights:
any(is.na(lw$weights))
and since there aren't, go ahead with:
errorsarlm(cvote ~ inc, data = data1, lw, method = "eigen", quiet = FALSE, zero.policy = TRUE)
This leads to the following error:
Error in subset.listw(listw, subset, zero.policy = zero.policy) :
Not yet able to subset general weights lists
This is because at least one observation in data1 is not complete, i.e. has missing values. Hence errorsarlm wants to subset the data, i.e. restrict it to complete cases, but it cannot do that for this kind of weights list - that is what the error message says.
Best is to subset the data manually or fix the incomplete cases.
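A minimal sketch of the manual subsetting (assuming cvote and inc are the only model variables; the weights then have to be rebuilt for the reduced set of observations):
ok <- complete.cases(data1[, c("cvote", "inc")])
data1_cc <- data1[ok, ]
# subset the weight matrix to match and rebuild the listw object
lw_cc <- mat2listw(w1[ok, ok], row.names = rownames(w1)[ok], style = "M")
errorsarlm(cvote ~ inc, data = data1_cc, lw_cc, method = "eigen",
           quiet = FALSE, zero.policy = TRUE)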
This is because the spdep function creates a listw object only for non-general weights by default. Set zero.policy=TRUE before you run mat2listw or nb2listw so that non-neighbour observations, which have zero weight, are handled.
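One way to set this globally (a sketch; set.ZeroPolicyOption() is available in recent spdep versions):
library(spdep)
set.ZeroPolicyOption(TRUE)  # treat zero-neighbour observations as valid
lw <- mat2listw(w1, row.names = rownames(w1), style = "M")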