I am using spatstat v.1.59-0 to build some point process models, but I am having problems with some of the validation tools, specifically effectfun and the partial residual plot parres.
The components of the model are rings_pp, which consists of point locations for 52 archaeological sites, and elev, which is a DEM converted to a spatstat pixel image. I am trying to evaluate the fit of a Gibbs point process model with an area-interaction term (AreaInter) and elev.
I fit the model to the data using the following code:
# candidate interaction radii for the area-interaction term
rr1 <- data.frame(r = seq(100, 2000, by = 50))
# profile over r, selecting the radius by AIC
p1 <- profilepl(rr1, AreaInter, rings_pp ~ elev, aic = TRUE)
# extract the fitted model at the optimal radius
ppm5 <- as.ppm(p1)
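(A quick way to sanity-check the profile fit, a sketch not from the original post, is to inspect the profilepl object:)
# plot the profile of the fitting criterion over the candidate radii
plot(p1)
# printing the object reports the optimal radius selected
p1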
Everything seems to be working fine (e.g., other diagnostics show the model is a reasonable fit to the data), except that effectfun and parres won't work when I try to use them to evaluate the effect of the elevation covariate.
parres gives the error Error in m[, d$relevant, drop = FALSE] : (subscript) logical subscript too long. When I traceback the error I get the following:
> trace(parres(ppm5, "elev"))
Error in m[, d$relevant, drop = FALSE] :
(subscript) logical subscript too long
> traceback()
4: effectFun.can(x)
3: effectFun(xxx)
2: parres(ppm5, "elev")
1: trace(parres(ppm5, "elev"))
effectfun works when se.fit=FALSE but not when se.fit=TRUE, giving the error Error in quadform(mm, vc) : ncol(x) == nrow(v) is not TRUE. When I traceback the error I get the following:
> trace(plot(effectfun(ppm5, "elev", se.fit=T)))
Error in quadform(mm, vc) : ncol(x) == nrow(v) is not TRUE
> traceback()
8: stop(simpleError(msg, call = sys.call(-1)))
7: stopifnot(ncol(x) == nrow(v))
6: quadform(mm, vc)
5: predict.ppm(orig.model, locations = fakeloc, covariates = fakecov,
se = se.fit)
4: predict(orig.model, locations = fakeloc, covariates = fakecov,
se = se.fit)
3: effectfun(ppm5, "elev", se.fit = T)
2: plot(effectfun(ppm5, "elev", se.fit = T))
1: trace(plot(effectfun(ppm5, "elev", se.fit = T)))
When I fit an inhomogeneous Poisson model, ppm1 <- ppm(rings_pp ~ elev), both effectfun and parres work fine and show a good fit (though the residual K function Kres suggests the model isn't accounting for clustering). So it seems to be something about how I've fit the Gibbs AreaInter model.
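For reference, the working comparison described above looks like this (a minimal sketch reusing rings_pp and elev from the question):
# fit the inhomogeneous Poisson model with elevation as a covariate
ppm1 <- ppm(rings_pp ~ elev)
# fitted effect of elevation, with pointwise standard errors
plot(effectfun(ppm1, "elev", se.fit = TRUE))
# partial residual plot for the elevation covariate
plot(parres(ppm1, "elev"))
# residual K function to check for unexplained clustering
plot(Kres(ppm1))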
Any advice would be much appreciated.
Thanks again for a clear question.
This is a bug, which can be demonstrated by the simple example:
fit <- ppm(cells ~ x, AreaInter(0.07))
plot(effectfun(fit, se.fit=TRUE))
plot(parres(fit))
I will investigate and fix it soon.
Postscript: this has now been fixed in the development version of spatstat (version 1.59-0.020), available from the GitHub repository.
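If you need the fix before it reaches CRAN, the development version can be installed from GitHub (a sketch; assumes the remotes package is installed):
# install the development version of spatstat from GitHub
remotes::install_github("spatstat/spatstat")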
I am using glmmTMB to run a zero-inflated two-component hurdle model to determine how certain covariates might influence (1) whether or not a fish has food in its stomach and (2) if the stomach contains food, which covariates affect the number of prey items found in its stomach.
My data consists of the year a fish was caught, the season it was caught, sex, condition, place of origin, gross sea age (1SW = one year at sea, MSW = multiple years at sea), its genotype at two different loci, and fork length residuals. Data are available at my GitHub here.
Model interpretation
When I run the model (see code below), I get the following warning message about unusually large z-statistics.
library(glmmTMB)
library(DHARMa)
library(performance)
set.seed(111)
feast_or_famine_all_prey_df <- glmmTMB(num_prey ~ autumn_winter +
                 fishing_season + sex + condition_scaled +
                 place_of_origin +
                 sea_age/(gene1 + gene2 + fork_length_residuals) + (1|location),
                 data = data_5,
                 family = nbinom2,
                 ziformula = ~ .,
                 dispformula = ~ fishing_season + place_of_origin,
                 control = glmmTMBControl(optCtrl = list(iter.max = 100000,
                                                         eval.max = 100000),
                                          profile = TRUE, collect = FALSE))
summary(feast_or_famine_all_prey_df)
diagnose(feast_or_famine_all_prey_df)
Since the data does display imbalance for the offending variables (e.g. mean number of prey items in autumn = 85.33, mean number of prey items in winter = 10.61), I think the associated model parameters are near the edge of their range, hence, the extreme probabilities suggested by the z-statistics. Since this is an actual reflection of the underlying data structure (please correct me if I'm wrong!) and not a failure of the model itself, is the model output safe to interpret and use?
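(A quick sketch, not from the original post, of how that imbalance can be checked, assuming autumn_winter is the season factor:)
# mean number of prey items per season
aggregate(num_prey ~ autumn_winter, data = data_5, FUN = mean)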
Conflicting error messages
Both the diagnose() function and the model diagnostics from the DHARMa package seem to suggest the model is okay.
diagnose(feast_or_famine_all_prey_df)
ff_all_prey_residuals_df<- simulateResiduals(feast_or_famine_all_prey_df, n = 1000)
testUniformity(ff_all_prey_residuals_df)
testOutliers(ff_all_prey_residuals_df, type = "bootstrap")
testDispersion(ff_all_prey_residuals_df)
testQuantiles(ff_all_prey_residuals_df)
testZeroInflation(ff_all_prey_residuals_df)
However, if I run the code performance::r2_nakagawa(feast_or_famine_all_prey_df) then I get the following error messages:
R2 for Mixed Models
  Conditional R2: 0.333
  Marginal R2: 0.251
Warning messages:
1: In (function (start, objective, gradient = NULL, hessian = NULL, :
NA/NaN function evaluation
2: In (function (start, objective, gradient = NULL, hessian = NULL, :
NA/NaN function evaluation
3: In (function (start, objective, gradient = NULL, hessian = NULL, :
NA/NaN function evaluation
4: In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
5: In fitTMB(TMBStruc) :
Model convergence problem; false convergence (8). See vignette('troubleshooting')
None of these appeared using diagnose() nor were they (to the best of my knowledge) hinted at by the DHARMa diagnostics. Should these errors be believed?
Short answer: when you run performance::r2_nakagawa it refits the model with the fixed effects components removed. It's possible that your R^2 estimates are unreliable, but this shouldn't affect any of the other model results.
(update after much digging):
The code descends through these functions:
performance::r2_nakagawa
performance:::.compute_random_vars
insight::get_variance
insight:::.compute_variances
insight:::.compute_variance_distribution
insight:::.variance_distributional
insight:::null_model
insight:::.null_model_mixed
at which point it tries to run a null model with no fixed effects (num_prey ~ (1 | location)). This is where the warnings are coming from.
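To see those warnings in isolation, you can fit that null model yourself (a sketch; the exact model insight constructs internally, e.g. its zero-inflation part, may differ):
# refit the intercept-only null model that the R^2 calculation relies on
null_fit <- glmmTMB(num_prey ~ (1 | location), data = data_5, family = nbinom2)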
When I run your code I get R^2 values of 0.308/0.237, which does suggest that this is a somewhat unstable calculation (not that these differences would really change the conclusion much).
I am trying to use the flashlight package with the h2o package. An example of doing this on a regression model can be found here. However, I am trying to make it work for a classification model, following the example given in the link: flashlight will work with h2o if you provide your own custom predict function. However, the predict function in the example below does not work for classification.
Here is the code I'm using:
library(flashlight)
library(h2o)
h2o.init()
h2o.no_progress()
iris_hf <- as.h2o(iris)
iris_dl <- h2o.deeplearning(x = 1:4, y = "Species", training_frame = iris_hf, seed=123456)
pred_fun <- function(mod, X) as.vector(unlist(h2o.predict(mod, as.h2o(X))))
fl_NN <- flashlight(model = iris_dl, data = iris, y = "Species", label = "NN",
predict_function = pred_fun)
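For context (an assumption worth checking, not from the original post): for a classification model, h2o.predict() returns a frame with a predict column of class labels followed by one probability column per class, so unlist() on the whole frame mixes labels and probabilities:
# inspect the structure of the classifier's predictions
head(h2o.predict(iris_dl, iris_hf))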
But when I try to check the importance or interactions, I get an error. For example:
light_interaction(fl_NN, type = "H",
pairwise = TRUE)
This throws the error:
Error: Assigned data predict(x, data = X[, cols, drop = FALSE]) must
be compatible with existing data. Existing data has 22500 rows.
Assigned data has 90000 rows. ℹ Only vectors of size 1 are recycled.
I need to change the predict function somehow to make it work, but I have had no success yet. Any suggestions as to how I could change the predict function?
EDIT UPDATE: I found a custom predict function that works with the light_interaction function:
pred_fun <- function(mod, X) as.vector(unlist(h2o.predict(mod, as.h2o(X))[,2]))
where the above is indexed for the specific category. However, the above doesn't work for calculating the importance. For example:
light_importance(fl_NN)
This gives the following warnings:
Warning messages:
1: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
2: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
3: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
4: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
5: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
So I'm still trying to figure this out!
I am trying to use inverse probability of treatment weighting in a cause-specific cox regression using the CSC function in the riskRegression Package.
I calculated the weights without a problem, but when I try to pass the weights to the CSC function I get the following error message:
Error in eval(extras, data, env) :
..1 used in an incorrect context, no ... to look in
A complete reproducible example looks like this:
library(ipw)
library(cmprsk)
library(survival)
library(riskRegression)
data(mgus2)
# get some example data
mgus2$etime <- with(mgus2, ifelse(pstat==0, futime, ptime))
mgus2$event <- with(mgus2, ifelse(pstat==0, 2*death, 1))
mgus2$event <- factor(mgus2$event, 0:2, labels=c("censor", "pcm", "death"))
mgus2$age_cat <- cut(mgus2$age, breaks=seq(0, 100, 25))
mgus2$sex <- ifelse(mgus2$sex=="F", 0, 1)
# remove NA
mgus2 <- subset(mgus2, !is.na(mspike))
# estimate inverse probability weights
weights <- ipwpoint(sex, "binomial", "logit", denominator= ~ age_cat + mspike,
data=mgus2)
mgus2$weights <- weights$ipw.weights
# rerun cox model using weights
mod2 <- CSC(Hist(etime, event) ~ sex + age_cat + mspike, cause="pcm",
surv.type="hazard", fitter="coxph", data=mgus2,
weights=weights)
I know from the documentation that the CSC function calls the coxph function internally, passing additional arguments to it using the ... syntax. Other arguments can be passed to the function just fine, but the weights argument always produces the error message above.
How can I fix this?
UPDATE:
I have contacted the package maintainer and he has already fixed the bug. It should work fine now, with one little difference: instead of weights=weights one has to use weights=mgus2$weights.
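For completeness, the working call after the fix would look like this (a sketch based on the maintainer's instruction above):
# rerun the cause-specific Cox model, passing the weights as a vector
mod2 <- CSC(Hist(etime, event) ~ sex + age_cat + mspike, cause = "pcm",
            surv.type = "hazard", fitter = "coxph", data = mgus2,
            weights = mgus2$weights)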
I have 9,150 polygons in my dataset. I was trying to run a spatial autoregressive model (SAR) in spdep to test spatial dependence of my outcome variable. After running the model, I wanted to examine the direct/indirect impacts, but encountered an error that seems to have something to do with the length of neighbors in the weights matrix not being equal to n.
I tried running the very same equation as an SLX model (spatial lag of X), and impacts() worked fine, even though some polygons in my set had no neighbors. I Googled and looked at the spdep documentation, but couldn't find a clue on how to solve this error.
# Defining queen contiguity neighbors for polyset and storing the matrix as list
q.nbrs <- poly2nb(polyset)
listweights <- nb2listw(q.nbrs, zero.policy = TRUE)
# Defining the model
model.equation <- TIME ~ A + B + C
# Run SAR model
reg <- lagsarlm(model.equation, data = polyset, listw = listweights, zero.policy = TRUE)
# Run impacts() to show direct/indirect impacts
impacts(reg, listw = listweights, zero.policy = TRUE)
Error in intImpacts(rho = rho, beta = beta, P = P, n = n, mu = mu, Sigma = Sigma, :
length(listweights$neighbours) == n is not TRUE
I know this is a question from 2019, but maybe it can help people dealing with the same problem. I found out that in my case the problem was the type of dataset: data=polyset should be of type "SpatialPolygonsDataFrame", which can be achieved by converting your data:
polyset_spatial_sf <- sf::as_Spatial(polyset, IDs = polyset$ID)
Then rerun your code.
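Putting it together, the full rerun would look like this (a sketch reusing the code from the question; assumes polyset is an sf object with an ID column):
# convert the sf object to a SpatialPolygonsDataFrame
polyset_sp <- sf::as_Spatial(polyset, IDs = polyset$ID)
# rebuild neighbours and weights on the converted data
q.nbrs <- poly2nb(polyset_sp)
listweights <- nb2listw(q.nbrs, zero.policy = TRUE)
# refit the SAR model and compute the impacts
reg <- lagsarlm(TIME ~ A + B + C, data = polyset_sp,
                listw = listweights, zero.policy = TRUE)
impacts(reg, listw = listweights, zero.policy = TRUE)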
I want to use a naive Bayes classifier to make some predictions. So far I can make the prediction with the following (sample) code in R:
library(klaR)
library(caret)
Faktor <- sample(LETTERS[1:4], 10000, replace=TRUE, prob=c(0.1, 0.2, 0.65, 0.05))
alter<-abs(rnorm(10000,30,5))
HF<-abs(rnorm(10000,1000,200))
Diffalq<-rnorm(10000)
Geschlecht<-sample(c("Mann","Frau", "Firma"),10000,replace=TRUE)
data<-data.frame(Faktor,alter,HF,Diffalq,Geschlecht)
set.seed(5678)
flds<-createFolds(data$Faktor, 10)
train<-data[-flds$Fold01 ,]
test<-data[flds$Fold01 ,]
features <- c("HF","alter","Diffalq", "Geschlecht")
formel<-as.formula(paste("Faktor ~ ", paste(features, collapse= "+")))
nb<-NaiveBayes(formel, train, usekernel=TRUE)
pred<-predict(nb,test)
test$Prognose<-as.factor(pred$class)
Now I want to improve this model by feature selection. My real data has about 100 features.
So my question is: what would be the best way to select the most important features for naive Bayes classification? Is there any paper for reference?
I tried the following line of code, but unfortunately it did not work:
rfe(train[, 2:5],train[, 1], sizes=1:4,rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
EDIT: It gives me the following error message:
Fehler in { : task 1 failed - "nicht-numerisches Argument für binären Operator"
Calls: rfe ... rfe.default -> nominalRfeWorkflow -> %op% -> <Anonymous>
(The message is in German; in English it reads: Error in { : task 1 failed - "non-numeric argument to binary operator".)
How can I adjust the rfe() call to get a recursive feature elimination?
This error appears to be due to the ldaFuncs. Apparently they do not like factors when using matrix input. The main problem can be re-created with your test data using
mm <- ldaFuncs$fit(train[2:5], train[,1])
ldaFuncs$pred(mm,train[2:5])
# Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) :
# non-numeric argument to binary operator
And this only seems to happen if you include the factor variable. For example
mm <- ldaFuncs$fit(train[2:4], train[,1])
ldaFuncs$pred(mm,train[2:4])
does not return the same error (and appears to work correctly). Again, this only appears to be a problem when you use the matrix syntax. If you use the formula/data syntax, you don't have the same problem. For example
mm <- ldaFuncs$fit(Faktor ~ alter + HF + Diffalq + Geschlecht, train)
ldaFuncs$pred(mm,train[2:5])
appears to work as expected. This means you have a few different options. Either you can use the rfe() formula syntax like
rfe(Faktor ~ alter + HF + Diffalq + Geschlecht, train, sizes=1:4,
rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
Or you could expand the dummy variables yourself with something like
train.ex <- data.frame(Faktor = train[, 1], model.matrix(~ . - Faktor, train)[, -1])
rfe(train.ex[, 2:6], train.ex[, 1], ...)
But this doesn't remember which variables are paired in the same factor so it's not ideal.