I asked this question on RCommunity but haven't had anyone bite... so I'm here!
My current project involves predicting whether some trees will survive under future climate change scenarios. Against my better judgement (I could have used Maxent instead), I've decided to pursue this with a GLM, which requires presence and absence data. I was only given presence data, so I generate pseudo-absences using randomPoints from dismo, and every time I do so the resulting GLM has different significant variables. I found a package called My.stepwise that has a My.stepwise.glm function (here: My.stepwise.glm: Stepwise Variable Selection Procedure for Generalized Linear... in My.stepwise: Stepwise Variable Selection Procedures for Regression Analysis), which goes through a forward/backward selection process to find the best variables and hands you a finished model.
My problem is that I don't want to run My.stepwise.glm just once and use the model it spits out. I'd like to run it roughly 100 times with different pseudo-absence data, see which variables it returns each time, then take the most frequent variables and build my model from those. The issue is that My.stepwise.glm ends with 'print(summary(initial.model))', whereas I would like to access the output the way step() lets you: it returns a list-like model object, so you can write 'step$coefficients' and get the coefficients back as numerics. Can anyone help me with this?
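A minimal sketch of the repeat-and-tally idea described above, under loud assumptions: it uses base R's step() as a stand-in for My.stepwise.glm (whose return value is not shown here), and the data frames pres and background, along with the variable names temp, precip and slope, are made-up toy data rather than anything from the actual project.

```r
# Toy stand-ins (hypothetical): 'pres' holds predictor values at presence points,
# 'background' is the pool from which pseudo-absences are drawn.
set.seed(1)
n_pres <- 100
pres <- data.frame(temp = rnorm(n_pres, 1), precip = rnorm(n_pres, 0.5),
                   slope = rnorm(n_pres))
background <- data.frame(temp = rnorm(5000), precip = rnorm(5000),
                         slope = rnorm(5000))

n_runs <- 20  # use ~100 in practice; kept small here so the sketch runs quickly
selected <- vector("list", n_runs)

for (i in seq_len(n_runs)) {
  # Draw a fresh set of pseudo-absences for this run
  abs_i <- background[sample(nrow(background), n_pres), ]
  dat <- rbind(cbind(pa = 1, pres), cbind(pa = 0, abs_i))

  # Fit the full GLM, then let step() do forward/backward selection by AIC
  full <- glm(pa ~ ., data = dat, family = binomial)
  fit <- step(full, direction = "both", trace = 0)

  # step() returns a model object, so the retained terms are directly accessible
  selected[[i]] <- setdiff(names(coef(fit)), "(Intercept)")
}

# How often each variable survived selection across the runs
var_freq <- sort(table(unlist(selected)), decreasing = TRUE)
print(var_freq)
```

Because each run stores only the names of the retained coefficients, the final table is a simple count of how many of the runs kept each variable.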
I used the nowcast function from the R package nowcasting to fit a dynamic factor model and nowcast GDP using the extracted factors. I have tried multiple combinations of the initial variables and finally obtained this model, in which all variables seem significant and the values obtained for my variable of interest are acceptable.
But I can't find any reference on which residual tests I need to run in order to validate this model.
I am really struggling and have been stuck on this for a month. I need to submit my graduation project this weekend and I really need this model to work, so any help will be very much appreciated. Thank you.
Update 1:
This is the ACF plot of the residuals suggested by the same package, nowcasting. I think my model passes that test and therefore I can use it, right?
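As a rough illustration of a residual check that is often run alongside an ACF plot, the sketch below applies a Ljung-Box test with base R. The vector res is simulated and purely a placeholder, since the fitted nowcast object is not shown here; how to extract the actual residuals from that object is not covered.

```r
# Placeholder residuals (simulated white noise); replace with the model's residuals
set.seed(123)
res <- rnorm(120)

# Sample autocorrelations, without drawing the plot
res_acf <- acf(res, lag.max = 20, plot = FALSE)

# Ljung-Box test: the null hypothesis is no autocorrelation up to the given lag
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
print(lb)
```

A large p-value means the test finds no evidence of remaining autocorrelation at those lags, which is the same question the ACF plot answers visually.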
{gtsummary} has the tbl_svysummary() function for producing summary statistics tables from survey.design objects created by the {survey} package. The {gtsummary} website provides an example of how to add confidence intervals for tbl_summary(), by defining custom functions for calculating CIs which are then passed to the statistic = argument in tbl_summary().
However, the documentation for tbl_svysummary() notes that "Unlike tbl_summary(), it is not possible to pass a custom function." I'm using a survey.design object since I'm applying weighting to my data, but I really like the output of {gtsummary}, so it would be great if I could find a way to add confidence intervals, as I need to show these for reporting.
Any suggestions on how to achieve this, or is it not possible?
I am sorry to report that it is currently not possible. One would normally go about it using the add_stat() function (see the example in How to generate effect size [90%CI] in the summary table using R package “gtsummary”?), but that function has not yet been generalized to work with tbl_svysummary() objects.
I had never considered generalizing it until now, so thank you very much for your question. I opened a GitHub Issue to track implementation progress. You can subscribe to the issue to be notified when it is complete.
https://github.com/ddsjoberg/gtsummary/issues/688
Happy Programming!
I hope I have come to the right forum. I'm an ecologist making species distribution models using the maxent (version 3.3.3, http://www.cs.princeton.edu/~schapire/maxent/) function in R, through the dismo package. I have used the argument "replicates = 5", which tells maxent to do a 5-fold cross-validation. When running maxent from the maxent.jar file directly (the maxent software), an html file with statistics is produced, including the prediction maps. In R, an html file is also produced, but the prediction maps have to be extracted afterwards, using the function "predict" in the dismo package. When I do this, I get 5 maps, due to the 5-fold cross-validation setting. However, and this is the problem, I want only one output map, one "summary" prediction map. I assume this is possible, although I don't know how maxent computes it. The maxent tutorial (see link above) says that:
"...you may want to avoid eating up disk space by turning off the “write output grids” option, which will suppress writing of output grids for the replicate runs, so that you only get the summary statistics grids (avg, stderr etc.)."
A list of arguments that can be put into R is found in this forum https://groups.google.com/forum/#!topic/maxent/yRBlvZ1_9rQ.
I have tried to use the argument "outputgrids=FALSE" both in the maxent function itself, and in the predict function, but it doesn't work. I still get 5 maps, even though I don't get any errors in R.
So my question is: How do I get one "summary" prediction map instead of the five prediction maps that result from the cross-validation?
I hope someone can help me with this, I am really stuck and haven't found any answers anywhere on the internet. Not even a discussion about this. Hope my question is clear. This is the R-script that I use:
model1<-maxent(x=predvars, p=presence_points, a=target_group_absence, path="//home//...//model1", args=c("replicates=5", "outputgrids=FALSE"))
model1map<-predict(model1, predvars, filename="//home//...//model1map.tif", outputgrids=FALSE)
Best regards,
Kristin
Sorry to be the bearer of bad news, but based on the source code, it looks like dismo's predict function does not have the ability to generate a summary map.
Nitty-gritty details for those who care: When you call maxent with replicates set to something greater than 1, the maxent function returns a MaxEntReplicates object, rather than a normal MaxEnt object. When predict receives a MaxEntReplicates object, it just iterates through all of the models that it contains and calls predict on them individually.
So, what next? Fortunately, all is not lost! The reason that Dismo doesn't have this functionality is that for most kinds of model-building, there isn't actually a valid way to average parameters across your cross-validation models. I don't want to go so far as to say that that's definitely the case for MaxEnt specifically, but I suspect it is. As such, cross-validation is usually used more as a way of checking that your model building methodology works for your data than as a way of building your model directly (see this question for further discussion of that point). After verifying via cross-validation that models built using a given procedure seem to be accurate for the phenomenon you're modelling, it's customary to build a final model using all of your data. In theory this new model should only be better than models trained on a subset of your data.
So basically, assuming your cross-validated models look reasonable, you can run MaxEnt again with only one replicate. Your final result will be a model accuracy estimate based on the cross-validation and a map based on the second run with all of your data lumped together. Depending on what exactly your question is, there might be other useful summary statistics from the cross-validation that you want to use, but those are all things you've already seen in the html output.
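The workflow described above (cross-validate to estimate accuracy, then refit once on all of the data) can be sketched in base R. This is only an illustration under stated assumptions: a logistic GLM on simulated data stands in for MaxEnt, and the names dat, x1, x2 and y are made up for the example.

```r
# Simulated stand-in data (hypothetical): two predictors and a binary response
set.seed(42)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2))

# Step 1: 5-fold cross-validation, used only to judge the modelling procedure
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))
acc <- numeric(k)
for (i in seq_len(k)) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  fit <- glm(y ~ x1 + x2, data = train, family = binomial)
  p <- predict(fit, newdata = test, type = "response")
  acc[i] <- mean((p > 0.5) == test$y)  # held-out classification accuracy
}
cv_accuracy <- mean(acc)

# Step 2: the final model, fitted once on all of the data; predictions
# (the "map") come from this model, not from the fold models
final_model <- glm(y ~ x1 + x2, data = dat, family = binomial)
```

The cross-validated accuracy is reported as the performance estimate, while final_model is the one used for prediction.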
I may have found this a couple of years late, but you could do something like this:
library(dismo) # also loads raster
xm1 <- maxent(predictors, pres_train1) # maxent model for fold 1 (pres_train1, pres_train2 are placeholders for each fold's training points)
xm2 <- maxent(predictors, pres_train2) # maxent model for fold 2
px1 <- predict(predictors, xm1, ext=ext, progress='') # prediction for fold 1
px2 <- predict(predictors, xm2, ext=ext, progress='') # prediction for fold 2
models <- stack(px1, px2) # stack the predictions from all the models
final_map <- calc(models, fun=mean) # cell-wise mean of all the predictions
plot(final_map) # plot the averaged map
xm1, xm2, ... would be the maxent models for each partition in the cross-validation, and px1, px2, ... would be the predicted maps.
How do I write a custom function in R that fits all multiple linear regression models for a given data set, with the number of variables specified by the user? The full model looks like this:
BodyFat.lm <- lm(PercentBodyFat ~ ., data = BodyFat)
This fits a single model using all the variables. I want a function where the user specifies the number of variables, like
(my.data = BodyFat, n = 2)
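You should be able to do what you want with dredge in the MuMIn package. Perhaps something like this:
library(MuMIn)
options(na.action = "na.fail") # dredge() refuses to run unless the global model was fitted with na.fail
BodyFat.lm <- lm(PercentBodyFat ~ ., data = BodyFat)
BodyFat.lm.2 <- dredge(BodyFat.lm, m.lim = c(2, 2)) # exactly 2 terms; older MuMIn versions used m.min/m.max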
For a possible solution, you might want to look at the following post by Mark Heckmann, which shows how to calculate all possible linear regression models for a given set of predictors. As the author points out, you can take a few approaches:
1) Write a lot of code (he does this), to follow a repetition driven step-by-step analysis approach
2) Make use of a specialized package. The author suggests the packages leaps and meifly, but notes that both seem to have some drawbacks. Note that you can see specific code and more information on Hadley Wickham's meifly package here: https://github.com/hadley/meifly/
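For the first approach, a compact base R sketch is possible with combn(). The helper name fit_all_lm and its arguments are hypothetical, chosen to mirror the (my.data, n) interface asked for in the question, and the built-in mtcars data is used only because BodyFat is not available here.

```r
# Hypothetical helper: fit every linear model that uses exactly n predictors
fit_all_lm <- function(my.data, response, n) {
  predictors <- setdiff(names(my.data), response)
  # All subsets of the predictors of size n
  combos <- combn(predictors, n, simplify = FALSE)
  fits <- lapply(combos, function(vars) {
    lm(reformulate(vars, response = response), data = my.data)
  })
  # Label each fit by its right-hand side, e.g. "hp + wt"
  names(fits) <- vapply(combos, paste, character(1), collapse = " + ")
  fits
}

# Usage with a built-in data set standing in for BodyFat
fits <- fit_all_lm(mtcars, response = "mpg", n = 2)
length(fits)  # choose(10, 2) = 45 models

# Rank the candidate models by adjusted R-squared
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
head(sort(adj_r2, decreasing = TRUE), 3)
```

The number of models grows as choose(p, n), so this brute-force approach is only practical for a moderate number of predictors.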