How to use dismo's predict() with a maxent model based on a dataframe in R

I am trying to figure out how dismo's predict() function operates when the model was built with 'x' as a dataframe rather than raster layers. I have successfully run models using raster layers and made prediction maps from them.
My model is built as follows:
library(dismo)
model <- maxent(x = sightings.data, p = presence.vector)
with sightings.data being a dataframe containing the GPS locations of sightings, followed by the conditions at these times and locations. presence.vector is a vector indicating if a row is a presence or background point.
I am looking to find out:
What arguments to supply to predict given a model of this type
What predict() is capable of providing from a model such as this
The help file for predict() is not particularly detailed, and the 'Species distribution modelling with R' vignette does not cover this topic (its examples just print 'cannot run this example because maxent is not available').
I have tried modelling with a dataframe containing only variables I have raster layers for, and tried predicting as I would for a model built with rasters, but I get the following error:
Error in .local(object, ...) : missing layers (or wrong names)
I have ensured the dataframe columns and the raster layers have the same names, excluding the mandatory latitude and longitude columns:
names(raster.stack) <- colnames(sightings.data[3:5])

The method I found in the code available from Oppel et al. (2012) demonstrates that dismo's predict() can produce relative values when provided with a dataframe of input variables.
> predictions <- predict(model, variables)
> str(predictions)
num [1:100] 0.635 ...
I'm still looking for an easy method to create a predicted distribution raster map from such predicted values.
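If you still have the coordinates for the rows of variables, one way to build such a map is to rasterize the predictions. A sketch; coords (a two-column x/y matrix for each row of variables) and raster.template (a raster defining the target extent and resolution) are hypothetical names:
library(raster)
# coords: two-column matrix of x/y for each row of `variables` (hypothetical)
# raster.template: raster defining the target extent/resolution (hypothetical)
pred.raster <- rasterize(coords, raster.template, field = predictions)
plot(pred.raster)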

If you provide dismo::maxent a dataframe, the function will recognize the first column as longitude and the second column as latitude. If the data do not follow this format, the function will not work.

In this format the sightings data does not need to include the GPS locations, so you can remove the x and y columns from sightings.data. Then you can run the model, and then predict to a raster stack whose layer names are identical to the sightings.data column names.
Predict was looking for the GPS locations in your raster stack, which I'm guessing were not there.
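Putting the answer together, a minimal sketch (it assumes the first two columns of sightings.data are the GPS coordinates, as described above):
library(dismo)
env.data <- sightings.data[, -(1:2)]            # drop the longitude/latitude columns
model <- maxent(x = env.data, p = presence.vector)
names(raster.stack) <- colnames(env.data)       # layer names must match the column names exactly
prediction.map <- predict(model, raster.stack)  # a RasterLayer of predicted suitability
plot(prediction.map)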

Related

Does the sarorderedprobit function in the spatialprobit package support panel or time-series datasets (within R)?

I am attempting to use the sarorderedprobit function (within the "spatialprobit" package) to perform a SAR Ordered Probit estimation using panel data.
I have imported my spatial weight matrix (representing the 50 states of the US) using the following script:
library(spdep)
Weight_GAL <- read.gal(File, override.id = TRUE)
Weight_List <- nb2listw(Weight_GAL, style = "W", zero.policy = TRUE)
W <- listw2mat(Weight_List)
which successfully imports the 50x50 sparse matrix.
The following sarorderedprobit call is run:
sarorderedprobit(formula, W=W, showProgress=TRUE)
When using cross-sectional data with 50 observations, the script successfully estimates the sarorderedprobit model. However, when panel data is used with 3 years (i.e., 150 observations), the script returns the following error:
Error: Matrices must have same dimensions in .Arith.Csparse(e1, e2, .Generic, class. = "dgCMatrix")
The issue here seems to be related to the use of a 50x50 weight matrix with 150 observations. Unfortunately, I have not found any references to using the sarorderedprobit function with panel data. Can anyone provide guidance on whether the sarorderedprobit function supports estimation using panel or time-series datasets?
EDIT:
I have calculated the Kronecker product using a sparse matrix to prepare a 150x150 weight matrix using the following script:
library(Matrix)
tperiods <- 3                     # number of time periods
t_diag <- Diagonal(tperiods)      # 3 x 3 identity matrix
bigWmat <- kronecker(t_diag, W)   # block-diagonal 150 x 150 weight matrix
Running the sarorderedprobit function with the bigWmat matrix succeeds with no errors. However, I am concerned that this does not correctly handle the temporal nature of the panel data. Do I need to add dummy variables for the time periods (t=1, t=2, t=3 for the 3 years of panel data)?
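One way to encode period effects, if that is what is wanted, is year fixed effects in the formula. This is a hedged sketch, not something the spatialprobit documentation confirms; it assumes sarorderedprobit accepts a data argument like its sibling sarprobit, and panel_df is a hypothetical stacked data frame with columns y, x1, x2, year:
panel_df$year <- factor(panel_df$year)   # one dummy per period beyond the baseline
fit <- sarorderedprobit(y ~ x1 + x2 + year, W = bigWmat, data = panel_df,
                        showProgress = TRUE)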

Mice in R - how can I understand what this command does?

mice_mod <- mice(titanicData[, !names(titanicData) %in%
                   c('PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived')],
                 method = 'rf')
mice_output <- complete(mice_mod)
I am new to R and we had a college lecture yesterday. What does this command do? I have read the online documentation and broken the command down into a series of outputs, with no joy.
The mice function approximates missing values. In your case you are using method='rf', which means the random forest imputation algorithm is used. Since I can't reproduce your dataset, I'm using airquality, a built-in R dataset with NA values that can be approximated the same way. With mice you are essentially creating a prediction model; strictly speaking it returns a mids object, which is mice's container for imputed datasets (see the documentation). If you want to use those imputations, you can call complete to create the filled dataframe.
library(mice)
df<-airquality
mice_mod <- mice(df, method='rf')
mice_output <- complete(mice_mod)
When you compare df and mice_output, you'll see that the NA values in Ozone and Solar.R have been replaced.
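For example, a quick check:
colSums(is.na(df))           # NAs present in Ozone and Solar.R
colSums(is.na(mice_output))  # all zero after imputation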
In your example, your lecturer is keeping all columns whose names are not in the given list; that is, he is filtering the dataframe before imputation, as the snippet below illustrates.
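A minimal illustration of that filtering idiom, using a small hypothetical data frame:
df <- data.frame(PassengerId = 1:3, Age = c(22, NA, 30), Survived = c(0, 1, 1))
df[, !names(df) %in% c('PassengerId', 'Survived'), drop = FALSE]  # keeps only Age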
If you want more information about the algorithm, according to the documentation it is described in: Doove, L.L., van Buuren, S., Dusseldorp, E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92-104.

How to predict new variables for a new randomly generated dataset using multiple regression in R?

I have generated a MARS regression model using known soil property data collected from field samples across the Great Plains region. I reduced all variables down to 5 predictor variables (elevation, tpi, k_factor, precipitation and temperature) and a single dependent variable (soil organic content: SOC). I split the original dataset into a training set and a test set. I was able to use my model to predict values on the test dataset after the model was created just fine.
I want to predict on a newly generated dataset with data derived from geospatial rasters across the Great Plains region. I generated random samples based on the study area size and created a point shapefile over the area. The raster values were written into the points where they intersected, giving me a table full of the 5 predictor variables for each point. I do not have an SOC raster, so my new table is missing that column.
My intention was to predict the SOC values based on the 5 predictor variables in the new table. However, I keep getting the error "variable lengths differ" for each of my columns. I would like to export the predictions back to the new table to be able to visualize the distribution of SOC within GIS. Below is an example of my code:
library(earth)

setwd("E:\\Fall19\\stats\\FinalProject\\Excel_tables")
table <- read.csv("sel_el_train.csv")
my_data <- table[, c(8, 9, 15, 16, 18, 19)]   # SOC plus the 5 predictor columns
mars1 <- earth(SOC ~ ., data = my_data)       # fit the MARS model
print(mars1)
summary(mars1)
plot(mars1)
predict(mars1, newdata = test.data)
Below are screenshots of the bottom of the record. You can see a difference between the number of records I built the model from and the dataset I'm trying to predict on.
I figured it out. The method I was using was very particular about column-name spelling. My K_factor variable was not spelled consistently. Once all column names matched up, everything worked well.
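A quick way to spot this kind of mismatch (a sketch reusing the object names from the code above):
# columns the model saw during training that are missing, or spelled
# differently, in the new table; SOC will appear too, since the new
# table has no response column
setdiff(names(my_data), names(test.data))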

Spatial Regressions with Panel Data in R

I have a panel dataset with several hundred regions, ~10 years, and spatial data for the regions. I created a weight matrix with the spdep package (in the standard way, followed by nb2listw).
I thus have a matrix with weights for each region (in relation to the other regions), but each region is represented just once.
I would like to run some of the spatial regressions from the spdep package (lagsarlm, errorsarlm), but I get an error:
Error in subset.listw(listw, subset, zero.policy = zero.policy) :
Not yet able to subset general weights lists
and
Error in lagsarlm(y ~ x1 + x2: Input data and weights have different dimensions
I assume this is because the weight matrix has only one row per region (so it matches only a single year of observations). Do you have any suggestions on how to attack the issue?
My ideas revolve around the following (a sketch of the first approach follows after the list):
Extend the spatial weight matrix OR
Tell spdep that the regions will repeat in the same order (but how?)
Looking forward to your suggestions.
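The first idea mirrors the Kronecker-product approach from the sarorderedprobit question above. A sketch, assuming the stacked data are ordered region-by-year with the same region order in every year; weight_list stands in for your nb2listw object:
library(Matrix)
library(spdep)

tperiods <- 10                                   # number of years in the panel
W <- listw2mat(weight_list)                      # N x N weight matrix
bigW <- kronecker(Diagonal(tperiods), W)         # block-diagonal (T*N) x (T*N)
big_listw <- mat2listw(as.matrix(bigW), style = "W")
# big_listw can then be passed to lagsarlm()/errorsarlm() with the stacked data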

Convert a list (class numeric) into a distance structure in R

I have a list that looks like this; it is a measure of dispersion for each sample:
1 2 3 4 5
0.11829384 0.24987017 0.08082147 0.13355495 0.12933790
To further analyze this I need it to be a distance structure; the vegan package needs it as a 'dist' object.
I found some solutions that apply to converting matrices to dist objects, but how can I convert this data into a dist object?
I am using the FD package; in its manual I found:
Still, one potential advantage of FDis over Rao's Q is that in the unweighted case (i.e. with presence-absence data), it opens possibilities for formal statistical tests for differences in FD between two or more communities through a distance-based test for homogeneity of multivariate dispersions (Anderson 2006); see betadisper for more details.
I wanted to use vegan's betadisper function to test whether there are differences among regions (I provide the grouping through the data frame region, which has a column region).
functional <- FD(trait, comun)
mod <- betadisper(functional$FDis, region$region)
using gowdis or fdisp from FD didn't work too.
distancias <- gowdis(rasgo)
mod <- betadisper(distancias, region$region)
dispersion <- fdisp(distancias, presence)
mod <- betadisper(dispersion, region$region)
I tried this, thinking I could pass those results on to betadisper, but I get a list object.
You cannot do this: FD::fdisp() does not return dissimilarities. It returns a list of three elements: the dispersions FDis for each sampling unit (SU), and the results of the eigen decomposition of the input dissimilarities (eig for eigenvalues, vectors for orthonormal eigenvectors). The FDis values are summarized for each original SU, but there is no information on the differences among SUs.
The eigen decomposition can be used to reconstruct the original input dissimilarities (your distancias from FD::gowdis()), but you can directly use the input dissimilarities. Function FD::gowdis() returns a regular "dist" structure that you can directly use in vegan::betadisper(), if that gives you a meaningful analysis. For this, your grouping variable must be based on the same units as your distancias. In a typical application of fdisp, the units are species (taxa), but it seems you want to get an analysis for communities/sites/whatever. This will not be possible with these tools.
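In code, the direct route looks roughly like this (a sketch assuming rasgo holds the trait data and region$region has one entry per row of rasgo):
library(FD)
library(vegan)

distancias <- gowdis(rasgo)                   # Gower dissimilarities, already a "dist" object
mod <- betadisper(distancias, region$region)  # grouping must match the units of distancias
anova(mod)                                    # test for differences in multivariate dispersion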
