Local kriging reports error in `krigeST` with full data (`STFDF`) - r

I have been playing around with the krigeST function in the gstat package, and I tried to do local kriging with a full set of SpatialPolygons based data (STFDF).
When I tried the following
spatio_time_krige_sumProd = krigeST(conso~1,
some_STFDF_with_SpatialPolygons_in_sp,
newdata = some_STF_with_SpatialPolygons_in_sp,
fitted_prodSum_variogram,
nmax = 50,
stAni = 4)
Error in from#sp[from#index[, 1], ] :
SpatialPolygons selection: can't find plot order if polygons are replicated
What seems to be the problem is that in krigeST.local, the some_STFDF_with_SpatialPolygons_in_sp is coerced into a SFIDF. In that, there is a subsetting step which is not allowed by the SpatialPolygons class, since there are repeated terms in from#index[, 1].
I'd like to try non-separable spatio-temporal variogram, so not using local kriging isn't really an option. Is there a workaround for this problem?
Thanks.

Related

Create a hierarchical clustering dendrogram for integrated Seurat object in R?

Does anybody know how to create a dendrogram for an integrated Seurat object. I can do it for a non-integrated object, but when I try:
immune.combined <- BuildClusterTree(object = immune.combined, slot = "data")
I see the error:
Error in hclust(d = data.dist) : NA/NaN/Inf in foreign function call (arg 10)
If you followed the normal Seurat workflow, at some point you will have changed the default assay to "RNA". Looking at the source for BuildClusterTree, it uses the most variable features from the chosen assay (var.features in the Large Seurat object under your chosen assay). For the integrated workflow, you only calculated these values for the "integrated" assay, not the RNA assay. You therefore need to do the analysis on the integrated assay. That would imply something like this:
sampleIntegrated <- BuildClusterTree(sampleIntegrated,assay="integrated")
For some reason that does not work, and the same error is produced. If you first explicitly set the default assay to integrated, however, it works:
DefaultAssay(sampleIntegrated) <- "integrated"
sampleIntegrated <- BuildClusterTree(sampleIntegrated,assay="integrated")
You can then use your visualization method of choice. For example, using the ggtree package and Tool from Seurat:
library(ggtree)
myPhyTree <- Tool(object=sampleIntegrated, slot = "BuildClusterTree")
ggtree(myPhyTree)+geom_tiplab()+theme_tree()+xlim(NA,400)

Specifying custom weights for the nonparametric estimate of spatially-varying relative risk in spatstat

Is there a way to specify weights in relrisk.ppp function in spatstat (version 1.63-3)?
The relrisk.ppp function calls the density.ppp function, which does allow users to specify their own weights.
For example, let us build upon the provided spatstat.data::urkiola data where, instead of individual trees, the locations are tree stands and we have a second numeric mark for the frequency of trees at each point-location:
urkiola_new <- spatstat.data::urkiola
urkiola_new$marks <- data.frame("type" = urkiola_new$marks, "freq" = rpois(urkiola_new$n, 3))
f1 <- spatstat::relrisk(urkiola_new, weights = urkiola_new$marks$freq)
When using the urkiola_new in a call of relrisk, urkiola_new is caught by stopifnot(is.multitype(X)) in relrisk.ppp. I next tried specifying the weights separately as a vector while using the original urkiola data,
f2 <- spatstat::relrisk(urkiola, weights = urkiola_new$marks$freq)
but was caught by a warning from the pixellate.ppp function within the internal density.ppp function:
Error in pixellate.ppp(x, ..., padzero = TRUE) : length(weights) == npoints(x) || length(weights) == 1 is not TRUE
The same error occurs when I convert the weights into a list
urkiola_weights <- split(urkiola_new$marks$freq, urkiola_new$marks$type)
f3 <- spatstat::relrisk(urkiola, weights = urkiola_weights)
I suspect there is a way to specify the weights cleverly, but it yet escapes me. Any suggestions or guidance would be helpful, thank you!
The function relrisk.ppp is not currently designed to handle weights. The help entry for relrisk.ppp does not mention weights.
The example above does not work because relrisk.ppp applies density.ppp separately to the sub-patterns of points of each type, and the extra argument weights is the wrong length for these sub-patterns.
I will take this question as a feature request, to add this capability to relrisk.ppp. It should be done soon.
Update: this is now implemented in the development version, spatstat 1.64-0.018 available at the spatstat github repository

How to apply universal kriging with custom prediction spatial grid using autoKrige in R

I want to apply universal kriging on a dataset using the autokrige function in R. I would like to create my own custom, spatial grid for the predicted points (for the new_data argument of autokrige). I am using R version 3.2.2 (64-bit) and RStudio Version 0.99.486. The following is what I've done so far:
library(automap)
library(sp)
library(gstat)
library(raster)
library(rgdal)
data(meuse)
coordinates(meuse) <- ~x + y
proj4string(meuse) <- CRS("+init=epsg:28992")
The following code was received from stackexchange here (credit goes to Jeffrey Evans) and is used to create a custom spatial grid for the prediction values:
ext_meuse <- as(extent(meuse), "SpatialPolygons")
r_meuse <- rasterToPoints(raster(ext_meuse, resolution = 59), spatial = TRUE)
proj4string(r_meuse) <- proj4string(meuse)
I then try to apply universal kriging (regression on the 'dist' column) using autoKrige:
kriging_result = autoKrige(zinc~dist, meuse, r_meuse)
The following error is then received:
Error in model.frame.default(terms.f, newdata, na.action = na.action, :
object is not a matrix In addition: Warning message:
'newdata' had 3102 rows but variable found had 1 row
Did I made a mistake with the grid creation (r_meuse)? Is there a 'better' way to create a grid for the predicted data? All the examples I have found so far uses the meuse.grid data, but I would like to apply universal kriging to other data that does not have its own grid data yet.
I believe the problem here is that you are performing UK without having the predictor, dist, present in r_meuse. This is a problem as that information is needed for the linear to make a prediction. So, r_meuse needs to be a SpatialPointsDataFrame with dist defined.

R: Autokrige.cv function in automap package generates NaNs

I’m fairly new to R and I am trying to make interpolations of temperature measurements that where gathered from different station across the Netherlands. I have data for about 35 stations that make measurements every 10 minutes covering a timespan of about two weeks. Accordingly, I figured it would be best to make a loop that takes care of this. To see how well the interpolation technique works I want to do a cross validation for every timestamp.
In order to do this I used the Autokrige function from the automap package, and next I used the compare.cv function from the automap package in order to get an overview of the most important statistics for all time stamps. Besides that, I made sure the cross validation is only done if at least 25 stations registred meassurements.
The problem however is, that my code as described below works most of the time but gives the following warnings in 4 cases:
1. In sqrt(ret[[var.name]]) : NaNs produced
2. In sqrt(ret[[var.name]]) : NaNs produced
3. In sqrt(ret[[var.name]]) : NaNs produced
4. In sqrt(ret[[var.name]]) : NaNs produced
When I try to use the compare.cv command for the total list including all the cross validations it gives me the following error:
"Error in quantile.default(as.numeric(x), c(0.25, 0.75), na.rm = na.rm, :
missing values and NaN's not allowed if 'na.rm' is FALSE"
Im wondering what causes the Autokrige function to generate NaNs in the cross validation, and more importantly how I can remove them from the results.cv so that I can use the compare.cv function?
rm(list=ls())
# load packages
require(sp)
require(gstat)
require(ggmap)
require(automap)
require(ggplot2)
#load data (download link provided below)
load("download path") https://www.dropbox.com/s/qmi3loub29e55io/meassurements_aug.RDS?dl=0
# make data spatial and assign spatial coordinate system
coordinates(meassurements) = ~x+y
proj4string(meassurements) <- CRS("+init=epsg:4326")
meassurements_df <- as.data.frame(meassurements)
# loop for cross validation
timestamp <- meassurements$import_log_id
results.cv=list()
for (i in unique(timestamp)) {
x = meassurements_df[which(meassurements$import_log_id == i), ]
if(sum(!is.na(x$temperature)) > 25){
results.cv[[paste0(i)]] = autoKrige.cv (temperature ~ 1, meassurements[which(meassurements$import_log_id == i & !is.na(meassurements$temperature)), ])
}
}
# calculate key statistics (RMSE MAE etc)
compare.cv(results.cv)
Thanks!
I came across the same problem and solved it with the help of remove.duplicates() of package sp on the SpatialPointDataFrame used for kriging. Prior to that I calculated the mean of the relevant variables in the DataFrame.
SPDF#data <- SPDF#data %>%
group_by(varx,vary,varz) %>%
mutate_at(vars(one_of(relevant_var)),mean,na.rm=TRUE) %>%
ungroup()
SPDF <- SPDF %>% remove.duplicates()
At the time I was encountering the same problem the Dropbox link above was not working anymore, so I could not check this specific example.

R problem with randomForest classification with raster package

I am having an issue with randomForest and the raster package. First, I create the classifier:
library(raster)
library(randomForest)
# Set some user variables
fn = "image.pix"
outraster = "classified.pix"
training_band = 2
validation_band = 1
original_classes = c(125,126,136,137,151,152,159,170)
reclassd_classes = c(122,122,136,137,150,150,150,170)
# Get the training data
myraster = stack(fn)
training_class = subset(myraster, training_band)
# Reclass the training data classes as required
training_class = subs(training_class, data.frame(original_classes,reclassd_classes))
# Find pixels that have training data and prepare the data used to create the classifier
is_training = Which(training_class != 0, cells=TRUE)
training_predictors = extract(myraster, is_training)[,3:nlayers(myraster)]
training_response = as.factor(extract(training_class, is_training))
remove(is_training)
# Create and save the forest, use odd number of trees to avoid breaking ties at random
r_tree = randomForest(training_predictors, y=training_response, ntree = 201, keep.forest=TRUE) # Runs out of memory, does not allow more trees than this...
remove(training_predictors, training_response)
Up to this point, all is good. I can see that the forest was created correctly by looking at the error rates, confusion matrix, etc. When I try to classify some data, however, I run into trouble with the following, which returns all NA's in predictions:
# Classify the whole image
predictor_data = subset(myraster, 3:nlayers(myraster))
layerNames(predictor_data) = layerNames(myraster)[3:nlayers(myraster)]
predictions = predict(predictor_data, r_tree, type='response', progress='text')
And gives this warning:
Warning messages:
1: In `[<-.factor`(`*tmp*`, , value = c(1, 1, 1, 1, 1, 1, ... :
invalid factor level, NAs generated
(keeps going like this)...
However, calling predict.randomForest directly works fine and returns the expected predictions (this is not a good option for me because the image is large, and I cannot store the whole matrix in memory):
# Classify the whole image and write it to file
predictor_data = subset(myraster, 3:nlayers(myraster))
layerNames(predictor_data) = layerNames(myraster)[3:nlayers(myraster)]
predictor_data = extract(predictor_data, extent(predictor_data))
predictions = predict(r_tree, newdata=predictor_data)
How can I get it to work directly with the "raster" version? I know that this is possible, as shown in the examples of predict{raster}.
You could try nesting predict.randomForest within the writeRaster function and write the matrix as a raster in chunks as per the pdf included in the raster package. Before that, try the argument 'na.rm=TRUE' when calling predict in the raster function. You might also assign dummy values to the NAs in the predict rasters, then later rewriting them as NAs using functions in the raster package.
As for memory problems when calling RFs, I've had a plethora of memory issues dealing with BRTs. They're immense on disk and in memory! (Should a model be more complex than the data?) I've not had them run reliably on 32-bit machines (WinXp or Linux). Sometimes tweaking Windows memory allotment to applications has helped, and moving to Linux has helped more, but I get the most from 64-bit Windows or Linux machines, since they impose a higher (or no) limit on the amount of memory applications can take. You may be able to increase the number of trees you can use by doing this.

Resources