Multidimensional random draw without replacement with 'predrawn' samples in pytorch


I have an (N, I) tensor of N rows with I indices between 0 and Z, e.g., N=5, I=3, Z=100:
foo = tensor([[83, 5, 85],
[ 7, 60, 66],
[89, 25, 63],
[58, 67, 47],
[12, 46, 40]], device='cuda:0')
Now I want to efficiently append to each row X random new indices between 0 and Z that are not yet included in that row, e.g., with X=4:
foo_new = tensor([[83, 5, 85, 9, 43, 53, 42],
[ 7, 60, 66, 85, 64, 22, 1],
[89, 25, 63, 38, 24, 4, 75],
[58, 67, 47, 83, 43, 29, 55],
[12, 46, 40, 74, 21, 11, 52]], device='cuda:0')
In the end, each row of the tensor would contain I+X unique indices between 0 and Z: the I indices from the initial tensor, plus X indices drawn uniformly at random without replacement from the remaining indices {0...Z}\{I(n)}, where {I(n)} are the indices of the nth row.
So it's like a multidimensional random draw without replacement from indices 0 to Z, where the first I draws (in each row) are enforced to result in the indices given by the initial tensor.
How would I do this efficiently, especially with potentially large Z?
What I tried so far (which was quite slow):
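import torch
device = torch.cuda.current_device()
# notinfoo[n, z] is True if index z is NOT yet in row n (one column per candidate index 0..Z-1)
notinfoo = torch.ones((N, Z), dtype=torch.bool, device=device)
N_row = torch.arange(N, device=device).unsqueeze(dim=-1)
notinfoo[N_row, foo] = False
# per-row loop: keep the Z-I free indices, shuffle them, take the first X
foo_new = torch.stack([torch.cat((f, torch.arange(Z, device=device)[nf][torch.randperm(Z - I, device=device)][:X]))
                       for f, nf in zip(foo, notinfoo)])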
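The per-row Python loop is what makes the attempt above slow. A minimal vectorized sketch, assuming foo is an int64 tensor of shape (N, I) with unique indices per row and that the candidates are 0..Z-1 (as with torch.arange(Z) in the attempt): build an (N, Z) weight matrix that is 0 at the indices already in each row and 1 elsewhere, then let torch.multinomial draw X columns per row with replacement=False. With equal weights this is a uniform draw without replacement from the remaining indices. The function name extend_without_replacement is hypothetical.

```python
import torch


def extend_without_replacement(foo: torch.Tensor, Z: int, X: int) -> torch.Tensor:
    """Hypothetical helper: append X new indices to each row of foo (shape (N, I), int64),
    drawn uniformly without replacement from {0, ..., Z-1} minus that row's indices."""
    N = foo.shape[0]
    # One weight per candidate index: 1 = still available, 0 = already in the row
    weights = torch.ones((N, Z), device=foo.device)
    weights.scatter_(1, foo, 0.0)
    # torch.multinomial samples each row of a 2-D input independently;
    # equal weights + replacement=False gives a uniform draw without replacement
    new = torch.multinomial(weights, X, replacement=False)
    return torch.cat((foo, new), dim=1)


device = "cuda" if torch.cuda.is_available() else "cpu"
foo = torch.tensor([[83, 5, 85],
                    [7, 60, 66],
                    [89, 25, 63],
                    [58, 67, 47],
                    [12, 46, 40]], device=device)
foo_new = extend_without_replacement(foo, Z=100, X=4)
print(foo_new)
```

This still allocates an N x Z weight matrix, so memory grows with Z, but all rows are sampled in one call on the GPU instead of one randperm per row.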

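First use numpy.random.choice with replace=False to draw samples without replacement, and then concatenate both with torch.cat. Note that this only makes the 20 new values distinct from each other; it does not exclude the indices already in foo, so a row can end up with duplicates (the first row of the output below contains 83 twice).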
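import numpy as np
import torch
# draw N*X = 5*4 distinct values from 0..Z-1 (Z = 100), on the same device as foo
foo_new = torch.tensor(np.random.choice(100, (5, 4), replace=False), device=foo.device)
foo_new = torch.cat((foo, foo_new), 1)
foo_new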
tensor([[83, 5, 85, 56, 83, 16, 20],
[ 7, 60, 66, 43, 31, 75, 67],
[89, 25, 63, 96, 3, 13, 11],
[58, 67, 47, 55, 92, 70, 35],
[12, 46, 40, 79, 61, 58, 76]])
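A minimal NumPy sketch that keeps the numpy.random.choice idea but draws per row from the indices not yet in that row, so the appended values are distinct from each other and from the existing ones. It assumes candidates 0..Z-1 and uses numpy.setdiff1d and numpy.random.default_rng; the helper name draw_new_indices is hypothetical. The result can be turned back into a tensor with torch.as_tensor(foo_new, device=foo.device).

```python
import numpy as np


def draw_new_indices(foo: np.ndarray, Z: int, X: int, rng=None) -> np.ndarray:
    """Hypothetical helper: for each row of foo, draw X indices from {0, ..., Z-1}
    that are not already in the row (without replacement) and append them."""
    rng = np.random.default_rng() if rng is None else rng
    all_idx = np.arange(Z)
    new_rows = []
    for row in foo:
        # candidates for this row = all indices minus the ones already present
        remaining = np.setdiff1d(all_idx, row)
        new_rows.append(rng.choice(remaining, size=X, replace=False))
    return np.hstack((foo, np.stack(new_rows)))


foo = np.array([[83, 5, 85],
                [7, 60, 66],
                [89, 25, 63],
                [58, 67, 47],
                [12, 46, 40]])
foo_new = draw_new_indices(foo, Z=100, X=4, rng=np.random.default_rng(0))
print(foo_new)
```

The loop runs once per row in Python, so it is correct but not fast for large N.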

Related

Predict xgboost model onto raster stack yields error

I am using an xgboost model to predict onto a raster stack. I have successfully used the same approach with CART, xgb and Random Forest models:
library(raster)
library(xgboost)
# create a RasterStack or RasterBrick with a set of predictor layers
logo <- brick(system.file("external/rlogo.grd", package="raster"))
names(logo)
# known presence and absence points
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
# extract values for points
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(pa=xy[,1], extract(logo, xy[,2:3])))
xgb <- xgboost(data = data.matrix(subset(v, select = -c(pa))), label = v$pa,
nrounds = 5)
raster::predict(model = xgb, logo)
But with xgboost I get the following error:
Error in xgb.DMatrix(newdata, missing = missing) :
xgb.DMatrix does not support construction from list

The problem is that predict.xgb.Booster does not accept a data.frame for argument newdata (see ?predict.xgb.Booster). That is unexpected (all common predict.* methods take a data.frame), but we can work around it. I show how to do that below, using the "terra" package instead of the obsolete "raster" package (but the solution is exactly the same for either package).
The example data
library(terra)
library(xgboost)
logo <- rast(system.file("ex/logo.tif", package="terra"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
xy <- rbind(cbind(1, p), cbind(0, a))
v <- extract(logo, xy[,2:3])
xgb <- xgboost(data = data.matrix(v), label=xy[,1], nrounds = 5)
The work-around is to write a prediction function that first coerces the data.frame with "new data" to a matrix, and then pass that function to the SpatRaster method of predict via its fun argument:
xgbpred <- function(model, data, ...) {
predict(model, newdata=as.matrix(data), ...)
}
p <- predict(logo, model=xgb, fun=xgbpred)
plot(p)

Reshaping Time Series Data with Multiple Features for RNNs

I have a time series data set with 3 measurement variables and about 2000 samples. I want to classify samples into 1 of 4 categories with an RNN or 1D CNN model using Keras in R. My problem is that I am unable to reshape the data with the k_reshape() function.
I am following along with Ch. 6 of Deep Learning with R by Chollet & Allaire, but their examples are different enough from my data set that I'm now confused. I've tried to mimic the code from that chapter of the book to no avail. Here's a link to the source code for the chapter.
library(keras)
df <- data.frame()
for (i in c(1:20)) {
time <- c(1:100)
var1 <- runif(100)
var2 <- runif(100)
var3 <- runif(100)
run <- data.frame(time, var1, var2, var3)
run$sample <- i
run$class <- sample(c(1:4), 1)
df <- rbind(df, run)
}
head(df)
# time feature1 feature2 feature3 sample class
# 1 0.4168828 0.1152874 0.0004415961 1 4
# 2 0.7872770 0.2869975 0.8809415097 1 4
# 3 0.7361959 0.5528836 0.7201276931 1 4
# 4 0.6991283 0.1019354 0.8873193581 1 4
# 5 0.8900918 0.6512922 0.3656302236 1 4
# 6 0.6262068 0.1773450 0.3722923032 1 4
k_reshape(df, shape(10, 100, 3))
# Error in py_call_impl(callable, dots$args, dots$keywords) :
# TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'time': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 3
I'm very new to reshaping arrays, but I would like to have an array with the shape (samples, time, features). I would love to hear suggestions on how to properly reshape this array, or guidance on how this data should be treated for a DL model if I'm off base on that front.

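I found two solutions to my question. My confusion stemmed from the k_reshape error message, which I did not know how to interpret: k_reshape cannot convert a data.frame (it arrives in Python as a dict, hence the error), so the data must first be converted to a matrix, and the target shape must match the number of elements (20 samples × 100 time steps × 3 features; my original shape(10, 100, 3) did not).
1. Use the array_reshape() function from the reticulate package.
2. Use the k_reshape() function from keras, but this time with the appropriate shape.
Here is the code I successfully executed: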
# generate data frame
dat <- data.frame()
for (i in c(1:20)) {
time <- c(1:100)
var1 <- runif(100)
var2 <- runif(100)
var3 <- runif(100)
run <- data.frame(time, var1, var2, var3)
run$sample <- i
run$class <- sample(c(1:4), 1)
dat <- rbind(dat, run)
}
dat_m <- as.matrix(dat) # convert data frame to matrix
# time feature1 feature2 feature3 sample class
# 1 0.4168828 0.1152874 0.0004415961 1 4
# 2 0.7872770 0.2869975 0.8809415097 1 4
# 3 0.7361959 0.5528836 0.7201276931 1 4
# 4 0.6991283 0.1019354 0.8873193581 1 4
# 5 0.8900918 0.6512922 0.3656302236 1 4
# 6 0.6262068 0.1773450 0.3722923032 1 4
# solution with reticulate's array_reshape function
dat_array <- reticulate::array_reshape(x = dat_m[,c(2:4)], dim = c(20, 100, 3))
dim(dat_array)
# [1] 20 100 3
class(dat_array)
# [1] "array"
# solution with keras's k_reshape
dat_array_2 <- keras::k_reshape(x = dat_m[,c(2:4)], shape = c(20, 100, 3))
dim(dat_array_2)
# [1] 20 100 3
class(dat_array_2)
# [1] "tensorflow.tensor" "tensorflow.python.framework.ops.Tensor"
# [3] "tensorflow.python.framework.ops._TensorLike" "python.builtin.object"
A few notes:
Conceptually, this reshaping makes more sense to me as a cast or spreading of the data in R parlance.
The output of array_reshape() is of class array, whereas k_reshape() returns a tensorflow tensor object. Both worked for me when creating deep learning networks, but I find the array class much more interpretable.
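reticulate::array_reshape() fills the new array in row-major ("C") order by default, which is why it works on this long-format matrix: consecutive rows of dat_m (the 100 time steps of one sample) land in the same [sample, , ] slice. R's own dim<- fills in column-major order and would mix samples. Since array_reshape follows NumPy semantics, a small NumPy sketch with toy sizes (2 samples × 3 time steps × 2 features, not the data above) shows the difference:

```python
import numpy as np

# Long format like dat_m[, 2:4]: rows are (sample, time) pairs, sample-major,
# columns are features. Value = 100*sample + 10*time + feature, so it can be read back.
n_samples, n_time, n_feat = 2, 3, 2
long = np.array([[100 * s + 10 * t + f for f in range(n_feat)]
                 for s in range(n_samples) for t in range(n_time)])

# Row-major ("C") order, the default of reticulate::array_reshape and numpy.reshape
c_arr = long.reshape((n_samples, n_time, n_feat), order="C")
# Column-major ("F") order, which is how R's dim<- fills an array
f_arr = long.reshape((n_samples, n_time, n_feat), order="F")

print(c_arr[0])  # sample 0: its 3 time steps x 2 features
print(f_arr[0])  # mixes values from sample 0 and sample 1
```

With C order, c_arr[s, t, f] recovers exactly 100*s + 10*t + f, i.e. the (samples, time, features) layout Keras expects.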

Random Forest class probabilities in separate raster layers

I'm using the randomForest package to classify a raster stack of different predictors. Classification works fine, but I also want to retrieve the class probabilities. With my code I only get a RasterLayer with the probability of the first class, but I'd like to get a RasterStack with the class probabilities for each class in one layer.
PRED_train$response <- as.factor(PRED_train$response)
rf <- randomForest(response~., data = PRED_train, na.action = na.omit, confusion = T)
pred_RF <- raster::predict(PRED, rf)
beginCluster()
pred_RF <- clusterR(PRED, predict, args = list(rf,type="prob"))
endCluster()

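The first place to look is ?raster::predict, which has an example that shows how to do that. The key is the index argument: with type='prob' the model returns one column per class, and index selects which of those columns become layers (here index=1:2 for the two classes). Here it is: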
library(raster)
logo <- brick(system.file("external/rlogo.grd", package="raster"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(pa=xy[,1], extract(logo, xy[,2:3])))
v$pa <- as.factor(v$pa)
library(randomForest)
rfmod <- randomForest(pa ~., data=v)
rp <- predict(logo, rfmod, type='prob', index=1:2)
spplot(rp)

predict with glmer where new data is a Raster Stack of fixed effects

I have constructed models in glmer and would like to predict these on a RasterStack representing the fixed effects in my model. My glmer model is of the form:
m1<-glmer(Severity ~ x1 + x2 + x3 + (1 | Year) + (1 | Ecoregion), family=binomial( logit ))
As you can see, I have random effects, such as 'Year', that I don't have as spatial layers. So the problem is really predicting with glmer on a RasterStack when there are no layers for the random effects. If I use it out of the box without adding my random effects, I get an error:
m1.predict=predict(object=all.var, model=m1, type='response', progress="text", format="GTiff")
Error in predict.averaging(model, blockvals, ...) :

Your question is very brief, and does not indicate what, if any, trouble you have encountered. This seems to work 'out of the box', but perhaps not in your case. See ?raster::predict for options.
library(raster)
# example data. See ?raster::predict
logo <- brick(system.file("external/rlogo.grd", package="raster"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(pa=xy[,1], extract(logo, xy[,2:3])))
v$Year <- sample(2000:2001, nrow(v), replace=TRUE)
library(lme4)
m <- lmer(pa ~ red + blue + (1 | Year), data=v)
# here adding Year as a constant, as it is not a variable (RasterLayer) in the RasterStack object
x <- predict(logo, m, const=(data.frame(Year=2000)))

If you don't have the random effects, just use re.form=~0 in your predict call to predict at the population level:
x <- predict(logo, m, re.form=~0)
This works without complaint for me with @RobertH's example (although I don't know whether the result is correct).

How to predict new raster using model generated by cforest

I use a randomForest model to predict class memberships. 'x' consists of 10 classes that I use to train on 'training_predictors', values extracted from a large RasterStack/RasterBrick. The specific line of code is:
r_tree<-randomForest(x ~. , data=training_predictors, ...)
Then I run 'predict' using the model 'r_tree' that I apply to the rasterstack 'predictor_data', as follow:
predictions <- predict(predictor_data, r_tree, filename=outraster, fun=predict, na.rm=TRUE, format="PCDISK", overwrite=TRUE, progress="text", type="response")
The output is a raster that I use as thematic map.
I would like to use the conditional inference forest model 'cforest' instead of randomForest to achieve the same goals.
I understand that 'predict' can be used with cforest, yet, I have not been able to generate raster files, such as those with randomForest as illustrated above.

It should run fine, but you may need to add the argument OOB=TRUE, and identify factors if there are any.
Example data
library(raster)
logo <- brick(system.file("external/rlogo.grd", package="raster"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
# extract values for points
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(xy[,1], extract(logo, xy[,2:3])))
colnames(v)[1] <- 'pa'
Basic model
library(party)
m1 <- cforest(pa~., control=cforest_unbiased(mtry=3), data=v)
pc1 <- predict(logo, m1, OOB=TRUE)
plot(pc1)
Model with factors
v$red <- as.factor(round(v$red/100))
logo$red <- round(logo[[1]]/100)
m2 <- cforest(pa~., control=cforest_unbiased(mtry=3), data=v)
f <- list(levels(v$red))
names(f) <- 'red'
pc2 <- predict(logo, m2, OOB=TRUE, factors=f)
plot(pc2)
By the way, this comes almost straight out of the help file of raster::predict.
