I will play in the Brazilian Lottery with my friends. I asked each of them to choose seven numbers and created a variable for each one.
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
Then I created a vector with all of the numbers, and a vector of the unique numbers (to avoid repeated numbers):
list = c(carol, davi, pestana, valerio, victor)
list2 = unique(list)
Finally, I called sample() on list2:
sample(list2, 7)
After that, I started wondering: is it possible to avoid unique() and still not get repeated numbers? With unique(), every number has the same probability of being drawn, when in fact numbers chosen by several people should be more likely (for instance, 7 was chosen three times).
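To make the difference concrete, here is a short base-R sketch using the five players' picks above (diuli and cynara are left out because their picks are not shown). It counts how often each number was picked and compares a number's share of the pool with and without unique(). The names pool, counts and weights are just illustrative.

```r
# The five players' picks from the question
pestana <- c(4, 15, 29, 36, 54, 25, 7)
carol   <- c(7, 22, 30, 35, 44, 51, 57)
davi    <- c(8, 13, 21, 29, 37, 42, 55)
valerio <- c(30, 20, 33, 14, 7, 41, 54)
victor  <- c(9, 11, 26, 33, 38, 52, 57)

pool   <- c(carol, davi, pestana, valerio, victor)  # 35 picks in total
counts <- table(pool)                               # how often each number was chosen

# Numbers chosen by more than one player:
# 7 appears 3 times; 29, 30, 33, 54 and 57 appear twice
counts[counts > 1]

# Share of all picks held by each number, e.g. 7 holds 3 of the 35 picks
weights <- counts / sum(counts)
as.numeric(weights["7"])    # 3/35

# After unique(), each of the 28 distinct numbers has the same share
1 / length(unique(pool))    # 1/28
```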
How about this:
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
list = c(carol, davi, pestana, valerio, victor)
l <- c(unlist(list))
nums <- table(l)
probs <- nums/sum(nums)
sample(names(probs), 7, prob = probs, replace=FALSE)
#> [1] "4" "33" "44" "11" "29" "52" "8"
Created on 2022-12-14 by the reprex package (v2.0.1)
The prob argument weights each value by how often it was picked, so numbers chosen by several people are more likely to be drawn than the others.
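A slightly tidier variant of the same idea, as a sketch: it samples the distinct numbers directly with their raw counts as weights (sample() rescales prob, so the weights need not sum to 1) and returns integers instead of the character names of the table. The set.seed() call is only there to make the draw reproducible. Note that with replace = FALSE, ?sample documents that the weights are applied sequentially: each successive draw is proportional to the weights of the numbers not yet drawn.

```r
pestana <- c(4, 15, 29, 36, 54, 25, 7)
carol   <- c(7, 22, 30, 35, 44, 51, 57)
davi    <- c(8, 13, 21, 29, 37, 42, 55)
valerio <- c(30, 20, 33, 14, 7, 41, 54)
victor  <- c(9, 11, 26, 33, 38, 52, 57)
pool    <- c(carol, davi, pestana, valerio, victor)

counts  <- table(pool)                   # times each distinct number was picked
numbers <- as.integer(names(counts))     # distinct numbers, back as integers

set.seed(2022)                           # only for a reproducible draw
draw <- sample(numbers, 7, prob = as.vector(counts), replace = FALSE)
sort(draw)
```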
I am using an xgboost model to predict onto a raster stack. I have successfully used the same approach with CART, xgb and Random Forest models:
library(raster)
library(xgboost)
# create a RasterStack or RasterBrick with a set of predictor layers
logo <- brick(system.file("external/rlogo.grd", package="raster"))
names(logo)
# known presence and absence points
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
# extract values for points
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(pa=xy[,1], extract(logo, xy[,2:3])))
xgb <- xgboost(data = data.matrix(subset(v, select = -c(pa))), label = v$pa,
nrounds = 5)
raster::predict(model = xgb, logo)
But with xgboost I get the following error:
Error in xgb.DMatrix(newdata, missing = missing) :
xgb.DMatrix does not support construction from list
The problem is that predict.xgb.Booster does not accept a data.frame for its newdata argument (see ?predict.xgb.Booster). That is unexpected, since most common predict.* methods take a data.frame, but we can work around it. Below I show how, using the "terra" package instead of the obsolete "raster" package (the solution is the same for either package).
The example data
library(terra)
library(xgboost)
logo <- rast(system.file("ex/logo.tif", package="terra"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
xy <- rbind(cbind(1, p), cbind(0, a))
v <- extract(logo, xy[,2:3])
xgb <- xgboost(data = data.matrix(v), label=xy[,1], nrounds = 5)
The workaround is to write a prediction function that first coerces the data.frame of new data to a matrix. We can then pass that function to predict() for a SpatRaster through its fun argument:
xgbpred <- function(model, data, ...) {
predict(model, newdata=as.matrix(data), ...)
}
p <- predict(logo, model=xgb, fun=xgbpred)
plot(p)
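To see why the wrapper is needed without installing xgboost or terra, here is a base-R sketch. The model class matrix_only_model, its fitting function and the adapter matrix_pred() are all hypothetical stand-ins: the predict method rejects data.frames the way predict.xgb.Booster does, and the adapter has the same shape as xgbpred() above, coercing the data.frame chunk to a matrix before predicting.

```r
# Hypothetical model class whose predict method, like predict.xgb.Booster,
# accepts only a numeric matrix and refuses a data.frame
fit_matrix_only <- function(X, y) {
  beta <- qr.solve(cbind(1, X), y)        # least-squares coefficients
  structure(list(beta = beta), class = "matrix_only_model")
}

predict.matrix_only_model <- function(object, newdata, ...) {
  if (!is.matrix(newdata)) stop("newdata must be a matrix")
  drop(cbind(1, newdata) %*% object$beta)
}

# Same pattern as xgbpred(): coerce the data.frame to a matrix first
matrix_pred <- function(model, data, ...) {
  predict(model, newdata = as.matrix(data), ...)
}

# Toy "cell values": three bands for 10 cells, stored as a data.frame,
# which is the kind of object a raster predict passes to its fun
set.seed(1)
cells <- data.frame(red = runif(10), green = runif(10), blue = runif(10))
y     <- 2 * cells$red - cells$blue + 0.5

mod <- fit_matrix_only(as.matrix(cells), y)

# Calling predict directly with the data.frame fails ...
res <- tryCatch(predict(mod, newdata = cells), error = conditionMessage)
# ... while the adapter works
p <- matrix_pred(mod, cells)
```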
I have this dataframe:
df <- structure(list(a = c(2, 5, 90, 77, 56, 65, 85, 75, 12, 24, 52,
32), b = c(45, 78, 98, 55, 63, 12, 23, 38, 75, 68, 99, 73), c = c(77,
85, 3, 22, 4, 69, 86, 39, 78, 36, 96, 11), d = c(52, 68, 4, 25,
79, 120, 97, 20, 7, 19, 37, 67), e = c(14, 73, 91, 87, 94, 38,
1, 685, 47, 102, 666, 74)), class = "data.frame", row.names = c(NA,
-12L))
and the script:
R <- Map(`+`, list(1:3), 0:3)
df_cum <- as.matrix(rep(NA, ncol(df)))
for (r in seq(R)) {
for (f in seq(ncol(df))) {
df_cum <- sapply(df[R[[r]],], function(x) (cumprod(1 + x) - 1)*100)
}
}
I want to change all the values in the first row of each window (1:3, 2:4, 3:5, ...) to 0 before running
df_cum <- sapply(df[R[[r]],], function(x) (cumprod(1 + x) - 1)*100)
That is, for the first window (rows 1:3 of df), the first-row values should change from "2, 45, 77, 52, 14" to "0, 0, 0, 0, 0".
How can I do this?
Thanks
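A minimal sketch of one way to do this, assuming the goal is one cumulative-product matrix per window: copy each window, set its first row to 0, then apply the same formula. Using lapply() also keeps every window's result in a list; in the loop above, df_cum is overwritten on every pass, so only the last window survives (and the inner for (f ...) loop repeats the same assignment).

```r
df <- data.frame(
  a = c(2, 5, 90, 77, 56, 65, 85, 75, 12, 24, 52, 32),
  b = c(45, 78, 98, 55, 63, 12, 23, 38, 75, 68, 99, 73),
  c = c(77, 85, 3, 22, 4, 69, 86, 39, 78, 36, 96, 11),
  d = c(52, 68, 4, 25, 79, 120, 97, 20, 7, 19, 37, 67),
  e = c(14, 73, 91, 87, 94, 38, 1, 685, 47, 102, 666, 74)
)

R <- Map(`+`, list(1:3), 0:3)   # row windows 1:3, 2:4, 3:5, 4:6

df_cum <- lapply(R, function(rows) {
  win <- df[rows, ]
  win[1, ] <- 0                 # zero the first row of this window
  sapply(win, function(x) (cumprod(1 + x) - 1) * 100)
})

df_cum[[1]]   # 3 x 5 matrix for rows 1:3, first row all 0
```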
I have a time series data set with 3 measurement variables and about 2000 samples. I want to classify samples into 1 of 4 categories with an RNN or 1D CNN model using Keras in R. My problem is that I am unable to reshape the data with the k_reshape() function.
I am following along with Ch. 6 of Deep Learning with R by Chollet & Allaire, but their examples are different enough from my data set that I'm now confused. I've tried to mimic the code from that chapter of the book, to no avail. Here's a link to the source code for the chapter.
library(keras)
df <- data.frame()
for (i in c(1:20)) {
time <- c(1:100)
var1 <- runif(100)
var2 <- runif(100)
var3 <- runif(100)
run <- data.frame(time, var1, var2, var3)
run$sample <- i
run$class <- sample(c(1:4), 1)
df <- rbind(df, run)
}
head(df)
# time var1 var2 var3 sample class
# 1 0.4168828 0.1152874 0.0004415961 1 4
# 2 0.7872770 0.2869975 0.8809415097 1 4
# 3 0.7361959 0.5528836 0.7201276931 1 4
# 4 0.6991283 0.1019354 0.8873193581 1 4
# 5 0.8900918 0.6512922 0.3656302236 1 4
# 6 0.6262068 0.1773450 0.3722923032 1 4
k_reshape(df, shape(10, 100, 3))
# Error in py_call_impl(callable, dots$args, dots$keywords) :
# TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'time': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 3
I'm very new to reshaping arrays, but I would like to have an array with the shape (samples, time, features). I would love to hear suggestions on how to reshape this properly, or guidance on how this data should be treated for a DL model if I'm off base on that front.
I found two solutions to my question. My confusion stemmed from the k_reshape error message, which I did not know how to interpret.
Use the array_reshape() function from the reticulate package.
Use the k_reshape() function from keras, but this time on a numeric matrix (not a data frame) and with the appropriate shape.
Here is the code I successfully executed:
# generate data frame
dat <- data.frame()
for (i in c(1:20)) {
time <- c(1:100)
var1 <- runif(100)
var2 <- runif(100)
var3 <- runif(100)
run <- data.frame(time, var1, var2, var3)
run$sample <- i
run$class <- sample(c(1:4), 1)
dat <- rbind(dat, run)
}
dat_m <- as.matrix(dat) # convert data frame to matrix
# time var1 var2 var3 sample class
# 1 0.4168828 0.1152874 0.0004415961 1 4
# 2 0.7872770 0.2869975 0.8809415097 1 4
# 3 0.7361959 0.5528836 0.7201276931 1 4
# 4 0.6991283 0.1019354 0.8873193581 1 4
# 5 0.8900918 0.6512922 0.3656302236 1 4
# 6 0.6262068 0.1773450 0.3722923032 1 4
# solution with reticulate's array_reshape function
dat_array <- reticulate::array_reshape(x = dat_m[,c(2:4)], dim = c(20, 100, 3))
dim(dat_array)
# [1] 20 100 3
class(dat_array)
# [1] "array"
# solution with keras's k_reshape
dat_array_2 <- keras::k_reshape(x = dat_m[,c(2:4)], shape = c(20, 100, 3))
class(dat_array_2)
# [1] "tensorflow.tensor" "tensorflow.python.framework.ops.Tensor"
# [3] "tensorflow.python.framework.ops._TensorLike" "python.builtin.object"
A few notes:
Conceptually, this reshaping makes more sense to me as a cast or spreading of the data in R parlance.
The output of array_reshape() is an array, but k_reshape() outputs a TensorFlow tensor object. Both worked for me in the deep learning networks I created, but I find the array class much more interpretable.
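For reference, the same (samples, time, features) array can be built in base R, as a sketch. Because R fills arrays column-major, a plain dim(X) <- c(20, 100, 3) would scramble samples; instead the matrix is filled as (time, sample, feature) and then permuted with aperm(). Under the assumption that rows are ordered by sample and then by time (as in the loop above), this should give the same layout as reticulate::array_reshape() with its default row-major ("C") order. The names X, x_array and y are illustrative.

```r
# Simulated data in the same layout as above: 20 samples x 100 time steps,
# rows ordered by sample, then by time
set.seed(1)
dat <- do.call(rbind, lapply(1:20, function(i) {
  data.frame(time = 1:100,
             var1 = runif(100), var2 = runif(100), var3 = runif(100),
             sample = i, class = sample(1:4, 1))
}))

X <- as.matrix(dat[, c("var1", "var2", "var3")])  # 2000 x 3 feature matrix

# Fill column-major as (time, sample, feature) ...
arr <- array(X, dim = c(100, 20, 3))
# ... then permute to (samples, time, features)
x_array <- aperm(arr, c(2, 1, 3))
dim(x_array)

# One label per sample (class is constant within a sample)
y <- dat$class[dat$time == 1]
```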
I'm using the randomForest package to classify a raster stack of different predictors. Classification works fine, but I also want to retrieve the class probabilities. With my code I only get a RasterLayer with the probability of the first class, but I'd like to get a RasterStack with one layer of probabilities per class.
library(raster)
library(randomForest)
PRED_train$response <- as.factor(PRED_train$response)
rf <- randomForest(response ~ ., data = PRED_train, na.action = na.omit, confusion = TRUE)
pred_RF <- raster::predict(PRED, rf)
beginCluster()
pred_RF <- clusterR(PRED, predict, args = list(rf, type = "prob"))
endCluster()
The first place to look is ?raster::predict, which has an example showing how to do this. Here it is:
library(raster)
logo <- brick(system.file("external/rlogo.grd", package="raster"))
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85,
66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31,
22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)
xy <- rbind(cbind(1, p), cbind(0, a))
v <- data.frame(cbind(pa=xy[,1], extract(logo, xy[,2:3])))
v$pa <- as.factor(v$pa)
library(randomForest)
rfmod <- randomForest(pa ~., data=v)
# type = 'prob' with index = 1:2 returns one probability layer per class (two classes here)
rp <- predict(logo, rfmod, type='prob', index=1:2)
spplot(rp)