Concatenate layers in R Keras - r

I have this BERT classifier, where I want to concatenate the BERT output with additional features (hot-coded, 13 categories).
I get this error message which I do not understand - the arguments specified are all named.
input_word_ids <- layer_input(shape = c(set.max_length), dtype = 'int32', name = "input_word_ids")
input_mask <- layer_input(shape = c(set.max_length), dtype = 'int32', name = "input_attention_mask")
input_topic <- layer_input(shape = c(13), dtype = 'int32', name = "input_topic")
last_hidden_state <- model_tf(input_word_ids, attention_mask = input_mask)[[1]] # shape=(None, 512, 768)
cls_token <- last_hidden_state[, 1,] # shape=(None, 768)
output <- cls_token %>%
layer_concatenate(inputs = list(cls_token, input_topic), axis = -1)
Error in assert_all_dots_named(envir, cl) :
All arguments provided to `...` must be named.
Call with unnamed arguments in dots:
layer_concatenate(inputs = list(cls_token, input_topic), axis = -1, .)
If I run layer_concatenate(inputs = list(cls_token, input_topic)) [without the axis argument],
I get
Error in modifiers[[nm]](args[[nm]]) :
cannot coerce type 'environment' to vector of type 'integer'
The first error message stems from the Keras package (assert_all_dots_named(), line 435, https://github.com/rstudio/keras/blob/main/R/utils.R) if I am not mistaken
I read the Keras vignette, I don't see what I am doing wrong...
Any help is highly appreciated, many thanks in advance!

I was able to solve it on my own - cls_token %>% was the problem. A conflict of Keras functional api and maggritr-piping I suppose. cls_token %>% was used by layer_concatenate() as another "unnamed" input, therefore the error message.
Solution:
output <- layer_concatenate(inputs = list(cls_token, input_topic), axis = 1) %>%
layer_dropout(rate = set.dropout)

Related

R package {gtsummary} - Can't convert .f (NULL) to function using custom stat function

I'm using gtsummary::tbl_custom_summary to present data analyzed through a test function I've written.
I'm getting this error:
Error in `mutate()`:
! Problem while computing `df_stats = pmap(...)`.
Caused by error in `as_group_map_function()`:
! Can't convert `.f`, NULL, to a function.
Backtrace:
1. gtsummary::tbl_custom_summary(...)
18. dplyr:::group_modify.grouped_df(...)
19. dplyr:::as_group_map_function(.f)
20. rlang::as_function(.f)
I haven't been able to tell if it has to do with data masking/quotations (which I tried to implement but, honestly, haven't understood much of…), or if it something related to gtsummary.
This is the code—please note the custom function's argument are modeled after tbl_custom_summary's requirements and have convenience defaults at the moment.
library(tidyverse)
library(infer)
library(gtsummary)
# Custom function supposedly uses infer package functions to determine the difference
# in medians and its bootstrapped CIs, returning a DF with values specified and
# 'glued' in the tbl_custom_summary call (statistic argument).
testdiffCI <- function(group_data,
full_data,
variable,
by,
type,
stat_display,
...,
point_stat = "median",
# This default is in order to use mtcars data
order = c("0", "1"),
ci_type = "bias-corrected"
) {
# change the function arg into one compatible with infer syntax, i.e. string
stat <- str_glue("diff in {point_stat}s") |> toString()
# These variables are passed as strings (names). Convert them to symbols.
variable <- sym(variable)
by <- sym(by)
# Calculate point estimate:
point <- full_data |>
# Trying to use shorthand defuse|>inject operator {{
# I'm really not sure if this is correct
specify(response={{variable}}, explanatory = {{by}}) |>
calculate(stat = stat, order = order) |> suppressWarnings()
# Bootstrap (population) object
boot <- full_data |>
specify(response = {{variable}}, explanatory = {{by}}) |>
generate(reps = 1000, type = "bootstrap") |>
calculate(stat = stat, order = order)
ci <- get_confidence_interval(boot,
type = ci_type,
point_estimate = point
)
# Return a tibble; column names are the same as those appearing in the statistic
# argument in tbl_custom_summary call
tibble(
diff = point$stat,
ci_lo = ci$lower_ci,
ci_up = ci$upper_ci
)
}
This is the call to the summary function:
mtcars |>
mutate(vs = as.factor(vs)) |>
tbl_custom_summary(
by = vs,
stat_fns = list(all_continuous() ~ testdiffCI),
statistic = ~ "{diff} ({ci_lo} – {ci_hi})"
)
Thanks for any insights you guys may share!

Create a multivariate matrix in tidymodels recipes::recipe()

I am trying to do a k-fold cross validation on a model that predicts the joint distribution of the proportion of tree species basal area from satellite imagery. This requires the use of the DiricihletReg::DirichReg() function, which in turn requires that the response variables be prepared as a matrix using the DirichletReg::DR_data() function. I originally tried to accomplish this in the caret:: package, but I found out that caret:: does not support multivariate responses. I have since tried to implement this in the tidymodels:: suite of packages. Following the documentation on how to register a new model in the parsnip:: (I appreciate Max Kuhn's vegetable humor) package, I created a "DREG" model and a "DR" engine. My registered model works when I simply call it on a single training dataset, but my goal is to do kfolds cross-validation, implementing the vfolds_cv(), a workflow(), and the 'fit_resample()' function. With the code I currently have I get warning message stating:
Warning message:
All models failed. See the `.notes` column.
Those notes state that Error in get(resp_char, environment(oformula)): object 'cbind(PSME, TSHE, ALRU2)' not found This, I believe is due to the use of DR_data() to preprocess the response variables into the format necessary for Dirichlet::DirichReg() to run properly. I think the solution I need to implement involve getting this pre-processing to happen in either the recipe() call or in the set_fit() call when I register this model with parsnip::. I have tried to use the step_mutate() function when specifying the recipe, but that performs a function on each column as opposed to applying the function with the columns as inputs. This leads to the following error in the "notes" from the output of fit_resample():
Must subset columns with a valid subscript vector.
Subscript has the wrong type `quosures`.
It must be numeric or character.
Is there a way to get the recipe to either transform several columns to a DirichletRegData class using the DR_data() function with a step_*() function or using the pre= argument in set_fit() and set_pred()?
Below is my reproducible example:
##Loading Necessary Packages##
library(tidymodels)
library(DirichletReg)
##Creating Fake Data##
set.seed(88)#For reproducibility
#Response variables#
PSME_BA<-rnorm(100,50, 15)
TSHE_BA<-rnorm(100,40,12)
ALRU2_BA<-rnorm(100,20,0.5)
Total_BA<-PSME_BA+TSHE_BA+ALRU2_BA
#Predictor variables#
B1<-runif(100, 0, 2000)
B2<-runif(100, 0, 1800)
B3<-runif(100, 0, 3000)
#Dataset for modeling#
DF<-data.frame(PSME=PSME_BA/Total_BA, TSHE=TSHE_BA/Total_BA, ALRU2=ALRU2_BA/Total_BA,
B1=B1, B2=B2, B3=B3)
##Modeling the data using Dirichlet regression with repeated k-folds cross validation##
#Registering the model to parsnip::#
set_new_model("DREG")
set_model_mode(model="DREG", mode="regression")
set_model_engine("DREG", mode="regression", eng="DR")
set_dependency("DREG", eng="DR", pkg="DirichletReg")
set_model_arg(
model = "DREG",
eng = "DR",
parsnip = "param",
original = "model",
func = list(pkg = "DirichletReg", fun = "DirichReg"),
has_submodel = FALSE
)
DREG <-
function(mode = "regression", param = NULL) {
# Check for correct mode
if (mode != "regression") {
rlang::abort("`mode` should be 'regression'")
}
# Capture the arguments in quosures
args <- list(sub_classes = rlang::enquo(param))
# Save some empty slots for future parts of the specification
new_model_spec(
"DREG",
args=args,
eng_args = NULL,
mode = mode,
method = NULL,
engine = NULL
)
}
set_fit(
model = "DREG",
eng = "DR",
mode = "regression",
value = list(
interface = "formula",
protect = NULL,
func = c(pkg = "DirichletReg", fun = "DirichReg"),
defaults = list()
)
)
set_encoding(
model = "DREG",
eng = "DR",
mode = "regression",
options = list(
predictor_indicators = "none",
compute_intercept = TRUE,
remove_intercept = TRUE,
allow_sparse_x = FALSE
)
)
set_pred(
model = "DREG",
eng = "DR",
mode = "regression",
type = "numeric",
value = list(
pre = NULL,
post = NULL,
func = c(fun = "predict.DirichletRegModel"),
args =
list(
object = expr(object$fit),
newdata = expr(new_data),
type = "response"
)
)
)
##Running the Model##
DF$Y<-DR_data(DF[,c(1:3)]) #Preparing the response variables
dreg_spec<-DREG(param="alternative") %>%
set_engine("DR")
dreg_mod<-dreg_spec %>%
fit(Y~B1+B2+B3, data = DF)#Model works when simply run on single dataset
##Attempting Crossvalidation##
#First attempt - simply call Y as the response variable in the recipe#
kfolds<-vfold_cv(DF, v=10, repeats = 2)
rcp<-recipe(Y~B1+B2+B3, data=DF)
dreg_fit<- workflow() %>%
add_model(dreg_spec) %>%
add_recipe(rcp)
dreg_rsmpl<-dreg_fit %>%
fit_resamples(kfolds)#Throws warning about all models failing
#second attempt - use step_mutate_at()#
rcp<-recipe(~B1+B2+B3, data=DF) %>%
step_mutate_at(fn=DR_data, var=vars(PSME, TSHE, ALRU2))
dreg_fit<- workflow() %>%
add_model(dreg_spec) %>%
add_recipe(rcp)
dreg_rsmpl<-dreg_fit %>%
fit_resamples(kfolds)#Throws warning about all models failing
This works, but I'm not sure if it's what you were expecting.
First--getting the data setup for CV and DR_data()
I don't know of any package that has built what would essentially be a translation for CV and DirichletReg. Therefore, that part is manually done. You might be surprised to find it's not all that complicated.
Using the data you created and the modeling objects you created for tidymodels (those prefixed with set_), I created the CV structure that you were trying to use.
df1 <- data.frame(PSME = PSME_BA/Total_BA, TSHE = TSHE_BA/Total_BA,
ALRU2=ALRU2_BA/Total_BA, B1, B2, B3)
set.seed(88)
kDf2 <- kDf1 <- vfold_cv(df1, v=10, repeats = 2)
For each of the 20 subset data frames identified in kDf2, I used DR_data to set the data up for the models.
# convert to DR_data (each folds and repeats)
df2 <- map(1:20,
.f = function(x){
in_ids = kDf1$splits[[x]]$in_id
dd <- kDf1$splits[[x]]$data[in_ids, ] # filter rows BEFORE DR_data
dd$Y <- DR_data(dd[, 1:3])
kDf1$splits[[x]]$data <<- dd
})
Because I'm not all that familiar with tidymodels, next conducted the modeling using DirichReg. I then did it again with tidymodels and compared them. (The output is identical.)
DirichReg Models and summaries of the fits
set.seed(88)
# perform crossfold validation on Dirichlet Model
df2.fit <- map(1:20,
.f = function(x){
Rpt = kDf1$splits[[x]]$id$id
Fld = kDf1$splits[[x]]$id$id2
daf = kDf1$splits[[x]]$data
fit = DirichReg(Y ~ B1 + B2, daf)
list(Rept = Rpt, Fold = Fld, fit = fit)
})
# summary of each fitted model
fit.a <- map(1:20,
.f = function(x){
summary(df2.fit[[x]]$fit)
})
tidymodels and summaries of the fits (the code looks the same, but there are a few differences--the output is the same, though)
# I'm not sure what 'alternative' is supposed to do here?
dreg_spec <- DREG(param="alternative") %>% # this is not model = alternative
set_engine("DR")
set.seed(88)
dfa.fit <- map(1:20,
.f = function(x){
Rpt = kDf1$splits[[x]]$id$id
Fld = kDf1$splits[[x]]$id$id2
daf = kDf1$splits[[x]]$data
fit = dreg_spec %>%
fit(Y ~ B1 + B2, data = daf)
list(Rept = Rpt, Fold = Fld, fit = fit)
})
afit.a <- map(1:20,
.f = function(x){
summary(dfa.fit[[x]]$fit$fit) # extra nest for parsnip
})
If you wanted to see the first model?
fit.a[[1]]
afit.a[[1]]
If you wanted the model with the lowest AIC?
# comare AIC, BIC, and liklihood?
# what do you percieve best fit with?
fmin = min(unlist(map(1:20, ~fit.a[[.x]]$aic))) # dir
# find min AIC model number
paste0((map(1:20, ~ifelse(fit.a[[.x]]$aic == fmin, .x, ""))), collapse = "")
fit.a[[19]]
afit.a[[19]]

Weird characters appearing in the plot legend when using DoHeatmap

I was using Seurat to analyse single cell RNA-seq data and I managed to draw a heatmap plot with DoHeatmap() after clustering and marker selection, but got a bunch of random characters appearing in the legend. They are random characters as they will change every time you run the code. I was worrying over it's something related to my own dataset, so I then tried the test Seurat object 'ifnb' but still got the same issue (see the red oval in the example plot).
example plot
I also tried importing the Seurat object in R in the terminal (via readRDS) and ran the plotting function, but got the same issue there, so it's not a Rstudio thing.
Here are the codes I ran:
'''
library(Seurat)
library(SeuratData)
library(patchwork)
InstallData("ifnb")
LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
x <- NormalizeData(x)
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
features <- SelectIntegrationFeatures(object.list = ifnb.list)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
immune.combined <- IntegrateData(anchorset = immune.anchors)
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
DefaultAssay(immune.combined) <- 'RNA'
immune_markers <- FindAllMarkers(immune.combined, latent.vars = "stim", test.use = "MAST", assay = 'RNA')
immune_markers %>%
group_by(cluster) %>%
top_n(n = 10, wt = avg_log2FC) -> top10_immune
DoHeatmap(immune.combined, slot = 'data',features = top10_immune$gene, group.by = 'stim', assay = 'RNA')
'''
Does anyone have any idea how to solve this issue other than reinstalling everything?
I have been having the same issue myself and while I have solved it by not needing the legend, I think you could use this approach and use a similar solution:
DoHeatmap(immune.combined, slot = 'data',features = top10_immune$gene, group.by = 'stim', assay = 'RNA') +
scale_color_manual(
values = my_colors,
limits = c('CTRL', 'STIM'))
Let me know if this works! It doesn't solve the source of the odd text values but it does the job! If you haven't already, I would recommend creating a forum question on the Seurat forums to see where these characters are coming from!
When I use seurat4.0, I met the same problem.
While I loaded 4.1, it disappeared

GRU (Gated Recurrent Unit) presents errors with running out of data from a generator function and, no data provided for "each key in.." in R (RStudio)

Starter here - no formal background in ML or anything close and a little self taught R. I am using RStudio to do this.
I have set up an elementary G.R.U. onto which I hope to build more process. My first follows the instruction on https://blogs.rstudio.com/tensorflow/posts/2017-12-20-time-series-forecasting-with-recurrent-neural-networks/, to use a generator function for training data and feed into fit_generator(). The second uses no generator and fit.keras.engine.training.Model().
My settings at this point of trial and error are -
With generator function:
library(keras)
model <- keras_model_sequential() %>%
layer_gru(units = 5, input_shape = list(NULL, dim(data)[[-1]])) %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(),
loss = "mae"
)
history <- model %>% fit_generator(
train_gen,
steps_per_epoch = 100,
epochs = 1,
validation_data = val_gen,
validation_steps = val_steps
)
With no generator function:
library(keras)
model <- keras_model_sequential() %>%
layer_gru(units = 32, input_shape = list(NULL, dim(data)[[-1]])) %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(),
loss = "mae"
)
history <- model %>% fit(
x=train_data[,1:2],
y=train_data[,3:4],
epochs = 2,
verbose=1,
shuffle=F,
validation_data = val_data[,1:4]
)
My data are the closing prices of currencies from mt4. To help understand my data the code looks so:
library(tibble)
library(readr)
data_dir <- "C:/Users/elech/Desktop/traindats"
USDCADname <- file.path(data_dir, c("USDCAD.csv"))
USDAUDname <- file.path(data_dir, c("USDAUD.csv"))
USDCHFname <- file.path(data_dir, c("USDCHF.csv"))
USDNZDname <- file.path(data_dir, c("USDNZD.csv"))
USDJPYname <- file.path(data_dir, c("USDJPY.csv"))
USDCAD<- read_csv(USDCADname, col_names = F)
USDAUD<- read_csv(USDAUDname, col_names = F)
USDCHF<- read_csv(USDCHFname, col_names = F)
USDNZD<- read_csv(USDNZDname, col_names = F)
USDJPY<- read_csv(USDJPYname, col_names = F)
Close_USDAUD<-USDAUD$X6
Close_USDCAD<-USDCAD$X6
Close_USDCHF<-USDCHF$X6
Close_USDNZD<-USDNZD$X6
Close_USDJPY<-USDJPY$X6
train_data<-data[1:2000,]
val_data<-data[2001:4000,]
test_data<-data[4001:nrow(data),]
data<-tibble(USDAUD=Close_USDAUD[176:5175], USDCAD=Close_USDCAD[173:5172], USDCHF=Close_USDCHF[176:5175], USDJPY=Close_USDJPY[175:5174], USDNZD=Close_USDNZD[170:5169])
I struggled with the input shape before I used the setup provided by 1, if I could additionally get help with setting up a functional input array that'd be great.
My results that need help have revolved around;
With generator:
Error occurred in generator: argument 'length.out' must be of length 1
WARNING:tensorflow:Your dataset iterator ran out of data; interrupting training. Make sure that your iterator can generate at least `steps_per_epoch * epochs` batches (in this case, 2000 batches). You may need touse the repeat() function when building your dataset.
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Empty training data.
(The 'Detailed Traceback' left out )
Without generator (which ive decided to focus on):
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: No data provided for "gru_8_input". Need data for each key in: ['gru_8_input']
What is gru_8_input? Even Google doesn't know what it is!
Please help.
Plus,
The generator function is from 1.
Thanks.

Keras, LIME: No data provided for "lstm__input" in "explain" function

I receive ValueError: No data provided for "lstm__input". Need data for each key in: ['lstm__input'] while executing lime::explain
I have latest lime version and looked thourgh all related topics to my problem. I have LSTM regression network and I don't have any issues training it and predicting values.
As well I don't have any problems predicting values using lime:
class(model)
%keras.engine.sequential.Sequential
model_type.keras.engine.sequential.Sequential <- function(x, ...) {
"regression"}
predict_model.keras.engine.sequential.Sequential <- function (x, newdata, type, ...) {
pred <- predict(object = x, x = newdata)
data.frame (pred) }
predict_model(x = model, newdata = (testX_Matrix), type = 'raw')
explainer <- lime::lime (
x = trainX,
model = model,
bin_continuous = FALSE)
Up to this point everything works fine. When I try to run next lines:
explanation <- lime::explain (
testX,
explainer = explainer,
n_features = 4)
I receive an error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: No data provided for "lstm_8_input". Need data for each key in: ['lstm_8_input']
Detailed traceback:
File "C:\Soft\anaconda\envs\R-TENS~1\lib\site-packages\keras\engine\training.py", line 1147, in predict
x, _, _ = self._standardize_user_data(x)
File "C:\Soft\anaconda\envs\R-TENS~1\lib\site-packages\keras\engine\training.py", line 749, in _standardize_user_data
exception_prefix='input')
File "C:\Soft\anaconda\envs\R-TENS~1\lib\site-packages\keras\engine\training_utils.py", line 77, in standardize_input_data
'for each key in: ' + str(names))
I guess main problem comes from the fact that my NN requires only 3-dim matrix (the same I used in predict_model), but lime doesn't work't with objects of class 'array', 'double', 'numeric' so I can't use it.
Has anyone faced such issue? How can I fix it?

Resources