Simple BERT implementation on R side - r

I would like to implement a simple text classification using BERT model on R side. On my local side, I have no progress because of this issue. Therefore, I moved to kaggle platform. Also, I have started to replicate this kaggle kernel. However, I could not succeed to call transformer$TFBertModel because this line throws me the below error. As you might guess that there are few tutorials BERT on R side, so I am also appreciated for any simple BERT text classification tutorial based on R (over kaggle).
ERROR
Error in py_get_attr_impl(x, name, silent): RuntimeError: Failed to import transformer...
as:
library(tidyverse)
library(reticulate)
library(LiblineaR)
library(tidymodels)
library(rsample)
reticulate::py_install('transformers', pip = TRUE)
transformer = reticulate::import('transformers')
tf = reticulate::import('tensorflow')
builtins <- import_builtins()
tokenizer <- transformer$AutoTokenizer$from_pretrained('bert-base-uncased')
BERT = transformer$TFBertModel$from_pretrained("bert-base-uncased")
print("completed...")

Related

Using Huggingface Transformer Models in R

I am trying to use different Huggingface models in R.
This works by importing the transformers package through reticulate (thank you, https://rpubs.com/eR_ic/transfoRmers)
Models where inputs just require a single string work for me.
Some models require a lists or a vector and I simply don't know where to get the information on how exactly to call the model.
Take this model for example. https://huggingface.co/openai/clip-vit-base-patch32.
From the python example I know it takes a picture and (I assume) a character vector of possible classes.
The Python input is: text=["a photo of a cat", "a photo of a dog"], images=image
library(reticulate)
library(here)
library(tidyverse)
transformers <- reticulate::import("transformers")
image_classification_zero_shot <- transformers$pipeline(task = "zero-shot-image-classification", model = "openai/clip-vit-base-patch32")
image_classification <- transformers$pipeline(task = "image-classification", model = "microsoft/beit-base-patch16-224-pt22k-ft22k")
image_url <- "http://images.cocodataset.org/val2017/000000039769.jpg"
The model just requiring the image works
image_classification(images = image_url)
The model which also requires a character input with the classes does not work.
image_classification_zero_shot(text = c("cats", "dogs"), images = image_url)
image_classification_zero_shot(text = "[cats, dogs]", images = image_url)
> Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: object of type 'NoneType' has no len()
View(image_classification_zero_shot) does not yield any information.
How do I get the zero shot model to work?
How do I generally get the information on how to call these models in R? It's a function, shouldn't I be able to find information about its parameters somewhere (in R or on huggingface)?
Thank you very much!
I am experiencing a similar issue with another huggingface transformer called "jonas/sdg_classifier_osdg".
Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: linear(): argument 'input' (position 1) must be Tensor, not
BatchEncoding
Solutions:
To write python code within a rmarkdown notebook
Activate repl_python() on the console and then write python code.
The same code generates a tensor when written in python but a character string when written in R.
Further Questions:
How to effectively transform an R string into a tensor that its understood by pythorch? (When I try to load torch together with reticulate, R crashes)

MXNet Time-series Example - Dropout Error when running locally

I am looking into using MXNet LSTM modelling for time-series analysis for a problem i am currently working on.
As a way of understanding how to implement this, I am following the example code given by xnNet from the link: https://mxnet.incubator.apache.org/tutorials/r/MultidimLstm.html
When running this script after downloading the necessary data to my local source, i am able to execute the code fine until i get to the following section to train the model:
## train the network
system.time(model <- mx.model.buckets(symbol = symbol,
train.data = train.data,
eval.data = eval.data,
num.round = 100,
ctx = ctx,
verbose = TRUE,
metric = mx.metric.mse.seq,
initializer = initializer,
optimizer = optimizer,
batch.end.callback = NULL,
epoch.end.callback = epoch.end.callback))
When running this section, the following error occurs once gaining connection to the API.
Error in mx.nd.internal.as.array(nd) :
[14:22:53] c:\jenkins\workspace\mxnet\mxnet\src\operator\./rnn-inl.h:359:
Check failed: param_.p == 0 (0.2 vs. 0) Dropout is not supported at the moment.
Is there currently a problem internally within the XNNet R package which is unable to run this code? I can't imagine they would provide a tutorial example for the package that is not executable.
My other thought is that it is something to do with my local device execution and connection to the API. I haven't been able to find any information about this being a problem for other users though.
Any inputs or suggestions would be greatly appreciated thanks.
Looks like you're running an old version of R package. I think following instructions on this page to build a recent R-package should resolve this issue.

Save the Deep learning model by R language

I have established a deep learning model with the h2o package of the R software. I gained a model with good presence and I wanna to save it. However, I tried all kinds of methods but failed. The code "save()" and "save.image()" are provided in the base package of R software. I used the "save()" function to conserve my model. But when I want to use the built model to run new data, it is said that the "model" object is not found in the function. I am really confused about this problem for a few days. If you have any good ideas, just tell me. Thanks for your reading~
load("F:/R/Rstudy/myfile") ##download the saved file
library(h2o)
h2o.init()
Te <- read.csv("F:/Rdata/Test.csv") ## import testing data
Te <- as.h2o(Te)
Te[,2] <- as.factor(Te[,2])
perf <- h2o.performance(model, Te) ## test model
ERROR: Unexpected HTTP Status code: 404 Not Found (url = http://localhost:54321/3/ModelMetrics/models/DeepLearning_model_R_1533035975237_1/frames/RTMP_sid_8185_2)
ERROR MESSAGE:
Object 'DeepLearning_model_R_1533035975237_1' not found in function: predict for argument: model
You can use the below to save and retrieve the model.
build the model
model <- h2o.deeplearning(params)
save the model
model_path <- h2o.saveModel(object=model, path=getwd(), force=TRUE)
print(model_path)
/tmp/mymodel/DeepLearning_model_R_1441838096933
load the model
saved_model <- h2o.loadModel(model_path)
Reference - http://docs.h2o.ai/h2o/latest-stable/h2o-docs/save-and-load-model.html
Hope this helps,
ND

Error: could not find function "makeLearner" using h2o package

I'm using h2o package and trying to create a learner using the below given code
install.packages("h2o")
library("h2o")
h2o.learner <- makeLearner("regr.h2o.deeplearning",predict.type = "response")
But I'm getting this error
> h2o.learner <- makeLearner("regr.h2o.deeplearning",predict.type = "response")
Error: could not find function "makeLearner"
Note: Few months back I used this code without any problem.
Any idea what could be possible thing for this error?
The correct code for this is simply
library(mlr)
h2o.learner = makeLearner("regr.h2o.deeplearning")
The makeLearner() is not part of H2O. It appears to be part of the mlr package. It also seems that mlr does have h2o support, so it might be as simple as adding a library(mlr) to the top of your script? (Making sure that the mlr package has been installed, already, of course.)

error loading NER .bin file as model argument for openNLP::Maxent_Entity_Annotator()

I created a model using Apache OpenNLP's command line tool to recognize named entities. The below code created the model using the file sentences4OpenNLP.txt as a training set.
opennlp TokenNameFinderTrainer -type maxent -model C:\Users\Documents\en-ner-org.bin -lang en -data C:\Users\Documents\apache-opennlp-1.6.0\sentences4OpenNLP.txt -encoding UTF-8
I tested the model from the command line by passing it sentences to tag, and the model seemed to be working well. However, I am unable to successfully use the model from R. I am using the below lines in attempts to create an organization annotating function. Using the same code to load a model downloaded from OpenNLP works fine.
modelNER <- "C:/Users/Documents/en-ner-org.bin"
oa <- openNLP::Maxent_Entity_Annotator(language = "en",
kind = "organization",
probs = TRUE,
model = modelNER)
When the above code is run I get an error saying:
Could not instantiate the opennlp.tools.namefind.TokenNameFinderFactory. The initialization throw an exception.
opennlp.tools.util.ext.ExtensionNotLoadedException: Unable to find implementation for opennlp.tools.util.BaseToolFactory, the class or service opennlp.tools.namefind.TokenNameFinderFactory could not be located!
at opennlp.tools.util.ext.ExtensionLoader.instantiateExtension(ExtensionLoader.java:97)
at opennlp.tools.util.BaseToolFactory.create(BaseToolFactory.java:106)
at opennlp.tools.util.model.BaseModel.initializeFactory(BaseModel.java:254)
Error in .jnew("opennlp.tools.namefind.TokenNameFinderModel", .jcast(.jnew("java.io.FileInputStream", :
java.lang.IllegalArgumentException: opennlp.tools.util.InvalidFormatException: Could not instantiate the opennlp.tools.namefind.TokenNameFinderFactory. The initialization throw an exception.
at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:237)
at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:181)
at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:110)
Any advice on how to fix the error would be a big help. Thanks in advance.
Resolved the error. The R function openNLP::Maxent_Entity_Annotator was not compatible with the named entity recognition (NER) model being produced by OpenNLP 1.6.0. Building the NER model using OpenNLP 1.5.3 resulted in openNLP::Maxent_Entity_Annotator running without error.

Resources