Using Huggingface Transformer Models in R

I am trying to use different Huggingface models in R.
This works by importing the transformers package through reticulate (thank you, https://rpubs.com/eR_ic/transfoRmers).
Models whose input is just a single string work for me.
Some models require a list or a vector, and I simply don't know where to find out how exactly to call them.
Take this model, for example: https://huggingface.co/openai/clip-vit-base-patch32
From the Python example I know it takes a picture and (I assume) a character vector of possible classes.
The Python input is: text=["a photo of a cat", "a photo of a dog"], images=image
library(reticulate)
library(here)
library(tidyverse)
transformers <- reticulate::import("transformers")
image_classification_zero_shot <- transformers$pipeline(task = "zero-shot-image-classification", model = "openai/clip-vit-base-patch32")
image_classification <- transformers$pipeline(task = "image-classification", model = "microsoft/beit-base-patch16-224-pt22k-ft22k")
image_url <- "http://images.cocodataset.org/val2017/000000039769.jpg"
The model that requires only the image works:
image_classification(images = image_url)
The model that also requires a character input with the classes does not work:
image_classification_zero_shot(text = c("cats", "dogs"), images = image_url)
image_classification_zero_shot(text = "[cats, dogs]", images = image_url)
> Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: object of type 'NoneType' has no len()
View(image_classification_zero_shot) does not yield any information.
How do I get the zero shot model to work?
How do I generally get the information on how to call these models in R? It's a function, shouldn't I be able to find information about its parameters somewhere (in R or on huggingface)?
Thank you very much!

I am experiencing a similar issue with another Huggingface transformer, "jonas/sdg_classifier_osdg".
Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: linear(): argument 'input' (position 1) must be Tensor, not BatchEncoding
Solutions:
To write Python code within an R Markdown notebook:
Activate repl_python() on the console and then write Python code.
The same code generates a tensor when written in Python but a character string when written in R.
Further questions:
How can an R string be transformed into a tensor that PyTorch understands? (When I try to load torch together with reticulate, R crashes.)
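On the question of where to find a pipeline's parameters: a Python callable can be inspected directly, for example from a repl_python() session. The sketch below uses a hypothetical stand-in class (StandInZeroShotPipeline is not the real transformers class) to show how inspect.signature lists the accepted keywords, and how passing an unrecognized keyword such as text= can leave the labels argument as None and reproduce the "has no len()" error. The keyword name candidate_labels follows the transformers documentation for zero-shot image classification, but treat it as an assumption to check against the installed version.

```python
import inspect


class StandInZeroShotPipeline:
    """Hypothetical stand-in for a zero-shot pipeline; not the transformers class."""

    def __call__(self, images, candidate_labels=None, **kwargs):
        # Unknown keywords such as text=... are swallowed by **kwargs,
        # so candidate_labels stays None and len() raises a TypeError.
        n_labels = len(candidate_labels)
        return [{"label": label, "score": 1.0 / n_labels} for label in candidate_labels]


pipe = StandInZeroShotPipeline()

# List the keyword names the callable actually accepts.
parameter_names = list(inspect.signature(pipe.__call__).parameters)
print(parameter_names)

# Calling with the wrong keyword fails in the same way as in the question.
try:
    pipe(images="cat.jpg", text=["cats", "dogs"])
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Calling with the expected keyword works.
result = pipe(images="cat.jpg", candidate_labels=["cats", "dogs"])
print(result)
```

From R, the analogous call would presumably pass candidate_labels = c("cats", "dogs") instead of text, and reticulate::py_help() on the pipeline object should show the Python docstring.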

Related

Obtaining Metadata Information using ee_print function from RGEE

I am using the package RGEE (an R wrapper for the Google Earth Engine Python API). The function ee_print() seems to work perfectly for an ImageCollection with just one variable, but seems to fail for an ImageCollection with several variables, where one needs to select the variable of interest. Any ideas on how to approach this issue with the latter kind of data?
Here's an example code:
GRIDMET = ee$ImageCollection("IDAHO_EPSCOR/GRIDMET")
ee_print(GRIDMET)
Where I get the following error message in return:
Error in strsplit(code, ":") : non-character argument

Have you considered the following approach?
GRIDMET = ee$ImageCollection("IDAHO_EPSCOR/GRIDMET")
print(GRIDMET, type = getOption("rgee.print.option"))
And play with the list of all metadata properties
GRIDMET$propertyNames()$getInfo() # get a list of all metadata properties
(GRIDMET$get("product_tags")$getInfo()) # you can choose to show a characteristic like "product_tags"

Running a MCMC analysis with a new tree I made using BM, anyone know what this error would mean?

tree_mvBM <- read.nexus("C:/Users/Zach/Desktop/tree_mvBM.tre")
View(tree_mvBM)
dat <- data$Tp; names(dat) <- rownames(data)
Error in data$Tp : object of type 'closure' is not subsettable

You're trying to refer to an object called data in your global workspace, presumably a data frame. The object doesn't exist (you forgot to read it in, or you called it something else, or ... ?), so R instead finds the built-in function data. It then tries to "subset" it (i.e. $Tp tells R to extract the element named "Tp"), which is not possible because you can't extract an element of a function. (Functions are called "closures" in R for technical reasons.)
This is one reason (probably the main reason) that you shouldn't give your variables names that match the names of built-in R objects (like I, t, c, data, df, ...). If you had called your data my_data instead the error message would be
Error: object 'my_data' not found
which might be easier to understand.
This is such a common error that there are jokes about it (image search the error message).

R - Using patsy.dmatrices() with reticulate

I have a problem of namespace when trying to use function patsy.dmatrices() with the reticulate R package.
Here is a simple reproducible example:
patsy <- import("patsy")
# Data
dataset <- data.frame(Y=rnorm(1000,2.5,1))
# Null model
formula_null <- "I(Y-1) ~ 1"
dmat = patsy$dmatrices(formula_null, data=dataset, NA_action="drop",
return_type="dataframe")
I get the following error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
AttributeError: 'NoneType' object has no attribute 'f_locals'
I think this is associated to the namespace (cf. Namespace issues when calling patsy within a function) which might be fixed by using the eval_env argument of function dmatrices() but I wasn't able to figure out how.
This is quite problematic when we want to use the Python statsmodels package in R, since it makes use of the patsy package for formulas.
Thanks for your help,

I'm not sure, but I think your guess about namespaces is correct, and this is an unfortunate interaction between patsy and reticulate. By default, patsy tries to look into the caller's scope to evaluate any unrecognized functions/variables in the formula (just like R formula functions do). This requires using Python's stack introspection to peek at the caller's scope. But since the caller is in a different language entirely, this almost certainly isn't going to work.
It is possible to override patsy's normal behavior of reading the caller's namespace, using the eval_env argument to dmatrices. (Docs.) Try this:
dmat = patsy$dmatrices(formula_null, data=dataset, NA_action="drop",
return_type="dataframe",
# New:
eval_env=patsy$EvalEnvironment(c())
)
The idea is that here we create an empty EvalEnvironment object, and tell patsy to use that instead of trying to read the caller's environment.
I'm not familiar with reticulate, so you might need to tweak the above to work – in Python the equivalent would be:
dmat = patsy.dmatrices(formula_null, data=dataset, NA_action="drop",
return_type="dataframe",
eval_env=patsy.EvalEnvironment([]))
In particular, if reticulate doesn't convert c() into an empty list, then you'll want to find something that does. (Maybe try patsy$EvalEnvironment(list())?)
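To make the stack-introspection explanation above concrete, here is a minimal sketch of the general technique: a function that evaluates an expression using the local variables of whoever called it. The helper eval_in_caller is hypothetical and is not patsy's actual implementation; the last part assumes that a caller outside Python looks like a missing (None) frame, which would produce the same AttributeError as in the question.

```python
import sys


def eval_in_caller(expression, depth=1):
    """Evaluate `expression` with the names visible in the calling frame.

    Hypothetical helper illustrating stack introspection in general.
    """
    frame = sys._getframe(depth)  # depth=1 is the frame that called us
    namespace = {**frame.f_globals, **frame.f_locals}
    return eval(expression, {}, namespace)


def python_caller():
    offset = 1  # a local variable the helper never receives explicitly
    return eval_in_caller("offset + 41")


print(python_caller())

# Assumption: with no Python caller frame available, the "frame" is None,
# and reading its locals fails like the error reported in the question.
outer_frame = None
try:
    outer_frame.f_locals
except AttributeError as err:
    message = str(err)
    print(message)
```

Passing an explicit, empty environment (as with eval_env above) avoids the lookup entirely, which is why it sidesteps the problem when the caller is R.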

R placing rugarch output into a dataframe

I have updated to the last version of R and updated the rugarch package as well.
Unfortunately some code that worked previously no longer works. I now get an error.
I would be grateful for some help getting the output into a data frame.
library(rugarch)
data(sp500ret)
spec = ugarchspec( )
fit1 = ugarchfit(spec = spec, data = sp500ret)
df.fit1 <- as.data.frame(fit1,which="VaR")
Error in as.data.frame.default(fit1, which = "VaR"):
cannot coerce class "structure("uGARCHfit", package = "rugarch")" to a
data.frame
attributes(fit1)
shows:$fit$sigma
but when I try:
df1 <- data.frame(fit1$fit$sigma)
I get an error message;
Error in fit1$fit : $ operator not defined for this S4 class

as.data.frame(fit1, which="VaR") NEVER worked with an object of class uGARCHfit (you are confusing this with a uGARCHroll object). If you want the conditional VaR you can NOW (in the new version) use the quantile method e.g. quantile(fit1, c(0.01,0.05)).
If you want the conditional standard deviation then you should use sigma(fit1), which will return an xts matrix, or fit1@fit$sigma (@ is used to access the slots of an S4 object). This and most other questions can be answered by carefully reading the documentation, the vignette and the author's website, which contains details of the changes.

Error with src() command in R

Yesterday I posted this question on Stats Exchange and based on the response I got, I decided to do some analysis using R's src() function. It's part of the "sensitivity" package.
I installed the package with no trouble, and then tried the following command:
sens <- src(seminars, REV, rank=TRUE, nboot=100)
sens is a new variable to store the results of the test
seminars is a data frame that I imported from a CSV file using the read.csv() command
REV is the name of a variable/column in seminars and my desired response variable
When I ran the command, I got the following error:
Error in data.frame(Y = y, X) : object 'REV' not found
Any thoughts?

From the documentation of src
y: a vector containing the responses corresponding to the design
of experiments (model output variables).
The input needs to be a vector (apparently) and you're attempting to pass in a name (and not even quoting the name at that). Since REV isn't defined (I'm guessing due to the error message) in the global environment it doesn't know what to do.
From reading the documentation, it sounds like what you want to do is pass seminars[,-which(colnames(seminars) == "REV")] (just the design matrix; you don't want to include the responses) in as X and seminars[,"REV"] in as y.

This error is linked to the fact that the data frame X=seminars includes factors with the value 0, which produce an error while the regression coefficients are constructed. You can remove them first, as they don't contribute to the variance of the output.
