Using patsy.dmatrices() with reticulate

I have a namespace problem when trying to use the function patsy.dmatrices() with the reticulate R package.
Here is a simple reproducible example:
library(reticulate)
patsy <- import("patsy")
# Data
dataset <- data.frame(Y=rnorm(1000,2.5,1))
# Null model
formula_null <- "I(Y-1) ~ 1"
dmat = patsy$dmatrices(formula_null, data=dataset, NA_action="drop",
return_type="dataframe")
I get the following error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
AttributeError: 'NoneType' object has no attribute 'f_locals'
I think this is related to the namespace (cf. "Namespace issues when calling patsy within a function"), which might be fixed by using the eval_env argument of dmatrices(), but I wasn't able to figure out how.
This is quite problematic when we want to use the Python statsmodels package from R, since it makes use of the patsy package for formulas.
Thanks for your help,

I'm not sure, but I think your guess about namespaces is correct, and this is an unfortunate interaction between patsy and reticulate. By default, patsy tries to look into the caller's scope to evaluate any unrecognized functions/variables in the formula (just like R formula functions do). This requires using Python's stack introspection to peek at the caller's scope. But since the caller is in a different language entirely, this almost certainly isn't going to work.
It is possible to override patsy's normal behavior of reading the caller's namespace, using the eval_env argument to dmatrices. (Docs.) Try this:
dmat = patsy$dmatrices(formula_null, data=dataset, NA_action="drop",
return_type="dataframe",
# New:
eval_env=patsy$EvalEnvironment(c())
)
The idea is that here we create an empty EvalEnvironment object, and tell patsy to use that instead of trying to read the caller's environment.
I'm not familiar with reticulate, so you might need to tweak the above to work – in Python the equivalent would be:
dmat = patsy.dmatrices(formula_null, data=dataset, NA_action="drop",
return_type="dataframe",
eval_env=patsy.EvalEnvironment([]))
In particular, if reticulate doesn't convert c() into an empty list, then you'll want to find something that does. (Maybe try patsy$EvalEnvironment(list())?)

Related

source code for grow function in randomForest R package

In the source code of the R randomForest package, I find the following code in grow.R. What's the purpose of UseMethod? Why does the function grow have no definition of its own, while only grow.default and grow.randomForest are defined? Is this related to calling a C function in the R package?
grow <- function(x, ...) UseMethod("grow")
grow.default <- function(x, ...)
stop("grow has not been implemented for this class of object")
grow.randomForest <- function(x, how.many, ...) {
y <- update(x, ntree=how.many)
combine(x, y)
}
Also, in the randomForest.R file, I only find the following code. There is a randomForest.default.R file too. Why is there no randomForest.randomForest definition, as there is for grow?
"randomForest" <-
function(x, ...)
UseMethod("randomForest")
What's the purpose of UseMethod? Why does the function grow have no definition of its own, while only grow.default and grow.randomForest are defined?
I'd suggest reading about S3 dispatch to understand the patterns you see. Advanced R has a chapter on S3. You can also see related questions here on Stack Overflow.
Is this related to calling a C function in the R package?
No.
Why is there no randomForest.randomForest definition, as there is for grow?
This should make sense if you do the recommended reading above. S3 dispatch uses a pattern of function_name.class to call the correct version of the function (method) based on class of the input. You don't give a randomForest object as an input to the randomForest function, so there is no randomForest.randomForest method defined.
grow() does get called on randomForest objects, hence the grow.randomForest() method. Presumably the authors wanted grow() to error early if it gets called on inappropriate input, so the default for other classes is an immediate error, but they still keep dispatch flexible to work with other classes, enabling extensions of the package and playing nicely with other packages that may have their own grow() implementations.
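The function_name.class pattern described above can be tried out in a few lines of base R. This is only a minimal sketch: the generic describe() and the class "toy_forest" are made-up names for illustration and are not part of randomForest.

```r
# Generic: UseMethod() picks a method based on class(x)
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  stop("describe has not been implemented for this class of object")
}

# Method for objects of the (hypothetical) class "toy_forest"
describe.toy_forest <- function(x, ...) {
  paste("toy forest with", x$ntree, "trees")
}

forest <- structure(list(ntree = 500), class = "toy_forest")

# Dispatches to describe.toy_forest because class(forest) is "toy_forest"
result <- describe(forest)

# An integer vector has no dedicated method, so describe.default runs and errors
fallback <- tryCatch(describe(1:3), error = function(e) conditionMessage(e))
```

There is no describe.describe here for the same reason there is no randomForest.randomForest: a method is only needed for classes that actually get passed to the generic.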

Using Huggingface Transformer Models in R

I am trying to use different Huggingface models in R.
This works by importing the transformers package through reticulate (thank you, https://rpubs.com/eR_ic/transfoRmers)
Models where inputs just require a single string work for me.
Some models require a list or a vector, and I simply don't know where to find out exactly how to call the model.
Take this model, for example: https://huggingface.co/openai/clip-vit-base-patch32.
From the python example I know it takes a picture and (I assume) a character vector of possible classes.
The Python input is: text=["a photo of a cat", "a photo of a dog"], images=image
library(reticulate)
library(here)
library(tidyverse)
transformers <- reticulate::import("transformers")
image_classification_zero_shot <- transformers$pipeline(task = "zero-shot-image-classification", model = "openai/clip-vit-base-patch32")
image_classification <- transformers$pipeline(task = "image-classification", model = "microsoft/beit-base-patch16-224-pt22k-ft22k")
image_url <- "http://images.cocodataset.org/val2017/000000039769.jpg"
The model just requiring the image works
image_classification(images = image_url)
The model which also requires a character input with the classes does not work.
image_classification_zero_shot(text = c("cats", "dogs"), images = image_url)
image_classification_zero_shot(text = "[cats, dogs]", images = image_url)
> Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: object of type 'NoneType' has no len()
View(image_classification_zero_shot) does not yield any information.
How do I get the zero shot model to work?
How do I generally get the information on how to call these models in R? It's a function, shouldn't I be able to find information about its parameters somewhere (in R or on huggingface)?
Thank you very much!
I am experiencing a similar issue with another huggingface transformer called "jonas/sdg_classifier_osdg".
Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: linear(): argument 'input' (position 1) must be Tensor, not BatchEncoding
Solutions:
To write Python code within an R Markdown notebook, activate repl_python() on the console and then write Python code.
The same code generates a tensor when written in Python but a character string when written in R.
Further Questions:
How can an R string be transformed effectively into a tensor that is understood by PyTorch? (When I try to load torch together with reticulate, R crashes.)

loading multiple R packages stored as an object using pacman's p_load

I'd like to use pacman's p_load function. I've read through the documentation and understand how to pass multiple packages into the function directly. However, I'd like to store the package names separately and feed them into pacman so that I can use this same 'list' to later test that the packages have been loaded into the environment.
Based on the documentation, pacman is expecting a character vector. My first attempt was:
pkg_list <- as.vector(c("tidyverse", "forecast"))
or
pkg_list <- "tidyverse, forecast"
followed by:
pacman::p_load(pkg_list)
or
pacman::p_load(pkg_list, character.only = FALSE)
Which all return the same error stating:
package ‘pkg_list’ is not available
Fine, it's obviously looking for a package called pkg_list instead of the contents of the object so I also tried using a list and unlisting it in the p_load statement, using eval, etc. but p_load always seems to evaluate whatever is input as a literal.
Just trying to better understand why and how to escape that literal evaluation if possible.
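A minimal sketch of what seems to be going on, in base R only. The helper captured_names() is hypothetical and merely imitates how functions such as p_load() and library() capture their arguments unevaluated by default. The base packages stats and utils stand in for tidyverse and forecast so that nothing has to be installed.

```r
# Hypothetical helper: returns its arguments as written, without evaluating them
captured_names <- function(...) {
  vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
}

pkg_list <- c("stats", "utils")   # stand-ins for the real package names

# The function sees the symbol pkg_list, not the vector stored in it
seen <- captured_names(pkg_list)

# Base R analogue of the fix: character.only = TRUE makes library() evaluate
# its argument and use the resulting string as the package name
for (pkg in pkg_list) library(pkg, character.only = TRUE)

# The same vector can then be used to check what is attached
loaded <- pkg_list %in% (.packages())

# With pacman installed, the corresponding calls would be:
# pacman::p_load(char = pkg_list)
# pacman::p_load(pkg_list, character.only = TRUE)
```

Note that the attempts above pass character.only = FALSE, which is the default and keeps the literal interpretation; it is character.only = TRUE (or the char argument) that switches it off.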

Where did forecast.HoltWinters go in R 3.4.3?

I'm using RStudio with R 3.4.3. However, when I tried to call the forecast.HoltWinters function, R told me: could not find function "forecast.HoltWinters". Inspecting the installed package (v8.2) confirmed that there is no forecast.HoltWinters. But the manual at https://cran.r-project.org/web/packages/forecast/ clearly states that forecast.HoltWinters is still available.
I have also tried stats::HoltWinters, but it doesn't work correctly. The code runs fine on another computer, but it doesn't run at all on mine. Is there any solution?
Here is the code. Book2.csv has enough data to last more than 3 periods.
dltt <- read.csv("book2.csv", header = TRUE)
dltt.ts <- ts(dltt$Total, frequency=12, start=c(2014,4))
dltt.ts.hw <- HoltWinters(dltt.ts)
library(forecast)
dltt.ts.hw.fc <- forecast.HoltWinters(dltt.ts.hw) # Error as soon as I run this line
Fit a HoltWinters model using the HoltWinters function and then use forecast. It's all in the help for HoltWinters and forecast, namely "The function invokes particular _methods_ which depend on the class of the first argument". I'll copy the guts of it here:
m <- HoltWinters(co2)
forecast(m)
Note this will call the non-exported forecast.HoltWinters function, which you should never call directly using triple-colon notation as some may suggest.
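The same kind of class-based lookup can be seen in base R alone. The sketch below uses predict() as a stand-in for forecast(), purely so that it runs without the forecast package installed; that substitution is an assumption made for illustration, not what the forecast help page shows.

```r
m <- HoltWinters(co2)   # co2 ships with base R (datasets package)
class(m)                # "HoltWinters"

# Calling the generic dispatches on class(m) to the HoltWinters method
p <- predict(m, n.ahead = 12)

# The method can be looked up for inspection without calling it directly
method <- getS3method("predict", "HoltWinters")
```

With forecast loaded, forecast(m) performs the equivalent lookup for its own HoltWinters method, which is why calling the generic is enough.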

mas5 normalization error: unable to find an inherited method for function

Goal: mas5 normalize data.
Problem: when I try the following R code, I get this
error: unable to find an inherited method for function bg.correct for signature ExpressionFeatureSet, character
I have looked on SO, and found the following: What does this mean: unable to find an inherited method for function ‘A’ for signature ‘"B"’, but I am not exactly sure how to fix my specific problem and use the mas5 function properly. I have also looked at this affy manual but still stuck...
installpkg("affy")
library('affy')
setwd("/Users/er/Desktop/DesktopFolders/DataSets/CD8Helios/Microarray/CELfiles/CEL")
cel_Files <- list.celfiles()
affyRaw <- read.celfiles(cel_Files)
eset <- mas5(affyRaw)
If you are sure that the .cel files were created from experiments performed on a type of array that works with the affy package, then you should try this workflow using ReadAffy from the affy package.
cel_Files <- list.celfiles()
affyRaw <- affy::ReadAffy(filenames=cel_Files)
eset <- mas5(affyRaw)
However, it might be the case that the affy package is not designed for your array type. Then you should switch to the oligo and oligoClasses packages and normalize with the analogous function rma:
cel_Files <- oligoClasses::list.celfiles()
affyRaw <- oligo::read.celfiles(cel_Files)
eset <- oligo::rma(affyRaw)
