NLP textEmbed function - r

I am trying to run the textEmbed function in R.
Set up needed:
require(quanteda)
require(quanteda.textstats)
require(udpipe)
require(reticulate)
#udpipe_download_model(language = "english")
ud_eng <- udpipe_load_model(here::here('english-ewt-ud-2.5-191206.udpipe'))
virtualenv_list()
reticulate::import('torch')
reticulate::import('numpy')
reticulate::import('transformers')
reticulate::import('nltk')
reticulate::import('tokenizers')
require(text)
It runs the following code
tmp1 <- textEmbed(x = 'sofa help',
model = 'roberta-base',
layers = 11)
tmp1$x
However, it does not run the following code
tmp1 <- textEmbed(x = 'sofa help',
model = 'roberta-base',
layers = 11)
tmp1$x
It gives me the following error
Error in x[[1]] : subscript out of bounds
In addition: Warning message:
Unknown or uninitialised column: `words`.
Any suggestions would be highly appreciated

I believe that this error has been fixed with a newer version of the text-package (version .9.50 and above).
(I cannot see any difference in the two code parts – but I think that this error is related to only submitting one token/word to textEmbed, which now works).
Also, see updated instructions for how to install the text-package http://r-text.org/articles/Extended_Installation_Guide.html
library(text)
library(reticulate)
# Install text required python packages in a conda environment (with defaults).
text::textrpp_install()
# Show available conda environments.
reticulate::conda_list()
# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R.
text::textrpp_initialize(save_profile = TRUE)
# Test so that the text package work.
textEmbed("hello")

Related

Why does running the tutorial example fail with Rstan?

I installed the RStan successfully, the library loads. I try to run a tutorial example (https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started#example-1-eight-schools). After applying the stan function
fit1 <- stan(
file = "schools.stan", # Stan program
data = schools_data, # named list of data
chains = 4, # number of Markov chains
warmup = 1000, # number of warmup iterations per chain
iter = 2000, # total number of iterations per chain
cores = 1, # number of cores (could use one per chain)
refresh = 0 # no progress shown
)
I get the following error:
*Error in compileCode(f, code, language = language, verbose = verbose) :
C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: file1c34165764a4.o:file1c34165764a4.cpp:(.text$_ZN3tbb8internal26task_scheduler_observer_v3D0Ev[_ZN3tbb8internal26task_scheduler_observer_v3D0Ev]+0x1d): undefined reference to `tbb::internal::task_scheduler_observer_v3::observe(bool)'C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: file1c34165764a4.o:file1c34165764a4.cpp:(.text$_ZN3tbb10interface623task_scheduler_observerD1Ev[_ZN3tbb10interface623task_scheduler_observerD1Ev]+0x1d): undefined reference to `tbb::internal::task_scheduler_observer_v3::observe(bool)'C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: file1c34165764a4.o:file1c34165764a4.cpp:(.text$_ZN3tbb10interface623task_scheduler_observerD1Ev[_ZN3tbb10interface623task_scheduler_observerD1Ev]+0x3a): undefined reference to `tbb::internal::task_scheduler_observer_v3::observe(bool)'C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: file1c34165764a4.o:file1c34165764a4.cpp:(.text$_ZN3tbb10interface
Error in sink(type = "output") : invalid connection*
Simply running example(stan_model, run.dontrun=T) gives the same error.
What does this error mean?
Is rtools wrongly installed? My PATH seems to contain the correct folder C:\\rtools42\\x86_64-w64-mingw32.static.posix\\bin;. Is something wrong with the version of my Rstan package? I am struggling to interpret this error?
What to try to solve it?
Apparently Rtools42 in incompatible with the current version Rstan on CRAN. The solution that worked for me is found here: https://github.com/stan-dev/rstan/wiki/Configuring-C---Toolchain-for-Windows#r-42

R package development: tests pass in console, but fail via devtools::test()

I am developing an R package that calls functions from the package rstan. As a MWE, my test file is currently set up like this, using code taken verbatim from rstan's example:
library(testthat)
library(rstan)
# stan's own example
stancode <- 'data {real y_mean;} parameters {real y;} model {y ~ normal(y_mean,1);}'
mod <- stan_model(model_code = stancode, verbose = TRUE)
fit <- sampling(mod, data = list(y_mean = 0))
# I added this line, and it's the culprit
summary(fit)$summary
When I run this code in the console or via the "Run Tests" button in RStudio, no errors are thrown. However, when I run devtools::test(), I get:
Error (test_moments.R:11:1): (code run outside of `test_that()`)
Error in `summary(fit)$summary`: $ operator is invalid for atomic vectors
and this error is definitely not occurring upstream of that final line of code, because removing the final line allows devtools::test() to run without error. I am running up-to-date packages devtools and rstan.
It seems that devtools::test evaluates the test code in a setting where S4 dispatch does not work in the usual way, at least for packages that you load explicitly in the test file (in this case rstan). As a result, summary dispatches to summary.default instead of the S4 method implemented in rstan for class "stanfit".
The behaviour that you're seeing might relate to this issue on the testthat repo, which seems unresolved.
Here is a minimal example that tries to illuminate what is happening, showing one possible (admittedly inconvenient) work-around.
pkgname <- "foo"
usethis::create_package(pkgname, rstudio = FALSE, open = FALSE)
setwd(pkgname)
usethis::use_testthat()
path_to_test <- file.path("tests", "testthat", "test-summary.R")
text <- "test_that('summary', {
library('rstan')
stancode <- 'data {real y_mean;} parameters {real y;} model {y ~ normal(y_mean,1);}'
mod <- stan_model(model_code = stancode, verbose = TRUE)
fit <- sampling(mod, data = list(y_mean = 0))
expect_identical(class(fit), structure('stanfit', package = 'rstan'))
expect_true(existsMethod('summary', 'stanfit'))
x <- summary(fit)
expect_error(x$summary)
expect_identical(x, summary.default(fit))
print(x)
f <- selectMethod('summary', 'stanfit')
y <- f(fit)
str(y)
})
"
cat(text, file = path_to_test)
devtools::test(".") # all tests pass
If your package actually imports rstan (in the NAMESPACE sense, not in the DESCRIPTION sense), then S4 dispatch seems to work fine, presumably because devtools loads your package and its dependencies in a "proper" way before running any tests.
cat("import(rstan)\n", file = "NAMESPACE")
newtext <- "test_that('summary', {
stancode <- 'data {real y_mean;} parameters {real y;} model {y ~ normal(y_mean,1);}'
mod <- stan_model(model_code = stancode, verbose = TRUE)
fit <- sampling(mod, data = list(y_mean = 0))
x <- summary(fit)
f <- selectMethod('summary', 'stanfit')
y <- f(fit)
expect_identical(x, y)
})
"
cat(newtext, file = path_to_test)
## You must restart your R session here. The current session
## is contaminated by the previous call to 'devtools::test',
## which loads packages without cleaning up after itself...
devtools::test(".") # all tests pass
If your test is failing and your package imports rstan, then something else may be going on, but it is difficult to diagnose without a minimal version of your package.
Disclaimer: Going out of your way to import rstan to get around a relatively obscure devtools issue should be considered more of a hack than a fix, and documented accordingly...

mlr error in randomForestSRC after first run

I have installed Java 9.0.4 and all the relevant R libraries on macOS 10.13.4 to run the following script in R 3.5.0 (invoked in RStudio 1.1.423):
options("java.home"="/Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home/lib")
Sys.setenv(LD_LIBRARY_PATH='$JAVA_HOME/server')
dyn.load('/Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home/lib/server/libjvm.dylib')
library(mlr)
library(tidyverse) # for ggplot and data wrangly
library(ggvis) # ggplot visualisation in shiny app
library(rJava)
library(FSelector)
data <- read.csv('week07/PhishingWebsites.csv')
# All variables to nominal (PhishingWebsites)
data[c(1:31)] <- lapply(data[c(1:31)] , factor)
# Configure a classification task and specify Result as the target feature.
classif.task <- makeClassifTask(id = "web", data = data, target = "Result")
fv <- generateFilterValuesData(classif.task)
It works fine the first time I run it, but if I run it a second time I get the following error:
Error in randomForestSRC::rfsrc(getTaskFormula(task), data = getTaskData(task), :
An error has occurred in the grow algorithm. Please turn trace on for further analysis.
Any help much appreciated.

R: pwfdtest produces an error message even with the test data

In R using the plm package, pwfdtest produces an error message even with the test data:
pwfdtest(log(emp) ~ log(wage) + log(capital), data = EmplUK, h0 = "fe")
Error message:
"Error in order(nchar(cnames), decreasing = TRUE) : 4 arguments
passed to .Internal(nchar) which requires 3".
Bug or am I missing something?
I had a similar question myself with a similar test (linearHypothesis). Apparently the nchar function changed, so you need to install the newest version of R and then update all your packages.
After re-installing R,
update.packages(checkBuilt=TRUE, ask=FALSE)

Locked package namespace

I am using the "BMA" package in R 3.1.0, and get an error when running one of the functions in the package, iBMA.glm. When running the example in the package documentation:
## Not run:
############ iBMA.glm
library("MASS")
library("BMA")
data(birthwt)
y<- birthwt$lo
x<- data.frame(birthwt[,-1])
x$race<- as.factor(x$race)
x$ht<- (x$ht>=1)+0
x<- x[,-9]
x$smoke <- as.factor(x$smoke)
x$ptl<- as.factor(x$ptl)
x$ht <- as.factor(x$ht)
x$ui <- as.factor(x$ui)
### add 41 columns of noise
noise<- matrix(rnorm(41*nrow(x)), ncol=41)
colnames(noise)<- paste('noise', 1:41, sep='')
x<- cbind(x, noise)
iBMA.glm.out<- iBMA.glm( x, y, glm.family="binomial",
factor.type=FALSE, verbose = TRUE,
thresProbne0 = 5 )
summary(iBMA.glm.out)
I get the error:
Error in registerNames(names, package, ".__global__", add) :
The namespace for package "BMA" is locked; no changes in the global variables list may be made.
I get the error in RStudio running R 3.1.0 on Ubuntu.
on Windows 7, from RStudio and the R console I get a similar error:
Error in utils::globalVariables(c("nastyHack_glm.family", "nastyHack_x.df")) :
The namespace for package "BMA" is locked; no changes in the global variables list may be made.
I also get the same error when running my own data in the function. I'm not clear on what this error means and how to work around the error to be actually able to use the function. Any advice would be appreciated!

Resources