Newsmap topic classification: issue with "predict" step of the newsmap process - r

I am trying out the newsmap package for topic classification (not geographical classification, but applying it to another task). I am following the instructions from the Quanteda tutorials website (here).
Everything runs smoothly until I try to predict the topic most strongly associated with each of my texts.
Here is the code:
labels <- types(toks_labelExclu)
dfmt_label <- dfm(toks_labelExclu, tolower = FALSE) # The dfm with the labels of my 35 topics
dfmt_feat <- dfm(toks_sent) %>%
  dfm_trim(min_termfreq = 50) # The dfm with the features to be associated with my topics
model_nm <- textmodel_newsmap(dfmt_feat, dfmt_label)
coef(model_nm, n= 20)[labels] # All good so far
pred_nm <- predict(model_nm)
# Here is the snag: the function returns
Error in x[, feature] : Subscript out of bounds
Does anyone have an idea of where the error could come from?
For information, here is the sessionInfo:
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4
other attached packages:
[1] newsmap_0.7.1 quanteda_2.0.1
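In case it helps with diagnosis, here is a minimal base-R sketch of how this kind of error can arise. It is not newsmap's actual code: the matrix `x` and the vector `feature` are made-up stand-ins. Indexing a matrix by a column name it does not contain raises "subscript out of bounds", so one thing worth checking is whether every feature the model expects is present in the dfm used for prediction.

```r
# Toy document-feature matrix (hypothetical data, not from the question)
x <- matrix(1:6, nrow = 2,
            dimnames = list(c("doc1", "doc2"), c("apple", "bank", "court")))

# Features a fitted model might expect; "vote" is absent from x
feature <- c("apple", "vote")

# Selecting a missing column name fails; capture the message instead of stopping
result <- tryCatch(x[, feature], error = function(e) conditionMessage(e))
print(result)

# Listing the missing names shows which features trigger the failure
missing_features <- setdiff(feature, colnames(x))
print(missing_features)
```

With the real objects, the analogous check would compare featnames() of the two dfm objects. Whether a feature mismatch is the cause here is only a guess.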

Related

GVIZ on R not loading mm10 from UCSC

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6
I am trying to load an ideogram track for the mm10 mouse genome on GVIZ. This always used to work but now is giving me the below error:
> itrack <- IdeogramTrack(genome = "mm10", chromosome = "chr5")
Error in value[3L] :
There doesn't seem to be any chromosome length data available for genome 'mm10' at UCSC or the service is temporarily down.
In addition: Warning message:
In value[3L] :
There doesn't seem to be any cytoband data available for genome 'mm10' at UCSC or the service is temporarily down. Trying to fetch the chromosome length data.
One solution is to update R and reinstall Gviz via Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("Gviz")
library(Gviz)
itrack <- IdeogramTrack(genome = "mm10", chromosome = "chr5")

response.plot3() crashes RStudio

0. Session information
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
1. Summary of my issue
RStudio crashes when I use a modified version of response.plot2(), named response.plot3(). The problem is not the modification itself, since the same crash occurs with response.plot2().
2. Code
library(biomod2)
library(raster)
library(reshape)
library(ggplot2)
setwd("xxx")
# I load the modified version of response.plot2()
source("/response.plot_modified.R", local = TRUE)
sp <- "NAME"
baseline_EU <- readRDS("./data/baseline_EU.rds")
initial.wd <- getwd()
setwd("models")
# Loading of formatted data and models calibrated by biomod
load(paste0(sp, "/run.data"))
load(paste0(sp, "/model.runs"))
# Variables used for calibration
cur.vars <- model.runs@expl.var.names
# Loading model names into R memory
models.to.plot <- BIOMOD_LoadModels(model.runs)
# Calculation of response curves with all models (stored in the object resp which is an array)
resp <- response.plot3(models = models.to.plot,
                       Data = baseline_EU[[cur.vars]],
                       fixed.var.metric = "sp.mean",
                       show.variables = cur.vars,
                       run.data = run.data)
I have 60 models, and the code plots the first curve before the session aborts with no further explanation.
3. What I unsuccessfully tried
(1) checked that it was not a RAM issue
(2) uninstalled and reinstalled all the packages and their dependencies
(3) updated to the latest R version
(4) went back to response.plot2() to see whether the issue came from response.plot3()
4. I found some similar reports, which lead me to think that it might be a package issue
https://github.com/rstudio/rstudio/issues/9373
Call to library(raster) or require(raster) causes Rstudio to abort session
Now I presume that there is a problem either with the biomod2 or the raster packages, or maybe the R version?
I would greatly appreciate your help if you have any ideas.

How to approach this strange error when working with the flextable library in R?

I'm trying to use the flextable library to make tables in RMarkdown (on an RStudio server). I'm getting a strange error message and can't figure out what I'm doing wrong. The error message is:
Error in UUIDgenerate(n = nrow(uid), use.time = TRUE) :
unused argument (n = nrow(uid))
NOTE: The error occurs when I run code from the RMarkdown document in the console. The code below (and its error output) was run in an R script.
This code below produces the error:
library(flextable)
ft <- flextable(head(mtcars))
ft
Error in UUIDgenerate(n = nrow(uid), use.time = TRUE) :
unused argument (n = nrow(uid))
head_ft <- flextable(as.data.frame(mtcars))
head_ft
Error in UUIDgenerate(n = nrow(uid), use.time = TRUE) :
unused argument (n = nrow(uid))
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
packageVersion("flextable")
[1] ‘0.5.11’
You need to update the uuid package. The issue is fixed in recent releases of flextable, so updating flextable should also update uuid automatically.
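To confirm whether the installed uuid is the outdated one, a quick check along these lines may help. It is only a sketch: `uuid_supports_n` is a hypothetical helper name, and it assumes the "unused argument" error means the installed UUIDgenerate() has no `n` parameter.

```r
# Hypothetical helper: does the installed uuid::UUIDgenerate() accept `n`?
# Returns NA when the uuid package is not installed at all.
uuid_supports_n <- function() {
  if (!requireNamespace("uuid", quietly = TRUE)) {
    return(NA)
  }
  "n" %in% names(formals(uuid::UUIDgenerate))
}

if (!isTRUE(uuid_supports_n())) {
  # Nothing is installed automatically; this only prints a suggestion
  message('uuid looks missing or outdated; try install.packages(c("uuid", "flextable"))')
}
```

After updating, restart the R session so that the new package version is the one that gets loaded.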

h2o deeplearning error when specifying nfolds for cross validation

Has this issue been resolved by now? I encounter the same error message.
Use case: I am doing binary classification using h2o's deeplearning() function. Below, I provide randomly generated data of the same size as in my actual use case. System specs:
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)
# h2o version h2o_3.20.0.2
I am currently learning how to use h2o, so I have played with that function quite a bit. Everything runs smoothly until I specify parameters for cross validation.
The problem occurs when specifying the nfolds parameter for cross-validation. Interestingly, low values of nfolds work fine. In my actual use case, even nfolds > 3 produced the error message shown below. In the example below I was able to use nfolds < 7, although not consistently (sometimes only up to nfolds = 3). Above those values, the REST API returns the error: object not found for argument: key.
# It does not matter whether this runs on Debian or Windows, or how many threads are used
# The error occurs only when cross-validation options are set; otherwise everything works fine
# No error occurs with a low nfolds value (in my actual use case, at most 3 folds are possible without hitting the error)
require(h2o)
h2o.init(nthreads = -1)
x = matrix(rnorm(900*96, mean=10, sd=2), nrow=900, ncol=96)
y = sample(size=900, x=c(0,1), replace=T)
sampleData = cbind(x, y)
sampleData = data.frame(sampleData)
sampleData[,97] = as.factor(sampleData[,97])
m = h2o.deeplearning(x = 1:96, y = 97,
                     training_frame = as.h2o(sampleData), reproducible = T,
                     activation = "Tanh", hidden = c(64,16), epochs = 10000, verbose=T,
                     nfolds = 4, balance_classes = TRUE, # cross-validation
                     standardize = TRUE, variable_importances = TRUE, seed=123,
                     stopping_rounds=2, stopping_metric="misclassification", stopping_tolerance=0.01 # early stopping
)
performance = h2o.performance(model = m)
print(performance)
######### gives error message
# ERROR: Unexpected HTTP Status code: 404 Not Found (url = http://localhost:xxxxx/3/Models/DeepLearning_model_R_1535036938222_489)
#
# water.exceptions.H2OKeyNotFoundArgumentException
# [1] "water.exceptions.H2OKeyNotFoundArgumentException: Object 'DeepLearning_model_R_1535036938222_489' not found for argument: key"
I cannot understand why it works only for low values of nfolds. Any suggestions? What am I missing here? I've searched most of the remotely related threads on Google Groups and here on Stack Overflow, but without success. If this has to do with a changed API for h2o 3.x, as suggested above (though that post was 18 months ago...), I would highly appreciate some documentation on the correct syntax for cross-validation with h2o.deeplearning(). Thanks in advance!
This is a bug triggered by setting the verbose parameter to TRUE. The workaround is to leave verbose at its default, FALSE. I've created a JIRA ticket to track the issue here

Error in varimp (R party package) when conditional = TRUE

I am trying to calculate variable importance for a random forest built using the cforest function in the party package. I would like to run varimp with conditional set to TRUE, but I get an error message when I do so. The error reads:
Error in if (node[[5]][1] == variableID) cp <- node[[5]][[3]] :
argument is of length zero
Varimp run with the default setting conditional = FALSE works just fine.
Regarding the data set, all variables are categorical. The response variable is Glottal (yes/no), and there are seven predictors. Here is a link to the data, and here is the code I am using:
library(party)
glottal.df <-read.csv("~glottal_data.csv", header=T)
glottal.df$Instance <- factor(glottal.df$Instance)
data.controls <- cforest_unbiased(ntree = 500, mtry = 2)
set.seed(45)
glottal.cf <- cforest(Glottal ~ Stress + Boundary + Context + Instance + Region + Target + Speaker, data = glottal.df, controls = data.controls)
# this gives me an error
glottal.cf.varimp.true <- varimp(glottal.cf, conditional = TRUE)
# this works
glottal.cf.varimp.false <- varimp(glottal.cf)
Can anyone tell me why I am getting this error? It is not a problem with any specific variable as the problem persists even if I remove a variable, create a new forest and try to recalculate varimp, and there are no missing values in the data set. Many thanks in advance for your help!
Appears to be working with party 1.2.4:
> glottal.cf.varimp.true
       Stress      Boundary       Context
 0.0003412322  0.2405971564  0.0122369668
     Instance        Region        Target
-0.0043507109  0.0044360190 -0.0011469194
      Speaker
 0.0384834123
> packageVersion('party')
[1] ‘1.2.4’
> R.version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.3
year           2017
month          11
day            30
svn rev        73796
language       R
version.string R version 3.4.3 (2017-11-30)
nickname       Kite-Eating Tree
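To see whether your own installation is at least as recent as the one reported working above, a small version check like this can be used. It is a sketch only: `has_min_version` is a hypothetical helper, and treating 1.2.4 as the minimum working version is an assumption based solely on this one report.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= package_version(min_version)
}

# FALSE means party is either not installed or older than the reported version
print(has_min_version("party", "1.2.4"))
```

If this prints FALSE, updating party and rebuilding the forest before calling varimp(..., conditional = TRUE) again would be the first thing to try.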
