Problems with UDPipe models - R

I'm trying to implement a sentiment analysis study in R on data extracted from Twitter.
I am using the udpipe library. When I write
udpipe_download_model("model")
model <- udpipe_load_model("directory")
out <- as.data.frame(udpipe_annotate(object, x, doc_id, ...))
and run it, an exception is raised:
Error in udp_tokenise_tag_parse(object$model, x, doc_id, tokenizer, tagger, :
external pointer is not valid
The corresponding traceback is:
4.
stop(structure(list(message = "external pointer is not valid",
call = udp_tokenise_tag_parse(object$model, x, doc_id, tokenizer,
tagger, parser, log_every, log_now), cppstack = structure(list(
file = "", line = -1L, stack = c("1 udpipe.so 0x0000000117ba907e _ZN4Rcpp9exceptionC2EPKcb + 222", ...
3.
udp_tokenise_tag_parse(object$model, x, doc_id, tokenizer, tagger,
parser, log_every, log_now)
2.
udpipe_annotate(model, x = x, doc_id = doc_id, trace = F) at textAnalysisFunct.R#221
1.
lemmaUDP(x = twt$text_clean, model = modelI, doc_id = twt$doc_id,
stopw = tm::stopwords("italian"), userstopw = mystop)
When I started debugging, the console showed:
Error during wrapup: external pointer is not valid
The function lemmaUDP was written by my teacher. In case it is useful, here is its definition as well, but the error is the same when I run the steps manually:
lemmaUDP <- function(x = NULL,
                     model = NULL,
                     doc_id = NULL,
                     stopw = tm::stopwords("italian"),
                     userstopw = NULL) {
  require(udpipe)
  # Validate the inputs (messages translated from Italian)
  if (is.null(x)) { message("missing text vector"); return() }
  if (is.null(model)) { message("missing model"); return() }
  if (!is.character(x)) { message("vector x is not of type character"); return() }
  if (!inherits(model, "udpipe_model")) { message("invalid model"); return() }
  if (is.null(doc_id)) { doc_id <- seq_along(x) }
  # Append user-defined stopwords to the default list
  if (!is.null(userstopw)) {
    stopw <- c(stopw, userstopw)
  }
  # Tokenise, tag and lemmatise, then flag tokens/lemmas that are stopwords
  xx <- udpipe_annotate(model, x = x, doc_id = doc_id, trace = FALSE)
  xx <- as.data.frame(xx)
  xx$STOP <- xx$lemma %in% stopw | xx$token %in% stopw
  return(xx)
}
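One common cause of "external pointer is not valid" (an assumption here; the post does not confirm it) is a model object that was loaded in an earlier R session, for example restored from a saved workspace (.RData): the C++ model behind the object does not survive a session restart, so it must be reloaded with udpipe_load_model(). A minimal sketch, assuming the Italian model downloaded to a temporary directory; ensure_model() is a hypothetical helper, not part of udpipe, and the lost pointer is simulated:

```r
library(methods)
library(udpipe)

# Download the Italian model (the .udpipe file is written to model_dir)
dl <- udpipe_download_model(language = "italian", model_dir = tempdir())
model_path <- dl$file_model

# Load the model in the *current* session; the object wraps a C++ pointer
model <- udpipe_load_model(file = model_path)

# Hypothetical helper: reload the model if its pointer has become NULL,
# e.g. after the object was restored from a saved workspace
ensure_model <- function(model, path) {
  if (identical(model$model, new("externalptr"))) {
    model <- udpipe_load_model(file = path)
  }
  model
}

# Simulate a lost pointer, then repair it before annotating
broken <- model
broken$model <- new("externalptr")
model <- ensure_model(broken, model_path)

txt <- c("Oggi piove a Roma.", "Il modello funziona bene.")
out <- as.data.frame(udpipe_annotate(model, x = txt, doc_id = c("d1", "d2")))
head(out[, c("doc_id", "token", "lemma", "upos")])
```

Keeping the file path (rather than the loaded object) is what makes the model recoverable in a new session.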

Related

Concatenate layers in R Keras

I have this BERT classifier, where I want to concatenate the BERT output with additional features (one-hot encoded, 13 categories).
I get this error message, which I do not understand: the arguments I specified are all named.
input_word_ids <- layer_input(shape = c(set.max_length), dtype = 'int32', name = "input_word_ids")
input_mask <- layer_input(shape = c(set.max_length), dtype = 'int32', name = "input_attention_mask")
input_topic <- layer_input(shape = c(13), dtype = 'int32', name = "input_topic")
last_hidden_state <- model_tf(input_word_ids, attention_mask = input_mask)[[1]] # shape=(None, 512, 768)
cls_token <- last_hidden_state[, 1,] # shape=(None, 768)
output <- cls_token %>%
layer_concatenate(inputs = list(cls_token, input_topic), axis = -1)
Error in assert_all_dots_named(envir, cl) :
All arguments provided to `...` must be named.
Call with unnamed arguments in dots:
layer_concatenate(inputs = list(cls_token, input_topic), axis = -1, .)
If I run layer_concatenate(inputs = list(cls_token, input_topic)) [without the axis argument],
I get
Error in modifiers[[nm]](args[[nm]]) :
cannot coerce type 'environment' to vector of type 'integer'
If I am not mistaken, the first error message comes from the keras package (assert_all_dots_named(), line 435, https://github.com/rstudio/keras/blob/main/R/utils.R).
I have read the Keras vignette, but I don't see what I am doing wrong.
Any help is highly appreciated, many thanks in advance!
I was able to solve it myself: cls_token %>% was the problem. I suppose it is a conflict between the Keras functional API and magrittr piping: the piped cls_token was passed to layer_concatenate() as an additional, unnamed argument, hence the error message.
Solution:
# axis = -1 concatenates along the last (feature) axis
output <- layer_concatenate(inputs = list(cls_token, input_topic), axis = -1) %>%
  layer_dropout(rate = set.dropout)
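The mechanism can be reproduced without keras. A minimal sketch: toy_concatenate() is a hypothetical stand-in whose formals are assumed to mirror layer_concatenate() (inputs, then ..., then axis) and which rejects unnamed dots the way the error message above does. It uses the base R pipe |> (R >= 4.1), which, like magrittr's %>%, inserts the left-hand side as the first argument; since inputs and axis are already matched by name, the piped value lands in `...`:

```r
# Hypothetical stand-in with the same argument layout as layer_concatenate()
toy_concatenate <- function(inputs = NULL, ..., axis = -1) {
  dots <- list(...)
  nms <- names(dots)
  if (length(dots) > 0 && (is.null(nms) || any(nms == ""))) {
    stop("All arguments provided to `...` must be named.")
  }
  # Concatenate matrices column-wise, i.e. along their last axis
  do.call(cbind, inputs)
}

a <- matrix(1, nrow = 2, ncol = 3)
b <- matrix(2, nrow = 2, ncol = 1)

# Direct call: works
ok <- toy_concatenate(inputs = list(a, b), axis = -1)
dim(ok)

# Piped call: becomes toy_concatenate(a, inputs = ..., axis = -1),
# so `a` ends up as an unnamed element of `...`
res <- tryCatch(a |> toy_concatenate(inputs = list(a, b), axis = -1),
                error = conditionMessage)
res
```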

Error: unused argument in gene ontology function

I am trying to perform a gene ontology analysis in R and I got this error:
Error in get_ontology(x, name = paste("Cluster", names(df.list[i]), "Pathways_for_kmeans_cluster", :
unused argument (name = paste("Cluster", names(df.list[i]), "Pathways_for_kmeans_cluster", j, sep = "_"))
after running this script:
numclus <- sort(unique(df.list[[i]]$kmeans.cluster))
subdirname <- paste("D:/Master jaar 1/RP1/RP1 projects/Aged macrophage characterisation/Single cell sequencing/nieuwe stuff", "/top100_genes_from_", names(df.list[i]), "_pathways_from_kmeans_clusters", sep = "")
dir.create(subdirname, showWarnings = FALSE)
install.packages("ontologyPlot")
for (j in numclus) {
  # Genes of k-means cluster j, with a dummy fold change
  x <- data.frame(gene = rownames(df.list[[i]][which(df.list[[i]]$kmeans.cluster == j), ]), avg_logFC = 0)
  if (nrow(x) > 10) {
    get_ontology(x, name = paste("Cluster", names(df.list[i]), "Pathways_for_kmeans_cluster", j, sep = "_"), return.data = FALSE, outdir = subdirname, full_GSEA = FALSE)
  }
}
} # presumably closes an outer loop over i (not shown in the post)
The error is raised by the get_ontology() call.
It's telling you that get_ontology doesn't have an argument called name. You haven't told us which packages you're using, so I can't help you further, but you might get somewhere by looking at the online help.
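To check which arguments a function actually accepts, inspect its formals. A minimal sketch with a hypothetical stand-in, get_ontology_stub(), since the post does not say which package get_ontology() comes from; for the real function, args(get_ontology) shows its signature and find("get_ontology") shows which attached package provides it:

```r
# Hypothetical stand-in: like the get_ontology() in the error,
# it has no argument called `name`
get_ontology_stub <- function(x, return.data = FALSE, outdir = ".", full_GSEA = FALSE) {
  invisible(x)
}

# List the arguments the function accepts
arg_names <- names(formals(get_ontology_stub))
print(arg_names)
"name" %in% arg_names

# Passing an argument that is not in the formals fails before the body runs
msg <- tryCatch(
  get_ontology_stub(data.frame(gene = "A"), name = "Cluster_1"),
  error = conditionMessage
)
print(msg)
```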

R -> Error in `row.names<-.data.frame`

Following this other question (Get p-value about contrast hypothesis for rectangular matrix), I am trying to run the following code in R, but the line:
colnames(posmat) <- "pos_c1"
causes an error when calling the function summary().
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘Pos’
Does anybody know why this error comes up?
Here is the MWE:
library(lme4)
library(lmerTest)
library(corpcor)
database <- data.frame(
Clos=factor(c(4,4,1,4,4,3,2,1,2,1,2,2,4,3,1,2,1,4,1,3,2,2,4,4,4,4,2,1,4,2,2,1,4,2,4,2,1,4,4,3)),
Pos=factor(c(2,4,1,2,5,6,7,2,2,2,5,6,3,3,3,8,5,3,4,2,1,4,3,3,2,6,1,8,3,7,5,7,8,3,6,6,1,6,3,7)),
RF=c(8,6,2,9,7,1,7,6,3,4,6,4,5,2,5,5,3,4,1,3,1,2,3,1,2,2,3,1,8,5,2,2,7,1,9,4,5,6,4,2),
Score=c(4,3,3,5,4,3,2,4,5,2,2,3,3,4,4,4,3,2,3,3,5,4,3,4,4,2,3,4,3,4,1,2,2,2,3,4,5,3,1,2)
)
clos_c1 = c(0,0,-1,1)
clos_c2 = c(0,-1,0,1)
clos_c3 = c(-1,0,0,1)
closmat.temp = rbind(constant = 1/4,clos_c1,clos_c2,clos_c3)
closmat = solve(closmat.temp)
closmat = closmat[, -1]
closmat
pos_c1 = c(1/2,1/2,-1/6,-1/6,-1/6,-1/6,-1/6,-1/6)
posmat.temp = rbind(pos_c1)
posmat = pseudoinverse(posmat.temp)
colnames(posmat) <- "pos_c1"
contrasts(database$Clos) = closmat
contrasts(database$Pos) = posmat
model = lmer(Score~Clos+Pos+(1|RF), data = database, REML = TRUE)
summary(model)
The problem is that Pos has 8 levels and therefore needs 7 contrast columns, but posmat has only one. When you assign it, contrasts<- completes the matrix with 6 extra columns that have no column names.
You can see this by printing model: 6 of the coefficients are named just "Pos". These duplicate names are what break summary(). Adding the line
colnames(contrasts(database$Pos))<-c("pos1","pos2","pos3","pos4","pos5","pos6","pos7")
right after contrasts(database$Pos) <- posmat
makes your code work. Feel free to use whatever column names you need.
The whole code is then as follows:
library(lme4)
library(lmerTest)
library(corpcor)
database <- data.frame(
Clos=factor(c(4,4,1,4,4,3,2,1,2,1,2,2,4,3,1,2,1,4,1,3,2,2,4,4,4,4,2,1,4,2,2,1,4,2,4,2,1,4,4,3)),
Pos=factor(c(2,4,1,2,5,6,7,2,2,2,5,6,3,3,3,8,5,3,4,2,1,4,3,3,2,6,1,8,3,7,5,7,8,3,6,6,1,6,3,7)),
RF=c(8,6,2,9,7,1,7,6,3,4,6,4,5,2,5,5,3,4,1,3,1,2,3,1,2,2,3,1,8,5,2,2,7,1,9,4,5,6,4,2),
Score=c(4,3,3,5,4,3,2,4,5,2,2,3,3,4,4,4,3,2,3,3,5,4,3,4,4,2,3,4,3,4,1,2,2,2,3,4,5,3,1,2)
)
clos_c1 = c(0,0,-1,1)
clos_c2 = c(0,-1,0,1)
clos_c3 = c(-1,0,0,1)
closmat.temp = rbind(constant = 1/4,clos_c1,clos_c2,clos_c3)
closmat = solve(closmat.temp)
closmat = closmat[, -1]
closmat
pos_c1 = c(1/2,1/2,-1/6,-1/6,-1/6,-1/6,-1/6,-1/6)
posmat.temp = rbind(pos_c1)
posmat <- pseudoinverse(posmat.temp)
colnames(posmat) <- "pos_c1"
contrasts(database$Clos) <- closmat
contrasts(database$Pos) <- posmat
##NEW LINE
colnames(contrasts(database$Pos))<-c("pos1","pos2","pos3","pos4","pos5","pos6","pos7")
model <- lmer(Score~Clos+Pos+(1|RF), data = database, REML = TRUE)
summary(model)
I hope it helps. Cheers!
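The mechanism can be checked with base R alone (no lme4 needed). A minimal sketch, assuming a toy factor with the same 8 levels as Pos: the one-column contrast matrix is extended to 7 columns, the 6 added columns get empty names, and model.matrix() then produces six columns all called "Pos"; naming every column removes the duplicates. It uses pos_c1 directly rather than its pseudoinverse, which only rescales the column:

```r
# Toy data: a factor with the same 8 levels as Pos
d <- data.frame(Pos = factor(rep(1:8, 2)))

# One planned contrast, as a named one-column matrix
pos_c1 <- c(1/2, 1/2, -1/6, -1/6, -1/6, -1/6, -1/6, -1/6)
posmat <- matrix(pos_c1, ncol = 1, dimnames = list(NULL, "pos_c1"))

# contrasts<- completes the matrix to nlevels - 1 = 7 columns
contrasts(d$Pos) <- posmat
print(colnames(contrasts(d$Pos)))   # "pos_c1" followed by empty names

# The empty names become model columns called just "Pos"
mm <- model.matrix(~ Pos, data = d)
print(colnames(mm))

# Naming every contrast column removes the duplicates
colnames(contrasts(d$Pos)) <- paste0("pos", 1:7)
mm2 <- model.matrix(~ Pos, data = d)
print(colnames(mm2))
```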

Unused arguments in R error

I am new to R and I am trying to run the example given in the rebmix help PDF. It uses the galaxy dataset; here is the code:
library(rebmix)
devAskNewPage(ask = TRUE)
data("galaxy")
write.table(galaxy, file = "galaxy.txt", sep = "\t",eol = "\n", row.names = FALSE, col.names = FALSE)
REBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
for (j in 1:3) {
for (k in 1:3) {
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, InformationCriterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
if (is.null(Table))
Table <- REBMIX[[i, j, k]]$summary
else Table <- merge(Table, REBMIX[[i, j,k]]$summary, all = TRUE, sort = FALSE)
}
}
}
It gives me this error:
unused argument (InformationCriterion = InformationCriterion[j])
Please help.
I'm running R 3.0.2 (Windows), and there the rebmix package defines a function REBMIX whose argument is named Criterion, not InformationCriterion.
In brief, invoke REBMIX as:
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, Criterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
It looks as though there have been substantial changes to the rebmix package since the example mentioned in the OP was created. Among the most noticeable changes is the use of S4 classes.
There's also an updated demo in the rebmix package using the galaxy data (see demo("rebmix.galaxy")).
To get the above example to produce results (note: I am not familiar with this package or the rebmix algorithm!):
Change the argument to Criterion, as mentioned by @Giupo
Use the S4 slot access operator @ instead of $
Don't name the results object REBMIX, because that's already the function name
library(rebmix)
data("galaxy")
## Don't re-name the REBMIX object!
myREBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
for (j in 1:3) {
for (k in 1:3) {
myREBMIX[[i, j, k]] <- REBMIX(Dataset = list(galaxy),
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, Criterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
if (is.null(Table)) {
Table <- myREBMIX[[i, j, k]]@summary
} else {
Table <- merge(Table, myREBMIX[[i, j, k]]@summary, all = TRUE, sort = FALSE)
}
}
}
}
This is probably late, but I ran into a similar problem a few minutes ago and realized what is usually behind this kind of error message: a version conflict.
You may be using a different version of the R package than the tutorial, so the argument names in the version you are running can differ from those in the tutorial code.
So check the package version first, before you start editing the code by hand. It can also happen that an old version of the package is still on the library path and masks the new one. That was exactly my situation, because I had installed the old and new versions manually in separate locations.

Create an S4 superclass - with code example

Okay, it took me a while to create a snippet of code that replicates my problem. Here it is. Notice that if you run the command new("FirstSet", id = "Input", multiplier = 2)
you will get the correct answer. However, if you try to create a class that contains both, you get the following error: Error in .local(.Object, ...) : argument "id" is missing, with no default. This is literally the best I can do to explain/show the problem.
What in the world am I doing wrong?
setClass("Details",
representation(
ID = "character",
Anumber = "numeric"))
Input <- new("Details",
ID = "Input",
Anumber = 2)
setClass("FirstSet",
representation(
Anothernumber = "numeric"))
setGeneric(
name = "FirstSet",
def = function(object){standardGeneric("FirstSet")}
)
setMethod("initialize",
signature(.Object = "FirstSet"),
function (.Object, id, multiplier)
{ x = id@Anumber
y = x * multiplier
.Object@Anothernumber = y
return(.Object)
}
)
setClass("Super", contains = c("Details", "FirstSet"))
Corrected code now gives a new error. I followed the instructions in the post and that solved my original problem. I also created a generic and a method for "Super" (see the code below). Now I get a new error: Error in .local(.Object, ...) : trying to get slot "Anumber" from an object of a basic class ("character") with no slots. Man, this is exhausting, I thought I had it.
The overall goal is this: there will be many serialized files, and methods are called depending on characteristics of the data in each file. Is this even possible in R, or am I trying to do something that R cannot do?
New code:
setClass("Details",
representation(
ID = "character",
Anumber = "numeric"))
setGeneric("Details",
def = function(object){standardGeneric("Details")})
setMethod("initialize",
signature(.Object = "Details"),
function(.Object, ID = character(), Anumber = numeric()){
.Object@ID = ID
.Object@Anumber = 2
return(.Object)
})
setClass("FirstSet",
representation(
Anothernumber = "numeric"))
setGeneric(
name = "FirstSet",
def = function(object){standardGeneric("FirstSet")}
)
setMethod("initialize",
signature(.Object = "FirstSet"),
function (.Object, id = character(), multiplier = numeric())
{ x = id@Anumber
y = x * multiplier
.Object@Anothernumber = y
return(.Object)
}
)
setClass("Super", contains = c("Details", "FirstSet"))
setGeneric("Super",
def = function(object){standardGeneric("Super")})
setMethod("initialize",
signature(.Object = "Super"),
function(.Object, id = character(), Anumber = numeric()){
Details <- new("Details", ID = id, Anumber = Anumber)
FirstSet <- new("FirstSet", Anothernumber = Anothernumber)
Super <- new("Super", Details, FirstSet)
return(.Object)
})
The basic rule is that new("FirstSet") (like new() on any non-virtual class) needs to work without arguments. Yours doesn't, because the initialize arguments don't have default values. See this answer for some more guidelines.
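A minimal sketch of that rule applied to the classes in the question, assuming the goal is for FirstSet to compute Anothernumber from a Details object: every extra initialize() argument gets a default, and callNextMethod() fills any slots passed by name, so both new("FirstSet") and the definition of "Super" succeed. It uses slots = instead of representation() (setClass() accepts both); id and multiplier follow the question:

```r
library(methods)

setClass("Details", slots = c(ID = "character", Anumber = "numeric"))
setClass("FirstSet", slots = c(Anothernumber = "numeric"))

# Every extra argument has a default, so new("FirstSet") works with no arguments.
# Slots passed by name (e.g. Anothernumber = 4) are filled by callNextMethod().
setMethod("initialize", "FirstSet",
  function(.Object, ..., id = NULL, multiplier = 1) {
    .Object <- callNextMethod(.Object, ...)
    if (!is.null(id)) {
      # `id` must be a "Details" object, not a character string such as "Input";
      # otherwise id@Anumber fails (the second error in the question)
      .Object@Anothernumber <- id@Anumber * multiplier
    }
    .Object
  })

# Defining the subclass now works because new("FirstSet") succeeds
setClass("Super", contains = c("Details", "FirstSet"))

input <- new("Details", ID = "Input", Anumber = 2)
fs <- new("FirstSet", id = input, multiplier = 2)
fs@Anothernumber

# "Super" inherits the FirstSet initialize method; slots are set by name
s <- new("Super", ID = "Input", Anumber = 2, Anothernumber = 4)
is(s, "Details")
is(s, "FirstSet")
```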
