Has the method annotations in R (NLP package) been deprecated or replaced?

I am following this article https://mylearnmachinelearning.com/category/linear-regression/ to create a Named Entity Extractor. As required, I have installed the openNLP, NLP, rJava, magrittr, and openNLPmodels.en packages. Everything has gone to plan except when using the function annotations:
# Extract entities from an AnnotatedPlainTextDocument
entities <- function(doc, kind) {
  s <- doc$content
  a <- annotations(doc)[[1]] # Point of error
  if (hasArg(kind)) {
    k <- sapply(a$features, `[[`, "kind")
    s[a[k == kind]]
  } else {
    s[a[a$type == "entity"]]
  }
}
I call it like this:
entities(text_doc, kind = "person")
The thing is, even the code completion in RStudio does not seem to know any function called annotations. It shows annotation, annotate, annotations_in_spans, and others, but not annotations.
There is even a YouTube video that demonstrates the same code, and strangely, the presenter is able to use annotations there.
Package versions:
openNLP: v0.2-6
openNLPmodels.en: v1.5-1
rJava: v0.9-9
magrittr: v1.5
NLP: v0.2-0

The annotations method was associated with objects of type AnnotatedPlainTextDocument in earlier versions of the NLP package.
Here is the documentation for version 0.1-11.
The latest NLP version is 0.2-0.
The method for AnnotatedPlainTextDocument is now called annotation (no 's' at the end). From the documentation, the main difference seems to be that it returns a single Annotation object rather than a list of Annotation objects, so the [[1]] in the original code is no longer needed.
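To check which accessor your installed copy of NLP actually provides, you can ask its namespace directly. This is a minimal base-R sketch; nlp_accessors() is a hypothetical helper name, and the result depends on the installed NLP version (per this answer, 0.2-0 would be expected to list annotation).

```r
# Report the installed version of a package and which of the two accessor
# names (annotation, annotations) it exports. nlp_accessors() is a
# hypothetical helper, not part of NLP.
nlp_accessors <- function(pkg = "NLP") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA_character_)  # package is not installed
  }
  message(pkg, " version: ", as.character(utils::packageVersion(pkg)))
  intersect(c("annotation", "annotations"), getNamespaceExports(pkg))
}

nlp_accessors()
```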

A function called annotations exists in many packages; see:
https://www.rdocumentation.org/search?q=annotations
Although it is probably not the best approach, if you are looking for a specific function without knowing which package it belongs to, this site can help you find such a package.
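A similar search can also be run locally from R, without a website. This sketch uses only base R: apropos() looks at the packages currently attached, while help.search() scans the help pages of all installed packages (the first call may take a while as it builds its index).

```r
# Objects whose names start with "annotation" in attached packages
# (for example after library(NLP)); ignore.case = FALSE keeps it exact.
print(apropos("^annotation", ignore.case = FALSE))

# Installed packages documenting a help topic named exactly "annotations".
# agrep = FALSE switches help.search() from fuzzy to regular-expression matching.
hits <- utils::help.search("^annotations$", fields = "alias", agrep = FALSE)
pkgs <- unique(as.character(hits$matches$Package))
print(pkgs)
```

If pkgs comes back empty, no installed package documents a topic with that name.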

Try this:
# Extract entities from an AnnotatedPlainTextDocument
entities <- function(doc, kind) {
  s <- doc$content
  a <- annotation(doc)
  if (hasArg(kind)) {
    k <- sapply(a$features, `[[`, "kind")
    s[a[k == kind]]
  } else {
    s[a[a$type == "entity"]]
  }
}
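To try this version of entities() without running the openNLP annotators, a small AnnotatedPlainTextDocument can be built by hand. This is a sketch assuming NLP >= 0.2-0 is installed; the sentence, character offsets, and kind values are made up for illustration and stand in for what openNLP's entity annotators would produce.

```r
library(NLP)

# entities() as in the answer, using annotation() (no 's')
entities <- function(doc, kind) {
  s <- doc$content
  a <- annotation(doc)
  if (hasArg(kind)) {
    k <- sapply(a$features, `[[`, "kind")
    s[a[k == kind]]
  } else {
    s[a[a$type == "entity"]]
  }
}

# Hand-made entity annotations; offsets are 1-based character positions
s <- String("Pierre Vinken joined Acme Corp in London.")
a <- Annotation(
  id = 1:3,
  type = rep("entity", 3),
  start = c(1L, 22L, 35L),
  end = c(13L, 30L, 40L),
  features = list(list(kind = "person"),
                  list(kind = "organization"),
                  list(kind = "location"))
)
doc <- AnnotatedPlainTextDocument(s, a)

print(entities(doc, kind = "person"))
print(entities(doc))
```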

Related

Having trouble understanding this syntax for R

I have been given the following R code, but I am having trouble understanding what it does. In fact, I cannot even run it in R because of its syntax. I assume the syntax is for lower-level code behind R. If someone could explain what's happening here and translate it into executable R code, that would be very helpful.
soft_thresholding = function(x,a){
result a)] a)] - a
result[which(x < -a)] = x[which(x < -a)] + a
return(result)}
Here is a summary of my findings. This is not a definitive answer, but it could help the questioner.
If the code was published through WordPress, x <- a can end up looking like x < -a. This URL confirms that assumption.
A further online search for the function name in the question, "soft_thresholding", shows that the function is probably attempting to implement soft thresholding as defined here.
More searching about soft thresholding leads to a CRAN package, available here.
A deeper dive into the R folder of the package shows the following:
soft.threshold <- function(x, sumabs = 1)
  return(soft(x, BinarySearch(x, sumabs)))
The function above seems very close to the code in the question.
Furthermore, soft.threshold calls another internal function, BinarySearch, which looks like this:
BinarySearch <- function(argu, sumabs) {
  if (norm2(argu) == 0 || sum(abs(argu / norm2(argu))) <= sumabs) return(0)
  lam_max = max(abs(argu))
  lam1 <- 0
  lam2 <- lam_max
  iter <- 1
  while (iter < 500) {
    su <- soft(argu, (lam1 + lam2) / 2)
    if (sum(abs(su / norm2(su))) < sumabs) {
      lam2 <- (lam1 + lam2) / 2
    } else {
      lam1 <- (lam1 + lam2) / 2
    }
    if ((lam2 - lam1) / lam1 < 1e-10) {
      if (lam2 != lam_max) {
        return(lam2)
      } else {
        return(lam1)
      }
    }
    iter <- iter + 1
  }
  warning("Didn't quite converge")
  return((lam1 + lam2) / 2)
}
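To see what BinarySearch() is looking for, here is a small standalone sketch. The definitions of soft() and norm2() are assumptions (the package's own helpers are not shown above), and the hand-written bisection loop is swapped out for stats::uniroot(). The goal is the same: find the threshold lambda at which the L1/L2 ratio of soft(x, lambda) equals sumabs.

```r
# Assumed helpers: elementwise soft-thresholding and the Euclidean norm
soft  <- function(x, d) sign(x) * pmax(abs(x) - d, 0)
norm2 <- function(x) sqrt(sum(x^2))

# L1/L2 ratio of the thresholded vector for a given threshold lambda
l1_l2_ratio <- function(x, lambda) {
  s <- soft(x, lambda)
  sum(abs(s)) / norm2(s)
}

# Same target as BinarySearch(), but the root search is done by uniroot()
find_lambda <- function(x, sumabs) {
  if (norm2(x) == 0 || l1_l2_ratio(x, 0) <= sumabs) return(0)
  # stay just below max(|x|), where soft() would return all zeros
  upper <- max(abs(x)) * (1 - 1e-8)
  uniroot(function(l) l1_l2_ratio(x, l) - sumabs,
          lower = 0, upper = upper, tol = 1e-12)$root
}

x <- c(3, -1, 0.5, 2, -4)
lam <- find_lambda(x, sumabs = 1.5)
l1_l2_ratio(x, lam)  # approximately 1.5
```

Like BinarySearch(), this assumes sumabs lies strictly between 1 and the unthresholded ratio; outside that range uniroot() will report that the endpoints do not bracket a root.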
Following this trail suggests that the function in the question is perhaps attempting to mimic the function soft.threshold in the CRAN package "RGCCA".
Hope it helps.
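For reference, here is an executable version of the textbook soft-thresholding operator, S_a(x) = sign(x) * max(|x| - a, 0). It is a sketch of the standard definition, not a recovery of the question's garbled line. Note that x < -a in the question's third line is already valid R ("x is less than minus a").

```r
# Soft-thresholding: values within [-a, a] become 0, others shrink toward 0 by a
soft_thresholding <- function(x, a) {
  result <- numeric(length(x))      # start from all zeros
  result[x > a]  <- x[x > a] - a    # shrink large positive values
  result[x < -a] <- x[x < -a] + a   # shrink large negative values
  result
}

# Equivalent vectorised one-liner
soft_thresholding2 <- function(x, a) sign(x) * pmax(abs(x) - a, 0)

soft_thresholding(c(-3, -0.5, 0, 0.5, 2), 1)  # -2 0 0 0 1
```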

Is there a way to manually attach packages and globals with `future.apply::future_apply`

I am using R's excellent future package. Its documentation mentions %globals% and %packages% for assigning global variables and packages to be evaluated in the future environment, but those seem to only work with %<-%.
My question is: is there a way to do the same with future_apply? I tried
x = 1
future.apply::future_sapply(1:50, function(y) {
glue("{x}")
}) %packages% "glue" %globals% "x"
and it doesn't work.
If you look at the help page for future_sapply, you'll see that future_lapply has the arguments future.packages and future.globals, and if you read carefully, these are also used in future_sapply. So this works:
x = 1
future.apply::future_sapply(1:50, function(y) {
glue("{x}")
}, future.packages = "glue", future.globals = "x")
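Here is a fuller sketch of the same call with a multisession backend, where globals and packages genuinely have to be shipped to background R processes. It assumes the future.apply and glue packages are installed; the worker count of 2 is arbitrary.

```r
library(future.apply)  # future_sapply(); also attaches 'future', which provides plan()

plan(multisession, workers = 2)  # evaluate futures in 2 background R sessions

x <- 1
res <- future_sapply(1:5, function(y) {
  glue("{x}-{y}")  # glue() is available because future.packages attaches glue on each worker
}, future.packages = "glue", future.globals = "x")

plan(sequential)  # shut the background workers down again
print(res)
```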

Which library is the pr_DB object defined in?

I am completely new to R.
I am trying to use the dist function with a custom distance function, based on the specification here, but I was unable to pass the custom function directly by name. So I tried to add it using the registry described here, but it appears that I am missing a library.
However, I'm not sure which library I need, and I cannot find a reference that names it.
Here's a code sample that I'm trying to run:
library(cluster)
myfun <- function(x, y) {
  numDiffs <- 0;
  for (i in x) {
    if (x[i] != y[i])
      numDiffs <- numDiffs + 1;
  }
  return(numDiffs);
}
summary(pr_DB)
pr_DB$set_entry(FUN = myfun, names = c("myfun", "vectorham"))
pr_DB$get_entry("MYFUN")
Here's the error:
Error in summary(pr_DB) : object 'pr_DB' not found
Execution halted
You need to learn the conventions used by R help pages. The "{proxy}" at the top of the page you linked to is the answer to your question: pr_DB is defined in the proxy package, so you need library(proxy). Help pages follow the convention "topic {package_name}".
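Concretely, loading proxy makes pr_DB available. This sketch assumes the proxy package is installed; the toy matrix is made up, and the custom function is rewritten as sum(x != y) because the question's loop for (i in x) iterates over the values of x rather than its indices.

```r
library(proxy)  # defines pr_DB and a dist() that understands registry entries

# Count of positions where two rows differ (Hamming-style)
myfun <- function(x, y) sum(x != y)

# Register the function under two aliases (lookup is case-insensitive)
pr_DB$set_entry(FUN = myfun, names = c("myfun", "vectorham"))

m <- rbind(a = c(1, 0, 1, 1),
           b = c(1, 1, 1, 0),
           c = c(0, 0, 1, 1))
d <- proxy::dist(m, method = "vectorham")
print(d)  # a-b = 2, a-c = 1, b-c = 3

pr_DB$delete_entry("myfun")  # remove the entry again when done
```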

rJava: using java/util/Vector with a certain template class

I'm currently writing an R script that uses a Java .jar, which makes use of the java.util.Vector class; in this case the Vector is parameterized with a non-native class (one from the jar rather than the JDK). In the Java source code:
public static Vector<ClassName> methodname(String param)
I found nothing in the rJava documentation on how to handle a template (generic) class like Vector, or on what to write when using .jcall or any other method.
I'm currently trying to do something like this:
v <- .jnew("java/util/Vector")
b <- .jcall(v, returnSig = "Ljava/util/Vector", method = "methodname",param)
but R obviously throws an exception:
method methodname with signature (Ljava/lang/String;)Ljava/util/Vector not found
How do I work the template class into this command? Or for that matter, how do I create a vector of a certain class in the first place? Is this possible?
rJava does not know about Java generics; there is no syntax that will create a Vector of a given type. You can only create Vectors of Objects.
Why stick with the old .jcall API when you can use the J system, which lets you work with Java objects much more nicely:
> v <- new( J("java.util.Vector") )
> v$add( 1:10 )
[1] TRUE
> v$size()
[1] 1
# code completion
> v$
v$add( v$getClass() v$removeElement(
v$addAll( v$hashCode() v$removeElementAt(
v$addElement( v$indexOf( v$retainAll(
v$capacity() v$insertElementAt( v$set(
v$clear() v$isEmpty() v$setElementAt(
v$clone() v$iterator() v$setSize(
v$contains( v$lastElement() v$size()
v$containsAll( v$lastIndexOf( v$subList(
v$copyInto( v$listIterator( v$toArray(
v$elementAt( v$listIterator() v$toArray()
v$elements() v$notify() v$toString()
v$ensureCapacity( v$notifyAll() v$trimToSize()
v$equals( v$remove( v$wait(
v$firstElement() v$removeAll( v$wait()
v$get( v$removeAllElements()
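If you do want to stay with .jcall, two details in the question's call matter: an object return signature needs its trailing semicolon ("Ljava/util/Vector;"), and a static method is called on the class name rather than on a Vector instance. Because generics are erased at runtime, Vector<ClassName> is just java.util.Vector and its elements come back as java.lang.Object. This sketch assumes rJava and a working JVM and uses only JDK classes; com/example/ClassName in the comment is a hypothetical stand-in for the class in your jar.

```r
library(rJava)
.jinit()  # start the JVM

# Static method, called on the class name; note the ';' in the return signature
s42 <- .jcall("java/lang/String", "Ljava/lang/String;", "valueOf", 42L)

# For the question's jar this would look like (hypothetical class name):
# res <- .jcall("com/example/ClassName", "Ljava/util/Vector;", "methodname", param)

# After erasure, Vector.add(E) is add(Object), so cast arguments to Object
v <- .jnew("java/util/Vector")
.jcall(v, "Z", "add", .jcast(.jnew("java/lang/String", "alpha")))
.jcall(v, "Z", "add", .jcast(.jnew("java/lang/String", "beta")))

n <- .jcall(v, "I", "size")
# get() is declared to return Object; cast back to the element class before use
first <- .jcast(.jcall(v, "Ljava/lang/Object;", "get", 0L), "java/lang/String")
first_str <- .jcall(first, "Ljava/lang/String;", "toString")
```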

R's text mining package... adding a new function to getTransformations

I am attempting to add a new stemmer that works using a table lookup. If h is the hash that contains the stemming operations, it is encoded as follows: keys are words before stemming and values are words after stemming.
Ideally, I would like to add a custom hash that allows me to do the following:
myCorpus = tm_map(myCorpus, replaceWords, h)
The replaceWords function is applied to each document in myCorpus and uses the hash to stem the contents of the document.
Here is the sample code for my replaceWords function:
hash_replace <- function(x, h) {
  if (length(h[[x]]) > 0) {
    return(h[[x]])
  } else {
    return(x)
  }
}
replaceWords <- function(x, h) {
  y = tolower(unlist(strsplit(x, " ")))
  y = y[which(as.logical(nchar(y)))]
  z = unlist(lapply(y, hash_replace, h))
  return(paste(unlist(z), collapse = ' '))
}
Although this works, the transformed corpus no longer contains documents of type "TextDocument" or "PlainTextDocument", but of type "character".
I tried using
return(as.PlainTextDocument(paste(unlist(z),collapse=' ')))
but that gives me an error when I run it.
In previous versions of R's tm package, I did see a replaceWords function that allowed for synonym- and WordNet-based substitution, but I no longer see it in the current version of tm (in particular, when I call getTransformations()).
Does anybody out there have ideas on how I can make this happen?
Any help is greatly appreciated.
Thanks,
Shivani Rao
You just need to use the PlainTextDocument function instead of as.PlainTextDocument. R automatically returns the value of the last expression in a function, so it works if you simply make the last line
PlainTextDocument(paste(unlist(z),collapse=' '))
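With current versions of tm, another option is to keep the word replacement working on plain character strings and wrap it in content_transformer(), which writes the result back into each document so the documents stay PlainTextDocuments. This sketch assumes tm is installed; the lookup table is a named character vector standing in for the hash, and replace_words() and the sample texts are made up for illustration.

```r
library(tm)

# Lookup table: names are words before stemming, values are words after
h <- c(mice = "mouse", ran = "run", running = "run")

replace_words <- function(x, h) {
  words <- tolower(unlist(strsplit(x, " ")))
  words <- words[nzchar(words)]   # drop empty strings left by repeated spaces
  hit <- words %in% names(h)
  words[hit] <- h[words[hit]]     # table lookup for known words only
  paste(words, collapse = " ")
}

corpus <- VCorpus(VectorSource(c("The mice ran", "running  late")))
# content_transformer() applies replace_words to each document's text and
# stores the result back in the document, so the document class is kept
corpus <- tm_map(corpus, content_transformer(replace_words), h)

content(corpus[[1]])
class(corpus[[1]])
```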
