coreNLP R package: annotateString gives NULL openie

I am new to the R coreNLP package. I installed it with the objective of using the getOpenIE function. However, even when I run the code on a very simple sentence, annotateString doesn't produce any openie annotation. See the code below:
library(coreNLP)
downloadCoreNLP()
initCoreNLP()
text <- "food is good and staff is friendly"
t <- annotateString(text)
t$openie
#> NULL
getOpenIE(t)
#> NULL
Is this a common issue? Has anyone found a solution? Thank you.

I had trouble with this too. You need to add type = "english_all" when calling the initCoreNLP function. See a working reprex below:
library(coreNLP)
#> Warning: package 'coreNLP' was built under R version 3.4.4
downloadCoreNLP()
#> [1] 0
initCoreNLP(type = "english_all")
text <- "food is good and staff is friendly"
t <- annotateString(text)
t$openie
#> subject_start subject_end subject relation_start relation_end relation
#> 1 4 5 staff 5 6 is
#> 2 0 1 food 1 2 is
#> object_start object_end object
#> 1 6 7 friendly
#> 2 2 3 good
getOpenIE(t)
#> subject_start subject_end subject relation_start relation_end relation
#> 1 4 5 staff 5 6 is
#> 2 0 1 food 1 2 is
#> object_start object_end object
#> 1 6 7 friendly
#> 2 2 3 good
Created on 2018-12-29 by the reprex package (v0.2.1)

Related

R-Mosaic: Is there a mean.n function?

Is there a mean.n function (as in SPSS) in mosaic in R?
I have 3 columns of data (including NA values) and I want a new column holding the mean of the 3 data points in each row. How do I do that?
rowMeans might be just what you are looking for. It returns the row-wise mean; make sure to select/subset the right columns.
Here is an example
# Example data
ex_data = data.frame(A = rnorm(10), B = rnorm(10)*2, C = rnorm(10)*5)
ex_data
#> A B C
#> 1 0.2838024 -1.8784902 -2.7519131
#> 2 -0.4090575 1.6457548 6.1643390
#> 3 0.2061454 0.2103105 7.2798434
#> 4 -1.5246471 -0.6071042 -7.2411695
#> 5 -1.0461921 -2.6290405 -1.3840000
#> 6 -1.4802151 1.9323571 5.8539328
#> 7 0.1827485 0.1608848 -0.5157152
#> 8 -0.3006229 2.8650122 -1.4393171
#> 9 2.2981543 -0.2790727 2.6193970
#> 10 1.0495951 -0.9061784 -4.4013859
# Use rowMeans
ex_data$abc_means = rowMeans(x = ex_data[1:3])
ex_data
#> A B C abc_means
#> 1 0.2838024 -1.8784902 -2.7519131 -1.44886698
#> 2 -0.4090575 1.6457548 6.1643390 2.46701208
#> 3 0.2061454 0.2103105 7.2798434 2.56543308
#> 4 -1.5246471 -0.6071042 -7.2411695 -3.12430691
#> 5 -1.0461921 -2.6290405 -1.3840000 -1.68641084
#> 6 -1.4802151 1.9323571 5.8539328 2.10202491
#> 7 0.1827485 0.1608848 -0.5157152 -0.05736064
#> 8 -0.3006229 2.8650122 -1.4393171 0.37502404
#> 9 2.2981543 -0.2790727 2.6193970 1.54615953
#> 10 1.0495951 -0.9061784 -4.4013859 -1.41932305
You mentioned that you have NAs in your data; make sure to include na.rm = TRUE if appropriate.
Created on 2021-04-02 by the reprex package (v0.3.0)
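For example, with missing values (a minimal sketch; the data and NA placement here are made up for illustration):
ex_na <- data.frame(A = c(1, NA, 3), B = c(4, 5, NA), C = c(NA, 8, 9))
# Without na.rm, any NA in a row makes that row's mean NA
rowMeans(ex_na)
#>  1  2  3 
#> NA NA NA
# With na.rm = TRUE, the mean is taken over the non-missing values
rowMeans(ex_na, na.rm = TRUE)
#>   1   2   3 
#> 2.5 6.5 6.0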

R detect zeroes in ts object

Simple question: in R, what's the best way to detect whether there is a zero somewhere in a time series (ts class)? I run X13 (seasonal package) on hundreds of time series and I would like to identify those that contain zero values (since multiplicative models don't work when they encounter a zero). If I could detect those series, I could use an IF-THEN-ELSE statement with the proper specs for X13.
Thank you!
You can detect them, and also replace or delete them:
ts <- ts(0:10)
## Deleting
ts[ts != 0]
#> [1] 1 2 3 4 5 6 7 8 9 10
## Replacing
replace(ts, ts==0, 1)
#> Time Series:
#> Start = 1
#> End = 11
#> Frequency = 1
#> [1] 1 1 2 3 4 5 6 7 8 9 10
## Detecting
any(ts == 0)
#> [1] TRUE
Created on 2020-10-29 by the reprex package (v0.3.0)
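Applied to the X13 use case, a minimal sketch (assuming the seasonal package and its transform.function spec; the series list and spec choices here are illustrative, not a tested recipe for your data):
library(seasonal)

# Hypothetical list standing in for your hundreds of series
series_list <- list(a = AirPassengers,
                    b = ts(c(0, 1:59), frequency = 12, start = c(2000, 1)))

models <- lapply(series_list, function(x) {
  if (any(x == 0, na.rm = TRUE)) {
    seas(x, transform.function = "none")  # additive: log/multiplicative breaks on zeros
  } else {
    seas(x, transform.function = "log")   # multiplicative is fine
  }
})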

data.frame Using Vector of Names

Can I use a vector of variable names to make a data frame?
have=c("aaa","bbb","ccc","ddd","eee","fff","ggg","hhh","iii","jjj")
for(i in 1:10){assign(have[i],rnorm(10))}
want=data.frame(aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj)
I wonder if I can alter the last aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj somehow using have.
Assuming all the variables named in have exist in the global environment, you can also try this:
want <- as.data.frame(mget(have))
You could do
have=c("aaa","bbb","ccc","ddd","eee","fff","ggg","hhh","iii","jjj")
for(i in 1:10){assign(have[i],rnorm(10))}
want <- data.frame(sapply(have, get))
want
#> aaa bbb ccc ddd eee fff
#> 1 2.2111971 0.58169621 0.7558816 -1.6408627 0.7975625 0.09160389
#> 2 -0.7847731 1.60423888 0.3819555 -1.2061538 0.7545381 -0.64964125
#> 3 -1.2757056 0.57714761 0.4700359 -1.1041282 -0.3816839 0.40549014
#> 4 -0.0360762 -1.29007252 -0.7820075 -0.5319163 -0.2999686 0.51213744
#> 5 0.1763021 0.82259576 -0.4409983 1.4809103 -0.3658530 -0.16434920
#> 6 1.3196823 -0.18163744 1.5261259 1.3087872 -1.0644242 -1.31891628
#> 7 0.4076277 -0.89769591 -0.7778384 -0.3837985 -1.8659484 -1.53683062
#> 8 1.1872413 -0.06917426 0.3875081 0.4146543 -0.7035016 -0.63534985
#> 9 0.9037385 0.10581530 0.6210197 2.4435195 -1.2323838 0.84316865
#> 10 -0.8933586 1.47698413 0.4561502 1.0824430 2.2895535 0.05699095
#> ggg hhh iii jjj
#> 1 -0.4915989 -0.02034347 -1.6870239 -1.08651315
#> 2 1.7595238 0.47375431 0.5408044 0.65031636
#> 3 -2.0502394 0.85440730 -0.4114844 -0.17392623
#> 4 -1.1268393 0.68303043 1.1722424 -0.90590156
#> 5 -1.3235682 0.59603361 -0.8958801 -0.94192724
#> 6 -0.3669457 -0.27870024 1.8228263 0.01478657
#> 7 0.6525810 -0.00354290 0.3757264 0.34386963
#> 8 -0.3378531 -0.45219282 -0.8959065 -0.43244283
#> 9 0.3931531 0.61264470 0.6359348 0.02984539
#> 10 -0.5256779 0.79624735 -2.2912426 -1.06220090
Created on 2020-10-03 by the reprex package (v0.3.0)
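As a side note (a sketch, not part of the original answers): you can avoid the assign()/get() pattern entirely by building the data frame directly from a named list:
have <- c("aaa","bbb","ccc","ddd","eee","fff","ggg","hhh","iii","jjj")
# replicate() produces a list of 10 random vectors; setNames() attaches the names
want <- as.data.frame(setNames(replicate(10, rnorm(10), simplify = FALSE), have))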

arulesSequences cspade function: "Error in file(con, "r") : cannot open the connection"

One day my routine cspade sequence mining in R suddenly failed with an error and some very strange output printed to the console. Here is example code:
library(arulesSequences)
data(zaki)
cspade(zaki, parameter=list(support=0.5))
It prints very long output (even with control = list(verbose = FALSE)), followed by an error:
CONF 4 9 2.7 2.5
MINSUPPORT 2 4
MINMAX 1 4
1 SUPP 4
2 SUPP 4
4 SUPP 2
6 SUPP 4
numfreq 4 : 0 SUMSUP SUMDIFF = 0 0
EXTRARYSZ 2465792
OPENED C:\Users\Dawid\AppData\Local\Temp\Rtmp279Wy5\cspade2cd4751e5905.idx
OFF 9 38
Wrote Offt 0.00099802
BOUNDS 1 5
WROTE INVERT 0.000998974
Total elapsed time 0.00299406
MINSUPPORT 2 out of 4 sequences
1 -- 4 4
2 -- 4 4
4 -- 2 2
6 -- 4 4
1 6 -- 3 3
2 6 -- 4 4
4 -> 6 -- 2 2
4 -> 2 6 -- 2 2
1 2 6 -- 3 3
1 2 -- 3 3
4 -> 2 -- 2 2
2 -> 1 -- 2 2
4 -> 1 -- 2 2
6 -> 1 -- 2 2
4 -> 6 -> 1 -- 2 2
2 6 -> 1 -- 2 2
4 -> 2 6 -> 1 -- 2 2
4 -> 2 -> 1 -- 2 2
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file
'C:\Users\Dawid\AppData\Local\Temp\Rtmp279Wy5\cspade2cd4751e5905.out': No
such file or directory
It looks like it is printing the mined rules to the console (which never happened before), and it ends with an error, so I can't assign the rules to a variable. Perhaps there is some problem with writing temporary files?
My configuration:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Packages:
arulesSequences_0.2-19
arules_1.6-1
(a newer version of arulesSequences exists, but the latest version, arulesSequences_0.2-20, fails in the same way)
Thank you!
One workaround is to use the R console instead of RStudio; the code should then work fine. I see that more people have the same problem. I have tried reinstalling RStudio together with the packages, and using an older RStudio version, but that didn't work.
Hope this helps, but I would be grateful for a full answer. Thanks!
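For those who want to stay inside RStudio, a possible workaround (a sketch, assuming the callr package; not a confirmed fix for this particular bug) is to run the mining in a separate, clean R process and return a plain data frame:
library(callr)

# Run cspade in an external R session; callr serializes the result back
seq_df <- callr::r(function() {
  library(arulesSequences)
  data(zaki)
  s <- cspade(zaki, parameter = list(support = 0.5))
  as(s, "data.frame")  # a plain data frame transfers safely between sessions
})
head(seq_df)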

Speed of Cleaning Text in R using a Dictionary

I currently have a list of misspellings and a list of corrections, indexed with a one-to-one relationship.
These corrections are specific to the work I am doing so I cannot use existing spelling correction packages.
Given a list of strings which I want to apply these corrections to, I have the following code:
for (i in 1:n) {
  new_text <- gsub(match[i], dict[i], new_text)
  new_text <- gsub('[[:punct:]]', '', new_text)
}
Although this gives the results I want, it takes most of the day to run.
I cannot figure out how to use apply functions because the operations happen in a specific order on the same object.
Is there anything else I can try to speed this up?
Edit: This is the very small test set I have put together to benchmark performance.
match <- c("\\b(abouta|aobut|bout|abot|abotu)\\b","\\b(avdised|advisd|advized|advsied)\\b","\\b(posible|possibl)\\b","\\b(replacment|repalcement|replacemnt|replcement|rplacement)\\b","\\b(tommorrow|tomorow|tommorow|tomorro|tommoro)\\b")
dict <- c('about','advised','possible','replacement','tomorrow')
new_text <- c('be advisd replacment coming tomorow','did you get the email aobut the repalcement tomorro','the customer has been avdised of a posible replacement','there is a replacement coming tomorrow','what time tommorow is the replacment coming')
n <- 5
Running my current code 1000 times on this data gives an elapsed time of 0.424 seconds.
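A quick vectorized variant worth trying first (a hedged sketch using stringi, not part of the original answer): stringi can apply all pattern/replacement pairs to each string in a single call, and the punctuation strip can move out of the loop. Note this removes punctuation once at the end, which may differ from the original if later patterns rely on earlier punctuation stripping:
library(stringi)

# Apply every misspelling -> correction pair to each string in one call
new_text <- stri_replace_all_regex(new_text, match, dict, vectorize_all = FALSE)
# Strip punctuation once, instead of once per dictionary entry
new_text <- gsub('[[:punct:]]', '', new_text)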
Try the corpus library, using a custom stemmer. The library lets you provide an arbitrary stemmer function. In your case you would use something like the following for your stemmer:
library(corpus)
dict <- strsplit(split = "\\|",
                 c("about"       = "abouta|aobut|bout|abot|abotu",
                   "advised"     = "avdised|advisd|advized|advsied",
                   "possible"    = "posible|possibl",
                   "replacement" = "replacment|repalcement|replacemnt|replcement|rplacement",
                   "tomorrow"    = "tommorrow|tomorow|tommorow|tomorro|tommoro"))
my_stemmer <- new_stemmer(unlist(dict), rep(names(dict), lengths(dict)))
Then, you can either pass this function as the stemmer argument to any function expecting text, or else you can create a corpus_text object with the stemmer attribute (as part of its token_filter that defines how text gets transformed to tokens):
new_text <- c('be advisd replacment coming tomorow',
              'did you get the email aobut the repalcement tomorro',
              'the customer has been avdised of a posible replacement',
              'there is a replacement coming tomorrow',
              'what time tommorow is the replacment coming')
Use term_stats to count (stemmed) token occurrences:
text <- as_corpus_text(new_text, stemmer = my_stemmer, drop_punct = TRUE)
term_stats(text)
#> term count support
#> 1 replacement 5 5
#> 2 tomorrow 4 4
#> 3 the 4 3
#> 4 coming 3 3
#> 5 a 2 2
#> 6 advised 2 2
#> 7 is 2 2
#> 8 about 1 1
#> 9 be 1 1
#> 10 been 1 1
#> 11 customer 1 1
#> 12 did 1 1
#> 13 email 1 1
#> 14 get 1 1
#> 15 has 1 1
#> 16 of 1 1
#> 17 possible 1 1
#> 18 there 1 1
#> 19 time 1 1
#> 20 what 1 1
#> ⋮ (21 rows total)
Use text_locate to find instances of (stemmed) tokens in the original text:
text_locate(text, "replacement")
#> text before instance after
#> 1 1 be advisd replacment coming tomorow
#> 2 2 …u get the email aobut the repalcement tomorro
#> 3 3 …been avdised of a posible replacement
#> 4 4 there is a replacement coming tomorrow
#> 5 5 what time tommorow is the replacment coming
The results of the stemming function get cached, so this is all very fast.
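If you need the corrected strings themselves rather than token statistics, one option (a sketch, assuming whitespace-joined tokens are acceptable for your downstream work) is to rebuild each document from its stemmed tokens:
# text_tokens applies the stemmer, so each document comes back corrected
corrected <- vapply(text_tokens(text), paste, character(1), collapse = " ")
corrected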
More examples at http://corpustext.com/articles/stemmer.html
