arulesSequences cspade function: "Error in file(con, "r") : cannot open the connection" - r

One day I tried to run my routine cspade sequence mining in R and it suddenly failed with an error and some very strange output printed to the console. Here is the example code:
library(arulesSequences)
data(zaki)
cspade(zaki, parameter=list(support=0.5))
It produces very long output (even with the option control=list(verbose=F)), followed by an error:
CONF 4 9 2.7 2.5
MINSUPPORT 2 4
MINMAX 1 4
1 SUPP 4
2 SUPP 4
4 SUPP 2
6 SUPP 4
numfreq 4 : 0 SUMSUP SUMDIFF = 0 0
EXTRARYSZ 2465792
OPENED C:\Users\Dawid\AppData\Local\Temp\Rtmp279Wy5\cspade2cd4751e5905.idx
OFF 9 38
Wrote Offt 0.00099802
BOUNDS 1 5
WROTE INVERT 0.000998974
Total elapsed time 0.00299406
MINSUPPORT 2 out of 4 sequences
1 -- 4 4
2 -- 4 4
4 -- 2 2
6 -- 4 4
1 6 -- 3 3
2 6 -- 4 4
4 -> 6 -- 2 2
4 -> 2 6 -- 2 2
1 2 6 -- 3 3
1 2 -- 3 3
4 -> 2 -- 2 2
2 -> 1 -- 2 2
4 -> 1 -- 2 2
6 -> 1 -- 2 2
4 -> 6 -> 1 -- 2 2
2 6 -> 1 -- 2 2
4 -> 2 6 -> 1 -- 2 2
4 -> 2 -> 1 -- 2 2
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file 'C:\Users\Dawid\AppData\Local\Temp\Rtmp279Wy5\cspade2cd4751e5905.out': No such file or directory
It looks like it is printing the mined rules to the console (which has never happened before). It then ends with an error, so I can't store the rules in a variable. Maybe it is some problem with writing temporary files?
My configuration:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Packages:
arulesSequences_0.2-19
arules_1.6-1
(arulesSequences has a newer version, but with the latest version, arulesSequences_0.2-20, it fails in the same way.)
Thank you!

One workaround is to run the code in the plain R console instead of RStudio; it should work fine there.
I see that more people have the same problem. I tried reinstalling RStudio together with the packages, and I also tried an older RStudio version, but neither helped.
I hope this helps, but I would be grateful for a full answer. Thanks!
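The question's guess about temporary files can be checked with a few lines of base R. The sketch below does not fix cspade; it only reports whether the session appears to run inside RStudio and whether the session's temporary directory is writable. The RSTUDIO environment variable check is an assumption about how RStudio marks its sessions, and the probe file name is made up for illustration.

```r
# Assumption: RStudio sets the RSTUDIO environment variable to "1" in its R sessions.
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

# Write, read back, and delete a small probe file in the session temp directory,
# the same directory that appears in the cspade error message.
probe <- tempfile(pattern = "probe", tmpdir = tempdir(), fileext = ".out")
writeLines("test", probe)
can_write <- file.exists(probe)
can_read <- identical(readLines(probe), "test")
unlink(probe)

cat("Running in RStudio:", in_rstudio, "\n")
cat("Temp dir writable: ", can_write && can_read, "\n")
```

If the temp directory turns out to be writable, the missing .out file is more likely related to how the output of the external cspade program is captured than to file permissions, which would be consistent with the plain R console workaround.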

Related

R- Data Analytic syntax

Purpose: I want to repeat in R an analysis I have already done in Python. The code is below; kindly help me write the equivalent code in R.
Question no 1:
For the table below
           caught            bowled           run out               lbw           stumped
               62                21                 8                 4                 4
caught and bowled        hit wicket
                2                 1
But when I converted it back to a `dataframe` for use with `ggplot`, it came out as
   A Freq
1  1    1
2  2    1
3  4    2
4  8    1
5 21    1
6 62    1
How do I avoid this? Kindly advise.
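A minimal base R sketch of what may be going on in Question no 1, assuming the counts came from a column of dismissal types (the vector name dismissal_kind is hypothetical and rebuilt here from the counts shown). Calling as.data.frame() directly on the table keeps one row per category, whereas tabulating the counts a second time reproduces the unwanted A/Freq output.

```r
# Hypothetical raw data, rebuilt from the counts shown in the question.
counts <- c(caught = 62, bowled = 21, "run out" = 8, lbw = 4, stumped = 4,
            "caught and bowled" = 2, "hit wicket" = 1)
dismissal_kind <- rep(names(counts), times = counts)

tab <- table(dismissal_kind)

# Converting the table itself gives one row per category with its frequency.
df_ok <- as.data.frame(tab, stringsAsFactors = FALSE)
print(df_ok)

# Tabulating the counts again counts how often each count occurs,
# which matches the unwanted output in the question.
df_bad <- as.data.frame(table(as.vector(tab)))
print(df_bad)
```

df_ok has the columns dismissal_kind and Freq, which is the shape a bar chart usually needs.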
Question no 2:
Python code is as below:
len(df_warner[df_warner['batsman_runs']==6])
# What is the equivalent R syntax?
df_six<-df_warner2[(df_warner2$batsman_runs==6),]
nrow(df_six) # worked well

coreNLP- r package annotateString gives Null openie

I am new to the R coreNLP package. I just installed it with the objective of using the function getOpenIE. However, even when I run the code on a very simple sentence, the annotateString function doesn't produce the openie annotation. See the code below:
library(coreNLP)
downloadCoreNLP()
initCoreNLP()
text <- "food is good and staff is friendly"
t <- annotateString(text)
> t$openie
NULL
> getOpenIE(t)
NULL
Is this a common issue? Has anyone found a solution yet? Thank you.
I had trouble with this too. You need to add type = "english_all" when calling the initCoreNLP function. See a working reprex below:
library(coreNLP)
#> Warning: package 'coreNLP' was built under R version 3.4.4
downloadCoreNLP()
#> [1] 0
initCoreNLP(type = "english_all")
text <- "food is good and staff is friendly"
t <- annotateString(text)
t$openie
#> subject_start subject_end subject relation_start relation_end relation
#> 1 4 5 staff 5 6 is
#> 2 0 1 food 1 2 is
#> object_start object_end object
#> 1 6 7 friendly
#> 2 2 3 good
getOpenIE(t)
#> subject_start subject_end subject relation_start relation_end relation
#> 1 4 5 staff 5 6 is
#> 2 0 1 food 1 2 is
#> object_start object_end object
#> 1 6 7 friendly
#> 2 2 3 good
Created on 2018-12-29 by the reprex package (v0.2.1)

Is R able to compute contingency tables on big file without putting the whole file in RAM?

Let me explain the question:
I know the functions table or xtabs compute contingency tables, but they expect a data.frame, which is always stored in RAM. It's really painful when trying to do this on a big file (say 20 GB, the maximum I have to tackle).
On the other hand, SAS is perfectly able to do this, because it reads the file line by line and updates the result in the process. Hence there is only ever one line in RAM, which is much more acceptable.
I have done the same as SAS with ad hoc Python programs on occasion, when I had to do more complicated things that I either didn't know how to do in SAS or thought were too cumbersome. Python's syntax and built-in features (dictionaries, regular expressions...) compensate for its weaknesses (speed, mainly, but when reading 20 GB, speed is limited by the hard drive anyway).
My question, then: I would like to know if there are packages to do this in R. I know it's possible to read a file line by line, like I do in Python, but computing simple statistics (contingency tables for instance) on a big file is such a basic task that I feel there should be some more or less "integrated" feature to do it in a statistical package.
Please tell me if this question should be asked on "Cross Validated". I had a doubt, since it's more about software than statistics.
You can use the package ff for this. It uses the hard disk drive instead of RAM, but it is implemented in such a way that it is not (significantly) slower than the normal way R uses RAM.
This is from the package description:
The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory.
I think this will solve your problem of loading a 20GB file in RAM. I have used it myself for such purposes and it worked great.
See here a small example as well. From the example on the xtabs documentation:
Base R
#example from ?xtabs
d.ergo <- data.frame(Type = paste0("T", rep(1:4, 9*4)),
Subj = gl(9, 4, 36*4))
> print(xtabs(~ Type + Subj, data = d.ergo)) # 4 replicates each
Subj
Type 1 2 3 4 5 6 7 8 9
T1 4 4 4 4 4 4 4 4 4
T2 4 4 4 4 4 4 4 4 4
T3 4 4 4 4 4 4 4 4 4
T4 4 4 4 4 4 4 4 4 4
ff package
library(ff)
#convert to ff
d.ergoff <- as.ffdf(d.ergo)
> print(xtabs(~ Type + Subj, data = d.ergoff)) # 4 replicates each
Subj
Type 1 2 3 4 5 6 7 8 9
T1 4 4 4 4 4 4 4 4 4
T2 4 4 4 4 4 4 4 4 4
T3 4 4 4 4 4 4 4 4 4
T4 4 4 4 4 4 4 4 4 4
You can check here for more information on memory manipulation.
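For comparison, the line-by-line approach described in the question can also be sketched in base R, without ff. This is a minimal illustration under stated assumptions: the file is a plain comma-separated file with a header row, the helper name chunked_xtabs and the chunk size are made up, and a small temporary CSV stands in for the 20 GB file. Only one chunk of rows plus the running counts is held in memory at a time.

```r
# Stand-in for the big file: the d.ergo example from ?xtabs written to a temporary CSV.
d.ergo <- data.frame(Type = paste0("T", rep(1:4, 9 * 4)),
                     Subj = gl(9, 4, 36 * 4))
path <- tempfile(fileext = ".csv")
write.csv(d.ergo, path, row.names = FALSE)

# Hypothetical helper: tabulate Type x Subj while reading chunk_size rows at a time.
chunked_xtabs <- function(path, chunk_size = 50L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # First line holds the column names (write.csv quotes them).
  header <- gsub('"', "", strsplit(readLines(con, n = 1L), ",")[[1]])
  total <- NULL
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      colClasses = "character")
    # Count this chunk, then fold it into the running totals.
    counts <- as.data.frame(table(chunk[c("Type", "Subj")]),
                            stringsAsFactors = FALSE)
    total <- aggregate(Freq ~ Type + Subj, data = rbind(total, counts), FUN = sum)
  }
  xtabs(Freq ~ Type + Subj, data = total)
}

tab <- chunked_xtabs(path)
print(tab)
```

The result should agree with the xtabs output shown above; for a real file, a larger chunk size would reduce the overhead of the repeated aggregate() calls.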

Error while trying to do a prediction with bnlearn package - Bayesian network

I'm trying to build a prediction model with the bnlearn package, but I get an error: "Error in check.data(data) : the data are missing".
Here is my example data set:
dat <- read.table(text = " category birds wolfs snakes
yes 3 9 7
no 3 8 4
no 1 2 8
yes 1 2 3
yes 1 8 3
no 6 1 2
yes 6 7 1
no 6 1 5
yes 5 9 7
no 3 8 7
no 4 2 7
notsure 1 2 3
notsure 7 6 3
no 6 1 1
notsure 6 3 9
no 6 1 1 ",header = TRUE)
Here are the lines of code that I used to get the prediction:
dat$birds<-as.numeric(dat$birds)
dat$wolfs<-as.numeric(dat$wolfs)
dat$snakes<-as.numeric(dat$snakes)
training.set = dat[1:8,2:4 ]
demo.set = dat[8:16,2:4 ]
res <- hc(training.set)
fitted = bn.fit(res, training.set)
pred = predict(fitted, demo.set) # I get an error: "Error in check.data(data) : the data are missing."
Any idea how to solve it?
predict(fittedbn, node="column name to predict", data=testdata) worked for me. The node to predict has to be specified, and the data set should be passed by name as data=.
I don't have bnlearn installed, but from your code I guess that the problem is that you didn't include the output (the category column) in the training set. Change:
training.set = dat[1:8,]
and see if it works.
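A minimal sketch of the first suggestion applied to the question's data, assuming bnlearn is installed. Two choices here are illustrative rather than taken from the question: the node being predicted ("snakes") is picked arbitrarily from the numeric columns, and the demo set starts at row 9 instead of row 8 so that no row appears in both sets.

```r
library(bnlearn)

dat <- read.table(text = "category birds wolfs snakes
yes 3 9 7
no 3 8 4
no 1 2 8
yes 1 2 3
yes 1 8 3
no 6 1 2
yes 6 7 1
no 6 1 5
yes 5 9 7
no 3 8 7
no 4 2 7
notsure 1 2 3
notsure 7 6 3
no 6 1 1
notsure 6 3 9
no 6 1 1", header = TRUE)

# bnlearn expects doubles rather than integers for continuous variables.
num_cols <- c("birds", "wolfs", "snakes")
dat[num_cols] <- lapply(dat[num_cols], as.numeric)

training.set <- dat[1:8, num_cols]
demo.set <- dat[9:16, num_cols]  # 9:16 instead of 8:16, to avoid reusing row 8

res <- hc(training.set)
fitted <- bn.fit(res, training.set)

# Name the node to predict and pass the new data as `data`.
pred <- predict(fitted, node = "snakes", data = demo.set)
print(pred)
```

With only eight training rows the learned network is not meaningful; the point is only the shape of the predict() call.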

Converting R data frame with RDS package: recruitment id error?

I am using the RDS package for respondent-driven sampling survey data. I want to convert a regular R data frame to an rds.data.frame. To do so, I have been trying to use the as.rds.data.frame function from RDS.
Here is an excerpted section of my data frame, where the first case (id=1) is the 'seed' respondent (who has no recruiter). It contains the variables: id (respondent id number), recruit.id(id number of respondent who recruited him/her), netsize (respondent's network size) and population (estimate of whole population size).
df<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
recruit.id=c(-1,1,1,2,2,4,5,3,8,3),
netsize=c(6,6,6,5,5,4,4,3,4,6), population=rep(22,000, 10))
I then (try to) apply the relevant function:
new.df <-as.rds.data.frame(df,id=df$id,
recruiter.id=df$recruit.id,
network.size=df$netsize,
population.size=df$population,
max.coupons=2)
I get the error message:
Error in as.rds.data.frame(df, id = df$id, recruiter.id = df$recruit.id,: Invalid id
and the warning
In addition: Warning message:
In if (!(id %in% names(x))) stop("Invalid id") :
  the condition has length > 1 and only the first element will be used
I have tried assigning various 'recruiter id' values for seed participants, including -1,0 or their own id number but I still get the same message. I have also tried eliminating function arguments (coupon.max, population) or deleting seed respondents, but I still get the same message.
Package documentation says the function will fail if recruitment information is incomplete. As far as I can tell, this is not the case.
I am new to this, so if anyone can point me in the right direction I would be really grateful.
This seems to work:
colnames(df)[2:4] <- c("recruiter.id", "network.size.variable", "population.size")
as.rds.data.frame(df,max.coupons=2)
Passing the column names as strings also gives a result, with a warning:
as.rds.data.frame(df, id="id", recruiter.id="recruit.id",
network.size="netsize", population.size="population", max.coupons=2)
# An object of class "rds.data.frame"
#id: 1 2 3 4 5 6 7 8 9 10
#recruiter.id: -1 1 1 2 2 4 5 3 8 3
# id recruit.id netsize population
#1 1 -1 6 22
#2 2 1 6 22
#3 3 1 6 22
#4 4 2 5 22
#5 5 2 5 22
#6 6 4 4 22
#7 7 5 4 22
#8 8 3 3 22
#9 9 8 4 22
#10 10 3 6 22
# Warning message:
#In as.rds.data.frame(df, id = "id", recruiter.id = "recruit.id", :
#NAs introduced by coercion
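The warning quoted in the question shows the check `id %in% names(x)`, which suggests the function expects a column name rather than the column itself. The base R sketch below illustrates that difference without calling the RDS package. It also shows a separate detail of the question's data frame: rep(22,000, 10) is parsed as rep(22, 0, 10), which would explain why the population column prints as 22 in the output above. The value 22000 used here is an assumption about what was intended.

```r
df <- data.frame(id = 1:10,
                 recruit.id = c(-1, 1, 1, 2, 2, 4, 5, 3, 8, 3),
                 netsize = c(6, 6, 6, 5, 5, 4, 4, 3, 4, 6),
                 population = rep(22000, 10))  # assumed intended value

# The check from the warning message, with a column name and with the column itself.
by_name <- "id" %in% names(df)     # a single TRUE
by_value <- df$id %in% names(df)   # ten FALSE values, hence "Invalid id"

# The comma in 22,000 splits the number into two arguments.
pop_typo <- rep(22,000, 10)

print(by_name)
print(by_value)
print(pop_typo)
```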
