Error in makebin(data, file) : 'sid' invalid (order)

When I try to use the cSPADE algorithm from the arulesSequences package in R, I always receive the following error:
Error in makebin(data, file) : 'sid' invalid (order)
I think the problem is somehow related to the ordering of the data, but I don't know what exactly is wrong.
A similar problem was presented here, but the solution I tried to incorporate, using
arrange(sessions@itemsetInfo, sequenceID, eventID)
did not change anything. Thanks for your help!
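#This is how I pre-processed the data
library(arulesSequences)  # cspade(); also attaches arules (transactions, inspect)
library(dplyr)            # %>%, transmute(), arrange()
library(stringr)          # str_replace_all()
load("example.Rda")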
example.Rda %>% head(5) %>% knitr::kable()
str(example.Rda)
example.Rda = as.data.frame(example.Rda)
sessions <- as(example.Rda %>% transmute(items = question_type), "transactions")
transactionInfo(sessions)$sequenceID <- example.Rda$user_id
transactionInfo(sessions)$eventID <- example.Rda$item_id
itemLabels(sessions) <- str_replace_all(itemLabels(sessions), "items=", "")
inspect(head(sessions,10))
arrange(sessions@itemsetInfo, sequenceID, eventID)  # returns a sorted copy; 'sessions' itself is not reordered
str(sessions)
View(sessions)
#This is the definition of the itemsets after which the error occurs
itemsets <- cspade(sessions,
parameter = list(support = 0.001),
control = list(verbose = FALSE))
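The error message points at row order: cspade() expects the transactions to be sorted by sequenceID and, within each sequence, by eventID. Note that arrange() on sessions@itemsetInfo returns a sorted copy of that data frame and leaves sessions unchanged, which would explain why it made no difference. A minimal base-R sketch, assuming the fix is to sort the source data frame before building the transactions; the toy values and the helper is_cspade_ordered() are made up for illustration:

```r
# Toy stand-in for example.Rda (values are made up for illustration)
example <- data.frame(
  user_id       = c(2, 1, 2, 1),
  item_id       = c(5, 3, 1, 4),
  question_type = c("b", "a", "c", "a"),
  stringsAsFactors = FALSE
)

# Sort rows by sequence ID, then by event ID within each sequence
example <- example[order(example$user_id, example$item_id), ]
rownames(example) <- NULL

# Hypothetical helper: TRUE when sequence IDs never decrease and
# event IDs strictly increase within each sequence
is_cspade_ordered <- function(sid, eid) {
  !is.unsorted(sid) &&
    all(tapply(eid, sid, function(e) !is.unsorted(e, strictly = TRUE)))
}

is_cspade_ordered(example$user_id, example$item_id)

# With arulesSequences installed, the transactions would then be built
# from the sorted frame, exactly as in the question:
# sessions <- as(example %>% transmute(items = question_type), "transactions")
# transactionInfo(sessions)$sequenceID <- example$user_id
# transactionInfo(sessions)$eventID    <- example$item_id
```

Sorting before as(..., "transactions") keeps the itemsets and their sequenceID/eventID labels aligned, which reordering itemsetInfo alone would not.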

Related

Consistent error message while running grouping analysis in 'plspm' package

I am looking for some help in resolving an error using the partial least squares path modeling package ('plspm').
I can get results from a basic PLS-PM analysis, but I run into issues when using the grouping function, receiving this error message:
Error in if (w_dif < specs$tol || iter == specs$maxiter) break : missing value where TRUE/FALSE needed
I have no missing values, and all variables have the proper classification. I read elsewhere that observations with exactly the same values across all variables can cause problems; I deleted those and still face this issue. I also seem to face the issue only when I run the group comparison with the "bootstrap" method.
library(plspm)
library(dplyr)  # %>% and slice()
farmwood = read.csv("farmwood_groups(distance).csv", header = TRUE) %>%
slice(-c(119:123))
Control = c(0,0,0,0,0,0)
Normative = c(0,0,0,0,0,0)
B_beliefs = c(0,0,0,0,0,0)
P_control = c(1,0,0,0,0,0)
S_norm = c(0,1,0,0,0,0)
Behavior = c(0,0,1,1,1,0)
farmwood_path = rbind(Control, Normative, B_beliefs, P_control, S_norm, Behavior)
colnames(farmwood_path) = rownames(farmwood_path)
farmwood_blocks = list(14:18,20:23,8:13,24:27,19,4:7)
farmwood_modes = rep("A", 6)
farmwood_pls = plspm(farmwood, farmwood_path, farmwood_blocks, modes = farmwood_modes)
names(farmwood)[names(farmwood) == "QB3"] <- "Distance"
farmwood$Distance <- as.factor(farmwood$Distance)
distance_boot = plspm.groups(farmwood_pls, farmwood$Distance, method = "bootstrap")
distance_perm = plspm.groups(farmwood_pls, farmwood$Distance, method = "permutation")
The data is contained here:
https://www.dropbox.com/s/8vewuupywpi1jkt/farmwood_groups%28distance%29.csv?dl=0
Any help would be appreciated. Thank you in advance.
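One possible cause, offered as an assumption rather than a diagnosis: if an indicator has zero variance inside one group (or inside a bootstrap resample of a small group), standardizing it produces NaN, and a convergence check such as `w_dif < specs$tol` then compares against a missing value. A quick base-R check per group level; the helper name constant_columns_by_group() and the toy data are hypothetical:

```r
# Hypothetical helper: for each level of `group`, list the columns in `cols`
# whose variance is zero or undefined (NA) within that group
constant_columns_by_group <- function(data, group, cols = names(data)) {
  by_group <- split(data[cols], group)
  lapply(by_group, function(d) {
    v <- vapply(d, function(x) stats::var(as.numeric(x)), numeric(1))
    names(v)[is.na(v) | v == 0]
  })
}

# Made-up data: indicator `a` is constant inside group "x"
toy <- data.frame(a = c(1, 1, 2, 3), b = c(5, 6, 7, 8))
grp <- factor(c("x", "x", "y", "y"))
constant_columns_by_group(toy, grp)
```

With the question's data this could be called as `constant_columns_by_group(farmwood, farmwood$Distance, cols = 4:27)`, since the blocks use columns 4 to 27. An empty result does not rule the idea out entirely, because bootstrap resamples of a small group can still end up constant.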

Error in w2v_train(trainFile = file_train, modelFile = model, stopWordsFile = file_stopwords (full error text below)

Full error text: Error in w2v_train(trainFile = file_train, modelFile = model, stopWordsFile = file_stopwords, : Expecting a single string value: [type=closure; extent=1].
I am trying to run a word embedding analysis using this data https://www.kaggle.com/datasets/therohk/million-headlines?resource=download to obtain:
top 25 closest words to focus word
plot these 25 words
compare the same analysis with different data (JSTOR data on articles with "populism" https://constellate.org/dataset/f53e497b-844e-2b60-ec2f-b9c54d2e334e?unigrams=political,%20social)
I loaded all the data and necessary packages and pre-processed the ABCNews data for the analysis (see code).
#Loading necessary packages
install.packages(c("tidyverse", "tidytext", "word2vec", "Rtsne", "future", "jstor", "magrittr", "ggplot2", "dplyr"))
library("tidyverse")
library("tidytext")
library("word2vec")
library("Rtsne")
library("future")
library("jstor")
library(magrittr)
library("ggplot2")
library("dplyr")
#Preprocessing abcnews data
##Select text data from csv file ABC NEWS FILE
head(abcnews_pop)
abc_pop_text <- abcnews_pop %>%
select("headline_text")
head(abc_pop_text)
I then used the following code to process the embedding:
#ABCNews data
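text_news <- abc_pop_text %>%
pull(headline_text) %>%  # word2vec() expects a character vector, not a data frame
txt_clean_word2vec(ascii = TRUE, alpha = TRUE, tolower = TRUE, trim = TRUE)
set.seed(123456789)
news_model <- word2vec(x = text_news, type = "cbow", dim = 500, iter = 50)  # x = text_news, not text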
embedding_news<-as.matrix(news_model)
The first step (text_news<-abc_pop...) ran smoothly. However, the second one (set.seed(123456789) news_model...) produces this error:
Error in w2v_train(trainFile = file_train, modelFile = model, stopWordsFile = file_stopwords, : Expecting a single string value: [type=closure; extent=1].
Does anyone know how to address this?
I had an error in naming objects/variables. This has been resolved, thank you.
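The `[type=closure; extent=1]` part of the message hints at the naming error: in the original call, `x = text` did not refer to the cleaned headlines but to the `text()` plotting function from the graphics package, which is attached by default, and R functions are closures. A small base-R illustration, with a hypothetical guard function (check_text_input() is not part of word2vec) that could run before word2vec():

```r
# With no user object called `text`, the name resolves to graphics::text,
# a function; typeof() reports functions like this as "closure"
typeof(text)

# Hypothetical guard: fail early with a clearer message than w2v_train's
check_text_input <- function(x) {
  if (!is.character(x)) {
    stop("expected a character vector, got an object of type '", typeof(x), "'")
  }
  invisible(TRUE)
}

msg <- tryCatch(check_text_input(text), error = conditionMessage)
msg
```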

Rpart Error with Anova: `Error in !isord : invalid argument type`

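I'm running the code below to call the rpart function, but it keeps giving me the error `Error in !isord : invalid argument type`.
library(rpart)
library(data.table)  # complete.data[, ..cols] and dict[is_group == TRUE, ...] use data.table syntax
# set arguments for rpart function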
group.target.metric <- "loan_amount"
group.data.variables <- c(data.config$dict[is_group == TRUE, variable_name_modeling], group.target.metric)
print(group.data.variables)
group.training.data <- complete.data[, ..group.data.variables]
# run main code
group.tree <- rpart(formula = as.formula(paste(group.target.metric, "~ .")),
data = group.training.data,
method = "anova")
Can anyone explain what this could be about?
I'm using rpart version 4.1-15.
The issue was that, when data.config$dict was created, it was missing the definition/data type for one variable I was using in the model. To check and update the data types in the complete.data table, use:
complete.data <- UpdateDataTypes(complete.data, data.config$dict)
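UpdateDataTypes() is not shown and appears to be a project-specific helper, so the following is only a hypothetical base-R sketch of the same idea: coerce each column to the type recorded in a data dictionary and stop loudly when a column has no entry, instead of letting rpart() fail later with a cryptic message. The dictionary layout (`variable_name` and `type` columns), the helper name, and the `region` variable are all assumptions for illustration:

```r
# Hypothetical helper: coerce columns of `df` to the types in `dict`
# and stop if any column has no dictionary entry
apply_type_dictionary <- function(df, dict) {
  dict_vars  <- as.character(dict$variable_name)
  dict_types <- as.character(dict$type)

  undefined <- setdiff(names(df), dict_vars)
  if (length(undefined) > 0) {
    stop("no type defined for: ", paste(undefined, collapse = ", "))
  }

  converters <- list(
    numeric   = as.numeric,
    integer   = as.integer,
    factor    = as.factor,
    character = as.character,
    logical   = as.logical
  )

  for (i in seq_along(dict_vars)) {
    var  <- dict_vars[i]
    type <- dict_types[i]
    if (!var %in% names(df)) next  # dictionary may list extra variables
    if (is.na(type) || !type %in% names(converters)) {
      stop("unknown or missing type for variable '", var, "'")
    }
    df[[var]] <- converters[[type]](df[[var]])
  }
  df
}

# Made-up dictionary and data
dict <- data.frame(
  variable_name = c("loan_amount", "region"),
  type          = c("numeric", "factor"),
  stringsAsFactors = FALSE
)
raw <- data.frame(loan_amount = c("1000", "2500"), region = c("north", "south"),
                  stringsAsFactors = FALSE)
typed <- apply_type_dictionary(raw, dict)
str(typed)
```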

arulesViz subscript out of bounds paracoord

I want to perform basket analysis and draw a paracoord plot, but I receive an error.
The error is:
Error in m[j, i] : subscript out of bounds
In addition: Warning message:
In cbind(pl, pr) :
number of rows of result is not a multiple of vector length (arg 2)
I am using data from: Link.
First, I transform the data to fit basket analysis; the original Excel file is named Online_Retail:
library(arules)
library(arulesViz)
library(plyr)
items <- ddply(Online_Retail, c("CustomerID", "InvoiceDate"), function(df1)paste(df1$Description, collapse = ","))
items1 <- items["V1"]
write.csv(items1, "groceries1.csv", quote=FALSE, row.names = FALSE, col.names = FALSE)
trans1 <- read.transactions("groceries1.csv", format = "basket", sep=",",skip=1)
And to draw the paracoord plot, I wrote this code:
rules.trans2<-apriori(data=trans1, parameter=list(supp=0.001,conf = 0.05),
appearance=list(default="rhs", lhs="ROSES REGENCY TEACUP AND SAUCER"), control=list(verbose=F))
sorted.plot <- sort(rules.trans2, by="support", decreasing = TRUE)
plot(sorted.plot, method="paracoord", control=list(reorder=TRUE, verbose = TRUE))
Why is my paracoord code not working? How can I fix it? What should I change?
This is, unfortunately, a bug in arulesViz. This will be fixed in the next release (arulesViz 1.3-3). The fix is already available in the development version on GitHub: https://github.com/mhahsler/arulesViz
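For readers without plyr, the basket-building step done with ddply() in the question can be sketched in base R with aggregate(). The toy rows below are made up; note that aggregate()'s formula method drops rows with a missing value in any of the variables by default (na.action = na.omit), so invoices without a CustomerID would be excluded:

```r
# Toy stand-in for Online_Retail (made-up rows)
retail <- data.frame(
  CustomerID  = c(1, 1, 2, 2),
  InvoiceDate = c("2011-01-01", "2011-01-01", "2011-01-02", "2011-01-02"),
  Description = c("TEACUP", "SAUCER", "ROSES", "TEACUP"),
  stringsAsFactors = FALSE
)

# One comma-separated basket per customer and invoice date
baskets <- aggregate(Description ~ CustomerID + InvoiceDate, data = retail,
                     FUN = function(x) paste(x, collapse = ","))
baskets
```

The Description column of `baskets` plays the role of `items1` in the question and could be written out and read back with read.transactions() in the same way.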

Error in DESeq formula nbinomTest

I'm using the DESeq package to analyze RNA sequencing data. I have only one replicate per condition and two treatments. My code is:
data <- read.csv()
metadata <- data.frame(row.names = colnames(data), condition =c("treated", "untreated"))
cds2 <- newCountDataSet( countData = data, conditions = metadata )
cds2 <- estimateSizeFactors(cds2)
counts( cds2, normalized=TRUE )
cds2 <- estimateDispersions(cds2, method="blind", sharingMode="fit-only")
res <- nbinomTest(cds2, "treated", "untreated" )
Everything works fine up to and including estimateDispersions. However, nbinomTest gives me this error:
Error in if (dispTable(cds)[condA] == "blind" || dispTable(cds)[condB] == :
missing value where TRUE/FALSE needed
I found some documentation on this error, but the answers are not helpful for me. I work with R version 3.1.2 (2014-10-31).
Can someone help me with my problem, please?
Cheers!
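The message itself comes from R's if(): the condition `dispTable(cds)[condA] == "blind"` evaluates to NA when `condA` is not one of the names of dispTable(cds), and if() cannot branch on NA. A minimal base-R illustration, where `disp_table` is a hypothetical stand-in for dispTable(cds), not DESeq output:

```r
# Hypothetical stand-in for dispTable(cds): its names do not include
# the condition being looked up
disp_table <- c(condition = "blind")

lookup <- disp_table["treated"]  # indexing by an absent name gives NA
is.na(lookup)

# if() cannot branch on NA, which yields the message from the question
msg <- tryCatch(
  if (lookup == "blind") "blind" else "per-condition",
  error = conditionMessage
)
msg  # "missing value where TRUE/FALSE needed"
```

So it may be worth printing dispTable(cds2) to see whether "treated" and "untreated" appear as names. One thing to check, as an assumption rather than a confirmed fix: newCountDataSet() documents `conditions` as a factor, with a data frame indicating a multi-factor design, so passing `metadata$condition` instead of the whole `metadata` data frame may be what nbinomTest() expects.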
