R function slda.em: error when setting the logistic = T argument

I am trying to run supervised LDA on a set of annotated tweets, but I keep getting this error message:
Error in structure(.Call("collapsedGibbsSampler", documents, as.integer(K), : This should not have happened (nan).
How do I fix it?
It only happens when I set logistic = T.
Code below. s is a sample of the data.
library(lda)  # lexicalize(), word.counts() and slda.em() come from the lda package

# tweets, annotations, s and num.topics are defined elsewhere in my script
corpus1 <- lexicalize(tweets[s], lower = TRUE)
to.keep <- corpus1$vocab[word.counts(corpus1$documents, corpus1$vocab) >= 1]
documents <- lexicalize(tweets[s], lower = TRUE, vocab = to.keep)
params <- sample(c(-1, 1), num.topics, replace = TRUE)
result <- slda.em(documents = documents,
                  K = num.topics,
                  vocab = to.keep,
                  num.e.iterations = 10,
                  num.m.iterations = 4,
                  alpha = 1.0, eta = 0.1,
                  annotations = as.integer(annotations[s]),
                  params = params,
                  variance = 0.25,
                  lambda = 1.0,
                  logistic = TRUE)
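Before digging into the sampler, a few quick input checks can help narrow this down (a hedged sketch in base R; treating 0/1 coding of the annotations as a requirement for logistic = TRUE is an assumption here, not a confirmed fix):
# Hedged sanity checks, base R only:
table(sapply(documents, ncol))          # tokens per document; a 0 means an empty document survived filtering
table(as.integer(annotations[s]))       # with logistic = TRUE the labels are usually assumed to be 0/1
any(is.na(as.integer(annotations[s])))  # NA labels would also feed NaNs into the sampler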

Related

External Cluster Validation - Categorical Data R

I've recently been attempting to evaluate the output from k-modes (a cluster label) relative to a so-called true cluster label (labelled 'class' below).
In other words, I've been attempting to externally validate the clustering output. However, when I tried the external validation measures from the 'fpc' package, I was unsuccessful (the error is posted below the script).
I've attached my code for the mushroom dataset. I would appreciate it if anyone could show me how to successfully execute these external validation measures in the context of categorical data.
Any help appreciated.
# LIBRARIES
install.packages('klaR')
install.packages('fpc')
library(klaR)
library(fpc)
#MUSHROOM DATA
mushrooms <- read.csv(file = "https://raw.githubusercontent.com/miachen410/Mushrooms/master/mushrooms.csv", header = FALSE)
names(mushrooms) <- c("edibility", "cap-shape", "cap-surface", "cap-color",
                      "bruises", "odor", "gill-attachment", "gill-spacing",
                      "gill-size", "gill-color", "stalk-shape", "stalk-root",
                      "stalk-surface-above-ring", "stalk-surface-below-ring",
                      "stalk-color-above-ring", "stalk-color-below-ring", "veil-type",
                      "veil-color", "ring-number", "ring-type", "spore-print-color",
                      "population", "habitat")
names(mushrooms)[names(mushrooms)=="edibility"] <- "class"
indexes <- apply(mushrooms, 2, function(x) any(is.na(x) | is.infinite(x)))
colnames(mushrooms)[indexes]
table(mushrooms$class)
str(mushrooms)
#REMOVING CLASS VARIABLE
mushroom.df <- subset(mushrooms, select = -c(class))
#KMODES ANALYSIS
result.kmode <- kmodes(mushroom.df, 2, iter.max = 50, weighted = FALSE)
#EXTERNAL VALIDATION ATTEMPT
mushrooms$class <- as.factor(mushrooms$class)
class <- as.numeric(mushrooms$class)
clust_stats <- cluster.stats(d = dist(mushroom.df),
                             class, result.kmode$cluster)
#ERROR TERM
Error in silhouette.default(clustering, dmatrix = dmat) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In dist(mushroom.df) : NAs introduced by coercion
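One possible direction, sketched under the assumption that the NAs come from dist() coercing the categorical columns: use a dissimilarity that handles factors, such as Gower distance from cluster::daisy(), and pass that to cluster.stats() instead of dist().
# Hedged sketch -- swap dist() for a categorical-friendly dissimilarity (Gower).
library(cluster)                                           # for daisy()
mushroom.fac <- as.data.frame(lapply(mushroom.df, as.factor))
d.gower <- daisy(mushroom.fac, metric = "gower")           # handles factor columns; large (8124 x 8124) but feasible
clust_stats <- cluster.stats(d = d.gower,
                             clustering = as.numeric(as.factor(mushrooms$class)),
                             alt.clustering = result.kmode$cluster)
clust_stats$corrected.rand                                 # e.g. the adjusted Rand index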

Error in eval(expr, p): object 'X' not found; predict (BayesARIMAX)

I am trying to use BayesARIMAX to model and predict US GDP (you can find the data here: https://fred.stlouisfed.org/series/GDP). I followed the example in the package documentation (https://cran.r-project.org/web/packages/BayesARIMAX/BayesARIMAX.pdf) to build my model. I didn't have any major issue building the model (I used error handling to overcome the "Getting chol.default error when using BayesARIMAX in R" issue). However, I could not get predictions from the model. I looked for a solution, but there is no example of predicting from a model built using BayesARIMAX. Every time I run predict I get the following error:
"Error in eval(expr, p) : object 'X' not found"
Here is my code.
library(xts)
library(zoo)
library(tseries)
library(tidyverse)
library(fpp2)
gdp <- read.csv("GDP.csv", head = T)
date.q <- as.Date(gdp[, 1], "%Y-%m-%d")
gdp <- xts(gdp[,2],date.q)
train.row <- 248
number.row <- length(gdp)  # total number of observations
gdp.train <- gdp[1:train.row]
gdp.test <- gdp[(train.row+1):number.row]
date.test <- date.q[(train.row+1):number.row]
library(BayesARIMAX)
# wrote this function to handle randomly produced errors due to the MCMC simulation
test_function <- function(a, b, P = 1, Q = 1, D = 1, error_count = 0)
{
  tryCatch(
    {
      model <- BayesARIMAX(Y = a, X = b, p = P, q = Q, d = D)
      return(model)
    },
    error = function(cond)
    {
      error_count <- error_count + 1
      if (error_count < 40)
      {
        test_function(a, b, P, Q, D, error_count = error_count)
      }
      else
      {
        print(paste("Model doesn't converge for ARIMA(", P, D, Q, ")"))
        print(cond)
      }
    }
  )
}
set.seed(1)
x = rnorm(length(gdp.train),4,1)
bayes_arima_model <- test_function(a = gdp.train,b=x,P = 3,D = 2,Q = 2)
bayes_arima_pred <- xts(predict(bayes_arima_model[[1]],newxreg = x[1:3])$pred,date.test)
And here is the error:
Error in eval(expr, p) : object 'X' not found
Here is how I resolved the issue after reading through the BayesARIMAX source code (https://rdrr.io/cran/BayesARIMAX/src/R/BayesianARIMAX.R). I basically created the variable "X" and passed it to the predict function to get the result. You just have to set the length of X equal to the number of predictions.
Here is the solution code for prediction.
X <- c(1:3)
bayes_arima_pred <- xts(predict(bayes_arima_model[[1]],newxreg = X[1:3])$pred,date.test)
This gave me the following results:
bayes_arima_pred
[,1]
2009-01-01 14462.24
2009-04-01 14459.73
2009-07-01 14457.23
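One hedged note on the values passed in newxreg: they act as the exogenous regressor for the forecast steps, so arguably they should be on the same scale as the x used when fitting (drawn from N(4, 1) above) rather than simply 1:3, for example:
# Same call as the fix above, but with placeholder X values drawn from the
# same distribution as the training regressor (an assumption, not a confirmed requirement).
X <- rnorm(3, 4, 1)
bayes_arima_pred <- xts(predict(bayes_arima_model[[1]], newxreg = X[1:3])$pred, date.test)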

Error in 'indepTest' in PC algorithm for conditional Independence Test

I am using the PC algorithm function, in which the conditional independence test is one of the arguments. I am facing an error in the following code. Note that 'data' here is the data I have been using, and 1, 6, 2 in gaussCItest are the node positions (x, y, and the conditioning node) in the adjacency matrix of the data.
code:
library(pcalg)
suffstat <- list(C = cor(data), n = nrow(data))
pc.data <- pc(suffstat,
              indepTest = gaussCItest(1, 6, 2, suffstat),
              p = ncol(data), alpha = 0.01)
Error:
Error in indepTest(x, y, nbrs[S], suffStat) :
could not find function "indepTest"
Below is the code that worked. I removed the parameters for gaussCItest, since it is a function that can be passed directly.
library(pcalg)
suffstat <- list(C = cor(data), n = nrow(data))
pc.data <- pc(suffstat, indepTest = gaussCItest, p = ncol(data), alpha = 0.01)
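For reference, pc() calls the supplied function itself as indepTest(x, y, S, suffStat) (that is the call visible in the error message), so the bare function is what it expects. Calling gaussCItest by hand with the same sufficient statistic is still handy for spot-checking a single test, for example:
# Spot-check one conditional independence test by hand:
# p-value for node 1 independent of node 6 given node 2.
gaussCItest(x = 1, y = 6, S = 2, suffStat = suffstat)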

Error in boot.ci() function in R

I am trying to calculate bootstrap confidence intervals.
Here is my code.
library(boot)
nboot <- 10000 # Number of simulations
alpha <- .01 # alpha level
n <- 1000 # sample size
bootThetaQuantile <- function(x, i) {
  quantile(x[i], probs = .5)
}
raw <- rnorm(n,0, 1) # raw data
( theta.boot.median <- boot(raw, bootThetaQuantile, R=nboot) )
boot.ci(theta.boot.median, conf=(1-alpha)) #this causes no error
boot.ci(theta.boot.median, conf=(1-alpha), type = "percent") #this causes an error
The error message reads "Error in ci.out[[4L]] : subscript out of bounds". I am very confused by this, because I am not sure why this call to boot.ci causes an error when the previous line caused none.
That is because boot.ci expects type = "perc" (not "percent") for the percentile interval.
boot.ci(theta.boot.median, conf=(1-alpha), type = "perc")
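For completeness, boot.ci() accepts a vector of interval types, so several intervals can be requested in one call (a small usage sketch; "stud" is left out because it needs variance estimates in the bootstrap output):
# Recognized types are "norm", "basic", "stud", "perc" and "bca" (or "all");
# "percent" is not among them, which appears to be what triggers the subscript error.
boot.ci(theta.boot.median, conf = 1 - alpha, type = c("norm", "basic", "perc"))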

How to solve "The data cannot have more levels than the reference" error when using confusionMatrix?

I'm using R.
I divided the data into train and test sets to assess prediction accuracy.
This is my code:
library("tree")
credit<-read.csv("C:/Users/Administrator/Desktop/german_credit (2).csv")
library("caret")
set.seed(1000)
intrain<-createDataPartition(y=credit$Creditability,p=0.7,list=FALSE)
train<-credit[intrain, ]
test<-credit[-intrain, ]
treemod<-tree(Creditability~. , data=train)
plot(treemod)
text(treemod)
cv.trees<-cv.tree(treemod,FUN=prune.tree)
plot(cv.trees)
prune.trees<-prune.tree(treemod,best=3)
plot(prune.trees)
text(prune.trees,pretty=0)
install.packages("e1071")
library("e1071")
treepred<-predict(prune.trees, newdata=test)
confusionMatrix(treepred, test$Creditability)
The following error message happens in confusionMatrix:
Error in confusionMatrix.default(rpartpred, test$Creditability) : the data cannot have more levels than the reference
The credit data can be downloaded at this site.
http://freakonometrics.free.fr/german_credit.csv
If you look carefully at your plots, you will see that you are training a regression tree and not a classification tree.
If you run credit$Creditability <- as.factor(credit$Creditability) after reading in the data and use type = "class" in the predict function, your code should work.
code:
credit <- read.csv("http://freakonometrics.free.fr/german_credit.csv" )
credit$Creditability <- as.factor(credit$Creditability)
library(caret)
library(tree)
library(e1071)
set.seed(1000)
intrain <- createDataPartition(y = credit$Creditability, p = 0.7, list = FALSE)
train <- credit[intrain, ]
test <- credit[-intrain, ]
treemod <- tree(Creditability ~ ., data = train)
cv.trees <- cv.tree(treemod, FUN = prune.tree)
plot(cv.trees)
prune.trees <- prune.tree(treemod, best = 3)
plot(prune.trees)
text(prune.trees, pretty = 0)
treepred <- predict(prune.trees, newdata = test, type = "class")
confusionMatrix(treepred, test$Creditability)
I had the same issue in classification. It turned out that there were ZERO observations in a specific group, which is why I got the error "the data cannot have more levels than the reference".
Make sure that all groups in your test set appear in your training set.
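A quick way to check for that situation before calling confusionMatrix, using the train/test objects from the code above:
# Levels present in test but missing from train (should be empty):
setdiff(levels(factor(test$Creditability)), levels(factor(train$Creditability)))
# Empty groups show up as zero counts here:
table(train$Creditability)
table(test$Creditability)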
