theta.sparse error with lorDIF - r

I was wondering whether anyone can help me out.
I am trying to run a dif analysis on my data but keep getting a theta.sparse error, which I am unsure of how to fix. I would really appreciate any that you can give me.
library(lordif)
dat<- read.csv2("OPSO.csv",header=TRUE)
datgender <- read.csv2("DATA.csv",header=TRUE)
group<-datgender$Gender
sink("outputDIFopso.txt")
gender.difopso <- lordif(dat, group, selection = NULL,
criterion = c("Chisqr", "R2", "Beta"),
pseudo.R2 = c("McFadden", "Nagelkerke", "CoxSnell"), alpha = 0.01,
beta.change = 0.1, R2.change = 0.02, maxIter = 10, minCell = 5,
minTheta = -4, maxTheta = 4, inc = 0.1, control = list(), model = "GRM",
anchor = NULL, MonteCarlo = FALSE, nr = 100)
print(gender.difopso)
summary(gender.difopso)
sink()
pdf("graphtestop.pdf")
plot(gender.difopso)
dev.off()
dev.off()
Error in lordif(dat, group, selection = NULL, criterion = c("Chisqr", :
object 'theta.sparse' not found
Thank you :)

You should check the error line before then. The output will probably say you have no items flagged for DIF. When that's the case you should just run the mirt function and extract theta and ipar objects as necessary.
The author could add some case handling for when compare(flags, flags.matrix) is true. At the very least, it seems a warning is omitted when there are no items with DIF the same way it says
if (ndif == ni) {
warning("all items got flagged for DIF - stopping\n")
}
and there is no case handling when (ndif == 0) although compare(flags, flag.matrix) evaluates to TRUE.
The implications when all or none of the items have DIF is that you would get the same results (generating the same ICC plots, same inference etc) by fitting mirt in the combined sample (no DIF) or two or more mirt models for each group (all DIF). So it's a correct time saving procedure to just bypass when all that breaks down.

Related

Consistent error message while running grouping analysis in 'plspm' package

I am looking for some help in resolving an error using the partial least squares path modeling package ('plspm').
I can get results running a basic PLS-PM analysis but run into issues when using the grouping function, receiving the error message:
Error in if (w_dif < specs$tol || iter == specs$maxiter) break : missing value where TRUE/FALSE needed
I have no missing values and all variables have the proper classification. Elsewhere I read that there is a problem with processing observations with the exact same values across all variables, I have deleted those and still face this issue. I seem to be facing the issue only when I run the groups using the "bootstrap" method as well.
farmwood = read.csv("farmwood_groups(distance).csv", header = TRUE) %>%
slice(-c(119:123))
Control = c(0,0,0,0,0,0)
Normative = c(0,0,0,0,0,0)
B_beliefs = c(0,0,0,0,0,0)
P_control = c(1,0,0,0,0,0)
S_norm = c(0,1,0,0,0,0)
Behavior = c(0,0,1,1,1,0)
farmwood_path = rbind(Control, Normative, B_beliefs, P_control, S_norm, Behavior)
colnames(farmwood_path) = rownames(farmwood_path)
farmwood_blocks = list(14:18,20:23,8:13,24:27,19,4:7)
farmwood_modes = rep("A", 6)
farmwood_pls = plspm(farmwood, farmwood_path, farmwood_blocks, modes = farmwood_modes)
ames(farmwood)[names(farmwood) == "QB3"] <- "Distance"
farmwood$Distance <- as.factor(farmwood$Distance)
distance_boot = plspm.groups(farmwood_pls, farmwood$Distance, method = "bootstrap")
distance_perm = plspm.groups(farmwood_pls, farmwood$Distance, method = "permutation")
The data is contained here:
https://www.dropbox.com/s/8vewuupywpi1jkt/farmwood_groups%28distance%29.csv?dl=0
Any help would be appreciated. Thank you in advance

R error "attempt to select less than one element in get1index"

I need to estimate some parameters in gp model so I install "kergp" package in R to generate a customized kernel. When I use "mle" function to estimate parameters, I get the error as below:
Error in fitList[[bestIndex]] : attempt to select less than one element in get1index
Here is the code:(where initial_data is a dataframe.)
q4<-q1CompSymm(af, input = "af", cov = "corr", intAsChar = TRUE)
quan<-covTP(k1Fun1 = k1Fun1PowExp,d = 4,cov = "homo",
iso1 = 0,parLower = c(rep(0,9)), parUpper = c(2,2,2,2,Inf,Inf,Inf,Inf,Inf))
inputNames(quan)<-c("hl","hn1","hn2","hn3")
kernel<-covComp(formula = ~quan()*q4())
mle(kernel,
parCovIni = c(rep(0.5,10)),
initial_data$y,select(initial_data,-c('index','y','iw','rts')), F = NULL,
parCovLower = c(rep(0,10)),
parCovUpper = c(2,2,2,2,Inf,Inf,Inf,Inf,Inf,1),
noise = TRUE, varNoiseIni = var(initial_data$y) / 10,
optimFun = "stats::optim",
optimMethod = "BFGS")
I had seen the related document from github(https://github.com/cran/kergp/blob/master/R/methodMLE.R) but still couldn't figure out how this situation happen...
Besides, this error occurred after some iterations if there are any help. Any help would be appreciated. Thanks

Error calling rCBA::fpgrowth: method fpgrowth with signature (DDI)[[Ljava/lang/String; not found

I wrote the R code below to mine with the FP-Growth algorithm:
fpgabdata <- read.csv('../Agen Biasa.csv', header = FALSE)
train <- sapply(fpgabdata, as.factor)
train <- data.frame(train, check.names = TRUE)
txns <- as(train,"transactions")
abrulesfpg = rCBA::fpgrowth(txns, support = 0.25, confidence = 0.5, maxLength = 10, consequent = NULL, verbose = TRUE, parallel = TRUE)
But I get the following error:
Error in .jcall(jPruning, "[[Ljava/lang/String;", "fpgrowth", support, :
method fpgrowth with signature (DDI)[[Ljava/lang/String; not found
These are my data:
The reason you are seeing this error is that the current implementation of the FP-growth algorithm in rCBA requires that you specify a value for the consequent (right hand side).
For example, the following should work, assuming you have sensible thresholds for support and confidence:
abrulesfpg = rCBA::fpgrowth(
txns,
support = 0.25,
confidence = 0.5,
maxLength = 10,
consequent = "SPIRULINA",
verbose = TRUE,
parallel = TRUE
)
I know the OP is likely to have discovered this by now, but I've answered this just in case anyone else encounters the same error.

xgb.plot.multi.trees What is the meaning of the numbers in the parentheses?

It isn't number of trees, since I only trained 25. It also isn't the value of the variable. This is evident by the scale of the values in the parenthesis, which doesn't make sense since many variables are logged. I checked the documentation and there was no explanation. Any ideas or other references?
df1 <- xgb.train(data = X_train_dmat,
eta = 0.1,
max_depth = 5,
nround=25,
subsample = 0.5,
colsample_bytree = 0.5,
booster = 'gbtree',
objective = 'reg:squarederror',
nthread = 3
)
xgb.plot.multi.trees(model = df1,
features_keep = 5,
use.names=FALSE,
plot_width = NULL,
plot_height = NULL,
render = TRUE
)
Looking at the source code, https://github.com/dmlc/xgboost/blob/master/R-package/R/xgb.plot.multi.trees.R#L94, this is the part creating the nodes in the tree:
nodes.dt <- tree.matrix[
, .(Quality = sum(Quality))
, by = .(abs.node.position, Feature)
][, .(Text = paste0(Feature[1:min(length(Feature), features_keep)],
" (",
format(Quality[1:min(length(Quality), features_keep)], digits=5),
")") %>%
paste0(collapse = "\n"))
, by = abs.node.position]
Specifically, this is the code that writes those numbers:
format(Quality[1:min(length(Quality), features_keep)], digits=5)
So, those numbers show the quality of each node, which I think reflects how appropriately that node divides the data. It's been a while since I dealt with these models and I've never been savvy, so I cannot be sure of my interpretation. If you want further explanation about the meaning of quality, you may dig deeper in the source code to figure out how it gets calculated.

Genetic Algorithm Optimization

I asked a question a few weeks back regarding how one would do optimization in R(Optimizing for Vector Using Optimize R). Now that I have a proper grip with basic optimization in R, I would like to start employing GA's to solve for solutions.
Given the objective function:
div.ratio <- function(weight, vol, cov.mat){
weight <- weight / sum(weight)
dr <- (t(weight) %*% vol) / (sqrt(t(weight) %*% cov.mat %*% (weight)))
return(-dr)
}
I am using genalg package for optimizing, specifically the "rbga.bin" function. But the thing is one cannot seem to pass in more than one parameter, ie can't pass in vol and cov.mat. Am I missing something or understanding this incorrectly.
Edit:
In the genalg package, there is a function called rbga.bin which is the one I am using.
Here is the simple code from previous question that can get you started:
rm(list=ls())
require(RCurl)
sit = getURLContent('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', binary=TRUE, followlocation = TRUE, ssl.verifypeer = FALSE)
con = gzcon(rawConnection(sit, 'rb'))
source(con)
close(con)
load.packages('quantmod')
data <- new.env()
tickers<-spl("VTI,VGK,VWO,GLD,VNQ,TIP,TLT,AGG,LQD")
getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
bt.prep(data, align='remove.na', dates='1990::2013')
prices<-data$prices[,-10]
ret<-na.omit(prices/mlag(prices) - 1)
vol<-apply(ret,2,sd)
cov.mat<-cov(ret)
out <- optim(par = rep(1 / length(vol), length(vol)), # initial guess
fn = div.ratio,
vol = vol,
cov.mat = cov.mat,
method = "L-BFGS-B",
lower = 0,
upper = 1)
opt.weights <- out$par / sum(out$par) #optimal weights
While the above optim function works just fine, I was thinking if it is possible to reproduce this using a GA algorithm. So in the future if I am searching for multiple objectives I will be able to do this faster compared to GA. (I am not sure if it is faster, but this is the step to take to find out)
GAmodel <- rbga.bin(size = 7, #genes
popSize = 200, #initial number of chromosomes
iters = 100, #number of iterations
mutationChance = 0.01, #chance of mutation
evalFunc = div.ratio) #objective function
Doing the above seems to produce an error as div.ratio needs extra paramters, so I am looking for some help in structuring my problem so that it will be able to produce the optimal answer. I hope the above edit clarifies things.
Thanks
This is what you need:
GAmodel <- rbga(stringMin=rep(0, length(vol)), stringMax=rep(1, length(vol)),
popSize = 200,
iters = 100,
mutationChance = 0.01,
evalFunc = function(weight) div.ratio(weight, vol=vol, cov.mat=cov.mat))
(see first and last lines above).
The problems were:
vectors weight and vol must match lengths.
function evalFunc is called with a single parameter, causing the others to be missing. As I understand, you want to optimize in the weight vector only, keeping vol and cov.mat fixed.
If you want weight to be treated as a continuous variable, then use rbga instead.

Resources