Errors when running HERGM and MLERGM models - network-analysis

I'm trying to run HERGM and MLERGM models on network data, but my code always returns the same error:
Error in rep(" ", max_char - num_chars[i]) : invalid 'times' argument
In addition: Warning message:
In max(num_chars) : no non-missing arguments to max; returning -Inf
My actual dataset is much larger (5,969 nodes plus explanatory variables), but I've put together a smaller, reproducible example below using basic edges-only models.
library(mlergm)
library(hergm)

# HERGM
my_sociomatrix <- matrix(round(runif(20 * 20)),  # edge values
                         nrow = 20,  # nrow must be the same as ncol
                         ncol = 20)
test.network <- network(x = my_sociomatrix,
                        directed = FALSE,
                        matrix.type = "adjacency")
test.model <- hergm(test.network ~ edges_ij,
                    max_iter = 4,
                    method = "ml")

# MLERGM
my_sociomatrix <- matrix(round(runif(30 * 30)),
                         nrow = 30,
                         ncol = 30)
node_memb <- c(rep(1, 10), rep(2, 10), rep(3, 10))
mlnet <- mlnet(network = my_sociomatrix,
               node_memb = node_memb)
model_est <- mlergm(mlnet ~ edges)
My question is: why am I encountering this error, and what can I do to solve it?

Did you try including a gwesp term in the last line of your mlergm code?
model_est <- mlergm(mlnet ~ edges + gwesp)
This worked for me, although I'm not sure why. I have no experience with hergm, but maybe something similar works there.
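For hergm, I can only offer an untested guess along the same lines, assuming triangle_ijk is among the block-dependent clustering terms hergm accepts:
# Untested guess: add a hergm clustering term next to edges_ij.
test.model <- hergm(test.network ~ edges_ij + triangle_ijk,
                    max_iter = 4,
                    method = "ml")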

Related

ParBayesianOptimization suddenly fails while logging epoch results

I am currently using the R package ParBayesianOptimization to tune parameters for ML methods. While searching for an optimal cost parameter for the svmLinear2 model (contained in caret), the optimization stopped with a sudden error after successfully completing 15 iterations.
Here is the error traceback:
Error in rbindlist(l, use.names, fill, idcol) :
  Item 2 has 9 columns, inconsistent with item 1 which has 10 columns. To fill missing columns use fill=TRUE.
7. rbindlist(l, use.names, fill, idcol)
6. rbind(deparse.level, ...)
5. rbind(scoreSummary, data.table(Epoch = rep(Epoch, nrow(NewResults)),
       Iteration = 1:nrow(NewResults) + nrow(scoreSummary),
       inBounds = rep(TRUE, nrow(NewResults)), NewResults))
4. addIterations(optObj, otherHalting = otherHalting, iters.n = iters.n,
       iters.k = iters.k, parallel = parallel, plotProgress = plotProgress,
       errorHandling = errorHandling, saveFile = saveFile, verbose = verbose, ...)
3. ParBayesianOptimization::bayesOpt(FUN = ...
So somehow the data tables storing the summary information at each iteration suddenly differ in the number of columns. Is this a known bug in the ParBayesianOptimization package? Has anyone else encountered a similar problem, and did you find a fix other than rewriting the addIterations function to fill the missing columns?
EDIT: I don't have an explanation for why the error suddenly occurs after a number of successful iterations. However, the issue has recurred when using svmLinear and svmRadial. I was able to reconstruct a similar case with the same error on the iris dataset:
library(data.table)
library(caret)
library(ParBayesianOptimization)

set.seed(1234)

bayes.opt.bounds <- list()
bayes.opt.bounds[["svmRadial"]] <- list(C = c(0, 1000),
                                        sigma = c(0, 500))

svmRadScore <- function(...) {
  grid <- data.frame(...)
  mod <- caret::train(Species ~ ., data = iris, method = "svmRadial",
                      trControl = trainControl(method = "repeatedcv",
                                               number = 7, repeats = 5),
                      tuneGrid = grid)
  return(list(Score = caret::getTrainPerf(mod)[, "TrainAccuracy"], Pred = 0))
}

bayes.create.grid.par <- function(bounds, n = 10) {
  grid <- data.table()
  params <- names(bounds)
  grid[, c(params) := lapply(bounds, FUN = function(minMax) {
    return(runif(n, minMax[1], minMax[2]))
  })]
  return(grid)
}

prior.grid.rad <- bayes.create.grid.par(bayes.opt.bounds[["svmRadial"]])

svmRadOpt <- ParBayesianOptimization::bayesOpt(FUN = svmRadScore,
                                               bounds = bayes.opt.bounds[["svmRadial"]],
                                               initGrid = prior.grid.rad,
                                               iters.n = 100,
                                               acq = "ucb", kappa = 1,
                                               parallel = FALSE, plotProgress = TRUE)
Using this example, the error occurred on the 9th epoch.
Thanks!
It appears that the scoring function returned NAs in place of accuracy measures, leading to the error downstream. This has been described by the library's creator at
https://github.com/AnotherSamWilson/ParBayesianOptimization/issues/33.
It looks like the SVM is trying a cost of 0 during the 9th iteration. Given the problem the SVM is solving, the cost parameter should be strictly positive.
According to AnotherSamWilson, this error commonly occurs when the scoring function "returns something unexpected".
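A simple workaround, then, is to keep the search space away from 0. This is my own sketch, not from the linked issue; the floor values below are arbitrary small positives:
# My own workaround sketch: strictly positive bounds so caret::train
# never sees C = 0 or sigma = 0. The 1e-3 floors are arbitrary.
bayes.opt.bounds[["svmRadial"]] <- list(C = c(1e-3, 1000),
                                        sigma = c(1e-3, 500))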

Parallel processing in R : Error in checkForRemoteErrors(val) : 7 nodes produced errors; first error: unused argument (MARGIN = base::quote(2))

I am trying to do parallel computing using the apply family of functions with the lines of code below. The objective is to fit each column of the simul_ matrix to my specification, but when I check
dim(simul_[, 1])
I get NULL, which causes the problem with the apply function.
The complete code is the following:
## Libraries (f_install_load_lib is a user-defined install-and-load helper)
lib_vec <- c("MSGARCH", "matrixStats", "parallel")
invisible(lapply(lib_vec, f_install_load_lib))

## Seed
set.seed(1234)

## MSGARCH model specification (from the MSGARCH package)
MSGARCH_spec <- CreateSpec(variance.spec = list(model = c("sGARCH", "sGARCH")),
                           distribution.spec = list(distribution = c("norm", "norm")),
                           switch.spec = list(do.mix = FALSE, K = NULL),
                           constraint.spec = list(fixed = list(), regime.const = NULL),
                           prior = list(mean = list(), sd = list()))

## MSGARCH fitting: sp500_logrets are just log-returns
MSgarch_fit <- FitML(data = sp500_logrets, spec = MSGARCH_spec)

## Simulating MSGARCH log-returns
nsim <- 100     # number of simulations
nahead <- 1000  # size of each simulation
MS_simul <- simulate(MSgarch_fit, nsim = nsim, nahead = nahead,
                     n.start = 500, nburn = 100)
simul_ <- MS_simul$draw  # retrieving the simulated data

## Parallel computing settings
n_cores <- detectCores()
cl <- makeCluster(n_cores[1] - 1)

## Fitting each simulation in parallel with the apply functions
fitt_ <- parSapply(cl, X = simul_, MARGIN = 2, FUN = FitML, spec = MSGARCH_spec)
stopCluster(cl)
The error I get is:
7 nodes produced errors; first error: unused argument (MARGIN = base::quote(2))
I am quite lost and would very much appreciate any help :)
The error is quite explicit:
7 nodes produced errors; first error: unused argument (MARGIN = base::quote(2))
There is no MARGIN parameter in parSapply() / sapply(). You may be confusing it with apply().
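A sketch of the fix, assuming each column of simul_ is one simulated series to be passed to FitML() (untested against the MSGARCH objects above):
## Option 1: parApply() does take MARGIN (2 = columns)
fitt_ <- parApply(cl, X = simul_, MARGIN = 2, FUN = FitML, spec = MSGARCH_spec)

## Option 2: parSapply() over column indices; export the objects the
## worker closure needs first
clusterExport(cl, c("simul_", "MSGARCH_spec"))
fitt_ <- parSapply(cl, X = seq_len(ncol(simul_)),
                   FUN = function(i) FitML(data = simul_[, i], spec = MSGARCH_spec))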

Problem with Over- and Under-Sampling with ROSE in R

I have a dataset for classifying between won cases (14399) and lost cases (8677). The dataset has 912 predictor variables.
I am trying to oversample the lost cases so that they almost reach the number of won cases (i.e. 14399 cases of each).
TARGET is the column with lost (0) and won (1) cases:
table(dat_train$TARGET)

    0     1
 8677 14399

Now I am trying to balance them using ROSE's ovun.sample:
dat_train_bal <- ovun.sample(dat_train$TARGET ~ ., data = dat_train, p = 0.5, seed = 1, method = "over")
I get this error:
Error in parse(text = x, keep.source = FALSE) :
<text>:1:17538: unexpected symbol
1: PPER_409030143+BP_RESPPER_9639064007+BP_RESPPER_7459058285+BP_RESPPER_9339059882+BP_RESPPER_9339058664+BP_RESPPER_5209073603+BP_RESPPER_5209061378+CRM_CURRPH_Initiation+Quotation+CRM_CURRPH_Ne
Can anyone help?
Thanks :-)
Reproducing your code with a mock example, I found an error in your formula: dat_train$TARGET ~ . needs to be corrected to TARGET ~ .
dframe <- tibble::tibble(val = sample(c("a", "b"), size = 100, replace = TRUE, prob = c(.1, .9)),
                         xvar = rnorm(100))

# Use oversampling
dframe_os <- ROSE::ovun.sample(formula = val ~ ., data = dframe, p = 0.5, seed = 1, method = "over")
table(dframe_os$data$val)
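Applied to your original call, that correction would be:
dat_train_bal <- ovun.sample(TARGET ~ ., data = dat_train, p = 0.5, seed = 1, method = "over")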

MXNetR Not enough information to get shape

I am implementing a neural network in MXNetR. I attempted to customize my loss function to compute the correlation between my output vector and the target vector. Below is my code:
# Generate testing data
train.x <- matrix(data = rexp(200, rate = 10), nrow = 120, ncol = 6380)
test.x  <- matrix(data = rexp(200, rate = 10), nrow = 60, ncol = 6380)
train.y <- matrix(data = rexp(200, rate = 10), nrow = 120, ncol = 319)
test.y  <- matrix(data = rexp(200, rate = 10), nrow = 60, ncol = 319)

# Reshape testing data
train.array <- train.x
dim(train.array) <- c(20, 319, 1, ncol(train.x))
test.array <- test.x
dim(test.array) <- c(20, 319, 1, ncol(test.x))

# Define the input data
data <- mx.symbol.Variable("data")

# Define the first fully connected layer, with a ReLU hidden layer
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 100)
act.fun <- mx.symbol.Activation(fc1, act_type = "relu")
output <- mx.symbol.FullyConnected(act.fun, num_hidden = 319)

# Customize the loss function
label <- mx.symbol.Variable("label")
output_mean <- mx.symbol.mean(output)
label_mean <- mx.symbol.mean(label)
output_delta <- mx.symbol.broadcast_sub(output, output_mean)
label_delta <- mx.symbol.broadcast_sub(label, label_mean)
output_sqr <- mx.symbol.square(output_delta)
label_sqr <- mx.symbol.square(label_delta)
output_sd <- mx.symbol.sqrt(mx.symbol.sum(output_delta))
label_sd <- mx.symbol.sqrt(mx.symbol.sum(label_delta))
numerator <- mx.symbol.sum(output_delta * label_delta)
denominator <- output_sd * label_sd
lro <- mx.symbol.MakeLoss(numerator / denominator)

# Generate a new model
model <- mx.model.FeedForward.create(symbol = lro, X = train.array, y = train.y,
                                     num.round = 5000, array.batch.size = 1,
                                     optimizer = "adam", learning.rate = 0.0003,
                                     eval.metric = mx.metric.rmse,
                                     epoch.end.callback = mx.callback.log.train.metric(20, logger))
And I got this error:
Error in mx.model.init.params(symbol, input.shape, initializer, mx.cpu()) :
Not enough information to get shapes
I tried to wrap the whole correlation formula in MXNet:
lro2 <- mx.symbol.MakeLoss(
  mx.symbol.negative((mx.symbol.sum(output * label) -
                        (mx.symbol.sum(output) * mx.symbol.sum(label))) /
                       mx.symbol.sqrt((mx.symbol.sum(mx.symbol.square(output)) -
                                         (mx.symbol.sum(output) * mx.symbol.sum(output))) *
                                        (mx.symbol.sum(mx.symbol.square(label)) -
                                           (mx.symbol.sum(label) * mx.symbol.sum(label)))))
)
This version compiles, but the model runs very slowly and the code is clearly not very readable. I wonder if there is any way to get around the error and implement the first version as described above.
MXNet performs shape inference to determine the required shapes of the model parameters (weights and biases) so it can allocate memory, and the first time this happens is when the model parameters are initialized.
Somewhere in your symbol there is a shape that can't be inferred from its neighbors, and I suspect it may be the broadcast_sub, which you removed in the inline definition. It's hard to diagnose the exact issue because of the error in the reshape. You could also try working with NDArray to test the logic and then convert back to using Symbol.
If you're looking to batch samples, you should change the array.batch.size parameter of mx.model.FeedForward.create rather than reshaping your data into batches.
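As a sketch of that last suggestion (my untested guess, assuming the rows of train.x are the samples): feed the original 2-D matrix directly and let array.batch.size do the batching.
# Untested sketch: skip the 4-D reshape and batch 20 row-samples at a time.
model <- mx.model.FeedForward.create(symbol = lro, X = train.x, y = train.y,
                                     num.round = 5000, array.batch.size = 20,
                                     optimizer = "adam", learning.rate = 0.0003,
                                     eval.metric = mx.metric.rmse)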

non-numeric matrix extent error when plotting in R

When running the code of this example, I get the following error on the last line:
Error in matrix(mean(range), ncol = ncol(x), nrow = nrow(x), dimnames = dimnames(x)) :
  non-numeric matrix extent
However, I remember having seen other cases some months ago where the arulesViz library worked with this kind of categorical data.
landing.data <- read.csv2("http://archive.ics.uci.edu/ml/machine-learning-databases/shuttle-landing-control/shuttle-landing-control.data",
                          sep = ",", header = FALSE, dec = ".")
landing.data <- as.data.frame(sapply(landing.data, gsub, pattern = "\\*", replacement = 10))

library(arules)
landing.system <- as(landing.data, "transactions")
rules <- apriori(landing.system, parameter = list(support = 0.01, confidence = 0.6))
rulesLandingManual <- subset(rules, subset = rhs %in% "V1=1" & lift > 1.2)

library(arulesViz)
plot(head(sort(rulesLandingManual, by = "confidence"), n = 3),
     method = "graph", control = list(type = "items"))
Doing a traceback() after running your code gives this:
6: matrix(mean(range), ncol = ncol(x), nrow = nrow(x), dimnames = dimnames(x))
5: map(m, c(5, 20))
4: graph_arules(x, measure = measure, shading = shading, control, ...)
3: plot.rules(head(sort(rulesLandingManual, by = "confidence"), n = 3),
       method = "graph", control = list(type = "items"))
2: plot(head(sort(rulesLandingManual, by = "confidence"), n = 3),
       method = "graph", control = list(type = "items"))
1: plot(head(sort(rulesLandingManual, by = "confidence"), n = 3),
       method = "graph", control = list(type = "items"))
So basically the error comes from frame 6:, and it implies that one of the arguments to matrix(...) is not numeric. To illustrate this:
> matrix(1:4, ncol = 2)
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
> matrix(1:4, ncol = "x")
# Error in matrix(1:4, ncol = "x") : non-numeric matrix extent
You see the error? I don't think there's much you can do here, as the package extends graph, map, and matrix to objects of class rules, so this probably has a lot to do with the developer side. If that is indeed the case, it is probably worth writing to or contacting the developers.
I had exactly the same problem with some data I was mining rules for, and after doing some tests I found that this error comes from the use of the sort() and head() commands when there are more rules that meet the quality-measure condition than were requested.
For instance, in your code you ask to plot the 3 top-confidence rules in rulesLandingManual, but if you inspect(rulesLandingManual) you find that there are 216 rules with confidence 1 (the maximum), so when you ask to subset the top n (with n less than 217), the matrix generated in this new rules object goes messy, at least for the graph method in the plot function.
To test what I'm explaining, change n in your code to anything between 217 and 224 (224 is the number of rules in rulesLandingManual) and it will draw the graph, while n = 216 or less will cause the mentioned error; see the code after this answer.
I don't know if this is intended to work this way or if it is a bug; I am trying to figure it out at the moment, so an explanation would come in really handy.
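In code, the test described above (reusing the rule counts reported in this answer) would be:
# n between 217 and 224 reportedly draws the graph:
plot(head(sort(rulesLandingManual, by = "confidence"), n = 217),
     method = "graph", control = list(type = "items"))
# n = 216 or less reportedly triggers the non-numeric matrix extent error:
plot(head(sort(rulesLandingManual, by = "confidence"), n = 216),
     method = "graph", control = list(type = "items"))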
range is a function. Did you mean mean(range(x)), ...?
Mean mean. Heh.
