Can the ergm/statnet package deal with missing attribute data?

I'm just starting out with ERGMs, so apologies if the following question is not logical. I have searched this site and statnet_help with no luck.
I was wondering whether the ergm() function in statnet can now cope with missing data on attributes. I have coded the missing values as NA in R, but running the following ergm model resulted in an error.
> m2 <- ergm(d1~edges + nodecov('wellbeing'))
> Error in ergm.getglobalstats(nw, model, response = response) :
> NA/NaN/Inf in foreign function call (arg 13)
The attribute variable in question is continuous.
Many thanks,
S

I don't think it is possible to have NAs in edge/node covariates. It is not very clear how they should be treated anyway. Depending on how interested you are in tracing the importance of nodes with missing data, you might try:
Imputing the NAs with some sensible values (even the mean).
Adding a binary covariate equal to 1 for NA and 0 otherwise, and using it in nodecov and perhaps some other effects to check whether there is any evidence that these nodes play some special role in the network structure.
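A minimal base-R sketch of the two suggestions above, assuming the attribute is available as a plain numeric vector. The values and the names wellbeing_na and wellbeing_imp are hypothetical, chosen only for illustration.

```r
# Hypothetical wellbeing scores; NA marks a missing value.
wellbeing <- c(3.2, NA, 4.1, 2.8, NA, 3.9)

# Indicator covariate: 1 where the value was missing, 0 otherwise.
wellbeing_na <- as.integer(is.na(wellbeing))

# Mean imputation: replace each NA with the mean of the observed values.
wellbeing_imp <- ifelse(is.na(wellbeing),
                        mean(wellbeing, na.rm = TRUE),
                        wellbeing)

# Assumption: with the network package loaded, the vectors could then be
# attached to the network object and used in the model, for example
#   d1 %v% "wellbeing_imp" <- wellbeing_imp
#   d1 %v% "wellbeing_na"  <- wellbeing_na
#   ergm(d1 ~ edges + nodecov("wellbeing_imp") + nodecov("wellbeing_na"))
```

Mean imputation is the crudest option and shrinks the variance of the covariate, so the indicator term is worth keeping as a check on whether the nodes with missing values behave differently.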

Related

Not losing observations when faced with missing data

I have a dataset to which I've fitted a linear model, and I've tried to use the step function on this linear model. I get an error message saying "number of rows in use has changed: remove missing values?".
I noticed that a few of the observations (not many) in my dataset had NA values for one variable. I've seen similar questions which suggest using na.omit(), but when I do this I lose the observations. I want to keep the observations however, because they contain useful information for the other variables. Is there a way to use step and avoid losing the observations?
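You can call the nobs function to check that the number of observations is unchanged between models. Note that its use.fallback argument only lets nobs try to guess the number of observations when no method reports it; it does not fill in missing values. The R documentation for step recommends removing the rows with missing values before running it, because every candidate model must be fitted to the same dataset.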
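As a rough sketch of that recommendation, the example below uses the built-in airquality data (not the asker's data) and drops rows only where one of the candidate variables is missing, so that step compares models fitted to the same rows. The choice of variables is purely illustrative.

```r
# Variables that may enter the model; rows are dropped only if one of THESE is NA.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
aq <- airquality[complete.cases(airquality[, vars]), vars]

# Fit the full model on the complete rows, then run stepwise selection quietly.
full <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)
reduced <- step(full, trace = 0)

# Both models should report the same number of observations.
nobs(full)
nobs(reduced)
```

Restricting complete.cases to the candidate variables keeps rows whose only missing values are in columns the model never uses.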
I would discourage you from simply omitting the missing values if they really are missing. You can use multiple imputation via Amelia to impute the data so that you have a full dataset.
See here: https://cran.r-project.org/web/packages/Amelia/Amelia.pdf
I would also recommend reviewing the book "Statistical Analysis with Missing Data" by R.J.A. Little and D.B. Rubin.

Multiple predictors with the smbinning package

This might not be the right place to ask but I'm not sure where else to ask it. I'm trying to use the smbinning package. In particular, I'm trying to bin by multiple predictor variables. The issue is all the examples in the package documentation only deal with one predictor variable. I tried this naively:
result=smbinning(df=training,y="FlagGB",x=".,",p=.05)
which seemed to execute okay, but then if I tried to run result$ivtable I got the error
Error in result$ivtable : $ operator is invalid for atomic vectors
Does anyone know a) how to get smbinning to accept multiple predictors or, if it can't, another package that can; b) how to resolve the specific error listed above?
I have solved the problem. It happens because training may not be a data frame; you have to convert it with as.data.frame(training). In the smbinning code (https://github.com/cran/smbinning/blob/master/R/smbinning.R#L490) there is this block:
i=which(names(df)==y) # Find Column for dependant
j=which(names(df)==x) # Find Column for independant
if (!is.numeric(df[,i]))
{
return("Target (y) not found or it is not numeric")
}
Secondly, the y variable FlagGB must be numeric. If your y variable is a factor, convert it with as.numeric(as.character(y)) rather than calling as.numeric() on it directly.
The problem is similar to "Target (y) not found or it is not numeric" -Package smbinning - R
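The difference between the two conversions is easy to see on a small factor. This is a base-R sketch; the vector flag is a hypothetical stand-in for a FlagGB column stored as a factor.

```r
# Hypothetical target stored as a factor with labels "0" and "1".
flag <- factor(c("0", "1", "1", "0"))

# Direct conversion returns the internal level codes, not the labels.
codes <- as.numeric(flag)                  # 1 2 2 1

# Going through character recovers the values the labels spell out.
values <- as.numeric(as.character(flag))   # 0 1 1 0
```

A 0/1 target converted directly would silently become 1/2, which is why the detour through as.character() matters.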
Have you looked into the "Information" package? It seems to do the job, but there is no facility to recode the variable. Or if there is one, I haven't been able to find it. Otherwise, it is a really great package for exploring and analysing the variables.
To answer b): print result on its own and you will (most probably) see that the function did not in fact execute; the value it returned is the message stating the specific reason.
Indeed, it is a bit confusing that the smbinning package reports its errors silently, as the returned value itself.
Question a), on the other hand, is hard to answer without looking at the data. You can try to cross/multiply your variables, but that may result in a very large number of factor levels. I would suggest that you apply the smbinning package to group each of your characteristics into a few groups and then try to cross the groups.
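The point about silent errors can be reproduced in base R without the package. In this sketch, result is simply assigned the message string quoted earlier in this thread, standing in for what smbinning returned; no smbinning call is made.

```r
# Stand-in for a failed smbinning() call: the "result" is just a message string.
result <- "Target (y) not found or it is not numeric"

# A character vector is atomic, so $ cannot be used on it.
msg <- tryCatch(result$ivtable, error = function(e) conditionMessage(e))
msg

# A cheap guard before using the result.
if (is.character(result)) {
  message("Binning did not run: ", result)
}
```

Checking is.character(result) before touching result$ivtable turns the confusing $ error into the actual reason the binning failed.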
For question a), you can use the smbinning.sumiv function, which calculates the IV for all variables in one step. For example:
sumivt=smbinning.sumiv(chileancredit.train,y="FlagGB")
sumivt # Display table with IV by characteristic

How do I set the levels in a dataset using the model data structure from bnlearn?

I'm trying to use models from the bnlearn package in R to do classifier predictions, but in some datasets some of the variable values (levels) are rarely seen, which means that the test data partition may not have all of the values for a variable represented in the data file.
When using predict() with the bn model on this type of data set, an error message similar to the following is returned:
: In check.data(data) : variable V3 has levels that are not observed in the data.
I would like to reset the levels in the model similar to the method here:
Error in bn.fit predict function in bnlear R
but I don't have access to the original data, just the model.
So, how do I get the number of levels from the bn data structure to set the number of levels in the data set to be predicted?
The answer is that the question is asking the wrong thing. After quite a bit of poring over the code, I found that the message comes from a function, check.data, which is used to verify the data in both the learning and the predicting phases; applying that check when predicting is nonsensical. The correct fix is to modify bnlearn to eliminate this bug.
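For readers who still want the data to be predicted to carry the same level set as the model, the base-R sketch below re-declares a factor with a given set of levels. Here model_levels is a hypothetical stand-in for whatever levels the model holds. As an assumption not confirmed in this thread, for a fitted discrete node they may be readable from the dimnames of its probability table, for example dimnames(fitted$V3$prob)[[1]].

```r
# Hypothetical level set taken from the model for variable V3.
model_levels <- c("a", "b", "c")

# Test partition in which the rare level "c" never occurs.
test <- data.frame(V3 = factor(c("a", "a", "b")))
levels(test$V3)            # only "a" and "b"

# Re-declare the factor with the model's levels; the values are unchanged.
test$V3 <- factor(test$V3, levels = model_levels)
levels(test$V3)            # "a" "b" "c"
```

This aligns the level sets but leaves "c" unobserved in the test data, so the message from check.data may well still appear, which is consistent with the answer above.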

Error thrown while imputing values using regression FNN package in R

I am trying to impute missing values using regression. I have searched thoroughly online, but it hasn't been of much help. I read the FNN package documentation for the knn.reg function and find it difficult to interpret. I have a column of missing values in the test data which I want to predict using my training data, and my code looks like this:
regress<-knn.reg(data.train[data.train[,4]==1,][c(1,2,3)],test=data.test[c(1,2,3)],data.test[c(2)],5)
But I get the following error: Error in get.knnx(train, test, k, algorithm) : Data include NAs. The column which contains missing values is column 2. When I exclude the column which has NA values, i.e.
regress<-knn.reg(data.train[data.train[,4]==1,][c(1,2,3)],test=data.test[c(1,3)],data.test[c(2)],5)
I get this error: Error in get.knnx(train, test, k, algorithm) : Number of columns must be same!. Please help!
You might want to consider the mice package (and read part of the paper).
Using the standard settings, which have proven to be a good starting point:
library(mice)
mi <- mice(dataset)
mi.reg <- with(data=mi,expr=glm(y~x+z))
Here, simply calling mice() on your data imputes each NA value, producing several completed datasets (five by default), and with() then fits the model on each of them. Finer tuning is of course possible (and needed if it takes too long to converge, or if you have reason to believe it is not accurate). Many different types of imputation are possible and are listed on page 16.
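The two errors in the question both come from how the columns are split: the predictors must be the same columns in the training and test sets, and the column being imputed must be passed only as the response. The sketch below illustrates that split with a small hand-written nearest-neighbour regression in base R. It is not FNN's implementation, and the data and the helper knn_predict are hypothetical.

```r
# Hypothetical data: column x2 has the missing values to be imputed.
dat <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(10, 20, NA, 40, NA, 60),
  x3 = c(1, 2, 3, 4, 5, 6)
)

known   <- !is.na(dat$x2)
train_x <- as.matrix(dat[known,  c("x1", "x3")])  # predictors, rows with x2 observed
train_y <- dat$x2[known]                          # response: the observed x2 values
test_x  <- as.matrix(dat[!known, c("x1", "x3")])  # same predictor columns, rows to impute

# Plain k-nearest-neighbour regression: for each test row, average the
# response of the k training rows closest in Euclidean distance.
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    mean(train_y[order(d)[seq_len(k)]])
  })
}

# Fill the missing entries with the predictions.
dat$x2[!known] <- knn_predict(train_x, train_y, test_x, k = 2)
```

With FNN installed, the same three pieces would be passed as knn.reg(train = train_x, test = test_x, y = train_y, k = 5), so that neither the training nor the test predictors contain the NA column.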

clusterboot function in the fpc package

I have a dataset of various measurements of eggs and coloration patterns etc.
I want to group these into clusters. I have used hierarchical clustering on the dataset, but I haven't found a good way to verify or validate the clusters.
I've heard discussion of cluster stability, and I want to use something like the clusterboot function in the fpc package. For some reason I can't get it to work though. I was wondering if there is anyone on here who has experience with this function.
Here is the code I was using:
dMOFF.2007<-dist(MOFF.2007)
cf1<-clusterboot(MOFF.2007,B=3,bootmethod=boot,bscompare=TRUE,multipleboot=TRUE,clustermethod=hclust)
I'm just starting to understand what all of this means. I have experience with R but not with this specific function or much with cluster analyses.
I get this error:
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") :
missing value where TRUE/FALSE needed
Any thoughts? What am I doing wrong?
Just came across this because I'm working with clusterboot too. Are you still stuck on this? I have two basic thoughts: 1) wouldn't you want to pass the distance matrix (dMOFF.2007) to clusterboot instead of the raw data (MOFF.2007)? 2) for the clustermethod argument, I believe it should be hclustCBI, not hclust. Hope you've got it working.
