Clustering with bigkmeans from bigmemory package in R?

I recently started experimenting with the biganalytics package for R, but I ran into a problem.
I am trying to run bigkmeans with about 2000 clusters, e.g. clust <- bigkmeans(mymatrix, centers=2000)
However, I get the following error:
Error in 1:(10 + 2^k) : result would be too long a vector
Can someone maybe give me a hint what I am doing wrong here?

Vectors are limited by the type used for the index. There has been some talk about replacing this index type with a double, but it hasn't happened yet and is unlikely to, as it could break a lot of existing code.
The expression in the error message, 1:(10 + 2^k), grows exponentially with k, so if your k is really large, you may not be able to do this the way you had planned.
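To see why k = 2000 cannot work, here is a minimal base R sketch that evaluates the expression from the error message on its own. It does not call bigkmeans, and the variable names are made up for illustration.

```r
# Evaluate the length that appears in the error message for k = 2000.
k <- 2000
len <- 10 + 2^k            # 2^2000 exceeds the largest double, so this is Inf
too_big <- is.infinite(len)

# Attempting to build the sequence fails; capture the message instead of stopping.
msg <- tryCatch(
  { 1:len; "no error" },
  error = function(e) conditionMessage(e)
)
print(too_big)
print(msg)

# For comparison, a small k gives a perfectly ordinary vector length.
small_len <- length(1:(10 + 2^5))   # 10 + 32 = 42 elements
```

Since the requested length is not even a finite number here, a wider index type would not help either.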

Related

unrecognized function Nn in R

I am learning the R package SimInf to simulate data-driven stochastic epidemiological models. As I was reading the documentation I came across an unrecognized function Nn when defining a function for epicurves. Specifically, this line:
j <- sample(seq_len(Nn(model)), 1)
Values of model are integers. My guess is that Nn selects non-negative values, however my R does not recognize this function. From the documentation it does not look like they pre-defined Nn either. Can someone please tell me what "Nn" is for? Thank you.
One approach is to take the package name and append a triple colon, which lets you find nearly all functions inside the package. You may be familiar with namespacing a function via packageName::functionFrompackageTocall. packageName::: shows (nearly) all functions defined in the package, including those that are not exported. If you do this in RStudio with SimInf:: and SimInf:::, you will see that the latter lists many more functions. But you will only find the functions SimInf:::Nd and SimInf:::Nc, not Nn.
So you have to go to the GitHub sources of the package, in this case https://github.com/stewid/SimInf, and search the whole repository for Nn. In the code it appears to always be an int, which doesn't help, since you are looking for it as a function, not as a variable. Scrolling further down in the search results, you will find the NEWS.md file (https://github.com/stewid/SimInf/blob/fd7eb4a29b82a4a97f64b528bb0e78e5474aa8a5/NEWS.md), which says under SimInf 8.0.0 (2020-09-13): "The 'Nn' function to determine the number of nodes in a model has been replaced with the S4 method 'n_nodes'."
So with a current version of SimInf installed, Nn should no longer be in use. If you use it in your code, replace it with n_nodes. If you find it in current package code, you can email the package maintainer to report it as a bug.
TLDR: Nn is an outdated version of n_nodes
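The namespace lookup described above can also be done programmatically. This is a rough sketch; find_in_namespace is a hypothetical helper, not part of SimInf, and it is demonstrated on the stats package so that it runs without SimInf installed.

```r
# Hypothetical helper: list every object in a package namespace
# (exported or not) whose name matches a regular expression.
find_in_namespace <- function(pkg, pattern) {
  ns <- asNamespace(pkg)
  grep(pattern, ls(ns, all.names = TRUE), value = TRUE)
}

# Demonstration on a package that ships with R.
hits <- find_in_namespace("stats", "^sd$")
print(hits)

# With SimInf installed, the same idea would be (not run here):
# find_in_namespace("SimInf", "^N")
```

If the name you are looking for does not show up, the package's NEWS file or repository history is the next place to check.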

R simmer package question about get_mon_arrivals

I'm currently learning how to use the simmer package in R in order to simulate processes.
I'm trying to gather information regarding a simulation I've built, using the get_mon_arrivals function.
I've noticed something weird about running this function - when I run:
arrivalData <- get_mon_arrivals(Mall)
arrivalDataOngoing <- get_mon_arrivals(Mall,ongoing=TRUE)
I get 2 different tables - just as expected, the first one containing rows for finished customers only, while the second one contains rows for unfinished customers as well, which are the customers that were generated but the simulation ended before they managed to finish the trajectory.
But if I write it the other way around, meaning:
arrivalDataOngoing <- get_mon_arrivals(Mall,ongoing=TRUE)
arrivalData <- get_mon_arrivals(Mall)
I get the same exact table in both cases. I know it's not something important, but I would really like to understand WHY it does that. I know I can fix it easily by going with the first option, but I am a man who likes to understand what he does.
Thanks a lot for the help.

Silhouette not working in r

Running R 3.5.1 in RStudio.
I’ve edited pam.res$clustering to manually change the clusters.
Used silhouette() to try to observe the silhouette info for the edited clustering:
mahal<-D2.dist(data, cov.wt(data)$cov)
newsil<-silhouette(pam.res$clustering, mahal)
And all I get from summary(newsil) is
Mode NA’s
logical 1
I can’t reference within newsil, as it’s an atomic vector, which it shouldn’t be.
Can’t figure out what’s gone wrong. Any ideas? Thanks.

clusterboot function in the fpc package

I have a dataset of various measurements of eggs and coloration patterns etc.
I want to group these into clusters. I have used hierarchical clustering on the dataset, but I haven't found a good way to verify or validate the clusters.
I've heard discussion of cluster stability, and I want to use something like the clusterboot function in the fpc package. For some reason I can't get it to work though. I was wondering if there is anyone on here who has experience with this function.
Here is the code I was using below:
dMOFF.2007<-dist(MOFF.2007)
cf1<-clusterboot(MOFF.2007,B=3,bootmethod=boot,bscompare=TRUE,multipleboot=TRUE,clustermethod=hclust)
I'm just starting to understand what all of this means. I have experience with R but not with this specific function or much with cluster analyses.
I get this error:
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") :
missing value where TRUE/FALSE needed
Any thoughts? What am I doing wrong?
Just came across this because I'm working with clusterboot too. Are you still stuck on this? I have two basic thoughts:
1) Wouldn't you want to pass the distance matrix (dMOFF.2007) to clusterboot instead of the raw data (MOFF.2007)?
2) For the clustermethod argument, I believe it should be hclustCBI, not hclust.
Hope you've got it working.
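For reference, a hedged sketch of what a working call might look like, following the second suggestion. It assumes the fpc package is installed and uses a random matrix x as a stand-in for MOFF.2007. The values k = 3 and method = "average" are arbitrary choices for illustration. bootmethod is passed as the string "boot". hclustCBI takes raw data; for a dist object such as dMOFF.2007, fpc documents a separate interface, disthclustCBI, on the same help page (?kmeansCBI), so check there before passing a distance matrix.

```r
# Only run if fpc is available; the data below is synthetic.
if (requireNamespace("fpc", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(60), ncol = 3)   # hypothetical stand-in for MOFF.2007

  cf1 <- fpc::clusterboot(
    x,
    B = 3,                           # number of bootstrap resamples, as in the question
    bootmethod = "boot",             # a character string, not a bare name
    clustermethod = fpc::hclustCBI,  # CBI wrapper around hclust
    k = 3,                           # assumed number of clusters
    method = "average",              # assumed linkage, passed on to hclust
    count = FALSE                    # suppress progress output
  )

  # Mean Jaccard similarity per cluster across the bootstrap runs.
  print(cf1$bootmean)
}
```

B = 3 is far too small for a real stability assessment; it is kept only to mirror the original call.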

How to use DWD R package in order to remove biases and merge two microarray datasets

I am trying to find a way to use the distance weighted discrimination (DWD) method to remove biases from multiple microarray datasets.
My starting point is this. The problem is that the Matlab version runs only under Windows and needs Excel 5 format as input, where the data appears to be truncated at line 65535. The Matlab error is:
Error reading record for cells starting at column 65535. Try saving as Excel 98.
The Java version runs only with caBIG support, which, if I understood correctly, has been shut down recently.
So I searched a lot and found the R/DWD package, but from the example I could not work out how to provide the two datasets to merge to the kdwd function.
Does anybody know how to use it?
Thanks
Try this; it has a DWD implementation:
http://www.bioconductor.org/packages/release/bioc/html/inSilicoMerging.html
