could not find function "FUNcluster" in R

I want to run kmeans clustering on my data and show the plot using the code below.
The elbow method is used to choose the number of clusters k.
library(tidyverse) # data manipulation
library(cluster) # clustering algorithms
library(factoextra) # clustering algorithms & visualization
library(NbClust) #use zip file to install it
wss <- function(k) {
  kmeans(df_km, k, nstart = 25)$tot.withinss
}
# Compute and plot wss for k = 1 to k = 32
k.values <- 1:32
wss_values <- map_dbl(k.values, wss)
plot(k.values, wss_values, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
set.seed(123)
fviz_nbclust(df_km, FUNcluster = kmeans, method = "wss")
The last line of code throws this error:
Error in FUNcluster(x, i, ...) : could not find function "FUNcluster"
I tried restarting the session, and I installed "factoextra" from a .zip file, from CRAN, and also from GitHub via this command:
devtools::install_github("kassambara/factoextra")
But I still get the error. Any solution?
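For completeness, here is the positional form of the same call, in case the named argument is the issue in an older factoextra version; this is a guess, not a confirmed fix:
# a sketch, not a confirmed fix: pass kmeans positionally rather than as FUNcluster =
# (FUNcluster is the second formal argument of fviz_nbclust in current factoextra)
set.seed(123)
fviz_nbclust(df_km, kmeans, method = "wss", k.max = 32)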

Related

How to convert a tensor to an R array (in a loss function, so without eager execution)?

I have TensorFlow version 2.4 and work with the R packages tensorflow (2.2.0) and keras (2.3.0.0.9000).
I would like to convert tensors to R arrays in a loss function (don't ask why).
Here is an example when such a conversion (outside a loss function) works:
library(tensorflow)
library(keras)
x.R <- matrix(1:12, ncol = 3) # dummy R object
x.tensor <- keras_array(x.R) # converting the R object to a tensor
as.array(x.tensor) # converting it back to an R array. This works because...
stopifnot(tf$executing_eagerly()) # ... eager execution is enabled
During training of a model, however, eager execution is disabled, and thus
the as.array() call fails. To see this, let's first define a dummy
neural network model and training data.
d <- 2 # input and output dimension
in.lay <- layer_input(shape = d)
hid.lay <- layer_dense(in.lay, units = 300, activation = "relu")
out.lay <- layer_dense(hid.lay, units = d, activation = "sigmoid")
model <- keras_model(in.lay, out.lay)
n <- 1200 # number of training samples
data <- matrix(runif(n * d), ncol = d) # training data
Now let's define the loss function and compile the model with it.
myloss <- function(x, y) { # x and y are tensors here
  stopifnot(!tf$executing_eagerly()) # confirms that eager execution is disabled
  x. <- as.array(x) # fails with "RuntimeError: Evaluation error: invalid first argument, must be vector (list or atomic)." How can we convert x to an R array?
  loss_mean_squared_error(x, y) # just a dummy return value (the MSE)
}
compile(model, optimizer = "adam", loss = myloss)
Let's try and fit this model (to see that it fails to convert the tensor x to an R array via as.array()).
prior <- matrix(rexp(n * d), ncol = d) # input sample to train the NN on
n.epoch <- 5 # number of epochs to train
batch.size <- 400 # batch size
fit(model, x = prior, y = data, batch_size = batch.size, epochs = n.epoch) # fails with error message given above
The R package tensorflow provides tfe_enable_eager_execution() to enable eager execution
in a session. But if I call it with TensorFlow 2.4, then I obtain:
tfe_enable_eager_execution() # "Error in py_get_attr_impl(x, name, silent) : AttributeError: module 'tensorflow' has no attribute 'contrib'"
Ideally, I wouldn't want to mess with eager execution much (I'm not sure about the side effects);
I just want to convert a tensor to an array. My guess is that there is no way around eager execution, as
only then are the pointers resolved so that the R package tensorflow can find the data
in the tensor and convert it to an array.
Other ideas for enabling/disabling eager execution are mentioned here, but that's all in Python
and doesn't seem to be available from R. And this post seems to ask the same question, but in a different context.

Conditional simulation (with Kriging) in R with parallelization?

I am using the gstat package in R to generate sequential Gaussian simulations. My PC has 4 cores, and I tried to parallelize the krige() function using the parallel package, following the script provided by Guzmán to answer the question How to achieve parallel Kriging in R to speed up the process?.
The resulting simulations are, however, different from the ones obtained using only one core at a time (no parallelization). It looks like a geometry problem, but I can't figure out how to fix it.
Below I provide an example (using 4 cores) generating 2 simulations. You will see that after running the code, the simulated maps derived from parallelization show some artifacts (like vertical lines) and differ from the ones generated with a single core.
The code needs the libraries gstat, sp, raster, parallel and spatstat. If any of the library() lines does not work, run install.packages() first.
library(gstat)
library(sp)
library(raster)
library(parallel)
library(spatstat)
# create a regular grid
nx=100 # number of columns
ny=100 # number of rows
srgr <- expand.grid(1:ny, nx:1)
names(srgr) <- c('x','y')
gridded(srgr)<-~x+y
# generate a spatial process (unconditional simulation)
g <- gstat(formula = z~x+y, locations = ~x+y, dummy = TRUE, beta = 15,
           model = vgm(psill = 3, range = 10, nugget = 0, model = 'Exp'), nmax = 20)
sim <- predict(g, newdata=srgr, nsim=1)
r<-raster(sim)
# generate sample data (Poisson process)
int<-0.02
rpp<-rpoispp(int,win=owin(c(0,nx),c(0,ny)))
df<-as.data.frame(rpp)
coordinates(df)<-~x+y
# assign raster values to sample data
dfpp <-raster::extract(r,df,df=TRUE)
smp<-cbind(coordinates(df),dfpp)
smp<-smp[complete.cases(smp), ]
coordinates(smp)<-~x+y
# fit variogram to sample data
vs <- variogram(sim1~1, data=smp)
m <- fit.variogram(vs, vgm("Exp"))
plot(vs, model = m)
# generate 2 conditional simulations with one core processor
one <- krige(formula = sim1~1, locations = smp, newdata = srgr, model = m,nmax=12,nsim=2)
# plot simulation 1 and 2: statistics (min, max) are ok, simulations are also ok.
spplot(one["sim1"], main = "conditional simulation")
spplot(one["sim2"], main = "conditional simulation")
# generate 2 conditional with parallel processing
no_cores<-detectCores()
cl<-makeCluster(no_cores)
parts <- split(x = 1:length(srgr), f = 1:no_cores)
clusterExport(cl = cl, varlist = c("smp", "srgr", "parts","m"), envir = .GlobalEnv)
clusterEvalQ(cl = cl, expr = c(library('sp'), library('gstat')))
par <- parLapply(cl = cl, X = 1:no_cores, fun = function(x) krige(formula=sim1~1, locations=smp, model=m, newdata=srgr[parts[[x]],], nmax=12, nsim=2))
stopCluster(cl)
# merge all parts
mergep <- maptools::spRbind(par[[1]], par[[2]])
mergep <- maptools::spRbind(mergep, par[[3]])
mergep <- maptools::spRbind(mergep, par[[4]])
# create SpatialPixelsDataFrame from mergep
mergep <- SpatialPixelsDataFrame(points = mergep, data = mergep@data)
# plot mergep: statistics (min, max) are ok, but simulated maps show "vertical lines". I don't understand why.
spplot(mergep[1], main = "conditional simulation")
spplot(mergep[2], main = "conditional simulation")
I have tried your code and I think the problem lies with the way you split the work:
parts <- split(x = 1:length(srgr), f = 1:no_cores)
On my dual-core machine that meant that all odd indices in srgr were handled by one process and all even indices were handled by the other. This is probably the source of the vertical artifacts you are seeing.
A better way should be to split the data into consecutive chunks like this:
parts <- parallel::splitIndices(length(srgr), no_cores)
Using this splitting with the rest of your code, I get results that look comparable to the sequential ones. At least to my untrained eyes ...
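To see the difference concretely, here is a minimal sketch of the two splitting schemes for 8 grid cells on 2 cores:
# split() with a recycled factor interleaves the indices:
split(1:8, 1:2)               # list(c(1, 3, 5, 7), c(2, 4, 6, 8))
# splitIndices() produces consecutive chunks instead:
parallel::splitIndices(8, 2)  # list(1:4, 5:8)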
My original answer below addresses only a minor effect, but it still might make sense to fix the seed: set.seed() for the sequential run and clusterSetRNGStream() for parallel processing.
From what I have read about Kriging, the conditional simulation requires you to draw random numbers, and these random numbers will be different with parallel processing. See section 6 of the parallel vignette (vignette("parallel")) for more details.
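As a sketch of that (reusing cl and no_cores from your code; the seed value 123 is arbitrary):
set.seed(123)                         # fixes the RNG for the sequential krige() run
cl <- makeCluster(no_cores)
clusterSetRNGStream(cl, iseed = 123)  # independent, reproducible streams per worker
# ... run the parLapply() call as above ...
stopCluster(cl)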

Mclust in R: How to output cluster centers

I'm currently using RStudio to do text mining on support tickets, clustering them by their description (free text). For this, I compare kmeans to the EM algorithm. I prepared the data with the tm package, and now I'm trying to apply clustering algorithms to the data matrix.
With the kmeans() function, I can use the following code snippet to output the 5 most frequent terms per text cluster (kmeans21):
for (i in 1:num_cluster) {
  cat(paste("cluster ", i, ": ", sep = ""))
  s <- sort(kmeans21$centers[i, ], decreasing = T)
  cat(names(s)[1:5], "\n")
}
Until now, I couldn't find a function to do the same within the mclust package. My data has the following format:
bic21 <- mclustBIC(m1, G = 21)
emmodel21 <- summary(bic21, data = m1)
With the command
emmodel21$classification
I can see the cluster for each support ticket, but is there also a way to output the most frequent terms, as in the first code block for kmeans?
I think you can try
summary(mod1, parameters = TRUE)
I just tried the same example from the link:
library(mclust)
data(diabetes)
X <- diabetes[,-1]
BIC <- mclustBIC(X)
mod1 <- Mclust(X, x = BIC)
summary(mod1, parameters = TRUE)
Slightly altering the first example in the vignette:
data(diabetes)
X <- diabetes[,-1]
mod <- Mclust(X)
means <- mod$parameters$mean
The means object is now a matrix whose kth column contains the mean of the kth cluster.
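Applied to the objects from the question, a sketch of the same "top terms" loop (assuming emmodel21 carries the fitted parameters; note that Mclust stores clusters in columns, while kmeans21$centers has them in rows):
means <- emmodel21$parameters$mean      # rows = terms, columns = clusters
for (i in 1:ncol(means)) {
  cat(paste("cluster ", i, ": ", sep = ""))
  s <- sort(means[, i], decreasing = TRUE)
  cat(names(s)[1:5], "\n")
}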

K-means clustering of spatially constrained data - skater in spdep package

I want to cluster the codebook from a self-organizing map using k-means clustering. However, given the 'spatial' nature of the data, I want to constrain the clustering so that only contiguous nodes are clustered together.
After looking around, I decided to try and use the function skater in the spdep package.
Here's an example of what I've been doing.
# the 'codebook' data obtained from the self-organizing map.
# My grid is 15 by 15 nodes.
data <- data.frame(var1=rnorm(15*15, mean = 0, sd = 1), var2=rnorm(15*15, mean = 5, sd = 2))
# creating a matrix with all edges listed
# (so basically one row to show a connection between each pair of adjacent nodes)
require(spdep)
nbs <- cell2nb(nrow = 15, ncol = 15)
tt.grid <- list(xdim = 15, ydim = 15)  # SOM grid dimensions (15 by 15, as stated above)
edges <- data.frame(node = rep(1:(tt.grid$xdim*tt.grid$ydim), each = 4))
edges$nb <- NA
for (i in 1:(tt.grid$xdim*tt.grid$ydim)) {
  vals <- nbs[[i]][1:4]
  edges$nb[(i-1)*4+1] <- vals[1]
  edges$nb[(i-1)*4+2] <- vals[2]
  edges$nb[(i-1)*4+3] <- vals[3]
  edges$nb[(i-1)*4+4] <- vals[4]
}
edges <- edges[which(!is.na(edges$nb)), ]
edges$from <- apply(edges[c("node", "nb")], 1, min)
edges$to <- apply(edges[c("node", "nb")], 1, max)
edges <- edges[c("to", "from")]
edges <- edges[!duplicated(edges), ]
edges <- as.matrix(edges)
I know the code above is really clumsy and not elegant (please bear with me). I tried using mstree(nb2listw(nbs))[,1:2], but it didn't list all the links. I'm not sure I quite understood what it was doing, so I created my matrix of edges manually.
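For reference, a more compact construction that should yield the same unique neighbor pairs (a sketch, assuming nbs as built above):
# each list element of nbs holds the neighbor ids of node i
edges2 <- do.call(rbind, lapply(seq_along(nbs), function(i)
  cbind(from = pmin(i, nbs[[i]]), to = pmax(i, nbs[[i]]))))
edges2 <- unique(edges2)  # drop the duplicated (i,j)/(j,i) pairs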
Then I tried to pass this matrix to the skater function:
test <- skater(edges=edges, data=data, ncuts=5)
but I get the following error message:
Error in colMeans(data[id, , drop = FALSE]) :
error in evaluating the argument 'x' in selecting a method for function 'colMeans': Error in data[id, , drop = FALSE] : subscript out of bounds
However, if I use the mstree edges, I don't get an error message, but the results don't make sense at all.
test <- skater(edges = mstree(nb2listw(nbs))[,1:2], data = data, ncuts = 5)
Any help with this error message (or alternative suggestions for the spatially constrained clustering I'm after) is much appreciated.

value of model unknown error on train function from AMORE package

I'm learning to use the AMORE package to fit data with an optimal neural network, so I'm following the examples on its wiki page and trying to run the code below, but the train() function triggers this error:
modelLookup(method) : value of model unknown
require(AMORE)
## We create two artificial data sets. ''P'' is the input data set. ''target'' is the output.
P <- matrix(sample(seq(-1,1,length=500), 500, replace=FALSE), ncol=1)
target <- P^2 + rnorm(500, 0, 0.5)
## We create the neural network object
net.start <- newff(n.neurons=c(1,3,1),
learning.rate.global=1e-2,
momentum.global=0.5,
error.criterium="LMS",
Stao=NA, hidden.layer="tansig",
output.layer="purelin",
method="ADAPTgdwm")
## We train the network according to P and target.
result <- train(net.start, P, target, error.criterium="LMS", report=TRUE, show.step=100, n.shows=5 )
## Several graphs, mainly to remark that
## the trained network is now an element of the resulting list.
y <- sim(result$net, P)
plot(P,y, col="blue", pch="+")
points(P,target, col="red", pch="x")
Any suggestion is appreciated!
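One unverified guess: modelLookup() is a function from the caret package, which also exports a train(), so if caret (or a package that attaches it) is loaded after AMORE, caret::train() masks AMORE's. Qualifying the call with the namespace would rule that out:
# assumption: the error comes from caret::train() masking AMORE::train()
result <- AMORE::train(net.start, P, target, error.criterium = "LMS",
                       report = TRUE, show.step = 100, n.shows = 5)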
