How do I calculate weighted degree distributions with igraph in R?

Consider a dataframe df where the first two columns are node pairs and successive columns V1, V2, ..., Vn represent flows between the nodes (potentially 0, implying no edge for that column's network). I would like to conduct analysis on degree, community detection, and other network measures using the flows as weights.
Then to analyze the graph with respect to the weights in V1 I do:
# create graph and explore unweighted degrees with respect to V1
g <- graph.data.frame( df[df$V1!=0,] )
qplot(degree(g))
x <- 0:max(degree(g))
qplot(x,degree.distribution(g))
# set weights and explore weighted degrees using V1
E(g)$weights <- E(g)$V1
qplot(degree(g))
The output from the third qplot is no different than the first. What am I doing wrong?
Update:
So graph.strength is what I am looking for, but graph.strength(g) in my case gives standard degree output followed by:
Warning message:
In graph.strength(g) :
At structural_properties.c:4928 :No edge weights for strength calculation,
normal degree
I must be setting the weights incorrectly. Is it not sufficient to do E(g)$weights <- E(g)$V1? And why can g$weights differ from E(g)$weights?

The function graph.strength can be given a weights vector with the weights argument. I think what is going wrong in your code is the attribute name: igraph looks for an edge attribute called weight, so you should set E(g)$weight, not E(g)$weights.
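A minimal sketch of the fix, assuming a small made-up data frame (the node names and flow values are hypothetical) and the current igraph names graph_from_data_frame() and strength() in place of the older graph.data.frame() and graph.strength(). Once the edge attribute is named weight, the strength calculation picks it up without an explicit weights argument:

```r
library(igraph)

# Hypothetical flows: the a-c pair has a zero flow, so it is dropped
df <- data.frame(from = c("a", "a", "b"),
                 to   = c("b", "c", "c"),
                 V1   = c(2, 0, 5))

# Extra columns of the data frame (here V1) become edge attributes
g <- graph_from_data_frame(df[df$V1 != 0, ], directed = FALSE)

# The attribute must be called "weight" (singular)
E(g)$weight <- E(g)$V1

s <- strength(g)  # weighted degree: sum of the weights of incident edges
print(s)
```

With these values the remaining edges are a-b (2) and b-c (5), so the strengths are the sums of those weights at each node, while degree(g) still counts edges.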

I created an equivalent degree.distribution function for weighted graphs for my own code by taking the degree.distribution code and making one change:
graph.strength.distribution <- function (graph, cumulative = FALSE, ...)
{
  if (!is.igraph(graph)) {
    stop("Not a graph object")
  }
  # graph.strength() instead of degree()
  cs <- graph.strength(graph, ...)
  # ceiling() keeps the breaks spanning the data when strengths are not integers
  hi <- hist(cs, -1:ceiling(max(cs)), plot = FALSE)$density
  if (!cumulative) {
    res <- hi
  }
  else {
    res <- rev(cumsum(rev(hi)))
  }
  res
}


Principal component analysis using R. Automatic and manual results do not match

Two different methods of principal component analysis were applied to the following data (ch082.dat) using the R code in Box 1, below. https://drive.google.com/file/d/1xykl6ln-bUnXIs-jIA3n5S3XgHjQbkWB/view?usp=sharing
The first method uses the rotation matrix (see 'ans_mat' under '#rotated data' in the Box 1 code), and
the second method uses the 'prcomp' function (see 'rpca' under '#rotated data' in the Box 1 code).
However, there is a subtle discrepancy between the results of the rotation-matrix method and those of the 'prcomp' function.
My Question
What should I do so that the result of the rotation-matrix method matches the result of the 'prcomp' function?
As far as I can tell from trying various data sets, the discrepancies seem to be limited to scale shifts and mirroring transformations.
The results of the rotation-matrix method are shown in the left panel.
The results of the prcomp-based method are shown in the right panel.
Mirror inversion can be seen in the "ch082.dat" data (see Fig.1).
It seems that, for some j, the sign of the jth eigenvector of the correlation matrix and the sign of the jth column of the prcomp output may be reversed. If some eigenvalues are repeated, the difference may be more complex than a mirror inversion.
Fig.1
There is a scale shift for the Box 2 data (see Fig.2), despite centering and normalizing the data.
Fig.2
Box.1
#dataload
##Use the 'setwd' function to specify the directory containing 'ch082.dat'.
##For example, if you put this file directly under the C drive of your Windows PC, you can run the following command.
setwd("C:/") #Depending on where you put the file, you may need to change the path.
getwd()
w1<-read.table("ch082.dat",header = TRUE,row.names = 1,fileEncoding = "UTF-8")
w1
#Function for standardizing data
#Thanks to https://qiita.com/ohisama2/items/5922fac0c8a6c21fcbf8
standalize <- function(data)
{ for(i in length(data[1,]))
{
x <- as.matrix(data[,i])
y <- (x-mean(x)/sd(x))
data[,i] <- y
}
return(data)}
#Method using rotation matrix
z_=standalize(w1)
B_mat=cor(z_) #Compute correlation matrix
eigen_m <- eigen(B_mat)
sample_mat <- as.matrix(z_)
ans_mat=sample_mat
for(j in 1:length(sample_mat[1,])){
ans_mat[,j]=sample_mat%*%eigen_m$vectors[,j]
}
#Method using the prcomp function (result stored in rpca)
rpca <- prcomp(w1,center=TRUE, scale=TRUE)
#eigen vectors
eigen_m$vectors
rpca
#rotated data
ans_mat
rpca$x
#Graph Plots
par(mfrow=c(1,2))
plot(
ans_mat[,1],
ans_mat[,2],
main="Rotation using eigenvectors"
)
plot(rpca$x[,1], rpca$x[,2],
main="Principal component score")
par(mfrow=c(1,1))
#summary
summary(rpca)$importance
Box2.
sample_data <- data.frame(
X = c(2,4, 6, 5,7, 8,10),
Y = c(6,8,10,11,9,12,14)
)
X = c(2,4, 6, 5,7, 8,10)
Y = c(6,8,10,11,9,12,14)
plot(Y ~ X)
w1=sample_data
Reference
https://logics-of-blue.com/principal-components-analysis/
(Written in Japanese)
The two sets of results agree. First we can simplify your code a bit. You don't need your function or the for loop:
z_ <- scale(w1)
B_mat <- cor(z_)
eigen_m <- eigen(B_mat)
ans_mat <- z_ %*% eigen_m$vectors
Now the prcomp version:
z_pca <- prcomp(z_)
z_pca$sdev^2 # Equals eigen_m$values
z_pca$rotation # Equals eigen_m$vectors
z_pca$x # Equals ans_mat
Your original code mislabeled ans_mat columns. They are actually the principal component scores. You can fix that with
colnames(ans_mat) <- colnames(z_pca$x)
The pc loadings (and therefore the scores) are not uniquely defined with respect to reflection. In other words multiplying all of the loadings or scores in one component by -1 flips them but does not change their relationships to one another. Multiply z_pca$x[, 1] by -1 and the plots will match:
z_pca$x[, 1] <- z_pca$x[, 1] * -1
dev.new(width=10, height=6)
par(mfrow=c(1,2))
plot(ans_mat[,1], ans_mat[,2], main="Rotation using eigenvectors")
plot(z_pca$x[,1], z_pca$x[,2], main="Principal component score")
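As a check on the reasoning above, here is a small sketch using the Box 2 data. It assumes nothing beyond base R; the helper names signs and aligned are introduced here for illustration. It also seems worth noting that the standalize function in Box 1 loops over a single value (for(i in length(data[1,])) visits only the last column) and computes x - mean(x)/sd(x) rather than (x - mean(x))/sd(x), which would explain the scale shift in Fig.2. Using scale() avoids both problems:

```r
# Box 2 data
w1 <- data.frame(X = c(2, 4, 6, 5, 7, 8, 10),
                 Y = c(6, 8, 10, 11, 9, 12, 14))

# Eigenvector route: standardize, then project onto the eigenvectors
z_ <- scale(w1)
eigen_m <- eigen(cor(z_))
ans_mat <- z_ %*% eigen_m$vectors

# prcomp route
z_pca <- prcomp(w1, center = TRUE, scale. = TRUE)

# Each component is defined only up to sign, so find the sign (+1 or -1)
# that maps every prcomp rotation column onto the matching eigenvector
signs <- sign(colSums(eigen_m$vectors * z_pca$rotation))
aligned <- sweep(z_pca$x, 2, signs, `*`)

# After aligning signs the two sets of scores agree
print(all.equal(unname(ans_mat), unname(aligned)))
```

Flipping signs programmatically this way avoids having to spot by eye which component is mirrored.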

How to permute a network in igraph for R?

I'm trying to write code for a Monte Carlo procedure in R. My goal is to estimate the significance of a metric calculated for a weighted, unipartite, undirected network formatted for the package igraph.
So far, I included the following steps in the code:
1. Create the weighted, unipartite, undirected network and calculate the observed Louvain modularity
nodes <- read.delim("nodes.txt")
links <- read.delim("links.txt")
anurosnet <- graph_from_data_frame(d=links, vertices=nodes, directed=F)
anurosnet
modularity1 = cluster_louvain(anurosnet)
modularity1$modularity #observed value
obs=modularity1$modularity
obs
real<-data.frame(obs)
real
2. Create the empty vector
Nperm = 9 #I am starting with a low n, but intend to use at least 1000 permutations
randomized.modularity=matrix(nrow=length(obs),ncol=Nperm+1)
row.names(randomized.modularity)=names(obs)
randomized.modularity[,1]=obs
randomized.modularity
3. Permute the original network preserving its characteristics, calculate the Louvain modularity for all randomized networks, and compile the results in the vector
i<-1
while(i<=Nperm){
randomnet <- rewire(anurosnet, with=each_edge(0.5)) #rewire vertices with constant probability
E(randomnet)$weight <- sample(E(anurosnet)$weight) #shuffle initial weights and assign them randomly to edges
mod<-(cluster_louvain(randomnet))
mod$modularity
linha = mod$modularity
randomized.modularity[,i+1]=linha
print(i)
i=i+1
}
randomized.modularity #Here the result is not as expected
4. Plot the observed value against the distribution of randomized values
niveis<-row.names(randomized.modularity)
for(k in niveis)
{
if(any(is.na(randomized.modularity[k,]) == TRUE))
{
print(c(k, "metrica tem NA"))
} else {
nome.arq<- paste("modularity",k,".png", sep="")
png(filename= nome.arq, res= 300, height= 15, width=21, unit="cm")
plot(density(randomized.modularity[k,]), main="Observed vs. randomized",)
abline(v=obs[k], col="red", lwd=2, xlab="")
dev.off()
print(k)
nome.arq<- paste("Patefield_Null_mean_sd_",k,".txt", sep="")
write.table(cbind(mean(randomized.modularity[k,]),sd(randomized.modularity[k,])), file=paste(nome.arq,sep=""),
sep=" ",row.names=TRUE,col.names=FALSE)
}
}
5. Estimate the P-value (significance)
significance=matrix(nrow=nrow(randomized.modularity),ncol=3)
row.names(significance)=row.names(randomized.modularity)
colnames(significance)=c("p (rand <= obs)", "p (rand >= obs)", "p (rand=obs)")
signif.sup=function(x) sum(x>=x[1])/length(x)
signif.inf=function(x) sum(x<=x[1])/length(x)
signif.two=function(x) ifelse(min(x)*2>1,1,min(x)*2)
significance[,1]=apply(randomized.modularity,1,signif.inf)
significance[,2]=apply(randomized.modularity,1,signif.sup)
significance[,3]=apply(significance[,-3],1,signif.two)
significance
Something is going wrong in step 3. I expected the vector to be filled with 10 values, but for some reason it stops after a while.
The slot "mod$modularity" suddenly receives 2 values instead of 1.
The two TXT files mentioned in the beginning of the code can be downloaded from here:
https://1drv.ms/t/s!AmcVKrxj94WClv8yQyqyl4IWk5mNvQ
https://1drv.ms/t/s!AmcVKrxj94WClv8z_Pow5Tg2U7mjLw
Could you please help me?
Your error is due to a mismatch between the dimensions of your randomized.modularity matrix and some of your randomized modularity results. In your example the matrix ends up being [1 x (Nperm + 1)]; however, sometimes 2 modularity scores are returned during the permutations. To fix this I simply store the results in a list. The rest of your analysis will need to be adjusted, since the number of modularity scores varies between permutations.
library(igraph)
nodes <- read.delim("nodes.txt")
links <- read.delim("links.txt")
anurosnet <- graph_from_data_frame(d=links, vertices=nodes, directed=F)
anurosnet
modularity1 = cluster_louvain(anurosnet)
modularity1$modularity #observed value
obs <- modularity1$modularity
obs
real<-data.frame(obs)
real
Nperm = 100 #I am starting with a low n, but intend to use at least 1000 permutations
#randomized.modularity <- matrix(nrow=length(obs),ncol=Nperm+1)
#row.names(randomized.modularity) <- names(obs)
randomized.modularity <- list()
randomized.modularity[[1]] <- obs #[[ ]] stores the whole vector even if obs has more than one value
randomized.modularity
for(i in 1:Nperm){
randomnet <- rewire(anurosnet, with=each_edge(0.5)) #rewire vertices with constant probability
E(randomnet)$weight <- sample(E(anurosnet)$weight) #shuffle initial weights and assign them randomly to edges
mod <- (cluster_louvain(randomnet))
mod$modularity
linha = mod$modularity
randomized.modularity <- c(randomized.modularity, list(linha))
}
randomized.modularity
Better way to write the loop:
randomized.modularity <- lapply(seq_len(Nperm), function(x){
randomnet <- rewire(anurosnet, with=each_edge(0.5)) #rewire vertices with constant probability
E(randomnet)$weight <- sample(E(anurosnet)$weight) #shuffle initial weights and assign them randomly to edges
return(cluster_louvain(randomnet)$modularity)
})
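If a single score per network is what the rest of the analysis needs, one option is to reduce each result to one number. As far as I know, the modularity element returned by cluster_louvain() holds one score per level of the multi-level hierarchy, which is why it sometimes has 2 values; taking the maximum gives one value per permutation. The sketch below is only an illustration under assumptions: it uses igraph's built-in Zachary graph with made-up random weights in place of the nodes.txt and links.txt files, and the names null_mod and p_upper are hypothetical.

```r
library(igraph)
set.seed(1)

# Stand-in network with hypothetical weights (not the original data)
g <- make_graph("Zachary")
E(g)$weight <- runif(ecount(g))

# Observed value: best modularity over the levels
obs <- max(cluster_louvain(g)$modularity)

Nperm <- 20
null_mod <- vapply(seq_len(Nperm), function(i) {
  rnd <- rewire(g, with = each_edge(prob = 0.5))  # rewire edge endpoints
  E(rnd)$weight <- sample(E(g)$weight)            # shuffle the weights
  max(cluster_louvain(rnd)$modularity)            # one number per network
}, numeric(1))

# One-sided Monte Carlo p-value, counting the observed value as one draw
p_upper <- (sum(null_mod >= obs) + 1) / (Nperm + 1)
print(p_upper)
```

Because vapply() insists on exactly one numeric value per iteration, a permutation that returned two scores would fail loudly instead of silently breaking the result matrix.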

How to predict cluster labeling using DBSCAN object and Gower distance matrix for new data in R

I'm having an issue predicting cluster labels for test data, based on a dbscan clustering model fitted on the training data.
I used a Gower distance matrix when creating the model:
> gowerdist_train <- daisy(analdata_train,
metric = "gower",
stand = FALSE,
type = list(asymm = c(5,6)))
Using this gowerdist matrix, the dbscan clustering model created was:
> sb <- dbscan(gowerdist_train, eps = .23, minPts = 50)
Then I try to use predict to label a test dataset using the above dbscan object:
> predict(sb, newdata = analdata_test, data = analdata_train)
But I receive the following error:
Error in frNN(rbind(data, newdata), eps = object$eps, sort = TRUE,
...) : x has to be a numeric matrix
My guess is that this error comes from the absence of a Gower distance matrix for the test data.
My question is: should I create a Gower distance matrix for all data (analdata_train + analdata_test) and feed it into predict? How else would the algorithm know the distance of the test data from the train data in order to label it?
In that case, would the newdata parameter be the new Gower distance matrix that contains ALL (train + test) data, and would the data parameter in predict be the training distance matrix, gowerdist_train?
What I am not quite sure about is how the predict algorithm would distinguish between the test and train data in the newly created gowerdist_all matrix.
The two matrices (the new gowerdist for all data and gowerdist_train) would obviously not have the same dimensions. Also, it doesn't make sense to me to create a Gower distance matrix only for the test data, because the distances must be relative to the train data, not to the test data itself.
Edit:
I tried using gower distance matrix for all data (train + test) as my new data and received an error when fed to predict:
> gowerdist_all <- daisy(rbind(analdata_train, analdata_test),
metric = "gower",
stand = FALSE,
type = list(asymm = c(5,6)))
> test_sb_label <- predict(sb, newdata = gowerdist_all, data = gowerdist_train)
ERROR: Error in 1:nrow(data) : argument of length 0 In addition:
Warning message: In rbind(data, newdata) : number of columns of
result is not a multiple of vector length (arg 1)
So, my suggested solution doesn't work.
I decided to write code that uses the kNN function in the dbscan package to predict cluster labels from a Gower distance matrix. The code is not very pretty and definitely not programmatically efficient, but it works. Happy for any suggestions that would improve it.
The pseudocode is:
1) calculate new gower distance matrix for all data, including test and train
2) use the above distance matrix in kNN function (dbscan package) to determine the k nearest neighbours to each test data point.
3) determine the cluster labels for all those nearest points for each test point. Some of them will have no cluster labeling because they are test points themselves
4) create a count matrix to count the frequency of clusters for the k nearest points for each test point
5) use a very simple likelihood calculation to choose the cluster for the test point based on its neighbours' clusters (the maximum frequency). This part also considers the neighbouring test points: the cluster for the test point is chosen only when the maximum frequency is still the largest after you add the number of neighbouring test points to each of the other clusters. Otherwise, it doesn't decide the cluster for that test point and waits for the next iteration, when hopefully more of its neighbouring test points have had their cluster label decided based on their neighbours.
6) repeat above (steps 2-5) until you've decided all clusters
** Note: this algorithm doesn't always converge (once you do the math, it's obvious why). So, in the code I break out of the algorithm when the number of unclustered test points hasn't changed for a while. Then I repeat steps 2-6 with a new kNN (change the number of nearest neighbours and run the code again). This ensures more points are involved in the decision in the next round. I've tried both larger and smaller values of k and both work. It would be good to know which one is better. So far I haven't had to run the code more than twice to decide the clusters for the test data points.
Here is the code:
#calculate gower distance for all data (test + train)
gowerdist_test <- daisy(all_data[rangeofdataforgowerdist],
metric = "gower",
stand = FALSE,
type = list(asymm = listofasymmvars),
weights = Weights)
summary(gowerdist_test)
Then use the code below to label clusters for test data.
#library(dbscan)
# find the k nearest neighbours for each point and order them by distance
iteration_MAX <- 50
iteration_current <- 0
maxUnclusterRepeatNum <- 10
repeatedUnclustNum <- 0
unclusteredNum <- sum(is.na(all_data$Cluster))
previousUnclustereNum <- sum(is.na(all_data$Cluster))
nn_k = 30 #number of nearest neighbours
while (anyNA(all_data$Cluster) & iteration_current < iteration_MAX)
{
if (repeatedUnclustNum >= maxUnclusterRepeatNum) {
print(paste("Max number of repetitions (", maxUnclusterRepeatNum ,") for the same unclustered data has been reached. Clustering terminated unsuccessfully."))
invisible(gc())
break;
}
nn_test <- kNN(gowerdist_test, k = nn_k, sort = TRUE)
# for the TEST points in all data, find the closest TRAIN points and decide statistically which cluster they could belong to, based on the clusters of the nearest TRAIN points
test_matrix <- nn_test$id[1: nrow(analdata_test),] #create matrix of test data knn id's
numClusts <- nlevels(as.factor(sb_train$cluster))
NameClusts <- as.character(levels(as.factor(sb_train$cluster)))
count_clusters <- matrix(0, nrow = nrow(analdata_test), ncol = numClusts + 1) #create a count matrix that would count number of clusters + NA
colnames(count_clusters) <- c("NA", NameClusts) #name each column of the count matrix to cluster numbers
# get the cluster number of each k nearest neighbour of each test point
for (i in 1:nrow(analdata_test))
for (j in 1:nn_k)
{
test_matrix[i,j] <- all_data[nn_test$id[i,j], "Cluster"]
}
# populate the count matrix for the total clusters of the neighbours for each test point
for (i in 1:nrow(analdata_test))
for (j in 1:nn_k)
{
if (!is.na(test_matrix[i,j]))
count_clusters[i, c(as.character(test_matrix[i,j]))] <- count_clusters[i, c(as.character(test_matrix[i,j]))] + 1
else
count_clusters[i, c("NA")] <- count_clusters[i, c("NA")] + 1
}
# add NA's (TEST points) to the other clusters for comparison
count_clusters_withNA <- count_clusters
for (i in 2:ncol(count_clusters))
{
count_clusters_withNA[,i] <- t(rowSums(count_clusters[,c(1,i)]))
}
# This block of code decides the maximum count of cluster for each row considering the number other test points (NA clusters) in the neighbourhood
max_col_countclusters <- apply(count_clusters,1,which.max) #get the column that corresponds to the maximum value of each row
for (i in 1:length(max_col_countclusters)) #insert the maximum value of each row in its associated column in count_clusters_withNA
count_clusters_withNA[i, max_col_countclusters[i]] <- count_clusters[i, max_col_countclusters[i]]
max_col_countclusters_withNA <- apply(count_clusters_withNA,1,which.max) #get the column that corresponds to the maximum value of each row with NA added
compareCountClust <- max_col_countclusters_withNA == max_col_countclusters #compare the two count matrices
all_data$Cluster[1:nrow(analdata_test)] <- ifelse(compareCountClust, NameClusts[max_col_countclusters - 1], all_data$Cluster) #you subtract one because of additional NA column
iteration_current <- iteration_current + 1
unclusteredNum <- sum(is.na(all_data$Cluster))
if (previousUnclustereNum == unclusteredNum)
repeatedUnclustNum <- repeatedUnclustNum + 1
else {
repeatedUnclustNum <- 0
previousUnclustereNum <- unclusteredNum
}
print(paste("Iteration: ", iteration_current, " - Number of remaining unclustered:", sum(is.na(all_data$Cluster))))
if (unclusteredNum == 0)
print("Cluster labeling successfully Completed.")
invisible(gc())
}
I guess you can use this for any other type of clustering algorithm. It doesn't matter how you decided the cluster labels for the train data, as long as they are in your all_data before running the code.
Hope this helps.
It is not the most efficient or rigorous code, so I'm happy to see suggestions on how to improve it.
*Note: I used t-SNE to compare the clustering of the train data with the test data, and it looks impressively clean, so it seems to be working.
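For comparison, a much simpler (and cruder) variant of the same idea is sketched below: compute one Gower matrix over train and test rows together, keep only the test-by-train block, and give each test point the label of its nearest training point if that point lies within eps, else 0 (noise). This is a sketch under assumptions, not what predict() in dbscan does internally: the toy data, the train_cluster labels and the names cross, nearest and pred are all made up for illustration. Note also that daisy() scales numeric variables by their range in the data it is given, so distances from the combined matrix can differ slightly from those in a train-only matrix.

```r
library(cluster)
set.seed(42)

# Hypothetical mixed-type training data with two obvious groups
train <- data.frame(x = c(rnorm(20, 0, 0.1), rnorm(20, 5, 0.1)),
                    g = factor(rep(c("a", "b"), each = 20)))
train_cluster <- rep(1:2, each = 20)  # stand-in for the fitted labels, e.g. sb$cluster

test <- data.frame(x = c(0.05, 4.9),
                   g = factor(c("a", "b"), levels = c("a", "b")))

# One Gower matrix over all rows, then keep the test-by-train block
d_all <- as.matrix(daisy(rbind(train, test), metric = "gower"))
n_train <- nrow(train)
cross <- d_all[(n_train + 1):nrow(d_all), 1:n_train, drop = FALSE]

eps <- 0.23  # same radius as in the dbscan call
nearest <- apply(cross, 1, which.min)                       # closest train row
nearest_dist <- cross[cbind(seq_len(nrow(cross)), nearest)]  # its distance
pred <- ifelse(nearest_dist <= eps, train_cluster[nearest], 0L)
print(pred)
```

Restricting the candidates to core points of the fitted model, rather than all training points, would be closer in spirit to DBSCAN itself.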

Different results when performing PCA in R with princomp() and principal ()

I tried to use princomp() and principal() to do PCA in R with the data set USArrests. However, I got two different results for the loadings/rotation and scores.
First, I centered and normalised the original data frame so it is easier to compare the outputs.
library(psych)
trans_func <- function(x){
x <- (x-mean(x))/sd(x)
return(x)
}
A <- USArrests
USArrests <- apply(USArrests, 2, trans_func)
princompPCA <- princomp(USArrests, cor = TRUE)
principalPCA <- principal(USArrests, nfactors=4 , scores=TRUE, rotate = "none",scale=TRUE)
Then I got the results for the loadings and scores using the following commands:
princompPCA$loadings
principalPCA$loadings
Could you please help explain why there is a difference, and how these results should be interpreted?
At the very end of the help document of ?principal:
"The eigen vectors are rescaled by the sqrt of the eigen values to produce the component loadings more typical in factor analysis."
So principal returns the scaled loadings. In fact, principal produces a factor model estimated by the principal component method.
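The rescaling described in that help text can be reproduced with base R alone. The sketch below does not call principal(); it only assumes the quoted sentence is an accurate description of what principal() reports, and the names unit_load and scaled_load are introduced here for illustration. It multiplies the unit-length princomp loadings by the component standard deviations (the square roots of the eigenvalues):

```r
X <- scale(USArrests)
pc <- princomp(X, cor = TRUE)
ev <- eigen(cor(X))

# princomp loadings are unit-length eigenvectors
unit_load <- unclass(pc$loadings)
print(colSums(unit_load^2))

# Component variances are the eigenvalues of the correlation matrix
print(all.equal(unname(pc$sdev^2), ev$values))

# Rescale each eigenvector by the square root of its eigenvalue
scaled_load <- unit_load %*% diag(pc$sdev)

# The rescaled loadings are the correlations between the
# original variables and the component scores
print(all.equal(unname(scaled_load), unname(cor(X, pc$scores))))
```

So the two sets of loadings describe the same directions; they differ only in the length given to each column.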
Four years later, I would like to provide a more accurate answer to this question. I use the iris data as an example.
data = iris[, 1:4]
First, do PCA by the eigen-decomposition
eigen_res = eigen(cov(data))
l = eigen_res$values
q = eigen_res$vectors
Then the eigenvector corresponding to the largest eigenvalue gives the loadings of the first component
q[,1]
We can treat this as a reference or the correct answer. Now we check the results by different r functions.
First, by function 'princomp'
res1 = princomp(data)
res1$loadings[,1]
# compare with
q[,1]
No problem, this function actually just returns the same results as 'eigen'. Now move on to 'principal'
library(psych)
res2 = principal(data, nfactors=4, rotate="none")
# the loadings of the first PC is
res2$loadings[,1]
# compare it with the results by eigendecomposition
sqrt(l[1])*q[,1] # re-scale the eigen vector by sqrt of eigen value
You may find they are still different. The problem is that the 'principal' function performs the eigendecomposition on the correlation matrix by default. Note: PCA is not invariant to rescaling the variables. If you modify the code as
res2 = principal(data, nfactors=4, rotate="none", cor="cov")
# the loadings of the first PC is
res2$loadings[,1]
# compare it with the results by eigendecomposition
sqrt(l[1])*q[,1] # re-scale the eigen vector by sqrt of eigen value
Now, you will get the same results as 'eigen' and 'princomp'.
To summarize:
If you want to do PCA, you'd better apply the 'princomp' function.
PCA is a special case of the factor model or a simplified version of the factor model. It is just equivalent to eigendecomposition.
We can apply PCA to get an approximation of a factor model. It doesn't care about the specific factors, i.e. the epsilons in a factor model. So, if you change the number of factors in your model, you will get the same estimates of the loadings. This is different from maximum likelihood estimation.
If you are estimating a factor model, you'd better use the 'principal' function, since it provides more functionality, like rotation, calculating the scores by different methods, and so on.
Rescaling the loadings of a PCA model doesn't affect the results much, since you still project the data onto the same optimal directions, i.e. those that maximize the variance of the resulting PCs.
ev <- eigen(R) # R is a correlation matrix of DATA
ev$vectors %*% diag(ev$values) %*% t(ev$vectors)
pc <- princomp(scale(DATA, center = F, scale = T),cor=TRUE)
p <-principal(DATA, rotate="none")
#eigen values
ev$values^0.5
pc$sdev
p$values^0.5
#eigen vectors - loadings
ev$vectors
pc$loadings
p$weights %*% diag(p$values^0.5)
pc$loading %*% diag(pc$sdev)
p$loadings
#weights
ee <- diag(0,2)
for (j in 1:2) {
for (i in 1:2) {
ee[i,j] <- ev$vectors[i,j]/p$values[j]^0.5
}
};ee
#scores
s <- as.matrix(scale(DATA, center = T, scale = T)) %*% ev$vectors
scale(s)
p$scores
scale(pc$scores)

How do I weight variables with gower distance in r

I am new to R and am working on a data set including nominal, ordinal and metric data.
Therefore I am using the gower distance. In the next step I use this distance with hclust(x, method="complete") to create clusters based on this distance.
Now I want to know how I can put different weights on variables in the gower distance.
The documentation says:
daisy(x, metric = c("euclidean", "manhattan", "gower"), stand = FALSE, type = list(), weights = rep.int(1, p))
So there is a way, but I am unsure about the syntax (weights = ...).
The documentation of weights and rep.int did not help.
I also didn't find any other helpful explanation.
I would be very glad if someone could help out.
Not sure if this is what you are getting at, but...
Let's say you have 5 variables, e.g. 5 columns in your data frame or matrix. Then weights would be a vector of length=5 containing the weights for the corresponding columns.
The notation weights=rep.int(1,p) in the documentation simply means that the default value of weights is a vector of length p that has all 1's, eg. the weights are all equal to 1. Elsewhere in the documentation it explains that p is the number of columns.
Also, note that daisy(...) produces a dissimilarity matrix. This is what you use in hclust(...). So if x is a data frame or matrix with five columns for your variables, then:
d <- daisy(x, metric="gower", weights=c(1,2,3,4,5))
hc <- hclust(d, method="complete")
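To see what the weights do to the distances themselves, here is a small sketch with a made-up three-row data frame (the column names num and cat are hypothetical). With Gower's formula each pairwise distance is a weighted average of the per-variable differences, so raising a variable's weight pulls the distance towards that variable's contribution:

```r
library(cluster)

# Hypothetical data: one numeric and one nominal variable
x <- data.frame(num = c(1, 2, 10),
                cat = factor(c("a", "b", "a")))

d_equal    <- daisy(x, metric = "gower")                     # default weights 1, 1
d_weighted <- daisy(x, metric = "gower", weights = c(3, 1))  # num counts 3 times as much

print(round(as.matrix(d_equal), 3))
print(round(as.matrix(d_weighted), 3))
```

For rows 1 and 3, for instance, the numeric variable differs by its full range (contribution 1) and the nominal variable matches (contribution 0), so the distance moves from (1 + 0)/2 = 0.5 with equal weights to (3*1 + 1*0)/4 = 0.75 with weights 3 and 1.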
EDIT (Response to OP's comments)
The code below shows how the clustering depends on the weights.
clust.anal <- function(df,w,h) {
  library(cluster)
  d <- daisy(df, metric="gower", weights=w)
  hc <- hclust(d, method="complete")
  clust <- cutree(hc,h=h)
  # use the arguments w and h rather than the global wts and a hard-coded height
  plot(hc, sub=paste("weights=",paste(w,collapse=",")))
  rect.hclust(hc,h=h,border="red")
  invisible(clust)
}
df <- read.table("ExampleClusterData.csv", sep=";",header=T)
df[1] <- factor(df[[1]])
df[2] <- factor(df[[2]])
# weights increase with col number...
wts=c(1,2,3,4,5,6,7)
clust.anal(df,wts,h=0.8)
# weights decrease with col number...
wts=c(7,6,5,4,3,2,1)
clust.anal(df,wts,h=0.8)
