How to customize clustering in mclust? - r

I am trying to cluster some data with the mclust package in R.
Here are the steps I have taken so far.
Reading the data:
mydata <- read.table("\Users......", row.names= 1, sep = "\t", header = TRUE)
Using mclust:
library(mclust)
mydataModel <- Mclust(mydata)
summary(mydataModel)
This splits the data into 7 clusters. However, I want the data split into only 2 clusters. How can I do that?

As mentioned by MrFlick, you should read the documentation by putting a ? in front of the function name.
In your case, run ?Mclust in your R console to see the arguments and their default values.
The usage section of the help page shows:
Mclust(data, G = NULL, modelNames = NULL,
prior = NULL,
control = emControl(),
initialization = NULL,
warn = mclust.options("warn"), ...)
All you need to do is set G, the number of mixture components (clusters):
Mclust(mydata, G = 2)
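As a quick check of that call, here is a minimal sketch on R's built-in iris measurements (an assumption for illustration, since the asker's mydata is not available). G restricts the number of mixture components, and the fitted object reports the selected number back in its G element.

```r
library(mclust)

# Built-in iris measurements stand in for the asker's data (illustrative assumption).
x <- iris[, 1:4]

# G = 2 restricts the search to two-component models;
# Mclust still picks the covariance structure by BIC.
fit <- Mclust(x, G = 2)

summary(fit)                  # model name and cluster sizes
fit$G                         # number of components in the selected model
table(fit$classification)     # how many observations fall in each cluster
```

G also accepts a vector such as G = 2:4 if you want BIC to choose among a few candidate cluster counts rather than fixing a single one.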

Related

Consistent error message while running grouping analysis in 'plspm' package

I am looking for help resolving an error in the partial least squares path modeling package ('plspm').
A basic PLS-PM analysis runs fine, but the grouping function fails with this error message:
Error in if (w_dif < specs$tol || iter == specs$maxiter) break : missing value where TRUE/FALSE needed
I have no missing values and all variables have the proper class. Elsewhere I read that observations with exactly the same values across all variables can cause problems; I have deleted those and still face the issue. The error also seems to occur only when I run the groups with the "bootstrap" method.
library(plspm)
library(dplyr)  # provides %>% and slice()
farmwood = read.csv("farmwood_groups(distance).csv", header = TRUE) %>%
slice(-c(119:123))
Control = c(0,0,0,0,0,0)
Normative = c(0,0,0,0,0,0)
B_beliefs = c(0,0,0,0,0,0)
P_control = c(1,0,0,0,0,0)
S_norm = c(0,1,0,0,0,0)
Behavior = c(0,0,1,1,1,0)
farmwood_path = rbind(Control, Normative, B_beliefs, P_control, S_norm, Behavior)
colnames(farmwood_path) = rownames(farmwood_path)
farmwood_blocks = list(14:18,20:23,8:13,24:27,19,4:7)
farmwood_modes = rep("A", 6)
farmwood_pls = plspm(farmwood, farmwood_path, farmwood_blocks, modes = farmwood_modes)
names(farmwood)[names(farmwood) == "QB3"] <- "Distance"
farmwood$Distance <- as.factor(farmwood$Distance)
distance_boot = plspm.groups(farmwood_pls, farmwood$Distance, method = "bootstrap")
distance_perm = plspm.groups(farmwood_pls, farmwood$Distance, method = "permutation")
The data is contained here:
https://www.dropbox.com/s/8vewuupywpi1jkt/farmwood_groups%28distance%29.csv?dl=0
Any help would be appreciated. Thank you in advance

How to solve unused argument error as part of a meltAssay?

When using meltAssay to convert a SummarizedExperiment object (the central data infrastructure for microbiome analysis in Bioconductor) into a long data.frame, I use the following lines, as instructed in the book:
tse <- transformSamples(tse, method="relabundance")
molten_tse <- meltAssay(tse,
add_row_data = TRUE,
add_col_data = TRUE,
assay_name = "relabundance")
molten_tse
However, I get the following error:
Error in .melt_assay(x, abund_values, feature_name, sample_name, ...)
: unused argument (assay_name = "relabundance")
Use the following syntax instead, with abund_values in place of assay_name (there seems to be an error in the book):
molten_tse <- meltAssay(tse,
add_row_data = TRUE,
add_col_data = TRUE,
abund_values = "relabundance")

What is the solution for the “Labels in tree and data differ” and “object of type 'closure' is not subsettable” errors in modelTest?

I’m trying to find the best evolutionary model for my protein alignment (DNA translated to protein) with the phangorn and ape packages. The code I have used so far gives errors. I tried converting the .fasta file to .phy and .nex, installing other versions of R (4.1.1, 4.1.2), and running the modelTest code from the tutorials, but none of that worked. My first question is why the labels would differ when the tree is created from the given file. As for the second error, I did not use square brackets, yet R treats the call as subsetting.
library(ape)
library(phangorn)
file="C:/Users/ItCenter/Desktop/n.fasta"
Dat=read.phyDat(file,format="fasta",type="AA")
mt = modelTest(Dat, tree = NULL, model = c("WAG", "JTT", "LG", "Dayhoff", "cpREV", "mtmam", "mtArt", "MtZoa", "mtREV24", "VT", "RtREV", "HIVw", "HIVb", "FLU", "Blosum62", "Dayhoff_DCMut", "JTT_DCMut"), G = TRUE, I = TRUE, FREQ = FALSE, k = 4, control = pml.control(epsilon = 1e-08, maxit = 10, trace = 1), multicore = FALSE, mc.cores = NULL)
Error in modelTest(Dat, model = c("WAG", "JTT", "LG", "Dayhoff",
"cpREV", : Labels in tree and data differ!
modelTest(Dat,phyDat,NULL)
Error: object of type 'closure' is not subsettable
There was a bug in modelTest, which I seem to have introduced recently, that occurred when no tree was supplied. It is now fixed in the development version, so it is best to install that version:
remotes::install_github("KlausVigo/phangorn")
and then
modelTest(Dat)
should work. Alternatively, compute a tree from the same data beforehand, e.g.
tree <- NJ(dist.ml(Dat, "LG"))
modelTest(Dat, tree)
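When a tree is supplied, its tip labels have to match the taxon names in the alignment. The sketch below shows one way to check that before calling modelTest. It uses the Laurasiatherian example alignment shipped with phangorn as a stand-in for the asker's Dat (an assumption for illustration), and the variable names dat and labels_match are hypothetical.

```r
library(ape)
library(phangorn)

# Example alignment shipped with phangorn; stands in for the asker's Dat (assumption).
data(Laurasiatherian)
dat <- Laurasiatherian

# Neighbor-joining tree built from the same alignment.
tree <- NJ(dist.ml(dat))

# Taxa present in one object but not the other; both results should be empty.
setdiff(tree$tip.label, names(dat))
setdiff(names(dat), tree$tip.label)

# TRUE when tree and data carry the same set of labels.
labels_match <- setequal(tree$tip.label, names(dat))
labels_match
```

If either setdiff call returns names, the tree was most likely built from a different file or a differently trimmed alignment than the one passed to modelTest.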

arulesViz subscript out of bounds paracoord

I want to perform a basket analysis and draw a paracoord (parallel coordinates) plot, but I receive an error.
The error is:
Error in m[j, i] : subscript out of bounds
In addition: Warning message:
In cbind(pl, pr) :
number of rows of result is not a multiple of vector length (arg 2)
I am using data from: Link.
First I transform the data into basket format; the original Excel file is named Online_Retail:
library(arules)
library(arulesViz)
library(plyr)
items <- ddply(Online_Retail, c("CustomerID", "InvoiceDate"), function(df1)paste(df1$Description, collapse = ","))
items1 <- items["V1"]
write.csv(items1, "groceries1.csv", quote=FALSE, row.names = FALSE, col.names = FALSE)
trans1 <- read.transactions("groceries1.csv", format = "basket", sep=",",skip=1)
To draw the paracoord plot I wrote this code:
rules.trans2<-apriori(data=trans1, parameter=list(supp=0.001,conf = 0.05),
appearance=list(default="rhs", lhs="ROSES REGENCY TEACUP AND SAUCER"), control=list(verbose=F))
sorted.plot <- sort(rules.trans2, by="support", decreasing = TRUE)
plot(sorted.plot, method="paracoord", control=list(reorder=TRUE, verbose = TRUE))
Why is my paracoord code not working, and what should I change to fix it?
This is, unfortunately, a bug in arulesViz. This will be fixed in the next release (arulesViz 1.3-3). The fix is already available in the development version on GitHub: https://github.com/mhahsler/arulesViz

R Network Package - node attributes

I've recently started to learn R, with specific interest in network modeling applications. I've made a sample dataset and would like to visualize it, eventually getting to rigorous statistical network analysis.
The example is a high school friendship network. The node attributes are found in HS1_Node_Attributes.csv and the adjacency matrix is found in HS1_adjacency_matrix. I'm able to visualize the network, though I'm having trouble with node attributes (characteristics of the people). I'm using the network package.
The error I get is as follows:
Error in set.vertex.attribute(g, vertex.attrnames[[i]], vertex.attr[[i]]) :
Inappropriate value given in set.vertex.attribute.
I have cross-referenced my example with some online tutorials and with the network package documentation. One potential problem was the type of my attribute data frame, but I confirmed it is of type list, which checks out, so I'm not sure what is wrong. Everything works fine (meaning I can successfully create a network object) if I leave out the node attributes (the vertex.attr and vertex.attrnames arguments), which suggests the rest of the code is sound. My code is below.
high_school1_attributes <- read.table("HS1_Node_Attributes.csv", header = TRUE,
sep = ",")
high_school1_adj <- read.table("HS1_adjacency_matrix.csv", header = TRUE,
row.names = 1, sep = ",")
adj1 <- as.matrix(high_school1_adj)
library("network")
high_school1_network <- network(adj1, vertex.attr = high_school1_attributes,
vertex.attrnames = colnames(high_school1_attributes),
directed = FALSE, hyper = FALSE, loops = FALSE,
multiple = FALSE, bipartite = FALSE)
You could be automatically converting strings to factors (read.table and read.csv do this by default in R versions before 4.0.0). I can recreate the error by doing:
high_school1_attributes <- read.csv(text=
"Name,Color
Kermit,green
Piggy,pink
Gonzo,blue")
high_school1_adj <- read.csv(text=
",From,To
1,1,3
2,3,2
3,2,1",
row.names = 1)
adj1 <- as.matrix(high_school1_adj)
library("network")
high_school1_network <- network(
adj1,
vertex.attr = high_school1_attributes,
vertex.attrnames = colnames(high_school1_attributes),
directed = FALSE, hyper = FALSE, loops = FALSE,
multiple = FALSE, bipartite = FALSE)
And can fix it by replacing the first statement with:
high_school1_attributes <- read.csv(text=
"Name,Color
Kermit,green
Piggy,pink
Gonzo,blue",
stringsAsFactors=FALSE)
Which you can see works by plotting:
library(igraph)
library(intergraph)
hs_graph <- asIgraph(high_school1_network)
plot(hs_graph, vertex.size=8,
vertex.color=V(hs_graph)$Color,
vertex.label=V(hs_graph)$Name,
edge.arrow.size=0.25,layout=layout.fruchterman.reingold)
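If the attribute table has already been read in with factor columns, an alternative to re-reading the file is to convert those columns in place. This is a minimal base-R sketch, not part of the original answer; the data frame name attrs is hypothetical, and stringsAsFactors = TRUE is set explicitly only to mimic the pre-4.0 default.

```r
# Small attribute table read with factors forced on, to mimic pre-4.0 defaults.
attrs <- read.csv(text = "Name,Color
Kermit,green
Piggy,pink
Gonzo,blue", stringsAsFactors = TRUE)

sapply(attrs, class)   # both columns are factors here

# Convert every factor column to character, leaving other columns untouched.
attrs[] <- lapply(attrs, function(col) {
  if (is.factor(col)) as.character(col) else col
})

sapply(attrs, class)   # both columns are now character
```

Assigning into attrs[] keeps the object a data frame with its original column names, so it can be passed to vertex.attr in the same way as before.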

Resources