Bootstrap for phylogenetic tree generated using Mahalanobis distance (R)

I created a phylogenetic NJ tree in R using the ape package. My data contain metric measurements from multiple individuals belonging to known groups, so I decided to calculate the Mahalanobis distances between these groups in order to incorporate the covariance structure into my analyses. Creating the NJ tree was not a problem:
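library(ape)   # nj(), plot.phylo()
library(MASS)  # lda()
fit <- lda(y, grouping = as.factor(ynames))
# Euclidean distances between the discriminant scores of the group means,
# i.e. Mahalanobis distances between the groups
d <- dist(predict(fit, fit$means)$x, upper = TRUE, diag = TRUE)
plot(nj(d))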
However, I would now like to calculate bootstrap values for the branch splits. I would use the boot.phylo function, but I have no idea how to write its FUN argument so that the Mahalanobis distances are recalculated correctly for each bootstrapped data set.
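One possible route, sketched here under stated assumptions, is to skip boot.phylo and bootstrap the individuals instead: resample individuals with replacement within each group, recompute the group Mahalanobis distances for every replicate, and count how often each clade of the original tree reappears. The helper names group_mahalanobis() and boot_group_distances() are hypothetical. The distance function uses the pooled within-group covariance matrix (divisor n minus the number of groups), so its output should match the LDA-based distances above up to numerical error. Note that this is a different resampling scheme from boot.phylo, which resamples the columns (characters) of a taxa-by-characters matrix. With small groups and many variables, a bootstrap replicate can make the pooled covariance matrix singular.

```r
# Mahalanobis distances between group means, using the pooled
# within-group covariance matrix (divisor n - number of groups).
group_mahalanobis <- function(x, g) {
  x <- as.matrix(x)
  g <- as.factor(g)
  k <- nlevels(g)
  # group means: one row per group (level order), one column per variable
  means <- apply(x, 2, function(col) tapply(col, g, mean))
  # residuals of each individual from its own group mean
  resid <- x - means[as.integer(g), , drop = FALSE]
  S <- crossprod(resid) / (nrow(x) - k)
  D <- matrix(0, k, k, dimnames = list(levels(g), levels(g)))
  for (i in seq_len(k)) {
    # mahalanobis() returns squared distances, hence sqrt()
    D[i, ] <- sqrt(mahalanobis(means, center = means[i, ], cov = S))
  }
  as.dist(D)
}

# One replicate = resample individuals with replacement within each group
# (group sizes stay fixed), then recompute the group distances.
boot_group_distances <- function(x, g, B = 100) {
  x <- as.matrix(x)
  g <- as.factor(g)
  rows_by_group <- split(seq_len(nrow(x)), g)
  replicate(B, {
    idx <- unlist(lapply(rows_by_group,
                         function(r) r[sample.int(length(r), replace = TRUE)]),
                  use.names = FALSE)
    group_mahalanobis(x[idx, , drop = FALSE], g[idx])
  }, simplify = FALSE)
}
```

With ape loaded, the replicates can then be turned into trees and compared with the original tree, for example `boots <- boot_group_distances(y, ynames, B = 100)`, `trees <- lapply(boots, nj)`, `class(trees) <- "multiPhylo"`, and `prop.clades(nj(group_mahalanobis(y, ynames)), trees)`, which counts, for each node of the original tree, the replicates containing that clade (divide by B for proportions).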

Related

Multiple weights matrices in an SLX model in R

Are there any packages or commands that allow multiple spatial weights matrices in a spatially lagged X (SLX) model?
I want to include two different weights matrices with one dependent variable, but I cannot find any package for this.
Theoretically, is it inappropriate in spatial analysis to include multiple W matrices? If it is possible, how can I conduct the analysis with W1 and W2? Do I have to do it by hand? (I mean: first create the lagged variables by multiplying each W matrix by the key DV, and then run an OLS regression with those variables. Is that the right way to apply multiple weights matrices?)
Thanks!
Dongjin
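For an SLX model specifically, the by-hand approach can be done with plain lm(): the spatially lagged regressors W1 x and W2 x are functions of exogenous variables only, so ordinary least squares applies. (Lagging the dependent variable instead gives a spatial lag, or SAR, model, for which OLS is not consistent.) The sketch below is a minimal illustration under made-up assumptions: random locations, an inverse-distance W1 and a 5-nearest-neighbour W2, both row-standardised, and simulated coefficients. The helpers row_standardise() and fit_slx2() are hypothetical names. In real data it is worth checking how correlated the two lagged terms are, since similar W matrices yield nearly collinear regressors.

```r
# Row-standardise a non-negative weights matrix so each row sums to 1
row_standardise <- function(W) W / rowSums(W)

# SLX with two weights matrices:
#   y = b0 + b1 * x + g1 * (W1 %*% x) + g2 * (W2 %*% x) + e
# The lagged terms depend only on x, so OLS via lm() is appropriate.
fit_slx2 <- function(y, x, W1, W2) {
  dat <- data.frame(y = y, x = x,
                    W1x = as.vector(W1 %*% x),
                    W2x = as.vector(W2 %*% x))
  lm(y ~ x + W1x + W2x, data = dat)
}

# Simulated example with two different (hypothetical) W matrices
set.seed(42)
n <- 200
coords <- cbind(runif(n), runif(n))
D <- as.matrix(dist(coords))

W1 <- 1 / D                 # inverse-distance weights
diag(W1) <- 0               # no self-neighbours
W1 <- row_standardise(W1)

# 5 nearest neighbours (rank 1 is the point itself, at distance 0)
W2 <- t(apply(D, 1, function(d) as.numeric(rank(d) %in% 2:6)))
W2 <- row_standardise(W2)

x <- rnorm(n)
y <- 1 + 2 * x + 1.5 * as.vector(W1 %*% x) - 1 * as.vector(W2 %*% x) +
  rnorm(n, sd = 0.01)

fit <- fit_slx2(y, x, W1, W2)
coef(fit)
```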

How to do hierarchical clustering on an ordinal dataset in R?

I am trying to do hierarchical clustering on a dataset whose columns are ordinal, on a scale of 1 to 5.
I know hierarchical clustering can be done with the hclust() function.
For analysing ordinal data, we should use the "maximum" (Chebyshev) distance.
But which linkage should I use with the Chebyshev distance, given that several linkage methods assume squared Euclidean distances? For example, Ward, centroid and median linkage use squared Euclidean distance.
Linkage methods: ward.D, ward.D2, single, complete, average, centroid, median.
So which linkage should I use with the Chebyshev distance to do hierarchical clustering on ordinal data?
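Single, complete and average linkage are defined purely in terms of the pairwise dissimilarities between observations, so they remain well defined for the Chebyshev distance, whereas Ward, centroid and median linkage rely on Euclidean geometry. The sketch below uses made-up 1 to 5 data and compares those three linkages by their cophenetic correlation (how well each dendrogram preserves the original distances), which is only one possible selection criterion. Note that on a 1 to 5 scale the Chebyshev distance can only take the values 0 to 4, so many pairs are tied and the merge order among ties is somewhat arbitrary.

```r
set.seed(1)
# Hypothetical ordinal data: 40 respondents x 6 items scored 1-5
x <- matrix(sample(1:5, 40 * 6, replace = TRUE), ncol = 6)

# Chebyshev distance: max |x_ik - x_jk| over the items k
d <- dist(x, method = "maximum")

# Linkages that only use pairwise dissimilarities
linkages <- c("single", "complete", "average")
fits <- lapply(linkages, function(m) hclust(d, method = m))
names(fits) <- linkages

# Cophenetic correlation per linkage (higher = original distances better preserved)
coph <- sapply(fits, function(h) cor(d, cophenetic(h)))
coph

# Cut the average-linkage tree into 3 clusters
groups <- cutree(fits[["average"]], k = 3)
table(groups)
```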

How to use bootstrapping with weighted data?

I have two variables that I'd like to analyze with a 2x2 table, which is easy enough:
datatable <- table(data$Q1data1, data$Q1data2)
summary(datatable)
However, I need to weight each variable separately, using two frequency-weighting variables that I have. So far, I've only found the wtd.chi.sq function in the weights package, which only lets you weight both variables by the same weighting variable.
In addition, I need to perform this 2x2 chi-square test 1000 times using bootstrapping or some other resampling method, so that I can eventually examine the distribution of p-values.
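In a contingency table each observation contributes a single amount to exactly one cell, so two weighting variables have to be combined into one per-row weight before tabulating; how to combine them is an analytic decision this sketch does not make. Assuming a single per-row frequency weight column `w` (hypothetical, as is the simulated data), the sketch builds the weighted 2x2 table with xtabs() and runs a nonparametric bootstrap (rows resampled with replacement) 1000 times to collect the chi-square p-values. Keep in mind that chisq.test() treats the weighted cell totals as if they were raw counts, so the scale of the weights directly affects the p-values.

```r
set.seed(123)
# Hypothetical data: two yes/no answers and one per-row frequency weight `w`
n <- 200
data <- data.frame(
  Q1data1 = factor(sample(c("yes", "no"), n, replace = TRUE)),
  Q1data2 = factor(sample(c("yes", "no"), n, replace = TRUE)),
  w = sample(1:3, n, replace = TRUE)
)

# p-value of the chi-square test on the weighted 2x2 table;
# xtabs() sums w within each cell. Small-expected-count warnings are suppressed.
chisq_p <- function(d) {
  tab <- xtabs(w ~ Q1data1 + Q1data2, data = d)
  suppressWarnings(chisq.test(tab)$p.value)
}

chisq_p(data)  # p-value for the observed data

# Nonparametric bootstrap: resample rows with replacement B times
B <- 1000
p_values <- replicate(B, chisq_p(data[sample.int(nrow(data), replace = TRUE), ]))
summary(p_values)
```

The resulting `p_values` vector can then be inspected with hist() or quantile().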

Nearest shrunken centroids with sample weights

I am trying to train a nearest shrunken centroid classifier on my data using the pamr.train() function in the pamr package for R. However, in addition to the training data, I also have a vector of sample weights. Is there any way to use this function while taking these sample weights into account?
Alternatively, is there a way to obtain the source code of this function? If so, I could write code that uses weighted means and weighted variances instead of the unweighted ones.
Thank you,
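As for the source: pamr is an R package on CRAN, so typing pamr.train without parentheses at the console prints its R code, and the package source can be downloaded from CRAN. The plan in the question (swapping in weighted means and variances) can also be sketched independently of pamr. The hypothetical weighted_nsc() below computes weighted class centroids and a weighted pooled within-class standard deviation, then applies the textbook shrunken-centroid step d = (class mean - overall mean) / (m_k (s_j + s0)) with m_k = sqrt(1/n_k - 1/n), s0 the median of the s_j, and soft-thresholding by delta. The weighting choices are assumptions, not pamr's behaviour: weights are rescaled to sum to the number of samples and then treated like frequency weights (weight totals replace class sizes and the n - K degrees-of-freedom term).

```r
# Weighted nearest shrunken centroids (hypothetical sketch, not pamr's code).
# x: samples in rows, features in columns; y: class labels;
# w: non-negative sample weights; delta: shrinkage threshold.
weighted_nsc <- function(x, y, w, delta) {
  x <- as.matrix(x)
  y <- as.factor(y)
  w <- w * length(w) / sum(w)            # assumption: rescale weights to sum to n
  n_w <- sum(w)
  nk_w <- as.vector(tapply(w, y, sum))   # weighted class "sizes"
  # weighted class centroids (classes x features) and overall centroid
  cent <- rowsum(x * w, y) / nk_w
  overall <- colSums(x * w) / n_w
  # weighted pooled within-class standard deviation of each feature
  resid <- x - cent[as.integer(y), , drop = FALSE]
  s <- sqrt(colSums(w * resid^2) / (n_w - nlevels(y)))
  s0 <- median(s)
  mk <- sqrt(1 / nk_w - 1 / n_w)
  # standardized centroid differences, then soft-thresholding by delta
  dk <- sweep(sweep(cent, 2, overall), 2, s + s0, "/") / mk
  dk_shrunk <- sign(dk) * pmax(abs(dk) - delta, 0)
  # back-transform to shrunken centroids
  shrunken <- sweep(sweep(dk_shrunk * mk, 2, s + s0, "*"), 2, overall, "+")
  list(centroids = cent, shrunken = shrunken, sd = s, s0 = s0)
}
```

With delta = 0 the shrunken centroids equal the weighted class centroids, and with equal weights those reduce to the ordinary class means; increasing delta pulls every class centroid towards the weighted overall centroid.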

Clustering Variables

What are some proven methods for finding groups of highly correlated variables within a large, high-dimensional binary dataset (think 200,000+ rows and 150+ fields) that can easily be implemented in R? I want to find groupings of variables that lend themselves to interpretation, so I don't think PCA would be the best method.
library(Hmisc)
mtc <- mtcars[,2:8]
mtcn <- data.matrix(mtc)
clust <- varclus(mtcn)
clust
plot(clust)
From ?varclus: "Does a hierarchical cluster analysis on variables, using the Hoeffding D statistic, squared Pearson or Spearman correlations, or proportion of observations for which two variables are both positive as similarity measures. Variable clustering is used for assessing collinearity, redundancy, and for separating variables into clusters that can be scored as a single variable, thus resulting in data reduction."
For binary variables:
library(cluster)
data(animals)
ma <- mona(animals)
ma
plot(ma)
From ?mona: "Returns a list representing a divisive hierarchical clustering of a dataset with binary variables only."
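A lightweight alternative that needs only base R is to compute the correlation matrix of the 0/1 columns (for two binary variables, Pearson's correlation is the phi coefficient), turn it into the dissimilarity 1 - |r|, and cluster the variables with hclust(). Because the correlation matrix is only p x p, the 200,000+ rows matter only for the single cor() call. The sketch below uses simulated data, assuming two latent binary factors that each drive three noisy copies; the helper flip() and all variable names are hypothetical.

```r
set.seed(7)
n <- 1000
# Hypothetical binary data: two latent factors, each copied with 10% random flips
f1 <- rbinom(n, 1, 0.5)
f2 <- rbinom(n, 1, 0.5)
flip <- function(v, p) ifelse(runif(length(v)) < p, 1 - v, v)
X <- cbind(a1 = flip(f1, 0.1), a2 = flip(f1, 0.1), a3 = flip(f1, 0.1),
           b1 = flip(f2, 0.1), b2 = flip(f2, 0.1), b3 = flip(f2, 0.1))

# Pearson correlation on 0/1 columns = phi coefficient
r <- cor(X)

# Cluster the variables: strongly (anti-)correlated variables are close
hc <- hclust(as.dist(1 - abs(r)), method = "average")
var_groups <- cutree(hc, k = 2)
var_groups
```

Using 1 - r instead of 1 - |r| would keep negatively correlated variables apart, which may be preferable when the sign matters for interpretation.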
