Normalizing a Phylogenetic Tree in R

When working with phylogenetic tree data in R (specifically with "phylo" or "phylo4" objects), it would be useful to normalize branch lengths so that certain taxa (the ones that evolve faster) do not contribute a disproportionate amount of branch length to the tree. This seems to be common in computing UniFrac values, as discussed here: http://bmf2.colorado.edu/unifrac/help.psp. (I need more than just UniFrac values, however.)
However, I cannot find a function that performs this normalization step. I have looked in ape, picante, adephylo, and phylobase. Could someone direct me to a package that includes this function, or to a package that makes writing this kind of function straightforward?

Are you looking for a function that just scales the branch lengths of a tree? If so, compute.brlen() in ape will do it. There are built-in options for Grafen's rho and for setting all branch lengths to 1, and you can also supply your own function.
I don't know whether UniFrac does some other kind of branch-length scaling, but if so, you could write your own function and pass it in.
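For illustration, a minimal sketch (the tree is random toy data, and tr_norm shows one possible custom normalization, not necessarily the one UniFrac uses):

    library(ape)

    # A toy tree; rtree() just generates a random example
    set.seed(1)
    tr <- rtree(10)

    # Built-in rescalings
    tr_grafen <- compute.brlen(tr, method = "Grafen")  # Grafen's method
    tr_ones   <- compute.brlen(tr, 1)                  # all branch lengths set to 1

    # A custom normalization: rescale so the branch lengths sum to 1
    tr_norm <- tr
    tr_norm$edge.length <- tr$edge.length / sum(tr$edge.length)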

Related

How to add zoom option for wordcloud in Shiny (with reproducible example)

Could you please help me add a zoom option to the word cloud?
A reproducible example can be found here:
http://shiny.rstudio.com/gallery/word-cloud.html
I tried to incorporate rbokeh and plotly, but I couldn't find an equivalent render function for word clouds.
Additionally, I found ECharts on GitHub:
https://github.com/XD-DENG/ECharts2Shiny/tree/8ac690a8039abc2334ec06f394ba97498b518e81
But incorporating ECharts is also not convenient for real zooming.
Thanks in advance,
Abi
Normalisation is required only if the predictors are not meant to be comparable on their original scales. There's no rule that says you must normalise.
PCA is a statistical method that gives you a new linear transformation. By itself, it loses nothing: all it does is give you new principal components.
You lose information only if you keep just a subset of those principal components.
Usually PCA includes centering the data as a preprocessing step.
PCA only re-expresses the data in its own axis system, the one spanned by the eigenvectors.
If you use all the axes, you lose no information.
Yet usually we want to apply dimensionality reduction, i.e. to describe the data with fewer coordinates.
This means projecting the data onto the subspace spanned by only some of the eigenvectors of the data.
If one chooses the number of vectors wisely, one can end up with a significant reduction in the number of dimensions with negligible loss of data/information.
The way to do so is to choose the eigenvectors whose eigenvalues sum to most of the data's power (variance).
PCA itself is invertible, so it is lossless.
But:
It is common to drop some components, which causes a loss of information.
Numerical issues may cause a loss of precision.
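A short R sketch of these points, using prcomp() on an arbitrary built-in data set:

    # mtcars is just an arbitrary numeric example
    X <- scale(as.matrix(mtcars))        # centre and standardize first
    p <- prcomp(X, center = FALSE, scale. = FALSE)

    # Reconstructing from ALL components recovers X up to numerical error
    X_full <- p$x %*% t(p$rotation)
    max(abs(X - X_full))                 # ~ 1e-15: lossless

    # Keeping only the first k components drops the remaining variance
    k <- 2
    X_k <- p$x[, 1:k] %*% t(p$rotation[, 1:k])
    sum(p$sdev[1:k]^2) / sum(p$sdev^2)   # proportion of variance retained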

Determine number of factors in EFA (R) using Comparison Data

I am looking for ways to determine the optimal number of factors for R's factanal function. The most commonly used method (conduct a PCA and use the scree plot to determine the number of factors) is already known to me. I have found a method described here to be easier for non-technical folks like me. Unfortunately, the R script in which the method was implemented is no longer accessible. I was wondering if there is a package available in R that does the same?
The method was originally proposed in this study: Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure.
According to the author, the R code has now moved here.
EFA.dimensions is also a nice and easy-to-use package for that.
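I can't vouch for the exact interface of EFA.dimensions from memory, but if a simulation-based criterion is acceptable, parallel analysis in the psych package is close in spirit to the comparison-data approach; a minimal sketch:

    library(psych)

    # Parallel analysis on the built-in Harman74.cor correlation matrix
    # (24 psychological tests, n = 145); it compares observed eigenvalues
    # with those of simulated random data and suggests a number of factors
    fa.parallel(Harman74.cor$cov, n.obs = 145, fm = "minres", fa = "fa")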

In the R's Kohonen package for self organizing maps, what does the `codes` value of the som object represent?

I'm generating a self-organizing map in R using the kohonen package. However, when looking at the documentation, I cannot find a clear understanding of what the codes property of the som object represents.
The documentation only states:
codes: a matrix of code vectors.
What is a matrix of code vectors in this context?
If it works like other SOM packages, I believe the codes value you mention refers to codebook vectors. Here's a good resource that explains how they work:
The codebook vectors themselves represent prototypes (points) within
the domain, whereas the topological structure imposes an ordering
between the vectors during the training process.
From http://www.cleveralgorithms.com/nature-inspired/neural/som.html
I would recommend reading the original paper that accompanied the kohonen package, which you can find here: https://www.jstatsoft.org/article/view/v021i05/v21i05.pdf
It provides quite a bit more detail than the R-docs.
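To make this concrete, a small sketch (iris is just a convenient numeric example; the layout of 'codes' has varied across kohonen versions, so the list check below is a defensive assumption):

    library(kohonen)

    # Fit a small SOM on scaled numeric data
    X <- scale(as.matrix(iris[, 1:4]))
    set.seed(1)
    m <- som(X, grid = somgrid(4, 4, "hexagonal"))

    # The codebook vectors: one prototype per map unit, living in data space.
    # In recent kohonen versions 'codes' is a list with one matrix per layer.
    codes <- m$codes
    if (is.list(codes)) codes <- codes[[1]]
    dim(codes)               # 16 units x 4 variables
    plot(m, type = "codes")  # draw the codebook vector of each unit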

Determining the structure of an 'MCA' object in R and calculating 2d coordinate distances

I have an MCA object generated by the MCA function in the missMDA package, which returns several types of results from Multiple Correspondence Analysis. Of these, I want to use the dist function, if appropriate, to calculate all pairwise 2d distances among the coordinates. Before I can do that, I need to figure out how to reference the vectors of X and Y coordinates from this object, but when I ask for mydata$var$coord I get an unruly list of values, and I'm not sure how to convert the results into a format that the dist function can use.
I am also interested in learning how to understand the structure of different kinds of objects in general, so that I will have a clearer roadmap for referencing their components in the future (and don't have to come groveling back to all of you seeking help with that!).
My apologies if I haven't stated my question clearly enough. Thanks in advance!
Figured out what to do, which is (apparently) R 101:
names(mydata)
This gave me the appropriate information about the components of this object (a list of class "MCA"). From this, I was able to reference the components of interest:
mydata$ind$coord
mydata$var$coord
I guess it pays to be patient!
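For the distance part of the question, a sketch (assuming mydata is the MCA result as above; str() is also handy for inspecting unfamiliar objects):

    # Overview of the object's components
    str(mydata, max.level = 2)

    # Pairwise Euclidean distances among individuals in the first two dimensions
    coords2d <- mydata$ind$coord[, 1:2]
    d <- dist(coords2d)
    round(as.matrix(d)[1:5, 1:5], 3)     # peek at one corner of the matrix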

decision trees with forced structure

I have been using decision trees (CART) in R using the rpart package to look at the relationship between SST (predictor variables) and climate (predictand variable).
I would like to "force" the tree into a particular structure - i.e. split on predictor variable 1, then on variable 2.
I've been using R for a while, so I thought I'd be able to look at the code behind the rpart function and modify it to search for 'best splits' in a particular predictor variable first. However, the rpart function calls C routines, and not having any experience with C, I get lost there...
I could write a function from scratch but would like to avoid it if possible! So my questions are:
Is there another decision tree technique (implemented in R
preferably) in which you can force the structure of the tree?
If not - is there some way I could convert the C code to R?
Any other ideas?
Thanks in advance, and help is much appreciated.
When your data indicate a tree with a known structure, you can present that structure to R in either Newick or NEXUS file format, and then read it in using read.tree or read.nexus from the ape package (which creates "phylo" objects).
Maybe you should look at the method formal parameter of rpart.
From the documentation:
... ‘method’ can be a list of functions named ‘init’, ‘split’ and ‘eval’. Examples are given in the file ‘tests/usersplits.R’ in the sources.
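Writing the 'init'/'split'/'eval' functions takes some care. If all you need is to force the order of the splitting variables, a cruder but simpler alternative is to grow the tree in stages, restricting which predictor rpart sees at each stage. A sketch with made-up data (variable names are illustrative):

    library(rpart)

    # Toy data; x1 and x2 stand in for the SST predictors
    set.seed(1)
    df <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
    df$y <- with(df, ifelse(x1 > 0, 2, -2) + x2 + rnorm(200, sd = 0.5))

    # Step 1: force the first split onto x1 by offering rpart only x1
    top <- rpart(y ~ x1, data = df, maxdepth = 1)

    # Step 2: grow subtrees on x2 within each child node
    # (node 2 = left child, node 3 = right child of the root)
    node  <- as.integer(rownames(top$frame))[top$where]
    left  <- rpart(y ~ x2, data = df[node == 2, ])
    right <- rpart(y ~ x2, data = df[node == 3, ])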
