Phylocom non-ultrametric tree vs ultrametric tree (R)

Again.
I need help one more time.
I am currently getting into the PHYLOCOM software, which infers characteristics of a phylogeny from different samples. It lets you test whether the species in a sample show phylogenetic clustering or overdispersion relative to the other populations in your analysis.
As input files, you need a phylogenetic tree in Newick format and a sample file (.txt).
I ran two tests. In the first, I modified my tree in R with the 'ape' package like this:
compute.brlen(tree, main=expression(rho==10))
In the second, I rescaled the existing branch lengths of the tree directly:
tree$edge.length = tree$edge.length * 10
The first modification produces an ultrametric tree, while the second produces a non-ultrametric tree. If I then run PHYLOCOM itself with
phylocom comstruct
I get different results, not only in the values of the parameters but also in the significance (p-values).
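A quick way to see why the second test stays non-ultrametric: multiplying every branch by the same constant (as tree$edge.length * 10 does) multiplies every root-to-tip distance by that constant, so an ultrametric tree stays ultrametric and a non-ultrametric one stays non-ultrametric. Below is a small Python sketch (not ape code) using a made-up three-tip tree stored as hypothetical (parent, child, length) triples; in R, ape's is.ultrametric() performs this check.

```python
def root_to_tip_depths(edges):
    """Distance from the root to every tip of a rooted tree.

    `edges` is a list of (parent, child, branch_length) triples.
    """
    children = {}
    child_nodes = set()
    for parent, child, length in edges:
        children.setdefault(parent, []).append((child, length))
        child_nodes.add(child)
    # The root is the only node that is a parent but never a child.
    (root,) = set(children) - child_nodes
    depths = {}
    stack = [(root, 0.0)]
    while stack:
        node, depth = stack.pop()
        if node not in children:  # no descendants: this is a tip
            depths[node] = depth
            continue
        for child, length in children[node]:
            stack.append((child, depth + length))
    return depths


def is_ultrametric(edges, tol=1e-9):
    """True if all root-to-tip distances are equal (within `tol`)."""
    depths = root_to_tip_depths(edges).values()
    return max(depths) - min(depths) <= tol


def scale_branches(edges, factor):
    """Multiply every branch length by `factor`, like tree$edge.length * 10."""
    return [(p, c, length * factor) for p, c, length in edges]


# Hypothetical non-ultrametric tree ((A:1,B:2):1,C:3) as edge triples.
edges = [("root", "n1", 1.0), ("n1", "A", 1.0), ("n1", "B", 2.0), ("root", "C", 3.0)]
print(root_to_tip_depths(edges))                  # A at depth 2.0, B and C at 3.0
print(is_ultrametric(edges))                      # False
print(is_ultrametric(scale_branches(edges, 10)))  # False: scaling cannot fix it
```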
My question is whether anyone knows how I should run PHYLOCOM to do these 'comstruct' analyses correctly, with an ultrametric or a non-ultrametric tree as input, and what the differences are between running it one way or the other.
I know this is not a 'classical' question for Stack Overflow, but maybe someone who works with phylogenies could help me.
Thanks a lot.

I think I may be able to help, but unfortunately I cannot add comments to ask for more information, so I will have to infer what you mean from the information given. I apologize if it doesn't help.
First, you may want to consult the help for compute.brlen(), as there is no "main" argument in this function. I think you have taken it from the example in the help file, but note that there it is passed to the plot() function, outside compute.brlen(), where it just gives your plot a title.
To change the rho value in compute.brlen() you need to change the power argument.
For example:
compute.brlen(tree, power = 10)
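This may be why you are getting different results for the different trees: because "main" is ignored, your compute.brlen() call used the default power = 1 instead of rho = 10, so it replaced your branch lengths with Grafen's lengths for rho = 1 rather than applying the transformation you intended.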
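For reference, ape's help page describes the default method = "Grafen" roughly as follows: each node gets a height equal to its number of descendant tips minus one, heights are scaled so the root is 1 and then raised to the power rho, and each branch length is the parent's height minus the child's. The Python sketch below re-implements that description on a made-up three-tip topology (it is not ape's code, and the dict layout is just for illustration). It also shows why the compute.brlen() tree comes out ultrametric: every root-to-tip path runs from height 1 down to 0, so it always sums to 1 whatever power is used, and the original branch lengths are discarded.

```python
def grafen_branch_lengths(children, root, power=1.0):
    """Grafen-style branch lengths for a rooted topology.

    `children` maps each internal node to its list of children; tips need
    no entry. Follows the description in ape's compute.brlen() help page.
    """
    n_tips = {}

    def count_tips(node):
        kids = children.get(node, [])
        n_tips[node] = 1 if not kids else sum(count_tips(k) for k in kids)
        return n_tips[node]

    count_tips(root)
    root_height = n_tips[root] - 1
    # Height = (tips below - 1), scaled so the root is 1, raised to `power`.
    height = {node: ((n - 1) / root_height) ** power for node, n in n_tips.items()}
    # Branch length = height of the parent minus height of the child.
    return {
        (parent, kid): height[parent] - height[kid]
        for parent, kids in children.items()
        for kid in kids
    }


topology = {"root": ["n1", "C"], "n1": ["A", "B"]}  # hypothetical ((A,B),C)
print(grafen_branch_lengths(topology, "root", power=1))
print(grafen_branch_lengths(topology, "root", power=10))
```

With power = 1, A and B sit 0.5 below node n1 and C is 1.0 from the root; with power = 10, node n1 moves almost down to the tips (the n1 to A branch shrinks to 0.5**10), but each root-to-tip path still sums to 1.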
I am not familiar with PHYLOCOM, so I can't help on that front. But an ultrametric and a non-ultrametric version of the same tree give different distances between the tips (the topology is the same, but the branch lengths differ), so I would not be surprised that they give different results. I should note that I am not very confident about how ultrametric and non-ultrametric trees are treated differently in the analysis, but from looking at the plotted differences I would assume this is true.
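To make the point about tip distances concrete: community-structure measures such as the mean pairwise distance (MPD) and the mean nearest taxon distance (MNTD), on which the NRI and NTI indices reported by comstruct are based, are computed from tip-to-tip path lengths. The sketch below (plain Python, not Phylocom code) computes both for one hypothetical sample on two trees with the same topology but different branch lengths; the trees, the sample and the function names are made up for illustration.

```python
from itertools import combinations


def patristic_distances(edges):
    """Tip-to-tip path lengths for a rooted tree given as
    (parent, child, branch_length) triples."""
    parent_of = {child: (parent, length) for parent, child, length in edges}
    internal = {parent for parent, _, _ in edges}
    tips = sorted(c for c in parent_of if c not in internal)

    def distance_to_ancestors(tip):
        # Distance from `tip` up to each of its ancestors (and itself).
        out, node, dist = {tip: 0.0}, tip, 0.0
        while node in parent_of:
            node, length = parent_of[node]
            dist += length
            out[node] = dist
        return out

    up = {t: distance_to_ancestors(t) for t in tips}
    dist = {}
    for a, b in combinations(tips, 2):
        shared = up[a].keys() & up[b].keys()
        # The closest shared ancestor (the MRCA) gives the shortest path.
        d = min(up[a][n] + up[b][n] for n in shared)
        dist[(a, b)] = dist[(b, a)] = d
    return dist


def mpd(sample, dist):
    """Mean pairwise distance among the taxa in `sample`."""
    pairs = list(combinations(sample, 2))
    return sum(dist[pair] for pair in pairs) / len(pairs)


def mntd(sample, dist):
    """Mean distance from each taxon in `sample` to its nearest neighbour in it."""
    return sum(min(dist[(a, b)] for b in sample if b != a) for a in sample) / len(sample)


# Same topology ((A,B),(C,D)), two hypothetical sets of branch lengths.
ultrametric = [("root", "n1", 1.0), ("n1", "A", 1.0), ("n1", "B", 1.0),
               ("root", "n2", 1.0), ("n2", "C", 1.0), ("n2", "D", 1.0)]
non_ultrametric = [("root", "n1", 1.0), ("n1", "A", 1.0), ("n1", "B", 4.0),
                   ("root", "n2", 1.0), ("n2", "C", 1.0), ("n2", "D", 1.0)]
sample = ("A", "B", "C")
for name, tree in [("ultrametric", ultrametric), ("non-ultrametric", non_ultrametric)]:
    d = patristic_distances(tree)
    print(name, "MPD:", mpd(sample, d), "MNTD:", mntd(sample, d))
```

The two trees give different MPD and MNTD for the same sample. Note also that multiplying every branch by a constant multiplies both measures by that constant; a standardized index that divides by the null-model standard deviation would then be unchanged for the same randomizations, so in principle rescaling by 10 alone should not change the conclusions, whereas replacing the branch lengths, as compute.brlen() does, can.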

Related

Is there something I don't know about this plot?

I am looking at this code. Earlier, v-transforms were applied and VT-ARMA copula models were fitted; now it applies a Shapiro test to the residuals and then plots 4 graphs:
https://i.stack.imgur.com/gTtBU.png
These 4 plots should come out of plot(vtcop, plotoption=3), etc. I have never used this plotoption argument. I think it belongs to the tscopula package, but I have already searched the help and read the PDF manual for the tscopula package, and there is no such "plotoption".
Can anyone tell me why I get an "unused argument" error at this point?
This code is from the paper by Alexander McNeil, "Modelling Volatile Time Series with V-Transforms and Copulas".
Thank you very much. Good day.

Is repeated-measures ANOVA what I am looking for?

I'm studying the NDVI (normalized difference vegetation index) behaviour of some soils and cultivars. My database has 33 acquisition days, 17 kinds of soil and 4 different cultivars. I have built it in two different layouts, which you can see attached. I am running into problems and errors with both layouts.
The first question is: is repeated-measures ANOVA the correct way to analyze my data? I want to see whether there are differences between the behaviours of the different cultivars and the different soils. I've run an ANOVA for each day and there are statistically significant differences on each day, but these results are not very informative overall, because I would like to investigate the behaviour over the whole year.
The second question is: how can I perform it? I've tried different tutorials, but I got unexpected errors or didn't manage to complete the analysis.
Last but not least: I'm coding in RStudio.
Any help is appreciated; I'm still new to statistics but really interested in improving!
horizontal database
vertical database
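The two layouts differ in shape: the 'horizontal' table presumably has one column per acquisition day, while the 'vertical' one has one row per observation, which is the form aov() and most ANOVA tools expect. A minimal Python sketch of converting the first into the second, with hypothetical column names (soil, cultivar, day_1, day_2) and made-up NDVI values, since the real tables are only attached as images; in R, tidyr's pivot_longer() does the same job.

```python
def wide_to_long(rows, id_cols, var_name="day", value_name="ndvi"):
    """Turn one-column-per-day rows into one-row-per-observation rows."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col not in id_cols:
                # Each day column becomes its own row, tagged with the day name.
                long_rows.append({**ids, var_name: col, value_name: value})
    return long_rows


# Made-up values; the real column names in the attached tables may differ.
horizontal = [
    {"soil": "S1", "cultivar": "CV1", "day_1": 0.41, "day_2": 0.55},
    {"soil": "S2", "cultivar": "CV1", "day_1": 0.38, "day_2": 0.52},
]
vertical = wide_to_long(horizontal, id_cols=("soil", "cultivar"))
for r in vertical:
    print(r)
```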
I believe you can use ANOVA, but as always, you have to know whether that really is what you're looking for. Either way, since this is a platform for programming questions, I'll write code that should work for the vertical version. However, since I don't have your data, I can't know for sure (for future reference, dput(data) creates easily importable code for those trying to answer you).
summary(aov(suolo ~ CV, data = data))
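For reference, the F value that summary(aov(...)) reports for a single factor is the between-group mean square divided by the within-group mean square. A from-scratch Python sketch, for example for the NDVI values of one acquisition day grouped by cultivar (made-up numbers); it treats every observation as independent, so it does not model the repeated measurements across days, which is what a repeated-measures or mixed model would add.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    `groups` maps a group label (e.g. a cultivar) to its list of values.
    Returns (F, df_between, df_within).
    """
    values = [x for vals in groups.values() for x in vals]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2 for g, vals in groups.items())
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - means[g]) ** 2 for g, vals in groups.items() for x in vals)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up NDVI values for one day, grouped by cultivar.
day_1 = {"CV1": [0.41, 0.44, 0.39], "CV2": [0.52, 0.55, 0.50]}
print(one_way_anova_f(day_1))
```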

R: Evaluate Gradient Boosting Machines (GBM) for Regression

What are the best metrics to evaluate the fit of a GBM model in R (metrics, graphs, ratios)? And how should I interpret them?
I think maybe you are overthinking this one! Take a step back and think about what matters: the error. You have forecasted values and observed values, and the difference between them tells you most of what you need to know when comparing across models. Basic measures like MSE, MPE, etc. should do fine. If you are looking to refine within a given model, I would recommend taking a look at the gbm documentation. For example, you can pass your gbm model object to summary() to get the relative influence of each of your variables. There is a lot more information in the documentation, so if you haven't looked at it yet, I would recommend doing so! I have posted the link at the bottom.
-Carmine
gbm_documentation
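The basic error measures mentioned above are easy to compute by hand from observed and predicted values. A minimal Python sketch; the function name and toy numbers are made up, and MPE here uses the common (observed - predicted) / observed convention, which is undefined when an observed value is 0.

```python
import math


def regression_metrics(observed, predicted):
    """Common error summaries for a regression model such as a GBM."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "MSE": sse / n,
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # Mean percentage error; undefined if any observed value is 0.
        "MPE": 100 * sum(e / o for e, o in zip(errors, observed)) / n,
        # 1 - SSE/SST: share of variance explained (can be negative for poor fits).
        "R2": 1 - sse / ss_total,
    }


observed = [2.0, 4.0, 6.0]
predicted = [3.0, 4.0, 5.0]
for name, value in regression_metrics(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

Comparing these numbers for the same held-out data across models is usually more informative than any single value on its own.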

LightGBM plot tree not matching feature importance

I am plotting a model from lightgbm and trying to view a tree plot. When I use plot.tree it works... however, the plotted tree does not match the feature importance, nor does it match the number of leaves I chose when optimizing my parameters.
For example, Feature A is the most important feature in my feature importance plot, but this feature does not show up as a decision node in my tree plot. Also, one of my parameters is 22 leaves, but the tree plot has 24 leaves.
I am doing this in a Databricks environment using Python.
Any ideas what is happening?
Sorry, I can't post code. Anyone with a general idea of what is happening would help.
First of all, LightGBM is a boosting ensemble method, which means it builds several trees in series.
So, which tree are you plotting? You have many trees, and exploring only one of them is not representative of how the model works as a whole, since the feature importance is computed over all the trees. If you check a few more trees, feature A will very likely appear.
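A single plotted tree can easily miss the top feature, because importance is summed over every tree. The Python sketch below counts split features tree by tree and over the whole ensemble. It assumes the nested layout returned by LightGBM's Booster.dump_model() (a 'tree_info' list whose 'tree_structure' nodes carry 'split_feature', 'left_child' and 'right_child', plus a top-level 'feature_names' list), which is worth checking against your version. The toy dump is hand-written; with a real model you would pass booster.dump_model() instead. Summing split counts like this is roughly what feature_importance(importance_type='split') reports.

```python
from collections import Counter


def split_features(node):
    """Yield the split feature index of every internal node below `node`.

    Assumes the Booster.dump_model() node layout: internal nodes have
    'split_feature', 'left_child' and 'right_child'; leaves do not.
    """
    if "split_feature" in node:
        yield node["split_feature"]
        yield from split_features(node["left_child"])
        yield from split_features(node["right_child"])


def feature_usage(model_dump):
    """Split counts (by feature name) per tree and summed over the ensemble."""
    names = model_dump["feature_names"]
    per_tree = [
        Counter(names[i] for i in split_features(tree["tree_structure"]))
        for tree in model_dump["tree_info"]
    ]
    return per_tree, sum(per_tree, Counter())


# Hand-written toy dump with two trees: tree 0 never splits on feature A.
toy_dump = {
    "feature_names": ["A", "B"],
    "tree_info": [
        {"tree_index": 0, "num_leaves": 2,
         "tree_structure": {"split_feature": 1,
                            "left_child": {"leaf_index": 0},
                            "right_child": {"leaf_index": 1}}},
        {"tree_index": 1, "num_leaves": 3,
         "tree_structure": {"split_feature": 0,
                            "left_child": {"split_feature": 0,
                                           "left_child": {"leaf_index": 0},
                                           "right_child": {"leaf_index": 1}},
                            "right_child": {"leaf_index": 2}}},
    ],
}
per_tree, total = feature_usage(toy_dump)
print(per_tree[0])  # only B appears in tree 0
print(total)        # overall, A is used more often than B
```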
About the different num_leaves, I don't have a clear answer; it doesn't make sense to me. I would need some code and output to analyze it (but I saw in your comment that you can't provide it, don't worry). In theory, no tree should have more than 22 leaves if you specified that value... Anyway, you can try another hyperparameter, max_depth, which is quite similar and may work even better.
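To check the leaf count directly rather than from the plot, you can count the leaves in the same dump-style structure, under the same assumption about the Booster.dump_model() layout. The sketch also encodes the bound behind the max_depth suggestion: a binary tree of depth d has at most 2**d leaves. The function names and the toy tree are hypothetical.

```python
def count_leaves(node):
    """Leaves under `node`; a leaf is any node without a 'split_feature' key."""
    if "split_feature" not in node:
        return 1
    return count_leaves(node["left_child"]) + count_leaves(node["right_child"])


def trees_over_limit(model_dump, num_leaves):
    """Return (tree_index, leaf_count) for trees with more than `num_leaves` leaves."""
    offenders = []
    for tree in model_dump["tree_info"]:
        n = count_leaves(tree["tree_structure"])
        if n > num_leaves:
            offenders.append((tree["tree_index"], n))
    return offenders


def max_leaves_for_depth(max_depth):
    """Upper bound on the leaves of a binary tree limited to `max_depth` levels."""
    return 2 ** max_depth


# Hand-written toy dump with one three-leaf tree.
toy_dump = {"tree_info": [
    {"tree_index": 0, "num_leaves": 3,
     "tree_structure": {"split_feature": 0,
                        "left_child": {"split_feature": 1,
                                       "left_child": {"leaf_index": 0},
                                       "right_child": {"leaf_index": 1}},
                        "right_child": {"leaf_index": 2}}},
]}
print(trees_over_limit(toy_dump, num_leaves=2))   # [(0, 3)]
print(trees_over_limit(toy_dump, num_leaves=22))  # []
print(max_leaves_for_depth(5))                    # 32
```

For example, max_depth = 5 allows at most 32 leaves per tree, so num_leaves = 22 would still be the tighter limit; with max_depth = 4 the cap would be 16.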

Extract sample of features used to build each tree in H2O

In a GBM model, the following parameters are used:
col_sample_rate
col_sample_rate_per_tree
col_sample_rate_change_per_level
I understand how the sampling works and how many variables get considered for splitting at each level of every tree. I am trying to understand how many times each feature gets considered for making a decision. Is there a way to easily extract, from the model object, the full sample of features considered for each splitting decision?
Referring to the explanation provided by H2O, http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/col_sample_rate.html, is there a way to know which 60 features were randomly chosen for each split?
Thank you for your help!
If you want to see which features were used at a given split in a given tree, you can navigate the H2OTree object.
For R, see the documentation here and here.
For Python, see the documentation here.
You can also take a look at this blog post (if the link ever dies, just do a Google search for the H2OTree class).
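Once you have the split features of each tree, counting how often each one was used is straightforward. A minimal Python sketch, assuming each tree is given as a list with one entry per node and a non-string placeholder (such as None or NaN) for leaves, which is how I would expect H2OTree's features attribute to look; the H2O lines are commented out and untested, and gbm and n_trees are placeholder names. Note that this counts the feature actually chosen at each split, not the full random candidate set drawn through col_sample_rate.

```python
from collections import Counter


def count_split_features(trees_features):
    """Count how often each feature was chosen for a split across trees.

    `trees_features` has one list per tree, one entry per node; leaf
    entries are assumed to be non-strings (e.g. None or NaN) and skipped.
    """
    counts = Counter()
    for features in trees_features:
        counts.update(f for f in features if isinstance(f, str))
    return counts


# With a trained H2O GBM the input might be built like this (untested assumption):
# from h2o.tree import H2OTree
# trees_features = [H2OTree(model=gbm, tree_number=i).features for i in range(n_trees)]
toy = [["x1", "x2", None, None, None], ["x1", None, None]]
print(count_split_features(toy))
```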
I don't know if I would call this easy, but the MOJO tree visualizer outputs a Graphviz dot file, which is then rendered into a visualization. That file has the information you are interested in.
http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/overview-summary.html#viewing-a-mojo
