R: Associate values to tips in phylogeny

I have a phylogeny and some data (trait values). I've reconstructed trait values for all nodes using ace (from ape, which caper loads).
I used makeNodeLabel in ape to associate the reconstructed trait values with their appropriate nodes.
What I want to do is export a nexus file (phylogeny) from R that contains both the node values (reconstructed values) and the tip values (empirical data).
I want to use color codes (in FigTree) to indicate the values, but right now I can only do this for nodes, i.e. the tip branches have no data and are therefore not "color-codable".
I need to associate values with the tips to do this, but I haven't been able to figure out how. I also need all the data I associate with the phylogeny to be in the same annotation category, similar to how, for example, theta values from *BEAST are coded in nexus files.
Any and all help is greatly appreciated.

I've found a workaround:
1. Export a tree (nexus format) from R with the reconstructed traits associated with the nodes.
2. Open it in FigTree and define the "label" trait as "trait".
3. Import tip annotations from a text file that contains the empirical data in a column with the header "trait".
4. Export the tree from FigTree to a new file, with a nexus block and annotations included.
5. Lastly, copy the names from the name block in the nexus file (animal/organism[&Trait=2.35754]) and replace the names in the coded tree with them.
You will then have trait values coded as [&Trait=value] for both nodes and tips, so you can color-code the entire tree, including the empirical values that are now associated with the tips of the phylogeny.
Sooo, that's a stupid way of doing it. If anyone has a better way, I'd love to hear it.
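If you would rather skip the FigTree round trip, the same [&trait=value] result can in principle be produced by editing the tree string directly. The sketch below is Python (not R) and is only an illustration under stated assumptions: the tree has been written out as a Newick string (for example with ape's write.tree) whose internal-node labels are the reconstructed values, tip labels are unquoted and contain no spaces, and the empirical tip values are supplied as a dict. The helpers annotate_tree and to_nexus and the annotation key "trait" are hypothetical.

```python
import re

# Characters allowed in a bare (unquoted) Newick label under our assumptions:
# anything except structural characters, brackets and whitespace.
LABEL = r"[^(),:;\[\]\s]+"


def annotate_tree(newick: str, tip_values: dict, key: str = "trait") -> str:
    """Turn bare node labels and a dict of tip values into [&key=value] comments."""
    # Internal-node labels follow a closing parenthesis, e.g. ")2.1:1" -> ")[&trait=2.1]:1".
    newick = re.sub(r"(?<=\))(" + LABEL + r")",
                    lambda m: f"[&{key}={m.group(1)}]", newick)

    def tip(m: re.Match) -> str:
        label = m.group(1)
        if label in tip_values:
            # Keep the tip name and append its empirical value as a comment.
            return f"{label}[&{key}={tip_values[label]}]"
        return label  # tips without data are left unannotated

    # Tip labels follow an opening parenthesis or a comma.
    return re.sub(r"(?<=[(,])(" + LABEL + r")", tip, newick)


def to_nexus(newick: str, name: str = "tree1") -> str:
    """Wrap a Newick string (ending in ';') in a minimal NEXUS trees block."""
    return f"#NEXUS\nbegin trees;\n\ttree {name} = {newick}\nend;\n"


tree = annotate_tree("((A:1,B:2)2.1:1,C:3)1.7;", {"A": 1.5, "B": 2.0, "C": 3.25})
print(to_nexus(tree))
```

In the resulting file, nodes and tips carry the same "trait" annotation, which is what FigTree needs to color every branch from one variable. Quoted labels or labels containing spaces would need a proper Newick parser rather than these regular expressions.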

You could try the phytools package in R; its plotTree.wBars function may do what you need. Best wishes.

Related

Trait tracking using R, with pasted Excel sheet and phylogenetic tree

I wanted to create a phylogenetic tree where I can trace a certain dimension throughout clades. I followed the tutorial by Winternitz 2016, but now I've run into some problems.
Here is what I did so far:
tablename<- read.table(file("clipboard"), header=TRUE)
library(adegenet)
library(ape)
library(caper)
library(devtools)
library(geiger)
library(picante)
library(phytools)
library(stringr)
library(TreeTools)
supertree<-ReadTntTree("Pathname", tipLabels =1:36)
plot(supertree,no.margin=TRUE,edge.width=2) #to check if my tree is displayed correctly
Now the problem is that my tree (created by TNT) uses numbers to represent the taxa instead of the taxon names. In the copied table, the first column holds the number and the second holds the taxon that the number represents. Columns 3, 4 and 5 contain either measurements or NA (not available). The columns are named code (column 1), specimen (column 2), HFM (column 3), WFM (column 4) and Wpp (column 5).
My questions are now:
How can I replace the numbers in my plotted tree with their representative taxon name?
I find the commands in the PDF a bit confusing when it comes to using the table data for mapping traits. How can I connect the pasted table (the dataset) with the tree, and how do I continue from there?
Thank you in advance for reading; I look forward to an answer.
Sincerely
Edit: Following the quick comment, I've added a link to the files I can share. I hope this helps reproduce my progress so far:
https://drive.google.com/drive/folders/1CJBwCrSIkFqO6qvh0UH0yEiWtDwNpK1B?usp=sharing
Your line ReadTntTree("Pathname", tipLabels = 1:36) reads the tree, using the numbers 1..36 to label the tips. But you want the leaves to be labelled with the taxon names.
Approach 1: Specify tip labels within R
Specify the names of the tips in ReadTntTree. For example, if you know that the order of tips in the TNT tree matches the order of rows in your table, use
taxonNames <- tablename[, 2]
print(taxonNames) # Check that the names are what you expect
supertree <- ReadTntTree("Pathname", tipLabels = taxonNames)
More laboriously, specify the taxon names by hand: replace the first line with
taxonNames <- c("first_taxon", "second_taxon", <...>)
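Rather than relying on the tips and the table rows being in the same order, you can match tips to rows by the code column. The Python sketch below shows that lookup on a Newick string; it assumes the table is tab-separated text (as pasted from Excel) with the column names from the question, and that each numeric tip label equals a value in the code column. code_to_name and relabel_tips are hypothetical helpers, and the taxon names in the example are placeholders. (In R, the equivalent lookup would typically use match() of the tip labels against the code column.)

```python
import csv
import io
import re


def code_to_name(table_text: str) -> dict:
    """Map the 'code' column to the 'specimen' column of a tab-separated table."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return {row["code"].strip(): row["specimen"].strip() for row in reader}


def relabel_tips(newick: str, names: dict) -> str:
    """Replace numeric tip labels with taxon names; unknown codes are kept as-is."""
    # A tip label sits after "(" or "," and before ":", "," or ")".
    return re.sub(r"(?<=[(,])(\d+)(?=[:,)])",
                  lambda m: names.get(m.group(1), m.group(1)), newick)


table = "code\tspecimen\tHFM\tWFM\tWpp\n1\tTaxon_one\t1.2\tNA\t3\n2\tTaxon_two\tNA\t2.2\t1\n"
print(relabel_tips("(1:0.5,(2:0.1,3:0.2):0.4);", code_to_name(table)))
```

Matching by key means a reordered or partially filled table cannot silently attach a name to the wrong tip; codes with no table row simply keep their number.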
Approach 2: Specify tip labels within TNT
(Only an option if you have control over the TNT process that is generating your tree file.)
Ask TNT to save the taxon labels in the tree output, using the taxname=; tsav*; TNT command – see
https://ms609.github.io/TreeTools/articles/load-trees.html#trees-from-tnt
Read the trees into R with supertree <- ReadTntTree("Pathname")
Approach 3: Load tip labels from original matrix
This approach assumes that the TNT matrix and output file are in the same place on your computer as they were when the TNT analysis was run. As such, it is the least reproducible approach: handy for initial analysis, but less well suited to inclusion in publications.
Omit the "tipLabels" parameter entirely. Trees saved in TNT's default parenthetical notation (TNT command tsav*;, with taxname-; to omit taxon names) link to the matrix used to generate the trees, and can load taxon names from there.
If you open the tree file with a text editor you should see the path to the original matrix in the first line.
See the ReadTntTree() manual page for further details: for example, of how to use relative paths to the original matrix.

Do I need to create all nodes by hand in Neo4j?

I'm probably missing something because I'm very new to Neo4j, but looking at their Movie graph (probably the very first graph to play with when you're learning the platform), they give us a really big piece of code in which every node, label and property is entered by hand, one after another. That seems fair for a small graph built for learning purposes. But how should I proceed when I want to import a CSV and create a graph from that data? I assume entering everything by hand is not what's expected.
My data look something like this:
date,origin,destiny,value,type,balance
01-05-2021,A,B,500,transf,2500
It has more than 10 thousand rows like this.
I loaded it as:
LOAD CSV FROM "file:///MyData.csv" AS data
RETURN data;
and it worked. The data was loaded etc. But now I have some questions:
1- How do I proceed if I want origin and destiny to be nodes, with type as the relationship type and value as a relationship property? I know how to create a single pattern like (a)-[]->(b), but how do I create the entire graph without building it edge by edge, node by node and property by property?
2- Can I select a date and see something like the time evolution of this graph? I want to see all transactions on 20-05-2021, 01-05-2021, etc. and see how the graph evolves. Is that possible?
As the example in the official docs shows (https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/#tutorial-neo4j-admin-import), you may want to create three separate files for the import:
First, you need movies.csv to import the nodes with label :Movie:
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Second, you need actors.csv to import the nodes with label :Actor:
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
Finally, you import the relationships.
As you can see, actors and movies are already imported, so now you just need to specify the relationships. In the example, you're importing ACTED_IN relationships, each with a role property, in this format:
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
The header contains these fields:
:START_ID - the node where the relationship starts
role - a property name (you can specify multiple properties here; just make sure the CSV contains data for them)
:END_ID - the node where the relationship ends
:TYPE - the type of the relationship
That's all :)
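For the question's own data, the import files can be generated from the single transactions CSV rather than typed by hand. The Python sketch below writes one node file containing every distinct origin/destiny value as an :Account node and one relationship file with one row per transaction. The label Account, the id column name accountId, upper-casing type into the relationship type, and storing balance on the relationship are all assumptions; dates are assumed to be DD-MM-YYYY and are converted to ISO format so the date:date column can be parsed. The :float and :date suffixes follow the same header convention as year:int in movies.csv.

```python
import csv
import io
from datetime import datetime


def build_import_files(transactions_csv: str) -> tuple[str, str]:
    """Return (nodes_csv, relationships_csv) in neo4j-admin import header format."""
    reader = csv.DictReader(io.StringIO(transactions_csv))
    accounts, seen = [], set()  # distinct origin/destiny ids, in first-seen order

    rels = io.StringIO()
    rel_writer = csv.writer(rels, lineterminator="\n")
    rel_writer.writerow([":START_ID", ":END_ID", ":TYPE",
                         "value:float", "date:date", "balance:float"])
    for row in reader:
        for account in (row["origin"], row["destiny"]):
            if account not in seen:
                seen.add(account)
                accounts.append(account)
        # Assumes DD-MM-YYYY input; the date type expects ISO YYYY-MM-DD.
        iso_date = datetime.strptime(row["date"], "%d-%m-%Y").date().isoformat()
        rel_writer.writerow([row["origin"], row["destiny"], row["type"].upper(),
                             row["value"], iso_date, row["balance"]])

    nodes = io.StringIO()
    node_writer = csv.writer(nodes, lineterminator="\n")
    node_writer.writerow(["accountId:ID", ":LABEL"])
    for account in accounts:
        node_writer.writerow([account, "Account"])
    return nodes.getvalue(), rels.getvalue()


data = ("date,origin,destiny,value,type,balance\n"
        "01-05-2021,A,B,500,transf,2500\n"
        "20-05-2021,B,C,100,transf,2400\n")
nodes_csv, rels_csv = build_import_files(data)
print(nodes_csv)
print(rels_csv)
```

Write the two strings to files and pass them to neo4j-admin import as described in the linked tutorial (it loads into a new, empty database, and the exact command syntax differs between Neo4j versions). Because date is then stored as a real date property, question 2 becomes a filter, for example MATCH (a)-[t]->(b) WHERE t.date = date('2021-05-20') RETURN a, t, b.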

arcmap network analyst iteration over multiple files using model builder

I have 10+ files that I want to add to ArcMap and then run some spatial analysis on in an automated fashion. The files are CSVs located in one folder and named in order from "TTS11_path_points_1" to "TTS11_path_points_13". The steps are as follows:
1. Make an XY event layer.
2. Export the XY table to a point shapefile using the Feature Class to Feature Class tool.
3. Project the shapefiles.
4. Snap the points to another line shapefile.
5. Make a route layer (Network Analyst).
6. Add locations to stops, using the output of step 4.
7. Solve to get routes between points based on a RouteName field.
I tried to attach a snapshot of the model builder to show the steps visually but I don't have enough points to do so.
I have two problems:
How do I iterate this procedure over all of my files?
How do I make sure that each iteration's output has a different name, so it doesn't overwrite the output from the previous iteration?
Your help is much appreciated.
Once you're satisfied with the way the model works on a single input CSV, you can batch the operation 10+ times, manually adjusting the input/output files. This easily addresses your second problem, since you're controlling the output name.
You can use an iterator in your ModelBuilder model -- specifically, Iterate Files. The iterator would be the first input to the model, and has two outputs: File (which you link to other tools), and Name. The latter is a variable which you can use in other tools to control their output -- for example, you can set the final output to C:\temp\out%Name% instead of just C:\temp\output. This can be a little trickier, but once it's in place it tends to work well.
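The %Name% substitution is what keeps iteration outputs apart: each input file's base name becomes part of its output name. The Python sketch below is not arcpy; it only illustrates that naming scheme for the question's files, assuming that Name corresponds to the file name without its extension and that an out_ prefix is acceptable. plan_outputs and trailing_number are hypothetical helpers.

```python
import re
from pathlib import PurePath


def trailing_number(path: str) -> int:
    """Return the number at the end of the file's base name (-1 if there is none)."""
    match = re.search(r"(\d+)$", PurePath(path).stem)
    return int(match.group(1)) if match else -1


def plan_outputs(csv_paths, prefix: str = "out_"):
    """Pair each input CSV with an output name derived from its own base name."""
    # Sort by the trailing number: a plain string sort would put _10 before _2.
    ordered = sorted(csv_paths, key=trailing_number)
    return [(path, prefix + PurePath(path).stem) for path in ordered]


paths = ["data/TTS11_path_points_10.csv", "data/TTS11_path_points_2.csv",
         "data/TTS11_path_points_1.csv"]
for source, output in plan_outputs(paths):
    print(source, "->", output)
```

Because every output name is derived from a distinct input name, no iteration can overwrite another's result, which is the same guarantee C:\temp\out%Name% gives inside ModelBuilder.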
For future reference, gis.stackexchange.com is likely to get you a faster response.

Is there a way to read a raw netcdf file and tell what layer a value belongs to?

I'm evaluating how successful a script I wrote is. A quick-and-dirty method I've used is to look at the first few and last few values of a single variable and do a few calculations with them, based on the same values in another netCDF file.
I know there are better ways to approach this, but again, it's a really quick-and-dirty method that has worked for me so far. My question is: by looking at the raw data through ncdump, is there a way to tell which vertical layer the data belongs to? In my example, the file has 14 layers. I'm assuming that the first few values are part of the surface layer and the last few values are part of the top layer, but I suspect this assumption is wrong, at least in part.
As a follow-up question, what would then be the easiest 'proper' way to tell what layer data belongs to? Thank you in advance!
ncview (a graphical viewer) and NCO (a suite of command-line operators) are both powerful, quick tools for viewing data inside a netCDF file.
ncview: http://meteora.ucsd.edu/~pierce/ncview_home_page.html
NCO: http://nco.sourceforge.net/
You can easily show variables over all layers for example with
ncks -d layer,0,13 some_infile.nc
ncdump dumps the data with the last dimension varying fastest (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/CDL-Syntax.html) so if 'layer' is the slowest/first dimension, the earlier values are all in the first layer, while the last few values are in the last layer.
As to whether the first layer is the top or the bottom layer, you'd have to look at the 'layer' coordinate variable (if there is one) and its attributes, such as units or positive.
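Because of that ordering rule, the layer a value belongs to can be computed from its position in the flat listing, once you know the variable's dimension order and sizes (shown in the variable declaration printed by ncdump -h). The Python sketch below does that row-major (C-order) conversion; the shape (14, 3, 4) in the example is hypothetical, and numpy.unravel_index performs the same calculation if NumPy is available.

```python
from math import prod


def unravel(flat_index: int, shape: tuple) -> tuple:
    """Convert a 0-based position in ncdump's listing of one variable into indices.

    ncdump prints values in row-major (C) order: the last dimension varies fastest.
    """
    if not 0 <= flat_index < prod(shape):
        raise IndexError("position is outside the variable's data")
    indices = []
    for size in reversed(shape):  # peel off the fastest-varying dimension first
        indices.append(flat_index % size)
        flat_index //= size
    return tuple(reversed(indices))


shape = (14, 3, 4)  # hypothetical (layer, y, x) sizes; read yours from ncdump -h
print(unravel(12, shape))
```

If the variable also has a leading time dimension, include it in shape; the first index is then time, and the layer is the second index.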

Best way to plot histogram or any other graphical interpretation

I have a CSV file with the following data set:
gv,ca,level1,2
gv,bg,level1,1
zea,li,level1,1
zea,li,level3,1
zea,de,level1,26
zea,de,level3,5
zea,el,level1,1
zea,eo,level1,3
zea,en,level1,5
zea,en,level2,34
zea,en,level3,38
zea,en,level4,12
zea,es,level1,7
zea,la,level1,7
zea,zea,level1,5
zea,zea,level3,4
zea,stq,level1,1
zea,sk,level2,1
zea,nl,level4,4
zea,fr,level2,9
zea,fy,level2,1
cdo,cdo,level3,1
cdo,de,level1,23
cdo,de,level2,4
cdo,de,level3,4
cdo,eo,level1,1
cdo,eo,level2,1
cdo,eo,level3,3
cdo,en,level1,6
cdo,en,level2,31
cdo,en,level3,38
cdo,en,level4,17
cdo,es,level1,8
cdo,es,level2,6
cdo,es,level3,3
cdo,fr,level1,14
I want to build a histogram, but the second column somehow needs to be incorporated into it. The data should be read like this: in gv, we have two users with ca experience at level1; similarly, in gv we have one user with bg experience at level1.
I know how to build histograms in R, but I'm trying to wrap my head around this and figure out how to turn it into a graphical representation.
As @Ben said, it's a little difficult to see what you're getting at here. You may need to reformat your data so that each table holds only one type of data (class).
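One way to do that reshaping is to turn each (first column, second column) pair into one row with a count per level, which maps directly onto a stacked or grouped bar chart. The Python sketch below assumes the four columns mean group, language, experience level and number of users (names chosen here for illustration) and that the only levels are level1 to level4; pivot_counts is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

LEVELS = ["level1", "level2", "level3", "level4"]  # assumed complete set of levels


def pivot_counts(text: str) -> dict:
    """Reshape (group, language, level, users) rows into {(group, language): users per level}."""
    table = defaultdict(lambda: [0] * len(LEVELS))
    for group, language, level, users in csv.reader(io.StringIO(text)):
        table[(group, language)][LEVELS.index(level)] += int(users)
    return dict(table)


sample = "gv,ca,level1,2\ngv,bg,level1,1\nzea,li,level1,1\nzea,li,level3,1\n"
for (group, language), counts in pivot_counts(sample).items():
    print(group, language, counts)
```

Each resulting list can be drawn as one stacked bar. In R, a comparable plot can be built from the original long-format data with ggplot2, mapping the level to fill in geom_col() and using facet_wrap() over the first column.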
