Trait tracking using R, with pasted Excel sheet and phylogenetic tree

I wanted to create a phylogenetic tree where I can trace a certain dimension throughout clades. I followed the tutorial by Winternitz 2016, but now I run into some problems.
Here is what I did so far:
tablename<- read.table(file("clipboard"), header=TRUE)
library(adegenet)
library(ape)
library(caper)
library(devtools)
library(geiger)
library(picante)
library(phytools)
library(stringr)
library(TreeTools)
supertree<-ReadTntTree("Pathname", tipLabels =1:36)
plot(supertree,no.margin=TRUE,edge.width=2) #to check if my tree is displayed correctly
Now I have the problem that my tree (created by TNT) has numbers as representatives of taxa instead of the taxon names. In the copied table, the first column holds the number and the second column holds the taxon represented by that number. Columns 3, 4 and 5 are filled with either measurements or NA (for not available). The names of the columns are code (column 1), specimen (column 2), HFM (column 3), WFM (column 4) and Wpp (column 5).
My questions are now:
How can I replace the numbers in my plotted tree with their representative taxon name?
I personally find the commands in the pdf a bit confusing regarding using the table data for mapping traits. How can I create the connection between the pasted table/the dataset with the tree and how do I follow up then?
Thank you already for reading; I am looking forward to an answer.
Sincerely
Edit: After the quick comment I also attached a link to the files I can provide. I hope this helps to reproduce my progress so far -
https://drive.google.com/drive/folders/1CJBwCrSIkFqO6qvh0UH0yEiWtDwNpK1B?usp=sharing

Your line ReadTntTree("Pathname", tipLabels = 1:36) reads the tree, using the numbers 1..36 to label the tips. But you want the leaves to be labelled with the taxon names.
Approach 1: Specify tip labels within R
Specify the names of the tips in ReadTntTree. For example, if you know that the order of tips in the TNT tree matches the order of rows in your table, use
taxonNames <- tablename[, 2]
print(taxonNames) # Check that the names are what you expect
supertree <- ReadTntTree("Pathname", tipLabels = taxonNames)
More laboriously, specify the taxon names by hand: replace the first line with
taxonNames <- c("first_taxon", "second_taxon", <...>)
Approach 2: Specify tip labels within TNT
(Only an option if you have control over the TNT process that is generating your tree file.)
Ask TNT to save the taxon labels in the tree output, using the taxname=; tsav*; TNT command – see
https://ms609.github.io/TreeTools/articles/load-trees.html#trees-from-tnt
Read the trees into R with supertree <- ReadTntTree("Pathname")
Approach 3: Load tip labels from original matrix
This approach assumes that the TNT matrix and output file are in the same place on your computer as they were when the TNT analysis was run. As such, it is the least reproducible approach -- handy for initial analysis, but less well suited to inclusion in publications.
Omit the "tipLabels" parameter entirely. Trees saved in TNT's default parenthetical notation (TNT command tsav*;, with taxname-; to omit taxon names) link to the matrix used to generate the trees, and can load taxon names from there.
If you open the tree file with a text editor you should see the path to the original matrix in the first line.
See the ReadTntTree() manual page for further details, for example on how to use relative paths to the original matrix.
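If the tree is already loaded with numeric tip labels, the labels can also be swapped afterwards by looking each number up in the table. The following is only a minimal sketch: the three-tip tree, the taxon names and the HFM values are made up for illustration, and it assumes that the code column holds the same numbers that TNT used for the tips. The same lookup also puts a trait column into tip order, which is the link between table and tree that trait-mapping functions generally expect.

```r
library(ape)

# Toy stand-ins (assumptions): a three-tip tree labelled by number, and a
# lookup table using the column names from the question.
tree <- read.tree(text = "((1,2),3);")
tablename <- data.frame(
  code     = c(3, 1, 2),
  specimen = c("Taxon_C", "Taxon_A", "Taxon_B"),
  HFM      = c(12.5, NA, 9.1)
)

# Find the table row that belongs to each tip number.
idx <- match(tree$tip.label, as.character(tablename$code))
stopifnot(!anyNA(idx))  # every tip must have a row in the table

# Replace the numbers with the taxon names.
tree$tip.label <- as.character(tablename$specimen[idx])

# Put the trait in tip order and name it by taxon.
hfm <- setNames(tablename$HFM[idx], tree$tip.label)

plot(tree, no.margin = TRUE, edge.width = 2)
```

Tips whose measurement is NA usually have to be removed (for example with ape's drop.tip()) before a trait can be mapped onto the tree.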

Related

How to get R to read my first column as a "header"?

I want to calculate diversity indices of different sampling sites in R. I have sites in the first row and the different species in the first column. However, R is reading the first column as normal data (not as a header so to speak).
Pics:
https://imgur.com/a/iBsFtbe
Code:
Macro <- read.csv("C:\\Users\\Carly\\OneDrive\\Desktop\\Ecology Projects\\Macroinvertebrates & Water Quality\\Macro_RData\\Macroinvert\\MacroR\\MacroCSV.csv", header = T)
You need to add row.names = 1 to your command. This will indicate that row names are stored in column number 1.
Macro <- read.csv("<...>/MacroCSV.csv", header = TRUE, row.names = 1)
I sense that you are frustrated. As r2evans said, it is easier for people to help you if you provide them with the data in text form and not with screenshots - because we can't recreate the problem or try to solve it by loading a screenshot into R.
CSV files are just text, so you can open them with a text editor such as Notepad and copy and paste the contents here. You don't need the whole file - the columns and lines needed to reproduce the problem are enough. This was what we were looking for:
Site,Aeshnidae,Amnicolidae,Ancylidae,Asellidae
AN0119A,0,0,0,6,0
AN0143,0,0,0,0,0
Programming for many people is very frustrating when they start out, don't let this discourage you!
It looks like your data is in the wrong orientation for analysis in vegan - your species are the rows, and sites are columns. From your pics, it looks like you've spotted this issue and tried transposing, but are having issues with the placement of the headers.
Try reading your csv in, and specifying that the first column should be row names:
MacroDataDataFinal <- read.csv("Path/to/file.csv",
row.names=1)
Then transpose the data
MacroDataDataFinal_transposed <- t(MacroDataDataFinal)
Then try running the specaccum function:
library(vegan)
speccurve <- specaccum(comm=MacroDataDataFinal_transposed,
method="random",
permutations=1000)
Hopefully this will work. If you get any errors please let us know the code you typed, and the precise error message.
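To see what row.names = 1 and t() do without touching your real file, here is a small self-contained sketch. The species, site names and counts are invented for illustration; read.csv(text = ...) just stands in for reading the file from disk.

```r
# Made-up data in the same orientation as the question:
# species in rows, sites in columns.
csv_text <- "Species,SiteA,SiteB
Aeshnidae,0,2
Asellidae,6,0
Ancylidae,1,1"

# row.names = 1 turns the first column into row names instead of data.
macro <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# t() flips the table so that sites are rows and species are columns.
macro_t <- t(macro)

print(macro_t)
```

After transposing, each row is one site, which is the layout that community-ecology functions such as specaccum expect.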

How to use Cheminformatics Toolkit for R to compare a set of SMILES structures

I have a set of SMILES codes of different molecules and I would like to know how to determine similarity among them. I have decided to use the ChemmineR package based on this tutorial. The issue is that I cannot understand how to connect my dataframe and use it like a ChemmineR object in order to run the analysis on SMILES.
DrugName<-c("alclofenac","alosetron")
DrugID_CID<-c("30951","2099")
DrugID<-c("CHEMBL94081","DB00969")
DrugBank<-c("DB13167","DB00969")
SMILES<-c("OC(=O)Cc1ccc(OCC=C)c(Cl)c1","Cc1[nH]cnc1CN1CCc2c(C1=O)c1ccccc1n2C")
Target<-c("PTGS1","HTR3A")
test<-data.frame(DrugName,DrugID_CID,DrugID,DrugBank,SMILES,Target)
I have used the read.SMIset function which imports one or many molecules from a SMILES file and stores them in a SMIset container but I cannot understand how to further proceed with this.
library("ChemmineR")
test; smiset <- smisample
write.SMI(smiset, file="sub.smi")
smiset <- read.SMIset("sub.smi")
data(smisample) # Loads the same SMIset provided by the library
smiset <- smisample
smiset
view(smiset)
cid(smiset)
smi <- as.character(smiset)
as(smi, "SMIset")
It's not entirely clear what you want to compare with what. However, here is one way to proceed with the SMILES in your example data frame.
First you need to convert the SMILES to an SDFset. This is the first step in most ChemmineR operations.
test_sdf <- smiles2sdf(test$SMILES)
For pairwise comparison using atom pairs, you need to convert again to an APset:
test_ap <- sdf2ap(test_sdf)
You could now compare, for example, the first compound in the APset with the second:
cmp.similarity(test_ap[1], test_ap[2])
[1] 0.1313131
I would spend some time reading and working through the ChemmineR vignette linked in your question. It's a lot of information but it is well-presented, very clear and covers most things that you'll want to do.

Rstudio - how to write smaller code

I'm brand new to programming and am picking up Rstudio as a stats tool.
I have a dataset which includes multiple questionnaires divided by weeks, and I'm trying to organize the data into meaningful chunks.
Right now this is what my code looks like:
w1a=table(qwest1,talm1)
w2a=table(qwest2,talm2)
w3a=table(quest3,talm3)
Where quest and talm are the names of the variable and the number denotes the week.
Is there a way to compress all those lines into one line of code so that I could make w1a,w2a,w3a... each their own object with the corresponding questionnaire added in?
Thank you for your help, I'm very new to coding and I don't know the etiquette or all the vocabulary.
This might do what you wanted (but not what you asked for):
tbl_list <- mapply(table, list(qwest1, qwest2, quest3),
list(talm1, talm2, talm3), SIMPLIFY = FALSE) # SIMPLIFY = FALSE guarantees a list of tables
names(tbl_list) <- c('w1a', 'w2a','w3a')
You are committing a fairly typical new-R-user error in creating multiple similarly named and structured objects but not putting them in a list. This is my effort at pushing you in that direction. Could also have been done via:
qwest_lst <- list(qwest1, qwest2, quest3)
talm_lst <- list(talm1, talm2, talm3)
tbl_lst <- mapply(table, qwest_lst, talm_lst, SIMPLIFY = FALSE)
names(tbl_lst) <- paste0('w', 1:3, 'a')
There are other ways to programmatically access objects with character vectors using get or mget.
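Here is a self-contained sketch of the list-based approach with made-up data. The vectors quest1..quest3 and talm1..talm3 are hypothetical stand-ins for your weekly variables; base R's mget() collects them by name so they do not have to be typed out one by one.

```r
# Hypothetical weekly vectors standing in for the questionnaire data.
quest1 <- c("yes", "no", "yes"); talm1 <- c("a", "a", "b")
quest2 <- c("no", "no", "yes");  talm2 <- c("b", "a", "b")
quest3 <- c("yes", "yes", "no"); talm3 <- c("a", "b", "b")

weeks <- 1:3

# mget() fetches several objects by name and returns them as a list.
quest_lst <- mget(paste0("quest", weeks))
talm_lst  <- mget(paste0("talm", weeks))

# Map() calls table() on each quest/talm pair and always returns a list.
tbl_lst <- Map(table, quest_lst, talm_lst)
names(tbl_lst) <- paste0("w", weeks, "a")

tbl_lst$w1a  # the week-1 table
```

Keeping the tables in one list means later steps can loop over them with lapply() instead of repeating the same line for every week.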

CSV file to Histogram in R

I'm a total newbie with R, and I'm trying to create a histogram (with value and frequency as the axes) from a csv file (just one row of values). Any idea how I can do this?
I'm also an R newbie, and I ran into the same thing. I made two separate mistakes, actually, so I'll describe them both here.
Mistake 1: Passing a frequency table to hist(). Originally I was trying to pass a frequency table to hist() instead of passing in the raw data. One way to fix this is to use the rep() ("replicate") function to explode your frequency table back into a raw dataset, as described here:
Creating a histogram using aggregated data
Simple R (histogram) from counted csv file
Instead of that, though, I just decided to read in my original dataset instead of the frequency table.
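For completeness, the rep() route can be sketched as follows. The frequency table is invented for illustration, and the column names value and count are assumptions rather than anything from the linked questions.

```r
# Made-up frequency table: each value and how often it occurred.
freq <- data.frame(value = c(1, 2, 3), count = c(2, 5, 1))

# rep() repeats each value 'count' times, rebuilding the raw data.
raw <- rep(freq$value, times = freq$count)

hist(raw)
```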
Mistake 2: Wrong data type. My raw data CSV file contains two columns: hostname and bookings (idea is to count the number of bookings each host generated during some given time period). I read it into a table.
> tbl <- read.csv('bookingsdata.csv')
Then when I tried to generate a histogram off the second column, I did this:
> hist(tbl[2])
This gave me the "'x' must be numeric" error you mention in a comment. (tbl[2] returns a one-column data frame rather than a numeric vector, and hist() only accepts the latter.)
This fixed it:
> hist(tbl$bookings)
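The difference between the two ways of selecting the column is easy to check on a small made-up table (the hostnames and booking counts below are invented):

```r
# Hypothetical stand-in for the bookings CSV.
tbl <- data.frame(hostname = c("a", "b", "c"), bookings = c(3, 7, 7))

class(tbl[2])          # single brackets keep the data frame wrapper
class(tbl[[2]])        # double brackets extract the column itself
class(tbl$bookings)    # $ does the same, by name

hist(tbl$bookings)
```

Single brackets give back a data frame, while double brackets or $ give back the numeric vector that hist() needs.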
You should really start to read some basic R manual...
CRAN offers a lot of them (look into the Manuals and Contributed sections)
In any case:
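setwd("path/to/csv/file")
myvalues <- read.csv("filename.csv", header = FALSE) # header = FALSE: the file holds a single row of values, no column names
hist(as.numeric(unlist(myvalues)), breaks = 100) # hist() needs a numeric vector, not a data frame; 100 breaks is only an example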
See the manual pages for those functions for more help (accessible through ?read.table, ?read.csv and ?hist).
To plot the histogram, the values must be of class numeric. Here x seems to be of some other class.
Run the following command and see:
sapply(myvalues[1,],class)
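Putting the pieces together for a file that contains just one row of values, a minimal sketch could look like this. The numbers are invented, and read.csv(text = ...) stands in for reading your actual file.

```r
# Made-up one-row CSV content.
csv_text <- "3,5,5,7,8,8,8,10"

# header = FALSE keeps the single row as data instead of column names.
vals <- read.csv(text = csv_text, header = FALSE)

# Check the class of each column, as suggested above.
sapply(vals[1, ], class)

# Flatten the one-row data frame into a plain numeric vector.
x <- as.numeric(unlist(vals))

h <- hist(x, main = "Histogram of values", xlab = "Value")
```

hist() puts the values on the x axis and their frequency on the y axis by default.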

R: Associate values to tips in phylogeny

I have a phylogeny and some data (trait values). I've reconstructed trait values for all nodes using ace in ape.
I used makeNodeLabel in ape to associate the reconstructed trait values with their appropriate nodes.
What I want to do is to export a nexus file (phylogeny) from R that contains both the node values (reconstructed values) and the tip labels (empirical data).
I want to use color codes (in FigTree) to indicate the values, but right now I'm only able to do this with nodes, i.e. the tip-branches do not have data and are hence not "color codable".
I need to associate values to the tips in order to do this, but I haven't been able to figure out how to do this. I also need all the data I associate to the phylogeny to be in a "similar category", i.e. similarly to for example how theta values from *BEAST are coded in nexus files.
Any and all help is greatly appreciated.
I've found a workaround:
Export a tree (nexus format) from R with reconstructed traits associated to the nodes. Open in FigTree and define the "label" trait as "trait". Import tip annotations from a text file that contains the empirical data in a column with the header "trait". Then export the tree from FigTree to a new file with nexus-block and include annotations. Lastly, copy names from the name block in the nexus file (animal/organism[&Trait=2.35754]) and exchange the names with the ones in the coded tree. You will then have trait values coded through [&Trait=value] for both nodes and tips. Now you can color code the entire tree, including the empirical values that are now associated to the tips of the phylogeny.
Sooo, that's a stupid way of doing it. If anyone has a better way, I'd love to hear it.
You could try the 'phytools' package in R. Its function 'plotTree.wBars' may do what you need. Best wishes.
