How to save vegan::simper() output to data table - r

I'm trying to save vegan::simper() output as a data frame so that I can filter it and eventually export it as a table for publication. However, the simper output is a list and I'm not sure how to convert it to a data frame. Here is some sample code using the dune data.
library(vegan)
# Species and environmental data
dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)
dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)
# Note: the data() calls below overwrite the two objects read above with vegan's built-in copies
data(dune)
data(dune.env)
(sim <- with(dune.env, simper(dune, Management)))
summary(sim)
class(sim)

To complement the comment by @dcarlson: the simper result object is a complicated beast and there is no easy way of getting a table, especially as I don't know what kind of table you are looking for. The result object stores every pair of factor classes in a separate table. You can extract all those tables with summary(sim). If you want to see only one table, for instance for the pair SF_HF, use
summary(sim)$SF_HF
(and to see the available names, use names(sim)). Then it is up to you to collect the table you want from these individual tables. All the information is there.
And read the warnings in the manual page.
If you want to get something similar to the short printed output, look at vegan:::print.simper to see how it can be done.
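One possible way to collect the per-pair tables into a single data frame is sketched below. It assumes that summary(sim) returns one data frame per pair with species as row names; the column names comparison and species and the object name sim_df are made up for illustration.

```r
library(vegan)
data(dune)
data(dune.env)

sim <- with(dune.env, simper(dune, Management))
tabs <- summary(sim)  # one table per pair of Management classes

# Stack the tables, keeping the pair name and the species as ordinary columns
sim_df <- do.call(rbind, lapply(names(tabs), function(pair) {
  tab <- as.data.frame(tabs[[pair]])
  data.frame(comparison = pair, species = rownames(tab), tab, row.names = NULL)
}))

head(sim_df)
# The result is a plain data frame, so it can be filtered with subset()
# and exported with write.csv()
write.csv(sim_df, file.path(tempdir(), "simper_table.csv"), row.names = FALSE)
```

Because every row carries its pair name, filtering to one comparison is just subset(sim_df, comparison == "SF_HF").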

Related

Create Venn Diagram from two DF

I'm trying to create a Venn diagram of two data frames, but I only get incorrect results. Here is an example of the data; both data sets have the same structure:
Chemical     ChemID
Oxidopamine  D016627
Melatonin    D016627
I've only received incorrect results from the following:
VennDiagram::venn.diagram(
x = list(Lewy, Park),
category.names = c("ChemID, ChemID"),
filename ="venndiagramm.png",
output=TRUE)
Ideally, I would like to export an image showing the number of overlapping chemicals between the two sets.
Welcome to SO! As far as I can guess your data structure (two dataframes Lewy and Park, each with the column ChemID), try the following:
VennDiagram::venn.diagram(
x = list(Lewy$ChemID, Park$ChemID), # expects vectors, not dataframes
# category.names = c("Lewy", "Park"), # optional labels, one string per set (the question passed a single string)
filename ="venndiagramm.png",
output=TRUE)
You may increase the chance of a useful answer by providing a minimal working data sample with dput(). Of course you can use simulated data. Try to explain what exactly did not work.
See also ?venn.diagram
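To check that the diagram shows the right numbers, the overlap can be counted directly in base R. This is only a sketch on simulated data: the chemical names and IDs below are invented placeholders, not real identifiers.

```r
# Simulated data frames with the same structure as in the question
Lewy <- data.frame(Chemical = c("chem_a", "chem_b", "chem_c"),
                   ChemID   = c("ID001", "ID002", "ID003"))
Park <- data.frame(Chemical = c("chem_b", "chem_c", "chem_d", "chem_e"),
                   ChemID   = c("ID002", "ID003", "ID004", "ID005"))

# IDs found in both sets, and IDs unique to each set
both      <- intersect(Lewy$ChemID, Park$ChemID)
only_lewy <- setdiff(Lewy$ChemID, Park$ChemID)
only_park <- setdiff(Park$ChemID, Lewy$ChemID)

# These are the three numbers a two-set Venn diagram should display
counts <- c(Lewy_only = length(only_lewy),
            both      = length(both),
            Park_only = length(only_park))
print(counts)
```

If the counts printed here differ from the plot, the input to venn.diagram() is probably not the pair of ID vectors.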

Writing For Loop or Split function to separate data from Master data frame into smaller data frames

I am once again asking for your help and guidance! Super duper novice here so I apologize in advance for not explaining things properly or my general lack of knowledge for something that feels like it should be easy to do.
I have sets of compounds in one "master" list that need to be separated into smaller lists. I want to be able to do this with a "for loop" or some iterative function so that I am not changing the numbers by hand for each list. I want to separate the compounds based on the column "Run.Number" (there are 21 Run.Numbers).
Step 1: Load the programs needed and open File containing "Master List"
# tMSMS List separation
#Load library packages
library(ggplot2)
library(reshape)
library(readr) #loading the csv's
library(dplyr) #data manipulation
library(magrittr) #forward pipe
library(openxlsx) #open excel sheets
library(Rcpp) #got this from an error code while trying to open excel sheets
#STEP 1: open file
S1_MasterList<- read.xlsx("/Users/owner/Documents/Research/Yurok/Bioassay/Bioassay Data/220410_tMSMS_neg_R.xlsx")
Step 2: Currently, to go through each list, I have to change the "i" value for each iteration. And I also must change the name manually (Ctrl+F), by replacing "S2_Export_1" with "S2_Export_2" and so on as I move from list to list. Also, when making the smaller list, there are a handful of columns containing data that need to be removed from the “Master List”. The specific format of column names are so it will be compatible with LC-MS software. This list is saved as a .csv file, again for compatibility with LC-MS software
#STEP 2: Iterative
#Replace: S2_Export_1
i=1
(S2_Separate <- S1_MasterList[which(S1_MasterList$Run.Number == i), ])
(S2_Export_1<-data.frame(S2_Separate$On,
S2_Separate$`Prec..m/z`,
S2_Separate$Z,
S2_Separate$`Ret..Time.(min)`,
S2_Separate$`Delta.Ret..Time.(min)`,
S2_Separate$Iso..Width,
S2_Separate$Collision.Energy))
(colnames(S2_Export_1)<-c("On", "Prec. m/z", "Z","Ret. Time (min)", "Delta Ret. Time (min)", "Iso. Width", "Collision Energy"))
(write.csv(S2_Export_1, "/Users/owner/Documents/Research/Yurok/Bioassay/Bioassay Data/Runs/220425_neg_S2_Export_1.csv", row.names = FALSE))
Results: The output should look like this image provided below, and for this one particular data frame called "Master List", there should be 21 smaller data frames. I also want the data frames to be named S2_Export_1, S2_Export_2, S2_Export_3, S2_Export_4, etc.
First, select only the required columns, keeping the grouping column Run.Number (consider renaming non-syntactic names first to avoid extra work downstream):
s1_sub <- select(S1_MasterList, Run.Number, On, `Prec..m/z`, Z,
                 `Ret..Time.(min)`, `Delta.Ret..Time.(min)`,
                 Iso..Width, Collision.Energy)
Then split s1_sub into a list of dataframes with split():
s1_split <- split(s1_sub, s1_sub$Run.Number)
Finally, name the resulting list of dataframes with setNames():
s1_split <- setNames(s1_split, paste0("S2_Export_", seq_along(s1_split)))
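To also write each piece to its own CSV file without editing "i" by hand, a loop over the split list could look like the sketch below. The toy master table, the reduced set of columns and the output folder (a temporary directory) are assumptions for illustration; swap in the real data frame and path.

```r
# Toy stand-in for the master list (the real one has more columns and 21 runs)
master <- data.frame(Run.Number       = c(1, 1, 2, 3),
                     On               = TRUE,
                     Z                = 1,
                     Collision.Energy = c(10, 20, 30, 40))

# Split the export columns by run; list names come from the Run.Number values
runs <- split(master[, c("On", "Z", "Collision.Energy")], master$Run.Number)
names(runs) <- paste0("S2_Export_", names(runs))

out_dir <- tempdir()  # hypothetical output folder
for (nm in names(runs)) {
  export <- runs[[nm]]
  # Rename columns to the format the LC-MS software expects
  colnames(export) <- c("On", "Z", "Collision Energy")
  write.csv(export, file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}

names(runs)
```

Keeping the pieces in a named list (runs$S2_Export_1 and so on) avoids creating 21 separate objects in the workspace.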

R - [DESeq2] - Making DESeq Dataset object from csv of already normalized counts

I'm trying to use DESeq2's plotPCA function in a meta-analysis of data.
Most of the files I have received are raw counts pre-normalization. I'm running DESeq2 to normalize them, then running plotPCA.
One of the files I received does not have raw counts or even the FASTQ files, just the data that has already been normalized by DESeq2.
How could I go about importing this data (non-integers) as a DESeqDataSet object after it has already been normalized?
Consensus in vignettes and other comments seems to be that objects can only be constructed from matrices of integers.
I was mostly concerned with getting the format the same between plots. Ultimately, I just used a workaround to get the plots looking the same via ggfortify.
If anyone is curious, I just ended up doing this. Note that the "names" file is organized like the metadata file used as colData when building a DESeq object with DESeqDataSetFromMatrix, but I changed the name of the design column from "conditions" to "group" so it would match the output of plotPCA. The plots should look identical.
library(ggfortify)
data<-read.csv('COUNTS.csv',sep = ",", header = TRUE, row.names = 1)
names<-read.csv("NAMES.csv")
PCA<-prcomp(t(data))
autoplot(PCA, data = names, colour = "group", size=3)
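For a plot that sits closer to what plotPCA draws, the same idea can be sketched in base R: keep only the most variable genes and label the axes with the percentage of variance. Everything below runs on simulated values; the log2(x + 1) transform and ntop = 100 are assumptions chosen for the toy data (plotPCA itself defaults to the 500 most variable genes of a transformed object).

```r
set.seed(1)
# Simulated "already normalized" counts: 200 genes x 6 samples
norm_counts <- matrix(rlnorm(200 * 6, meanlog = 5), nrow = 200,
                      dimnames = list(paste0("gene", 1:200),
                                      paste0("sample", 1:6)))
group <- factor(rep(c("control", "treated"), each = 3))

# Assumed transform to stabilize the variance of large counts
log_counts <- log2(norm_counts + 1)

# Keep the ntop most variable genes
ntop <- 100
gene_var <- apply(log_counts, 1, var)
top <- order(gene_var, decreasing = TRUE)[seq_len(min(ntop, length(gene_var)))]

# PCA with samples as rows
pca <- prcomp(t(log_counts[top, ]))
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2], col = as.integer(group), pch = 19,
     xlab = sprintf("PC1: %.0f%% variance", pct[1]),
     ylab = sprintf("PC2: %.0f%% variance", pct[2]))
legend("topright", legend = levels(group), col = seq_along(levels(group)), pch = 19)
```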

Extract data of several gene sets from an RNAseq result summary file using R

I am trying to extract data of several gene sets from an RNAseq result summary file:
Example gene lists:
I am using Excel to first highlight duplicated genes, sort the summary file, then copy the data I need. It is time-consuming, and Excel always freezes when sorting, especially for big gene lists.
I was wondering if R can do a better job. Could someone kindly provide the code if R can be a better solution?
I think I got the solution although I still need to process those lists one by one.
It is faster than Excel anyway. :)
library(readxl) # provides read_excel()
# read the RNAseq result summary file
result <- read_excel("RNAseq_Result.xlsx")
# read the gene lists file
geneset <- read_excel("Gene set list.xlsx")
# read one specific list from the gene lists file
ListA <- geneset$ListA
# subsetting: keep rows whose gene name is in the list
ResultListA <- result[(result$Gene_name) %in% ListA, ]
# output file
write.csv(ResultListA, 'ResultListA.csv')
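To avoid processing the lists one by one, the same subsetting can be applied to every column of the gene list table with lapply(). The sketch below uses small made-up data frames in place of the two Excel files, assumes each column of geneset is one gene list, and writes to a temporary directory.

```r
# Made-up stand-ins for the two Excel sheets
result <- data.frame(Gene_name = c("A", "B", "C", "D"),
                     log2FC    = c(1.2, -0.5, 0.3, 2.1))
geneset <- data.frame(ListA = c("A", "C", NA),
                      ListB = c("B", "D", "A"))

# One subset of the result table per gene list (per column of geneset);
# NA padding from lists of unequal length is dropped first
subsets <- lapply(geneset, function(genes) {
  result[result$Gene_name %in% na.omit(genes), ]
})

# One output file per list, named after the list
for (nm in names(subsets)) {
  write.csv(subsets[[nm]],
            file.path(tempdir(), paste0("Result", nm, ".csv")),
            row.names = FALSE)
}

sapply(subsets, nrow)
```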

Transform a matrix txt file in spectra data for ChemoSpec package

I want to use ChemoSpec with mass spectra of about 60,000 data points each.
I already have them in one txt file as a matrix (X + 90 samples = 91 columns; 60,000 rows).
How can I turn this file into Spectra data without exporting each sample again as a single csv file (which is quite slow in R given the size of my data)?
The typical (and only?) way to import data into ChemoSpec is by way of the getManyCsv() function, which as the question indicates requires one CSV file for each sample.
Creating 90 CSV files from the 91 columns - 60,000 rows file described, may be somewhat slow and tedious in R, but could be done with a standalone application, whether existing utility or some ad-hoc script.
An R-only solution would be to create a new method, say getOneBigCsv(), adapted from getManyCsv(). After all, the logic of getManyCsv() is relatively straightforward.
Don't expect such a solution to be sizzling fast, but it should, in any case, compare with the time it takes to run getManyCsv() and avoid having to create and manage the many files, hence overall be faster and certainly less messy.
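If you do go the one-file-per-sample route, the split itself is a short loop in base R. The sketch below uses a tiny made-up table with an X column and three sample columns; the file layout (two columns, no header row) is an assumption, so check what the import function you use actually expects.

```r
# Tiny stand-in for the 91-column matrix: X plus three samples
spec <- data.frame(X  = c(100, 101, 102, 103, 104),
                   S1 = c(0.1, 0.4, 0.9, 0.3, 0.1),
                   S2 = c(0.2, 0.5, 0.8, 0.2, 0.1),
                   S3 = c(0.1, 0.3, 1.0, 0.4, 0.2))

out_dir <- file.path(tempdir(), "per_sample_csv")  # hypothetical folder
dir.create(out_dir, showWarnings = FALSE)

# Write one two-column file (X, intensity) per sample column
for (s in setdiff(names(spec), "X")) {
  write.table(spec[, c("X", s)],
              file.path(out_dir, paste0(s, ".csv")),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

csv_files <- list.files(out_dir, pattern = "\\.csv$")
print(csv_files)
```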
Sorry I missed your question 2 days ago. I'm the author of ChemoSpec - always feel free to write directly to me in addition to posting somewhere.
The solution is straightforward. You already have your data in a matrix (after you read it in with read.csv("file.txt")), so you can use it to manually create a Spectra object. In the R console type ?Spectra to see the structure of a Spectra object, which is a list with specific entries. You will need to put your X column (which I assume is mass) into the freq slot. Then the rest of the data matrix will go into the data slot. Then manually create the other needed entries (making sure the data types are correct). Finally, assign the Spectra class to your completed list by doing something like class(my.spectra) <- "Spectra" and you should be good to go. I can give you more details on or off list if you describe your data a bit more fully. Perhaps you have already solved the problem?
By the way, ChemoSpec is totally untested with MS data, but I'd love to find out how it works for you. There may be some changes that would be helpful so I hope you'll send me feedback.
Good Luck, and let me know how else I can help.
Many years have passed and I am not sure if anybody is still interested in this topic. But I had the same problem and wrote a little workaround to convert my data to class 'Spectra' by extracting the information from the data itself:
# Assumption:
# Data is stored as a numeric data.frame with column names representing samples
# and row names containing the domain axis
# Note: chkSpectra() at the end requires library(ChemoSpec)
dataframe2Spectra <- function(Spectrum_df,
                              freq = as.numeric(rownames(Spectrum_df)),
                              data = as.matrix(t(Spectrum_df)),
                              names = paste("YourFileDescription", 1:dim(Spectrum_df)[2]),
                              groups = rep(factor("Factor"), dim(Spectrum_df)[2]),
                              colors = rainbow(dim(Spectrum_df)[2]),
                              sym = 1:dim(Spectrum_df)[2],
                              alt.sym = letters[1:dim(Spectrum_df)[2]],
                              unit = c("a.u.", "Domain"),
                              desc = "Some signal. Describe it with 'desc'") {
  features <- c("freq", "data", "names", "groups", "colors", "sym", "alt.sym", "unit", "desc")
  Spectrum_chem <- vector("list", length(features))
  names(Spectrum_chem) <- features
  Spectrum_chem$freq <- freq
  Spectrum_chem$data <- data
  Spectrum_chem$names <- names
  Spectrum_chem$groups <- groups
  Spectrum_chem$colors <- colors
  Spectrum_chem$sym <- sym
  Spectrum_chem$alt.sym <- alt.sym
  Spectrum_chem$unit <- unit
  Spectrum_chem$desc <- desc
  # important step
  class(Spectrum_chem) <- "Spectra"
  # some warnings (data must be samples in rows, freq values in columns)
  if (length(freq) != dim(data)[2]) print("Dimension of data is NOT #samples X length of freq")
  if (length(names) > dim(data)[1]) print("Too many names")
  if (length(names) < dim(data)[1]) print("Too few names")
  if (length(groups) > dim(data)[1]) print("Too many groups")
  if (length(groups) < dim(data)[1]) print("Too few groups")
  if (length(colors) > dim(data)[1]) print("Too many colors")
  if (length(colors) < dim(data)[1]) print("Too few colors")
  if (!is.matrix(data) || !is.numeric(data)) print("'data' is not a matrix or it's not numeric")
  return(Spectrum_chem)
}
Spectrum_chem <- dataframe2Spectra(Spectrum)
chkSpectra(Spectrum_chem)
