I'm having trouble finding the documentation to answer my seemingly straightforward question.
For simplicity's sake, I have a list of 3 dataframes of differing numbers of rows.
mylist<-list()
mylist[[1]]<-c(1:10)
mylist[[2]]<-c(2:15)
mylist[[3]]<-c(20:54)
I'd like to write each element of the list to a separate sheet in an Excel workbook, which I presumably can do with WriteXLS (?).
When I call
WriteXLS("mylist", ExcelFileName="mylist.xls")
Error in WriteXLS("mylist", ExcelFileName = "mylist.xls") :
One or more of the objects named in 'x' is not a data frame or does not exist
... does WriteXLS not support lists? If not, how do I get around this efficiently? I will be writing files as part of a large simulation.
I always create temporary data frames (rectangular arrays) ...
sheet0 <- data.frame(array.no.1) # This is usually set of descriptions of the sheets
sheet1 <- data.frame(array.no.2) # The data
sheet2 <- data.frame(array.no.3) # More information
myxls <- c(sheet0="Index",sheet1="Results",sheet2="Notes")
WriteXLS(names(myxls),ExcelFileName="my.xls",SheetNames=myxls)
The documentation says "data.frames" so that led me to this solution.
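Applying that idea to the original list, a minimal sketch (assuming each list element can be coerced with data.frame(); the sheet names are just placeholders):

```r
library(WriteXLS)

mylist <- list(1:10, 2:15, 20:54)

# WriteXLS wants the *names* of data frames, so create one per list element
sheet_names <- paste0("sheet", seq_along(mylist))
for (i in seq_along(mylist)) {
  assign(sheet_names[i], data.frame(value = mylist[[i]]))
}

WriteXLS(sheet_names, ExcelFileName = "mylist.xls", SheetNames = sheet_names)
```

For a large simulation this should scale fine, since only the character vector of object names changes per run.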
Related
I am once again asking for your help and guidance! Super duper novice here so I apologize in advance for not explaining things properly or my general lack of knowledge for something that feels like it should be easy to do.
I have sets of compounds in one "master" list that need to be separated into smaller lists. I want to be able to do this with a "for loop" or some other iterative function so I am not changing the numbers for each list by hand. I want to separate the compounds based on the column "Run.Number" (there are 21 Run.Numbers).
Step 1: Load the programs needed and open File containing "Master List"
# tMSMS List separation
#Load library packages
library(ggplot2)
library(reshape)
library(readr) #loading the csv's
library(dplyr) #data manipulation
library(magrittr) #forward pipe
library(openxlsx) #open excel sheets
library(Rcpp) #got this from an error code while trying to open excel sheets
#STEP 1: open file
S1_MasterList<- read.xlsx("/Users/owner/Documents/Research/Yurok/Bioassay/Bioassay Data/220410_tMSMS_neg_R.xlsx")
Step 2: Currently, to go through each list, I have to change the "i" value for each iteration. I also must change the name manually (Ctrl+F), replacing "S2_Export_1" with "S2_Export_2" and so on as I move from list to list. Also, when making the smaller lists, there are a handful of columns that need to be removed from the "Master List". The specific format of the column names is required so the output is compatible with the LC-MS software. Each list is saved as a .csv file, again for compatibility with the LC-MS software.
#STEP 2: Iterative
#Replace: S2_Export_1
i=1
S2_Separate <- S1_MasterList[which(S1_MasterList$Run.Number == i), ]
S2_Export_1 <- data.frame(S2_Separate$On,
                          S2_Separate$`Prec..m/z`,
                          S2_Separate$Z,
                          S2_Separate$`Ret..Time.(min)`,
                          S2_Separate$`Delta.Ret..Time.(min)`,
                          S2_Separate$Iso..Width,
                          S2_Separate$Collision.Energy)
colnames(S2_Export_1) <- c("On", "Prec. m/z", "Z", "Ret. Time (min)", "Delta Ret. Time (min)", "Iso. Width", "Collision Energy")
write.csv(S2_Export_1, "/Users/owner/Documents/Research/Yurok/Bioassay/Bioassay Data/Runs/220425_neg_S2_Export_1.csv", row.names = FALSE)
Results: The output should look like this image provided below, and for this one particular data frame called "Master List", there should be 21 smaller data frames. I also want the data frames to be named S2_Export_1, S2_Export_2, S2_Export_3, S2_Export_4, etc.
First, select only required columns (consider processing/renaming non-syntactic names first to avoid extra work downstream):
s1_sub <- select(S1_MasterList, Sample.Number, On, `Prec..m/z`, Z,
`Ret..Time.(min)`, `Delta.Ret..Time.(min)`,
Iso..Width, Collision.Energy)
Then split s1_sub into a list of dataframes with split()
s1_split <- split(s1_sub, s1_sub$Sample.Number)
Finally, name the resulting list of dataframes with setNames():
s1_split <- setNames(s1_split, paste0("S2_Export_", seq_along(s1_split)))
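If each piece should also end up in its own .csv, as in the original script, something along these lines could follow — the output folder and file-name prefix here are just illustrative:

```r
# Write one CSV per run; the directory and prefix are examples, not fixed paths
out_dir <- "Runs"
Map(function(df, nm) {
  write.csv(df, file.path(out_dir, paste0("220425_neg_", nm, ".csv")),
            row.names = FALSE)
}, s1_split, names(s1_split))
```

Map() pairs each data frame with its name, so no counter variable needs to be edited between runs.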
I already know how to load a single CSV into a DataFrame:
using CSV
using DataFrames
df = DataFrame(CSV.File("C:\\Users\\username\\Table_01.csv"))
How would I do this when I have several CSV files, e.g. Table_01.csv, Table_02.csv, Table_03.csv?
Would I create a bunch of empty DataFrames and use a for loop to fill them? Or is there an easier way in Julia? Many thanks in advance!
If you want multiple data frames (not a single data frame holding the data from multiple files) there are several options.
Let me start with the simplest approach using broadcasting:
dfs = DataFrame.(CSV.File.(["Table_01.csv", "Table_02.csv", "Table_03.csv"]))
or
dfs = @. DataFrame(CSV.File(["Table_01.csv", "Table_02.csv", "Table_03.csv"]))
or (with a bit more advanced machinery, using function composition):
(DataFrame∘CSV.File).(["Table_01.csv", "Table_02.csv", "Table_03.csv"])
or using chaining:
CSV.File.(["Table_01.csv", "Table_02.csv", "Table_03.csv"]) .|> DataFrame
Now other options are map as it was suggested in the comment:
map(DataFrame∘CSV.File, ["Table_01.csv", "Table_02.csv", "Table_03.csv"])
or just use a comprehension:
[DataFrame(CSV.File(f)) for f in ["Table_01.csv", "Table_02.csv", "Table_03.csv"]]
(I am listing the options to show different syntactic possibilities in Julia)
This is how I have done it, but there might be an easier way.
using DataFrames, Glob
import CSV
function readcsvs(path)
files=glob("*.csv", path) #Vector of filenames. Glob allows you to use the asterisk.
numfiles=length(files) #Number of files to read.
tempdfs=Vector{DataFrame}(undef, numfiles) #Create a vector of empty dataframes.
for i in 1:numfiles
tempdfs[i]=CSV.read(files[i], DataFrame) #Read each CSV into its own dataframe (CSV.read requires a sink argument).
end
masterdf=outerjoin(tempdfs..., on="Column In Common") #Join the temporary dataframes into one dataframe.
end
A simple solution where you don't have to explicitly enter filenames:
using CSV, Glob, DataFrames
path = raw"C:\..." # directory of your files (raw is useful in Windows to add a \)
files=glob("*.csv", path) # to load all CSVs from a folder (* means arbitrary pattern)
dfs = DataFrame.( CSV.File.( files ) ) # creates a list of dataframes
# add an index column to be able to later discern the different sources
for i in 1:length(dfs)
dfs[i][!, :sample] .= i # I called the new col sample
end
# finally, if you want, reduce your collection of dfs via vertical concatenation
df = reduce(vcat, dfs)
I am trying to extract data of several gene sets from an RNAseq result summary file:
Example gene lists:
I am using Excel to first highlight duplicated genes, sort the summary file, and then copy the data I need. It is time-consuming, and Excel always "freezes" when sorting, especially for big gene lists.
I was wondering if R can do a better job. Could someone kindly provide the code if R can be a better solution?
I think I got the solution although I still need to process those lists one by one.
It is faster than Excel anyway. :)
# read the RNAseq result summary file (read_excel() comes from the readxl package)
library(readxl)
result <- read_excel("RNAseq_Result.xlsx")
# read the gene lists file
geneset <- read_excel("Gene set list.xlsx")
# read one specific list from the gene lists file
ListA <- geneset$ListA
#subsetting
ResultListA <- result[(result$Gene_name) %in% ListA, ]
#output file
write.csv(ResultListA, 'ResultListA.csv')
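To avoid handling the lists one by one, the same subset-and-write step can be looped over every column of the gene-set file — a sketch, assuming each column of `geneset` is one gene list, padded with NA where lists differ in length:

```r
# One subset + one CSV per gene list
for (nm in colnames(geneset)) {
  genes <- na.omit(geneset[[nm]])                      # drop the NA padding
  result_subset <- result[result$Gene_name %in% genes, ]
  write.csv(result_subset, paste0("Result", nm, ".csv"), row.names = FALSE)
}
```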
I have 500 csv. files with data that looks like:
sample data
I want to extract one cell (e.g. B4, or 0.477) per csv file and combine those values into a single csv. What are some recommendations on how to do this easily?
You can try something like this
all.fi <- list.files("/path/to/csvfiles", pattern=".csv", full.names=TRUE) # store names of csv files in path as a string vector
library(readr) # package for read_lines and write_lines
ans <- sapply(all.fi, function(i) {
  eachline <- read_lines(i, skip=3, n_max=1) # skip 3 lines, read only the 4th line of the file
  unlist(strsplit(eachline, ","))[2] # split the string on commas, then extract the 2nd element (column B)
})
write_lines(ans, "/path/to/output.csv")
I cannot add a comment, so I will write my comment here.
Since your data is very large and it is very difficult to load file by file, try this: Importing multiple .csv files into R. It is similar to the first part of your problem. For the second part, try this:
You can save your data as a data.frame (as in the comment by @Bruno Zamengo) and then use the select and merge functions in R. You can then easily combine the results into a single csv file. With select and merge you can pick all the values you need and combine them; I used this idea in my project. Do not forget to use lapply.
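A minimal sketch of that idea — the directory path is illustrative, and merging on all shared columns is an assumption (pass `by=` to merge() if a specific key column applies):

```r
# Read every CSV in a folder, merge them on their shared columns, write one file
files <- list.files("path/to/csvfiles", pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(files, read.csv)                                  # one data frame per file
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), dfs)   # full outer join, pairwise
write.csv(merged, "combined.csv", row.names = FALSE)
```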
I have 3 text files each of which has 14 similar columns. I want to first read these 3 files (data frames) and then combine them into one data frame. Following is what I have tried after finding some help in R mailing list:
file_name <- list.files(pattern='sEMA*') # CREATING A LIST OF FILE NAMES OF FILES HAVING 'sEMA' IN THEIR NAMES
NGSim <- lapply (file_name, read.csv, sep=' ', header=F, strip.white=T) # READING ALL THE TEXT FILES
This piece of code can read the files altogether but does not combine them into one data frame. I have tried data.frame(NGSim) but R gives an error: cannot allocate vector of size 4.2 Mb. How can I combine the files in one single data frame?
Like this:
do.call(rbind, NGSim)
library(plyr)
rbind.fill(NGSim)
or,
ldply(NGSim)
If file size is an issue, you may want to use data.table functions instead of less efficient base functions like read.csv().
library(data.table)
NGSim <- data.frame(rbindlist(lapply(list.files(pattern='sEMA*'),fread)))