How can I save 3 dataframes of different dimensions to one csv in order to load them afterwards in 3 different dataframes?
E.g
write.table(A, file = "di2.csv", row.names = FALSE, col.names = FALSE, sep=',')
write.table(B, file = "di2.csv", row.names = FALSE, col.names = FALSE, sep=',', append=TRUE)
write.table(C, file = "di2.csv", row.names = FALSE, col.names = FALSE, sep=',', append=TRUE)
or in a more elegant way
write.csv(rbind(A, B, C), "di2.csv")
How can I load this CSV to 3 dataframes, A,B and C?
This worked for me:
save(A_table, B_table, C_table, file="temp.Rdata")
load("temp.Rdata")
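A minimal sketch of the same round trip, assuming three made-up data frames of different dimensions and a temporary file. Loading into a separate environment (here the hypothetical `restored`) is optional; it only avoids overwriting objects that already exist in the workspace.

```r
# Three data frames with different dimensions (made-up test data).
A_table <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))
B_table <- data.frame(name = c("x", "y"))
C_table <- head(BOD, 4)

path <- tempfile(fileext = ".RData")  # temporary file, used only for this demo
save(A_table, B_table, C_table, file = path)

# Load into a separate environment so existing objects are not overwritten.
restored <- new.env()
loaded_names <- load(path, envir = restored)  # load() returns the object names

print(loaded_names)
print(identical(restored$A_table, A_table))
```

Each object comes back under its original name and with its original dimensions, so nothing has to be split apart after loading.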
As mentioned in the comments, if your purpose is just to read them back into R later, then you could use save/load.
Another simple solution is dump/source:
A <- B <- C <- BOD # test input
dump(c("A", "B", "C"))
# read back
source("dumpdata.R")
Other multiple-object formats that you could consider are HDF5 and an SQLite database (which is a single file).
On the other hand, if it is important that the file be readable text, directly readable by Excel and at least somewhat similar to a csv file, then write the data frames out one after another with a blank line after each. To read them back later, read the file and separate the input at the blank lines. In the code, st and en are the starting and ending line numbers of the chunks in Lines.
A <- B <- C <- BOD # test inputs
# write data frames to a single file
con <- file("out.csv", "w")
for(el in list(A, B, C)) {
  write.csv(el, con, row.names = FALSE)
  writeLines("", con)
}
close(con)
# read data.frames from file into a list L
Lines <- readLines("out.csv")
en <- which(Lines == "")
st <- c(1, head(en, -1))
L <- Map(function(st, en) read.csv(text = Lines[st:en]), st, en)
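The blank-line approach can also be wrapped in two small helpers. This is only a sketch: `write_stacked` and `read_stacked` are hypothetical names, the test data frames are made up, and it assumes no data frame produces a completely empty line of its own.

```r
# Hypothetical helper: write several data frames to one file,
# each followed by a blank line.
write_stacked <- function(dfs, path) {
  con <- file(path, "w")
  on.exit(close(con))
  for (df in dfs) {
    write.csv(df, con, row.names = FALSE)
    writeLines("", con)
  }
  invisible(path)
}

# Hypothetical helper: split the file at the blank lines and
# parse each chunk as its own csv.
read_stacked <- function(path) {
  lines <- readLines(path)
  en <- which(lines == "")          # blank line closing each chunk
  st <- c(1, head(en, -1) + 1)      # first line of each chunk
  Map(function(s, e) read.csv(text = lines[s:(e - 1)]), st, en)
}

# Made-up data frames of different dimensions.
A <- data.frame(x = 1:3, y = c("a", "b", "c"))
B <- data.frame(u = c(1.5, 2.5))
C <- head(BOD, 4)

path <- write_stacked(list(A, B, C), tempfile(fileext = ".csv"))
L <- read_stacked(path)
print(lapply(L, dim))
```

Column types are re-guessed by read.csv on the way back in, so this is a text round trip rather than an exact copy of the original objects.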
Note that there are some similarities between this question and Importing from CSV from a specified range of values
I have several files with the names RTDFE, TRYFG, FTYGS, WERTS...like 100 files in txt format. For each file, I'm using the following code and writing the output in a file.
name = c("RTDFE")
file1 <- paste0(name, "_filter",".txt")
file2 <- paste0(name, "_data",".txt")
### One
A <- read.delim(file1, sep = "\t", header = FALSE)
#### two
B <- read.delim(file2, sep = "\t", header = FALSE)
C <- merge(A, B, by="XYZ")
nrow(C)
145
Output:
Samples Common
RTDFE 145
Every time, I assign a file name to the variable name, run my code, and write the output to the file. Instead, I want the code to run on all the files in one go and produce the following output. Common is the number of rows of the merged data frame C.
The output I need:
Samples Common
RTDFE 145
TRYFG ...
FTYGS ...
WERTS ...
How to do this? Any help.
How about putting all your names in a single vector, called names, like this:
names<-c("TRYFG","RTDFE",...)
and then feeding each one to a function that reads the files, merges them, and returns the number of rows
f<-function(n) {
  fs = paste0(n,c("_filter", "_data"),".txt")
  C = merge(
    read.delim(fs[1],sep="\t", header=F),
    read.delim(fs[2],sep="\t", header=F), by="XYZ")
  data.frame(Samples=n,Common=nrow(C))
}
Then just call this function f on each of the values in names, row binding the results together
do.call(rbind, lapply(names, f))
An easy way to create the vector names is like this:
p = "_(filter|data).txt"
names = unique(gsub(p,"",list.files(pattern = p)))
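For a self-contained check of this pattern, here is a sketch that builds two made-up sample pairs in a temporary folder and counts the merged rows. The sample names, the folder, and the helper `count_common` are all hypothetical. It also assumes each file has a header row containing a column called XYZ, so it reads the files with the default header = TRUE; with header = FALSE the columns would be named V1, V2, ... and merging by "XYZ" would not find the column.

```r
# Build a small demo folder with two hypothetical sample pairs.
dir <- file.path(tempdir(), "merge_demo")
dir.create(dir, showWarnings = FALSE)
write_tab <- function(df, name) {
  write.table(df, file.path(dir, name), sep = "\t",
              quote = FALSE, row.names = FALSE)
}
write_tab(data.frame(XYZ = c("a", "b", "c")), "S1_filter.txt")
write_tab(data.frame(XYZ = c("b", "c", "d"), val = 1:3), "S1_data.txt")
write_tab(data.frame(XYZ = c("a", "b")), "S2_filter.txt")
write_tab(data.frame(XYZ = c("x", "y"), val = 1:2), "S2_data.txt")

# Read both files of one sample, merge on XYZ, return one result row.
count_common <- function(n) {
  fs <- file.path(dir, paste0(n, c("_filter", "_data"), ".txt"))
  merged <- merge(read.delim(fs[1]), read.delim(fs[2]), by = "XYZ")
  data.frame(Samples = n, Common = nrow(merged))
}

# Derive the sample names from the file names, then combine the rows.
p <- "_(filter|data)\\.txt$"
samples <- unique(sub(p, "", list.files(dir, pattern = p)))
result <- do.call(rbind, lapply(samples, count_common))
print(result)
```

The resulting data frame can be written out once at the end with write.table instead of appending to the output file after every sample.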
I am making some assumptions here.
The first assumption is that you have all these files in a folder with no other text files (.txt) in this folder.
If so you can get the list of files with the command list.files.
But doing so returns both the "_data.txt" and the "_filter.txt" files.
We need a way to extract the base part of each name.
I use "str_replace" to remove the "_data.txt" and the "_filter.txt" suffixes from the names.
After that, each base name appears twice, so I use the "unique" command to drop the duplicates.
I store the result in "lfiles", which now contains "RTDFE, TRYFG, FTYGS, WERTS..." and the base name of any other file that satisfies the conditions.
After this I run a for loop on this list.
I reopen the files similarly as you do.
I merge by XYZ and I immediately put the results in a data frame.
By using rbind I keep adding results to the data frame "res".
library(stringr)
lfiles=list.files(path = ".", pattern = ".txt")
## strip the "_data.txt" and "_filter.txt" suffixes from the file names
lfiles=unique( sapply(lfiles, function(x){
  x=str_replace(x, "_data.txt", "")
  x=str_replace(x, "_filter.txt", "")
  return(x)
} ))
res=NULL
for(i in lfiles){
  file1 <- paste0(i, "_filter.txt")
  file2 <- paste0(i, "_data.txt")
  ### One
  A <- read.delim(file1, sep = "\t", header = FALSE)
  #### two
  B <- read.delim(file2, sep = "\t", header = FALSE)
  ## append the new row to the results collected so far
  res=rbind(res, data.frame(Samples=i, Common=nrow(merge(A, B, by="XYZ"))))
}
OK, I will assume you have a folder called "data" containing files named "RTDFE_filter.txt", "RTDFE_data.txt", "TRYFG_filter.txt", "TRYFG_data.txt", etc. (only and exactly these files).
This code shows one possible way:
# save the file names, including the folder, in alphabetical order
files = list.files("data", full.names = TRUE)
# get indexes for "data" files (for "filter" indexes, add 1)
files_data_index = seq(1, length(files), 2) # 1, 3, 5, ...
# loop on indexes
results = lapply(files_data_index, function(i) {
  A <- read.delim(files[i+1], sep = "\t", header = FALSE)
  B <- read.delim(files[i], sep = "\t", header = FALSE)
  C <- merge(A, B, by="XYZ")
  # the sample name is the part of the file name before the underscore
  samp = strsplit(basename(files[i]), "_")[[1]][1]
  com = nrow(C)
  return(c(Samples = samp, Common = com))
})
# combine results
do.call(rbind, results)
I need to extract cells from the range C6:E6 (in the code the range is [4, 3:5]) from three different csv files ("Multi_year_summary.csv") which are in different folders and then copy them into a new Excel file. All csv files have the same name (written above). I tried as follows:
library("xlsx")
zz <- dir("C:/Users/feder/Documents/Simulations_DNDC")
aa <- list.files("C:/Users/feder/Documents/Simulations_DNDC/Try_1", pattern = "Multi_year_summary.csv",
full.names = T, recursive = T, include.dirs = T)
bb <- lapply(aa, read.csv2, sep = ",", header = F)
for (i in 1:length(bb)) {
xx <- bb[[i]][4, 3:5]
qq <- rbind(xx)
jj <- write.xlsx(qq, "C:/Users/feder/Documents/Simulations_DNDC/Try_1/Results.xlsx",
sheetName="Tabelle1",col.names = FALSE, row.names = FALSE)
}
The code is executed, but extracts the cells only from one file so that in Results.xlsx I have only one row instead of three. Maybe the problem starts from xx <- bb[[i]][4, 3:5] since if I execute xx the console gives back "1 obs. of 3 variables" instead of 3 objects.
Any help will be greatly appreciated.
After reading the csv you can extract the relevant data needed in the same lapply loop, combine them into one dataframe and write it in xlsx format.
result <- do.call(rbind, lapply(aa, function(x) read.csv(x, header = FALSE)[4, 3:5]))
write.xlsx(result,
"C:/Users/feder/Documents/Simulations_DNDC/Try_1/Results.xlsx",
sheetName="Tabelle1",col.names = FALSE, row.names = FALSE)
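To see the extraction step on its own, without Excel or the xlsx package, here is a sketch that builds three made-up "Multi_year_summary.csv" files in temporary folders and pulls row 4, columns 3 to 5 from each. The folder names and the numbers are invented for illustration.

```r
# Create three hypothetical run folders, each with a 6 x 5 csv file.
root <- file.path(tempdir(), "dndc_demo")
for (k in 1:3) {
  d <- file.path(root, paste0("run", k))
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  m <- matrix(k * 100 + 1:30, nrow = 6, ncol = 5, byrow = TRUE)
  write.table(m, file.path(d, "Multi_year_summary.csv"),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

# Find every copy of the file below the root folder.
aa <- list.files(root, pattern = "Multi_year_summary\\.csv$",
                 full.names = TRUE, recursive = TRUE)

# Read each file, keep only row 4 / columns 3:5, and stack the rows.
result <- do.call(rbind, lapply(aa, function(x) {
  read.csv(x, header = FALSE)[4, 3:5]
}))
print(dim(result))
```

The key difference from the loop in the question is that the rows are collected into one object before anything is written, so the output file is written once with all three rows rather than being overwritten on each iteration.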
I have 2 CSV datasets:
File 1:
Identity,Number,Data,Result,RT
5,3,13,45,34
6,1,44,12,56
3,1,67,23,47
0,6,43,55,91
4,5,33,34,29
File 2:
Identity,NB,NB,Result,Data,
1,4,55,92,62
3,7,43,12,74
7,3,58,52,64
0,6,10,22,96
3,8,13,92,22
I would like to concatenate these two datasets to create one dataset with the data of file 2 beneath the data of file 1 in the correct corresponding columns.
File 3:
Identity,Number,Data,Result,RT
5,3,13,45,34
6,1,44,12,56
3,1,67,23,47
0,6,43,55,91
4,5,33,34,29
Identity,NB,NB,Result,Data,
1,4,55,92,62
3,7,43,12,74
7,3,58,52,64
0,6,10,22,96
3,8,13,92,22
But where the columns for Data and Result line up on top of one another.
N.B. The columns with the corresponding data in file 1 don't align with the columns holding the same data in file 2.
If your expected output is a single object in R, that is impossible, because a data frame can have only one header row. If you mean that File 3 is a csv file binding File 1 and 2, then you can try this:
f1 <- read.csv("file1.csv")
f2 <- read.csv("file2.csv")
inter <- intersect(names(f1), names(f2))
diff1 <- names(f1)[! names(f1) %in% inter]
diff2 <- names(f2)[! names(f2) %in% inter]
write.table(f1[c(inter, diff1)], "file3.csv", quote = F, sep = ",", row.names = F)
write.table(f2[c(inter, diff2)], "file3.csv", quote = F, sep = ",", row.names = F, append = T)
# append = T is applicable to write.table, not write.csv.
file3.csv
Identity,Data,Result,Number,RT
5,13,45,3,34
6,44,12,1,56
3,67,23,1,47
0,43,55,6,91
4,33,34,5,29
Identity,Data,Result,NB,NB.1
1,62,92,4,55
3,74,12,7,43
7,64,52,3,58
0,96,22,6,10
3,22,92,8,13
You can then open file3.csv in Excel.
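If a single R data frame is acceptable instead of a file with two header rows, a different technique (not the one used in the answer above) is to pad each data frame with NA for the columns it lacks and then row-bind them. This is a sketch using shortened copies of the two files; the trailing comma in the File 2 header is dropped here, and `pad` is a hypothetical helper.

```r
# Shortened copies of File 1 and File 2 (trailing header comma removed).
f1 <- read.csv(text = "Identity,Number,Data,Result,RT
5,3,13,45,34
6,1,44,12,56")
f2 <- read.csv(text = "Identity,NB,NB,Result,Data
1,4,55,92,62
3,7,43,12,74")

# read.csv renames the duplicated NB column to NB.1.
all_cols <- union(names(f1), names(f2))

# Add any missing columns as NA, then put the columns in a common order.
pad <- function(df) {
  df[setdiff(all_cols, names(df))] <- NA
  df[all_cols]
}

combined <- rbind(pad(f1), pad(f2))
print(combined)
```

Here Identity, Data and Result line up across both inputs, while Number, RT, NB and NB.1 are NA for the rows that come from the other file.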
I currently have two data frames. One DF contains ~100,000 rows, while the other has only ~1000. I can export either one of these using the write.table function shown below...
write.table(DF_1, file = paste("DF_one.csv" ),
row.names = F, col.names = T, sep = ",")
This is easily opened by excel and works well. The problem is I need to include the other data frame in the very same excel file, and I'm not sure how to do this or if it is even possible.
I am open to any ideas, and have provided some example data to work with below.
#Example data for data frame one, length =30
Dates<-c(Sys.Date()+1:30)
Data1<-c(1+1:30)
#Data Frame One
Df1<-data.frame(Dates,Data1)
#Example data for data frame two, length=10
Letters<-c(letters[1:10])
Data2<-c(1:10)
#Data Frame Two
Df2<-data.frame(Letters,Data2)
#Now, is there a way can we export both to the same file?
#Here is the export for just data frame one
write.table(Df1, file = paste("DFone.csv" ),
row.names = F, col.names = T, sep = ",")
Any ideas including:"stop being picky and just export 2 files and then merge in excel" are appreciated.
Research Done:
I like this approach but would prefer a horizontal format instead of vertical
(I should probably just not be picky)
How to merge multiple data frame into one table and export to Excel?
How to write multiple tables, dataframes, regression results etc - to one excel file?
Thanks for all the help!
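I have no idea if this preserves the information structure that you want, but if you are really intent on getting them into the same table you could do the following. Note that data.frame() recycles the rows of the shorter data frame to fill the longer one, which works here only because 30 is a multiple of 10.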
Both <- data.frame(Df1,Df2)
write.table(Both, file = paste("DF_Both.csv" ),
row.names = F, col.names = T, sep = ",")
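When the row counts are not multiples of each other, or when repeated rows are not wanted, one option is to pad the shorter data frame with NA rows before binding the two side by side. The following is a sketch under that assumption; `cbind_pad` is a hypothetical helper and the second data frame is deliberately given 7 rows.

```r
# Example data: 30 rows and 7 rows (7 does not divide 30).
Df1 <- data.frame(Dates = Sys.Date() + 1:30, Data1 = 1 + 1:30)
Df2 <- data.frame(Letters = letters[1:7], Data2 = 1:7)

# Hypothetical helper: extend both data frames to the same number of
# rows (indexing past the last row yields NA rows), then bind columns.
cbind_pad <- function(a, b) {
  n <- max(nrow(a), nrow(b))
  pad <- function(df) {
    out <- df[seq_len(n), , drop = FALSE]
    rownames(out) <- NULL
    out
  }
  cbind(pad(a), pad(b))
}

Both <- cbind_pad(Df1, Df2)

# na = "" leaves the padded cells empty in the csv file.
out_path <- tempfile(fileext = ".csv")
write.table(Both, file = out_path, row.names = FALSE,
            col.names = TRUE, sep = ",", na = "")
print(dim(Both))
```

This gives the horizontal layout in a single csv file, with empty cells below the shorter table.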
Because the first solution did not meet your requirements here is another one that saves data frames to multiple tabs of an excel spreadsheet.
install.packages("xlsx")
library(xlsx)
###Define the save.xlsx function
save.xlsx <- function (file, ...)
{
  require(xlsx, quietly = TRUE)
  objects <- list(...)
  fargs <- as.list(match.call(expand.dots = TRUE))
  objnames <- as.character(fargs)[-c(1, 2)]
  nobjects <- length(objects)
  for (i in 1:nobjects) {
    if (i == 1)
      write.xlsx(objects[[i]], file, sheetName = objnames[i])
    else write.xlsx(objects[[i]], file, sheetName = objnames[i],
                    append = TRUE)
  }
  print(paste("Workbook", file, "has", nobjects, "worksheets."))
}
### Save the file to your working directory.
save.xlsx("WorkbookTitle.xlsx", Df1, Df2)
Full disclosure: this was adapted from another question on Stack Overflow, "R dataframes to multi sheet Excel Work".
I'm quite new at R and a bit stuck on what I feel is likely a common operation to do. I have a number of files (57 with ~1.5 billion rows cumulatively by 6 columns) that I need to perform basic functions on. I'm able to read these files in and perform the calculations I need no problem but I'm tripping up in the final output. I envision the function working on 1 file at a time, outputting the worked file and moving onto the next.
After calculations I would like to output 57 new .txt files named after the file the input data first came from. So far I'm able to perform the calculations on smaller test datasets and spit out 1 appended .txt file but this isn't what I want as a final output.
#list filenames
files <- list.files(path=, pattern="*.txt", full.names=TRUE, recursive=FALSE)
#begin looping process
loop_output = lapply(files,
  function(x) {
    #Load 'x' file in
    DF<- read.table(x, header = FALSE, sep= "\t")
    #Call calculated height average a name
    R_ref= 1647.038203
    #Add column names to .las data
    colnames(DF) <- c("X","Y","Z","I","A","FC")
    #Calculate return
    DF$R_calc <- (R_ref - DF$Z)/cos(DF$A*pi/180)
    #Calculate intensity
    DF$Ir_calc <- DF$I * (DF$R_calc^2/R_ref^2)
    #Output new .txt with calculated columns
    write.table(DF, file=, row.names = FALSE, col.names = FALSE, append = TRUE,fileEncoding = "UTF-8")
  })
My latest code endeavors have been to mess around with the initial lapply/sapply function like so:
#begin looping process
loop_output = sapply(names(files),
function(x) {
As well as the output line:
#Output new .csv with calculated columns
write.table(DF, file=paste0(names(DF), "txt", sep="."),
row.names = FALSE, col.names = FALSE, append = TRUE,fileEncoding = "UTF-8")
From what I've been reading the file naming function during write.table output may be one of the keys I don't have fully aligned yet with the rest of the script. I've been viewing a lot of other asked questions that I felt were applicable:
Using lapply to apply a function over list of data frames and saving output to files with different names
Write list of data.frames to separate CSV files with lapply
to no luck. I deeply appreciate any insights or paths towards the right direction on inputting x number of files, performing the same function on each, then outputting the same x number of files. Thank you.
The reason the output is directed to the same file is probably that file = paste0(names(DF), "txt", sep=".") returns the same value for every iteration. That is, DF must have the same column names in every iteration, therefore names(DF) will be the same, and paste0(names(DF), "txt", sep=".") will be the same. Along with the append = TRUE option the result is that all output is written to the same file.
Inside the anonymous function, x is the name of the input file. Instead of using names(DF) as a basis for the output file name you could do some transformation of this character string.
For example, given
x <- "/foo/raw_data.csv"
Inside the function you could do something like this
infile <- x
outfile <- file.path(dirname(infile), gsub('raw', 'clean', basename(infile)))
outfile
[1] "/foo/clean_data.csv"
Then use the new name for output, with append = FALSE (unless you need it to be true)
write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE, append = FALSE, fileEncoding = "UTF-8")
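Putting these pieces together, here is a self-contained sketch of the whole loop on two made-up input files. The folder layout, the file names and the choice to write the results into a separate "out" subfolder under the same base name are assumptions made for the demo; only the column names and the two formulas come from the question.

```r
# Build a demo input folder with two small tab-separated files.
in_dir <- file.path(tempdir(), "las_demo")
out_dir <- file.path(in_dir, "out")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
for (nm in c("tile_a.txt", "tile_b.txt")) {
  demo <- data.frame(X = 1:3, Y = 4:6, Z = c(1600, 1610, 1620),
                     I = c(10, 20, 30), A = c(0, 5, 10), FC = 1)
  write.table(demo, file.path(in_dir, nm), sep = "\t",
              row.names = FALSE, col.names = FALSE)
}

R_ref <- 1647.038203
files <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)

# Process one file at a time and write one output file per input file.
outfiles <- vapply(files, function(x) {
  DF <- read.table(x, header = FALSE, sep = "\t")
  colnames(DF) <- c("X", "Y", "Z", "I", "A", "FC")
  DF$R_calc <- (R_ref - DF$Z) / cos(DF$A * pi / 180)
  DF$Ir_calc <- DF$I * (DF$R_calc^2 / R_ref^2)
  outfile <- file.path(out_dir, basename(x))  # name derived from the input
  write.table(DF, file = outfile, sep = "\t", row.names = FALSE,
              col.names = FALSE, fileEncoding = "UTF-8")
  outfile
}, character(1))

print(basename(outfiles))
```

Because each data frame is written and then discarded inside the function, only one file's worth of data is held in memory at a time, which matters with inputs as large as those described in the question.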
Using your code, this is the general idea:
require(purrr)
#list filenames
files <- list.files(path=, pattern="*.txt", full.names=TRUE, recursive=FALSE)
#Call calculated height average a name
R_ref= 1647.038203
dfTransform <- function(file){
  colnames(file) <- c("X","Y","Z","I","A","FC")
  #Calculate return
  file$R_calc <- (R_ref - file$Z)/cos(file$A*pi/180)
  #Calculate intensity
  file$Ir_calc <- file$I * (file$R_calc^2/R_ref^2)
  return(file)
}
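# one output file per input file; the "calc_" prefix is just an example naming scheme
outfiles <- file.path(dirname(files), paste0("calc_", basename(files)))
output <- files %>% map(read.table, header = FALSE, sep = "\t") %>%
  map(dfTransform) %>%
  walk2(outfiles, ~ write.table(.x, file = .y,
        row.names = FALSE, col.names = FALSE, fileEncoding = "UTF-8"))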