Creating a data frame with the contents of multiple txt files - r

I'm new to R programming and am having difficulties trying to create one data frame from a number of text files. I have a directory containing over 100 text files. Each file has a different name, but the contents share a similar format, e.g. 3 columns (name, age, gender). I want to load each of the text files into R and merge them into 1 data frame.
So far I have:
txt_files = list.files(path='names/', pattern="*.txt");
do.call("rbind", lapply(txt_files, as.data.frame))
This has created a list of the file names but not the contents of the files. I'm able to read in the content of one file and create a data frame but I can't seem to do it for multiple files at once. If anyone could offer any help I'd really appreciate it as I'm completely stuck!
Thanks in advance!

I think you might want something like this:
# Put in your actual path where the text files are saved
mypath <- "C:/Users/Dave/Desktop"
setwd(mypath)
# Create a vector of text file names (pattern is a regular expression, not a glob)
txt_files_ls <- list.files(path = mypath, pattern = "\\.txt$")
# Read the files in, assuming a comma separator and a header row
txt_files_df <- lapply(txt_files_ls, function(x) read.table(file = x, header = TRUE, sep = ","))
# Combine them (read.table already returns data frames)
combined_df <- do.call(rbind, txt_files_df)
At least that worked for me when I created a couple of sample text files.
Hope that helps.
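The same idea can be sketched without changing the working directory, by asking list.files for full paths. This is only an illustration under assumptions: the demo folder, the two small files and the extra source_file column are made up here and are not part of the original question.

```r
# Hypothetical demo data: two small comma-separated files in a fresh temp folder
demo_dir <- tempfile("names_demo")
dir.create(demo_dir)
writeLines(c("name,age,gender", "Ann,31,F", "Bob,45,M"), file.path(demo_dir, "a.txt"))
writeLines(c("name,age,gender", "Cy,27,M"), file.path(demo_dir, "b.txt"))

# full.names = TRUE returns paths that can be read from any working directory
txt_paths <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read one file and record which file its rows came from (assumed extra column)
read_one <- function(path) {
  df <- read.table(path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
  df$source_file <- basename(path)
  df
}

# Read every file, then stack the data frames row-wise
combined_df <- do.call(rbind, lapply(txt_paths, read_one))
print(combined_df)
```

Keeping the file name in a column is optional, but it makes it easier to trace a suspicious row back to its file.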

Related

reading files in a batch does not read them correctly

I have 30 .txt files that I need to read into a tibble. It's panel data, 108M altogether.
The issue is that some files are read correctly with all values present, but others read as NA even though the values are there! The files also include a lot of blank lines.
Here is what I use:
read_clean_table<-function(x){
x<-read.table(x, header = TRUE, fill = TRUE)
x[-(1:4),] #first 4 rows are system data
}
filenames<-list.files(path="./ML", pattern = ".*.txt", full.names=TRUE)
#read files and merge to table, first rows removed, FileName is the name of file
files<-filenames%>%
set_names(.) %>%
map_df(read_clean_table, .id = "FileName")%>%
mutate(FileName=str_replace_all(basename(FileName), pattern="\\.txt",""))
I tried read.delim as well, with the same result.
This is what the issue looks like:
Edit: added two files:
https://drive.google.com/drive/folders/1gDss6qV9aFUMpJFGHPMQZbTITJ9av-py?usp=sharing
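One thing that may be worth checking: unless col.names is supplied, read.table works out the number of columns from the first five lines of input, so files whose lines have differing numbers of fields can end up padded with NA. Below is a minimal diagnostic sketch with count.fields. The demo file and its layout are assumptions made up for illustration, not the poster's real data.

```r
# Hypothetical demo file: a header, four "system" rows, a blank line, ragged data
demo_file <- tempfile(fileext = ".txt")
writeLines(c("id year value",
             "sys", "sys", "sys", "sys",
             "1 2001 3.5",
             "",
             "2 2002",
             "3 2003 4.1"), demo_file)

# Tabulate how many whitespace-separated fields each non-blank line has
inspect_fields <- function(path) table(count.fields(path))

field_counts <- inspect_fields(demo_file)
print(field_counts)
```

If the table shows more than one field count for a file, that file is a candidate for closer inspection before it goes into the batch read.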

R: how to extract column in multiple csv and then write multiple csv in one folder

I have a folder (folder 1) containing multiple csv: "x.csv", "y.csv", "z.csv"...
I want to extract the 3rd column of each file and then write new csv files in a new folder (folder 2). Hence, folder 2 must contain "x.csv", "y.csv", "z.csv"...(but with just the 3rd column).
I tried this:
dfiles <- list.files(pattern =".csv") #if you want to read all the files in working directory
lst2 <- lapply(dfiles, function(x) (read.csv(x, header=FALSE)[,3]))
But I got this error:
Error in `[.data.frame`(read.csv(x, header = FALSE), , 3) :
undefined columns selected
Moreover, I don't know how to write multiple csv.
However, if I do this with one file, it works properly, although the output ends up in the same folder:
essai <-read.csv("x.csv", header = FALSE, sep = ",")[,3]
write.csv (essai, file = "x.csv")
Any help would be appreciated.
So here's how I would do it. There may be a nicer and more efficient way, but this should work pretty well.
setwd("~/stackexchange") #set your main folder. Best way to do this is actually the here() package. But that's another topic.
library(tools) #for file extension tinkering
folder1 <- "folder1" #your original folder
folder2 <- "folder2" #your new folder
#I setup a function and loop over it with lapply.
write_to <- function(file.name){
file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
essai <-read.csv(paste(folder1, file.name, sep = "/"), header = FALSE, sep = ",")[,3]
write.csv(essai, file = paste(folder2, file.name, sep="/"))
}
# get file names from folder 1
dfiles <- list.files(path = folder1, pattern = "\\.csv$") # all csv files in folder1 (pattern is a regular expression)
lapply(X = paste(folder1, dfiles, sep="/"), write_to)
Have fun!
Btw: if you have many files, you could use data.table::fread and data.table::fwrite which improves csv reading/writing speed by a lot.
First of all, the error message suggests that some of the csv files have fewer than 3 columns. Check that you are reading the correct files and that all of them are supposed to have at least 3 columns.
Once you do that you can use the below code, to read the csv file, select the 3rd column and write the csv file in 'folder2'.
lapply(dfiles, function(x) {
df <- read.csv(x, header = FALSE)
write.csv(subset(df, select = 3), paste0('folder2/', x), row.names = FALSE)
})
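The column check described above can be folded into the loop itself, so that a short file is reported instead of stopping the run. This is a rough sketch under assumptions: the temp folders, the two demo files and the helper name extract_third are invented for illustration.

```r
# Hypothetical demo folders and files (y.csv deliberately has only two columns)
base_dir <- tempfile("csv_demo")
folder1 <- file.path(base_dir, "folder1")
folder2 <- file.path(base_dir, "folder2")
dir.create(folder1, recursive = TRUE)
dir.create(folder2)
writeLines(c("1,2,3", "4,5,6"), file.path(folder1, "x.csv"))
writeLines(c("7,8", "9,10"), file.path(folder1, "y.csv"))

csv_paths <- list.files(folder1, pattern = "\\.csv$", full.names = TRUE)

# Write column 3 of one file into folder2 under the same file name.
# Returns FALSE (and writes nothing) when the file has fewer than 3 columns.
extract_third <- function(path) {
  df <- read.csv(path, header = FALSE)
  if (ncol(df) < 3) {
    message("Skipping ", basename(path), ": only ", ncol(df), " column(s)")
    return(FALSE)
  }
  write.csv(df[, 3, drop = FALSE], file.path(folder2, basename(path)), row.names = FALSE)
  TRUE
}

written <- vapply(csv_paths, extract_third, logical(1))
```

The logical vector written is named by input path, which gives a quick list of the files that were skipped.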
For the "write" portion of this question, I had some luck using map2() in purrr. I'm not sure this is the most elegant solution but here it goes:
library(purrr); library(readr); library(glue)
listofessais # this is your .csv files together as a named list of tbls
map2(listofessais, names(listofessais), ~ write_csv(.x, glue("FilePath/{.y}.csv")))
That should give you all your .csv files exported in that folder, and named with the same names they were given in the list.

How to remove some certain column in multiple files in R?

Hi everyone. I want to remove certain columns from multiple CSV files.
For example, I have 50 files and I want to delete columns a, b and c in every file.
The problem is that I don't know how to read the files, save the change in every single file, and keep the original file names.
library(tidyverse)
# I want to delete some columns which contain messy code
# input a list of file
df <- list.files(here("Data"),pattern=".csv",full.names = TRUE) %>%
lapply(read_csv) %>% #read csv
lapply(subset,select = -c(a,b,c)) #To remove the messy code
write.csv(df, file = here())
# I want to save the change in the original files, but I don't know how to do it.
Read all the files (if all the files are in the working directory) directly into a list and process it.
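files <- list.files(pattern = "\\.csv$") # all csv files in the working directory
lst2 <- lapply(files, function(x) read.csv(x, header = TRUE))
# drop columns a, b and c by name
lst2 <- lapply(lst2, function(df) df[, setdiff(names(df), c("a", "b", "c")), drop = FALSE])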
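The answer above stops before the save step. A minimal sketch of the full round trip, writing each result back under its original file name, could look like this. The demo folder and files are made up, and overwriting in place is an assumption about what the poster wants, so it is worth testing on a copy of the data first.

```r
# Hypothetical demo folder with two small CSV files
data_dir <- tempfile("drop_cols_demo")
dir.create(data_dir)
write.csv(data.frame(a = 1:2, b = 3:4, c = 5:6, keep = c("x", "y")),
          file.path(data_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(a = 7, keep = "z"),
          file.path(data_dir, "two.csv"), row.names = FALSE)

drop_cols <- c("a", "b", "c")
csv_paths <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

for (path in csv_paths) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  # Keep every column whose name is not in drop_cols; missing names are ignored
  df <- df[, setdiff(names(df), drop_cols), drop = FALSE]
  # Overwrite the original file, so the file name stays the same
  write.csv(df, path, row.names = FALSE)
}
```

Using setdiff on the column names means a file that lacks one of the columns is still processed rather than raising an error.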

How to read every .csv file in R and export them into single large file

Hi, I have data in the following format:
101,20130826T155649
------------------------------------------------------------------------
3,1,round-0,10552,180,yellow
12002,1,round-1,19502,150,yellow
22452,1,round-2,28957,130,yellow,30457,160,brake,31457,170,red
38657,1,round-3,46662,160,yellow,47912,185,red
and I have been reading, cleaning and formatting it with this code:
b <- read.table("sid-101-20130826T155649.csv", sep = ',', fill=TRUE, col.names=paste("V", 1:18,sep="") )
b$id<- b[1,1]
b<-b[-1,]
b<-b[-1,]
b$yellow<-b$V6
and so on
There are about 300 files like this, and ideally they would all be compiled without the first two lines, since the first line is just the id and I made a separate column to identify the data. Does anyone know how to read these tables quickly, clean and format them the way I want, then compile them into one large file and export it?
You can use lapply to read all the files, clean and format them, and store the resulting data frames in a list. Then use do.call to combine all of the data frames into a single large data frame.
# Get vector of files names to read
files.to.load = list.files(pattern="csv$")
# Read the files
df.list = lapply(files.to.load, function(file) {
df = read.table(file, sep = ',', fill=TRUE, col.names=paste("V", 1:18,sep=""))
# ... cleaning and formatting code goes here
df$file.name = file # In case you need to know which file each row came from
return(df)
})
# Combine into a single data frame
df.combined = do.call(rbind, df.list)
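For the file layout shown in the question, the cleaning step and the final export might be sketched as follows. This is one possible reading, not the definitive one: it assumes the id is the first field of line 1 and that line 2 is always the dashed separator. The second demo file, its contents and the output file name are invented.

```r
# Hypothetical demo files following the layout shown in the question
demo_dir <- tempfile("sid_demo")
dir.create(demo_dir)
dashes <- strrep("-", 72)
writeLines(c("101,20130826T155649", dashes,
             "3,1,round-0,10552,180,yellow",
             "22452,1,round-2,28957,130,yellow,30457,160,brake,31457,170,red"),
           file.path(demo_dir, "sid-101-20130826T155649.csv"))
writeLines(c("102,20130827T101500", dashes,
             "5,1,round-0,9000,175,yellow"),
           file.path(demo_dir, "sid-102-20130827T101500.csv"))

read_one <- function(path) {
  # The id is taken from the first field of the first line (assumption)
  id <- strsplit(readLines(path, n = 1), ",", fixed = TRUE)[[1]][1]
  # skip = 2 drops the id line and the dashed line before parsing
  df <- read.table(path, sep = ",", skip = 2, fill = TRUE,
                   col.names = paste0("V", 1:18), stringsAsFactors = FALSE)
  df$id <- id
  df
}

paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
df.combined <- do.call(rbind, lapply(paths, read_one))

# Export the combined data to a single file outside the input folder
out_path <- tempfile(fileext = ".csv")
write.csv(df.combined, out_path, row.names = FALSE)
```

Skipping the two leading lines at read time avoids dropping rows afterwards, and writing the output outside the input folder keeps it from being picked up on a later run.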

How to not overwrite file in R

I am trying to copy and paste tables from R into Excel. Consider the following code from a previous question:
data <- list.files(path=getwd())
n <- length(data)
for (i in seq_len(n))
{
data1 <- read.csv(data[i])
outline <- data1[,2]
outline <- as.data.frame(table(outline))
print(outline) # this prints all n tables
name <- paste0(i,"X.csv")
write.csv(outline, name)
}
This code writes each table into separate Excel files (i.e. "1X.csv", "2X.csv", etc..). Is there any way of "shifting" each table down some rows instead of rewriting the previous table each time? I have also tried this code:
output <- as.data.frame(output)
wb = loadWorkbook("X.xlsx", create=TRUE)
createSheet(wb, name = "output")
writeWorksheet(wb,output,sheet="output",startRow=1,startCol=1)
writeNamedRegion(wb,output,name="output")
saveWorkbook(wb)
But this does not copy the dataframes exactly into Excel.
I think, as mentioned in the comments, the way to go is to first merge the data frames in R and then write them into one output file:
# get vector of filenames
filenames <- list.files(path=getwd())
# for each filename: load file and create outline
outlines <- lapply(filenames, function(filename) {
data <- read.csv(filename)
outline <- data[,2]
outline <- as.data.frame(table(outline))
outline
})
# merge all outlines into one data frame (by appending them row-wise)
outlines.merged <- do.call(rbind, outlines)
# save merged data frame
write.csv(outlines.merged, "all.csv")
Despite what Microsoft would like you to believe, .csv files are not Excel files; they are a common file type that can be read by Excel and many other programs.
The best approach depends on what you really want to do. Do you want all the tables read into a single worksheet in Excel? If so, you could write to a single file using the append argument of write.table (write.csv ignores append, with a warning), or use a connection that you keep open so that each new table is appended. You may want to use cat to put a couple of newlines before each new table.
Your second attempt looks like it uses the XLConnect package (but you don't say, so it could be something else). I would think this is the best approach. How is the result different from what you are expecting?
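The open-connection idea mentioned above can be sketched like this: each table is written below the previous one in a single file, with a title line and a blank line between tables. The two demo tables and the output path are made up for illustration.

```r
# Hypothetical frequency tables standing in for the per-file "outline" tables
tables <- list(
  first  = as.data.frame(table(outline = c("a", "b", "a"))),
  second = as.data.frame(table(outline = c("x", "y", "y", "y")))
)

out_file <- tempfile(fileext = ".csv")
con <- file(out_file, open = "w")  # keep one connection open for all writes

for (nm in names(tables)) {
  cat(nm, "\n", sep = "", file = con)             # title line for this table
  write.table(tables[[nm]], file = con, sep = ",",
              row.names = FALSE, col.names = TRUE, qmethod = "double")
  cat("\n", file = con)                           # blank line before the next table
}

close(con)
writeLines(readLines(out_file))  # show the stacked result
```

Because the connection stays open, every write lands after the previous one, so no append argument is needed.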
