I have a bunch of folders from 20180101 to 20180331. Each folder has approximately 500 Rda files. How can I convert all these files to CSV?
folders <- list.files(".")
for (folder in folders) {
  # only pick up .rds files; readRDS() would fail on anything else
  files <- list.files(folder, pattern = "\\.rds$", ignore.case = TRUE)
  for (file in files) {
    path_n_name <- file.path(folder, file)
    raw_read <- readRDS(path_n_name)
    # escape the dot so only the ".rds" extension is replaced
    out_path_n_name <- gsub("\\.rds$", ".csv", path_n_name, ignore.case = TRUE)
    write.csv(raw_read, out_path_n_name)
  }
}
For example, in your current working directory there are a bunch of folders. The first line, folders <- list.files("."), gets the names of all the folders, and then you iterate over that vector. In each iteration you get the names of all the files in the folder and convert each one to .csv. Note that it only makes sense to save some data types to .csv (e.g. a data frame).
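One caveat: readRDS() only reads files written with saveRDS(). If your files are genuinely .Rda files (written with save()), they can hold several objects and must be read with load() instead. A minimal sketch, assuming each .Rda file contains exactly one data frame (an assumption, since a .Rda file can contain anything):

folders <- list.files(".")
for (folder in folders) {
  for (file in list.files(folder, pattern = "\\.Rda$", ignore.case = TRUE)) {
    path <- file.path(folder, file)
    # load() restores the saved objects into an environment and
    # returns their names; we assume there is exactly one object
    env <- new.env()
    obj_name <- load(path, envir = env)[1]
    write.csv(get(obj_name, envir = env),
              gsub("\\.Rda$", ".csv", path, ignore.case = TRUE))
  }
}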
I am new to R. I have several .txt files in several sub-folders under one folder. The structure of the .txt files is the same.
I would like to iterate over the files in each sub-folder and generate a single file for the whole folder.
I coded as follows:
parent.folder <- "C:/.../18_0101"   # folder containing sub-folders
sub.folders <- list.dirs(parent.folder, recursive = TRUE)[-1]   # sub-folders
r.scripts <- file.path(sub.folders)
HR_2018 <- list()
for (j in seq_along(r.scripts)) {
  HR_2018[[j]] <- dir(r.scripts[j], "\\.txt$")
}
When I checked HR_2018[[1]], I found only the list of .txt file names under the first sub-folder. From there, I would like to analyze the files under each sub-folder, then iterate the same process over the other sub-folders under the folder and generate a single file.
Can anyone help me?
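A minimal sketch of one way to finish this, assuming the .txt files are plain tables that read.table() can parse with its defaults, and using parent.folder from above (the output file name is illustrative):

sub.folders <- list.dirs(parent.folder, recursive = TRUE)[-1]

# read every .txt file in one sub-folder and stack the rows
read_subfolder <- function(sf) {
  txt_files <- list.files(sf, pattern = "\\.txt$", full.names = TRUE)
  do.call(rbind, lapply(txt_files, read.table))
}

# one combined data frame per sub-folder, then one for the whole folder
per_folder <- lapply(sub.folders, read_subfolder)
HR_2018_all <- do.call(rbind, per_folder)
write.table(HR_2018_all, file.path(parent.folder, "HR_2018_combined.txt"), sep = "\t")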
Folder 1 and Folder 2 are full of .rds files. How would I go about merging all files in both folders into 1 .rds file?
What I have so far
mergedat <- do.call('rbind', lapply(list.files("File/Path/To/Folder/1/", full.names = TRUE), readRDS))
However, I don't know how to add the second file path, and even then the code above does not seem to be working.
The .rds files are all set up exactly the same as far as the number of columns and column headers go, but the data in them is obviously different. I also just figured out that my code did not actually read the files.
Any suggestions?
You can do something like this twice, each time for a different path:
path <- "./files"
files <- list.files(path = path, full.names = TRUE, all.files = FALSE)
# drop directories, keep only regular files
files <- files[!file.info(files)$isdir]
data <- lapply(files, readRDS)
You end up with two data objects, each a list whose elements are the data frames read from the RDS files. If all those files are the same in terms of structure, you can use dplyr::bind_rows() to concatenate all the data frames into one combined data frame.
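Putting it together, a minimal sketch that handles both folders in one pass and writes a single .rds file (the two folder paths and the output name are placeholders):

library(dplyr)

read_folder <- function(path) {
  files <- list.files(path, pattern = "\\.rds$", full.names = TRUE)
  lapply(files, readRDS)
}

# substitute your two folder paths here
all_frames <- c(read_folder("File/Path/To/Folder/1/"),
                read_folder("File/Path/To/Folder/2/"))

mergedat <- bind_rows(all_frames)   # bind_rows() accepts a list of data frames
saveRDS(mergedat, "merged.rds")     # illustrative output file name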
I have a folder of PDFs that I am supposed to perform text analytics on within R. Thus far the best method of doing so has been using R to convert these files to text files with pdftotext. After this, however, I am unable to perform any analytics, as the text files are placed in the same folder as the PDFs from which they are derived.
I am achieving this through:
dest <- "C:/PDF"
myfiles <- list.files(path = dest, pattern = "\\.pdf$", full.names = TRUE)
lapply(myfiles, function(i)
  system(paste('"C:/xpdfbin-win-3.04/bin64/pdftotext.exe"', paste0('"', i, '"')),
         wait = FALSE))
I was wondering about the best method of retaining only the text files, whether it be saving them to a newly created folder in this step or whether more must be done.
I have tried:
dir.create("C:/txtfiles")
new.folder <- "C:/txtfiles"
dest <- "C:/PDF"
list.of.files <-list.files(dest, ".txt$")
file.copy(list.of.files, new.folder)
However, this only fills the new folder 'txtfiles' with blank text files named after the ones created by the first few lines of code.
Use the following code:
# list all .txt files in the source folder
files <- list.files(path = "current folder location", pattern = "\\.txt$")
for (i in seq_along(files)) {
  file.copy(from = paste0("~/current folder location/", files[i]),
            to = "destination folder")
}
This should copy all text files in "current folder location" into a separate folder "destination folder".
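A simpler variant of the same idea, using the paths from the question: list.files() with full.names = TRUE returns complete paths (passing bare file names makes file.copy() look for them relative to the current working directory, which is one likely reason the earlier attempt produced nothing useful), and file.copy() is vectorised, so no loop is needed:

dir.create("C:/txtfiles", showWarnings = FALSE)
txt_files <- list.files("C:/PDF", pattern = "\\.txt$", full.names = TRUE)
file.copy(txt_files, "C:/txtfiles")

Alternatively, pdftotext accepts an explicit output file as a second argument, so the .txt files could be written straight into C:/txtfiles during the conversion step instead of being copied afterwards.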
I have a large number of nested directories with .ZIP files containing .CSV files that I want to loop through in R, extract the contents using unzip(), and then read the csv files into R.
However, there are many cases (numbering thousands of files) where there are multiple .zip files in the same directory containing .csv files with identical file names. If I set the overwrite=FALSE argument in unzip(), it ignores all duplicated names after the first. What I want is for it to extract all files but add some suffix to the file name that will allow the duplicated files to be extracted to the same directory, so that I do not have to create even more nested subdirectories to hold the files.
Example:
Directory ~/zippedfiles contains:
archive1.zip (consists of foo.csv, bar.csv), archive2.zip (foo.csv, blah.csv)
Run the following:
unzip('~/zippedfiles/archive1.zip', exdir='~/zippedfiles', overwrite=FALSE)
unzip('~/zippedfiles/archive2.zip', exdir='~/zippedfiles', overwrite=FALSE)
The result is
bar.csv
blah.csv
foo.csv
The desired result is
bar.csv
blah.csv
foo.csv
foo(1).csv
Rather than renaming the duplicate files, why not keep them unique by assigning a separate folder to each unzip action (just as your OS probably would)? This way you don't have to worry about changing file names, and you end up with a single list referencing all the unzipped files:
setwd('~/zippedfiles')

# get a list of ".zip" files
ziplist <- list.files(pattern = "\\.zip$")

# start a fresh vector to fill
unzippedlist <- vector(mode = "character", length = 0L)

# for every ".zip" file we found...
for (zipfile in ziplist) {
    # decide on a name for an output folder
    outfolder <- gsub("\\.zip$", "", zipfile)
    # create the output folder
    dir.create(outfolder)
    # unzip into the new output folder (note: zipfile is the loop
    # variable, not a quoted literal)
    unzip(zipfile, exdir = outfolder, overwrite = FALSE)
    # get a list of files just unzipped
    newunzipped <- list.files(path = outfolder, full.names = TRUE)
    # add that new list of files to the complete list
    unzippedlist <- c(unzippedlist, newunzipped)
}
The vector unzippedlist should contain all of your unzipped files, each one unique, not necessarily by file name but by the combination of directory and file name. So you can pass it along as a single vector that captures all of your files.
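If you do want the flat layout from the question (foo.csv, foo(1).csv, ...), here is a minimal sketch of one way to get it, assuming the archives hold files at the top level as in the example: extract each archive into a scratch directory, then copy the files across, appending a counter whenever the target name is already taken (the "(n)" suffix format is just an illustration):

zipdir <- "~/zippedfiles"
zips <- list.files(zipdir, pattern = "\\.zip$", full.names = TRUE)

for (z in zips) {
  tmp <- tempfile("unz")    # a fresh scratch directory per archive
  dir.create(tmp)
  unzip(z, exdir = tmp)
  for (f in list.files(tmp, full.names = TRUE)) {
    dest <- file.path(zipdir, basename(f))
    i <- 0
    # append (1), (2), ... until the name is free
    while (file.exists(dest)) {
      i <- i + 1
      dest <- file.path(zipdir,
                        sub("(\\.[^.]+)$", paste0("(", i, ")\\1"), basename(f)))
    }
    file.copy(f, dest)
  }
  unlink(tmp, recursive = TRUE)   # clean up the scratch directory
}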
A solution for you might be to use system()/system2() and then use one of the countless Unix methods to achieve that.
UPDATE
Thanks for the suggestions. This is how far I got, but I still can't figure out how to get the loop variable into the file path.
setwd("//tsclient/C/Users/xxx")
folders <- list.files("TEST")
--> This gives me a list of my folder names
for(f in folders){
setwd("//tsclient/C/xxx/[f]")
files[f] <- list.files("//tsclient/C/Users/xxx/TEST/[f]", pattern="*.TXT")
mergedfile[f] <- do.call(rbind, lapply(files[f], read.table))
write.table(mergedfile[f], "//tsclient/C/Users/xxx/[f].txt", sep="\t")
}
I have around 100 folders, each containing multiple .txt files. I want to create one merged file per folder and save it elsewhere. However, I do not want to have to manually adapt the folder name in my code for each folder.
I created the following code to load in all files from a single folder (which works) and merge these files.
setwd("//tsclient/C/xxx")
files <- list.files("//tsclient/C/Users/foldername", pattern="*.TXT")
file.list <- lapply(files, read.table)
setattr(file.list, "names", files)
masterfilesales <- rbindlist(file.list, idcol="id")[, id := substr(id,1,4)]
write.table(masterfilesales, "//tsclient/C/Users/xxx/datasets/foldername.txt", sep="\t")
If I wanted to do this manually, I would have to adapt "foldername" every time. The folder names are numeric: 100 values between 2500 and 5000 (always 4 digits).
I looked into repeat loops, but I could not get a loop to work inside a file path.
If anyone could point me in the right direction, I would be very grateful.
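For what it's worth, a minimal sketch of the pattern being asked about: a loop variable cannot be spliced into a string literal as [f]; the path has to be built with file.path() or paste0(). The directory layout below follows the question; the rest is illustrative:

library(data.table)

base <- "//tsclient/C/Users/xxx"
folders <- list.files(file.path(base, "TEST"))

for (f in folders) {
  # full.names = TRUE gives complete paths, so no setwd() is needed
  files <- list.files(file.path(base, "TEST", f),
                      pattern = "\\.TXT$", full.names = TRUE)
  merged <- rbindlist(lapply(files, read.table))
  # one output file per folder, named after it, e.g. .../datasets/2500.txt
  write.table(merged, file.path(base, "datasets", paste0(f, ".txt")), sep = "\t")
}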