I have a folder of PDFs that I am supposed to perform text analytics on within R. Thus far the best method of doing so has been using R to convert these files to text files using pdftotext. After this however I am unable to perform any analytics as the text files are placed into the same folder as the PDFs from which they are derived.
I am achieving this through:
dest <- "C:/PDF"
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
lapply(myfiles, function(i) system(paste('"C:/xpdfbin-win-3.04/bin64/pdftotext.exe"', paste0('"',i,'"')), wait= FALSE))
I was wondering the best method of retaining only the text files, whether it be saving them to a newly created folder in this step or if more must be done.
I have tried:
dir.create("C:/txtfiles")
new.folder <- "C:/txtfiles"
dest <- "C:/PDF"
list.of.files <-list.files(dest, ".txt$")
file.copy(list.of.files, new.folder)
However this only fills the new folder 'txtfiles' with blank text files named after the ones created by the first few lines of code.
use the following code:
files <- list.files(path="current folder location",pattern = "\\.txt$") #lists all .txt files
for(i in 1:length(files)){
file.copy(from=paste("~/current folder location/",files[i],sep=""),
to="destination folder")
This should copy all text files in "current folder location" into a separate folder "destination folder".
Related
I have a bunch of folders from 20180101 to 20180331. Each folder has appropriately 500 Rda files. I wonder how can I transfer all these files into csv?
folders <- list.files(".")
for(folder in folders){
files <- list.files(folder)
for(file in files){
path_n_name <- paste0(folder, "/", file)
raw_read <- readRDS(path_n_name)
out_path_n_name <- gsub(".rds", ".csv", path_n_name, ignore.case = T)
write.csv(raw_read, out_path_n_name)
}
}
For example, in your current working directory, there are a bunch of folders. The first line folders <- list.files(".") will get all the name of the folders, and then you iterate this vector. In each iteration, you get the name of all files from each folder, then convert them to .csv. Note that it only makes sense to save some data types to .csv (i.e. dataframe).
I download an xlxs file everyday with a long unique name with dates each day. I need R to read the new xlsx file saved in the directory everyday without typing the unique name everyday. My idea is to utilize the *.xlsx but whenever I try it, it always say the path does not exist:
excel_df <- read_excel("C:/Home/User/dbd/*.xlsx")
the code above does not work
This code says the same:
base <- as.character("C:/Home/User/dbd/*.xlsx")
files <- file.info(list.files(path = base, pattern = '*.xlsx',
full.names = TRUE, no.. = TRUE))
daily_numebrs<-readxl::read_excel(rownames(files)[order(files$mtime)][nrow(files)])
each line of results shows the
...path does not exist.
The path shouldn't contain the pattern:
path <- "C:/Home/User/dbd"
files <- list.files(path= path, full.names=T, pattern ='\\.xlsx$')
files
lapply(files, function(file) readxl::read_excel(file))
I am trying to copy a large number of files from one folder to another. We need to restructure the folders, so there is a translation from the old folder path to a new one. The old folder structure is also nested.
Currently the code I have is not throwing any errors, but is returning false on executing the file.copy for all files.
ETA: When I copy a single file, it works.
allFilePaths <- list.files('./oldTopLevelFolder', recursive = TRUE)
testIds <- c(1:4)
otherTestIds <- c(5:8)
allNewFolders <- paste('newTopLevelFolder', testIds, 'aFolderName', otherTestIds, sep = '/')
lapply(allNewFolders, dir.create, recursive = TRUE)
file.copy(from=allFilePaths, to=allNewFolders,
copy.mode = TRUE)
file.copy can copy multiple files, but only to a single destination folder by the looks of it.
In order to copy a bunch of files into varying destination folders, the following will do the job, where allOldFilePaths is a column containing the old filepath where each file currently exists, and allNewFilePaths is a column containing the new folder path for each file.
# function to copy a single file
copySingleFile <- function(oldPath, newPath) {
file.copy(from=oldPath, to=newPath,
copy.mode = TRUE)
}
# copy each file to its new folder path
mapply(copySingleFile, allFilePathsWithRoot, allNewFilePaths)
I am trying to extract many .csv files from multiple directories/subdirectories and copy them in a new folder, where I would like to end up with only .csv files.
The csv files are stored in subdirectories with the following structure:
D:\R data\main_folder\03\07\04\BBB_0120180307031414614.csv
D:\R data\main_folder\03\07\05\BBB_0120180307031414615.csv
I am trying the list.files function to extract the csv files names only.
my_dirs <- list.files("D:\\R data\\main_folder\\",pattern="\\.csv$" ,recursive = TRUE,
include.dirs = FALSE,full.names = FALSE)
The problem is that csv files are listed with the directory path, e.g.
03/07/03/BBB_0120180307031414614.csv
And this, even though full.names and include.dirs is set to FALSE.
This prevents me from copying those files in a new folder, as the name is not recognized.
What am I doing wrong?
Thanks
Use basename function coupled with list.files like below.
If I understood you correctly then you want to fetch the names of .csv files present in different directory.
I have made a temp folder in my documents directory of windows machine , Inside that I have two folders "one" and "two", Inside these folders I have csv files named as "just_one.csv" and "just_two.csv".
So If I want to fetch the names "just_one.csv" and "just_two.csv" then I could do this:
basename(list.files("C:/Users/C_Nfdl_99878314/Documents/temp", "*.csv", recursive=T))
Which results to:
[1] "just_one.csv" "just_two.csv"
I have a folder Tmin which contains 18 folders. Each of the 18 folders contains hundreds of file. I would like to create a program with R that allow to add the name of the folder files for each file. I do not want to rename each of the file with a different name, I only want to add the folder name at the beginning of the file name. I am new in R and in programming. I was not able to have a batch function that can repeat the operation for each folder. You can find attached two pictures, which show what I would like to obtain.
For example, the file called "name_date.tiff" contained in the folder "MACA_Miroc" will become "MACA_Miroc_name_date.tiff". Moreover, I would like to repeat the operation automatically for each folder. Thanks in advance for any help!
Wanted situation and organization of my folders and file
This ought to work:
mydir <- getwd()
primary_folder <- "C:/Users/Desktop/Test_Data/"
subfolders <- grep("*MACA*", list.dirs(primary_folder, full.names = T, recursive = F),
value = T)
renameFunc <- function(z){
setwd(z)
fnames <- dir(recursive = F, pattern= ".tiff|.csv")
addname <- substr(z, nchar(primary_folder)+2, nchar(z))
lapply(fnames, function(current_name){
#Regex to get extension, may need to addd $ sign to signify end of file name
ptrn <- ".*\\.([a-zA-Z]{2,4})"
extension <- regmatches(current_name, regexec(ptrn, current_name))[[1]][2]
no_extension <- gsub(paste(".",extension, sep = ""), "", current_name)
new_name <- paste(gsub("_"," ", no_extension), " ", addname, ".", extension, sep = "")
file.rename(current_name, new_name)
})
}
lapply(subfolders, readFunc)
setwd(mydir)
I think if you're not in the directory where you want to change file names, you must specify the full name. Changing the working directory was a quick way but you could use full names (using regular expressions to get the correct from and to values for file.rename()). I got some errors at one poing when I was not in the directory where I wanted to change the name.
I feel this allows more control over which folders you want to change the names in since incorrect operation can be very messy. You may also want to skip some file extensions or subfolders etc.
Your path folder
folder<-"C:/path/example/"
Extract files list
files<-list.files(folder)
Extract folder name
folder_name<-unlist(strsplit(folder,"/"))[length(unlist(strsplit(folder,"/")))]
Rename all files
file.rename(from = paste0(folder,files),to = paste0(folder,folder_name,"_",files))