R - load multiple csv files and drop .csv from name

I have some files in a base directory that I use to house all my .csv files:
base_dir <- file.path(path)
file_list <- list.files(path = base_dir, pattern = "*.csv")
I would like to load all of them at once:
for (i in 1:length(file_list)) {
  assign(file_list[i],
         read.csv(paste(base_dir, file_list[i], sep = "")))
}
However, this produces objects whose names still contain ".csv".
What I would like to do is load all the files but drop the ".csv" from the name once they are loaded.
I have tried the following:
for (i in 1:length(file_list)) {
  assign(file_list[i],
         read.csv(substr(paste(base_dir, file_list[i], sep = ""), 1,
                         nchar(file_list[i]) - 4)))
}
But I received an error: "No such file or directory".
Is there a way to do this somewhat efficiently?

Normally one reads them into a list rather than having them as free objects floating around in the workspace. Use dir() or Sys.glob() to generate the full path names and then use read.csv() to read each one in. The names of L will be the path names, so reduce them to the basename and strip the .csv extension.
# paths <- dir(path = path, pattern = "\\.csv$", full = TRUE)
paths <- Sys.glob(sprintf("%s/*.csv", path))
L <- Map(read.csv, paths)
names(L) <- sub("\\.csv$", "", basename(names(L)))
If you really want them as free-floating objects anyway, then add this:
list2env(L, .GlobalEnv)
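Working from the list also keeps follow-up operations simple. For example (a small sketch; "mydata" here stands for whatever one of your files was actually called):
# number of rows in each imported data frame
sapply(L, nrow)
# pull out a single data frame by its extension-free name
head(L[["mydata"]])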

We can use sub() to remove the .csv at the end:
for (i in 1:length(file_list)) {
  assign(sub("\\.csv$", "", basename(file_list[i])),
         read.csv(paste(base_dir, file_list[i], sep = "")))
}
Another option is tools::file_path_sans_ext():
for (i in 1:length(file_list)) {
  assign(tools::file_path_sans_ext(basename(file_list[i])),
         read.csv(paste(base_dir, file_list[i], sep = "")))
}
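For reference, tools::file_path_sans_ext() simply strips the final extension from a file name:
tools::file_path_sans_ext("mydata.csv")
# [1] "mydata"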
The error in the OP's code comes from applying substr() to the 'value' argument, i.e. the string passed to read.csv(), instead of to the 'x' argument, i.e. the object name. The corrected code would be:
for (i in 1:length(file_list)) {
  assign(substr(file_list[i], 1, nchar(file_list[i]) - 4),
         read.csv(file_list[i]))
}
Also, if the working directory is different from base_dir, it is better to specify full.names = TRUE so that read.csv() receives full paths:
file_list <- list.files(path = base_dir, pattern = "\\.csv$", full.names = TRUE)
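If you combine full.names = TRUE with the assign() loop, each element of file_list is then a full path, so derive the object name from the base name (a sketch combining the two ideas):
for (i in seq_along(file_list)) {
  assign(tools::file_path_sans_ext(basename(file_list[i])),
         read.csv(file_list[i]))
}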

Related

Loading Multiple .txt files where columns are separated by | character in R

I am attempting to load multiple text files into R and in each of the files, the columns are divided using the "|" character.
To give a sense of what the file structure looks like, a given row will look like:
congression printer|-182.431552949032
In this file I want to separate the congressional printer string from the numerical characters.
When using the following code:
folder <- '~/filepath'
file_list <- list.files(path=folder, pattern="*.txt")
data <-
  do.call('rbind',
          lapply(file_list,
                 function(x)
                   read.table(paste(folder, x, sep = ""),
                              header = TRUE, row.names = NULL)))
It'll load in the data as:
[1] [2]
congression printer|-182.431552949032
Is there a way to correct this afterwards using the tidyr::separate() function, or by avoiding the problem at the start? When I try to just put sep = "|" in the code above, it only affects how my text files are found, so that doesn't really work.
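For reference, the single pasted-together column can indeed be split after the fact with tidyr::separate(); a minimal sketch, assuming the data frame ends up with one character column and that "label" and "value" are just illustrative names:
library(tidyr)
data_split <- separate(data, col = 1, into = c("label", "value"),
                       sep = "\\|", convert = TRUE)
convert = TRUE turns the numeric part into a proper numeric column.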
Things are always easier (and more powerful) with data.table; fread() detects the "|" separator automatically:
library(data.table)
folder <- '~/filepath'
pathsList <- list.files(path = folder, pattern = "*.txt", full.names = TRUE)
rbindlist(lapply(pathsList, fread))
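If you also want to keep track of which file each row came from, rbindlist() can add an id column when the list elements are named (a small optional tweak):
tables <- setNames(lapply(pathsList, fread), basename(pathsList))
rbindlist(tables, idcol = "source_file")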
This works too, passing sep = "|" to read.table():
folder <- '~/filepath'
file_list <- list.files(path=folder, pattern="*.txt")
data <-
  do.call('rbind',
          lapply(file_list,
                 function(x)
                   read.table(paste0(folder, x), sep = "|",
                              header = TRUE, row.names = NULL)))

Appending a list in a loop (R)

I want to use a loop to read in multiple csv files and append a list in R.
path = "~/path/to/csv/"
file.names <- dir(path, pattern =".csv")
mylist = c()
for (i in 1:length(file.names)) {
  datatmp <- read.csv(file.names[i], header = TRUE, sep = ";", stringsAsFactors = FALSE)
  listtmp = datatmp[, 6]
  finallist <- append(mylist, listtmp)
}
finallist
For each csv file, the desired column has a different length.
In the end, I want to get the full appended list with all values in that certain column from all csv files.
I am fairly new to R, so I am not sure what I'm missing...
There are three errors in your approach.
First, file.names <- dir(path, pattern = ".csv") extracts just the file names, without the path, so read.csv() cannot find the files when you try to import them.
Building the path
You can build the full path with paste0():
path = "~/path/to/csv/"
file.names <- paste0(path, dir(path, pattern =".csv"))
Or with file.path(), which adds the slashes automatically:
path = "~/path/to/csv"
file.names <- file.path(path, dir(path, pattern =".csv"))
Another way to create the paths, which I find more convenient, is the one suggested in the answer Tung pointed to in the comments:
file.names <- list.files(path = "~/path/to/csv", recursive = TRUE,
pattern = "\\.csv$", full.names = TRUE)
This is better because, besides doing everything in one step, it also works in a directory containing files of various formats: the code above will match only the .csv files in the folder.
Importing, selecting and creating the list
The second error is in mylist <- c(). You want a list, but this creates a vector. The correct version is:
mylist <- list()
And the last error is inside the loop. Instead of creating a different object when appending, grow the same object created before the loop:
for (i in 1:length(file.names)) {
  datatmp <- read.csv(file.names[i], sep = ";", stringsAsFactors = FALSE)
  listtmp <- datatmp[, 6]
  mylist <- append(mylist, list(listtmp))
}
mylist
An easier and cleaner approach is to loop with lapply():
mylist <- lapply(file.names, function(x) {
  df <- read.csv(x, sep = ";", stringsAsFactors = FALSE)
  df[, 6]
})
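If what you ultimately want is one long vector of all the values rather than a list of per-file vectors, flatten the result afterwards:
finallist <- unlist(mylist, use.names = FALSE)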
Hope it helps!

Looping multiple pdf and converting to multiple excel using R programming

I have a few PDF files in a folder. I am performing certain operations and converting them into Excel files. Below is the code:
init <- dir(path = "C:/Users/sankirtanmoturi/Desktop/rloop", pattern = "\\.pdf$", all.files = TRUE, full.names = TRUE)
trans <- function(file){
  try <- pdf_text(file)
  try1 <- unlist(str_split(try, "[\\r\\n]+"))
  try2 <- str_split_fixed(str_trim(try1), "\\s{1,}, 20")
  write.xlsx(try2, sub("\\.xlsx$", "-UP.xlsx", file))
}
lapply(init, trans)
I am getting the below error:
Error in identical(n, Inf) : argument "n" is missing, with no default
I figured out that there's a problem with str_split or str_split_fixed.
But if I skip the loop and try a single file, it converts successfully.
Please help me run this for all the PDF files in a folder.
The problems are mainly typos in your code. The code below should work:
init <- dir(path = "C:/Users/sankirtanmoturi/Desktop/rloop", pattern = "\\.pdf$", all.files = TRUE, full.names = TRUE)
trans <- function(file){
  try <- pdf_text(file)
  try1 <- unlist(str_split(try, "[\\r\\n]+"))
  # n = 20 belongs outside the pattern string: str_split_fixed(string, pattern, n)
  try2 <- str_split_fixed(str_trim(try1), "\\s{1,}", 20)
  # the input files end in .pdf, so substitute that extension, not .xlsx
  write.xlsx(try2, sub("\\.pdf$", "-UP.xlsx", file))
}
lapply(init, trans)
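Note that the function relies on a few packages being loaded; a minimal preamble might look like this (openxlsx is one of several packages that provide write.xlsx()):
library(pdftools)   # pdf_text()
library(stringr)    # str_split(), str_split_fixed(), str_trim()
library(openxlsx)   # write.xlsx()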

Apply a script by looping through multiple files in a directory and copy the changes back to every corresponding file

I want to apply the below script to every file in the Weather directory and copy the changes back to the same csv file (Bladen.csv in this case).
Bladen <- read.csv("C:/Users//Desktop/Weather/Bladen.csv",header=T, na.strings=c("","NA"))
Bladen <- Bladen[,c(1,6,11,17,18,19)]
I would try something like this:
setwd('/address/to/the/path')
files <- dir()
for (i in files) {
  Bladen <- read.csv(i, header = TRUE, na.strings = c("", "NA"))
  Bladen <- Bladen[, c(1, 6, 11, 17, 18, 19)]
  write.csv(Bladen, i)
}
Please tell me if it works for you.
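One thing to watch: write.csv() writes row names as an extra unnamed column by default, so reading and rewriting the same files repeatedly keeps adding columns. Passing row.names = FALSE avoids that (a small adjustment to the last line of the loop):
write.csv(Bladen, i, row.names = FALSE)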
If you are looking to update each file in your directory by keeping the same subset of columns in each file and writing it back to the same directory:
setwd(set_your_path)
filenames <- list.files()
lapply(filenames, function(i){
  Bladen <- read.csv(i, sep = ",", header = TRUE, na.strings = c("NA", "N/A", "null", "", " "))
  Bladen <- Bladen[, c(1, 6, 11, 17, 18, 19)]
  # write.csv() ignores a sep argument; row.names = FALSE keeps the files round-trippable
  write.csv(Bladen, i, row.names = FALSE)
})

Combine csv files with common file identifier

I have a list of approximately 500 csv files each with a filename that consists of a six-digit number followed by a year (ex. 123456_2015.csv). I would like to append all files together that have the same six-digit number. I tried to implement the code suggested in this question:
Import and rbind multiple csv files with common name in R, but I want the appended data to be saved as new csv files in the same directory where the original files are saved. I have also tried to implement the code below; however, the csv files it produces contain no data.
rm(list=ls())
filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test")
NAPS_ID <- gsub('.+?\\([0-9]{5,6}?)\\_.+?$', '\\1', filenames)
Unique_NAPS_ID <- unique(NAPS_ID)
n <- length(Unique_NAPS_ID)
for(j in 1:n){
  curr_NAPS_ID <- as.character(Unique_NAPS_ID[j])
  NAPS_ID_pattern <- paste(".+?\\_(", curr_NAPS_ID, "+?)\\_.+?$", sep = "")
  NAPS_filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test", pattern = NAPS_ID_pattern)
  write.csv(do.call("rbind", lapply(NAPS_filenames, read.csv, header = TRUE)),
            file = paste("C:/Users/smithma/Desktop/PM25_test/MERGED", "MERGED_", Unique_NAPS_ID[j], ".csv", sep = ""),
            row.names = FALSE)
}
Any help would be greatly appreciated.
Because you're not doing any data manipulation, you don't need to treat the files like tabular data. You only need to copy the file contents.
filenames <- list.files("C:/Users/smithma/Desktop/PM25_test", full.names = TRUE)
NAPS_ID <- substr(basename(filenames), 1, 6)
Unique_NAPS_ID <- unique(NAPS_ID)
for (curr_NAPS_ID in Unique_NAPS_ID) {
  NAPS_filenames <- filenames[startsWith(basename(filenames), curr_NAPS_ID)]
  output_file <- paste0(
    "C:/Users/smithma/Desktop/PM25_test/MERGED_", curr_NAPS_ID, ".csv"
  )
  for (fname in NAPS_filenames) {
    line_text <- readLines(fname)
    # Write the header from the first file
    if (fname == NAPS_filenames[1]) {
      cat(line_text[1], '\n', sep = '', file = output_file)
    }
    # Append every line in the file except the header
    line_text <- line_text[-1]
    cat(line_text, file = output_file, sep = '\n', append = TRUE)
  }
}
My changes:
list.files(..., full.names = TRUE) is usually the best way to go.
Because the digits appear at the start of the filenames, I suggest substr. It's easier to get an idea of what's going on when skimming the code.
Instead of looping over the indices of a vector, loop over the values. It's more succinct and less likely to cause problems if the vector's empty.
startsWith and endsWith are relatively new functions, and they're great.
You only care about copying lines, so just use readLines to get them in and cat to get them out.
You might consider something like this:
## 'filenames' are assumed to be bare file names (no path), as in the question
## take the first 6 characters of each file name
six.digit.filenames <- substr(filenames, 1, 6)
path <- "C:/Users/smithma/Desktop/PM25_test/"
unique.numbers <- unique(six.digit.filenames)
for (j in unique.numbers) {
  sub <- filenames[which(substr(filenames, 1, 6) == j)]
  data.for.output <- c()
  for (file in sub) {
    ## now do your stuff with these files, including reading them in
    data <- read.csv(paste0(path, file))
    data.for.output <- rbind(data.for.output, data)
  }
  write.csv(data.for.output, paste0(path, j, '.csv'), row.names = FALSE)
}
