I am batch-converting over 200 .csv files to .xlsx format using write.xlsx() from the openxlsx package. I need to do this for upload to a database that will only accept .xlsx files. I use the following to convert the list of files:
library(openxlsx)

filenames <- list.files("C:/split files", pattern = "\\.csv$", full.names = TRUE)
for (i in filenames) {
  a <- read.csv(i)
  new_name <- sub(".csv", ".xlsx", i, fixed = TRUE)
  write.xlsx(a, new_name, row.names = FALSE)
}
The problem I have is that headers which originally had spaces in their names (again, the format required by the database) now have "." where the spaces used to be. Is there a simple way to extend the above code to replace the "." with " "?
Try
read.csv(i, check.names = FALSE)
You got the "." because read.csv() runs make.names() on the column names by default, which replaces spaces (and other characters that are invalid in R names) with ".". Setting check.names = FALSE preserves the original names.
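As a quick, self-contained illustration of the difference (the two-column CSV here is just a throwaway example written to a temp file):

```r
# Write a small CSV whose header contains spaces
tmp <- tempfile(fileext = ".csv")
writeLines(c('"First Name","Last Name"', 'Ada,Lovelace'), tmp)

# Default: read.csv() runs make.names() on the header, so spaces become "."
names(read.csv(tmp))                        # "First.Name" "Last.Name"

# With check.names = FALSE the original header survives intact
names(read.csv(tmp, check.names = FALSE))   # "First Name" "Last Name"

unlink(tmp)
```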
Related
I have a folder (folder 1) containing multiple csv files: "x.csv", "y.csv", "z.csv"...
I want to extract the 3rd column of each file and then write new csv files in a new folder (folder 2). Hence, folder 2 must contain "x.csv", "y.csv", "z.csv"...(but with just the 3rd column).
I tried this:
dfiles <- list.files(pattern = "\\.csv$") # if you want to read all the csv files in the working directory
lst2 <- lapply(dfiles, function(x) (read.csv(x, header=FALSE)[,3]))
But I got this error:
Error in `[.data.frame`(read.csv(x, header = FALSE), , 3) :
undefined columns selected
Moreover, I don't know how to write multiple csv.
However, if I do this with one file, it works properly, although the output ends up in the same folder:
essai <- read.csv("x.csv", header = FALSE, sep = ",")[, 3]
write.csv(essai, file = "x.csv")
Any help would be appreciated.
So here's how I would do it. There may be a nicer, more efficient way, but this should still work pretty well.
setwd("~/stackexchange") # set your main folder. The best way to do this is actually the here() package. But that's another topic.
library(tools) # for file extension tinkering

folder1 <- "folder1" # your original folder
folder2 <- "folder2" # your new folder

# I set up a function and loop over it with lapply().
write_to <- function(file.name) {
  # reduce a full path to the bare file name, then re-attach the .csv extension
  file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
  essai <- read.csv(file.path(folder1, file.name), header = FALSE, sep = ",")[, 3]
  write.csv(essai, file = file.path(folder2, file.name))
}

# get file names from folder1
dfiles <- list.files(path = folder1, pattern = "\\.csv$") # all the csv files in the folder1 directory
lapply(file.path(folder1, dfiles), write_to)
Have fun!
Btw: if you have many files, you could use data.table::fread() and data.table::fwrite(), which speed up csv reading/writing by a lot.
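A minimal sketch of that variant, assuming data.table is installed (folder1, folder2, and the sample csv here are stand-ins for the objects in the answer above):

```r
library(data.table)

# stand-ins for folder1/folder2 and their contents
folder1 <- tempfile("folder1_"); dir.create(folder1)
folder2 <- tempfile("folder2_"); dir.create(folder2)
fwrite(data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9),
       file.path(folder1, "x.csv"), col.names = FALSE)

write_to_fast <- function(file.name) {
  file.name <- basename(file.name)
  # fread() is a much faster drop-in for read.csv(); keep only column 3
  essai <- fread(file.path(folder1, file.name), header = FALSE)[, 3]
  fwrite(essai, file.path(folder2, file.name))
}

dfiles <- list.files(path = folder1, pattern = "\\.csv$")
invisible(lapply(file.path(folder1, dfiles), write_to_fast))
file.exists(file.path(folder2, "x.csv"))  # TRUE
```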
First of all, from the error message it seems that some of the csv files have fewer than 3 columns. Check whether you are reading the correct files and whether all of them are supposed to have at least 3 columns.
Once you do that, you can use the code below to read each csv file, select the 3rd column, and write the csv file to 'folder2'.
lapply(dfiles, function(x) {
df <- read.csv(x, header = FALSE)
write.csv(subset(df, select = 3), paste0('folder2/', x), row.names = FALSE)
})
For the "write" portion of this question, I had some luck using map2() from purrr. I'm not sure this is the most elegant solution, but here it goes:
listofessais # this is your .csv files together as a named list of tbls
map2(listofessais, names(listofessais), ~ write_csv(.x, glue("FilePath/{.y}.csv")))
That should give you all your .csv files exported in that folder, and named with the same names they were given in the list.
I need to summarise a .csv file in R that contains 12 GB of data. I divided the .csv into multiple files so I can load them, but to do this I have to read each file and free it from memory before reading the next one. How can I do that? Is calling rm() inside the loop enough?
EDIT:
The solution I thought of was this:
files <- list.files(path = "/path", pattern = "\\.csv$")
for (i in seq_along(files)) {
  temp <- read.csv(files[i], sep = ";", header = TRUE)
  dosomething(temp)
  rm(temp)
}
But I don't know whether this actually removes the variables from RAM, so that I can get through all the files.
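To make the question concrete, here is a self-contained version of that loop with a placeholder dosomething() (a follow-up gc() after rm() is the usual way to prompt R to actually release the memory):

```r
# two throwaway csv files standing in for the real splits
files <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
for (f in files) write.csv(data.frame(x = 1:5), f, row.names = FALSE)

dosomething <- function(df) nrow(df)  # placeholder for the real summary step
summaries <- list()

for (i in seq_along(files)) {
  temp <- read.csv(files[i])
  summaries[[i]] <- dosomething(temp)
  rm(temp)   # drop the binding so the data.frame becomes collectable
  gc()       # ask R to release the memory before the next iteration
}

exists("temp")  # FALSE: the object is gone after each iteration
```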
I am trying to automatically download a bunch of zip files using R. These archives contain a wide variety of files; I only need to load one of them as a data.frame to post-process it. It has a unique name, so I could catch it with str_detect(). However, using tempfile(), I cannot get a list of the files within the archive using list.files().
This is what I've tried so far:
temp <- tempfile()
download.file("https://url/file.zip", destfile = temp)
files <- list.files(temp) # this is where I only get "character(0)"
# After, I'd like to use something along the lines of:
data <- read.table(unz(temp, files[str_detect(files, "^file123.txt")]), header = TRUE, sep = ";")
unlink(temp)
I know that the read.table() command probably won't work, but I think I'll be able to figure that out once I get a vector with the list of the files within temp.
I am on a Windows 7 machine and I am using R 3.6.0.
Following what was said before, this structure should allow you to check the correct download with a temporary file. Note that list.files() only lists directories on disk; to list the contents of a zip archive without extracting it, use unzip(list = TRUE):
temp <- tempfile(fileext = ".zip")
download.file("https://url/file.zip", destfile = temp)
files <- unzip(temp, list = TRUE)$Name
I have a folder with numerous xlsx files that all need to be formatted in exactly the same way. I want to read them into R and store them in a list whose elements can be referenced by xlsx file name, so that I can feed each one through my formatting code. This is the code I found; it labels them based on the iteration value in the for loop.
library("xlsx")
library("gdata")
library("rJava")
setwd("C:/Users/Owner/Desktop/FolderDatabase")
getwd()
files <- list.files(pattern = "\\.xlsx$")
#View(files)
dfList <- list()
for (i in seq_along(files)) {
  dfList[[paste0("excel", i)]] <- read.xlsx(files[i], sheetIndex = 1)
}
# Calling the xlsx lists that were created from the directory
dfList$excel1
dfList$excel2
dfList$excel3
dfList$excel4
If the xlsx file is named myname1.xlsx, I would like the list to be named myname1.
Rather than initializing dfList as empty and filling it in a for loop, try a non-for approach:
dfList <- lapply( files, read.xlsx, sheetIndex = 1)
names(dfList) <- gsub("^.+/|\\.xlsx", "", files)
Or just:
dfList <- sapply(files, read.xlsx, sheetIndex = 1) # note: the list names will keep the .xlsx extension
The first part of that two-part pattern ("^.+/") is in there because I usually work with full file paths, although in your case it's probably not needed. The second part of the "OR" ("|"), "\\.xlsx", is the part that is needed.
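To see what the pattern does, here is its effect on a couple of representative inputs (the second path is a made-up example with a directory prefix):

```r
files <- c("myname1.xlsx", "data/reports/myname2.xlsx")

# "^.+/" strips everything up to the last "/"; "\\.xlsx" strips the extension
gsub("^.+/|\\.xlsx", "", files)
# "myname1" "myname2"
```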
I have a short loop I used to create several .csv files. The loop works and the files are created correctly, and I can open them in Microsoft Excel. Everything looks great. But when I try to read these files back into R in another script, R doesn't recognize them.
Do you need to turn off some sort of graphics device inside the loop, as you would if you were creating several png files?
Here is the loop (it works). For reference, dtlm is a large data frame with several columns including "diag", "county", "date" and "Freq".
single <- c("492", "493", "427", "428", "786")
for (q in seq_along(single)) {
  xx <- xtabs(~ date + county, data = dtlm, subset = dtlm$diag == single[q])
  xy <- as.data.frame(xx)
  write.csv(xy, paste0(single[q], ".csv"))
}
Now here is an example of a command with which R can't find the file:
dt <- read.csv("C:/Users/myname/Desktop/FreqTables/492.csv")
So weird! I have also tried read.table and that didn't work either, and I didn't find anything helpful in ?read.csv. Any suggestions would be greatly appreciated!
This is how I normally do it.
## store the path in some object, here 'dir'
dir <- "[path to the folder where you have your data]"

## then pick up the file names from 'dir'; change the filter as needed
fnames <- list.files(path = dir, pattern = "\\.csv$")

## read the data into a list
dfn <- list()
for (string in fnames) {
  dfn[[string]] <- read.csv(file.path(dir, string))
}
You can probably do it in fewer lines, but this works for me.
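For what it's worth, the read-into-a-named-list step can indeed be done in fewer lines; a self-contained sketch (the temp directory and toy csv files are just for illustration):

```r
# toy setup: two small csv files in a temp directory
dir <- tempfile("csvdir_"); dir.create(dir)
write.csv(data.frame(ID = 1:2, a = 3:4), file.path(dir, "x.csv"), row.names = FALSE)
write.csv(data.frame(ID = 1:2, b = 5:6), file.path(dir, "y.csv"), row.names = FALSE)

# read all csv files into a list named after the files, in two lines
fnames <- list.files(path = dir, pattern = "\\.csv$")
dfn <- setNames(lapply(file.path(dir, fnames), read.csv), fnames)

names(dfn)  # "x.csv" "y.csv"
```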
If you'd like to merge the files together, you can use something like this. Note that I bring the file names in from the csv files.
## remove the .csv suffix
names(dfn) <- sub("\\.csv$", "", names(dfn))
## merging the data frames together (traditional)
DF <- dfn[[1]]
for (i in seq_along(dfn)[-1]) {
  DF <- merge(DF, dfn[[i]], by = "ID", all.x = TRUE,
              suffixes = c("", paste0(":", names(dfn)[i])))
}
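An equivalent, more compact way to express that chain of merges is base R's Reduce(); shown here on two toy data frames so it is self-contained:

```r
# toy stand-in for the list of data frames read from the csv files
dfn <- list(
  x = data.frame(ID = 1:3, a = c(10, 20, 30)),
  y = data.frame(ID = 1:3, b = c("u", "v", "w"))
)

# fold merge() over the whole list, keeping all rows of the left side
DF <- Reduce(function(left, right) merge(left, right, by = "ID", all.x = TRUE), dfn)
DF
#   ID  a b
# 1  1 10 u
# 2  2 20 v
# 3  3 30 w
```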
Let me know if this works for you.
Best,
Eric