I have hundreds of excel files with a single column and a single sheet containing text. I am trying to write a loop that will 'Wrap Text' and align the single column in all of the files, preferably without reading the files into R.
I already set the style object as follows:
style <-
openxlsx::createStyle(
halign = "left",
valign = "center",
wrapText = T
)
I have tried both a for loop and lapply, but both only apply openxlsx::addStyle to one file out of the hundreds. It doesn't have to be openxlsx; it can be XLConnect or any other package for xlsx files. Even VBA is welcome, if I can call it from R.
Please help.
Thanks in advance.
This will probably be pretty slow and will most likely require reading the files into R, so I'm not sure how much this helps.
Libraries
library(openxlsx)
Find files
First you need a list of all the excel files you have:
xlsx_paths <- list.files(path = "./folder_with_yr_excels", pattern = "xlsx$")
This will create a vector of all the .xlsx files you have in the folder.
Write function
Then we can write a function to do what you want to a single file:
text_wrapper <- function(xlsx_path){
  # this loads the file into R using the openxlsx package
  n3 <- openxlsx::loadWorkbook(file = xlsx_path)
  # this creates the style that you wanted:
  style <-
    openxlsx::createStyle(
      halign = "left",
      valign = "center",
      wrapText = TRUE
    )
  # this adds the style to the workbook we just loaded
  openxlsx::addStyle(n3, sheet = 1, cols = 1:400, rows = 1:400, style, gridExpand = TRUE)
  # this removes the .xlsx part from the path name
  xlsx_path2 <- sub(pattern = ".xlsx",
                    replacement = "",
                    x = xlsx_path)
  # This is the naming standard I'll use:
  # "original_file_name" -> "original_file_name_reformatted.xlsx"
  new_path <- paste(xlsx_path2, "_reformatted", ".xlsx", sep = "")
  # this saves the reformatted excel file
  openxlsx::saveWorkbook(n3, file = new_path, overwrite = TRUE)
}
Notes
For other people coming across this post, here's a more in-depth description of the openxlsx R package and some of the formatting things that can be done with it: https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf
An annoying thing about this package is that you have to specify how many rows and columns you want to apply the style to, which becomes annoying when you don't know how many rows and columns you have. The not-so-great workaround is to specify a large number of rows and columns (in this case I did 400):
openxlsx::addStyle(n3, sheet = 1, cols = 1:400, rows = 1:400, style, gridExpand = TRUE)
As of the time of posting, it sounds like there's not a better solution: https://github.com/awalker89/openxlsx/issues/439
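If reading the data once per file is acceptable, one possible workaround (just a sketch, not tested on your files) is to read the sheet from the loaded workbook only to learn its dimensions, then style exactly that range instead of a hard-coded 400:
used <- openxlsx::read.xlsx(n3, sheet = 1, colNames = FALSE, skipEmptyRows = FALSE)
# nrow()/ncol() of the read-in data give the used range of the sheet
openxlsx::addStyle(n3, sheet = 1, style,
                   rows = 1:nrow(used), cols = 1:ncol(used),
                   gridExpand = TRUE)
This does read the file contents into R, which the question hoped to avoid, so the 400-row guess may still be the more practical option.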
Apply function to files
Anyways, the final step is to apply the function we wrote to all the excel files we found.
lapply(paste("./folder_with_yr_excels", xlsx_paths, sep = "/"), text_wrapper)
Since that was done inside of a function, we don't have to go back and delete intermediate data files. Yay!
Notes
The paste("./folder_with_yr_excels", xlsx_paths, sep = "/") step adds the folder name back to the path name (note the "/" separator, otherwise the folder and file names run together). There's an option in list.files() to keep the whole file path intact, but I like to keep track of which folder I'm dealing with by pasting the folder name back on at the end.
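For example, a small sketch of that list.files() option: with full.names = TRUE the paste() step isn't needed at all.
xlsx_paths <- list.files(path = "./folder_with_yr_excels",
                         pattern = "xlsx$", full.names = TRUE)
lapply(xlsx_paths, text_wrapper)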
Related
I have to save at least three different lists in three different Excel files at the same time. In this case, the names of the lists that will become files change only in the year, like this:
fluminense_2011
fluminense_2012
fluminense_2013
With this in mind, I want to make a loop that can automate the process of saving the lists to Excel files, but I don't know how to do that. I was trying to save the list names in one vector and then trying to apply the following code:
data_names <- c("fluminense_2011", "fluminense_2012" , "fluminense_2014")
for(i in 2:length(data_names)) {
  write.xlsx2(get(data_names[i]), paste0(my_path, "fluminense_bruto"),
              row.names = FALSE, sheetName = data_names[i], append = TRUE)
}
The problem is that I don't know how to change the code above to adapt it to my problem.
Take a look at this answer.
dfs <- c("iris", "cars")
lapply(dfs, function(x) xlsx::write.xlsx2(eval(as.symbol(x)), paste0(my_path, x, ".xlsx"),
                                          row.names = FALSE, sheetName = x, append = TRUE))
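Adapted to the question's objects, that would look something like this (a sketch; it assumes fluminense_2011 and the others already exist in your environment and that my_path ends in a "/"):
data_names <- c("fluminense_2011", "fluminense_2012", "fluminense_2013")
lapply(data_names, function(x) {
  xlsx::write.xlsx2(get(x), paste0(my_path, x, ".xlsx"),
                    row.names = FALSE, sheetName = x)
})
get(x) looks up each object by its name, so the loop covers the whole vector, including the first element (the original for loop started at 2 and skipped it).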
I have a folder (folder 1) containing multiple csv: "x.csv", "y.csv", "z.csv"...
I want to extract the 3rd column of each file and then write new csv files in a new folder (folder 2). Hence, folder 2 must contain "x.csv", "y.csv", "z.csv"...(but with just the 3rd column).
I tried this:
dfiles <- list.files(pattern =".csv") #if you want to read all the files in working directory
lst2 <- lapply(dfiles, function(x) (read.csv(x, header=FALSE)[,3]))
But I got this error:
Error in `[.data.frame`(read.csv(x, header = FALSE), , 3) :
undefined columns selected
Moreover, I don't know how to write multiple csv.
However, if I do this with one file, it works properly, although the output ends up in the same folder:
essai <-read.csv("x.csv", header = FALSE, sep = ",")[,3]
write.csv (essai, file = "x.csv")
Any help would be appreciated.
Here's how I would do it. There may be a nicer and more efficient way, but it should still work pretty well.
setwd("~/stackexchange") #set your main folder. Best way to do this is actually the here() package. But that's another topic.
library(tools) #for file extension tinkering
folder1 <- "folder1" #your original folder
folder2 <- "folder2" #your new folder
# I set up a function and loop over it with lapply.
write_to <- function(file.name){
  file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
  essai <- read.csv(paste(folder1, file.name, sep = "/"), header = FALSE, sep = ",")[, 3]
  write.csv(essai, file = paste(folder2, file.name, sep = "/"))
}
# get file names from folder 1
dfiles <- list.files(path=folder1, pattern ="*.csv") #if you want to read all the csv files in folder1 directory
lapply(X = paste(folder1, dfiles, sep="/"), write_to)
Have fun!
Btw: if you have many files, you could use data.table::fread and data.table::fwrite, which improve csv reading/writing speed by a lot.
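For what it's worth, here is a minimal sketch of the same function using data.table instead of read.csv()/write.csv() (it assumes every file has at least three columns):
library(data.table)
write_to_dt <- function(file.name){
  file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
  # fread() returns a data.table; [[3]] extracts the third column as a vector
  essai <- data.table::fread(file.path(folder1, file.name), header = FALSE)[[3]]
  data.table::fwrite(data.table(V3 = essai), file.path(folder2, file.name))
}
lapply(paste(folder1, dfiles, sep = "/"), write_to_dt)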
First of all, from the error message it seems that some of the csv files have fewer than 3 columns. Check that you are reading the correct files and that all of them are supposed to have at least 3 columns.
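A quick way to check that (just a sketch) is to count the columns of each file first and flag the ones that fall short:
ncols <- sapply(dfiles, function(x) ncol(read.csv(x, header = FALSE)))
names(ncols)[ncols < 3]  # files with fewer than 3 columns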
Once you do that you can use the below code, to read the csv file, select the 3rd column and write the csv file in 'folder2'.
lapply(dfiles, function(x) {
df <- read.csv(x, header = FALSE)
write.csv(subset(df, select = 3), paste0('folder2/', x), row.names = FALSE)
})
For the "write" portion of this question, I had some luck using map2() in purrr. I'm not sure this is the most elegant solution but here it goes:
listofessais # this is your .csv files together as a named list of tbls
map2(listofessais, names(listofessais), ~ write_csv(.x, glue("FilePath/{.y}.csv")))
That should give you all your .csv files exported in that folder, and named with the same names they were given in the list.
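For completeness, a runnable version of that idea (a sketch: it assumes purrr, readr and glue are installed, that listofessais is a named list of data frames, and that the FilePath/ folder already exists):
library(purrr)
library(readr)
library(glue)
# walk2() is the side-effect twin of map2(); either works here
walk2(listofessais, names(listofessais),
      ~ write_csv(.x, glue("FilePath/{.y}.csv")))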
I am pretty new to R and programming, so I apologize if this question has been asked elsewhere.
I'm trying to load multiple .csv files, edit them, and save them again, but I cannot figure out how to manage more than one .csv file or how to name the new files based on a list of character strings.
So I have a .csv file and can do:
species_name <- 'ace_neg'
{
  species <- read.csv('species_data/ace_neg.csv')
  species_1_2 <- species[, 1:2]
  species_1_2$species <- species_name
  species_3_2_1 <- species_1_2[, c(3, 1, 2)]
  write.csv(species_3_2_1, file = 'ace_neg.csv', row.names = FALSE)
}
But I would like to run this code for all .csv files in the folder and add text to a new column based on each .csv file name.
So I can load all .csv files and make a list of character strings to use as the new column text and as new file names.
NDOP_files <- list.files(path="species_data", pattern="*.csv$", full.names=TRUE, recursive=FALSE)
short_names<- substr(NDOP_files, 14,20)
Then I tried:
lapply(NDOP_files, function(x){
  species <- read.csv(x)
  species_1_2 <- species[, 1:2]
  species_1_2$species <- 'name' # I don't know how to insert the first character string of short_names instead of 'name', then the second string from short_names for the second csv file, etc.
Then I continue in the code to change the order of columns:
  species_3_2_1 <- species_1_2[, c(3, 1, 2)]
And then I want to write all the new modified .csv files and name them again using the list of short_names.
I'm sorry if the text is somewhat confusing.
Any help or suggestions would be great.
You are actually quite close, and using lapply() is a really good idea.
As you state, the issue is that it only takes one list as an argument,
but you want to work with two. mapply() is a function in base R that you can feed multiple lists into and cycle through synchronously. lapply() and mapply() are both designed to create/manipulate objects in R, but you want to write files and are not interested in the output within R. The purrr package has the walk*() functions, which are useful
when you want to cycle through lists and are only interested in creating
side effects (in your case, saving files).
purrr::walk2() takes two lists, so you can provide the data and the
file names at the same time.
library(purrr)
First I create some example data (I'm basically already using the same concept here as I will below):
test_data <- map(1:5, ~ data.frame(
a = sample(1:5, 3),
b = sample(1:5, 3),
c = sample(1:5, 3)
))
walk2(test_data,
paste0("species_data/", 1:5, "test.csv"),
~ write.csv(.x, .y))
Instead of getting the file paths and then stripping away the path
to get the file names, I just call list.files(), once with full.names = TRUE and once with full.names = FALSE.
NDOP_filepaths <-
list.files(
path = "species_data",
pattern = "*.csv$",
full.names = TRUE,
recursive = FALSE
)
NDOP_filenames <-
list.files(
path = "species_data",
pattern = "*.csv$",
full.names = FALSE,
recursive = FALSE
)
Now I feed the two lists into purrr::walk2(). Using the ~ before
the curly brackets, I can define the anonymous function a bit more elegantly
and then use .x and .y to refer to the entries of the first and the
second list.
walk2(NDOP_filepaths,
NDOP_filenames,
~ {
species <- read.csv(.x)
species <- species[, 1:2]
species$species <- gsub(".csv", "", .y)
write.csv(species, .x)
})
Learn more about purrr at purrr.tidyverse.org.
Alternatively, you could just extract the file name in the loop and stick to lapply() or use purrr::map()/purrr::walk(), like this:
lapply(NDOP_filepaths,
       function(x) {
         species <- read.csv(x)
         species <- species[, 1:2]
         species$species <- gsub("species_data/|.csv", "", x)
         write.csv(species, gsub("species_data/", "", x))
       })
NDOP_files <- list.files(path="species_data", pattern="*.csv$",
full.names=TRUE, recursive=FALSE)
# Get name of each file (without the extension)
# basename() removes all of the path up to and including the last path separator
# file_path_sans_ext() removes the .csv extension
csvFileNames <- tools::file_path_sans_ext(basename(NDOP_files))
Then, I would write a function that takes in 1 csv file and does some manipulation to the file and outputs out a data frame. Since you have a list of csv files from using list.files, you can use the map function in the purrr package to apply your function to each csv file.
doSomething <- function(NDOP_file){
  # your code here to manipulate NDOP_file to your liking
  return(NDOP_file)
}

NDOP_files <- map(NDOP_files, ~ doSomething(.x))
Lastly, you can manipulate the file names when you write the new csv files using csvFileNames and a custom function you write to change the file name to your liking. Essentially, use the same architecture of defining your custom function and using map to apply to each of your files.
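As a sketch of that last step (rename_file() below is a hypothetical naming rule, and the output folder is assumed to already exist):
library(purrr)
rename_file <- function(name) paste0(name, "_edited")  # placeholder naming rule
walk2(NDOP_files, csvFileNames,
      ~ write.csv(.x, file.path("species_data", paste0(rename_file(.y), ".csv")),
                  row.names = FALSE))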
I have 3 xlsx files with 7 sheets each. Each morning I have to delete the contents of each sheet to prepare for importing new data.
I want to automate the process using R. I can do this with Excel Macro, but an R script is what I want (that way I don't need to wait longer as macro enabled files are way slower).
I only want to get blank sheets while keeping the sheetnames, and all formatting (I have formatted all cells as Text, and cells have particular widths). Solutions using openxlsx package are more appreciated.
EDIT:
The sheets contain about 15 columns and at most 200 rows. Still, non-vectorized solutions are a bit slow.
You can use the deleteData() function of openxlsx.
library(openxlsx)
wkbook <- loadWorkbook(file = "test.xlsx")
deleteData(wkbook, sheet = 1, cols = 1:100, rows = 1:100000, gridExpand = TRUE)
After deletion, you can use the writeData() function to insert/append data to the existing file. Here I'm clearing rows up to 100000 and columns up to 100. You can identify the last row and last column of the workbook and delete accordingly.
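A sketch of the whole morning routine, looping over every sheet and saving the result (I believe deleteData() leaves the cell formatting in place, but it is worth confirming on a copy of one file first; the file names in the last line are placeholders for your three workbooks):
library(openxlsx)
clear_workbook <- function(path) {
  wkbook <- loadWorkbook(file = path)
  for (s in names(wkbook)) {  # names() of a Workbook are the sheet names
    deleteData(wkbook, sheet = s, cols = 1:100, rows = 1:100000, gridExpand = TRUE)
  }
  saveWorkbook(wkbook, path, overwrite = TRUE)
}
lapply(c("file1.xlsx", "file2.xlsx", "file3.xlsx"), clear_workbook)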
This seems like a bit of a hack, but you can simply write an empty string to each cell that is in use and then save the resulting workbook.
library(openxlsx)
wb = loadWorkbook("Test.xlsx")
for(ss in seq_along(wb$worksheets)) {
  Rows = wb$worksheets[[ss]]$sheet_data$rows
  Cols = wb$worksheets[[ss]]$sheet_data$cols
  for(i in seq_along(Rows)) {
    writeData(wb, ss, "", startCol = Cols[i], startRow = Rows[i])
  }
}
saveWorkbook(wb, "Test2.xlsx", overwrite = TRUE)
However, why can't you just make the empty spreadsheet once as a template and then make a copy of it every time you need to use it?
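A sketch of that template idea (assuming you save a blank, formatted copy once as "Template.xlsx"):
file.copy("Template.xlsx", "Test.xlsx", overwrite = TRUE)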
I have several hundred xls files that have incorrect data in them.
I need to open them, make corrections, and save them.
Making corrections is a trivial matter, so I have already written the code for that; the trick is that each sheet has one table which starts on row 3, and the first two rows contain a legal header.
I am accustomed to using the readxl package, but it does not provide tools for saving spreadsheets. So today I have been experimenting with the xlsx package, but I'm not quite sure how to make it work.
In readr I was able to use the following to write csv files with a disclaimer:
write_csv(Disclaimer, filepath,col_names = FALSE)
write_csv(my.data.frame,filepath, col_names = TRUE,append =TRUE)
In xlsx this doesn't work:
write.xlsx(filepath,Disclaimer,"Sheet1",col.names = FALSE)
write.xlsx(filepath,my.data.frame,"Sheet1",col.names = TRUE,append=TRUE)
This yields a Java error:
java.lang.IllegalArgumentException: The workbook already contains a sheet of this name
So my question is: how can you write an xls file (alternatively xlsx, but not csv) such that it has a header above the actual table?
For the record, my workbooks have two sheets; both sheets have the disclaimer, but they contain different tables.
I really wish I could move away from Excel files, but I need to maintain the original format.
I'm also open to using other packages, but I'm not familiar with others (researching XLConnect as we speak).
I figured it out!!!
xlsx has other low-level functions that let you build each sheet piece by piece.
See my test code below:
wb <- createWorkbook(type = "xls")
sh1 <- createSheet(wb, sheetName = "Sheet1")
addDataFrame(data.frame("Disclaimer" = c("Disclaimer")), sheet = sh1, row.names = FALSE, startRow = 1, col.names = FALSE)
addDataFrame(data.frame("Col1" = c(1, 2, 3), "Col2" = 4:6), sheet = sh1, row.names = FALSE, startRow = 2)
sh2 <- createSheet(wb, sheetName = "Sheet2")
addDataFrame(data.frame("Disclaimer" = c("Disclaimer")), sheet = sh2, row.names = FALSE, startRow = 1, col.names = FALSE)
addDataFrame(data.frame("Col1" = c(1, 2, 3), "Col2" = 4:6), sheet = sh2, row.names = FALSE, startRow = 2)
saveWorkbook(wb, "test_wb.xls")
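Applied to the original problem, the same low-level calls can be wrapped in a function and run over all the files. This is only a sketch: readxl is used for reading (skipping the two header rows), and fix_table() is a placeholder for the corrections already written.
library(readxl)
library(xlsx)
rewrite_file <- function(path, fix_table) {
  wb <- createWorkbook(type = "xls")
  for (sheet_name in excel_sheets(path)) {
    # first two rows hold the legal disclaimer; the table starts on row 3
    disclaimer <- read_excel(path, sheet = sheet_name, n_max = 2, col_names = FALSE)
    tbl <- fix_table(read_excel(path, sheet = sheet_name, skip = 2))
    sh <- createSheet(wb, sheetName = sheet_name)
    addDataFrame(as.data.frame(disclaimer), sheet = sh,
                 row.names = FALSE, col.names = FALSE, startRow = 1)
    addDataFrame(as.data.frame(tbl), sheet = sh, row.names = FALSE, startRow = 3)
  }
  saveWorkbook(wb, path)
}
Note that this rebuilds each file from scratch, so any cell formatting in the originals is lost; if the formatting matters, writing into the loaded original with XLConnect's writeWorksheet() may be the safer route.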