Save a lot of Excel files as rda using R

I have 1000 Excel files named "1.xlsx", "2.xlsx", ..., "1000.xlsx". How can I write a loop to save them as "1.rda", "2.rda", ..., "1000.rda" without writing this code 1000 times?
j1 <- read.xlsx("1.xlsx",1)
save(j1, file = "j1.rda")
Thanks a lot

Does this work?
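library(tidyverse) # for walk2()
library(xlsx)      # for read.xlsx(); openxlsx::read.xlsx(file, 1) also works
xlsx_to_rda <- function(inputname, outputname){
  # save() needs a named object, not an expression, so assign the data first
  j1 <- read.xlsx(inputname, 1)
  save(j1, file = outputname)
}
walk2(paste0(1:1000, ".xlsx"),
      paste0(1:1000, ".rda"),
      xlsx_to_rda)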
By the way, .rds would be a better file format, because it stores just one R object: write it with saveRDS() and read it back under any name you like with readRDS().
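A minimal sketch of the .rds variant, assuming the readxl package is installed and the numbered files sit in the working directory. Numbers with no matching file are skipped, and rds_name() is a hypothetical helper introduced only for this example.

```r
# Sketch (assumes readxl is installed): convert 1.xlsx ... 1000.xlsx in the
# working directory to 1.rds ... 1000.rds, one data frame per file.
rds_name <- function(xlsx_file) sub("\\.xlsx$", ".rds", xlsx_file)

xlsx_to_rds <- function(inputname, outputname) {
  # read the first sheet and store the data frame as a single, unnamed object
  saveRDS(readxl::read_excel(inputname, sheet = 1), file = outputname)
}

xlsx_files <- paste0(1:1000, ".xlsx")
xlsx_files <- xlsx_files[file.exists(xlsx_files)]  # skip numbers with no file
for (f in xlsx_files) {
  xlsx_to_rds(f, rds_name(f))
}

# read any file back under whatever name you like, e.g.
# j1 <- readRDS("1.rds")
```

Because readRDS() returns the object instead of restoring a stored name, loading several files never silently overwrites a variable the way load() can.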

Related

How to convert xlsx files to csv files in RStudio? Need to convert multiple workbooks all with multiple spreadsheets

I'm trying to write an R script that converts multiple xlsx workbooks within a folder, writing each sheet of each workbook to a separate csv file.
I'm looking for a single script that automatically applies the code to all workbooks and their sheets.
For reading Excel files, there are several packages.
I personally am happy with the xlsx package, which you can use to read Excel files, as well as their individual sheets. This article looks like it will give you the gist of it.
Each worksheet you read out can then be exported to a CSV file with R's built-in write.csv (or write.csv2) function.
Below is an example to convert a single xlsx workbook to multiple csv files.
Note that type conversions are not guaranteed to be correct.
xlsx_path <- "path_to_xlsx.xlsx"
sheet_names <- readxl::excel_sheets(xlsx_path)
# read from all sheets to a list of data frames
xlsx_data <- purrr::map(
  sheet_names,
  ~readxl::read_excel(xlsx_path, .x, col_types = "text", col_names = FALSE)
)
# write each data frame to its own csv file, named after workbook and sheet
csv_base <- tools::file_path_sans_ext(xlsx_path)
purrr::walk2(
  xlsx_data, sheet_names,
  ~readr::write_csv(.x, paste0(csv_base, "-", .y, ".csv"), col_names = FALSE)
)
# csv files will be saved as:
# path_to_xlsx-sheet1.csv, path_to_xlsx-sheet2.csv, ...
If you need to apply this to many xlsx files, use list.files() to get the paths to all of them, then write a for loop or use another map function to repeat the process for each file.
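A minimal sketch of that loop, assuming the workbooks sit in a folder called excel_dir (a hypothetical name) and reusing the same readxl and readr calls as above. The function name xlsx_sheets_to_csv is also just for illustration.

```r
# Sketch: write every sheet of every .xlsx file in a folder to its own csv.
# "excel_dir" is a hypothetical folder name; change it to your own path.
xlsx_sheets_to_csv <- function(xlsx_path) {
  sheet_names <- readxl::excel_sheets(xlsx_path)
  base <- tools::file_path_sans_ext(xlsx_path)
  for (sheet in sheet_names) {
    df <- readxl::read_excel(xlsx_path, sheet = sheet,
                             col_types = "text", col_names = FALSE)
    readr::write_csv(df, paste0(base, "-", sheet, ".csv"), col_names = FALSE)
  }
  # return the csv paths that were written
  invisible(paste0(base, "-", sheet_names, ".csv"))
}

excel_dir <- "excel_dir"
workbook_files <- list.files(excel_dir, pattern = "\\.xlsx$", full.names = TRUE)
for (f in workbook_files) xlsx_sheets_to_csv(f)
```

The csv files land next to each workbook; if two workbooks share a base name and sheet name, the later one overwrites the earlier csv.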
If you are using RStudio, you may already have the readxl package installed. Its documentation explains many workflows for common use cases here: https://readxl.tidyverse.org/articles/articles/readxl-workflows.html
They also provide this nice code snippet to do what you are asking for:
library(readxl)
# read one sheet, write it to "<workbook>-<sheet>.csv", and return the data
read_then_csv <- function(sheet, path) {
  pathbase <- tools::file_path_sans_ext(basename(path))
  df <- read_excel(path = path, sheet = sheet)
  write.csv(df, paste0(pathbase, "-", sheet, ".csv"),
            quote = FALSE, row.names = FALSE)
  df
}
path <- readxl_example("datasets.xlsx")
sheets <- excel_sheets(path)
xl_list <- lapply(sheets, read_then_csv, path = path)
names(xl_list) <- sheets
If you go here and search for "excel" and "xls", you'll get a list of packages and functions that might help.

Convert XLS to CSV - R (Tried Rio Package)

I have a list of files in a directory which I'm trying to convert to csv. I tried the rio package and the solutions suggested here.
The output is a list of empty CSV files. It could be because the first 8 rows of the xls files contain an image and a few empty lines, with a couple of cells filled with text.
Is there any way I could skip those first 8 lines in all of the xls files before converting?
I tried exploring options from the openxlsx and readxl packages; any suggestions or guidance will be helpful.
Please do not mark as duplicate since I have a different problem than the one that was already answered
Maybe the following will work. At least it does for my own mock-up of an Excel file with a picture at the top.
library("readxl") # To read xlsx
library("readr") # Fast csv write
# skip the first 8 rows (image and mostly empty lines)
indata <- read_excel("~/cowexcel.xlsx", skip = 8)
write_csv(indata, "cow.csv")
If you are running this for several files, wrap it in a function. Note that the function below does no checking and might overwrite existing csv files.
convert_excel_to_csv <- function(name) {
  indata <- read_excel(name, skip = 8)
  # same name as the Excel file, but with a .csv extension
  write_csv(indata, paste0(tools::file_path_sans_ext(name), ".csv"))
}
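To apply the function to every Excel file in a folder, something like the following should work. xls_dir is a hypothetical folder name, the function is repeated so the block runs on its own, and the readxl and readr calls are namespaced instead of attached.

```r
# Sketch: run the conversion on every .xls/.xlsx file in a folder.
# "xls_dir" is a hypothetical folder name; change it to your own path.
convert_excel_to_csv <- function(name) {
  indata <- readxl::read_excel(name, skip = 8)
  readr::write_csv(indata, paste0(tools::file_path_sans_ext(name), ".csv"))
}

xls_dir <- "xls_dir"
# "\\.xlsx?$" is a regular expression matching names ending in .xls or .xlsx
excel_files <- list.files(xls_dir, pattern = "\\.xlsx?$", full.names = TRUE)
invisible(lapply(excel_files, convert_excel_to_csv))
```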
Although I was not able to do the conversion with rio, I read the files with read.xlsx and wrote them back as csv using the code below. Testing worked fine; I hope it works without glitches in production.
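library(xlsx) # for read.xlsx()
# "\\.xls$" is a regular expression matching names that end in .xls
files <- list.files(pattern = "\\.xls$")
y <- NULL
for (i in files) {
  # startRow = 9 skips the image and empty rows above the header
  x <- read.xlsx(i, sheetIndex = 1, header = TRUE, startRow = 9)
  y <- rbind(y, x) # stack all files into one data frame
}
dt <- Sys.Date()
fn <- paste0("path/", dt, ".csv")
write.csv(y, fn, row.names = FALSE)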
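Growing y with rbind() inside the loop copies the accumulated data on every pass, which gets slow with many files. A common alternative is to read every file into a list and bind once at the end. This sketch swaps in readxl::read_excel(skip = 8) for read.xlsx(startRow = 9), which should likewise treat row 9 as the header, keeps the answer's "path/" placeholder folder, and introduces a hypothetical helper combine_excel_files().

```r
# Sketch: read each file into a list, then bind them in a single rbind() call.
# combine_excel_files() is a hypothetical helper; assumes readxl is installed.
combine_excel_files <- function(files, skip = 8) {
  tables <- lapply(files, function(f) readxl::read_excel(f, skip = skip))
  do.call(rbind, tables)  # returns NULL when there are no files
}

xls_files <- list.files(pattern = "\\.xls$")
if (length(xls_files) > 0) {
  out <- paste0("path/", Sys.Date(), ".csv")  # "path/" is a placeholder folder
  write.csv(combine_excel_files(xls_files), out, row.names = FALSE)
}
```

Like the loop above, this assumes every file has the same columns; rbind() fails if they differ.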

How do I zip a csv file and write that zipped file to a folder using R

I have an R script which generates a csv file of nearly 80000 KB after calculations. I want to write this csv file to a folder, say D:/My_Work/Output, as a zipped file named result.zip. Is there a function or some other way to achieve this?
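Use the zip() function from the utils package. It calls an external zip program, so on Windows you need one on the PATH (for example, the one that comes with Rtools):
zip(zipfile = "D:/My_Work/Output/result.zip", files = "path/to/your.csv") # csv path is a placeholder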
Edit: unfortunately you cannot go from a data.frame straight to a zipped csv with zip(). You need to write the csv explicitly first, but it isn't hard to write a wrapper that deletes the csv afterwards, so you never notice it was there:
zipped.csv <- function(df, zippedfile) {
  # init temp csv
  temp <- tempfile(fileext = ".csv")
  # write temp csv
  write.csv(df, file = temp)
  # zip temp csv; -j stores only the file name, not the temp folder path
  zip(zippedfile, temp, flags = "-j9X")
  # delete temp csv
  unlink(temp)
}
If you just want to save some disk space, it is more convenient to use gzip (.gz) compression, which R can write and read directly:
write.csv(iris, gzfile("iris.csv.gz"), row.names = FALSE)
iris2 = read.csv("iris.csv.gz")
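A quick sanity check of the gzip approach, writing to a temporary folder instead of the working directory: the compressed file should be noticeably smaller, and read.csv() decompresses it transparently.

```r
# Sketch: compare a plain csv with a gzip-compressed one and read the latter back.
plain <- file.path(tempdir(), "iris.csv")
packed <- file.path(tempdir(), "iris.csv.gz")

write.csv(iris, plain, row.names = FALSE)
write.csv(iris, gzfile(packed), row.names = FALSE)  # connection is closed afterwards

# read.csv() detects the gzip header and decompresses on the fly
iris_back <- read.csv(packed)
c(plain = file.size(plain), gz = file.size(packed))
```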

Save data.frame objects into .Rds files within a loop

I have data.frame objects with normalized names in my global environment, and I want to save them to .Rda files.
My first question is: should I save them into one big .Rda file, or create one file per data frame? (Each data frame has 14 columns and ~260,000 rows.)
Assuming that I'll save them into different files, I was thinking about a function like this (all my data.frame names begin with "errDatas"):
sapply(ls(pattern = "errDatas"), function(x) save(as.name(x), file = paste0(x, ".Rda")))
But I have this error :
Error in save(as.name(x), file = paste0(x, ".Rda")) :
object ‘as.name(x)’ not found
It seems that save() doesn't evaluate as.name(x); it looks for an object literally called as.name(x). I also tried eval(parse(text = x)), with the same result.
Do you have an idea how I can save my data frames within a loop? Thanks.
I also have a bonus question: is what I'm trying to do useful and legitimate?
These data frames come from csv files (one data frame per csv file, imported with read.csv). Each day I get one new csv file, and I want to run some analysis on all of them. I realized that reading from csv is much slower than saving and loading an Rda file. So instead of reading all the csv files every time I run my program, I actually want to read each csv file only once, save it into an Rda file, and then load that. Is this a good idea? Are there best practices for this in R?
Use the list= parameter of the save function. This allows you to specify the name of the object as a character vector rather than passing the object itself. For example
lapply(ls(pattern = "errDatas"), function(x) {
save(list=x, file = paste0(x, ".Rda"))
})
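For the bonus question, one possible sketch of the "read each csv only once" idea uses saveRDS()/readRDS() as a cache, one .rds file per csv. csv_dir and read_csv_cached() are hypothetical names introduced for illustration, and the .rds copies are written next to the csv files.

```r
# Sketch: return the cached .rds copy if it exists, otherwise read the csv
# once and cache it. "csv_dir" is a hypothetical folder name.
read_csv_cached <- function(csv_path) {
  rds_path <- sub("\\.csv$", ".rds", csv_path)
  if (file.exists(rds_path)) {
    return(readRDS(rds_path))
  }
  df <- read.csv(csv_path)
  saveRDS(df, rds_path)
  df
}

csv_dir <- "csv_dir"
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
all_data <- lapply(csv_files, read_csv_cached)
names(all_data) <- basename(csv_files)
```

One caveat: if a csv file changes after its .rds copy exists, the stale cached copy is returned; delete the .rds file to refresh it.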

In R, opening an object saved to Excel through shell.exec

I would like to be able to open files quickly in Excel after saving them. I learned how from the SO question "R opening a specific worksheet in an excel workbook using shell.exec".
On my Windows system, I can do so with the following code and could perhaps turn it into a function: saveOpen <- function {... However, I suspect there are better ways to accomplish this modest goal.
I would appreciate any suggestions to improve this multi-step effort.
# create tiny data frame
df <- data.frame(names = c("Alpha", "Baker"), cities = c("NYC", "Rome"))
# save the data frame to an Excel file in the working directory
save.xls(df, filename = "test file.xlsx")
# I have to reenter the file name and add a forward slash for the paste0() command below to create a proper file path
name <- "/test file.xlsx"
# add the working directory path to the file name
file <- paste0(getwd(), name)
# with shell and .exec for Windows, open the Excel file
shell.exec(file = file)
Do you just want to create a helper function to make this easier? How about
save.xls.and.open <- function(dataframe, filename, ...) {
  # use the function argument, not a global df
  save.xls(dataframe, filename = filename, ...)
  cmd <- file.path(getwd(), filename)
  shell.exec(cmd) # Windows only
}
then you just run
save.xls.and.open(df, filename ="testfile.xlsx")
I guess it doesn't seem like all that many steps to me.
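The question does not say where save.xls() comes from, so this sketch swaps in openxlsx::write.xlsx() to write the file (an assumption: the openxlsx package is installed). Because shell.exec() exists only on Windows, the hypothetical helper save_xlsx_and_open() opens the file only there, and only in an interactive session by default.

```r
# Sketch: write a data frame to .xlsx with openxlsx, then open it in Excel.
# save_xlsx_and_open() is a hypothetical helper; openxlsx replaces save.xls().
save_xlsx_and_open <- function(dataframe, filename, open = interactive()) {
  path <- file.path(getwd(), filename)
  openxlsx::write.xlsx(dataframe, file = path)
  if (open && .Platform$OS.type == "windows") {
    shell.exec(path)  # opens the file with its associated program (Excel)
  }
  invisible(path)
}

df <- data.frame(names = c("Alpha", "Baker"), cities = c("NYC", "Rome"))
# save_xlsx_and_open(df, "test file.xlsx")
```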
