Load 97-2003 Excel Worksheet (.XLS) into R

I am trying to load Excel worksheets into R using the xlsx package. The files are saved as old 97-2003 worksheets (the extension is .XLS). For newer files, the code below worked fine:
df <- read.xlsx(filename, sheetIndex = 2)
However, when I try on the older files I get the error message:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
org.apache.poi.hssf.OldExcelFormatException: The supplied spreadsheet seems to be Excel 5.0/7.0 (BIFF5) format. POI only supports BIFF8 format (from Excel versions 97/2000/XP/2003)
I know the error has to do with the files being in the older format but I do not know how to solve this. I have too many files to manually update each one.
Any suggestions would be greatly appreciated!
P.S. apologies for not adding a fully reproducible example. I do not know how to attach files to go along with my question.

The readxl package is one way to read Excel files. Its advantage is that it has no dependency on Java or other external software.
Your code would be
library(readxl)
df <- read_excel(path = filepath, sheet = 2)
It should work with XLS and XLSX files.
Use excel_sheets(filepath) to get the names of the sheets to import, and pass them through the sheet argument of read_excel. You can loop over them if that helps.
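The loop over sheets could look roughly like the sketch below. The helper name read_all_sheets is made up for illustration; only excel_sheets() and read_excel() come from readxl. Whether readxl can open a particular very old workbook is something to check on one file first.

```r
library(readxl)

# Hypothetical helper: read every sheet of one workbook into a named list.
read_all_sheets <- function(path) {
  sheets <- excel_sheets(path)                 # sheet names, in workbook order
  out <- lapply(sheets, function(s) read_excel(path, sheet = s))
  names(out) <- sheets                         # so out[["Sheet1"]] works
  out
}
```

With many files, the same helper can be applied over a vector of paths, for example lapply(list.files(pattern = "\\.XLS$", ignore.case = TRUE), read_all_sheets).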

Related

Can I save an Excel Workbook as a binary file using R?

In R I am trying to save an Excel workbook as a binary worksheet (.xlsb) instead of the standard formats (.xlsx or .xls). Packages like openxlsx or xlsx do not work because they do not convert the file into binary format. I have been digging and found the package excel.link, but it keeps crashing my R session and doesn't seem to work in a timely manner.
Does anyone know of a method to achieve this?
No, the Excel binary format is a proprietary encoding/compression format used by Microsoft that is not shared. You can only view and edit binary files in Excel. Is there any reason you can't save them as CSVs or a regular Excel file? If the file is too large, you can save it as a gzip-compressed CSV with
data.table::fwrite(df, "filename.csv.gz")  # df is your data frame; the .gz extension makes fwrite compress the output
It is possible to convert an XLSX file to XLSB with the R package RDCOMClient (Windows only, with Excel installed):
library(RDCOMClient)
path_XLSX_File <- "C:\\...\\xlsx_File.xlsx"
path_XLSB_File <- "C:\\...\\xlsb_File.xlsb"
xlApp <- COMCreate("Excel.Application")
xlApp[['Visible']] <- FALSE
xlWbk <- xlApp$Workbooks()$Open(path_XLSX_File)
xlWbk$SaveAs(path_XLSB_File, 50)
xlWbk$Close()
xlApp$Quit()
For the full list of file formats, see https://learn.microsoft.com/en-us/previous-versions/office/developer/office-2010/ff198017(v=office.14). The XLSB format is xlExcel12, which is the value 50 passed to SaveAs above.

Read sheet names from a xlsb file

I've been trying to read xlsb files into R.
I've tried the excel.link and readxlsb packages, and they do work for reading the file, but I also need to read the sheet names.
For a normal xlsx I would go for GetSheets, but it does not work for XLSB.
Any suggestion?
Thanks

How to write data into a macro-enabled Excel file (write.xlsx corrupts my document)?

I'm trying to write a table into a macro-enabled Excel file (.xlsm) from R. The write.xlsx (openxlsx) and writeWorksheetToFile (XLConnect) functions don't work.
When I used the openxlsx package, as seen below, the resulting .xlsm files ended up getting corrupted.
Code:
library(XLConnect)
library(openxlsx)
for (i in 1:3){
write.xlsx(Input_Files[[i]], Inputs[i], sheetName="Input_Sheet")
}
# Input_Files[[i]] are the R data.frames which need to be inserted into the .xlsm file
# Inputs[i] are the Excel files into which the tables should be written
Corrupted .xlsm file error message after write.xlsx:
Excel cannot open the file 'xxxxx.xslm' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file
After researching this problem extensively, I found that the XLConnect package offers the writeWorksheetToFile function, which works with .xlsm. However, after running it a few times it yields an error message that there is no more free space. It also runs for 20+ minutes for tables with approximately 10,000 rows. I tried adding xlcFreeMemory at the beginning of the for loop, but it doesn't solve the issue.
Code:
library(XLConnect)
library(openxlsx)
for (i in 1:3){
xlcFreeMemory()
writeWorksheetToFile(Inputs[i], Input_Files[[i]], "Input_Sheet")
}
# Input_Files[[i]] are the R data.frames which need to be inserted into the .xlsm file
# Inputs[i] are the Excel files into which the tables should be written
Could anyone recommend a way to easily and quickly transfer an R table into an xlsm file without corrupting it?
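One thing that may explain the corruption: write.xlsx builds a brand-new workbook and saves it under the given name, so the existing .xlsm is replaced by a plain xlsx-format file that still carries the .xlsm extension. A possible alternative is to load the existing workbook, write into it, and save it back. The sketch below uses openxlsx's loadWorkbook(), writeData() and saveWorkbook(); the helper name write_into_workbook is invented here, and the assumption that the macros survive the round trip should be verified on a copy of the file before relying on it.

```r
library(openxlsx)

# Hypothetical helper: write a data frame into a sheet of an existing workbook
# and save the workbook back to the same path.
write_into_workbook <- function(path, df, sheet = "Input_Sheet") {
  wb <- loadWorkbook(path)                 # open the existing file, do not create a new one
  if (!(sheet %in% names(wb))) {
    addWorksheet(wb, sheet)                # create the target sheet only if it is missing
  }
  writeData(wb, sheet = sheet, x = df)     # writes the table starting at cell A1
  saveWorkbook(wb, path, overwrite = TRUE)
  invisible(path)
}
```

With the objects from the question, the loop would become for (i in seq_along(Inputs)) write_into_workbook(Inputs[i], Input_Files[[i]]).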

Importing .xls file that is saved as *.htm, *.html as it is saved on the backend

I have a requirement where I have to import an .xls file which is actually saved as *.htm, *.html.
How do we load this in R as a data frame? The data is present in Sheet1 starting from Row Number 5. I have been struggling with this, trying to load it using the xlsx and readxl packages. Neither of them worked, because the native format of the file is different.
I can't edit and re-save the file manually as .xlsx, as it cannot be automated.
Also note that when I save it as an .xlsx file, it works fine. But that's not what I need.
Kindly help me with this.
Try the openxlsx package and its function read.xlsx. If that doesn't work, you could programmatically rename the file as described for example here, and then open it using one of these excel packages.
Your file could be in xls format instead of xlsx; have you tried the read_xls() function from readxl? It could also be in text format, in which case read.table() or fread() from data.table should work. The fact that it works after saving the file as xlsx strongly suggests that it is not formatted as an xlsx to begin with.
Hope this helps.
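If the file really is an HTML page with an .xls extension (opening it in a text editor should show a table tag), another option is to parse it as HTML. This is only a sketch under that assumption: the helper name read_html_xls is made up, it takes the first table in the file, and it drops the first four rows because the question says the data starts at row 5.

```r
library(rvest)

# Hypothetical helper: read the first HTML table from a file that only
# pretends to be .xls, dropping `skip` leading rows.
read_html_xls <- function(path, skip = 4) {
  tables <- html_table(read_html(path), header = FALSE)
  if (length(tables) == 0) stop("no HTML table found in ", path)
  tbl <- as.data.frame(tables[[1]])
  if (skip > 0) {
    tbl <- tbl[-seq_len(skip), , drop = FALSE]   # keep rows from skip + 1 onward
  }
  rownames(tbl) <- NULL
  tbl
}
```

The columns come back with generic names (X1, X2, ...); if the first kept row holds the real headers, it still has to be promoted to column names by hand.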

Reading in xls and xlsx files in R

I am working with loads of xls and xlsx files at the same time with no easy way to convert them to the same file type.
I am having trouble reading them in: read.xlsx() from the "xlsx" package works just fine with xls files, but I get a Java out-of-memory error when trying to read xlsx files. I tried the following line to increase the available memory, with no success:
options(java.parameters = "-Xmx1000m")
As an alternative option I have tried read.xlsx() from "openxlsx" package but it does not read xls files and the aforementioned two packages are not compatible when loaded at the same time. I faced the same difficulty with the "XLConnect" package where again I face java errors when trying to use "xlsx" and "XLConnect" packages loaded at the same time.
I would be interested to know what people do to solve situations like this.
You can consider the read_excel function in the readxl package:
read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, na = "", skip = 0)
You can specify which sheet of the file to import, whether the first row contains column names, and which string marks missing values in the Excel files.
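Since read_excel() picks the reader from the file itself, one call can cover a folder that mixes both formats. A minimal sketch, assuming all files live in one directory and the wanted data is on the first sheet; the helper name read_excel_dir is made up for this example.

```r
library(readxl)

# Hypothetical helper: read the same sheet from every .xls/.xlsx file in a folder.
read_excel_dir <- function(dir, sheet = 1) {
  files <- list.files(dir, pattern = "\\.xlsx?$", ignore.case = TRUE,
                      full.names = TRUE)
  out <- lapply(files, read_excel, sheet = sheet)  # no Java involved
  names(out) <- basename(files)                    # index results by file name
  out
}
```

The result is a named list of data frames, so read_excel_dir("data")[["report.xls"]] would return the table read from that file (the folder and file names here are placeholders).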
