R readxl::read_excel failed to open xls file

readxl_1.1.0
I'm trying to read the file from this link (US government website):
https://www.cftc.gov/files/dea/history/dea_com_xls_2018.zip
When I unzip the archive and read the xls file inside with readxl::read_excel, it fails with the error message: failed to open C:\path to file
I can open the file in Excel, save it as CSV, and read it into R with fread, but there are a lot of these files, so that's tedious. By the way, some other xls files downloaded from the same webpage can be read by read_excel.

There's something odd about the xls file. I think it's because it contains some VBA code.
If you are happy to use XLConnect, here is an alternative that reads the file:
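library(XLConnect)
# Extract the archive into a temporary directory
extdir <- tempdir()
unzip("dea_com_xls_2018.zip", exdir = extdir)
# Locate the extracted xls file
file <- list.files(extdir, pattern = "xls", full.names = TRUE)
# Load the workbook and read its first sheet into a data frame
wb <- loadWorkbook(file)
ws <- readWorksheet(wb, sheet = 1)
dim(ws)
#[1] 11131 126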
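Since the question mentions many such files, the same steps could be wrapped in a loop. The following is only a sketch under assumptions that are not in the original answer: read_zipped_xls is a hypothetical helper name, each archive is assumed to contain one xls file of interest, and the data is assumed to be on the first sheet. XLConnect is only needed when the function is actually called on an archive.

```r
# Hypothetical helper: read the first sheet of the xls file in each zip archive.
# Assumes one xls file per archive and that XLConnect (and Java) are installed.
read_zipped_xls <- function(zip_paths) {
  out <- lapply(zip_paths, function(zip_path) {
    # Use a fresh directory per archive so files from different zips never mix
    exdir <- tempfile("xls_")
    dir.create(exdir)
    utils::unzip(zip_path, exdir = exdir)
    xls <- list.files(exdir, pattern = "\\.xls$", full.names = TRUE,
                      ignore.case = TRUE)
    wb <- XLConnect::loadWorkbook(xls[1])
    XLConnect::readWorksheet(wb, sheet = 1)
  })
  # Name each result after the archive it came from
  names(out) <- basename(zip_paths)
  out
}
```

Usage would look like sheets <- read_zipped_xls(list.files("downloads", pattern = "\\.zip$", full.names = TRUE)), where "downloads" is a placeholder directory name.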

Related

Exporting .XLS file from openxlsx

I have a function that receives a data frame, does a bunch of transformations with openxlsx, and exports the data from R to .xlsx:
export_workbook_from_df <- function(data, path) {
  wb <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(wb, sheetName = "Sheet1")
  openxlsx::openxlsx_setOp("numFmt", "0,00")
  number_format <- openxlsx::createStyle(numFmt = "Number") # create thousands format
  # Style column 6 for every data row (row 1 holds the header)
  wb |>
    openxlsx::addStyle(sheet = 1,
                       number_format,
                       rows = 1:nrow(data) + 1, cols = c(6),
                       gridExpand = TRUE
    )
  openxlsx::writeData(wb, sheet = 1, data)
  openxlsx::saveWorkbook(wb, paste0(path, ".xlsx"))
}
If I try to save as .xls using openxlsx::saveWorkbook(wb, paste0(path, ".xls")), I get an error that roughly translates to:
The format of the file and the extension don't correspond. The file may be corrupted or not be safe. Don't open it, unless you trust the source. Do you want to open anyway?
The file works fine if I save it as .xlsx and then manually save it as .xls within Excel.
I also tried using XLConnect to load the file after it is saved and export it in a different format, like this:
openxlsx::saveWorkbook(wb, paste0(path, ".xlsx"))
XLConnect::loadWorkbook(paste0(path, ".xlsx")) %>%
  XLConnect::saveWorkbook(paste0(path, ".xls"))
While it does export the file as .xls, I get the same error.
It may be worth mentioning that when I open the file, I get exactly the same data as in the .xlsx file with either method (openxlsx or XLConnect).
The xls and xlsx file formats are not the same. XLSX is a zipped, XML-based file format, and Microsoft Excel 2007 and later use it as the default when creating a new spreadsheet; support for loading and saving legacy XLS files is also included. XLS is the default format used with Office 97-2003. Changing the extension does not change the contents, so when you open the XLSX file that you saved with an .xls extension, Excel complains as above: it expects the old binary format but encounters a zipped, XML-based one instead.
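One way to check which format a file really contains is to look at its first bytes instead of its extension. The sketch below is a base-R illustration and not part of the original answer: detect_excel_format is a hypothetical helper name, and it relies on the usual container signatures (zip archives begin with the bytes 50 4B 03 04, OLE2 compound files with D0 CF 11 E0 A1 B1 1A E1). These signatures identify the container only, so any zip file or any OLE2 file would match.

```r
# Hypothetical helper: classify a file by its leading bytes, not its extension.
detect_excel_format <- function(path) {
  sig  <- readBin(path, what = "raw", n = 8L)
  ole2 <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  zip  <- as.raw(c(0x50, 0x4B, 0x03, 0x04))
  if (length(sig) >= 8L && identical(sig[1:8], ole2)) {
    "xls"      # OLE2 container, as used by the legacy binary format
  } else if (length(sig) >= 4L && identical(sig[1:4], zip)) {
    "xlsx"     # zip container, as used by the XML-based format
  } else {
    "unknown"
  }
}
```

Called on a file that openxlsx wrote under an .xls name, this should report "xlsx", which matches the mismatch Excel warns about.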

How to open .xlsx files whose filenames are saved as UTF-8 (Persian/Arabic characters) in R using the openxlsx package?

I have 3000 files whose names are in Persian characters, for example "فایل ۱.xlsx". When I try to open these files (using the openxlsx package), I get the following error:
library(openxlsx)
> file_temp<-file.choose()
> parameters_old<-read.xlsx(file_temp,sheet = "parameters")
Error in unzip(xlsxFile, exdir = xmlDir) :
unable to translate 'D:\allfiles\<U+0627><U+0631><U+062F><U+0628><U+06CC><U+0644>.xlsx' to
native encoding
Is there any method to open the files?

Trying to convert XML into a dataframe

I am downloading a zip file from the following location:
http://nemweb.com.au/Data_Archive/Wholesale_Electricity/NEMDE/2019/NEMDE_2019_03/NEMDE_Market_Data/NEMDE_Files/NemPriceSetter_20190301_xml.zip
The zip file contains multiple XML files, which I am trying to read, but because of the way the XML is structured I cannot parse it properly or convert it into a data frame.
I tried downloading the zip file into a temporary directory and then parsing one file at a time:
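library(xml2)
library(tidyverse)
library(XML)
# Download the archive into the session's temporary directory
tf <- tempfile(tmpdir = tdir <- tempdir())
download.file("http://nemweb.com.au/Data_Archive/Wholesale_Electricity/NEMDE/2019/NEMDE_2019_03/NEMDE_Market_Data/NEMDE_Files/NemPriceSetter_20190301_xml.zip", tf)
xml_files <- unzip(tf, exdir = tdir)
# Parse the first file and try to convert the PriceSetting nodes
doc <- xmlParse(xml_files[1])
a <- xmlToDataFrame(nodes = getNodeSet(doc, "//SolutionAnalysis/PriceSetting"))
# Remove only the downloaded and extracted files, not the session temp directory itself
unlink(c(tf, xml_files), recursive = TRUE, force = TRUE)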
This is how the XML file looks
and I am trying to put the information in a specific column using a data frame

Overwrite An Excel File through RDCOMClient Package in R

I am trying to manipulate an Excel file (.xls) in R through the RDCOMClient package.
I created an Excel object in R, opened a workbook saved in .xls format, and tried to convert it to .xlsx without the pop-up dialog box that appears when an Excel file with the same name already exists. The code is below.
excel <- COMCreate("Excel.Application")
wb <- excel$Workbooks()$Open(Filename = "filepath.xls",Password = "xxxxx")
excel$DisplayAlerts(FALSE)
wb$SaveAs(Filename = "filepath.xlsx" ,FileFormat = 51,Password = "")
I got an error message when I executed the code:
excel$DisplayAlerts(FALSE)
<'checkErrorInfo'> 8002000E Error: invalid number of parameter.
DisplayAlerts is a property, not a method, so set it by assignment. You should replace that line with the following:
excel[["DisplayAlerts"]] <- FALSE
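Put together with the code from the question, the whole conversion might look like the sketch below. This is an untested illustration under assumptions: it needs Windows, an installed Excel, and the RDCOMClient package; convert_xls_to_xlsx is a hypothetical helper name; and the Close and Quit calls are additions for tidying up that the question did not include. FileFormat 51 is Excel's constant for an .xlsx workbook, as used in the question.

```r
# Hypothetical helper: open a password-protected .xls and save it as .xlsx,
# overwriting an existing file without a confirmation dialog.
# Assumes Windows, Excel, and the RDCOMClient package are available.
convert_xls_to_xlsx <- function(xls_path, xlsx_path, password = "") {
  excel <- RDCOMClient::COMCreate("Excel.Application")
  # Always shut the Excel instance down, even if a step below fails
  on.exit(excel$Quit(), add = TRUE)
  # DisplayAlerts is a property, so assign to it rather than calling it
  excel[["DisplayAlerts"]] <- FALSE
  wb <- excel$Workbooks()$Open(Filename = xls_path, Password = password)
  # 51 = xlOpenXMLWorkbook (.xlsx); an empty password saves it unprotected
  wb$SaveAs(Filename = xlsx_path, FileFormat = 51, Password = "")
  wb$Close()
  invisible(xlsx_path)
}
```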

Can't open Excel File created in R language

I get a corruption error when I try to open an Excel workbook created in R.
I tried both the .xlsx and .xls extensions, but neither worked!
This is the code I used:
wb <- loadWorkbook("RCreated.xls", create = TRUE);
saveWorkbook(wb)
createSheet(wb, name = "First")
HELP!
Create the sheet BEFORE saving the workbook.
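In other words, the question's code saves a workbook that has no sheets yet, and the sheet added afterwards is never written to disk. A reordered version might look like the sketch below. It assumes XLConnect is installed, and create_workbook_with_sheet is a hypothetical wrapper name used only to keep the example self-contained.

```r
# Hypothetical wrapper around the corrected call order.
# Assumes the XLConnect package (and Java) are installed.
create_workbook_with_sheet <- function(path, sheet_name = "First") {
  # create = TRUE makes a new workbook if the file does not exist yet
  wb <- XLConnect::loadWorkbook(path, create = TRUE)
  # Add the sheet first ...
  XLConnect::createSheet(wb, name = sheet_name)
  # ... and only then write the workbook to disk
  XLConnect::saveWorkbook(wb)
  invisible(path)
}
```

For the file in the question, the call would be create_workbook_with_sheet("RCreated.xls").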
