Importing an .xls file that is actually saved as *.htm or *.html on the backend - r

I have a requirement to import an .xls file that is actually saved as *.htm or *.html.
How do we load this into a data frame in R? The data is in Sheet1, starting from row 5. I have been struggling with this, trying to load it with the xlsx and readxl packages, but neither of them worked, because the native format of the file is different.
I can't edit and re-save the file manually as .xlsx, because that step cannot be automated.
Also note that when I save it manually as an .xlsx file, it loads fine. But that's not what I need.
Kindly help me with this.

Try the openxlsx package and its function read.xlsx. If that doesn't work, you could programmatically rename the file, as described for example here, and then open it using one of these Excel packages.
Your file could be in xls format instead of xlsx; have you tried the read_xls() function from readxl? Or it could be in a text format, in which case read.table() or fread() from data.table should work. The fact that it works after saving the file as xlsx strongly suggests that it is not formatted as an xlsx to begin with.
Hope this helps.
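Since the extension cannot be trusted here, one way to decide which reader to try is to look at the first bytes of the file. The following is a minimal base-R sketch, not part of the original answer; `sniff_format` is a hypothetical helper name, and the HTML check is a simple heuristic that assumes the markup starts within the first few lines.

```r
# Hypothetical helper: guess a file's real format from its leading bytes,
# ignoring whatever extension it happens to carry.
sniff_format <- function(path) {
  bytes <- readBin(path, what = "raw", n = 8)

  # .xlsx files are ZIP archives and start with "PK\003\004".
  zip_magic <- as.raw(c(0x50, 0x4B, 0x03, 0x04))
  # Legacy binary .xls files are OLE2 compound documents.
  ole_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))

  if (length(bytes) >= 4 && identical(bytes[1:4], zip_magic)) return("xlsx")
  if (length(bytes) >= 8 && identical(bytes[1:8], ole_magic)) return("xls")

  # Otherwise treat the file as text and look for HTML markup near the top.
  head_txt <- paste(readLines(path, n = 5, warn = FALSE), collapse = " ")
  if (grepl("<html|<table|<!doctype", head_txt,
            ignore.case = TRUE, useBytes = TRUE)) {
    return("html")
  }
  "text"
}

# Example: an HTML table saved under an .xls name.
tmp <- tempfile(fileext = ".xls")
writeLines("<html><body><table><tr><td>1</td></tr></table></body></html>", tmp)
sniff_format(tmp)
```

If the result is "html", an HTML parser such as rvest (read_html() followed by html_table()) may be able to extract the table, and the leading rows can then be dropped by subsetting the resulting data frame.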

Related

Read sheet names from a xlsb file

I've been trying to read xlsb files into R.
I've tried the excel.link and readxlsb packages, and they do work for reading the file, but I also need to read the sheet names.
For a normal xlsx I would go for getSheets, but it does not work for XLSB.
Any suggestion?
Thanks

Optimum way to overwrite an xlsx worksheet

I'm trying to write an Excel worksheet with the XLConnect package. The data I'm using is a data.frame (820*132). Once I'm done building the dataset, I'm using the writeWorksheetToFile function to export.
If the file does not exist yet and I am creating it from scratch, everything works well.
If I want to overwrite an existing sheet, the function takes approximately a minute to write and, in addition, when I open the Excel file I get an error message saying: "We found a problem with some content in 'my_file.xlsx'. Do you want us to try to recover as much as we can?"
I tried other packages for writing to Excel, such as xlsx and openxlsx, but they do not allow overwriting a sheet without overwriting the entire workbook.
I've checked a few solutions such as this, but they are not optimal.
I am looking for the most optimal way of writing excel worksheets, with an overwrite option that is suitable for large datasets.
I'm using the latest versions of R and RStudio.
My Excel version is 1902, 64-bit.
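One possible approach, offered as a sketch rather than a verified fix for the timing problem above: openxlsx can load an existing workbook into memory, replace a single sheet, and save the workbook back, which leaves the other sheets in place. `overwrite_sheet` is a hypothetical helper name; the openxlsx functions it calls are loadWorkbook, removeWorksheet, addWorksheet, writeData and saveWorkbook.

```r
library(openxlsx)

# Hypothetical helper: replace the contents of one sheet while keeping the
# other sheets of the workbook.
overwrite_sheet <- function(path, sheet, data) {
  wb <- loadWorkbook(path)
  if (sheet %in% names(wb)) {
    removeWorksheet(wb, sheet)   # drop the old version of the sheet
  }
  addWorksheet(wb, sheet)        # note: the new sheet is appended at the end
  writeData(wb, sheet, data)
  saveWorkbook(wb, path, overwrite = TRUE)
  invisible(path)
}
```

Calling overwrite_sheet("my_file.xlsx", "Sheet1", my_df) would rewrite only that sheet's data (the file, sheet and data frame names are placeholders). Because the sheet is removed and re-added, it moves to the last position and loses any formatting it had, which may or may not be acceptable.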

How to write data into a macro-enabled Excel file (write.xlsx corrupts my document)?

I'm trying to write a table into a macro-enabled Excel file (.xlsm) through R. The write.xlsx (openxlsx) and writeWorksheetToFile (XLConnect) functions don't work.
When I used the openxlsx package, as seen below, the resulting .xlsm files ended up getting corrupted.
Code:
library(XLConnect)
library(openxlsx)
for (i in 1:3) {
  write.xlsx(Input_Files[[i]], Inputs[i], sheetName = "Input_Sheet")
}
# Input_Files[[i]] are the R data.frames which need to be inserted into the .xlsm files
# Inputs[i] are the Excel files into which the tables should be written
Corrupted .xlsm file error message after write.xlsx:
Excel cannot open the file 'xxxxx.xslm' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file
After researching this problem extensively, I found that the XLConnect package offers the writeWorksheetToFile function, which works with .xlsm, although after running it a few times it yields an error message that there is no more free space. It also runs for 20+ minutes for tables with approximately 10,000 lines. I tried adding xlcFreeMemory at the beginning of the for loop, but it doesn't solve the issue.
Code:
library(XLConnect)
library(openxlsx)
for (i in 1:3) {
  xlcFreeMemory()
  writeWorksheetToFile(Inputs[i], Input_Files[[i]], "Input_Sheet")
}
# Input_Files[[i]] are the R data.frames which need to be inserted into the .xlsm files
# Inputs[i] are the Excel files into which the tables should be written
Could anyone recommend a way to easily and quickly transfer an R table into an xlsm file without corrupting it?

R Copying to and Reading from csv Files

When I go to save Excel data that I've pasted into a .csv file, I get a formatting issue and often the saved file has all the numbers in each row as one long string.
My read statement is
resids <- read.csv("C:\\Projects\\residuals_Parts3.csv", header = TRUE)
Any ideas on how to fix this?
The warning you are getting is fairly standard in Excel: any formatting you've added to the file (e.g. widening columns) will get lost if you don't save the file as an Excel file, and the warning is supposed to remind you of this. Personally, the extra click or two annoys me too.
If you would like to avoid converting Excel files to CSV before bringing them into R, try the openxlsx package. It's saved me from a lot of that monkey business.
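A common cause of each row arriving as one long string is that the pasted data is separated by tabs rather than commas, even though the file is named .csv. Whether that is what happened in the question is an assumption; the sketch below uses a made-up temporary file and made-up column names to show the symptom and one way around it in base R.

```r
# Simulate data pasted from Excel: columns separated by tabs, not commas.
tmp <- tempfile(fileext = ".csv")
writeLines(c("part\tresidual", "A\t0.12", "B\t-0.40"), tmp)

# With the default comma separator, every row collapses into a single column.
as_csv <- read.csv(tmp, header = TRUE)
ncol(as_csv)

# Telling R the real separator restores the columns.
as_tsv <- read.delim(tmp, header = TRUE)
str(as_tsv)
```

Opening the file in a plain text editor shows which separator is actually in use; read.csv(file, sep = "\t") is equivalent to the read.delim call for this purpose.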

How to load xlsx file using fread function?

I wanted to use the fread function to load all the datasets, as I think it would be better to use one type of import function, so I just stuck with fread.
A few of my files are in xlsx format, so I was saving them as csv and then loading them with fread.
But I noticed that when I converted the xlsx files into csv, an empty or incomplete row was being created in the newly created csv files.
Is there a way I can resolve this issue? Can I load xlsx file somehow using the fread function rather than converting it to csv file and then loading it using the fread function?
Here's how: call a command-line tool directly from fread, in this case in2csv from csvkit, like this:
my.dt <- fread(cmd = 'in2csv my.xls')
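If installing csvkit is not an option, an alternative (not the answerer's method) is a thin wrapper that sends Excel files to readxl and everything else to fread, so the calling code still uses a single import function. `read_any` is a hypothetical name, and the sketch assumes both data.table and readxl are installed.

```r
library(data.table)

# Hypothetical wrapper: one entry point for delimited text and Excel files.
read_any <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  if (ext %in% c("xlsx", "xls")) {
    # fread cannot parse Excel workbooks, so delegate to readxl and convert
    # the result to a data.table.
    as.data.table(readxl::read_excel(path, ...))
  } else {
    fread(path, ...)
  }
}
```

With this in place, read_any("my.xlsx") and read_any("my.csv") both return a data.table (the file names are placeholders), and extra arguments are passed through to whichever reader handles the file.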

Resources