Import data from Excel with HSSF in R

I'm trying to import data from an Excel file into R with the xlsx package. I get the error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, : org.apache.poi.EncryptedDocumentException: The supplied
spreadsheet seems to be an Encrypted .xlsx file. It must be decrypted
before use by XSSF, it cannot be used by HSSF
I renamed the file from filename.xlsx to filename.xls, but I keep getting the same message.
I also tried the advice in these links:
Import password-protected xlsx workbook into R
How to read xlsx file in protect mode to R
but neither worked.
The sheets in my file are protected, but the file itself is not.

According to the xlsx package website, support for password-protected spreadsheets is still being worked on, although a user, Heather, has contributed a fix.
See https://code.google.com/p/rexcel/issues/detail?id=49
But it is not clear if this extends to protected sheets as well.
Fercho, can you try other workarounds?
1. Save as CSV and use read.csv to get the data into R?
2. Save a version of the Excel file without protected sheets for your data input?
3. Try other Excel-to-R packages like XLConnect? This package seems more up to date.
EDIT: Mango Solutions has a comparison of Excel and R tools. openxlsx can handle password protected sheets but is slower than XLConnect.
Code for option 1 above:
' Excel VBA for saving a sheet as CSV.
' Select the sheet you want to export first, then run code like this.
' strSaveFilename must hold the full path of the CSV file to write.
ThisWorkbook.SaveAs Filename:=strSaveFilename, _
    FileFormat:=xlCSV
' See the Workbook.SaveAs method documentation for the full syntax:
' expression.SaveAs(FileName, FileFormat, Password, WriteResPassword, ReadOnlyRecommended, CreateBackup, AccessMode, ConflictResolution, AddToMru, TextCodepage, TextVisualLayout, Local)
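Once a sheet has been exported, the R side of the CSV workaround could look roughly like the sketch below. The file name and the sample rows are made up for illustration; the snippet writes a small CSV to a temporary folder first so that it runs on its own.

```r
# Stand-in for the CSV exported from Excel (hypothetical name and contents).
csv_path <- file.path(tempdir(), "exported_sheet.csv")
writeLines(c("id,value", "1,10.5", "2,20.25"), csv_path)

# Read the exported sheet; stringsAsFactors = FALSE keeps text columns as character.
sheet_data <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
str(sheet_data)
```

In practice, csv_path would point at the file written by the VBA SaveAs call instead of the temporary file created here.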

Thanks, I finally did it in VBA. It takes a little time, but it works. Here is the code I used:
Sub LoopThroughFiles()
    Dim FolderName As String, Fname As String
    Dim wb As Workbook, ws As Worksheet
    FolderName = "C:\folder with files\"
    If Right(FolderName, 1) <> Application.PathSeparator Then FolderName = FolderName & Application.PathSeparator
    Fname = Dir(FolderName & "*.xls")
    ' Loop through the files
    Do While Len(Fname) > 0
        Set wb = Workbooks.Open(FolderName & Fname)
        For Each ws In wb.Worksheets
            ' Try both candidate passwords; the error raised by a wrong one is ignored
            On Error Resume Next
            ws.Unprotect Password:="password 1"
            ws.Unprotect Password:="password 2"
            On Error GoTo 0
        Next ws
        ' Save the unprotected workbook and close it before opening the next one
        wb.Close SaveChanges:=True
        ' Go to the next file in the folder
        Fname = Dir
    Loop
    Application.Quit
End Sub
I used two passwords to unlock the sheets. I didn't know which password applied to which file, so I try both on each file.
Thanks again for the help.

Related

"Error: expected <" when reading .xlsx and .xls files into R with readxl package

I am reading a batch of Excel files into R using the readxl package and a for loop. Here is my simplified code:
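library(readxl)
# full.names = TRUE returns paths that include input.dir
filelist <- list.files(input.dir, full.names = TRUE) # get list of Excel files
sheets <- vector("list", length(filelist))
for (i in seq_along(filelist)) {
  sheets[[i]] <- read_excel(filelist[i], col_names = FALSE)
}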
I am able to read some files without issue. Others are read, but the result is a 0x0 tibble. Other files stop the loop and force the error message:
Error: expected <
I can bypass the issue by opening a problem file, making any minor edit, and saving. However, it is not feasible to do this for every file, since I have over 1,000 in total. You can find a subset of my Excel files here. For your reference, "AMEX1-61-2020-PH.xlsx" is one of the problem files that returns the error message. I am using readxl version 1.3.1 and R version 4.1.1. Thank you!
The files may be corrupt (e.g. the actual file format does not match the extension), which would explain why R has trouble opening them.
One possible solution is an Excel macro that opens, re-saves, and closes each file in the directory defined by the "strDirectory" variable below.
Once the macro has processed all of the files in the directory, the files should work in R.
Sub ResaveFiles()
    Dim varDirectory As Variant
    Dim strDirectory As String
    Dim wb As Workbook
    strDirectory = "/Users/stacktest/Downloads/"
    varDirectory = Dir(strDirectory & "*.xls*", vbNormal)
    ' Dir returns "" once no more files match
    Do While varDirectory <> ""
        ' Dir returns only the file name, so prepend the directory
        Set wb = Application.Workbooks.Open(strDirectory & varDirectory)
        ' Rewrite one cell with its own value so Excel treats the workbook as changed
        wb.Worksheets(1).Cells(1, 1) = wb.Worksheets(1).Cells(1, 1)
        wb.Save
        wb.Close
        varDirectory = Dir
    Loop
End Sub
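Before re-saving over 1,000 files, it may help to find out which ones actually fail. The sketch below is one way to do that in R: it wraps each read in tryCatch so that a single bad file does not stop the loop, and it flags files that raised an error or came back empty. The helper names read_all_safely and problem_files are made up for this example, and the default reader assumes the readxl package is installed.

```r
# Try to read every file; record failures instead of stopping the loop.
# `reader` defaults to readxl::read_excel (assumes readxl is installed);
# extra arguments in `...` are passed on to the reader.
read_all_safely <- function(paths, reader = readxl::read_excel, ...) {
  results <- lapply(paths, function(p) {
    tryCatch(
      list(data = reader(p, ...), error = NA_character_),
      error = function(e) list(data = NULL, error = conditionMessage(e))
    )
  })
  names(results) <- basename(paths)
  results
}

# Names of files that raised an error or came back with zero rows
# (for example a 0x0 tibble).
problem_files <- function(results) {
  bad <- vapply(results, function(r) {
    !is.na(r$error) || is.null(r$data) || nrow(r$data) == 0
  }, logical(1))
  names(results)[bad]
}

# Hypothetical usage with the variables from the question:
# filelist <- list.files(input.dir, pattern = "\\.xlsx?$", full.names = TRUE)
# results  <- read_all_safely(filelist, col_names = FALSE)
# problem_files(results)
```

Only the files returned by problem_files would then need to go through the re-save macro.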

Read Excel file into R with locked cells

I have an Excel spreadsheet to read into R, that is both password protected and has locked cells. I can use excel.link to import a password protected file, but I can't figure out how to unlock/unprotect the cells. excel.link gives me this error:
> <checkErrorInfo> 80020009 Error in top_left_corner[["CurrentRegion"]]
> : You cannot use this command on a protected sheet. To use this
> command, you must first unprotect the sheet (Review tab, Changes
> group, Unprotect Sheet button). You may be prompted for a password.
> (Microsoft Excel)
Any advice is welcome. I can manually unprotect the cells, but I have to do this to hundreds of files on a daily basis.
My end goal here is to have the data from the 100s of spreadsheets imported into R for analytics. I do not need to export back into Excel. I also do not need to import the protected cells into R, so if there was a way to skip them that would work.
EDIT: New issue has emerged related to this operation. I get an error in R when I try to do the extraction on a shared workbook:
80020009 Error: Exception occurred.
If I manually go into Excel and unshare the workbook (under Review->Share Workbook->Uncheck Allow changes made by more than one user), the extraction works. Is there a way to do this programmatically with excel.link?
Try the following code:
library(excel.link)
filename = "shared.xlsx"
xl.workbook.open(filename, password = "test")
# here we resave workbook to the temporary folder with exclusive access
new_path = paste0(tempdir(), "\\", filename)
xl()[["Activeworkbook"]]$saveas(new_path, AccessMode=xl.constants$xlExclusive)
###
xl()[["Activesheet"]]$Unprotect(password = "test")
data = crc[a1]
xl.workbook.close()
unlink(new_path) # remove temporary Excel File
UPDATE 2018.07.16 Add code for saving workbook with exclusive access.

reading gctx file in R

I am trying to read a gctx file extracted from the LINCS source for gene expression analysis. The code for reading the file is provided at the link below.
https://github.com/cmap/l1ktools.
I have sourced the script provided there. However, when I call the function parse.gctx, it gives me the following error:
ds <- parse.gctx("../L1000 Data/zspc_n40172x22268.gctx")
reading ../L1000 Data/zspc_n40172x22268.gctx
Error in h5checktypeOrOpenLoc(file, readonly = TRUE) :
Error in h5checktypeOrOpenLoc(). Cannot open file. File 'C:\L1000 Data\zspc_n40172x22268.gctx' does not exist.
How can I resolve this issue and read my gctx file?
Since you're getting a 'file does not exist' error, I think the problem is because you have a space in the path to the file you're trying to read (specifically, in "L1000 Data"); if you remove the space in the path it should parse properly.
In other words, try renaming your "L1000 Data" folder so that instead of:
ds <- parse.gctx("../L1000 Data/zspc_n40172x22268.gctx")
you have something along the lines of:
ds <- parse.gctx("../L1000_Data/zspc_n40172x22268.gctx")
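Whether the space is really the cause is hard to tell from the message alone, since the error reports an absolute path ('C:\L1000 Data\...') that was resolved from the relative one. A quick check of what the relative path resolves to can narrow this down. The helper below is a minimal sketch using only base R; its name, resolve_existing_path, is made up for this example.

```r
# Return the absolute path if the file exists; otherwise stop with a message
# that shows the working directory the relative path was resolved against.
resolve_existing_path <- function(path) {
  if (!file.exists(path)) {
    stop(sprintf("File not found: '%s' (working directory: '%s')",
                 path, getwd()))
  }
  normalizePath(path)
}

# Hypothetical usage with the path from the question:
# gctx_path <- resolve_existing_path("../L1000_Data/zspc_n40172x22268.gctx")
# ds <- parse.gctx(gctx_path)
```

If the check fails, comparing the reported working directory with the folder that actually holds the file shows whether the relative path, rather than the folder name, is the problem.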

A disk error occurred during a write operation. (Exception from HRESULT: 0x8003001D (STG_E_WRITEFAULT))

I am using EPPlus to read a .csv file in VB.NET.
When I run this code, I get the error "A disk error occurred during a write operation.
(Exception from HRESULT: 0x8003001D (STG_E_WRITEFAULT))"
Here is my code :
Public Function ImportExcelSheet(ByVal filePath As String) As DataTable
Dim dtImportData As New DataTable()
Try
'If csv file have header then "true" else "false"
Dim hasHeader As Boolean = True
Using pck = New OfficeOpenXml.ExcelPackage()
Using stream = File.OpenRead(filePath)
pck.Load(stream)
End Using
What should I do to fix this error?
I had the same error with a plugin I had created to import from Excel. Originally I had saved the import file as .xls.
I opened the spreadsheet in Excel and re-saved it as .xlsx.
This solved the problem.
So the cause may be the file format that the CSV was saved in.
I got the same error reading an .xls file. It turned out the workbook had hidden rows on the first sheet. I inspected the document and removed the hidden rows, and it worked perfectly.

Download generated excel file

Given the following code:
Microsoft.Office.Interop.Excel.Application excelFile = new Microsoft.Office.Interop.Excel.Application();
excelFile.Visible = false;
Workbook wb = excelFile.Workbooks.Add(XlWBATemplate.xlWBATWorksheet);
Worksheet sheet1 = wb.ActiveSheet as Worksheet;
sheet1.Name = "Test";
sheet1.Cells[1, 1] = "Test";
string fileName = Environment.GetFolderPath(System.Environment.SpecialFolder.DesktopDirectory) + "\\tickets.xlsx";
wb.SaveAs(Filename: fileName, FileFormat: XlFileFormat.xlOpenXMLWorkbook, AccessMode: XlSaveAsAccessMode.xlNoChange);
wb.Close();
excelFile.UserControl = true;
excelFile.Quit();
This generates an Excel file and saves it to the desktop. What do I have to change to ask for a save location?
Using Excel on the server is not supported and opens a whole can of worms. In particular, there is a high risk that Excel pops up a dialog that cannot be dismissed because no one sees the server desktop. Excel is also very slow, which creates a critical bottleneck. Finally, debugging this is nearly impossible. This solution works, but it will never work well.
The solution: use a library like EPPlus, which can read and write xlsx files easily, is faster to develop with, is orders of magnitude faster at building the file, and is free. There are other libraries that can read xls files, if needed.
Prior to setting the filename, you could open a SaveFileDialog.
