EPPlus 'System.OutOfMemoryException' - asp.net

I am trying to open a 38 MB Excel file using EPPlus v4.0. I can pass it to an ExcelPackage variable, but when I try to get the workbook from that variable, it throws a 'System.OutOfMemoryException'.
Here's my code:
Dim temppath = Path.GetTempPath()
Dim filenamestr As String = Path.GetFileNameWithoutExtension(Path.GetRandomFileName())
Dim tempfilename As String = Path.Combine(temppath, filenamestr + ".xlsx")
fileUploadExcel.SaveAs(tempfilename)
Dim XLPack = New ExcelPackage(File.OpenRead(tempfilename))
GC.Collect()
If File.Exists(tempfilename) Then
File.Delete(tempfilename)
End If
Dim xlWorkbook As ExcelWorkbook = XLPack.Workbook 'the error shows here
I'm stuck. Any help would really be appreciated. Thanks in advance.

You are probably hitting the RAM limit, as that is a big file. If you have the option to compile to 64-bit, you might be able to solve the problem:
https://stackoverflow.com/a/29912563/1324284
But if you can only compile to x86, there is not a whole lot you can do with EPPlus. You will have to either use a different library or build the XML files for Excel yourself:
https://stackoverflow.com/a/26802061/1324284
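To see why a 38 MB workbook can strain a 32-bit process: an .xlsx file is a ZIP archive of XML parts, so its size on disk understates what a library that loads the whole workbook has to hold in memory. The following is a minimal Python sketch (standard library only, not EPPlus code) that compares packed and unpacked sizes. The helper names are hypothetical, and the demo archive is a stand-in built in memory, not a real workbook.

```python
import io
import zipfile


def xlsx_part_sizes(source):
    """Return (name, compressed bytes, uncompressed bytes) for each part of an .xlsx archive."""
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.compress_size, info.file_size)
                for info in archive.infolist()]


def total_uncompressed(source):
    """Sum the uncompressed size of all parts, in bytes."""
    return sum(unpacked for _, _, unpacked in xlsx_part_sizes(source))


# Stand-in archive (an assumption for the demo): repetitive XML compresses
# very well, which is typical of worksheet parts.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("xl/worksheets/sheet1.xml",
                     "<row><c><v>1</v></c></row>" * 100_000)

for name, packed, unpacked in xlsx_part_sizes(demo):
    print(f"{name}: {packed} bytes packed, {unpacked} bytes unpacked")
```

Passing the path of the uploaded file instead of `demo` gives the numbers for a real workbook. The object model a library builds from that XML is usually larger again than the raw XML.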

Essential XlsIO is an option for loading large Excel files using .NET.
The whole suite of controls is available for free (commercial applications also) through the community license program if you qualify (less than 1 million US Dollars in revenue). The community license is the full product with no limitations or watermarks.
Note: I work for Syncfusion.


"Error: expected <" when reading .xlsx and .xls files into R with readxl package

I am reading a batch of excel files into r using the readxl package and a for loop. Here is my simplified code:
filelist = list.files(input.dir) #get list of excel files
for (i in seq_along(filelist)){
read_excel(filelist[i], col_names=F)
}
I am able to read some files without issue. Others are read, but the result is a 0x0 tibble. Other files stop the loop and force the error message:
Error: expected <
I can bypass the issue by opening a problem file, making any minor edit, and saving. However, it is not feasible to do this for every file - I have over 1,000 in total. You can find a subset of my excel files here. For your reference, "AMEX1-61-2020-PH.xlsx" is one of the problem files that returns the error message. I am using readxl version 1.3.1 and r version 4.1.1. Thank you!
It is possible that the files are corrupt (for example, the actual file format does not match the extension), and that this is what stops R from opening them.
One possible solution is an Excel macro that opens, re-saves and closes each file in the directory defined by the "strDirectory" variable below.
Once the macro has processed all of the files in the directory, the files should work in R.
Sub ResaveFiles()
Dim varDirectory As Variant
Dim flag As Boolean
Dim i As Integer
Dim strDirectory As String
Dim wb As Workbook
strDirectory = "/Users/stacktest/Downloads/"
i = 1
flag = True
varDirectory = Dir(strDirectory & "*.xls*", vbNormal)
While flag = True
If varDirectory = "" Then
flag = False
Else
' Dir returns only the file name, so prepend the directory
Set wb = Application.Workbooks.Open(strDirectory & varDirectory)
' Rewrite a cell with its own value so the workbook counts as modified
wb.Worksheets(1).Cells(1, 1) = wb.Worksheets(1).Cells(1, 1)
wb.Save
wb.Close
' Move on to the next matching file
varDirectory = Dir
i = i + 1
End If
Wend
End Sub
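Before re-saving all 1,000 files, it may help to confirm the "wrong format for the extension" theory by looking at the first bytes of each file. The following is a hedged Python sketch (standard library only; the function names, the "markup" label and the demo file names are hypothetical). It relies on two well-known signatures: .xlsx files are ZIP containers starting with "PK", and legacy .xls files are OLE2 compound documents.

```python
import tempfile
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx / .xlsm are ZIP containers
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls (OLE2)
EXPECTED = {".xlsx": "zip", ".xlsm": "zip", ".xls": "ole2"}


def sniff_format(path):
    """Guess the real container format from the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(512)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.lstrip().lower().startswith((b"<html", b"<!doctype", b"<?xml", b"<table")):
        return "markup"  # HTML or XML saved under a spreadsheet extension
    return "unknown"


def find_mismatches(directory):
    """Return (file name, detected format) pairs where extension and content disagree."""
    result = []
    for path in sorted(Path(directory).iterdir()):
        wanted = EXPECTED.get(path.suffix.lower())
        if wanted is not None and sniff_format(path) != wanted:
            result.append((path.name, sniff_format(path)))
    return result


# Demo with made-up files; point find_mismatches at the real folder instead.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.xlsx").write_bytes(ZIP_MAGIC + b"rest of archive")
    Path(tmp, "old_format.xlsx").write_bytes(OLE2_MAGIC + b"binary workbook")
    Path(tmp, "web_export.xls").write_bytes(b"<html><table></table></html>")
    mismatches = find_mismatches(tmp)

print(mismatches)
```

Only the files this reports would then need the re-save treatment. A file that passes the check can still be damaged inside, so this narrows the problem down without proving a file is healthy.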

Import data from excel with HSSF in R

I'm trying to import data from an excel file into R, with the library xlsx. I get the error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, : org.apache.poi.EncryptedDocumentException: The supplied
spreadsheet seems to be an Encrypted .xlsx file. It must be decrypted
before use by XSSF, it cannot be used by HSSF
I changed the file from filename.xlsx to filename.xls, but I keep getting the same message.
I also tried the advice in these links:
Import password-protected xlsx workbook into R
How to read xlsx file in protect mode to R
but neither worked.
The sheets of my file are protected, but not the file itself.
It seems from the xlsx package website that support for password-protected spreadsheets is still being worked on, although a user, Heather, has made a fix.
See https://code.google.com/p/rexcel/issues/detail?id=49
But it is not clear whether this extends to protected sheets as well.
Fercho - Can you try other workarounds?
1. Save as csv and use read.csv to get the data into R?
2. Save a version of the Excel file without protected sheets for your data input?
3. Try other Excel-to-R packages like XLConnect? This package seems more up to date.
EDIT: Mango Solutions has a comparison of Excel and R tools. openxlsx can handle password-protected sheets but is slower than XLConnect.
Code for 1 above:
' Microsoft for Excel VBA for saving as csv
' First Select your sheet to turn to CSV file and then run code like this
' Save sheet as csv
ThisWorkbook.SaveAs Filename:=strSaveFilename, _
FileFormat:= xlCSV
Workbook.SaveAs Method
' SYNTAX expression .SaveAs(FileName, FileFormat, Password, WriteResPassword, ReadOnlyRecommended, CreateBackup, AccessMode, ConflictResolution, AddToMru, TextCodepage, TextVisualLayout, Local)
Thanks, I finally did it in VBA. It takes a little while to run, but it works. Here is the VBA code I used:
Sub LoopThroughFiles()
Dim FolderName As String
Dim Fname As String
Dim wb As Workbook
Dim ws As Worksheet
FolderName = "C:\folder with files\"
If Right(FolderName, 1) <> Application.PathSeparator Then FolderName = FolderName & Application.PathSeparator
Fname = Dir(FolderName & "*.xls")
'loop through the files
Do While Len(Fname) > 0
Set wb = Workbooks.Open(FolderName & Fname)
For Each ws In wb.Worksheets
' try both passwords; ignore the error raised by the wrong one
On Error Resume Next
ws.Unprotect Password:="password 1"
ws.Unprotect Password:="password 2"
On Error GoTo 0
Next ws
' save the unprotected workbook and close it before opening the next one
wb.Close SaveChanges:=True
' go to the next file in the folder
Fname = Dir
Loop
Application.Quit
End Sub
I used two passwords to unlock the sheets. I didn't know which password applied to which file, so I try both on each file.
Thanks again for the help.

Download generated excel file

Given the following code:
Microsoft.Office.Interop.Excel.Application excelFile = new Microsoft.Office.Interop.Excel.Application();
excelFile.Visible = false;
Workbook wb = excelFile.Workbooks.Add(XlWBATemplate.xlWBATWorksheet);
Worksheet sheet1 = wb.ActiveSheet as Worksheet;
sheet1.Name = "Test";
sheet1.Cells[1, 1] = "Test";
string fileName = Environment.GetFolderPath(System.Environment.SpecialFolder.DesktopDirectory) + "\\tickets.xlsx";
wb.SaveAs(Filename: fileName, FileFormat: XlFileFormat.xlOpenXMLWorkbook, AccessMode: XlSaveAsAccessMode.xlNoChange);
wb.Close();
excelFile.UserControl = true;
excelFile.Quit();
This generates an Excel file and saves it to the desktop. What do I have to change so that it asks for a save location?
Using Excel on the server is not supported and opens a whole can of worms. In particular, there is a high risk that Excel pops up a dialog that cannot be dismissed because no one sees the server desktop. Excel is also very slow, which creates a critical bottleneck. Lastly, debugging this is nearly impossible. This solution works, but it will never work well.
The solution: use a library like EPPlus, which can read and write xlsx files easily, is faster to develop with, is orders of magnitude faster at building the file, and is free. There are other libraries out there that can read xls files, if needed.
Prior to setting the filename, you could open a SaveFileDialog.

how to export data from gridview to excel 2003,2007,2010 without warning message

I am trying to export data from a GridView to Excel. I have Office 2010 installed on my PC. When I try to open the Excel file, it gives me the error "the file you are trying to open is in a different format than specified by the file extension".
My Code for exporting gridview:
Protected Sub btnexptoexcel_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnexptoexcel.Click
Try
Response.Clear()
Response.Buffer = True
Response.AddHeader("content-disposition", "attachment;filename=Complaint_Details.xls")
Response.Charset = ""
Response.ContentType = "application/ms-excel"
Using sw As New StringWriter()
Dim hw As New HtmlTextWriter(sw)
grd_ComplaintDetails.AllowPaging = False
grd_ComplaintDetails.HeaderRow.BackColor = Color.White
For Each cell As TableCell In grd_ComplaintDetails.HeaderRow.Cells
cell.BackColor = grd_ComplaintDetails.HeaderStyle.BackColor
Next
For Each row As GridViewRow In grd_ComplaintDetails.Rows
row.BackColor = Color.White
For Each cell As TableCell In row.Cells
If row.RowIndex Mod 2 = 0 Then
cell.BackColor = grd_ComplaintDetails.AlternatingRowStyle.BackColor
Else
cell.BackColor = grd_ComplaintDetails.RowStyle.BackColor
End If
cell.CssClass = "textmode"
Next
Next
grd_ComplaintDetails.RenderControl(hw)
Dim style As String = "<style> .textmode { } </style>"
Response.Write(style)
Response.Output.Write(sw.ToString())
Response.Flush()
Response.[End]()
End Using
Catch ex As Exception
div_Msg.InnerText = "Can not generate Excel File"
End Try
End Sub
My question is: when I open the file (in MS Office 2003, 2007 or 2010), it shouldn't give me the file extension error.
Can you please tell me what changes I should make in the code?
The article from Microsoft says Excel uses the following extensions:
.xls – The Excel 97 - Excel 2003 binary file format (BIFF8).
.xlsx – The default Office Excel 2007 XML-based file format. Cannot store Microsoft Visual Basic for Applications (VBA) macro code or Microsoft Office Excel 4.0 macro sheets (.xlm).
.xlt – The Excel 97 - Excel 2003 binary file format (BIFF8) for an Excel template.
.xlsm – The Office Excel 2007 XML-based and macro-enabled file format. Stores VBA macro code or Excel 4.0 macro sheets (.xlm).
.xltx – The default Office Excel 2007 file format for an Excel template. Cannot store VBA macro code or Excel 4.0 macro sheets (.xlm).
.xla – The Excel 97-2003 Add-In, a supplemental program that is designed to run additional code. Supports the use of VBA projects.
Your target is MS Office 2003, 2007 or 2010,
so you have to choose
Response.AddHeader("content-disposition", "attachment;filename=Complaint_Details.xlsx")
instead of Complaint_Details.xls.
If you can do it, try saving your file as .xlsx instead (excel 2007+)
Response.AddHeader("content-disposition", "attachment;filename=Complaint_Details.xlsx")
Also, I suggest trying this free library for working with xlsx Excel files. EPPlus is very good. epplus.codeplex.com
If you specifically need xls files (Excel 2003 and older), EPPlus does not support them, but you can use another library, NPOI (https://npoi.codeplex.com/), which does.
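Whichever extension is chosen, the Content-Type header should agree with it, and the response body has to really be in that format. The headers only label the bytes; they do not convert them. The following is a small Python sketch of the pairing (the function name `download_headers` is hypothetical, and only the two formats discussed here are mapped; the MIME types themselves are the registered ones for these formats):

```python
from pathlib import PurePath

# Registered MIME types for the two workbook formats discussed above.
CONTENT_TYPES = {
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def download_headers(filename):
    """Build matching Content-Type and Content-Disposition headers for a workbook download."""
    extension = PurePath(filename).suffix.lower()
    if extension not in CONTENT_TYPES:
        raise ValueError(f"no content type known for {extension!r}")
    return {
        "Content-Type": CONTENT_TYPES[extension],
        "Content-Disposition": f'attachment; filename="{filename}"',
    }


print(download_headers("Complaint_Details.xlsx"))
```

In the VB.NET handler from the question, these values would go into Response.ContentType and Response.AddHeader.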
I've been having the same issue. I finally opened what RenderControl was outputting in Notepad and saw that it is actually a web page, regardless of what the extension is. That is why you get the warning message. The user actually has to save it from Excel as an Excel file.
One of the other downsides is that what is in the grid renders all of the code also. So, the column headers will link back to a doPostBack function and hyperlinks will still have references that are invalid.
I have seen this method posted many times, but it is not really a perfect solution.

Converting Docx to image using Docx4j and PdfBox causes OutOfMemoryError

I'm converting the first page of a docx file to an image in two steps using docx4j and pdfbox, but I'm currently getting an OutOfMemoryError every time.
I've been able to determine that the exception is thrown at the very last step of this process, while the convertToImage method is being called. However, I've been using the second step of this method to convert PDFs for some time now without issue, so I am at a loss as to what the cause might be, unless perhaps docx4j is encoding the PDF in a way that I have not yet tested, or the PDF is corrupt.
I've tried replacing the ByteArrayOutputStream with a FileOutputStream, and the PDF seems to render correctly and is not any larger than I would expect.
This is the code I am using:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(file);
org.docx4j.convert.out.pdf.PdfConversion c = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);
((org.docx4j.convert.out.pdf.viaXSLFO.Conversion)c).setSaveFO(File.createTempFile("fonts", ".fo"));
ByteArrayOutputStream os = new ByteArrayOutputStream();
c.output(os, new PdfSettings());
byte[] bytes = os.toByteArray();
os.close();
ByteArrayInputStream is = new ByteArrayInputStream(bytes);
PDDocument document = PDDocument.load(is);
PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 96);
is.close();
document.close();
Edit
To give more context on this situation, this code is being run in a Grails web application. I have tried several different variants of this code, including nulling out everything once it is no longer needed, and using FileInputStream and FileOutputStream to try to conserve more physical memory and to inspect the output of docx4j and pdfbox, each of which seems to work correctly.
I'm using docx4j 2.8.1 and pdfbox 0.7.3. I have also tried pdf-renderer, but I still get an OutOfMemoryError. My suspicion is that docx4j is using too much memory but the error does not surface until the PDF-to-image conversion.
I would gladly accept an alternate way of converting a docx file to a PDF or directly to an image as an answer. I am currently trying to replace jodconverter, which has been problematic to run on a server.
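A back-of-the-envelope check of that suspicion: the raster produced by convertToImage is small at 96 DPI. The sketch below is plain Python arithmetic, not pdfbox code, and it assumes an A4 page (595 x 842 points), which the question does not state. TYPE_INT_RGB stores one 32-bit int per pixel, hence 4 bytes per pixel; the function name is hypothetical.

```python
import math

POINTS_PER_INCH = 72  # PDF user-space units per inch
BYTES_PER_PIXEL = 4   # BufferedImage.TYPE_INT_RGB: one 32-bit int per pixel


def raster_bytes(width_pt, height_pt, dpi):
    """Approximate size of the pixel buffer for one rendered page."""
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px * height_px * BYTES_PER_PIXEL


A4_POINTS = (595, 842)  # assumed page size, not taken from the question

for dpi in (96, 300, 600):
    size_mib = raster_bytes(*A4_POINTS, dpi) / 2**20
    print(f"{dpi} dpi: {size_mib:.1f} MiB")
```

Under these assumptions, the 96 DPI buffer comes to a few megabytes, so the image alone is unlikely to exhaust the heap. That fits the theory that memory is already nearly used up by the time the last step runs.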
I'm part of XDocreport team.
We recently developed a little web app, deployed on CloudBees (http://xdocreport-converter.opensagres.cloudbees.net/), that shows how the converters behave.
You can easily compare the behaviour and performance of docx4j and XDocReport for PDF and HTML conversion.
Source code can be found here :
https://github.com/pascalleclercq/xdocreport-demo (REST-Service-Converter-WebApplication subfolder).
and here :
https://github.com/pascalleclercq/xdocreport/blob/master/remoting/fr.opensagres.xdocreport.remoting.converter.server/src/main/java/fr/opensagres/xdocreport/remoting/converter/server/ConverterResourceImpl.java
The first numbers I get show that XDocReport is roughly 10 times faster than docx4j at generating a PDF.
Feedback is welcome.
Glorious success at last! I replaced docx4j with XDocReport and the document converts to a PDF in no time at all. However, there seem to be issues with some documents. I expect this is due to the OS they were created on, and it may be solved by using:
PDFViaITextOptions options = PDFViaITextOptions.create().fontEncoding("windows-1250");
with the appropriate encoding for that OS, instead of just:
PDFViaITextOptions options = PDFViaITextOptions.create();
which defaults to the current OS.
This is the code I now use to convert from DOCX to PDF:
FileInputStream in = new FileInputStream(file);
XWPFDocument document = new XWPFDocument(in);
in.close(); // the docx has been read into memory at this point
PDFViaITextOptions options = PDFViaITextOptions.create();
ByteArrayOutputStream out = new ByteArrayOutputStream();
XWPF2PDFViaITextConverter.getInstance().convert(document, out, options);
byte[] bytes = out.toByteArray();
out.close();
ByteArrayInputStream is = new ByteArrayInputStream(bytes);
// named "pdf" because "document" is already taken by the XWPFDocument above
PDDocument pdf = PDDocument.load(is);
PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages().get(0);
BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 96);
is.close();
pdf.close();
return image;
