Given the following code:
Microsoft.Office.Interop.Excel.Application excelFile = new Microsoft.Office.Interop.Excel.Application();
excelFile.Visible = false;
Workbook wb = excelFile.Workbooks.Add(XlWBATemplate.xlWBATWorksheet);
Worksheet sheet1 = wb.ActiveSheet as Worksheet;
sheet1.Name = "Test";
sheet1.Cells[1, 1] = "Test";
string fileName = Environment.GetFolderPath(System.Environment.SpecialFolder.DesktopDirectory) + "\\tickets.xlsx";
wb.SaveAs(Filename: fileName, FileFormat: XlFileFormat.xlOpenXMLWorkbook, AccessMode: XlSaveAsAccessMode.xlNoChange);
wb.Close();
excelFile.UserControl = true;
excelFile.Quit();
This generates an Excel file and saves it to the desktop. What do I have to change so that it asks for a save location?
Using Excel on a server is not supported and opens a whole can of worms. In particular, there is a high risk that Excel pops up a dialog that cannot be dismissed because no one sees the server desktop. Excel is also very slow, which creates a critical bottleneck. Finally, debugging this is nearly impossible: this solution works, but it will never work well.
The solution: use a library like EPPlus, which can read and write xlsx files easily, is faster to develop with, is orders of magnitude faster at building the file, and is free (versions up to 4.5 are LGPL-licensed; later versions need a commercial license for commercial use). There are other libraries that can read xls files, if needed.
Before setting the file name, you could open a SaveFileDialog and use the path the user selects.
I am reading a batch of Excel files into R using the readxl package and a for loop. Here is my simplified code:
filelist = list.files(input.dir) #get list of excel files
for (i in seq_along(filelist)){
read_excel(filelist[i], col_names=F)
}
I am able to read some files without issue. Others are read, but the result is a 0x0 tibble. Still others stop the loop with the error message:
Error: expected <
I can bypass the issue by opening a problem file, making any minor edit, and saving. However, it is not feasible to do this for every file, since I have over 1,000 in total. You can find a subset of my Excel files here. For your reference, "AMEX1-61-2020-PH.xlsx" is one of the problem files that returns the error message. I am using readxl version 1.3.1 and R version 4.1.1. Thank you!
It is possible that the files are corrupt (e.g., the actual file format does not match the extension), which would explain why R has trouble opening them.
Here is a possible solution: an Excel macro that opens, re-saves, and closes each file in the directory defined by the "strDirectory" variable below.
Once the macro has processed all of the files in the directory, they should work in R.
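Sub ResaveFiles()
    Dim varDirectory As Variant
    Dim flag As Boolean
    Dim i As Integer
    Dim strDirectory As String
    Dim wb As Workbook
    ' Folder that holds the workbooks; it must end with a path separator
    strDirectory = "/Users/stacktest/Downloads/"
    i = 1
    flag = True
    ' Dir returns one bare file name per call, without the folder
    varDirectory = Dir(strDirectory & "*.xls*", vbNormal)
    While flag = True
        If varDirectory = "" Then
            flag = False
        Else
            ' Prepend the folder, because Dir does not return the full path
            Set wb = Application.Workbooks.Open(strDirectory & varDirectory)
            ' Rewrite a cell with its own value so the workbook counts as changed
            Cells(1, 1) = Cells(1, 1)
            wb.Save
            wb.Close
            varDirectory = Dir
            i = i + 1
        End If
    Wend
End Sub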
The saveWorkbook() function in XLConnect saves the workbook, and the changes and updated calculations are visible in the Excel file but not in R (because the workbook contains a formula that Apache POI does not support).
To view the cell, I save the file to disk and read it with another function. But when I read the same file again, the calculated fields still show the old values. I don't want to save the Excel file manually every time I make a change in the workbook.
Do you know a workaround for reading the new values without manually saving in Excel?
Code:
options(java.parameters = "-Xmx1024m")
library(rJava)
library(XLConnect)
wb = loadWorkbook(file.choose(), create = TRUE)
readWorksheet(wb,16, region = 'D25:D26')
writeWorksheet(wb,-.45,sheet = 16,startRow = 25,startCol = 4)
setForceFormulaRecalculation(wb,sheet = 16, TRUE)
saveWorkbook(wb)
detach("package:XLConnect", unload=TRUE)
detach("package:XLConnectJars", unload=TRUE)
library(xlsx)
y = read.xlsx(file.choose(), sheetIndex = 16)
So the Excel file on the system shows the changes corresponding to the new -.45 value but when I read the file again, the calculated values are the old values and not the new ones. This gets fixed if I save the file manually.
I believe the command you are using is correct but maybe some small modifications would make this work.
I think you could try placing the needed calculations in a different sheet in Excel and treating the data you inserted as a dependency for the calculations in that new sheet.
Then read it in as a fresh workbook and call the new sheet. I think that will give you the output you need.
setForceFormulaRecalculation(wb, sheet = "*", TRUE)
I would use this command to force all sheets to recalculate instead.
Hope that helps!
I am trying to open a 38 MB Excel file using EPPlus v4.0. I can pass it to the ExcelPackage variable, but when I try to get the workbook from that variable, it throws a 'System.OutOfMemoryException'.
Here's my code:
Dim temppath = Path.GetTempPath()
Dim filenamestr As String = Path.GetFileNameWithoutExtension(Path.GetRandomFileName())
Dim tempfilename As String = Path.Combine(temppath, filenamestr + ".xlsx")
fileUploadExcel.SaveAs(tempfilename)
Dim XLPack = New ExcelPackage(File.OpenRead(tempfilename))
GC.Collect()
If File.Exists(tempfilename) Then
File.Delete(tempfilename)
End If
Dim xlWorkbook As ExcelWorkbook = XLPack.Workbook 'the error shows here
I'm stuck. Any help would really be appreciated. Thanks in advance.
You are probably hitting the RAM limit, as that is a big file. If you have the option to compile to 64-bit, you might be able to solve the problem:
https://stackoverflow.com/a/29912563/1324284
But if you can only compile to x86, there is not a whole lot you can do with EPPlus. You will have to either use a different library or build the XML files for Excel yourself:
https://stackoverflow.com/a/26802061/1324284
Essential XlsIO is an option for loading large Excel files using .NET.
The whole suite of controls is available for free (commercial applications also) through the community license program if you qualify (less than 1 million US Dollars in revenue). The community license is the full product with no limitations or watermarks.
Note: I work for Syncfusion.
I'm trying to import data from an Excel file into R with the xlsx library. I get the error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, : org.apache.poi.EncryptedDocumentException: The supplied
spreadsheet seems to be an Encrypted .xlsx file. It must be decrypted
before use by XSSF, it cannot be used by HSSF
I changed the file from filename.xlsx to filename.xls, but I keep getting the same message.
I also tried the advice in these links:
Import password-protected xlsx workbook into R
How to read xlsx file in protect mode to R
but it didn't work.
The sheets of my file are protected but not the file itself.
It seems from the xlsx package website that support for password-protected spreadsheets is still being worked on, although a user, Heather, has contributed a fix.
See https://code.google.com/p/rexcel/issues/detail?id=49
But it is not clear if this extends to protected sheets as well.
Fercho - Can you try other workarounds?
1. Save as csv and use read.csv to get data into R?
2. Save a version of Excel file without protected sheets for your data input?
3. Try other Excel to R programs like XLConnect? This package seems more up to date.
EDIT: Mango Solutions has a comparison of Excel and R tools. openxlsx can handle password protected sheets but is slower than XLConnect.
CODE for 1 Above
' Microsoft for Excel VBA for saving as csv
' First Select your sheet to turn to CSV file and then run code like this
' Save sheet as csv
ThisWorkbook.SaveAs Filename:=strSaveFilename, _
FileFormat:= xlCSV
Workbook.SaveAs Method
' SYNTAX expression .SaveAs(FileName, FileFormat, Password, WriteResPassword, ReadOnlyRecommended, CreateBackup, AccessMode, ConflictResolution, AddToMru, TextCodepage, TextVisualLayout, Local)
Thanks, I finally did it in VBA. It takes a little while, but it works. Here is the code I used:
Sub LoopThroughFiles()
FolderName = "C:\folder with files\"
If Right(FolderName, 1) <> Application.PathSeparator Then FolderName = FolderName & Application.PathSeparator
Fname = Dir(FolderName & "*.xls")
'loop through the files
Do While Len(Fname)
With Workbooks.Open(FolderName & Fname)
Dim ws As Worksheet
For Each ws In ActiveWorkbook.Worksheets
On Error Resume Next
ws.Unprotect Password:="password 1"
ws.Unprotect Password:="password 2"
On Error GoTo 0
Next ws
For Each w In Application.Workbooks
w.Save
Next w
End With
' go to the next file in the folder
Fname = Dir
Loop
Application.Quit
End Sub
I used two passwords to unlock the sheets. I didn't know which password applied to which file, so I tried both on each file.
Thanks again for the help.
I'm converting the first page of a docx file to an image in two steps using docx4j and pdfbox, but I'm currently getting an OutOfMemoryError every time.
I've been able to determine that the exception is thrown at the very last step of this process, while the convertToImage method is being called. However, I've been using the second step of this method to convert PDFs for some time now without issue, so I am at a loss as to the cause, unless docx4j is encoding the PDF in a way that I have not yet tested, or the PDF is corrupt.
I've tried replacing the ByteArrayOutputStream with a FileOutputStream, and the PDF seems to render correctly and is not any larger than I would expect.
This is the code I am using:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(file);
org.docx4j.convert.out.pdf.PdfConversion c = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);
((org.docx4j.convert.out.pdf.viaXSLFO.Conversion)c).setSaveFO(File.createTempFile("fonts", ".fo"));
ByteArrayOutputStream os = new ByteArrayOutputStream();
c.output(os, new PdfSettings());
byte[] bytes = os.toByteArray();
os.close();
ByteArrayInputStream is = new ByteArrayInputStream(bytes);
PDDocument document = PDDocument.load(is);
PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 96);
is.close();
document.close();
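As a rough sanity check on that last step, the raster produced by a 96 DPI render is small. The sketch below is plain arithmetic, not PDFBox code: it assumes the page is scaled by resolution / 72 (PDF user space is 72 points per inch) and that a TYPE_INT_RGB image stores each pixel in one 4-byte int. The class name, the method names, and the US Letter page size are assumptions made for illustration.

```java
public class RasterEstimate {
    // PDF user space defaults to 72 points per inch.
    static final double POINTS_PER_INCH = 72.0;
    // BufferedImage.TYPE_INT_RGB packs each pixel into one 4-byte int.
    static final int BYTES_PER_PIXEL = 4;

    /** Converts a length in PDF points to pixels at the given DPI. */
    static int toPixels(double points, int dpi) {
        return (int) Math.round(points / POINTS_PER_INCH * dpi);
    }

    /** Approximate size in bytes of the pixel data for one rendered page. */
    static long rasterBytes(double widthPt, double heightPt, int dpi) {
        return (long) toPixels(widthPt, dpi) * toPixels(heightPt, dpi) * BYTES_PER_PIXEL;
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (an assumed page size for illustration).
        long bytes = rasterBytes(612, 792, 96);
        System.out.printf("96 DPI Letter page: about %.1f MB%n", bytes / (1024.0 * 1024.0));
    }
}
```

If the estimate comes out at only a few megabytes, the image itself is unlikely to be what exhausts the heap; the error more plausibly reflects memory already held by the earlier steps.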
Edit
To give more context on this situation, this code is being run in a Grails web application. I have tried several different variants of this code, including nulling out everything once it is no longer needed, and using FileInputStream and FileOutputStream to try to conserve more physical memory and to inspect the output of docx4j and pdfbox, each of which seems to work correctly.
I'm using docx4j 2.8.1 and pdfbox 0.7.3. I have also tried pdf-renderer, but I still get an OutOfMemoryError. My suspicion is that docx4j is using too much memory, but the error does not surface until the PDF-to-image conversion.
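One way to test that suspicion is to log heap usage after each step (after loading the docx, after writing the PDF, after PDDocument.load) and see where it jumps. A minimal sketch using only java.lang.Runtime follows; HeapProbe and its methods are hypothetical helper names, and the byte array merely stands in for a real conversion step.

```java
public class HeapProbe {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use by this JVM, in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line summary of used heap versus the JVM's maximum heap. */
    static String report(String label) {
        long maxMb = Runtime.getRuntime().maxMemory() / MB;
        return label + ": " + (usedHeapBytes() / MB) + " MB used of " + maxMb + " MB max";
    }

    public static void main(String[] args) {
        System.out.println(report("start"));
        // Stand-in for a conversion step: hold 16 MB so the change is visible.
        byte[] held = new byte[16 * 1024 * 1024];
        System.out.println(report("after allocating " + (held.length / MB) + " MB"));
    }
}
```

Calling report(...) between the docx4j and pdfbox steps would show whether the heap is already near its maximum before convertToImage runs. The numbers are approximate, since the garbage collector may not have reclaimed unreachable objects yet.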
I would gladly accept an alternative way of converting a docx file to a PDF or directly to an image as an answer. I am currently trying to replace jodconverter, which has been problematic to run on a server.
I'm part of the XDocReport team.
We recently developed a small web app, deployed on CloudBees (http://xdocreport-converter.opensagres.cloudbees.net/), that shows the behaviour of the converters.
You can easily compare the behaviour and performance of docx4j and XDocReport for PDF and HTML conversion.
Source code can be found here :
https://github.com/pascalleclercq/xdocreport-demo (REST-Service-Converter-WebApplication subfolder).
and here :
https://github.com/pascalleclercq/xdocreport/blob/master/remoting/fr.opensagres.xdocreport.remoting.converter.server/src/main/java/fr/opensagres/xdocreport/remoting/converter/server/ConverterResourceImpl.java
The first numbers I got show that XDocReport is roughly 10 times faster than docx4j at generating a PDF.
Feedback is welcome.
Glorious success at last! I replaced docx4j with XDocReport and the document converts to a PDF in no time at all. However, there seem to be issues with some documents. I expect this is due to the OS they were created on, and it may be solved by using:
PDFViaITextOptions options = PDFViaITextOptions.create().fontEncoding("windows-1250");
That is, using the font encoding appropriate to the originating OS, instead of just:
PDFViaITextOptions options = PDFViaITextOptions.create();
Which defaults to the current OS.
This is the code I now use to convert from DOCX to PDF:
FileInputStream in = new FileInputStream(file);
XWPFDocument document = new XWPFDocument(in);
PDFViaITextOptions options = PDFViaITextOptions.create();
ByteArrayOutputStream out = new ByteArrayOutputStream();
XWPF2PDFViaITextConverter.getInstance().convert(document, out, options);
byte[] bytes = out.toByteArray();
out.close();
ByteArrayInputStream is = new ByteArrayInputStream(bytes);
// Named "pdf" because "document" is already taken by the XWPFDocument above
PDDocument pdf = PDDocument.load(is);
PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages().get(0);
BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 96);
is.close();
pdf.close();
return image;