We have an automated process that opens an Excel template file, writes rows of data, and returns the file to the user. This process is usually fast, but I was recently asked to add a summary page with some Excel formulas to one of the templates, and now it takes forever.
With about 5 records it finishes after a few minutes, but this week's record set is almost 400 rows, and the longest I've let it run is about half an hour before cancelling it. Without the formulas, it takes only a few seconds.
Are there any known issues with writing rows to an Excel file that contains formulas? Or is there a way to tell Excel not to evaluate formulas until the file is opened by a user?
The formulas on the summary Sheet are these:
' Returns count of cells in column where data = Y
=COUNTIF(Sheet1!J15:Sheet1!J10000, "Y")
=COUNTIF(Sheet1!F15:Sheet1!F10000, "Y")
' Return sum of column where data is a number greater than 0
' Column contains formula calculating the difference in months between two dates
=SUMIF(Sheet1!I15:Sheet1!I10000,">0",Sheet1!I15:Sheet1!I10000)
' Returns a count of distinct values in a column
=SUMPRODUCT((Sheet1!D15:Sheet1!D10000<>"")/COUNTIF(Sheet1!D15:Sheet1!D10000,Sheet1!D15:Sheet1!D10000&""))
And the code that writes to Excel looks something like this:
Dim xls as New Excel.Application()
Dim xlsBooks as Excel.Workbooks, xlsBook as Excel.Workbook
Dim xlsSheets as Excel.Sheets, xlsSheet as Excel.Worksheet
Dim xlsCells as Excel.Range
xls.Visible = False
xls.DisplayAlerts = False
xlsBooks = xls.Workbooks
xlsBooks.Open(templateFile)
xlsBook = xlsBooks.Item(1)
' Loop through Excel sheets. Some templates have multiple sheets.
For Each drSheet as DataRow in dtSheets.Rows
    xlsSheets = xlsBook.Worksheets
    xlsSheet = CType(xlsSheets.Item(drSheet("SheetName")), Excel.Worksheet)
    xlsCells = xlsSheet.Cells
    ' Loop through column list from database. Each template requires different columns
    For Each drDataCols as DataRow in dtDataCols.Rows
        ' Loop through rows to get data
        For Each drData as DataRow in dtData.Rows
            xlsCells(drSheet("StartRow") + dtData.Rows.IndexOf(drData), drDataCols("DataColumn")) = drData("Col" + drDataCols("DataColumn").toString).toString
        Next
    Next
Next
xlsSheet.SaveAs(newFile)
xlsBook.Close
xls.Quit()
Every time you write to a cell, Excel recalculates the open workbooks and refreshes the screen. Both of these things are slow, so you need to set Application.ScreenUpdating = False and Application.Calculation = xlCalculationManual before writing.
Also, there is a high overhead associated with each write to a cell, so it is much faster to accumulate the data in an array and then write the array to the range with a single call to the Excel object model.
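A minimal sketch of the array approach, assuming the same Microsoft.Office.Interop.Excel reference as the question and assuming the data goes into one contiguous block of columns. The module and the helper name WriteTable are hypothetical, not part of the original code.

```vbnet
Imports System.Data
Imports Excel = Microsoft.Office.Interop.Excel

Module BulkWriteSketch
    ' Hypothetical helper: writes every cell of dt to sheet in a single COM call,
    ' with the top-left corner at (startRow, startCol).
    Sub WriteTable(sheet As Excel.Worksheet, dt As DataTable,
                   startRow As Integer, startCol As Integer)
        If dt.Rows.Count = 0 OrElse dt.Columns.Count = 0 Then Return

        ' Build the values in memory first; this part never touches Excel.
        ' Values are converted to strings, mirroring the question's code.
        Dim values(dt.Rows.Count - 1, dt.Columns.Count - 1) As Object
        For r As Integer = 0 To dt.Rows.Count - 1
            For c As Integer = 0 To dt.Columns.Count - 1
                values(r, c) = dt.Rows(r)(c).ToString()
            Next
        Next

        ' Resize a one-cell range to the array's dimensions, then assign once.
        Dim topLeft As Excel.Range = CType(sheet.Cells(startRow, startCol), Excel.Range)
        Dim target As Excel.Range = topLeft.Resize(dt.Rows.Count, dt.Columns.Count)
        target.Value2 = values
    End Sub
End Module
```

With this shape, a 400-row data set costs one call into Excel per sheet instead of one call per cell.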
In automatic calculation mode, recalculation occurs after every data input or change. I had the same problem and solved it by switching to manual calculation mode. (Reference MSDN link.)
xls.Calculation = Excel.XlCalculation.xlCalculationManual
Also, this property can only be set after a Workbook has been opened or it will throw a run-time error.
One way that has saved me over the years is to add
Application.ScreenUpdating = False
directly before I execute a potentially lengthy method, and then
Application.ScreenUpdating = True
directly after, or at least at some later point in the code. This stops Excel from redrawing anything on the visible screen until the operation is complete. I've quite often found redrawing to be the cause of long-running operations.
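In VB.NET the same idea can be wrapped so that the settings are restored even if the work fails. This is only a sketch under the assumption that the Microsoft.Office.Interop.Excel reference from the question is used; the helper name RunQuietly is hypothetical.

```vbnet
Imports System
Imports Excel = Microsoft.Office.Interop.Excel

Module QuietModeSketch
    ' Hypothetical helper: runs work with recalculation and redrawing switched off,
    ' then restores the previous settings even if work throws.
    ' A workbook must already be open, otherwise accessing Calculation fails.
    Sub RunQuietly(app As Excel.Application, work As Action)
        Dim oldCalculation As Excel.XlCalculation = app.Calculation
        Dim oldScreenUpdating As Boolean = app.ScreenUpdating
        Try
            app.Calculation = Excel.XlCalculation.xlCalculationManual
            app.ScreenUpdating = False
            work()
        Finally
            ' Put both settings back exactly as they were found.
            app.Calculation = oldCalculation
            app.ScreenUpdating = oldScreenUpdating
        End Try
    End Sub
End Module
```

It could be called as RunQuietly(xls, Sub() FillSheets(xlsBook)), where FillSheets stands for whatever routine writes the cells.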
I am generating a .xls file, in which there is a column containing formulas like these:
IF(A1=1; 'xxx'; IF(A1=2; 'yyy'; IF(A1=3; 'zzz' ...
You know, it would be a SWITCH if it weren't an Excel formula.
The problem is that, depending on how many IFs I use, the time it takes to generate the .xls file grows exponentially.
The file size is not much different.
I have 18 cases, which means 18 IFs, and that takes an unacceptable amount of time.
Why is that so? Is there anything I might be doing wrong?
Here is a sample code:
for ($k = 1; $k < 16; $k++) {
    $cellID = "A".($row+$k);
    $codes_if = '=IF('.$cellID.'="1",4579,'
        .'IF('.$cellID.'="2",7978,'
        ... // some more IF's
        .'""))))))))))))))))';
    $actSheet->SetCellValue("B".($row+$k), $codes_if);
}
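A formula like this with multiple nested IFs will be inefficient anyway.
Consider replacing your multiple nested IFs with VLOOKUP instead,
e.g.
=VLOOKUP(A1,E1:F3,2,TRUE)
where column E contains the lookup values 1, 2, 3, ... and column F contains the return values xxx, yyy, zzz, etc. With TRUE as the last argument, column E must be sorted in ascending order, and a value with no exact match returns the entry for the closest smaller lookup value; pass FALSE if only exact matches should be returned.
That is the equivalent of a switch statement in MS Excel.
You will find that using VLOOKUP is a lot more efficient than your nested IFs.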
To reduce the time it takes to save the file further, be aware that PHPExcel calculates all formulae before saving by default. You can change that behaviour by calling
$objWriter->setPreCalculateFormulas(false);
before calling save().
I'm importing an .xls file using the following connection string:
If _
SetDBConnect( _
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & filepath & _
";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1""", True) Then
This has been working well for parsing through several Excel files that I've come across. However, with this particular file, when I SELECT * into a DataTable, there is a whole column of data, Item Description, missing from the DataTable. Why?
Here are some things that may set this particular workbook apart from the others that I've been working with:
The workbook has a freeze pane consisting of the first 24 rows (however, all of these rows appear in the DataTable)
There is some weird cell highlighting going on throughout the workbook
That's pretty much it. I can't see anything that would make the Item Description column not import correctly. Its data consists entirely of strings with no special characters apart from &. Additionally, each entry in this column is at most 20 characters long. What is happening? Is there any other way I can get all of the data? Keep in mind that I have to use the original file and cannot alter it, as I want this to ultimately be an automated process.
Thanks!
Some initial thoughts/questions: Is the missing column the very first column? What happens if you remove the space within "Item Description"? Stupid question, but does that column have a column header?
-- EDIT 1 --
If you delete that column, does the problem move to another column (the new index 4), or is the file complete? My reason for asking: is the problem specific to the data in that column or its header, or is it more general and tied to index 4?
-- EDIT 2 --
OK, so since we know it's that column, we know it's either the header or the rows. Let's concentrate on the rows for now. Start with that ampersand; dump it, and see what happens. Next, work with the first 50% of rows. Does deleting that subset affect anything? What about the latter 50% of rows? If one of those subsets changes the result, you ought to be able to narrow it down to an individual row (hopefully not several) by halving your selection each time.
My guess is that you're going to find a Unicode character or something else funky in one of the cells. Maybe there's a formula or, as you mentioned, some of that "weird cell highlighting."
It's been years since I worked with Excel access, but I recall some problems with Excel grouping content into areas that act as tables inside each sheet. Try copying the content from the problematic sheet into a new workbook and connecting to that workbook. If this works, you may be able to investigate the areas a bit further.
I parsed through an Excel spreadsheet and returned the entire result as a DataTable. However, this Excel spreadsheet has several empty rows that I would like to eliminate from the resulting DataTable. In particular, each empty row begins with an empty cell. So, I know that the entire row is empty if the value of the cell at the first index is empty. Note that I cannot simply modify the Excel spreadsheet because I have to work with exactly what the client has sent to me. Based on this information, I assumed that I could perform the following function to remove empty rows:
' DataTable dt has already been populated with the data
For Each row As DataRow In dt.Rows
If dt.Rows.Item(0).ToString = "" Then
dt.Rows.Remove(row)
ElseIf dt.Rows.Item(0) Is Nothing Then
dt.Rows.Remove(row)
End If
Next
However, after crafting this solution, I am greeted with the following error:
Collection was modified; enumeration operation might not execute.
I now realize that I cannot alter the collection as I access it. How can I get around this? I'm wondering if I should create a new DataTable with the rows that aren't empty. Is this the best approach or are there better options?
EDIT: I have also tried iterating over the rows backwards:
For i = dt.Rows.Count() - 1 To 0
If dt.Rows.Item(i).Item(0).ToString = "" Then
dt.Rows.RemoveAt(i)
End If
Next
You can't modify the collection while you're enumerating it with For Each, but a simple For loop will work. You'll need to loop backwards to avoid skipping the row after a removed row.
You've also got your tests the wrong way round; if Item(0) returns Nothing, then Item(0).ToString will throw a NullReferenceException.
I'm assuming the dt.Rows.Item(0) is a typo, and should read row.Item(0) instead.
For i As Integer = dt.Rows.Count - 1 To 0 Step -1
    Dim row As DataRow = dt.Rows(i)
    If row.Item(0) Is Nothing Then
        dt.Rows.Remove(row)
    ElseIf row.Item(0).ToString = "" Then
        dt.Rows.Remove(row)
    End If
Next
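The other option raised in the question, building a new DataTable that holds only the non-empty rows, could look roughly like this. It is a sketch, not the answerer's code; the module and function names are made up for illustration, and it treats a row as empty when its first cell is Nothing, DBNull, or an empty string.

```vbnet
Imports System
Imports System.Data

Module RemoveEmptyRowsSketch
    ' Returns a copy of source without the rows whose first cell is empty.
    Function WithoutEmptyRows(source As DataTable) As DataTable
        Dim result As DataTable = source.Clone()   ' same columns, no rows
        For Each row As DataRow In source.Rows
            ' Convert.ToString maps both Nothing and DBNull.Value to an empty string.
            If Convert.ToString(row(0)) <> "" Then
                result.ImportRow(row)
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim dt As New DataTable()
        dt.Columns.Add("Name", GetType(String))
        dt.Rows.Add("a")
        dt.Rows.Add(DBNull.Value)
        dt.Rows.Add("")
        dt.Rows.Add("b")
        Console.WriteLine(WithoutEmptyRows(dt).Rows.Count)   ' prints 2
    End Sub
End Module
```

Because the source table is only read, For Each is safe here; the cost is a second copy of the data in memory.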
VB.NET using LINQ (OrElse short-circuits, so field.Equals is never called on Nothing):
Dtset.Tables(0).AsEnumerable().Where(Function(row) row.ItemArray.All(Function(field) field Is Nothing OrElse field Is DBNull.Value OrElse field.Equals(""))).ToList().ForEach(Sub(row) row.Delete())
Dtset.Tables(0).AcceptChanges()
I'm really new to ClosedXML, and to Excel too (at least for this purpose), so sorry if I'm asking silly questions.
I know that ClosedXML doesn't support charts yet, so the only workaround that came to mind was to base my chart on an Excel table. That way, I thought, ClosedXML would update the ranges when I inserted new rows and the chart would pick up on it. Well, it didn't. At least not when I add the rows from code using the ClosedXML library.
What is curious is that adding new rows from inside Excel automatically updates the chart, but if I want the same result from code, I have to use an OFFSET formula along with named ranges and then set the chart source data to these named ranges.
That's why I'd like to know if there is anything wrong with the code I use to insert new rows:
Dim ruta As String = Server.MapPath("~/Templates/MyTemplate.xlsx")
Dim wb As New XLWorkbook(ruta)
Dim ws = wb.Worksheet(1)
Dim tblData = ws.Table("Table1")
Dim year As Integer = 2000
For i As Integer = 1 To 13
    With tblData.DataRange.LastRow()
        .Field("Year").SetValue(year)
        .Field("Sales").SetValue(CInt(Math.Floor((2000 - 500 + 1) * Rnd())) + 500)
    End With
    tblData.DataRange.InsertRowsBelow(1)
    year = year + 1
Next
tblData.LastRow.Delete()
As you can see, the code is very simple, and so is the template, which consists of only two columns: "Year" (Table1[Year]) and "Sales" (Table1[Sales]).
I don't think this has anything to do with my template because, as I said, adding new rows directly in Excel works as expected. It is only when I generate the table from code that the chart series doesn't include the new rows that were added.
I then have to add the new ranges (Sheet1!Table1[Sales] and Sheet1!Table1[Year]) manually, as the chart only includes the first row (the one added by default when you insert a table).
Any help would be much appreciated.
P.S. Here is a link to a rar containing the full code as well as the Excel template (\Templates\MyTemplate.xlsx).
If the problem is that your table doesn't recognise the additional rows, try adding this after the last row delete:
tblData.Resize tblData.Range(1, 1).CurrentRegion
That should resize the table. Then hopefully your table operations should work.
I am working with Excel 2003 and trying to find the total for individual criteria. I am currently using this formula, and it works.
Data A1:G1776 is the database,
Data C1 is the column that has what I want to total, and
F4:F5 is a column where I set up the criteria for the line to match.
=DSUM(DATA!$A$1:$G$17996,DATA!$C$1,$F$4:F5)
The problem I am running into is that the file size is over 5MB, which is huge when you are trying to email it to other people.
Any suggestions on how I can replicate that formula while decreasing the file size and improving the speed of the document? I do not wish to use a Pivot Table.
As a replacement for DSUM, you could use a SUMPRODUCT formula:
=SUMPRODUCT((DATA!$C$2:$C$17996)*(DATA!$A$2:$A$17996="Boys")*(DATA!$B$2:$B$17996>18))
The above example totals column C, including only rows where column A is "Boys" and column B is greater than 18.
The example assumes that row 1 is a header row.
To speed up calculations, you could use VBA to enable calculation of individual sheets.
VBA for enabling calculation:
Public Sub enableCalc(ParamArray sheetsInUse())
    Dim i As Integer
    For i = 0 To UBound(sheetsInUse) Step 1
        sheetsInUse(i).EnableCalculation = True
    Next i
End Sub
Called using: enableCalc activeworkbook.Worksheets("Sheet1")
This would enable calculation of Sheet1 in the active workbook.
VBA for disabling calculation for all sheets in a workbook:
Public Sub finishedUse(wrkbook As Workbook)
    Dim wrkSheet As Worksheet
    For Each wrkSheet In wrkbook.Worksheets
        wrkSheet.EnableCalculation = False
    Next wrkSheet
End Sub
Called using: finishedUse activeworkbook
This would disable calculation of all the sheets in the active workbook.
The above method isn't affected by changing Automatic/Manual Calculation in Tools --> Options.