I'm trying to loop over a DataTable with more than 100,000 rows using Parallel.ForEach. Everything works fine up to around 25,000 iterations. I don't get any error, and I can see the app is still working, but it kind of blocks and nothing happens. I tried to encapsulate the loop in a Task.Factory.StartNew and I get a random abort exception at around 5,000 iterations for no reason.
Try
    Dim lstExceptions As New ConcurrentQueue(Of Exception)
    Dim options As New ParallelOptions
    options.MaxDegreeOfParallelism = 3

    Parallel.ForEach(ReservationReportDS.Tables(0).AsEnumerable(), options,
        Sub(row)
            Try
                Dim tmpRow As DataRow = CType(row, DataRow)
                Dim ReservationID As Integer = tmpRow.Field(Of Integer?)("autNoReservation")
                Dim customerID As Integer = tmpRow.Field(Of Integer?)("CustomerID")
                Dim VehiculeID As Integer = tmpRow.Field(Of Integer?)("autNoVehicule")

                Dim bill As New BillingPath()
                bill.Calculate_Billing(ReservationID, customerID, VehiculeID)
            Catch err As Exception
                lstExceptions.Enqueue(err)
            End Try
        End Sub
    )

    If (lstExceptions.Count > 0) Then
        Throw New AggregateException(lstExceptions)
    End If
Catch errAgg As AggregateException
    For Each ex As Exception In errAgg.InnerExceptions
        Log(Log_Billing_UI, "", System.Reflection.MethodBase.GetCurrentMethod().Name & GetExceptionInfo(ex))
    Next
Catch ex As Exception
    Log(Log_Billing_UI, "", System.Reflection.MethodBase.GetCurrentMethod().Name & GetExceptionInfo(ex))
End Try
Since you have that many records, I would recommend thinking about the following approach:
1. Read all records into a ConcurrentQueue(Of SomeBillingInfoClass) collection first. This lets you avoid keeping the DB connection open and makes the remaining operations on the data thread-safe.
2. Create a list of Tasks with the billing-calculation code inside. This lets you run the tasks in parallel and easily pass in the ConcurrentQueue from step 1.
3. Keep each task running in a loop while at least one element remains in the ConcurrentQueue.
4. If you can aggregate the billing-calculation results into some other class, you can do so with an additional thread-safe ConcurrentQueue(Of BillingCalcResultInfoClass) collection.
5. After all billings are calculated, write to the DB on a single thread in one long transaction; this may be faster than granular writes to the DB. (A sketch of the whole idea follows below.)
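A minimal sketch of that idea, assuming .NET 4.5 or later. SomeBillingInfoClass, BillingCalcResultInfoClass and LoadBillingInfos are placeholders (the first two are the hypothetical classes named above), and BillingPath.Calculate_Billing is taken from the question:

Imports System.Collections.Concurrent
Imports System.Threading.Tasks

' 1. Load everything up front into a thread-safe queue; the DB connection can be closed afterwards.
Dim workQueue As New ConcurrentQueue(Of SomeBillingInfoClass)(LoadBillingInfos())
Dim results As New ConcurrentQueue(Of BillingCalcResultInfoClass)()

' 2./3. Start a few tasks that keep draining the queue until it is empty.
Dim tasks As New List(Of Task)
For i As Integer = 1 To 3
    tasks.Add(Task.Run(
        Sub()
            Dim info As SomeBillingInfoClass = Nothing
            While workQueue.TryDequeue(info)
                Dim bill As New BillingPath()
                bill.Calculate_Billing(info.ReservationID, info.CustomerID, info.VehiculeID)
                ' 4. Optionally enqueue a result object into `results` for a single write later.
            End While
        End Sub))
Next

' Wait for all tasks; any exceptions surface here as an AggregateException.
Task.WaitAll(tasks.ToArray())

' 5. Write the aggregated results to the DB on this single thread, in one transaction.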
Some notes about your code: I think you may not need to throw the AggregateException manually; the .NET runtime will do it for you automatically. You only need to catch it, for example in a task's .ContinueWith() method (sorry, I'm mostly a C# developer and use C# notation).
I have used a similar approach to process millions of records and it works fine. Typically I use 3-5 tasks, but you can always experiment to find how many tasks work best for you.
Using ConcurrentQueue or a similar thread-safe collection makes it easier to keep your code thread-safe.
Please let me know if you have any questions.
Thank you all for your answers, and especially Anton Norko. I finally found the problem and it was on my side. Under certain conditions, Calculate_Billing was stuck in an infinite loop. Since I was using 3 threads at the same time, they were getting stuck one by one.
I have a very long running process in an ASP.net application that we desperately need to dramatically shorten. The process in question is charging a large number of credit cards. Currently it performs at about 1 charge per second. We need this to be more like 10 per second.
We decided that utilizing multiple simultaneous threads would be one way to go. We basically take this large list of orders to process, divide it into ten lists, and then spawn a new thread to process each of the ten lists simultaneously.
An additional complication of this process is that we need to report progress on this process, and not only to the user session that initiated the process, but to any user, in any session in the application. So for example, if I log in and start this process, I will see a progress bar. If after I initiate the process, and it is still running, another user logs in elsewhere and goes to this same page, they will also see the progress bar.
I did some research and thought that I could use Application variables to store the relevant bits of information required to report progress. The client polls the server on a regular basis whenever on this page to see if there are any threads running, and if so, it returns various statistics on the progress of the process back to the client.
It would seem that this approach does not work: a simple counter of the number of currently running threads does not behave as expected. It seems that the so-called thread safety of the Application object is safe in the sense that no two threads can access the same variable at the same instant, but not safe in the sense that if two threads both attempt to increment a variable, one of them succeeds and the other does not; rather than queuing up and incrementing it in turn, the second thread just moves on. I'm sure this is my thread-safety ignorance shining through.
Another issue is that using Debug.Print or Debug.WriteLine seem to be the same kind of "thread-safe" as the Application object. As each thread starts, we use Debug.WriteLine to output the name and start time of the thread, and as it completes, we do the same thing to write that it completed. We consistently see ten threads start and four threads end in the debug window.
I don't think we need to use Application.Lock() and Application.UnLock(), but I have tried it both with and without those calls before and after every write operation, to no avail; the results are the same either way.
I have a ton of code, so I'm not sure exactly which parts to share, but here are some of the relevant parts:
This is how we create and start the threads:
For Each oBatch As List(Of Guid) In oOrderBatches
    Dim t As New Threading.Thread(Sub() ProcessPaymentBatch(oBatch, clubrunid, oToken.UserID))
    t.IsBackground = True
    t.Start()
Next
Here is the sub that is started by each thread:
Private Sub ProcessPaymentBatch(oBatch As List(Of Guid), clubrunid As String, UserID As Guid)
    ThreadsRunning(clubrunid) += 1
    Try
        Debug.Print("Thread Start")
        For Each oID As Guid In oBatch
            ' Do a bunch of processing stuff…
        Next
    Finally
        ThreadsRunning(clubrunid) -= 1
        Debug.Print("Thread End")
    End Try
End Sub
Finally, this is an example of one of the application variables that the threads attempt to access, but seems to be failing.
Private Const _THREADSRUNNING As String = "ThreadsRunningThisRun_"

Public Property ThreadsRunning(clubid As String) As Integer
    Get
        Dim sToken As String = _THREADSRUNNING & clubid
        If Application(sToken) Is Nothing Then
            ThreadsRunning(clubid) = 0
        End If
        Return Application(sToken)
    End Get
    Set(ByVal value As Integer)
        Debug.Print(value)
        Dim sToken As String = _THREADSRUNNING & clubid
        Application.Lock()
        Application(sToken) = value
        Application.UnLock()
    End Set
End Property
The Debug output from this property looks something like this:
Thread Start
1
Thread Start
Thread Start
1
1
4
Thread End
5
3
Thread Start
6
3
1
-1
Thread End
-2
-3
I can't understand why there would be a different number of "Thread Start" and "Thread End" debug statements, and I don't understand how the thread count could get to negative numbers. This is why I am confused by the thread safety of the Application and Debug objects.
Your help in this matter would be greatly appreciated!
Nevermind, I was just being an idiot. The problem had nothing to do with the Application or Debug objects not being thread safe, the problem was in my methodology (as was expected really).
To clarify, the issue was that we were locking the global variables in the application object when writing, but not when reading. We then tried also locking when reading, but still had the same problem. What we failed to realize was that when incrementing a value, you are getting the current value, adding onto that, then setting the new value. The lock needed to bridge all three of those operations, so it goes like this:
Lock
Get
Add
Set
Unlock
What we were doing previously was:
Lock
Get
Unlock
Add
Lock
Set
Unlock
Which allowed for multiple threads to Get and then Set the same values as one another, which explains all of the oddities we were seeing in the debug window.
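In code, the fix amounts to doing the whole increment under one lock. A minimal sketch of that (ChangeThreadsRunning is an illustrative helper name, not the actual application code):

' Sketch only: the entire read-modify-write on the Application counter happens inside one Lock/UnLock pair.
Private Sub ChangeThreadsRunning(clubid As String, delta As Integer)
    Dim sToken As String = _THREADSRUNNING & clubid
    Application.Lock()
    Try
        Dim current As Integer = 0
        If Application(sToken) IsNot Nothing Then
            current = CInt(Application(sToken))
        End If
        Application(sToken) = current + delta   ' Get + Add + Set, all under the lock
    Finally
        Application.UnLock()
    End Try
End Sub

Each worker would then call ChangeThreadsRunning(clubrunid, 1) at the start and ChangeThreadsRunning(clubrunid, -1) in its Finally block.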
ASP.NET WEB PROGRAMMING
How to optimally handle Open, Close, Dispose, and Exception when retrieving data from a database.
LOOKING TO OPTIMISE
I have always used the following to make a connection, catch any errors, and then correctly dispose of the connection in either event.
VB.NET
Try
    con.Open()
    ' ...
    con.Close()
Catch ex As Exception
    lbl.Text = ex.Message
Finally
    If con IsNot Nothing Then
        con.Dispose()
    End If
End Try
After reading many articles I find people practically throwing up at this code; however, I do not see any other way to accommodate the four steps required efficiently.
The alternative, and I believe more CPU-friendly, Using statement seems to be the tool of choice for the correct disposal of a SQL connection. But what happens when bad data is retrieved from the database and there is nothing in place to indicate to an end user what went wrong?
QUESTION
Between Using, Try...Catch, and other approaches:
Which is the faster, cleaner, and/or more efficient way to handle a data-retrieval statement?
I am happy to hear people's opinions but I am looking for facts.
You can also use the following block of code as a template. It combines the Using...End Using and Try...Catch blocks.
Using conn As New SqlConnection(My.Settings.SQLConn)
    Try
        conn.Open()
        ' ... run your command / read your data here ...
    Catch ex As SqlException
        ' handle or surface the error here (e.g. log it or show a message)
    End Try
End Using
There is no need to call conn.Dispose() because the Using block does that automatically.
Use Entity Framework, which implements the Unit of Work pattern for you efficiently, and perform your operations within a transaction scope.
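If you go that route, a minimal sketch might look like the following; MyEntities is a hypothetical Entity Framework context name, not something from the question:

' Requires a reference to System.Transactions and an existing EF model (MyEntities is hypothetical).
Using scope As New System.Transactions.TransactionScope()
    Using ctx As New MyEntities()
        ' ... query and modify entities here ...
        ctx.SaveChanges()      ' the unit of work is persisted as one batch
    End Using
    scope.Complete()           ' commit the ambient transaction; it rolls back automatically if this is not called
End Using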
Always use Using, as it will automatically acquire and then free up system resources. You are still able to perform a Try...Catch within it to present errors to the user.
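Putting those pieces together, a sketch of the full retrieval pattern might look like this (the query is a placeholder; My.Settings.SQLConn and lbl are borrowed from the code above):

' Assumes Imports System.Data.SqlClient.
Using conn As New SqlConnection(My.Settings.SQLConn)
    Using cmd As New SqlCommand("SELECT * FROM SomeTable", conn)   ' placeholder query
        Try
            conn.Open()
            Using reader As SqlDataReader = cmd.ExecuteReader()
                While reader.Read()
                    ' ... consume the row ...
                End While
            End Using
        Catch ex As SqlException
            lbl.Text = ex.Message   ' surface the problem to the user, as in the original Try/Catch
        End Try
    End Using
End Using
' No explicit Close/Dispose is needed: each Using block disposes its object, even if an exception is thrown.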
I have a function that parses an input file.
Private Function getSvSpelOdds(ByVal BombNo As Integer) As Boolean
    Dim InputFileBase As String = HttpContext.Current.Application("InputFileBase")
    strInputFile = InputFileBase & "PC_P7_D.TXT"
    OddsReader = New StreamReader(strInputFile)
    'some other code
End Function
If the file is not there (getSvSpelOdds returns False), I would like to retry after 30 seconds.
To achieve this I use a timer.
If Not getSvSpelOdds(y) Then
    Timer1.Interval = 30000
End If

Private Sub Timer1_Elapsed(sender As Object, e As System.Timers.ElapsedEventArgs) Handles Timer1.Elapsed
    getSvSpelOdds(y)
End Sub
The problem is that when the timer fires, HttpContext.Current (used to get the value of the global variable) is null.
Should I use some other approach to get this to work?
As already described, HttpContext.Current will be null because Timer1_Elapsed is called on a different thread. But you can use System.Web.HttpRuntime.Cache to pass the file name; the cache is accessible from all threads.
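A minimal sketch of that idea; the cache key name is arbitrary, and the value must be placed in the cache during a normal request (where HttpContext.Current is available) before the timer fires:

' Somewhere inside a request, e.g. before starting the timer:
'     HttpRuntime.Cache("InputFileBase") = HttpContext.Current.Application("InputFileBase")

Private Function getSvSpelOdds(ByVal BombNo As Integer) As Boolean
    ' Read from HttpRuntime.Cache, which works on any thread, instead of HttpContext.Current.
    Dim InputFileBase As String = TryCast(System.Web.HttpRuntime.Cache("InputFileBase"), String)
    If InputFileBase Is Nothing Then Return False   ' value has not been cached yet
    strInputFile = InputFileBase & "PC_P7_D.TXT"
    OddsReader = New StreamReader(strInputFile)
    'some other code
End Function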
HttpContext.Current only gives you the context you want when you call it on the thread that handles the incoming request.
When calling it outside of such threads, you get null. That matches your case, as Timer1_Elapsed is executed on a new thread.
Should I use some other approach to get this to work?
Almost certainly, yes. 30 seconds is a long time to wait without giving any feedback to users.
It would probably be better to return a "no results are available yet, but we're still looking" page to the user. That page can be set to refresh automatically after 30 seconds, by adding a suitable meta-tag:
<META HTTP-EQUIV="refresh" CONTENT="30">
And you then get a fresh request/response cycle on the server. And haven't tied up server resources in the meantime.
The other answers seem to address the other part of your question (about why it doesn't work in the timer callback).
The Elapsed event on the Timer will run on a separate thread, therefore it's expected behaviour for the current context to be null.
You can only access it from the same thread.
Should I use some other approach to get this to work?
Yes, it's not generally a good idea to mix ASP.NET and threads given the complexity of how ASP.NET works. As already mentioned, it's not a great UX to have no feedback for 30 seconds; it's better to let the user know what's actually going on.
Also, you need to determine whether the timeout length is appropriate or whether a timeout is needed at all. I don't know the nature of your application but I assume there is some external means for the file to be generated and picked up by your site.
Is there a way in ASP.NET to make sure that a certain threaded sub is not run twice concurrently, no matter what?
The code I have now is:
Public Class CheckClass

    ReadOnly Property CheckSessionsLock As Object
        Get
            If HttpRuntime.Cache("CheckSessionsLock") Is Nothing Then HttpRuntime.Cache("CheckSessionsLock") = New Object
            Return HttpRuntime.Cache("CheckSessionsLock")
        End Get
    End Property

    Sub TryThreads()
        Dim thread = New Thread(AddressOf TryLock)
        thread.Priority = ThreadPriority.Lowest
        thread.Start()
    End Sub

    Sub TryLock()
        SyncLock CheckSessionsLock
            DoTrace("entered locker")
            For x = 0 To 10000
            Next
            DoTrace("exiting locker")
        End SyncLock
        DoTrace("exited locker")
    End Sub

End Class
If I run this code on every page, the traced sections overlap several times. The DoTrace function in the code simply writes the message to a table.
The messages in the table should appear in order (entered, exiting, exited) again and again, but in reality they don't. I get something like: entered, exiting, entered, exited, exiting...
This means that the SyncLock is not complete. Is that true?
If so, how can we implement a complete SyncLock on a block of code, across requests and across sessions?
EDIT: I need this lock because the real code will be sending emails according to a list of mailing types in a DB. After each mailing type is sent, it is marked, then it continues with the next mailing. I can't have another thread see a mailing as unprocessed while it is still being processed.
Please advise.
Rather than using the HttpRuntime Cache have you considered using a static variable?
Just as a note (it might be helpful to explain why you want this functionality) your website is not going to be very scalable if this can only be run once at a time.
In C# (sorry, don't know VB syntax) I use this:
private static readonly object Padlock = new object();
It's a field, not a property.
It's static (in VB, that's Shared if I'm not mistaken), so it's the same throughout the entire application.
It's initialised once as soon as you use this class, not when you explicitly use the field.
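For reference, a VB.NET equivalent of that field inside CheckClass would be (a sketch, not the poster's original code):

' Shared = C# static; ReadOnly plus the initializer gives exactly one lock object for the whole application.
Private Shared ReadOnly Padlock As New Object()

Sub TryLock()
    SyncLock Padlock
        DoTrace("entered locker")
        ' ... work that must never run twice concurrently ...
        DoTrace("exiting locker")
    End SyncLock
    DoTrace("exited locker")
End Sub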
With your property/cache version, you could have two threads trying to get the lock-object and each creating a different one:
Thread 1 checks the cache and doesn't find the object
Thread 1 is parked
Thread 2 checks the cache, doesn't find the object
Thread 2 creates the object and caches it, retrieves it again and returns from the property
Thread 1 resumes
Thread 1 creates a new object and caches it, retrieves it again and returns a different lock object than thread 2 uses
Any further threads will use the lock object of thread 1
''' <summary>
''' Returns true if a submission by the same IP address has not been submitted in the past n minutes.
''' </summary>
Protected Function EnforceMinTimeBetweenSubmissions(ByVal minTimeBetweenRequestsMinutes As Integer) As Boolean
    If minTimeBetweenRequestsMinutes = 0 Then
        Return True
    End If

    If Cache("submitted-requests") Is Nothing Then
        Cache("submitted-requests") = New Dictionary(Of String, Date)
    End If

    ' Remove old requests. '
    Dim submittedRequests As Dictionary(Of String, Date) = CType(Cache("submitted-requests"), Dictionary(Of String, Date))
    Dim itemsToRemove = submittedRequests.Where(Function(s) s.Value < Now).Select(Function(s) s.Key).ToList
    For Each key As String In itemsToRemove
        submittedRequests.Remove(key)
    Next

    If submittedRequests.ContainsKey(Request.UserHostAddress) Then
        ' User has submitted a request in the past n minutes. '
        Return False
    Else
        submittedRequests.Add(Request.UserHostAddress, Now.AddMinutes(minTimeBetweenRequestsMinutes))
    End If

    Return True
End Function
No. The ASP.NET Cache is not inherently thread-safe and it looks like you are creating objects in the Cache depending on whether they exist or not.
You need to lock the Cache when writing to it.
Let me word things a little differently. The code is, in fact, thread safe. The way you currently have it coded though could cause performance issues in multi-threaded situations.
In this case, multiple users would be running the same code simultaneously, theoretically accessing and modifying the same cache objects at the same time. As that scenario scales up, performance suffers.
Creating a lock will improve performance under heavy load (while imposing a slight overhead under light load) because you won't be fetching data needlessly due to caching issues.
The System.Web.Caching.Cache class is thread-safe according to the MSDN documentation. However, the documentation also shows an example where a read and a write are performed on the cache without locking. That cannot possibly be thread-safe, since the write is dependent on the read. The code you posted basically looks like that example. I definitely recommend putting a lock around the entire method.
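A minimal sketch of that suggestion: one Shared lock object (the field name is illustrative) guarding the whole read-modify-write on the cached dictionary:

Private Shared ReadOnly SubmittedRequestsLock As New Object()

Protected Function EnforceMinTimeBetweenSubmissions(ByVal minTimeBetweenRequestsMinutes As Integer) As Boolean
    If minTimeBetweenRequestsMinutes = 0 Then
        Return True
    End If

    SyncLock SubmittedRequestsLock
        If Cache("submitted-requests") Is Nothing Then
            Cache("submitted-requests") = New Dictionary(Of String, Date)
        End If

        Dim submittedRequests As Dictionary(Of String, Date) = CType(Cache("submitted-requests"), Dictionary(Of String, Date))

        ' Prune expired entries, check, and (possibly) add - all under the same lock.
        Dim itemsToRemove = submittedRequests.Where(Function(s) s.Value < Now).Select(Function(s) s.Key).ToList()
        For Each key As String In itemsToRemove
            submittedRequests.Remove(key)
        Next

        If submittedRequests.ContainsKey(Request.UserHostAddress) Then
            Return False
        End If

        submittedRequests.Add(Request.UserHostAddress, Now.AddMinutes(minTimeBetweenRequestsMinutes))
        Return True
    End SyncLock
End Function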