Datanucleus, Is PMF thread save ? issue in tomcat - jdo

Do you know if PMF in datanucleus is thread safe ? I have following issue:
I have one PMF which creates several PersistenceManagers, one for method invocation. The PMF is share between threads and methods.
Everyghing is ok when running application in single thread, but when app was deployed on Tomcat stress test shown that in some situation persistence managers were stack on simultanous getObjectById invocations.
This is not a lock issue becasue the nontransactional read was on. The multithread flag was also set to true.
The problem disapeared when I created new PMF every time. But this solution is expensive.
Any ideas on better solution ?

PMF is thread-safe as the JDO spec says very clearly. PM is not thread-safe and all DataNucleus docs advise on one PM per request. Since you (say you) have 1 PM per request then there is no prospect of locking since the PM shares nothing with another PM.
If you say you have blocking you could easily provide a thread dump that shows how two PMs are locking each other, which then demonstrates what is blocking what and why. Possibilities ... L2 cache, store manager (datastore). Obviously using recent releases makes sense for all of that

Related

Axon Event Processing Timeout

I am using an Axon Event Tracking processor. Sometimes events take longer that 10 seconds to process.
This seems to cause the message to be processed again and this appears in the log "Releasing claim of token X/0 failed. It was owned by another node."
If I up the number of segments it does not log this BUT the event is still processed twice so I think this might be misleading. (I think I was mistaken about this)
I have tried adjusting the fetchDelay, cleanupDelay and tokenClaimInterval. None of which has fixed this. Is there a property or something that I am missing?
Edit
The scenario taking longer than 10 seconds is making a HTTP request to an external service.
I'm using axon 4.1.2 with all default configuration when using with Spring auto configuration. I cannot see the Releasing claim on token and preparing for retry in [timeout]s log.
I was having this issue with a single segment and 2 instances of the application. I realised I hadn't increased the number of segments like I thought I had.
After further investigation I have discovered that adding an additional segment seems to have stopped this. Even if I have for example 2 segments and 6 applications it still doesn't reappear, however I'm not sure how this is different to my original scenario of 1 segment and 2 application?
I didn't realise it would be possible for multiple threads to grab the same tracking token and process the same event. It sounds like the best action would be to put an idem-potency check before the HTTP call?
The Releasing claim of token [event-processor-name]/[segment-id] failed. It was owned by another node. message can only occur in three scenarios:
You are performing a merge operation of two segments which fails because the given thread doesn't own both segments.
The main event processing loop of the TrackingEventProcessor is stopped, but releasing the token claim fails because the token is already claimed by another thread.
The main event processing loop has caught an Exception, making it retry with a exponential back-off, and it tries to release the claim (which might fail with the given message).
I am guessing it's not options 1 and 2, so that would leave us with option 3. This should also mean you are seeing other WARN level messages, like:
Releasing claim on token and preparing for retry in [timeout]s
Would you be able to share whether that's the case? That way we can pinpoint a little better what the exact problem is you are encountering.
By the way, very likely you have several processes (event handling threads of the TrackingEventProcessor) stealing the TrackingToken from one another. As they're stealing an un-updated token, both (or more) will handled the same event. Hence why you see the event handler being invoked twice.
Obviously undesirable behavior and something we should resolve for you. I would like to ask you to provide answers to my comments under the question, as right now I have to little to go on. Let us figure this out #Dan!
Update
Thanks for updating your question #dan, that's very helpful.
From what you've shared, I am fairly confident that both instances are stealing the token from one another. This does depend though on whether both are using the same database for the token_entry table (although I am assuming they are).
If they are using the same table, then they should "nicely" share their work, unless one of them takes to long. If it takes to long, the token will be claimed by another process. This other process in this case is the thread of the TEP of your other application instance. The "claim timeout" is defaulted to 10 seconds, which also corresponds with the long running event handling process.
This claimTimeout is adjustable though, by invoking the Builder of the JpaTokenStore/JdbcTokenStore (depending on which you are using / auto wiring) and calling the JpaTokenStore.Builder#claimTimeout(TemporalAmount) method. And, I think this would be required on your end, giving the fact you have a long running operation.
There are of course different ways of tackling this. Like, making sure the TEP is only ran on a single instance (not really fault tolerant though), or offloading this long running operation to a schedule task which is triggered by the event.
But, I think we've found the issue at least, so I'd suggest to tweak the claimTimeout and see if the problem persists.
Let us know if this resolves the problem on your end #dan!

Slow Transactions - WebTransaction taking the hit. What does this mean?

Trying to work out why some of my application servers have creeped up over 1s response times using newrelic. We're using WebApi 2.0 and MVC5.
As you can see below the bulk of the time is spent under 'WebTransaction'. The throughput figures aren't particularly high - what could be causing this, and what are the steps I can take to reduce it down?
Thanks
EDIT I added transactional tracing to this function to get some further analysis - see below:
Over 1 second waiting in System.Web.HttpApplication.BeginRequest().
Any insight into this would be appreciated.
Ok - I have now solved the issue.
Cause
One of my logging handlers which syncs it's data to cloud storage was initializing every time it was instantiated, which also involved a call to Azure table storage. As it was passed into the controller in question, every call to the API resulted in this instantiate.
It was a blocking call, so it added ~1s to every call. Once i configured this initialization to be server life-cycle wide,
Observations
As the blocking call was made at the time of the Controller being build (due to Unity resolving the dependancies at this point) New Relic reports this as
System.Web.HttpApplication.BeginRequest()
Although I would love to see this a little granular, as we can see from the transactional trace above it was in fact the 7 calls to table storage (still not quite sure why it was 7) that led me down this path.
Nice tool - my new relic subscription is starting to pay for itself.
It appears that the bulk of time is being spent in Account.NewSession. But it is difficult to say without drilling down into your data. If you need some more insight into a block of code, you may want to consider adding Custom Instrumentation
If you would like us to investigate this in more depth, please reach out to us at support.newrelic.com where we will have you account information on hand.

How commonly do deadlock issues occur in programming?

I've programmed in a number of languages, but I am not aware of deadlocks in my code.
I took this to mean it doesn't happen.
Does this happen frequently (in programming, not in the databases) enough that I should be concerned about it?
Deadlocks could arise if two conditions are true: you have mutilple theads, and they contend for more than one resource.
Do you write multi-threaded code? You might do this explicitly by starting your own threads, or you might work in a framework where the threads are created out of your sight, and so you're running in more than one thread without you seeing that in your code.
An example: the Java Servlet API. You write a servlet or JSP. You deploy to the app server. Several users hit your web site, and hence your servlet. The server will likely have a thread per user.
Now consider what happens if in servicing the requests you want to aquire some resources:
if ( user Is Important ){
getResourceA();
}
getResourceB();
if (today is Thursday ) {
getResourceA();
}
// some more code
releaseResourceA();
releaseResoruceB();
In the contrived example above, think about what might happen on a Thursday when an important user's request arrives, and more or less simultaneously an unimportant user's request arrives.
The important user's thread gets Resoruce A and wants B. The less important user gets resource B and wants A. Neither will let go of the resource that they already own ... deadlock.
This can actually happen quite easily if you are writing code that explicitly uses synchronization. Most commonly I see it happen when using databases, and fortunately databases usually have deadlock detection so we can find out what error we made.
Defense against deadlock:
Acquire resources in a well defined order. In the aboce example, if resource A was always obtained before resource B no deadlock would occur.
If possible use timeouts, so that you don't wait indefinately for a resource. This will allow you to detect contention and apply defense 1.
It would be very hard to give an idea of how often it happens in reality (in production code? in development?) and that wouldn't really give a good idea of how much code is vulnerable to it anyway. (Quite often a deadlock will only occur in very specific situations.)
I've seen a few occurrences, although the most recent one I saw was in an Oracle driver (not in the database at all) due to a finalizer running at the same time as another thread trying to grab a connection. Fortunately I found another bug which let me avoid the finalizer running in the first place...
Basically deadlock is almost always due to trying to acquire one lock (B) whilst holding another one (A) while another thread does exactly the same thing the other way round. If one thread is waiting for B to be released, and the thread holding B is waiting for A to be released, neither is willing to let the other proceed.
Make sure you always acquire locks in the same order (and release them in the reverse order) and you should be able to avoid deadlock in most cases.
There are some odd cases where you don't directly have two locks, but it's the same basic principle. For example, in .NET you might use Control.Invoke from a worker thread in order to update the UI on the UI thread. Now Invoke waits until the update has been processed before continuing. Suppose your background thread holds a lock with the update requires... again, the worker thread is waiting for the UI thread, but the UI thread can't proceed because the worker thread holds the lock. Deadlock again.
This is the sort of pattern to watch out for. If you make sure you only lock where you need to, lock for as short a period as you can get away with, and document the thread safety and locking policies of all your code, you should be able to avoid deadlock. Like all threading topics, however, it's easier said than done.
If you get a chance take a look at first few chapters in Java Concurrency in Practice.
Deadlocks can occur in any concurrent programming situation, so it depends how much concurrency you deal with. Several examples of concurrent programming are: multi-process, multi-thread, and libraries introducing multi-thread. UI frameworks, event handling (such as timer event) could be implemented as threads. Web frameworks could spawn threads to handle multiple web requests simultaneously. With multicore CPUs you might see more concurrent situations visibly than before.
If A is waiting for B, and B is waiting for A, the circular wait causes the deadlock. So, it also depends on the type of code you write as well. If you use distributed transactions, you can easily cause that type of scenario. Without distributed transactions, you risk bank accounts from stealing money.
All depends on what you are coding. Traditional single threaded applications that do not use locking. Not really.
Multi-threaded code with multiple locks is what will cause deadlocks.
I just finished refactoring code that used seven different locks without proper exception handling. That had numerous deadlock issues.
A common cause of deadlocks is when you have different threads (or processes) acquire a set of resources in different order.
E.g. if you have some resource A and B, if thread 1 acquires A and then B, and thread 2 acquires B and then A, then this is a deadlock waiting to happen.
There's a simple solution to this problem: have all your threads always acquire resources in the same order. E.g. if all your threads acquire A and B in that order, you will avoid deadlock.
A deadlock is a situation with two processes are dependent on each other - one cannot finish before the other. Therefore, you will likely only have a deadlock in your code if you are running multiple code flows at any one time.
Developing a multi-threaded application means you need to consider deadlocks. A single-threaded application is unlikely to have deadlocks - but not impossible, the obvious example being that you may be using a DB which is subject to deadlocking.

Long-running ASP.NET tasks

I know there's a bunch of APIs out there that do this, but I also know that the hosting environment (being ASP.NET) puts restrictions on what you can reliably do in a separate thread.
I could be completely wrong, so please correct me if I am, this is however what I think I know.
A request typically timeouts after 120 seconds (this is configurable) but eventually the ASP.NET runtime will kill a request that's taking too long to complete.
The hosting environment, typically IIS, employs process recycling and can at any point decide to recycle your app. When this happens all threads are aborted and the app restarts. I'm however not sure how aggressive it is, it would be kind of stupid to assume that it would abort a normal ongoing HTTP request but I would expect it to abort a thread because it doesn't know anything about the unit of work of a thread.
If you had to create a programming model that easily and reliably and theoretically put a long running task, that would have to run for days, how would you accomplish this from within an ASP.NET application?
The following are my thoughts on the issue:
I've been thinking a long the line of hosting a WCF service in a win32 service. And talk to the service through WCF. This is however not very practical, because the only reason I would choose to do so, is to send tasks (units of work) from several different web apps. I'd then eventually ask the service for status updates and act accordingly. My biggest concern with this is that it would NOT be a particular great experience if I had to deploy every task to the service for it to be able to execute some instructions. There's also this issue of input, how would I feed this service with data if I had a large data set and needed to chew through it?
What I typically do right now is this
SELECT TOP 10 *
FROM WorkItem WITH (ROWLOCK, UPDLOCK, READPAST)
WHERE WorkCompleted IS NULL
It allows me to use a SQL Server database as a work queue and periodically poll the database with this query for work. If the work item completed with success, I mark it as done and proceed until there's nothing more to do. What I don't like is that I could theoretically be interrupted at any point and if I'm in-between success and marking it as done, I could end up processing the same work item twice. I might be a bit paranoid and this might be all fine but as I understand it there's no guarantee that that won't happen...
I know there's been similar questions on SO before but non really answers with a definitive answer. This is a really common thing, yet the ASP.NET hosting environment is ill equipped to handle long-running work.
Please share your thoughts.
Have a look at NServiceBus
NServiceBus is an open source
communications framework for .NET with
build in support for publish/subscribe
and long-running processes.
It is a technology build upon MSMQ, which means that your messages don't get lost since they are persisted to disk. Nevertheless the Framework has an impressive performance and an intuitive API.
John,
I agree that ASP.NET is not suitable for Async tasks as you have described them, nor should it be. It is designed as a web hosting platform, not a back of house processor.
We have had similar situations in the past and we have used a solution similar to what you have described. In summary, keep your WCF service under ASP.NET, use a "Queue" table with a Windows Service as the "QueueProcessor". The client should poll to see if work is done (or use messaging to notify the client).
We used a table that contained the process and it's information (eg InvoicingRun). On that table was a status (Pending, Running, Completed, Failed). The client would submit a new InvoicingRun with a status of Pending. A Windows service (the processor) would poll the database to get any runs that in the pending stage (you could also use SQL Notification so you don't need to poll. If a pending run was found, it would move it to running, do the processing and then move it to completed/failed.
In the case where the process failed fatally (eg DB down, process killed), the run would be left in a running state, and human intervention was required. If the process failed in an non-fatal state (exception, error), the process would be moved to failed, and you can choose to retry or have human intervantion.
If there were multiple processors, the first one to move it to a running state got that job. You can use this method to prevent the job being run twice. Alternate is to do the select then update to running under a transaction. Make sure either of these outside a transaction larger transaction. Sample (rough) SQL:
UPDATE InvoicingRun
SET Status = 2 -- Running
WHERE ID = 1
AND Status = 1 -- Pending
IF ##RowCount = 0
SELECT Cast(0 as bit)
ELSE
SELECT Cast(1 as bit)
Rob
Use a simple background tasks / jobs framework like Hangfire and apply these best practice principals to the design of the rest of your solution:
Keep all actions as small as possible; to achieve this, you should-
Divide long running jobs into batches and queue them (in a Hangfire queue or on a bus of another sort)
Make sure your small jobs (batched parts of long jobs) are idempotent (have all the context they need to run in any order). This way you don't have to use a quete which maintains a sequence; because then you can
Parallelise the execution of jobs in your queue depending on how many nodes you have in your web server farm. You can even control how much load this subjects your farm to (as a trade off to servicing web requests). This ensures that you complete the whole job (all batches) as fast and as efficiently as possible, while not compromising your cluster from servicing web clients.
Have thought about the use the Workflow Foundation instead of your custom implementation? It also allows you to persist states. Tasks could be defined as workflows in this case.
Just some thoughts...
Michael

What are common reasons for deadlocks?

Deadlocks are hard to find and very uncomfortable to remove.
How can I find error sources for deadlocks in my code? Are there any "deadlock patterns"?
In my special case, it deals with databases, but this question is open for every deadlock.
Update: This recent MSDN article, Tools And Techniques to Identify Concurrency Issues, might also be of interest
Stephen Toub in the MSDN article Deadlock monitor states the following four conditions necessary for deadlocks to occur:
A limited number of a particular resource. In the case of a monitor in C# (what you use when you employ the lock keyword), this limited number is one, since a monitor is a mutual-exclusion lock (meaning only one thread can own a monitor at a time).
The ability to hold one resource and request another. In C#, this is akin to locking on one object and then locking on another before releasing the first lock, for example:
lock(a)
{
...
lock(b)
{
...
}
}
No preemption capability. In C#, this means that one thread can't force another thread to release a lock.
A circular wait condition. This means that there is a cycle of threads, each of which is waiting for the next to release a resource before it can continue.
He goes on to explain that the way to avoid deadlocks is to avoid (or thwart) condition four.
Joe Duffy discusses several techniques
for avoiding and detecting deadlocks,
including one known as lock leveling.
In lock leveling, locks are assigned
numerical values, and threads must
only acquire locks that have higher
numbers than locks they have already
acquired. This prevents the
possibility of a cycle. It's also
frequently difficult to do well in a
typical software application today,
and a failure to follow lock leveling
on every lock acquisition invites
deadlock.
The classic deadlock scenario is A is holding lock X and wants to acquire lock Y, while B is holding lock Y and wants to acquire lock X. Since neither can complete what they are trying to do both will end up waiting forever (unless timeouts are used).
In this case a deadlock can be avoided if A and B acquire the locks in the same order.
No deadlock patterns to my knowledge (and 12 years of writing heavily multithreaded trading applications).. But the TimedLock class has been of great help in finding deadlocks that exist in code without massive rework.
http://www.randomtree.org/eric/techblog/archives/2004/10/multithreading_is_hard.html
basically, (in dotnet/c#) you search/replace all your "lock(xxx)" statements with "using TimedLock.Lock(xxx)"
If a deadlock is ever detected (lock unable to be obtained within the specified timeout, defaults to 10 seconds), then an exception is thrown. My local version also immediately logs the stacktrace. Walk up the stacktrace (preferably debug build with line numbers) and you'll immediately see what locks were held at the point of failure, and which one it was attempting to get.
In dotnet 1.1, in a deadlock situation as described, as luck would have it all the threads which were locked would throw the exception at the same time. So you'd get 2+ stacktraces, and all the information necessary to fix the problem. (2.0+ may have changed the threading model internally enough to not be this lucky, I'm not sure)
Making sure all transactions affect tables in the same order is the key to avoiding the most common of deadlocks.
For example:
Transaction A
UPDATE Table A SET Foo = 'Bar'
UPDATE Table B SET Bar = 'Foo'
Transaction B
UPDATE Table B SET Bar = 'Foo'
UPDATE Table A SET Foo = 'Bar'
This is extremely likely to result in a deadlock as Transaction A gets a lock on Table A, Transaction B gets a lock on table B, therefore neither of them get a lock for their second command until the other has finished.
All other forms of deadlocks are generally caused through high intensity use and SQL Server deadlocking internally whilst allocated resources.
Yes - deadlocks occur when processes try to acquire resources in random order. If all your processes try to acquire the same resources in the same order, the possibilities for deadlocks are greatly reduced, if not eliminated.
Of course, this is not always easy to arrange...
The most common (according to my unscientific observations) DB deadlock scenario is very simple:
Two processes read something (a DB record for example), both acquire a shared lock on the associated resource (usually a DB page),
Both try to make an update, trying to upgrade their locks to exclusive ones - voila, deadlock.
This can be avoided by specifying the "FOR UPDATE" clause (or similar, depending on your particular RDBMS) if the read is to be followed by an update. This way the process gets the exclusive lock from the start, making the above scenario impossible.
I recommend reading this article by Herb Sutter. It explains the reasons behind deadlocking issues and puts forward a framework in this article to tackle this problem.
The typical scenario are mismatched update plans (tables not always updated in the same order). However it is not unusual to have deadlocks when under high processing volume.
I tend to accept deadlocks as a fact of life, it will happen one day or another so I have my DAL prepared to handle and retry a deadlocked operation.
A condition that occure whene two process are each waiting for the othere to complete befoure preceding.the result is both procedure is hang.
its most comonelly multitasking and clint/server.
Deadlock occurs mainly when there are multiple dependent locks exist. In a thread and another thread tries to lock the mutex in reverse order occurs. One should pay attention to use a mutex to avoid deadlocks.
Be sure to complete the operation after releasing the lock. If you have multiple locks, such as access order is ABC, releasing order should also be ABC.
In my last project I faced a problem with deadlocks in an sql Server Database. The problem in finding the reason was, that my software and a third party software are using the same Database and are working on the same tables. It was very hard to find out, what causes the deadlocks. I ended up writing an sql-query to find out which processes an which sql-Statements are causing the deadlocks. You can find that statement here: Deadlocks on SQL-Server
To avoid the deadlock there is a algorithm called Banker's algorithm.
This one also provides helpful information to avoid deadlock.

Resources