Preventing Deadlocks

For a pseudo-function like
void transaction(Account from, Account to, double amount) {
    Semaphore lock1, lock2;
    lock1 = getLock(from);
    lock2 = getLock(to);
    wait(lock1);
    wait(lock2);
    withdraw(from, amount);
    deposit(to, amount);
    signal(lock2);
    signal(lock1);
}
a deadlock can happen if transaction(A, B, 50) and transaction(B, A, 10) run at the same time: each call can hold its first lock while waiting for the lock the other call already holds.
How can this be prevented?
Would this work?

A simple deadlock prevention strategy is to define a strict order on the locks in the application and always acquire them in that order. Assuming every account has a unique number, you could change your logic to always grab the lock for the account with the lower number first, then the lock for the account with the higher number. Both transaction(A, B, 50) and transaction(B, A, 10) then request the same lock first, so neither can hold one lock while waiting for the other.
Another strategy for preventing deadlocks is to reduce the number of locks. In this case it might be better to have one lock that covers all accounts. That would make the lock structure far simpler. If the application shows performance problems under heavy load and profiling shows that lock contention is the cause, then it is time to introduce a more fine-grained locking strategy.
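The lock-ordering idea can be sketched in Java roughly as follows. This is only an illustration under assumptions: the Account class, its id field and the per-account ReentrantLock are hypothetical stand-ins for the question's getLock/wait/signal pseudocode, and there is no overdraft check.

```java
import java.util.concurrent.locks.ReentrantLock;

class OrderedTransfer {
    // Hypothetical account type: a unique id, a balance, and its own lock.
    static final class Account {
        final long id;
        double balance;
        final ReentrantLock lock = new ReentrantLock();

        Account(long id, double balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    static void transaction(Account from, Account to, double amount) {
        if (from.id == to.id) {
            return; // same account on both sides: nothing to move
        }
        // Always lock the lower account number first, whatever the transfer direction.
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount; // withdraw
                to.balance += amount;   // deposit
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Runs opposing transfers concurrently and returns the combined balance afterwards.
    static double demo() {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transaction(a, b, 50); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transaction(b, a, 10); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance + b.balance;
    }

    public static void main(String[] args) {
        System.out.println("combined balance: " + demo());
    }
}
```

Because both directions of transfer take the locks in the same order, the two threads in demo() cannot end up each holding one lock and waiting for the other.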

By making the entire transaction a critical section? That's only one possible solution, at least.
I have a feeling this is homework of some sort, because it's very similar to the dining philosophers problem based on the example code you give. (Multiple solutions to the problem are available at the link provided, just so you know. Check them out if you want a better understanding of the concepts.)

Related

What could cause a sqlite application to slow down over time with high load?

I'll definitely need to update this based on feedback so I apologize in advance.
The problem I'm trying to solve is roughly this.
The graph shows disk utilization in the Windows Task Manager. My SQLite application is a web server that takes in JSON requests with timestamps, looks up the existing entry in a two-column key/value table, merges the request into the existing item (items don't grow over time), and then writes it back to the database.
The db is created as follows. I've experimented with and without WAL and saw no difference.
createStatement().use {
    it.executeUpdate("CREATE TABLE IF NOT EXISTS items ( key TEXT NOT NULL PRIMARY KEY, value BLOB );")
}
The write/set is done as follows
try {
    val insertStatement = "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)"
    prepareStatement(insertStatement).use {
        it.setBytes(1, keySerializer.serialize(key))
        it.setBytes(2, valueSerializer.serialize(value))
        it.executeUpdate()
    }
    commit()
} catch (t: Throwable) {
    rollback()
    throw t
}
I use a single database connection the entire time which seems to be ok for my use case and greatly improves performance relative to getting a new one for each operation.
val databaseUrl = "jdbc:sqlite:${System.getProperty("java.io.tmpdir")}/$name-map-v2.sqlite"
if (connection == null || connection?.isClosed == true) {
    connection = DriverManager.getConnection(databaseUrl)
}
I'm effectively serializing access to the db. I'm pretty sure the default threading mode for the SQLite driver is serialized, and I'm also doing some serializing with Kotlin coroutines (via actors).
I'm load testing the application locally and I notice that disk utilization spikes around the one-minute mark, but I can't determine why. I do know that throughput plummets when that happens. I expected the server to chug along at a more or less constant rate. The db in these tests is pretty small too; it hardly reaches 1 MB.
Hoping people can recommend some next steps or set me straight as far as performance expectations. I'm assuming there is some SQLite-specific thing that happens when throughput is very high for too long, but I would have thought it would be related to WAL or something (which I'm not using).

I have a theory but it's a bit farfetched.
The fact that you hit a performance wall after some time makes me think that either a buffer somewhere is filling up, or some other kind of data accumulation threshold is being reached.
Where exactly the culprit is, I'm not sure.
So, I'd run the following tests.
// At the beginning
connection.setAutoCommit(true);
If the problem is on the driver side, in its handling of the rollback/transaction buffer, then this will (hopefully only slightly) slow down operations, "spreading" the impact away from the one-minute mark. Instead of fast operations for 59 seconds followed by some seconds of full stop, you get not-so-fast operations the whole time.
In case the problem is further down the line, try
PRAGMA JOURNAL_MODE=MEMORY
PRAGMA SYNCHRONOUS=OFF
The first keeps the rollback journal in memory instead of on disk; the second stops SQLite from waiting for writes to be synced to disk.
(The data will be more at risk in case of a catastrophic power loss.)
Finally, another possibility is that the page translation buffer gets filled after a sufficient number of different keys have been entered. You can test this directly with these two tests:
1) pre-fill the database with all the keys in ascending order and a large request, then start updating those same keys.
2) run the test with only very few keys.
If the slowdown does not occur in the above cases, then either TLB management is not up to the challenge, or database fragmentation is a problem.
It might be the case that issuing
PRAGMA PAGE_SIZE=32768
upon database creation might solve or mitigate the problem. Conversely, PRAGMA PAGE_SIZE=1024 could "spread" the problem avoiding performance bottlenecks.
Another thing to try is closing the database connection and reopening it when it gets older than, say, 30 seconds. If this works, we'll still need to understand why it works (in this case I expect the JDBC driver to be at fault).

First of all, I should say that I don't use exactly your SQLite driver, and I work with different devices (but how different are they really?).
From what I see, and correct me if I'm wrong, you use one transaction per insert statement. For every request you hit the disk, use memory, open, close, and so on. This can't be fast.
The first thing I do when I have to do inserts in SQLite is group them and use a single transaction for the whole group. That way, you use your resources in batches.
One transaction, many insert statements, a single commit. If there is a problem with a batch, handle the valid items separately, log the faulty ones, and move on to the next batch of requests.
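A minimal sketch of that batching pattern with plain JDBC could look like this. It assumes the items (key, value) table from the question and already-serialized byte[] keys and values; BatchedWriter, chunk and writeBatch are hypothetical names, and a SQLite JDBC driver has to be on the classpath to open a real Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BatchedWriter {
    // Splits pending writes into groups of at most batchSize items.
    static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Writes one batch inside a single transaction: many inserts, one commit.
    static void writeBatch(Connection connection, List<Map.Entry<byte[], byte[]>> batch)
            throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        String sql = "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (Map.Entry<byte[], byte[]> entry : batch) {
                statement.setBytes(1, entry.getKey());
                statement.setBytes(2, entry.getValue());
                statement.addBatch();
            }
            statement.executeBatch();
            connection.commit();
        } catch (SQLException e) {
            // Drop the whole batch; the caller can retry its items one by one and log the faulty ones.
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The batch size is a tuning knob: larger batches mean fewer commits, but more work is lost and retried when one item in the batch fails.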

How do I determine whether a deadlock will occur in this system?

N processes share M resource units that can be reserved and released only one at a time. The maximum need of each process does not exceed M, and the sum of all maximum needs is less than M+N. Can a deadlock occur in the system?

I hope you got the answer. Answering this question for other visitors.
The answer is that the deadlock will not occur in the system.
The proof is given in the image below.
The image was taken from http://alumni.cs.ucr.edu/~choua/school/cs153/Solution%20Manual.pdf on page 31
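For readers without the image, the usual contradiction argument can be sketched as follows. The notation (Max, Alloc, Need) is my own and not necessarily what the linked solution uses, and the sketch assumes every process needs at least one unit.

```latex
\begin{align*}
&\text{Notation: } \mathit{Max}_i = \mathit{Alloc}_i + \mathit{Need}_i \quad (i = 1, \dots, N). \\
&\text{Assume a deadlock. Then no unit is free, so } \sum_{i=1}^{N} \mathit{Alloc}_i = M. \\
&\text{Given: } \sum_{i=1}^{N} \mathit{Max}_i
   = \sum_{i=1}^{N} \mathit{Alloc}_i + \sum_{i=1}^{N} \mathit{Need}_i < M + N. \\
&\text{Hence } \sum_{i=1}^{N} \mathit{Need}_i < N,
   \text{ so some process } k \text{ has } \mathit{Need}_k = 0. \\
&\text{Process } k \text{ already holds everything it needs, can finish, and releases its units.}
\end{align*}
```

Once process k has finished, the remaining N-1 processes share M units and the sum of their maximum needs is less than M+(N-1), so the same argument applies again. No process waits forever, which contradicts the assumed deadlock.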

The system you are describing looks like semaphores.
About your last question: YES. You "could" always create a deadlock; if you don't see how, ask a young/shameful/motivated/deviant developer.
One good way to make one is to have strange rules for locking and releasing resources. For example, if a process needs M resources to perform a task, it could lock half of them right away and then wait for the other half to become available before doing anything.
I assume it never gives up until it has its M precious resources, and releases them all once the task is done.
A single process wouldn't cause much trouble, but several will, as they will try to lock more than M resources in total and will each need more of them to get out of this frozen state.

Can you sacrifice performance to get concurrency in Sqlite on a NFS?

I need to write a client/server app stored on a network file system. I am quite aware that this is a no-no, but was wondering if I could sacrifice performance (Hermes: "And this time I mean really slash.") to prevent data corruption.
I'm thinking something along the lines of:
Create a separate file in the system every time a write is called (I'm willing to do it for every connection if necessary)
Store the file name as the current millisecond timestamp
Check to see if a file with that time or earlier exists
If the same one exists, wait a random time between 0 and 10 ms, and try again.
While the file has the earliest timestamp, do the work and delete the lock file; otherwise wait 10 ms and try again.
If a file persists for more than a minute, log it as an error and stop until a person has determined that the data is not corrupted.
The problem I see is trying to maintain the previous state if something locks up. Or choosing to ignore it, if the state change was actually successful.
Is there a better way of doing this, that doesn't involve not doing it this way? Or has anyone written one of these with a lot less problems than the Sqlite FAQ warns about? Will these mitigations even factor in to preventing data corruption?
A couple of notes:
This must exist on an NFS; the why is not important because it is not my decision to make (it doesn't look like I was clear enough on that point).
The number of readers/writers on the system will be between 5 and 10, all reading and writing at the same time, but rarely on the same record.
There will only be clients and a shared memory space. There is no way to put a server on there or use a server-based RDBMS; if there were, obviously I would do it in a New York minute.
The amount of data will initially start off at about 70 MB (plain text, uncompressed), and it will grow continuously from there at a reasonable, but not tremendous, rate.
I will accept an answer of "No, you can't gain reasonably guaranteed concurrency on an NFS by sacrificing performance" if it contains a detailed and reasonable explanation of why.

Yes, there is a better way. Don't use NFS to do this.
If you are willing to create a new file every time something changes, I expect that you have a small amount of data and/or very infrequent changes. If the data is small, why use SQLite at all? Why not just have files with node names and timestamps?
I think it would help if you described the real problem you are trying to solve a bit more. For example if you have many readers and one writer, there are other approaches.

What do you mean by "concurrency"? Do you actually mean "multiple readers/multiple writers", or can you get by with "multiple readers/one writer with limited latency"?

Starting multiple orchestrations from a parent orchestration and passing messages to them

I have a situation where a main orchestration is responsible for processing a convoy of messages. These messages belong to a set of customers. The orchestration will read the messages as they come in, and for each new customer id it finds, it will spin up a new orchestration that is responsible for processing the messages of that particular customer. I have to preserve the order of messages as they come in, so each newly created orchestration should process the message it has and wait for additional messages from the main orchestration.
I tried different ways to tackle this, but was not able to implement it successfully.
I would like to hear your opinions on how this could be done.
Thanks.

It sounds like what you want is a set of nested convoys. While it might be possible to get that working, it's going to... well, hurt. In particular, my first worry would be maintenance: any changes to the process would be a pain in the neck to make, and, much worse, deployment would really, really suck.
Personally, I would really try to find an alternative way to implement this and avoid the convoys if possible, but that would depend a lot on your specific scenario.
A few questions, if you don't mind:
What are your ordering requirements? For example, do you only need ordered processing for each customer on a single incoming batch, or across batches? If the latter, could you make do without the master orchestration and just force a single convoy'd instance per customer? Still not great, but would likely simplify things a lot.
What are your failure requirements with respect to ordering? Should it completely stop processing? Save the message and keep going? What about retries?
Is ordering based purely on the arrival time of the message? Is there anything in the message that you could use to force ordering internally instead of relying purely on the arrival time?
What does the processing of the individual messages do? Is the ordering requirement only to ensure that certain preconditions are met when a specific message is processed (for example, messages represent some tree structure that requires parents to be processed before children)?

I don't think you need a master orchestration to start up the sub-orchestrations. I am assuming you are not talking about the master orchestration implementing a convoy pattern. So, if that's the case, here's what I might do.
There is a brief example here on how to implement a singleton orchestration. This example shows you how to set up an orchestration that will only ever exist once. All the messages going to it will be lined up in order of receipt and processed one at a time. Your example differs in that you want to have this done by customer ID. This is pretty simple. Promote the customer ID in the inbound message and add it to the correlation type. Now, there will only ever be one instance of the orchestration per customer.
The problem with singletons is this. You have to kill them at some point or they will live forever as dehydrated orchestrations. So, you need to have them end. You can do this if there is a way for the last message for a given customer to signal the orchestration that it's time to die, through an attribute or such. If this is not possible, then you need to set a timer: if no messages are received in x seconds, terminate the orchestration. This is all easy to do, but it can introduce Zombies. Zombies occur when the orchestration is in the process of shutting down just as another message for that customer comes in. This can usually be reduced by tweaking the time to wait. Regardless, it will cause the occasional Zombie.
A note from the field. We've done this and it's really not a great long-term solution. We were receiving customer info updates and we had to ensure ordered processing. We used this singleton approach and it's been problematic because of the Zombie issue and the exception issue. If the singleton orchestration throws an exception, it will block the processing of all future messages for that customer. So, handle every single possible exception. The real solution would have been to have the far-end system check the timestamps on the update messages and discard ones that were older than the last update. We wanted to go this way, but the receiving system didn't want to do this extra work.
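To make that last idea concrete, here is a rough sketch of what such a check on the receiving side might look like. It is purely illustrative: StaleUpdateFilter and shouldApply are made-up names, and it assumes each update carries a customer ID and a timestamp in milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

class StaleUpdateFilter {
    // Timestamp of the newest update applied so far, per customer ID.
    private final Map<String, Long> lastApplied = new HashMap<>();

    // Returns true if the update should be applied, false if it is older than the last applied one.
    synchronized boolean shouldApply(String customerId, long timestampMillis) {
        Long previous = lastApplied.get(customerId);
        if (previous != null && timestampMillis < previous) {
            return false; // an out-of-order, stale update: discard it
        }
        lastApplied.put(customerId, timestampMillis);
        return true;
    }
}
```

With a check like this, delivery order stops mattering for "latest state wins" updates, so the sender would not need a singleton per customer.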

Practical value for concurrent-request-timeout parameter or options for avoiding concurrent access to conversation exception

In the Seam Reference Guide, one can find this paragraph:
We can set a sensible default for the concurrent request timeout (in ms) in components.xml:
<core:manager concurrent-request-timeout="500" />
However, we found that 500 ms is not nearly enough time for most of the cases we had to deal with, especially given the severe restrictions Seam places on conversation access.
In our application we have a combination of page scoped ajax requests (triggered by various user actions), some global scoped polling notification logic (part of the header, so included in every page) and regular links that invoke actions and/or navigate to other pages.
Therefore, we get the dreaded concurrent access to conversation exception way too often, even without any significant load on the site.
After researching the options for quite a bit, we ended up bumping this value to several seconds (we're debating whether to bump it up to 10s), as none of the recommended solutions seemed able to solve our issue completely (even forcing a global queue for all the ajax requests would still leave us exposed to a user deciding to click a link right when one of our polling calls was in progress). And we'd much rather have the users wait for a second or two instead of getting an error page just because they clicked a link at the wrong moment.
And now to the question: is there something obvious we're missing (like a way to allow concurrent access to conversations and taking care of the needed locking ourselves, for instance :)? How do people solve this problem (ajax requests mixed with user-driven interaction) in Seam? Disabling all the links on the page while ajax requests are in progress (as suggested by one blog page) is really not a viable option.
Any other suggestions?
TIA,
Andrei

We use 60000 or 120000 (1-2 minutes). Concurrent-request-timeout is designed to avoid deadlocks. Historically we have far more problems with timeouts than deadlocks. A better approach is to use a client-side queue (<a4j:ajaxQueue> if using RichFaces) to serialize and remove duplicate requests as much as possible, then set the timeout high enough to avoid any remaining problems.
There are many serious issues resulting from Seam's concurrent request timeouts:
The issue is the last request gets the ConcurrentRequestTimeoutException. If the user double-clicks or reloads the page, only the last request matters -- why should he get an error?
Usually the ConcurrentRequestTimeoutException is suppressed, and only secondary NullPointerExceptions and @In injection failures are shown, making debugging difficult.
Seam 2.2.1 has a severe problem where transactions, ThreadLocals, and locks may leak after a timeout occurs, especially when used with <spring:spring-transaction/>. Look at SeamPhaseListener.afterRestoreView: there's no finally block to clean up after restoreConversation fails!
In my opinion there are many poor aspects to this design, so it's best to use a much higher timeout and try to avoid the issues.

This is what we have and it works fine for us:
<core:manager concurrent-request-timeout="5000"
conversation-timeout="120000" conversation-id-parameter="cid"
parent-conversation-id-parameter="pid" />

We also use a much higher value for the concurrent-request-timeout.
At least for duplicate events you can use settings in the a4j components to filter and delay them with eventsQueue, requestDelay and ignoreDupResponses="true".
(Last point http://docs.jboss.org/seam/2.0.1.GA/reference/en/html/conversations.html )

Can you analyse which types of request are taking a long time? Is there a particular type whose request time you could reduce by doing the "work" asynchronously and picking up the result in your poll?
In my opinion, ajax requests should always complete fairly quickly; then you can calculate a maximum concurrent request time as (request time * max number of requests likely to be initiated). For example, if a request takes about 300 ms and up to five may be pending on a conversation, that suggests a timeout of roughly 1500 ms.
