In the docs here:
http://www.sqlite.org/threadsafe.html
For serialized mode it says:
"In serialized mode, SQLite can be safely used by multiple threads with no restriction."
I want to make sure I understand the guarantee presented there. If a single database connection is opened using the SQLITE_OPEN_FULLMUTEX flag and two threads call sqlite3_exec() on it at the exact same instant, does SQLite automatically serialize the calls?
The answer is yes. sqlite3_exec() grabs a mutex when the function is entered and releases the mutex when the function returns. Only one thread can own the mutex at any given time, so only one thread can be executing sqlite3_exec() at any given time. If two threads try to execute sqlite3_exec() at the same time, one will wait for the other.
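To see the guarantee in action, here is a minimal Kotlin sketch using the xerial sqlite-jdbc driver (an assumption; the question doesn't name a driver), assuming its bundled SQLite is compiled in serialized mode, which is the default. Two threads hammer the same connection and the native mutex serializes their calls:

import java.sql.DriverManager
import kotlin.concurrent.thread

fun main() {
    // One shared connection; SQLite's serialized-mode mutex is what
    // keeps the two writer threads from interleaving inside the library.
    val conn = DriverManager.getConnection("jdbc:sqlite::memory:")
    conn.createStatement().use { it.executeUpdate("CREATE TABLE t (n INTEGER)") }
    val writers = List(2) {
        thread {
            repeat(1000) {
                conn.createStatement().use { st ->
                    st.executeUpdate("INSERT INTO t VALUES (1)")
                }
            }
        }
    }
    writers.forEach { it.join() }
    conn.createStatement().use { st ->
        val rs = st.executeQuery("SELECT COUNT(*) FROM t")
        rs.next()
        println(rs.getInt(1)) // expect 2000: every insert survived
    }
    conn.close()
}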
I'll definitely need to update this based on feedback so I apologize in advance.
The problem I'm trying to solve is roughly this.
The graph shows disk utilization in the Windows Task Manager. My SQLite application is a web server that takes in JSON requests with timestamps, looks up the existing entry in a two-column key/value table, merges the request into the existing item (items don't grow over time), and then writes it back to the database.
The db is created as follows. I've experimented both with and without WAL and saw no difference.
createStatement().use { it.executeUpdate("CREATE TABLE IF NOT EXISTS items ( key TEXT NOT NULL PRIMARY KEY, value BLOB );") }
The write/set is done as follows
try {
    val insertStatement = "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)"
    prepareStatement(insertStatement).use {
        it.setBytes(1, keySerializer.serialize(key))
        it.setBytes(2, valueSerializer.serialize(value))
        it.executeUpdate()
    }
    commit()
} catch (t: Throwable) {
    rollback()
    throw t
}
I use a single database connection the entire time which seems to be ok for my use case and greatly improves performance relative to getting a new one for each operation.
val databaseUrl = "jdbc:sqlite:${System.getProperty("java.io.tmpdir")}/$name-map-v2.sqlite"
if (connection?.isClosed == true || connection == null) {
    connection = DriverManager.getConnection(databaseUrl)
}
I'm effectively serializing access to the db. I'm pretty sure the default threading mode of the SQLite driver is serialized, and I'm also doing some serializing of my own in Kotlin coroutines (via actors).
I'm load testing the application locally, and I notice that disk utilization spikes around the one-minute mark, but I can't determine why. I do know that throughput plummets when that happens. I expect the server to chug along at a more or less constant rate. The db in these tests is pretty small too; it hardly reaches 1 MB.
Hoping people can recommend some next steps or set me straight as far as performance expectations go. I'm assuming there is some SQLite-specific thing that happens when throughput is very high for too long, but I would have thought it would be related to WAL or something (which I'm not using).
I have a theory, but it's a bit far-fetched.
The fact that you hit a performance wall after some time makes me think that either a buffer somewhere is filling up, or some other kind of data accumulation threshold is being reached.
Where exactly the culprit is, I'm not sure.
So, I'd run the following tests.
// At the beginning
connection.setAutoCommit(true);
If the problem is on the driver's side, in the buffering of the rollback transaction, then this will (hopefully) slow down operations slightly, "spreading" the impact away from the one-minute mark. Instead of getting fast operations for 59 seconds and then several seconds of full stop, you get somewhat slower operations the whole time.
In case the problem is further down the line, try
PRAGMA JOURNAL_MODE=MEMORY
or even
PRAGMA SYNCHRONOUS=OFF
which stops SQLite from syncing the journal and database to disk at all (the data will be more at risk in case of a catastrophic power-down).
Finally, another possibility is that the page translation buffer gets filled once a sufficient number of distinct keys has been entered. You can test this directly with these two tests:
1) Pre-fill the database with all the keys in ascending order and a large request, then start updating those same keys.
2) Run the test with only a very few keys.
If the slowdown does not occur in either case, then it's either TLB management that's not up to the challenge, or database fragmentation is the problem.
Issuing
PRAGMA PAGE_SIZE=32768
upon database creation might solve or mitigate the problem. Conversely, PRAGMA PAGE_SIZE=1024 could "spread" the problem out, avoiding a sharp performance bottleneck.
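If you want to try these pragmas from the JDBC side, here is a minimal Kotlin sketch (reusing the question's single connection; note that page_size only takes effect before the first table is created, or after a VACUUM):

connection.createStatement().use { st ->
    st.execute("PRAGMA page_size = 32768")
    st.execute("PRAGMA journal_mode = MEMORY") // this one returns a row, so use execute(), not executeUpdate()
    st.execute("PRAGMA synchronous = OFF")
}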
Another thing to try is closing the database connection and reopening it when it gets older than, say, 30 seconds. If this works, we'll still need to understand why it works (in this case I expect the JDBC driver to be at fault).
First of all, I want to say that I do not use exactly your driver for SQLite, and I work with different devices (but how different are they really?).
From what I see (correct me if I'm wrong), you use one transaction per insert statement. You get a request, you touch the disk, you touch memory, you open and close things, every single time. That can't be fast.
The first thing I do when I have to do inserts in SQLite is to group them and use a single transaction. That way, you use your resources in batches.
One transaction, many insert statements, a single commit, as in the sketch below. If there is a problem with a batch, handle the valid entries separately, log the faulty ones, and move on to the next batch of requests.
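A rough Kotlin sketch of that, reusing the question's table and serializers (the requests list holding the pending key/value pairs is hypothetical):

connection.autoCommit = false
try {
    connection.prepareStatement(
        "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)"
    ).use { st ->
        for ((key, value) in requests) {
            st.setBytes(1, keySerializer.serialize(key))
            st.setBytes(2, valueSerializer.serialize(value))
            st.addBatch()
        }
        st.executeBatch()
    }
    connection.commit()   // one commit (one fsync) for the whole batch
} catch (t: Throwable) {
    connection.rollback() // log the faulty batch, move on to the next one
    throw t
}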
In the previous date-time API, the classes are not thread-safe. I want to know how thread safety was achieved in the new date-time API in Java 8. (Earlier you could also get thread safety by synchronizing, or by making a separate instance for each thread.) What new things did Java 8 add? Please give some examples too. Thank you.
The SimpleDateFormat that has existed since the early days of Java uses internal fields to hold temporary state, but does nothing to prevent two threads from updating that state concurrently. This leads to the wrong date being returned if two threads happen to call the format or parse methods on the same SimpleDateFormat instance at the same time, since each modifies the internal state of the SimpleDateFormat object while the other is still using it.
Java 8 hasn't done anything to change SimpleDateFormat. Instead it introduced a whole new date and time API (java.time, with LocalDate and friends) whose classes are immutable, so there are no internal fields that concurrent threads could corrupt. It also removed the complexity of time zones and pre-1900 dates that was a headache for users of the old Date APIs.
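A Kotlin sketch of the practical difference (JDK classes only):

import java.text.SimpleDateFormat
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// The legacy formatter keeps mutable state while parsing/formatting,
// so one shared instance can silently return wrong results.
val legacy = SimpleDateFormat("yyyy-MM-dd")            // NOT safe to share

// The java.time formatter is immutable and safe to share freely.
val modern = DateTimeFormatter.ofPattern("yyyy-MM-dd")

fun parse(text: String): LocalDate = LocalDate.parse(text, modern)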
The thread safety in java.time (the modern Java date and time API introduced in Java 8) is obtained through immutable classes. An immutable object is always thread-safe (see the modification of the last statement near the bottom of the first link). As Holger notes in a comment,
without mutation, there can’t be any inconsistencies.
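A small Kotlin illustration of that immutability:

import java.time.LocalDate

// Every "mutating" operation returns a new instance; the original never
// changes, so a shared LocalDate can be read from any thread without locks.
val d1 = LocalDate.of(2014, 3, 18)
val d2 = d1.plusDays(1) // new object
println(d1)             // 2014-03-18, untouched
println(d2)             // 2014-03-19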
Links:
Does Immutability Really Mean Thread Safety?
Immutable objects are thread safe, but why?
I'm investigating asynchronous transactions in order to improve performance.
Could you please explain the behaviour of a transaction in a replicated asynchronous cache?
Suppose I have a transaction composed of operations in which every operation depends on the previous one (i.e. the order of execution matters).
For instance, consider a transaction T that performs a read of data1, needed to build data2; data2 is then written to the cache.
TRANSACTION T {
    // 1st operation
    data1 = get(key1);
    // 2nd operation
    data2 = elaborate(data1);
    // 3rd operation
    put(data2);
}
In other words, I need entire transactions to be executed asynchronously, but the operations performed inside a transaction must remain synchronous.
Is that possible? If yes, how do I have to configure Infinispan?
Many Thanks
Francesco Sclano
I guess you've wrapped those operations into tm.begin(); try { ... tm.commit(); } catch (...) { tm.rollback(); }.
There are a few other options, but I'll assume you use the defaults: optimistic transactions and two-phase commit. I'd also recommend enabling the write-skew check; this should be the default in Infinispan 7, but in prior versions you had to enable it explicitly, and some operations behave oddly without it.
As for the get, this is always synchronous - you have to wait for the response.
Before commit, put basically does just another get (in order to return the previous value) and records that the transaction should do the write during commit.
Then, during commit, the PrepareCommand, which locks all updated entries, is sent asynchronously; therefore, you can't know whether it succeeded or failed (usually it fails due to a timeout, but also due to the write-skew check or a changed value in a conditional command).
The second phase, the CommitCommand, which overwrites the entries and releases the locks, is sent synchronously after that, so you wait until the response is received. However, if you have <transaction useSynchronization="true" /> (the default), the commit succeeds even if this command fails somewhere.
In order to send the CommitCommand (or RollbackCommand) asynchronously, you need to configure <transaction syncCommitPhase="false" syncRollbackPhase="false" />.
I hope I've interpreted the sources correctly; don't ask me why it is done this way. Regrettably, I am not sure whether you can configure the semantics "commit or roll back this transaction reliably, but don't report the result and let me continue" out of the box.
EDIT: The commit in asynchronous mode should be one-phase, as you can't check the result anyway. Therefore, it's possible that concurrent writes get reordered on different nodes and the cluster becomes inconsistent, but you are not left waiting for the transaction to complete.
Anyway, if you want to execute the whole block atomically and asynchronously, there's nothing easier than wrapping the code in a Runnable and executing it in your own thread pool, against a synchronous cache.
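A rough Kotlin sketch of that suggestion (tm, cache, and elaborate stand in for your TransactionManager, the Infinispan cache, and the business logic from the question's pseudocode):

import java.util.concurrent.Executors

val pool = Executors.newFixedThreadPool(4)

fun runTransactionAsync(key1: String, key2: String) {
    pool.execute {
        tm.begin()
        try {
            val data1 = cache.get(key1)  // 1st op: synchronous read
            val data2 = elaborate(data1) // 2nd op: depends on data1
            cache.put(key2, data2)       // 3rd op: synchronous write
            tm.commit()
        } catch (e: Exception) {
            tm.rollback()
        }
    }
}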
I am learning about F# agents (MailboxProcessor).
I am dealing with a rather unconventional problem.
I have one agent (dataSource) which is a source of streaming data. The data has to be processed by an array of agents (dataProcessor). We can consider dataProcessor as some sort of tracking device.
Data may flow in faster than a dataProcessor can process it.
Some delay is OK. However, I have to ensure that each agent stays on top of its work and does not get buried under obsolete observations.
I am exploring ways to deal with this problem.
The first idea is to implement a stack (LIFO) in dataSource. dataSource would send over the latest observation available when a dataProcessor becomes available to receive and process the data. This solution may work, but it may get complicated, as dataProcessor would need to be blocked and re-activated, and to communicate its status back to dataSource, leading to a two-way communication problem. This may boil down to a blocking queue in the producer-consumer problem, but I am not sure.
The second idea is to have dataProcessor take care of message sorting. In this architecture, dataSource would simply post updates to dataProcessor's queue. dataProcessor would then use Scan to fetch the latest data available in its queue. This may be the way to go. However, I am not sure whether, in the current design of MailboxProcessor, it is possible to clear a queue of messages, deleting the older, obsolete ones. Furthermore, here it is written that:
Unfortunately, the TryScan function in the current version of F# is broken in two ways. Firstly, the whole point is to specify a timeout but the implementation does not actually honor it. Specifically, irrelevant messages reset the timer. Secondly, as with the other Scan function, the message queue is examined under a lock that prevents any other threads from posting for the duration of the scan, which can be an arbitrarily long time. Consequently, the TryScan function itself tends to lock-up concurrent systems and can even introduce deadlocks because the caller's code is evaluated inside the lock (e.g. posting from the function argument to Scan or TryScan can deadlock the agent when the code under the lock blocks waiting to acquire the lock it is already under).
Having the latest observation bounced back may be a problem.
The author of this post, @Jon Harrop, suggests that
I managed to architect around it and the resulting architecture was actually better. In essence, I eagerly Receive all messages and filter using my own local queue.
This idea is surely worth exploring but, before starting to play around with code, I would welcome some inputs on how I could structure my solution.
Thank you.
Sounds like you might need a destructive scan version of the mailbox processor, I implemented this with TPL Dataflow in a blog series that you might be interested in.
My blog is currently down for maintenance but I can point you to the posts in markdown format.
Part1
Part2
Part3
You can also check out the code on github
I also wrote about the issues with scan in my lurking horror post
Hope that helps...
tl;dr I would try this: take Mailbox implementation from FSharp.Actor or Zach Bray's blog post, replace ConcurrentQueue by ConcurrentStack (plus add some bounded capacity logic) and use this changed agent as a dispatcher to pass messages from dataSource to an army of dataProcessors implemented as ordinary MBPs or Actors.
tl;dr2 If workers are a scarce and slow resource and we need to process the message that is the latest at the moment a worker becomes ready, then it all boils down to an agent with a stack instead of a queue (with some bounded-capacity logic) plus a BlockingQueue of workers. The dispatcher dequeues a ready worker, pops a message from the stack, and sends the message to the worker. After the job is done, the worker enqueues itself back to the queue when it becomes ready (e.g. before let! msg = inbox.Receive()). The dispatcher's consumer thread then blocks until any worker is ready, while the producer thread keeps the bounded stack updated. (A bounded stack could be done with an array + offset + size inside a lock; the one described below is more complex than needed.)
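Here is a rough Kotlin sketch of that shape (my own illustration, not code from any library mentioned below; LinkedBlockingDeque plays the role of the bounded stack, and workers pull the newest message directly instead of going through a separate dispatcher thread):

import java.util.concurrent.LinkedBlockingDeque
import kotlin.concurrent.thread

// A deque used as a bounded stack: the producer pushes new observations
// on the front, evicting the oldest from the back when full; a worker
// pops the newest observation the moment it becomes free.
class LatestWinsBuffer<T>(capacity: Int) {
    private val deque = LinkedBlockingDeque<T>(capacity)

    fun post(msg: T) {
        while (!deque.offerFirst(msg)) deque.pollLast() // drop the most obsolete
    }

    fun takeLatest(): T = deque.takeFirst() // blocks until data arrives
}

fun main() {
    val buffer = LatestWinsBuffer<Int>(capacity = 16)
    repeat(4) { id ->                    // a small army of slow workers
        thread(isDaemon = true) {
            while (true) {
                val obs = buffer.takeLatest()
                Thread.sleep(50)         // simulate slow processing
                println("worker $id processed $obs")
            }
        }
    }
    repeat(10_000) { buffer.post(it) }   // fast producer
    Thread.sleep(1000)                   // let the workers drain a bit
}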
Details
MailboxProcessor is designed to have only one consumer. This is even commented in the source code of the MBP here (search for the word 'DRAGONS' :) ).
If you post your data to an MBP, only one thread can take it from the internal queue or stack.
In your particular use case, I would use ConcurrentStack directly, or better, wrapped in a BlockingCollection:
It will allow many concurrent consumers
It is very fast and thread safe
BlockingCollection has a BoundedCapacity property that allows you to limit the size of the collection. It throws on Add, but you can catch the exception or use TryAdd. If A is the main stack and B is a standby, TryAdd to A; on false, Add to B and swap the two with Interlocked.Exchange; then process the needed messages in A, clear it, and make it the new standby (use three stacks if processing A could take longer than B takes to fill again). This way you never block and never lose messages, but you can discard unneeded ones in a controlled way.
BlockingCollection has methods like AddToAny/TakeFromAny, which work on arrays of BlockingCollections. This could help, e.g.:
dataSource produces messages to a BlockingCollection with ConcurrentStack implementation (BCCS)
another thread consumes messages from the BCCS and sends them to an array of processing BCCSs. You said that there is a lot of data, so you may sacrifice one thread to blocking and dispatching your messages indefinitely
each processing agent has its own BCCS, or is implemented as an Agent/Actor/MBP to which the dispatcher posts messages. In your case you need to send a message to only one processing agent, so you may store the processing agents in a circular buffer to always dispatch a message to the least recently used processor.
Something like this:
(data stream produces 'T)
            |
   [dispatcher's BCCS]
            |
(a dispatcher thread consumes 'T and pushes it to processors,
 managing the BCCS's capacity and the LRU queue)
      |                                       |
[processor1's BCCS/Actor/MBP] ... [processorN's BCCS/Actor/MBP]
      |                                       |
  (process)                               (process)
Instead of ConcurrentStack, you may want to read about the heap data structure. If you need your latest messages selected by some property of the messages, e.g. a timestamp, rather than by the order in which they arrive on the stack (e.g. if there could be delays in transit, so that arrival order <> creation order), a heap will give you the latest message.
If you still need Agent semantics/APIs, you could read several sources in addition to Dave's links, and somehow adapt an implementation to multiple concurrent consumers:
An interesting article by Zach Bray on an efficient Actors implementation. There you would need to replace the line execute true (under the comment // Might want to schedule this call on another thread.) with something like async { execute true } |> Async.Start, because otherwise the producing thread would also be the consuming thread, which is not good for a single fast producer. For a dispatcher like the one described above, however, this is exactly what is needed.
The FSharp.Actor (aka Fakka) development branch and the F# MBP source code (first link above) could be very useful for implementation details. The FSharp.Actor library has been frozen for several months, but there is some activity in the dev branch.
Don't miss the discussion about Fakka in Google Groups in this context, either.
I have a somewhat similar use case, and for the last two days I have researched everything I could find on F# Agents/Actors. This answer is a kind of TODO list for myself, to try out these ideas, half of which were born while writing it.
The simplest solution is to greedily eat all messages in the inbox when one arrives and discard all but the most recent. Easily done using TryReceive:
let rec readLatestLoop oldMsg =
    async { let! newMsg = inbox.TryReceive 0
            match newMsg with
            | None -> return oldMsg
            | Some newMsg -> return! readLatestLoop newMsg }
let readLatest() =
    async { let! msg = inbox.Receive()
            return! readLatestLoop msg }
When faced with the same problem, I architected a more sophisticated and efficient solution that I called cancellable streaming, described in an F# Journal article here. The idea is to start processing messages and then cancel that processing if it is superseded. This significantly improves concurrency if significant processing is being done.
I have the feeling that I should not have to care about thread-safe reading of/writing to a
public static int MyVar = 12;
in ASP.NET.
I read/write to this variable from various user threads. Let's suppose this variable stores the number of clicks on a certain button/link.
My theory is that no two threads can read/write this variable at the same time; it's just a simple variable of 4 bytes.
I do care about thread safety, but only for reference objects and List instances or other types that take more cycles to read/update.
Am I wrong in my presumption?
EDIT
I understand this depends on my scenario, but that wasn't the point of the question. The question is: is it right that thread-safe code can be written with a (static int) variable without using the lock keyword?
It is my problem to write correct code. The answer seems to be: yes, if you write correct, simple code that isn't too complicated, you can create thread-safe functions without needing the lock keyword.
If one thread simply sets the value and another thread reads the value, then a lock is not necessary; the read and write are atomic. But if multiple threads might be updating it and are also reading it to do the update (e.g., increment), then you definitely do need some kind of synchronization. If only one thread is ever going to update it even for an increment, then I would argue that no synchronization is necessary.
Edit (three years later) It might also be desirable to add the volatile keyword to the declaration to ensure that reads of the value always get the latest value (assuming that matters in the application).
The concept of thread 'safety' is too vague to be meaningful unfortunately. If you're asking whether you can read and write to it from multiple threads without the program crashing during the operation, the answer is almost certainly yes. If you're also asking if the variable is guaranteed to either be the old value or the new value without ever storing any broken intermediate values, the answer for this data type is again almost certainly yes.
But if your question is "will my program work correctly if I access this from multiple threads", then the answer depends entirely on what your program is doing. For example, if you run the following pseudo code in 2 threads repeatedly in most programming languages, eventually you'll hit the assertion.
if MyVar >= 1:
    MyVar = MyVar - 1
assert MyVar >= 0
Primitives like int are thread-safe in the sense that reads/writes are atomic. But as with most any type, it's left to you to do proper checking with more complex operations. For example, if (x > 0) x--; would be problematic in a multi-threaded scenario because x might change in between the if condition check and decrement.
A simple read or write of a field of 32 bits or less is always atomic. But you should still review your read/write code to make sure that it is thread-safe.
Check out this post: http://msdn.microsoft.com/en-us/magazine/cc163929.aspx
It explains why you need to synchronize access to the integers in this scenario
Try Interlocked.Increment() or Interlocked.Add() and you'll be fine. Your code complexity will be the same, but you truly won't have to worry. If you're not worried about losing a few clicks in your counter, you can continue as you are.
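For comparison, the JVM counterpart as a Kotlin sketch (the answer above is about .NET's Interlocked class):

import java.util.concurrent.atomic.AtomicInteger

// The read-modify-write happens atomically inside incrementAndGet(),
// so concurrent clicks are never lost and no lock is needed.
val clicks = AtomicInteger(0)

fun onClick() {
    clicks.incrementAndGet()
}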
Reading or writing integers is atomic. However, reading and then writing is not atomic. So, if you have one thread that writes and many that read, you may be able to get away without locks.
However, even though the operations are atomic, there are still potential multi-threading issues. In order for one thread to be guaranteed that another thread can see values it writes, you need a memory barrier. Otherwise, the compiler can optimize the code so that the variable stays in a register (or even optimize the operation away completely), so changes would be invisible from one thread to another.
You can establish a memory barrier explicitly (volatile or Thread.MemoryBarrier), or with the Interlocked class -- or with the lock statement (Monitor).
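The same visibility point, illustrated as a Kotlin/JVM sketch (@Volatile corresponds to C#'s volatile):

// Without the volatile marker, the reading thread might never observe
// the writer's update, because the value can be cached in a register.
object Worker {
    @Volatile var stop = false

    fun loop() {
        while (!stop) { /* work */ }
    }
}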