I've spent a fair amount of time looking into the Realm database mechanics and I can't figure out if Realm is using row level read locks under the hood for data selected during write transactions.
As a basic example, imagine the following "queue" logic
assume the queue has an arbitrary number of jobs (we'll say 5 jobs)
async getNextJob() {
    let nextJob = null;
    this.realm.write(() => {
        let jobs = this.realm.objects('Job')
            .filtered('active == FALSE')
            .sorted([['priority', true], ['created', false]]);
        if (jobs.length) {
            nextJob = jobs[0];
            nextJob.active = true;
        }
    });
    return nextJob;
}
If I call getNextJob() twice concurrently and row-level read blocking isn't occurring, there's a chance that both calls will return the same job object when we query for jobs.
Furthermore, if I have outside logic that relies on up-to-date data in read logic (i.e. job.active == false when it is actually true at the current time), I need the read to block until update transactions complete. MVCC reads that return stale data do not work in this situation.
If read locks are being set in write transactions, I could make sure I'm always reading the latest data like so
let active = null;
this.realm.write(() => {
    const job = this.realm.pseudoQueryToGetJobByPrimaryKey();
    active = job.active;
});
// Assuming the above write transaction blocked the read until
// any concurrent updates touching the same job committed
// the value for active can be trusted at this point in time.
if (active === false) {
    // code to start job here
}
So basically, TL;DR does Realm support SELECT FOR UPDATE?
Postgresql
https://www.postgresql.org/docs/9.1/static/explicit-locking.html
MySql
https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html
So basically, TL;DR does Realm support SELECT FOR UPDATE?
Well if I understand the question correctly, the answer is slightly trickier than that.
If there is no Realm Object Server involved, then realm.write(() => { ... }) disallows any other writes at the same time, and updates the Realm to its latest version when the transaction is opened.
If there is Realm Object Server involved, then I think this still stands locally, but the Realm Sync manages the updates from remote, in which case the conflict resolution rules apply for remote data changes.
Realm does not allow concurrent writes. There is at most one ongoing
write transaction at any point in time.
If the async getNextJob() function is called twice concurrently, one of
the invocations will block on realm.write().
SELECT FOR UPDATE then works trivially, since there are no concurrent updates.
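A minimal sketch of why the single-writer rule makes the queue logic safe. It does not use the Realm API: SingleWriterStore and claimNextJob are hypothetical stand-ins that model a store whose write callbacks run one at a time against the latest data. Real Realm blocks a second writer until the first commits; this model simply rejects an overlapping write to make the invariant explicit.

```javascript
// Hypothetical in-memory stand-in for a single-writer store (not the Realm API).
class SingleWriterStore {
  constructor(jobs) {
    this.jobs = jobs;
    this.writing = false;
  }

  // Runs fn as the only writer; an overlapping write is rejected in this model.
  write(fn) {
    if (this.writing) throw new Error('a write transaction is already open');
    this.writing = true;
    try {
      return fn(this.jobs);
    } finally {
      this.writing = false;
    }
  }
}

// Read-then-update inside one write transaction, as in getNextJob().
function claimNextJob(store) {
  return store.write((jobs) => {
    const next = jobs
      .filter((job) => !job.active)
      // Highest priority first, then oldest first.
      .sort((a, b) => b.priority - a.priority || a.created - b.created)[0];
    if (!next) return null;
    next.active = true; // committed before the next writer reads
    return next;
  });
}

const store = new SingleWriterStore([
  { id: 1, priority: 1, created: 10, active: false },
  { id: 2, priority: 5, created: 20, active: false },
  { id: 3, priority: 5, created: 15, active: false },
]);

const first = claimNextJob(store);
const second = claimNextJob(store);
console.log(first.id, second.id); // prints: 3 2
```

Because the read and the update happen inside the same write callback, the second caller always sees active already set on the job the first caller took.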
GRPC END Contract=IPubSubPartitionManager Action=GetOrAddContextSelectorPartition ID=71eff709-3920-4139-a5c9-2b0bef6f5ba7 From=ipv4:10.0.0.56:35091 IsFault=True Duration(ms)=4932 Request { "contextSelector": "/debug/fabric--madari-madariservice-01-GrpcPublishSubscribeProber-8", "addIfNotExists": true, "clientCredential": { "credentialValue": "uswestcentral-prod.sdnpubsub.core.windows.net-client", "credentialRegex": { "pattern": "null string" }, "enablePropertyBasedAcls": false } } Response { "errorMsg": "Timed out waiting for Shared lock on key; id=49b61cd7-31ba-4c5d-b579-d068326e8a90#133028293161628388#urn:ContextSelectorMapping/dataStore#132077756302731635, timeout=4000ms, txn=133029757743026569, lockResourceNameHash=6572262935404555983; oldest txn with lock=133029757735370545 (mode Shared)\r\n" }
This problem mostly affects read operations that use Repeatable Read. For those reads, the user may request an Update lock rather than a Shared lock. An Update lock is an asymmetric lock used to prevent a common form of deadlock, which occurs when several transactions lock the same resource for a potential update later.
Avoid TimeSpan.MaxValue for time-outs. A finite time-out is what lets you detect a deadlock.
Don't create a transaction within another transaction's using statement, because it can cause deadlocks. For example, suppose two transactions (T1 and T2) each attempt to read and then update K1. Because they both end up holding the Shared lock, it is possible for them to deadlock. In this situation, one or both of the operations will time out.
To avoid this frequent type of deadlock:
Consider changing the default transaction time-out. The default is 4 seconds (the log above shows timeout=4000ms); try a different value.
Keep transactions short-lived; a transaction that lasts longer than necessary blocks other operations in the queue for longer than necessary.
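The Shared-versus-Update scenario can be sketched with a small lock table. This is an illustrative model in JavaScript, not Service Fabric code: the COMPATIBLE matrix is a generic shared/update/exclusive scheme assumed for the example, and KeyLock and tryAcquire are hypothetical names.

```javascript
// Generic shared/update/exclusive lock table (illustrative model only).
// COMPATIBLE[requested][held] is true when the request can be granted
// while another transaction holds `held`.
const COMPATIBLE = {
  shared:    { shared: true,  update: false, exclusive: false },
  update:    { shared: true,  update: false, exclusive: false },
  exclusive: { shared: false, update: false, exclusive: false },
};

class KeyLock {
  constructor() {
    this.holders = new Map(); // transaction name -> lock mode
  }

  // Grants the lock if every *other* holder is compatible; returns true on success.
  tryAcquire(txn, mode) {
    for (const [other, held] of this.holders) {
      if (other !== txn && !COMPATIBLE[mode][held]) return false;
    }
    this.holders.set(txn, mode);
    return true;
  }

  release(txn) {
    this.holders.delete(txn);
  }
}

// Case 1: both transactions read with Shared, then try to upgrade to write.
const k1 = new KeyLock();
k1.tryAcquire('T1', 'shared');
k1.tryAcquire('T2', 'shared');
const t1Upgrade = k1.tryAcquire('T1', 'exclusive'); // blocked by T2's Shared
const t2Upgrade = k1.tryAcquire('T2', 'exclusive'); // blocked by T1's Shared
const deadlocked = !t1Upgrade && !t2Upgrade; // neither proceeds until a time-out

// Case 2: both transactions read with Update instead.
const k2 = new KeyLock();
const t1Read = k2.tryAcquire('T1', 'update');     // granted
const t2Read = k2.tryAcquire('T2', 'update');     // must wait: T1 holds Update
const t1Write = k2.tryAcquire('T1', 'exclusive'); // granted, no other holder
k2.release('T1');
const t2Retry = k2.tryAcquire('T2', 'update');    // granted after T1 finishes

console.log({ deadlocked, t1Read, t2Read, t1Write, t2Retry });
```

In the second case T2 waits at the read instead of both transactions waiting on each other at the write, so the wait resolves as soon as T1 finishes.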
For Reference: Azure Service Fabric
I'm trying to update the same document which triggered an onUpdate cloud function, with a read value from the same collection.
This is in a kind of chat app made in Flutter, where the previous response to an inquiry is replicated to the document now being updated, for easier showing in the app.
The code does work; however, when a user quickly responds to two separate inquiries, both invocations read the same latest response and therefore set the same previousResponse. This must be down to the asynchronous nature of Flutter and/or the Cloud Function, but I can't figure out where to await, or whether there's a better way to write the function so that it never triggers onUpdate for the same user until a previous trigger has finished.
The last part also sounds like a bad idea.
So far I've tried putting the read/update in a transaction; however, that only seems to work within a single function call, not across asynchronous calls.
I also figured I could fix it by reading the previous response in a transaction on the client; however, Firebase doesn't allow reading from a collection in a transaction when not using the server API.
async function setPreviousResponseToInquiry(
    senderUid: string,
    recipientUid: string,
    inquiryId: string) {
    return admin.firestore().collection('/inquiries')
        .where('recipientUid', '==', recipientUid)
        .where('senderUid', '==', senderUid)
        .where('responded', '==', true)
        .orderBy('modified', 'desc')
        .limit(2)
        .get().then(snapshot => {
            if (!snapshot.empty &&
                snapshot.docs.length >= 2) {
                return admin.firestore()
                    .doc(`/inquiries/${inquiryId}`)
                    .get().then(snap => {
                        return snap.ref.update({
                            previousResponse: snapshot.docs[1].data().response
                        })
                    })
            }
        })
}
I see three possible solutions:
Use a transaction on the server, which ensures that the update you write must be based on the version of the data you read. If the value you write depends on the data that triggers the Cloud Function, you may need to re-read that data as part of the transaction.
Don't use Cloud Functions, but run all updates from the client. This allows you to use transactions to prevent the race condition.
If it's not possible to use a transaction, you may have to include a custom version number in both the upstream data (the data that triggers the write) and the fanned-out data that you're updating. You can then use security rules to ensure that the downstream data can only be written if its version matches the current upstream data.
I'd consider/try them in the above order, as they gradually get more involved.
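The idea behind the transaction and version-number options can be sketched as an optimistic read-check-write loop. This is a minimal in-memory model, not the Firestore API: DocStore, compareAndSet and runTransaction are hypothetical names, and the competing writer is simulated inside the first attempt.

```javascript
// Hypothetical in-memory document store with optimistic transactions.
// It only models "retry if what I read has changed since I read it".
class DocStore {
  constructor() {
    this.docs = new Map(); // id -> { version, data }
  }

  get(id) {
    return this.docs.get(id) || { version: 0, data: null };
  }

  // Writes only if the document is still at the version the caller read.
  compareAndSet(id, expectedVersion, data) {
    if (this.get(id).version !== expectedVersion) return false;
    this.docs.set(id, { version: expectedVersion + 1, data });
    return true;
  }
}

// Re-runs updateFn on fresh data until its write is based on the latest version.
function runTransaction(store, id, updateFn, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const snapshot = store.get(id);
    const newData = updateFn(snapshot.data, attempt);
    if (store.compareAndSet(id, snapshot.version, newData)) return attempt;
  }
  throw new Error('transaction failed after ' + maxAttempts + ' attempts');
}

const store = new DocStore();
store.compareAndSet('latest', 0, { response: 'first reply' });

// Simulate a competing writer that commits while our first attempt is running.
const attempts = runTransaction(store, 'latest', (data, attempt) => {
  if (attempt === 1) {
    store.compareAndSet('latest', 1, { response: 'second reply' });
  }
  return { response: 'third reply', previousResponse: data.response };
});

console.log(attempts, store.get('latest').data);
```

The first attempt is rejected because it was based on a stale read, and the retry picks up the competing write, so previousResponse ends up reflecting the response that was actually latest.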
I have written a flow which creates a transaction that outputs a new state (TransactionBuilder.signInitialTransaction), and then passes it to FinalityFlow to notarize/record/broadcast it. My client application starts this flow over RPC with CordaRPCOps.startFlowDynamic and waits on the returned CordaFuture's getOrThrow(). This is rather slow, since FinalityFlow only returns once it has delivered the transaction to all other parties/nodes (in fact, if a remote node is down it seems to never return).
I figured I can speed things up by letting my application only wait for FinalityFlow to have completed notarizeAndRecord(), as I should then have the tx/states in my node's vault and I can safely assume that other nodes will eventually have this tx delivered and accept it. I implemented this using ProgressTracker, waiting only until FinalityFlow sets currentStep to BROADCASTING.
However, what I'm observing is that if I query the vault (using CordaRPCOps.vaultQueryByCriteria) for the new state very shortly after notarizeAndRecord has returned, I sometimes do not yet get it returned. Is this a bug, or rather some deliberate asynchronous behavior where the database is not immediately written to?
To work around this I then tried to synchronize with the vault inside my flow, in order to update the progressTracker only after the tx/state was actually written to the vault:
val stx = serviceHub.signInitialTransaction(tx)
serviceHub.vaultService.rawUpdates.subscribe {
    logger.info("receiving update $it")
    if(it.produced.any { it.ref.txhash == stx.id }) {
        progressTracker.currentStep = RECORDED
    }
}
subFlow(FinalityFlow(stx))
I can see the update in the node-logs, yet a subsequent vault-query by the RPC-Client (which also shows in the node-logs, after the update) for that very state still does not return anything if executed immediately afterwards...
I am running Corda v2.0.
I do not know whether vault writes are synchronous.
However, you can side-step this issue by creating an observable on the vault so that you are notified when the new state is recorded. Here's an example where we update a state using its linear ID, then wait for vault updates matching that linear ID:
proxy.startFlowDynamic(UpdateState::class.java, stateLinearId)
val queryCriteria = QueryCriteria.LinearStateQueryCriteria(linearId = listOf(stateLinearId))
val (snapshot, updates) = proxy.vaultTrackBy<MyLinearState>(queryCriteria)
updates.toBlocking().subscribe { update ->
    val newVaultState = update.produced.single()
    // Perform action here.
}
I have a node on firebase that lists all the players in the game. This list will update as and when new players join. And when the current user ( me ) disconnects, I would like to remove myself from the list.
As the list will change over time, at the moment I disconnect, I would like to update this list and update firebase.
This is the way I am thinking of doing it, but it doesn't work because .update doesn't accept a function, only an object. But if I create the object beforehand, it will not be the latest object by the time .onDisconnect fires. How should I go about doing this?
payload.onDisconnect().update( () => {
    const withoutMe = state.roomObj
    const index = withoutMe.players.indexOf( state.userObj.name )
    if ( index > -1 ) {
        withoutMe.players.splice( index, 1 )
    }
    return withoutMe
})
The onDisconnect handler was made for this use-case. But it requires that the data of the write operation is known at the time that you set the onDisconnect. If you think about it, this should make sense: since the onDisconnect happens after your client is disconnected, the data of that write operation must be known before the disconnect.
It sounds like you're building a so-called presence system: a list that contains a node for each user that is currently online. The Firebase documentation has an example of such a presence system. The key difference from your approach is that in the documentation each user only modifies their own node.
So: when the user comes online, they write a node for themselves. And then when they get disconnected, that node gets removed. Since all users write their node under the same parent, that parent will reflect the users that are online.
The actual implementation is a bit more involved since it deals with some edge cases too. So I recommend you check out the code in the documentation I linked, and use that as the basis for your own similar system.
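The per-user-node pattern can be sketched with a small in-memory model. This is not the Firebase SDK: PresenceServer and its methods are hypothetical, and the model only shows why a removal registered up front stays correct no matter who joins later.

```javascript
// Hypothetical in-memory model of a server that supports queued disconnect writes.
class PresenceServer {
  constructor() {
    this.players = {};         // shared parent node: one child per online user
    this.onDisconnectOps = {}; // client id -> child keys to remove at disconnect
  }

  // The client registers the removal up front, while it is still connected.
  connect(clientId, name) {
    this.players[clientId] = { name };
    this.onDisconnectOps[clientId] = [clientId];
  }

  // The server applies the queued removals itself; the client is already gone.
  disconnect(clientId) {
    for (const key of this.onDisconnectOps[clientId] || []) {
      delete this.players[key];
    }
    delete this.onDisconnectOps[clientId];
  }

  onlineNames() {
    return Object.values(this.players).map((p) => p.name).sort();
  }
}

const server = new PresenceServer();
server.connect('u1', 'alice');
server.connect('u2', 'bob');
server.connect('u3', 'carol'); // joins after u1 registered its disconnect handler
server.disconnect('u1');

console.log(server.onlineNames()); // prints: [ 'bob', 'carol' ]
```

Because each client only ever removes its own child, the queued operation never needs to know the rest of the list, which is the problem the function-based update in the question was trying to work around.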
I'm trying to use Dart with sqlite, with this project dart-sqlite.
But I found a problem: the API it provides is synchronous. The code looks like this:
// Iterating over a result set
var count = c.execute("SELECT * FROM posts LIMIT 10", callback: (row) {
    print("${row.title}: ${row.body}");
});
print("Showing ${count} posts.");
With such code, I can't use Dart's future support, and the code blocks on SQL operations.
I wonder how to change the code to asynchronous style? You can see it defines some native functions here: https://github.com/sam-mccall/dart-sqlite/blob/master/lib/sqlite.dart#L238
_prepare(db, query, statementObject) native 'PrepareStatement';
_reset(statement) native 'Reset';
_bind(statement, params) native 'Bind';
_column_info(statement) native 'ColumnInfo';
_step(statement) native 'Step';
_closeStatement(statement) native 'CloseStatement';
_new(path) native 'New';
_close(handle) native 'Close';
_version() native 'Version';
The native functions are mapped to some c++ functions here: https://github.com/sam-mccall/dart-sqlite/blob/master/src/dart_sqlite.cc
Is it possible to change to asynchronous? If possible, what shall I do?
If it's not possible and I have to rewrite it, do I have to rewrite all of:
The dart file
The c++ wrapper file
The actual sqlite driver
UPDATE:
Thanks to @GregLowe's comment, I learned that Dart's Completer can convert callback style to future style, which lets me use Dart's doSomething().then(...) instead of passing a callback function.
But after reading the source of dart-sqlite, I realized that, in the implementation of dart-sqlite, the callback is not event-based:
int execute([params = const [], bool callback(Row)]) {
    _checkOpen();
    _reset(_statement);
    if (params.length > 0) _bind(_statement, params);
    var result;
    int count = 0;
    var info = null;
    while ((result = _step(_statement)) is! int) {
        count++;
        if (info == null) info = new _ResultInfo(_column_info(_statement));
        if (callback != null && callback(new Row._internal(count - 1, info, result)) == true) {
            result = count;
            break;
        }
    }
    // If update affected no rows, count == result == 0
    return (count == 0) ? result : count;
}
Even if I use Completer, it won't increase the performance. I think I may have to rewrite the c++ code to make it event-based first.
You should be able to write a wrapper without touching the C++. Have a look at how to use the Completer class in dart:async. Basically you need to create a Completer, return Completer.future immediately, and then call Completer.complete(row) from the existing callback.
Re: update. Have you seen this article, specifically the bit about asynchronous extensions? i.e. If the C++ API is synchronous you can run it in a separate thread, and use messaging to communicate with it. This could be a way to do it.
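To illustrate the point raised in the update: wrapping a synchronous loop in a future-style API changes the calling convention but not the blocking behavior. The sketch below uses a JavaScript Promise as a rough analog of Dart's Completer; executeSync and executeAsFuture are hypothetical names, and the row data is made up for the example.

```javascript
// Hypothetical synchronous driver call: steps through every row before returning.
function executeSync(rows, callback) {
  let count = 0;
  for (const row of rows) {
    count++;
    callback(row);
  }
  return count;
}

// Future-style wrapper, analogous to completing a Completer once the rows are in.
function executeAsFuture(rows, events) {
  return new Promise((resolve) => {
    const collected = [];
    const count = executeSync(rows, (row) => {
      events.push('row ' + row.title);
      collected.push(row);
    });
    resolve({ count, rows: collected });
  });
}

const events = [];
const future = executeAsFuture([{ title: 'a' }, { title: 'b' }], events);
events.push('caller resumed');

// Every row was processed before the caller got control back.
console.log(events);
future.then((result) => console.log('rows:', result.count));
```

The caller only resumes after the whole result set has been stepped through, which is why the blocking work itself has to move to another thread or isolate to gain anything.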
The big problem you've got is that SQLite is an embedded database; in order to process your query and provide your results, it must do computation (and I/O) in your process. What's more, in order for its transaction handling system to work, it either needs its connection to be in the thread that created it, or for you to run in serialized mode (with a performance hit).
Because these are fairly hard constraints, your plan of switching things to an asynchronous operation mode is unlikely to go well except by using multiple threads. Since using multiple connections complicates things a lot (as you can't share some things between them, such as TEMP TABLEs) let's consider going for a single serialized connection; all activity will be serialized at the DB level, but for an application that doesn't use the DB a lot it will be OK. At the C++ level, you'd be talking about calling that execute from another thread and then sending messages back to the caller thread to indicate each row and the completion.
But you'll take a real hit when you do this; in particular, you're committing to only doing one query at a time, as the technique runs into significant problems with semantic effects when you start using two connections at once and the DB forces serialization on you with one connection.
It might be simpler to do the above by putting the synchronous-asynchronous coupling at the Dart level by managing the worker thread and inter-thread communication there. That would let you avoid having to change the C++ code significantly. I don't know Dart well enough to be able to give much advice there.
Myself, I'd just stick with synchronous connection processing so that I can make my application use multi-threaded mode more usefully. I'd be taking the hit with the semantics and giving each thread its own connection (possibly allocated lazily) so that overall speed was better, but I do come from a programming community that regards threads as relatively heavyweight resources, so make of that what you will. (Heavy threads can do things that reduce the number of locks they need that it makes no sense to try to do with light threads; it's about overhead management.)