I am developing a RESTful Android app using a SyncAdapter. I have watched some screencasts and read some tutorials, but they only address the basic mechanics. Before starting to write the code, I would like to get feedback from expert users about concurrency issues.
Step 1. The user U inserts a new entry E into the table T of the database. The entry has a column status = TO_SYNC.
Step 2. Before the entry E is synced, U decides to modify it. An activity A starts for modifying the column values of E.
Step 3. While the user is modifying the entry, the SyncAdapter starts and sends the entries with status == TO_SYNC to the server. For each entry, the SyncAdapter sets status = SYNCED once it receives a positive answer from the server.
Step 4. Let's say that a concurrent access to the same entry E causes a conflict:
The SyncAdapter reads E, sends E to the server
The activity A completes and sets E's status to TO_SYNC
The SyncAdapter receives the ok from the server and sets the status of E to SYNCED
At this point the new values of E will not be synced since the value TO_SYNC has been overwritten by the SyncAdapter.
My question is: how can I avoid such an issue without blocking the entire database with a begin/end transaction while syncing (which could take a long time to complete)? Should I fall back on a classic Java lock on single entries? Is there a more elegant way?
You can sync to the server based on a timestamp (call it LAST_UPDATED_TIME) rather than a simple flag, and along with it, in another column (call it HASH), store the hash of the concatenated string of all the values in the row. With this you check the timestamp since the last sync, get the incremental data to be synced, sync it to the server (in a separate thread, of course), and when you get the results back you do the following steps:
Compare the hash of the data currently in the database/row against what was synced.
Depending on the result of that comparison, two things can happen:
If the hash of the data in the db/row equals the value that was synced, you just update the LAST_UPDATED_TIME field with the time of the sync.
If the hash of the data in the db/row does not equal the value that was synced, you immediately sync the row again (you can obviously optimize this even further).
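For illustration, here is a minimal Java sketch of that check as it might run inside the SyncAdapter after the server acknowledges a row. It assumes a plain SQLite table T with hypothetical columns name and value, and a hypothetical requestSyncFor() helper that re-queues the row:

import android.content.ContentValues;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SyncCheck {

    /** SHA-256 of the concatenated column values, hex-encoded. */
    static String rowHash(String concatenatedValues) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(concatenatedValues.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // SHA-256 is always available on Android
        }
    }

    /** Called when the server acknowledges the row that was sent. */
    static void onServerAck(SQLiteDatabase db, long rowId, String syncedHash, long syncTime) {
        Cursor c = db.query("T", new String[]{"name", "value"}, "_id = ?",
                new String[]{String.valueOf(rowId)}, null, null, null);
        try {
            if (!c.moveToFirst()) return; // row was deleted in the meantime
            String currentHash = rowHash(c.getString(0) + "|" + c.getString(1));
            if (currentHash.equals(syncedHash)) {
                // Nothing changed while syncing: just record the sync time.
                ContentValues cv = new ContentValues();
                cv.put("LAST_UPDATED_TIME", syncTime);
                db.update("T", cv, "_id = ?", new String[]{String.valueOf(rowId)});
            } else {
                // The row was modified during the sync: queue it again.
                requestSyncFor(rowId); // hypothetical helper
            }
        } finally {
            c.close();
        }
    }

    private static void requestSyncFor(long rowId) {
        // e.g. mark the row TO_SYNC again or call ContentResolver.requestSync(...)
    }
}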
Another way, with a second column, updated_at:
The SyncAdapter reads E, sends E to the server
The activity A completes, sets E's status to TO_SYNC and sets updated_at to NOW
The SyncAdapter receives the ok from the server
The SyncAdapter re-reads the entry (call it E_bis) and compares E.updated_at with E_bis.updated_at: if they differ => request a new sync, else => set the status of E to SYNCED
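That last step can be made atomic with a single conditional UPDATE instead of a separate read and write. A minimal Java sketch against SQLite (table name T and column names as in the steps above are assumptions):

import android.content.ContentValues;
import android.database.sqlite.SQLiteDatabase;

final class SyncHelper {

    /**
     * Marks row E as SYNCED only if updated_at is still the value that was read
     * before the row was sent to the server. Returns false if the activity
     * modified the row in the meantime, in which case another sync is needed.
     */
    static boolean markSyncedIfUnchanged(SQLiteDatabase db, long rowId, long updatedAtWhenRead) {
        ContentValues cv = new ContentValues();
        cv.put("status", "SYNCED");
        int rows = db.update("T", cv,
                "_id = ? AND updated_at = ?",
                new String[]{String.valueOf(rowId), String.valueOf(updatedAtWhenRead)});
        return rows == 1; // 0 => E changed while it was being synced
    }
}

Because the comparison and the write happen in one statement, there is no window for the activity to slip in between them, and no table-wide transaction is held during the network call.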
Ideally, you keep all the data for as long as you can. When data does need to be deleted, it doesn't necessarily have to be deleted from all the streams; there might be data in some streams that we want to keep. The current approach does not let the user select which streams to delete telemetry data from, but rather purges data from all the streams.
The solution I came up with is to add four new functions to the existing purge script, which now lets the user select which streams to purge data from.
Steps -
First run the purge script
python purge.py
This will show you three menu options. The last option is 3 -- Purge selected streams.
Upon selecting the third option, a list of streams is displayed. The script prompts you to select stream(s) to purge. Enter a comma-separated list of stream names. If the stream name(s) is/are incorrect, you will be prompted to try one more time.
Enter the number of days older than today to purge data for, and confirm with y/n. If the input is y, data will be purged from all the streams whose IDs correspond to the stream names you entered. Lastly, a list of all the streams the data was purged from is printed. If the input is n, you will be taken back to the main menu.
To explain the code a little:
The first function is get_streams which fetches all the stream names and corresponding IDs from the stream table and stores them as key-value pairs in a dictionary.
The second function is list_streams which calls the get_streams function to get the dictionary and the existing get_stream_tables function to get all the streams corresponding to each ID in the stream table. It prints a list of streams (say, socomec 0, generator 11 etc.) for users to choose from.
The third function is stream_input, which takes a comma-separated input from the user and checks whether the stream name(s) entered exist. If the input is incorrect, it prompts the user to try again (one time only). If the input is correct, it takes the ID(s), prepends 'stream' to each, and uses a lambda function to filter all the streams corresponding to that particular ID into a list. It then prompts the user to enter the number of days and to confirm.
The fourth function, purge_stream, is a slight modification of the original purge function. The loop variable in this function is the list of streams that we get from the lambda function mentioned above, which ensures that data is purged only from the selected streams.
I want to track changes to documents in a collection in Firestore. I use a lastModified property to filter results. The reason I use this “lastModified” filter is so that each time the app starts the initial snapshot in the listener does not return all documents in the collection.
// The app is only interested in changes occurring after a certain date
let date: Date = readDateFromDatabase()
// When the app starts, begin listening for city updates.
listener = db.collection("cities")
.whereField("lastModified", isGreaterThanOrEqualTo: date)
.addSnapshotListener { (snapshot, error) in
// Process added, modified, and removed documents.
// Keep a record of the last modified date of the updates.
// Store an updated last modified date in the database using
// the oldest last modified date of the documents in the
// snapshot.
writeDateToDatabase()
}
Each time documents are processed in the closure, a new “lastModified” value is stored in the database. The next time the app starts, the snapshot listener is created with a query using this new “lastModified” value.
When a new city is created, or one is updated, its “lastModified” property is updated to “now”. Since “now” should be greater than or equal to the filter date, all updates will be sent to the client.
However, if a really old city is deleted, then its “lastModified” property may be older than the filter date of a client that has received recent updates. The problem is that the deleted city’s “lastModified” property cannot be updated to “now” when it is being deleted.
Example
Client 1 listens for updates ≥ d_1.
Client 2 creates two cities at d_2, where d_1 < d_2.
Client 1 receives both updates because d_1 < d_2.
Client 1 stores d_2 as a future filter.
Client 2 updates city 1 at d_3, where d_2 < d_3.
Client 1 receives this update because d_1 < d_3.
Client 1 stores d_3 as a future filter.
...Some time has passed.
Client 1 app starts and listens for updates ≥ d_3.
Client 2 deletes city 2 (created at d_2).
Client 1 won’t receive this update because d_2 < d_3.
My best solution
Don’t delete cities, instead add an isDeleted property. Then, when a city is marked as deleted, its “lastModified” property is updated to “now”. This update should be sent to all clients because the query filter date will always be before “now”. The main business logic of the app ignores cities where isDeleted is true.
I feel like I don’t fully understand this problem. Is there a better way to solve my problem?
The solution you've created is quite common and is known as a tombstone.
Since you no longer need the actual data of the document, you can delete its fields. But the document itself will need to remain to indicate that it's been deleted.
There may be other approaches, but they'll all end up similarly. As you have to somehow signal to each client (no matter when they connect/query) that the document is gone, keeping the document as a tombstone seems like a simple and good approach to me.
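For illustration, a tombstone-style delete could look like the following with the Firestore Android SDK in Java (the collection and field names mirror the question; name and population are hypothetical payload fields):

import com.google.firebase.firestore.FieldValue;
import com.google.firebase.firestore.FirebaseFirestore;

import java.util.HashMap;
import java.util.Map;

final class CityRepository {

    private final FirebaseFirestore db = FirebaseFirestore.getInstance();

    /** "Deletes" a city by turning the document into a tombstone instead of removing it. */
    void deleteCity(String cityId) {
        Map<String, Object> tombstone = new HashMap<>();
        tombstone.put("isDeleted", true);
        // Bump lastModified so the change still matches every client's filter.
        tombstone.put("lastModified", FieldValue.serverTimestamp());
        // Optionally drop the payload fields; the document itself must remain.
        tombstone.put("name", FieldValue.delete());
        tombstone.put("population", FieldValue.delete());

        db.collection("cities").document(cityId).update(tombstone);
    }
}

On the listening side, documents whose isDeleted flag is true are simply removed from the local model rather than displayed.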
Imagine that there are two clients, client1 and client2, both writing the same key. This key has three replicas named A, B, C. A receives client1's request first and then client2's, while B receives client2's request first and then client1's. Now A and B must be inconsistent with each other, and they cannot resolve the conflict even using vector clocks. Am I right?
If so, it seems that write conflicts occur easily in Dynamo. Why are so many open source projects based on Dynamo's design?
If you're using DynamoDB and are worried about race conditions (which you should be if you're using Lambda), you can attach conditions to putItem or updateItem and handle the case where the condition fails.
For example: during getItem the timestamp was 12345, so you add a condition that the timestamp must still equal 12345. If another process updates the item and changes the timestamp to 12346, your put/update will now fail. In Java, for example, you can catch ConditionalCheckFailedException, do another getItem, apply your changes on top, then resubmit the put/update, as sketched below.
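A minimal Java sketch of that retry pattern with the AWS SDK for Java (v1, matching the javadoc linked below); the table name, key, and attribute names are made up:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;

import java.util.HashMap;
import java.util.Map;

final class OptimisticUpdate {

    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

        Map<String, AttributeValue> key = new HashMap<>();
        key.put("id", new AttributeValue("item-1"));

        Map<String, String> names = new HashMap<>();
        names.put("#ts", "timestamp"); // alias the attribute name used in the expressions

        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":new", new AttributeValue("new data"));
        values.put(":ts", new AttributeValue().withN("12346"));
        values.put(":expected", new AttributeValue().withN("12345")); // value seen during getItem

        UpdateItemRequest request = new UpdateItemRequest()
                .withTableName("my-table")
                .withKey(key)
                .withUpdateExpression("SET payload = :new, #ts = :ts")
                .withConditionExpression("#ts = :expected") // fails if someone wrote in between
                .withExpressionAttributeNames(names)
                .withExpressionAttributeValues(values);

        try {
            client.updateItem(request);
        } catch (ConditionalCheckFailedException e) {
            // Another writer won the race: re-read the item, reapply the change, resubmit.
        }
    }
}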
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
For more information about PutItem, see Working with Items in the Amazon DynamoDB Developer Guide.
Parameters:
putItemRequest - Represents the input of a PutItem operation.
Returns:
Result of the PutItem operation returned by the service.
Throws:
ConditionalCheckFailedException - A condition specified in the operation could not be evaluated.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#putItem-com.amazonaws.services.dynamodbv2.model.PutItemRequest-
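And a corresponding sketch of the create-if-absent case described above (again, the table and attribute names are hypothetical):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

import java.util.HashMap;
import java.util.Map;

final class CreateIfAbsent {

    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

        Map<String, AttributeValue> item = new HashMap<>();
        item.put("id", new AttributeValue("item-1")); // "id" is the partition key here
        item.put("payload", new AttributeValue("hello"));

        PutItemRequest request = new PutItemRequest()
                .withTableName("my-table")
                .withItem(item)
                // Only insert if no item with this partition key exists yet.
                .withConditionExpression("attribute_not_exists(id)");

        try {
            client.putItem(request);
        } catch (ConditionalCheckFailedException e) {
            // An item with this key already exists; read it and decide how to proceed.
        }
    }
}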
I can't talk about HBase, but I can about Cassandra, which is inspired by Dynamo.
If that happens in Cassandra, the write with the most recent timestamp wins.
Cassandra uses coordinator nodes (which can be any node) that receive the client requests and resend them to all replica nodes, meaning each request gets its own timestamp.
Imagine that Client2's request is the most recent one, milliseconds after Client1's.
Replica A receives Client1's write, which is saved, and then Client2's, which is saved over Client1's since Client2's is the most recent information for that key.
Replica B receives Client2's write, which is saved, and then Client1's, which is rejected since it has an older timestamp.
Both replicas A and B hold Client2's value, the most recent information, and are therefore consistent.
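To see the timestamp rule in isolation, here is a small sketch with the DataStax Java driver against a local node (keyspace and table names are made up); the explicit USING TIMESTAMP values stand in for the coordinator-assigned timestamps, and the insert carrying the older timestamp arrives last but does not win:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public final class LastWriteWinsDemo {

    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) { // assumes a local cluster
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.kv (k text PRIMARY KEY, v text)");

            // Client2's write carries the higher timestamp...
            session.execute("INSERT INTO demo.kv (k, v) VALUES ('key', 'client2') USING TIMESTAMP 2000");
            // ...so Client1's older write, even though it arrives later, is discarded.
            session.execute("INSERT INTO demo.kv (k, v) VALUES ('key', 'client1') USING TIMESTAMP 1000");

            Row row = session.execute("SELECT v FROM demo.kv WHERE k = 'key'").one();
            System.out.println(row.getString("v")); // prints "client2"
        }
    }
}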
I want a Firebase to hold the last 10 most recently added objects, but no more. I'll use a web server log as an example.
Say I have a program watching a web server log. Every time a new entry is made in the log, I want my Firebase to get the IP address from that entry. But I only need the Firebase to store the last 10 IP addresses sent, not every one it ever received.
I can imagine doing this by setting up 10 objects in Firebase, say:
app/slot0
app/slot1
app/slot2
app/slot3
etc
Then PATCH slot0 to add the IP and, when done, update the slot tracker:
currentSlot++
And when currentSlot gets to 10 it wraps around and points to 0
if (currentSlot >= numSlots) currentSlot = 0;
So that it's basically a list of 10 objects and I'm manually keeping track of which slot is the next one. This way I don't need to store an infinite number of items, but only the last 10. And clients listening to all of these slots will get updates every time one changes.
My question is whether this is an optimal way of doing this? I can't help thinking there is a more efficient way.
There are 100 different ways to do this, but here's a thought:
Assume that an app stores 10 IP's in an array (0-9) and the IP at index 0 is the latest IP connection.
When a new connection is made, the IP at index 9 is removed from the array and the IPs at 0-8 have their indexes incremented (the IP at index 0 moves to index 1, the IP at index 1 moves to index 2, etc.).
Then the newest IP is inserted at item 0. The array data is written to Firebase.
Depending on your platform, this is as easy as inserting an IP into the array at index 0 and removing index 10, then writing to Firebase.
However, try to avoid writing arrays into Firebase. There are much better ways to do this - a node with IP and a timestamp would work well.
connection_events
connection_id_0123
ip: 192.168.1.1
timestamp: 20151107133000
connection_id_4566
ip: 198.168.1.123
timestamp: 20151107093000
The connection_id's are generated by childByAutoId or push so they are 'random' but you always have the timestamp to order by.
Another thought, using the above structure, is to query Firebase for the oldest node and remove it, then add the newest one, as sketched below. This works since ordering is controlled by the timestamp.
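A hedged sketch of that idea with the Firebase Android SDK in Java, following the node structure above (the trimming strategy is just one option, and the timestamp here is epoch milliseconds rather than the formatted value shown):

import com.google.firebase.database.DataSnapshot;
import com.google.firebase.database.DatabaseError;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;
import com.google.firebase.database.Query;
import com.google.firebase.database.ValueEventListener;

import java.util.HashMap;
import java.util.Map;

final class ConnectionLog {

    private static final int MAX_EVENTS = 10;

    private final DatabaseReference events =
            FirebaseDatabase.getInstance().getReference("connection_events");

    /** Pushes a new connection event, then trims the node back down to the newest 10. */
    void logConnection(String ip) {
        Map<String, Object> event = new HashMap<>();
        event.put("ip", ip);
        event.put("timestamp", System.currentTimeMillis());
        events.push().setValue(event);

        // Oldest entries come first when ordering by timestamp.
        Query oldestFirst = events.orderByChild("timestamp");
        oldestFirst.addListenerForSingleValueEvent(new ValueEventListener() {
            @Override
            public void onDataChange(DataSnapshot snapshot) {
                long excess = snapshot.getChildrenCount() - MAX_EVENTS;
                for (DataSnapshot child : snapshot.getChildren()) {
                    if (excess-- <= 0) break;
                    child.getRef().removeValue(); // drop the oldest nodes
                }
            }

            @Override
            public void onCancelled(DatabaseError error) {
                // Ignored in this sketch.
            }
        });
    }
}

Clients that only care about the latest entries can simply listen to events.orderByChild("timestamp").limitToLast(10) and never see the older nodes at all.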
I want to achieve something specific using ASP.NET and SQL Server. Let's say, for example, that I have several pages, each one with its own identification (i.e. id=1, id=5). Furthermore, let's assume that for each one of those ids I have a row in the database:
So in short, what I want to achieve is: pushing database changes indirectly to specific clients on specific pages, while taking advantage of web sockets (a persistent connection).
for example:
Row 1:
id = 1
name = myname1
Row 2:
id = 2
name = myname2
What I want to do is that when a specific row, or even a specific value in a column, changes, it will trigger an event that sends specific data ONLY to those clients that are visiting the page with the specific id that was changed.
For example: if row 1's column name changed from 'name1' to 'name2', and the ID of the primary key is 5, I want all those who visit the page with id=5 to receive an event on the client side.
I want to avoid developing client code that continuously sends requests to a web service and queries that specific row by id to see if it was updated or a specific column value was changed.
One solution I thought about is to keep a key/value pair in memory (e.g. memcache), where the key represents the id and the value is the datetime it was last updated. Then I can query the memory and, if the entry is for example [5, 05/11/2012 12:03:45], I can tell whether the data was updated by saving, on the client side, the last time I queried the memory and comparing the dates. If the client's datetime value is older than the one in the key/value in memory, I would query the database again.
However, it's still a passive approach.
Let me draw it how it should work:
1] Client and Server have a persistent connection [can be done using ASP.NET 4.5 socket protocols]
2] The Server knows how to differentiate between connections that come from different pages, e.g. those with different query strings (id=1, id=2, etc.). One option I thought about is to keep an in-memory map that stores the connection IDs for each query string id value. For example: {1: [2346, 6767, 87878], 2: [876, 8765, 3455]}. 1 and 2 are the pages' identifications (i.e. id=1, id=2), and the other values are the connection IDs of the persistent connections that I get using ASP.NET 4.5
3] A column value in a row with primary key value id=5 has its column 'count' updated from value '1' to '2'
4] A trigger calls a function and passes the id (let's assume value X) of the changed row. I would prefer to also be able to send specific columns' values (some columns of my choice) [this is done using CLR triggers]
5] The function has a list of connections for the clients who are visiting the page with id value X (a number)
6] The Server sends the client the column values or, if that's not possible, just sends true or false, notifying the client that a change to that row has taken place
Solved until now:
1] Can be done using ASP.NET 4.5 socket protocols
4] Using CLR triggers I can have a function that gets the column data and id of a specific row that was altered.
I am developing my app using ASP.NET 4.5.
Thanks
SQL Server Service Broker can accomplish a good portion of your requirements.
Service Broker allows for async messaging in SQL Server. Since it's asynchronous, let's split the functionality into two parts.
The first part is a trigger on the table that writes a message to the Service Broker queue. This is just straight T-SQL, and fairly straightforward. The payload of the message is anything you can convert to varbinary(max): it could be XML, a varchar(100) that contains comma-separated values, or some other representation.
The second part is the handling of the message. You issue a Transact-SQL RECEIVE statement to get the next message from the queue. This statement blocks until something arrives. A queue can have multiple conversations, so each client gets its own notifications.
Conceptually, it could work like this (the Client is ASP.NET code):
Client opens a Service Broker conversation.
Client sends a message which says "I'm interested in Page=3"
Client does a RECEIVE which blocks indefinitely
UPDATE changes data for page=3
Trigger on table sends message to every conversation that is interested in Page=3
Client receives the message, and sends updated data to web browser.
No CLR required, no periodic polling of the database.