Can I read multiple documents from Firestore in one connection, similar to transactions and batched writes, but without writing anything?
For example:
I logged in via the Google sign-in button and my player name is Player1.
First connection: I want to read the top 10 players by diamonds.
Second connection: I want to read Player1's diamonds.
Can I combine the first and second connections into one connection?
Because if the first connection fails, I want to cancel the second connection. Or if the first connection succeeds and the second connection fails, cancel the first connection, etc. I hope you understand what I mean.
Transactions and batched writes are write operations. There's nothing similar for read operations, nor should it really be needed.
If you want to fail subsequent reads when an earlier read fails, you can either:
Sequentially start each read, after the previous read has completed.
Start all reads at the same time, but check the status of each completed read operation. Only continue once all read operations completed successfully.
From reading your question it sounds like you want to do a client-side join of the player info for the top 10 ranked players. This typically leads to 11 reads:
The query to get the top 10 scores, which includes the player's UIDs.
The 10 individual document reads to get each top player's profile.
In this case you could for example keep a counter to track how many of the player profiles you've already successfully read. Once that counter reaches 10, you know you have all the player profiles, and can start any subsequent operation you might have. If you want to fail the entire operation once any player profile fails to load, you'll want a separate flag for that too.
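For illustration, here is a minimal sketch of the "start all reads, then check" variant using the Python client; the collection names ("scores", "players") and field names ("diamonds", "uid") are placeholders, not something from your project:

from google.cloud import firestore

db = firestore.Client()

def load_leaderboard():
    # Read 1: the top 10 score documents, which carry each player's UID.
    top = (db.collection("scores")
             .order_by("diamonds", direction=firestore.Query.DESCENDING)
             .limit(10)
             .get())

    # Reads 2-11: the matching player profiles, fetched together.
    refs = [db.collection("players").document(doc.get("uid")) for doc in top]
    profiles = list(db.get_all(refs))

    # Only continue once every read completed; a missing profile counts as a failure.
    if any(not snap.exists for snap in profiles):
        raise RuntimeError("a player profile failed to load; aborting")
    return top, profiles

If any profile read fails, nothing downstream runs, which is the closest you can get to the all-or-nothing behaviour you're describing.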
(Note: sorry if I am using the relational DB terms here.)
Let's say I have ten clients that are connected to a database. This database has a sustained throughput of about 1k updates per second. Obviously sending 1k updates per second to a web-browser (let's say 1MB data changes per second) is not going to be a good experience for the end-user. Does Firebase have any controls as to how much data a client can 'accept' before it starts throttling it? I understand it may batch requests, but my point here is, Google can accept data/updates faster than a browser can (potentially from a phone on a weak internet connection), so what controls or techniques are there in place to control this experience for the end-user?
The only items I see from the docs are:
You should not update a single document more than once per second. If you update a document too quickly, then your application will experience contention, including higher latency, timeouts, and other errors.
https://firebase.google.com/docs/firestore/best-practices#updates_to_a_single_document
This topic is covered here; setting aside the language the code is written in, the linked code in that answer can assist.
In a general explanation, if your client application is configured to listen for Firestore updates, it will receive all the update events to that listener (just like you mentioned is happening).
You can consider polling Firebase for changes instead. The poll can even be an extension of the client application code: the code tracks the frequency of the updates being received and has a maximum number of updates per second which, when reached, results in the client disconnecting its listener and performing periodic polls for the data.
The listener could then be re-established after a period to continue the normal workflow when there are fewer updates again.
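As a rough sketch of that fallback, assuming the Python client, a collection named "scores", a threshold of 10 updates per second and a 5-second poll interval (all arbitrary choices for illustration):

import time
from google.cloud import firestore

db = firestore.Client()
query = db.collection("scores")

MAX_UPDATES_PER_SECOND = 10   # arbitrary threshold for illustration
update_count = 0

def on_snapshot(docs, changes, read_time):
    global update_count
    update_count += 1          # the loop below turns this into a per-second rate

watch = query.on_snapshot(on_snapshot)

while True:
    time.sleep(1)
    rate, update_count = update_count, 0
    if rate > MAX_UPDATES_PER_SECOND:
        watch.unsubscribe()                       # detach the listener...
        for _ in range(6):                        # ...and poll for ~30 seconds instead
            docs = query.get()
            time.sleep(5)
        watch = query.on_snapshot(on_snapshot)    # then re-establish the listener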
The above being said, this is not optimal and treats the symptom rather than the cause. If a listener is returning too many updates, you should consider the structure of the data and look to isolate the updates to only require updates to listeners that require it.
Similarly, large updates can be mitigated by keeping the frequently changing fields in smaller records, so each change results in less data being sent.
A generalized example is where two fields of data are updated, but the record is 150 fields in size. Rather than returning the full 150 fields, shard the fields into different data sets, so the two fields are in their own record with an additional reference field used to correlate with a second data set of the remaining 148 fields (plus the reference field).
When the smaller record is updated, the client application receives the small update, determines if the update is applicable to itself, and if so, fetches the corresponding larger record.
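A minimal sketch of that listener side, assuming Firestore's Python client and made-up collection names "device_status" (the small, hot record) and "device_details" (the large one):

from google.cloud import firestore

db = firestore.Client()

def on_hot_change(docs, changes, read_time):
    for change in changes:
        hot = change.document.to_dict()    # tiny payload: just the hot fields + reference
        if hot.get("status") == "ALERT":   # decide whether the update matters to this client
            detail = db.collection("device_details").document(hot["detail_ref"]).get()
            # ...only now pay for the large 148-field record

db.collection("device_status").on_snapshot(on_hot_change)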
To prevent high volumes of writes from overwhelming the client's snapshot listeners, you could periodically duplicate the writes to a proxy collection that the client watches instead.
Documents would need a field to record the time of the last duplicate write to the proxy collection, and the process performing the writes should avoid making writes to the proxy collection until after the frequency duration has elapsed.
A small number of unnecessary writes may still occur due to any concurrent processes you have, but these might be insignificant in practice (with a reasonably long duplication frequency).
If the data belongs to a user, rather than being global data, then you could conceivably adjust the frequency of writes per user to suit their connection, either dynamically or based on user configuration.
In this way, your processes get to control the frequency of writes seen by clients, without needing to throttle or otherwise reject ingress writes (which would presumably be bad news for the upstream processes).
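A minimal sketch of that duplication step on the writer's side, assuming the Python client and made-up collection names "telemetry" and "telemetry_proxy" with a 10-second duplication interval:

import time
from google.cloud import firestore

db = firestore.Client()
DUPLICATION_INTERVAL = 10  # seconds; an arbitrary choice for illustration

def write_update(doc_id, data):
    source_ref = db.collection("telemetry").document(doc_id)
    proxy_ref = db.collection("telemetry_proxy").document(doc_id)

    snap = source_ref.get()
    last_proxy_write = (snap.to_dict() or {}).get("last_proxy_write", 0)

    source_ref.set(data, merge=True)  # ingress writes are never throttled

    if time.time() - last_proxy_write >= DUPLICATION_INTERVAL:
        # Clients listen to telemetry_proxy, so they see at most one update
        # per interval for this document.
        proxy_ref.set(data)
        source_ref.set({"last_proxy_write": time.time()}, merge=True)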
Relevant part of the documentation below.
https://firebase.google.com/docs/firestore/best-practices#realtime_updates
Limit the collection write rate (1,000 operations/second): Keep the rate of write operations for an individual collection under 1,000 operations/second.
Limit the individual client push rate (1 document/second): Keep the rate of documents the database pushes to an individual client under 1 document/second.
In my project I have to update a field value concurrently from multiple devices and check whether the count has reached 0. When I decrement the field by 1, I also have to make an entry in another table. When I try to update concurrently from 2 devices, the field is decremented by only 1 instead of 2. And when I tried using a transaction, multiple entries were added to the other table.
If multiple clients need to modify the same documents concurrently, you'll need to use a transaction. In this transaction you then get() the documents, and write the updates back to the database.
Once you do this in a transaction, the Firestore servers will detect conflicting updates, and force the second client to retry - at which point it gets the correct value (0 IIRC) and can then alert the user that there's no more stock.
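For example, a minimal sketch with the Python client, assuming a collection "items" with a count field and a second collection "entries" for the log rows (all names are placeholders):

from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def decrement_and_log(transaction, item_ref, entry_ref):
    snapshot = item_ref.get(transaction=transaction)   # read inside the transaction
    count = snapshot.get("count")
    if count <= 0:
        raise ValueError("no more stock")
    # Both writes commit atomically; if another device changed the document in the
    # meantime, Firestore makes this function retry with fresh data.
    transaction.update(item_ref, {"count": count - 1})
    transaction.set(entry_ref, {
        "item": item_ref.id,
        "remaining": count - 1,
        "created": firestore.SERVER_TIMESTAMP,
    })

item_ref = db.collection("items").document("item1")
entry_ref = db.collection("entries").document()   # auto-ID row in the other "table"
decrement_and_log(db.transaction(), item_ref, entry_ref)

Because the entry is written inside the same transaction, a retry never produces a duplicate entry: the retried attempt replaces the aborted one.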
I have diagnostic data for devices being written to Cosmos DB; some devices write thousands of messages a day while others write just a few. I always want there to be diagnostics data regardless of when it was added, but I don't want to retain all of it forever. Adding a TTL of 90 days works fine for the devices that are very active; they will always have diagnostics data as they are sending it in on a daily basis. The not-so-active devices will lose their diagnostics logs after the TTL.
Is there a way to use the TTL feature of CosmosDb but always keep at least n records?
I am looking for something like only keeping records from the last 90 days (TTL) but always keeping at least 100 documents regardless of the last-updated timestamp.
There are no built-in quantity-based filters for TTL: you either have collection-based TTL, or collection+item TTL (item-based TTL overriding default set in the collection).
You'd need to create something yourself, where you'd mark eligible documents for deletion (based on time period, perhaps?), and then run a periodic cleanup routine based on item counts, age of delete-eligible items, etc.
Alternatively, you could treat low-volume and high-volume devices differently, with high-volume device telemetry written to TTL-based collections, and low-volume device telemetry written to no-TTL collections (or something like that)...
tl;dr this isn't something built-in.
Short answer: there's no such built-in functionality.
You could create your own Function App working on a schedule trigger that fires a query as such:
SELECT *
FROM c
WHERE NOT IS_DEFINED(c.ttl) --only update items that have no ttl
ORDER BY c._ts DESC
OFFSET 100 LIMIT 2147483647 --skip the newest 100
and then updates the items it returns by setting a ttl on them. That way you'll be assured that the newest 100 records remain available (assuming you don't have another process deleting others), while the other items are cleaned up periodically. Keep in mind the update resets the TTL countdown, as _ts will be updated.
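For example, a minimal sketch of that scheduled job in Python with the azure-cosmos SDK; the account endpoint, key, database/container names and the 90-day TTL are placeholders:

from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("diagnostics").get_container_client("logs")

query = """
SELECT * FROM c
WHERE NOT IS_DEFINED(c.ttl)
ORDER BY c._ts DESC
OFFSET 100 LIMIT 2147483647
"""

for item in container.query_items(query=query, enable_cross_partition_query=True):
    item["ttl"] = 90 * 24 * 60 * 60                      # expire 90 days after this update
    container.replace_item(item=item["id"], body=item)   # replace resets _ts, and hence the TTL clock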
We are trying to cover the following scenario in a streaming setting:
calculate an aggregate (let’s say a count) of user events since the start of the job
The number of user events is unbounded (hence only using local state is not an option)
I'll discuss three options we are considering, where the first two are prone to data loss and the final one is unclear. We'd like to get more insight into this final one. Alternative approaches are of course welcome too.
Thanks!
Approach 1: Session windows, datastore and Idempotency
Sliding windows of x seconds
Group by userid
update datastore
Update datastore would mean:
Start trx
datastore read for this user
Merging in new info
datastore write
End trx
The datastore entry contains an idempotency id that equals the sliding window timestamp
Problem:
Windows can be fired concurrently and can hence be processed out of order, leading to data loss (confirmed by Google)
Approach 2: Session windows, datastore and state
Sliding windows of x seconds
Group by userid
update datastore
Update datastore would mean:
1. Pre-check: check if state for this key-window is true, if so we skip the following steps
2. Start trx
3. datastore read for this user
4. Merging in new info
5. datastore write
6. End trx
7. Store in state for this key-window that we processed it (true)
Re-execution will hence skip duplicate updates
Problem:
A failure between steps 5 and 7 will not write to local state, causing re-execution and potentially counting elements twice.
We can circumvent this by using multiple states, but then we could still drop data.
Approach 3: Global window, timers and state
Based on the article Timely (and Stateful) Processing with Apache Beam, we would create:
A global window
Group by userid
Buffer/count all incoming events in a stateful DoFn
Flush x time after the first event.
A flush would mean the same as Approach 1
Problem:
The guarantees for exactly-once processing and state are unclear.
What would happen if an element was written in the state and a bundle would be re-executed? Is state restored to before that bundle?
Any links to documentation in this regard would be very much appreciated. E.g. how does fault-tolerance work with timers?
From your Approaches 1 and 2 it is unclear whether the concern is out-of-order merging or loss of data. I can think of the following.
Approach 1: Don't immediately merge the session window aggregates because of out of order problem. Instead, store them separately and after sufficient amount of time, you can merge the intermediate results in timestamp order.
Approach 2: Move the state into the transaction. This way, any temporary failure will not let the transaction complete and merge the data. Subsequent successful processing of the session window aggregates will not result in double counting.
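As a minimal sketch of Approach 2 with the state moved into the transaction, assuming Cloud Datastore kinds named "UserAggregate" and "ProcessedWindow" (both names, and the merge logic, are only for illustration):

from google.cloud import datastore

client = datastore.Client()

def merge_window(user_id, window_ts, window_count):
    with client.transaction():
        marker_key = client.key("ProcessedWindow", f"{user_id}:{window_ts}")
        if client.get(marker_key) is not None:
            return                                  # this key-window was already merged; skip

        agg_key = client.key("UserAggregate", user_id)
        agg = client.get(agg_key) or datastore.Entity(key=agg_key)
        agg["count"] = agg.get("count", 0) + window_count

        # The aggregate update and the idempotency marker commit atomically,
        # so a failure can never apply one without the other.
        client.put_multi([agg, datastore.Entity(key=marker_key)])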
We are building a conversation system that will support messages between 2 users (and eventually between 3+ users). Each conversation will have a collection of users who can participate/view the conversation as well as a collection of messages. The UI will display the most recent 10 messages in a specific conversation with the ability to "page" (progressive scrolling?) the messages to view messages further back in time.
The plan is to store conversations and the participants in MSSQL and then only store the messages (which represents the data that has the potential to grow very large) in DynamoDB. The message table would use the conversation ID as the hash key and the message CreateDate as the range key. The conversation ID could be anything at this point (integer, GUID, etc) to ensure an even message distribution across the partitions.
In order to avoid hot partitions one suggestion is to create separate tables for time series data because typically only the most recent data will be accessed. Would this lead to issues when we need to pull back previous messages for a user as they scroll/page because we have to query across multiple tables to piece together a batch of messages?
Is there a different/better approach for storing time series data that may be infrequently accessed, but available quickly?
I guess we can assume that there are many "active" conversations in parallel, right? Meaning - we're not dealing with the case where all the traffic is regarding a single conversation (or a few).
If that's the case, and you're using a random number/GUID as your HASH key, your objects will be spread evenly across the nodes and, as far as I know, you shouldn't be afraid of skewness. Since the CreateDate is only the RANGE key, all messages for the same conversation will be stored on the same node (based on their ConversationID), so it actually doesn't matter if you query for the latest 5 records or the earliest 5. In both cases the query uses the index on CreateDate.
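For reference, a minimal sketch of that key schema with boto3; the table name "Messages", the string types and on-demand billing are assumptions:

import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.create_table(
    TableName="Messages",
    AttributeDefinitions=[
        {"AttributeName": "ConversationID", "AttributeType": "S"},
        {"AttributeName": "CreateDate", "AttributeType": "S"},   # ISO-8601 strings sort chronologically
    ],
    KeySchema=[
        {"AttributeName": "ConversationID", "KeyType": "HASH"},  # even spread across partitions
        {"AttributeName": "CreateDate", "KeyType": "RANGE"},     # messages ordered by time
    ],
    BillingMode="PAY_PER_REQUEST",
)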
I wouldn't break the data into multiple tables. I don't see what benefit it gives you (considering the previous section) and it will make your administrative life a nightmare (just imagine changing throughput for all tables, or backing them up, or creating a CloudFormation template to create your whole environment).
I would be concerned with the number of messages that will be returned when you pull the history. I guess you'll implement that with a query command using the ConversationID as the HASH key and ordering results by CreateDate descending. In that case, I'd return only the first page of results (I think it returns up to 1MB of data, so depending on the average message length, it might or might not be enough) and only fetch the next page if the user keeps scrolling. Otherwise, you might use a lot of your throughput on really long conversations, and anyway, the client doesn't really want to get stuck for a long time waiting for megabytes of data to appear on screen.
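A minimal sketch of that paging pattern with boto3 (again assuming a "Messages" table with ConversationID/CreateDate keys and a page size of 10):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Messages")

def get_messages_page(conversation_id, page_size=10, start_key=None):
    kwargs = {
        "KeyConditionExpression": Key("ConversationID").eq(conversation_id),
        "ScanIndexForward": False,   # newest first (descending CreateDate)
        "Limit": page_size,
    }
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key   # resume where the previous page ended
    response = table.query(**kwargs)
    # LastEvaluatedKey (if present) is the cursor to pass back as start_key next time
    return response["Items"], response.get("LastEvaluatedKey")

# First page, then older messages as the user scrolls:
items, cursor = get_messages_page("conv-123")
if cursor:
    older, cursor = get_messages_page("conv-123", start_key=cursor)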
Hope this helps