Xamarin.Forms: Synchronization with Azure MobileServices too slow - sqlite

I'm trying to implement Azure MobileServices in Xamarin.Forms, following this tutorial: https://learn.microsoft.com/en-us/azure/developer/mobile-apps/azure-mobile-apps/quickstarts/xamarin-forms/offline
but I noticed that synchronization is very slow. For example, I synchronized a db containing 15 tables and about 60k records, and the entire process took about 6 minutes! The result changes only a little if I rerun the operation on a db that is already synchronized.
Is it possible to improve the entire process?
I doubt that this technology is still used extensively, because there is very little documentation on the internet and what exists is often out of date.
If so, what are the alternatives?

First, 60K records will take a long time to synchronize the first time. That is unavoidable given the amount of data to transfer, so 6 minutes is not a surprise.
However, you should make sure you have implemented everything needed for incremental sync. That includes ensuring your model has the UpdatedAt and CreatedAt timestamps plus a globally unique ID, and naming your query when you call PullAsync(). Something like:
await mTable.PullAsync("allItems", mTable.CreateQuery());
More information: https://azure.github.io/azure-mobile-apps/howto/client/dotnet/#syncing-an-offline-table
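To illustrate why naming the query matters, here is a conceptual sketch in Python (the real client code would be C#). The idea is that the client remembers the latest UpdatedAt it has seen for each query name and asks the server only for newer records on the next pull. This is an assumption-laden model, not the Azure Mobile Apps SDK's actual implementation: `fetch_since` is a hypothetical stand-in for the server query, and the in-memory dictionaries stand in for the SDK's local store.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

Row = Dict[str, object]

# Earliest possible timestamp, used when a query name has never been pulled.
EPOCH = datetime.min.replace(tzinfo=timezone.utc)


class IncrementalPuller:
    """Conceptual model of named-query incremental pull (not the real SDK)."""

    def __init__(self, fetch_since: Callable[[datetime], List[Row]]):
        # Hypothetical server call: returns rows whose "UpdatedAt" is strictly after `since`.
        self._fetch_since = fetch_since
        self.deltas: Dict[str, datetime] = {}  # query name -> latest UpdatedAt seen
        self.local: Dict[str, Row] = {}        # local store keyed by globally unique Id

    def pull(self, query_id: str) -> int:
        """Pull only rows changed since this query's last pull; return how many arrived."""
        since = self.deltas.get(query_id, EPOCH)
        batch = self._fetch_since(since)
        for row in batch:
            self.local[str(row["Id"])] = row   # upsert: an updated row replaces the old copy
        if batch:
            self.deltas[query_id] = max(row["UpdatedAt"] for row in batch)
        return len(batch)
```

In this model, a pull without a stored mark starts from the beginning and downloads everything again. That would be consistent with a re-sync taking almost as long as the first sync.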

Related

Firebase database high delay after a long standby

I'm currently testing Firebase on a non-production Firebase app that only I work on.
When I query the database after there has been no query during the last 24 hours, the query takes about 8 seconds. After that query is done, subsequent queries take a normal amount of time (about 100 ms).
This is not about query caching: by "next queries" I mean new, different queries.
To reproduce it:
Create a database node called users, whose children are user records (first name, last name, age, gender, etc.)
Add 500,000 users to this node
Get a user by its UID and measure the time. (It should take about 100ms)
Wait 24 hours (I don't know the exact threshold, but 24 hours is enough)
Get any user by its UID and measure the time. (It should take about 8sec)
Get any user by its UID and measure the time. (It should take about 100ms)
I want to know whether this is a known issue with the Firebase Realtime Database.
I contacted Firebase support; they were able to reproduce the issue and saw a wait time of about 6 seconds. Here is their answer after the investigation:
It looks like this is intended behavior. Realtime Database queries work by building the index in memory, which takes time linear in the number of nodes at that location. Once the index is built things are very fast, but the initial build can take a while, especially for large locations.
If you want the index to stay in memory on the database, you should have a listener always listening for this query.
So basically, the database takes a long time to process the first query because it has to build the index for a large location.
The problem can be solved by keeping a listener on the database or by querying the database every few hours.
In production you are not very likely to face this problem, because users access the database all the time. But if your database is not accessed constantly and you don't want users to experience that long wait, you should use one of these workarounds.
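If a permanent listener is not an option, the "query every few hours" workaround can be driven by a small idle timer. The sketch below is hypothetical: `run_query` is assumed to wrap whatever read your app normally performs against that location, and the idle threshold is a tunable assumption (the 24 hours in the question is an observation, not a documented value).

```python
import time
from typing import Callable


class KeepWarm:
    """Re-issues a query when a data location has been idle too long (hypothetical helper)."""

    def __init__(self, run_query: Callable[[], None], max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._run_query = run_query          # assumption: wraps the app's usual read
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._last_access = clock()

    def record_access(self) -> None:
        """Call whenever real user traffic queries the location."""
        self._last_access = self._clock()

    def tick(self) -> bool:
        """Call periodically from a scheduler; returns True if a warm-up query was sent."""
        if self._clock() - self._last_access < self._max_idle:
            return False
        self._run_query()
        self._last_access = self._clock()
        return True
```

Real traffic resets the timer, so warm-up queries are only sent when nobody else has touched the data recently.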
Firebase keeps recently used data in its internal cache. This cache is cleared after a few minutes.
But the exact numbers depend on how much data you're loading and how you're loading it. Without seeing a specific setup that reproduces these numbers, there really isn't much anyone can say.

Is it ok to build architecture around regular creation/deletion of tables in DynamoDB?

I have a messaging app where all messages are arranged into seasons by creation time. There could be billions of messages each season. I need to delete the messages of old seasons. I came up with a solution based on DynamoDB table creation and deletion:
Each table contains messages of only one season
When a season becomes 'old' and its messages are no longer needed, its table is deleted
Is this a good pattern, and is it encouraged by Amazon?
P.S. I'm asking because I'm afraid of two things I've run into with other Amazon services:
In Amazon S3 you have to delete each item before you can fully delete a bucket. When you have billions of items, that becomes a real pain.
In Amazon SQS there is a notion of 'unwanted behaviour': when using the SQS API you can act badly towards the SQS infrastructure (for example, by not polling messages) and be penalized for it.
Yes, this is an acceptable design pattern, it actually follows a best practice put forward by the AWS team, but there are things to consider for your specific use case.
AWS limits the number of tables per region (256 at the time of writing), but the limit can be raised. If you expect to need orders of magnitude more than this, you should probably re-evaluate.
You can delete a DynamoDB table that still contains records. If you have a large number of records that you regularly need to delete, using a rolling set of tables is actually a best practice.
Creating and deleting tables are asynchronous operations, so you do not want your application to depend on how long they take to complete. Make sure you create tables well before you need them. Under normal circumstances tables are created in anywhere from a few seconds to a few minutes, but during very, very rare outages I've seen it take hours.
The DynamoDB best practices documentation on Understand Access Patterns for Time Series Data states:
You can save on resources by storing "hot" items in one table with higher throughput settings, and "cold" items in another table with lower throughput settings. You can remove old items by simply deleting the tables. You can optionally backup these tables to other storage options such as Amazon Simple Storage Service (Amazon S3). Deleting an entire table is significantly more efficient than removing items one-by-one, which essentially doubles the write throughput as you do as many delete operations as put operations.
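A rolling set of per-season tables boils down to a periodic planning step. The sketch below assumes a hypothetical naming scheme (`messages_season_NNNN`) and integer season numbers. It creates the current season and a configurable number of future seasons ahead of time, because table creation time is not guaranteed, and it drops tables older than the retention window. Applying the plan would be done with your SDK's create-table and delete-table calls (for example boto3's DynamoDB client `create_table` / `delete_table`), which are left out here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

SEASON_PREFIX = "messages_season_"  # hypothetical naming scheme: one table per season


def season_table(season: int) -> str:
    return f"{SEASON_PREFIX}{season:04d}"


@dataclass
class RotationPlan:
    create: List[str]
    delete: List[str]


def plan_rotation(existing: Iterable[str], current_season: int,
                  keep_seasons: int, create_ahead: int) -> RotationPlan:
    """keep_seasons: seasons (including the current one) that must stay readable.
    create_ahead: future seasons whose tables should already exist."""
    ours: Dict[int, str] = {}
    for name in existing:
        suffix = name[len(SEASON_PREFIX):]
        if name.startswith(SEASON_PREFIX) and suffix.isdigit():  # ignore unrelated tables
            ours[int(suffix)] = name
    oldest_kept = current_season - keep_seasons + 1
    create = [season_table(s)
              for s in range(current_season, current_season + create_ahead + 1)
              if s not in ours]
    delete = [ours[s] for s in sorted(ours) if s < oldest_kept]
    return RotationPlan(create=create, delete=delete)
```

Running this well before each season boundary keeps the application independent of how long table creation and deletion actually take.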
It's perfectly acceptable to split your data the way you describe. You can delete a DynamoDB table regardless of its size or how many items it contains.
As far as I know, there are no explicit SLAs for the time it takes to delete or create tables (meaning there is no way to know whether it will take 2 seconds, 2 minutes, or 20 minutes), but as long as your solution does not depend on this sort of timing, you're fine.
In fact the idea of sharding your data based on age has the potential of significantly improving the performance of your application and will definitely help you control your costs.

Riak and time-sorted records

I'd like to sort some records, stored in Riak, by a function of each record's score and "age" (current time - creation date). What is the best way to do a "time-sensitive" query in Riak? So far, the options I'm aware of are:
Realtime mapreduce - Do the entire calculation in a mapreduce job, at query-time
ETL job - Periodically do the query in a background job, and store the result back into riak
Punt it to the app layer - Don't sort at all using riak, and instead use an application-level layer to sort and cache the records.
MapReduce seems best on paper; however, I've read mixed reports about the real-world latency of Riak MapReduce.
MapReduce is a fairly expensive operation and is not recommended as a real-time querying tool. It works best when run in batch mode over a limited set of data, where the number of concurrent MapReduce jobs can be controlled, so I would not recommend the first option.
Having a process periodically process and aggregate data for a specific time slice, as in the second option, could work and allows efficient access to the prepared data through direct key access. If you are using LevelDB, the aggregation process could be based on a secondary index holding a timestamp. One downside, however, is that newly inserted records may not show up in the results immediately, which may or may not be a problem in your scenario.
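One way to get direct key access to the prepared data is to derive the storage key from the time slice. The sketch assumes hourly slices and a hypothetical `ranking:<yyyymmdd>T<hhmm>` key format; the Riak reads and writes themselves are not shown. The background job writes each slice's result under its key, and readers compute the key of the latest completed slice and fetch it directly.

```python
from datetime import datetime, timedelta, timezone


def slice_start(ts: datetime, slice_minutes: int = 60) -> datetime:
    """Round an aware timestamp down to the start of its slice (slice_minutes must divide 1440)."""
    ts = ts.astimezone(timezone.utc)
    minutes = ts.hour * 60 + ts.minute
    start = minutes - minutes % slice_minutes
    return ts.replace(hour=start // 60, minute=start % 60, second=0, microsecond=0)


def ranking_key(ts: datetime, slice_minutes: int = 60) -> str:
    """Hypothetical key under which the background job stores one slice's precomputed ranking."""
    return "ranking:" + slice_start(ts, slice_minutes).strftime("%Y%m%dT%H%M")


def latest_complete_slice_key(now: datetime, slice_minutes: int = 60) -> str:
    """Readers fetch the previous slice, since the current one may not be computed yet.

    This is where the staleness trade-off of the periodic-job approach shows up.
    """
    previous = slice_start(now, slice_minutes) - timedelta(minutes=slice_minutes)
    return ranking_key(previous, slice_minutes)
```
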
If you need the computed records to be accurate and will perform a significant number of these queries, you may be better off updating the computed summary records as part of the writing and updating process.
In general, it is a good idea to make sure you can get the data you need as efficiently as possible, preferably through direct key access, and then do the filtering, sorting, and aggregation on the application side.
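Ranking on the application side could look like the following sketch, once the candidate records have been fetched by key. The question does not specify the function of score and age, so the gravity-style decay used here (score divided by age in hours plus two, raised to an exponent) and the `Record` shape are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    key: str
    score: float
    created: datetime  # timezone-aware creation time


def decayed_rank(rec: Record, now: datetime, gravity: float = 1.8) -> float:
    # Example decay only: the older a record, the higher its score must be to rank equally.
    age_hours = max((now - rec.created).total_seconds() / 3600.0, 0.0)
    return rec.score / (age_hours + 2.0) ** gravity


def top_n(records: Iterable[Record], now: datetime, n: int) -> List[Record]:
    """Sort by decayed rank on the application side and keep the best n."""
    return sorted(records, key=lambda r: decayed_rank(r, now), reverse=True)[:n]
```

Because "now" changes constantly, this ranking cannot be precomputed once and stored. That is why computing it after a cheap key-based fetch fits this advice.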

ASP.NET: Create static collection for table data that doesn't change

I'm creating an ASP.NET MVC app that uses EF to perform all DB tasks.
There are a couple of related tables in the database that never change, and I was thinking of creating a static collection that retrieves the data from those two tables (a few hundred records) the first time it is requested and then keeps it in an object, to avoid hitting the database every time.
Since I've read several people saying that you should avoid static objects in ASP.NET, I was wondering whether this is bad practice or acceptable for scenarios like this (read-only, small amount of data, which should prevent concurrency problems).
I would also like to know whether there are better alternatives.
Thanks.
I have done exactly what you are planning on doing for exactly the same reason. It has been working well for several years already.
Just make sure that you get the initialization of the data right and you should be fine. When initializing, keep in mind:
Don't use locking if at all possible (or your app will deadlock 2 minutes before you're going on vacation)
You MUST NOT under any circumstance let a static constructor fail
Make sure no consumer of your cache has the ability to modify it
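The three rules above are stated for .NET, but the pattern carries over. Here is a minimal Python sketch of it, assuming a hypothetical `load_rows` function that wraps the actual database query. It loads on first use and returns a read-only view. Instead of a lock, it tolerates a benign race: two concurrent first calls may both load, and the last assignment wins, which is harmless because the data never changes. A failed load is not cached, so the next call simply retries. A .NET static constructor behaves differently: once it throws, the type stays unusable for the lifetime of the application domain.

```python
from types import MappingProxyType
from typing import Callable, Dict, Mapping, Optional


class ReferenceDataCache:
    """Read-only, lazily loaded cache for data that (almost) never changes."""

    def __init__(self, load_rows: Callable[[], Dict[str, str]]):
        self._load_rows = load_rows  # hypothetical loader, e.g. wraps the DB query
        self._data: Optional[Mapping[str, str]] = None

    def get(self) -> Mapping[str, str]:
        data = self._data
        if data is None:
            # No lock: concurrent first calls may both load; the last assignment wins.
            # If load_rows raises, nothing is cached and the next call retries.
            data = MappingProxyType(dict(self._load_rows()))
            self._data = data
        # MappingProxyType is a read-only view, so consumers cannot modify the cache.
        return data
```
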
If the data isn't really static and you would actually need to re-read it fairly often then this might not be the best solution.
In case you're wondering, I've used this approach to cache, for instance, country data, currency data (base data, not rates), and sales unit data (pcs, m, kg, etc.). These are all stored in a database but almost never change.
It is not a very good approach to use static objects. I would use something like RavenDB which can be used to store your settings or DB data in-memory. It has a very small footprint and is very fast. It has full LINQ support.

Storing messages and threads in Windows Azure Table Storage

I am designing a simple messaging service using ASP.NET MVC / Windows Azure Table Storage. I have two kinds of entities: messages and message threads. The relation between them is simple: each thread can have multiple messages, but a message can only be assigned to one thread.
Table storage is not a relational DB, so representing relations is always a bit tricky. I need to decide between 2 approaches:
Having one big table for threads and one for messages, with threadId as the partition key of the message entity so that messages are partitioned by thread.
Dynamically creating a separate table for each message thread, with threadId as the table name.
I tend to prefer the second because it fits better into the architecture of the rest of the service. But there will obviously be a large number of tables created in the storage account.
Do you think this may be a problem?
You could also consider having just one table, that stores both Thread and Message entities. This would give you transaction support, and you could use Lucifure's hybrid approach on this table.
Creating a large number of tables may be an issue, depending on how you want to manage them. The underlying REST API for listing tables works like a query for table entities: it only returns the first 1000 tables, and after that you have to use a continuation token. None of the storage explorers I've seen let you query tables by name; they simply list the first 1000 tables. If you end up with 20000 threads, it could take you a while to get to the table you want.
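To see what the continuation token means in practice, here is a small Python model of paged listing. `list_tables_page` is a hypothetical stand-in for the REST call, not a real SDK function: it takes the previous continuation token (or None) and returns one page of names plus the next token. Finding a single table by scanning means fetching pages until it turns up.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (table names, next continuation token or None)


def iter_tables(list_tables_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every table name, following continuation tokens page by page."""
    token: Optional[str] = None
    while True:
        names, token = list_tables_page(token)
        yield from names
        if token is None:
            return


def find_table(list_tables_page: Callable[[Optional[str]], Page],
               wanted: str) -> Tuple[bool, int]:
    """Linear scan for one table; also report how many pages had to be fetched."""
    pages = 0

    def counting(token: Optional[str]) -> Page:
        nonlocal pages
        pages += 1
        return list_tables_page(token)

    for name in iter_tables(counting):
        if name == wanted:
            return True, pages
    return False, pages
```

With 20000 per-thread tables and pages of 1000, looking up a late table this way takes 20 round trips.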
One way to mitigate this is to put your message tables in their own storage account. That way, your storage account with all of your other tables won't get crowded out by all of these dynamic tables that you will be creating and possibly deleting.
Deleting is actually one of the ways in which using a separate table for each thread would be easier. To delete all of the related messages you simply have to delete one table rather than iterating over each message and deleting it.
Everything else however will be more complicated than keeping all of the messages in one table. If this is core functionality to your app and you can dedicate enough time to develop it this way, one table per thread is probably a good idea. Otherwise the easy way to do things is with one big table.
You may consider a hybrid approach to keep the number of tables to a manageable level, depending on your scalability needs.
My experience has been that date-based partitioning at the table level is a very effective approach and can be leveraged across the board.
For example you could partition tables based on date and with a granularity of day or month. So a table name like “Thread201202” could be used for all threads started in February 2012.
Your thread id would implicitly include the “201202” and be something like “201202-myid01” although you would not need to explicitly store it in the partition key since it would be implied in the table name.
Aged threads could then easily be disposed of by deleting tables that are, say, more than a year old.
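The monthly scheme can be sketched as a few Python helpers. The `Thread` + yyyymm table name and the `201202-myid01` id format come from the answer. The helper names and the month-based retention check are assumptions. The actual table deletion would go through your storage client and is not shown.

```python
from datetime import date
from typing import Iterable, List

TABLE_PREFIX = "Thread"  # table name pattern from the answer: Thread + yyyymm


def table_for(d: date) -> str:
    return f"{TABLE_PREFIX}{d.year:04d}{d.month:02d}"


def thread_id(d: date, local_id: str) -> str:
    # The yyyymm prefix says which table holds the thread; the suffix is app-defined.
    return f"{d.year:04d}{d.month:02d}-{local_id}"


def table_for_thread(tid: str) -> str:
    return TABLE_PREFIX + tid.split("-", 1)[0]


def tables_older_than(existing: Iterable[str], today: date, months: int) -> List[str]:
    """Monthly thread tables whose month is more than `months` months before today's month."""
    cutoff = today.year * 12 + (today.month - 1) - months
    old = []
    for name in existing:
        suffix = name[len(TABLE_PREFIX):]
        if name.startswith(TABLE_PREFIX) and len(suffix) == 6 and suffix.isdigit():
            year, month = int(suffix[:4]), int(suffix[4:])
            if year * 12 + (month - 1) < cutoff:
                old.append(name)
    return sorted(old)
```

Because the table can be derived from the thread id, no lookup is needed to find where a thread lives.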
