I'm using Azure Cosmos DB with the SQL API, and in my frontend I'm trying to create a graph that shows, for each day of a month, how many documents were uploaded on that day. The graph should cover at most one month. Below I have attached a screenshot of a mock of my idea. After some discussion in the comments, I am adding the data schema too.
Example of the data message (partition key is /message/deviceId)
{
    "message": {
        "deviceId": "device01",
        "timestamp": "2018-07-25T08:47:16,094",
        "payload": "6c,65,33"
    },
    "id": "ff670801-de08-422c-be0a-fa67e6324bb8",
    "_rid": "75klAPTTTHADAAAAAAAAAA==",
    "_self": "dbs/75klAA==/colls/75klAPTTTHA=/docs/75klAPTTTHADAAAAAAAAAA==/",
    "_etag": "\"0000bc1d-0000-0000-0000-5c112e5a0000\"",
    "_attachments": "attachments/",
    "_ts": 1544629850
}
Now my question is: what is the best way to get this type of data? I usually go for the easier and faster Functions, but I think that kind of approach wouldn't really work here, since I would need to fetch pretty much a whole month's worth of data just to count how many uploads happened each day; it would also cost a lot of time and money to do so.
Is there an alternative way of gathering this sort of data? Would you recommend another approach? If so, which one? I would prefer not to add any more services, since I am already working on a relatively large project and I am still familiarizing myself with all of these services.
EDIT: Would it be a bad idea to create some sort of document that keeps all the information about the current month, such as an array of days? That way the query would only run for the days that are not yet in the array.
Thanks a lot in advance for the help!
I'm from the CosmosDB engineering team. From your question, I understand that you need counts of documents updated per day in the last month.
You could do this in two ways:
Issue a COUNT() query with a _ts filter for the date that you're interested in (a sketch of such a query follows after the second option). This is currently sub-optimal - we are working on serving aggregates much more efficiently, and on GROUP BY support as well, but we don't have a fixed date for these features yet. If the number of documents is small enough and your collection does not have a heavy workload, you could still stick with this option.
You could set up a change feed pipeline from your source collection, capture all the changes, and use them to update a separate metadata document that tracks the number of updates per day. Here's a link to working with the change feed processor: https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed
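For the first option, here is a minimal sketch of what such a query might look like with the JavaScript SDK (@azure/cosmos); the database/container names, environment variables, and the epoch-second day boundaries are placeholder assumptions, not details from the question:

// Count documents whose server timestamp (_ts, epoch seconds) falls within one day.
const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient({
  endpoint: process.env.COSMOS_ENDPOINT,   // placeholder configuration
  key: process.env.COSMOS_KEY
});
const container = client.database("mydb").container("messages"); // placeholder names

async function countUploadsForDay(dayStart, dayEnd) {
  const querySpec = {
    query: "SELECT VALUE COUNT(1) FROM c WHERE c._ts >= @start AND c._ts < @end",
    parameters: [
      { name: "@start", value: dayStart },  // start of the day, Unix seconds
      { name: "@end", value: dayEnd }       // start of the next day, Unix seconds
    ]
  };
  const { resources } = await container.items.query(querySpec).fetchAll();
  return resources[0]; // COUNT returns a single scalar
}

Run once per day of the month, this returns a single number per request; the RU charge of the underlying scan is still what the answer calls sub-optimal.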
I'm new to Firebase and I'm currently trying to understand how to properly index frequently updated counters.
Let's say I have a list of articles on a news website. Every article is stored in my 'articles' collection, and the documents inside have a like counter, the date the article was published, and an id representing a news category. I would like to be able to retrieve the most-liked and the latest articles for every category. Therefore I'm thinking about creating two indexes: one on category type (ASC) and likes (DESC), and one on category type and published date (DESC).
I tried researching the limitations, and on the best practices page I found this warning about creating hotspots with indexes:
Creates new documents with a monotonically increasing field, like a timestamp, at a very high rate.
In my example I'm using articles, which are not created too frequently, so I'm pretty sure this wouldn't be an issue; please correct me if I'm wrong. But I do still wonder if I could run into limitations or high costs with my approach (especially regarding likes, which can change frequently, while the timestamp is constant).
Is my approach of indexing likes and timestamps by category sound, or am I overlooking something?
If you are not adding documents at a high rate, then you will not trigger the limit that you cited in your question.
From the documentation:
Maximum write rate to a collection in which documents contain sequential values in an indexed field: 500 per second
If you are changing a single document frequently, then you could run into the limitation that a single document shouldn't be updated more than about once per second (this applies to sustained bursts of updates; it is not a hard limit).
From the documentation on distributed counters:
In Cloud Firestore, you can only update a single document about once per second, which might be too low for some high-traffic applications.
That limit now seems to be missing from the formal documentation; I'm not sure why that is, but I'm told that particular rate limit has been dropped. You might want to start a discussion on firebase-talk to get an official answer from Google staff.
Whether or not your approach is "sound" depends entirely on your expected traffic. We can't predict that for you, but you are at least aware of when things will go poorly.
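If the like counter does turn out to be hot, the distributed counter pattern from the documentation quoted above spreads the writes across several shard documents. Here is a minimal sketch with the Admin SDK; the collection names and the shard count are placeholders, not details from the question:

// Likes are spread over NUM_SHARDS shard documents under each article.
const admin = require("firebase-admin");
admin.initializeApp();
const db = admin.firestore();

const NUM_SHARDS = 10; // placeholder; tune to the expected write rate

// Increment a random shard instead of one hot document.
async function incrementLikes(articleId) {
  const shardId = String(Math.floor(Math.random() * NUM_SHARDS));
  const shardRef = db.collection("articles").doc(articleId)
    .collection("likeShards").doc(shardId);
  await shardRef.set(
    { count: admin.firestore.FieldValue.increment(1) },
    { merge: true }
  );
}

// Sum the shards whenever the total is needed.
async function getLikes(articleId) {
  const shards = await db.collection("articles").doc(articleId)
    .collection("likeShards").get();
  return shards.docs.reduce((sum, doc) => sum + (doc.data().count || 0), 0);
}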
I am working on a reminder app with Flutter. It is the first time I am working with Firestore. I have searched on Google but I could not find a way to do it.
I want the option for a reminder to be set on a specific date, or on multiple days of the week, like Monday, Saturday and Sunday.
How should I set up the database? I tried a timestamp, but it is not what I want. Should it be an array?
Cheers
Usually you should write your question more precisely. It's hard to help you because we don't even know what you want to do with your app, and the solution depends on that.
If you need to filter the entries a lot according to those days (Mon, Tue, Fri), you could save them in an array so you can filter on them easily. Also, if you want to get the days together with the document you save in Firestore, I would recommend using an array; that way you won't need to call a different collection to get them. The downside of saving them in an array is that you can't just update a single entry: you would need to download the whole array, edit it, and save the whole array again. But if you only save up to 7 days of a week, that won't be a big deal.
On the other hand, if there is no need to filter on the days or to get them together with the document, you could save them in a separate collection or in a subcollection of your document. With this approach each day of the week would be its own document, so you could update each of them very easily and even add additional data. The downside of this approach is that you would need to call that collection of days separately from your event document, causing more reads on Firestore. A sketch of both variants follows below.
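Roughly (web/Admin SDK style; db is assumed to be an initialized Firestore instance, and the collection and field names are placeholders):

// Option 1: the days stored as an array field on the reminder document.
await db.collection("reminders").add({
  title: "Water the plants",              // placeholder fields
  days: ["Monday", "Saturday", "Sunday"],
  time: "09:00"
});

// Filtering is easy: find every reminder that fires on Mondays.
const mondayReminders = await db.collection("reminders")
  .where("days", "array-contains", "Monday")
  .get();

// Option 2: each day as its own document in a subcollection.
const reminderRef = await db.collection("reminders").add({ title: "Water the plants" });
await reminderRef.collection("days").doc("Monday").set({ enabled: true });
await reminderRef.collection("days").doc("Saturday").set({ enabled: true });
// Each day can now be updated independently, but reading the days
// requires an extra query against the subcollection.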
As you can see, it all depends on how you want your app to work, and that is the reason you could not find anything on the web. These kinds of questions are too imprecise to be answered the way you expect. There is no silver bullet for this kind of database structure, and even apps with the same purpose can have completely different structures and still work as expected.
I hope I could at least point you in a direction so you can take the next step.
I have a question about how to structure my Firestore data.
Daily, I have information like:
TodayDate, From, ToUser1, Subject, Attachment2, AttachmentTypeB
TodayDate, From, ToUser1, Subject, Attachment3, AttachmentTypeA
TodayDate, From, ToUser2, Subject, Attachment4, AttachmentTypeA
TodayDate, From, ToUser2, Subject, Attachment5, AttachmentTypeC
Subject and From are never the same.
I am hesitating between two structures, but I am open to considering other designs.
0/ Format: root collection / document / sub-collection / sub-collection fields
1/ users / userid / date / from,subject,etc
OR
2/ reports / date / userid / from,subject,etc
I believe solution 2 will be more cost-efficient in the long run, since for a single query I will have more records per date than records per user. For updates, the cost would be similar either way.
What are your advice, please?
Kind regards,
Julie
Given your current data structure, I suggest you simply use Cloud Firestore instead of the Realtime Database, as it scales better and you get quite good performance for very low cost.
You could start a collection, with each of the records containing your listed attributes: TodayDate, From, ToUser2, Subject, Attachment5, AttachmentTypeC. And it's easy to query using where:
firestore().collection("myCollection").where("subject", "==", subject).get()
See this comparison.
UPDATE: Regarding your two options, I don't think it comes down to which option fetches or updates more or fewer records. It comes down to your app's requirements and actual usage. You might need to fetch the records for a specific user and not just for a specific date, and vice versa. So both structures don't really make any difference in terms of cost, unless you're sure you'll never need to fetch records per user.
Hence, I think the main focus should be on how intuitive and flexible your structure is and how easy it is to maintain over time. You should consider not using sub-collections in the first place, as it appears (from your daily record data) that you could achieve what you need, and get a more flexible structure, with a simple collection containing documents with the necessary properties. I think sub-collections are generally needed when you don't want to always fetch all properties of a record, or when you want real-time listeners for specific properties rather than the entire record. Sub-collections don't really increase or reduce the number of records fetched; that depends on actual usage. A sketch of the flat-collection idea follows below.
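To illustrate (db is assumed to be an initialized Firestore instance; the collection name and field names simply mirror the attributes listed in the question):

// One document per daily record in a single "reports" collection.
await db.collection("reports").add({
  date: "2021-03-15",              // TodayDate (placeholder value)
  from: "sender@example.com",      // From
  toUser: "user1",                 // ToUser
  subject: "Daily report",         // Subject
  attachment: "attachment2",       // Attachment
  attachmentType: "B"              // AttachmentType
});

// The same structure serves both access patterns without sub-collections:
const byDate = await db.collection("reports")
  .where("date", "==", "2021-03-15").get();  // all records for one day
const byUser = await db.collection("reports")
  .where("toUser", "==", "user1").get();     // all records for one user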
I've been trying to figure out how to best model data for a complex feed in Cloud Firestore without returning unnecessary documents.
Here's the challenge --
Content is created for specific topics, for example: Architecture, Bridges, Dams, Roads, etc. The topic options can expand to include as many as needed at any time, so it is a growing and evolving list.
When the content is created it is also tagged to specific industries. For example, I may want to create a post in Architecture and I want it to be seen within the Construction, Steel, and Concrete industries.
Here is where the tricky part comes in. If I am a person interested in the Steel and Construction industries, I would like to have a feed that includes posts from both of those industries with the specific topics of Bridges and Dams. Since it's a feed the results will need to be in time order. How would I possibly create this feed?
I've considered these options:
Query for each individual topic selected, including tags for Steel and Construction, then aggregate and sort the results. The problem I have with this one is that it can return too many posts, which means I'm reading documents unnecessarily. If I select 5 topics within a specific time range, that's 5 queries, which is OK. However, each can have any number of results, which is problematic. I could add a limit, but then I run the risk of posts being omitted from topics even though they fall within the time range.
Create a post "index" table in Cloud SQL and perform queries on it to get the post IDs, then retrieve the Firestore documents as needed. Then the question is, why not just use Cloud SQL (MySQL) for everything... Well, it's a scaling, cost, and maintenance issue. The whole point of Firestore is not having to worry so much about DBAs, load, and scale.
I've not been able to come up with any other ideas and am hoping someone has dealt with such a challenge and can shed some light on the matter. Perhaps Firestore is just completely the wrong solution and I'm trying to fit a square peg into a round hole, but it seems like a workable solution can be found.
The ideal structure is to have a separate node for posts, and for each post store a reference to its parent categories, e.g. Steel and Construction. Also store a timestamp on each post so they can be ordered. If you think the database will be too massive for Firebase's queries, you can connect your Firebase database to Elasticsearch and do the search from there. A rough sketch of that structure and a per-topic query follows below.
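In Cloud Firestore terms (the field names, the per-topic fan-out, and the composite-index note are my assumptions, not part of the answer):

// Each post carries its topic, its industry tags, and a timestamp.
await db.collection("posts").add({
  topic: "Bridges",
  industries: ["Construction", "Steel", "Concrete"],
  title: "New cable-stayed design",   // placeholder content
  createdAt: new Date()
});

// One query per selected topic, filtered to the industries of interest and
// ordered by time; the per-topic result sets are then merged client-side.
// This combination of filters typically requires a composite index.
const bridgesFeed = await db.collection("posts")
  .where("topic", "==", "Bridges")
  .where("industries", "array-contains-any", ["Steel", "Construction"])
  .orderBy("createdAt", "desc")
  .limit(20)
  .get();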
I hope it helps.
Apologies for this long winded question but I just want to walk through my research into trying to solve this problem myself.
My first stackoverflow question... here goes...
My Meteor app was plugging along nicely until I wanted to have a standings table. This 30 years of basketball standings article was inspirational: I wanted to tap into the power and flexibility that Mongo provides to keep track of team and player standings, so I bought this book on Mongo (highly recommended). I have multiple collections, and one collection called 'games' keeps track of games and scores per game. Here is a sample document:
{
    "leagueId" : "6RtH74QbxGG7xbZXh",
    "regionId" : "KDbqfAoKDx2iDDXSS",
    "seasonId" : "b5HkGcWFNenpGGvCd",
    "gameTime" : "15:00",
    "gameDate" : "11/23/2014",
    "gameNumber" : 4,
    "gameStatus" : "played",
    "homeTeam" : "MYBw2RiNwrBhfh9W8",
    "homeTeamScore" : 4,
    "awayTeam" : "fwx79JJFob5XbaAx6",
    "awayTeamScore" : 2,
    "gameType" : "regular_season",
    "userId" : "4MKaZK84AdZ8j3xr2",
    "author" : "league",
    "submitted" : "Wed Dec 10 2014 09:51:48 GMT-0800 (PST)",
    "_id" : "Gwsu6X6DXXzavdqZQ"
}
I determined the aggregation framework was the way to go, but after digging into this I saw that Meteor had some client-side limitations with the Mongo aggregation framework (stackoverflow.com/questions/11266977/are-group-by-aggregation-queries-possible-in-meteor-yet).
I also thought about denormalizing my document structure after reading articles like this. And after reading the Denormalization chapter of Discover Meteor (good book!), it seems not denormalizing gives me a lot more flexibility, but I could denormalize in Meteor/Mongo for other benefits. The NBA article mentioned earlier is very denormalized, and I'm not sure if I can replicate that structure using Meteor/Mongo.
I then jumped into the aggregation rabbit hole and found some good reading (stackoverflow.com/questions/18520567/average-aggregation-queries-in-meteor/18884223#18884223 - I tried this using my data and I saw the output on the server, but when I added the subscription to waitOn in my Iron Router there were no errors and it just hung. So it worked on the server, but not the client?).
I then found this article, which looked very promising. That led me to the Atmosphere package meteorhacks:aggregate (and its dependencies), but the article seemed more directed at experts with Mongo, as I could not get my data client-side.
From all my reading I surmise that I should create a new collection in my 'collections' folder (available on client and server) and then publish that collection using my aggregation code ($group and $match and cool things like $sum). Then I'm really not sure what to do next. Do I create a server method and call that method from my client-side code (I tried that and couldn't get it to work)? Am I supposed to subscribe to the collection in Iron Router (every time I did, my app just hung with no errors)?
If anyone knows how to use aggregation with Meteor, I would really appreciate any guidance. I think I just need a few more pieces of information before I can generate that standings table with wins, losses and draws.
Note: I took a detour and thought MapReduce might be the solution (thebhwgroup.com/blog/2014/09/bi-nodejs-couchdb-mapreduce), but I couldn't get this to work in Meteor either.
You raise a number of topics and there are probably a lot of solutions to get where you want. One key point is that you cannot currently run mongo aggregations on the client side in Meteor. So any approach where you are aggregating game data into standings needs to first aggregate the game data server-side, and only then publish a cursor of the results to the client.
To break your problem down into more manageable pieces, I would first focus on assembling an aggregation on the server side that looks the way you want. You will need a package such as meteorhacks:aggregate, as you mentioned, or another that accomplishes the same purpose.
Then, for starters, you should simply run the aggregation pipeline server-side on Meteor.startup a few times until it looks the way you want. To get a quick look at it, you could save it to a variable and log it in the console (again, server-side). Alternatively, you could save it into a new Collection such as Standings and inspect it using meteor mongo from the console. Part of the issue is that you may be getting ahead of things trying to publish and subscribe to your collections with Iron Router while you are still thinking about how to set up your collections.
This gets to the next point. Once the aggregation document looks the way you want on the server, you'll need to decide whether you want to save the results each time you aggregate a new set of standings, or assemble them on the fly each time you publish them. In the Owens article you linked to, he is aggregating data directly inside of a publish function. This means it is re-run each time it is published or subscribed to, which could be very fast or very slow depending on the size and complexity of the data. It is also important to consider that under this model you are not actually publishing a collection of standings to the client - you are publishing a mutated collection of games. If you expect your standings to acquire their own properties, change over time, or you want to query them historically, you are best off creating a new collection such as Standings and inserting a new standing into it each time you receive new game results.
Your publish function could then be fairly simple, e.g., Standings.find({}, {sort: {submitted: -1}, limit: 1}) to return the most recent standing (assuming each standing stores a submitted timestamp). You can subscribe in Iron Router's waitOn as you suggested.
To insert new entries into the Standings collection, you could call a server-side method like Meteor.call("createStandings") each time a new game result comes in. This could result in a large collection over time, which you could cull or even overwrite if you don't care about maintaining historical records. A rough sketch of this flow follows below.
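The sketch uses meteorhacks:aggregate; it only tallies home-side results to keep it short, the field names come from your games schema, and the rest (the collection variables, the submitted field) are assumptions rather than a drop-in solution:

// Server-side code; assumes Games and Standings are Mongo.Collections
// defined in a shared /collections file.
Meteor.methods({
  createStandings: function () {
    // meteorhacks:aggregate adds .aggregate() to server-side collections.
    var table = Games.aggregate([
      { $match: { gameStatus: "played" } },
      { $group: {
          _id: "$homeTeam",
          gamesPlayed: { $sum: 1 },
          goalsFor: { $sum: "$homeTeamScore" },
          goalsAgainst: { $sum: "$awayTeamScore" }
      } }
    ]);
    // Store each aggregation run as a single standings document.
    Standings.insert({ table: table, submitted: new Date() });
  }
});

// Publish only the most recent standings document; the client
// subscribes to "standings" in Iron Router's waitOn.
Meteor.publish("standings", function () {
  return Standings.find({}, { sort: { submitted: -1 }, limit: 1 });
});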
Again, there are a lot of ways you could take this. Hope this helps.