We have a process that needs to .get() a large number (~50-200) of keys, multiple times, in one run. We can't use ndb.get_multi() to fetch them all in one go at the point of use, unfortunately, but we do know in advance which keys we are likely to want. So we are wondering: would it be smart to get_multi() these keys at the beginning of our process? Would that put the entities in memcache for faster lookup later?
Yes, they are added to memcache. You can verify this with Appstats.
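For illustration, a minimal sketch of the warm-up pattern with the Python NDB API; the Message model and the way keys are built here are assumptions, not something from your code:

```python
from google.appengine.ext import ndb

# Hypothetical model standing in for whatever kind you actually fetch.
class Message(ndb.Model):
    body = ndb.StringProperty()

def process(key_ids):
    keys = [ndb.Key(Message, kid) for kid in key_ids]
    # One batched datastore RPC; NDB also populates its in-context
    # cache and memcache as a side effect.
    ndb.get_multi(keys)
    # Later individual .get() calls are served from the cache,
    # not the datastore.
    for key in keys:
        entity = key.get()
        # ... per-entity work happens here ...
```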
I have a collection with thousands of documents, all of which have a synthetic partition key property like:
partitionKey: 'some-document-related-value'
Now I need to change the values of partitionKey. Of course, doing so requires recreating the documents, but I am wondering: what is the most efficient/straightforward way to do it?
Should I use an Azure Function with a CosmosDBTrigger (set to start the feed from the beginning)?
A change feed processor?
Some other way?
I'm looking for the quickest solution that's still reliable.
Yes, the change feed is a common way to migrate data from one container to another. Another simple option may be the Data Migration Tool, where you build your new partition key in the SELECT statement.
Hopefully this is helpful.
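For illustration, a minimal sketch of the change-feed approach using the azure-cosmos Python SDK (v4). The endpoint, key, container names, and compute_new_key() are all placeholders, not something from the thread:

```python
from azure.cosmos import CosmosClient

client = CosmosClient("<endpoint>", credential="<key>")
db = client.get_database_client("mydb")
source = db.get_container_client("source")
target = db.get_container_client("target")

def compute_new_key(doc):
    # Placeholder: your real partition key derivation goes here.
    return "new-" + doc["id"]

# Read the change feed from the very beginning and rewrite each
# document into the new container with the new partition key.
for doc in source.query_items_change_feed(is_start_from_beginning=True):
    doc = {k: v for k, v in doc.items() if not k.startswith("_")}  # drop system properties
    doc["partitionKey"] = compute_new_key(doc)
    target.upsert_item(doc)
```

Note that this simple loop does no checkpointing; if you need resumability and scale-out, that is exactly what the change feed processor or a CosmosDBTrigger-based Azure Function gives you for free.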
I am looking at Cosmos DB's partitioning facility, and what I have gathered so far is that it is good for performance. It can really help us avoid fan-out queries, but I am stuck on one question about partitioning. For writes, if I have documents of different types, potentially thousands of them, all belonging to the same partition, the write operations will be slow; but if I give them different partition keys, I lose the transactional behaviour, because stored procedures are scoped to a single partition's transaction.
My use case: I have different types of documents within the same collection, and at any given time I will be updating and inserting thousands of documents of different types. I have to do that within the same transaction, which means I have to use the same partition key; but if I do that, I will be doing hot writes, which is discouraged in Cosmos DB. Any help on how to resolve this will be appreciated.
People use stored procedures to batch their documents, and today that does constrain you to one partition. However, be aware of another consideration: your partition key should be chosen so that your documents fan out across different partitions. So one batch can be for one partition key and the next batch for another, as sketched below.
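A hedged sketch of that batching idea in Python: group the documents by partition key and make one stored-procedure call per group, since a stored procedure executes transactionally within a single partition. The "bulkImport" procedure name and the connection details are assumptions:

```python
from collections import defaultdict
from azure.cosmos import CosmosClient

client = CosmosClient("<endpoint>", credential="<key>")
container = client.get_database_client("mydb").get_container_client("docs")

def write_in_batches(docs):
    # Group documents by partition key: each group becomes one
    # stored-procedure call, and each call is one transaction.
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["partitionKey"]].append(doc)
    for pk, batch in groups.items():
        # "bulkImport" is a hypothetical stored procedure already
        # registered on the container; it runs transactionally within
        # the single partition identified by partition_key.
        container.scripts.execute_stored_procedure(
            "bulkImport", partition_key=pk, params=[batch]
        )
```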
Read more here: https://learn.microsoft.com/en-us/azure/cosmos-db/partition-data
Hope this helps.
Rafat
It's tricky. I do have a large set of docs within a single partition at the moment; maybe later on I will need to redesign the collection. Right now I am using a bulk insert/update library for Cosmos DB: https://learn.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview It is way faster for large data inserts/updates, and it's a Microsoft-backed library. It does support transactional behaviour, but only within a single partition, so at the moment I am safe.
I am using a Riak bucket to store a list of messages, using a UUID as the key and a JSON message as the value. This is working fine.
What I need is an efficient way to get a single message from the bucket without knowing its key, at least in one of these two scenarios:
Get the last inserted object (this is my preferred approach).
Get a random object from the bucket (if the first alternative is not possible).
Is there any efficient way to achieve that?
I think one alternative could be to retrieve all the keys in the bucket and then fetch the first one. But this means making two calls to Riak: one to obtain all the keys (just to discard all but one) and a second to obtain the object. It does not seem very efficient.
As Riak is a key-value store, by far the most efficient way to retrieve data is through the keys. Listing or retrieving all keys in a bucket, even if you only end up using the one returned first, is one of the least efficient operations you can perform, as it causes Riak to scan ALL keys in the system (not just the bucket), and it is usually recommended NEVER to use this on a production system.
The most efficient way to get the last inserted object would probably be to store the id in a separate, known record in a different bucket. This would however require you to perform two writes on every insert and two reads for every read, but would do so in the most efficient way. You could possibly implement a post-commit hook (it would have to be in Erlang, as it is currently not possible to write records using JavaScript functions) on the bucket containing messages to get the system to perform the update for you, which would remove the need for the last write.
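For illustration, a minimal sketch of the client-side variant of this two-write/two-read pattern using the official Python Riak client; the bucket names and the "last_message" pointer key are assumptions:

```python
import uuid
import riak

client = riak.RiakClient()
messages = client.bucket("messages")
pointers = client.bucket("pointers")

def insert_message(payload):
    key = str(uuid.uuid4())
    messages.new(key, data=payload).store()         # write 1: the message itself
    pointers.new("last_message", data=key).store()  # write 2: the known pointer record

def get_last_message():
    key = pointers.get("last_message").data  # read 1: the pointer
    return messages.get(key).data            # read 2: the message
```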
If you write a lot of data to the bucket containing messages, you may want to adjust the separate bucket so that it does not allow multiple values and that the last value wins. This way you would reduce the risk of having lots of siblings created due to frequent updates to this single record across the system. This would always give you one of the last written records, but not necessarily the last one (especially if you frequently write messages to the database), as Riak does not support any type of atomicity and is an eventually consistent database.
You could also create one or more secondary indexes if you are using the LevelDB backend, and use these to limit your scan to only recent records, which would be more efficient than a scan of all keys. You could then select either the most recent key or a random one through MapReduce, but this would be much less efficient than the previously described approach.
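A hedged sketch of that secondary-index variant, again with the Python client; the "created_int" index name and the timestamp source are assumptions, and this requires the LevelDB backend:

```python
import time
import uuid
import riak

client = riak.RiakClient()
messages = client.bucket("messages")

def insert_message(payload):
    obj = messages.new(str(uuid.uuid4()), data=payload)
    obj.add_index("created_int", int(time.time()))  # integer secondary index
    obj.store()

def recent_keys(window_seconds=60):
    now = int(time.time())
    # Range query over the index, limiting the scan to recent records only.
    return list(messages.get_index("created_int", now - window_seconds, now))
```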
I cannot think of any efficient way to retrieve a random record in a bucket from Riak unless you know the range of keys you have inserted and can decide randomly on the client which one to get. One way to do this would be to generate all keys in sequence rather than using a UUID, but that is naturally not a good idea in a highly concurrent distributed system.
The 1st task is pretty easy to implement:
Add a post-commit hook that writes the last inserted key to some predefined key/bucket location
Get the key from that predefined key/bucket and issue a get query using it
It's still two operations, but both are plain gets, which are fast. There is additional overhead from the hook, but nothing too heavy either.
The 2nd scenario is also easy, but it is way too inefficient to be used in practice:
Get all keys (an extremely expensive operation)
Pick one at random
Issue a get
I came up against the same scenario. In my case I have to save users, and for that I needed an auto-incrementing ID. So, as "Christian Dahlqvist" mentioned, I placed the last inserted key in a separate bucket; every time I want to insert a new record, I fetch the last inserted key from that key bucket. That bucket holds only one value, under the key "LastKey", which is always known to us. I increment the fetched key, use it for the new record, and update the key bucket again, so the key bucket always contains the latest key.
I need to store 2 values (counters) for my ASP.NET web app. The counters only ever grow; they should never return to 0. One option would be to save them in the DB, but what other options do I have, given that storing them in a table seems disproportionate? Session is not an option because the counters have to survive app restarts.
Thanks :-)
Storing the counts in a DB table sounds perfectly appropriate here.
The other options which come to mind are using a file, which would not be very reliable ACID-wise, or memory, using something like memcache, which would not survive a system reboot.
Don't worry about using a DB table. It will hardly take up any storage space or incur any significant overhead unless they are being written to very frequently. If that's the case, please add more info and we may be able to propose other solutions.
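For illustration, a minimal sketch of the counter-table approach, shown here with Python's built-in sqlite3 since the thread does not specify a database; the file, table, and column names are made up:

```python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
)

def increment(name):
    # One transaction: create the row if missing, then bump it atomically.
    with conn:
        conn.execute("INSERT OR IGNORE INTO counters VALUES (?, 0)", (name,))
        conn.execute("UPDATE counters SET value = value + 1 WHERE name = ?", (name,))

def read(name):
    row = conn.execute("SELECT value FROM counters WHERE name = ?", (name,)).fetchone()
    return row[0] if row else 0
```

The relative UPDATE (value = value + 1) rather than a read-modify-write in application code is what keeps concurrent increments from losing counts.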
If these counters are rarely updated, and can be machine dependent (you're not in a cluster) then I'd use something simple, like writing their values to a Settings file. Keep in mind you'll have to cope with multi-threading.
If there's a lot of access to the counters, store them in a database.
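If you do go the file route, here is a sketch of that idea with a lock to cope with the multi-threading concern mentioned above, shown in Python for illustration; the JSON file name is an assumption:

```python
import json
import threading

_lock = threading.Lock()   # guards against concurrent writers in this process
_path = "counters.json"

def increment(name):
    with _lock:
        try:
            with open(_path) as f:
                counters = json.load(f)
        except FileNotFoundError:
            counters = {}
        counters[name] = counters.get(name, 0) + 1
        with open(_path, "w") as f:
            json.dump(counters, f)
```

This is exactly the ACID weakness raised earlier: a crash mid-write can corrupt or lose the file, and the lock does not help across multiple processes or machines.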
Save them in a file. Fastest way.
I need to manage the acquisition of many records per hour, about 1,000,000 records, and every second I need to get the last inserted value for every primary key. It works quite well with sharding. I was thinking of trying capped collections to keep only the last record for every primary key. To do this I make two separate inserts. Is there a way, in MongoDB, to set up some kind of trigger that propagates an insert in one collection to another collection?
MongoDB does not have any support for triggers or similar behavior.
The only way to do this is to make it happen in your code. So the code that writes the first entry should also write the second.
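For illustration, a minimal sketch of that dual-write pattern with pymongo; the database, collection, and field names are assumptions. Note that "last_values" here is a regular (non-capped) collection keyed by the primary key, since capped collections do not allow replacements that change document size:

```python
from pymongo import MongoClient

db = MongoClient()["mydb"]

def record(doc):
    # Write 1: append to the main collection (pass a copy, since
    # insert_one adds an _id to the dict it is given).
    db.events.insert_one(dict(doc))
    # Write 2: keep only the newest document per primary key, so the
    # "latest value" read is a single indexed lookup by _id.
    latest = dict(doc)
    latest["_id"] = doc["pk"]
    db.last_values.replace_one({"_id": doc["pk"]}, latest, upsert=True)
```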
People have definitely requested triggers. If they are necessary for your solution, please cast a vote on the feature request.
I disagree that triggers are needed. MongoDB was created to be very fast and to provide only basic functionality; that is the power of this solution.
I think the best thing here is to implement the trigger logic inside your application, as part of the data access layer.