Is there any way to get the last updated record of a key from Riak - riak

Suppose a key contains the following data:
{name:"abc"}
Then I override it with new data:
{name:"aaa",grade:"A"}
Is there any way in Riak to get the old data back, i.e. {name:"abc"}?

The short answer is that, without using siblings, there is no automated way in Riak to retrieve the previous value of a specific key.
The answer to this type of problem is to build a versioning system into your application, where you store N versions of your key/value pair based on your business needs; a sketch of one such scheme follows.
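For illustration, here is a minimal sketch of such a scheme, assuming the Node.js basho-riak-client; the bucket names and the "::" key convention are hypothetical:

// Sketch only: application-level versioning. The "users" and
// "users_versions" buckets and the "::" convention are assumptions.
var Riak = require('basho-riak-client');
var client = new Riak.Client(['127.0.0.1:8087']);

function storeVersioned(key, value, version, cb) {
  // Keep every version under its own key, e.g. "abc::2"
  client.storeValue({ bucket: 'users', key: key + '::' + version, value: value },
    function (err) {
      if (err) { return cb(err); }
      // Track the current version number under a known pointer key
      client.storeValue({ bucket: 'users_versions', key: key,
                          value: { current: version } }, cb);
    });
}

Reading the previous value is then an ordinary fetch of key + '::' + (current - 1).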

Related

Firebase Functions - Delete data without knowing the key

I am inserting data in the Firebase Realtime Database in a table with the above structure. The key of the data is auto-generated by push. After several such entries are created, I may sometimes, due to certain conditions, need to delete one of the entries. At the point of deleting the entry, I may know some of the values of the node that I want to delete, like createdAt and createdForPostID, but I will not know the key, as it was auto-generated using the push feature of the Firebase database. A combination of createdAt and createdForPostID is unique, and only one such entry should exist in the database.
What would be the most efficient way to identify the entry without having to retrieve the entire node?
The reason I am using push is that Firebase claims it to be efficient and not subject to write conflicts. I also rely on the auto-sorting by date/time offered by push.
If no efficient way can be found, then I will generate my own key using a date/time stamp. But I am hoping that this is a problem someone has solved before and can hence offer guidance on.
Any suggestions are welcome.
You'll need to run a query to find the items that match your conditions.
Since you seem to have multiple properties in your conditions, and the Firebase Database can only query on a single property, you'll need to combine the values into a single property as shown here.
Then you can run a query on that combined property and delete the items it returns:
// Query on the combined property written at creation time.
// `ref` is assumed to point at the node containing the entries.
var query = ref.orderByChild("createdForPostID-createdAt")
               .equalTo("20171229_124904-20171230_200343");
query.once("value", function(snapshot) {
  snapshot.forEach(function(child) {
    child.ref.remove(); // delete the matched entry
  });
});
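For context, a hedged sketch of how such a combined property might be written when the entry is created, so the query above has something to match on (the "entries" path and the example values are assumptions):

// Sketch: store the combined "createdForPostID-createdAt" property at
// creation time. The "entries" path and example values are assumptions.
var createdForPostID = "20171229_124904";
var createdAt = "20171230_200343";

firebase.database().ref("entries").push({
  createdAt: createdAt,
  createdForPostID: createdForPostID,
  "createdForPostID-createdAt": createdForPostID + "-" + createdAt
});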
Given Frank's answer, I realised I needed to create a unique property as per his suggestion, because I would need it for the future query. But then it seemed I might be better off using that unique property as the key instead of using push.
So, from an overall perspective, it might be more efficient to create your own key instead of using push if the app needs both create and delete functionality. Reliance on push makes sense only if data is being created and deletion is not a significant part of your app.
In conclusion, for Firebase data, the most efficient way to support both create and delete is to construct a unique key of your own, as in the sketch below.
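A minimal sketch of that own-key approach, assuming the Firebase web SDK; the "entries" path and example values are assumptions:

// Sketch: derive the key from the unique combination instead of push().
var createdForPostID = "20171229_124904";
var createdAt = "20171230_200343";
var key = createdForPostID + "-" + createdAt;

firebase.database().ref("entries/" + key).set({
  createdAt: createdAt,
  createdForPostID: createdForPostID
});

// Deleting later needs no query, because the key can be re-derived:
firebase.database().ref("entries/" + key).remove();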

Fetch new entities only

I thought Datastore keys were ordered by insertion date, but apparently I was wrong. I need to periodically look for new entities in the Datastore, fetch them, and process them.
Until now, I would simply store the last fetched key and, wrongly, query for anything greater than it.
Is there a way of doing so?
Thanks in advance.
Datastore's automatically generated keys are distributed uniformly in order to make storage and lookups more performant, so you cannot tell from the keys which entities were added last.
Instead, you can try a couple of different approaches:
1) Use Pub/Sub and architect your app so that a separate background task consumes newly added entities. On every entity write, publish an event containing the key id to Pub/Sub; your event listener (a separate routine) will receive it.
2) Use key names and generate your own custom names. But since you would be creating sequentially increasing names, this will cause a performance hit on even modest ranges of data. You can find more about this in the Google Datastore best practices:
https://cloud.google.com/datastore/docs/best-practices#keys
3) Add an additional creation-time property and keep using automatic key generation, as sketched below.
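As a sketch of the third approach, assuming the Node.js @google-cloud/datastore client and a hypothetical kind "Entry" with a "createdAt" property:

// Sketch: fetch entities newer than the last processed timestamp.
// The "Entry" kind and "createdAt" property are assumptions.
const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore();

async function fetchNewEntities(lastProcessedAt) {
  const query = datastore
    .createQuery('Entry')
    .filter('createdAt', '>', lastProcessedAt)
    .order('createdAt');
  const [entities] = await datastore.runQuery(query);
  return entities; // process these, then persist the newest createdAt seen
}

After processing a batch, store the largest createdAt you saw and pass it as lastProcessedAt on the next poll.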

DynamoDB: How to find unique hash keys from primary key if its hash-range schema?

I have a dynamodb table.
It has Primary partition key - IdType (String) and Primary sort key - Id (String)
As it's a hash-range schema, IdType is not unique and one value can appear multiple times. I need to find all the unique IdType values.
How do we find them? One possible solution is to fetch all IdType values using Scan and find the unique ones client-side in our own code. But Scan is expensive, and a single Scan request returns at most 1 MB of data, so it must be paginated; this makes it increasingly impractical, as the table already holds more than 1 MB of data and will gradually grow in future.
Is there any other way to do this? Any help would be appreciated.
PS: There are no indexes
The short answer would be NO. To query a DynamoDB table, the first thing you need is the hash key, which rules out the Query operation here, because you must already have a hash key to find the data.
As far as I know, DynamoDB has no built-in way to enumerate the unique values of a hash key.
If you want to achieve this, you can:
1) Scan the table, as you have mentioned, and filter at the application level (see the sketch after this list).
2) If your data is not updated frequently, store the result in a cache and retrieve the desired information from there.
3) Use another AWS service, CloudSearch, to achieve the desired result (you have to pay more).
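A minimal sketch of option 1, assuming the AWS SDK for JavaScript (v2) and a hypothetical table name; the ProjectionExpression keeps each page small by fetching only the partition key:

// Sketch: paginated Scan that fetches only IdType and de-duplicates
// client-side. "MyTable" is a hypothetical table name.
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function uniqueIdTypes() {
  const seen = new Set();
  let ExclusiveStartKey;
  do {
    const page = await docClient.scan({
      TableName: 'MyTable',
      ProjectionExpression: '#t',
      ExpressionAttributeNames: { '#t': 'IdType' },
      ExclusiveStartKey
    }).promise();
    page.Items.forEach(function (item) { seen.add(item.IdType); });
    ExclusiveStartKey = page.LastEvaluatedKey; // present while more pages remain
  } while (ExclusiveStartKey);
  return Array.from(seen);
}

This still reads the whole table, so it mitigates the 1 MB page limit rather than the cost of the Scan itself.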
If you are able to achieve this with another method, please do share it.
Hope that helps

Maximum records that can be stored in a Riak database

Can anyone give an example of the maximum record limit in a Riak database, with specific hardware details? Please help me in this case. I'm going to build a CDR information system. Will Riak be a suitable database to select?
Riak identifies the partition in which to store data by hashing each bucket/key pair into a 2^160 space using SHA-1. Data is then stored in the identified partition under the bucket and key name. The size of the hash space is therefore not related to the amount of data that can be stored, and two different objects that happen to hash to the same value will not overwrite each other.
When working with Riak, it is important to model your data correctly and to consider during the design process how it needs to be retrieved and queried. Ideally, you should ensure that the vast majority of your queries can be done through direct key access. It is often recommended to de-normalise your data and use natural keys. For CDRs this may mean creating one object holding all CDRs for a subscriber per day. These objects can be named based on the subscriber id and date, making it easy to retrieve data directly by key. It is also often more efficient to retrieve a few larger objects than many small ones and to perform filtering in the application, rather than trying to fetch only the exact data that is needed. I have described this approach in greater detail here.
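As a rough sketch of that naming scheme, assuming the Node.js basho-riak-client and a hypothetical "cdrs" bucket (the subscriber id and date format are illustrative):

// Sketch: one object per subscriber per day, fetchable directly by key.
// Bucket name, subscriber id and date format are assumptions.
var Riak = require('basho-riak-client');
var client = new Riak.Client(['127.0.0.1:8087']);

var key = '46701234567_20171230'; // subscriberId + '_' + YYYYMMDD
client.storeValue({ bucket: 'cdrs', key: key, value: { cdrs: [] } },
  function (err) {
    if (err) { return console.error(err); }
    // Retrieval is a direct key access; filter further in the application.
    client.fetchValue({ bucket: 'cdrs', key: key, convertToJs: true },
      function (err2, rsp) {
        if (!err2 && !rsp.isNotFound) { console.log(rsp.values.shift().value); }
      });
  });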
The limit to the number of records (or key/value pairs) you can store in Riak is governed only by the size of the hash space: 2^160. According to WolframAlpha, this is the number:
1461501637330902918203684832716283019655932542976
In other words, go nuts. :)

Get an object from a bucket in riak without knowing its key

I am using a riak bucket to store a list of messages, using a UUID as the key and a json message as value. This is working fine.
What I need is an efficient way to get a single message from the bucket without knowing its key, at least in one of these two scenarios:
Get the last inserted object (this is my preferred approach).
Get a random object from the bucket (if the first alternative is not possible).
Is there any efficient way to achieve that?
I think one alternative could be to retrieve the keys in the bucket and then get the first one. But this means making two calls to riak, one to obtain all the keys (just to discard all but one) and a second one to obtain the object. It does not seem very efficient.
As Riak is a key-value store, by far the most efficient way to retrieve data is through the keys. Listing or retrieving all keys in a bucket, even if you only end up using the one returned first, is one of the least efficient operations you can perform, as it causes Riak to scan ALL keys in the system (not just the bucket), and it is usually recommended NEVER to use it on a production system.
The most efficient way to get the last inserted object would probably be to store its id in a separate, known record in a different bucket. This would, however, require you to perform two writes on every insert and two reads for every read, but it would do so in the most efficient way. You could possibly implement a post-commit hook (it would have to be in Erlang, as it is currently not possible to write records using JavaScript functions) on the bucket containing messages to have the system perform the update for you, which would remove the need for the second write.
If you write a lot of data to the bucket containing messages, you may want to configure the separate bucket so that it does not allow multiple values (siblings) and so that the last value wins. This way you reduce the risk of having lots of siblings created due to frequent updates to this single record across the system. This would always give you one of the last written records, but not necessarily the very last one (especially if you frequently write messages to the database), as Riak does not support any type of atomicity and is an eventually consistent database.
You could also create one or more secondary indexes if you are using the LevelDB backend, and use them to limit your scan to only recent records, which would be more efficient than a scan of all keys. You could then select either the most recent key or a random one through MapReduce, but this would be much less efficient than the previously described approach.
I cannot think of any efficient way to retrieve a random record in a bucket from Riak, unless you know the range of keys you have inserted and can decide randomly on the client which one to get. One way to do this would be to generate all keys in sequence rather than using a UUID, but that is naturally not a good idea in a highly concurrent distributed system.
The first task is pretty easy to implement:
1) Add a post-commit hook that writes the last inserted key to some predefined key/bucket location.
2) Get the key from that predefined key/bucket and issue a get query using it.
It's still two operations, but both are plain gets, which are fast; the hook adds some overhead, but nothing too heavy either. A sketch of this read path follows.
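A rough sketch of those two gets, again assuming the Node.js basho-riak-client; the "meta" and "messages" bucket names and the shape of the pointer record written by the hook are assumptions:

// Sketch: two plain gets. The hook is assumed to maintain a pointer
// record at meta/last_message_key of the form { key: "<uuid>" }.
var Riak = require('basho-riak-client');
var client = new Riak.Client(['127.0.0.1:8087']);

client.fetchValue({ bucket: 'meta', key: 'last_message_key', convertToJs: true },
  function (err, rsp) {
    if (err || rsp.isNotFound) { return console.error(err || 'no pointer yet'); }
    var lastKey = rsp.values.shift().value.key; // key of the newest message
    client.fetchValue({ bucket: 'messages', key: lastKey, convertToJs: true },
      function (err2, rsp2) {
        if (err2) { return console.error(err2); }
        console.log(rsp2.values.shift().value); // the last inserted message
      });
  });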
The second scenario is also easy, but way too inefficient to be used in practice:
1) Get all keys (an extremely expensive operation).
2) Pick one at random.
3) Issue a get.
I have come up against the same scenario. In my case I have to save users, and for that I required an auto-incrementing id. So what I did is place the last inserted key in a separate bucket, as mentioned by "Christian Dahlqvist": every time I want to insert a new record, I fetch the last inserted key from that key bucket. There is only one value in that bucket, stored under the key "LastKey", which is always known to us. I then increment the fetched key and update the key bucket again, so the key bucket always contains the latest key.
