RocksDB iterate in lexicographic order

Does RocksDB guarantee retrieval of keys in lexicographic order when using the default comparator?
And if the data is divided into multiple SST files, does it still retrieve it in the correct order?
I tried a simple experiment with a small set of data and it worked as expected. However, I don't know whether it still works when I have terabytes of data and many SST files.

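Yes, it is guaranteed in all situations. A RocksDB iterator merges the memtables and every SST file on the fly into a single sorted stream, so the order does not depend on how many files the data is spread across. With the default comparator (BytewiseComparator), that order is lexicographic over unsigned bytes.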
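To see why the number of SST files does not matter, here is a toy model in Python. It is not RocksDB's code, and the function names are hypothetical. Each SST file (and the memtable) is treated as a run that is already sorted by key, and a scan is a k-way merge of those runs, so the output is globally sorted however many runs there are. Python `bytes` compare byte by byte as unsigned values, which matches the default bytewise ordering. Real RocksDB also tracks sequence numbers and deletion markers. This sketch only keeps the newest version of each key.

```python
import heapq
from typing import Iterator


def _tagged(run: list[tuple[bytes, bytes]], age: int) -> Iterator[tuple[bytes, int, bytes]]:
    """Attach the run's age to each entry so equal keys sort newest-first."""
    for key, value in run:
        yield key, age, value


def merged_scan(runs: list[list[tuple[bytes, bytes]]]) -> Iterator[tuple[bytes, bytes]]:
    """Yield (key, value) pairs in bytewise key order across every sorted run.

    runs[0] is the newest run (think memtable) and runs[-1] the oldest SST file.
    Each run must already be sorted by key. When a key appears in several runs,
    only the newest version is returned.
    """
    last_key = None
    merged = heapq.merge(*(_tagged(run, age) for age, run in enumerate(runs)))
    for key, _age, value in merged:
        if key == last_key:
            continue  # older version, shadowed by the one already yielded
        last_key = key
        yield key, value
```

For example, merging `[[(b"b", b"new")], [(b"a", b"1"), (b"b", b"old"), (b"c", b"3")], [(b"aa", b"x"), (b"d", b"4")]]` yields keys `a`, `aa`, `b`, `c`, `d` in that order, with `b` mapped to `b"new"`.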

Related

Read all data from a table of 10k rows in a single request

Could there be problems with reading all the data of a 10k-row table in a single request?
It would be a read only request.
I would like to do it because I want to perform some queries on the array, and from the documentation I can’t find a way to do it directly with Pact.
No, there shouldn't be. Read-only queries are "free" at the moment.
You can do it in two ways:
Do a select query whose condition always evaluates to true.
Get all the keys (i.e. the unique ids in the table) via (keys your-table-name), and then have a separate method that returns the data for a list of ids.
But do consider using select statements to filter your data during the query, as this could be easier than doing it yourself.
Pact will check arrays like any other property, but you should ask yourself whether you need to test all 10k records or just a representative sample of them (in most cases, the latter).
You should also consider:
Do you need an exact match? (If so, the consumer and provider must have exactly the same data, which is not recommended.)
Can you use matchers to check the shape of the items in the array?
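The matcher idea can be illustrated in plain Python. This is not Pact's API; pact-python ships its own matchers, such as `EachLike`. The hypothetical helper below checks that every element of an array has the same keys and value types as one example record. It does not compare all 10k records value by value.

```python
from typing import Any


def matches_shape(template: Any, actual: Any) -> bool:
    """Return True if `actual` has the same structure and value types as `template`.

    Dicts must contain every key in the template. A template list must hold one
    example element; every element of `actual` is checked against it, and an
    empty array does not match (similar in spirit to an "each like" matcher).
    """
    if isinstance(template, dict):
        return isinstance(actual, dict) and all(
            key in actual and matches_shape(value, actual[key])
            for key, value in template.items()
        )
    if isinstance(template, list):
        return (
            isinstance(actual, list)
            and len(actual) >= 1
            and all(matches_shape(template[0], item) for item in actual)
        )
    # Leaf values: compare types only, never the concrete values.
    return type(actual) is type(template)
```

With this approach the consumer and provider only have to agree on the shape of a record. They do not need identical data sets.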

DynamoDB top item per partition

We are new to DynamoDB and struggling with what seems like it would be a simple task.
It is not actually related to stocks (it's about recording machine results over time) but the stock example is the simplest I can think of that illustrates the goal and problems we're facing.
The two query scenarios are:
All historical values of given stock symbol <= We think we have this figured out
The latest value of all stock symbols <= We do not have a good solution here!
Assume that updates are not synchronized, e.g. the moment of the last update record for TSLA may be different from that for AMZN.
The 3 attributes are just { Symbol, Moment, Value }. We could make Symbol the hash_key and Moment the range_key, and we believe we could achieve the first query easily and efficiently.
We also assume we could get the latest value for a single, specified Symbol following https://stackoverflow.com/a/12008398
The SQL solution for getting the latest value for each Symbol would look a lot like https://stackoverflow.com/a/6841644
But... we can't come up with anything efficient for DynamoDB.
Is it possible to do this without either retrieving everything or making multiple round trips?
The best idea we have so far is to somehow use update triggers or streams to track the latest record per Symbol and essentially keep that cached. That could be in a separate table or the same table with extra info like a column IsLatestForMachineKey (effectively a bool). With every insert, you'd grab the one where IsLatestForMachineKey=1, compare the Moment and if the insertion is newer, set the new one to 1 and the older one to 0.
This is starting to feel complicated enough that I question whether we're taking the right approach at all, or maybe DynamoDB itself is a bad fit for this, even though the use case seems so simple and common.
There is a way that is fairly straightforward, in my opinion.
Rather than using a GSI, just use two tables with (almost) the exact same schema. The hash key of both should be symbol. They should both have moment and value. Pick one of the tables to be stocks-current and the other to be stocks-historical. stocks-current has no range key. stocks-historical uses moment as a range key.
Whenever you write an item, write it to both tables. If you need strong consistency between the two tables, use the TransactWriteItems api.
If your data might arrive out of order, you can add a ConditionExpression to prevent newer data in stocks-current from being overwritten by older, out-of-order data.
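The following is a minimal write-path sketch for boto3's low-level DynamoDB client. It assumes lowercase attribute names `symbol`, `moment` (epoch milliseconds, stored as a number) and `value`, plus the table names above; the helper names are hypothetical. There is one caveat: TransactWriteItems is all-or-nothing. If the out-of-order condition on stocks-current sits inside the same transaction, a late record would be rejected from stocks-historical as well. This sketch therefore writes the historical row unconditionally. It then does a separate conditional put on stocks-current and treats a failed condition as "a newer value is already stored".

```python
from typing import Any


def historical_put(symbol: str, moment: int, value: str) -> dict[str, Any]:
    """PutItem kwargs for stocks-historical (hash key symbol, range key moment)."""
    return {
        "TableName": "stocks-historical",
        "Item": {
            "symbol": {"S": symbol},
            "moment": {"N": str(moment)},
            "value": {"N": value},  # numeric string, e.g. "251.30"
        },
    }


def current_put(symbol: str, moment: int, value: str) -> dict[str, Any]:
    """PutItem kwargs for stocks-current that refuse to overwrite a newer moment."""
    request = historical_put(symbol, moment, value)
    request["TableName"] = "stocks-current"
    # Succeed only if the symbol is new or the stored moment is older than ours.
    # The #s/#m placeholders sidestep any clash with DynamoDB reserved words.
    request["ConditionExpression"] = "attribute_not_exists(#s) OR #m < :m"
    request["ExpressionAttributeNames"] = {"#s": "symbol", "#m": "moment"}
    request["ExpressionAttributeValues"] = {":m": {"N": str(moment)}}
    return request


def record(client: Any, symbol: str, moment: int, value: str) -> bool:
    """Write one observation; return True if stocks-current was updated.

    `client` is expected to be boto3.client("dynamodb").
    """
    client.put_item(**historical_put(symbol, moment, value))
    try:
        client.put_item(**current_put(symbol, moment, value))
        return True
    except client.exceptions.ConditionalCheckFailedException:
        return False  # a newer value for this symbol is already in stocks-current
```

If strict consistency between the two tables matters more than accepting late data, the two puts can instead go into one TransactWriteItems call. Out-of-order records would then be rejected from both tables.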
The read operations are pretty straightforward, but I’ll state them anyway. To get the latest value for everything, scan the stocks-current table. To get historical data for a stock, query the stocks-historical table with no range key condition.
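The two reads can be sketched with boto3's built-in paginators, under the same assumed attribute names as the write path; the function names are hypothetical. Both functions follow pagination, because a single Scan or Query call returns at most 1 MB of data.

```python
from typing import Any


def latest_for_all(client: Any) -> dict[str, dict[str, Any]]:
    """Scan stocks-current; it holds one item per symbol, i.e. each symbol's latest value."""
    latest: dict[str, dict[str, Any]] = {}
    for page in client.get_paginator("scan").paginate(TableName="stocks-current"):
        for item in page["Items"]:
            latest[item["symbol"]["S"]] = item
    return latest


def history(client: Any, symbol: str) -> list[dict[str, Any]]:
    """Query stocks-historical for one symbol; items come back ordered by moment."""
    items: list[dict[str, Any]] = []
    pages = client.get_paginator("query").paginate(
        TableName="stocks-historical",
        KeyConditionExpression="#s = :s",  # hash key only, no range key condition
        ExpressionAttributeNames={"#s": "symbol"},
        ExpressionAttributeValues={":s": {"S": symbol}},
    )
    for page in pages:
        items.extend(page["Items"])
    return items
```

Here `client` would be `boto3.client("dynamodb")`.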

Why is the SQLite checksum not the same after reversing edits?

Obviously, editing any column value will change the checksum.
But saving the original value back does not return the file to its original checksum.
I ran VACUUM before and after, so it isn't due to buffer size.
I don't have any indexes referencing the column and rows are not added or removed so pk index shouldn't need to change either.
I tried turning off the rollback journal, but that is a separate file so I'm not surprised it had no effect.
I'm not aware of an internal log or modified dates to explain why the same content does not produce the same file bytes.
I'm looking for insight into what is happening inside the file that explains this, and whether there is a way to make it behave (I don't see a relevant PRAGMA).
Granted, https://sqlite.org/dbhash.html exists to work around this problem, but I don't see any of the conditions it lists being triggered, and "... and so forth" is a pretty vague cause.
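Database files contain (the equivalent of) a timestamp of the last modification so that other processes can detect that the data has changed: the header holds a file change counter (4 bytes at offset 24) that is incremented when a write transaction commits in the default rollback-journal mode.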
There are many other things that can change in a database file (e.g., the order of pages, the B-tree structure, random data in unused parts) without a difference in the data as seen at the SQL level.
If you want to compare databases at the SQL level, you have to compare a canonical SQL representation of that data, such as the .dump output, or use a specialized tool such as dbhash.
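The small experiment below uses Python's built-in sqlite3 module and shows both points; the table, column and function names are made up for illustration. After an edit is reverted, the hash of the raw file differs, partly because the header's change counter has moved. A hash of the `iterdump()` output stays the same. `iterdump()` is the Python counterpart of the shell's .dump.

```python
import hashlib
import sqlite3
import struct


def header_change_counter(path: str) -> int:
    """Return the file change counter: 4 big-endian bytes at offset 24 of the header."""
    with open(path, "rb") as f:
        header = f.read(100)
    return struct.unpack(">I", header[24:28])[0]


def file_sha256(path: str) -> str:
    """Hash the raw bytes of the database file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def logical_sha256(conn: sqlite3.Connection) -> str:
    """Hash the SQL text produced by iterdump(), a canonical SQL-level view of the data."""
    digest = hashlib.sha256()
    for statement in conn.iterdump():
        digest.update(statement.encode("utf-8") + b"\n")
    return digest.hexdigest()


def edit_and_revert(path: str) -> tuple[bool, bool, int]:
    """Change a value, put it back; report (same file bytes?, same dump?, counter delta)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=DELETE")  # default rollback-journal mode
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO t (id, name) VALUES (1, 'original')")
    conn.commit()
    file_before, dump_before = file_sha256(path), logical_sha256(conn)
    counter_before = header_change_counter(path)

    # Two separate write transactions: edit, then restore the original value.
    conn.execute("UPDATE t SET name = 'edited' WHERE id = 1")
    conn.commit()
    conn.execute("UPDATE t SET name = 'original' WHERE id = 1")
    conn.commit()

    result = (
        file_sha256(path) == file_before,
        logical_sha256(conn) == dump_before,
        header_change_counter(path) - counter_before,
    )
    conn.close()
    return result
```

Running `edit_and_revert` on a fresh file should report different file bytes, an identical dump, and a positive counter delta.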

How to Combine multiple files in BizTalk?

I have multiple flat files (CSV), each with multiple records, and the files are received in random order. I have to combine their records using unique ID fields.
How can I combine them, if there is no common unique field for all files, and I don't know which one will be received first?
Here are some example files:
In reality there are 16 files.
There are many more fields and records than in this example.
I would avoid trying to do this purely in XSLT/BizTalk orchestrations/C# code. These are fairly simple flat files. Load them into SQL, and create a view to join your data up.
You can still use BizTalk to pickup/load the files. You can also still use BizTalk to execute the view or procedure that joins the data up and sends your final message.
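The "stage in SQL, join with a view" idea can be sketched in Python, using an in-memory SQLite database as a stand-in for the real staging database. The three file layouts are hypothetical, and no single ID column appears in all of them. Each file goes into its own staging table in whatever order it arrives. The view chains the joins (trade_id, then account_id), and its inner joins only produce a record once every part has arrived.

```python
import csv
import io
import sqlite3

# Hypothetical sample files: no single ID column appears in all three.
TRADES = "trade_id,amount\nT1,100\nT2,250\n"
ALLOCATIONS = "trade_id,account_id\nT1,A9\nT2,A7\n"
ACCOUNTS = "account_id,owner\nA9,Alice\n"

VIEW_SQL = """
CREATE VIEW combined AS
SELECT t.trade_id, t.amount, a.account_id, c.owner
FROM trades AS t
JOIN allocations AS a ON a.trade_id = t.trade_id
JOIN accounts AS c ON c.account_id = a.account_id
"""


def load_csv(conn: sqlite3.Connection, table: str, text: str) -> None:
    """Stage one CSV file in its own table (names come from trusted file headers)."""
    header, *rows = list(csv.reader(io.StringIO(text)))
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", rows)


def combine(arrivals: list[tuple[str, str]]) -> list[tuple]:
    """Load files in whatever order they arrived, then join them through the view."""
    conn = sqlite3.connect(":memory:")
    for table, text in arrivals:
        load_csv(conn, table, text)
    conn.execute(VIEW_SQL)
    rows = conn.execute("SELECT * FROM combined ORDER BY trade_id").fetchall()
    conn.close()
    return rows
```

Whatever the arrival order, `combine` returns only `('T1', '100', 'A9', 'Alice')`. T2 stays out until a file containing account A7 arrives, which is where the "what triggers the join" questions below come in.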
There are a few questions that might help guide how this would work here:
When do you want to join the data together? What triggers that (a time of day, a certain number of messages received, a certain type of message, a particular record, etc)? How will BizTalk know when it's received enough/the right data to join?
What does a canonical version of this data look like? Does all of the data from all of these files truly get correlated into one entity (e.g. a "Trade" or a "Transfer" etc.)?
I'd probably start with defining my canonical entity, and then look towards the path of getting a "complete" picture of that canonical entity by using SQL for this kind of case.

Multi-row atomicity-consistency with Riak?

Let me get to my example:
For the ID => value pairs 0=>87, 1=>24, 2=>82, 3=>123, 4=>34, 5=>61,
increment all values for keys 1 through 4 by 10.
For a multi-row operation like this, does Riak offer atomicity, i.e. does the operation either fail or succeed as a whole, without partially dirtying the data?
Do queries that aggregate over the rows while they are being updated see consistent results?
I found nothing that deals with this question explicitly, but I guess setting the "tunable CAP" controls to favor consistency and partition tolerance is the key.
No.
Riak has no concept of atomicity overall (it's an eventually consistent system), and also does not have any concept of a "transaction" where multiple K/V pairs can be modified or read as a set.
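Here is a toy Python model of the example update. It is not a Riak client, and the store class and function names are hypothetical. It shows what the lack of multi-key atomicity means in practice. Each key is read and written independently, so a failure partway through leaves some keys incremented and others not, and nothing rolls the earlier writes back.

```python
from typing import Optional


class FlakyStore:
    """Toy K/V store where every put is independent, like separate Riak writes."""

    def __init__(self, data: dict[int, int], fail_on_key: Optional[int] = None) -> None:
        self.data = dict(data)
        self.fail_on_key = fail_on_key  # simulate a node/network error on this key

    def get(self, key: int) -> int:
        return self.data[key]

    def put(self, key: int, value: int) -> None:
        if key == self.fail_on_key:
            raise ConnectionError(f"simulated failure writing key {key}")
        self.data[key] = value


def increment_range(store: FlakyStore, first: int, last: int, delta: int) -> None:
    """Read-modify-write each key on its own; earlier writes are never rolled back."""
    for key in range(first, last + 1):
        store.put(key, store.get(key) + delta)
```

With the question's data and a simulated failure on key 3, `increment_range(store, 1, 4, 10)` raises after keys 1 and 2 have become 34 and 92. Keys 3 and 4 keep 123 and 34, and a reader at that moment sees the half-applied update.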
