Is there an efficient way to process data expiration / massive deletion (to free space) with Riak on leveldb? - riak

On Riak:
Is there a way to expire data or to dump old data to free some space?
Is it efficient?
Edit: Thanks to Joe for providing the answer and a workaround (see the answer below).
Data expiration should be planned from the very beginning, as it requires an additional index and a map-reduce job.

Short answer: No, there is no publisher-provided expiry.
Longer answer: Include the write time, as an integer such as a Unix epoch timestamp, in a secondary index entry on each value that you want to be subject to expiry. Then run a periodic job during off-peak times that does a ranged 2i query to get all entries from 0 to (now - TTL). The result can be used as the input to a map/reduce job that does the actual deletes.
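As a rough sketch of the range computation, here is how the index header and the query path could be built in Python. It assumes Riak's HTTP interface for secondary indexes (an `x-riak-index-<name>_int` header on write, and a `/buckets/<bucket>/index/<name>_int/<start>/<end>` range query). The bucket name `events` and index name `written_at` are made up for illustration.

```python
import time


def write_time_index_header(index, now=None):
    """Header to send with each PUT so the value carries its write time.

    The '_int' suffix marks an integer secondary index.
    """
    now = int(time.time()) if now is None else int(now)
    return {f"x-riak-index-{index}_int": str(now)}


def expiry_query_path(bucket, index, ttl_seconds, now=None):
    """Path of the ranged 2i query covering 0 .. (now - TTL)."""
    now = int(time.time()) if now is None else int(now)
    cutoff = now - int(ttl_seconds)
    return f"/buckets/{bucket}/index/{index}_int/0/{cutoff}"


if __name__ == "__main__":
    # Hypothetical bucket and index names, one hour TTL.
    print(write_time_index_header("written_at"))
    print(expiry_query_path("events", "written_at", ttl_seconds=3600))
```

The keys returned by that query are what you would feed to the delete job.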
As to recovering disk space, leveldb is very slow about that. When a value is written to leveldb it starts in level 0, then as each level fills, compaction moves values to the next level, so your least recently written data resides on disk in the lowest levels. When you delete a value, a tombstone is written to level 0, which masks the previous value in the lower level, and as normal compaction occurs the tombstone is moved down as any other value would be. The disk space consumed by the old value is not reclaimed until the tombstone reaches the same level.

I have written a little C++ tool that uses the leveldb internal function CompactRange to perform this task. Here you can read the article about this.
With this we were able to delete an entire bucket (key by key) and wipe all tombstones. 50 GB of 75 GB were freed!
Unfortunately, this only works if leveldb is used as backend.

Related

What could cause a sqlite application to slow down over time with high load?

I'll definitely need to update this based on feedback so I apologize in advance.
The problem I'm trying to solve is roughly this.
The graph shows Disk utilization in the Windows task manager. My sqlite application is a webserver that takes in json requests with timestamps, looks up the existing entry in a 2 column key/value table, merges the request into the existing item (they don't grow over time), and then writes it back to the database.
The db is created as follows. I've experimented with and without WAL and saw no difference.
createStatement().use { it.executeUpdate("CREATE TABLE IF NOT EXISTS items ( key TEXT NOT NULL PRIMARY KEY, value BLOB );") }
The write/set is done as follows
try {
    val insertStatement = "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)"
    prepareStatement(insertStatement).use {
        it.setBytes(1, keySerializer.serialize(key))
        it.setBytes(2, valueSerializer.serialize(value))
        it.executeUpdate()
    }
    commit()
} catch (t: Throwable) {
    rollback()
    throw t
}
I use a single database connection the entire time which seems to be ok for my use case and greatly improves performance relative to getting a new one for each operation.
val databaseUrl = "jdbc:sqlite:${System.getProperty("java.io.tmpdir")}/$name-map-v2.sqlite"
if (connection?.isClosed == true || connection == null) {
    connection = DriverManager.getConnection(databaseUrl)
}
I'm effectively serializing access to the db. I'm pretty sure the default threading mode for the sqlite driver is to serialize and I'm also doing some serializing in kotlin coroutines (via actors).
I'm load testing the application locally and I notice that disk utilization spikes around the one-minute mark, but I can't determine why. I do know that throughput plummets when that happens. I expect the server to chug along at a more or less constant rate. The db in these tests is pretty small too; it hardly reaches 1 MB.
Hoping people can recommend some next steps or set me straight as far as performance expectations. I'm assuming there is some sqlite specific thing that happens when throughput is very high for too long, but I would have thought it would be related to WAL or something (which I'm not using).
I have a theory but it's a bit farfetched.
The fact that you hit a performance wall after some time makes me think that either a buffer somewhere is filling up, or some other kind of data accumulation threshold is being reached.
Where exactly the culprit is, I'm not sure.
So, I'd run the following tests.
// At the beginning
connection.setAutoCommit(true);
If the problem is in the driver side of the rollback transaction buffer, then this will slightly (hopefully) slow down operations, "spreading" the impact away from the one-minute mark. Instead of getting fast operations for 59 seconds and then some seconds of full stop, you get not so fast operations the whole time.
In case the problem is further down the line, try
PRAGMA JOURNAL_MODE=MEMORY keeps the rollback journal in memory instead of on disk
PRAGMA SYNCHRONOUS=OFF stops sqlite from waiting for writes to be flushed to disk
(The data will be more at risk in case of a catastrophic powerdown).
Finally, another possibility is that the page translation buffer gets filled after a sufficient number of different keys has been entered. You can test this directly by doing these two tests:
1) pre-fill the database with all the keys in ascending order and a large request, then start updating the same many keys.
2) run the test with only very few keys.
If the slowdown does not occur in the above cases, then it's either TLB buffer management that's not up to the challenge, or database fragmentation is a problem.
It might be the case that issuing
PRAGMA PAGE_SIZE=32768
upon database creation might solve or mitigate the problem. Conversely, PRAGMA PAGE_SIZE=1024 could "spread" the problem avoiding performance bottlenecks.
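To try the pragmas above quickly outside the application, something like the following works with Python's built-in sqlite3 module. This is only a sketch for experimenting, not the JDBC setup from the question. `open_tuned` and `current_settings` are names made up here, and the page size has to be set before the first table is created for it to take effect on a new file.

```python
import sqlite3


def open_tuned(path, page_size=32768):
    """Open a database file and apply the pragmas discussed above."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # pragmas are never issued inside an implicit transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    # Must come before the database file gets its first page.
    conn.execute(f"PRAGMA page_size={int(page_size)}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(key TEXT NOT NULL PRIMARY KEY, value BLOB)"
    )
    conn.execute("PRAGMA journal_mode=MEMORY")  # rollback journal kept in RAM
    conn.execute("PRAGMA synchronous=OFF")      # no waiting for disk flushes
    return conn


def current_settings(conn):
    """Read the pragmas back to check that they were accepted."""
    names = ("page_size", "journal_mode", "synchronous")
    return {name: conn.execute(f"PRAGMA {name}").fetchone()[0] for name in names}
```

Reading the values back is worthwhile because sqlite silently keeps the old page size on a database that already has content.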
Another thing to try is closing the database connection and reopening it when it gets older than, say, 30 seconds. If this works, we'll still need to understand why it works (in this case I expect the JDBC driver to be at fault).
First of all, I should say that I do not use exactly the same sqlite driver as you, and I work with different devices (but how different are they really?).
From what I see, and correct me if I'm wrong, you use one transaction per insert statement. For every request you hit the disk, use memory, open, close, and so on. This can't be fast.
The first thing I do when I have to do inserts in sqlite is group them and use a single transaction for the whole group. That way, you use your resources in batches.
One transaction, many insert statements, a single commit. If there is a problem with a batch, handle the valid entries separately, log the faulty ones, and move on to the next batch of requests.
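A minimal sketch of that batching idea, written with Python's built-in sqlite3 module rather than the JDBC driver from the question. The table matches the one in the question; `open_store` and `write_batch` are hypothetical helper names, and the fallback policy (retry one by one, collect the failures) is just one way to read "handle the valid entries separately".

```python
import sqlite3

INSERT_SQL = "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)"


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(key TEXT NOT NULL PRIMARY KEY, value BLOB)"
    )
    return conn


def write_batch(conn, batch):
    """Write all (key, value) pairs in one transaction.

    Returns the list of pairs that could not be written.
    """
    try:
        # 'with conn' commits on success and rolls back on an exception.
        with conn:
            conn.executemany(INSERT_SQL, batch)
        return []
    except sqlite3.Error:
        # The batch was rolled back; retry item by item so that the
        # valid entries still get written and the faulty ones are reported.
        failed = []
        for item in batch:
            try:
                with conn:
                    conn.execute(INSERT_SQL, item)
            except sqlite3.Error:
                failed.append(item)
        return failed
```

The caller would collect incoming requests for a short interval (or up to some count) and then call `write_batch` once per group, logging whatever comes back.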

Is DynamoDb UpdateExpression with ADD to increment a counter transactional?

Do I need to use optimistic locking when updating a counter with ADD updateExpression to make sure that all increments from all the clients will be counted?
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html#API_UpdateItem_RequestSyntax
I'm not sure you would still call it a transaction if that is the only thing you are doing in DynamoDB; the terminology is a bit confusing.
IMO it is more correct to say it is atomic. You can combine the increment with other changes in DynamoDB, with a condition that means nothing is written unless the condition is true. But if your only change is the increment, then other than hitting capacity limits there is no reason (short of an asteroid hitting a datacenter or the like) why your increment would fail, unless you put a condition on the request that turns out to be false at write time. If two clients increment at the same time, DynamoDB handles this: one of them gets in first.
But let's say you are incrementing a value many, many times a second, so that you may indeed be hitting a DynamoDB capacity limit. Consider batching the increments in a Kinesis stream, where you can set the maximum time the stream waits after receiving a value before processing begins. This lets you achieve consistency within x seconds in your aggregation.
But outside of extremely high-traffic situations you should be fine, and in that case the standard approach is to use Streams, which is very cost effective and saves you capacity units.
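For reference, this is roughly what the unconditional increment looks like as an UpdateItem request. The sketch only builds the request parameters (in the low-level attribute-value format) so that it runs without AWS credentials. The table name `counters`, key `id`, and attribute `hits` are invented for the example.

```python
def build_increment_request(table, key, attribute, amount=1):
    """Parameters for an UpdateItem call that adds `amount` to a number.

    There is no ConditionExpression, so no optimistic locking is involved:
    the ADD is applied by DynamoDB itself on the stored value.
    """
    return {
        "TableName": table,
        "Key": key,
        "UpdateExpression": "ADD #counter :inc",
        "ExpressionAttributeNames": {"#counter": attribute},
        "ExpressionAttributeValues": {":inc": {"N": str(amount)}},
        "ReturnValues": "UPDATED_NEW",
    }


if __name__ == "__main__":
    request = build_increment_request(
        "counters", {"id": {"S": "page-1"}}, "hits"
    )
    print(request)
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("dynamodb").update_item(**request)
```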

minimizing occurrence of gaps during sequence generation

I know that sequence does not guarantee absence of gaps, but I want to minimize their occurrence, so they will occur only in exceptional situations (preferably only when transaction rolls back).
I have several nodes in RAC which may concurrently access sequence.
create sequence seq_1 start with 1 order; -- this seems to return numbers without gaps, but what will happen when the database is restarted? will cached elements be dropped?
create sequence seq_2 start with 1 nocache; -- this one also seems to return numbers in order without gaps, but I heard some objections about using nocache as it hinders performance
create sequence seq_3 start with 1 nocache order; -- any improvements over the previous two?
So which one is better?
As an alternative I could use a table for storing sequence number, but currently I want to consider sequence based solution rather than table based.
Thanks.
For your 1st statement, neither CACHE nor NOCACHE is specified, so the cache size defaults to 20 and you will certainly lose numbers if the DB is restarted. But there is no point in worrying about losing numbers, since a rollback or a shutdown will definitely "lose" a number (as you rightly said).
ASKTOM Quote: "If you have CACHE = NOCACHE, you will of course not "lose" any, you don't have any cached to lose. If you pin a cached sequence, you'll lose some on shutdown but not otherwise. SEQUENCES are not gap free under ANY circumstance -- EVER. They are 100% assured to have a gap at some point. 100%"
ORDER is only needed to guarantee ordered generation on RAC. If you are using exclusive mode, sequence numbers are always generated in order. Since NOORDER is the default, specify the ORDER keyword explicitly if you need ordering.
If you omit both CACHE and NOCACHE, then the database caches 20 sequence numbers by default. Oracle recommends using the CACHE setting to enhance performance if you are using sequences in an Oracle Real Application Clusters environment.
Go for NOCYCLE if you want to manage it your way.
Using the CACHE and NOORDER options together results in the best performance for a sequence.
When the CACHE option is used without the ORDER option, each instance caches a separate range of numbers, and sequence numbers may be assigned out of order by the different instances.
CACHE option causes each instance to cache its own range of numbers, thus reducing I/O to the Oracle Data Dictionary, and the NOORDER option eliminates message traffic over the interconnect to coordinate the sequential allocation of numbers across all instances of the database.
NOCACHE will be SLOW...
Read this
My suggestion would be a temp table that holds SEQNAME, STARTVAL, ENDVAL, CURRVAL as columns; take CURRVAL+1 as the next number and update the row with the latest value. This gives strict numbering and better control, but it is reinventing the wheel.
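To illustrate why a counter table avoids gaps on rollback, here is a small sketch. It uses Python's built-in sqlite3 as a stand-in for Oracle, keeps only the SEQNAME and CURRVAL columns, and the names `create_counter` and `next_value` are made up. The point is that the increment lives in the same transaction as the business data, so a rollback gives the number back. In Oracle the updated row would stay locked until commit, which serializes callers and is where the performance cost comes from.

```python
import sqlite3


def create_counter(conn, name, start=0):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seq_table "
        "(seqname TEXT PRIMARY KEY, currval INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO seq_table (seqname, currval) VALUES (?, ?)",
        (name, start),
    )
    conn.commit()


def next_value(conn, name):
    """Take CURRVAL+1 inside the caller's transaction (no commit here)."""
    conn.execute(
        "UPDATE seq_table SET currval = currval + 1 WHERE seqname = ?",
        (name,),
    )
    row = conn.execute(
        "SELECT currval FROM seq_table WHERE seqname = ?", (name,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_counter(conn, "seq_1")
    first = next_value(conn, "seq_1")
    conn.commit()      # number is used
    second = next_value(conn, "seq_1")
    conn.rollback()    # number is handed back, no gap
    third = next_value(conn, "seq_1")
    conn.commit()
    print(first, second, third)
```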
If you still need to stick with sequences, then my suggestion would be NOCACHE, ORDER, NOCYCLE.

Can you sacrifice performance to get concurrency in Sqlite on a NFS?

I need to write a client/server app stored on a network file system. I am quite aware that this is a no-no, but was wondering if I could sacrifice performance (Hermes: "And this time I mean really slash.") to prevent data corruption.
I'm thinking something along the lines of:
1) Create a separate file in the system every time a write is called (I'm willing to do it for every connection if necessary).
2) Use the current millisecond timestamp as the file name.
3) Check whether a file with that timestamp or an earlier one exists.
4) If a file with the same timestamp exists, wait a random time between 0 and 10 ms and try again.
5) While our file has the earliest timestamp, do the work and then delete the lock file; otherwise wait 10 ms and try again.
6) If a file persists for more than a minute, log it as an error and stop until a person has determined that the data is not corrupted.
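For concreteness, the scheme above could be sketched like this in Python. This only spells out the steps as described; it does not claim to be safe on NFS, since it depends on exclusive file creation and directory listings behaving consistently across clients, and on client clocks agreeing. `timestamp_lock`, `StaleLockError`, and the `.lock` suffix are made-up names.

```python
import os
import random
import time
from contextlib import contextmanager

STALE_AFTER_MS = 60_000  # "more than a minute"


class StaleLockError(RuntimeError):
    """An older lock file has been around too long; a person must check."""


def _now_ms():
    return int(time.time() * 1000)


@contextmanager
def timestamp_lock(lock_dir, poll_seconds=0.010):
    # Create a file named after the current millisecond; retry after a
    # random 0-10 ms pause if that exact name is already taken.
    while True:
        ticket = _now_ms()
        path = os.path.join(lock_dir, f"{ticket}.lock")
        try:
            os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
            break
        except FileExistsError:
            time.sleep(random.uniform(0, 0.010))
    try:
        # Wait until our file carries the earliest timestamp.
        while True:
            tickets = sorted(
                int(name[: -len(".lock")])
                for name in os.listdir(lock_dir)
                if name.endswith(".lock")
            )
            oldest = tickets[0]
            if oldest == ticket:
                break
            if _now_ms() - oldest > STALE_AFTER_MS:
                raise StaleLockError(f"lock {oldest} is older than a minute")
            time.sleep(poll_seconds)
        yield ticket  # the caller does its write here
    finally:
        os.remove(path)  # delete our lock file in every case
```

A writer would wrap each write in `with timestamp_lock(shared_dir): ...`. The open problem from the question remains: when `StaleLockError` is raised, nothing here tells you whether the stalled writer's change went through.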
The problem I see is trying to maintain the previous state if something locks up. Or choosing to ignore it, if the state change was actually successful.
Is there a better way of doing this, that doesn't involve not doing it this way? Or has anyone written one of these with a lot less problems than the Sqlite FAQ warns about? Will these mitigations even factor in to preventing data corruption?
A couple of notes:
This must exist on an NFS; the why is not important because it is not my decision to make (it doesn't look like I was clear enough on that point).
The number of readers/writers on the system will be between 5 and 10, all reading and writing at the same time, but rarely on the same record.
There will only be clients and a shared storage space; there is no way to put a server on there or to use a server-based RDBMS. If there were, obviously I would do it in a New York minute.
The amount of data will initially be about 70 MB (plain text, uncompressed), and it will grow continuously from there at a reasonable, but not tremendous, rate.
I will accept an answer of "No, you can't gain reasonably guaranteed concurrency on an NFS by sacrificing performance" if it contains a detailed and reasonable explanation of why.
Yes, there is a better way. Don't use NFS to do this.
If you are willing to create a new file every time something changes, I expect that you have a small amount of data and/or very infrequent changes. If the data is small, why use SQLite at all? Why not just have files with node names and timestamps?
I think it would help if you described the real problem you are trying to solve a bit more. For example if you have many readers and one writer, there are other approaches.
What do you mean by "concurrency"? Do you actually mean "multiple readers/multiple writers", or can you get by with "multiple readers/one writer with limited latency"?

Most bandwidth efficient unidirectional synchronise (server to multiple clients)

What is the most bandwidth efficient way to unidirectionally synchronise a list of data from one server to many clients?
I have a sizeable chunk of data (perhaps 20,000 records of 50 bytes each) which I need to periodically synchronise to a series of clients over the Internet (perhaps 10,000 clients). Records may be added, removed or updated only at the server end.
Something similar to bittorrent? Or even using bittorrent. Or maybe invent a wrapper around bittorrent.
(Assuming you pay for bandwidth on your server and not the others ...)
Ok, so we've got some detail now - perhaps 10 GB of total (uncompressed) data, every 3 days, so that's 100 GB per month.
That's actually not really a sizeable chunk of data these days. Whose bandwidth are you trying to save - yours, or your clients'?
Does the data perhaps compress very readily? For raw binary data it's not uncommon to achieve 50% compression, and if the data happens to have a lot of repeated patterns within it then 80%+ is possible.
That said, if you really do need a system that can just transfer the changes, my thoughts are:
make sure you've got a well defined primary key field - use that as your key to identify each record
record a timestamp for each record to say when it last changed
have each client tell you the timestamp of the last change it knows of, so you can calculate the deltas
ensure that full downloads are possible too, in case clients get out of sync
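The points above can be sketched as a small in-memory model. Two things here are assumptions rather than part of the answer: a monotonically increasing counter stands in for the "timestamp", and removed records are kept as tombstones so that a delta can tell clients about deletions. All class and method names are made up.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str            # the well-defined primary key
    value: bytes
    changed_at: int     # when the record last changed
    deleted: bool = False


class Server:
    def __init__(self):
        self.records = {}
        self.clock = 0  # stand-in for a timestamp; only ever increases

    def _tick(self):
        self.clock += 1
        return self.clock

    def upsert(self, key, value):
        self.records[key] = Record(key, value, self._tick())

    def remove(self, key):
        # Keep a tombstone so the removal shows up in later deltas.
        self.records[key] = Record(key, b"", self._tick(), deleted=True)

    def changes_since(self, last_seen):
        changed = [r for r in self.records.values() if r.changed_at > last_seen]
        return sorted(changed, key=lambda r: r.changed_at)

    def full_snapshot(self):
        live = {k: r.value for k, r in self.records.items() if not r.deleted}
        return live, self.clock


class Client:
    def __init__(self):
        self.data = {}
        self.last_seen = 0

    def sync(self, server):
        """Fetch and apply only what changed since the last sync."""
        for rec in server.changes_since(self.last_seen):
            if rec.deleted:
                self.data.pop(rec.key, None)
            else:
                self.data[rec.key] = rec.value
            self.last_seen = rec.changed_at

    def full_download(self, server):
        """Fallback for clients that got out of sync."""
        self.data, self.last_seen = server.full_snapshot()
```

With 50-byte records, a delta of a few hundred changed rows is a few tens of kilobytes per client instead of the full megabyte.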
