Riak & memory backend: handling expiration and pruning

I am wondering what the best strategy is to manage expiration of session-related data stored in a memory Riak bucket type.
It seems that this backend supports TTL (http://docs.basho.com/riak/kv/2.2.3/setup/planning/backend/memory/#ttl and http://docs.basho.com/riak/kv/2.2.3/configuring/backend/#memory-backend); however, the documentation at the second link states:
"Once that object's time is up, it will be deleted on the next read of
its key."
What if the object is never read again? Will it stay in memory? I assume it will eventually be evicted once the memory_backend.max_memory_per_vnode limit is reached.
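For reference, the TTL and the per-vnode memory cap mentioned above are both set in riak.conf (the values below are illustrative examples, not recommendations):

```
storage_backend = memory
memory_backend.ttl = 1d
memory_backend.max_memory_per_vnode = 4GB
```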
Is storing the expiration timestamp another relevant option? In this case pruning would be done by a process periodically searching for "old" timestamps:
:riakc_pb_socket.search(pid, "expirable_token", "exp_counter:[* TO 1542468475]")
# then we delete them
I've tested this by storing the timestamp in a counter, since it's not possible to range-query registers that are indexed as strings:
iex(34)> :riakc_counter.increment(System.system_time(:second), :riakc_counter.new())
{:counter, 0, 1542468373}
However, I'm not sure counters are designed for storing arbitrary integers like timestamps. What's the best practice for storing integers in Riak data types? A custom schema with a proper int type declared?
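The pruning query shown above can be generated mechanically. A minimal sketch in Python (the `exp_counter` field name mirrors the one used above; the actual Riak search/delete calls are omitted):

```python
import time

def expiry_query(ttl_seconds, now=None):
    """Build the Solr range query that matches tokens whose stored
    write-time counter is older than (now - TTL)."""
    if now is None:
        now = int(time.time())
    cutoff = now - ttl_seconds
    return "exp_counter:[* TO {}]".format(cutoff)

# With a fixed "now" for reproducibility and a one-hour TTL:
print(expiry_query(3600, now=1542472075))  # → exp_counter:[* TO 1542468475]
```

A periodic process would feed this string to `:riakc_pb_socket.search/3` and delete the returned keys.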

I'd recommend using a different backend. Bitcask expiration is very efficient and can be configured quite precisely; LevelDB expiration works well enough.

What could cause a sqlite application to slow down over time with high load?

I'll definitely need to update this based on feedback so I apologize in advance.
The problem I'm trying to solve is roughly this.
The graph (not reproduced here) shows disk utilization in the Windows Task Manager. My sqlite application is a webserver that takes in json requests with timestamps, looks up the existing entry in a two-column key/value table, merges the request into the existing item (they don't grow over time), and then writes it back to the database.
The db is created as follows. I've experimented with and without WAL without difference.
createStatement().use { it.executeUpdate("CREATE TABLE IF NOT EXISTS items ( key TEXT NOT NULL PRIMARY KEY, value BLOB );") }
The write/set is done as follows
try {
    val insertStatement = "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)"
    prepareStatement(insertStatement).use {
        it.setBytes(1, keySerializer.serialize(key))
        it.setBytes(2, valueSerializer.serialize(value))
        it.executeUpdate()
    }
    commit()
} catch (t: Throwable) {
    rollback()
    throw t
}
I use a single database connection the entire time which seems to be ok for my use case and greatly improves performance relative to getting a new one for each operation.
val databaseUrl = "jdbc:sqlite:${System.getProperty("java.io.tmpdir")}/$name-map-v2.sqlite"
if (connection?.isClosed == true || connection == null) {
    connection = DriverManager.getConnection(databaseUrl)
}
I'm effectively serializing access to the db. I'm pretty sure the default threading mode for the sqlite driver is to serialize and I'm also doing some serializing in kotlin coroutines (via actors).
I'm load testing the application locally and I notice that disk utilization spikes around the one-minute mark, but I can't determine why. I do know that throughput plummets when that happens. I expect the server to chug along at a more or less constant rate. The db in these tests is pretty small too, hardly reaching 1 MB.
Hoping people can recommend some next steps or set me straight as far as performance expectations. I'm assuming there is some sqlite specific thing that happens when throughput is very high for too long, but I would have thought it would be related to WAL or something (which I'm not using).
I have a theory but it's a bit farfetched.
The fact that you hit a performance wall after some time makes me think that either a buffer somewhere is filling up, or some other kind of data accumulation threshold is being reached.
Where exactly the culprit is, I'm not sure.
So, I'd run the following tests.
// At the beginning
connection.setAutoCommit(true);
If the problem is on the driver's side, in the rollback/transaction buffering, then this will (hopefully only slightly) slow down operations, "spreading" the impact away from the one-minute mark: instead of fast operations for 59 seconds followed by a few seconds of full stop, you get somewhat slower operations the whole time.
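The same experiment translated to Python's stdlib sqlite3 driver, as an analogue of setAutoCommit(true) (the table mirrors the Kotlin schema above):

```python
import sqlite3

# isolation_level=None puts the stdlib sqlite3 driver in autocommit mode:
# every statement commits on its own, so no rollback buffer accumulates.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (key TEXT NOT NULL PRIMARY KEY, value BLOB)")
conn.execute("INSERT OR REPLACE INTO items VALUES (?, ?)", ("k", b"v"))
print(conn.in_transaction)  # → False
```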
In case the problem is further down the line, try
PRAGMA journal_mode=MEMORY
which keeps the rollback journal in memory rather than on disk, or
PRAGMA synchronous=OFF
which disables syncing the journal and database to disk
(the data will be more at risk in case of a catastrophic power loss).
Finally, another possibility is that the page translation buffer gets filled after a sufficient number of different keys have been entered. You can test this directly with two experiments:
1) pre-fill the database with all the keys in ascending order and a large request, then start updating the same many keys.
2) run the test with only very few keys.
If the slowdown does not occur in the above cases, then either TLB management is not up to the challenge, or database fragmentation is the problem.
It might be the case that issuing
PRAGMA PAGE_SIZE=32768
upon database creation might solve or mitigate the problem. Conversely, PRAGMA PAGE_SIZE=1024 could "spread" the problem out, avoiding a sharp bottleneck.
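These pragmas can be combined in one place; a sketch using Python's stdlib sqlite3 (the schema mirrors the question's table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# page_size only takes effect while the database is still empty
# (or after a VACUUM), so set it before creating any tables.
conn.execute("PRAGMA page_size=32768")
conn.execute("PRAGMA journal_mode=MEMORY")
conn.execute("PRAGMA synchronous=OFF")
conn.execute("CREATE TABLE items (key TEXT NOT NULL PRIMARY KEY, value BLOB)")

print(conn.execute("PRAGMA page_size").fetchone()[0])  # → 32768
```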
Another thing to try is closing the database connection and reopening it when it gets older than, say, 30 seconds. If this works, we'll still need to understand why it works (in this case I expect the JDBC driver to be at fault).
First of all, I should say that I don't use exactly your SQLite driver, and I use different devices in my work (but how different are they really?).
From what I can see, and correct me if I'm wrong, you use one transaction per insert statement. For every request you touch the disk and the memory, open, close, etc. That can't be fast.
The first thing I do when I have to do inserts in sqlite is to group them, and use a single transaction to do it. That way, you are using your resources in batches.
One transaction, many insert statements, a single commit. If there is a problem with a batch, handle the valid entries separately, log the faulty ones, and move on to the next batch of requests.
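For illustration, here is the batching idea in Python's stdlib sqlite3 (the table mirrors the Kotlin schema from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS items (key TEXT NOT NULL PRIMARY KEY, value BLOB)")

# Accumulate incoming requests, then write the whole batch in ONE transaction
# instead of paying the commit cost once per request.
batch = [("key-%d" % i, ("value-%d" % i).encode()) for i in range(1000)]

with conn:  # a single BEGIN ... COMMIT around the whole batch
    conn.executemany("INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)", batch)

print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # → 1000
```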

Is DynamoDb UpdateExpression with ADD to increment a counter transactional?

Do I need to use optimistic locking when updating a counter with ADD updateExpression to make sure that all increments from all the clients will be counted?
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html#API_UpdateItem_RequestSyntax
I'm not sure you would still call it a transaction if that is the only thing you are doing in DynamoDB; the terminology is a bit confusing.
IMO it is more correct to say it is atomic. You can combine the increment with other changes in DynamoDB, guarded by a condition that must be true for the write to go through. But if your only change is the increment, then short of hitting capacity limits (or an asteroid hitting a datacenter, or the like) there is no reason your increment would fail, unless you attach a condition that turns out to be false at write time. If two clients increment at the same time, DynamoDB handles it: one update is simply applied before the other.
But let's say you are incrementing a value many, many times a second, to the point where you may indeed be hitting a DynamoDB capacity limit. Consider batching the increments in a Kinesis Stream, where you can set the maximum time the stream should wait after receiving a value before processing begins. This lets you achieve consistency within x seconds in your aggregation.
But other than extremely high traffic situations you should be fine, and in that case the standard way of approaching that problem is using Streams which is very cost effective, saving you capacity units.
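For concreteness, an atomic increment expressed as boto3-style UpdateItem parameters (the table and attribute names, "Counters" and "hits", are invented for this example):

```python
# No read-modify-write cycle and no optimistic lock: ADD is applied
# server-side, so concurrent increments are all counted.
params = {
    "TableName": "Counters",
    "Key": {"counterId": {"S": "page-views"}},
    "UpdateExpression": "ADD hits :inc",
    "ExpressionAttributeValues": {":inc": {"N": "1"}},
    "ReturnValues": "UPDATED_NEW",
}
# client.update_item(**params)  # client = boto3.client("dynamodb")
print(params["UpdateExpression"])  # → ADD hits :inc
```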

How Java make thread safe in date time API

In the previous date-time API, the classes are not thread-safe. I want to know how thread safety was achieved in the new date-time API in Java 8. (Earlier you could also make them safe by synchronizing, or by creating a separate instance for each thread.) What did Java 8 add that is new? Please give some examples too. Thank you.
The SimpleDateFormat that's existed since the early days of Java uses inner fields to hold temporary state, but does nothing to prevent two threads from updating them concurrently. This led to the wrong date being returned when two threads happened to call the format or parse methods on the same SimpleDateFormat instance at the same time, since each would modify the internal state of the object while the other was still using it.
Java 8 hasn't done anything to change SimpleDateFormat; instead it introduced the whole new java.time API (LocalDate and friends), whose classes are immutable, so there is no internal state for concurrent threads to corrupt. It also removed much of the complexity around time zones and legacy dates that made the old Date APIs such a headache.
The thread safety in java.time (the modern Java date and time API introduced from Java 8) is obtained through immutable classes. An immutable object is always thread-safe (see the modification of the last statement near the bottom of the first link). As Holger notes in a comment,
without mutation, there can’t be any inconsistencies.
Links:
Does Immutability Really Mean Thread Safety?
Immutable objects are thread safe, but why?
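The same principle can be seen in Python's datetime types, which are likewise immutable (an analogue, not the Java API itself): "modifying" a date returns a brand-new object, so concurrent readers can never observe a half-updated one.

```python
from datetime import date, timedelta

d = date(2018, 11, 17)
later = d + timedelta(days=30)  # arithmetic returns a NEW object

print(d)      # → 2018-11-17  (the original is untouched)
print(later)  # → 2018-12-17
```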

Is there an efficient way to process data expiration / massive deletion (to free space) with Riak on leveldb?

On Riak :
Is there a way to process data expiration or to dump old data to free some space?
Is it efficient ?
Edit: thanks to Joe for providing the answer and its workaround (below).
Data expiration should be planned from the very beginning, as it requires an additional index plus a map-reduce algorithm.
Short answer: no, there is no built-in, vendor-provided expiry.
Longer answer: include the write time, in an integer representation such as Unix epoch seconds, in a secondary index entry on each value that you want to be subject to expiry. Then run a periodic job during off-peak times that does a ranged 2i query for entries from 0 to (now - TTL). The result can be used as the input to a map/reduce job that performs the actual deletes.
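A minimal sketch of the range computation in Python (the Riak 2i query and delete calls themselves are omitted):

```python
import time

def expiry_range(ttl_seconds, now=None):
    """Bounds for the ranged 2i query: every entry whose write-time
    index value falls in [0, now - TTL] is due for deletion."""
    if now is None:
        now = int(time.time())
    return 0, now - ttl_seconds

# A 24-hour TTL with a fixed "now" for reproducibility:
print(expiry_range(86400, now=1542468475))  # → (0, 1542382075)
```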
As to recovering disk space, leveldb is very slow about that. When a value is written to leveldb it starts in level 0, then as each level fills, compaction moves values to the next level, so your least recently written data resides on disk in the lowest levels. When you delete a value, a tombstone is written to level 0, which masks the previous value in the lower level, and as normal compaction occurs the tombstone is moved down as any other value would be. The disk space consumed by the old value is not reclaimed until the tombstone reaches the same level.
I have written a little C++ tool that uses leveldb's internal CompactRange function to perform this task. Here you can read the article about this.
With this we are able to delete an entire bucket (key by key) and wipe all the tombstones: 50 GB of 75 GB were freed!
Unfortunately, this only works if leveldb is used as backend.

What is the difference between Application("Something") and Session("Something")

While debugging a classic ASP application (and learning about classic ASP at the same time) I've encountered the following
Application("Something") = "some value"
and elsewhere in the code this value gets used thus:
someObj.Property = Session("Something")
How does the Application object relate to Session?
A Session variable is linked to a user. An Application variable is shared between all users.
Application is a handy vault for storing things you want to persist but you can't guarantee they'll always be there. So think low-end caching, short-term variable storage, etc.
In this context with these definitions, they have very little to do with each other except that getting and setting variables is roughly the same for each.
Note: there can be concurrency issues when using Application (because you could easily have more than one user hitting something that reads or writes to it) so I suggest you use Application.Lock before you write and Application.Unlock after you're done. This only really applies to writing.
Note 2: I'm not sure whether it automatically unlocks after the request is done (that would be sensible), but I wouldn't trust it to. Make sure that any part of the application that could conceivably throw an error isn't inside a lock, otherwise you might lock other users out.
Note 3: In the same vein, don't put things that take a long time to process inside a lock, only the bit where you write the data. If you do something that takes 10 seconds while holding the lock, you lock everybody else out for that long.
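The lock discipline described in the notes above can be sketched with Python's threading module as an analogue of Application.Lock/Unlock (the dict stands in for the Application object):

```python
import threading

application = {}             # shared, Application-like store
app_lock = threading.Lock()  # analogue of Application.Lock / Application.Unlock

def slow_computation():
    return "some value"      # do the expensive work OUTSIDE the lock

def store(key):
    value = slow_computation()
    with app_lock:           # hold the lock only for the write itself
        application[key] = value

threads = [threading.Thread(target=store, args=("Something",)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(application["Something"])  # → some value
```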
