What happens when RocksDB can't flush a memtable to an SST file?

Our service (which uses RocksDB) was out of disk space for approximately 30 minutes.
We manually deleted some files, which freed up 650 MiB.
However, even with those 650 MiB free, RocksDB kept complaining:
IO error: Failed to pwrite for:
/path/to/database/454250.sst: There is not enough space on the disk.
Is it possible that the memtable got so big that it needed more than 650 MiB of disk space?
Looking at other SST files in the database folder, they don't take up more than ~40 MiB.
If not, what other reasons could there be for these error messages?

There are two cases where this can happen:
1) RocksDB persists in-memory data via WAL files, and a WAL file is removed once the memtables it covers have been flushed. When you have multiple column families, where some have higher insert rates (their memtables fill faster) and others have lower rates, the .log files (RocksDB WAL files) cannot be removed. This is because a WAL file contains transactions from all column families, so it cannot be deleted until every column family has persisted that data via a flush.
This can leave stagnant .log files around, resulting in disk space issues.
2) Assume the memtable size is configured to be 1 GB and the number of memtables to merge is 3:
you wait for 3 memtables to fill before a flush gets triggered. Even if you had configured the target file size to 50 MB (you mentioned your SSTs are around 40 MiB), that flush would generate roughly 60 SSTs of 50 MB each, totalling about 3 GB.
But the space you have is around 650 MiB, which is nowhere near enough.
There are various options that influence flush behaviour in RocksDB. You can take a look at:
write_buffer_size - Size of each memtable.
min_write_buffer_number_to_merge - Number of memtables to merge during a flush; in other words, when the immutable memtable count reaches this value, a flush to disk is triggered.
target_file_size_base - Size of the SSTs resulting from a compaction or flush.
target_file_size_multiplier - Controls the size of the SSTs at each level.
You can also take a look at SST compression (see the sketch below). Let me know if it helps.
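For illustration, here is roughly how those options could be set through the RocksDB C++ API. The concrete values below, and the extra max_write_buffer_number and max_total_wal_size knobs, are assumptions for the example rather than recommendations for your workload:

    #include <rocksdb/db.h>
    #include <rocksdb/options.h>

    int main() {
      rocksdb::Options options;
      options.create_if_missing = true;

      // Size of each memtable. With 1 GiB memtables and
      // min_write_buffer_number_to_merge = 3 (the example above), a single
      // flush can emit roughly 3 GiB of SSTs; smaller values flush earlier
      // and need less free disk at flush time.
      options.write_buffer_size = 64ull << 20;              // 64 MiB per memtable
      options.min_write_buffer_number_to_merge = 1;         // flush once one memtable is immutable
      options.max_write_buffer_number = 3;                  // memtables allowed in memory per CF

      // Target size of SSTs produced by compaction; deeper levels are scaled
      // by target_file_size_multiplier.
      options.target_file_size_base = 50ull << 20;          // ~50 MB SSTs
      options.target_file_size_multiplier = 1;

      // Bound stale WAL growth when some column families rarely flush: once
      // the WALs exceed this size, RocksDB flushes the column families holding
      // the oldest data so the old .log files can be deleted.
      options.max_total_wal_size = 256ull << 20;

      // SST compression trades CPU for disk space.
      options.compression = rocksdb::kLZ4Compression;

      rocksdb::DB* db = nullptr;
      rocksdb::Status s = rocksdb::DB::Open(options, "/path/to/database", &db);
      if (!s.ok()) return 1;
      delete db;
      return 0;
    }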

Related

Freeing up space in Cosmos db

I have a Cosmos DB collection over 41 GB in size. In it, one partition key was overrepresented, with about 17 GB of data. I am now running a program that goes through all the documents with that partition key and removes some unnecessary fields from each document, which should shrink every affected document by about 70%. I'm doing this because the data size per partition key cannot exceed 20 GB.
Now that the run is about halfway through, I can see that the index size is decreasing but the data size seems unaffected. Is this the same as the .mdf file in SQL Server reserving empty space, or is there just some delay in the statistics?
To give you an idea what to expect. I've done roughly the same while also changing some property names in the process and here's what my graph looks like after more than a month with no significant changes to the data afterwards. You can disregard the single point spikes. I think it sometimes misses a physical partition or counts one twice.
In my situation I see no change at all in index size while the data size seems to move all over the place. I'm running with minimal RU so every time the size suddenly increases the RU is automatically scaled up without notification.

SQLite file grows in size after pruning old records

I have a SQLite database my application uses to store some data. It can get very large (a few GBs in size), it has only 3 columns: an auto incrementing counter, a UUID, and a binary BLOB. My application keeps track of how many rows are in the database and removes the oldest ones (based on the increment) when it has exceeded the row limit.
On startup I also run VACUUM to compress the database in case the row limit has changed and the space in the database is mostly free allocated space.
My understanding is that a DELETE command will simply mark the deleted pages as "free pages" which can be written over again. Despite this, I see that the file size is continuing to grow (albeit slower) when inserting new rows after the row limit has been reached. Is this due to fragmentation of the free pages? Can I expect this fragmentation accretion to stop after a long enough time has passed? My application is intended to run uninterrupted for a very long time and if the file size increases on every INSERT the hard drive of the machine will fill up.
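For reference, the mechanism described here (DELETE only moving pages to the freelist, with VACUUM rewriting the file to give space back) can be observed directly. A minimal sketch using the SQLite C API follows; the file name app.db is just a placeholder:

    #include <sqlite3.h>
    #include <cstdio>

    // Return the single integer result of a PRAGMA such as page_count or freelist_count.
    static long long pragma_value(sqlite3* db, const char* sql) {
      sqlite3_stmt* stmt = nullptr;
      long long value = -1;
      if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) == SQLITE_OK &&
          sqlite3_step(stmt) == SQLITE_ROW) {
        value = sqlite3_column_int64(stmt, 0);
      }
      sqlite3_finalize(stmt);
      return value;
    }

    int main() {
      sqlite3* db = nullptr;
      if (sqlite3_open("app.db", &db) != SQLITE_OK) return 1;  // placeholder database file

      // Pages on the freelist are the "free pages" left behind by DELETE; they are
      // reused by later INSERTs but do not shrink the file on their own.
      long long total = pragma_value(db, "PRAGMA page_count;");
      long long free_pages = pragma_value(db, "PRAGMA freelist_count;");
      std::printf("pages: %lld total, %lld on the freelist\n", total, free_pages);

      // VACUUM rewrites the database into a new file and returns the freed space
      // to the filesystem (the startup step described in the question).
      sqlite3_exec(db, "VACUUM;", nullptr, nullptr, nullptr);

      sqlite3_close(db);
      return 0;
    }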

H2 DB persistence.mv.db file size increases even on data cleanup in CordApp

Persistence.mv.db size increases even after wiping out old data. Once the size grows beyond 71 MB we get a handshake timeout (Netty connection) and the nodes stop responding to REST services.
We have cleared data from tables like NODE_MESSAGE_IDS and NODE_OUR_KEY_PAIRS, which grow large because of the amount of hopping between the six nodes and the temporary key pairs generated for each session. We have done the same with many other tables, e.g. node_transactions, but even after clearing them the size keeps increasing.
Also, when we declare:
val session = serviceHub.jdbcSession()
session.autoCommit is false every time. I also tried setting it to true and executing SQL queries, but that did not decrease the database size.
This is in reference to the same project. We solved the pagination issue by removing data from the tables, but the DB size still increases, so it is not completely solved:
Buffer overflow issue when rows in vault is more than 200
There might be an issue with your flows, as the node is doing a lot of checkpointing.
Besides that, I cannot think of any other scenario that would cause the database to keep growing.

Writing to a file, confused about speed results

First, this is done on a Unix-based system. I was asked to simulate writes to a 128 MB file using different methods: first writing to an aligned random location in the file (lseek to a location that is a multiple of the write size), with varying write sizes (1 MB, 256 KB, 64 KB, 16 KB, 4 KB), and keep writing until 128 MB are written; then performing the same operation with the O_DIRECT flag; and lastly performing it again with unaligned locations and without the flag.
The results I got were that for the aligned + O_DIRECT writes, decreasing the write size drastically reduces performance, which is understandable because with O_DIRECT it will access the disk directly more with smaller write sizes.
With the unaligned writes, reducing write sizes again reduces performance which makes sense because we have more writes done to different locations in the file, and then when the cache is flushed, during the physical writing, the disk might go back and forth between sectors to write the data. More writes, more going back and forth.
The results that confused me were for the aligned writes without the O_DIRECT flag. This time the performance drastically increased for smaller write sizes, with 32,768 4 KB writes being a lot faster than 128 1 MB writes, and I can't figure out what would cause such an increase. I should also add that the Unix machine I'm working on is a remote one, and I do not know its specs, i.e. what HDD it has, what its sector size is, and so on. I should also add that the code for unaligned and aligned writes is exactly the same; the only change is the location passed to lseek.
One thing I did think about: if the physical sector size is, say, 4 KB, and the PC has enough RAM, all the write operations can be cached in pages and then flushed. Then, if the disk goes from the smallest address to the biggest, it can move forward sector by sector because everything is aligned, which might mean it can write everything in one pass.
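For reference, the aligned-write loop described above might look roughly like the sketch below (Linux/g++ assumed; timing and error handling are omitted, and the 4 KiB alignment and file name are placeholder assumptions):

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstring>
    #include <random>

    int main() {
      const size_t kFileSize  = 128 * 1024 * 1024;  // 128 MB target
      const size_t kWriteSize = 4 * 1024;           // vary: 1 MB ... 4 KB
      const bool   kDirect    = false;              // set true for the O_DIRECT runs

      int flags = O_WRONLY | O_CREAT;
      if (kDirect) flags |= O_DIRECT;               // bypass the page cache
      int fd = open("testfile", flags, 0644);

      // O_DIRECT requires the buffer (and offsets/sizes) to be aligned to the
      // logical block size, assumed to be 4 KB here.
      void* buf = nullptr;
      posix_memalign(&buf, 4096, kWriteSize);
      memset(buf, 'x', kWriteSize);

      std::mt19937_64 rng(42);
      const size_t slots = kFileSize / kWriteSize;
      for (size_t written = 0; written < kFileSize; written += kWriteSize) {
        // Aligned case: seek to a random multiple of the write size.
        off_t offset = (off_t)(rng() % slots) * (off_t)kWriteSize;
        lseek(fd, offset, SEEK_SET);
        write(fd, buf, kWriteSize);
      }

      // Without O_DIRECT the data may still sit in the page cache at this point;
      // fsync() makes the timing include the flush to disk.
      fsync(fd);
      close(fd);
      free(buf);
      return 0;
    }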

Best way to store 100,000+ CSV text files on server?

We have an application which will need to store thousands of fairly small CSV files. 100,000+ and growing annually by the same amount. Each file contains around 20-80KB of vehicle tracking data. Each data set (or file) represents a single vehicle journey.
We are currently storing this information in SQL Server, but the size of the database is getting a little unwieldy and we only ever need to access the journey data one file at time (so the need to query it in bulk or otherwise store in a relational database is not needed). The performance of the database is degrading as we add more tracks, due to the time taken to rebuild or update indexes when inserting or deleting data.
There are 3 options we are considering:
We could use the FILESTREAM feature of SQL to externalise the data into files, but I've not used this feature before. Would Filestream still result in one physical file per database object (blob)?
Alternatively, we could store the files individually on disk. There could end up being half a million of them after 3+ years. Will the NTFS file system cope OK with this amount?
If lots of files is a problem, should we consider grouping the datasets/files into a small database (one per user), so that each user has their own database? Is there a very lightweight database like SQLite that can store files?
One further point: the data is highly compressible. Zipping the files reduces them to only 10% of their original size. I would like to utilise compression if possible to minimise disk space used and backup size.
I have a few thoughts, and this is very subjective, so your mileage and other readers' mileage may vary, but hopefully it will still get the ball rolling for you even if other folks want to put differing points of view...
Firstly, I have seen performance issues with folders containing too many files. One project got around this by creating 256 directories, called 00, 01, 02... fd, fe, ff and inside each one of those a further 256 directories with the same naming convention. That potentially divides your 500,000 files across 65,536 directories giving you only a few in each - if you use a good hash/random generator to spread them out. Also, the filenames are pretty short to store in your database - e.g. 32/af/file-xyz.csv. Doubtless someone will bite my head off, but I feel 10,000 files in one directory is plenty to be going on with.
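For illustration, deriving such a two-level path from a hash of the file name could look like the sketch below (FNV-1a and the exact layout are just example choices, not what the project above used):

    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Map a file name to a two-level directory path such as "32/af/file-xyz.csv",
    // spreading files across 256 x 256 = 65,536 directories.
    static std::string shard_path(const std::string& filename) {
      std::uint64_t h = 0xcbf29ce484222325ull;            // FNV-1a offset basis
      for (unsigned char c : filename) {
        h ^= c;
        h *= 0x100000001b3ull;                            // FNV-1a prime
      }
      char prefix[6];
      std::snprintf(prefix, sizeof(prefix), "%02x/%02x",
                    (unsigned)(h & 0xff), (unsigned)((h >> 8) & 0xff));
      return std::string(prefix) + "/" + filename;
    }

    int main() {
      // The short relative path is what would be stored in the database.
      std::printf("%s\n", shard_path("file-xyz.csv").c_str());
      return 0;
    }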
Secondly, 100,000 files of 80 kB amounts to 8 GB of data, which is really not very big these days - a small USB flash drive in fact - so I think any arguments about compression are not that compelling - storage is cheap. What could be important, though, is backup. If you have 500,000 files you have lots of 'inodes' to traverse, and I think the statistic used to be that many backup products can only traverse 50-100 'inodes' per second - so you are going to be waiting a very long time. Depending on the downtime you can tolerate, it may be better to take the system offline and back up from the raw block device - at say 100 MB/s you can back up 8 GB in 80 seconds, and I can't imagine a traditional file-based backup getting close to that. Alternatives may be a filesystem that permits snapshots, so you can back up from a snapshot, or a mirrored filesystem which permits you to split the mirror, back up from one copy and then rejoin the mirror.
As I said, pretty subjective and I am sure others will have other ideas.
I work on an application that uses a hybrid approach, primarily because we wanted our application to be able to work (in small installations) in freebie versions of SQL Server...and the file load would have thrown us over the top quickly. We have gobs of files - tens of millions in large installations.
We considered the same scenarios you've enumerated, but what we eventually decided to do was to have a series of moderately large (2 GB) memory-mapped files that contain the would-be files as opaque blobs. Then, in the database, the blobs are keyed by blob-id (a SHA-1 hash of the uncompressed blob), and have fields for the container-file-id, offset, length, and uncompressed-length. There's also a "published" flag in the blob-referencing table. Because the hash faithfully represents the content, a blob is only ever written once. Modified files produce new hashes, and they're written to new locations in the blob store.
In our case, the blobs weren't consistently text files - in fact, they're chunks of files of all types. Big files are broken up with a rolling-hash function into roughly 64k chunks. We attempt to compress each blob with lz4 compression (which is way fast compression - and aborts quickly on effectively-incompressible data).
This approach works really well, but isn't lightly recommended. It can get complicated. For example, grooming the container files in the face of deleted content. For this, we chose to use sparse files and just tell NTFS the extents of deleted blobs. Transactional demands are more complicated.
All of the goop for db-to-blob-store is c# with a little interop for the memory-mapped files. Your scenario sounds similar, but somewhat less demanding. I suspect you could get away without the memory-mapped I/O complications.
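For illustration, a much-simplified sketch of that blob-store idea is below. All names are invented, a std::map stands in for the database table, and the real system's SHA-1 keys, memory-mapped I/O, chunking, and lz4 compression are left out:

    #include <cstdint>
    #include <fstream>
    #include <map>
    #include <string>
    #include <vector>

    // One row of the blob-referencing table (simplified: single container file,
    // no "published" flag, no chunking, no compression).
    struct BlobRef {
      std::uint64_t offset = 0;               // where the blob starts in the container
      std::uint64_t length = 0;               // stored length
      std::uint64_t uncompressed_length = 0;
    };

    // In the real scheme the key is a content hash (e.g. SHA-1 of the uncompressed
    // blob) and the index lives in the database; a std::map stands in for it here.
    static std::map<std::string, BlobRef> blob_index;

    // Append a blob to the container file unless an identical one already exists;
    // the caller supplies the content hash, so identical content is written once.
    static void store_blob(std::ofstream& container,
                           const std::string& content_hash,
                           const std::vector<char>& data) {
      if (blob_index.count(content_hash)) return;          // already stored

      container.seekp(0, std::ios::end);
      BlobRef ref;
      ref.offset = static_cast<std::uint64_t>(container.tellp());
      ref.length = data.size();
      ref.uncompressed_length = data.size();
      container.write(data.data(), static_cast<std::streamsize>(data.size()));

      blob_index[content_hash] = ref;
    }

    int main() {
      std::ofstream container("blobs.bin", std::ios::binary | std::ios::app);
      std::vector<char> csv = {'a', ',', 'b', '\n'};
      // "demo-hash" stands in for the hash of the uncompressed content.
      store_blob(container, "demo-hash", csv);
      return 0;
    }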
