Berkeley DB File Splitting - berkeley-db

Our application uses Berkeley DB for temporary storage and persistence. A new issue has arisen where tremendous amounts of data come in from various input sources, and the underlying file system does not support such large file sizes. Is there any way to split the Berkeley DB files into logical segments or partitions without losing the data inside them? I would also like to set this up using Berkeley DB properties rather than cumbersome programming for this simple task.

To my knowledge, BDB does not support this for you. You can, however, implement it yourself by creating multiple databases.
I did this before with BDB, programmatically: my code partitioned a potentially large index file into separate files and created a top-level master index over those sub-files.
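A minimal sketch of that idea, assuming the third-party bsddb3 (pybsddb) Python bindings; the partition count, file names, and hashing scheme are illustrative choices, not anything built into BDB:

    # Route keys (bytes) to one of several sub-files so no single file grows too large.
    import zlib
    from bsddb3 import db

    class PartitionedStore:
        def __init__(self, basename, num_parts=4):
            self.parts = []
            for i in range(num_parts):
                d = db.DB()
                d.open("%s.%d.db" % (basename, i),
                       dbtype=db.DB_HASH, flags=db.DB_CREATE)
                self.parts.append(d)

        def _part(self, key):
            # A stable hash acts as the "master index" over the sub-files.
            return self.parts[zlib.crc32(key) % len(self.parts)]

        def put(self, key, value):
            self._part(key).put(key, value)

        def get(self, key):
            return self._part(key).get(key)

        def close(self):
            for d in self.parts:
                d.close()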

Modern BDB has means to add additional directories either using DB_CONFIG (recommended) or with API calls.
See if these directives (and corresponding API calls) help:
add_data_dir
set_create_dir
set_data_dir
set_lg_dir
set_tmp_dir
Note that adding these directives is unlikely to transparently "Just Work", but it shouldn't be too hard to use db_dump/db_load to recreate the database files configured with these directives.
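For the API-call route, here is a minimal sketch assuming the bsddb3 Python bindings; the directory names and environment path are illustrative, and the equivalent DB_CONFIG lines are shown in comments:

    from bsddb3 import db

    env = db.DBEnv()
    env.set_data_dir("data1")    # DB_CONFIG: set_data_dir data1
    env.set_lg_dir("logs")       # DB_CONFIG: set_lg_dir logs
    env.set_tmp_dir("tmp")       # DB_CONFIG: set_tmp_dir tmp
    env.open("/path/to/env",
             db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
             db.DB_INIT_LOG | db.DB_INIT_TXN)
    # ... open databases against this environment, then:
    env.close()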

Related

A file storage format for file sharing site

I am implementing a file sharing system in ASP.NET MVC3. I suppose most file sharing sites store files in a standard binary format on a server's file system, right?
I have two options storage wise - a file system, or binary data field in a database.
Are there any advantages to storing files (including large ones) in a database, rather than on the file system?
MORE INFO:
Expected average file size is 800 MB. Roughly 3 files per minute will typically be requested for download by users.
If the files are as big as that, then using the filesystem is almost certainly a better option. Databases are designed to contain relational data grouped into small rows and are optimized for consulting and comparing the values in these relations. Filesystems are optimized for storing fairly large blobs and recalling them by name as a bytestream.
Putting files that big into a database will also make it difficult to manage the space occupied by the database. The tools to query space used in a filesystem, and remove and replace data are better.
The only caveat to using the filesystem is that your application has to run under an account that has the necessary permissions to write to the (portion of the) filesystem you use to store these files.
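To illustrate that layout, here is a small sketch in Python with SQLite, purely for brevity (the storage root and table are assumptions; the ASP.NET MVC3 equivalent follows the same shape): the file bytes go to disk, and only the path plus metadata go into the database.

    import os
    import sqlite3

    STORAGE_ROOT = "/srv/fileshare"          # illustrative storage directory
    os.makedirs(STORAGE_ROOT, exist_ok=True)
    conn = sqlite3.connect("metadata.db")    # illustrative metadata store
    conn.execute("""CREATE TABLE IF NOT EXISTS files
                    (id INTEGER PRIMARY KEY, name TEXT, path TEXT, size INTEGER)""")

    def save_upload(name, data):
        # Write the blob to the filesystem, keep only metadata in the DB.
        path = os.path.join(STORAGE_ROOT, name)
        with open(path, "wb") as f:
            f.write(data)
        conn.execute("INSERT INTO files (name, path, size) VALUES (?, ?, ?)",
                     (name, path, len(data)))
        conn.commit()
        return path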
Use FileStream when:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
Here is the MSDN link: https://msdn.microsoft.com/en-us/library/gg471497.aspx
How to use it: https://www.simple-talk.com/sql/learn-sql-server/an-introduction-to-sql-server-filestream/

Can you use rsync to replicate block changes in a Berkeley DB file?

I have a Berkeley DB file that is quite large (~1GB) and I'd like to replicate small changes that occur (weekly) to an alternate location without having the entire file be re-written at the target location.
Does rsync handle Berkeley DBs properly with its block-level algorithm?
Does anyone have an alternative approach where only the changes are written to the Berkeley DB files that are targets of replication?
Thanks!
Rsync handles files perfectly, at the block level. The problem with databases can come into play in a number of ways.
Caching
File locking
Synchronization/transaction logs
If you can ensure that, for the duration of the rsync, no applications have the Berkeley DB open, then rsync should work fine and offer a significant advantage over copying the entire file. However, depending on the configuration and version of BDB, there are transaction logs. You probably want to investigate the same mechanisms used for backups and hot backups. They also have a "snapshot" feature that might better facilitate a working solution.
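One way to combine those pieces, sketched in Python under the assumption that BDB's db_hotbackup utility and rsync are both installed (the paths and remote host are illustrative):

    import subprocess

    ENV_HOME = "/var/lib/myapp/bdb"        # BDB environment directory (assumed)
    BACKUP_DIR = "/var/backups/myapp-bdb"  # local staging area for the backup
    REMOTE = "backup-host:/srv/replicas/myapp-bdb/"

    # db_hotbackup copies the databases plus the log records needed to make the
    # copy consistent, without shutting the application down.
    subprocess.run(["db_hotbackup", "-h", ENV_HOME, "-b", BACKUP_DIR], check=True)

    # rsync's delta algorithm then transfers only the changed blocks.
    subprocess.run(["rsync", "-av", BACKUP_DIR + "/", REMOTE], check=True)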
You should probably read this carefully: http://www.cs.sunysb.edu/documentation/BerkeleyDB/ref/transapp/archival.html
I'd also recommend you consider using replication as an alternative solution that is blessed by BDB https://idlebox.net/2010/apidocs/db-5.1.19.zip/programmer_reference/rep.html
They now call this High Availability -> http://www.oracle.com/technetwork/database/berkeleydb/overview/high-availability-099050.html

How enable iCloud support for sqlite?

I want to provide iCloud support for my wrapper around SQLite. It does not use Core Data.
I wonder how to enable iCloud for it. The database content changes all the time (it is for invoicing). Also, if it is possible to have some kind of versioning, that would be great.
Is there any sample I can use to do this?
The short answer is no, you would need to use Core Data as you suspected. Apple has stated that sqlite is unsupported.
Edit: Check out the section on iCloud that's now in the iOS Application Programming Guide, under "Using iCloud in Conjunction with Databases":
Using iCloud with a SQLite database is possible only if your app uses Core Data to manage that database. Accessing live database files in iCloud using the SQLite interfaces is not supported and will likely corrupt your database. However, you can create a Core Data store based on SQLite as long as you follow a few extra steps when setting up your Core Data structures. You can also continue to use other types of Core Data stores—that is, stores not based on SQLite—without any special modifications.
You can't just put the SQLite database in the iCloud container, because it might get corrupted. (As you modify an SQLite DB, temporary files are created and renamed, so if the sync process starts copying those files, you'll get a corrupt database.)
If you don't want to move to Core Data, you can do what Core Data does: store your database in your documents folder, and store a transaction log in the iCloud container. Every time you change the database, you add those changes to a log file, so you can play them back and make equivalent changes on other devices.
This gets pretty complicated: aside from getting the log/replay logic right, you'll want to coalesce redundant changes and periodically collapse the log into a complete copy of the database.
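A platform-neutral sketch of the log-and-replay idea, written with Python's built-in sqlite3 purely for illustration (the log path and JSON encoding are assumptions; on iOS you would do the equivalent against the SQLite C API and the ubiquity container path):

    import json
    import sqlite3

    LOG_PATH = "icloud_container/changes.log"   # illustrative path, not a real API

    def apply_and_log(conn, sql, params):
        # Apply a change locally and append it to the shared log.
        conn.execute(sql, params)
        conn.commit()
        with open(LOG_PATH, "a") as log:
            log.write(json.dumps({"sql": sql, "params": list(params)}) + "\n")

    def replay_log(conn):
        # Replay logged changes on another device to reach the same state.
        with open(LOG_PATH) as log:
            for line in log:
                change = json.loads(line)
                conn.execute(change["sql"], change["params"])
        conn.commit()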
You might have an easier time developing a solution if you can exploit knowledge of your application (Core Data has to solve the problem in the general case). For example, you could save invoices as separate files in the cloud container (text, property list, XML, JSON, whatever), writing them out as the database changes and only importing them when the system tells you they were created or changed.
In summary, your choice is either to migrate to Core Data or write a sync solution yourself. Which one is best depends on the particulars of your application.

Is there any value in including SQLite in VCS's

Having an argument with my team. We are developing an application using SQLite, and some want to add it to the repo (Git) and some don't. Previously, with RDBMS systems, there has been no perceived benefit of using VCS on the DB. However, SQLite is a self-contained file with no external dependencies, so I assume that, even though it is binary, a commit of the project code plus the SQLite file will give an accurate snapshot of the state of play at that point.
I also assume that a branch and merge would work as well.
Has anyone actually done this and if so does it work?
You'd get more benefit from Git's versioning facilities if you stored a dump of the SQLite database (i.e. the commands required to create it) rather than the database file itself. That way you could look at the history of the dump file and see tables or data being added, etc.
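For example, with Python's built-in sqlite3 module (file names are illustrative), you can write such a dump and commit that instead of the binary file:

    import sqlite3

    conn = sqlite3.connect("app.db")
    with open("app.sql", "w") as f:
        for line in conn.iterdump():   # SQL statements that recreate the database
            f.write(line + "\n")
    conn.close()
    # Commit app.sql; diffs now show tables and rows being added or changed.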
Generally speaking, it's preferable to include the full set of dependencies in a VCS repository. This makes your life a whole lot simpler.
If you're after versioning DB schema, check out Wizardby.

Temporary in-memory database in SQLite

Is it possible somehow to create an in-memory database in SQLite and then destroy it just by some query?
I need to do this for unit testing my DB layer. So far I've only worked by creating a normal SQLite DB file and deleting it after all tests, but doing it all in memory would be much better.
So is it possible to instantiate a database only in memory, without writing anything to disk?
I can't just use transactions, because I want to create a whole new database.
Create it with the filename ":memory:": In-Memory Databases.
It'll cease to exist as soon as the connection to it is closed.
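For instance, with Python's built-in sqlite3 module:

    import sqlite3

    conn = sqlite3.connect(":memory:")   # the database lives only in RAM
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO t (name) VALUES (?)", ("example",))
    print(conn.execute("SELECT * FROM t").fetchall())
    conn.close()                         # the database vanishes here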
As an alternative to in memory databases, you can create a SQLite temporary database by using an empty string for the filename. It will be deleted when the connection is closed. The advantage over an in-memory database is your databases are not limited to available memory.
Alternatively, you can create your database in a temp file and let the operating system clean it up. This has the advantage of being accessible for inspection.
I'd suggest mounting a tmpfs filesystem somewhere (RAM only filesystem) and using that for your unit tests.
Instantiate DB files as normal then blow them away using rm - yet nothing has gone to disk.
(EDIT: Nice - somebody beat me to a correct answer ;) Leaving this here as another option regardless)
I suggest using the ImDisk toolkit.
It's a toolkit for mounting part of RAM or an image file as a normal disk. You can copy your project (or just your DB) there.
I've tried it to process raw data and create a DB for a game AI.
