Local SQLite vs Remote MongoDB

I'm designing a new web project and, after studying some options with scalability in mind, I came up with two database solutions:
Local SQLite files carefully laid out for scalability (one new database file for every X users, since writes depend on user content and there is no cross-user data dependence; see the sketch at the end of this question);
Remote MongoDB server (such as MongoLab), since my hosting provider doesn't offer MongoDB.
I don't trust the MySQL server at my current shared host, as it goes down very frequently (and I had problems with MySQL on another host, too). For the same reason I'm not going to use Postgres.
Pros of SQLite:
It's local, so it should be faster (I'll take care to use indexes and transactions properly);
I don't need to worry about TCP sniffing, as the Mongo wire protocol is not encrypted;
I don't need to worry about server outages, as SQLite is serverless.
Pros of MongoDB:
It's more easily scalable;
I don't need to worry about splitting databases, as scaling seems to come naturally;
I don't need to worry about schema changes, as Mongo is schemaless and SQLite doesn't fully support ALTER TABLE (especially when many production files would have to be changed, etc.).
I want help to make a decision (and maybe consider a third option). Which one is better as read and write volume grows?
I'm going to use Ruby.
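For concreteness, here is a minimal sketch of the per-X-users file layout mentioned above, with an index and a transaction. It uses Python's sqlite3 module purely for illustration (the same idea carries over to Ruby's sqlite3 gem); the shard size, paths, and table are assumptions, not part of the original design.

import os
import sqlite3

SHARD_SIZE = 1000  # "X" users per database file; illustrative value

def shard_path(user_id):
    # Map a user to the SQLite file that holds that user's data.
    os.makedirs("data", exist_ok=True)
    return os.path.join("data", "users_%d.db" % (user_id // SHARD_SIZE))

def save_post(user_id, body):
    con = sqlite3.connect(shard_path(user_id))
    con.execute("CREATE TABLE IF NOT EXISTS posts (user_id INTEGER, body TEXT)")
    con.execute("CREATE INDEX IF NOT EXISTS idx_posts_user ON posts(user_id)")
    with con:  # wraps the write in a transaction
        con.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)",
                    (user_id, body))
    con.close()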

One major risk of the SQLite approach is that as your requirements to scale increase, you will not be able to (easily) deploy on multiple application servers. You may be able to partition your users into separate servers, but if that server were to go down, you would have some subset of users who could not access their data.
Using MongoDB (or any other centralized service) alleviates this problem, as your web servers are stateless -- they can be added or removed at any time to accommodate web load without having to worry about what data lives where.

Related

Is CouchDb suitable for use on client desktops (Windows 10)?

I want to support the users of my Windows 10 desktop application with:
Local Data (not having to perform a fetch from the cloud every time they want new data)
Offline support
Replication with a cloud database
There can potentially be multiple users (on the order of 10-100 but not 1000) simultaneously editing the same database. I would run CouchDb as a service (i.e. in a separate process from my app).
To achieve this, I am considering installing CouchDb on each client desktop PC (all replicating to a single main cloud CouchDb instance) alongside my application.
One of the reasons I am pursuing this line of thinking is that it would let most of my application code be written as if it only dealt with local data, while the sync/replication (which is probably quite complex) is taken care of by CouchDb.
I am using CouchDb as a replacement for something I often see done with SQLite, but I really want the replication ability of CouchDb (which SQLite does not have).
Is the above a scenario in which I can expect CouchDb to perform well or is there something that I am not considering?
We have multiple clients doing this successfully. This is a recommended use case for CouchDB. The limiting factor will be the cloud VM configuration, but hundreds to thousands of clients should be no problem on a decent VM setup.
CouchDB is sometimes called "A replication protocol with a database attached to it." This sounds appropriate for your use case. However, installing CouchDB (or any external database/service), particularly as part of a client application, isn't necessarily trivial. That doesn't mean it should be avoided--it's just complexity to consider when making a choice.
You might consider lighter-weight options for your client. PouchDB is the go-to solution for web apps, and often mobile apps, as it syncs with CouchDB, but has much lower resource requirements (at the cost of not being multi-tenant). For a desktop app, that may be appropriate.
Ultimately, whether CouchDB (or PouchDB, or PouchDB Server, etc) is appropriate for your use case depends on which trade-offs you're willing to make.
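As a rough illustration of how little application code the replication itself needs, here is a sketch (Python with the requests library, for illustration only) that registers two continuous replication jobs in the local node's standard _replicator database, one per direction. The hostnames, credentials, and database name are placeholders.

import requests

LOCAL = "http://admin:secret@localhost:5984"      # local CouchDB node
CLOUD = "https://admin:secret@couch.example.com"  # cloud CouchDB instance

def start_sync(db_name):
    # Documents in _replicator describe jobs that CouchDB runs and resumes
    # by itself, including after the client has been offline for a while.
    for source, target in ((LOCAL + "/" + db_name, CLOUD + "/" + db_name),
                           (CLOUD + "/" + db_name, LOCAL + "/" + db_name)):
        resp = requests.post(LOCAL + "/_replicator",
                             json={"source": source,
                                   "target": target,
                                   "continuous": True,
                                   "create_target": True})
        resp.raise_for_status()

start_sync("contacts")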

Method to replicate sqlite database across multiple servers

I'm developing an application that runs distributed, and I have a SQLite database that must be shared between the distributed servers.
If I'm on serverA and change a SQLite row, that change must reach the other servers instantly; and if a server was offline and later comes back online, it must catch up so that its data matches the other servers.
I'm trying to develop an HA service with small SQLite databases.
I'm considering something like MongoDB or RethinkDB, because their replication works well and the data stays available regardless of which servers happen to be online.
Is there a library or another SQL-based approach to share data between servers?
I used the Raft consensus protocol to replicate my SQLite database. You can find the system here:
https://github.com/rqlite/rqlite
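For a sense of what using it looks like, here is a hedged sketch against rqlite's HTTP API (default port 4001), in Python with requests purely for illustration; the exact payload and response shapes may differ between rqlite versions.

import requests

BASE = "http://localhost:4001"  # rqlite's default HTTP port

# Writes go through the Raft log, so they are replicated to every node.
requests.post(BASE + "/db/execute", json=[
    "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)",
    ["INSERT INTO notes(body) VALUES(?)", "written on serverA"],
]).raise_for_status()

# Reads can be served by any node; the consistency level is configurable.
rows = requests.get(BASE + "/db/query",
                    params={"q": "SELECT id, body FROM notes"}).json()
print(rows)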
Here are some options:
LiteReplica:
It supports master-slave replication for SQLite3 databases using a single master (writable node) and one or many replicas (read-only nodes).
If a device goes offline and then comes back online, the secondary/slave DBs are updated incrementally from the primary/master one.
LiteSync:
It implements multi-master replication, so we can write to the DB on any node, even while the device is offline.
On both we open the database using a modified URI, like this:
"file:/path/to/app.db?replica=master&bind=tcp://0.0.0.0:4444"
AergoLite:
Blockchain based, it has the highest level of security. Stores immutable relational data, secured by a distributed consensus with low resource usage.
Disclosure: I am the author of these solutions
You can synchronize SQLite databases by embedding SymmetricDS in your application. It supports occasionally connected clients, so it will capture changes and sync them when a server comes online. It supports several different database platforms and can be used as a library or as a standalone service.
You can also use CopyCat, which supports SQLite as well as a few other database types.
Marmot looks good:
https://github.com/maxpert/marmot
From their docs:
What & Why?
Marmot is a distributed SQLite replicator with leaderless, and eventual consistency. It allows you to build a robust replication between your nodes by building on top of fault-tolerant NATS Jetstream. This means if you are running a read heavy website based on SQLite, you should be easily able to scale it out by adding more SQLite replicated nodes. SQLite is probably the most ubiquitous DB that exists almost everywhere, Marmot aims to make it even more ubiquitous for server side applications by building a replication layer on top.

Any real experiences of sharing a sqlite db on a network drive?

I have a client who is interested in hiring my company to do a small, custom, multi-user contact database/CRM. They don't care about the technology I use, as long as the product is hosted inside their organization (no "cloud" hosting). Now the problem is that their IT department refuses to host any application developed by an outside company on their servers; additionally, they will not allow any server not serviced by them inside their network.
The only means of sharing data that IT would allow is a windows network share...
I was thinking of building the application as a fat client in Adobe AIR and letting all users access a shared SQLite database, but then I read a lot of negative things about this approach.
So I'm asking you - Are there people out there who have actually tried this ?
What are your experiences ?
You can use an MS-Access 2007+ (accdb) file.
Of course there are many database engines with much more features and much better SQL syntax, but if you are looking for a file-based database system that can be accessed simultaneously by multiple processes on a shared Windows drive, then an accdb file is as good as you're going to get I think.
Also note that another popular embedded database, SQL Server Compact Edition, cannot be used on shared drives (at least not by multiple processes from different machines).
References:
Share Access Database on a Network Drive:
http://office.microsoft.com/en-us/access-help/ways-to-share-an-access-database-HA010279159.aspx#BM3
SQL Server CE Cannot be used on a shared drive:
SQLCE 4 - EF4.1 Internal error: Cannot open the shared memory region
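To make the shared-accdb suggestion concrete, here is a sketch of one client opening the file over a UNC path through ODBC (Python and pyodbc only as an illustration; the driver requires the Access Database Engine to be installed on each client, and the path and table name are assumptions).

import pyodbc

# UNC path to the accdb file on the Windows network share (placeholder).
conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=\\fileserver\share\crm.accdb;")
cur = conn.cursor()
cur.execute("SELECT TOP 5 * FROM Contacts")  # hypothetical table
for row in cur.fetchall():
    print(row)
conn.close()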
The way SQLite locks databases means you have to be careful if there's a chance that multiple sources will try to access the database at once. You either have to develop a waiting method, a timeout, or something similar.
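A minimal sketch of that "waiting method / timeout" idea, using Python's sqlite3 module for illustration (the path and table are placeholders; note that a timeout only stops writers from failing instantly, it does not make SQLite's locking reliable over SMB shares):

import sqlite3

# Wait up to 30 seconds for another writer's lock to clear instead of
# failing immediately with "database is locked".
con = sqlite3.connect(r"\\fileserver\share\contacts.db", timeout=30)
con.execute("PRAGMA busy_timeout = 30000")  # same idea at the SQLite level, in ms

with con:  # transaction; lock acquisition is retried until the timeout expires
    con.execute("UPDATE contacts SET phone = ? WHERE id = ?", ("555-0100", 1))
con.close()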

Can PostgreSQL scale to the likes of SQL Server? Is it easy to tune?

Hoping someone has experience with both SQL Server and PostgreSQL.
Between the two DBs, which one is easier to scale?
Is creating a read-only DB that mirrors the main DB easier or harder than with SQL Server?
Seeing as SQL Server can get expensive, I really want to look into PostgreSQL.
Also, are the DB access libraries for an ASP.NET application well written?
(Please, no comments along the lines of "do you need the scale", "worry about scaling later", or "don't optimize until you have scaling issues"... I just want to learn from a theoretical standpoint, thanks!)
Currently, setting up a read-only replica is probably easier with SQL Server. There's a lot of work going on to get hot standby and streaming replication part of the next release, though.
Regarding scaling, people are using PostgreSQL with massive databases. Skype uses PostgreSQL, and Yahoo has something based on PostgreSQL with several petabytes in it.
I've used PostgreSQL with C# and ASP.NET 2.0 and used the DB provider from Devart:
http://www.devart.com/dotconnect/postgresql/
The visual designer has a few teething difficulties but the connectivity was fine.
I have only used SQL Server and not much PostgreSQL, so I can only answer for SQL Server.
When scaling out SQL Server you have a couple of options. You can use peer-to-peer replication between databases, or you can have secondary read-only DB(s) as you mention. The latter option is relatively straightforward to set up using database mirroring or log shipping. Database mirroring also gives you the benefit of automatic switchover on primary DB failure. For an overview, look here:
http://www.microsoft.com/sql/howtobuy/passive-server-failover-support.mspx
http://technet.microsoft.com/en-us/library/cc917680.aspx
http://blogs.technet.com/josebda/archive/2009/04/02/sql-server-2008-database-mirroring.aspx
As for licensing, you only need a license for the standby server if it is actively used for serving queries - you do not need one for a pure standby server.
If you are serious, you can set up a failover cluster with a SAN for storage, but that is not really a load balancing setup in itself.
Here are some links on the general scale up/out topic:
http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-performance-scale.aspx
http://msdn.microsoft.com/en-us/library/aa479364.aspx
ASP.NET libraries are obviously very well written for SQL Server, but I believe good alternatives exist for PostgreSQL as well.

What's the best solution for file storage for a load-balanced ASP.NET app?

We have an ASP.NET file delivery app (internal users upload, external users download) and I'm wondering what the best approach is for distributing files so we don't have a single point of failure by only storing the app's files on one server. We distribute the app's load across multiple front end web servers, meaning for file storage we can't simply store a file locally on the web server.
Our current setup has us pointing at a share on a primary database/file server. Throughout the day we robocopy the contents of the share on the primary server over to the failover. This scenario ensures we have a secondary machine with fairly current data on it, but we want to get to the point where we can fail over from the primary to the failover and back again without data loss or errors in the front-end app. Right now it's a fairly manual process.
Possible solutions include:
Robocopy. Simple, but it doesn't easily allow you to fail over and back again without multiple jobs running all the time (copying data back and forth)
Store the file in a BLOB in SQL Server 2005. I think this could be a performance issue, especially with large files.
Use the FILESTREAM type in SQL Server 2008. We mirror our database so this would seem to be promising. Anyone have any experience with this?
Microsoft's Distributed File System. Seems like overkill from what I've read since we only have 2 servers to manage.
So how do you normally solve this problem and what is the best solution?
Consider a cloud solution like AWS S3. It's pay-for-what-you-use, scalable, and highly available.
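A sketch of what that looks like in code (Python and boto3 purely for illustration, since the app itself is ASP.NET; the bucket name is a placeholder and credentials come from the usual AWS configuration):

import boto3

s3 = boto3.client("s3")
BUCKET = "my-file-delivery-bucket"  # placeholder

def save_upload(local_path, key):
    # Any front-end web server can write the uploaded file...
    s3.upload_file(local_path, BUCKET, key)

def fetch_for_download(key, local_path):
    # ...and any other server can read it back; no shared disk, no robocopy.
    s3.download_file(BUCKET, key, local_path)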
You need a SAN with RAID. They build these machines for uptime.
This is really an IT question...
When there are a variety of different application types sharing information via the medium of a central database, storing file content directly into the database would generally be a good idea. But it seems you only have one type in your system design - a web application. If it is just the web servers that ever need to access the files, and no other application interfacing with the database, storage in the file system rather than the database is still a better approach in general. Of course it really depends on the intricate requirements of your system.
If you do not perceive DFS as a viable approach, you may wish to consider failover clustering of your file server tier, whereby your files are stored in external shared storage (not an expensive SAN, which I believe is overkill for your case since you've already ruled out DFS) connected to both the active and passive file servers. If the active file server goes down, the passive one can take over and continue reads/writes to the shared storage. The Windows 2008 clustering disk driver has been improved over Windows 2003 for this scenario (as per the article), which implies the need for a storage solution supporting SCSI-3 persistent reservation (PR) commands.
I agree with Omar Al Zabir on high availability web sites:
Do: Use Storage Area Network (SAN)
Why: Performance, scalability, reliability and extensibility. SAN is the ultimate storage solution. SAN is a giant box running hundreds of disks inside it. It has many disk controllers, many data channels, many cache memories. You have ultimate flexibility on RAID configuration, adding as many disks as you like in a RAID, sharing disks in multiple RAID configurations and so on. SAN has faster disk controllers, more parallel processing power and more disk cache memory than regular controllers that you put inside a server. So, you get better disk throughput when you use SAN over local disks. You can increase and decrease volumes on-the-fly, while your app is running and using the volume. SAN can automatically mirror disks, and upon disk failure, it automatically brings up the mirrored disks and reconfigures the RAID.
Full article is at CodeProject.
Because I don't personally have the budget for a SAN right now, I rely on option 1 (ROBOCOPY) from your post. But the files that I'm saving are not unique and can be recreated automatically if they die for some reason, so absolute fault tolerance isn't necessary in my case.
I suppose it depends on the type of download volume that you would be seeing. I am storing files in a SQL Server 2005 Image column with great success. We don't see heavy demand for these files, so performance is really not that big of an issue in our particular situation.
One of the benefits of storing the files in the database is that it makes disaster recovery a breeze. It also becomes much easier to manage file permissions as we can manage that on the database.
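For illustration, here is a minimal sketch of the write path for this approach (Python and pyodbc only as a stand-in for the real ASP.NET data access code; the connection string, table, and column names are assumptions):

import pyodbc

conn = pyodbc.connect("DRIVER={SQL Server};SERVER=dbserver;"
                      "DATABASE=FileStore;Trusted_Connection=yes;")
cur = conn.cursor()

with open("report.pdf", "rb") as f:
    data = f.read()

# A varbinary(max) (or legacy image) column holds the raw file bytes, so the
# file is backed up and permissioned along with the rest of the database.
cur.execute("INSERT INTO Files (Name, Content) VALUES (?, ?)",
            ("report.pdf", pyodbc.Binary(data)))
conn.commit()
conn.close()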
Windows Server has a File Replication Service that I would not recommend. We have used it for some time and it has caused a lot of headaches.
DFS is probably the easiest solution to set up, although depending on the reliability of your network it can become unsynchronized at times, which requires you to break the link and re-sync, which is quite painful to be honest.
Given the above, I would be inclined to use a SQL Server storage solution, as this reduces the complexity of your system rather than increasing it.
Do some tests to see if performance will be an issue first.
