sqlite over network share for failover - sqlite

As a follow-up of this question: sqlite-over-a-network-share
Suppose I put the SQLite DB on a network share but never access it concurrently from different machines. The database is stored on the share only so that a cluster of failover computers can take over where one machine left off.
Are there any inherent problems with that approach?

Interested in knowing your experiences (After 5 years). Per Eric Grange's helpful hint:
"SQLite uses POSIX advisory locks to implement locking on Unix"... "POSIX advisory locking is known to be buggy or even unimplemented on many NFS implementations" ... "Your best defense is to not use SQLite for files on a network filesystem."
Having said that, if your NFS server is rock-solid (ie, NetApp) and your clients are rock-solid (ie, probably not Linux; see for instance http://nfsworld.blogspot.co.at/2006/10/review-of-why-nfs-sucks-paper-from.html).
POSIX advisory locking over NFS is also implementation-dependent. From the File locking Wikipedia article: "On Linux prior to 2.6.12, flock calls on NFS files would act only locally. Kernel 2.6.12 and above implement flock calls on NFS files using POSIX byte-range locks. These locks will be visible to other NFS clients that implement fcntl-style POSIX locks, but invisible to those that do not." If in doubt, you can use nfstrace to determine what your OS is trying to do.
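As a rough complement to tracing, a client can at least check whether a POSIX byte-range lock request on the share succeeds at all. The sketch below is a minimal, Unix-only probe; the function name `probe_advisory_lock` and the scratch-file prefix are made up for illustration. A successful result only means the lock call did not fail on this client. It does not prove that other NFS clients will see or honour the lock.

```python
import fcntl
import os
import tempfile


def probe_advisory_lock(directory):
    """Try to take and release an exclusive fcntl lock on a scratch file.

    Returns True if the lock call succeeded, False if the OS reported an error
    (for example when the mount does not support locking).
    """
    # Hypothetical scratch file created inside the directory under test.
    fd, path = tempfile.mkstemp(dir=directory, prefix="lockprobe-")
    try:
        try:
            # Non-blocking exclusive POSIX byte-range lock over the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `probe_advisory_lock("/path/to/mount")` on each failover node gives a quick first signal before any real database is placed there.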
What happens if node A has begun a transaction, locked the table-file, then crashed? Will node B see the advisory lock and refuse to write to the file?
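Whatever the answer to the lock question turns out to be on a given NFS setup, the standby node can verify the file before it starts writing. The following is a minimal sketch, assuming the standby only opens the file after the failed node is known to be gone; `open_for_takeover` is a hypothetical helper name. Note that SQLite itself rolls back an interrupted transaction from a leftover journal when the database is next opened, so the check below runs after that recovery.

```python
import sqlite3


def open_for_takeover(db_path):
    """Open the database on the standby node and verify it before use."""
    conn = sqlite3.connect(db_path, timeout=5.0)
    # integrity_check returns a single row containing 'ok' when no problems are found.
    rows = conn.execute("PRAGMA integrity_check").fetchall()
    if rows != [("ok",)]:
        conn.close()
        raise RuntimeError(f"integrity_check reported problems: {rows!r}")
    return conn
```

On a large database `PRAGMA integrity_check` can take a while, so this fits a takeover step better than a per-request check.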

Related

Are Google Cloud Disks OK to use with SQLite?

Google Cloud disks are network disks that behave like local disks. SQLite expects a local disk so that locking and transactions work correctly.
A. Is it safe to use Google Cloud disks for SQLite?
B. Do they support the right locking mechanisms? How is this done over the network?
C. How do disk IOPS and throughput relate to SQLite performance? If I have a 1GB SQLite file with queries that take 40ms to complete locally, how many IOPS would this use? Which disk type should I choose (standard, balanced, SSD)?
Thanks.
Related
https://cloud.google.com/compute/docs/disks#pdspecs
Persistent disks are durable network storage devices that your instances can access like physical disks
https://www.sqlite.org/draft/useovernet.html
the SQLite library is not tested in across-a-network scenarios, nor is that reasonably possible. Hence, use of a remote database is done at the user's risk.
Yeah, the article you referenced essentially says that because reads and writes are "simplified" at the OS level, they can behave unpredictably, resulting in "lost in translation" issues when going from local to network to remote.
They also point out that it may very well work fine in testing, and perhaps in production for a time, but there are known side effects which are hard to detect and mitigate, so it's a slight gamble.
Again the implementation they are describing is not Google Cloud Disk, but rather simply stated as a remote networked arrangement.
My point is more that Google Cloud Disk may be more "virtual" rather than purely networked attached storage... to my mind that would be where to look, and evaluate it from there.
Check out this thread for some additional insight into the issues: https://serverfault.com/questions/823532/sqlite-on-google-cloud-persistent-disk
Additionally, I was looking around and found this thread, where one poster suggests using SQLite as a read-only asset and deploying updates through a far more controlled process.
https://news.ycombinator.com/item?id=26441125
The persistent disk acts like a normal disk in your VM and is only accessible to one VM at a time.
So it's safe to use; you won't lose any data.
For the performance part, you just have to test it for your specific workload. If you have plenty of spare RAM and your database is read-heavy with few writes, the whole database will be cached by the OS (Linux) disk cache, so it will be crazy fast, even on HDD storage.
But if you are low on spare RAM, then the database won't be in the OS cache. Writes are also always synced to disk, and that causes lots of I/O operations.
In that case, use the highest-performing disk you can afford or are willing to pay for.
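Since the advice is to test with your own workload, a small timing harness is enough to start. This is a minimal sketch, not a benchmark suite; `cold_and_warm_ms` is a hypothetical helper. It reports the first run separately from the median of the remaining runs, because the first run is more likely to hit the disk while later runs are usually served from cache.

```python
import sqlite3
import statistics
import time


def cold_and_warm_ms(conn, sql, params=(), repeats=20):
    """Time a query `repeats` times.

    Returns (first_run_ms, median_of_remaining_runs_ms).
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # fetchall() forces the full result set to be read, not just prepared.
        conn.execute(sql, params).fetchall()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples[0], statistics.median(samples[1:])
```

Running it against the same database file on each candidate disk type, with one of your real queries, gives numbers that can be compared directly. For a truly cold first run the OS page cache has to be empty, for example right after the VM boots.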

percona xtrabackup incremental backup vs replication

I was playing with Percona XtraBackup (innobackupex) for incremental backups. It is a cool tool, and very efficient and effective for incremental backups. However, I could not help but wonder why doing incremental backups would be any better than just running regular MySQL master-slave replication and using the binary log whenever point-in-time data needs to be retrieved.
What advantages do incremental backups have over master-slave replication? When should you choose one over the other?
One disadvantage to using master-slave replication as a backup is that accidentally running data damaging commands like
DROP TABLE users;
would replicate to the slave.
They are solutions to two different problems; master-slave is redundancy and backup is resilience.
The MySQL JDBC driver has the ability to connect to many servers. If you look at the driver options (https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-url-format.html) you will notice that the host option is not only host, but hosts. If you specify the URL to both the master and the slave and something happens to the master the driver will automatically connect to the slave instead.
Backup, on the other hand, is, as was mentioned earlier, a way to recover from either a catastrophic crash (having your backups stored off-site is a must) or recover from a catastrophic mistake -- neither of which is served by a master-slave setup. (Well, technically you could have the slave at a different site but that still does not cover the mistake scenario)

SqLite3 NFS mount issue with locking - can I use something like CIFS nobrl?

I'm having a locking problem where an SQLite3 database is permanently locked when created on an NFS file system. I have read that an option called nobrl can help with this issue when the file system in question is CIFS (it's an option to the mount command).
From: http://linux.die.net/man/8/mount.cifs
nobrl
Do not send byte range lock requests to the server. This is
necessary for certain applications that break with cifs style
mandatory byte range locks (and most cifs servers do not yet support
requesting advisory byte range locks).
Is there any way to stop byte-range-lock requests in NFS if they occur, or am I running in the wrong direction by even thinking about this? I'm happy to change the mount command as was done for the CIFS solution.
I recommend opening your SQLite database from your software with the nolock parameter enabled. For example, in Go:
sql.Open("sqlite3", "file:/media/R/Databases//your.db?nolock=1")
Here /media/R is a mounted Windows NFS network drive. Be careful: with nolock you have to serialize your database interactions in your own software, otherwise you could corrupt the database when accessing it simultaneously.
You can read more about sqlite parameters here:
https://www.sqlite.org/c3ref/open.html
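For comparison, here is the same idea as a minimal Python sketch using the standard sqlite3 module, which accepts the same `file:` URI and `nolock=1` parameter when `uri=True` is passed. The class name `SerializedDb` is made up for illustration, and the path is assumed to contain no characters that need URI escaping. The `threading.Lock` only serializes access within this one process. It does nothing to protect against a second process or a second machine opening the same file.

```python
import sqlite3
import threading


class SerializedDb:
    """One connection opened with nolock=1, guarded by an in-process lock."""

    def __init__(self, path):
        # uri=True makes sqlite3 interpret the "file:" URI and its query parameters.
        self._conn = sqlite3.connect(
            f"file:{path}?nolock=1", uri=True, check_same_thread=False
        )
        self._lock = threading.Lock()

    def execute(self, sql, params=()):
        # Only one thread at a time may touch the connection.
        with self._lock:
            # The connection context manager commits on success
            # and rolls back if the statement raises.
            with self._conn:
                return self._conn.execute(sql, params).fetchall()

    def close(self):
        with self._lock:
            self._conn.close()
```

Because SQLite's own file locking is switched off here, this is only reasonable when you can guarantee that exactly one process has the database open.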

locked s3db-journal

A few days ago we had a strange error with SQLite. We use an SQLite database on a network share with several computers accessing it. Our client reported that the database was gone. A quick look showed that the database was still there, but no computer could access it. There was also an s3db-journal file, indicating that someone was accessing the database when something happened. The strange thing is that the s3db-journal file was locked by the file system (we could not copy or delete it). After restarting all applications, the locked file disappeared, as it should.
How does this happen? We would like to work out how our client got into this situation. We know that there was faulty network cabling to one of the computers.
Thank you for your help.
Tobias
To clarify this: several = up to 10 computers
From the "Appropriate uses for SQLite" page:
If you have many client programs accessing a common database over a network, you should consider using a client/server database engine instead of SQLite. SQLite will work over a network filesystem, but because of the latency associated with most network filesystems, performance will not be great. Also, the file locking logic of many network filesystems implementation contains bugs (on both Unix and Windows). If file locking does not work like it should, it might be possible for two or more client programs to modify the same part of the same database at the same time, resulting in database corruption. Because this problem results from bugs in the underlying filesystem implementation, there is nothing SQLite can do to prevent it.
A good rule of thumb is that you should avoid using SQLite in situations where the same database will be accessed simultaneously from many computers over a network filesystem.
It very well might be a bug in the network filesystem you're using. Either way, the SQLite developers explicitly recommend against using databases on network filesystems.
The issue is resolved. The database-component (zeos) threw an exception and we tried a rollback. Due to the way the component was designed, this is only allowed when you started a transaction. If you don't you get the locked s3db-journal file.
In the end we learned two things: first, never roll back when you have not started a transaction; second, Zeos provides an InTransaction function to check for exactly that.
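The same guard can be illustrated outside Zeos. The sketch below uses Python's standard sqlite3 module purely as a stand-in, not the Zeos component from the question; its `in_transaction` attribute plays the role that InTransaction plays in Zeos, and `rollback_if_needed` is a hypothetical helper name.

```python
import sqlite3


def rollback_if_needed(conn):
    """Roll back only when a transaction is actually open.

    Returns True if a rollback was performed, False otherwise.
    """
    if conn.in_transaction:
        conn.rollback()
        return True
    return False
```

In an exception handler, calling `rollback_if_needed(conn)` instead of an unconditional rollback keeps the error path from issuing a rollback that has nothing to undo.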

What's the best solution for file storage for a load-balanced ASP.NET app?

We have an ASP.NET file delivery app (internal users upload, external users download) and I'm wondering what the best approach is for distributing files so we don't have a single point of failure by only storing the app's files on one server. We distribute the app's load across multiple front end web servers, meaning for file storage we can't simply store a file locally on the web server.
Our current setup has us pointing at a share on a primary database/file server. Throughout the day we robocopy the contents of the share on the primary server over to the failover. This scenario ensures we have a secondary machine with fairly current data on it, but we want to get to the point where we can fail over from the primary to the failover and back again without data loss or errors in the front end app. Right now it's a fairly manual process.
Possible solutions include:
Robocopy. Simple, but it doesn't easily allow you to fail over and back again without multiple jobs running all the time (copying data back and forth)
Store the file in a BLOB in SQL Server 2005. I think this could be a performance issue, especially with large files.
Use the FILESTREAM type in SQL Server 2008. We mirror our database so this would seem to be promising. Anyone have any experience with this?
Microsoft's Distributed File System. Seems like overkill from what I've read since we only have 2 servers to manage.
So how do you normally solve this problem and what is the best solution?
Consider a cloud solution like AWS S3. It's pay for what you use, scalable and has high availability.
You need a SAN with RAID. They build these machines for uptime.
This is really an IT question...
When there are a variety of different application types sharing information via the medium of a central database, storing file content directly into the database would generally be a good idea. But it seems you only have one type in your system design - a web application. If it is just the web servers that ever need to access the files, and no other application interfacing with the database, storage in the file system rather than the database is still a better approach in general. Of course it really depends on the intricate requirements of your system.
If you do not perceive DFS as a viable approach, you may wish to consider Failover clustering of your file server tier, whereby your files are stored in an external shared storage (not an expensive SAN, which I believe is overkill for your case since DFS is already out of your reach) connected between Active and Passive file servers. If the active file server goes down, the passive may take over and continue read/writes to the shared storage. Windows 2008 clustering disk driver has been improved over Windows 2003 for this scenario (as per article), which indicates the requirement of a storage solution supporting SCSI-3 (PR) commands.
I agree with Omar Al Zabir on high availability web sites:
Do: Use Storage Area Network (SAN)
Why: Performance, scalability, reliability and extensibility. SAN is the ultimate storage solution. SAN is a giant box running hundreds of disks inside it. It has many disk controllers, many data channels, many cache memories. You have ultimate flexibility on RAID configuration, adding as many disks you like in a RAID, sharing disks in multiple RAID configurations and so on. SAN has faster disk controllers, more parallel processing power and more disk cache memory than regular controllers that you put inside a server. So, you get better disk throughput when you use SAN over local disks. You can increase and decrease volumes on-the-fly, while your app is running and using the volume. SAN can automatically mirror disks and upon disk failure, it automatically brings up the mirrors disks and reconfigures the RAID.
Full article is at CodeProject.
Because I don't personally have the budget for a SAN right now, I rely on option 1 (Robocopy) from your post. But the files that I'm saving are not unique and can be recreated automatically if they are lost for some reason, so absolute fault tolerance is not necessary in my case.
I suppose it depends on the type of download volume that you would be seeing. I am storing files in a SQL Server 2005 Image column with great success. We don't see heavy demand for these files, so performance is really not that big of an issue in our particular situation.
One of the benefits of storing the files in the database is that it makes disaster recovery a breeze. It also becomes much easier to manage file permissions as we can manage that on the database.
Windows Server has a File Replication Service that I would not recommend. We have used it for some time and it has caused a lot of headaches.
DFS is probably the easiest solution to set up, although depending on the reliability of your network it can become unsynchronized at times, which requires you to break the link and re-sync, which is quite painful to be honest.
Given the above, I would be inclined to use a SQL Server storage solution, as this reduces the complexity of your system rather than increasing it.
Do some tests to see if performance will be an issue first.

Resources