Database Access in R: MariaDB vs RODBC, dplyr etc

I'm a noob so forgive my ignorance :-)
I'm creating a Shiny app to perform read/write operations on an existing MS Access database (.mdb) with about 20 small tables and a variety of joins on them. I may have a small number of people connecting simultaneously.
I had planned to use MariaDB (via RMariaDB), but I notice that e.g. RODBC or dplyr can apparently connect to the .mdb directly? The app will be hosted on a remote server.
It's unlikely the largest table will exceed 5000 rows in the next 2 years. Should I be using MariaDB now or would the other 'direct' options be enough?
Many thanks in advance for your replies...
Gary

This is not an answer to 'which db shall I use?', but it is a recommendation for a starting point:
https://db.rstudio.com/
The short answer is 'ensure DBI compatibility' and the slightly longer answer is 'for Shiny apps consider using the pool package'.
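For example, a minimal sketch of the DBI + pool pattern in a Shiny app, assuming an ODBC DSN named "access_db" pointing at the .mdb file is available on the host (the DSN and the customers table are hypothetical):

```r
library(shiny)
library(DBI)
library(pool)

# One pool of connections, shared by every session of the app.
pool <- dbPool(odbc::odbc(), dsn = "access_db")
onStop(function() poolClose(pool))  # close connections on app shutdown

ui <- fluidPage(tableOutput("tbl"))

server <- function(input, output, session) {
  output$tbl <- renderTable({
    # pool checks a connection out, runs the query, and checks it back in.
    dbGetQuery(pool, "SELECT * FROM customers")
  })
}

shinyApp(ui, server)
```

The payoff of staying DBI-compatible is that the same app code would run against MariaDB just by swapping the driver, e.g. dbPool(RMariaDB::MariaDB(), host = ..., dbname = ...).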

Related

Is it possible to host SQLite as a separate process

If we have 500 processes accessing SQLite, is it possible to host it as a separate process, so that the 500 processes do not each have to perform I/O?
These processes could then attach to that one instance of SQLite and access the data. Is this possible with SQLite?
The short answer is no. SQLite isn't a client/server database, it's just code linked into your application/process. There are 3rd party client/server implementations of SQLite, but I've never used one and can't speak to their quality. It sounds like you may be better off looking at client/server dbs such as PostgreSQL or MySQL.
It might also be worth reading Appropriate Uses For SQLite to see if your particular use case is a good fit for SQLite or not.
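Since the parent thread here is R, a minimal DBI sketch of the embedded-vs-client/server difference, assuming the RSQLite and RPostgres packages (host, credentials, and table name are hypothetical):

```r
library(DBI)

# Embedded SQLite: every process links the SQLite library and does
# its own file I/O and locking against app.db.
con_lite <- dbConnect(RSQLite::SQLite(), "app.db")

# Client/server PostgreSQL: one server process does the I/O, and all
# clients talk to it over a socket.
con_pg <- dbConnect(
  RPostgres::Postgres(),
  host     = "db.example.com",          # hypothetical host
  dbname   = "app",
  user     = "app_user",
  password = Sys.getenv("PGPASSWORD")
)

dbGetQuery(con_pg, "SELECT count(*) AS n FROM jobs")  # hypothetical table
dbDisconnect(con_lite)
dbDisconnect(con_pg)
```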

Method to replicate a SQLite database across multiple servers

I'm developing a distributed application, and I have a SQLite database that must be shared between the distributed servers.
If I'm on serverA and change a SQLite row, that change must appear on the other servers instantly; and if a server was offline when the change happened, it must catch up when it comes back online so its data matches the other servers.
I'm trying to build an HA service around small SQLite databases.
I'm considering something like MongoDB or RethinkDB, because their replication works well and the data stays available regardless of which servers are online.
Is there a library or other SQL-based approach for sharing data between servers?
I used the Raft consensus protocol to replicate my SQLite database. You can find the system here:
https://github.com/rqlite/rqlite
Here are some options:
LiteReplica:
It supports master-slave replication for SQLite3 databases using a single master (writable node) and one or many replicas (read-only nodes).
If a device goes offline and later comes back online, the secondary/replica dbs are updated incrementally from the primary/master.
LiteSync:
It implements multi-master replication, so we can write to the db on any node, even while a device is offline.
On both we open the database using a modified URI, like this:
file:/path/to/app.db?replica=master&bind=tcp://0.0.0.0:4444
AergoLite:
Blockchain-based, it has the highest level of security. It stores immutable relational data, secured by distributed consensus, with low resource usage.
Disclosure: I am the author of these solutions.
You can synchronize SQLite databases by embedding SymmetricDS in your application. It supports occasionally connected clients, so it will capture changes and sync them when a server comes online. It supports several different database platforms and can be used as a library or as a standalone service.
You can also use CopyCat, which supports SQLite as well as a few other database types.
Marmot looks good:
https://github.com/maxpert/marmot
From their docs:
What & Why?
Marmot is a distributed SQLite replicator with leaderless, and eventual consistency. It allows you to build a robust replication between your nodes by building on top of fault-tolerant NATS Jetstream. This means if you are running a read heavy website based on SQLite, you should be easily able to scale it out by adding more SQLite replicated nodes. SQLite is probably the most ubiquitous DB that exists almost everywhere, Marmot aims to make it even more ubiquitous for server side applications by building a replication layer on top.

Should you use ODBC or Registry Entry for data connections?

I support a group of developers who are telling me to set up a registry entry for an application they made in ASP.NET to connect to our SQL backend. Would it not be better to do this with an ODBC connection? Is this lazy programming, or is this common practice?
If all their connections are in registry entries, how will I be able to spin up the DRP site in case we have an issue? Right now we replicate the content across, and it would be a heck of a lot easier if the DB connections were in ODBC instead of having to redo all these registry entries (there are multiple apps doing this).
Please fill me in. Thanks
Why are they not using the web.config to store connection strings? http://msdn.microsoft.com/en-us/library/ms178411.aspx
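For reference, a minimal sketch of such a web.config entry (the connection name, server, and database are hypothetical); the app can then read it via ConfigurationManager.ConnectionStrings["AppDb"]:

```xml
<configuration>
  <connectionStrings>
    <!-- Lives alongside the app, so it replicates to the DRP site with the content. -->
    <add name="AppDb"
         connectionString="Data Source=mySqlServer;Initial Catalog=AppDb;Integrated Security=True"
         providerName="System.Data.SqlClient" />
  </connectionStrings>
</configuration>
```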

Accessing SQL Server Cluster from ASP.Net

I'm a total unix-way guy, but now our company is building a new application on an ASP.NET + SQL Server cluster platform.
I know the general principles and ways of scaling load, but I want to learn the MS background of horizontal scaling.
The question is pretty simple: are there any built-in abilities in ASP.NET to access the least-loaded SQL server in a SQL Server cluster?
Any words, libs, links are highly appreciated.
I also would be glad to hear best SQL Server practices or success stories around this theme.
Thank you.
Pavel
SQL Server clustering is not load balancing, it is for high-availability (e.g. one server dies, cluster is still alive).
If you are using SQL Server clustering, the cluster is active/passive, in that only one server in the cluster ever owns the SQL instance, so you can't split load across both of them.
If you have two databases you're using, you can create two SQL instances and have one server in the cluster own one of the two instances, and the other server own the other instance. Then, point connection strings for one database to the first instance, and connection strings for the second database to the second instance. If one of the two instances fails, it will failover to the passive server for that instance.
An alternative (still not load balancing, but easier to set up IMO than clustering) is database mirroring: http://msdn.microsoft.com/en-us/library/ms189852.aspx. For mirroring, you specify the partner server name in the connection string:
Data Source=myServerAddress;Initial Catalog=myDataBase;User Id=myUsername;Password=myPassword;Failover Partner=myBackupServerAddress;
ADO.NET will automatically switch to the failover partner if the primary fails.
Finally, another option to consider is replication. If you replicate a primary database to several subscribers, you can split your load to the subscribers. There is no built-in functionality that I am aware of to split the load, so your code would need to handle that logic.

Can PostgreSQL scale to the likes of SQL Server? Is it easy to tune?

Hoping someone has experience with both SQL Server and PostgreSQL.
Between the two DBs, which one is easier to scale?
Is creating a read-only db that mirrors the main db easier or harder than with SQL Server?
Seeing as SQL Server can get $$, I really want to look into PostgreSQL.
Also, are the db access libraries written well for an ASP.NET application?
(Please, no comments like "do you need the scale", "worry about scaling later", or "don't optimize until you have scaling issues"... I just want to learn from a theoretical standpoint, thanks!)
Currently, setting up a read-only replica is probably easier with SQL Server. There's a lot of work going on to get hot standby and streaming replication into the next release, though.
Regarding scaling, people are using PostgreSQL with massive databases. Skype uses PostgreSQL, and Yahoo has something based on PostgreSQL with several petabytes in it.
I've used PostgreSQL with C# and ASP.NET 2.0, using the db provider from Devart:
http://www.devart.com/dotconnect/postgresql/
The visual designer has a few teething difficulties but the connectivity was fine.
I have only used SQL Server and not much PostgreSQL, so I can only answer for SQL Server.
When scaling out SQL Server you have a couple of options. You can use peer-to-peer replication between databases, or you can have secondary read-only DB(s) as you mention. The latter option is relatively straightforward to set up using database mirroring or log shipping. Database mirroring also gives you the benefit of automatic switchover on primary DB failure. For an overview, look here:
http://www.microsoft.com/sql/howtobuy/passive-server-failover-support.mspx
http://technet.microsoft.com/en-us/library/cc917680.aspx
http://blogs.technet.com/josebda/archive/2009/04/02/sql-server-2008-database-mirroring.aspx
As for licensing, you only need a license for the standby server if it is actively used for serving queries - you do not need one for a pure standby server.
If you are serious, you can set up a failover cluster with a SAN for storage, but that is not really a load balancing setup in itself.
Here are some links on the general scale up/out topic:
http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-performance-scale.aspx
http://msdn.microsoft.com/en-us/library/aa479364.aspx
ASP.NET libraries are obviously very well written for SQL Server, but I would expect good alternatives to exist for PostgreSQL as well.