Mock a MariaDB Socket connection for test purposes - mariadb

I have been looking for a way to mock a MariaDB connection for test purposes.
The code I need to test opens a connection to a MariaDB server through unix socket.
Throughout the code, it updates rows.
The behavior I want to test has nothing to do with what is done in the database.
What service could I run in order to mock a MariaDB server that would accept every connection, as well as any insert or update query, without having to create databases, tables and so on?
Note: The closest solution I could find was this
https://dev.mysql.com/doc/dev/mysql-server/latest/PAGE_MYSQL_SERVER_MOCK.html
But it doesn't actually fit my purpose.

Related

Apache Airflow Connection hook is instantiating multiple connections

Background: Apache Airflow documentation reads:
Hooks Hooks act as an interface to communicate with the external
shared resources in a DAG. For example, multiple tasks in a DAG can
require access to a MySQL database. Instead of creating a connection
per task, you can retrieve a connection from the hook and utilize it.
I have tried spawning 10 tasks using different DBs: MySQL, Postgres, MongoDB. Please note that I am using one DB (e.g. MySQL) in one DAG (consisting of 10 tasks).
But all tasks are instantiating a new connection.
Example of my task:
conn_string = kwargs.get('conn_id')             # the Airflow connection id defined in the web UI
pg = PostgresHook(postgres_conn_id=conn_string)
pg_query = "...."
records = pg.get_records(pg_query)
Why is Airflow instantiating a new connection when the Airflow documentation itself reads "... multiple tasks in a DAG can require access to a MySQL database. Instead of creating a connection per task, you can retrieve a connection from the hook and utilize it ..."?
What is being missed here?
I believe what they mean by that part of the documentation is that hooks prevent you from redefining the same credentials over and over again. By "connection" they are referring to an Airflow connection you define in the web interface, not an actual network connection to a host.
If you think about it this way:
A task can be scheduled on any of the 3 Airflow worker nodes.
Your 10 tasks are divided between these 3.
How would they be able to share the same network connection if they run on different hosts? It would be very hard to maintain those connections across workers.
But don't worry, it also took ages for me to understand what they meant there.

Connecting to the databases of a MockNetwork in Corda

Often, when writing a MockNetwork test I want to connect to the databases of the nodes and interactively query them. Is there a way of doing this?
see this class and example
https://gist.github.com/dazraf/01115f0d376647f99e8fc453ba07251c
Essentially, it starts the H2 TCP server and dumps the JDBC connection string for each node.
It has a method to block the test whilst you interactively query the DBs.

How can I perform normal R functions for Hadoop remotely on SQL Server?

How can I perform normal R code on a SQL Server without using the Microsoft rx functions? I think the compute context "RxInSqlServer" isn't the right one? But I couldn't find good information about the other compute context options.
Is this possible with this statement?
rxSetComputeContext(ComputeContext)
Or can I only use it to perform rx functions? Another option could be to set the server connection in RStudio or Visual Studio?
My problem is: I want to analyse data from Hadoop via an ODBC connection on the SQL Server, so I would like to use the performance of the remote SQL Server and not the data in SQL Server. And then I want to analyse the Hadoop data with sparklyr.
Summary: I want to use the performance of the remote server and not the SQL Server data. So RStudio should not run locally; it should use the compute power and memory of the remote server.
Thanks!
The concept of a compute context in Microsoft R Server is, “Where will the computation be performed?”
When setting the compute context, you are telling Microsoft R Server where the computation will occur: either on the local machine (with the "local" or "localpar" compute contexts), or on a remote machine which has Microsoft R Server installed on it. Remote compute contexts are defined by creating a compute context object and then setting the context to that object.
For SQL Server, you would create an RxInSqlServer() object, and then call rxSetComputeContext() on that object. For Hadoop, the object would be created via the RxHadoopMR() call.
In code, it would look something like:
CC <- RxHadoopMR( < context defined here > )  # cluster connection details (SSH host, user, shared directories, ...)
rxSetComputeContext(CC)                       # subsequent rx calls now run on the Hadoop cluster
To see usage on defining a context, please see documentation (Enter "?RxHadoopMR" in the R Client, no quotes).
Any call to an "rx" function after this will be performed on the Hadoop cluster, with no data being transferred to the client other than the results.
RxInSqlServer() would follow the same pattern.
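For illustration, a minimal sketch of that same pattern for SQL Server (the server, database and credentials in the connection string are placeholders, not values from the question):
sqlConnString <- "Driver=SQL Server;Server=myServer;Database=myDatabase;Trusted_Connection=True"  # placeholder
sqlCC <- RxInSqlServer(connectionString = sqlConnString)  # define the remote SQL Server compute context
rxSetComputeContext(sqlCC)                                # rx calls now execute inside SQL Server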
Note: To perform any remote computation, Microsoft R Server must be installed on that machine.
If you wish to run a standard R function on a remote compute context, you must wrap that function in a call to rxExec(). rxExec() is designed as an interface to parallelize any open-source R function and allow for its execution on a remote context. Please see the documentation (enter "?rxExec" in the R Client, no quotes) for usage.
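As a hedged sketch (assuming a remote compute context such as the RxInSqlServer() one above has already been set), wrapping an ordinary open-source R function in rxExec() could look like this; the function itself is just a made-up example:
summariseSample <- function(n) {
    summary(rnorm(n))        # plain open-source R code, no rx functions inside
}
# rxExec() ships the function to the current compute context and returns its result(s) as a list.
results <- rxExec(summariseSample, n = 1000)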
For information on efficient parallelization, please see this blog: https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2016/11/14/performance-optimization-when-using-rxexec-to-parallelize-algorithms/
You called out "without using the Microsoft rx-functions", and I am interpreting this as "I would like to use open-source R algorithms on data in SQL Server". With Microsoft R Server, you must use rxExec() as the interface to run open-source R. If you want to use no rx functions at all, you will need to query the data to your local machine and then use open-source R. To interface with a remote context using Microsoft R Server, the bare minimum is using rxExec().
This is how you will be able to achieve the first part of your ask: "How can I perform normal R code on a SQL Server without using the Microsoft rx functions? I think the compute context "RxInSqlServer" isn't the right one?"
For your second ask: "My problem is: I want to analyse data from Hadoop via an ODBC connection on the SQL Server, so I would like to use the performance of the remote SQL Server and not the data in SQL Server. And then I want to analyse the Hadoop data with sparklyr."
First, I'd like to comment that with the release of Microsoft R Server 9.1, you can use sparklyr in line with an MRS Spark connection; for some examples, please see this blog: https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/04/19/new-features-in-9-1-microsoft-r-server-with-sparklyr-interoperability/
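As a rough, hedged sketch of that interoperability (assuming Microsoft R Server 9.1+ on a Spark cluster; the data here is just R's built-in mtcars):
library(RevoScaleR)
library(sparklyr)
library(dplyr)
cc <- rxSparkConnect(interop = "sparklyr")   # MRS Spark compute context with sparklyr interop enabled
sc <- rxGetSparklyrConnection(cc)            # extract the sparklyr connection object
mtcars_tbl <- copy_to(sc, mtcars)            # from here on, plain sparklyr/dplyr against the cluster
mtcars_tbl %>% summarise(avg_mpg = mean(mpg))
rxSparkDisconnect(cc)                        # also closes the sparklyr connection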
Secondly, what you are trying to do is very involved. I can think of two ways that this is possible.
One is, if you have SQL Server PolyBase, you can configure SQL Server to make a virtual table referencing data in Hadoop, similar to Hive. After you have referenced your Hadoop data in SQL Server, you would use an RxInSqlServer() compute context on these tables. This would analyse the data in SQL Server and return the results to the client.
Here is a detailed blog explaining an end-to-end setup on Cloudera and SQL Server: https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2016/10/17/integrating-polybase-with-cloudera-using-active-directory-authentication/
The second, which I would NOT recommend, is untested, hacky, and has the following prerequisites:
1) Your Hadoop cluster must have OpenSSH installed and configured
2) Your SQL Server Machine must have the ability to SSH into your Hadoop Cluster
3) You must be able to place an SSH Key on your SQL Server machine in a directory which the R Services process has the ability to access
And I need to add another disclaimer here: there is no guarantee of this working, and it likely will not work. The software was not designed to operate in this fashion.
You would then do the following (a rough code sketch follows these steps):
On your client machine, you would define a custom function which contains the analysis that you wish to perform; this can be open-source R functions, rx functions, or a mix.
In this custom function, before calling any other R or rx functions, you would define an RxHadoopMR compute context object which points to your cluster, referencing the SSH key in the directory on the SQL Server machine as if you were executing from that machine (in the same way that you would define the RxHadoopMR object if you were to do a remote Hadoop operation from your client machine).
Within this custom function, immediately after RxHadoopMR() is defined, you would call rxSetComputeContext() on your defined RxHadoopMR() object.
Still in this custom function, write the actual script which will operate on the data in Hadoop.
After this function is defined, you would define an RxInSqlServer() compute context object on the client machine.
You would set your compute context to RxInSqlServer()
Then you would call rxExec() with your custom function as an input.
What this will do is execute your custom function on the SQL Server machine, which would hopefully cause it to define its compute context as your Hadoop cluster, and pull the data over SSH for analysis on the SQL Server machine, returning the results to the client.
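Purely as an untested sketch of the steps above (every host name, user name and path below is a placeholder, and, as the disclaimer says, this may well not work):
# Custom function, written as if it were executing on the SQL Server machine.
remoteHadoopAnalysis <- function() {
    hadoopCC <- RxHadoopMR(sshUsername = "hadoopuser",                   # placeholder user
                           sshHostname = "hadoop-edge-node",             # placeholder host
                           sshSwitches = "-i C:/RServices/keys/id_rsa")  # SSH key readable by the R Services process
    rxSetComputeContext(hadoopCC)
    # ... the actual analysis of the Hadoop data goes here ...
}
# On the client: send the custom function to SQL Server for execution.
sqlCC <- RxInSqlServer(connectionString = sqlConnString)  # same placeholder connection string as above
rxSetComputeContext(sqlCC)
rxExec(remoteHadoopAnalysis)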
With that said, this is not how Microsoft R Server was designed to be used, and if you wish to optimize performance, please use Option One and configure PolyBase.

SQL Server session in CLUSTER

Can anyone help me with this ...
I have a 3-node SQL Server cluster, let's say N1, N2 and N3. The name of the three-node cluster is SQLCLUS.
The application connects to the database using the name SQLCLUS in the connection string.
The application uses SQL Server session management. So I remote desktopped to N1 (which is active while N2 and N3 are passive), and from the location
C:\Windows\Microsoft.NET\Framework64\v2.0.50727
I executed the following command
aspnet_regsql.exe -S SQLCLUS -E -ssadd -sstype p
The command executed successfully. I could then log in to SQLCLUS and see the ASPState database created with 2 tables.
I then tested the application which uses the SQL Server session, and it also works fine.
Now my question is ...
If there is a failover to node N2 or N3, will my application still work? I did not execute the above command (aspnet_regsql.exe) from N2.
Should I execute the command, aspnet_regsql.exe -S SQLCLUS -E -ssadd -sstype p, on N2 and N3 too?
What changes happen in SQL Server after executing the above command? I mean, is there any kind of service or settings change that can be seen?
Greatly appreciate any input regarding this.
Thanks in advance...
SQL Server failover clustering can be conceptually explained as a smoke-and-mirrors DNS hack. Thinking of clustering in familiar terms makes you realize how simple a technology it really is.
Simplified description of Sql Server Failover Clustering
Imagine you have two computers: SrvA and SrvB
You plug an external HD (F:) into SrvA, install SQL Server and configure it to store its database files on F:\ (the executable is under C:\Program Files).
Unplug the HD, plug it into SrvB, install SQL Server and configure it to store its database files on F:\ in the exact same location.
Now, you create a DNS alias "MyDbServer" that points to SrvA, plug the external HD back into SrvA, and start SQL Server.
All is good until one day when the power supply fails on SrvA and the machine goes down.
To recover from this disaster you do the following:
Plug the external drive into SrvB
Start SQL Server on SrvB
Tweak the DNS entry for "MyDbServer" to point to SrvB.
You're now up and going on SrvB, and your client applications are blissfully unaware that SrvA failed because they only ever connected using the name "MyDbServer".
Failover Clustering in the Reality
SrvA and SrvB are the cluster nodes.
The External HD is Shared SAN Storage.
The three step recovery process is what happens during a cluster failover and is managed automatically by the Windows Failover Clustering service.
What kinds of tasks need to be run on each Sql Node?
99.99% of the tasks that you perform in SQL Server will be stored in the database files on shared storage and will therefore move between nodes during a failover. This includes everything from creating logins, creating databases, INSERTs/UPDATEs/DELETEs on tables, SQL Agent jobs and just about everything else you can think of. This also includes all of the tasks that the aspnet_regsql command performs (it does nothing special from a database perspective).
The remaining 0.01% of things that would have to be done on each individual node (because they aren't stored on shared storage) are things like applying service packs (remember that the executable is on C:), certain registry settings (some SQL Server registry settings are "checkpointed" and fail over, some aren't), registering 3rd-party COM DLLs (no one really does this anymore) and changing the service account that SQL Server runs under.
Try it for yourself
If you want to verify that aspnet_regsql doesn't need to be run on each node, then try failing over and verify that your app still works. If you do run aspnet_regsql on each node and reference the clustered name (SQLCLUS), then you will effectively be overwriting the database, so if it doesn't error out, it will just wipe out your existing data.

Accessing SQL Server Cluster from ASP.Net

I'm a total unix-way guy, but now our company creates a new application under ASP.NET + SQL Server cluster platform.
So I know the best and most efficient principles and ways to scale the load, but I want to know the MS background of horizontal scaling.
The question is pretty simple – are there any built-in abilities in ASP.Net to access the least loaded SQL server from SQL Server cluster?
Any words, libs, links are highly appreciated.
I also would be glad to hear best SQL Server practices or success stories around this theme.
Thank you.
Pavel
SQL Server clustering is not load balancing, it is for high-availability (e.g. one server dies, cluster is still alive).
If you are using SQL Server clustering, the cluster is active/passive, in that only one server in the cluster ever owns the SQL instance, so you can't split load across both of them.
If you have two databases you're using, you can create two SQL instances and have one server in the cluster own one of the two instances, and the other server own the other instance. Then, point connection strings for one database to the first instance, and connection strings for the second database to the second instance. If one of the two instances fails, it will failover to the passive server for that instance.
An alternative (still not load balancing, but easier to set up IMO than clustering) is database mirroring: http://msdn.microsoft.com/en-us/library/ms189852.aspx. For mirroring, you specify the partner server name in the connection string:
Data Source=myServerAddress;Initial Catalog=myDataBase;User Id=myUsername;Password=myPassword;Failover Partner=myBackupServerAddress;
ADO.NET will automatically switch to the failover partner if the primary fails.
Finally, another option to consider is replication. If you replicate a primary database to several subscribers, you can split your load to the subscribers. There is no built-in functionality that I am aware of to split the load, so your code would need to handle that logic.
