Aggregating multiple distributed MySQL databases - innodb

I have a project that requires us to maintain several MySQL databases on multiple computers. They will have identical schemas.
Periodically, each of those databases must send their contents to a master server, which will aggregate all of the incoming data. The contents should be dumped to a file that can be carried via flash drive to an internet-enabled computer to send.
Keys will be namespace'd, so there shouldn't be any conflict there, but I'm not totally sure of an elegant way to design this. I'm thinking of timestamping every row and running the query "SELECT * FROM [table] WHERE timestamp > last_backup_time" on each table, then dumping this to a file and bulk-loading it at the master server.
The distributed computers will NOT have internet access. We're in a very rural part of a 3rd-world country.
Any suggestions?

Your
SELECT * FROM [table] WHERE timestamp > last_backup_time
will miss deleted rows, because a deleted row no longer exists to be selected.
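A minimal sketch of the problem, using Python's built-in sqlite3 module as a stand-in for one of the source MySQL servers. The table name readings, its columns, and the integer timestamps are hypothetical and only serve the illustration.

```python
import sqlite3

# In-memory SQLite stands in for a source MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO readings (id, value, ts) VALUES (?, ?, ?)",
    [(1, "a", 100), (2, "b", 100)],
)

last_backup_time = 150  # both rows above were already exported

# After the backup: one row is updated, one is deleted.
conn.execute("UPDATE readings SET value = 'a2', ts = 200 WHERE id = 1")
conn.execute("DELETE FROM readings WHERE id = 2")

# The incremental export proposed in the question.
changed = conn.execute(
    "SELECT id, value, ts FROM readings WHERE ts > ?", (last_backup_time,)
).fetchall()

# The update shows up, but nothing tells the master that id 2 is gone.
print(changed)
```

The export contains only the updated row, so the master would keep its stale copy of the deleted row.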
What you probably want to do is use MySQL replication via USB stick. That is, enable the binlog on your source servers and make sure the binlog is not thrown away automatically. Copy the binlog files to USB stick, then PURGE MASTER LOGS TO ... to erase them on the source server.
On the aggregation server, turn the binlog into an executable script using the mysqlbinlog command, then import that data as an SQL script.
The aggregation server must have a copy of each source server's database, but it can keep that copy under a different schema name as long as your SQL uses only unqualified table names (that is, it never uses schema.table syntax to refer to a table). Importing the mysqlbinlog-generated script (with a proper USE command prefixed) will then mirror the source server's changes on the aggregation server.
Aggregation across all databases can then be done using fully qualified table names (i.e. using schema.table syntax in JOINs or INSERT ... SELECT statements).

Related

Efficient method to move data from Oracle (SQL Developer) to MS SQL Server

Daily, I query a few tables in SQL Developer, filtering to prior day activity, adding column to date stamp the data, then export to xlsx. Then I manually import each file to a MS SQL Server via SQL Server Import and Export Wizard. Takes many clicks, much waiting...
I'm essentially creating an archive in SQL Server, the application I'm querying overwrites data daily. I'm not a DBA of either database, I use the archived data to do validations and research.
It's tough to get my org to provide additional software, I've been trying to make this work via SQL Developer, SSMS Express ed, and other standard tools.
I'm looking to make this reasonably automated, either via scripts, scheduled tasks, etc. Appreciate suggestions that would work on my current situation, but if that isn't reasonable, and there's a very reasonable alternative, I can go back to the org to request software/access/assistance.
You can use SSIS to import the data directly from Oracle to SQL Server, unless you need the .xlsx files for another purpose. If you do need the files, you can still export from Oracle to them and then load SQL Server from those files. For the date stamp column, a Derived Column can be added within a Data Flow Task, using the SSIS GETDATE() function to achieve the same result. This function returns a timestamp; if only the date is necessary, a (DT_DBDATE) cast converts it to a date type that is compatible with SQL Server's date data type. Once you have the SSIS package configured, you can schedule it to run at regular intervals as a SQL Agent job. I'd also recommend installing the SSIS catalog (SSISDB) and using it as the source to run the packages from. The following links shed more light on these areas.
SSIS
Connecting to Oracle from SSIS
Data Flow Task
Derived Column Transformation
Creating SQL Server Agent Jobs for SSIS packages
SSIS Catalog
Another option that you may consider (if it is supported in SQL Express) is using the BCP utility, which can be run from command line.
The BCP utility allows you to bulk copy the data from a delimited text file into a SQL Server table.
If you take this approach, things to consider:
The number of columns in the source file needs to match the number of columns in the destination
Data types must match (or be compatible)
Typically, empty strings will be converted to nulls, so you will need to consider if the columns are nullable.
(to name a few - if you want to delve deeper, you might also need to look at custom delimiters between fields and records. Don't forget, commas and line feeds are still valid characters in char type fields).
Anyhow, maybe it will work for you, maybe not. Sure, you might still have to deal with the exporting of the data from Oracle, but it might ease the pain getting the data in.
Have a read:
https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017
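The delimiter concern above can be sketched in Python. This is only an illustration under assumptions: the function name write_delimited, the sample rows, and the choice of a pipe as field terminator are all hypothetical. It writes a delimited text file and refuses any value that contains one of the terminators, since such a value would shift columns on import.

```python
import os
import tempfile

def write_delimited(rows, path, field_term="|", row_term="\n"):
    """Write rows as a delimited text file, refusing ambiguous data."""
    with open(path, "w", encoding="utf-8", newline="") as out:
        for row in rows:
            fields = []
            for value in row:
                # None becomes an empty field; the target column must accept NULL.
                text = "" if value is None else str(value)
                if field_term in text or row_term in text:
                    raise ValueError(f"terminator found inside value: {text!r}")
                fields.append(text)
            out.write(field_term.join(fields) + row_term)

# Hypothetical sample data: a comma inside a value is harmless with a pipe delimiter.
rows = [(1, "Smith, John", None), (2, "Lee", "2017-01-31")]
path = os.path.join(tempfile.mkdtemp(), "export.txt")
write_delimited(rows, path)
with open(path, encoding="utf-8", newline="") as f:
    content = f.read()
print(content)
```

A file like this could then be loaded with bcp in character mode (-c), passing the same terminators through its -t and -r options.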

How to transfer data from SQL Server to Informix and vice versa

I want to transfer table data from SQL Server to Informix and vice versa.
The transfer should run on a schedule, and sometimes when the user performs a specific action.
I currently do this with delete and insert transactions, and over the web it takes a long time, between 15 and 30 minutes.
How can I do this in an easy way while taking performance into consideration?
Say I have
Vacation table in SQL Server and want to transfer all the updated data to the Vacation table in Informix.
and
Permission table in Informix and want to transfer all the updated data to the Permission table in SQL Server.
DISCLAIMER: I am not an SQL Server DBA. However, I have been an Informix DBA for over ten years and can make some recommendations as to its performance.
Disclaimer aside, it sounds like you already have a functional application, but the performance is a show-stopper and that is where you are mainly looking for advice.
There are some technical pieces of information that would be helpful to know, but in their absence, I'm going to make the following assumptions about your environment and application. Please comment or edit your question if I am wrong on any of these.
Database server versions. From the tags, it appears you are using SQL server 2012. However, I cannot determine the Informix server and version. I will assume you are running at least IDS 11.50 or greater.
How the data is being exchanged currently. Are you connecting directly from your .NET application to Informix? I would assume that is the case with SQL Server and will make the same assumption for your Informix connection as well.
Table structures. I assume you have proper indexing on the tables. On the Informix side, dbschema -d *dbname* -t *tablename* will give the basic schema.
If you haven't tried exporting data to CSV and as long as you don't have any compliance concerns doing this, I would suggest loading the data from a comma-delimited file. (Informix normally deals with pipe-delimited files, so you'll either need to adjust the delimiter on the SQL Server side to a pipe | or on the Informix import side). On the Informix end, this would be a
LOAD FROM 'source_file_from_sql_server' DELIMITER '|' INSERT INTO vacation (field1, field2, ..)
For reusability, I would recommend putting this in a stored procedure. Just wrap that load statement inside a BEGIN WORK; and COMMIT WORK; to keep your transactional integrity. Michał Niklas suggested some ways to track changes. If there is any correlation between the transfer of data to the vacation table in Informix and the permission table back in SQL Server, I would propose another option, which is adding a trigger to the vacation table so that you write all new values to a staging table.
With the import logic in a stored procedure, you can fire the import on demand:
EXECUTE PROCEDURE vacation_import();
You also mentioned the need to schedule the import, which can be accomplished with Informix's "dbcron". Using this feature, you'll create a scheduled task that executes vacation_import() periodically as well. If you haven't used this feature before, using OAT will be helpful. You will also want to do some housekeeping with the CSV files. This can be addressed with the system() call, which you can make from stored procedures in Informix.
Some ideas:
Add a was_transferred column to the source tables, setting its default value to 0 (you can use 0/1 instead of false/true).
From the source table, select the data with was_transferred=0.
After transferring the data, update each selected source row and set its was_transferred to 1.
Make a table syncro_info with fields like date_start and date_stop. If you discover a record with date_stop IS NULL, it means a transfer is still in progress. This will protect you against synchronizing data twice.
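The was_transferred idea can be sketched as follows. This is a simplified illustration, not a tested integration: two in-memory SQLite databases stand in for SQL Server and Informix, and the vacation columns id and days as well as the function name transfer_pending are hypothetical.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the source and target servers.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE vacation (id INTEGER PRIMARY KEY, days INTEGER, "
    "was_transferred INTEGER NOT NULL DEFAULT 0)"
)
target.execute("CREATE TABLE vacation (id INTEGER PRIMARY KEY, days INTEGER)")
source.executemany("INSERT INTO vacation (id, days) VALUES (?, ?)", [(1, 5), (2, 10)])

def transfer_pending(source, target):
    """Copy rows not yet transferred, then flag exactly those rows."""
    pending = source.execute(
        "SELECT id, days FROM vacation WHERE was_transferred = 0"
    ).fetchall()
    target.executemany("INSERT INTO vacation (id, days) VALUES (?, ?)", pending)
    target.commit()
    # Flag by id, so rows inserted during the copy are left for the next run.
    source.executemany(
        "UPDATE vacation SET was_transferred = 1 WHERE id = ?",
        [(row[0],) for row in pending],
    )
    source.commit()
    return len(pending)

first = transfer_pending(source, target)
source.execute("INSERT INTO vacation (id, days) VALUES (3, 2)")
second = transfer_pending(source, target)
print(first, second)
```

Each run moves only the rows that are still flagged 0, so the second run copies just the one new row. The sketch covers new rows only; rows updated after a transfer would need their flag reset (for example by a trigger) and an update rather than an insert on the target.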

is it possible to insert rows from a local table into a remote table in postgresql?

I have two postgresql databases, one on my local machine and one on a remote machine. If I'm connected to the local database (with psql), how do I execute a statement that inserts rows into a table on the remote database by selecting rows from a table on the local database? (I've seen this asked a handful of times, like here, here and here, but I haven't yet found a satisfactory answer or anyone saying definitively that it's not possible).
Specifically, assume I have tables remotetable and localtable, each with a single column columnA. I can run this statement successfully:
select dblink_exec('myconnection', 'insert into remotetable (columnA) values (1);');
but what I want to do is this:
select dblink_exec('myconnection', 'insert into remotetable (columnA) select columnA from localtable;');
But this fails with: relation "localtable" does not exist, presumably because localtable does not exist on the remote database.
Is it possible to do what I'm trying to do? If so, how do I indicate that localtable is, in fact, local? All of the examples I've seen for dblink-exec show inserts with static values, not with the results of a local query.
Note: I know how to query data from a remote table and insert into a local table, but I'm trying to move data in the other direction.
If so, how do I indicate that localtable is, in fact, local?
It's not possible because dblink acts as an SQL client to the remote server. That's why the queries sent through dblink_exec must be self-contained: they can do no more than any other query sent by any SQL application. Every object in the query is local to it from the server's perspective.
That is, unless you use another functionality, a Foreign-Data Wrapper with the postgres_fdw driver. This is a more sophisticated way to achieve server-to-server querying in which the SQL engine itself has this notion of foreign and local tables.

How does one use the "create database" statement for Oracle express 11g?

According to one of my posts (below) it seems that there is no such thing as a database in Oracle. What we call database in MySQL and MS-SQL is called schema in Oracle.
If that is the case, then why do the oracle docs mention the create database statement ?
For the record, I am using Oracle 11g and oracle SQL Developer GUI tool.
Post-
How to create a small and simple database using Oracle 11 g and SQL Developer?
The create database statement from oracle docs is given below. If there is no database concept, then how did this command come into the picture ?
CREATE DATABASE
CREATE DATABASE [ database ]
{ USER SYS IDENTIFIED BY password
| USER SYSTEM IDENTIFIED BY password
| CONTROLFILE REUSE
| MAXDATAFILES integer
| MAXINSTANCES integer
| CHARACTER SET charset
| NATIONAL CHARACTER SET charset
| SET DEFAULT
{ BIGFILE | SMALLFILE } TABLESPACE
| database_logging_clauses
| tablespace_clauses
| set_time_zone_clause
}... ;
There is a concept of a "database" in Oracle. What the term "database" means in Oracle terms is different from what the term means in MySQL or SQL Server.
Since you are using the express edition, Oracle automatically runs the CREATE DATABASE statement as part of the installation process. You can only have 1 express edition database on a single machine. If you are installing a different edition, you can choose whether to have the installer create a database as part of the installation process or whether to do that manually via the CREATE DATABASE statement later. If you are just learning Oracle, you're much better off letting Oracle create the database for you at installation time: you can only create the database via command-line tools (not SQL Developer), and it is rare that someone just starting out would need to tweak the database settings in a way that the installer didn't prompt you for.
In Oracle, a "database" is a set of data files that includes the data files for the SYS and SYSTEM schemas which contain all the Oracle data dictionary tables, the data files for the TEMP tablespace where sorts and other temporary operations occur, and the data files for whatever schemas you want to create. In SQL Server and other RDBMSs, these would be separate "databases". In SQL Server, you have a master database, a tempdb database, additional database for different products (i.e. msdb for the SQL Server Agent), and then additional user-defined databases. In Oracle, these would all be separate schemas in a larger container that Oracle refers to as a "database".
Occasionally, a DBA will want to run multiple Oracle databases on the same server-- most commonly when there are different packaged applications that have different requirements about database versions or parameters. If you want to run application A that requires an 11.2 database and application B that doesn't support 11.2 yet, you would need to have two different databases on the server. The DBA could create a separate database and a separate instance but that doubles the memory requirements, doubles the number of background processes required to run the database, and generally makes things less scalable. It's necessary if you really want to run different versions of the database simultaneously but it's not ideal.
The person who answered your original question is correct. The DDL (Data Definition Language) above prepares a space for schemas, which is analogous to MySQL's 'database'. The above statement defines characteristics of the schemas, such as timezone, MBs of space for tables, encoding characterset, root account, etc. You would then issue DDL statements such as those in your other post to create schemas, which define what each user can see.

Database for replication or simple transferring data

I will try to describe my problem of choosing good technology.
I have many machines which store data locally in a database, and one client machine with its own database. What I need is to pull the data from all the machines and put it in the client's database.
For now I have started implementing some RPC, but I don't know if it's a good idea, because I need to take care of each table manually. The database is SQLite.
Which is better: making RPC calls, or finding a lightweight database with replication? Maybe a NoSQL database like MongoDB?
I have a similar setup, where a couple of servers collect various statistics and store them in an sqlite3 database. Combining them is really easy. I have a Python script that connects to each server and downloads each database file into a temporary folder. It then opens the first one, uses ATTACH for each of the other files, and runs INSERT ... SELECT * for each table to merge all the other databases into a combined database:
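import sqlite3

def merge_databases(conn, j):
    # Merge /tmp/database2.sl3 up to /tmp/database<j-1>.sl3 into the open database.
    curs = conn.cursor()
    for i in range(2, j):
        print("merge in database%d" % i)
        # ATTACH exposes the other file's tables under the schema name foo<i>.
        curs.execute("ATTACH DATABASE ? AS foo%d" % i, ("/tmp/database%d.sl3" % i,))
        curs.execute("INSERT INTO db SELECT * FROM foo%d.db" % i)
        curs.execute("INSERT INTO vars SELECT * FROM foo%d.vars" % i)
        conn.commit()  # DETACH fails while a transaction is still open
        curs.execute("DETACH DATABASE foo%d" % i)

conn = sqlite3.connect('/tmp/database1.sl3')
merge_databases(conn, 8)
conn.close()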
