Is there any way to execute repeatable Flyway scripts first?

We have been using Flyway for years to maintain our DB scripts, and it does a wonderful job.
However, there is one situation where I am not really happy; perhaps someone out there has a solution:
To reduce the number of scripts required (and to keep an overview of "where" our procedures are defined), I'd like to implement our functions/procedures in a single script. Every time a procedure changes (or a new one is developed), this script would be updated. Repeatable scripts sound perfect for this purpose, but unfortunately they are not.
The drawback is that a new procedure cannot be accessed by non-repeatable scripts: repeatable scripts are executed last, so the procedure does not yet exist when the non-repeatable script runs.
I hoped I could control this by specifying different locations (e.g. loc_first containing the repeatables I want executed first, loc_normal for the standard scripts and the repeatables to be executed last).
Unfortunately, the order of locations has no impact on execution order :-(
What's the proper way to deal with this situation? Right now I have to define the corresponding procedures in non-repeatable scripts, but that's exactly what I'd like to avoid.

I found a workaround on my own: I'm using Flyway directly with Maven (the same would work if you use the API, of course). Each stage of my Maven build has its own profile (specifying the URL etc.).
Now I create two profiles for every stage, so I have e.g. dev and devProcs.
The difference between these two Maven profiles is that the "[stage]Procs" profile operates on a different location (where only the repeatable scripts maintaining procedures are kept). Then I need to execute Flyway twice: first with [stage]Procs, then with [stage] (e.g. mvn flyway:migrate -PdevProcs followed by mvn flyway:migrate -Pdev).
To me this looks a bit messy, but at least I can maintain my procedures in a repeatable script this way.

According to the Flyway docs, repeatable migrations ALWAYS execute after versioned migrations.
But, I guess, you can use Flyway callbacks. It looks like the beforeMigrate.sql callback is exactly what you need.
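As a rough sketch, a beforeMigrate.sql callback placed in one of your configured locations could recreate the procedures before any versioned script runs. The file name is fixed by Flyway; the Postgres-flavoured function below is purely an assumption for illustration:

-- sql/beforeMigrate.sql: Flyway runs this before every migrate, ahead of all versioned scripts
-- Illustrative body; keep everything in this file re-runnable (CREATE OR REPLACE)
CREATE OR REPLACE FUNCTION audit_stamp() RETURNS timestamptz AS $$
  SELECT now();
$$ LANGUAGE sql;

Note that the callback runs on every migrate invocation, even when no versioned migrations are pending.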


Version control of big data tables (Iceberg)

I'm building Iceberg tables on top of a data lake. These tables are used by reporting tools. I'm trying to figure out the best way to version-control and deploy changes to these tables in a CI/CD process. E.g. I would like to add a column to an Iceberg table. To do that I have to write an ALTER TABLE statement, save it to the git repository, and deploy it via a CI/CD pipeline. The tables are accessible via the AWS Glue Catalog.
I couldn't find much info about this on Google, so if anyone could share some knowledge, it would be much appreciated.
Cheers.
I agree with @Fokko Driesprong; this is only a supplement.
Sometimes table changes are treated as part of a task's version changes. That is, the table change statements (ALTER TABLE) are bound to task upgrades.
Tasks are sometimes deployed automatically, so a table change statement is often executed first and the new task deployed afterwards. If the change is disruptive, we need to stop the old task first and then deploy the new one.
For each upgrade we also keep a rollback script, which of course contains the corresponding reverse table change statement.
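As a hedged illustration, such an upgrade/rollback pair might look like this in Spark SQL against an Iceberg table (the catalog, table, and column names are all made up):

-- upgrade: shipped together with the new task version
ALTER TABLE glue_catalog.reporting.orders ADD COLUMN discount_pct DOUBLE;

-- rollback: undoes the upgrade if the task has to be rolled back
ALTER TABLE glue_catalog.reporting.orders DROP COLUMN discount_pct;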
Thanks for asking this question. I don't think there is a definitive way of doing this. In practice I see most people bundling the schema change as part of the job that writes to the Iceberg table. This way you can make sure that new columns are populated right away by the new version of the job. As long as you don't make any breaking changes (such as deleting a column), the downstream jobs won't break. Hope this helps!

How to add Flyway to an existing database without a DDL script?

I currently have a Kotlin-Exposed project that I would like to add Flyway to. The problem I am having is that most documentation and answers online indicate that the best way to add Flyway to an existing schema is to have the first script be a data definition script. This usually works, but since I'm dynamically generating my SQL with an ORM, it doesn't really make sense here. Is there a way around this?
I really just want to use Flyway to add/delete persistent data that I will always need in certain tables. I don't want to insert it at the ORM level, because if the application is run multiple times it would insert the data each time (as opposed to Flyway, which just migrates the database to the newest state).
I think another way to word this question is: "Can I use Flyway for static data only, and not schema?"
Yes, you can.
Some info:
You are not required to have a first script containing the data definition / "baseline" of the schema. You can simply skip that.
When running Flyway against a non-empty database for the first time, you will still need to run the baseline command. In your case this will simply indicate to Flyway that it can assume the baseline schema is present and that it's safe to run migrations. (Your baseline schema was deployed by the ORM instead of a baseline script -- that's totally fine, Flyway won't check/doesn't care.)
You could write your static-data scripts to be idempotent / use a guard clause so that they don't insert the data twice. That way they would also be safe to run at the ORM level if you chose to.
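A minimal sketch of such a guard clause, assuming a made-up roles table (Postgres-flavoured SQL):

-- Idempotent seed: the row is inserted only if it is not already present
INSERT INTO roles (id, name)
SELECT 1, 'admin'
WHERE NOT EXISTS (SELECT 1 FROM roles WHERE id = 1);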

Symfony2 testing: Why should I use fixtures instead of managing data directly in test?

I'm writing some functional tests for an API in Symfony that depend on having data in the database. It seems the generally accepted method for doing this is to use Fixtures and load the fixtures before testing.
It seems pretty daunting and impractical to create a robust library of Fixture classes that are suitable for all my tests. I am using the LiipFunctionalTestBundle so I only load the Fixtures I need, but it only makes things slightly easier.
For instance, for some tests I may need 1 user to exist in the database, and other tests I may need 3. Beyond that, I may need each of those users to have slightly different properties, but it all depends on the test.
I'd really like to just create the data I need for each test on-demand as I need it. I don't want to pollute the database with any data I don't need which may be a side effect of using Fixtures.
My solution is to use the container to get access to Doctrine and set up my objects in each test before running assertions.
Is this a terrible decision for any reason which I cannot foresee? This seems like a pretty big problem and is making writing tests a pain.
Another possibility is to try this port of Factory Girl for PHP, but it doesn't seem to have a large following, even though Factory Girl is so widely used in the Ruby community to solve this same problem.
The biggest reason to set up test data in a single fixture (rather than in each test as needed) is that it isolates the test failures caused by BC-breaking schema changes to only the tests that are directly affected by those changes.
A simple example:
Suppose you have a User class, in which you have a required field name, for which you provide getName() and setName(). You write your functional tests without fixtures (each test creating Users as needed) and all is well.
Sometime down the road, you decide you actually need firstname and lastname fields instead of just name. Naturally you change the schema, replace getName/setName with new getters and setters for these, and go ahead and run your tests. Tests fail all over the place (anything that sets up a User), since even tests that don't use the name field have called setName() during setup, and now they all need to be changed.
Compare this to using a fixture, where the User objects needed for testing are all created in one fixture. After making your breaking change, you update the fixture (in one place / class) to set up the Users properly, and run your tests again. Now the only failures you get should be directly related to the change (i.e. tests that use getName() for something), and every other test that doesn't care about the name field proceeds as normal.
For this reason it's highly preferable to use fixtures for complicated functional tests where possible. If your particular schema / testing needs make that really inconvenient, you can set entities up manually in tests, but I'd do my best to avoid it except where you really need to.

How to avoid code duplication for non-data structures (views, stored procedures etc)

My project contains a lot of objects like views and stored procedures which change quite frequently. Right now I have to create a new SQL script on every update containing the complete source code of the changed objects, even though I've actually changed only a few rows. This leads to massive code duplication, and I also find it difficult to review these changes.
I'd like to have only one current version of the SQL script for every object like a view or procedure, and recreate these objects every time I redeploy the database. As a result I could change the existing source file (as in Java or C programming) instead of creating a new update script every time I need to alter a view or procedure.
Is there a possibility to execute some scripts every time I migrate the database with Flyway?
I'm not sure why that got so many downvotes; it's a perfectly understandable and valid question. Perhaps it's because it closely resembles this open question:
Migrating Stored Procedures with Flyway
We are actually starting to push against this issue now. We've been using Flyway for development and testing (and love it). But we've come to a point where we're starting to have to use procs/triggers/views (p/t/v's), and the fundamental disconnect between how we did it before and how we must use Flyway is starting to be a strain.
Before, for a given database object (let's say it's a procedure), there'd be one source file. And if you needed to change the proc 'n' times, there would be 'n' versions of the same file in your VCS. Diff tools work great, IDEs all understand this, merges detect when two developers working in separate branches change the proc, etc., etc. You know, old school.
But with Flyway, any one proc with 'n' changes is now scattered across 'n' files. Instead of "one object in one file with 'n' versions", you have "one object in 'n' files with one change each". I now need to do a text search in my IDE for any instance of "proc_name" if I want to know the history of changes to the proc. The VCS knows nothing about it. Devs can each create a migration in their own branch that succeeds when deployed alone, but leaves the proc missing the other's update.
I'm not saying any of this to complain about Flyway, and I fully realize it's not a simple area. I'd almost say it's unsolvable (by Flyway).
We're scheming how to handle this problem, and I'd be very interested to know how others have handled it.
Repeatable migrations are supported as of Flyway 4.0.
Just add SQL files starting with "R" and without any version information to your migration folder:
R__Blue_cars.sql
You have to ensure that the script can be migrated repeatedly.
This is usually done with "CREATE OR REPLACE" clauses in your DDL statements.
https://flywaydb.org/documentation/migration/repeatable
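For instance, the content of R__Blue_cars.sql might look like this (the view definition is purely illustrative); thanks to CREATE OR REPLACE, Flyway can safely re-run the file whenever its checksum changes:

-- R__Blue_cars.sql: re-applied by Flyway every time this file changes
CREATE OR REPLACE VIEW blue_cars AS
SELECT id, model
FROM cars
WHERE colour = 'blue';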

I'm working with Drupal and I want to diff between two databases to see which tables have been updated in a given session. How do I do this?

I'm working with Drupal on a project, trying to find a way to speed up tests (we're using Cucumber and Selenium), and I'm trying to see which tables have been changed by a given series of steps, so I can just reset those tables between test cases.
Right now Simpletest, the Drupal testing framework, works by installing and setting up the tables for every module needed by a test, which makes for slow tests, and I'm emulating a similar approach by loading a DB dump for each test.
Given that a site has a 'known good' state to start from when you're doing integration testing, I think it would be faster to simply revert back to that point each time, instead of waiting twenty seconds or so to drop the database and then pipe the dump file back in between test runs.
However, when I try diffing between two dump files (i.e. before.I.create.a.node.sql and after.I.create.a.node.sql), the output is an unreadable load of serialized PHP that I can't make sense of.
Are there any tools I can use to help work out which tables I need to drop and rebuild between test cases, so I don't incur the 20-second hit on each test, short of reading the schema and code of every module I'm working with?
I'm following the ideas outlined here for getting Cucumber to work with PHP, and yes, I have seen this question on a similar subject.
Thanks!
Drupal does store a lot of serialized PHP in the database, but the main part of it is kept in the cache tables, like cache, cache_field, cache_menu, etc., and you can safely truncate these before dumping the database.
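For example (a sketch only; the exact set of cache tables depends on which modules are installed):

-- Empty the cache tables before taking the dump
TRUNCATE TABLE cache;
TRUNCATE TABLE cache_field;
TRUNCATE TABLE cache_menu;
TRUNCATE TABLE cache_page;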
If you have any Simpletest tables you could drop those. They are all temporary and are used only for running your Simpletest test suite.
That should reduce the dump size a lot. If it's not enough, I can recommend reading up on the tables in the book Pro Drupal Development, or you could skim through the .install files to read the modules' schema definitions. Though most of what remains will probably be real data you'd want to revert between tests.
Because of the relational nature of the database, be sure to either know exactly what you're doing or dump/revert all the remaining tables together.
