Doctrine schema update or Doctrine migrations - Symfony

What are the practical advantages of Doctrine Migrations over just running a schema update?
Safety?
The orm:schema-tool:update command (doctrine:schema:update in Symfony) warns:
This operation should not be executed in a production environment.
but why is this? Sure, it can delete data but so can a migration.
Flexibility?
I thought I could tailor my migrations to add things like column defaults, but this often doesn't work: Doctrine notices the discrepancy between the schema and the mapping metadata on the next diff and stomps over your changes.

When you are using the schema-tool, no history of database modification is kept, and in a production/staging environment this is a big downside.
Let's assume you have a complicated database structure in a live project. And in the next changeset you have to alter the database somehow. For example, your users' contact phones need to be stored in a different format, not a VARCHAR, but three SMALLINT columns for country code, area code and the phone number.
Well, it's not that hard to write a query that fetches the current data, splits it into three values and inserts them back. That's where migrations come into play: you can create your new columns, then do the transforms, and finally drop the column that held the data before.
And there's more! You can also describe the backwards process (the down migration) for when you need to undo the changes introduced by your migration. Let's assume that someone somewhere relied heavily on the format of the VARCHAR field, and now that you've changed the structure, their code no longer works as expected. So you run the down migration, and everything gets reverted. In this specific case you'd just bring back the old VARCHAR column, concatenate the values back into it, and then drop the new fields.
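As a rough sketch of what such a migration could contain (this is not Doctrine-generated code; the table name, column names and phone format are invented for illustration, and in Symfony these statements would sit in the up() and down() methods of a migration class):

-- MySQL-flavoured sketch; assumes phone was stored as '+CC-AAA-NNNNNNN'
-- up: add the new columns, transform the data, drop the old column
ALTER TABLE users
    ADD COLUMN country_code SMALLINT,
    ADD COLUMN area_code SMALLINT,
    ADD COLUMN local_number INT;

UPDATE users
SET country_code = CAST(SUBSTRING(phone, 2, 2) AS SIGNED),
    area_code    = CAST(SUBSTRING(phone, 5, 3) AS SIGNED),
    local_number = CAST(SUBSTRING(phone, 9) AS SIGNED);

ALTER TABLE users DROP COLUMN phone;

-- down: restore the VARCHAR column and concatenate the values back
ALTER TABLE users ADD COLUMN phone VARCHAR(32);

UPDATE users
SET phone = CONCAT('+', country_code, '-', area_code, '-', local_number);

ALTER TABLE users
    DROP COLUMN country_code,
    DROP COLUMN area_code,
    DROP COLUMN local_number;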
Doctrine's migration tool does most of the work for you: when you diff your schema, it generates all the necessary ups and downs, so the only thing you have to do yourself is handle the data that could be damaged when the migration is applied.
Migrations also tell the other developers on your team when it's time to update their schemas. With just the schema-tool, your teammates would have to run doctrine:schema:update every single time they pull, because they wouldn't know whether the schema had actually changed.
When using migrations, you always see that there are some updates in the migrations folder, which means you need to update your schema.

I think you indeed nailed it on safety. With migrations you can go back to an earlier state of the tables (just like you can in Git version control). With the schema update command you can only update the tables; there is no log kept that lets you go back in case of a failure once data has already been saved in those tables. I don't know exactly, but doesn't a migration also save the data of the corresponding table that's being updated? That would be essential in my opinion; otherwise there is no big reason to use them.
So yes, I personally think that the main reason for using migrations in a production environment is safety and maybe a bit of flexibility. Safety would be the winner here I think :)
Hope this helps.
edit: Here is another answer with references to the Symfony docs: Is it safe to use doctrine2 migrations in production environment with symfony2 and php

You also can't perform large updates with a plain Doctrine migration. Try, for example, updating an index on a database with 30 million users: it will take a long time, during which your app will not be accessible.

Related

Version control of big data tables (Iceberg)

I'm building Iceberg tables on top of a data lake. These tables are used for reporting tools. I'm trying to figure out the best way to version and deploy changes to these tables in a CI/CD process. E.g. I would like to add a column to an Iceberg table. To do that I have to write an ALTER TABLE statement, save it to the git repository and deploy it via the CI/CD pipeline. The tables are accessible via the AWS Glue Catalog.
I couldn't find much info about this on Google, so if anyone could share some knowledge, it would be much appreciated.
Cheers.
Agree with @Fokko Driesprong; this is a supplement only.
Sometimes table changes are treated as part of a task's version changes; that is, the table change statements (ALTER TABLE) are bound to task upgrades.
Tasks are sometimes deployed automatically, so the pipeline often executes the table change statement first and then deploys the new task. If the change is disruptive, we need to stop the old task first and then deploy the new one.
Alongside the upgrade we also keep a rollback script, which of course contains the corresponding table change statement.
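A minimal sketch of such an upgrade/rollback pair, assuming Spark SQL with Iceberg's extensions against an AWS Glue catalog (the catalog, table and column names are made up):

-- upgrade script, versioned in git and run by the CI/CD pipeline
ALTER TABLE glue_catalog.reporting.orders
    ADD COLUMNS (discount_amount decimal(10, 2));

-- matching rollback script, kept next to it
ALTER TABLE glue_catalog.reporting.orders
    DROP COLUMN discount_amount;

Since Iceberg's schema evolution is a metadata-only operation, adding or dropping a column like this does not rewrite the existing data files.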
Thanks for asking this question. I don't think there is a definitive way of doing this. In practice I see most people bundling this as part of the job that writes to the Iceberg table. That way you can make sure that new columns are populated right away by the new version of the job. If you don't make any breaking changes (such as deleting a column), then the downstream jobs won't break. Hope this helps!

How to add Flyway to an existing database without a DDL script?

I currently have a Kotlin-Exposed project that I would like to add Flyway to. The problem I am having is that most documentation and answers online indicate that the best way to add Flyway to an existing schema is to make the first script a data definition script. This usually would work, but since I'm dynamically generating my SQL with an ORM, it doesn't really make sense. Are there ways around this?
I really just want to use Flyway to add/delete persistent data that I will always need in certain tables. I don't want to insert it at the ORM level because if the application is run multiple times, then it can insert the data each time it's run (as opposed to Flyway where it will just migrate the database to the newest constructed state).
I think another way to word this question is: "Can I use Flyway for static data only, and not schema?"
Yes, you can.
Some info:
You are not required to have a first script containing the data definition / "baseline" of the schema. You can simply skip that.
When running Flyway against a non-empty database for the first time, you will still need to run the baseline command. In your case this will simply indicate to Flyway that it can assume the baseline schema is present and that it's safe to run migrations. (Your baseline schema was deployed by the ORM instead of a baseline script -- that's totally fine, Flyway won't check/doesn't care.)
You could choose to write your scripts that insert static data in a way that they are idempotent / use a guard clause so that they don't insert the data twice. That way it would be safe to use at the ORM level if you choose.
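A minimal sketch of that idea, assuming a PostgreSQL-style dialect and an invented countries reference table: after running Flyway's baseline command once against the existing database, a versioned script can carry nothing but guarded data inserts, for example:

-- V2__seed_countries.sql (hypothetical file name)
INSERT INTO countries (code, name)
SELECT 'NL', 'Netherlands'
WHERE NOT EXISTS (SELECT 1 FROM countries WHERE code = 'NL');

The guard clause keeps the script idempotent, so the same insert stays harmless even if it is also attempted at the ORM level.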

How to avoid code duplication for non-data structures (views, stored procedures etc)

My project contains a lot of objects like views and stored procedures which change quite frequently. Right now I have to create a new SQL script for every update, containing the complete source code of the changed objects even though I've actually changed only a few lines. This leads to massive code duplication, and I also find these changes difficult to review.
I'd like to have only one current version of the SQL script for every object (view, procedure, etc.) and recreate these objects every time I redeploy the database. As a result I could change the existing source file (like in Java or C programming) instead of creating a new update script every time I need to alter a view or procedure.
Is there a possibility to execute some scripts every time I migrate the database with Flyway?
I'm not sure why that got so many downvotes; it's a perfectly understandable and valid question. Perhaps it's because it closely resembles this open question:
Migrating Stored Procedures with Flyway
We are actually starting to push against this issue now. We've been using flyway for development and testing (and love it). But we've come to a point where we're starting to have to use procs/triggers/views (p/t/v's) and the fundamental disconnect between how we did it before, and how we must use flyway, is starting to be a strain.
Before, for a given database object (let's say it's a procedure), there'd be one source file. And if you needed to change the proc 'n' times, there would be 'n' versions of the same file in your VCS. Diff tools work great, IDE's all understand this, merges detect when two developers working in separate branches make changes to the proc, etc, etc. You know, old school.
But with Flyway, any one proc with 'n' changes is now scattered across 'n' files. Instead of "one object in one file with 'n' versions", you have "one object in 'n' files with one change each". I now need to do a text search in my IDE for any instance of "proc_name" if I want to know the history of changes to the proc. The VCS knows nothing about it. Devs can each create a migration in their own branch that succeeds when deployed, yet together they leave the proc missing one of the updates.
I'm not saying any of this to complain about Flyway, and I fully realize it's not a simple area. I'd almost say it's unsolvable (by Flyway).
We're scheming how to handle this problem, and I'd be very interested to know how others have handled it.
Repeatable migrations are supported as of Flyway 4.0.
Just add SQL files starting with "R" and without any version information to your migration folder:
R__Blue_cars.sql
You have to ensure that the script can be migrated repeatedly. This is usually done with "CREATE OR REPLACE" clauses in your DDL statements.
https://flywaydb.org/documentation/migration/repeatable
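For example, a repeatable migration for a view might look like this (the view, table and column names are made up; Flyway re-applies the file whenever its checksum changes):

-- R__active_users_view.sql
CREATE OR REPLACE VIEW active_users AS
SELECT id, email, last_login
FROM users
WHERE deleted_at IS NULL;

Because the whole definition lives in one file, diffs and merges in version control work the way they do for ordinary source files, which is exactly the pain point described in the question above.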

Avoiding unnecessary updates in an UPDATE query

In our application many pages include an "update" action, and when we update a table we also update unnecessary columns that haven't changed.
I want to know whether there is a way to avoid these unnecessary column updates. We use stored procedures in .NET 2003. In the following link I found a solution, but it is not for stored procedures.
http://blogs.msdn.com/alexj/archive/2009/04/25/tip-15-how-to-avoid-loading-unnecessary-properties.aspx
Thanks
You can really only accomplish this with a good ORM tool that generates the update query for you. It will typically look at what changed and generate the query for only the columns that changed.
If you're using a stored procedure then all of the column values get sent over to the database anyway when you call it, so you can't save anything there. The SP will probably just execute a run-of-the-mill UPDATE statement. The RDBMS then takes over; it won't physically change the data on disk if it's not different. It's smart enough for that.
So my answer in short: don't worry about it. It's not really a big deal, it requires drastic changes to get what you want, and you won't even see performance benefits.
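That said, if you do want a stored procedure to leave columns untouched when the caller has nothing new for them, one common (if imperfect) pattern is to pass NULL for unchanged parameters and fall back to the current value with COALESCE. A hypothetical T-SQL sketch (procedure, table and column names are invented for illustration):

CREATE PROCEDURE UpdateCustomer
    @CustomerId INT,
    @Name  NVARCHAR(100) = NULL,
    @Email NVARCHAR(255) = NULL,
    @Phone NVARCHAR(50)  = NULL
AS
BEGIN
    -- a NULL parameter means "keep the existing value"
    UPDATE Customers
    SET Name  = COALESCE(@Name,  Name),
        Email = COALESCE(@Email, Email),
        Phone = COALESCE(@Phone, Phone)
    WHERE CustomerId = @CustomerId;
END

The obvious caveat is that this cannot distinguish "unchanged" from "explicitly set to NULL", so it only works for columns that never legitimately take NULL.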
When I was working at a financial software company, performance was vital. Some tables had hundreds of columns, and the update statements were costly. We created our own ORM layer (in java) which included an object cache. When we generated the update statement, we compared the current values of every field to the values as they were on load and only updated the changed fields.
Our DB was SQL Server. I do not remember the exact performance improvement, but it was substantial and worth the investment. We also did bulk inserts and updates where possible.
I believe that Hibernate and the other big ORMs all do this sort of thing for you, if you do not want to write one yourself.

Undelete accidentally deleted records in SQLite3

As the title says: is this possible? I accidentally deleted a record due to my ugly HTML interface in Firefox. The bad thing is that the deleted record was a root folder, and the program automatically cascade-deleted everything under it :(
Take a look at undark. I have already used it. It can export the rows (deleted or not) from an SQLite db file if the records were not overwritten. Last version here.
The SQLite-Deleted-Records-Parser does not give the same type of output, but can be useful.
And there are also some products like the SQLite Forensic Explorer, SQLite Repair, Sqlite Database Recovery and SQLiteDoctor.
If you are a developer you can avoid having the same problem again using litereplica. It adds single-master replication to SQLite.
But remember to enable point-in-time recovery, because the transactions are replicated to the replicas, so an accidental command like DROP TABLE or DELETE FROM will also be replicated. With PITR you will be able to go back to a previous point in time.
Or use the Backup API regularly. Although it transfers the entire db on each backup.
And remember: if you copy an SQLite file or use a regular backup approach while a transaction is active, the copy can be corrupted.
Sorry -- nope. Backups are the only option I know of.
In the future, consider never issuing DELETE queries, especially from user-accessible forms (let only the DB admin do it, if anyone). Just include a field in your tables that marks a record as inactive, and then factor that into your queries in the WHERE clause.
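A minimal sketch of that soft-delete approach in SQLite (the table and column names are invented for the example):

ALTER TABLE folders ADD COLUMN is_deleted INTEGER NOT NULL DEFAULT 0;

-- the "delete" action in the UI becomes an update...
UPDATE folders SET is_deleted = 1 WHERE id = 42;

-- ...and normal queries simply filter the flag out
SELECT * FROM folders WHERE is_deleted = 0;

A cascade "delete" then just flags the subtree instead of destroying it, so an accidental click stays reversible.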
Unfortunately I don't know of a way, either. However, until you do a VACUUM on the SQLite database file the deleted data is generally not technically removed. Perhaps you might be able to still recover some of the data using some sort of hex editor on the file.
It might be possible to go in and see the data via a hex editor. The only info I could find said that the metadata was gone, so the records weren't going to come back, but the data itself might still be there. It comes down to how important the data is; I suspect it's not important enough for you to dig out a hex editor.
The data isn't always removed from the file straightaway. If there's lots of it and you're desperate, you could use the UNIX command strings on the file. This may help you to recover various bits and pieces of human-readable data, but it'll be a hard and inaccurate process.
No way. Without a working backup you won't be able to restore this.
