Using Flyway in modular applications - flyway

We have a modular application; each module creates its own tables (typically one or two) and manages its own data.
We use Flyway in our main application but also need it for our modules. However, if we add the patches to our main application, ALTER TABLE queries won't work for some deployments if the corresponding module is not installed.
One way to solve this issue is to perform the schema evolution with multiple Flyway operations: each module gets its own Flyway instance and manages itself. However, since Flyway creates tables for managing its state, we ended up with too many tables, as we have ~20 modules right now.
What's the elegant way to solve this issue?

I would say having the migrations managed by the unit of software they support is the cleanest approach and trumps "too many tables". In terms of neat organisation of those tables, you can silo them using a schema (if your RDBMS supports schemas), and Flyway lets you name the table that is used for each migration-managed application.
The key thing here is "modules". From your description, it sounds like not all applications are made up of the same modules. I would ask you: if we go to the effort of making our modules discrete to create decoupled / reusable software, why should the database schemas of those modules be treated any differently?
To counter your concern about "too many tables", let's try to debunk the costs of that.
Volume. Your RDBMS is made to handle thousands of tables, so there is no cost there.
Operational. Flyway does all the management here, they are effectively opaque to you.
Performance. They are a deployment concern, not a runtime liability.
Organisational. Hide them / name them with the methods mentioned above.
Our natural urge is to aggregate related things, but that doesn't always lead to the best outcome, so we must be pragmatic. In this situation, good/flexible design trumps aggregation.
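To make the "one history table per module" idea concrete, here is a minimal sketch. It is not Flyway and does not use Flyway's API; it only imitates the bookkeeping with Python's built-in sqlite3 module. The `<module>_schema_history` naming convention, the `migrate` helper, and the sample billing/reports modules are all assumptions made up for illustration.

```python
import sqlite3

def migrate(conn, module, migrations):
    """Apply a module's pending migrations, tracked in that module's own history table.

    `migrations` is a list of (version, description, single SQL statement) tuples.
    Returns the versions applied by this call.
    """
    if not module.isidentifier():
        raise ValueError(f"unsafe module name: {module!r}")
    history = f"{module}_schema_history"  # hypothetical naming convention
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{history}" '
        "(version INTEGER PRIMARY KEY, description TEXT NOT NULL)"
    )
    applied = {row[0] for row in conn.execute(f'SELECT version FROM "{history}"')}
    newly_applied = []
    for version, description, statement in sorted(migrations):
        if version in applied:
            continue  # already recorded for this module
        conn.execute(statement)
        conn.execute(
            f'INSERT INTO "{history}" (version, description) VALUES (?, ?)',
            (version, description),
        )
        conn.commit()  # record the migration once its statement has run
        newly_applied.append(version)
    return newly_applied

# Hypothetical modules; each one ships only its own migrations.
BILLING = [
    (1, "create invoices", "CREATE TABLE invoices (id INTEGER PRIMARY KEY)"),
    (2, "add total", "ALTER TABLE invoices ADD COLUMN total REAL"),
]
REPORTS = [
    (1, "create reports", "CREATE TABLE reports (id INTEGER PRIMARY KEY)"),
]

# This deployment has only the billing module installed, so the
# reports migrations are never attempted against it.
installed = {"billing": BILLING}
conn = sqlite3.connect(":memory:")
for name, module_migrations in installed.items():
    migrate(conn, name, module_migrations)
```

Because each module carries its own version history, an ALTER TABLE for a module that is not installed never runs, which is the failure described in the question.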

Related

Sharing stored procedures across multiple apps

Team A has an enterprise app that uses ADO.NET for data access and executes stored procedures. The data access is encapsulated in its own project (let's call it DAL.dll).
Team B is creating another unrelated app that's reusing the stored procedures in the enterprise app. This app is currently using the MS application block for data access. The issue we run into is that whenever Team A makes any change to the input/output params in the stored procedures, there is a runtime error in Team B's app, and that app needs to be updated to accommodate the additional params (or params that were removed). Most of these errors go unnoticed until a user complains. At the very least, we would like to have the app throw a compilation error so that the build process warns us of the changes made.
One way to do this is to have Team B's project add a reference to the DAL.dll.
I'd like to know if there are any other cleaner ways of solving the issue. We are ready to replace Team B's MS Data application block to use a different technology (Entity Framework?) if necessary.
Among the other answers, I'd strongly suggest getting those stored procedures into source control, in a Database Project. You then may be able to use the features of your source control system to do several things:
Lock some of the code so that it cannot be changed
Give you notifications if the code is changed
Warn you if the stored procedures change in a way that would prevent them from being called
Branch the stored procedures so that each team can have their own version of changed code, while keeping the unchanged stored procedures common. You of course will need to separate the different versions in the database.
I agree with the other posters on this thread that you should not share stored procedures across different .NET DLLs; that is just a recipe for disaster. I would also shy away from ORMs like Entity Framework if you are doing anything at all complicated with your database schema, because ORMs excel at translating a simple object model from your .NET application classes into SQL tables and SPs, but traditionally do poorly at optimizing them for performance on the database side. There will be people who claim otherwise, and they may have a valid point if you are an expert in wrangling an ORM to do what you want like they are, but chances are you are not, and it will cause you headaches in the long run.
A shared data access layer might work, but conceptually you are then just changing the implementation of the dependency from some code that a DBA wrote to some code that a .NET programmer wrote. Yes, you can use integration tests to achieve better verifiability, but the same case could be made for SQL with tools like Red Gate's SQL Test. I would shy away from this approach if the two applications are already experiencing some sort of pain from sharing SPs. That is an indication that the dependency should just be done away with.
If it were up to me, I'd just make a new schema for Team B's app. You can read more about schemas in SQL Server here: MSDN Schema description for 2008 R2. You can think of them as namespaces for SQL Server but with some additional bells and whistles like permission and access control. Separating out your different applications into separate schemas on the same shared database will probably make for the most flexible implementation in the long run.
unrelated app that's reusing the stored procedures in the enterprise app
If these two applications are really unrelated, why are they sharing procedures or even the same database? I know this is a long read, but I recommend you read this: A Better Path to Enterprise Architectures
The partitioning concept in there relates to the bounded context in Domain-Driven Design:
Multiple models are in play on any large project. Yet when code based on distinct models is combined, software becomes buggy, unreliable, and difficult to understand. Communication among team members becomes confusing. It is often unclear in what context a model should not be applied.
Therefore: Explicitly define the context within which a model applies. Explicitly set boundaries in terms of team organization, usage within specific parts of the application, and physical manifestations such as code bases and database schemas. Keep the model strictly consistent within these bounds, but don’t be distracted or confused by issues outside.
It is to be expected that you end up with problems when you don't explicitly deal with this. You're lucky you're seeing early failures, as it can turn into problems that are much harder to find in the long run.
Analyze the problem again with the above in mind. Consider if you're missing some explicit context where this common functionality should live.
My question is: which team owns the stored procedures and the shared database? As a matter of good architecture/design, you should usually not have two different apps sharing the same database / procedures.
A better way to share data/functionality between two different applications is through a service or API, so the team that owns the functionality is responsible for maintaining it.
Also, good communication between both teams is highly recommended.
Depending on the owner of the DAL project, you could host web services and share the API. That way, you separate the Data Access Layer from the business logic, which allows anyone to use the same DAL without having to publish it to each different location.
From my point of view, it looks like both Team A and Team B should share the same core model and look at Multitier architecture as a possible solution.
It sounds like it would make sense to create a shared DAL that both applications can share.
I would add unit tests (or really integration tests) to make sure the DAL is compatible with the apps after changes. That way your tests would fail if incompatible changes have been made.
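As a rough illustration of such a test, the sketch below compares the parameter list a calling app was written against with the parameter list that is currently deployed, and reports every difference. Everything here is hypothetical: the `Param` type, the `contract_violations` helper, and the sample parameters are invented for the example. In a real SQL Server setup the deployed list could be read from the `INFORMATION_SCHEMA.PARAMETERS` view; it is hard-coded here so the example runs on its own.

```python
from typing import NamedTuple

class Param(NamedTuple):
    name: str
    sql_type: str
    direction: str  # "IN" or "OUT"

def contract_violations(expected, actual):
    """Return readable differences between the expected and deployed parameter lists."""
    expected_by_name = {p.name: p for p in expected}
    actual_by_name = {p.name: p for p in actual}
    problems = []
    for name in sorted(expected_by_name.keys() - actual_by_name.keys()):
        problems.append(f"missing parameter {name}")
    for name in sorted(actual_by_name.keys() - expected_by_name.keys()):
        # Simplification: a new parameter with a default value would not
        # actually break callers, but this sketch flags it anyway.
        problems.append(f"unexpected parameter {name}")
    for name in sorted(expected_by_name.keys() & actual_by_name.keys()):
        if expected_by_name[name] != actual_by_name[name]:
            problems.append(f"changed parameter {name}")
    return problems

# What Team B's app was written against (hypothetical procedure signature).
EXPECTED = [Param("@CustomerId", "int", "IN"), Param("@Total", "money", "OUT")]

# What is deployed after a hypothetical change by Team A.
DEPLOYED = [
    Param("@CustomerId", "int", "IN"),
    Param("@Total", "decimal", "OUT"),
    Param("@Region", "nvarchar", "IN"),
]

violations = contract_violations(EXPECTED, DEPLOYED)
```

Run as part of the build, a check like this fails as soon as the deployed signature drifts, instead of waiting for a user to hit the runtime error.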
"I'd like to know if there are any other cleaner ways of solving the issue."
The cleanest way is for Team B to sit down with Team A and encapsulate the relevant business logic into a shared API. It doesn't matter so much how you implement that API; what does matter is that the API's interface is documented and versioned so everyone knows what to expect.
One reasonable mechanism for this in a .NET environment is to use Microsoft's WebAPI.
In short, the question of "how do we share a stored procedure?" is most likely looking at the wrong level of abstraction.

EF and customer data separation

Is it possible to build an ASP.NET website using EF where each customer logging in has separately stored data? We have customers demanding that their data won’t be stored in the same tables as other customers’ data.
I’ve read that EF can’t work with several databases but is it possible to switch database at runtime depending on input parameters? I have a feeling it won’t be possible since the migration features are tightly connected to the database being used, but I'm not sure.
One solution could be to have a separate website deployment and database for each customer. They’ll get separate domains to access but that’s not a problem. But this solution feels a bit clumsy if you’re having many customers, especially with deployment and future upgrades.
Am I missing some smart ways of solving this or is this a very tricky issue?
Is the structure (of the DB) the same?
If so, you could switch connections. It's not without issues, but it should work. For details on how that should be done, check the long discussion we've had here (and the linked previous questions etc.)...
Code first custom connection string and migrations without using IDbContextFactory
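For what "switching connections" means when every customer has an identically structured database, here is a minimal sketch. It is not EF code; it uses Python's built-in sqlite3 module to show the idea of choosing the database from the logged-in customer. The `TenantDatabases` class, the `orders` table, and the customer names are assumptions for illustration only.

```python
import sqlite3

# Every customer database has the same structure.
SCHEMA = "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)"

class TenantDatabases:
    """Hands out one connection per customer, each pointing at a separate database."""

    def __init__(self, locations):
        self._locations = dict(locations)  # customer id -> database location
        self._connections = {}

    def connection_for(self, customer_id):
        if customer_id not in self._locations:
            raise KeyError(f"unknown customer: {customer_id}")
        if customer_id not in self._connections:
            conn = sqlite3.connect(self._locations[customer_id])
            conn.execute(SCHEMA)  # make sure the shared structure exists
            self._connections[customer_id] = conn
        return self._connections[customer_id]

# Each ":memory:" connection is its own private database, standing in
# for one database file or server per customer.
tenants = TenantDatabases({"acme": ":memory:", "globex": ":memory:"})

acme = tenants.connection_for("acme")
acme.execute("INSERT INTO orders (item) VALUES ('widget')")
acme.commit()

acme_count = acme.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
globex_count = tenants.connection_for("globex").execute(
    "SELECT COUNT(*) FROM orders"
).fetchone()[0]
```

The application code stays identical for every customer; only the lookup from customer to database changes, which is why this only works when the structure is the same everywhere.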

LINQ vs Stored Procedures vs Inline Queries

We are a small team working on a very tight deadline to develop a large web application in .NET. We are using multiple databases (one per client) so our requirements are slightly different than most applications. The databases will only be used for this particular application so it doesn't matter if they are tightly coupled with the application. The main deciding factors are speed of development, long-term maintainability, and security. There are 3 options we are considering:
Option 1 - LINQ to SQL
None of us have any experience with LINQ, but we have been researching it and it seems like a good option and not too difficult to learn. Worth the risk of learning a new method on a tight deadline?
Option 2 - Stored Procedures
Seems like it could be a nightmare to maintain with the multiple-database setup (or would it?), and it may slow down development to work in another environment, as we don't have a dedicated database developer. Basic CRUD queries would be generated by a code generator, which is an advantage.
Option 3 - Inline Queries
This method would be the fastest to develop, but I know people are generally against hard-coded queries nowadays, and I fear we may suffer in the long term with maintainability issues. Basic CRUD queries would be generated by a code generator.
Please let me know if there are any factors we are missing. What solution seems the most appropriate for this project?
If you have a tight deadline, don't try something new. Ask the devs to study Entity Framework at home in their spare time and try it in the next project. Meanwhile, do what you know best and have used successfully in the past.
Inline queries are not bad if they are decoupled in a DAL assembly.
Since @Hasan Khan covered the primary answers regarding SQL, I'm going to throw out a somewhat different answer. Another option is to consider using RavenDB, a NoSQL database. It has the concept of tenant databases baked into it, which, from your requirements, sounds like the intended goal.

Should we have separate database instance for each developer?

What is the best way for developing a database based application? We can have two approaches.
One common database for all the developers.
Separate database for all the developers.
What are the pros and cons of each? And which one is better way?
Edit: More than one developer is supposed to update the database, and we already have SqlExpress 2005 on each developer machine.
Edit: Most of us are suggesting a common database. However, what if one of the devs has modified the code and the database schema, and has not committed the code changes, but the schema changes have gone to the common database? Won't that possibly break the other developers' code?
Both -
I like a single database that changes are tested on before going live, or going to a 'formal' test environment. This is your developers' sanity check; it stays up to date with the live system and it makes sure they always consider each other's changes. The rule should be that changes don't go on here if they might break something else.
A database per developer is great (even essential) when more than one developer is making updates. It allows them all the development flexibility they want without breaking things for other developers.
The key is to have a process for moving database changes from development through to your live system, and stick to your process.
Shared database
Simpler
Fewer cases of "It works on my machine".
Forces integration
Issues are found quickly (fail fast)
Individual databases
Changes never affect other developers, but this is also a drawback for continuous integration
We use a shared development database and it works out nicely. Our schema rarely changes in a way that makes it backwards incompatible, but occasionally a design change will occur before we go live, and we simply ask the other developers to update.
We do have separate development application (web) servers, but they share the same database. Our developers do have the option to use their own database, as they know how to set this up, and will do that on occasion, but only temporarily. The norm, for us, is to share the database.
Thought I'd throw this out there, but why not let every developer host their own instance of SQL Server Developer on their desktops and then have a shared server for each of the other environments (development, QA, and prod)? I think even the basic MSDN that comes with Visual Studio Pro (if you opt for it) includes a license for SQL Server Developer.
The developer can work on their desktop without impacting the others and then you can have them move the code to the next shared environment as you see fit (at will, with daily/weekly builds, etc.).
EDIT:
I should add that the desktop instance allows developers to do things that the DBAs often restrict on shared environments. This includes database creation, backup/restore, profiler, etc. These things are not essential, but they allow the developer to become so much more productive while reducing the demands they make on your DBAs.
The shared environment is completely necessary for testing - I would not recommend going from desktop to production. But you can add so much by allowing the developers to have 100% control over a given database environment (including isolation from others) with a relatively minor cost.
Depends on your development, testing and maintenance cycles. Also on the size and location of the development team (and of course organization). If you support several versions of the database you might need even more environments.
In the real world I found the following approach rather satisfying:
single central database/application for testing purposes, gets all the changes by various developers periodically merged into it
local copies for development (so you are free to drop and reload the whole database)
upgrade scripts are maintained for any changes to schema, auxiliary and sample data sets
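The upgrade-script part of that list can be sketched as follows, using Python's built-in sqlite3 module and SQLite's `PRAGMA user_version` as the schema version counter. The `UPGRADES` list, the `upgrade` helper, and the sample table are assumptions for illustration; a real project would more likely keep numbered script files in version control.

```python
import sqlite3

# Ordered upgrade scripts; position in the list is the schema version.
# Schema changes and sample data travel together.
UPGRADES = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customers ADD COLUMN email TEXT",
    "INSERT INTO customers (name, email) VALUES ('Sample Customer', 'sample@example.com')",
]

def upgrade(conn):
    """Bring a database from whatever version it is at up to the latest one."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(UPGRADES, start=1):
        if version <= current:
            continue  # this script already ran against this database
        conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

# A developer's local copy and the central test database run the same
# scripts, so both end up with the same structure.
local_copy = sqlite3.connect(":memory:")
central_test = sqlite3.connect(":memory:")
local_version = upgrade(local_copy)
central_version = upgrade(central_test)
```

Since a local copy can always be rebuilt by running the scripts from an empty database, dropping and reloading it carries little risk.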
Here are some further points:
If two developers (two teams) are working on changes that can affect each other then they should complete their tasks independently and then integrate/merge and test. For this it is much better to have separate development environments (unless they have to work together in which case I consider them to be a part of the same team; still they can work on their own copies of the database and share it if necessary)
If they work on the changes that do not influence each other they could work on the main server. Or on their own local copies of the database.
So, developing on the local copy has all the benefits with no risk in a general case (when you support multiple versions of the system and maintain upgrade scripts anyway).
Still, it is great if you can share test cases, so the ability to dump/restore the database easily and quickly is a big plus.
EDIT:
All of the above assumes that having a copy of the whole system on the local machine for testing purposes is feasible (size, performance, licenses, etc.).
I would opt for solution #1 : One common database for all the developers.
Pros
Less expensive for the infrastructure;
Only one dump is required when it's time to refresh the development database;
Everyone develops with the same data, so it closely represents the production environment;
Cons
If one developer performs a bad operation, this could impact a larger number of developers.
As for solution #2 : One independent database for each of the developers;
Pros
This could be useful for new features developments, when development requires isolation;
Cons
More expensive for the company (infrastructure, licences...);
Multiplication of problems caused by overly isolated development environments (works in the developer's environment, not integrated);
Multiplication of dumps by the DBAs of the same copy from the production environment.
Considering the above, I would recommend, depending on your company size:
One database for development;
One database for testing the integration;
One database for acceptance tests;
One for new feature development that will perhaps require integration tests.
If your company doesn't require integration tests, then go with acceptance tests; this step is crucial before going to production.
One per developer plus a continuous integration and build server to run unit and integration tests. That gives you the best of both worlds.
Having all developers modify a single dev database quickly becomes less productive once the amount of database change reaches a certain level because it forces a developer to deploy changes to the shared database before he is ready to check-in, which means other parts of the code line may break unnecessarily.
Simple answer:
Have one development database, and if the developers want their own, they can just run their own instance on their own machines. Just be sure to test/publish on the shared one.
We do both:
We use code generation where I'm at and our database is generated as well. So we have an instance on each developer's box where the database is generated. Then we use the scripts that are generated to apply the changes to a central test database. If that goes well we apply the changes to the production database during a release.
What's nice with this approach is that when our "source of truth" is checked in to source control, all the database changes are automatically distributed to the other developers when they rebase and regenerate. It works well for us.
The best way is a single database on the Test/QA server and one database (probably on the developer's local computer) for each developer (so, 10 developers work with 10 + 1 databases).
It's the same approach as for general development: each developer has their own copy of the source code on a local machine.
Also, the multiple-database approach simplifies keeping the database schema in a version control system. We are keeping database creation scripts in SVN.
We are using the approach, described here:
http://www.sqlaccessories.com/Howto/Version_Control.aspx
You might also want to look at Refactoring Databases. Aside from discussing database changes, the book includes discussions on going from development to production in a way that reduces risk.
Why on earth would you want a separate database for all developers?
Have one common database for all, that way the table structure is consistent and the sql statements are as well.
The biggest problems with developers having their own databases are:
First, it is unlikely to be the size of the real production database (if you take all the databases we need to work with here, they would take up several hundred gigabytes of space, and I don't have that available on my machine). This causes bad code to be written that will never work on a large database for performance reasons. SQL code should never be written against a data set significantly smaller than the one on prod.
Second, developers who use their own database create problems when they spend a long time developing something and then find out only after they merge with a real database that it affects something else. You find this stuff much faster when you share the environment, so in the end there is less wasted development time.
Third, developers working on related things need to know about the changes you are making, because those changes will affect theirs.
When you know you are going to affect others, I think you tend to be more careful about what you do, which is a plus in my book.
Now, the shared database server should have what we call a scratch database, a place where people can create and test table changes. If they are doing something that might need to drop and recreate a table (which should be a rare case!), they can test the process first by copying the table to the scratch database and running their process there, then changing to the real database when they are sure it works. We also often copy a backup table to the scratch database before testing a particular change, so we can easily recreate the old data if it goes bad.
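The scratch-database workflow can be sketched with Python's built-in sqlite3 module, where `ATTACH DATABASE` stands in for a separate scratch database on the shared server. The `orders` table, the `rebuild_orders` change, and the schema names are hypothetical; the point is only the order of operations: copy, rehearse, check, then run against the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the real shared database
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount TEXT)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [("10.50",), ("3",)])
conn.commit()

# A second, separate database to rehearse in.
conn.execute("ATTACH DATABASE ':memory:' AS scratch")

def rebuild_orders(conn, schema):
    """Hypothetical risky change: drop and recreate orders with a numeric amount column."""
    conn.execute(
        f"CREATE TABLE {schema}.orders_new (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.execute(
        f"INSERT INTO {schema}.orders_new "
        f"SELECT id, CAST(amount AS REAL) FROM {schema}.orders"
    )
    conn.execute(f"DROP TABLE {schema}.orders")
    conn.execute(f"ALTER TABLE {schema}.orders_new RENAME TO orders")
    conn.commit()

# 1. Copy the table to the scratch database.
conn.execute("CREATE TABLE scratch.orders AS SELECT * FROM main.orders")

# 2. Rehearse the change there and check the result.
rebuild_orders(conn, "scratch")
scratch_total = conn.execute("SELECT SUM(amount) FROM scratch.orders").fetchone()[0]

# 3. Only when the rehearsal looks right, run it against the real table.
if scratch_total == 13.5:
    rebuild_orders(conn, "main")
main_total = conn.execute("SELECT SUM(amount) FROM main.orders").fetchone()[0]
```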
I see no advantages at all to using individual databases.

Why split a BizTalk solution into multiple projects

I've read it's good practice to split a BizTalk solution into multiple projects, and have seen some debate as to the exact nature of the split, e.g. ...
- could be split by artifact, i.e. Schemas, Orchestrations, Maps, etc.
- could be split by function
But what are the benefits and drawbacks?
BizTalk solutions typically include schemas, maps, and orchestrations. Solutions can also include supporting components, Business Rules, definitions of port-based routing and transformations, trading partners, and several other types of artifacts.
Effectively managing all of these artifacts has many benefits – far more benefits than drawbacks.
The benefits include:
Separation of concerns based on the logical grouping of artifacts (by functionality or artifact type for example). This approach reduces the possibility of modifying aspects of your solution that aren’t related to the problem you’re working on at the time.
Easier to test – you can compile and deploy just the components you’re modifying.
Easier to split work among a group of developers.
Easier to manage when the solution gets larger – it can take several minutes to load large BizTalk solutions in Visual Studio.
Supports more advanced approaches related to ESB-style solutions (very loose coupling). Depending on your overall approach, you can create a solution that is very modular – to the point that modules can operate and be updated completely independently of each other.
Makes it possible to version artifacts separately.
Facilitates more fine-grained control over security and memory utilization by grouping related functions such that you deploy them for a particular host instance, for example (you can also administer fine grained .NET security policies more easily than you can with a solution that deploys a few assemblies).
The main drawback to splitting your solution over several projects or solutions surfaces when you are debugging. Debugging BizTalk solutions is not straightforward for many developers who are new to BizTalk, and having to narrow down bugs across solutions does not make the job any easier. However, you can manage this issue by arranging your solution more effectively and using standards around naming, directory structure, arrangement of namespaces, and related methods to make it easier to figure out where to look.
Other drawbacks include:
More assemblies to sign and deploy into the GAC
Inter-dependencies between projects can result in deployment errors that can be difficult to track down in poorly organized solutions.
You should dedicate some time at the beginning of a project – ideally during design – to set up the basic organization of your solution. A one-size-fits-all approach does not exist – you need to think about how you want to manage the solution during development, deployment and maintenance in the context of the functionality that the solution provides to your organization or clients.
A good place to start is to divide your solution based on artifact type or functional areas. As you grow your solution, you’ll get a better understanding about how artifacts relate to each other, how you want to manage strong naming, security, and physical deployment and be in a better position to arrange your solution more effectively. You need to be careful with this approach since you could end up having to rearrange large parts of the solution, which can be disruptive if your project’s timelines are tight.

Resources