I have implemented a simple behaviour that is executed every time a node with a given aspect is updated.
It works well in my development environment (Community Edition with a single node).
I am wondering what would happen if someone installs my module in a clustered environment.
Will the behaviour be executed on each node of the cluster or only in the node that executed the change?
Behaviors are only executed on the node in the cluster that handled the change.
When handling multiple environments (Dev/Staging/Prod, etc.), the best-case scenario would be to have a separate (preferably identical) Airflow instance for each of them.
I'm using GCP's managed Airflow (Cloud Composer), which is not cheap to run, and having multiple instances would increase our monthly bill significantly.
So I'd like to know if anyone has recommendations for using a single Airflow instance to handle multiple environments.
One approach I was considering was to have separate top-level folders within my dags folder, one per environment (i.e. dags/dev, dags/prod, etc.),
and copy my DAG scripts to the relevant folder through the CI/CD pipeline.
So, if my DAG in the source code repository looks like:
airflow_dags/my_dag_A.py
During the CI stage, I could have a build step that creates 2 separate versions of this file:
airflow_dags/DEV/my_dag_A.py
airflow_dags/PROD/my_dag_A.py
I would follow a strict naming convention for my DAGs, Airflow Variables, etc. that reflects the environment name, so that the build step above can rename them automatically.
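To make that concrete, here is a rough sketch of what each deployed copy could look like, assuming the CI step drops the file into a folder named after the environment (the DAG id, schedule and task below are placeholders, not my real DAG):

import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# e.g. ".../dags/DEV/my_dag_A.py" -> "DEV"; assumes the CI/CD step copied this
# file into a top-level folder named after the environment
ENV = os.path.basename(os.path.dirname(__file__)).upper()

with DAG(
    dag_id=f"my_dag_A_{ENV}",          # my_dag_A_DEV / my_dag_A_PROD
    start_date=datetime(2023, 1, 1),   # placeholder
    schedule_interval="@daily" if ENV == "PROD" else None,
    catchup=False,
    tags=[ENV],
) as dag:
    BashOperator(task_id="hello", bash_command=f"echo running in {ENV}")

With something like this, the build step would only need to copy the file; the environment-specific renaming would happen inside the DAG itself.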
I wanted to check whether this is an approach others have used, or whether there are better/alternative suggestions.
Please let me know if any additional clarifications are needed.
Thank you in advance for your support. Highly appreciated.
I think a shared environment can be a good approach because it's cost-effective.
However, if you have a Composer cluster per environment, it's simpler to manage and it gives you better separation.
If you stay on a shared environment, I think you are heading in the right direction with a separation in the Composer DAGs bucket and a folder per environment.
If you use Airflow Variables, you also have to deal with the environment there, in addition to the DAGs.
You can then manage the access to each folder in the bucket.
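For the Variables, one possible convention (the helper and variable names below are only hypothetical examples) is to prefix every Variable with the environment and resolve it through a small helper:

from airflow.models import Variable

def env_var(name: str, env: str, default=None):
    # looks up e.g. "DEV_bq_dataset" or "PROD_bq_dataset"
    return Variable.get(f"{env}_{name}", default_var=default)

# inside a DAG file that already knows its environment:
# dataset = env_var("bq_dataset", ENV)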
In my team, we chose another approach.
Cloud Composer 2 uses GKE in Autopilot mode and is more cost-effective than the previous version.
It's also easier to manage the environment size of the cluster and to play with the different parameters (workers, CPU, web server, ...).
In our case, we created a cluster per environment, but with a different configuration per environment (managed by Terraform):
For the dev and uat environments, we use small sizing and the Small environment size
For the prod environment, we use larger sizing and the Medium environment size
It's not perfect, but this allows us to have a compromise between cost and separation.
I've set up a Cloud Composer environment and I'm very confused by the pricing. The pricing page says that any environment has at least 2 DB vCPUs and 2 web server vCPUs.
Further down the page, it says it's possible to run an environment only part of the time (e.g. 25% of the month).
However, in my case the environment seems to be running 24 hours a day (my billing report shows ~48 hours of usage per day, which I guess corresponds to 24 h per vCPU).
The problem is that I have nothing running on it, except for the airflow_monitoring DAG, which I can't stop even by switching it off or deleting it (but I read here Why is there an automatic DAG 'airflow_monitoring' generated in GCP Composer? that it is necessary).
So my question is: why does the documentation say I can run my environment only part of the time, when it seems to be running all the time without any DAG scheduled on it?
It's not currently possible to "turn off" an environment aside from deleting it entirely, because there are certain system components that always need to be active (like the managed database). The example is referring to using an environment for development for a week, and then getting rid of it completely.
Airflow consumes CPU and memory regardless of whether DAGs are running/scheduled, so Composer charges for the GCE instances/GKE cluster it's deployed on.
I am in a confusing situation: I develop plug-ins for WordPress and push them to my git repository. The WordPress site is on an AWS server, and for every push to git I have to create a new environment with Elastic Beanstalk.
Once I push to git, I first create a DEV environment and pull the changes that I want to push to production. For example, I have changes c1, c2, c3, c4, c5 and I want to push c1, c2, c3. I pull those changes and create the DEV environment, then create the TEST environment to test. Once that passes, I create the UAT (customer test) environment. Let's say the customer did not like c3 and asked us to push only c1 and c2. In this case, I have to recreate the DEV, TEST and UAT environments and retest, because removing c3 might affect other code as well. I have to send the code to UAT again because at that point I have repackaged it, so it needs a new UAT.
I am looking for a way to reduce the number of times I send the same code to UAT. Technically, I am not supposed to send the same code to UAT again.
I was thinking about pushing each change individually rather than packaging them together; this would remove the redundancy in UAT but would add more work for the test team, which would lead to a bottleneck.
PS. I cannot create automated tests, because the changes are mostly about graphics and visuals. Also, there are thousands of pages to test; it just doesn't make sense to write test scripts for everything. Are there any suggestions?
Technically you're not sending the same code to UAT: after the c3 rejection you're sending back c1+c2, not c1+c2+c3 - not the same code.
Unfortunately, with an integration solution based on post-commit verification there is no truly deterministic way to minimize the number of UAT submissions, because you have no way of knowing in advance which commit will cause a UAT rejection.
As you noticed, the most predictable way of moving forward is also the most costly: to run UAT for every change. The only way to reduce the UAT submissions is to submit multiple changes bundled together - the larger the bundle, the fewer UAT submissions. But this raises a conflict: the chances of failing UAT also increase with the bundle size, and so does the number of bisection retries required to identify the culprit (assuming there is only one per bundle; if there are several, it's even worse).
I'd run an analysis on the UAT submissions in the most recent 2-4 weeks or so and determine at what bundle size the probability of a UAT rejection reaches something like 30-50%, then pick the max power of 2 below that value (better for the bisection you'd need to perform in case of failure). Say, for example, that the analysis suggests a value of 5, then pick 4 as the bundle size.
If you don't have enough changes to fill the bundle, I'd suggest again picking the max power of 2 and leaving the rest for the next bundle - other changes may be merged in the meantime and maybe you can fill that next bundle. The exception is if you already know about dependencies between changesets that require them to go together in the same bundle, which may land you in between the preferred values. It's up to you whether you pick up those dependencies (higher risk) or leave all of them for the next bundle.
You should also keep monitoring the bundle size vs chances of UAT rejection trends (both product and UAT evolve, things change) - you may need to adjust the preferred bundle size from time to time.
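As a rough sketch of that sizing analysis (assuming rejections of individual changes are roughly independent, which they never fully are; all numbers are placeholders):

def bundle_size(per_change_rejection_prob, threshold=0.4, max_size=64):
    # largest power-of-2 bundle whose overall rejection probability
    # stays below the chosen threshold (e.g. somewhere in the 30-50% range)
    size = 1
    while size < max_size:
        candidate = size * 2
        p_reject = 1 - (1 - per_change_rejection_prob) ** candidate
        if p_reject > threshold:
            break
        size = candidate
    return size

print(bundle_size(0.10))   # ~10% rejection rate per change -> bundle of 4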
Side comment: you can always build some custom UAT wrapper script(s) to make it appear as an automated test that you can hook up into a CI/CD pipeline. Only it will have nondeterministic queue wait and/or execution times. And, if its execution is indeed manual, it can also be less reliable.
I have been running an OpenLDAP N-Way multi-master setup with two master nodes. This configuration has been running for some months without any problem. https://www.openldap.org/doc/admin24/replication.html
Now I need to add a third node. The strange behavior is that when I add this third node (with a clean database), it starts to delete the entries on the other two nodes.
It seems that the "clean database" gets replicated to the other servers, deleting several entries. This is creating a lot of problems, and even restoring the backups is difficult.
I am looking for the best practice/way to add a completely new node to this already-running environment without losing data.
Also, is there any official documentation about the best way to back up this environment?
Any information is welcome.
Thank you,
Our team has hundreds of integration tests that hit a database and verify results. I've got two base classes for all the integration tests, one for retrieve-only tests and one for create/update/delete tests. The retrieve-only base class regenerates the database during the TestFixtureSetup so it only executes once per test class. The CUD base class regenerates the database before each test. Each repository class has its own corresponding test class.
As you can imagine, this whole thing takes quite some time (approaching 7-8 minutes to run and growing quickly). Having this run as part of our CI (CruiseControl.Net) is not a problem, but running locally takes a long time and really prohibits running them before committing code.
My question is: are there any best practices to help speed up the execution of these types of integration tests?
I'm unable to execute them in-memory (a la sqlite) because we use some database specific functionality (computed columns, etc.) that aren't supported in sqlite.
Also, the whole team has to be able to execute them, so running them on a local instance of SQL Server Express or something could be error prone unless the connection strings are all the same for those instances.
How are you accomplishing this in your shop and what works well?
Thanks!
Keep your fast (unit) and slow (integration) tests separate, so that you can run them separately. Use whatever method for grouping/categorizing the tests is provided by your testing framework. If the testing framework does not support grouping the tests, move the integration tests into a separate module that has only integration tests.
The fast tests should take only some seconds to run all of them and should have high code coverage. These kind of tests allow the developers to refactor ruthlessly, because they can do a small change and run all the tests and be very confident that the change did not break anything.
The slow tests can take many minutes to run and they will make sure that the individual components work together right. When the developers do changes that might possibly break something which is tested by the integration tests but not the unit tests, they should run those integration tests before committing. Otherwise, the slow tests are run by the CI server.
In NUnit you can decorate your test classes (or methods) with an attribute, e.g.:
[Category("Integration")]
public class SomeTestFixture{
...
}
[Category("Unit")]
public class SomeOtherTestFixture{
...
}
You can then stipulate in the build process on the server that all categories get run and just require that your developers run a subset of the available test categories. What categories they are required to run would depend on things you will understand better than I will. But the gist is that they are able to test at the unit level and the server handles the integration tests.
I'm a java developer but have dealt with a similar problem. I found that running a local database instance works well because of the speed (no data to send over the network) and because this way you don't have contention on your integration test database.
The general approach we use to solving this problem is to set up the build scripts to read the database connection strings from a configuration file, and then set up one file per environment. For example, one file for WORKSTATION, another for CI. Then you set up the build scripts to read the config file based on the specified environment. So builds running on a developer workstation run using the WORKSTATION configuration, and builds running in the CI environment use the CI settings.
It also helps tremendously if the entire database schema can be created from a single script, so each developer can quickly set up a local database for testing. You can even extend this concept to the next level and add the database setup script to the build process, so the entire database setup can be scripted to keep up with changes in the database schema.
We have an SQL Server Express instance with the same DB definition running for every dev machine as part of the dev environment. With Windows authentication the connection strings are stable - no username/password in the string.
What we would really like to do, but haven't yet, is see if we can get our system to run on SQL Server Compact Edition, which is like SQLite with SQL Server's engine. Then we could run them in-memory, and possibly in parallel as well (with multiple processes).
Have you done any measurements (using timers or similar) to determine where the tests spend most of their time?
If you already know that the database recreation is why they're time-consuming, a different approach would be to regenerate the database once and use transactions to preserve state between tests. Each CUD-type test starts a transaction in setup and performs a rollback in teardown. This can significantly reduce the time spent on database setup for each test, since a transaction rollback is cheaper than a full database recreation.
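The question is about NUnit/.NET, but the transaction-per-test pattern looks the same in any stack; a minimal sketch (using Python's unittest and a hypothetical connection factory and table, purely to show the shape of it):

import unittest
from myproject.db import connect   # hypothetical: returns a DB-API connection

class CudRepositoryTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.conn = connect()            # database regenerated once, outside the tests

    def setUp(self):
        self.cur = self.conn.cursor()   # DB-API: an implicit transaction starts here

    def tearDown(self):
        self.conn.rollback()            # throw away whatever the test wrote

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def test_insert_customer(self):
        # "Customers" is a placeholder table for illustration only
        self.cur.execute("INSERT INTO Customers (Name) VALUES ('test')")
        self.cur.execute("SELECT COUNT(*) FROM Customers WHERE Name = 'test'")
        self.assertEqual(self.cur.fetchone()[0], 1)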