Reusing tasks (operators) across DAGs - Airflow

I'm exploring Airflow as a workflow execution engine.
I like that it's very flexible, allowing many kinds of operators, such as Python functions. However, I'm afraid I may be missing something fundamental: task reuse. I want to run existing operators in multiple DAGs without having to redefine them.
As far as I can tell, this is not supported. Am I wrong? If so, I'd be happy if someone could point me to a solution.
The only (awkward) solution that comes to mind is to have a dummy DAG for every operator, and then build my DAG on top of these dummy DAGs with a TriggerDagRunOperator.
Many thanks!

The recommended way to achieve this would be to create your own Airflow plugin.
(From the Airflow Documentation on Plugins)
Airflow has a simple plugin manager built-in that can integrate external features to its core by simply dropping files in your $AIRFLOW_HOME/plugins folder.
The python modules in the plugins folder get imported, and hooks, operators, sensors, macros, executors and web views get integrated to Airflow’s main collections and become available for use.
So if you were to create a custom operator in your plugin, you would be able to re-use that same operator across multiple DAGs.
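For illustration, a minimal sketch of what that could look like (the file, class, and parameter names here are placeholders, and the details vary a bit between Airflow versions):

    # $AIRFLOW_HOME/plugins/my_plugin.py
    from airflow.models import BaseOperator
    from airflow.plugins_manager import AirflowPlugin

    class HelloOperator(BaseOperator):
        # Toy operator; the "name" parameter is just an example.
        def __init__(self, name, *args, **kwargs):
            super(HelloOperator, self).__init__(*args, **kwargs)
            self.name = name

        def execute(self, context):
            self.log.info("Hello, %s", self.name)

    class MyPlugin(AirflowPlugin):
        # Registering the operator makes the plugin manager pick it up.
        name = "my_plugin"
        operators = [HelloOperator]

Any DAG file can then instantiate HelloOperator (on older Airflow via the plugin import path airflow.operators.my_plugin, on newer versions by importing the module directly), so the operator is defined once and reused everywhere.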
This repo may also be helpful as it has quite a few examples of Airflow plugins: https://github.com/airflow-plugins

Related

Alfresco Content Services - Extensions/AMP/customization - How does it work?

I have recently started learning about Alfresco Content Services.
I have some questions:
My understanding is that the standard way to add customizations is to create AMPs.
Why create an AMP for each customization instead of adding it directly to the ACS configuration? Are there benefits, such as not having to restart the service?
If apply_amps adds all custom AMPs to the Alfresco server (.war files), isn't there a risk of customizations overwriting each other?
E.g. if two different AMPs change the same standard button in the Share service.
I have found that there are two ways to add these customizations as well:
Add a dependency to the pom file (works only for .jar).
Actually compile the .amp, move it to the correct folder, and run apply_amps.sh.
From the documentation it seems to me that AMP files used to be the standard way of adding customizations, but that there has now been a move away from this in favor of regular JAR files, and, from 7.1 onward, of using JSON instead.
Yet other tutorials I find mention things like "always use .amp", which seems strange if it contradicts the official documentation.
I also found something about adding AMPs through the Share interface? Or must they always be added when building the server (.war)?
Could someone provide a thorough explanation of the best practice for applying customizations to Alfresco Content Services? Preferably with details regarding a live production setting.
Thanks for helping me make some of this clearer.
I'll try to give you helpful answers:
Making app packages (AMPs or JARs) is much better than changing the configuration manually. It helps with versioning, portability (TEST vs. PROD, or between projects), and composition (you can add addons, which are often very useful). It is the standard, sound way to build a web app.
As for conflicts between customizations, I'm not sure exactly how that works. Good practice is to always use your own namespace for every AMP.
If AMPs write to the same file, the result is always an append (share-config-custom.xml can get very big).
The JAR vs. AMP question is simple: old versions of Alfresco supported more through AMPs than through JARs. Nowadays it doesn't matter which way you go. Try looking inside these packages; they are structured very similarly.
I have never heard of adding AMPs through the Share interface. Do you have a source? The only similar thing I know of is creating a content model through the Model Manager (https://docs.alfresco.com/content-services/latest/tutorial/model/).
For PROD I use a combination of AMPs and JARs. I have a lot of legacy code and addons in AMPs and new things in JARs. Alfresco works with them the same way...

Compose a pipeline with other pipelines

I am currently trying to choose between numerous workflow frameworks. One feature is important to me: workflow composition.
I found nothing about it in the documentation, neither in the API reference nor in the advanced tutorial.
So my question is: is it possible to compose pipelines? I.e., to build DAGs out of already-written ones. There may be some workarounds, but I am interested in native support.
Thanks
I think you may be looking for the composite solid abstraction, which lets you compose sub-DAGs. https://docs.dagster.io/tutorial/advanced_solids#composite-solids https://docs.dagster.io/_apidocs/solids#composing-solids
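For illustration, here is a minimal sketch using the composite_solid API from those docs (the solid names are made up; newer Dagster versions express the same idea as a @graph composed of @ops):

    from dagster import composite_solid, execute_pipeline, pipeline, solid

    @solid
    def emit_one(_):
        return 1

    @solid
    def add_one(_, x):
        return x + 1

    # A composite solid is a reusable sub-DAG built out of other solids.
    @composite_solid
    def emit_then_add():
        return add_one(emit_one())

    # The sub-DAG then composes into a larger pipeline like any single solid.
    @pipeline
    def my_pipeline():
        add_one(emit_then_add())

    if __name__ == "__main__":
        execute_pipeline(my_pipeline)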

Deploying wordpress as AWS lambda functions?

I am wondering if it is feasible to deploy WordPress as a series of Lambda functions behind AWS API Gateway. Any pointers on the feasibility/gotchas would be greatly appreciated!
Thanks in advance,
PKK
You'll have a lot of things to consider with persistence, and even before that, Lambda doesn't natively support PHP. I'd probably look at Microsoft Azure Functions instead, which do support PHP and do have persistent storage.
While other languages (such as Go, Rust, Swift etc.) can be "wrapped" to run in AWS Lambda with relative ease, compiling PHP targeting the same platform and running it is a bit different (and certainly more painstaking). Think about all the various PHP modules you'd need for starters. Moreover, I can't imagine performance will be as good as something like a Go binary.
If you can do something clever with the Phalcon framework and come up with an easy build and deploy process, then maayyyybee.
Though, you'd probably need to really overhaul something like WordPress which was not designed for this at all. It still uses some pretty old conventions due to the age of the project and while that is all well and good for your typical PHP server, it's a different ball game in the sense of this "portable" PHP installation.
Keep in mind that PHP sessions are relied upon as well and so you're going to need to move those elsewhere due to the lack of persistence with AWS Lambda. You can probably find some sort of plugin for WordPress that works with Redis?? I have to imagine something like that has been built by now... But there will be many complications.
I would seriously consider using Azure Functions to begin with OR using Docker and forgoing the pricing model that cloud functions offers. You can still find some pretty cheap and scalable hosting out there.
What I've done previously was use AWS ECS (Docker) with EFS (network storage) for persistence and RDS for the database. While this doesn't carry the same pricing model as Lambda, it is still cost efficient. You can set up your ECS Service to autoscale up and down. So that way you're running the bare minimum until you need more.
I've written a more in-depth article about it here: https://serifandsemaphore.io/how-to-host-wordpress-like-a-boss-b5993fcfbd8e#.n6fbnf8ii ... but it's basically just the idea of running WordPress in Docker and using EFS to offload the persistent storage issues. You can swap many of the pieces of the puzzle out if you like: use a database hosted in some other Docker service, or Docker Compose, or wherever. That part need not be RDS, for example. Even your storage could be handled in a different way, though EFS worked pretty well! The only major thing to note about EFS is the write speed. Most WordPress sites are read-heavy, though. Your mileage will vary depending on your needs.
Is it possible? Yes, anything is possible with enough time and effort. Is it worth it? That is a question best to ask yourself.
PHP can be run on Lambda as per the documentation located here: https://aws.amazon.com/blogs/compute/scripting-languages-for-aws-lambda-running-php-ruby-and-go/ .
The bigger initial problem as stated in other comments is a persistent file system. S3 for media storage is doable via Wordpress plugin (again from the comments) but any other persistent storage for the request / script execution is the initial biggest hurdle. Tackle one problem at a time till you get to the end!
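To make the wrapper idea from that post concrete, here is a rough sketch of the process-spawning approach (in Python for brevity; the bundled ./php binary and handler.php script are assumptions, not something from the post):

    import subprocess

    def lambda_handler(event, context):
        # Shell out to a statically compiled PHP binary bundled in the
        # deployment package; the PHP script does the real work.
        proc = subprocess.run(
            ["./php", "handler.php"],
            input=str(event),
            capture_output=True,
            text=True,
            check=True,
        )
        return {"statusCode": 200, "body": proc.stdout}

Even with this working, the persistence caveats above still apply: sessions, uploads, and the rest of WordPress's on-disk state have to live somewhere else.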

Is It Possible To Use Celery With Another Programming Language?

I heard about Celery and I really like it. But now I'm writing an application with node.js in which I have to manage (asynchronous) tasks, and I want to use Celery for this. I've installed it in my development environment and played around with some Python scripts. It all works well, but I want to "call" the tasks from node.js. Has anyone tried to do something like this (with any programming language)?
I saw this example, but the basis of that HTTP gateway idea is a Django application, and I don't want to create a Django app only to handle these calls.
I thought about creating a SimpleXMLRPCServer and using the node-xmlrpc module to connect to it. What do you think? Is there a better way to do this? Is there another app or service that works natively with node.js?
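Roughly what I have in mind is something like this sketch (the add task, module name, and port are just examples):

    # bridge.py - expose a Celery task over XML-RPC for a node-xmlrpc client
    from xmlrpc.server import SimpleXMLRPCServer  # module SimpleXMLRPCServer on Python 2

    from tasks import add  # assumed Celery task, e.g. @app.task def add(x, y): ...

    def enqueue_add(x, y):
        # .delay() only enqueues the task on the broker and returns at once;
        # the task id lets the node.js side poll for the result later.
        return add.delay(x, y).id

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(enqueue_add, "add")
    server.serve_forever()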
Thanks in advance.
Celery will force you to inherit a whole Python stack for a simple message queue - seems like a messy pain to me. Check out coffee-resque for a simple and native solution.

Wordpress pages and version control

We are a software development company and are using WordPress for the static portion of our web site. Naturally, all our workflow is built around version control: multiple developers -> continuous integration -> staging -> deployment.
Our challenge with integrating WordPress into our workflow is that its database is stuck like a bone in the throat: you cannot put it into version control, easily roll back, promote from staging to production, etc.
I am wondering what people do in similar situations? I would like to find a way to integrate WP into the development workflow, and not the other way around :-)
Clarification: we want to "develop" and test pages on the staging system and, when ready, move them over to production as part of the version upgrade process. We don't want to do a full replication of the staging database to production.
That's a common question and one that I've worked on tackling. I've written some code to address these issues albeit the code's not ready for distribution. Basically the idea is to create scripts to import the content and then version control the scripts. (Actually my approach uses a custom import/export format designed to be easy to hand-modify, but the idea is similar.)
Anyway, there are some related questions over on StackOverflow's sister site WordPress Answers:
Questions tagged with the term [staging]
Questions tagged with the term [deploy]
UPDATE
Per the clarification, this would probably be helpful too:
Is there any way to draft a revision of a published page or post? What workarounds have you used?
Hope this helps.
-Mike
I've just hit the same problem. For now we are using MySQL dump files to export/import database content, but it gets ugly with several people working on database changes.
Since the team that works on the project is all internal and consists of just a few people, I'm thinking in the direction of locking the database dump file in VCS. Subversion had this functionality built in, but we are using git, which, I think, is conceptually opposed to any kind of locking.
Probably we'll have a workaround script with a pre-commit hook that checks for the existence of a lock file next to the dump. The person who committed the lock file will be the only one allowed to commit the dump. Once they finish the work, they will need to commit the removal of the lock file.
It sounds ugly, I know. But I've thought about it for a while and don't see an elegant solution yet.
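For what it's worth, the hook could be as simple as this sketch (the file names and lock convention are made up):

    #!/usr/bin/env python3
    # pre-commit: refuse to commit the dump unless you hold the lock file.
    import os
    import subprocess
    import sys

    DUMP = "db.dump"
    LOCK = "db.dump.lock"  # contains the git user.name of the lock holder

    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    if DUMP in staged:
        if not os.path.exists(LOCK):
            sys.exit("%s is staged but %s is missing; take the lock first." % (DUMP, LOCK))
        me = subprocess.run(
            ["git", "config", "user.name"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        holder = open(LOCK).read().strip()
        if holder != me:
            sys.exit("%s is locked by %s; only they may commit it." % (DUMP, holder))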
If you're only using WordPress for static content, then any tool/methodology for version-controlling databases should work - for example, work the mysql command-line tools into your CI and deployment routines.
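As a sketch of what that could look like in a deploy script (the credentials and database name are placeholders):

    # dump_db.py - refresh the version-controlled snapshot of the WP database
    import subprocess

    with open("wordpress.sql", "w") as out:
        subprocess.run(
            ["mysqldump",
             "--skip-extended-insert",  # one row per INSERT -> readable diffs
             "-u", "wp_user", "-pPASSWORD", "wordpress"],
            stdout=out,
            check=True,
        )

The --skip-extended-insert flag matters here: it keeps each row on its own line, so diffs of the committed dump stay meaningful.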
