Spring Cloud Data Flow and Airflow

We use Airflow as a workflow-management tool to schedule and monitor tasks, and we also have some applications using Spring Cloud Data Flow (SCDF) for loose coupling across processes, with producers and consumers talking over a Kafka message bus and Grafana dashboards for the UI (ETL). Kubernetes and AWS (EKS) are our deployment options.
We are starting to create data pipelines that will have sources (files on S3, a server, or databases), processors (custom applications, AI/ML pipelines), and destinations (Kafka, S3, databases, ES). I am planning to use Airflow for the overall management of pipelines, with the tasks within a pipeline handled by SCDF-based applications or, as the AI/ML piece expands, future applications written in Python. Is this the correct approach, or can I let go of one in favour of the other?

Based on your requirements, SCDF would fit well and gives you options to manage your streaming data pipelines.
While you can still research other possible approaches, I can provide some hints on what SCDF offers to meet some of your requirements.
SCDF provides out-of-the-box apps which you can extend and customize. These include an S3 source and an S3 sink that you can use as-is. For the complete list of out-of-the-box apps, you can refer to the page here.
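For example, a minimal stream definition in the SCDF shell that wires the out-of-the-box s3 source through the transform processor into the log sink might look like this (the stream name and the processing step are placeholders for your own choices):

    stream create s3-ingest --definition "s3 | transform | log"

In a real pipeline you would swap transform for your custom processor application and log for a sink such as s3 or jdbc.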
SCDF also has a Kubernetes deployer, which lets you work on any Kubernetes-based platform. You can configure your K8s-specific properties as a set of Kubernetes deployer properties when you deploy the applications.
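For instance, deployer properties are passed per application at deployment time; a minimal sketch, assuming the s3-ingest stream above and resource limits on its transform step (the values are placeholders):

    stream deploy s3-ingest --properties "deployer.transform.kubernetes.limits.cpu=500m,deployer.transform.kubernetes.limits.memory=1024Mi"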
You can embed a Python-based application as a processor/transformer in the streaming data pipeline. You can check this recipe from the SCDF site to know more about this.
You can also embed a TensorFlow application as a processor application inside a pipeline. In both cases the custom app needs to be registered with SCDF before it can appear in a stream definition (see the sketch below).
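As an illustration, registering a custom processor from the SCDF shell might look like this, assuming a hypothetical Docker image named myorg/python-processor:

    app register --name python-processor --type processor --uri docker:myorg/python-processor:latest

Once registered, python-processor can be used in a stream definition like any out-of-the-box app.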

Related

Monitoring HTTP Traffic in Kubernetes

I need to monitor HTTP traffic within my Kubernetes cluster. Specifically, I need response times, status codes, etc.
Currently, I am using a service mesh (Open Service Mesh) for this purpose. But is there a lightweight solution that only does monitoring (without a security layer, etc.)?
Thanks for all ideas!
I can recommend the EFK stack (Elasticsearch, Fluent Bit, and Kibana), one of the popular monitoring suites on top of Kubernetes.
Fluent Bit has built-in parsers for the logs of popular web servers (Apache, Nginx); a minimal configuration sketch follows below.
Fluent Bit is a super fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments.
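To illustrate, here is a minimal Fluent Bit configuration sketch, assuming Nginx access logs tailed from the container log path and shipped to an in-cluster Elasticsearch service; the host name and index are placeholders:

    [INPUT]
        Name      tail
        Path      /var/log/containers/*nginx*.log
        Tag       nginx.*

    [FILTER]
        Name      parser
        Match     nginx.*
        Key_Name  log
        Parser    nginx

    [OUTPUT]
        Name      es
        Match     nginx.*
        Host      elasticsearch.logging.svc
        Port      9200
        Index     http-traffic

Status codes from the parsed access logs then become queryable in Kibana; response times require adding a timing field (e.g. request_time) to the Nginx log format.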
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning-fast search, fine-tuned relevancy, and powerful analytics that scale with ease.
Kibana
Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack. Do anything from tracking query load to understanding the way requests flow through your apps.

What is the recommended way to interface with a CorDapp via an API?

I am designing a CorDapp, which would require user input as well as API integration, and I am considering various approaches to expose flows and vault queries to the outside world.
Default option seems to be to use Corda RPC. Unless I missed something, there are only Java bindings for it, which is effectively restricting the clients to only be JVM-based. This is somewhat inconvenient, and ideally I would like something like OpenAPI to make it more open and implementation-agnostic.
Another option is to use some kind of Corda RPC to OpenAPI proxy. I know about Braid, and I'm sure there are others. Braid seems to support deployment as a Corda service packed together with the flows into the CorDapp itself, effectively making it running embedded into the Corda JVM.
Braid can be deployed as a standalone proxy too, which I suppose is option three.
Instinctively I find the embedded mode more attractive, as it reduces the number of moving parts compared to standalone mode. However, I am concerned that such a model may in fact become discouraged at some point, either because Corda developers consider it a misuse of the services facility, or because some organisations will not be keen to deploy such code onto their nodes, especially when they may be running multiple CorDapps. I would imagine anything deployed as part of the Corda JVM would at least require more scrutiny due to its potential impact on everything else running there, which in turn would reduce agility.
So, which approach to integrating with a CorDapp is actually recommended?
Edit 1: I know it is technically possible to embed a webserver into the node and expose a REST API from there, at least in the current version of Corda (4.3 at the time of writing). The question is more about whether it is a good idea to do so, or not, and why.
Take a look at the question I asked on Stack Overflow regarding a front end for a CorDapp; it might be of some help. The question is:
"Corda: Can we develop Dapps that will be run by IIS webserver to talk to Corda platform?"
The question and its answer are reproduced below.

Corda: Can we develop Dapps that will be run by IIS webserver to talk to Corda platform?

We have used "Yo!CorDapp" example (https://github.com/corda/spring-observable-stream) to build a POC.
In this POC, can we replace Angular with .NET for the front end and use an IIS webserver in place of the Spring Boot webserver to talk to the Corda platform?
Thanks
You can use any front-end technology you want.
As of Corda 3, your backend must be JVM-based, for two reasons:
You need to load various flow, state and other class definitions onto the classpath to pass as arguments to flows, retrieve objects from the vault, etc.
You need to use the CordaRPCClient library to create an RPC connection to the node (a minimal sketch follows this list)
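For illustration only, here is a hedged sketch of opening an RPC connection from a JVM client; the node address and credentials are placeholders for the rpcSettings and rpcUsers configured in your node.conf:

    import net.corda.client.rpc.CordaRPCClient;
    import net.corda.client.rpc.CordaRPCConnection;
    import net.corda.core.messaging.CordaRPCOps;
    import net.corda.core.utilities.NetworkHostAndPort;

    public class RpcClientSketch {
        public static void main(String[] args) {
            // Placeholder RPC address; match the rpcSettings in node.conf.
            NetworkHostAndPort nodeAddress = NetworkHostAndPort.parse("localhost:10006");
            CordaRPCClient client = new CordaRPCClient(nodeAddress);
            // Placeholder credentials from the node's rpcUsers list.
            CordaRPCConnection connection = client.start("user1", "password");
            try {
                // CordaRPCOps is the interface used for flows and vault queries.
                CordaRPCOps proxy = connection.getProxy();
                System.out.println(proxy.nodeInfo());
            } finally {
                connection.notifyServerAndClose();
            }
        }
    }

A webserver written in any JVM language can hold such a connection open and translate incoming HTTP requests into proxy calls (startFlowDynamic, vaultQuery, and so on).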
If you really need to write your back-end in another language, there are a few workarounds:
Create a thin Java webserver that sits between your main webserver and the node. The Java webserver translates HTTP requests from the main webserver into RPC calls to the node, and RPC responses from the node into HTTP responses to the main webserver
This is the approach taken by libraries such as Braid
Use a library such as GraalVM to compile non-JVM languages to JVM bytecode
An example of writing a JVM webserver in Javascript using GraalVM is available here: https://github.com/nitesh7sid/cordapp-example-nodejs-server-graalvm

Distributing Axon Framework events using Amazon messaging services

Axon Framework supports distributing events, which should allow posting events to an external message broker and reading events from the message broker.
Amazon provides two different messaging services
Amazon Simple Queue Service (SQS)
Amazon MQ
Questions:
What Amazon messaging service might be used (SQS, MQ or may be both) as message queue with Axon Framework?
What is the best practice for implementing distributed Axon message processing in Amazon EC2 cloud?
It looks like Amazon MQ should work fine with Axon Framework as the message queue because it supports AMQP, but I failed to find any references to practical experience using MQ with Axon.
Any messaging service implementation that supports AMQP should work just fine with the Axon Framework, more specifically together with the axon-amqp dependency, as you might already have found out.
Without personally having any experience with Amazon SQS or MQ, I did a quick search on both, and it seems Amazon MQ is indeed the go-to solution for distributing your events on Amazon EC2.
Sadly, I do not have, nor have I heard of, any best practices for distributing your Axon Framework events over Amazon EC2 specifically, but I can share this:
All the Axon Framework does (once you add the axon-amqp dependency) is subscribe to the EventBus and publish any incoming events on a queue; no further specifics (a configuration sketch follows below).
Hence, I'd say any best practices for general use of Amazon MQ should apply to your second question.
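For reference, a minimal sketch of that wiring, assuming the Axon 3.x axon-amqp module and Spring AMQP; the broker endpoint and exchange name are placeholders for your Amazon MQ broker:

    import org.axonframework.amqp.eventhandling.spring.SpringAMQPPublisher;
    import org.axonframework.eventhandling.EventBus;
    import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class AmqpEventPublisherConfig {

        // Subscribes to the EventBus and forwards every published event
        // to the named AMQP exchange on the configured broker.
        @Bean(initMethod = "start", destroyMethod = "shutDown")
        public SpringAMQPPublisher amqpPublisher(EventBus eventBus) {
            SpringAMQPPublisher publisher = new SpringAMQPPublisher(eventBus);
            // Placeholder endpoint; use your broker's AMQP endpoint here.
            publisher.setConnectionFactory(new CachingConnectionFactory("my-broker-endpoint"));
            publisher.setExchangeName("axon.events"); // placeholder exchange name
            return publisher;
        }
    }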

Does Azure Service Fabric do the same thing as Docker?

My thinking is that people use Docker to be sure that the local environment is the same as production, and so that they can stop thinking about where their apps are physically running; balancing mechanisms should just allocate apps to the best places for that moment.
We're 100% web-based and going to move to the cloud together with our databases, and what cannot be moved will be seamlessly bridged, so the corporate network and the cloud will become one subnetwork.
And so I'm wondering: maybe Service Fabric already does the same thing that Docker does, plus it gives us an address-translation service (fabric://, which acts a bit like DNS for the processes in fabric space), plus (important for some) it encourages on-demand worker allocation, a huge scalability perk.
Can Service Fabric successfully replace Docker?
Is it gaining audience and acceptance? Because otherwise even the greatest invention can fail.
It's confusing since Docker (the company) is trying to stake claims in everything cloud.
Docker Engine (what most people call "Docker") is a containerization technology. It can give you
Process isolation
Network isolation
Consistent application environment
Docker Hub is an image registry. It stores Docker images so you can download them as part of your deployment.
Docker Cloud is an orchestration system for Docker. It can give you
Scaling of your applications up and down
Connections between your applications
CI testing, integrated with Docker Hub (this isn't part of orchestration, just another thing it does)
Service Fabric is an orchestration system. It can orchestrate Docker containers, but it can also integrate more tightly with your services if you build specifically for Fabric. (Docker is completely agnostic about what runs inside a container.)
So Service Fabric is mostly comparable to Docker Cloud, though it's not an exact match. There are some other Docker-based orchestration solutions (Kubernetes is probably the biggest) and there are other cloud-based micro-service solutions (Heroku is probably the best-known).
The primary disadvantage of Service Fabric is that it's a Microsoft technology and so you're going to be tied to Azure to a greater degree than if you were running Docker. The other is that Docker has a broader range of choices for building your stack: all three Docker-things I listed above have at least one open-source alternative (this is also a big disadvantage of Docker, since nobody's laying out a single Best Practices For You document).
If you love Microsoft and if cobbling systems together is not something that's important to you, then Service Fabric should be a fine alternative to the Docker ecosystem. (And you can still run Docker containers under it.)
The key similarities between Service Fabric and Docker containerization:
Both Docker and SF can create an immutable image out of your microservice implementation, on both platforms, Linux and Windows.
Both Docker and SF can orchestrate your containerized application within a cluster of VMs. These VMs can be anywhere: public cloud, private cloud, or your own data center. Note that both are cloud-platform agnostic, meaning they have no strong affinity to any particular cloud service; as long as you are not using a cloud-specific feature within your microservice, you should be fine.
Both Docker and SF exhibit the essential capabilities of an orchestration platform: service discovery, service-level load balancing, network-level isolation between services, failover handling, replication control, etc.
The key differences between Service Fabric and Docker containerization:
A Docker container is essentially a deployment/packaging construct. Docker doesn't dictate what you package within a container as part of your service implementation, nor does it provide any programming construct for implementing your kind of service. Service Fabric, by contrast, provides programming constructs in the form of base types/interfaces from which your service implementation starts, with a certain kind of service declared: stateful service, stateless service, or virtual actor.
In the Docker world everything is a container, i.e. the minimum deployment/orchestration unit is a container, so it doesn't recognize or support an individual process. In SF, by contrast, there is a provision whereby a microservice derived from a stateless/stateful service can be orchestrated and governed as a process. However, SF also supports container orchestration the way Docker does, and the latest version of SF allows packaging your stateful/stateless service within a container.
With the above facts in mind, note that SF has no strong affinity to any cloud provider. It can run equally well on any public cloud (Azure, AWS, or GCP), as long as you can create VMs with the desired platform.
They are not comparable at all. With Service Fabric you get health monitoring, code integration with the fabric, logging, monitoring, load balancing, and other intelligent features; your application can even execute shutdown code. Service Fabric is not just for Microsoft technologies: Docker can reside inside SF, and so can rkt or a Unix OS. Security and networking features (in line with web apps) are another plus, and Reliable Collections are simply brilliant. And a roadmap to better application building and performance is all but guaranteed for the companies adopting it (history says so).
This question heavily favors the 'greatest invention', Docker. Such a comparison may be good for Docker marketing, but no one will replace SF with Docker. Docker is just a slim OS copy (nothing to do with services, applications, or intelligence). Docker has nothing to do with application development; that was never the intention. People simply started to feel the need for isolation and sharing, and that is what Docker is all about.
