Microservices orchestration choices - airflow

I am exploring the possible solutions for orchestrating my flows across multiple services via some infrastructure. Searching shows me a few options such as Conductor, Camunda, Airflow etc.
I am wondering which would fit my use case better.
One of my services is in Java, the other is in Python
I need to pass info to the Java service, then take the output and pass it to the Python service
Final output is then published to another queue
It feels like Conductor is a good choice, but would love to hear your inputs!

All options can fulfill the stated requirement. Think about further and future requirements. Is it only a data pipe? Is it about orchestrating a larger end-to-end business process? Do you need support for long-running processes? Is end-to-end transparency in graphical form a benefit? Is graphical process modelling in the BPMN 2.0 standard going to be a benefit? Are there going to be audit or reporting requirements? Or is it going to be a simple, isolated, technical solution?
This article gives a great overview of tools in the market and what their primary use cases are: https://blog.bernd-ruecker.com/understanding-the-process-automation-landscape-9406fe019d93

All listed tools might technically be able to execute your workflow (I have no experience working with Conductor & Camunda). A few characteristics on which a decision is usually made are:
open vs closed source
how do you define workflows? (e.g. Python code in Airflow. Others use e.g. JSON/XML/something custom)
does it come with a UI?
can it scale out in case my workloads start growing?
is it agnostic to any technology or limited to running certain technologies? (e.g. Oozie is built for scheduling jobs on Hadoop)
other requirements could be e.g. security, logging, monitoring, etc.
There are many orchestration tool comparisons available online.
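Stripped of the tooling, the flow described in the question is a short sequence: call the Java service, hand its output to the Python service, publish the result. The sketch below shows that sequence in plain Python, assuming hypothetical stand-ins `call_java_service` and `call_python_service` for real HTTP/gRPC/queue clients and a plain list in place of the output queue. It is not how Conductor, Camunda or Airflow define workflows; those tools add things like retries, persisted state and a UI around this kind of sequence, which is where the questions above start to matter.

```python
from typing import Callable

Step = Callable[[dict], dict]

def run_flow(payload: dict, steps: list[tuple[str, Step]], publish: Callable[[dict], None]) -> dict:
    """Run each step in order, passing its output to the next step, then publish the final result."""
    data = payload
    for name, step in steps:
        try:
            data = step(data)
        except Exception as exc:
            # A real orchestrator would retry or record the failure; this sketch only names the failed step.
            raise RuntimeError(f"step {name!r} failed") from exc
    publish(data)
    return data

# Hypothetical stand-ins for the real service clients (HTTP, gRPC, queue, ...).
def call_java_service(data: dict) -> dict:
    return {**data, "enriched": True}

def call_python_service(data: dict) -> dict:
    return {**data, "score": len(data["text"])}

if __name__ == "__main__":
    output_queue: list[dict] = []  # stands in for the real output queue
    result = run_flow(
        {"text": "hello"},
        [("java-service", call_java_service), ("python-service", call_python_service)],
        output_queue.append,
    )
    print(result)
```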

Introduction to Container Orchestration
The practice of automating the administration of container-based microservice applications across different clusters is known as container orchestration. The approach is gaining popularity within corporations, and a variety of container orchestration technologies have become indispensable in deploying microservice-based applications.
Software development in the modern era is no longer monolithic. Instead, it produces component-based apps that run across many containers. These adaptable, scalable containers work together, each fulfilling a specific purpose or microservice.
Depending on the complexity of the application and other requirements like load balancing, they may span many clusters.
Containers encapsulate application code along with its dependencies. To run efficiently, they draw the resources they need from physical or virtual hosts. When complex systems are built as containers, deploying them as a cluster requires proper management and prioritization.
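One core job of an orchestrator is deciding which host each container runs on, given the resources it requests. As a toy illustration only (real schedulers such as the Kubernetes scheduler weigh many more factors), here is a first-fit placement sketch; the host names, resource figures and the `place` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Host:
    name: str
    cpu: float      # free CPU cores
    mem_mb: int     # free memory in MB
    containers: list[str] = field(default_factory=list)

def place(containers: list[tuple[str, float, int]], hosts: list[Host]) -> dict[str, Optional[str]]:
    """Toy first-fit placement: each container goes to the first host with enough free CPU and memory."""
    placement: dict[str, Optional[str]] = {}
    for name, cpu, mem_mb in containers:
        for host in hosts:
            if host.cpu >= cpu and host.mem_mb >= mem_mb:
                host.cpu -= cpu          # reserve the requested resources on that host
                host.mem_mb -= mem_mb
                host.containers.append(name)
                placement[name] = host.name
                break
        else:
            # No host has room; Kubernetes, for example, would leave such a pod Pending.
            placement[name] = None
    return placement

if __name__ == "__main__":
    hosts = [Host("node-a", 2.0, 4096), Host("node-b", 4.0, 8192)]
    print(place([("web", 1.0, 1024), ("api", 1.5, 2048), ("db", 2.0, 6000)], hosts))
```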
How to Choose a Container Orchestration Tool?
We've looked at a number of orchestration tools that you may consider when deciding which is ideal for your business. To decide well, make sure you understand your company's requirements and operations; then you'll be able to weigh the benefits and drawbacks of each option more readily.
Kubernetes
Kubernetes has a lot of features and is ideally suited for container and cluster management at the enterprise level. Managed Kubernetes offerings are available from a number of platforms, including Google, AWS, Azure, Pivotal, and Docker, so as your containerized workload grows, you have a lot of options.
The biggest disadvantage is that it does not work with Docker Swarm and Compose CLI manifests. It might also be difficult to understand and set up. Despite these flaws, it is one of the most used systems for cluster deployment and management.
Docker Swarm
For those already familiar with Docker Compose, Docker Swarm is a better option. It's easy to use and doesn't require any additional software. Unlike Kubernetes and Amazon ECS, however, Docker Swarm lacks sophisticated features such as built-in logging and monitoring. As a result, it is better suited to small businesses that are just getting started with containers.
Amazon ECS
If you're already familiar with Amazon Web Services, Amazon ECS is a great way to install and configure clusters. It's a quick and easy method to get started, and it scales to match demand. It also connects with a number of other AWS services. It's also excellent for small teams with limited resources for container maintenance.
One of its disadvantages is that it is incompatible with nonstandard deployments. It also relies on ECS-specific configuration files, which complicates debugging.


GRPC Services: Central Proto Repository or Distributed

We plan to keep a central proto repository holding all proto definitions and their generated code. We would keep messages as well as service definitions in a central Git repo, and we plan to drive our API design standards from this central repository.
But any service that wants to use this to expose a server or generate clients would have to import the generated code (.pb.go) from this repo.
Do you see any issue with this approach? Or do you see keeping each service's proto files in its own service repo as a better alternative?
PS: I'm just starting out with gRPC for building microservices and am still learning the right way to structure and distribute code here.
This question occurs regularly and I suspect the fact that there's no published guidance is because the answer depends on your needs more than the technology's.
The specific issue of many vs one is not dissimilar to whether you prefer to use a monorepo and only you can effectively determine that. Perhaps one way to determine this is to understand now (and in the future) how many shared dependencies your services will have? Another may be to determine how many repos you'll have (how complex would it be to manage 10s or 100s of repos?).
In my experience, it's a good practice to keep the protos distinct (i.e. separate repo) from code that uses them. Not only may you want to version protos independently from implementations (across languages) but the implementations themselves are independent; in one use-case I must clone a repo containing an entire system (written mostly in one language) in order to get its protos to generate bindings in another language. In this case, it would be preferable if the repo were limited to just the protos.
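To make the "protos-only repo" idea concrete: a consumer in any language can check out just the proto repo and generate its own bindings. The sketch below only builds the `protoc` command line (it does not run it), assuming a hypothetical `protos/<service>/v1/` layout. `--proto_path`, `--python_out` and `--java_out` are standard protoc options and `--go_out` comes from the protoc-gen-go plugin; note that these generate message code only, and gRPC service stubs additionally need each language's gRPC plugin.

```python
import shlex

# Output flag per target language. --python_out and --java_out are built into
# protoc; --go_out requires the protoc-gen-go plugin to be installed on PATH.
OUT_FLAGS = {
    "go": "--go_out",
    "python": "--python_out",
    "java": "--java_out",
}

def protoc_command(proto_root: str, proto_files: list[str], language: str, out_dir: str) -> list[str]:
    """Build (but do not run) a protoc invocation that generates one language's
    bindings from a protos-only checkout rooted at `proto_root`."""
    if language not in OUT_FLAGS:
        raise ValueError(f"no output flag configured for {language!r}")
    return ["protoc", f"--proto_path={proto_root}", f"{OUT_FLAGS[language]}={out_dir}", *proto_files]

if __name__ == "__main__":
    # Hypothetical layout: protos/<service>/v1/<service>.proto in the central repo.
    files = ["protos/billing/v1/billing.proto"]
    for lang in ("go", "python", "java"):
        print(shlex.join(protoc_command("protos", files, lang, f"gen/{lang}")))
```

Because each consumer generates into its own output directory, the proto repo can be versioned on its own, independently of any one implementation.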
You could look to examples for guidance. The gRPC repo keeps a bunch of stuff rooted on the grpc package in addition to math. Although less broad, Google bundles its well-known types under google.protobuf.

Alfresco Community 5 Share Clustering

I'm seeing a lot of conflicting information on the internet about Alfresco Share clustering. From what I can find, it looks like clustering was removed completely from Alfresco Community in versions 4.2 and above.
I did find some documentation showing that Alfresco One 5 has Share clustering, and I noticed that I can enable Hazelcast in Alfresco Community 5, but the clustering doesn't work at all.
Is there a way to have more than 1 instance of Alfresco Community 5 behind a load balancer and have proper synchronization/replication/clustering occur between the share instances?
Short answer
There is no cluster or load balancer support in the Alfresco Community version that I know of. Alfresco removed that feature from the Community version starting with 4.2, when they refactored the whole clustering implementation.
Long answer
What are you trying to achieve?
If scalability is your goal, you should focus on the bottlenecks in the Alfresco architecture, which will not be solved by clustering or load balancing. I haven't seen a system where the Share tier was the bottleneck.
Quite the contrary: if the load from Share against the repository tier is too high, you will run into timeouts and thread escalation, since Alfresco follows the "retrying transaction" principle: if errors occur, Share will retry. That means that if the repository answers too slowly, Share will create new requests/threads until the OS reaches kernel or process limits, without any result.
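The escalation described above can be illustrated with a deliberately simplified model (not Alfresco code): each request holds a thread until the repository answers, and if the repository is slower than the caller's timeout, the caller retries on a new thread while the original one keeps waiting. The function name and parameters are hypothetical, and the numbers are made-up tick counts, not Alfresco settings.

```python
from collections import Counter

def live_threads(arrivals_per_tick: int, repo_latency: int, timeout: int, ticks: int) -> int:
    """Toy model: count threads still waiting on the repository at the last tick.

    Every request holds a thread for `repo_latency` ticks. If the repository is
    slower than `timeout`, the caller gives up waiting after `timeout` ticks and
    issues a retry on a new thread, while the original thread keeps waiting.
    """
    if timeout < 1:
        raise ValueError("timeout must be at least one tick")
    started = Counter()  # started[t] = number of requests/threads started at tick t
    for t in range(ticks):
        started[t] += arrivals_per_tick  # new user requests
        if repo_latency > timeout and t >= timeout:
            # every request started `timeout` ticks ago is still unanswered, so it is retried now
            started[t] += started[t - timeout]
    last = ticks - 1
    # a thread is still alive if it started less than `repo_latency` ticks before the last tick
    return sum(n for t, n in started.items() if last - t < repo_latency)

if __name__ == "__main__":
    print(live_threads(1, 5, 10, 10))  # repository answers within the timeout
    print(live_threads(1, 5, 2, 10))   # repository slower than the timeout
```

With these made-up numbers, the first call gives 5 busy threads, while the second gives 21, and the count keeps climbing with every further tick.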
So instead you should focus on making the repository tier as fast as possible to avoid thread escalation in Share (this also can't be achieved by clustering):
transformation --> understand, replace or disable synchronous transformation work running on the repository tier
search --> understand and optimize tracking, and run SOLR on separate host(s); note that tracking will rely on the transformation performance of the repository tier
caching --> use smart reverse proxies to cache Share resources on the client and proxy side to minimize traffic
very fast/smart storage concepts on db and index tier
If availability is your concern, you may get better results by using the HA features of virtualisation platforms like VMware ESX, and your support effort will be a fraction of what a clustered Alfresco requires.

In what way is Ruby on Rails NOT multithreaded?

Disclaimer: I'm a C# ASP.NET developer learning "RoR". Sorry if this question doesn't "get" RoR; any corrections are greatly appreciated!
What is multithreading
My understanding of "multithread" ability in web apps is twofold:
Every time a web/app server receives a request it can assign a thread to the new request, thus multiple requests can run concurrently.
The app runtime + language allows for multiple threads to be used WITHIN a single request (in ASP.NET via "Async" methods and keywords for example).
In this way, IIS7 + ASP.NET can do points 1 AND 2.
I'm confused about RoR
I've read these two articles and they have left me confused:
Clearing up some things about LinkedIn mobile’s move from Rails to node.js
How to deploy a multi-threaded Rails app
Question one.
I think I understand that RoR doesn't lend itself very well to point number 2 above, that is, having multiple threads within the same request. Have I got that right?
Question two.
Just to be crystal clear, RoR app/web servers can also do point number 1 above, right (that is, multiple requests can run concurrently)? Or is that not always the case with RoR?
Question 1:
You can spawn more Ruby threads in one request if you want, although that seems to be outside the typical use case for Rails. It can be useful for certain long-running IO or external operations.
Question 2:
The limiting factor for Ruby concurrency in general, not just with Rails, is the Global Interpreter Lock. This feature of the default Ruby implementation (MRI) prevents more than one Ruby thread from executing at any given time per process. The lock is released whenever non-Ruby code is executing, such as while waiting for disk IO or SQL responses. You can get around this by using a different Ruby implementation than the default, such as JRuby, though not all alternative implementations remove the lock.
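CPython has an analogous global interpreter lock, so the effect described here can be demonstrated in Python (used for illustration; this is not Ruby code). In the sketch below, five hypothetical blocking calls are simulated with `time.sleep`, during which CPython releases the lock, so threads inside a single "request" overlap their waiting and the total time is close to one call's duration rather than five. In the default builds, CPU-bound Ruby or Python code would not speed up this way under the lock.

```python
import threading
import time

def fake_io_call(results: list, index: int, seconds: float) -> None:
    """Hypothetical stand-in for a blocking external call (HTTP, SQL, disk)."""
    time.sleep(seconds)  # the interpreter lock is released while the thread is blocked here
    results[index] = f"response {index}"

def handle_request(n_calls: int = 5, seconds: float = 0.3) -> tuple[list, float]:
    """Run n_calls blocking calls on separate threads inside one 'request' and time them."""
    results: list = [None] * n_calls
    threads = [threading.Thread(target=fake_io_call, args=(results, i, seconds)) for i in range(n_calls)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = handle_request()
    print(results, f"{elapsed:.2f}s")  # close to 0.3s, not the 1.5s a sequential loop would take
```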
Phusion Passenger uses process based concurrency to handle a few requests concurrently, so, strictly speaking, is not "multithreaded," but is still concurrent.
This talk from Ruby MidWest 2011 has some good thoughts on getting multithreaded Ruby on Rails going.
Since this is about "from ASP.NET to RoR" there is another small but important detail to remember: In *nix environments it's common to achieve concurrency of a service application through multi-processing rather than multi-threading. This is an architecture that goes way back and is related to the relatively cheap cost of multi-processing on *nix systems using fork and Copy-on-Write. Each process serves one request at a time in a single thread and the main process controls spawning and killing worker child processes. Multiple requests are served concurrently by different child processes.
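The multi-process model described above can be sketched in Python (again as an illustration, not Rails or Passenger code), assuming a Unix system: the parent forks a fixed pool of worker processes, and each request is handled on the single thread of whichever worker picks it up. `handle_request` and `serve_prefork` are hypothetical names standing in for real request handling.

```python
import multiprocessing
import os

def handle_request(request_id: int) -> tuple[int, int]:
    """Serve one request on the single thread of a worker process and report which PID served it."""
    return request_id, os.getpid()

def serve_prefork(request_ids, workers: int = 3) -> list[tuple[int, int]]:
    # The "fork" start method (Unix only) creates each worker with fork(); the
    # children share the parent's memory pages copy-on-write until they modify them.
    ctx = multiprocessing.get_context("fork")
    # The parent process owns the pool: it spawns the workers, hands out requests,
    # and terminates the workers when the `with` block exits.
    with ctx.Pool(processes=workers) as pool:
        return pool.map(handle_request, request_ids)

if __name__ == "__main__":
    for rid, pid in serve_prefork(range(6)):
        print(f"request {rid} served by worker pid {pid} (parent pid {os.getpid()})")
```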
Modern service applications, for example Apache, have multi-process, multi-threaded, and even combined modes (where the service forks several processes, each running several threads).
In cases where the application was built with portability in mind (examples again: Apache, MySQL, etc.), it is customary to run it in multi-process or combined mode on *nix systems, and in multi-threaded mode on Windows servers.
However, admittedly Rails is somewhat lacking on the Windows front. It's not that you can't run it on Windows, it's just that not a lot of effort went into making sure it runs well and smoothly for production use on Windows servers. It's not a common production platform among the RoR community.
As a result, even though Rails itself has been thread-safe since version 2.2, there isn't yet a good multi-threaded server for it on Windows servers, and you get the best results by running it on *nix servers using a multi-process, single-threaded concurrency model.
Rails as a framework is thread-safe. So, the answer is yes!
The second link that you posted names many of the Rails servers that don't do well with multi-threading. The author later says that nginx is the way to go (it is definitely the most popular option and highly recommended), but he doesn't explain how he reached that conclusion.
Ruby 1.9.3 came out recently and has some new threading goodness built in which didn't exist before.
Use of multi-threading generally depends on the use case.
Personally, I tried it once about a year ago and it worked, but I haven't used it in any production code because I haven't come across a use case where multi-threading made more sense than pushing the long-running task to a background job.
I would love to explore this more. So, if you can describe what you are trying to achieve then maybe we can do a POC.

Where to start with Xen?

I am new to Xen. I want to know how Xen works.
The code is a real puzzle to me, and I don't know where to start.
Are there any easy articles for me?
Since you mention looking at the code, I assume you want to understand the technical details of Xen and not just merely how to start a VM.
As with all problems, start with something simple and then work your way up. Some pointers:
Be sure to have the prerequisite experience under your belt. In particular, strong familiarity with C and Linux, but also with x86 paging and how virtualized memory works.
Make sure you have a sound grasp of the general Xen architecture. For instance, paravirtualized versus hardware-supported virtualization, the special role of the management domain (Dom0) compared to unprivileged domains (DomU), etc.
Investigate the Xen components running in Dom0:
The Xen control library (libxc) which implements much of the logic relating to hypercalls and adds sugar around these (look in tools/libxc).
The Swiss Army knife for administering Xen, namely the Xen light library (libxl). This library replaces the deprecated xm tool with the xl tool and takes care of all your maintenance tasks such as starting/stopping a VM, listing all running VMs, etc. For all these operations, it works in tandem with the aforementioned libxc. (Libxl lives in tools/libxl.)
The Xenstore is a tree-like data structure from which all running domains can retrieve and store data. This is necessary since all I/O goes through Dom0 (not the hypervisor!), and domains need to communicate with Dom0 how they are going to pass I/O along. (Look in tools/xenstore.) You can inspect the Xenstore with a tool such as xenstore-ls.
The blkback/netback kernel drivers, which pass the data over shared channels to the VMs. (You will find these drivers in any recent Linux kernel (e.g. >= v3.0) that has so-called PVOPS support.)
Take a look at the console daemon (tools/console). Note that sometimes the QEMU console is actually used. QEMU also comes into the picture as the default backend if you choose file-backed virtual storage for a VM.
Experiment with the 'Xen way' of inter-VM communication: grant tables, event channels and the Xenstore. With these fundamentals you can create your own shared channel between VMs, for example by writing a kernel module that you load in two domains to let them talk to each other.
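To get a feel for the Xenstore's tree shape before reading tools/xenstore, here is a toy in-memory model in Python. It is not the real Xenstore API (no transactions, watches or permissions). The paths follow the usual split, with a DomU's frontend entries under /local/domain/<domid>/device/ and Dom0's backend entries under /local/domain/0/backend/; the device ID, key values and the `ToyXenstore` class itself are illustrative assumptions.

```python
class ToyXenstore:
    """Toy in-memory model of a Xenstore-like tree: slash-separated paths map to string values."""

    def __init__(self) -> None:
        self._nodes: dict[str, str] = {}

    def write(self, path: str, value: str) -> None:
        self._nodes[path.rstrip("/")] = value

    def read(self, path: str) -> str:
        return self._nodes[path.rstrip("/")]  # KeyError if the node does not exist

    def ls(self, path: str) -> list[str]:
        """List the immediate children of `path`, one level deep, like browsing a directory."""
        prefix = path.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/", 1)[0] for p in self._nodes if p.startswith(prefix)})

if __name__ == "__main__":
    store = ToyXenstore()
    # Frontend side: DomU 1 advertises how its block device can be reached (illustrative values).
    front = "/local/domain/1/device/vbd/51712"
    store.write(f"{front}/ring-ref", "8")
    store.write(f"{front}/event-channel", "15")
    store.write(f"{front}/state", "3")
    # Backend side: Dom0 keeps the matching entry for the same device.
    store.write("/local/domain/0/backend/vbd/1/51712/state", "4")
    print(store.ls("/local/domain"))
    print(store.ls(front))
```

The point of the toy is the division of labour: the guest writes where its shared ring and event channel are, and Dom0's backend reads those keys to connect to them.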
I can also give some pointers in the source that you can check out:
xen/xen/include/public/xen.h gives you a list of all the hypercalls, with comments describing what they do.
xen/xen/include/xen/mm.h gives you an introduction to the different memory terminology used by Xen (i.e., real versus virtualized addresses and page numbers). If you don't grasp these differences, then reading the hypervisor code will surely be frustrating.
xen/xen/include/asm-x86/config.h gives an overview of the memory layout of Xen.
xen/tools/libxc/xenctrl.h exports a large list of interesting domain control operations, which gives an abstract view of task division between Dom0 and the hypervisor.
Last but not least, the book 'The Definitive Guide to the Xen Hypervisor' by David Chisnall comes highly recommended. It covers all these topics and more in a thorough, technical fashion with plenty of code examples.
The Xen wiki and developer mailing lists are also a great resource for understanding Xen.
If you have a more specific question, then I can give you a more specific answer.
Here are a few links to help you get started with Xen. I hope they will be useful.
http://www.howtoforge.com/howtos/virtualization/xen
http://wiki.xen.org/wiki/Category:HowTo
http://wiki.debian.org/Xen
For me, the Debian Wiki page is the best and most concrete tutorial, with examples and step-by-step instructions for getting started. I used it when I started.
You can then read much more in the Xen documentation itself or in books, but as a starting point that lets you easily install and test Xen, I chose that tutorial from the Debian Wiki.
If you just want an overview, you may read this: http://wiki.xenproject.org/wiki/Xen_Project_Beginners_Guide.
This will introduce you to Xen hypervisor, suggest configuration to set up virtual machines, provide information about the networking and finally have details about tools for the management of virtual machines.
This documentation covers getting Xen running specifically on Ubuntu (most importantly, it works!):
https://help.ubuntu.com/community/Xen
===
However, if you want to go to the next level and understand the working of Xen; Xen architecture, memory management, device management, CPU scheduling etc., I would recommend reading the book "The Definitive Guide to the Xen Hypervisor".

Service Oriented Architecture: How would you define it

Service Oriented Architecture seems to be more and more of a buzzword these days, but after asking around the office I have gotten many different definitions for it. How would you define SOA? What would you consider the official definition?
As Martin Fowler says, it means different things to different people. His article on the topic is pretty good although it isn't quite a definition.
http://martinfowler.com/bliki/ServiceOrientedAmbiguity.html
It may explain the difficulty of coming up with a concrete definition.
Wikipedia: "A SOA is a software architecture that uses loosely coupled software services to support the requirements of business processes and software users. Resources on a network in an SOA environment are made available as independent services that can be accessed without knowledge of their underlying platform implementation."
SOA is not that new, but it has the potential to achieve some amazing things. However, the organization has to be ready for it: the business has to think in processes, and that's the big problem.
I'd go with:
Defining a series of stateless, client-agnostic business operations created to be leveraged in multiple applications.
An SOA design includes components (i.e., services) that can be used by code regardless of implementation (i.e., any OS or language). A single instance of a service may also be used by multiple applications, whereas, e.g., a DLL would have to be duplicated for each app and require the same implementation technology as the linking application.
Services in an SOA design are usually implemented as interoperable web services.
There isn't an official definition, as Ryan mentioned earlier. However, I find Thomas Erl's view of service orientation as a whole quite well-structured and relevant. Here is the definition of SOA from his SOA Glossary:
Service-oriented architecture represents an architectural model that aims to enhance the agility and cost-effectiveness of an enterprise while reducing the overall burden of IT on an organization.
Thomas Erl is the author of many SOA titles most of them receiving endorsement from SOA vendors including IBM, Oracle, and Microsoft. The nice thing about his books is that they are as SOA vendor independent as possible. It means you learn more about service-orientation itself and less about some vendor's middleware that supports SOA.
I agree with all of the people that point you to Fowler on this. Basically it runs like this: service oriented architecture got a reputation as being good, so anything that people want to be associated with good they call SOA. In reality it has a lot of downsides and can create a Service Oriented Gridlock or Dependency Oriented Architecture.
Here's my go at a definition:
Service Oriented Architecture is a systems integration and code reuse approach where applications are dependent on connecting to services provided by other running applications across the network. This is distinct from component architectures, where software components are shared statically between applications in the form of libraries or SDKs, for example.
A clarification here - "Service Oriented Architecture is a systems integration and code reuse approach where applications are dependent on connecting to services provided by other running applications across the network."
I have a scenario where two J2EE applications have been integrated using event-driven messaging. Here the phrases "systems integration" and "connecting to services provided by other running applications across the network" both apply. Can I call this SOA?
The following principles would hold good here
1) statelessness
2) message oriented - loosely coupled, in fact decoupled
3) extensible.
However, the following do not apply
1) platform independence - neither of the applications being integrated has been designed to work on a different platform.
2) The applications are plain j2ee applications which have not been designed with all soa concepts.
I attempted to define SOA in one of my blog posts. Here's an excerpt...
For years it's been standard practice to separate functionality into functions, classes, and modules. The idea has always been that these smaller, highly specialized components are easier to share and maintain than monolithic blocks of code.
Functionally, SOA is not much different. The goals are the same - reusability and easy maintenance. The biggest difference - in the case of a web service SOA - is that the shared library included in your application is replaced with an HTTP call.
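The "shared library replaced with an HTTP call" contrast can be shown with Python's standard library. In this sketch, a hypothetical `convert_currency` operation is first called directly as a library function, then exposed as a tiny HTTP service and called over the network; any client able to speak HTTP and JSON could use the second form regardless of its language. It is a minimal illustration, not a production service (no path routing, validation or error handling).

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def convert_currency(amount: float, rate: float) -> float:
    """The shared business operation, usable directly as a library function."""
    return round(amount * rate, 2)

class ConvertHandler(BaseHTTPRequestHandler):
    """Exposes the same operation as a service: POST a JSON body, get JSON back (any path is accepted)."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"result": convert_currency(body["amount"], body["rate"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def start_service() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), ConvertHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def call_convert_service(base_url: str, amount: float, rate: float) -> float:
    """The same operation reached over the network instead of being linked in."""
    req = urllib.request.Request(
        base_url + "/convert",
        data=json.dumps({"amount": amount, "rate": rate}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

if __name__ == "__main__":
    server = start_service()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(convert_currency(10, 1.5), call_convert_service(url, 10, 1.5))
    server.shutdown()
```

Even this toy shows the trade-off raised in the answers: the service form adds a network hop, serialization, and a running process that every client now depends on.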
Here's a definition for you:
SOA - Software Over Architected. The inclusion of pointless, over-bloated, functional interface framework called an architecture in a pretty web site with a 3d graphic folder flying from one side to the other where "dir /s > a.txt | ftp -s:upload.ftp" did the job.
Software components are not bricks, cannot be generalised by common functional patterns and architecture emerges in the enterprise from good practice, not good design. Software isn't architected, it's engineered.
SCRUM ON!
