Alfresco Community 5 Share Clustering

I'm seeing a lot of conflicting information on the internet about Alfresco Share clustering. From what I can find, it looks like clustering was removed completely from Alfresco Community in versions 4.2 and above.
I did find some documentation showing that Alfresco One 5 has Share clustering, and I noticed that I can enable Hazelcast in Alfresco Community 5, but the clustering doesn't work at all.
Is there a way to have more than one instance of Alfresco Community 5 behind a load balancer and have proper synchronization/replication/clustering occur between the Share instances?

Short answer
There is no cluster or load balancer support for the Alfresco Community edition (that I know of). Alfresco removed that feature from Community starting with 4.2, when they refactored the whole clustering implementation.
Long answer
What are you trying to achieve?
If scalability is your goal, you should focus on the bottlenecks in the Alfresco architecture, which will not be solved by clustering / load balancing. I haven't seen a system where the Share tier was the bottleneck.
Quite the contrary: if the load from Share against the repository tier is too high, you will run into timeouts and thread escalation, since Alfresco follows the "retrying transaction" principle: if errors occur, Share will retry. This means that if the repository answers too slowly, Share keeps creating new requests/threads until the OS reaches kernel or process limits, without any result (a short sketch of this retry helper follows the list below).
So instead you should focus on making the repository tier as fast as possible to avoid thread escalations in Share (this also can't be achieved by clustering):
transformation --> understand, replace or disable synchronous transformation stuff running on the repository tier
search --> understand and optimize tracking, and run SOLR on separate host(s); tracking will still rely on the transformation performance of the repository tier
caching --> use smart reverse proxies to cache Share resources on the client and proxy side to minimize traffic
very fast/smart storage concepts on the db and index tier
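For context on the "retrying transaction" principle mentioned above, here is a minimal sketch of how repository-tier code typically uses Alfresco's RetryingTransactionHelper. The service wiring and retry count are placeholders; the point is that every failed attempt is re-run as a fresh transaction, which is what keeps threads busy when the repository is slow:

    import org.alfresco.repo.transaction.RetryingTransactionHelper;
    import org.alfresco.repo.transaction.RetryingTransactionHelper.RetryingTransactionCallback;
    import org.alfresco.service.ServiceRegistry;

    public class RetryExample {
        private ServiceRegistry serviceRegistry; // injected by Spring in a real module

        public String runWithRetries() {
            RetryingTransactionHelper txHelper =
                    serviceRegistry.getTransactionService().getRetryingTransactionHelper();
            txHelper.setMaxRetries(5); // placeholder value

            RetryingTransactionCallback<String> work = () -> {
                // repository work goes here; on a retryable failure the whole
                // callback is re-run in a new transaction
                return "done";
            };
            // readOnly = false, requiresNew = true
            return txHelper.doInTransaction(work, false, true);
        }
    }

If the repository is slow, every Share request ends up waiting on (and retrying) work like this, which is why tuning the repository tier pays off more than adding Share nodes.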
If availability is your concern, you may get better results by using the HA features of virtualisation platforms like VMware ESX, and your support effort will be a fraction of what a clustered Alfresco requires.

Related

Microservices orchestration choices

I am exploring the possible solutions for orchestrating my flows across multiple services via some infrastructure. Searching shows me a few options such as Conductor, Camunda, Airflow etc.
I am wondering what would fit my use case better:
One of my services is in Java, the other is in Python
I need to pass info to the Java service, then take the output and pass it to the Python service
Final output is then published to another queue
It feels like Conductor is a good choice, but would love to hear your inputs!
All options can fulfill the requirement stated. Think about further / future requirements. Is it only a data pipe? Is it about orchestrating a larger end-to-end business process? Do you need support for long-running processes? Is end-to-end transparency in a graphical form a benefit? Is graphical process modelling in the BPMN2 standard going to be a benefit? Are there going to be audit or reporting requirements? Or is it going to be a simple, isolated, technical solution?
This article gives a great overview of tools in the market and what their primary use cases are: https://blog.bernd-ruecker.com/understanding-the-process-automation-landscape-9406fe019d93
All listed tools might technically be able to execute your workflow (I have no experience working with Conductor & Camunda). A few characteristics on which a decision is usually made are:
open vs closed source
how do you define workflows? (e.g. Python code in Airflow. Others use e.g. JSON/XML/something custom)
does it come with a UI?
can it scale out in case my workloads start growing?
is it agnostic to any technology or limited to running certain technologies? (e.g. Oozie is built for scheduling jobs on Hadoop)
other requirements could be e.g. security, logging, monitoring, etc.
There are many orchestration-tool-comparisons on the internet, e.g. 1 or 2.
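To give a feel for the "how do you define workflows" point with a Java-plus-Python mix: in Conductor the workflow itself is a JSON definition, and each service is plugged in as a task worker. Below is a rough sketch of the Java step, assuming the Netflix Conductor Java client's Worker interface; the task name and input/output keys are made up for illustration. The Python step would be a worker registered through the Python client, and the final task would publish to the outbound queue.

    import com.netflix.conductor.client.worker.Worker;
    import com.netflix.conductor.common.metadata.tasks.Task;
    import com.netflix.conductor.common.metadata.tasks.TaskResult;

    // Hypothetical worker for the Java step of the flow
    public class JavaStepWorker implements Worker {

        @Override
        public String getTaskDefName() {
            return "java_step"; // made-up task definition name
        }

        @Override
        public TaskResult execute(Task task) {
            Object input = task.getInputData().get("payload"); // made-up input key
            TaskResult result = new TaskResult(task);
            // whatever the Java service computes becomes the input of the next task
            result.getOutputData().put("javaOutput", input);
            result.setStatus(TaskResult.Status.COMPLETED);
            return result;
        }
    }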
Introduction to Container Orchestration
Container orchestration is the practice of automating the administration of container-based microservice applications across different clusters. The concept is gaining popularity within corporations, and a variety of container orchestration tools have become indispensable for deploying microservice-based applications.
Software development in the modern era is no longer monolithic. Instead, it produces component-based apps that run across many containers. These adaptable and scalable containers work together to accomplish a specified purpose or microservice.
Depending on the complexity of the application and other requirements like load balancing, they may span many clusters.
Containers encapsulate application code as well as its dependencies. To function efficiently, they receive the resources they require from physical or virtual hosts. When complicated systems are built as containers, clustering them for deployment requires adequate management and priority.
How to Choose a Container Orchestration Tool?
We've looked at a number of Orchestration Tools that you may examine when selecting which is ideal for your business. To do so, make sure to understand your company's requirements and operations. Then you'll be able to more readily weigh the benefits and drawbacks of each option.
Kubernetes
Kubernetes has a lot of features and is ideally suited for container and cluster management at the enterprise level. Managed Kubernetes is offered by a number of platforms, including Google, AWS, Azure, Pivotal, and Docker, so you have a lot of options as the containerized workload grows.
The biggest disadvantage is that it does not work with Docker Swarm and Compose CLI manifests. It can also be difficult to learn and set up. Despite these flaws, it is one of the most widely used systems for cluster deployment and management.
Docker Swarm
For individuals who are already familiar with Docker Compose, Docker Swarm is a better option. It's easy to use and doesn't require any additional software. Unlike Kubernetes and Amazon ECS, however, Docker Swarm lacks sophisticated features such as built-in logging and monitoring. As a result, it is better suited to small-scale businesses that are just getting started with containers.
Amazon ECS
If you're already familiar with Amazon Web Services, Amazon ECS is a great way to install and configure clusters. It's a quick and easy method to get started, and it scales to match demand. It also connects with a number of other AWS services. It's also excellent for small teams with limited resources for container maintenance.
One of its disadvantages is that it is incompatible with nonstandard deployments. It also relies on ECS-specific configuration files, which can complicate debugging.

Alfresco Community Enterprise Feature Comparison

I've seen this question but the answers are simply not good enough. I've searched the web and could not find a clear listing of the main differences.
I am particularly surprised to see contradictions in the above link, that holds only 4 short answers.
So the question is, beyond support, what are (all) the differences between Alfresco Community and Enterprise editions (for the current versions of course)?
Are there functional or technical features that are available in the Enterprise edition but not in the community edition?
I find it strange that it's so difficult to get a clear list. Looking at the forums to find this answer is not a serious option from a business perspective.
Until now, I found this link to be useful, but it's from 2009.
In particular, I find the platform support interesting, with the community edition supporting only a LAMP-style stack:
Linux
MySQL
Tomcat
OpenLDAP
Firefox
And the enterprise edition supporting:
Windows
SQL Server
WebLogic, WebSphere
AD/Kerberos
IE and Safari
Apparently, these features are only available in the enterprise edition:
JMX monitoring
Runtime administration: What's that exactly? And what's in the community edition then?
Runtime indexing consistency check and update: What's in the community edition then?
High performance and availability: How is that implemented and what's in the community edition then?
Storage policies
Open source and proprietary technology stack support: which ones exactly? Which ones are supported in the community edition?
If anyone could guide me towards serious documentation about these differences, that would be great.
I also went through the wiki but could not find an answer to my questions in there.
The differences between Enterprise and Community vary in detail from version to version and are mainly visible to administrators. We see or maintain both flavours of Alfresco in midsize to very large environments, and I would say it's more or less a question of taste and budget which edition is the best decision for you. Excellent skills in infrastructure and Java are highly advisable for either edition if you want to run Alfresco in production.
The technical differences are not so dramatic that you couldn't provide very similar functionality for the users - so if you're actually facing the decision, you should focus on a good technical partner, the support services, and maybe the fact that you only get official patches with an Enterprise subscription, not with Community. BTW, Alfresco Enterprise is not open source, but this is not a real point of interest for most end users. You can access the code as a subscription customer, but it is not publicly available/accessible.
The main differences in features are already named more or less:
Administration
Enterprise has more views and settings in the admin web GUI. In Community you can access most configuration only from the command line. This may sound like a restriction, but in real life administrators prefer the command line and scripting automation anyway.
Enterprise lets you change some Alfresco settings at runtime (most settings still require a restart). Some can be changed in the GUI and more via the JMX interface. You're also able to stop and start subsystems such as the CIFS protocol server; we use this feature to switch a system into read-only mode. This is what is meant by "runtime administration". Community requires a restart of the service for most configuration changes, although it is possible to work around this with advanced scripting (e.g. Groovy) or by implementing modules.
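As an illustration of this kind of runtime administration, the JMX interface can be driven with plain javax.management code. A minimal sketch follows; the service URL, credentials and MBean name are assumptions based on common Enterprise defaults, so verify them against your installation (for example by browsing the beans in JConsole) before relying on them:

    import java.util.HashMap;
    import java.util.Map;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class AlfrescoJmxExample {
        public static void main(String[] args) throws Exception {
            // Endpoint and credentials are assumptions - adjust to your install
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:50500/alfresco/jmxrmi");
            Map<String, Object> env = new HashMap<>();
            env.put(JMXConnector.CREDENTIALS, new String[] { "controlRole", "change_asap" });

            JMXConnector connector = JMXConnectorFactory.connect(url, env);
            try {
                MBeanServerConnection mbean = connector.getMBeanServerConnection();
                // Example: stop the file server (CIFS/FTP) subsystem at runtime.
                // The exact ObjectName is an assumption - browse the beans first.
                ObjectName fileServers = new ObjectName(
                        "Alfresco:Type=Configuration,Category=fileServers,id1=manager");
                mbean.invoke(fileServers, "stop", new Object[0], new String[0]);
            } finally {
                connector.close();
            }
        }
    }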
Indexing
Runtime indexing consistency check and update is not the self-healing functionality you might expect. You will have to learn (at least for now) that you have to recreate the Alfresco index from time to time even in Enterprise environments, and that it is better to focus on good strategies for speeding up recreation or for setting up standby indexes, instead of hunting failed indexing transactions with the check-and-update methods. For major document model changes you need to recreate the index anyway.
High performance and availability
This is mainly the cluster and replication functionality, which is no longer available in Community. It's similar to MS clusters: a lot of work for very little extra availability, since some concepts are missing. The price is high in terms of complexity and it can end up reducing robustness. Even with Enterprise support it's a hard job to keep an Alfresco cluster running - so you need very good arguments to go this way. But of course: it's possible and available!
High performance: there shouldn't be any difference, and if there is, I'd be very curious to hear the explanation.
Technology stack
The main difference is the database support. In Community you can only choose between MySQL and PostgreSQL (no Oracle or MS SQL Server for Community). All other technologies are independent of Enterprise or Community (AD, Kerberos, OS, browser, ...).
Java container: I believe over 95% of all Alfresco installations run in Tomcat. That's the configuration which is documented, tested and scales. Using WebLogic or WebSphere gives you no added value, only new challenges - quite the contrary: you have to solve most issues yourself and can't benefit from others' experience.
Storage policies: I'm not quite sure and would have to check in 4.2.x whether the Content Store Selector / storage policies are still available in Community, but they were there in the 3.x versions.
[Edit]: storage policies have been removed in Community 4.2.x:
NoSuchBeanDefinitionException: No bean named 'storeSelectorContentStoreBase' is defined
If there is a real need for this functionality, someone may re-enable the feature by coding a module for Community.
Regards
This page explains the difference between the editions:
https://wiki.alfresco.com/wiki/Enterprise_Edition
This page is the canonical, comprehensive list of the differences.
If you are considering an Enterprise Subscription and you have a question that isn't answered by what you can find on that page, you should talk to your account rep.
Well, regarding JMX monitoring:
Runtime administration: Alfresco Enterprise allows you to perform certain actions on Alfresco subsystems without restarting the server. This lets you be very fast during debugging/developing and also when making changes in a production environment. You can also access the JMX interface, which supports JMX Remoting.
There is no consistency check or update until you restart the server (the validation/check/rebuild of your indexes happens during startup). There is an option in alfresco-global.properties (or the original repository.properties config file) for that. If you have some inconsistencies in the Alfresco Community index, you're gonna have a bad time.
Alfresco Enterprise has a specific license for clustering your architecture; the Community edition doesn't support those setups. Replicating and clustering Alfresco is one of the main improvements in performance/scalability/availability you could achieve.
The storage policies allow you to use content store selectors in Alfresco Enterprise. You can manage a primary and a secondary file store and map/connect these stores in your architecture. The Community edition only allows you to use one content store at a time.
This includes everything inside Alfresco (Spring Framework, Apache Lucene/Solr, Tomcat, and so on), because with the Enterprise license you also have full support for everything inside the Alfresco package. The difference is that Community is based on daily builds, supported by the community, and therefore not guaranteed. The Enterprise support helps you resolve many problems that you might encounter during development and in a production environment, not only Alfresco-related issues, but also some configurations on supported platforms (Windows/Linux), your web application servers, and so on.
Hope it helps.

Migrating an ASP.Net App to Azure

I'm getting close to finishing a public-facing ASP.Net app and I'm starting to weigh deployment options. I'm an ASP.Net/SQL Server veteran but a noob when it comes to Azure. I'm wondering how others have felt about the learning curve of migrating local dev ASP.Net/SQL Server apps into the Azure cloud.
More specifically:
How steep is the learning curve towards understanding administration and programming concepts, and do you think it's worth the investment?
What is Microsoft's support like if I have catastrophic problems from my cloud infrastructure and my live site is down? My expectation is a large price tag for a not-so-urgent SLA.
Will my non-Azure ASP.Net app require significant modification and/or coupling to run in the Azure environment?
Thanks
I answered a similar question a while back, here. Azure has evolved since then:
Azure's AppFabric Cache is currently in CTP (community technology preview) and will go live some time later this year (sorry, I can't quote a date). With a single configuration change, you'll be able to enable the ASP.NET session state provider without changing any code, and have your session state available to all of your web role instances.
With Azure v1.3, which rolled out in November, you have the ability to run tasks at startup with elevated privileges (e.g. to run an MSI to install some prerequisite control suite).
For monitoring, you can take advantage of Microsoft System Center, which now supports Azure directly. Alternatively, you can look into 3rd-party options such as AzureWatch.
With Azure's extra-small instance, you can run a site for approx. $44 monthly. You mentioned catastrophic failures and SLA. With Azure, you need a minimum of two instances for SLA to take effect (this is because your virtual machines are located in physically different areas of the data center, in separate fault domains). So you're looking at approx. $90 / month to run a site with 99.95% uptime. Only you can determine whether this is worth it to you. Yes, you can host with a simple hosting provider for significantly less (such as GoDaddy). However, if your site fails there, you have to wait for it to be detected and then installed on a separate box. Also, you share each server with potentially dozens of other tenants, which will impact your site's performance. With Azure, at most 8 tenants will occupy a box, depending on how many cores you configure your virtual machines to use. And it's incredibly simple to scale up or down to handle traffic increases and decreases.
My personal experience is that there isn't much documentation and you have to search through blogs / forums to find answers to more advanced questions. If you have a nicely designed app then there shouldn't be much problem with porting - you can google for Azure versions of the ASP.NET providers, e.g. membership.
The biggest disadvantage may be cost: you have to do your maths but for me it turned out that a VPS hosting is much cheaper than Azure.
I would say that unless you get considerable savings on infrastructure, don't move to Azure just for the sake of doing it. A hosted server with SQL and IIS will give you fewer problems and a bit more freedom.
I see an excellent answer by David Makogon already. The following might be helpful for you as well. The last episode of the Connected Show podcast was about migrating World Maps to Azure. If you are considering moving to Azure it is certainly worth listening to, as they explain the challenges they faced during the migration.
You could take a look at Moving Applications to the Cloud on the Microsoft Windows Azure Platform on MSDN.
Cheers.

Scalability Case Studies

I'm starting to build a community website from the ground up, and my web framework will be ASP.NET with MySQL.
I want to start planning some scalability into the infrastructure early because I'm anticipating high traffic when the site goes live.
Are there any case studies which you recommend reading where asp.net or mysql has been scaled and which demonstrates good scaling techniques?
I think it could be a challenge to find reference materials for that particular combination. Many .NET shops stick to SQL Server, and fewer use MySQL (at least at scale).
In general it would be appropriate to:
Follow general .NET practices for scalability. Weed out what is not appropriate for you.
Learn about database performance and implications of various design strategies such as denormalisation (when and why).
Consider out-of-process caching like memcached (see the sketch after this list).
Review books on MySQL performance. Most of these are focused on UNIX platforms. Windows users may have problems applying some of these practices.
Read up on how other people are scaling their sites (Building Scalable Sites and The Art of Capacity Planning)
Consider how you might optimise your web design to be more scalable. Are you using AJAX? Work out what the impact of excessive polling may be etc.
Learn how to measure the performance of your application and database (starting points ASP.NET and MySQL).
Develop a plan for scaling your architecture (1 server to 2 servers, to multiple servers etc) so that you have some frame of reference for making decisions about building things in your system.
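As a rough illustration of the out-of-process caching point above: talking to memcached from code looks like the snippet below. It uses the spymemcached Java client purely as an example (the host, key and TTL are made up); the same cache-aside pattern applies from ASP.NET via one of the .NET memcached clients.

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class MemcachedExample {
        public static void main(String[] args) throws Exception {
            // Connect to a local memcached instance (host/port are assumptions)
            MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));

            // Cache an expensive-to-build value for 5 minutes instead of hitting the database each time
            String key = "user:42:profile";       // made-up key
            Object value = cache.get(key);
            if (value == null) {
                value = "expensive result";       // stands in for a real DB/page-fragment lookup
                cache.set(key, 300, value);       // 300 seconds TTL
            }
            System.out.println(value);

            cache.shutdown();
        }
    }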
I only know of one really good resource for reading case studies about scalability techniques, and I am really surprised no one has mentioned it: High Scalability.
There are so many examples of "out of the box" thinking and different techniques for scaling that I think it makes a good read for anyone who is interested in the topic.
BrianLy said it best here:
"Develop a plan for scaling your
architecture (1 server to 2 servers,
to multiple servers etc) so that you
have some frame of reference for
making decisions about building things
in your system."
As a forum I frequent says, 'quoted for truth'. All of his points are excellent, but this one is a key point that many people overlook. It doesn't matter how scalable your code and database are if you are running on a creaky old server. The hardware may not be as important as your code, and improving it beyond a certain point will give diminishing returns VERY quickly, but do NOT forget to get your hardware to that point. If you have crap hardware, or even good hardware but not enough of it, your site will bomb out.
For MySQL scaling, you may find this interesting: Danga / LiveJournal.

FOSS ASP.Net Session Replication Solution?

I've been searching (with little success) for a free/opensource session clustering and replication solution for asp.net. I've run across the usual suspects (indexus sharedcache, memcached), however, each has some limitations.
Indexus - Very immature, stubbed session interface implementation. It's otherwise a great caching solution, though.
Memcached - Little replication/failover support without going to a db backend.
Several SF.Net projects - All aborted in the early stages... nothing that appears to have any traction, and one which seems to have gone all commercial.
Microsoft Velocity - Not OSS, but seems nice. Unfortunately, I didn't see where CTP1 supported failover, and there is no clear roadmap for this one. I fear that this one could fall off into the ether like many other MS dev projects.
I am fairly used to the Java world where it is kind of taken for granted that many solutions to problems such as this will be available from the FOSS world.
Are there any suitable alternatives available on the .Net world?
As far as Velocity is concerned, I have heard some great things about that project lately. It's still in the development stage and probably not primetime-ready yet, but I think the project has a solid footing and will become a strong, mature product from Microsoft rather than falling off into the ether as you predict.
Recently I've heard podcasts from Scott Hanselman and Polymorphic Podcast regarding Velocity.
BTW, Windows Server AppFabric is out of beta. That's what I mentioned in my previous post.
Here is the link on general availability: http://blogs.technet.com/b/appfabric/archive/2010/06/07/windows-server-appfabric-now-generally-available.aspx
Which specific features do you think one can get in NCache and not in AppFabric?
Just a quick update on this thread for the sake of completion.
Velocity (now known as Windows Server AppFabric) is already out in production and offers a great distributed caching platform. More details are available on the MSDN site:
http://msdn.microsoft.com/en-us/windowsserver/ee695849.aspx
Although Velocity has made progress from CTP1 to CTP2, it still leaves much to be desired. It will be some time before they provide all the important features of a distributed cache, and even longer before it is tested in the market. I wish them good luck.
In the meantime, NCache already provides all the CTP2 & V1 features, and many more. NCache is the first, the most mature, and the most feature-rich distributed cache in the .NET space. It is an enterprise-level in-memory distributed cache for .NET and also provides a distributed ASP.NET Session State. Check it out at Distributed Cache.
NCache Express is a totally free version of NCache. Check it out at Free Distributed Cache.
