We have a VERY high traffic site on a Plone 3 installation that is currently struggling under the load. We have scaled the server up many times over, and it's now apparent that this is not the bottleneck.
Is it possible to set up a ZEO cluster with Plone 3?
The simple answer is: "yes, of course." If you're using Plone > 3.1, you do it pretty much the same way you set up a cluster in Plone 4 or 5.
Will it solve your problem? If your problem is that you're not making good use of all the cores on a multi-core machine, a ZEO cluster is a good way to solve it. The threading in a single Zope instance is very inefficient; a cluster does much better.
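To make that concrete: a ZEO cluster is just one ZEO storage server plus several Zope client processes connecting to it behind a load balancer. The buildout recipes generate all of this wiring for you, but under the hood each client instance is doing roughly the following (a minimal sketch at the ZODB/ZEO API level; the address and port are assumptions):

    # Rough sketch of what each Zope instance in a ZEO cluster does: connect
    # to the shared ZEO storage server and build a ZODB database on top of it.
    # The host/port are assumptions; a Plone buildout configures this for you.
    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB

    storage = ClientStorage(('localhost', 8100))   # shared ZEO storage server
    db = DB(storage)                               # each client keeps its own object cache
    connection = db.open()
    root = connection.root()                       # the same object graph every client sees

    # ... the instance renders pages from objects reachable from `root` ...

    connection.close()
    db.close()

Because every client keeps its own object cache, adding clients mostly adds CPU capacity for rendering; the shared storage server only has to serve object loads and commits.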
At the same time, you should look to see if you can reduce the work done by your Zope instances. Having a proxy cache and a good caching setup is the key.
And, spend some time updating to Plone 4.x. It's faster out-of-the-box at rendering pages and has a much more efficient blob-handling system. You'll also find that the documentation for Plone 4.x is excellent, including that on scaling. If you can't do that, track down a copy of "Practical Plone 3" for documentation on cluster architecture and caching.
I'm seeing a lot of conflicting information on the internet about Alfresco Share clustering. From what I can find, it looks like clustering was removed completely from Alfresco Community in versions 4.2 and above.
I did find some documentation showing that Alfresco One 5 has Share clustering, and I noticed that I can enable Hazelcast in Alfresco Community 5, but the clustering doesn't work at all.
Is there a way to have more than 1 instance of Alfresco Community 5 behind a load balancer and have proper synchronization/replication/clustering occur between the share instances?
Short answer
There is no cluster or load-balancer support in the Alfresco Community version (that I know of). Alfresco removed that feature from the Community version starting with 4.2, when they refactored the whole clustering mechanism.
Long answer
What are you trying to achieve?
If scalability is your goal, you should focus on the bottlenecks in the Alfresco architecture, which will not be solved by clustering or load balancing. I haven't seen a system where the Share tier was the bottleneck.
Quite the contrary: if the load Share puts on the repository tier is too high, you will run into timeouts and thread escalation, since Alfresco follows the "retrying transaction" principle. If errors occur, Share retries, which means that if the repository answers too slowly, Share will create new requests/threads until the OS hits kernel or process limits, without getting any result.
So instead, you should focus on making the repository tier as fast as possible, to avoid thread escalation in Share (this also can't be achieved by clustering):
transformation --> understand, replace, or disable synchronous transformation jobs running on the repository tier
search --> understand and optimize tracking, and run Solr on separate host(s); note that tracking will still depend on the transformation performance of the repository tier
caching --> use smart reverse proxies to cache Share content on the client and proxy side to minimize traffic
storage --> use very fast/smart storage concepts for the DB and index tiers
If availability is your concern, you may get better results by using the HA features of virtualization platforms like VMware ESX, and your support effort will be a fraction of what clustered Alfresco requires.
Disclaimer: I'm a c# ASP.NET developer learning "RoR". Sorry if this question doesn't "get" RoR, any corrections greatly appreciated!
What is multithreading
My understanding of "multithread" ability in web apps is twofold:
Every time a web/app server receives a request, it can assign a thread to it, so multiple requests can run concurrently.
The app runtime + language allows for multiple threads to be used WITHIN a single request (in ASP.NET via "Async" methods and keywords for example).
In this way, IIS7 + ASP.NET can do points 1 AND 2.
I'm confused about RoR
I've read these two articles and they have left me confused:
Clearing up some things about LinkedIn mobile's move from Rails to node.js
How to deploy a multi-threaded Rails app
Question one.
I think I understand that RoR doesn't lend itself very well to point number 2 above, that is, having multiple threads within the same request. Have I got that right?
Question two.
Just to be crystal clear: RoR app/web servers can also do point number 1 above, right (that is, multiple requests can run concurrently)? Or is that not always the case with RoR?
Question 1:
You can spawn more Ruby threads in one request if you want, although that seems to be outside the typical use case for Rails. There are uses for it for certain long-running IO or external operations.
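Purely as an illustration of that pattern (shown here in Python because Ruby's Thread API is similar in spirit; the URLs and the two-call fan-out are made up), the idea is to hand independent, slow IO calls to threads within a single request and join them before rendering:

    # Illustrative only: fan out two slow, independent IO calls inside one
    # request handler, then wait for both before building the response.
    # The URLs below are hypothetical placeholders.
    import threading
    import urllib.request

    def fetch(url, results, key):
        # Each thread blocks on network IO; the interpreter lock is released
        # while waiting, so the two fetches overlap in time.
        with urllib.request.urlopen(url, timeout=5) as resp:
            results[key] = resp.read()

    def handle_request():
        results = {}
        threads = [
            threading.Thread(target=fetch,
                             args=("https://example.com/profile", results, "profile")),
            threading.Thread(target=fetch,
                             args=("https://example.com/orders", results, "orders")),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()          # block until both IO calls have finished
        return results        # build the response from both payloads

For CPU-bound work inside a request this buys you little (see the GIL discussion below); it mainly helps when the request spends most of its time waiting on external services.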
Question 2:
The limiting factor for Ruby concurrency in general, not just with Rails, is the Global Interpreter Lock (GIL). This feature of Ruby prevents more than one Ruby thread from executing at any given time per process. The lock is released whenever non-Ruby code is executing, such as while waiting on disk IO or SQL responses. You can get around this by using a Ruby implementation other than the default, such as JRuby, though not all alternative implementations do away with the lock.
Phusion Passenger uses process-based concurrency to handle several requests concurrently, so strictly speaking it is not "multithreaded," but it is still concurrent.
This talk from Ruby MidWest 2011 has some good thoughts on getting multithreaded Ruby on Rails going.
Since this is about "from ASP.NET to RoR" there is another small but important detail to remember: In *nix environments it's common to achieve concurrency of a service application through multi-processing rather than multi-threading. This is an architecture that goes way back and is related to the relatively cheap cost of multi-processing on *nix systems using fork and Copy-on-Write. Each process serves one request at a time in a single thread and the main process controls spawning and killing worker child processes. Multiple requests are served concurrently by different child processes.
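A bare-bones sketch of that master/worker model (in Python; real pre-fork servers also handle signals, respawn dead children, and tune the accept backlog, all of which this omits, and the address, port, and worker count are arbitrary):

    # Minimal pre-fork server sketch: the parent opens the listening socket,
    # then forks N single-threaded workers that each accept and handle one
    # connection at a time. *nix only (uses os.fork).
    import os
    import socket

    LISTEN_ADDR = ("0.0.0.0", 8080)   # arbitrary address/port
    NUM_WORKERS = 4                   # arbitrary worker count

    def handle(conn):
        # Placeholder request handling: read the request, send a fixed response.
        conn.recv(4096)
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()

    def worker(server_sock):
        while True:
            conn, _addr = server_sock.accept()   # workers compete for incoming connections
            handle(conn)

    def main():
        server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind(LISTEN_ADDR)
        server_sock.listen(128)

        for _ in range(NUM_WORKERS):
            if os.fork() == 0:        # child: inherits the listening socket via a cheap copy-on-write fork
                worker(server_sock)
                os._exit(0)

        for _ in range(NUM_WORKERS):  # parent: just wait; a real master would respawn workers
            os.wait()

    if __name__ == "__main__":
        main()

Each worker serves one request at a time in a single thread, so concurrency comes entirely from the number of processes.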
Modern service applications, for example Apache, have multi-process, multi-threaded, and even combined modes (where the service forks several processes, each running several threads).
In cases where the application was built with portability in mind (examples again: Apache, MySQL, etc.), it is customary to run it in multi-process or combined mode on *nix systems, and in multi-threaded mode on Windows servers.
However, admittedly Rails is somewhat lacking on the Windows front. It's not that you can't run it on Windows, it's just that not a lot of effort went into making sure it runs well and smoothly for production use on Windows servers. It's not a common production platform among the RoR community.
As a result, even though Rails itself has been thread-safe since version 2.2, there isn't yet a good multi-threaded server for it on Windows. You get the best results by running it on *nix servers using the multi-process/single-threaded concurrency model.
Rails as a framework is thread-safe. So, the answer is yes!
The second link that you posted here names many of the Rails servers that don't do well with multi-threading. He later mentions that nginx is the way to go (it is definitely the most popular one and highly recommended), but he doesn't explain how he came to that conclusion.
Ruby 1.9.3 came out recently and has some new threading goodness built in which didn't exist before.
Use of multi-threading generally depends on the use case.
Personally, I tried it once about a year ago and it worked, but I haven't used it in any production code because I haven't come across a use case where multi-threading made more sense than pushing the long-running task to a background job.
I would love to explore this more. So, if you can describe what you are trying to achieve then maybe we can do a POC.
I would like to hear some opinions about OpenEJB: we are considering using it on a new project, but really haven't found many opinions about it.
So, here is my question: how about it? Does it perform well? Is it stable enough for a production environment?
We switched to OpenEJB (deployed embedded in our app on Tomcat). Performance tests showed results that were as good as or better than JBoss when processing our transactions (transactions include data access, JMS, and servlets). We use ActiveMQ within OpenEJB for JMS. There have been no stability problems as of yet, though we are still in a staging (pre-production) environment. The documentation is definitely lacking, but not as poor as other embedded choices. Overall, we consider this a good choice if you run on Tomcat. Deploying it on other application servers (JBoss, WebLogic, WebSphere) turned out to be much more difficult, but there are usually not many reasons to do so (we had a few, but dropped the idea after several attempts basically failed).
And as with all open source products: expect the lack of support (documentation, troubleshooting, bugs, etc.) to be compensated by free access to the sources.
We've had experience with Oracle OAS and JBoss before. We decided to give OpenEJB a try. We've found that it is not only very fast but also much easier to set up and configure, and it has much better defaults.
Currently we implement our own failure-handling measures in the client, so we don't know how they compare on clustering or other advanced features that we don't use.
When we have to go back and deal with JBoss on the development side, we see a drop in productivity, because it takes too long to bootstrap.
Until recently I'd considered myself to be a pretty good web programmer (coming up on 10 years' commercial experience on a variety of e-commerce, static, and enterprise applications). I'm self-taught and have always used the Microsoft product stack (ASP, ASP.NET)...
My applications are always functional and relatively bug-free, but have never been lightning quick. As a frequent web user I always found this to be the norm... how fast are the websites from the big tech players (eBay, Facebook, Microsoft, IBM, Dell, Telerik, etc.)? In truth, none are particularly fast. I always attributed this to "the way things are with web apps"...
...then I came across a product called Jira from Atlassian, and it has stopped me in my tracks...
This application is fast, and I mean blindingly fast... too fast to time the switches between pages, with fully live content, lots of images and data, cross-references, etc...
I run this on an intranet, with a large application DB, and this is running on a very normal server (single processor, SATA HDD, 8GB RAM).
Am I missing something? Are my programming techniques that bad? I am wondering if this speed gain is down to it being written in Java and running on Tomcat.
Does anyone have any benchmarks comparing JSP vs. ASP or Tomcat vs. IIS?
Thanks,
Mark
NOTE: this isn't a blatant plug for Jira. I don't work for them or have any affiliation to them... but I would like to be able to write applications like them :)
YMMV. But one of the longest-lived Things That Aren't True Anymore is the assertion that "Java Is Slow". Excepting floating-point (where most Java implementations aren't at liberty to use the floating-point hardware), Java is generally as fast or faster than compiled code. Some of the best and brightest have spent years of effort ensuring this, including such things as dynamic recompilation/re-optimization of code based on run-time metrics - something that statically-compiled languages like C or assembler cannot boast.
ASP is sort of the opposite extreme, since the original ASP had to recompile each page request each and every time it was made. ASPX addressed this by allowing retention of the compiled page code. That got rid of a lot of useless overhead.
A more compelling reason to prefer Java over ASPanything/IIS is freedom. A Java/Tomcat webapp will run under almost any OS on almost any hardware. IIS runs on Windows. Period. And for the most part, that also means Intel. Not Sparc, Not zSeries. Maybe you don't care. But then again, maybe next week IBM will offer your employer a can't-refuse deal on a mainframe.
I don't have benchmarks, and there are a lot of things that can make one platform preferable. But I permanently gave up on the "Java is slow" idea when I encountered the Poseidon UML tool, with its cool real-time graphics UI, and the FreeMind mind-mapping tool. A small hit to start up the JVM, but after that you'd never know what language you were working under.
The great debate. Java vs. .Net.
When .NET first came out, there was an application written called "The Pet Shop," which was a .NET port of Sun's J2EE reference application, "The Pet Store." It was announced that Microsoft's implementation was "faster."
As with anything, especially anything to do with marketing, you have to dig deeper to find the truth.
Any technology can be fast with enough hardware and the correct design.
In my experience there are two factors to speed: What type of hardware is used and how you architect your application (this includes database tuning).
Caching at various levels (response, DB, etc.) makes a huge difference in the responsiveness of a web application. There are also a lot of things done to reduce time-consuming operations, like DB connection pooling, SQL statement caching, etc. As much as I'd like to say Java is better :-), I think in this case the performance is due to the way Jira was written and the fact that it's being run internally (probably with few users compared to eBay, Facebook, or Microsoft). This site, Stack Overflow, uses ASP.NET MVC and IIS and is very responsive, and my guess (since the code is not open-sourced yet) is that they use many of the same techniques you would find in Jira or any other web application built to scale.
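One concrete example of those techniques: a connection pool keeps a handful of database connections open and hands them out per request, so the connection-setup cost is paid once rather than on every page. This is a language-agnostic idea; the sketch below uses Python and sqlite3 purely so it is self-contained, not a claim about how Jira or ASP.NET do it:

    # Minimal connection-pool sketch: connections are created once and reused,
    # so each request skips the cost of opening a new one. sqlite3 is used
    # only to keep the example self-contained.
    import queue
    import sqlite3

    class ConnectionPool:
        def __init__(self, size, db_path=":memory:"):
            self._pool = queue.Queue(maxsize=size)
            for _ in range(size):
                self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

        def acquire(self):
            return self._pool.get()      # blocks if every connection is checked out

        def release(self, conn):
            self._pool.put(conn)

    pool = ConnectionPool(size=4)
    conn = pool.acquire()
    try:
        conn.execute("SELECT 1")         # reuse an already-open connection
    finally:
        pool.release(conn)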
I think that it is not typically the frameworks and languages used that make an application slow. In my experience, some frameworks like JSF or .NET server-side controls give developers a lot of freedom to make too many database calls and look things up too often, but that's definitely not the fault of the framework used.
Keep your application as light as possible and focus on keeping the data sent to the client as small as possible, and you will have a fast application. It's usually faster to develop fast applications too.
The Jira folks have written a best-in-class application (and charge for it) - nice work, Crocodile Dundees.
I also suggest considering two aspects:
Maintenance activities, i.e. logging and deployment: in my opinion, it is easier to log, deploy, and maintain new releases on a Unix-like server than to do the same on a Windows server.
If the project requires some open source application (e.g. the Alfresco repository), Java is the better solution.
People's opinions are mostly biased. Most people have never really tried the other platform, yet claim it is slower. I wouldn't trust any answer: it's mere opinion. It's tiresome to read the same four cents again and again.
I am curious as to what others are using in this situation. I know a couple of the options that are out there like a memcached port or ScaleOutSoftware. The memcached ports don't seem to be actively worked on (correct me if I'm wrong). ScaleOutSoftware is too expensive for me (I don't doubt it is worth it). This is not to say that I don't want to hear about people using memcached or ScaleOutSoftware. I'm just stating what I "know" at this point.
So my question is basically this: for those of you ACTIVELY using distributed caching, what are you using, are you happy with it, and what should I look out for?
I am moving to two servers very soon...both will be at the same location. I use caching fairly heavily (but carefully) to reduce the load on my database server.
Edit: I downloaded ScaleOut Software's solution. I've coded against it and it seems to work really well. I just have to decide if my wallet will part with the cash for it. :) Anyone have experiences, good or bad, with ScaleOut Software?
Edit again: It's been a little while since I asked this. Any more thoughts on it? We ended up buying the solution from ScaleOut Software and have been happy with it, but I'm curious what others are doing.
Microsoft has an upcoming product code-named Velocity. It's still in CTP and is moving slowly, but it looks like it will be pretty good. We'll be beating it up in the near future to see how it handles what we want it to do (> 2 million reads/writes per hour). Will post back with results.
There is a 100% native .NET, well-documented, open source (LGPL) project called Shared Cache. It looks like it is not yet mentioned on SO, but it's promising and should be able to do what most people expect from a distributed cache. It even supports different strategies such as distributed or replicated caching.
I will update this post with more details as soon as I've had a chance to try it on a real project.
We're currently using an incredibly simple cache that I wrote in a couple of hours, based on re-hosting the ASP.NET cache in a Windows Service (more info and source code here). I won't pretend it's anywhere near as optimised as something like Memcached but we were just looking for something simple and free until Velocity came along, and it's held up extremely well even under fairly heavy load.
It comes down to our personal preference for core components - i.e. the ones that affect whether the site is available or not - that they be either (a) supported by a vendor with a history of rapid, high-quality support, or (b) written by us, so that if something goes wrong we can fix it quickly. Open source is all well and good, and indeed we do use some OSS, but if your site is offline then unfortunately newsgroups et al. don't have a one-hour SLA, and just because it's OSS doesn't mean you have the necessary understanding or ability to fix it yourself.
We are using the memcached port for Windows and we are very pleased with it. The enyim.com memcached client API is great and easy to work with. It's also open source, which is a big advantage, if you ask me.
We are now using this setup in a production web-app and it has helped a lot in improving its performance.
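For anyone new to this, the pattern in play here is plain cache-aside, and the memcached protocol is language-agnostic, so it looks much the same whichever client you pick. A rough sketch (in Python with the python-memcached client rather than enyim, and with load_user_from_db standing in for the real database query):

    # Illustrative cache-aside pattern against memcached. The client library
    # here is python-memcached (the answer above uses the .NET enyim client);
    # load_user_from_db is a hypothetical stand-in for the expensive DB query.
    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])   # assumed memcached address

    def load_user_from_db(user_id):
        # Placeholder for the real database lookup.
        return {"id": user_id, "name": "example"}

    def get_user(user_id):
        key = "user:%d" % user_id
        user = mc.get(key)                      # 1. try the cache first
        if user is None:
            user = load_user_from_db(user_id)   # 2. cache miss: hit the database
            mc.set(key, user, time=600)         # 3. cache the result for 10 minutes
        return user

Note that every set/get serializes and deserializes the value (pickling, in this client), which can show up as client-side CPU under heavy load.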
There's a great .NET wrapper/port found here on Codeplex. Awesomesauce!
We use memcached with the enyim library in a production environment (www.funda.nl). It works fine and we are very pleased with it, but we did notice a substantial rise in CPU use on the clients, presumably due to the serializing/deserializing going on. We do around 1000 reads per second.
One tried and tested product, used by hundreds of customers worldwide, is NCache. It's a feature-rich product that lets you store session state in a redundant and highly available manner, lets you share data within the enterprise as well as bridge WAN communication, essentially acting as a data fabric, and it lets you build an elastic caching tier so that when your application scales, you can add servers to the cache and actually boost performance further.