How big can the Plone 4 database be? - plone

In 2004 Mahendra gave a talk about using Plone with DSpace to manage digital assets.
http://linux-bangalore.org/2004/schedules/talkdetails.php?talkcode=C0300032
Mahendra said:
Zope provides a lot of features and an excellent architecture for handling digital content. However, Zope has issues as the stored data scales to the order of Giga/Tera bytes. A combination of Zope + Plone is great as a portal management system, but if an attempt is made to use it for storing digital assets, performance can drop down.
So he proposed using DSpace to manage digital assets instead of Plone. But maybe things have changed since then. What are the limits of using Plone as a digital asset manager now?

Since that article was written, the ZODB (Zope's data storage system) has grown support for blobs stored as separate files on the filesystem, which basically means you're limited by the capabilities of the filesystem in use. I know of multiple installations with more than 20GB of data, which was the number mentioned in the article.
Now if you want to catalog the assets so they can be easily found, then you'll hit other limits based on the sophistication of your catalog algorithms and data structures. Plone can handle quite a bit of data, but it tends to require lots of RAM and careful tuning (and probably customization) once you get beyond 50,000 content items or so.
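As a rough illustration (the file names are made up, and it assumes a blob-aware ZODB release and the standard transaction package), opening a FileStorage with a separate blob directory and storing a large file as a Blob looks something like this:

# Minimal sketch: with a blob directory configured, the large file data is
# written to the filesystem instead of into the Data.fs pickle storage.
import shutil
import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage
from ZODB.blob import Blob

storage = FileStorage('Data.fs', blob_dir='blobstorage')
db = DB(storage)
conn = db.open()
root = conn.root()

root['asset'] = asset = Blob()
with open('big-asset.iso', 'rb') as src, asset.open('w') as dst:
    shutil.copyfileobj(src, dst)   # bytes go to a file under blobstorage/

transaction.commit()
conn.close()
db.close()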

Related

Need suggestions regarding SCORM compliant learning solutions

We are building an m-learning solution [iOS and Android compatible] at our company. The product needs to be SCORM compliant. I would like to know whether it should be developed in-house by the developers or whether other paid options should be pursued. What are other ways of making our product SCORM compliant? We are not really positive about using SCORM Engine for this because it is a high-cost solution to our problem. Any suggestion/help is appreciated.
You can include SCORM within content using a number of open source options available on GitHub.
Getting SCORM in the content (free) is step 1.
Packaging, bundling and deploying is really step 2.
This typically has a close relationship to how the curriculum defines a structure of lessons, modules, units, etc. Not knowing exactly how they want to organize this, I can speculate that you may just have a simple "I want to know that the student viewed the content" approach. If you get into a richer dependency where how the student performs dictates what they see or do next, that requires much more up-front design so you can bridge the design, development, and deployment of your content.
Including SCORM Support in content -
As mentioned, if you search Google for my SCOBot project or Pipwerks you'll hit the ground running.
Requires a JavaScript-friendly developer and some base SCORM knowledge attained through reading. This could be outsourced.
Knowing the version of SCORM you wish to support can help. Consult the LMS to find out that info.
As far as presenting/creating content: if you are doing this from scratch you'd need an HTML/JS developer, or if it's more interactive you're dipping into WebGL, Canvas or beyond. There are other paid services like iSpring, Captivate and others that offer content creation with SCORM standards support. They may even take care of the packaging for you (covered below).
Packaging -
This requires a zip (CAM, Content Aggregation Model) which includes an imsmanifest.xml file describing a one-to-many relationship that forms the table of contents. Again, a single item is simple; many items let you group tiers and add objectives and other things, which increases complexity but is doable.
You can create this package yourself with XML, zip and specification knowledge. I have a packaging app on my site and a (free) Mac AppleScript which can also perform very basic packaging. I am not aware of any other free options.
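If you want to script the packaging step, a rough sketch in Python could look like the following. The manifest is a stripped-down, hypothetical SCORM 1.2 example with made-up identifiers and file names; a real package also needs the XSD schema files and whatever metadata your target LMS requires.

# Rough sketch: bundle an HTML/JS lesson plus a minimal imsmanifest.xml into a
# CAM-style zip. The manifest must sit at the root of the archive.
import zipfile

MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest identifier="com.example.course" version="1.0"
          xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"
          xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2">
  <organizations default="ORG-1">
    <organization identifier="ORG-1">
      <title>Example Course</title>
      <item identifier="ITEM-1" identifierref="RES-1">
        <title>Lesson 1</title>
      </item>
    </organization>
  </organizations>
  <resources>
    <resource identifier="RES-1" type="webcontent"
              adlcp:scormtype="sco" href="index.html">
      <file href="index.html"/>
      <file href="scripts/scorm-wrapper.js"/>
    </resource>
  </resources>
</manifest>
"""

with zipfile.ZipFile('example-course.zip', 'w', zipfile.ZIP_DEFLATED) as pkg:
    pkg.writestr('imsmanifest.xml', MANIFEST)   # manifest at the zip root
    pkg.write('index.html')                     # your actual lesson content
    pkg.write('scripts/scorm-wrapper.js')       # e.g. a SCOBot/pipwerks wrapper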
Deployment
Commonly performed through FTP/file share by uploading these CAM (zip) packages. The LMS decompresses the package and reads the manifest. Sometimes you can just copy the raw files up to the LMS through a media/content server, but this greatly depends on the options available.

Why Cloudera's Impala is still "incubating"?

We are using Impala at my company and we have used it at my previous one without problems.
Is there something affecting possible production use (for example, it breaks under heavy usage, or memory leaks possible, or concurrent access is not recommended)?
Quoting from http://incubator.apache.org/:
The Apache Incubator has two primary goals:
Ensure all donations are in accordance with the ASF legal standards
Develop new communities that adhere to our guiding principles
According to the above, the Incubator doesn't have anything to do with performance or stability; rather, it is about attracting an active, diverse community to a given project and reviewing the licenses involved.

Alfresco Community 5 Share Clustering

I'm seeing a lot of conflicting information on the internet about Alfresco Share clustering. From what I can find, it looks like clustering was removed completely from Alfresco Community in versions 4.2 and above.
I did find some documentation showing that Alfresco One 5 has Share clustering and I noticed that I can enable hazelcast in Alfresco Community 5 but the clustering doesn't work at all.
Is there a way to have more than 1 instance of Alfresco Community 5 behind a load balancer and have proper synchronization/replication/clustering occur between the share instances?
Short answer
There is no cluster and no load balancer support for the Alfresco Community version (that I know of). Alfresco removed that feature from the Community version starting with 4.2, when they refactored the whole clustering implementation.
Long answer
What are you trying to achieve?
If scalability is your goal you should focus on the bottlenecks in the Alfresco architecture, which will not be solved by clustering/load balancing. I haven't seen a system where the Share tier was the bottleneck.
Quite the contrary: if the load from Share against the repository tier is too high you will run into timeout and thread escalation, since Alfresco follows the "retrying transaction" principle. If errors occur, Share will retry, which means that if the repository answers too slowly, Share will create new requests/threads until the OS reaches kernel or process limits, without any result.
So instead you should focus on optimizing the repository tier to become as fast as possible, to avoid thread escalation in Share (this also can't be achieved by clustering):
transformation --> understand, replace or disable the synchronous transformation jobs running on the repository tier
search --> understand and optimize index tracking, and run SOLR on separate host(s); tracking will still rely on the transformation performance of the repository tier
caching --> use smart reverse proxies to cache Share resources on the client and proxy side to minimize traffic
very fast/smart storage concepts on db and index tier
If availability is your concern you may get better results by using HA features from virtualisation platforms like VMware ESX, and your support effort will be a fraction of that for a clustered Alfresco.

Hosting big files for users

We need to be able to supply big files to our users. The files can easily grow to 2 or 3 GB. These files are not movies or similar. They are software needed to control and develop robots in an educational capacity.
We have some conflict in our project group about how we should approach this challenge. First of all, BitTorrent is not a solution for us (despite the goodness it could bring us). The files will be available through HTTP (not FTP) and via a file stream so we can control who gets access to the files.
As a former pirate in the early days of the internet, I have often struggled with corrupt files and used file hashes and file sets to minimize the amount of re-downloading required. I advocate a small application that downloads and verifies a file set and extracts the big install file once it is completely downloaded and verified.
My colleagues don't think this is necessary and point to the TCP/IP protocol's inherent capability to avoid corrupt downloads. They also mention that Microsoft has moved away from a download manager for their MSDN files.
Are corrupt downloads still a widespread issue, or will the amount of time we spend creating a solution to this problem be wasted compared to the number of people who will actually be affected by it?
If a download manager is the way to go, what approach would you suggest we take?
-edit-
Just to clarify: is downloading 3 GB of data in one chunk over HTTP a problem, or should we make our own EXE that downloads the big file in smaller chunks (and verifies them)?
You do not need to build your own download manager. You can use a couple of smarter approaches instead.
Split files into smaller chunks, let's say 100 MB each. So even if a download is corrupted, the user ends up re-downloading only that particular chunk.
Most web servers are capable of understanding and serving Range headers. You can recommend that users use a download manager or browser add-on which can use this capability. If your users are on Unix/Linux systems, wget is one such utility.
It's true that TCP/IP has the ability to prevent corruption, but it basically assumes that the network is still up and accessible. #2 mentioned above can be one possible work-around for cases where the network goes down completely in the middle of a download.
And finally, it is always good to provide a file hash to your users. This not only lets them verify the download but also helps ensure the integrity of the software that you are distributing.
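As a sketch of #2 and the hashing suggestion combined (the URL, file name and expected hash below are hypothetical, and it assumes the Python requests library), a resumable Range-based download with a final integrity check could look roughly like this:

# Rough sketch: resume a partial download via an HTTP Range request and
# verify a published SHA-256 hash once the file is complete.
import hashlib
import os
import requests

URL = "https://example.com/robot-sdk-3gb.zip"   # hypothetical download URL
DEST = "robot-sdk-3gb.zip"
EXPECTED_SHA256 = "..."                          # hash published next to the file

def download_with_resume(url, dest, chunk_size=1024 * 1024):
    # Continue from wherever a previous attempt left off.
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": "bytes=%d-" % start} if start else {}
    with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # 206 means the server honoured the Range request; append in that case.
        mode = "ab" if start and resp.status_code == 206 else "wb"
        with open(dest, mode) as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)

def sha256_of(path, block_size=1024 * 1024):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

download_with_resume(URL, DEST)
if sha256_of(DEST) != EXPECTED_SHA256:
    raise RuntimeError("Download corrupted; please retry.")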
HTH

What is the best way to store big files in Plone 3?

I want to serve a lot of big files in a Plone site. By big files I mean around 5 MB (music) and a lot of them. I've already done it straight into the ZODB, which was not a good idea. I'm running Plone 3.1.1 and Zope 2.10.6.
ZODB blob support is the best, most integrated way to deal with large files. Big files are stored transparently on the filesystem instead of in the ZODB object database. "Transparently" in this case means that you won't notice it in your actual programming work after the initial configuration.
The blob functionality has been backported to current (mid-2008) Zope versions and can be used in Plone 3. Use plone.app.blob in your project for this: http://plone.org/products/plone.app.blob.
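For content types built with Archetypes, a minimal sketch of a schema using plone.app.blob's BlobField might look like the following; the type, field and widget labels are made up, and it assumes plone.app.blob is installed in the site:

# Hedged sketch: an Archetypes schema whose file field stores its data as a
# filesystem blob via plone.app.blob instead of inside Data.fs.
from Products.Archetypes import atapi
from plone.app.blob.field import BlobField

MusicTrackSchema = atapi.BaseSchema.copy() + atapi.Schema((
    BlobField('track',
        required=True,
        widget=atapi.FileWidget(
            label='Music file',
            description='Uploaded data is stored on the filesystem as a blob.',
        ),
    ),
))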
Yeah, you shouldn't use anything else than the ZODB BLOB support at this point. It works fine with the 3.x series of releases.
More information in ticket #6805
— Alexander Limi, Plone co-founder
Clarifying, to the best of my knowledge:
from various candidate technologies in a PLIP (Plone Improvement Proposal), plone.app.blob is the lead contender with widespread support
-- for exceptional use cases, we sometimes find something other than BLOBs recommended
4.0 is currently the most likely milestone for plone.app.blob to become a product within Plone core
in the meantime plone.app.blob is a recommended add-on product for current 3.x versions of Plone
-- for use cases that suggest BLOB-like technologies.
As you may already know, the long-term solution for this is supposed to be the ZODB BLOB support. Ticket 6805 is probably the most authoritative source on this. Unfortunately, the milestone is set to 4.0, and running it in production on an older release is perhaps not a good solution.
Historically, there have been a lot of Plone products for storing files externally, keeping only metadata in the ZODB. I have tried several of them, and in my experience there is not a single one that works well with current Plone/Zope releases. Don't take my word for it, though; I have not tried any products of this type in the last year or so.
Personally, I would go for a solution that is as simple as possible and doesn't involve Plone more than necessary. Storing the music files on disk, serving them directly from Apache (or whatever web server you use), and keeping only metadata in Plone, in a product you write yourself, will give you a robust solution with good performance. That is, your product should produce links to a path on your web server where the music files are available.
If you require authorization for downloading the music files, and assuming that you run lighttpd or Apache in front of your Zope, a solution based on X-Sendfile is probably the best option. With X-Sendfile, you keep the files on disk and add a header (X-Sendfile) to the response when a music file should be sent to the client browser. The web server will pick this header up and send the file to the client, without Plone being involved.
Some pointers:
http://tn123.ath.cx/mod_xsendfile/ (The apache module)
http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/ (Ruby example)
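As an illustration (the view class, content type and filesystem path are made up), a small Zope browser view could hand the transfer off to Apache's mod_xsendfile roughly like this:

# Hedged sketch: a Plone/Zope browser view that sets the X-Sendfile header so
# Apache (with mod_xsendfile enabled and the path allowed) streams the file
# itself; Zope never reads the file data.
from Products.Five.browser import BrowserView

class MusicDownload(BrowserView):

    def __call__(self):
        path = '/var/media/music/%s.mp3' % self.context.getId()  # file on disk
        response = self.request.response
        response.setHeader('Content-Type', 'audio/mpeg')
        response.setHeader('X-Sendfile', path)
        return ''   # body is ignored; Apache serves the file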
I have plone.app.blob installed on some low-traffic sites and installable (ready to roll, if you like) for my busier production sites in the same instance.
There's the 4.0 milestone but I'll certainly review (and probably click the install button for plone.app.blob on my production sites) around 3.4 time.
A couple of references:
http://n2.nabble.com/PLIPs-I%27d-love-to-see-for-Plone-3.3-tp1123218p1130015.html
http://dev.plone.org/plone/ticket/8629#comment:2
… 3.4, when we'll probably have blob filestorage specification support added to plone.recipe.zeoserver and zope2instance. That will give us a standard location for whatever owner/permission fixups the installers need to make.
In context: I'm playing fairly roughly with plone.app.blob and a very mixed bag of other add-on products on versions 3.1.7 and 3.2a1 of Plone, based on standard and experimental installers. In these environments, without me treating things with kid gloves, Plone sites behave remarkably well, and when (as expected) experiments lead to oddities, the support from the community is paced and proper.
