How to migrate repository data from Alfresco 4 to 5? - alfresco

I'm working on a migration from Alfresco 4 to 5, and installing add-ons on the Alfresco 4 system for this purpose is not an option. The two versions use different databases. I have tried ACP files, but that is very time-consuming. Is there a size limit on ACP files? What other methods can be used?

Use Standard Upgrade Procedure
What is your main intention? "Just" doing an upgrade from 4 to 5?
In that case the robust, easy way would be to:
Install the required modules containing custom models on your target system (or, if you customized models in the extension path, copy that config over).
Back up the Alfresco repo database and restore it to your new (5.x) system. If your target system uses a different DB product (not just a different version), you need to handle the DB migration with DB-specific migration tools. Alfresco export/import is not an alternative here.
Sync alf_data/contentstore to your new system (make sure the DB dump is always older than the synced content store, or do an offline sync).
During startup, Alfresco recognizes that the repo needs to be upgraded and does everything itself. Check catalina.out for any output during the upgrade.
If you only need a subset of your previous system, it is much easier to delete the unwanted content afterwards (don't forget to purge the trash, and configure the cleaner job not to wait 14 days).
Some words concerning ACP
It is a nice tool for exporting single directories, but unfortunately it is limited:
no support across Alfresco versions (exactly your case)
no support for site metadata / no site export/import (it may work since the 4.x changes that put site metadata in nodes, but I suppose nobody has tested this)
must run in one transaction. The hard limits therefore depend on your hardware / JVM configuration, but I wouldn't recommend exporting/importing more than a few thousand nodes at once.
If you really need to export/import a huge number of documents, you should run the import/export in a separate Java process, which means your Alfresco needs to be shut down. See https://wiki.alfresco.com/wiki/Export_and_Import#Export_Tool

ACP does have a file limit (I can't remember the actual number), but we've had problems with files below that limit too. We've given up on this approach in favor of the Alfresco bulk import tool.
One big advantage of this tool is that it can resume a failed import from the point of failure, so there is no need to delete the partially imported batch and start all over again. It can also update files as needed, something the ACP method can't do (it would fail with DuplicateChildNameNotAllowed).

Related

When migrating from an old Artifactory instance to a new one, what is the point of copying $ARTIFACTORY_HOME/data/filestore?

Artifactory recommends the steps outlined here when moving from an old Artifactory server to a new one: https://jfrog.com/knowledge-base/what-is-the-best-way-to-migrate-a-large-artifactory-instance-with-minimal-downtime/
Under both methods it says that you're supposed to copy over $ARTIFACTORY_HOME/data/filestore, but then you just go ahead and export the old data and import it into the new instance, and in the first method you also rsync the files. This seems like you're just doing the exact same thing three times in a row. JFrog really doesn't explain why each of these steps is necessary, and I don't understand what each does differently that cannot be done by the others.
When migrating an Artifactory instance, we need to take two things into consideration:
Artifactory Database - Contains the information about the binaries, configurations, and security information (users, groups, permission targets, etc.)
Artifactory Filestore - Contains all the binaries
Regardless of your question, I would like to add that, in my experience, for a big filestore (500GB+) it is recommended to use a skeleton export (export the database only, without the filestore; this can be done by marking "Exclude Content" in Export System) and to copy the filestore with a third-party tool such as rsync.
I hope this clarifies further.
The main purpose of this article is to provide a somewhat faster migration compared to a simple full export & import.
The idea of both methods is to select "Exclude Content". The content we choose to exclude is exactly what is stored in $ARTIFACTORY_HOME/data/filestore/.
The difference between the methods is that Method #1 involves some downtime, as you will have to shut down Artifactory at a certain point, sync the diffs, and start the new instance.
Method #2, by contrast, is a somewhat more complex process that uses in-app replication to sync the diffs.
Hope that makes more sense.

Transfer content from one Alfresco instance to another (same version) on another server

What would be the best (or a better) way to transfer repository content from one Alfresco instance (Enterprise Edition) to another instance running on a different server? Currently we copy the entire Alfresco database and the file system under alf_data, but that requires downtime on the servers.
I need a mechanism that copies the repository data from one instance to another without downtime. Is this possible?
In addition to Heiko's solution, you might be interested in:
The out-of-the-box replication service, which wouldn't be good for replicating your entire repo, but can be used for replicating a handful of nodes from one server to another.
A solution from Parashift which allows one- and two-way replication of nodes between servers.
An Alfresco presentation on using Apache Camel and Apache Kafka to replicate nodes between servers. This is available through Alfresco's professional services organization, but it may make it into the product at some point. Or you could use it as inspiration to write your own solution.
What is your intention? A standby system, a real copy, an external private cloud with a subset of data?
If you just need a 100% clone, you can script backup & restore without downtime on the source server. Downtime is limited to the DB and index restore on the target system. Your script shouldn't copy live data from the Solr index; use the backup made by the Solr backup job instead. Depending on the database you use, an online DB backup shouldn't be an issue.
Our Alfresco Virtual Appliance has preconfigured scripts and jobs for this task to start an additional alfresco instance from snapshot backups without copying the contentstore (we call this Alfresco Time Machine).
If your aim is an external private cloud server or a road warrior solution, ecm4u has a commercial Alfresco module that very efficiently syncs a subset of modified nodes, including metadata/types/aspects (the list of types and aspects needs to be defined). This sync provides a REST interface for automation and can also be executed manually from Alfresco's admin console. We support a mix of Alfresco versions and editions. At the moment this sync is implemented as a unidirectional sync, but it could be extended to a bidirectional one.
I recently installed two Alfresco instances on my local machine, running on two different ports.
While performing some tasks, I realized that two instances having the same repository ID causes issues.
I was able to change the repository ID of one of them with the following steps:
Update alfresco-global.properties:
db.name="Add new DB Name"
(Alfresco will create a database with the db.name given here during initialization)
Restart the server.
If you are still facing issues, try deleting the Solr indexes under the alf_data folder.

How to upgrade (merge) web.config with web deploy (msdeploy)?

I'm trying to set up a deployment chain for some of our ASP.NET applications. The tool of choice is Web Deploy (msdeploy) - for now. Unfortunately I'm stuck on a problem.
A high level overview of the chain is thus:
Web developer creates the code and checks it in SVN;
Buildserver sees the update and builds the msdeploy .zip package of the website;
The .zip package is automatically put inside our installer and sent to various clients;
The clients run the installer on their webserver(-s);
The installer uses msdeploy internally to deploy the .zip package and create a new website or upgrade an existing one.
Msdeploy makes it easy to deploy a new instance, but I'm stumped about how to perform an "upgrade" install. The main problem is the web.config file. Each client will most certainly have made some customizations there to suit their specific environment. The installer itself offers to set some more critical parameters at the first-time installation (achieved by msdeploy's parameter mechanism), but they can do others by hand.
On the other hand, we developers also occasionally make changes to web.config, adding some new settings or removing obsolete ones. So I can't just tell msdeploy to ignore the file entirely. I need some kind of advanced XML modification mechanism. It could be a script that the developers maintain, but then it needs to be run ONLY at upgrades, not new installs.
I've no idea how to accomplish this.
Besides that, sometimes there's also some completely weird upgrade logic. For example, the application comes with our company logo, but some clients have replaced that .png file to show their own logo. Recently we needed to update the logo - but only for clients that hadn't replaced it with their own.
Similarly, there might be some cache folders that might need to be cleaned at SOME upgrades but not at others. Or folders with user content that may not be touched (but come with default content at the initial installation). Etc.
How do you normally achieve this dual behavior for msdeploy packages? Do I really need to create 2 distinct packages for every application?
Suggestion from personal experience:
Isolate customisations
Your customers should have the ability to customise their setup, and the best way is to provide them with something like an override file. That way you install the new package and then superimpose your customer's customisations on top of your standard setup. If it's a brand new install, there will be nothing to superimpose.
top-level
    standard files
        images        | This will never be touched or changed by customer
        settings.txt  |
    customer files
        images                 | Customer hacks this to their heart's content
        settings.txt_override  |
Yes, this does mean that some kind of merging process needs to happen, and there needs to be a script that does it, but this approach has several advantages:
For settings that suddenly become redundant just issue a warning to that effect
If a customer has their own logo provide the ability to specify this in the override file
The message is clear to customers. Stay off standard files.
If customers request more customisable settings then write the default if it does not exist into the override file during upgrades.
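The merge step described above could be sketched roughly as follows. This is only an illustration: it assumes that settings.txt and settings.txt_override are plain key=value files, which the answer does not specify, and the function names parse_settings and merge_settings are hypothetical.

```python
def parse_settings(text):
    """Parse simple key=value lines, skipping blank lines and # comments.

    The key=value format is an assumption made for this sketch.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def merge_settings(defaults, overrides):
    """Return (merged, warnings): overrides win, obsolete keys only warn."""
    merged = dict(defaults)
    warnings = []
    for key, value in overrides.items():
        if key in defaults:
            merged[key] = value
        else:
            # The setting no longer exists in the standard file.
            warnings.append("override '%s' is no longer used" % key)
    return merged, warnings


# Standard file shipped with the package, and the customer's override file.
standard = parse_settings("logo=images/logo.png\ncache_minutes=30\n")
customer = parse_settings("# customer edits\nlogo=images/acme.png\nold_flag=1\n")

merged, warnings = merge_settings(standard, customer)
print(merged)
print(warnings)
```

Keeping the merge a pure function of two dictionaries means the same code serves a fresh install (empty overrides) and an upgrade.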
Vilx, in answer to your question, the logic for knowing whether it is an upgrade or not must be contained in the script itself.
To run an upgrade script before installation
msdeploy -verb:sync -source:contentPath="C:\Test1" -dest:contentPath="C:\Test2" -preSync:runcommand="c:\UpgradeScript.bat"
Or to run an upgrade script after installation
msdeploy -verb:sync -source:contentPath="C:\Test1" -dest:contentPath="C:\Test2" -postSync:runcommand="c:\UpgradeScript.bat"
More info here
As to how you know it's an upgrade: your script could check for a text file called "version.txt", and if it exists, the upgrade bat script will run. The installed version would be stored in that text file. A bit basic, but it should work.
This also has the added advantage of letting you merge the customer's custom settings between versions more elegantly, as you know which properties could be overridden for that particular version.
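A minimal sketch of the version.txt check, written in Python rather than as a .bat script. The file name comes from the answer; the function names and the idea of writing the file after a successful install are assumptions for illustration.

```python
import os
import tempfile

VERSION_FILE = "version.txt"  # marker file name suggested in the answer


def installed_version(site_dir):
    """Return the version stored in version.txt, or None on a fresh install."""
    path = os.path.join(site_dir, VERSION_FILE)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()


def record_version(site_dir, version):
    """Write the marker file once an install or upgrade has succeeded."""
    path = os.path.join(site_dir, VERSION_FILE)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(version + "\n")


def plan_install(site_dir, new_version):
    """Decide between a fresh install and an upgrade."""
    current = installed_version(site_dir)
    if current is None:
        return "fresh install of " + new_version
    return "upgrade from %s to %s" % (current, new_version)


# Demo against an empty temporary directory standing in for the web root.
demo_dir = tempfile.mkdtemp()
first = plan_install(demo_dir, "1.0")
record_version(demo_dir, "1.0")
second = plan_install(demo_dir, "1.1")
print(first)
print(second)
```

Because the stored version is returned as well as detected, the upgrade logic can branch on which version it is upgrading from.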
Here are some general suggestions (not specific to msdeploy), but I hope they help:
I think you'll need to provide several installers anyway: one for the initial setup and one for each version-to-version upgrade.
I would suggest letting your clients merge the config files themselves. You could provide them with a detailed description of what was added/changed/removed, and/or include a utility that simplifies the merge. Maybe this and this links will give you some pointers.
As for merging the replaced logos and other client customizations, I think the best approach would be to support branding in your application. I mean: move all branding details to a place that your new/upgrade installers won't touch.
As for the rest of the adjustments made by your clients, they make those at their own risk, so the only help you can provide is a detailed list of changes (maybe even the list of files changed since the previous version) and a how-to article about merging the sources with tools like Araxis Merge or similar.
Or you could create a utility and include it in the installer, which would try to do all the tricky merging on the client's machine. I would not recommend this approach, as it requires a lot of effort and resources to maintain.
One more thing: you could focus on backing up the client's previous copy before the upgrade. Then, even if the client has trouble upgrading, it will always be possible to roll back. The only thing left for you is to provide a good feedback channel that your clients can use to report their problems. This feedback will show you what problems your clients have and how to make their upgrade process more comfortable.
I would build on what the above have said, but I would do it with transformations, and strict documentation about who configures what. The way you have it now relies on customer intervention against a config that is mission critical to the app deploy process.
Create three config file areas. One for development, one for the "production generic" build, and one that is an empty template for the customer to edit.
The development instance should be self explanatory. This is the transform that takes the production generic template and creates a web config for your development server. (it sounds like you are shooting for a CI type process here)
The "production generic" transform should set the app up for a hypothetically perfect instance of the app. This is what the install would look like if the architect had his way.
The customer transform is used by the customers to set up the web config as required to meet their own needs. Write some documentation and see what happens. Edit the docs as you help customers through the process.
Is that what you were looking for? Thoughts?

Synchronizing Plone 4 sites

I'm using Plone 4 for my sites and I was wondering if there is a way to synchronize two Plone sites, i.e. to synchronize my development site with my production site.
I have looked at the ZSyncer product, and it appears that it is no longer maintained. Besides, the last version is not compatible with Plone 4.
I am thinking of writing a custom script that will handle exporting of the data.fs files and the src files as explained in these two articles:
Copying a remote site database
Copying a Plone site
Is there a better way of synchronizing two plone sites as described by my use case above?
For keeping the code synchronized, you want collective.hostout
For the database, use collective.recipe.backup - you could probably also use hostout to import the backups
Not sure if this solution will fit all your needs, but I use DemoStorage, which has been built into ZODB since version 3.9 (Plone 4 uses it).
You set up DemoStorage on the development instance and use the Data.fs from production. All changes are stored in memory or in a separate file (depending on how you configure it), so changes in dev will not be visible in production. If you have both instances on the same server, you can use Data.fs directly (without copying it), so it will always be synchronized.
To configure it you have to modify the buildout. See: https://pypi.python.org/pypi/plone.recipe.zope2instance#advanced-options
When transactions on prod and on dev change the same objects (it happens occasionally), DemoStorage can show errors. Then you just have to restart the dev instance (if you store changes in memory), or remove the file with the changes and then restart.

MSBuild: automate collecting of db migration scripts?

Summary of environment.
Asp.net web application (source stored in svn)
SQL Server database. (Database schema (tables/sprocs) stored in svn)
db version is synced with web application assembly version. (stored in table 'CurrentVersion')
CI Hudson server that checks out the web app from the repo and runs a custom msbuild file to publish/package the app.
My msbuild script updates the assembly version of the web app (Major.Minor.Revision.Build) on each build. The 'Revision' is set to the currently checked-out svn revision and the 'Build' to the Hudson build number (incremented on each automated build).
This way I can match the app to a specific trunk revision and also get other build stats from the Hudson build number.
I'd like to automate the collecting of migration scripts (updated sprocs etc.) to add to the zip package.
I guess that by comparing the svn revision of the db that has yet to be deployed to with the revision being deployed, I can find which db files have changed in the trunk since the last deployment to that database/environment.
This could easily be achieved by manually calling the svn diff -r REVNO:REVNO command to list the changed .sql files. These files would then have to be added to the package manually.
It would be great if this could be automated.
Firstly, I'd imagine I'll have to write a custom task to check the version of the db that has yet to be deployed to. After that I'm quite unsure.
Does anyone have any suggestion on how this would be achieved through an msbuild task either existing or custom?
Finally I'll have to autogen a script to add to the package that updates the database version table so as to be in sync with the application.
Integrating SQL changes into an automated build/deploy process is HARD. I know, because I've tried to do it a couple of times with limited success. What you're trying to do is roughly on the right track, but I would argue that it's actually a bit too complicated. In your proposal, you suggest collecting the specific SQL scripts that need to be applied to your DB at build/package time. Instead, you should package all your delta scripts (for the entire history of your database) with your project, and calculate the deltas that actually need to be applied when you deploy -- that way, your deployable package can be deployed to environments with databases of differing versions. There are two implementation pieces you need to achieve this:
1) You need to package your deltas into your deployable package. Note that you should package deltas -- not static files that create the schema in its current state. These delta scripts should be in source control. It's okay to keep the static schema in source control as well, but you will have to keep it in sync with the deltas. You can actually use a tool like Red Gate's SQLCompare or the VS Database version to generate (most) deltas from the static schema. To get the deltas into your deployable package, and given that you're using svn -- you may want to look into svn:externals as a way to "soft link" the delta scripts into your web project. Your build script can then simply copy them into your deployable package.
2) You need a system that can read the list of delta files, compare them to an existing database, determine which deltas need to be applied to that database, and then apply the deltas (and update the bookkeeping information, like the database version). There is an open-source project (sponsored by ThoughtWorks) called dbdeploy that accomplishes this. I've had some success with that tool personally.
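The selection step in 2) can be illustrated with a short sketch. This is not how dbdeploy is implemented; it is a simplified illustration that assumes each delta file name starts with a sequence number (a hypothetical convention such as 002_add_users.sql) and that the database records the number of the last applied delta.

```python
import re

# Hypothetical naming convention: <number>_<description>.sql
DELTA_NAME = re.compile(r"^(\d+)[_-].*\.sql$")


def pending_deltas(filenames, current_version):
    """Return the delta scripts newer than current_version, in apply order."""
    numbered = []
    for name in filenames:
        match = DELTA_NAME.match(name)
        if match:  # ignore files that do not follow the naming scheme
            numbered.append((int(match.group(1)), name))
    numbered.sort()  # numeric order, so 010 comes after 002
    return [name for number, name in numbered if number > current_version]


# All deltas ship in the package; the target database is at version 1.
packaged = [
    "010_drop_legacy.sql",
    "002_add_users.sql",
    "001_create_schema.sql",
    "README.txt",
]
todo = pending_deltas(packaged, current_version=1)
print(todo)
```

Because the list is computed at deploy time from the database's own version, the same package can be applied to environments that are at different versions.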
Good luck -- this is a tough nut to crack (correctly).
Have a look at SQL database projects. In VS 2010 they have been enhanced quite a bit and have built in deployment capabilities that can sync your DEV database to other environments.
Here are a few good links about DB projects in vs 2010:
http://msmvps.com/blogs/deborahk/archive/2010/05/02/vs-2010-database-project-building-and-deployment.aspx
http://weblogs.asp.net/gunnarpeipman/archive/2009/07/29/visual-studio-2010-database-projects.aspx
Try SQL Examiner:
http://www.sqlaccessories.com/Howto/Version_Control.aspx
You can automate script collecting with SQL Examiner command-line tool.
The solutions available today that target a .NET/SQL Server stack are:
DBUp (open source)
ReadyRoll (deeper Visual Studio integration, auto-generation of scripts)
The latter product is one that we're actively developing here at Redgate.

Resources