Transfer content from one Alfresco instance to another (same version) on another server - alfresco

What would be the best way to transfer repository content from one Alfresco instance (Enterprise edition) to another instance running on a different server? Currently we copy the entire Alfresco database and the file system under alf_data, but that requires downtime on the servers.
I need a mechanism that copies the repository data from one instance to the other without downtime. Is there any way this is possible?

In addition to Heiko's solution, you might be interested in:
The out-of-the-box replication service, which wouldn't be good for replicating your entire repo, but can be used for replicating a handful of nodes from one server to another.
A solution from Parashift which allows one- and two-way replication of nodes between servers.
An Alfresco presentation on using Apache Camel and Apache Kafka to replicate nodes between servers. This is available through Alfresco's professional services organization, but it may make it into the product at some point. Or you could use it as inspiration to write your own solution.

What is your intention? A standby system, a real copy, an external private cloud with a subset of data?
If you just need a 100% clone, you can script backup & restore without downtime on the source server. Downtime is limited to the DB and index restore on the target system. Your script shouldn't copy live data from the Solr index; use the backup produced by the Solr backup job instead. Depending on the database you use, an online DB backup shouldn't be an issue.
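As a rough illustration only, assuming PostgreSQL, rsync access between the two servers and default install paths (the Solr backup directory name and all paths are assumptions, adjust to your setup), such a script could look like:
# on the source server; no downtime needed here
pg_dump -U alfresco -Fc alfresco > /backup/alfresco-db.dump        # online DB dump
rsync -a /opt/alfresco/alf_data/contentstore/ target:/opt/alfresco/alf_data/contentstore/
# copy the backup produced by the Solr backup job, not the live index
rsync -a /opt/alfresco/alf_data/solr4Backup/ target:/opt/alfresco/alf_data/solr4Backup/
# on the target server: short downtime while the DB and index are restored
pg_restore -U alfresco -d alfresco -c /backup/alfresco-db.dump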
Our Alfresco Virtual Appliance has preconfigured scripts and jobs for this task, to start an additional Alfresco instance from snapshot backups without copying the contentstore (we call this Alfresco Time Machine).
If your aim is an external private cloud server or a road warrior solution, ecm4u has a commercial Alfresco module to sync a subset of modified nodes very efficiently, including metadata/types/aspects (the list of types and aspects needs to be defined). The sync provides a REST interface for automation as well as manual execution from Alfresco's admin console. We support a mix of Alfresco versions and editions. At the moment this sync is implemented as a unidirectional sync, but it could be extended to a bidirectional sync.

I recently installed two Alfresco instances on my local machine, running on two different ports.
While performing some tasks, I realized that two instances having the same repository ID creates issues.
I was able to change the repository ID of one of them with the following steps:
update alfresco-global.properties:
db.name=<new database name>
(Alfresco will create a database with the name specified in db.name while initializing)
and restart the server
If you are still facing issues, try deleting the Solr indexes under the alf_data folder.

Related

How to migrate repository data from Alfresco 4 to 5?

I'm working on a migration from Alfresco 4 to 5, and applying add-ons on Alfresco 4 for this purpose is not an option. The databases used by the two versions are different from each other. I have tried ACP files and it is very time consuming. Is there a size limitation on ACP files? What other methods can be used?
Use Standard Upgrade Procedure
What is your main intention? "Just" doing an upgrade from 4 to 5?
In that case the robust, easy way would be to:
Install the required modules with custom models in your target system (or, if you customized models in the extension path, you have to copy that config)
Back up and restore the Alfresco repository database to your new (5.x) system. If your target system uses a different DB product (not just a different version), you need to manage the DB migration using DB-specific migration tools; Alfresco export/import is not an alternative here.
Sync alf_data/contentstore to your new system (make sure the DB dump is always older than the contentstore, or you need to do an offline sync); a rough sketch of these two steps follows below.
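A rough sketch of the database and contentstore steps, assuming PostgreSQL on both sides and SSH/rsync access from the new 5.x server to the old 4.x server (all host names, paths and database names are placeholders):
# on the 4.x server: dump the repository database first
pg_dump -U alfresco -Fc alfresco > /tmp/alfresco4.dump
# on the 5.x server: restore the dump into the new, empty database
pg_restore -U alfresco -d alfresco /tmp/alfresco4.dump
# then sync the contentstore, so the content is never older than the DB dump
rsync -a alfresco4:/opt/alfresco/alf_data/contentstore/ /opt/alfresco/alf_data/contentstore/
# start Alfresco 5 and watch the upgrade in the log
tail -f /opt/alfresco/tomcat/logs/catalina.out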
During startup Alfresco recognizes that the repo needs to be upgraded and does everything required. Check catalina.out for any output during the migration.
If you only need a subset of your previous system, it is much easier to delete the unwanted content afterwards (don't forget to purge the trash, and you should configure the cleaner job so it doesn't wait 14 days).
Some words concerning ACP
It is a nice tool for exporting single directories, but unfortunately it is limited:
no support across Alfresco versions (exactly your case)
no support for site metadata / no site export/import (it might work after the 4.x changes that put site metadata in nodes, but I suppose nobody has tested this)
must run in one transaction, so the hard limits depend on your hardware / JVM configuration, but I wouldn't recommend exporting/importing more than a few thousand nodes at once
If you really need to export/import a huge number of documents, you should run the export/import in a separate Java process, which means your Alfresco needs to be shut down. See https://wiki.alfresco.com/wiki/Export_and_Import#Export_Tool
ACP does have a file limit (I can't remember the actual number), but we've had problems with exports below that limit too. We've given up on this approach in favor of the Alfresco bulk import tool.
One big advantage of this tool is that it can continue a failed import from the point of failure; there is no need to delete the partially imported batch and start all over again. It can also update files as needed, something the ACP method can't do (it would fail with DuplicateChildNameNotAllowed).
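For reference, the filesystem variant of the bulk import tool can be triggered over HTTP roughly like this; the endpoint and parameter names vary between versions of the tool, so treat them and the credentials and paths below as assumptions:
curl -u admin:admin \
  --data-urlencode "sourceDirectory=/staging/exported-content" \
  --data-urlencode "targetPath=/Company Home/Imported" \
  http://localhost:8080/alfresco/service/bulkfsimport/initiate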

Deploying a Symfony 2 application in AWS Opsworks

I want to deploy a PHP application from a git repository to the AWS OpsWorks service.
I've set up an app and configured Chef cookbooks so it runs the database schema creation, dumps assets, etc.
But my application has some user generated files in a subfolder under the web root. The git repository has a .gitignore file in that folder, so an empty folder is there when I run the deploy command.
My problem is: after generating some files (by using the site) in that folder, if I run the 'deploy' command again, OpsWorks adds a new release under the 'site_name/releases/xxxx' folder and symlinks to it from the 'site_name/current' folder.
So it makes my previous user generated files inaccessible. What is the best solution for this kind of situation?
Thanks in advance for your kind answers.
You have a few different options. Listed below in order of personal preference:
Use Simple Storage Service (S3) to store the files.
Add an Elastic Block Store (EBS) volume to your server and save files to the volume.
Save files to a database (this is something I would not do myself, but the option is there).
When using OpsWorks think of replicable/disposable servers.
What I mean by this is that if you can create one server (call it server A) and then switch to a different one in the same stack (call it server B), the result of using server A or server B should not impact how your application works.
While it may seem like a good idea to save your user generated files in a directory that is shared between different versions of your app (every time you deploy, a new release directory is generated), when you destroy your server you run the risk of destroying your files.
Benefits and downsides of using S3?
Benefits:
S3 gives you high redundancy and availability for your files.
S3 is external to your application server, so if your server dies or you decide to move it to a different region, you can continue using the same S3 bucket.
Easy to scale: you could add multiple application servers that all read and write files to S3.
Downsides:
You need extra code in your application. You will have to use the AWS API in order to store and retrieve the files. Using the S3 API is not hard, but it may require an extra step to get where you need. Take a look at the "Using an Amazon S3 Bucket" walkthrough for reference; it shows the code used to upload the files to the S3 bucket in the example.
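Just to illustrate how small the store/retrieve operations are (your app would use the SDK, but the AWS CLI exposes the same calls; the bucket and file names below are made up):
# upload a user generated file and fetch it back
aws s3 cp web/uploads/report.pdf s3://my-app-user-files/uploads/report.pdf
aws s3 cp s3://my-app-user-files/uploads/report.pdf /tmp/report.pdf
# hand the browser a temporary download URL instead of proxying the file yourself
aws s3 presign s3://my-app-user-files/uploads/report.pdf --expires-in 3600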
Benefits and downsides of using EBS?
Benefits:
EBS is an "external hard drive" that you can easily mount to your machine using the OpsWorks Resource Manager.
EBS volumes can be backed-up and restored.
It may be the fastest option to implement and integrate into your application.
Downsides:
You need to assign it to an instance before it is running.
It could be time consuming to move from server A to server B (downtime may be required).
You cannot scale your application horizontally. While you can create copies of the EBS volume and assign them to different instances, the volumes will not be shared.
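If you go the EBS route, the manual steps look roughly like this (OpsWorks' Resource Manager handles the attach for you; the device name and paths below are assumptions, and the link target is assumed not to exist yet):
# assuming the volume shows up as /dev/xvdf (device names vary)
sudo mkfs -t ext4 /dev/xvdf              # only the first time, to create a filesystem
sudo mkdir -p /mnt/user-files
sudo mount /dev/xvdf /mnt/user-files
# point the app's upload folder at the volume so it survives new releases
sudo ln -s /mnt/user-files /srv/www/site_name/shared/uploads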
Downside of using a database?
Just do a Google search for "storing files in a database".
Take a look at Storing Images in DB - Yea or Nay?
My preferred choice would be to use S3, but ultimately this is your decision.
Good luck!
EDIT:
Take a look at this repository: opsworks-chef-cookbooks. It contains some recipes to deploy a Symfony2 application on OpsWorks. I have been using it for over a year and it works quite well.
Use Chef templates, and use them in a recipe hooked into the OpsWorks deploy lifecycle event.

The right way to create multiple instances for Load Balancer (EC2)

I installed WordPress using EC2. I created a Load Balancer by creating an image (AMI) and then adding both Wordpress1 and Wordpress2 to the Load Balancer. But I'm still getting database errors and have to restart the instances. If I'd like to run 4 instances behind the Load Balancer, are the steps the same? I ask because I saw a "Number of Instances" option when I launched an AMI; the default value is 1. I'm not sure if I should enter 3 or 4 to create multiple instances in one click.
Also, if I make an update on the Wordpress1 instance, will the updates show when the domain loads the Wordpress2 instance?
If you want to launch multiple instances and a database etc., you should consider using AWS CloudFormation. A CloudFormation template is just a big JSON document that contains the configuration of your environment, including the servers, autoscaling, access, registration with the load balancer, etc.
See http://aws.amazon.com/en/cloudformation/ for more details.
There is already an example template for wordpress including a database and autoscaling groups (example wordpress template)
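Assuming you have the AWS CLI configured, launching such a template is a one-liner; the stack name, template URL and parameter names below depend on the template you pick, so treat them as placeholders:
aws cloudformation create-stack \
  --stack-name wordpress-demo \
  --template-url https://s3.amazonaws.com/cloudformation-templates-us-east-1/WordPress_Multi_AZ.template \
  --parameters ParameterKey=KeyName,ParameterValue=my-keypair \
               ParameterKey=DBPassword,ParameterValue=change-me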
However, like datasage mentioned, you will need to make adjustments to WordPress to make it work in a multi-server environment.
The "problem" with multi-server environments is that if you upload a file, or in your case upgrade WordPress, it will only happen on one server, which could be terminated at any point. Furthermore, the upgrade could contain changes to the database structure, and then it gets complicated.
If you are building something in the cloud, you should always keep in mind that every service you build, in your case the frontend webservers and the database, should be allowed to fail without interrupting your service.
Another point is that you should avoid doing things by hand; automation is the key.
An environment where you need to link your servers to a load balancer by hand is not very useful in the cloud, where servers are continuously terminated, rebooted and exchanged.
For your webservers you can use "autoscaling groups" to get this behavior.
If you are using autoscaling groups and a server is terminated or considered unhealthy, a new one will be started automatically and registered with the loadbalancer as soon as it is considered as healthy.
For your database, Amazon offers RDS Multi-AZ deployments, which provide automatic failover.
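A rough sketch of both pieces with the AWS CLI (all names, sizes and availability zones are placeholders, and the launch configuration and load balancer are assumed to exist already):
# web tier: an autoscaling group whose instances register with the load balancer
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name wordpress-asg \
  --launch-configuration-name wordpress-lc \
  --min-size 2 --max-size 4 \
  --load-balancer-names wordpress-elb \
  --availability-zones us-east-1a us-east-1b
# database tier: a Multi-AZ MySQL instance with automatic failover
aws rds create-db-instance \
  --db-instance-identifier wordpress-db \
  --db-instance-class db.t2.small \
  --engine mysql \
  --allocated-storage 20 \
  --multi-az \
  --master-username wpadmin \
  --master-user-password change-me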
Applying upgrades in the cloud can be tricky and there are different ways to do it, for example using a shared NFS mount with the code base, git deployments, or the way you already started: creating a new AMI for every upgrade and then replacing the servers. There are a lot of options and they all have their benefits and drawbacks.
As far as I understand your use case, the cloud is maybe not the right choice at the moment.
Normally, hosting a small business in the cloud is much more expensive than using a single server. You will only save money if you need, say, 20 servers in the evening and only 2 or 3 for the rest of the day. Of course there are a lot more points to consider, but that would be too much here.
Autoscaling in EC2 is horizontal scaling, which means that instances are added as your infrastructure scales up. This is in contrast to vertical scaling, where a single instance is given more resources.
In order to use this effectively, each instance cannot store data that may be needed by other instances. The most common requirement is the database which will need to exist on its own instance outside of the autoscaled instances. You could use RDS for this.
WordPress also stores file uploads, plugins and themes within the wp-content folder inside the WordPress install. By default, if you upload a file, it will be stored on one instance but not on any of the others. You could store everything on an NFS volume shared by one of the instances, or you could try a plugin like this: http://wordpress.org/plugins/wp2cloud-wordpress-to-cloud/
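A minimal sketch of the NFS variant, assuming one instance (or a separate server) exports /exports/wp-uploads and WordPress lives in /var/www/html (both of these are assumptions):
# on each web instance: mount the shared export
sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/wp-uploads
sudo mount nfs-server.internal:/exports/wp-uploads /mnt/wp-uploads
# replace the local uploads folder with a link to the shared mount
sudo mv /var/www/html/wp-content/uploads /var/www/html/wp-content/uploads.bak
sudo ln -s /mnt/wp-uploads /var/www/html/wp-content/uploads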

MSBuild: automate collecting of db migration scripts?

Summary of environment.
Asp.net web application (source stored in svn)
SQL Server database. (Database schema (tables/sprocs) stored in svn)
DB version is synced with the web application assembly version (stored in table 'CurrentVersion').
CI Hudson server that checks out the web app from the repo and runs a custom MSBuild file to publish/package the app.
My MSBuild script updates the assembly version of the web app (Major.Minor.Revision.Build) on each build. The 'Revision' is set to the currently checked-out svn revision and the 'Build' to the Hudson build number (incremented on each automated build).
This way I can match the app to a specific trunk revision and also get other build stats from the Hudson build number.
I'd like to automate the collecting of migration scripts (updated sprocs etc) to add to the zip package.
I guess by comparing the svn revision of the db that has yet to be deployed to against the revision being deployed, I can find which db files have changed in the trunk since the last deployment to that database/environment.
This could easily be achieved by manually calling the svn diff -r REVNO:REVNO command to list the changed .sql files. These files would then have to be added to the package manually.
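For example (the revision numbers and repository URL are made up), the manual command the build would have to reproduce is:
# list the .sql files changed between the deployed revision and the revision being built
svn diff --summarize -r 1500:1580 https://svn.example.com/repo/trunk/db | grep '\.sql$'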
It would be great if this could be automated.
Firstly, I imagine I'll have to write a custom task to check the version of the db that has yet to be deployed to. After that I'm quite unsure.
Does anyone have any suggestion on how this would be achieved through an msbuild task either existing or custom?
Finally, I'll have to auto-generate a script, added to the package, that updates the database version table so it stays in sync with the application.
Integrating SQL changes into an automated build/deploy process is HARD. I know, because I've tried to do it a couple of times with limited success. What you're trying to do is roughly on the right track, but I would argue that it's actually a bit too complicated. In your proposal, you suggest collecting the specific SQL scripts that need to be applied to your DB at build/package time. Instead, you should package all your delta scripts (for the entire history of your database) with your project, and calculate the deltas that actually need to be applied when you deploy -- that way, your deployable package can be deployed to environments with databases of differing versions. There are two implementation pieces you need to achieve this:
1) You need to package your deltas into your deployable package. Note that you should package deltas -- not static files that create the schema in its current state. These delta scripts should be in source control. It's okay to keep the static schema in source control as well, but you will have to keep it in sync with the deltas. You can actually use a tool like Red Gate's SQL Compare or the VS database edition to generate (most) deltas from the static schema. To get the deltas into your deployable package, given that you're using svn, you may want to look into svn:externals as a way to "soft link" the delta scripts into your web project. Your build script can then simply copy them into your deployable package.
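For example (the repository URL and directory names are placeholders), the soft link could be set up once like this, after which the build script only has to copy the folder into the package:
# run inside the web project's working copy
svn propset svn:externals 'https://svn.example.com/repo/db/deltas db/deltas' .
svn commit -m "Pull the DB delta scripts into the web project via svn:externals"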
2) You need a system that can read the list of delta files, compare them to an existing database, determine which deltas need to be applied to that database, and then apply the deltas (and update the bookkeeping information, like the database version). There is an open-source project (sponsored by ThoughtWorks) called dbdeploy that accomplishes this. I've had some success with that tool personally.
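dbdeploy has its own documented usage; the underlying pattern it implements looks roughly like this hand-rolled sketch (not dbdeploy's actual interface), using sqlcmd and a changelog table whose names are made up:
# apply, in filename order, every delta script the changelog table doesn't know about yet
APPLIED=$(sqlcmd -S dbserver -d MyAppDb -h -1 -W -Q "SET NOCOUNT ON; SELECT script_name FROM changelog")
for script in deltas/*.sql; do
  name=$(basename "$script")
  if ! echo "$APPLIED" | grep -qx "$name"; then
    sqlcmd -S dbserver -d MyAppDb -b -i "$script" || exit 1    # stop at the first failing script
    sqlcmd -S dbserver -d MyAppDb -Q "INSERT INTO changelog (script_name) VALUES ('$name')"
  fi
done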
Good luck -- this is a tough nut to crack (correctly).
Have a look at SQL database projects. In VS 2010 they have been enhanced quite a bit and have built in deployment capabilities that can sync your DEV database to other environments.
Here are a few good links about DB projects in vs 2010:
http://msmvps.com/blogs/deborahk/archive/2010/05/02/vs-2010-database-project-building-and-deployment.aspx
http://weblogs.asp.net/gunnarpeipman/archive/2009/07/29/visual-studio-2010-database-projects.aspx
Try SQL Examiner:
http://www.sqlaccessories.com/Howto/Version_Control.aspx
You can automate script collecting with SQL Examiner command-line tool.
The solutions available today that target a .NET/SQL Server stack are:
DBUp (open source)
ReadyRoll (deeper Visual Studio integration, auto-generation of scripts)
The latter product is one that we're actively developing here at Redgate.

ASP/ASP.net: Web-based JET database management tool?

I need to manipulate some tables in a JET database housed on a web-server:
check existing indexes
change table cluster/primary key
see what tables exist
rename tables
add tables
drop tables
browse data
etc
I don't have the option of installing PlaneDisaster or Access (even if I had it) on the local machine.
I've already written a generic web-based query tool. I'd rather not have to get into writing a whole web-based database maintenance GUI. Someone must have done this already, and probably many times over.
A partial answer might be Compare'Em: http://home.gci.net/~mike-noel/CompareEM-LITE/CompareEMscreens/CompareEM-About.htm
The Pro version allows you to create SQL statements to update the Access database file. This will allow you to generate the differences between one version and a newer version.
His website isn't very clear, but as I recall the price for the Pro version was $10.
As you say, you have already written a generic web-based query tool. The problem with JET is that you cannot connect to it as a database server, like you can with a SQL Server, in order to process changes to tables and run other maintenance procedures. JET is not a client/server RDBMS. You need an application on the server to do that for you, as you have already done with your generic web-based tool, or you need to download the database to your machine. That's why you have written some procedures and placed them on the server as ASP pages.
Anyway, you can use JetSQLConsole if you don't want to use PlaneDisaster or Access, but remember that you always need an application on the server to do the job for you.
You can also use Access "on your machine" and connect to a database located at a URL (http://myserver/mydatabase.mdb), but remember that when you do this you are downloading the whole database, and when you save it you are uploading it again.
