How to migrate data to Alfresco from FTP servers as data sources?

The situation: I'm going to implement a digital repository using Alfresco Community 5.1 to manage our university's digital content, which is currently stored on different FTP servers (software installers, books, theses). I intend to use Alfresco as the backend and Orchard CMS as our intranet frontend (a non-functional requirement), and to connect the two with CMIS. The general idea is to take a social networking approach in which every user can modify metadata and add tags to improve search, which is the general objective of my work: enabling search and download of the digital content on our intranet, because right now it takes a long time to find anything, since it is stored on an FTP server without good cataloguing.
I already successfully created a custom data model, but when I decided to migrate the content from these FTP servers, I didn't find any documentation about it. I read about the Bulk Import tool, but it turns out I need the data locally on the same computer that runs Alfresco, and as I said, the data sources are different FTP servers.
So how can I migrate data from different FTP servers, as data sources, into Alfresco? Is it necessary to physically import the files into Alfresco, or can I work with an index pointing to the FTP files (keep the files on the FTP servers and have in Alfresco only a reference to each object; my only functional requirements are search and download)?
Please, I need your guidance, because here in Cuba we don't have experience working with Alfresco and it is very difficult to get access to the internet. So if you can point out a way to solve this, or make any recommendation, I will be forever grateful. Thank you, and again, sorry to disturb you.

If this were a one-time thing, you could use an FTP client of some sort to simply drag and drop the files from your FTP server into Alfresco's FTP server. This would copy the files only and would not set any custom metadata.
Alternatively, you could write some Java to do this. Java can read from FTP servers and can write to Alfresco via CMIS. This would give you the opportunity to set some properties on the objects written into Alfresco beyond just the file name, creation date, and modification date.
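For illustration, here is a minimal sketch of what that Java could look like, assuming Apache Commons Net for the FTP side and Apache Chemistry OpenCMIS for the CMIS side; the host names, credentials, folder paths, and MIME type are placeholders you would replace with your own:

import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.chemistry.opencmis.client.api.Folder;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.api.SessionFactory;
import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
import org.apache.chemistry.opencmis.commons.PropertyIds;
import org.apache.chemistry.opencmis.commons.SessionParameter;
import org.apache.chemistry.opencmis.commons.data.ContentStream;
import org.apache.chemistry.opencmis.commons.enums.BindingType;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

public class FtpToAlfresco {

    public static void main(String[] args) throws Exception {
        // 1. Open a CMIS session against Alfresco (placeholder URL and credentials).
        Map<String, String> params = new HashMap<>();
        params.put(SessionParameter.USER, "admin");
        params.put(SessionParameter.PASSWORD, "admin");
        params.put(SessionParameter.ATOMPUB_URL,
                "http://alfresco-host:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom");
        params.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());

        SessionFactory factory = SessionFactoryImpl.newInstance();
        Session session = factory.getRepositories(params).get(0).createSession();

        // Target folder in Alfresco, created beforehand (placeholder path).
        Folder target = (Folder) session.getObjectByPath("/Sites/library/documentLibrary/Installers");

        // 2. Read files from one of the FTP servers (placeholder host and credentials).
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.university.example");
        ftp.login("ftpuser", "ftppassword");
        ftp.enterLocalPassiveMode();
        ftp.setFileType(FTP.BINARY_FILE_TYPE);

        for (FTPFile remote : ftp.listFiles("/installers")) {
            if (!remote.isFile()) {
                continue;
            }
            try (InputStream in = ftp.retrieveFileStream("/installers/" + remote.getName())) {
                // 3. Write each file into Alfresco via CMIS, setting properties as needed.
                Map<String, Object> props = new HashMap<>();
                props.put(PropertyIds.OBJECT_TYPE_ID, "cmis:document"); // or your custom type
                props.put(PropertyIds.NAME, remote.getName());

                ContentStream content = session.getObjectFactory().createContentStream(
                        remote.getName(), remote.getSize(),
                        "application/octet-stream", in);

                target.createDocument(props, content, null);
            }
            ftp.completePendingCommand(); // finish the FTP transfer after the stream is closed
        }
        ftp.logout();
        ftp.disconnect();
    }
}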
However, if you are going to do this regularly, you might want to look at an integration tool. For example, you could use Apache Camel to watch the FTP servers, and when there is a change, it could fetch the file and write it to Alfresco via CMIS.
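As a rough sketch of such a route, assuming the camel-ftp and camel-cmis components (the endpoint URIs, credentials, and header names are placeholders, and the exact options vary by Camel version):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class FtpToAlfrescoRoute {

    public static void main(String[] args) throws Exception {
        DefaultCamelContext context = new DefaultCamelContext();

        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Poll the FTP server for new or changed files (placeholder host/credentials),
                // then hand each file to the CMIS endpoint, which creates a document in Alfresco.
                from("ftp://ftpuser@ftp.university.example/installers"
                        + "?password=ftppassword&passiveMode=true&binary=true&delay=60000")
                    // Header names below are an assumption based on recent camel-cmis versions;
                    // check the component documentation for the version you actually use.
                    .setHeader("cmis:name", simple("${file:name}"))
                    .setHeader("CamelCMISFolderPath", constant("/Sites/library/documentLibrary/Installers"))
                    .to("cmis://http://alfresco-host:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom"
                        + "?username=admin&password=admin");
            }
        });

        context.start();
        Thread.sleep(Long.MAX_VALUE); // keep the route running; use a proper lifecycle in real code
    }
}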
This will likely take some coding to make it work exactly right, but hopefully this gives you some options to consider.

Related

Sharing large files efficiently on web link

I would like to provide a link on my web site to download a large file. This should be done with scale in mind. What is the most efficient way to do this today?
Of course, I can do it the classic way:
<a href="//download.myserver.com/largefile.zip" title="Download via HTTP">Download</a>
The problem with this approach is that I don't want download traffic to my server to explode. So I would rather redirect to external hosting for this large file. What is the best way to host this file then?
If you want to avoid download traffic to your server, then I personally suggest using Azure Blob Storage. There is plenty of documentation, along with client libraries for .NET. It removes download traffic, and the security concerns of hosting files, from your site and moves them to the Azure cloud, which is very secure to say the least.
If you want the files to be publicly available to anyone, then create a public container, get the URL of the file you want, and place it in the anchor tag; otherwise you may need to familiarise yourself with blob leasing (plenty of documentation for that too). Though, like most things, it is not free. The silver lining is you only pay for what you use.
You can get started here.
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-dotnet
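That quickstart targets .NET, but the same flow is available from the other Azure Storage SDKs. As a rough sketch, here is roughly how the upload and public-URL step could look with the Azure Storage Blob SDK for Java; the connection string, container name, and file paths are placeholders, and the storage account must allow public blob access:

import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.models.PublicAccessType;

public class LargeFileUpload {
    public static void main(String[] args) {
        // Placeholder: read the storage account connection string from the environment.
        String connectionString = System.getenv("AZURE_STORAGE_CONNECTION_STRING");

        BlobServiceClient service = new BlobServiceClientBuilder()
                .connectionString(connectionString)
                .buildClient();

        // Create (or reuse) a container whose blobs are publicly readable.
        BlobContainerClient container = service.getBlobContainerClient("downloads");
        if (!container.exists()) {
            container.create();
        }
        container.setAccessPolicy(PublicAccessType.BLOB, null);

        // Upload the large file; the second argument overwrites any existing blob.
        BlobClient blob = container.getBlobClient("largefile.zip");
        blob.uploadFromFile("/path/to/largefile.zip", true);

        // This URL can go straight into the <a href="..."> tag on the site.
        System.out.println("Public download URL: " + blob.getBlobUrl());
    }
}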
Disclaimer: I do not work for Microsoft, nor do I benefit from this. This is just a personal opinion based on previous experiences and projects.

Build an Offline website - Burn it on a CD

I need to build a website that can be downloaded to a CD.
I'd like to use a CMS (WordPress, Kentico, MojoPortal) to set up my site, and then download it to a CD.
There are many programs that can download a website to a local drive, but how to make the search work is beyond my understanding.
Any ideas?
The project is supposed to be an index of Local community services, for communities without proper internet connection.
If you need to make something that can be viewed from a CD, the best approach is to use only HTML.
WordPress, for example, needs Apache and MySQL to run. And although somebody can "install" the website on his own computer if you supply the content via a CD, most of your users will not be knowledgeable enough to do this task.
Assuming you are just after the content of the site, in general you should be able to find a tool to "crawl" or mirror most sites and create an offline version that can be burned to a CD (for example, using wget).
This will not produce offline versions of application functionality like search or login, so you would need to design your site with those limitations in mind.
For example:
- Make sure your site can be fully navigated without JavaScript (most "crawl" tools will discover pages by following links in the HTML and will have limited or no JavaScript support).
- Include some pages which are directory listings of resources on the site (rather than relying on a search).
- Possibly implement your search using a client-side technology like JavaScript that would work offline as well.
- Use relative HTML links for images/JavaScript, and between pages. The tool you use to create the offline version of the site should ideally be able to rewrite/correct internal links for the site, but it would be best to minimise any need to do so.
Another approach you could consider is distributing the content in a client-side wiki format, such as TiddlyWiki.
Blurb from the TiddlyWiki site:
TiddlyWiki allows anyone to create personal SelfContained hypertext
documents that can be published to a WebServer, sent by email,
stored in a DropBox or kept on a USB thumb drive to make a WikiOnAStick.
I think you need to clarify what you would like to be downloaded to the CD. As Stennie said, you could download the content and anything else you would need to create the site, either with a "crawler" or with TiddlyWiki, but otherwise I think what you're wanting to develop is actually an application, in which case you would need to do more development than what standard CMS packages would provide. I'm reluctant to, but would suggest you look into something like the Salesforce platform. It's a cloud-based platform that may facilitate what you're really working towards.
You could create the working CMS on a small web/db server image using VirtualBox and put the virtual disk in a downloadable place. The end user would need the VirtualBox client (free!) and the downloaded virtual disk, but you could configure it to run with minimal effort for the creation, deployment and running phases.

How to scale a document storage system?

I maintain a web application (ASP.NET/IIS7/SQL2K8/Win2K8) that needs to access documents, actually hundreds of thousands of documents, and growing. Currently, they are all on a Windows 2K8 Server fileshare, being accessed by UNC path (SMB). The files are in a single flat directory and I'm trying to plan how to best improve this solution. I don't want to use the SQL Filestream attribute as it would be significant effort to migrate it all into that, and would really lock in to SQL Server. I also need to find a way to replicate the data for disaster recovery, so perhaps a solution can help with that too.
Options could be:
- Segment files into multiple directories? The application would add metadata for which directory each file is in (or segment by other means); a hash-based sketch follows at the end of this question.
- Segment files onto separate (virtualized) servers? Backups become more complicated, and the application would add metadata for which server each file is on.
- NAS storage
- SAN storage
- Put a service (WCF) in front of the files and have the app talk to the service, with the bonus of being reusable across many applications.
Assuming I'm going to store on the filesystem and not in the database (I've read those discussions here), which would be the more scalable solution?
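To make the first option concrete, here is a rough, product-agnostic sketch of hash-based directory segmentation in Java; the class, method names, and two-level layout are illustrative only, and the returned relative path is what would be stored as metadata in the database:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class DocumentStore {

    private final Path root;

    public DocumentStore(Path root) {
        this.root = root;
    }

    // Computes a sharded relative path such as "ab/cd/invoice-123.pdf" from a hash of
    // the file name, so files spread evenly over 256 x 256 directories instead of one flat folder.
    public Path relativePathFor(String fileName) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(fileName.getBytes(StandardCharsets.UTF_8));
        String shard1 = String.format("%02x", digest[0] & 0xff);
        String shard2 = String.format("%02x", digest[1] & 0xff);
        return Paths.get(shard1, shard2, fileName);
    }

    // Writes the bytes under the sharded path and returns the relative path to persist in the DB.
    public Path store(String fileName, byte[] content) throws Exception {
        Path relative = relativePathFor(fileName);
        Path target = root.resolve(relative);
        Files.createDirectories(target.getParent());
        Files.write(target, content);
        return relative; // keep this alongside the document's metadata row
    }
}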
You've got a couple issues:
- managing a large volume of (static?) files
- preparing for backups and disaster recovery of said files
I'll throw this out there, even though I'm not a fan of the answer, but you might poke around with the free SharePoint 2010 Foundation that's included with server 2k8. If you're having issues with finding the documents you need (either by search, taxonomy via tagging or other metadata) as well as document expiration and you don't want to buy a full blown document management system, this might be a solution. Of course it introduces new problems...
If your only desire is to have these files available to serve out on the web, then the file store like you're using now really is the simplest solution. For DR/redundancy purposes, I'd look at a) running them on a RAID/SAN of some sort and b) auto-syncing them with the cloud (either Azure or Amazon). For b) you can get apps that make the cloud appear as a mapped drive and then use rsync-type software to keep the cloud copy up to date.
If you want to build something new and cool, you might think about moving the entire file archive into the cloud and just keeping a table in a DB to manage the file name, old location, new cloud location, plus redirector code that can provide access tokens to requestors.
3 different approaches... your choice.

Large Video Uploads via a website

Some of the problems that can happen are timeouts, disconnections, and not being able to resume an upload, so it has to start again from the beginning. Assuming these files are up to around 5 GB in size, what is the best solution for dealing with this problem?
I'm using a Drupal 6 install for the website.
Some of my constraints due to the server setup I have to deal with:
Shared hosting, with a maximum of 200 connections at a time (unlimited disk space)
Unable to create users through an API (so can't automatically generate ftp accounts)
I do have the ability to run cron-type scripts via a Drupal module.
My initial thought was to create FTP users based on Drupal accounts and require them to download an FTP client for their OS of choice. But the lack of an API to auto-create FTP accounts, and the inability to do it from the command line, kind of hinder that solution. If there's a workaround someone can think of, let me know!
Thanks
Usually, shared hosting does not support large file uploads through the browser. A solution may be to use a separate file host for your large uploads. A nice and easy solution to integrate is Amazon S3 and its browser-based upload with a form POST.
It can be integrated in a custom module that provides an upload form protected by Drupal's access control. If you require the files to be hosted on the Drupal server, you can use cron (either Drupal's or an external one) to move the files from S3 to your own hosting.
You're kind of limited in what you can do on a shared host. Your best option is probably to install SWFUpload and hope there aren't a lot of mid-upload errors.
Better options that you probably can't use on a shared host include the upload progress PHP extension (which Drupal automatically uses when it's installed) and, as you said, associating FTP accounts with Drupal accounts.

Developing an online music store

We need to develop an application to sell music online. Needless to say, everything will be done legally, and as part of that we have to plan an interface to pay the artists. However, we are confronted with a question: what is the best way to store the music on the server? Should we save it to the server's disk from an HTTP file upload? Should we save it via FTP, or would it be wiser to store it in the database? It needs to be as safe as possible, so HTTPS is probably required. What do you think is the best way? Any other ideas? With HTTP, uploading songs (for administration) is slow and tedious, but each upload is easily linked to the song record the admin creates in the web application; with FTP, the admin uploads songs to the server and then has to browse a directory listing in the admin section to link the correct uploaded file to the song information in the database.
I know this may not be entirely clear because I'm French; tell me which parts you don't understand and I will try to explain.
I've used Krystalware's SlickUpload ASP.NET control in the past to take care of the uploading part for you (you can use the built-in control if you want, but this one has a lot of nifty AJAX-style features done for you and is quite cheap).
Edit: I would not advocate storing the music file itself in the database. Much better, in my humble opinion, to store only the location of the file in the database. If you use one of the cloud services listed below, the location might simply be an HTTP link.
I'd also seriously consider using a cloud storage service for storing the music files. Something like Amazon S3 or Rackspace Cloud Files. CloudFiles is good because, if you wish, you can also enable CDN delivery (Content Delivery Network) which means your users can access the uploaded music tracks much faster than if served off your local web server, for instance.
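To make the "store the location, not the bytes" idea concrete, here is a hedged sketch using the AWS SDK for Java v2; the bucket name, key, and URL format are made up for illustration, and only the returned location string would go into the songs table:

import java.nio.file.Path;
import java.nio.file.Paths;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class TrackUploader {

    private final S3Client s3 = S3Client.builder().region(Region.US_EAST_1).build();

    // Uploads the audio file to S3 and returns the URL to store in the songs table.
    // The bucket name and URL format here are placeholders.
    public String uploadTrack(String trackKey, Path localFile) {
        PutObjectRequest request = PutObjectRequest.builder()
                .bucket("my-music-store")   // hypothetical bucket
                .key(trackKey)              // e.g. "albums/42/track-07.mp3"
                .contentType("audio/mpeg")
                .build();

        s3.putObject(request, RequestBody.fromFile(localFile));

        // Only this location string goes into the database, never the audio bytes.
        return "https://my-music-store.s3.amazonaws.com/" + trackKey;
    }

    public static void main(String[] args) {
        String url = new TrackUploader()
                .uploadTrack("albums/42/track-07.mp3", Paths.get("/tmp/track-07.mp3"));
        System.out.println("Store this in the songs table: " + url);
    }
}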
Hope this helps,
Richard.
