Document Association in Alfresco

If an Alfresco user selects x documents in the current folder, they want a parent document to which all x documents are attached, so that everything can be downloaded as a single document. Should I create a custom web script for this, or how can the association concept be leveraged here? E.g., let's say a product requirement document, a testing document and a release document need to be attached together into a single document.

It seems to me you are mixing up two concepts: a document (downloading one combined document) and a collection (association).
You could create your own custom content model that lets you logically attach documents to another (master) document by adding an association. You could also define in that model that the attached documents are stored as children of the master, which to some extent hides the attached documents in the folders. We implemented this concept for our Alfresco Email and our custom Attachment module.
If you need to be able to download that logical document (which may still be a collection of documents), the easiest way would be to implement a custom action, shown on your master document, that zips the master and all connected documents. If you expect to download only a single document, like a PDF, you will have to write your own conversion logic that converts the individual documents into pages and composes them into a single PDF. This could be complex, since the documents could be in any format. You may also want or need to save metadata, process information, decisions, structure, etc.
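A minimal sketch of the zip idea, written in Python for brevity rather than as an Alfresco action. It assumes you already have the master document and its associated documents as name/bytes pairs; in a real Alfresco action you would read the master and its association targets through the repository services instead. The function name `zip_master_with_attachments` and the `attachments/` folder layout are illustrative choices, not Alfresco API.

```python
import io
import zipfile


def zip_master_with_attachments(master_name, master_bytes, attachments):
    """Bundle a master document and its associated documents into one ZIP.

    attachments: mapping of file name -> content bytes for the documents
    linked to the master (in Alfresco, the targets of the association).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(master_name, master_bytes)
        for name, data in attachments.items():
            # Attachments go into a subfolder so they cannot clash with the master.
            zf.writestr(f"attachments/{name}", data)
    return buffer.getvalue()


if __name__ == "__main__":
    archive = zip_master_with_attachments(
        "requirements.docx",
        b"PRD content",
        {"testing.docx": b"test plan", "release.docx": b"release notes"},
    )
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        print(zf.namelist())
```

Building the archive in memory keeps the sketch free of temporary files; for very large documents, streaming to a file or to the HTTP response would be the safer choice.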

Related

Creating a Firestore Database from existing files

I have a few music albums - basically just files in folders - that I want to upload to Firebase Storage.
One would usually run a function after a file has been uploaded to create a document containing the metadata about the song, but that's where I'm stuck.
I can get most of the info I need by reading the tracks' ID3 tags, but in a NoSQL database I think I'm supposed to create not only a document for each track but also a document for each album with an array of all its tracks, or at least an array of all track IDs.
But when or how do I create the album document? Another example is the album cover: I want to save its URL in the track document as well as in the corresponding album document, but that means the artwork is the first thing I need to upload, because I can't add a URL that doesn't exist yet.
I feel like I have to get this right before I start because updating everything afterwards is a pain.
Is using upload functions really the way to go here, or is there a tool or another approach I'm missing?
You mentioned Firebase Storage, which is just a wrapper around Cloud Storage; it is an object storage system, not a database. I think you are referring to Firebase's Cloud Firestore.
Since Firestore is, as you mentioned, a NoSQL database, there is no single correct schema; the right structure will definitely depend on your specific use case. However, you can take a look at these docs, which explain how to architect your schema when moving from SQL to NoSQL.
Among other things, the main points are:
In general, you can treat documents as lightweight JSON records
You have complete freedom over what fields you put in each document
After you create the first document in a collection, the collection exists. If you delete all of the documents in a collection, it no longer exists.
You can create subcollections inside of documents
Deleting a document does not delete its subcollections!
Finally, to get an idea of how to structure the information, take a look at this repo, where "NoSQL-Spotify by Luke Halley" explains a NoSQL schema based on Spotify. I think it should fit your needs, or at least give you a starting point.
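One way around the "which document comes first" problem is to choose the document IDs yourself instead of letting Firestore generate them: if the album ID is derived from the artist and album name, a track can reference its album before the album document is written. The sketch below assumes the ID3 tags have already been parsed into the hypothetical `TrackTags` class and the artwork has already been uploaded, so its URL is known. It only builds the album and track documents as plain dicts; the field names (`trackIds`, `albumId`, `coverUrl`) are illustrative, and writing the dicts to Firestore is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class TrackTags:
    """Hypothetical container for the ID3 tags read from one audio file."""
    path: str
    title: str
    artist: str
    album: str
    track_number: int


def album_id(artist: str, album: str) -> str:
    # Deterministic ID: the same artist/album pair always maps to the same
    # document ID, so tracks can point at their album before it is written.
    key = f"{artist}\x00{album}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:20]


def build_documents(tracks, cover_urls):
    """Return (albums, tracks) as dicts keyed by document ID.

    cover_urls maps album ID -> URL of the already-uploaded artwork;
    albums without an entry get coverUrl = None.
    """
    albums, track_docs = {}, {}
    ordered = sorted(tracks, key=lambda tr: (tr.artist, tr.album, tr.track_number))
    for t in ordered:
        aid = album_id(t.artist, t.album)
        # Assumes track numbers are unique within an album (no multi-disc sets).
        tid = f"{aid}-{t.track_number:02d}"
        cover = cover_urls.get(aid)
        album = albums.setdefault(aid, {
            "title": t.album,
            "artist": t.artist,
            "coverUrl": cover,
            "trackIds": [],
        })
        album["trackIds"].append(tid)
        track_docs[tid] = {
            "title": t.title,
            "trackNumber": t.track_number,
            "albumId": aid,
            "coverUrl": cover,  # denormalized copy of the album's cover URL
        }
    return albums, track_docs


if __name__ == "__main__":
    tracks = [
        TrackTags("02.mp3", "Second Song", "Some Artist", "Some Album", 2),
        TrackTags("01.mp3", "First Song", "Some Artist", "Some Album", 1),
    ]
    aid = album_id("Some Artist", "Some Album")
    albums, docs = build_documents(tracks, {aid: "https://example.com/cover.jpg"})
    print(albums[aid]["trackIds"])
```

Because the IDs are deterministic, re-running the import over the same files produces the same document IDs, so a later fix can overwrite documents instead of creating duplicates.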

How Can I Quickly Populate a Firestore DB?

I'm setting up a Firestore database and am playing around with structuring it. Is there a way to populate and change it quickly without having to add/change fields manually every single time?
Two example things I am looking to do are:
1) Populate collections with documents that have predetermined fields. Currently I have to add the fields manually every single time.
2) Edit the fields en masse for all documents within a collection (e.g. change the name of a field, delete a field entirely, add a new field)
The Firebase console doesn't seem to provide these tools, would my best bet be to write a separate app specifically for this purpose?
Since such bulk uploads and bulk edits are not part of the console, you'll have to build something yourself indeed.
A good place to start would be the Cloud Firestore API, which allows adding and updating documents in the database.
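Both tasks in the question boil down to plain data transformations. A minimal sketch, assuming documents are handled as Python dicts: the hypothetical `make_documents` fills documents from a template of predetermined fields, and the hypothetical `migrate` renames, deletes and adds fields on one document. To apply them to a real collection you would read the documents and write them back through the Firestore API (for example with the Firebase Admin SDK's batched writes); that part is not shown here.

```python
import copy

_MISSING = object()  # sentinel so a field whose value is None can still be renamed


def make_documents(template, rows):
    """Populate: each row's values override a shared template of default fields."""
    # deepcopy so documents never share mutable defaults such as lists
    return [{**copy.deepcopy(template), **row} for row in rows]


def migrate(doc, rename=None, delete=(), add=None):
    """Bulk edit for one document: rename, then delete, then add missing fields."""
    out = dict(doc)
    for old, new in (rename or {}).items():
        value = out.pop(old, _MISSING)
        if value is not _MISSING:
            out[new] = value
    for name in delete:
        out.pop(name, None)
    for name, default in (add or {}).items():
        out.setdefault(name, default)  # keep existing values, only fill gaps
    return out


if __name__ == "__main__":
    docs = make_documents({"active": True, "tags": []},
                          [{"name": "a"}, {"name": "b", "active": False}])
    print(docs)
    print([migrate(d, rename={"name": "title"}, delete=["active"], add={"score": 0})
           for d in docs])
```

Keeping the transformation separate from the database calls lets you dry-run a migration on exported data before touching the live collection.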

How to embed custom code in MarkLogic Data Hub Framework?

I have created an entity and have created input and harmonize flows. I can see the generated XQuery files.
Now I have a requirement where I need to apply some if/else logic to my raw data and, based on the conditions, push some of the data to my FINAL database while the rest remains in STAGING (it should not go into FINAL).
I am confused about which files (main.xqy, headers.xqy, etc.) I need to change so that when I run my harmonize flow, the entire thing works in one go.
Each of the harmonization flow plugins in the MarkLogic Operational Data Hub Framework is intended to be customized. There are five plugins: collector.xqy, content.xqy, headers.xqy, triples.xqy, and writer.xqy. The simplest harmonization follows something like this:
1. Identify which documents in the staging database need to be processed, in the collector plugin.
2. Transform the documents from step 1 in the content plugin (add the if/else logic here).
3. Write the harmonized documents from step 2 to the final database using the writer plugin.
Here are summaries of each of the plugins from the ODH Wiki:
Collector
Select IDs of documents in the staging database to be processed.
Content
Perform transformation of input data into a normalized or canonical format to store in the final document or documents. You can add custom transformation code here.
Header
A headers plugin is responsible for extracting header items from the content. You can add metadata or augment the content in the header section here.
Triples
A triples plugin is responsible for extracting semantic triples from the source content. You can control the embedded triples in the envelope document.
Writer
A writer plugin is responsible for writing the final envelope to the database. You can control the output permissions, URI, collections etc. of the harmonized document with this module.
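To make the division of labour concrete, here is a language-neutral sketch written in Python. It models the control flow only and is not DHF code: the collector selects URIs, the content step holds the if/else and returns nothing for records that must stay in STAGING, and the writer only runs for records that produced an instance. The `status == "approved"` rule, the `Database` class and the envelope shape are assumptions for illustration; in the real flow the decision would live in content.xqy and the insert into FINAL in writer.xqy.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Stand-in for a MarkLogic database: URI -> document."""
    name: str
    docs: dict = field(default_factory=dict)


def collector(staging):
    # Collector: pick the URIs of staging documents to process.
    return sorted(staging.docs)


def content(raw):
    # Content: the custom if/else. Returning None means "not for FINAL".
    # Hypothetical rule: only records with status "approved" are harmonized.
    if raw.get("status") == "approved":
        return {"id": raw["id"], "name": raw["name"].strip().title()}
    return None


def writer(uri, instance, final):
    # Writer: persist an envelope (shape simplified) in the FINAL database.
    final.docs[uri] = {"envelope": {"headers": {}, "triples": [], "instance": instance}}


def harmonize(staging, final):
    for uri in collector(staging):
        instance = content(staging.docs[uri])
        if instance is not None:
            writer(uri, instance, final)
        # else: nothing is written; the raw document stays in STAGING only.


if __name__ == "__main__":
    staging = Database("STAGING", {
        "/a.json": {"id": 1, "name": " alice ", "status": "approved"},
        "/b.json": {"id": 2, "name": "bob", "status": "pending"},
    })
    final = Database("FINAL")
    harmonize(staging, final)
    print(sorted(final.docs))
```

Keeping the decision in the content step and the database write in the writer step mirrors the plugin responsibilities summarized above, so each file changes for one reason only.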

How to update metadata using content indexes in WebCenter Content

I need to create a program that can search a document (e.g. a candidate's resume) and fill in metadata from it, such as the user's experience, skills, location, etc.
For this I would like to use the Oracle indexing mechanism (Oracle Text Search), because it indexes all the data in the document. When it indexes the document, I would like to first update my metadata fields from the indexed data and then let Content Server update its indexes. Can anyone help me understand how the indexer works and which event I can trap to make modifications for updating my metadata?
I need to update the metadata because the requirements are:
Extensive choices for search filter criteria (searching within resumes, not just form keywords):
- Boolean search between multiple parameters
- Search on skills, years of experience, particular company, educational qualification, geo/location and profile submission date.
- Search on who referred the candidate, name, team, BU, etc.
- A result window with an adequate number of results and filters.
- Predefined resume filter criteria to assist screening when candidates apply through the job portal.
You are looking at this problem from the wrong end. The indexer (Oracle Text Search) is a powerful and complex tool embedded in the workings of the database. If I am not mistaken, you are suggesting interpreting the results of text indexing and using them as metadata for your content? Oracle Text generates huge amounts of data and literally "chops" documents up word by word. Turning that into meaningful metadata would be a huge task.
Instead, you should capture the metadata as close to the source as possible. If you are using MS Office, this could be done with Word VBA when the user saves to the repository or file system. I believe you can fully manipulate a document's metadata at save time.
You will of course need to install the Oracle WebCenter Content Desktop Integration suite.
Look into Oracle WebCenter Capture. WebCenter Capture can scan a document and allows metadata to be automatically tagged on the document. WebCenter Capture integrates with WebCenter Content (WCC) and allows you to directly checkin scanned documents to WebCenter Content.
http://www.oracle.com/technetwork/middleware/webcenter/content/index-090596.html

How to search the content of scanned documents in Alfresco?

I have a number of scanned content items that are scanned by a scanner, converted into PDF/image files and finally stored in the Alfresco repository.
I can search these scanned items using metadata properties, but can anybody help me with how to search them by the content stored in the scanned documents? E.g. I have scanned a form filled in with user details, and I want to search in Alfresco for that particular user's name.
How is this possible? Is there any way to do this as close as possible to the scanner end?
Use Ephesoft or Kofax as the scanning software. Both products have integrations with Alfresco where they can automatically recognize fields and map those to an Alfresco model.
After this process has been done, you can search on these specific fields.
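As a rough illustration of what "map those to an Alfresco model" means, the sketch below translates field/value pairs recognized by a capture tool into properties of a custom content model. Ephesoft and Kofax configure this mapping in their own tools, so this is not their API; the field names and the `my:` property names are hypothetical. Unmapped fields are returned separately rather than silently dropped.

```python
# Hypothetical mapping from capture-field names to properties of a custom
# Alfresco content model with the (assumed) namespace prefix "my".
FIELD_MAP = {
    "Customer Name": "my:customerName",
    "Customer ID": "my:customerId",
    "Form Date": "my:formDate",
}


def map_fields(extracted, field_map=FIELD_MAP):
    """Translate recognized key/value pairs into content-model properties.

    Returns (properties, unmapped). Empty values are skipped; fields with no
    mapping are kept in `unmapped` so they can be reviewed.
    """
    properties, unmapped = {}, {}
    for key, value in extracted.items():
        value = value.strip()
        if not value:
            continue
        target = field_map.get(key.strip())
        if target is None:
            unmapped[key] = value
        else:
            properties[target] = value
    return properties, unmapped


if __name__ == "__main__":
    props, rest = map_fields({"Customer Name": " Jane Doe ", "Notes": "urgent"})
    print(props, rest)
```

Once values like these are stored as properties of the node, they are indexed like any other metadata, which is what makes the field-level search described above possible.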
I can integrate and scan the content using Kofax, and this integration can automatically capture all details, including the text content of the scanned document. These are filled automatically into a custom content model that has mappings for all these fields and is attached to the scanned content. Once that is done, the content falls under Alfresco indexing, after which users can search for it.
Also, I assume Kofax provides many components, such as Scan, Virtual ReScan (VRS), Recognition (OCR/OMR/ICR), Validation, Verification, Quality Control, PDF Generator, etc., which are available OOTB but need to be configured for use in our implementation. E.g. by configuring the quality module, we can see errors generated while scanning the content. Further, since I am looking at Alfresco + Kofax integration, I assume these features are provided by Kofax OOTB and I just need to map the scanned content to the Alfresco content repository for storing content and metadata according to the defined content model.
There are a number of options you could explore, but they all require that OCR is performed on the scanned content and that the extracted text is stored either in the PDF (if you're using PDFs) or in Alfresco as metadata or full text.
If you store the OCR text in the PDF, Alfresco will then be able to extract the text using its content transformers so long as the content type being used specifies that you will be indexing the full text of the content.
Now there are a number of options available to accomplish what you're after but to keep the solution close to the scanner, you will want to investigate a capture solution such as Ephesoft, which is used for intelligent document capture and processing. Other solutions are available (such as Kofax) or you can implement your own solution using Tesseract.
