How to embed custom code in MarkLogic Data Hub Framework? - xquery

I have created an Entity with input and harmonize flows, and I can see the generated XQuery files.
Now I have a requirement to apply some if/else logic to my raw data. Based on the conditions, some of the data must be pushed to my FINAL database, and the rest must remain in STAGING (it should not go into FINAL).
I am confused about which files (main.xqy, headers.xqy, etc.) I need to change so that running my harmonize flow does the entire thing in one go.

Each of the harmonization flow plugins in the MarkLogic Operational Data Hub Framework is intended to be customized. There are five plugins: collector.xqy, content.xqy, headers.xqy, triples.xqy, and writer.xqy. The simplest harmonization works roughly like this:
1. In the collector plugin, identify which documents in the staging database need to be processed.
2. In the content plugin, transform the documents from step 1 (this is where the if/else logic goes).
3. Using the writer plugin, write the harmonized documents from step 2 to the final database.
Here are summaries of each of the plugins from the ODH Wiki:
Collector
Select IDs of documents in the staging database to be processed.
Content
Perform transformation of input data into a normalized or canonical format to store in the final document or documents. You can add custom transformation code here.
Header
A headers plugin is responsible for extracting header items from the content. You can add metadata or augment the content in the header section here.
Triples
A triples plugin is responsible for extracting semantic triples from the source content. You can control the embedded triples in the envelope document.
Writer
A writer plugin is responsible for writing the final envelope to the database. You can control the output permissions, URI, collections etc. of the harmonized document with this module.
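The three-step flow above can be modelled in a few lines. The real plugins are XQuery modules, so this is only a conceptual Python sketch of the control flow, not DHF code.

Assumptions in the sketch:
- Plain dicts stand in for the STAGING and FINAL databases.
- The function names `collect`, `create_content`, `write` and `run_harmonize` are hypothetical.
- The `status` field is hypothetical.

The content step applies the if/else and returns nothing for documents that must stay behind. The writer only writes documents that produced content. A harmonize flow reads from staging and writes to final, so skipped documents simply remain in STAGING.

```python
# Conceptual model only: dicts stand in for the STAGING and FINAL databases,
# and the function names are hypothetical stand-ins for the DHF plugins.
staging = {
    "/orders/1.json": {"status": "approved", "amount": 120},
    "/orders/2.json": {"status": "draft", "amount": 80},
    "/orders/3.json": {"status": "approved", "amount": 45},
}
final = {}


def collect(db):
    """Collector: return the staging URIs to process (here, all of them)."""
    return sorted(db)


def create_content(doc):
    """Content: the if/else lives here; None means 'keep out of FINAL'."""
    if doc["status"] == "approved":
        return {"status": doc["status"], "amount": doc["amount"]}
    return None


def write(uri, content, target):
    """Writer: only documents that produced content reach FINAL."""
    if content is not None:
        target[uri] = {"envelope": {"instance": content}}


def run_harmonize(src, target):
    """One pass over the collected URIs; staging itself is never modified."""
    for uri in collect(src):
        write(uri, create_content(src[uri]), target)


run_harmonize(staging, final)
print(sorted(final))  # ['/orders/1.json', '/orders/3.json']
```

An alternative is to put the same test in the collector, so that it selects only the IDs that qualify. That avoids processing the skipped documents at all. Check the plugin signatures of your DHF version before porting either variant.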

Related

How to export a complex element in a doc showing just a property and keeping all other information?

I need to export documents from my .NET Core web app to .docx and import them back: users should be able to export a document, modify it offline, and import it back. Currently I am using OpenXml-PowerTools to export.
The problem is that some contents are dynamic: they show the current value of fields in the database. I need to export the document showing a face value (for instance an amount of money). When importing it back, I need to recover the original reference. That reference is an object containing an expression and operations, like "sum_db_1 + sum_db_2", plus information about number formatting and so on. If needed, everything can be treated as a String instead of a complex object.
In the original document the face value is shown (a text or an amount), while the original formula is stored as XML like this:
<reuse-link reuse="reuse-link">
<reuse-param name="figname" value="exp_sum_n"></reuse-param>
<reuse-param name="format" value="MC"></reuse-param>
</reuse-link>
In short, I need to export a complex object to Word so that it shows the face value but also keeps the object's other fields somewhere, so they can be retrieved when the document is imported back. Users are not expected to edit the "complex" values.
How can I achieve this?
I tried to persuade the customers that they should only edit online, but they are not willing to change their internal workflow, which involves exchanging the document between various parties.
Thank you in advance for your help.
I suggest you use one or more Custom XML Parts to store whatever additional information you need. You will probably need to create a naming convention that will allow you to relate elements/attributes in those Parts to the "face values" (however they may be expressed).
A Custom XML Part can store any XML (the content does have to be well-formed XML). As long as you create the Parts, and the necessary relationships, in the .docx or Flat OPC format .xml file, they should survive normal use; the user would have to do something unusual to delete them.
You could also store the information in Word document variables, but Custom XML Parts look like a better fit to your scenario.
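Here is a rough illustration of what creating the Parts and the necessary relationships involves. It is a minimal sketch using only the Python standard library, not OpenXml-PowerTools or the Open XML SDK. It does three things:
- stores a payload as customXml/itemN.xml;
- adds a customXml relationship from the main document part;
- makes sure the package declares a content type for .xml files.

The function name `add_custom_xml_part` and the payload layout are hypothetical. The sketch omits the custom XML properties part (customXml/itemPropsN.xml) that Word itself usually writes, so check that your Word version keeps the Part after a save round trip.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
CUSTOM_XML_REL = ("http://schemas.openxmlformats.org/officeDocument/"
                  "2006/relationships/customXml")
DOC_RELS = "word/_rels/document.xml.rels"
CONTENT_TYPES = "[Content_Types].xml"


def _serialize(root, default_ns):
    # Write the package namespace as the default (unprefixed) namespace.
    ET.register_namespace("", default_ns)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def add_custom_xml_part(docx_bytes, payload):
    """Return a copy of the .docx with `payload` stored as a Custom XML Part."""
    ET.fromstring(payload)  # fail early: the Part must be well-formed XML
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src:
        names = src.namelist()
        n = 1
        while f"customXml/item{n}.xml" in names:  # first free part name
            n += 1
        part_name = f"customXml/item{n}.xml"

        # Relate the new Part to the main document part with a fresh rId.
        rels = ET.fromstring(src.read(DOC_RELS))
        ids = {r.get("Id") for r in rels}
        k = 1
        while f"rId{k}" in ids:
            k += 1
        ET.SubElement(rels, f"{{{REL_NS}}}Relationship", Id=f"rId{k}",
                      Type=CUSTOM_XML_REL, Target=f"../{part_name}")
        rels_bytes = _serialize(rels, REL_NS)

        # Make sure .xml files have a content type.
        types = ET.fromstring(src.read(CONTENT_TYPES))
        if not any(d.get("Extension", "").lower() == "xml"
                   for d in types.iter(f"{{{CT_NS}}}Default")):
            ET.SubElement(types, f"{{{CT_NS}}}Default", Extension="xml",
                          ContentType="application/xml")
        types_bytes = _serialize(types, CT_NS)

        # Copy every existing part, swapping in the two rewritten ones.
        out = io.BytesIO()
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in names:
                if name == DOC_RELS:
                    dst.writestr(name, rels_bytes)
                elif name == CONTENT_TYPES:
                    dst.writestr(name, types_bytes)
                else:
                    dst.writestr(name, src.read(name))
            dst.writestr(part_name, payload.encode("utf-8"))
    return out.getvalue()
```

To relate a face value to its entry in the Part, you need a naming convention of your own. For example, give each stored reuse-link an id and use the same string as the tag of a content control that wraps the face value in the document body.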
(Sorry, I am not yet allowed to post comments or respond to them here).

Document Association in Alfresco

An Alfresco user selects a number of documents in the current folder and wants a parent document to which all of them are attached, so that they can be downloaded as a single document. Should I create a custom web script to do this, or how can the association concept be leveraged here? For example, a product requirements document, a testing document, and a release document need to be attached together into a single document.
It seems to me you are mixing up the document concept (downloading one combined document) and the collection concept (associations).
You could create your own custom content model that lets you logically attach documents to another (master) document by adding an association. You could also define in that model that the attached documents are stored as children of the master, which to some extent hides them in the folders. We implemented this concept for our Alfresco Email and our custom Attachment module.
If you need to download that logical document (which may still be a collection of documents), the easiest way would be to implement a custom action on your master document that zips the master and all connected documents. If you expect to download a single document such as a PDF, you will have to write custom conversion logic that converts each document into pages and composes them into a single PDF. This can be complex, since the documents could be in any format. You may also want or need to preserve metadata, process information, decisions, or structure.
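The packaging part of such a zip action is small. Below is a hedged Python sketch of just that step, independent of Alfresco. In a real Alfresco action, typically written in Java, you would read each node's content through the repository APIs and stream the archive to the client instead. The `bundle` function and the `attachments/` folder layout are hypothetical choices.

```python
import io
import zipfile


def bundle(master, attachments):
    """Zip a master document and its associated documents into one archive.

    `master` and each attachment are (filename, bytes) pairs. Attachments go
    under "attachments/" so the master stays at the top level.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        name, data = master
        zf.writestr(name, data)
        seen = set()
        for name, data in attachments:
            arcname = f"attachments/{name}"
            n = 1
            while arcname in seen:  # keep duplicate filenames apart
                n += 1
                arcname = f"attachments/{n}_{name}"
            seen.add(arcname)
            zf.writestr(arcname, data)
    return buf.getvalue()
```

For example, `bundle(("prd.docx", prd_bytes), [("test.pdf", t), ("release.pdf", r)])` returns the bytes of a single .zip that the action could offer for download.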

How to update metadata using content indexes in WebCenter Content

I need to create a program that can search a document (e.g. a candidate's resume) and fill in metadata from it, such as the candidate's experience, skills, location, etc.
For this I would like to use Oracle's indexing mechanism (Oracle Text search), because it indexes all the data in the document. When a document is indexed, I would like to first update my metadata fields from the indexed data, and then let the Content Server update its indexes. Can anyone explain how the indexer works, and which event I can trap to make modifications that update my metadata?
I need to update the metadata because the requirements are:
Extensive choices of search filter criteria (searching within resumes, not just form keywords):
- Boolean search across multiple parameters
- Search on skills, years of experience, particular company, educational qualification, geography/location, and profile submission date
- Search on who referred the candidate, name, team, BU, etc.
- A results window with an adequate number of results and filters
- Predefined resume filter criteria to assist screening when candidates apply through the job portal
You are looking at this problem from the wrong end. The indexer (Oracle Text search) is a powerful and complex tool embedded in the workings of the database. If I understand correctly, you are suggesting interpreting the results of text indexing and using them as metadata for your content. Oracle Text generates huge amounts of data and literally "chops" documents up word by word. Making meaningful metadata from this would be a huge task.
Instead, you should capture the metadata as close to the source as possible. If you are using Microsoft Office, this could be done with a Word macro (VBA) when the user saves to the repository or file system. I believe you can fully manipulate a document's metadata at save time.
You will of course need to install the Oracle WebCenter Content Desktop Integration suite.
Look into Oracle WebCenter Capture. WebCenter Capture can scan a document and allows metadata to be automatically tagged on the document. WebCenter Capture integrates with WebCenter Content (WCC) and allows you to directly checkin scanned documents to WebCenter Content.
http://www.oracle.com/technetwork/middleware/webcenter/content/index-090596.html

How to modify stored .dcm files

I need to modify patient, study, series, and instance information. I have done this by putting the information into the dataset stored in the database. The information in the database has been modified, but the .dcm files stored in the PACS have not. Is there any way to modify the .dcm files at the same time?
I could be wrong, but I do not believe dcm4chee changes the images when you edit the fields through the web interface. Instead, it modifies the fields in the database.

When an image is retrieved from the dcm4chee PACS, it prepares and sends the modified images. At that point it creates a new image header, updated with the changes made through the web UI (and with any changes required because dcm4chee handled the images). The retriever then gets the modified set of images, while dcm4chee continues to store the original images.

To get the modified images, have dcm4chee send them to another client or PACS, for example via a C-MOVE request, which makes dcm4chee perform C-STORE operations to the destination. That system will receive the modified images.
If you have DCM4CHEE 2.17.x, you should be able to edit some information. Go to the web interface of your installation and look for the Edit [Patient/Study/Series/Instance] Attributes icon (it looks like a document with a pencil). Clicking it should let you enter new values for some of the items.
Most PACS will ignore a storage request if they already hold an instance with the same SOP Instance UID. So another way to change the data is to use a toolkit to modify the fields you want, and then generate new UID values for the images (it is a good idea to do the same for the study and series UIDs). This creates duplicate entries, but with different values.
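The "generate new UIDs" step needs to be consistent: every image that shared a study or series UID before should share the same new one afterwards. Below is a minimal standard-library Python sketch of that remapping. It treats each instance as a plain dict keyed by DICOM keywords, as a stand-in for a real dataset. New UIDs use the "2.25." root followed by a random UUID as an integer, the UUID-derived form described in DICOM PS3.5 Annex B.2. `new_uid` and `relabel` are hypothetical helper names.

```python
import uuid

UID_KEYS = ("StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID")


def new_uid():
    """Return a UUID-derived DICOM UID: "2.25." plus the UUID as an integer."""
    return f"2.25.{uuid.uuid4().int}"


def relabel(instances, changes):
    """Apply `changes` to every instance and replace each UID consistently.

    The same old UID always maps to the same new UID, so images that shared
    a study or series before still share one afterwards.
    """
    mapping = {}
    result = []
    for inst in instances:
        updated = {**inst, **changes}  # the input dicts are left untouched
        for key in UID_KEYS:
            old = inst[key]
            if old not in mapping:
                mapping[old] = new_uid()
            updated[key] = mapping[old]
        result.append(updated)
    return result


images = [
    {"PatientName": "DOE^JON", "StudyInstanceUID": "1.2.3",
     "SeriesInstanceUID": "1.2.3.1", "SOPInstanceUID": "1.2.3.1.1"},
    {"PatientName": "DOE^JON", "StudyInstanceUID": "1.2.3",
     "SeriesInstanceUID": "1.2.3.1", "SOPInstanceUID": "1.2.3.1.2"},
]
fixed = relabel(images, {"PatientName": "DOE^JOHN"})
```

With a toolkit such as pydicom, you would apply the same mapping to datasets read with `pydicom.dcmread` and write them back with `save_as`. Also update the file meta Media Storage SOP Instance UID so that it matches the new SOP Instance UID.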

How can I create a data segmentation feature?

I have a task to extend my web application so that users can segment their own data (i.e. choose their own fields and add their criteria using AND/OR, etc.). So I'm creating something similar to a query builder tool, but lighter. I'm not worrying about the front end for the moment; I am just trying to focus on how to do this in the back end.
My only idea so far is to store the user's "Segment" as an XML document (serialized in the DB) that contains all of their columns and criteria and how they map to the database. When the segment is called, a mapping class deserializes this XML document, maps the fields, builds a SQL query, and returns the query results. The problem I see is that the database setup is likely to change. If it does, I have a serialized XML document that knows nothing about those changes.
Has anyone tackled a similar situation?
I had a similar problem and posted a question on here with what could be a potential solution to your own issue.
Dynamic linq query with multiple/unknown criteria
See how you get on with that.
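Independently of the LINQ approach in the question referenced above, one way to soften the schema-change problem is to have stored segments reference logical field names. Those names are then resolved through a single mapping at query time.

Below is a hedged Python sketch of that idea using sqlite3. The segment is shown as a nested dict; it could equally be deserialized from the stored XML. The `FIELDS` registry, the operator names, and the `customers` table are all hypothetical. Values are passed as query parameters rather than pasted into the SQL.

```python
import sqlite3

# Hypothetical registry: logical field name -> SQL column. Stored segments only
# use the logical names, so a schema change means updating this one mapping.
FIELDS = {"country": "c.country", "age": "c.age", "signup_year": "c.signup_year"}
OPERATORS = {"eq": "=", "ne": "<>", "lt": "<", "le": "<=", "gt": ">", "ge": ">="}


def build_where(node, params):
    """Turn a segment tree into a SQL condition; values are appended to params."""
    if "field" in node:  # leaf: a single criterion
        column = FIELDS[node["field"]]  # KeyError for unknown fields
        op = OPERATORS[node["op"]]
        params.append(node["value"])
        return f"{column} {op} ?"
    joiner = {"and": " AND ", "or": " OR "}[node["combine"]]
    parts = [build_where(child, params) for child in node["criteria"]]
    if not parts:
        raise ValueError("a group needs at least one criterion")
    return "(" + joiner.join(parts) + ")"


def run_segment(conn, segment):
    """Return the ids of customers matching the segment."""
    params = []
    where = build_where(segment, params)
    sql = f"SELECT c.id FROM customers AS c WHERE {where} ORDER BY c.id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers "
             "(id INTEGER, country TEXT, age INTEGER, signup_year INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)",
                 [(1, "UK", 34, 2019), (2, "US", 22, 2021),
                  (3, "UK", 19, 2020), (4, "DE", 45, 2018)])
segment = {"combine": "and", "criteria": [
    {"field": "country", "op": "eq", "value": "UK"},
    {"combine": "or", "criteria": [
        {"field": "age", "op": "gt", "value": 30},
        {"field": "signup_year", "op": "ge", "value": 2020}]}]}
print(run_segment(conn, segment))  # [1, 3]
```

When a column is renamed, only `FIELDS` changes. A stored segment that names a field that no longer exists fails with a `KeyError` instead of silently producing wrong SQL.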
