I need to create a program that can search a document (e.g., a candidate's resume) and fill in metadata from it, such as experience, skills, location, etc.
For this I would like to use the Oracle indexing mechanism (Oracle Text search) because it indexes all the data in the document. When it indexes the document, I would like to first update my metadata fields from the indexed data, and then let the content server update its indexes. Can anyone help me understand how the indexer works and which event I can trap to update my metadata?
I need to update the metadata because the requirements are:
Extensive choices for search filter criteria (searching within resumes, not just form keywords):
- Boolean search between multiple parameters
- Search on skills, years of experience, a particular company, educational qualification, geo/location, and submission date of the profile.
- Search on referrer, name, team, BU, etc.
- Result window with an adequate number of results, plus filters
- Predefined resume filter criteria to assist screening when a candidate applies on the job portal
You are looking at this problem from the wrong end. The indexer (Oracle Text) is a powerful and complex tool embedded inside the workings of the database. If I am not mistaken, what you are suggesting is to interpret the results of text indexing and use them as metadata for your content. Oracle Text generates huge amounts of data and literally "chops" up documents word by word. Making meaningful metadata from this would be a huge task.
Instead, you should look at capturing the metadata as close to the source as possible. If you are using MS Office, this could be done with a Word macro (VBA) that runs when the user saves to the repository or filesystem. I believe you can fully manipulate the metadata in a document at save time.
You will of course need to install the Oracle WebCenter Content Desktop Integration suite.
Look into Oracle WebCenter Capture. It can scan a document and automatically tag metadata on it. WebCenter Capture integrates with WebCenter Content (WCC) and lets you check scanned documents in to WebCenter Content directly.
http://www.oracle.com/technetwork/middleware/webcenter/content/index-090596.html
I'm working with a classmate to build a database of politically related memes using Meteor, where users will be able to tag images with hashtags. The purpose, beyond data collection, is to provide a powerful search engine where one can find memes by keywords that match the hashtags (for example, the keywords "ukraine" and/or "poutine" would find memes related to these topics).
We have to build everything from scratch, and I'm wondering if someone here has an idea of where to start. In other words:
What is the easiest way to host images with Meteor? Is it through MongoDB?
Is it possible to change the metadata of the images on the client side? Do we need to grant this ability using JavaScript only (or is JSON also involved)?
If we can manage the first two parts, is there a way to link the metadata (the hashtags, in this case) with the search engine in order to retrieve the images?
Thank you for your input!
It's not the easiest option, but I would store the images in Google Cloud Storage or Amazon S3.
I would store the image metadata in a MongoDB database. You can update the database from the client side by calling Meteor Methods.
When users search for images by entering keywords, or by a link with keywords, you can query the database and return the related images.
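As a rough illustration of the "query by keywords" step, here is a minimal sketch in Python rather than Meteor/MongoDB code. It keeps a hashtag-to-image index in memory and answers "and" / "or" keyword searches. The class `TagIndex`, its methods, and the file names are hypothetical; in a real app the same lookups would be database queries.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory stand-in for a hashtag index (hypothetical names)."""

    def __init__(self):
        self._by_tag = defaultdict(set)  # normalized tag -> set of image ids

    @staticmethod
    def _normalize(tag):
        # Treat "#Ukraine" and "ukraine" as the same tag.
        return tag.strip().lstrip("#").lower()

    def tag(self, image_id, *tags):
        for t in tags:
            self._by_tag[self._normalize(t)].add(image_id)

    def search(self, keywords, match_all=True):
        """Return image ids matching all keywords (AND) or any keyword (OR)."""
        sets = [self._by_tag.get(self._normalize(k), set()) for k in keywords]
        if not sets:
            return set()
        if match_all:
            return set.intersection(*sets)
        return set.union(*sets)


index = TagIndex()
index.tag("meme1.png", "#ukraine", "#poutine")
index.tag("meme2.png", "#ukraine")
both = index.search(["ukraine", "poutine"])                     # AND search
either = index.search(["ukraine", "poutine"], match_all=False)  # OR search
```

The images themselves stay in external storage; only the id (or URL) and the tags need to be indexed.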
I have a number of content items which are scanned by a scanner, converted into PDF/image, and finally stored in an Alfresco repository.
I can search these scanned items using metadata properties, but can anybody help me with how to search them by the content of the scanned documents? E.g., I have scanned a form with filled-in user details and I want to search Alfresco for that particular user's name.
How is this possible? Is there any way to do it as close as possible to the scanner end?
Use Ephesoft or Kofax for the scanning software. Both products have integrations with Alfresco where they can automatically recognize fields and map those to an Alfresco model.
After this process has been done, you can search on these specific fields.
I can integrate and scan the content using Kofax, and this integration can automatically capture all details, including the text of the scanned content. These are filled into a custom content model that has mappings for all these fields and is attached to the scanned content. Once that is done, the content comes under the purview of Alfresco indexing, after which users can search for it.
I also assume Kofax provides many components out of the box (OOTB), such as Scan, Virtual ReScan (VRS), Recognition (OCR / OMR / ICR), Validation, Verification, Quality Control, PDF Generator, etc., but that we need to configure them for our implementation. E.g., by configuring the quality module, we can see errors generated while scanning the content. Since I am looking for an Alfresco + Kofax integration, I assume Kofax provides these features OOTB and I just need to map the scanned content to the Alfresco content repository, storing content and metadata as per the defined content model.
There are a number of options that you could explore but they all require that OCR is performed on the scanned content and the text that is extracted from the OCR needs to be stored in the PDF (if you're using PDFs) or it needs to be stored in Alfresco as either metadata or full text.
If you store the OCR text in the PDF, Alfresco will then be able to extract the text using its content transformers so long as the content type being used specifies that you will be indexing the full text of the content.
Now there are a number of options available to accomplish what you're after but to keep the solution close to the scanner, you will want to investigate a capture solution such as Ephesoft, which is used for intelligent document capture and processing. Other solutions are available (such as Kofax) or you can implement your own solution using Tesseract.
I have a task to extend my web application to give users the ability to segment their own data (i.e., choose their own fields and add their criteria using And/Or, etc.), so I'm creating something similar to a query builder tool, but lighter. I'm not worrying about the front end for the moment; I am just trying to focus on how to do this in the back end.
My only thought so far is to store each "Segment" as an XML document (serialized in the DB) which contains all of its columns and criteria and how they map to the database. When the segment is called, a mapping class deserializes this XML document, maps the fields, builds a SQL query, and returns the query results. The problem I see with this is that if the database setup changes (which is likely), I have a serialized XML document which knows nothing about these changes.
Has anyone tackled a similar situation?
I had a similar problem and posted a question on here with what could be a potential solution to your own issue.
Dynamic linq query with multiple/unknown criteria
See how you get on with that.
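For the back-end part of the question, one possible shape is sketched below in Python. This is not the linked LINQ approach. It assumes the stored segment refers to logical field names only, and that a single mapping table (`FIELD_TO_COLUMN`, hypothetical, as are the table and column names) translates them to physical columns, so a schema change means updating the mapping rather than every stored segment. A nested dict stands in for the XML document to keep the example short.

```python
import sqlite3

# Hypothetical mapping from logical field names (stored in the segment)
# to the current physical columns. Only this dict changes with the schema.
FIELD_TO_COLUMN = {"country": "country_code", "age": "age_years"}
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def build_where(node, params):
    """Recursively turn a criteria tree into a parameterized SQL fragment."""
    if "field" in node:  # leaf: a single comparison
        column = FIELD_TO_COLUMN[node["field"]]  # KeyError on unknown field
        if node["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {node['op']}")
        params.append(node["value"])  # value is bound, never interpolated
        return f"{column} {node['op']} ?"
    joiner = {"and": " AND ", "or": " OR "}[node["combine"]]
    parts = [build_where(child, params) for child in node["criteria"]]
    if not parts:
        raise ValueError("a group needs at least one criterion")
    return "(" + joiner.join(parts) + ")"


# country = 'DE' AND (age < 25 OR age >= 65)
segment = {"combine": "and", "criteria": [
    {"field": "country", "op": "=", "value": "DE"},
    {"combine": "or", "criteria": [
        {"field": "age", "op": "<", "value": 25},
        {"field": "age", "op": ">=", "value": 65},
    ]},
]}

params = []
where = build_where(segment, params)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country_code TEXT, age_years INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [("a", "DE", 20), ("b", "DE", 40), ("c", "FR", 70)])
rows = conn.execute("SELECT name FROM customers WHERE " + where, params).fetchall()
```

Column names and operators come only from the whitelists, and user-supplied values are passed as bound parameters, which keeps user-defined segments from injecting SQL.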
Hi friends, at present I am working as a developer.
I want code for the following scenario.
My scenario: a Word document must contain a checkbox, and this Word document should be read into an ASP.NET page. When the user clicks the checkbox, the selected value should be stored in the database.
Can anyone help me?
From what I understand, what you're trying to do is read a column inside a Word document and store the values in a database.
First approach - SharePoint
This seems to be a perfect fit for SharePoint. If that is an option, you can do the following:
set up SharePoint
set up a document library
set up a document template
The user will have a form to fill values into, which is also available in Word document format.
This technique may be overkill depending on what you ultimately want to do.
Second approach - Office SDK
The Microsoft Office SDK comes with the CheckBox object. You can try opening the document programmatically and interrogating the CheckBox object.
I would not advise running this code on the server, as Microsoft Office isn't meant to be run as a server, whereas SharePoint is.
If you really want to do this, you may need to write a queueing mechanism so that the Office SDK calls are batched and run one at a time, in sequence.
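The queueing idea can be sketched generically. The Python example below is an illustration of the pattern, not Office SDK code: a single worker thread takes jobs off a queue, so the handler never runs concurrently. `SerialWorker` and the handler passed to it are hypothetical names; the handler stands in for whatever call opens the document and reads the checkbox.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker to shut down


class SerialWorker:
    """Runs submitted jobs one at a time, in submission order."""

    def __init__(self, handler):
        self._handler = handler
        self._jobs = queue.Queue()
        self.results = []  # (item, outcome) pairs, in processing order
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, item):
        self._jobs.put(item)

    def _run(self):
        while True:
            item = self._jobs.get()
            if item is _STOP:
                return
            try:
                outcome = self._handler(item)
            except Exception as exc:  # keep the worker alive if one job fails
                outcome = exc
            self.results.append((item, outcome))

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._jobs.put(_STOP)
        self._thread.join()


# Placeholder handler; a real one would open the document and read the checkbox.
worker = SerialWorker(lambda path: path.upper())
for path in ["a.docx", "b.docx", "c.docx"]:
    worker.submit(path)
worker.close()
```

Web requests would only call `submit` and return; the results can be picked up later, for example from a database row that the handler writes.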
I have a search functionality on my site that is accessible from every page. Typical top of the masterpage textbox and button deal. I'm looking for a better way to accomplish my caching of the most common search strings and their result using System.Web.Caching.Cache.
I was thinking of concatenating the search string with some applicable user group permission data and using that as the cache key with the value being the List.
example cache key: Microsoft Visual Studio 2008 Service Pack 1--usergroup2,3,6,17,89
But that got me thinking about the maximum length of a cache key. Is there a maximum length that the key can be? Storing things this way, I can end up with some pretty lengthy key names, and it really doesn't do anything about keeping the most common searches as well as the most recently used.
Is there already a commonly used method to accomplish what I'm trying to do? Does my question even make sense? Thanks for any help.
The maximum length of the key is simply the maximum length of a string.
According to the documentation here: http://msdn.microsoft.com/en-us/library/system.web.caching.cache.add.aspx, the key is a string and the value is of type Object.
I would suggest storing a custom object under a unique key, so that when you query the Cache you get back your custom object with more complex information attached to it.
EDIT 11072009_1154
After carefully reading your requirement again, I noticed that your objective is to cache the frequently searched strings.
In your example, the frequently searched string might be "Microsoft Visual Studio 2008 Service Pack 1". In my opinion this should be the key, while the value is a custom object with additional properties to hold your other necessary attributes.
In summary, this might be the example :
Key : "Microsoft Visual Studio 2008 Service Pack 1"
Value : CustomObjectInstance where CustomObjectInstance.UserLanguage = "English", CustomObjectInstance.UserLocalization = "USA", CustomObjectInstance.UserKeyboardLayout = "UK", etc.
AFAIK, the Cache implements a dictionary-type data structure, so the key must be unique enough. So if your key is "Microsoft Visual Studio 2008 Service Pack 1--usergroup2,3,6,17,89", how can you uniquely identify this particular key from your ASP.NET web app? In the search textbox, I will not type usergroup2,3,6,17,89.
Think also of the StackOverflow site search: users will enter a common search string, e.g. "learn jquery material", so in my opinion your cache should have an entry with the key "learn jquery material".
EDIT 11072009_1250
Thanks for the additional information. I can also suggest another solution using multiple layers. What I mean is, rather than cramming all the information into one layer of cache, why not add another layer?
That means your cache will have a key (string) and a value which points to another dictionary.
Another possible solution is to push this feature down to SQL Server Full-Text Search. I am not very familiar with it, but it could be good if you can leverage this functionality in your existing infrastructure.
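To make the two-layer idea concrete, here is a minimal sketch in Python rather than ASP.NET. The outer dictionary is keyed by the search string alone, and each value is an inner dictionary keyed by the set of user groups. `LayeredSearchCache` and its method names are hypothetical, and plain dictionaries stand in for the real Cache object.

```python
class LayeredSearchCache:
    """Outer key: normalized search string. Inner key: user-group set."""

    def __init__(self):
        self._by_query = {}  # query -> {group key -> cached results}

    @staticmethod
    def _query_key(query):
        return query.strip().lower()

    @staticmethod
    def _group_key(groups):
        # Order and duplicates do not matter: (3, 2, 2) and (2, 3) are equal.
        return tuple(sorted(set(groups)))

    def put(self, query, groups, results):
        inner = self._by_query.setdefault(self._query_key(query), {})
        inner[self._group_key(groups)] = results

    def get(self, query, groups):
        """Return cached results, or None on a miss."""
        inner = self._by_query.get(self._query_key(query), {})
        return inner.get(self._group_key(groups))


cache = LayeredSearchCache()
cache.put("Microsoft Visual Studio 2008 Service Pack 1", [2, 3, 6, 17, 89], ["doc1", "doc2"])
hit = cache.get("microsoft visual studio 2008 service pack 1", [89, 17, 6, 3, 2])
miss = cache.get("Microsoft Visual Studio 2008 Service Pack 1", [2, 3])
```

With this layout the outer key stays short and human-readable, and the permission data never has to be concatenated into it.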
Caching search results is a fairly common technique. The ASP.NET Cache stores all cached data in memory for faster access, so it all depends on how much memory is available to you for caching. If you want to deviate from the ASP.NET Cache approach, another method is to store the data retrieved from searches in a database table.
Searching a table with billions of records is really expensive, so you can store the results for the most searched keywords in a separate table for faster access. You can also create a job to refresh the table at regular intervals, based on some fairly simple algorithm. With a Least Recently Used algorithm, for example, you remove the search results that have not been used recently.
EDIT: And, as for your question for the length of the cache key; it is a string, and the length of a string is dependent on the memory available to store it.
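The Least Recently Used policy mentioned above can be sketched in a few lines. This Python example is an in-memory illustration of the eviction rule only, not the database-table variant. The name `LRUSearchCache` and the capacity of 2 are arbitrary choices for the example.

```python
from collections import OrderedDict


class LRUSearchCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def __contains__(self, key):
        return key in self._items


lru = LRUSearchCache(capacity=2)
lru.put("learn jquery material", ["r1"])
lru.put("visual studio 2008 sp1", ["r2"])
lru.get("learn jquery material")   # refreshes this entry
lru.put("asp.net caching", ["r3"])  # evicts "visual studio 2008 sp1"
```

A scheduled refresh job over a database table could apply the same rule by storing a last-used timestamp per row and deleting the oldest rows.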