metadata of a BFILE - plsql

Is there any way, working with BFILEs, to collect the metadata information?
In my case, I have a table where one of the columns is a BFILE that points to a location on the hard drive where I have files (PDF, DOC, DOCX, TXT, HTML, etc.).
For example, I would need to collect the information from the screenshot below.
Is this possible without manually entering the values into a properties table?
Thanks a lot.

You can create a Java method for this purpose (read more here).
And here is a solution for how to get the metadata in Java.
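For illustration, a minimal sketch of such a Java method using only the JDK; it assumes the directory the BFILE points to is readable by the JVM, and the path below is a placeholder (in practice you would derive it from the BFILE's directory object and file name, e.g. via DBMS_LOB.FILEGETNAME):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

public class FileMetadata {
    public static void main(String[] args) throws IOException {
        // Placeholder path; in a Java stored procedure this would be built
        // from the BFILE's directory alias and file name.
        Path file = Paths.get("/data/docs/report.pdf");
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);

        // The kind of information a file's Properties dialog shows:
        System.out.println("Size (bytes):  " + attrs.size());
        System.out.println("Created:       " + attrs.creationTime());
        System.out.println("Last modified: " + attrs.lastModifiedTime());
        System.out.println("Content type:  " + Files.probeContentType(file));
    }
}

Loaded into the database with loadjava, this could be wrapped as a Java stored procedure and called from PL/SQL.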

Related

How to export a complex element in a doc showing just a property and keeping all other information?

I need to import and export some documents from my web app written in .net-core to docx and vice versa: the users should be able to export, modify offline, and import back. Currently I am using OpenXml-PowerTools to export.
The problem is that there are dynamic contents that show the current value of some fields in the database, so I should be able to export the document showing a face value (for instance, an amount of money), and when importing back I should be able to recall the original reference (which is an object containing an expression and operations, like "sum_db_1 + sum_db_2", plus info about the formatting of numbers and so on). Of course, if needed, everything can be treated as a String instead of a complex object.
In the original document the face value is shown (a text or an amount) while the original formula is stored like in this xml:
<reuse-link reuse="reuse-link">
<reuse-param name="figname" value="exp_sum_n"></reuse-param>
<reuse-param name="format" value="MC"></reuse-param>
</reuse-link>
In short, I need to export a complex object to Word that shows the face value and also keeps the additional fields of the original object somewhere, so they can be retrieved once the document is imported back. The possibility of editing the "complex" values is not foreseen.
How can I achieve this?
I tried to negotiate with the customers, explaining they should only edit online, but they are not flexible enough to change their internal workflow, which foresees an exchange of the document between various parties.
Thank you in advance for your help.
I suggest you use one or more Custom XML Parts to store whatever additional information you need. You will probably need to create a naming convention that will allow you to relate elements/attributes in those Parts to the "face values" (however they may be expressed).
A Custom XML Part can store any XML (the content does have to be valid XML). As long as you create them, and the necessary relationships, in the .docx or Flat OPC format .xml file, the Parts should survive normal use - i.e. the user would have to do something unusual to delete them.
You could also store the information in Word document variables, but Custom XML Parts look like a better fit to your scenario.
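For illustration, here is a minimal sketch of generating the content for such a Part, keyed by a naming convention that ties each entry back to the face value it describes (the element names and the id convention are my own assumptions, not a fixed schema; Java, JDK classes only):

import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CustomXmlPartBuilder {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // One <field> per dynamic value; the id links the entry to the
        // content control or bookmark in the body that shows the face value.
        Element root = doc.createElement("dynamic-fields");
        doc.appendChild(root);

        Element field = doc.createElement("field");
        field.setAttribute("id", "exp_sum_n");                 // naming convention
        field.setAttribute("expression", "sum_db_1 + sum_db_2");
        field.setAttribute("format", "MC");
        root.appendChild(field);

        // Serialize; this string becomes the Custom XML Part's content.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        System.out.println(out);
    }
}

In the .net-core app, the resulting XML can then be attached as a Custom XML Part, e.g. with the OpenXML SDK's MainDocumentPart.AddCustomXmlPart, which also creates the necessary relationship.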

Power Automate Flow: Convert json to readable PowerAutomate-Items

In CRM I have a 'Doc_Config' value.
The 'Doc_Config' value gets passed to a Power Automate flow.
With that data I populate a Microsoft Word document. My problem here is that, instead of the data, the raw text is filled into the Word document.
Is there a way to convert the raw text so Power Automate recognizes the data I actually want?
Problem: You have probably copied the path for your objects and pasted the path value into your 'Doc_Config' value. The problem is the #{...} pattern.
Solution: Please remove the #{...} pattern from any objects that you refer to by their path, as in the example below:
incorrect:
#{items('Apply_to_each_2')?['productname']}
correct:
items('Apply_to_each_2')?['productname']
Background:
In Power Automate cloud flows, you reference objects that the dynamic content tooling offers. However, sometimes you want to reference objects that the dynamic content tooling cannot see or does not provide. In those cases, you can refer to them by specifying their path, as in the example below.
items('Apply_to_each_2')?['productname']
You can observe the path for the objects by hovering over any object that the dynamic content tooling is providing you.
Another option would be to simply parse the data from your array, as it is already JSON.
The idea is very simple:
Append your data to your array variable
Add a Parse JSON action, click "Generate from sample", and paste in the JSON you use.
The parsed properties can then be used in all the following steps.

How to improve xdmp:document-filter() performance in Marklogic?

I am using xdmp:document-filter(doc()) to extract metadata from documents (DOC, DOCX, PDF, etc.). We use it because it works for all kinds of document formats and generates XHTML output for every kind of document. But the major drawback of this function is that it slows the query down. If there are one or two documents in the database, the query works fine, but with more documents (e.g. 10 or 15) it slows down. We want to extract and show the metadata of all the documents in the database.
We are using this query:
for $d in fn:doc()
return xdmp:document-filter(doc(fn:base-uri($d)))
Is there any way to make this query run faster, or is there an alternative to xdmp:document-filter()?
xdmp:document-filter() is typically used at ETL time. If you use Information Studio to load your content, you can add a 'Filter documents' transform. You can choose between storing the extracted metadata as separate XHTML documents or as document properties. That way, it doesn't need to be calculated on the fly for each request.
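If the content is already loaded, a one-off backfill can run the filter once per document and persist the results; here is a minimal sketch using the MarkLogic Java Client API (host, port, and credentials are placeholders, and a real run would batch the work rather than filter everything in a single transaction):

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;

public class BackfillFilteredMetadata {
    public static void main(String[] args) {
        DatabaseClient client = DatabaseClientFactory.newClient(
                "localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("admin", "admin"));

        // Filter each document once and store the extracted <meta> elements as
        // document properties, so queries read the stored properties instead
        // of re-running xdmp:document-filter() on every request.
        String xquery =
                "for $d in fn:doc() " +
                "return xdmp:document-set-properties(fn:base-uri($d), " +
                "        xdmp:document-filter($d)//*:meta)";
        client.newServerEval().xquery(xquery).eval().close();
        client.release();
    }
}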
HTH!

Is it possible (and wise) to add more data to the riak search index document, after the original riak object has been saved (with a precommit hook)?

I am using Riak (and Riak Search) to store and index text files. For every file I create a Riak object (the text content of the file is the object value) and save it to a Riak bucket. That bucket is configured to use the default search analyzer.
I would like to store (and be able to search by) some metadata for these files, like date of submission, size, etc.
So I have asked on IRC, and also given it quite some thought.
Here are some solutions, though they are not as good as I would like:
I could have a second "metadata" object that stores the data in question (maybe in another bucket) and have it indexed, etc. But that is not a very good solution, especially if I want to be able to do combined searches like value:someword AND date:somedate.
I could put the contents of the file inside a JSON object like {"date": somedate, "value": "some big blob of text"}. This could work, but it is going to put too much load on the search indexer, as it will have to deserialize a big JSON object first (and those files are sometimes quite big).
I could write a custom analyzer/indexer that reads my file object and generates/indexes the metadata in question. The only real problem here is that I have a hard time finding documentation on how to do that. It is also probably going to be a bit of an operational PITA, as I will need to push some Erlang code to every Riak node (and remember to do that when I update the cluster, when I add new nodes, etc.). I might be wrong on this; if so, please correct me.
So the best solution for me would be to alter the Riak Search index document and add some arbitrary search fields to it after it gets generated. Is this possible, is it wise, and is there support for this in client libraries, etc.? I can certainly modify the document in question "manually", as a bucket with index documents gets created automatically, but as I said, I just don't know what the right thing to do is.

How to save documents like PDF, DOCX, XLS in SQL Server 2008

I am developing a web application that lets users upload files such as images and documents. These files fall into two categories:
binary files
document files
I want to allow users to search the uploaded documents, especially using full-text search. What data types should I use for these two file types?
You can store the data as binary and use full-text search to interpret the binary data and extract the textual information: .doc, .txt, .xls, .ppt, .htm. The extracted text is indexed and becomes available for querying (make sure you use the CONTAINS keyword). Needless to say, full-text search has to be enabled. I am not sure how adding a full-text index will affect your system, i.e. its size. You'll also need to look at the execution plan to ensure the index gets used at query time.
For more information look at this:
http://technet.microsoft.com/en-us/library/ms142499(SQL.90).aspx
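For example, once the index exists, the query side is a plain CONTAINS predicate; here is a minimal JDBC sketch (the connection string, table, and column names are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FullTextSearchDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=Docs;user=app;password=secret";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(
                     // CONTAINS uses the full-text index built over the
                     // VARBINARY(MAX) column rather than a LIKE scan.
                     "SELECT Id, FileName FROM Documents WHERE CONTAINS(Content, ?)")) {
            ps.setString(1, "invoice");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("Id") + ": " + rs.getString("FileName"));
                }
            }
        }
    }
}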
Pros:
The main advantage of storing data in the database is that it makes the data "self-contained". Since all of the data is contained within the database, backing up the data, moving the data from one database server to another, replicating the database, and so on, is much easier.
You can also enable versioning of files, and it makes things easier for load-balanced web farms.
Cons:
You can read about them here: https://dba.stackexchange.com/questions/3924/sql-server-2005-large-binary-storage. But this is something that you have to live with in order to search through the files efficiently.
The other thing I could suggest is storing keywords in the database and then linking them to the file on the file share.
Here is an article discussing using the FILESTREAM data type with a database: http://blogs.msdn.com/b/manisblog/archive/2007/10/21/filestream-data-type-sql-server-2008.aspx
You first need to convert the PDF to text. There are libraries and tools for this sort of thing (e.g. PowerGREP). Then I'd recommend storing the text of the PDF files in a database. If you need to do full-text searching with logic such as "on the same line", then you'll need to store one record per line of text. If you just want to search for text in a file, then you can change the structure of your SQL schema to match your needs.
For docx files, I would convert them to RTF and search them that way while stored in SQL.
For images, Microsoft has a program called OneNote that does OCR (optical character recognition), so you can search for text within images. It doesn't matter what tool you use, as long as it supports OCR.
Essentially, if you don't have a way to directly read the binary file, then you need to convert it to text with some library, then worry about doing your searching.
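As a concrete example of that conversion step, Apache PDFBox is one such library (my suggestion; PowerGREP and OneNote above are the answer's own examples); a minimal sketch, assuming PDFBox 2.x on the classpath:

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfToText {
    public static void main(String[] args) throws IOException {
        // The file name is a placeholder for an uploaded document.
        try (PDDocument doc = PDDocument.load(new File("upload.pdf"))) {
            // Extract plain text; store it in a text column (or one row per
            // line) so SQL Server's full-text search can index it.
            String text = new PDFTextStripper().getText(doc);
            System.out.println(text);
        }
    }
}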
The full-text index can be created on columns that use any of the following data types: CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, NTEXT, VARBINARY, VARBINARY(MAX), IMAGE, and XML.
In addition, to use full-text search you must create a full-text index on the table against which you want to run full-text queries. For a particular SQL Server table or indexed view, you can create at most one full-text index.
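To make that concrete, here is a sketch of the one-time setup, issued over JDBC to stay in line with the earlier snippet (the catalog, table, column, and key-index names are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class CreateFullTextIndex {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=Docs;user=app;password=secret";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // A full-text catalog groups full-text indexes.
            stmt.executeUpdate("CREATE FULLTEXT CATALOG DocsCatalog");
            // One full-text index per table: KEY INDEX names the table's
            // unique key index, and TYPE COLUMN names the column holding the
            // file extension so the right IFilter (.doc, .pdf, ...) is used.
            stmt.executeUpdate(
                "CREATE FULLTEXT INDEX ON dbo.Documents " +
                "(Content TYPE COLUMN FileExtension LANGUAGE 1033) " +
                "KEY INDEX PK_Documents ON DocsCatalog");
        }
    }
}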
Here are two articles about it:
SQL SERVER - 2008 - Creating Full Text Catalog and Full Text Search
Using Full Text Search in SQL Server 2008
