Does the Cloud Vision API support TIFF & PDF label detection yet?


The documentation here https://cloud.google.com/vision/docs/supported-files states PDF & TIFF are supported.
Using the Node.js client library, I get an error response (code: 3, message: 'Bad image data.') for TIFFs and PDFs, but all other file types are accepted and labeled.
I'm using local files, not Cloud Storage, e.g.
const [result] = await client.labelDetection('./test.pdf');
Neither of the "Try it!" demo pages will allow upload of TIFF or PDF either.

While the Vision API supports PDF and TIFF formats, there are some considerations you should be mindful of. Quoting the official documentation linked:
Currently DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are the only feature types available for offline (asynchronous) large batch file (PDF/TIFF) annotation.
Alternatively,
Online small batch annotation is available for all Vision features.
Bear in mind that small batch annotation is limited to 5 PDF/TIFF pages.
The documentation linked above includes examples of how to implement this in Node.js.
Additionally, I was able to successfully run this Try it demo.
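For a local file like the one in the question, the online small batch route could look roughly like the sketch below. It assumes the `@google-cloud/vision` package is installed and credentials are configured; `buildFileRequest` and `labelLocalFile` are hypothetical helper names introduced for illustration, not part of the library.

```javascript
const fs = require('fs');
const path = require('path');

// MIME types for the document formats discussed above.
const MIME_TYPES = {
  '.pdf': 'application/pdf',
  '.tif': 'image/tiff',
  '.tiff': 'image/tiff',
};

// Hypothetical helper: build a batchAnnotateFiles request for one local file.
function buildFileRequest(fileName, content, featureType = 'LABEL_DETECTION') {
  const mimeType = MIME_TYPES[path.extname(fileName).toLowerCase()];
  if (!mimeType) {
    throw new Error(`Unsupported file type: ${fileName}`);
  }
  return {
    requests: [
      {
        // The file bytes are sent inline instead of a Cloud Storage URI.
        inputConfig: {mimeType, content},
        features: [{type: featureType}],
        // `pages` is left unset, so the service chooses its default pages.
      },
    ],
  };
}

// Hypothetical helper: annotate a local PDF/TIFF and return labels per page.
async function labelLocalFile(fileName) {
  // Required lazily so the request builder works without the client installed.
  const vision = require('@google-cloud/vision');
  const client = new vision.ImageAnnotatorClient();
  const request = buildFileRequest(fileName, fs.readFileSync(fileName));
  const [result] = await client.batchAnnotateFiles(request);
  // One file was sent, so there is one file response with one entry per page.
  return result.responses[0].responses.map(page => page.labelAnnotations);
}
```

Calling `labelLocalFile('./test.pdf')` would then replace the `client.labelDetection('./test.pdf')` call from the question, which treats the file as a single image and so rejects PDF and TIFF input.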

Related

C# .NET Core - Where to store LARGE files for integration testing?

I'm writing some .NET core API integration tests that verify if a file upload mechanism is working properly.
What is the best practice for storing large files (2 GB+) that I upload for testing purposes? It doesn't seem like a good idea to include these test files in the solution/source control due to their size.
What is the recommended path to take when it comes to writing integration tests using large files?

I see two options:
#1 If the contents of the file are not important, I would generate a stream of the required size (x GB) on the fly, for example by serializing an array with a couple of million references to the same complex structure. First try creating such a file in a console app and saving it locally to check that it meets your expectations.
#2 I would recommend placing these files in Azure Storage. If needed, you can generate SAS tokens to make the files available directly through an HTTP address, or load them as a stream using the Blob Storage NuGet package. Blob Storage is also cheap. Alternatively, use any server where the files can be stored and retrieved from your integration tests.
Definitely don't include these files in the project!

Flutter Web Display Microsoft Documents Firebase

I am trying to embed MS Documents in a Flutter Web App.
Documents are stored on Firebase Storage. I am using MS Web Viewer to display them in browser.
This works without any problem:
https://view.officeapps.live.com/op/embed.aspx?src=https://file-examples.com/wp-content/uploads/2017/08/file_example_PPT_250kB.ppt
The following two versions where the documents are hosted on Firebase are not working:
https://view.officeapps.live.com/op/embed.aspx?src=https://firebasestorage.googleapis.com/v0/b/tutor-and-learn.appspot.com/o/public%2Ffile_example_PPT_250kB.ppt?alt=media&token=6e293eb9-9f3b-41ab-9969-f936b3c54384
https://view.officeapps.live.com/op/embed.aspx?src=https://storage.googleapis.com/tutor-and-learn.appspot.com/public/file_example_PPT_250kB.ppt?GoogleAccessId=firebase-adminsdk-t47jn%40tutor-and-learn.iam.gserviceaccount.com&Expires=1597309385&Signature=VHbm8U8xlf%2BYybwalAveZtl8FsmEmr6Uml%2BwX%2FR7TOFNqlj%2B8QW1FFSJUNB4qcAzVpEcntLzipT15Zj73B%2FLlSZlQwEU10s5RkJdR5CZeZ6MuF2DUptUbqfnNobdLkizEmwlQ6Bkk4DkDWCd9nRL%2BQ0GLYypBr%2Bxs39bpd8JSuxxACWCjq0Of8qLTBMZQmD%2BgbE8JkMdqvBVOV75A7EQyy1IWqHrRBD7RgVc46IEq4TaO2ZT9h56joJgawqZOt81%2Fkq95YmNWZNOeU9kVRuLpSFsqZru8Ku7aapiFcUXjrjuMWZeC1XCrTK7fwU6A8shNIyHq3bE8RB9a%2BCQnS0llA%3D%3D
I can't get it to work either via Firebase directly or via Google Cloud Storage.
The individual links in the above example work without any problems and you can download the file.
https://firebasestorage.googleapis.com/v0/b/tutor-and-learn.appspot.com/o/public%2Ffile_example_PPT_250kB.ppt?alt=media&token=6e293eb9-9f3b-41ab-9969-f936b3c54384
https://storage.googleapis.com/tutor-and-learn.appspot.com/public/file_example_PPT_250kB.ppt?GoogleAccessId=firebase-adminsdk-t47jn%40tutor-and-learn.iam.gserviceaccount.com&Expires=1597309385&Signature=VHbm8U8xlf%2BYybwalAveZtl8FsmEmr6Uml%2BwX%2FR7TOFNqlj%2B8QW1FFSJUNB4qcAzVpEcntLzipT15Zj73B%2FLlSZlQwEU10s5RkJdR5CZeZ6MuF2DUptUbqfnNobdLkizEmwlQ6Bkk4DkDWCd9nRL%2BQ0GLYypBr%2Bxs39bpd8JSuxxACWCjq0Of8qLTBMZQmD%2BgbE8JkMdqvBVOV75A7EQyy1IWqHrRBD7RgVc46IEq4TaO2ZT9h56joJgawqZOt81%2Fkq95YmNWZNOeU9kVRuLpSFsqZru8Ku7aapiFcUXjrjuMWZeC1XCrTK7fwU6A8shNIyHq3bE8RB9a%2BCQnS0llA%3D%3D
I presume the MS Web Viewer cannot cope with the URLs. Is there any way I can adapt or change anything in Firebase to get it to work?
Looking in Firebase Storage Console the files are listed with the correct type as application/vnd.ms-powerpoint.

URL-encode the signed URL before appending it to https://view.officeapps.live.com/op/embed.aspx?src=. You can use an online tool such as https://www.urlencoder.org/ to encode it.
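The encoding can also be done in code rather than with an online tool. Here is a minimal JavaScript sketch using the standard `encodeURIComponent` function; `buildViewerUrl` is a hypothetical helper name, and the document URL is the Firebase one from the question.

```javascript
const VIEWER_BASE = 'https://view.officeapps.live.com/op/embed.aspx?src=';

// Percent-encode the whole document URL so that its own `?`, `&` and `%2F`
// stay inside the single `src` parameter instead of being read as
// parameters of the viewer URL.
function buildViewerUrl(documentUrl) {
  return VIEWER_BASE + encodeURIComponent(documentUrl);
}

const firebaseUrl =
  'https://firebasestorage.googleapis.com/v0/b/tutor-and-learn.appspot.com/o/public%2Ffile_example_PPT_250kB.ppt?alt=media&token=6e293eb9-9f3b-41ab-9969-f936b3c54384';

console.log(buildViewerUrl(firebaseUrl));
```

Without the encoding, everything after the first `&` in the Firebase URL is parsed as a separate parameter of the viewer URL, so the viewer would request the file without its token.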

PowerBI ExportReport Stream format

I am using PowerBI online with pro license.
Using the PowerBIClient object in .NET Core 2.1, I am able to call the ExportReport function, which returns a Stream.
I want to convert this into an image (any format) so I can use it further. How do I get this done? Is this the right approach?

I'm afraid it doesn't work this way. ExportReport returns the .pbix file, i.e. the report itself, which can be opened in Power BI Desktop. There is no easy way to convert this to an image programmatically.
You can save this stream to a .pbix file and try to open it in Power BI Desktop. If the data is imported, or if you have access to the data source, you will be able to visualize the report and export it to PDF, but this is a manual operation. You can also export the report from the service directly (to PowerPoint and PDF).
You can try to automate Power BI Desktop (e.g. using this), but I wouldn't go this way for a production system.
There are also some third-party tools that may help you (e.g. this one).
You can embed Power BI in a desktop application and try some screen capturing magic, but this isn't a nice or easy solution either.
You can suggest an idea or vote for existing one in Power BI Ideas, e.g. Export to PDF via Power BI Embedded API.
UPDATE 2020: There is an Export to File API now. See: Export report to PDF, PPTX and PNG files using Power BI REST API (Preview); Reports - Export To File; Reports - Export To File In Group.

Bulk Upload PDF Forms to AEM Repository

We recently began using Adobe LiveCycle and Adobe Experience Manager. We're not using AEM for our web site, however. We're just wanting to use the Forms Portal portion of it to allow us to organize our forms, tag them, etc, then provide a searchable interface.
We have several hundred pre-AEM/LiveCycle PDF forms. Rather than manually uploading one PDF file at a time, we would like to do a bulk upload into the repository.
If I remember correctly, I saw someone do this using CRXDE Lite, navigating to /content/dam/formsanddocuments, then dragging the documents in. However, when I try that, it just opens up Acrobat Reader to display the PDF.
Any help would be greatly appreciated!

I'm not sure about drag/drop in the browser (never tried it), but there are a couple of methods you could use:
Curl / Bash
If you're happy scripting this in Bash, one method is to use curl to upload the file, along the lines of:
curl -F"./*=@form1.pdf" \
-F"./*@TypeHint=nt:file" \
http://admin:admin@localhost:4502/content/dam/forms/form1
For example, you could create a script to loop over all the PDFs in your directory & upload them one at a time.
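If Node.js is more convenient than Bash, the same loop could be sketched as below. It simply shells out to curl once per PDF. The host, credentials and `/content/dam/forms` path are taken from the curl example above. The helper names are hypothetical, as is the choice of the file name without its extension as the target node name. Credentials are passed with curl's `-u` option rather than embedded in the URL.

```javascript
const fs = require('fs');
const path = require('path');
const {execFileSync} = require('child_process');

// Values borrowed from the curl example; adjust them for your instance.
const BASE_URL = 'http://localhost:4502/content/dam/forms';
const CREDENTIALS = 'admin:admin';

// Build the curl arguments for one PDF. The target node name (file name
// without extension) is an assumption modelled on form1.pdf -> form1.
function buildCurlArgs(pdfPath) {
  const nodeName = path.basename(pdfPath, path.extname(pdfPath));
  return [
    '-u', CREDENTIALS,
    '-F', `./*=@${pdfPath}`,
    '-F', './*@TypeHint=nt:file',
    `${BASE_URL}/${encodeURIComponent(nodeName)}`,
  ];
}

// Upload every .pdf file found directly inside `directory`, one at a time.
function uploadAll(directory) {
  const pdfs = fs.readdirSync(directory)
    .filter(name => name.toLowerCase().endsWith('.pdf'));
  for (const name of pdfs) {
    // execFileSync throws if curl cannot be started or exits non-zero.
    execFileSync('curl', buildCurlArgs(path.join(directory, name)), {stdio: 'inherit'});
  }
  return pdfs.length;
}
```

Calling `uploadAll('./forms')` would upload each PDF in a local `forms` directory in turn.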
VLT
Another option would be to create a content package with Maven, build it and install it via CRX Package Manager.
Or you could use the Filevault (VLT) tool that Adobe provide as a VCS-style link between your file system & the repository — checkout to a directory, add the PDFs in and then do a vlt add; vlt ci to push them back into AEM.
WebDAV
AEM supports mounting the repository via WebDAV, so you could drag the files in using Finder/Explorer.
It can be slow, but if you're just doing a one-off dump of files into AEM, it could be an option.

Open source online document editor

Can somebody tell me some open source projects that implement document management online?
I need to upload a document (pdf, docx, fb2), convert it to doc, allow the user to edit it online, and then convert it back to pdf.
Images and formatting should be preserved.
I found TeamLab, but it is not free, and I don't need collaborative editing.
Thank you.

It depends what your requirements are when you say 'online editing'.
You could use Nuxeo including the Nuxeo Drive extension, to enable users to edit the files from the remote repository locally (transparent in the sense the user does not take the step of downloading or uploading files), and their changes are then synced to the repository.
The user can edit the .doc file locally using OpenOffice (or MS Word if they have it).
If the requirement is strictly 'online/web only', you could convert the PDF to RTF via an OCR engine such as Tesseract, then use one of the many WYSIWYG inline editors and connect it to Nuxeo as an edit button using their extension framework. There is an existing tesseract-ocr extension available at the Nuxeo source repo.
