Microsoft Computer Vision OCR Read API charged as S3 transaction instead of S2 - azure-cognitive-services

I am using the Microsoft Computer Vision API for OCR processing, and I noticed that the calls are getting charged as S3 transactions instead of S2 on my bill.
I'm using the .NET SDK, and the method I am calling is this one:
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cognitiveservices.vision.computervision.computervisionclientextensions.readasync?view=azure-dotnet
I have also confirmed that the actual REST API the SDK calls is the following: POST /vision/v3.2/read/analyze
https://centraluseuap.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/5d986960601faab4bf452005
According to the documentation, that should be the OCR Read API, correct?
https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/vision-api-how-to-topics/call-read-api
I am puzzled as to why my calls are getting charged as S3 instead of S2. This matters to me because S3 is 50% more expensive than S2: per the Pricing Calculator, 1,000 S2 transactions cost $1, whereas 1,000 S3 transactions cost $1.50.
https://azure.microsoft.com/en-us/pricing/calculator/?service=cognitive-services
What's the difference between OCR and "Describe and Recognize Text" anyway? OCR (Optical Character Recognition) by definition must recognize text. I am calling the Read API without any of the optional parameters, so I did not ask for "Describe"; hence the call should be billed as an S2 feature rather than an S3 feature, I think.
I already posted this question on Microsoft Q&A, but I thought SO might get more traffic and hence help me get an answer faster.
https://learn.microsoft.com/en-us/answers/questions/689767/computer-vision-api-charged-as-s3-transaction-inst.html

To understand what is happening, you need a bit of history of these services. The Computer Vision API (and all the "calling" SDKs using these APIs, whether C#/.NET, Java, Python, etc.) has moved frequently, and it is sometimes hard to tell which SDK calls which version of the API.
API operations history
Regarding optical character reading, there have been several operations over time:
Computer Vision 1.0
See the definition here; it contained:
OCR operation, a synchronous operation to recognize printed text
Recognize Handwritten Text operation, an asynchronous operation for handwritten text (with "Get Handwritten Text Operation Result" operation to collect the result once completed)
Computer Vision 2.0
See the definition here. OCR was still there, but "Recognize Handwritten Text" was replaced. So there were:
OCR operation, a synchronous operation to recognize printed text
Recognize Text operation, asynchronous (+ Get Recognize Text Operation Result to collect the result), accepting both printed and handwritten text (see the mode input parameter)
Batch Read File operation, asynchronous (+ "Get Read Operation Result" to collect the result), which also processed PDF files, whereas the other ones only accepted images. It was intended "for text-heavy documents"
Computer Vision 2.1 was similar in terms of operations.
Computer Vision 3.0
See definition here.
Main changes: Recognize Text and Batch Read File were "unified" into a single Read operation, with model improvements. There is no more need to specify handwritten/printed, for example (see link).
The Read API is optimized for text-heavy images and multi-page, mixed language, and mixed type (print – seven languages and handwritten – English only) documents
So there were:
OCR operation, a synchronous operation to recognize printed text
Read operation, asynchronous (+ Get Read Result to collect the result), accepting both printed and handwritten text, and both image and PDF inputs.
Same for Computer Vision v3.1-preview.1, v3.1-preview.2, v3.1, v3.2-preview.1, v3.2-preview.2, v3.2-preview.3
SDKs
All recent versions of the SDKs that implement a Read method call this 3.x operation. See, for example, the changelog for the .NET SDK here:
v7.0.x of the SDK "supports v3.2 Cognitive Services Computer Vision API endpoints."
Conclusion
It is normal that you are billed S3 for Read. But the calculator is misleading, as the term "Recognize Text" should be changed to "Read".
If you really want to use the OCR operation, use the RecognizePrintedTextAsync method of the SDK, which is the one that calls it.
OCR is an older model, used only for printed text; the Read operation uses the latest model.
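For reference, the two operations are also distinct at the REST level, which is where the different billing meters apply. Below is a minimal, illustrative sketch using Java 11's HttpClient rather than any official SDK; the endpoint, key, and image URL are placeholders, and the paths are the read/analyze operation from the v3.2 reference linked above and the OCR operation of the same API version.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReadVsOcr {
    // Placeholders: your Computer Vision resource endpoint and key
    static final String ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com";
    static final String KEY = "<your-key>";
    static final String BODY = "{\"url\": \"https://example.com/sample.png\"}";

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Read (asynchronous): POST read/analyze, then poll the Operation-Location URL.
        // This is what the SDK's ReadAsync calls, and it is metered as Read (S3).
        HttpRequest read = HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT + "/vision/v3.2/read/analyze"))
                .header("Ocp-Apim-Subscription-Key", KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
        HttpResponse<String> readResponse = http.send(read, HttpResponse.BodyHandlers.ofString());
        String operationLocation = readResponse.headers().firstValue("Operation-Location").orElseThrow();
        System.out.println("Poll this URL for the Read result: " + operationLocation);

        // Legacy OCR (synchronous): POST ocr, result returned directly.
        // This is what RecognizePrintedTextAsync calls, metered under the OCR (S2) line.
        HttpRequest ocr = HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT + "/vision/v3.2/ocr?detectOrientation=true"))
                .header("Ocp-Apim-Subscription-Key", KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
        HttpResponse<String> ocrResponse = http.send(ocr, HttpResponse.BodyHandlers.ofString());
        System.out.println("OCR result: " + ocrResponse.body());
    }
}

Whatever SDK or language you use, anything that ends up on the read/analyze path is metered as Read, while the ocr path is the one metered as OCR.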
I can also confirm (based on a few tests I made) that the OCR operation's performance is lower than the Read operation's. If you want to test quickly, you can use your key on this website: it is an open-source portal created by another Microsoft MVP, to which I also contributed. You will be able to see the results of both the OCR and Read operations. It is currently using version 6.0.0 of the Computer Vision SDK (see source).
Sample: OCR result and Read result screenshots.

Related

Azure ML batch execution latency

After reading this post here: Azure Machine Learning Request Response latency
and the article mentioned in the comments, I was wondering whether this behavior also holds when a published web service is called in batch mode.
Especially since I have read somewhere (sorry, can't find the link at the moment) that the batch calls are not influenced by the "concurrent calls" config...
In our scenario we have a custom R module uploaded to our workspace, which includes some libraries that are not available in Azure ML by default. The module takes a dataset, trains a binary tree, creates some plots, and encodes them in base64 before returning them as a dataset. Locally that does not take more than 5 s, but in the Azure ML web service it takes approx. 90 s, and the runtime in batch mode does not seem to improve when calling the service multiple times.
Additionally, it would be nice to know how long the containers mentioned in the linked post will stay warm.

DocumentDB auto generated ID: GUID or UUID? Which variant?

TL;DR: Are the IDs that are auto-generated by DocumentDB supposed to be GUIDs or UUIDs, and is there actually a difference? If they are UUIDs, then which variant/version of UUID?
Background: Some of the DocumentDB client libraries will auto generate an ID for you if you do not provide one. I have seen it mentioned in the Azure blog and in several related questions that the generated IDs are GUIDs. I know there is some discussion over whether GUIDs are UUIDs, with many people saying that they are.
The problem: However, I have noticed that some of the IDs that DocumentDB auto-generates do not follow the UUID RFC, which allows only the digits 1-5 in the "version" nibble (V in xxxxxxxx-xxxx-Vxxx-xxxx-xxxxxxxxxxxx). DocumentDB generates IDs with any hex digit in that nibble, for example d981befd-d19b-ee48-35bd-c1b507d3ec4f, whose version nibble is the first e of ee48.
It is possible that this depends on which client is used to create the documents. In our DocumentDB database, we have documents with the third grouping dde5, 627a, fe95, and so on. These documents were stored from within a stored procedure by calling Collection.createDocument() with the options {'disableAutomaticIdGeneration': false}. Other documents that I create through the third party DocumentDB Studio application always have 4xxx in the third grouping, which is a valid UUID version. However, documents that I create through the Azure portal have non-standard third groupings like b359.
Question: Are the auto-generated DocumentDB IDs supposed to be GUIDs or UUIDs, and is there actually a difference? If UUIDs, then which variant?
Poking around in the source code on GitHub, I found that the various client and server side libraries use several different methods for creating what they're calling a GUID (in some libraries) or a UUID (in other libraries).
The Node.js client, JavaScript client, and server-side library manufacture what they call a GUID by concatenating a series of random hex digits and hyphens. Note that these are random, but do not comply with the rules for creating RFC 4122 version 4 UUIDs (see the verification sketch after this list).
The Python client and Java client call their respective standard library methods to generate a random (version 4) UUID.
The .NET client is available via NuGet, but the source code is not yet published.
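If you want to check for yourself what a given ID claims to be, java.util.UUID exposes the version and variant bits directly. A small sketch (the non-compliant sample ID is the one from the question):

import java.util.UUID;

public class UuidCheck {
    public static void main(String[] args) {
        // ID auto-generated by DocumentDB, taken from the question
        UUID fromDocumentDb = UUID.fromString("d981befd-d19b-ee48-35bd-c1b507d3ec4f");
        // ID generated by the Java standard library (what the Java client does)
        UUID fromJava = UUID.randomUUID();

        // version() returns the V nibble of xxxxxxxx-xxxx-Vxxx-xxxx-xxxxxxxxxxxx;
        // RFC 4122 only defines versions 1-5.
        System.out.println(fromDocumentDb.version()); // 14 (0xe) -> not a defined RFC 4122 version
        System.out.println(fromJava.version());       // 4 -> random UUID

        // variant() is 2 for the RFC 4122 ("Leach-Salz") variant.
        System.out.println(fromDocumentDb.variant()); // 0 -> reserved/NCS variant, i.e. not RFC 4122
        System.out.println(fromJava.variant());       // 2
    }
}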
Summary:
Microsoft is not making a distinction between GUID and UUID in their client libraries. They are using the terms interchangeably.
What you get for a GUID/UUID depends on which client library you're using to call DocumentDB when you create your documents.

Best UI interface/Language to query MarkLogic Data

We will be moving from Oracle to MarkLogic 8 as our datastore and will be using MarkLogic's Java API to work with the data.
I am looking for a UI tool (like SQL Developer for Oracle) that can be used with MarkLogic. I found that MarkLogic's Query Manager can be used for accessing data, but I see multiple options with regard to language:
SQL
SPARQL
XQuery
JavaScript
We need to perform CRUD operations and search the data, and our testing team knows SQL (from Oracle), so I am unsure which route to follow and on what basis to decide which one or two are better to explore. We are most likely to use the JSON document type.
Any help/suggestions would be appreciated.
You already mention that you will be using the MarkLogic Java Client API. That should cover most of the common needs you could have, including search, CRUD, facets, and lexicon values, and it also allows custom extensions through REST extensions, since the Client API is built on the MarkLogic REST API. It largely saves you from having to write code inside MarkLogic.
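For illustration, basic JSON document CRUD with the Java Client API looks roughly like the sketch below. The host, port, credentials, and document URI are placeholders, and the exact newClient overload depends on the client version (older releases from the MarkLogic 8 era used a user/password/Authentication.DIGEST overload instead of an auth context object).

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;

public class MarkLogicCrudSketch {
    public static void main(String[] args) {
        // Placeholder connection details
        DatabaseClient client = DatabaseClientFactory.newClient(
                "localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("my-user", "my-password"));

        JSONDocumentManager docMgr = client.newJSONDocumentManager();
        String uri = "/orders/order-1.json";

        // Create or update
        docMgr.write(uri, new StringHandle("{\"orderId\": 1, \"status\": \"open\"}")
                .withFormat(Format.JSON));

        // Read
        String json = docMgr.read(uri, new StringHandle()).get();
        System.out.println(json);

        // Delete
        docMgr.delete(uri);

        client.release();
    }
}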
Apart from that, you can run ad hoc commands from the Query Console using one of the above-mentioned languages. SQL will require the presence of a so-called SQL view (see also your earlier question Using SQL in Query Manager in MarkLogic). SPARQL will require enabling the triple index and ingesting RDF data.
That leaves XQuery and JavaScript, which have pretty much identical expressive power and performance. If you are unfamiliar with XQuery and XML languages in general, JavaScript might be more appealing.
HTH!

Recommendations using R with SimpleDB or BigQuery or using PHP with SimpleDB

I am currently working on a system that generates product recommendations like those on Amazon: "People who bought this also bought that..."
Current Scenario:
Extract the Google Analytics data of the client and insert it into a database.
On the client's website, when a product page loads, an API call is made to get recommendations for the product being viewed.
When the API receives a product ID as the request, it looks in the database, retrieves the recommended product IDs (using association rules), and sends them as the response.
This list of product IDs is then processed on the client end to get the product details (image, price, etc.) and displayed on the website.
Currently I am using PHP and MySQL with the gapi package and a REST API, with storage on Amazon EC2.
My Question is:
Now, if I have to choose among the following, which would be the best choice to implement the concept described above?
PHP with SimpleDB or BIGQuery.
R language with BIGQuery.
RHIPE-(R and hadoop ) with SimpleDB.
Apache Mahout.
Please help!
This isn't so easy to answer, because the constraints are fairly specialized.
The following considerations can be made, though:
BIGQuery is not yet public. Thus, with a small usage base, even if you are in the preview population, it will be harder to get advice on improvement.
Each of your options pairs a modeling system with a storage system. Apache Mahout is not a storage mechanism, so it won't necessarily work on its own. I used to believe that its machine learning implementations were a pastiche of a few Google Summer of Code projects, but I've updated that view at the suggestion of a commenter. It still looks like it has rather uneven and spotty coverage of different algorithms, and it's not particularly clear how the components are supported or maintained. I encourage an evangelist for Mahout to address this.
As a result, this eliminates the 1st, 2nd, and 4th options.
What I don't quite get is the need for a real-time server to utilize Hadoop and RHIPE. That should be done in your batch processing for developing the recommendation models, not in real-time. I suppose you could use RHIPE as a simple one-stop front end for firing off queries.
I'd recommend using RApache instead of RHIPE, because you can get your packages and models pre-loaded. I see no advantage to using Hadoop in the front end, but it would be a very natural back end system for the model fitting.
(Update 1) Other interface options include RServe (http://www.rforge.net/Rserve/) and possibly RStudio in server mode. There are R/PHP interfaces (see comments below), but I suspect it would be better to access R through HTTP or TCP/IP.
(Update 2) Addressing the whole process, the basic idea I see is that you could query the data from PHP and pass it to R, or, if you wish to query from within R, look at the link in the comments (to the OmegaHat tools) or post a new question about R & SimpleDB - I'm sure someone else on SO would be able to give better insight on this particular connection. RApache would let you instantiate many R processes already prepared with packages loaded and data in RAM; thus you would only need to pass whatever data is needed for prediction. If your new data is a small vector, then RApache should be fine, and it seems this is correct for the data being processed in real time.
If you want a real-time API for recommendations based on data in a database, Apache Mahout does this directly. You want to use ReloadFromJDBCDataModel, put a GenericItemBasedRecommender on top of it, and use the servlet-based wrapper in the examples module. It's probably a day or two of work to get familiar with the code and customize it to your needs, but it's pretty simple.
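To make that concrete, here is a rough, non-distributed sketch. The MySQL connection details and the taste_preferences table layout (user_id, item_id, preference) are placeholder assumptions for this example; ReloadFromJDBCDataModel keeps an in-memory, reloadable copy of the JDBC-backed preferences, and the servlet wrapper from the examples module would sit in front of code like this.

import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.model.jdbc.ReloadFromJDBCDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

import com.mysql.cj.jdbc.MysqlDataSource; // older Connector/J drivers use com.mysql.jdbc.jdbc2.optional.MysqlDataSource

public class RecommenderSketch {
    public static void main(String[] args) throws TasteException {
        // Placeholder MySQL connection details
        MysqlDataSource dataSource = new MysqlDataSource();
        dataSource.setServerName("localhost");
        dataSource.setDatabaseName("analytics");
        dataSource.setUser("my-user");
        dataSource.setPassword("my-password");

        // In-memory, periodically reloadable copy of the JDBC-backed preference data
        DataModel model = new ReloadFromJDBCDataModel(
                new MySQLJDBCDataModel(dataSource, "taste_preferences",
                        "user_id", "item_id", "preference", null));

        GenericItemBasedRecommender recommender =
                new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));

        // "People who bought this also bought...": items most similar to product 12345
        List<RecommendedItem> alsoBought = recommender.mostSimilarItems(12345L, 10);
        alsoBought.forEach(item -> System.out.println(item.getItemID() + " " + item.getValue()));
    }
}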
When you get past about 100M data points, you would need to look at distributing the computation with Hadoop. That's a fair bit more complex. Mahout has a distributed recommender too, which you can customize.

Best Publish/Subscribe "Middleware" [closed]

I'm in the market for a good open-source, network-based Pub/Sub (observer pattern) library. I haven't found any I like:
JMS - tied to Java, treats message contents as dumb binary blobs
NDDS - $$, use of IDL
CORBA/ICE - Pub/Sub is built on-top of RPC, CORBA API is non-intuitive
JBoss ESB - not too familiar with it
It would be nice if such a package could do the following:
Network based
Aware of payload data, users should not have to worry about endian/serialization issues
Multiple language support (C++, ruby, Java, python would be nice)
No auto-generated code (no IDLs!)
Intuitive subscription/topic management
For fun, I've created my own. Thoughts?
You might want to look into RabbitMQ.
As pointed out by an earlier post in this thread, one of your options is OpenSplice DDS, which is an open-source implementation of the OMG DDS standard (the same standard implemented by NDDS).
The main advantages of OpenSplice DDS over the other middleware you are considering can be summarized as:
Performance
Rich support for QoS (Persistence, Fault-Tolerance, Timeliness, etc.)
Data Centricity (e.g. possibility of querying and filtering data streams)
Something I'd like to understand is what your issues with IDL are. DDS uses IDL as a language-independent way of specifying user data types. However, DDS is not limited to IDL; you could use XML if you prefer. The advantage of specifying your data types and decoupling their representation from a specific programming language is that the middleware can:
(1) take away from you the burden of serializing data,
(2) generate very time/space efficient serialization,
(3) ensure end-to-end type safety,
(4) allow content filtering on the whole data type (not just the header like in JMS), and
(5) enable on-the-wire interoperability across programming languages (e.g. Java, C/C++, C#, etc.)
Depending on the system or application you are designing, some of the properties above might not be useful or relevant. In that case, you can simply define one, or a few, "DDS types" that act as the holders of your serialized data.
If you think about JMS, it provides you with five different message types you can use to send your data. With DDS you can do the same, but you also have the flexibility to define exactly the topic types you need.
Finally, you might want to check out this blog entry on Scala and DDS for a longer discussion on why types and static-typing are good especially in distributed systems.
-AC
We use the RTI DDS implementation. It costs $$, but it supports many quality of service parameters.
There is a free DDS implementation called OpenDDS, but I've not used it.
I don't see how you can get around the need to predefine your data types if the target language is statically typed.
Look a bit deeper into the various JMS implementations.
Most of them are not Java only, they provide client libraries for other languages too.
Sun's OpenMQ has at least a C++ interface, and Apache ActiveMQ provides client-side libraries for many common languages.
When it comes to message formats, they're usually decoupled from the messaging middleware itself. You could define your own message format. You could define your own XML schema and send XML messages. You could send BER-encoded ASN.1 using some third-party library if you want.
Or format and parse the data with a JSON library.
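For example, with plain JMS the broker does not care whether the text body is XML, JSON, or anything else; it just carries what you give it. A minimal sketch against ActiveMQ (broker URL and topic name are placeholders):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;

import org.apache.activemq.ActiveMQConnectionFactory;

public class JsonOverJms {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL and topic name
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("prices");
        MessageProducer producer = session.createProducer(topic);

        // The payload format is entirely up to the application; here it is JSON text.
        TextMessage message = session.createTextMessage("{\"symbol\": \"ABC\", \"price\": 42.5}");
        producer.send(message);

        connection.close();
    }
}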
You might be interested in the MUSCLE library (disclaimer: I wrote it, so I may be biased). I think it meets all of the criteria you specified.
https://public.msli.com/lcs/muscle/
Three I've used:
IBM MQ Series - Too Expensive, hard to work with.
Tibco Rendezvous - (now renamed to EMS?) It was very fast, used UDP, and could also be used with no central server. My favorite, but it is expensive and requires a maintenance fee.
ActiveMQ - I'm using this currently, but I find it crashes frequently. It also requires some projects ported from Java, like Spring.NET. It works, but I can't recommend it due to stability issues.
I also used MSMQ in an attempt to build my own Pub/Sub, but since it doesn't handle it out of the box, you're stuck writing a considerable amount of code.
There is also OpenSplice DDS. This one is similar to RTI's DDS, except that it's LGPL!
Check out IBM WebSphere MQ; the license is not too expensive if you work at a corporate level.
You might take a look at PubSubHubbub. It's an extension to Atom/RSS to allow pubsub through webhooks. The interface is HTTP and XML, so it's language-agnostic. It's gaining increasing adoption now that Google Reader, FriendFeed, and FeedBurner are using it. The main use case is blogs and such, but of course you can have any sort of payload.
The only open source implementation I know of so far is this one for the Google AppEngine. They say support for self-hosting is coming.
