How to specify the signature of a model in Vertex AI batch prediction

I uploaded TF universal-sentence-encoder-qa to Vertex and want to use this model to embed some question and answer data.
This model has two signatures: question_encoder and answer_encoder.
My question is: how can I tell Vertex AI to use a specific encoder when running a batch prediction?
For online prediction, I found that someone has given a solution: Specify signature name on Vertex AI Predict.
For batch prediction, I cannot find a solution yet. The closest thing I can find is an article about Google AI Platform, which suggests specifying the signature via a gcloud CLI parameter. However, that seems to apply only to AI Platform, NOT to Vertex AI.
It seems that Vertex AI doesn't support this gcloud CLI option yet (or I have missed something).

Related

Point Cloud library pose estimation given a pre-existing model as truth

PCL's GitHub directs these questions here, so I don't really know where else to ask this.
I'm trying to implement pose estimation given a mesh and a generated point cloud. Using PCL, I know you can do pose estimation with two point clouds from the tutorial. In my case I have an accurate faceted model of my target object. Does there exist a PCL pose estimator that can consume faceted truth models? I would like to avoid using mesh_sampling or mesh2pcd as a workaround.
Googling does not bring up any relevant results with the following search terms:
point cloud library pose estimation with
mesh
triangles
facets
truth data
model truth data
model
mesh truth data
vertexes
vertices
point cloud library point set registration with (the same terms as above)
point cloud library registration with (the same terms as above)
point cloud library 6DOF with (the same terms as above)
point cloud library pose with (the same terms as above)
point cloud library orientation with (the same terms as above)
Maybe I don't know the right words to search?
But it appears that it might be possible, because classes like this
pcl::SampleConsensusPrerejective<PointNT,PointNT,FeatureT>
and this
pcl::Registration< PointSource, PointTarget, Scalar >
take what seem to be pretty generic template arguments, only requiring PCL base functionality. But passing pcl::mesh did not compile (though it doesn't appear to be the only "mesh" type in PCL), since mesh doesn't seem to inherit from the base class. The documentation does not say what is or is not possible with the template types, and I have found no documentation that states this is impossible or that only point clouds are allowed.
Can I use the model directly, without point cloud conversion, and if not, why?
PCL is a library for point cloud processing. While some mesh support is available (pcl::PolygonMesh), practically all the implemented algorithms are based on point cloud data.
However, keep in mind that a mesh is just a point cloud plus additional triangulation information (faces), so any point cloud algorithm can be applied to a mesh. You just need to generate a point cloud from your mesh's vertices and ignore the faces; there is no need for mesh sampling.

Transforming h2o model into non-h2o one

I know that it is possible to export/import a previously trained h2o model.
My question is: is there a way to transform an h2o model into a non-h2o one (one that just works in plain R)?
I mean that I don't want to launch the h2o environment (JVM), since I know that predicting with a trained model is simply multiplying matrices, applying activation functions, etc.
Of course it would be possible to extract the weights manually, etc., but I want to know if there is any better way to do it.
I do not see any previous posts on SA about this problem.
No.
Remember that R is just the client, sending API calls: the algorithms (those matrix multiplications, etc.) are all implemented in Java.
What they do offer is a POJO, which is what you are asking for, but in Java. (POJO stands for Plain Old Java Object.) If you call h2o.download_pojo() on one of your models you will see it is quite straightforward. It may even be possible to write a script to convert it to R code? (Though it might be better, if you were going to go to that trouble, to convert it to C++ code, and then use Rcpp!)
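For reference, here is a minimal sketch of how a downloaded POJO is typically compiled against h2o-genmodel.jar and scored from Java. The class name gbm_model and the column names are hypothetical, and predictBinomial assumes a binomial classifier; other model types have their own predict methods.

import hex.genmodel.GenModel;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.prediction.BinomialModelPrediction;

public class PojoScoringSketch {
    public static void main(String[] args) throws Exception {
        // "gbm_model" is a hypothetical class name produced by h2o.download_pojo().
        GenModel rawModel = (GenModel) Class.forName("gbm_model").newInstance();
        EasyPredictModelWrapper model = new EasyPredictModelWrapper(rawModel);

        // Fill a row using the same column names the model was trained on.
        RowData row = new RowData();
        row.put("age", "42");
        row.put("income", "55000");

        BinomialModelPrediction p = model.predictBinomial(row);
        System.out.println("Predicted label: " + p.label);
        System.out.println("Class probabilities: " + java.util.Arrays.toString(p.classProbabilities));
    }
}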
Your other option, in the case of deep learning, is to export the weights and biases, implement your own activation function, and use them directly.
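As a rough illustration of that second option (not an h2o API, just a hand-rolled forward pass for one dense layer; the tanh activation is an assumption and must match whatever the trained network actually used):

public final class DenseLayer {
    // Forward pass for a single fully connected layer, using weights and biases
    // exported from the trained model (e.g. via h2o.weights()/h2o.biases() in R).
    static double[] forward(double[] input, double[][] weights, double[] biases) {
        double[] out = new double[weights.length];        // one output per neuron
        for (int i = 0; i < weights.length; i++) {
            double sum = biases[i];
            for (int j = 0; j < input.length; j++) {
                sum += weights[i][j] * input[j];          // matrix-vector product
            }
            out[i] = Math.tanh(sum);                      // activation (assumed tanh here)
        }
        return out;
    }
}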
But, personally, I've never found the Java side to be a bottleneck, either from the point of view of dev ops (install is easy) or computation (the Java code is well optimized).

Finding similar items using Microsoft Cognitive Services

Which Microsoft Cognitive Service (or Azure Machine Learning service?) is the best and least work to use to solve the problem of finding similar articles, given an article? An article is a string of text, and assume I do not have user interaction data about the articles.
Is there anything in Microsoft Cognitive Services that can solve this problem out of the box? It seems I cannot use the Recommendations API since I don't have interaction/user data.
Anthony
I am not sure that the Text Analytics API is a good fit for this scenario, at least not yet.
There are really two types of similarities:
1. Surface similarity (lexical) – similarity by the presence of the same words/characters.
If we are looking for surface similarity, try fuzzy matching/lookup (SQL Server Integration Services provides a component for this) or approximate string similarity functions (Jaro-Winkler distance, Levenshtein distance), etc. This is easier, as it does not require you to create a custom machine learning model (a small sketch is given after this answer).
2. Semantic similarity – Similarity by meaning of words
If we are looking for semantic similarity, then you need to go for semantic clustering, word embeddings, DSSM (deep semantic similarity model), etc.
This is harder to do, as it requires you to train your own machine learning model based on an annotated corpus.
Luis Cabrera | Text Analytics Program Manager | Cloud AI Platform, Microsoft
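For the surface-similarity route mentioned in point 1 above, a small sketch assuming Apache Commons Text (org.apache.commons:commons-text, 1.7 or later) is on the classpath; any edit-distance library would do:

import org.apache.commons.text.similarity.JaroWinklerSimilarity;
import org.apache.commons.text.similarity.LevenshteinDistance;

public class SurfaceSimilarityDemo {
    public static void main(String[] args) {
        String a = "Microsoft Cognitive Services";
        String b = "Microsoft Cognitive Service";

        // Levenshtein: number of single-character edits needed to turn a into b.
        Integer edits = new LevenshteinDistance().apply(a, b);

        // Jaro-Winkler: similarity score in [0, 1]; higher means more alike.
        Double jw = new JaroWinklerSimilarity().apply(a, b);

        System.out.printf("Levenshtein = %d, Jaro-Winkler = %.3f%n", edits, jw);
    }
}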
Yes, you can use the Text Analytics API.
Examples are available here: https://www.microsoft.com/cognitive-services/en-us/text-analytics-api
I would suggest you use the Text Analytics API [1] as @Narasimha suggested. You would put your strings through the Topic detection API, and then come up with a metric (say, Similarity = count(Matching topics) - count(Non Matching topics)) that could order each string against the others for similarity. This would just require one API call and a little JSON parsing.
[1] https://www.microsoft.com/cognitive-services/en-us/text-analytics-api
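One possible reading of that metric, with the topic sets obtained however you like (this is a hypothetical helper, not part of any Text Analytics SDK):

import java.util.HashSet;
import java.util.Set;

public final class TopicOverlap {
    // Similarity = count(matching topics) - count(non-matching topics),
    // where "non-matching" is read here as topics present in only one of the two articles.
    public static int similarity(Set<String> topicsA, Set<String> topicsB) {
        Set<String> matching = new HashSet<>(topicsA);
        matching.retainAll(topicsB);                      // topics shared by both articles
        Set<String> union = new HashSet<>(topicsA);
        union.addAll(topicsB);
        int nonMatching = union.size() - matching.size(); // topics in only one article
        return matching.size() - nonMatching;
    }
}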
Sentence similarity or semantic textual similarity is a measure of how similar two pieces of text are, or to what degree they express the same meaning.
This Microsoft GitHub repo for NLP provides some samples which can be used from an Azure VM and Azure ML: https://github.com/microsoft/nlp/tree/master/examples/sentence_similarity
This folder contains examples and best practices, written in Jupyter notebooks, for building sentence similarity models. The gensen and pretrained embeddings utility scripts are used to speed up the model building process in the notebooks.
The sentence similarity scores can be used in a wide variety of applications, such as search/retrieval, nearest-neighbor or kernel-based classification methods, recommendations, and ranking tasks.

Which of these "Safe" ECC curves are available in Bouncy Castle?

I'm trying to figure out which "safe" ECC curves are supported in Bouncy Castle. I found a few curves in the namespace Org.BouncyCastle.Asn1, but they are hard to find, and I'm sure I'm missing some.
Do any of the following curves exist in Bouncy Castle? (should I use them?)
M-221
E-222
Curve1174
Curve25519
E-382
M-383
Curve383187
Curve41417
Ed448-Goldilocks
M-511
E-521
I found an (apparently) definitive list of the ECC curves supported by Bouncy Castle. It seems to match the named curves defined in the codebase.
None of the curve names match the names you listed.
However, there is nothing preventing you from tracking down [1] and using the parameters that define any of the curves you have listed to define an ECParameterSpec ... or an ECNamedCurveParameterSpec.
[1] The parameters are in the paper you linked to. According to @mentalurg, it is not simple to get them into the correct form. However, this is an open source project, so if >>you<< care about this, there is nothing preventing you from doing the work and submitting a patch. Or, if you don't have the time, sponsoring them to do the work for you.
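To make the "define your own ECParameterSpec" route concrete, here is a minimal sketch of the Bouncy Castle (Java) calls involved, assuming you have already transcribed the curve's short-Weierstrass domain parameters (see the comment below about the limitation to Weierstrass form):

import java.math.BigInteger;
import org.bouncycastle.jce.spec.ECParameterSpec;
import org.bouncycastle.math.ec.ECCurve;
import org.bouncycastle.math.ec.ECPoint;

public final class CustomCurveSpec {
    // Builds a spec from explicit short-Weierstrass parameters:
    // prime p, coefficients a and b, base point G = (gx, gy), order n, cofactor h.
    public static ECParameterSpec fromWeierstrass(BigInteger p, BigInteger a, BigInteger b,
                                                  BigInteger gx, BigInteger gy,
                                                  BigInteger n, BigInteger h) {
        ECCurve curve = new ECCurve.Fp(p, a, b);    // y^2 = x^3 + a*x + b over GF(p)
        ECPoint g = curve.createPoint(gx, gy);      // base point on that curve
        return new ECParameterSpec(curve, g, n, h);
    }
}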
@Stephen C: "tracking down and using the parameters that define any of the curves" - wrong. The parameters (A and B) are only available for the Weierstrass form. For Edwards or Montgomery forms one has to do an (error-prone) coordinate transformation to Weierstrass form, call the encryption, then transform the results back to the original coordinate system.
Besides transformation errors, the performance of such a transformed curve might not be optimal.
Both the native Java implementation and Bouncy Castle are missing direct support for curve forms other than Weierstrass. And that is the problem.

Finding Connected Components using Hadoop/MapReduce

I need to find connected components for a huge dataset. (Graph being Undirected)
One obvious choice is MapReduce. But I'm a newbie to MapReduce and am quite short of time to pick it up and code it myself.
I was just wondering if there is any existing API for this, since it is a very common problem in social network analysis.
Or at least, is anyone aware of a reliable (tried and tested) resource with which I could at least get started on the implementation myself?
Thanks
I blogged about it for myself:
http://codingwiththomas.blogspot.de/2011/04/graph-exploration-with-hadoop-mapreduce.html
But MapReduce isn't a good fit for these graph analysis tasks; BSP (bulk synchronous parallel) is better suited for that, and Apache Hama provides a good graph API on top of Hadoop HDFS.
I've written a connected components algorithm with MapReduce here: (Mindist search)
https://github.com/thomasjungblut/tjungblut-graph/tree/master/src/de/jungblut/graph/mapreduce
Also a BSP version for Apache Hama can be found here:
https://github.com/thomasjungblut/tjungblut-graph/blob/master/src/de/jungblut/graph/bsp/MindistSearch.java
The implementation isn't as difficult as in MapReduce and it is at least 10 times faster.
If you're interested, check out the latest version in TRUNK and visit our mailing list.
http://hama.apache.org/
http://apache.org/hama/mail-lists.html
I don't really know if an API is available which has methods to find strongly connected components. But, I implemented the BFS algorithm to find distance from source node to all other nodes in the graph (the graph was a directed graph as big as 65 million nodes).
The idea was to explore the neighbors (at distance 1) of each node in one iteration, feeding the output of reduce back to map, until the distances converge. The map emits the shortest distances possible from each node, and the reduce updates each node with the shortest distance from the list.
I would suggest checking this out. Also, this could help. These two links give you the basic idea of graph algorithms in the MapReduce paradigm (if you are not already familiar with it). Essentially, you need to twist the algorithm to use DFS instead of BFS.
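A rough sketch of one such BFS iteration in Hadoop (the tab-separated line format, class names, and MAX_VALUE sentinel are assumptions, not the poster's code; the driver that re-runs the job until no distance changes is omitted). Propagating the minimum node id seen so far, instead of a distance, turns the same pattern into connected-component labelling.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class BfsIteration {

  // Assumed input/output line format:
  //   nodeId <TAB> distance <TAB> comma-separated-neighbour-ids
  // Unreached nodes carry Integer.MAX_VALUE as their distance.

  public static class BfsMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");
      String node = parts[0];
      int dist = Integer.parseInt(parts[1]);
      String adjacency = parts.length > 2 ? parts[2] : "";
      // Pass the node's structure through so the reducer can re-emit it.
      ctx.write(new Text(node), new Text("NODE\t" + dist + "\t" + adjacency));
      // Tell every neighbour it is reachable in dist + 1 hops.
      if (dist != Integer.MAX_VALUE) {
        for (String neighbour : adjacency.split(",")) {
          if (!neighbour.isEmpty()) {
            ctx.write(new Text(neighbour), new Text("DIST\t" + (dist + 1)));
          }
        }
      }
    }
  }

  public static class BfsReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text node, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      int best = Integer.MAX_VALUE;
      String adjacency = "";
      for (Text value : values) {
        String[] parts = value.toString().split("\t", 3);
        if (parts[0].equals("NODE") && parts.length > 2) {
          adjacency = parts[2];                              // keep the graph structure
        }
        best = Math.min(best, Integer.parseInt(parts[1]));   // shortest distance seen
      }
      // Same format as the input, so the output can feed the next iteration.
      ctx.write(node, new Text(best + "\t" + adjacency));
    }
  }
}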
You may want to look at the Pegasus project from Carnegie Mellon University. They provide an efficient - and elegant - implementation using MapReduce. They also provide binaries, samples and a very detailed documentation.
The implementation itself is based on the Generalized Iterative Matrix-Vector multiplication (GIM-V) proposed by U Kang in 2009.
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, Christos Faloutsos. In IEEE International Conference on Data Mining (ICDM 2009).
EDIT:
The official implementation is actually limited to 2.1 billion nodes (node ids are stored as integers). I'm creating a fork on GitHub (https://github.com/placeiq/pegasus) to share my patch and other enhancements (e.g. Snappy compression).
It is a little old question, but here is something you may want to check out. We implemented connected components using MapReduce on the Spark platform.
https://github.com/kwartile/connected-component
