I am running a Dataflow job (Apache Beam SDK 2.1.0 for Java, Google Dataflow runner) and I need to read from Google Datastore "distinctly" on one particular property (like the good old "DISTINCT" keyword in SQL).
Here is my code snippet:
Query.Builder q = Query.newBuilder();
q.addKindBuilder().setName("student-records");
q.addDistinctOn(PropertyReference.newBuilder().setName("studentId").build());
pipeline.apply(DatastoreIO.v1().read().withProjectId("project-id").withQuery(q.build()));
pipeline.run();
When the pipeline runs, the read() fails due to the following error:
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: com.google.datastore.v1.client.DatastoreException: Inequality filter on key must also be a group by property when group by properties are set., code=INVALID_ARGUMENT
Could someone please tell me where I am going wrong?
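In case it helps frame an answer: the fallback I'm considering is to drop the addDistinctOn(...) line from the query and de-duplicate on studentId inside the pipeline instead. This is only a rough sketch of that idea, and it assumes studentId is stored as a string property:
PCollection<Entity> records = pipeline.apply(
    DatastoreIO.v1().read().withProjectId("project-id").withQuery(q.build()));

PCollection<Entity> distinctByStudentId = records
    .apply("KeyByStudentId", ParDo.of(new DoFn<Entity, KV<String, Entity>>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // Assumes studentId is a string property on the entity.
        String studentId =
            c.element().getPropertiesMap().get("studentId").getStringValue();
        c.output(KV.of(studentId, c.element()));
      }
    }))
    .apply(GroupByKey.<String, Entity>create())
    .apply("OnePerStudentId", ParDo.of(new DoFn<KV<String, Iterable<Entity>>, Entity>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // Keep one arbitrary entity per studentId.
        c.output(c.element().getValue().iterator().next());
      }
    }));
I would still much prefer to have Datastore do the distinct-on server side, so any pointer on the query error itself is appreciated.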
I have an Azure Function app (timer-triggered functions) which is giving an "Azure Functions runtime is unreachable" error. We are using appsettings.json in place of local.settings.json, and the variables are configured in the DevOps pipeline. I can see that the variables get updated in the Azure Function app files; however, the function is not executing, yet it shows some memory consumption every 30 mins. Please share your suggestions to fix this. Thanks!
As far as I know, the "Azure Functions runtime is unreachable" error occurs if the function app is blocked by a firewall or if the storage account connection string is configured incorrectly.
I found a few solved issues on SO for similar errors in the context of Azure Functions deployed through Azure DevOps pipelines, where user #JsAndDotNet solved the above error here by correcting the Platform value in the Configuration menu of the deployed Azure Function App in the portal.
Another user #DelliganeshSevanesan solved this error in the context of Azure Functions deployment using IDEs (70934637), where multiple reasons for this error are listed along with their resolutions.
Also, I have observed that we need to add the Azure Function App configuration settings to the pipeline as key-value pairs, which is also known as transforming the local.settings.json of the Azure Function App for Azure CI/CD pipelines. For this, I have found the practical workarounds given by #VijayanathViswanathan and #Sajid in these SO issues: 1 & 2.
I have an endpoint which tells me the status of a given ADF pipeline from .NET. For that purpose I use the .NET SDK for ADF; specifically, I run PipelineRun pipelineRun = client.PipelineRuns.Get(resourceGroup, dataFactoryName, runId); and then I retrieve the status from pipelineRun.Status. The only thing I get from a user here is the runId. However, I have a scenario where I need to send a list of runIds. From what I've seen reading the official documentation, most of their APIs work with a runId of type str, which means they work only per runId. Has any of you ever stumbled upon a scenario like this, and how did you manage to get the status of multiple runIds? Did you use an already built function from the SDK, or did you just for-loop PipelineRun pipelineRun = client.PipelineRuns.Get(resourceGroup, dataFactoryName, runId); for listSize times?
I've sent a Gremlin statement which fails with the following error:
{"code":"InternalFailureException","requestId":"d924c667-2c48-4a7e-a288-6f23bac6283a","detailedMessage":"null: .property(Cardinality, , String)"} (599)
I've enabled audit logs on the cluster side, but there is no indication of any error there, although I do see the request.
Are there any techniques to debug such problems with AWS Neptune?
Since the Gremlin is built dynamically and creates complicated graphs with thousands of operations, I'm looking for some way to better understand where the error is.
In one case it turned out the payload was too big, and in another the Gremlin bytecode failed although it worked great on the local TinkerPop server.
But with the generic internal failure exception it is very hard to pinpoint the issues.
I'm using the Node.js gremlin package (3.5.0).
Thanks for the clarifications. As a side note, Neptune is not yet at the 3.5.0 TinkerPop level so there is always the possibility of mismatches between client and server. The audit log only shows the query received. It does not reflect the results of running the query.
When I am debugging issues like this one, I often create a text version of the query (the node client has a translator that can do that for you) and use the Neptune REST API to first of all check the /explain plan - you can also generate that from the Neptune notebooks using %%gremlin explain. If nothing obvious shows up in the explain plan I usually try to run the text version of the query just to see if anything shows up - to rule out the client driver. I also sometimes run my query against a TinkerGraph just to see if the issue is as simple as Neptune not returning the best error message it could.
If the text query should work, changing the Node.js Gremlin client driver to 3.4.11 is worth trying.
UPDATED to add a link to the documentation on Gremlin translators.
https://tinkerpop.apache.org/docs/current/reference/#translators
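For illustration, here is a rough sketch of that translator approach using the TinkerPop Java API (this assumes the 3.4.x line, where GroovyTranslator lives in the gremlin-groovy module; the Node.js client's translator is analogous). The labels and property values are placeholders:
import org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyTranslator;
import org.apache.tinkerpop.gremlin.process.traversal.Bytecode;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.VertexProperty;
import org.apache.tinkerpop.gremlin.structure.util.empty.EmptyGraph;

public class TranslateToText {
  public static void main(String[] args) {
    // Build the traversal against an empty graph purely to capture its bytecode;
    // nothing is executed here.
    GraphTraversalSource g = EmptyGraph.instance().traversal();
    Bytecode bytecode = g.addV("person")
        .property(VertexProperty.Cardinality.single, "name", "alice")
        .asAdmin().getBytecode();

    // Turn the bytecode into a Gremlin text query that you can eyeball, send to
    // the Neptune HTTP endpoint, or feed to /explain.
    String text = GroovyTranslator.of("g").translate(bytecode);
    System.out.println(text);
  }
}
Seeing the exact text that the bytecode produces often makes it easier to spot which .property(...) step ended up with an empty or null key, which is what an error fragment like .property(Cardinality, , String) hints at.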
Is there any difference in the patch API between the embedded and standard versions of the server?
Is there a need to configure the document store in some way to enable the patch API?
I'm writing a test which uses embedded RavenDB. The code works correctly on the standard version, but in the test it doesn't. I'm constantly receiving the patch result DocumentDoesNotExists. I've checked with the debugger and the document exists in the store, so it is not a problem with the test.
Here you can find a repro of my issue: https://gist.github.com/pblachut/c2e0e227fa3beb51f4f9403505c292bb
I've reached out to RavenDB support and I have an answer to my question.
There should be no difference between the embedded and normal versions of the server. The problem was that I did not explicitly pass which database I wanted to invoke the batch command against. As a result, I tried to patch a document in the system database.
var result = await documentStore.AsyncDatabaseCommands.ForDatabase("testDb").BatchAsync(new[] {command});
I assumed that the database name would be taken from the session (because I get the documentStore from there), but the database name should always be passed explicitly.
var documentStore = session.Advanced.DocumentStore;
I am writing a collection of entities (i.e. PCollection<Entity>) into Datastore from within a Dataflow job. I am using Java, the Apache Beam SDK, and the Datastore v1 APIs. I write it in the following way:
entities.apply(DatastoreIO.v1().write().withProjectId("my-project-id"));
where entities is a PCollection of type Entity.
For Dataflow jobs, each Entity must have a "complete key". What is the best way to allocate IDs within a Dataflow job? I need "numeric" keys for the entities, and as of now I have a hacky way of doing it which I am not comfortable with:
Key.Builder keyBuilder = DatastoreHelper.makeKey("StudentRecord", UUID.randomUUID().getLeastSignificantBits());
Could someone suggest a better way to do it? I am not familiar with the Datastore v1 APIs, and I am not sure how exactly to perform the RPC call which the documentation prescribes here:
AllocateIds
If someone could provide me some sample code, it would help immensely. Thanks!
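For reference, the closest I have gotten so far with the raw Datastore v1 client looks roughly like this; I am not at all sure this is the right approach, which is part of why I am asking. The kind name and the environment-based client setup are just placeholders:
import com.google.datastore.v1.AllocateIdsRequest;
import com.google.datastore.v1.AllocateIdsResponse;
import com.google.datastore.v1.Key;
import com.google.datastore.v1.client.Datastore;
import com.google.datastore.v1.client.DatastoreException;
import com.google.datastore.v1.client.DatastoreFactory;
import com.google.datastore.v1.client.DatastoreHelper;

public class AllocateIdsSketch {
  public static Key allocateOneKey() throws DatastoreException {
    // Client configured from the DATASTORE_* environment variables.
    Datastore datastore = DatastoreFactory.get().create(
        DatastoreHelper.getOptionsFromEnv().build());

    // An incomplete key: kind only, no id or name; AllocateIds assigns a numeric id.
    Key incomplete = DatastoreHelper.makeKey("StudentRecord").build();

    AllocateIdsResponse response = datastore.allocateIds(
        AllocateIdsRequest.newBuilder().addKeys(incomplete).build());
    return response.getKeys(0);
  }
}
Inside the pipeline I imagine this would live in a DoFn, ideally batching a bundle's worth of incomplete keys into a single AllocateIds call rather than issuing one RPC per element, but I am not sure whether that is the idiomatic way to do it.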