Debugging ingestion failures in Kusto - azure-data-explorer

I see a bunch of 'permanent' failures when I run the following command:
.show ingestion failures | where FailureKind == "Permanent"
For all the entries that are returned, the error code is UpdatePolicy_UnknownError.
The Details column for all the entries shows something like this:
Failed to invoke update policy. Target Table = 'mytable', Query = '<some query here>': The remote server returned an error: (409) Conflict.: : :
What does this error mean? How do I find out the root cause of these failures? The information I get from this command is not sufficient. I also copied the OperationId of a sample entry and looked it up against the operations info:
.show operations | where OperationId == '<sample operation id>'
But all I found in the Status was the message 'Failed performing non-transactional update policy'. I know it failed, but how can I find out the underlying reason?

"(409) Conflict" error usually comes from writing to the Azure storage.
In general, this error should be treated as a transient one.
If it happens in the writing of the main part of the ingestion, it should be retried (****).
In your case, it happens in writing the data of the non-transactional update policy - this write is not retried - the data enters the main table, but not the dependent table.
In the case of a transactional update policy, the whole ingestion will be failed and then retried.
(****) There was a bug in treating such an error, it was treated as permanent for a short period for the main ingestion data. The bug should be fixed now.
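If losing the derived rows is not acceptable, the update policy can be made transactional, so a failed policy write fails the whole ingestion and gets retried. A minimal sketch of the control command, assuming hypothetical table and function names (mytable, sourcetable, MyUpdateFunction):
.alter table mytable policy update
@'[{"IsEnabled": true, "Source": "sourcetable", "Query": "MyUpdateFunction()", "IsTransactional": true}]'
The trade-off is that a persistent failure in the update policy query will then block ingestion into the main table as well.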

Related

Resource Error when connecting Dremio to Alteryx

I have data on Dremio and I'm looking to connect it with Alteryx.
It was working fine until I cancelled an Alteryx workflow in the middle of execution. Since then it has always given the error below, and I'm not able to figure out why.
Error: Input Data (1): Error SQLExecute: [Dremio][Connector] (1040) Dremio failed to execute the query: SELECT * FROM "Platform_Reporting"."PlatformActivity" limit 1000
[30039]Query execution error. Details:[
RESOURCE ERROR: Query cancelled by Workload Manager. Query enqueued time of 300.00 seconds exceeded for 'High Cost User Queries' queue.
[Error Id: 3a1e1bb0-18b7-44c0-965a-6933a156ab70 ]
Any help is appreciated!
I got this response from the Alteryx Support team:
Based on the error message, it seems the error sits within Dremio itself. I would advise consulting the admin to check: https://docs.dremio.com/advanced-administration/workload-management/
I would assume the cancellation of the previous pipeline was not sent properly to the queue management, hence the error.

Interpretation of CachedStorageObject error while querying Kusto table

I am running a query from a follower cluster pointing to a table that exists on the leader cluster, and I get the following error:
Partial query failure: An unexpected error occurred. (message: 'StorageException with HttpStatusCode 503 was thrown.: : : ', details: 'Source: Kusto::CachedStorageObject')
Since the error seems to be related to the cache, I am trying to understand how exactly to interpret it. If something is not found in the follower's cache, ADX should automatically fetch the data from the leader's storage, right? I don't quite see why it should fail; it's not clear what the error means.
Judging by the StorageException with HttpStatusCode 503, this appears to be a transient failure in accessing underlying storage objects.
If the issue persists, I would recommend that you open a support ticket for your resource, via the Azure portal.

Google Dataflow writing insufficient data to Datastore

One of my batch jobs failed tonight with a runtime exception. It writes data to Datastore, like 200 other jobs that ran tonight. This one failed with a very long list of causes; the root of it should be this:
Caused by: com.google.datastore.v1.client.DatastoreException: I/O error, code=UNAVAILABLE
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:126)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:95)
at com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:925)
at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.processElement(DatastoreV1.java:892)
Caused by: java.io.IOException: insufficient data written
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3501)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:87)
at com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:925)
at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.processElement(DatastoreV1.java:892)
at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
at com.google.cloud.dataflow.sdk.runners.
How can this happen? It's very similar to all the other jobs I run. I am using Dataflow SDK version 1.9.0 and the standard DatastoreIO.v1().write....
The jobIds with this error message:
2017-08-29_17_05_19-6961364220840664744
2017-08-29_16_40_46-15665765683196208095
Is it possible to retrieve the errors/logs of a job from an outside application (not the Cloud Console), so that jobs which fail for temporary reasons such as quota issues, but would normally succeed, can be restarted automatically?
Thanks in advance
This is most likely because DatastoreIO is trying to write more mutations in one RPC call than the Datastore RPC size limit allows. This is data-dependent, so presumably the data for this job differs somewhat from the data for the other jobs. In any case, this issue was fixed in 2.1.0, so updating your SDK should help.
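If the job is built with Maven, the upgrade is a one-line version change. A sketch, assuming the standard google-cloud-dataflow-java-sdk-all artifact used by the 1.x and 2.x SDK releases:
<dependency>
  <groupId>com.google.cloud.dataflow</groupId>
  <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
  <!-- was 1.9.0; 2.1.0 contains the DatastoreIO fix described above -->
  <version>2.1.0</version>
</dependency>
Note that the 2.x SDKs are based on Apache Beam, so package names move from com.google.cloud.dataflow.sdk to org.apache.beam.sdk and call sites will need small adjustments.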

When to check for "database is locked" errors in SQLite

On the Linux server where our web app runs, we also have a small app that uses SQLite (it is written in C).
For performing database actions we use the following commands:
sqlite3_prepare_v2
sqlite3_bind_text or sqlite3_bind_int
sqlite3_step
sqlite3_finalize
Every now and then there was a concurrency situation and I got the following error:
database is locked
So I thought: "This happens when one process writes a certain record and the
other one is trying to read exactly the same record."
So after every step command where this collision could occur, I checked for this error. When it happened, I waited a few milliseconds and then tried again.
But the sqlite error "database is locked" still occurred.
So I changed every step command and the code lines after it. Somehow I thought that this "database is locked" error could only occur with the step command.
But the error kept coming.
My question is now:
Do I have to check for error code 5 ("database is locked") after every sqlite3 call?
Thanks a lot in advance
If you're receiving error code 5 (busy) you can limit this by using an immediate transaction. If you're able to begin an immediate transaction, SQLite guarantees that you won't receive a busy error until you commit.
Also note that SQLite doesn't have row-level locking; the entire database is locked. With a WAL journal, you can have one writer and multiple readers. With other journaling modes, you can have either one writer or multiple readers, but not both simultaneously.
SQLite Documentation on 'SQLITE_BUSY'
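To illustrate the immediate-transaction approach in C, here is a minimal sketch built on the same prepare/bind/step/finalize calls listed in the question; the table and column names are hypothetical:

#include <stdio.h>
#include <sqlite3.h>

/* Sketch: take the write lock up front with BEGIN IMMEDIATE so that the
   statements inside the transaction cannot fail with SQLITE_BUSY (5).
   Table and column names are hypothetical. */
int update_record(sqlite3 *db, int id, const char *value)
{
    char *errmsg = NULL;

    /* Let SQLite retry a busy lock internally for up to 2 seconds
       before returning SQLITE_BUSY to us. */
    sqlite3_busy_timeout(db, 2000);

    int rc = sqlite3_exec(db, "BEGIN IMMEDIATE", NULL, NULL, &errmsg);
    if (rc != SQLITE_OK) {
        fprintf(stderr, "BEGIN IMMEDIATE failed: %s\n", errmsg);
        sqlite3_free(errmsg);
        return rc;  /* SQLITE_BUSY here means: back off and try again later */
    }

    sqlite3_stmt *stmt = NULL;
    rc = sqlite3_prepare_v2(db, "UPDATE mytable SET val = ? WHERE id = ?",
                            -1, &stmt, NULL);
    if (rc == SQLITE_OK) {
        sqlite3_bind_text(stmt, 1, value, -1, SQLITE_TRANSIENT);
        sqlite3_bind_int(stmt, 2, id);
        rc = sqlite3_step(stmt);   /* SQLITE_DONE on success */
        sqlite3_finalize(stmt);
    }

    if (rc == SQLITE_DONE) {
        /* COMMIT can still return SQLITE_BUSY in rollback-journal mode
           if a reader blocks the exclusive lock, so check it too. */
        rc = sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    } else {
        sqlite3_exec(db, "ROLLBACK", NULL, NULL, NULL);
    }
    return rc;
}

With this pattern, the busy check is concentrated in two places (BEGIN IMMEDIATE and COMMIT) instead of being scattered after every step call.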

Error while saving data to data table in Pega PRPC

While saving data in Pega PRPC using an activity with the Obj-Save method, I got the following error message:
pyCommitError: A commit cannot be performed because a deferred save of instance ANDY-FW-ANDYCARRENTALFW-DATA-CARINFO L3 failed: code: SQLState: Message:
Can anyone share some idea on how to fix this issue?
Andy
This message can appear when a field's length specified in the DB is smaller than the string we are trying to insert into it. There could be other reasons as well; see Tracer.
A deferred save of an instance usually fails because the instance you are trying to commit is locked, or because the record you are trying to commit is stale (someone else committed before your commit).
Typically, this behavior occurs because the lock on the work object is not held while you are trying to save it. Make sure you acquire the lock before updating and saving. Additionally, please check whether the operator has the required privilege.
