429 Client Error: TooManyRequests for url - azure-data-explorer

I have a script that executes an ingestion statement at a fixed interval. In simplified form, it looks like this:
import time
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder, ClientRequestProperties
cluster = "https://<adxname>.centralus.kusto.windows.net"
client_id = "<sp_guid>"
client_secret = "<sp_secret>"
authority_id = "<tenant_guid>"
db = "db-name"
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    cluster, client_id, client_secret, authority_id)
client = KustoClient(kcsb)
query = """
.append my_table <|
another_table | where ... | summarize ... | project ...
"""
while True:
    client.execute(db, query)
    time.sleep(30.0)
So it executes a small query every 30 seconds. The query takes only milliseconds to complete. Lib version: azure-kusto-data==3.1.0.
It works fine for a while, but after some time it starts failing with this error:
requests.exceptions.HTTPError: 429 Client Error: TooManyRequests for
url:
https://adxname.centralus.kusto.windows.net/v1/rest/mgmt
azure.kusto.data.exceptions.KustoApiError: The control command was
aborted due to throttling. Retrying after some backoff might succeed.
CommandType: 'TableAppend', Capacity: 1, Origin:
'CapacityPolicy/Ingestion'.
Looking at the CapacityPolicy/Ingestion mentioned in the error, I cannot see how it can be relevant. This policy is left at its default:
.show cluster policy capacity
"Policy": {
    "IngestionCapacity": {
        "ClusterMaximumConcurrentOperations": 512,
        "CoreUtilizationCoefficient": 0.75
    },
    ...
}
I do not quite understand how it can be related to concurrent operations or core utilization as ingestion is fast and rarely executed.
How do I troubleshoot this issue?

Assuming that you have monitoring or admin permissions on the database, you can run the following to see the ingestion activity for any given time period (ensure that you include all relevant command types in the "in" clause), for example:
.show commands
| where StartedOn > ago(1h)
| where CommandType in ("DataIngestPull", "TableAppend", "TableSetOrReplace", "TableSetOrAppend")
| summarize count() by CommandType
As a side note, for this type of operation, you should consider using materialized views.

According to the error message, the ingestion capacity for your cluster is 1. This likely indicates you're using the dev SKU that has a single node with 2 cores.
With such a setup, only a single ingestion operation can run at a given time. Any additional concurrent ingestions will be throttled.
You can either implement tighter control over the client(s) ingesting into the cluster, so that no more than a single ingestion command attempts to run concurrently and the calling code can recover from throttling errors; or scale the cluster up/out, since adding more nodes/cores increases the ingestion capacity.
You can also verify who/what else is ingesting into your cluster by using .show commands
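If you go the first route (tighter client-side control), here is a minimal sketch of what recovering from throttling could look like, reusing the client and query from the question. The retry counts and delays are arbitrary placeholders, and the except clause assumes any KustoApiError here is the throttling error from the question:
import time
from azure.kusto.data.exceptions import KustoApiError

def execute_with_backoff(client, db, query, max_attempts=5, base_delay=30.0):
    """Run the command, backing off and retrying when the cluster throttles it."""
    for attempt in range(max_attempts):
        try:
            return client.execute(db, query)
        except KustoApiError:
            # Assumption: treat the error as retryable throttling; in real code,
            # inspect the error before deciding to retry.
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 30s, 60s, 120s, ...
The polling loop would then call execute_with_backoff(client, db, query) instead of client.execute(db, query).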

Related

Logs of Azure Durable Function in Application Insights

I see the following logs in Application Insights after running an Azure Durable Function:
Does 'Response time' indicate the execution time of each function? If so, is there a way to run a kusto query to return the Response time and name of each function?
Yes, Response Time is the time taken to complete execution; in other words, Response Time = Latency + Processing time.
You can use the KQL query below to pull the function name and response time:
requests
| project timestamp, functionName=name, FuncexecutionTime=parse_json(customDimensions).FunctionExecutionTimeMs, operation_Id, functionappName=cloud_RoleName

update policy query and ingestion retry in ADX

By default, the update policy on a Kusto table is non-transactional. Let's say I have an update policy defined on a table MyTarget, for which the source is defined in the update policy as MySource, and the update policy is defined as transactional. Ingestion has been set up on the table MySource, so data will be continuously loaded into MySource. Now say a certain ingestion data batch is loaded into MySource; right after that, the query defined in the update policy is triggered. Now let's say this query fails, due to memory issues etc. -- then even the data batch loaded into MySource will not be committed (because the update policy is transactional). I have heard that in this case the ingestion will be retried automatically. Is that so? I haven't found any documentation regarding this retry. Anyway -- my simple question is -- how many times will the retry be attempted, and how long is the interval after each attempt? Are these configurable properties (I am talking about an ADX cluster available through Azure) if I am the owner of the ADX cluster?
Yes, there's an automatic retry for ingestions that failed due to a failure in a transactional update policy.
the full details can be found here: https://learn.microsoft.com/en-us/azure/kusto/management/updatepolicy#failures
Failures are treated as follows:
Non-transactional policy: The failure is ignored by Kusto. Any retry is the responsibility of the data owner.
Transactional policy: The original ingestion operation that triggered the update will fail as well. The source table and the database will not be modified with new data.
In case the ingestion method is pull (Kusto's Data Management service is involved in the ingestion process), there's an automated retry on the entire ingestion operation, orchestrated by Kusto's Data Management service, according to the following logic:
Retries are done until reaching the earliest between the maximum retry period (2 days) and maximum retry attempts (10 attempts).
The backoff period starts from 2 minutes and grows exponentially (2 -> 4 -> 8 -> 16 ... minutes).
In any other case, any retry is the responsibility of the data owner.
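To make that schedule concrete, here is a small sketch of how the backoff plays out under the limits described above (this is only an illustration of the stated numbers, not the service's actual implementation):
from datetime import timedelta

MAX_ATTEMPTS = 10
MAX_RETRY_PERIOD = timedelta(days=2)

elapsed, backoff = timedelta(), timedelta(minutes=2)
for attempt in range(1, MAX_ATTEMPTS + 1):
    elapsed += backoff
    if elapsed > MAX_RETRY_PERIOD:
        break  # the 2-day cap would win before the attempt cap
    print(f"retry {attempt}: backoff {backoff}, total elapsed {elapsed}")
    backoff *= 2  # 2 -> 4 -> 8 -> 16 ... minutes
With these numbers the 10-attempt cap is reached first: the cumulative backoff after 10 attempts is roughly 34 hours, still under the 2-day limit.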

Debugging ingestion failure in Kusto

I see a bunch of 'permanent' failures when I fire the following command:
.show ingestion failures | where FailureKind == "Permanent"
For all the entries that are returned the error code is UpdatePolicy_UnknownError.
The Details column for all the entries shows something like this:
Failed to invoke update policy. Target Table = 'mytable', Query = '<some query here>': The remote server returned an error: (409) Conflict.: : :
What does this error mean? How do I find out the root cause behind these failures? The information I find through this command is not sufficient. I also copied the OperationId of a sample entry and looked it up against the operations info:
.show operations | where OperationId == '<sample operation id>'
But all I found in the Status is the message Failed performing non-transactional update policy. I know it failed, but can we find out the underlying reason?
"(409) Conflict" error usually comes from writing to the Azure storage.
In general, this error should be treated as a transient one.
If it happens in the writing of the main part of the ingestion, it should be retried (****).
In your case, it happens in writing the data of the non-transactional update policy - this write is not retried - the data enters the main table, but not the dependent table.
In the case of a transactional update policy, the whole ingestion will be failed and then retried.
(****) There was a bug in treating such an error, it was treated as permanent for a short period for the main ingestion data. The bug should be fixed now.

flask celery get task real time task status [duplicate]

How does one check whether a task is running in celery (specifically, I'm using celery-django)?
I've read the documentation, and I've googled, but I can't see a call like:
my_example_task.state() == RUNNING
My use-case is that I have an external (java) service for transcoding. When I send a document to be transcoded, I want to check if the task that runs that service is running, and if not, to (re)start it.
I'm using the current stable versions - 2.4, I believe.
Return the task_id (which .delay() gives you) and afterwards ask the Celery instance about the state:
x = method.delay(1, 2)
print(x.task_id)
When asking, get a new AsyncResult using this task_id:
from celery.result import AsyncResult
res = AsyncResult("your-task-id")
res.ready()
Creating an AsyncResult object from the task id is the way recommended in the FAQ to obtain the task status when the only thing you have is the task id.
However, as of Celery 3.x, there are significant caveats that could bite people if they do not pay attention to them. It really depends on the specific use-case scenario.
By default, Celery does not record a "running" state.
In order for Celery to record that a task is running, you must set task_track_started to True. Here is a simple task that tests this:
@app.task(bind=True)
def test(self):
    print(self.AsyncResult(self.request.id).state)
When task_track_started is False, which is the default, the state shown is PENDING even though the task has started. If you set task_track_started to True, then the state will be STARTED.
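A minimal sketch of turning that on, assuming a Celery 4.x+ application object (the app name and broker/backend URLs are placeholders; in 3.x the setting is CELERY_TRACK_STARTED, as noted further below):
from celery import Celery

# Placeholder broker/backend URLs; point these at whatever your deployment actually uses.
app = Celery("proj", broker="redis://localhost:6379/0", backend="redis://localhost:6379/0")
app.conf.task_track_started = True  # tasks now report STARTED instead of sitting at PENDING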
The state PENDING means "I don't know."
An AsyncResult with the state PENDING does not mean anything more than that Celery does not know the status of the task. This could be because of any number of reasons.
For one thing, AsyncResult can be constructed with invalid task ids. Such "tasks" will be deemed pending by Celery:
>>> task.AsyncResult("invalid").status
'PENDING'
Ok, so nobody is going to feed obviously invalid ids to AsyncResult. Fair enough, but it also has the effect that AsyncResult will consider a task that has successfully run, but that Celery has since forgotten, as PENDING. Again, in some use-case scenarios this can be a problem. Part of the issue hinges on how Celery is configured to keep the results of tasks, because it depends on the availability of the "tombstones" in the results backend. ("Tombstones" is the term used in the Celery documentation for the data chunks that record how the task ended.) Using AsyncResult won't work at all if task_ignore_result is True. A more vexing problem is that Celery expires the tombstones by default. The result_expires setting defaults to 24 hours. So if you launch a task, record the id in long-term storage, and more than 24 hours later create an AsyncResult with it, the status will be PENDING.
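The settings behind those caveats can be adjusted on the same app object as in the sketch above; the values below are examples, not recommendations:
from datetime import timedelta

app.conf.task_ignore_result = False          # must be False for AsyncResult to return anything useful
app.conf.result_expires = timedelta(days=7)  # keep tombstones longer than the 24-hour default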
All "real tasks" start in the PENDING state. So getting PENDING on a task could mean that the task was requested but never progressed further than this (for whatever reason). Or it could mean the task ran but Celery forgot its state.
Ouch! AsyncResult won't work for me. What else can I do?
I prefer to keep track of goals rather than keep track of the tasks themselves. I do keep some task information, but it is really secondary to keeping track of the goals. The goals are stored in storage independent from Celery. When a request needs to perform a computation that depends on some goal having been achieved, it checks whether the goal has already been achieved; if yes, it uses this cached goal, otherwise it starts the task that will effect the goal, and sends the client that made the HTTP request a response indicating that it should wait for a result.
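A rough sketch of that goal-tracking pattern (all names here are hypothetical, and the dict stands in for storage shared between the web process and the workers, such as a database or Redis):
# goal_store stands in for your own storage, independent of Celery's result backend.
# A plain dict works only for illustration; real code needs shared storage (DB, Redis, ...).
goal_store = {}

@app.task
def compute_goal(goal_id):
    # Hypothetical work; the important part is that the outcome lands in goal_store.
    goal_store[goal_id] = f"result for {goal_id}"

def get_or_start(goal_id):
    """Serve the cached goal if it exists, otherwise start the task and tell the caller to wait."""
    if goal_id in goal_store:
        return {"status": "done", "result": goal_store[goal_id]}
    compute_goal.delay(goal_id)
    return {"status": "pending"}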
The variable names and hyperlinks above are for Celery 4.x. In 3.x the corresponding variables and hyperlinks are: CELERY_TRACK_STARTED, CELERY_IGNORE_RESULT, CELERY_TASK_RESULT_EXPIRES.
Every Task object has a .request property, which contains its AsyncRequest object. Accordingly, the following line gives the state of a Task task:
task.AsyncResult(task.request.id).state
You can also create custom states and update their values during task execution.
This example is from docs:
@app.task(bind=True)
def upload_files(self, filenames):
    for i, file in enumerate(filenames):
        if not self.request.called_directly:
            self.update_state(state='PROGRESS',
                              meta={'current': i, 'total': len(filenames)})
http://celery.readthedocs.org/en/latest/userguide/tasks.html#custom-states
Old question but I recently ran into this problem.
If you're trying to get the task_id you can do it like this:
import celery
from celery_app import add
from celery import uuid
task_id = uuid()
result = add.apply_async((2, 2), task_id=task_id)
Now you know exactly what the task_id is and can use it to get the AsyncResult:
# grab the AsyncResult
result = celery.result.AsyncResult(task_id)

# print the task id
print(result.task_id)
# 09dad9cf-c9fa-4aee-933f-ff54dae39bdf

# print the AsyncResult's status
print(result.status)
# SUCCESS

# print the result returned
print(result.result)
# 4
Just use this API from the Celery FAQ:
result = app.AsyncResult(task_id)
This works fine.
Answer of 2020:
#### tasks.py
@celery.task()
def mytask(arg1):
    print(arg1)

#### blueprint.py
@bp.route("/args/arg1=<arg1>")
def sleeper(arg1):
    process = mytask.apply_async(args=(arg1,))  # mytask.delay(arg1)
    state = process.state
    return f"Thanks for your patience, your job {process.task_id} \
is being processed. Status {state}"
Try:
task.AsyncResult(task.request.id).state
This will provide the Celery task status. If the Celery task is already in the FAILURE state, it will throw an exception:
raised unexpected: KeyError('exc_type',)
I found helpful information in the Celery Project Workers Guide, inspecting-workers.
For my case, I am checking to see if Celery is running.
inspect_workers = task.app.control.inspect()
if inspect_workers.registered() is None:
    state = 'FAILURE'
else:
    state = str(task.state)
You can play with inspect to fit your needs.
First, in your Celery app:
vi my_celery_apps/app1.py
app = Celery(worker_name)
Next, in the task file, import app from your Celery app module:
vi tasks/task1.py
from my_celery_apps.app1 import app
app.AsyncResult(taskid)
try:
    if task.state.lower() != "success":
        return
except:
    """ do something """
res = method.delay()
print(f"id={res.id}, state={res.state}, status={res.status} ")
print(res.get())
For simple tasks, we can use http://flower.readthedocs.io/en/latest/screenshots.html and http://policystat.github.io/jobtastic/ to do the monitoring.
For complicated tasks, say a task which deals with a lot of other modules, we recommend manually recording the progress and message on the specific task unit.
Apart from the programmatic approaches above, task status can easily be seen using Flower.
Flower is a web-based tool for monitoring and administrating Celery clusters. It offers real-time monitoring using Celery Events:
Task progress and history
Ability to show task details (arguments, start time, runtime, and more)
Graphs and statistics
Official Document:
Flower - Celery monitoring tool
Installation:
$ pip install flower
Usage (launch Flower against your Celery app; "proj" is a placeholder):
$ celery -A proj flower
Then open http://localhost:5555 in the browser.
Update:
There is a versioning issue: flower (version 0.9.7) works only with celery (version 4.4.7). Moreover, when you install flower, it downgrades your higher version of celery to 4.4.7, and this never works for registered tasks.

Airflow- failing a task which returns no data?

What would be the best way to fail a task which is the result of a BCP query (a command-line query for the MS SQL Server I am connecting to)?
I am downloading data from multiple tables every 30 minutes. If the data doesn't exist, the BCP command is still creating a file (0 size). This makes it seem like the task was always successful, but in reality it means that there is missing data on a replication server another team is maintaining.
bcp "SELECT * FROM database.dbo.table WHERE row_date = '2016-05-28' AND interval = 0" queryout /home/var/filename.csv -t, -c -S server_ip -U user -P password
The row_date and interval would be tied to the execution date in Airflow. I would like Airflow to show a failed task instance if the query returned no data, though. Any suggestions?
Check for file size as part of the task?
Create an upstream task which reads the first couple of rows and tells Airflow whether the query was valid or not?
I would use your first suggestion and check for the file size as part of the task.
If it is not possible to do this in the same task as the query, create a new task with that specific purpose and an upstream dependency on the query task. In the case that the file is empty, just raise an exception in the task.
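As a hedged sketch, that check could be its own downstream task that fails when the BCP output is empty (the operator import path and DAG wiring are assumptions; adapt them to your Airflow version and DAG):
import os
from airflow.exceptions import AirflowException
from airflow.operators.python import PythonOperator

def check_bcp_output(path, **_):
    # Fail this task instance if BCP produced a missing or zero-byte file.
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        raise AirflowException(f"BCP output {path} is empty: the query returned no data")

check_file = PythonOperator(
    task_id="check_bcp_output",
    python_callable=check_bcp_output,
    op_kwargs={"path": "/home/var/filename.csv"},  # path taken from the bcp command above
    dag=dag,  # assumes the DAG object that also contains the BCP task
)
# Set the BCP task upstream of check_file so an empty file fails the run:
# bcp_task >> check_file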
