Airflow HTTP call from an unreliable network

I need to fetch data from a REST API via HTTP GET in Apache Airflow (e.g. from https://something.com/api/data).
The data comes in pages with the following structure:
{
  "meta": {
    "size": 50,
    "currentPage": 3,
    "totalPage": 10
  },
  "data": [
    ....
  ]
}
The problem is that the API provider is not reliable; sometimes we get a 504 Gateway Timeout. So I have to keep retrying the API call until currentPage equals totalPage, and retry whenever I get a 504 Gateway Timeout. However, the overall retry process must not exceed 15 minutes.
Is there any way I can achieve this using Apache Airflow?
Thanks

You could use the HTTP Operator from the HTTP provider package. Check the examples and the guide in the provider documentation.
If you don't have it already, start by installing the provider package:
pip install apache-airflow-providers-http
Then you could try it out by sending requests to https://httpbin.org. To do so, create an HTTP connection (for example Conn Id http_default, Conn Type HTTP, Host https://httpbin.org) in the Airflow UI under Admin -> Connections.
You could create Tasks using the SimpleHttpOperator:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    'example_http_operator',
    default_args={
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    },
    start_date=datetime(2021, 10, 9),
) as dag:

    task_get_op = SimpleHttpOperator(
        task_id='get_op',
        method='GET',
        endpoint='get',
        data={"param1": "value1", "param2": "value2"},
        headers={},
    )
By default, under the hood, this operator performs a raise_for_status on the obtained response. So, if the response status code is not in the 1xx or 2xx range, it will raise an exception and the Task will be marked as failed. If you want to customize this behaviour you can provide your own response_check as an argument to the SimpleHttpOperator:
:param response_check: A check against the 'requests' response object.
    The callable takes the response object as the first positional argument
    and optionally any number of keyword arguments available in the context dictionary.
    It should return True for 'pass' and False otherwise.
:type response_check: A lambda or defined function.
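For the paginated API from the question, a custom response_check could look something like this minimal sketch (the all_pages_fetched helper and the api/data endpoint are illustrative, not part of the original answer). A 504 already fails the task through the default raise_for_status behaviour, so the check only needs to cover the pagination condition; returning False fails the task, which in turn triggers Airflow's retry mechanism:

def all_pages_fetched(response):
    # Pass only when the API reports that the last page has been reached;
    # otherwise fail the task so that Airflow retries it.
    meta = response.json()["meta"]
    return meta["currentPage"] >= meta["totalPage"]

task_get_op = SimpleHttpOperator(
    task_id='get_op',
    method='GET',
    endpoint='api/data',
    response_check=all_pages_fetched,
)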
Then, to handle retries on failures as needed, you could use the following parameters, available in any Operator in Airflow (docs); a sketch combining them for the 15-minute constraint follows this list:
retries (int) -- the number of retries that should be performed before failing the task
retry_delay (datetime.timedelta) -- delay between retries
retry_exponential_backoff (bool) -- allow progressive longer waits between retries by using exponential backoff algorithm on retry delay (delay will be converted into seconds)
max_retry_delay (datetime.timedelta) -- maximum delay interval between retries
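As a sketch of how these could be combined to respect the 15-minute constraint from the question (the specific numbers are assumptions, not part of the original answer): 4 retries with a 3-minute delay amount to roughly 12 minutes of waiting between attempts, and execution_timeout caps how long each individual attempt may run.

task_get_op = SimpleHttpOperator(
    task_id='get_op',
    method='GET',
    endpoint='api/data',
    response_check=all_pages_fetched,        # pagination check sketched above
    retries=4,                                # 4 retries x 3 min delay ~ 12 min of waiting
    retry_delay=timedelta(minutes=3),
    retry_exponential_backoff=False,          # keep the delay predictable
    execution_timeout=timedelta(minutes=1),   # cap each attempt so the total stays under ~15 min
)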
Finally, to try out how everything works together, perform a request to an endpoint which will answer with a specific error status code:
task_get_op = SimpleHttpOperator(
    task_id='get_op',
    method='GET',
    endpoint='status/400',  # response status code will be 400
    data={"param1": "value1", "param2": "value2"},
    headers={},
)
Let me know if that works for you!

Related

Spring @StreamListener: Infinite retries with exponential backoff

I'm trying to configure my consumer to work with an exponential backoff, where the message will be processed for a fixed number of attempts, applying the backoff period between them. But I don't get the expected behaviour.
This is my Java code:
@EnableBinding({
    MessagingConfiguration.EventTopic.class
})
public class MessagingConfiguration {

    public interface EventTopic {
        String INPUT = "events-channel";

        @Input(INPUT)
        @Nonnull
        SubscribableChannel input();
    }
}

@StreamListener(MessagingConfiguration.EventTopic.INPUT)
void handle(@Nonnull Message<Event> event) {
    throw new RuntimeException("FAILING!");
}
If I try the following configuration:
spring.cloud.stream:
  bindings:
    events-channel:
      content-type: application/json
      destination: event-develop
      group: group-event-service
      consumer:
        max-attempts: 2
After all retries (20*) I get this message:
Backoff FixedBackOff{interval=0, currentAttempts=10, maxAttempts=9} exhausted for ConsumerRecord(...
2 (consumer.max-attempts) * 10 (FixedBackOff.currentAttempts) = 20* retries
All these retries occur with 1 second of delay (default backoff period)
If I change the configuration to:
spring.cloud.stream:
  bindings:
    events-channel:
      content-type: application/json
      destination: event-develop
      group: group-event-service
      consumer:
        max-attempts: 8
        # Times in milliseconds
        back-off-initial-interval: 1000
        back-off-max-interval: 60000
        back-off-multiplier: 2
The backoff period is applied correctly during the 8 retries (max-attempts), BUT when the 8 retries finish, a new cycle of retries starts indefinitely, blocking the topic.
In future versions I may implement a more sophisticated error-handling system, but for now I only need to discard the message after the retries and move on to the next one.
What am I doing wrong?
I read a lot of questions/answers here, the official documentation and some tutorials on the internet but I didn't find a solution to avoid the infinite loop of retries.
P.S.: I'm working with spring-cloud-stream (3.1.1) and spring-kafka (2.6.6)
This is because the listener container is now configured, by default, with a SeekToCurrentErrorHandler with 10 attempts.
This means you are compounding retries.
You can inject a suitably configured SeekToCurrentErrorHandler using a ListenerContainerCustomizer @Bean.
It is recommended to not have retries configured in both places; either remove the binding configuration and replace it with a suitably configured error handler, or change the error handler to have no retries.

Why doesn't resty.redis work with the ngx.timer?

I've asked here but thought I'd post on SO as well:
given this code:
local redis = require('resty.redis')
local client = redis:new()
client:connect(host, port)

ngx.thread.spawn(function()
    ngx.say(ngx.time(), ' ', #client:keys('*'))
end)

ngx.timer.at(2, function()
    ngx.say(ngx.time(), ' ', #client:keys('*'))
end)
I get this error:
---urhcdhw2pqoz---
1611628086 5
2021/01/26 10:28:08 [error] 4902#24159: *4 lua entry thread aborted: runtime error: ...local/Cellar/openresty/1.19.3.1_1/lualib/resty/redis.lua:349: bad request
stack traceback:
coroutine 0:
[C]: in function 'send'
...local/Cellar/openresty/1.19.3.1_1/lualib/resty/redis.lua:349: in function 'keys'
./src/main.lua:20: in function <./src/main.lua:19>, context: ngx.timer
so it seems that threads work with redis but timers don't. Why is that?
There are two errors in your code.
It is not possible to pass the cosocket object between Lua handlers (emphasis added by me):
The cosocket object created by this API function has exactly the same lifetime as the Lua handler creating it. So never pass the cosocket object to any other Lua handler (including ngx.timer callback functions) and never share the cosocket object between different Nginx requests.
https://github.com/openresty/lua-nginx-module#ngxsockettcp
In your case, the reference to the cosocket object is stored in the client table (client._sock).
ngx.print/ngx.say are not available in the ngx.timer.* context.
https://github.com/openresty/lua-nginx-module#ngxsay (check the context: section).
You can use ngx.log instead (it writes to nginx log, set error_log stderr debug; in nginx.conf to print logs to stderr).
The following code works as expected:
ngx.timer.at(2, function()
    local client = redis:new()
    client:connect('127.0.0.1', 6379)
    ngx.log(ngx.DEBUG, #client:keys('*'))
end)

Airflow Exception - Task received SIGTERM signal

I am running Airflow tasks using the SSH operator. I am pretty sure that the Python program has no errors and runs successfully when I run it manually. But when it is run from Airflow, towards the end of program execution I end up with a SIGTERM error.
I tried to figure it out by looking into various solutions, but nothing worked. I tried increasing killed_task_cleanup_time from 60 to 1200 in the airflow.cfg file. I also tried changing hostname_callable to socket:gethostname in airflow.cfg, as I received the following warning before this error:
Warning: The recorded hostname xxx does not match this instance's hostname
Error:
[2020-10-15 10:45:34,937] {taskinstance.py:954} ERROR - Received SIGTERM. Terminating subprocesses.
[2020-10-15 10:45:34,959] {taskinstance.py:1145} ERROR - SSH operator error: Task received SIGTERM signal
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/airflow/contrib/operators/ssh_operator.py", line 137, in execute
readq, _, _ = select([channel], [], [], self.timeout)
File "/opt/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 956, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
Any ideas and suggestions are really helpful. I have been stuck with this for a day now.
This problem is triggered by the fact that the recorded hostname XXX maps to an IP address that is different from the IP address mapped by the instance's hostname, which results in the SIGTERM error. So you need to specify the IP mapping for the recorded hostname XXX.
Possibly this thread might help? https://issues.apache.org/jira/browse/AIRFLOW-966.
Which version of Airflow are you using, and did you check your Celery broker settings?
The solution seems to be setting the visibility timeout higher than the Celery default of 1 hour, to prevent Celery from re-submitting the job. I believe this only affects tasks created via manual run / CLI (not normally scheduled tasks).
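If the Celery visibility timeout is indeed the culprit, the relevant airflow.cfg section could look roughly like the sketch below (the 7200-second value is an assumption; pick something comfortably longer than your longest-running task):

[celery_broker_transport_options]
# Assumed value: seconds the broker waits for the worker to acknowledge a task
# before re-delivering it to another worker.
visibility_timeout = 7200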

How to extend timeout when waiting for citrus async action to complete?

I'm using Citrus to test a process that invokes a callback after performing several steps.
I've got the following sequence working:
-> httpClient kicks process
<- SUT answers OK
<-> Several Additional Steps
<- SUT invokes httpServer
-> httpServer answers OK
I'm now trying to make it more generic by using the Citrus async container to wait for the SUT invocation in parallel with the Additional Steps execution.
async(
<- SUT invokes httpServer
-> httpServer answers OK
)
-> httpClient kicks process
<- SUT answers OK
<-> Several Additional Steps
The problem I'm facing is that after the last additional step executes, the async container does not seem to wait long enough for my SUT to invoke it. It seems to wait a maximum of 10 seconds.
See below the output and the code snippet (without the additional steps, to keep it simple).
14:14:46,423 INFO port.LoggingReporter|
14:14:46,423 DEBUG port.LoggingReporter| TEST STEP 3/4 SUCCESS
14:14:46,423 INFO port.LoggingReporter|
14:14:46,423 DEBUG port.LoggingReporter| TEST STEP 4/4: echo
14:14:46,423 INFO actions.EchoAction| VM Creation processInstanceID: 3543
14:14:46,423 INFO port.LoggingReporter|
14:14:46,423 DEBUG port.LoggingReporter| TEST STEP 4/4 SUCCESS
14:14:46,530 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:47,530 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:48,530 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:49,528 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:50,529 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:51,530 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:52,526 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:53,529 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:54,525 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:55,525 DEBUG citrus.TestCase| Wait for test actions to finish properly ...
14:14:56,430 INFO port.LoggingReporter|
14:14:56,430 ERROR port.LoggingReporter| TEST FAILED StratusActorSSL.SRCreateVMInitGoodParamCentOST004 <com.grge.citrus.cmptest.stratus> Nested exception is: com.consol.citrus.exceptions.CitrusRuntimeException: Failed to wait for nested test actions to finish properly
at com.consol.citrus.TestCase.finish(TestCase.java:266)
Code snippet
async()
    .actions(
        http().server(extServer)
            .receive()
            .post("/api/SRResolved")
            .contentType("application/json;charset=UTF-8")
            .accept("text/plain,application/json,application/*+json,*/*"),
        http().server("extServer")
            .send()
            .response(HttpStatus.OK)
            .contentType("application/json")
    );
http()
    .client(extClientSSL)
    .send()
    .post("/bpm/process/key/SRCreateVMTest")
    .messageType(MessageType.JSON)
    .contentType(ContentType.APPLICATION_JSON.getMimeType());

http()
    .client(extClientSSL)
    .receive()
    .response(HttpStatus.CREATED)
    .messageType(MessageType.JSON)
    .extractFromPayload("$.processInstanceID", "processId");
echo(" processInstanceID: ${processId}");
Another update... hopefully this might help other Citrus users.
I finally implemented the behaviour I wanted, using the parallel Citrus container as shown below. Nevertheless, I'll leave this question open for a few days as this does not answer my initial question...
parallel().actions(
    sequential().actions(
        http().server(extServer)
            .receive()
            .post("/api/SRResolved")
            .contentType("application/json;charset=UTF-8")
            .accept("text/plain,application/json,application/*+json,*/*"),
        http().server("extServer")
            .send()
            .response(HttpStatus.OK)
            .contentType("application/json")
    ),
    sequential().actions(
        http()
            .client(extClientSSL)
            .send()
            .post("/bpm/process/key/SRCreateVMTest")
            .messageType(MessageType.JSON)
            .contentType(ContentType.APPLICATION_JSON.getMimeType()),
        http()
            .client(stratusClientSSL)
            .receive()
            .response(HttpStatus.CREATED)
            .messageType(MessageType.JSON)
            .extractFromPayload("$.processInstanceID", "processId"),
        echo("VM Creation processInstanceID: ${processId}")
    )
);
The more I think about it, the more I believe this is a bug. When using async (as described above), I expect the async part (and thus the test) to keep waiting until the timeout given on the HTTP server (in my case 60 sec) expires or the expected request is received from the SUT, not to stop after an arbitrary 10-second delay following the end of the non-async part of the test case, unless I missed something about the async container's features and objectives.

How to mark scrape failed because of 503 as error in Scrapy?

So I get status 503 when I crawl. It's retried, but then it gets ignored. I want it to be marked as an error, not ignored. How can I do that?
I prefer to set it in settings.py so it applies to all of my spiders. handle_httpstatus_list seems to only affect one spider.
There are two settings that you should look into:
RETRY_HTTP_CODES:
Default: [500, 502, 503, 504, 408]
Which HTTP response codes to retry. Other errors (DNS lookup issues, connections lost, etc) are always retried.
https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#retry-http-codes
And HTTPERROR_ALLOWED_CODES:
Default: []
Pass all responses with non-200 status codes contained in this list.
https://doc.scrapy.org/en/latest/topics/spider-middleware.html#std:setting-HTTPERROR_ALLOWED_CODES
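A minimal settings.py sketch combining the two (assuming 503 is the status code you care about; the RETRY_HTTP_CODES shown are just the defaults):

# settings.py (sketch)
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]   # keep retrying 503 responses (default list)
HTTPERROR_ALLOWED_CODES = [503]                # let 503 responses that survive retries reach the spider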
In the end, I overrode the retry middleware with just a small change. I set it up so that every time the scraper gives up retrying on something, no matter what the status code is, it is logged as an error.
It seems Scrapy somehow doesn't treat giving up on retrying as an error. That's weird to me.
This is the middleware if anyone wants to use it. Don't forget to activate it in settings.py (see the sketch after the code).
import logging

from scrapy.downloadermiddlewares.retry import RetryMiddleware

logger = logging.getLogger(__name__)


class Retry500Middleware(RetryMiddleware):

    def _retry(self, request, reason, spider):
        retries = request.meta.get('retry_times', 0) + 1
        if retries <= self.max_retry_times:
            logger.debug("Retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
            retryreq = request.copy()
            retryreq.meta['retry_times'] = retries
            retryreq.dont_filter = True
            retryreq.priority = request.priority + self.priority_adjust
            return retryreq
        else:
            # This is the point where I updated it. It used to be `logger.debug`
            # instead of `logger.error`.
            logger.error("Gave up retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
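To activate it, something along these lines in settings.py should work (the myproject.middlewares module path is illustrative; 550 is the default order of the built-in RetryMiddleware):

# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,  # disable the built-in retry middleware
    'myproject.middlewares.Retry500Middleware': 550,             # illustrative path to the custom class
}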
