Unable to prevent requests reaching the endpoint after circuit breaking is in action - linkerd

I am trying to verify linkerd's circuit-breaking configuration by sending requests through a simple, error-prone endpoint deployed as a pod in the same k8s cluster where linkerd is deployed as a DaemonSet.
I can see circuit breaking kick in by observing the logs, but when I hit the endpoint again I still receive a response from it.
Setup and Test
I used the configs below to set up linkerd and the endpoint:
https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/linkerd-egress.yaml
https://raw.githubusercontent.com/zillani/kubex/master/examples/simple-err.yml
endpoint behaviour:
the endpoint always returns 500 Internal Server Error
failure accrual setting: default
responseClassifier: retryable5XX
proxy curl:
http_proxy=$(kubectl get svc l5d -o jsonpath="{.status.loadBalancer.ingress[0].*}"):4140 curl -L http://<loadbalancer-ingress>:8080/simple-err
Observations
1. At the admin metrics endpoint
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/connects" : 505,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/dtab/size.count" : 0,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failed_connect_latency_ms.count" : 0,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/probes" : 8,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/removals" : 2,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/removed_for_ms" : 268542,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/revivals" : 0,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failures" : 505,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException" : 505,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/adds" : 2,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/algorithm/p2c_least_loaded" : 1.0,
"rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/available" : 2.0,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/failures" : 5,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException" : 5,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/pending" : 0.0,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/request_latency_ms.count" : 0,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/requests" : 5,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/budget" : 100.0,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/budget_exhausted" : 5,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/per_request.count" : 0,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/total" : 500,
"rt/outgoing/service/svc/<loadbalancer-ingress>:8080/success" : 0,
2. In the logs
I 0518 10:31:15.816 UTC THREAD23 TraceId:e57aa1baa5148cc5: FailureAccrualFactory marking connection to "$/io.buoyant.rinet/8080/<loadbalancer-ingress>" as dead.
Problem
After the node is marked as dead, a new request to linkerd (the same http_proxy command as above) still reaches the endpoint and returns its response.

This question was answered on the Linkerd community forum. Adding the answer here as well for the sake of completeness:
When failure accrual (circuit breaker) triggers, the endpoint is put into a state called Busy. This actually doesn't guarantee that the endpoint won't be used. Most load balancers (including the default P2CLeastLoaded) will simply pick the healthiest endpoint. In the case where failure accrual has triggered on all endpoints, this means it will have to pick one in the Busy state.
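To make that concrete, here is a toy Python sketch (an illustration only, not linkerd's actual implementation) of power-of-two-choices selection over endpoints whose health state has been set by failure accrual. Note that there is no branch that rejects the request, so when every endpoint is Busy one of them is still picked:

import random

# Toy model of the behaviour described above (not linkerd source code):
# failure accrual marks endpoints Busy, but the balancer still has to pick one.
endpoints = [
    {"name": "pod-a", "state": "Busy", "load": 0},
    {"name": "pod-b", "state": "Busy", "load": 0},
]

def p2c_pick(endpoints):
    # Pick two candidates at random and prefer an Open (healthy) endpoint,
    # breaking ties on load. There is no "fail the request" branch, so even
    # when both candidates are Busy, one of them is returned.
    a, b = random.sample(endpoints, 2)
    def score(ep):
        return (0 if ep["state"] == "Open" else 1, ep["load"])
    return min((a, b), key=score)

chosen = p2c_pick(endpoints)
chosen["load"] += 1
print("request routed to", chosen["name"], "despite state", chosen["state"])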

Related

Error while trying to send logs with rsyslog without local storage

I'm trying to send logs into datadog using rsyslog. Ideally, I'm trying to do this without having the logs stored on the server hosting rsyslog. I've run into an error in my config that I haven't been able to find out much about. The error occurs on startup of rsyslog.
omfwd: could not get addrinfo for hostname '(null)':'(null)': Name or service not known [v8.2001.0 try https://www.rsyslog.com/e/2007 ]
Here's the portion I've added to the default rsyslog config:
module(load="imudp")
input(type="imudp" port="514" ruleset="datadog")
ruleset(name="datadog"){
action(
type="omfwd"
action.resumeRetryCount="-1"
queue.type="linkedList"
queue.saveOnShutdown="on"
queue.maxDiskSpace="1g"
queue.fileName="fwdRule1"
)
$template DatadogFormat,"00000000000000000 <%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% - - - %msg%\n "
$DefaultNetstreamDriverCAFile /etc/ssl/certs/ca-certificates.crt
$ActionSendStreamDriver gtls
$ActionSendStreamDriverMode 1
$ActionSendStreamDriverAuthMode x509/name
$ActionSendStreamDriverPermittedPeer *.logs.datadoghq.com
*.* ##intake.logs.datadoghq.com:10516;DatadogFormat
}
First things first.
The imudp module enables log reception over UDP.
The omfwd module enables log forwarding over TCP, UDP, etc.
So most probably - or at least as far as I can tell - with rsyslog you just want to pick up the log messages and then send them on to Datadog.
I don't know anything about the $ActionSendStreamDriver directives, so I can't help you there. But what jumps out is that in your action you haven't defined where the logs should be sent.
ruleset(name="datadog"){
action(
type="omfwd"
target="10.100.1.1"
port="514"
protocol="udp"
...
)
...
}

Openstack compute node Port Binding Failed

Instance creation failed with the error:
Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance
The controller node does not show any errors.
There is an exception on the compute node:
nova.exception.PortBindingFailed: Binding failed for port b70c2f30-f83c-4cae-abf8-98be39a382d5, please check neutron logs for more information.
Neutron's logs show no errors.
Neutron config:
[linux_bridge]
physical_interface_mappings = provider:ens3
[vxlan]
enable_vxlan = true
local_ip = 10.101.1.46
l2_population = true
[securitygroup]
enable_security_group = true
firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
How can I fix port binding?
1. Check that the port was created successfully with openstack port show b70c2f30-f83c-4cae-abf8-98be39a382d5; I guess it is fine, since Nova got as far as executing the port binding.
2. Check whether all the network components are working with openstack network agent list, and check the logs of any component that is not working.
3. Make sure all the hypervisors have consistent (synchronized) time.
Was this port created first and then attached as an interface? If yes, please check the port owner (project).
Is the port type you created normal or direct?

Airflow HTTP call from unreliable network

I need to fetch data from a REST API via HTTP GET in Apache Airflow (e.g. from https://something.com/api/data).
The data comes in pages with the following structure:
{
    "meta" : {
        "size" : 50,
        "currentPage" : 3,
        "totalPage" : 10
    },
    "data" : [
        ....
    ]
}
The problem is that the API provider is not reliable; sometimes we get a 504 Gateway Timeout. So I have to retry the API call until currentPage equals totalPage, retrying whenever we get a 504 Gateway Timeout. However, the overall retry process must not exceed 15 minutes.
Is there any way I can achieve this using apache airflow?
Thanks
You could use the HTTP operator from the HTTP provider package; check its examples and how-to guide for more details.
If you don't have it already, start by installing the provider package:
pip install apache-airflow-providers-http
Then you could try it out by sending requests to https://httpbin.org. To do so, create an HTTP connection whose host points to https://httpbin.org.
You could create Tasks using the SimpleHttpOperator:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    'example_http_operator',
    default_args={
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    },
    start_date=datetime(2021, 10, 9),
) as dag:
    # Uses the default HTTP connection (http_default); point its host at
    # https://httpbin.org for this example.
    task_get_op = SimpleHttpOperator(
        task_id='get_op',
        method='GET',
        endpoint='get',
        data={"param1": "value1", "param2": "value2"},
        headers={},
    )
By default, under the hood, this operator calls raise_for_status on the obtained response, so if the response status code is a 4xx or 5xx it raises an exception and the task is marked as failed. If you want to customize this behaviour you can provide your own response_check as an argument to the SimpleHttpOperator (a sketch follows the docstring excerpt below):
:param response_check: A check against the 'requests' response object.
The callable takes the response object as the first positional argument and optionally any number of keyword arguments available in the context dictionary.
It should return True for 'pass' and False otherwise.
:type response_check: A lambda or defined function.
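Applied to the paginated API from the question, a response_check could keep failing the task until the last page has arrived (by this point the operator has already raised on 4xx/5xx responses such as 504, which is what triggers a retry). This is only a sketch under those assumptions; the endpoint path and JSON field names mirror the question, and last_page_reached is a hypothetical helper:

# Hypothetical helper: the operator has already raised for 4xx/5xx responses
# (so a 504 fails the task and triggers a retry); this check additionally
# fails the task until the last page has been returned.
def last_page_reached(response):
    meta = response.json()["meta"]
    return meta["currentPage"] >= meta["totalPage"]

# Inside the DAG from the example above:
fetch_data = SimpleHttpOperator(
    task_id='fetch_data',
    method='GET',
    endpoint='api/data',  # hypothetical path, taken from the question
    response_check=last_page_reached,
)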
Then, to handle retries on failures as needed, you could use the following parameters, available on any Operator in Airflow (docs); the sketch after this list shows one way to size them for the question's 15-minute budget:
retries (int) -- the number of retries that should be performed before failing the task
retry_delay (datetime.timedelta) -- delay between retries
retry_exponential_backoff (bool) -- allow progressively longer waits between retries by using an exponential backoff algorithm on retry delay (delay will be converted into seconds)
max_retry_delay (datetime.timedelta) -- maximum delay interval between retries
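For the 15-minute ceiling in the question, one possible approach (an assumption, not an official recommendation) is simply to size retries and retry_delay so that the worst case stays inside the budget. For example, applied to the task from above and reusing the imports from the DAG example:

# Hypothetical numbers: 4 retries with a 3-minute delay give at most about
# 12 minutes of waiting across 5 attempts, leaving some headroom for the
# requests themselves within the 15-minute budget.
task_get_op = SimpleHttpOperator(
    task_id='get_op',
    method='GET',
    endpoint='get',
    retries=4,
    retry_delay=timedelta(minutes=3),
    retry_exponential_backoff=False,  # keep the delays predictable
)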
Finally, to try out how everything works together, perform a request to an endpoint which will answer with a specific error status code:
task_get_op = SimpleHttpOperator(
    task_id='get_op',
    method='GET',
    endpoint='status/400',  # response status code will be 400
    data={"param1": "value1", "param2": "value2"},
    headers={},
)
Let me know if that works for you!

Connecting to AWS Elasticsearch Service using R - Getting 404 Error

I am trying to query AWS ElasticSearch Service (AWS ES) through a package in R called elastic. I am getting an error when trying to connect to the server.
Here is an example:
install.packages("elastic")
library(elastic)
aws_endpoint = "<secret>"
# I am certain the endpoint exists and is correct, as it functions with Kibana
aws_port = 80
# I have tried 9200, 9300, and 443 with no success
connect(es_host = aws_endpoint,
es_port = 80,
errors = "complete")
ping()
Search(index = "foobar", size = 1)$hits$hits
Whether pinging the server or actually trying to search for a document, both return this error:
Error: 404 - no such index
ES stack trace:
type: index_not_found_exception
reason: no such index
resource.type: index_or_alias
resource.id: us-east-1.es.amazonaws.com
index: us-east-1.es.amazonaws.com
I have gone into my AWS ES dashboard and made certain I am using indexes that exist. Why am I getting this error?
I imagine I am misunderstanding something about transport protocols. elastic interacts with Elasticsearch's HTTP API, which I thought would be fine.
How do I establish an appropriate connection between R and AWS ES?
R version 3.3.0 (2016-05-03); elastic_0.7.8
Solved it.
es_path must be specified as an empty string (""). Otherwise, connect() interprets the region part of the AWS endpoint (i.e. us-east-1.es.amazonaws.com) as the path. I imagine connect() then adds the misinterpreted path to the HTTP request, following the format shown here.
connect(es_host = aws_endpoint,
es_port = 80,
errors = "complete",
es_path = ""
)
Just to be clear, the parameters I actually used are shown below, but they should not make a difference; fixing es_path is the key.
connect(es_host = aws_endpoint,
es_port = 443,
es_path = "",
es_transport_schema = "https")

No valid host was found. There are not enough hosts available

I have installed OpenStack (Liberty release). All the services are installed on a VM. Now I am trying to integrate the Ironic service and provision a physical server.
The Nova compute service has been configured for the baremetal hypervisor, and the command "nova hypervisor-stats" shows the correct output.
However, when I try to launch an instance from Horizon, I get the error:
No valid host was found. There are not enough hosts available.
Somehow, the nova compute service is not able to connect to the baremetal node or the Ironic service.
In fact, I have referred to the doc:
openstack troubleshoot doc
but no luck.
Please suggest.
Regards
This typically happens when the Nova scheduler tries to find a suitable host on which to instantiate your VM and does not succeed. The Nova scheduler first runs the list of all available hosts through a series of filters to narrow it down to the best possible hosts capable of running that instance.
nova-scheduler.log:
... Filter ExactRamFilter returned 0 hosts
... Filtering removed all hosts for the request with reservation ID 'r-mld1goh8' and instance ID '98c49d72-9d8e-4377-bbe0-6dbef187e75a'. Filter results: ['RetryFilter: (start: 3, end: 3)', 'AvailabilityZoneFilter: (start: 3, end: 3)', 'ComputeFilter: (start: 3, end: 3)', 'ComputeCapabilitiesFilter: (start: 3, end: 3)', 'ImagePropertiesFilter: (start: 3, end: 3)', 'ExactRamFilter: (start: 3, end: 0)']
ExactRamFilter tries to match a host with exactly the same amount of RAM as the amount specified in the flavor chosen for the VM. Either create a new flavor or use an existing flavor whose RAM exactly matches the hosts', and you should be able to create the VM successfully (unless you run into some other issue).
