"kubectl describe pod" does not report proper url of liveness probe - http

Below is the output for the liveness & readiness probes after running kubectl -n mynamespace describe pod pod1:
Liveness: http-get http://:8080/a/b/c/.well-known/heartbeat delay=3s timeout=3s period=10s #success=1 #failure=3
Readiness: http-get http://:8080/a/b/c/.well-known/heartbeat delay=3s timeout=3s period=10s #success=1 #failure=3
Is this a valid (working) URL: http://:80/?
What does #success=1 #failure=3 mean?

The output is completely correct:
http://:8080 means the kubelet performs an HTTP GET on port 8080 inside your pod (the empty host defaults to the pod's IP).
#success=1 is the success threshold of 1 (the default), so the first successful response marks the pod as live or ready.
#failure=3 is the failure threshold of 3 (the default again), so after three consecutive failed calls the pod is marked unready (readiness) or its container is restarted (liveness).
See the official docs: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes
You can run this command to see how the probes are configured:
kubectl -n mynamespace get pod pod1 -o yaml
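For reference, a probe configuration that would render exactly as the describe output above looks roughly like this. This is a minimal sketch, not the asker's actual manifest: the pod/container names and the image are placeholders, and only the probe fields matter.
kubectl -n mynamespace apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /a/b/c/.well-known/heartbeat
        port: 8080               # renders as http://:8080/... (empty host = pod IP)
      initialDelaySeconds: 3     # delay=3s
      timeoutSeconds: 3          # timeout=3s
      periodSeconds: 10          # period=10s
      successThreshold: 1        # #success=1
      failureThreshold: 3        # #failure=3
    readinessProbe:
      httpGet:
        path: /a/b/c/.well-known/heartbeat
        port: 8080
      initialDelaySeconds: 3
      timeoutSeconds: 3
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
EOF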

Related

DynamoDB local behaving erratically

This is a very strange situation that's driving me nuts, and I would really appreciate some help here.
I am using CDK to define the DynamoDB table and associated indices. To test them locally, I installed cdklocal and DynamoDB local using localstack. When the computer (Mac running Ventura 13.1) is restarted, everything works as expected. Here is the script I use to bootstrap and start the stack (this is in a file called startStack.sh):
docker-compose up -d
echo "Waiting for 5 seconds"
sleep 5
cd test-app
cdklocal bootstrap
echo "Waiting for 5 seconds"
sleep 5
cdklocal deploy TestAppStack
#cdklocal deploy TestAppStack/ops-table
DYNAMO_ENDPOINT="http://localhost:4566/" dynamodb-admin &
open http://0.0.0.0:8001
cd ..
The test-app directory contains a local copy of the DynamoDB (and associated indices) definition. I do not encounter any errors running the cdklocal (or cdk) deploy commands so I am assuming that the CDK definition is not an issue.
The docker-compose looks like this:
version: "3.8"
services:
localstack:
container_name: AWS-DEVELOPMENT-WITH-LOCALSTACK
image: localstack/localstack:latest
network_mode: bridge
ports:
- "127.0.0.1:53:53"
- "127.0.0.1:53:53/udp"
- "127.0.0.1:443:443"
- "127.0.0.1:4566:4566"
- "127.0.0.1:4571:4571"
- "127.0.0.1:${PORT_WEB_UI-8080}:${PORT_WEB_UI-8080}"
environment:
- DYNAMODB_SHARE_DB=1
- DISABLE_CORS_CHECKS=1
- SERVICES=s3,dynamodb,sns,sqs,firehose,kinesis,ses,sts,cloudformation
- DEBUG=1
- DATA_DIR=/tmp/localstack/data
- PORT_WEB_UI=8080
- LAMBDA_EXECUTOR=local
- KINESIS_ERROR_PROBABILITY=1.0
- DOCKER_HOST=unix:///var/run/docker.sock
- HOST_TMP_FOLDER=./.localstack
volumes:
- './.localstack:/var/lib/localstack'
- '/var/run/docker.sock:/var/run/docker.sock'
Everything works as expected when I first run the startStack.sh file - the dynamodb-admin window opens up correctly and other interfaces can interact with the local DynamoDB table. But after some time (and I have not been able to pinpoint the cause), all interactions with local DynamoDB start failing with the following error(s):
Bootstrapping environment aws://000000000000/us-west-2...
❌ Environment aws://000000000000/us-west-2 failed bootstrapping: UnknownEndpoint: Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
at Request.ENOTFOUND_ERROR (/usr/local/lib/node_modules/aws-sdk/lib/event_listeners.js:611:46)
at Request.callListeners (/usr/local/lib/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/usr/local/lib/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/usr/local/lib/node_modules/aws-sdk/lib/request.js:686:14)
at error2 (/usr/local/lib/node_modules/aws-sdk/lib/event_listeners.js:443:22)
at ClientRequest.<anonymous> (/usr/local/lib/node_modules/aws-sdk/lib/http/node.js:99:9)
at ClientRequest.emit (node:events:513:28)
at ClientRequest.emit (node:domain:489:12)
at Socket.socketErrorListener (node:_http_client:494:9)
at Socket.emit (node:events:513:28) {
code: 'UnknownEndpoint',
region: 'us-west-2',
hostname: 'localhost',
retryable: true,
originalError: [Error],
time: 2023-01-15T06:46:40.614Z
}
Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
The script hangs at the following message:
[16:52:01] Retrieved account ID 000000000000 from disk cache
[16:52:01] Assuming role 'arn:aws:iam::000000000000:role/cdk-hnb659fds-deploy-role-000000000000-us-west-2'.
[16:52:01] Assuming role failed: Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
[16:52:01] Could not assume role in target account using current credentials Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region. . Please make sure that this role exists in the account. If it doesn't exist, (re)-bootstrap the environment with the right '--trust', using the latest version of the CDK CLI.
current credentials could not be used to assume 'arn:aws:iam::000000000000:role/cdk-hnb659fds-deploy-role-000000000000-us-west-2', but are for the right account. Proceeding anyway.
[16:52:01] Waiting for stack CDKToolkit to finish creating or updating...
Restarting the computer fixes it, but it's not clear what causes the issue in the first place. Restarting Docker does not help either.
Any thoughts on what could be causing the problem and how I can avoid it?
I'm adding this as an answer, although I don't have a definitive one; I thought I would try to help.
I believe the port is already occupied, so the process you are running cannot bind to it, which results in the error. Before running the script, check whether the port is in use:
sudo lsof -i :4566
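If the port is taken, lsof will show the owning process. A small pre-flight sketch you could put at the top of startStack.sh (the container name comes from the docker-compose above; the rest is an assumption, not part of the original script):
# Refuse to start if port 4566 is already in use, and show who owns it.
if lsof -i :4566 >/dev/null 2>&1; then
  echo "Port 4566 is already in use:"
  lsof -i :4566
  echo "Stop the owning process or remove the old container, e.g.:"
  echo "  docker rm -f AWS-DEVELOPMENT-WITH-LOCALSTACK"
  exit 1
fi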

When applying "helm upgrade", ingress nginx says failed calling webhook. (details below)

UPGRADE FAILED: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post : x509: certificate signed by unknown authority
This is the exact error I get when running helm upgrade.
I tried applying the previous local values file with helm upgrade; that did not work.
Running the following command fixed the problem, although I have not been able to find the root cause:
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
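Before deleting it, you can check whether the webhook's CA bundle has drifted from the certificate the admission service actually uses. This is a hedged sketch; the ingress-nginx namespace and the ingress-nginx-admission secret name assume a default chart installation:
# CA bundle the webhook currently trusts (first few bytes are enough to compare)
kubectl get validatingwebhookconfiguration ingress-nginx-admission \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | head -c 40; echo
# Certificate/CA actually stored for the admission service
kubectl -n ingress-nginx get secret ingress-nginx-admission -o yaml
If the two no longer match, deleting the ValidatingWebhookConfiguration as shown above and re-running helm upgrade typically lets the chart's admission hook job recreate it with a fresh certificate.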

Use kubectl patch to add DNS Rewrite Rule to CoreDNS Configmap

I want to use the kubectl patch command to add a DNS rewrite rule to the coredns configmap, as described at Custom DNS Entries For Kubernetes. The default config map looks like this:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
....
and I want to add the line
rewrite name old.name new.name
but how to specify adding a line within the ".:53" element is confounding me.
I know that I can get a similar result using kubectl get ... | sed ... | kubectl replace -f -, but that would look kind of ugly, plus I want to expand my knowledge of kubectl patch using JSON. Thanks!
You cannot modify the ConfigMap with patch in your case.
data.Corefile is a key, and its value (the Corefile content) is of type string.
It is treated by the api-server as a string of bytes; you cannot patch part of a string with kubectl patch.
And secondly:
I want to expand my knowledge of kubectl patch using JSON
The Corefile is not even valid JSON. Even if it were, the api-server doesn't see JSON/YAML here; to the api-server it is just an opaque string of characters.
So what can you do?
You are left with kubectl get ... | sed ... | kubectl replace -f -, and that is a totally valid solution.
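For completeness, a sketch of that pipeline (assumes GNU sed; old.name/new.name are the hypothetical names from the question; the rewrite line is inserted right after errors while preserving the block scalar's indentation):
kubectl -n kube-system get configmap coredns -o yaml \
  | sed 's/^\( *\)errors$/\1errors\n\1rewrite name old.name new.name/' \
  | kubectl replace -f -
Since the Corefile already loads the reload plugin, CoreDNS should pick up the change shortly after the ConfigMap is replaced, without restarting the pods.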

Nginx fails to restart via Ansible

I have a task in a playbook that tries to restart nginx via a handler as per usual:
- name: restart nginx
  service: name=nginx state=restarted
It gives me the following error:
RUNNING HANDLER [webtier : restart nginx] **************************************
fatal: [vagrant]: FAILED! => {"changed": false, "msg": "Unable to restart service nginx: Failed to restart nginx.service: Connection timed out\nSee system logs and 'systemctl status nginx.service' for details.\n"}
Until recently this worked with the sudo: yes directive, and the above error did not appear.
But this time, with sudo: yes added:
- name: restart nginx
  service: name=nginx state=restarted
  sudo: yes
it gives the following error:
ERROR! conflicting action statements: service, sudo
The error appears to be in '/Users/mac/Documents/GitHub/petalandstem/ansible/roles/webtier/handlers/main.yml': line 28, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: restart nginx
^ here
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
How can I restart nginx successfully?
The correct syntax is either INI
- name: restart nginx
  service: name=nginx state=restarted
  become: true
  become_method: sudo
or YAML
- name: restart nginx
  service:
    name: nginx
    state: restarted
  become: true
  become_method: sudo
See Understanding privilege escalation: become.
Ansible 1.x: sudo: yes
Ansible 2.x: become: yes
That's because become_method is now a configurable choice, but the default is "sudo".
--become-method=BECOME_METHOD
privilege escalation method to use (default=sudo),
valid choices: [ sudo | su | pbrun | pfexec | doas | dzdo | ksu | runas | machinectl ]
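The same escalation can also be requested on the command line instead of in the task or handler (a sketch; the playbook and inventory names are placeholders):
ansible-playbook -i inventory site.yml --become --become-method=sudo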
I was facing the same issue.
This happened to me because httpd was already running on port 80, so I had to stop the httpd service:
$ service httpd stop
Then try the ansible-playbook again.
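To confirm what is actually holding port 80 before stopping anything, a quick check (either command, depending on what is installed):
sudo lsof -i :80             # show the process listening on port 80
sudo ss -tlnp | grep ':80 '  # alternative using ss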
First, don't edit the files in sites-enabled; create links there and edit the files in sites-available.
For me the problem was in the sites-enabled folder.
When you delete the default site from the sites-available folder, you also need to delete the corresponding link from sites-enabled.
After deleting the stale default link from sites-enabled, it worked for me.
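A sketch of that cleanup, assuming the stock Debian/Ubuntu layout (the paths are the distribution defaults, not taken from the question):
sudo rm /etc/nginx/sites-enabled/default   # remove the stale symlink
sudo nginx -t                              # validate the remaining configuration
sudo systemctl restart nginx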

Airflow live executor logs with DaskExecutor

I have an Airflow installation (on Kubernetes). My setup uses DaskExecutor. I also configured remote logging to S3. However when the task is running I cannot see the log, and I get this error instead:
*** Log file does not exist: /airflow/logs/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
*** Fetching from: http://airflow-worker-74d75ccd98-6g9h5:8793/log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-74d75ccd98-6g9h5', port=8793): Max retries exceeded with url: /log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7d0668ae80>: Failed to establish a new connection: [Errno -2] Name or service not known',))
Once the task is done, the log is shown correctly.
I believe what Airflow is doing is:
- for finished tasks, read the logs from S3
- for running tasks, connect to the executor's log server endpoint and show that
It looks like Airflow uses celery.worker_log_server_port to connect to my Dask worker to fetch logs from there.
How do I configure the DaskExecutor to expose a log server endpoint?
My configuration:
[core] remote_logging = True
[core] remote_base_log_folder = s3://some-s3-path
[core] executor = DaskExecutor
[dask] cluster_address = 127.0.0.1:8786
[celery] worker_log_server_port = 8793
What I verified:
- the log file exists and is being written to on the executor while the task is running
- netstat -tunlp on the executor container shows no extra port exposed where logs could be served from
UPDATE
Have a look at the serve_logs Airflow CLI command; I believe it does exactly the same thing.
We solved the problem by simply starting a Python HTTP server on the worker.
Dockerfile:
RUN mkdir -p $AIRFLOW_HOME/serve
RUN ln -s $AIRFLOW_HOME/logs $AIRFLOW_HOME/serve/log
worker.sh (run by Docker CMD):
#!/usr/bin/env bash
cd $AIRFLOW_HOME/serve
python3 -m http.server 8793 &
cd -
dask-worker "$@"   # forward the container's arguments to dask-worker
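Alternatively, a hedged sketch based on the serve_logs hint above (not something verified here): let Airflow's own log server do the serving instead of http.server, since it listens on the configured worker_log_server_port and serves logs in the layout the webserver expects:
#!/usr/bin/env bash
# worker.sh variant: serve logs with Airflow's built-in log server
airflow serve_logs &
dask-worker "$@"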
