DynamoDB local behaving erratically - amazon-dynamodb

This is a very strange situation that's driving me nuts, and I would really appreciate some help here.
I am using CDK to define the DynamoDB table and associated indices. To test them locally, I installed cdklocal and DynamoDB local using localstack. When the computer (Mac running Ventura 13.1) is restarted, everything works as expected. Here is the script I use to bootstrap and start the stack (this is in a file called startStack.sh):
docker-compose up -d
echo "Waiting for 5 seconds"
sleep 5
cd test-app
cdklocal bootstrap
echo "Waiting for 5 seconds"
sleep 5
cdklocal deploy TestAppStack
#cdklocal deploy TestAppStack/ops-table
DYNAMO_ENDPOINT="http://localhost:4566/" dynamodb-admin &
open http://0.0.0.0:8001
cd ..
The test-app directory contains a local copy of the DynamoDB (and associated indices) definition. I do not encounter any errors running the cdklocal (or cdk) deploy commands so I am assuming that the CDK definition is not an issue.
The docker-compose looks like this:
version: "3.8"
services:
localstack:
container_name: AWS-DEVELOPMENT-WITH-LOCALSTACK
image: localstack/localstack:latest
network_mode: bridge
ports:
- "127.0.0.1:53:53"
- "127.0.0.1:53:53/udp"
- "127.0.0.1:443:443"
- "127.0.0.1:4566:4566"
- "127.0.0.1:4571:4571"
- "127.0.0.1:${PORT_WEB_UI-8080}:${PORT_WEB_UI-8080}"
environment:
- DYNAMODB_SHARE_DB=1
- DISABLE_CORS_CHECKS=1
- SERVICES=s3,dynamodb,sns,sqs,firehose,kinesis,ses,sts,cloudformation
- DEBUG=1
- DATA_DIR=/tmp/localstack/data
- PORT_WEB_UI=8080
- LAMBDA_EXECUTOR=local
- KINESIS_ERROR_PROBABILITY=1.0
- DOCKER_HOST=unix:///var/run/docker.sock
- HOST_TMP_FOLDER=./.localstack
volumes:
- './.localstack:/var/lib/localstack'
- '/var/run/docker.sock:/var/run/docker.sock'
Everything works as expected when I first run the startStack.sh file - the dynamodb-admin window opens up correctly and other interfaces can interact with the local DynamoDB table. But after some time (and I have not been able to pinpoint the cause), all interactions with local DynamoDB start failing with the following error(s):
Bootstrapping environment aws://000000000000/us-west-2...
❌ Environment aws://000000000000/us-west-2 failed bootstrapping: UnknownEndpoint: Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
at Request.ENOTFOUND_ERROR (/usr/local/lib/node_modules/aws-sdk/lib/event_listeners.js:611:46)
at Request.callListeners (/usr/local/lib/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/usr/local/lib/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/usr/local/lib/node_modules/aws-sdk/lib/request.js:686:14)
at error2 (/usr/local/lib/node_modules/aws-sdk/lib/event_listeners.js:443:22)
at ClientRequest.<anonymous> (/usr/local/lib/node_modules/aws-sdk/lib/http/node.js:99:9)
at ClientRequest.emit (node:events:513:28)
at ClientRequest.emit (node:domain:489:12)
at Socket.socketErrorListener (node:_http_client:494:9)
at Socket.emit (node:events:513:28) {
code: 'UnknownEndpoint',
region: 'us-west-2',
hostname: 'localhost',
retryable: true,
originalError: [Error],
time: 2023-01-15T06:46:40.614Z
}
Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
The script hangs at the following message:
[16:52:01] Retrieved account ID 000000000000 from disk cache
[16:52:01] Assuming role 'arn:aws:iam::000000000000:role/cdk-hnb659fds-deploy-role-000000000000-us-west-2'.
[16:52:01] Assuming role failed: Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
[16:52:01] Could not assume role in target account using current credentials Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region. . Please make sure that this role exists in the account. If it doesn't exist, (re)-bootstrap the environment with the right '--trust', using the latest version of the CDK CLI.
current credentials could not be used to assume 'arn:aws:iam::000000000000:role/cdk-hnb659fds-deploy-role-000000000000-us-west-2', but are for the right account. Proceeding anyway.
[16:52:01] Waiting for stack CDKToolkit to finish creating or updating...
Restarting the computer fixes it, but it's not clear what causes the issue in the first place. Restarting Docker does not help either.
Any thoughts on what could be causing the problem and how I can avoid it?

I'm adding this as an answer, although I do not have an affirmative answer I thought I would try to help.
I believe your port is being occupied and thus the process you are running is unable to obtain it resulting in error. Before running the job, check if the port is occupied:
sudo lsof -i :4566

Related

docker run is stalling

I'm trying to run a docker image, which is essentially the default ASP.Net 6 default API:
docker run my-image-name --restart=aways -d -p 80:5111
I've tried a few different ways of running this, but it appears to always do the same thing. First of all, whether I launch in detached mode or interactive, I get the following display:
{"EventId":14,"LogLevel":"Information","Category":"Microsoft.Hosting.Lifetime","Message":"Now listening on: http://[::]:80","State":{"Message":"Now listening on: http://[::]:80","address":"http://[::]:80","{OriginalFormat}":"Now listening on: {address}"}}
{"EventId":0,"LogLevel":"Information","Category":"Microsoft.Hosting.Lifetime","Message":"Application started. Press Ctrl\u002BC to shut down.","State":{"Message":"Application started. Press Ctrl\u002BC to shut down.","{OriginalFormat}":"Application started. Press Ctrl\u002BC to shut down."}}
{"EventId":0,"LogLevel":"Information","Category":"Microsoft.Hosting.Lifetime","Message":"Hosting environment: Production","State":{"Message":"Hosting environment: Production","envName":"Production","{OriginalFormat}":"Hosting environment: {envName}"}}
{"EventId":0,"LogLevel":"Information","Category":"Microsoft.Hosting.Lifetime","Message":"Content root path: /app","State":{"Message":"Content root path: /app","contentRoot":"/app","{OriginalFormat}":"Content root path: {contentRoot}"}}
It then seems to stall - I can't ctrl-c, nor can I attach to the process (it does show under docker ps). I also can't connect to the API:
http://localhost:5111/swagger
Just returns ERR_CONNECTION_REFUSED
My guess here is that there's something wrong with the dockerfile itself - however, it builds fine. The question I have is: how can I debug this in order to determine the error?
I can't add this as a comment because i don't have the reputation so it will have to be an answer instead.
I am new to Docker myself so I am not sure why it is stalling but when you specify the port with -p 80:5111 you are mapping port 5111 in the container to port 80 on the Docker host.
So you should connect with http://localhost:80/swagger instead.

Airflow Log file does not exist:

Airflow was working fine for several weeks and suddenly started getting errors for a few days.
Dags fail randomly with this error.
Log file does not exist: airflow_path/1.log
Fetching from: http://:8793/airflow_path/1.log
*** Failed to fetch log file from worker. The request to ':///' is missing either an 'http://
I had a similar issue, and I figured that in my case the worker node (I was using Celery Executor) was exhausted and therefore unavailable to execute any dags on it, can you check the CPU and memory utilized by the worker node (or its alternative if you are not using celery executor).
You can try to increase the CPU and memory for that applicable node and try.
Happened to me as well using LocalExecutor and an Airflow setup on Docker Compose. Eventually, I figured that the webserver would fail to fetch old logs whenever I recreated my Docker containers. Digging deeper, I realized that the webserver was failing to fetch the logs because it didn't have access to the filesystem of the scheduler (where the logs live).
The fix was to ensure that both the scheduler and the webserver services in docker-compose.yml share a volume with the logs, i.e.:
# docker-compose.yml
version: "3.9"
services:
scheduler:
image: ...
volumes:
- airflow_logs:/airflow/logs
...
webserver:
image: ...
volumes:
- airflow_logs:/airflow/logs
...
volumes:
airflow_logs:

Euca 5.0 Ansible Console Task Failing

Background:
I am only able to get past the ansible console install/config tasks by adding --region localhost to anywhere in: /usr/share/eucalyptus-ansible/roles/cloud-post/tasks/console.yml wherever it calls tools that take that argument.
Otherwise each sub task fails like this: ["euca-describe-images: error: connection error (('Connection aborted.', gaierror(-2, 'Name or service not known')))"]
Running the commands from that playbook directly on the euca server being configured gives the same result unless I specify --region localhost
Problem:
I'm stuck here: [cloud-post : update console route53 system domain for eucalyptus-cloud authentication]
Error: "euform-update-stack: error (ValidationError): No updates are to be performed.", "stderr_lines": ["euform-update-stack: error (ValidationError): No updates are to be performed."]
All services are running except the ImagingBackend is Not Ready
No instances are running according to euca-describe-instances
Images are available:
IMAGE ami-5be483c81cf8bd65c eucalyptus-console-image-5-0-823/eucalyptus-console-image-5-0-823.raw.manifest.xml 000216594841 available private x86_64 machine instance-store hvm
TAG image ami-5be483c81cf8bd65c type eucalyptus-console-image
TAG image ami-5be483c81cf8bd65c version 5.0.823
IMAGE ami-f31092ddb73e29af9 eucalyptus-service-image-v5.0.100/eucalyptus-service-image.raw.manifest.xml 000216594841 available privatx86_64 machine instance-store hvm
TAG image ami-f31092ddb73e29af9 provides imaging,loadbalancing
TAG image ami-f31092ddb73e29af9 type eucalyptus-service-image
TAG image ami-f31092ddb73e29af9 version 5.0.100
---
all:
hosts:
exp-euca.lan.com:
exp-enc-[01:02].lan.com:
vars:
vpcmido_public_ip_range: "192.168.100.5-192.168.100.254"
vpcmido_public_ip_cidr: "192.168.100.1/24"
cloud_system_dns_dnsdomain: "cloud.lan.com"
cloud_public_port: 443
eucalyptus_console_cloud_deploy: yes
cloud_service_image_rpm: no
cloud_properties:
services.imaging.worker.ntp_server: "x.x.x.x"
services.loadbalancing.worker.ntp_server: "x.x.x.x"
children:
cloud:
hosts:
exp-euca.lan.com:
console:
hosts:
exp-euca.lan.com:
node:
hosts:
exp-enc-[01:02].lan.com:
EDIT:
Solved. Details are in the comments of the marked answer.
The name error most likely means that DNS for the domain cloud.lan.com is not being correctly delegated to your deployment. To test this, check if the nameserver is found:
dig +short NS cloud.lan.com
you should see "ns1.cloud.lan.com" and then should be able to use that nameserver to resolve services, e.g.
dig +short ec2.cloud.lan.com #ns1.cloud.lan.com
which should be the IP of the host for the compute service.
The second item is a bug in the ansible playbook that occurs when the stack is already present and up to date. To work around it, you can either update your playbook or delete the stack before running the playbook. Depending on how far the playbook progressed you may have a script to do this:
/usr/local/bin/console-manage-stack -a delete
the related playbook change is https://github.com/AppScale/ats-deploy/pull/36

Airflow (LocalExecutor) - Docker :: Job is failing with Log file does not exist

Airflow version: 1.10.9
Executor : LocalExecutor
Docket Setup
when job runs sometime we are getting following error. I have searched in web, many people faced this issue in celeryExecutor but we are using LocalExecutor(Docker setup). How can I resolve this problem?
*** Log file does not exist: /home/ubuntu/airflow/airflow/logs/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log
*** Fetching from: http://:8793/log/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log
*** Failed to fetch log file from worker. Invalid URL 'http://:8793/log/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log': No host supplied
Here is one approach I've seen when running the scheduler and webserver in their own containers and using LocalExecutor:
Mount a host log directory as a volume into both the scheduler and webserver containers:
volumes:
- /location/on/host/airflow/logs:/opt/airflow/logs
Make sure the user within the airflow containers (usually airflow) has permissions to read and write that directory. If the permissions are wrong you will see an error like the one in your post.
This probably won't scale beyond LocalExecutor usage though.

Devstack - Changing IP address after installation

I have devstack installed on a ubuntu 12.04 and I could get logged into Dashboard , Now I changed the IP of my ubuntu machine. After changing the IP, I couldn't log into Dashboard anymore
I gets the following error message. I can see my old IP in the error message.
ConnectionError at /auth/login/
HTTPConnectionPool(host='OLD_IP_ADDRESS', port=35357): Max retries exceeded with url: /v2.0/tokens (Caused by <class 'socket.error'>: [Errno 113] No route to host)
Request Method: POST
Request URL: http://NEW_IP_ADDRESS/auth/login/
Django Version: 1.4.5
Exception Type: ConnectionError
Exception Value:
HTTPConnectionPool(host='OLD_IP_ADDRESS', port=35357): Max retries exceeded with url: /v2.0/tokens (Caused by <class 'socket.error'>: [Errno 113] No route to host)
Exception Location: /usr/local/lib/python2.7/dist-packages/requests/adapters.py in send, line 246
Python Executable: /usr/bin/python
Python Version: 2.7.3
Python Path:
['/opt/stack/horizon/openstack_dashboard/wsgi/../..',
'/opt/stack/python-keystoneclient',
'/usr/local/lib/python2.7/dist-packages',
'/opt/stack/python-glanceclient/setuptools_git-1.0b1-py2.7.egg',
'/opt/stack/python-glanceclient',
'/opt/stack/python-cinderclient',
Is there a documented procedure available to change the IP address manually ?
My New IP doesn't have connection to internet so I wouldn't be able to redeploy devstack
Thanks guys for your answers..
I missed to update my answer, I fixed that issue in an easy way.
Solution is first run unstack.sh and then run stack.sh once more. It will do the necessary fix. Since I haven't made much progress with Devstack after installation it makes me more confident to run stack.sh
For the second time when you run stack.sh its not needed to connect to internet, So my issue is fixed.
Feel free to share your thoughts on this.
You will need to change the IP address hard-coded in OpenStack configuration files generated by devstack. They are stored in /etc/ and elsewhere.
http://xmodulo.com/2013/04/how-to-change-ip-address-after-openstack-installation-via-devstack.html
Here's a few steps I've taken to get back online.
backup the answers file...
cp packstack-answers-20130417.txt packstack-answers.txt-SAVE
replace ip addresses...
sed -i '/s/10\.10\.248\.11/10\.32\.70\.10/g' packstack-answers-20130417.txt
Delete the cinder loopback devices, installer fails if it exists
losetup -d /dev/loop0
List what's left mounted via the loop.
losetup -a
rm /var/lib/cinder/cinder-volumes
Now rerun the deploy scripts
packstack --answer-file=packstack-answers-20130417.txt
Fix up other IP addressing concerns with nova-manage in the CLI.
Should work from here.

Resources