New EKS node instance not able to join cluster, getting "cni plugin not initialized" - terraform-provider-aws

I am pretty new to Terraform and am trying to create a new EKS cluster with a node group and launch template. The EKS cluster, node group, launch template, and nodes were all created successfully. However, when I changed the desired size of the node group (using Terraform or the AWS Management Console), it would fail. No error was reported in the node group's Health issues tab. Digging further, I found that new instances were launched by the Auto Scaling group, but they were not able to join the cluster.
Looking into the troubled instances, I found the following log entries by running "sudo journalctl -f -u kubelet":
Jan 27 19:32:32 ip-10-102-21-129.us-east-2.compute.internal kubelet[3168]: E0127 19:32:32.612322 3168 eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node "ip-10-102-21-129.us-east-2.compute.internal" not found"
Jan 27 19:32:32 ip-10-102-21-129.us-east-2.compute.internal kubelet[3168]: E0127 19:32:32.654501 3168 kubelet.go:2427] "Error getting node" err="node "ip-10-102-21-129.us-east-2.compute.internal" not found"
Jan 27 19:32:32 ip-10-102-21-129.us-east-2.compute.internal kubelet[3168]: E0127 19:32:32.755473 3168 kubelet.go:2427] "Error getting node" err="node "ip-10-102-21-129.us-east-2.compute.internal" not found"
Jan 27 19:32:32 ip-10-102-21-129.us-east-2.compute.internal kubelet[3168]: E0127 19:32:32.776238 3168 kubelet.go:2352] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jan 27 19:32:32 ip-10-102-21-129.us-east-2.compute.internal kubelet[3168]: E0127 19:32:32.856199 3168 kubelet.go:2427] "Error getting node" err="node "ip-10-102-21-129.us-east-2.compute.internal" not found"
It looked like the issue had something to do with the CNI add-on. I googled it, and others suggested checking the logs inside the /var/log/aws-routed-eni directory. I could find that directory and its logs on the working nodes (the ones created initially when the EKS cluster was created), but the same directory and log files do not exist on the newly launched instances (the ones created after the cluster was created, by changing the desired node size).
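To summarize, these are the checks I describe above, run on an affected instance (paths and service names are the ones mentioned in this post; how you reach the instance, e.g. SSH or SSM, is up to you):
sudo journalctl -f -u kubelet        # kubelet keeps logging "cni plugin not initialized"
ls -l /var/log/aws-routed-eni/       # present on the original nodes, missing on the new ones
sudo more /var/log/user-data.log     # confirms the bootstrap script ran at startup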
The image I used for the node-group is ami-0af5eb518f7616978 (amazon/amazon-eks-node-1.24-v20230105)
Here is what my script looks like:
resource "aws_eks_cluster" "eks-cluster" {
name = var.mod_cluster_name
role_arn = var.mod_eks_nodes_role
version = "1.24"
vpc_config {
security_group_ids = [var.mod_cluster_security_group_id]
subnet_ids = var.mod_private_subnets
endpoint_private_access = "true"
endpoint_public_access = "true"
}
}
resource "aws_eks_node_group" "eks-cluster-ng" {
cluster_name = aws_eks_cluster.eks-cluster.name
node_group_name = "eks-cluster-ng"
node_role_arn = var.mod_eks_nodes_role
subnet_ids = var.mod_private_subnets
#instance_types = ["t3a.medium"]
scaling_config {
desired_size = var.mod_asg_desired_size
max_size = var.mod_asg_max_size
min_size = var.mod_asg_min_size
}
launch_template {
#name = aws_launch_template.eks_launch_template.name
id = aws_launch_template.eks_launch_template.id
version = aws_launch_template.eks_launch_template.latest_version
}
lifecycle {
create_before_destroy = true
}
}
resource "aws_launch_template" "eks_launch_template" {
name = join("", [aws_eks_cluster.eks-cluster.name, "-launch-template"])
vpc_security_group_ids = [var.mod_node_security_group_id]
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_size = var.mod_ebs_volume_size
volume_type = "gp2"
#encrypted = false
}
}
lifecycle {
create_before_destroy = true
}
image_id = var.mod_ami_id
instance_type = var.mod_eks_node_instance_type
metadata_options {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 2
}
user_data = base64encode(<<-EOF
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
set -ex
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
B64_CLUSTER_CA=${aws_eks_cluster.eks-cluster.certificate_authority[0].data}
API_SERVER_URL=${aws_eks_cluster.eks-cluster.endpoint}
K8S_CLUSTER_DNS_IP=172.20.0.10
/etc/eks/bootstrap.sh ${aws_eks_cluster.eks-cluster.name} --apiserver-endpoint $API_SERVER_URL --b64-cluster-ca $B64_CLUSTER_CA
--==MYBOUNDARY==--\
EOF
)
tag_specifications {
resource_type = "instance"
tags = {
Name = "EKS-MANAGED-NODE"
}
}
}
Another thing I noticed is that I tagged the instance Name as "EKS-MANAGED-NODE". That tag showed up correctly on the nodes created when the EKS cluster was created. However, on any new nodes created afterward, the Name changed to "EKS-MANAGED-NODEGROUP-NODE".
I wonder if that indicates an issue?
I checked the log and confirmed that the user data was picked up and ran when the instances started up.
sh-4.2$ more user-data.log
B64_CLUSTER_CA=LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJek1ERXlOekU
0TlRrMU1Wb1hEVE16TURFeU5E (deleted the rest)
API_SERVER_URL=https://EC283069E9FF1B33CD6C59F3E3D0A1B9.gr7.us-east-2.eks.amazonaws.com
K8S_CLUSTER_DNS_IP=172.20.0.10
/etc/eks/bootstrap.sh dev-test-search-eks-oVpBNP0e --apiserver-endpoint https://EC283069E9FF1B33CD6C59F3E3D0A1B9.gr7.us-east-2.eks.amazonaws.com --b64-cluster-ca LS0tLS
1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakND...(deleted the rest)
Using kubelet version 1.24.7
true
Using containerd as the container runtime
true
‘/etc/eks/containerd/containerd-config.toml’ -> ‘/etc/containerd/config.toml’
‘/etc/eks/containerd/sandbox-image.service’ -> ‘/etc/systemd/system/sandbox-image.service’
Created symlink from /etc/systemd/system/multi-user.target.wants/containerd.service to /usr/lib/systemd/system/containerd.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/sandbox-image.service to /etc/systemd/system/sandbox-image.service.
‘/etc/eks/containerd/kubelet-containerd.service’ -> ‘/etc/systemd/system/kubelet.service’
Created symlink from /etc/sy
I confirmed that the role being specified has all the required permissions; the same role is used by another EKS cluster, and I am trying to create a new cluster based on that existing one using Terraform.
I tried removing the launch template and letting AWS use the default one. Then the new nodes had no issue joining the cluster.
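For comparison, the user data that actually reaches a working node versus a failing one can be dumped with the AWS CLI (a hedged aside, not something from my original troubleshooting; the instance ID below is a placeholder):
aws ec2 describe-instance-attribute --instance-id i-0123456789abcdef0 \
  --attribute userData --query 'UserData.Value' --output text | base64 -d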
I looked at my launch template script and at the registry documentation (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template), and nowhere does it mention that I need to manually add or run the CNI plugin.
So I don't understand why the CNI plugin was not installed automatically and why the instances are not able to join the cluster.
Any help is appreciated.

Related

How to add custom log file for every day entries in wso2 apim?

I am trying to configure the gateway access log of WSO2 APIM (4.0.0) so that it is written to a separate log file every day and contains the API username and API name. To form the structure of the log file I followed the answer to the following question.
The log file structure I have as below:
datetime | remoteIp | username | invoked_api_name | api_url | request | response
Right now all entries are being written to wso2carbon.log.
I would like them to be written to a file with the following name pattern:
custom_access_log_gwyyyy-mm-dd.log
Any help is welcome!
You can introduce an extra Log Appender to collect the specific Handler logs. Find sample instructions below.
Open <apim>/repository/conf/log4j2.properties and add the following to create a Log Appender:
appender.APIHANDLER_LOG.type = RollingFile
appender.APIHANDLER_LOG.name = APIHANDLER_LOG
appender.APIHANDLER_LOG.fileName = ${sys:carbon.home}/repository/logs/api-log.log
appender.APIHANDLER_LOG.filePattern = ${sys:carbon.home}/repository/logs/api-log-%d{MM-dd-yyyy}.log
appender.APIHANDLER_LOG.layout.type = PatternLayout
appender.APIHANDLER_LOG.layout.pattern = TID: [%tenantId] [%appName] [%d] %5p {%c} - %m%ex%n
appender.APIHANDLER_LOG.policies.type = Policies
appender.APIHANDLER_LOG.policies.time.type = TimeBasedTriggeringPolicy
appender.APIHANDLER_LOG.policies.time.interval = 1
appender.APIHANDLER_LOG.policies.time.modulate = true
appender.APIHANDLER_LOG.policies.size.type = SizeBasedTriggeringPolicy
appender.APIHANDLER_LOG.policies.size.size=10MB
appender.APIHANDLER_LOG.strategy.type = DefaultRolloverStrategy
appender.APIHANDLER_LOG.strategy.max = 20
appender.APIHANDLER_LOG.filter.threshold.type = ThresholdFilter
appender.APIHANDLER_LOG.filter.threshold.level = DEBUG
Add the created Appender to the appenders property at the top of the log4j2.properties file:
appenders=APIHANDLER_LOG, CARBON_CONSOLE, ..
Configure your package to log to the new Appender:
logger.api-log-handler.name = com.sample.handlers.APILogHandler
logger.api-log-handler.level = DEBUG
logger.api-log-handler.appenderRef.APIHANDLER_LOG.ref = APIHANDLER_LOG
logger.api-log-handler.additivity = false
loggers = api-log-handler, AUDIT_LOG, ...
Save the configuration and invoke the API. The logs will now be printed to a file called api-log.log.
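As a quick sanity check (not part of the original steps, assuming the default logs directory and using APIM_HOME for the <apim> path above), you can watch the new file while invoking an API:
tail -f $APIM_HOME/repository/logs/api-log.log     # live handler logs written by the new appender
ls $APIM_HOME/repository/logs/api-log-*.log        # rolled files produced by the time/size policies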

Nomad : raw_exec install nginx

I want to use raw_exec to install nginx. Is this possible, or how can this be done with raw_exec only? The code below will not start/run.
job "raw-exec" {
datacenters = ["dc1"]
type = "service"
group "exec" {
count = 1
update {
max_parallel = 1
min_healthy_time = "10s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
}
task "raw-exec-test" {
driver = "raw_exec"
config {
command = "/bin/apt"
args = ["-y", "install", "nginx.service"]
}
resources {
cpu = 100
memory = 125
}
}
}
}
Without a job status it is hard to troubleshoot what's wrong.
nomad job status raw-exec will show your job status. It will also show the allocations created by the job.
You can check what's wrong with the allocations (the sets of tasks that should run on a particular node) that Nomad creates with nomad alloc status YOUR-ALLOC-ID.
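Put together, the troubleshooting commands look like this (YOUR-ALLOC-ID is whatever nomad job status prints for your job):
nomad job status raw-exec          # job status plus the allocations it created
nomad alloc status YOUR-ALLOC-ID   # allocation details, task events and failure messages
nomad alloc logs YOUR-ALLOC-ID     # stdout of the task, useful when the command itself fails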
I've run the following Nomad job specification and it worked well on my MacBook.
I started Nomad with nomad agent -dev in one terminal window, created the file test.job in another terminal window, ran nomad job run test.job, and it installed the htop software on the MacBook.
job "raw-exec" {
datacenters = ["dc1"]
type = "batch"
group "exec" {
count = 1
task "raw-exec-test" {
driver = "raw_exec"
config {
command = "brew"
args = ["install", "htop"]
}
resources {
cpu = 100
memory = 125
}
}
}
}
Notice that I've changed the job type from service to batch. Batch jobs are designed to run once, while services should be up and running all the time. I suppose you want your apt install -y nginx command to run only once. You can read more about job types here.
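For a Linux client like in your question, the one-shot command the batch task should wrap is the plain apt install below (note the package is called nginx, not nginx.service; this is an aside, not something I ran against your cluster):
sudo apt-get install -y nginx      # equivalent of the brew example above, run once by the batch job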

Jupyterhub K8s - Issue with Changing User from Jovyan to NB_USER

Everything works well until we want to set NB_USER to the logged-in user. When I change the config to run as root with start.sh as the default cmd, I get the error below in the log and the container fails to start. Any help is highly appreciated.
After running the container as root, the log shows the following error:
Set username to: user1
Relocating home dir to /home/user1
mv: cannot move '/home/jovyan' to '/home/user1': Device or resource busy
Here is the config.yaml
singleuser:
  defaultUrl: "/lab"
  uid: 0
  fsGid: 0
hub:
  extraConfig: |
    c.KubeSpawner.args = ['--allow-root']
    c.Spawner.cmd = ['start.sh', 'jupyterhub-singleuser']

    def notebook_dir_hook(spawner):
        spawner.environment = {'NB_USER': spawner.user.name, 'NB_UID': '1500'}

    c.Spawner.pre_spawn_hook = notebook_dir_hook

    from kubernetes import client

    def modify_pod_hook(spawner, pod):
        pod.spec.containers[0].security_context = client.V1SecurityContext(
            privileged=True,
            capabilities=client.V1Capabilities(
                add=['SYS_ADMIN']
            )
        )
        return pod

    c.KubeSpawner.modify_pod_hook = modify_pod_hook

Resource 7bed8adc-9ed9-49dc-b15e-6660e2fc3285 transitioned to failure state ERROR when use openstacksdk to create_server

When I create the OpenStack server, I get the exception below:
Resource 7bed8adc-9ed9-49dc-b15e-6660e2fc3285 transitioned to failure state ERROR
My code is below:
server_args = {
    "name": server_name,
    "image_id": image_id,
    "flavor_id": flavor_id,
    "networks": [{"uuid": network.id}],
    "admin_password": admin_password,
}
try:
    server = user_conn.conn.compute.create_server(**server_args)
    server = user_conn.conn.compute.wait_for_server(server)
except Exception as e:  # here I catch the exception
    raise e
When calling create_server, my server_args data is as below:
{'flavor_id': 'd4424892-4165-494e-bedc-71dc97a73202', 'networks': [{'uuid': 'da4e3433-2b21-42bb-befa-6e1e26808a99'}], 'admin_password': '123456', 'name': '133456', 'image_id': '60f4005e-5daf-4aef-a018-4c6b2ff06b40'}
My openstacksdk version is 0.9.18.
In the end, I found that the flavor was too big for the OpenStack compute node; after I changed it to a smaller flavor, the server was created successfully.
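For anyone hitting the same ERROR state, a rough way to confirm a capacity/flavor problem from the command line is the standard python-openstackclient (not the SDK code above; the server ID is the one from the exception):
openstack server show 7bed8adc-9ed9-49dc-b15e-6660e2fc3285 -c status -c fault   # the fault field explains why the server went to ERROR
openstack flavor list                                                            # compare flavor vCPU/RAM/disk against what the compute nodes can offer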

Dccp protocol simulation in ns2 2.34

How do I add the DCCP patch to ns2 2.34? Please give me detailed steps.
The patch file is ns234-dccp-1.patch.
The error I get when I try to simulate DCCP is:
Kar#ubuntu:~$ ns audiodccp.tcl
invalid command name "Agent/DCCP/TCPlike"
while executing
"Agent/DCCP/TCPlike create _o726 "
invoked from within
"catch "$className create $o $args" msg"
invoked from within
"if [catch "$className create $o $args" msg] {
if [string match "__FAILED_SHADOW_OBJECT_" $msg] {
delete $o
return ""
}
global errorInfo
error "class $..."
(procedure "new" line 3)
invoked from within
"new Agent/DCCP/TCPlike"
invoked from within
"set dccp1 [new Agent/DCCP/TCPlike]"
(file "audiodccp.tcl" line 50)
UBUNTU-10.04
NS2 allinone 2.34
audiodccp.tcl: unknown file.
invalid command name "Agent/DCCP/TCPlike" → you have a failed build, or you are using the wrong executable 'ns'. The suggestion is to do:
cd ns-allinone-2.34/ns-2.34/
cp ns ns-dccp
sudo cp ns-dccp /usr/local/bin/
... and then do simulations with $ ns-dccp [file.tcl]
You can also use ns-2.35, which has DCCP included by default.
Note: You can have as many ns-allinone-2.xx versions as you want installed at the same time. But never add any PATH entries to .bashrc; it is not required.
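As for applying the patch itself, a rough sketch of the usual steps (assuming ns-allinone-2.34 is unpacked next to ns234-dccp-1.patch; adjust the -p level to match the paths inside the patch):
cd ns-allinone-2.34/ns-2.34
patch -p1 < ../../ns234-dccp-1.patch   # apply the DCCP patch to the ns-2.34 sources
cd ..
./install                              # rebuild ns-allinone so the DCCP agents are compiled into ns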
