I'm trying to apply a Terraform resource (helm_release) to k8s, and the apply command failed halfway through.
I checked the pod issue; now I need to update some values in the local chart.
I'm in a dilemma now: I can't apply the helm_release because the names are in use, and I can't destroy the helm_release since it was never created.
It seems to me the only option is to manually delete the k8s resources that were created by the helm_release chart?
Here is the Terraform for the helm_release:
cat nginx-arm64.tf
resource "helm_release" "nginx-ingress" {
name = "nginx-ingress"
chart = "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz"
}
BTW: I need to use the local chart because the official chart does not support the ARM64 architecture.
Thanks,
Edit #1:
Here is the list of Helm releases -> there is no nginx ingress:
/data/terraform/k8s$ helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
cert-manager default 1 2021-12-08 20:57:38.979176622 +0000 UTC deployed cert-manager-v1.5.0 v1.5.0
/data/terraform/k8s$
Here is the describe pod output:
$ k describe pod/nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr
Name: nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr
Namespace: default
Priority: 0
Node: ocifreevmalways/10.0.0.189
Start Time: Wed, 08 Dec 2021 11:11:59 +0000
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=nginx-ingress-controller
helm.sh/chart=nginx-ingress-controller-9.0.9
pod-template-hash=99cddc76b
Annotations: <none>
Status: Running
IP: 10.244.0.22
IPs:
IP: 10.244.0.22
Controlled By: ReplicaSet/nginx-ingress-nginx-ingress-controller-99cddc76b
Containers:
controller:
Container ID: docker://0b75f5f68ef35dfb7dc5b90f9d1c249fad692855159f4e969324fc4e2ee61654
Image: docker.io/rancher/nginx-ingress-controller:nginx-1.1.0-rancher1
Image ID: docker-pullable://rancher/nginx-ingress-controller@sha256:177fb5dc79adcd16cb6c15d6c42cef31988b116cb148845893b6b954d7d593bc
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--default-backend-service=default/nginx-ingress-nginx-ingress-controller-default-backend
--election-id=ingress-controller-leader
--controller-class=k8s.io/ingress-nginx
--configmap=default/nginx-ingress-nginx-ingress-controller
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 08 Dec 2021 22:02:15 +0000
Finished: Wed, 08 Dec 2021 22:02:15 +0000
Ready: False
Restart Count: 132
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wzqqn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-wzqqn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 8m38s (x132 over 10h) kubelet Container image "docker.io/rancher/nginx-ingress-controller:nginx-1.1.0-rancher1" already present on machine
Warning BackOff 3m39s (x3201 over 10h) kubelet Back-off restarting failed container
The terraform state list shows nothing:
/data/terraform/k8s$ t state list
/data/terraform/k8s$
Though terraform.tfstate.backup shows the nginx ingress (I guess I ran the destroy command at some point in between?):
/data/terraform/k8s$ cat terraform.tfstate.backup
{
  "version": 4,
  "terraform_version": "1.0.11",
  "serial": 28,
  "lineage": "30e74aa5-9631-f82f-61a2-7bdbd97c2276",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "helm_release",
      "name": "nginx-ingress",
      "provider": "provider[\"registry.terraform.io/hashicorp/helm\"]",
      "instances": [
        {
          "status": "tainted",
          "schema_version": 0,
          "attributes": {
            "atomic": false,
            "chart": "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz",
            "cleanup_on_fail": false,
            "create_namespace": false,
            "dependency_update": false,
            "description": null,
            "devel": null,
            "disable_crd_hooks": false,
            "disable_openapi_validation": false,
            "disable_webhooks": false,
            "force_update": false,
            "id": "nginx-ingress",
            "keyring": null,
            "lint": false,
            "manifest": null,
            "max_history": 0,
            "metadata": [
              {
                "app_version": "1.1.0",
                "chart": "nginx-ingress-controller",
                "name": "nginx-ingress",
                "namespace": "default",
                "revision": 1,
                "values": "{}",
                "version": "9.0.9"
              }
            ],
            "name": "nginx-ingress",
            "namespace": "default",
            "postrender": [],
            "recreate_pods": false,
            "render_subchart_notes": true,
            "replace": false,
            "repository": null,
            "repository_ca_file": null,
            "repository_cert_file": null,
            "repository_key_file": null,
            "repository_password": null,
            "repository_username": null,
            "reset_values": false,
            "reuse_values": false,
            "set": [],
            "set_sensitive": [],
            "skip_crds": false,
            "status": "failed",
            "timeout": 300,
            "values": null,
            "verify": false,
            "version": "9.0.9",
            "wait": true,
            "wait_for_jobs": false
          },
          "sensitive_attributes": [],
          "private": "bnVsbA=="
        }
      ]
    }
  ]
}
When I try to apply in the same directory, it throws the error again:
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
helm_release.nginx-ingress: Creating...
╷
│ Error: cannot re-use a name that is still in use
│
│ with helm_release.nginx-ingress,
│ on nginx-arm64.tf line 1, in resource "helm_release" "nginx-ingress":
│ 1: resource "helm_release" "nginx-ingress" {
Please share your thoughts. Thanks.
Edit #2:
The DEBUG logs show some more clues:
2021-12-09T04:30:14.118Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceDiff: nginx-ingress] Release validated: timestamp=2021-12-09T04:30:14.118Z
2021-12-09T04:30:14.118Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceDiff: nginx-ingress] Done: timestamp=2021-12-09T04:30:14.118Z
2021-12-09T04:30:14.119Z [WARN] Provider "registry.terraform.io/hashicorp/helm" produced an invalid plan for helm_release.nginx-ingress, but we are tolerating it because it is using the legacy plugin SDK.
The following problems may be the cause of any confusing errors from downstream operations:
- .cleanup_on_fail: planned value cty.False for a non-computed attribute
- .create_namespace: planned value cty.False for a non-computed attribute
- .verify: planned value cty.False for a non-computed attribute
- .recreate_pods: planned value cty.False for a non-computed attribute
- .render_subchart_notes: planned value cty.True for a non-computed attribute
- .replace: planned value cty.False for a non-computed attribute
- .reset_values: planned value cty.False for a non-computed attribute
- .disable_crd_hooks: planned value cty.False for a non-computed attribute
- .lint: planned value cty.False for a non-computed attribute
- .namespace: planned value cty.StringVal("default") for a non-computed attribute
- .skip_crds: planned value cty.False for a non-computed attribute
- .disable_webhooks: planned value cty.False for a non-computed attribute
- .force_update: planned value cty.False for a non-computed attribute
- .timeout: planned value cty.NumberIntVal(300) for a non-computed attribute
- .reuse_values: planned value cty.False for a non-computed attribute
- .dependency_update: planned value cty.False for a non-computed attribute
- .disable_openapi_validation: planned value cty.False for a non-computed attribute
- .atomic: planned value cty.False for a non-computed attribute
- .wait: planned value cty.True for a non-computed attribute
- .max_history: planned value cty.NumberIntVal(0) for a non-computed attribute
- .wait_for_jobs: planned value cty.False for a non-computed attribute
helm_release.nginx-ingress: Creating...
2021-12-09T04:30:14.119Z [INFO] Starting apply for helm_release.nginx-ingress
2021-12-09T04:30:14.119Z [INFO] Starting apply for helm_release.nginx-ingress
2021-12-09T04:30:14.119Z [DEBUG] helm_release.nginx-ingress: applying the planned Create change
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] setting computed for "metadata" from ComputedKeys: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Started: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Getting helm configuration: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [INFO] GetHelmConfiguration start: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] Using kubeconfig: /home/ubuntu/.kube/config: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [INFO] Successfully initialized kubernetes config: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.121Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [INFO] GetHelmConfiguration success: timestamp=2021-12-09T04:30:14.121Z
2021-12-09T04:30:14.121Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Getting chart: timestamp=2021-12-09T04:30:14.121Z
2021-12-09T04:30:14.125Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Preparing for installation: timestamp=2021-12-09T04:30:14.125Z
2021-12-09T04:30:14.125Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 ---[ values.yaml ]-----------------------------------
{}: timestamp=2021-12-09T04:30:14.125Z
2021-12-09T04:30:14.125Z [INFO] provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Installing chart: timestamp=2021-12-09T04:30:14.125Z
╷
│ Error: cannot re-use a name that is still in use
│
│ with helm_release.nginx-ingress,
│ on nginx-arm64.tf line 1, in resource "helm_release" "nginx-ingress":
│ 1: resource "helm_release" "nginx-ingress" {
│
╵
2021-12-09T04:30:14.158Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-12-09T04:30:14.160Z [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.4.1/linux_arm64/terraform-provider-helm_v2.4.1_x5 pid=558800
2021-12-09T04:30:14.160Z [DEBUG] provider: plugin exited
You don't have to manually delete all the resources using kubectl. Under the hood, the Terraform Helm provider still uses Helm. So if you run helm list -A you will see all the Helm releases on your cluster, including the nginx-ingress release. Deleting the release is then done via helm uninstall nginx-ingress -n REPLACE_WITH_YOUR_NAMESPACE.
Before re-running terraform apply, check whether the Helm release is still in your Terraform state via terraform state list (run this from the same directory you run terraform apply from). If you don't see helm_release.nginx-ingress in that list, it is not in your Terraform state and you can simply re-run terraform apply. Otherwise, remove it first via terraform state rm helm_release.nginx-ingress and then run terraform apply again.
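Putting those steps together, a minimal recovery sequence looks something like this (the nginx-ingress release name and default namespace are taken from the question; adjust them to your setup):
helm list -A                                   # confirm whether Helm still knows about the release
helm uninstall nginx-ingress -n default        # remove the half-installed release
terraform state list                           # check whether Terraform is tracking it
terraform state rm helm_release.nginx-ingress  # only needed if it appears in the state
terraform apply                                # retry the create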
I just faced a similar issue, but in my case there was neither a Terraform state entry for the Helm release nor an actual Helm release.
So helm list -A (and helm list in the current namespace) showed nothing.
I found this issue that solved it: helm/helm#4174
With Helm 3, all release metadata is saved as Secrets in the same
namespace as the release. If you get "cannot re-use a name that is
still in use", it means you may need to check for some orphaned secrets
and delete them.
After doing that, it started working.
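As a concrete sketch, Helm 3 labels its release Secrets with owner=helm and name=<release>, so you can hunt down and remove the orphans like this (the default namespace and nginx-ingress release name are from the question; the .v1 revision suffix will vary):
kubectl get secrets -n default -l owner=helm,name=nginx-ingress
kubectl delete secret -n default sh.helm.release.v1.nginx-ingress.v1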
Related
My original version was not that old (around 7.41.x), but after updating to the latest version, Artifactory no longer starts. I can't understand the problem, as I don't understand the architecture well. Please help me figure it out. There are no specific errors in the logs.
[Screenshot: Artifactory status web page]
As far as I understand, the problem is that, for some reason, the service that implements the user interface on port 8070 does not start.
There is nothing interesting in the logs for this service.
Last errors in console.log:
2022-12-10T19:48:33.842Z [jfrou] [ERROR] [5fb2daba52b5d736] [healthcheck.go:66 ] [main ] [] - Checking health of service 'jffe_000-artifactory' using URL 'http://localhost:8070/readiness' returned an error: Get "http://localhost:8070/readiness": dial tcp 127.0.0.1:8070: connect: connection refused
2022-12-10T19:48:33.852Z [jfrou] [ERROR] [5fb2daba52b5d736] [healthcheck.go:71 ] [main ] [] - Checking health of service 'jfrt_01d96rcgsyfzkj1xnv8zqe1snm-artifactory' using URL 'http://localhost:8091/artifactory/api/v1/system/readiness' returned unexpected response status: 500
2022-12-10T19:48:33.853Z [jfrou] [WARN ] [5fb2daba52b5d736] [local_topology.go:274 ] [main ] [] - Readiness test failed with the following error: "required node services are missing or unhealthy"
2022-12-10T19:48:35.991Z [jfac ] [ERROR] [52b88c5fcb86903d] [o.j.c.ExecutionUtils:190 ] [jf-common-pool-2 ] - Router readiness check failed, cannot start Access
2022-12-10T19:48:35.994Z [jfac ] [ERROR] [52b88c5fcb86903d] [a.s.b.AccessServerRegistrar:79] [jf-common-pool-1 ] - Could not register access
2022-12-10T20:09:28.988Z [jfrou] [ERROR] [37b0a206df08bfae] [healthcheck.go:66 ] [main ] [] - Checking health of service 'jffe_000-artifactory' using URL 'http://localhost:8070/readiness' returned an error: Get "http://localhost:8070/readiness": dial tcp 127.0.0.1:8070: connect: connection refused
2022-12-10T20:09:28.995Z [jfrou] [ERROR] [37b0a206df08bfae] [healthcheck.go:71 ] [main ] [] - Checking health of service 'jfrt_01d96rcgsyfzkj1xnv8zqe1snm-artifactory' using URL 'http://localhost:8091/artifactory/api/v1/system/readiness' returned unexpected response status: 500
I checked access rights, database availability, and the correctness of the configuration files, and tried running ./artifactory.sh to watch the launch progress.
[jfrou] [ERROR] [1dd615d3e5a6357a] [healthcheck.go:71 ] [main ] [] - Checking health of service 'jfrt_01d96rcgsyfzkj1xnv8zqe1snm-artifactory' using URL 'http://localhost:8091/artifactory/api/v1/system/readiness' returned unexpected response status: 500
curl http://localhost:8091/artifactory/api/v1/system/readiness
{
  "errors" : [ {
    "status" : 500,
    "message" : "Bad credentials"
  } ]
}
Update:
I spent more than 10 hours on analysis and turned on debug logging. I found this problem; maybe it is the root cause.
I can't find how to fix it yet.
2022-12-11T06:33:14.769Z [jfrt ] [DEBUG] [349c95a5eb2a97e9] [.AuthenticationFilterUtils:146] [http-nio-8081-exec-8] - Entering ArtifactorySsoAuthenticationFilter.getRemoteUserName
2022-12-11T06:33:14.769Z [jfrt ] [DEBUG] [349c95a5eb2a97e9] [o.a.w.s.AccessFilter:519 ] [http-nio-8081-exec-8] - Using anonymous
2022-12-11T06:33:14.770Z [jfrt ] [DEBUG] [349c95a5eb2a97e9] [o.a.w.s.AccessFilter:232 ] [http-nio-8081-exec-8] - Non-UI authentication cache accessed
2022-12-11T06:33:14.770Z [jfrt ] [DEBUG] [349c95a5eb2a97e9] [o.a.w.s.AccessFilter:227 ] [http-nio-8081-exec-8] - UI authentication cache accessed
2022-12-11T06:33:14.770Z [jfrt ] [DEBUG] [349c95a5eb2a97e9] [o.a.w.s.AccessFilter:523 ] [http-nio-8081-exec-8] - Creating the Anonymous token
2022-12-11T06:33:14.775Z [jfrt ] [DEBUG] [349c95a5eb2a97e9] [PassAuthenticationProvider:126] [http-nio-8081-exec-8] - 401, Access didn't authenticate user: 'anonymous'
2022-12-11T06:33:14.776Z [jfrt ] [DEBUG] [349c95a5eb2a97e9] [priseAuthenticationProvider:87] [http-nio-8081-exec-8] - Non docker request. Github provider supports only docker requests.
2022-12-11T06:33:14.781Z [jfrt ] [DEBUG] [ ] [o.a.w.s.ArtifactoryFilter:130 ] [http-nio-8081-exec-8] - org.artifactory.webapp.servlet.ArtifactoryFilter
org.springframework.security.authentication.BadCredentialsException: Bad credentials
at org.artifactory.security.db.DbAuthenticationProvider.additionalAuthenticationChecks(DbAuthenticationProvider.java:64)
at org.springframework.security.authentication.dao.AbstractUserDetailsAuthenticationProvider.authenticate(AbstractUserDetailsAuthenticationProvider.java:147)
at org.artifactory.security.db.DbAuthenticationProvider.authenticate(DbAuthenticationProvider.java:53)
at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:182)
at org.artifactory.security.RealmAwareAuthenticationManager.authenticate(RealmAwareAuthenticationManager.java:68)
at org.artifactory.webapp.servlet.AccessFilter.useAnonymousIfPossible(AccessFilter.java:533)
at org.artifactory.webapp.servlet.AccessFilter.doFilterInternal(AccessFilter.java:301)
at org.artifactory.webapp.servlet.AccessFilter.doFilter(AccessFilter.java:218)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.artifactory.webapp.servlet.RequestFilter.doFilter(RequestFilter.java:88)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.artifactory.webapp.servlet.ArtifactoryCsrfFilter.doFilter(ArtifactoryCsrfFilter.java:83)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.session.web.http.SessionRepositoryFilter.doFilterInternal(SessionRepositoryFilter.java:164)
at org.springframework.session.web.http.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:80)
at org.artifactory.webapp.servlet.SessionFilter.doFilter(SessionFilter.java:67)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.artifactory.webapp.servlet.ArtifactoryTracingFilter.doFilter(ArtifactoryTracingFilter.java:38)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.artifactory.webapp.servlet.ArtifactoryFilter.doFilter(ArtifactoryFilter.java:126)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
at org.apache.catalina.valves.rewrite.RewriteValve.invoke(RewriteValve.java:289)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:360)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:399)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:924)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1743)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Thread.java:833)
I'm working on the observability part of OpenSearch, so I'm trying to collect the trace data of a WordPress website and send it to OpenSearch.
I'm collecting the trace data using the WordPress plugin DecaLog, which sends the data to the Jaeger agent; from Jaeger I'm sending the data to the OpenTelemetry Collector, then to Data Prepper, and lastly to OpenSearch.
Jaeger agent service in docker-compose:
jaeger-agent:
  container_name: jaeger-agent
  image: jaegertracing/jaeger-agent:latest
  command: [ "--reporter.grpc.host-port=otel-collector:14250" ]
  ports:
    - "5775:5775/udp"
    - "6831:6831/udp"
    - "6832:6832/udp"
    - "5778:5778/tcp"
  networks:
    - our-network
The "command" ligne got me this error : Err: connection error: desc = "transport: Error while dialing dial tcp: lookup otel-collector on 127.0.0.11:53: server misbehaving"","system":"grpc","grpc_log":true
So I changed otel-collector to the IP of the otel-collector container.
Otel collector and data prepper are installed using docker-compose.
data-prepper:
  restart: unless-stopped
  container_name: data-prepper
  image: opensearchproject/data-prepper:latest
  volumes:
    - ./data-prepper/examples/trace_analytics_no_ssl.yml:/usr/share/data-prepper/pipelines.yaml
    - ./data-prepper/examples/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml
    - ./data-prepper/examples/demo/root-ca.pem:/usr/share/data-prepper/root-ca.pem
  ports:
    - "21890:21890"
  networks:
    - our-network
  depends_on:
    - "opensearch"
otel-collector:
  container_name: otel-collector
  image: otel/opentelemetry-collector:0.54.0
  command: [ "--config=/etc/otel-collector-config.yml" ]
  working_dir: "/project"
  volumes:
    - ${PWD}/:/project
    - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    - ./data-prepper/examples/demo/demo-data-prepper.crt:/etc/demo-data-prepper.crt
  ports:
    - "4317:4317"
  depends_on:
    - data-prepper
  networks:
    - our-network
The configuration of otel.yaml (to send data from OpenTelemetry to OpenSearch):
receivers:
  jaeger:
    protocols:
      grpc:
exporters:
  otlp/2:
    endpoint: data-prepper:21890
    tls:
      insecure: true
      insecure_skip_verify: true
  logging:
service:
  pipelines:
    traces:
      receivers: [jaeger]
      exporters: [logging, otlp/2]
The configuration for the Data Prepper pipeline:
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - opensearch:
        hosts: [ "http://localhost:9200" ]
        cert: "/usr/share/data-prepper/root-ca.pem"
        username: "admin"
        password: "admin"
        trace_analytics_raw: true
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["http://localhost:9200"]
        cert: "/usr/share/data-prepper/root-ca.pem"
        username: "admin"
        password: "admin"
        trace_analytics_service_map: true
As of now I'm getting the following errors:
Jaeger agent:
Err: connection error: desc = \"transport: Error while dialing dial tcp otel-collector-container-IP:14250: i/o timeout\"","system":"grpc","grpc_log":true}
OpenTelemetry Collector:
2022-08-04T15:31:32.675Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "logging"}
2022-08-04T15:31:32.675Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "traces", "name": "logging"}
2022-08-04T15:31:32.675Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "otlp/2"}
2022-08-04T15:31:32.682Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "traces", "name": "otlp/2"}
2022-08-04T15:31:32.682Z info pipelines/pipelines.go:86 Starting processors...
2022-08-04T15:31:32.682Z info pipelines/pipelines.go:98 Starting receivers...
2022-08-04T15:31:32.682Z info pipelines/pipelines.go:102 Exporter is starting... {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-08-04T15:31:32.683Z info static/strategy_store.go:203 No sampling strategies provided or URL is unavailable, using defaults {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-08-04T15:31:32.683Z info pipelines/pipelines.go:106 Exporter started. {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-08-04T15:31:32.683Z info service/collector.go:220 Starting otelcol... {"Version": "0.54.0", "NumCPU": 2}
2022-08-04T15:31:32.683Z info service/collector.go:128 Everything is ready. Begin running and processing data.
2022-08-04T15:31:32.684Z warn zapgrpc/zapgrpc.go:191 [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
"Addr": "data-prepper:21890",
"ServerName": "data-prepper:21890",
"Attributes": null,
"BalancerAttributes": null,
"Type": 0,
"Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp data-prepper-container-ip:21890: connect: connection refused" {"grpc_log": true}
Data Prepper:
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.amazon.dataprepper.DataPrepper]: Constructor threw exception; nested exception is java.lang.RuntimeException: No valid pipeline is available for execution, exiting
Followed by this at the end:
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-08-04T15:23:22,803 [main] INFO com.amazon.dataprepper.parser.config.DataPrepperAppConfiguration - Command line args: /usr/share/data-prepper/pipelines.yaml,/usr/share/data-prepper/data-prepper-config.yaml
2022-08-04T15:23:22,806 [main] INFO com.amazon.dataprepper.parser.config.DataPrepperArgs - Using /usr/share/data-prepper/pipelines.yaml configuration file
OpenSearch needs a separate tool to support ingestion of OpenTelemetry data. It is called Data Prepper and is part of the OpenSearch project. There is a nice getting started guide on how to set up trace analytics in OpenSearch.
Data Prepper works similarly to Fluentd or the OpenTelemetry Collector, but has proper support for OpenSearch as a data sink. It pre-processes trace data adequately for the OpenSearch Dashboards UI tracing plugin. Data Prepper also supports the OpenTelemetry metrics format.
Are you still having issues running Data Prepper? The configuration used in this example has been updated since the latest release and should now be up to date and working: https://github.com/opensearch-project/data-prepper/blob/main/examples/trace_analytics_no_ssl.yml
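If it helps, you can fetch that updated example directly (assuming GitHub's usual raw-content URL for the file linked above):
curl -O https://raw.githubusercontent.com/opensearch-project/data-prepper/main/examples/trace_analytics_no_ssl.yml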
I was setting up Argo in my k8s cluster in the argo namespace.
I also installed MinIO as an artifact repository (https://github.com/argoproj/argo-workflows/blob/master/docs/configure-artifact-repository.md).
I am configuring a workflow that tries to access that non-default artifact repository as:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art
        path: /tmp/hello_world.txt
        s3:
          endpoint: argo-artifacts-minio.argo:9000
          bucket: my-bucket
          key: /my-output-artifact.tgz
          accessKeySecret:
            name: argo-artifacts-minio
            key: accesskey
          secretKeySecret:
            name: argo-artifacts-minio
            key: secretkey
  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message
        s3:
          endpoint: argo-artifacts-minio.argo:9000
          bucket: my-bucket
          accessKeySecret:
            name: argo-artifacts-minio
            key: accesskey
          secretKeySecret:
            name: argo-artifacts-minio
            key: secretkey
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]
I created the workflow in the argo namespace with:
argo submit --watch artifact-passing-nondefault-new.yaml -n argo
But the workflow fails with an error:
STEP PODNAME DURATION MESSAGE
✖ artifact-passing-z9g64 child 'artifact-passing-z9g64-150231068' failed
└---⚠ generate-artifact artifact-passing-z9g64-150231068 12s failed to save outputs: Get https://argo-artifacts-minio.argo:9000/my-bucket/?location=: http: server gave HTTP response to HTTPS client
Can someone help me solve this error?
Since the MinIO setup runs without TLS configured, the workflow should specify that it connects to an insecure artifact repository.
Including the field insecure: true in the s3 definition section of the workflow solves the issue (note that the s3 block of the print-message input artifact needs the same field):
s3:
  endpoint: argo-artifacts-minio.argo:9000
  insecure: true
  bucket: my-bucket
  key: /my-output-artifact.tgz
  accessKeySecret:
    name: argo-artifacts-minio
    key: accesskey
  secretKeySecret:
    name: argo-artifacts-minio
    key: secretkey
I've been working with Symfony 2.7 and the RabbitMQBundle to handle some long processes asynchronously.
After facing the issue where the MySQL connection dies after a few minutes, I discovered rabbitmq-cli-consumer, a small Go app that takes care of consuming the queue and passes each message to a command.
In my case, I use it with this command: ./rabbitmq-cli-consumer -c configuration-stock.conf --include -V -e 'php app/console amqp:consume:stock --env=prod -vvv', with this configuration file:
[rabbitmq]
host = HOST
username = USERNAME
password = PASSWORD
vhost=/VHOST
port=PORT
queue=stock
compression=Off
[exchange]
name=exports
type=direct
durable=On
[queuesettings]
routingkey=stock
messagettl=10000
deadLetterExchange=exports.dl
deadLetterroutingkey=stock
priority=10
To handle errors, I intend to use RabbitMQ's x-dead-letter-exchange and x-dead-letter-routing-key configuration, to be able to retry the message later (in case something went temporarly wrong).
My issue is that, when I define my queues in RabbitMQBundle's configuration, rabbitmq-cli-consumer is unable to consume the queue, throwing this error:
2018/04/23 11:35:54 Connecting RabbitMQ...
2018/04/23 11:35:54 Connected.
2018/04/23 11:35:54 Opening channel...
2018/04/23 11:35:54 Done.
2018/04/23 11:35:54 Setting QoS...
2018/04/23 11:35:54 Succeeded setting QoS.
2018/04/23 11:35:54 Declaring queue "stock"...
2018/04/23 11:35:54 Registering consumer...
2018/04/23 11:35:54 failed to register a consumer: Exception (504) Reason: "channel/connection is not open"
Here is the configuration I use for RabbitMQBundle:
old_sound_rabbit_mq:
  producers:
    exports:
      connection: default
      exchange_options:
        name: 'exports'
        type: direct
    exports_dl:
      connection: default
      exchange_options:
        name: 'exports.dl'
        type: direct
  consumers:
    stock_dead_letter:
      connection: default
      exchange_options:
        name: exports.dl
        type: direct
      queue_options:
        name: stock.dl
        routing_keys:
          - stock
        arguments:
          x-dead-letter-exchange: ['S', 'exports']
          x-dead-letter-routing-key: ['S', 'stock']
          x-message-ttl: ['I', 60000]
      callback: amqp.consumers.exports.stock
  multiple_consumers:
    exports:
      connection: default
      exchange_options:
        name: 'exports'
        type: direct
      queues:
        stock:
          name: stock
          callback: amqp.consumers.exports.stock
          routing_keys:
            - stock
          arguments:
            x-dead-letter-exchange: ['S', 'exports.dl']
            x-dead-letter-routing-key: ['S', 'stock']
Has anyone ever encountered something similar? And how did you solve it?
I am experimenting with CloudFormation cfn-init & cfn-hup based on the template below, but the WordPress stack doesn't get created. The cfn-hup process is not started, and cfn-init throws a code 1 error. Please see the stack template and error log details below. Can anyone help me understand what's going wrong here?
Stack-Template:
Parameters:
  DecideEnvSize:
    Type: String
    Default: LOW
    AllowedValues:
      - LOW
      - MEDIUM
      - HIGH
    Description: Select Environment Size (S,M,L)
  DatabaseName:
    Type: String
    Default: DB4wordpress
  DatabaseUser:
    Type: String
    Default: ***************
  DatabasePassword:
    Type: String
    Default: *************
    NoEcho: true
  TestString:
    Type: String
    Default: Don't eat yourself up!!!
Mappings:
  MyRegionMap:
    us-east-1:
      "AMALINUX" : "ami-c481fad3" # AMALINUX SEP 2016 - N. Virginia
    us-east-2:
      "AMALINUX" : "ami-71ca9114" # AMALINUX SEP 2016 - Ohio
  InstanceSize:
    LOW:
      "EC2" : "t2.micro"
      "DB" : "db.t2.micro"
    MEDIUM:
      "EC2" : "t2.small"
      "DB" : "db.t2.small"
    HIGH:
      "EC2" : "t2.medium"
      "DB" : "db.t2.medium"
Resources:
  DBServer:
    Type: "AWS::RDS::DBInstance"
    Properties:
      AllocatedStorage: 5
      StorageType: gp2
      DBInstanceClass: !FindInMap [InstanceSize, !Ref DecideEnvSize, DB] # Dynamic mapping + Pseudo Parameter
      DBName: !Ref DatabaseName
      Engine: MySQL
      MasterUsername: !Ref DatabaseUser
      MasterUserPassword: !Ref DatabasePassword
    DeletionPolicy: Delete
  EC2server:
    Type: "AWS::EC2::Instance"
    DependsOn: DBServer
    Properties:
      ImageId: !FindInMap [MyRegionMap, !Ref "AWS::Region", AMALINUX] # Dynamic mapping + Pseudo Parameter
      InstanceType: !FindInMap [InstanceSize, !Ref DecideEnvSize, EC2]
      KeyName: AdvancedCFN
      UserData:
        "Fn::Base64":
          !Sub |
            #!/bin/bash
            yum update -y aws-cfn-bootstrap # good practice - always do this.
            /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource EC2server --configsets wordpress --region ${AWS::Region}
            yum -y update
    Metadata:
      AWS::CloudFormation::Init:
        configSets:
          wordpress:
            - "configure_cfn"
            - "install_wordpress"
            - "config_wordpress"
        configure_cfn:
          files:
            /etc/cfn/cfn-hup.conf:
              content: !Sub |
                [main-just some name]
                stack=${AWS::StackId}
                region=${AWS::Region}
                verbose=true
                interval=5
              mode: "000400"
              owner: root
              group: root
            /etc/cfn/hooks.d/cfn-auto-reloader.conf:
              content: !Sub |
                [cfn-auto-reloader-hook #just a name]
                triggers=post.update
                path=Resources.EC2server.Metadata.AWS::CloudFormation::Init
                action=/opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource EC2server --configsets wordpress --region ${AWS::Region}
              mode: "000400"
              owner: root
              group: root
            /var/www/html/index2.html:
              content: !Ref TestString
          services:
            sysvinit:
              cfn-hup:
                enabled: "true"
                ensureRunning: "true"
                files:
                  - "/etc/cfn/cfn-hup.conf"
                  - "/etc/cfn/hooks.d/cfn-auto-reloader.conf"
        install_wordpress:
          packages:
            yum:
              httpd: []
              php: []
              mysql: []
              php-mysql: []
          sources:
            /var/www/html: "http://wordpress.org/latest.tar.gz"
          services:
            sysvinit:
              httpd:
                enabled: "true"
                ensureRunning: "true"
        config_wordpress:
          commands:
            01_clone_config:
              cwd: "/var/www/html/wordpress"
              test: "test ! -e /var/www/html/wordpress/wp-config.php"
              command: "cp wp-config-sample.php wp-config.php"
            02_inject_dbhost:
              cwd: "/var/www/html/wordpress"
              command: !Sub |
                sed -i 's/localhost/${DBServer.Endpoint.Address}/g' wp-config.php
            03_inject_dbname:
              cwd: "/var/www/html/wordpress"
              command: !Sub |
                sed -i 's/database_name_here/${DatabaseName}/g' wp-config.php
            04_inject_dbuser:
              cwd: "/var/www/html/wordpress"
              command: !Sub |
                sed -i 's/username_here/${DatabaseUser}/g' wp-config.php
            05_inject_dbpassword:
              cwd: "/var/www/html/wordpress"
              command: !Sub |
                sed -i 's/password_here/${DatabasePassword}/g' wp-config.php
  S3blob:
    Type: "AWS::S3::Bucket"
Error & log details
[root@ip-172-31-25-239 ec2-user]# cd /var/log
[root@ip-172-31-25-239 log]# ls
audit btmp cfn-init-cmd.log cfn-wire.log cloud-init-output.log dmesg lastlog maillog ntpstats spooler wtmp
boot.log cfn-hup.log cfn-init.log cloud-init.log cron dracut.log mail messages secure tallylog yum.log
[root@ip-172-31-25-239 log]# cat cfn-hup.log
2017-12-30 10:48:15,923 [ERROR] Error: [main] section must contain stack option
===========================================================================================
===========================================================================================
[root@ip-172-31-25-239 log]# cat cfn-init.log
2017-12-30 10:48:15,499 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.us-east-1.amazonaws.com
2017-12-30 10:48:15,501 [DEBUG] Describing resource EC2server in stack Yetagain-init-hup-try10
2017-12-30 10:48:15,616 [INFO] -----------------------Starting build-----------------------
2017-12-30 10:48:15,616 [DEBUG] Not setting a reboot trigger as scheduling support is not available
2017-12-30 10:48:15,617 [INFO] Running configSets: wordpress
2017-12-30 10:48:15,618 [INFO] Running configSet wordpress
2017-12-30 10:48:15,619 [INFO] Running config configure_cfn
2017-12-30 10:48:15,620 [DEBUG] No packages specified
2017-12-30 10:48:15,620 [DEBUG] No groups specified
2017-12-30 10:48:15,620 [DEBUG] No users specified
2017-12-30 10:48:15,620 [DEBUG] No sources specified
2017-12-30 10:48:15,620 [DEBUG] Parent directory /etc/cfn does not exist, creating
2017-12-30 10:48:15,625 [DEBUG] Writing content to /etc/cfn/cfn-hup.conf
2017-12-30 10:48:15,625 [DEBUG] Setting mode for /etc/cfn/cfn-hup.conf to 000400
2017-12-30 10:48:15,626 [DEBUG] Setting owner 0 and group 0 for /etc/cfn/cfn-hup.conf
2017-12-30 10:48:15,626 [DEBUG] Parent directory /etc/cfn/hooks.d does not exist, creating
2017-12-30 10:48:15,626 [DEBUG] Writing content to /etc/cfn/hooks.d/cfn-auto-reloader.conf
2017-12-30 10:48:15,626 [DEBUG] Setting mode for /etc/cfn/hooks.d/cfn-auto-reloader.conf to 000400
2017-12-30 10:48:15,626 [DEBUG] Setting owner 0 and group 0 for /etc/cfn/hooks.d/cfn-auto-reloader.conf
2017-12-30 10:48:15,626 [DEBUG] Parent directory /var/www/html does not exist, creating
2017-12-30 10:48:15,627 [DEBUG] Writing content to /var/www/html/index2.html
2017-12-30 10:48:15,627 [DEBUG] No mode specified for /var/www/html/index2.html. The file will be created with the mode: 0644
2017-12-30 10:48:15,627 [DEBUG] No commands specified
2017-12-30 10:48:15,627 [DEBUG] Using service modifier: /sbin/chkconfig
2017-12-30 10:48:15,627 [DEBUG] Setting service cfn-hup to enabled
2017-12-30 10:48:15,634 [INFO] enabled service cfn-hup
2017-12-30 10:48:15,635 [DEBUG] Restarting cfn-hup due to change detected in dependency
2017-12-30 10:48:15,635 [DEBUG] Using service runner: /sbin/service
2017-12-30 10:48:15,941 [ERROR] Could not restart service cfn-hup; return code was 1
2017-12-30 10:48:15,941 [DEBUG] Service output: Stopping cfn-hup: [FAILED]
Starting cfn-hup: [FAILED]
2017-12-30 10:48:15,942 [ERROR] Error encountered during build of configure_cfn: Could not restart cfn-hup
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 542, in run_config
CloudFormationCarpenter(config, self._auth_config).build(worklog)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 270, in build
CloudFormationCarpenter._serviceTools[manager]().apply(services, changes)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/service_tools.py", line 161, in apply
self._restart_service(service)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/service_tools.py", line 185, in _restart_service
raise ToolError("Could not restart %s" % service)
ToolError: Could not restart cfn-hup
2017-12-30 10:48:15,942 [ERROR] -----------------------BUILD FAILED!------------------------
2017-12-30 10:48:15,944 [ERROR] Unhandled exception during build: Could not restart cfn-hup
Traceback (most recent call last):
File "/opt/aws/bin/cfn-init", line 171, in <module>
worklog.build(metadata, configSets)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 129, in build
Contractor(metadata).build(configSets, self)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 530, in build
self.run_config(config, worklog)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 542, in run_config
CloudFormationCarpenter(config, self._auth_config).build(worklog)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 270, in build
CloudFormationCarpenter._serviceTools[manager]().apply(services, changes)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/service_tools.py", line 161, in apply
self._restart_service(service)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/service_tools.py", line 185, in _restart_service
raise ToolError("Could not restart %s" % service)
ToolError: Could not restart cfn-hup
=============================================================================================================
==============================================================================================================
[root@ip-172-31-25-239 log]# cat /etc/cfn/cfn-hup.conf
[main-just some name]
stack=arn:aws:cloudformation:us-east-1:523324464109:stack/Yetagain-init-hup-try10/908305e0-ed4d-11e7-b9f7-500c285ebefd
region=us-east-1
verbose=true
interval=5
==========================================================================================================
=========================================================================================================
[root@ip-172-31-25-239 log]# cat /etc/cfn/hooks.d/cfn-auto-reloader.conf
[cfn-auto-reloader-hook #just a name]
triggers=post.update
path=Resources.EC2server.Metadata.AWS::CloudFormation::Init
action=/opt/aws/bin/cfn-init -v --stack Yetagain-init-hup-try10 --resource EC2server --configsets wordpress --region us-east-1
Looks like the main error is in cfn-hup.log:
2017-12-30 10:48:15,923 [ERROR] Error: [main] section must contain stack option
Try changing [main-just some name] to [main] in your cfn-hup.conf. For reference, my /etc/cfn/cfn-hup.conf looks something like this:
[main]
stack=arn:aws:cloudformation:us-west-1:acccount_id:stack/mystack-dev-ecs-EC2-1VF68LZMOLAIY/cb2a6a80-554a-11e8-b318-503dcab41efa
region=us-west-1
interval=5
verbose=true
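Since cfn-init rewrites /etc/cfn/cfn-hup.conf from the template's Metadata on every run, the fix belongs in the template itself. A minimal sketch of the corrected files entry, using only values already present in the template above:
/etc/cfn/cfn-hup.conf:
  content: !Sub |
    [main]
    stack=${AWS::StackId}
    region=${AWS::Region}
    verbose=true
    interval=5
  mode: "000400"
  owner: root
  group: root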