I installed datadog-agent using helm upgrade/install and provided the -f datadog.yaml parameter. The datadog.yaml contains this entry:
...
agents:
enabled: true
useConfigMap: true
customAgentConfig:
# Autodiscovery for Kubernetes
listeners:
- name: kubelet
config_providers:
- name: kubelet
polling: true
- name: docker
polling: true
apm_config:
enabled: false
apm_non_local_traffic: true
dogstatsd_mapper_profiles:
- name: airflow
prefix: "airflow."
mappings:
- match: "airflow.*_start"
name: "airflow.job.start"
tags:
job_name: "$1"
- match: "airflow.*_end"
name: "airflow.job.end"
tags:
job_name: "$1"
- match: "airflow.operator_failures_*"
...
But I don't see DD_DOGSTATSD_MAPPER_PROFILES env variable in the datadog-agent pod.
How can I inject this env variable to the datadog-agents?
Update 2/24/2022: I do see it's been added as a ConfigMap but it does not look like it is being mounted to the datadog-agent pod.
Update 3/4/2022: This yaml is working and I see the metrics in datadog dashboard. I do see it got mounted on the datadog-agent pod as config-map.
Use this in your datadog's helm chart:
datadog.env:
- name: DD_DOGSTATSD_MAPPER_PROFILES
value: >
[{"prefix":"airflow.","name":"airflow","mappings":[{"name":"airflow.job.start","match":"airflow.*_start","tags":{"job_name":"$1"}},{"name":"airflow.job.end","match":"airflow.*_end","tags":{"job_name":"$1"}},{"name":"airflow.job.heartbeat.failure","match":"airflow.*_heartbeat_failure","tags":{"job_name":"$1"}},{"name":"airflow.operator_failures","match":"airflow.operator_failures_*","tags":{"operator_name":"$1"}},{"name":"airflow.operator_successes","match":"airflow.operator_successes_*","tags":{"operator_name":"$1"}},{"match_type":"regex","name":"airflow.dag_processing.last_runtime","match":"airflow\\.dag_processing\\.last_runtime\\.(.*)","tags":{"dag_file":"$1"}},{"match_type":"regex","name":"airflow.dag_processing.last_run.seconds_ago","match":"airflow\\.dag_processing\\.last_run\\.seconds_ago\\.(.*)","tags":{"dag_file":"$1"}},{"match_type":"regex","name":"airflow.dag.loading_duration","match":"airflow\\.dag\\.loading-duration\\.(.*)","tags":{"dag_file":"$1"}},{"name":"airflow.dagrun.first_task_scheduling_delay","match":"airflow.dagrun.*.first_task_scheduling_delay","tags":{"dag_id":"$1"}},{"name":"airflow.pool.open_slots","match":"airflow.pool.open_slots.*","tags":{"pool_name":"$1"}},{"name":"airflow.pool.queued_slots","match":"pool.queued_slots.*","tags":{"pool_name":"$1"}},{"name":"airflow.pool.running_slots","match":"pool.running_slots.*","tags":{"pool_name":"$1"}},{"name":"airflow.pool.used_slots","match":"airflow.pool.used_slots.*","tags":{"pool_name":"$1"}},{"name":"airflow.pool.starving_tasks","match":"airflow.pool.starving_tasks.*","tags":{"pool_name":"$1"}},{"match_type":"regex","name":"airflow.dagrun.dependency_check","match":"airflow\\.dagrun\\.dependency-check\\.(.*)","tags":{"dag_id":"$1"}},{"match_type":"regex","name":"airflow.dag.task.duration","match":"airflow\\.dag\\.(.*)\\.([^.]*)\\.duration","tags":{"dag_id":"$1","task_id":"$2"}},{"match_type":"regex","name":"airflow.dag_processing.last_duration","match":"airflow\\.dag_processing\\.last_duration\\.(.*)","tags":{"dag_file":"$1"}},{"match_type":"regex","name":"airflow.dagrun.duration.success","match":"airflow\\.dagrun\\.duration\\.success\\.(.*)","tags":{"dag_id":"$1"}},{"match_type":"regex","name":"airflow.dagrun.duration.failed","match":"airflow\\.dagrun\\.duration\\.failed\\.(.*)","tags":{"dag_id":"$1"}},{"match_type":"regex","name":"airflow.dagrun.schedule_delay","match":"airflow\\.dagrun\\.schedule_delay\\.(.*)","tags":{"dag_id":"$1"}},{"name":"airflow.scheduler.tasks.running","match":"scheduler.tasks.running"},{"name":"airflow.scheduler.tasks.starving","match":"scheduler.tasks.starving"},{"name":"airflow.sla_email_notification_failure","match":"sla_email_notification_failure"},{"match_type":"regex","name":"airflow.dag.task_removed","match":"airflow\\.task_removed_from_dag\\.(.*)","tags":{"dag_id":"$1"}},{"match_type":"regex","name":"airflow.dag.task_restored","match":"airflow\\.task_restored_to_dag\\.(.*)","tags":{"dag_id":"$1"}},{"name":"airflow.task.instance_created","match":"airflow.task_instance_created-*","tags":{"task_class":"$1"}},{"name":"airflow.ti.start","match":"ti.start.*.*","tags":{"dagid":"$1","taskid":"$2"}},{"name":"airflow.ti.finish","match":"ti.finish.*.*.*","tags":{"dagid":"$1","state":"$3","taskid":"$2"}}]}]
I've got a docker compose service with a bunch of containers and I am attempting to collect both the docker container metrics from these containers but also the container host metrics from the Ubuntu server the containers are running on. I'm getting the docker container stats but I am not getting the Ubuntu container host metrics. The stats from the non-docker based input plugins (inputs.diskio,inputs.mem, etc) are from the telegraf container.
I found this and opened up the volumes but still nothing: https://community.influxdata.com/t/how-can-we-collect-host-machine-metrics-while-telegraf-is-running-in-docker-container/12005
Here is my compose file:
version: "3"
services:
telegraf:
image: telegraf:1.20.3
volumes:
- ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
- /sys:/rootfs/sys:ro
- /proc:/rootfs/proc:ro
- /etc:/rootfs/etc:ro
environment:
HOST_PROC: /rootfs/proc
HOST_SYS: /rootfs/sys
HOST_ETC: /rootfs/etc
vote:
build: ./vote
# use python rather than gunicorn for local dev
command: python app.py
depends_on:
- redis
volumes:
- ./vote:/app
ports:
- "5000:80"
networks:
- front-tier
- back-tier
result:
build: ./result
# use nodemon rather than node for local dev
command: nodemon server.js
depends_on:
- db
volumes:
- ./result:/app
ports:
- "5001:80"
- "5858:5858"
networks:
- front-tier
- back-tier
worker:
build:
context: ./worker
depends_on:
- redis
- db
networks:
- back-tier
redis:
image: redis:5.0-alpine3.10
volumes:
- "./healthchecks:/healthchecks"
healthcheck:
test: /healthchecks/redis.sh
interval: "5s"
ports: ["6379"]
networks:
- back-tier
db:
image: postgres:9.4
environment:
POSTGRES_USER: "postgres"
POSTGRES_PASSWORD: "postgres"
volumes:
- "db-data:/var/lib/postgresql/data"
- "./healthchecks:/healthchecks"
healthcheck:
test: /healthchecks/postgres.sh
interval: "5s"
networks:
- back-tier
volumes:
db-data:
networks:
front-tier:
back-tier:
Here is the agent config:
[agent]
interval = "10s"
[[inputs.mem]]
[[inputs.disk]]
## Ignore mount points by filesystem type.
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.ethtool]]
[[inputs.procstat]]
pattern = ".*"
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
gather_services = false
container_names = []
source_tag = true
container_name_include = []
container_name_exclude = []
timeout = "5s"
perdevice = true
docker_label_include = []
docker_label_exclude = []
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = true
report_active = true
How do I get the container host metrics??
I'm trying to connect dapr with nats with jetstream functionality enabled.
I want to start everything with docker-compose. Nats service is started and when I run nats-cli with command nats -s "nats://localhost:4222" server check jetstream, I get OK JetStream | memory=0B memory_pct=0%;75;90 storage=0B storage_pct=0%;75;90 streams=0 streams_pct=0% consumers=0 consumers_pct=0% indicating nats with jetstream is working ok.
Unfortunately, dapr returns first warning then error
warning: error creating pub sub %!s(*string=0xc0000ca020) (pubsub.jetstream/v1): couldn't find message bus pubsub.jetstream/v1" app_id=conversation-api1 instance=50b51af8e9a8 scope=dapr.runtime type=log ver=1.3.0
error: process component conversation-pubsub error: couldn't find message bus pubsub.jetstream/v1" app_id=conversation-api1 instance=50b51af8e9a8 scope=dapr.runtime type=log ver=1.3.0
I followed instructions on official site.
docker-compose.yaml
version: '3.4'
services:
conversation-api1:
image: ${DOCKER_REGISTRY-}conversationapi1
build:
context: .
dockerfile: Conversation.Api1/Dockerfile
ports:
- "5010:80"
conversation-api1-dapr:
container_name: conversation-api1-dapr
image: "daprio/daprd:latest"
command: [ "./daprd", "--log-level", "debug", "-app-id", "conversation-api1", "-app-port", "80", "--components-path", "/components", "-config", "/configuration/conversation-config.yaml" ]
volumes:
- "./dapr/components/:/components"
- "./dapr/configuration/:/configuration"
depends_on:
- conversation-api1
- redis
- nats
network_mode: "service:conversation-api1"
nats:
container_name: "Nats"
image: nats
command: [ "-js", "-m", "8222" ]
ports:
- "4222:4222"
- "8222:8222"
- "6222:6222"
# OTHER SERVICES...
conversation-pubsub.yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
name: conversation-pubsub
namespace: default
spec:
type: pubsub.jetstream
version: v1
metadata:
- name: natsURL
value: "nats://host.docker.internal:4222" # already tried with nats for host
- name: name
value: "conversation"
- name: durableName
value: "conversation-durable"
- name: queueGroupName
value: "conversation-group"
- name: startSequence
value: 1
- name: startTime # in Unix format
value: 1630349391
- name: deliverAll
value: false
- name: flowControl
value: false
conversation-config.yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
name: config
namespace: default
spec:
tracing:
samplingRate: "1"
zipkin:
endpointAddress: "http://zipkin:9411/api/v2/spans"
The problem was in old Dapr version. I used version 1.3.0, Jetstream support is introduced in 1.4.0+. Pulling latest version of daprio/daprd fixed my problem. Also no need for nats://host.docker.internal:4222, nats://nats:4222 works as expected.
I have:
a simple Python app ("iss-web") writing JSON log output to stdout
the Python app ("iss-web") is within a Docker Container
the Python app ("iss-web") Container logging driver is set to "fluentd"
a separate Container running "fluent/fluent-bit:1.7" to collect the Python app JSON log output
Loki 2.2.1 deployed via a Container to receive the Python app log output from fluent-bit
Grafana connected to Loki to visualize the log data
The issue is that the "log" field is not filtered/parsed by fluent-bit, therefore in Loki/Grafana the content of the "log" field is not parsed and used as "Detected fields".
"iss-web" docker-compose.yml
version: '3'
services:
iss-web:
build: ./iss-web
image: iss-web
container_name: iss-web
env_file:
- ./iss-web/app.env
ports:
- 46664:46664
logging:
driver: fluentd
options:
tag: iss.web
redis:
image: redis
container_name: redis
ports:
- 6379:6379
logging:
driver: "json-file"
options:
max-file: ${LOG_EXPIRE}
max-size: ${LOG_SEGMENT}
"fluent-bit" docker-compose.yml
version: '3'
services:
fluent-bit:
image: fluent/fluent-bit:1.7
container_name: fluent-bit
environment:
- LOKI_URL=http://135.86.186.75:3100/loki/api/v1/push
user: root
volumes:
- ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
- ./parsers.conf:/fluent-bit/etc/parsers.conf
ports:
- "24224:24224"
- "24224:24224/udp"
fluent-bit.conf
[SERVICE]
Flush 1
Daemon Off
log_level debug
Parsers_File /fluent-bit/etc/parsers.conf
[INPUT]
Name forward
Listen 0.0.0.0
port 24224
[FILTER]
Name parser
Match iss.web
Key_Name log
Parser docker
Reserve_Data On
Preserve_Key On
[OUTPUT]
Name loki
Match *
host 135.86.186.75
port 3100
labels job=fluentbit
[OUTPUT]
Name stdout
Match *
parsers.conf
I've tried with/without Time_Key, Time_Format, Time_Keep
[PARSER]
Name docker
Format json
#Time_Key time
#Time_Format %Y-%m-%dT%H:%M:%S.%L
#Time_Keep On
# Command | Decoder | Field | Optional Action
# =============|==================|=================
#Decode_Field_As escaped_utf8 log do_next
Decode_Field_As json log
fluent-bit log extract
[0] iss.web: [1620640820.000000000, {"log"=>"{'timestamp': '2021:05:10 11:00:20.439513', 'epoch': 1620640820.4395688, 'pid': 1, 'level': 'DEBUG', 'message': '/ping', 'data': {'message': 'PONG', 'timestamp': '1620640820.4394963', 'version': '0.1'}}", "container_id"=>"bffd720e9ac1e8c3992c1120eed37e00c536cd44ec99e9c13cf690d840363f80", "container_name"=>"/iss-web", "source"=>"stdout"}]
Grafana/Loki screen
I would expect the "Detected fields" to contain pid=1, message=/ping etc
I needed a "json.dumps" in my "logger":
def log(message, level="INFO", **extra):
out = {"timestamp": get_now(), "epoch": get_epoch(), "pid": get_pid(), "level": level, "message": message}
if extra: out |= extra
print(json.dumps(out), flush=True)
return True
I have had success in both instantiating a traefik container, as well as 4 other nginx containers to serve applications that route my subdomains to each individual service. The routing works, and I am using [acme] for certificate generation, but everytime i try to go to any of my subdomains chrome still gives me an error saying "this connection isn't trusted", and then I have to hit advanced and proceed. The individual applications load fine, but there's something wrong with the certificates.
I have tried clearing the acme.json file to no avail. I had also played around with enabling onDemand in the traefick.toml but that didn't work either.
Please help?
traefik.toml
# defaultEntryPoints must be at the top
# because it should not be in any table below
defaultEntryPoints = ["http", "https"]
# Entrypoints, http and https
[entryPoints]
# http should be redirected to https
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
# https is the default
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
# Enable ACME (Let's Encrypt): automatic SSL
[acme]
email = "chris#myubercode.com"
storage = "./acme.json"
entryPoint = "https"
OnHostRule = true
acmeLogging = true
caServer = "https://acme-v02.api.letsencrypt.org/directory"
[acme.httpChallenge]
entryPoint = "http"
[acme.dnsChallenge]
provider = "digitalocean"
delayBeforeCheck = 0
[[acme.domains]]
main = "cswilson.site"
sans = ["profile.cswilson.site", "ecommerce.cswilson.site", "fitness.cswilson.site", "biosite.cswilson.site"]
traefikLogsFile = "/tmp/traefik.log"
logLevel = "DEBUG"
[accessLog]
filePath = "/tmp/access.log"
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "cswilson.site"
watch = true
exposedbydefault = false
docker-compose.yml (for the traefik container):
version: '3'
services:
traefik:
image: traefik
command: --docker
ports:
- "80:80"
- "443:443"
restart: always
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
- "./traefik.toml:/traefik.toml"
- "./acme.json:/acme.json"
networks:
- default
And here is the docker-compose.yml for the 4 different application containers:
version: '3'
services:
profile:
build: .
image: nginx
labels:
- "traefik.enabled=true"
- "traefik.backend=profile"
- "traefik.frontend.rule=Host:profile.cswilson.site"
- "traefik.frontend.entryPoinst=http,https"
restart: always
networks:
- "traefik_default"
fitness:
build: .
image: nginx
labels:
- "traefik.enabled=true"
- "traefik.backend=fitness"
- "traefik.frontend.rule=Host:fitness.cswilson.site"
- "traefik.frontend.entryPoinst=http,https"
restart: always
networks:
- "traefik_default"
ecommerce:
build: .
image: nginx
labels:
- "traefik.enabled=true"
- "traefik.backend=ecommerce"
- "traefik.frontend.rule=Host:ecommerce.cswilson.site"
- "traefik.port=80"
restart: always
networks:
- "traefik_default"
biosite:
build: .
image: nginx
labels:
- "traefik.enabled=true"
- "traefik.backend=ecommerce"
- "traefik.frontend.rule=Host:biosite.cswilson.site"
- "traefik.port=80"
restart: always
networks:
- "traefik_default"
networks:
traefik_default:
external:
name: traefik_default
I am new to docker and just found traefik this morning, and I don't really know if I need some sort of a real certificate to put into
[[entryPoints.http.tls.certificates]]
Any help is greatly appreciated, thank you