How to collect Jolokia data via Telegraf only when the Jolokia connection is active?

When my application is up, telegraf works fine and collects the Jolokia data, since my application opens port 11722, which telegraf uses to fetch the metrics. When my application is down, however, telegraf starts logging errors because it can't connect to Jolokia. My telegraf version is 1.5.3 and this is a production environment, so I don't have much flexibility to change the version. Is there a way to collect the Jolokia metrics only when my application is up and running?
I tried writing a script that checks whether Jolokia is running and emits a tag that I could then match in my agent input, but this didn't work:
[[inputs.exec]]
  commands = ["sh /local/1/home/svcegctp/telegraf/inputs/scripts/check_jolokia.sh"]
  timeout = "1s"
  data_format = "influx"
  name_override = "jvm_status"
  [inputs.exec.tags]
    running = "true"
(...)
[[inputs.jolokia2_agent]]
  # Add agents URLs to query
  urls = ["http://localhost:11722/jolokia"]
  [inputs.jolokia2_agent.tags]
    running = "true"
This is my script, check_jolokia.sh:
#!/bin/bash
if curl -s -u <username>:<password> http://localhost:11722/jolokia/version >/dev/null 2>&1; then
    echo "jvm_status running=true"
else
    echo "jvm_status running=false"
fi

airflow webserver suddenly stopped after long time of no issues, "No response from gunicorn"

I have had the airflow webserver -D daemon process (v1.10.7) running on a machine (CentOS 7) for a long time. I suddenly found that the webserver could no longer be accessed, and checking airflow-webserver.log I saw...
[airflow@airflowetl airflow]$ cat airflow-webserver.log
2020-10-23 00:57:15,648 ERROR - No response from gunicorn master within 120 seconds
2020-10-23 00:57:15,649 ERROR - Shutting down webserver
(nothing of note in airflow-webserver.err)
[airflow@airflowetl airflow]$ cat airflow-webserver.err
/home/airflow/.local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
The airflow.cfg values for the webserver section look like...
[webserver]
# The base url of your website as airflow cannot guess what domain or
# cname you are using. This is used in automated emails that
# airflow sends to point links to the right web server
#base_url = http://localhost:8080
base_url = http://airflowetl.co.local:8080
# The ip specified when starting the web server
web_server_host = 0.0.0.0
# The port on which to run the web server
web_server_port = 8080
# Paths to the SSL certificate and key for the web server. When both are
# provided SSL will be enabled. This does not change the web server port.
web_server_ssl_cert =
web_server_ssl_key =
# Number of seconds the webserver waits before killing gunicorn master that doesn't respond
web_server_master_timeout = 120
# Number of seconds the gunicorn webserver waits before timing out on a worker
#web_server_worker_timeout = 120
web_server_worker_timeout = 300
# Number of workers to refresh at a time. When set to 0, worker refresh is
# disabled. When nonzero, airflow periodically refreshes webserver workers by
# bringing up new ones and killing old ones.
worker_refresh_batch_size = 1
# Number of seconds to wait before refreshing a batch of workers.
worker_refresh_interval = 30
# Secret key used to run your flask app
secret_key = my_key
# Number of workers to run the Gunicorn web server
workers = 4
# The worker class gunicorn should use. Choices include
# sync (default), eventlet, gevent
worker_class = sync
Ultimately, I just restarted the process as a daemon again with airflow webserver -D (should I have deleted the old airflow-webserver.log and .err files first?), but I'm not sure what would make this happen, since it had run without problems for months before this.
Could anyone with more experience explain what could have happened after all this time and how I could prevent it in the future? Are there any issues with running DAGs, or anything else I should check, that this unexpected webserver shutdown may have caused?
I am experiencing the same issue, and it only started (very infrequently) after I changed the following two config parameters in the webserver section.
worker_refresh_interval = 120
workers = 2
However, my other parameters are also set quite differently from yours, so I'll share them here.
rbac = True
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_master_timeout = 600
web_server_worker_timeout = 600
default_ui_timezone = Europe/Amsterdam
reload_on_plugin_change = True
Comparing the two: your values for the two parameters I changed are still at the defaults (as mine were before I changed them), so the problem seems to come from a combination of several parameters rather than from those two alone.

Airflow metrics with prometheus and grafana

Does anyone know how to send metrics from Airflow to Prometheus? I'm not finding much documentation about it. I tried the airflow operator metrics on Grafana, but it doesn't show any metrics; all it says is "no data points".
By default, Airflow doesn't have any support for Prometheus metrics. There are two ways I can think of to get metrics into Prometheus:
1) Enable StatsD metrics and then export them to Prometheus using statsd exporter.
2) Install a third-party/open-source Prometheus exporter agent (e.g. airflow-exporter).
If you go with the second approach, the Airflow Helm Chart also provides support for that.
Edit
If you're using statsd exporter, here is a good resource for the Grafana dashboard and exporter config.
This is how it worked for me:
1) Run Airflow in Docker using this doc.
2) Add this configuration under the environment section of the docker-compose file downloaded in the previous step:
AIRFLOW__SCHEDULER__STATSD_ON: 'true'
AIRFLOW__SCHEDULER__STATSD_HOST: statsd-exporter
AIRFLOW__SCHEDULER__STATSD_PORT: 9125
AIRFLOW__SCHEDULER__STATSD_PREFIX: airflow
3) Run statsd_exporter:
docker run -d -p 9102:9102 -p 9125:9125 -p 9125:9125/udp \
    -v $PWD/statsd_mapping.yml:/tmp/statsd_mapping.yml \
    prom/statsd-exporter --statsd.mapping-config=/tmp/statsd_mapping.yml
Get the statsd_mapping.yml contents from here.
4) Run docker-compose up to start Airflow, then run a workflow; you should see the metrics at http://localhost:9102/metrics
If you installed your Airflow with statsd support:
pip install 'apache-airflow[statsd]'
you can expose Airflow statsd metrics in the scheduler section of your airflow.cfg file, something like this:
[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
Then you can install a tool called statsd_exporter, which captures StatsD-format metrics and converts them to Prometheus format, making them available at its /metrics endpoint for Prometheus to scrape.
There is a docker image available on DockerHub called astronomerinc/ap-statsd-exporter that already maps Airflow statsd metrics to Prometheus metrics.
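To check that the exporter is receiving anything before involving Airflow, you can send it a StatsD packet by hand. The sketch below builds a counter in the plain StatsD wire format (`name:value|c`) and sends it over UDP. The metric name `example_counter` and the helper names are made up for illustration; the host, port, and prefix mirror the airflow.cfg values above.

```python
import socket


def format_statsd_counter(prefix, name, value=1):
    """Build a StatsD counter packet, e.g. b'airflow.example_counter:1|c'."""
    return f"{prefix}.{name}:{value}|c".encode("ascii")


def send_statsd(payload, host="localhost", port=8125):
    """Send one StatsD packet over UDP (fire and forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    packet = format_statsd_counter("airflow", "example_counter")
    send_statsd(packet)
    print(packet.decode("ascii"))
```

If the exporter is listening on that port, the counter should then appear on its /metrics page, renamed according to whatever mapping config is in use.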
References:
https://airflow.apache.org/docs/stable/metrics.html
https://github.com/prometheus/statsd_exporter
https://hub.docker.com/r/astronomerinc/ap-statsd-exporter/tags

How to deploy a python WSGI Flask Application using nginx without manually running uwsgi?

This is where I am right now: the webpage can be accessed without any errors and without specifying a port, for example www.my-example.com.
However, this only works while I run the command "uwsgi --socket 0.0.0.0:4567 --protocol=http -w wsgi" on my server.
How can I automate this app deployment through nginx?
You can use something like Supervisor to automatically start uWSGI, restart it if it fails, and log stderr/stdout:
[program:app]
# emulates a virtualenv
directory = /srv/app/
environment = PATH="/srv/app/virtualenv/bin"
command = /srv/app/virtualenv/bin/uwsgi --ini /srv/app/config/uwsgi.ini
autostart = true
autorestart = true
user = app-user

Run R/Rook as a web server on startup

I have created a server using Rook in R - http://cran.r-project.org/web/packages/Rook
The code is as follows:
#!/usr/bin/Rscript
library(Rook)
s <- Rhttpd$new()
s$add(
    name="pingpong",
    app=Rook::URLMap$new(
        '/ping' = function(env){
            req <- Rook::Request$new(env)
            res <- Rook::Response$new()
            res$write(sprintf('<h1><a href="%s">Pong</a></h1>',req$to_url("/pong")))
            res$finish()
        },
        '/pong' = function(env){
            req <- Rook::Request$new(env)
            res <- Rook::Response$new()
            res$write(sprintf('<h1><a href="%s">Ping</a></h1>',req$to_url("/ping")))
            res$finish()
        },
        '/?' = function(env){
            req <- Rook::Request$new(env)
            res <- Rook::Response$new()
            res$redirect(req$to_url('/pong'))
            res$finish()
        }
    )
)
## Not run:
s$start(port=9000)
$ ./Rook.r
Loading required package: tools
Loading required package: methods
Loading required package: brew
starting httpd help server ... done
Server started on host 127.0.0.1 and port 9000 . App urls are:
http://127.0.0.1:9000/custom/pingpong
Server started on 127.0.0.1:9000
[1] pingpong http://127.0.0.1:9000/custom/pingpong
Call browse() with an index number or name to run an application.
$
And the process ends here.
It runs fine in the R shell, but I want to run it as a server on system startup.
So once start is called, R should not exit but wait for requests on the port.
How can I get R to simply wait or sleep rather than exit?
I can use the wait or sleep function in R to wait some N seconds, but that doesn't quite fit the bill.
Here is one suggestion:
First split the example you gave into (at least) two files: One file contains the definition of the application, which in your example is the value of the app parameter to the Rhttpd$add() function. The other file is the RScript that starts the application defined in the first file.
For example, if your application function is named pingpong and is defined in a file named Rook.R, then the Rscript might look something like:
#!/usr/bin/Rscript --default-packages=methods,utils,stats,Rook
# This script takes as a single argument the port number on which to listen.
args <- commandArgs(trailingOnly=TRUE)
if (length(args) < 1) {
    cat(paste("Usage:",
              substring(grep("^--file=", commandArgs(), value=TRUE), 8),
              "<port-number>\n"))
    quit(save="no", status=1)
} else if (length(args) > 1)
    cat("Warning: extra arguments ignored\n")
s <- Rhttpd$new()
app <- RhttpdApp$new(name='pingpong', app='Rook.R')
s$add(app)
s$start(port=args[1], quiet=FALSE)
suspend_console()
As you can see, this script takes one argument that specifies the listening port. Now you can create a shell script that will invoke this Rscript multiple times to start multiple instances of your server listening on different ports in order to enable some concurrency in responding to HTTP requests.
For example, if the Rscript above is in a file named start.r then such a shell script might look something like:
#!/bin/sh
if [ $# -lt 2 ]; then
    echo "Usage: $0 <start-port> <instance-count>"
    exit 1
fi
start_port=$1
instance_count=$2
end_port=$((start_port + instance_count - 1))
fifo=/tmp/`basename $0`$$
exit_command="echo $(basename $0) exiting; rm $fifo; kill \$(jobs -p)"
mkfifo $fifo
trap "$exit_command" INT TERM
cd `dirname $0`
for port in $(seq $start_port $end_port)
do
    ./start.r $port &
done
# block until interrupted
read < $fifo
The above shell script takes two arguments: (1) the lowest port-number to listen on and (2) the number of instances to start. For example, if the shell script is in an executable file named start.sh then
./start.sh 9000 3
will start three instances of your Rook application listening on ports 9000, 9001 and 9002, respectively.
The last line of the shell script reads from the fifo, which blocks and so keeps the script from exiting until it receives a signal. When one of the trapped signals arrives, the shell script kills all the Rook server processes it started and then exits.
Now you can configure a reverse proxy to forward incoming requests to any of the server instances. For example, if you are using Nginx, your configuration might look something like:
upstream rookapp {
    server localhost:9000;
    server localhost:9001;
    server localhost:9002;
}
server {
    listen your.ip.number.here:443;
    location /pingpong/ {
        proxy_pass http://rookapp/custom/pingpong/;
    }
}
Then your service can be available on the public Internet.
The final step is to create a control script with options such as start (to invoke the above shell script) and stop (to send it a TERM signal to stop your servers). Such a script will handle things such as causing the shell script to run as a daemon and keeping track of its process id number. Install this control script in the appropriate location and it will start your Rook application servers when the machine boots. How to do that will depend on your operating system, the identity of which is missing from your question.
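As a rough illustration of such a control script, here is a minimal Python sketch of the start and stop actions built around a pid file. It is only a sketch under stated assumptions: the pid-file path and the function names are made up, the launcher command is the start.sh invocation shown earlier, and hooking it into the boot sequence is left out because that part is OS-specific.

```python
import os
import signal
import subprocess

PID_FILE = "/tmp/rookapp.pid"           # hypothetical location
LAUNCHER = ["./start.sh", "9000", "3"]  # the shell script shown earlier


def start(command=LAUNCHER, pid_file=PID_FILE):
    """Launch the command in its own session and record its process id."""
    process = subprocess.Popen(command, start_new_session=True)
    with open(pid_file, "w") as handle:
        handle.write(str(process.pid))
    return process


def stop(pid_file=PID_FILE):
    """Send TERM to the recorded process, remove the pid file, return the pid."""
    with open(pid_file) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, signal.SIGTERM)
    os.remove(pid_file)
    return pid
```

Calling start() and later stop() from a small command-line wrapper gives the start/stop options described above. TERM is used because the shell script traps it and kills the Rook instances it started before exiting.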
Notes
For an example of how the fifo in the shell script can be used to take different actions based on received signals, see this stack overflow question.
Jeffrey Horner has provided an example of a complete Rook server application.
You will see that the example shell script above traps only INT and TERM signals. I chose those because INT results from typing control-C at the terminal and TERM is the signal used by control scripts on my operating system to stop services. You might want to adjust the choice of signals to trap depending on your circumstances.
Have you tried this?
while (TRUE) {
    Sys.sleep(0.5);
}

Openstack-Keystone failing to start

I've tried almost everything over the past couple of days to get keystone running, to no avail.
Everything is on the same host (the virtualization, OpenStack, and keystone), so I've tried setting up keystone with 127.0.0.1, with localhost, and with the IP of the host, all with no luck.
[DEFAULT] log_file = /var/log/keystone/keystone.log
admin_token = ***
bind_host = 192.168.33.11
public_port = 5000
admin_port = 35357
compute_port = 8774
# === Logging Options ===
# Print debugging output verbose = True
# Print more verbose output
# (includes plaintext request logging, potentially including passwords)
# debug = False
# Name of log file to output to. If not set, logging will go to stdout. log_file = keystone.log
# The directory to keep log files in (will be prepended to --logfile) log_dir = /var/log/keystone
# Use syslog for logging.
# use_syslog = False
# syslog facility to receive log lines
# syslog_log_facility = LOG_USER
# If this option is specified, the logging configuration file specified is
# used and overrides any other logging options specified. Please see the
# Python logging module documentation for details on logging configuration
# files. log_config = logging.conf
# A logging.Formatter log message format string which may use any of the
# available logging.LogRecord attributes.
# log_format = %(asctime)s %(levelname)8s [%(name)s] %(message)s
# Format string for %(asctime)s in log records.
# log_date_format = %Y-%m-%d %H:%M:%S
# onready allows you to send a notification when the process is ready to serve
# For example, to have it notify using systemd, one could set shell command:
# onready = systemd-notify --ready
# or a module with notify() method:
# onready = keystone.common.systemd
[sql] connection = mysql://keystone:***@localhost/keystone
# idle_timeout = 200
[identity] driver = keystone.identity.backends.sql.Identity
[catalog] template_file = /etc/keystone/default_catalog.templates driver = keystone.catalog.backends.sql.Catalog
# dynamic, sql-based backend (supports API/CLI-based management commands)
# driver = keystone.catalog.backends.sql.Catalog
# static, file-based backend (does *NOT* support any management commands)
# driver = keystone.catalog.backends.templated.TemplatedCatalog
# template_file = default_catalog.templates
[token] driver = keystone.token.backends.sql.Token
# driver = keystone.token.backends.kvs.Token
# Amount of time a token should remain valid (in seconds)
# expiration = 86400
I've enabled logging in the logging.conf file and set the level to DEBUG and INFO; however, nothing appears in the log files.
[root@* keystone]# service openstack-keystone restart
Stopping keystone: [FAILED]
Starting keystone: [ OK ]
[root@* keystone]# service openstack-keystone restart
Stopping keystone: [FAILED]
Starting keystone: [ OK ]
[root@* keystone]# ps aux | grep keystone
root 25580 0.0 0.0 103236 880 pts/1 S+ 09:41 0:00 grep keystone
[root@* keystone]#
Any ideas will be greatly appreciated. Thank you.
As I mentioned in the comment, I've never seen a config file with a section heading on the same line as a config option:
[DEFAULT] log_file = /var/log/keystone/keystone.log
I've always seen it like this instead:
[DEFAULT]
log_file = /var/log/keystone/keystone.log
However, I have no idea if this is related to your issue.
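If you want to check a config file for this layout quickly, a small script can list the lines where a section heading is followed by more text. This is only a sketch: the function name and the regular expression are my own, and it says nothing about how keystone's own parser treats such lines.

```python
import re

# A "[section]" heading followed by any other non-blank text on the same line.
INLINE_SECTION = re.compile(r"^\s*\[[^\]]+\]\s*\S")


def lines_with_inline_options(text):
    """Return the 1-based numbers of lines that mix a heading with other text."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if INLINE_SECTION.match(line)
    ]


if __name__ == "__main__":
    sample = (
        "[DEFAULT] log_file = /var/log/keystone/keystone.log\n"
        "[identity]\n"
        "driver = keystone.identity.backends.sql.Identity\n"
    )
    print(lines_with_inline_options(sample))
```

Note that a heading followed by a trailing comment would also be flagged, so treat the output as a list of lines to inspect by eye.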
To enable debug-level logging, make sure you set the following in /etc/keystone/logging.conf:
[logger_root]
level=DEBUG
Then try running keystone manually instead of as a service:
$ sudo -u keystone bash
$ HOME=/var/lib/keystone keystone-all --debug
Hopefully you'll see a relevant error message on standard out.
(I believe it will still send the logging to /var/log/keystone/keystone.log; I'm not sure how to get it to log to standard out when running manually like this.)
Add a valid token for admin_token. It should not be "***".
Check the line below:
[sql] connection = mysql://keystone:***@localhost/keystone
It should be something like:
connection = mysql://keystone:keystone@localhost/keystone
Refer to this url for an example keystone.conf file
http://docs.openstack.org/trunk/openstack-compute/install/yum/content/keystone-conf-file.html
I ran into this issue as well. I am running Ubuntu 12.04 LTS. What I found was that the service start command in /etc/init/keystone.conf uses start-stop-daemon to run the service, and it was written for a newer version of start-stop-daemon than the one on my box: the --chdir option is not accepted. Once I removed that line, keystone started right up.
Try running:
start-stop-daemon --start --chuid keystone --name keystone --exec /usr/bin/keystone-all
/etc/init/keystone.conf after the change:
description "Keystone API server"
author "Soren Hansen <soren@linux2go.dk>"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
exec start-stop-daemon --start --chuid keystone \
    --name keystone \
    --exec /usr/bin/keystone-all
Check whether your IP address matches HOST_IP=... in localrc.
This might be due to keystone not starting properly, so that port 35357 is not listening.
This seems to be anomalous behavior of the keystone service.
After a day of headaches over this issue, here are the steps that worked on my system for a Havana installation on Ubuntu 12.04 (kernel version 3.2.0-67-generic). Try these steps, preferably in the same order.
1) Remove the keystone package:
apt-get remove keystone
2) Reboot your system:
reboot
3) After the reboot, install keystone again:
apt-get install keystone
4) Check the status of the keystone service:
service keystone status
It should show start/running.
5) Now make the changes you need in /etc/keystone/keystone.conf.
After changing the conf file, DO NOT RESTART THE KEYSTONE SERVICE. Use the stop and start commands instead, which has the same effect as a restart:
service keystone stop
service keystone start
For further help, here is a dump of my CLI session:
http://pastebin.com/sduuFCL7
There are multiple problems with the Icehouse documentation and install. packstack is broken, so the only way to get started is to manually follow the upstream docs for your distro. It is very important to set up keystone correctly first before moving on, because other services rely on it.
The paste file /usr/share/keystone/keystone-dist-paste.ini should be copied to /etc/keystone/ so that it is accessible to the config scripts, like this:
cp /usr/share/keystone/keystone-dist-paste.ini /etc/keystone/
chown keystone:keystone /etc/keystone/*
Make sure to update keystone.conf with the new config_file value.
The documentation is wrong about the mysql connection: it should go in [sql] and not [database], so:
openstack-config --set /etc/keystone/keystone.conf sql connection mysql://keystone:PASSWD@controller/keystone
The name controller should resolve to whatever address mysql is bound to. For example, if [mysqld]/bind-address in /etc/my.cnf is 10.1.1.100, I add it to /etc/hosts like this:
10.1.1.100 controller
Make sure to uncomment log_file in keystone.conf to see what is happening.
I was facing a similar issue. I followed the steps below and the openstack-keystone service started.
Edit the /etc/keystone/keystone.conf file and complete the following actions:
In the [DEFAULT] section:
[DEFAULT]
admin_token = ADMIN_TOKEN
In the [database] section:
[database]
connection = mysql://keystone:KEYSTONE_DBPASS@controller/keystone
In the [token] section, configure the UUID token provider and SQL driver:
[token]
provider = keystone.token.providers.uuid.Provider
driver = keystone.token.persistence.backends.sql.Token
In the [revoke] section:
[revoke]
driver = keystone.contrib.revoke.backends.sql.Revoke
After making the above changes, populate the Identity service database using the command:
su -s /bin/sh -c "keystone-manage db_sync" keystone
Start the openstack-keystone service using the command below:
systemctl start openstack-keystone
