StatsD displaying "go_gc_duration" metrics but not the airflow metrics - airflow

We have installed the StatsD exporter by following these steps:
pip install 'apache-airflow[statsd]'
Added the below config details to airflow.cfg:
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
Then we downloaded the archive below:
https://github.com/prometheus/statsd_exporter/releases/download/v0.21.0/statsd_exporter-0.21.0.linux-amd64.tar.gz
And then executed the below command
./statsd_exporter --statsd.listen-udp localhost:8125 &
The statsd_exporter started, but it only shows "go_gc_duration" stats like the below; the Airflow stats are not being displayed.
go_gc_duration_seconds{quantile="0"} 1.1717e-05
go_gc_duration_seconds{quantile="0.25"} 1.5126e-05
go_gc_duration_seconds{quantile="0.5"} 1.9535e-05
go_gc_duration_seconds{quantile="0.75"} 4.3568e-05
go_gc_duration_seconds{quantile="1"} 0.000384508
go_gc_duration_seconds_sum 669.380082897
go_gc_duration_seconds_count 8.186899e+06
But the expected metrics should look like this:
# HELP airflow_collect_dags Metric autogenerated by statsd_exporter.
# TYPE airflow_collect_dags gauge
airflow_collect_dags 50.056391
# HELP airflow_dag_loading_duration_example_bash_operator Metric autogenerated by statsd_exporter.
# TYPE airflow_dag_loading_duration_example_bash_operator summary
airflow_dag_loading_duration_example_bash_operator{quantile="0.5"} 1.108e-06
airflow_dag_loading_duration_example_bash_operator{quantile="0.9"} 4.942e-06
Is there something we have to change or have missed? Any suggestions, please?

For reference (https://quay.io/repository/prometheus/statsd-exporter), you can run the exporter with debug logging to see which StatsD packets it actually receives:
./statsd_exporter --statsd.listen-udp localhost:8125 --log.level debug

The same thing was happening to me. In airflow.cfg there is another option called statsd_allow_list = airflow; I deleted it, restarted Airflow, and it's working.
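For illustration, a minimal sketch of the relevant airflow.cfg block (an assumption based on the settings quoted above: in Airflow 2.x these options live under [metrics], in 1.10.x under [scheduler]), with no statsd_allow_list entry at all:
[metrics]
# send Airflow metrics to the local statsd_exporter over UDP
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
# note: statsd_allow_list is deliberately omitted; when it is set, only metric
# names starting with the listed prefixes are sent, so a value like "airflow"
# can silently filter out every metric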


Dataflow from Colab issue

I'm trying to run a Dataflow job from Colab and getting the following worker error:
sdk_worker_main.py: error: argument --flexrs_goal: invalid choice: '/root/.local/share/jupyter/runtime/kernel-1dbd101c-a79e-432e-89b3-5ba68df104d7.json' (choose from 'COST_OPTIMIZED', 'SPEED_OPTIMIZED')
I haven't provided the flexrs_goal argument, and if I do it doesn't fix this issue. Here are my pipeline options:
beam_options = PipelineOptions(
    runner='DataflowRunner',
    project=...,
    job_name=...,
    temp_location=...,
    subnetwork='regions/us-west1/subnetworks/default',
    region='us-west1'
)
My pipeline is very simple, it's just:
with beam.Pipeline(options=beam_options) as pipeline:
    (pipeline
     | beam.io.ReadFromBigQuery(
         query=f'SELECT column FROM {BQ_TABLE} LIMIT 100')
     | beam.Map(print))
It looks like the command line args for the sdk worker are getting polluted by jupyter somehow. I've rolled back to the past two apache-beam library versions and it hasn't helped. I could move over to Vertex Workbench but I've invested a lot in this Colab notebook (plus I like the easy sharing) and I'd rather not migrate.
Figured it out. The PipelineOptions constructor will pull in sys.argv if no parameter is given for the first argument (called flags). In my case it was pulling in the command line args that my jupyter notebook was started with and passing them as Beam options to the workers.
I fixed my issue by doing this:
beam_options = PipelineOptions(
    flags=[],
    ...
)
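For completeness, a sketch of what the fixed options could look like in full; the project, job name, and bucket values below are placeholders, not taken from the original post:
from apache_beam.options.pipeline_options import PipelineOptions

# flags=[] stops PipelineOptions from falling back to sys.argv, which in
# Colab/Jupyter contains the kernel's own arguments (the kernel-*.json path).
beam_options = PipelineOptions(
    flags=[],
    runner='DataflowRunner',
    project='my-project',                 # placeholder
    job_name='colab-dataflow-test',       # placeholder
    temp_location='gs://my-bucket/tmp',   # placeholder
    subnetwork='regions/us-west1/subnetworks/default',
    region='us-west1',
)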

How to change the interval of a plugin in telegraf?

Using: telegraf version 1.23.1
That's the workflow: Telegraf => Influx => Grafana.
I am using Telegraf to check my metrics on a shared server. So far so good; I could already initialize the Telegraf uWSGI plugin and display the data of my running Django projects in Grafana.
Problem
Now I wanted to check some folder sizes too with the [[inputs.filecount]] Telegraf plugin, and this also works well. However, I do not need metrics every 10s for this plugin, so I changed the interval in the [[inputs.filecount]] plugin as mentioned in the documentation.
telegraf.conf
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "5s"
flush_interval = "10s"
flush_jitter = "0s"
#... PLUGIN
[[inputs.filecount]]
# set different interval for this input plugin every 10min
interval=“600s”
collection_jitter=“20s”
# Default from Doc =>
directories = ["/home/myserver/logs", "/home/someName/growingData, ]
name = "*"
recursive = true
regular_only = false
follow_symlinks = false
size = "0B"
mtime = "0s"
After restarting Telegraf with Supervisor, it crashed because it could not parse the new lines.
supervisor.log
Error running agent: Error loading config file /home/user/etc/telegraf/telegraf.conf: Error parsing data: line 208: invalid TOML syntax
These are the lines I added, because I thought that is how the docs describe it.
telegraf.conf
# set different interval for this input plugin every 10min
interval=“600s”
collection_jitter=“20s”
Question
So my question is: how can I change or set up the interval for a single input plugin in Telegraf?
Or do I have to use a different TOML syntax, like [[inputs.filecount.agent]] or so?
I assume that I do not have to change any output interval either? Even though it is currently 10s, if this input plugin only collects data every 600s it should not matter; some flush cycle will push the data to Influx.
How can i change or setup the interval for a single input plugin in telegraf?
As the link you pointed to shows, individual inputs can set the interval and collection_jitter options. There is no difference in the TOML syntax; for example, I can do the following for the memory input plugin:
[[inputs.mem]]
  interval = "600s"
  collection_jitter = "20s"
I assume that i do not have to change any output interval also?
Correct, these are independent of each other.
line 208: invalid TOML syntax
Knowing exactly what is on line 208 and around that line will hopefully resolve your issue and get you going again. Also make sure the quotes you used are correct: sometimes when people copy and paste, they end up with ” instead of ", which can cause issues like this!
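Applied to the [[inputs.filecount]] block from the question, a corrected sketch with straight ASCII quotes (and the missing closing quote added on the second directory) would be:
[[inputs.filecount]]
  # collect this input only every 10 minutes, with some jitter
  interval = "600s"
  collection_jitter = "20s"
  directories = ["/home/myserver/logs", "/home/someName/growingData"]
  name = "*"
  recursive = true
  regular_only = false
  follow_symlinks = false
  size = "0B"
  mtime = "0s"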

how to specify "disk=[....]" setting in "xl create" config?

I'm following the wiki https://help.ubuntu.com/community/Xen#Manually_Create_a_PV_Guest_VM
(section "
Set Up Initial Guest Configuration
")
I downloaded the netboot initrd.gz from https://mirror.arizona.edu/ubuntu//ubuntu/dists/bionic/main/installer-amd64/current/images/netboot/xen/
But in the .cfg, what should I specify for the "disk =" line? My host box is not using LVM, so I'll have to use "file-backed storage" for the PV disk image (https://wiki.xenproject.org/wiki/Storage_options; indeed this worked when I gave --dir= instead of --lvm= when running the xen-create-image command in https://wiki.xenproject.org/wiki/Xen_Project_Beginners_Guide).
here is my current config:
yy@yy-70A4000HUX:~/ub_xen$ cat ub_xen.cfg
name = "ubud1"
kernel = "/home/yy/ub_xen/vmlinuz"
ramdisk = "/home/yy/ub_xen/initrd.gz"
#bootloader = "/usr/lib/xen-4.4/bin/pygrub"
memory = 1024
vcpus = 1
# Custom option for Open vSwitch
vif = [ 'bridge=xenbr0' ]
disk = [ 'vdev=hda,target=/home/yy/ub_xen/images' ]
# You may also consider some other options
# [[http://xenbits.xen.org/docs/4.4-testing/man/xl.cfg.5.html]]
yy@yy-70A4000HUX:~/ub_xen$
I ran the command with sudo xl create -c ub_xen.cfg
This worked fine at first, giving me the regular install process on the console, pulling install files from the remote archive, but when it comes to the disk partitioning step, it shows me a "SCSI" partitioning choice with no volumes/partitions/disks to choose from.
I guess this is because I'm not setting the right value for the "disk = [ ]" option. What should I use here if I use file-backed storage for PV (just like VMware does)?
thanks a lot
Yang
Found it, huge thanks to the author here: https://www.systutorials.com/create-and-manage-virtual-machines-on-xen/
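For anyone hitting the same wall, a sketch of a file-backed setup (the image path and 10 GB size are example values, not taken from the linked article):
# create a sparse 10 GB raw image for the guest
truncate -s 10G /home/yy/ub_xen/ubud1.img
Then point the disk line in ub_xen.cfg at that image:
disk = [ 'format=raw,vdev=xvda,access=w,target=/home/yy/ub_xen/ubud1.img' ]
With a raw file as the backend, the installer should see a plain xvda disk to partition instead of an empty SCSI choice.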

How to set the value of a SettingKey based on different sbt commands?

There's the command sbt flywayMigrate from flywaydb.org. The command requires us to set flywayUrl, flywayUser, and flywayPassword beforehand. So far so good.
Now I want to be able to use sbt flywayMigrate for two different environments; their variables should be different.
I tried to make two new commands: sbt flywayMigrateDev and sbt flywayMigrateProd. I couldn't figure out how to connect the new commands to flywayMigrate.
I tried creating a new scope. But I couldn't figure out how to wire the variables and tasks properly.
I wonder if anyone can give me an example on how to do this. I'd like to see a code example.
We can simplify the problem to:
There's the command sbt flywayMigrate that depends on flywayUrl. How do we allow the command to use different flywayUrls depending on which sbt command is called (or any other way is good, too)?
Thank you!
You should use sbt configurations for this.
Example .sbt file contents:
// Set up your configs.
lazy val prodConfig = config("prod")
lazy val devConfig = config("dev")
// Set up any configuration that's common between dev and prod.
val commonFlyway = Seq(
  // For the sake of example, a couple of shared settings.
  flywayUser := "pg_admin",
  flywayLocations := Seq("filesystem:migrations")
)
// Set up prod and dev.
inConfig(prodConfig)(flywayBaseSettings(prodConfig) ++ commonFlyway)
flywayUrl.in(prodConfig) := "jdbc:etc:proddb.somecompany.com"
// Or however you want to load your production password.
flywayPassword.in(prodConfig) := sys.env.getOrElse("PROD_PASSWD", "(unset)")
inConfig(devConfig)(flywayBaseSettings(devConfig) ++ commonFlyway)
flywayUrl.in(devConfig) := "jdbc:etc:devdb.somecompany.com"
flywayPassword.in(devConfig) := "development_passwd"
Now you can run prod:flywayMigrate and dev:flywayMigrate to migrate production and development, respectively.
See the Flyway docs page for other examples.
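If you specifically want the sbt flywayMigrateDev and sbt flywayMigrateProd commands from the question, one option (a sketch assuming the configs defined above and sbt's legacy config:task syntax) is to alias them to the scoped tasks:
// Map the wished-for command names onto the scoped flywayMigrate tasks.
addCommandAlias("flywayMigrateDev", "dev:flywayMigrate")
addCommandAlias("flywayMigrateProd", "prod:flywayMigrate")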

how to run arc diff in a script, without prompting for a message

Phabricator's Arcanist command-line tool allows you to create a "diff" for review. This is useful because you can quickly generate a diff which your colleagues can review.
Normally, running arc diff master, for example, will prompt you for a diff message, a test plan, and some other information, and then create a diff on Phabricator.
However, I would like to run arc diff from a continuous integration server, assuming yes to all questions and passing the message and test plan as arguments to the command. What I have now is:
arc diff master --allow-untracked
Still, it assumes it is being called by a human user and asks for a message, which fails when called from a continuous integration server. How can I skip the prompts?
I think what you are looking for is the --verbatim option.
Assuming the changes are committed so that there is a commit message, you can run a command like:
arc diff --verbatim --reviewers xxxx --uncommitted --allow-untracked
This implies you have set the test plan to optional; otherwise you have to specify it as well.
Finally you can also read revision info from a file using --message-file.
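As a rough sketch of how those flags might be combined in a CI step (the revision.txt file name and the lint/unit skipping flags are assumptions, not part of the answer above):
# non-interactive diff from CI: take title/summary/test plan from a file,
# and skip running lint and unit tests so nothing waits for input
arc diff origin/master --allow-untracked --message-file revision.txt --nolint --nounit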
Another approach would be:
Create a Diff (but not a rev) with arc diff --raw-command "git diff origin/master"
Read the result to get the diff Id
Use the createrevision conduit call as described here to create the revision:
https://secure.phabricator.com/conduit/method/differential.createrevision/
The best practice is to prepare a template file like this one; it can be named msg.conf:
${title}
Summary:
${summary_content}
Reviewers:
${reviewers}
Subscribers:
RBA-DEV
Test Plan:
${test_plan}
Then generate the content you need to fill this template, and run this command:
arc diff --create --allow-untracked --skip-binaries --message-file msg.conf origin/master
