I'm using Graphite/Grafana to record the sizes of all collections in a MongoDB instance. I wrote a simple (work-in-progress) Python script to do so:
#!/usr/bin/python
from pymongo import MongoClient
import socket
import time

statsd_ip = '127.0.0.1'
statsd_port = 8125

# create a UDP socket for statsd
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

client = MongoClient(host='12.34.56.78', port=12345)
db = client.my_DB

# get the collection list once per runtime
collections = db.collection_names()  # list_collection_names() in newer pymongo
sizes = {}

# main loop: sample sizes, push to statsd, sleep
while True:
    # get collection size per name
    for collection in collections:
        sizes[collection] = db.command('collstats', collection)['size']
    # write to statsd
    for name, size in sizes.items():
        message = "collection_%s:%d|c" % (name, size)
        sock.sendto(message.encode(), (statsd_ip, statsd_port))
    time.sleep(60)
This properly shows all of my collection sizes in Grafana. However, I want to get a rate of change on these sizes, so I built the following Graphite query in Grafana:
derivative(statsd.myHost.collection_myCollection)
And the graph shows up totally blank. Any ideas?
FOLLOW-UP: When selecting a time range greater than 24h, all data similarly disappears from the graph. I can't for the life of me figure that one out.
Update: this was due to the fact that my collectd was configured to send samples every second, while the statsd plugin for collectd was receiving data every 60 seconds, so I ended up with None for most data points.
I discovered this by checking the raw data in Graphite by appending &format=raw to the end of a graphite-api query in a browser, which gives you the value of each data point as a comma-separated list.
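For example (hypothetical Graphite host, metric name from my setup above):

http://graphite.example.com/render?target=statsd.myHost.collection_myCollection&from=-1h&format=raw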
The temporary fix for this was to surround the graphite query with keepLastValue(60). This however creates a stair-step graph, as the value for each None (60 values) becomes the last valid value within 60 steps. Graphing a derivative of this then becomes a widely spaced sawtooth graph.
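For reference, the patched query looked like this (metric name from my setup above):

derivative(keepLastValue(statsd.myHost.collection_myCollection, 60))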
In order to fix this, I will probably go on to fix the flush interval on collectd or switch to a standalone statsd instance and configure as necessary from there.
I am trying to write a recursive web API call in Power BI to collect all 27,515 records; the OData feed has a limit of 1,000 rows per request. I need this data to be refreshable in the Power BI service, so these 28 requests via M code cannot be formulated in a dynamic way: the service only allows refresh for static (non-dynamic) data sources. Below I share two pieces of M code: 1. one that is considered a dynamic data source (not what I need, but it pulls all 27,515 records correctly), and 2. one that is a static data source (which returns an incorrect count of 19,000 records, but is the type of source I need for refresh in the service).
Noteworthy: upon the initial API call I receive a table "d" (pictured below) with two rows: one titled "results", which contains all of the data (1,000 rows) I need per request, and a second titled "__next", which holds the next API URL with an embedded skiptoken from the current call's worth of data. This skiptoken tells the API which rows to skip so that the next request doesn't deliver data we have already collected.
[Image: Table d, the initial table]
M Code for Dynamic Data Source: This dynamic data source is pulling the correct number of records in 28 requests (up to 1,000 records per request) totaling 27,515 rows.
= List.Generate(
    () => Json.Document(Web.Contents("https://my_instance/odata/v2/Table?$format=JSON&$paging=snapshot"))[d],
    each Record.HasFields(_, "results") = true,
    each try Json.Document(Web.Contents(_[__next]))[d] otherwise [df = [__next = "dummy_variable"]])
M Code for Static Data Source: This static data source is the type that I need for refreshing in PBI service (I confirmed it does refresh in the service), but is returning an incorrect number of rows, 19,000 versus 27,515. This code is calling 19 requests versus the needed 28 requests. I believe the error lies in the Query portion where I am attempting to call the next API URL with the skiptoken from the previous request.
= List.Generate(
    () => Json.Document(Web.Contents("https://my_instance/odata/v2/Table?$format=JSON&$paging=snapshot"))[d],
    each Record.HasFields(_, "results") = true,
    each try Json.Document(Web.Contents("https://my_instance/odata/v2/Table?$format=JSON&$paging=snapshot", [Query = [q = _[__next]]]))[d] otherwise [df = [__next = "dummy_variable"]])
Does anyone see an error in the static code for iteratively calling each new request from table [d], which has one row labeled [results] (all the data) and another labeled [__next] (the next URL with the skiptoken from the previous API call)?
To be clear, in Web.Contents the URL must be static, but you can freely use dynamic components in the RelativePath field of the optional options record (as in this simple example function). This is how you generate dynamic web API queries that work in the service without the error you are seeing with dynamic sources:
(current_page as text) =>
let
    data = Web.Contents(
        "https://my_instance/api/v2/endpoint", // static!
        [
            RelativePath = "?page=" & current_page // dynamic!
        ]
    )
in
    data
So if you can split out the relative path of your __next parameter and feed it into such a function, it will be OK for automatic refreshes in the Power BI service.
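For example, a minimal sketch of that split (fetchNext is a hypothetical helper; it assumes every __next URL shares the same static "https://my_instance" prefix, and Text.AfterDelimiter is a standard Power Query function):

fetchNext = (next_url as text) =>
    Json.Document(
        Web.Contents(
            "https://my_instance", // static!
            [RelativePath = Text.AfterDelimiter(next_url, "my_instance")] // dynamic!
        )
    )[d]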
Consider this example - you need to load table1 from source database, do some generic transformations (like convert time zones for timestamped columns) and write resulting data into Snowflake. This is an easy one and can be implemented using 3 dagster ops.
Now, imagine you need to do the same thing but with 100s of tables. How would you do it with dagster? Do you literally need to create 100 jobs/graphs? Or can you create one job, that will be executed 100 times? Can you throttle how many of these jobs will run at the same time?
You have two main options for doing this:
Use a single job with Dynamic Outputs:
With this setup, all of your ETLs would happen in a single job. You would have an initial op that would yield a DynamicOutput for each table name that you wanted to do this process for, and feed that into a set of ops (probably organized into a graph) that would be run on each individual DynamicOutput.
Depending on what executor you're using, it's possible to limit the overall step concurrency (for example, the default multiprocess_executor supports this option).
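A minimal sketch of that fan-out shape (op names and the table list are illustrative; DynamicOut, DynamicOutput, and .map are the relevant dagster APIs):

from dagster import DynamicOut, DynamicOutput, job, op

@op(out=DynamicOut())
def table_names():
    # fan out: one DynamicOutput per table to process
    for name in ["table1", "table2", "table3"]:
        yield DynamicOutput(name, mapping_key=name)

@op
def etl_one_table(table_name: str):
    # extract, transform, and load a single table here
    ...

@job
def fan_out_etl():
    table_names().map(etl_one_table)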
Create a configurable job (I think this is more likely what you want)
from dagster import job, op
import pandas as pd

@op(config_schema={"table_name": str})
def extract_table(context) -> pd.DataFrame:
    table_name = context.op_config["table_name"]
    # do some load...
    return pd.DataFrame()

@op
def transform_table(table: pd.DataFrame) -> pd.DataFrame:
    # do some transform...
    return table

@op(config_schema={"table_name": str})
def load_table(context, table: pd.DataFrame):
    table_name = context.op_config["table_name"]
    # load to snowflake...

@job
def configurable_etl():
    load_table(transform_table(extract_table()))

# this is what the configuration would look like to extract from table
# src_foo and load into table dest_foo
configurable_etl.execute_in_process(
    run_config={
        "ops": {
            "extract_table": {"config": {"table_name": "src_foo"}},
            "load_table": {"config": {"table_name": "dest_foo"}},
        }
    }
)
Here, you create a job that can be pointed at a source table and a destination table by giving the relevant ops a config schema. Depending on those config options, (which are provided when you create a run through the run config), your job will operate on different source / destination tables.
The example shows explicitly running this job using python APIs, but if you're running it from Dagit, you'll also be able to input the yaml version of this config there. If you want to simplify the config schema (as it's pretty nested as shown), you can always create a Config Mapping to make the interface nicer :)
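For reference, the same run config in the yaml form Dagit accepts (a direct translation of the dict above):

ops:
  extract_table:
    config:
      table_name: src_foo
  load_table:
    config:
      table_name: dest_foo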
From here, you can limit run concurrency by supplying a unique tag to your job, and using a QueuedRunCoordinator to limit the maximum number of concurrent runs for that tag.
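A sketch of that setup in the instance's dagster.yaml (the tag key/value and limit are illustrative; the job would be tagged to match, e.g. @job(tags={"etl": "snowflake"})):

run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      - key: "etl"
        value: "snowflake"
        limit: 5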
I'm using Qt 5.11.1 with Qt OPC UA and the open62541 backend to create an OPC UA client application.
Is it at all possible to make a request for historical data with the Qt OPC UA module? E.g. get the values for this variable (node) between these two times.
My server application has this functionality (FreeOpcUa) as I can set variables to be 'historized' and view previously stored values. But I cannot see an obvious solution to access this data easily on the client side.
At the moment I'm considering exposing a function on my server for each variable which would take in a start and end timestamp and manually gather the values and format them into a string or some object for the client to use.
Would anyone have any ideas or thoughts on a better way to do this?
I'm not overly familiar with OPC-UA or Qt so may just be missing something obvious.
To use the OPC UA history feature, both your OPC UA client and server must support the HistoryRead/HistoryWrite services.
I don't know the status of the feature for your server, but for your client backend (open62541) those services are not completely functional yet. Check the FEATURES document in their GitHub repository.
Apparently they should be fully functional in the upcoming 0.4 release.
The Freeopcua-Server supports historization (https://python-opcua.readthedocs.io/en/latest/server.html).
You have to enable historization per node (i.e. per variable you want to historize):
historize_node_data_change(node, period=datetime.timedelta(7), count=0)

    Start historizing supplied nodes

    Args:
        node: node or list of nodes that can be historized (variables/properties)
        period: time delta to store the history; older data will be deleted from the storage
        count: number of changes to store in the history
e.g. if you want to serve the history of the temperature, you have to call server.historize_node_data_change(Temp, period=datetime.timedelta(7), count=0) after the server has been started:
[Python]:
from opcua import Server
from random import randint
import datetime
import time

server = Server()
server.set_endpoint("opc.tcp://192.168.178.20:443")
addspace = server.register_namespace("OPCUA_BurkhardsTemperatureSensor")

# build a Thermometer_1 object with Temperature/Time/SerialNr variables
node = server.get_objects_node()
Param = node.add_object(addspace, "Thermometer_1")
Temp = Param.add_variable(addspace, "Temperature", 0)
Temp.set_writable()
Time = Param.add_variable(addspace, "Time", 0)
Time.set_writable()
SerialNr = Param.add_variable(addspace, "SerialNr.", "2323784628346")

server.start()
# enable history for the Temperature node, keeping 7 days of changes
server.historize_node_data_change(Temp, period=datetime.timedelta(7), count=0)

while True:
    Temperature = randint(10, 50)
    TIME = datetime.datetime.now()
    print(Temperature, TIME)
    Temp.set_value(Temperature)
    Time.set_value(TIME)
    time.sleep(2)
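On the client side, python-opcua can read that history back via read_raw_history. A minimal sketch (the browse path and namespace index 2 are assumptions based on the server above):

from datetime import datetime, timedelta
from opcua import Client

client = Client("opc.tcp://192.168.178.20:443")
client.connect()
try:
    # browse to the Temperature variable created by the server above
    objects = client.get_objects_node()
    temp = objects.get_child(["2:Thermometer_1", "2:Temperature"])
    # fetch the stored values from the last 10 minutes
    history = temp.read_raw_history(datetime.utcnow() - timedelta(minutes=10))
    for dv in history:
        print(dv.Value.Value, dv.SourceTimestamp)
finally:
    client.disconnect()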
Suppose I have a metric named a.b.c.count. I am trying to write a python script which reads the latest value of the metric a.b.c.count in graphite.
I went through the docs and figured out that metrics can be retrieved from Graphite via HTTP (e.g. with curl) using render functions: http://graphite.readthedocs.org/en/0.9.13-pre1/functions.html.
But I'm still unable to figure out how to get just the latest value.
I haven't seen a way to ask Graphite for a single value, but you can ask for a summary of values over a configurable period, and take the last one. (This is just for minimizing the data returned, you could as easily pull out the last value from any series in a given timeframe.) Example render parameters:
target=summarize(a.b.c.count,'1hour','last')&from=-1h&format=json
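Fetched with curl, that looks like this (hypothetical Graphite host):

curl "http://graphite.yourdomain.com/render/?target=summarize(a.b.c.count,'1hour','last')&from=-1h&format=json"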
The JSON returned will look like this:
[{"target": "summarize(a.b.c.count, \"1hour\", \"last\")",
"datapoints": [[5.1333330000000004, 1442160000],
[5.5499989999999997, 1442163600]]}]
Here is a Python snippet to retrieve and parse this, using the 'requests' HTTP library:

import requests

r = requests.get("http://graphite.yourdomain.com/render/?" +
                 "target=summarize(a.b.c.count,'1hour','last')&from=-1h&format=json")
print(r.json()[0]['datapoints'][-1][0])
I'm using Graphite and Collectd to monitor my server. In particular, I'm using the tail plugin to count failed SSH logins. I'm using a counter for this metric, so I expect to see data points like 1, 2, 3, 0, etc. However, what I'm seeing is 0.1, 0.2, 0.3, 0, etc. It seems like Graphite is providing counts per second: my retention policy is one data point every 10 seconds for two hours, so 1 failed login per 10 seconds shows up in the graph as 0.1 per second.
Furthermore, when I scale out to the next retention level, the numbers get adjusted accordingly: 1 failed login, which was shown as 0.1, is now shown as much less than that: 0.017 or so.
I don't think this is related to the aggregation method used: even the finest data is off. How can I get Graphite to treat this metric as a pure, raw, counter?
Here's my storage-schemas.conf (the retention policy):
[my_server]
pattern = .*
retentions = 10s:2h,1m:2d,30m:400d
Here's my configuration of the collectd tail plugin:
<Plugin "tail">
<File "/var/log/auth.log">
Instance "auth"
<Match>
Regex "sshd[^:]*: Failed password"
DSType "CounterInc"
Type "counter"
Instance "sshd-invalid_user"
</Match>
</File>
</Plugin>
And here's my configuration of the write_graphite plugin (which sends data to Graphite):
<Plugin write_graphite>
<Node "my_server_name">
Host "localhost"
Port "2003"
Protocol "tcp"
LogSendErrors true
Prefix "collectd."
#Postfix ""
StoreRates true
AlwaysAppendDS false
EscapeCharacter "_"
</Node>
</Plugin>
I tried setting StoreRates false for the write_graphite plugin, but this didn't work. It did change the behaviour: when I performed a single failed SSH login, the metric showed 1. However, it didn't drop back down to 0. When I performed two more failed logins, the metric popped up to 3.
Also of interest: I've also loaded the users plugin, which simply shows the number of users logged in, and it's working great: it shows 1 when I SSH in, 2 when I SSH in again, and back to 1 when I exit one SSH session. This holds for both settings of StoreRates. So it seems like what I want is possible somehow, though maybe not with the tail plugin.
The SSH logins with StoreRates false, along with the correct behaviour for users logged in, were both visible in my graphs.
Any ideas? Thanks!
You are asking the system to count the number of events. And this is exactly what it's doing: it's counting the number of failed logins since its startup. Whether you're using StoreRates or not simply changes the way that information is displayed: as a rate or as the raw counter. A counter may never decrease! What you're actually asking for is a counter that resets itself upon reading: count the number of failed logins since the last time collectd checked.
As it happens the ABSOLUTE data source type in rrdtool can be used to achieve this, but that won't help you.
Step back, and think about what you're trying to achieve: the number of failed logins per second seems to me like a perfectly sane metric!
Although swissunix's answer is very helpful, to achieve the behaviour I was looking for, I ended up using Logster instead of Collectd. With Logster, you write the bit of code that parses the file as well as the bit that returns the metric. So although dividing a count by the time is common with Logster, you don't have to do this if you don't want to: there's lots of flexibility.
I've put my parsers here: https://github.com/camlee/logster-parsers
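For illustration, a minimal parser in that style (it uses logster's LogsterParser/MetricObject interface; the class name, metric name, and regex, which matches the same sshd line as the collectd config above, are mine):

import re
from logster.logster_helper import LogsterParser, MetricObject

class FailedSshParser(LogsterParser):
    def __init__(self, option_string=None):
        self.failed = 0
        self.reg = re.compile(r'sshd[^:]*: Failed password')

    def parse_line(self, line):
        # called once per new line in the tailed log
        if self.reg.search(line):
            self.failed += 1

    def get_state(self, duration):
        # report the raw count for this period, not a rate
        return [MetricObject("ssh.failed_logins", self.failed, "logins")]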
If you set StoreRates to false, in graphite you can apply the derivative function to the ever-increasing counter to get your rate of increase per retention interval, which would match your requirement.
E.g. in your example of reporting 1 failed login and then 2 more, you saw the values 1 and 3. The derivative is 1 and 2: the failed logins per interval that graphite tracks.
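With the config above, that query would look something like this (the exact metric path depends on your collectd hostname and naming scheme):

derivative(collectd.my_server_name.tail-auth.counter-sshd-invalid_user)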