Aggregate data in Telegraf only by time, not by tags - telegraf

I use Telegraf to collect data and then write it to InfluxDB, and I would like to aggregate the data into one row. As far as I know, Telegraf has a lot of plugins, including plugins to aggregate data, and I use the BasicStats Aggregator Plugin for this purpose. But it has behavior that is unexpected (for me): it aggregates rows by their tags, while I need aggregation by timestamp only. How can I make this plugin aggregate rows only by their timestamps? For example, I have the following rows:
timestamp=1 tag1=foo field1=3
timestamp=1 tag1=bar field1=1
timestamp=1 tag1=baz field1=4
and for them I would like to get the following aggregated row:
timestamp=1 sum=8
Thank you in advance

I believe you can't aggregate by timestamp alone with Telegraf's conventional processors or aggregators (like BasicStats), because they aggregate each series (tag set) separately, and because of InfluxDB's nature: it is a time-series database and it is indexed by time. But you can aggregate the data with an InfluxDB query:
SELECT mean("field1") FROM "statsdemo"."telegraf"."my_metric" WHERE time > now()-5m AND time < now() GROUP BY time(2000ms) FILL(null)
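Since the question asks for a sum rather than a mean, the same query shape should also work with sum() (untested sketch, reusing the measurement and retention policy names from above):
SELECT sum("field1") FROM "statsdemo"."telegraf"."my_metric" WHERE time > now()-5m AND time < now() GROUP BY time(2000ms) FILL(null)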
Another approach could be using the execd aggregator. Just write a script in bash or your favorite programming language that reads metrics from STDIN, aggregates them by timestamp, and prints the result to STDOUT following the influx line protocol. I have never used this aggregator before, but I think it may work.
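For what it's worth, here is a minimal sketch of such a script in bash. It is untested and rests on a few assumptions: that the execd aggregator marks the end of each period with an empty line on STDIN, that each row carries a single integer field (like field1=3), and that the line protocol contains no escaped spaces. The script path and measurement name are hypothetical.
# telegraf.conf
[[aggregators.execd]]
  command = ["/path/to/sum_by_timestamp.sh"]

#!/usr/bin/env bash
# sum_by_timestamp.sh: sum the field value of every row sharing a timestamp
declare -A sums
while IFS= read -r line; do
  if [ -z "$line" ]; then
    # empty line = end of the aggregation period (assumption); flush and reset
    for ts in "${!sums[@]}"; do
      printf 'my_metric sum=%si %s\n' "${sums[$ts]}" "$ts"
    done
    sums=()
    continue
  fi
  # line protocol layout: "measurement,tags fields timestamp"
  ts=${line##* }                          # last token is the timestamp
  fields=$(awk '{print $(NF-1)}' <<< "$line")
  value=${fields#*=}                      # assumes a single field, e.g. field1=3
  value=${value%i}                        # strip the integer suffix if present
  sums[$ts]=$(( ${sums[$ts]:-0} + value ))
done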

Related

How to count number of rows in a sas table using x command

Does anyone know how to count the number of rows in a SAS table using the x command? I need to achieve this through Unix. I tried wc -l, but it gave me a different result than what proc sql count(*) gives me.
Lee has the right idea here: SAS datasets are stored in a proprietary binary format in which line breaks are not necessarily row separators, so you cannot use tools like wc to get an accurate row count. Using SAS itself is one option, or you could potentially use other tools, such as the Python pandas.read_sas function, to load the table if you don't have SAS installed on your Unix server.
Writing a script to do this for you is outside the scope of this answer, so have a go at writing one yourself and post a more specific question if you get stuck.

Importing option chain data from Bloomberg

I would like to import from Bloomberg into R, for a specified day, the entire option chain for a particular stock, i.e. all expiries and strikes for the exchange-traded options. I am able to import the option chain for a non-specified day (today):
bbgData <- bds(connection,sec,"OPT_CHAIN")
Where connection is a valid Bloomberg connection and sec is a Bloomberg security ticker such as "TLS AU Equity"
However, if I add extra fields it doesn't work, e.g.
bbgData <- bds(connection, sec,"OPT_CHAIN", testDate, "OPT_STRIKE_PX", "MATURITY", "PX_BID", "PX_ASK")
bbgData <- bds(connection, sec,"OPT_CHAIN", "OPT_STRIKE_PX", "MATURITY", "PX_BID", "PX_ASK")
Similarly, if I switch to using the historical data function it doesn't work
bbgData <- dateDataHist <- bdh(connection,sec,"OPT_CHAIN","20160201")
I just need the data for a single, specified day, including the additional fields.
Hint: I think the issue is that every field following "OPT_CHAIN" depends on the result of "OPT_CHAIN" (for example, the strike price given the code in "OPT_CHAIN"), but I am unsure how to introduce this conditionality into the R Bloomberg query.
It's better to use the field CHAIN_TICKERS and related overrides when retrieving option data for a given underlying from Bloomberg. You can, for example, request points for a given moneyness by getting CHAIN_TICKERS with an override of CHAIN_STRIKE_PX_OVRD equal to 90%-110%.
In either case you need to use the tickers returned by your first request in a second request if you want to retrieve additional data. So:
option_tickers <- bds("TLS AU Equity", "CHAIN_TICKERS",
                      overrides = c(CHAIN_STRIKE_PX_OVRD = "90%-110%"))
option_prices <- bdp(sapply(option_tickers, paste, "equity"), c("PX_BID", "PX_ASK"))

Format date time in cts:element-values

I want to format a dateTime within cts:element-values itself. Can anyone help me with this?
I have a dateTime format string:
let $date-format := "[Y0001]-[M01]-[D01]T[h01]:[m01]:[s01].[f1]"
and I want to use it in a query like this:
cts:element-values(
  xs:QName($field),
  (),
  ($direction),
  cts:and-query((
    cts:collection-query("urn:iddn:collections:searchable"),
    cts:query($cts-query)
  ))
)
Provided $field is of type dateTime.
You can accomplish this by writing a User-Defined Function. UDFs are run as map/reduce, so they are very fast even with a large data set. I wrote an example UDF to create a day-of-the-week facet based on dateTime data. That example is based on MarkLogic 6, but should still work in MarkLogic 8.
The tricky part is that you'll have to write it in C++. Full documentation is in the User-Defined Functions section of the MarkLogic documentation.

Does `sqlite3` support loops?

I wrote the little bash script below and it works as intended, but I added a couple of comments and newlines for readability, which breaks the code. Removing the comments and newlines should make it a valid script.
### read all measurements from the database and list each value only once
sqlite3 -init /tmp/timeout /tmp/testje.sqlite \
'select distinct measurement from errors order by measurement;' |
### remove the first line of stdout as this is a notification rather than intended output
sed '1d' |
### loop though all found values
while read error; do
### count the number of occurences in the original table and print that
sqlite3 -init /tmp/timeout /tmp/testje.sqlite \
"select $error,count( measurement ) from errors where measurement = '$error' ;"
done
The result is like this:
134 1
136 1
139 2
159 1
Question: Is it possible with sqlite3 to translate the while-loop into SQL statements? In other words, does sqlite3 support some sort of for-loop to iterate over the results of a previous query?
Now I know sqlite3 is a very limited database, and chances are that what I want is just too complex for it. I've been searching for it, but I'm really a database nitwit, and the hits I get so far are either about a different database or solve an entirely different problem.
The easiest answer (which I hope is not the case, by the way) is 'sqlite3 does not support loops'.
SQLite does not support loops. Here is the entire language; you'll notice that structured programming is completely absent.
However, that's not to say you can't get what you want without loops, using sets or some other SQL construct instead. In your case it might be as simple as:
select measurement, count( measurement ) from errors GROUP BY measurement
That will give you a list of all measurements in the errors table and a count of how often each one occurs.
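Plugged back into the invocation from the question, the whole pipeline collapses into a single call (the sed step is kept from the original script only to drop the notification line that -init prints):
sqlite3 -init /tmp/timeout /tmp/testje.sqlite \
'select measurement, count( measurement ) from errors group by measurement;' |
sed '1d'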
In general, SQL engines are best used by expressing your query as a single (sometimes complex) SQL statement, which is submitted to the engine for optimization. In your example you have already codified some decisions about the strategy used to get the data from the database; it is a tenet of SQL that the engine is better able to make those decisions than the programmer.

Sequence number inside a txt file in UNIX

I want to generate a unique sequence number for each row in a file in Unix. I cannot make an identity column in the database, as other sources also insert data into it. I tried using NR in awk, but since I have filters in my script it may skip rows in the file, so I may not get sequential numbers.
My requirements are: the sequence number needs to be persistent, since I receive this file every day and it should start from where I left off; also, the number needs to be preceded by "EMP_" for each line in the file.
Please suggest.
Thanks in advance.
To obtain a unique id in Unix you may use a file to store and read the value; however, this method is tedious and requires a file-locking mechanism for the file I/O. The easiest way is to use the date and time to obtain a unique id, for example:
#!/bin/sh
uniqueVal=$(date '+%Y%m%d%H%M%S')
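That said, the question asks for a persistent "EMP_"-prefixed sequence rather than a timestamp. Below is a minimal sketch of the file-based approach mentioned above; it is untested, the file names are hypothetical, and it ignores the locking concern (wrap it in flock if concurrent runs are possible):
#!/bin/bash
counter_file=/var/tmp/emp_counter.txt   # hypothetical location of the stored counter
infile=$1
outfile=$2

# read the last used number, defaulting to 0 on the first run
last=$(cat "$counter_file" 2>/dev/null)
last=${last:-0}

# prefix every line with EMP_<n>, resuming from the stored counter
awk -v n="$last" '{ printf "EMP_%d %s\n", n + NR, $0 }' "$infile" > "$outfile"

# persist the new high-water mark for the next day's file
lines=$(wc -l < "$infile")
echo $(( last + lines )) > "$counter_file"
Saved as, say, numberfile.sh, you would run it as ./numberfile.sh today.txt today_numbered.txt each day.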
