Run [[processors.regex]] over multiple measurements - telegraf

Is it possible to run the regex processor over multiple measurements, like this?
[[processors.regex]]
namepass = ["measure1", "measure2"]
[[processors.regex.fields]]
key = "agent"
pattern = '^.*$'
replacement = "NORMAL"
result_key = "agent_type"
In my case, both measurements have an access log as their source ([[inputs.tail]]), but I want to keep them separate because I want to compare them eventually.

To answer my own question: I'm not sure if this is how it's meant to be done, but a quick fix would look like this:
[[processors.regex]]
namepass = ["measure1"]
[[processors.regex.fields]]
key = "agent"
pattern = '^.*$'
replacement = "NORMAL"
result_key = "agent_type"
[[processors.regex]]
namepass = ["measure2"]
[[processors.regex.fields]]
key = "agent"
pattern = '^.*$'
replacement = "NORMAL"
result_key = "agent_type"
Unfortunately, it contains duplicated configuration, which is bad.

Related

How do I create a function that takes a text file and two logical flags for comment and blank lines?

I am to construct a function named read_text_file.
It takes an argument textFilePath, which is a single character string, and two optional parameters, withBlanks and withComments, which are both single logicals;
textFilePath is the path to the text file (or R script);
if withBlanks and withComments are set to FALSE, then read_text_file() will return the text file without blank lines (i.e. lines that contain nothing or only whitespace) and without commented lines (i.e. lines that start with "#"), respectively;
it outputs a character vector of length n where each element corresponds to its respective line of text/code.
I came up with the function below:
read_text_file <- function(textFilePath, withBlanks = TRUE, withComments = TRUE){
  # check that `textFilePath`: character(1)
  if(!is.character(textFilePath) | length(textFilePath) != 1){
    stop("`textFilePath` must be a character of length 1.")
  }
  if(withComments == FALSE){
    return(grep('^$', readLines(textFilePath), invert = TRUE, value = TRUE))
  }
  if(withBlanks == FALSE){
    return(grep('^#', readLines(textFilePath), invert = TRUE, value = TRUE))
  }
  return(readLines(textFilePath))
}
As written, whenever withComments is FALSE the second if-statement returns immediately, so the third if-statement is never reached.
I'd recommend processing an imported object instead of returning it immediately:
read_text_file <- function(textFilePath, withBlanks = TRUE, withComments = TRUE){
  # check that `textFilePath`: character(1)
  if(!is.character(textFilePath) | length(textFilePath) != 1){
    stop("`textFilePath` must be a character of length 1.")
  }
  result = readLines(textFilePath)
  if(!withComments){
    result = grep('^\\s*#\\s*', result, invert = TRUE, value = TRUE)
  }
  if(!withBlanks){
    result = grep('^\\s*$', result, invert = TRUE, value = TRUE)
  }
  result
}
The big change is defining the result object, which we modify as needed and then return at the end. This is good because (a) it is more concise, not repeating the readLines command multiple times, and (b) it lets you easily do 0, 1, or more data-cleaning steps on result before returning it.
I also made some minor changes:
I don't use return() - it is only needed if you are returning something before the end of the function body, which with these modifications is not necessary.
You had your "comment" and "blank" regex patterns switched; I corrected that.
I changed == FALSE to !, which is a little safer and good practice. You could use isFALSE() if you want more readability.
I added \\s* to your regex patterns in a couple of places so that they match any amount of whitespace (including none).

Pandas DataFrame DateTime indices: concat/merge/join corrupts index order

Consider two .csv files containing water-level data with a DateTime index. They can be downloaded from:
https://www.dropbox.com/sh/50zaz9ore00j7rp/AAA2MhNrNMRImoSBWWcUBNp4a?dl=0
Imported as follows:
pbn61 = pd.read_csv('3844_61.csv',
                    index_col = 0,
                    delimiter = ';',
                    dayfirst = True,
                    usecols = ['Datumtijd','DisplayWaarde']
                    )
The same is done for the second file, stored in the global variable 'pbn65'.
Now I want to merge these two DataFrames so that I can plot both data series in one graph. The reason is that I have about 50 of these files, none of which share the same starting date and/or time, so merging some of them will greatly reduce the number of graphs I end up with.
I only want the data that is available in both series, since only then does the data become relevant for the research. Therefore I use the following code:
pbn65.columns = ['DisplayWaarde2']
result1 = pd.merge(pbn61,pbn65, left_index = True, right_index = True, how='inner')
result2 = pbn65.join(pbn61, how = 'inner')
pd.set_option('max_rows', 25000)
result2
I needed to rename one column to make sure the join would work, and increased max_rows to show the counting error.
Both ways result in the same issue: the index is put in the wrong order. This is probably because the index is a DateTime of the form
DD-MM-YYYY HH:MM
and joining/merging causes pandas to treat it as plain values rather than as a DateTime.
Concatenating both DataFrames gives the following error:
result3 = pd.concat([pbn61,pbn65], axis = 1, join = 'inner')
result3
Shape of passed values is (2, 20424), indices imply (2, 19558)
The latter is exactly the length of the resulting DataFrame when using merge/join.
Is there a way around this issue?
P.S. I would like to keep a DateTime index since I need to have a time indication for evaluation.
P.P.S. Most files contain duplicate indices. Trying to use index.drop_duplicates seems to do nothing.
Solution
pbn61 = pd.read_csv('3844_61.csv',
                    index_col = 0,
                    delimiter = ';',
                    dayfirst = True,
                    usecols = ['Datumtijd','DisplayWaarde'],
                    parse_dates = [0],
                    )
pbn65 = pd.read_csv('3847_65.csv',
                    index_col = 0,
                    delimiter = ';',
                    dayfirst = True,
                    usecols = ['Datumtijd','DisplayWaarde'],
                    parse_dates = [0],
                    )
pbn61 = pbn61.groupby(level=0).first()
pbn65 = pbn65.groupby(level=0).first()
result = pd.concat([pbn61, pbn65], axis=1).dropna()
Explanation
parse_dates = [0],
parse_dates specifies which column should be parsed as a date; here it is the index column, so the result gets a proper DatetimeIndex.
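To see why this matters, here is a small illustration with invented data (the column names mirror the question, but the CSV snippet and values are made up): without parse_dates the index is read as plain strings, and DD-MM-YYYY strings sort lexicographically rather than chronologically.

import io
import pandas as pd

csv = "Datumtijd;DisplayWaarde\n01-02-2015 00:00;1.0\n01-01-2016 00:00;2.0\n"

# Without parse_dates the index stays a string index, so '01-01-2016'
# sorts before '01-02-2015' (text order, not date order).
as_text = pd.read_csv(io.StringIO(csv), delimiter=';', index_col=0)
print(as_text.sort_index())

# With parse_dates (and dayfirst=True) the index becomes a DatetimeIndex,
# and 1 February 2015 correctly comes before 1 January 2016.
as_dates = pd.read_csv(io.StringIO(csv), delimiter=';', index_col=0,
                       dayfirst=True, parse_dates=[0])
print(as_dates.sort_index())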
pbn61 = pbn61.groupby(level=0).first()
this takes care of duplicate indices; drop_duplicates only takes care of duplicated records (rows with identical values).
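A small, made-up example of the difference between the two (timestamps and values are invented):

import pandas as pd

# Two rows share the same timestamp but carry different values.
idx = pd.to_datetime(["2015-01-01 00:00", "2015-01-01 00:00", "2015-01-01 01:00"])
df = pd.DataFrame({"DisplayWaarde": [1.0, 2.0, 3.0]}, index=idx)

# drop_duplicates() compares row values, so both 00:00 rows survive here.
print(df.drop_duplicates())

# groupby(level=0).first() keeps exactly one row per index value.
print(df.groupby(level=0).first())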
result = pd.concat([pbn61, pbn65], axis=1).dropna()
This merges the two. I find this more intuitive. There are many ways to do this.
Demonstration
result.plot()
@piRSquared
import numpy as np
import pandas as pd
import glob
pd.__version__
Files = glob.glob('Peilbuizen/*.csv')
def Invoer(F):
    F = Files
    for i in range(len(Files)):
        globals()['pbn%s' % Files[i][16:-1-3]] = pd.read_csv(Files[i],
                                                             index_col = 0,
                                                             delimiter = ';',
                                                             dayfirst = True,
                                                             usecols = ['Datumtijd','DisplayWaarde'],
                                                             parse_dates = [0]
                                                             )
Invoer(Files)
pbn11 = pbn11.groupby(level = 0).first()
pbn13 = pbn13.groupby(level = 0).first()
result = pd.concat([pbn11, pbn13], axis = 1).dropna()
result.plot()
I updated the dropbox folder to 10 files for experimenting. Creating a folder called "Peilbuizen" within the Python working directory and running the code above will create the globals.

How to print a complex number without percent sign in Scilab?

I tried this
a = 1+3*%i;
disp("a = "+string(a))
I got a = 1+%i*3, but what I want is a = 1. + 3.i
So is there any method in Scilab to print a complex number without the percent sign?
Similarly to MATLAB, you can format the output string by printing the real and imaginary parts separately.
mprintf('%g + %gi\n', real(a) , imag(a))
However, that looks pretty ugly when the imaginary part is negative. I suggest writing a formatting function:
function s = complexstring(a)
    if imag(a) >= 0 then
        s = sprintf('%g+%gi', real(a), imag(a))
    else
        s = sprintf('%g%gi', real(a), imag(a))
    end
endfunction
Examples:
disp('a = '+complexstring(1+3*%i))
disp('b = '+complexstring(1-3*%i))
Output:
a = 1+3i
b = 1-3i

Hardware conversion: written data is different from my read data

I am testing a program executed partially on an MPC603 and partially on an MPC555.
I have to verify that some data is correctly "moved" from one processor to the other via a DPRAM.
I am guessing that at some point "someone" makes a conversion, but I don't know how to find out what kind of conversion is done.
Here are some examples:
Pt_Dpram->acq1 at 0x8D00008 = 0x3EB2
acq1 = (0xA010538) = 1182451712 = 0x467AC800
Pt_Dpram->acq2 at 0x8D0000A = 0x5528
acq2 = (0xA010540) = 1185566720 = 0x46AA5000
Pt_Dpram->acq3 at 0x8D0000C = 0x416E
acq3 = (0xA010548) = 1107552036 = 0x4203E724
Pt_Dpram->acq4 at 0x8D0000E = 0x413C
acq4 = (0xA010550) = 1107526232 = 0x42038258
I got my answers from a colleague: the values in acqX are in Motorola binary format: http://en.wikipedia.org/wiki/SREC_(file_format)
Here is a small tool that does the conversion: http://www.hexworkshop.com/onlinehelp/500/html/idhelp_baseconv.htm
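As a sanity check, the first two values can be reproduced by interpreting each 32-bit word as an IEEE-754 single-precision float in big-endian (Motorola) byte order; the byte order and the float interpretation are my assumptions, not something stated above.

import struct

# Word read for acq1 at 0xA010538, decoded as a big-endian 32-bit float.
raw = 0x467AC800
as_float = struct.unpack(">f", raw.to_bytes(4, "big"))[0]
print(as_float)   # 16050.0
print(0x3EB2)     # 16050, the DPRAM word for acq1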

Py2Neo: Cypher Query

I am trying to make the following Cypher query:
start me = node:actors(actor = 'Tom Baker') , you = node:actors(actor = 'Peter Davison') match p = you-[*1..3]-me return p
using the Dr. Who dataset available on the Neo4j site. It gives correct results in the Neo4j console as well as in Py2Neo. However, now I want to set
x='Tom Baker'
y='Peter Davison'
and make the same query using the variables x and y. However, I don't know the escape sequence for Py2Neo. I tried the query below
"start me = node:actors(actor = \'.x.\') , you = node:actors(actor = \'.y.\') match p = you-[*1..3]-me return p"
but it didn't work. Any help would be appreciated.
Try to use parameters instead. Named parameters in Cypher are written as {name}, and you pass a hash/dictionary with the name-value pairs along with the query.
start me = node:actors(actor = {me}) ,
you = node:actors(actor = {you})
match p = you-[*1..3]-me
return p
params: {"me":"Tom Baker","you":"Peter Davison"}
