Is there a smarter way to force a set of decision variables to be equal?
(If not, feel free to use the solution below.)
Declarations
Given the following set:
ID1 | ID2
------- | -------
A | AA
B | AA
C | BB
C | AA
C | CC
D | CC
e.g. initialized in OPL by
//Set ID
tuple ObjectID{
string ID1;
string ID2;
}
{ObjectID} ID = {
<"A", "AA">,
<"B", "AA">,
<"C", "BB">,
<"C", "AA">,
<"C", "CC">,
<"D", "CC">,
};
And a decision variable x[ID]
to be declared in OPL as
dvar int+ x[ID]
The Problem
The decision variables x[ID] should be equal whenever their ID1 values are equal, regardless of ID2.
Example:
x[<"C", "BB">] == x[<"C", "AA">] == x[<"C", "CC">]
Current solution
Pairwise equality constraints between all dvars with identical ID1 and different ID2:
forall(
id_1 in ID, id_2 in ID:
id_1.ID1 == id_2.ID1 &&
id_1.ID2 != id_2.ID2
)
x[id_1] == x[id_2];
A first improvement would be to use ordered to halve the number of equality constraints:
forall(ordered id_1,id_2 in ID :id_1.ID1 == id_2.ID1 )
x[id_1] == x[id_2];
A second improvement could be to move from n*(n-1)/2 constraints to (n-1) constraints per ID1 group:
//Set ID
tuple ObjectID{
string ID1;
string ID2;
}
{ObjectID} ID = {
<"A", "AA">,
<"B", "AA">,
<"C", "BB">,
<"C", "AA">,
<"C", "CC">,
<"D", "CC">
};
dvar int+ x[ID];
{string} Id1s={i.ID1 | i in ID};
{string} Id2PerId1[id1 in Id1s]={i.ID2 | i in ID : i.ID1==id1};
subject to
{
forall(id1 in Id1s) forall(id2 in Id2PerId1[id1] diff {last(Id2PerId1[id1])})
x[<id1,id2>] == x[<id1,next(Id2PerId1[id1],id2)>];
}
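For the sample data, and assuming the sets keep the insertion order shown above, this chain expands to just two constraints for ID1 == "C" (a sketch of what gets generated, not additional model code):
x[<"C","BB">] == x[<"C","AA">];
x[<"C","AA">] == x[<"C","CC">];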
I am applying the series_decompose_anomalies algorithm to time data coming from multiple meters. Currently, I am using the ADX dashboard feature to feed my meter identifier as a parameter into the algorithm and return my anomalies and scores as a table.
let dt = 3hr;
Table
| where meter_ID == dashboardParameter
| make-series num=avg(value) on timestamp from _startTime to _endTime step dt
| extend (anomalies,score,baseline) = series_decompose_anomalies( num, 3,-1, 'linefit')
| mv-expand timestamp, num, baseline, anomalies, score
| where anomalies ==1
| project dashboardParameter, todatetime(timestamp), toreal(num), toint(anomalies), toreal(score)
I would like to bulk process all my meters in one go and return a table with all anomalies found across them. Is it possible to feed an array as an iterable in KQL or something similar to allow my parameter to change multiple times in a single run?
Simply add by meter_ID to make-series
(and remove | where meter_ID == dashboardParameter)
| make-series num=avg(value) on timestamp from _startTime to _endTime step dt by meter_ID
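A sketch of the full query from the question with that change applied (assuming the same table and column names; meter_ID replaces dashboardParameter in the projection):
let dt = 3hr;
Table
| make-series num=avg(value) on timestamp from _startTime to _endTime step dt by meter_ID
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 3, -1, 'linefit')
| mv-expand timestamp, num, baseline, anomalies, score
| where anomalies == 1
| project meter_ID, todatetime(timestamp), toreal(num), toint(anomalies), toreal(score)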
P.S.
Anomaly can be positive (num > baseline => flag = 1) or negative (num < baseline => flag = -1)
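For example, to keep only positive anomalies (spikes) in the demo below, you could filter on flag == 1 instead of flag != 0:
| where flag == 1 // keep spikes only; use flag == -1 for dips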
Demo
let _step = 1h;
let _endTime = toscalar(TransformedServerMetrics | summarize max(Timestamp));
let _startTime = _endTime - 12h;
TransformedServerMetrics
| make-series num = avg(Value) on Timestamp from _startTime to _endTime step _step by SQLMetrics
| extend (flag, score, baseline) = series_decompose_anomalies(num , 3,-1, 'linefit')
| mv-expand Timestamp to typeof(datetime), num to typeof(real), flag to typeof(int), score to typeof(real), baseline to typeof(real)
| where flag != 0
SQLMetrics | num | Timestamp | flag | score | baseline
write_bytes | 169559910.91717172 | 2022-06-14T15:00:30.2395884Z | -1 | -3.4824039875238131 | 170205132.25708669
cpu_time_ms | 17.369556143036036 | 2022-06-14T17:00:30.2395884Z | 1 | 7.8874529842826 | 11.04372634506527
percent_complete | 0.04595588235294118 | 2022-06-14T22:00:30.2395884Z | 1 | 25.019464868749985 | 0.004552738927738928
blocking_session_id | -5 | 2022-06-14T22:00:30.2395884Z | -1 | -25.019464868749971 | -0.49533799533799527
pending_disk_io_count | 0.0019675925925925924 | 2022-06-14T23:00:30.2395884Z | 1 | 6.4686836384225685 | 0.00043773741690408352
Given a dynamic field, say milestones, with a value like {"ta": 1655859586546, "tb": 1655859586646},
how do I print a table with columns like "ta", "tb", etc., with a single row containing unixtime_milliseconds_todatetime(tolong(taValue)), unixtime_milliseconds_todatetime(tolong(tbValue)), and so on?
I figured that I'll need to write a function that I can call, so I created this:
let f = view(a:string ){
unixtime_milliseconds_todatetime(tolong(a))
};
I can use this function with a normal column as project f(columnName).
However, in this case it's a dynamic field, and the number of items in the list is large, so I do not want to enter the fields manually. This is what I have so far:
log_table
| take 1
| evaluate bag_unpack(milestones, "m_") // This gives me fields as columns
// | project-keep m_* // This would work, if I just wanted the value, however, I want `view(columnValue)
| project-keep f(m_*) // This of course doesn't work, but explains the idea.
A solution based on the mv-apply operator:
// Generate data sample. Not part of the solution.
let log_table = materialize(range record_id from 1 to 10 step 1 | mv-apply range(1, 1 + rand(5), 1) on (summarize milestones = make_bag(pack_dictionary(strcat("t", make_string(to_utf8("a")[0] + toint(rand(26)))), 1600000000000 + rand(60000000000)))));
// Solution Starts here.
log_table
| mv-apply kv = milestones on
(
extend k = tostring(bag_keys(kv)[0])
| extend v = unixtime_milliseconds_todatetime(tolong(kv[k]))
| summarize milestones = make_bag(pack_dictionary(k, v))
)
| evaluate bag_unpack(milestones)
The output is a wide, sparse table with columns record_id, ta, tb, tc, td, te, tf, tg, th, ti, tk, tl, tm, to, tp, tr, tt, tu, tw, tx, tz: one row per record_id, with a converted datetime (e.g. 2021-07-06T20:24:47.767Z for record 1) in each t* column that exists in that record's milestones bag, and empty cells elsewhere.
I cannot understand the behaviour of the update operator of jq (version 1.6) shown in the following examples.
Why does example 1 return an updated object, but examples 2 and 3 return an empty object or a wrong result?
The difference between the examples is only the calling order of the function to convert a string into a number.
#!/bin/bash
#
# strange behaviour jq
# example 1 - works as expected
jq -n '
def numberify($x): $x | tonumber? // 0;
"1" as $stringValue
| numberify($stringValue) as $intValue
# | { } # version 1: a.b does not exist yet
| { a: { b: 1 } } # version 2: a.b exists already
| .["a"] |= { b: (.b + $intValue) }
'
# result example 1, version 1 - expected
# {
# "a": {
# "b": 1
# }
# }
# result example 1, version 2 - expected
# {
# "a": {
# "b": 2
# }
# }
# example 2 - erroneous result
jq -n '
def numberify($x): $x | tonumber? // 0;
"1" as $stringValue
# | { } # version 1: a.b does not exist yet
| { a: { b: 1 } } # version 2: a.b exists already
| .["a"] |= { b: (.b + numberify($stringValue)) }
'
# result example 2, version 1 - unexpected
# {}
# result example 2, version 2 - unexpected
# {}
# example 3 - erroneous result
jq -n '
def numberify($x): $x | try tonumber catch 0;
"1" as $stringValue
# | { } # version 1: a.b does not exist yet
| { a: { b: 1 } } # version 2: a.b exists already
| .["a"] |= { b: (.b + numberify($stringValue)) }
'
# result example 3, version 1 - unexpected
# {
# "a": {
# "b": 0
# }
# }
# result example 3, version 2 - unexpected
# {
# "a": {
# "b": 1
# }
# }
@oguzismail That's a good idea to use '+=' instead of '|='.
I hadn't thought of it before.
Currently, my code with the workaround for the bug looks like this:
def numberify($x): $x | tonumber? // 0;
"1" as $sumReqSize
| "10" as $sumResSize
| { statistics: { count: 1, sumReqSize: 2, sumResSize: 20 } }
| [numberify($sumReqSize), numberify($sumResSize)] as $sizes # workaround for bug
| .statistics |= {
count: (.count + 1),
sumReqSize: (.sumReqSize + $sizes[0]),
sumResSize: (.sumResSize + $sizes[1])
}
Following your suggestion, it is more concise and doesn't need the ugly workaround:
def numberify($x): $x | tonumber? // 0;
"1" as $sumReqSize
| "10" as $sumResSize
| { statistics: { count: 1, sumReqSize: 2, sumResSize: 20 } }
| .statistics.count += 1
| .statistics.sumReqSize += numberify($sumReqSize)
| .statistics.sumResSize += numberify($sumResSize)
This is a bug in jq 1.6. One option would be to use an earlier version of jq (e.g. jq 1.5).
Another would be to avoid |= by using = instead, along the lines of:
.a = (.a | ...)
or if the RHS does not actually depend on the LHS (as in your original examples), simply replacing |= by =.
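For instance, a minimal sketch of the first rewrite applied to example 2 from the question (same numberify definition as in the question):
jq -n '
def numberify($x): $x | tonumber? // 0;
"1" as $stringValue
| { a: { b: 1 } }             # version 2: a.b exists already
| .a = (.a | { b: (.b + numberify($stringValue)) })
'
# expected result
# {
#   "a": {
#     "b": 2
#   }
# }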
This is a bug in jq 1.6. In this case you can use try-catch instead.
def numberify($x): $x | try tonumber catch 0;
But I don't know if there is a generic way to work around this issue.
I need to validate a file with respect to its data types. I have a file with the below data:
data.csv
Col1 | Col2 | Col3 | Col4
100 | XYZ | 200 | 2020-07-11
200 | XYZ | 500 | 2020-07-10
300 | XYZ | 700 | 2020-07-09
I have another file containing the configuration:
Config_file.txt
Columns = Col1|Col2|Col3|Col4
Data_type = numeric|string|numeric|date
Delimiter = |
I have to compare the configuration file and data file and return a result.
For example:
In the configuration file, the data type of Col1 is numeric. If I get any string value in Col1 in the data file, the script should return Datatype Mismatch Found in Col1. I have tried with awk; if it's a one-line item it is easy to get done by defining the positions of the columns, but I am not sure how to loop over the entire file column by column and check the data.
I have also tried providing patterns to achieve this, but I am unable to validate the complete file. Any suggestion would be helpful.
awk -F "|" '$1 ~ "^[+-]?[0-9]+([.][0-9]+)?$" && $4 ~ "^[+-]?[0-9]+([.][0-9]+)?$" && length($5) == 10 {print}' data.csv
The goal is to compare the data file (data.csv) against the Data_type entries in the config file (Config_file.txt) for each column and check whether any column has a datatype mismatch.
For example, consider below data
Col1 | Col2 | Col3 | Col4
100 | XYZ | 200 | 2020-07-11
ABC | XYZ | 500 | 2020-07-10 -- Incorrect data: Col1 has the string value ABC, but in the config file its data type is numeric
300 | XYZ | 700 | 2020-07-09
300 | XYZ | 700 | 2020-07-09
300 | XYZ | XYZ | 2020-07-09 -- Incorrect Data
300 | 300 | 700 | 2020-07-09
300 | XYZ | 700 | XYX -- Incorrect Data
The data types provided in the config file are as below:
Columns = Col1|Col2|Col3|Col4
Data_type = numeric|string|numeric|date
The script should echo the result as Data Type Mismatch Found in Col1
Here is a skeleton solution in GNU awk. In the absence of sample output, I improvised:
awk '
BEGIN {
FS=" *= *"
}
function numeric(p) { # testing for numeric
if(p==(p+0))
return 1
else return 0
}
function string(p) { # can't really fail a string test, right?
return 1
}
function date(p) {
gsub(/-/," ",p)
if(mktime(p " 0 0 0")>=0)
return 1
else return 0
}
NR==FNR{ # process config file
switch($1) {
case "Columns":
a["Columns"]=$NF;
break
case "Data_type":
a["Data_type"]=$NF;
break
case "Delimiter":
a["Delimiter"]=$NF;
}
if(a["Columns"] && a["Data_type"] && a["Delimiter"]) {
split(a["Columns"],c,a["Delimiter"])
split(a["Data_type"],d,a["Delimiter"])
for(i in c) { # b["Col1"]="string" etc.
b[c[i]]=d[i]
FS= a["Delimiter"]
}
}
next
}
FNR==1{ # processing headers of data file
for(i=1;i<=NF;i++) {
h[i]=$i # h[1]="Col1" etc.
}
}
{
for(i=1;i<=NF;i++) { # process all fields
f=b[h[i]] # using indirect function calls check
printf "%s%s",(#f($i)?$i:"FAIL"),(i==NF?ORS:FS) # the data
}
}' config <(tr -d ' ' <data) # deleting spaces from your data, since "|" != " | "
Sample output:
FAIL|Col2|FAIL|FAIL
100|XYZ|200|2020-07-11
200|XYZ|500|2020-07-10
300|XYZ|700|2020-07-09
FAIL|XYZ|FAIL|FAIL # duplicated previous record and malformed it
$ cat tst.awk
NR == FNR {
gsub(/^[[:space:]]+|[[:space:]]+$/,"")
tag = val = $0
sub(/[[:space:]]*=.*/,"",tag)
sub(/[^=]+=[[:space:]]*/,"",val)
cfg_tag2val[tag] = val
next
}
FNR == 1 {
FS = cfg_tag2val["Delimiter"]
$0 = $0
reqd_NF = split(cfg_tag2val["Columns"],reqd_names)
split(cfg_tag2val["Data_type"],reqd_types)
}
NF != reqd_NF {
printf "%s: Error: line %d NF (%d) != required NF (%d)\n", FILENAME, FNR, NF, reqd_NF | "cat>&2"
got_errors = 1
}
FNR == 1 {
for ( i=1; i<=NF; i++ ) {
reqd_name = reqd_names[i]
name = $i
gsub(/^[[:space:]]+|[[:space:]]+$/,"",name)
if ( name != reqd_name ) {
printf "%s: Error: line %d col %d name (%s) != required col name (%s)\n", FILENAME, FNR, i, name, reqd_name | "cat>&2"
got_errors = 1
}
}
}
FNR > 1 {
for ( i=1; i<=NF; i++ ) {
reqd_type = reqd_types[i]
if ( reqd_type != "string" ) {
value = $i
gsub(/^[[:space:]]+|[[:space:]]+$/,"",value)
type = val2type(value)
if ( type != reqd_type ) {
printf "%s: Error: line %d field %d (%s) type (%s) != required field type (%s)\n", FILENAME, FNR, i, value, type, reqd_type | "cat>&2"
got_errors = 1
}
}
}
}
END { exit got_errors }
function val2type(val, type) {
if ( val == val+0 ) { type = "numeric" }
else if ( val ~ /^[0-9]{4}(-[0-9]{2}){2}$/ ) { type = "date" }
else { type = "string" }
return type
}
$ awk -f tst.awk config.txt data.csv
data.csv: Error: line 3 field 1 (ABC) type (string) != required field type (numeric)
data.csv: Error: line 6 field 3 (XYZ) type (string) != required field type (numeric)
data.csv: Error: line 8 field 4 (XYX) type (string) != required field type (date)
I have data that looks as follows:
Date | Time | Temperature
16995 | "12:00" | 23
16995 | "12:30" | 24
...
17499 | "23:30" | 23
17500 | "00:00" | 24
I'm writing a function to select a range of cases based on certain start and end time points. To do this, I need to determine the start_pt and end_pt indices, which should match a pair of rows in the data frame.
select_case <- function(df,date,time) {
start_pt = 0
end_pt = 0
for (i in 1:nrow(df)) {
if ((date[i] == 17000) & (time[i] == "12:00")) {
start_pt <- i
return(start_pt)
} else {
next
}
}
for (i in start_pt:nrow(df)) {
if (date[i] == 17500) {
end_pt <- i - 1
return(end_pt)
break
} else {
next
}
}
return(df[start_pt:end_pt,])
}
When I called:
test <- select_case(data,data$Date,data$Time)
test
I expected the following:
Date | Time | Temperature
17000 | "12:00" | 23
17000 | "12:30" | 24
...
17499 | "23:00" | 23
17499 | "23:30" | 23
Instead I got:
[1] 1
Not sure where I got it wrong here. When I separately ran each of the two for-loops from the R console, substituting in the corresponding arguments for each loop, I got the correct indices for both start_pt and end_pt.
I tried putting each loop in a separate function, named sta(date,time) and end(date). Then I bound them in the following function:
binder <- function(date,time) {
return(sta(date,time),end(date))
}
and called:
sta_end <- binder(date,time)
I got the error:
Error in return(sta(date, time), end(date)) :
multi-argument returns are not permitted
So I combined them and it worked:
binder <- function(date,time) {
return(c(sta(date,time),end(date)))
}
sta_end <- binder(date,time)
[1] 1 <an index for end_pt>
So the mistake I made in my original function is that I used return() three times, and the function only returns at the first one it reaches, which yields start_pt. So I took out the first two return() calls and retained the last one:
return(df[start_pt:end_pt,])
This worked; I got the expected result.
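For reference, a minimal sketch of the corrected function with that fix applied (keeping the same hard-coded start and end boundaries as the original code; break replaces the premature returns):
select_case <- function(df, date, time) {
  start_pt <- 0
  end_pt <- 0
  # find the first row matching the start date and time
  for (i in 1:nrow(df)) {
    if ((date[i] == 17000) & (time[i] == "12:00")) {
      start_pt <- i
      break
    }
  }
  # find the last row before the end date
  for (i in start_pt:nrow(df)) {
    if (date[i] == 17500) {
      end_pt <- i - 1
      break
    }
  }
  # single return of the selected rows
  df[start_pt:end_pt, ]
}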