Search matching data on list/datatable and table


I am trying to find all requests whose URL matches a certain regular expression (mostly simple wildcards), but I am not sure how to proceed. With a single endpoint this is easy, because I can put the pattern directly in the query, but I have multiple endpoints per type/group of requests (for example, I need all login requests that match any of several endpoints, and the same for logouts, posts, etc.).
For example, I have a dynamic array (I also tried a semicolon-delimited string that I split), but I get the same error in the end.
let login = dynamic(["https://example.com/*/login*", "https://example.com/login*"]);
requests
| extend group = "Login"
| mv-expand endpoint = login
| where url matches regex endpoint
'matches regex' operator requires string arguments
Tried force casting and using typeof string, but that doesn't seem to help...
let login = dynamic(["https://example.com/*/login*", "https://example.com/login*"]);
requests
| extend group = "Login"
| mv-expand endpoint = login to typeof(string)
| where url matches regex tostring(endpoint) // tried with and without tostring
matches regex: failed to cast argument 2 to scalar constant
I also tried using a datatable. I wanted to do something like this, but I am not sure how to proceed from here:
let TEndpoints = datatable(group: string, endpoints: dynamic)
[
"Login", dynamic(["https://example.com/*/login*", "https://example.com/login*"]),
"Logouts", dynamic(["https://example.com/*/logout*", "https://example.com/logout*"]),
"Register", dynamic(["https://example.com/*/register*", "https://example.com/register*"])
];
Note that the endpoints are really just examples. The wildcards are due to different locales and geo-regions.
Does anyone have any idea whether this is achievable?
Thanks,

That was a challenging one.
It seems all of KQL regex functions & operators accept only string literals / arguments as patterns, except for one exception I was able to identify in context of print, e.g.:
print str = 'A', pat = '.'
| where str matches regex pat
or
print str = 'A', pat = '.'
| union (print str = 'B', pat = '.')
| where str matches regex pat
I was also able to manipulate partition by and use it to answer your question.
Please note that I am not projecting the login column within the partition by brackets; however, I do use login later on with matches regex.
My guess is that login, the partition value, is stored behind the scenes in a variable and can therefore be used with matches regex.
P.S.
The patterns you supplied are not regex patterns, but globbing, so I changed them accordingly.
let requests = datatable(id:int, url:dynamic)
[
1 ,dynamic(["https://example.com/login123","https://example.com/foo/bar/login123"]),
2 ,dynamic(["https://example.com/tic/login","https://example.com/tic/login/tac/toe"])
];
let login = dynamic(["https://example.com/[^/]+/login.*", "https://example.com/login.*"]);
requests
| mv-expand url
| mv-expand login = login
| extend url = tostring(url), login = tostring(login)
| partition by login (project id, url | where url matches regex login)
id  url
1   https://example.com/login123
2   https://example.com/tic/login
2   https://example.com/tic/login/tac/toe
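The glob-to-regex rewrite and the per-group matching can also be sketched outside KQL. The following Python sketch is only an illustration of the idea, not a KQL feature: glob_to_regex and classify are hypothetical helpers, and the translation rule (a trailing * becomes .*, any other * becomes [^/]+) is an assumption chosen to mirror the patterns used in the answer above.

```python
import re

# Group name -> glob patterns, mirroring the TEndpoints datatable from the question.
ENDPOINTS = {
    "Login": ["https://example.com/*/login*", "https://example.com/login*"],
    "Logouts": ["https://example.com/*/logout*", "https://example.com/logout*"],
    "Register": ["https://example.com/*/register*", "https://example.com/register*"],
}


def glob_to_regex(glob):
    """Translate a simple glob into a regex string (hypothetical helper).

    Assumed rule: a trailing '*' means "anything" (.*), while any other '*'
    stands for exactly one path segment ([^/]+).
    """
    parts = glob.split("*")
    out = [re.escape(parts[0])]
    for i, part in enumerate(parts[1:], start=1):
        is_trailing = i == len(parts) - 1 and part == ""
        out.append(".*" if is_trailing else "[^/]+")
        out.append(re.escape(part))
    return "".join(out)


# Compile every pattern once, keyed by group.
COMPILED = {
    group: [re.compile(glob_to_regex(g)) for g in globs]
    for group, globs in ENDPOINTS.items()
}


def classify(url):
    """Return the first group with a pattern that matches the URL, else None."""
    for group, patterns in COMPILED.items():
        if any(p.search(url) for p in patterns):
            return group
    return None


for url in ["https://example.com/login123", "https://example.com/foo/bar/login123"]:
    print(url, "->", classify(url))
```

With these assumed rules, the two-segment URL is not classified, which agrees with the result table above.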

Related

How can I delete all keys that don't match certain names with JQ?

I have a huge JSON file with lots of stuff I don't care about, and I want to filter it down to only the few keys I care about, preserving the structure. I won't bother if the same key name might occur in different paths and I get both of them. I gleaned something very close from the answers to this question, it taught me how to delete all properties with certain values, like all null values:
del(..|nulls)
or, more powerfully
del(..|select(. == null))
I searched high and low if I could write a predicate over the name of a property when I am looking at a property. I come from XSLT where I could write something like this:
del(..|select(name|test("^(foo|bar)$")))
where name/1 would be the function that returns the property name or array index number where the current value comes from. But it seems that jq lacks the metadata on its values, so you can only write predicates about their value, and perhaps the type of their value (that's still just a feature of the value), but you cannot inspect the name, or path leading up to it?
I tried to use paths and leaf_paths and stuff like that, but I have no clue what that would do and tested it out to see how this path stuff works, but it seems to find child paths inside an object, not the path leading up to the present value.
So how could this be done, delete everything but a set of key values? I might have found a way here:
walk(
if type == "object" then
with_entries(
select( ( .key |test("^(foo|bar|...)$") )
and ( .value != "" )
and ( .value != null ) )
)
else
.
end
)
OK, this seems to work. But I still think it would be so much easier if we had a way of querying the current property name, array index, or path leading up to the present item being inspected with the simple recursion ..| form.

In analogy to your approach using .. and del, you could use paths and delpaths to operate on a stream of path arrays, and delete a given path if not all of its elements meet your conditions.
delpaths([paths | select(all(IN("foo", "bar") or type == "number") | not)])
For the condition I used IN("foo", "bar") but (type == "string" and test("^(foo|bar)$")) would work as well. To also retain array elements (which have numeric indices), I added or type == "number".

Unlike in XML, there's no concept of attributes in jq. You'll need to delete from objects.
To delete an element of an object, you need to use del( obj[ key ] ) (or use with_entries). You can get a stream of the keys of an object using keys[]/keys_unsorted[] and filter out the ones you don't want to delete.
Finally, you need to invert the result of test because you want to delete those that don't match.
After fixing these problems, we get the following:
INDEX( "foo", "bar" ) as $keep |
del(
.. | objects |
.[
keys_unsorted[] |
select( $keep[ . ] | not )
]
)
Note that I substituted the regex match with a dictionary lookup. You could use test( "^(?:foo|bar)\\z" ) in lieu of $keep[ . ], but a dictionary lookup should be faster than a regex match. And it should be less error-prone too, considering you misused $ and (...) in lieu of \z and (?:...).
The above visits deleted branches for nothing. We can avoid that by using walk instead of ...
INDEX( "foo", "bar" ) as $keep |
walk(
if type == "object" then
del(
.[
keys_unsorted[] |
select( $keep[ . ] | not )
]
)
else
.
end
)
Since I mentioned one could use with_entries instead of del, I'll demonstrate.
INDEX( "foo", "bar" ) as $keep |
walk(
if type == "object" then
with_entries( select( $keep[ .key ] ) )
else
.
end
)

Here's a solution that uses a specialized variant of walk for efficiency (*). It retains objects all keys of which are removed; only trivial changes are needed if a blacklist or some other criterion (e.g., regexp-based) is given instead. WHITELIST should be a JSON array of the key names to be retained.
jq --argjson whitelist WHITELIST '
def retainKeys($array):
INDEX($array[]; .) as $keys
| def r:
if type == "object"
then with_entries( select($keys[.key]) )
| map_values( r )
elif type == "array" then map( r )
else .
end;
r;
retainKeys($whitelist)
' input.json
(*) Note for example:
the use of INDEX
the recursive function, r, has arity 0
for objects, the top-level deletion occurs first.

Here's a space-efficient, walk-free approach, tailored for the case of a WHITELIST. It uses the so-called "streaming" parser, so the invocation would look like this:
jq -n --stream --argjson whitelist WHITELIST -f program.jq input.json
where WHITELIST is a JSON array of the names of the keys to be retained, and
where program.jq is a file containing the program:
# Input: an array
# Output: the longest head of the array that includes only numbers or items in the dictionary
def acceptable($dict):
last(label $out
| foreach .[] as $x ([];
if ($x|type == "number") or $dict[$x] then . + [$x]
else ., break $out
end));
INDEX( $whitelist[]; .) as $dict
| fromstream(inputs
| if length==2
then (.[0] | acceptable($dict)) as $p
| if ($p|length) == (.[0]|length) - 1 then .[0] = $p | .[1] = {}
elif ($p|length) < (.[0]|length) then empty
else .
end
else .
end )
Note: The reason this is relatively complicated is that it assumes that you want to retain objects all of whose keys have been removed, as illustrated in the following example. If that is not the case, then the required jq program is much simpler.
Example:
WHITELIST: '["items", "config", "spec", "setting2", "name"]'
input.json:
{
"items": [
{
"name": "issue1",
"spec": {
"config": {
"setting1": "abc",
"setting2": {
"name": "xyz"
}
},
"files": {
"name": "cde",
"path": "/home"
},
"program": {
"name": "apache"
}
}
},
{
"name": {
"etc": 0
}
}
]
}
Output:
{
"items": [
{
"name": "issue1",
"spec": {
"config": {
"setting2": {
"name": "xyz"
}
}
}
},
{
"name": {}
}
]
}
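The expected output above can be cross-checked outside jq. Below is a minimal Python sketch of the same whitelist rule (keep only listed keys at every depth, keep array elements, and keep objects that end up empty). retain_keys is a hypothetical helper written for this comparison; it loads the whole document into memory, so it does not reproduce the space efficiency of the streaming approach.

```python
import json

WHITELIST = {"items", "config", "spec", "setting2", "name"}

INPUT = {
    "items": [
        {
            "name": "issue1",
            "spec": {
                "config": {"setting1": "abc", "setting2": {"name": "xyz"}},
                "files": {"name": "cde", "path": "/home"},
                "program": {"name": "apache"},
            },
        },
        {"name": {"etc": 0}},
    ]
}


def retain_keys(value, keep):
    """Recursively drop every object key that is not in `keep`.

    Arrays are traversed element by element; objects whose keys were all
    removed are kept as empty objects, as in the example above.
    """
    if isinstance(value, dict):
        return {k: retain_keys(v, keep) for k, v in value.items() if k in keep}
    if isinstance(value, list):
        return [retain_keys(v, keep) for v in value]
    return value


result = retain_keys(INPUT, WHITELIST)
print(json.dumps(result, indent=2))
```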

I am going to put my own tentative answer here.
The thing is, the solution I already had in my question works, meaning I can select keys during forward navigation, but I cannot find out the path leading up to the present value.
I looked around in the source code of jq to see how come we cannot inquire the path leading up to the present value, so we could ask for the key string or array index of the present value. And indeed it looks like jq does not track the path while it walks through the input structure.
I think this is actually a huge missed opportunity, since the path could so easily be tracked during the tree walk.
This is why I continue thinking that XML with XSLT and XPath is a much more robust data representation and tool chain than JSON. In fact, I find JSON harder to read even than XML. The benefit of the JSON being so close to javascript is really only relevant if - as I do in some cases - I read the JSON as a javascript source code assigning it to a variable, and then instrument it by changing the prototype of the anonymous JSON object so that I have methods to go with them. But changing the prototype is said to cause slowness. Though I don't think it does when setting it for otherwise anonymous JSON objects.
There is JsonPath that tries (by way of the name) to be something like what XPath is for XML. But it is a poor substitute and also has no way to navigate up the parent (or then sibling) axes.
So, in summary, while selecting by key in white or black lists is possible in principle, it is quite hard, because a feature that would be pretty easy to provide in a JSON navigation language is neither specified nor implemented. Another useful feature that could easily be added to jq is backward navigation to the parent or an ancestor of the present value. Currently, if you want to navigate back, you need to capture the ancestor you want to get back to as a variable. It is possible, but jq could be massively improved by keeping track of ancestors and paths.

ELM get query parameter as string

Based on this post, and thanks to @glennsl, I am getting somewhere.
First, if someone has a link where I could learn about the parsers, I would be very glad.
page : Url.Url -> String
page url =
case (Parser.parse (Parser.query (Query.string "name")) url) of
Nothing -> "My query string: " ++ (Maybe.withDefault "empty" url.query)
Just v -> case v of
Just v2 -> "Finnaly a name"
Nothing -> "????"
As far as I can understand, the expression Parser.parse (Parser.query (Query.string "name")) url returns a Maybe (Maybe String). I see this as: the parser could return something, and if it does, it could be a string. Is that right?
In my mind, if I have the parameter name in my URL, then my first Just would be executed and I could get the name.
But no matter what I put in my URL, it always goes to the first Nothing.
The result i got

The problem is that you're not parsing the path part of the URL, which is what Url.Parser is primarily for. You have to match the path exactly.
Here's a parser that will match your URL:
s "src" </> s "Main.elm" <?> (Query.string "name")
Note also that parsing the query string is optional, meaning this will also match your URL:
s "src" </> s "Main.elm"
But as long as you include a query param parser, that also has to match.
If all you care about is the query parameter, you'll have to parse the query string specifically, by either writing your own function to do so, or using a library like qs for example:
QS.parse
QS.config
"?a=1&b=x"
== Dict.fromList
[ ( "a", One <| Number 1 )
, ( "b", One <| Str "x" )
]
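For comparison, here is what "parse only the query string and ignore the path" looks like in Python's standard library. This is just a sketch of the idea, not Elm code: query_param is a hypothetical helper and the URL is an assumed example based on the path discussed above.

```python
from urllib.parse import parse_qs, urlsplit


def query_param(url, name, default=None):
    """Return the first value of query parameter `name`, whatever the path is."""
    query = urlsplit(url).query      # e.g. "name=Alice"
    values = parse_qs(query).get(name)  # e.g. ["Alice"], or None if absent
    return values[0] if values else default


# Hypothetical URL; the path is never inspected.
url = "http://localhost:8000/src/Main.elm?name=Alice"
print(query_param(url, "name"))
print(query_param(url, "missing", default="empty"))
```

Because the path is ignored entirely, the lookup behaves the same for any route, which is the behaviour the question was expecting from Parser.query on its own.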

A Search Query Term Should Not Be Prefix of Any Data in Db

Given a table User_DNA with a column sequence,
I want to write a query such that if a value in the sequence column matches or is a prefix of the search query term, it returns the item found or true.
e.g,
sequence
dog
hor
cat
tig
cat
if my search query is doga (has dog as prefix in db), horrible(has hor as prefix in db),tiger(has tig as prefix in db), caterpillar(has cat as prefix in db), the query should return true as all these search queries have prefixes in the database.
What should my SQL search query be?
Thanks

If you use Room, you can try this (using Kotlin):
@Query("select * from User_DNA WHERE sequence LIKE :search")
fun getItem(search: String): UserDNA?
When calling this method, pass the search pattern as "[your search string]%", for example: "dog%"
if my search query is dog, the query should return true or one of the
items (dogsequence/doggysequence) what ever is efficient
You can check result of the query - if it's null, then there are no matching values in your column.
UPDATED
If you want to find "hor" with "horrible", I can propose the following approach (maybe it's a task for RegExp, but honestly I haven't used it in Room):
You can put two methods in your DAO. The first method is auxiliary; its task is to fill a list with the words that we want to find. For example, for the pattern "horrible" that method should prepare a list with {"horrible", "horribl", "horrib", "horri", "horr", "hor"}.
The second method should fetch a result from SQLite where your field holds a value from the list prepared in step 1. This method should be annotated with the Room annotation.
So the first method prepares the list, invokes the query that searches for the words in SQLite, and returns the result to the ViewModel (or Repository).
Something like this:
@Query("select * from User_DNA WHERE sequence IN (:search) ORDER BY sequence")
fun getItem(search: List<String>): User_DNA?
fun findItem(search: String): User_DNA? {
    val searchList = mutableListOf<String>()
    val minimalStringLength = 2 // it's up to you, maybe 1?
    var candidate = search // function parameters are read-only in Kotlin
    while (candidate.length > minimalStringLength) {
        searchList.add(candidate)
        candidate = candidate.dropLast(1)
    }
    return getItem(searchList)
}
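An alternative that avoids building the prefix list is to turn the LIKE around, so that each stored sequence becomes the pattern and the search term is the value being tested. The sketch below uses Python's built-in sqlite3 module purely to demonstrate the SQL; it is not taken from the answer above, has_stored_prefix is a hypothetical helper, and whether the same WHERE clause fits a Room @Query should be verified separately. Note that SQLite's LIKE is case-insensitive for ASCII by default, and any % or _ stored in sequence would act as a wildcard.

```python
import sqlite3

# In-memory database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User_DNA (sequence TEXT)")
conn.executemany(
    "INSERT INTO User_DNA (sequence) VALUES (?)",
    [("dog",), ("hor",), ("cat",), ("tig",), ("cat",)],
)


def has_stored_prefix(connection, term):
    """True if any stored sequence equals the term or is a prefix of it."""
    row = connection.execute(
        "SELECT EXISTS("
        "SELECT 1 FROM User_DNA WHERE ? LIKE sequence || '%'"
        ")",
        (term,),
    ).fetchone()
    return bool(row[0])


for term in ["doga", "horrible", "tiger", "caterpillar", "zebra"]:
    print(term, has_stored_prefix(conn, term))
```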

Find a specific tuple by key in an Erlang list (eJabberd HTTP Header)

I am just getting started with eJabberd and am writing a custom module with HTTP access.
I have the request going through, but am now trying to retrieve a custom header and that's where I'm having problems.
I've used the Request record to get the request_headers list and can see that it contains all of the headers I need (although the one I'm after is a binary string on both the key and value for some reason...) as follows:
[
{ 'Content-Length', <<"100">> },
{ <<"X-Custom-Header">>, <<"CustomValue">> },
{ 'Host', <<"127.0.0.1:5280">> },
{ 'Content-Type', <<"application/json">> },
{ 'User-Agent', <<"Fiddler">> }
]
This is also my first foray into functional programming, so coming from a procedural perspective, I would loop through the list, check whether the key is the one I'm looking for, and return the value.
To this end, I've created a function as:
find_header(HeaderKey, Headers) ->
lists:foreach(
fun(H) ->
if
H = {HeaderKey, Value} -> H;
true -> false
end
end,
Headers).
With this I get the error:
illegal guard expression
I'm not even sure I'm going about this the right way so am looking for some advice as to how to handle this sort of scenario in Erlang (and possibly in functional languages in general).
Thanks in advance for any help and advice!
PhilHalf

The List that you have mentioned is called a "Property list", which is an ordinary list containing entries in the form of either tuples, whose first elements are keys used for lookup and insertion or atoms, which work as shorthand for tuples {Atom, true}.
To get a value of key, you may do the following:
proplists:get_value(Key,List).
For example, to get the Content-Type:
7> List=[{'Content-Length',<<"100">>},
{<<"X-Custom-Header">>,<<"CustomValue">>},
{'Host',<<"127.0.0.1:5280">>},
{'Content-Type',<<"application/json">>},
{'User-Agent',<<"Fiddler">>}].
8> proplists:get_value('Content-Type',List).
<<"application/json">>

You can use the function lists:keyfind/3:
> {_, Value} = lists:keyfind('Content-Length', 1, Headers).
{'Content-Length',<<"100">>}
> Value.
<<"100">>
The 1 in the second argument tells the function what tuple element to compare. If, for example, you wanted to know what key corresponds to a value you already know, you'd use 2 instead:
> {Key, _} = lists:keyfind(<<"100">>, 2, Headers).
{'Content-Length',<<"100">>}
> Key.
'Content-Length'
As for how to implement this yourself in Erlang, you'd write a recursive function.
Imagine that you're looking at the first element of the list, trying to figure out if this is the entry you're looking for. There are three possibilities:
The list is empty, so there is nothing to compare.
The first entry matches. Return it and ignore the rest of the list.
The first entry doesn't match. Therefore, the result of looking for this key in this list is the same as the result of looking for it in the remaining elements: we recurse.
find_header(_HeaderKey, []) ->
not_found;
find_header(HeaderKey, [{HeaderKey, Value} | _Rest]) ->
{ok, Value};
find_header(HeaderKey, [{_Key, _Value} | Rest]) ->
find_header(HeaderKey, Rest).
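The same three cases can be written out in Python for readers coming from a procedural background. This is only an illustrative sketch: Erlang atoms are modelled as str and binaries as bytes, which is an assumption of this example, and idiomatic Python would normally use a loop or a dict rather than recursion.

```python
# Header list from the question; atoms become str, binaries become bytes.
HEADERS = [
    ("Content-Length", b"100"),
    (b"X-Custom-Header", b"CustomValue"),
    ("Host", b"127.0.0.1:5280"),
    ("Content-Type", b"application/json"),
    ("User-Agent", b"Fiddler"),
]


def find_header(key, headers):
    """Recursive lookup mirroring the three Erlang clauses."""
    # Case 1: the list is empty, so there is nothing to compare.
    if not headers:
        return None
    (first_key, first_value), *rest = headers
    # Case 2: the first entry matches; return it and ignore the rest.
    if first_key == key:
        return first_value
    # Case 3: no match; the answer is whatever the remaining entries give.
    return find_header(key, rest)


print(find_header(b"X-Custom-Header", HEADERS))
print(find_header("No-Such-Header", HEADERS))
```

As in the Erlang version, the key type matters: looking up the custom header with a str key instead of bytes finds nothing.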
Hope this helps.

pyparsing for querying a database of chemical elements

I would like to parse a query for a database of chemical elements.
The database is stored in an XML file. Parsing that file produces a nested dictionary that is stored in a singleton object that inherits from collections.OrderedDict.
Asking for an element will give me an ordered dictionary of its corresponding properties
(i.e. ELEMENTS['C'] --> {'name':'carbon','neutron' : 0,'proton':6, ...}).
Conversely, asking for a property will give me an ordered dictionary of its values for all the elements (i.e. ELEMENTS['proton'] --> {'H' : 1, 'He' : 2} ...).
A typical query could be:
mass > 10 or (nucleon < 20 and atomic_radius < 5)
where each 'subquery' (i.e. mass > 10) will return the set of elements that matches it.
Then, the query will be converted internally to a string that will later be evaluated to produce the set of indexes of the elements that matched it. In that context the operators and/or are not boolean operators but rather set operators that act upon Python sets.
I recently sent a post for building such a query. Thanks to the useful answers I got, I think that I did more or less the job (I hope on a nice way !) but I still have some questions related to pyparsing.
Here is my code:
import numpy
from pyparsing import *
# This imports a singleton object storing the database dictionary as
# described earlier
from ElementsDatabase import ELEMENTS
and_operator = oneOf(['and','&'], caseless=True)
or_operator = oneOf(['or' ,'|'], caseless=True)
# ELEMENTS.properties is a property getter that returns the list of
# registered properties in the database
props = oneOf(ELEMENTS.properties, caseless=True)
# A property keyword can be quoted or not.
props = Suppress('"') + props + Suppress('"') | props
# When parsed, it must be replaced by the following expression that
# will be eval later.
props.setParseAction(lambda t : "numpy.array(ELEMENTS['%s'].values())" % t[0].lower())
quote = QuotedString('"')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?(\d+(\.\d*)?)?([eE][+-]?\d+)?').setParseAction(lambda t:float(t[0]))
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_expr = props + comparison_operator + (quote | float_ | integer)
comparison_expr.setParseAction(lambda t : "set(numpy.where(%s)%s%s)" % tuple(t))
grammar = Combine(operatorPrecedence(comparison_expr, [(and_operator, 2, opAssoc.LEFT), (or_operator, 2, opAssoc.LEFT)]))
# A test query
res = grammar.parseString('"mass " > 30 or (nucleon == 1)',parseAll=True)
print eval(' '.join(res._asStringList()))
My questions are the following:
1 using 'transformString' instead of 'parseString' never triggers any
exception, even when the string to be parsed does not match the grammar.
However, it is exactly the functionality I need. Is there a way to do so?
2 I would like to reintroduce white spaces between my tokens so
that my eval does not fail. The only way I found to do so is the one
implemented above. Do you see a better way using pyparsing?
Sorry for the long post, but I wanted to describe the context in more detail. BTW, if you find this approach bad, do not hesitate to tell me!
Thank you very much for your help.
Eric

Do not worry about my question; I found a workaround. I used the SimpleBool.py example shipped with pyparsing (thanks for the hint, Paul).
Basically, I used the following approach:
1 for each subquery (i.e. mass > 10), using the setParseAction method,
I attached a function that returns the set of elements that matched
the subquery
2 then, I attached the following functions for each logical operator (and,
or and not):
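def not_operator(token):
    _, s = token[0]
    # ELEMENTS is the singleton described in my original post
    return set(ELEMENTS.keys()).difference(s)
def and_operator(token):
    s1, _, s2 = token[0]
    # set intersection; a plain `and` would just return one of the two operands
    return s1 & s2
def or_operator(token):
    s1, _, s2 = token[0]
    # set union; a plain `or` would just return one of the two operands
    return s1 | s2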
# Thanks for Paul for the hint.
grammar = operatorPrecedence(comparison_expr,
[(not_token, 1,opAssoc.RIGHT,not_operator),
(and_token, 2, opAssoc.LEFT,and_operator),
(or_token, 2, opAssoc.LEFT,or_operator)])
Please note that these operators act upon Python sets rather than
on booleans.
And that does the job.
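Because the sub-query results are sets, it is worth spelling out how they combine. The sketch below is independent of pyparsing and uses a toy four-element database; the ELEMENTS dictionary and the matching helper are assumptions made for illustration and are not the singleton from the post. It shows intersection, union and complement, and also that Python's boolean and applied to two sets returns one of the operands rather than their intersection.

```python
import operator

# Toy stand-in for the real database (illustrative values only).
ELEMENTS = {
    "H": {"mass": 1.008, "nucleon": 1},
    "C": {"mass": 12.011, "nucleon": 12},
    "O": {"mass": 15.999, "nucleon": 16},
    "Fe": {"mass": 55.845, "nucleon": 56},
}


def matching(prop, compare, value):
    """Set of element symbols whose property satisfies the comparison."""
    return {
        symbol
        for symbol, props in ELEMENTS.items()
        if compare(props[prop], value)
    }


heavy = matching("mass", operator.gt, 10)     # sub-query: mass > 10
small = matching("nucleon", operator.lt, 20)  # sub-query: nucleon < 20

both = heavy & small               # "and" between sub-queries: intersection
either = heavy | small             # "or" between sub-queries: union
not_heavy = set(ELEMENTS) - heavy  # "not": complement within the database

# The boolean operator is not a set operation: for a non-empty left operand
# it simply returns the right operand.
short_circuit = heavy and small

print(sorted(both), sorted(either), sorted(not_heavy), sorted(short_circuit))
```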
I hope that this approach will help anyone of you.
Eric