JanusGraph - Warning about all vertices scan after index was created - gremlin

I am using JanusGraph 0.2.0 and have the following vertex class defined in Python:
import goblin
from gremlin_python.process.traversal import Cardinality

# TypedVertex is the asker's own base class (not shown; presumably a goblin.Vertex subclass).
class Airport(TypedVertex):
    type = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_code = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_city = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_name = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_region = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_runways = goblin.VertexProperty(goblin.Integer, card=Cardinality.single)
    airport_longest_runway = goblin.VertexProperty(goblin.Integer, card=Cardinality.single)
    airport_elev = goblin.VertexProperty(goblin.Integer, card=Cardinality.single)
    airport_country = goblin.VertexProperty(goblin.String, card=Cardinality.single)
    airport_lat = goblin.VertexProperty(goblin.Float, card=Cardinality.single)
    airport_long = goblin.VertexProperty(goblin.Float, card=Cardinality.single)
I then defined indexes for this vertex on the airport_code, airport_city and airport_country properties using the following commands (some commands were omitted to keep it short):
mgmt.makePropertyKey('type').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('airport_city').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('airport_code').dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('airport_country').dataType(String.class).cardinality(Cardinality.SINGLE).make()
airport_code = mgmt.getPropertyKey('airport_code')
airport_city = mgmt.getPropertyKey('airport_city')
airport_country = mgmt.getPropertyKey('airport_country')
mgmt.buildIndex('by_airport_code_unique', Vertex.class).addKey(airport_code).unique().buildCompositeIndex()
mgmt.buildIndex('by_airport_city', Vertex.class).addKey(airport_city).buildCompositeIndex()
mgmt.buildIndex('by_airport_country', Vertex.class).addKey(airport_country).buildCompositeIndex()
mgmt.awaitGraphIndexStatus(graph, 'by_airport_code_unique').call()
mgmt.awaitGraphIndexStatus(graph, 'by_airport_city').call()
mgmt.awaitGraphIndexStatus(graph, 'by_airport_country').call()
After creating them, I ran a script to describe the schema, and all the indexes show as REGISTERED:
| Graph Index            | Type      | Element          | Unique | Backing       | PropertyKey     | Status     |
|------------------------|-----------|------------------|--------|---------------|-----------------|------------|
| by_airport_code_unique | Composite | JanusGraphVertex | true   | internalindex | airport_code    | REGISTERED |
| by_airport_city        | Composite | JanusGraphVertex | false  | internalindex | airport_city    | REGISTERED |
| by_airport_country     | Composite | JanusGraphVertex | false  | internalindex | airport_country | REGISTERED |
When I try to insert a second vertex with the same airport_code, I get a constraint-violation exception, as expected. However, if I go into the Gremlin console and run a traversal to retrieve the vertices by their airport_code:
g.V().has('airport_code').values()
I get a warning: WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
I had a similar problem a few weeks ago. I was trying to define indexes based on labels, and I was told that JanusGraph did not support indexes on labels at the time. However, I don't think that is the case here.
Any suggestions or ideas on why my index is not working or not being used?
Thanks in advance for any help.
--MD

You are seeing the warning because your query does not use the index. has('airport_code') only checks that the property exists, whereas a composite index is used for equality matches.
Composite indexes are very fast and efficient but limited to equality lookups for a particular, previously-defined combination of property keys. Mixed indexes can be used for lookups on any combination of indexed keys and support multiple condition predicates in addition to equality depending on the backing index store.
In order to leverage a composite index, you need to provide the property and a value to match. For example:
g.V().has('airport_code', 'JFK').toList()
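A toy Python model may help show why only the equality form can use the index. It treats a composite index as a map from one property's value to vertex ids. This is a simplification for illustration only: the class and the sample data are hypothetical, and this is not how JanusGraph actually stores its indexes.

```python
from collections import defaultdict


class ToyCompositeIndex:
    """Toy model only: maps one property's value to the ids of vertices holding it.
    Not JanusGraph's storage format; it just illustrates why a value is required."""

    def __init__(self, key):
        self.key = key
        self.entries = defaultdict(set)  # property value -> set of vertex ids

    def add(self, vertex_id, props):
        # Only vertices that actually have the indexed property get an entry.
        if self.key in props:
            self.entries[props[self.key]].add(vertex_id)

    def lookup(self, value):
        # Equality match, like has('airport_code', 'JFK'): one direct lookup, no scan.
        return sorted(self.entries.get(value, set()))


def scan_for_key(vertices, key):
    # Existence check, like has('airport_code'): the index is keyed by value,
    # so with no value to look up, every vertex has to be visited.
    return sorted(vid for vid, props in vertices.items() if key in props)


# Hypothetical sample data.
vertices = {
    1: {"airport_code": "JFK", "airport_city": "New York"},
    2: {"airport_code": "LHR", "airport_city": "London"},
    3: {"airport_city": "Springfield"},
}

index = ToyCompositeIndex("airport_code")
for vid, props in vertices.items():
    index.add(vid, props)

print(index.lookup("JFK"))                     # [1]
print(scan_for_key(vertices, "airport_code"))  # [1, 2]
```

When the value is known, the lookup is a single dictionary access. When only the key is known, every vertex must be checked, which is the full scan the warning describes.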
I'm not sure why the index wasn't ENABLED after creation. Perhaps it was something in the steps you left out. If you create the index within the same management transaction as the property keys, it should be ENABLED rather than REGISTERED. Check out the index lifecycle wiki.

Related

Using Indexes results in Update locks cannot be acquired during a READ UNCOMMITTED transaction

After upgrading to MariaDB 10.5.11, I ran into a weird problem with indexes.
I have a simple table with two columns, Type (varchar) and Point (point),
with an index on Type (Tindex) and a spatial index on Point (Pindex).
Now a query like
SELECT X(Point) as x,Y(Point) as y,hotels.Type FROM hotels WHERE (Type in ("acco")) AND MBRContains( GeomFromText( 'LINESTRING(4.922 52.909,5.625 52.483)' ), hotels.Point);
results in the error:
Error in query (1207): Update locks cannot be acquired during a READ UNCOMMITTED transaction
whereas both
SELECT X(Point) as x,Y(Point) as y,hotels.Type FROM hotels USE INDEX (Pindex) WHERE (Type in ("acco")) AND MBRContains( GeomFromText( 'LINESTRING(4.922 52.909,5.625 52.483)' ), hotels.Point);
and
SELECT X(Point) as x,Y(Point) as y,hotels.Type FROM hotels USE INDEX (Tindex) WHERE (Type in ("acco")) AND MBRContains( GeomFromText( 'LINESTRING(4.922 52.909,5.625 52.483)' ), hotels.Point);
work fine, as the original query did in MariaDB 10.5.10. EXPLAIN output for the original query:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | SIMPLE | hotels | range\|filter | Type,Pindex | Pindex\|Type | 34\|302 | NULL | 340 (4%) | Using where; Using rowid filter |
The issue is now being tracked as MDEV-26123 (I guess you reported it there). The issue description says that the problem was introduced in MariaDB 10.2.39, 10.3.30, 10.4.20, 10.5.11, 10.6.1.
I ran into the issue after upgrading to MariaDB 10.6.4. I downgraded to 10.6.0, which was possible without having to do any migration of the data. It seems to have fixed the problem for now.
The cause of this appears to be the code fix for MDEV-25594.
I cannot see anything in the commit message or discussion there that indicates that a change to the READ UNCOMMITTED behavior was intentional.
There are no open bug reports on this, so I recommend you create a new bug report.
select @@session.autocommit;
set @@session.autocommit=0;
select @@session.autocommit;
# add to my.cnf
autocommit = 0
Using MariaDB 10.2.40 (resolved).
https://developpaper.com/transaction-isolation-level-of-mariadb/

Can we make pack_all consider only non-null & non-empty columns

The pack_all() function includes all input columns when building a dynamic object. Is it possible to make it consider only non-null, non-empty columns? If not, is there a workaround to filter the resulting dynamic value?
There is no flavor of pack_all that will do it, but as an alternative, you can combine mv-apply and mv-expand operators to achieve this. Here is an example (adapted from the docs):
datatable(SourceNumber:string,TargetNumber:string,CharsCount:long)
[
'555-555-1234','555-555-1212',46,
'555-555-1234','555-555-1213',50,
'555-555-1212','',int(null)
]
| extend values = pack_all()
| mv-apply removeProperties = values on
(
mv-expand kind = array values
| where isempty(values[1])
| summarize propsToRemove = make_set(values[0])
)
| extend values = bag_remove_keys(values, propsToRemove)
| project-away propsToRemove
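For readers less familiar with mv-apply, here is a rough Python emulation of what the query above does to each row. It is only an analogy, not Kusto code. The function name is made up, and Python's None and "" stand in for Kusto's null and empty values.

```python
def remove_empty_properties(bag):
    """Rough Python analogue of the mv-apply workaround above (not Kusto itself)."""
    # mv-expand kind = array: the bag becomes a list of [key, value] pairs.
    pairs = [[key, value] for key, value in bag.items()]
    # where isempty(values[1]) | summarize make_set(values[0]):
    # collect the keys whose value is null (None) or an empty string.
    props_to_remove = {key for key, value in pairs if value is None or value == ""}
    # bag_remove_keys(values, propsToRemove): keep every other key.
    return {key: value for key, value in bag.items() if key not in props_to_remove}


# The same three rows as the datatable above, already packed into dicts (like pack_all()).
rows = [
    {"SourceNumber": "555-555-1234", "TargetNumber": "555-555-1212", "CharsCount": 46},
    {"SourceNumber": "555-555-1234", "TargetNumber": "555-555-1213", "CharsCount": 50},
    {"SourceNumber": "555-555-1212", "TargetNumber": "", "CharsCount": None},
]

for row in rows:
    print(remove_empty_properties(row))
```

Only the third row changes: its empty TargetNumber and null CharsCount are dropped, leaving just SourceNumber.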
It's worth adding as a new answer that pack_all() has since gained an option to exclude null/empty values:
pack_all([ignore_null_empty])
ignore_null_empty: An optional bool indicating whether to
ignore null/empty columns and exclude them from the resulting property
bag. Default: false.
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/packallfunction

Use query to populate an extend in KQL?

I have a JSON property bag being received as a configuration update. I would like to retrieve the latest existing property bag from the table and manipulate it with the bag_remove_keys and bag_merge functions, removing the keys that are being updated.
This is being used in an update policy, so I have the new property bag in an extend from the input data. I need to perform another extend to retrieve the latest property set currently in the table.
Something similar to the below:
rawhsimessages
//Get #RecType
| extend ParsedMessage = parse_json(Message)
| extend Objects = ParsedMessage["ExportedConfig"]["Objects"]
| parse Objects with "[" Objects "]"
| extend Properties = parse_json(Objects)
| extend RecType = Properties["#RecType"]
| where RecType == "CHANGE"
| extend latestconfig = XXXX(GeoSCADAConfigurationTest | where Id == Properties["Id"] | summarize arg_max(ConfigTime, Properties))
| project-away Message, ParsedMessage, Objects
Can I replace the XXXX with anything that will allow me to do this?
If not, is there a better approach I can take?
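One way to pin down the intended semantics before worrying about KQL syntax is a plain Python sketch of the merge. The function names, the integer ConfigTime values and the sample records are all hypothetical. The sketch only mirrors the arg_max, bag_remove_keys and bag_merge steps described in the question.

```python
def latest_config(history, config_id):
    # Rough analogue of: where Id == ... | summarize arg_max(ConfigTime, Properties)
    rows = [r for r in history if r["Id"] == config_id]
    if not rows:
        return {}
    return max(rows, key=lambda r: r["ConfigTime"])["Properties"]


def apply_change(current, update):
    # Analogue of bag_remove_keys(current, <keys of update>): drop keys being replaced.
    kept = {k: v for k, v in current.items() if k not in update}
    # Analogue of bag_merge(kept, update): the key sets are now disjoint,
    # so the result does not depend on which bag would win a conflict.
    return {**kept, **update}


# Hypothetical history table and incoming CHANGE record.
history = [
    {"Id": "A1", "ConfigTime": 1, "Properties": {"Name": "Pump", "Limit": 10}},
    {"Id": "A1", "ConfigTime": 2, "Properties": {"Name": "Pump", "Limit": 12}},
    {"Id": "B7", "ConfigTime": 3, "Properties": {"Name": "Valve"}},
]
update = {"Limit": 15, "Units": "bar"}

print(apply_change(latest_config(history, "A1"), update))
```

Removing the updated keys before merging means the result does not rely on how bag_merge resolves duplicate keys.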

Summarizing amount of times options are selected true/false in a concatenated string

I'm pretty new to KQL and having a difficult time with it (I don't have a background in stats, and I'm not very good at SQL either). I have telemetry data coming in from Microsoft AppCenter that I want to turn into some charts. First, though, I need to figure out how to split a concatenated string that is essentially a dictionary whose values are either true or false. I want to count occurrences of each, so every key would have two values (true/false), each with its own count.
The input string I'm trying to get this data from has the format "Remove Splash/Main Menu Branding=True;Disable Aim Assist=False". Items are separated by ; and each key is separated from its value by =. I am trying to figure out which options my users are choosing this way. The example string here would be split into:
Remove Splash/Main Menu Branding = True (count 1)
Disable Aim Assist = False (count 1).
If a new item came in that was Remove Splash/Main Menu Branding=True;Disable Aim Assist=True the summarized data would be
Remove Splash/Main Menu Branding = True (count 2)
Disable Aim Assist = False (count 1).
Disable Aim Assist = True (count 1).
So far I've got a query that selects a single item, but I don't know how to count this across multiple rows:
customEvents
| where timestamp > ago(7d)
| where name == "Installed a mod"
| extend Properties = todynamic(tostring(customDimensions.Properties))
| where isnotnull(Properties.["Alternate Options Selected"])
| extend OptionsStr = Properties.["Alternate Options Selected"] //The example string in above
| extend ModName = Properties.["Mod name"]
| where ModName startswith "SP Controller Support" //want to filter only to one mod's options
| extend optionsSplit = split(OptionsStr, ";")
| summarize any(optionsSplit)
I'm not sure how to turn these into counts per key and value, though. If anyone has suggestions, tips or examples for something like this, I would really appreciate it. Thanks.
Here you go:
let MyTable = datatable(Flags:string) [
"Remove Splash/Main Menu Branding=True;Disable Aim Assist=False",
"Remove Splash/Main Menu Branding=True;Disable Aim Assist=True"
];
MyTable
| extend Flags = split(Flags, ";")
| mv-expand Flag = Flags to typeof(string)
| summarize Count = count() by Flag
The output of this is:
| Flag | Count |
|---------------------------------------|-------|
| Remove Splash/Main Menu Branding=True | 2 |
| Disable Aim Assist=False | 1 |
| Disable Aim Assist=True | 1 |
Explanation:
First, split every input string (each containing multiple flags) into substrings so that each one holds a single flag. The split function does this.
Now the new Flags column holds a list of strings (each containing a single flag), and you want a separate record for every string, so you use the mv-expand operator.
Lastly, count how many times each key=value pair appears with summarize count() by Flag.
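The same three steps can be mirrored in Python to check the expected counts. This is only an analogy for the split / mv-expand / summarize pipeline, and the function name is made up.

```python
from collections import Counter


def count_flags(rows):
    counts = Counter()
    for flags in rows:
        # split(Flags, ";") + mv-expand: one item per key=value pair
        for flag in flags.split(";"):
            # summarize count() by Flag
            counts[flag] += 1
    return counts


rows = [
    "Remove Splash/Main Menu Branding=True;Disable Aim Assist=False",
    "Remove Splash/Main Menu Branding=True;Disable Aim Assist=True",
]
for flag, count in count_flags(rows).items():
    print(flag, count)
```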
In case you want to see one record (in the output) per Key, then you can use the following query instead:
let MyTable = datatable(Flags:string) [
"Remove Splash/Main Menu Branding=True;Disable Aim Assist=False",
"Remove Splash/Main Menu Branding=True;Disable Aim Assist=True"
];
MyTable
| extend Flags = split(Flags, ";")
| mv-expand Flag = Flags to typeof(string)
| parse Flag with Key "=" Value
| project Key, Value
| evaluate pivot(Value, count(Value))
Its output is:
| Key | False | True |
|----------------------------------|-------|------|
| Remove Splash/Main Menu Branding | 0 | 2 |
| Disable Aim Assist | 1 | 1 |
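Similarly, here is a rough Python analogue of the parse + pivot variant, with a hypothetical function name. It produces one row per key and one count column per distinct value, using 0 where a key never had that value.

```python
from collections import Counter, defaultdict


def pivot_flags(rows):
    counts = defaultdict(Counter)  # key -> Counter of its values
    seen_values = set()
    for flags in rows:
        for flag in flags.split(";"):
            # parse Flag with Key "=" Value: split each pair at its "="
            # (every pair in this sample contains exactly one "=")
            key, _, value = flag.partition("=")
            counts[key][value] += 1
            seen_values.add(value)
    # evaluate pivot(Value, count(Value)): one column per distinct value, 0 where absent
    columns = sorted(seen_values)
    return {key: {col: counts[key][col] for col in columns} for key in counts}


rows = [
    "Remove Splash/Main Menu Branding=True;Disable Aim Assist=False",
    "Remove Splash/Main Menu Branding=True;Disable Aim Assist=True",
]
for key, row in pivot_flags(rows).items():
    print(key, row)
```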
You wrote that you're new to KQL, so you might find the following free Pluralsight courses interesting:
How to start with Microsoft Azure Data Explorer
Basic KQL
Azure Data Explorer – Advanced KQL
P.S. In the future, please provide sample input in datatable format (if you're using Kusto Explorer, select the relevant query results, right-click the selection, and click Copy as datatable() literal), as well as the expected output in table format, so that it's easier to understand what you want to achieve.

For-each-row scenario in Kusto

Query1
cluster(x).database('$systemdb').Operations
| where Operation == "DatabaseCreate" and Database contains "oci-"| where State =='Completed'
and StartedOn between (datetime(2020-04-07) .. 3d)
| distinct Database , StartedOn
| order by StartedOn desc
The output of Query1 is a list of databases. Now I have to pass each database value into Query2 to get its build number.
Query2:
set query_take_max_records=5000;
let view=datatable(Property:string,Value:dynamic)[];
let viewFile=datatable(FileName:string)[];
alias database db = cluster(x).database('y');
let latestInfoFile = toscalar((
union isfuzzy=true viewFile,database('db').['TextFileLogs']
| where FileName contains "AzureStackStampInformation"
| distinct FileName
| order by FileName
| take 1));
union isfuzzy=true view,(
database('db').['TextFileLogs']
| where FileName == latestInfoFile
| distinct LineNumber,FileLineContent
| order by LineNumber asc
| summarize StampInfo=(toobject(strcat_array(makelist(FileLineContent,100000), "\r\n")))
| mvexpand bagexpansion=array StampInfo
| project Property=tostring(StampInfo[0]), Value=StampInfo[1]
)|where Property contains "StampVersion" | project BuildNumber = Value;
The database() function is a special scoping function, and it does not support non-constant arguments due to security considerations.
As a result, you cannot use a sub-query to fetch a list of databases and then use that list as input to the database() function.
This behavior is described at:
https://learn.microsoft.com/en-us/azure/kusto/query/databasefunction?pivots=azuredataexplorer
Syntax
database(stringConstant)
Arguments
stringConstant: Name of the database that is referenced. The database identified can be either DatabaseName or PrettyName. The argument has to be constant prior to query execution, i.e. it cannot come from sub-query evaluation.
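A common workaround is to do the for-each step on the client. This is not described in the linked docs, so treat it as a suggestion. First run Query1, then generate one copy of Query2 per database, with the name written in as a constant. The template below is hypothetical and shows only the alias line from Query2. In practice the whole Query2 text would go in it, with 'y' replaced by {db_name}. Sending each generated query is left to whichever Kusto client you already use.

```python
# Hypothetical template: only the alias line of Query2, with the database name
# replaced by a placeholder. cluster(x) is kept exactly as written in the question.
QUERY2_TEMPLATE = (
    "alias database db = cluster(x).database('{db_name}');\n"
    "// ... remainder of Query2 ...\n"
)


def build_query2_per_database(database_names, template=QUERY2_TEMPLATE):
    queries = {}
    for name in database_names:
        # The name is inlined as a constant string literal, which is what database() requires.
        # Names that could break out of the single-quoted literal are rejected, not escaped.
        if "'" in name or "\\" in name:
            raise ValueError(f"unexpected character in database name: {name!r}")
        queries[name] = template.format(db_name=name)
    return queries


# Hypothetical output of Query1.
names = ["oci-db1", "oci-db2"]
for name, text in build_query2_per_database(names).items():
    print(name)
    print(text)
```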
