How to recover tabular data using a Neo4j query?

I've stored numeric tabular data as relationship properties in a Neo4j database. I would like to recover the data in tabular form.
For instance, one relationship was stored as follows:
MATCH (g:GNE),(p:EXP)
WHERE g.etr='5313' AND p.NExp='Bos_RM'
CREATE UNIQUE (p)-[r:Was_norm
{Method:'NULL', time_t_35: '6.04',time_t9: '6.587',time_t14: '5.708',time_t31: '6.89',time_t224: '4.842'}
]->(g)
I tried a query like this:
MATCH (g:GNE)-[r1:Was_sel]-(e:EXP)-[r2:Was_norm]-(g)
WHERE e.NExp = 'Bos_SM'
RETURN g.etr,r2
but I'd like to recover the data in tabular form, and in the correct order.
Does anyone have any suggestions?

It may not be possible to do what you want with your current data model, given Cypher's current capabilities. Part of the problem is that there is no way to get a property value without hardcoding (in your query) the name of the property. Another part of the problem is that property keys are not necessarily returned in the original order (or in any predictable order).
Instead, you can get around these problems by changing the way you store your tabular data.
For example, suppose you stored a node this way (notice that the collections are stored in the desired order):
MATCH (g:GNE),(p:EXP)
WHERE g.etr='5313' AND p.NExp='Bos_RM'
CREATE UNIQUE
(p)-[r:Was_norm {
Method:'NULL',
times: [ 9, 14, 31, 224],
values:[6.587, 5.708, 6.89, 4.842]
}]->(g)
Given the above data model, you can easily get the tabular data back as 2 separate arrays:
MATCH (g:GNE)-[r:Was_norm]->(p:EXP)
WHERE g.etr='5313' AND p.NExp='Bos_RM'
RETURN g.etr, r.times, r.values;
Or, if you wanted to get the data back in a single array:
MATCH (g:GNE)-[r:Was_norm]->(p:EXP)
WHERE g.etr='5313' AND p.NExp='Bos_RM'
RETURN g.etr,
REDUCE(s =[], i IN RANGE(0,LENGTH(r.times)-1) | s + { time: r.times[i], value: r.values[i]}) AS table;
The result of the above query would look like this:
+-------------------------------------------------------------------------------------------------------+
| g.etr | table |
+-------------------------------------------------------------------------------------------------------+
| "5313" | [{time=9, value=6.587},{time=14, value=5.708},{time=31, value=6.89},{time=224, value=4.842}] |
+-------------------------------------------------------------------------------------------------------+

Related

In KQL how can I use bag_unpack to turn a serialized dictionary object in customDimensions into columns?

I'm trying to write a KQL query that will, among other things, display the contents of a serialized dictionary called Tags which has been added to the Application Insights traces table customDimensions column by application logging.
An example of the serialized Tags dictionary is:
{
"Source": "SAP",
"Destination": "TC",
"SAPDeliveryNo": "0012345678",
"PalletID": "(00)312340123456789012(02)21234987654(05)123456(06)1234567890"
}
I'd like to use evaluate bag_unpack(...) to evaluate the JSON and turn the keys into columns. We're likely to add more keys to the dictionary as the project develops and it would be handy not to have to explicitly list every column name in the query.
However, I'm already using project to reduce the number of other columns I display. How can I use both a project statement, to only display some of the other columns, and evaluate bag_unpack(...) to automatically unpack the Tags dictionary into columns?
Or is that not possible?
This is what I have so far, which doesn't work:
traces
| where datetime_part("dayOfYear", timestamp) == datetime_part("dayOfYear", now())
and message has "SendPalletData"
| extend TagsRaw = parse_json(customDimensions.["Tags"])
| evaluate bag_unpack(TagsRaw)
| project timestamp, message, ActionName = customDimensions.["ActionName"], TagsRaw
| order by timestamp desc
When it runs it displays only the columns listed in the project statement (including TagsRaw, so I know the Tags exist in customDimensions).
evaluate bag_unpack(TagsRaw) doesn't automatically add extra columns to the result set unpacked from the Tags in customDimensions.
EDIT: To clarify what I want to achieve, these are the columns I want to output:
timestamp
message
ActionName
TagsRaw
Source
Destination
SAPDeliveryNo
PalletID
EDIT 2: It turned out a major part of my problem was that double quotes within the Tags data were being escaped. While the Tags as viewed in the Azure portal looked like normal JSON, and copied out as normal JSON, when I copied out the whole of a customDimensions record the Tags looked like "Tags": "{\"Source\":\"SAP\",\"Destination\":\"TC\", ... with the double quotes escaped with backslashes.
The accepted answer from David Markovitz handles this situation in the line:
TagsRaw = todynamic(tostring(customDimensions["Tags"]))
A few comments:
When filtering on timestamp, better use the timestamp column As Is, and do the manipulations on the other side of the equation.
When using the has[...] operators, prefer the case-sensitive one (if feasible).
Everything extracted from a dynamic value is also dynamic, and when given a dynamic value, parse_json() (or its equivalent, todynamic()) simply returns it, As Is.
Therefore, we need to treat customDimensions.["Tags"] in 2 steps:
1st, convert it to string. 2nd, convert the result to dynamic.
To reference a field within a dynamic type you can use X.Y, X["Y"], or X['Y'].
No need to combine them as you did with customDimensions.["Tags"].
As the bag_unpack plugin doc states:
"The specified input column (Column) is removed."
In other words, TagsRaw does not exist following the bag_unpack operation.
Please note that you can add a prefix to the columns generated by bag_unpack. This might make it easier to differentiate them from the rest of the columns.
While you can use project, using project-away is sometimes easier.
// Data sample generation. Not part of the solution.
let traces =
print c1 = "some columns"
,c2 = "we"
,c3 = "don't need"
,timestamp = ago(now()%1d * rand())
,message = "abc SendPalletData xyz"
,customDimensions = dynamic
(
{
"Tags":"{\"Source\":\"SAP\",\"Destination\":\"TC\",\"SAPDeliveryNo\":\"0012345678\",\"PalletID\":\"(00)312340123456789012(02)21234987654(05)123456(06)1234567890\"}"
,"ActionName":"Action1"
}
)
;
// Solution starts here
traces
| where timestamp >= startofday(now())
and message has_cs "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
,ActionName = customDimensions.["ActionName"]
| project-away c*
| evaluate bag_unpack(TagsRaw, "TR_")
| order by timestamp desc
timestamp           2022-08-27T04:15:07.9337681Z
message             abc SendPalletData xyz
ActionName          Action1
TR_Destination      TC
TR_PalletID         (00)312340123456789012(02)21234987654(05)123456(06)1234567890
TR_SAPDeliveryNo    0012345678
TR_Source           SAP
If I understand correctly, you want to use project to limit the number of columns that are displayed, but you also want to include all of the unpacked columns from TagsRaw, without naming all of the tags explicitly.
The easiest way to achieve this is to switch the order of your steps, so that you first do the project (including the TagsRaw column) and then you unpack the tags. If desired, you can then use project-away to specifically remove the TagsRaw column after you've unpacked it.
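A minimal sketch of that reordering, reusing the query from the question together with the todynamic(tostring(...)) conversion from the accepted answer (the exact column names are assumptions based on the question):
traces
| where datetime_part("dayOfYear", timestamp) == datetime_part("dayOfYear", now())
    and message has "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
// project first, keeping TagsRaw alongside the other columns you want
| project timestamp, message, ActionName = tostring(customDimensions["ActionName"]), TagsRaw
// then unpack; the Source, Destination, SAPDeliveryNo and PalletID columns are added here
| evaluate bag_unpack(TagsRaw)
| order by timestamp desc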

How do I remove special characters from an extracted value of a query?

I'm using the following code to get a user's recovery_token and store it in a variable:
Connect To Database psycopg2 ${DB_NAME}
... ${DB_USER_NAME}
... ${DB_USER_PASSWORD}
... ${DB_HOST}
... ${DB_PORT}
${RECOVERY_TOKEN}= Query select recovery_token FROM public."system_user" where document_number like '57136570514'
Looking at the log, the recovery_token is being saved as follows:
${RECOVERY_TOKEN} = [('eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6ImU3ZGM4MmNjLTliMGQtNDc3OC1hMzM0LWEyNjM4MDU1Mzk1MSIsImlhdCI6MTYyMzE5NjM4NSwiZXhwIjoxNjIzMTk2NDQ1fQ.mdsrQlgaWUol02tZO8dXlL3KEwY6kqwj5T7gfRDYVfU',)]
But I need what is saved in the variable ${RECOVERY_TOKEN} to be just the token, without the special characters [('',)]
${RECOVERY_TOKEN} = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6ImU3ZGM4MmNjLTliMGQtNDc3OC1hMzM0LWEyNjM4MDU1Mzk1MSIsImlhdCI6MTYyMzE5NjM4NSwiZXhwIjoxNjIzMTk2NDQ1fQ.mdsrQlgaWUol02tZO8dXlL3KEwY6kqwj5T7gfRDYVfU
Is there any way I can remove the special characters?
Thanks in advance!!
The returned value is a list of tuples, i.e. a two-dimensional matrix (a table); if you had queried for 3 columns, for example, each inner tuple would have 3 members, and if 5 records matched, the list would contain 5 tuples.
Thus, to get the value you are after, pick it out of the matrix by its indexes (which are 0-based, i.e. the first element is at index 0):
${RECOVERY_TOKEN}= Set Variable ${RECOVERY_TOKEN[0][0]}

How to update entries in a table within a nested dictionary?

I am trying to create an order book data structure where a top level dictionary holds 3 basic order types, each of those types has a bid and ask side and each of the sides has a list of tables, one for each ticker. For example, if I want to retrieve all the ask orders of type1 for Google stock, I'd call book[`orderType1][`ask][`GOOG]. I implemented that using the following:
bookTemplate: ([]orderID:`int$();date:"d"$();time:`time$();sym:`$();side:`$();
orderType:`$();price:`float$();quantity:`int$());
bookDict:(1#`)!enlist`orderID xkey bookTemplate;
book: `orderType1`orderType2`orderType3 ! (3# enlist(`ask`bid!(2# enlist bookDict)));
Data retrieval using book[`orderType1][`ask][`ticker] seems to be working fine. The problem appears when I try to add a new order to a specific order book, e.g.:
testorder:`orderID`date`time`sym`side`orderType`price`quantity!(111111111;.z.D;.z.T;
`GOOG;`ask;`orderType1;100.0f;123);
book[`orderType1][`ask][`GOOG],:testorder;
Executing the last query gives 'assign error. What's the reason? How to solve it?
A couple of issues here. The first is that while you can index into dictionaries using a series of in-line repeated keys, i.e.
q)book[`orderType1][`ask][`GOOG]
orderID| date time sym side orderType price quantity
-------| -------------------------------------------
you can't assign values like this (can only assign at one level deep). The better approach is to use dot-indexing (and dot-amend to reassign values). However, the problem is that the value of your book dictionary is getting flattened to a table due to the list of dictionaries being uniform. So this fails:
q)book . `orderType1`ask`GOOG
'rank
You can see how it got flattened by inspecting the terminal
q)book
| ask
----------| -----------------------------------------------------------------
orderType1| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
orderType2| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
orderType3| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
To prevent this flattening you can force the value to be a mixed list by adding a generic null
q)book: ``orderType1`orderType2`orderType3 !(::),(3# enlist(`ask`bid!(2# enlist bookDict)));
Then it looks like this:
q)book
| ::
orderType1| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType2| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType3| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
Dot-indexing now works:
q)book . `orderType1`ask`GOOG
orderID| date time sym side orderType price quantity
-------| -------------------------------------------
which means that dot-amend will now work too
q).[`book;`orderType1`ask`GOOG;,;testorder]
`book
q)book
| ::
orderType1| `ask`bid!+``GOOG!(((+(,`orderID)!,`int$())!+`date`time`sym`side`o
orderType2| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType3| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
Finally, I would recommend reading this FD whitepaper on how to best store book data: http://www.firstderivatives.com/downloads/q_for_Gods_Nov_2012.pdf

how to use spark(r) to partition data?

I have a cluster with 3 nodes.
I have a json file, with each line being a json string.
I need to partition the data into X blocks based on an ID field in each line of the JSON file, so that lines with the same ID will be processed on the same node.
How can I do the partitioning?
I am using SparkR. The structure of the code looks like this:
getObj=function(x)
{
rec1= rjson:::fromJSON(x) ;
kv=list(rec1$id, rec1) ;
return(kv)
}
data= SparkR:::textFile(sc, "Path")
mapdata= SparkR:::map(data, getObj)
mapdataP=SparkR:::partitionBy(mapdata,100)
The id ranges from 1 to 100. I aim to partition into 100 parts, so each part will hold one ID. However, the code above does not give the expected result; some partitions are null. For instance, when I try to get the second partition using
result=SparkR:::collectPartition(mapdataP, 1L), it returns NULL.
Is there something missing or wrong here? Many thanks!

DSE cassandra and spark map collections type: how to perform get operation

For example I have the following table named "example":
name | age | address
'abc' | 12 | {'street':'1', 'city':'kl', 'country':'malaysia'}
'cab' | 15 | {'street':'5', 'city':'jakarta', 'country':'indonesia'}
In Spark I can do this:
scala> val test = sc.cassandraTable ("test","example")
and this:
scala> test.first.getString
and this:
scala> test.first.getMap[String, String]("address")
which gives me all the fields of the address in the form of a map
Question 1: But how do I use the "get" to access "city" information?
Question 2: Is there a way to flatten the entire table?
Question 3: How do I go about counting the number of rows where "city" = "kl"?
Thanks
Question 3 : How do we count the number of rows where city == something
I'll answer 3 first because this may provide you an easier way to work with the data. Something like
sc.cassandraTable[(String,Map[String,String],Int)]("test","example")
.filter( _._2.getOrElse("city","NoCity") == "kl" )
.count
First, I use the type parameter [(String,Map[String,String],Int)] on my cassandraTable call to transform the rows into tuples. This gives me easy access to the Map without any casting. (The order is just how it appeared when I made the table in my test environment; you may have to change the ordering.)
Second, I filter based on _._2, which is shorthand for the second element of the incoming tuple. getOrElse returns the value for the key "city" if the key exists and "NoCity" otherwise. The final equality check tests which city it is.
Finally, I call count to find out the number of entries in the city.
Question 1: How do we access the map?
So the answer to 1 is that once you have a Map, you can call get("key"), getOrElse("key", default), or any of the standard Scala operations to get a value out of the map.
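For example, a minimal sketch of pulling the "city" value out of the address map (the tuple order and column names are assumptions based on my test table above):
import com.datastax.spark.connector._   // assumed import for the cassandraTable syntax

// Read the first row as a (name, address, age) tuple and look up "city" in the address map.
val first = sc.cassandraTable[(String, Map[String, String], Int)]("test", "example").first
val city  = first._2.getOrElse("city", "NoCity")   // "kl" for the 'abc' row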
Question 2: How do we flatten the entire table?
Depending on what you mean by "flatten" this can be a variety of things. For example, if you want to return the entire table as an array to the driver (not recommended, since your RDD is likely to be very big in production), you can call collect.
If you want to flatten the elements of your map into a tuple you can always do something like calling toSeq and you will end up with a list of (key,value) tuples. Feel free to ask another question if I haven't answered what you want with "flattening."
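As a sketch of that second kind of flattening (same assumed tuple mapping as above), toSeq turns each address map into (key, value) pairs which you can then spread across the whole RDD with flatMap:
// Flatten every row's address map into (name, key, value) triples.
val flattened = sc.cassandraTable[(String, Map[String, String], Int)]("test", "example")
  .flatMap { case (name, address, _) => address.toSeq.map { case (k, v) => (name, k, v) } }

flattened.take(10).foreach(println)   // e.g. (abc,street,1), (abc,city,kl), (abc,country,malaysia)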
