I am new to graph databases and stuck on the following issue. I'm trying to store the conditional information below in a graph.
when a=1 and b=2 then sum=3,
when a=2 and b=3 then sum=5 and mul=6
Here there are 4 pre-conditions [(a=1, b=2), (a=2, b=3)] and 3 post-conditions (sum=3, sum=5, mul=6).
The number of pre/post conditions can change from sentence to sentence.
What is the appropriate way to store such information in a graph?
Case 1: (diagram not included)
Case 2: (diagram not included)
Alternatively, please suggest any other scalable way to store such information so that it can be easily queried.
One option is to use a graph like this one, with only Input and Res nodes:
MERGE (a:Input{key: 'a', value: 2})
MERGE (b:Input{key: 'b', value: 1})
MERGE (c:Res{key: 'sum', value: 5})
MERGE (d:Input{key: 'a', value: 7})
MERGE (e:Res{key: 'sum', value: 19})
MERGE (a)-[:POINTS]->(c)
MERGE (b)-[:POINTS]->(c)
MERGE (d)-[:POINTS]->(e)
MERGE (b)-[:POINTS]->(e)
You can then get the result with a query like this:
MATCH (n:Res{key: 'sum'})<-[:POINTS]-(a:Input{key: 'a', value: 2})
WITH n
MATCH (n)<-[:POINTS]-(b:Input{key: 'b', value: 1})
WITH n
MATCH (n)<--(p:Input)
WITH n, COUNT(p) as inputCount
WHERE inputCount=2
RETURN n
Or:
MATCH (res:Res)<--(i:Input)
WITH res, count(i) AS inputCount
WHERE EXISTS { MATCH (res)<--(:Input {key: 'a', value: 2}) }
AND EXISTS { MATCH (res)<--(:Input {key: 'b', value: 1}) }
AND inputCount = 2
RETURN res
Keep in mind, though, that this only works for AND conditions.
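To make the matching rule explicit, here is a minimal plain-Python sketch (not Neo4j code) of what both queries compute. It assumes the graph is modelled as a dict from each Res to the set of (key, value) Input pairs pointing at it, using the same data as the MERGE statements; `graph` and `matching_results` are hypothetical names. A result matches when it is linked to every given input and to no other input, which is what the inputCount check enforces.

```python
# Hypothetical in-memory model of the graph:
# each Res (key, value) maps to the set of Input (key, value) pairs that POINT at it.
graph = {
    ("sum", 5): {("a", 2), ("b", 1)},
    ("sum", 19): {("a", 7), ("b", 1)},
}


def matching_results(conditions):
    """Return the Res nodes whose inputs are exactly the AND-ed conditions.

    "Linked to every condition" plus "input count equals the number of
    conditions" amounts to set equality.
    """
    wanted = set(conditions)
    return [res for res, inputs in graph.items() if inputs == wanted]


print(matching_results([("a", 2), ("b", 1)]))  # [('sum', 5)]
print(matching_results([("b", 1)]))            # []
```

The second call shows why the count check matters: b=1 points at both results, but neither result has b=1 as its only input.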
I have a complex query that groups by multiple columns; here is an example:
$queryBuilder
->join('e.something', 's')
->where('...')
->addGroupBy('e.id, s.someField');
Using ->getQuery()->getResult() returns an array of entities, as I'd expect:
[
0 => App\Entity\ExampleEntity,
1 => App\Entity\ExampleEntity
]
If I try to count the results with a select like this:
$queryBuilder
->select('COUNT(e.id)')
->join('e.something', 's')
->where('...')
->addGroupBy('e.id, s.someField');
I get back an array of arrays, each containing the count for one group. This isn't what I want. If I remove the GROUP BY I get the correct result, but the GROUP BY is required.
[
[
1 => '11',
],
[
1 => '4',
]
]
I'm stuck on how to count the results of the GROUP BY. I have tried using DISTINCT, and I don't want to count in PHP.
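At the SQL level, COUNT() combined with GROUP BY always yields one row per group, which is why an array of counts comes back. One way to count the groups themselves is to wrap the grouped query in an outer SELECT COUNT(*). The sketch below shows this in plain SQL through Python's sqlite3, with a hypothetical schema (tables e and s, column some_field) standing in for the entities in the question. It is not DQL: as far as I know DQL does not allow subqueries in FROM, so in Doctrine this would probably need a native query.

```python
import sqlite3

# Hypothetical schema standing in for ExampleEntity (e) and its joined relation (s).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE e (id INTEGER PRIMARY KEY);
CREATE TABLE s (id INTEGER PRIMARY KEY, e_id INTEGER, some_field TEXT);
INSERT INTO e (id) VALUES (1), (2);
INSERT INTO s (e_id, some_field) VALUES (1, 'x'), (1, 'x'), (2, 'y');
""")

grouped = """
SELECT e.id, s.some_field, COUNT(e.id) AS n
FROM e JOIN s ON s.e_id = e.id
GROUP BY e.id, s.some_field
ORDER BY e.id, s.some_field
"""

# COUNT() with GROUP BY gives one count per group, not one total.
per_group = conn.execute(grouped).fetchall()

# Counting the rows of the grouped query counts the groups themselves.
group_count = conn.execute(f"SELECT COUNT(*) FROM ({grouped})").fetchone()[0]

print(per_group)    # [(1, 'x', 2), (2, 'y', 1)]
print(group_count)  # 2
```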
I'm performing an SQLite3 query similar to:
SELECT * FROM nodes WHERE name IN ('name1', 'name2', 'name3', ...) LIMIT 1
Am I guaranteed that it will search for name1 first, name2 second, and so on, so that by limiting the output to 1 row I know I have found the first hit according to the order of the items in the IN clause?
Update: after some testing, it seems to always return the first hit in the index regardless of the IN order; it uses the order of the index on name. Is there some way to enforce the search order?
The order of the returned rows is not guaranteed to match the order of the items inside the parentheses after IN.
What you can do is add an ORDER BY clause that uses the INSTR() function:
SELECT * FROM nodes
WHERE name IN ('name1', 'name2', 'name3')
ORDER BY INSTR(',name1,name2,name3,', ',' || name || ',')
LIMIT 1
This builds a string from the same items as the IN list, in the same order, joined by commas and wrapped in a leading and trailing comma, assuming the items themselves contain no commas.
This way the rows are ordered by the position of their name in the list, and LIMIT 1 returns the one closest to the start of the list.
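Here is a minimal sketch of the INSTR() approach driven from Python's sqlite3, with the list passed as bound parameters so the same Python list feeds both the IN clause and the ordering string. The sample rows and the helper name first_hit are hypothetical; as in the answer, it assumes no name contains a comma.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_nodes_name ON nodes(name)")
conn.executemany("INSERT INTO nodes (name) VALUES (?)",
                 [("name1",), ("name2",), ("name3",)])


def first_hit(names):
    """Return the first name (in list order) that exists in nodes, or None."""
    # ",a,b,c," lets each item be located as ",item," (assumes no commas in names).
    order_key = "," + ",".join(names) + ","
    placeholders = ",".join("?" for _ in names)
    sql = (f"SELECT name FROM nodes WHERE name IN ({placeholders}) "
           "ORDER BY INSTR(?, ',' || name || ',') LIMIT 1")
    # Positional parameters bind left to right: the IN items first, then order_key.
    row = conn.execute(sql, [*names, order_key]).fetchone()
    return row[0] if row else None


print(first_hit(["name3", "name1", "name2"]))  # name3
```

Even though the index on name would visit name1 first, the ORDER BY makes name3 win because it comes first in the list.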
Another way to get the same result is a CTE that returns the list along with an id giving the desired order; the CTE is then joined to the table:
WITH list(id, item) AS (
SELECT 1, 'name1' UNION ALL
SELECT 2, 'name2' UNION ALL
SELECT 3, 'name3'
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
Or:
WITH list(id, item) AS (
SELECT * FROM (VALUES
(1, 'name1'), (2, 'name2'), (3, 'name3')
)
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
With the CTE approach you don't have to write the list twice.
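A minimal sketch of the VALUES-based CTE built from a Python list, again using Python's sqlite3; the (rank, name) pairs are generated with enumerate() and bound as parameters. The sample rows and the helper name first_hit_cte are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO nodes (name) VALUES (?)",
                 [("name1",), ("name2",), ("name3",)])


def first_hit_cte(names):
    """Return the first name (in list order) that exists in nodes, or None."""
    # One "(?, ?)" row per item: the desired rank, then the name.
    rows = ", ".join("(?, ?)" for _ in names)
    params = [v for rank, name in enumerate(names, start=1) for v in (rank, name)]
    sql = (f"WITH list(id, item) AS (SELECT * FROM (VALUES {rows})) "
           "SELECT n.name FROM nodes n INNER JOIN list l ON l.item = n.name "
           "ORDER BY l.id LIMIT 1")
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


print(first_hit_cte(["name3", "name1"]))  # name3
```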
I have a MySQL table called user_activities in which one column (activities) holds JSON:
id name activities
1 Peter ["football", "volley"]
2 Mary ["football", "hockey", "basketball"]
3 Jason ["volley", "hockey", "golf"]
I need to build a query that, given a list of activities, returns all users who have at least one of the activities in that list.
Example 1:
Given a list
inputList <- list("football", "basketball")
the MySQL query should return:
id name activities
1 Peter ["football", "volley"]
2 Mary ["football", "hockey", "basketball"]
Example 2:
Given a list
inputList <- list("hockey", "golf", "basketball")
the MySQL query should return:
id name activities
2 Mary ["football", "hockey", "basketball"]
3 Jason ["volley", "hockey", "golf"]
I know it's possible to check for each activity one at a time, like this:
SELECT * FROM user_activities
WHERE JSON_SEARCH(`activities`, 'one', 'football') IS NOT NULL
OR JSON_SEARCH(`activities`, 'one', 'basketball') IS NOT NULL
OR JSON_SEARCH(`activities`, 'one', 'volley') IS NOT NULL
OR JSON_SEARCH(`activities`, 'one', 'hockey') IS NOT NULL
OR JSON_SEARCH(`activities`, 'one', 'golf') IS NOT NULL;
But if an activity is not in the specified list (inputList), I don't want to check for it in activities, and inputList changes every time I run the query.
So, is there a way to check the list directly against the contents of activities? I tried:
SELECT * FROM user_activities
WHERE JSON_SEARCH(`activities`, 'all', (",paste(shQuote(inputList, type = "sh"), collapse = ','),")) IS NOT NULL;
but it 'obviously' returns an error:
`Error in .local(conn, statement, ...): could not run statement: Operand should contain 1 column(s)`
because JSON_SEARCH checks whether a single string exists in a JSON array or document, and I'm not passing a single string to the function.
And JSON_CONTAINS:
SELECT * FROM user_activities
WHERE JSON_CONTAINS(`activities`->'$[*]', JSON_ARRAY(", paste(shQuote(inputList, type = "sh"), collapse = ','), "))
returns rows only if all the elements of inputList exist in activities, whereas I want rows where any of the elements exist (not necessarily all).
How could I achieve this?
Edit
I found a solution (see the answer below) by building a dynamic query, as suggested in the question "MySQL Filter JSON_CONTAINS Any value from Array" for PHP.
I found a solution myself, using the dynamic-query approach mentioned in the Edit above.
condition <- function(dbcolumn, inlist) {
  # One JSON_SEARCH test per activity, joined with OR (base R, no extra package needed)
  tests <- sapply(inlist, function(x)
    paste0("JSON_SEARCH(`", dbcolumn, "`, 'one', '", x, "') IS NOT NULL"))
  paste(tests, collapse = " OR ")
}
So if I call the function with Example 2 from my question:
condition("activities",inputList)
it returns:
"JSON_SEARCH(`activities`, 'one', 'hockey') IS NOT NULL OR JSON_SEARCH(`activities`, 'one', 'golf') IS NOT NULL OR JSON_SEARCH(`activities`, 'one', 'basketball') IS NOT NULL"
the final query built in R looks like this:
query <- paste0("SELECT * FROM user_activities
WHERE ", condition("activities", inputList),";")
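The same dynamic-query idea can be sketched in Python with bound parameters instead of pasting the activity names into the SQL string, which also avoids breakage if a name contains a quote. This assumes a DB-API driver that uses the %s placeholder style (PyMySQL and mysql-connector-python both do); the function name build_activity_query is hypothetical, and the column name is still interpolated, so it must come from trusted code. Note that JSON_SEARCH treats % and _ in the search string as LIKE-style wildcards.

```python
def build_activity_query(column, activities):
    """Return (sql, params) selecting rows whose JSON array holds any activity."""
    if not activities:
        raise ValueError("activities must not be empty")
    # The identifier is backtick-quoted; the values are left as %s placeholders.
    test = f"JSON_SEARCH(`{column}`, 'one', %s) IS NOT NULL"
    where = " OR ".join(test for _ in activities)
    return f"SELECT * FROM user_activities WHERE {where}", list(activities)


sql, params = build_activity_query("activities", ["hockey", "golf", "basketball"])
# With a driver cursor: cursor.execute(sql, params)
```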
Given a table with columns (name, lat, lon, population, type) and many rows per name, I'd like to select, for each name, the row with the highest population. The following works if I restrict myself to just name and population:
SELECT name, Max(population)
FROM "table" WHERE name IN ('a', 'b', 'c')
GROUP BY name;
But I also want the other columns (lat, lon, type) in the result. How can I achieve this in SQLite?
SQLite allows you to just list the other columns you want; they are guaranteed to come from the row with the maximum value:
SELECT name, lat, lon, Max(population), type
FROM "table"
WHERE name IN ('a', 'b', 'c')
GROUP BY name;
The docs read:
Special processing occurs when the aggregate function is either min() or max(). Example:
SELECT a, b, max(c) FROM tab1 GROUP BY a;
When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.
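A small check of this bare-column behaviour using Python's sqlite3 module. The table is named places here (a hypothetical name, since table is a reserved word) and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL, population INTEGER, type TEXT)")
conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?, ?)", [
    ("a", 1.0, 1.0, 100, "town"),
    ("a", 2.0, 2.0, 500, "city"),
    ("b", 3.0, 3.0, 50, "village"),
    ("b", 4.0, 4.0, 20, "hamlet"),
])

# lat, lon and type are bare columns: SQLite takes them from the row holding MAX(population).
rows = conn.execute(
    "SELECT name, lat, lon, MAX(population), type FROM places "
    "WHERE name IN ('a', 'b', 'c') GROUP BY name ORDER BY name"
).fetchall()
print(rows)  # [('a', 2.0, 2.0, 500, 'city'), ('b', 3.0, 3.0, 50, 'village')]
```

If several rows tie for the maximum, only one of them is returned, and which one is not specified.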
Alternatively, join against that result to get the complete table rows:
SELECT t1.*
FROM your_table t1
JOIN
(
SELECT name, Max(population) as max_population
FROM your_table
WHERE name IN ('a', 'b', 'c')
GROUP BY name
) t2 ON t1.name = t2.name
and t1.population = t2.max_population
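A sketch of the join approach with Python's sqlite3, using hypothetical sample rows in which two rows for name 'a' tie at the maximum. Unlike the bare-column form, the join returns every tied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL, population INTEGER, type TEXT)")
conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?, ?)", [
    ("a", 1.0, 1.0, 100, "town"),
    ("a", 2.0, 2.0, 500, "city"),
    ("a", 5.0, 5.0, 500, "metro"),
    ("b", 3.0, 3.0, 50, "village"),
])

rows = conn.execute("""
    SELECT t1.name, t1.type
    FROM places t1
    JOIN (
        SELECT name, MAX(population) AS max_population
        FROM places
        WHERE name IN ('a', 'b', 'c')
        GROUP BY name
    ) t2 ON t1.name = t2.name
        AND t1.population = t2.max_population
    ORDER BY t1.name, t1.type
""").fetchall()
print(rows)  # [('a', 'city'), ('a', 'metro'), ('b', 'village')]
```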
RANK or ROW_NUMBER window functions
Although MAX() is guaranteed to work in SQLite, as mentioned at https://stackoverflow.com/a/48328243/895245, the following method appears to be more portable and versatile:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER (
PARTITION BY "name"
ORDER BY "population" DESC
) AS "rnk",
*
FROM "table"
WHERE "name" IN ('a', 'b', 'c')
) sub
WHERE
"sub"."rnk" = 1
ORDER BY
"sub"."name" ASC,
"sub"."population" DESC
That exact same code works on both:
SQLite 3.34.0
PostgreSQL 14.3
Furthermore, we can easily modify that query to cover the following use cases:
if you replace ROW_NUMBER() with RANK(), it returns all ties for the max if more than one row reaches the max
if you replace "sub"."rnk" = 1 with "sub"."rnk" <= n you can get the top n per group rather than just the top 1
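Both variations can be tried from Python's sqlite3 with a sketch like the one below, which parameterises the ranking function and n. It assumes the linked SQLite library is 3.25 or newer (needed for window functions); the table places, its rows, and the helper top_per_group are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, population INTEGER, type TEXT)")
conn.executemany("INSERT INTO places VALUES (?, ?, ?)", [
    ("a", 500, "city"), ("a", 500, "metro"), ("a", 100, "town"),
    ("b", 50, "village"), ("b", 20, "hamlet"),
])


def top_per_group(rank_fn="ROW_NUMBER", n=1):
    """Return up to n rows per name (more with RANK when there are ties)."""
    # rank_fn is pasted into the SQL, so only allow the two known window functions.
    if rank_fn not in ("ROW_NUMBER", "RANK"):
        raise ValueError(rank_fn)
    sql = f"""
        SELECT name, population, type FROM (
            SELECT {rank_fn}() OVER (
                PARTITION BY name ORDER BY population DESC
            ) AS rnk, *
            FROM places
        ) sub
        WHERE rnk <= ?
        ORDER BY name, population DESC, type
    """
    return conn.execute(sql, (n,)).fetchall()


print(top_per_group("RANK", 1))
# [('a', 500, 'city'), ('a', 500, 'metro'), ('b', 50, 'village')]
```

With ROW_NUMBER() and n=1, only one of the two tied 'a' rows comes back, and which one is not defined.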
I'm attempting to reorder the values of a particular key in a Python dictionary.
For instance, if I have the dictionary {'a': "hello", "hey", "hi"}
When I print the values of 'a', how can I change the order of "hello", "hey", and "hi"?
To rephrase: the value of the key 'a' in your dictionary is a list containing 3 string objects.
d = {'a': ["hello", "hey", "hi"]}
This being the case, you can sort the list like this: