Neo4j 2.0: How do I count relationships in a CASE clause?

I've been writing a Cypher query to count all nodes with a label, property, or relationship that matches the criteria typed in by the user. When I run this query, I want a count of the nodes for each distinct label name (which I've accomplished). I also want a count of all the nodes that have specific relationships.
This is the part of the query that returns a count of all the nodes with the labels Fruit, Lesson, Tech, and TestData. I left out the WHERE clause because it's pretty long.
MATCH (n)
RETURN
sum(CASE WHEN any(l IN labels(n) WHERE l='Fruit') THEN 1 ELSE 0 END) AS Fruit,
sum(CASE WHEN any(l IN labels(n) WHERE l='Lesson') THEN 1 ELSE 0 END) AS Lesson,
sum(CASE WHEN any(l IN labels(n) WHERE l='Tech') THEN 1 ELSE 0 END) AS Tech,
sum(CASE WHEN any(l IN labels(n) WHERE l='TestData') THEN 1 ELSE 0 END) AS TestData
It returns
Fruit Lesson Tech TestData
1000 20 100 50
However, I'd also like to count the number of nodes that have specific relationships (I know the type names ahead of time) like "KNOWS", "IS_A", and "DESTINATION". For instance, if a user was searching for the word "knows" and the resulting nodes had a relationship of type "KNOWS", then I would count those nodes. Afterwards, my query would report that I had found 20 nodes connected via the "KNOWS" relationship.
I'd like to do this without excluding any of my result nodes. Notice I didn't include any relationships in the MATCH clause; I still want to include nodes that don't have a relationship (they just won't contribute to the relationship counts).
Does anyone know how to do this? Can it be done?
I'm looking for something similar to this:
MATCH (n)
RETURN
sum(CASE WHEN any(r IN rels(n) WHERE r='KNOWS') THEN 1 ELSE 0 END) AS KNOWS,
sum(CASE WHEN any(r IN rels(n) WHERE r='IS_A') THEN 1 ELSE 0 END) AS IS_A,
sum(CASE WHEN any(r IN rels(n) WHERE r='DESTINATION') THEN 1 ELSE 0 END) AS DESTINATION

I'm not sure I follow what you mean by "count the nodes that matched specific relationships" and "without excluding any of my result nodes". Do you mean to count cases where a node has at least one relationship of each type, in any direction (so if the node has two or five such relationships it still counts as one), as opposed to counting the relationships themselves? Or do you mean that you want a sub-query that works in combination with the previous query counting labels? Does this do what you want?
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
WITH n, collect(r) AS rs
RETURN
SUM(CASE WHEN ANY(l IN labels(n) WHERE l='Fruit') THEN 1 ELSE 0 END) AS Fruit,
SUM(CASE WHEN ANY(l IN labels(n) WHERE l='Lesson') THEN 1 ELSE 0 END) AS Lesson,
SUM(CASE WHEN ANY(l IN labels(n) WHERE l='Tech') THEN 1 ELSE 0 END) AS Tech,
SUM(CASE WHEN ANY(l IN labels(n) WHERE l='TestData') THEN 1 ELSE 0 END) AS TestData,
SUM(CASE WHEN ANY(r IN rs WHERE type(r)='KNOWS') THEN 1 ELSE 0 END) AS KNOWS,
SUM(CASE WHEN ANY(r IN rs WHERE type(r)='IS_A') THEN 1 ELSE 0 END) AS IS_A,
SUM(CASE WHEN ANY(r IN rs WHERE type(r)='DESTINATION') THEN 1 ELSE 0 END) AS DESTINATION
Something that you could try, and that may be an improvement, would be to match each thing you want to count explicitly, count it, and carry the count along with WITH until you return. This has particular benefits if there are many things in your database that you are not counting, which the query above would match and evaluate anyway. So you could experiment with something like
MATCH (n:Fruit)
WITH count(n) as Fruit
MATCH (n:Lesson)
WITH Fruit, count(n) as Lesson
MATCH (n:Tech)
WITH Fruit, Lesson, count(n) as Tech
MATCH (n:TestData)
WITH Fruit, Lesson, Tech, count(n) as TestData
MATCH (n)-[:KNOWS]-()
WITH Fruit, Lesson, Tech, TestData, count(n) as KNOWS
MATCH (n)-[:IS_A]-()
WITH Fruit, Lesson, Tech, TestData, KNOWS, count(n) as IS_A
MATCH (n)-[:DESTINATION]-()
RETURN Fruit, Lesson, Tech, TestData, KNOWS, IS_A, count(n) as DESTINATION
I don't know how much of a difference it will make on your data, but as far as possible it's better to use narrow, specific patterns rather than a broad pattern followed by filtering. It may be worth profiling the two queries and comparing them.

Related

SQLite query returns 0 results

I am having trouble with a query.
Fiddle: https://www.db-fiddle.com/f/JXQHw1VzF7vAowNLFrxv5/1
The query there returns no results.
So my question is: what has to be done to get a result when I want to use both of the following conditions?
(attr_key = 0 AND attr_value & 201326592 = 201326592)
AND
(attr_key = 30 AND attr_value & 8 = 8)
Thanks in advance!
Best regards
One way to check for the presence of some number of key value pairs in the items_attributes table would be to use conditional aggregation:
SELECT i.id
FROM items i
LEFT JOIN items_attributes ia
ON i.id = ia.owner
GROUP BY
i.id
HAVING
SUM(CASE WHEN ia.key = 0 AND ia.value = 201326592 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN ia.key = 30 AND ia.value = 8 THEN 1 ELSE 0 END) > 0
The trick in the above query is that we scan each cluster of key/value pairs for each item, and then check whether the pairs you expect are present.
Note: My query just returns id values from items matching all key value pairs. If you want to bring in other columns from either of the two tables, you may simply add on more joins to what I wrote above.
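For example, to pull in the remaining columns of items, one option is to wrap the grouped query as a derived table and join back to it. This is only a sketch, reusing the table and column names from above:
SELECT i.*
FROM items i
JOIN
(
    SELECT i2.id
    FROM items i2
    LEFT JOIN items_attributes ia
        ON i2.id = ia.owner
    GROUP BY
        i2.id
    HAVING
        SUM(CASE WHEN ia.key = 0 AND ia.value = 201326592 THEN 1 ELSE 0 END) > 0 AND
        SUM(CASE WHEN ia.key = 30 AND ia.value = 8 THEN 1 ELSE 0 END) > 0
) matched
    ON matched.id = i.id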

Optimize left join query with multiple counts in SQLAlchemy?

Trying to optimize a query which has multiple counts for objects in a subordinate table (using aliases in SQLAlchemy). In Witch Academia terms, something like this:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(tried_witch.id) AS tried,
count(passed_witch.id) AS passed,
count(failed_witch.id) AS failed
FROM exam
LEFT OUTER JOIN witch AS tried_witch
ON tried_witch.exam_id = exam.id AND
tried_witch.is_failed = 0 AND
tried_witch.status != "passed"
LEFT OUTER JOIN witch AS passed_witch
ON passed_witch.exam_id = exam.id AND
passed_witch.is_failed = 0 AND
passed_witch.status = "passed"
LEFT OUTER JOIN witch AS failed_witch
ON failed_witch.exam_id = exam.id AND
failed_witch.is_failed = 1
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
The number of witches can be large (hundreds of thousands) and the number of exams is lower (hundreds), so the above query is quite slow. In a lot of similar questions I've found answers that propose the above, but I feel like a totally different approach is needed here, and I am stuck coming up with an alternative. NB: there is a need to order by the calculated counts. It is also important to have zeros as counts, of course, where due. (Do not pay attention to the somewhat funny model: witches can easily clone themselves to attend multiple exams, hence the per-exam identity.)
With one EXISTS subquery, which is not reflected in the above and does not influence the outcome, the situation is:
# Query_time: 1.135747 Lock_time: 0.000209 Rows_sent: 20 Rows_examined: 98174
# Rows_affected: 0
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: Yes
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
Updated query, which is still quite slow:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(CASE WHEN (witch.status != "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS tried,
count(CASE WHEN (witch.status = "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS passed,
count(CASE WHEN (witch.is_failed = 1)
THEN witch.id
ELSE NULL END) AS failed
FROM exam
LEFT OUTER JOIN witch ON witch.exam_id = exam.id
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
Indexing is the key to getting good performance out of this query.
I do not know MariaDB at all, so I'm not sure what the possibilities are. But if it is anything like Microsoft SQL Server, then here is what I would try:
Create ONE composite index on the witch table covering ALL the required columns: exam_id, status and is_failed. If the query uses that index, that should be it. Here the order of the included columns might be very important. Then profile the query in order to understand whether the index is used. See the Optimization and Indexes documentation page.
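For example, something along these lines (a sketch; the best column order depends on your data and should be verified by profiling):
-- composite index on witch covering the join column and the two filter columns
CREATE INDEX ix_witch_exam_status ON witch (exam_id, is_failed, status);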
Consider Generated (Virtual and Persistent) Columns.
It looks like all the information for classifying a witch into the tried, passed or failed bucket is contained in the witch row itself. Therefore, you can create those virtual columns directly on the table and use the PERSISTENT option, which allows creating an index on them. Then you can create an index specifically for this query containing exam_id and the three virtual columns: tried, passed and failed. Make sure your query uses it, and that should be pretty good. The query will then look very simple:
SELECT exam.id,
exam.name,
sum(witch.tried) AS tried,
sum(witch.passed) AS passed,
sum(witch.failed) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(witch.tried)
LIMIT 20
Although the virtual columns are just simple comparisons and AND/OR clauses, you are basically offloading the calculation of the three statuses to the database at INSERT/UPDATE time. Then, during SELECT, your query should be much faster.
Your example does not specify any result filtering (WHERE clause), but if you have one, it might also have an impact on the way one optimises indices for query performance.
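A minimal sketch of the generated-columns setup described above, assuming MariaDB syntax and the table/column names from the query (untested; adjust types and names as needed):
-- persistent generated columns computed from is_failed/status at write time
ALTER TABLE witch
    ADD COLUMN tried  TINYINT AS (is_failed = 0 AND status <> 'passed') PERSISTENT,
    ADD COLUMN passed TINYINT AS (is_failed = 0 AND status = 'passed') PERSISTENT,
    ADD COLUMN failed TINYINT AS (is_failed = 1) PERSISTENT;
-- index covering the join column plus the three flags
CREATE INDEX ix_witch_exam_flags ON witch (exam_id, tried, passed, failed);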
Original answer: Below is the originally proposed change to the query.
Here I assume that the indexing part of the optimisation has already been done.
Could you try with SUM instead of COUNT?
SELECT exam.id,
exam.name,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END) AS tried,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status = 'passed') THEN 1
ELSE 0
END) AS passed,
sum(CASE
WHEN (witch.is_failed = 1) THEN 1
ELSE 0
END) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END)
LIMIT 20
The rest:
Given you have specified SQLAlchemy, here is the SQLAlchemy code which I used to model and generate the query:
# model
from sqlalchemy import Column, ForeignKey, Integer, String, and_, case
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import relationship

Base = declarative_base()

class Exam(Base):
    __tablename__ = 'exam'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Witch(Base):
    __tablename__ = 'witch'
    id = Column(Integer, primary_key=True)
    exam_id = Column(Integer, ForeignKey('exam.id'))
    is_failed = Column(Integer)
    status = Column(String)
    exam = relationship(Exam, backref='witches')

    # computed fields
    @hybrid_property
    def tried(self):
        return self.is_failed == 0 and self.status != 'passed'

    @hybrid_property
    def passed(self):
        return self.is_failed == 0 and self.status == 'passed'

    @hybrid_property
    def failed(self):
        return self.is_failed == 1

    # computed fields: expression
    @tried.expression
    def _tried_expression(cls):
        return case([(and_(
            cls.is_failed == 0,
            cls.status != 'passed',
        ), 1)], else_=0)

    @passed.expression
    def _passed_expression(cls):
        return case([(and_(
            cls.status == 'passed',
            cls.is_failed == 0,
        ), 1)], else_=0)

    @failed.expression
    def _failed_expression(cls):
        return case([(cls.is_failed == 1, 1)], else_=0)
and:
# query (assumes a configured Session available as `session`)
from sqlalchemy import func

q = (
    session.query(
        Exam.id, Exam.name,
        func.sum(Witch.tried).label("tried"),
        func.sum(Witch.passed).label("passed"),
        func.sum(Witch.failed).label("failed"),
    )
    .join(Witch)
    .group_by(Exam.id, Exam.name)
    .order_by(func.sum(Witch.tried))
    .limit(20)
)

Max results with Symfony2 DQL

I have a problem with my database query. I need to set max results, but when I do that I don't get the result I expected (e.g. when an image has 2 hashtags, the number of results decreases by one).
Here is the code:
$r = $this->createQueryBuilder("q")
->select("q AS image, sb, hs")
->addSelect("c, SUM(CASE WHEN (c.likeCount!=false) THEN c.likeCount ELSE 0 END) AS likes")
->addSelect("cm, COUNT(cm) AS comments")
->addSelect("u, SUM(CASE WHEN (c.user=:user) THEN c.likeCount ELSE 0 END) AS userLiked")
->leftJoin("q.siteBox","sb")
->leftJoin("q.imageLikes","c")
->leftJoin("q.hashtags","hs")
->leftJoin("q.comments","cm")
->setMaxResults(4)
->leftJoin("c.user","u")
->setParameter("user",$user)
->groupBy("hs.id,q.id")
->orderBy("q.date","DESC")
;
Sorry for my bad English, but I think everybody will understand.

Use the same parameters multiple times

I have an SQL SELECT (or INSERT) which uses the same two parameters several times.
Is there a way to avoid supplying the same parameter again and again for every "?,?,?,?,..." in the list passed to
cursor.execute(statement, list)
I could imagine something like two named parameters, but without the possibility of code injection.
In the example below every left "?" is the same string, and likewise every right "?". I put the seven counts into one statement in order to get a single result row.
select count(case (aart like "1%") and (adatum between ? and ?) when 1 then 1 else null end) as AufExt,
count(case (aart like "1%E") and (adatum between ? and ?) when 1 then 1 else null end) as AufExtE,
count(case (aart like "1%K") and (adatum between ? and ?) when 1 then 1 else null end) as AufExtK,
count(case (aart like "2S%") and (adatum between ? and ?) when 1 then 1 else null end) as AufInt,
count(case (eart like "3%") and (edatum between ? and ?) when 1 then 1 else null end) as EntExt,
count(case (eart like "3%K") and (edatum between ? and ?) when 1 then 1 else null end) as EntExtK,
count(case (eart like "2S%") and (edatum between ? and ?) when 1 then 1 else null end) as EntInt
from tabelle
A related question to this problem: it looks as if no index is used inside the CASE. Is that correct?
The documentation says:
? A question mark that is not followed by a number creates a parameter with a number one greater than the largest parameter number already assigned. [...]
?NNN A question mark followed by a number NNN holds a spot for the NNN-th parameter. [...]
:AAAA A colon followed by an identifier name holds a spot for a named parameter with the name :AAAA. Named parameters are also numbered. [...] To avoid confusion, it is best to avoid mixing named and numbered parameters.
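With SQLite's ?NNN form, for instance, the two dates can be bound once and referenced repeatedly by number; a shortened sketch of the query above (only two of the seven counts shown):
select count(case (aart like "1%") and (adatum between ?1 and ?2) when 1 then 1 else null end) as AufExt,
       count(case (eart like "3%") and (edatum between ?1 and ?2) when 1 then 1 else null end) as EntExt
from tabelle
Depending on how the driver handles numbered placeholders, the list passed to cursor.execute then only needs to contain the two values once.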
You can also add a WITH clause like this:
with
parms as (select ? as parm1, ? as parm2)
select count(case (aart like "1%") and (adatum between parms.parm1 and parms.parm2) when 1 then 1 else null end) as AufExt
...
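Note that for parms.parm1 to be resolvable, the CTE also has to appear in the FROM clause; a completed sketch for the first count:
with parms as (select ? as parm1, ? as parm2)
select count(case (aart like "1%") and (adatum between parms.parm1 and parms.parm2) when 1 then 1 else null end) as AufExt
from tabelle, parms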

Teradata - Cannot nest aggregate operations

The PROD_AMT I'd like to get works like this: when ACCT_NBR, PROD_NBR and PROD_AMT are all the same, I only need one PROD_AMT, which is 100 (as with DISTINCT); and when ACCT_NBR is the same but PROD_NBR differs, the PROD_AMT I need is 90 (30 + 60).
SELECT ACCT_NBR
,COUNT(DISTINCT CASE WHEN PROD_NBR = 1 THEN SUM(DISTINCT PROD_AMT)
WHEN PROD_NBR > 1 THEN SUM(PROD_AMT)
END) AS AMT
FROM TABLE
ACCT_NBR PROD_NBR PROD_AMT
3007 001 30
3007 002 60
1000 003 100
1000 003 100
There are probably a few ways to solve this. Using a subquery to determine which records should be summed vs. which ones should be treated as duplicates, you could use:
SELECT
acct_nbr,
CASE WHEN sumflag = 'X' THEN SUM(prod_amt) ELSE MAX(prod_amt) END as amt
FROM
(
SELECT
acct_nbr,
prod_nbr,
prod_amt,
CASE WHEN COUNT(*) OVER (PARTITION BY Acct_nbr, prod_nbr, prod_amt) = 1 THEN 'X' ELSE NULL END AS sumflag
FROM
table
)t1
GROUP BY acct_nbr, sumflag
I'm just using MAX() here since it doesn't matter... all the values that will be aggregated with max() we know are duplicates, so it's a wash.
You could get similar results with a UNION query where one query would do the summing in the event that the records are distinct, and the other would just return distinct prod_amt's where the records are duplicates.
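A rough sketch of that UNION ALL variant, reusing the column names from above (untested):
SELECT
acct_nbr,
SUM(amt) AS amt
FROM
(
    -- groups that occur exactly once: keep each amount so they get summed
    SELECT acct_nbr, prod_amt AS amt
    FROM table
    GROUP BY acct_nbr, prod_nbr, prod_amt
    HAVING COUNT(*) = 1
    UNION ALL
    -- duplicated groups: keep a single amount per group
    SELECT acct_nbr, MAX(prod_amt) AS amt
    FROM table
    GROUP BY acct_nbr, prod_nbr, prod_amt
    HAVING COUNT(*) > 1
)t1
GROUP BY acct_nbr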
While the above example is nice if you truly have different aggregation needs depending on complex logic, for your question there's a simpler way of doing the same thing that doesn't use window functions:
SELECT
acct_nbr,
sum(prod_amt) AS amt
FROM
(
SELECT DISTINCT
acct_nbr,
prod_amt
FROM
table
)t1
GROUP BY 1
If you need to adapt this to a complex statement, you can just drop your complex statement in as a subquery where table appears above, like:
SELECT
acct_nbr,
sum(prod_amt) AS amt
FROM
(
SELECT DISTINCT
acct_nbr,
prod_amt
FROM
(
YOUR REALLY COMPLEX QUERY GOES IN HERE
)t2
)t1
GROUP BY 1
