I'm running some queries against my Cosmos database, and the results don't seem to make sense. I am getting 0 results with some queries when I know I should be getting some results. Here are the results of the queries:
SELECT COUNT(c.id) from c where c.ownerTime='2010775021_202009' -- 165229
SELECT COUNT(c.id) from c where c.ownerTime='2010775021_202009' and c.source='five9' -- 132100
SELECT COUNT(c.id) from c where c.ownerTime='2010775021_202009' and c.source<>'five9' -- 0
SELECT COUNT(c.id) from c where c.ownerTime='2010775021_202009' and c.source=null -- 0
My goal is to get a count of all items where the source attribute does not equal 'five9'. But that query (as seen above) returns a count of 0. As you can see, in the first query, the total number of items in this partition is 165,229. And from the second query, the total number of items in this partition where source is equal to 'five9' is 132,100. Common sense tells me that the number of items where the source is not equal to 'five9' should be 33,129. Why are my queries showing 0?
One thing to keep in mind is that when I go to inspect the items in raw form, the items that don't have source='five9' don't have a source attribute at all. Could that be causing this behavior? Also, ownerTime is the partition key.
Here is a small sample of data:
SELECT c.id,c.ownerTime,c.source from c where c.ownerTime='2010775021_202009'
...
{
"id": "L2n8D4c4GOFQTUA",
"ownerTime": "2010775021_202009"
},{
"id": "3524zXL55zQmQ8qk",
"ownerTime": "2010775021_202009",
"source": "five9"
}
...
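Yes, the missing attribute is the cause. When an item has no source property, c.source<>'five9' evaluates to undefined rather than true, so the item is filtered out. c.source=null returns nothing for the same reason: an undefined property is not null. Please try this SQL: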
SELECT COUNT(c.id) from c where c.ownerTime='2010775021_202009' and (not IS_DEFINED(c.source) or c.source<>'five9')
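To see why the extra IS_DEFINED check matters, here is a minimal Python sketch of the filter semantics. It is a toy model, not Cosmos DB code: the UNDEFINED sentinel and the helpers get, not_equal, is_defined and count_where are hypothetical names invented for illustration, and the two items are the ones from the sample data in the question.

```python
# Toy model of how a filter treats a missing property; this is not Cosmos DB code.
UNDEFINED = object()  # hypothetical sentinel standing in for "undefined"


def get(item, prop):
    # Reading a property that is absent yields UNDEFINED, not None/null.
    return item.get(prop, UNDEFINED)


def not_equal(left, right):
    # A comparison with an undefined operand is itself undefined.
    if left is UNDEFINED or right is UNDEFINED:
        return UNDEFINED
    return left != right


def is_defined(value):
    return value is not UNDEFINED


def count_where(items, predicate):
    # Only items whose predicate is exactly True pass the filter.
    return sum(1 for item in items if predicate(item) is True)


items = [
    {"id": "L2n8D4c4GOFQTUA", "ownerTime": "2010775021_202009"},
    {"id": "3524zXL55zQmQ8qk", "ownerTime": "2010775021_202009", "source": "five9"},
]

# Models: c.source <> 'five9'
naive = count_where(items, lambda c: not_equal(get(c, "source"), "five9"))

# Models: not IS_DEFINED(c.source) or c.source <> 'five9'
fixed = count_where(
    items,
    lambda c: (not is_defined(get(c, "source")))
    or not_equal(get(c, "source"), "five9") is True,
)

print(naive, fixed)
```

In this model the naive predicate counts neither item, while the version with the is_defined check counts the item that has no source.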
Related
I have a table like this:
Assume there are many more Names (e.g., E, F, G, H, I) with their respective Date and Produced Items in this table. It's a massive table, so I'd like to write an optimised query.
From this table, I want to query the latest A, B, C, D records.
I was using the following query:
SELECT * FROM c WHERE c.Name IN ('A','B','C','D') ORDER BY c.Date DESC OFFSET 0 LIMIT 4
But the problem with this query is, since I'm ordering by Date, the latest 4 records I'm getting are:
I want to get this result:
Please help me in modifying the query. Thanks.
I'm using SQLite 3.39.3.
I came up with this query to check whether two tables have the same number of rows:
SELECT COUNT(a.auth_id)=COUNT(b.auth_scopus_id) FROM auth_subject_areas_mapping AS a, auth_subject_areas_mapping_old AS b;
But the query never finishes.
EXPLAIN QUERY PLAN gives me:
Questions:
What's wrong with my query?
What is the correct way in SQLite to check whether two tables have the same number of rows?
WITH
acount AS (SELECT count(*) AS a FROM auth_subject_areas_mapping),
bcount AS (SELECT count(*) AS b FROM auth_subject_areas_mapping_old)
SELECT a = b FROM acount, bcount;
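The original query joins the two tables with a comma, which produces every pairing of rows, so both COUNTs run over the same huge cross product. A small sketch with Python's built-in sqlite3 module shows the effect. The table and column names come from the question; the INTEGER column types and the 3-row and 2-row sample data are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types and sample rows are assumptions for illustration only.
    CREATE TABLE auth_subject_areas_mapping (auth_id INTEGER);
    CREATE TABLE auth_subject_areas_mapping_old (auth_scopus_id INTEGER);
    INSERT INTO auth_subject_areas_mapping VALUES (1), (2), (3);
    INSERT INTO auth_subject_areas_mapping_old VALUES (1), (2);
""")

# The comma join pairs every row of a with every row of b (3 * 2 rows here).
cross_join_rows = conn.execute(
    "SELECT COUNT(*) FROM auth_subject_areas_mapping AS a, "
    "auth_subject_areas_mapping_old AS b"
).fetchone()[0]

# The query from the question: both counts see the same cross product.
original = conn.execute(
    "SELECT COUNT(a.auth_id)=COUNT(b.auth_scopus_id) "
    "FROM auth_subject_areas_mapping AS a, auth_subject_areas_mapping_old AS b"
).fetchone()[0]

# The CTE version counts each table on its own, then compares two numbers.
fixed = conn.execute("""
    WITH
    acount AS (SELECT count(*) AS a FROM auth_subject_areas_mapping),
    bcount AS (SELECT count(*) AS b FROM auth_subject_areas_mapping_old)
    SELECT a = b FROM acount, bcount
""").fetchone()[0]

print(cross_join_rows, original, fixed)
```

With these sample rows the original query reports the tables as equal even though they are not, and on large tables the cross product is what makes it appear to never finish.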
I have a simple Core SQL query that gets a count of rows. If I run the EXISTS and the IN separately, each costs around 2-3 RUs, but if I combine them as (EXISTS OR IN), or even as (EXISTS OR TRUE), the cost jumps to 45 RUs. It would make more sense for me to run 2 separate queries than 1. Why does the OR cause the RU consumption to go up?
These are the queries I've experimented with.
SELECT VALUE COUNT(1) FROM ROOT r -- 850 rows, 2-3 RUs
SELECT VALUE COUNT(1) FROM ROOT r WHERE IS_NULL(r.deletedAt) -- 830 rows, 2-3 RUs
SELECT VALUE COUNT(1) FROM ROOT r WHERE IS_NULL(r.deletedAt) AND r.id IN (......) -- 830 rows, 2-3 RUs
SELECT VALUE COUNT(1) FROM ROOT r WHERE IS_NULL(r.deletedAt) AND EXISTS(SELECT s FROM s IN r.shared WHERE s.id = ID) -- 840 rows, 2-3 RUs
SELECT VALUE COUNT(1) FROM ROOT r WHERE IS_NULL(r.deletedAt) AND (EXISTS(SELECT s FROM s IN r.shared WHERE s.id = ID) OR r.id IN (...)) -- 840 rows, 45 RUs
This is also cross-listed on Microsoft Q&A.
Disclaimer: I have no inside knowledge of the Cosmos DB engine, and the following is just an educated guess.
There may be tricks involved that depend on data cardinality, on how your index is set up, and on whether and how the predicate tree can be pruned, but overall it is not too surprising that OR makes for a harder query. You can't have a covering index for an OR predicate, so it requires data lookups.
For index-covered ANDs only, basically:
get matching entries from indexes for indexable predicates and take intersection.
return count
With OR-s you can't work on indexes alone:
get matching entries from indexes for indexable predicates and take intersection.
look up documents (or required parts)
evaluate non-indexable predicates (like A OR B) on all matching documents
return count
Obviously the second plan requires a lot more computation and memory, hence the higher RU charge. The query engine can do all kinds of tricks, but the fact is that it must fetch extra data to make sure your "hard" predicates are taken into account.
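The difference between the two plans can be sketched with a toy model in Python. This only illustrates the guess above and says nothing about how the Cosmos DB engine is actually implemented: the four documents, the two set-based "indexes", and the functions count_and and count_with_or are all hypothetical.

```python
# Toy model only: four documents and two set-based "indexes".
docs = {
    1: {"deleted": False, "shared": ["x"]},
    2: {"deleted": False, "shared": []},
    3: {"deleted": True, "shared": ["x"]},
    4: {"deleted": False, "shared": ["y"]},
}
index_not_deleted = {doc_id for doc_id, d in docs.items() if not d["deleted"]}
index_id = set(docs)  # ids are always indexed in this toy


def count_and(wanted_ids):
    # Index-only plan: intersect index entries, never touch a document.
    matches = index_not_deleted & (index_id & wanted_ids)
    return len(matches), 0  # (count, documents loaded)


def count_with_or(wanted_ids, shared_id):
    # Candidates come from the index, but the OR is checked per document.
    loaded = 0
    count = 0
    for doc_id in index_not_deleted:
        doc = docs[doc_id]  # document lookup
        loaded += 1
        if shared_id in doc["shared"] or doc_id in wanted_ids:
            count += 1
    return count, loaded


print(count_and({2, 3}))
print(count_with_or({2}, "x"))
```

In the toy, the AND-only count never loads a document, while the OR version loads every candidate that survived the indexed predicate. That extra work is the kind of cost that would show up as a higher RU charge.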
BTW, if you are unhappy with the RU charge, you should always check which indexes were applied and how, and whether you can improve anything by setting up different indexes.
See: Indexing metrics in Azure Cosmos DB.
That more complex queries cost more RUs is still to be expected, though.
Considering the following query:
SELECT TOP 1 * FROM c
WHERE c.Type = 'Case'
AND c.Entity.SomeField = #someValue
AND c.Entity.CreatedTimeUtc > #someTime
ORDER BY c.Entity.CreatedTimeUtc DESC
Until recently, when I ran this query, the number of documents processed by the query (RetrievedDocumentCount in the query metrics) was the number of documents that satisfy the first two conditions, regardless of the "CreatedTimeUtc" filter or the TOP 1.
Only when I added a composite index of (Type DESC, Entity.SomeField DESC, Entity.CreatedTimeUtc DESC) and added those fields to the ORDER BY clause did the retrieved document count drop to the number of documents that satisfy all 3 conditions (still not one document as expected, but better).
Then, starting a few days ago, we noticed in our dev environment that the composite index is no longer needed: the retrieved document count dropped to just one document (the number in the TOP, as expected), and the RU charge fell significantly.
My question: is this a new improvement/fix in Cosmos DB? I couldn’t find any announcement or documentation on this matter.
If so, is the roll-out completed or still in-progress? We have several production instances in different regions.
Thanks
There have not been any recent changes to our query engine that would explain why this query is suddenly less expensive.
The only thing that would explain this is that fewer results match the filter than before, and that our query engine was able to perform an optimization that it could not have performed with a larger set of results.
Thanks.
This might be a simple question, but I am confused. Please help.
I am using OrientDB 2.1.9 and I am experimenting with the VehicleHistoryGraph database.
In Studio's Browse mode, I set the limit to 9 records only. Then I enter this simple query:
select out() from Person
The result set I get back has 9 records, BUT only two of those persons have Bought a vehicle. The rest are displayed with empty collections []. This is not what I want. I would expect to get back only those two vertices with collections of edges!
How do I get back only the two persons that bought something?
I also noticed that there is an unwind operator in select. Is it useful in this case? Can you give an example?
Your query asks for out(), so out() is computed in all cases, and you're shown the results. If you only want the rows for which out().size() > 0 then you can construct a query like this:
select out() from v let n=out().size() where $n > 0
If you think that one ought to be able to write this more succinctly, e.g. like so:
select out() as n from v where n > 0
then join the club (e.g. by supporting this enhancement request).
(select out() from v where out().size() > 0 is supported.)