I have a SPARQL query that returns results with a LIMIT of 20.
In this query I also want to know the total number of results without running the query twice (once with LIMIT and once without).
For example: a query has 500 possible matches in total, but with the LIMIT it displays only 20 at a time; in the response I want a field with the total result count, i.e., 500.
Updated question
Suppose
Now if I run a query where sequence = abc_11 with LIMIT=2
I will get something like
which is fine. In addition to this output, what I want is
where totalMatchedResult is 5, because the query actually matched 5 results but returned only 2 because of our LIMIT=2.
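In standard SPARQL 1.1 this can be expressed with a sub-SELECT that computes the count once and joins it onto every paged row. A minimal sketch, assuming a hypothetical predicate ex:sequence holds the value being matched:

PREFIX ex: <http://example.org/>

SELECT ?s ?totalMatchedResult
WHERE {
  # inner subquery: count every match once; only the count is projected out
  {
    SELECT (COUNT(*) AS ?totalMatchedResult)
    WHERE { ?s ex:sequence "abc_11" . }
  }
  # outer pattern: the actual rows, paged by LIMIT
  ?s ex:sequence "abc_11" .
}
LIMIT 2

Note that the store still evaluates the full count internally; the gain is that the client issues only one query.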
I have an ASP.NET application deployed in Azure. It generates plenty of logs, some of which are exceptions. I have a query in a Log Analytics workspace that picks up exceptions from the logs.
I would like to know what is the best and/or cheapest way to detect anomalies in the exception count over a time period.
For example, if the average number of exceptions per hour is N (based on data collected over roughly the past month), and the hourly count goes above N+20 at any point (checked every hour or so), then I need to be notified.
N would change dynamically with the trend.
I would like to know what is the best and/or cheapest way to detect anomalies in the exception count over a time period.
Yes, we can achieve this with the following steps:
Store the average value in a stored query result in Azure, using the .set stored_query_result command.
There are some limitations on keeping the result; refer to MSDOC for detailed information.
Note: the stored query result is available for only 24 hours.
The workaround follows.
1. Set the stored query result:
// Here I am using a stored query result to store the average trace-message count per 5-hour bin
.set stored_query_result average <|
traces
| summarize events = count() by bin(timestamp, 5h)
| summarize avg(events)
2. Once the query result is set, you can use the stored query result value in another KQL query (the stored value remains available for 24 hours):
// Retrieve the stored query result
stored_query_result(<StoredQueryResultName>)
| ... // your query follows as per your need
3. Schedule the alert.
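For example, the alert query in step 3 could compare the last hour's count against the stored baseline. A hedged sketch, assuming the stored result from step 1 is named average and that summarize avg(events) produced its default column name avg_events:

// Compare the last hour's exception count against the stored baseline
stored_query_result("average")
| extend joinKey = 1
| join kind=inner (
    traces
    | where timestamp > ago(1h)
    | summarize currentCount = count()
    | extend joinKey = 1
) on joinKey
| where currentCount > avg_events + 20
| project avg_events, currentCount

An alert rule scheduled hourly can then fire whenever this query returns any rows.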
I need to run a query to find all documents with duplicated e-mails.
SELECT *
FROM (
    SELECT c.Email, COUNT(1) AS cnt
    FROM c
    GROUP BY c.Email
) a
WHERE a.cnt > 1
When I run it in Data Explorer in the Azure Portal it finds 4 results, but that is not the complete list of duplicated emails: I already know one email that is duplicated, it is returned when the query is narrowed (WHERE email = 'x'), and there are about 70 duplicated emails in the collection.
Currently, throughput is set to autoscale with 6000 max RU/s, and the collection has about 4 million documents. When running the query I observe an increased count of 429 (rate limiting) responses on this collection.
Query statistics show that all documents are retrieved from the collection, but the output is only 4 rows (it should be around 70).
The query used 277,324 RUs and took 71 seconds, which gives 3,905 RU/s on average, so it shouldn't be throttled.
Why does Cosmos return only partial results for this query?
What can I do to get all the duplicates?
We are experiencing a sudden performance drop with a query structured like this:
table(tablename)
| where MeasurementName in ('ActiveJobId')
and MachineId == machineId
and SourceTimestamp <= from
and isnotnull(Value)
| order by SourceTimestamp desc
| distinct SourceTimestamp, MeasurementName, tostring(Value), SourceTimestampUtc
| take rows
tablename, machineId, from, and rows are all query parameters; rows is typically 20. The Value column is of type dynamic.
The table contains 240 million entries, with about 64,000 matching the WHERE criteria. The goal of the query is to get the last 20 UNIQUE, non-empty entries for a given machine and data point, starting after a specific date.
The query runs smoothly in the Staging database system, but performance started to degrade on the Dev system, possibly because of the increased data volume.
If we remove the distinct clause, or move it after the take clause, the query completes very quickly (<1 s). The data contains about 5-10% duplicate entries.
To our understanding the query should be executed like this:
Prepare a filter for the source table, starting at a specific datetime
Order descending: walk backwards
Walk down the table and stop once 20 distinct rows have been found
From the time it sometimes takes, it looks almost as if ADX walks down the whole table, performs the distinct, and only then takes the topmost 20 rows.
The problem persists if we swap | order and | distinct around.
The problem disappears if we move | distinct to the end of the query, but then we often receive 1-2 items fewer than required.
Is there a logical error we make, can this query be rewritten, or are there better options at hand?
The goal of the query is to get the last 20 UNIQUE, non-empty entries for a given machine and data point, starting after a specific date.
This part of the description doesn't match the filter in your query (and SourceTimestamp <= from). Did you mean to use >= instead of <=?
Is there a logical error we make, can this query be rewritten, or are there better options at hand?
If you can't eliminate the duplicates upstream, you can consider setting up a materialized view that performs the deduplication, then query the view directly instead of the raw data. Also see Handle duplicate data.
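A minimal sketch of such a view, assuming the raw table is named Measurements (the actual table name is a query parameter in the question):

// Keep one row per key, deduplicating at the view level so queries
// no longer need the expensive distinct
.create materialized-view DedupedMeasurements on table Measurements
{
    Measurements
    | summarize take_any(Value, SourceTimestampUtc)
        by MachineId, MeasurementName, SourceTimestamp
}

The original query can then run against DedupedMeasurements without the | distinct step.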
I'm using Neo4j Server 4.2.5.
The Pattern on which I want to run my query looks as follows:
(:Artist)-[:similar_to {score: <float>}]->(:Artist)
Now what I want to do is get the 5 [similar_to] relations with the highest scores for each artist.
I've tried using Neo4j's collect() function to collect all the artists into a list and then using UNWIND to iterate over it. Sadly, the LIMIT clause seems to limit the total number of returned records, not the number of records per iteration.
Any help would be appreciated.
Thanks in advance
To get the 5 relationships with the highest scores per artist, this should do it:
MATCH (n:Artist)-[r:similar_to]->(:Artist)
// order the relationship rows by score before aggregating
WITH n, r
ORDER BY r.score DESC
// collect() preserves row order, so the slice keeps the top 5 per artist
RETURN n, COLLECT(r)[..5] AS relsWithHighestScores
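If you want the per-row LIMIT the question asks about more literally, Neo4j 4.1+ also supports CALL subqueries, which apply LIMIT once per incoming row. A hedged alternative sketch:

MATCH (n:Artist)
CALL {
  // the subquery runs once per artist, so LIMIT applies per artist
  WITH n
  MATCH (n)-[r:similar_to]->(:Artist)
  RETURN r
  ORDER BY r.score DESC
  LIMIT 5
}
RETURN n, collect(r) AS relsWithHighestScores

Both forms should return the same rows; the COLLECT(r)[..5] version avoids the subquery machinery, while the CALL version states the per-artist limit explicitly.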
Considering the following query:
SELECT TOP 1 * FROM c
WHERE c.Type = 'Case'
AND c.Entity.SomeField = @someValue
AND c.Entity.CreatedTimeUtc > @someTime
ORDER BY c.Entity.CreatedTimeUtc DESC
Until recently, when I ran this query, the number of documents processed by the query (RetrievedDocumentCount in the query metrics) was the number of documents satisfying the first two conditions, regardless of the CreatedTimeUtc filter or the TOP 1.
Only when I added a composite index of (Type DESC, Entity.SomeField DESC, Entity.CreatedTimeUtc DESC) and added those paths to the ORDER BY clause did the retrieved document count drop to the number of documents satisfying all 3 conditions (still not one document as expected, but better).
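For reference, a composite index like the one described would sit in the container's indexing policy roughly as follows (a sketch; the paths are inferred from the query above):

{
  "compositeIndexes": [
    [
      { "path": "/Type", "order": "descending" },
      { "path": "/Entity/SomeField", "order": "descending" },
      { "path": "/Entity/CreatedTimeUtc", "order": "descending" }
    ]
  ]
}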
Then, starting a few days ago, we noticed in our dev environment that the composite index is no longer needed, as the retrieved document count changed to only one document (= the number in the TOP, as expected) and the RU charge dropped significantly.
My question: is this a new improvement/fix in Cosmos DB? I couldn't find any announcement/documentation on this matter.
If so, is the roll-out complete or still in progress? We have several production instances in different regions.
Thanks
There have not been any recent changes to our query engine that would explain why this query is suddenly less expensive.
The only thing that would explain this is that fewer results match the filter than before, and that our query engine was able to perform an optimization it would not otherwise have been able to do with a larger set of results.
Thanks.