How to conditionally force fail a query in Kusto - azure-data-explorer

How do I force a query to actually fail in Kusto, depending on a certain condition? Specifically, the exact case where I need to force a failure is when the query returns a count of 0.
MyTable | count | where Count==0 ... the query should fail
I am looking for an actual technical failure, not just nulls etc. Basically, if a certain query returns a count of 0, I want the query to fail so that the corresponding Web API call also gets an appropriate failure return code.

Can you check if the assert() function helps your scenario?
https://learn.microsoft.com/en-us/azure/kusto/query/assert-function
let Count = toscalar(
    range x from 1 to 1 step 1 | count
);
print assert(Count != 0, "Count must be non-zero")
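Applied to the scenario in the question, the same pattern would look like this (a sketch; MyTable stands in for your actual table). assert() fails the whole query with an error when its condition is false, which should surface as a failure code to the calling Web API:
let Count = toscalar(MyTable | count);
print assert(Count != 0, "Count must be non-zero")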

Related

Slow query on table | WHERE x | ORDER by timestamp | DISTINCT a,b,c,d | TAKE 20 when table large

We are experiencing a sudden performance drop with a query structured like this:
table(tablename)
| where MeasurementName in ('ActiveJobId')
and MachineId == machineId
and SourceTimestamp <= from
and isnotnull(Value)
| order by SourceTimestamp desc
| distinct SourceTimestamp, MeasurementName, tostring(Value), SourceTimestampUtc
| take rows
tablename, machineId, from, and rows are all query parameters; rows is typically 20. The Value column is of type dynamic.
The table contains 240 million entries, with about 64,000 matching the WHERE criteria. The goal of the query is to get the last 20 UNIQUE, non-empty entries for a given machine and data point, starting after a specific date.
The query runs smoothly in the Staging database system, but performance started to degrade on the Dev system, possibly because of the increased data volume.
If we remove the distinct clause, or move it behind the take clause, the query completes very fast (<1 s). The data contains about 5-10% duplicate entries.
To our understanding, the query should be executed like this:
Prepare a filter for the source table, starting at a specific datetime
Order descending: walk backwards
Walk down the table and stop once 20 distinct rows have been collected
From the time it sometimes takes it looks almost as if ADX walks down the whole table, performs a distinct, and then only takes the topmost 20 rows.
The problem persists if we swap | order and | distinct around.
The problem disappears if we move | distinct to the end of the query, but then we often receive 1-2 items fewer than required.
Is there a logical error we make, can this query be rewritten, or are there better options at hand?
The goal of the query is to get the last 20 UNIQUE, non-empty entries for a given machine and data point, starting after a specific date.
This part of the description doesn't match the filter in your query: and SourceTimestamp <= from - did you mean to use >= instead of <= ?
Is there a logical error we make, can this query be rewritten, or are there better options at hand?
If you can't eliminate the duplicates upstream, you can consider setting up a materialized view that performs the deduplication, then query the view directly instead of the raw data. Also see Handle duplicate data.
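For illustration, a minimal sketch of such a view (Measurements and DedupedMeasurements are hypothetical names; take_any(*) keeps one arbitrary row per key, which is all deduplication needs here):
.create materialized-view DedupedMeasurements on table Measurements
{
    Measurements
    | summarize take_any(*) by MachineId, MeasurementName, SourceTimestamp
}
The query can then read from the view, drop the distinct, and fold the sort and limit into top (using >= per the note above):
DedupedMeasurements
| where MeasurementName in ('ActiveJobId')
    and MachineId == machineId
    and SourceTimestamp >= from
    and isnotnull(Value)
| top 20 by SourceTimestamp desc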

How to show records according to time and whose status is less than 2 in GQL

I am new to GQL and facing a problem while fetching records from Cloud Datastore. I want to show records ordered by the time they were saved, limited to those whose status is less than 2 (i.e. 0 or 1). Users whose details were saved most recently should come on top of the listing, followed by the others. While saving the details I also store a timestamp. Here is my query to retrieve the details whose status is 0 (which works fine), but I want to retrieve details whose status is either 0 or 1.
NOTE: The status datatype is string in the Datastore.
According to the GQL rules we can't use the OR operator. So what would be the solution for this? Does anyone know?
SELECT * FROM Users WHERE __key__ HAS ANCESTOR KEY(UsersDetails, 9623495224) AND status = '0' ORDER BY sent_on desc
Instead of using OR you can do the following workaround:
SELECT * FROM Users WHERE __key__ HAS ANCESTOR KEY(UsersDetails, 9623495224) AND status >= 0 AND status <= 1 ORDER BY sent_on desc
Note that for this to work you should store the status values as integers; you can check the reference here. Note also that Datastore requires a property used in an inequality filter to appear first in the sort order, so you may need to write ORDER BY status, sent_on DESC.

DynamoDB Last Evaluated Key Expiration?

My application ingests data from a 3rd party REST API which is backed by DynamoDB. The results are paginated and thus I page forward by passing the last evaluated key to each subsequent request.
My question is does the last evaluated key have a shelf life? Does it ever expire?
Let's say I query the REST API and then decide to stop. If I save the last evaluated key, can I pick up exactly where I left off 30 days later? Would that last evaluated key still work and return the correct next page based on where I left off previously?
You shouldn't think of the last evaluated key like a "placeholder" or a "bookmark" in a result set from which to resume paused iteration.
You should think of it more like a "start from" place marker. An example might help. Let's say you have a table with a hash key userId and a range key timestamp. The range key timestamp will provide an ordering for your result set. Say your table looked like this:
userId | timestamp
1      | 123
1      | 124
1      | 125
1      | 126
With the data in this order, when you query the table for all of the records for userId 1, you'll get the records back in the order they're listed above, i.e. ascending order by timestamp. If you wanted them back in descending order, you'd use DynamoDB's scanIndexForward flag to order them "newest to oldest", i.e. descending by timestamp.
Now, suppose there were a lot more than 4 items in the table and it would take multiple queries to return all of the records with a userId of 1. You wouldn't want to keep getting pages and pages back, so you can tell DynamoDB where to start by giving it the last evaluated key. Say the last result of the previous query was the record with userId = 1 and timestamp = 124. You tell DynamoDB in your query that that was the last record you got, and it will start your next result set with the record that has userId = 1 and timestamp = 125.
So the last evaluated key isn't something that "expires"; it's a way for you to communicate to DynamoDB which records you want returned, based on the records you've already processed, displayed to the user, etc.
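For illustration, a minimal boto3 sketch (the table name Events is hypothetical; key names follow the example above). Because LastEvaluatedKey is just a copy of the key attributes of the last item read, you can store it and reuse it 30 days later:
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Events")  # hypothetical table name

def query_page(user_id, start_key=None):
    # ScanIndexForward=False returns items newest-to-oldest by the range key.
    kwargs = {
        "KeyConditionExpression": Key("userId").eq(user_id),
        "ScanIndexForward": False,
        "Limit": 20,
    }
    # ExclusiveStartKey is the "start from" marker described above: the key
    # of the last item already processed, not a server-side cursor.
    if start_key is not None:
        kwargs["ExclusiveStartKey"] = start_key
    resp = table.query(**kwargs)
    # LastEvaluatedKey is absent on the final page.
    return resp["Items"], resp.get("LastEvaluatedKey")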

sqlite: insert or update a row, performance issue

sqlite table:
CREATE TABLE IF NOT EXISTS INFO
(
uri TEXT PRIMARY KEY,
cap INTEGER,
/* some other columns */
uid TEXT
);
The INFO table has 5K+ rows and runs on a low-power device (comparable to a 3-year-old mobile phone).
I have this task: insert a new URI with some values into the INFO table; however, if the URI is already present, I need to update the uid text field by appending extra text to it, but only if that extra text isn't already found within the existing uid string. All other fields should remain unchanged.
As an example: INFO already contains uri="http://example.com" with this uid string: "|123||abc||xxx|".
I need to add uri="http://example.com" with uid="|abc|". Since "|abc|" is already a substring of the existing uid for that uri, nothing should be updated. In any case, the remaining fields shouldn't be updated.
To get this done I have these options:
Build a single SQL query (if something like that is even possible with sqlite in one SQL statement), or
Do everything manually in two steps: a) retrieve the row's uid and do all processing in code, then b) update the existing row or insert a new one as needed.
Considering this is a constrained device, which way is preferable? What if I omit the extra requirement of the substring match and always append uid to the existing uid field?
"If it is possible with sqlite in one sql statement":
Yes, it is possible. The "UPSERT" statement has been nicely discussed in this question.
Applied to your extended case, you could do it like this in one statement:
insert or replace into info (uri, cap, uid)
values (
    'http://example.com',
    coalesce((select cap from info where uri = 'http://example.com'), 'new cap'),
    (select case
                when (select uid from info where uri = 'http://example.com') is null
                    then '|abc|'
                when instr((select uid from info where uri = 'http://example.com'), '|abc|') = 0
                    then (select uid from info where uri = 'http://example.com') || '|abc|'
                else (select uid from info where uri = 'http://example.com')
            end)
);
Checking the EXPLAIN QUERY PLAN gives us
selectid order from detail
---------- ---------- ---------- -------------------------
0 0 0 EXECUTE SCALAR SUBQUERY 0
0 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
0 0 0 EXECUTE SCALAR SUBQUERY 1
1 0 0 EXECUTE SCALAR SUBQUERY 2
2 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 3
3 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 4
4 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 5
5 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
As far as I know, sqlite will not cache the results of scalar sub-queries (I could not find any evidence of caching when looking at the VM code produced by EXPLAIN for the above statement). Hence, since sqlite is an in-process db, doing things with two separate statements will most likely perform better than this.
You might want to benchmark the runtimes for this - results will of course depend on your host language and the interface you use (C, JDBC, ODBC, ...).
EDIT
A little performance benchmark using the JDBC driver and sqlite 3.7.2, running 100,000 modifications on a base data set of 5,000 rows in table info (50% updates, 50% inserts), confirms the above conclusion:
Using three prepared statements (first a select, then an update or insert depending on the selected data): 702 ms
Using the combined statement above: 1802 ms
The runtimes are quite stable across several runs.
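As a side note: the benchmark above used sqlite 3.7.2, but if you can use SQLite 3.24 or newer, the native UPSERT clause can express the same logic in one statement without the repeated scalar subqueries (a sketch, not benchmarked here; 0 is a placeholder cap for the insert case):
insert into info (uri, cap, uid)
values ('http://example.com', 0, '|abc|')
on conflict(uri) do update set
    -- append only when the new uid isn't already a substring;
    -- no other columns are touched on conflict
    uid = case
              when instr(uid, excluded.uid) = 0 then uid || excluded.uid
              else uid
          end;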

How to get SimpleDB Query Count with Boto and SDBManager

I would like to query my SimpleDB domain to get the count of records that match a certain criteria. Something that could be done like this:
rs = appsDomain.select("SELECT count(*) FROM %s WHERE (%s='%s' or %s='%s') and %s!='%s'" % (APPS_SDBDOMAIN, XML_APPNODE_NAME_ATTR, appName, XML_APPNODE_RESERVED_NAME_ATTR, appName, XML_EMAIL_NODE, thisSession.email), None, True)
After doing some reading I have found that getting a query count from SimpleDB via the SDBManager count method might be more efficient than a straightforward count(*)-style query. Further, I would love not to have to loop over a result set when I know there is only one row and one column that I need, yet I would also want to avoid this:
count = int(rs.iter().next()['Count'])
Is it true that SDBManager is more efficient? Is there a better way?
If SDBManager is the best way can anyone show me how to use it as I have been thoroughly unsuccessful?
Thanks in advance!
Well, I stopped being lazy and simply went to the source to get my answer
(FROM: boto-2.6.0-py2.7.egg/boto/sdb/db/manager/sdbmanager.py)
def count(self, cls, filters, quick=True, sort_by=None, select=None):
    """
    Get the number of results that would
    be returned in this query
    """
    query = "select count(*) from `%s` %s" % (self.domain.name, self._build_filter_part(cls, filters, sort_by, select))
    count = 0
    for row in self.domain.select(query):
        count += int(row['Count'])
        if quick:
            return count
    return count
As you can see, the sdbmanager.count method does nothing special, and in fact does exactly what I was hoping to avoid, which is looping over a result set just to get the 'Count' value(s).
So in the end I will probably just implement this method myself, as using the SDBManager actually implies a lot more overhead which, in my case, is not worth it.
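For reference, a minimal standalone version of that idea (a sketch built on the boto source above; count_matching is a hypothetical helper, and appsDomain and the WHERE clause come from the question):
def count_matching(domain, where_clause):
    # SimpleDB may split a count(*) result across several rows for large
    # domains, so sum the 'Count' attribute over all returned rows
    # (the same thing sdbmanager.count does above).
    query = "select count(*) from `%s` where %s" % (domain.name, where_clause)
    return sum(int(row['Count']) for row in domain.select(query))

# e.g. count_matching(appsDomain, "%s='%s'" % (XML_APPNODE_NAME_ATTR, appName))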
Thanks!
