Kusto query cancelled due to appended alias statements - azure-data-explorer

I wrote a Kusto query that runs in under 1 minute; however, the same query gets cancelled for a colleague because it runs 10x longer. I collected some metadata about both query runs using .show queries and found that he had the same ClientRequestProperties, with even fewer cache misses; however, there were 400+ alias database statements added to the start of his query text. Is there some connection or user setting that could be causing these alias statements to be added?

The alias statements themselves (400+) should not have a negative impact. However, if these statements point to other clusters, the query may turn into one that is sent to many clusters, which is not the same as the first query.
The next step should be to ensure that the two queries you are comparing are semantically identical. If they are, please open a support ticket so that the team can investigate.
If you need help evaluating the semantics of the queries, please edit your question and provide a sample, simplified, and anonymized query that will allow better analysis of the issue.
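As a rough way to check this yourself, the query text returned by .show queries can be split into its leading alias statements and the remaining body. The following is only a sketch: it assumes the statements have the form alias database Name = cluster("...").database("..."); and the function names and cluster URLs used with it are made up for illustration.

```python
import re

# Assumed shape of one prepended statement:
#   alias database <Name> = cluster("<uri>").database("<db>");
ALIAS_RE = re.compile(
    r"""\s*alias\s+database\s+(?P<alias>\w+)\s*=\s*"""
    r"""cluster\(\s*["'](?P<cluster>[^"']+)["']\s*\)\s*\.\s*"""
    r"""database\(\s*["'](?P<db>[^"']+)["']\s*\)\s*;""",
    re.IGNORECASE,
)


def split_aliases(query_text):
    """Return (aliases, body): the leading alias statements and the rest of the query."""
    aliases = []
    pos = 0
    while True:
        match = ALIAS_RE.match(query_text, pos)
        if match is None:
            break
        aliases.append((match.group("alias"), match.group("cluster"), match.group("db")))
        pos = match.end()
    return aliases, query_text[pos:].strip()


def distinct_clusters(aliases):
    """List the distinct clusters that the alias statements point to."""
    return sorted({cluster.lower() for _, cluster, _ in aliases})
```

If the bodies of the two runs are identical and distinct_clusters reports a single cluster, the alias statements are probably not what changes the query; if it reports several clusters, the colleague's query may be fanning out to them.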

Can you get a log file of 'reads' on specific RECID(Tablename) in Progress-4GL/Openedge at RunTime without access to Source Code?

I want to know which tables are being read by a query.
for each Customer where CustomerID = 12345.
Eventually this customer will be found in the example above, but Progress must 'read' many tables before getting to customer 12345.
How do I know exactly which tables are read (By CustomerID), prior to getting to customer 12345?
*NOTE: I do not have access to modify the code being run for this selection. Ideally I would run a separate set of code that is executed at the same time as the customer query above to track the reads.
EDIT: More clearly - Can you track reads from a given program (.p) OR ProcessID and output either a RECID or the PrimaryKey to a file?
I understand the information is being read off the Disk and probably stored in a database buffer. So how would I get at the information in the database buffer?
You seem to be mixing up a few different things.
In a situation like your example where you FIND a specific record in one, and only one table then there is just a single record read. Progress will find that record by first scanning a relevant index. That might be 2 or 3 "logical reads" of the b-tree to get to the proper node. The record block and index blocks may, or may not be read from disk - that depends on what has happened previously.
There are "Virtual System Tables" available that can tell you how many READ operations take place against a particular table or index. But they do not trace the specific ROWID or other identifying data. _TableStat and _IndexStat are aggregates for all users on the system, _UserTableStat and _UserIndexStat are specific to a particular user's activity. You do need to set the -tablerangesize and -indexrangesize parameters adequately to take advantage of these.
If you have enabled the table and index statistics then you can use a tool like ProTop - http://protop.wss.com to get insight into this activity. Or you can write your own code.
OpenEdge Auditing does not track reads. That would be prohibitively expensive.
It's probably not really a good idea but, in theory, you could write FIND triggers for the tables you are interested in. That doesn't require access to the application source but you would need a development license. It will probably kill performance to do this though - so unless this is a non-production test environment that you just want to fiddle with I wouldn't really do that.
You mention wanting to know how you got to that point. That sounds more like you might need to have a "4gl trace". One easy way to get the stack trace of a running process is to execute:
$DLC/bin/proGetStack PID (UNIX)
or
%DLC%\bin\proGetStack PID (Windows)
This command will generate a "protrace.pid" file containing a 4gl stack trace and other interesting information.
There are also more complicated ways to get that info like using PROMON and the "client statement cache" or setting various log entry types at session startup. But proGetStack is pretty convenient and requires no code or scripting changes.
Some great options from Tom above. And all of them may be relevant to you. The option he only skirts around is the logging options. I feel obliged to expand on this because I'm giving a talk on it in a couple of weeks!
Assuming you are running a modern version of Progress, or even 10.2B08, then you have client logging available to you. Start your session with these additional options:
-clientlog "\somefolder\somefile.txt"
-logentrytypes "QryInfo:3"
This will log all the info of all the queries in your session to the file you specified above. If you navigate to the point in the system where you want to analyse your query and empty the logfile and save it, you can then run the offending query and see all the detail you need.
The output tells you all sorts of useful info, including the number of reads on each table, compared with the number returned to the user. You also get the index selected.
Using Tom's advice and/or this will get you what you need.

Riak: are my 2is broken?

We're seeing some weird things happening with a cleanup cronjob and Riak:
The objects we store (postboxes) have a 2i for the modification date (which is a Unix timestamp).
There's a cronjob that runs frequently and deletes all postboxes that have not been modified within 180 days. However, we've found evidence that some (very few) postboxes that were modified in the last three days were deleted by this cronjob.
After reviewing and debugging every line of code several times over, I am confident that this is not a problem with the cronjob.
I also traced back all delete calls to that bucket - and no one else is deleting objects there.
Of course I also checked with Riak to read the postboxes with r=ALL: they're definitely gone. (and they are stored with w=QUORUM)
I also checked the logs: updating the post boxes did succeed (there were no errors reported back from the write operations)
This leaves me with two possible causes for this:
riak loses data (which I am not willing to believe that easily)
the secondary indexes are corrupt and queries to them return wrong keys
So my questions are:
Can 2is actually break?
Is it possible to verify that?
Am I missing something completely different?
Cheers,
Matthias
Secondary index queries in Riak are coverage queries, which means that they will only use one of the stored replicas, and not perform a quorum read.
As you are writing with w=QUORUM, and if you have n_val set to 3 or higher, the operation can be deemed successful even though one (or more) of the replicas did not get updated. If that stale replica is the one selected for the coverage query, you could end up deleting based on the old value. To avoid this, you will need to perform updates with w=ALL.
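The failure mode can be illustrated with a toy model. This is not the Riak client API, only a sketch: it assumes three replicas of one postbox, treats each replica as holding just the indexed timestamp, and ignores mechanisms such as hinted handoff and read repair that would eventually bring the stale replica up to date.

```python
# Toy model only: each list entry is the indexed modification timestamp
# that one replica currently holds for the same postbox.
N_VAL = 3
QUORUM = N_VAL // 2 + 1  # 2 out of 3


def write(replicas, value, reachable, w):
    """Apply the write to the reachable replicas; report success if at least w acknowledged."""
    for i in reachable:
        replicas[i] = value
    return len(reachable) >= w


def coverage_query(replicas, chosen, cutoff):
    """A 2i range query consults a single replica (chosen); there is no quorum read."""
    return replicas[chosen] < cutoff
```

With replicas [100, 100, 100], a write of 900 that reaches only replicas 0 and 1 succeeds under w=QUORUM, yet a coverage query that happens to use replica 2 still sees the old timestamp and reports the postbox as expired. Under w=ALL the same write is reported as failed, so the caller knows to retry.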

Black box testing a remote DICOM Q/R server

I am wondering if anyone has ever tried to work on the following issue. I need to execute a series of tests on a remote DICOM Q/R server. This would allow some easy DICOM Conformance Statement checking.
As an implementation detail of the test suite, I am running the following (DCMTK style command):
$ findscu --study --cancel 1 --key 0020,0010=* --key 8,52=STUDY --aetitle MINE --call THEIR dicom.example.com 11112
The goal here is to find a valid StudyID (later on I'll use that StudyID to execute lower key level C-FIND, and some related C-MOVE queries). Of course it would be much easier if I could upload my own dataset and try to fetch it back, but I cannot do that against a running PACS in a clinical environment. I need to define with a minimal number of queries how to find a valid StudyID.
However, I fear that some DICOM implementations may have policies where querying the entire database is forbidden.
So I was wondering if anyone has written a list of those policies, and maybe described a way to retrieve a valid StudyID from a remote server with a minimal number of C-FIND queries.
I think I may simply go with:
TODAY=`date +"%Y%m%d"`
findscu --study --key 0008,0020="$TODAY-" --key 0020,0010=* --key 8,52=STUDY --aetitle MINE --call THEIR dicom.example.com 11112
If this does not work (returns empty), I'll check yesterday's results.
Welcome to DICOM-wonderland.
You are right that you should be very, very, very, very careful to run just random queries on a clinical PACS. I've seen commercial PACses send their whole(!) database as a result of a query which it did not understand. Not a pretty sight. This (and privacy) is one of the reasons that in a lot of hospitals around the world PACS admins are very afraid of giving direct access to their PACS via DICOM.
In general I would say that standardization is not going to help you. So you have to find something which works for you, and which will not bring the PACS down. No guarantees here.
Just a list of observations from querying PACSes in hospitals:
Some are case sensitive in their matching, some are not.
Most support some kind of wildcard. This is normally a '*', but I've also seen '%' (since that is a SQL wildcard, and the query is just passed on as a SQL string). This is not well-defined, I think.
The list you will get back might be limited to say the first 500 entries. Or 1000. Or random number between 500 and 1000. Or the whole PACS. You just don't know.
DICOM and cancellation do not play well together. Cancelling a query is not implemented well. Normally a PACS sees it as a failed transfer, and will retry after some time. And the retry queue is limited in size, so it might ignore new queries. So always keep your STORE-SCP server running to drain this queue.
Sometimes queries take minutes, especially for retrieve. The next time it might have been retrieved (from tape?) and be fast for a while.
A DICOM query may take a lot of resources from the PACS, depending on the PACS. Don't be surprised if a PACS admin shows up if you experiment a little too much.
The queries supported differ very much. Only basic queries are supported by all: list of patients, list of studyID/study instanceuid for patients, list of series per study, retrieve study or series. Unless you get a funky research department which uses Osirix, which does not support patient-level queries but only study-level-queries.
So what I would advise if you want to have something working on any random PACS:
Use empty-return-key instead of '*'. This is the DICOM way to retrieve information.
Do not use '--cancel'. If you really need to cancel, just close the TCP connection (not supported in DCMTK).
Use a query on PatientId, PatientName, Birthdate, StudyDate to get a list of StudyIDs/StudyInstanceUids.
The simplest approach is to use a fixed StudyID, assuming that it stays in the PACS long enough. If not, think of a limiting query so as not to overload the PACS (your 'TODAY' suggestion fits that description).
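Putting the advice above together, a test harness could build the findscu command line along these lines. This is only a sketch: the helper names are invented, the AE titles, host and port are the placeholders from the question, and it assumes findscu accepts a --key given without a value as an empty return key. The query is limited to a single StudyDate and does not use --cancel.

```python
from datetime import date, timedelta


def build_findscu_args(study_date, aetitle, called, host, port):
    """Build a study-level C-FIND command line restricted to one StudyDate."""
    return [
        "findscu", "--study",
        "--key", "0008,0052=STUDY",  # QueryRetrieveLevel
        "--key", "0008,0020=" + study_date.strftime("%Y%m%d"),  # StudyDate, one day only
        "--key", "0020,0010",  # StudyID as an empty return key (assumed syntax)
        "--key", "0020,000D",  # StudyInstanceUID as an empty return key (assumed syntax)
        "--aetitle", aetitle,
        "--call", called,
        host, str(port),
    ]


def candidate_dates(today, days_back):
    """Today first, then earlier days, so the search can stop at the first hit."""
    return [today - timedelta(days=n) for n in range(days_back + 1)]
```

Each list can be handed to subprocess.run one day at a time, stopping as soon as a response contains a StudyID, which keeps the number of C-FIND queries small.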
Good luck!

Any SQL Server multiple-recordset stored procedure gotchas?

Context
My current project is a large-ish public site (2 million pageviews per day) running a mixture of ASP classic and ASP.NET with a SQL Server 2005 back-end. We're heavy on reads, with occasional writes and virtually no updates/deletes. Our pages typically concern a single 'master' object with a stack of dependent (detail) objects.
I like the idea of returning all the data required for a page in a single proc (and absolutely no unnecessary data). True, this requires a dedicated proc for such pages, but some pages receive double-digit percentages of our overall site traffic, so it's worth the time/maintenance hit. We typically only consume multiple recordsets from our .NET code, using System.Data.SqlClient.SqlDataReader and its NextResult method. Oh, yeah, I'm not doing any updates/inserts in these procs either (except to table variables).
The question
SQL Server (2005) procs which return multiple recordsets are working well (in prod) for us so far, but I am a little worried that multi-recordset procs are my new favourite hammer that I'm hitting every problem (nail) with. Are there any multi-recordset SQL Server proc gotchas I should know about? Anything that's going to make me wish I hadn't used them? Specifically, anything about them affecting connection pooling, memory utilization, etc.
Here are a few gotchas for multiple-recordset stored procs:
They make it more difficult to reuse code. If you're doing several queries, odds are you'd be able to reuse one of those queries on another page.
They make it more difficult to unit test. Every time you make a change to one of the queries, you have to test all of the results. If something changed, you have to dig through to see which query failed the unit test.
They make it more difficult to tune performance later. If another DBA comes in behind you to help performance improve, they have to do more slicing and dicing to figure out where the problems are coming from. Then, combine this with the code reuse problem - if they optimize one query, that query might be used in several different stored procs, and then they have to go fix all of them - which makes for more unit testing again.
They make error handling much more difficult. Four of the queries in the stored proc might succeed, and the fifth fails. You have to plan for that.
They can increase locking problems and incur load in TempDB. If your stored procs are designed in a way that need repeatable reads, then the more queries you stuff into a stored proc, the longer it's going to take to run, and the longer it's going to take to return those results back to your app server. That increased time means higher contention for locks, and the more SQL Server has to store in TempDB for row versioning. You mentioned that you're heavy on reads, so this particular issue shouldn't be too bad for you, but you want to be aware of it before you reuse this hammer on a write-intensive app.
I think multi-recordset stored procedures are great in some cases, and it sounds like yours may be one of them.
The bigger your site gets (the more traffic), the more that 'extra' bit of performance is going to matter. If you can combine 2-3-4 calls to the database (and possibly new connections) into one, then at 2 million pageviews per day you could be cutting your database hits from 4-6-8 million per day down to about 2 million, which is substantial.
I use them sparingly, but when I have, I have never had a problem.
I would recommend having one stored procedure invoke several inner stored procedures that each return one result set.
create proc foo
as
execute foobar --returns one result
execute barfoo --returns one result
execute bar --returns one result
That way, when requirements change and you only need the 3rd and 5th result sets, you have an easy way to invoke them without adding new stored procedures and regenerating your data access layer. My current app returns all reference tables (e.g. the US states table) whether I want them or not. The worst case is when you need a reference table and the only access is via a stored procedure that also runs an expensive query as one of its six result sets.

Profiling SQL Server and/or ASP.NET

How would one go about profiling a few queries that are being run from an ASP.NET application? There is some software where I work that runs extremely slow because of the database (I think). The tables have indexes but it still drags because it's working with so much data. How can I profile to see where I can make a few minor improvements that will hopefully lead to larger speed improvements?
Edit: I'd like to add that the webserver likes to timeout during these long queries.
SQL Server has some excellent tools to help you with this situation. These tools are built into Management Studio (which used to be called Enterprise Manager + Query Analyzer).
Use SQL Profiler to show you the actual queries coming from the web application.
Copy each of the problem queries out (the ones that eat up lots of CPU time or IO). Run the queries with "Display Actual Execution Plan". Hopefully you will see some obvious index that is missing.
You can also run the tuning wizard (the button is right next to "Display Actual Execution Plan"). It will run the query and make suggestions.
Usually, if you already have indexes and queries are still running slow, you will need to re-write the queries in a different way.
Keeping all of your queries in stored procedures makes this job much easier.
To profile SQL Server, use the SQL Profiler.
And you can use ANTS Profiler from Red Gate to profile your code.
Another .NET profiler which plays nicely with ASP.NET is dotTrace. I have personally used it and found lots of bottlenecks in my code.
I believe you have the answer you need to profile the queries. However, this is the easiest part of performance tuning. Once you know it is the queries and not the network or the app, how do you find and fix the problem?
Performance tuning is a complex thing, but there are some places to look at first. You say you are returning lots of data? Are you returning more data than you need? Are you really returning only the columns and records you need? Returning 100 columns by using select * can be much slower than returning the 5 columns you are actually using.
Are your indexes and statistics up to date? Look up how to update statistics and re-index in BOL if you haven't done this in a while. Do you have indexes on all the join fields? How about the fields in the where clause?
Have you used a cursor? Have you used subqueries? How about union: if you are using it, can it be changed to union all?
Are your queries sargable? (Google it if you are unfamiliar with the term.)
Are you using distinct when you could use group by?
Are you getting locks?
There are many other things to look at; these are just a starting place.
If there is a particular query or stored procedure I want to tune, I have found turning on statistics before the query to be very useful:
SET STATISTICS TIME ON
SET STATISTICS IO ON
When you turn on statistics in Query Analyzer, the statistics are shown in the Messages tab of the Results pane.
IO statistics have been particularly useful for me, because they let me know if I might need an index. If I see a high read count from the IO statistics, I might try adding different indexes to the affected tables. As I try an index, I run the query again to see if the read count has gone down. After a few iterations, I can usually find the best index(es) for the tables involved.
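When iterating like this, it can help to compare the read counts mechanically rather than by eye. The sketch below is a hypothetical helper, not part of any SQL Server tooling: it assumes the STATISTICS IO lines were copied from the Messages tab in the usual "Table 'X'. Scan count N, logical reads N, ..." form, and the table names in any sample input are made up.

```python
import re

# One STATISTICS IO message line per table, e.g.
#   Table 'Orders'. Scan count 1, logical reads 5230, physical reads 0, ...
IO_LINE_RE = re.compile(
    r"Table '(?P<table>[^']+)'\. Scan count (?P<scans>\d+), logical reads (?P<logical>\d+)"
)


def logical_reads(messages):
    """Sum the logical reads per table found in the copied message text."""
    totals = {}
    for match in IO_LINE_RE.finditer(messages):
        table = match.group("table")
        totals[table] = totals.get(table, 0) + int(match.group("logical"))
    return totals


def compare(before, after):
    """Change in logical reads per table; negative means fewer reads after the change."""
    tables = sorted(set(before) | set(after))
    return {t: after.get(t, 0) - before.get(t, 0) for t in tables}
```

Passing the messages captured before and after adding an index through logical_reads and then compare shows per table whether the read count went down.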
Here are links to MSDN for these statistics commands:
SET STATISTICS TIME
SET STATISTICS IO
