Convert Rowset variables to scalar value - u-sql

Is it possible to convert a rowset variable to a scalar value? For example:
@maxKnownId =
SELECT MAX(Id) AS maxID
FROM @PrevDayLog;
DECLARE @max int = @maxKnownId;

There is no implicit conversion of a single-cell rowset to a scalar value in U-SQL (yet).
What are you interested in using the value for?
Most of the time you can write your U-SQL expression in a way that you do not need the scalar variable. E.g., if you want to use the value in a condition in another query, you could just use the single value rowset in a join with the other query (and with the right statistics, I am pretty sure that the optimizer would turn it into a broadcast join).
If you feel you cannot easily write the expression without converting the rowset to a scalar, please let us know your scenario via http://aka.ms/adlfeedback.

Thanks for the input. Below is the business case:
We have catalog data coming from a source, and we need to generate unique Ids for it. The ROW_NUMBER() OVER() AS Id method generates unique Ids, but when new records are merged in, it also changes the Ids of existing records, which breaks the relational data that references them.
Below is a simple solution:
// get the max Id from the existing catalog
@maxId =
SELECT (int)MAX(Id) AS lastId
FROM @ExistingCat;
// @maxId is a rowset, not a scalar, so CROSS JOIN it to repeat lastId on every new record.
// ROW_NUMBER() always starts at 1, so lastId + ROW_NUMBER() generates the next Ids.
@newRecordsWithId =
SELECT (int)lastId + (int)ROW_NUMBER() OVER() AS Id,
CatalogItemName
FROM @newRecords CROSS JOIN @maxId;
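As a runnable illustration of the same "max Id + ROW_NUMBER() over a CROSS JOIN" idea, here is a minimal sketch in SQLite driven from Python's standard `sqlite3` module. It assumes SQLite 3.25 or later (for `ROW_NUMBER()`); the table names mirror the U-SQL example but are otherwise hypothetical. Two deliberate deviations from the original: `COALESCE(MAX(Id), 0)` lets an empty catalog start at 1, and `OVER (ORDER BY CatalogItemName)` makes the numbering deterministic.

```python
import sqlite3


def assign_new_ids(existing, new_names):
    """Number new catalog items after the current max Id (needs SQLite 3.25+).

    The one-row CTE plays the role of the U-SQL @maxId rowset; CROSS JOIN
    repeats lastId on every new record, so existing Ids never change.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ExistingCat (Id INTEGER, CatalogItemName TEXT)")
    conn.execute("CREATE TABLE NewRecords (CatalogItemName TEXT)")
    conn.executemany("INSERT INTO ExistingCat VALUES (?, ?)", existing)
    conn.executemany("INSERT INTO NewRecords VALUES (?)", [(n,) for n in new_names])
    rows = conn.execute(
        """
        WITH maxId AS (
            -- single-cell "rowset"; COALESCE makes an empty catalog start at 1
            SELECT COALESCE(MAX(Id), 0) AS lastId FROM ExistingCat
        )
        SELECT lastId + ROW_NUMBER() OVER (ORDER BY CatalogItemName) AS Id,
               CatalogItemName
        FROM NewRecords CROSS JOIN maxId
        ORDER BY Id
        """
    ).fetchall()
    conn.close()
    return rows


print(assign_new_ids([(1, "apple"), (2, "pear")], ["plum", "fig"]))
# [(3, 'fig'), (4, 'plum')]
```

Because the new Ids are offset by the existing maximum, re-running the merge never renumbers rows that are already in the catalog.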

Related

When to create multi-column indices in SQLite?

Assume I have a table in an SQLite database:
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
price INTEGER NOT NULL,
updateTime INTEGER NOT NULL
); -- optionally declared WITHOUT ROWID
What indices should I create to optimize the following query?
SELECT * FROM orders WHERE price > ? ORDER BY updateTime DESC;
Do I create two indices:
CREATE INDEX i_1 ON orders(price);
CREATE INDEX i_2 ON orders(updateTime);
or one composite index?
CREATE INDEX i_3 ON orders(price, updateTime);
What would the time complexity of the query be?
From The SQLite Query Optimizer Overview/WHERE Clause Analysis:
If an index is created using a statement like this:
CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
Then the index might
be used if the initial columns of the index (columns a, b, and so
forth) appear in WHERE clause terms. The initial columns of the index
must be used with the = or IN or IS operators. The right-most column
that is used can employ inequalities.
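To see the prefix rule from the quote in practice, SQLite's `EXPLAIN QUERY PLAN` reports whether and how an index is used. This minimal Python sketch (standard `sqlite3` module, in-memory database, a table `ex1` with columns `a`, `b`, `c` assumed for illustration) checks a query with `=` on the left-most index column and an inequality on the next one. It typically prints a line like `SEARCH ex1 USING COVERING INDEX idx_ex1 (a=? AND b>?)`; the exact wording differs between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex1 (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE INDEX idx_ex1 ON ex1(a, b, c)")

# '=' on the left-most column a, then an inequality on the next column b.
plan = query_plan(conn, "SELECT * FROM ex1 WHERE a = ? AND b > ?", (1, 2))
for line in plan:
    print(line)
```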
As explained also in The SQLite Query Optimizer Overview/The Skip-Scan Optimization with an example:
Because the left-most column of the index does not appear in the WHERE
clause of the query, one is tempted to conclude that the index is not
usable here. However, SQLite is able to use the index.
This means that if you create an index like:
CREATE INDEX idx_orders ON orders(updateTime, price);
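it might be used to optimize the WHERE clause even though updateTime does not appear there. However, skip-scan is only chosen when ANALYZE statistics show that the left-most index column has many duplicate values, which is unlikely for a timestamp such as updateTime.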
Also, from The SQLite Query Optimizer Overview/ORDER BY Optimizations:
SQLite attempts to use an index to satisfy the ORDER BY clause of a
query when possible. When faced with the choice of using an index to
satisfy WHERE clause constraints or satisfying an ORDER BY clause,
SQLite does the same cost analysis described above and chooses the
index that it believes will result in the fastest answer.
Since updateTime is defined first in the composite index, the index may also be used to optimize the ORDER BY clause.
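One way to compare the two candidate indexes for the question's query is to look at SQLite's plan for each. The sketch below (Python `sqlite3`, in-memory database, no `ANALYZE` statistics, so results may differ on real data) builds the `orders` table with one candidate index at a time and prints the plan. Typically, with `(updateTime, price)` SQLite walks the index backwards to satisfy `ORDER BY updateTime DESC` and filters on `price` along the way, whereas with `(price, updateTime)` it can seek the `price > ?` range but then needs a temporary B-tree (`USE TEMP B-TREE FOR ORDER BY`) to sort.

```python
import sqlite3

QUERY = "SELECT * FROM orders WHERE price > ? ORDER BY updateTime DESC"

CANDIDATES = [
    "CREATE INDEX idx_orders ON orders(updateTime, price)",
    "CREATE INDEX i_3 ON orders(price, updateTime)",
]


def plan_for(index_sql):
    """Create orders with a single candidate index and return the plan details."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "price INTEGER NOT NULL, updateTime INTEGER NOT NULL)"
    )
    conn.execute(index_sql)
    # The bound value only fills the placeholder.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (100,))]
    conn.close()
    return plan


for index_sql in CANDIDATES:
    print(index_sql)
    for detail in plan_for(index_sql):
        print("    " + detail)
```

Roughly, the first plan touches every index entry (O(N)) but never sorts, while the second touches only the K matching entries (O(log N + K)) and then sorts them (O(K log K)). Which one wins depends on how selective `price > ?` is; if only the first few rows are needed (for example with `LIMIT`), the sort-free plan can stop early.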

SQLite C API equivalent to typeof(col)

I want to detect column data types of any SELECT query in SQLite.
In the C API, there is const char *sqlite3_column_decltype(sqlite3_stmt*,int) for this purpose. But that only works for columns in a real table. Expressions, such as LOWER('ABC'), or columns from queries like PRAGMA foreign_key_list("mytable"), always return null here.
I know there is also typeof(col), but I don't have control over the fired SQL, so I need a way to extract the data type out of the prepared statement.
You're looking for sqlite3_column_type(), which you call for each column after sqlite3_step() has returned SQLITE_ROW:
The sqlite3_column_type() routine returns the datatype code for the initial data type of the result column. The returned value is one of SQLITE_INTEGER, SQLITE_FLOAT, SQLITE_TEXT, SQLITE_BLOB, or SQLITE_NULL. The return value of sqlite3_column_type() can be used to decide which of the first six interface should be used to extract the column value.
And remember that in SQLite, type is for the most part associated with the value, not the column: different rows can store values of different types in the same column.
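Python's `sqlite3` module does not expose `sqlite3_column_type()` directly, but by default (without `detect_types`) it converts each value according to its storage class, so the Python type of each fetched value carries the same per-row information. This sketch (a hypothetical one-column table `t` declared without a type) stores five values of different types in the same column and compares the Python-side type with `typeof()`:

```python
import sqlite3

# Default sqlite3 conversions, one per SQLite storage class.
STORAGE_CLASS = {
    int: "integer",
    float: "real",
    str: "text",
    bytes: "blob",
    type(None): "null",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v)")  # no declared type, so any storage class is kept
conn.executemany(
    "INSERT INTO t VALUES (?)", [(1,), (2.5,), ("abc",), (b"\x00",), (None,)]
)

rows = conn.execute("SELECT v, typeof(v) FROM t ORDER BY rowid").fetchall()
for value, sql_type in rows:
    # Same column, a different type on every row.
    print(repr(value), STORAGE_CLASS[type(value)], sql_type)
```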

Date difference between two separate rows in SQLite with no ID

I have data in SQLite like this (a few thousand rows):
1536074432|startRecording
1536074434|stopRecording
1536074443|startRecording
1536074447|stopRecording
1536074458|startRecording
1536074462|stopRecording
And I'd like to get the number of seconds that passed between two consecutive distinct events (basically, how many seconds of video I've recorded).
I know about another similar question (Date Difference between consecutive rows), but my case is different because I cannot get the "next" row by ID; I have to get it based on a different event name.
There is an answer that works magic, but it's specific to SQL Server ( Query to find the time difference between successive events ), and I need this for SQLite.
I could do this in Oracle with the LAG / LEAD functions, but I have no idea how to do it in SQLite.
I could also do this with a separate parsing script, but I think it would be more efficient to be able to do this directly from a query.
Even though there is no id in the table, SQLite stores a rowid (from the SQLite CREATE TABLE documentation):
ROWIDs and the INTEGER PRIMARY KEY
Except for WITHOUT ROWID tables, all rows within SQLite tables have a 64-bit signed integer key that uniquely identifies the row within its table. This integer is usually called the "rowid". The rowid value can be accessed using one of the special case-independent names "rowid", "oid", or "_rowid_" in place of a column name. If a table contains a user defined column named "rowid", "oid" or "_rowid_", then that name always refers the explicitly declared column and cannot be used to retrieve the integer rowid value.
Assuming perfectly clean data as described :) how about:
select a.rowid,a.time,a.event,b.rowid,b.time,b.event,b.time - a.time as elapsed --,sum(b.time-a.time)
from t2 a, t2 b
where a.rowid % 2 = 1
and b.rowid = a.rowid + 1
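A minimal check of the query above against the sample data, using Python's `sqlite3` module and assuming a table `t2(time INTEGER, event TEXT)` (the column names are an assumption; the question does not give them). The rowid pairing relies on the rows having been inserted in strict start/stop order starting at rowid 1. As an alternative that does not depend on rowid parity, SQLite 3.25 and later support the `LAG`/`LEAD` window functions the question mentions; the second query pairs each `stopRecording` with the event immediately before it.

```python
import sqlite3

EVENTS = [
    (1536074432, "startRecording"),
    (1536074434, "stopRecording"),
    (1536074443, "startRecording"),
    (1536074447, "stopRecording"),
    (1536074458, "startRecording"),
    (1536074462, "stopRecording"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (time INTEGER, event TEXT)")
conn.executemany("INSERT INTO t2 VALUES (?, ?)", EVENTS)

# rowid self-join: pairs row 1 with 2, 3 with 4, and so on.
pairs = conn.execute("""
    SELECT a.time, b.time, b.time - a.time AS elapsed
    FROM t2 a JOIN t2 b ON b.rowid = a.rowid + 1
    WHERE a.rowid % 2 = 1
    ORDER BY a.rowid
""").fetchall()
total = sum(elapsed for _, _, elapsed in pairs)
print(total)  # 10

# Window-function alternative (SQLite 3.25+): look back one row in time order.
total_lag = conn.execute("""
    SELECT SUM(time - prev_time) FROM (
        SELECT time, event,
               LAG(time)  OVER (ORDER BY time) AS prev_time,
               LAG(event) OVER (ORDER BY time) AS prev_event
        FROM t2
    )
    WHERE event = 'stopRecording' AND prev_event = 'startRecording'
""").fetchone()[0]
print(total_lag)  # 10
```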

Use views and table valued functions as node or edge tables in match clauses

I would like to use table-valued functions in MATCH clauses in the same way as node tables. Is there a way to achieve this?
The need for table valued functions
There can be various use cases for using table-valued functions or views as node tables. For instance, mine is the following.
I have node tables that contain NVarChar(max) fields that I would like to search for literal text. I need only equality searching and no full-text searching, so I opted for an index on the hash value of the text field, as suggested by Remus Rusanu in his answer to "SQL server - worth indexing large string keys?" and by https://www.brentozar.com/archive/2013/05/indexing-wide-keys-in-sql-server/. A table-valued function encapsulates the use of the CHECKSUM index; see "Msg 207 Invalid column name $node_id for pseudo column in inline table valued function".
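The hash-index idea is not specific to SQL Server. The sketch below illustrates it in SQLite via Python: a narrow integer hash column is indexed and searched, and the full text is compared as well, because two different strings can share a hash value. Assumptions: `zlib.crc32` stands in for T-SQL `CHECKSUM()` (the two produce different values), and the hash is computed in Python at insert time instead of via a `PERSISTED` computed column.

```python
import sqlite3
import zlib


def text_hash(s):
    # Stand-in for T-SQL CHECKSUM(); crc32 values fit in SQLite's 64-bit INTEGER.
    return zlib.crc32(s.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (tag TEXT, tagHash INTEGER NOT NULL)")
conn.execute("CREATE INDEX IX_TagsByName ON Tags(tagHash)")
conn.executemany(
    "INSERT INTO Tags (tag, tagHash) VALUES (?, ?)",
    [(t, text_hash(t)) for t in ("important", "urgent", "later")],
)


def tags_by_name(tag):
    """Seek on the small hash index, then re-check the full text for collisions."""
    return conn.execute(
        "SELECT rowid, tag FROM Tags WHERE tagHash = ? AND tag = ?",
        (text_hash(tag), tag),
    ).fetchall()


print(tags_by_name("important"))  # [(1, 'important')]
```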
Example data definitions
CREATE TABLE [Tags](
[tag] NVarChar(max),
[tagHash] AS CHECKSUM([Tag]) PERSISTED NOT NULL
) as Node;
CREATE TABLE [Sites](
[endPoint] NVarChar(max),
[endPointHash] AS CHECKSUM([endPoint]) PERSISTED NOT NULL
) as Node;
CREATE TABLE [Links] as Edge;
CREATE INDEX [IX_TagsByName] ON [Tags]([tagHash]);
GO
CREATE FUNCTION [TagsByName](
@tag NVarChar(max))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT
$node_id AS [NodeId],
[tag],
[tagHash]
FROM [dbo].[Tags]
WHERE [tagHash] = CHECKSUM(@tag) AND
[tag] = @tag;
[TagsByName] returns the $node_id with an alias NodeId as suggested by https://stackoverflow.com/a/45565410/814206. However, real Node tables contain two more internal columns which I do not know how to export.
Desired query
I would like to query the database similar to this:
SELECT *
FROM [TagsByName]('important') as t,
[Sites] as s,
[Links] as l
WHERE MATCH ([t]-([l])->[s])
However, this results in the following error [1]:
Msg 13901, Level 16, State 2, Line ...
Identifier 't' in a MATCH clause is not a node table or an alias for a node table.
Is there a way to do this?
PS. There are some workarounds, but they do not look as elegant as the MATCH query, especially considering that my actual query matches more relations and performs more string equality tests. I will post these workarounds as answers and hope that someone comes up with a better idea.
[1] This reveals a very specific difference between views and tables (relevant to the question "Difference between View and table in sql"), one that only occurs in SQL Server 2017 and only when using SQL Graph.
Workaround
Revert to traditional relational joins via JOIN clauses or FROM with <table_or_view_name> and WHERE clauses. In queries that match on more relations, the latter has the advantage that sql-server-2017-graph can MATCH on FROM <table_or_view_name> but not on FROM <table_source> JOIN <table_source>.
SELECT *
FROM [TagsByName]('important') as t,
[Sites] as s,
[Links] as l
WHERE t.NodeId = l.$from_id AND
l.$to_id = s.$node_id;
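For illustration, the same relational-join shape can be run in SQLite from Python. SQLite has no SQL Graph, so this is only an analog: hypothetical columns `node_id`, `from_id` and `to_id` stand in for `$node_id`, `$from_id` and `$to_id`, and `zlib.crc32` stands in for `CHECKSUM()`. The `WHERE` clause plays the role of `MATCH ([t]-([l])->[s])` plus the inlined body of `[TagsByName]`.

```python
import sqlite3
import zlib


def text_hash(s):
    # Hypothetical stand-in for CHECKSUM().
    return zlib.crc32(s.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tags  (node_id INTEGER PRIMARY KEY, tag TEXT, tagHash INTEGER NOT NULL);
    CREATE TABLE Sites (node_id INTEGER PRIMARY KEY, endPoint TEXT);
    -- plain edge table: from_id/to_id play the role of $from_id/$to_id
    CREATE TABLE Links (from_id INTEGER NOT NULL, to_id INTEGER NOT NULL);
    CREATE INDEX IX_TagsByName ON Tags(tagHash);
""")
conn.executemany(
    "INSERT INTO Tags VALUES (?, ?, ?)",
    [(i, t, text_hash(t)) for i, t in [(1, "important"), (2, "later")]],
)
conn.executemany(
    "INSERT INTO Sites VALUES (?, ?)",
    [(10, "https://a.example"), (11, "https://b.example")],
)
conn.executemany("INSERT INTO Links VALUES (?, ?)", [(1, 10), (2, 11), (1, 11)])


def sites_for_tag(tag):
    """Tag -> Links -> Site written as ordinary joins on the id columns."""
    return conn.execute(
        """
        SELECT s.endPoint
        FROM Tags AS t, Links AS l, Sites AS s
        WHERE t.tagHash = ? AND t.tag = ?
          AND t.node_id = l.from_id
          AND l.to_id = s.node_id
        ORDER BY s.endPoint
        """,
        (text_hash(tag), tag),
    ).fetchall()


print(sites_for_tag("important"))
# [('https://a.example',), ('https://b.example',)]
```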
Workaround
Add the Node table twice to the from clause: once as table and once as table valued function and join them via the $node_id in the where clause:
SELECT *
FROM [TagsByName]('important') as t1,
[Tags] as t2,
[Sites] as s,
[Links] as l
WHERE MATCH ([t2]-([l])->[s]) AND
t1.[NodeId] = t2.$node_id
Does this affect performance?
Workaround
Do not use the table valued function, but include its expression in the WHERE clause:
SELECT *
FROM [Tags] as t,
[Sites] as s,
[Links] as l
WHERE MATCH ([t]-([l])->[s]) AND
[t].[tagHash] = CHECKSUM('important') AND
[t].[tag] = 'important'
Downside: this is easy to get wrong, for example by forgetting the CHECKSUM condition, without which the index on tagHash cannot be used for a seek.

Multiple inserts using 'User Defined Table Type' (asp.net sql-server)

In my ASP.NET web app I have a DataTable filled with data to insert into tblChildren.
The DataTable is passed to a stored procedure.
In the SP I need to read each row (i.e. loop through the DataTable), change a couple of its columns (in accordance with the relevant data in tblParents), and only then insert the row into tblChildren.
SqlBulkCopy wouldn't do, and I don't think a TVP will do either (not sure... I'm not too familiar with it yet).
Of course, I could iterate through the DataTable rows in the app and send each one separately to the SP, but that would mean hundreds of round trips to SQL Server.
I came across two possibilities that might achieve this: (1) a temp table, (2) a cursor.
The first is quite messy, and the second, as I understand it, is NOT recommended.
Any guidance would be much appreciated.
EDIT :
I tried the approach of user-defined Table Type.
That works because I populate the Table Type (TT_Children) with values in the TT_Child_Family_Id column.
In real life, though, I will not know these values, and I would need to loop through the table-type parameter and, for each row, get the value from tblFamilies, something like this:
SELECT Family_Id FROM tblFamilies WHERE Family_Name = TT_Child_Last_Name
(assuming there is always an equivalent for TT_Child_Last_Name in tblFamilies.Family_Name)
So my question is - how to loop through the table-type and for each row look up a value in a different table?
EDIT 2 (the solution) :
As in Amir's perfect answer, the stored procedure should look like this :
ALTER PROCEDURE [dbo].[usp_Z_Insert_Children]
@my_TT_Children TT_Children READONLY
AS
BEGIN
INSERT INTO tblChildren(Child_FirstName,
Child_LastName,
Child_Family_ID)
SELECT Cld.TT_Child_FirstName,
Cld.TT_Child_LastName,
Fml.Family_Id FROM @my_TT_Children Cld
INNER JOIN tblFamilies Fml
ON Cld.TT_Child_LastName = Fml.Family_Name
END
Notes by Amir: column Family_Name in tblFamilies must be unique and preferably indexed.
(Also, I noticed that if TT_Child_LastName has no match in tblFamilies, the row is not inserted and I will never know about it. That means I have to check somehow whether all rows were processed successfully.)
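A sketch of both points in SQLite via Python: SQLite has no table-valued parameters, so an ordinary table `TT_Children` stands in for `@my_TT_Children` (an assumption for illustration). The set-based `INSERT ... SELECT ... INNER JOIN` looks up every `Family_Id` at once, and a `LEFT JOIN ... WHERE f.Family_Id IS NULL` query is one way to list the rows the inner join would silently drop, so they can be reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblFamilies (Family_Id INTEGER PRIMARY KEY, Family_Name TEXT NOT NULL UNIQUE);
    CREATE TABLE tblChildren (Child_FirstName TEXT, Child_LastName TEXT, Child_Family_ID INTEGER);
    -- ordinary table standing in for the @my_TT_Children table-valued parameter
    CREATE TABLE TT_Children (TT_Child_FirstName TEXT, TT_Child_LastName TEXT);
    INSERT INTO tblFamilies VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO TT_Children VALUES ('Ann', 'Smith'), ('Bob', 'Jones'), ('Cid', 'Brown');
""")

# Rows with no matching family; the INNER JOIN below would drop these silently.
unmatched = conn.execute("""
    SELECT c.TT_Child_FirstName, c.TT_Child_LastName
    FROM TT_Children AS c
    LEFT JOIN tblFamilies AS f ON c.TT_Child_LastName = f.Family_Name
    WHERE f.Family_Id IS NULL
""").fetchall()

# Set-based insert: the join looks up Family_Id for every row at once.
conn.execute("""
    INSERT INTO tblChildren (Child_FirstName, Child_LastName, Child_Family_ID)
    SELECT c.TT_Child_FirstName, c.TT_Child_LastName, f.Family_Id
    FROM TT_Children AS c
    INNER JOIN tblFamilies AS f ON c.TT_Child_LastName = f.Family_Name
""")
inserted = conn.execute(
    "SELECT Child_FirstName, Child_Family_ID FROM tblChildren ORDER BY Child_FirstName"
).fetchall()

print(unmatched)  # [('Cid', 'Brown')]
print(inserted)   # [('Ann', 1), ('Bob', 2)]
```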
You can join tblFamilies into the insert in the procedure and take the value from there. Much more efficient than looping through.
Or create a cursor and do one child at a time.
1) Make sure there is only one occurrence of each Family_Name in tblFamilies.
2) If tblFamilies is a large table, make sure the Family_Name column is indexed.
INSERT INTO tblChildren(Child_FirstName, Child_LastName, Child_Family_ID)
SELECT Cld.TT_Child_FirstName,
Cld.TT_Child_LastName,
Fml.Family_Id
FROM @my_TT_Children Cld
INNER JOIN tblFamilies Fml ON Cld.TT_Child_LastName = Fml.Family_Name
But be aware that if tblFamilies has more than one entry per Family_Name, this will duplicate the data. In that case you will need to add more restrictions to the join condition or the WHERE clause.
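To make the duplication warning concrete, here is a small SQLite sketch via Python (hypothetical data, with `Family_Name` deliberately not unique): two input rows produce three joined rows because one last name matches two families. A `GROUP BY ... HAVING COUNT(*) > 1` query run beforehand finds the offending names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Family_Name deliberately NOT unique here
    CREATE TABLE tblFamilies (Family_Id INTEGER PRIMARY KEY, Family_Name TEXT);
    CREATE TABLE TT_Children (TT_Child_FirstName TEXT, TT_Child_LastName TEXT);
    INSERT INTO tblFamilies VALUES (1, 'Smith'), (2, 'Smith'), (3, 'Jones');
    INSERT INTO TT_Children VALUES ('Ann', 'Smith'), ('Bob', 'Jones');
""")

# Two input rows, but Ann joins to both Smith families: three result rows.
joined = conn.execute("""
    SELECT c.TT_Child_FirstName, f.Family_Id
    FROM TT_Children AS c
    INNER JOIN tblFamilies AS f ON c.TT_Child_LastName = f.Family_Name
    ORDER BY c.TT_Child_FirstName, f.Family_Id
""").fetchall()

# Pre-check: family names that would fan out in the join.
duplicates = conn.execute("""
    SELECT Family_Name, COUNT(*) FROM tblFamilies
    GROUP BY Family_Name HAVING COUNT(*) > 1
""").fetchall()

print(joined)      # [('Ann', 1), ('Ann', 2), ('Bob', 3)]
print(duplicates)  # [('Smith', 2)]
```

Enforcing uniqueness of `Family_Name` (for example with a `UNIQUE` constraint) rules this out at the schema level.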
