Common table expression functionality in SQLite

I need to apply two successive aggregate functions to a dataset (the sum of a series of averages), something that is easily and routinely done with common table expressions in SQL Server or another DBMS that supports CTEs. Unfortunately, I am currently stuck with SQLite, which does not support CTEs. Is there an alternative or workaround for achieving the same result in SQLite without performing two queries and rolling up the results in code?
To add a few more details, I don't think it could easily be done with views, because the first set of aggregate values needs to be retrieved based on a WHERE clause with several parameters. E.g.,
SELECT avg(elapsedTime)
FROM statisticsTable
WHERE connectionId in ([lots of values]) AND
updateTime > [startTime] AND
updateTime < [endTime]
GROUP BY connectionId
And then I need the sum of those averages.

Now that we are in THE FUTURE, let me note here that SQLite does now support common table expressions, as of version 3.8.3 (2014-02-03).
http://www.sqlite.org/lang_with.html
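With that, the question's query can be written as a CTE directly. A minimal sketch, reusing the question's table and bracketed placeholders:
WITH connection_averages AS (
    SELECT avg(elapsedTime) AS avg_time
    FROM statisticsTable
    WHERE connectionId IN ([lots of values]) AND
          updateTime > [startTime] AND
          updateTime < [endTime]
    GROUP BY connectionId
)
SELECT sum(avg_time) AS sum_of_series_of_averages
FROM connection_averages;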

Would this work?
SELECT SUM(t.time) AS sum_of_series_of_averages
FROM
(
    SELECT avg(elapsedTime) AS time
    FROM statisticsTable
    WHERE connectionId IN ([lots of values]) AND
          updateTime > [startTime] AND
          updateTime < [endTime]
    GROUP BY connectionId
) AS t
By converting your averages into an inline view, you can SUM() the averages.
Is this what you are looking for?

As you've mentioned, SQLite doesn't support CTEs, window functions, or the like.
You can, however, write your own user-defined functions and register them with the database through the SQLite C API using sqlite3_create_function(). Once registered, they can be used in any SQL you run on that connection. In particular, you can create an aggregate function that computes the sum of a series of averages from the individual column values: for each row, a step-type callback is invoked, letting you update a running calculation, and a pointer to per-aggregate state data is available for carrying values between steps.
In your SQL, then, you could register a custom function called sum_of_series_of_averages and have:
SELECT sum_of_series_of_averages(columnA,columnB)
FROM table
WHERE ...
For some good examples of how these work, check out the SQLite source code, and also check out this tutorial (search for "Defining SQLite User Functions").
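As an illustration, here is a minimal sketch of the step/finalize pattern in C. It only computes a plain sum over one column (a real sum-of-series-of-averages would need per-group bookkeeping in its state struct), and the function and type names here are hypothetical:
#include <sqlite3.h>

/* Per-aggregate state; SQLite zero-initializes it on first use. */
typedef struct { double sum; } SumCtx;

/* xStep: called once per row with the column value(s). */
static void sumStep(sqlite3_context *ctx, int argc, sqlite3_value **argv) {
    SumCtx *p = sqlite3_aggregate_context(ctx, sizeof(*p));
    (void)argc;  /* single-argument aggregate */
    if (p && sqlite3_value_type(argv[0]) != SQLITE_NULL)
        p->sum += sqlite3_value_double(argv[0]);
}

/* xFinal: called once at the end to produce the result. */
static void sumFinal(sqlite3_context *ctx) {
    SumCtx *p = sqlite3_aggregate_context(ctx, 0);
    sqlite3_result_double(ctx, p ? p->sum : 0.0);
}

/* Passing NULL for xFunc and non-NULL xStep/xFinal registers
   an aggregate rather than a scalar function. */
int register_my_aggregate(sqlite3 *db) {
    return sqlite3_create_function(db, "my_aggregate", 1, SQLITE_UTF8,
                                   NULL, NULL, sumStep, sumFinal);
}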

Related

Apply multiple functions to update table using kusto

I want to produce/update the output of a table using several functions, because each function will create separate columns. For me it would be relatively practical to write several functions for it.
Updating a table using one function is covered in the documentation. https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/updatepolicy
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/alter-table-update-policy-command
But this case is not covered. Is it even possible to do, and if so, how?
Is this the right way to do it? .set-or-append TABLE_NAME <| FUNCTION1 <| FUNCTION2 <| FUNCTION3
You can chain update policies as much as you need (as long as it does not create a circular reference): Table B can have an update policy that runs a function over Table A, and Table C can have an update policy that runs a function over Table B.
If you don't need the intermediate tables, you can set their retention policy to 0 days; this means that no data will actually be kept in those tables.
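As a sketch with hypothetical names (TableA feeding TableB through a function FuncOverA() that maps TableA rows to TableB's schema):
.alter table TableB policy update
@'[{"IsEnabled": true, "Source": "TableA", "Query": "FuncOverA()", "IsTransactional": true}]'

// keep the intermediate table empty if its rows are not needed
.alter-merge table TableA policy retention softdelete = 0d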

Not able to have commands in User-Defined functions in Kusto

I am trying to create a function that will accept the name of a tag and a datetime value, drop an extent within a specific table which has that tag, and then ingest a new record into that table with the same tag and the input datetime value -- a sort of 'update' simulation. I am not bothered about performance; it's just going to hold metadata, maybe 20-30 rows at most.
So this is how the table creation looks:
.create table MyTable(sometext:string,somevalue:datetime)
And shown below is my function creation step, which is failing:
.create-or-alter function MyFunction(arg_sometext:string,arg_somedate:datetime)
{
.drop extents <| .show table MyTable extents where tags has arg_sometext;
.ingest inline into table MyTable with (tags="[arg_sometext]") <| arg_somedate
}
So you can see I am trying to do something simple -- I suspect that Kusto won't allow commands inside a function. Is there any workaround for achieving this?
Generally:
Kusto mandates that control commands start with a dot (.), and that this must be the first character in the text of the command. As queries, functions, etc. don't start with a dot, this precludes them from invoking control commands.
This is an intentional limitation that prevents a wide range of code injection attacks. By imposing this rule, Kusto makes it easy to guarantee that any query that does not begin with a dot will only have read access to the data and metadata, and never be able to alter them.
Specifically, with regard to your scenario:
I'm assuming this isn't triggered automatically (even if you did have the option to create such a function, something would still have to invoke it), which suggests you should be able to achieve your goal using Kusto's API / client libraries and a simple script/app.
An alternative, and perhaps even better, approach would be to reconsider whether you actually need to delete or update specific records; instead, you can use summarize arg_max() to query for only the latest "versions" of the records (you could also create a function that encapsulates that logic and overrides the table, by naming the function with the table's name).
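A sketch against the question's MyTable; the query keeps only the latest row per tag, and the function shadows the table for readers:
MyTable
| summarize arg_max(somevalue, *) by sometext

// a function named after the table overrides it in queries
.create-or-alter function MyTable() {
    table('MyTable')
    | summarize arg_max(somevalue, *) by sometext
}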

Dynamodb query expression

Team,
I have a DynamoDB table with a given hash key (userid) and sort key (ages). Let's say we want to retrieve the elements as "per each hash key (userid), smallest age" output; what would the query and filter expression for the DynamoDB query be?
Thanks!
I don't think you can do it in a query. You would need to do a full table scan. If you have a list of hash keys somewhere, then you can do N queries (in parallel) instead.
[Update] Here is another possible approach:
Maintain a second table keyed only by the hash key (userID). This table will contain the record with the smallest age for a given user. To achieve that, make sure that every time you update the main table you also update the second one if the new age is less than the current age in the second table; you can use a conditional update for that. The update can either be done by the application itself, or you can have an AWS Lambda function listening to the DynamoDB stream. Now if you need the smallest age for each user, you still do a full table scan of the second table, but this scan will only read relevant records, so it will be optimal.
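A minimal sketch of that conditional update with the AWS SDK for Java (v1); the MinAges/UserId/MinAge names match the schema described in the following answer, and the method name is hypothetical:
import java.util.Collections;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;

public static void storeMinAge(AmazonDynamoDB client, String userId, int age) {
    UpdateItemRequest request = new UpdateItemRequest()
        .withTableName("MinAges")
        .withKey(Collections.singletonMap("UserId", new AttributeValue(userId)))
        .withUpdateExpression("SET MinAge = :age")
        // only write when there is no stored age yet, or the new one is smaller
        .withConditionExpression("attribute_not_exists(MinAge) OR MinAge > :age")
        .withExpressionAttributeValues(Collections.singletonMap(
            ":age", new AttributeValue().withN(Integer.toString(age))));
    try {
        client.updateItem(request);
    } catch (ConditionalCheckFailedException e) {
        // the stored age is already smaller; nothing to do
    }
}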
There are two ways to achieve that:
If you don't need to get this data in realtime, you can export your data into other AWS systems, like EMR or Redshift, and perform complex analytics queries there. With these you can write SQL expressions using joins and group by operators.
You can even perform EMR Hive queries on DynamoDB data, but they perform scans, so it's not very cost-efficient.
Another option is to use DynamoDB streams. You can maintain a separate table that stores:
Table: MinAges
UserId - primary key
MinAge - regular numeric attribute
On every update/delete/insert to the original table, you can query the minimum age for the updated user and store it in the MinAges table.
Another option is to write something like this:
storeNewAge(userId, newAge)
def smallestAge = getSmallestAgeFor(userId)
storeSmallestAge(userId, smallestAge)
But since DynamoDB does not have native transaction support, it's dangerous to run code like that: you may end up with inconsistent data. You can use the DynamoDB transactions library, but these transactions are expensive. If you use streams instead, you will have consistent data at a very low price.
You can do it using ScanIndexForward:
YourEntity requestEntity = new YourEntity();
requestEntity.setHashKey(hashkey);
DynamoDBQueryExpression<YourEntity> queryExpression = new DynamoDBQueryExpression<YourEntity>()
        .withHashKeyValues(requestEntity)
        .withConsistentRead(false);
queryExpression.setIndexName(IndexName); // only if you are querying an index
queryExpression.setScanIndexForward(true); // ascending by sort key, so the smallest age comes first
queryExpression.setLimit(1); // we only need that first item
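Executing it might look like this (assuming a DynamoDBMapper instance named mapper):
// with the limit of 1, the first result is the smallest age for the hash key
YourEntity smallest = mapper.query(YourEntity.class, queryExpression).get(0);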

How to use dynamic values while executing SQL scripts in R

My R workflow now involves dealing with a lot of queries (RPostgreSQL library). I really want to make the code easy to maintain and manage in the future.
I started loading large queries from separate .SQL files (this helped) and it worked great.
Then I started using interpolated values (that helped), which means that I can write
SELECT * FROM table WHERE value = ?my_value;
and (after loading it into R) interpolate it using sqlInterpolate(ANSI(), query, value = "stackoverflow").
What happens now is I want to use something like this
SELECT count(*) FROM ?my_table;
but how can I make it work? sqlInterpolate() escapes everything as a literal value by default, so the table name ends up quoted as a string. Is there a workaround?
Thanks
In ?DBI::SQL, you can read:
By default, any user supplied input to a query should be escaped using
either dbQuoteIdentifier() or dbQuoteString() depending on whether it
refers to a table or variable name, or is a literal string.
Also, on this page:
You may also need dbQuoteIdentifier() if you are creating tables or
relying on user input to choose which column to filter on.
So you can use:
sqlInterpolate(ANSI(),
               "SELECT count(*) FROM ?my_table",
               my_table = dbQuoteIdentifier(ANSI(), "table_name"))
# <SQL> SELECT count(*) FROM "table_name"
sqlInterpolate() is for substituting values only, not other components like table names. You could use other templating frameworks such as brew or whisker.
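For the templating route, a minimal sketch with whisker (the table name here is hypothetical and must come from a trusted source, since templating does no SQL escaping):
library(whisker)
template <- "SELECT count(*) FROM {{table}}"
sql <- whisker.render(template, list(table = "statisticsTable"))
# "SELECT count(*) FROM statisticsTable"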

Determine flyway variables from earlier SQL step

I'd like to use Flyway for a DB update in a situation where a DB already exists with productive data in it. The problem I'm looking at now (and I have not yet found a nice solution for) is the following:
There is an existing DB table with numeric IDs, e.g.
create table objects ( obj_id number, ...)
There is a sequence "obj_seq" to allocate new obj_ids
During my DB migration I need to introduce a few new objects, hence I need new object IDs. However, I do not know at development time what ID numbers these will be.
There is a DB trigger which later references these IDs. To improve performance I'd like to avoid determining the actual IDs every time the trigger runs and rather put the IDs directly into the trigger.
Example (very simplified) of what I have in mind:
insert into objects (obj_id, ...) values (obj_seq.nextval, ...);
select obj_seq.currval from dual;  -- store this in variable "newID"

create trigger on some_other_table
when new.id = newID
...
Now, is it possible to dynamically determine/use such variables? I have seen the Flyway placeholders, but my understanding is that I cannot set them dynamically as in the example above.
I could use a Java-based migration script and do whatever string magic I like -- so that would be a way of doing it, but maybe there is a more elegant way using SQL?
Many thx!!
tge
If the table you are updating contains only reference data, get rid of the sequence and assign the IDs manually.
If it contains a mix of reference and user data, you need to select the ID based on values in other columns.
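For the pure reference-data case, a minimal sketch of what a migration could look like (the column name, ID value, and trigger body are hypothetical):
-- V2__add_reference_objects.sql
insert into objects (obj_id, obj_name) values (1001, 'NEW_OBJECT_A');

create or replace trigger some_other_table_trg
before insert on some_other_table
for each row
when (new.id = 1001)  -- the ID is now known at development time
begin
  null;  -- real trigger logic goes here
end;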
