Writing SQL to U-SQL Query - u-sql

Can anybody guide me in rewriting the SQL below in U-SQL, the language used in Azure Data Lake?
select tt.userId, count(tt.userId) from (SELECT userId,count(userId) as cou
FROM [dbo].[users]
where createdTime> DATEADD(wk,-1,GETDATE())
group by userId,DATEPART(minute,createdTime)/5) tt group by tt.userId
I can't find the DATEPART function in U-SQL, and the Azure Data Lake Analytics job gives me an error.

U-SQL does not provide T-SQL intrinsic functions except for a few (like LIKE). See https://msdn.microsoft.com/en-us/library/azure/mt621343.aspx for a list.
So how do you do DateTime operations? You just use the C# functions and methods!
So DATEADD(wk, -1, GETDATE()) is something like DateTime.Now.AddDays(-7)
and
DATEPART(minute,createdTime)/5 (there is an extra ) in your line) is something like createdTime.Minute/5 (you may need to cast it to a double if you want a non-integer value).

For anybody looking for the implementation Michael describes, it looks like this:
@records =
EXTRACT userId string,
createdTime DateTime
FROM "/datalake/input/data.tsv"
USING Extractors.Tsv();
@result =
SELECT
userId,
COUNT(createdTime) AS userCount
FROM @records
WHERE createdTime > DateTime.Now.AddDays(-30)
GROUP BY userId,createdTime.Minute/5;
@result2= SELECT userId,COUNT(userId) AS TotalCount
FROM @result
GROUP BY userId;
OUTPUT @result2
TO "/datalake/output/data.csv"
USING Outputters.Csv();
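To check the logic of the script above without a Data Lake account, here is a rough Python sketch of the same two-stage aggregation. The sample rows and the function name count_buckets_per_user are made up for illustration; only the Minute/5 bucketing and the 30-day cutoff come from the script.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def count_buckets_per_user(rows, now, days=30):
    """rows: iterable of (user_id, created_time) pairs (hypothetical sample shape)."""
    cutoff = now - timedelta(days=days)
    # Stage 1: one group per (userId, createdTime.Minute / 5), like the first SELECT.
    groups = defaultdict(int)
    for user_id, created in rows:
        if created > cutoff:
            groups[(user_id, created.minute // 5)] += 1
    # Stage 2: count the stage-1 groups per user, like the second SELECT.
    totals = defaultdict(int)
    for (user_id, _bucket) in groups:
        totals[user_id] += 1
    return dict(totals)

now = datetime(2020, 1, 31, 12, 0)
rows = [
    ("alice", datetime(2020, 1, 30, 10, 1)),  # bucket 0
    ("alice", datetime(2020, 1, 30, 11, 3)),  # bucket 0 again (minute of the hour only)
    ("alice", datetime(2020, 1, 30, 10, 7)),  # bucket 1
    ("bob", datetime(2020, 1, 29, 9, 59)),    # bucket 11
    ("bob", datetime(2019, 11, 1, 9, 0)),     # older than the cutoff, ignored
]
result = count_buckets_per_user(rows, now)
print(result)
```

Note that Minute is the minute of the hour, so there are only twelve possible buckets (0 to 11) and rows from different hours or days can land in the same one. If the intent is distinct five-minute windows, the grouping key would need to include the date and hour as well.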

Related

Explode an Array in Athena

I have a simple table in Athena that has an array of events. I want to write a simple SELECT statement so that each event in the array becomes a row.
I tried explode and transform, but no luck. I have done this successfully in Spark and Hive, but Athena is tripping me up. Please advise.
DROP TABLE bi_data_lake.royalty_v4;
CREATE external TABLE bi_data_lake.royalty_v4 (
KAFKA_ID string,
KAFKA_TS string,
deviceUser struct< deviceName:string, devicePlatform:string >,
consumeReportingEvents array<
struct<
consumeEvent: string,
consumeEventAction: string,
entryDateTime: string
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://XXXXXXXXXXX';
Query which is not working
select kafka_id, kafka_ts,deviceuser,
transform( consumereportingevents, consumereportingevent -> consumereportingevent.consumeevent) as cre
from bi_data_lake.royalty_v4
where kafka_id = 'events-consumption-0-490565';
Not supported
lateral view explode(consumereportingevents) as consumereportingevent
I found the answer to my question: use UNNEST.
WITH samples AS (
select kafka_id, kafka_ts,deviceuser, consumereportingevent, consumereportingeventPos
from bi_data_lake.royalty_v4
cross join unnest(consumereportingevents) WITH ORDINALITY AS T (consumereportingevent, consumereportingeventPos)
where kafka_id = 'events-consumption-0-490565' or kafka_id = 'events-consumption-0-490566'
)
SELECT * FROM samples
Flatten ('explode') nested arrays in AWS Athena with UNNEST.
WITH dataset AS (
SELECT
'engineering' as department,
ARRAY['Sharon', 'John', 'Bob', 'Sally'] as users
)
SELECT department, names FROM dataset
CROSS JOIN UNNEST(users) as t(names)
Reference: Flattening Nested Arrays
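As a rough mental model of what CROSS JOIN UNNEST ... WITH ORDINALITY produces, here is a small Python sketch. The function name cross_join_unnest and the output keys element and pos are invented for this illustration; it only mimics the row shape and is not how Athena executes the query.

```python
def cross_join_unnest(rows, array_key, with_ordinality=False):
    """Yield one output dict per array element, copying the other columns."""
    for row in rows:
        base = {k: v for k, v in row.items() if k != array_key}
        # Like CROSS JOIN UNNEST, a row with an empty array yields no output rows.
        for pos, item in enumerate(row[array_key], start=1):
            out = dict(base, element=item)
            if with_ordinality:
                out["pos"] = pos  # WITH ORDINALITY positions start at 1
            yield out

dataset = [{"department": "engineering", "users": ["Sharon", "John", "Bob", "Sally"]}]
flattened = list(cross_join_unnest(dataset, "users", with_ordinality=True))
for r in flattened:
    print(r)
```

Each input row is repeated once per array element, which is why the non-array columns (department here, kafka_id and friends in the question) appear on every output row.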

Query date range?

I'm trying to perform a date range query on a datasource, for example:
query.where = 'TransactionDate BETWEEN: StartDate AND EndDate';
This is what I get:
Unexpected input at ': StartDate AND EndDate'.
Error: Unexpected input at ': StartDate AND EndDate'. at datasources.
I was assuming this would work similarly to a MySQL query:
WHERE TransactionDate BETWEEN "2012-03-15" AND "2012-03-31";
To use a real SQL query you need a Calculated SQL model. With query.where = ... you are setting an App Maker Query Builder expression, which supports only a limited set of operations. I think your Query Builder expression would look similar to this:
TransactionDate >= :StartDate AND TransactionDate <= :EndDate

PL/SQL - comma separated list within IN CLAUSE

I am having trouble getting a block of pl/sql code to work. In the top of my procedure I get some data from my oracle apex application on what checkboxes are checked. Because the report that contains the checkboxes is generated dynamically I have to loop through the
APEX_APPLICATION.G_F01
list and generate a comma separated string which looks like this
v_list VARCHAR2(255) := '1,3,5,9,10';
I want to then query on that list later and place the v_list on an IN clause like so
SELECT * FROM users
WHERE user_id IN (v_list);
This of course throws an error. My question is: what can I convert v_list to so that I can use it in an IN clause in a query within a PL/SQL procedure?
If users is small and user_id doesn't contain commas, you could use:
SELECT * FROM users WHERE ',' || v_list || ',' LIKE '%,'||user_id||',%'
This query is not optimal though because it can't use indexes on user_id.
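The wrapping commas in that LIKE expression are what keep, say, user 1 from matching the entry 10. A small Python sketch of the same string test makes this visible; the list value '5,10,30' and the helper name in_list are made up for the example.

```python
def in_list(user_id, v_list):
    """Python equivalent of: ',' || v_list || ',' LIKE '%,' || user_id || ',%'"""
    return f",{user_id}," in f",{v_list},"

v_list = "5,10,30"  # example value, not from the question
matches = [uid for uid in range(1, 11) if in_list(uid, v_list)]
print(matches)

# Without the wrapping commas, a plain substring test gives false positives
# (1 matches inside "10", 3 matches inside "30"):
naive = [uid for uid in range(1, 11) if str(uid) in v_list]
print(naive)
```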
I advise you to use a pipelined function that returns a table of NUMBER that you can query directly. For example:
CREATE TYPE tab_number IS TABLE OF NUMBER;
/
CREATE OR REPLACE FUNCTION string_to_table_num(p VARCHAR2)
RETURN tab_number
PIPELINED IS
BEGIN
FOR cc IN (SELECT rtrim(regexp_substr(str, '[^,]*,', 1, level), ',') res
FROM (SELECT p || ',' str FROM dual)
CONNECT BY level <= length(str)
- length(replace(str, ',', ''))) LOOP
PIPE ROW(cc.res);
END LOOP;
RETURN;
END;
/
You would then be able to build queries such as:
SELECT *
FROM users
WHERE user_id IN (SELECT *
FROM TABLE(string_to_table_num('1,2,3,4,5')));
You can use XMLTABLE as follows
SELECT * FROM users
WHERE user_id IN (SELECT to_number(column_value) FROM XMLTABLE(v_list));
I have tried to find a solution for that too but never succeeded. You can build the query as a string and then run EXECUTE IMMEDIATE, see http://docs.oracle.com/cd/B19306_01/appdev.102/b14261/dynamic.htm#i14500.
That said, it just occurred to me that the argument of an IN clause can be a sub-select:
SELECT * FROM users
WHERE user_id IN (SELECT something FROM somewhere)
so, is it possible to expose the checkbox values as a stored function? Then you might be able to do something like
SELECT * FROM users
WHERE user_id IN (SELECT my_package.checkbox_func FROM dual)
Personally, I like this approach:
with t as (select 'a,b,c,d,e' str from dual)
--
select val
from t, xmltable('/root/e/text()'
passing xmltype('<root><e>' || replace(t.str,',','</e><e>')|| '</e></root>')
columns val varchar2(10) path '/'
)
Which can be found among other examples in Thread: Split Comma Delimited String Oracle
If you feel like swamping in even more options, visit the OTN plsql forums.

SQLite Schema Information Metadata

I need to get column names and their tables in a SQLite database. What I need is a resultset with 2 columns: table_name | column_name.
In MySQL, I'm able to get this information with a SQL query on the INFORMATION_SCHEMA database. SQLite, however, offers the sqlite_master table:
sqlite> create table students (id INTEGER, name TEXT);
sqlite> select * from sqlite_master;
table|students|students|2|CREATE TABLE students (id INTEGER, name TEXT)
which returns the DDL statement (CREATE TABLE). That is not helpful for me, because I would need to parse it to get the relevant information.
I need to get list of tables and join them with columns or just get columns along with table name column. So PRAGMA table_info(TABLENAME) is not working for me since I don't have table name. I want to get all column metadata in the database.
Is there a better way to get that information as a result set by querying database?
You've basically named the solution in your question.
To get a list of tables (and views), query sqlite_master as in
SELECT name, sql FROM sqlite_master
WHERE type='table'
ORDER BY name;
(see the SQLite FAQ)
To get information about the columns in a specific table, use PRAGMA table_info(table-name); as explained in the SQLite PRAGMA documentation.
I don't know of any way to get tablename|columnname returned as the result of a single query. I don't believe SQLite supports this. Your best bet is probably to use the two methods together to return the information you're looking for - first get the list of tables using sqlite_master, then loop through them to get their columns using PRAGMA table_info().
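The two-step approach described above could be sketched in Python with the standard sqlite3 module as follows. The function name list_columns and the two sample tables are made up for the example.

```python
import sqlite3

def list_columns(conn):
    """Return (table_name, column_name) pairs by looping over sqlite_master."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    pairs = []
    for table in tables:
        # PRAGMA arguments cannot be bound as parameters, so the name is
        # quoted by hand; the names come from sqlite_master, not user input.
        quoted = '"' + table.replace('"', '""') + '"'
        for row in conn.execute(f"PRAGMA table_info({quoted})"):
            pairs.append((table, row[1]))  # row[1] is the column name
    return pairs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE courses (code TEXT, title TEXT)")
columns = list_columns(conn)
print(columns)
```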
Recent versions of SQLite allow you to select against PRAGMA results now, which makes this easy:
SELECT
m.name as table_name,
p.name as column_name
FROM
sqlite_master AS m
JOIN
pragma_table_info(m.name) AS p
ORDER BY
m.name,
p.cid
where p.cid holds the column order of the CREATE TABLE statement, zero-indexed.
David Garoutte answered this here, but this SQL should execute faster, and columns are ordered by the schema, not alphabetically.
Note that table_info also contains
type (the datatype, like integer or text),
notnull (1 if the column has a NOT NULL constraint)
dflt_value (NULL if no default value)
pk (1 if the column is the table's primary key, else 0)
RTFM: https://www.sqlite.org/pragma.html#pragma_table_info
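To try the single-query version from Python, something like the sketch below should work, assuming the SQLite library bundled with your Python is recent enough to expose pragma_table_info as a table-valued function (3.16 or later, as far as I know). The WHERE m.type = 'table' filter is my addition to skip indexes and views; the students table is the one from the question.

```python
import sqlite3

QUERY = """
SELECT m.name AS table_name, p.name AS column_name
FROM sqlite_master AS m
JOIN pragma_table_info(m.name) AS p
WHERE m.type = 'table'
ORDER BY m.name, p.cid
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
rows = conn.execute(QUERY).fetchall()
print(rows)
```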
There are ".tables" and ".schema [table_name]" commands which give kind of a separated version to the result you get from "select * from sqlite_master;"
There is also "pragma table_info([table_name]);" command to get a better result for parsing instead of a construction query:
sqlite> .tables
students
sqlite> .schema students
create table students(id INTEGER, name TEXT);
sqlite> pragma table_info(students);
0|id|INTEGER|0||0
1|name|TEXT|0||0
Hope, it helps to some extent...
Another useful trick is to first get all the table names from sqlite_master.
Then for each one, fire off a query "select * from t where 1 = 0". If you analyze the structure of the resulting query - depends on what language/api you're calling it from - you get a rich structure describing the columns.
In Python
c = db.cursor()  # db is an open DB-API connection, e.g. sqlite3.connect(...)
c.execute("select * from t where 1=0")
c.fetchall()
print(c.description)
Juraj
PS. I'm in the habit of using 'where 1=0' because the record limiting syntax seems to vary from db to db. Furthermore, a good database will optimize out this always-false clause.
The same effect, in SQLite, is achieved with 'limit 0'.
FYI, if you're using .NET you can use the DbConnection.GetSchema method to retrieve information that is usually in INFORMATION_SCHEMA. If you have an abstraction layer you can have the same code for all types of databases (note that MySQL seems to switch the first two arguments of the restrictions array).
Try this sqlite table schema parser, I implemented the sqlite table parser for parsing the table definitions in PHP.
It returns the full definitions (unique, primary key, type, precision, not null, references, table constraints... etc)
https://github.com/maghead/sqlite-parser
The syntax follows sqlite create table statement syntax: http://www.sqlite.org/lang_createtable.html
This is an old question, but given how often it has been viewed we are adding an answer, because most of the existing answers only tell you how to list the table names in a SQLite database.
What do you do when the table name is not in the database?
This happens in our app because we create tables programmatically.
The code below handles the case where the requested table does not exist in the database.
public void toPageTwo(View view){
if(etQuizTable.getText().toString().equals("")){
Toast.makeText(getApplicationContext(), "Enter Table Name\n\n"
+" OR"+"\n\nMake Table First", Toast.LENGTH_LONG
).show();
etQuizTable.requestFocus();
return;
}
NEW_TABLE = etQuizTable.getText().toString().trim();
db = dbHelper.getWritableDatabase();
ArrayList<String> arrTblNames = new ArrayList<>();
Cursor c = db.rawQuery("SELECT name FROM sqlite_master WHERE type='table'", null);
if (c.moveToFirst()) {
while ( !c.isAfterLast() ) {
arrTblNames.add( c.getString( c.getColumnIndex("name")) );
c.moveToNext();
}
}
c.close();
db.close();
boolean matchFound = false;
for(int i=0;i<arrTblNames.size();i++) {
if(arrTblNames.get(i).equals(NEW_TABLE)) {
Intent intent = new Intent(ManageTables.this, TableCreate.class
);
startActivity( intent );
matchFound = true;
}
}
if (!matchFound) {
Toast.makeText(getApplicationContext(), "No Such Table\n\n"
+" OR"+"\n\nMake Table First", Toast.LENGTH_LONG
).show();
etQuizTable.requestFocus();
}
}

Numbering comments in ASP.NET and SQL Server

I've been thinking about the best way to store comments in a database so that they are numbered per article.
The idea is to store comments with a composite primary key (commentId, articleId), where commentId is generated per articleId. The generator should work on the same principle as IDENTITY columns in SQL Server: if someone deletes a comment, its number is never used again. I don't think Microsoft SQL Server has built-in functionality to do that with a composite PK, so I am asking about a substitute for this solution.
My first thought was to use a transaction to get MAX(commentId) + 1, but I am looking for something more abstract (maybe an INSTEAD OF trigger), something that could be used from LINQ, for example, with no knowledge of the background: just insert all required values into the appropriate table (so no commentId) and save.
I would use an autogenerated identity column for the commentId and have it be the primary key alone. I'd create an index on the articleId for look ups. I would also have createdDate column that is autopopulated with the current date on insertion -- mark it as db generated and readonly in LINQ so it doesn't require or try to insert/update the value. To get a numbering -- if showing them by date isn't enough -- I'd order by createdDate inversed and assign a numeric value in the select using Row_Number() or a numbering on the client side.
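A minimal sketch of that suggestion, using Python's sqlite3 as a stand-in for SQL Server so it can be run anywhere (this assumes the bundled SQLite supports window functions, which I believe needs 3.25 or later). The Comments columns and sample rows are invented for the example, and the numbering here runs oldest-first; flip the ORDER BY inside OVER for the reverse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Comments (
        commentId   INTEGER PRIMARY KEY AUTOINCREMENT,
        articleId   INTEGER NOT NULL,
        createdDate TEXT NOT NULL,
        content     TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Comments (articleId, createdDate, content) VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 10:00", "first on article 1"),
        (2, "2020-01-01 10:05", "first on article 2"),
        (1, "2020-01-01 10:10", "second on article 1"),
    ],
)
# Number the comments per article at query time instead of storing the number.
numbered = conn.execute("""
    SELECT articleId,
           ROW_NUMBER() OVER (PARTITION BY articleId ORDER BY createdDate) AS n,
           content
    FROM Comments
    ORDER BY articleId, n
""").fetchall()
print(numbered)
```

Because the number is computed on read, deleting an earlier comment shifts the numbers of the later ones. That differs from the "never reused" behaviour asked for in the question, so this fits best when the numbers are only used for display.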
I would use an identity column as the key for the comments, why do you need a numbering for the comments stored in the database?
Thank you for the responses. I wanted numbered comments so that they can be referenced in the text of other comments. I did not want replies to refer to people by name, because one person sometimes comments several times; with this system I will know which comment a person is replying to.
So today I made up this INSTEAD OF INSERT trigger:
CREATE TRIGGER InsertComments ON Comments
INSTEAD OF INSERT
AS
DECLARE @Inserted TABLE
(
ArticleId INT NOT NULL,
UserId INT NOT NULL,
CommentDate DATETIME NOT NULL,
Content NVARCHAR(1000) NOT NULL,
RowNumber INT NOT NULL
)
INSERT INTO @Inserted
SELECT ArticleId, UserId, CommentDate, Content, ROW_NUMBER() OVER (ORDER BY CommentDate) AS RowNumber
FROM INSERTED
DECLARE @NumberOfRows INT = (SELECT COUNT(*) FROM @Inserted)
DECLARE @i INT = 1
WHILE (@i <= @NumberOfRows)
BEGIN
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
DECLARE @CommentId INT = (SELECT ISNULL(MAX(CommentId), 0)
FROM Comments WHERE ArticleId = (SELECT ArticleId
FROM @Inserted WHERE RowNumber = @i)) + 1
INSERT INTO Comments(CommentId, ArticleId, UserId, CommentDate, Content)
SELECT @CommentId, ArticleId, UserId, CommentDate, Content
FROM @Inserted WHERE RowNumber = @i
COMMIT
SET @i = @i + 1
END
I know this is not the perfect solution, but it works exactly how I needed. If any of you has some comments, I'll be happy to read them.

Resources