I have a simple table in Athena that contains an array of events. I want to write a simple SELECT statement so that each event in the array becomes a row.
I tried explode and transform, but no luck. I have done this successfully in Spark and Hive, but Athena is tripping me up. Please advise.
DROP TABLE bi_data_lake.royalty_v4;
CREATE external TABLE bi_data_lake.royalty_v4 (
KAFKA_ID string,
KAFKA_TS string,
deviceUser struct< deviceName:string, devicePlatform:string >,
consumeReportingEvents array<
struct<
consumeEvent: string,
consumeEventAction: string,
entryDateTime: string
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://XXXXXXXXXXX';
Query that does not work:
select kafka_id, kafka_ts,deviceuser,
transform( consumereportingevents, consumereportingevent -> consumereportingevent.consumeevent) as cre
from bi_data_lake.royalty_v4
where kafka_id = 'events-consumption-0-490565';
Not supported
lateral view explode(consumereportingevents) as consumereportingevent
I found the answer to my question: use UNNEST.
WITH samples AS (
select kafka_id, kafka_ts,deviceuser, consumereportingevent, consumereportingeventPos
from bi_data_lake.royalty_v4
cross join unnest(consumereportingevents) WITH ORDINALITY AS T (consumereportingevent, consumereportingeventPos)
where kafka_id = 'events-consumption-0-490565' or kafka_id = 'events-consumption-0-490566'
)
SELECT * FROM samples
Flatten ('explode') nested arrays in AWS Athena with UNNEST.
WITH dataset AS (
SELECT
'engineering' as department,
ARRAY['Sharon', 'John', 'Bob', 'Sally'] as users
)
SELECT department, names FROM dataset
CROSS JOIN UNNEST(users) as t(names)
Reference: Flattening Nested Arrays
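To make the semantics of CROSS JOIN UNNEST ... WITH ORDINALITY concrete, here is a minimal Python sketch that emulates it on in-memory rows. This is not Athena code. The helper name cross_join_unnest and the sample event values are made up for illustration; only the column names come from the question.

```python
from typing import Any, Dict, Iterator, List


def cross_join_unnest(rows: List[Dict[str, Any]], array_col: str,
                      item_col: str, pos_col: str) -> Iterator[Dict[str, Any]]:
    """Emit one output row per array element (hypothetical helper, not an Athena API)."""
    for row in rows:
        parent = {k: v for k, v in row.items() if k != array_col}
        # Ordinality is 1-based. A row whose array is empty or missing yields
        # nothing, which is what a CROSS JOIN against an empty set produces.
        for pos, item in enumerate(row.get(array_col) or [], start=1):
            yield {**parent, item_col: item, pos_col: pos}


# Made-up sample row shaped like the royalty_v4 table.
sample = [{
    "kafka_id": "events-consumption-0-490565",
    "consumereportingevents": [
        {"consumeevent": "play", "consumeeventaction": "start"},
        {"consumeevent": "play", "consumeeventaction": "stop"},
    ],
}]

flattened = list(cross_join_unnest(sample, "consumereportingevents",
                                   "consumereportingevent",
                                   "consumereportingeventpos"))
for r in flattened:
    print(r["kafka_id"], r["consumereportingeventpos"],
          r["consumereportingevent"]["consumeeventaction"])
```

Each parent row is repeated once per array element, so the two events above become two rows that share the same kafka_id.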
I'm trying to write a recursive query for use on an old and poorly designed database, so the queries get quite complex.
Here are the (relevant) table relationships
Because people asked - here is the creation code for these tables:
CREATE TABLE CircuitLayout(
CircuitLayoutID int,
PRIMARY KEY (CircuitLayoutID)
);
CREATE TABLE LitCircuit (
LitCircuitID int,
CircuitLayoutID int,
PRIMARY KEY (LitCircuitID),
FOREIGN KEY (CircuitLayoutID) REFERENCES CircuitLayout(CircuitLayoutID)
);
CREATE TABLE CircuitLayoutItem(
CircuitLayoutItemID int,
CircuitLayoutID int,
TableName varchar(255),
TablePK int,
PRIMARY KEY (CircuitLayoutItemID),
FOREIGN KEY (CircuitLayoutID) REFERENCES CircuitLayout(CircuitLayoutID)
);
TableName refers to another table in the database, and TablePK is a primary key from that table.
One of the valid values for TableName is LitCircuit.
I'm trying to write a query that selects a circuit and any circuit it is related to.
I am having trouble understanding the syntax for recursive CTEs.
My non-functional attempt is this:
WITH RECURSIVE carries AS (
SELECT LitCircuit.LitCircuitID AS recurseList FROM LitCircuit
JOIN CircuitLayoutItem ON LitCircuit.CircuitLayoutID = CircuitLayoutItem.CircuitLayoutID
WHERE CircuitLayoutItem.TableName = "LitCircuit" AND CircuitLayoutItem.TablePK IN (00340)
UNION
SELECT LitCircuit.LitCircuitID AS CircuitIDs FROM LitCircuit
JOIN CircuitLayout ON LitCircuit.CircuitLayoutID = CircuitLayoutItem.CircuitLayoutID
WHERE CircuitLayoutItem.TableName = "LitCircuit" AND CircuitLayoutItem.TablePK IN (SELECT recurseList FROM carries)
)
SELECT * FROM carries;
The "00340" is a dummy number for testing; it would be replaced with an actual list in real use.
What I'm attempting to do is get a list of LitCircuitIDs based on one or many LitCircuitIDs. That's the anchor member, and it works fine.
What I want to do is take this result and feed it back into itself.
I lack an understanding of how to access data from the anchor member:
I don't know whether it is a table with the columns from the SELECT in the anchor or simply a list of resulting values.
I don't understand if or where I need to include "carries" in the FROM part of a query.
If I were to write this function in Python, I would do it like this:
def get_circuits(circuit_list):
    result_list = []
    for layout_item_key, layout_item in CircuitLayoutItem.items():
        if layout_item['TableName'] == "LitCircuit" and layout_item['TablePK'] in circuit_list:
            layout = layout_item['CircuitLayoutID']
            for circuit_key, circuit in LitCircuit.items():
                if circuit["CircuitLayoutID"] == layout:
                    result_list.append(circuit_key)
    if result_list:  # stop recursing once no further circuits are found
        result_list.extend(get_circuits(result_list))
    return result_list
How do I express this in SQL?
danblack's comment made me realize something I was missing:
Here is what I was trying to do:
WITH RECURSIVE carries AS (
SELECT LitCircuit.LitCircuitID FROM LitCircuit
JOIN CircuitLayoutItem ON LitCircuit.CircuitLayoutID = CircuitLayoutItem.CircuitLayoutID
WHERE CircuitLayoutItem.TableName = 'LitCircuit' AND CircuitLayoutItem.TablePK IN (00340)
UNION ALL
SELECT LitCircuit.LitCircuitID FROM carries
JOIN CircuitLayoutItem ON carries.LitCircuitID = CircuitLayoutItem.TablePK
JOIN LitCircuit ON CircuitLayoutItem.CircuitLayoutID = LitCircuit.CircuitLayoutID
WHERE CircuitLayoutItem.TableName = 'LitCircuit'
)
SELECT DISTINCT LitCircuitID FROM carries;
I did not think of the CTE as a table to query against - rather just a result set, so I did not realize you have to SELECT from it - or in general treat it like a table.
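For anyone who wants to try the pattern, here is a minimal sketch that runs the same recursive CTE against an in-memory SQLite database from Python. SQLite stands in for the real database, the sample rows are made up, and the CTE column list carries(LitCircuitID) is added only to make the column name explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LitCircuit (LitCircuitID INTEGER PRIMARY KEY, CircuitLayoutID INTEGER);
    CREATE TABLE CircuitLayoutItem (
        CircuitLayoutItemID INTEGER PRIMARY KEY,
        CircuitLayoutID INTEGER,
        TableName TEXT,
        TablePK INTEGER
    );
    -- Made-up data: layout 2 contains circuit 340, layout 3 contains circuit 341.
    INSERT INTO LitCircuit VALUES (340, 1), (341, 2), (342, 3);
    INSERT INTO CircuitLayoutItem VALUES
        (1, 2, 'LitCircuit', 340),
        (2, 3, 'LitCircuit', 341),
        (3, 3, 'Fixture', 340);   -- ignored: TableName is not LitCircuit
""")

QUERY = """
WITH RECURSIVE carries(LitCircuitID) AS (
    -- Anchor member: circuits whose layout contains the starting circuit.
    SELECT LitCircuit.LitCircuitID FROM LitCircuit
    JOIN CircuitLayoutItem ON LitCircuit.CircuitLayoutID = CircuitLayoutItem.CircuitLayoutID
    WHERE CircuitLayoutItem.TableName = 'LitCircuit' AND CircuitLayoutItem.TablePK IN (?)
    UNION ALL
    -- Recursive member: treat carries like a table and join against it.
    SELECT LitCircuit.LitCircuitID FROM carries
    JOIN CircuitLayoutItem ON carries.LitCircuitID = CircuitLayoutItem.TablePK
    JOIN LitCircuit ON CircuitLayoutItem.CircuitLayoutID = LitCircuit.CircuitLayoutID
    WHERE CircuitLayoutItem.TableName = 'LitCircuit'
)
SELECT DISTINCT LitCircuitID FROM carries ORDER BY LitCircuitID;
"""

related = [row[0] for row in conn.execute(QUERY, (340,))]
print(related)  # [341, 342]
```

One thing to watch: with UNION ALL the recursion does not terminate if the data contains a cycle (a circuit that ends up related to itself). In SQLite, using UNION instead discards rows that were already produced, which stops the recursion in that case; other databases may need a different guard.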
I am trying to give all of one user's permissions to another user (id 13053), but I am getting the Oracle error ORA-01427: single-row subquery returns more than one row. I know exactly which part of my SQL statement below causes the error, but I don't know how to handle it, because what I want is precisely to give those multiple returned rows to the user with id 13053.
My attempt
INSERT INTO userpermissions (
userid,permissionid
) VALUES (
13053,( SELECT permissionid
FROM userpermissions
WHERE userid = ( SELECT userid
FROM users
WHERE username = '200376'
)
)
);
Any help?
Thanks in advance.
A rewrite ought to do the trick:
INSERT INTO USERPERMISSIONS(
USERID,
PERMISSIONID
)
SELECT 13053 AS USERID,
p.PERMISSIONID
FROM USERPERMISSIONS p
WHERE p.userid = (SELECT userid FROM users WHERE username = '200376');
The problem with the original insert is that you are using single-row insert syntax when you are really trying to insert a set of rows.
Including the target userid as a literal is one way to make the set of rows look the way I am assuming you intend.
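Here is a minimal sketch of the same INSERT ... SELECT pattern, run against an in-memory SQLite database from Python rather than Oracle. The table and column names follow the question; the sample rows and the username 'target' are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE userpermissions (userid INTEGER, permissionid INTEGER);
    -- Made-up data: user '200376' holds three permissions, user 13053 holds none.
    INSERT INTO users VALUES (1, '200376'), (13053, 'target');
    INSERT INTO userpermissions VALUES (1, 10), (1, 20), (1, 30);
""")

# Set-based insert: the SELECT yields one row per source permission, and the
# target user id is supplied as a constant in the select list.
conn.execute("""
    INSERT INTO userpermissions (userid, permissionid)
    SELECT ?, p.permissionid
    FROM userpermissions p
    WHERE p.userid = (SELECT userid FROM users WHERE username = ?)
""", (13053, "200376"))

copied = [r[0] for r in conn.execute(
    "SELECT permissionid FROM userpermissions WHERE userid = 13053 ORDER BY permissionid")]
print(copied)  # [10, 20, 30]
```

The inner subquery on users is still a single-row subquery, which is fine as long as the username is unique.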
Can anybody please guide me in writing the SQL below in U-SQL, the language used in Azure Data Lake?
select tt.userId, count(tt.userId) from (SELECT userId,count(userId) as cou
FROM [dbo].[users]
where createdTime> DATEADD(wk,-1,GETDATE())
group by userId,DATEPART(minute,createdTime)/5) tt group by tt.userId
I can't find the DATEPART function in U-SQL, and the Azure Data Lake Analytics job gives me an error.
U-SQL does not provide T-SQL intrinsic functions except for a few (like LIKE). See https://msdn.microsoft.com/en-us/library/azure/mt621343.aspx for a list.
So how do you do DateTime operations? You just use the C# functions and methods!
So DATEADD(wk, -1, GETDATE()) is something like DateTime.Now.AddDays(-7)
and
DATEPART(minute,createdTime)/5 (there is an extra ) in your line) is something like createdTime.Minute/5 (maybe you need to cast it to a double if you want a non-integer value).
For anybody looking for the implementation mentioned by Michael, it looks like this:
@records =
EXTRACT userId string,
createdTime DateTime
FROM "/datalake/input/data.tsv"
USING Extractors.Tsv();
@result =
SELECT
userId,
COUNT(createdTime) AS userCount
FROM @records
WHERE createdTime > DateTime.Now.AddDays(-30)
GROUP BY userId,createdTime.Minute/5;
@result2= SELECT userId,COUNT(userId) AS TotalCount
FROM @result
GROUP BY userId;
OUTPUT @result2
TO "/datalake/output/data.csv"
USING Outputters.Csv();
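To check what the two-stage grouping actually computes, here is a small Python sketch of the same logic on in-memory data. It is not U-SQL. The function name count_active_buckets and the sample records are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_active_buckets(records, now, days=30):
    """Count, per user, the distinct 5-minute-of-hour buckets that have events."""
    cutoff = now - timedelta(days=days)
    # Stage 1: GROUP BY userId, createdTime.Minute / 5
    # (integer division, so the bucket is a value from 0 to 11).
    groups = {(user_id, created.minute // 5)
              for user_id, created in records if created > cutoff}
    # Stage 2: GROUP BY userId, counting the groups produced by stage 1.
    return Counter(user_id for user_id, _ in groups)


now = datetime(2024, 1, 31, 12, 0)
records = [
    ("a", datetime(2024, 1, 30, 10, 1)),   # bucket 0
    ("a", datetime(2024, 1, 30, 10, 3)),   # bucket 0 again
    ("a", datetime(2024, 1, 30, 10, 7)),   # bucket 1
    ("b", datetime(2024, 1, 29, 9, 59)),   # bucket 11
    ("a", datetime(2023, 11, 1, 10, 30)),  # older than the cutoff, filtered out
]
totals = count_active_buckets(records, now)
print(dict(totals))
```

Because the bucket is derived from the minute of the hour only, events in the same 5-minute slot of different hours or days fall into the same group. If that is not what you intend, the date and hour would need to be part of the grouping key as well.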
I am new to Hive and I am trying to count the distinct words_value entries across my whole words column.
id---------------------------words
435400064446779392 [{"words_value":"i","words_id":"1"},{"words_value":"hate","words_id":"2"}]
Notice that the words column is an array. I have many more rows; the one above is just an example.
I have tried:
SELECT words.words_value,count(words.words_value) from T1 GROUP BY words.words_value WITH ROLLUP;
But it counts within each row.
Does anyone have any idea?
The explode UDTF is useful for converting nested data structures into ordinary tables that work with ordinary SQL statements. Since you have an array of maps you would need to use explode twice.
select count(distinct value) from
( select explode(col) from
( select explode(words) from mytable ) subquery1
) subquery2
where
key = "words_value";
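As a rough illustration of what the two nested explode calls do, here is a Python sketch on in-memory data. It is not Hive code. The function name count_distinct_values and the second sample row are made up.

```python
def count_distinct_values(rows, key="words_value"):
    """Emulate explode(array) followed by explode(map), then COUNT(DISTINCT value)."""
    distinct = set()
    for row in rows:
        for entry in row["words"]:           # first explode: one row per map in the array
            for k, v in entry.items():       # second explode: one row per (key, value) pair
                if k == key:                 # WHERE key = "words_value"
                    distinct.add(v)
    return len(distinct)


rows = [
    {"id": 435400064446779392,
     "words": [{"words_value": "i", "words_id": "1"},
               {"words_value": "hate", "words_id": "2"}]},
    # Hypothetical second row that repeats the word "i".
    {"id": 2, "words": [{"words_value": "i", "words_id": "1"}]},
]
print(count_distinct_values(rows))  # 2
```

The count is taken across all rows after flattening, which is the difference from the original query that counted within each row.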
How can I put a table name dynamically in a query?
Suppose I have a query as shown below:
Select a.amount
,b.sal
,a.name
,b.address
from alloc a
,part b
where a.id=b.id;
In the above query I want to use a table dynamically (part b if the database is internal, p_part b if the database is external).
I have a function that returns which database it is. Suppose the function is getdatabase();
select decode(getdatabase(),'internal','part b','external','p_part b')
from dual;
How can I use this function in my main query to insert the table name dynamically into the query?
I don't want to implement this the primitive way, by appending strings to build a final query and then opening a cursor with that string.
I don't want to implement this with primitive way of by appending
strings to make a final query and then open cursor with that string .
That's really the only way you can do it. It's not possible to use a variable or function call for the table name when using a regular PL/SQL SQL block; you have to use dynamic SQL.
Refer to Oracle documentation for more details:
http://docs.oracle.com/cd/B10500_01/appdev.920/a96590/adg09dyn.htm
Here's an example from the doc:
EXECUTE IMMEDIATE 'SELECT d.id, e.name
FROM dept_new d, TABLE(d.emps) e -- not allowed in static SQL
-- in PL/SQL
WHERE e.id = 1'
INTO deptid, ename;
You can do this without dynamic SQL, assuming both tables (part and p_part) are available at compile time:
select a.amount
,b.sal
,a.name
,b.address
from alloc a
,part b
where a.id=b.id
and (select getdatabase() from dual) = 'internal'
UNION ALL
select a.amount
,b.sal
,a.name
,b.address
from alloc a
,p_part b
where a.id=b.id
and (select getdatabase() from dual) = 'external'
;
I've put the function call in a subquery so that it is run only once per call (i.e. twice, in this instance).
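The UNION ALL approach can be tried out with a small Python sketch against an in-memory SQLite database. This is an illustration under assumptions, not Oracle code: SQLite has no dual table, so the scalar subquery is written as (SELECT getdatabase()), getdatabase is registered as a Python function, and the sample rows are made up.

```python
import sqlite3


def query_parts(database_kind):
    """Run the UNION ALL query with getdatabase() returning database_kind."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for the PL/SQL function from the question.
    conn.create_function("getdatabase", 0, lambda: database_kind)
    conn.executescript("""
        CREATE TABLE alloc (id INTEGER, amount INTEGER, name TEXT);
        CREATE TABLE part (id INTEGER, sal INTEGER, address TEXT);
        CREATE TABLE p_part (id INTEGER, sal INTEGER, address TEXT);
        INSERT INTO alloc VALUES (1, 100, 'widget');
        INSERT INTO part VALUES (1, 10, 'internal-address');
        INSERT INTO p_part VALUES (1, 20, 'external-address');
    """)
    rows = conn.execute("""
        SELECT a.amount, b.sal, a.name, b.address
        FROM alloc a, part b
        WHERE a.id = b.id AND (SELECT getdatabase()) = 'internal'
        UNION ALL
        SELECT a.amount, b.sal, a.name, b.address
        FROM alloc a, p_part b
        WHERE a.id = b.id AND (SELECT getdatabase()) = 'external'
    """).fetchall()
    conn.close()
    return rows


print(query_parts("internal"))  # rows come from part
print(query_parts("external"))  # rows come from p_part
```

Only one branch of the UNION ALL can satisfy its filter for a given return value, so the result always comes from exactly one of the two tables, or from neither if the function returns something else.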