Hive Distinct count over array of maps - count

I am new to Hive and I am trying to count the distinct words_value entries across the whole words column.
id                  words
435400064446779392  [{"words_value":"i","words_id":"1"},{"words_value":"hate","words_id":"2"}]
Notice that the words column is an array. I have many more rows; the one above is just an example.
I have tried:
SELECT words.words_value,count(words.words_value) from T1 GROUP BY words.words_value WITH ROLLUP;
But this counts within each row rather than across the whole column.
Does anyone have any idea?

The explode UDTF is useful for converting nested data structures into ordinary tables that work with ordinary SQL statements. Since you have an array of maps you would need to use explode twice.
select count(distinct value) from
( select explode(col) from
( select explode(words) from mytable ) subquery1
) subquery2
where
key = "words_value";
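To make the two explode steps concrete, here is a rough Python sketch of the same logic on in-memory data. It is not Hive, and the sample rows and the function name count_distinct_values are made up for illustration. The outer loop over the array plays the role of the first explode, and the loop over each map's items plays the role of the second.

```python
def count_distinct_values(rows, wanted_key="words_value"):
    """Mimic explode(array), then explode(map), then count(distinct value)."""
    distinct = set()
    for row in rows:
        for word_map in row["words"]:            # first explode: one map per row
            for key, value in word_map.items():  # second explode: one key/value per row
                if key == wanted_key:            # where key = "words_value"
                    distinct.add(value)
    return len(distinct)


# Hypothetical sample data shaped like the table in the question.
rows = [
    {"id": 435400064446779392,
     "words": [{"words_value": "i", "words_id": "1"},
               {"words_value": "hate", "words_id": "2"}]},
    {"id": 1,
     "words": [{"words_value": "i", "words_id": "1"}]},
]

print(count_distinct_values(rows))
```

The word "i" appears in both rows but is counted once, which is the cross-row behaviour the question asks for.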

Related

Explode an Array in Athena

I have a simple table in athena, it has an array of events. I want to write a simple select statement so that each event in array becomes a row.
I tried explode and transform, but with no luck. I have done this successfully in Spark and Hive, but Athena is tripping me up. Please advise.
DROP TABLE bi_data_lake.royalty_v4;
CREATE external TABLE bi_data_lake.royalty_v4 (
KAFKA_ID string,
KAFKA_TS string,
deviceUser struct< deviceName:string, devicePlatform:string >,
consumeReportingEvents array<
struct<
consumeEvent: string,
consumeEventAction: string,
entryDateTime: string
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://XXXXXXXXXXX';
Query that does not work:
select kafka_id, kafka_ts,deviceuser,
transform( consumereportingevents, consumereportingevent -> consumereportingevent.consumeevent) as cre
from bi_data_lake.royalty_v4
where kafka_id = 'events-consumption-0-490565';
Not supported in Athena:
lateral view explode(consumereportingevents) as consumereportingevent
I found the answer to my question: use UNNEST.
WITH samples AS (
select kafka_id, kafka_ts,deviceuser, consumereportingevent, consumereportingeventPos
from bi_data_lake.royalty_v4
cross join unnest(consumereportingevents) WITH ORDINALITY AS T (consumereportingevent, consumereportingeventPos)
where kafka_id = 'events-consumption-0-490565' or kafka_id = 'events-consumption-0-490566'
)
SELECT * FROM samples
Flatten ('explode') nested arrays in AWS Athena with UNNEST.
WITH dataset AS (
SELECT
'engineering' as department,
ARRAY['Sharon', 'John', 'Bob', 'Sally'] as users
)
SELECT department, names FROM dataset
CROSS JOIN UNNEST(users) as t(names)
Reference: Flattening Nested Arrays
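As a rough mental model of what CROSS JOIN UNNEST does, the Python sketch below turns each array element into its own row and optionally adds a 1-based position, similar to WITH ORDINALITY. This is only an illustration on in-memory dictionaries; the helper name cross_join_unnest and its parameters are hypothetical and not part of Athena.

```python
def cross_join_unnest(rows, array_col, element_name, position_name=None):
    """Return one output row per array element, like CROSS JOIN UNNEST.

    Rows whose array is empty produce no output, as with a cross join.
    If position_name is given, a 1-based position is added to each row.
    """
    result = []
    for row in rows:
        scalars = {k: v for k, v in row.items() if k != array_col}
        for position, element in enumerate(row[array_col], start=1):
            out = dict(scalars)
            out[element_name] = element
            if position_name is not None:
                out[position_name] = position
            result.append(out)
    return result


# Same sample data as the Athena query above.
dataset = [{"department": "engineering",
            "users": ["Sharon", "John", "Bob", "Sally"]}]

flat = cross_join_unnest(dataset, "users", "names", position_name="pos")
for row in flat:
    print(row)
```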

Insert into Table with the first column being a Sequence

I am trying to get an INSERT, a sequence, and SELECT * to work together.
INSERT INTO BRK_INDV
Select * from (Select brk_seq.NEXTVAL as INDV_SEQ, a.*
FROM (select to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY') BUSINESS_DAY, to_char(REQUEST_DATETIME,'hh24') src_hour,
CASE tran_type
WHEN 'V' THEN 'Visa'
WHEN 'M' THEN 'MasterCard'
ELSE tran_type
end text,
tran_type, count(*) as count
from DLY_STATS
where 1=1
AND to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY') = '09-FEB-2015'
group by to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY'),to_char(REQUEST_DATETIME,'hh24'),tran_type order by src_hour)a);
This gives me the following error:
ERROR at line 2:
ORA-02287: sequence number not allowed here
I tried to remove the order by and still the same error.
However, if I only run
Select brk_seq.NEXTVAL as INDV_SEQ, a.*
FROM (select to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY') BUSINESS_DAY, to_char(REQUEST_DATETIME,'hh24') src_hour,
CASE tran_type
WHEN 'V' THEN 'Visa'
WHEN 'M' THEN 'MasterCard'
ELSE tran_type
end text,
tran_type, count(*) as count
from DLY_STATS
where 1=1
AND to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY') = '09-FEB-2015'
group by to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY'),to_char(REQUEST_DATETIME,'hh24'),tran_type order by src_hour)a;
It shows me proper entries. Then, why is select * not working for that?
Kindly help.
I see what you're trying to do. You want to insert rows into the BRK_INDV table in a particular order. The sequence number, which I assume will be the primary key of BRK_INDV, will be generated sequentially in the sorted order of the input rows.
You are working with a relational database. One of the first characteristics we all learn about a relational database is that the order of the rows in a table is insignificant. That's just a fancy word for fugitaboutit.
You cannot assume that a select * from table will return the rows in the same order they were written. It might. It might for quite a long time. Then something -- the number of rows, the grouping of some column values, the phase of the moon -- something will change and you will get them out in a seemingly totally random order.
If you want order, it must be imposed in the query, not the insert.
Here's the statement you should be executing:
INSERT INTO BRK_INDV
With
Grouped( Business_Day, Src_Hour, Text, Tran_Type, Count )As(
Select Trunc( Request_Datetime ) Business_Day,
To_Char( Request_Datetime, 'hh24') Src_Hour,
Case Tran_Type
When 'V' Then 'Visa'
When 'M' Then 'MasterCard'
Else Tran_Type
end Text,
Tran_Type, count(*) as count
from DLY_STATS
Where 1=1 --> Generated as dynamic SQL?
And Request_Datetime >= Date '2015-02-09'
And Request_Datetime < Date '2015-02-10'
Group By Trunc( Request_Datetime ), To_Char( Request_Datetime, 'hh24'), Tran_Type
)
Select brk_seq.Nextval Indv_Seq, G.*
from Grouped G;
Notice there is no order by. If you want to see the generated rows in a particular order:
select * from Brk_Indv order by src_hour;
Since there could be hundreds or thousands of transactions in any particular hour, you will probably want to order by something other than hour anyway.
In Oracle, the trunc function is the best way to get a date with the time portion stripped away. However, you don't want to apply it (or, as a matter of fact, any other function such as to_date or to_char) to the column in the where clause, as that would make the clause non-sargable and result in a complete table scan.
The problem is that you can't use a sequence in a subquery. For example, this gives the same ORA-02287 error you are getting:
create table T (x number);
create sequence s;
insert into T (select * from (select s.nextval from dual));
What you can do, though, is create a function that returns nextval from the sequence, and use that in a subquery:
create function f return number as
begin
return s.nextval;
end;
/
insert into T (select * from (select f() from dual));

Sqlite Query replacing a column with a column from another table

I have 2 tables, one is indexing the other.
I am querying Table#1, and it has one column (string) that holds an ID corresponding to a unique row in Table#2. I'm trying to write a query in Sqlite that retrieves the value from Table#2 if the column value in Table#1 is not an empty string.
Kinda like:
"SELECT TMake,TModel,TTrim,IYear,[%q] AS TPart1 FROM AppGuide WHERE TPart1 != ''"
But instead of retrieving the index value (TPart1), I'd like to get the string from Table#2.
Is this possible?
Any help is appreciated.
You could use a correlated subquery:
SELECT TMake,
TModel,
...,
(SELECT stringvalue
FROM Table2
WHERE Table2.ID = Table1.TPart1)
FROM Table1
WHERE Table1.TPart1 != ''
However, correlated subqueries tend to be slow to execute, so a join is usually the better choice (this returns the same result as long as Table2.ID is unique):
SELECT Table1.TMake,
Table1.TModel,
...,
Table2.stringvalue
FROM Table1 LEFT JOIN Table2 ON Table1.TPart1 = Table2.ID
WHERE Table1.TPart1 != ''
If you don't want to get records from Table1 that have no matching Table2 record, drop the LEFT.
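The difference between the three variants can be checked with a small script using Python's built-in sqlite3 module. The table contents below are invented for illustration, and Table2.ID is assumed to be unique, as the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (ID TEXT PRIMARY KEY, stringvalue TEXT);
    CREATE TABLE Table1 (TMake TEXT, TModel TEXT, TPart1 TEXT);
    INSERT INTO Table2 VALUES ('p1', 'Brake pad'), ('p2', 'Oil filter');
    -- 'p9' has no match in Table2; '' is filtered out by the WHERE clause.
    INSERT INTO Table1 VALUES
        ('Honda', 'Civic', 'p1'),
        ('Ford',  'Focus', ''),
        ('Audi',  'A4',    'p9');
""")

subquery_rows = conn.execute("""
    SELECT TMake, TModel,
           (SELECT stringvalue FROM Table2 WHERE Table2.ID = Table1.TPart1)
    FROM Table1
    WHERE Table1.TPart1 != ''
    ORDER BY TMake
""").fetchall()

left_join_rows = conn.execute("""
    SELECT Table1.TMake, Table1.TModel, Table2.stringvalue
    FROM Table1 LEFT JOIN Table2 ON Table1.TPart1 = Table2.ID
    WHERE Table1.TPart1 != ''
    ORDER BY Table1.TMake
""").fetchall()

inner_join_rows = conn.execute("""
    SELECT Table1.TMake, Table1.TModel, Table2.stringvalue
    FROM Table1 JOIN Table2 ON Table1.TPart1 = Table2.ID
    WHERE Table1.TPart1 != ''
    ORDER BY Table1.TMake
""").fetchall()

print(subquery_rows)
print(left_join_rows)
print(inner_join_rows)
```

With this data the subquery and the LEFT JOIN both keep the unmatched 'Audi' row with a NULL string value, while the plain JOIN drops it.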

Multiple query from multiple tables in sqlite

I am using the query below, but it returns no values, so I think there is a mistake in it. What I actually want to know is how to use multiple sums, multiplications, etc. across multiple tables in sqlite.
SELECT
dhid, dprice, dname,
SUM(dmilk) AS totalmilk,
dprice*SUM(dmilk) AS totalmilkamt,
SUM(ghee) AS toalghee,
SUM(ghee*gheeprice) AS totalgheeamt,
SUM(ghee*gheeprice)+dprice*SUM(dmilk) AS totals,
SUM(cashamount) AS totalcash,
SUM(ghee*gheeprice)+dprice*SUM(dmilk)-SUM(cashamount) AS balance
FROM
( SELECT *
FROM costumer
LEFT OUTER JOIN salesdata
ON costumer.dhid=salesdata.ddhid
LEFT OUTER JOIN cashdata
ON salesdata.ddhid=cashdata.uid
AND utype='costumer')
WHERE dmonth='$mikdatem'
AND dyear='$mikdatey'
AND dhid='$dhid'
ORDER BY dhid ASC
Your select on its own does not help us much, because without the underlying data we can't tell what you want to achieve.
So the general answer is this:
When using aggregate functions (SUM, COUNT, ...), you need a GROUP BY on the columns that are not inside those aggregate functions.
For example:
SELECT name, sum(dmilk)
FROM milk_entry
GROUP BY name
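A quick way to see the effect of GROUP BY is to run the query against a throwaway in-memory SQLite database from Python. The milk_entry rows and the dprice column below are made up for illustration. Note that SQLite accepts the query without GROUP BY as well, but then collapses all rows into a single result row, which is rarely what is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE milk_entry (name TEXT, dmilk REAL, dprice REAL);
    INSERT INTO milk_entry VALUES
        ('asha', 2.0, 10.0),
        ('asha', 3.0, 10.0),
        ('ravi', 1.5, 12.0);
""")

# One result row per name, each with its own totals.
totals = conn.execute("""
    SELECT name,
           SUM(dmilk)          AS totalmilk,
           SUM(dmilk * dprice) AS totalmilkamt
    FROM milk_entry
    GROUP BY name
    ORDER BY name
""").fetchall()

# Without GROUP BY the aggregate runs over the whole table: a single row.
ungrouped = conn.execute(
    "SELECT name, SUM(dmilk) FROM milk_entry"
).fetchall()

print(totals)
print(len(ungrouped))
```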

MySQL Changing Order Depending On Contents of a Column

I have a MySQL table Page with 2 columns: PageID and OrderByMethod.
I also then have a Data table with lots of columns including PageID (the Page the data is on), DataName, and DataDate.
I want OrderByMethod to have one of three entries: Most Recent Data First, Most Recent Data Last, and Alphabetically.
Is there a way for me to tack an "ORDER BY" clause to the end of this query that will vary its ordering method based on the contents of the "OrderByMethod" column? For example, in this query, I would want to have the ORDER BY clause contain whatever ordering rule is stored in Page 1's OrderByMethod column.
SELECT * FROM `Data` WHERE `Data`.`PageID`=1 ORDER BY xxxxxx;
Maybe a SELECT clause in the ORDER BY clause? I'm not sure how that would work though.
Thanks!
select Data.*
from Data
inner join Page on (Data.PageID=Page.PageID)
where Data.PageID=1
order by
if(Page.OrderByMethod='Most Recent Data First', now()-DataDate,
if(Page.OrderByMethod='Most Recent Data Last', DataDate-now(), DataName)
);
You can probably do this with the IF syntax to generate a column that you can then order by.
SELECT *,
IF(Page.OrderByMethod = 'Alphabetically', Data.DataName,
IF(Page.OrderByMethod = 'Most Recent Data First', NOW() - Data.DataDate, Data.DataDate - NOW())) AS OrderColumn
FROM Data
INNER JOIN Page ON Data.PageID = Page.PageID
WHERE Page.PageID = 1
ORDER BY OrderColumn
The direction of the ordering is determined in the calculation of the value itself, instead of by specifying a direction in the ORDER BY.
Can you just append the order by clause to the select statement and rebind the table on postback?
If you want to use the content of the column in the Page table as an expression in ORDER BY, you have to do it using prepared statements. Let's say you store something like "field1 DESC, field2 ASC" in OrderByMethod and you want this string to be used as is:
SET @order_by = (SELECT OrderByMethod FROM Page WHERE id = [value]);
SET @qr = CONCAT(your original query, ' ORDER BY ', @order_by);
PREPARE stmt FROM @qr;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
If you want the result set to be sorted based on the value of OrderByMethod, you can use IF as already mentioned by others, or CASE:
...
ORDER BY
CASE OrderByMethod
WHEN 'val1' THEN field_name1
WHEN 'val2' THEN field_name2
....etc
END
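The CASE approach can be tried out with Python's built-in sqlite3 module, used here only as a stand-in for MySQL. This sketch is a variation on the answer above rather than a copy of it: it uses one CASE expression per sort key, so that each key has a single type and its own direction, and a key whose condition does not match evaluates to NULL for every row and has no effect. The sample rows are invented, and DataDate is stored as ISO text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Page (PageID INTEGER PRIMARY KEY, OrderByMethod TEXT);
    CREATE TABLE Data (PageID INTEGER, DataName TEXT, DataDate TEXT);
    INSERT INTO Page VALUES
        (1, 'Most Recent Data First'),
        (2, 'Most Recent Data Last'),
        (3, 'Alphabetically');
""")

# The same three hypothetical data rows are placed on every page.
sample = [("b", "2020-01-03"), ("a", "2020-01-01"), ("c", "2020-01-02")]
for page_id in (1, 2, 3):
    conn.executemany(
        "INSERT INTO Data VALUES (?, ?, ?)",
        [(page_id, name, date) for name, date in sample],
    )

QUERY = """
    SELECT Data.DataName
    FROM Data
    INNER JOIN Page ON Data.PageID = Page.PageID
    WHERE Data.PageID = ?
    ORDER BY
        CASE WHEN Page.OrderByMethod = 'Most Recent Data First'
             THEN Data.DataDate END DESC,
        CASE WHEN Page.OrderByMethod = 'Most Recent Data Last'
             THEN Data.DataDate END ASC,
        CASE WHEN Page.OrderByMethod = 'Alphabetically'
             THEN Data.DataName END ASC
"""


def names_for_page(page_id):
    """Return the DataName values for one page in that page's stored order."""
    return [row[0] for row in conn.execute(QUERY, (page_id,))]


for page_id in (1, 2, 3):
    print(page_id, names_for_page(page_id))
```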

Resources