How to write the result of a union of a column from two tables in a SELECT query in Hive - bigdata

I am writing a SELECT statement in Hive, and one field in the select list needs to be the union of a column from two tables. The joins themselves are already in place in the query; what I am stuck on is how to use the union of a column from the two tables. Please refer to the SQL query below:
SELECT t1.a, t2.b FROM tab1 t1 INNER JOIN tab2 t2 ON (t1.c = t2.c) INNER JOIN tab3 t3 ON (t2.b = t3.b);
From the above query I want the result below, but I am stuck here:
SELECT t1.a, t2.b, (t1.d UNION t2.d) FROM tab1 t1 INNER JOIN tab2 t2 ON (t1.c = t2.c) INNER JOIN tab3 t3 ON (t2.b = t3.b);
I have tried writing a subquery in the SELECT statement, but it does not seem to work, as Hive does not support a subquery within the SELECT statement. I have also tried writing a WITH statement first and using its result in the SELECT statement, but that does not seem to work either.

Related

Subquery with alias in SQLite

I have a query that I run in PostgreSQL like this:
select
c_count, count(*) as custdist
from (
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
)as c_orders (c_custkey, c_count)
group by
c_count
order by
custdist desc,
c_count desc;
And I wanted to run it on SQLite, but I got this error: Error: near "(": syntax error. Maybe it doesn't recognize the alias c_orders (c_custkey, c_count).
Is there any way to rewrite this query to execute in SQLite?
SQLite does not allow redefining or renaming the columns of a nested, aliased query. You can do that with a WITH clause (i.e. a Common Table Expression; CTE), or you can add aliases to the nested query's columns directly using the AS keyword.
Interestingly, this is exactly how the outer query's columns are named, so just use the same pattern for the nested query. I don't use PostgreSQL, but why not add aliases directly on each column rather than complicate things by using a different syntax for each part of the query?
select
c_count, count(*) as custdist
from (
select
c_custkey,
count(o_orderkey) AS c_count
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) AS c_orders
group by
c_count
order by
custdist desc,
c_count desc;
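To check that the two options mentioned above behave the same, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made-up sample rows (an assumption, not data from the question), and the CTE form relies on SQLite 3.8.3 or later accepting a column list after the CTE name.

```python
import sqlite3

# Hypothetical sample data: three customers, three orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        o_orderkey INTEGER PRIMARY KEY,
        o_custkey  INTEGER,
        o_comment  TEXT
    );
    INSERT INTO customer VALUES (1), (2), (3);
    INSERT INTO orders VALUES
        (10, 1, 'fast'),
        (11, 1, 'special handling requests'),
        (12, 2, 'ok');
""")

# Option 1: alias the aggregate inside the nested query with AS.
inline_sql = """
    SELECT c_count, COUNT(*) AS custdist
    FROM (
        SELECT c_custkey, COUNT(o_orderkey) AS c_count
        FROM customer LEFT OUTER JOIN orders
          ON c_custkey = o_custkey
         AND o_comment NOT LIKE '%special%requests%'
        GROUP BY c_custkey
    ) AS c_orders
    GROUP BY c_count
    ORDER BY custdist DESC, c_count DESC
"""

# Option 2: name the columns in a CTE column list instead.
cte_sql = """
    WITH c_orders(c_custkey, c_count) AS (
        SELECT c_custkey, COUNT(o_orderkey)
        FROM customer LEFT OUTER JOIN orders
          ON c_custkey = o_custkey
         AND o_comment NOT LIKE '%special%requests%'
        GROUP BY c_custkey
    )
    SELECT c_count, COUNT(*) AS custdist
    FROM c_orders
    GROUP BY c_count
    ORDER BY custdist DESC, c_count DESC
"""

inline_rows = conn.execute(inline_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
print(inline_rows)  # [(1, 2), (0, 1)]
print(cte_rows)     # [(1, 2), (0, 1)]
```

With this sample data, two customers have one qualifying order and one customer has none, and both forms return the same rows.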

CREATE TABLE with CTE statement

Is it possible to create a table from a query with a CTE statement?
Something like:
CREATE TABLE db1.test1 AS
(
WITH cte1(v1) as
( SEL v1 FROM db1.table1 )
SEL * FROM cte1
)
This is what the CTEs look like:
WITH employees(id, name, boss, senior_boss) AS
(
SEL
empls.id,
empls.name,
supervisors.name as boss,
senior_bosses.name as senior_boss
FROM empl_cte AS empls
LEFT JOIN empl_cte AS supervisors ON empls.boss_id = supervisors.id
LEFT JOIN empl_cte AS senior_bosses ON supervisors.boss_id = senior_bosses.id
),
empl_cte(....) AS
(
SEL
id,
name,
boss_id
FROM all_employees
WHERE <some_filters>
)
SEL
*
FROM products
LEFT JOIN employees ON products.sales_rep_id = employees.id
Both converting the CTEs into views and converting employees into a sub-query in the LEFT JOIN (with empl_cte as a VIEW) lead to a massive loss of performance (run time blowing up from a couple of minutes to days). I can't figure out how the Teradata optimizer works.
EXPLAIN on the refactored queries seems to indicate that the LEFT JOIN becomes a product join, which takes an enormous amount of time.
This will work in V16 (and possibly earlier versions).
CREATE TABLE myTable AS (
SELECT * FROM (
WITH x AS (
SELECT ...
FROM ...
WHERE ...
)
SELECT ...
FROM x ...
WHERE ...
) D
) WITH DATA PRIMARY INDEX (PK)
;
Basically you need to wrap the whole query, including the CTE, in a SELECT with an alias.
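The shape of that wrapper can be tried out locally. The following is only a rough analogue in SQLite via Python's sqlite3 module, not Teradata: the `WITH DATA PRIMARY INDEX` clause is Teradata-specific and is left out, and table1 with its three rows is hypothetical sample data. SQLite would also accept the CTE directly after AS, so this only illustrates the "CTE inside an aliased derived table" structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table standing in for db1.table1.
    CREATE TABLE table1 (v1 INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);

    -- The whole query, CTE included, sits inside a derived table aliased D.
    CREATE TABLE test1 AS
    SELECT * FROM (
        WITH cte1(v1) AS (
            SELECT v1 FROM table1
        )
        SELECT v1 FROM cte1 WHERE v1 > 1
    ) D;
""")

rows = conn.execute("SELECT v1 FROM test1 ORDER BY v1").fetchall()
print(rows)  # [(2,), (3,)]
```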

How to reuse a table with UNION?

I am trying to reuse a table in SQLite. My attempt is as follows:
SELECT
Partials.e_sentence
FROM
(SELECT
e_sentence, _id
FROM
Pair
JOIN PairCategories
ON
_id=PairId AND CategoryId=53
UNION
SELECT
e_sentence, _id
FROM
Pair
WHERE
e_sentence LIKE '%' || 'how often' || '%'
GROUP BY
e_sentence)
AS Parents JOIN Partials
ON Parents._id=ParentId
UNION
SELECT
e_sentence
FROM
Parents
The key part I am trying to accomplish is at the bottom, where I try to UNION a table created in the previous statement. Is there a way to do this in SQLite, or am I forced to repeat the query that made the Parents table in the first half of the UNION?
In SQLite 3.8.3 or later, you can use a common table expression:
WITH Parents AS (
SELECT e_sentence, _id
FROM Pair
JOIN PairCategories
...
)
SELECT Partials.e_sentence
FROM Parents
JOIN Partials ON Parents._id = ParentId
UNION
SELECT e_sentence
FROM Parents;
If you're using an older SQLite (probably because you're using an older Android), you can create a view for the subquery:
CREATE VIEW Parents AS
SELECT e_sentence, _id
FROM Pair
JOIN PairCategories
...;
SELECT Partials.e_sentence
FROM Parents
JOIN Partials ON Parents._id = ParentId
UNION
SELECT e_sentence
FROM Parents;
If you do not want to have this view permanently in the database, you could make it temporary (CREATE TEMPORARY VIEW ...) so that it is not available outside the current database connection, or, as a last resort, you could just insert the subquery wherever you would use Parents:
SELECT Partials.e_sentence
FROM (SELECT ...) AS Parents
JOIN Partials ON Parents._id = ParentId
UNION
SELECT e_sentence
FROM (SELECT ...) AS Parents;
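For a runnable version of the CTE variant, here is a small sketch using Python's sqlite3 module. The table layouts and rows are assumptions made up for illustration, and the GROUP BY from the question's subquery is dropped to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and rows, guessed from the column names in the question.
    CREATE TABLE Pair (_id INTEGER PRIMARY KEY, e_sentence TEXT);
    CREATE TABLE PairCategories (PairId INTEGER, CategoryId INTEGER);
    CREATE TABLE Partials (ParentId INTEGER, e_sentence TEXT);

    INSERT INTO Pair VALUES
        (1, 'How often do you run?'),
        (2, 'Where is it?'),
        (3, 'Good morning');
    INSERT INTO PairCategories VALUES (2, 53), (3, 7);
    INSERT INTO Partials VALUES (1, 'How often'), (2, 'Where'), (3, 'Good');
""")

# Parents is defined once and referenced by both halves of the UNION.
sql = """
    WITH Parents AS (
        SELECT e_sentence, _id
        FROM Pair
        JOIN PairCategories ON _id = PairId AND CategoryId = 53
        UNION
        SELECT e_sentence, _id
        FROM Pair
        WHERE e_sentence LIKE '%' || 'how often' || '%'
    )
    SELECT Partials.e_sentence
    FROM Parents
    JOIN Partials ON Parents._id = ParentId
    UNION
    SELECT e_sentence
    FROM Parents
"""

# Sort in Python so the check does not depend on UNION's output order.
sentences = sorted(row[0] for row in conn.execute(sql))
print(sentences)
# ['How often', 'How often do you run?', 'Where', 'Where is it?']
```

Pair 3 is in neither branch of Parents, so neither its sentence nor its partial shows up in the result.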

Sequence in sql operation?

I am passing a datatable as an input parameter to a stored procedure. The datatable contains id, Name, Lname, Mobileno, EmpId.
The Employee table contains [Name], [Lname], [mobno], [Did] as columns.
When a user is logged in, his Id comes in as DId. There are more than 1000 records. Instead of adding that id to the datatable, I have created a separate parameter for the stored procedure. I want to add records to the Employee table that do not already exist. If the combination of mobileno and Did already exists, then don't insert into the Employee table; otherwise insert. The datatable may contain duplicate records, and I don't want to include those. I want to select only distinct records and add them to the table. I am interested in the mobile number: if there are 10 records with the same mobile number, I take the record that comes first.
Is the following code right or wrong? As far as I know, the FROM clause executes first, then the inner join, then WHERE, then SELECT. Records are fetched from the datatable, then the inner join produces a result, and the WHERE check runs against that result rather than against the datatable. So it should give me the proper output.
Create Procedure Proc_InsertEmpDetails
@tblEmp EmpType READONLY,
@DId int
as
begin
INSERT INTO Employee
([Name],[Lname],[mobno],[Did])
SELECT A.[Name],A.[Lname],A.[mobno], @DId
FROM @tblEmp A
Inner join (
select min(Id) as minID, mobno from @tblEmp group by mobno
) MinIDTbl
on MinIDTbl.minID = A.ExcelId
WHERE NOT EXISTS (SELECT 1
FROM Employee B
WHERE B.[mobno] = A.[mobno]
AND B.[Did] = @DId )
end
Or do I need to change it like this?
INSERT INTO Employee
([Name],[Lname],[mobno],[Did])
SELECT C.[Name],C.[Lname],C.[mobno], C.D_Id
from
(SELECT A.[Name],A.[Lname],A.[mobno], @DId as D_Id
FROM @tblEmp A
Inner join (
select min(Id) as minID, mobno from @tblEmp group by mobno
) MinIDTbl
on MinIDTbl.minID = A.ExcelId
)C
WHERE NOT EXISTS (SELECT 1
FROM Employee B
WHERE B.[mobno] = C.[mobno]
AND B.[Did] = @DId )
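One way to get a feel for the first form is to try its shape on a tiny data set. The sketch below is an analogue in SQLite through Python's sqlite3 module, not SQL Server: an ordinary table named tblEmp stands in for the table-valued parameter, the join uses Id on both sides, and all rows are invented for illustration. It says nothing definitive about SQL Server's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, Lname TEXT, mobno TEXT, Did INTEGER);
    -- Plain table standing in for the table-valued parameter (assumption).
    CREATE TABLE tblEmp (Id INTEGER, Name TEXT, Lname TEXT, mobno TEXT);

    -- Mobile 111 already exists for Did 5.
    INSERT INTO Employee VALUES ('Old', 'One', '111', 5);

    -- Incoming rows: 111 already exists, 222 appears twice, 333 is new.
    INSERT INTO tblEmp VALUES
        (1, 'Ann',   'A',  '111'),
        (2, 'Bob',   'B',  '222'),
        (3, 'Bobby', 'B2', '222'),
        (4, 'Cid',   'C',  '333');
""")

# Same shape as the first form: keep the lowest Id per mobile number,
# then skip rows whose (mobno, Did) pair is already in Employee.
conn.execute("""
    INSERT INTO Employee (Name, Lname, mobno, Did)
    SELECT A.Name, A.Lname, A.mobno, :did
    FROM tblEmp A
    JOIN (
        SELECT MIN(Id) AS minID, mobno FROM tblEmp GROUP BY mobno
    ) MinIDTbl ON MinIDTbl.minID = A.Id
    WHERE NOT EXISTS (
        SELECT 1 FROM Employee B
        WHERE B.mobno = A.mobno AND B.Did = :did
    )
""", {"did": 5})

rows = conn.execute("SELECT Name, mobno FROM Employee ORDER BY mobno").fetchall()
print(rows)  # [('Old', '111'), ('Bob', '222'), ('Cid', '333')]
```

In this small case the filter is applied to the joined, de-duplicated rows without the extra derived table C: only the first 222 row and the new 333 row are inserted.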

Sqlite double left outer join with count

I have the following DB structure:
tbl_record(_id,_id_user,...)
tbl_photo(_id,_id_record,...)
tbl_note(_id,_id_record,...)
When listing the records of a specific user while counting the number of photos a record has, I use the following query, which works fine:
SELECT tbl_record._id, COUNT(tbl_photo._id_record) AS photo_count FROM tbl_record
LEFT OUTER JOIN tbl_photo ON tbl_record._id=tbl_photo._id_record
WHERE tbl_record._id_user=? GROUP BY tbl_record._id;
Now, I'd like to do the same as above, but also count the number of notes a record has:
SELECT tbl_record._id, COUNT(tbl_photo._id_record) AS photo_count, COUNT(tbl_note._id_record) AS note_count FROM tbl_record
LEFT OUTER JOIN tbl_photo ON tbl_record._id=tbl_photo._id_record
LEFT OUTER JOIN tbl_note ON tbl_record._id=tbl_note._id_record
WHERE tbl_record._id_user=? GROUP BY tbl_record._id;
The count of the 2nd query does not work properly when a record has >0 photos & >0 notes, e.g. 3 photos & 5 notes, which results in a count of 15 (3*5) for each.
Any idea how to make the 2nd query return the proper counts?
Thanks!!
You might be able to filter out duplicates by using COUNT(DISTINCT some_id), but this would be inefficient.
Better use correlated subqueries:
SELECT _id,
(SELECT COUNT(*)
FROM tbl_photo
WHERE _id_record = tbl_record._id
) AS photo_count,
(SELECT COUNT(*)
FROM tbl_note
WHERE _id_record = tbl_record._id
) AS note_count
FROM tbl_record
WHERE _id_user = ?
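To see the multiplication effect and both fixes side by side, here is a short sketch with Python's sqlite3 module. The rows are invented: record 1 has 3 photos and 5 notes, record 2 has no photos and 2 notes, and both belong to a hypothetical user 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_record (_id INTEGER PRIMARY KEY, _id_user INTEGER);
    CREATE TABLE tbl_photo (_id INTEGER PRIMARY KEY, _id_record INTEGER);
    CREATE TABLE tbl_note (_id INTEGER PRIMARY KEY, _id_record INTEGER);
    INSERT INTO tbl_record VALUES (1, 7), (2, 7);
""")
conn.executemany("INSERT INTO tbl_photo (_id_record) VALUES (?)", [(1,)] * 3)
conn.executemany("INSERT INTO tbl_note (_id_record) VALUES (?)",
                 [(1,)] * 5 + [(2,)] * 2)

# Two LEFT JOINs pair every photo with every note before counting.
join_sql = """
    SELECT tbl_record._id,
           COUNT(tbl_photo._id_record) AS photo_count,
           COUNT(tbl_note._id_record) AS note_count
    FROM tbl_record
    LEFT OUTER JOIN tbl_photo ON tbl_record._id = tbl_photo._id_record
    LEFT OUTER JOIN tbl_note ON tbl_record._id = tbl_note._id_record
    WHERE tbl_record._id_user = ?
    GROUP BY tbl_record._id
    ORDER BY tbl_record._id
"""

# Same joins, but counting distinct child ids removes the duplicates.
distinct_sql = join_sql.replace(
    "COUNT(tbl_photo._id_record)", "COUNT(DISTINCT tbl_photo._id)"
).replace(
    "COUNT(tbl_note._id_record)", "COUNT(DISTINCT tbl_note._id)"
)

# Correlated subqueries count each child table independently.
correlated_sql = """
    SELECT _id,
           (SELECT COUNT(*) FROM tbl_photo
            WHERE _id_record = tbl_record._id) AS photo_count,
           (SELECT COUNT(*) FROM tbl_note
            WHERE _id_record = tbl_record._id) AS note_count
    FROM tbl_record
    WHERE _id_user = ?
    ORDER BY _id
"""

naive = conn.execute(join_sql, (7,)).fetchall()
distinct = conn.execute(distinct_sql, (7,)).fetchall()
correlated = conn.execute(correlated_sql, (7,)).fetchall()
print(naive)       # [(1, 15, 15), (2, 0, 2)]
print(distinct)    # [(1, 3, 5), (2, 0, 2)]
print(correlated)  # [(1, 3, 5), (2, 0, 2)]
```

Record 2 comes out right even in the naive query because it has no photos to multiply against, which is why the problem only shows up when both counts are above zero.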
