HiveQL, Hive SQL select date range - datetime

This seems simple in SQL, but I'm having trouble with date ranges in HiveQL.
I have a dataset like this:
hive> describe logs;
id string,
ts string,
app_id int
hive> select * from logs limit 5;
1389 2014-10-05 13:57:01 12
1656 2014-10-06 03:57:59 15
1746 2014-10-06 10:58:25 19
1389 2014-10-09 08:57:01 12
1656 2014-10-10 01:57:59 15
My goal is to get the distinct ids for the last 3 days. Ideally the query would read the current system time and return the unique ids from the last 3 days, but I'm not sure where to put unix_timestamp(). Since the log is recorded in real time and ts contains today's date, I tried this query (first approach):
hive> SELECT distinct id FROM logs HAVING to_date(ts) > date_sub(max(ts), 3) and to_date(ts) < max(ts);
FAILED: SemanticException [Error 10025]: Line 1:45 Expression not in GROUP BY key 'ts'
If I add GROUP BY ts as shown below, I get this error:
hive> SELECT distinct ext FROM pas_api_logs group by ts HAVING to_date(ts) > date_sub(max(ts), 7) and to_date(ts) < max(ts);
FAILED: SemanticException 1:47 SELECT DISTINCT and GROUP BY can not be in the same query. Error encountered near token 'ts'
After numerous attempts, my last approach was this one, which I based on a similar topic:
Select distinct id from (SELECT * FROM logs JOIN logs ON (max(logs.ts) = to_date(logs.ts))
UNION ALL
SELECT * FROM logs JOIN logs ON (to_date(logs.ts) = date_sub(max(logs.ts), 1))
UNION ALL
SELECT * FROM logs JOIN logs ON (to_date(logs.ts) = date_sub(max(logs.ts), 2)));
Apparently this doesn't work either. Can someone shed some light on this?

The required result can be obtained by using this statement:
select distinct id from logs where DATEDIFF(from_unixtime(unix_timestamp()),ts) <= 3;
Hope it helps!
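To see which rows a day-difference filter like the one above keeps, here is a minimal Python sketch on the sample rows from the question. It assumes that the difference is taken between the date parts only (time of day ignored); the helper name `datediff` and the fixed "current time" are made up for the demo and are not Hive APIs.

```python
from datetime import datetime

def datediff(end: str, start: str) -> int:
    """Whole days between the date parts of two 'YYYY-MM-DD HH:MM:SS' strings."""
    fmt = "%Y-%m-%d %H:%M:%S"
    end_day = datetime.strptime(end, fmt).date()
    start_day = datetime.strptime(start, fmt).date()
    return (end_day - start_day).days

# Sample rows from the question: (id, ts, app_id)
logs = [
    ("1389", "2014-10-05 13:57:01", 12),
    ("1656", "2014-10-06 03:57:59", 15),
    ("1746", "2014-10-06 10:58:25", 19),
    ("1389", "2014-10-09 08:57:01", 12),
    ("1656", "2014-10-10 01:57:59", 15),
]

now = "2014-10-10 12:00:00"  # assumed "current time" for the demo

# Same shape as: SELECT DISTINCT id ... WHERE datediff(now, ts) <= 3
recent_ids = sorted({row_id for row_id, ts, _ in logs if datediff(now, ts) <= 3})
print(recent_ids)  # ['1389', '1656']
```

Only the rows from 2014-10-09 and 2014-10-10 fall inside the window, so id 1746 drops out.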

Related

How to exclude the records Using Qualify statement in Teradata

I have to build a population of people who have only one product association (ABC), using a QUALIFY statement.
For example I have the data
Id Code Prod Date
101 202 ABC 2017-05-31
101 203 DEF 2017-04-30
102 302 ABC 2018-06-30
From the above data I need the row for Id=102, because this id has only one prod relation, whereas id 101 has both ABC and DEF and should be excluded.
I tried the following
Select id,prod from table1
Qualify row_number() over (partition by id order by Date)=1
Where prod='ABC'
With this, I get the two records in my data which I don’t want. Appreciate your help.
Select *
from table1
Qualify min(Prod) over (partition by id)='ABC'
and max(Prod) over (partition by id)='ABC'
When both MIN and MAX return the same value, ABC, the id has no other prod value.
If you want to return the id's that have one prod value (ABC) in the table, you can do something like this:
SELECT id, prod
FROM (
SELECT id, prod
FROM table1
GROUP BY id, prod -- Get unique (id, prod) combinations
QUALIFY COUNT(prod) OVER(PARTITION BY id) = 1 -- Get id's with only one prod
) src
WHERE prod = 'ABC' -- Only get rows with "ABC" prod
The key here is the order in which Teradata processes the query:
Aggregate - GROUP BY
OLAP - COUNT(prod) OVER()
QUALIFY
You may be able to move the WHERE prod = 'ABC' into the QUALIFY clause and get rid of the outer SELECT, not 100% sure.
Just use HAVING instead of QUALIFY. I don't see any need for window functions. Something like:
Select id,
min(prod) as prod
from
table1
group by
id
having count(distinct prod) = 1
and min(prod) = 'ABC'
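For anyone who wants to try the "MIN and MAX are both ABC" idea without a Teradata system, here is a rough sketch using Python's built-in sqlite3 module. SQLite has no QUALIFY, so this uses the GROUP BY / HAVING form of the same test; the column name `dt` is a stand-in for the Date column, and the table is just the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, code INTEGER, prod TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (101, 202, "ABC", "2017-05-31"),
        (101, 203, "DEF", "2017-04-30"),
        (102, 302, "ABC", "2018-06-30"),
    ],
)

# Keep an id only when its smallest and largest prod are both 'ABC',
# i.e. the id has no product other than ABC.
only_abc = conn.execute(
    """
    SELECT id
    FROM table1
    GROUP BY id
    HAVING MIN(prod) = 'ABC' AND MAX(prod) = 'ABC'
    ORDER BY id
    """
).fetchall()
print(only_abc)  # [(102,)]
```

Id 101 is dropped because its MAX(prod) is DEF.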

Get a count of rows that meet condition

SQLITE3
Task: get a data set that contains the following data - SEE NOTES BESIDE COLUMNS
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
Count() AS SpeederCount -- ?? WTF? [pseudo constraints: if Speed > Speedlimit then +1]
FROM RawSpeedLane AS sl
INNER JOIN
SpeedLaneSearchData AS slsd ON slsd.ParentId = sl.Id
INNER JOIN
Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;
The SQL above currently gives me all the data I need, EXCEPT for the last column.
This last column is supposed to be the count of records that pass specific criteria. The only way I have found to do this successfully is with a subquery. That works, but the subquery takes 4 minutes to run because, well... I suck at SQL :P No matter how many different ways I've tried to write it, it still takes forever.
Here is the long, but working version.
SELECT DISTINCT RoundedDate,
Count() AS Count,
CAST (avg(Speed) AS INT) AS AverageSpeed,
(
SELECT count()
FROM RawSpeedLane AS slr
WHERE slr.ProjectId = 72 AND
datetime( ( (strftime('%s', Start) - (strftime('%M', Start) * 60 + strftime('%S', Start) ) ) ), 'unixepoch') = sl.RoundedDate AND
Speed > p.SpeedLimit
)
AS SpeederCount
FROM SpeedLaneReportDataView AS sl
INNER JOIN
Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;
I just tried this for the last column:
(select Count() where sl.Speed > p.SpeedLimit)
but, as expected, I got 1s and 0s. I'm not really sure what to do here. Any hints or help that lead me in the right direction are very much appreciated.
I don't think SQLite has an IIF but CASE works.
This is a response to Backs' answer, but I can't comment yet.
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
SUM(CASE WHEN Speed > SpeedLimit THEN 1 ELSE 0 END) AS SpeederCount
FROM RawSpeedLane AS sl
With SUM and IIF:
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
SUM(IIF(Speed > SpeedLimit, 1, 0)) AS SpeederCount
FROM RawSpeedLane AS sl
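Here is a small runnable sketch of the conditional-aggregation idea, using Python's built-in sqlite3 module. It sticks to the CASE form so it does not depend on whether your SQLite build has IIF. The table `Readings` and its contents are invented for the demo (a simplified stand-in for the real tables), as is the speed limit of 50.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Projects (ProjectId INTEGER PRIMARY KEY, SpeedLimit INTEGER);
    CREATE TABLE Readings (ProjectId INTEGER, Rounded INTEGER, Speed INTEGER);
    INSERT INTO Projects VALUES (72, 50);
    INSERT INTO Readings VALUES
        (72, 0, 40), (72, 0, 60), (72, 0, 70),
        (72, 3600, 45), (72, 3600, 55);
    """
)

# One pass over the rows: every row adds to the count and the average,
# but only rows over the limit add 1 to SpeederCount.
rows = conn.execute(
    """
    SELECT datetime(r.Rounded, 'unixepoch') AS RoundedDate,
           COUNT(*) AS ItemCount,
           CAST(AVG(r.Speed) AS INT) AS AverageSpeed,
           SUM(CASE WHEN r.Speed > p.SpeedLimit THEN 1 ELSE 0 END) AS SpeederCount
    FROM Readings AS r
    INNER JOIN Projects AS p ON p.ProjectId = r.ProjectId
    WHERE r.ProjectId = 72
    GROUP BY RoundedDate
    ORDER BY RoundedDate
    """
).fetchall()

for row in rows:
    print(row)
```

Because the condition is evaluated inside the aggregate, no correlated subquery is needed, which is the part that was slow in the question.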

Calculating occupancy level between a date range

I'm having trouble trying to wrap my head around how to write this query to calculate the occupancy level of a hotel and then list the results by date. Consider the following type of data from a table called reservations:
Arrival Departure Guest Confirmation
08/01/2015 08/05/2015 John 13234
08/01/2015 08/03/2015 Bob 34244
08/02/2015 08/03/2015 Steve 32423
08/02/2015 08/02/2015 Mark 32411
08/02/2015 08/04/2015 Jenny 24422
Output Data would ideally look like:
Date Occupancy
08/01/2015 2
08/02/2015 4
08/03/2015 2
08/04/2015 1
08/05/2015 0
The query should also accept the date range as a variable. The part I'm stuck on, which is obviously the hardest, is getting the count per night and listing it by date at the same time.
You can generate a list of dates first. In Oracle you can do this by using connect by. This will make a recursive query. For instance, to get the next 30 days, you can select today and keep connecting until you've got the desired number of days. level indicates the level of recursion.
select trunc(sysdate) + level - 1 as THEDATE
from dual
connect by level <= 30;
On that list, you can query the number of reservations for each day in that period:
select THEDATE,
(select count(*)
from reservations r
where r.Arrival <= THEDATE and
r.Departure > THEDATE) as RESERVATIONCOUNT
from
( select trunc(sysdate) + level - 1 as THEDATE
from dual
connect by level <= 30)
Instead of generating a fixed number of dates, you can compute the limit from the data, for instance to cover at least 30 days into the future, and further if there are reservations that end later:
select THEDATE,
(select count(*)
from reservations r
where r.Arrival <= THEDATE and
r.Departure > THEDATE) as RESERVATIONCOUNT
from
( select trunc(sysdate) + level - 1 as THEDATE
from dual
connect by
level <= greatest(30, (select trunc(max(DEPARTURE) - sysdate)
from reservations)))
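The "generate the dates, then count per date" approach can be tried without Oracle. Below is a rough sketch using Python's built-in sqlite3 module, with a recursive CTE standing in for CONNECT BY. It assumes a guest occupies a room on every night from Arrival up to, but not including, Departure, that dates are stored as ISO strings, and that Jenny's departure in the sample data is 08/04/2015. The function name `occupancy` is made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations (Arrival TEXT, Departure TEXT, Guest TEXT, Confirmation INTEGER)"
)
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?, ?)",
    [
        ("2015-08-01", "2015-08-05", "John", 13234),
        ("2015-08-01", "2015-08-03", "Bob", 34244),
        ("2015-08-02", "2015-08-03", "Steve", 32423),
        ("2015-08-02", "2015-08-02", "Mark", 32411),
        ("2015-08-02", "2015-08-04", "Jenny", 24422),
    ],
)

def occupancy(conn, first_day, last_day):
    """Return (date, occupied rooms) for each day in [first_day, last_day]."""
    return conn.execute(
        """
        WITH RECURSIVE days(thedate) AS (
            SELECT :first_day
            UNION ALL
            SELECT date(thedate, '+1 day') FROM days WHERE thedate < :last_day
        )
        SELECT thedate,
               (SELECT COUNT(*)
                FROM reservations r
                WHERE r.Arrival <= thedate      -- already arrived
                  AND r.Departure > thedate     -- not yet departed
               ) AS occupancy
        FROM days
        ORDER BY thedate
        """,
        {"first_day": first_day, "last_day": last_day},
    ).fetchall()

result = occupancy(conn, "2015-08-01", "2015-08-05")
for day, rooms in result:
    print(day, rooms)
```

With the sample reservations this gives 2, 4, 2, 1, 0 for August 1 through 5, matching the output asked for in the question.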

How to compare various date formats stored as nvarchar in a column with the getdate() in SQL Server

I am using SQL Server 2008 R2
When I examined the values in the table, I found that some are stored as 2015-03-20T06:06:46 and others as 11/25/2014. How can I compare both kinds of values with getdate() in a WHERE clause?
SELECT B.Value
FROM table1 A WITH (NOLOCK)
INNER JOIN table1 B WITH (NOLOCK) ON A.id = B.id
WHERE
A.Name = 'COMPLETED_AT'
AND CONVERT(smalldatetime, A.Value) < GETDATE() - 30
AND B.Name = 'RESULT'
Getting error message
Conversion failed when converting character string to smalldatetime data type
when executing the above query
Example table structure
ID Name Value
1 Result R12344
1 Completed_At 2015-03-20T06:06:46
2 Result R23445
2 Completed_At 2014-03-20T06:06:46
3 Result R83261
3 Completed_At 11/25/2014
Column value is of nvarchar(400) datatype
The query should return the RESULT values whose COMPLETED_AT entry is more than 30 days old.
Looking forward to hearing from you.
I finally found the answer myself after going through this link:
https://msdn.microsoft.com/en-us/library/ms187928.aspx
CONVERT(nvarchar(400),A.value,126) < CONVERT(nvarchar(400),GETDATE()-30,126)
With the above condition in the WHERE clause I no longer get an error, and the result is correct for my data. Be aware that both sides are compared as strings here, so values stored as MM/DD/YYYY (such as 11/25/2014) are not compared chronologically.
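If the values can be cleaned up outside the database, one option is to parse each known layout explicitly and compare real dates. The following is only a sketch in Python, not SQL Server code: it assumes just the two layouts shown in the question occur, and the helper name `parse_mixed` and the fixed "now" are made up for the demo.

```python
from datetime import datetime, timedelta

# The two layouts seen in the question; anything else is treated as an error.
FORMATS = ("%Y-%m-%dT%H:%M:%S", "%m/%d/%Y")

def parse_mixed(value: str) -> datetime:
    """Try each known layout in turn and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {value!r}")

# (id, RESULT value, COMPLETED_AT value) from the example table
rows = [
    (1, "R12344", "2015-03-20T06:06:46"),
    (2, "R23445", "2014-03-20T06:06:46"),
    (3, "R83261", "11/25/2014"),
]

now = datetime(2015, 4, 1)          # assumed stand-in for GETDATE()
cutoff = now - timedelta(days=30)   # same idea as GETDATE() - 30

old_results = [result for _, result, completed in rows if parse_mixed(completed) < cutoff]
print(old_results)  # ['R23445', 'R83261']
```

Comparing parsed dates avoids relying on how the two string layouts happen to sort against each other.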

Getting All the record of particular month - Building SQL Query

I need some help building a SQL query. I have a table with data like:
ID Date Name
1 1/1/2009 a
2 1/2/2009 b
3 1/3/2009 c
I need to get result something like...
1 1/1/2009 a
2 1/2/2009 b
3 1/3/2009 c
4 1/4/2009 Null
5 1/5/2009 Null
6 1/6/2009 Null
7 1/7/2009 Null
8 1/8/2009 Null
............................
............................
............................
30 1/30/2009 Null
31 1/31/2009 Null
I want query something like..
Select * from tbl where month(Date)=1 AND year(Date)=2010
The above is not a complete query.
I need to get a record for every day of a particular month, even if some dates are missing from the table.
I guess there must be an equi-join in the query; I am trying to build it that way.
Thanks
BIG EDIT
Now I understand the OP's question.
Use a common table expression and a left join to get this effect.
DECLARE @FirstDay DATETIME;
-- Set start time
SELECT @FirstDay = '2009-01-01';
WITH Days AS
(
SELECT @FirstDay as CalendarDay
UNION ALL
SELECT DATEADD(d, 1, CalendarDay) as CalendarDay
FROM Days
WHERE DATEADD(d, 1, CalendarDay) < DATEADD(m, 1, @FirstDay)
)
SELECT DATEPART(d,d.CalendarDay), d.CalendarDay, t.Name FROM Days d
LEFT JOIN tbl t
ON
d.CalendarDay = t.Date
ORDER BY
d.CalendarDay;
Left this original answer at bottom
You need DATEPART, sir.
SELECT * FROM tbl WHERE DATEPART(m,Date) = 1
If you want to choose month and year, then you can use DATEPART twice or go for a range.
SELECT * FROM tbl WHERE DATEPART(m,Date) = 1 AND DATEPART(yyyy,Date) = 2009
Range :-
SELECT * FROM tbl WHERE Date >= '2009-01-01' AND Date < '2009-02-01'
See this link for more info on DATEPART.
http://msdn.microsoft.com/en-us/library/ms174420.aspx
You can use greater-than-or-equal and less-than comparisons.
Like so:
select * from tbl where date >= '2009-01-01' and date < '2009-02-01'
However, it is unclear whether you want month 1 from all years.
You can check more examples and functions on "Date and Time Functions" from MSDN
Create a temporary table containing all the days of the month you want.
Do a left outer join between that table and your data table, matching on the date.
Now you have a big table with all the days of the desired month: the records that match a date, plus empty records for the dates that have no data.
I hope that's what you want.
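The calendar-plus-left-join idea described in the answers above can be tried quickly with Python's built-in sqlite3 module. This is a sketch, not the SQL Server code: it uses SQLite's date functions instead of DATEADD/DATEPART and assumes dates are stored as ISO strings. The parameter name `first_day` is made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INTEGER, Date TEXT, Name TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "2009-01-01", "a"), (2, "2009-01-02", "b"), (3, "2009-01-03", "c")],
)

rows = conn.execute(
    """
    WITH RECURSIVE days(calendar_day) AS (
        SELECT date(:first_day)
        UNION ALL
        SELECT date(calendar_day, '+1 day')
        FROM days
        WHERE date(calendar_day, '+1 day') < date(:first_day, '+1 month')
    )
    SELECT CAST(strftime('%d', d.calendar_day) AS INTEGER) AS day_no,
           d.calendar_day,
           t.Name                      -- NULL when the day has no row in tbl
    FROM days AS d
    LEFT JOIN tbl AS t ON t.Date = d.calendar_day
    ORDER BY d.calendar_day
    """,
    {"first_day": "2009-01-01"},
).fetchall()

print(len(rows))   # 31
print(rows[0])     # (1, '2009-01-01', 'a')
print(rows[-1])    # (31, '2009-01-31', None)
```

The calendar side of the join drives the output, so every day of the month appears once and the missing days come back with a NULL name.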

Resources