Only choosing oldest date (BigQuery) - datetime

I'd like to select only the oldest date. Using MAX/MIN doesn't work because it operates at the row level, and I couldn't figure out a way to use OVER or NTH, as this query will be run each day with a different number of servers, w_ids and z_ids.
The following query:
select server, w_id, z_id, date(datetime) as day
from (
  SELECT server, w_id, datetime, demand.b_id as id, demand.c_type, z_id,
  FROM TABLE_DATE_RANGE(v3_data.v3_,
    DATE_ADD(CURRENT_DATE(),-2,"day"),
    DATE_ADD(CURRENT_DATE(),-1,"day"))
  where demand.b_id is not null and demand.c_type = 'rtb'
  group by 1,2,3,4,5,6
  having datetime >= DATE_ADD(CURRENT_DATE(),-2,"day")
)
group by 1,2,3,4
having count(day) < 2
order by z_id, day
gives these results:
Row  server  w_id   z_id  day
1    A       722    1837  2016-04-19
2    SPORTS  51     2534  2016-04-19
3    A       1002   2546  2016-04-18
4    A       1303   3226  2016-04-19
5    A       1677   4369  2016-04-18
6    NEW     13608  9370  2016-04-19
So from the above I'd like only the rows where day is 2016-04-18 (rows 3 and 5).

I think a GROUP_CONCAT might get the job done quite simply here:
SELECT
server,
w_id,
z_id,
day,
FROM (
SELECT
server,
w_id,
z_id,
GROUP_CONCAT(day) day,
FROM (
SELECT
server,
w_id,
DATE(datetime) day,
demand.b_id AS id,
demand.c_type,
z_id,
FROM
TABLE_DATE_RANGE(v3_data.v3_,DATE_ADD(CURRENT_DATE(),-2,"day"), DATE_ADD(CURRENT_DATE(),-1,"day"))
WHERE
demand.b_id IS NOT NULL
AND demand.c_type = 'rtb'
AND DATE(datetime) >= DATE(DATE_ADD(CURRENT_DATE(),-2,"day"))
GROUP BY
1,2,3,4,5,6
ORDER BY
day) # Critical to order this dimension to make the GROUP_CONCAT permutations unique
GROUP BY
server,
w_id,
z_id,
# day is aggregated in GROUP_CONCAT and so it does not get included in the GROUP BY
)
WHERE
day = DATE(DATE_ADD(CURRENT_DATE(),-2,"day"))
A group present on both days concatenates to a two-date string like "2016-04-18,2016-04-19", so only groups whose only day is the oldest date pass this final equality filter.

The innermost select is your original one, untouched.
The rest is a wrapper that takes care of min_day: MIN([day]) OVER() with an empty window returns the minimum day across all result rows, so the outer WHERE keeps only rows from the overall oldest day.
Not tested, as this was written on the go, but it should at least give you the idea:
SELECT server, w_id, z_id, [day]
FROM (
SELECT server, w_id, z_id, [day], MIN([day]) OVER() AS min_day
FROM (
SELECT server, w_id, z_id, DATE(datetime) AS [day]
FROM (
SELECT server, w_id, datetime, demand.b_id AS id, demand.c_type, z_id,
FROM TABLE_DATE_RANGE(v3_data.v3_,DATE_ADD(CURRENT_DATE(),-2,"day"), DATE_ADD(CURRENT_DATE(),-1,"day"))
WHERE demand.b_id IS NOT NULL AND demand.c_type = 'rtb'
GROUP BY 1,2,3,4,5,6
HAVING datetime >= DATE_ADD(CURRENT_DATE(),-2,"day")
)
GROUP BY 1,2,3,4
HAVING COUNT([day])<2
)
)
WHERE [day] = min_day
ORDER BY z_id, [day]

Both solutions have been helpful, but I believe neither worked quite the way I wanted, and the following does:
select server, w_id, id, demand.c_type, z_id,
NTH(1, day) First, NTH(2, day) Second,
from (
SELECT
server,
w_id,
DATE(datetime) as day,
demand.b_id AS id,
demand.c_type,
z_id,
FROM
TABLE_DATE_RANGE([black-beach-789:v3_data.v3_],DATE_ADD(CURRENT_DATE(),-2,"day"), DATE_ADD(CURRENT_DATE(),-1,"day"))
WHERE
demand.b_id IS NOT NULL
AND demand.c_type = 'rtb'
AND DATE(datetime) >= DATE(DATE_ADD(CURRENT_DATE(),-2,"day"))
GROUP BY
1,2,3,4,5,6
order by day
)
group by 1,2,3,4,5
having first = date(DATE_ADD(CURRENT_DATE(),-2,"day")) and Second is null
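For reference, a rough Standard SQL sketch of the same idea, untested; `project.dataset.v3_*` and the demand record field are assumptions standing in for the real schema. It keeps only the groups whose single distinct day is the oldest one:
SELECT server, w_id, id, c_type, z_id, MIN(day) AS first_day
FROM (
  SELECT server, w_id, DATE(datetime) AS day,
         demand.b_id AS id, demand.c_type AS c_type, z_id
  FROM `project.dataset.v3_*`  -- hypothetical wildcard table
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
                          AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    AND demand.b_id IS NOT NULL
    AND demand.c_type = 'rtb'
)
GROUP BY server, w_id, id, c_type, z_id
HAVING COUNT(DISTINCT day) = 1                               -- appears on only one day
   AND MIN(day) = DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)   -- and that day is the oldest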

Related

Impala - Working hours between two dates in impala

I have two timestamps, #starttimestamp and #endtimestamp. How do I calculate the number of working hours between the two?
Working hours are defined as:
Mon-Thu (9:00-17:00)
Fri (9:00-13:00)
This has to work in Impala.
I think I found a better solution.
We will create a series of numbers using a large table (a time-dimension table works too); make sure the series doesn't get truncated. I am using a large table from my db.
Use this series to generate the range of dates between the start and end date:
date_add(t.start_date, rs.uniqueid) -- create range of dates
join (select row_number() over (order by mycol) as uniqueid -- create range of unique ids
from largetab) rs
where end_date >= date_add(t.start_date, rs.uniqueid)
Then we will calculate the total hour difference between the timestamps using unix_timestamp(), considering both date and time:
(unix_timestamp(endtimestamp) - unix_timestamp(starttimestamp)) / 3600
Exclude the non-working hours: 16 per day on Mon-Thu, 20 on Fri, 24 on Sat-Sun. Note that Impala's dayofweek() runs from 1 (Sunday) to 7 (Saturday), so Friday is 6:
case when dayofweek(dday) in (1,7) then 24
when dayofweek(dday) = 6 then 20
else 16 end as non_work_hours
Here is the complete SQL:
select
  end_date, start_date,
  diff_in_hr - sum(case when dayofweek(dday) in (1,7) then 24  -- Sat/Sun
                        when dayofweek(dday) = 6 then 20       -- Fri (4 working hours)
                        else 16 end) as total_workhrs          -- Mon-Thu (8 working hours)
from (
  select (unix_timestamp(end_date) - unix_timestamp(start_date))/3600 as diff_in_hr,
         end_date, start_date,
         date_add(t.start_date, rs.uniqueid) as dday
  from tdate t
  join (select row_number() over (order by mycol) as uniqueid from largetab) rs
  where end_date >= date_add(t.start_date, rs.uniqueid)
) rs2
group by 1, 2, diff_in_hr
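As a quick sanity check of the arithmetic (hypothetical literal timestamps, not from the question):
select (unix_timestamp('2023-01-04 17:00:00')
      - unix_timestamp('2023-01-02 09:00:00')) / 3600 as diff_in_hr;
-- Mon 2023-01-02 09:00 to Wed 2023-01-04 17:00 is 56 raw hours; the series
-- generates Tue and Wed as intermediate days, each subtracting 16 non-working
-- hours, so 56 - 32 = 24 working hours, i.e. three 8-hour Mon-Thu days.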

SQLite: Running balance with an ending balance

I have an ending balance of $5000. I need to create a running balance that looks like a bank statement, with the first row adjusted to show the ending balance and the rest summing from there. Here is what I have for the running balance; how can I make row 1 show the ending balance instead of the sum of the first row?
with BalBefore as (
select *
from transactions
where ACCT_NAME = 'Real Solutions'
ORDER BY DATE DESC
)
select
DATE,
amount,
'$' || printf("%.2f", sum(AMOUNT) over (order by ROW_ID)) as Balance
from BalBefore;
This gives me:
DATE       AMOUNT   BALANCE
9/6/2019   -31.00   $-31.00   <- I need this balance to be replaced with $5000,
9/4/2019    15.00   $-16.00      with the rest summing as normal.
9/4/2019    15.00   $-1.00
9/3/2019   -16.00   $-17.00
I have read many other questions, but I couldn't find one that I could understand so I thought I would post a simpler question.
The following is not short and sweet, but using the WITH statement and CTEs, I hope the logic is apparent. Multiple CTEs are defined, each referring to the previous ones, to make the overall query more readable. Altogether, the goal was just to add a beginning-balance record that could be included in the running sum and then filtered out of the final result:
/*
DROP TABLE IF EXISTS data;
CREATE temp TABLE data (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
date DATETIME NOT NULL,
amount NUMERIC NOT NULL
);
INSERT INTO data
(date, amount)
VALUES
('2019-09-03', -16.00),
('2019-09-04', 15.00),
('2019-09-04', 15.00),
('2019-09-06', -31.00)
;
*/
WITH
initial_filter AS (
SELECT id, date, amount
FROM data
--WHERE ACCT_NAME = 'Real Solutions'
),
prepared AS (
SELECT *
FROM initial_filter
UNION ALL
SELECT
9223372036854775807 as id, --largest signed integer
(SELECT MAX(date) FROM initial_filter) AS FinalDate,
-(5000.00) --ending balance (negated for summing algorithm)
),
running AS (
SELECT
id,
date,
amount,
SUM(-amount) OVER
(ORDER BY date DESC, id DESC
RANGE UNBOUNDED PRECEDING
EXCLUDE CURRENT ROW) AS balance
FROM prepared
ORDER BY date DESC, id DESC
)
SELECT *
FROM running
WHERE id != 9223372036854775807
ORDER BY date DESC, id DESC;
This produces the following:
id  date        amount  balance
4   2019-09-06  -31.00  5000
3   2019-09-04   15.00  5031
2   2019-09-04   15.00  5016
1   2019-09-03  -16.00  5001
UPDATE: The first query was not producing the correct balances. The beginning balance row and the windowing function (i.e. OVER clause) were updated to accurately sum over the correct amounts.
Note: The balance on each row is determined completely from the previous rows, not from the current row's amount, because this works backward from an ending balance, not forward from the previous row balance.
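Two side notes, both assumptions rather than part of the original answer: the EXCLUDE CURRENT ROW frame clause requires SQLite 3.28.0 or newer, and the arithmetic can be verified by hand: the newest row shows 5000 because only the synthetic -(5000) row precedes it, and each older row adds the negated amounts of the rows above it, e.g. 5000 - (-31.00) = 5031 for id 3.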

Rolling 7 day uniques & 31 day uniques in BigQuery w/ Firebase

I'm trying to set up a rolling 7-day and a rolling 31-day unique-user count in BigQuery (with Firebase) using the following query. I want it so that, for each day, it examines the previous 31 days as well as the previous 7 days. I've been stuck, getting the message:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
The query:
With events AS (
SELECT PARSE_DATE("%Y%m%d", event_date) as event_date, user_pseudo_id FROM `my_data_table.analytics_178206500.events_*`
Where _table_suffix NOT LIKE "i%" AND event_name = "user_engagement"
GROUP BY 1, 2
),
DAU AS (
SELECT event_date as date, COUNT(DISTINCT(user_pseudo_id)) AS dau
From events
GROUP BY 1
)
SELECT DAU.date, DAU.dau,
(
SELECT count(distinct(user_pseudo_id))
FROM events
WHERE events.event_date BETWEEN DATE_SUB(DAU.date, INTERVAL 29 DAY) and dau.date
) as mau,
(
SELECT count(distinct(user_pseudo_id))
FROM events
WHERE events.event_date BETWEEN DATE_SUB(DAU.date, INTERVAL 7 DAY) and dau.date
) as wau
FROM DAU
ORDER BY 1 DESC
I'm able to get the DAU part, but the last-7-day users (WAU) and last-31-day users (MAU) aren't coming through. I have tried to CROSS JOIN DAU with events, but the results were not what I expected.
Any pointers would be greatly appreciated.
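One common workaround, sketched below and untested (it assumes the same events and DAU CTEs from the question), is to replace the correlated subqueries with a CROSS JOIN plus conditional distinct counts, since BigQuery only allows equality conditions in join predicates:
SELECT
  DAU.date,
  DAU.dau,
  COUNT(DISTINCT IF(events.event_date BETWEEN DATE_SUB(DAU.date, INTERVAL 29 DAY) AND DAU.date,
                    events.user_pseudo_id, NULL)) AS mau,  -- interval mirrored from the question
  COUNT(DISTINCT IF(events.event_date BETWEEN DATE_SUB(DAU.date, INTERVAL 7 DAY) AND DAU.date,
                    events.user_pseudo_id, NULL)) AS wau
FROM DAU
CROSS JOIN events
GROUP BY 1, 2
ORDER BY 1 DESC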

Avoid repetition of subquery

I have a table messages that contains a column message_internaldate. Now I'd like to count the messages within certain time periods (each hour of a day) over several months. I managed to get the sum of messages per hour using lots of subqueries (24 of them), but I hope there is a brainier way to do it. The subqueries are identical except that the time period changes. Any suggestions?
e.g. for the first two hours
SELECT T1, T2 FROM
(
  SELECT sum(T1c) as T1 FROM
  (
    SELECT strftime('%H:%M',message_internaldate) AS T1s, count(*) as T1c FROM messages WHERE
    message_internaldate BETWEEN '2005-01-01 00:00:00' AND '2012-12-31 00:00:00'
    AND strftime('%H:%M',message_internaldate) BETWEEN '01:00' AND '01:59'
    GROUP BY strftime('%H:%M',message_internaldate)
  )
),
(
  SELECT sum(T2c) as T2 FROM
  (
    SELECT strftime('%H:%M',message_internaldate) AS T2s, count(*) as T2c FROM messages WHERE
    message_internaldate BETWEEN '2005-01-01 00:00:00' AND '2012-12-31 00:00:00'
    AND strftime('%H:%M',message_internaldate) BETWEEN '02:00' AND '02:59'
    GROUP BY strftime('%H:%M',message_internaldate)
  )
)
...
Your problem is that you want to have the individual hours as columns.
To get them as rows, try a query like this:
SELECT strftime('%H', message_internaldate) AS hour,
strftime('%H:%M', message_internaldate) AS Ts,
COUNT(*) AS Tc
FROM messages
WHERE message_internaldate BETWEEN '2005-01-01 00:00:00' AND '2012-12-31 23:59:59'
GROUP BY 1, 2
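If you want one total per hour of the day, which is what each of the 24 subqueries was summing, grouping by the hour alone should suffice; a small untested variation on the query above:
SELECT strftime('%H', message_internaldate) AS hour,
       COUNT(*) AS Tc
FROM messages
WHERE message_internaldate BETWEEN '2005-01-01 00:00:00' AND '2012-12-31 23:59:59'
GROUP BY 1
ORDER BY 1;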

Preventing Max function from using timestamp as part of criteria on a date column in PL/SQL

If I query:
select max(date_created) date_created
on a DATE field in PL/SQL (Oracle 11g), and there are records that were created on the same date but at different times, MAX() returns only the latest time on that date. What I would like is for the times to be ignored, returning ALL records that match the max date regardless of their associated timestamps in that column. What is the best practice for doing this?
Edit: what I'm looking to do is return all records for the most recent date that matches my criteria, regardless of varying timestamps for that day. Below is what I'm doing now and it only returns records from the latest date AND time on that date.
SELECT r."ID",
r."DATE_CREATED"
FROM schema.survey_response r
JOIN
(SELECT S.CUSTOMERID ,
MAX (S.DATE_CREATED) date_created
FROM schema.SURVEY_RESPONSE s
WHERE S.CATEGORY IN ('Yellow', 'Blue','Green')
GROUP BY CUSTOMERID
) recs
ON R.CUSTOMERID = recs.CUSTOMERID
AND R.DATE_CREATED = recs.date_created
WHERE R.CATEGORY IN ('Yellow', 'Blue','Green')
Final Edit: Got it working via the query below.
SELECT r."ID",
r."DATE_CREATED"
FROM schema.survey_response r
JOIN
(SELECT S.CUSTOMERID ,
MAX (trunc(S.DATE_CREATED)) date_created
FROM schema.SURVEY_RESPONSE s
WHERE S.CATEGORY IN ('Yellow', 'Blue','Green')
GROUP BY CUSTOMERID
) recs
ON R.CUSTOMERID = recs.CUSTOMERID
AND trunc(R.DATE_CREATED) = recs.date_created
WHERE R.CATEGORY IN ('Yellow', 'Blue','Green')
In Oracle, you can get the latest date, ignoring the time, with:
SELECT max( trunc( date_created ) ) date_created
FROM your_table
You can get all rows that have the latest date, ignoring the time, in a couple of ways. Using analytic functions (preferable):
SELECT *
FROM (SELECT a.*,
rank() over (order by trunc(date_created) desc) rnk
FROM your_table a)
WHERE rnk = 1
or the more conventional but less efficient:
SELECT *
FROM your_table
WHERE trunc(date_created) = (SELECT max( trunc(date_created) )
FROM your_table)
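A closing performance note, an aside not raised in the original post: wrapping date_created in TRUNC() inside a filter or join condition prevents a plain index on that column from being used. If that matters, Oracle supports function-based indexes; a sketch with a hypothetical index name:
CREATE INDEX survey_response_trunc_idx
  ON schema.survey_response (TRUNC(date_created));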
