Select count based on condition teradata - count

I want to select the number of ads which have at least 1 pageview in a month.
SELECT month, count(ads) as total_ads, count(????) as viewed_ads
....
GROUP BY 1
I am quite sure it has to be something with a qualify statement but I am not familiar with them.
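One possible approach that does not need QUALIFY is conditional aggregation: COUNT skips NULLs, so a CASE expression without an ELSE branch counts only the rows that satisfy the condition. The sketch below runs the idea against an in-memory SQLite database from Python. The table ad_stats and its columns (month, ad, pageviews) are hypothetical, since the question does not show the schema. The COUNT(CASE ...) pattern is standard SQL, so it should carry over to Teradata, though that is not verified here.

```python
import sqlite3

# Hypothetical schema: one row per ad per month with its pageview count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_stats (month TEXT, ad TEXT, pageviews INTEGER)")
conn.executemany(
    "INSERT INTO ad_stats VALUES (?, ?, ?)",
    [
        ("2020-01", "ad_a", 5),
        ("2020-01", "ad_b", 0),
        ("2020-01", "ad_c", 2),
        ("2020-02", "ad_a", 0),
        ("2020-02", "ad_d", 1),
    ],
)

# COUNT ignores NULLs, and a CASE without ELSE yields NULL when the
# condition is false, so viewed_ads counts only ads with >= 1 pageview.
query = """
SELECT month,
       COUNT(ad) AS total_ads,
       COUNT(CASE WHEN pageviews >= 1 THEN ad END) AS viewed_ads
FROM ad_stats
GROUP BY 1
ORDER BY 1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If the same ad can appear on several rows within a month, COUNT(DISTINCT ...) in both places would be the variant to try.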

Related

Aggregating on groups of data order by date in Snowflake

I have the following data in my table:
I need the output to be the following in Snowflake:
Basically, I need to order by transaction date and get the first transaction, the last transaction, and the count of transactions for each country and city, as they occur in sequence. I tried using window functions but I'm not getting the desired result. The tricky part is that the grouping has to follow the sequence: TEXAS and CALIFORNIA repeat in the output, depending on the order of transactions for the country and city.
Ideally this would be done with a query. Second best would be some other fast way of computing it. It has to run on batches of data. I don't really want to pull the data in order and walk through it row by row unless that is the only option, but I'm open to advice on that as well. Thanks!
Hint: GROUP BY, MIN, MAX, COUNT
I was able to work out the logic, and the following query works:
select countryid, regionid, min(requesttime), max(requesttime), count(*)
from (
    select deviceid, countryid, regionid, cityid, requesttime,
        row_number() over (partition by countryid order by requesttime) as seqnum_1,
        row_number() over (partition by countryid, regionid order by requesttime) as seqnum_2
    from table t
    order by requesttime
) t
group by countryid, regionid, (seqnum_1 - seqnum_2)
order by min(requesttime);
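The query above relies on the difference between two row numbers staying constant within each consecutive run of the same region (often called the "gaps and islands" technique). A minimal sketch of that idea, run against an in-memory SQLite database from Python, is shown below. The table name t and the sample rows are made up for illustration, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (countryid TEXT, regionid TEXT, requesttime TEXT)")
# Made-up sample data: TEXAS and CALIFORNIA alternate over time.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("US", "TEXAS", "2020-01-01"),
        ("US", "TEXAS", "2020-01-02"),
        ("US", "CALIFORNIA", "2020-01-03"),
        ("US", "TEXAS", "2020-01-04"),
        ("US", "CALIFORNIA", "2020-01-05"),
        ("US", "CALIFORNIA", "2020-01-06"),
    ],
)

# seqnum_1 numbers rows per country, seqnum_2 per country and region.
# Their difference is constant inside one uninterrupted run of a region,
# so it can serve as an extra grouping key.
query = """
SELECT countryid, regionid, MIN(requesttime), MAX(requesttime), COUNT(*)
FROM (
    SELECT countryid, regionid, requesttime,
           ROW_NUMBER() OVER (PARTITION BY countryid
                              ORDER BY requesttime) AS seqnum_1,
           ROW_NUMBER() OVER (PARTITION BY countryid, regionid
                              ORDER BY requesttime) AS seqnum_2
    FROM t
) AS s
GROUP BY countryid, regionid, (seqnum_1 - seqnum_2)
ORDER BY MIN(requesttime)
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

Each output row is one run: its first and last request time and the number of rows in it.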

BigQuery - Firebase export behaves differently when using the wildcard character and _TABLE_SUFFIX compared to without using it

My requirement:
To append unnested data to a separate table and use it for visualization and analytics.
Implementation:
I am not sure exactly when events_intraday_YYYYMMDD syncs into events_YYYYMMDD (for reference, check here), so I do the following:
0- Create the events_normalized table once at the start (this is done once, not daily) using
create table analytics_data_export.events_normalized AS
SELECT .....
FROM
`analytics_xxxxxx.events_*`
to collect all the data from events_YYYYMMDD
1- Creating/Replacing a daily temp table with
create or replace table analytics_data_export.daily_data_temp AS
SELECT...
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 4 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
I have seen several days of data syncing together, so to be on the safe side I use the last 1 to 4 days of data.
2- Deleting from events_normalized the rows that also appear in daily_data_temp (the inner join of the two tables), to remove any duplicates. For example, if events_normalized has data up to the 18th and daily_data_temp has data for the 16th to the 19th, the rows for the 16th to the 18th are removed from events_normalized.
3- Reinserting daily_data_temp into events_normalized
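The delete-then-reinsert refresh in the steps above can be sketched as follows. This is an illustration in Python with an in-memory SQLite database, not BigQuery, and it swaps in a simpler rule than the inner join described: it deletes every row whose date appears in the temp table. The column names event_date and event_name and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified versions of the two tables.
conn.execute("CREATE TABLE events_normalized (event_date TEXT, event_name TEXT)")
conn.execute("CREATE TABLE daily_data_temp (event_date TEXT, event_name TEXT)")
conn.executemany(
    "INSERT INTO events_normalized VALUES (?, ?)",
    [("20200715", "open"), ("20200716", "open"),
     ("20200717", "open"), ("20200718", "open")],
)
conn.executemany(
    "INSERT INTO daily_data_temp VALUES (?, ?)",
    [("20200716", "open"), ("20200717", "open"), ("20200718", "open"),
     ("20200718", "click"), ("20200719", "open")],
)

# Run both statements in one transaction so readers never see the gap
# between the delete and the reinsert.
with conn:
    conn.execute(
        "DELETE FROM events_normalized "
        "WHERE event_date IN (SELECT DISTINCT event_date FROM daily_data_temp)"
    )
    conn.execute("INSERT INTO events_normalized SELECT * FROM daily_data_temp")

per_day = conn.execute(
    "SELECT event_date, COUNT(*) FROM events_normalized "
    "GROUP BY event_date ORDER BY event_date"
).fetchall()
print(per_day)
```

Deleting by date keeps the refresh idempotent: rerunning it with the same temp table leaves events_normalized unchanged.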
Questions:
1- Is there a more optimized way of implementing these requirements?
2- In the 0th step while creating events_normalized table if I use :
WHERE
_TABLE_SUFFIX <=
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
I get different results as compared to when I am using
create table analytics_data_export.events_normalized AS
SELECT .....
FROM
`analytics_xxxxxx.events_*`
The difference is that the latter has the current date's data as well, whereas in events_YYYYMMDD I can only see data up to yesterday. I don't understand this behavior.
For example, if the current day is 20th July, in events_YYYYMMDD I can see only up to events_20200719.
To optimize, you can follow the steps below:
Create a hash out of event_time_stamp and other unique fields, and use it to filter the data.
Instead of deleting duplicate rows from the larger initial table, delete them from the small temp table and then insert the temp table.
As for the difference: the wildcard analytics_xxxxxx.events_* matches both the per-day events tables and the intraday events tables, which are named like events_intraday_20200721.
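To make the matching concrete, here is a small plain-Python imitation (it does not call BigQuery). It assumes, as the wildcard table documentation describes, that _TABLE_SUFFIX is whatever the * matched, and that the comparison in the WHERE clause is an ordinary string comparison. The table names in the list are examples only.

```python
# Example table names as they might appear in the dataset (illustrative).
tables = [
    "events_20200718",
    "events_20200719",
    "events_intraday_20200720",
    "users_20200719",
]

PREFIX = "events_"  # the part of `events_*` before the wildcard


def table_suffix(name):
    """Return what the wildcard would match, or None if the prefix differs."""
    return name[len(PREFIX):] if name.startswith(PREFIX) else None


# Without a filter, every table starting with the prefix is included,
# so the intraday table is read too.
matched = {t: table_suffix(t) for t in tables if table_suffix(t) is not None}

# With `_TABLE_SUFFIX <= '20200720'` the suffix 'intraday_20200720' sorts
# after '20200720' as a string ('i' > '2'), so that table drops out.
cutoff = "20200720"
filtered = [t for t, suffix in matched.items() if suffix <= cutoff]

print(matched)
print(filtered)
```

This would explain why the unfiltered query sees the current day's data while the filtered one stops at yesterday.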

Sessions by hits.page.pagePath in GA bigquery tables

I am new to bigquery, so sorry if this is a noob question! I am interested in breaking out sessions by page path or title. I understand one session can contain multiple paths/titles so the sum would be greater than total sessions. Essentially, I want to create a 'session id' and do a count distinct of sessionids where path like a or b.
It might actually be helpful to start at the very beginning and manually calculate total sessions. I tried to concatenate visit id and full visitor id to create a unique visit id, but apparently that is quite different from sessions. Can someone help enlighten me? Thanks!
I am working with our GA site data. Schema is the standard in GA exports.
DATA SAMPLE
Let's use an example out of the sample BigQuery (London Helmet) data:
There are 63 sessions in this day:
SELECT count(*) FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
How many of those sessions have a hits.page.pagePath like /vests% or /helmets%? How many were vests only vs helmets only? Thanks!
Here is an example of how to calculate whether there were only helmets, or only vests or both helmets and vests or neither:
SELECT
    visitID,
    has_helmets AND has_vests AS both_helmets_and_vests,
    has_helmets AND NOT has_vests AS helmets_only,
    NOT has_helmets AND has_vests AS vests_only,
    NOT has_helmets AND NOT has_vests AS neither_helmets_nor_vests
FROM (
    SELECT
        visitId,
        SOME(hits.page.pagePath like '/helmets%') WITHIN RECORD AS has_helmets,
        SOME(hits.page.pagePath like '/vests%') WITHIN RECORD AS has_vests
    FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
)
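For readers unfamiliar with SOME ... WITHIN RECORD, the per-session logic of the query above can be imitated in plain Python: for each session, check whether any of its hits matches each prefix, then combine the two flags. The sessions list below is invented sample data with a simplified shape (visitId plus a list of page paths), not the real export schema.

```python
from collections import Counter

# Invented sample sessions; each has the page paths of its hits.
sessions = [
    {"visitId": 1, "pagePaths": ["/helmets/road", "/vests/hi-vis"]},
    {"visitId": 2, "pagePaths": ["/helmets/kids"]},
    {"visitId": 3, "pagePaths": ["/vests/orange", "/"]},
    {"visitId": 4, "pagePaths": ["/", "/contact"]},
]


def classify(session):
    """Label a session by which product sections any of its hits touched."""
    # any(...) plays the role of SOME(...) WITHIN RECORD.
    has_helmets = any(p.startswith("/helmets") for p in session["pagePaths"])
    has_vests = any(p.startswith("/vests") for p in session["pagePaths"])
    if has_helmets and has_vests:
        return "both_helmets_and_vests"
    if has_helmets:
        return "helmets_only"
    if has_vests:
        return "vests_only"
    return "neither_helmets_nor_vests"


# Count sessions (not hits) per label.
counts = Counter(classify(s) for s in sessions)
print(counts)
```

Because each session gets exactly one label, the four counts add up to the total number of sessions.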
Way 1, easier but you need to repeat on each field
Obviously you can do something like this :
SELECT count(*) FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] WHERE hits.page.pagePath like '/helmets%'
And then run one query per substring (one with '/vests%', one with '/helmets%', etc).
Way 2, works fine, but not with repeated fields
If you want ONE query that'll just group by on the first part of the string, you can do something like that :
Select a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] ) group by a
When I do this, it returns the 63 sessions, with a total count of 63 :).
Way 3, using a FLATTEN on the table to get each hit individually
Since the "hits" field is repeatable, you would need a FLATTEN in your query :
Select a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a FROM FLATTEN ([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] , hits)) group by a
You need FLATTEN here because the "hits" field is repeated. If you don't flatten, the query won't look into ALL the "hits" in each record. Adding FLATTEN makes you work off a sub-table where each hit is in its own row, so you can query all of them.
If you want it by session instead of by hit (it'll be grouped by both session and path), do something like:
Select b, a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a, visitID as b FROM FLATTEN ([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] , hits)) group by b, a

How to create database table dynamically and insert data selected by query

I'm working on a website where I need to find the rank of a user on the basis of score. Until now I have been calculating the score and rank of a user with this SQL query:
select * from (
    select
        usrid,
        ROW_NUMBER()
            OVER(ORDER BY (count(*)+sum(sup)+sum(opp)+sum(visited)*0.3) DESC) AS rank,
        (count(*)+sum(sup)+sum(opp)+sum(visited)*0.3) As score
    from [DB_].[dbo].[dsas]
    group by usrid) as cash
where usrid=#userid
Please don't focus too much on the query; it is only here to explain how I select the data.
Problem: I can't keep using the query above, because every time I need a rank it has to be computed from the dsas table, which grows day by day and slows down my website.
What I need is to select the data with the query above and insert it into another table named score. Can we do anything like this?
A better solution is to either include score as a field in your user table or have a separate table for scores. Any time you add new sup, opp, or visited data for a user, also recalculate their score at that time.
Then to get the highest ranking users, you will be able to perform a very simple select statement, ordering by score descending, and only fetching the number of rows you want. It will be very fast.
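A minimal sketch of the separate-table idea, using Python with an in-memory SQLite database rather than SQL Server. It fills a score table with INSERT ... SELECT from the aggregate query and then looks ranks up from that small table. It does a full refresh rather than the per-update recalculation suggested above, and the helper names refresh_scores and rank_of, the column types, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column types for the dsas table from the question.
conn.execute(
    "CREATE TABLE dsas (usrid INTEGER, sup INTEGER, opp INTEGER, visited INTEGER)"
)
conn.executemany(
    "INSERT INTO dsas VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 10), (1, 0, 1, 0), (2, 1, 1, 20), (3, 0, 0, 0)],
)
conn.execute("CREATE TABLE score (usrid INTEGER PRIMARY KEY, score REAL)")


def refresh_scores(conn):
    """Rebuild the score table from dsas in a single transaction."""
    with conn:
        conn.execute("DELETE FROM score")
        conn.execute(
            """
            INSERT INTO score (usrid, score)
            SELECT usrid,
                   COUNT(*) + SUM(sup) + SUM(opp) + SUM(visited) * 0.3
            FROM dsas
            GROUP BY usrid
            """
        )


def rank_of(conn, usrid):
    """Rank = 1 + number of users with a strictly higher score (ties share a rank)."""
    row = conn.execute(
        "SELECT 1 + COUNT(*) FROM score "
        "WHERE score > (SELECT score FROM score WHERE usrid = ?)",
        (usrid,),
    ).fetchone()
    return row[0]


refresh_scores(conn)
ranks = {u: rank_of(conn, u) for u in (1, 2, 3)}
print(ranks)
```

The refresh could run on a schedule, so page requests only touch the small score table. Note that ties share a rank here, unlike ROW_NUMBER() in the original query.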

How can I average number of items in our SQLite database by 24-hour period?

I'm trying to create a report for our support ticketing system with two results that show the average number of tickets opened per day and the average number closed per day.
Basically: query the entire tickets table, separate everything by the individual day each ticket was created on, count the number of tickets for each day, then average those counts.
My friend gave me this query:
SELECT AVG(ticket_count)
FROM (SELECT COUNT(*) AS ticket_count FROM tickets
GROUP BY DATE(created_at, '%Y'), DATE(created_at, '%m'), DATE(created_at, '%d')) AS ticket_part
But it doesn't seem to work for me. All I get is a single result with the number of tickets created last year.
Here's what finally worked for me:
SELECT round(CAST(AVG(TicketsOpened) AS REAL), 1) as DailyOpenAvg
FROM
(SELECT date(created_at) as Day, COUNT(*) as TicketsOpened
FROM tickets
GROUP BY date(created_at)
) AS X
The middle part of your query is collapsing the table to a single row, so the outer part has nothing upon which to group. It's hard to say exactly what you need without seeing the schema for tickets, but at a guess I'd try this:
SELECT
    AVG(CAST(TicketsOpened AS REAL)) -- The cast to REAL ensures that { 1, 2 } averages to 1.5 rather than 1
FROM
(
    SELECT
        date(created_at) AS Day, -- date() truncates any time element; if you're storing the date alone, you can omit this
        COUNT(*) AS TicketsOpened
    FROM
        tickets
    GROUP BY
        date(created_at)
) AS X
Hope that helps!
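A likely explanation for the single-row result, sketched with Python's sqlite3 module: in SQLite, the second argument of DATE() is a modifier, and '%Y' is a strftime format string rather than a valid modifier, so the expression evaluates to NULL and every ticket lands in one group. The sketch assumes created_at is stored as ISO-8601 text; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, created_at TEXT)")
# Made-up tickets: 2 on July 1st, 1 on July 2nd, 3 on July 3rd.
conn.executemany(
    "INSERT INTO tickets (created_at) VALUES (?)",
    [
        ("2020-07-01 09:15:00",),
        ("2020-07-01 17:40:00",),
        ("2020-07-02 08:05:00",),
        ("2020-07-03 11:30:00",),
        ("2020-07-03 12:00:00",),
        ("2020-07-03 23:59:00",),
    ],
)

# '%Y' is not a valid DATE() modifier, so the grouping key is NULL.
bad_key = conn.execute("SELECT DATE('2020-07-01 09:15:00', '%Y')").fetchone()[0]

# The original query therefore averages a single count: the whole table.
friend_result = conn.execute(
    """
    SELECT AVG(ticket_count)
    FROM (SELECT COUNT(*) AS ticket_count FROM tickets
          GROUP BY DATE(created_at, '%Y'), DATE(created_at, '%m'),
                   DATE(created_at, '%d')) AS ticket_part
    """
).fetchone()[0]

# Grouping by date(created_at) yields one row per calendar day instead.
daily_avg = conn.execute(
    """
    SELECT ROUND(AVG(TicketsOpened), 1)
    FROM (SELECT date(created_at) AS Day, COUNT(*) AS TicketsOpened
          FROM tickets
          GROUP BY date(created_at)) AS X
    """
).fetchone()[0]

print(bad_key, friend_result, daily_avg)
```

The same inner query with a closed_at column (a hypothetical name) in place of created_at would give the daily average of closed tickets.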
