I can get sessions by user ID, but I'm having trouble also selecting the date. My end result should have 3 columns: date, userId, sessions (where sessions is the total number of sessions for that day).
Sessions by user id, missing the date column:
SELECT
(SELECT value FROM UNNEST(customDimensions) WHERE index=2) AS userId,
sum(totals.visits) AS sessions,
FROM
`ga-360-tvgo.76246634.ga_sessions_*`
WHERE _table_suffix BETWEEN "20200701" AND "20200723"
GROUP BY
1
I think you can simply add date to the SELECT list and to the GROUP BY clause:
SELECT
date,
(SELECT value FROM UNNEST(customDimensions) WHERE index=2) AS userId,
sum(totals.visits) AS sessions,
FROM
`ga-360-tvgo.76246634.ga_sessions_*`
WHERE _table_suffix BETWEEN "20200701" AND "20200723"
GROUP BY
1,2
id,date,source,target,identifier
1,2020-10-10,internal,external,abc-123
2,2020-10-10,internal,internal,xyz-123
3,2020-10-11,external,external,abc-123
4,2020-10-12,external,external,abc-123
There are three entries for the same identifier (abc-123), and I would like to keep only the oldest and the newest of them. In general, whenever an identifier has duplicates, I would like to get its oldest and newest records.
I have no idea how to construct such a query. Any help will be greatly appreciated.
You could use analytic functions here:
WITH cte AS (
SELECT *, MIN(date) OVER (PARTITION BY identifier) min_date,
MAX(date) OVER (PARTITION BY identifier) max_date
FROM yourTable
)
SELECT id, date, source, target, identifier
FROM cte
WHERE date IN (min_date, max_date);
The CTE above adds two new columns to your table: the min and max date for each identifier. The outer query then keeps only the records whose date equals that min or max date. An identifier with a single record is returned once, because its min and max dates coincide.
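For engines without analytic functions, the same filter can be written with a grouped subquery instead. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in database, loads the four sample rows from the question into a table named yourTable, and joins each row to the per-identifier min and max dates. It is an equivalent formulation, not the CTE from the answer itself.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable "
    "(id INTEGER, date TEXT, source TEXT, target TEXT, identifier TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-10-10", "internal", "external", "abc-123"),
        (2, "2020-10-10", "internal", "internal", "xyz-123"),
        (3, "2020-10-11", "external", "external", "abc-123"),
        (4, "2020-10-12", "external", "external", "abc-123"),
    ],
)

# Join every row to the min/max date of its identifier, then keep only
# the rows that sit on one of those two dates.
query = """
SELECT t.id, t.date, t.source, t.target, t.identifier
FROM yourTable AS t
JOIN (
    SELECT identifier, MIN(date) AS min_date, MAX(date) AS max_date
    FROM yourTable
    GROUP BY identifier
) AS m ON m.identifier = t.identifier
WHERE t.date IN (m.min_date, m.max_date)
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
kept_ids = [row[0] for row in rows]
print(kept_ids)
```

With the sample data, the middle record of abc-123 (id 3) is dropped and the single xyz-123 record is kept. Note that if several rows share the min or max date for one identifier, all of them are returned.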
I am calculating Bounced sessions (sessions with only 1 pageview) via BQ.
The query joins a subquery that gives me the number of all sessions with a subquery that gives me the bounced sessions.
When I run my query on just one specific date, my numbers match the numbers in GA, but if I select a bigger time frame, for example a month, the numbers (only for Bounced sessions) are off.
Also, if I run each subquery separately, I get correct numbers for any timeframe.
Here is my query:
SELECT
A.date AS Date,
A.Landing_Content_Group AS Landing_Content_Group,
MAX(A.sessions) AS Sessions,
MAX(B.Bounced_Sessions) AS Bounced_Sessions
FROM (
SELECT
date,
hits.contentGroup.contentGroup2 AS Landing_Content_Group,
COUNT(DISTINCT CONCAT(CAST(visitStartTime AS string),fullVisitorId)) AS sessions
FROM
`122206032.ga_sessions_201808*`,
UNNEST(hits) AS hits
WHERE
hits.type="PAGE"
AND hits.isEntrance = TRUE
GROUP BY
date,
Landing_Content_Group
ORDER BY
date DESC,
sessions DESC ) A
LEFT JOIN (
SELECT
date,
hits.contentGroup.contentGroup2 AS Landing_Content_Group,
COUNT(DISTINCT CONCAT(CAST(visitStartTime AS string),fullVisitorId)) AS Bounced_Sessions
FROM
`122206032.ga_sessions_201808*`,
UNNEST(hits) AS hits
WHERE
hits.type="PAGE"
AND totals.pageviews = 1
AND hits.isEntrance = TRUE
GROUP BY
date,
Landing_Content_Group
ORDER BY
date DESC,
Bounced_Sessions DESC ) B
ON
a.Landing_Content_Group = b.Landing_Content_Group
GROUP BY
Date,
Landing_Content_Group
ORDER BY
Date DESC,
Sessions DESC
What I should get:
GA results
What I get in BQ for that date when a time frame is a month:
BQ results
I tried different JOINs and aggregations, but so far I'm still in the dark :)
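OK, I solved it: the solution was to also join the tables on the date. When the join is on Landing_Content_Group alone, each row of A matches the rows of B for that content group from every date in the range, so MAX(B.Bounced_Sessions) returns the largest daily value of the whole period instead of the value for that date.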
ON
a.date = b.date
AND a.Landing_Content_Group = b.Landing_Content_Group
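To see why the date belongs in the join condition, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. The tables A and B, the column names grp, sessions and bounced, and all numbers are made up for illustration; they only mimic the shape of the two subqueries in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (date TEXT, grp TEXT, sessions INTEGER);
    CREATE TABLE B (date TEXT, grp TEXT, bounced INTEGER);
    INSERT INTO A VALUES ('20180801', 'news', 10), ('20180802', 'news', 20);
    INSERT INTO B VALUES ('20180801', 'news', 3),  ('20180802', 'news', 7);
    """
)

# Same outer query as in the question, with the join condition left open.
template = """
SELECT A.date, A.grp, MAX(A.sessions), MAX(B.bounced)
FROM A
LEFT JOIN B ON {condition}
GROUP BY A.date, A.grp
ORDER BY A.date
"""

# Join on the content group only: every A row matches B rows of all dates.
group_only = conn.execute(
    template.format(condition="A.grp = B.grp")
).fetchall()

# Join on date and content group: every A row matches its own day only.
date_and_group = conn.execute(
    template.format(condition="A.date = B.date AND A.grp = B.grp")
).fetchall()

print(group_only)
print(date_and_group)
```

With a single date in the range the two variants agree, which would explain why the original query matched GA for one day but not for a month.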
Below is the link to my previous question. The solution suggested by one of the community members (#Doneth) worked.
But when I use '2018-05-31' in COALESCE, I get an error saying 'datatype mismatch in if/then else statement'.
Query used:
WITH cte AS
(
SELECT customer_id, bal, st_ts,
-- return the next row's date
Coalesce(Min(st_ts)
Over (PARTITION BY customer_id
ORDER BY st_ts
ROWS BETWEEN 1 Following AND 1 Following)
,'2018-05-31') AS next_Txn_dt
FROM BAL_DET
)
SELECT customer_id, bal
,Last(pd) -- last day of the period
FROM cTE
-- make a period of the current and next row's date
-- and return one row per day
EXPAND ON PERIOD(ST_TS, next_Txn_dt) AS pd;
Link to my question:
Retain values till there is a change in value in Teradata
I am pretty new to BigQuery and have a question about grouping the Date using Google Analytics data (StandardSQL). The data is currently on daily level, how can I aggregate this to Year/Month level?
Desired outcome: Data on year/month level + selection of only the last 12 months.
#StandardSQL
SELECT
TIMESTAMP(PARSE_DATE('%Y%m%d',date)) as Date,
SUM(totals.visits) AS Visits,
totals.timeOnSite AS TimeOnSite,
totals.newVisits AS NewVisit
FROM
`XXXX.ga_sessions_20*`
WHERE
_TABLE_SUFFIX >= '180215'
GROUP by
Date,
TimeOnSite,
NewVisit
Thanks in advance!
Since you limit the selection to the previous year, and assuming your table has a field that records the date of the visit, you can get your aggregated results per month using this query:
#StandardSQL
SELECT
-- date_field_of_the_visit is a placeholder for a DATE or TIMESTAMP column
EXTRACT(MONTH FROM date_field_of_the_visit) AS Month,
SUM(totals.visits) AS Visits
FROM
`XXXX.ga_sessions_20*`
WHERE
_TABLE_SUFFIX >= '170312'
GROUP BY Month
You can use the DATE_TRUNC function (https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#date_trunc) for that:
#StandardSQL
SELECT
DATE_TRUNC(PARSE_DATE('%Y%m%d',date), MONTH) AS MonthStart,
SUM(totals.visits) AS Visits,
-- aggregate the remaining metrics too, so that each month yields one row
SUM(totals.timeOnSite) AS TimeOnSite,
SUM(totals.newVisits) AS NewVisit
FROM
`XXXX.ga_sessions_20*`
WHERE
_TABLE_SUFFIX >= '180215'
GROUP BY
MonthStart
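The year/month grouping together with the "last 12 months" cutoff can be tried out on a toy table. The sketch below is an assumption-laden illustration, not BigQuery code: it uses Python's built-in sqlite3 module, a made-up table daily_sessions with a YYYYMMDD string column like the GA export's date field, and a fixed reference date passed in as the parameter today so that the result is reproducible. In BigQuery the cutoff would more likely be expressed with DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sessions (date TEXT, visits INTEGER)")
conn.executemany(
    "INSERT INTO daily_sessions VALUES (?, ?)",
    [
        ("20170215", 5),  # older than 12 months before the reference date
        ("20170310", 1),
        ("20170320", 2),
        ("20180105", 4),
        ("20180120", 6),
    ],
)

# Build a 'YYYY-MM' key from the 'YYYYMMDD' string, keep only rows from the
# 12 months before the reference date, and sum the visits per month.
query = """
SELECT substr(date, 1, 4) || '-' || substr(date, 5, 2) AS year_month,
       SUM(visits) AS visits
FROM daily_sessions
WHERE date >= strftime('%Y%m%d', :today, '-12 months')
GROUP BY year_month
ORDER BY year_month
"""
monthly = conn.execute(query, {"today": "2018-03-01"}).fetchall()
print(monthly)
```

Grouping on a combined year and month key, rather than on the month number alone, keeps the same calendar month of two different years apart.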
Hoping for some help with this:
1) I am trying to select all calls (rows) for CustomerIDs that show up 6 or more times within a 30 day rolling period, so if the CustomerID is within the file 6 or more times within 30 days, then it would provide me with all records for that CustomerID.
2) I also need to select all calls for CustomerIDs that show up 2 or more times within a 30 day rolling period but ONLY if two certain columns also match (CallType1 and CallType2). Very similar to the query with the 6 calls but we need to consider that the call types are exactly the same too.
SELECT * FROM tablename
WHERE CustomerID IN (SELECT CustomerID FROM tablename
WHERE "CustomerID"
IN ('MyProgram'));
The query above selects all of the CustomerIDs which reach my program. I need to add the logic to count >=6 CustomerIDs (item 1 above) and then a second query to get the >=2 with the same CallTypes.
The innermost subquery counts the calls of the same customer in the 30-day window beginning at First.Date.
The middle subquery checks this value for every possible window in the table. (This is inefficient, but SQLite versions before 3.25.0 have no window functions.)
SELECT *
FROM TableName
WHERE CustomerID IN (SELECT CustomerID
FROM TableName AS First
WHERE (SELECT COUNT(*)
FROM TableName
WHERE Date BETWEEN First.Date
AND date(First.Date, '+30 days')
AND CustomerID = First.CustomerID
) >= 6)
This assumes that there is a column Date using the default date format (yyyy-mm-dd).
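The query can be checked against a small made-up data set. The sketch below uses Python's built-in sqlite3 module; the customers A to D, their call dates, and the values of CallType1 and CallType2 are invented for illustration. The second run is an assumption about how item 2 of the question could be handled: it adds the two call-type comparisons to the innermost subquery and lowers the threshold to 2. It is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableName "
    "(CustomerID TEXT, Date TEXT, CallType1 TEXT, CallType2 TEXT)"
)
calls = (
    # A: six calls inside one 30-day window, all with the same call types
    [("A", f"2020-01-{day:02d}", "x", "y") for day in (1, 5, 10, 15, 20, 25)]
    # B: six calls roughly 60 days apart, never two within 30 days
    + [("B", f"2020-{month:02d}-01", "x", "y") for month in (1, 3, 5, 7, 9, 11)]
    # C: two calls within 30 days with identical call types
    + [("C", "2020-02-01", "x", "y"), ("C", "2020-02-10", "x", "y")]
    # D: two calls within 30 days, but CallType2 differs
    + [("D", "2020-02-01", "x", "y"), ("D", "2020-02-10", "x", "z")]
)
conn.executemany("INSERT INTO TableName VALUES (?, ?, ?, ?)", calls)

# The query from the answer, with the threshold and an optional extra
# condition on the innermost subquery left open.
template = """
SELECT *
FROM TableName
WHERE CustomerID IN (SELECT CustomerID
                     FROM TableName AS First
                     WHERE (SELECT COUNT(*)
                            FROM TableName
                            WHERE Date BETWEEN First.Date
                                           AND date(First.Date, '+30 days')
                              AND CustomerID = First.CustomerID
                              {extra}
                           ) >= {threshold})
"""

# Item 1: customers with 6 or more calls in some 30-day window.
six_plus = conn.execute(template.format(extra="", threshold=6)).fetchall()

# Item 2 (hypothetical variant): 2 or more calls with matching call types.
same_types = "AND CallType1 = First.CallType1 AND CallType2 = First.CallType2"
two_plus = conn.execute(
    template.format(extra=same_types, threshold=2)
).fetchall()

six_plus_customers = sorted({row[0] for row in six_plus})
two_plus_customers = sorted({row[0] for row in two_plus})
print(six_plus_customers, two_plus_customers)
```

Both runs return every call of a qualifying customer, including calls outside the window or with other call types, because the outer query filters on CustomerID only.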