COUNT(totals.visits) - is it an accurate measure of sessions?

I am trying to write a query in Google BigQuery (BQ), where our GA data is exported. The query is below:
SELECT visitStartTime,date,hits.eCommerceAction.*,count(totals.visits)
FROM flatten([bigquery-xxxxxx:xxxxxxxx.ga_sessions_20180925],hits.eCommerceAction)
WHERE hits.eCommerceAction.action_type <> '0'
GROUP BY date,visitStartTime,hits.eCommerceAction.action_type,hits.eCommerceAction.option,hits.eCommerceAction.step
LIMIT 1000
The output from this looks something like this:
date hits_type hits_step hits_option f0_
20180925 5 1 1 0
20180925 2 1 0 1
My question: when an ecommerce hit is being sent, how can the session count (the f0_ column) be 0? Since totals.visits can return 1 or NULL, and COUNT only counts non-NULL values, should I be counting another field, such as visitId, to avoid NULLs? All the tutorials online use totals.visits, so I am confused about whether I am missing something here.
Thanks

If a session contains only non-interaction hits, totals.visits will be NULL, so COUNT(totals.visits) does not count that session. If you want to include sessions with both interaction and non-interaction hits, it's correct to count unique visitId + fullVisitorId combinations.
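To see the difference, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for BigQuery. The rows are hypothetical: `totals_visits` is NULL for the one session that contains only non-interaction hits. The "-" separator in the concatenated key is an assumption added so that pairs such as ("A1", 1) and ("A", 11) cannot produce the same key.

```python
import sqlite3

# Hypothetical rows mimicking ga_sessions: (fullVisitorId, visitId, totals_visits).
# totals_visits is None (NULL) for a session with only non-interaction hits.
rows = [
    ("A", 1, 1),
    ("A", 2, None),  # non-interaction-only session
    ("B", 1, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (fullVisitorId TEXT, visitId INTEGER, totals_visits INTEGER)"
)
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)

# COUNT(column) skips NULLs, so the non-interaction session is not counted.
count_visits = conn.execute("SELECT COUNT(totals_visits) FROM sessions").fetchone()[0]

# Counting distinct fullVisitorId + visitId pairs counts every session.
count_sessions = conn.execute(
    "SELECT COUNT(DISTINCT fullVisitorId || '-' || visitId) FROM sessions"
).fetchone()[0]

print(count_visits, count_sessions)  # 2 3
```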


TERADATA: Is it possible to ignore rows in an OLAP partition when the condition is met and still pass the value down when it isn't met?

I'm partitioning data based on a customer's previous order. If the customer previously added a service to their account (they either have the service or they don't), I want that value to carry down to the customer's next row for all orders, regardless of order status. However, I don't want services from canceled orders to be carried into the next order; I want to skip those rows and bring down the value from the previous completed order. Does anyone know if this is possible? If I add the status field to the PARTITION BY clause, it partitions by order status instead of reporting the value from the previous completed order.
(
  SUM(SUBSCR1_ORD) OVER (
    PARTITION BY ACCT_NO
    ORDER BY ORDER_DATE
    ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
  )
) AS EXISTING_SVC1
This is what I'd want the results to look like for the EXISTING_SVC columns, based on activity in the SUBSCR1_ORD/SUBSCR2_ORD columns with special handling of ORDER_STATUS:
ACCT_NO  ORDER_DATE  ORDER_STATUS  SUBSCR1_ORD  SUBSCR2_ORD  EXISTING_SVC1  EXISTING_SVC2
1234     6/5/2022    Complete      1            null         0              0
1234     6/6/2022    Canceled      -1           1            1              0
1234     6/7/2022    Complete      null         1            1              0
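Use LAG with IGNORE NULLS and a CASE expression to "pull down" the prior value. The CASE turns canceled orders into NULL, IGNORE NULLS skips those rows (and rows where the subscription column is already NULL), and the third LAG argument (0) is returned when there is no earlier non-NULL value.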
SELECT Acct_No, Order_Date, Order_Status, Subscr1_Ord, Subscr2_Ord,
LAG(CASE WHEN Order_Status='Canceled' THEN NULL ELSE Subscr1_Ord END,1,0)
IGNORE NULLS
OVER(PARTITION BY Acct_No ORDER BY Order_Date)
AS Existing_Svc1,
LAG(CASE WHEN Order_Status='Canceled' THEN NULL ELSE Subscr2_Ord END,1,0)
IGNORE NULLS
OVER(PARTITION BY Acct_No ORDER BY Order_Date)
AS Existing_Svc2
FROM MyTable
ORDER BY Order_Date;
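As a sanity check of the logic (not of Teradata itself), here is a small Python emulation of that LAG ... IGNORE NULLS expression applied to the example rows from the question. It assumes rows are ordered by (acct_no, order_date) and writes the dates as ISO strings so that text sorting is chronological; `existing_service` is a hypothetical helper name.

```python
from itertools import groupby

# Example rows from the question: (acct_no, order_date, order_status, subscr1_ord, subscr2_ord).
ROWS = [
    (1234, "2022-06-05", "Complete", 1, None),
    (1234, "2022-06-06", "Canceled", -1, 1),
    (1234, "2022-06-07", "Complete", None, 1),
]


def existing_service(rows, col):
    """Emulate
        LAG(CASE WHEN status = 'Canceled' THEN NULL ELSE <col> END, 1, 0)
            IGNORE NULLS OVER (PARTITION BY acct_no ORDER BY order_date)
    and return one value per row, in (acct_no, order_date) order."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    result = []
    for _acct, partition in groupby(ordered, key=lambda r: r[0]):
        carried = 0  # LAG default (third argument) when no earlier non-NULL value exists
        for row in partition:
            result.append(carried)  # most recent earlier non-NULL value in this partition
            value = None if row[2] == "Canceled" else row[col]
            if value is not None:  # IGNORE NULLS: NULLs never replace the carried value
                carried = value
    return result


print(existing_service(ROWS, 3))  # EXISTING_SVC1
print(existing_service(ROWS, 4))  # EXISTING_SVC2
```

Both outputs match the EXISTING_SVC columns in the desired results table.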

How to Subtract Pageviews in Google Data Studio using a CASE statement?

I want to subtract the Pageviews of one page from the Pageviews of a different page, but when I use COUNT with CASE like this, I get 1:
COUNT(CASE
WHEN page = "www.link1.com" THEN 1 END)
This gives me a wrong COUNT:
COUNT(CASE
WHEN page = "www.link1.com" THEN 1
ELSE 0 END)
What I ultimately want to do is:
COUNT(CASE
WHEN page="www.link1.com" OR page = "www.link2.com" THEN 1
ELSE 0 END) - COUNT(CASE
WHEN page="www.link3.com" THEN 1
ELSE 0 END)
I want the COUNT of Users who have visited link3 but NOT via link1 or link2. These links are steps in a funnel: link1 is the first step, but link2 and link3 have more Pageviews. I want to show how many users have come from sources other than the previous funnel step (i.e., link1).
Summary
This can be achieved using either the RegEx formula (#2) or the CASE statement (#3). However, as Pageviews is an aggregated metric, these calculated fields produce the following message when created at the data source:
Sorry, calculated fields can't mix metrics (aggregated values) and dimensions (non-aggregated values). Please check the aggregation types of the fields used in this formula. Learn more.
For future reference, added an Image:
The solution is to first use Data Blending to disaggregate the Pageviews field (#1) and then apply the Calculated Field (#2 or #3):
1) Data Blending
Data Source 1
Join Key 1: Date
Join Key 2: Page
Metric: Pageviews
Data Source 2
Join Key 1: Date
Join Key 2: Page
An image to elaborate:
2) RegEx Formula
SUM(NARY_MAX(CAST(REGEXP_REPLACE(CONCAT(Page, ";", Pageviews), "(www\\.link1\\.com|www\\.link2\\.com);(.*)", "\\2") AS NUMBER ), 0 ) ) - SUM(NARY_MAX(CAST(REGEXP_REPLACE(CONCAT(Page, ";", Pageviews), "(www\\.link3\\.com);(.*)", "\\2") AS NUMBER ), 0 ) )
3) (Alternative Calculated Field) CASE Statement
SUM(CASE
WHEN Page IN ("www.link1.com", "www.link2.com") THEN Pageviews
ELSE 0 END) - SUM(CASE
WHEN Page IN ("www.link3.com") THEN Pageviews
ELSE 0 END)
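A rough Python sketch of what the CASE calculated field computes once the blend has disaggregated Pageviews into one value per Date/Page row. The page names follow the question; the pageview numbers are made up. It also shows, assuming standard SQL COUNT semantics (count every non-NULL value), why COUNT(CASE ... ELSE 0 END) counts every row while SUM does not.

```python
# Hypothetical blended rows: (page, pageviews), one per Date/Page combination.
blended = [
    ("www.link1.com", 120),
    ("www.link2.com", 300),
    ("www.link3.com", 350),
    ("www.other.com", 50),
]


def funnel_difference(rows):
    """SUM(Pageviews for link1/link2) - SUM(Pageviews for link3), like the CASE field."""
    first_steps = sum(pv if page in ("www.link1.com", "www.link2.com") else 0 for page, pv in rows)
    third_step = sum(pv if page == "www.link3.com" else 0 for page, pv in rows)
    return first_steps - third_step


# COUNT(CASE WHEN page = link1 THEN 1 ELSE 0 END): 0 is not NULL, so every row is counted.
case_values = [1 if page == "www.link1.com" else 0 for page, _ in blended]
count_like = sum(1 for v in case_values if v is not None)
# SUM over the same CASE only adds up the 1s.
sum_like = sum(case_values)

print(funnel_difference(blended), count_like, sum_like)  # 70 4 1
```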
Google Data Studio Report and a GIF to elaborate:

BigQuery GA Exported with Duplicated Rows

We have been trying to explain why this happened in all of our datasets, but so far we have had no success.
We observed that starting on 18 April, most entries in our ga_sessions dataset (around 99% of rows) were duplicated. As an example, I tested this query:
SELECT
fullvisitorid fv,
visitid v,
ARRAY(
SELECT
AS STRUCT hits.*
FROM
UNNEST(hits) hits
ORDER BY
hits.hitnumber) h
FROM
`dafiti-analytics.40663402.ga_sessions*`
WHERE
1 = 1
AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)) AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
ORDER BY
fv,
v
LIMIT
100
And the result was:
We tried to investigate when this began to happen, so I ran this query:
SELECT
date,
f,
COUNT(f) freq from(
SELECT
date,
fullvisitorid fv,
visitid v,
COUNT(CONCAT(fullvisitorid, CAST(visitid AS string))) f
FROM
`dafiti-analytics.40663402.ga_sessions*`
WHERE
1 = 1
AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-04-01')
AND TIMESTAMP('2017-04-30')
GROUP BY
fv,
v,
date )
GROUP BY
f,
date
ORDER BY
date,
freq DESC
We found that for 3 of our projects it started on 18 April, while in accounts with LATAM data we only started seeing duplicated rows recently.
We also checked whether anything was logged in our GCP Console but couldn't find anything.
Is there some mistake we could have made that caused the duplication in the ga_sessions export? We checked our analytics tracking and it seems to be working just fine. We also made no changes during that period that would explain it.
If you need more info please let me know.
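Make sure to match only the intraday tables or only the non-intraday tables. A wildcard such as `ga_sessions*` matches both the daily ga_sessions_YYYYMMDD tables and the ga_sessions_intraday_YYYYMMDD tables, and the `.*_(.*)` suffix extraction returns the same date for both, so any day that has both tables is read twice. For intraday: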
`dafiti-analytics.40663402.ga_sessions_intraday*`
For non-intraday:
`dafiti-analytics.40663402.ga_sessions_2017*`
The important part is to include enough of the prefix to match the desired tables.
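A small Python sketch of why the broad prefix duplicates rows. It uses `fnmatch` to imitate a trailing-`*` wildcard and applies the question's `REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')` to whatever follows the prefix (what BigQuery exposes as `_TABLE_SUFFIX`). The table names are hypothetical examples for a single day.

```python
import fnmatch
import re

# Hypothetical tables present in one GA export dataset for a single day.
TABLES = ["ga_sessions_20170418", "ga_sessions_intraday_20170418"]


def dates_by_table(prefix):
    """For each table matched by `<prefix>*`, return the date the question's
    suffix filter would extract (None where REGEXP_EXTRACT would return NULL)."""
    out = {}
    for table in TABLES:
        if fnmatch.fnmatchcase(table, prefix + "*"):
            suffix = table[len(prefix):]       # the part after the wildcard prefix
            m = re.search(r".*_(.*)", suffix)  # same pattern as in the question
            out[table] = m.group(1) if m else None
    return out


print(dates_by_table("ga_sessions"))           # both tables match and yield '20170418'
print(dates_by_table("ga_sessions_intraday"))  # intraday table only
print(dates_by_table("ga_sessions_2017"))      # daily table only; suffix is '0418'
```

Note that with the narrower `ga_sessions_2017*` prefix the suffix no longer contains an underscore, so the `.*_(.*)` extraction from the question would need to change as well (for example, comparing `_table_suffix` directly).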

How to find maximum number of records for a particular key in a table

I was trying to find which customer has the most records in a table. Someone suggested the RANK function, but it wasn't useful for finding the exact record, so I used the following snippet:
select count(customerkey),customerkey
FROM FILEMAPPERTEMPLATE
group by customerkey;
Result (count, customerkey):
1 298,254
1 299,732
2 246,027
43 197,053
1 299,745
1 299,751
60 271,623
Though I am able to find how many records are attributed to each customerkey in the table, I couldn't find the single record (after executing the query) with the maximum count for a customer. Please help.
I want only this row as the result:
60 271,623
select * from (select count(customerkey) cnt,customerkey
FROM FILEMAPPERTEMPLATE
group by customerkey order by cnt desc) where rownum<2;
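The same shape (aggregate in a subquery, sort by the count, keep the first row) can be tried out with Python's sqlite3. SQLite has no ROWNUM, so `LIMIT 1` plays the role of `rownum<2` here; the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEMAPPERTEMPLATE (customerkey INTEGER)")
# Hypothetical data: customer 271623 has the most rows.
conn.executemany(
    "INSERT INTO FILEMAPPERTEMPLATE VALUES (?)",
    [(298254,), (246027,), (246027,), (271623,), (271623,), (271623,)],
)

# Aggregate per customer, order by the count descending, then keep only the top row.
top = conn.execute(
    """
    SELECT * FROM (
        SELECT COUNT(customerkey) AS cnt, customerkey
        FROM FILEMAPPERTEMPLATE
        GROUP BY customerkey
        ORDER BY cnt DESC
    ) LIMIT 1
    """
).fetchone()

print(top)  # (3, 271623)
```

If two customers tie for the highest count, this returns only one of them; a RANK-based filter would keep both.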

Cognos: Count the number of occurences of a distinct id

I'm making a report in Cognos Report Studio and I'm having a bit of trouble getting a count that I need. I need to count the number of IDs for a department, split between initiated and completed: if an ID occurs more than once, it is counted as completed, and the others are counted as initiated. So I'm trying to count the number of occurrences of each distinct ID. Here is the query I've made in SQL Developer:
SELECT
COUNT((CASE WHEN COUNT(S.RFP_ID) > 8 THEN MAX(CT.GCT_STATUS_HISTORY_CLOSE_DT) END)) AS "Sales Admin Completed"
,COUNT((CASE WHEN COUNT(S.RFP_ID) = 8 THEN MIN(CT.GCT_STATUS_HISTORY_OPEN_DT) END)) as "Sales Admin Initiated"
FROM
ADM.B_RFP_WC_COVERAGE_DIM S
JOIN ADM.B_GROUP_CHANGE_REQUEST_DIM CR
ON S. RFP_ID = CR.GCR_RFP_ID
JOIN ADM.GROUP_CHANGE_TASK_FACT CT
ON CR.GROUP_CHANGE_REQUEST_KEY = CT.GROUP_CHANGE_REQUEST_KEY
JOIN ADM.B_DEPARTMENT_DIM D
ON D.DEPARTMENT_KEY = CT.DEPARTMENT_RESP_KEY
WHERE CR.GCR_CHANGE_TYPE_ID = '20'
AND S.RFP_LOB_IND = 'WC'
AND S.RFP_AUDIT_IND = 'N'
AND CR.GCR_RECEIVED_DT BETWEEN '01-JAN-13' AND '31-DEC-13'
AND D.DEPARTMENT_DESC = 'Sales'
AND CT.GCT_STATUS_IND = 'C'
GROUP BY S.RFP_ID ;
Now this works, but I'm not sure how to translate that into Cognos. I tried a CASE that looked like this (this code uses basic names such as dept instead of D.DEPARTMENT_DESC):
CASE WHEN dept = 'Sales' AND count(ID for {DISTINCT ID}) > 1 THEN count(distinct ID)END)
I'm using count(distinct ID) instead of count(maximum(close_date)), but the results would be the same anyway. The "AND" is where I think it's being lost; it obviously isn't the proper way to count occurrences, but I'm hoping I'm close. Is there a way to do this with a CASE, or at all?
--EDIT--
To make my question more clear, here is an example:
Say I have this data in my table
ID
---
1
2
3
4
2
5
5
6
2
My desired count output would be:
Initiated Completed
--------- ---------
4 2
This is because two of the distinct IDs (2 and 5) occur more than once, so they are counted as Completed. The ones that occur only once are counted as Initiated. I am able to do this in SQL Developer, but I can't figure out how to do it in Cognos Report Studio. I hope this helps to better explain my issue.
I didn't quite get it originally, so I'm amending the answer.
It's still easiest to do with 2 queries in Report Studio. The key point is that you can use a query as the source for another query, which guarantees proper GROUP BYs and calculations.
So if you have the ID list in a table in Report Studio, you create:
Query 1 with dataitems:
ID,
count(*) or count (1) as count_occurences
status (initiated or completed) with a formula: if (count_occurences > 1) then ('completed') else ('initiated').
After that you create a query 2 using query one as source with just 2 data items:
[Query1].[Status]
Count with formula: count([Query1].[ID])
That will give you the result you're after.
Here's a link to the documentation on how to nest queries:
http://pic.dhe.ibm.com/infocenter/cx/v10r1m0/topic/com.ibm.swg.ba.cognos.ug_cr_rptstd.10.1.0.doc/c_cr_rptstd_wrkdat_working_with_queries_rel.html?path=3_3_10_6#cr_rptstd_wrkdat_working_with_queries_rel
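The two-query idea can be checked outside Cognos with a short Python sketch on the example IDs from the question: the first step mirrors Query 1 (occurrences and status per ID), the second mirrors Query 2 (count of IDs per status). This only illustrates the logic; it is not Report Studio syntax.

```python
from collections import Counter

ids = [1, 2, 3, 4, 2, 5, 5, 6, 2]  # example data from the question

# Query 1: one row per ID with its occurrence count and the derived status.
occurrences = Counter(ids)
query1 = {i: ("completed" if n > 1 else "initiated") for i, n in occurrences.items()}

# Query 2: uses Query 1 as its source and counts IDs per status.
query2 = Counter(query1.values())

print(query2["initiated"], query2["completed"])  # 4 2
```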
