Google BigQuery - Updating nested Revenue fields - google-analytics

I tried to apply the solution from Google BigQuery - Updating a nested repeated field to the field hits.transaction.transactionRevenue, but I receive the error message:
Scalar subquery produced more than one element
I have tried to run the following query:
UPDATE `project_id.dataset_id.table`
SET hits = ARRAY(
  SELECT AS STRUCT * REPLACE (
    (SELECT AS STRUCT transaction.* REPLACE (1 AS transactionRevenue)) AS transaction
  )
  FROM UNNEST(hits) as transactionRevenue
)
WHERE (select h.transaction.transactionId from unnest(hits) as h) LIKE 'ABC123XYZ'
Are there any obvious mistakes on my part? It would be great if anyone could share some tips or experiences that could help me with this.
What I basically want to do is to set the revenue of a specific transaction to 1.
Many thanks in advance,
David

This is the problem:
WHERE (select h.transaction.transactionId from unnest(hits) as h) LIKE 'ABC123XYZ'
If there is more than one hit in the array, this will cause the error that you are seeing. You probably want this instead:
WHERE EXISTS (select 1 from unnest(hits) as h WHERE h.transaction.transactionId LIKE 'ABC123XYZ')
But note that your UPDATE will now replace all elements of the array for any row where this condition is true. What you may want is to move the condition inside the ARRAY function call instead:
UPDATE `project_id.dataset_id.table`
SET hits = ARRAY(
  SELECT AS STRUCT * REPLACE (
    (SELECT AS STRUCT transaction.* REPLACE (1 AS transactionRevenue)) AS transaction
  )
  FROM UNNEST(hits) as h
  WHERE h.transaction.transactionId LIKE 'ABC123XYZ'
)
WHERE true
Now the replacement will only apply to hits with a transaction ID matching the pattern.
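If you instead need to keep the non-matching hits in the array untouched, a variation on the same idea (just a sketch; I haven't run it against a real GA export table) is to keep every element and make the replacement itself conditional with IF, while restricting the UPDATE to matching rows:
UPDATE `project_id.dataset_id.table`
SET hits = ARRAY(
  SELECT AS STRUCT * REPLACE (
    (SELECT AS STRUCT transaction.* REPLACE (
      IF(transaction.transactionId LIKE 'ABC123XYZ',
         1,
         transaction.transactionRevenue) AS transactionRevenue
    )) AS transaction
  )
  FROM UNNEST(hits) as h
)
WHERE EXISTS (select 1 from unnest(hits) as h WHERE h.transaction.transactionId LIKE 'ABC123XYZ')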

Related

Accessing Struct(s) and Array(s) in Firebase Closed Funnels through BigQuery

I stumbled onto this standard SQL BigQuery documentation this week, which got me started with a Firebase Analytics closed funnel. However, I got the wrong results (see the output below). There should be no users that had a "Tutorial_LessonCompleted" before they had first started a "Tutorial_LessonStarted >> Lesson = 1". This could be for various reasons.
Questions:
Is it wise to use the User Property = "first_open_time", or is it better to use the Event = "first_open"? What would the latter implementation look like?
I suspect I am not correctly drilling down to: Event (String = "Tutorial_LessonStarted") >> parameter (String = "LessonNumber") >> value (String = "lesson1")?
How would a filter on _TABLE_SUFFIX = '20170701' work? I read this will be cheaper. Any optimised code suggestions are received with open arms and an up-vote!
#standardSQL
SELECT
  step1, step2, step3, step4, step5, step6,
  COUNT(*) AS funnel_count,
  COUNT(DISTINCT user_id) AS users
FROM (
  SELECT
    user_dim.app_info.app_instance_id AS user_id,
    event.timestamp_micros AS event_timestamp,
    event.name AS step1,
    LEAD(event.name, 1) OVER (
      PARTITION BY user_dim.app_info.app_instance_id
      ORDER BY event.timestamp_micros ASC) as step2,
    LEAD(event.name, 2) OVER (
      PARTITION BY user_dim.app_info.app_instance_id
      ORDER BY event.timestamp_micros ASC) as step3,
    LEAD(event.name, 3) OVER (
      PARTITION BY user_dim.app_info.app_instance_id
      ORDER BY event.timestamp_micros ASC) as step4,
    LEAD(event.name, 4) OVER (
      PARTITION BY user_dim.app_info.app_instance_id
      ORDER BY event.timestamp_micros ASC) as step5,
    LEAD(event.name, 5) OVER (
      PARTITION BY user_dim.app_info.app_instance_id
      ORDER BY event.timestamp_micros ASC) as step6
  FROM
    `......`,
    UNNEST(event_dim) AS event,
    UNNEST(user_dim.user_properties) AS user_prop
  WHERE user_prop.key = "first_open_time"
  ORDER BY 1, 2, 3, 4, 5 ASC
)
WHERE step6 = "Tutorial_LessonStarted" AND EXISTS (
  SELECT *
  FROM `......`,
    UNNEST(event_dim) AS event,
    UNNEST(event.params)
  WHERE key = 'LessonNumber' AND value.string_value = "lesson1")
GROUP BY step1, step2, step3, step4, step5, step6
ORDER BY funnel_count DESC
LIMIT 100;
Note:
Enter your query table in the FROM clause, e.g. project_id.com_game_example_IOS.app_events_20170212.
I left out the funnel_count and user_count.
Output: (screenshot not included here)
Update since original question above:
@Elliot: I don't understand why you said "-- ensure that an event with lesson1 precedes Tutorial_LessonStarted".
Tutorial_LessonStarted has a parameter "LessonNumber" with values lesson1,lesson2,lesson3,lesson4.
I want to count all funnels that took place with a last step in the funnel equal to LessonNumber=lesson1.
So, applied to the event log data for a brand new user's first session (i.e. a user that fired first_open_time), the answer would be the table below:
View.OnboardingWelcomePage
View.OnboardingFinalPage
View.JamLoading
View.JamLoading
Jam.UserViewsJam
Jam.ProjectOpened
View.JamMixer
Tutorial.LessonStarted (this event's "LessonNumber" parameter value would be "lesson1")
Jam.ProjectPlayStarted
View.JamLoopSelector
View.JamMixer
View.JamLoopSelector
View.JamMixer
View.JamLoopSelector
View.JamMixer
Tutorial.LessonCompleted
Tutorial.LessonStarted (this event's "LessonNumber" parameter value would be "lesson2")
So it is important to first get all the users that had a first_open_time on a specific day, then structure the events into a funnel so that the last event in the funnel is one that matches a specific event and parameter value, and then form the funnel "backwards" from there.
Let me go through some explanation, then see if I can suggest a query to get you started.
It looks like you want to analyze the sequence of events in your analytics data, but the sequence is already there for you--you have an array of the events. Looking at the Firebase schema for BigQuery, event_dim is the relevant column, and unless I'm misunderstanding something, these events are ordered by time. If you want to check what the sixth event's name was, you can use:
event_dim[SAFE_ORDINAL(6)].name
This will evaluate to NULL if there were fewer than six events, or else it will give you the string with the event name.
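For example, here is the difference between ORDINAL and SAFE_ORDINAL on a small literal array (a minimal illustration, not tied to the Firebase schema):
#standardSQL
WITH t AS (SELECT ['a', 'b'] AS arr)
SELECT
  arr[ORDINAL(1)] AS first_element,      -- 'a'
  arr[SAFE_ORDINAL(6)] AS sixth_element  -- NULL instead of an out-of-range error
FROM t;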
Another observation is that you are attempting to analyze both event_dim and user_dim, but you are taking the cross product of the two, which will explode the number of rows and make it hard to reason about the results of the query. To look for a specific user property, use an expression of this form:
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>'
Combining these two filters, your FROM and WHERE clause would look something like this:
FROM `project_id.com_game_example_IOS.app_events_*`
WHERE _TABLE_SUFFIX = '20170701' AND
event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>'
Using the bracket operator to access the steps from event_dim, we can do something like this:
WITH FilteredInput AS (
  SELECT *
  FROM `project_id.com_game_example_IOS.app_events_*`
  WHERE _TABLE_SUFFIX = '20170701' AND
    event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
    (SELECT value.value.string_value
     FROM UNNEST(user_dim.user_properties)
     WHERE key = 'first_open_time') = '<expected property value>' AND
    -- ensure that an event with lesson1 precedes Tutorial_LessonStarted
    EXISTS (
      SELECT 1
      FROM UNNEST(event_dim) WITH OFFSET event_offset
      CROSS JOIN UNNEST(params)
      WHERE key = 'LessonNumber' AND
        value.string_value = 'lesson1' AND
        event_offset < 5
    )
)
SELECT
  event_dim[ORDINAL(1)].name AS step1,
  event_dim[ORDINAL(2)].name AS step2,
  event_dim[ORDINAL(3)].name AS step3,
  event_dim[ORDINAL(4)].name AS step4,
  event_dim[ORDINAL(5)].name AS step5,
  event_dim[ORDINAL(6)].name AS step6,
  COUNT(*) AS funnel_count,
  COUNT(DISTINCT user_dim.user_id) AS users
FROM FilteredInput
GROUP BY step1, step2, step3, step4, step5, step6;
This will return all unique "paths" along with a count and number of distinct users for each. Note that I'm just writing this off the top of my head--I don't have representative data that I can try it on--so there may be syntax or other errors.

Insert into Table with the first column being a Sequence

I am trying to get an INSERT, a sequence, and a SELECT * to work together.
INSERT INTO BRK_INDV
Select * from (
  Select brk_seq.NEXTVAL as INDV_SEQ, a.*
  FROM (
    select to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY') BUSINESS_DAY,
           to_char(REQUEST_DATETIME,'hh24') src_hour,
           CASE tran_type
             WHEN 'V' THEN 'Visa'
             WHEN 'M' THEN 'MasterCard'
             ELSE tran_type
           end text,
           tran_type, count(*) as count
    from DLY_STATS
    where 1=1
      AND to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY') = '09-FEB-2015'
    group by to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY'),
             to_char(REQUEST_DATETIME,'hh24'), tran_type
    order by src_hour
  ) a
);
This gives me the following error:
ERROR at line 2:
ORA-02287: sequence number not allowed here
I tried to remove the order by and still the same error.
However, if I only run
Select brk_seq.NEXTVAL as INDV_SEQ, a.*
FROM (
  select to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY') BUSINESS_DAY,
         to_char(REQUEST_DATETIME,'hh24') src_hour,
         CASE tran_type
           WHEN 'V' THEN 'Visa'
           WHEN 'M' THEN 'MasterCard'
           ELSE tran_type
         end text,
         tran_type, count(*) as count
  from DLY_STATS
  where 1=1
    AND to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY') = '09-FEB-2015'
  group by to_date(to_char(REQUEST_DATETIME,'DD-MM-YYYY'),'DD-MM-YYYY'),
           to_char(REQUEST_DATETIME,'hh24'), tran_type
  order by src_hour
) a;
It shows me the proper entries. So why does the SELECT * version not work?
Kindly help.
I see what you're trying to do. You want to insert rows into the BRK_INDV table in a particular order. The sequence number, which I assume will be the primary key of BRK_INDV, will be generated sequentially in the sorted order of the input rows.
You are working with a relational database. One of the first characteristics we all learn about a relational database is that the order of the rows in a table is insignificant. That's just a fancy word for fugitaboutit.
You cannot assume that a select * from table will return the rows in the same order they were written. It might. It might for quite a long time. Then something -- the number of rows, the grouping of some column values, the phase of the moon -- something will change and you will get them out in a seemingly totally random order.
If you want order, it must be imposed in the query, not the insert.
Here's the statement you should be executing:
INSERT INTO BRK_INDV
With Grouped( Business_Day, Src_Hour, Text, Tran_Type, Count ) As (
  Select Trunc( Request_Datetime ) Business_Day,
         To_Char( Request_Datetime, 'hh24') Src_Hour,
         Case Tran_Type
           When 'V' Then 'Visa'
           When 'M' Then 'MasterCard'
           Else Tran_Type
         end Text,
         Tran_Type, count(*) as count
  from DLY_STATS
  Where 1=1 --> Generated as dynamic SQL?
    And Request_Datetime >= Date '2015-02-09'
    And Request_Datetime < Date '2015-02-10'
  Group By Trunc( Request_Datetime ), To_Char( Request_Datetime, 'hh24'), Tran_Type
)
Select brk_seq.Nextval Indv_Seq, G.*
from Grouped G;
Notice there is no order by. If you want to see the generated rows in a particular order:
select * from Brk_Indv order by src_hour;
Since there could be hundreds or thousands of transactions in any particular hour, you would probably order by something other than the hour anyway.
In Oracle, the trunc function is the best way to get a date with the time portion stripped away. However, you don't want to use it in the WHERE clause (or, for that matter, any other function such as to_date or to_char), as that would make the predicate non-sargable and result in a full table scan.
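To make that concrete, here is the same date filter written both ways (assuming a plain, non function-based index on REQUEST_DATETIME; the second form is what the statement above uses):
-- Non-sargable: wrapping the column in TRUNC hides it from the index and forces a full scan
SELECT COUNT(*)
FROM   DLY_STATS
WHERE  TRUNC(REQUEST_DATETIME) = DATE '2015-02-09';
-- Sargable: compare the bare column against a half-open date range; the index can be used
SELECT COUNT(*)
FROM   DLY_STATS
WHERE  REQUEST_DATETIME >= DATE '2015-02-09'
AND    REQUEST_DATETIME <  DATE '2015-02-10';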
The problem is that you can't use a sequence in a subquery. For example, this gives the same ORA-02287 error you are getting:
create table T (x number);
create sequence s;
insert into T (select * from (select s.nextval from dual));
What you can do, though, is create a function that returns nextval from the sequence, and use that in a subquery:
create function f return number as
begin
return s.nextval;
end;
/
insert into T (select * from (select f() from dual));

MDX COUNT number of customers affected by an MDX query

I'm having a problem getting the count of an MDX query. I have a first query like this:
SELECT { [Measures].[Sale Amount] } ON COLUMNS,
NON EMPTY FILTER (
{[Customer].[Full Name].Children} * {[Report].[Name].Children}
,
([Measures].[Sale Amount] > 100)
AND (([Report].[Name].&[Report1]) OR ([Report].[Name].&[Report2]))
AND ([Report].[Name].&[Report3])
) ON ROWS
FROM [Default]
This will display the data that I need. But from here I need to know how many customers are within this result. For that I have the following MDX query
WITH MEMBER MEASURES.X AS Exists(
[Customer].[Customer Key].Children,
FILTER (
{[Customer].[Full Name].Children} * {[Report].[Name].Children}
,
([Measures].[Sale Amount] > 100)
AND (([Report].[Name].&[Report1]) OR ([Report].[Name].&[Report2]))
AND ([Report].[Name].&[Report3])
), 'Customer').Count
SELECT Measures.X ON 0 FROM [Default]
(the filter area of both queries is the same)
This last query always returns 0 results. I know that there should be customers affected by this query. Can anyone give me a tip on what I am doing wrong?
Thanks
I would just use the count of the filter:
WITH MEMBER MEASURES.X AS
FILTER (
{[Customer].[Full Name].Children} * {[Report].[Name].Children}
,
([Measures].[Sale Amount] > 100)
AND (([Report].[Name].&[Report1]) OR ([Report].[Name].&[Report2]))
AND ([Report].[Name].&[Report3])
).Count
SELECT Measures.X ON 0 FROM [Default]

TYPO3 Extbase order by child records COUNT

I have this model:
News -> 1:n -> Visit
News -> m:n -> FrontendUserGroup
FrontendUser -> 1:n -> Visit
So Visit shows which FrontendUser accessed which News.
I need to get all news for the currently logged in FrontendUser.
All news should be ordered DESC by the "datetime" property, but the news not yet visited by the logged-in user should appear first.
This is the SQL which gives me the correct results:
SELECT
(SELECT COUNT(*) FROM tx_xxnews_domain_model_visit v WHERE v.news = n.uid AND v.fe_user = fu.uid) AS visits,
n.*
FROM tx_xxnews_domain_model_news AS n
JOIN tx_xxnews_news_frontendusergroup_mm nfg ON n.uid = nfg.uid_local
JOIN fe_users fu ON FIND_IN_SET(nfg.uid_foreign, fu.usergroup)
WHERE fu.uid = 2271 # logged in user id
ORDER BY visits ASC, n.datetime DESC
Is there any way to get this result with Extbase?
I tried this in NewsRepository:
protected $defaultOrderings = array(
'COUNT(visits)' => \TYPO3\CMS\Extbase\Persistence\QueryInterface::ORDER_ASCENDING,
'datetime' => \TYPO3\CMS\Extbase\Persistence\QueryInterface::ORDER_DESCENDING
);
but it doesn't seem to work.
Any ideas?
Thank you.
This is in fact not possible since $defaultOrderings expects properties, not SQL field names or functions.
As far as I know, the only possibility is to use $query->statement('[YOUR QUERY]');

PL/SQL - comma-separated list within an IN clause

I am having trouble getting a block of PL/SQL code to work. At the top of my procedure I get some data from my Oracle APEX application about which checkboxes are checked. Because the report that contains the checkboxes is generated dynamically, I have to loop through the
APEX_APPLICATION.G_F01
list and generate a comma-separated string, which looks like this:
v_list VARCHAR2(255) := '1,3,5,9,10';
I then want to query on that list later and use v_list in an IN clause, like so:
SELECT * FROM users
WHERE user_id IN (v_list);
This of course throws an error. My question is: what can I convert v_list to in order to be able to use it in an IN clause in a query within a PL/SQL procedure?
If users is small and user_id doesn't contain commas, you could use:
SELECT * FROM users WHERE ',' || v_list || ',' LIKE '%,'||user_id||',%'
This query is not optimal though because it can't use indexes on user_id.
I advise you to use a pipelined function that returns a table of NUMBER that you can query directly. For example:
CREATE TYPE tab_number IS TABLE OF NUMBER;
/
CREATE OR REPLACE FUNCTION string_to_table_num(p VARCHAR2)
  RETURN tab_number
  PIPELINED IS
BEGIN
  FOR cc IN (SELECT rtrim(regexp_substr(str, '[^,]*,', 1, level), ',') res
             FROM (SELECT p || ',' str FROM dual)
             CONNECT BY level <= length(str)
                                 - length(replace(str, ',', ''))) LOOP
    PIPE ROW(cc.res);
  END LOOP;
  RETURN;
END;
/
You would then be able to build queries such as:
SELECT *
FROM users
WHERE user_id IN (SELECT *
                  FROM TABLE(string_to_table_num('1,2,3,4,5')));
You can use XMLTABLE as follows
SELECT * FROM users
WHERE user_id IN (SELECT to_number(column_value) FROM XMLTABLE(v_list));
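To see what that produces, you can run the same expression against a literal list; each value comes back as one row in COLUMN_VALUE:
SELECT to_number(column_value) AS user_id
FROM   XMLTABLE('1,3,5,9,10');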
I have tried to find a solution for that too but never succeeded. You can build the query as a string and then run EXECUTE IMMEDIATE, see http://docs.oracle.com/cd/B19306_01/appdev.102/b14261/dynamic.htm#i14500.
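For a query that returns multiple rows, that would look roughly like this (a sketch only; and because v_list is concatenated into the SQL text, make sure it really contains nothing but digits and commas, otherwise this is open to SQL injection):
DECLARE
  TYPE t_ids IS TABLE OF users.user_id%TYPE;
  v_list VARCHAR2(255) := '1,3,5,9,10';
  v_ids  t_ids;
BEGIN
  -- build the statement as a string and fetch all matching ids in one go
  EXECUTE IMMEDIATE
    'SELECT user_id FROM users WHERE user_id IN (' || v_list || ')'
    BULK COLLECT INTO v_ids;
  DBMS_OUTPUT.PUT_LINE(v_ids.COUNT || ' matching users');
END;
/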
That said, it just occurred to me that the argument of an IN clause can be a sub-select:
SELECT * FROM users
WHERE user_id IN (SELECT something FROM somewhere)
So, is it possible to expose the checkbox values as a stored function? Then you might be able to do something like:
SELECT * FROM users
WHERE user_id IN (SELECT my_package.checkbox_func FROM dual)
Personally, I like this approach:
with t as (select 'a,b,c,d,e' str from dual)
--
select val
from t, xmltable('/root/e/text()'
passing xmltype('<root><e>' || replace(t.str,',','</e><e>')|| '</e></root>')
columns val varchar2(10) path '/'
)
This, among other examples, can be found in the thread Split Comma Delimited String Oracle.
If you feel like wading through even more options, visit the OTN PL/SQL forums.
