Undesired flattening occurring - google-analytics

I'm using BigQuery on exported GA data (see the schema here).
Looking at the documentation, I see that when I select a field that is inside a record, it will automatically flatten that record and duplicate the surrounding columns.
So I tried to create a denormalized table that I could query with a more SQL-like mindset:
SELECT
CONCAT(date, " ",
  IF(hits.hour < 10, CONCAT("0", STRING(hits.hour)), STRING(hits.hour)), ":",
  IF(hits.minute < 10, CONCAT("0", STRING(hits.minute)), STRING(hits.minute))) AS hit_date__STRING,
CONCAT(fullVisitorId, STRING(visitId)) AS session_id__STRING,
fullVisitorId AS google_identity__STRING,
MAX(IF(hits.customDimensions.index=7, hits.customDimensions.value,NULL)) WITHIN RECORD AS customer_id__LONG,
hits.hitNumber AS hit_number__INT,
hits.type AS hit_type__STRING,
hits.isInteraction AS hit_is_interaction__BOOLEAN,
hits.isEntrance AS hit_is_entrance__BOOLEAN,
hits.isExit AS hit_is_exit__BOOLEAN,
hits.promotion.promoId AS promotion_id__STRING,
hits.promotion.promoName AS promotion_name__STRING,
hits.promotion.promoCreative AS promotion_creative__STRING,
hits.promotion.promoPosition AS promotion_position__STRING,
hits.eventInfo.eventCategory AS event_category__STRING,
hits.eventInfo.eventAction AS event_action__STRING,
hits.eventInfo.eventLabel AS event_label__STRING,
hits.eventInfo.eventValue AS event_value__INT,
device.language AS device_language__STRING,
device.screenResolution AS device_resolution__STRING,
device.deviceCategory AS device_category__STRING,
device.operatingSystem AS device_os__STRING,
geoNetwork.country AS geo_country__STRING,
geoNetwork.region AS geo_region__STRING,
hits.page.searchKeyword AS hit_search_keyword__STRING,
hits.page.searchCategory AS hits_search_category__STRING,
hits.page.pageTitle AS hits_page_title__STRING,
hits.page.pagePath AS page_path__STRING,
hits.page.hostname AS page_hostname__STRING,
hits.eCommerceAction.action_type AS commerce_action_type__INT,
hits.eCommerceAction.step AS commerce_action_step__INT,
hits.eCommerceAction.option AS commerce_action_option__STRING,
hits.product.productSKU AS product_sku__STRING,
hits.product.v2ProductName AS product_name__STRING,
hits.product.productRevenue AS product_revenue__INT,
hits.product.productPrice AS product_price__INT,
hits.product.productQuantity AS product_quantity__INT,
hits.product.productRefundAmount AS product_refund_amount__INT,
hits.product.v2ProductCategory AS product_category__STRING,
hits.transaction.transactionId AS transaction_id__STRING,
hits.transaction.transactionCoupon AS transaction_coupon__STRING,
hits.transaction.transactionRevenue AS transaction_revenue__INT,
hits.transaction.transactionTax AS transaction_tax__INT,
hits.transaction.transactionShipping AS transaction_shipping__INT,
hits.transaction.affiliation AS transaction_affiliation__STRING,
hits.appInfo.screenName AS app_current_name__STRING,
hits.appInfo.screenDepth AS app_screen_depth__INT,
hits.appInfo.landingScreenName AS app_landing_screen__STRING,
hits.appInfo.exitScreenName AS app_exit_screen__STRING,
hits.exceptionInfo.description AS exception_description__STRING,
hits.exceptionInfo.isFatal AS exception_is_fatal__BOOLEAN
FROM
[98513938.ga_sessions_20151112]
HAVING
customer_id__LONG IS NOT NULL
AND customer_id__LONG != 'NA'
AND customer_id__LONG != ''
I wrote the result of this query into another table, denorm (flatten results on, allow large results on).
I get different results when I query denorm with the clause
WHERE session_id__STRING = "100001897901013346771447300813"
versus wrapping the above query in a subquery (which yields the desired results):
SELECT * FROM (_above query_) AS foo WHERE session_id__STRING = "100001897901013346771447300813"
I'm sure this is by design, but could someone explain the difference between these two methods? That would be very helpful.

I believe you are saying that you did check the box "Flatten Results" when you created the output table? And I assume from your question that session_id__STRING is a repeated field?
If those are correct assumptions, then what you are seeing is exactly the behavior you referenced from the documentation above. You asked BigQuery to "flatten results" so it turned your repeated field into an un-repeated field and duplicated all the fields around it so that you have a flat (i.e., no repeated data) table.
If the desired behavior is the one you see when querying over the subquery, then you should uncheck that box when creating your table.
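If you are creating the table from the bq command-line tool instead of the web UI, the equivalent is roughly the following sketch (the destination dataset/table name is a placeholder); --noflatten_results is what corresponds to leaving "Flatten Results" unchecked:
bq query --allow_large_results --noflatten_results \
  --destination_table=mydataset.denorm \
  "SELECT ... (the query above) ..."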

"Looking at the documentation, I see that when I select a field that is inside a record it will automatically flatten that record and duplicate the surrounding columns."
This is not correct. BTW, can you please point to the documentation - it needs to be improved.
Selecting a field does not flatten that record. So if you have a table T with a single record {a = 1, b = (2, 2, 3)}, then do
SELECT * FROM T WHERE b = 2
You still get a single record {a = 1, b = (2, 2)}. SELECT COUNT(a) from this subquery would return 1.
But once you write the results of this query with flatten=on, you get two records: {a = 1, b = 2}, {a = 1, b = 2}. SELECT COUNT(a) from the flattened table would return 2.
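To make the COUNT difference concrete, here is a minimal sketch (T_flattened is a hypothetical name for the table produced by writing the query results with flatten=on):
SELECT COUNT(a) FROM (SELECT * FROM T WHERE b = 2)  -- returns 1: b stays repeated inside one record
SELECT COUNT(a) FROM T_flattened                    -- returns 2: each repeated value of b became its own row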

Related

Order of search for SQLite's "IN" operator guaranteed?

I'm performing an SQLite3 query similar to:
SELECT * FROM nodes WHERE name IN ('name1', 'name2', 'name3', ...) LIMIT 1
Am I guaranteed that it will search for name1 first, name2 second, etc., such that by limiting my output to 1 I know that I found the first hit according to my ordering of items in the IN clause?
Update: with some testing it seems to always return the first hit in the index regardless of the IN order. It's using the order of the index on name. Is there some way to enforce the search order?
The order of the returned rows is not guaranteed to match the order of the items inside the parenthesis after IN.
What you can do is use ORDER BY in your statement with the use of the function INSTR():
SELECT * FROM nodes
WHERE name IN ('name1', 'name2', 'name3')
ORDER BY INSTR(',name1,name2,name3,', ',' || name || ',')
LIMIT 1
This code uses the same list as the IN clause, with the items in the same order, concatenated into a single string and separated by commas (assuming the items themselves do not contain commas).
This way the results are ordered by their position in the list, and LIMIT 1 returns the one closest to the start of the list.
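As a quick illustration of why this works, INSTR simply reports where each comma-wrapped name starts inside the list string, so ordering by it follows the order of the list:
SELECT INSTR(',name1,name2,name3,', ',' || 'name1' || ','),  -- 1
       INSTR(',name1,name2,name3,', ',' || 'name2' || ','),  -- 7
       INSTR(',name1,name2,name3,', ',' || 'name3' || ',');  -- 13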
Another way to achieve the same results is by using a CTE which returns the list along with an Id which serves as the desired ordering of the results, which will be joined to the table:
WITH list(id, item) AS (
SELECT 1, 'name1' UNION ALL
SELECT 2, 'name2' UNION ALL
SELECT 3, 'name3'
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
Or:
WITH list(id, item) AS (
SELECT * FROM (VALUES
(1, 'name1'), (2, 'name2'), (3, 'name3')
)
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
This way you don't have to write the list out twice.

SQLite: pull items in the same order as IN array

I have the following query:
SELECT * from tWords where WordAton IN ("bbb", "aaa2", "ccc", "aaa1")
The query returns the results for "aaa1" first, then for "aaa2", then for "bbb", and then for "ccc". Is there a way to return the results in the order of the input array, i.e. first the results for "bbb", then for "aaa2", etc.?
Thank you in advance.
You can apply conditional ordering like this:
SELECT *
from tWords
where WordAton IN ('bbb', 'aaa2', 'ccc', 'aaa1')
order by case WordAton
when 'bbb' then 1
when 'aaa2' then 2
when 'ccc' then 3
when 'aaa1' then 4
end
In SQL (not just SQLite), the only way to always return rows in a given order is with a SQL ORDER BY ... clause. So the short answer is, "No", there's no simple way to return rows in the order given by the contents of an IN (...) clause.
You could use a common table expression (CTE) to define a sort order, but that's usually not worth the trouble. This isn't the same thing as ordering by the contents of an IN (...) clause, but it looks the same. (You're ordering by the sort order specified in the CTE.)
with ordered_words as (
select 1 as sort_order, 'bbb' as WordAton union
select 2, 'aaa2' union
select 3, 'ccc' union
select 4, 'aaa1'
)
select t.WordAton
from tWords t
join ordered_words o on t.WordAton = o.WordAton
where t.WordAton in ('bbb', 'aaa2', 'ccc', 'aaa1')
order by o.sort_order;

MS Access Group by query by n records

Does anybody know how to do a group by query by n records?
For example, if I have a db with x·n records, I would like to aggregate the first 3, then the next 3, and so on, where x and n are positive integers (excluding 0) :)
Thanks
This does exactly what you want:
SELECT int(((T.Rank - 1) / 3)) AS GroupID, SUM(T.field_to_aggregate)
FROM
(
SELECT (SELECT COUNT(*) FROM your_table AS T2 WHERE T1.ID > T2.ID) + 1 AS Rank, ID, field_to_aggregate
FROM your_table AS T1
) T
GROUP BY int(((T.Rank - 1) / 3))
However, since you did not post any data sample or table structure (mistake!), I had to suppose that you have an ID field in your table; if not, you will have to adapt it. If you don't succeed, add more info about your data and I will adapt my query to match your table structure.
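As a quick worked example (hypothetical data: your_table holds IDs 1..6 with field_to_aggregate values 10, 20, 30, 40, 50, 60), the inner query assigns Rank 1..6 in ID order and int((Rank - 1) / 3) then buckets them:
Rank 1, 2, 3 -> GroupID 0, SUM = 10 + 20 + 30 = 60
Rank 4, 5, 6 -> GroupID 1, SUM = 40 + 50 + 60 = 150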

Select multiple tables with same id

I need to select data from four tables based on only one.
In my 'calculated' table, I have all the records I need.
But I need to retrieve some other info for each record, from the 'programs', 'term' and 'imported' tables.
'calculated' has the ID from 'programs'.
But to reach a record from 'imported', I need to join the 'item' table, because 'item' has the IDs from both 'programs' and 'imported'.
'term' has the ID from 'imported'.
So, I tried this:
select c.date,
p.name,
c.name1,
c.name2,
t.date,
i.version,
c.price1,
c.price2,
c.price3
from calculated c, programs p, term t, imported i, item it
where c.programs_id = p.programs_id
and c.programs_id = it.programs_id
and it.imported_id = i.imported_id
and i.term_id = t.term_id;
But when I use count(*) on 'calculated', I get 30k records, while my select statement returns more than 130 million records.
What am I doing wrong?
What should I do for this to work?
If all duplicate rows are equivalent, you can try something like this:
select c.date,
p.name,
c.name1,
c.name2,
t.date,
i.version,
c.price1,
c.price2,
c.price3
from calculated c, programs p, term t, imported i
where c.programs_id = p.programs_id and
(select imported_id from item it where c.programs_id = it.programs_id and rownum = 1) = i.imported_id
and i.term_id = t.term_id;
where "rownum = 1" is restriction on the selection of one line for oracle.
You forgot to join the term table.
Probably you need to add
and t.term_id = i.term_id

sqlite - how do I get a one row result back? (luaSQLite3)

How can I get a single-row result (e.g. in the form of a table/array) back from an SQL statement, using Lua SQLite (LuaSQLite3)? For example this one:
SELECT * FROM sqlite_master WHERE name ='myTable';
So far I note:
using "nrows"/"rows" gives an iterator back
using "exec" doesn't seem to give a result back(?)
Specific questions are then:
Q1 - How to get a single row (say first row) result back?
Q2 - How to get row count? (e.g. num_rows_returned = db:XXXX(sql))
In order to get a single row use the db:first_row method. Like so.
row = db:first_row("SELECT `id` FROM `table`")
print(row.id)
In order to get the row count use the SQL COUNT statement. Like so.
row = db:first_row("SELECT COUNT(`id`) AS count FROM `table`")
print(row.count)
EDIT: Ah, sorry for that. Here are some methods that should work.
You can also use db:nrows. Like so.
rows = db:nrows("SELECT `id` FROM `table`")
row = rows[1]
print(row.id)
We can also modify this to get the number of rows.
rows = db:nrows("SELECT COUNT(`id`) AS count FROM `table`")
row = rows[1]
print(row.count)
Here is a demo of getting the returned count:
> require "lsqlite3"
> db = sqlite3.open":memory:"
> db:exec "create table foo (x,y,z);"
> for x in db:urows "select count(*) from foo" do print(x) end
0
> db:exec "insert into foo values (10,11,12);"
> for x in db:urows "select count(*) from foo" do print(x) end
1
>
Just loop over the iterator you get back from rows (or whichever function you use), but put a break at the end of the loop body so you only iterate once.
Getting the count is all about using SQL. You compute it with the SELECT statement:
SELECT count(*) FROM ...
This will return one row containing a single value: the number of rows in the query.
This is similar to what I'm using in my project and works well for me.
local query = "SELECT content FROM playerData WHERE name = 'myTable' LIMIT 1"
local queryResultTable = {}
local queryFunction = function(userData, numberOfColumns, columnValues, columnTitles)
for i = 1, numberOfColumns do
queryResultTable[columnTitles[i]] = columnValues[i]
end
end
db:exec(query, queryFunction)
for k,v in pairs(queryResultTable) do
print(k,v)
end
You can even concatenate values into the query so you can place it inside a generic method/function.
local query = "SELECT * FROM ZQuestionTable WHERE ConceptNumber = "..conceptNumber.." AND QuestionNumber = "..questionNumber.." LIMIT 1"
