UNION of tables using bigquery LegacySQL - firebase

I'm trying without luck to do a query to retrieve the union two tables of events using legacySQL, as standardSQL is not yet supported on data studio.
In standardSQL that would be something like:
SELECT
*
FROM
`com_myapp_ANDROID.app_events_*`,
`com_myapp_IOS.app_events_*`
However, in legacySQL I get an error when trying to refer app_events_*. How do I include all the tables of my events, so I can filter it afterwards on data studio if I can't use the wildcard?
I've tried something like:
select * from (TABLE_QUERY(com_myapp_ANDROID, 'table_id CONTAINS "app_events_"'))
But not sure if this is the right approach, I get:
Cannot output multiple independently repeated fields at the same time.
Found user_dim_user_properties_value_index and event_dim_date
Edit: in the end this is the result of the query, as you can't use directly FLATTEN with TABLE_QUERY:
select
*
from
FLATTEN((SELECT * FROM TABLE_QUERY(com_myapp_ANDROID, 'table_id CONTAINS "app_events"')),user_dim.user_properties),
FLATTEN((SELECT * FROM TABLE_QUERY(com_myapp_IOS, 'table_id CONTAINS "app_events"')),user_dim.user_properties)

Table wildcards don't work in legacy SQL as you have guessed so you have to use the TABLE_QUERY() function.
Your approach is right but the first parameter in the TABLE_QUERY function should be the dataset name not the first part of the table name. Assuming your dataset name is app_events that would look like this:
TABLE_QUERY(app_events,'table_id CONTAINS "app_events"')

In legacySQL the union table operator is comma
select * from [table1],[table2]
For TABLE_QUERY you would include the dataset name as first param, and the expression for the second
select * from (TABLE_QUERY([dataset], 'table_id CONTAINS "event"'))
to read more how to debug TABLE_QUERY read this linked answer
The Web UI automatically flattens you the results, but when there are independent repeated fields you need to flatten with the FLATTEN wrapper.
It takes two params, table, and repeated field eg: FLATTEN(table, tags)
Also if TABLE_QUERY is involved you need to subselect probably like
select
*
from
FLATTEN((SELECT * FROM TABLE_QUERY(com_myapp_ANDROID, 'table_id CONTAINS "app_events"')),user_dim.user_properties)

That particular issue you are experiencing is not UNION related - you will see same error message even with just one table if the table has multiple independently repeated fields and you are trying to output them at once. This scenario is specific to Legacy SQL and can be resolved with use of FLATTEN clause
At the same time, most likely you don't actually mean to use SELECT * which cause those repeated fields to be in output all at the same time. If you can narrow down your output list - you have slight chance to address it - but if still few independently repeated fields are in output - you can use FLATTEN technique

Related

Do not fail on missing column in a SQLLite query

I have a simple query like this:
SELECT * FROM CUSTOMERS WHERE CUSTID LIKE '~' AND BANKNO LIKE '~'
The problem is, the customers-table might or might not contain the BANKNO column depending on circumstances I've no control over. If however BANKNO is not a column in CUSTOMERS, this query fails.
So my question is: it is possible to test if the BANKNO column exists and if so, to include it in the query and if not to exclude this column?
The query really has to be flexible.
A non-existent column in a SELECT to sqlite3 will always fail.
One option might be to put the "full" sql in a try block, and if it errors, execute the other sql.
Or, you could query PRAGMA table_info('CUSTOMERS') and interrogate the result to see if a column in question is in the database. Find the sqlite doc here https://www.sqlite.org/pragma.html#pragma_table_info.
I'm sure there are other options, but the bottom line is you need to know before the sql is executed that it contains only valid column names.

How to use dynamic values while executing SQL scripts in R

My R workflow now involves dealing with a lot of queries (RPostgreSQL library). I really want to make code easy to maintain and manage in the future.
I started loading large queries from separate .SQL files (this helped) and it worked great.
Then I started using interpolated values (that helped) which means that I can write
SELECT * FROM table WHERE value = ?my_value;
and (after loading it into R) interpolate it using sqlInterpolate(ANSI(), query, value = "stackoverflow").
What happens now is I want to use something like this
SELECT count(*) FROM ?my_table;
but how can I make it work? sqlInterpolate() only interpolates safely by default. Is there a workaround?
Thanks
In ?DBI::SQL, you can read:
By default, any user supplied input to a query should be escaped using
either dbQuoteIdentifier() or dbQuoteString() depending on whether it
refers to a table or variable name, or is a literal string.
Also, on this page:
You may also need dbQuoteIdentifier() if you are creating tables or
relying on user input to choose which column to filter on.
So you can use:
sqlInterpolate(ANSI(),
"SELECT count(*) FROM ?my_table",
my_table = dbQuoteIdentifier(ANSI(), "table_name"))
# <SQL> SELECT count(*) FROM "table_name"
sqlInterpolate() is for substituting values only, not other components like table names. You could use other templating frameworks such as brew or whisker.

BigQuery error: Cannot query the cross product of repeated fields

I am running the following query on Google BigQuery web interface, for data provided by Google Analytics:
SELECT *
FROM [dataset.table]
WHERE
  hits.page.pagePath CONTAINS "my-fun-path"
I would like to save the results into a new table, however I am obtaining the following error message when using Flatten Results = False:
Error: Cannot query the cross product of repeated fields
customDimensions.value and hits.page.pagePath.
This answer implies that this should be possible: Is there a way to select nested records into a table?
Is there a workaround for the issue found?
Depending on what kind of filtering is acceptable to you, you may be able to work around this by switching to OMIT IF from WHERE. It will give different results, but, again, perhaps such different results are acceptable.
The following will remove entire hit record if (some) page inside of it meets criteria. Note two things here:
it uses OMIT hits IF, instead of more commonly used OMIT RECORD IF).
The condition is inverted, because OMIT IF is opposite of WHERE
The query is:
SELECT *
FROM [dataset.table]
OMIT hits IF EVERY(NOT hits.page.pagePath CONTAINS "my-fun-path")
Update: see the related thread, I am afraid this is no longer possible.
It would be possible to use NEST function and grouping by a field, but that's a long shot.
Using flatten call on the query:
SELECT *
FROM flatten([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910],customDimensions)
WHERE
  hits.page.pagePath CONTAINS "m"
Thus in the web ui:
setting a destination table
allowing large results
and NO flatten results
does the job correctly and the produced table matches the original schema.
I know - it is old ask.
But now it can be achieved by just using standard SQL dialect instead of Legacy
#standardSQL
SELECT t.*
FROM `dataset.table` t, UNNEST(hits.page) as page
WHERE
  page.pagePath CONTAINS "my-fun-path"

Can select individual columns but not select *

I have a SQLite DB that have a corrupted table somehow. I can't access the table in my sqlite gui tool but when I query the same table with just a single column, I get the value fine.
So this do NOT work(SQL Error: SQL logic error or missing database):
select * from table
This DOES work:
select column from table
Result back from this query is the value of column. Perfectly normal result.
Any suggestions how this can happen? Any suggestion for any analysis tool for the actual file?
Seems that one of the columns were corrupt and prevented the select * to fully work. But the question remains, what can possibly be written to this column that prevents a select?

sqlite Query optimisation

The query
SELECT * FROM Table WHERE Path LIKE 'geo-Africa-Egypt-%'
can be optimized as:
SELECT * FROM Table WHERE Path >= 'geo-Africa-Egypt-' AND Path < 'geo-Africa-Egypt-zzz'
But how can be this done:
select * from foodDb where Food LIKE '%apples%";
how this can be optimized?
One option is redundant data. If you're querying a lot for some fixed set of strings occuring in the middle of some column, add another column that contains the information whether a particular string can be found in the other column.
Another option, for arbitrary but still tokenizable strings is to create a dictionary table where you have the tokens (e.g. apples) and foreign key references to the actual table where the token occurs.
In general, sqlite is by design not very good at full text searches.
It would surprise me if it was faster, but you could try GLOB instead of LIKE and compare;
SELECT * FROM foodDb WHERE Food GLOB '*apples*';

Resources