Key, Value Count in BigQuery - dictionary

Implementation of the following problems in BigQuery:
1. I have the following dictionary in JSON format. How can I count the total number of key/value pairs inside the id dictionary?
{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}
2. The value "a" can appear in any of the ids (id_1,...,id_5) in multiple such dictionaries. I need to calculate the number of times "a" appears in any of the ids across all of the dictionaries.

For 1., using standard SQL (uncheck the "Use Legacy SQL" box under "Show Options") you can use the comma operator to take the cross product of the table and the repeated field:
WITH MyTable AS (
SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_1', 'a'), ('id_2', 'b'), ('id_3', 'c'), ('id_4', 'd')] AS id) AS property) AS fil
UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_1', 'b'), ('id_3', 'e')] AS id) AS property) AS fil
UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[] AS id) AS property) AS fil
UNION ALL SELECT STRUCT(STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[('id_4', 'a'), ('id_2', 'c')] AS id) AS property) AS fil)
SELECT
COUNT(DISTINCT id.key) AS num_keys,
COUNT(DISTINCT id.value) AS num_values
FROM MyTable t, t.fil.property.id AS id;
+----------+------------+
| num_keys | num_values |
+----------+------------+
| 4 | 5 |
+----------+------------+
Using legacy SQL, you can accomplish something similar using EXACT_COUNT_DISTINCT (you probably won't need to flatten), although it's harder to set up an inline example.
For 2., you can apply a similar approach using standard SQL of flattening and then counting the number of occurrences of "a" with COUNTIF(id.value = "a"). In legacy SQL, alternatively, you can use COUNT(t.fil.property.id.value = "a").

Assuming your dictionaries are stored as strings in a field named json in your table,
the key to the answer is the query below.
It parses the json field and extracts every key/value pair along with its parent (the dictionary name):
SELECT parent, key, value
FROM JS((
SELECT json FROM
(SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
(SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
),
json, // Input columns
"[{name: 'parent', type:'string'}, // Output schema
{name: 'key', type:'string'},
{name: 'value', type:'string'}]",
"function(r, emit) { // The function
x = JSON.parse(r.json);
processKey(x, '');
function processKey(node, parent) {
Object.keys(node).map(function(key) {
value = node[key].toString();
if (value !== '[object Object]') {
emit({parent:parent, key:key, value:value});
} else {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
processKey(node[key], parent + key);
};
});
};
}"
)
The result of the above query is:
parent key value
fil.property.id id_1 a
fil.property.id id_2 b
fil.property.id id_3 c
fil.property.id id_4 d
fil.property.type id_1 x
fil.property.type id_2 a
fil.property.type id_3 y
fil.property.type id_4 z
fil.property.category id_1 v
fil.property.category id_2 w
fil.property.category id_3 a
fil.property.category id_4 b
From there, you can easily get both answers:
Q1: How can I count total number of key, value inside "id" (each) dictionary
SELECT parent, COUNT(1) AS key_value_pairs
FROM JS((
SELECT json FROM
(SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
(SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
),
json, // Input columns
"[{name: 'parent', type:'string'}, // Output schema
{name: 'key', type:'string'},
{name: 'value', type:'string'}]",
"function(r, emit) { // The function
x = JSON.parse(r.json);
processKey(x, '');
function processKey(node, parent) {
Object.keys(node).map(function(key) {
value = node[key].toString();
if (value !== '[object Object]') {
emit({parent:parent, key:key, value:value});
} else {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
processKey(node[key], parent + key);
};
});
};
}"
)
GROUP BY parent
The result is:
parent key_value_pairs
fil.property.id 4
fil.property.type 4
fil.property.category 4
Q2: Need to calculate number of times "a" (any value) has appeared in any of the ids in any of the dictionaries.
SELECT value, COUNT(1) AS value_appearances
FROM JS((
SELECT json FROM
(SELECT '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}' AS json),
(SELECT '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"}, "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}' AS json)
),
json, // Input columns
"[{name: 'parent', type:'string'}, // Output schema
{name: 'key', type:'string'},
{name: 'value', type:'string'}]",
"function(r, emit) { // The function
x = JSON.parse(r.json);
processKey(x, '');
function processKey(node, parent) {
Object.keys(node).map(function(key) {
value = node[key].toString();
if (value !== '[object Object]') {
emit({parent:parent, key:key, value:value});
} else {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
processKey(node[key], parent + key);
};
});
};
}"
)
GROUP BY value
value value_appearances
a 3
b 2
c 1
d 1
x 1
y 1
z 1
v 1
w 1
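For readers who want to check the flattening logic outside BigQuery, here is a minimal Python sketch of the same idea: walk each JSON document recursively, emit (parent, key, value) rows, then count per parent and per value. The helper name flatten and the variable names are hypothetical, and this only mirrors the UDF's behaviour on the sample data above; it is not a replacement for the query.

```python
import json
from collections import Counter


def flatten(node, parent=""):
    """Yield (parent_path, key, value) for every leaf of a nested dict."""
    for key, value in node.items():
        if isinstance(value, dict):
            # Descend, extending the dotted path (e.g. "fil.property.id")
            path = f"{parent}.{key}" if parent else key
            yield from flatten(value, path)
        else:
            yield parent, key, str(value)


docs = [
    '{"fil":{"property":{"id":{"id_1":"a","id_2":"b","id_3":"c","id_4":"d"}}}}',
    '{"fil":{"property":{"type":{"id_1":"x","id_2":"a","id_3":"y","id_4":"z"},'
    ' "category":{"id_1":"v","id_2":"w","id_3":"a","id_4":"b"}}}}',
]

rows = [row for doc in docs for row in flatten(json.loads(doc))]

# Q1: key/value pairs per dictionary
pairs_per_parent = Counter(parent for parent, _, _ in rows)
# Q2: appearances of each value across all dictionaries
value_appearances = Counter(value for _, _, value in rows)

print(pairs_per_parent["fil.property.id"])  # 4
print(value_appearances["a"])               # 3
```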

As the other answers were too hard for me, I made a regular expression that works for a string:int dict:
SELECT
*, REGEXP_EXTRACT_ALL(my_dict_column, r'"(\w+": \d+)') as keys
FROM test.test_table
From that you can derive the keys, the values, and so on.
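To see what such a pattern captures, here is a small Python sketch using a close variant of the regex with separate groups for key and value. The function name parse_int_dict and the sample string are made up for illustration, and like the pattern above it assumes exactly one space after each colon.

```python
import re


def parse_int_dict(text):
    """Extract "key": int pairs from a JSON-like string of string:int entries."""
    pairs = re.findall(r'"(\w+)": (\d+)', text)
    return {key: int(value) for key, value in pairs}


sample = '{"apples": 3, "pears": 12}'
parsed = parse_int_dict(sample)

print(list(parsed))          # keys
print(sum(parsed.values()))  # total of the values
```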

Related

Filler word for SQLite statement to return any and all rows using WHERE [duplicate]

I am building a CRM project with SQLite and Flask, and I need a feature that lets the user enter conditions to filter the results.
I would like my SQL statement to ignore a WHERE condition if its parameter is blank or null.
For example, my inputs are "NAME", "AGE", and "GENDER",
so my statement will be
SELECT *
FROM CUSTOMER
WHERE NAME = 'James' AND AGE = '25' AND GENDER = 'M'
But if the user does not enter "NAME", I would like my SQL statement to behave like the code below:
SELECT *
FROM CUSTOMER
WHERE AGE = '25' AND GENDER = 'M'
I know I could do this with string concatenation, but I would prefer to do it in the SQL statement itself.
You can do it with the OR operator for each of the columns, by checking also if the parameter value that you pass is NULL or a string with spaces:
SELECT *
FROM CUSTOMER
WHERE (NAME = :name OR TRIM(COALESCE(:name, '')) = '')
AND (AGE = :age OR TRIM(COALESCE(:age, '')) = '')
AND (GENDER = :gender OR TRIM(COALESCE(:gender, '')) = '')
You can use a NULL check as follows:
SELECT *
FROM CUSTOMER
WHERE (NAME = :name_input or :name_input is null)
AND (AGE = :age_input or :age_input is null)
AND (GENDER = :gender_input or :gender_input is null)
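A minimal sketch of how the first query might be driven from Python's built-in sqlite3 module, which the Flask app would typically use. The find_customers helper and the sample rows are hypothetical; AGE is stored as text here only because the question compares it to '25'.

```python
import sqlite3

QUERY = """
SELECT NAME, AGE, GENDER
FROM CUSTOMER
WHERE (NAME = :name OR TRIM(COALESCE(:name, '')) = '')
  AND (AGE = :age OR TRIM(COALESCE(:age, '')) = '')
  AND (GENDER = :gender OR TRIM(COALESCE(:gender, '')) = '')
ORDER BY NAME
"""


def find_customers(conn, name=None, age=None, gender=None):
    """Filter on whichever inputs are non-blank; None or spaces means 'any'."""
    params = {"name": name, "age": age, "gender": gender}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (NAME TEXT, AGE TEXT, GENDER TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [("James", "25", "M"), ("Anna", "25", "F"), ("Omar", "31", "M")],
)

print(find_customers(conn, name="James", age="25", gender="M"))
print(find_customers(conn, age="25", gender=" "))  # name and gender ignored
```

Because the values are passed as named parameters, no string concatenation is involved and the same statement serves every combination of filled and empty inputs.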

How to use custom SQL functions in the WHERE clause

I'm trying to make a where clause with custom functions applied to the columns using Knex.js.
Suppose I have a table named tableName with columns col1, col2, and col3, and a function f that takes a parameter of the same type as the values in col1 and col2.
I also have two variables, var1 and var2 (defined beforehand), of the same type as the value returned by f. I tried a few approaches.
Example 1:
let rows = knexClient("tableName").whereRaw('f(?) <= ${var1} AND f(?) >= ${var2}', [col1, col2]).then((rows) => {
for (row of rows) {
console.log('${row["col1"]} ${row["col2"]} ${row["col3"]}');
}
}).catch((err) => {
console.log(err);
throw err;
});
This gives the following error:
ReferenceError: col1 is not defined.
Example 2:
let rows = knexClient("tableName").whereRaw("f(col1) <= ? AND f(col2) >= ?", [var1, var2]).then((rows) => {
for (row of rows) {
console.log('${row["col1"]} ${row["col2"]} ${row["col3"]}');
}
}).catch((err) => {
console.log(err);
throw err;
});
This gives the following error:
SQLITE_ERROR: no such column: col1] {
errno: 1,
code: 'SQLITE_ERROR'
}
What is the right way to do it? I have searched around and saw some people doing things similar to my first try here. But it didn't work for me.
You can't bind column names dynamically with ? placeholders; those only work for values.
The following:
var var1 = 10, var2 = 20;
knex("tableName")
.whereRaw("f(col1) <= ?", [var1])
.whereRaw("f(col2) >= ?", [var2])
.select();
results in generated SQL like this:
select
*
from
tableName
where
f(col1) <= 10
and f(col2) >= 20
If you have variables that contain the target column names, you need to format them into the SQL string yourself (only do this with trusted names, since they are not escaped):
var col1 = "some_col", col2 = "other_col";
var var1 = 10, var2 = 20;
knex("tableName")
.whereRaw(`f(${col1}) <= ?`, [var1])
.whereRaw(`f(${col2}) >= ?`, [var2])
.select();
which produces
select
*
from
tableName
where
f(some_col) <= 10
and f(other_col) >= 20
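The same split between SQL text (column and function names) and bound values can be tried end to end with Python's built-in sqlite3 module. This is only an illustrative sketch: the body of f and the sample rows are invented, and registering the function through create_function is specific to Python's driver, not to Knex.

```python
import sqlite3


def f(x):
    # Hypothetical stand-in for the custom function f from the question
    return x * 2


conn = sqlite3.connect(":memory:")
conn.create_function("f", 1, f)  # make f callable from SQL on this connection
conn.execute("CREATE TABLE tableName (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?)",
    [
        (1, 20, "keep"),
        (9, 20, "f(col1) too large"),
        (1, 2, "f(col2) too small"),
    ],
)

var1, var2 = 10, 20
# Column names live in the SQL text; only the values are bound with ?
rows = conn.execute(
    "SELECT col1, col2, col3 FROM tableName WHERE f(col1) <= ? AND f(col2) >= ?",
    (var1, var2),
).fetchall()
print(rows)
```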

Entity Framework query joins and group by issue

Please correct the query.
In PL/SQL:
SELECT a.MENU_ID, a.menu_label, a.menu_value
FROM tbl_ims_menu a, TBL_IMS_ROLE_ASSIGNED_MENU b,TBL_IMS_USER_ROLE_PRIVILEGES c
WHERE a.menu_id = b.menu_id AND b.urole_id = c.granted_role
AND c.user_id = '3' AND a.menu_master <> '0'
AND a.menu_status = 'Active'
GROUP BY a.menu_id, a.menu_label, a.menu_value
This query works fine, but there is an issue when I rewrite it in Entity Framework.
Check the following query:
List<TBL_IMS_MENU> listSubMenu = (from m in db.TBL_IMS_MENU
join ra in db.TBL_IMS_ROLE_ASSIGNED_MENU on m.MENU_ID
equals ra.MENU_ID
join rp in db.TBL_IMS_USER_ROLE_PRIVILEGES on ra.UROLE_ID
equals rp.GRANTED_ROLE
where rp.USER_ID == UserID
group m by m.MENU_ID
into g select g).ToList();
If I use var instead of List, how do I loop over the result?
I think you need to remove your join statements and just use where clauses, as in the raw SQL query:
var qry = (from a in db.TBL_IMS_MENU
from b in db.TBL_IMS_ROLE_ASSIGNED_MENU
from c in db.TBL_IMS_USER_ROLE_PRIVILEGES
where c.USER_ID == UserID
where b.UROLE_ID == c.GRANTED_ROLE
where a.MENU_ID == b.MENU_ID
where a.menu_status == "Active"
where a.menu_master != "0"
select a)
// ThenBy only follows OrderBy, so group on a composite key instead
.GroupBy(m => new { m.MENU_ID, m.menu_label, m.menu_value })
.ToList();
Try something like this:
var listSubMenu = (from m in db.TBL_IMS_MENU
join ra in db.TBL_IMS_ROLE_ASSIGNED_MENU on m.MENU_ID
equals ra.MENU_ID
join rp in db.TBL_IMS_USER_ROLE_PRIVILEGES on ra.UROLE_ID
equals rp.GRANTED_ROLE
where rp.USER_ID == UserID
group m by new { m.MENU_ID, m.menu_label, m.menu_value }
into g select g).ToList();
foreach (var groupItem in listSubMenu)
{
// go through groups like this - groupItem.Key.MENU_ID
foreach (var menuItem in groupItem)
{
// go through each item in the group like this - menuItem.menu_label
}
}

How do you write a LINQ query that filters a sub table to a specific time period and sums the results of the sub table?

What is the most efficient way to perform a left outer join in LINQ if I must do the following?
Filter Table 2 by a beginning and ending date.
All rows in Table 1 must remain, even if the filtering of Table 2 returns no rows.
The result must be grouped so that the columns from Table 2 get summed.
For example (variable names in the example code changed for proprietary reasons), suppose I have a database with two tables. Table 1 has a list of doors with a building code, door ID and current status (open or closed); the building code and door ID are the primary key. Table 2 has a list of events for all doors (an event is an opening or closing) plus a timestamp. So the columns are building code, door ID, timestamp, opening, closing. Opening and closing are integers with a 1 in the column for the appropriate event. There is a foreign key relationship between the two tables on the building code and door ID.
For my query I need to return a list of all the unique doors with the current door status and a sum of all the opening and closing events for a selected time period. An entry must be returned for each door, even if no events occurred during the selected time period.
Below is the best LINQ code I could come up with. It works, but it seems really inefficient and hard to understand. How would you make it more efficient and easier to understand?
var query =
from doors in Context.Doors
join fevents in
(
from events in db.Events
where events.TimeStamp >= date1 && events.TimeStamp <= date2
select new { events.BuildingCode, events.DoorID, events.TimeStamp, events.Opening, events.Closing }
)
on new { doors.BuildingCode, doors.DoorID } equals new { fevents.BuildingCode, fevents.DoorID }
into g1
from c in g1.DefaultIfEmpty()
group c by new
{
doors.BuildingCode,
doors.DoorID,
doors.DoorStatus
} into g2
select new
{
BuildingCode = g2.Key.BuildingCode,
DoorID = g2.Key.DoorID,
Status = g2.Key.DoorStatus,
NumOpenings = g2.Sum(i => (i == null ? 0 : i.Opening)),
NumClosings = g2.Sum(i => (i == null ? 0 : i.Closing))
};
I think this is slightly easier to read
var query =
from doors in Context.Doors
from c in db.Events
.Where(events => doors.BuildingCode == events.BuildingCode)
.Where(events => doors.DoorID == events.DoorID)
.Where(events => events.TimeStamp >= date1 && events.TimeStamp <= date2)
.Select(events => new { events.BuildingCode, events.DoorID, events.TimeStamp, events.Opening, events.Closing })
.DefaultIfEmpty()
group c by new
{
doors.BuildingCode,
doors.DoorID,
doors.DoorStatus
} into g2
select new
{
BuildingCode = g2.Key.BuildingCode,
DoorID = g2.Key.DoorID,
Status = g2.Key.DoorStatus,
NumOpenings = g2.Sum(i => (i == null ? 0 : i.Opening)),
NumClosings = g2.Sum(i => (i == null ? 0 : i.Closing))
};
The answer from @adducci helped me come up with a slightly different solution that I think is even more readable, albeit possibly less efficient.
var query =
from doors in Context.Doors
from events in doors.Events
.Where(i => i.TimeStamp >= date1 && i.TimeStamp <= date2)
.DefaultIfEmpty()
group new { doors, events }
by doors into g
select new
{
BuildingCode = g.Key.BuildingCode,
DoorID = g.Key.DoorID,
Status = g.Key.DoorStatus,
NumOpenings = g.Sum(i => (i.events == null ? 0 : i.events.Opening)),
NumClosings = g.Sum(i => (i.events == null ? 0 : i.events.Closing))
};
Note that an alternative method for filtering by date would be directly in the summing function, as below, but this is much less efficient since all records would be retrieved from the database and then filtered locally.
...
//from events in doors.Events
// .Where(i => i.TimeStamp >= date1 && i.TimeStamp <= date2)
// .DefaultIfEmpty()
from events in doors.Events
.DefaultIfEmpty()
...
NumOpenings = g.Sum(i => (i.events == null ? 0 : (i.events.TimeStamp >= date1 && i.events.TimeStamp <= date2) ? i.events.Opening : 0)),
NumClosings = g.Sum(i => (i.events == null ? 0 : (i.events.TimeStamp >= date1 && i.events.TimeStamp <= date2) ? i.events.Closing : 0))
...
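For comparison, the SQL shape these LINQ queries are aiming for can be sketched with Python's built-in sqlite3 module: the date filter sits in the LEFT JOIN condition rather than in WHERE, so doors with no events in the period are kept and their sums default to 0. The table layout follows the description in the question; the sample rows and dates are invented, and the SQL that a LINQ provider actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Doors (BuildingCode TEXT, DoorID INTEGER, DoorStatus TEXT,
                    PRIMARY KEY (BuildingCode, DoorID));
CREATE TABLE Events (BuildingCode TEXT, DoorID INTEGER, TimeStamp TEXT,
                     Opening INTEGER, Closing INTEGER);
INSERT INTO Doors VALUES ('A', 1, 'open'), ('A', 2, 'closed');
INSERT INTO Events VALUES
  ('A', 1, '2020-01-05', 1, 0),
  ('A', 1, '2020-01-06', 0, 1),
  ('A', 1, '2020-01-07', 1, 0),
  ('A', 1, '2020-03-01', 0, 1),
  ('A', 2, '2020-03-02', 1, 0);
""")

QUERY = """
SELECT d.BuildingCode, d.DoorID, d.DoorStatus,
       COALESCE(SUM(e.Opening), 0) AS NumOpenings,
       COALESCE(SUM(e.Closing), 0) AS NumClosings
FROM Doors AS d
LEFT JOIN Events AS e
  ON e.BuildingCode = d.BuildingCode
 AND e.DoorID = d.DoorID
 AND e.TimeStamp BETWEEN :date1 AND :date2  -- filter inside the join
GROUP BY d.BuildingCode, d.DoorID, d.DoorStatus
ORDER BY d.BuildingCode, d.DoorID
"""

rows = conn.execute(QUERY, {"date1": "2020-01-01", "date2": "2020-01-31"}).fetchall()
for row in rows:
    print(row)
```

Moving the date condition into a WHERE clause instead would drop door 2 from the result, which is the behaviour the question wants to avoid.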

How to get a list of column names

Is it possible to get a row with all column names of a table like this?
|id|foo|bar|age|street|address|
I would rather not use PRAGMA table_info(bla).
SELECT sql FROM sqlite_master
WHERE tbl_name = 'table_name' AND type = 'table'
Then parse this value with a regular expression (it's easy), which could look similar to this: [(.*?)]
Alternatively you can use:
PRAGMA table_info(table_name)
If you are using the SQLite command-line shell, run .headers on before you perform your query. You only need to do this once per session.
You can use pragma related commands in sqlite like below
pragma table_info("table_name")
--Alternatively
select * from pragma_table_info("table_name")
If you require column names like id|foo|bar|age|street|address, basically your answer is in below query.
select group_concat(name,'|') from pragma_table_info("table_name")
Yes, you can achieve this by using the following commands:
sqlite> .headers on
sqlite> .mode column
The result of a select on your table will then look like:
id          foo         bar         age         street      address
----------  ----------  ----------  ----------  ----------  ----------
1           val1        val2        val3        val4        val5
2           val6        val7        val8        val9        val10
This helps for HTML5 SQLite:
tx.executeSql('SELECT name, sql FROM sqlite_master WHERE type="table" AND name = "your_table_name";', [], function (tx, results) {
var columnParts = results.rows.item(0).sql.replace(/^[^\(]+\(([^\)]+)\)/g, '$1').split(','); ///// RegEx
var columnNames = [];
for (var i = 0; i < columnParts.length; i++) {
// trim first so a leading space after the comma does not yield an empty name
columnNames.push(columnParts[i].trim().split(/\s+/)[0]);
}
console.log(columnNames);
///// Your code which uses the columnNames;
});
You can reuse the regex in your language to get the column names.
Shorter Alternative:
tx.executeSql('SELECT name, sql FROM sqlite_master WHERE type="table" AND name = "your_table_name";', [], function (tx, results) {
var columnNames = results.rows.item(0).sql.replace(/^[^\(]+\(([^\)]+)\)/g, '$1').replace(/ [^,]+/g, '').split(',');
console.log(columnNames);
///// Your code which uses the columnNames;
});
Use a recursive query. Given
create table t (a int, b int, c int);
Run:
with recursive
a (cid, name) as (select cid, name from pragma_table_info('t')),
b (cid, name) as (
select cid, '|' || name || '|' from a where cid = 0
union all
select a.cid, b.name || a.name || '|' from a join b on a.cid = b.cid + 1
)
select name
from b
order by cid desc
limit 1;
Alternatively, just use group_concat:
select '|' || group_concat(name, '|') || '|' from pragma_table_info('t')
Both yield:
|a|b|c|
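A short Python sketch of the same idea, using the built-in sqlite3 module. It reads the names from pragma_table_info (available as a table-valued function in reasonably recent SQLite versions) and, for comparison, from the cursor's description attribute. The helper name column_names is hypothetical.

```python
import sqlite3


def column_names(conn, table):
    """Return the column names of `table` in declaration order."""
    rows = conn.execute(
        "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,)
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT)")

names = column_names(conn, "t")
header = "|" + "|".join(names) + "|"
print(header)

# Alternative: the names of the most recently executed SELECT
cursor = conn.execute("SELECT * FROM t")
from_description = [d[0] for d in cursor.description]
print(from_description)
```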
The result set of a query in PHP (SQLite3Result) offers a couple of methods allowing just that:
numColumns()
columnName(int $column_number)
Example
$db = new SQLite3('mysqlite.db');
$table = 'mytable';
$tableCol = getColName($db, $table);
for ($i=0; $i<count($tableCol); $i++){
echo "Column $i = ".$tableCol[$i]."\n";
}
function getColName($db, $table){
$qry = "SELECT * FROM $table LIMIT 1";
$result = $db->query($qry);
$nCols = $result->numColumns();
$colName = [];
for ($i = 0; $i < $nCols; $i++) {
$colName[$i] = $result->columnName($i);
}
return $colName;
}
<?php
// Uses the legacy sqlite_* extension API
$db = sqlite_open('mysqlitedb');
$cols = sqlite_fetch_column_types('form name', $db, SQLITE_ASSOC);
foreach ($cols as $column => $type) {
echo "Column: $column Type: $type\n";
}
Using @Tarkus's answer, here are the regexes I used in R:
getColNames <- function(conn, tableName) {
x <- dbGetQuery( conn, paste0("SELECT sql FROM sqlite_master WHERE tbl_name = '",tableName,"' AND type = 'table'") )[1,1]
x <- str_split(x,"\\n")[[1]][-1]
x <- sub("[()]","",x)
res <- gsub( '"',"",str_extract( x[1], '".+"' ) )
x <- x[-1]
x <- x[-length(x)]
res <- c( res, gsub( "\\t", "", str_extract( x, "\\t[0-9a-zA-Z_]+" ) ) )
res
}
Code is somewhat sloppy, but it appears to work.
Try this SQLite table schema parser. I implemented it to parse SQLite table definitions in PHP.
It returns the full definitions (unique, primary key, type, precision, not null, references, table constraints... etc)
https://github.com/maghead/sqlite-parser
The easiest way to get the column names of the most recently executed SELECT is to use the cursor's description property. A Python example:
# cursor is assumed to be a DB-API cursor that has just executed a SELECT
print("(" + ", ".join(d[0] for d in cursor.description) + ")")
# Example output: (inp, output, reason, cond_cnt, loop_likely)
