Cassandra filtering map and other collections

I am trying to do the following.
Connected to DevCluster
[cqlsh 5.0.1 | Cassandra 3.10.0.1695 | DSE 5.1.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
user#cqlsh:test> desc table del28;
CREATE TABLE test.del28 (
sno int PRIMARY KEY,
dob date,
name range_dates,
ssss_details map<text, date>,
ssss_range map<text, frozen<map<date, date>>> );
CREATE INDEX idx_ssss_range ON test.del28 (keys(ssss_range));
CREATE INDEX ssss_details_idx ON test.del28 (values(ssss_details));
CREATE INDEX ssss_range_idx ON test.del28 (values(ssss_range));
user#cqlsh:test> select * from del28;
sno | dob | name | ssss_details | ssss_range
-----+------+--------------------------------------+----------------------------------------------+---------------------------------
5 | null | {start: 2014-03-05, end: 2018-04-05} | {'hello': 2014-05-05} | {'1': {2018-04-05: 2012-02-05}}
8 | null | {start: 2018-03-04, end: 2018-08-02} | {'hello8': 2018-08-08} | {'8': {2018-08-08: 2012-02-08}}
2 | null | {start: 2018-03-04, end: 2018-05-05} | {'hello': 2018-05-05} | {'1': {2018-07-08: 2018-09-01}}
4 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello1': 2014-05-02} | {'1': {2018-04-08: 2012-02-04}}
7 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello4': 2014-05-03, 'hello5': 2014-05-02} | {'2': {2018-04-08: 2012-02-04}}
6 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello2': 2014-05-02, 'hello3': 2014-05-03} | {'2': {2018-04-08: 2012-02-04}}
9 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello7': 2014-05-02, 'hello8': 2014-05-03} | {'2': {2018-04-08: 2012-02-04}}
3 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello': 2014-05-02} | {'1': {2018-04-08: 2012-02-04}}
(8 rows)
My question is: can I use filters on ssss_range, and if so, how? If not, what is the best way to store this data? The idea is that each entry is a number or name followed by a pair of dates, for example house1: {2012-04-05: 2013-02-05}, house2: {2013-04-08: 2014-02-04}, ... for one particular user, where the dates record the periods that person stayed at each place. I tried splitting the dates into the 'name' column, but that did not work for me either. There is also a lot of other information in this record.
I should be able to query by house, e.g. where aaa = 'house1', and also by dates, e.g. where from_date > '' and to_date < ''.
I am okay with changing the way the data is stored if that makes it easier to query; any collection or data type is fine.
Please suggest the right approach.
Thanks
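One possible way to support both kinds of filters without secondary indexes over frozen maps is to promote the map entries to rows with clustering columns. A minimal sketch, with illustrative table and column names (not part of the original schema):
CREATE TABLE test.user_stays (
    sno int,          -- user key, as in del28
    place text,       -- e.g. 'house1', 'house2'
    from_date date,
    to_date date,
    PRIMARY KEY (sno, place, from_date)
);
-- filter by place for a given user
SELECT * FROM test.user_stays WHERE sno = 5 AND place = 'house1';
-- filter by date range for a given user and place
SELECT * FROM test.user_stays
WHERE sno = 5 AND place = 'house1'
  AND from_date >= '2012-04-05' AND from_date <= '2013-02-05';
Each stay becomes its own row, so equality on place and range predicates on from_date are served by the clustering order of the partition instead of an index over a map value.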

Related

MariaDB DATETIME Index not working with Between FROM_UNIXTIME()

I have a table with a DATETIME field, which is indexed with a BTree. Now I want to query it with the following statement:
SELECT
count(us.CITY) as metric,
us.CITY as Name,
us.LATITUDE as latitude,
us.LONGITUDE as longitude
FROM
FACT
LEFT JOIN
USER us
ON
us.ID_USER = FACT.USER
WHERE
ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1601568552) AND FROM_UNIXTIME(1604028277)
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;
EXPLAIN:
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | FACT | ALL | INDEX_FACT_ASSESSMENT_DATE | NULL | NULL | NULL | 762621 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | us | eq_ref | PRIMARY | PRIMARY | 46 | dwh0.FACT.USER,dwh0.FACT.ENV | 1 | |
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
2 rows in set (0.001 sec)
Interestingly, just by changing the dates manually into DATETIME-format strings, the query uses the index. But the FROM_UNIXTIME() function should, in my opinion, return exactly the same thing...
SELECT
count(us.CITY) as metric,
us.CITY as Name,
us.LATITUDE as latitude,
us.LONGITUDE as longitude
FROM
FACT
LEFT JOIN
USER us
ON
us.ENV = FACT.ENV AND us.ID_USER = FACT.USER
WHERE
-- ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1596649101) AND FROM_UNIXTIME(1599108827)
ASSESSMENT_DATE BETWEEN '2020-08-05 11:30:11.987' AND '2020-09-03 11:30:11.987'
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;
EXPLAIN:
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
| id   | select_type | table | type   | possible_keys              | key                        | key_len | ref                          | rows   | Extra                                                  |
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
|    1 | SIMPLE      | FACT  | range  | INDEX_FACT_ASSESSMENT_DATE | INDEX_FACT_ASSESSMENT_DATE | 5       | NULL                         | 132008 | Using index condition; Using temporary; Using filesort |
|    1 | SIMPLE      | us    | eq_ref | PRIMARY                    | PRIMARY                    | 46      | dwh0.FACT.USER,dwh0.FACT.ENV |      1 |                                                        |
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
2 rows in set (0.001 sec)
Has anyone run into such a problem? The WHERE clause is generated by Grafana, so I cannot change that, but the rest I can change if it makes a difference.
Thanks for suggestions!
Sorry for bothering: after around 10^5 more inserts, it works for both cases... maybe it was just bad luck.
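If it happens again, one likely explanation is stale index statistics: right after a batch of inserts the optimizer may estimate the BETWEEN range as covering too many rows to be worth the index. A sketch of what could be tried (assuming the default storage engine and the index name from the EXPLAIN output):
-- refresh the table/index statistics so the optimizer re-estimates the range
ANALYZE TABLE FACT;
-- for comparison only: the same query with the date index forced
SELECT count(us.CITY) AS metric, us.CITY AS Name, us.LATITUDE AS latitude, us.LONGITUDE AS longitude
FROM FACT FORCE INDEX (INDEX_FACT_ASSESSMENT_DATE)
LEFT JOIN USER us ON us.ID_USER = FACT.USER
WHERE ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1601568552) AND FROM_UNIXTIME(1604028277)
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;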

Parse data in Kusto

I am trying to parse the below data in Kusto. Need help.
[[ObjectCount][LinkCount][DurationInUs]]
[ChangeEnumeration][[88][9][346194]]
[ModifyTargetInLive][[3][6][595903]]
I need a generic implementation without any hardcoding.
Ideally, you'd be able to change the component that produces the source data in that format to use a standard format (e.g. CSV, JSON, etc.) instead.
The following could work, but you should consider it very inefficient:
let T = datatable(s:string)
[
'[[ObjectCount][LinkCount][DurationInUs]]',
'[ChangeEnumeration][[88][9][346194]]',
'[ModifyTargetInLive][[3][6][595903]]',
];
let keys = toscalar(
T
| where s startswith "[["
| take 1
| project extract_all(#'\[([^\[\]]+)\]', s)
);
T
| where s !startswith "[["
| project values = extract_all(#'\[([^\[\]]+)\]', s)
| mv-apply with_itemindex = i keys on (
extend Category = tostring(values[0]), p = pack(tostring(keys[i]), values[i + 1])
| summarize b = make_bag(p) by Category
)
| project-away values
| evaluate bag_unpack(b)
which returns:
| Category | ObjectCount | LinkCount | DurationInUs |
|--------------------|-------------|-----------|--------------|
| ChangeEnumeration | 88 | 9 | 346194 |
| ModifyTargetInLive | 3 | 6 | 595903 |

How to convert the jsonb datatype to other datatype in psql for amazon quicksight

My PostgreSQL database has a table with jsonb as the type for some columns. When I try to load these tables into Amazon QuickSight for analysis, I get an error saying the datatype is unsupported, and the columns are skipped in QuickSight.
Please help me convert these into a type that QuickSight supports.
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+-----------------------------------------------
id | bigint | | not null | nextval('solera_progresses_id_seq'::regclass)
milestones | jsonb | | |
reference_id | character varying | | |
response_code | integer | | |
activity | jsonb | | |
response | jsonb | | |
user_id | bigint | | |
You can use custom SQL to convert the data to a supported type before loading it into QuickSight.
For instance, if your jsonb column contains objects like {"name": "John"}, you can expose a text column name to QuickSight using the query below (->> extracts the value as text, whereas -> would still return jsonb):
SELECT column_name->>'name' AS name
FROM table_name
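Applied to the table above (assuming, from the sequence name, that it is called solera_progresses; the JSON key names below are hypothetical, since the jsonb structure isn't shown), a custom SQL dataset for QuickSight might look like:
SELECT
  id,
  reference_id,
  response_code,
  user_id,
  milestones->>'status'     AS milestone_status,  -- hypothetical key, extracted as text
  (activity->>'count')::int AS activity_count,    -- hypothetical key, cast to integer
  response->>'message'      AS response_message   -- hypothetical key, extracted as text
FROM solera_progresses;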

Cumulative count of occurrences per value in array in Kusto

I'm looking to get the count of query-parameter usage from the query string of page views stored in Application Insights, using KQL. My query currently looks like:
pageViews
| project parsed=parseurl(url)
| project keys=bag_keys(parsed["Query Parameters"])
The results (shown as screenshots in the original post) contain, for each page view, the list of query-parameter keys.
I'm looking to get the count of each value in the list when it is contained in the URL, in order to answer the question "How many times does page appear in the query string?". So the results might look like:
| Page | From | ... |
|------|------|-----|
| 1000 | 67   | ... |
Thanks in advance
You could try something along the following lines:
datatable(url:string)
[
"https://a.b.c/d?p1=hello&p2=world",
"https://a.b.c/d?p2=world&p3=foo&p4=bar"
]
| project parsed = parseurl(url)
| project keys = bag_keys(parsed["Query Parameters"])
| mv-expand key = ['keys'] to typeof(string)
| summarize count() by key
which returns:
| key | count_ |
|-----|--------|
| p1 | 1 |
| p2 | 2 |
| p3 | 1 |
| p4 | 1 |

Getting App Maker to respect the order of a many-to-many relation

I'm having some trouble getting App Maker to respect the order of a many-to-many relation.
Let's say I have two models:
Model 1 has an ID and a many-to-many relation to model 2, which also has an ID.
App Maker generates three tables:
DESCRIBE model_1;
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| Id | int(11) | NO | PRI | NULL | auto_increment |
+--------------------+--------------+------+-----+---------+----------------+
DESCRIBE model_2;
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| Id | int(11) | NO | PRI | NULL | auto_increment |
+--------------------+--------------+------+-----+---------+----------------+
DESCRIBE model_1_Has_model_2;
+------------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------+------+-----+---------+-------+
| parentModel1_fk | int(11) | NO | MUL | NULL | |
| childModel2_fk | int(11) | NO | MUL | NULL | |
+------------------+---------+------+-----+---------+-------+
Now let's say I have a model_1 object with ID 1 and three model_2 objects with IDs 1, 2, 3. If I assign model_1.childModel_2 to [model_2_ID_1, model_2_ID_2], the model_1_Has_model_2 table will contain:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 1
1 | 2
Now let's say I splice model_1.childModel_2 using model_1.childModel_2.splice(0, 1) and then insert model_2 ID 3 at index 0 using model_1.childModel_2.splice(0, 0, model_2_ID_3). I would expect my table to contain the following:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 3
1 | 1
However, it contains the opposite:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 1
1 | 3
Is there any way I can stop this behavior short of clearing the entire relation and then setting it to my new expected order?
The short answer is no. App Maker just creates a new record; it does not rearrange the table. Otherwise it would have to edit all the records below the desired insertion point (which could be a prohibitively time-consuming transaction). If this is the desired functionality, you'll have to do it manually.
I would seriously consider creating your own join table that will allow you to have additional columns, where you can store the desired sort order.
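A minimal sketch of such a join table (the names are illustrative; it would be managed as its own model or external datasource rather than by the generated relation):
-- join table with an explicit ordering column
CREATE TABLE model_1_Has_model_2_ordered (
  parentModel1_fk int(11) NOT NULL,
  childModel2_fk  int(11) NOT NULL,
  SortOrder       int(11) NOT NULL,
  PRIMARY KEY (parentModel1_fk, childModel2_fk)
);
-- read the children back in the stored order
SELECT childModel2_fk
FROM model_1_Has_model_2_ordered
WHERE parentModel1_fk = 1
ORDER BY SortOrder;
Reordering then becomes an update of SortOrder values instead of deleting and reinserting relation rows.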
