Pipe "|" symbol in MariaDB EXPLAIN - mariadb

I ran an EXPLAIN on a slow (2 minutes to return 2 sorted results) query in MariaDB, and some of the returned columns contain multiple values separated by a "|" symbol.
When using a better index (same query running in 20ms), EXPLAIN returns similar values but separated by a comma.
I spent the last hour looking for any kind of reference online, both in the MariaDB and MySQL documentation (since I'm not sure it's MariaDB-specific), but nothing relevant came up - not even a SO question.
Do you know what the "|" symbol means in this context? Given the time difference versus the comma-separated result, it feels like some kind of combinatory operator, but adding "combinatory" or "exponential" as a Google search keyword didn't turn up anything.
EXPLAIN EXTENDED followed by SHOW WARNINGS didn't provide any additional insight either.
Example return fields:
TYPE: ref|filter
KEY: key1|key2
KEY_LEN: 9|9
Rows: 2 (0%)
Extra: Using where; Using rowid filter
Thank you for any input!
EDIT: for additional context, here's the Hibernate-generated query that produces the result above:
select *
from things this_
left outer join rel_tab rt_ on this_.id = rt_.thing_id
left outer join tab2 t2_ on this_.id = t2_.thing_id
where this_.filter1 = 123
  and this_.filter2 = 456
  and this_.filter3 = 1
order by this_.id desc
limit 20;
I also updated the explain plan result above with the filter selectivity.

Related

Count with limit and offset in sqlite

I am trying to write a function in Python that uses SQLite, and while I managed to get it to work, there is a behavior in SQLite I don't understand when using the count command. When I run the following, SQLite counts as expected, i.e. returns an int.
SELECT COUNT (*) FROM Material WHERE level IN (?) LIMIT 10
However, when I add an offset to the end (shown below), SQLite returns an empty list, in other words nothing.
SELECT COUNT (*) FROM Material WHERE level IN (?) LIMIT 10 OFFSET 82
While omitting the offset is an easy fix, I don't understand why SQLite returns nothing. Is this the expected behavior for the command I gave?
Thanks for reading.
When you execute that COUNT(*), it will return only a single row.
The LIMIT clause limits the number of rows returned. You are setting the limit to 10, which doesn't have any effect here, because only a single row is returned anyway.
OFFSET skips the specified number of rows. Here it skips past the single result row, which is exactly why you get nothing back.
In simple terms, your query translates to: COUNT the number of rows, then return up to 10 rows starting from the 83rd position. Since there is only a single row, the result is always empty.
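A minimal sketch of the behavior, plus how to count at most N matching rows by limiting inside a subquery instead (the literal 1 stands in for your bound parameter):
SELECT COUNT(*) FROM Material WHERE level IN (1);
-- one row containing the count
SELECT COUNT(*) FROM Material WHERE level IN (1) LIMIT 10 OFFSET 82;
-- empty: OFFSET 82 skips past the single result row
SELECT COUNT(*) FROM (SELECT 1 FROM Material WHERE level IN (1) LIMIT 10);
-- one row, capped at 10: the LIMIT now applies before counting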
Read about LIMIT and OFFSET

Snowflake, Python/Jupyter analysis

I am new to Snowflake, and running a query to get a couple of days' data - this returns more than 200 million rows and takes a few days. I tried running the same query in Jupyter, and the kernel restarts/dies before the query ends. Even if the data got into Jupyter, I doubt I could analyze it in any reasonable timeline (but maybe using dask?).
I am not really sure where to start - I am trying to check the data for missing values, and my first instinct was to use Jupyter - but I am lost at the moment.
My next idea is to stay within Snowflake and check the columns there with case statements (e.g. sum(case when column_value = '' then 1 else 0 end) as number_missing_values).
Does anyone have any ideas/direction I could try - or know if I'm doing something wrong?
Thank you!
Not really the answer you are looking for, but:
sum(case when column_value = '' then 1 else 0 end) as number_missing_values
When you say missing value, note that this will only find values that are an empty string.
This can also be written in a simpler form as:
count_if(column_value = '') as number_missing_values
The database already knows how many rows are in a table, and it knows how many NULLs each column holds. If you are loading data into a table, it might make more sense to not load empty strings and use NULL instead; then, for no compute cost, you can run:
count(*) - count(column) as number_empty_values
Also of note: if you have two tables in Snowflake, you can compare them via MINUS,
aka
select * from table_1
minus
select * from table_2
This is useful for finding missing rows; you do have to do it in both directions, as sketched below.
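For example, with the same hypothetical table names:
select * from table_1 minus select * from table_2;  -- rows in table_1 missing from table_2
select * from table_2 minus select * from table_1;  -- rows in table_2 missing from table_1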
Then you can HASH rows, or hash the whole table via HASH_AGG.
But normally when looking for missing data you have an external system, so the driver is 'what can that system handle' and finding common ground.
Also, in the past we were searching for bugs in our processing that caused duplicate data (where we needed/wanted no duplicates), so the above, plus COUNT DISTINCT-like commands, came in useful.
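Pulling those ideas together, a minimal profiling sketch; my_table and column_value are hypothetical names, and count_if is a Snowflake built-in:
select
    count(*)                                            as total_rows,
    count_if(column_value = '')                         as empty_string_values,
    count(*) - count(column_value)                      as null_values,
    count(column_value) - count(distinct column_value)  as duplicate_values
from my_table;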

Can sqlite-utils convert function select two columns?

I'm using sqlite-utils to load a CSV into SQLite, which will later be served via Datasette. I have two columns, likes and dislikes. I would like to have a third column, quality_score, computed by adding likes and dislikes together and then dividing likes by the total.
The sqlite-utils convert function should be my best bet, but all I see in the documentation is how to select a single column for conversion.
sqlite-utils convert content.db articles headline 'value.upper()'
From the example given, it looks like convert is followed by the db filename, the table name, then the column you want to operate on. Is it possible to simply add another column name, or is there a flag for selecting more than one column to operate on? I would be really surprised if this weren't possible; I just can't find any documentation to support it.
This isn't a perfect answer as it doesn't resolve whether sqlite-utils supports multiple column selection for transforms, but this is how I solved this particular problem.
Since my quality_score column would just be basic math, I was able to make use of sqlite's Generated Columns. I created a file called quality_score.sql that contained:
ALTER TABLE testtable
ADD COLUMN quality_score GENERATED ALWAYS AS (likes / (likes + dislikes));
and then implemented it by:
$ sqlite3 mydb.db < quality_score.sql
You do need to make sure you are using a compatible version of SQLite, as generated columns only work with version 3.31 or later.
Another consideration is to make sure you are performing math on integers or floats and not text. Note also that if likes and dislikes are both integers, SQLite performs integer division, so a ratio below 1 truncates to 0.
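A variant of the same ALTER that sidesteps the integer-division pitfall by promoting the expression to float (run instead of the statement above; assumes likes and dislikes are INTEGER columns):
ALTER TABLE testtable
ADD COLUMN quality_score GENERATED ALWAYS AS (likes * 1.0 / (likes + dislikes));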
I also attempted to create the table with the virtual generated column first and then fill it with my data later, but that didn't work in my case; it threw an error saying the number of items provided didn't match the number of columns available. So I just stuck with the ALTER operation after the fact.

Mondrian: Cannot seem to get Aggregation Tables to be used

I have been struggling to get aggregation tables to work. Here is what my fact table looks like:
employment_date_id
dimension1_id
dimension2_id
dimension3_id
dimension4
dimension5
measure1
measure2
measure3
I'm collapsing the employment_date_id from year, quarter, and month to include just the year, but every other column is included. This is what my aggregation table looks like:
yearquartermonth_year
dimension1_id
dimension2_id
dimension3_id
dimension4
dimension5
measure1
measure2
measure3
fact_count
I'm only collapsing the year portion of the date. The remaining fields are left as is. Here is my configuration:
<AggFactCount column="FACT_COUNT"/>
<AggForeignKey factColumn="dimension1_id" aggColumn="dimension1_id"/>
<AggForeignKey factColumn="dimension2_id" aggColumn="dimension2_id"/>
<AggForeignKey factColumn="dimension3_id" aggColumn="dimension3_id"/>
<AggMeasure name="[Measures].[measure1]" column="measure1"/>
<AggMeasure name="[Measures].[measure2]" column="measure2"/>
<AggMeasure name="[Measures].[measure3]" column="measure3"/>
<AggLevel name="[dimension4].[dimension4]" column="dimension4"/>
<AggLevel name="[dimension5].[dimension5]" column="dimension5"/>
<AggLevel name="[EmploymentDate.yearQuarterMonth].[Year]" column="yearquartermonth_year"/>
For the most part I'm copying the second example of aggregation tables from the documentation. Most of my columns are not collapsed into the table and are foreign keys to the dimension tables.
The query I'm trying to execute is something like:
select {[Measures].[measure1]} on COLUMNS, {[EmploymentDate.yearQuarterMonth].[Year]} on ROWS from Cube1
The problem is that when I debug it and turn on the logging I see bit keys that look like this:
AggStar:agg_year_employment
bk=0x00000000000000000000000000000000000000000000000111111111101111100000000000000000000000000000000000000000000000000000000000000000
fbk=0x00000000000000000000000000000000000000000000000000000001101111100000000000000000000000000000000000000000000000000000000000000000
mbk=0x00000000000000000000000000000000000000000000000111111110000000000000000000000000000000000000000000000000000000000000000000000000
And my query's bit pattern is:
Foreign columns bit key=0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
Measure bit key= 0x00000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000
And so my aggregation table is skipped. However, these are the exact columns that are folded into the table; the bit positions are just off between the query's keys and the aggregation table's. The other thing I find strange is that although a portion of the columns are collapsed into the table, the AggForeignKey columns aren't included as bits, so will this aggregation table get skipped if I make a query with those columns? That's counter to what I had planned. My plan was that as long as a query is on year boundaries, this aggregation table gets used.
I don't understand why this isn't working and why it fails to build the bit keys properly. I've tried debugging the Mondrian code, but figuring out which column maps to which position in the bit keys is not obvious. I feel like this shouldn't be this hard, but nothing out there really explains it very well, and this aggregation table architecture seems really easy to break.
What am I doing wrong? And why doesn't my solution work?
Update: here is my mondrian.properties file:
mondrian.jdbcDrivers=com.mysql.jdbc.Driver,oracle.jdbc.driver.OracleDriver
mondrian.rolap.generate.formatted.sql=true
mondrian.rolap.localePropFile=locale.properties
mondrian.rolap.aggregates.Use=true
mondrian.rolap.aggregates.Read=true
mondrian.trace.level=2
mondrian.drillthrough.enable=true
It might be the case that mondrian.rolap.aggregates.Read is set to true while mondrian.rolap.aggregates.Use is set to false.
Please set mondrian.rolap.aggregates.Use=true and check.
Reference: http://mondrian.pentaho.com/documentation/configuration.php
If this is not the case, please attach all the properties related to aggregate tables and the complete cube definition XML.

sqlite subqueries with group_concat as columns in select statements

I have two tables: one contains a list of items, called watch_list, with some important attributes; the other is just a list of prices, called price_history. What I would like to do is group together 10 of the lowest prices into a single column with a group_concat operation, and then create a row with the item attributes from watch_list along with the 10 lowest prices for each item in watch_list.
First I tried joins, but then I realized the operations were happening in the wrong order, so there was no way I could get the desired result with a join. Then I tried the obvious thing and just queried price_history for every row in watch_list, gluing everything together in the host environment, which worked but seemed very inefficient.
Now I have the following query, which looks like it should work but isn't giving me the results I want. I would like to know what is wrong with the following statement:
select w.asin,w.title,
(select group_concat(lowest_used_price) from price_history as p
where p.asin=w.asin limit 10)
as lowest_used
from watch_list as w
Basically I want the limit operation to happen before group_concat does anything, but I can't think of a SQL statement that will do that.
Figured it out. As somebody once said, "All problems in computer science can be solved by another level of indirection," and in this case an extra select subquery did the trick:
select w.asin, w.title,
(select group_concat(lowest_used_price)
from (select lowest_used_price from price_history as p
where p.asin = w.asin
order by lowest_used_price limit 10)) as lowest_used
from watch_list as w
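An alternative sketch for SQLite 3.25+ (window functions), using the same two tables; note the inner join drops watch_list items that have no prices at all:
select asin, title, group_concat(lowest_used_price) as lowest_used
from (select w.asin, w.title, p.lowest_used_price,
             row_number() over (partition by p.asin
                                order by p.lowest_used_price) as rn
      from watch_list as w
      join price_history as p on p.asin = w.asin)
where rn <= 10
group by asin, title;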
