topK function over Distributed engine in ClickHouse returns only 10 records - olap

I'm running the following query
select topK(30)(Country) from distributed_table
note: distributed_table's engine is Distributed.
and even though there are over 100 possible "country" values, the query returns only 10.
Also, when I run it on the local table, I get more than 10 results.
Have I missed some crucial configuration?

It looks like the problem occurs when the intermediate results from the shards are combined into the final result.
Let's check the results from each shard (using the distributed_group_by_no_merge setting to disable the merging of intermediate results from each shard):
select any(_shard_num), topK(30)(Country)
from distributed_table
SETTINGS distributed_group_by_no_merge = 1
On each shard, the topK function works correctly, so as a workaround you can combine the intermediate results manually:
SELECT arrayDistinct(
         arrayMap(x -> x.1,
           /* sort values by frequency */
           arraySort(x -> x.2,
             /* convert the array of arrays to a flat array */
             flatten(
               /* group the results from the shards into one array */
               groupArray(
                 /* assign each value its index number */
                 arrayMap((x, index) -> (x, index), shard_result, arrayEnumerate(shard_result))))))) ordered_value
FROM (
    SELECT topK(30)(Country) AS shard_result
    FROM distributed_table
    SETTINGS distributed_group_by_no_merge = 1)

Related

Create a perl hash from a db select

Having some trouble understanding how to create a Perl hash from a DB select statement.
$sth=$dbh->prepare(qq{select authorid,titleid,title,pubyear from books});
$sth->execute() or die DBI->errstr;
while (@records = $sth->fetchrow_array()) {
    %Books = (%Books, AuthorID => $records[0]);
    %Books = (%Books, TitleID => $records[1]);
    %Books = (%Books, Title => $records[2]);
    %Books = (%Books, PubYear => $records[3]);
    print qq{$records[0]\n};
    print qq{\t$records[1]\n};
    print qq{\t$records[2]\n};
    print qq{\t$records[3]\n};
}
$sth->finish();
while(($key,$value) = each(%Books)) {
print qq{$key --> $value\n};
}
The print statements work in the first while loop, but I only get the last result in the second key/value loop.
What am I doing wrong here? I'm sure it's something simple. Many thanks.
The OP needs to specify the question better and do some reading on the DBI module.
DBI has a fetchall_hashref method; perhaps the OP could put it to some use.
In the shown code an assignment of a record to a hash with the same keys overwrites the previous one, row after row, and the last one remains. Instead, they should be accumulated in a suitable data structure.
Since there are a fair number of rows (351, we are told), one option is a top-level array, with a hashref for each book:
my @all_books;
while (my @records = $sth->fetchrow_array()) {
    my %book;
    @book{qw(AuthorID TitleID Title PubYear)} = @records;
    push @all_books, \%book;
}
Now we have an array of books, each indexed by the four parameters.
This uses a hash slice to assign multiple key-value pairs to a hash.
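For example, to read the collected records back (a minimal sketch over the @all_books structure built above):
for my $book (@all_books) {
    print qq{$book->{AuthorID}\t$book->{Title} ($book->{PubYear})\n};
}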
Another option is a top-level hash with keys for the four book-related parameters, each having as its value an arrayref with the entries from all records:
my %books;
while (my @records = $sth->fetchrow_array()) {
    push @{$books{AuthorID}}, $records[0];
    push @{$books{TitleID}}, $records[1];
    ...
}
Now one can go through authors/titles/etc, and readily recover the other parameters for each.
Adding some checks is always a good idea when reading from a database.

Database row len() can be printed, but the value can't be shown in a tkinter entry

I can't understand why I can print the number of rows but not populate a tkinter entry with it.
My code:
cursor.execute('SELECT * FROM contacts;')
print('row in table contacts:', len(cursor.fetchall())) # prints 104
self.no_count.set(len(cursor.fetchall())) # populates 0
Any hint?
You should store the fetched data in a variable and then access it through that variable. A cursor behaves like a Python generator: once you call cursor.fetchall(), the cursor is exhausted and a second fetchall() returns no rows. So go for something like:
cursor.execute('SELECT * FROM contacts;')
data = cursor.fetchall() # Store in variable
print(f'row in table contacts: {len(data)}') # Used f strings instead of comma(can be ignored)
self.no_count.set(len(data))
Or you could also go for the inefficient way of repeating your query each time, like:
cursor.execute('SELECT * FROM contacts;')
print(f'row in table contacts: {len(cursor.fetchall())}')
cursor.execute('SELECT * FROM contacts;') # Repeat the query
self.no_count.set(len(cursor.fetchall())) # Fetch again
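If only the row count is needed, it is cheaper to let the database count rather than fetch every row; a minimal sketch, assuming the same sqlite3-style cursor and the no_count variable from the question:
cursor.execute('SELECT COUNT(*) FROM contacts;')
row_count = cursor.fetchone()[0] # one row with one column: the count
self.no_count.set(row_count)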

How to get Corda transaction time?

I am using the following to get the transaction timestamp:
val outputStateRef = StateRef(ledgerTx.id, 0)
val queryCriteria = QueryCriteria.VaultQueryCriteria(stateRefs = listOf(outputStateRef))
val results = serviceHub.vaultService.queryBy<ContractState>(queryCriteria)
val recordedTime = results.statesMetadata.singleOrNull()?.recordedTime
The problem is the transaction time is not always returned, sometimes null is returned for the timestamp.
Why is this happening and how can I ensure the timestamp is always returned?
results is a Vault.Page<ContractState>, which is documented as containing the following:
/**
* Returned in queries [VaultService.queryBy] and [VaultService.trackBy].
* A Page contains:
* 1) a [List] of actual [StateAndRef] requested by the specified [QueryCriteria] to a maximum of [MAX_PAGE_SIZE].
* 2) a [List] of associated [Vault.StateMetadata], one per [StateAndRef] result.
* 3) a total number of states that met the given [QueryCriteria] if a [PageSpecification] was provided,
* otherwise it defaults to -1.
* 4) Status types used in this query: [StateStatus.UNCONSUMED], [StateStatus.CONSUMED], [StateStatus.ALL].
* 5) Other results as a [List] of any type (eg. aggregate function results with/without group by).
*
* Note: currently otherResults are used only for Aggregate Functions (in which case, the states and statesMetadata
* results will be empty).
*/
Judging from your code, if the result page contains more than one StateAndRef, singleOrNull() will return null.
This is a guess based on the available code; please share more information if this wasn't the cause of the issue.
I would add your own timestamp to the state and record it in the flow.
Or, you can add a Time-Window to the transaction (https://docs.corda.net/api-transactions.html#time-windows). I believe this also ensures that the statesMetadata.recordedTime will not be null.
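A minimal sketch of the time-window approach inside a flow, assuming a typical TransactionBuilder setup (the output state, contract ID, and command below are placeholders):
val txBuilder = TransactionBuilder(notary)
    .addOutputState(outputState, MY_CONTRACT_ID)                      // placeholder state and contract ID
    .addCommand(MyContract.Commands.Create(), ourIdentity.owningKey)  // placeholder command
// Constrain the transaction to a window around "now" (java.time.Duration tolerance).
// The notary only signs inside this window, so its bounds give a reliable transaction time.
txBuilder.setTimeWindow(serviceHub.clock.instant(), Duration.ofSeconds(60))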

Crossfilter total by group

I'm trying to show the total number of people in each geography on hover using Crossfilter, but my current code only shows the total across all geographies. What is the Crossfilter equivalent of the SQL query SELECT COUNT(*) GROUP BY dma?
This is my code so far
//geography that is being hovered over, getting dma name and removing everything that is after the comma
sel_geog = layer.feature.properties.dma_1;
sel_geog = sel_geog.split(",")[0];
console.log(sel_geog);
//crossfilter to get total number of people of each geography
var dmaDim = voter_data.dimension(function(d) {return d.dma == sel_geog}),
dma_grp = dmaDim.groupAll().reduceCount().value();
console.log(dma_grp);
Crossfilter isn't meant to be used in a way where you build new dimensions and groups for each user interaction. The idea is to build dimensions and groups before interactions take place and then have them update quickly when filtering in response to those interactions.
It's not really clear from this question what your data looks like or what you are trying to do, but you probably want to create a dimension and group for your dma property and then build your map based on that:
var voter_data = crossfilter(my_data);
var dmaDim = voter_data.dimension(function(d) { return d.dma; });
var dmaGroup = dmaDim.group();
At this point dmaGroup.all() will be an array of objects that look like { key: 'dmaKey', value: 10 }, where 10 is the count of all records where d.dma === 'dmaKey'. There are lots of ways you can aggregate differently with Crossfilter, but that may get you started.
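With that in place, the hover handler can look up the pre-aggregated count for a geography instead of building a new dimension each time (a sketch reusing sel_geog from the question's code):
// dmaGroup.all() holds one { key, value } pair per dma value
var entry = dmaGroup.all().filter(function (d) { return d.key === sel_geog; })[0];
var total = entry ? entry.value : 0;
console.log(sel_geog, total);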

Correct parameter binding for SELECT WHERE .. LIKE using fmdb?

First-time user of fmdb here, trying to start off doing things correctly. I have a simple single table that I wish to perform a SELECT WHERE .. LIKE query on, and after trying several of the documented approaches, I can't get any of them to yield the correct results.
e.g.
// 'filter' is an NSString * containing a fragment of
// text that we want in the 'track' column
NSDictionary *params =
    [NSDictionary dictionaryWithObjectsAndKeys:filter, @"filter", nil];
FMResultSet *results =
    [db executeQuery:@"SELECT * FROM items WHERE track LIKE '%:filter%' ORDER BY linkNum;"
        withParameterDictionary:params];
Or
results = [db executeQuery:@"SELECT * FROM items WHERE track LIKE '%?%' ORDER BY linkNum;", filter];
Or
results = [db executeQuery:@"SELECT * FROM items WHERE track LIKE '%?%' ORDER BY linkNum;" withArgumentsInArray:@[filter]];
I've stepped through and all methods converge in the fmdb method:
- (FMResultSet *)executeQuery:(NSString *)sql withArgumentsInArray:(NSArray*)arrayArgs orDictionary:(NSDictionary *)dictionaryArgs orVAList:(va_list)args
Depending on the approach, and therefore on which params are nil, it then either calls sqlite3_bind_parameter_count(pStmt), which always returns zero, or, in the dictionary case, calls sqlite3_bind_parameter_index(..), which also returns zero. So the parameter never gets slotted into the LIKE, and the result set from the query is wrong.
I know that this is absolutely the wrong way to do it (SQL injection), but it's the only way I've managed to have my LIKE honoured:
NSString *queryString = [NSString stringWithFormat:@"SELECT * FROM items WHERE track LIKE '%%%@%%' ORDER BY linkNum;", filter];
results = [db executeQuery:queryString];
(I've also tried all permutations but with escaped double-quotes in place of the single quotes shown here)
Update:
I've also tried fmdb's own …WithFormat variant, which should provide convenience and protection from injection:
[db executeQueryWithFormat:@"SELECT * FROM items WHERE track LIKE '%%%@%%' ORDER BY linkNum;", filter];
Again, stepping into the debugger I can see that the LIKE gets transformed from this:
… LIKE '%%%@%%' ORDER BY linkNum;
To this:
… LIKE '%%?%' ORDER BY linkNum;
… which also goes on to return zero from sqlite3_bind_parameter_count(), where I would expect a positive value equal to "the index of the largest (rightmost) parameter." (from the sqlite docs)
The error was to include any quotes at all:
[db executeQuery:@"SELECT * FROM items WHERE track LIKE ? ORDER BY linkNum;", filter];
… and the % is now in the filter variable, rather than in the query.
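In other words, build the wildcard pattern in the value that gets bound (a sketch using the same filter variable as above):
// Put the % wildcards into the bound value, not into the SQL text
NSString *pattern = [NSString stringWithFormat:@"%%%@%%", filter];
FMResultSet *results =
    [db executeQuery:@"SELECT * FROM items WHERE track LIKE ? ORDER BY linkNum;", pattern];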
