Different results filtered by S>quantile(S, 0.8) and aggrTopN(sortingCol=S, top=0.2, ascending=false) in DolphinDB?

Why are the results of
select SecurityID,TradeDate,aggrTopN(func=abs, funcArgs=CLOSE, sortingCol=CLOSE, top=0.20, ascending=false) as res from data group by SecurityID
and
defg compute_part(CLOSE)
{
index = at(CLOSE>quantile(CLOSE,0.8))
return abs(CLOSE[index])
}
select SecurityID,TradeDate,compute_part(CLOSE) as res from data group by SecurityID
different in DolphinDB?
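One possible reason, sketched in Python rather than DolphinDB: "keep values strictly above the 0.8 quantile" and "keep the top 20% of rows after sorting" are two different selection rules, and they can disagree when values tie at the cut-off or when the quantile is interpolated. The helper names below are hypothetical, the quantile is assumed to use linear interpolation, and rounding the row count up is an assumption that should be checked against the aggrTopN documentation.

```python
import math

def linear_quantile(values, q):
    """Quantile with linear interpolation between order statistics (assumed method)."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def above_quantile(values, q):
    """Rule A: keep values strictly greater than the q-quantile."""
    cut = linear_quantile(values, q)
    return [v for v in values if v > cut]

def top_fraction(values, frac):
    """Rule B: sort descending and keep a fixed share of the rows.
    Rounding up is an assumption, not documented aggrTopN behavior."""
    k = math.ceil(len(values) * frac)
    return sorted(values, reverse=True)[:k]

# Hypothetical prices with ties at the top
close = [10.0, 11.0, 12.0, 12.0, 12.0]
print(above_quantile(close, 0.8))  # [] because nothing is strictly above 12.0
print(top_fraction(close, 0.2))    # [12.0] because one row is always kept
```

With ties at the top, rule A returns nothing while rule B still returns one row, so the two queries need not agree.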

Related

Pandas Dataframe conversion to DynamoDB Table

I have a dataframe in the format below.
I am trying to move it to a DynamoDB table with the following columns:
Device Id | SensorType | TimeStamp | Min | Max | Avg
I am struggling with the code below. Any help would be appreciated.
df = pd.DataFrame(col)
df['SensorValue'] = pd.to_numeric(df['SensorValue'], errors='coerce')
df['CurrentTime'] = pd.to_datetime(df['CurrentTime'])
minute = pd.Grouper(key='CurrentTime', freq='T')
df = df.groupby(['DeviceId','SensorDataType', minute]).SensorValue.agg(['min','max','mean'])
table1 = dynamodb.Table('bsm_data_table')
with table1.batch_writer() as batch:
    for index, row in df.iterrows():
        content = {
            'DeviceId', row['DeviceId'],
            'SensorDataType', row['SensorDataType'],
            'CurrentTime', row['CurrentTime'],
            'min', row['min'],
            'max',row['max'],
            'mean',row['mean']
        }
        batch.put_item(Item=content)
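Two things in the code above are likely to cause trouble. First, `{'DeviceId', row['DeviceId'], ...}` uses commas instead of colons, so Python builds a set rather than a dict. Second, after `groupby(...).agg(...)` the grouping keys live in the index, so `row['DeviceId']` is not available until `reset_index()` is called. In addition, boto3 rejects Python floats and expects `Decimal`. The following is a minimal sketch of the conversion step only; `to_dynamo_item` and the sample row are hypothetical, and storing the timestamp as an ISO 8601 string is an assumption about the table design.

```python
from datetime import datetime
from decimal import Decimal

def to_dynamo_item(record):
    """Convert one aggregated row (a plain dict) into a DynamoDB-friendly item."""
    item = {}
    for key, value in record.items():
        if isinstance(value, float):
            # boto3 does not accept floats; go through str to avoid binary noise
            item[key] = Decimal(str(value))
        elif isinstance(value, datetime):
            # store timestamps as ISO 8601 strings (assumed table design)
            item[key] = value.isoformat()
        else:
            item[key] = value
    return item

# Hypothetical sample, shaped like one row of df.reset_index()
row = {
    "DeviceId": "BSM_G101",
    "SensorDataType": "Temperature",
    "CurrentTime": datetime(2021, 1, 1, 10, 5),
    "min": 20.0,
    "max": 23.0,
    "mean": 21.5,
}
item = to_dynamo_item(row)
print(item)
```

With the real dataframe, the write loop could then be `for record in df.reset_index().to_dict('records'): batch.put_item(Item=to_dynamo_item(record))` inside the `batch_writer()` block.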

XQUERY for EMC XDB separate for loops return in results

I want the results from one table displayed at the top of the output and the results from the other table at the bottom.
I tried a few approaches, including a join, but the join alternates the records:
table1 record
table2 record
table1 record
table2 record
I need
table1 record
table2 record
table2 record
table2 record
{
for $an in /db/table1/row
where $an/ACCOUNT = "something"
return $an
}
{
for $a in /db/table2/row
where $a/PAT_ACCT_NBR = "something"
return $a
}
results
$an here
$a here.
If I understand you correctly, you could simply query each table as needed and combine the two results as a sequence, which keeps them in the desired order:
let $table1 := /db/table1/row[ACCOUNT = 'something']
let $table2 := /db/table2/row[PAT_ACCT_NBR = 'something']
return ($table1, $table2)
The XPath part is just a suggestion; use whatever works for you.
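The same idea, run each query separately and then concatenate the results, can be sketched in Python with the standard library's ElementTree. This is only an illustration of the ordering, not xDB code; the sample document and the NAME element are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data mirroring /db/table1/row and /db/table2/row
doc = ET.fromstring("""
<db>
  <table1>
    <row><ACCOUNT>something</ACCOUNT><NAME>t1-a</NAME></row>
    <row><ACCOUNT>other</ACCOUNT><NAME>t1-b</NAME></row>
  </table1>
  <table2>
    <row><PAT_ACCT_NBR>something</PAT_ACCT_NBR><NAME>t2-a</NAME></row>
    <row><PAT_ACCT_NBR>something</PAT_ACCT_NBR><NAME>t2-b</NAME></row>
  </table2>
</db>
""")

# Query each table on its own
table1 = doc.findall("./table1/row[ACCOUNT='something']")
table2 = doc.findall("./table2/row[PAT_ACCT_NBR='something']")

# Concatenating keeps all table1 rows first, then all table2 rows
combined = table1 + table2
names = [row.findtext("NAME") for row in combined]
print(names)  # ['t1-a', 't2-a', 't2-b']
```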

row with max value per group - SQLite

Given a table with columns(name, lat, lon, population, type) where there are many rows for each name, I'd like to select the rows grouped by name where population is the highest. The following works if I restrict myself to just name and population
SELECT name, Max(population)
FROM table WHERE name IN ('a', 'b', 'c')
GROUP BY name;
But I want the other columns — lat, lon, type — as well in the result. How can I achieve this using SQLite?
SQLite (3.7.11 and later) allows you to just list the other columns you want; they are guaranteed to come from the row with the maximum value:
SELECT name, lat, lon, Max(population), type
FROM table
WHERE name IN ('a', 'b', 'c')
GROUP BY name;
The docs read:
Special processing occurs when the aggregate function is either min() or max(). Example:
SELECT a, b, max(c) FROM tab1 GROUP BY a;
When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.
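The documented behavior can be checked quickly from Python's built-in sqlite3 module. This is a small sketch with made-up data; the table is called `places` here because `table` is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE places (name TEXT, lat REAL, lon REAL, population INTEGER, type TEXT)"
)
# Hypothetical sample rows: two candidates per name
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1.0, 1.5, 100, "village"),
        ("a", 2.0, 2.5, 900, "city"),
        ("b", 3.0, 3.5, 500, "town"),
        ("b", 4.0, 4.5, 200, "village"),
    ],
)

# The bare columns lat, lon and type follow the row that holds MAX(population)
rows = conn.execute(
    "SELECT name, lat, lon, MAX(population), type FROM places "
    "WHERE name IN ('a', 'b') GROUP BY name ORDER BY name"
).fetchall()
print(rows)  # [('a', 2.0, 2.5, 900, 'city'), ('b', 3.0, 3.5, 500, 'town')]
```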
Join against that result to get the complete table records
SELECT t1.*
FROM your_table t1
JOIN
(
SELECT name, Max(population) as max_population
FROM your_table
WHERE name IN ('a', 'b', 'c')
GROUP BY name
) t2 ON t1.name = t2.name
and t1.population = t2.max_population
RANK or ROW_NUMBER window functions
Although max is guaranteed to work on SQLite as mentioned at https://stackoverflow.com/a/48328243/895245 the following method appears to be more portable and versatile:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER (
PARTITION BY "name"
ORDER BY "population" DESC
) AS "rnk",
*
FROM "table"
WHERE "name" IN ('a', 'b', 'c')
) sub
WHERE
"sub"."rnk" = 1
ORDER BY
"sub"."name" ASC,
"sub"."population" DESC
That exact same code works on both SQLite 3.34.0 and PostgreSQL 14.3.
Furthermore, we can easily modify that query to cover the following use cases:
if you replace ROW_NUMBER() with RANK(), it returns all ties for the max if more than one row reaches the max
if you replace "sub"."rnk" = 1 with "sub"."rnk" <= n you can get the top n per group rather than just the top 1
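A short sketch of the difference between the two window functions, again using Python's sqlite3 module with made-up data. It assumes the bundled SQLite is 3.25 or newer, since older versions have no window functions; `top_per_group` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, population INTEGER, type TEXT)")
# Hypothetical sample rows: name 'a' has two rows tied at the maximum
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [("a", 900, "city"), ("a", 900, "port"), ("a", 100, "village"), ("b", 500, "town")],
)

def top_per_group(window_fn, n=1):
    """Return rows whose per-name rank is <= n, using ROW_NUMBER or RANK."""
    sql = f"""
        SELECT name, population, type FROM (
            SELECT {window_fn}() OVER (
                PARTITION BY name ORDER BY population DESC
            ) AS rnk, *
            FROM places
        ) sub
        WHERE rnk <= ?
        ORDER BY name, population DESC, type
    """
    return conn.execute(sql, (n,)).fetchall()

print(len(top_per_group("ROW_NUMBER")))  # 2: exactly one row per name
print(top_per_group("RANK"))             # both tied rows for 'a', plus 'b'
```

Which of the tied rows ROW_NUMBER() keeps is not determined unless the ORDER BY inside the window adds a tie-breaker.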

Casting comma separated to integer for IN clause

I have three tables estimate, location and department. Now I am JOINing tables location and estimate to get desired results.
Query
SELECT e.id, e.department_ids FROM estimate e JOIN location l ON e.location_id = l.id WHERE e.user_id = '1' and e.delete_flag = 0 and l.active_flag = 1
Result
For above requirement this query was working fine.
Now I want relevant department names as well. So I am using this query
Query
SELECT e.id, e.department_ids, (SELECT group_concat(department, ', ') FROM department WHERE id IN (e.department_ids)) as departmentName FROM estimate e JOIN location l ON e.location_id = l.id WHERE e.user_id = '1' and e.delete_flag = 0 and l.active_flag = 1
Result
This returns department names only for rows that have a single department id.
However, if I hardcode e.department_ids as "2, 5", I get the desired result:
Query
SELECT e.id, e.department_ids, (SELECT group_concat(department, ', ') FROM department WHERE id IN (2, 5)) as departmentName FROM estimate e JOIN location l ON e.location_id = l.id WHERE e.user_id = '1' and e.delete_flag = 0 and l.active_flag = 1
Result
I tried cast(e.department_ids as integer), but this also picks up only a single department id per row. Is there a function that can convert the whole string in e.department_ids (e.g. "4, 2") so that I can pass it to the IN clause?
I found a solution for this in Oracle, but I could not find its equivalent for SQLite.
I got the desired result using GROUP BY clause.
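The subquery fails because `id IN (e.department_ids)` compares each id against a single string value such as "2, 5", not against a list of numbers. One way to get the names with GROUP BY, sketched below with Python's sqlite3 module, is to join on a LIKE match against the comma-delimited string. This is an assumption about what the GROUP BY solution might look like; the department names are made up and the join to location is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, department TEXT);
    CREATE TABLE estimate (id INTEGER PRIMARY KEY, department_ids TEXT);
    -- Hypothetical sample data
    INSERT INTO department VALUES (2, 'Sales'), (4, 'Legal'), (5, 'Support');
    INSERT INTO estimate VALUES (1, '2, 5'), (2, '4');
""")

# Wrap the id list in commas so that ',2,' cannot match inside ',12,'
rows = conn.execute("""
    SELECT e.id, e.department_ids,
           group_concat(d.department, ', ') AS departmentName
    FROM estimate e
    JOIN department d
      ON ',' || replace(e.department_ids, ' ', '') || ','
         LIKE '%,' || d.id || ',%'
    GROUP BY e.id
    ORDER BY e.id
""").fetchall()

for row in rows:
    print(row)
```

The order of names inside group_concat is not guaranteed, and storing the ids in a separate link table would avoid the string matching altogether.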

Modify Crossfilter "Flights" example so one of graphs is not a group-by graph?

I'm trying to use the Crossfilter example site as a start for my desired graph, but am struggling with creating a non grouped graph that interacts with a grouped graph.
My data is a list of unique employee records:
employee,cnt
john,3
bill,15
fred,30
jill,6
...
I want one graph to show the cnt field grouped by value, analogous to the example's Distance graph. The next graph I want would have a bar for each employee, but instead of grouping them by employee value, I want the graph instead to simply show the cnt value.
Here's what I got going so far; however, this does group-by on both graphs:
// ...
var crossData = crossfilter(data),
    all = crossData.groupAll(),
    cnt = crossData.dimension(function(d) { return d.cnt; }),
    cnts = cnt.group(),
    emp = crossData.dimension(function(d) { return d.employee; }),
    emps = emp.group();
var charts = [
  barChart()
      .dimension(cnt)
      .group(cnts)
    .x(d3.scale.linear()
      .domain([0, 15])
      .rangeRound([0, 920])),
  barChart()
      .dimension(emp)
      .group(emps)
    .x(d3.scale.ordinal().rangePoints([0, 920])
      .domain(data.map(function(d) { return d.employee; })))
];
// ...
Make your "emps" group sum by cnt, like this:
emps = emp.group().reduceSum(function (d) { return d.cnt; });
That will give you the sum of the cnt field for each employee. Since you only have one record per employee, you'll just get the value of the cnt field.
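The difference between the default group (a count of records per key) and a sum-reduced group can be illustrated outside Crossfilter with a few lines of Python. This is only an analogy for what the two reductions compute, using the sample records from the question.

```python
from collections import Counter, defaultdict

data = [
    {"employee": "john", "cnt": 3},
    {"employee": "bill", "cnt": 15},
    {"employee": "fred", "cnt": 30},
    {"employee": "jill", "cnt": 6},
]

# Analogue of emp.group(): number of records per employee
counts = Counter(d["employee"] for d in data)

# Analogue of emp.group().reduceSum(d => d.cnt): sum of cnt per employee
sums = defaultdict(int)
for d in data:
    sums[d["employee"]] += d["cnt"]

print(dict(counts))  # {'john': 1, 'bill': 1, 'fred': 1, 'jill': 1}
print(dict(sums))    # {'john': 3, 'bill': 15, 'fred': 30, 'jill': 6}
```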

Resources