KDB: Create Empty Table Using Dynamic Columns

I want to create an empty table with the following static columns:
date, security, active, horizon
and an undefined number of additional columns that are represented by the following variables:
outFactor, subFacCols
The columns represented by outFactor and subFacCols are float types. How can I create a dummy table with the aforementioned columns?
Example:
These are the first 5 columns, not including subFacCols
dummyTable:flip (`date`security`active`horizon,outFactor)!(`date$();`int$();`boolean$();`int$();`float$())

You need the key and value of the dictionary to be of the same length, therefore the following should work:
q)outFactor:`price`size
q)subFacCols:`bestBid
q)dummyTable:flip (`date`security`active`horizon,outFactor,subFacCols)!(`date$();`int$();`boolean$();`int$()),(count[outFactor]#`float$()),count[subFacCols]#`float$()
q)meta dummyTable
c       | t f a
--------| -----
date    | d
security| i
active  | b
horizon | i
price   | f
size    | f
bestBid | f
Uses: https://code.kx.com/q/ref/lists/#take
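As a quick check that the generated schema behaves as expected, a row can then be inserted (the values below are made up purely for illustration, not taken from the question):
q)`dummyTable insert (2024.01.02;1i;1b;5i;9.5;100f;9.4)
,0
q)count dummyTable
1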

Related

Kusto: Apply function on multiple column values during bag_unpack

Given a dynamic field, say milestones, with a value like {"ta": 1655859586546, "tb": 1655859586646}:
How do I print a table with columns such as "ta", "tb", etc., whose single row contains unixtime_milliseconds_todatetime(tolong(taValue)), unixtime_milliseconds_todatetime(tolong(tbValue)), and so on?
I figured that I'll need to write a function that I can call, so I created this:
let f = view(a:string) {
    unixtime_milliseconds_todatetime(tolong(a))
};
I can use this function with a normal column as: project f(columnName).
However, in this case it's a dynamic field, and the number of items in the list is large, so I do not want to enter the fields manually. This is what I have so far.
log_table
| take 1
| evaluate bag_unpack(milestones, "m_") // This gives me fields as columns
// | project-keep m_* // This would work if I just wanted the raw values; however, I want f(columnValue)
| project-keep f(m_*) // This of course doesn't work, but explains the idea.
Based on the mv-apply operator
// Generate data sample. Not part of the solution.
let log_table = materialize(
    range record_id from 1 to 10 step 1
    | mv-apply range(1, 1 + rand(5), 1) on (
        summarize milestones = make_bag(
            pack_dictionary(
                strcat("t", make_string(to_utf8("a")[0] + toint(rand(26)))),
                1600000000000 + rand(60000000000)))));
// Solution Starts here.
log_table
| mv-apply kv = milestones on
(
    // each expanded row holds a single key/value pair of the bag in kv
    extend k = tostring(bag_keys(kv)[0])
    // convert the epoch-milliseconds value into a datetime
    | extend v = unixtime_milliseconds_todatetime(tolong(kv[k]))
    // re-assemble the converted pairs into one bag per record
    | summarize milestones = make_bag(pack_dictionary(k, v))
)
| evaluate bag_unpack(milestones)
Output: a table with one row per record_id (1 to 10) and columns ta, tb, ..., tz (one per generated key), where each populated cell holds the converted datetime, e.g. 2021-07-06T20:24:47.767Z.

Julia Markdown - Chunk Output as Markdown

I would like to print a table in Julia Markdown. To the best of my knowledge there is no package that does this yet, so I would like to create a nice-looking table through code, but I can't figure out how.
This is my table code...
---
title: Just a test
author: Me
date: 2022-01-03
output: pdf_document
---
```julia
"""
| Column One | Column Two | Column Three |
|:---------- | ---------- |:------------:|
| Row `1` | Column `2` | |
| *Row* 2 | **Row** 2 | Column ``3`` |
"""
```
...and I want it to produce a nicely rendered table (shown as a screenshot in the original post) instead of the raw Markdown text.
The Markdown standard library can parse tables too:
julia> tbl = """
| Column One | Column Two | Column Three |
|:---------- | ---------- |:------------:|
| Row `1` | Column `2` | |
| *Row* 2 | **Row** 2 | Column ``3`` |
"""
julia> md = Markdown.parse(tbl);
julia> # text formatting like emphasis and bold are lost in pasting
# to StackOverflow, but shown in the original output
md
  Column One Column Two Column Three
  –––––––––– –––––––––– ––––––––––––
  Row 1      Column 2
  Row 2      Row 2        Column 3
The parse output is a Markdown.MD object that is rendered appropriately depending on your output display (e.g. terminal, Jupyter, etc.).
If you want to produce a markdown table directly from data (without parsing it from a string), you can also construct a Markdown.Table directly; check the varinfo() function from the InteractiveUtils standard library for an example of that.
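For example, here is a rough sketch of building the question's table programmatically rather than parsing a string; it assumes the two-argument Markdown.Table form (rows plus one alignment symbol per column), which is what varinfo uses internally:

```julia
using Markdown

# Rows of the table: the first row is treated as the header row.
rows = Vector{Any}[
    Any["Column One", "Column Two", "Column Three"],
    Any["Row 1",      "Column 2",   ""],
    Any["Row 2",      "Row 2",      "Column 3"],
]

# One alignment symbol per column: :l (left), :c (centre), :r (right).
tbl = Markdown.MD(Any[Markdown.Table(rows, [:l, :l, :c])])
```

Displaying tbl then renders the same way as the parsed version above.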

SQLite find table row where a subset of columns satisfies a specified constraint

I have the following SQLite table
CREATE TABLE visits(urid INTEGER PRIMARY KEY AUTOINCREMENT,
                    hash TEXT, dX INTEGER, dY INTEGER, dZ INTEGER);
Typical content would be
# select * from visits;
 urid | hash   | dX | dY | dZ
------+--------+----+----+----
    1 | 'abcd' | 10 | 10 | 10
    2 | 'abcd' | 11 | 11 | 11
    3 | 'bcde' |  7 |  7 |  7
    4 | 'abcd' | 13 | 13 | 13
    5 | 'defg' | 20 | 21 | 17
What I need to do here is identify the urid for the table row which satisfies the constraint
hash = 'abcd' AND nearby >= (abs(dX - tX) + abs(dY - tY) + abs(dZ - tZ))
with the smallest deviation - in the sense of smallest sum of absolute distances
In the present instance with
nearby = 7
tX = tY = tZ = 12
there are three rows that meet the above constraint but with different deviations
 urid | hash   | dX | dY | dZ | deviation
------+--------+----+----+----+-----------
    1 | 'abcd' | 10 | 10 | 10 |     6
    2 | 'abcd' | 11 | 11 | 11 |     3
    4 | 'abcd' | 13 | 13 | 13 |     3
in which case I would like to have reported urid = 2 or urid = 4 - I don't actually care which one gets reported.
Left to my own devices I would fetch the full set of matching rows and then drill down to the one that matches my secondary constraint - smallest deviation - in my own Java code. However, I suspect that is not necessary and it can be done in SQL alone. My knowledge of SQL is sadly too limited here. I hope that someone here can put me on the right path.
I now have managed to do the following
CREATE TEMP TABLE h1(v1 INTEGER, v2 INTEGER);
INSERT INTO h1
SELECT urid, (abs(dX - 12) + abs(dY - 12) + abs(dZ - 12)) AS devi
FROM visits
WHERE hash = 'abcd';
which gives
--SELECT * FROM h1
 urid | devi |
------+------+
    1 |    6 |
    2 |    3 |
    4 |    3 |
following which I issue
select urid from h1 order by v2 asc limit 1;
which yields urid = 2, the result I am after. Whilst this works, I would like to know if there is a better/simpler way of doing this.
You're so close! You have all of the components you need, you just have to put them together into a single query.
Consider:
SELECT urid
     , (abs(dX - :tx) + abs(dY - :ty) + abs(dZ - :tz)) AS devi
  FROM visits
 WHERE hash = :hashval AND devi < :nearby
 ORDER BY devi
 LIMIT 1
Line by line: first you list the columns and computed values you want from the visits table (:tx, :ty, :tz and the others are placeholders; in your code you prepare a statement and then bind values to the placeholders before executing it).
Then in the WHERE clause you restrict what rows get returned to those matching the particular hash (That column should have an index for best results... CREATE INDEX visits_idx_hash ON visits(hash) for example), and that have a devi that is less than the value of the :nearby placeholder. (I think devi < :nearby is clearer than :nearby >= devi).
Then you say that you want those results sorted in increasing order according to devi, and LIMIT the returned results to a single row because you don't care about any others (If there are no rows that meet the WHERE constraints, nothing is returned).
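For completeness, here is one way the placeholders might be bound from Java with JDBC. This is only a sketch: the class name, the visits.db file, and the presence of an SQLite JDBC driver on the classpath are assumptions, not part of the question.

import java.sql.*;

// Sketch only: assumes an SQLite JDBC driver on the classpath and a
// database file called visits.db (both illustrative).
public class NearestVisit {
    public static void main(String[] args) throws SQLException {
        // Same query as above, with positional '?' parameters in place of the
        // named placeholders. Referencing the devi alias in WHERE mirrors the
        // query in the answer and relies on SQLite's lenient name resolution.
        String sql =
            "SELECT urid, (abs(dX - ?) + abs(dY - ?) + abs(dZ - ?)) AS devi "
          + "FROM visits WHERE hash = ? AND devi < ? "
          + "ORDER BY devi LIMIT 1";

        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:visits.db");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, 12);         // tX
            ps.setInt(2, 12);         // tY
            ps.setInt(3, 12);         // tZ
            ps.setString(4, "abcd");  // hash
            ps.setInt(5, 7);          // nearby
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println("urid = " + rs.getLong("urid"));
                }
            }
        }
    }
}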

Last matching date in spreadsheet function

I have a spreadsheet where dates are being recorded in regards to individuals, with additional data, as such:
Tom | xyz | 5/2/2012
Dick | foo | 5/2/2012
Tom | bar | 6/1/2012
On another sheet there is a line in which I want to be able to put in the name, such as Tom, and retrieve on the following cell through a formula the data for the LAST (most recent by date) entry in the first sheet. So the first sheet is a log, and the second sheet displays the most recent one. In the following example, the first cell is entered and the remaining are formulas displaying data from the first sheet:
Tom | bar | 6/1/2012
and so on, showing the latest dated entry in the log.
I'm stumped, any ideas?
If you only need to do a single lookup, you can do that by adding two new columns in your log sheet:
Sheet1
  | A    | B   | C        | D | E | F
1 | Tom  | xyz | 6/2/2012 |   | * | *
2 | Dick | foo | 5/2/2012 |   | * | *
3 | Tom  | bar | 6/1/2012 |   | * | *
Sheet2
  | A    | B          | C
1 | Tom  | =Sheet1.E1 | =Sheet1.F1
*(E1) = =IF(AND($A1=Sheet2.$A$1;E2=0);B1;E2)
(i.e. paste the formula above in E1, then copy/paste it in the other cells with *)
Explanation: if A is not what you're looking for, go for the next; if it is, but there is a non-empty next, go for the next; otherwise, get it. This way you're selecting the last one corresponding to your search. I'm assuming you want the last entry, not "the one with the most recent date", since that's what you asked in your example. If I interpreted your question wrong, please update it and I can try to provide a better answer.
Update: If the log dates can be out of order, here's how you get the last entry:
*(F1) = =IF(AND($A1=Sheet2.$A$1;C1>=F2);C1;F2)
*(E1) = =IF(C1=F1;B1;E2)
Here I just replaced the test F2=0 (select next if non-empty) with C1>=F2 (select next if more recent) and, for the other column, select next if the first test also did so.
Disclaimer: I'm very inexperienced with spreadsheets, the solution above is ugly but gets the job done. For instance, if you wanted a 2nd row in Sheet2 to do another lookup, you'd need to add two more columns to Sheet1, etc.

Code new variable based on grep return in R

I have a variable actor which is a string and contains values like "military forces of guinea-bissau (1989-1992)" and a large range of other different values that are fairly complex. I have been using grep() to find character patterns that match different types of actors. For example I would like to code a new variable actor_type as 1 when actor contains "military forces of", doesn't contain "mutiny of", and the string variable country is also contained in the variable actor.
I am at a loss as to how to conditionally create this new variable without resorting to some type of horrible for loop. Help me!
Data looks roughly like this:
| | actor | country |
|---+----------------------------------------------------+-----------------|
| 1 | "military forces of guinea-bissau" | "guinea-bissau" |
| 2 | "mutiny of military forces of guinea-bissau" | "guinea-bissau" |
| 3 | "unidentified armed group (guinea-bissau)" | "guinea-bissau" |
| 4 | "mfdc: movement of democratic forces of casamance" | "guinea-bissau" |
If your data is in a data.frame df:
> ifelse(!grepl('mutiny of', df$actor) &
+        grepl('military forces of', df$actor) &
+        apply(df, 1, function(x) grepl(x[2], x[1])),
+        1, 0)
[1] 1 0 0 0
grepl returns a logical vector, and the result of the whole expression can be assigned to whatever you like, e.g. df$actor_type.
Breaking that apart:
!grepl('mutiny of', df$actor) and grepl('military forces of', df$actor) satisfy your first two requirements. The last piece, apply(df, 1, function(x) grepl(x[2], x[1])), goes row by row and greps for country in actor.
