Dump an in-memory SQLite database to byte[] using a jOOQ context

I need to get a byte[] out of an in-memory database created as below:
DSLContext dsl = DSL.using("jdbc:sqlite::memory:");
Can we use DSLContext to get an InputStream / byte array?
If multiple such "in-memory" contexts are created in separate threads, can there be any race condition w.r.t. SQLite reads/writes on the DSLContext side?

The jOOQ side of your question is quite easy to answer. What you're doing there is incomplete. If you're using the DSL.using(String), DSL.using(String, Properties), or DSL.using(String, String, String) methods, you will get a "resourceful" DSLContext, which you have to close yourself (in order to close the underlying JDBC connection). E.g.:
try (DSLContext dsl = DSL.using("jdbc:sqlite::memory:")) {
...
}
Do note that jOOQ creates an underlying JDBC connection for you and operates on that connection for all methods called on dsl. Apart from that, everything works exactly the same way as if you had been using a JDBC connection directly:
try (Connection connection = DriverManager.getConnection("jdbc:sqlite::memory:")) {
...
}
Regarding your specific questions:
Can we use DSLContext to get an InputStream / byte array?
Of course, just fetch a byte array from your database using jOOQ, there's nothing special about it.
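For instance, assuming a hypothetical table files with a BLOB column content (these names are illustrative, not from the question), fetching it with jOOQ's plain SQL API might look like this:
// Hypothetical schema: files(id integer, content blob)
// fetchOne returns null if no row matched, so check before using the record
Record record = dsl.fetchOne("select content from files where id = ?", 1);
byte[] bytes = record.get(0, byte[].class);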
If multiple such "in-memory" contexts are created in separate threads, can there be any race condition w.r.t. SQLite reads/writes on the DSLContext side?
Without formally validating this against the docs, it can be checked empirically quite simply:
try (
    DSLContext ctx1 = DSL.using("jdbc:sqlite::memory:");
    DSLContext ctx2 = DSL.using("jdbc:sqlite::memory:")
) {
    ctx1.execute("create table x (i int primary key, j varchar(10))");
    ctx1.execute("insert into x values (1, 'c1')");
    ctx2.execute("create table x (i int primary key, j varchar(10))");
    ctx2.execute("insert into x values (1, 'c2')");
    System.out.println(ctx1.fetch("select i, j from x"));
    System.out.println(ctx2.fetch("select i, j from x"));
}
Not only is there no exception when re-creating the table x, there is also no constraint violation on the second insertion of the primary key value 1. The output is:
+----+----+
| i|j |
+----+----+
| 1|c1 |
+----+----+
+----+----+
| i|j |
+----+----+
| 1|c2 |
+----+----+
And as soon as you close the connection / DSLContext, the data is gone. This is because each connection to jdbc:sqlite::memory: opens its own private in-memory database: the two contexts never see each other's data, so there is no shared state for their reads and writes to race on.

Related

Does materialize function not work with tabular function argument in Kusto?

I created a simple function MaterializeRemoteInputTable below, which accepts the output of a table as an input.
To test whether the input is materialized, I use it twice to calculate two scalar values, x and y:
.create-or-alter function MaterializeRemoteInputTable(inputTable: (State:string)) {
    let materializedInput = materialize(inputTable);
    let x = toscalar(materializedInput | count);
    let y = toscalar(materializedInput | where State contains "S" | count);
    print x, y;
}
I am using the sample kusto database's function output as an input to the above function:
cluster('https://help.kusto.windows.net').database('Samples').
GetStatesWithPopulationSmallerThan(1000000)
| invoke MaterializeRemoteInputTable()
On querying the help.kusto.windows.net cluster with my ClientActivityId, I see two queries are executed:
.show commands-and-queries
| where ClientActivityId contains "KE.RunQuery;<guid...>"
The above query outputs two rows:
GetStatesWithPopulationSmallerThan(long(1000000))|__executeAndCache|count as Count|limit long(1)|project ["b2fb..."]=["Count"]
GetStatesWithPopulationSmallerThan(long(1000000))|__executeAndCache|where (["State"] contains ("S"))|count as Count|limit long(1)|project ["5d0e2..."]=["Count"]
Since I have materialized the input to my MaterializeRemoteInputTable, why are two queries executed on the remote cluster, once each for x and y?
There is no issue with materialize and tabular arguments to functions.
The issue is a known limitation of materialize in cross-cluster queries: in some cases the materialization is not performed.
Please note that this is only a performance issue, not a correctness issue (query results are guaranteed to be correct).

Performance of functional architecture with modules vs. class types in F#

I understand that functional architecture is encouraged in F#, but I'm hitting a performance wall that doesn't exist with class types.
I have a state type with a bunch of fields in it, and it is passed around a series of functions through the code pipeline.
At every stage, when some transformation occurs, a new object is created.
An example is this:
match ChefHelpers.evaluateOpening t brain with
| Some openOrder ->
    info $"open order received: {openOrder.Side}"
    match! ExchangeBus.submitOneOrderRetryAsync brain.Exchange openOrder with
    | Ok _ ->
        let m = $"{t.Timestamp.Format}: send open order {openOrder.Side}, last price was {brain.CookOrder.Instrument.PriceToString brain.Analysis.LastPrice}"
        return Message m, ({ brain.WithStatus (Cooking MonitorForClose) with OpenOrder = Some openOrder })
    | Error e ->
        let m = $"{t.Timestamp}: couldn't open position:\n{e.Describe()}"
        return! ChefHelpers.cancelAllOrdersAsync brain (ChefError m) (ExchangeError (Other (e.Describe())))
| None ->
    return NoMessage, brain
Here the object brain, which holds all the state, gets passed around, updated, etc.
This works very well when run live, since everything gets executed two or three times per second at most.
When I want to run the same code on static data to check behavior, it's a different story, because I'm running it millions of times and waiting for it to finish.
All this code deals with small lists, doing basic comparisons, arithmetic, etc., so the cost of rebuilding the main object sticks out and becomes painfully apparent.
I tried rebuilding some of that logic as an object type where the state is a bunch of mutable variables, and the performance difference is dramatic.
I have a lot of code like this:
type A = { ... }
let a : A = ...
let a = doSomething1 a
let a = doSomething2 a
let a =
    match x with
    | true -> doSomething3 a
    | false -> a
etc
I'd say the whole tool's architecture is built with code that looks like that,
and there are a lot of these:
let a = { a with X = 3 }
But there is no concurrency in the pipeline and it is very linear, so for that last line, if I had a way to tell the compiler "it's the same object, it is not used anywhere else, edit it in place", performance would be a lot better.
What strategies could I use to keep the code readable but minimize this issue?
Is the problem the actual data copying? The main object has 18 fields, so I can't imagine it being larger than 200 bytes. Is it the cost of allocating space for it, or the garbage collection it generates?
It's not straightforward to profile, since the cost is spread everywhere and sits inside .NET itself.
Any feedback / ideas would be great, especially "you're doing it wrong, do X instead" :)
From a design perspective, 18 fields is actually a fairly large record, in my opinion. Perhaps you could factor it into sub-records, so you're not constantly re-allocating the entire thing? So instead of this:
type A =
    {
        X : int
        Field1 : int
        Field2 : int
        ...
        Field18 : int
    }
You could have this instead:
type Sub1 =
    {
        Field1 : int
        ...
        Field9 : int
    }
type Sub2 =
    {
        Field10 : int
        ...
        Field18 : int
    }
type A =
    {
        X : int
        Sub1 : Sub1
        Sub2 : Sub2
    }
Then the performance cost of let a = { a with X = 3 } would presumably be lower, since only the three top-level fields (X plus two sub-record references) are copied instead of all nineteen.
Bonus idea: you might also want to emulate the cool Haskell kids and look into lenses, which are designed specifically for reading and updating parts of immutable data.
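If you don't want to pull in a lens library, a minimal hand-rolled lens is just a getter/setter pair. A sketch (Lens, Inner, Outer and their fields are illustrative names, not from the question):
// A lens packages a getter and an immutable setter for one field.
type Lens<'r, 'v> =
    { Get : 'r -> 'v
      Set : 'v -> 'r -> 'r }

type Inner = { Field1 : int }
type Outer = { X : int; Inner : Inner }

// A lens focused on the nested Inner.Field1.
let innerField1 : Lens<Outer, int> =
    { Get = fun o -> o.Inner.Field1
      Set = fun v o -> { o with Inner = { o.Inner with Field1 = v } } }

// Usage: update the nested field without writing out the nesting by hand.
let o = { X = 0; Inner = { Field1 = 1 } }
let o' = innerField1.Set 42 o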

How to get back one row's data in rusqlite?

rustc 1.38.0 (625451e37 2019-09-23)
rusqlite 0.20.0
I'm writing a program where I need to get back the id that sqlite just created for the last insertion.
db.execute("insert into short_names (short_name) values (?1)",params![short]).expect("db insert fail");
let id = db.execute("SELECT id FROM short_names WHERE short_name = '?1';",params![&short]).query(NO_PARAMS).expect("get record id fail");
let receiver = db.prepare("SELECT id FROM short_names WHERE short_name = "+short+";").expect("");
let id = receiver.query(NO_PARAMS).expect("");
println!("{:?}",id);
What I should be getting back is the id value sqlite automatically assigned with AUTOINCREMENT.
I'm getting this compiler error:
error[E0599]: no method named `query` found for type `std::result::Result<usize, rusqlite::Error>` in the current scope
--> src/main.rs:91:100
|
91 | let id = db.execute("SELECT id FROM short_names WHERE short_name = '?1';",params![&short]).query(NO_PARAMS).expect("get record id fail");
| ^^^^^
error[E0369]: binary operation `+` cannot be applied to type `&str`
--> src/main.rs:94:83
|
94 | let receiver = db.prepare("SELECT id FROM short_names WHERE short_name = "+short+";").expect("");
| ------------------------------------------------^----- std::string::String
| | |
| | `+` cannot be used to concatenate a `&str` with a `String`
| &str
help: `to_owned()` can be used to create an owned `String` from a string reference. String concatenation appends the string on the right to the string on the left and may require reallocation. This requires ownership of the string on the left
|
94 | let receiver = db.prepare("SELECT id FROM short_names WHERE short_name = ".to_owned()+&short+";").expect("");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^
error[E0277]: `rusqlite::Rows<'_>` doesn't implement `std::fmt::Debug`
--> src/main.rs:96:25
|
96 | println!("{:?}",id);
| ^^ `rusqlite::Rows<'_>` cannot be formatted using `{:?}` because it doesn't implement `std::fmt::Debug`
|
= help: the trait `std::fmt::Debug` is not implemented for `rusqlite::Rows<'_>`
= note: required by `std::fmt::Debug::fmt`
Line 94: I understand that Rust's String is not the right type for the execute call, but I'm not sure what to do instead.
I suspect what needs to happen is that the short_names table needs to be pulled from the database, and then the id that matches the short I'm working with extracted from the Rust representation of the table. I've been going off this example as a jumping-off point, but it has outlived its usefulness. The program I'm writing calls another program and then babysits it while that other program runs. To reduce overhead I'm trying not to use OOP for this program.
How should I structure my request to the database to get back the id I need?
Okay. First off, we are going to use a struct, because, unlike in Java, it is literally equivalent to not using one in this case, except that you gain the ability to keep things tidy.
You're trying to emulate Connection::last_insert_rowid() with a follow-up SELECT, which isn't a terribly smart thing to do, particularly if you are not in a transaction. We're also going to clear this up for you in a nice and neat fashion:
use rusqlite::Connection;

pub struct ShortName {
    pub id: i64,
    pub name: String,
}

pub fn insert_shortname(db: &Connection, name: &str) -> Result<ShortName, rusqlite::Error> {
    let mut rtn = ShortName {
        id: 0,
        name: name.to_string(),
    };
    db.execute("insert into short_names (short_name) values (?)", &[name])?;
    // last_insert_rowid() returns the rowid of the most recent
    // successful INSERT on this connection.
    rtn.id = db.last_insert_rowid();
    Ok(rtn)
}
You can convince yourself that it works with this test:
#[test]
fn it_works() {
    let conn = Connection::open_in_memory().expect("Could not test: DB not created");
    let input: Vec<bool> = vec![]; // empty parameter list
    conn.execute(
        "CREATE TABLE short_names (id INTEGER PRIMARY KEY AUTOINCREMENT, short_name TEXT NOT NULL)",
        input,
    )
    .expect("Creation failure");
    let output = insert_shortname(&conn, "Fred").expect("Insert failure");
    assert_eq!(output.id, 1);
}
In rusqlite, execute does not return query results, only a count of changed rows. To get values back from a sqlite operation you need prepare and one of the query variants. And while much of Rust lets you leave types up to the compiler, with rusqlite you need to give the receiving variable an explicit type.
Rows is not an Iterator, so you step through it with a while loop on rows.next(); once the single row from this query has been processed, the loop sees that there are no more rows and exits.
You can use query_named to parameterize the SQL you're sending; the named_params!{} macro lets you pass a String into the command.
use rusqlite::*;

fn main() {
    let short = "lookup".to_string(); // example of a string you might use
    let mut id: i64 = 0;
    { // open a scope for the db work
        let db = Connection::open("YourDB.db").expect("db conn fail");
        let mut receiver = db
            .prepare("SELECT * FROM short_names WHERE short_name = :short;")
            .expect("receiver failed");
        let mut rows = receiver
            .query_named(named_params!{ ":short": short })
            .expect("rows failed");
        while let Some(row) = rows.next().expect("while row failed") {
            id = row.get(0).expect("get row failed");
        }
    } // db is dropped (and the connection closed) here
    println!("{}", id);
}
In the example above, we open a scope with {} around the database work and do everything involving db inside it. When that scope ends, db is dropped and the connection is closed automatically, so no explicit close call is needed. The variables short and id are created in main's scope, so they are available both inside the inner scope and after it; id's lifetime begins where it is declared, in main. Note that id must be declared mut: even though the loop assigns it only once when the query returns a single row, any assignment after initialization requires a mutable binding in Rust.
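As an aside, rusqlite also offers Connection::query_row as a shortcut for the single-row case; a minimal sketch, assuming the same short_names schema and an open Connection named db:
// Runs the query and maps the first (only) row to an i64 in one call.
let id: i64 = db
    .query_row(
        "SELECT id FROM short_names WHERE short_name = ?1",
        params![short],
        |row| row.get(0),
    )
    .expect("query_row failed");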

std::map kind of implementation in Lua

I have a std::map which contains a list of values associated with a key. The actual implementation contains many such keys. Is there a similar Lua table construct that could hold multiple values for a specific key? If so, how do I write to and read from that table?
I referred to How do I create a Lua Table in C++, and pass it to a Lua function?
I only have access to set and get functions from my C++ code, which was written generically; I cannot create the table on the C++ side (it is third-party code).
All I have is the ability to get the KeyType, Key, and Value using:
luaState = luaL_newstate();
lua_register(luaState, "getValue", get_value);
lua_register(luaState, "setValue", set_value);.
The C++ code has something like:
typedef std::set<const char *> TValueNames;
std::map<const char *, TValueNames> keyValueList;
From the Lua documentation I understood that I can create a table with the key as its index and assign a value as its data, but I need to know how to assign multiple values (data) to one key.
https://www.lua.org/pil/2.5.html
An example Lua script implementation looks like this:
local keyType = getValue("KeyType");
local Key = getValue("Key");
local Value = getValue("Value");
KeyValueTable = {}
KeyValueTable[Key] = Value;
I need to create something which could hold information like:
["Key1"] = {"Value1", "Value2", "Value3"};
["Key2"] = {"Value2", "Value3", "Value4", "Value5"};
As you know, a key in a Lua table can only refer to one value, but you can easily make that value a table to hold multiple values. To more faithfully represent the set in the C++ structure, we can make the values into keys in the inner table.
local function setValue(self, key, value)
    self[key] = self[key] or {}
    self[key][value] = true
end

local function removeValue(self, key, value)
    if type(self[key]) == 'table' then
        self[key][value] = nil
    end
end

local function checkValue(self, key, value)
    if type(self[key]) == 'table' then
        return self[key][value]
    end
end
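A usage sketch for the helpers above, treating a plain table as the multimap (the key and value strings are illustrative):
local KeyValueTable = {}
setValue(KeyValueTable, "Key1", "Value1")
setValue(KeyValueTable, "Key1", "Value2")
print(checkValue(KeyValueTable, "Key1", "Value1")) --> true
removeValue(KeyValueTable, "Key1", "Value1")
print(checkValue(KeyValueTable, "Key1", "Value1")) --> nil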

How to design/create a key for key/value storage?

I want to store serialized objects (or whatever) in a key/value cache.
Right now I do something like this:
public string getValue(int param1, string param2, etc)
{
    string key = param1 + "_" + param2 + "_" + etc;
    string tmp = getFromCache(key);
    if (tmp == null)
    {
        tmp = getFromAnotherPlace();
        addToCache(key, tmp);
    }
    return tmp;
}
I think this can be awkward. How should I design the key?
If I understood the question, I think the simplest and smartest way to make a key is to use a one-way hash function such as MD5, SHA-1, etc.
At least two reasons for doing this:
The resulting key is unique for all practical purposes (although both MD5 and SHA-1 have known collision attacks, so this is not an absolute guarantee).
The resulting key has a fixed length!
You give your object as the argument of the function and you get your unique key back.
I don't know C# very well, but I am quite sure you can find a built-in one-way hash function.
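A minimal C# sketch of this idea (MakeKey and the separator choice are illustrative, not a standard API):
using System;
using System.Security.Cryptography;
using System.Text;

static class CacheKeys
{
    // Joins the parts with a separator that should not occur inside them
    // (so "1" + "23" cannot collide with "12" + "3"), then hashes so the
    // key has a fixed length no matter what the inputs are.
    public static string MakeKey(params object[] parts)
    {
        string raw = string.Join("\u001F", parts);
        using (var sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(raw));
            return BitConverter.ToString(hash).Replace("-", ""); // 40 hex chars
        }
    }
}

// Usage: string key = CacheKeys.MakeKey(param1, param2);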
First of all, your key seems to be composed of a lot of characters. Keep in mind that the key name also occupies memory (one byte per character), so try to keep it as short as possible. I've seen situations where the key name was larger than the value, which can happen if you store an empty array or an empty value.
As for the key structure: I guess from your example that the object you want to store is identified by the params (one being the item id maybe, or maybe filters for a search [...]). Start with a prefix. The prefix should be the name of the object's class (or a simplified name depicting the object in general).
Most of the time, keys will have a prefix + identifier. In your example you have multiple identifiers. If one of them is a unique id, go with only prefix + id and it should be enough.
If the object is large and you don't always use all of it, change your strategy to multiple-key storage: use one main key for storing the most common values or for listing the components of the object, and store the components' values in separate keys. Use pipelining and fetch the whole object over one connection with one "multiple" query:
mainKey = prefix + objectId;
object = getFromCache(mainKey);
startCachePipeline();
foreach (object[properties] as property) {
    object->property = getFromCache(prefix + objectId + property);
}
endCachePipeline();
The structure for an example "Person" object would then be something like:
person_33 = array(
    properties => array(age, height, weight)
);
person_33_age = 28;
person_33_height = 6;
person_33_weight = 150;
Memcached uses memory most efficiently when the objects stored in it are of similar sizes. The bigger the size difference between objects (not counting one stray big object or other singular cases, although memory gets wasted there as well), the more memory is wasted.
Hope it helps!
