Turning Datalog queries into SQL(ite) queries

Datalog is a lovely language for querying relational data. It is simple, clear, composes well, and supports recursive queries without additional syntax.
SQLite is a fantastic embedded database with what seems to be a powerful query engine able to handle recursive queries – see the examples at the bottom of the SQLite documentation page on recursive common table expressions for generating Mandelbrot sets and finding all possible solutions to Sudoku puzzles!
I'm interested to know whether there is a fairly standard way to translate a Datalog query into the recursive SQL supported by SQLite, or whether there are libraries that provide this facility.

DLVDB is an interpreter for recursive Datalog that uses an ODBC database connection for its extensional data: http://www.dlvsystem.com/dlvdb/
Apart from that, the paper
S. Ceri, G. Gottlob, and L. Tanca. 1989. What You Always Wanted to Know About Datalog (And Never Dared to Ask). IEEE Trans. on Knowl. and Data Eng. 1, 1 (March 1989), 146-166. http://dx.doi.org/10.1109/69.43410
provides theoretical background and some pointers for translating Datalog into relational algebra.
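To make the translation concrete, here is a sketch of the classic case: a linear-recursive Datalog program for reachability rendered as an SQLite WITH RECURSIVE query, driven from Python's sqlite3 module. The edge relation and all names are invented for the example.

import sqlite3

# Datalog program being translated (edge is the extensional relation):
#   reach(X, Y) :- edge(X, Y).
#   reach(X, Y) :- reach(X, Z), edge(Z, Y).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE edge (src TEXT, dst TEXT)')
conn.executemany('INSERT INTO edge VALUES (?, ?)',
                 [('a', 'b'), ('b', 'c'), ('c', 'a')])

rows = conn.execute('''
    WITH RECURSIVE reach(x, y) AS (
        SELECT src, dst FROM edge              -- base rule
        UNION                                  -- set semantics, as in Datalog
        SELECT reach.x, edge.dst               -- recursive rule
        FROM reach JOIN edge ON reach.y = edge.src
    )
    SELECT x, y FROM reach ORDER BY x, y
''').fetchall()
print(rows)

Using UNION rather than UNION ALL gives Datalog's set semantics, which is also what makes the query terminate on cyclic data such as the a -> b -> c -> a edges above.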

Related

Is there a language that does both what SQL does and general purpose programming?

I want to implement some game logic with a lot of relations between objects similar to those of relational databases or graph databases.
I know of no language that would allow me to do all of the following:
Strong, safe relational mapping with non-nullable links, cascade deletion, etc.
Implement game logic
Write pure functions
Networking
If possible, decent data-access performance (in-memory SQLite performance is acceptable)
I want to avoid using two languages and mapping the data between them with some rather complex ORM. Instead I would like a single language that is capable of all of these.
Obviously, there is SQL. But I do not know of any implementation of SQL that:
Is capable of networking other than replying to SQL requests
Has the many features of a language like F#. SQL is capable of functional programming, but what about F# features like pipes, partial application, pattern matching, and strong typing over primitive types?
I will accept partial alternative solutions.
Note that I do not need actual persistent storage, only object relations like those of relational databases, or even graph databases.
The answer is no, within the bounds as you have set them.
The purpose of The Third Manifesto is to define a language called D, which has the features of a general purpose programming language but implements a type system and relational features specifically aimed at database management. If implemented fully it might replace SQL, but not common GP languages such as C/C++, Java or C#.
There are many GP languages which can do all the things you propose, when used in conjunction with suitably chosen libraries. For the closest match to what you describe, you should stick with any language that suits your other needs, and add to it an in-memory in-process database that uses an API and not SQL. Almost by definition that means you should look for a 'NoSQL' database. There are many.
Your question was mentioned here: https://forum.thethirdmanifesto.com/forum/topic/is-there-a-language-that-does-both-what-sql-does-and-general-purpose-programming/. You might find the subsequent discussion enlightening.

How well does UnQLite perform? How does it compare to SQLite (in performance)?

I've researched what I can about SQLite and UnQLite, but there are still a few things that haven't quite been answered yet. UnQLite appears to have been released only within the past few years, which would account for the lack of benchmarks. "Performance" (read/write speed, querying, average database size before significant slowdown, etc.) comparisons may be somewhat apples-to-oranges here.
From all that I have seen, the two have very few differences, comparatively speaking, namely that SQLite is a relational database whereas UnQLite is a key-value and document database (via Jx9). They're both portable, cross-platform, and 32/64-bit friendly, and both allow single-writer and multiple-reader connections. Very little can be found on UnQLite benchmarks, while SQLite has quite a few, with different implementations across various (scripting) languages. SQLite shows varied performance across in-memory databases, indexed data, and read/write modes with varying data sizes. Overall, SQLite appears quick and reliable.
All the UnQLite material I can find is unreliable and confusing, and I cannot seem to find anything helpful. What read/write speeds does UnQLite peak at? What languages are (not) recommended when using UnQLite? What are some known disadvantages and bugs?
If it helps at all to explain my intrigue: I'm developing a network utility that will read and process packets, with hot-swapping between network interfaces. Since the connections can, though it is unlikely, reach speeds up to 1 Gbps, a lot of raw data will be written out to a database. The project is still in the early stages of development and I'm having to find a way to balance performance. There are a lot of factors: missed packets, how large each write is, how quickly it can process and move data, how much organization will be required, how many tables will be needed, whether I can implement multiprocessing, how reliant each database is on HDD speeds, and so on. My data will need tables, but whether I have to store them as relational is still up in the air. Seeing how the two stack up, with their own pros and cons (aside from the usual KVP-versus-relational debate), may push me towards one or, if I'm crazy enough, a mix of both.
I've done a bit of fooling around with UnQLite using python bindings I wrote. The Python bindings use cython and are quite fast.
What I've found from my experimentation is that UnQLite's key/value APIs are pretty damn fast, comparable to other DBM-style stores. Things slow down a bit when you start using Jx9 and the document store, though.
Basically it depends on what you need:
If you want SQL and ad-hoc querying, I'd suggest using SQLite. It is plenty fast and quite flexible.
If you want just keys and values, I'd use something like LevelDB or RocksDB.
If you want a lightweight JSON document store, or key/value with a bit "extra", then UnQLite may be a good fit.
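For a concrete feel of the two APIs compared above, here is a minimal sketch using the unqlite-python bindings mentioned in this answer; the database contents and collection name are invented, and the calls reflect my understanding of those bindings rather than a benchmark.

from unqlite import UnQLite  # pip install unqlite

db = UnQLite()  # in-memory database; pass a filename for on-disk storage

# The fast path: plain key/value access.
db['packet:1'] = 'raw bytes here'
print(db['packet:1'])

# The slower path: the Jx9-backed document store.
packets = db.collection('packets')
packets.create()  # create the collection if it does not exist
packets.store([{'iface': 'eth0', 'length': 1514},
               {'iface': 'eth1', 'length': 60}])
print(packets.fetch(0))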

Is it possible to reference custom code in the WHERE clause of Neo4j's Cypher query language?

Is it possible to use Neo4j's Cypher query language (or another declarative language) but still reference custom code snippets, for instance to build custom WHERE clauses based on, say, the result of an ElasticSearch/Lucene search?
If other graph DBs have declarative languages that support this, please shoot. I'm in no way bound to Neo4j.
Background:
I'm doing some research whether to include Neo4J in my current stack, which in the backend already consists of ElasticSearch, MongoDB and Redis.
Particularly with Redis' fast set-intersection capability, I could potentially create some crude graph-like querying (although likely not as performant as a graph DB). I'm a long way into defining a DSL with the types of queries to support.
However, I'm designing a CMS, so the content types, and the relationships between these content types that I would like to model with a graph, are not known beforehand.
Therefore the ideal approach, populating the needed Redis collections (with Mongo as the source) to support all my querying over content types and relationships that are not known at design time, will be messy to say the least. Hope you're still following.
Which leads me to conclude that another solution may be needed, which is why I'm looking at graph DBs, and Neo4j in particular (if others are potentially better suited to my use case, do shoot).
If you model your content types as nodes, you don't need to know them beforehand.
User-defined functions in JavaScript are planned for Cypher later this year.
You can use a language like Gremlin to declare your functions in Groovy, though.
You can store the node IDs in Redis and then pass an array of IDs returned by Redis to a Cypher query for further processing:
START n = node({ids})
MATCH n-[:HAS_TYPE]->content_type<-[:HAS_TYPE]-other_content
RETURN content_type, count(*)
ORDER BY count(*) DESC
LIMIT 10
parameters: {"ids": [1,2,3,5]}
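As a hedged sketch of that Redis-to-Cypher handoff with present-day client libraries (redis-py and the official neo4j Python driver, which uses $ids parameter syntax instead of {ids}; the key name, label, and credentials are invented):

import redis
from neo4j import GraphDatabase

r = redis.Redis()
# Hypothetical set of candidate node IDs produced by earlier set intersections.
ids = [int(member) for member in r.smembers('candidate-node-ids')]

driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'secret'))
with driver.session() as session:
    result = session.run(
        '''
        MATCH (n) WHERE id(n) IN $ids
        MATCH (n)-[:HAS_TYPE]->(content_type)<-[:HAS_TYPE]-(other_content)
        RETURN content_type, count(*) AS occurrences
        ORDER BY occurrences DESC
        LIMIT 10
        ''',
        ids=ids)
    for record in result:
        print(record['content_type'], record['occurrences'])
driver.close()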

'Pre-prepared' statements in SQLite3?

Using SQLite in a memory-constrained embedded system with a fixed set of queries, it seems that code and data savings could be made if the queries could be 'pre-prepared'. That is, the prepared statement is produced by (an equivalent of) sqlite3_prepare_v2() at build time, and only _bind(), _step() etc need to be called at runtime, referencing one or more sqlite3_stmt* pointers that are effectively static data. The entire SQL parsing (and query planning?) engine could be eliminated from the target.
I realise that there is considerable complexity hidden behind the sqlite3_stmt* pointer, and that this is highly unlikely to be practical with the current sqlite3 implementation - but is the concept feasible?
This was discussed on the sqlite-users mailing list in 2006. At that time D. Richard Hipp supported a commercial version of SQLite that ran compiled statements on a stripped-down target which did not include the SQL parser. Perhaps you could check with Hwaci to see whether this product is still available.
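The build-time half of the idea has no public SQLite equivalent that I know of, but the run-time half, prepare once and then bind/step repeatedly, is easy to demonstrate. A sketch using Python's sqlite3 module, whose executemany() compiles the statement once and rebinds it for each parameter tuple (table and data are invented):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE reading (sensor TEXT, value REAL)')

# One parse/plan, many bind+step cycles: the statement below is
# prepared a single time and reused for every tuple in samples.
samples = [('t0', 21.5), ('t1', 21.7), ('t2', 21.4)]
conn.executemany('INSERT INTO reading VALUES (?, ?)', samples)

print(conn.execute('SELECT count(*) FROM reading').fetchone())  # (3,)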

Query language for graph sets: data modeling question

Suppose I have a set of directed graphs. I need to query those graphs. I would like to get a feeling for my best choice for the graph modeling task. So far I have these options, but please don't hesitate to suggest others:
Proprietary implementation (matrix) and graph traversal algorithms
RDBMS and SQL (too space-consuming)
RDF and SPARQL (too slow)
What would you guys suggest? Regards.
EDIT: Just to answer Mad's questions:
Each one is relatively small, no more than 200 vertices, 400 edges. However, there are hundreds of them.
Frequency of querying: hard to say, it's an experimental system.
Speed: not real time, but practical, say 4-5 seconds tops.
You didn't give us enough information for a well-thought-out answer. For example: what size are these graphs? With what frequency do you expect to query them? Do you need real-time responses to these queries? More information on what your application is for, and what its purpose is, would be helpful.
Anyway, to counter the usual responses that suppose SQL-based DBMSes are unable to handle graph structures effectively, I will give some references:
Graph Transformation in Relational Databases (.pdf), by G. Varro, K. Friedl, and D. Varro, presented at the International Workshop on Graph-Based Tools (GraBaTs) 2004; quoting its conclusion:
"5 Conclusion and Future Work
In the paper, we proposed a new graph transformation engine based on off-the-shelf relational databases. After sketching the main concepts of our approach, we carried out several test cases to evaluate our prototype implementation by comparing it to the transformation engines of the AGG [5] and PROGRES [18] tools.
The main conclusion that can be drawn from our experiments is that relational databases provide a promising candidate as an implementation framework for graph transformation engines. We call attention to the fact that our promising experimental results were obtained using a worst-case assessment method, i.e. by recalculating the views of the next rule to be applied from scratch, which is still highly inefficient, especially for model transformations with a large number of independent matches of the same rule. ..."
They used PostgreSQL as the DBMS, which is probably not particularly good at this kind of application. You can try LucidDB and see if it does better, as I suspect it will.
Incremental SQL Queries (more than one paper here; you should concentrate on "Maintaining Transitive Closure of Graphs in SQL"):
"... we showed that transitive closure, alternating paths, same generation, and other recursive queries, can be maintained in SQL if some auxiliary relations are allowed. In fact, they can all be maintained using at most auxiliary relations of arity 2. ..."
Incremental Maintenance of Shortest Distance and Transitive Closure in First Order Logic and SQL.
Edit: you give more details, so... I think the best way is to experiment a little with both a main-memory dedicated graph library and a DBMS-based solution, then carefully evaluate the pros and cons of each.
For example: a DBMS needs to be installed (unless you use an "embeddable" DBMS like SQLite); only you know if and where your application needs to be deployed and who your users are. On the other hand, a DBMS gives you immediate benefits, like persistence (I don't know what support graph libraries give for persisting their graphs), transaction management, and countless others. Are these relevant for your application? Again, only you know.
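To make the auxiliary-relation idea from the cited papers concrete, here is a sketch of insertion-only maintenance of a transitive-closure table in SQLite (table and column names are invented; deletions are the hard case those papers actually address):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE edge (src INTEGER, dst INTEGER);
    CREATE TABLE tc (src INTEGER, dst INTEGER, PRIMARY KEY (src, dst));
''')

def add_edge(conn, a, b):
    """Insert edge (a, b) and update the closure incrementally."""
    conn.execute('INSERT INTO edge VALUES (?, ?)', (a, b))
    # Every vertex that reaches a (or a itself) now reaches
    # every vertex reachable from b (or b itself).
    conn.execute('''
        INSERT OR IGNORE INTO tc (src, dst)
        SELECT p.src, q.dst
        FROM (SELECT src FROM tc WHERE dst = :a UNION SELECT :a) AS p,
             (SELECT dst FROM tc WHERE src = :b UNION SELECT :b) AS q
    ''', {'a': a, 'b': b})

add_edge(conn, 1, 2)
add_edge(conn, 2, 3)
print(conn.execute('SELECT * FROM tc ORDER BY src, dst').fetchall())
# [(1, 2), (1, 3), (2, 3)]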
The first option you mentioned seems best. If your graph won't have many edges (|E| = O(|V|)), then you might get better time and space complexity using a dictionary:
var graph = new Dictionary<Vertex, HashSet<Vertex>>();
An interesting graph library is QuickGraph. Never used it but it seems promising :)
I wrote and designed quite a few graph algorithms for various programming contests and in production code. And I noticed that every time I need one, I have to develop it from scratch, assembling concepts from graph theory (BFS, DFS, topological sorting, etc.).
Perhaps a lack of experience is the reason, but it seems to me that there's still no reasonable general-purpose query language for solving graph problems. Pick a couple of general-purpose graph libraries and solve your particular task in a programming (not query!) language. That will give you the best performance and space consumption, but it also requires an understanding of basic graph theory concepts and their limitations.
And the last one: do not use SQL for graphs.
