Has anybody used the WB B-tree library (disk-based)?

I stumbled across the WB on-disk B-tree library:
http://people.csail.mit.edu/jaffer/WB
It seems like it could be useful for my purposes (swapping data to disk during very large statistical calculations that do not fit in memory), but I was wondering how stable it is. Reading the manual, it seems worryingly 'researchy': there are sections labelled [NOT IMPLEMENTED], etc. But maybe the manual is just out of date.
So, is this library usable? Or am I better off looking at Tokyo Cabinet, MemcacheDB, etc.?
By the way, I am working in Java.

I have looked at the WB B-Tree Database, but SQLite might be a better fit. It handles extremely large datasets in a single file, and is a lightweight, fully-functional database.
http://www.sqlite.org/
Info on using SQLite with Java is here:
Java and SQLite
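For example, with the Xerial sqlite-jdbc driver on the classpath (an assumption on my part; any SQLite JDBC driver works much the same), a minimal sketch of spilling values to a single-file database looks like this:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SqliteSketch {
        public static void main(String[] args) throws Exception {
            // Opens (or creates) a single-file database on disk.
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:bigdata.db")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE IF NOT EXISTS cell (i INTEGER, j INTEGER, value REAL)");
                }
                // Spill an intermediate value to disk...
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO cell (i, j, value) VALUES (?, ?, ?)")) {
                    ps.setInt(1, 0);
                    ps.setInt(2, 1);
                    ps.setDouble(3, 3.14);
                    ps.executeUpdate();
                }
                // ...and read it back later.
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT value FROM cell WHERE i = ? AND j = ?")) {
                    ps.setInt(1, 0);
                    ps.setInt(2, 1);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getDouble("value"));
                        }
                    }
                }
            }
        }
    }

If you end up writing millions of rows, wrap the inserts in a single transaction and batch the PreparedStatement; committing once per row is very slow in SQLite.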

Yup, I gave it the good old college try in Java. The jar file was easy to find, as was the documentation. I think it was written in Scheme or something of the like and translated to be usable in Java.
The documentation speaks of functions you ought to use, but not which objects they reside on. Sadly there is no Javadoc to help me out... There are no working examples, and after two hours of trying I finally gave up. I found it not very useful at all.
I hope others have better luck using it.

Related

How can I learn MLIR?

Hi, I just came from the MLIR docs and got quite confused.
I tried to work through the Toy project, but cannot understand the mechanism and concept of a dialect.
The tutorial just offers some example code; about how the pieces interact with each other and how I should use them, it mentions nothing.
As a beginner, I'm really lost and do not know what to do.
Could someone please help me with how to compile a simple program that translates source code to MLIR, using the framework it provides?
The easiest way to learn is by doing some projects. For MLIR, I think you can start by first understanding and working through the Toy tutorial.
Then see if you can extend it by adding a new operation to the Toy language. If you find this interesting, try a dialect conversion exercise (say, Toy to SCF).

'make'-like dependency-tracking library?

There are many things to like about Makefiles, and many pains in the butt.
In the course of doing various projects (I'm a research scientist, "data scientist", or whatever) I often find myself starting out with a few data objects on disk, generating various artifacts from those, generating artifacts from those artifacts, and so on.
It would be nice if I could just say "this object depends on these other objects", and "this object is created in the following manner from these objects", and then ask a Make-like framework to handle the details of actually building them, figuring out which objects need to be updated, farming out work to multiple processors (like Make's -j option), and so on. Makefiles can do all this - but the huge problem is that all the actions have to be written as shell commands. This is not convenient if I'm working in R or Perl or another similar environment. Furthermore, a strong assumption in Make is that all targets are files - there are some exceptions and workarounds, but if my targets are e.g. rows in a database, that would be pretty painful.
To be clear, I'm not after a software-build system. I'm interested in something that (more generally?) deals with dependency webs of artifacts.
Anyone know of a framework for these kinds of dependency webs? Seems like it could be a nice tool for doing data science, & visually showing how results were generated, etc.
One extremely interesting example I saw recently was IncPy, but it looks like it hasn't been touched in quite a while, and it's very closely coupled with Python. It's probably also much more ambitious than I'm hoping for, which is presumably why it has to be so tightly coupled.
Sorry for the vague question; let me know if some clarification would be helpful.
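To make this vague question a bit more concrete, here is the kind of API I'm imagining, as a hypothetical sketch in Java (none of these names come from an existing library): targets are arbitrary names rather than files, recipes are plain code rather than shell commands, and the framework runs recipes in dependency order.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class DependencyWeb {
        // A target's declared dependencies plus the code that builds it.
        private record Target(List<String> deps, Runnable recipe) {}

        private final Map<String, Target> targets = new HashMap<>();

        public void rule(String name, List<String> deps, Runnable recipe) {
            targets.put(name, new Target(deps, recipe));
        }

        // Build a target after recursively building everything it depends on.
        public void build(String name, Set<String> done) {
            if (done.contains(name)) return;   // already built in this run
            Target t = targets.get(name);
            if (t == null) return;             // a leaf input; nothing to do
            for (String dep : t.deps()) build(dep, done);
            t.recipe().run();
            done.add(name);
        }

        public static void main(String[] args) {
            DependencyWeb web = new DependencyWeb();
            web.rule("cleaned", List.of("raw"), () -> System.out.println("clean raw data"));
            web.rule("model", List.of("cleaned"), () -> System.out.println("fit model"));
            web.rule("report", List.of("model", "cleaned"), () -> System.out.println("render report"));
            web.build("report", new HashSet<>()); // prints: clean raw data, fit model, render report
        }
    }

A real framework would add everything this sketch omits: staleness checking so up-to-date targets are skipped, cycle detection, and parallel execution of independent recipes (like Make's -j).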
A new system called "Drake" was announced today that targets this exact situation: http://blog.factual.com/introducing-drake-a-kind-of-make-for-data. Looks very promising, though I haven't actually tried it yet.
This question is several years old, but I thought adding a link to remake here would be relevant.
From the GitHub repository:
The idea here is to re-imagine a set of ideas from make but built for R. Rather than having a series of calls to different instances of R (as happens if you run make on R scripts), the idea is to define pieces of a pipeline within an R session. Rather than being language agnostic (like make must be), remake is unapologetically R focussed.
It is not on CRAN yet, and I haven't tried it, but it looks very interesting.
I would give Bazel a try for this. It is primarily a software build system, but with its genrule targets it can perform fairly arbitrary file generation, too.
Bazel is very extensible via its Python-like Starlark language, which should be far easier to use for complicated tasks than make. You can start by writing simple genrule steps by hand, then refactor common patterns into macros, and if things become more complicated, even write your own rules. So you should be able to express your individual transformations at a high level that models how you think about them, then turn that representation into lower-level constructs using something that feels like a proper programming language.
Where make depends on timestamps, Bazel checks fingerprints. So if any one step produces the same output even though one of its inputs changed, subsequent steps won't need to be recomputed. If some of your data processing steps project or filter data, there is probably a high chance of this happening.
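The underlying idea is just a content hash: a step can be skipped when each of its inputs hashes to the same digest that was recorded the last time the step ran, no matter what the timestamps say. A rough Java illustration of the concept (this is not Bazel's code, just the idea):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    public class Fingerprint {
        // Content-based fingerprint: identical bytes give an identical
        // digest, regardless of the file's modification time.
        static String sha256(Path file) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(Files.readAllBytes(file)));
        }

        // A step is up to date if an input's digest matches the digest
        // recorded when the step last ran.
        static boolean upToDate(Path input, String recordedDigest) throws Exception {
            return sha256(input).equals(recordedDigest);
        }
    }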
I see your question is tagged for R, even though it doesn't mention it much. Under the hood, R computations in Bazel would still boil down to R CMD invocations on the shell, but you could have complicated multi-line commands assembled in complicated ways to read your inputs, process them, and store the outputs. If the cost of initializing the R binary is a concern, Rserve might help, although using it would make the setup depend on a locally accessible Rserve instance, I believe. Even then, I see nothing that would avoid the cost of storing the data to a file and loading it back; if you want something that avoids that cost by keeping things in memory between steps, you'd be looking at a very R-specific tool, not a generic one like you requested.
In terms of “visually showing how results were generated”, bazel query --output graph can be used to generate a Graphviz dot file of the dependency graph.
Disclaimer: I currently work at Google, which internally uses a variant of Bazel called Blaze; in fact, Bazel is the open-source release of Blaze. I'm very familiar with using Blaze, but not with setting up Bazel from scratch.
Red-R has a concept of data flow programming. I have not tried it yet.

Programs for creating, designing, and administering SQLite (GUI)

Which programs do you know for this purpose? Quick googling reveals two programs:
sqlite-manager (Firefox extension)
Not all features are reachable through the GUI, but it is really easy to use and open source.
SQLite Administrator (screenshots) looks pretty, but it is for Windows only.
Please tell me your opinion of each program, not just a link. Thanks.
Well, I'm using Navicat Premium. It is not free, but it is a very nice tool when working with multiple database systems, including SQLite. It has many nice features, such as working with multiple databases from one window, and importing/exporting/synchronizing data and schemas across different databases.
There is also Navicat for SQLite only, which costs less, I think.
I found this table; maybe this information will help someone.
And this question is a repeat of that one. Just in time, heh.
You can try SQLiteSpy. I found it very useful. Its GUI makes it very easy to explore, analyze, and manipulate SQLite3 databases.

Which free merge utility is used by Drupalians?

I've tried Araxis Merge and it's good to use. However, it is too costly.
I need only file and folder diff. I also need merge for two files.
This Wikipedia page lists all of the free tools, but it is really difficult to conclude which tool will be best.
I'm curious which is the most recommended free merge tool for Drupalians!
I'm not sure Drupalians have specific needs merging-wise compared to other web makers :D
Try KDiff3 (http://kdiff3.sourceforge.net/), which is dead simple.
Sorry, I talked about opendiff, which I use on my Mac, but it doesn't seem to be available for Windows. If you are on a Mac, though, it is part of the original install.
Command-line diff (together with colordiff), vimdiff... Emacs has quite a good implementation as well.

The Clean programming language in the real world?

Are there any real world applications written in the Clean programming language? Either open source or proprietary.
This is not a direct answer, but when I last checked (and I find the language very interesting) I didn't find anything ready for real-world use.
The idealist in me always wants to try out new languages; very hot on my list right now (apart from the aforementioned very cool Clean language) are, in random order, Io, Fan, and Scala...
But in the meantime I get my pragmatism out and check the TIOBE index. I know you can argue about it, but still: it tells me what I will be able to use a year from now and what I possibly won't be able to use...
No pun intended!
I am using Clean together with the iTasks library to build websites around workflows quite easily.
But I guess another problem with Clean is the lack of documentation and examples: "the Clean book" is from quite a few years back, and a lot of new features don't get documented except in the papers they publish.
The http://clean.cs.ru.nl/Projects page doesn't look promising :) It looks like just another research project with no real-world use to date.
As one of my professors at college was involved in the creation of Clean, it was no shock that he'd created a real-world application: the rostering program of our university was created entirely in Clean.
The Clean IDE and the Clean compiler are written in Clean. (http://wiki.clean.cs.ru.nl/Download_Clean)
Cloogle, a search engine for Clean libraries, syntax, etc. (like Hoogle for Haskell), is written in Clean. Its source is on Radboud University's GitLab instance (web frontend; engine).
