An array- or vector-like type with values stored on disk in Julia - julia

I am looking for an Array-like type with the following properties:
stores elements on disk
elements can have composite type
elements are read into memory, not the whole array
it is possible to write individual elements without writing the whole array
supports setindex!, getindex, push!, pop!, shift!, unshift! and maybe vcat
is reasonably efficient
So far I have found the following leads:
https://docs.julialang.org/en/latest/stdlib/SharedArrays/
http://juliadb.org
https://github.com/JuliaIO/JLD.jl
The first one seems promising, but it seems the type of the elements has to be isbits (meaning a simple number or certain structs, but not, e.g., an Array{Float64,1}). And it's not clear whether the whole array contents are loaded into memory.
If it does not exist yet, I will of course try to construct it myself.

NCDatasets.jl addresses part of the requirements:
stores elements on disk: yes
elements can have composite type: no (although some support for composite type is in NetCDF4, but not yet in NCDatasets.jl). Currently you can have only Arrays of basic types and Arrays of Vectors (of basic types).
elements are read into memory, not the whole array: yes
it is possible to write individual elements without writing the whole array: yes
supports setindex!, getindex, push!, pop!, shift!, unshift! and maybe vcat: only setindex! and getindex
is reasonably efficient: the efficiency is reasonable for me :-)
The project of making it yourself sounds very interesting. I think it would certainly fill a gap in the current ecosystem.
Some storage technologies that might be good to have a look at are:
HDF5 (for storage, cross-platform and cross-language)
JLD2 (successor of JLD) https://github.com/simonster/JLD2.jl
rasdaman (a "database" for arrays) http://www.rasdaman.org/
possibly also BSON http://bsonspec.org/
Maybe you can also reach out to the JuliaIO group.
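If you do roll your own, the core mechanics are small. Below is a minimal sketch in Python (rather than Julia, purely for illustration) of a disk-backed array of fixed-size float64 elements; the class name and file format are made up for the example. Each read or write touches only the requested element via seek, append/pop play the role of push!/pop!, and composite element types would need a fixed-size serialization layered on top:

```python
import os
import struct
import tempfile

class DiskArray:
    """Array-like type whose float64 elements live in a binary file.
    Reads and writes touch only the requested element, never the whole file."""
    ELEM = struct.Struct("<d")  # one little-endian float64 per element

    def __init__(self, path):
        # open existing data read/write, or create the file if missing
        self.f = open(path, "r+b" if os.path.exists(path) else "w+b")

    def __len__(self):
        return self.f.seek(0, os.SEEK_END) // self.ELEM.size

    def __getitem__(self, i):          # getindex
        self.f.seek(i * self.ELEM.size)
        return self.ELEM.unpack(self.f.read(self.ELEM.size))[0]

    def __setitem__(self, i, v):       # setindex!
        self.f.seek(i * self.ELEM.size)
        self.f.write(self.ELEM.pack(v))

    def append(self, v):               # push!
        self.f.seek(0, os.SEEK_END)
        self.f.write(self.ELEM.pack(v))

    def pop(self):                     # pop!
        v = self[len(self) - 1]
        self.f.flush()
        self.f.truncate((len(self) - 1) * self.ELEM.size)
        return v

# quick demo in a throwaway directory
a = DiskArray(os.path.join(tempfile.mkdtemp(), "demo.bin"))
a.append(1.5); a.append(2.5); a.append(3.5)
a[1] = 9.0
popped = a.pop()
```

Note that shift!/unshift! on a flat file like this would be O(n), since every element after the insertion point has to move; that is one reason a chunked or indexed layout (as HDF5 uses) is worth considering.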

Related

Indexing Dynamic Arrays in CUDA [duplicate]

There are a lot of questions online about allocating, copying, indexing, etc., 2d and 3d arrays on CUDA. I'm getting a lot of conflicting answers, so I'm attempting to compile past questions to see if I can ask the right ones.
First link: https://devtalk.nvidia.com/default/topic/392370/how-to-cudamalloc-two-dimensional-array-/
Problem: Allocating a 2d array of pointers
User solution: use cudaMallocPitch
"Correct" inefficient solution: Use malloc and memcpy in a for loop for each row (Absurd overhead)
"More correct" solution: Squash it into a 1d array; the "professional opinion" in one comment is that no one with an eye on performance uses 2d pointer structures on the GPU
Second link: https://devtalk.nvidia.com/default/topic/413905/passing-a-multidimensional-array-to-kernel-how-to-allocate-space-in-host-and-pass-to-device-/
Problem: Allocating space on host and passing it to device
Sub link: https://devtalk.nvidia.com/default/topic/398305/cuda-programming-and-performance/dynamically-allocate-array-of-structs/
Sub link solution: Coding pointer based structures on the GPU is a bad experience and highly inefficient, squash it into a 1d array.
Third link: Allocate 2D Array on Device Memory in CUDA
Problem: Allocating and transferring 2d arrays
User solution: use cudaMallocPitch
Other solution: flatten it
Fourth link: How to use 2D Arrays in CUDA?
Problem: Allocate and traverse 2d arrays
Submitted solution: Does not show allocation
Other solution: squash it
There are a lot of other sources mostly saying the same thing but in multiple instances I see warnings about pointer structures on the GPU.
Many people claim the proper way to allocate an array of pointers is with a call to malloc and memcpy for each row, yet the functions cudaMallocPitch and cudaMemcpy2D exist. Are these functions somehow less efficient? Why wouldn't this be the default answer?
The other 'correct' answer for 2d arrays is to squash them into one array. Should I just get used to this as a fact of life? I'm very persnickety about my code and it feels inelegant to me.
Another solution I was considering was to make a matrix class that uses a 1d pointer array, but I can't find a way to implement the double bracket operator.
Also according to this link: Copy an object to device?
and the sub link answer: cudaMemcpy segmentation fault
This gets a little iffy.
The classes I want to use CUDA with all have 2/3d arrays and wouldn't there be a lot of overhead in converting those to 1d arrays for CUDA?
I know I've asked a lot but in summary should I get used to squashed arrays as a fact of life or can I use the 2d allocate and copy functions without getting bad overhead like in the solution where alloc and cpy are called in a for loop?
Since your question compiles a list of other questions, I'll answer by compiling a list of other answers.
cudaMallocPitch/cudaMemcpy2D:
First, the cuda runtime API functions like cudaMallocPitch and cudaMemcpy2D do not actually involve either double-pointer allocations or 2D (doubly-subscripted) arrays. This is easy to confirm simply by looking at the documentation and noting the types of parameters in the function prototypes. The src and dst parameters are single-pointer parameters. They could not be doubly subscripted or doubly dereferenced. For additional example usage, here is one of many questions on this. Here is a fully worked example usage. Another example covering various concepts associated with cudaMallocPitch/cudaMemcpy2D usage is here. Instead, the correct way to think about these functions is that they work with pitched allocations. Also, you cannot use cudaMemcpy2D to transfer data when the underlying allocation has been created using a set of malloc (or new, or similar) operations in a loop. That sort of host data allocation construction is particularly ill-suited to working with the data on the device.
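To make "pitched allocation" concrete: cudaMallocPitch hands back a single base pointer plus a pitch (the padded row size in bytes), and element (row, col) is addressed by byte arithmetic off that one pointer. A language-agnostic sketch of that addressing in Python (the 128-byte alignment is illustrative; the real pitch is chosen by the driver):

```python
import struct

ALIGN = 128                 # illustrative row alignment; the driver picks the real pitch
ELEM = struct.Struct("<f")  # float32 elements

def make_pitched(width, height):
    """Mimic cudaMallocPitch: one flat buffer, each row padded to ALIGN bytes."""
    row_bytes = width * ELEM.size
    pitch = -(-row_bytes // ALIGN) * ALIGN   # round the row size up to the alignment
    return bytearray(pitch * height), pitch

def store(buf, pitch, row, col, v):
    # a single base "pointer" plus byte arithmetic; no array of row pointers anywhere
    ELEM.pack_into(buf, row * pitch + col * ELEM.size, v)

def load(buf, pitch, row, col):
    return ELEM.unpack_from(buf, row * pitch + col * ELEM.size)[0]

buf, pitch = make_pitched(5, 3)   # 5 floats per row, 3 rows
store(buf, pitch, 2, 3, 2.5)
```

Notice there is exactly one allocation and one level of indirection; the pitch only changes the stride between rows.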
general, dynamically allocated 2D case:
If you wish to learn how to use a dynamically allocated 2D array in a CUDA kernel (meaning you can use doubly-subscripted access, e.g. data[x][y]), then the cuda tag info page contains the "canonical" question for this, it is here. The answer given by talonmies there includes the proper mechanics, as well as appropriate caveats:
there is additional, non-trivial complexity
the access will generally be less efficient than 1D access, because data access requires dereferencing 2 pointers, instead of 1.
(note that allocating an array of objects, where the object(s) has an embedded pointer to a dynamic allocation, is essentially the same as the 2D array concept, and the example you linked in your question is a reasonable demonstration for that)
Also, here is a thrust method for building a general dynamically allocated 2D array.
flattening:
If you think you must use the general 2D method, then go ahead, it's not impossible (although sometimes people struggle with the process!) However, due to the added complexity and reduced efficiency, the canonical "advice" here is to "flatten" your storage method, and use "simulated" 2D access. Here is one of many examples of questions/answers discussing "flattening".
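The "flattening" advice is just index arithmetic and is independent of CUDA: keep one contiguous allocation and compute idx = row * width + col. A small Python sketch of such "simulated 2D" access over a flat buffer:

```python
class Flat2D:
    """Simulated 2D access over a single contiguous 1D allocation."""
    def __init__(self, rows, cols, fill=0.0):
        self.rows, self.cols = rows, cols
        self.data = [fill] * (rows * cols)   # one allocation, one copy to device

    def _idx(self, r, c):
        return r * self.cols + c             # row-major "simulated 2D" index

    def __getitem__(self, rc):
        r, c = rc
        return self.data[self._idx(r, c)]

    def __setitem__(self, rc, v):
        r, c = rc
        self.data[self._idx(r, c)] = v

m = Flat2D(3, 4)
m[2, 3] = 7.0
```

A kernel working on the flat buffer does the same arithmetic inline, with no second pointer dereference per access.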
general, dynamically allocated 3D case:
As we extend this to 3 (or higher!) dimensions, the general case becomes overly complex to handle, IMO. The additional complexity should strongly motivate us to seek alternatives. The triply-subscripted general case involves 3 pointer accesses before the data is actually retrieved, so it is even less efficient. Here is a fully worked example (2nd code example).
special case: array width known at compile time:
Note that it should be considered a special case when the array dimension(s) (the width, in the case of a 2D array, or 2 of the 3 dimensions for a 3D array) is known at compile time. In this case, with an appropriate auxiliary type definition, we can "instruct" the compiler how the indexing should be computed, and we can use doubly-subscripted access with considerably less complexity than the general case, and with no loss of efficiency due to pointer-chasing. Only one pointer need be dereferenced to retrieve the data (regardless of array dimensionality, if n-1 dimensions are known at compile time for an n-dimensional array). The first code example in the already-mentioned answer here gives a fully worked example of that in the 3D case, and the answer here gives a 2D example of this special case.
doubly-subscripted host code, singly-subscripted device code:
Finally another methodology option allows us to easily mix 2D (doubly-subscripted) access in host code while using only 1D (singly-subscripted, perhaps with "simulated 2D" access) in device code. A worked example of that is here. By organizing the underlying allocation as a contiguous allocation, then building the pointer "tree", we can enable doubly-subscripted access on the host, and still easily pass the flat allocation to the device. Although the example does not show it, it would be possible to extend this method to create a doubly-subscripted access system on the device based off a flat allocation and a manually-created pointer "tree", however this would have approximately the same issues as the 2D general dynamically allocated method given above: it would involve double-pointer (double-dereference) access, so less efficient, and there is some complexity associated with building the pointer "tree", for use in device code (e.g. it would necessitate an additional cudaMemcpy operation, probably).
From the above methods, you'll need to choose one that fits your appetite and needs. There is not one single recommendation that fits every possible case.

Is Ada.Containers.Functional_Maps usable in Ada2012?

The information about Ada.Containers.Functional_Maps in the GNAT documentation is quite—let's say—abstruse.
First, it says this:
…these containers can still be used safely.
In the second paragraph, it seems to me that you cannot free the memory allocated for those objects once the program exits the context where they are created. My understanding is that you could run into a memory leak. Am I right?
They are also memory consuming, as the allocated memory is not reclaimed when the container is no longer referenced.
Read the next two sentences in the doc:
Thus, they should in general be used in ghost code and annotations, so that they can be removed from the final executable. The specification of this unit is compatible with SPARK 2014.
Because the specification of Ada.Containers.Functional_Maps is compatible with SPARK, it may help to examine it in the context of related SPARK Libraries with regard to proof, testing and annotation. In particular,
The functional maps, sets and vectors are unbounded collections of indefinite elements that are neither controlled nor limited. While they are inefficient with regard to memory, they are simple, immutable and useful "to model user defined data structures."
The functional containers can be used in Ghost Code, "parts of the code that are only meant for specification and verification", as suggested here. This related example illustrates a ghost function.
it seems to me that you cannot free the memory allocated for those
objects once the program exits the context where they are created. I
am understanding that you could run into a memory leak. Am I right?
There are some things that you can do in Ada to manage memory, I would be surprised if (for example) the usage of an instance inside a declare-block were not cleaned-up on the block's exit. — This is, in fact, how some surprisingly robust applications can get away without "dynamically-allocated" memory/values (it's actually heap-allocated, but that's pedantic).
This sort of granular control is really nice, as you can constrain things/usages to specific points. Combined with Ada's good facilities for presenting interfaces, this means that changing some structure to another can be less-painful than it otherwise might be.
As an example of the above, I had a nested key-value map (a JSON object) that was being used to pass parameters around; the method for doing this changed, and so I had a string of values (with common-rooted keys) coming in and a procedure that took JSON as input. Obviously what was needed was a "keys-and-values-to-JSON" function. Inside that function I used the multiway-tree container, where the leaves represented values and the internal nodes the keys; the second step was to traverse the tree and create the JSON object as needed. Simple recursion and data-structure selection addressed the problem of adapting the textual key-value pairs of these nested parameters to JSON. And because the usage of multiway trees was exclusive to this function, I can be confident that the memory used by the intermediate tree object is released on the function's exit.
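A hypothetical Python reconstruction of that keys-and-values-to-JSON step (the dot-separated key format is assumed for illustration): key segments become the internal nodes of the tree, and values become the leaves, exactly as in the traversal described above.

```python
import json

def keys_to_json(pairs):
    """Fold common-rooted, dot-separated key/value pairs into nested JSON:
    key segments become internal nodes, values become the leaves."""
    root = {}
    for key, value in pairs:
        node = root
        *parents, leaf = key.split(".")
        for part in parents:
            # descend into (or create) the internal node for this key segment
            node = node.setdefault(part, {})
        node[leaf] = value
    return json.dumps(root)

doc = keys_to_json([("a.b", 1), ("a.c", 2), ("d", 3)])
```

In Python the intermediate dict tree is garbage-collected after the function returns; the Ada version gets the analogous guarantee from the container's scoped lifetime.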

Dictionary vs. hashtable

Can someone explain the difference between dictionaries and hashtables? In Java, I've read that dictionaries are a superset of hashtables, but I always thought it was the other way around. Other languages seem to treat the two as the same. When should one be used over the other, and what's the difference?
The Oxford Dictionary of Computing defines a dictionary as...
Any data structure representing a set of elements that can support the insertion and deletion of elements as well as a test for membership.
As such, dictionaries are an abstract idea that can be implemented reasonably efficiently as, e.g., binary trees, hash tables, tries, or even direct array indexing if the keys are numeric and not too sparse. That said, Python uses a closed-hashing hash table for its dict implementation, and C# seems to use some kind of hash table too (hence the need for a separate SortedDictionary type).
A hash table is a much more specific and concrete data structure: there are several implementation options (closed vs. open hashing being perhaps the most fundamental), but they're all characterised by O(1) amortised insertion, lookup and deletion, and there's no excuse for begin-to-end iteration worse than O(n + #buckets), while implementations may achieve better (e.g. GCC's C++ library has O(n) container iteration). The implementations necessarily depend on a hash function leading to an indexed probe in an array.
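As a concrete illustration of closed hashing (open addressing), here is a toy Python table using linear probing. It is deliberately minimal: it never resizes or deletes, whereas a real implementation grows the slot array to keep operations amortised O(1) and uses tombstones to support deletion.

```python
class ProbingTable:
    """Toy closed-hashing (open addressing) table with linear probing.
    No resizing or deletion; capacity must exceed the number of keys."""
    _EMPTY = object()

    def __init__(self, capacity=64):
        self.slots = [self._EMPTY] * capacity

    def _probe(self, key):
        # hash to a starting slot, then walk forward until we find the key
        # or an empty slot (the defining move of open addressing)
        i = hash(key) % len(self.slots)
        while self.slots[i] is not self._EMPTY and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def __setitem__(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        slot = self.slots[self._probe(key)]
        if slot is self._EMPTY:
            raise KeyError(key)
        return slot[1]

    def __contains__(self, key):
        return self.slots[self._probe(key)] is not self._EMPTY

t = ProbingTable(8)
t["apple"] = 1
t["pear"] = 2
t["apple"] = 3   # overwrite follows the same probe sequence
```

The same dictionary interface could just as well be backed by a balanced tree or a trie; only the performance profile would change, which is exactly the abstract-vs-concrete distinction above.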
The way I see it, a hashtable is one way of implementing a dictionary, specifying that the key is hashfunction(x) and the value is any Object. The Java Dictionary can use any key as long as .equals(y) has been implemented for that object.
The answer will also change depending on the language (C#? Java? JS?) you're using. In JS the 'dictionary' is implemented as a hashtable and there is no difference. In another language (I believe it's C#), the Dictionary must be strongly typed (fixed key type and fixed value type), while the Hashtable's value can be any type, and the two are not extended from one another.

Container Semantics and Functional Style in D

Do containers in D have value or reference semantics by default? If they have reference semantics, doesn't that fundamentally hinder the use of a functional programming style in D (compared to C++11's move semantics), such as in the following (academic) example:
auto Y = reverse(sort(X));
where X is a container.
Whether containers have value semantics or reference semantics depends entirely on the container. The only built-in containers are dynamic arrays, static arrays, and associative arrays. Static arrays have strict value semantics, because they sit on the stack. Associative arrays have strict reference semantics. And dynamic arrays mostly have reference semantics: their elements don't get copied, but the slices themselves do, so they end up with semantics which are a bit particular. I'd advise reading this article on D arrays for more details.
As for containers which are official but not built-in, the containers in std.container all have reference semantics, and in general, that's how containers should be, because it's highly inefficient to do otherwise. But since anyone can implement their own containers, anyone can create containers which are value types if they want to.
However, like C++, D does not take the route of having algorithms operate on containers, so as far as algorithms go, whether containers have reference or value semantics is pretty much irrelevant. In C++, algorithms operate on iterators, so if you wanted to sort a container, you'd do something like sort(container.begin(), container.end()). In D, they operate on ranges, so you'd do sort(container[]). In neither language would you actually sort a container directly. Whether containers themselves have value or references semantics is therefore irrelevant to your typical algorithm.
However, D does better at functional programming with algorithms than C++ does, because ranges are better suited for it. Iterators have to be passed around in pairs, which doesn't work very well for chaining functions. Ranges, on the other hand, chain quite well, and Phobos takes advantage of this. It's one of its primary design principles that most of its functions operate on ranges to allow you to do in code what you typically end up doing on the unix command line with pipes, where you have a lot of generic tools/functions which generate output which you can pipe/pass to other tools/functions to operate on, allowing you to chain independent operations to do something specific to your needs rather than relying on someone to have written a program/function which did exactly what you want directly. Walter Bright discussed it recently in this article.
So, in D, it's easy to do something like:
auto stuff = sort(array(take(map!"a % 1000"(rndGen()), 100)));
or if you prefer UFCS (Universal Function Call Syntax):
auto stuff = rndGen().map!"a % 1000"().take(100).array().sort();
In either case, it generates a sorted list of 100 random numbers between 0 and 1000, and the code is in a functional style, which C++ would have a much harder time doing, and libraries which operate on containers rather than iterators or ranges would have an even harder time doing.
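For comparison, the same pipeline reads naturally in Python's lazy-iterator style too. This is a sketch: a seeded 32-bit generator stands in for rndGen, and islice plays the role of take.

```python
import random
from itertools import islice

def rnd_gen(seed=42):
    """Infinite stream of 32-bit random numbers, standing in for rndGen()."""
    rng = random.Random(seed)
    while True:
        yield rng.getrandbits(32)

# map -> take -> array -> sort: each stage chains lazily until the final sort
stuff = sorted(islice((x % 1000 for x in rnd_gen()), 100))
```

The point carries over: when operations consume and produce ranges (or iterators), they compose like shell pipes, without any container-specific glue.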
In-Built Containers
The only in-built containers in D are slices (also called arrays/dynamic arrays) and static arrays. The latter have value semantics (unlike in C and C++) - the entire array is (shallow) copied when passed around.
As for slices, they are value types with indirection, so you could say they have both value and reference semantics.
Imagine T[] as a struct like this:
struct Slice(T)
{
size_t length;
T* ptr;
}
Where ptr is a pointer to the first element of the slice, and length is the number of elements within the bounds of the slice. You can access the .ptr and .length fields of a slice, but while the data structure is identical to the above, it's actually a compiler built-in and thus not defined anywhere (the name Slice is just for demonstrative purposes).
Knowing this, you can see that copying a slice (assigning to another variable, passing to a function, etc.) just copies a length (no indirection: value semantics) and a pointer (has indirection: reference semantics).
In other words, a slice is a view into an array (located anywhere in memory), and there can be multiple views into the same array.
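Python's memoryview offers a loose analogy (not an exact match for a D slice, but also a length-plus-pointer view over a shared buffer): copying the view copies only the view, while writes through any view mutate the one underlying array.

```python
buf = bytearray(b"hello world")    # the underlying array
view = memoryview(buf)[0:5]        # a "slice": length + pointer into buf

view[0] = ord("H")                 # writing through the view mutates buf
alias = view                       # copying the view copies only (ptr, length)
alias[1] = ord("E")                # ...so this write is visible through buf too
```

This is the "value type with indirection" behaviour described above: the (ptr, length) pair is copied by value, the pointed-to data is not.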
Algorithms
sort and reverse from std.algorithm work in-place to cater to as many users as possible. If the user wanted to put the result in a GC-allocated copy of the slice and leave the original unchanged, that can easily be done (X.dup). If the user wanted to put the result in a custom-allocated buffer, that can be done too. Finally, if the user wanted to sort in-place, this is an option. At any rate, any extra overhead is made explicit.
However, it's important to note that most algorithms in the standard library don't require mutation, instead returning lazily-evaluated range results, which is characteristic of functional programming.
User-Defined Containers
When it comes to user-defined containers, they can have whatever semantics they want - any configuration is possible in D.
The containers in std.container are reference types with .dup methods for making copies, thus slightly emulating slices.

What is the disadvantage of list as a universal data type representation?

Lisp programmers tend to use lists to represent all other data types.
However, I have heard that lists are not a good universal representation for data types.
What are the disadvantages of lists used in this manner, in contrast to using records?
You mention "record". By this I take it that you're referring to fixed-element structs/objects/compound data. For instance, in HtDP syntax:
;; a packet is (make-packet destination source text) where destination is a number,
;; source is a number, and text is a string.
... and you're asking about the pros and cons of representing a packet as a list of length three, rather than as a piece of compound data (or "record").
In instances where compound data is appropriate--the values have specific roles and names, and there are a fixed number of them--compound data is generally preferable; they help you to catch errors in your programs, which is the sine qua non of programming.
The disadvantage is that it isn't universal. Sometimes this is performance related: you want constant time lookups (array, hash table). Sometimes this is organization related: you want to name your data locations (Hash table, record ... although you could use name,value pairs in the list). It requires a little diligence on the part of the author to make the code self-documenting (more diligence than the record). Sometimes you want the type system to catch mistakes made by putting things in the wrong spot (record, typed tuples).
However, most issues can be addressed with OptimizeLater. The list is a versatile little data structure.
You're talking about what Peter Seibel addresses in Chapter 11 of Practical Common Lisp:
[Starting] discussion of Lisp's collections with lists . . . often leads readers to the mistaken conclusion that lists are Lisp's only collection type. To make matters worse, because Lisp's lists are such a flexible data structure, it is possible to use them for many of the things arrays and hash tables are used for in other languages. But it's a mistake to focus too much on lists; while they're a crucial data structure for representing Lisp code as Lisp data, in many situations other data structures are more appropriate.
Once you're familiar with all the data types Common Lisp offers, you'll also see that lists can be useful for prototyping data structures that will later be replaced with something more efficient once it becomes clear how exactly the data is to be used.
Some reasons I see are:
A large hashtable, for example, has faster access than the equivalent alist
A vector of a single datatype is more compact and, again, faster to access
Vectors are more efficiently and easily accessed by index
Objects and structures allow you to access data by name, not position
It boils down to using the right datatype for the task at hand. When it's not obvious, you have two options: guess and fix it later, or figure it out now; either of those is sometimes the right approach.
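The list-versus-record contrast can be sketched in Python with a hypothetical Packet, mirroring the HtDP example above: the list variant relies on remembered positions, while the record names its fields and checks arity at construction.

```python
from collections import namedtuple

# "List as record": positions carry the meaning, and nothing checks them.
packet_as_list = [42, 7, "hello"]            # [destination, source, text]
dest_by_position = packet_as_list[0]         # the reader must remember the order

# Record: fields are accessed by name, and construction checks the arity.
Packet = namedtuple("Packet", ["destination", "source", "text"])
packet = Packet(destination=42, source=7, text="hello")
```

Swapping two fields in the list silently produces a wrong packet; doing the same with named arguments to Packet is either harmless or an immediate error, which is the error-catching advantage of records.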
