How to create a collection in Julia? - collections

This seems like a really basic question, but can't find the answer. How do I create a collection in Julia? For example, I want to open a text file and parse each line to create an (iterable or otherwise) collection. Obviously I don't know how many elements there are in advance.
I can iterate through the lines like this
I = each_line(open(fileName,"r"))
state = start(I)
while !done(I, state)
(i, state) = next(I, state)
println(i)
end
But I don't know how to put each i into an array or other collection. I tried
map( i -> println(i), each_line(open(fileName,"r") ) )
But got the error
no method map(Function,EachLine)

You could do this:
lines = String[]
for line in each_line(open(fileName))
push!(lines, line)
end
And then lines contains the list of lines. You need the String in the first line to make the array extensible.

Standard collections and supported operations are mainly covered in the standard library documentation.
Specifically, the Deques section covers all of the operations supported by the 1d Array type (vector), including push! and pop! as well as insertion, resizing, etc.
Omar's answer is correct, and I will just add a small qualification: String[] creates a 1d array of Strings. The same constructor syntax may be used for example to create Int[], Float[], or even Any[] vectors. The latter type may hold objects of any type.

Depending on your Julia version, you may also be able to write collect(eachline(open("LICENSE.md"))) or [eachline(open("LICENSE.md"))...]. I think these won't work in 0.1.x versions but will working in newer 0.2 development versions (which are recommended at this point – 0.2 is on its way soon).

Related

What does the "Base" keyword mean in Julia?

I saw this example in the Julia language documentation. It uses something called Base. What is this Base?
immutable Squares
count::Int
end
Base.start(::Squares) = 1
Base.next(S::Squares, state) = (state*state, state+1)
Base.done(S::Squares, s) = s > S.count;
Base.eltype(::Type{Squares}) = Int # Note that this is defined for the type
Base.length(S::Squares) = S.count;
Base is a module which defines many of the functions, types and macros used in the Julia language. You can view the files for everything it contains here or call whos(Base) to print a list.
In fact, these functions and types (which include things like sum and Int) are so fundamental to the language that they are included in Julia's top-level scope by default.
This means that we can just use sum instead of Base.sum every time we want to use that particular function. Both names refer to the same thing:
Julia> sum === Base.sum
true
Julia> #which sum # show where the name is defined
Base
So why, you might ask, is it necessary is write things like Base.start instead of simply start?
The point is that start is just a name. We are free to rebind names in the top-level scope to anything we like. For instance start = 0 will rebind the name 'start' to the integer 0 (so that it no longer refers to Base.start).
Concentrating now on the specific example in docs, if we simply wrote start(::Squares) = 1, then we find that we have created a new function with 1 method:
Julia> start
start (generic function with 1 method)
But Julia's iterator interface (invoked using the for loop) requires us to add the new method to Base.start! We haven't done this and so we get an error if we try to iterate:
julia> for i in Squares(7)
println(i)
end
ERROR: MethodError: no method matching start(::Squares)
By updating the Base.start function instead by writing Base.start(::Squares) = 1, the iterator interface can use the method for the Squares type and iteration will work as we expect (as long as Base.done and Base.next are also extended for this type).
I'll grant that for something so fundamental, the explanation is buried a bit far down in the documentation, but http://docs.julialang.org/en/release-0.4/manual/modules/#standard-modules describes this:
There are three important standard modules: Main, Core, and Base.
Base is the standard library (the contents of base/). All modules
implicitly contain using Base, since this is needed in the vast
majority of cases.

Assert type information onto computed results in Julia

Problem
I read in an array of strings from a file.
julia> file = open("word-pairs.txt");
julia> lines = readlines(file);
But Julia doesn't know that they're strings.
julia> typeof(lines)
Array{Any,1}
Question
Can I tell Julia this somehow?
Is it possible to insert type information onto a computed result?
It would be helpful to know the context where this is an issue, because there might be a better way to express what you need - or there could be a subtle bug somewhere.
Can I tell Julia this somehow?
No, because the readlines function explicitly creates an Any array (a = {}): https://github.com/JuliaLang/julia/blob/master/base/io.jl#L230
Is it possible to insert type information onto a computed result?
You can convert the array:
r = convert(Array{ASCIIString,1}, w)
Or, create your own readstrings function based on the link above, but using ASCIIString[] for the collection array instead of {}.
Isaiah is right about the limits of readlines. More generally, often you can say
n = length(A)::Int
when generic type inference fails but you can guarantee the type in your particular case.
As of 0.3.4:
julia> typeof(lines)
Array{Union(ASCIIString,UTF8String),1}
I just wanted to warn against:
convert(Array{ASCIIString,1}, lines)
that can fail (for non-ASCII) while I guess, in this case nothing needs to be done, this should work:
convert(Array{UTF8String,1}, lines)

Silly Ada Generic Package Management?

I have just started using Ada, and I'm finding the generic package declarations to be rather silly. Maybe I'm not doing it right, so I'm looking for better options.
Take a look at the below example.
package STD_Code_Maps is new
Ada.Containers.Map(Key_Type => STD_Code_Type;
Element_Type => Ada.Strings.Unbounded.Unbounded_String);
STD_Code_Map : STD_Code_Maps.Map;
-- . . .
procedure Do_Something is
Item : Ada.Strings.Unbounded.Unbounded_String;
begin
Item := STD_Code_Maps.Element(STD_Code_Maps.First(STD_Code_Map));
-- Do something with the Item
-- . . .
end Do_Something;
It would be much cleaner to simply be able to write STD_Code_Map.First.Element instead of the godforsaken STD_Code_Maps.Element(STD_Code_Maps.First(STD_Code_Map));
Obviously I'm doing this wrong -- I think. I'm repeating the phrase STD_Code_Map at least thrice over there. I'm all for verbosity and everything, but really the code I'm writing seems bad and silly to me.
I was wondering if there is solution that doesn't require you to rename the package to something like package Map renames STD_Code_Maps; which would of course shorten the code but I don't want to do this on each and every procedure entry. I really think something like STD_Code_Map.First.Element would be much simpler. Can this be done in Ada 2012?
Note: Using the Unbounded_String package by default is also so difficult. Did the standard library designers actually give much thought to the ridiculous and overly long package hierarchy?
Thanks for reading this, and potentially helping me out. I'm new to Ada.
GNAT GPL 2012 and 2013, and FSF GCC 4.7 and 4.8, support the new container indexing scheme of Ada 2012, which means that you can write
Item := STD_Code_Map ({some Cursor});
And you can do this even with the -gnat05 switch to force Ada 2005 mode! (which has to be a bug).
Ada 2005 allows you to call a primitive function of a tagged type using object.function notation, provided that the first operand is of the tagged type; so you can write STD_Code_Map.First as shorthand for STD_Code_Maps.First (STD_Code_Map).
Putting these together, you can write
Item := STD_Code_Map (STD_Code_Map.First);
which is quite short!
The issue has nothing to do with generics. As Simon pointed out, you can say STD_Code_Map.First, since the type of STD_Code_Map is a tagged type and Ada supports this notation for tagged types. On the other hand, the type of STD_Code_Map.First is a Cursor type, which isn't tagged (making it tagged would have caused problems declaring certain operations that take both a Cursor and a Map). But even without the Ada 2012 container indexing that Simon mentioned, you can say
STD_Code_Maps.Element(STD_Code_Map.First);
which is a little better. Besides renaming a package, you can also rename the function:
function Elem (Position : STD_Code_Maps.Cursor) return STD_Code_Maps.Element_Type
renames STD_Code_Maps.Element;
and you can now use just Elem instead of STD_Code_Maps.Element wherever the renaming is directly visible. (You could call it Element if you want to. The renaming name can be the same or it can be different.) This could be helpful if you use that function a lot.
Ada was designed for readability and maintainability. (written once, read & maintained for much longer) This means that it does get a little verbose at times. If you prefer terse & cryptic there are plenty of other languages out there !
If you want to avoid typing STD_Code_Map all the time, just use a use clause:
use STD_Code_Map;
which would mean your code of
Item := STD_Code_Maps.Element(STD_Code_Maps.First(STD_Code_Map));
would become
Item := Element(First(STD_Code_Map));
Getting names nice and readable in Ada can sometimes be tricky. Often times language designers made the task worse than it had to be, by designing Ada standard library packages for use with Ada use clauses, without a thought to how they'd look to some poor sap who either can't or doesn't want to use that feature.
In this case though, there are things you can do on your own end.
For example "_Maps.Map" is redundant, so why not get rid of it from the package name? Why not use names so that you can write:
package STD_Code is new
Ada.Containers.Map(Key_Type => STD_Code_Type;
Element_Type => Ada.Strings.Unbounded.Unbounded_String);
Map : STD_Code.Map;
-- . . .
procedure Do_Something is
Item : Ada.Strings.Unbounded.Unbounded_String;
begin
Item := STD_Code.Element(STD_Code.First(Map));
-- Do something with the Item
-- . . .
end Do_Something;
Now Code looks like a bit of a null-meaning word too. Everything in a program is a code. So I'd consider ditching it as well. Unsually I name my Ada container packages something that says their basic theoretical function (eg: STD_to_String), while objects are more specific nouns.
Also, I should point out that if your map is constant and you can live with names that look like identifiers, often times you can get rid of maps to strings entirely by using an enumerated type and the 'image attribute.
If you're using Ada 2012 the STD_Code_Maps.Element(STD_Code_Maps.First(STD_Code_Map)); could become:
Function Get_First( Map : STD_Code_Maps.Map ) return Unbounded_String is
( STD_Code_Maps.Element(Map.First) );
That is an example of the new Expression-functions, which were mainly introduced for pre- and post-conditions.
Of course, if you're using Ada 2012 then the precise map you'd likely want is Ada.Containers.Indefinite_Ordered_Maps -- the Indefinite_* containers are those that [can] have indefinite elements, like String.
In addition to TED's comment, using an enumeration when the maps are constant have even more advantages, namely case coverage: the compiler will flag as an error any case-statement that doesn't cover all alternatives. So if you add a new code it will flag all the case-statements that you need to modify. (Of course this benefit is lost when you use the others case.)

How to suppress function return

Suppose I have a function that has multiple returned values (shown below). However, this output is not informative as users do not know what each value stands for unless they look up the function definition. So I would like to use println() to print the results with appropriate names to the screen, while suppressing the the actual returned values from being printed on the screen. In R, the function invisible() does that, but how do you do the same thing in Julia?
function trimci(x::Array; tr=0.2, alpha=0.05, nullvalue=0)
se=sqrt(winvar(x,tr=tr))./((1-2.*tr)*sqrt(length(x)))
ci=cell(2)
df=length(x)-2.*floor(tr.*length(x))-1
ci=[tmean(x, tr=tr)-qt(1-alpha./2, df).*se, tmean(x, tr=tr)+qt(1-alpha./2, df).*se]
test=(tmean(x,tr=tr)-nullvalue)./se
sig=2.*(1-pt(abs(test),df))
return ci, tmean(x, tr=tr), test, se, sig
end
In addition to what Harlan and Stefan said, let me share an example from the ODBC.jl package (source here).
One of my favorite features of Julia over other languages is how dead simple it is to create custom types (and without performance issues either!). Here's a custom type, Metadata, that simply holds several fields of data that describe an executed query. This doesn't necessarily need its own type, but it makes it more convenient passing all this data between functions as well as allowing custom formatting of its output by overloading the Base.show() function.
type Metadata
querystring::String
cols::Int
rows::Int
colnames::Array{ASCIIString}
coltypes::Array{(String,Int16)}
colsizes::Array{Int}
coldigits::Array{Int16}
colnulls::Array{Int16}
end
function show(io::IO,meta::Metadata)
if meta == null_meta
print(io,"No metadata")
else
println(io,"Resultset metadata for executed query")
println(io,"------------------------------------")
println(io,"Columns: $(meta.cols)")
println(io,"Rows: $(meta.rows)")
println(io,"Column Names: $(meta.colnames)")
println(io,"Column Types: $(meta.coltypes)")
println(io,"Column Sizes: $(meta.colsizes)")
println(io,"Column Digits: $(meta.coldigits)")
println(io,"Column Nullable: $(meta.colnulls)")
print(io,"Query: $(meta.querystring)")
end
end
Again, nothing fancy, but illustrates how easy it really is to define a custom type and produce custom output along with it.
Cheers.
One thing you could do would be to define a new type for the return value for this function, call it TrimCIResult or something. Then you could define appropriate methods to show that object in the REPL. Or you may be able to generalize that solution with a type hierarchy that could be used for storing the results from and displaying any statistical test.
The value nothing is how you return a value that won't print: the repl specifically checks for the value nothing and prints nothing if that's the value returned by an expression. What you're looking to do is to return a bunch of values and not print them, which strikes me as rather odd. If a function returns some stuff, I want to know about it – having the repl lie to users seems like a bad idea. Harlan's suggesting would work though: define a type for this value with the values you don't want to expose to the user as fields and customize its printing so that the fields you don't want to show people aren't printed.

VIM: Bijection between VIM Taglist elements and code snippets?

My Taglist in a C code:
macro
|| MIN_LEN
|| MAX_ITERATIONS
||- typedef
|| cell
|| source_cell
||- variable
|| len_given
Taglist elements (domain):
A = {MIN_LEN, MAX_ITERATIONS, cell, source_cell, len_given}
Code snippets (codomain):
B = {"code_MIN_LEN", "code_MAX_ITERATIONS", ..., "code_len_given"}
Goal: to have bijection between the sets A and B.
Example: I want to remove any element in A, such as the MIN_LEN, from A and B by removing either its element in A or B.
Question: Is there a way to quarantee a bijection between A and B so that a change either in A or in B results in a change in the other set?
I strongly doubt you can do that. The taglist plugin uses ctags to collect the symbols in your code and display them in a lateral split. The lateral split contains readonly information (if you try to work on that window, vim tells you that modifiable is off for that buffer).
What you want to achieve would imply quite complex parsing of the source code you are modifying. Even a simple task like automatic renaming (assuming you modify a function name entry in the taglist buffer and all the instances in your source are updated) requires pretty complex parsing, which is beyond the ctags features or taglist itself. Deleting and keeping everything in sync with a bijective relationship is even more complex. Suppose you have a printf line where you use a macro you want to remove. What should happen to that line? should the whole line disappear, or just the macro (in that case, the line will probably be syntactically incorrect.
taglist is a nice plugin for browsing your code, but it's unsuited for automatic refactoring (which is what you want to achieve).
Edit: as for the computational complexity, well, the worst case scenario is that you have to scout the whole document at every keystroke, looking for new occurrence of labels that could be integrated, so in this sense you could say it's O(n) at each keystroke. This is of course overkill and the worst method to implement it. I am not aware of the computational complexity of the syntax highlight in vim, (which would be useful to extract tags as well, via proper tokenization), but I would estimate it very low, and very limited in the amount of parsed data (you are unlikely to have large constructs to parse to extract the token and understand its context). In any case, this is not how taglist works. Taglist runs ctags at every vim invocation, it does not parse the document live while you type. This is however done by Eclipse, XCode and KDevelop for example, which also provide tools for automatic or semiautomatic refactoring, and can eventually integrate vim as an editor. If you need these features, you are definitely using the wrong tool.

Resources