Julia iterator which parses each line in file - julia

I'm new to Julia and have hit a rock with something I imagine should be a common scenario:
I would like an iterator which parses a text file one line at a time. So, like eachline(f), except a function parse is applied to each line. Call the result eachline(parse, f) if you wish (like the version of open with an extra function argument): mapping parse over eachline.
To be more specific: I have a function poset(s::String) which turns a string representation of a poset into a poset:
function poset(s::String)
nums = split(s)
...
return transitiveclosure(g)
end
Now I'd like to be able to say something like
open("posets7.txt") do f
for p in eachline(poset, f)
#something
end
end
(and I specifically need this to be an iterator: my files are rather large, so I really want to parse them one line at a time).

I think I would (personally) use for Iterators.map in such a case:
open("posets7.txt") do f
for p in Iterators.map(poset, eachline(f))
# do something
end
end
But, as the documentation notes (and as you discovered yourself), this is equivalent to using a generator expression:
open("posets7.txt") do f
for p in (poset(x) for x in eachline(f))
# do something
end
end

Related

How to correctly do line continuation in Julia?

In fortran, we can simply use & to do line continuation.
But I wonder, in Julia, is there a safe way to do line continuation?
I heard that sometimes Julia cannot identify line continuation and it can cause bugs? Because after, Julia does not seem to have any symbol to do line continuation. Can Julia correctly recognize the line continuation?
Like I define the below function with long arguments,
function mean_covar_init(kmix::Int64,dim_p::Int64,weight::Array{Float64,1},sigma::Array{Float64,2},mu::Array{Float64,2})
return nothing
end
If I do things like
function mean_covar_init(kmix::Int64
,dim_p::Int64
,weight::Array{Float64,1}
,sigma::Array{Float64,2}
,mu::Array{Float64,2})
return nothing
end
Is it safe? Thank you very much!
If you do the thing like you have presented it is safe because Julia sees ( in the first line of code so it will look for a closing ).
However a problematic code would be:
f() = 1
+ 2
The reason is that the f() = 1 part is a valid and complete function definition. Therefore you need to make sure to signal Julia that the line is incomplete. The three most typical ways to do it are:
Move the + to the end of first line:
f() = 1 +
2
Use ( and ) as a wrapper:
f() = (1
+ 2)
Use begin and end:
f() = begin 1
+ 2 end
Let me give another example with macros, which do not require parenthesis or punctuation and therefore can be often tricky. Therefore the following:
#assert isodd(4) "What even are numbers?"
if rewritten as
#assert isodd(4)
"What even are numbers?"
does not produce what you expect, and you need to do e.g.:
#assert(isodd(4),
"What even are numbers?")

R: Enriched debugging for linear code chains

I am trying to figure out if it is possible, with a sane amount of programming, to create a certain debugging function by using R's metaprogramming features.
Suppose I have a block of code, such that each line uses as all or part of its input the output from thee line before -- the sort of code you might build with pipes (though no pipe is used here).
{
f1(args1) -> out1
f2(out1, args2) -> out2
f3(out2, args3) -> out3
...
fn(out<n-1>, args<n>) -> out<n>
}
Where for example it might be that:
f1 <- function(first_arg, second_arg, ...){my_body_code},
and you call f1 in the block as:
f1(second_arg = 1:5, list(a1 ="A", a2 =1), abc = letters[1:3], fav = foo_foo)
where foo_foo is an object defined in the calling environment of f1.
I would like a function I could wrap around my block that would, for each line of code, create an entry in a list. Each entry would be named (line1, line2) and each line entry would have a sub-entry for each argument and for the function output. the argument entries would consist, first, of the name of the formal, to which the actual argument is matched, second, the expression or name supplied to that argument if there is one (and a placeholder if the argument is just a constant), and third, the value of that expression as if it were immediately forced on entry into the function. (I'd rather have the value as of the moment the promise is first kept, but that seems to me like a much harder problem, and the two values will most often be the same).
All the arguments assigned to the ... (if any) would go in a dots = list() sublist, with entries named if they have names and appropriately labeled (..1, ..2, etc.) if they are assigned positionally. The last element of each line sublist would be the name of the output and its value.
The point of this is to create a fairly complete record of the operation of the block of code. I think of this as analogous to an elaborated version of purrr::safely that is not confined to iteration and keeps a more detailed record of each step, and indeed if a function exits with an error you would want the error message in the list entry as well as as much of the matched arguments as could be had before the error was produced.
It seems to me like this would be very useful in debugging linear code like this. This lets you do things that are difficult using just the RStudio debugger. For instance, it lets you trace code backwards. I may not know that the value in out2 is incorrect until after I have seen some later output. Single-stepping does not keep intermediate values unless you insert a bunch of extra code to do so. In addition, this keeps the information you need to track down matching errors that occur before promises are even created. By the time you see output that results from such errors via single-stepping, the matching information has likely evaporated.
I have actually written code that takes a piped function and eliminates the pipes to put it in this format, just using text manipulation. (Indeed, it was John Mount's "Bizarro pipe" that got me thinking of this). And if I, or we, or you, can figure out how to do this, I would hope to make a serious run on a second version where each function calls the next, supplying it with arguments internally rather than externally -- like a traceback where you get the passed argument values as well as the function name and and formals. Other languages have debugging environments like that (e.g. GDB), and I've been wishing for one for R for at least five years, maybe 10, and this seems like a step toward it.
Just issue the trace shown for each function that you want to trace.
f <- function(x, y) {
z <- x + y
z
}
trace(f, exit = quote(print(returnValue())))
f(1,2)
giving the following which shows the function name, the input and output. (The last 3 is from the function itself.)
Tracing f(1, 2) on exit
[1] 3
[1] 3

Parameter file in ArgParse.jl?

Python's argparse has a simple way to read parameters from a file:
https://docs.python.org/2/library/argparse.html#fromfile-prefix-chars
Instead of passing your arguments one by one:
python script.py --arg1 val1 --arg2 val2 ...
You can say:
python script.py #args.txt
and then the arguments are read from args.txt.
Is there a way to do this in ArgParse.jl?
P.S.: If there is no "default" way of doing this, maybe I can do it by hand, by calling parse_args on a list of arguments read from a file. I know how to do this in a dirty way, but it gets messy if I want to replicate the behavior of argparse in Python, where I can pass multiple files with #, as well as arguments in the command line, and then the value of a parameter is simply the last value passed to this parameter. What's the best way of doing this?
This feature is not currently present in ArgParse.jl, although it would not be difficult to add. I have prepared a pull request.
In the interim, the following code suffices for what you need:
# faithful reproduction of Python 3.5.1 argparse.py
# partial copyright Python Software Foundation
function read_args_from_files(arg_strings, prefixes)
new_arg_strings = AbstractString[]
for arg_string in arg_strings
if isempty(arg_string) || arg_string[1] ∉ prefixes
# for regular arguments, just add them back into the list
push!(new_arg_strings, arg_string)
else
# replace arguments referencing files with the file content
open(arg_string[2:end]) do args_file
arg_strings = AbstractString[]
for arg_line in readlines(args_file)
push!(arg_strings, rstrip(arg_line, '\n'))
end
arg_strings = read_args_from_files(arg_strings, prefixes)
append!(new_arg_strings, arg_strings)
end
end
end
# return the modified argument list
return new_arg_strings
end
# preprocess args, then parse as usual
ARGS = read_args_from_files(ARGS, ['#'])
args = parse_args(ARGS, s)

Confusion with ' operator and bracketing where (v')*v becomes Ac_mul_B despite overloading

I am playing around with the idea of CoVectors in Julia and am getting some behaviour I didn't expect from the parser/compiler. I have defined a new CoVector type that is the ctranspose of any vector, and is just a simple decoration:
type CoVector{T<:AbstractVector}
v::T
end
They can be created (and uncreated) with ' using ctranspose:
import Base.ctranspose
function CoVector(T::DataType,d::Integer=0)
return CoVector(Array(T,d))
end
function Base.ctranspose(cv::CoVector)
return cv.v
end
function Base.ctranspose(v::AbstractVector)
return CoVector(v)
end
function Base.ctranspose(v::Vector) # this is already specialized in Base
return CoVector(v)
end
Next I want to define a simple dot product
function *(x::CoVector,y::AbstractVector)
return dot(x.v,y)
end
Which can work fine for:
v = [1,2,3]
cv = v'
cv*v
returns 14, and cv is a CoVector. But if I do
(v') * v
I get something different! In this case it is a single element array containing 14. How come parenthesis doesn't work how I expect?
In the end we see the expression gets expanded to Ac_mul_B which defaults to [dot(A,B)] and it seems that this interpretation is defined at the "operator" level.
Is this expected behaviour? Can Julia completely ignore my bracketing and change the expression as it wants? In Julia I like that I can override things in Base but is it also possible to make self-consistent changes to how operators are applied? I see the expression doesn't have head call but Symbol call... does this change in Julia 0.4? (I read somewhere that call is becoming more universal).
I guess I can fix the problem by redefining Ac_mul_B but I was very surprised that it was called at all, given my above definitions.

Returning multiple values in Ruby, to be used to call a function

Is it possible to return multiple values from a function?
I want to pass the return values into another function, and I wonder if I can avoid having to explode the array into multiple values
My problem?
I am upgrading Capybara for my project, and I realized, thanks to CSS 'contains' selector & upgrade of Capybara, that the statement below will no longer work
has_selector?(:css, "#rightCol:contains(\"#{page_name}\")")
I want to get it working with minimum effort (there are a lot of such cases), So I came up with the idea of using Nokogiri to convert the css to xpath. I wanted to write it so that the above function can become
has_selector? xpath(:css, "#rightCol:contains(\"#{page_name}\")")
But since xpath has to return an array, I need to actually write this
has_selector?(*xpath(:css, "#rightCol:contains(\"#{page_name}\")"))
Is there a way to get the former behavior?
It can be assumed that right now xpath func is like the below, for brevity.
def xpath(*a)
[1, 2]
end
You cannot let a method return multiple values. In order to do what you want, you have to change has_selector?, maybe something like this:
alias old_has_selector? :has_selector?
def has_selector? arg
case arg
when Array then old_has_selector?(*arg)
else old_has_selector?(arg)
end
end
Ruby has limited support for returning multiple values from a function. In particular a returned Array will get "destructured" when assigning to multiple variables:
def foo
[1, 2]
end
a, b = foo
a #=> 1
b #=> 2
However in your case you need the splat (*) to make it clear you're not just passing the array as the first argument.
If you want a cleaner syntax, why not just write your own wrapper:
def has_xpath?(xp)
has_selector?(*xpath(:css, xp))
end

Resources