Compare two buffers for equality - julia

Is there a way to compare two buffers directly?
For example having two identical files file1 and file1-copy, i would like to do:
f1 = open(file1)
f2 = open(file1-copy)
if f1 == f2
println("Equal content")
end
I know i can make strings of that and compare those:
if readstring(f1) == readstring(f2)
println("Equal content")
end

Easiest way is probably just to mmap them:
julia> f1 = open("file")
f2 = open("file-copy");
julia> Mmap.mmap(f1) == Mmap.mmap(f2)
true

I guess not(in the sense that comparing two buffers(f1&f2) directly), but if you could pre-calculate their hashes, it would be convenient to directly compare them later on:
julia> using SHA
shell> cat file1
Is there a way to compare two buffers directly? For example having two identical files file1 and file1-copy, i would like to do:
f1 = open(file1)
f2 = open(file1-copy)
if f1 == f2
println("Equal content")
end
I know i can make strings of that and compare those:
if readstring(f1) == readstring(f2)
println("Equal content")
end
julia> file1 = open("file1") do f
sha256(f)
end |> bytes2hex
"eb179202793cfbfd1a1f19e441e813a8e23012a5bdd81e453daa266fcb74144a"
julia> file1copy = open("file1-copy") do f
sha256(f)
end |> bytes2hex
"eb179202793cfbfd1a1f19e441e813a8e23012a5bdd81e453daa266fcb74144a"
julia> file1 == file1copy
true

Related

generating expressions and then checking them in Julia

My goal is to be able to generate a list of expressions, p.g., check that a number is in some interval, and then evaluate it.
I was able to do it in the following way.
First, a function genExpr that creates such an Expr:
function genExpr(a::Real, b::Real)::Expr
quote
x < $(a + b) && x > $(a - b)
end
end
Create two expressions:
e1 = genExpr(0,3)
e2 = genExpr(8,2)
Now, my problem is how to pass these expressions to a function along with a number x. Then, this function, checks if such a number satisfies both conditions. I was able to achieve it with the following function:
function applyTest(y::Real, vars::Expr...)::Bool
global x = y
for var in vars
if eval(var)
return true
end
end
return false
end
This works, but the appearance of global suggests the existence of a better way of obtaining the same goal. And that's my question: create a function with arguments a number and a list of Expr's. Such function returns true if any condition is satisfied and false otherwise.
This looks like a you are probably looking into using a macro:
macro genExpr(a::Real, b::Real)
quote
x-> x < $(a + b) && x > $(a - b)
end
end
function applyTest(y::Real, vars::Function...)::Bool
any(var(y) for var in vars)
end
Testing:
julia> e1 = #genExpr(0,3)
#15 (generic function with 1 method)
julia> e2 = #genExpr(8,2)
#17 (generic function with 1 method)
julia> applyTest(0,e1,e2)
true
However, with this simple code a function just generating a lambda would be as good:
function genExpr2(a::Real, b::Real)
return x-> x < (a + b) && x > (a - b)
end

List comprehensions and tuples in Julia

I am trying to do in Julia what this Python code does. (Find all pairs from the two lists whose combined value is above 7.)
#Python
def sum_is_large(a, b):
return a + b > 7
l1 = [1,2,3]
l2 = [4,5,6]
l3 = [(a,b) for a in l1 for b in l2 if sum_is_large(a, b)]
print(l3)
There is no if for list comprehensions in Julia. And if I use filter(), I'm not sure if I can pass two arguments. So my best suggestion is this:
#Julia
function sum_is_large(pair)
a, b = pair
return a + b > 7
end
l1 = [1,2,3]
l2 = [4,5,6]
l3 = filter(sum_is_large, [(i,j) for i in l1, j in l2])
print(l3)
I don't find this very appealing. So my question is, is there a better way in Julia?
Using the very popular package Iterators.jl, in Julia:
using Iterators # install using Pkg.add("Iterators")
filter(x->sum(x)>7,product(l1,l2))
is an iterator producing the pairs. So to get the same printout as the OP:
l3iter = filter(x->sum(x)>7,product(l1,l2))
for p in l3iter println(p); end
The iterator approach is potentially much more memory efficient. Ofcourse, one could just l3 = collect(l3iter) to get the pair vector.
#user2317519, just curious, is there an equivalent iterator form for python?
Guards (if) are now available in Julia v0.5 (currently in the release-candidate stage):
julia> v1 = [1, 2, 3];
julia> v2 = [4, 5, 6];
julia> v3 = [(a, b) for a in v1, b in v2 if a+b > 7]
3-element Array{Tuple{Int64,Int64},1}:
(3,5)
(2,6)
(3,6)
Note that generators are also now available:
julia> g = ( (a, b) for a in v1, b in v2 if a+b > 7 )
Base.Generator{Filter{##18#20,Base.Prod2{Array{Int64,1},Array{Int64,1}}},##17#19}(#17,Filter{##18#20,Base.Prod2{Array{Int64,1},Array{Int64,1}}}(#18,Base.Prod2{Array{Int64,1},Array{Int64,1}}([1,2,3],[4,5,6])))
Another option similar to the one of #DanGetz using also Iterators.jl:
function expensive_fun(a, b)
return (a + b)
end
Then, if the condition is also complicated, it can be defined as a function:
condition(x) = x > 7
And last, filter the results:
>>> using Iterators
>>> result = filter(condition, imap(expensive_fun, l1, l2))
result is an iterable that is only computed when needed (inexpensive) and can be collected collect(result) if required.
The one-line if the filter condition is simple enough would be:
>>> result = filter(x->(x > 7), imap(expensive_fun, l1, l2))
Note: imap works natively for arbitrary number of parameters.
Perhaps something like this:
julia> filter(pair -> pair[1] + pair[2] > 7, [(i, j) for i in l1, j in l2])
3-element Array{Tuple{Any,Any},1}:
(3,5)
(2,6)
(3,6)
although I'd agree it doesn't look like it ought to be the best way...
I'm surprised nobody mentions the ternary operator to implement the conditional:
julia> l3 = [sum_is_large((i,j)) ? (i,j) : nothing for i in l1, j in l2]
3x3 Array{Tuple,2}:
nothing nothing nothing
nothing nothing (2,6)
nothing (3,5) (3,6)
or even just a normal if block within a compound statement, i.e.
[ (if sum_is_large((x,y)); (x,y); end) for x in l1, y in l2 ]
which gives the same result.
I feel this result makes a lot more sense than filter(), because in julia the a in A, b in B construct is interpreted dimensionally, and therefore the output is in fact an "array comprehension" with appropriate dimensionality, which clearly in many cases would be advantageous and presumably the desired behaviour (whether we include a conditional or not).
Whereas filter will always return a vector. Obviously, if you really want a vector result you can always collect the result; or for a conditional list comprehension like the one here, you can simply remove nothing elements from the array by doing l3 = l3[l3 .!= nothing].
Presumably this is still clearer and no less efficient than the filter() approach.
You can use the #vcomp (vector comprehension) macro in VectorizedRoutines.jl to do Python-like comprehensions:
using VectorizedRoutines
Python.#vcomp Int[i^2 for i in 1:10] when i % 2 == 0 # Int[4, 16, 36, 64, 100]

Pass function arguments into Julia non-interactively

I have a Julia function in a file. Let's say it is the below. Now I want to pass arguments into this function. I tried doing
julia filename.jl randmatstat(5)
but this gives an error that '(' token is unexpected. Not sure what the solution would be. I am also a little torn on if there is a main function / how to write a full solution using Julia. For example what is the starting / entry point of a Julia Program?
function randmatstat(t)
n = 5
v = zeros(t)
w = zeros(t)
for i = 1:t
a = randn(n,n)
b = randn(n,n)
c = randn(n,n)
d = randn(n,n)
P = [a b c d]
Q = [a b; c d]
v[i] = trace((P.'*P)^4)
w[i] = trace((Q.'*Q)^4)
end
std(v)/mean(v), std(w)/mean(w)
end
Julia doesn't have an "entry point" as such.
When you call julia myscript.jl from the terminal, you're essentially asking julia to execute the script and exit. As such, it needs to be a script. If all you have in your script is a function definition, then it won't do much unless you later call that function from your script.
As for arguments, if you call julia myscript.jl 1 2 3 4, all the remaining arguments (i.e. in this case, 1, 2, 3 and 4) become an array of strings with the special name ARGS. You can use this special variable to access the input arguments.
e.g. if you have a julia script which simply says:
# in julia mytest.jl
show(ARGS)
Then calling this from the linux terminal will give this result:
<bashprompt> $ julia mytest.jl 1 two "three and four"
UTF8String["1","two","three and four"]
EDIT: So, from what I understand from your program, you probably want to do something like this (note: in julia, the function needs to be defined before it's called).
# in file myscript.jl
function randmatstat(t)
n = 5
v = zeros(t)
w = zeros(t)
for i = 1:t
a = randn(n,n)
b = randn(n,n)
c = randn(n,n)
d = randn(n,n)
P = [a b c d]
Q = [a b; c d]
v[i] = trace((P.'*P)^4)
w[i] = trace((Q.'*Q)^4)
end
std(v)/mean(v), std(w)/mean(w)
end
t = parse(Int64, ARGS[1])
(a,b) = randmatstat(t)
print("a is $a, and b is $b\n")
And then call this from your linux terminal like so:
julia myscript.jl 5
You can try running like so:
julia -L filename.jl -E 'randmatstat(5)'
Add the following to your Julia file:
### original file
function randmatstat...
...
end
### new stuff
if length(ARGS)>0
ret = eval(parse(join(ARGS," ")))
end
println(ret)
Now, you can run:
julia filename.jl "randmatstat(5)"
As attempted originally. Note the additional quotes added to make sure the parenthesis don't mess up the command.
Explanation: The ARGS variable is defined by Julia to hold the parameters to the command running the file. Since Julia is an interpreter, we can join these parameters to a string, parse it as Julia code, run it and print the result (the code corresponds to this description).

Convert Dict to DataFrame in Julia

Suppose I have a Dict defined as follows:
x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [4,5,6])
I want to convert this to a DataFrame object (from the DataFrames module). Constructing a DataFrame has a similar syntax to constructing a dictionary. For example, the above dictionary could be manually constructed as a data frame as follows:
DataFrame(A = [1,2,3], B = [4,5,6])
I haven't found a direct way to get from a dictionary to a data frame but I figured one could exploit the syntactic similarity and write a macro to do this. The following doesn't work at all but it illustrates the approach I had in mind:
macro dict_to_df(x)
typeof(eval(x)) <: Dict || throw(ArgumentError("Expected Dict"))
return quote
DataFrame(
for k in keys(eval(x))
#eval ($k) = $(eval(x)[$k])
end
)
end
end
I also tried writing this as a function, which does work when all dictionary values have the same length:
function dict_to_df(x::Dict)
s = "DataFrame("
for k in keys(x)
v = x[k]
if typeof(v) <: AbstractString
v = string('"', v, '"')
end
s *= "$(k) = $(v),"
end
s = chop(s) * ")"
return eval(parse(s))
end
Is there a better, faster, or more idiomatic approach to this?
Another method could be
DataFrame(Any[values(x)...],Symbol[map(symbol,keys(x))...])
It was a bit tricky to get the types in order to access the right constructor. To get a list of the constructors for DataFrames I used methods(DataFrame).
The DataFrame(a=[1,2,3]) way of creating a DataFrame uses keyword arguments. To use splatting (...) for keyword arguments the keys need to be symbols. In the example x has strings, but these can be converted to symbols. In code, this is:
DataFrame(;[Symbol(k)=>v for (k,v) in x]...)
Finally, things would be cleaner if x had originally been with symbols. Then the code would go:
x = Dict{Symbol,Array{Integer,1}}(:A => [1,2,3], :B => [4,5,6])
df = DataFrame(;x...)

Getting count of occurrences for X in string

Im looking for a function like Pythons
"foobar, bar, foo".count("foo")
Could not find any functions that seemed able to do this, in a obvious way. Looking for a single function or something that is not completely overkill.
Julia-1.0 update:
For single-character count within a string (in general, any single-item count within an iterable), one can use Julia's count function:
julia> count(i->(i=='f'), "foobar, bar, foo")
2
(The first argument is a predicate that returns a ::Bool).
For the given example, the following one-liner should do:
julia> length(collect(eachmatch(r"foo", "bar foo baz foo")))
2
Julia-1.7 update:
Starting with Julia-1.7 Base.Fix2 can be used, through ==('f') below, as to shorten and sweeten the syntax:
julia> count(==('f'), "foobar, bar, foo")
2
What about regexp ?
julia> length(matchall(r"ba", "foobar, bar, foo"))
2
I think that right now the closest built-in thing to what you're after is the length of a split (minus 1). But it's not difficult to specifically create what you're after.
I could see a searchall being generally useful in Julia's Base, similar to matchall. If you don't care about the actual indices, you could just use a counter instead of growing the idxs array.
function searchall(s, t; overlap::Bool=false)
idxfcn = overlap ? first : last
r = findnext(s, t, firstindex(t))
idxs = typeof(r)[] # Or to only count: n = 0
while r !== nothing
push!(idxs, r) # n += 1
r = findnext(s, t, idxfcn(r) + 1)
end
idxs # return n
end
Adding an answer to this which allows for interpolation:
julia> a = ", , ,";
julia> b = ",";
julia> length(collect(eachmatch(Regex(b), a)))
3
Actually, this solution breaks for some simple cases due to use of Regex. Instead one might find this useful:
"""
count_flags(s::String, flag::String)
counts the number of flags `flag` in string `s`.
"""
function count_flags(s::String, flag::String)
counter = 0
for i in 1:length(s)
if occursin(flag, s)
s = replace(s, flag=> "", count=1)
counter+=1
else
break
end
end
return counter
end
Sorry to post another answer instead of commenting previous one, but i've not managed how to deal with code blocks in comments :)
If you don't like regexps, maybe a tail recursive function like this one (using the search() base function as Matt suggests) :
function mycount(what::String, where::String)
function mycountacc(what::String, where::String, acc::Int)
res = search(where, what)
res == 0:-1 ? acc : mycountacc(what, where[last(res) + 1:end], acc + 1)
end
what == "" ? 0 : mycountacc(what, where, 0)
end
This is simple and fast (and does not overflow the stack):
function mycount2(where::String, what::String)
numfinds = 0
starting = 1
while true
location = search(where, what, starting)
isempty(location) && return numfinds
numfinds += 1
starting = location.stop + 1
end
end
one liner: (Julia 1.3.1):
julia> sum([1 for i = eachmatch(r"foo", "foobar, bar, foo")])
2
Since Julia 1.3, there has been a count method that does exactly this.
count(
pattern::Union{AbstractChar,AbstractString,AbstractPattern},
string::AbstractString;
overlap::Bool = false,
)
Return the number of matches for pattern in string.
This is equivalent to calling length(findall(pattern, string)) but more
efficient.
If overlap=true, the matching sequences are allowed to overlap indices in the
original string, otherwise they must be from disjoint character ranges.
│ Julia 1.3
│
│ This method requires at least Julia 1.3.
julia> count("foo", "foobar, bar, foo")
2
julia> count("ana", "bananarama")
1
julia> count("ana", "bananarama", overlap=true)
2

Resources