I'm trying to get my head around Julia, coming from Python. To get a better feeling for the language, I'm currently reworking in Julia some Project Euler problems that I had already solved in Python. One thing that I do a lot (in Project Euler and in real life) is to parse a big multiline data object into an array. For example, if I have the data
data = """1 2 3 4
5 6 7 8
9 0 1 2"""
In Python I might do
import numpy as np

def parse(input):
    output = []
    for line in input.splitlines():
        # list(...) is needed on Python 3, where map returns an iterator
        output.append(list(map(int, line.split())))
    return np.array(output)
Here's what I have so far in Julia:
function parse(input)
    nrow = ncol = 0
    # Count first
    for row=split(input,'\n')
        nrow += 1
        ncol = max(ncol,length(split(row)))
    end
    output = zeros(Int64,(nrow,ncol))
    for (i,row) in enumerate(split(input,'\n'))
        for (j,word) in enumerate(split(row))
            output[i,j] = int(word)
        end
    end
    return output
end
What's the Julia version of "pythonic" called? Whatever it is, I don't think I'm doing it. I'm pretty sure there's a way to (1) not have to pass through the data twice, (2) not have to be so specific about allocating the array. I've tried hcat/vcat a little, without luck.
I'd welcome suggestions for solving this. I'd also be interested in references to proper Julia style (julia-onic?), and general language usage practices. Thanks!
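readdlm is really useful here (since Julia 0.7 it lives in the DelimitedFiles standard library, so load it with using DelimitedFiles first). See the docs for all the options, but here's an example.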
julia> data="1 2 3 4
5 6 7 8
9 0 1 2"
"1 2 3 4\n5 6 7 8\n9 0 1 2"
julia> readdlm(IOBuffer(data))
3x4 Array{Float64,2}:
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
9.0 0.0 1.0 2.0
julia> readdlm(IOBuffer(data),Int)
3x4 Array{Int32,2}:
1 2 3 4
5 6 7 8
9 0 1 2
Is there any way for Julia to change the format of the following Tsp model to *.lp file?
using JuMP,CPLEX
Tsp=Model(solver=CplexSolver());
#Sets-------------------------------------------------------------------------
totalu=4;
U=1:4;
totalV=5;
V=1:totalV;
#Parameters-------------------------------------------------------------------
d=[100 10 8 9 7;10 100 10 5 6;8 10 100 8 9;9 5 8 100 6;7 6 9 6 100];
#variables---------------------------------------------------------------------
@variable(Tsp,x[V,V],Bin);
@variable(Tsp,u[V]>=0);
#constrains---------------------------------------------------------------------
@constraint(Tsp,c1[i in V ], sum(x[i,j] for j in V )==1);
@constraint(Tsp,c2[j in V], sum(x[i,j] for i in V )==1);
@constraint(Tsp,c3[i in U,j in V; i!=j],u[i]-u[j]+totalV*x[i,j]<=totalV-1);
# objective function------------------------------------------------------------
ff=sum(d[i,j]*x[i,j] for i in V,j in V);
@objective(Tsp, Min, ff);
solve(Tsp);
I tried this:
open("Tsp.lp", "w") do obj1
println(obj1, Tsp)
end
It doesn't give any error, but I don't see the model in *.lp format in the console. Moreover, is it possible to save the model as a *.lp file?
I am thankful for your help.
Use write_to_file: https://jump.dev/JuMP.jl/stable/manual/models/#Write-a-model-to-file
write_to_file(Tsp, "Tsp.lp")
However, it looks like you're using a (very) old version of JuMP? Please update to JuMP 1.0 for this to work.
The only differences after updating are:
# Tsp=Model(solver=CplexSolver())
Tsp = Model(CPLEX.Optimizer)
# solve(Tsp)
optimize!(Tsp)
I wondered if there is an equivalent to the browser() statement available in RStudio for debugging purposes for Julia (I am using the Juno IDE at the moment).
The R function browser() halts execution and invokes an environment browser when it is called. So, in principle, we can put browser() anywhere in our code to stop in this particular line and see what's stored in the environment at that moment, which is terrific for debugging purposes.
For instance, the code below will stop when i>3. Hence, that's exactly what we will see in the environment browser available in RStudio, where we will observe that i=4 at that moment in the code.
for (i in 1:5) {
  print(i)
  if (i>3) {
    browser()
  }
}
[1] 1
[1] 2
[1] 3
[1] 4
Called from: eval(ei, envir)
Browse[1]>
Have a look at Debugger.jl. Specifically the Place breakpoints in source code section:
It is sometimes more convenient to choose in the source code when to break. This is done for instance in Matlab/Octave with keyboard, and in R with browser(). You can use the @bp macro to do this
Your R example translated to Julia:
julia> using Debugger
julia> @run for i in 1:5
           println(i)
           if i > 3
               @bp
           end
       end
1
2
3
4
Hit breakpoint:
In ##thunk#257() at REPL[4]:1
9 │ Base.println(i)
10 │ %10 = i > 3
11 └── goto #4 if not %10
●12 3 ─ nothing
>13 4 ┄ @_2 = Base.iterate(%1, %8)
14 │ %14 = @_2 === nothing
15 │ %15 = ($(QuoteNode(Core.Intrinsics.not_int)))(%14)
16 └── goto #6 if not %15
17 5 ─ goto #2
About to run: (iterate)(1:5, 4)
1|debug>
This is a general solution for Julia. The Juno IDE also has integrated debugging; see the Debugging section of the Juno manual.
Infiltrator.jl's @infiltrate seems like the equivalent in Julia:
julia> using Infiltrator
julia> for i in 1:5
           println(i)
           if i > 3
               @infiltrate
           end
       end
1
2
3
4
Infiltrating top-level scope at REPL[1]:4:
infil> i
4
Compared to Debugger.jl's breakpoint, this doesn't slow down program execution at all, at the cost of not allowing you to step further into your program.
I created a new struct called HousingData and defined functions such as iterate and length for it. However, when I call collect on my HousingData object, I run into the following error.
TypeError: in typeassert, expected Integer, got a value of type Float64
using Random  # provides randperm
import Base: length, size, iterate
struct HousingData
    x
    y
    batchsize::Int
    shuffle::Bool
    num_instances::Int
    function HousingData(
        x, y; batchsize::Int=100, shuffle::Bool=false, dtype::Type=Array{Float64})
        new(convert(dtype,x),convert(dtype,y),batchsize,shuffle,size(y)[end])
    end
end
function length(d::HousingData)
    return ceil(d.num_instances/d.batchsize)
end
function iterate(d::HousingData, state=ifelse(
    d.shuffle, randperm(d.num_instances), collect(1:d.num_instances)))
    if(length(state)==0)
        return nothing
    end
    return ((d.x[:,state[1]],d.y[:,state[1]]),state[2:end])
end
x1 = randn(5, 100); y1 = rand(1, 100);
obj = HousingData(x1,y1; batchsize=20)
collect(obj)
There are multiple problems in your code. The first one is related to length not returning an integer, but rather a float. This is explained by the behavior of ceil:
julia> ceil(3.8)
4.0 # Notice: 4.0 (Float64) and not 4 (Int)
You can easily fix this:
function length(d::HousingData)
    return Int(ceil(d.num_instances/d.batchsize))
end
Another problem lies in the logic of your iteration function, which is not consistent with the advertised length. To take a smaller example than yours:
julia> x1 = [i+j/10 for i in 1:2, j in 1:6]
2×6 Array{Float64,2}:
1.1 1.2 1.3 1.4 1.5 1.6
2.1 2.2 2.3 2.4 2.5 2.6
# As an aside, unless you really want to work with 1xN matrices
# it is more idiomatic in Julia to use 1D Vectors in such situations
julia> y1 = [Float64(j) for i in 1:1, j in 1:6]
1×6 Array{Float64,2}:
1.0 2.0 3.0 4.0 5.0 6.0
julia> obj = HousingData(x1,y1; batchsize=3)
HousingData([1.1 1.2 … 1.5 1.6; 2.1 2.2 … 2.5 2.6], [1.0 2.0 … 5.0 6.0], 3, false, 6)
julia> length(obj)
2
julia> for (i, e) in enumerate(obj)
           println("$i -> $e")
       end
1 -> ([1.1, 2.1], [1.0])
2 -> ([1.2, 2.2], [2.0])
3 -> ([1.3, 2.3], [3.0])
4 -> ([1.4, 2.4], [4.0])
5 -> ([1.5, 2.5], [5.0])
6 -> ([1.6, 2.6], [6.0])
The iterator produces 6 elements, whereas the length of this object is only 2. This explains why collect errors out:
julia> collect(obj)
ERROR: ArgumentError: destination has fewer elements than required
Knowing your code, you're probably the best person to fix its logic.
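To illustrate what "consistent with the advertised length" means, here is a minimal sketch, written in Python rather than Julia and based on one guess at the intended logic: that each iteration should yield a whole batch of up to batchsize instances. The class name Batches and its fields are hypothetical and not taken from the question.

```python
import math


class Batches:
    """Hypothetical batch iterator: len() equals the number of items yielded."""

    def __init__(self, xs, ys, batchsize):
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same number of instances")
        self.xs = xs
        self.ys = ys
        self.batchsize = batchsize

    def __len__(self):
        # Number of batches, rounding up so a partial last batch is counted.
        return math.ceil(len(self.xs) / self.batchsize)

    def __iter__(self):
        # Yield one (x_batch, y_batch) pair per step, not one instance per step.
        for start in range(0, len(self.xs), self.batchsize):
            stop = start + self.batchsize
            yield self.xs[start:stop], self.ys[start:stop]


batches = Batches(list(range(6)), list(range(6)), batchsize=3)
print(len(batches), len(list(batches)))
```

The point is only that the count reported by the length function and the number of elements the iterator produces agree; if one instance per iteration is what you want instead, the length should be the number of instances.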
I am working on the problem from Cracking the Coding Interview:
A child is running up a staircase with n steps, and can hop either 1 step, 2 steps, or 3 steps at a time.
Implement a method to count how many possible ways the child can run up the stairs.
I came up with a dynamic solution:
def dynamic_prog(N):
    store_values = {1:1,2:2,3:3}
    return dynamic_prog_helper(N, store_values)

def dynamic_prog_helper(N, map_n):
    if N in map_n:
        return map_n[N]
    map_n[N] = dynamic_prog_helper(N-1, map_n) + dynamic_prog_helper(N-2, map_n) + dynamic_prog_helper(N-3,map_n)
    return map_n[N]
I am not sure why it does not compute correctly.
dynamic_prog(5) = 11, but should be 13
dynamic_prog(4) = 6, but should be 7
Can someone point me in the right direction?
The critical problem is that your initial value for store_values[3] is wrong. With 3 steps left to climb, you have 4 possibilities:
3
2 1
1 2
1 1 1
Fixing that error gets the expected results:
def dynamic_prog(N):
    store_values = {1:1,2:2,3:4}
    return dynamic_prog_helper(N, store_values)
...
for stair_count in range(3, 6):
    print(dynamic_prog(stair_count))
Output:
4
7
13
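One way to avoid hand-counting the base cases is to build the table bottom-up from smaller ones. The sketch below is an alternative to the memoized version, not the asker's code; it assumes the usual convention that there is exactly one way to climb zero steps (do nothing), and the function name count_ways is made up for illustration.

```python
def count_ways(n):
    """Count ordered sequences of hops of size 1, 2 or 3 that sum to n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Base cases: ways[0] = 1 (assumed convention), ways[1] = 1, ways[2] = 2.
    ways = [1, 1, 2]
    for k in range(3, n + 1):
        # The last hop covers 1, 2 or 3 steps.
        ways.append(ways[k - 1] + ways[k - 2] + ways[k - 3])
    return ways[n]


for stair_count in range(3, 6):
    print(count_ways(stair_count))
```

With these base cases the value for 3 steps is computed as 2 + 1 + 1 = 4 rather than typed in by hand, which is where the original table went wrong.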
I'm using the doSNOW package to parallelize tasks that differ in length. When one thread is finished, I want
(1) some information generated by old threads to be passed to the next thread, and
(2) the next thread to start immediately (load balancing, like in clusterApplyLB).
It works single-threaded (see makeCluster(spec=1)):
#Register Snow and doSNOW
require(doSNOW)
#CHANGE spec to 4 or more, to see what my problem is
registerDoSNOW(cl <- makeCluster(spec=1,type="SOCK",outfile=""))
numbersProcessed <- c() # init processed vector
x <- foreach(i = 1:10,.export=numbersProcessed) %dopar% {
  #Do working stuff
  cat(format(Sys.time(), "%X"),": ","Starting",i,"(Numbers processed so far:",numbersProcessed, ")\n")
  Sys.sleep(time=i)
  #Appends this number to general vector
  numbersProcessed <- append(numbersProcessed,i)
  cat(format(Sys.time(), "%X"),": ","Ending",i,"\n")
  cat("--------------------\n")
}
#End it all
stopCluster(cl)
Now change the spec in "makeCluster" to 4. Output is something like this:
[..]
Type: EXEC
18:12:21 : Starting 9 (Numbers processed so far: 1 5 )
18:12:23 : Ending 6
--------------------
Type: EXEC
18:12:23 : Starting 10 (Numbers processed so far: 2 6 )
18:12:25 : Ending 7
At 18:12:21, thread 9 knew that threads 1 and 5 had been processed. Two seconds later, thread 6 ends. The next thread has to know at least about 1, 5 and 6, right? But thread 10 only knows about 6 and 2.
I realized this has something to do with the number of cores specified in makeCluster. 9 knows about 1, 5 and 9 (1 + 4 + 4), 10 knows about 2, 6 and 10 (2 + 4 + 4).
Is there a better way to pass "processed" stuff to further generations of threads?
Bonus points: Is there a way to "print" to the master node in parallel processing, without having these "Type: EXEC" etc. messages from the snow package? :)
Thanks!
Marc
My bad. Damn.
I thought foreach with %dopar% was load-balanced. This isn't the case, and it makes my question obsolete, because nothing can be executed on the host side during parallel processing. That explains why global variables are only manipulated on the client side and never reach the host.
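The pattern seen in the output (task 9 knows about 1 and 5, task 10 about 2 and 6) can be reproduced with a small single-process simulation. This is only a sketch in Python, not doSNOW itself: it assumes each worker keeps its own private copy of the vector, and it takes the task-to-worker mapping (task i goes to worker (i - 1) mod 4) from the pattern in the question, not from how the package actually schedules work. The function name simulate is hypothetical.

```python
def simulate(num_tasks, num_workers):
    """Return, for each task, the numbers its own worker had already recorded."""
    # One private list per worker; nothing is shared with other workers or the master.
    worker_state = [[] for _ in range(num_workers)]
    seen = {}
    for task in range(1, num_tasks + 1):
        worker = (task - 1) % num_workers  # assumed mapping, see lead-in
        seen[task] = list(worker_state[worker])  # what the task can "see" at start
        worker_state[worker].append(task)  # the update stays local to this worker
    return seen


seen = simulate(num_tasks=10, num_workers=4)
print(seen[9], seen[10])
```

Because every update stays in the worker that made it, a task can only ever see numbers handled earlier by the same worker. Results that later tasks or the master need have to be returned from the loop body and combined on the master side.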