pmap bounds error: parallel Julia

I get a bounds error when running a function in parallel that runs fine normally (sequentially) e.g. when I run:
parallelsol = @time pmap(i -> findividual(x,y,z), 1:50)
It gives me an error:
exception on 2: exception on exception on 16: 20exception on 5: : ERROR: BoundsError()
in getindex at array.jl:246 (repeats 2 times)
But when I run:
parallelsol = @time map(i -> findividual(prodexcint,firstrun,q,r,unempinc,VUnempperm,Indunempperm,i,VUnemp,poachedwagevec, mw,k,Vp,Vnp,reswage), 1:50)
It runs fine. Any ideas as to why this might be happening?
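No answer is included here, but a common first step when pmap fails only in parallel is to reproduce the call on a single worker: you get one clean stack trace instead of the interleaved output above, and you can check whether the worker really has the same definitions and data as the master process. A rough sketch, reusing the (hypothetical) names from the question:
using Distributed

# Make sure findividual and the data it uses exist on every worker, not only
# on process 1. "findividual.jl" is a hypothetical file standing in for
# whatever defines the function and its inputs in the real session.
@everywhere include("findividual.jl")

# Reproduce the failing call on worker 2 to get a single readable stack trace.
remotecall_fetch(() -> findividual(x, y, z), 2)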

Related

Jupyter with Julia results in unexpected type error: no method matching

I get an unexpected type error when running the following Julia code in Jupyter, where a seemingly straightforward import goes wrong:
include("./imp.jl")
include("./imp2.jl")
n = Main.Imp.Network([1,2])
Imp2.p2(n)
This results in the following error:
MethodError: no method matching p(::Main.Imp.Network)
Closest candidates are:
p(::Main.Imp2.Imp.Network) at /Users/cg/Dropbox/code/Julia/learning/imp.jl:11
The code is below. How does this happen?
Imp.jl:
module Imp

export Network, p

mutable struct Network
    a::Array{Any,1}
end

function p(network::Network)
    network
end

end
Imp2.jl:
module Imp2

include("./imp.jl")

function p2(network)
    Imp.p(network)
end

end
More of the error output:
Stacktrace:
 [1] p2(network::Main.Imp.Network)
   @ Main.Imp2 ~/Dropbox/code/Julia/learning/imp2.jl:5
 [2] top-level scope
   @ In[3]:4
 [3] eval
   @ ./boot.jl:360 [inlined]
 [4] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1116
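The underlying issue is that include("./imp.jl") inside Imp2 evaluates imp.jl a second time, creating a second, independent copy of the module as Main.Imp2.Imp. Its Network is a distinct type from Main.Imp.Network, so the method p(::Main.Imp2.Imp.Network) cannot accept your n. A quick REPL check (hypothetical session) makes this visible:
julia> Main.Imp.Network === Main.Imp2.Imp.Network  # same name, two distinct types
false

julia> n isa Main.Imp2.Imp.Network                 # so Imp2's copy of p does not match n
false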
To fix it, you can either do:
module Imp2

using Main.Imp

function p2(network)
    Imp.p(network)
end

end
OR (without sourcing imp.jl outside of the module definition):
module Imp2

include("./imp.jl")
using .Imp

function p2(network)
    Imp.p(network)
end

end
In the second case your Julia code could look like:
julia> using Main.Imp2
julia> n = Imp2.Imp.Network([1,2])
Main.Imp2.Imp.Network(Any[1, 2])
julia> Imp2.p2(n)
Main.Imp2.Imp.Network(Any[1, 2])
Additionally, if you add export Imp to the Imp2 module, you can write Imp.Network([1,2]) instead of Imp2.Imp.Network([1,2]).
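For completeness, a sketch of that export Imp variant (only Imp2.jl changes; untested, but it should behave as described above):
# Imp2.jl -- sketch of the export Imp variant
module Imp2

include("./imp.jl")
using .Imp
export Imp   # re-export the nested module so `using Main.Imp2` brings `Imp` into scope

function p2(network)
    Imp.p(network)
end

end
After using Main.Imp2, you can then write Imp.Network([1,2]) directly.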

How to run a parallel function in Julia?

I would like to run the function f() on all 4 of my processors (Intel i7) and fetch each worker's sum of random numbers, as follows:
using Distributed

@everywhere function f()
    return sum(rand(10000))
end

@sync for w in workers()
    @async begin
        res = @spawnat w f()
        values[w-1] = fetch(res)
    end
end
But I get the following error:
ERROR: TaskFailedException

    nested task error: MethodError: no method matching setindex!(::typeof(values), ::Float64, ::Int64)
    Stacktrace:
     [1] macro expansion
       @ ./REPL[58]:4 [inlined]
     [2] (::var"#68#70"{Channel{Any}, Int64})()
       @ Main ./task.jl:411
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:369
 [2] top-level scope
   @ task.jl:388
Please guide me in resolving the issue!
For your code, the easiest way would be the following (assuming Julia has been started with the -p 4 command-line parameter, or you have run addprocs(4)):
julia> values = @distributed (append!) for i in 1:4
           [f()]
       end
4-element Vector{Float64}:
5001.232864826896
4999.244031827526
4966.883114472259
5014.022690758762
If you want to do the @spawns yourself, this code works:
julia> values = Vector{Float64}(undef, 4);

julia> @sync for w in workers()
           @async values[w-1] = fetch(@spawnat w f())
       end
julia> values
4-element Vector{Float64}:
5029.967318172736
4993.1064528029
5016.491407076979
5062.0706219606345
However, your code did not work mainly because values was not a Vector{Float64}: since no such variable was defined, the name values referred to Base's values function, which cannot be indexed. Here is how to replicate your error:
julia> vv() = 0
vv (generic function with 1 method)

julia> vv[1] = 11
ERROR: MethodError: no method matching setindex!(::typeof(vv), ::Int64, ::Int64)
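Not part of the original answer, but worth noting: pmap expresses the same pattern with less bookkeeping, since it schedules the calls on the workers and collects the results for you. A minimal sketch, assuming the same f as above:
using Distributed
addprocs(4)

@everywhere f() = sum(rand(10000))

# one remote call per element of the range; results come back in order
values = pmap(_ -> f(), 1:nworkers())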

Change the Julia prompt to include evaluation numbers

When debugging or running Julia code in the REPL, I usually see error messages like ... at ./REPL[161]:12 [inlined].... The number 161 presumably means the 161st evaluation in the REPL. My question: could we show this number in Julia's prompt, i.e. julia [161]> instead of julia>?
One of the advantages of Julia is its flexibility. This is very easy in Julia 0.7 (the nightly version at the time of writing).
julia> repl = Base.active_repl.interface.modes[1]
"Prompt(\"julia> \",...)"
julia> repl.prompt = () -> "julia[$(length(repl.hist.history) - repl.hist.start_idx + 1)] >"
#1 (generic function with 1 method)
julia[3] >
julia[3] >2
2
julia[4] >f = () -> error("e")
#3 (generic function with 1 method)
julia[5] >f()
ERROR: e
Stacktrace:
[1] error at .\error.jl:33 [inlined]
[2] (::getfield(, Symbol("##3#4")))() at .\REPL[4]:1
[3] top-level scope
You just need to put the first two lines into your ~/.juliarc.jl and enjoy.
Since the REPL changed quite a bit in Julia 0.7, this code does not work in older versions.
EDIT: Actually, it takes a little more effort to make this work from .juliarc.jl. Try this code:
atreplinit() do repl
    repl.interface = Base.REPL.setup_interface(repl)
    repl = Base.active_repl.interface.modes[1]
    repl.prompt = () -> "julia[$(length(repl.hist.history) - repl.hist.start_idx + 1)] >"
end
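For what it's worth, on Julia 1.x the same idea should still work, with Base.REPL replaced by the REPL standard library and ~/.juliarc.jl replaced by ~/.julia/config/startup.jl. An untested sketch of the adaptation:
# ~/.julia/config/startup.jl  (Julia >= 1.0; untested sketch)
using REPL

atreplinit() do repl
    repl.interface = REPL.setup_interface(repl)   # REPL stdlib instead of Base.REPL
    mode = repl.interface.modes[1]                # the main "julia>" prompt mode
    mode.prompt = () -> "julia[$(length(mode.hist.history) - mode.hist.start_idx + 1)] >"
end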

foreach loop (R/doParallel package) fails with a large number of iterations

I have the following R-code:
library(doParallel)
cl <- makeCluster(detectCores()-4, outfile = "")
registerDoParallel(cl)
calc <- function(i){
    ...
    # returns a dataframe
}
system.time(
    res <- foreach(i = 1:106800, .verbose = TRUE) %dopar% calc(i)
)
stopCluster(cl)
If I run that code for 1:5, it finishes successfully.
The same happens if I run it for 106000:106800.
But it fails if I run the full vector 1:106800, or even 100000:106800 (these are not the exact numbers I am working with, but they are easier to read), with this error message:
...
got results for task 6813
numValues: 6814, numResults: 6813, stopped: TRUE
returning status FALSE
got results for task 6814
numValues: 6814, numResults: 6814, stopped: TRUE
calling combine function
evaluating call object to combine results:
fun(accum, result.6733, result.6734, result.6735, result.6736,
result.6737, result.6738, result.6739, result.6740, result.6741,
result.6742, result.6743, result.6744, result.6745, result.6746,
result.6747, result.6748, result.6749, result.6750, result.6751,
result.6752, result.6753, result.6754, result.6755, result.6756,
result.6757, result.6758, result.6759, result.6760, result.6761,
result.6762, result.6763, result.6764, result.6765, result.6766,
result.6767, result.6768, result.6769, result.6770, result.6771,
result.6772, result.6773, result.6774, result.6775, result.6776,
result.6777, result.6778, result.6779, result.6780, result.6781,
result.6782, result.6783, result.6784, result.6785, result.6786,
result.6787, result.6788, result.6789, result.6790, result.6791,
result.6792, result.6793, result.6794, result.6795, result.6796,
result.6797, result.6798, result.6799, result.6800, result.6801,
result.6802, result.6803, result.6804, result.6805, result.6806,
result.6807, result.6808, result.6809, result.6810, result.6811,
result.6812, result.6813, result.6814)
returning status TRUE
Error in calc(i) :
task 1 failed - "object of type 'S4' is not subsettable"
I have no clue why I get this error message. Unfortunately, I cannot provide a reproducible example, since I cannot trigger it with simpler code. Is a single task failing? If so, how can I find out which one? Any other ideas on how to troubleshoot this?

How to build a logistic regression model in SparkR

I am new to Spark as well as SparkR. I have successfully installed Spark and SparkR.
When I tried to build a logistic regression model with R and Spark on a CSV file stored in HDFS, I got the error "incorrect number of dimensions".
My code is:
points <- cache(lapplyPartition(textFile(sc, "hdfs://localhost:54310/Henry/data.csv"), readPartition))
collect(points)
w <- runif(n=D, min = -1, max = 1)
cat("Initial w: ", w, "\n")

# Compute logistic regression gradient for a matrix of data points
gradient <- function(partition) {
    partition = partition[[1]]
    Y <- partition[, 1]   # point labels (first column of input file)
    X <- partition[, -1]  # point coordinates
    # For each point (x, y), compute gradient function
    dot <- X %*% w
    logit <- 1 / (1 + exp(-Y * dot))
    grad <- t(X) %*% ((logit - 1) * Y)
    list(grad)
}

for (i in 1:iterations) {
    cat("On iteration ", i, "\n")
    w <- w - reduce(lapplyPartition(points, gradient), "+")
}
The error message is:
On iteration 1
Error in partition[, 1] : incorrect number of dimensions
Calls: do.call ... func -> FUN -> FUN -> Reduce -> <Anonymous> -> FUN -> FUN
Execution halted
14/09/27 01:38:13 ERROR Executor: Exception in task 0.0 in stage 181.0 (TID 189)
java.lang.NullPointerException
at edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:125)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
14/09/27 01:38:13 WARN TaskSetManager: Lost task 0.0 in stage 181.0 (TID 189, localhost): java.lang.NullPointerException:
edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:125)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:701)
14/09/27 01:38:13 ERROR TaskSetManager: Task 0 in stage 181.0 failed 1 times; aborting job
Error in .jcall(getJRDD(rdd), "Ljava/util/List;", "collect") : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 181.0 failed 1 times, most recent failure: Lost task 0.0 in stage 181.0 (TID 189, localhost): java.lang.NullPointerException: edu.berkeley.cs.amplab.sparkr.RRDD.compute(RRDD.scala:125) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:701) Driver stacktrace:
Dimensions of the data (sample):
data <- read.csv("/home/Henry/data.csv")
dim(data)
[1] 17 541
What could be the possible reason for this error?
The problem is that textFile() reads text data and returns a distributed collection of strings, each of which corresponds to a line of the text file. Therefore, later in the program, partition[, -1] fails. The program's real intent seems to be to treat points as a distributed collection of data frames. We are working on providing data frame support in SparkR soon (SPARKR-1).
To resolve the issue, you can manipulate your partition using string operations to extract X and Y correctly. Another option is to produce a different kind of distributed collection from the beginning, as is done in examples/logistic_regression.R (you have probably seen this before).
