Which Python code written on the client side of Dask actually gets added to the task graph?
In this script, for example, I am reading a 4-dimensional HDF5 dataset, using a loop over the fourth dimension.
For each slice, called g here (for generation), I calculate the sum and subtract the previous generation's result from the current one's.
Then I call deriv.visualize() to see the graph it generates.
alive = []
derivate = []
board = []
deriv = 0
rest_1 = 0

hf5 = h5py.File('Datata.h5', 'r')
hds5 = hf5.get('dataset')
list(hf5.keys())
last_gen = hds5.attrs.get('last_gen')

for g in range(0, generations):
    board = hds5[g]
    arr = da.asarray(board, chunks=(4, 5, 4))
    res = arr.sum()
    if g != 0:
        deriv = res - rest_1
    rest_1 = res

deriv.visualize()
Here is the graph I am getting:
Without calling .compute(), the subtract operator is apparently added to the Dask graph. How do we explain this?
If I add a .compute(), making it res = arr.sum().compute(), and keep the rest as it is, where will the subtraction be executed: on the client side, or on one of the workers?
Another, more general question: if I want to keep the partial sums on the workers and perform the subtraction (between the partial sums of the current and previous generation) on the workers, is there a way to say that I want these operations to be performed on the same chunks across generations? (For example, worker 0 would always operate on the first 3 rows of each generation, somewhat like in MPI, even if it's not the same thing at all.)
Dask does not look at your Python code, and so cannot see anything other than what you give to it. In this case, that is these two lines:

arr = da.asarray(board, chunks=(4, 5, 4))
res = arr.sum().compute()
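This also explains why the subtraction shows up in the graph without .compute(): Dask arrays overload the arithmetic operators, so res - rest_1 (both Dask objects after the first iteration) just adds a subtract node to the graph rather than computing anything. Here is a minimal sketch of that laziness (the array is made up for illustration; .visualize() needs graphviz installed):

import dask.array as da

x = da.ones((4, 5, 4), chunks=(4, 5, 4))  # stand-in for one generation's board
res = x.sum()            # lazy: adds a sum node to the graph
deriv = res - 1.0        # still lazy: __sub__ adds a subtract node
deriv.visualize()        # draws the graph; nothing has been computed yet
print(deriv.compute())   # only now does any work actually run

Conversely, once you write res = arr.sum().compute(), res is a plain number, so res - rest_1 is ordinary Python arithmetic executed on the client.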
What's happening, folks.
So, I've done a fair amount of research on merge sort, and despite getting the "gist" of it, I am still baffled by how one is supposed to store the subarrays in order to merge them back together. In other words, how do you save them somewhere so that they "know" each other? In classic recursive fashion, you would otherwise have all these independent function calls returning data that, I would assume, goes out of scope.
Here's what I first thought: create a new array named "subs" to store the subarrays in upon each division (I also considered using a closure for this and would like to know whether that is advisable). But as you proceed to the next division, what are you going to do: replace each element in subs with its subarrays? Then you would be facing more costly work, especially once you consider how you're going to move things around in subs to ensure that each subarray has its own index.
Heh, I have a bad feeling that this might be a far cry from what's actually supposed to be done. I understand that this algorithm is a classic example of the divide-and-conquer approach, but it's just strange to me that one couldn't cut to the chase by splitting the array into all of its elements right off the bat (after all, that's the base case, and what would be wrong with throwing in a greedy approach?).
Thanks!
EDIT:
Alright, so I figured it out.
To sum it up: I used indices to track where to place elements, which obviates the need for built-in list functions that might slow things down.
By using nested functions and a (hidden) reference to the array being sorted, I kept the data in scope. An auxiliary array buffers data from the subarrays during each merge.
In retrospect, what I originally had in mind, which vaguely resembled insertion sort, was in fact bottom-up merge sort. Having previously questioned the efficiency and purpose of top-down merge sort, I now understand that by breaking down the problem it expedites comparisons and swaps (especially on larger lists, which insertion sort would be less efficient at sorting). I did not attempt to implement my initial idea because I did not have a clear enough picture of recursion and of how data is passed.
#!/usr/bin/env python3

def merge_sort(arr):
    def merge(head, tail, pivot):
        # Merge the sorted runs new[head..pivot] and new[pivot+1..tail]
        # into the auxiliary buffer, then copy the result back.
        i = head
        j = pivot + 1
        k = 0
        while i <= pivot and j <= tail:
            if new[i] <= new[j]:
                aux[k] = new[i]
                i += 1
            else:
                aux[k] = new[j]
                j += 1
            k += 1
        while i <= pivot:  # drain what remains of the left run
            aux[k] = new[i]
            i += 1
            k += 1
        while j <= tail:   # drain what remains of the right run
            aux[k] = new[j]
            j += 1
            k += 1
        for x in range(head, tail + 1):
            new[x] = aux[x - head]
    # end merge

    def split(head, tail):
        # Recursively halve new[head..tail], then merge the sorted halves.
        pivot = (head + tail) // 2
        if head < tail:
            split(head, pivot)
            split(pivot + 1, tail)
            merge(head, tail, pivot)
    # end split

    new = arr          # sorted in place; visible to the nested functions
    aux = list(new)    # auxiliary buffer used by merge
    split(0, len(new) - 1)
    return new
# end merge_sort

if __name__ == "__main__":
    loops = int(input().strip())
    for _ in range(loops):
        arr = list(map(int, input().strip().split()))
        result = merge_sort(arr)
        print(result)
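Since the edit mentions bottom-up merge sort (the "split into all of its elements right off the bat" idea from the original question), here is a minimal sketch of that iterative variant for comparison; the function name and structure are mine, not part of the original post:

def merge_sort_bottom_up(arr):
    # Treat each element as a sorted run of length 1, then repeatedly
    # merge adjacent runs of doubling width until one run remains.
    n = len(arr)
    src, dst = list(arr), [0] * n
    width = 1
    while width < n:
        for head in range(0, n, 2 * width):
            mid = min(head + width, n)       # end of the left run
            tail = min(head + 2 * width, n)  # end of the right run
            i, j, k = head, mid, head
            while i < mid and j < tail:
                if src[i] <= src[j]:
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
                k += 1
            dst[k:tail] = src[i:mid] if i < mid else src[j:tail]
        src, dst = dst, src  # the merged runs now live in src
        width *= 2
    return src

There is no recursion and no per-call slicing here: each pass is one linear sweep, and the whole sort stays O(n log n) with a single auxiliary buffer.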
I am trying to run a for loop in Julia using bounds for integration, where fI and r are arrays of the same length. I know this is incorrect, but it is the gist of what I want to do:
a = zeros(1:length(fI))
for i = 1:length(fI)
    a[i] = (fI[i+1] - fI[i])/(r[i+1] - r[i])
end
How can I set increments of n+1 in Julia? Haven't had any luck finding the answer elsewhere.
Just let me know if I can clarify anything. I'm still pretty new to the language.
Ranges are specified as start:stepsize:end, so the literal answer is for i = 1:(n+1):length(fI).
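For example, a step of 2 visits every other index:

for i = 1:2:9
    println(i)  # prints 1, 3, 5, 7, 9
end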
I am not completely sure what you want to do, but it looks as though you want to create a new variable from the differences between consecutive elements of the other variables. If that is your use case, you can use diff (note that the result has one element fewer than the input), e.g.
fI, r = rand(10), rand(10)
a = diff(fI) ./ diff(r)
Your code will crash, since for the last i you access beyond the array length: fI[i+1] becomes fI[length(fI)+1].

a = zeros(1:length(fI))
for i = 1:length(fI)
    a[i] = (fI[i+1] - fI[i])/(r[i+1] - r[i])
end
Maybe you intended the following:

n = length(fI) - 1
a = zeros(n)
for i = 1:n
    a[i] = (fI[i+1] - fI[i])/(r[i+1] - r[i])
end
I would like to maintain a variable on the GPU, and perform some operations on that variable in place. The following snippet is a minimalish example of this.
import numpy as np
import tensorflow as tf
with tf.Graph().as_default():
    i = tf.placeholder(tf.int32, [4], name='i')
    y = tf.placeholder(tf.float32, [4], name='y')
    _x = tf.get_variable('x', [4], initializer=tf.random_normal_initializer())
    x = _x + tf.reduce_sum(tf.mul(_x, y))
    assign_op = tf.assign(_x, x).op
    permute_op = tf.assign(_x, tf.gather(_x, i))

    ii = np.array([1, 2, 3, 0])
    yy = np.random.randn(4)

    s = tf.Session()
    s.run(tf.initialize_all_variables())
    xxx0 = s.run(_x)
    s.run([permute_op, assign_op], feed_dict={i: ii, y: yy})
    xxx1 = s.run(_x)
    print('assigned then permuted', np.allclose((xxx0 + np.dot(xxx0, yy))[ii], xxx1))
    print('permuted then assigned', np.allclose(xxx0[ii] + np.dot(xxx0[ii], yy), xxx1))
The problem is that this program is ambiguous about the ordering of the assign_op and permute_op operations, so one or the other of the final two print statements will print True, but which one varies randomly across runs of the program. I could break this into two steps, the first running permute_op and the second running assign_op, but it seems this would be less efficient.
Is there an efficient way of breaking the race condition and making the results predictable?
The easiest way to order the two assignments is to use the result of the first assignment as the variable input to the second one. This creates a data dependency between the assignments, which gives them a deterministic order. For example:
assigned = tf.assign(_x, x)
permuted = tf.assign(assigned, tf.gather(assigned, i))
sess.run(permuted.op) # Runs both assignments.
Note that I reversed the order of the permutation and assignment operations from what you said in your question, because doing the permutation first and then updating still has a race. Even if this isn't the semantics you wanted, the principle should hopefully be clear.
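If you do want permute-then-update semantics, here is a sketch along the same lines, recomputing the update from the permuted value so the data dependency carries through (this reuses the question's variables but is untested):

permuted = tf.assign(_x, tf.gather(_x, i))
updated = tf.assign(permuted, permuted + tf.reduce_sum(tf.mul(permuted, y)))
sess.run(updated.op)  # runs the permutation first, then the update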
An alternative approach is to use with tf.control_dependencies(ops): blocks, where ops is a list of operations (such as assignments) that must run before the operations in the with block. This is slightly trickier to use, because you have to be careful about reading the updated value of a variable. (Like a non-volatile variable in C, the read may be cached.) The typical idiom to force a read is to use tf.identity(var.ref()), so the example would look something like:
assign_op = tf.assign(_x, x).op
with tf.control_dependencies([assign_op]):
    # Read the updated value of `_x` after `assign_op` has run.
    new_perm = tf.gather(tf.identity(_x.ref()), i)
    permute_op = tf.assign(_x, new_perm).op

sess.run(permute_op)  # Runs both assignments.
I'm trying to run some R code and it is crashing because it runs out of memory. The error that I get is:
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
long vectors not supported yet: memory.c:3100
The function that causes the trouble is the following:
StationUserX <- function(userNDX) {
    lat1 <- deg2rad(geolocation$latitude[userNDX])
    long1 <- deg2rad(geolocation$longitude[userNDX])
    session_user_id <- as.character(geolocation$session_user_id[userNDX])
    # Find closest station
    Distance2Stations <- unlist(lapply(stationNDXs, Distance2StationX, lat1, long1))
    # Return index for closest station and distance to closest station
    stations_userX <- data.frame(session_user_id = session_user_id,
                                 station = ghcndstations$ID[stationNDXs],
                                 Distance2Station = Distance2Stations)
    stations_userX <- stations_userX[with(stations_userX, order(Distance2Station)), ]
    stations_userX <- stations_userX[1:100, ]  # only the 100 closest stations...
    row.names(stations_userX) <- NULL
    return(stations_userX)
}
I run this function 50k times using mclapply, and each call to StationUserX invokes Distance2StationX 90k times.
Is there an obvious way to optimize the function StationUserX?
mclapply is having trouble sending all the data back from the worker processes to the main process. That's because of prescheduling, where it runs a large number of iterations per worker and then syncs all the data back at once. That's nice and fast, but it results in more than 2 GB of data being sent back, which it can't do.
Run mclapply with mc.preschedule = FALSE to turn off prescheduling. Now each iteration will spawn its own process and return its own data. It won't go quite as fast, but it gets around the problem.
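A minimal sketch of that call, reusing the names from the question (userNDXs is assumed to hold the 50k user indices):

library(parallel)
stations <- mclapply(userNDXs, StationUserX,
                     mc.preschedule = FALSE,  # one fork per element, small result payloads
                     mc.cores = detectCores())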
Try using nextElem() from the iterators package. It acts like a generator in Python, so you don't have to hold the entire list in memory.
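A sketch of that pattern, writing each result to disk instead of accumulating everything in memory (the file naming is illustrative, not from the question):

library(iterators)
it <- iter(userNDXs)
repeat {
    ndx <- tryCatch(nextElem(it), error = function(e) NULL)  # nextElem() signals an error when exhausted
    if (is.null(ndx)) break
    res <- StationUserX(ndx)
    saveRDS(res, file = sprintf("stations_user_%d.rds", ndx))  # persist, don't accumulate
}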
I have around 85 lists of different sizes, L1 through L85, and I'm trying to create a new list in the following way:

allLists <- list(a = L1, b = L2, c = L3, ..., nn = L85)

This code is dynamically generated by Java, which creates the lists for later statistical calculations. When I run the code, all I get is a + after the end of the command. If I remove some of the lists and reduce allLists to 79 entries or fewer, the code runs without a problem; otherwise there's just the +.
Any idea will be appreciated.
So, I'm using the RCaller library in Java. The goal is to perform statistical analysis on experiments with x traits and n repetitions. First I build lists that contain all the calculations for each event, for example AVG, MED, etc. Later I need to build the "master list", which contains all the event lists with all their calculations, in order to run some statistical models on them. Basically, allLists looks as follows:

allLists <- list(trait1STDEV = res.trait1STDEV, trait1MeasureN = res.trait1MeasureN, trait1MeasureIMP = res.trait1MeasureIMP, trait1MeasureSIG = res.trait1MeasureSIG, trait2AVG = res.trait2AVG, trait2STDEV = res.trait2STDEV, ..., traitNSIG = res.traitNSIG)
So I've found the problem: apparently there's a limit on the maximum command length in R. When working in the console, this can be worked around by feeding a new line, but when the code is generated from Java that solution is not applicable. When generating R code from Java with RCaller, you need to install a package called Runiversal; after doing that, there's no problem with the maximum command length.
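An alternative that sidesteps the limit entirely is to have the Java side emit one short assignment per list instead of a single giant list(...) call, for example (names taken from the sketch above):

allLists <- list()
allLists$trait1STDEV <- res.trait1STDEV
allLists$trait1MeasureN <- res.trait1MeasureN
# ... one assignment per trait, emitted in a loop on the Java side ...
allLists$traitNSIG <- res.traitNSIG

Each command stays short, so the parser never sees an overlong line.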