Fixing a race condition in a TensorFlow graph run

I would like to maintain a variable on the GPU, and perform some operations on that variable in place. The following snippet is a minimalish example of this.
import numpy as np
import tensorflow as tf
with tf.Graph().as_default():
    i = tf.placeholder(tf.int32, [4], name='i')
    y = tf.placeholder(tf.float32, [4], name='y')
    _x = tf.get_variable('x', [4], initializer=tf.random_normal_initializer())
    x = _x + tf.reduce_sum(tf.mul(_x, y))
    assign_op = tf.assign(_x, x).op
    permute_op = tf.assign(_x, tf.gather(_x, i))

    ii = np.array([1, 2, 3, 0])
    yy = np.random.randn(4)

    s = tf.Session()
    s.run(tf.initialize_all_variables())
    xxx0 = s.run(_x)
    s.run([permute_op, assign_op], feed_dict={i: ii, y: yy})
    xxx1 = s.run(_x)
    print('assigned then permuted', np.allclose((xxx0 + np.dot(xxx0, yy))[ii], xxx1))
    print('permuted then assigned', np.allclose((xxx0[ii] + np.dot(xxx0[ii], yy)), xxx1))
The problem is that this program is ambiguous, in terms of the ordering of the assign_op and permute_op operations. Hence, one or the other of the final two print statements will be true, but which one that is varies randomly across multiple runs of the program. I could break this into two steps, the first running the permute_op and the second running the assign_op, but it seems this will be less efficient.
Is there an efficient way of breaking the race condition, and making the results predictable?

The easiest way to order the two assignments is to use the result of the first assignment as the variable input to the second one. This creates a data dependency between the assignments, which gives them a deterministic order. For example:
assigned = tf.assign(_x, x)
permuted = tf.assign(assigned, tf.gather(assigned, i))
sess.run(permuted.op) # Runs both assignments.
Note that I reversed the order of the permutation and assignment operations from what you said in your question, because doing the permutation first and then updating still has a race. Even if this isn't the semantics you wanted, the principle should hopefully be clear.
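For reference, here is a rough sketch of my own (not from the original answer) of how the same data-dependency trick could order things the other way around, assuming the update should be computed from the already-permuted values; it reuses the question's placeholders i and y and the answer's sess name:
permuted = tf.assign(_x, tf.gather(_x, i))
updated = permuted + tf.reduce_sum(tf.mul(permuted, y))  # reads the permuted value
permute_then_assign = tf.assign(permuted, updated)
sess.run(permute_then_assign.op, feed_dict={i: ii, y: yy})  # Runs the permutation, then the update.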
An alternative approach is to use with tf.control_dependencies(ops): blocks, where ops is a list of operations (such as assignments) that must run before the operations in the with block. This is slightly trickier to use, because you have to be careful about reading the updated value of a variable. (Like a non-volatile variable in C, the read may be cached.) The typical idiom to force a read is to use tf.identity(var.ref()), so the example would look something like:
assign_op = tf.assign(_x, x).op
with tf.control_dependencies([assign_op]):
    # Read updated value of `_x` after `assign_op`.
    new_perm = tf.gather(tf.identity(_x.ref()), i)
    permute_op = tf.assign(_x, new_perm).op

sess.run(permute_op)  # Runs both assignments.

Related

Problem with extracting values from a vector in a for loop

I am trying to extract values from a vector to generate random numbers from a GEV distribution, but I keep getting an error. This is my code:
x=rand(Truncated(Poisson(2),0,10),10)
t=[]
for i in 1:10 append!(t, maximum(rand(GeneralizedExtremeValue(2,4,3, x[i])))
I am new to this program and I think I am not passing the variable x properly. Any help will be appreciated. Thanks
If I am correctly understanding what you are trying to do, you might want something more like
x = rand(Truncated(Poisson(2),0,10),10)
t = Float64[]
for i in 1:10
    append!(t, max(rand(GeneralizedExtremeValue(2,4,3)), x[i]))
end
Among other things, you were missing a paren, and probably want max instead of maximum here.
Also, while it would technically work, t = [] creates an empty array of type Any, which tends to be very inefficient, so you can avoid that by just telling Julia what type you want that array to hold with e.g. t = Float64[].
Finally, since you already know t only needs to hold ten results, you can make this more efficient still by pre-allocating t:
x = rand(Truncated(Poisson(2),0,10),10)
t = Array{Float64}(undef,10)
for i in 1:10
    t[i] = max(rand(GeneralizedExtremeValue(2,4,3)), x[i])
end

Find which sum of any numbers in an array equals amount

I have a customer who sends electronic payments but doesn't bother to specify which invoices they cover. I'm left guessing which ones, and I would rather not try every single combination manually. I need some sort of pseudo-code to do it and then I can adapt it, but I'm not sure I can come up with a good algorithm myself. I'm familiar with PHP, Bash, and Python, but I can adapt.
I would need an array with the following numbers: [357.15, 223.73, 106.99, 89.96, 312.39, 120.00]. Those are the amounts of the invoices. Then I would need to find a sum of any combination of two or more of those numbers that adds up to 596.57. Once found the program would need to tell me exactly which numbers it used to reach the sum so I can then know which invoices got paid.
This is very similar to the Subset Sum problem and can be solved using a similar approach to the typical brute-force method used for that problem. I have to do this often enough that I keep a simple template of this algorithm handy for when I need it. What is posted below is a slightly modified version[1].
This has no restrictions on whether the values are integer or float. The basic idea is to iterate over the list of input values and keep a running list of every subset that sums to less than the target value (since there might be a later value in the inputs that will yield the target). It could be modified to handle negative values as well by removing the rule that only keeps candidate subsets if they sum to less than the target. In that case, you'd keep all subsets, and then search through them at the end.
import copy

def find_subsets(base_values, target):
    possible_matches = [[0, []]]  # [[known_attainable_value, [list, of, components]], [...], ...]
    matches = []  # we'll return ALL subsets that sum to `target`
    for base_value in base_values:
        temp = copy.deepcopy(possible_matches)  # Can't modify in loop, so use a copy
        for possible_match in possible_matches:
            new_val = possible_match[0] + base_value
            if new_val <= target:
                new_possible_match = [new_val, possible_match[1]]
                new_possible_match[1].append(base_value)
                temp.append(new_possible_match)
                if new_val == target:
                    matches.append(new_possible_match[1])
        possible_matches = temp
    return matches

# Usage: find_subsets([list, of, input, values], target_sum)
This is a very inefficient algorithm and it will blow up quickly as the size of the input grows. The Subset Sum problem is NP-Complete, so you are not likely to find a generalized solution that will work in all cases and is efficient.
[1]: The way lists are being used here is kludgy. If the goal was to simply find any match, the nested lists could be replaced with a dictionary, and we could exit right away once a match is found. But doing that will cause intermediate subsets that sum to the same value to also map to the same dictionary slot, so only one subset with that sum is kept. Since we need to report all matching subsets (because the values represent checks and are presumably not fungible even if the dollar amounts are equal), a dictionary won't work.
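As a usage sketch (not part of the original answer), the invoice amounts from the question could be converted to integer cents before calling find_subsets, which sidesteps floating-point equality comparisons:
invoices = [357.15, 223.73, 106.99, 89.96, 312.39, 120.00]
cents = [round(v * 100) for v in invoices]  # work in integer cents
target_cents = round(596.57 * 100)

matches = find_subsets(cents, target_cents)
# Convert each matching subset back to dollar amounts for reporting.
print([[c / 100 for c in subset] for subset in matches])
For these particular amounts the result is empty, which agrees with the integer run shown in the next answer.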
You can use itertools.combinations(t,r) to list all combinations of r elements in array t.
So we loop on the possible values of r, then on the results of itertools.combinations:
import itertools

def find_sum(t, obj):
    t = [x for x in t if x < obj]  # filter out elements which are too big
    for r in range(1, len(t)+1):  # loop on number of elements
        for subt in itertools.combinations(t, r):  # loop on combinations of r elements
            if sum(subt) == obj:
                return subt
    return None
find_sum([1,2,3,4], 6)
# (2, 4)
find_sum([1,2,3,4], 10)
# (1, 2, 3, 4)
find_sum([1,2,3,4], 11)
# none
find_sum([35715, 22373, 10699, 8996, 31239, 12000], 59657)
# none
Rounding errors:
The code above is meant to be used with integers, rather than floats.
To use with floats, replace the test sum(subt) == obj with a more forgiving test such as abs(sum(subt) - obj) < 0.01 (the abs() matters: without it, any subset summing to less than obj would pass).
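For example, a float-tolerant variant could look like this (just a sketch; the 0.01 tolerance is an assumption suited to dollar amounts):
import itertools

def find_sum_float(t, obj, tol=0.01):
    # Same brute-force search, but compares float sums within a tolerance.
    for r in range(1, len(t) + 1):
        for subt in itertools.combinations(t, r):
            if abs(sum(subt) - obj) < tol:
                return subt
    return None

find_sum_float([357.15, 223.73, 106.99, 89.96, 312.39, 120.00], 596.57)
# None -- consistent with the integer run above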
Relevant documentation:
itertools.combinations

Which Python code will be included in the Dask graph

Which Python code written on the client side of #dask actually gets added to the task graph?
In the following script, for example, I am reading a 4-dimensional HDF5 dataset, looping over the fourth dimension.
I calculate the sum for each slice (indexed here by g, for generation) and subtract the previous generation's result from the current one.
Then I call deriv.visualize() to see the graph that gets generated.
import h5py
import dask.array as da

alive = []
derivate = []
board = []
deriv = 0
rest_1 = 0

hf5 = h5py.File('Datata.h5', 'r')
hds5 = hf5.get('dataset')
list(hf5.keys())
last_gen = hds5.attrs.get('last_gen')

for g in range(0, generations):
    board = hds5[g]
    arr = da.asarray(board, chunks=(4, 5, 4))
    res = arr.sum()
    if g != 0:
        deriv = res - rest_1
    rest_1 = res

deriv.visualize()
Here is the graph I am getting.
Here, without calling .compute(), the subtract operator is apparently added to the Dask graph. How do we explain this?
If I add a .compute(), as in res = arr.sum().compute(), and keep the rest as it is, where will the subtraction be executed? On the client side, or in one of the workers?
Another, more general question: if I want to keep the partial sums in the workers and perform the subtraction (on the partial sums of the current and previous generations) in the workers, is there a way to say that I want these operations to be performed on the same chunks over different generations (for example, worker 0 would always operate on the first 3 rows of each generation, somewhat like in MPI, even if it's not the same thing at all)?
Dask does not look at your Python code, and so cannot see anything other than what you give to it. In this case it is these two lines:
arr = da.asarray(x, chunks=(4,5,4))
res = arr.sum().compute()
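To make the point concrete, here is a sketch of my own (not part of the quoted answer): if the per-generation sums and the subtractions should all live in one Dask graph and run on the workers, avoid calling .compute() inside the loop and only compute the final results. The names generations and hds5 are taken from the question.
import dask
import dask.array as da

sums = []
for g in range(generations):
    arr = da.asarray(hds5[g], chunks=(4, 5, 4))
    sums.append(arr.sum())  # lazy partial sums, nothing runs yet

derivs = [sums[g] - sums[g - 1] for g in range(1, generations)]
results = dask.compute(*derivs)  # sums and subtractions now run where the scheduler places them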

Swap names and contents of a (lookup) vector in a one-liner / library function

In my code I use lookup tables quite often, for example to have more verbose versions of column names in a data frame. For instance:
lkp <- c(speed = "Speed in mph", dist = "Stopping Distance in ft")
makePlot <- function(x = names(cars)) {
  x <- match.arg(x)
  hist(cars[[x]], xlab = lkp[[x]])
}
Now it happens that I want to reverse the lookup vector [*], which is easily done by
setNames(names(lkp), lkp)
If lkp is a bit more complicated, this becomes quite a lot of typing:
setNames(names(c(firstLkp, secondLkp, thirdLkp, youGotTheIdea)),
         c(firstLkp, secondLkp, thirdLkp, youGotTheIdea))
with a lot of redundant code. Of course I could create a temporary variable
fullLkp <- c(firstLkp, secondLkp, thirdLkp, youGotTheIdea)
setNames(names(fullLkp), fullLkp)
Or even write a simple function doing it for me
swap_names_content <- function(x) setNames(names(x), x)
However, since this seems to me to be such a common task, I was wondering whether there is already a function in one of the popular packages doing the same?
[*] A common use case for me is the use of shiny's selectInput for instance:
List of values to select from. If elements of the list are named, then that name rather than the value is displayed to the user.
That is, it is exactly the reverse of my typical lookup table.

Most efficient way (fastest) to modify a data.frame using indexing

A little introduction to the question:
I am developing an ecophysiological model, and I use a reference class list called S that stores every object the model needs for input/output (e.g. meteo, physiological parameters, etc.).
This list contains 5 objects (see example below):
- two data frames, S$Table_Day (the outputs from the model) and S$Met_c (the input meteo), which both have variables in columns and observations (input or output) in rows.
- a list of parameters S$Parameters.
- a matrix
- a vector
The model runs many functions with a daily time step. Each day is computed in a for loop that runs from the first day i=1 to the last day i=n. This list is passed to the functions, which often take data from S$Met_c and/or S$Parameters as input and compute something that is stored in S$Table_Day using indexes (the ith day). S is a Reference Class list because this avoids copies on modification, which is very important considering the number of computations.
The question itself:
As the model is very slow, I am trying to decrease computation time by micro-benchmarking different solutions.
Today I found something surprising when comparing two ways of storing my data. Storing data by indexing into one of the preallocated data frames takes longer than storing it in an undeclared vector. After reading this, I thought preallocating memory was always faster, but it seems that R performs more operations while modifying by index (probably checking the length, type, etc.).
My question is: is there a better way to perform such operations? In other words, is there a way to use/store the inputs/outputs more efficiently (in a data.frame, a list of vectors, or something else) to keep track of all the computations of each day? For example, would it be better to use many vectors (one for each variable) and regroup them into more complex objects (e.g. a list of data frames) at the end?
By the way, am I right to use Reference Classes to avoid copying the big objects in S while passing it to functions and modifying it from within them?
Reproducible example for the comparison:
library(microbenchmark)
library(ggplot2)

SimulationClass <- setRefClass("Simulation",
  fields = list(Table_Day = "data.frame",
                Met_c = "data.frame",
                PerCohortFruitDemand_c = "matrix",
                Parameters = "list",
                Zero_then_One = "vector"))
S = SimulationClass$new()

# Initializing the table with dummy numbers:
S$Table_Day = data.frame(one = 1:10000, two = rnorm(n = 10000), three = runif(n = 10000), Bud_dd = rep(0, 10000))
S$Met_c = data.frame(DegreeDays = rnorm(n = 10000, mean = 10, sd = 1))

f1 = function(i){
  a = cumsum(S$Met_c$DegreeDays[i:(i-1000)])
}
f2 = function(i){
  S$Table_Day$Bud_dd[(i-1000):i] = cumsum(S$Met_c$DegreeDays[i:(i-1000)])
}

res = microbenchmark(f1(1000), f2(1000), times = 10000)
autoplot(res)
And the result:
Also if someone has any experience in programming such models, I am deeply interested in any advice for model development.
I read more about the question, and I'll just write down here for posterity some of the solutions that were proposed in other posts.
Apparently, reading and writing are both worth considering when trying to reduce the computation time of assignment to a data.frame by index.
The sources are all found in other discussions:
How to optimize Read and Write to subsections of a matrix in R (possibly using data.table)
Faster i, j matrix cell fill
Time in getting single elements from data.table and data.frame objects
Several solutions appeared relevant:
- Use a matrix instead of a data.frame if possible to leverage in-place modification (Advanced R).
- Use a list instead of a data.frame, because [<-.data.frame is not a primitive function (Advanced R).
- Write functions in C++ and use Rcpp (from this source).
- Use .subset2 to read instead of [ (third source).
- Use data.table as recommended by @JulienNavarre and @Emmanuel-Lin and the different sources, and use either set for a data.frame or := if using a data.table is not a problem.
- Use [[ instead of [ when possible (index by one value only). This one is not very effective, and very restrictive, so I removed it from the following comparison.
Here is the analysis of performance using the different solutions:
The code:
# Loading packages:
library(data.table)
library(microbenchmark)
library(ggplot2)

# Creating dummy data:
SimulationClass <- setRefClass("Simulation",
  fields = list(Table_Day = "data.frame",
                Met_c = "data.frame",
                PerCohortFruitDemand_c = "matrix",
                Parameters = "list",
                Zero_then_One = "vector"))
S = SimulationClass$new()
S$Table_Day = data.frame(one = 1:10000, two = rnorm(n = 10000), three = runif(n = 10000), Bud_dd = rep(0, 10000))
S$Met_c = data.frame(DegreeDays = rnorm(n = 10000, mean = 10, sd = 1))

# Transforming data objects into simpler forms:
mat = as.matrix(S$Table_Day)
Slist = as.list(S$Table_Day)
Metlist = as.list(S$Met_c)
MetDT = as.data.table(S$Met_c)
SDT = as.data.table(S$Table_Day)

# Setting up the functions for the tests:
f1 = function(i){
  S$Table_Day$Bud_dd[i] = cumsum(S$Met_c$DegreeDays[i])
}
f2 = function(i){
  mat[i, 4] = cumsum(S$Met_c$DegreeDays[i])
}
f3 = function(i){
  mat[i, 4] = cumsum(.subset2(S$Met_c, "DegreeDays")[i])
}
f4 = function(i){
  Slist$Bud_dd[i] = cumsum(.subset2(S$Met_c, "DegreeDays")[i])
}
f5 = function(i){
  Slist$Bud_dd[i] = cumsum(Metlist$DegreeDays[i])
}
f6 = function(i){
  set(S$Table_Day, i = as.integer(i), j = "Bud_dd", cumsum(S$Met_c$DegreeDays[i]))
}
f7 = function(i){
  set(S$Table_Day, i = as.integer(i), j = "Bud_dd", MetDT[i, cumsum(DegreeDays)])
}
f8 = function(i){
  SDT[i, Bud_dd := MetDT[i, cumsum(DegreeDays)]]
}

i = 6000:6500
res = microbenchmark(f1(i), f3(i), f4(i), f5(i), f7(i), f8(i), times = 10000)
autoplot(res)
And the resulting autoplot:
With f1 the reference base assignment, f2 using a matrix instead of a data.frame, f3 using the combination of .subset2 and a matrix, f4 using a list and .subset2, f5 using two lists (for both reading and writing), f6 using data.table::set, f7 using data.table::set and a data.table for the cumulative sum, and f8 using data.table's :=.
As we can see, the best solution is to use lists for both reading and writing. It is pretty surprising to see that data.table is the worst solution here. I believe I did something wrong with it, because it is supposed to be the best. If you can improve it, please tell me.
