I have a dictionary that maps a key to a function object. Then, using Spark 1.4.1 (Spark may not even be relevant for this question), I try to map each object in the RDD using a function object retrieved from the dictionary (acts as look-up table). e.g. a small snippet of my code:
fnCall = groupFnList[0].fn
pagesRDD = pagesRDD.map(lambda x: [x, fnCall(x[0])]).map(shapeToTuple)
Now, it has fetched from a namedtuple the function object. Which I temporarily 'store' (c.q. pointing to fn obj) in FnCall. Then, using the map operations I want the x[0] element of each tuple to be processed using that function.
All works fine and good in that there indeed IS a fn object, but it behaves in a weird way.
Each time I call an action method on the RDD, even without having used a fn obj in between, the RDD values have changed! To visualize this I have created dummy functions for the fn objects that just output a random integer. After calling the fn obj on the RDD, I can inspect it with .take() or .first() and get the following:
pagesRDD.first()
>>> [(u'myPDF1.pdf', u'34', u'930', u'30')]
pagesRDD.first()
>>> [(u'myPDF1.pdf', u'23', u'472', u'11')]
pagesRDD.first()
>>> [(u'myPDF1.pdf', u'4', u'69', u'25')]
So it seems to me that the RDD's elements have the functions bound to them in some way, and each time I do an action operation (like .first(), very simple) it 'updates' the RDD's contents.
I don't want this to happen! I just want the function to process the RDD ONLY when I call it with a map operation. How can I 'unbind' this function after the map operation?
Any ideas?
Thanks!
####### UPDATE:
So apparently rewriting my code to call it like pagesRDD.map(fnCall) should do the trick, but why should this even matter? If I call
rdd = rdd.map(lambda x: (x,1))
rdd.first()
>>> # some output
rdd.first()
>>> # same output as before!
So in this case, using a lambda function it would not get bound to the rdd and would not be called each time I do a .take()-like action. So why is that the case when I use a fn object INSIDE the lambda? Logically it just does not make sense to me. Any explanation on this?
If you redefine your functions that their parameter is an iterable. Your code should look like this.
pagesRDD = pagesRDD.map(fnCall).map(shapeToTuple)
Related
How do I retrieve outputs from objects in an array as described in the background?
I have a function in R that returns multiple variables. For eg. if my function is called function_ABC,then:
a<-function_ABC (input_var)
gives a such that a$var1, a$var2, and a$var3 exist.
I have multiple cases to run such that I have put then in an array:
input_var <- c(1, 2, ...15)
for storing the outputs, I declared var such that:
var <- c(v1, v2, v3, .... v15)
Then I run:
assign(v1[i],function(input_var(i)))
However, after that I am unable to access these variables as v1[1]$var1. I can access them as: v1$var1, or v3$var1, etc. But this means I need to write 15*3 commands to retrieve my output.
Is there an easier way to do this?
Push your whole input set into an array Arr[ ].
Open a multi threaded executor E of certain size N.
Using a for loop on the input array Arr[], submit your function calls as a Callable job to the executor E. While submitting each job, hold the reference to the FutureTask in another Array FTArr[ ].
When all the FutureTask jobs are executed, you may retrieve the output for each of them by running another for loop on FTArr[ ].
Note :
• make sure to add synchronized block in your func_ABC, where you are accessing shared resources to avoid deadlocks.
• Please refer to the below link, if you want to know more about the usage of a count-down-latch. A count-down-latch helps you to find out, when exactly, all the child threads have finished execution.
https://www.geeksforgeeks.org/countdownlatch-in-java/
I've created my immutable Tensor_field and a function nabla that acts on the tensor (that is nabla(a::Tensor_field) = do something.
I've added a method to function dot for the tensor: Base.dot(a::Tensor_field, b::Tensor_field) = do something....
Now I want to define a new behavior to function dot with nabla as an argument.
Something like Base.dot(nabla::function, a::Tensor_field) = do something different.
I know in Julia we are able to pass functions as arguments to other functions, but I couldn't find in the docs how to define a method for a "functional" argument.
If I type typeof(nabla) the output is My_Module_Name.#nabla, not a real DataType...
If you want it to work for any function, you can do
Base.dot(f::Function, a::Tensor_field) = do something different
If you only want it to work for the nabla function already defined, you can take advantage of what you have discovered, namely that each function has a unique type:
Base.dot(f::typeof(nabla), a::Tensor_field) = do something different
This will match only the function called nabla, which will now be called f inside the function dot.
Note that you can write ∇ as \nabla<TAB> and use it in your code instead of the word nabla. If your tensor field is called e.g. 𝐯 (written as \mbfv<TAB>), you can then write ∇⋅𝐯 in your Julia code! (The centered dot is written as \cdot<TAB>, and is an alias for the dot function.)
In Julia am I allowed to create & use static fields? Let me explain my problem with a simplified example. Let's say we have a type:
type Foo
bar::Dict()
baz::Int
qux::Float64
function Foo(fname,baz_value,qux_value)
dict = JLD.load(fname)["dict"] # It is a simple dictionary loading from a special file
new(dict,baz_value,quz_value)
end
end
Now, As you can see, I load a dictionary from a jld file and store it into the Foo type with the other two variables baz and qux_value. Now, let's say I will create 3 Foo object type.
vars = [ Foo("mydictfile.jld",38,37.0) for i=1:3]
Here, as you can see, all of the Foo objects load the same dictionary. This is a quite big file (~10GB)and I don't want to load it many times. So,
I simply ask that, is there any way in julia so that, I load it just once and all of there 3 types can reach it? (That's way I simply use Static keyword inside the question)
For such a simple question, my approach might look like silly, but as a next step, I make this Foo type iterable and I need to use this dictionary inside the next(d::Foo, state) function.
EDIT
Actually, I've found a way right now. But I want to ask that whether this is a correct or not.
Rather than giving the file name to the FOO constructor, If I load the dictionary into a variable before creating the objects and give the same variable into all of the constructors, I guess all the constructors just create a pointer to the same dictionary rather than creating again and again. Am I right ?
So, modified version will be like that:
dict = JLD.load("mydictfile.jld")["dict"]
vars = [ Foo(dict,38,37.0) for i=1:3]
By the way,I still want to hear if I do the same thing completely inside the Foo type (I mean constructor of it)
You are making the type "too special" by adding the inner constructor. Julia provides default constructors if you do not provide an inner constructor; these just fill in the fields in an object of the new type.
So you can do something like:
immutable Foo{K,V}
bar::Dict{K,V}
baz::Int
qux::Float64
end
dict = JLD.load("mydictfile.jld")["dict"]
vars = [Foo(dict, i, i+1) for i in 1:3]
Note that it was a syntax error to include the parentheses after Dict in the type definition.
The {K,V} makes the Foo type parametric, so that you can make different kinds of Foo type, with different Dict types inside, if necessary. Even if you only use it for a single type of Dict, this will give more efficient code, since the type parameters K and V will be inferred when you create the Foo object. See the Julia manual: http://docs.julialang.org/en/release-0.5/manual/performance-tips/#avoid-fields-with-abstract-containers
So now you can try the code without even having the JLD file available (as we do not, for example):
julia> dict = Dict("a" => 1, "b" => 2)
julia> vars = [Foo(dict, i, Float64(i+1)) for i in 1:3]
3-element Array{Foo{String,Int64},1}:
Foo{String,Int64}(Dict("b"=>2,"a"=>1),1,2.0)
Foo{String,Int64}(Dict("b"=>2,"a"=>1),2,3.0)
Foo{String,Int64}(Dict("b"=>2,"a"=>1),3,4.0)
You can see that it is indeed the same dictionary (i.e. only a reference is actually stored in the type object) by modifying one of them and seeing that the others also change, i.e. that they point to the same dictionary object:
julia> vars[1].bar["c"] = 10
10
julia> vars
3-element Array{Foo{String,Int64},1}:
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),1,2.0)
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),2,3.0)
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),3,4.0)
Given the following Go method:
func (t *T) TMethod(data *testData) (interface{}, *error) {
...
}
I want to reflect the name of the parameter (which is data here).
I tried the following, but it returns the structure name (which is testData here):
reflect.ValueOf(T).MethodByName("TMethod").Type.In(0).Elem().Name()
How can I get the name of the parameter?
There is no way to get the names of the parameters of a method or a function.
The reason for this is because the names are not really important for someone calling a method or a function. What matters is the types of the parameters and their order.
A Function type denotes the set of all functions with the same parameter and result types. The type of 2 functions having the same parameter and result types is identical regardless of the names of the parameters. The following code prints true:
func f1(a int) {}
func f2(b int) {}
fmt.Println(reflect.TypeOf(f1) == reflect.TypeOf(f2))
It is even possible to create a function or method where you don't even give names to the parameters (within a list of parameters or results, the names must either all be present or all be absent). This is valid code:
func NamelessParams(int, string) {
fmt.Println("NamelessParams called")
}
For details and examples, see Is unnamed arguments a thing in Go?
If you want to create some kind of framework where you call functions passing values to "named" parameters (e.g. mapping incoming API params to Go function/method params), you may use a struct because using the reflect package you can get the named fields (e.g. Value.FieldByName() and Type.FieldByName()), or you may use a map. See this related question: Initialize function fields
Here is a relevant discussion on the golang-nuts mailing list.
I've read through previous topics on closures on stackflow and other sources and one thing is still confusing me. From what I've been able to piece together technically a closure is simply the set of data containing the code of a function and the value of bound variables in that function.
In other words technically the following C function should be a closure from my understanding:
int count()
{
static int x = 0;
return x++;
}
Yet everything I read seems to imply closures must somehow involve passing functions as first class objects. In addition it usually seems to be implied that closures are not part of procedural programming. Is this a case of a solution being overly associated with the problem it solves or am I misunderstanding the exact definition?
No, that's not a closure. Your example is simply a function that returns the result of incrementing a static variable.
Here's how a closure would work:
function makeCounter( int x )
{
return int counter() {
return x++;
}
}
c = makeCounter( 3 );
printf( "%d" c() ); => 4
printf( "%d" c() ); => 5
d = makeCounter( 0 );
printf( "%d" d() ); => 1
printf( "%d" c() ); => 6
In other words, different invocations of makeCounter() produce different functions with their own binding of variables in their lexical environment that they have "closed over".
Edit: I think examples like this make closures easier to understand than definitions, but if you want a definition I'd say, "A closure is a combination of a function and an environment. The environment contains the variables that are defined in the function as well as those that are visible to the function when it was created. These variables must remain available to the function as long as the function exists."
For the exact definition, I suggest looking at its Wikipedia entry. It's especially good. I just want to clarify it with an example.
Assume this C# code snippet (that's supposed to perform an AND search in a list):
List<string> list = new List<string> { "hello world", "goodbye world" };
IEnumerable<string> filteredList = list;
var keywords = new [] { "hello", "world" };
foreach (var keyword in keywords)
filteredList = filteredList.Where(item => item.Contains(keyword));
foreach (var s in filteredList) // closure is called here
Console.WriteLine(s);
It's a common pitfall in C# to do something like that. If you look at the lambda expression inside Where, you'll see that it defines a function that it's behavior depends on the value of a variable at its definition site. It's like passing a variable itself to the function, rather than the value of that variable. Effectively, when this closure is called, it retrieves the value of keyword variable at that time. The result of this sample is very interesting. It prints out both "hello world" and "goodbye world", which is not what we wanted. What happened? As I said above, the function we declared with the lambda expression is a closure over keyword variable so this is what happens:
filteredList = filteredList.Where(item => item.Contains(keyword))
.Where(item => item.Contains(keyword));
and at the time of closure execution, keyword has the value "world," so we're basically filtering the list a couple times with the same keyword. The solution is:
foreach (var keyword in keywords) {
var temporaryVariable = keyword;
filteredList = filteredList.Where(item => item.Contains(temporaryVariable));
}
Since temporaryVariable is scoped to the body of the foreach loop, in every iteration, it is a different variable. In effect, each closure will bind to a distinct variable (those are different instances of temporaryVariable at each iteration). This time, it'll give the correct results ("hello world"):
filteredList = filteredList.Where(item => item.Contains(temporaryVariable_1))
.Where(item => item.Contains(temporaryVariable_2));
in which temporaryVariable_1 has the value of "hello" and temporaryVariable_2 has the value "world" at the time of closure execution.
Note that the closures have caused an extension to the lifetime of variables (their life were supposed to end after each iteration of the loop). This is also an important side effect of closures.
From what I understand a closure also has to have access to the variables in the calling context. Closures are usually associated with functional programming. Languages can have elements from different types of programming perspectives, functional, procedural, imperative, declarative, etc. They get their name from being closed over a specified context. They may also have lexical binding in that they can reference the specified context with the same names that are used in that context. Your example has no reference to any other context but a global static one.
From Wikipedia
A closure closes over the free variables (variables which are not local variables)
A closure is an implementation technique for representing procedures/functions with local state. One way to implement closures is described in SICP. I will present the gist of it, anyway.
All expressions, including functions are evaluated in an environement, An environment is a sequence of frames. A frame maps variable names to values. Each frame also has a
pointer to it's enclosing environment. A function is evaluated in a new environment with a frame containing bindings for it's arguments. Now let us look at the following interesting scenario. Imagine that we have a function called accumulator, which when evaluated, will return another function:
// This is some C like language that has first class functions and closures.
function accumulator(counter) {
return (function() { return ++counter; });
}
What will happen when we evaluate the following line?
accum1 = accumulator(0);
First a new environment is created and an integer object (for counter) is bound to 0 in it's first frame. The returned value, which is a new function, is bound in the global environment. Usually the new environment will be garbage collected once the function
evaluation is over. Here that will not happen. accum1 is holding a reference to it, as it needs access to the variable counter. When accum1 is called, it will increment the value of counter in the referenced environment. Now we can call accum1 a function with local state or a closure.
I have described a few practical uses of closures at my blog
http://vijaymathew.wordpress.com. (See the posts "Dangerous designs" and "On Message-Passing").
There's a lot of answers already, but I'll add another one anyone...
Closures aren't unique to functional languages. They occur in Pascal (and family), for instance, which has nested procedures. Standard C doesn't have them (yet), but IIRC there is a GCC extension.
The basic issue is that a nested procedure may refer to variables defined in it's parent. Furthermore, the parent may return a reference to the nested procedure to its caller.
The nested procedure still refers to variables that were local to the parent - specifically to the values those variables had when the line making the function-reference was executed - even though those variables no longer exist as the parent has exited.
The issue even occurs if the procedure is never returned from the parent - different references to the nested procedure constructed at different times may be using different past values of the same variables.
The resolution to this is that when the nested function is referenced, it is packaged up in a "closure" containing the variable values it needs for later.
A Python lambda is a simple functional-style example...
def parent () :
a = "hello"
return (lamda : a)
funcref = parent ()
print funcref ()
My Pythons a bit rusty, but I think that's right. The point is that the nested function (the lambda) is still referring to the value of the local variable a even though parent has exited when it is called. The function needs somewhere to preserve that value until it's needed, and that place is called a closure.
A closure is a bit like an implicit set of parameters.
Great question! Given that one of the OOP principles of OOP is that objects has behavior as well as data, closures are a special type of object because their most important purpose is their behavior. That said, what do I mean when I talk about their "behavior?"
(A lot of this is drawn from "Groovy in Action" by Dierk Konig, which is an awesome book)
On the simplest level a close is really just some code that's wrapped up to become an androgynous object/method. It's a method because it can take params and return a value, but it's also an object in that you can pass around a reference to it.
In the words of Dierk, imagine an envelope that has a piece of paper inside. A typical object would have variables and their values written on this paper, but a closure would have a list of instructions instead. Let's say the letter says to "Give this envelope and letter to your friends."
In Groovy: Closure envelope = { person -> new Letter(person).send() }
addressBookOfFriends.each (envelope)
The closure object here is the value of the envelope variable and it's use is that it's a param to the each method.
Some details:
Scope: The scope of a closure is the data and members that can be accessed within it.
Returning from a closure: Closures often use a callback mechanism to execute and return from itself.
Arguments: If the closure needs to take only 1 param, Groovy and other langs provide a default name: "it", to make coding quicker.
So for example in our previous example:
addressBookOfFriends.each (envelope)
is the same as:
addressBookOfFriends.each { new Letter(it).send() }
Hope this is what you're looking for!
An object is state plus function.
A closure, is function plus state.
function f is a closure when it closes over (captured) x
I think Peter Eddy has it right, but the example could be made more interesting. You could define two functions which close over a local variable, increment & decrement. The counter would be shared between that pair of functions, and unique to them. If you define a new pair of increment/decrement functions, they would be sharing a different counter.
Also, you don't need to pass in that initial value of x, you could let it default to zero inside the function block. That would make it clearer that it's using a value which you no longer have normal access to otherwise.