How to prevent global variable or array in module? - julia

I am coming from Fortran to Julia. I know in Julia want to prevent using global variables.
But the thing is, how to prevent using global variables in a module?
For example, in the following module,
module Mod
global AAA=zeros(1000000000)
function f(x)
change the most up to date AAA with x in some way.
return nothing
end
function g(x)
using the most up to date AAA, then change/update AAA with x in some way.
return nothing
end
end
In the above example, I need to update the very big array AAA, so I put AAA in the global. So f(x) and g(x) can use the most updated AAA and further update them.
Now since Julia discourage the use of global variable, so in my case, what do I do?
I mean, do I put AAA just in the argument of f(x) and g(x) such that they become f(x,AAA) and g(x,AAA)?
Is passing a very big array like AAA in the argument really faster than putting AAA just as a global variable?

It is possible to pass the array in question to the functions.
Here is an updated version of your pseudo code.
module Mod
AAA=zeros(1000000000)
function f!(ba, x) # the ! mark to indicate that ba will be updated
ba[1]=x
return nothing
end
function g!(ba, x)
ba[1] +=x
return nothing
end
function example()
f!(AAA,1)
g!(AAA,2)
#show(AAA[1])
end
end
I am using my phone, so there could be some typos, and I can't benchmark, but you could do it if you want to convince yourself there is no penalty in passing the array as an argument.
It is common practice to add a ! when the function mutates the content of the argument. Also, the arguments changed are put first in the list. Of course, these are only conventions, but it makes it easier for others to understand the intent of your code.

Related

My defined R function does not 'save' the changes made to a matrix [duplicate]

I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object, at least it seems that's the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function but the table itself is not changed. (I'm coming mainly from Ruby)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object: myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian?
Use some structure other than a function? An object method?
The reason you're having trouble is the fact that you are passing the object into the local namespace of the function. This is one of the great / terrible things about R: it allows implicit variable declarations and then implements supercedence as the namespaces become deeper.
This is affecting you because a function creates a new namespace within the current namespace. The object 'myTable' was, I assume, originally created in the global namespace, but when it is passed into the function 'title.asterisk' a new function-local namespace now has an object with the same properties. This works like so:
title.asterisk <- function(myTable){ do some stuff to 'myTable' }
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object supercedes the global object. If we call the function title.asterisk(myTable) in this way, the function makes changes only to the local variable.
There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object, like so:
title.asterisk <- function(myTable){
do some stuff to 'myTable'
return(myTable)
}
myTable <- title.asterisk(myTable)
This is okay, but you are still making your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of variable arguments, like so:
title.asterisk <- function(.myTable){
do some stuff to '.myTable'
return(.myTable)
}
myTable <- title.asterisk(myTable)
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like namespace supercedence when we're trying to debug our code later. It just makes things harder than they have to be.
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To make edits to a global namespace object, simply don't pass it into or declare it from within the function.
title.asterisk <- function(){ do some stuff to 'myTable' }
Now we are making direct edits to the object 'myTable' from within the function. The fact that we aren't passing the object makes our function look to higher levels of namespace to try and resolve the variable name. Lo, and behold, it finds a 'myTable' object higher up! The code in the function makes the changes to the object.
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
I wrap almost everything in a function. As I write my code, as soon as I get a piece working, I wrap it in a function and set it aside. I make heavy use of the '.' prefix for all my function arguments and use no prefix for anything that is native to the namespace it exists in.
I try not to modify global objects from within functions. I don't like where this leads. If an object needs to be modified, I modify it from within the function that declared it. This often means I have layers of functions calling functions, but it makes my work both modular and easy to understand.
I comment all of my code, explaining what each line or block is intended to do. It may seem a bit unrelated, but I find that these three things go together for me. Once you start wrapping coding in functions, you will find yourself wanting to reuse more of your old code. That's where good commenting comes in. For me, it's a necessary piece.
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as
`updt<-` <- function(x, ..., value) {
## x is the object to be manipulated, value the object to be assigned
x$lbl <- paste0(x$lbl, value)
x
}
with
> d <- data.frame(x=1:5, lbl=letters[1:5])
> d
x lbl
1 1 a
2 2 b
3 3 c
> updt(d) <- "*"
> d
x lbl
1 1 a*
2 2 b*
3 3 c*
This is the behavior of, for instance, $<- -- in-place update the element accessed by $. Here is a related question. One could think of replacement functions as syntactic sugar for
updt1 <- function(x, ..., value) {
x$lbl <- paste0(x$lbl, value)
x
}
d <- updt1(d, value="*")
but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It is enabling convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
For anybody in the future looking for a simple way (do not know if it is the more appropriate one) to get this solved:
Inside the function create the object to temporally save the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that has been passed to the function argument and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() to let your object be defined in the environment outside the function.
(MyTable <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
title.asterisk <- function(table) {
tmp.table <- paste0(table, "*")
name <- deparse(substitute(table))
assign(name, tmp.table, envir = parent.frame())
}
(title.asterisk(MyTable))
[1] "1*" "2*" "3*" "4*" "5*" "6*" "7*" "8*" "9*" "10*"
Using parentheses when defining an object is a little more efficient (and to me, better looking) than defining then printing.

R language: changes to the value of an attribute of an object inside a function is lost after function exits [duplicate]

I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object, at least it seems that's the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function but the table itself is not changed. (I'm coming mainly from Ruby)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object: myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian?
Use some structure other than a function? An object method?
The reason you're having trouble is the fact that you are passing the object into the local namespace of the function. This is one of the great / terrible things about R: it allows implicit variable declarations and then implements supercedence as the namespaces become deeper.
This is affecting you because a function creates a new namespace within the current namespace. The object 'myTable' was, I assume, originally created in the global namespace, but when it is passed into the function 'title.asterisk' a new function-local namespace now has an object with the same properties. This works like so:
title.asterisk <- function(myTable){ do some stuff to 'myTable' }
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object supercedes the global object. If we call the function title.asterisk(myTable) in this way, the function makes changes only to the local variable.
There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object, like so:
title.asterisk <- function(myTable){
do some stuff to 'myTable'
return(myTable)
}
myTable <- title.asterisk(myTable)
This is okay, but you are still making your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of variable arguments, like so:
title.asterisk <- function(.myTable){
do some stuff to '.myTable'
return(.myTable)
}
myTable <- title.asterisk(myTable)
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like namespace supercedence when we're trying to debug our code later. It just makes things harder than they have to be.
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To make edits to a global namespace object, simply don't pass it into or declare it from within the function.
title.asterisk <- function(){ do some stuff to 'myTable' }
Now we are making direct edits to the object 'myTable' from within the function. The fact that we aren't passing the object makes our function look to higher levels of namespace to try and resolve the variable name. Lo, and behold, it finds a 'myTable' object higher up! The code in the function makes the changes to the object.
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
I wrap almost everything in a function. As I write my code, as soon as I get a piece working, I wrap it in a function and set it aside. I make heavy use of the '.' prefix for all my function arguments and use no prefix for anything that is native to the namespace it exists in.
I try not to modify global objects from within functions. I don't like where this leads. If an object needs to be modified, I modify it from within the function that declared it. This often means I have layers of functions calling functions, but it makes my work both modular and easy to understand.
I comment all of my code, explaining what each line or block is intended to do. It may seem a bit unrelated, but I find that these three things go together for me. Once you start wrapping coding in functions, you will find yourself wanting to reuse more of your old code. That's where good commenting comes in. For me, it's a necessary piece.
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as
`updt<-` <- function(x, ..., value) {
## x is the object to be manipulated, value the object to be assigned
x$lbl <- paste0(x$lbl, value)
x
}
with
> d <- data.frame(x=1:5, lbl=letters[1:5])
> d
x lbl
1 1 a
2 2 b
3 3 c
> updt(d) <- "*"
> d
x lbl
1 1 a*
2 2 b*
3 3 c*
This is the behavior of, for instance, $<- -- in-place update the element accessed by $. Here is a related question. One could think of replacement functions as syntactic sugar for
updt1 <- function(x, ..., value) {
x$lbl <- paste0(x$lbl, value)
x
}
d <- updt1(d, value="*")
but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It is enabling convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
For anybody in the future looking for a simple way (do not know if it is the more appropriate one) to get this solved:
Inside the function create the object to temporally save the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that has been passed to the function argument and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() to let your object be defined in the environment outside the function.
(MyTable <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
title.asterisk <- function(table) {
tmp.table <- paste0(table, "*")
name <- deparse(substitute(table))
assign(name, tmp.table, envir = parent.frame())
}
(title.asterisk(MyTable))
[1] "1*" "2*" "3*" "4*" "5*" "6*" "7*" "8*" "9*" "10*"
Using parentheses when defining an object is a little more efficient (and to me, better looking) than defining then printing.

julia introspection - get name of variable passed to function

In Julia, is there any way to get the name of a passed to a function?
x = 10
function myfunc(a)
# do something here
end
assert(myfunc(x) == "x")
Do I need to use macros or is there a native method that provides introspection?
You can grab the variable name with a macro:
julia> macro mymacro(arg)
string(arg)
end
julia> #mymacro(x)
"x"
julia> #assert(#mymacro(x) == "x")
but as others have said, I'm not sure why you'd need that.
Macros operate on the AST (code tree) during compile time, and the x is passed into the macro as the Symbol :x. You can turn a Symbol into a string and vice versa. Macros replace code with code, so the #mymacro(x) is simply pulled out and replaced with string(:x).
Ok, contradicting myself: technically this is possible in a very hacky way, under one (fairly limiting) condition: the function name must have only one method signature. The idea is very similar the answers to such questions for Python. Before the demo, I must emphasize that these are internal compiler details and are subject to change. Briefly:
julia> function foo(x)
bt = backtrace()
fobj = eval(current_module(), symbol(Profile.lookup(bt[3]).func))
Base.arg_decl_parts(fobj.env.defs)[2][1][1]
end
foo (generic function with 1 method)
julia> foo(1)
"x"
Let me re-emphasize that this is a bad idea, and should not be used for anything! (well, except for backtrace display). This is basically "stupid compiler tricks", but I'm showing it because it can be kind of educational to play with these objects, and the explanation does lead to a more useful answer to the clarifying comment by #ejang.
Explanation:
bt = backtrace() generates a ... backtrace ... from the current position. bt is an array of pointers, where each pointer is the address of a frame in the current call stack.
Profile.lookup(bt[3]) returns a LineInfo object with the function name (and several other details about each frame). Note that bt[1] and bt[2] are in the backtrace-generation function itself, so we need to go further up the stack to get the caller.
Profile.lookup(...).func returns the function name (the symbol :foo)
eval(current_module(), Profile.lookup(...)) returns the function object associated with the name :foo in the current_module(). If we modify the definition of function foo to return fobj, then note the equivalence to the foo object in the REPL:
julia> function foo(x)
bt = backtrace()
fobj = eval(current_module(), symbol(Profile.lookup(bt[3]).func))
end
foo (generic function with 1 method)
julia> foo(1) == foo
true
fobj.env.defs returns the first Method entry from the MethodTable for foo/fobj
Base.decl_arg_parts is a helper function (defined in methodshow.jl) that extracts argument information from a given Method.
the rest of the indexing drills down to the name of the argument.
Regarding the restriction that the function have only one method signature, the reason is that multiple signatures will all be listed (see defs.next) in the MethodTable. As far as I know there is no currently exposed interface to get the specific method associated with a given frame address. (as an exercise for the advanced reader: one way to do this would be to modify the address lookup functionality in jl_getFunctionInfo to also return the mangled function name, which could then be re-associated with the specific method invocation; however, I don't think we currently store a reverse mapping from mangled name -> Method).
Note also that (1) backtraces are slow (2) there is no notion of "function-local" eval in Julia, so even if one has the variable name, I believe it would be impossible to actually access the variable (and the compiler may completely elide local variables, unused or otherwise, put them in a register, etc.)
As for the IDE-style introspection use mentioned in the comments: foo.env.defs as shown above is one place to start for "object introspection". From the debugging side, Gallium.jl can inspect DWARF local variable info in a given frame. Finally, JuliaParser.jl is a pure-Julia implementation of the Julia parser that is actively used in several IDEs to introspect code blocks at a high level.
Another method is to use the function's vinfo. Here is an example:
function test(argx::Int64)
vinfo = code_lowered(test,(Int64,))
string(vinfo[1].args[1][1])
end
test (generic function with 1 method)
julia> test(10)
"argx"
The above depends on knowing the signature of the function, but this is a non-issue if it is coded within the function itself (otherwise some macro magic could be needed).

Confusion with ' operator and bracketing where (v')*v becomes Ac_mul_B despite overloading

I am playing around with the idea of CoVectors in Julia and am getting some behaviour I didn't expect from the parser/compiler. I have defined a new CoVector type that is the ctranspose of any vector, and is just a simple decoration:
type CoVector{T<:AbstractVector}
v::T
end
They can be created (and uncreated) with ' using ctranspose:
import Base.ctranspose
function CoVector(T::DataType,d::Integer=0)
return CoVector(Array(T,d))
end
function Base.ctranspose(cv::CoVector)
return cv.v
end
function Base.ctranspose(v::AbstractVector)
return CoVector(v)
end
function Base.ctranspose(v::Vector) # this is already specialized in Base
return CoVector(v)
end
Next I want to define a simple dot product
function *(x::CoVector,y::AbstractVector)
return dot(x.v,y)
end
Which can work fine for:
v = [1,2,3]
cv = v'
cv*v
returns 14, and cv is a CoVector. But if I do
(v') * v
I get something different! In this case it is a single element array containing 14. How come parenthesis doesn't work how I expect?
In the end we see the expression gets expanded to Ac_mul_B which defaults to [dot(A,B)] and it seems that this interpretation is defined at the "operator" level.
Is this expected behaviour? Can Julia completely ignore my bracketing and change the expression as it wants? In Julia I like that I can override things in Base but is it also possible to make self-consistent changes to how operators are applied? I see the expression doesn't have head call but Symbol call... does this change in Julia 0.4? (I read somewhere that call is becoming more universal).
I guess I can fix the problem by redefining Ac_mul_B but I was very surprised that it was called at all, given my above definitions.

Returning multiple values in Ruby, to be used to call a function

Is it possible to return multiple values from a function?
I want to pass the return values into another function, and I wonder if I can avoid having to explode the array into multiple values
My problem?
I am upgrading Capybara for my project, and I realized, thanks to CSS 'contains' selector & upgrade of Capybara, that the statement below will no longer work
has_selector?(:css, "#rightCol:contains(\"#{page_name}\")")
I want to get it working with minimum effort (there are a lot of such cases), So I came up with the idea of using Nokogiri to convert the css to xpath. I wanted to write it so that the above function can become
has_selector? xpath(:css, "#rightCol:contains(\"#{page_name}\")")
But since xpath has to return an array, I need to actually write this
has_selector?(*xpath(:css, "#rightCol:contains(\"#{page_name}\")"))
Is there a way to get the former behavior?
It can be assumed that right now xpath func is like the below, for brevity.
def xpath(*a)
[1, 2]
end
You cannot let a method return multiple values. In order to do what you want, you have to change has_selector?, maybe something like this:
alias old_has_selector? :has_selector?
def has_selector? arg
case arg
when Array then old_has_selector?(*arg)
else old_has_selector?(arg)
end
end
Ruby has limited support for returning multiple values from a function. In particular a returned Array will get "destructured" when assigning to multiple variables:
def foo
[1, 2]
end
a, b = foo
a #=> 1
b #=> 2
However in your case you need the splat (*) to make it clear you're not just passing the array as the first argument.
If you want a cleaner syntax, why not just write your own wrapper:
def has_xpath?(xp)
has_selector?(*xpath(:css, xp))
end

Resources