Why does push!() add duplicate elements to a Set? - julia

When using a Set of a composite type in Julia, the push! function seems to add duplicate items to the set. Reading the Julia standard documentation, I assumed that the isequal function would be used to test for duplicates. I guess I misunderstood, so perhaps someone can help me out.
As an example see the code below. In particular, I'd like to know why t2 is added to the set, despite being identical to t1.
Any help is very much appreciated. Note: In my case two variables of type t are considered identical if the fields x1 and x2 are equal; the values of the remaining fields do not matter.
type t
x1::Float64
x2::Float64
b1::Bool
b2::Bool
end
isequal( tx::t, ty::t) = (tx.x1 == ty.x1) && (tx.x2 == ty.x2)
==(tx::t, ty::t) = (tx.x1 == ty.x1) && (tx.x2 == ty.x2)
t1 = t( 1, 2, true, true)
t2 = t( 1, 2, true, true)
tc = t1
tdc = deepcopy( t1)
[ t1 == t2 isequal( t1, t2)] # ---> [ true true ]
[ t1 == tc isequal( t1, tc)] # ---> [ true true ]
[ t1 == tdc isequal( t1, tdc)] # ---> [ true true ]
s = Set{t}()
push!( s, t1)
push!( s, t2) # adds t2 to the set although t2 and t1 are identical ...
push!( s, tc) # does not add ...
push!( s, tdc) # adds tdc although tdc and t1 are identical

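As DSM indicated, you simply need to add a method for hash for your type. Set is built on a hash table, so isequal is only consulted for elements whose hash values collide, and the default hash of a mutable type like t does not look at the field values. Hash exactly the fields that isequal compares, i.e.:
hash(x::t, h) = hash(x.x2, hash(x.x1, h))
In current Julia versions the method has to extend Base.hash (write Base.hash(...) or use import Base: hash) for Set to pick it up.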
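The same equality/hash contract exists in other hash-based set implementations. As an analogy only (Python rather than Julia, with hypothetical class names T and EqOnly), here is a minimal sketch of what happens when equality is customised but the hash stays identity-based, and how hashing the compared fields fixes it:

```python
class EqOnly:
    """Custom equality, but the hash is still identity-based (like the Julia default)."""
    def __init__(self, x1, x2):
        self.x1, self.x2 = x1, x2

    def __eq__(self, other):
        return isinstance(other, EqOnly) and (self.x1, self.x2) == (other.x1, other.x2)

    # Python would otherwise make the class unhashable; keep the identity hash on purpose.
    __hash__ = object.__hash__


class T:
    """Equality and hash both depend only on x1 and x2; b1 and b2 are ignored."""
    def __init__(self, x1, x2, b1, b2):
        self.x1, self.x2, self.b1, self.b2 = x1, x2, b1, b2

    def __eq__(self, other):
        return isinstance(other, T) and (self.x1, self.x2) == (other.x1, other.x2)

    def __hash__(self):
        return hash((self.x1, self.x2))


a, b = EqOnly(1.0, 2.0), EqOnly(1.0, 2.0)
duplicates = {a, b}          # equal objects, different hashes: both are stored

t1, t2 = T(1.0, 2.0, True, True), T(1.0, 2.0, False, False)
deduplicated = {t1, t2}      # equal objects, equal hashes: only one is stored
```

The rule of thumb carries over to Julia: whenever isequal(a, b) is true, hash(a) must equal hash(b).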


ellipsis ... as function in substitute?

I'm having trouble understanding how/why parentheses work where they otherwise should not work®.
f = function(...) substitute(...()); f(a, b)
[[1]]
a
[[2]]
b
# but, substitute returns ..1
f2 = function(...) substitute(...); f2(a, b)
a
Normally an error is thrown, could not find function "..." or '...' used in an incorrect context, for example when calling (\(...) ...())(5).
What I've tried
I have looked at the source code of substitute to find out why this doesn't happen here. R Internals 1.1.1 and 1.5.2 say ... is of SEXPTYPE DOTSXP, a pairlist of promises. These promises are what is extracted by substitute.
# \-substitute #R
# \-do_substitute #C
# \-substituteList #C recursive
# \-substitute #C
Going line-by-line, I am stuck at substituteList, in which h is the current element of ... being processed. This happens recursively at line 2832 if (TYPEOF(h) == DOTSXP) h = substituteList(h, R_NilValue);. I haven't found exception handling of a ...() case in the source code, so I suspect something before this has happened.
In ?substitute we find substitute works on a purely lexical basis. Does it mean ...() is a parser trick?
parse(text = "(\\(...) substitute(...()))(a, b)") |> getParseData() |> subset(text == "...", select = c(7, 9))
#> token text
#> 4 SYMBOL_FORMALS ...
#> 10 SYMBOL_FUNCTION_CALL ...
The second ellipsis is recognized during lexical analysis as the name of a function call. It doesn't have its own token like |> does. The output is a pairlist ( typeof(f(a, b)) ), which in this case is the same as a regular list (?). I guess it is not a parser trick. But whatever it is, it has been around for a while!
Question:
How does ...() work?
Note: When referring to documentation and source code, I provide links to an unofficial GitHub mirror of R's official Subversion repository. The links are bound to commit 97b6424 in the GitHub repo, which maps to revision 81461 in the Subversion repo (the latest at the time of this edit).
substitute is a "special" whose arguments are not evaluated (doc).
typeof(substitute)
[1] "special"
That means that the return value of substitute may not agree with parser logic, depending on how the unevaluated arguments are processed internally.
In general, substitute receives the call ...(<exprs>) as a LANGSXP of the form (pseudocode) pairlist(R_DotsSymbol, <exprs>) (doc). The context of the substitute call determines how the SYMSXP R_DotsSymbol is processed. Specifically, if substitute was called inside of a function with ... as a formal argument and rho as its execution environment, then the result of
findVarInFrame3(rho, R_DotsSymbol, TRUE)
in the body of C utility substituteList (source) is either a DOTSXP or R_MissingArg—the latter if and only if f was called without arguments (doc). In other contexts, the result is R_UnboundValue or (exceptionally) some other SEXP—the latter if and only if a value is bound to the name ... in rho. Each of these cases is handled specially by substituteList.
The multiplicity in the processing of R_DotsSymbol is the reason why these R statements give different results:
f0 <- function() substitute(...(n = 1)); f0()
## ...(n = 1)
f1 <- function(...) substitute(...(n = 1)); f1()
## $n
## [1] 1
g0 <- function() {... <- quote(x); substitute(...(n = 1))}; g0()
## Error in g0() : '...' used in an incorrect context
g1 <- function(...) {... <- quote(x); substitute(...(n = 1))}; g1()
## Error in g1() : '...' used in an incorrect context
h0 <- function() {... <- NULL; substitute(...(n = 1))}; h0()
## $n
## [1] 1
h1 <- function(...) {... <- NULL; substitute(...(n = 1))}; h1()
## $n
## [1] 1
Given how ...(n = 1) is parsed, you might have expected f1 to return call("...", n = 1), both g0 and g1 to return call("x", n = 1), and both h0 and h1 to throw an error, but that is not the case for the above, mostly undocumented reasons.
Internals
When called inside of the R function f,
f <- function(...) substitute(...(<exprs>))
substitute evaluates a call to the C utility do_substitute—you can learn this by looking here—in which argList gets a LISTSXP of the form pairlist(x, R_MissingArg), where x is a LANGSXP of the form pairlist(R_DotsSymbol, <exprs>) (source).
If you follow the body of do_substitute, then you will find that the value of t passed to substituteList from do_substitute is a LISTSXP of the form pairlist(copy_of_x) (source).
It follows that the while loop inside of the substituteList call (source) has exactly one iteration and that the statement CAR(el) == R_DotsSymbol in the body of the loop (source) is false in that iteration.
In the false branch of the conditional (source), h gets the value
pairlist(substituteList(copy_of_x, env)). The loop exits and substituteList returns h to do_substitute, which in turn returns CAR(h) to R (source 1, 2, 3).
Hence the return value of substitute is substituteList(copy_of_x, env), and it remains to deduce the identity of this SEXP. Inside of this call to substituteList, the while loop has 1+m iterations, where m is the number of <exprs>. In the first iteration, the statement CAR(el) == R_DotsSymbol in the body of the loop is true.
In the true branch of the conditional (source), h is either a DOTSXP or R_MissingArg, because f has ... as a formal argument (doc). Continuing, you will find that substituteList returns:
R_NilValue if h was R_MissingArg in the first while iteration and m = 0,
or, otherwise,
a LISTSXP listing the expressions in h (if h was a DOTSXP in the first while iteration) followed by <exprs> (if m > 0), all unevaluated and without substitutions, because the execution environment of f is empty at the time of the substitute call.
Indeed:
f <- function(...) substitute(...())
is.null(f())
## [1] TRUE
f <- function(...) substitute(...(n = 1))
identical(f(a = sin(x), b = zzz), pairlist(a = quote(sin(x)), b = quote(zzz), n = 1))
## [1] TRUE
Misc
FWIW, it helped me to recompile R after adding some print statements to coerce.c. For example, I added the following before UNPROTECT(3); in the body of do_substitute (source):
Rprintf("CAR(t) == R_DotsSymbol? %d\n",
CAR(t) == R_DotsSymbol);
if (TYPEOF(CAR(t)) == LISTSXP || TYPEOF(CAR(t)) == LANGSXP) {
Rprintf("TYPEOF(CAR(t)) = %s, length(CAR(t)) = %d\n",
type2char(TYPEOF(CAR(t))), length(CAR(t)));
Rprintf("CAR(CAR(t)) = R_DotsSymbol? %d\n",
CAR(CAR(t)) == R_DotsSymbol);
Rprintf("TYPEOF(CDR(CAR(t))) = %s, length(CDR(CAR(t))) = %d\n",
type2char(TYPEOF(CDR(CAR(t)))), length(CDR(CAR(t))));
}
if (TYPEOF(s) == LISTSXP || TYPEOF(s) == LANGSXP) {
Rprintf("TYPEOF(s) = %s, length(s) = %d\n",
type2char(TYPEOF(s)), length(s));
Rprintf("TYPEOF(CAR(s)) = %s, length(CAR(s)) = %d\n",
type2char(TYPEOF(CAR(s))), length(CAR(s)));
}
which helped me confirm what was going into and coming out of the substituteList call on the previous line:
f <- function(...) substitute(...(n = 1))
invisible(f(hello, world, hello(world)))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 2
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = pairlist, length(CDR(CAR(t))) = 1
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = pairlist, length(CAR(s)) = 4
invisible(substitute(...()))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 1
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = NULL, length(CDR(CAR(t))) = 0
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = language, length(CAR(s)) = 1
Obviously, compiling R with debugging symbols and running R under a debugger helps, too.
Another puzzle
Just noticed this oddity:
g <- function(...) substitute(...(n = 1), new.env())
gab <- g(a = sin(x), b = zzz)
typeof(gab)
## [1] "language"
gab
## ...(n = 1)
Someone here can do another deep dive to find out why the result is a LANGSXP rather than a LISTSXP when you supply env different from environment() (including env = NULL).

ST-HOSVD in Julia

I am trying to implement the ST-HOSVD algorithm in Julia because I could not find a library that provides it.
See Algorithm 1 on page 7 of this paper:
https://people.cs.kuleuven.be/~nick.vannieuwenhoven/papers/01-STHOSVD.pdf
I cannot reproduce the (4,4,4,4) input tensor with an approximated tensor whose Tucker rank is (2,2,2,2).
I think I made a mistake in the indices of the matrix or tensor elements, but I could not locate it.
How can I fix it?
If you know of a library that implements ST-HOSVD, please let me know.
ST-HOSVD is a really common way to reduce information. I hope this question helps many Julia users.
using TensorToolbox
function STHOSVD(A, reqrank)
N = ndims(A)
S = copy(A)
Sk = undef
Uk = []
for k = 1:N
if k == 1
Sk = tenmat(S, k)
end
Sk_svd = svd(Sk)
U1 = Sk_svd.U[ :, 1:reqrank[k] ]
V1t = Sk_svd.V[1:reqrank[k], : ]
Sigma1 = diagm( Sk_svd.S[1:reqrank[k]] )
Sk = Sigma1 * V1t
push!(Uk, U1)
end
X = ttm(Sk, Uk[1], 1)
for k=2:N
X = ttm(X, Uk[k], k)
end
return X
end
A = rand(4,4,4,4)
X = STHOSVD(A, [2,2,2,2])
EDIT
Here, Sk = tenmat(S, k) is the mode-k matricization of the tensor S.
S∈R^{I_1×I_2×…×I_N}, S_k∈R^{I_k×(Π_{m≠k}^{N} I_m)}
The function is contained in TensorToolbox.jl. See "Basis" in Readme.
The definition of mode-k matricization can be found on page 460 of the paper.
It works now.
I followed page 26 of this slide:
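using TensorToolbox
using LinearAlgebra
using Arpack
function STHOSVD(T, reqrank)
    N = ndims(T)
    tensor_shape = size(T)
    for i = 1 : N
        T_i = tenmat(T, i)                        # mode-i matricization of the current tensor
        if reqrank[i] == tensor_shape[i]
            USV = svd(T_i)                        # full rank requested: dense SVD
        else
            USV = svds(T_i; nsv=reqrank[i] )[1]   # truncated SVD (Arpack), first return value
        end
        T = ttm( T, USV.U * USV.U', i)            # project mode i onto the leading left singular vectors
    end
    return T
end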
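For readers comparing this with Algorithm 1 of the paper: the code above never forms the small core tensor. Instead it applies the projector U_k U_k^T in each mode and keeps the tensor at full size. The sketch below states, in notation assumed here (the mode-k product written as \times_k, i.e. ttm, and U_k the leading r_k left singular vectors of the current mode-k unfolding), why the returned tensor should coincide with "core tensor multiplied back by the factors". It relies only on the identity (X \times_k A) \times_k B = X \times_k (BA) and on the fact that products along different modes commute.

```latex
\begin{aligned}
\mathcal{T}^{(0)} &= \mathcal{A}, \\
\mathcal{T}^{(k)} &= \mathcal{T}^{(k-1)} \times_k \left( U_k U_k^{\top} \right),
  \qquad k = 1, \dots, N, \\
\hat{\mathcal{A}} &= \mathcal{T}^{(N)}
  = \mathcal{S} \times_1 U_1 \times_2 U_2 \cdots \times_N U_N, \\
\mathcal{S} &= \mathcal{A} \times_1 U_1^{\top} \times_2 U_2^{\top} \cdots \times_N U_N^{\top}.
\end{aligned}
```

Because the factors U_1, ..., U_{k-1} have orthonormal columns, the mode-k unfolding of the projected tensor has the same left singular vectors and singular values as the unfolding of the compressed core, so both formulations select the same U_k.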

Julia 1.0.2.1: Why is a variable changing value, without being assigned?

I am building a metaheuristic in Julia for study purposes.
The purpose is to find the best order of boxes.
1) I start with an initial order (random order), defined as Order = InitOrder before my while loop.
2) In each iteration of the while loop I set CurrentOrder = Order.
3) When CurrentOrder is changed, Order changes too. Why does Order change value without being assigned? And how do I avoid it?
Version:
JuliaPro 1.0.2.1
Editor: Atom
while ( (time_ns()-timestart)/1.0e9 < RunLength && done == false ) #Stopping Criteria
done = true #Starting point
IterationCount = IterationCount + 1
BestCurrentValue = sum(H) #Worst case solution
CurrentOrder = Order #(From,To)
for n1=1:N
for n2=1:N
if n1 != n2
(CurrentOrder,CopyTo) = SwapBox(CurrentOrder,n1,n2) #Swap boxes
(CurrentLayout,L) = DeltaCopy(CurrentLayout,CopyTo,CurrentOrder) #Delta Copy to minimise calculations
(TempLayout,L) = BLV(BinW,CurrentLayout,CopyTo,CurrentOrder,W,H,L) #Evaluate by BLV
if L < BestCurrentValue #check if TempLayout is better than Best Current
BestCurrentValue = L
BestCurrentOrder = CurrentOrder
BestCurrentLayout = CurrentLayout
end #if L<...
end #if n1 != n2
##############################################################################
CurrentOrder = Order
##############################################################################
end #n2 in N
end #n1 in N
if BestCurrentValue < BestValue
done = false #Look further
BestValue = BestCurrentValue
BestOrder = BestCurrentOrder
BestLayout = BestCurrentLayout
Order = BestOrder
end #if BestCurrentValue...
end #while
Your assignment CurrentOrder = Order does not copy any data in memory; it just makes the variable CurrentOrder refer to the same object as Order. Mutating that object through one of these names is therefore visible through the other. If you want an independent copy, use CurrentOrder = deepcopy(Order).
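The aliasing behaviour is not specific to Julia. As a rough illustration in Python (an analogy, not the poster's code; swap_in_place is a hypothetical stand-in for SwapBox and is assumed to mutate its argument), plain assignment shares the object while a deep copy does not:

```python
import copy


def swap_in_place(order, i, j):
    """Hypothetical stand-in for SwapBox: swaps two entries by mutating the list."""
    order[i], order[j] = order[j], order[i]
    return order


# Plain assignment: both names refer to the same list.
order_a = [3, 1, 2]
alias = order_a
swap_in_place(alias, 0, 1)
# order_a has changed too, because alias and order_a are one object.

# Deep copy: the counterpart of deepcopy(Order) in Julia.
order_b = [3, 1, 2]
independent = copy.deepcopy(order_b)
swap_in_place(independent, 0, 1)
# order_b is untouched.
```

In the loop from the question, this means the reset at the end of each inner iteration only restores the original order if it assigns a copy rather than the shared object.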

filter max of N

Could it be possible to write in FFL a version of filter that stops filtering after the first negative match, i.e. the remaining items are assumed to be positive matches? More generally, a filter that removes a maximum of N items.
Example:
removeMaxOf1([1,2,3,4], value>=2)
Expected Result:
[1,3,4]
This seems like something very difficult to write in a pure functional style. Maybe recursion or let could achieve it?
Note: The whole motivation for this question was hypothesizing about micro-optimizations, so performance is very relevant. I am also looking for something that is generally applicable to any data type, not just int.
I have recently added find_index to the engine which allows this to be done easily:
if(n = -1, list, list[:n] + list[n+1:])
where n = find_index(list, value>=2)
where list = [1,2,3,4]
find_index will return the index of the first match, or -1 if no match is found. There is also find_index_or_die which returns the index of the first match, asserting if none is found for when you're absolutely certain there is an instance in the list.
You could also implement something like this using recursion:
def filterMaxOf1(list ls, function(list)->bool pred, list result=[]) ->list
base ls = []: result
base not pred(ls[0]): result + ls[1:]
recursive: filterMaxOf1(ls[1:], pred, result + [ls[0]])
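For comparison, the index-based idea (find the first element that fails the keep-predicate, drop only that one, keep everything after it unchecked) can be sketched outside FFL. The Python below is an illustration under that reading of the question; the function name remove_first_failing is made up here, and it works for any element type:

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def remove_first_failing(items: Iterable[T], keep: Callable[[T], bool]) -> List[T]:
    """Return a copy of items without the first element for which keep() is false.

    Elements after that first failure are not tested at all.
    """
    result = list(items)
    for i, item in enumerate(result):
        if not keep(item):
            del result[i]   # drop only the first negative match
            break           # the rest is assumed to be positive
    return result


example = remove_first_failing([1, 2, 3, 4], lambda v: v < 2)
```

Here example is [1, 3, 4], matching the expected result in the question. A list where every element passes is returned unchanged.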
Of course recursion can! :D
filterMaxOf1(input, target)
where filterMaxOf1 = def
([int] l, function f) -> [int]
if(size(l) = 0,
[],
if(not f(l[0]),
l[1:],
flatten([
l[0],
recurse(l[1:], f)
])
)
)
where input = [
1, 2, 3, 4, ]
where target = def
(int i) -> bool
i < 2
Some checks:
--> filterMaxOf1([1, ]) where filterMaxOf1 = [...]
[1]
--> filterMaxOf1([2, ]) where filterMaxOf1 = [...]
[]
--> filterMaxOf1([1, 2, ]) where filterMaxOf1 = [...]
[1]
--> filterMaxOf1([1, 2, 3, 4, ]) where filterMaxOf1 = [...]
[1, 3, 4]
This flavor loses some strong type safety, but is nearer to tail recursion:
filterMaxOf1(input, target)
where filterMaxOf1 = def
([int] l, function f) -> [int]
flatten(filterMaxOf1i(l, f))
where filterMaxOf1i = def
([int] l, function f) -> [any]
if(size(l) = 0,
[],
if(not f(l[0]),
l[1:],
[
l[0],
recurse(l[1:], f)
]
)
)
where input = [
1, 2, 3, 4, ]
where target = def
(int i) -> bool
i < 2

Push dictionary? How to achieve this in Lua?

Say I have this dictionary in Lua
places = {dest1 = 10, dest2 = 20, dest3 = 30}
In my program I check if the dictionary has met my size limit in this case 3, how do I push the oldest key/value pair out of the dictionary and add a new one?
places["newdest"] = 50
--places should now look like this, dest3 pushed off and newdest added and dictionary has kept its size
places = {newdest = 50, dest1 = 10, dest2 = 20}
It's not too difficult to do this, if you really needed it, and it's easily reusable as well.
local function ld_next(t, i) -- This is an ordered iterator, oldest first.
  if i <= #t then
    return i + 1, t[i], t[t[i]]
  end
end
local limited_dict = {__newindex = function(t,k,v)
  if #t == t[0] then -- Evict the oldest entry.
    t[table.remove(t, 1)] = nil
  end
  rawset(t, #t + 1, k) -- Raw append: since Lua 5.3 table.insert respects __newindex and would recurse here.
  rawset(t, k, v)
end, __pairs = function(t)
  return ld_next, t, 1
end}
local t = setmetatable({[0] = 3}, limited_dict)
t['dest1'] = 10
t['dest2'] = 20
t['dest3'] = 30
t['dest4'] = 50
for i, k, v in pairs(t) do print(k, v) end
dest2 20
dest3 30
dest4 50
The order is stored in the numeric indices, with the 0th index indicating the limit of unique keys that the table can have.
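The same design (remember insertion order, evict the oldest key once the limit is reached) can be sketched in other languages. The Python version below is an analogy rather than a translation of the metatable code; the class name BoundedDict and its limit parameter are assumptions made for this example:

```python
from collections import OrderedDict


class BoundedDict(OrderedDict):
    """Dictionary that holds at most `limit` keys and evicts the oldest one first."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit

    def __setitem__(self, key, value):
        # Only a genuinely new key can push an old one out;
        # updating an existing key keeps its position and evicts nothing.
        if key not in self and len(self) >= self.limit:
            self.popitem(last=False)   # remove the oldest entry
        super().__setitem__(key, value)


places = BoundedDict(3)
places["dest1"] = 10
places["dest2"] = 20
places["dest3"] = 30
places["dest4"] = 50   # evicts dest1
```

As in the Lua version, the limit counts unique keys, and iteration runs from the oldest entry to the newest.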
Given that dictionary keys do not save their entered position, I wrote something that should be able to help you accomplish what you want, regardless.
function push_old(t, k, v)
local z = fifo[1]
t[z] = nil
t[k] = v
table.insert(fifo, k)
table.remove(fifo, 1)
end
You would need to create the fifo table first, based on the order you entered the keys (for instance, fifo = {"dest3", "dest2", "dest1"}, based on your post, from first entered to last entered), then use:
push_old(places, "newdest", 50)
and the function will do the work. Happy holidays!
