How to define empty IndexedTables in Julia?

I am unable to define empty IndexedTables, e.g.
using IndexedTables, IndexedTables.Table
t = Table(Columns(a=Int64[],b=String[]),Int64[])
t[1,"a"] = 1
t[1,"b"] = 2
t[1,"c"] = t[1,"a"] + t[1,"b"]
BoundsError: attempt to access 0-element Array{Int64,1} at index [0]
I am aware that creating the IndexedTable with the data already in it is more efficient than creating an empty one and then inserting values, but sometimes you are obliged to go this way.
Is this a bug? If so, is there any workaround possible?
(I already posted this thread on the Julia forum, but so far I have had no replies there.)

This is probably a bug in IndexedTables.
Inserting into an IndexedTable requires reindexing to access the data. Reindexing is done with flush!.
But flush!(t) fails in the example in the question with the empty t.
Fixing flush!, which calls _merge!, can be done by:
julia> function IndexedTables._merge!(dst::IndexedTable, src::IndexedTable, f)
           if length(dst.index)==0 || isless(dst.index[end], src.index[1])
               append!(dst.index, src.index)
               append!(dst.data, src.data)
           else
               # merge to a new copy
               new = IndexedTables._merge(dst, src, f)
               ln = length(new)
               # resize and copy data into dst
               resize!(dst.index, ln)
               copy!(dst.index, new.index)
               resize!(dst.data, ln)
               copy!(dst.data, new.data)
           end
           return dst
       end
julia> t[1,"c"] = t[1,"a"] + t[1,"b"]
3
The change is the addition of the length(...) check in the first if.
Of course, a pull request / issue should be opened with IndexedTables.jl. Antonello, will you do this? (or shall I)

Workaround for case-sensitive input to dir

I am using Octave 5.1.0 on Windows 10 (x64). I am parsing a series of directories looking for an Excel spreadsheet in each directory with "logbook" in its filename. The problem is these files are created by hand and the filenaming isn't consistent: sometimes it's "LogBook", other times it's "logbook", etc...
It looks like the string passed as input to the dir function is case-sensitive so if I don't have the correct case, dir returns an empty struct. Currently, I am using the following workaround, but I wondered if there was a better way of doing this (for a start I haven't captured all possible upper/lower case combinations):
logbook = dir('*LogBook.xls*');
if isempty(logbook)
    logbook = dir('*logbook.xls*');
    if isempty(logbook)
        logbook = dir('*Logbook.xls*');
        if isempty(logbook)
            logbook = dir('*logBook.xls*');
            if isempty(logbook)
                error(['Could not find logbook spreadsheet in ' dir_name '.'])
            end
        end
    end
end
You need to get the list of filenames (either via readdir, dir, ls), and then search for the string in that list. If you use readdir, it can be done like this:
[files, err, msg] = readdir ('.'); # read current directory
if (err != 0)
    error ("failed to readdir (error code %d): %s", err, msg);
endif
logbook_indices = find (cellfun (@any, regexpi (files, 'logbook')));
logbook_filenames = files(logbook_indices);
A much less standard approach could be:
glob ('*[lL][oO][gG][bB][oO][oO][kK]*')

Filter vertices on several properties - Julia

I am working in Julia with the MetaGraphs.jl library.
In order to set up an optimization problem, I would like to get the set/list of edges in the graph that point to a special set of vertices having 2 particular properties in common.
My first guess was to first get the set/list of vertices, but I am facing an initial issue: the filter_vertices function doesn't seem to accept a filter on more than one property.
Here is below an example of what I would like to do:
g = DiGraph(5)
mg = MetaDiGraph(g, 1.0)
add_vertex!(mg)
add_edge!(mg,1,2)
add_edge!(mg,1,3)
add_edge!(mg,1,4)
add_edge!(mg,2,5)
add_edge!(mg,3,5)
add_edge!(mg,5,6)
add_edge!(mg,4,6)
set_props!(mg,3,Dict(:prop1=>1,:prop2=>2))
set_props!(mg,1,Dict(:prop1=>1,:prop2=>0))
set_props!(mg,2,Dict(:prop1=>1,:prop2=>0))
set_props!(mg,4,Dict(:prop1=>0,:prop2=>2))
set_props!(mg,5,Dict(:prop1=>0,:prop2=>2))
set_props!(mg,6,Dict(:prop1=>0,:prop2=>0))
col=collect(filter_vertices(mg,:prop1,1,:prop2,2))
And I want col to find vertex 3 and no others.
But filter_vertices only admits one property at a time, which makes it more costly to run a loop with two filters and then compare the results in order to build a list of the vertices that have both properties.
Considering the size of my graph, I would like to avoid defining this set with multiple, costly loops. Would any one of you have an idea of how to solve this issue in an easy and clean way?
I ended up making this to answer my own question:
fil3=Array{Int64,1}()
fil1=filter_vertices(mg,:prop1,1)
for f in fil1
    if get_prop(mg,f,:prop2)==2
        push!(fil3,f)
    end
end
println(fil3)
But tell me if you come up with anything more interesting.
Thanks for your help!
Please provide a minimal working example that we can simply copy, paste, and run right away. Please also indicate where the problem occurs in the code. Below is an example for your scenario:
Pkg.add("MetaGraphs")
using LightGraphs, MetaGraphs
g = DiGraph(5)
mg = MetaDiGraph(g, 1.0)
add_vertex!(mg)
add_edge!(mg,1,2)
add_edge!(mg,1,3)
add_edge!(mg,1,4)
add_edge!(mg,2,5)
add_edge!(mg,3,5)
add_edge!(mg,5,6)
add_edge!(mg,4,6)
set_props!(mg,3,Dict(:prop1=>1,:prop2=>2))
set_props!(mg,1,Dict(:prop1=>1,:prop2=>0))
set_props!(mg,2,Dict(:prop1=>1,:prop2=>0))
set_props!(mg,4,Dict(:prop1=>0,:prop2=>2))
set_props!(mg,5,Dict(:prop1=>0,:prop2=>2))
set_props!(mg,6,Dict(:prop1=>0,:prop2=>0))
function my_vertex_filter(g::AbstractMetaGraph, v::Integer, prop1, prop2)
    return has_prop(g, v, :prop1) && get_prop(g, v, :prop1) == prop1 &&
           has_prop(g, v, :prop2) && get_prop(g, v, :prop2) == prop2
end
prop1 = 1
prop2 = 2
col = collect(filter_vertices(mg, (g,v)->my_vertex_filter(g,v,prop1,prop2)))
# returns Int[3]
Please check ?filter_vertices --- it gives you a hint on what/how to write to define your custom filter.
EDIT. For filtering the edges, you can have a look at ?filter_edges to see what you need for the edge filtering. Append the code excerpt below to the solution above to get your results:
function my_edge_filter(g, e, prop1, prop2)
    v = dst(e) # get the edge's destination vertex
    return my_vertex_filter(g, v, prop1, prop2)
end
myedges = collect(filter_edges(mg, (g,e)->my_edge_filter(g,e,prop1,prop2)))
# returns [Edge 1 => 3]
I found this solution:
function filter_function1(g,prop1,prop2)
    fil1=filter_vertices(g,:prop1,prop1)
    fil2=filter_vertices(g,:prop2,prop2)
    filter=intersect(fil1,fil2)
    return filter
end
This seems to work and is quite easy to implement.
I just don't know whether the filter_vertices function takes a lot of computational power.
Otherwise a simple loop like this seems to also work:
function filter_function2(g,prop1,prop2)
    filter=Set{Int64}()
    fil1=filter_vertices(g,:prop1,prop1)
    for f in fil1
        if get_prop(g,f,:prop2)==prop2
            push!(filter,f)
        end
    end
    return filter
end
I am open to any other answers if you have some more elegant ones.

How to use recursive function in Lua to dynamically build a table without overwriting it each call

I need to crawl a set of data of indeterminate size and build a key/value index table of it. Since I don't know the dimensions in advance, it seems I have to use a recursive function. My Lua skills are very new and superficial, and I'm having difficulty understanding how to deal with returning a table from the function call.
Note this is for a Lua 5.1 script processor
API = function(tbl)
    local table_api = {}
    -- do stuff here with target data and add to table_api
    table_api["key"] = value
    -- then later there is a need to recurse deeper into target data
    table_api["lower"] = API(var)
    return table_api
end
result = API(source_data)
In most other languages I know there would be some way to make the recursion line table_api["lower"] = API(var) work, but since Lua handles table variables by reference, my returned sub-table just keeps getting overwritten and my result is only a tiny last bit of what it should be.
Just for background on my purpose: there's a commercial application I'm working with that has a weakly documented Lua scripting interface. It's running Lua 5.1, and the API is not well documented and frequently updated. As I understand it, everything is stored in _G, so I wanted to write something to reverse engineer the API. I have a working recursive function (not shown here) that enumerates all of _G. For that return value, it just builds up an annotated string and progressively appends to it. That all works fine and is really useful, but it shows much more than the API interface; all the actual data elements are included, so I have to sift through around 30,000 records to determine an API set of about 500 terms. In order to determine the API, I am trying to use the sub-table-returning recursive function being discussed in this question. The code I'm showing here is just a small, distilled subset of the larger function.
I'll go ahead and include the full code here. I was hoping to incrementally build a large'ish table of each API level, any sublevels, and finally whatever keys used at the lowest level.
In the end, I was expecting to have a table that I could address like this:
result["api"]["label"]["api"]["sublabel"]["value"]["valuename"]
Full code:
tableAPIShow = function(tbl, table_track)
    table_track = table_track or {}
    local table_api = {}
    if type(tbl) == 'table' then
        -- Check if values are tables.
        local parent_table_flag = true
        for ind,val in pairs(tbl) do
            if type(val) ~= 'table' then
                parent_table_flag = false
                break
            end
        end
        -- If all children are table type, check each of them for subordinate commonality
        local api_flag = false
        if parent_table_flag == true then
            local child_table = {}
            local child_table_flag = false
            api_flag = true
            for ind,val in pairs(tbl) do
                -- For each child table, store the names of the indexes.
                for sub_ind,sub_val in pairs(val) do
                    if child_table_flag == false then -- First time through, create starting template view of typical child table.
                        child_table[sub_ind] = true -- Store the indexes as a template table.
                    elseif child_table[sub_ind] == nil then -- Otherwise, test this child table compared to the reference template.
                        api_flag = false
                        break
                    end
                end
                if api_flag == false then -- need to break out of nested loop
                    break
                end
                child_table_flag = true
            end
        end
        if api_flag == true then
            -- If everything gets to here, then this level is an API with matching child tables below.
            for ind,val in pairs(tbl) do
                if table_api["api"] == nil then
                    table_api["api"] = {}
                end
                table_api["api"][ind] = tableAPIShow(val, table_track)
            end
        else
            -- This level is not an API level, determine how to process otherwise.
            for ind,val in pairs(tbl) do
                if type(val) == 'table' then
                    if table_track[val] ~= nil then -- Have we already recursed this table?
                    else
                        table_track[val] = true
                        if table_api["table"] == nil then
                            table_api["table"] = {}
                        end
                        table_api["table"][ind] = tableAPIShow(val, table_track)
                    end
                else -- The children are not tables, they are values
                    if table_api["value"] == nil then
                        table_api["value"] = {}
                    end
                    table_api["value"][ind] = val
                end
            end
        end
    else
        -- It's not a table, just return it.
        -- Probably never use this portion because it's caught on upper level recurse and not called
        return tbl
    end
    return table_api
end
And I was calling this function in the main script like this:
local str = tableAPIShow(_G)
I've got another function that recursively shows a table so I can look inside my results, and I see that I only get a return value containing the values of the top level of _G (I have excluded built-in Lua functions/values because I'm only interested in the Application API):
{
[value] = table: 00000000F22CB700 {
[value][_VERSION] = Application/5.8.1 (x86_64; Windows NT 10.0.16299),
[value][tableAPIShow] = "function: 00000000F22C6DE0, defined in (121-231) C:\\Users\\user\\AppData\\Local\\Temp\\APP\\/~mis00002690 ",
[value][_FINAL_VERSION] = true,
[value][Path] = ./Scripts/Database/elements/,
[value][class] = "function: 00000000F1953C40, defined in (68-81) Scripts/Common/Class.lua ",
[value][db_path] = ./Scripts/Database/,
[value][merge_all_units] = "function: 00000000F20D20C8, defined in (2242-2250) Scripts/Database/db_merge.lua ",
}
You just need to localize the variable you store your table in and it will work as you expect:
local table_api = {}
(Note that you are passing a table variable that conflicts with the global table variable and is not currently used in the function.)
I am inclined to believe that your tableAPIShow function code is working correctly and the
another function that recursively shows a table
fails to serialize your table fully. Hence you don't see the deeper levels of the table returned by tableAPIShow().
I got your initial and current code (tableAPIShow) to work with my simple table serialize function: the whole _Global table is exported completely and formatted as you implemented it in your tableAPIShow().
Code for testing:
apiMassiveTable = {
    api = {
        get = {
            profile = {"profileID", "format"},
            name = {"profileID", "encoding"},
            number = {"profileID", "binary"}
        },
        set = {
            name = {"apikey", "profileID", "encoding", "newname"},
            number = {"apikey", "profileID", "binary", "newnumber"}
        },
        retrieve = {}
    },
    metadata = {version="1.4.2", build="nightly"}
}

-- tableAPIShow implementation here

table.serialize = dofile("serialize.lua")
print(table.serialize("myNameForAHugeTable", tableAPIShow(_G)))
PS: Whatever serialize function you're using, it should enquote strings like Application/5.8.1 (x86_64; Windows NT 10.0.16299), which it currently does not.
Like @Paul Kulchenko said, you need to learn to use locals (https://www.lua.org/pil/4.2.html). Global variables persist until a new lua_State is loaded (a new environment, which could be a new process depending on what interpreter you are using). So a tip is to always use local variables for anything you don't want to escape the function or the compilation unit.
Think of tables like dictionaries: a word is attached to a definition, and the definition is the data.
What I think you are trying to do is serialize a table of data. However, that isn't really necessary. You can either shallow copy or deep copy the given table. A shallow copy is when you don't delve into the depths of tables found in the keys, etc. A deep copy is when you copy tables in keys of tables in keys of tables, etc.
local shallow_copy = function(tab)
    local rep_tab = {}
    for index, value in pairs(tab) do
        rep_tab[index] = value
    end
    return rep_tab
end

-- because the local variable is not defined or declared immediately on a one-liner,
-- a declaration has to exist first so that the recursive call inside resolves to the local deep_copy
-- lets metatable execute functions
local deep_copy
deep_copy = function(tab)
    local rep_tab = {}
    for index, value in pairs(tab) do
        if(type(value) == "table")then
            rep_tab[index] = deep_copy(value)
        else
            rep_tab[index] = value
        end
    end
    return rep_tab
end
Deco's deepcopy.lua
https://gist.github.com/Deco/3985043
You can also index tables using periods:
local tab = {}
tab.abc = 123
tab["def"] = 456
print(tab.abc, tab["def"])
To serialize the entire _G, you would just filter out the junk you don't need and recurse into each table encountered. Watch out for _G, package, and _ENV because, if they are defined, the recursion will loop back to the start.
-- use cap as a whitelist
enumerate_dir = function(dir, cap)
    local base = {}
    for i, v in pairs(dir) do
        -- skip trouble
        if(i ~= "_G" and i ~= "_ENV" and i ~= "package")then
            if(type(v) == "table")then -- if we have a table
                base[i] = enumerate_dir(v, cap)
            else
                for k, n in pairs(cap) do
                    if(type(v) == n)then -- if whitelisted
                        base[i] = tostring(v)
                    end
                end
            end
        end
    end
    return base
end
-- only functions and tables from _G please
local enumeration = enumerate_dir(_G, {"function", "table"})
-- do some cool quips to get a basic tree system
prefix = ""
recursive_print = function(dir)
for i, v in pairs(dir)do
if(type(v) == "table")then
print(prefix, i, v)
prefix = prefix .. ">"
recursive_print(v)
else
print(prefix, i, v)
end
end
if(#prefix > 0)then
prefix = prefix:sub(1, #prefix-1)
end
end
recursive_print(test)

Can LLDB data formatters call methods?

I'm debugging a Qt application using LLDB. At a breakpoint I can write
(lldb) p myQString.toUtf8().data()
and see the string contained within myQString, as data() returns char*. I would like to be able to write
(lldb) p myQString
and get the same output. This didn't work for me:
(lldb) type summary add --summary-string "${var.toUtf8().data()}" QString
Is it possible to write a simple formatter like this, or do I need to know the internals of QString and write a python script?
Alternatively, is there another way I should be using LLDB to view QStrings this way?
The following does work.
First, register your summary command:
debugger.HandleCommand('type summary add -F set_sblldbbp.qstring_summary "QString"')
Here is an implementation
def make_string_from_pointer_with_offset(F,OFFS,L):
    strval = 'u"'
    try:
        data_array = F.GetPointeeData(0, L).uint16
        for X in range(OFFS, L):
            V = data_array[X]
            if V == 0:
                break
            strval += unichr(V)
    except:
        pass
    strval = strval + '"'
    return strval.encode('utf-8')

#qt5
def qstring_summary(value, unused):
    try:
        d = value.GetChildMemberWithName('d')
        #have to divide by 2 (size of unsigned short = 2)
        offset = d.GetChildMemberWithName('offset').GetValueAsUnsigned() / 2
        size = get_max_size(value)
        return make_string_from_pointer_with_offset(d, offset, size)
    except:
        print '?????????????????????????'
        return value

def get_max_size(value):
    _max_size_ = None
    try:
        debugger = value.GetTarget().GetDebugger()
        _max_size_ = int(lldb.SBDebugger.GetInternalVariableValue('target.max-string-summary-length', debugger.GetInstanceName()).GetStringAtIndex(0))
    except:
        _max_size_ = 512
    return _max_size_
It is expected that what you tried to do won't work. The summary strings feature does not allow calling expressions.
Calling expressions in a debugger is always interesting, in a data formatter more so (if you're in an IDE - say Xcode - formatters run automatically). Every time you stop somewhere, even if you just stepped over one line, all these little expressions would all automatically run over and over again, at a large performance cost - and this is not even taking into account the fact that your data might be in a funny state already and running expressions has the potential to alter it even more, making your debugging sessions trickier than needed.
If the above wall of text still hasn't discouraged you ( :-) ), you want to write a Python formatter, and use the SB API to run your expression. Your value is an SBValue object, which has access to an SBFrame and an SBTarget. The combination of these two allows you to run EvaluateExpression("blah") and get back another SBValue, probably a char* to which you can then ask GetSummary() to get your c-string back.
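For illustration, here is a minimal sketch of such an expression-based summary provider, assuming a Python 2 era lldb and that calling toUtf8() in the target is acceptable for your use case (the function and module names are placeholders, not part of any official Qt support):
import lldb

def qstring_expression_summary(valobj, internal_dict):
    # Rebuild the expression path of the value (e.g. "myQString" or "obj.m_name")
    stream = lldb.SBStream()
    valobj.GetExpressionPath(stream)
    path = stream.GetData()
    # Evaluate toUtf8().data() in the frame this value belongs to
    frame = valobj.GetFrame()
    result = frame.EvaluateExpression(path + '.toUtf8().data()')
    if result.IsValid() and result.GetSummary() is not None:
        # lldb already renders the resulting char* as a quoted string
        return result.GetSummary()
    return '<could not evaluate QString>'
You would register it with something like type summary add -F yourmodule.qstring_expression_summary QString, keeping in mind the performance caveats described above.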
If, on the other hand, you are now persuaded that running expressions in formatters is suboptimal, the good news is that QString most certainly has to store its data pointer somewhere.. if you find out where that is, you can just write a formatter as ${var.member1.member2.member3.theDataPointer} and obtain the same result!
This is my trial-and-error adaptation of a UTF-16 string interpretation lldb script I found online (I apologise that I don't remember the source, and that I can't credit the author).
Note that this is for Qt 4.3.2 and versions close to it, as the handling of the 'data' pointer has since changed between then and Qt 5.x.
def QString_SummaryProvider(valobj, internal_dict):
    data = valobj.GetChildMemberWithName('d')  #.GetPointeeData()
    strSize = data.GetChildMemberWithName('size').GetValueAsUnsigned()
    newchar = -1
    i = 0
    s = u'"'
    while newchar != 0:
        # read next wchar character out of memory
        data_val = data.GetChildMemberWithName('data').GetPointeeData(i, 1)
        size = data_val.GetByteSize()
        e = lldb.SBError()
        if size == 1:
            newchar = data_val.GetUnsignedInt8(e, 0)   # utf-8
        elif size == 2:
            newchar = data_val.GetUnsignedInt16(e, 0)  # utf-16
        elif size == 4:
            newchar = data_val.GetUnsignedInt32(e, 0)  # utf-32
        else:
            s = s + '<unexpected char size - error parsing QString>'
            break
        if e.fail:
            s = s + '<parse error:' + e.why() + '>'
            break
        i = i + 1
        if i > strSize:
            break
        # add the character to our string 's'
        # print "char2 = %s" % newchar
        if newchar != 0:
            s = s + unichr(newchar)
    s = s + u'"'
    return s.encode('utf-8')

Simplifying device creation in sip.conf

I often have to define many similar devices in sip.conf like this:
[device](!)
; setting some parameters
[device01](device)
callerid=dev01 <01>
[device02](device)
callerid=dev02 <02>
; ...
[deviceXX](device)
callerid=devXX <XX>
The question is: could I perhaps avoid setting device-name-specific parameters by using some variable like the following?
[device](!)
callerid=dev${DEVICE_NAME:-2} <${DEVICE_NAME:-2}>
; setting some parameters
[device01](device)
[device02](device)
; ...
[deviceXX](device)
P.S.
It would be perfect if there were some device constructor, so I could reduce the script to the following, but I think that is not possible in Asterisk.
[device](!)
callerid=dev${DEVICE_NAME:-2} <${DEVICE_NAME:-2}>
; setting some parameters
;[device${MAGIC_LOOP(1,XX,leading_zeroes)}](device)
I've had good results writing a small program that takes care of it. It checks for a line saying something like
------- Automatically generated -------
and whatever comes after that line gets regenerated as soon as the program detects that there are new values for it (these could come from a database or from a text file). Then, I run it with supervisor, and it checks every XX seconds whether there are changes.
If there are changes, it issues a sip reload command after updating the sip.conf file
I wrote it in python, but whatever language you feel comfortable with should work just fine.
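For context, a minimal sketch of the outer polling loop that supervisor would keep alive might look like this (the interval and the regenerate_sip_conf wrapper are illustrative placeholders, not part of the actual script, which is shown below):
import time

CHECK_INTERVAL_SECONDS = 30  # the "XX seconds" mentioned above

def main():
    while True:
        regenerate_sip_conf()  # hypothetical wrapper around the generation code shown below
        time.sleep(CHECK_INTERVAL_SECONDS)

if __name__ == '__main__':
    main()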
That's how I managed it, and it has been working fine so far (after a couple of months). I'd be extremely interested in learning about other approaches, though. It's basically this (called from another script with supervisor):
import hashlib
import os
import subprocess
from functools import reduce  # needed on Python 3, where reduce is no longer a builtin

users = get_users_logic()
# get the data that will be used in the sip.conf file
data_to_be_hashed = reduce(lambda x, y: x + y, map(lambda x: x['username'] + x['password'] + x['company_prefix'], users))
m = hashlib.md5()
m.update(str(data_to_be_hashed).encode("ascii"))
new_md5 = m.hexdigest()
last_md5 = None

try:
    file = open(os.path.dirname(os.path.realpath(__file__)) + '/lastMd5.txt', 'r')
    last_md5 = file.read().rstrip()
    file.close()
except:
    pass

# if it changed...
if new_md5 != last_md5:
    # needs update
    with open(settings['asterisk']['path_to_sip_conf'], 'r') as file:
        sip_content = file.read().rstrip()

    parts = sip_content.split(";-------------- BEYOND THIS POINT IT IS AUTO GENERATED --------------;")
    sip_content = parts[0].rstrip()
    sip_content += "\n\n;-------------- BEYOND THIS POINT IT IS AUTO GENERATED --------------;\n\n"

    for user in users:
        m = hashlib.md5()
        m.update(("%s:sip.ellauri.it:%s" % (user['username'], user['password'])).encode("ascii"))
        md5secret = m.hexdigest()
        sip_content += "[%s]\ntype = friend\ncontext = %sLocal\nmd5secret = %s\nhost = dynamic\n\n" % (
            user['username'], user['company_prefix'], md5secret)

    # write the sip.conf file
    f = open(settings['asterisk']['path_to_sip_conf'], 'w')
    print(sip_content, file=f)
    f.close()

    subprocess.call('asterisk -x "sip reload"', shell=True)

    # write the new md5
    f = open(os.path.dirname(os.path.realpath(__file__)) + '/lastMd5.txt', 'w')
    print(new_md5, file=f)
    f.close()
