Can LLDB data formatters call methods?

I'm debugging a Qt application using LLDB. At a breakpoint I can write
(lldb) p myQString.toUtf8().data()
and see the string contained within myQString, as data() returns char*. I would like to be able to write
(lldb) p myQString
and get the same output. This didn't work for me:
(lldb) type summary add --summary-string "${var.toUtf8().data()}" QString
Is it possible to write a simple formatter like this, or do I need to know the internals of QString and write a Python script?
Alternatively, is there another way I should be using LLDB to view QStrings this way?

The following does work.
First, register your summary command:
debugger.HandleCommand('type summary add -F set_sblldbbp.qstring_summary "QString"')
Here is an implementation:
import lldb

def make_string_from_pointer_with_offset(F, OFFS, L):
    strval = 'u"'
    try:
        data_array = F.GetPointeeData(0, L).uint16
        for X in range(OFFS, L):
            V = data_array[X]
            if V == 0:
                break
            strval += chr(V)
    except Exception:
        pass
    strval += '"'
    return strval
# Qt 5
def qstring_summary(value, unused):
    try:
        d = value.GetChildMemberWithName('d')
        # 'offset' is in bytes; divide by 2 (size of unsigned short = 2)
        offset = d.GetChildMemberWithName('offset').GetValueAsUnsigned() // 2
        size = get_max_size(value)
        return make_string_from_pointer_with_offset(d, offset, size)
    except Exception:
        print('error formatting QString')
        return value
def get_max_size(value):
    try:
        debugger = value.GetTarget().GetDebugger()
        max_size = int(lldb.SBDebugger.GetInternalVariableValue(
            'target.max-string-summary-length',
            debugger.GetInstanceName()).GetStringAtIndex(0))
    except Exception:
        # fall back to a sane default if the setting cannot be read
        max_size = 512
    return max_size
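To have the summary registered automatically when the module is imported with command script import, the registration can also live in an __lldb_init_module hook (a minimal sketch; the module name set_sblldbbp is the one used in the registration command above):

def __lldb_init_module(debugger, internal_dict):
    # runs automatically on: (lldb) command script import set_sblldbbp.py
    debugger.HandleCommand(
        'type summary add -F set_sblldbbp.qstring_summary "QString"')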

It is expected that what you tried won't work: the summary strings feature does not allow calling expressions.
Calling expressions in a debugger is always interesting; in a data formatter, more so (if you're in an IDE - say Xcode - formatters run automatically). Every time you stop somewhere, even if you just stepped over one line, all these little expressions would run over and over again, at a large performance cost. And that is not even taking into account that your data might already be in a funny state, and running expressions has the potential to alter it even more, making your debugging session trickier than needed.
If the above wall of text still hasn't discouraged you ( :-) ), you want to write a Python formatter and use the SB API to run your expression. Your value is an SBValue object, which has access to an SBFrame and an SBTarget. The combination of these two allows you to run EvaluateExpression("blah") and get back another SBValue, probably a char*, on which you can then call GetSummary() to get your C string back.
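A minimal sketch of that approach (the function name is hypothetical, and it assumes the expression evaluator can see QString's definition at the stop point):

import lldb

def qstring_expr_summary(valobj, internal_dict):
    # evaluate toUtf8().data() in the target - note the caveats above
    frame = valobj.GetFrame()
    expr = '((QString *)%d)->toUtf8().data()' % valobj.GetLoadAddress()
    result = frame.EvaluateExpression(expr)
    summary = result.GetSummary()  # the char* summary, e.g. '"hello"'
    return summary if summary else '<could not evaluate>'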
If, on the other hand, you are now persuaded that running expressions in formatters is suboptimal, the good news is that QString most certainly has to store its data pointer somewhere. If you find out where that is, you can just write a formatter as ${var.member1.member2.member3.theDataPointer} and obtain the same result!
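For instance, in Qt 4 builds where the payload hangs off the d member, that could look like the following (the exact member path varies by Qt version, so treat it as an assumption to verify against your headers):

(lldb) type summary add --summary-string "${var.d.data}" QString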

This is my trial-and-error adaptation of a UTF-16 string interpretation LLDB script I found online (I apologize that I don't remember the source, and that I can't credit the author).
Note that this is for Qt 4.3.2 and versions close to it, as the handling of the 'data' pointer has changed between then and Qt 5.x.
import lldb

def QString_SummaryProvider(valobj, internal_dict):
    data = valobj.GetChildMemberWithName('d')
    strSize = data.GetChildMemberWithName('size').GetValueAsUnsigned()
    newchar = -1
    i = 0
    s = '"'
    while newchar != 0:
        # read the next wchar character out of memory
        data_val = data.GetChildMemberWithName('data').GetPointeeData(i, 1)
        size = data_val.GetByteSize()
        e = lldb.SBError()
        if size == 1:
            newchar = data_val.GetUnsignedInt8(e, 0)    # utf-8
        elif size == 2:
            newchar = data_val.GetUnsignedInt16(e, 0)   # utf-16
        elif size == 4:
            newchar = data_val.GetUnsignedInt32(e, 0)   # utf-32
        else:
            s += '<unexpected char size - error parsing QString>'
            break
        if e.fail:
            s += '<parse error: ' + e.GetCString() + '>'
            break
        i += 1
        if i > strSize:
            break
        # add the character to our string 's'
        if newchar != 0:
            s += chr(newchar)
    s += '"'
    return s
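To try it, the provider has to be imported and registered; assuming the code above is saved as qstring_formatter.py (a hypothetical file name):

(lldb) command script import /path/to/qstring_formatter.py
(lldb) type summary add -F qstring_formatter.QString_SummaryProvider QString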

Related

Nginx: How to convert POST data encoding from UTF-8 to TIS-620 before passing it through the proxy

I would like to take the POST data received in a request and convert it from UTF-8 to TIS-620 before passing it to the backend via proxy_pass (configured as below), but I am not sure how to do it:
location / {
    proxy_pass http://targetwebsite;
}
If I am not wrong, I believe I have to use the Lua module to manipulate the request, but I don't know whether it supports any character conversion.
Could anybody help me with sample code to convert POST data from UTF-8 to TIS-620 using Lua, and show how to validate that the POST data is UTF-8 before converting it? Or is there a better way to manipulate/convert POST data in nginx?
This solution works on Lua 5.1/5.2/5.3
local function utf8_to_unicode(utf8str, pos)
    -- pos = starting byte position inside the input string
    local code, size = utf8str:byte(pos), 1
    if code >= 0xC0 and code < 0xFE then
        local mask = 64
        code = code - 128
        repeat
            local next_byte = utf8str:byte(pos + size)
            if next_byte and next_byte >= 0x80 and next_byte < 0xC0 then
                code, size = (code - mask - 2) * 64 + next_byte, size + 1
            else
                return
            end
            mask = mask * 32
        until code < mask
    elseif code >= 0x80 then
        return
    end
    -- returns the codepoint and the number of bytes in this UTF-8 char
    return code, size
end
function utf8to620(utf8str)
    local pos, result_620 = 1, {}
    while pos <= #utf8str do
        local code, size = utf8_to_unicode(utf8str, pos)
        if code then
            pos = pos + size
            code =
                (code < 128 or code == 0xA0) and code
                or (code >= 0x0E01 and code <= 0x0E3A or code >= 0x0E3F and code <= 0x0E5B)
                    and code - 0x0E5B + 0xFB
        end
        if not code then
            return utf8str -- wrong UTF-8 symbol, this is not a UTF-8 string, return the original string
        end
        table.insert(result_620, string.char(code))
    end
    return table.concat(result_620) -- return the converted string
end
Usage:
local utf8string = "UTF-8 Thai text here"
local tis620string = utf8to620(utf8string)
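To hook this into nginx, one option is an access-phase handler that rewrites the body before proxying. This is only a sketch: it assumes the OpenResty Lua module is available and that utf8to620 has already been loaded into the Lua VM (for example via init_by_lua_file), so verify the directives against your build:
location / {
    access_by_lua_block {
        ngx.req.read_body()
        local body = ngx.req.get_body_data()
        if body then
            ngx.req.set_body_data(utf8to620(body))
        end
    }
    proxy_pass http://targetwebsite;
}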
I looked up the encoding on Wikipedia and came up with the following solution for converting from UTF-8 to TIS-620. It assumes that all the codepoints in the UTF-8 string have an encoding in TIS-620. It will work if the UTF-8 string only contains ASCII printable characters (codepoints " " to "~") or Thai characters (codepoints "ก" to "๛"). Otherwise, it will give wrong and possibly very strange results.
This assumes you have Lua 5.3's utf8 library or an equivalent. If you're using an earlier version of Lua, one possibility is the pure-Lua version of the ustring library from MediaWiki (used by Wikipedia and Wiktionary, for instance). It provides a function to validate UTF-8, and many of its other functions validate strings automatically. (That is, they throw an error if the string is invalid UTF-8.) If you use that library, you just have to replace utf8.codepoint with ustring.codepoint in the code below.
-- Add this number to TIS-620 values above 0x80 to get the Unicode codepoint.
-- 0x0E00 is the first codepoint of the Thai block; 0xA0 is the corresponding
-- byte value used in TIS-620.
local difference = 0xE00 - 0xA0

function UTF8_to_TIS620(UTF8_string)
    local TIS620_string = UTF8_string:gsub(
        '[\194-\244][\128-\191]+',
        function (non_ASCII)
            return string.char(utf8.codepoint(non_ASCII) - difference)
        end)
    return TIS620_string
end
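As a side note, if you have Python available for offline testing, it ships with a tis-620 codec, which is handy for spot-checking either Lua conversion against a known-good implementation (a sketch, not part of the nginx setup):

# round-trip a Thai sample through Python's built-in codecs
utf8_bytes = "\u0e01\u0e02\u0e03".encode("utf-8")    # three Thai letters as UTF-8
tis620_bytes = utf8_bytes.decode("utf-8").encode("tis-620")
print(list(tis620_bytes))  # [161, 162, 163], i.e. 0xA1, 0xA2, 0xA3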

How to use recursive function in Lua to dynamically build a table without overwriting it each call

I have a need to crawl a set of data of indeterminate size and build a table key/value index of it. Since I don't know the dimensions in advance, it seems I have to use a recursive function. My Lua skills are very new and superficial, and I'm having difficulty understanding how to deal with returning a table from the function call.
Note this is for a Lua 5.1 script processor
API = function(tbl)
    local table_api = {}
    -- do stuff here with target data and add to table_api
    table_api["key"] = value
    -- then later there is a need to recurse deeper into target data
    table_api["lower"] = API(var)
    return table_api
end

result = API(source_data)
In almost every other language I know, there would be some way to make the recursion line table_api["lower"] = API(var) work, but since Lua passes tables by reference, my returned sub-table just keeps getting overwritten, and my result is a tiny last piece of what it should be.
Just for background on my purpose: there's a commercial application I'm working with that has a weakly documented Lua scripting interface. It's running Lua 5.1, and the API is not well documented and frequently updated. As I understand it, everything is stored in _G, so I wanted to write something to reverse engineer the API. I have a working recursive function (not shown here) that enumerates all of _G. For its return value, it just builds up an annotated string, progressively appending to it. That all works fine and is really useful, but it shows much more than the API interface; all the actual data elements are included, so I have to sift through roughly 30,000 records to determine an API set of about 500 terms. To determine the API, I am trying to use the sub-table-returning recursive function being discussed in this question. The code I'm showing here is just a small distilled subset of the larger function.
I'll go ahead and include the full code here. I was hoping to incrementally build a largish table of each API level, any sublevels, and finally whatever keys are used at the lowest level.
In the end, I was expecting to have a table that I could address like this:
result["api"]["label"]["api"]["sublabel"]["value"]["valuename"]
Full code:
tableAPIShow = function(tbl, table_track)
    table_track = table_track or {}
    local table_api = {}
    if type(tbl) == 'table' then
        -- Check if values are tables.
        local parent_table_flag = true
        for ind, val in pairs(tbl) do
            if type(val) ~= 'table' then
                parent_table_flag = false
                break
            end
        end
        -- If all children are table type, check each of them for subordinate commonality.
        local api_flag = false
        if parent_table_flag == true then
            local child_table = {}
            local child_table_flag = false
            api_flag = true
            for ind, val in pairs(tbl) do
                -- For each child table, store the names of the indexes.
                for sub_ind, sub_val in pairs(val) do
                    if child_table_flag == false then -- First time through, create a template view of a typical child table.
                        child_table[sub_ind] = true -- Store the indexes as a template table.
                    elseif child_table[sub_ind] == nil then -- Otherwise, test this child table against the reference template.
                        api_flag = false
                        break
                    end
                end
                if api_flag == false then -- need to break out of the nested loop
                    break
                end
                child_table_flag = true
            end
        end
        if api_flag == true then
            -- If everything gets to here, then this level is an API with matching child tables below.
            for ind, val in pairs(tbl) do
                if table_api["api"] == nil then
                    table_api["api"] = {}
                end
                table_api["api"][ind] = tableAPIShow(val, table_track)
            end
        else
            -- This level is not an API level; determine how to process it otherwise.
            for ind, val in pairs(tbl) do
                if type(val) == 'table' then
                    if table_track[val] ~= nil then -- Have we already recursed into this table?
                    else
                        table_track[val] = true
                        if table_api["table"] == nil then
                            table_api["table"] = {}
                        end
                        table_api["table"][ind] = tableAPIShow(val, table_track)
                    end
                else -- The children are not tables; they are values.
                    if table_api["value"] == nil then
                        table_api["value"] = {}
                    end
                    table_api["value"][ind] = val
                end
            end
        end
    else
        -- It's not a table, just return it.
        -- Probably never used, because this case is caught at the upper recursion level before the call.
        return tbl
    end
    return table_api
end
And I was calling this function in the main script like this:
local str = tableAPIShow(_G)
I've got another function that recursively shows a table, so I can look inside my results. There I see that the return value contains only the values of the top level of _G (built-in Lua functions/values are excluded because I'm only interested in the application API):
{
[value] = table: 00000000F22CB700 {
[value][_VERSION] = Application/5.8.1 (x86_64; Windows NT 10.0.16299),
[value][tableAPIShow] = "function: 00000000F22C6DE0, defined in (121-231) C:\\Users\\user\\AppData\\Local\\Temp\\APP\\/~mis00002690 ",
[value][_FINAL_VERSION] = true,
[value][Path] = ./Scripts/Database/elements/,
[value][class] = "function: 00000000F1953C40, defined in (68-81) Scripts/Common/Class.lua ",
[value][db_path] = ./Scripts/Database/,
[value][merge_all_units] = "function: 00000000F20D20C8, defined in (2242-2250) Scripts/Database/db_merge.lua ",
}
You just need to localize the variable you store your table in and it will work as you expect:
local table_api = {}
(Note that you are passing a table variable whose name conflicts with the global table variable and that is not currently used in the function.)
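The failure mode is easy to reproduce in any language when the accumulator is shared between recursive calls instead of created fresh per call. A short Python sketch of both versions (hypothetical names, with a dict standing in for the Lua table):

shared = None

def api_buggy(tbl):
    global shared
    shared = {}  # every recursion level reassigns the same shared name
    for key, val in tbl.items():
        shared[key] = api_buggy(val) if isinstance(val, dict) else val
    return shared  # after a nested call, 'shared' points at the innermost table

def api_fixed(tbl):
    result = {}  # fresh, per-call accumulator: the Lua 'local'
    for key, val in tbl.items():
        result[key] = api_fixed(val) if isinstance(val, dict) else val
    return result

print(api_buggy({"a": {"x": 1}, "b": 2}))  # corrupted, self-referencing result
print(api_fixed({"a": {"x": 1}, "b": 2}))  # {'a': {'x': 1}, 'b': 2}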
I am inclined to believe that your tableAPIShow function code is working correctly, and that the
another function that recursively shows a table
fails to serialize your table fully. Hence you don't see the deeper levels of the table returned by tableAPIShow().
I got your initial and current code (tableAPIShow) to work with my simple table serialize function: the whole _G table is exported completely and formatted as you implemented it in your tableAPIShow().
Code for testing:
apiMassiveTable = {
    api = {
        get = {
            profile = {"profileID", "format"},
            name = {"profileID", "encoding"},
            number = {"profileID", "binary"}
        },
        set = {
            name = {"apikey", "profileID", "encoding", "newname"},
            number = {"apikey", "profileID", "binary", "newnumber"}
        },
        retrieve = {}
    },
    metadata = {version = "1.4.2", build = "nightly"}
}

-- tableAPIShow implementation here
table.serialize = dofile("serialize.lua")
print(table.serialize("myNameForAHugeTable", tableAPIShow(_G)))
PS: Whatever serialize function you're using, it should enquote strings like Application/5.8.1 (x86_64; Windows NT 10.0.16299) which it does not.
Like @Paul Kulchenko said, you need to learn to use locals (https://www.lua.org/pil/4.2.html). Global variables persist until a new lua_State is loaded (a new environment; it could be a new process, depending on what interpreter you are using). So a tip is to always use local variables for anything you don't want to leak out of the function or the compilation unit.
Think of tables like dictionaries: a word is attached to a definition. Thus the definition is the data.
What I think you are trying to do is serialization of a table of data. However, that isn't really necessary. You can either shallow copy or deep copy the given table. A shallow copy is one that doesn't delve into the depths of tables found in the keys, etc. A deep copy is one that copies tables in keys of tables in keys of tables... etc.
local shallow_copy = function(tab)
    local rep_tab = {}
    for index, value in pairs(tab) do
        rep_tab[index] = value
    end
    return rep_tab
end

-- Because the variable is not defined and assigned on a single line,
-- a forward declaration has to exist so that deep_copy can refer to
-- itself recursively.
local deep_copy
deep_copy = function(tab)
    local rep_tab = {}
    for index, value in pairs(tab) do
        if type(value) == "table" then
            rep_tab[index] = deep_copy(value)
        else
            rep_tab[index] = value
        end
    end
    return rep_tab
end
Deco's deepcopy.lua
https://gist.github.com/Deco/3985043
You can also index tables using periods:
local tab = {}
tab.abc = 123
tab["def"] = 456
print(tab.abc, tab["def"])
To serialize the entire _G, you would just filter out the junk you don't need and recurse into each table encountered. Watch out for _G, package, and _ENV, because if defined they will recurse back to the start.
-- use cap as a whitelist of value types to keep
enumerate_dir = function(dir, cap)
    local base = {}
    for i, v in pairs(dir) do
        -- skip trouble
        if i ~= "_G" and i ~= "_ENV" and i ~= "package" then
            if type(v) == "table" then -- if we have a table, recurse into it
                base[i] = enumerate_dir(v, cap)
            else
                for k, n in pairs(cap) do
                    if type(v) == n then -- if whitelisted, keep a printable form
                        base[i] = tostring(v)
                    end
                end
            end
        end
    end
    return base
end

-- only functions and tables from _G, please
local enumeration = enumerate_dir(_G, {"function", "table"})

-- do some cool quips to get a basic tree display
prefix = ""
recursive_print = function(dir)
    for i, v in pairs(dir) do
        if type(v) == "table" then
            print(prefix, i, v)
            prefix = prefix .. ">"
            recursive_print(v)
        else
            print(prefix, i, v)
        end
    end
    if #prefix > 0 then
        prefix = prefix:sub(1, #prefix - 1)
    end
end

recursive_print(enumeration)

How to define empty IndexedTables in Julia?

I am unable to define empty IndexedTables, e.g.
using IndexedTables, IndexedTables.Table
t = Table(Columns(a=Int64[],b=String[]),Int64[])
t[1,"a"] = 1
t[1,"b"] = 2
t[1,"c"] = t[1,"a"] + t[1,"b"]
BoundsError: attempt to access 0-element Array{Int64,1} at index [0]
I am aware that creating an IndexedTable with the data already in place is more efficient than creating an empty one and then inserting values, but sometimes you are obliged to work this way.
Is this a bug? If so, is there any workaround possible?
(I already posted this on the Julia forum, but so far I have had no replies there.)
This is probably a bug in IndexedTables.
Inserting into an IndexedTable requires reindexing before the data can be accessed. Reindexing is done with flush!.
But flush!(t) fails with the empty t from the example in the question.
Fixing flush!, which calls _merge!, can be done with:
julia> function IndexedTables._merge!(dst::IndexedTable, src::IndexedTable, f)
           if length(dst.index) == 0 || isless(dst.index[end], src.index[1])
               append!(dst.index, src.index)
               append!(dst.data, src.data)
           else
               # merge to a new copy
               new = _merge(dst, src, f)
               ln = length(new)
               # resize and copy data into dst
               resize!(dst.index, ln)
               copy!(dst.index, new.index)
               resize!(dst.data, ln)
               copy!(dst.data, new.data)
           end
           return dst
       end
julia> t[1,"c"] = t[1,"a"] + t[1,"b"]
3
The change is the addition of the length(...) check in the first if.
Of course, a pull request / issue should be opened with IndexedTables.jl. Antonello, will you do this? (or shall I)

Simplifying device creation in sip.conf

I often have to define many similar devices in sip.conf like this:
[device](!)
; setting some parameters
[device01](device)
callerid=dev01 <01>
[device02](device)
callerid=dev02 <02>
; ...
[deviceXX](device)
callerid=devXX <XX>
The question is: perhaps I could avoid setting device-name-specific parameters by using some variable like the following?
[device](!)
callerid=dev${DEVICE_NAME:-2} <${DEVICE_NAME:-2}>
; setting some parameters
[device01](device)
[device02](device)
; ...
[deviceXX](device)
P.S.
It would be perfect if there were some device constructor, so I could reduce the script to the following, but I think that is not possible in Asterisk.
[device](!)
callerid=dev${DEVICE_NAME:-2} <${DEVICE_NAME:-2}>
; setting some parameters
;[device${MAGIC_LOOP(1,XX,leading_zeroes)}](device)
I've had good results writing a small program that takes care of it. It checks for a line saying something like
------- Automatically generated -------
and whatever is after that line is regenerated as soon as the program detects that there are new values for it (they could come from a database or from a text file). Then I run it with supervisor, and it checks every XX seconds whether there are changes.
If there are changes, it updates the sip.conf file and then issues a sip reload command.
I wrote it in Python, but whatever language you feel comfortable with should work just fine.
That's how I managed it, and it has been working fine so far (after a couple of months). I'd be extremely interested in learning about other approaches, though. It's basically this (called from another script with supervisor):
# assumed imports; get_users_logic() and the settings dict are defined
# elsewhere in the real script
import hashlib
import os
import subprocess
from functools import reduce

users = get_users_logic()
# get the data that will be used for the sip.conf file
data_to_be_hashed = reduce(
    lambda x, y: x + y,
    map(lambda x: x['username'] + x['password'] + x['company_prefix'], users))
m = hashlib.md5()
m.update(str(data_to_be_hashed).encode("ascii"))
new_md5 = m.hexdigest()

last_md5 = None
try:
    file = open(os.path.dirname(os.path.realpath(__file__)) + '/lastMd5.txt', 'r')
    last_md5 = file.read().rstrip()
    file.close()
except:
    pass

# if it changed...
if new_md5 != last_md5:
    # needs update
    with open(settings['asterisk']['path_to_sip_conf'], 'r') as file:
        sip_content = file.read().rstrip()
    parts = sip_content.split(";-------------- BEYOND THIS POINT IT IS AUTO GENERATED --------------;")
    sip_content = parts[0].rstrip()
    sip_content += "\n\n;-------------- BEYOND THIS POINT IT IS AUTO GENERATED --------------;\n\n"
    for user in users:
        m = hashlib.md5()
        m.update(("%s:sip.ellauri.it:%s" % (user['username'], user['password'])).encode("ascii"))
        md5secret = m.hexdigest()
        sip_content += "[%s]\ntype = friend\ncontext = %sLocal\nmd5secret = %s\nhost = dynamic\n\n" % (
            user['username'], user['company_prefix'], md5secret)

    # write the sip.conf file
    f = open(settings['asterisk']['path_to_sip_conf'], 'w')
    print(sip_content, file=f)
    f.close()
    subprocess.call('asterisk -x "sip reload"', shell=True)

    # write the new md5
    f = open(os.path.dirname(os.path.realpath(__file__)) + '/lastMd5.txt', 'w')
    print(new_md5, file=f)
    f.close()

Why are string lengths different between JavaScript and VB.NET (posted back via ASP.NET form)?

When I trim a multi-line textarea in JavaScript to a certain length (JS: .substring(0, x)), and that field is then posted back, checking the length in VB.NET will still find a length greater than the trim length from JavaScript (VB: .Length > x).
I have already determined this was a problem with line breaks, but I wanted to make sure no one else had to spend so long finding the answer (apparently it also applies to some implementations of JSP).
Somewhere in the whole ASP.NET scheme of things, a multi-line value is being massaged from the land of "\n" (vbLf) line breaks into the land of "\r\n" (vbCrLf) line breaks. This difference in line breaks is the reason the lengths do not agree. Here is the simple way of addressing it in VB.NET (though a regex could probably do it too):
SomeString = SomeString.Replace(vbCrLf, vbCr)
Handling it in VB.NET opens me up to potential duplication and would still make it easy to miss this logic when someone adds another textarea; handling it in JavaScript has the same problem. Is there some way to keep VB.NET/ASP.NET from handling line breaks this way, or is there some better way of making this a non-issue? The answer to that best-practices question would definitely be the correct answer to this one.
The culprit seems to be an internal type in System.Web.dll: System.Web.HttpMultipartContentTemplateParser. Using Reflector, I found this code:
private bool GetNextLine()
{
    int num = this._pos;
    this._lineStart = -1;
    while (num < this._length)
    {
        if (this._data[num] == 10)
        {
            this._lineStart = this._pos;
            this._lineLength = num - this._pos;
            this._pos = num + 1;
            if ((this._lineLength > 0) && (this._data[num - 1] == 13))
            {
                this._lineLength--;
            }
            break;
        }
        if (++num == this._length)
        {
            this._lineStart = this._pos;
            this._lineLength = num - this._pos;
            this._pos = this._length;
        }
    }
    return (this._lineStart >= 0);
}
Note some of the magic numbers, especially 10 and 13. These are vbLf and vbCr, respectively. It seems to me that this code is processing the raw bytes that come in from the post, and 'a line' is considered to be anything ending with vbLf (10).
As the raw bytes are parsed (see the ParsePartData method, too), the complexities of vbCr and vbLf are cleaned out.
Ultimately, then, I think it's safe to just replace CrLF with LF again.
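A quick illustration of the mismatch and the normalization (sketched in Python for brevity; the same arithmetic applies to the VB.NET fix above):

js_value = "line1\nline2"                  # what JavaScript measures: 11 chars
posted = js_value.replace("\n", "\r\n")    # what the server receives: 12 chars
assert len(posted) == len(js_value) + js_value.count("\n")
normalized = posted.replace("\r\n", "\n")  # normalize back to LF-only
assert len(normalized) == len(js_value)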
Welcome to the land of Unix line endings versus Windows OS based line endings.
Doesn't IIS have some sort of HTTP request filter or even a configuration option to not modify the request as it comes in? That would be the best answer.
Otherwise, search and replace is your best answer.
