I recently came across Kaitai Struct for dealing with arbitrary binary formats. The thing is, I have a hex dump: a file that I want to parse whose contents are in hex (text) form. When I use the visualizer in the Kaitai Web IDE to map the data, it converts the hex text into hex again. Is there any way to convert the data from hex first, so that the visualizer shows the exact hex data?
For example, consider this:
3335363330
and then it maps it again to 33 33 33 35 33 36 33 33 33 30.
Thanks in advance.
Currently, neither the Kaitai Web IDE nor the console visualizer (ksv) supports reading hex-encoded files; they only read raw binary files.
The solution is to convert the hex-encoded (text) file to a binary one first and then load the binary file into Kaitai.
You can do this by calling xxd -r -p <input_file >output_file on Linux, or e.g. by running this small Python script: python -c "open('output_file','wb').write(open('input_file','r').read().strip().decode('hex'))". The latter works on any machine where Python 2 is installed.
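Since Python 2 is end of life, here is a rough Python 3 equivalent (a minimal sketch; the file names are placeholders, and bytes.fromhex() ignores whitespace between bytes on Python 3.7+):
with open('input_file') as src, open('output_file', 'wb') as dst:
    dst.write(bytes.fromhex(src.read()))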
I'm trying to perform joins in SQLite on Hebrew words including vowel points and cantillation marks and it appears that the sources being joined built the components in different orders, such that the final strings/words appear identical on the screen but fail to match when they should. I'm pretty sure all sources are UTF-8.
I don't see a built-in method of Unicode normalization in SQLite, which would be the easiest solution; but I found this link on Tcl and Unicode, though it looks a bit old, referring to Tcl 8.3 and Unicode 1.0. Is this the most up-to-date method of normalizing Unicode in Tcl, and is it appropriate for Hebrew?
If Tcl doesn't have a viable method for Hebrew, is there a preferred scripting language for handling Hebrew that could be used to generate normalized strings for joining? I'm using Manjaro Linux but am a bit of a novice at most of this.
I'm capable enough with JavaScript, browser extensions, and the SQLite C API to pass the data from C to the browser to be normalized and back again to be stored in the database; but I figured there is likely a better method. I mention the browser because I assume browsers are kept most up to date, for obvious reasons.
Thank you for any guidance you may be able to provide.
I used the following code in an attempt to turn the procedure provided by @Donal Fellows into an SQLite function, so as to come close to not bringing the data into Tcl at all. I'm not sure how SQLite functions really work in that respect, but that is why I tried it. I used the foreach loop solely to print some indication that the query was running and progressing, because it took about an hour to complete.
However, that's probably pretty good for my ten-year-old machine, considering that in that hour it ran on 1) the Hebrew with vowel points, 2) the Hebrew with vowel points and cantillation marks, and 3) the Septuagint translation of the Hebrew, for all thirty-nine books of the Old Testament, plus two different manuscripts of Koine Greek for all twenty-seven books of the New Testament.
I still have to run the normalization on the other two sources to know how effective this is overall; however, after running it on this one, which is the most involved of the three, I ran the joins again and the number of matches nearly doubled.
proc normalize {string {form nfc}} {
exec uconv -f utf-8 -t utf-8 -x "::$form;" << $string
}
# Arguments are: dbws function NAME ?SWITCHES? SCRIPT
dbws function normalize -returntype text -deterministic -directonly { normalize }
foreach { b } { 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 } {
puts "Working on book $b"
dbws eval { update src_original set uni_norm = normalize(original) where book_no=$b }
puts "Completed book $b"
}
If you're not in a hurry, you can pass the data through uconv. You'll need to be careful when working with non-normalized data though; Tcl's pretty aggressive about format conversion on input and output. (Just… not about normalization; the normalization tables are huge and most code doesn't need to do it.)
proc normalize {string {form nfc}} {
exec uconv -f utf-8 -t utf-8 -x "::$form;" << $string
}
The code above only really works on systems where the system encoding is UTF-8… but that's almost everywhere that has uconv in the first place.
Useful normalization forms are: nfc, nfd, nfkc and nfkd. Pick one and force all your text to be in it (ideally on ingestion into the database… but I've seen so many broken DBs in this regard that I suggest being careful).
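As a quick illustration of why this helps with the joins (a hypothetical check, assuming the normalize proc above and a UTF-8 system encoding): the same Hebrew letter with the same marks attached in a different order compares unequal until it is normalized, because normalization puts combining marks into canonical order.
# bet + qamats + etnahta, with the two marks attached in different orders
set a "\u05D1\u05B8\u0591"
set b "\u05D1\u0591\u05B8"
puts [string equal $a $b]                              ;# 0 -- byte sequences differ
puts [string equal [normalize $a] [normalize $b]]      ;# 1 -- canonical ordering makes them match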
I have an RDS file that is uploaded and then downloaded via curl::curl_fetch_memory() (via httr) - this gives me a raw vector in R.
Is there a way to read that raw vector representing the RDS file to return the original R object? Or does it always have to be written to disk first?
I have a setup similar to below:
saveRDS(mtcars, file = "obj.rds")
# upload the obj.rds file
...
# download it again via httr::write_memory()
...
obj
# [1] 1f 8b 08 00 00 00 00 00 00 03 ad 56 4f 4c 1c 55 18 1f ca 02 bb ec b2 5d
# ...
is.raw(obj)
#[1] TRUE
It seems readRDS() should be used to uncompress it, but it takes a connection object and I don't know how to make a connection object from an R raw vector - rawConnection() looked promising but gave:
rawConnection(obj)
#A connection with
#description "obj"
#class "rawConnection"
#mode "r"
#text "binary"
#opened "opened"
#can read "yes"
#can write "no"
readRDS(rawConnection(obj))
#Error in readRDS(rawConnection(obj)) : unknown input format
Looking through readRDS, it looks like it uses gzlib() underneath, but I couldn't get that to work with the raw vector object.
If it's downloaded via httr::write_disk() -> curl::curl_fetch_disk() -> readRDS() then it all works, but that's a round trip to disk, and I wondered if it could be optimised for big files.
By default, RDS file streams are gzipped. To read from a raw connection you need to manually wrap it in a gzcon:
con = rawConnection(obj)
result = readRDS(gzcon(con))
This works even when the stream isn't gzipped, but it fails if a different supported compression method (e.g. 'bzip2' or 'xz') was used to create the RDS file. Unfortunately, R doesn't seem to have a gzcon equivalent for bzip2 or xz; for those formats, the only recourse seems to be to write the data to disk.
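A self-contained way to test this (a sketch that reuses the question's own setup; the readBin step just stands in for the raw vector you get back from curl/httr):
# recreate a raw vector like the downloaded one
saveRDS(mtcars, file = "obj.rds")
obj <- readBin("obj.rds", what = "raw", n = file.size("obj.rds"))
is.raw(obj)
# [1] TRUE

# deserialize entirely in memory
con <- gzcon(rawConnection(obj))
result <- readRDS(con)
close(con)
identical(result, mtcars)
# [1] TRUE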
I had exactly the same problem, and for me the above answer with gzcon did not work. However, I could load the raw object directly into R's memory using rawConnection:
load(rawConnection(obj))
I need to programmatically inspect the library dependencies of a given executable. Is there a better way than running the ldd (or objdump) commands and parsing their output? Is there an API available which gives the same results as ldd?
I need to programmatically inspect the library dependencies of a given executable.
I am going to assume that you are using an ELF system (probably Linux).
Dynamic library dependencies of an executable or a shared library are encoded as a table of Elf{32,64}_Dyn entries in the PT_DYNAMIC segment of the library or executable. ldd (indirectly, but that's an implementation detail) interprets these entries and then uses various details of the system configuration and/or the LD_LIBRARY_PATH environment variable to locate the needed libraries.
You can print the contents of PT_DYNAMIC with readelf -d a.out. For example:
$ readelf -d /bin/date
Dynamic section at offset 0x19df8 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x3000
0x000000000000000d (FINI) 0x12780
0x0000000000000019 (INIT_ARRAY) 0x1a250
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x1a258
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x308
0x0000000000000005 (STRTAB) 0xb38
0x0000000000000006 (SYMTAB) 0x358
0x000000000000000a (STRSZ) 946 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x1b000
0x0000000000000002 (PLTRELSZ) 1656 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x2118
0x0000000000000007 (RELA) 0x1008
0x0000000000000008 (RELASZ) 4368 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffb (FLAGS_1) Flags: PIE
0x000000006ffffffe (VERNEED) 0xf98
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0xeea
0x000000006ffffff9 (RELACOUNT) 170
0x0000000000000000 (NULL) 0x0
This tells you that the only library needed for this binary is libc.so.6 (the NEEDED entry).
If your real question is "which other libraries does this ELF binary require", then that is pretty easy to obtain: just look for DT_NEEDED entries in the dynamic section. Doing this programmatically is rather easy:
Locate the table of program headers (the ELF file header .e_phoff tells you where it starts).
Iterate over them to find the one whose .p_type is PT_DYNAMIC.
That segment contains a set of fixed sized Elf{32,64}_Dyn records.
Iterate over them, looking for ones with .d_tag == DT_NEEDED.
Voila.
P.S. There is a bit of a complication: the strings, such as libc.so.6, are not part of PT_DYNAMIC. But there is a pointer to where they are in the .d_tag == DT_STRTAB entry. See this answer for example code.
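For illustration, here is a rough sketch of those steps in Python (my own example, not taken from the linked answer): it assumes a 64-bit little-endian ELF, does no error handling, and translates the DT_STRTAB virtual address to a file offset via the PT_LOAD headers. For real use, libelf or a library such as pyelftools is a better idea.
import struct, sys

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5

def needed_libs(path):
    with open(path, "rb") as f:
        data = f.read()

    # ELF64 header: e_phoff at offset 0x20, e_phentsize/e_phnum at 0x36/0x38
    e_phoff, = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    # Walk the program headers: remember PT_LOAD segments (for the
    # virtual-address-to-file-offset translation) and find PT_DYNAMIC.
    loads, dynamic = [], None
    for i in range(e_phnum):
        p_type, _flags, p_offset, p_vaddr, _pa, p_filesz, _msz, _align = \
            struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)

    def to_file_offset(vaddr):
        for v, o, sz in loads:
            if v <= vaddr < v + sz:
                return o + (vaddr - v)
        raise ValueError("virtual address not backed by the file")

    # PT_DYNAMIC is an array of Elf64_Dyn { Elf64_Sxword d_tag; Elf64_Xword d_val; }
    dyn_off, dyn_sz = dynamic
    needed, strtab_vaddr = [], None
    for off in range(dyn_off, dyn_off + dyn_sz, 16):
        d_tag, d_val = struct.unpack_from("<qQ", data, off)
        if d_tag == DT_NULL:
            break
        if d_tag == DT_NEEDED:
            needed.append(d_val)            # offset into the dynamic string table
        elif d_tag == DT_STRTAB:
            strtab_vaddr = d_val            # virtual address of the string table

    # Resolve each DT_NEEDED offset to a NUL-terminated name in the string table.
    strtab = to_file_offset(strtab_vaddr)
    return [data[strtab + n:data.index(b"\x00", strtab + n)].decode() for n in needed]

if __name__ == "__main__":
    print(needed_libs(sys.argv[1]))   # e.g. ['libc.so.6'] for /bin/date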
For instance, Epson's TM-T88V ESC/POS manual looks like this:
I need to supply my printer with a buffer that contains the FS C code to change the code page.
# doesn't work; the actual code page number just gets interpreted literally
\x1C \x43 0
\x1C \x430
How do you read ESC/POS manuals?
An ESC/POS printer accepts data as a series of bytes. In your example, to select Kanji characters you need to send the three bytes 1C 43 00; the printer will then execute the command.
However, a command is usually sent as part of a sequence: you first initialize the printer, then send your commands and data, and end with a cut command.
For example:
Initialize the printer: 1B 40
Switch to standard mode: 1B 53
Your command: 1C 43 00
Your data.
Print command: 0A
Finally, cut the paper: 1D 56 00
Your printer's programming manual should detail these steps; a byte-level sketch follows below.
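For instance, in Python the sequence above could be built and sent like this (a sketch only; the device path /dev/usb/lp0 is an assumption, so use whatever interface your printer actually exposes):
ESC, FS, GS = b"\x1b", b"\x1c", b"\x1d"

data = (
    ESC + b"@"              # initialize the printer      (1B 40)
    + ESC + b"S"            # switch to standard mode     (1B 53)
    + FS + b"C" + b"\x00"   # FS C 0 -- the command from the question (1C 43 00)
    + b"Hello, printer"     # your data
    + b"\n"                 # print and line feed         (0A)
    + GS + b"V" + b"\x00"   # full cut                    (1D 56 00)
)

with open("/dev/usb/lp0", "wb") as printer:   # hypothetical raw printer device
    printer.write(data)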
So I am trying to compare a binary file I make when I compile with gcc to a sample executable that is provided. I used the diff command like this:
diff asgn2 sample-asgn2
Binary files asgn2 and sample-asgn2 differ
Is there any way to see how they differ, instead of it just reporting that they differ?
Do a hex dump of the two binaries using hexdump. Then you can compare the hex dumps using your favorite diffing tool, like kdiff3, tkdiff, xxdiff, etc.
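For example, with the files from the question (hexdump's -v flag keeps repeated lines from being collapsed into a *, which would confuse the diff):
hexdump -v -C asgn2 > asgn2.hex
hexdump -v -C sample-asgn2 > sample-asgn2.hex
diff asgn2.hex sample-asgn2.hex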
Why don't you try Vbindiff? It probably does what you want:
Visual Binary Diff (VBinDiff) displays files in hexadecimal and ASCII (or EBCDIC). It can also display two files at once, and highlight the differences between them. Unlike diff, it works well with large files (up to 4 GB).
Where to get Vbindiff depends on which operating system you are using. If you're on Ubuntu or another Debian derivative, apt-get install vbindiff.
I'm using Linux. In my case, I need the -q option to get just the output you showed:
diff -q file1 file2
Without the -q option it will show which lines differ and display them.
You can check man diff to find the right options for your flavour of UNIX.
vbindiff only does a byte-to-byte comparison. If there is just one byte added or deleted, it will mark all subsequent bytes as changed...
Another approach is to transform the binary files into text files so they can be compared with the text diff algorithm.
colorbindiff.pl is a simple, open-source Perl script which uses this method and shows a colored side-by-side comparison, like a text diff. It highlights byte changes, additions and deletions. It's available on GitHub.