I have this piece of code:
void SHAPresenter::hashData(QString data)
{
    QCryptographicHash* newHash = new QCryptographicHash(QCryptographicHash::Sha3_224);
    newHash->addData(data.toUtf8());
    QByteArray hashResultByteArray = newHash->result();
    setHashedData(QString(hashResultByteArray.toHex()));
    delete newHash;
}
According to the Qt documentation, QCryptographicHash::Sha3_224 should "generate an SHA3-224 hash sum. Introduced in Qt 5.1". I wanted to compare the result of that code against another source to check whether I'm feeding the data in correctly. I found this site: https://emn178.github.io/online-tools/sha3_224.html
So we have SHA3-224 in both cases. The problem is that the first generates this byte string from "test":
3be30a9ff64f34a5861116c5198987ad780165f8366e67aff4760b5e
And the second:
3797bf0afbbfca4a7bbba7602a2b552746876517a7f9b7ce2db0ae7b
Not similar at all. But there is also a site that does "Keccak-224":
https://emn178.github.io/online-tools/keccak_224.html
And here the result is:
3be30a9ff64f34a5861116c5198987ad780165f8366e67aff4760b5e
I know that SHA-3 is based on Keccak - but what is the issue here? Which of these two implementations follows NIST FIPS 202 properly, and how do we know that?
I'm writing a Keccak library for Java at the moment, so I had the toys handy to test an initial suspicion.
First a brief summary. Keccak is a sponge function which can take a number of parameters (bitrate, capacity, domain suffix, and output length). SHA-3 is simply a subset of Keccak where these values have been chosen and standardised by NIST (in FIPS PUB 202).
In the case of SHA3-224, the parameters are as follows:
bitrate: 1152
capacity: 448
domain suffix: "01"
output length: 224 (hence the name SHA3-224)
The important thing to note is that the domain suffix is a bitstring which gets appended after the input message and before the padding. The domain suffix is an optional way to differentiate different applications of the Keccak function (such as SHA3, SHAKE, RawSHAKE, etc). All SHA3 functions use "01" as a domain suffix.
Based on the documentation, I get the impression that Keccak initially had no domain suffix concept, and the known-answer tests provided by the Keccak team require that no domain suffix is used.
So, to your problem. If we take the string "test" and convert it to a byte array using ASCII or UTF-8 encoding (because Keccak works on binary, so text must be converted into bytes or bits first, and it's therefore important to decide which character encoding to use), and then feed it to a true SHA3-224 hash function, we'll get the following result (represented in hexadecimal, 16 bytes to a line for easy reading):
37 97 BF 0A FB BF CA 4A 7B BB A7 60 2A 2B 55 27
46 87 65 17 A7 F9 B7 CE 2D B0 AE 7B
SHA3-224 can be summarised as Keccak[1152, 448](M || "01", 224) where the M || "01" means "append 01 after the input message and before multi-rate padding".
However, without a domain suffix we get Keccak[1152, 448](M, 224) where the lonesome M means that no suffix bits are appended, and the multi-rate padding will begin immediately after the input message. If we feed your same input "test" message to this Keccak function which does not use a domain suffix then we get the following result (again in hex):
3B E3 0A 9F F6 4F 34 A5 86 11 16 C5 19 89 87 AD
78 01 65 F8 36 6E 67 AF F4 76 0B 5E
This matches the output you got from Qt, which indicates that Qt's function is not actually SHA3-224.
Which all means that the difference in output you are seeing is explained entirely by the presence or absence of a domain suffix of "01" (which was my immediate suspicion on reading your question). Anything which claims to be SHA3 must use a "01" domain suffix, so be very wary of tools which behave differently. Check the documentation carefully to make sure that they don't require you to specify the desired domain suffix when creating/using the object or function, but anything which claims to be SHA3 really should not make it possible to forget the suffix bits.
This is a bug in Qt; it was reported and fixed in Qt 5.9.
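If you are on a fixed Qt, you can compute both digests side by side and compare them with the two web tools. This is only a sketch, and it assumes the separate Keccak_224 enum value, which I believe arrived around Qt 5.9.2 alongside the fix:
#include <QCryptographicHash>
#include <QDebug>

int main()
{
    const QByteArray data = QByteArrayLiteral("test");

    // Proper NIST SHA3-224: the "01" domain suffix is appended before padding.
    qDebug() << QCryptographicHash::hash(data, QCryptographicHash::Sha3_224).toHex();
    // expect: 3797bf0afbbfca4a7bbba7602a2b552746876517a7f9b7ce2db0ae7b

    // Original Keccak-224: no domain suffix (what older Qt produced for Sha3_224).
    qDebug() << QCryptographicHash::hash(data, QCryptographicHash::Keccak_224).toHex();
    // expect: 3be30a9ff64f34a5861116c5198987ad780165f8366e67aff4760b5e

    return 0;
}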
Related
I'm using RStudio to get data from an ESP32.
I got this error:
simpleError: parse error: trailing garbage
9c ac fb 3f 9c ac fb 3f 6c 36
My code looks like this:
str(req$bodyRaw)
print("\n")
data <- jsonlite::parse_json(req$bodyRaw)
print(data)
The short answer is that parse_json requires a "string with literal json or connection object to read from", not a raw vector. From the information provided I have no idea what the structure of req is, but often there is another property, such as body, that holds the character representation of the body. Alternatively: jsonlite::parse_json(rawToChar(req$bodyRaw))
I'm trying to perform joins in SQLite on Hebrew words including vowel points and cantillation marks and it appears that the sources being joined built the components in different orders, such that the final strings/words appear identical on the screen but fail to match when they should. I'm pretty sure all sources are UTF-8.
I don't see a built-in method of Unicode normalization in SQLite, which would be the easiest solution; but I found this link on Tcl and Unicode, though it looks a bit old, using Tcl 8.3 and Unicode 1.0. Is this the most up-to-date method of normalizing Unicode in Tcl, and is it appropriate for Hebrew?
If Tcl doesn't have a viable method for Hebrew, is there a preferred scripting language for handling Hebrew that could be used to generate normalized strings for joining? I'm using Manjaro Linux but am a bit of a novice at most of this.
I'm capable enough with JavaScript, browser extensions, and the SQLite C API to pass the data from C to the browser to be normalized and back again to be stored in the database; but I figured there is likely a better method. I refer to the browser because I assume browsers are kept most up to date for obvious reasons.
Thank you for any guidance you may be able to provide.
I used the following code in an attempt to make the procedure provided by @DonalFellows into a SQLite function, so that it came as close as possible to not bringing the data into Tcl. I'm not sure how SQLite functions really work in that respect, but that is why I tried it. I used the foreach loop solely to print some indication that the query was running and progressing, because it took about an hour to complete.
However, that's probably pretty good for my ten-year-old machine, given that in that hour it ran on 1) the Hebrew with vowel points, 2) the Hebrew with vowel points and cantillation marks, and 3) the Septuagint translation of the Hebrew, for all thirty-nine books of the Old Testament, and then on two different manuscripts of Koine Greek for all twenty-seven books of the New Testament.
I still have to run the normalization on the other two sources to know how effective this is overall; however, after running it on this one, which is the most involved of the three, I ran the joins again and the number of matches nearly doubled.
proc normalize {string {form nfc}} {
exec uconv -f utf-8 -t utf-8 -x "::$form;" << $string
}
# Arguments are: dbws function NAME ?SWITCHES? SCRIPT
dbws function normalize -returntype text -deterministic -directonly { normalize }
foreach { b } { 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 } {
puts "Working on book $b"
dbws eval { update src_original set uni_norm = normalize(original) where book_no=$b }
puts "Completed book $b"
}
If you're not in a hurry, you can pass the data through uconv. You'll need to be careful when working with non-normalized data though; Tcl's pretty aggressive about format conversion on input and output. (Just… not about normalization; the normalization tables are huge and most code doesn't need to do it.)
proc normalize {string {form nfc}} {
exec uconv -f utf-8 -t utf-8 -x "::$form;" << $string
}
The code above only really works on systems where the system encoding is UTF-8… but that's almost everywhere that has uconv in the first place.
Useful normalization forms are: nfc, nfd, nfkc and nfkd. Pick one and force all your text to be in it (ideally on ingestion into the database… but I've seen so many broken DBs in this regard that I suggest being careful).
I have an RDS file that is uploaded and then downloaded via curl::curl_fetch_memory() (via httr) - this gives me a raw vector in R.
Is there a way to read that raw vector representing the RDS file to return the original R object? Or does it always have to be written to disk first?
I have a setup similar to below:
saveRDS(mtcars, file = "obj.rds")
# upload the obj.rds file
...
# download it again via httr::write_memory()
...
obj
# [1] 1f 8b 08 00 00 00 00 00 00 03 ad 56 4f 4c 1c 55 18 1f ca 02 bb ec b2 5d
# ...
is.raw(obj)
#[1] TRUE
It seems readRDS() should be used to uncompress it, but it takes a connection object and I don't know how to make a connection object from an R raw vector - rawConnection() looked promising but gave:
rawConnection(obj)
#A connection with
#description "obj"
#class "rawConnection"
#mode "r"
#text "binary"
#opened "opened"
#can read "yes"
#can write "no"
readRDS(rawConnection(obj))
#Error in readRDS(rawConnection(obj)) : unknown input format
Looking through readRDS, it looks like it uses gzlib() underneath, but I couldn't get that to work with the raw vector object.
If it's downloaded via httr::write_disk() -> curl::curl_fetch_disk() -> readRDS() then it's all good, but this is a round trip to disk, and I wondered if it could be optimised for big files.
By default, RDS file streams are gzipped. To read a raw connection you need to manually wrap it into a gzcon:
con = rawConnection(obj)
result = readRDS(gzcon(con))
This works even when the stream isn't gzipped. Unfortunately, it fails if a different supported compression method (e.g. 'bzip2') was used to create the RDS file, and R doesn't seem to have a gzcon equivalent for bzip2 or xz. For those formats, the only recourse seems to be to write the data to disk.
I had exactly the same problem, and for me the above answer with gzcon did not work. However, I could load the raw object directly into R's memory using rawConnection:
load(rawConnection(obj))
Take Epson's TM-T88V ESC/POS manual, for instance.
I need to supply my printer with a buffer that contains the FS C code to change the code page.
# doesn't work; the actual code page number just gets interpreted literally
\x1C \x43 0
\x1C \x430
How do you read ESC/POS manuals?
ESC/POS printers accept data as a series of bytes. In your example, to select Kanji characters you need to send three bytes, 1C 43 00, and the printer will then execute the command.
However, before you send a command to an ESC/POS printer you need to send a series of commands first, and then end with a cut command.
For example:
Initialize the printer: 1B 40
Switch to standard mode: 1B 53
Your command: 1C 43 00
Your data.
Print command: 0A
Finally, cut the paper: 1D 56 00
Your printer's programming manual should detail these steps.
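The crucial detail in your original attempt is that the parameter must be the raw byte value 0 (0x00), not the ASCII character '0' (0x30). Here is a minimal C++ sketch of assembling such a buffer following the steps above; the "Hello" payload and writing to stdout are just placeholders for your own data and printer device:
#include <cstdio>
#include <string>

int main()
{
    std::string buf;
    buf += "\x1B\x40";              // initialize the printer (ESC @)
    buf += "\x1B\x53";              // switch to standard mode (ESC S)
    buf.append("\x1C\x43\x00", 3);  // FS C with a raw 0x00 parameter byte,
                                    // NOT the ASCII character '0' (0x30)
    buf += "Hello";                 // your data
    buf += "\x0A";                  // print command (LF)
    buf.append("\x1D\x56\x00", 3);  // cut the paper (GS V 0)

    // Send the raw bytes to the printer (here just stdout for illustration).
    std::fwrite(buf.data(), 1, buf.size(), stdout);
    return 0;
}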
I have a data dump from a database used by XMap.
It looks like XMap stores the Lat/Lng as a Hex value. Can anyone help me decode these?
I was also able to use Xmap to upload some of my own data and see how it converted that to Hex. I am just unable to do a bulk export with the version of Xmap that I have.
Long      Lat     Hex
-100.00   35.00   0000004E 0000806E
-101.00   35.00   0000804D 0000806E
-101.1    35.1    3333734D 3333736E
-101.2    35.2    6666664D 6666666E
Lat Lon Hex
35.21285737 -98.44795716 0x57A9C64E17C1646E
35.21305335 -98.44786274 0x6FACC64EABBA646E
35.94602108 -96.74434793 0x35B9A04FC8E8066E
34.89283431 -99.037117 0xC03F7B4E9BB78D6E
34.89300668 -99.03754044 0xE0317B4EF5B18D6E
34.41109633 -100.2820795 0xD2E4DB4D3261CB6E
33.97470069 -101.2196311 0x21E3634D023D036F
34.0079211 -101.1440331 0x53906D4D71FCFE6E
32.76227534 -104.2691193 0x808DDD4BC36D9E6F
32.77947819 -104.204128 0x22DFE54B0F3A9C6F
32.77947819 -104.204128 0x22DFE54B0F3A9C6F
32.64307308 -104.5322441 0x6DDFBB4BC8AFAD6F
32.64290345 -104.531814 0x85EDBB4B57B5AD6F
32.47907078 -104.5652282 0x9AA6B74BCFADC26F
32.47907078 -104.5652282 0x9AA6B74BCFADC26F
32.22682178 -101.3943434 0x28864D4D81F7E26F
32.07237184 -101.8558813 0x7B72124D85BCF66F
31.89574015 -102.4611448 0x35F9C44C63580D70
31.8808713 -102.4563805 0x5395C54C9C3F0F70
31.18487537 -101.1440152 0xE9906D4D01566870
31.28633738 -100.4128259 0x8528CB4D4C595B70
31.0369339 -100.5050048 0x015CBF4DC0457B70
30.83263898 -100.6261411 0x9CDAAF4D166C9570
So this exact problem came up at work just last night, and I spent a few hours decoding and converting the information.
It appears that the lat and long are stored as 32-bit little-endian chunks (read more about endianness on Wikipedia).
From your example 35.21285737 -98.44795716 0x57A9C64E17C1646E converts as follows:
Split to 32 bit sections --> 0x57A9C64E (lng) 0x17C1646E (lat)
Adjust for endianness
LAT: 17 C1 64 6E => Swap Bytes => 6E 64 C1 17 ==> 1852096791 (base 10)
LNG: 57 A9 C6 4E => Swap Bytes => 4E C6 A9 57 ==> 1321642327 (base 10)
From that information, I then used a linear regression to figure out a conversion equation (http://arachnoid.com/polysolve/; originally I tried Excel's regression tools, but they didn't provide nearly enough accuracy). It ended up working out much more nicely than I originally thought. However, it seems like there should be a sign bit within the data, but I didn't figure out how to retrieve it, so there are two separate equations depending on whether the value is a latitude or a longitude.
LAT = 256 + raw / -2^23
LNG = -256 + raw / 2^23
If we go ahead and run our test data through the equations we receive:
Lat: 35.212857365
Lng: -98.447957158
If I get more time in the future, I may try to figure out a more efficient method for converting (taking the sign bit into account), but for now this method worked well enough for me.
Now that this data is figured out, one could expand it to convert the raw data for other, more complex geometry types (such as lines). I haven't had a chance to finish working out all the details of the raw data used with lines. However, from what I did look at, it seems to include a header that contains some additional information, such as how many lat/long points are in the data. Other than that, it looked about as straightforward as the single points.
-- Edit --
Revisited this today and, after much digging, found a better conversion formula in the GPSBabel source code.
COORD = (0x80000000 - raw) / 0x800000
We can also do the inverse and convert from a COORD back to the raw data
RAW = 0x80000000 - (coord * 0x800000)
I also looked into the sign bit and as far as I can tell, sign bits are not stored in the data, so you have to be aware of that. I also have code that implements Point, Line, and Polygon decoding in PHP if anybody needs it.
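If the PHP isn't handy, here is a minimal C++ sketch of the single-point decoding described above (little-endian byte swap plus the GPSBabel formula); the western-hemisphere sign for longitude is applied by hand since, as noted, it isn't stored:
#include <cstdint>
#include <cstdio>
#include <string>

// Convert one 4-byte (8 hex digit) little-endian chunk using
// COORD = (0x80000000 - raw) / 0x800000.
static double decodeCoord(const std::string &hex)
{
    uint32_t raw = 0;
    // Bytes are stored little-endian, so accumulate them in reverse order.
    for (int i = 3; i >= 0; --i)
        raw = (raw << 8) | static_cast<uint32_t>(std::stoul(hex.substr(i * 2, 2), nullptr, 16));
    return (2147483648.0 - raw) / 8388608.0;  // (0x80000000 - raw) / 0x800000
}

int main()
{
    const std::string point = "57A9C64E17C1646E";  // sample row from the table above

    double lng = decodeCoord(point.substr(0, 8));  // ~98.44795716 (magnitude only)
    double lat = decodeCoord(point.substr(8, 8));  // ~35.21285737

    // The sign bit isn't stored, so apply it by convention for this data set.
    std::printf("lat %.8f  lng %.8f\n", lat, -lng);
    return 0;
}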