GS1 support in a QR encoder/decoder? - qr-code

Very few QR encoders/decoders have (explicit) support for so-called GS1 encoding. Zint is one of the exceptions (under QR select GS-1 Data Mode), but its license prevents me from using it. Commercial offers, mainly from Tec-It, are costly, especially because I'm not interested in all other kinds of barcodes they support.
Is there a way to add GS1 support to any QR encoder/decoder without changing its source? For example, could I apply some algorithm to transform textual GTIN AI data into compatible binary? I think it should be possible, because after all, it's still QR. Please note that I am not a data coding expert - I'm just looking for a way to deal with this standard without paying a small fortune. So far, I found postscriptbarcode which does have support for it, and seems to use its own QR engine, but output quality is so-so and my PostScript skills are far too limited to figure out the algorithm.

As long as the library supports decoding of the FNC1 special character, it can be used to read GS1 codes. The FNC1 character is not a byte in the data-stream, but more of a formatting symbol.
The specification says that a leading FNC1-character is used to identify GS1 barcodes, and should be decoded as "]d2" (GS1 DataMatrix), "]C1" (GS1-128), "]e0" (GS1 DataBar Omnidirectional) or "]Q3" (GS1 QR Code). Any other FNC1-characters should be decoded as ASCII GS-characters (byte value 29).
Depending on the library, the leading FNC1 might be missing, or decoded as GS (not critical), or the embedded FNC1-characters might be missing (critical). The embedded FNC1-characters are used to delimit variable-length fields.
You can read the full specification here (pdf). The algorithm for decoding the data can be found under heading 7.9 Processing of Data from a GS1 Symbology using GS1 Application Identifiers (page 426).
The algorithm goes something like this:
Peek at the first character.
If it is ']',
If string does not start with ']C1' or ']e0' or ']d2' or ']Q3',
Not a GS1 barcode.
Stop.
Consume the caracters.
Else if it is <GS>,
Consume character.
Else,
No symbology identifier, assume GS1.
While not end of input,
Read the first two digits.
If they are in the table of valid codes,
Look up the length of the AI-code.
Read the rest of the code.
Look up the length of the field.
If it is variable-length,
Read until the next <FNC1> or <GS>.
Else,
Read the rest if the field.
Peek at the next character.
If it is <FNC1> or <GS>, consume it.
Save the read field.
Else,
Error: Invalid AI
The binary data in the QR Code is encoded as 4-bit tokens, with embedded data.
0111 -> Start Extended Channel Interpretation (ECI) Mode (special encodings).
0001, 0010, 0100, 1000 -> start numeric, alphanumeric, raw 8-bit, kanji encoded data.
0011 -> structured append (combine two or more QR Codes to one data-stream).
0101 -> FNC1 initial position.
1001 -> FNC1 other positions.
0000 -> End of stream (can be omitted if not enough space).
After an encoding specification comes the data-length, followed by the actual data. The meanings of the data bits depends on the encoding used. In between the data-blocks, you can squeeze FNC1 characters.
The QR Code specification (ISO/IEC 18004) unfortunately costs money (210 Franc). You might find some pirate version online though.
To create GS1 QR Codes, you need to be able to specify the FNC1-characters in the data. The library should either recognize the "]Q3" prefix and GS-characters, or allow you to write FNC1 tokens via some other method.
If you have some way to write the FNC1-characters, you can encode GS1 data as follows:
Write initial FNC1.
For each field,
Write the AI-code as decimal digits.
Write field data.
If the code is a variable-length field,
If not the last field,
Write FNC1 to terminate the field.
If possible, you should order the fields such that a variable-length field comes last.
As noted by Terry Burton in the comments; The FNC1 symbol in a GS1 QR Code can be encoded as % in alphanumeric data, and as GS in byte mode. To encode an actual percent symbol, you write it as %%.
To encode (01) 04912345123459 (15) 970331 (30) 128 (10) ABC123, you first combine it into the data string 01049123451234591597033130128%10ABC123 (% indicator is the encoded FNC1 symbol). This string is then written as
0101 - Initial FNC1, GS1 mode indicator
0001 - QR numeric mode
0000011101 - Data length (29)
<data bits for "01049123451234591597033130128">
0010 - QR alphanumeric mode
000001001 - Data length (9)
<data bits for "%10ABC123">
(Example from the ISO 18004:2006 specification)

Related

ZPL - How to embed GS1 application identifiers into GS1 QR code

I'm trying to code a GS1 compliant QR code in ZPL which will inlcude a number of application identifiers. I don't understand how to embed the FNC1 character within the ^FD string when using ^BQ to create a 2d code.
Below is my first attempt. When creating a GS1-128 barcode, I would use the >8 character to denote variable length fields.
^FX Test^FS
^XA^MCY^XZ
^XA^LH0,65
^LH0,0^FS
^BQN,2,10^FD>;>83018099999>82411184174>810MFATA00001>891EA^FS
^PQ1,0,0,N
^XZ
This creates a 2d barcode that returns the following string when scanned, but is not recoginised as GS1 compliant.
11611193018099999>82411184174>810MFATA00001>891EA
How do I configure the ^FD field to enable the FNC1 character?
QR ZPL Issues
See a recent answer I had for QR codes here:
Print ZPLII QR to open url
You are missing some of the parameters for the ^BQ and ^FD commands.
GS1 QR Issues
Research I've done indicates that the GS1 QR code is quite proprietary, and does not seem to be easily generated with ZPL. However, you can use a Data Matrix barcode quite easily.
Looks like you are trying to create a code with the following Application Identifiers and values:
30: Variable Count of Items: 18099999
241: Variable Customer Part Number: 1184174
10: Variable Batch/Lot Number: MFATA00001
91: Variable Company Internal: EA
GTIN 01 seems to be required and is missing. I've added a temporary GTIN string. Customer part number 241 seems to be local only, and may not validate in some applications which validate global requirements.
Full Barcode String.
^FD_10112345678901234_110MFATA00001_13018099999_12411184174_191EA^FS
Full ZPL for sample label
^XA
^FO10,10
^BXN,9,200,40,40,,_
^FD_10112345678901234_110MFATA00001_13018099999_12411184174_191EA^FS
^XZ
Hope that helps.
https://www.gs1.org/docs/barcodes/GSCN_16_477_FNC1.pdf
https://www.zebra.com/us/en/support-downloads/knowledge-articles/creating-gs1-barcodes-with-zebra-printers-for-data-matrix-and-code-128-using-zpl.html
EdHayes3's answer is just great.
As specified by Zebra in a ^BX the escape character is the underscore and the subsequent number defines what kind of FNC is used.
_1 - > FNC1
_2 - > FNC2
_3 - > FNC3
FNC4 is not supported according to how I understand the Zebra documentation.
The only thing I do not entirely agree with is escaping every GS1 AI since the most common ones except Lot/Batch number have a fixed length.
In other words, I do not think that it is necessary to escape for example the GTIN. Though, you probably have to keep in mind to pad it up with leading zeros in case of GTIN-12 or GTIN-13.

How to represent acute accents in ASCII?

I'm having an encoding problem related to cookies on one of my websites.
A user is inputing Usuário, which has an acute accent, and that's being put in a cookie. The raw HEX for the cookie response is (for the Usuário string):
55 73 75 C3 A1 72 69 6F
When I see it in the browser, it looks like this:
...which is really messy. I need to fix this up.
Then I went to this website: http://www.rapidtables.com/convert/number/hex-to-ascii.htm and converted the HEX value to see how it would look like. And I got the same output:
Right. This means the HEX code is wrong. Then I tried to convert Usuário to ASCII to see how it should be. I used this WebSite: http://www.asciitohex.com/ and this is the result:
For my surprise, the HEX is exactly the one that is showing up messy. Why???
And how do I represent Usuário in ASCII so I can put it in a cookie? Should I manually encode it?
PS: I'm using ASP.NET, just in case it matters.
As of 2015 the standard of the web to store character data is UTF-8 and not ASCII. ASCII actually only contains the first 128 characters of the codepage, and does not include any kind of accented characters. To add accented characters to this 128 characters there were many legacy solutions: codepages. They each added 128 different characters to the default ASCII list thereby allowing representing 256 different characters.
The problem was, that this didn't properly solve the issue: ASCII based codepages were more or less incomatible with each other (except for the first 128 characters), and there was usually no way of programatically knowing which codepage was in used.
One of the solutions was UTF-8, which is a way to encode the unocde character set (containing most of the characters used around the world, and more) while trying to remain compatible with ASCII. The first 128 characters are actually the same in both cases, but afterwards UTF-8 characters become multi-byte: one character is encoded using a series of bytes (usually 2-3, depends on which character needs to be encoded)
The problem is if you are using some kind of ASCII based single byte codebase (like ISO-8859-1), which encodes supported characters in single bytes, but your input is actually UTF-8, which will encode accented characters in multiple bytes (you can see this in your HEX example. á is encoded as C3 A1: two bytes). If you try to read these two bytes in an ASCII based codepage, which uses single bytes for every characters (in West-Europe this codepage is usually ISO-8859-1), then each of this two bytes will be reprensented with two different characters.
In the web world the default encoding is UTF-8, so your clients will usually send their requests using UTF-8. ASP.NET is Unicode aware, so it can handle these requests. However somewere in your code this UTF-8 is converted acccidentally into ISO-8859-1, and then back into UTF-8. This might happen on various layers. As you have issues it probably happens at the cookie layer, which is sometimes problematic (here is how it worked in 2009). You should also double check your application that it uses UTF-8 everywhere else though (views, database, etc.), if you want to properly support accented characters.

Using Coldfusion's Encrypt function to encrypt a hex block and return a block-length result

My company is working on a project that will put card readers in the field. The readers use DUKPT TripleDES encryption, so we will need to develop software that will decrypt the card data on our servers.
I have just started to scratch the surface on this one, but I find myself stuck on a seemingly simple problem... In trying to generate the IPEK (the first step to recreating the symmetric key).
The IPEK's a 16 byte hex value created by concatenating two triple DES encrypted 8 byte hex strings.
I have tried ECB and CBC (zeros for IV) modes with and without padding, but the result of each individual encoding is always 16 bytes or more (2 or more blocks) when I need a result that's the same size as the input. In fact, throughout this process, the cyphertexts should be the same size as the plaintexts being encoded.
<cfset x = encrypt("FFFF9876543210E0",binaryEncode(binaryDecode("0123456789ABCDEFFEDCBA98765432100123456789ABCDEF", "hex"), "base64") ,"DESEDE/CBC/PKCS5Padding","hex",BinaryDecode("0000000000000000","hex"))>
Result: 3C65DEC44CC216A686B2481BECE788D197F730A72D4A8CDD
If you use the NoPadding flag, the result is:
3C65DEC44CC216A686B2481BECE788D1
I have also tried encoding the plaintext hex message as base64 (as the key is). In the example above that returns a result of:
DE5BCC68EB1B2E14CEC35EB22AF04EFC.
If you do the same, except using the NoPadding flag, it errors with "Input length not multiple of 8 bytes."
I am new to cryptography, so hopefully I'm making some kind of very basic error here. Why are the ciphertexts generated by these block cipher algorithms not the same lengths as the plaintext messages?
For a little more background, as a "work through it" exercise, I have been trying to replicate the work laid out here:
https://www.parthenonsoftware.com/blog/how-to-decrypt-magnetic-stripe-scanner-data-with-dukpt/
I'm not sure if it is related and it may not be the answer you are looking for, but I spent some time testing bug ID 3842326. When using different attributes CF is handling seed and salt differently under the hood. For example if you pass in a variable as the string to encrypt rather than a constant (hard coded string in the function call) the resultant string changes every time. That probably indicates different method signatures - in your example with one flag vs another flag you are seeing something similar.
Adobe's response is, given that the resulting string can be unecrypted in either case this is not really a bug - more of a behavior to note. Can your resultant string be unencrypted?
The problem is encrypt() expects the input to be a UTF-8 string. So you are actually encrypting the literal characters F-F-F-F-9.... rather than the value of that string when decoded as hexadecimal.
Instead, you need to decode the hex string into binary, then use the encryptBinary() function. (Note, I did not see an iv mentioned in the link, so my guess is they are using ECB mode, not CBC.) Since the function also returns binary, use binaryEncode to convert the result to a more friendly hex string.
Edit: Switching to ECB + "NoPadding" yields the desired result:
ksnInHex = "FFFF9876543210E0";
bdkInHex = "0123456789ABCDEFFEDCBA98765432100123456789ABCDEF";
ksnBytes = binaryDecode(ksnInHex, "hex");
bdkBase64 = binaryEncode(binaryDecode(bdkInHex, "hex"), "base64");
bytes = encryptBinary(ksnBytes, bdkBase64, "DESEDE/ECB/NoPadding");
leftRegister = binaryEncode(bytes, "hex");
... which produces:
6AC292FAA1315B4D
In order to do this we want to start with our original 16 byte BDK
... and XOR it with the following mask ....
Unfortunately, most of the CF math functions are limited to 32 bit integers. So you probably cannot do that next step using native CF functions alone. One option is to use java's BigInteger class. Create a large integer from the hex strings and use the xor() method to apply the mask. Finally, use the toString(radix) method to return the result as a hex string:
bdkText ="0123456789ABCDEFFEDCBA9876543210";
maskText = "C0C0C0C000000000C0C0C0C000000000";
// use radix=16 to create integers from the hex strings
bdk = createObject("java", "java.math.BigInteger").init(bdkText, 16);
mask = createObject("java", "java.math.BigInteger").init(maskText, 16);
// apply the mask and convert the result to hex (upper case)
newKeyHex = ucase( bdk.xor(mask).toString(16) );
WriteOutput("<br>newKey="& newKeyHex);
writeOutput("<br>expected=C1E385A789ABCDEF3E1C7A5876543210");
That should be enough to get you back on track. Given some of CF's limitations here, java would be a better fit IMO. If you are comfortable with it, you could write a small java class and invoke that from CF instead.

What is the encoding of this data?

Who can tell me the encoding type of these data? It doesn't seem to be base64.
/zZ/u00GIaP9HW010G000G01003/sm1302WS7YCU6IWZ8ICjAoWmF6H1F3StF7jONKbaaO2Pbe+
0Z8gWjER3eAhQhOgCoF/Bskxr////cy7////w/+Rz//Z/sm130IijBJmrF7P1GNRufOob+FZu+F
Zu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZ/m00H200X07430
I800X410n41/yG07m000GK10G410G400000000000420mG51WS82GeB/yG0jH000W430m840mK5
10G0005z0G8300GH1H8XCK464r5X1o9n53A1aQ488qAnmHLIqV0aCs9oWWaA5XSO6Heb9YSeAIe
qDJOtE3awGqH5HaT8IKfJL5LMLrXPMcDaPMPdQ6bgStHrTdTuUNg3X8M6XuY9YfAJb9MMbvYPcg
AZfAMcfwYfghApjBMsjxYvkiB3nCN6nyZ9ojBJrDNMrzZPsk7Yu+JbvkVewUhnylFqzVRt+Fdw/
yG07m400m410G410G410G00000000420mG51WS82GeB/yG0jH400W4210G310S510G00G9t0042
0n441I4n1X91KGTXSHCYCe4854AHeR712ICpKl0LOdBH2XOaDE4byHSO6Hec9oWfAZKsDpWvEaD
4HKP7I4bAKrHLLbTOMLfZP6LcPsXfQdDqTNPtU7bwWeE4XOQ7Y8cAafEKbPQNc9cQegEafQQdgA
cgihEqjRQtkBcwmiF4nSR7oCdAqjFKrTRNsDdQukFavURdwEdgylFqzVRt+Fdw/ze030C1008H0
n40Fm2vQMjkzf0pGHVSKab1ad5I/PBPkbl5Z/S7D5dyrb1dfvQ/ZnIotCAIB6x7B38LLB5lm7Qc
G9zajcwMyMFzmSr18sd82pnn1586H5a4aP7EFIfblRUMRoVCsj/TO5IVph7kBAUH1TnkMJPl18s
bU/JFuvxydwWpLYZi9s8YItR7OAkV/m1LFIsjPLLrjujX08FbWPhdxUEIvlnvESbzsuZE1dgSd+
jT7DF77jyni1ZXG0IMFq52JUmCRzajcwMyMFy0S7D7sIsRfRnO/m1mSqXlRSo17VPaP0TIkVpxK
iy+C95jQHssgfFV6Sds0fyhMuYbBFPBUHsyTh4+M2imK31pzAjImsKKPbaWYMDUfyiS/fM8xgcg
ix7v1EIJrutLSdqoUxccdSX2I0e7Ex0ndhmE/Vy0ngQIjOQBqCTZSblAXYPKE2H6C4/N5IVPBPk
bl5Z/071pMHRwQ3V97vmEq1sKZ3O/0d+VlMpBSHHZzv8gZhZkVmzAWFGRzajcwMyMFzmSqVPBPk
bl5Z/S7DBzfYPrGahk+xkKZT+TJVV/0Dt+T0Qd6qKKKYZgxFvhA3FJor/7Yi+rbaRKBif5vpbzf
F2XL1nr/BZlYj2p+QoWpqyjVnugl5ljRYLNHtXcSoAw8JrwWu/3wqo1VEZakaIxjbZaF+hSuODW
yOFzAfHtKgj1O+KZgGkL2bICW7a+eF9EEsQkp1hwvX2HbOeM3cHq8px3FwrFBPmN4WaTCaSWXYC
dru/3ds50p1byrBpVRAoC+S8WzEm0wZZkFK7a6j4oksgkVACiYe0g361m2VcFrFrpLo6pWZVUY3
e02UJZ6C0+d+UcAYn91TFAoj97C536DUX0nqzEl+UkbDsk9wYpJ8P45vRg49mid3BdwaSVvxKv5
bCiapnAt92yu9K7W3sxzidZfKTtklWi4KR1IGnaT201xPxrU+//0Blyw9Q90SvgCPMyas/SPYm9
mESPFFy08VJ7NdRUQIApCio1olB2FE2Czi+t+SK2pWPjs7FkP6EUdlx3yXKWXZC8Y0Fb3W0a/m0
wKfNIGRb3Hcy/xHCxPTt+PSzktuSdyglI97l+q6CiLN0mCa/GVv/AajxQA5I8a037B7+yVyAYbb
dIut6DfBSZ0239pKeQrUX3VJ6TOeoZHHkGJ98E1MZz/m3tVvrHkNUyZycE1nd1tEk0Ajn9jYHCv
LL0p/UflOgMoHo5555G0KKKK0555501HHHG0KKKK0555501HHHG0KKKK0555507/za
(Line endings inserted for readability)
It appears to be base64. If you add a single equal ('=') for padding to the end your decoder should be happy (see https://en.wikipedia.org/wiki/Base64#Padding).
Decoded it's 1568 bytes which mod 16 is zero. This histogram of byte value occurance is flat. So I'd guess something encrypted with a 128 bit block cipher like AES.
It does look like base64 to me. Most variants of base64 include the following characters:
A-Z a-z 0-9 + / (and = for padding)
However, if it were proper base64 it would end with a single = as padding, as the 2091 characters don't exactly fit a number of bytes.
Your data doesn't seem to decode to anything readable, so it might be binary data, or encrypted (or both). Only with thorough knowledge of cryptography systems, and a lot of hints and luck, might some expert be able to figure out the encryption used (if any), but that's beyond the scope of this site.
Without more information as to the source of the data, we can only guess.

Please help identify multi-byte character encoding scheme on ASP Classic page

I'm working with a 3rd party (Commidea.com) payment processing system and one of the parameters being sent along with the processing result is a "signature" field. This is used to provide a SHA1 hash of the result message wrapped in an RSA encrypted envelope to provide both integrity and authenticity control. I have the API from Commidea but it doesn't give details of encoding and uses artificially created signatures derived from Base64 strings to illustrate the examples.
I'm struggling to work out what encoding is being used on this parameter and hoped someone might recognise the quite distinctive pattern. I initially thought it was UTF8 but having looked at the individual characters I am less sure.
Here is a short sample of the content which was created by the following code where I am looping through each "byte" in the string:
sig = Request.Form("signature")
For x = 1 To LenB(sig)
s = s & AscB(MidB(sig,x,1)) & ","
Next
' Print s to a debug log file
When I look in the log I get something like this:
129,0,144,0,187,0,67,0,234,0,71,0,197,0,208,0,191,0,9,0,43,0,230,0,19,32,195,0,248,0,102,0,183,0,73,0,192,0,73,0,175,0,34,0,163,0,174,0,218,0,230,0,157,0,229,0,234,0,182,0,26,32,42,0,123,0,217,0,143,0,65,0,42,0,239,0,90,0,92,0,57,0,111,0,218,0,31,0,216,0,57,32,117,0,160,0,244,0,29,0,58,32,56,0,36,0,48,0,160,0,233,0,173,0,2,0,34,32,204,0,221,0,246,0,68,0,238,0,28,0,4,0,92,0,29,32,5,0,102,0,98,0,33,0,5,0,53,0,192,0,64,0,212,0,111,0,31,0,219,0,48,32,29,32,89,0,187,0,48,0,28,0,57,32,213,0,206,0,45,0,46,0,88,0,96,0,34,0,235,0,184,0,16,0,187,0,122,0,33,32,50,0,69,0,160,0,11,0,39,0,172,0,176,0,113,0,39,0,218,0,13,0,239,0,30,32,96,0,41,0,233,0,214,0,34,0,191,0,173,0,235,0,126,0,62,0,249,0,87,0,24,0,119,0,82,0
Note that every other value is a zero except occasionally where it is 32 (0x20). I'm familiar with UTF8 where it represents characters above 127 by using two bytes but if this was UTF8 encoding then I would expect the "32" value to be more like 194 (0xC2) or (0xC3) and the other value would be greater than 0x80.
Ultimately what I'm trying to do is convert this signature parameter into a hex encoded string (eg. "12ab0528...") which is then used by the RSA/SHA1 function to verify the message is intact. This part is already working but I can't for the life of me figure out how to get the signature parameter decoded.
For historical reasons we are having to use classic ASP and the SHA1/RSA functions are javascript based.
Any help would be much appreciated.
Regards,
Craig.
Update: Tried looking into UTF-16 encoding on Wikipedia and other sites. Can't find anything to explain why I am seeing only 0x20 or 0x00 in the (assumed) high order byte positions. I don't think this is relevant any more as the example below shows other values in this high order position.
Tried adding some code to log the values using Asc instead of AscB (Len,Mid instead of LenB,MidB too). Got some surprising results. Here is a new stream of byte-wise characters followed by the equivalent stream of word-wise (if you know what I mean) characters.
21,0,83,1,214,0,201,0,88,0,172,0,98,0,182,0,43,0,103,0,88,0,103,0,34,33,88,0,254,0,173,0,188,0,44,0,66,0,120,1,246,0,64,0,47,0,110,0,160,0,84,0,4,0,201,0,176,0,251,0,166,0,211,0,67,0,115,0,209,0,53,0,12,0,243,0,6,0,78,0,106,0,250,0,19,0,204,0,235,0,28,0,243,0,165,0,94,0,60,0,82,0,82,0,172,32,248,0,220,2,176,0,141,0,239,0,34,33,47,0,61,0,72,0,248,0,230,0,191,0,219,0,61,0,105,0,246,0,3,0,57,32,54,0,34,33,127,0,224,0,17,0,224,0,76,0,51,0,91,0,210,0,35,0,89,0,178,0,235,0,161,0,114,0,195,0,119,0,69,0,32,32,188,0,82,0,237,0,183,0,220,0,83,1,10,0,94,0,239,0,187,0,178,0,19,0,168,0,211,0,110,0,101,0,233,0,83,0,75,0,218,0,4,0,241,0,58,0,170,0,168,0,82,0,61,0,35,0,184,0,240,0,117,0,76,0,32,0,247,0,74,0,64,0,163,0
And now the word-wise data stream:
21,156,214,201,88,172,98,182,43,103,88,103,153,88,254,173,188,44,66,159,246,64,47,110,160,84,4,201,176,251,166,211,67,115,209,53,12,243,6,78,106,250,19,204,235,28,243,165,94,60,82,82,128,248,152,176,141,239,153,47,61,72,248,230,191,219,61,105,246,3,139,54,153,127,224,17,224,76,51,91,210,35,89,178,235,161,114,195,119,69,134,188,82,237,183,220,156,10,94,239,187,178,19,168,211,110,101,233,83,75,218,4,241,58,170,168,82,61,35,184,240,117,76,32,247,74,64,163
Note the second pair of byte-wise characters (83,1) seem to be interpreted as 156 in the word-wise stream. We also see (34,33) as 153 and (120,1) as 159 and (220,2) as 152. Does this give any clues as the encoding? Why are these 15[2369] values apparently being treated differently from other values?
What I'm trying to figure out is whether I should use the byte-wise data and carry out some post-processing to get back to the intended values or if I should trust the word-wise data with whatever implicit decoding it is apparently performing. At the moment, neither seem to give me a match between data content and signature so I need to change something.
Thanks.
Quick observation tells me that you are likely dealing with UTF-16. Start from there.

Resources