How can I use SyncSort to convert data to unsigned packed format?

I have a requirement to convert numeric data (stored as character on input) to either packed signed or packed unsigned formats. I can convert to packed/signed using the "PD" format, but I'm having a difficult time getting unsigned packed data.
For instance, I need a ZD number like 14723 converted to:
042
173
Using PD, I get this (which is fine):
0173
042C
Any suggestions? We do not have COBOL at this shop and are relying on SyncSort to handle these data conversions. I'm not seeing a "PK" option in SyncSort, but I've missed things before!

So you don't want a packed-decimal, which always has a sign (even when F for unsigned) in the low-order half-byte. You want Binary Coded Decimal (BCD).
//STEP0100 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTOUT DD SYSOUT=*
//SYSIN DD *
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(1,5,ZD,MUL,+10,TO=PD,LENGTH=4)),
        IFTHEN=(WHEN=INIT,BUILD=(1,3))
//SORTIN DD *
14723
Will give you, in vertical hex:
042
173
To use an existing BCD, look at field-type PD0.
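For readers who want to check the byte layouts off the mainframe, here is a minimal Python sketch of the two encodings. The helper names `to_bcd` and `to_packed_decimal` are made up for this illustration, and the sign nibble C is assumed for a positive PD value. The last line mirrors the trick in the control cards: multiply by 10, pack as a 4-byte PD, keep the first 3 bytes.

```python
def to_bcd(digits: str, length: int) -> bytes:
    """Pack decimal digits as unsigned BCD: two digits per byte, no sign nibble."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("digits must contain only 0-9")
    padded = digits.zfill(length * 2)        # left-pad with zeros to fill every nibble
    if len(padded) > length * 2:
        raise ValueError("value does not fit in the requested length")
    return bytes.fromhex(padded)             # each pair of digits becomes one byte


def to_packed_decimal(digits: str, length: int, sign_nibble: str = "C") -> bytes:
    """Pack decimal digits as packed decimal (PD): digits, then a sign nibble."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("digits must contain only 0-9")
    padded = (digits + sign_nibble).zfill(length * 2)
    if len(padded) > length * 2:
        raise ValueError("value does not fit in the requested length")
    return bytes.fromhex(padded)


print(to_bcd("14723", 3).hex().upper())              # 014723
print(to_packed_decimal("14723", 4).hex().upper())   # 0014723C

# Same idea as the control cards: value * 10 as a 4-byte PD, keep bytes 1-3.
print(to_packed_decimal(str(14723 * 10), 4)[:3].hex().upper())   # 014723
```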

Related

How do I convert a signed 16-bit hexadecimal to decimal?

I have a very long sequence of data stored as unsigned 16-bit PCM (it's audio). Using Frhed, I can view the file as hex. I'd like to convert the hex into decimal. So far I've exported into a csv file and attempted to convert in R using as.integer(). However, this doesn't seem to take into account that the values are signed.
You can convert hex text strings to decimal digits using strtoi. You will need to specify that the base is 16.
HexData = c("A167", "AE8F", "6A99", "0966", "784D", "A637", "102D", "4521")
strtoi(HexData, base=16L)
[1] 41319 44687 27289 2406 30797 42551 4141 17697
This is assuming unsigned data. Your question mentions both signed and unsigned so I am not quite sure which you have.
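If the samples are in fact signed, each 16-bit value has to be reinterpreted as two's complement. A minimal sketch in Python, using the same sample values; the helper name `hex_to_int16` is made up for this example, and it assumes each string is already in most-significant-byte-first order (a hex editor shows little-endian PCM with the two bytes swapped).

```python
def hex_to_int16(text: str) -> int:
    """Interpret a hex string of up to 4 digits as a signed 16-bit integer."""
    value = int(text, 16)
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    # Values with the top bit set (0x8000 and above) are negative.
    return value - 0x10000 if value >= 0x8000 else value


hex_data = ["A167", "AE8F", "6A99", "0966", "784D", "A637", "102D", "4521"]
print([hex_to_int16(h) for h in hex_data])
# [-24217, -20849, 27289, 2406, 30797, -22985, 4141, 17697]
```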

How to convert IBM file to hexadecimal using DFSORT?

I'm trying to convert an IBM file to hex values.
With this input:
H800
I would like to save this output in a file:
48383030
I tried it this way:
//R45ORF80V JOB (EFAS,2SGJ000),'LLAMI',NOTIFY=R45ORF80,
// MSGLEVEL=(1,1),MSGCLASS=X,CLASS=A,
// REGION=0M,TIME=5
//*---------------------------------------------------
//SORTEST EXEC PGM=ICEMAN
//SORTIN DD DSN=LF58.DFE.V1408001,DISP=SHR
//SORTOUT DD DSN=LF58.DFE.V1408001.OUT,
// DISP=(NEW,CATLG,DELETE),
// LRECL=4,DATACLAS=CDMULTI
//SYSOUT DD SYSOUT=X
//SYSPRINT DD SYSOUT=X
//SYSUDUMP DD SYSOUT=X
//SYSIN DD *
  SORT FIELDS=COPY
  OUTREC FIELDS=(1,4,HEX)
  END
/*
But it outputs the following:
C8F8F0F0
What am I doing wrong?
Is it possible to convert a file with an LRECL of 500 that contains COMP-3 fields to hexadecimal too?
By the way, I can use the "HEX" command while browsing a file with File Manager.
Your control cards are giving you the output you asked for. They show the hexadecimal values of those characters in EBCDIC, not the ASCII hexadecimal values you are expecting.
If you actually want to see the ASCII equivalent, use TRAN=ETOA, then TRAN=HEX.
You are using OUTREC FIELDS. FIELDS has a newer synonym, BUILD (introduced exactly 10 years ago). FIELDS is still supported for backwards compatibility.
INREC and OUTREC are similar: INREC operates before a SORT or MERGE, OUTREC afterwards.
What I recommend, unless you need to be doing it after a SORT/MERGE, is to use INREC.
So:
INREC BUILD=(1,4,TRAN=ETOA)
But, there is no need to use BUILD. BUILD always creates a new version of the record. Many times this is what you want when you are rearranging fields. Here, you are not.
INREC OVERLAY=(1,4,TRAN=ETOA)
If you replace your OUTREC with that, your output file will be encoded in ASCII.
If you want to see the hex values of the ASCII as well:
INREC OVERLAY=(1,4,TRAN=ETOA,1,4,TRAN=HEX)
If you want to see the hex values instead of the ASCII:
INREC OVERLAY=(1,4,TRAN=ETOA,1:1,4,TRAN=HEX)
Note the 1: in the last example. This says "the results are going to be at position 1", so overwriting your previous converted data. OVERLAY can do that, BUILD cannot in one statement.
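To see the difference between the two encodings without running a job, here is a small Python sketch. It assumes EBCDIC code page 037 (Python's `cp037` codec); the code page at your site, and the exact translation table behind TRAN=ETOA, may differ for some characters.

```python
text = "H800"

# The four characters as EBCDIC bytes, i.e. how they sit in the input record.
ebcdic = text.encode("cp037")

# Rough equivalent of TRAN=ETOA: reinterpret the EBCDIC bytes as ASCII bytes.
ascii_bytes = ebcdic.decode("cp037").encode("ascii")

# Rough equivalent of TRAN=HEX applied to each version.
print(ebcdic.hex().upper())        # C8F8F0F0
print(ascii_bytes.hex().upper())   # 48383030
```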

Convert hex to text using SQLite

I am trying to convert from a string representing hex data to the textual encoding (ASCII, UTF-8, etc.) of the hex data using purely SQLite language. Essentially I want the functionality of the X'[hex]' syntax, but applied to a programmatically derived hex string.
I want, for example, select X(hex_data_string) from ..., which is not legal SQLite syntax.
Obviously in the above snippet, I would not necessarily be able to output the data if it was not in a valid textual encoding. That is, if hex_data_string contains control chars, etc., the X() should fail in some way. If this is possible, there would have to be a default character encoding or the desired character encoding would have to be specified somehow.
I am not asking about how to retrieve the hex data string value from the SQLite database and then use C or some other facility to convert it. I am trying to perform this conversion in pure SQLite because I have queries that I check which return a text representation of hex characters representing binary data. Most of the binary data is ASCII, so I want to be able to quickly view the content of the binary data in my query output when applicable.
Intuitively, I figured this could be accomplished by casting the hex data string to a blob and using hex() but that still returns the hex data string.
Any ideas?
Possible duplicates:
SQLite X'...' notation with column data
sqlite char, ascii function
This is quite old but I was looking for the same thing so I thought I'd post:
It seems you can just cast hex data as a varchar to convert it to ascii.
i.e.:
select cast(data as varchar) from some_table will return a string representation of a binary field (data).
This is not possible in pure SQLite.
As an embedded database, SQLite is designed to provide only pure database functions, and leave the program logic to the application.
You are supposed to retrieve the hex data string value from the SQLite database and then use C or some other facility to convert it.
If you control the program you're running the queries in, you could install a user-defined function that does this conversion.
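As an illustration of the user-defined function approach, here is a minimal sketch using Python's built-in `sqlite3` module. The function name `hex_to_text` is made up for this example, UTF-8 is assumed as the text encoding, and returning NULL for input that is not valid hex or not valid UTF-8 is just one possible way to "fail".

```python
import sqlite3


def hex_to_text(hex_string):
    """Decode a hex string to text; None (SQL NULL) if it cannot be decoded."""
    if hex_string is None:
        return None
    try:
        return bytes.fromhex(hex_string).decode("utf-8")
    except ValueError:          # bad hex digits or invalid UTF-8
        return None


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it with one argument.
conn.create_function("hex_to_text", 1, hex_to_text)

conn.execute("CREATE TABLE some_table (hex_data_string TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)",
    [("686F77647921",), ("zz",)],
)

rows = conn.execute(
    "SELECT hex_to_text(hex_data_string) FROM some_table ORDER BY rowid"
).fetchall()
print(rows)   # [('howdy!',), (None,)]
```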
This will not be fully useful since it can't really be used inline, but I have done this in the past when I just wanted to convert one row of data and not create/find another utility.
WITH RECURSIVE test(c,cur) as (
  select '','686F77647921'
  UNION ALL
  select c || char((case substr(cur,1,1) when 'A' then 10 when 'B' then 11 when 'C' then 12 when 'D' then 13 when 'E' then 14 when 'F' then 15 else substr(cur,1,1) end)*16
    + (case substr(cur,2,1) when 'A' then 10 when 'B' then 11 when 'C' then 12 when 'D' then 13 when 'E' then 14 when 'F' then 15 else substr(cur,2,1) end)),
    substr(cur,3)
  from test where length(cur)>0
)
select * from test

Delimiting binary sequences

I need to be able to delimit a stream of binary data. I was thinking of using something like the ASCII EOT (End of Transmission) character to do this.
However I'm a bit concerned -- how can I know for sure that the particular binary sequence used for this (0b00000100) won't appear in my own binary sequences, thus giving a false positive on delimitation?
In other words, how is binary delimiting best handled?
EDIT: ...Without using a length header. Sorry guys, should have mentioned this before.
You've got five options:
1. Use a delimiter character that is unlikely to occur. This runs the risk of you guessing incorrectly. I don't recommend this approach.
2. Use a delimiter character and an escape sequence to include the delimiter. You may need to double the escape character, depending upon what makes for easier parsing. (Think of the C \0 to include an ASCII NUL in some content.)
3. Use a delimiter phrase that you can determine does not occur. (Think of the MIME message boundaries.)
4. Prepend a length field of some sort, so you know to read the following N bytes as data. This has the downside of requiring you to know this length before writing the data, which is sometimes difficult or impossible.
5. Use something far more complicated, like ASN.1, to completely describe all your content for you. (I don't know if I'd actually recommend this unless you can make good use of it. ASN.1 is awkward to use in the best of circumstances, but it does allow completely unambiguous binary data interpretation.)
Usually, you wrap your binary data in a well-known format, for example with a fixed header that describes the subsequent data. If you are trying to find delimiters in an unknown stream of data, you usually need an escape sequence. For example, something like HDLC, where 0x7E is the frame delimiter. Data must be encoded so that any 0x7E inside the data is replaced with 0x7D followed by the original byte XORed with 0x20. A 0x7D in the data stream is escaped the same way.
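A minimal sketch of this kind of byte stuffing in Python, assuming the XOR mask 0x20 used by HDLC-like asynchronous framing. The function names are made up for this example, and the sketch is simplified: it only writes a trailing delimiter and omits the checksum a real protocol would carry.

```python
FLAG = 0x7E       # frame delimiter
ESC = 0x7D        # escape byte
XOR_MASK = 0x20   # escaped bytes are XORed with this value


def stuff(payload: bytes) -> bytes:
    """Escape FLAG and ESC bytes in payload, then append the delimiter."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ XOR_MASK])
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)


def unstuff_stream(stream: bytes) -> list:
    """Split a stream of stuffed frames back into the original payloads."""
    frames, current, escaped = [], bytearray(), False
    for b in stream:
        if escaped:
            current.append(b ^ XOR_MASK)   # undo the XOR on the escaped byte
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == FLAG:
            frames.append(bytes(current))  # delimiter closes the current frame
            current = bytearray()
        else:
            current.append(b)
    return frames


payloads = [b"\x7e\x7d\x00abc", b"", b"plain"]
stream = b"".join(stuff(p) for p in payloads)
print(unstuff_stream(stream) == payloads)   # True
```

After stuffing, 0x7E can only appear in the stream as a real delimiter, so the receiver never sees a false positive.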
If the binary records can really contain any data, try adding a length before the data instead of a marker after the data. This is sometimes called a prefix length because the length comes before the data.
Otherwise, you'd have to escape the delimiter in the byte stream (and escape the escape sequence).
You can prepend the size of the binary data before it. If you are dealing with streamed data and don't know its size beforehand, you can divide it into chunks and have each chunk begin with size field.
If you set a maximum size for a chunk, you will end up with all but the last chunk the same length which will simplify random access should you require it.
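A sketch of the chunked variant in Python, for data whose total size is not known up front. The 2-byte little-endian size field, the tiny `MAX_CHUNK` value, and the zero-length chunk used as an end marker are all assumptions made for this example; they are not part of the answer above.

```python
import io

MAX_CHUNK = 4   # deliberately tiny for the demo; must fit in the 2-byte size field


def write_chunked(src, dst, max_chunk=MAX_CHUNK):
    """Copy src to dst as size-prefixed chunks, ending with a zero-length chunk."""
    while True:
        chunk = src.read(max_chunk)
        dst.write(len(chunk).to_bytes(2, "little"))
        if not chunk:          # the zero size just written marks the end
            break
        dst.write(chunk)


def read_chunked(src) -> bytes:
    """Read size-prefixed chunks until the zero-length end marker."""
    parts = []
    while True:
        header = src.read(2)
        if len(header) < 2:
            raise EOFError("stream ended inside a size field")
        size = int.from_bytes(header, "little")
        if size == 0:
            return b"".join(parts)
        data = src.read(size)
        if len(data) < size:
            raise EOFError("stream ended inside a chunk")
        parts.append(data)


encoded = io.BytesIO()
write_chunked(io.BytesIO(b"0123456789"), encoded)
print(encoded.getvalue())
# b'\x04\x000123\x04\x004567\x02\x0089\x00\x00'
print(read_chunked(io.BytesIO(encoded.getvalue())))
# b'0123456789'
```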
As a space-efficient and fixed-overhead alternative to prepending your data with size fields and escaping the delimiter character, the escapeless encoding can be used to trim off that delimiter character, probably together with other characters that should have special meaning, from your data.
@sarnold's answer is excellent, and here I want to share some code to illustrate it.
First, here is a wrong way to do it: using a \n delimiter. Don't do it! The binary data could contain \n, which would be confused with the delimiters:
import os, random
with open('test', 'wb') as f:
    for i in range(100):                     # create 100 binary sequences of random
        length = random.randint(2, 100)      # length (between 2 and 100)
        f.write(os.urandom(length) + b'\n')  # separated with the character b"\n"
with open('test', 'rb') as f:
    for i, l in enumerate(f):
        print(i, l)  # oops we get 123 sequences! wrong!
...
121 b"L\xb1\xa6\xf3\x05b\xc9\x1f\x17\x94'\n"
122 b'\xa4\xf6\x9f\xa5\xbc\x91\xbf\x15\xdc}\xca\x90\x8a\xb3\x8c\xe2\x07\x96<\xeft\n'
Now the right way to do it (option #4 in sarnold's answer):
import os, random
with open('test', 'wb') as f:
    for i in range(100):
        length = random.randint(2, 100)
        f.write(length.to_bytes(2, byteorder='little'))  # prepend the data with the length of the next data chunk, packed in 2 bytes
        f.write(os.urandom(length))
with open('test', 'rb') as f:
    i = 0
    while True:
        l = f.read(2)  # read the length of the next chunk
        if l == b'':  # end of file
            break
        length = int.from_bytes(l, byteorder='little')
        s = f.read(length)
        print(i, s)
        i += 1
...
98 b"\xfa6\x15CU\x99\xc4\x9f\xbe\x9b\xe6\x1e\x13\x88X\x9a\xb2\xe8\xb7(K'\xf9+X\xc4"
99 b'\xaf\xb4\x98\xe2*HInHp\xd3OxUv\xf7\xa7\x93Qf^\xe1C\x94J)'

ColdFusion: Integer "0" doesn't convert to ASCII character

I have a string (comprised of a userID and a date/time stamp), which I then encrypt using ColdFusion's Encrypt(inputString, myKey, "Blowfish/ECB/PKCS5Padding", "Hex").
In order to interface with a 3rd party I have to then perform the following:
Convert each character pair within the resultant string into a HEX value.
HEX values are then represented as integers.
Resultant integers are then output as ASCII characters.
All the ASCII characters combine to form a Bytestring.
Bytestring is then converted to Base64.
Base64 is URL encoded and finally sent off (phew!)
It all works seamlessly, APART FROM when the original cfEncrypted string contains a "00".
The HEX value 00 translates as the integer (via function InputBaseN) 0 which then refuses to translate correctly into an ASCII character!
The resultant Bytestring (and therefore url string) is messed up and the 3rd party is unable to decipher it.
It's worth mentioning that I do declare: <cfcontent type="text/html; charset=iso-8859-1"> at the top of the page.
Is there any way to correctly output 00 as ASCII? Could I avoid having "00" within the original encrypted string? Any help would be greatly appreciated :)
I'm pretty sure ColdFusion (and the Java underneath) use a null-terminated string type. This means that every string contains one and only one asc(0) char, which is the string terminator. If you try to insert an asc(0) into a string, CF is erroring because you are trying to create a malformed string element.
I'm not sure what the end solution is. I would play around with toBinary() and toString(), and talk to your 3rd party vendor about workarounds like sending the raw hex values or similar.
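The steps listed in the question can be sketched in Python by working on bytes rather than characters, so that a 00 pair is just another byte value. This only illustrates the pipeline, not ColdFusion code; the function name `hex_to_url_token` is made up, and what the third party actually expects is not stated here.

```python
import base64
from urllib.parse import quote


def hex_to_url_token(hex_string: str) -> str:
    """Hex pairs -> raw bytes -> Base64 -> URL-encoded text."""
    raw = bytes.fromhex(hex_string)               # "00" becomes the byte 0x00
    b64 = base64.b64encode(raw).decode("ascii")   # Base64 works on any byte value
    return quote(b64, safe="")                    # percent-encode +, / and =


print(hex_to_url_token("00fb"))   # APs%3D
```

Because no intermediate text string is ever built from the raw byte values, a zero byte never has to be represented as a character.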
Actually, there is a very easy solution. The credit card company that is processing your request needs you to convert it to lowercase hex letters. The only characters processed are :, -, and 0-9; use an if/else to convert them manually into a string.
