Decoding Binary Data in Tcl over TCP

I am reading data from a TCP port in Tcl using a socket. The messages do not end with any newline, but they do contain a header giving the number of bytes of data.
I have the following code to read two bytes of data from the socket (16-bit little endian) and convert that into an integer I can then use in a loop to read the rest of the data:
binary scan [read $Socket 2] s* length
In this case $Socket is my socket and it has been configured to use binary encoding.
This works well except where either the upper or lower byte is 0x0D. It appears Tcl reads both 0x0D and 0x0A as '\n', which then defaults to 0x0A, so the code does not work correctly. For example, 13 is read as 10. How do I stop this from happening?

The socket should be placed into binary mode if you're moving binary data across it.
chan configure $Socket -translation binary
# Use [fconfigure] instead of [chan configure] in older Tcl versions
This disables all the automatic processing that Tcl usually does — your description says you're having a problem with end-of-line conversion — and makes it so that read will just deliver a string of the bytes (formally a string of characters between U+000000 and U+0000FF, and internally using an efficient in-memory encoding scheme).
For files, you can include b in the control mode when opening to get this done for you. For sockets, you need to do this yourself.

In addition to configuring binary encoding, you also need to set the translation to 'lf'. As this is a frequently occurring situation, there is a shorthand for making these two settings:
fconfigure $Socket -translation binary
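Once the channel is in binary mode, the rest is just length-prefixed framing: read the 2-byte header, then read exactly that many bytes of payload. Here is a rough sketch of that pattern in Python rather than Tcl, purely as an illustration (the helper names are made up, and the header is read as unsigned here where the question's s format is signed):

import struct

def recv_exact(sock, n):
    # Read exactly n bytes; recv() may legally return fewer per call.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def read_message(sock):
    header = recv_exact(sock, 2)
    (length,) = struct.unpack("<H", header)   # unsigned 16-bit little-endian
    return recv_exact(sock, length)           # the payload announced by the header

In Tcl the equivalent is simply binary scan [read $Socket 2] s length followed by read $Socket $length, because on a blocking binary channel read returns exactly the requested number of bytes unless end of file is reached.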

Related

What does \x00# mean?

I read an executable file (.exe) and I saw \x00#. I know that 0x00 is NULL, but what does the # represent in hexadecimal? I couldn't find any information about this.
Example
b'MZ\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00\xff\xff\x00\x00\xb8\x00\x00\x00\x00\x00\x00\x00#\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x0e\x1f\xba\x0e\x00\xb4\t\xcd!\xb8\x01L\xcd!This program cannot be run in DOS mode.\r\r\n'
It means nothing special: you are simply looking at the raw binary printed as a Python bytes literal (the b'...' form above). Bytes whose values are printable ASCII are shown as the corresponding character, and everything else is shown as a \xNN escape, so # simply means the byte value 0x23, the ASCII code for '#'.
I'm guessing this binary goo is from the PE Format for Windows executables.
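For what it's worth, here is a quick way to check the value in Python itself (the byte values are taken from the dump above):

data = b"\x00#"                                 # two bytes: 0x00 and 0x23
print(list(data))                               # [0, 35]  -> 35 == 0x23, the ASCII code for '#'
print(bytes([0x4D, 0x5A, 0x90, 0x00, 0x23]))    # b'MZ\x90\x00#' - printable bytes shown as characters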

Wrong result of TIdURI.URLDecode when unit LazUTF8 is used

With Free Pascal 3.0.4, this test program correctly writes ÄÖÜ
program FPCTest;
uses IdURI;
begin
WriteLn(TIdURI.URLDecode('%C3%84%C3%96%C3%9C'));
ReadLn;
end.
However, if the unit LazUTF8 (as described here) is used, it writes ???
program FPCTest;
uses IdURI, LazUTF8;
begin
WriteLn(TIdURI.URLDecode('%C3%84%C3%96%C3%9C'));
ReadLn;
end.
How can I fix this decoding error for programs which use LazUTF8?
When the String type is an alias for AnsiString [1], much of Indy's functionality exposes extra parameters/properties to let users control which ANSI encodings are used when AnsiString values are passed around in operations that perform AnsiString<->byte conversions.
[1]: Delphi pre-2009, and FreePascal/Lazarus when {$ModeSwitch UnicodeStrings} and {$Mode DelphiUnicode} are not used (FYI, Indy 11 will use them!).
In most cases, Indy's default byte encoding is ASCII (because many of the Internet protocols that Indy implements originally supported only ASCII - individual Indy components upgrade themselves to UTF as appropriate per protocol), though some things use the OS default codepage/charset instead.
Indy's default byte encoding can be changed at runtime by setting the global GIdDefaultTextEncoding variable in the IdGlobal unit, eg:
GIdDefaultTextEncoding := encUTF8;
But, in this particular situation, TIdURI.URLDecode() does not use GIdDefaultTextEncoding, but it does have an optional ADestEncoding parameter that you can use to specify a specific byte encoding for the returned AnsiString (in addition to an optional AByteEncoding parameter to specify the byte encoding of the parsed url octets - UTF-8 by default), eg:
TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
{$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding_UTF8{$ENDIF}
)
The above will parse the url-encoded octets as UTF-8, and then return that data as-is in a UTF-8 encoded AnsiString.
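To make the two roles concrete, here is what the byte-level steps look like sketched in Python (only an illustration of the decoding, not Indy's API; the functions are from Python's standard library):

from urllib.parse import unquote_to_bytes

octets = unquote_to_bytes("%C3%84%C3%96%C3%9C")   # b'\xc3\x84\xc3\x96\xc3\x9c'
print(octets.decode("utf-8"))     # ÄÖÜ    (AByteEncoding = UTF-8: correct)
print(octets.decode("cp1252"))    # Ã„Ã–Ãœ (same bytes read with an ANSI codepage: mojibake)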
If you do not specify an output encoding for ADestEncoding, URLDecode() defaults to the OS default. If you want it to use GIdDefaultTextEncoding instead, specify IndyTextEncoding_Default in the ADestEncoding parameter:
TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
{$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding_Default{$ENDIF}
)
Another option would be to use the IndyTextEncoding(CodePage) function for ADestEncoding, passing it FreePascal's DefaultSystemCodePage variable, which the LazUtils package sets to CP_UTF8 [2]:
TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
{$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding(DefaultSystemCodePage){$ENDIF}
)
[2]: I have opened a ticket in Indy's issue tracker to add support for DefaultSystemCodePage when compiling for FreePascal/Lazarus.
With this change in TIdURI.URLDecode (lines 386ff), LazUTF8 can be used:
{$IFDEF FPC}
Result := string(AByteEncoding.GetString(LBytes));
{$ELSE}
{$IFDEF STRING_IS_ANSI}
EnsureEncoding(ADestEncoding, encOSDefault);
CheckByteEncoding(LBytes, AByteEncoding, ADestEncoding);
SetString(Result, PAnsiChar(LBytes), Length(LBytes));
{$ELSE}
Result := AByteEncoding.GetString(LBytes);
{$ENDIF}
{$ENDIF}
Notes
This change assumes that the LazUTF8 unit is always used, and the Indy source code change needs to be reapplied every time a new Indy version is installed.
Also, I found no way to fix TIdURI.URLDecode in a way that works both with and without LazUTF8.

Why is a hex file used when burning a program to a microcontroller?

Whenever we program a microcontroller we convert the C file into a hex file and then we burn that into the controller.
My question is: why a hex file specifically? Is that hex file a hexadecimal version of the binary executable?
If yes, then why don't we use a binary file instead?
If you are talking about an "Intel HEX" file, the reason is that it is ASCII, which makes it easy to examine and parse. True, that is inefficient in one way, but compared to a raw binary it can actually be smaller. A raw binary has at most one address associated with it, the starting address, and even that is not embedded in the file. An Intel HEX file, or a Motorola S-record (a similar and equally common format), is basically lines of ASCII hex digits that encode a record type, a starting address, a length, the data, and a checksum. There are non-data records in there, but most of the file is data records. So if your program has a few bytes at address 0x1000 and a few bytes at 0x80000000, a .bin file would at its smallest be 0x80000000 - 0x1000 bytes plus a few, and would typically be 0x80000000 plus a few bytes (yes, 2 gigabytes), whereas an Intel HEX or S-record file would be a few dozen bytes in total. Intel HEX and S-record records also have built-in checksums to help protect against corrupted files; not perfect, of course, but better than nothing at all.
Since then, ELF and COFF and other formats have become popular. These are also based on blocks of data rather than a complete memory image. They are binary rather than ASCII formats, but they are not just a memory image either: chunks of data are provided with an address, a type, and so on.
Because Intel HEX and S-record files are so simple to create and parse, they will continue to be used for a long time; it does not take a lot of resources in a bootloader, for example, to receive an Intel HEX or S-record file. (The same is true of a raw binary, of course, but the binary may contain a lot of fill data, costing a lot of unnecessary transmission time.)
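To show how simple the format is to handle, here is a sketch in Python that decodes a single Intel HEX record (the layout is the one described above: byte count, 16-bit address, record type, data, checksum; the sample record is the usual textbook example):

def parse_ihex_record(line):
    # ":10010000214601360121470136007EFE09D2190140" -> (address, type, data)
    assert line.startswith(":")
    raw = bytes.fromhex(line[1:].strip())
    count, addr_hi, addr_lo, rectype = raw[0], raw[1], raw[2], raw[3]
    data = raw[4:4 + count]
    # Every byte of the record, including the final checksum byte, must sum to 0 mod 256.
    if (sum(raw) & 0xFF) != 0:
        raise ValueError("checksum mismatch")
    return (addr_hi << 8) | addr_lo, rectype, data

addr, rectype, data = parse_ihex_record(":10010000214601360121470136007EFE09D2190140")
print(hex(addr), rectype, data.hex())   # 0x100 0 21460136...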

AT+CMGS returns ERROR

I am using a SIM900 GSM module connected to my AVR microcontroller.
I tested it with an FT232 to see the transmitted data.
First the micro sends AT and the module responds OK:
AT OK
AT+CMGF=1 OK
AT+CMGS="+9893XXXXXX" returns ERROR and doesn't show ">"
Could anybody advise me what to do?
The command AT+CSCS? will tell you what type of SMS encoding is used. The proper answer is "GSM", and if it is not, you should set it with the command AT+CSCS="GSM".
And remember to finish the SMS text with "Ctrl+Z" (not "Enter").
You aren't passing all the parameters to the command.
The command format is:
AT+CMGS=<number><CR><message><CTRL-Z>
Where:
<CR> = ASCII character 13
<CTRL-Z> = ASCII character 26
You have passed only the number, and without the <CR> you won't see the > prompt for the message.
Example:
AT+CMGS="+9893XXXXXX"
> This is the message.<CTRL-Z>
The response is:
+CMGS:<mr>
OK
Where <mr> is the message reference.
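As a rough sketch of the whole exchange (written in Python with pyserial purely for illustration; the port name, baud rate and phone number are placeholders, and on the AVR you would do the same writes over the UART):

import time
import serial   # pyserial

CR = b"\r"        # ASCII character 13, ends each AT command line
CTRL_Z = b"\x1a"  # ASCII character 26, ends the message body

ser = serial.Serial("/dev/ttyUSB0", 9600, timeout=2)

def send(cmd):
    ser.write(cmd + CR)
    time.sleep(0.5)
    return ser.read(ser.in_waiting or 1)

print(send(b"AT"))           # expect OK
print(send(b"AT+CMGF=1"))    # text mode, expect OK
ser.write(b'AT+CMGS="+9893XXXXXX"' + CR)
time.sleep(0.5)
print(ser.read(ser.in_waiting or 1))           # expect the '>' prompt
ser.write(b"This is the message." + CTRL_Z)    # Ctrl+Z (not Enter) sends the SMS
time.sleep(3)
print(ser.read(ser.in_waiting or 1))           # expect +CMGS:<mr> then OK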
If the AT+CSCS? command returns UCS2, then many arguments need to be encoded as a hex string of their UTF-16 encoding, so the phone number would become "002B0039003800390033...", and the SMS text would need to be encoded in the same way. If you don't need UCS2 encoding, the easiest thing to do is to switch to GSM encoding (or another encoding from the available set, as shown by the AT+CSCS=? command).
Sometimes the issue is the text mode you are in. Enter AT+CMGF? and you should receive +CMGF: 1. If instead you receive +CMGF: 0, enter AT+CMGF=1. This changes the message format from PDU mode to Text mode. I'm not sure exactly what either of those means, but this fixed my issue.
SIM 800 AT command manual

jpg file difference : from wireshark tcp stream and from a C++ socket

I'm trying to record a JPEG image sent by an Ethernet camera in an mjpg stream.
The image I obtain with my Borland C++ application (VSPCIP) looks identical in Notepad++ to the TCP stream saved from Wireshark, except for the number of characters: 15540 in my file versus 15342 in the Wireshark file, whereas the JPEG Content-Length is announced as 15342.
That is to say, I have 198 more non-displayable characters than expected, but both files have 247 lines.
Here are the two files :
http://demo.ovh.com/fr/a61295d39f963998ba1244da2f55a27d/
Which tool could I use (Notepad++, where I tried displaying the files as UTF-8 and as ANSI and they still looked identical even though they don't have the same number of characters, or another editor) to view the non-displayable characters?
std::ofstream by default opens the file in text mode, which means it might translate newline characters ('\n', binary 0x0A) into carriage-return/newline sequences ("\r\n", binary 0x0D 0x0A). That would match your numbers: 15540 - 15342 = 198 extra bytes, one inserted 0x0D for each 0x0A byte that happens to occur in the JPEG data.
Open the output file in binary mode and it will most likely solve your problem:
std::ofstream os("filename", std::ios_base::out | std::ios_base::binary);
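If you want to confirm that this is what happened, compare the two dumps byte by byte instead of in an editor; for example, a small Python check (the file names below are placeholders for the two files linked above):

good = open("wireshark_dump.jpg", "rb").read()   # stream saved from Wireshark
bad = open("vspcip_dump.jpg", "rb").read()       # file written by the application

print(len(bad) - len(good))       # 198: one extra byte per translated newline
print(good.count(b"\n"))          # 0x0A bytes in the original JPEG data
print(bad.count(b"\r\n"))         # if text-mode translation happened, close to the line above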
