jpg file difference : from wireshark tcp stream and from a C++ socket - tcp

I'm trying to record a jpeg image sent by an Ethernet camera in a mjpg stream.
The image I obtain with my Borland C++ application (VSPCIP) looks identical in Notepad++ to the tcp stream saved from the application Wireshark (except for the number of characters : 15540 in my file, and 15342 in the wireshark file, whereas the jpeg content-length is announced to be 15342).
That is to say that I have 198 non-displayable characters more than expected but both files have 247 lines.
Here are the two files :
http://demo.ovh.com/fr/a61295d39f963998ba1244da2f55a27d/
Which tool could I use to view the non-displayable characters, in Notepad++ or in another editor? In Notepad++ I tried displaying the files as UTF-8 and as ANSI: they still look identical, even though they don't have the same number of characters.

std::ofstream opens the file in text mode by default, which means that on Windows it might translate each newline character ('\n', binary 0x0a) into a carriage-return/newline sequence ("\r\n", binary 0x0d and 0x0a). One extra 0x0d for every 0x0a in the image data would account for the surplus bytes in your file.
Open the output file in binary mode and it will most likely solve your problem:
std::ofstream os("filename", std::ios_base::out | std::ios_base::binary);
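A minimal sketch of that fix, assuming the received JPEG bytes are already held in a std::string and that the compiler supports C++11 (an older Borland compiler may need path.c_str() in the stream constructors). The helper names write_binary, read_binary and count_byte are made up for illustration. Counting the 0x0a bytes in the Wireshark file is a quick check: if the count matches the number of surplus bytes, newline translation is the likely cause.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Write raw bytes to a file with newline translation disabled.
bool write_binary(const std::string& path, const std::string& bytes) {
    std::ofstream os(path, std::ios_base::out | std::ios_base::binary);
    os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    os.close();  // flush now so that late write errors are reported
    return !os.fail();
}

// Read a whole file back as raw bytes, again without translation.
std::string read_binary(const std::string& path) {
    std::ifstream is(path, std::ios_base::in | std::ios_base::binary);
    return std::string(std::istreambuf_iterator<char>(is),
                       std::istreambuf_iterator<char>());
}

// Count how often one byte value occurs, e.g. 0x0a in the reference file.
std::size_t count_byte(const std::string& bytes, unsigned char value) {
    std::size_t count = 0;
    for (unsigned char byte : bytes) {
        if (byte == value) {
            ++count;
        }
    }
    return count;
}
```

Calling count_byte(read_binary("wireshark.jpg"), 0x0a) on the reference file (the file name is a placeholder) gives the number of bytes that text mode would have expanded.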

Related

Decoding Binary Data in Tcl

I am reading data from a TCP port in Tcl using a socket. The messages do not end with a newline, but they do contain a header giving the number of bytes of data.
I have the following code to read two bytes of data from the socket (16-bit little endian) and convert them into an integer that I can then use in a loop to read the rest of the data:
binary scan [read $Socket 2] s* length
In this case $Socket is my socket and it has been configured to use binary encoding.
This works well except where either the upper or the lower byte is 0x0D. It appears Tcl reads both 0x0D and 0x0A as '\n', which then defaults to 0x0A, so the code does not work correctly. For example, 13 is read as 10. How do I stop this from happening?
The socket should be placed into binary mode if you're moving binary data across it.
chan configure $Socket -translation binary
# Use [fconfigure] instead of [chan configure] in older Tcl versions
This disables all the automatic processing that Tcl usually does — your description says you're having a problem with end-of-line conversion — and makes it so that read will just deliver a string of the bytes (formally a string of characters between U+000000 and U+0000FF, and internally using an efficient in-memory encoding scheme).
For files, you can include b in the control mode when opening to get this done for you. For sockets, you need to do this yourself.
In addition to configuring binary encoding, you also need to set the translation to 'lf'. As this is a frequently occurring situation, there is a shorthand for making these two settings:
fconfigure $Socket -translation binary
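For comparison, here is a sketch in C++ of what binary scan with the s format computes from the two header bytes, written out by hand. The function name decode_s16_le is hypothetical. It shows why the bytes must arrive untranslated: a low byte of 0x0D has to decode to 13, which cannot happen if the channel has already turned it into 0x0A.

```cpp
#include <cstdint>

// Combine two raw bytes (low byte first) into a signed 16-bit value,
// mirroring what Tcl's [binary scan $data s var] produces.
int decode_s16_le(std::uint8_t low, std::uint8_t high) {
    int value = static_cast<int>(low) | (static_cast<int>(high) << 8);
    // Values with the top bit set represent negative numbers.
    if (value >= 0x8000) {
        value -= 0x10000;
    }
    return value;
}
```

With an untranslated channel, decode_s16_le(0x0D, 0x00) yields 13. With end-of-line translation active the same header would arrive as 0x0A 0x00 and decode to 10, which is the symptom described in the question.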

Parse .a2l and .hex files

I'm trying to parse some .a2l and .hex files to extract variables and their values. So far I don't know how to find the values of the variables in the .hex file. Here is a link to download an example of these files.
To be more specific : How can I read the value at the address 0x810600 in the .hex file ?
/begin CHARACTERISTIC ASAM.C.DEPENDENT.REF_1.SWORD
"Dependent SWORD"
VALUE
0x810600
RL.FNC.SWORD.ROW_DIR
0
CM.IDENTICAL
-32268 32267
/begin DEPENDENT_CHARACTERISTIC
"X1 + 5"
ASAM.C.SCALAR.SBYTE.IDENTICAL
/end DEPENDENT_CHARACTERISTIC
DISPLAY_IDENTIFIER DI.ASAM.C.DEPENDENT.REF_1.SWORD
/end CHARACTERISTIC
In the same A2L, look up the RL.FNC.SWORD.ROW_DIR item; I guess it describes a signed word (2-byte) type.
I'm not sure whether this is an array or some special type; I assume it is just a single variable (scalar).
Then look up the CM.IDENTICAL item; as its name suggests, it is probably an identical compu_method. That means a HEX value of 0 is displayed as 0, a HEX value of 100 is displayed as 100, and so on: the internal value and the physical value are identical, with no special conversion, I guess.
Go to the address 0x810600 in the HEX file and you will find the value there. Since the compu_method is identical, I guess the value in the HEX file is displayed unchanged in M/C software (INCA, Vision, CANape, ...).
The HEX file is in Intel HEX format. This format maps each part of the file to a part of the device's address space. If you use Linux, you can also dump it with the following command:
objdump -s file.hex

Output of AMR-WB decoder - What is the format? and How to play it?

I downloaded the 3GPP AMR-WB codec (26.173) from http://www.3gpp.org/DynaReport/26173.htm and successfully compiled it. However, the file generated by the decoder is a so-called binary synthesized speech file (*.out). I am wondering what the exact format is and how I can play the file? Thanks
For AMR-WB, the output is raw PCM with the following properties:
16000 Hz (16 kHz) sampling frequency
1 channel (mono)
16 bits per sample
You can play it using Audacity or any other player which supports PCM input.
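If a player insists on a container, the raw samples can be wrapped in a WAV file by prepending a 44-byte header. The sketch below builds such a header for the properties listed above; the function names are hypothetical, and it assumes the samples in the .out file are little-endian, which is what a decoder built on an x86 machine would typically write.

```cpp
#include <cstdint>
#include <string>

// Append the low `bytes` bytes of value in little-endian order.
static void append_le(std::string& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out += static_cast<char>((value >> (8 * i)) & 0xFF);
    }
}

// Build a canonical 44-byte PCM WAV header for data_bytes bytes of samples.
std::string make_wav_header(std::uint32_t data_bytes,
                            std::uint32_t sample_rate = 16000,
                            std::uint16_t channels = 1,
                            std::uint16_t bits_per_sample = 16) {
    const std::uint32_t block_align = channels * bits_per_sample / 8u;
    std::string header;
    header += "RIFF";
    append_le(header, 36 + data_bytes, 4);           // size of the rest of the file
    header += "WAVE";
    header += "fmt ";
    append_le(header, 16, 4);                        // size of the fmt chunk
    append_le(header, 1, 2);                         // format tag 1 = linear PCM
    append_le(header, channels, 2);
    append_le(header, sample_rate, 4);
    append_le(header, sample_rate * block_align, 4); // bytes per second
    append_le(header, block_align, 2);               // bytes per sample frame
    append_le(header, bits_per_sample, 2);
    header += "data";
    append_le(header, data_bytes, 4);
    return header;
}
```

Writing make_wav_header(size of the .out file) followed by the unchanged contents of the .out file to a new file, opened in binary mode, should give a file that ordinary players accept.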

Why a hex file is used in burning program in micro controller?

Whenever we program a microcontroller, we convert the C file into a hex file and then burn that into the controller.
My question is: why only a hex file? Is that hex file a hexadecimal version of the binary executable?
If yes, then why don't we use a binary file instead?
If you are talking about an "Intel hex" file, the reason is that it is ASCII, which makes it easy to examine and parse. True, it is inefficient in one way (each data byte takes two characters), but compared to a raw binary it might still be smaller. A raw binary has at most one address associated with it, the starting address, and that address is not embedded in the file. An Intel hex file, or a Motorola S-record, which is a similar and often used format, carries the addresses in the file itself. Both the ihex and srec formats are basically lines of ASCII hex numbers that represent a record type, a starting address, a length, data, and a checksum. There are non-data lines in there, but much of it will be data. So if your program has a few bytes at address 0x1000 and a few bytes at 0x80000000, then a .bin file would be at its smallest 0x80000000-0x1000 plus a few bytes, but would typically be 0x80000000 plus a few bytes (right, 2 gigabytes), whereas an ihex or srec file would be dozens of bytes in total. The ihex and srec formats also have built-in checksums to help protect against corrupt files; not perfect, of course, but better than nothing at all.
Since then, ELF, COFF and other formats have become popular. These are also based on blocks of data and not on a complete memory image. They are binary, not ASCII, formats, but they are not just a memory image: they provide chunks of data with an address, a type, and so on.
Because ihex and srec are so simple to create and parse, they will continue to be used for a long time. It does not take a lot of resources in a bootloader, for example, to handle receiving an ihex or srec file. (The same is true of a binary, of course, but the binary has a lot of fill data in it, costing a lot of unnecessary transmission time.)
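To show how little is involved in producing the format, here is a sketch that builds one Intel HEX record. The function name make_ihex_record is made up for illustration, and the caller is assumed to pass at most 255 data bytes. The checksum is the two's complement of the sum of all preceding bytes of the record.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Build one Intel HEX record: ':' + length + 16-bit offset + type + data + checksum.
std::string make_ihex_record(std::uint8_t type, std::uint16_t offset,
                             const std::vector<std::uint8_t>& data) {
    std::vector<std::uint8_t> bytes;
    bytes.push_back(static_cast<std::uint8_t>(data.size()));  // assumes size <= 255
    bytes.push_back(static_cast<std::uint8_t>(offset >> 8));
    bytes.push_back(static_cast<std::uint8_t>(offset & 0xFF));
    bytes.push_back(type);
    bytes.insert(bytes.end(), data.begin(), data.end());

    // Checksum: the value that makes the sum of all record bytes 0 modulo 256.
    unsigned sum = 0;
    for (std::uint8_t byte : bytes) {
        sum += byte;
    }
    bytes.push_back(static_cast<std::uint8_t>((0x100 - (sum & 0xFF)) & 0xFF));

    // Render every byte as two uppercase hex digits.
    std::string record = ":";
    char buffer[3];
    for (std::uint8_t byte : bytes) {
        std::snprintf(buffer, sizeof buffer, "%02X", static_cast<unsigned>(byte));
        record += buffer;
    }
    return record;
}
```

A few bytes at two widely separated addresses then cost one short data record each, plus an extended address record (type 04) where the upper 16 address bits change and an end-of-file record (type 01), instead of gigabytes of fill.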

Wordpress/Apache - 404 error with unicode characters in image filenames

We've recently moved a website to a new server, and are running into an odd issue where some uploaded images with unicode characters in the filename are giving us a 404 error.
Via ssh/FTP, we can see that the files are definitely there.
For example:
http://sjofasting.no/project/adnoy
none of the images are working:
Code:
<img class='image-display' title='' src='http://sjofasting.no/wp/wp-content/uploads/2012/03/ådnøy_1_2.jpg' width='685' height='484'/>
SSH:
-rw-r--r-- 1 xxxxxxxx xxxxxxxx 836813 Aug 3 16:12 ådnøy_1_2.jpg
What is also strange is that if you navigate to the directory you can even click on the image and it works:
http://sjofasting.no/wp/wp-content/uploads/2012/03/
click on 'ådnøy_1_2.jpg' and it works.
Somehow wordpress is generating
http://sjofasting.no/wp/wp-content/uploads/2012/03/ådnøy_1_2.jpg
and copying from the direct folder browse is generating
http://sjofasting.no/wp/wp-content/uploads/2012/03/a%CC%8Adn%C3%B8y_1_2.jpg
What is going on??
edit:
If I copy the image url from the wordpress source I get:
http://sjofasting.no/wp/wp-content/uploads/2011/11/Bore-Strand-Hotellg%C3%A5rd-12.jpg
When copied from the apache browser I get:
http://sjofasting.no/wp/wp-content/uploads/2011/11/Bore-Strand-Hotellga%cc%8ard-12.jpg
What could account for this discrepancy between:
%C3%A5 and a%cc%8a
??
Unicode normalisation.
0xC3 0xA5 is the UTF-8 encoding for U+00E5 a-with-ring.
0xCC 0x8A is the UTF-8 encoding for U+030A combining ring.
U+00E5 is the composed (Normal Form C) way of writing an a-ring; an a letter followed by U+030A is the decomposed (Normal Form D) way of writing it. å vs å - they should look the same, though they may differ slightly depending on font rendering.
Now normally it doesn't really matter which one you've got because sensible filesystems leave them untouched. If you save a file called [char U+00E5].txt (å.txt), it stays called that under Windows and Linux.
Macs, on the other hand, are insane. The filesystem prefers Normal Form D, to the extent that any composed characters you pass into it get converted into decomposed ones. If you put a file in called [char U+00E5].txt and immediately list the directory, you'll find you've actually got a file called a[char U+030A].txt. You can still access the file as [char U+00E5].txt on a Mac because it'll convert that input into Normal Form D too before looking it up, but you cannot recover the same filename in character sequence terms as you put in: it's a lossy conversion.
So if you save your files on a Mac and then transfer to a filesystem where [char U+00E5].txt and a[char U+030A].txt refer to different files, you will get broken links.
Update the pages to point to the Normal Form D versions of the URLs, or re-upload the files from a filesystem that doesn't egregiously mangle Unicode characters.
Think Different, Cause Bizarre Interoperability Problems.
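The two URL spellings can be reproduced by percent-encoding the raw UTF-8 bytes of each form. This is only a sketch: the function and constant names are hypothetical, only non-ASCII bytes are escaped (a real URL encoder must also escape reserved ASCII characters), and the C++ standard library has no normalisation routine, so both byte sequences are written out by hand.

```cpp
#include <cstdio>
#include <string>

// Percent-encode every non-ASCII byte of a UTF-8 string; ASCII passes through.
std::string percent_encode_non_ascii(const std::string& utf8) {
    std::string out;
    char buffer[4];
    for (unsigned char byte : utf8) {
        if (byte < 0x80) {
            out += static_cast<char>(byte);
        } else {
            std::snprintf(buffer, sizeof buffer, "%%%02X", static_cast<unsigned>(byte));
            out += buffer;
        }
    }
    return out;
}

// The two spellings of a-ring as UTF-8 byte sequences.
const std::string a_ring_nfc = "\xC3\xA5";   // U+00E5, composed (Normal Form C)
const std::string a_ring_nfd = "a\xCC\x8A";  // U+0061 U+030A, decomposed (Normal Form D)
```

Encoding a_ring_nfc gives %C3%A5, the form in the WordPress-generated links, while a_ring_nfd gives a%CC%8A, the form in the Apache directory listing. The strings compare unequal byte for byte, which is why a filesystem that does not normalise treats them as two different file names.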
