Bytes IO and Text Files - bytesio

This may come as a silly question but it would really help me clear my concepts.
Bytes IO uses byte strings, which means it works with bytes data.
And bytes data is not human readable.
So when I code the following:
f = open('new.txt','wb')
f.write(b'helloworld')
f.close()
A .txt file appears in the Python root directory, and when I open it the text 'helloworld' is in it. My question is: I wrote bytes data, so why can I read it in a text file as 'helloworld' when bytes data is supposedly only computer readable?
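A minimal sketch of what is going on, assuming a temporary directory instead of the Python root directory (the path is only for illustration). The bytes in b'helloworld' are the ASCII codes of the letters, so a text editor that decodes the file as ASCII or UTF-8 shows the same letters:

```python
import os
import tempfile

# Hypothetical location; the question writes 'new.txt' to the working directory.
path = os.path.join(tempfile.mkdtemp(), "new.txt")

# Write raw bytes, exactly as in the question.
with open(path, "wb") as f:
    f.write(b"helloworld")

# Read the same file back as raw bytes: each byte is just a number.
with open(path, "rb") as f:
    raw = f.read()
codes = list(raw)  # integer value of every byte on disk

# Read it back as text: the decoder maps each number to a character.
with open(path, "r", encoding="ascii") as f:
    text = f.read()

print(codes)
print(text)
```

The file on disk is identical in both cases; "binary" versus "text" only describes how the program reading it interprets the bytes.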

Related

Examples that send unknown size data with http chunked header?

After reading some posts and Wikipedia, I still don't have a clear picture of how the chunked header is used in practice.
One example I found, from "Content-Length header versus chunked encoding", is:
On the other hand, if the content length is really unpredictable
beforehand (e.g. when your intent is to zip several files together and
send it as one), then sending it in chunks may be faster than
buffering it in server's memory or writing to local disk file system
first.
So does that mean I can send zip files while I am still zipping them? How?
I've also noticed that when I download a GitHub repo, I receive the data in chunks. Does GitHub also send files this way (sending while zipping)?
A minimal example would be much appreciated. :)
Here is an example using Perl (with the IO::Compress::Zip module) that sends a zipped file on the fly, as @deceze pointed out:
use strict;
use warnings;
use IO::Compress::Zip qw(:all);

my @files = ('example.gif', 'example1.png'); # here are some files
my $path  = "/home/projects";                # files location

# here is the header
print "Content-Type: application/zip\r\n"; # we are going to compress to zip and send it
print "Content-Disposition: attachment; filename=\"zip.zip\"\r\n\r\n"; # zip.zip is the name suggested to the client

binmode STDOUT; # the archive is binary data
# assumption: the archive goes to STDOUT; the original call passed no output argument
my $zip = IO::Compress::Zip->new(\*STDOUT)
    or die "zip failed: $ZipError\n";

foreach my $file (@files) {
    $zip->newStream(Name => $file, Method => ZIP_CM_STORE); # storing files in zip
    open(my $fh, "<", "$path/$file") or die "cannot open $path/$file: $!\n";
    binmode $fh; # reading file in binary mode
    my $data;
    while (read($fh, $data, 1024)) { # reading data from file to the end
        $zip->print($data); # print the data in binary
    }
    close($fh);
}
$zip->close;
As the script shows, the zip filename in the header is only the name suggested to the client; no file of that name is ever created on the server. The files are zipped and printed in binary mode right away, so there is no need to build the archive, store it, and then send it: you can zip the files and write them straight to the client.
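For the chunked framing itself, here is a rough Python sketch (not what any particular server does). It compresses data piece by piece with zlib and wraps each piece as an HTTP/1.1 chunk, so the total length never has to be known up front. The function names are made up for illustration, and decode_chunked exists only to check the result; it ignores chunk extensions and trailers.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield compressed pieces as soon as the compressor releases them."""
    compressor = zlib.compressobj()
    for block in blocks:
        piece = compressor.compress(block)
        if piece:  # the compressor may buffer input and return b""
            yield piece
    yield compressor.flush()  # whatever is still buffered at the end


def chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece as a chunk: hex size, CRLF, data, CRLF."""
    for piece in pieces:
        if piece:  # a zero-length chunk would terminate the body early
            yield b"%x\r\n%s\r\n" % (len(piece), piece)
    yield b"0\r\n\r\n"  # last-chunk marker, no trailers


def decode_chunked(body: bytes) -> bytes:
    """Undo chunked() so the round trip can be verified."""
    out, pos = [], 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol], 16)
        if size == 0:
            return b"".join(out)
        start = eol + 2
        out.append(body[start:start + size])
        pos = start + size + 2  # skip the data and its trailing CRLF


blocks = [b"hello " * 100, b"world " * 100]
body = b"".join(chunked(compress_stream(blocks)))
restored = zlib.decompress(decode_chunked(body))
```

In a real server each framed chunk would be written to the socket as it is produced, with a "Transfer-Encoding: chunked" header in place of Content-Length.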

How can I tell if my dicom files are compressed?

I have been working with dicom files that are about 4 MB each but I recently received some which are 280 KB each. I am not sure whether this is because they are from different CT scanners or if the new dicoms were compressed before being given to me.
Is there a way to find out and if they are compressed is there a way to uncompressed them to the original size?
This is in continuation to the other answer from @kritzel_sw.
If you see any of the following UIDs in (0002,0010) Transfer Syntax UID element:
1.2.840.10008.1.2 Implicit VR Little Endian: Default Transfer Syntax for DICOM
1.2.840.10008.1.2.1 Explicit VR Little Endian
1.2.840.10008.1.2.2 Explicit VR Big Endian
then the Pixel Data (7FE0,0010) is uncompressed. You will generally observe a bigger file size here.
Not part of your question, but objects other than images (for example a PDF, as may be the case with a Structured Report) can be encapsulated with the following Transfer Syntax:
1.2.840.10008.1.2.1.99 Deflated Explicit VR Little Endian
Other well known values for Transfer Syntax mean that the Pixel Data is compressed.
Note that there are also private Transfer Syntax values possible for data set. Implementation of those values is generally private to the respective manufacturer.
Yes and yes.
I recommend the binary tools from the OFFIS DICOM toolkit (DCMTK), but you will be able to achieve the same results with other toolkits. DCMTK is available from the OFFIS website.
How to find out if your files are compressed:
dcmdump <filename>
Have a look at the metaheader, the attribute Transfer Syntax UID (0002,0010) in particular. Dcmdump "translates" the unique identifier to the human readable transfer syntax, e.g.
(0002,0010) UI =LittleEndianExplicit # 20, 1 TransferSyntaxUID
The Transfer Syntax tells you whether or not the pixel data in this DICOM file is compressed.
How to decompress compressed images:
dcmdjpeg <compressed DICOM file in> <uncompressed DICOM file out>
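If no toolkit is at hand, the same check can be roughed out in plain Python. This is only a sketch: it assumes a standard Part 10 file (128-byte preamble, "DICM" marker, file meta group in Explicit VR Little Endian), the function and helper names are made up, and the sample data is synthetic rather than taken from a real scanner.

```python
import io
import struct

# Transfer syntaxes listed above as having uncompressed Pixel Data.
UNCOMPRESSED = {
    "1.2.840.10008.1.2",    # Implicit VR Little Endian
    "1.2.840.10008.1.2.1",  # Explicit VR Little Endian
    "1.2.840.10008.1.2.2",  # Explicit VR Big Endian
}
# VRs that use 2 reserved bytes plus a 4-byte length (enough for group 0002).
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def transfer_syntax_uid(stream):
    """Return the value of (0002,0010), or None if it is not found."""
    stream.seek(128)  # skip the preamble
    if stream.read(4) != b"DICM":
        raise ValueError("not a DICOM Part 10 file")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return None
        group, element, vr, length = struct.unpack("<HH2sH", header)
        if group != 0x0002:  # left the file meta group
            return None
        if vr in LONG_VRS:  # the 2 bytes just read were reserved
            (length,) = struct.unpack("<I", stream.read(4))
        value = stream.read(length)
        if element == 0x0010:
            return value.rstrip(b"\x00 ").decode("ascii")


def short_element(group, element, vr, value):
    """Build one short-form element; used only for the synthetic sample."""
    return struct.pack("<HH2sH", group, element, vr, len(value)) + value


# Synthetic header: a SOP Class UID followed by a JPEG Lossless transfer syntax.
sample = (
    bytes(128) + b"DICM"
    + short_element(0x0002, 0x0002, b"UI", b"1.2.840.10008.5.1.4.1.1.2\x00")
    + short_element(0x0002, 0x0010, b"UI", b"1.2.840.10008.1.2.4.70")
)
uid = transfer_syntax_uid(io.BytesIO(sample))
is_uncompressed = uid in UNCOMPRESSED
print(uid, "uncompressed" if is_uncompressed else "compressed or other")
```

For a real file, pass open(filename, "rb") instead of the BytesIO object. A proper DICOM library is still the safer choice for anything beyond this quick check.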

Why a hex file is used in burning program in micro controller?

Whenever we program a microcontroller, we convert the C file into a hex file and then burn that into the controller.
My question is: why a hex file only? Is that hex file a hexadecimal version of the binary executable?
If yes, then why don't we use a binary file instead?
If you are talking about an "Intel HEX" file, the reason is that it is ASCII, which makes it easy to examine and parse. True, it is inefficient in one way, since every byte takes two hex characters, but compared to a raw binary it might still be smaller. A raw binary has at most one address associated with it, the starting address, and that address is not embedded in the file. An Intel HEX file, or a Motorola S-record, which is a similar and often-used format, is basically lines of ASCII hex numbers that each carry a record type, a starting address, a length, data, and a checksum. There are non-data lines in there, but most of it will be data. So if your program has a few bytes at address 0x1000 and a few bytes at 0x80000000, a .bin file would be at its smallest 0x80000000-0x1000 plus a few bytes, but would typically be 0x80000000 plus a few bytes (right, 2 gigabytes), whereas an ihex or srec would be dozens of bytes in total. The ihex and srec formats also have built-in checksums to help protect against corrupt files; not perfect, of course, but better than nothing at all.
Since then ELF, COFF, and other formats have become popular. These are also based on blocks of data rather than a complete memory image. They are binary, not ASCII, formats, but they are not just a memory image: they provide chunks of data with an address, type, and so on.
Because ihex and srec are so simple to create and parse, they will continue to be used for a long time. It does not take a lot of resources in a bootloader, for example, to handle receiving an ihex or srec file. (The same is true of a raw binary, of course, but the binary has a lot of fill data in it, which costs a lot of unnecessary transmission time.)
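To make the size argument concrete, here is a small Python sketch of Intel HEX records for the two-segment example above. The helper names are made up, and to_hex assumes each segment is at most 255 bytes and does not cross a 64 KiB boundary; real tools split data into 16- or 32-byte records.

```python
def record(address, rectype, data):
    """Build one record: ':' + length, 16-bit address, type, data, checksum."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, rectype]) + data
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def parse(line):
    """Check the checksum and return (address, record type, data)."""
    raw = bytes.fromhex(line[1:])
    if line[0] != ":" or sum(raw) & 0xFF != 0:
        raise ValueError("corrupt record: " + line)
    length = raw[0]
    address = (raw[1] << 8) | raw[2]
    return address, raw[3], raw[4:4 + length]


def to_hex(segments):
    """Encode {absolute address: bytes} as Intel HEX text."""
    lines = []
    for base, data in sorted(segments.items()):
        upper = base >> 16
        lines.append(record(0, 0x04, upper.to_bytes(2, "big")))  # extended linear address
        lines.append(record(base & 0xFFFF, 0x00, data))          # data record
    lines.append(record(0, 0x01, b""))                           # end of file
    return "\n".join(lines)


# A few bytes at 0x1000 and a few bytes at 0x80000000, as in the answer.
segments = {0x1000: b"\x01\x02\x03\x04", 0x80000000: b"\x05\x06\x07\x08"}
hex_text = to_hex(segments)

# Size a flat binary would need to cover both segments with fill in between.
flat_size = max(a + len(d) for a, d in segments.items()) - min(segments)

print(hex_text)
print(len(hex_text), "bytes as Intel HEX versus", flat_size, "bytes as a flat binary")
```

The parse function is about all a bootloader needs per line, which is why the format is cheap to support.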

jpg file difference : from wireshark tcp stream and from a C++ socket

I'm trying to record a jpeg image sent by an Ethernet camera in a mjpg stream.
The image I obtain with my Borland C++ application (VSPCIP) looks identical in Notepad++ to the TCP stream saved from Wireshark, except for the number of characters: 15540 in my file and 15342 in the Wireshark file, whereas the JPEG Content-Length is announced as 15342.
That is to say, I have 198 more non-displayable characters than expected, but both files have 247 lines.
Here are the two files :
http://demo.ovh.com/fr/a61295d39f963998ba1244da2f55a27d/
Which tool could I use to view the non-displayable characters, in Notepad++ or another editor? (I tried displaying the files as UTF-8 and as ANSI: they still look identical even though they don't have the same number of characters.)
std::ofstream by default opens the file in text mode, which means it might translate newline characters ('\n' binary 0x0a) into a carriage-return/newline sequence ("\r\n", binary 0x0d and 0x0a).
Open the output file in binary mode and it will most likely solve your problem:
std::ofstream os("filename", std::ios_base::out | std::ios_base::binary);
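To see why the sizes differ, here is a small Python sketch that mimics what text mode does on Windows and locates the first byte where two buffers diverge. The sample bytes are made up; with the real files you would read both in binary mode and compare them the same way.

```python
def text_mode_write(data: bytes) -> bytes:
    """Mimic Windows text-mode output: every LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None if len(a) == len(b) else min(len(a), len(b))


# Made-up "JPEG" payload containing two 0x0A bytes.
original = bytes([0xFF, 0xD8, 0x0A, 0x41, 0x0A, 0xFF, 0xD9])
damaged = text_mode_write(original)

extra = len(damaged) - len(original)
offset = first_difference(original, damaged)
print(extra, "extra bytes; first difference at offset", offset)
```

The number of extra bytes equals the number of 0x0A bytes in the original data, which would fit the 198-byte difference in the question if the image contains 198 such bytes.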

InputB vs. Get; code pages; slow reading on unix server

We have been using the usual code to read in a complete file into a string to then parse in VB6. The files are ANSI text but encoded using whatever code page the user was in at the time (we have Chinese and English users for example). This is the code
Open FileName For Binary As nFileUnit
sContents = StrConv(InputB(LOF(nFileUnit), nFileUnit), vbUnicode)
However, we have discovered that this is VERY slow when reading a file from a server running Unix/Linux, particularly when the owner of the file is not the same as the process doing the reading.
I have rewritten the above using Get and found that it is much faster and does not suffer from any issues with file ownership. I appreciate that this might be solved by reconfiguring the server somehow, but since the Get method is still much faster than InputB even without that issue, I'd like to replace my existing code with Get.
I wonder if someone could tell me whether this really does the same thing. In particular, does it do the ANSI to Unicode conversion correctly, and will this always be true? My testing suggests the following replacement code does the same thing, but faster:
Open FileName For Binary As nFileUnit
sContents = String(LOF(nFileUnit), " ")
Get #nFileUnit, , sContents
I also realise I could use a byte array, but again my tests suggest the above is simpler and works. So how does the buffer work correctly? The online help for Get talks of characters returned, which would clearly cause problems when reading an ANSI file written on the Chinese code page with 2-byte Chinese characters in it.
The following might be of interest, because the InputB approach is commonly given as the way to read a complete file even though it is much slower. Examples:
Reading 380Kb file across the network from the unix server
InputB (file owned) = 0.875 sec
InputB (not owned) = 72.8 sec
Get (either) = 0.0156 sec
Reading a 9Mb file across the network from the unix server
InputB (file owned) = 19.65 sec
Get (either) = 0.42 sec
Thanks
Jonathan
InputB() is CVar(InputB$()), and is known to be horribly slow. My suspicion is that InputB$() reads the bytes and converts them to Unicode using the current codepage via some stock logic for reading text from disk, then does another conversion back to ANSI using the current codepage.
You might be far ahead to use ADODB.Stream.LoadFromFile() to load complete ANSI text files. You can set the .Type = adTypeText and .Charset = the appropriate ANSI encoding as required to read Unicode back out of it via .ReadText(x) where x can be a number of bytes, or adReadAll or adReadLine. For line reading you can set .LineSeparator to adCR, adCRLF, or adLF as required.
Many Charset values are supported: KOI8 for Cyrillic, Big5 for Chinese, etc.
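The byte-versus-character concern in the question can be illustrated in Python (chosen only for brevity; this is not VB6 behaviour). The sketch assumes the file was written with the Big5 code page and shows that the byte count and the character count differ, and that the chosen code page decides what text comes out:

```python
# Two Chinese characters, encoded the way a Big5 "ANSI" file would store them.
file_bytes = "中文".encode("big5")

# Decoding with the code page the file was written in recovers the text.
text = file_bytes.decode("big5")

# Decoding the same bytes with a single-byte code page gives one (wrong)
# character per byte instead of failing.
wrong = file_bytes.decode("latin-1")

print(len(file_bytes), "bytes ->", len(text), "characters")
print(repr(wrong))
```

So a buffer sized from the file length in bytes is large enough for a double-byte file, but the number of characters after conversion can be smaller than the number of bytes read.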
