I'm converting some legacy code to ITK 4.7 for DICOM manipulation. I'm reading private image tags, but I'm getting Base64-encoded results for a few private tags.
For example, one tag gives me the value
MlwtNVwyNSA=
which is the Base64 encoding of
2\-5\25
I know there is a Base64.h that comes with the GDCM library, but the question is: is that header/are those functions part of ITK as well, or do I need to create GDCM objects to convert the encoded values? Or should I write my own C++ function for the conversion?
What would be the most efficient (if not native) way to do this within the ITK 4.7 library?
Looking at the source code (gdcmBase64.h and .cxx), gdcm::Base64 is a self-contained class that is independent of the rest of GDCM. Just #include "gdcmBase64.h" and call Encode and Decode as needed.
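For example, decoding the value from the question might look like the following (a minimal sketch, assuming the static GetDecodeLength/Decode helpers declared in gdcmBase64.h take the usual (dst, dstlen, src, srclen) arguments):

#include "gdcmBase64.h"
#include <iostream>
#include <string>
#include <vector>

int main()
{
  const std::string encoded = "MlwtNVwyNSA=";

  // Ask GDCM how large the decoded buffer needs to be.
  size_t length = gdcm::Base64::GetDecodeLength(encoded.c_str(), encoded.size());

  std::vector<char> decoded(length);
  gdcm::Base64::Decode(decoded.data(), decoded.size(), encoded.c_str(), encoded.size());

  // Prints "2\-5\25 " - DICOM pads values to even length with a trailing space.
  std::cout << std::string(decoded.begin(), decoded.end()) << std::endl;
  return 0;
}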
Alternatively, you can find a standalone implementation of Base64 encoding and decoding and drop it into your source file; Base64 is a simple algorithm (see the sketch below).
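If you would rather not depend on the GDCM header at all, a bare-bones decoder fits in a couple of dozen lines. A sketch, with no strict validation of padding or illegal input:

#include <string>

// Decode a Base64 string; non-alphabet characters are silently skipped.
std::string Base64Decode(const std::string &input)
{
  static const std::string alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string output;
  unsigned int value = 0;
  int bits = -8;
  for (unsigned char c : input)
  {
    if (c == '=')
      break;                       // padding marks the end of the data
    std::string::size_type pos = alphabet.find(static_cast<char>(c));
    if (pos == std::string::npos)
      continue;                    // skip whitespace and other stray bytes
    value = (value << 6) | static_cast<unsigned int>(pos);
    bits += 6;
    if (bits >= 0)
    {
      output.push_back(static_cast<char>((value >> bits) & 0xFF));
      bits -= 8;
    }
  }
  return output;
}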
I am reading in a file that should be UTF-8 encoded using QTextStream::readAll(). If I attempt to open a corrupt UTF-8 file (or a binary file) I want to know that the data was not valid UTF-8.
I tried checking the status() after the read, but it did not indicate any abnormal condition.
I know I could read the whole file in binary mode and write a routine to check it myself, but it seems there should be an easier way, since the read has done all that UTF-8 conversion already.
You can use QTextCodec for this.
QTextCodec * QTextCodec::codecForUtfText(const QByteArray & ba, QTextCodec * defaultCodec)
From the documentation:
Tries to detect the encoding of the provided snippet ba by using the
BOM (Byte Order Mark) and returns a QTextCodec instance that is
capable of decoding the text to unicode. If the codec cannot be
detected from the content provided, defaultCodec is returned.
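Note that codecForUtfText itself only inspects the BOM; to actually verify that the bytes decode cleanly, you can combine it with QTextCodec::ConverterState, which counts undecodable input. A sketch (the helper name, file handling, and UTF-8 fallback are my assumptions):

#include <QByteArray>
#include <QFile>
#include <QString>
#include <QTextCodec>

// Returns true when the file's contents decode without errors.
bool readUtfFile(const QString &path, QString &text)
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
        return false;
    QByteArray raw = file.readAll();

    // Detect UTF-8/16/32 from a BOM; assume plain UTF-8 when there is none.
    QTextCodec *codec = QTextCodec::codecForUtfText(raw, QTextCodec::codecForName("UTF-8"));

    // invalidChars is incremented for every byte sequence the codec rejects.
    QTextCodec::ConverterState state;
    text = codec->toUnicode(raw.constData(), raw.size(), &state);
    return state.invalidChars == 0;
}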
I'm trying to read a file line by line in Ada; it's an XML text file. I'm following the instructions here:
http://rosettacode.org/wiki/Read_a_file_line_by_line#Ada
However, there's a problem that annoys me: the Get_Line function seems to be unaware of byte order marks and reads them as part of the text itself, which means that the first line I read always starts with some extra bytes that should not be there.
While removing the extra bytes from the string manually is no big deal, it seems strange to me that a function dedicated to text input/output is unaware of BOMs. There must be a way to read a text file in Ada without having to worry about this... is there?
Ada.Text_IO is specified to handle ISO-8859-1 encoded text, so ignoring a UTF-8 feature is the proper thing to do.
If Ada.Wide_Text_IO and Ada.Wide_Wide_Text_IO also hand you the byte order mark when asked to read UTF-8 encoded text, then you should consider reporting it as a bug to GCC - but as quite a lot of the behaviour of the text I/O packages in Ada is implementation defined, you should be ready for a "won't fix" answer.
One possibility is to use the stream attributes and write a UTF_8 file type that handles reading and discarding the BOM.
I need to create DataMatrix barcodes which may contain non-Latin characters. I have code which creates the barcodes correctly when they only consist of Latin characters; when I run the same code with non-Latin (Hebrew or Russian) characters, however, although the code runs to completion and the barcode is created, the non-Latin characters are not deciphered by the barcode reader.
Any assistance or ideas would be greatly appreciated!
Your issue is related to the character encoding used prior to generating the barcode. The encoding used by the generator to encode must match the encoding used by the reader to decode.
Possible encodings are:
Extended Channel Interpretations (ECI) is supported by DataMatrix and other 2D barcode standards. The generator places an ECI identifying code inside the barcode data, so the reader knows to use ECI to correctly convert the data back to text.
UTF-8 encodes pretty much any language.
Code page is an older encoding, but if your generator uses it, you can use code page 1255 for Hebrew or 1251 for Russian. See this SO answer for more info. (A byte-level sketch of both options appears below.)
To test your encoding, try Inlite's Online Barcode Reader (OBR) which should read the correct text for ECI and UTF-8 encoded barcodes. If it does, the problem is with your barcode reader which is not decoding correctly.
If OBR returns binary data, either your generator uses code page or does not encode correctly at all. Try another generator that supports ECI or UTF-8.
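For concreteness, this is roughly what the two byte-level options look like. A sketch using Qt's QTextCodec, purely illustrative: the helper name barcodePayload is made up, and any library that converts between Unicode strings and code pages would do.

#include <QByteArray>
#include <QString>
#include <QTextCodec>

// Hypothetical helper: produce the bytes a given generator expects.
QByteArray barcodePayload(const QString &text, bool generatorSupportsUtf8)
{
    if (generatorSupportsUtf8)
        return text.toUtf8();   // UTF-8/ECI capable generators take this directly

    // Older generators expect code-page bytes, e.g. windows-1255 for Hebrew.
    QTextCodec *codec = QTextCodec::codecForName("windows-1255");
    return codec ? codec->fromUnicode(text) : QByteArray();
}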
In the Qt documentation it states that (among others) the following Unicode string encodings are supported:
UTF-8
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
Given the three different codecs listed for the 2- and 4-octet encodings, I was wondering: how do the two codecs without an explicit endianness ("UTF-16" and "UTF-32") decide which byte order to use?
Based on the source code in src/corelib/codecs/, it seems Qt uses the byte ordering of the host for UTF-16 and UTF-32.
If you use QTextCodec to read an existing Unicode string that has a BOM, and you didn't explicitly ask to ignore the header, the byte ordering detected in the string is used.
In qutfcodec_p.h, both QUtf16Codec::e and QUtf32Codec::e are initialized with the value DetectEndianness (an enum).
In qutfcodec.cpp, near the beginning of the functions convertFromUnicode and convertToUnicode from the classes QUtf16 and QUtf32 (used by QUtf16Codec and QUtf32Codec), you can find the line:
endian = (QSysInfo::ByteOrder == QSysInfo::BigEndian)
? BigEndianness : LittleEndianness;
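You can observe the host-order default directly when encoding: the plain "UTF-16" codec writes a BOM in the machine's byte order. A small sketch (the bytes in the comment are for illustration):

#include <QByteArray>
#include <QDebug>
#include <QString>
#include <QTextCodec>

int main()
{
    // No endianness in the name, so the codec falls back to the host order.
    QTextCodec *codec = QTextCodec::codecForName("UTF-16");
    QByteArray bytes = codec->fromUnicode(QStringLiteral("A"));

    // Little-endian host: "fffe4100"; big-endian host: "feff0041".
    qDebug() << bytes.toHex();
    return 0;
}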
I have a string that is:
!"#$%&'()*+,-./0123456789:;?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª« ®¯°±²³´µ¶•¸¹º»¼½¾¿ÀÁÂÃÄÅàáâäèçéêëìíîïôö÷òóõùúý
I posted that to a service and used HtmlEncode, and I got this result:
!#$%&'()* ,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~����������� ���������•������������������������������������
That isn't the result I need. How do I get the original string back? Thanks!
Your string is not ASCII, so you are either using a string to represent binary data, or you're not maintaining awareness of multi-byte encoding. In any case, the simplest way to deal with any Internet-based technology (HTTP, SMTP, POP, IMAP) is to encode it as 7-bit clean. One common way is to base64-encode your data, send it across the wire, then base64-decode it before trying to process it.
I believe this is what you're looking for:
!"#$%&'()*+,-./0123456789:;?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\\^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«®¯°±²³´µ¶•¸¹º»¼½¾¿ÀÁÂÃÄÅàáâäèçéêëìíîïôö÷òóõùúý
You just need to use a better HTML entity/encoding library or tool. The one I used to generate this is from Ruby - the HTML Entities library. The code I wrote to do this follows. I had to put your text in input.txt to preserve the Unicode (there was an EOF character in the string), but it worked great.
require 'rubygems'
require 'htmlentities'

str = File.read('input.txt')       # read the raw text so the Unicode bytes survive
coder = HTMLEntities.new
puts coder.encode(str, :named)     # emit named entities, e.g. &amp; for &