Hi, I'm wondering how I can convert MIME-encoded text like =?ISO-8859-1?Q? into UTF-8 so it's readable for the users.
Thanks
You can do it from the file properties (right-click the file in Eclipse), under Resource.
Hope that helps.
Get your text into a ByteArray.
Use readMultiByte() and specify the encoding you need (they used iso-8859-1 right in the example, he-he).
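For comparison, outside ActionScript the same conversion can be sketched in Python with the standard email.header module. This is only an illustration: the function name decode_mime_header and the sample header value are made up for the example and are not taken from the question.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw: str) -> str:
    """Decode MIME encoded-words (=?charset?Q?...?= or =?charset?B?...?=)."""
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header joins them back into one Unicode string.
    return str(make_header(decode_header(raw)))


# Hypothetical sample: "Café" as an ISO-8859-1 quoted-printable encoded-word.
subject = decode_mime_header("=?ISO-8859-1?Q?Caf=E9?=")
utf8_bytes = subject.encode("utf-8")  # re-encode as UTF-8 for output
print(subject)
```

Once the text is a Unicode string, writing it out as UTF-8 is just the final encode step.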
I was trying to write Chinese text into the DICOM tag (0008,0080) with fo-dicom, but the tag value shows up as garbled characters in the resulting file. Please help me review it.
The C# code is below:
var file = DicomFile.Open(@"C:\Users\Administrator\Desktop\20D08F04");
//file.Dataset.Add(DicomTag.SpecificCharacterSet, "GB18030");
//file.Dataset.Add(DicomTag.SpecificCharacterSet, "ISO_IR 192");
// I already tried setting (0008,0005) to GBK and UTF-8, but it doesn't work.
file.Dataset.Add(DicomTag.InstitutionName, "测试");
file.Save(@"C:\Users\Administrator\Desktop\test123.dcm");
The resulting file looks like below in the DCMTK editor.
Can anyone help me?
I am sure the DVTK DICOM file editor supports the Chinese character set,
because another attribute, Patient's Name, has a Chinese value and is displayed properly.
The default encoding in fo-dicom for .NET is US-ASCII. Setting the Specific Character Set after you have opened the DICOM file does not help, because parsing is done in the open operation. Specific Character Set only applies if it is already set in the DICOM file.
What you can do is to set the "fallback encoding" to be used if Specific Character Set is not specified in the DICOM file, in the argument list of DicomFile.Open.
Try this for example:
var file = DicomFile.Open(fileName, DicomEncoding.GetEncoding("GB18030"));
And as @johnelemans pointed out in the comments, also verify that your viewer is capable of displaying the Chinese character set.
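To see why the choice of encoding matters here, a small Python sketch (plain string handling, not fo-dicom and not a DICOM writer) shows what happens to the same two characters under different character sets. The variable names are only for illustration.

```python
text = "测试"

# A character set that contains these characters round-trips cleanly.
gb_bytes = text.encode("gb18030")
utf8_bytes = text.encode("utf-8")
round_trip = gb_bytes.decode("gb18030")

# US-ASCII has no representation for them, so each one is lost.
ascii_bytes = text.encode("ascii", errors="replace")

# Reading GB18030 bytes with the wrong character set yields junk text.
garbled = gb_bytes.decode("iso-8859-1")

print(round_trip == text, ascii_bytes, repr(garbled))
```

The writer and the reader have to agree on the character set; if either side falls back to a default that cannot represent the text, the value comes out unreadable.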
I am working on a project that requires converting PDF to text. The PDF contains Hindi fonts (Mangal, to be specific) along with English.
100% of the English is converted into text. About 95% of the Hindi part is converted. The remaining 5% of the Hindi text comes out either blank or like " ा". As far as I can tell, the accented characters are not being converted to text properly.
I am using the following command:
pdftotext -enc UTF-8 pdfname.pdf textname.txt
The PDF uses the following fonts:
name, type, emb, sub, uni
ZDPKEY+Mangal, CID TrueType, yes, yes, yes
Mangal, TrueType, no, no, no
Helvetica-Bold, Type 1, no, no, no
CODUBM+Mangal-Bold, CID TrueType, yes, yes, yes
Mangal-Bold, TrueType, no, no, no
Times-Roman, Type 1, no, no, no
Helvetica, Type 1, no, no, no
The following is the result of the conversion. The left side is the original PDF; the right side is the text opened in Notepad:
http://preview.tinyurl.com/qbxud9o
My question is whether the 5% of missing or junk characters can be captured correctly as text with open-source packages. I would appreciate your input!
Change your command to:
pdftotext -enc "UTF-8" pdfname.pdf textname.txt
It has worked for me, similarly it should work for you.
I'm trying to get the degrees Celsius symbol to show up using the pseudo-selector :after, but I can't seem to get any Unicode to work. The symbol I have in place now prints a capital A before the degree symbol.
.temp:after{
content:"°C";
}
I’m pretty sure it actually prints “Â°”, i.e. capital A with circumflex before the degree sign. The reason is that the file containing the CSS code is UTF-8 encoded but being interpreted as windows-1252 encoded. (The degree sign, U+00B0, is 0xC2 0xB0 in UTF-8 encoding; if this is interpreted as windows-1252, or as ISO-8859-1, you get U+00C2 U+00B0, that is “Â°”.)
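The byte-level effect described above can be reproduced in a few lines of Python (used here only as a neutral way to look at the bytes; it is not part of the CSS fix):

```python
degree = "\u00b0"  # DEGREE SIGN

# UTF-8 stores U+00B0 as two bytes: 0xC2 0xB0.
utf8_bytes = degree.encode("utf-8")

# Reading those two bytes as windows-1252 turns each byte into its own character.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(), repr(misread))
```

The first byte, 0xC2, becomes the stray capital A with circumflex, and the second byte is the degree sign itself.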
The solution is to declare the encoding of the file as UTF-8. The details depend on whether the CSS code is inside an HTML document or in a CSS file, and it may also depend on the server software. See the W3C page Character encodings.
If the code is in a CSS file, the simplest fix is to save that file, in your editor, as UTF-8 with BOM. Depending on the software, this might be simply flagged as “UTF-8” (as opposed to “UTF-8 without BOM”). Another way is to write the following at the very start of the CSS file:
@charset "UTF-8";
This seems to work for me: content:'\00b0 C'; http://codepen.io/anon/pen/kvyFh
This could be helpful to you: http://unicode-table.com/en/#00B0 (it gives you the HTML entity code too: °)
I wanted to validate my website, for example with http://validator.w3.org, but I always get the following error:
Sorry, I am unable to validate this document because on line 11 it
contained one or more bytes that I cannot interpret as utf-8 (in other
words, the bytes found are not valid values in the specified Character
Encoding). Please check both the content of the file and the character
encoding indication. The error was: utf8 "\xFC" does not map to
Unicode
Does anybody know where I can locate/get rid of the error?
Open the CSS file with your favorite text editor.
There, switch the encoding to UTF-8.
Go to line 11 and look for strange-looking symbols.
Delete or replace them.
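If the editor does not make the offending bytes obvious, a short Python sketch can locate them. The function name find_invalid_utf8_lines and the sample bytes are assumptions for illustration. The byte 0xFC from the error message is "ü" in ISO-8859-1, which suggests (but does not prove) that the file was saved in that encoding or in windows-1252.

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_line) for every line that is not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Hypothetical file content: line 2 holds a lone 0xFC byte.
sample = b"<p>ok</p>\n<p>M\xfcller</p>\n"
print(find_invalid_utf8_lines(sample))

# Assuming the file really is ISO-8859-1, re-encoding it gives valid UTF-8.
fixed = sample.decode("iso-8859-1").encode("utf-8")
```

For a real file, read it in binary mode and pass the bytes to the function; the reported line numbers should match the ones the validator complains about.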
I've found this string :
M``1#W?!E[G$0C#&/X)A#&<605QS8!W#^A;X"D%CI"D_XGT$"=&L:*_B'JI2K
M<.9H*78G0G+J(*R3KB6A%3A2QU5$#L0?[H0\,45&$K0,X`%'9YC`'L#`JA-`
M_!$&%(!2Q?J+&(EUHB&"OT*7E#DN"#)#.""/7L"Q#=[0;`8^`G]$4]06F4-7
MXOL8011Q*W2`7`Q/TI#EE2'9$=>`$2\8(.`PB#V`JC%``(CM&O.7HA*`,O06
M\.:V(1&!HU.0"_,$9RJ/%COJR*O>ZJAC'R\`J#`!$"6$:3!'0#"YI8/"DR5A
M39?`+H;[BADO4>IWR:+AH95:%!".4BJ#```"*#`````*##8``)Z_`P`A`"<"
MF(6```-:CJ:U'4AMCPU_UUX2YH'L.<<`*"#K)_`2K:CL;E:QST>62]03H)RG
ME8BCHQC!",MU1<E=_X%2/C1U++5-#(`W*DT_%,7>Z&J8X-)8]`20_85<*$^M
MHL/<^^K;[_`($"YZJMFF^2222GR_!H*+^K)<M:XX50```!L01659A5-V=>[N
M#_`?BMX<!W47_&"%N7HV#S^>EKZXD%^<ZT):F/UT,RL&.N,`8QFZTP4!#=3[
M%]L!.</+>!D2Y'(W[E(^HY"`G56O4&IX0JR7/^(Q]IMAK!K=&?QDNV$FXZNC
M#>'8DGE#>4&_S`'#S2X``\DQ!GH%&#R\]`^MQ6`+F<5;Y>C#?UT("7';^90C
MV!9!165U)R7`!;Y,I>.T%#2ZIRA>9E1P3HYP.*)F5`&J0#+J!+XPT&!R7N^J
MB#K'=#=5>D"BP]V%4``S#&`D/&PN'8",NRAOA1EYMJ4=::C#25!(TBWIQE3=
It's supposed to be MP3 files.
How do I turn it into an MP3 file,
and what is this kind of encoding called?
Looks like a uuencoded file. Look into uudecode/uuencode.
It looks to be uuencoded or a similar encoding. Most tools should handle the similar encodings automatically. It should have a header containing the file name. You may need to add the appropriate header and trailer for the decoder to work. Encoding any file yourself and copying the line before the encoded text and the line after it onto your data should work. Change the file name so you don't risk overwriting the original file.
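As a rough sketch of what a decoder does, the Python standard library's binascii module can decode uuencoded lines one at a time. In uuencoding the first character of each line encodes that line's byte count, and "M" means 45 bytes, which is why every full line starts with it. The helper name uu_decode_body and the sample payload are made up for the example; the data from the question is not used here.

```python
import binascii


def uu_decode_body(text: str) -> bytes:
    """Decode uuencoded text, skipping the 'begin' header and 'end' trailer."""
    out = bytearray()
    for line in text.splitlines():
        if not line or line.startswith("begin ") or line == "end":
            continue
        out += binascii.a2b_uu(line)  # one encoded line -> up to 45 bytes
    return bytes(out)


# Build a small uuencoded sample to show the round trip.
payload = bytes(range(100))
lines = [
    binascii.b2a_uu(payload[i:i + 45]).decode("ascii")
    for i in range(0, len(payload), 45)
]
encoded = "begin 644 sample.bin\n" + "".join(lines) + "`\nend\n"

decoded = uu_decode_body(encoded)
print(lines[0][0], len(decoded))
```

Because the helper skips the header and trailer, it also accepts a bare block of encoded lines, which is handy when the "begin" line is missing as in the question.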