How to convert MS Word Unicode 2-byte Cyrillic to CP866 1-byte Cyrillic

I am having an issue with a piece of hardware that only contains the CP866 code page for Cyrillic. The text that I want to display is currently in MS Word, and I need to convert it to CP866 in a text file. (I know, it just keeps getting worse!)
I am aware that MS Word uses Unicode to display Cyrillic and, if I am not mistaken, it uses UTF-16. So if I try to copy it to NP++, which from what I can tell only uses UTF-8, the hex value changes.
For example, the hex values for 'й' are 0439 in UTF-16 and D0 B9 in UTF-8, but what I need is the CP866 value A9 (89 is the uppercase 'Й').
Now, I wish I could use different hardware, but it is what it is. Does anyone know the best way to make this happen? Maybe someone could suggest a different text editor.
Thanks for the help

I think I figured it out.
Open the .doc file and go to Word Options under the main round Office button. Advanced tab -> General -> check "Confirm file format conversion on open". Click OK and close the file.
Reopen the .doc file. Choose Save As and change the type to Plain Text (.txt); the File Conversion dialog should pop up. Choose Cyrillic (DOS) and click OK. A new pop-up warns that some characters might not display correctly; click Yes.
Close the file.
Go to the file and open it in NP++. Everything looks strange because it is now displayed using the ANSI code page, BUT the hex values seem (I have not completely verified) to be the correct CP866. Now I can load my hardware.
I will be working on this for another day or two. I will report back if this did not work correctly.
Take a day off and come back later. It always seems to work. Hope this helps anyone else who may be experiencing similar issues.
Best!
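For anyone who would rather script the conversion than go through Word's dialogs, here is a minimal Python sketch. It assumes the text has first been saved from Word as a UTF-8 text file; the function name and any file paths are hypothetical. Python's built-in cp866 codec does the mapping. Note that in CP866 the lowercase 'й' (U+0439) is byte A9, while 89 is the uppercase 'Й' (U+0419).

```python
def convert_file_to_cp866(src_path, dst_path, src_encoding="utf-8"):
    """Re-encode a text file as CP866 (DOS Cyrillic) and return the bytes.

    errors="strict" raises UnicodeEncodeError for any character that
    CP866 cannot represent, instead of silently writing '?'.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    data = text.encode("cp866", errors="strict")
    with open(dst_path, "wb") as dst:
        dst.write(data)
    return data


# The same character in three encodings.
lower = "\u0439"  # Cyrillic small letter short i
print(lower.encode("utf-16-be").hex())  # 0439
print(lower.encode("utf-8").hex())      # d0b9
print(lower.encode("cp866").hex())      # a9
print("\u0419".encode("cp866").hex())   # 89 (capital letter short i)
```

Opening the result in a hex editor should then show one byte per Cyrillic letter.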

Related

Save CSS files in Notepad with ANSI or UTF-8 encoding?

I'm new to web development and CSS, and one simple question came to mind.
I know how to create a CSS file, but I'm not sure whether to save it with ANSI or UTF-8 encoding when I save the file using Notepad.
I searched the internet but didn't find anything helpful.
I want to know which is the best choice for saving a CSS file so that it will not cause problems in the future.
Please take a look at the attached image.
Thanks for your help.
Attach01: the encoding option when saving a file using Windows Notepad
I recommend you save in UTF-8,
because if in the future your CSS contains characters like
áéíóúñäëïöü and others outside ANSI, the ANSI version would only show an error.
One place where you can find such characters is in the content of a pseudo-element, for example:
.menu .spain:after{
content:"España";
}
result UTF8: "España"
result ANSI: "Espa[]a"
I also recommend using an editor that gives you more conveniences, such as Atom, which is very simple to use.
Although it may look intimidating, it is a simple editor with many aids for programming.
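To see where the broken character comes from, here is a small Python sketch of the two cases. It assumes that Notepad's "ANSI" option means Windows-1252 (true on Western European systems, but it depends on the system code page) and that the browser decodes the stylesheet as UTF-8.

```python
css = '.menu .spain:after { content: "Espa\u00f1a"; }'

utf8_bytes = css.encode("utf-8")    # file saved as UTF-8
ansi_bytes = css.encode("cp1252")   # file saved as "ANSI" (assumed Windows-1252)

# A browser that expects UTF-8 decodes each file like this.
print(utf8_bytes.decode("utf-8"))
# The single ANSI byte for the n-with-tilde is not valid UTF-8,
# so it is shown as the replacement character U+FFFD.
print(ansi_bytes.decode("utf-8", errors="replace"))
```

The UTF-8 file round-trips cleanly, while the ANSI file loses the ñ.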

SQLite database shows question marks (???) instead of these Unicode characters (தமிழ்)

I imported a CSV file containing Unicode into an SQLite database but instead of seeing the text, all that I see are question marks. Like this, "???". The encoding is UTF-8 (I've mentioned below what happened when I tried UTF-16). The SQLite manager I'm using is DB Browser for SQLite.
This is the Unicode that I typed: தமிழ்
Now, according to this answer on Stack Overflow, SQLite stores text data as Unicode. So the fact that my text is Unicode can't be the problem.
The characters I'm trying to use belong to the Tamil language. I'm trying to use them with Unicode. According to Wikipedia, there is an encoding for Tamil called TACE16. It's a 16-bit Unicode-based character encoding.
So then I set the encoding as UTF-16 when I imported the CSV file. But the file doesn't even show up in the database after importing when I do that. But it says import is successful.
Then I tried importing the CSV file with UTF-8 encoding as usual. But after importing I right clicked the row header, selected "Set Encoding" and set it to UTF-16. Now it didn't show question marks but it shows something like Chinese characters. This is what it shows now: 㼿㼿.
I tried setting TACE16 while importing. I also tried setting it manually. But it said it's either an incorrect encoding or it is not supported.
Further searching online didn't turn up anything. Could someone tell me how I can fix this issue? Basically, I want this text "தமிழ்" to show in the SQLite database after importing the CSV file which has the text.
Thank you so much. I would really appreciate your help.
I had a similar issue once, but in my case the problem was only in the DB software I used to view the tables. Have you tried retrieving your data from the database? Is it correct when you retrieve it?
In any case, unless you tell us exactly which tools you are using and for what, it is impossible to find a solution for your specific case.
OK, it turns out the issue was my CSV file. I edited it in Excel, and I guess Excel saved it using another encoding. I'm still not sure what the exact issue is, but I'll just write about how I fixed it.
I opened Notepad and typed out the data separated by commas. I saved the file with the extension .csv. Here's the important thing: you have to change the encoding to Unicode. There's a drop-down menu just left of the Save button. Use that. Here's a link to a YouTube video that shows you how.
Also, you don't need to type everything in Notepad. It can get tedious.
Type everything out in Google Spreadsheets and download it as a CSV file. It works. If you have to use Notepad, type the data in Excel, concatenate everything in each row using a formula, and copy and paste it into Notepad. Don't forget to add a comma between the contents of each cell in the Excel formula.
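If you want to check the round trip outside of any GUI tool, here is a minimal Python sketch using the standard csv and sqlite3 modules. The table name, column names, and the `import_csv` helper are made up for illustration. The key point is decoding the CSV bytes with an explicit encoding before inserting.

```python
import csv
import io
import sqlite3


def import_csv(conn, csv_bytes, encoding="utf-8-sig"):
    """Decode CSV bytes with an explicit encoding and insert the rows.

    "utf-8-sig" reads UTF-8 with or without a byte order mark; pass
    "utf-16" for a file saved with Notepad's "Unicode" option.
    """
    text = csv_bytes.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline="")))
    conn.execute("CREATE TABLE IF NOT EXISTS words (id TEXT, word TEXT)")
    conn.executemany("INSERT INTO words VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
sample = "1,தமிழ்\r\n".encode("utf-8")  # stand-in for the CSV file's bytes
import_csv(conn, sample)
print(conn.execute("SELECT word FROM words").fetchall())

# If the text had already been flattened to ASCII question marks, reading
# those bytes as UTF-16 produces CJK-looking characters (U+3F3F), which is
# probably what "Set Encoding" displayed in the question.
print(b"????".decode("utf-16-le"))
```

If the word comes back intact here but still shows as "???" elsewhere, the characters were most likely lost when the CSV was saved, before the import ever ran.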

Change Word 2013 autocorrect behaviour

This question involves bending Microsoft Word 2013 to one's will.
I have been asked to help fix a problem with Word 2013's autocorrect.
We are working on a spell checker for my native language (Afrikaans), and many Afrikaans words contain a diacritical/umlaut (ë, ö, Ü, etc).
The spell checker consists of a .dic file which is basically just a text file that contains about 508 000 words, and an autocorrect list (.acl) file that is used to automatically replace text as you type.
The spell checker works very well for the most part. It replaces the text as you type, which is the desired effect. The problem is that autocorrect doesn't work with all words.
For example, if I want to type the Afrikaans word 'pêrels' (which means 'pearls'), I should only have to type 'perels' (without the ^ character on the 'e'), and autocorrect should automatically change it to the correct form.
Same with 'reën' (rain). If I type 'reen' (without the umlaut), it is supposed to automatically correct it.
However, in both of the above cases, the words remain unchanged. A red line appears under the words, and when you right-click, you can select the correct word from the pop-up autocorrect menu as shown in the image below.
As you can see, the correct form of the word is the first one in the context menu. I need autocorrect to automatically change the wrong word into the first word that appears in said menu. It should completely ignore the other menu items, and just go with the first word.
My initial instinct was to manually add the words to the *.acl file using a text editor, but the file is encrypted and not readable (I used Notepad++).
I then tried adding them inside Word's autocorrect options menu. However, Word 2013 has a maximum autocorrect memory of 64KB, and the size of the file is already at that maximum. Whenever I add more words, it bombs out and basically wipes the file contents. This doesn't seem like the most efficient strategy anyway, since I would need to manually enter hundreds, if not thousands of autocorrect cases. Ain't nobody got time for that!
What makes this even more complicated (ironically), is that there is no real "program". In other words, this isn't a C# program with source code that I can manipulate. I have the two files mentioned above, and Word's built-in options (which I have already explored). That's it. Nothing else.
I'm stuck. Does anyone have any ideas?
Is it perhaps possible for me to hack Word to increase the autocorrect memory to, let's say, 128 KB? Google hasn't turned up anything of use.
Or, is there a way to set Word to not give the autocorrect context menu, and instead default to the first matching word in the dictionary, as mentioned above?
I can probably write a batch script, C# program, or edit the registry if need be. I just need to know where to start.
Thanks for any help!
In case you are still looking for a solution, you might consider using AutoHotkey (http://www.autohotkey.com). It is a very powerful free open-source utility, and can handle substitutions similar to AutoCorrect. Whenever the built-in program features of Word and others fail to handle my needs, I use AutoHotkey. It has the added benefit of not being tied to any specific program (e.g., Word), so the substitutions can occur anywhere needed. I hope it helps you. I have used and depended on AutoHotkey for years of new Windows versions, new Office versions, and highly recommend having a look. You might even get new ideas about time-saving automation with AutoHotkey. Good luck!
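Typing thousands of substitutions by hand is not practical, so the list could be generated from the dictionary instead. Below is a rough Python sketch that strips diacritics from each word and emits AutoHotkey hotstring lines of the form `::typed::replacement`. It assumes the .dic file can be read as one word per line; the function names are hypothetical, and words whose unaccented form is itself a valid word are skipped to avoid wrong corrections.

```python
import unicodedata


def strip_diacritics(word):
    """Return the word with combining marks (circumflex, diaeresis, ...) removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_hotstrings(words):
    """Build AutoHotkey hotstring lines mapping plain spellings to accented ones."""
    known = set(words)
    seen = set()
    lines = []
    for word in words:
        plain = strip_diacritics(word)
        # Skip words without diacritics, ambiguous cases where the plain
        # spelling is a real word, and plain spellings already used.
        if plain == word or plain in known or plain in seen:
            continue
        seen.add(plain)
        lines.append(f"::{plain}::{word}")
    return lines


for line in build_hotstrings(["pêrels", "reën", "water"]):
    print(line)
```

The resulting lines would go into an AutoHotkey script file. How well AutoHotkey copes with a list this large would need to be tested.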

How to repair unicode letters?

Someone sent me an email with letters like this:
IVIØR†€™
correct should be
IVIØR†€™
suppose to be
How do I restore them to their original Portuguese? The text got altered after being passed through an HTTP GET request.
I probably will not be able to fix the site, but maybe I could create a repair tool to fix these badly encoded letters? Does anyone know of a repair tool, or how to do it by hand? It seems like nothing is lost, just badly interpreted.
What happened here is that UTF-8 got misinterpreted as ISO-8859-1; and then other kinds of mangling (the bad ISO-8859-1 string being re-UTF-8-encoded; the non-breaking space character '\xA0' being converted to regular space '\x20') seem to have happened afterward, though those may just be a result of pasting it into Stack Overflow.
Due to the subsequent mangling, there's no really good way to completely undo it, but you can largely undo it by passing it through a not-very-strict UTF-8 interpreter. For example, if I save "IVIØR†€™" as a text-file on my computer, using Notepad, with the "ANSI" (single-byte) encoding, and then I open it in Firefox and tell it to interpret it as UTF-8 (Firefox > Web Developer > Character Encoding > Unicode (UTF-8)), then it displays "IVIØR� €™". (The "�" is because of the '\xA0' having been changed to '\x20', which broke the UTF-8 encoding.)
They're probably not broken. It's just a difference between the encoding they were sent in, vs. the decoding you're viewing them in.
Figure out what encoding was originally used, and use the same one to decode it, and it should look like the original. In terms of writing a "fix-it" tool, you'd always need to know what encoding they were originally created in, which can be complicated depending on the source, and whether or not you have access to said information.
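As a rough illustration of such a "fix-it" tool, here is a minimal Python sketch. It assumes the damage is the common case of UTF-8 bytes decoded as Windows-1252 (or ISO-8859-1); the function name and the sample strings are made up, and text that does not fit that pattern is returned unchanged.

```python
def repair_mojibake(text, wrong_encoding="cp1252"):
    """Undo one round of 'UTF-8 bytes decoded with the wrong encoding'.

    Re-encode the garbled text with the wrongly assumed encoding to get
    the original bytes back, then decode those bytes as UTF-8.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or already correct): leave it alone.
        return text


garbled = "Espa\u00f1a".encode("utf-8").decode("cp1252")
print(garbled)
print(repair_mojibake(garbled))
```

This only works while every byte survived. Once a character such as '\xA0' has been replaced by a space, the affected sequence can no longer be decoded.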

What is causing the corruption of text fields with ¿ characters?

We have a very strange problem in our application. All of a sudden we started noticing upside-down question marks being saved along with other text typed into the fields on the screen. These upside-down question marks were not originally entered by the users, and it is unclear where they come from. We are using Oracle 10g with ASP.NET.
Here is an example of the issue: "140, 141) ¿ 16-Oct-07". If anyone has seen this before and found a way to fix it, please let me know how.
This sounds like a character encoding issue. Please check what encoding your database (tables) is set to, and what encoding is used by the objects or strings that pass data into the database. If there is a mismatch (DB in ANSI, app in UTF-8), these sorts of issues can appear.
Greg, you should check the NLS_CHARACTERSET setting, not NLS_NCHAR_CHARACTERSET. And I bet it's WE8ISO8859P1 or something similar and not Unicode. The problem occurs when the submitted data is in Unicode, probably UTF-8, and Oracle tries to map the characters to the WE8ISO8859P1 character set. It does fine for most of them but fails for high ASCII number characters, like 140.
So yes, I have seen the same issue in our application and in our case it was caused by special quote marks (“example”, ‘example’) that were copied from MS Word. Word automatically converts double quotes to some other quotes. The solution was to convert the database to UTF-8.
If your users are copying from MS Word, you can turn the feature off. It's part of the AutoCorrect/AutoFormat functionality. If you uncheck the replace options for quotes and apostrophes, you should be OK. Be sure to turn off the replacements in both AutoFormat and AutoFormat As You Type.
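If converting the database is not an option and you cannot control what users paste, the application could check or clean the text before saving it. Here is a small Python sketch of the idea (the same logic could be ported to ASP.NET). It assumes the database character set is ISO-8859-1 (WE8ISO8859P1); the helper names and the replacement table are made up for illustration.

```python
# Word-style "smart" quotes mapped to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def find_unmappable(text, charset="iso-8859-1"):
    """Return the characters in text that the target charset cannot store."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def straighten_quotes(text):
    """Replace smart quotes with plain ASCII quotes."""
    return text.translate(str.maketrans(SMART_QUOTES))


pasted = "\u201cexample\u201d caf\u00e9"
print(find_unmappable(pasted))
print(find_unmappable(straighten_quotes(pasted)))
```

Characters reported by `find_unmappable` are the ones that would end up as "¿" in such a database. An accented letter like é is not reported because ISO-8859-1 can store it.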

Resources