Language translation in UTF-8 or Unicode

In my Chinese properties file I have an entry like this:
FORGOT_YOUR_PASSWORD_TITLE =\u5fd8\u8bb0 ID/\u5bc6\u7801
I have tried the native2ascii tool that ships with Java.
Now I have to modify the entry, but I cannot tell whether it is Unicode or some other encoding. I have used native2ascii and other tools to find out, without success. Can anyone help me understand what kind of code this is?

It is Unicode. I'm assuming Windows; it is a different ball game on Linux.
Start the charmap utility that comes with Windows.
Change the font to Arial Unicode MS.
Select Advanced View.
Type in each of the \u codes and click select.
You will end up with something like
忘记密码
Copy that and paste it into Google Translate. Change the other side to English and it will say Forgot Password.
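If you would rather not click through charmap, the escapes can also be decoded programmatically. Below is a minimal Python sketch, assuming the value is copied verbatim from the properties file. Python is used only for illustration, and `decode_escapes` / `encode_escapes` are hypothetical helper names, not part of any Java tooling.

```python
def decode_escapes(value: str) -> str:
    """Turn literal \\uXXXX escapes (as found in a .properties file) into characters."""
    # The input is pure ASCII, so encode it to bytes and let the
    # built-in "unicode_escape" codec interpret the \uXXXX sequences.
    return value.encode("ascii").decode("unicode_escape")


def encode_escapes(text: str) -> str:
    """Reverse direction: replace non-ASCII characters with escapes."""
    return text.encode("unicode_escape").decode("ascii")


raw = r"\u5fd8\u8bb0 ID/\u5bc6\u7801"
decoded = decode_escapes(raw)
print(decoded)                  # 忘记 ID/密码
print(encode_escapes(decoded))  # \u5fd8\u8bb0 ID/\u5bc6\u7801
```

The reverse helper is only a rough stand-in for native2ascii: for characters in the U+0080 to U+00FF range Python emits \xNN rather than \u00NN, which a properties file would not understand. For the Chinese text in the question the output matches.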

Related

VSCode : mvbasic extension on editing Unidata code with MV marks in code, ie CHAR(253), CHAR(254)

I have searched for a setting within the mvbasic extension within VSCode but I may have hit a dead end. I am new to using VSCode with the rocket mvbasic extension and still in the learning process, so please bear with me.
Most of our development has always been done directly on the server, on a Unix/AIX platform with Unidata, using the server's built-in editor. Some of our code has array assignments with CHAR(253)/CHAR(254) characters in them. See the link to the image that shows how it's done. I didn't write this code; the original software developer did, many years ago, and we aren't going to go and change it all.
How code looks on actual server
The issue is when pulling the code to edit in VSCode, the extension is changing it, and I uploaded it back and didn't pay attention and it was implemented in our production incorrectly, which created a few bugs.
ALIST="H�V�P�R�M�D"
How code looks in VSCode
How code looks after uploaded back to server from VSCode
Easy to fix, no biggie, but now to my question.
Has anyone else had this issue, or can anyone point me in the right direction? Maybe I need to create a setting that keeps the characters in the correct ASCII format so that this doesn't happen again by mistake?
VSCode defaults to the sane choice for character encoding in 2022: utf-8, but sometimes you have to deal with legacy stuff.
https://code.visualstudio.com/docs/editor/codebasics#_file-encoding-support
If you click on the UTF-8 in the bottom right corner you can choose "Reopen with Encoding":
After that, you can select a different encoding. I chose DOS (CP437) at a guess and literal MV characters are displayed as superscript 2 (²), and for me I can save to the server and confirm those characters remain as #VM after a round trip (though for my terminal emulator they appear as } which is useful).
You can edit preferences and set "files.encoding": "cp437". One other thing that can be helpful if your programs don't have a standard extension (like .bas) as most don't is to set the default mode to basic so most of what you're editing will identify as MVbasic, and you can do a quick CTRL-K M to switch to any other modes if you're just pasting in something else like SQL.
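To see why the encoding choice matters, here is a small Python sketch of what plausibly happens on an open-and-save round trip. It is an illustration under assumptions, not a description of the extension's internals: `server_bytes` is a made-up sample line and `round_trip` is a hypothetical helper.

```python
# Bytes as they might sit on the server: CHAR(253) is 0xFD, CHAR(254) is 0xFE.
server_bytes = b'ALIST="H\xfdV\xfdP\xfdR\xfdM\xfdD"'


def round_trip(data: bytes, encoding: str) -> bytes:
    """Decode and re-encode, roughly what an editor does on open + save."""
    # errors="replace" mimics an editor substituting U+FFFD for bytes
    # it cannot decode instead of refusing to open the file.
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


print(round_trip(server_bytes, "cp437") == server_bytes)  # True: marks survive
print(round_trip(server_bytes, "utf-8") == server_bytes)  # False: marks were replaced
print(server_bytes.decode("cp437"))                       # ALIST="H²V²P²R²M²D"
```

A lone 0xFD or 0xFE byte is not valid UTF-8, so it cannot survive a UTF-8 round trip. A single-byte code page that assigns a character to every byte value, such as CP437, gives each mark a stand-in character and writes the same byte back on save.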
Some useful links - the Rocket forums are helpful and the folks there are always super nice
https://community.rocketsoftware.com/forums/multivalue?CommunityKey=521bce2e-71d5-4d32-b560-dfa95e950eb5
The MV Extensions community is a good group and has always been helpful when I've had issues. I've made some small contributions, and they're very open. I prefer their extension, but honestly haven't done a deep comparison.
https://github.com/mvextensions

How to display foreign characters?

I just downloaded Brackets after hearing that it will probably be the next big editor. I know it is still in beta, but does anyone know how to display foreign characters in this editor? When I try to display a foreign character, this symbol � is displayed.
Can anyone help?
Brackets currently only supports UTF-8 encoded files.
There's an item in the feature backlog to support other encodings, so you should upvote it if that's an important use case for you.
If you're pretty certain that the file you're working with is UTF-8 encoded (bearing in mind that it's tricky to tell for sure), then it sounds like you've hit a bug. As a Brackets core contributor, I definitely encourage you to file it in our issue tracker — and please include a link to the file in question if at all possible.
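One rough way to check whether a file is at least valid UTF-8 is to try decoding it strictly. The Python sketch below assumes you have the file's raw bytes in hand; `looks_like_utf8` is a hypothetical helper name.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("café".encode("utf-8")))    # True
print(looks_like_utf8("café".encode("latin-1")))  # False: a lone 0xE9 byte is not valid UTF-8
```

A clean decode is evidence, not proof: a pure-ASCII file passes this check whatever encoding its author intended, which is part of why it is tricky to tell for sure.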

How to convert MS Word Unicode 2-byte Cyrillic to CP866 1-byte Cyrillic

I am having an issue with a piece of hardware that only contains the CP866 library/code page for Cyrillic. The text that I want to display is currently in MS Word and I need to convert it to the CP866 in a text file. (I know it just keeps getting worse!)
I am aware that MS Word uses Unicode to display Cyrillic and, if I am not mistaken, it uses UTF-16. So if I copy the text to Notepad++ (NP++), which from what I can tell only uses UTF-8, the hex value changes.
For example, the hex values for 'й': UTF-16 is 0439 and UTF-8 is d0b9, but what I need is the CP866 value, hex A9 (hex 89 is the uppercase 'Й').
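These values are easy to check with a few lines of Python, which ships codecs for all three encodings. A small sketch follows; `hex_in` is a hypothetical helper name. It prints both the lowercase and the uppercase letter, since the two sit at different CP866 positions.

```python
def hex_in(char: str, encoding: str) -> str:
    """Hex string of the bytes `char` occupies in the given encoding."""
    return char.encode(encoding).hex()


for ch in ("й", "Й"):
    print(ch, hex_in(ch, "utf-16-be"), hex_in(ch, "utf-8"), hex_in(ch, "cp866"))
# й 0439 d0b9 a9
# Й 0419 d099 89
```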
Now I wish I could use different hardware, but it is what it is. Does anyone know the best way to make this happen? Maybe a different Text Editor someone could suggest.
Thanks for the help
I think I figured it out.
Open the .doc file and go to Word Options under the main round Office button. Advanced tab -> General tab -> check "Confirm file format conversion on open". Click OK and close the file.
Reopen the .doc file. Choose Save As and change the type to Plain Text (.txt); the File Conversion dialog should pop up. Choose Cyrillic (DOS) and click OK. A new pop-up warns that something might not display correctly; click Yes.
Close the file.
Open the file in Notepad++. Everything looks strange because it is now displayed according to the ANSI code page, BUT the hex values seem (I have not completely verified this) to be the correct CP866 ones. Now I can load my hardware.
I will be working on this for another day or two. I will report back if this did not work correctly.
Take a day off and come back later; it always seems to work. Hope this helps anyone else who may be experiencing similar issues.
Best!
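For anyone who prefers a script to the Word dialog, the same re-encoding can be sketched in Python. This is an alternative under assumptions, not the method used above: it presumes the text has already been copied out of Word as an ordinary string, and `to_cp866` and the sample text are hypothetical.

```python
def to_cp866(text: str) -> bytes:
    """Encode text as CP866; raises UnicodeEncodeError on unmappable characters."""
    return text.encode("cp866")


sample = "Привет"               # hypothetical sample text
payload = to_cp866(sample)
print(len(payload))             # 6: one byte per Cyrillic letter
print(payload.decode("cp866"))  # Привет

try:
    to_cp866("“Привет”")        # curly quotes, as Word often inserts
except UnicodeEncodeError as exc:
    print("not representable in CP866:", exc.object[exc.start:exc.end])
```

Encoding strictly, rather than silently replacing characters, surfaces anything the hardware's code page cannot show, which is roughly what Word's "might not display" warning is about. To produce the file itself, opening it with `open(path, "w", encoding="cp866")` writes the same bytes.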

Warning when validating my website with http://validator.w3.org?

I created a simple test page on my website www.xaisoft.com and it had no errors, but it came back with the following warning and I am not sure what it means.
The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.
To find out what the BOM is, you can take a look at the Unicode FAQ (quoting) :
Q: What is a BOM?
A: A byte order mark (BOM) consists of the character code U+FEFF at the beginning of a data stream, where it can be used as a signature defining the byte order and encoding form, primarily of unmarked plaintext files. Under some higher level protocols, use of a BOM may be mandatory (or prohibited) in the Unicode data stream defined in that protocol.
Depending on your editor, you might find an option in the preferences to indicate it should save unicode documents without a BOM... or change editor ^^
Some text editors, notably Notepad, put an extra character at the front of the text file to indicate that it's Unicode and what byte order it is in. You don't expect Notepad to do this sort of thing, and you don't see it when you edit with Notepad. You need to open the file and explicitly resave it as ANSI (or as UTF-8 without a BOM, if your editor offers that option). If you're using fancy characters like smart quotes, trademark symbols, circle-r, or that sort of thing, don't. Use the HTML entities instead.
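If your editor has no such option, the BOM can also be detected and removed with a short script. A minimal Python sketch, assuming the page is available as raw bytes; `strip_utf8_bom` is a hypothetical helper name and the sample content is made up.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")
print(with_bom[:3].hex())            # efbbbf
print(strip_utf8_bom(with_bom))      # b'<!DOCTYPE html>'
print(with_bom.decode("utf-8-sig"))  # <!DOCTYPE html>
```

In UTF-8 the character U+FEFF is stored as the three bytes EF BB BF, which is what the validator is warning about. The "utf-8-sig" codec skips a BOM when decoding and leaves BOM-less input untouched.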

Adobe Flex fails on unicode / foreign input in Linux

I have been learning Flex for a few days and suddenly noticed that typing Unicode / foreign characters on Linux into a TextInput, TextArea or RichTextEditor produces unreadable text composed of several characters (it looks as if the UTF-8 bytes are being mishandled). Output, on the other hand, is flawless.
I tried hard to find reports of the same issue on the internet, but only this old blog entry turned up. The author produced a temporary solution, but it is not sufficient.
So if Windows allows Unicode input and Linux doesn't, what should I do? Maybe the problem is on my machine only? Has anybody run into the same problem, and maybe found a solution?
I have Adobe Flash 10.0.32.18 installed on my Sabayon Linux box.
Might have something to do with this bug:
Incorrect unicode input in linux
Which, apparently, will get fixed once FP 10.1 is released.
Just to further update the answer. Flex 4 components support unicode and the unicode characters can be typed into input controls using Google Chrome, Firefox 3.6+ and IE7+ .
For Java MySQL users, add the following parameters to the connection URL to allow UTF-8 write operations:
database.url=jdbc:mysql://localhost:3306/sampledb?useUnicode=true&characterEncoding=utf-8
The database tables and columns must also be set to a utf8_* character set/collation to make sure the Unicode data can be stored in them.

Resources