How to protect special characters in UNIX files

I moved a text file from Windows to Unix. The content of the Windows file had some special characters like ®, ä which I needed. However, after moving it to Linux, all my special characters were prefixed with Ã. For example, if the string was äbcd# it was converted to ÃabcdÃ#. Also, some special characters were replaced entirely by either - or `. Please let me know how I can protect my special characters from being modified or corrupted.
Update1:
I tried using binary transfer in WinSCP. I am still getting the same problem.
Update2:
I tried using dos2unix. It didn't work either.

The problem is caused by the fact that Windows and Unix use different text encodings. Your file on Windows is probably in an ANSI encoding (not ASCII), while Unix (Linux?) most likely expects UTF-8.
In Notepad, save your file in UTF-8 format, then run it through dos2unix to fix the line breaks.
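If you prefer to do the conversion with a script instead of Notepad, a minimal Python sketch along these lines should work. The file names are placeholders, and it assumes the Windows file is in the common cp1252 ANSI code page; adjust source_encoding if yours differs.

    # Re-encode a Windows text file as UTF-8 with Unix line endings.
    source_encoding = "cp1252"   # assumed ANSI code page of the Windows file
    with open("input.txt", "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    text = text.replace("\r\n", "\n")   # CRLF -> LF, the same job dos2unix does
    with open("output.txt", "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

Because the sketch also rewrites the line endings, a separate dos2unix pass on the output should not be needed.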

Related

Chinese character encoding with different operating systems/languages

I am having trouble reading my CSV file containing simplified Chinese characters into R. I have tried encoding = utf-8, gb18030, gb2312, etc., but the Chinese characters are not displayed.
I also tried changing the encoding to a UTF-8 CSV with Excel; no luck.
I also tried using Chinese Windows and setting the locale to China; no luck. After I changed to Chinese Windows, Excel can open my CSV (English Windows cannot open it correctly). RStudio can show it in View(), but the R console could not read my CSV even after I reinstalled R as the Chinese version.
I tried Ubuntu; Ubuntu could not read my CSV at all. At least on Windows, RStudio can read my data well.
I tried Google Sheets, but my file is so big that Google Sheets will not even open it.
I tried Calc in Ubuntu and converting the file to GB*, since GB works fine in Windows RStudio. No luck, and it takes more than 10 minutes to convert my 200MB-750MB data to gb18030.
Ubuntu uses UTF-8 as the default encoding for Chinese, so you should encode the file as UTF-8 instead of GB18030 or another GB-family encoding.
(1) Download OpenOffice (free and fast to install, and it handles larger files than Calc on Ubuntu).
(2) Detect your CSV's encoding: simply open the CSV in OpenOffice and pick the encoding that displays your Chinese characters correctly.
(3) Save your CSV in the encoding that matches your operating system: the default Chinese encoding is GBK on Windows and UTF-8 on Ubuntu.
This should solve both the file-size problem and the encoding problem. You do not even have to force the encoding; a normal read.csv will work.
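If OpenOffice is not convenient, the conversion in steps (2)-(3) can also be scripted. Here is a rough Python sketch; the file names are placeholders, and it assumes the source really is GB18030 (change src_encoding to whatever the detection step shows).

    # Stream-convert a GB-encoded CSV to UTF-8 so a 200MB-750MB file
    # never has to fit in memory all at once.
    src_encoding = "gb18030"   # assumed source encoding
    with open("data_gb.csv", "r", encoding=src_encoding) as src, \
         open("data_utf8.csv", "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)

After that, a normal read.csv on the UTF-8 copy should behave as described above.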

Altering an exe file (deleting or changing 1 byte) and keeping it executable

I have an exe file (Windows environment) which runs OK on Windows 10, but eventually it produces an error ("Can't find folder %USERPROFILE%\Local Settings\folder\\Tempfolder"). Needless to say, the folder does exist. I looked into the exe file with a hex editor and found that the path of this specific folder is hardcoded somewhere in cleartext. It looked the same as the error message:
"%USERPROFILE%\Local Settings\folder\\Tempfolder".
I wonder if there is a mistake in the path -- the double backslash before the last subfolder. As you can see from the example above, the double backslash appears only at the last subfolder, so I don't think it is interpreted as an escape character.
So I tried deleting the extra backslash with the hex editor, saving, and executing, but when I did so Windows 10 could no longer run the file; it said something like "This cannot be recognized as a Windows application".
Why does this happen?
Is there a way to do this without messing up the executable?
(Note: the path above is an example; I won't write the real one because the exe is actually a dubious tool of the cracking genre.)
STOP!
DON'T USE THIS FOR WHAT THE ASKER IS DOING. IT MIGHT MESS STUFF UP EVEN MORE.
Deleting a byte shrinks the file and shifts every offset after it, which breaks the PE structure; that is why Windows no longer recognizes the executable. Keep the file length the same instead: shift the remaining characters of the string left by one and add a 0x00 byte on the end of the string. :)
And also update the PE checksum.
If it's signed, you're outta luck.
If you don't wanna update the checksum by hand, use CFF Explorer.
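If you would rather script the checksum fix than open the file in CFF Explorer, a short sketch with the third-party Python pefile library looks roughly like this; the file names are placeholders, and it assumes the patched string kept the file the same length as described above.

    import pefile

    # Recompute the PE checksum of the patched binary and write a fixed copy.
    pe = pefile.PE("patched.exe")
    pe.OPTIONAL_HEADER.CheckSum = pe.generate_checksum()
    pe.write("patched_fixed.exe")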

Diacritics in a Pascal console APP?

I print messages with diacritics in my console application. I tried setting multiple encodings commonly used for my language (Czech), but none of them gives me the desired result. I tried UTF-8, Windows (CP1250), ISO 8859-2...
Is there a way to force the console to use a specific encoding?
Or at least, how can I find out which encoding my console uses?
Thanks in advance.
EDIT: Using Windows 7, the basic command-line console (cmd.exe)
To display the current codepage in cmd.exe:
chcp
To change the current codepage, e.g., to CP-1250:
chcp 1250
By default, the Windows console uses the OEM encoding. There are three encodings used by the Windows APIs: OEM, ANSI, and Unicode. cmd.exe, when run normally, uses OEM.
UTF-8 seems to be possible, but it needs:
(1) starting the console with "cmd /u" (create a shortcut);
(2) setting the codepage with chcp 65001;
(3) choosing a Unicode-capable font (e.g. Consolas 20) in the settings of the shortcut.
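If you want to check from code which code page the console is actually using (rather than running chcp by hand), the Win32 GetConsoleOutputCP call reports it. Here is a quick Python sketch of the idea; any language that can call the Win32 API, including Pascal, can do the same.

    import ctypes

    # Ask kernel32 which code page the current console uses for output.
    # 65001 means UTF-8; 852 (OEM) and 1250 (ANSI) are the ones commonly seen for Czech.
    cp = ctypes.windll.kernel32.GetConsoleOutputCP()
    print("Console output code page:", cp)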

Boost.asio HTTP Client issues with \r vs \n

I'm trying to use the HTTP client example code (sync_client.cpp) to retrieve a jpg. I'm using VS2008 on a Windows XP machine and trying to access a Linux server. The resulting data I get back does not start new lines in the same places as the data I retrieve by simply downloading the file. I believe this is an issue with how Windows interprets a new line (\r\n) versus how Unix/Linux interprets a new line (\n). The example code (http://www.boost.org/doc/libs/1_48_0/doc/html/boost_asio/example/http/client/sync_client.cpp) uses \r\n, \n, and \r, so I'm a little confused about how to rectify the problem. Any suggestions to correct this for my case (hard-coded) or to detect it automatically would be greatly appreciated.
P.S. I'm using boost 1.48.0

Ctrl-M chars when transfer files SFTP

I am sending files from a Windows system to a Unix SFTP server using the JSCAPE FTP client.
However, I am experiencing the following issue:
When uploading a text file from Windows to Unix, each line of the transferred text file contains Ctrl-M characters. I did some searching and found that using the "ASCII" transfer mode should solve the issue, but the Ctrl-M characters are still appearing in the files.
Can anyone shed some light on this issue?
Thanks in advance.
FTP supports switching between Binary and ASCII transfer modes and converting data on the fly, but SFTP does not support that feature and always transfers files unchanged (at least in the most popular version 3 of the protocol).
The utility dos2unix can be used to convert files from DOS to Unix.
That's the carriage-return part of the Windows line ending (CRLF) showing up on the Unix system.
Convert the line endings prior to uploading or find a different FTP server package that can do it for you.
Some text editors have this functionality built in. For instance, Notepad++
Do you have cygwin? You can use the dos2unix utility.
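If neither Notepad++ nor Cygwin is at hand, the line-ending conversion can also be done with a small script on the Windows side before uploading; a minimal Python sketch (the file names are placeholders):

    # Strip Windows CRLF line endings so the uploaded file uses Unix LF only.
    with open("report.txt", "rb") as src:
        data = src.read()
    with open("report_unix.txt", "wb") as dst:
        dst.write(data.replace(b"\r\n", b"\n"))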
