My access table looks like this:
ban#test.local REJECT
test#test.local OK
The first line does not work, but if I try this:
# Some comment or another access rule
ban#test.local REJECT
test#test.local OK
Line "ban#test.local REJECT" successfully rejecting address
I.e. postfix first line skips, always
PS: main.cf:
check_sender_access hash:/path/to/access_table
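For context, this lookup is usually wired into a restriction list in main.cf; a sketch assuming the standard setup:
smtpd_sender_restrictions = check_sender_access hash:/path/to/access_table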
It turned out the problem was character encoding: the access file had been saved as UTF-8 with a BOM, and the BOM bytes were carried into the Berkeley DB map.
Broken file encoding: Unicode text, UTF-8 (with BOM) text
Good file encoding: Unicode text, UTF-8 text
The BOM bytes \ef\bb\bf were inserted at the beginning of the first line, so the first address was parsed incorrectly.
If you dump the Berkeley DB file, you can see the BOM bytes (\ef\bb\bf) right away:
$ db_dump -p bad_access_table.db
VERSION=3
format=print
type=hash
h_nelem=4097
db_pagesize=4096
HEADER=END
\ef\bb\bfban#test.local\00
REJECT\00
test#test.local\00
OK\00
DATA=END
The solution: save the file as UTF-8 without a BOM.
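A minimal shell sketch for checking and fixing this (the path echoes the main.cf example; GNU sed assumed for the \xNN escapes):
file /path/to/access_table                          # reports "UTF-8 (with BOM)" when the BOM is present
sed -i '1s/^\xEF\xBB\xBF//' /path/to/access_table   # strip the three BOM bytes from the first line
postmap hash:/path/to/access_table                  # rebuild the Berkeley DB map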
I have an issue with an LDAP entry. I am trying to create a dn such as:
dn: ou=élèves,ou=1A,ou=Classes,ou=Personnes,dc=ldap,dc=ecoleplurielle,dc=local
Since I have UTF-8 characters in ou=élèves, I encode the value in Base64 and add an extra colon after the dn attribute, which gives me:
dn::b3U9w6lsw6h2ZXMsb3U9MlNBLG91PUNsYXNzZXMsb3U9UGVyc29ubmVzLGRjPWxkYXAsZGM9ZWNvbGVwbHVyaWVsbGUsZGM9bG9jYWw=
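For reference, such a Base64 value can be produced from the raw UTF-8 dn; a sketch assuming GNU coreutils base64, whose -w0 switch disables line wrapping:
printf '%s' 'ou=élèves,ou=1A,ou=Classes,ou=Personnes,dc=ldap,dc=ecoleplurielle,dc=local' | base64 -w0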
The thing is, when I use ldapadd with this entry, the command seems to auto-generate comments, and in these autogenerated comments the UTF-8 characters are wrongly represented.
Let's look at the details:
My ldapsearch result gives me this. You can see that the third comment starts with \C3\A9 and \C3\A8, which are the escaped byte values of the UTF-8 encodings of the letters é and è.
(Screenshot: the LDIF used to populate LDAP.)
The weird thing is that I do not write any comments in the LDIF file; the buggy line seems to appear on its own. You might say it doesn't matter since it's just a comment, but it makes phpLDAPadmin crash...
I already tried converting the LDIF to UTF-8 using iconv.
Does someone know how to prevent this comment from being generated? Is there something I am missing here?
You can disable comments in the LDIF output of ldapsearch using the -L option:
Search results are displayed in LDAP Data Interchange Format detailed in
ldif(5). A single -L restricts the output to LDIFv1. A second -L
disables comments. A third -L disables printing of the LDIF version.
The default is to use an extended version of LDIF.
ldapsearch -LL [options]
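For example, a sketch with a hypothetical host and the base dn from the question:
ldapsearch -LL -x -H ldap://localhost -b "dc=ldap,dc=ecoleplurielle,dc=local" "(objectClass=organizationalUnit)"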
Note that instead of turning the whole dn string into Base64, you could write the accented characters as printable ASCII by escaping the hex pairs of their UTF-8 encoding, as specified by RFC 4514:
Unicode Letter Description       UCS code  UTF-8   Escaped
-------------------------------  --------  ------  -------
Latin Small Letter E with Acute  U+00E9    0xC3A9  \C3\A9
Latin Small Letter E with Grave  U+00E8    0xC3A8  \C3\A8
Which indeed turns the dn into:
dn: ou=\C3\A9l\C3\A8ves,ou=1A,ou=Classes,ou=Personnes,dc=ldap,dc=ecoleplurielle,dc=local
It would be interesting to check whether phpLDAPadmin has a problem with this encoding, or if the crash was caused by the base64 encoded dn or something else (I would be glad to have your feedback!).
[Edit] - It seems related to this issue.
I am using an R script to create and append a file, but I need the file to be saved in ANSI encoding, even though some characters are in Unicode format. How can I ensure ANSI encoding?
newfile <- '/home/user/abc.ttl'  # use forward slashes: backslashes start escape sequences in R strings
file.create(newfile)
text3 <- readLines('/home/user/init.ttl')
message(sprintf('readlines %d', length(text3)))  # a bare sprintf() prints nothing when run via Rscript
for (k in 1:length(text3))
{
  cat(text3[[k]], file = newfile, sep = "\n", append = TRUE)
}
Encoding can be tricky: you need to detect the encoding on input, and then convert it before writing. Here it sounds like your input file init.ttl is encoded as UTF-8 and you need it converted to ASCII. This means you will probably lose some untranslatable characters, since there is no mapping to ASCII for UTF-8 characters outside the lower 128 code points. (Within that range, UTF-8 and ASCII are byte-for-byte identical.)
So here is how to do it. You will have to modify your code accordingly to test since you did not supply the elements needed for a reproducible example.
Make sure that your input file is actually UTF-8 and that you are reading it as UTF-8. You can do this by adding encoding = "UTF-8" to the third line of your code, as an argument to readLines(). Note that you may not be able to set the system locale to UTF-8 on a Windows platform, but the file will still be read as UTF-8, even though extended characters may not display properly.
Use iconv() to convert the text from UTF-8 to ASCII. iconv() is vectorised so it works on the whole set of text. You can do this using
text3 <- iconv(text3, "UTF-8", "ASCII", sub = "")
Note here that the sub = "" argument prevents the default behaviour of converting the entire character element to NA if it encounters any untranslatable characters. (These include the seemingly innocent but actually subtly evil things such as "smart quotes".)
Now when you write the file using cat() the output should be ASCII.
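Putting it together, a minimal sketch assuming init.ttl really is UTF-8 (paths hypothetical); cat() is vectorised over text3, so the per-line loop from the question is not needed:
text3 <- readLines('/home/user/init.ttl', encoding = "UTF-8")  # read as UTF-8
text3 <- iconv(text3, "UTF-8", "ASCII", sub = "")              # drop untranslatable characters
cat(text3, file = '/home/user/abc.ttl', sep = "\n")            # write the ASCII-only result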
I was trying to find a solution for my problem and after looking at the forums I couldn't so I'll explain my problem here.
We receive a CSV file from a client with some special characters, encoded as unknown-8bit. We convert this CSV file to XML using an awk script. With the XML file we make an API call to our system, using UTF-8 as the default encoding. The response is an error with the following information:
org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence
The content of the file is as below:
151215901579-109617744500,sandra,sandra,Coesfeld,,Coesfeld,48653,DE,1,2.30,ASTRA 16V CAVALIER CALIBRA TURBO BLUE 10,53.82,GB,,.80,3,ASTRA 16V CAVALIER CALIBRA TURBO BLUE 10MM 4CORE IGNITION HT LEADS WIRES MLR.CR,,sandra#online.de,parcel1,Invalid Request,,%004865315500320004648880276,INTL,%004865315500320004648880276,1,INTL,DPD,180380,INTL,2.30,Send A2B Ltd,4th Floor,200 Gray’s Inn Road,LONDON,,WC1X8XZ,GBR,
I think the problem is in the field "200 Gray’s Inn Road", because the "’" character is stored as a single 0x92 byte, which is not a valid UTF-8 sequence.
Does anybody know how I can handle this?
Thanks in advance,
Sandra
Find out the actual encoding first; the best option would be to ask the sender.
If you cannot do so, and also for sanity-checking, the Unix command file is very useful for that (its man page shows more options).
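For example, a sketch assuming the file is named input.csv:
file --mime-encoding input.csv   # prints e.g. "input.csv: iso-8859-1" (or "unknown-8bit" if undetermined)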
Next step, convert to UTF-8.
As it is obviously an ASCII-based encoding, you could also just discard all non-ASCII bytes, or replace them during conversion, if that loss is acceptable.
As an alternative, open it in the editor of your choice and flip the encoding used for interpreting the data until you get something useful. My guess is you'll have either Latin-1 or Windows-1252, but check it for yourself.
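If that guess holds, the conversion is a one-liner with iconv (filenames hypothetical):
iconv -f WINDOWS-1252 -t UTF-8 input.csv > input.utf8.csv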
Last step, do what you wanted to do, in comforting knowledge that you now have valid UTF-8.
Obviously, don't pretend it's UTF-8 if it isn't. Find out what the encoding is, or replace all non-ASCII characters with the UTF-8 REPLACEMENT CHARACTER sequence 0xEF 0xBF 0xBD.
Since you are able to view this particular sample just fine, you apparently already know which encoding it is (even if you don't know that you know -- it would be whatever your current set-up is using) -- I would guess Windows-1252 which uses 0x92 for a curvy right single quote.
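You can confirm that directly; a sketch assuming GNU grep, bash's $'...' quoting, and a hypothetical file name:
grep -abo $'\x92' input.csv | head   # byte offsets of each raw 0x92 byte in the file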
I have a file in Notepad, saved with ANSI encoding, with two URLs:
http://www.odinklik.ru/site.aspx?site=korney_chukovsky
http://www.odinklik.ru/site.aspx?site=korney_chukovsky
As you can see from the paste, one of them looks "weird". When I copy it into the browser, it is changed to:
"http://www.odinklik.ru/site%E2%80%8B.aspx?%E2%80%8Bsite=korney_%E2%80%8Bchukovsky"
What is happening here?
The byte sequence E2 80 8B is the UTF-8 encoding of the character ZERO WIDTH SPACE (U+200B), but this is not a character that exists in any ANSI character set.
Either you have some other obscure spacing character that is translated into a zero-width space when you copy the text, or the file is actually not saved as ANSI after all.
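One way to make such invisible characters show up, sketched with GNU tools and a hypothetical file name urls.txt:
grep -aobP '\xE2\x80\x8B' urls.txt    # byte offsets of each zero-width space
sed -i 's/\xE2\x80\x8B//g' urls.txt   # strip them in place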
I've got a program that in a nutshell reads values from a SQL database and writes them to a tab-delimited text file.
The issue is that some of the values in the database contain special characters (TM, dash, ellipsis, etc.). When written to the text file, these characters are mangled and come across as junk like "™" or "–".
When a value is viewed in the Immediate window, before it is written to the txt file, everything looks fine. My guess is that this is an encoding issue, but I'm not really sure how to proceed, where to look, or what to look for.
Is this ASCII or UTF-8? If it's one of those, how do I correct it before it's written to the text file?
Here's how I build the text file (where feedStr is a StringBuilder):
objReader = New StreamWriter(filePath)
objReader.Write(feedStr)
objReader.Close()
The default encoding for StreamWriter is UTF-8 (with no byte order mark). Your result file is fine; the question is what you open it with afterwards. If you open it in a UTF-8-capable text editor, the characters should look the way you want.
You can also write the text file in another encoding, for example ISO-8859-1 (Latin-1):
objReader = New StreamWriter(filePath, false, Encoding.GetEncoding("iso-8859-1"))
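Conversely, if the consumer needs a byte order mark to detect UTF-8 (older Notepad versions, for example), a sketch using UTF8Encoding with the BOM flag set:
objReader = New StreamWriter(filePath, False, New UTF8Encoding(True)) ' True = emit the UTF-8 byte order mark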