CSV file (with special characters) upload encoding issue - servlets

I am trying to upload a CSV file that contains special characters using ServletFileUpload from Apache Commons FileUpload, but the special characters in the CSV are being stored as junk characters in the database. The special characters include the trademark (™) and registered (®) signs. Here is the code snippet:
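import java.io.InputStream;
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.apache.commons.fileupload.util.Streams;

// Inside the servlet's doPost(HttpServletRequest request, HttpServletResponse response):
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator iter = upload.getItemIterator(request);
while (iter.hasNext()) {
    FileItemStream item = iter.next();
    String name = item.getFieldName();
    InputStream stream = item.openStream();
    if (item.isFormField()) {
        // Decode the form field's bytes explicitly as UTF-8
        System.out.println("Form field " + name + " with value "
                + Streams.asString(stream, "UTF-8") + " detected.");
    }
    // Non-form items (the uploaded CSV file itself) are not processed in this snippet
}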
I have tried reading it with a BufferedReader, calling request.setCharacterEncoding("UTF-8"), calling upload.setHeaderEncoding("UTF-8"), and copying the stream with IOUtils.copy(), but none of these worked.
Please advise how to fix this and where it needs to be addressed. Is there anything I need to do beyond the servlet code?
Thanks

What database are you using, and what character set is it using? The characters may be getting corrupted in the database rather than in the Java code.
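Junk characters like these usually mean that bytes written as UTF-8 were decoded with a different charset somewhere along the way (the servlet's reader, the database connection, or the column's character set). The Python sketch below is only an illustration: it assumes Windows-1252 (cp1252) as the mismatched charset and shows what ™ and ® turn into in that case. If the stored values look like this, every step of the chain most likely needs to use UTF-8 explicitly.

```python
def mojibake(text: str, wrong_charset: str = "cp1252") -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as another charset."""
    return text.encode("utf-8").decode(wrong_charset)

original = "Acme\u2122 Widget\u00ae"       # "Acme™ Widget®"
garbled = mojibake(original)
print(garbled)                             # Acmeâ„¢ WidgetÂ®

# The damage can be undone only while no bytes have been lost or replaced:
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)                # True
```

In Java, the corresponding step when reading the uploaded file would be wrapping the item's stream in `new InputStreamReader(stream, StandardCharsets.UTF_8)` rather than relying on the platform default charset.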

Related

Garbage characters after update of web.config using c# on China (Traditional) machine

I am updating the web.config file of an ASP.NET MVC application dynamically during installation, using an InstallShield script.
It works correctly on other machines, but on a Chinese (Traditional) machine it adds ??? characters at the start of the web.config file, like this:
???<?xml version="1.0" encoding="utf-8"?>
Please suggest how this problem can be solved.
Below is the InstallShield code.
Using InstallScript, I find the connection-string placeholder and replace it with the connection string generated during installation.
szIniFile = INSTALLDIR^"AppDir\\Web.config";
szSearchStr = "[COONECTIONSTRING]";   // placeholder text in Web.config
FindAndReplaceInFile(szIniFile, szSearchStr, strWebConString);

// Replaces the first occurrence of szSearchStr on each matching line of szFile with szReplaceStr.
function FindAndReplaceInFile(szFile, szSearchStr, szReplaceStr)
    STRING szReturnLine, szString, szSecPart, szFirstPart, svString, szArchive;
    NUMBER nResult, nSubPos, nSearchStrLen, nLineNumber;
begin
    nSearchStrLen = StrLength(szSearchStr);
    // Find the first line that contains the placeholder
    nResult = FileGrep(szFile, szSearchStr, szReturnLine, nLineNumber, RESTART);
    NumToStr(svString, nResult);
    while (nResult = 0)
        nSubPos = StrFind(szReturnLine, szSearchStr); // position of szSearchStr
        StrSub(szFirstPart, szReturnLine, 0, nSubPos);
        StrSub(szSecPart, szReturnLine, nSubPos + nSearchStrLen, StrLength(szReturnLine));
        szString = "";
        szString = szFirstPart + szReplaceStr + szSecPart;
        // Overwrite the matched line with the substituted text
        FileInsertLine(szFile, szString, nLineNumber, REPLACE);
        nLineNumber = nLineNumber + 1;
        nResult = FileGrep(szFile, szSearchStr, szReturnLine, nLineNumber, CONTINUE);
    endwhile;
end;

You had a byte order mark (BOM) at the start of the file.
I suspect that you opened a UTF-8 encoded file as a different encoding. That misread the unnecessary BOM and corrupted it; when the file was saved, unknown-character markers (the ??? you see) replaced the BOM.
To fix this, save your config as UTF-8 without a BOM. Edits should then be safe, unless your file contains other characters outside the ASCII range.
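If this explanation is right, the three question marks are the three BOM bytes (EF BB BF) after a round trip through an encoding that cannot represent them. The Python sketch below reproduces that effect; it uses ASCII with errors="replace" as a stand-in for whatever code page the installer actually used (an assumption), and then shows how to strip the BOM so the file is plain UTF-8.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"

def simulate_misread(data: bytes) -> bytes:
    """Decode with an encoding that doesn't understand the BOM, then write back."""
    text = data.decode("ascii", errors="replace")   # each BOM byte -> U+FFFD
    return text.encode("ascii", errors="replace")   # each U+FFFD -> b"?"

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data

original = BOM + b'<?xml version="1.0" encoding="utf-8"?>'
print(simulate_misread(original))  # b'???<?xml version="1.0" encoding="utf-8"?>'
print(strip_utf8_bom(original))    # b'<?xml version="1.0" encoding="utf-8"?>'
```

When working with text files in Python, reading with `encoding="utf-8-sig"` skips a BOM if present, and writing with `encoding="utf-8"` saves the file without one.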

Python: creating JSON with Plone objects

I have a JSON file built from Plone objects, and one field of those objects gives me an error:
UnicodeDecodeError('ascii', '{"id":"aluminio-prata", "nome":"ALUM\xc3\x8dNIO PRATA", "num_demaos":0, "rendimento": 0.0, "unidade":"litros", "url":"", "particular":[], "profissional":[], "unidades":[]},', 36, 37, 'ordinal not in range(128)') (Also, the following error occurred while attempting to render the standard error message, please see the event log for full details: 'NoneType' object has no attribute 'getMethodAliases')
I already know which field it is: the "title", from title = obj.pretty_title_or_id(). When I remove it from this line, it works:
json += '{"id":"' + str(id) + '", "nome":"' + title + '", "num_demaos":' + str(num_demaos) + ', "rendimento": ' + str(rendimento) + ', "unidade":"' + str(unidade) + '", "url":"' + link_produto + '", "particular":' + arr_area_particular + ', "profissional":' + arr_area_profissional + ', "unidades":' + json_qtd + '},'
but when I leave it in, I get the error shown above.

I'm going to assume that the error occurs when you're reading the JSON file.
Internally, Plone uses Python Unicode strings for nearly everything. If you read a string from a file, it will need to be decoded into Unicode before Plone can use it. If you give no instructions otherwise, Python will assume that the string was encoded as ASCII, and will attempt its Unicode conversion on that basis. It would be similar to writing:
unicode("ALUM\xc3\x8dNIO PRATA")
which will produce the same kind of error.
In fact, the string you're using was encoded as UTF-8. That's evident from the "\xc3" byte, and it also makes sense, because UTF-8 is the encoding Plone uses when it sends data to the outside world.
So, how do you fix this? You have to specify the character set that you wish to use when you convert to Unicode:
"ALUM\xc3\x8dNIO PRATA".decode('UTF8')
This gives you a Python Unicode string with no error.
So, after you've read your JSON file into a string (let's call it mystring), you will need to explicitly decode it by using mystring.decode('UTF8'). unicode(mystring, 'UTF8') is another form of the same operation.
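The answer above is written for Python 2, where byte strings have a .decode() method. As a sketch of the same idea in Python 3, the bytes are decoded with bytes.decode("utf-8"); the example also swaps the hand-built string concatenation from the question for json.dumps on a dict (a substitute technique, not the asker's code), so quoting and escaping are handled by the serializer. The field values are taken from the traceback.

```python
import json

raw_title = b"ALUM\xc3\x8dNIO PRATA"   # UTF-8 bytes, as shown in the traceback
title = raw_title.decode("utf-8")      # "ALUMÍNIO PRATA"

# Build each record as a dict and let json.dumps do the quoting and escaping,
# instead of concatenating strings by hand.
record = {
    "id": "aluminio-prata",
    "nome": title,
    "num_demaos": 0,
    "rendimento": 0.0,
    "unidade": "litros",
    "url": "",
    "particular": [],
    "profissional": [],
    "unidades": [],
}
print(json.dumps(record, ensure_ascii=False))
```

With the default `ensure_ascii=True`, `json.dumps` writes the accented letter as the escape `\u00cd`, which is also valid JSON.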

As Steve already wrote, use title.decode('utf8').
An example illustrates the facts (Python 2, with the source typed in UTF-8):
>>> u"Ä" == u"\xc4"
True # the native unicode char and escaped versions are the same
>>> "Ä" == u"\xc4"
False # the plain string "Ä" is the UTF-8 byte sequence '\xc3\x84', not unicode
>>> "Ä".decode('utf8') == u"\xc4"
True # one can decode the string to get unicode
>>> "Ä" == "\xc4"
False # the native character and the escaped string are
# of course not equal ('\xc3\x84' != '\xc4').
I found this thread very helpful for understanding problems with encoding and decoding UTF-8.
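For comparison, here is a sketch of the same checks in Python 3, where the distinction is made explicit by the type system: str is always Unicode text and encoded data is a separate bytes type, so the str-versus-byte-string confusion from the Python 2 session cannot arise silently.

```python
a_umlaut = "\u00c4"                        # "Ä" as a str
utf8_bytes = a_umlaut.encode("utf-8")      # its UTF-8 encoding

print("\u00c4" == "\xc4")                  # True: two spellings of the same str
print(utf8_bytes)                          # b'\xc3\x84'
print(utf8_bytes == "\xc4")                # False: bytes never compare equal to str
print(utf8_bytes.decode("utf-8") == "\xc4")  # True: decoding restores the text
```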

Looping through XMLReader to replace special characters in data field

I have XML files that I want to load into DataSets and export to a database using VB.NET. New XML files are added to this list daily, and some of them may contain special characters (I don't know why anyone would put "&" in an address entry anyway). After creating the XmlReader, what is the easiest way to replace these characters with their escaped form? What would the pseudocode look like? Should I use a StreamReader, or can that work together with an XmlReader?
Here is my current code that attempts to create the DataSet:
' ds, FN and xmlFile are declared elsewhere
For Each file1 In Directory.GetFiles(My.Settings.Local_Meter_Path, "*BadMeter*.xml")
    Dim filecreatedate As String = IO.File.GetLastWriteTime(file1).ToString()
    FN = Path.GetFileName(file1)
    xmlFile = XmlReader.Create(Path.Combine(My.Settings.Local_Meter_Path, FN), New XmlReaderSettings())
    ' ReadXml throws an XmlException here when the file contains a bare "&"
    ds.ReadXml(xmlFile)
    xmlFile.Close()
Next
and here is the spot where I get the entity-name parsing error caused by the ampersand:
<Cell ss:StyleID="Default"><Data ss:Type="String">1440 COUNTY ROAD 40 X-MAS LIGHT & RV #2 CAMP HILL</Data></Cell>
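One common workaround is to pre-process the raw text before it reaches the XML parser, replacing every "&" that does not start a valid entity or character reference with "&amp;". The Python sketch below shows the idea; the regular expression is an assumption that covers named references (&amp;, &lt;) and numeric ones (&#38;, &#x26;), and it does not special-case CDATA sections or comments. The ss namespace URI is the SpreadsheetML one, added only so the fragment parses on its own.

```python
import re
import xml.etree.ElementTree as ET

# Matches an "&" that is NOT the start of "&name;", "&#123;" or "&#x1F;".
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(xml_text: str) -> str:
    """Replace stray '&' characters with '&amp;' so the text parses as XML."""
    return BARE_AMP.sub("&amp;", xml_text)

fragment = (
    '<Row xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
    '<Cell ss:StyleID="Default"><Data ss:Type="String">'
    "1440 COUNTY ROAD 40 X-MAS LIGHT & RV #2 CAMP HILL"
    "</Data></Cell></Row>"
)
root = ET.fromstring(escape_bare_ampersands(fragment))
print(root.find(".//Data").text)  # 1440 COUNTY ROAD 40 X-MAS LIGHT & RV #2 CAMP HILL
```

The same approach could be applied in VB.NET by reading the file into a string, running Regex.Replace over it, and passing a StringReader to XmlReader.Create. The more robust fix is for whatever produces the files to emit well-formed XML in the first place.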

issue related to space after link

I am generating a link using the code below:
string EncryptPath = Common.Encrypt(Path);
string SourceLinkPath = string.Empty;
if (File.Exists(Server.MapPath("Image.txt")))
{
    SourceLinkPath = System.IO.File.ReadAllText(Server.MapPath("Image.txt"));
}
string link2 = SourceLinkPath + EncryptPath;
TxtPathLink2.Text = link2;
The link is generated, but there is a line break after the source path. The output looks like this:
http://18.10.10.11/test/View.aspx?Value=
67534ERT
I want to generate http://18.10.10.11/test/View.aspx?Value=67534ERT instead.
How can I generate the link on one line?

The .txt file probably contains trailing whitespace (for example, a newline at the end of the file) that you are missing.
Change System.IO.File.ReadAllText(Server.MapPath ("Image.txt"))
To:
System.IO.File.ReadAllText(Server.MapPath("Image.txt")).Trim()
String.Trim() removes all leading and trailing white-space characters from the String object.
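The same fix, sketched in Python for illustration: str.strip() plays the role of .NET's String.Trim(). The file content below is an assumption (a base URL followed by a Windows line ending, which many editors add at the end of a file), and build_link is a hypothetical helper.

```python
def build_link(source_path: str, encrypted: str) -> str:
    """Join the base URL read from a file with the encrypted value."""
    # strip() removes leading and trailing whitespace, including "\r\n".
    return source_path.strip() + encrypted

file_contents = "http://18.10.10.11/test/View.aspx?Value=\r\n"  # assumed file content
print(build_link(file_contents, "67534ERT"))
# http://18.10.10.11/test/View.aspx?Value=67534ERT
```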

Newtonsoft.JSON error with text with HTML encoded character

I accept user input from TinyMCE on the client, store it as a JSON string, and then pass it to the server (ASP.NET, C#).
The JSON string looks like this: { "mcfn2" : ";lt;p;gt;Trước đ& oacute;, việc tung ra t& ecirc;n miền lần đầu ti& ecirc;n được sự đồng & yacute; của ICANN - tổ chức quản l& yacute; t& ecirc;n miền quốc tế" } (the JSON string contains Vietnamese accented characters)
But when it is processed on the server, I get the error "Unterminated string. Expected delimiter: ". Line 1, position ...." (it looks like the error is caused by đ& oacute;). (In this post I separated each & from the character after it with a space, because otherwise the page automatically converts it to a Vietnamese character.)
There is no error if the user input is English text (no Vietnamese accents).
Please advise how to fix this error.

I know at this time this will probably not be useful for you, but maybe it can help another person.
You should convert your string to UTF-8 to handle accented characters (Vietnamese and many other languages) before serializing it to JSON. You can use this function for that:
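using System.Text;

private string ConvertToUtf8(string textOriginal)
{
    if (!string.IsNullOrEmpty(textOriginal))
    {
        // Get the string's bytes using Encoding.Default (the system ANSI code page
        // on .NET Framework), then reinterpret those bytes as UTF-8.
        byte[] bytes = Encoding.Default.GetBytes(textOriginal);
        return Encoding.UTF8.GetString(bytes);
    }
    return string.Empty;
}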
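A different approach, swapped in here and not the method of the answer above: if the payload is TinyMCE HTML with named entities such as &oacute;, decode the entities into real characters and let a JSON serializer produce the string rather than assembling it by hand. The Python sketch below assumes that input shape (the sample text is a shortened version of the question's); in C#, System.Net.WebUtility.HtmlDecode and Newtonsoft's JsonConvert.SerializeObject would play the corresponding roles.

```python
import html
import json

# Assumed TinyMCE output: HTML with named entities for some accented letters.
editor_html = "&lt;p&gt;Trước đ&oacute;, việc tung ra t&ecirc;n miền"

# Decode the HTML entities into real Unicode characters...
text = html.unescape(editor_html)
# ...then let the JSON serializer handle all quoting and escaping.
payload = json.dumps({"mcfn2": text}, ensure_ascii=False)

print(text)  # <p>Trước đó, việc tung ra tên miền
```

Because `json.dumps` escapes quotes and backslashes itself, no "&" or "\"" in the user's text can terminate the JSON string early.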
