Get a unicode string from a QString in python - qt

I am trying to get a unicode string out of a QString with PySide:
In [63]: qs = QString("órgão")
In [64]: qs
Out[64]: PyQt4.QtCore.QString(u'\xc3\xb3rg\xc3\xa3o')
In [65]: print(unicode(qs))
Ã³rgÃ£o
But it looks like the string comes out different than the original. Why?

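I asked a silly question again. It should be
QString(u"órgão")
and then everything works. In Python 2, a plain "órgão" literal typed in a UTF-8 terminal is a UTF-8 byte string, and QString decodes those bytes as Latin-1 by default, so each accented letter turns into two characters.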
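The effect can be reproduced without Qt. The following is a minimal Python 3 sketch, assuming the terminal supplied UTF-8 bytes and that they were interpreted as Latin-1; the variable names are only illustrative.

```python
original = "órgão"

# What a Python 2 byte-string literal held in a UTF-8 terminal.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Latin-1 turns each accented letter into two characters.
misread = utf8_bytes.decode("latin-1")

print(repr(utf8_bytes))
print(misread)
```

Passing a unicode literal (u"órgão") avoids the byte-level detour entirely, which is why the corrected call behaves as expected.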

Related

How to convert emojis to Unicode in Xamarin.Forms?

I have a Xamarin.Forms project with a text box and a button that takes the text from the text box and passes it to an API to be stored. When the user selects an emoji from the keyboard, I want to get the Unicode code of that emoji. Currently, when I check the Text property, I get the emoji itself.
I want the Unicode value rather than the emoji given in NewTextValue from the Text property.
This post asks the same thing, but I don't understand how the author managed it. POST
Please suggest a solution.
After some searching, I tried the following:
string res = BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes(str)).Replace("-", "");
The result is res = D83DDE00
I don't know whether this is Unicode or not.
How can I convert it back to the original emoji, or is there another way to convert to Unicode?
You need to convert it back manually. Insert "-" after every two characters, then parse each pair as a hexadecimal byte:
var convertStr = string.Join("-", Regex.Matches(res, @"..").Cast<Match>().ToList());
string[] tempArr = convertStr.Split('-');
byte[] decBytes = new byte[tempArr.Length];
for (int i = 0; i < tempArr.Length; i++)
{
    // Each element is a two-digit hex string such as "D8".
    decBytes[i] = Convert.ToByte(tempArr[i], 16);
}
string str = Encoding.BigEndianUnicode.GetString(decBytes);
Moreover, in my test Encoding.UTF32.GetBytes() gives bytes that are closer to the emoji's code point. You can test it with \U0001F600, the grinning face emoji: after converting with UTF-32, the bytes are those of the code point, only in reversed (little-endian) order.
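The same round trip can be sketched in Python, assuming the hex string from the question (D83DDE00) is UTF-16 big-endian. This is only an illustration of the encodings involved, not Xamarin code, and the variable names are made up for the example.

```python
emoji = "\U0001F600"

# UTF-16 big-endian: the emoji becomes a surrogate pair, as in the question.
hex_utf16 = emoji.encode("utf-16-be").hex().upper()

# Decoding the hex string restores the original emoji.
restored = bytes.fromhex(hex_utf16).decode("utf-16-be")

# The code point itself, written the usual way.
code_point = "U+{:04X}".format(ord(emoji))

# UTF-32 little-endian: the code point's bytes in reversed order.
hex_utf32_le = emoji.encode("utf-32-le").hex().upper()

print(hex_utf16, code_point, hex_utf32_le)
```

If the API should store something readable, the code point form (U+1F600) is usually easier to work with than raw UTF-16 bytes, because it does not depend on byte order or surrogate pairs.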

How do I decode this string? \xc3\x99\xc3\xa9\xc2\x87-B[x\xc2

This is what I need to decode:
\xc3\x99\xc3\x99\xc3\xa9\xc2\x87-B[x\xc2\x99\xc2\xbe\xc3\xa6\x14Ez\xc2\xab
It is generated by String.fromCharCode(arrayPw[i]);
but I don't understand how to decode it :(
Please help.
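Python:
# The escapes are UTF-8 bytes, so use a bytes literal (required on Python 3).
data = b"\xc3\x99\xc3\x99\xc3\xa9\xc2\x87-B[x\xc2\x99\xc2\xbe\xc3\xa6\x14Ez\xc2\xab"
udata = data.decode("utf-8")  # decoded text
asciidata = udata.encode("ascii", "ignore")  # drops every non-ASCII character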
JavaScript:
function decode_utf8(s) {
    return decodeURIComponent(escape(s));
}
Otherwise, do more research about decoding UTF-8:
https://gist.github.com/chrisveness/bcb00eb717e6382c5608
There's also an online UTF-8 decoder/encoder:
https://mothereff.in/utf-8
HINT: ÙÙé-B[x¾æEz«
Duplicate of this: https://stackoverflow.com/a/70815136/5902698
You load a dataset and find some strange characters.
Example:
'戴森美å�‘é€\xa0型器完整版套装Dyson Airwrap
HS01(铜金色礼盒版)'
In my case, I know that the strange characters should be Chinese. So I can deduce that whoever sent me the data encoded it in UTF-8, but it was then read as 'ISO-8859-1'.
So as a first step I encode the string with ISO-8859-1, then I decode it with UTF-8.
My lines are:
_encoding = 'ISO-8859-1'
_my_str.encode(_encoding, 'ignore').decode("utf-8", 'ignore')
Then my output is :
"'森Dyson Airwrap HS01礼'"
This works for me, but I don't fully understand what happens under the hood, so feel free to tell me if you have further information.
Bonus: I'll try to detect when the string is in the first strange format, because some of my entries are in Chinese but others are in English.
EDIT: The bonus is useless. I just use a lambda on my column to encode and decode without caring about the format. So I changed the encoding after loading the dataframe:
_encoding = 'ISO-8859-1'
_decoding = "utf-8"
df[col] = df[col].apply(lambda x : x.encode(_encoding, 'ignore').decode(_decoding , 'ignore'))
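A possible explanation for the characters missing from the output above, offered as a sketch under one assumption: the garbled text looks as if the UTF-8 bytes were read as Windows-1252 (cp1252) rather than strict ISO-8859-1. In that case some garbled characters have no ISO-8859-1 byte, 'ignore' drops them, and the damaged UTF-8 sequences are dropped in turn. The variable names are illustrative.

```python
original = "戴森"

# Simulate the damage: UTF-8 bytes read as cp1252.
garbled = original.encode("utf-8").decode("cp1252")

# The approach from the answer: silently loses the first character.
lossy = garbled.encode("ISO-8859-1", "ignore").decode("utf-8", "ignore")

# Reversing with the same codec that caused the damage keeps everything.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled, lossy, repaired)
```

Dropping the 'ignore' arguments while experimenting makes the failing characters visible as exceptions, which helps to identify the codec that was actually involved. Strings that already contain replacement characters cannot be fully recovered this way.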

Qt, QUrl, QUrlQuery: Encoding special character in a query string

I create a URL query like this:
QString normalize(QString text)
{
    text.replace("%", "%25");
    text.replace("@", "%40");
    text.replace("‘", "%27");
    text.replace("&", "%26");
    text.replace("“", "%22");
    text.replace("’", "%27");
    text.replace(",", "%2C");
    text.replace(" ", "%20");
    return text;
}
QString key = "usermail";
QString value = "aemail@gmail.com";
QUrlQuery qurlqr;
qurlqr.addQueryItem(normalize(key), normalize(value));
QString result = qurlqr.toString();
The result I am expecting is:
usermail=aemail%40gmail.com
But I received:
usermail=aemail@gmail.com
I don't know why. Can you help me?
(I'm using Qt5 on Win7.)
QUrlQuery's toString decodes the percent-encoding by default. If you want the encoded version, try:
qurlqr.toString(QUrl::FullyEncoded)
Also, you don't need to encode the string manually by replacing characters; you could use QUrl::toEncoded() instead (I suggest reading the QUrlQuery documentation).
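The difference between the encoded and decoded forms of a query can be illustrated outside Qt. This is a small Python sketch using the standard library's urllib.parse, shown only as an analogy to QUrlQuery; the %40 in the expected result is the percent-encoding of "@".

```python
from urllib.parse import parse_qsl, urlencode

params = {"usermail": "aemail@gmail.com"}

# Fully encoded form: reserved characters are percent-encoded by the library.
encoded = urlencode(params)

# Decoded form: what a "pretty" representation shows again.
decoded = parse_qsl(encoded)

print(encoded)
print(decoded)
```

As in the Qt answer, letting the library do the encoding avoids maintaining a hand-written replacement table and the risk of encoding a value twice.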

Replace part of string

I'm having difficulty with the following.
In VB.Net, I have the following line:
Dim intWidgetID As Integer = CType(Replace(strWidget, "portlet_", ""), Integer)
where strWidget = portlet_n
where n can be any whole number, i.e.
portlet_5
I am trying to convert this code to C#, but I keep getting errors. I currently have this:
intTabID = Convert.ToInt32(Strings.Replace(strTabID, "tab_group_", ""));
which I got from an online converter,
but the compiler doesn't like Strings.
So my question is: how do I replace part of a string so that intTabID becomes 5 in this example?
I've searched for this and found this link:
C# Replace part of a string
Can this be done without regular expressions in C#? Basically, I'm trying to produce code as similar as possible to the original VB.Net example.
It should be like this: strTabID.Replace("tab_group_", string.Empty);
int intTabID = 0;
string value = strTabID.Replace("tab_group_", string.Empty);
int.TryParse(value, out intTabID);
if (intTabID > 0)
{
    // intTabID now holds the parsed number
}
And in your code I think you need to replace "tab_group_" with "portlet_".
Instead of Strings.Replace(strTabID, "tab_group_", ""), use strTabID.Replace("tab_group_", "").
This should work:
int intWidgetID = int.Parse(strTabID.Replace("tab_group_", "")); // Can also use TryParse
There is no Strings class in C# (it belongs to the Microsoft.VisualBasic namespace), so please use the string class instead: http://msdn.microsoft.com/en-us/library/aa903372(v=vs.71).aspx
You can also achieve it this way:
string strWidget = "portlet_5"; // portlet_n, where n is a whole number
int intWidgetID = Convert.ToInt32(strWidget.Split('_')[1]);
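The idea shared by these answers (strip a known prefix, then parse what is left, without regular expressions) can be sketched in a language-neutral way. The Python helper below is hypothetical, including its name parse_suffix_id; it returns None instead of throwing when the remainder is not a number, much like the TryParse variant.

```python
from typing import Optional


def parse_suffix_id(value: str, prefix: str) -> Optional[int]:
    """Return the integer that follows `prefix`, or None if there is none."""
    if not value.startswith(prefix):
        return None
    remainder = value[len(prefix):]
    # Accept plain ASCII digits only; this rejects "", "n", "-5" and "5a".
    if remainder.isascii() and remainder.isdigit():
        return int(remainder)
    return None


print(parse_suffix_id("portlet_5", "portlet_"))
print(parse_suffix_id("portlet_n", "portlet_"))
```

Checking the prefix explicitly, rather than replacing it anywhere in the string, also guards against inputs where the same text appears in the middle of the value.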

How to correctly uppercase Greek words in .NET?

We have an ASP.NET application that runs for different clients around the world. In this application we have a dictionary for each language. The dictionary holds words in lowercase, and sometimes we uppercase them in code for typographic reasons.
var greek= new CultureInfo("el-GR");
string grrr = "Πόλη";
string GRRR = grrr.ToUpper(greek); // "ΠΌΛΗ"
The problem is:
...if you're using capital letters then they must appear like this: e.g. ΠΟΛΗ and not like ΠΌΛΗ, same for all other words written in capital letters
So is it possible to uppercase Greek words correctly in a generic way in .NET? Or should I write my own custom algorithm for Greek uppercase?
How do they solve this problem in Greece?
I suspect that you're going to have to write your own method if el-GR doesn't do what you want. I don't think you need to go to the full length of creating a custom CultureInfo if this is all you need, which is good, because that looks quite fiddly.
What I do suggest is that you read this Michael Kaplan blog post and anything else relevant you can find by him. He's been working on and writing about i18n and language issues for years and years, and his commentary is my first port of call for any such issues on Windows.
I don't know much about ASP.Net but I know how I'd do this in Java.
If the characters are Unicode, I would just post-process the output from ToUpper with some simple substitutions, one being the conversion of \u038C (Ό) to \u039F (Ο) or \u0386 (Ά) to \u0391 (Α).
From the looks of the Greek/Coptic code chart (\u0370 through \u03ff), there are only a few characters (6 or 7) you'll need to change.
Check out How do I remove diacritics (accents) from a string in .NET?
How about replacing the wrong characters with the right ones:
/// <summary>
/// Returns the string to uppercase using Greek uppercase rules.
/// </summary>
/// <param name="source">The string that will be converted to uppercase</param>
// Note: as an extension method, this must be declared inside a static class.
public static string ToUpperGreek(this string source)
{
    // Accented capitals mapped to their unaccented forms.
    Dictionary<char, char> mappings = new Dictionary<char, char>(){
        {'Ά','Α'}, {'Έ','Ε'}, {'Ή','Η'}, {'Ί','Ι'}, {'Ό','Ο'}, {'Ύ','Υ'}, {'Ώ','Ω'}
    };
    source = source.ToUpper();
    char[] result = new char[source.Length];
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = mappings.ContainsKey(source[i]) ? mappings[source[i]] : source[i];
    }
    return new string(result);
}
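A different technique from the mapping table above, sketched in Python under the assumption that only the tonos (acute accent) should disappear: decompose the uppercased text with Unicode normalization, drop the combining acute accent, and recompose. The function name to_upper_greek is made up for this example.

```python
import unicodedata

# Combining acute accent; the Greek tonos decomposes to this character.
TONOS = "\u0301"


def to_upper_greek(text: str) -> str:
    """Uppercase text and remove the tonos from accented capitals."""
    decomposed = unicodedata.normalize("NFD", text.upper())
    stripped = "".join(ch for ch in decomposed if ch != TONOS)
    # Recompose so that other marks, such as the dialytika, stay attached.
    return unicodedata.normalize("NFC", stripped)


print(to_upper_greek("Πόλη"))
```

One caveat of this sketch: it removes acute accents from every script, so mixed text such as "café" would lose its accent as well. The explicit mapping table avoids that at the cost of listing each Greek letter by hand.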
