What is the correct way to represent Unicode code points in string literals in Memgraph? Does Memgraph support UTF-16 or UTF-32 code points?
Memgraph supports both UTF-16 and UTF-32 code points.
In string literals, use \u followed by 4 hex digits for a UTF-16 code point and \U followed by 8 hex digits for a UTF-32 code point.
I just wanted to know which Unicode blocks can be used safely when limited to single-byte code points only.
So, which is the last single-byte codepoint, and which is the first multi-byte codepoint?
In UTF-8, the last single-byte code point is U+007F, and the first 2-byte code point is U+0080.
See https://en.wikipedia.org/wiki/UTF-8#Encoding
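The boundary can be checked directly by encoding individual code points. This is a minimal Python sketch; the helper name `utf8_length` is made up for illustration and is not part of any library.

```python
def utf8_length(code_point):
    """Return how many bytes UTF-8 needs to encode one code point."""
    return len(chr(code_point).encode("utf-8"))


# Code points on either side of each UTF-8 length boundary.
for cp in (0x007F, 0x0080, 0x07FF, 0x0800, 0xFFFF, 0x10000):
    print("U+{:04X}: {} byte(s)".format(cp, utf8_length(cp)))
```

Only U+0000 through U+007F (the ASCII range, i.e. the Basic Latin block) fit in a single byte.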
Running R CMD check --as-cran gives
Portable packages must use only ASCII characters in their R code,
except perhaps in comments.
Use \uxxxx escapes for other characters.
What are \uxxxx escapes, and more importantly, how can I convert non-ASCII characters into them?
What I know so far
?iconv is very informative and looks powerful, but I see nothing of the form \u in it.
The Python documentation describes \uxxxx as:
Character with 16-bit hex value xxxx (Unicode only)
Question
How can I convert non-ASCII characters into character representations of the form \uxxxx?
Some examples: c("¤", "£", "€", "¢", "¥", "₧", "ƒ")
You can use stri_escape_unicode from the stringi package to escape Unicode characters:
stringi::stri_escape_unicode(c("¤", "£", "€", "¢", "¥", "₧", "ƒ"))
## [1] "\\u00a4" "\\u00a3" "\\u20ac" "\\u00a2" "\\u00a5" "P" "\\u0192"
I have an addin based on that function which replaces non-ASCII characters between quotes in your code; it is available here: https://github.com/dreamRs/prefixer
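To make the mapping itself concrete: a \uxxxx escape is just the character's code point written as four hex digits. The sketch below is in Python rather than R and is only an illustration of that mapping, not a replacement for stri_escape_unicode; the function name `escape_non_ascii` is hypothetical.

```python
def escape_non_ascii(text):
    """Replace every non-ASCII character with a \\uxxxx or \\Uxxxxxxxx escape."""
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is kept as-is.
            pieces.append(ch)
        elif cp <= 0xFFFF:
            # Fits in 4 hex digits.
            pieces.append("\\u{:04x}".format(cp))
        else:
            # Code points above U+FFFF need the 8-digit form.
            pieces.append("\\U{:08x}".format(cp))
    return "".join(pieces)


for symbol in ["¤", "£", "€", "¢", "¥", "₧", "ƒ"]:
    print(symbol, escape_non_ascii(symbol))
```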
How do I know exactly which characters can appear in the encrypted output produced by jasypt? Can I make sure that the output never contains certain characters, or can any ASCII character appear?
The reason I am asking is that the encrypted text is part of a delimited file, and I want to make sure the delimiter never occurs inside the encrypted text. The delimiter should also not be a non-printable character such as SOH, because the file may be edited manually.
"Base64 only uses 6 bits (corresponding to 2^6 = 64 characters) to ensure encoded data is printable and humanly readable. None of the special characters available in ASCII are used. The 64 characters (hence the name Base64) are 10 digits, 26 lowercase characters, 26 uppercase characters as well as '+' and '/'."
So it looks like I can use an ASCII special character other than '+' and '/' as a delimiter.
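A candidate delimiter can be checked against the Base64 alphabet before committing to it. The following Python sketch assumes the encrypted text is standard Base64, as the quote above suggests; it does not call jasypt, and `is_safe_delimiter` is a hypothetical helper. Note that standard Base64 also uses '=' as a padding character, so '=' is excluded as well.

```python
import base64
import os
import string

# Standard Base64 alphabet plus the '=' padding character.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def is_safe_delimiter(delimiter):
    """True if no character of the delimiter can occur in Base64 text."""
    return all(ch not in BASE64_ALPHABET for ch in delimiter)


# Stand-in for an encrypted value: random bytes encoded as Base64.
sample = base64.b64encode(os.urandom(64)).decode("ascii")

for candidate in ["|", ";", "+", "="]:
    print(repr(candidate), is_safe_delimiter(candidate), candidate in sample)
```

Printable characters such as '|' or ';' pass this check and remain easy to see when the file is edited by hand.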
Lots of European authors have non-ASCII characters in their names, such as Å, Æ, ø and Ä. How can they use their actual names with these characters, rather than a version transliterated into the English alphabet (Å -> A, Æ -> A, Ä -> A), when creating an R package? In short, how can I use Unicode characters in the author/creator/maintainer name when creating an R package?
There is a note about this here:
http://r-pkgs.had.co.nz/check.html
I quote:
If you use any non-ASCII characters in the DESCRIPTION, you must also specify an encoding. There are only three encodings that work on all platforms: latin1, latin2 and UTF-8. I strongly recommend UTF-8:
Encoding: UTF-8
I am trying to use the Unicode UTF-16 character set, but I am unsure how to do this. By default, when I select the Unicode character set, it uses UTF-8, which turns non-English characters (Spanish, Arabic, etc.) into ?. I am currently using Teradata 14.