PyYAML HEX numbers to python variables - hex

Hi everybody!
I need to write some values in my YAML files in hex, like:
send: 0x740
When I call yaml.load in my Python script, all my hex numbers are automatically converted to decimal (1856 instead of 0x740).
How can I avoid this conversion to decimal?
Thanks, and sorry for my English.
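For context, an unquoted 0x740 is resolved as an integer when PyYAML loads it, so the hex notation itself is not stored anywhere; writing the value quoted in the YAML file (send: '0x740') should give a str instead. The sketch below uses only the standard library and assumes the value has already been loaded as the int 1856. It shows how to get the hex form back for printing; the variable names are made up for illustration.

```python
# The loaded value is a plain int; 0x740 and 1856 are the same number.
loaded_value = 1856
assert loaded_value == 0x740

# Format the int back to hex text when it needs to be displayed.
as_hex = hex(loaded_value)             # lowercase digits, "0x" prefix
as_padded_hex = f"{loaded_value:#06x}"  # zero-padded to 4 hex digits

# If the YAML value were quoted and loaded as a string, it could be
# converted to an int explicitly when needed.
from_string = int("0x740", 16)

print(as_hex, as_padded_hex, from_string)
```

Keeping the value as an int and formatting it only on output avoids having to carry strings through any arithmetic.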

Related

What is the format called when there is a bunch of numbers between backslashes?

I'm currently reading through some code right now, and this stuff keeps appearing. How would I decode this, and what is it called?
\108\111\97\100\40\34\92\50\55\92\55\54\92\49\49\55\92\57\55\92\56\50\92\48\92\49\92\52\92\52\92\52\92\56\92\48\92\50\53\92\49\52\55\92\49\51\92\49\48\92\50\54
It's (probably) unicode ASCII characters, represented as escape sequences.
\108\111 is lo, for example.
https://en.wikipedia.org/wiki/List_of_Unicode_characters#Basic_Latin
It's byte data encoded as \-separated base-10 ints. This is not a standard thing -- some kind of CTF exercise? It looks like someone took a file, encoded it into a string inside some source code, and then encoded the source code itself the same way.
>>> code = r"108\111\97\100\40\34\92\50\55\92\55\54\92\49\49\55\92\57\55\92\56\50\92\48\92\49\92\52\92\52\92\52\92\56\92\48\92\50\53\92\49\52\55\92\49\51\92\49\48\92\50\54"
>>> print(''.join(chr(int(n)) for n in code.split("\\")))
load("\27\76\117\97\82\0\1\4\4\4\8\0\25\147\13\10\26
>>> code = r"27\76\117\97\82\0\1\4\4\4\8\0\25\147\13\10\26"
>>> print(''.join(chr(int(n)) for n in code.split("\\")))
←LuaR ☺♦ ↓“
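The LuaR near the start of the decoded data is apparently the header of a compiled Lua file: an ESC byte (27) followed by Lua is the signature of precompiled Lua chunks, and R (82 = 0x52) is the version byte for Lua 5.2.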
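Since the inner payload is binary, decoding to bytes rather than to a str avoids the garbled console output. A minimal sketch, assuming every byte in the input is written as a backslash followed by a decimal number (the function name is hypothetical):

```python
def decode_decimal_escapes(text: str) -> bytes:
    """Turn a string like r"\108\111" into the bytes it encodes."""
    # Splitting on the backslash leaves an empty first item when the
    # text starts with a backslash, so empty parts are skipped.
    parts = [part for part in text.split("\\") if part]
    return bytes(int(part) for part in parts)

outer = decode_decimal_escapes(r"\108\111\97\100\40\34")
inner = decode_decimal_escapes(
    r"\27\76\117\97\82\0\1\4\4\4\8\0\25\147\13\10\26"
)

print(outer)      # start of the decoded source text
print(inner[:5])  # signature bytes plus the version byte
```

Checking inner.startswith(b"\x1bLua") is a quick way to confirm that the payload looks like a compiled Lua chunk.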

How can I use QSettings to write UTF-8 characters into [section] and [name] of *.ini file properly?

My code snippet is here:
QSettings setting("xxx.ini", QSettings::Format::IniFormat);
setting.setIniCodec(QTextCodec::codecForName("UTF-8"));
setting.beginGroup(u8"运动控制器");
setting.setValue(u8"运动控制器", u8"运动控制器");
setting.endGroup();
But what is written looks like this:
[%U8FD0%U52A8%U63A7%U5236%U5668]
%U8FD0%U52A8%U63A7%U5236%U5668=运动控制器
So it seems I set the encoding correctly (at least partly), but what should I do so that the section and key names are written as text instead of these percent-sign codes?
Environment is Qt 5.12.11 and Visual Studio 2019
Unfortunately, this is hard-coded behavior in QSettings that you simply cannot change.
In section and key names, Unicode characters <= U+00FF (other than a..z, A..Z, 0..9, _, -, or .) are encoded in %XX hex format, and higher characters are encoded in %UXXXX format. The codec specified in setIniCodec() has no effect on this behavior.
Key values are written in the specified codec, in this case UTF-8.
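The escaping rule described above can be imitated in a few lines. This is only a sketch of the rule as stated in this answer, not Qt's actual code; the function name is hypothetical, and special handling of the / group separator is left out. It works on UTF-16 code units on the assumption that this matches how Qt strings are stored.

```python
import string

# Characters that the answer says are written unescaped.
SAFE_CHARS = set(string.ascii_letters + string.digits + "_-.")

def escape_ini_key(name: str) -> str:
    """Escape a section or key name following the rule described above."""
    data = name.encode("utf-16-be")
    out = []
    # Walk over the 16-bit code units of the name.
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        ch = chr(unit)
        if ch in SAFE_CHARS:
            out.append(ch)
        elif unit <= 0xFF:
            out.append(f"%{unit:02X}")   # two hex digits
        else:
            out.append(f"%U{unit:04X}")  # four hex digits
    return "".join(out)

print(escape_ini_key("运动控制器"))
```

For the group name from the question this reproduces the text that ended up in the .ini file.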

Bold math characters into moodle with mathml converter

When I have $$\mathbf{x}$$ in my .Rmd file, and use exams2moodle with the pandoc-mathml converter, the xml file contains an "𝐱" character, which needs to be replaced with an "x" character before moodle will import the quiz question (because moodle will give an error saying the file is not UTF-8 without BOM.)
I wonder what are the most practical workarounds? Is this a bug? Thanks!
Minimal example: Here is minimal_example.Rmd
Question
========
Stare hard at the variable.
$$\mathbf{x}$$
What is its value?
Solution
========
If you think hard enough, you will know it is 12.
Meta-information
================
extype: num
exsolution: 12
exname: minimal_example
extol: 0
Here is the minimal_example.r
library("exams")
exams2moodle("minimal_example.Rmd", converter="pandoc-mathml")
And... here is a snippet of the resulting .xml file.
...
<questiontext format="html">
<text><![CDATA[<p>
<p>Stare hard at the variable. <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mstyle mathvariant="bold"><mi>𝐱</mi></mstyle><annotation encoding="application/x-tex">\mathbf{x}</annotation></semantics></math> What is its value?</p>
</p>]]></text>
</questiontext>
...
If I try importing the XML to my school's moodle, I get a dmlwriteexception error. If I replace the "𝐱" with "x" the XML imports fine.
I am fairly certain my moodlequiz.xml file does not contain a BOM.
$ file moodlequiz.xml
moodlequiz.xml: XML 1.0 document, UTF-8 Unicode text, with very long lines
$ hexdump -n 3 -C moodlequiz.xml
00000000 3c 3f 78 |<?x|
00000003
I consider this question resolved. Hopefully nobody else has this issue, and I will use one of the proposed workarounds for my own files. Thanks!
TL;DR
exams2moodle(..., converter = "pandoc-mathml") seems to work correctly and produces a UTF-8 encoded XML file moodlequiz.xml. The problem on your end appears to be caused by a BOM (byte order mark) in your XML file. It is unclear to me whether this is introduced through exams2moodle() or through an editor on your end.
Either you can remove the BOM manually or you can avoid the UTF-8 encoding altogether by using exams2moodle(..., converter = "pandoc-mathml-ascii"). The latter requires at least version 2.4-0 of the package.
Replication
Thanks for providing a reproducible example. I ran your example code - both on a Linux machine running in a UTF-8 locale and a Windows 10 machine - and can confirm that I get exactly the same XML code containing the UTF-8 encoded bold x: 𝐱. However, I have no problem importing that into my Moodle system.
Possible sources of the problem
So I looked up what the Moodle error message is about. Moodle does not accept UTF-8-encoded files with a BOM (byte order mark) at the beginning. Some systems use a BOM at the beginning of a file to declare how the file is encoded. See:
Moodle documentation: https://docs.moodle.org/39/en/UTF-8_and_BOM
Wikipedia with general information: https://en.wikipedia.org/wiki/Byte_order_mark
The moodlequiz.xml files I produced on the two systems mentioned above have no BOM. So I suspect that either your R setup produces a file with a BOM or the BOM is inserted later, e.g., after opening the XML file with an editor. The Moodle documentation above has some information on what you can do to detect the BOM and get rid of it. Hopefully, this lets you debug the problem on your end. If the BOM was produced by exams2moodle() (as opposed to your editor, for example) and you find out how to avoid that, please let me know.
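As one way to check for a BOM outside of an editor, the first three bytes of the file can be compared against the UTF-8 BOM (EF BB BF). A minimal sketch with hypothetical helper names, working on the raw bytes of the file:

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if there is one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

# Example with in-memory data; for a real file, read it in binary mode,
# e.g. open("moodlequiz.xml", "rb").read()
with_bom = codecs.BOM_UTF8 + b"<?xml version='1.0'?>"
without_bom = b"<?xml version='1.0'?>"

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
```

A file whose first bytes are 3c 3f 78 (the text <?x) would come out as having no BOM under this check.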
Alternative solution
In principle it is possible to replace the UTF-8 encoded characters by the corresponding HTML entities. For example, in this particular case we have a "MATHEMATICAL BOLD SMALL X" with Unicode U+1D431 (see https://www.w3.org/Math/characters/bold.html). Thus, we can also represent it as &#x1D431; (hexadecimal) or &#119857; (decimal). Then the XML file can be in ASCII while still leading to the same output in HTML.
While pandoc is generally designed to work with UTF-8 throughout it also has support for (hexa)decimal escapes in certain conversions, see https://pandoc.org/MANUAL.html#option--ascii. And luckily it is possible to combine the --mathml with the --ascii option. There was only a small bug in how R/exams passed on the option to the rmarkdown::pandoc_convert() function which I just fixed. So you need at least version 2.4-0 of exams and can then do:
exams2moodle(..., converter = "pandoc-mathml-ascii")
which yields a moodlequiz.xml in ASCII instead of UTF-8.
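To illustrate what such an ASCII representation looks like for the bold x, here is a small sketch using only Python's standard library. It is independent of pandoc and R/exams and only demonstrates the numeric character references for U+1D431.

```python
import html

bold_x = "\U0001D431"  # MATHEMATICAL BOLD SMALL X

# In UTF-8 this character takes four bytes.
utf8_bytes = bold_x.encode("utf-8")

# Python's "xmlcharrefreplace" error handler writes a decimal reference.
decimal_ref = bold_x.encode("ascii", "xmlcharrefreplace").decode("ascii")

# The hexadecimal form can be built from the code point directly.
hex_ref = f"&#x{ord(bold_x):X};"

# Both references decode back to the same character.
round_trip_ok = html.unescape(decimal_ref) == html.unescape(hex_ref) == bold_x

print(len(utf8_bytes), decimal_ref, hex_ref, round_trip_ok)
```

Both references are plain ASCII, so a file that uses them contains no multi-byte characters at all.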

Ignoring ascii carriage return characters in R

I have a dataset in a tab-delimited text file. The data have been exported from 4D, an old-school relational database. Most of the lines seem to be well formatted, but some lines include an ASCII carriage return character (^M in Emacs, or ASCII code 13). I would like to read the data in R using a function such as read.table() and find a way to ignore those carriage return characters. Does anyone have a solution?
In Vim you can create the ^M character by typing control-v control-m
So you could remove every occurrence of ^M with:
:%s/<c-v><c-m>//g
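If editing the file by hand is not practical, the same cleanup can be scripted before the file is read. The sketch below is one possible preprocessing step in Python, not an R solution. It assumes the real line endings are LF or CRLF, so only carriage returns that are not followed by a line feed are dropped; the function name is hypothetical.

```python
import re

def drop_stray_cr(data: bytes) -> bytes:
    """Remove carriage returns that are not part of a CRLF line ending."""
    # \r(?!\n) matches a CR only when the next byte is not LF.
    return re.sub(rb"\r(?!\n)", b"", data)

# Example with in-memory data; a real file would be read and written
# in binary mode so that no newline translation takes place.
raw = b"id\tname\n1\tfoo\rbar\n2\tbaz\r\n"
cleaned = drop_stray_cr(raw)

print(cleaned)
```

If the export used a bare CR as its line ending, this would merge all lines into one, so it is worth checking the file's line endings first.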

Printing ASCII value of BB (HEX) in Unix

When I am trying to paste the character » (right double angle quotes) in Unix from my Notepad, it's converting to /273. The corresponding Hex value is BB and the Decimal value is 187.
My actual requirement is to have this character as the file delimiter when I export a .dat file from a database table. So, this character was put in as the delimiter after each column name. But, while copy-pasting, it's getting converted to /273.
Any idea about how to fix this? I am on Solaris (SunOS 5.10).
Thanks,
Visakh
ASCII only defines the character codes up to 127 (0x7F) - everything after that is another encoding, such as ISO-8859-1 or UTF-8. Make sure your locale is set to the encoding you are trying to use - the locale command will report your current locale settings, the locale(5) and environ(5) man pages cover how to set them. A much more in-depth introduction to the whole character encoding concept can be found in Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
The character code 0xBB is shown as » in the ISO-8859-1 character chart, so that's probably the character set you want, and the locale would be something like en_US.ISO8859-1 for that character set with US/English messages/date formats/currency settings/etc.
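The numbers in the question all refer to the same byte value, which a short Python sketch can confirm. It also shows that the byte sequence for » depends on the encoding, which is why the locale setting matters.

```python
code = 0xBB

# Hex BB, decimal 187 and octal 273 are the same value.
same_value = code == 187 == 0o273

# In ISO-8859-1 this single byte is the character ».
char = bytes([code]).decode("iso-8859-1")

# In UTF-8 the same character is encoded as two bytes.
utf8_bytes = char.encode("utf-8")

print(same_value, char, utf8_bytes)
```

The 273 that appears after pasting is therefore most likely the octal form of the same byte, displayed by a terminal or shell that is not set to an encoding in which 0xBB is a printable character.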
