I have to deal with translation in po/mo format.
Basically an example of my po:
msgid "content_book"
msgid_plural "content_books"
msgstr[0] "%s book"
msgstr[1] "%s books"
It seems that %s or %d placeholders are quite common.
But the Symfony component use the placeholder %count%
https://github.com/symfony/translation-contracts/blob/main/TranslatorTrait.php#L50
Is there any possibility to use Symnfony/Translation component with po/mo file without to convert them in the icu format (https://symfony.com/doc/4.4/translation/message_format.html)?
I have to use sf 4.4 for now (wait for the next lts in 5.x).
Thanks for the help.
Gettext is not meant to be used in the way you are attempting to use it. msgid (and msgid_plural) are supposed to contain the actual human-readable text in English (or whatever your base language is), not some symbol names.
PO files with only symbol names as msgids are also painful to translate, and involve much more manual work than proper PO files do.
Related
When I have $$\mathbf{x}$$ in my .Rmd file, and use exams2moodle with the pandoc-mathml converter, the xml file contains an "𝐱" character, which needs to be replaced with an "x" character before moodle will import the quiz question (because moodle will give an error saying the file is not UTF-8 without BOM.)
I wonder what are the most practical workarounds? Is this a bug? Thanks!
Minimal example: Here is minimal_example.Rmd
Question
========
Stare hard at the variable.
$$\mathbf{x}$$
What is its value?
Solution
========
If you think hard enough, you will know it is 12.
Meta-information
================
extype: num
exsolution: 12
exname: minimal_example
extol: 0
Here is the minimal_example.r
library("exams")
exams2moodle("minimal_example.Rmd", converter="pandoc-mathml")
And... here is a snippet of the resulting .xml file.
...
<questiontext format="html">
<text><![CDATA[<p>
<p>Stare hard at the variable. <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mstyle mathvariant="bold"><mi>𝐱</mi></mstyle><annotation encoding="application/x-tex">\mathbf{x}</annotation></semantics></math> What is its value?</p>
</p>]]></text>
</questiontext>
...
If I try importing the XML to my school's moodle, I get a dmlwriteexeption error. If I replace the "𝐱" with "x" the XML imports fine.
I am fairly certain my moodlequiz.xml file does not contain a BOM.
$ file moodlequiz.xml
moodlequiz.xml: XML 1.0 document, UTF-8 Unicode text, with very long lines
$ hexdump -n 3 -C moodlequiz.xml
00000000 3c 3f 78 |<?x|
00000003
I consider this question resolved. Hopefully nobody else has this issue, and I will use one of the proposed workarounds for my own files. Thanks!
TL;DR
exams2moodle(..., converter = "pandoc-mathml") seems to work correctly and produces an UTF-8 encoded XML file moodlequiz.xml. The problem on your end appears to be caused by a BOM (byte order mark) in your XML file. It is unclear to me whether this is introduced through exams2moodle() or through an editor on your end.
Either you can remove the BOM manually or you can avoid the UTF-8 encoding altogether by using exams2moodle(..., converter = "pandoc-mathml-ascii"). The latter requires at least version 2.4-0 of the package.
Replication
Thanks for providing a reproducible example. I ran your example code - both on a Linux machine running in an UTF-8 locale and a Windows 10 machine - and can confirm that I get exactly the same XML code containing the UTF-8 encoded bold x: 𝐱. However, I have no problem importing that into my Moodle system.
Possible sources of the problem
So I looked up what the Moodle error message is about. Moodle does not accept UTF-8-encoded files with a BOM (byte order mark) at the beginning. Some systems use a BOM at the beginning of a file to declare how the file is encoded. See:
Moodle documentation: https://docs.moodle.org/39/en/UTF-8_and_BOM
Wikipedia with general information: https://en.wikipedia.org/wiki/Byte_order_mark
The moodlequiz.xml I produced on the two systems I mentioned above have no BOM. So I suspect that either your R setup produces a file with a BOM or the BOM is inserted later, e.g., after opening the XML file with an editor. The Moodle documentation above has some information on what you can do to detect the BOM and get rid of it. Hopefully, this lets you debug the problem on your end. If the BOM was produced by exams2moodle() (as opposed to your editor for example) and you find out how to avoid that, please let me know.
Alternative solution
In principle it is possible to replace the UTF-8 encocded characters by the corresponding HTML entities. For example, in this particular case we have a "MATHEMATICAL BOLD SMALL X" with Unicode U+1D431 (see https://www.w3.org/Math/characters/bold.html). Thus, we can also represent it as 𝐱 (hexadecimal) or 𝐱 (decimal). Then the XML file can be in ASCII while still leading to the same output in HTML.
While pandoc is generally designed to work with UTF-8 throughout it also has support for (hexa)decimal escapes in certain conversions, see https://pandoc.org/MANUAL.html#option--ascii. And luckily it is possible to combine the --mathml with the --ascii option. There was only a small bug in how R/exams passed on the option to the rmarkdown::pandoc_convert() function which I just fixed. So you need at least version 2.4-0 of exams and can then do:
exams2moodle(..., converter = "pandoc-mathml-ascii")
which yields a moodlequiz.xml in ASCII instead of UTF-8.
I'm trying to read a file line-by-line in Ada, it's a XML text file. I'm following the instructions here:
http://rosettacode.org/wiki/Read_a_file_line_by_line#Ada
However there's a problem that annoys me: the "Get_Line" function seems to be unaware of byte-order marks and reads them as part of the text itself, which means that when I raed the lines, the first one will always start with some extra bytes that should not be there.
While removing the extra bytes manually from the string is no big deal it seems strange to me that a function dedicated to text input/output is unaware of BOMs, there must be a way to read a text file in ada without having to worry about this... is there?
Ada.Text_IO is specified to handle ISO-8859-1 encoded text, so ignoring an UTF-8 feature is the proper thing to do.
If Ada.Wide_Text_IO and Ada.Wide_Wide_Text_IO also output the byte-order-mark, when asked to read UTF-8 encoded text, then you should consider reporting it as a bug to GCC - but as there is quite a lot of implementation defined details for the text I/O packages in Ada, you should be ready for a "wont fix" answer.
One possibility is using the stream attributes and making a UTF_8 file-type to handle the BOM reading-and-discarding.
Plone is showing the special chars from my mother language (Brazilian Portuguese) in its pages. However, when I use a spt page I created it shows escape sequences, e.g.:
Educa\xc3\xa7\xc3\xa3o
instead of
Educação
(by the way, it means Education). I'm creating a python function to replace the escape sequences with the utf chars, but I have a feeling that I'm slaving away without need.
Are you interpolating catalog search results? Those are, by necessity (the catalog cannot handle unicode) UTF-8 encoded.
Just use the .decode method on strings to turn them into unicode again:
value = value.decode('utf8')
A better way should be to use safe_unicode function https://github.com/plone/Products.CMFPlone/blob/master/Products/CMFPlone/utils.py#L458
from Products.CMFPlone.utils import safe_unicode
value = safe_unicode(value)
Anyone know how to specify a custom record delimiter in a schema.ini file? I need to import some data for an old system, and the source file uses pipes (|) as field delimieters and tilda's (~) as row delimiters. I've managed to get the field delimiters configured. Row delimiters anyone?
The current schema.ini file...
[sourcefile.txt]
ColNameHeader=false
Format=Delimited(|)
CharacterSet=ANSI
Col1=F1 text
Col2=F2 text
Col3=F3 text
Col4=F4 text
Col5=F5 text
...
Oh, and yes, it has to be done this way. I can't work around it by importing it through some other means...
no chance: see the grammar of ODBC text driver text files, especially these 2 lines:
delimited-text-line ::=
blank-line |
delimited-data [delimiter delimited-data]... end-of-line
end-of-line ::= <CR> | <LF> | <CR><LF>
ie. no setting for a custom row separator. also, no string anywhere related in the dll ...
If you have access to a copy of SQL Server, you might want to try the BCP utility which also handles row terminators.
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/specify-field-and-row-terminators-sql-server
Back in the day these "old school" formats were relied on to provide consistency of format between different systems (Unix, DOS/Windows etc)
Sometimes when copying stuff into PostgreSQL I get errors that there's invalid byte sequences.
Is there an easy way using either vim or other utilities to detect byte sequences that cause errors such as: invalid invalid byte sequence for encoding "UTF8": 0xde70 and whatnot, and possibly and easy way to do a conversion?
Edit:
What my workflow is:
Dumped sqlite3 database (from trac)
Trying to replay it in postgresql
Perhaps there's an easier way?
More Edit:
Also tried these:
Running enca to detect encoding of the file
Told me it was ASCII
Tried iconv to convert from ASCII to UTF8. Got an error
What did work is deleting the couple erroneous lines that it complained about. But that didn't really solve the real problem.
Based on one short sentence, it sounds like you have text in one encoding (e.g. ANSI/ASCII) and you are telling PostgreSQL that it's actually in another encoding (Unicode UTF8). All the different tools you would be using: PostgreSQL, Bash, some programming language, another programming language, other data from somewhere else, the text editor, the IDE, etc., all have default encodings which may be different, and some step of the way, the proper conversions are not being done. I would check the flow of data where it crosses these kinds of boundaries, to ensure that either the encodings line up, or the encodings are properly detected and the text is properly converted.
If you know the encoding of the dump file, you can convert it to utf-8 by using recode. For example, if it is encoded in latin-1:
recode latin-1..utf-8 < dump_file > new_dump_file
If you are not sure about the encoding, you should see how sqlite was configured, or maybe try some trial-and-error.
I figured it out. It wasn't really an encoding issue.
SQLite's output escaped strings differently than Postgres expects. There were some cases where 'asdf\xd\foo' was outputted. I believe the '\x' was causing it to expect the following characters to be unicode encoding.
Solution to this is dumping each table individually in CSV mode in sqlite 3.
First
sqlite3 db/trac.db .schema | psql
Now, this does the trick for the most part to copy the data back in
for table in `sqlite3 db/trac.db .schema | grep TABLE | sed 's/.*TABLE \(.*\) (/\1/'`
do
echo ".mode csv\nselect * from $table;" | sqlite3 db/trac.db | psql -c "copy $table from stdin with csv"
done
Yeah, kind of a hack, but it works.