How do I stop bookdown reordering punctuation?

I have forked bookdown-minimal here to produce the minimal reproducible example of my issue.
I want the following sentence to be rendered as is. That is, I want the full stop (period) to remain outside the quotes.
This is the on-line version of "A Book".
I have made a minimal bookdown example here.
The line bibliography: [book.bib] causes the sentence to be rendered using "Build book" as
This is the on-line version of "A Book."
I know this is a convention of American English, but other languages (and other variants of English) don't do this, and I don't want it in the real sentences I have. The issue also seems to occur with other punctuation, such as ! and ?, which even American English leaves outside the quotes.
What is driving this behaviour? (Note that I am not actually including references in my minimal example.) Is there any easy way to stop it?

Pandoc, which bookdown uses to build the book, respects the lang metadata variable. So if you are writing British English, add this to your YAML metadata:
lang: en-GB
The result should be
This is the on-line version of “A Book”.
whereas
lang: en-US
gives
This is the on-line version of “A Book.”
If all else fails, you can resort to adding a Lua filter which adds an invisible character like a zero-width joiner. This will prevent the reordering from happening as well.
-- Append a zero-width joiner after each quoted span so that the
-- following punctuation cannot be moved inside the closing quote.
function Quoted (quote)
  return {quote, pandoc.Str '\u{200d}'}
end
Note that you can take advantage of this trick on a per-case basis by using the &zwj; entity in your Markdown input:
This is the on-line version of "A Book"&zwj;.
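For comparison, the same idea can be sketched as a Python filter using the panflute package (an untested sketch of my own, not part of the original answer; it assumes panflute is installed and that the script is passed to pandoc via --filter):

import panflute as pf

def append_zwj(elem, doc):
    # After every quoted span, append a zero-width joiner so that
    # the following punctuation cannot be moved inside the quotes.
    if isinstance(elem, pf.Quoted):
        return [elem, pf.Str("\u200d")]

if __name__ == "__main__":
    pf.run_filter(append_zwj)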

This should do the trick:
This is the on-line version of "\text{A Book}"\text{.}

Related

How does 95cd 21eb fc from Farbrausch's "fuenf" translate into ..!..

In 2001 the German scene group Farbrausch released a demo called "fuenf" (in your face) (pouet.net). It contains a 5-byte executable which could be considered more of a troll than a demo. If you run it you hear a weird sound and it could crash your computer. At least it produces a sound. Whatever.
The hexadecimal content is:
95cd 21eb fc
And the binary representation is:
10010101 11001101 00100001 11101011 11111100
Using xxd I also get the printable chars from the content, which are:
..!..
And that makes me a little confused. Looking up the values in the ASCII table (e.g. here), I get this as a result:
•Í!ëü
At least the exclamation mark is correct.
But how does 95cd21ebfc translate into ..!..?
Side note:
file -bi fuenf.com says the encoding is not known:
charset=unknown-8bit
And iconv -f ISO-8859-1 -t UTF-8 fuenf.com returns
Í!ëü
Which leads to the assumption that xxd simply cannot decode the content and therefore just uses a default output, like the dot?
First of all, this is not a text file, so looking at it as one makes no sense. It's instructions.
Secondly, even if it could be interpreted as text, you would need to know the encoding. It's definitely not ASCII, because ASCII only defines symbols in the range 0-127 (and the 3rd byte here is the only one in that range, which maps to '!'). The "extended ASCII" table you link to is only one of many possible code pages that give meaning to the values from 128-255. Calling it "extended ASCII" is misleading, because it suggests an updated ASCII standard was created for this, which it was not. For a while, computer vendors just did whatever they wanted with those additional characters, and some of their choices became quasi-standards by virtue of being included in DOS, Windows, etc. Or they got standardized by ISO (you tried ISO-8859-1, which is one such standard).
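To make the dots concrete: xxd prints a dot in its ASCII column for every byte outside the printable range 0x20-0x7E, and the linked "extended ASCII" table appears to be Windows-1252, which is where "•Í!ëü" comes from. A small Python sketch (illustration only, not part of the original answer):

# The five bytes of fuenf.com.
data = bytes.fromhex("95cd21ebfc")

# xxd's ASCII column: printable ASCII bytes (0x20-0x7E) are shown as
# themselves, everything else becomes a dot.
print("".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in data))  # ..!..

# The same bytes decoded with a few single-byte code pages; cp1252
# reproduces the "•Í!ëü" from the question, while cp437 and latin-1 give
# something else entirely, because the file is machine code, not text.
for codec in ("cp1252", "cp437", "latin-1"):
    print(codec, data.decode(codec, errors="replace"))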

Incorrect behaviour of Google Translation API with notranslate tags

Google Translate API allows indicating chunks of text that should not be translated with
<span translate='no'>Skip this text while translating</span>
In some cases there is incorrect behaviour with non-translate tags that causes the translation API to omit one of the words and to duplicate the non-translate tag. Input of the translation API:
0c40152c asdasd alsdls3 ec3f297a <span translate="no">AAAAA123AAAA</span> Nov 30 translate
When translating from Italian to English (not sure if the language matters), the following result is returned:
0c40152c asdasd alsdls3 ec3f297a <span translate="no">AAAAA123AAAA</span> Nov 30 <span translate="no">AAAAA123AAAA</span>
Please note that the 'translate' at the end of the text is substituted with the non-translate tag.
These issues are also present if, instead of <span translate='no'>, I use the alternative syntax <span class='notranslate'>.
Is this a known bug? Does it have a sensible workaround?
Is this a known bug?
Yes: https://issuetracker.google.com/issues/121076288
Translation problem with notranslate class in span tag
Problem you have encountered:
The translation API gives wrong results translating from German to Arabic
German text:
QANTARA Migration - Kostenfreie Erstprüfung Ihrer Chancen für die erfolgreiche Immigration nach Deutschland
Arabic translation:
QANTARA Migration - إجراء فحص أولي مجاني لفرص نجاح QANTARA Migration إلى ألمانيا
What you expected to happen:
Correct translation without doubling the span with notranslate - this was doubled in arabic translation as you can see
There are also a few others that seem related, like https://issuetracker.google.com/issues/74168658 and https://issuetracker.google.com/issues/35902695.
Does it have a sensible workaround?
Only hacky ones, I'm afraid.
The easiest workaround is just to replace such sections with a token, like a unique number or URL that Translate is smart enough not to touch, run the translation, then swap the original string back in.
A more general solution is to use something like ModelFront (full-disclosure: I work there) to detect errors, and do something only in those cases.
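A rough sketch of the token workaround in Python (illustration only; translate_fn stands in for whatever client call you actually use, and the token format is arbitrary):

import re

# Matches both <span translate="no">...</span> and <span class="notranslate">...</span>.
NOTRANSLATE = re.compile(
    r"<span[^>]*(?:translate=.no.|class=.notranslate.)[^>]*>.*?</span>",
    re.IGNORECASE | re.DOTALL)

def translate_with_tokens(text, translate_fn):
    # Pull the no-translate spans out and replace them with opaque
    # numeric tokens that the translation engine is unlikely to touch.
    protected = []
    def stash(match):
        protected.append(match.group(0))
        return "97531%02d97531" % (len(protected) - 1)
    tokenized = NOTRANSLATE.sub(stash, text)

    translated = translate_fn(tokenized)

    # Swap the original spans back in after translation.
    for i, original in enumerate(protected):
        translated = translated.replace("97531%02d97531" % i, original, 1)
    return translated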
It seems like you have specified Italian as the input language, but there are very few words in the text which can be translated (for example "translate"), and they are not recognised as belonging to the source language.
This can lead to issues with the translation algorithm, which seems to be the case here.
A workaround would be letting the API detect the source language automatically and checking the confidence value:
The confidence value is an optional floating point value between 0 and 1. The closer this value is to 1, the higher the confidence level for the language detection. This member is not always available.
If the confidence value is high enough for your needs, you can let the API translate from the detected source language.
Another workaround could be adding more words to the text so the algorithm has more data to work with. I have tested the API with the same input as you describe but with a few more words added, and the output is then the expected one.
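A sketch of the detection-confidence check, assuming the google-cloud-translate v2 client (the 0.5 threshold is an arbitrary choice for illustration):

from google.cloud import translate_v2 as translate

client = translate.Client()

text = '0c40152c asdasd alsdls3 ec3f297a <span translate="no">AAAAA123AAAA</span> Nov 30 translate'

# detect_language returns a dict with 'language' and (optionally) 'confidence'.
detection = client.detect_language(text)
if detection.get("confidence", 0) >= 0.5:
    result = client.translate(text, source_language=detection["language"],
                              target_language="en", format_="html")
    print(result["translatedText"])
else:
    print("Detection too uncertain; fall back to another strategy.")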

convert comment string to an ASCII character list in sicstus-prolog

Currently I am working on a comparison between SICStus 3 and SICStus 4, but I have one issue: SICStus 4 will not consult any cases where the comment string has carriage controls or tab characters etc., as given below.
An example case is given below. It has 3 arguments with a comma delimiter.
case('pr_ua_sfochi',"
Response:
answer(amount(2370.09,usd),[[01AUG06SFO UA CHI Q9.30 1085.58FUA2SFS UA SFO Q9.30 1085.58FUA2SFS NUC2189.76END ROE1.0 XT USD 180.33 ZPSFOCHI 164.23US6.60ZP5.00AY XF4.50SFO4.5]],amount(2189.76,usd),amount(2189.76,usd),amount(180.33,usd),[[fua2sfs,fua2sfs]],amount(6.6,usd),amount(4.5,usd),amount(0.0,usd),amount(18.6,usd),lasttktdate([20061002]),lastdateafterres(200712282]),[[fic_ticketinfo(fare(fua2sfs),fic([]),nvb([]),nva([]),tktiss([]),penalty([]),tktendorsement([]),tourinfo([]),infomsgs([])),fic_ticketinfo(fare(fua2sfs),fic([]),nvb([]),nva([]),tktiss([]),penalty([]),tktendorsement([]),tourinfo([]),infomsgs([]))]],<>,<>,cat35(cat35info([])))
.
02/20/2006 17:05:10 Transaction 35 served by static.static.server1 (usclsefat002:7551) running E*Fare version $Name: build-2006-02-19-1900 $
",price(pnr(
user('atl','1y',<>,<>,dept(<>,'0005300'),<>,<>,<>),
[
passenger(adt,1,[ptconly(n)])
],
[
segment(1,sfo,chi,'ua','<>','100',20140901,0800,f,20140901,2100,'737',res(20140628,1316),hk,pf2(n,[],[],n),<>,flags(no,no,no,no,no,no,no,no,no)),
segment(2,chi,sfo,'ua','<>','101',20140906,1000,f,20140906,1400,'737',res(20140628,1316),hk,pf2(n,[],[],n),<>,flags(no,no,no,no,no,no,no,no,no))
]),[
rebook(n),
ticket(20140301,131659),
dbaccess(20140301,131659),
platingcarrier('ua'),
tax_exempt([]),
trapparm("trap:ffil"),
city(y)
])).
The predicate below will remove the comment section in the above case.
flatten-cases :-
    getmessage(M1),
    write_flattened_case(M1),
    flatten-cases.
flatten-cases.

write_flattened_case(M1) :-
    M1 = case(Case,_Comment,Entry), !,
    M2 = case(Case,Entry),
    writeq(M2), write('.'), nl.

getmessage(M) :-
    read(M),
    !,
    M \== end_of_file.

:- flatten-cases.
Now my requirement is to convert the comment string to an ASCII character list.
Layout characters other than a regular space cannot occur literally in a quoted atom or a double-quoted list. This is a requirement of the ISO standard and has been fully implemented in SICStus since 3.9.0 when invoking SICStus 3 with the option --iso. Since SICStus 4, only ISO syntax is supported.
You need to insert \n and \t accordingly. So instead of
log('Response:
yes'). % BAD!
Now write
log('Response:\n\tyes').
Or, to make it more readable, use a continuation escape sequence:
log('Response:\n\
\tyes').
Note that using literal tabs and literal newlines is highly problematic. On a printout you do not see them! Think of 'A \nB', which would show neither trailing spaces nor trailing tabs.
But there are also many other situations: making a screenshot of program text, taking a photo of program text, using a 3270 terminal emulator and copying the output. In the past, punched cards, and the text mode used when reading files (which was originally motivated by punched cards). Similar arguments hold for the tab character, which comes from typewriters with their manually settable tab stops.
And then on SO it is quite difficult to type in a TAB. The browser refuses to type it (very wisely), and if you copy it in, you get it rendered as spaces.
While I am at it, there is also another problem: the name flatten-cases should rather be written flatten_cases.

How can I fix turkish character on font-face?

I am using the Chunk Five font on my web site as a @font-face in CSS. When I use the original font in Photoshop, the Turkish characters exist. But when I convert it to a @font-face, it won't display Turkish characters. I have shared a screenshot of the affected segment of text.
I've tried to convert different types of font faces. I tried the conversion with subsetting support and checked the Turkish field. I also entered ş,Ş,İ,ı,ğ,Ğ,ü,Ü,Ç,ç,Ö,ö in the Single Characters field of the converter. Unfortunately, it has not worked for me. How can I fix that problem?
Thanks and Regards.
You can try converting the fonts with other sources and that will fix it. I know this is an old post, but maybe it helps you.
Some sources:
http://convertfonts.com/
https://www.web-font-generator.com/
http://www.flaticon.com/font-face
https://fontie.flowyapps.com/home
I have experienced the exact same problem as you when dealing with font conversions for Turkish. First, I had run a conversion using FontSquirrel's tool (available here), but it turns out the conversion was stripping these much-needed characters for the Turkish language.
One of the references from #Karmacoma's answer was very interesting and did the trick for me (Fontie) because it delivers advanced options, which gives us more control over the conversion process.
In order to cover the special characters in Turkish, you must use Switch to advanced view and run the conversion with Latin Extended-A.
I went to Wikipedia for a list of characters covered in Latin Extended-A and you can find them here.
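If you want to check whether a converted file actually lost the glyphs, you can inspect its character map. A small Python sketch using the fontTools library (the file name is hypothetical):

from fontTools.ttLib import TTFont

font = TTFont("chunkfive-converted.woff")   # also works for .ttf/.otf
cmap = font["cmap"].getBestCmap()

# Check which of the Turkish-specific characters survived the conversion.
for ch in "şŞİığĞüÜÇçÖö":
    status = "present" if ord(ch) in cmap else "MISSING"
    print(ch, hex(ord(ch)), status)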

IDML : What are Kinsoku/Mojikumi tables?

I am new to the world of Adobe InDesign and IDML file format. I am trying to understand the IDML file format so that I can create IDML files dynamically through code!
I am going through the IDML File format specification and have found references to "Mojikumi Tables" and "Kinsoku Tables" and "Aki". Though the documentation defines various attributes for these elements, there's no clear explanation what these elements actually are.
Any pointers or links to relevant articles would be really helpful.
Thanks.
These are all additional typography settings used in laying out Japanese text.
Kinsoku: A rule set in the Japanese language that is used to determine characters that are not permitted at the beginning or end of a line. Reference.
Mojikumi: Determines spacing between punctuation, symbols, numbers, and other character classes in Japanese type. Reference.
Aki: Means "space" in Japanese:
"When the glyphs that correspond to characters of different character classes come together in a run of text, there is spacing behaviour. In other words, extra space, measured using a fraction of an em, is introduced depending on which two character classes are in proximity*. Typical values are one-fourth and one-half of an em."
(Footnote: * "In Japanese this space is referred to as aki, which simply means 'space'.")
Reference and source for this quote.
Here's a link to a book that should provide more information: CJKV Information Processing, 2nd Edition
