How to fix prettytable to display Chinese characters properly - r

from prettytable import PrettyTable
header="乘客姓名,性别,出生日期".split(",")
x = PrettyTable(header)
x.align["乘客姓名"]="l"
table='''HuangTianhui,男,1948/05/28
姜翠云,女,1952/03/27
李红晶,女,1994/12/09
LuiChing,女,1969/08/02
宋飞飞,男,1982/03/01
唐旭东,男,1983/08/03
YangJiabao,女,1988/08/25
买买提江·阿布拉,男,1979/07/10
安文兰,女,1949/10/20
胡偲婠(婴儿),女,2011/02/25
(有待确定姓名),男,1985/07/20
'''
data=[row for row in table.split("\n") if row]
for row in data:
    x.add_row(row.strip().split(","))
print(x)
The output format I want is the following.
In this example, prettytable.py cannot properly display the character · in 买买提江·阿布拉, because that character has an ambiguous width. How can this bug in prettytable.py be fixed?
I have added two lines to def _char_block_width(char) in prettytable.py, but the problem remains.
if char == 0xb7:
    return 2
I have solved it: the file prettytable.py has to be installed directly in d:\python33\Lib\site-packages on my computer, not as d:\python33\Lib\site-packages\prettytable\prettytable.py.
There are many Chinese characters with ambiguous width, and it is impractical to add two lines like the following for each of them. With 50 ambiguous characters, 100 lines would have to be added to prettytable.py. Is there a simpler way, changing just a few lines so that all ambiguous characters are handled?
if char == 0xb7:
    return 2

The issue you're running into has to do with the dot character in the incorrectly padded line of your Python output. The dot is Unicode code point U+00B7 · middle dot. This character is considered to have an "ambiguous" width, as it is a narrow character in most non-East-Asian fonts but is rendered as full-width in most East Asian ones. Without context, a program can't tell how wide it will appear on the screen. Unfortunately, Python's Unicode system doesn't appear to have any way to provide that context.
One fix might be to replace the offending dot with one that has an unambiguous width, such as U+30FB katakana middle dot (which is always full width). This way the padding logic will be able to recognize that extra space is needed for that line.
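The replacement described here could look roughly like the following. This is a minimal sketch, not part of prettytable; the helper name normalize_middle_dot is hypothetical, and it assumes the visual difference between the two dots is acceptable in the output.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"           # U+00B7, East Asian Width class "A" (ambiguous)
KATAKANA_MIDDLE_DOT = "\u30fb"  # U+30FB, East Asian Width class "W" (wide)


def normalize_middle_dot(name):
    """Swap the ambiguous-width middle dot for the always-wide katakana one."""
    return name.replace(MIDDLE_DOT, KATAKANA_MIDDLE_DOT)


if __name__ == "__main__":
    fixed = normalize_middle_dot("买买提江·阿布拉")
    # Print the width class of every character to confirm nothing is ambiguous.
    print([unicodedata.east_asian_width(ch) for ch in fixed])
```

The names would be passed through this helper before being handed to add_row.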
Another solution could be to set your console to use a font with a more Western treatment of the middle dot character, rather than the current one, which follows the East Asian style of rendering it as full-width. This will mean that the existing padding is correct. Your output from R clearly uses a different font than the Python output does, and its font renders the dot as half-width.
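On the question of handling every ambiguous-width character at once instead of one code point at a time, one possible approach is to ask Python's standard unicodedata module for each character's East Asian Width class and count class "A" as two columns. The module can report that a character is ambiguous, but it cannot know how the console font will draw it, so the choice is exposed as a flag. This is a minimal sketch; the names char_width, display_width and pad are hypothetical and are not part of prettytable.

```python
import unicodedata


def char_width(ch, ambiguous_wide=True):
    """Return the assumed number of terminal columns for one character."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn on top of the previous character
    width_class = unicodedata.east_asian_width(ch)
    if width_class in ("W", "F"):
        return 2  # wide and fullwidth characters
    if width_class == "A" and ambiguous_wide:
        return 2  # assumption: the console font draws ambiguous characters wide
    return 1


def display_width(text, ambiguous_wide=True):
    """Sum the assumed column widths of all characters in text."""
    return sum(char_width(ch, ambiguous_wide) for ch in text)


def pad(text, width, ambiguous_wide=True):
    """Left-align text in a field of the given display width."""
    missing = width - display_width(text, ambiguous_wide)
    return text + " " * max(0, missing)


if __name__ == "__main__":
    for name in ("LuiChing", "买买提江·阿布拉", "安文兰"):
        print("|" + pad(name, 18) + "|")
```

Under the stated assumption, display_width("买买提江·阿布拉") counts 16 columns rather than 15. Whether the same test can be dropped into prettytable's _char_block_width depends on the installed version, so treat this as a starting point rather than a patch.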

Related

How to preserve white space at the start of a line in .Rd documentation?

I need to indent some math stuff in the \details section of my .Rd documentation to enhance its readability. I am using mathjaxr. Is there any way to indent without installing roxygen2 or similar?
The math stuff is inline, so simply setting to display using \mjdeqn won't solve this.
I seem to have a reasonable "cheating" workaround for indenting the first line using mathjaxr, at least for the PDF and HTML output.
We need to do two things:
1. Use the MathJax/LaTeX phantom command. phantom works by making a box of the size necessary to typeset whatever its argument is, but without actually typesetting anything in the box. For my purposes, if I want to indent, say, about 2 characters wide, I would start the line with a \mjeqn{\phantom{22}}{ } and follow it with my actual text, possibly including actual mathy bits. If I want an indent of, say, roughly 4 characters wide, I might use \mjeqn{\phantom{2222}}{ }.
2. Because mathjaxr has a problem with tacking on unsolicited new lines when starting a line with \mjeqn, we need to prefix the use of phantom in 1 above with an empty bit of something non-mathjaxr-ish like \emph{}.
Putting it all together, I can indent by about 2 characters using something like this:
\emph{}\mjeqn{\phantom{22}}{ }Here beginneth mine indented line…
I need to explore whether the { } business actually indents for ASCII output, or whether I might accomplish that using or some such.

Box-drawing characters aren't aligned in Xmobar

I've created a little Xmobar status indicator for https://complice.co. Inspired by the agnoster Zsh theme, I used some box-drawing characters to try to put triangle-like ends on the end of the status bar. But they aren't aligning correctly, as shown here:
The triangle is too small, leaving a lip at the bottom. It annoys me that it's not pixel-perfect. Does anyone have any insight into why it isn't sized correctly? I've never used box-drawing characters and couldn't find any documentation on the specific ones I'm using (\ue0b2 and \ue0b0) - any links would be appreciated.
I use a script to generate the text. The important part is here where I use the box-drawing characters: https://github.com/d4hines/beth/blob/master/scripts/complice#L38
And the Xmobar config: https://github.com/d4hines/beth/blob/master/flake.nix#L249-L265

Why are whitespaces between characters/words handled differently from other whitespaces?

I have created a font subset for the two fonts I use.
But if I open the browser and inspect a given H1 tag that should use only this font, it shows that two fonts are used, because one character is taken from the fallback font Open Sans:
The exact HTML tag:
<strong class="headline1">Carservice Meisterwerkstatt</strong>
The CSS that is used (BTW: PT Sans uses the same font subsetting, so the next fallback for those 5 glyphs is Open Sans):
To determine the subset I used glyphhanger http://localhost:3000 and added its output as the whitelist to the following command:
glyphhanger --whitelist=U+A,U+20-23,U+25-29,U+2B-3B,U+3F-57,U+59,U+5A,U+5F,U+61-7D,U+A9,U+C4,U+D6,U+DC,U+E4,U+F6,U+FC,U+F002,U+F017,U+F0F1,U+F2B5,U+F2DC,U+F46D,U+F500,U+F530,U+F5E1,U+F63B,U+F7D9 --subset=Dosis-VariableFont_wght.ttf
What I am looking for is a way to figure out which 5 glyphs are taken from Open Sans. Is there a way to get this in the dev console?
For testing purposes, I changed the fallback to another font face to see immediately whether a fallback font is used. But as you can see, even with Alfredo as the fallback it is not visible which 5 glyphs use it.
I then tried removing each single character of the tag's content in the dev console and checked when the font mixing appears. I figured out that it appears only if I have 2 characters with a whitespace in between: r M
But if I enter only a character (or word) with a whitespace in front of or after it, it doesn't happen. M even not like M .
I found that there is more than one kind of space character. There are many (see https://emptycharacter.com/, further down under the topic Unicode empty characters).
So it seems the issue is that the font subset doesn't include the needed Unicode character.
If anybody knows how to easily figure out exactly which Unicode code point the browser requests from the font, you are very welcome to post it here as a comment.
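Outside the browser, one way to narrow this down is to copy the rendered text into a script, list its code points, and compare them with the glyphhanger whitelist. This is only a sketch of that idea in Python: it does not ask the browser anything, the function names are hypothetical, and it assumes the whitelist uses the comma-separated U+XX and U+XX-YY hexadecimal form shown in the command above.

```python
import unicodedata


def describe_codepoints(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def parse_whitelist(spec):
    """Expand a whitelist such as 'U+A,U+20-23' into a set of integers."""
    covered = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        body = part[2:] if part.upper().startswith("U+") else part
        low, _, high = body.partition("-")
        start = int(low, 16)
        end = int(high, 16) if high else start
        covered.update(range(start, end + 1))
    return covered


def missing_codepoints(text, spec):
    """Return the characters of text that the whitelist does not cover."""
    covered = parse_whitelist(spec)
    return sorted({ch for ch in text if ord(ch) not in covered}, key=ord)


if __name__ == "__main__":
    sample = "r\u00a0M"  # hypothetical input: a no-break space between r and M
    for code, name in describe_codepoints(sample):
        print(code, name)
    print(missing_codepoints(sample, "U+20-23,U+3F-57,U+61-7D"))
```

If the text copied from the page contains a space that is not U+0020, it shows up by name in the first listing and as a missing entry in the second.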

How to put Greek letters on the y axis in a Veusz plot?

I want to write a capital omega with the subscripts DE and k as labels:
and then I want to show them in the y-axis label. How can I do that?
Generally you can use TeX symbols in Veusz. Therefore, you can write \Omega_{DE} and \Omega_{k} for your request. See details here (Sec. 2.4 Text).
Veusz understands a limited set of LaTeX-like formatting for text. There are some differences (for example, "10^23" puts the 2 and 3 into superscript), but it is fairly similar. You should also leave out the dollar signs. Veusz supports superscripts ("^"), subscripts ("_"), brackets for grouping attributes are "{" and "}".
Supported LaTeX symbols include: \AA, \Alpha, \Beta, \Chi, \Delta, \Epsilon, \Eta, \Gamma, \Iota, \Kappa, \Lambda, \Mu, \Nu, \Omega, \Omicron, \Phi, \Pi, \Psi, \Rho, \Sigma, \Tau, \Theta, \Upsilon, \Xi, \Zeta, \alpha, \approx, \ast, \asymp, \beta, \bowtie, \bullet, \cap, \chi, \circ, \cup, \dagger, \dashv, \ddagger, \deg, \delta, \diamond, \divide, \doteq, \downarrow, \epsilon, \equiv, \eta, \gamma, \ge, \gg, \in, \infty, \int, \iota, \kappa, \lambda, \le, \leftarrow, \lhd, \ll, \models, \mp, \mu, \neq, \ni, \nu, \odot, \omega, \omicron, \ominus, \oplus, \oslash, \otimes, \parallel, \perp, \phi, \pi, \pm, \prec, \preceq, \propto, \psi, \rhd, \rho, \rightarrow, \sigma, \sim, \simeq, \sqrt, \sqsubset, \sqsubseteq, \sqsupset, \sqsupseteq, \star, \stigma, \subset, \subseteq, \succ, \succeq, \supset, \supseteq, \tau, \theta, \times, \umid, \unlhd, \unrhd, \uparrow, \uplus, \upsilon, \vdash, \vee, \wedge, \xi, \zeta. Please request additional characters if they are required (and exist in the unicode character set). Special symbols can be included directly from a character map.
Other LaTeX commands are supported. "\\" breaks a line. This can be used for simple tables. For example "{a\\b} {c\\d}" shows "a c" over "b d". The command "\frac{a}{b}" shows a vertical fraction a/b.
Also supported are commands to change font. The command "\font{name}{text}" changes the font text is written in to name. This may be useful if a symbol is missing from the current font, e.g. "\font{symbol}{g}" should produce a gamma. You can increase, decrease, or set the size of the font with "\size{+2}{text}", "\size{-2}{text}", or "\size{20}{text}". Numbers are in points.
Various font attributes can be changed: for example, "\italic{some italic text}" (or use "\textit" or "\emph"), "\bold{some bold text}" (or use "\textbf") and "\underline{some underlined text}".
Example text could include "Area / \pi (10^{-23} cm^{-2})", or "\pi\bold{g}".
Veusz plots these symbols with Qt's unicode support. You can also include special characters directly, by copying and pasting from a character map application. If your current font does not contain these symbols then you may get a box character.
In addition to the answer OmG posted, you can also directly enter the character (via a character map application or copy and paste), as Veusz supports unicode characters.

What special character is this space-like thousand separator?

Sometimes, in an Excel file, I find thousand separators that look exactly like a space but are not one. I know this because I cannot replace them by typing a space. Instead, if I copy and paste this "space" (the thousand separator inside a figure), then I can replace them.
I have always done it this way, and I still don't know what this mysterious space-like thousand separator is.
Now I have a problem because I have to do this in R, where copy and paste no longer works. I think it may be like the case with €.
When I do gsub("€","",Price), the euro symbol isn't replaced.
Could you please help? Thank you.
That would be U+2009, "thin space".
There are multiple characters that look like a space character. Because they look similar, it is good to refer to them using the Unicode code.
U+0020 Space: this is the normal space character you get when pressing the spacebar on the keyboard.
U+00A0 No-Break Space: this space character is meant to prevent breaking into a new line when word wrapping. In HTML, it is equivalent to &nbsp;.
U+2007 Figure Space: this space character is meant to be as wide as a numerical digit, and it prevents line breaking.
U+2009 Thin Space: this space character is meant to be slightly less wide than a normal space character.
U+202F Narrow No-Break Space: similar to U+00A0, but narrower in width.
There may be others I'm missing.
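To check which of these characters a given value actually contains, rather than guessing from its appearance, the code points can be listed and every character in Unicode category Zs (space separators, which includes all five listed above) removed. This is a minimal sketch in Python rather than R, and the function names are hypothetical.

```python
import unicodedata


def find_space_separators(text):
    """List (code point, name) for each space-separator character in text."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if unicodedata.category(ch) == "Zs"
    ]


def remove_space_separators(text):
    """Drop every character whose Unicode category is Zs."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Zs")


if __name__ == "__main__":
    price = "1\u202f234\u00a0567"  # hypothetical value copied from a spreadsheet
    print(find_space_separators(price))
    print(int(remove_space_separators(price)))
```

Once the code point is known, it can be written as an escape sequence in a replacement pattern instead of being pasted in literally.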
