I have a font where combining diacritics don't combine. Some applications (Adobe Illustrator, for example) render the font correctly, but one application shows the characters separately. Unfortunately, this application lives on a server outside of my control, which makes testing difficult. When I switch to another font (Arial Unicode MS), the combining diacritics are shown correctly. This application does not support OpenType fonts.
For example, U+00EA (ê) plus U+0323 (combining dot below) should be rendered as an e with a circumflex above and a dot below (ệ), but one application renders it as ê followed by a separate dot (the dot is at the correct vertical position).
When I use View->Combinations->Ligatures in FontForge, the combination of these two glyphs is shown correctly.
Not being a font designer, I'm a bit lost. What could cause this problem?
Combining diacritics are a nice font feature, but they're not universally supported. The application in question probably just doesn't support the modern font features (such as OpenType mark positioning) needed to place them.
Combining diacritics should be designed with this in mind. Give them zero advance width, and position the diacritic to the left of the glyph origin, so that it will naturally fall over/under the previous character. Unfortunately you can't get it perfectly positioned against that character, but if you make it look good over/under an a, it will probably be at least legible with i, w, etc.
This is really a stopgap measure in case of poor software, and it won't always work. For instance, high diacritics above capitals or tall lowercase letters will run into trouble.
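If you control the font, FontForge's Python scripting can apply that zero-width trick. This is only a rough sketch, not a tested recipe: the file names, the glyph name uni0323, and the -250 em-unit shift are placeholders you would adjust for your own design.

import fontforge
import psMat

font = fontforge.open("MyFont.sfd")   # placeholder source file

# U+0323 COMBINING DOT BELOW, assuming the glyph is named "uni0323".
glyph = font["uni0323"]

# Move the outline to the left of the glyph origin so it lands under
# the preceding character; -250 em-units is a guess to tune by eye.
glyph.transform(psMat.translate(-250, 0))

# Zero advance width: the mark takes up no horizontal space of its own.
glyph.width = 0

font.generate("MyFont-fallback.otf")  # placeholder output name

Applications that do understand OpenType mark positioning should still place the mark from its anchors, so this fallback mainly helps the simpler renderer.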
So, when working with fonts:
the default (aka "Regular") font weight is 400
the "bold" font weight is 700
the "light" font weight is 300
But... why? 400 what? Is this a sort of unit? Is there a historical reason behind that?
Not "400 what", just 400. As per the CSS specification, first formalized in https://www.w3.org/TR/CSS1/#font-weight. there are nine levels of font weight, represented as unitless numbers, starting at 100 and ending at 900, in 100 increments.
In addition to this, the spec defines two mappings between numerical value and string value:
the numerical value 400 and the string value normal are the same thing, and
the numerical value 700 and string value bold are the same thing
(Note that while CSS4 will change this to allow any number from 1 to 1000, it will still only officially recognise the string values normal and bold, still mapping to 400 and 700 respectively. See https://drafts.csswg.org/css-fonts-4/#font-weight-prop for more information.)
The only formal rules around these weights are that if you're using a font family in a CSS context, a font weight of 400/normal should get you whatever is that font family's Regular typeface, and a font weight of 700/bold should get you whatever is the font family's Bold typeface. Anything else is left entirely undefined, and all you know is that 100, 200, and 300 are probably lighter than 400; 500 and 600 are probably in between regular and bold; and 800 and 900 are probably heavier than 700.
All of those are qualified as "probably" because @font-face gets to completely invalidate everything about this. If you use @font-face, you overrule CSS's rules for what those numerical values mean entirely. For example: this rule will produce an ultra-thin font when you set font-weight to 900, because that's what we're telling the browser it must do:
@font-face {
  font-family: MyFont;
  font-weight: 900;
  src: url("./fonts/ultra-thin.woff2") format("woff2");
}
Also important to know is that these are the only two official number/string mappings; the table of numerical values in https://drafts.csswg.org/css-fonts-3/#font-weight-prop is there purely to illustrate which CSS values map to the rough names people tend to use for them.
The most important part is that this only applies to CSS. It has nothing to do with actual font-stored values, or things you see in word processing software, etc.
On the reason for no units
Font weights are not given a unit because with increasing and decreasing font weights there is no specific, identifiable "thing" that you are "turning up and down". In contrast to font-size, increases and decreases in font weight on computers have not traditionally been achieved programmatically - each level of font weight has had to be hand-created by the font designer as a completely separate typeface.
This is because it's much harder to programmatically change font weight than it is to change, say, size. Bolding text doesn't just add thickness to the entire character. It adds contrast by selectively adding more thickness to certain parts of the letter than others. Notice on the font below how certain parts of the letter "a" grow very thick with a higher weight while other parts of the same letter don't grow nearly as much.
This is not easy to do programmatically; it has mostly had to be done by hand to look good. Typographers would create a few different versions of their font - usually three - developers would upload all three styles, and the CSS font-weight property would be set up to switch only between those three, usually at 300, 400, and 700. This is assuming you wanted real bold/italics. Faux styling - faux bold, faux italics, etc. - has been provided by browsers, but the results have generally not been great.
Where things are moving now is towards variable fonts, introduced in 2016. These are fonts that are specifically designed to give developers full control over style attributes such as font weight and italicization. Essentially, font creators create several different variations of their typeface with different levels of weight, and the computer then intelligently fills in the spaces in between so that developers now have potentially infinite degrees of font-weight to use for a given font.
Even with this change, though, it's hard to pin down a specific, scientifically measurable "thing" that makes text more or less heavy. So I doubt that we'll see any non-relative unit introduced for font-weight any time soon.
On the reason for multiples of 100 and default of 400
The system is based on the Linotype system that was developed for the Univers font in 1957. The idea was that you would use three digits to specify font weight, width, and italicization, in that order. So 099 would be very light, very wide, and very italicized; 905 would be very heavy, very compressed, with medium italicization. (This is just an example; the actual available values were different.) People made more use of the 'weight' part of the system than of the other two digits, so the second two digits were used less and less until they became vestigial and people forgot that the numbering system was ever used for anything other than specifying weights. It would indeed make more sense to have a 1-9 scale for font weight at this point, but 100-900 has now become conventional.
My guess is that 400 is the default simply because it's right in the middle. I would think they chose 400 over 500 because people more often want to embolden their text than to lighten it, so they chose the default value that left more room for that.
I'm looking for a means to extract the position (x, y) and attributes (font / size) of every word in a document.
From the python-docx docs, I know that:
Conceptually, Word documents have two layers, a text layer and a drawing layer. In the text layer, text objects are flowed from left to right and from top to bottom, starting a new page when the prior one is filled. In the drawing layer, drawing objects, called shapes, are placed at arbitrary positions. These are sometimes referred to as floating shapes.
A picture is a shape that can appear in either the text or drawing layer. When it appears in the text layer it is called an inline shape, or more specifically, an inline picture.
[...] At the time of writing, python-docx only supports inline pictures.
Yet, even though it is not the gist of the library, I'm wondering if something similar exists:
from docx import Document

main_file = Document("/tmp/file.docx")
for paragraph in main_file.paragraphs:
    for word in paragraph.text:  # <= Non-existing (yet wished) functionalities, IMHO
        print(word.x, word.y)  # <= Non-existing (yet wished) functionalities, IMHO
Does somebody have an idea?
for word in paragraph.text: # <= Non-existing (yet wished) functionalities, IMHO
This functionality is provided right in the Python standard library as str.split(). The two can be composed easily as:
for word in paragraph.text.split():
    ...
Regarding
print(word.x, word.y) # <= Non-existing (yet wished) functionalities, IMHO
I think it's safe to say this functionality will never appear in python-docx, and if it did it could not look like this.
What such a feature would be doing is asking the page renderer for the location at which the renderer was going to place those characters. python-docx has no rendering engine (because it does not render documents); it is simply a fancy XML editor that selectively modifies XML files in the WordprocessingML vocabulary.
It may be possible to get these values from Word itself, because Word does have a rendering engine (which it uses for screen display and printing).
If there were such a function, I expect it would take a paragraph and a character offset within that paragraph, or something more along those lines, like document.position(paragraph, offset=42) or perhaps paragraph.position(offset=42).
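That said, the other half of the question (the font and size of each word) is reachable today through runs. Here is a minimal sketch using only documented python-docx attributes; note that run.font.name and run.font.size come back as None whenever the value is inherited from a style rather than set directly on the run:

from docx import Document

main_file = Document("/tmp/file.docx")
for paragraph in main_file.paragraphs:
    for run in paragraph.runs:
        # None means the value is inherited from the paragraph or
        # document style rather than set on the run itself.
        size_pt = run.font.size.pt if run.font.size is not None else None
        for word in run.text.split():
            print(word, run.font.name, size_pt)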
I'm making a multi-weight font from a Thin weight and a Heavy weight. The glyphs that were correctly interpolated look good, but the ones that weren't look jumbled and terrible. (I know it looks like Verdana, don't remind me)
I will provide the two fonts as raw .sfd files, and as .otf exports. Could you help me look into this bug?
Check the number of points, the position of the first point (you can set the first point with CTRL-1), and the direction of the path.
Interpolation becomes especially tricky when there are multiple paths (e.g. any character with an enclosed counter such as o or e) or when it contains multiple references (letters with diacritics, for instance). You need to match up not just the points in each path, but also the order of the different paths and the order of different references. You can reorder paths and references by cutting and re-pasting them; this will move them to the top of the stack.
Unfortunately, path and reference ordering are not displayed. You can number your points (View>Number Points>SVG) which helps somewhat with ordering paths, though lines vs curves get numbered differently, so numbering won't always match exactly even between glyphs that interpolate just fine; also, this numbering lasts only as long as that glyph window is open; and none of this tells you anything about the ordering of references. It's a pain.
I usually just start cutting and re-pasting and use a process of elimination until I get it right.
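You can also script part of this check instead of eyeballing it. The sketch below uses FontForge's Python module; the master file names are placeholders, and it only compares contour counts, per-contour point counts, and directions, so start points and reference ordering still have to be checked by hand.

import fontforge

thin = fontforge.open("Thin.sfd")     # placeholder master paths
heavy = fontforge.open("Heavy.sfd")

for glyph in thin.glyphs():
    name = glyph.glyphname
    if name not in heavy:
        continue
    a = thin[name].foreground
    b = heavy[name].foreground
    if len(a) != len(b):
        print(name, "contour count differs:", len(a), "vs", len(b))
        continue
    for i in range(len(a)):
        if len(a[i]) != len(b[i]):
            print(name, "contour", i, "point count differs:", len(a[i]), "vs", len(b[i]))
        if a[i].isClockwise() != b[i].isClockwise():
            print(name, "contour", i, "direction differs")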
Make your contours compatible, i.e. the same number of points and the same start points across masters.
I fixed the problem by pasting each path into a new glyph, keeping the corresponding paths in the same order.
rgb(255,255,255) notation has been available since CSS1. But #ffffff seems to be vastly more popular.
Obviously it's slightly more compact. I know that hex is more closely related to the underlying bytes, and understand that there would be advantages in carrying out arithmetic on those values, but this isn't something you're going to do with CSS.
Colour values tend to be originated by designers (such as myself) who would never encounter hex notation anywhere else, and who are much more familiar with the decimal notation, which is the main way of specifying colour in the apps they use. In fact, I have met quite a few who don't realise how a given hex value breaks down into RGB components and assumed it didn't directly relate to the colour at all, like a Pantone colour system reference (eg PMS432).
So, any reason not to use decimal?
Hex values are easier to copy and paste from your favourite image editor.
RGB values are easier to manipulate with Javascript.
(My favourite Hex colour value is #EDEDED and a site we made for a client involved in motorsport had a background colour of #F1F1F1 :-)
Ed.
It's worth noting that if you want to input an RGBA value, hex notation is not supported; i.e., you can't fake it with #FFFFFFff. As a matter of fact, the alpha value must be a number between 0.0 and 1.0, inclusive. (Check out this page for browser support -- as always, IE is leading the pack here. ;) )
HSL and HSLA color support -- which is very designer friendly -- is also provided with a similar syntax to the RGB() style. If a designer were to use both types of color values in the same stylesheet, they might opt for decimal values over hex codes for consistency.
I think it's what you're used to. If you're used to HTML, you'll probably use HEX since it's just been used a lot in HTML. If you're from a design background, using Photoshop/Corel/PaintShopPro etc., then you're likely used to the RGB notation - though, a lot of programs these days incorporate a HEX value field too.
As said, RGBA might be a reason to just go with the RGB notation - consistency.
Though, I think it also depends on the scenario. If you're comfortable with both, you might just switch between them: #fff is a lot easier to type than rgb(255,255,255).
Another question is why people will say #fff instead of White (assuming most browsers support this keyword).
It's all a matter of preference and legibility - if you're maintaining a huge CSS file, being able to look at the colour value and know what colour it is, is a really good advantage. Even more advantageous is using something like LESS or Sass to add a kind of programmability to CSS - allowing constants for example. So instead of saying:
#title { color: #abcdef; }
You might instead do the following with LESS:
@base-color: #abcdef;
#title { color: @base-color; }
Maintaining the CSS becomes less of an issue.
If you're worried about the performance of the browser rendering its result, then that could also be a factor in your choice.
So in summary, it boils down to:
Familiarity
Preference
Maintainability
Performance
The main reason is probably compactness, as you mentioned. #ffffff can even be further shortened to the #fff shorthand notation.
Another possible reason is that there's a perceived performance increase by saving the browser the trouble of converting the rgb notation.
Traditionally HTML has always used hex colours, so that has carried forward into CSS. Think <font color="#ffffff">
I always used hex, but today I prefer to set my values as:
rgb(82, 110, 188)
in my css files, so whenever I want to add opacity I just need to rename rgb to rgba and add the opacity value. The advantage is that I don't have to convert the hex value to rgb before being able to add the opacity:
rgba(82, 110, 188, 0.5)
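Both notations encode the same three bytes, so converting between them is purely mechanical. A tiny Python illustration (the helper name is made up; it is only here to show the correspondence, e.g. that rgb(82, 110, 188) is #526EBC):

def hex_to_rgb(hex_color):
    """Convert '#526EBC' (or shorthand '#fff') to an rgb() string."""
    h = hex_color.lstrip("#")
    if len(h) == 3:  # expand shorthand like #fff
        h = "".join(c * 2 for c in h)
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return "rgb({}, {}, {})".format(r, g, b)

print(hex_to_rgb("#526EBC"))  # rgb(82, 110, 188)
print(hex_to_rgb("#fff"))     # rgb(255, 255, 255)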
CSS was invented by software developers, not designers. Software developers live and breathe hex. From my old C64 days, I can still read most hex numbers without thinking. A9, anyone?
Various things will accept a single hex value where they may have different ways of entering three decimal values. There's also the fact that it's always 6 characters (or 3, admittedly - plus the #) which makes it easier to scan down a list of them.
Just a couple of random thoughts to add to the mix...
Probably a touch of speed when the color is interpreted by a browser. Otherwise, some people from a design background may prefer to compose colors from RGB components when they write code, while others from a programming background are probably more inclined to use HEX values.
HEX is most common due to historical reasons.
Before CSS was common in web development, colors were specified within HTML tags and the most commonly used and supported way to specify a color was to use HEX values.
No valid reason, other than personal preference.
Maybe I've done HTML too long, but I find it easier to think in HEX values. A lot of the pre-defined colour palette for HTML maps neatly to HEX values. Using the shortened format also gives you automatic 'web-safe' colours, though this is not really an issue in the days of 32bit colour displays.
I have always preferred HEX in comparison to RGB or HSL simply due to it being easier for me to read while styling.
When it comes to dynamic editing, I would like to use RGB because of the ease of cycling through different colors. This also helps a little when doing CSS gradients.
I'm using GDI+ in C++. (This issue might exist in C# too).
I notice that whenever I call Graphics::MeasureString() or Graphics::DrawString(), the string is padded with blank space on the left and right.
For example, if I am using a Courier font (not italic!) and I measure "P", I get 90, but "PP" gives me 150. I would expect a monospace font to give exactly double the width for "PP".
My question is: is this intended or documented behaviour, and how do I disable this?
RectF Rect(0, 0, 32767, 32767);
RectF Bounds1, Bounds2;
// Measure one character of the string, then both characters.
graphics->MeasureString(L"PP", 1, font, Rect, &Bounds1);
graphics->MeasureString(L"PP", 2, font, Rect, &Bounds2);
// Twice the one-character width minus the two-character width
// leaves only the padding that was added to each measurement.
margin = Bounds1.Width * 2 - Bounds2.Width;
It's by design: that method doesn't use the actual glyphs to measure the width, and so adds a little padding to allow for overhangs.
MSDN suggests using a different method if you need more accuracy:
To obtain metrics suitable for adjacent strings in layout (for example, when implementing formatted text), use the MeasureCharacterRanges method or one of the MeasureString methods that takes a StringFormat, and pass GenericTypographic. Also, ensure the TextRenderingHint for the Graphics is AntiAlias.
It's true that this is by design; however, the link in the accepted answer is actually not perfect. The issue is the use of floats in all those methods, when what you really want to be using is pixels (ints).
The TextRenderer class is meant for this purpose and works with the true sizes. See this link from MSDN for a walkthrough of using it.
Appending StringFormat::GenericTypographic() will fix your issue:
graphics->MeasureString(L"PP", 2, font, Rect, StringFormat::GenericTypographic(), &Bounds2);
Pass the same StringFormat to DrawString.
Sounds like it might also be connected to hinting, based on this KB article, "Why text appears different when drawn with GDIPlus versus GDI".
TextRenderer was great for getting the size of the font. But in the drawing loop, using TextRenderer.DrawText was excruciatingly slow compared to graphics.DrawString().
Since the width of a string is the problem, you're much better off using a combination of TextRenderer.MeasureText and graphics.DrawString.