python-docx style error with 'List Paragraph'

I am using python-docx to convert Word docx files into a proprietary XML format.
I'm having trouble with bullet/enumerated lists.
In a number of Word documents when I open them with python-docx and look at the paragraph style of the bullet/enumerated lists, some of the items in the list will be 'List Paragraph' but many of them will be 'Normal'.
Assuming they should all be 'List Paragraph', is there a way I can verify if this is an issue with the Word document or with the python-docx package?
Also, is there a way to identify these bullets/numbers when the paragraph style isn't what it should be?
E.g., using paragraph_format?

A bullet point can appear on a Word paragraph in at least two different ways:
The user applies a paragraph style, like "List Paragraph"
The user applies a bullet directly to the paragraph, probably using the bullet button on the toolbar.
I suspect users tend to fall into one of these two habits. Using styles consistently allows you to adjust the formatting of all those paragraphs just by modifying the style. But I suspect 98%+ of users cultivate the "click the bullet button" habit.
In any case, it's not surprising to find a document that's a mixed bag that way.
Unfortunately, python-docx doesn't currently have support for directly applied bullets, either for applying them or detecting them.
If you have the skills to inspect the XML of the paragraph (print(paragraph._p.xml) is a start), then you can probably use an XPath expression on paragraph._p (the XML element underlying the paragraph) to detect whether it has a <w:numPr> element inside its <w:pPr>, which indicates directly applied numbering. In WordprocessingML a bullet is just a numbering format, so directly applied bullets show up there too. Inspecting the XML of a paragraph known to have a directly applied bullet should give you the details of what you'd be looking for there.
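In WordprocessingML, a directly applied bullet or number is stored as a <w:numPr> element (holding a w:numId and a w:ilvl) inside the paragraph's <w:pPr>, while the paragraph style is a separate <w:pStyle>. Here is a minimal sketch, using only Python's standard library, that reads those two things from the XML string you get from paragraph._p.xml. The helper names paragraph_style_id, direct_numbering and SAMPLE are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Main WordprocessingML namespace, bound to the "w:" prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
W_VAL = f"{{{W_NS}}}val"  # the namespaced w:val attribute


def paragraph_style_id(p_xml: str) -> Optional[str]:
    """Return the w:pStyle id (e.g. 'ListParagraph'), or None if no style is set."""
    p = ET.fromstring(p_xml)
    p_style = p.find("w:pPr/w:pStyle", NS)
    return p_style.get(W_VAL) if p_style is not None else None


def direct_numbering(p_xml: str) -> Optional[Tuple[str, str]]:
    """Return (numId, ilvl) if the paragraph has directly applied numbering, else None.

    Bullets are a numbering format in Word, so bulleted paragraphs show up here too.
    A numId of "0" means numbering was explicitly removed, so it is treated as None.
    """
    p = ET.fromstring(p_xml)
    num_pr = p.find("w:pPr/w:numPr", NS)
    if num_pr is None:
        return None
    num_id = num_pr.find("w:numId", NS)
    ilvl = num_pr.find("w:ilvl", NS)
    num_id_val = num_id.get(W_VAL) if num_id is not None else None
    if num_id_val is None or num_id_val == "0":
        return None
    # A missing w:ilvl is treated as the top list level.
    ilvl_val = ilvl.get(W_VAL, "0") if ilvl is not None else "0"
    return num_id_val, ilvl_val


# A paragraph with a directly applied bullet but no paragraph style ("Normal").
SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="3"/></w:numPr></w:pPr>'
    "<w:r><w:t>Bulleted, but still in the Normal style</w:t></w:r>"
    "</w:p>"
)

if __name__ == "__main__":
    print(paragraph_style_id(SAMPLE), direct_numbering(SAMPLE))
```

With python-docx you would pass paragraph._p.xml for each paragraph in Document("...").paragraphs and compare the result with paragraph.style.name. I believe the same query can also be run directly as paragraph._p.xpath("./w:pPr/w:numPr"), since python-docx's element classes register the w: prefix. Note that numbering inherited from a style such as "List Bullet" lives in styles.xml, not on the paragraph, so this check won't see it.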

Related

CSS selector with parentheses and some sort of index in style inspector

When looking at the elements of a page that I am analyzing using Chrome DevTools, I am seeing the following weird display:
What is that "(1)" in the end? Since the source is user agent stylesheet, I can't drill down any further.
In the Elements panel, I see similar weirdness:
I thought parentheses were not permitted in CSS selector names. What is the "primaryNavId:(primaryLi1)" being used above?
UPDATE:
A more detailed screencap of the "inherited from" line (Styles pane):
When I click on the "inherited from" line, I get the following in the Styles pane:
UPDATE 2 - FIREFOX INSPECT:
Firefox displays the same information in the Elements pane for the item in question, but the Styles panel shows it differently, as follows:
What a mess. Now I understand why you tagged your original question (and this one) css-selectors.
To start, browser developer tools naïvely assume that classes and IDs don't contain any special CSS selector characters and can therefore be represented in CSS selector notation without any escape sequences. So what you get is something that looks like a selector, but on closer inspection is actually malformed. You can see evidence of this in pretty much every browser's developer tools. The breadcrumb navigation, for example, in every one of them will list the li element as li followed by a period (for a class selector) followed by the class name without any escape sequences. Evidently the same appears to hold true for IDs.
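To see what a properly escaped selector for such an ID or class would look like, here is a minimal Python sketch of the CSSOM "serialize an identifier" rules, which browsers expose to JavaScript as CSS.escape(). It is a port for illustration only, not code the developer tools themselves run; the function name css_escape is hypothetical.

```python
def css_escape(ident: str) -> str:
    """Escape a class name or ID so it can be used in a CSS selector.

    Follows the CSSOM "serialize an identifier" rules behind CSS.escape().
    """
    out = []
    for i, ch in enumerate(ident):
        cp = ord(ch)
        if cp == 0:
            out.append("\ufffd")  # NULL becomes U+FFFD REPLACEMENT CHARACTER
        elif 0x01 <= cp <= 0x1F or cp == 0x7F:
            out.append(f"\\{cp:x} ")  # control chars: hex escape plus a terminating space
        elif "0" <= ch <= "9" and (i == 0 or (i == 1 and ident[0] == "-")):
            out.append(f"\\{cp:x} ")  # a leading digit (or "-" then a digit) needs a hex escape
        elif i == 0 and ch == "-" and len(ident) == 1:
            out.append("\\-")  # a lone "-" is not a valid identifier
        elif cp >= 0x80 or ch in "-_" or (ch.isascii() and ch.isalnum()):
            out.append(ch)  # safe as-is
        else:
            out.append("\\" + ch)  # other ASCII punctuation gets a backslash
    return "".join(out)


if __name__ == "__main__":
    print("#" + css_escape("primaryNavId:(primaryLi1)"))
```

For the ID from the question this yields #primaryNavId\:\(primaryLi1\), which is a valid selector; the unescaped form the dev tools display is not.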
It would seem that Google Chrome uses this same notation for "Inherited from" labels. Firefox is smart enough to only list the element's element type (which is far more predictable), i.e. "Inherited from li", and display the actual style rule and the actual selector from the source CSS, but its breadcrumb navigation suffers from the same problem making it kind of moot.
What you're looking at in the element inspector, however, is not a selector. It's an HTML class attribute. The syntactic rules are completely different. And that's why I said that this answer of mine that you previously linked to was completely irrelevant to your original question. But I can see why you're confused, seeing as HTML and CSS are closely related and CSS has dedicated class and ID selectors. (I suspect there wouldn't be any confusion if we were forced to use quoted attribute selectors for all attributes from the beginning — but attribute selectors weren't even around until CSS2.)
As to why the class name that's reflected in the Styles pane is different from the one that's reflected in the element inspector, the reason for that is not clear. Either you're inspecting different elements altogether, or something else is at play (and judging by the cryptic-looking class names, it may well be some funky client-side scripting framework voodoo magic).

Why does Notepad++ show some (valid?) CSS in black?

When I write CSS in Notepad++, the color coding sometimes seems inconsistent. Normally, selectors are shown in light purple, but sometimes they are black for one or more consecutive lines. I don't see anything wrong with such lines. Why are they black? What am I missing here?
I'm not sure why that happened to you.
But you can add keywords to Notepad++:
Settings => Style Configurator...
Select your language and style.
Add your keywords (like color, etc.), separated by spaces:
Usually, that sort of coloring indicates that the CSS rule immediately preceding the affected one hasn't been closed. Here's an example where I remove the closing brace from a rule in normalize.css, which affects the one that immediately follows in exactly the same way (ignoring the comment and the lack of bold type, of course):
Presumably then, the reason why the "first" declaration after that selector is affected but the subsequent ones are not is because the semicolon from the first declaration tells the syntax coloring parser to terminate the nonsensical statement which is formed by the selector. But I'm just blindly guessing.
If you're sure that the preceding rule has been closed properly, then the syntax coloring parser may have been confused. Try simply highlighting the rule, deleting it, and undoing; that usually works for me.
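If you'd rather check a stylesheet for the unclosed-rule case than eyeball it, here is a rough Python sketch that reports the line of every "{" that is never closed and every stray "}". It skips /* */ comments and quoted strings so braces inside them are ignored. It is an assumption-based helper (the name unbalanced_braces is hypothetical), not Notepad++'s actual CSS lexer.

```python
from typing import List


def unbalanced_braces(css: str) -> List[int]:
    """Return sorted 1-based line numbers of unclosed '{' and of stray '}'."""
    open_lines: List[int] = []  # line of each '{' still waiting for its '}'
    stray: List[int] = []       # lines with a '}' that has no matching '{'
    i, line, n = 0, 1, len(css)
    while i < n:
        ch = css[i]
        if css.startswith("/*", i):
            # Skip a comment; an unterminated one runs to end of file.
            end = css.find("*/", i + 2)
            end = n if end == -1 else end + 2
        elif ch in "\"'":
            # Skip a quoted string, stepping over backslash escapes.
            end = i + 1
            while end < n and css[end] != ch:
                end += 2 if css[end] == "\\" else 1
            end = min(end + 1, n)  # include the closing quote
        else:
            if ch == "{":
                open_lines.append(line)
            elif ch == "}":
                if open_lines:
                    open_lines.pop()
                else:
                    stray.append(line)
            elif ch == "\n":
                line += 1
            i += 1
            continue
        line += css.count("\n", i, end)  # keep the line count right across skipped spans
        i = end
    return sorted(open_lines + stray)


if __name__ == "__main__":
    sample = "a { color: red;\nb { color: blue; }\n"
    print(unbalanced_braces(sample))
```

In the sample, the rule for a on line 1 is never closed, so the rule for b ends up nested inside it, which is the kind of situation that throws off the syntax coloring.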
Since Notepad++ applies syntax coloring based on the file's language setting, you can't get coloring for multiple languages in the same file. Even though CSS is part of web design, it is still a separate language. If you want the CSS inside an HTML file to be colored, temporarily change the language to CSS. But don't forget to switch the language back to HTML before saving the file.
Steps: Language -> C -> CSS

True or not: We should always use proper capitalization and never put whole sentences in all-uppercase

True or not: We should always use proper capitalization and never put whole sentences in all-uppercase. If we must do so, we should use CSS for this task.
Should we use the CSS property text-transform for other cases if we need them?
(Note that I'm not talking about HTML tags, I’m talking about text content)
Links to read:
http://blog.mauveweb.co.uk/2009/01/14/dont-use-uppercase-in-html/
http://www.webaim.org/techniques/fonts/#caps
Huh? For normal text? That sounds like a ridiculous idea to me. Every language has its rules about what's lowercase and what's uppercase. Why would one want to deviate from that?
Update: Sorry Jitendra, I didn't read your update closely. Now this:
"I heard screen readers spell letter by letter if we use UPPERCASE."
could well be true - say, for USA to be spelled out as U S A. I could imagine some screen readers do this. But this would only mean not to put words in ALL CAPS - which is a rule you would want to follow anyway.
If you kept all text in lowercase and uppercased the right words through text-transform, you would have to put a CSS class on every word that needs to be capitalized - extremely cumbersome, it would result in horrible HTML soup, and it wouldn't make sense. Just use normal capitalization, and don't use all caps.
You should write the content of a page with proper grammar, spelling, and capitalization just as you would in an essay. Navigation and logos should start with an uppercase letter (or if it's a name, the proper spelling for the name, e.g. iPhone, not Iphone or IPhone). Only use CSS capitalization for stylization. So, if you want your site's name to be in all caps (MY WEBSITE), use CSS to make it all caps, but in the HTML make sure it's proper (My Website).
Hope this helps!
It's generally a good idea to concentrate on what's easy for people to read. Almost always, for almost all sorts of information presentation, conventional typographic rules for the language of the site are appropriate, and you should not do anything different without having a really good reason.
The W3C states that all XHTML elements and attribute names should be in lowercase:
"XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive e.g. <li> and <LI> are different tags."
As for web page content in between tags, of course it is not necessary.
JAWS does not spell out words if it recognizes them as English words. For example, "THIS IS PRONOUNCED NORMALLY." sounds the same as "This is pronounced normally." When dealing with abbreviations, capitalization matters. For example, "usa" is pronounced phonetically as one syllable, while "USA" is pronounced as "U S A". Made-up words tend to be pronounced the same regardless of capitalization; for example, "FDIOSUF" is pronounced the same as "Fdiosuf".
I'm not talking about HTML tags; I'm talking about text content. I heard screen readers spell letter by letter if we use UPPERCASE.
My question was "Should we always use lowercase text in web page content?" and use CSS text-transform for other cases if we need to.
Just use natural text, as you did in your SO question. Screen readers will generally read ALL UPPERCASE as individual letters, as such text is generally an acronym (it'll likely vary from reader to reader - some handle things more intelligently than others, and may be able to figure out that a whole sentence isn't likely to be an acronym).
You don't have to lowercase every letter, though - a screen reader shouldn't have any problem with "This Is A Sentence."
UPPERCASE text that isn't an acronym should be done with CSS's text-transform: uppercase;.
It has nothing to do with screen readers. For actual content, you should use normal capitalization. For element names and attributes, you must use lower case if you're using XHTML, because it's case-sensitive and the spec says the tag names and attribute names are lower case. These are two completely different things (content vs. markup).
Edit Re your edited question: You should avoid incorrect use of ALL UPPER CASE TEXT (that would be an example of incorrect use), because screen readers may well spell that out on the theory that it's an acronym like HTML or W3C. But not doing ALL CAPS is not the same as doing all lower case. Use initial capitals at the beginnings of sentences, etc. Don't use ALL CAPS for emphasis, use <em> (or <strong>, depending on the type of emphasis). Doing so marks up your text semantically, which actually helps the screen reader do its job (by allowing it to put emphasis where it should be put).
Yes, you should. If you would like to change the case of the letters, use the CSS property text-transform: http://www.quackit.com/css/properties/css_text-transform.cfm

How to markup scientific names in XHTML?

I would like to know the best way to mark up scientific names in XHTML. This answer says I should use the <i> element, which I'm not too in favour of, unless of course, it is the right element to use.
I could use <span class="scientific">, but seeing as one of the first uses of HTML was to mark up scientific documents, I'd imagine there'd be a better semantic hook for this sort of thing.
Also, is there any element to mark up the common name of a scientific name?
Note: It looks like Wikipedia, or at least this article, is using <i> for scientific names.
Edit: Would the <dfn> tag be appropriate?
dfn is for the defining instance of a term, i.e. it marks the term being defined, not scientific names in general:
<p>The prefix <dfn>cardio-</dfn> means "of the heart".</p>
As far as I can see in the list of HTML 4 elements nothing specifically fits the bill. This leaves you with a few options:
<span class="scientific">cardio</span>
The semantics are added by the class, and so this is probably the most correct way, technically. However, it does have the downside that without your CSS, it won't appear different in any way to the surrounding text. Another option might be this: /me prepares to duck for cover
<i class="scientific">cardio</i>
Now before I get my head bitten off for using the verboten element, <i>, consider that it is no less descriptive than using <span>, and even if a stylesheet were missing, you'd still get vaguely the correct formatting. Just make sure you add the class attribute.
In (X)HTML5, the i element should be used:
[…] such as a taxonomic designation, a technical term, an idiomatic phrase or short span of transliterated prose from another language, a thought, or a ship name in Western texts.
I guess "taxonomic designation" matches your case.

Use an image name held inside an element - XML/CSS

I am making a CSS file for an XML document. I need to insert an image - the filename of which is contained inside an element.
This is part of the xml code I am referring to:
<description>
<text> Club Praia is divided and linked by a foot bridge over the main street which runs
through the heart of Praia Da Rocha.</text>
<image>pria.jpg</image>
</description>
Ordinarily, I would use text:after{ content: url(filename.jpg) }, which would work if it was the same one image being used each time. Instead, I need to insert the specific image name (in this one example pria.jpg) held inside the image element tags, into the generated content code.
Is there a way I can do this? (Hope my explanation made sense!!)
This will probably be possible with CSS3 named strings. However, this spec is not even halfway finished, so I'd recommend you go with XSLT.
You can definitely do this using XSLT, but if there is a pure CSS solution, I would love to see it too.
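XSLT is the natural tool for this. As a stand-in that runs with only the Python standard library (a swapped-in technique, not XSLT itself), this sketch performs the same transformation: it copies the text into a paragraph and turns the content of <image> into the src of an HTML <img>. The element names come from the question; the output markup (div, p, img and the "description" class) and the function name description_to_html are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def description_to_html(xml_text: str) -> str:
    """Turn a <description> element into HTML, using <image>'s text as an <img> src."""
    desc = ET.fromstring(xml_text)
    div = ET.Element("div", {"class": "description"})
    for child in desc:
        # Collapse the line breaks and extra spaces inside the element's text.
        content = " ".join((child.text or "").split())
        if child.tag == "text":
            ET.SubElement(div, "p").text = content
        elif child.tag == "image":
            # The filename held inside <image> becomes the src attribute.
            ET.SubElement(div, "img", {"src": content, "alt": ""})
    return ET.tostring(div, encoding="unicode")


SAMPLE = """<description>
<text> Club Praia is divided and linked by a foot bridge over the main street which runs
through the heart of Praia Da Rocha.</text>
<image>pria.jpg</image>
</description>"""

if __name__ == "__main__":
    print(description_to_html(SAMPLE))
```

The point of the design is that the filename is data inside the element, so it has to be moved into an attribute (or a url() value) by a transformation step before rendering; CSS alone has no way to read an element's text content into url().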
