Preventing automatic line break in xml on Atom editor? - atom-editor

Hello everyone.
I am new to Atom and am using it to view XML files. (I haven't installed any additional packages yet. Version 1.19.4.)
One of my XML files has elements with many attributes. For example:
<book id="test_xml">
<class name="First_row" attrib_01="Grape" attrib_02="Apple" attrib_03="banana" attrib_04="Water melon" attrib_05="Orange" ... (and so on )
</book>
Every class element has at least 50 attributes.
The first time I opened this XML file in Atom, it showed every class on a single line. (This is what I want.) But when I edited an attribute value ("Melon" to "Apple"), Atom suddenly broke the line, turning one line into several, as shown below.
<book id="Fruit">
<class name="First_row" attrib_01="Grape" attrib_02="Apple"
attrib_03="banana" attrib_04="Water melon"
attrib_05="Orange" ... (and so on )
</book>
Without changing the XML format, how can I prevent Atom from splitting a single line into multiple lines?
Thank you.

Related

Viewing MS Word .docx files in Midnight Commander

I want to be able to quickly view (with F3) the content of Word doc/docx files in Midnight Commander. MC's extensions file calls /usr/lib/mc/ext.d/doc.sh, which uses wv, antiword, catdoc, and word2x as helper programs. On my system (Debian), the first three are available, but none of them can handle the newer docx format.
The obvious solution is to use LibreOffice:
libreoffice --headless --convert-to "txt:Text (encoded):UTF8" filename.docx
This works well, but how do I tell MC to use it and display the result of the conversion? If I put this in ~/.config/mc/mc.ext, replacing the lines
View=%view{ascii} /usr/lib/mc/ext.d/doc.sh view msdoc
with
View=libreoffice --headless --convert-to "txt:Text (encoded):UTF8" "${MC_EXT_FILENAME}"
then I end up with a filename.txt file in the current directory, and nothing is displayed. What I want to happen is for mc to do the conversion when I press F3 and discard it when I quit the viewer. (I guess the converted file would be written to /tmp/ and removed on quit.)
Bonus: it would be nice if the displayed file were word-wrapped; I suppose that could be done by using the wrap command?
Can I do this in my ~/.config/mc/mc.ext, without having to modify /usr/lib/mc/ext.d/doc.sh?
I use docx2txt:
View=%view{ascii} docx2txt %f -
Also, you don't need such a long conversion string with LibreOffice.
libreoffice --cat %f
is enough.

Avoid rendering of specific .md files from blogdown::serve_site()

I have a file located at
content/post/data_for_posts/my_file.md
I have it there because it's quite easy to do htmltools::includeMarkdown("data_for_posts/my_file.md") and recycle this file in different posts.
My problem is that serve_site() creates public/post/data_for_posts/index.html, which means the file gets published to my website (dated January 1, 0001). I guess I could change the date to the year 10000, but I would rather handle it the way I handle the .Rmd and other files, as suggested here.
I have tried to modify my config.toml but have not managed to solve the issue.
ignoreFiles = ["\\.Rmd$", "\\.Rmarkdown$", "_files$", "_cache$", "content/post/data_for_posts/my_file.md"]
Here are a couple of techniques that I use to do this:
Rename data_for_posts/my_file.md so it uses a file extension that hugo does not interpret as a known markup language, for example change .md to .markd or .mdn.[*]
Rename data_for_posts/my_file.md so it includes a string that you will never use in a real content file, for example data_for_posts-UNPUBLISHED/my_file.md. Then add that string (UNPUBLISHED or whatever) to your config ignoreFiles list.[**]
[*] In the content/ directory, a file with one of the following file extensions will be interpreted by hugo as containing a known markup language: .ad, .adoc, .asciidoc, .htm, .html, .markdown, .md, .mdown, .mmark, .pdc, .pandoc, .org, or .rst (this is an excerpt of something I wrote).
[**] The strings listed in ignoreFiles seem to be case-sensitive, so I like to use all-upper-case characters in my ignored file names (because I never use upper-case characters in real content file names). Also note that there is no need to specify the path; in my experience, path delimiters (/ or \) cause problems.

Why does my query report "Steps within a path expression must yield nodes"?

I'm relatively new to XQuery and I'm using an XML file with the following format (MODS XML):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/mods/v3
http://www.loc.gov/standards/mods/v3/mods-3-0.xsd">
<mods ID="ISI:000330282600027" version="3.0">
<titleInfo>
<title>{Minimum Relative Entropy for Quantum Estimation: Feasibility and General
Solution}</title>
</titleInfo>
I'm trying to retrieve the titles of all the articles contained in the XML file. The expression I'm using is the following:
for $x in collection("ExemploBibtex")/"quantuminformation.xml"/modsCollection/mods/titleInfo/title
return <title>$x/text()</title>
When I try to run this expression in BaseX, I get the following error:
"[XPTY0019] Steps within a path expression must yield nodes; xs:string found."
Can anybody tell me what's wrong? The result I was expecting was a list with all the titles in the document.
Okay, problem solved in the BaseX Mailing List :D
I needed to declare the namespace. So now I'm using:
declare namespace v3 ="http://www.loc.gov/mods/v3";
for $doc in collection('ExemploBibtex')
where matches(document-uri($doc), 'quantuminformation.xml')
return $doc/v3:modsCollection/v3:mods/v3:titleInfo/v3:title/text()
And it works.
The problem is here:
collection("ExemploBibtex")/"quantuminformation.xml"/modsCollection
This returns a string with content quantuminformation.xml for each file/root node in the ExemploBibtex collection, and then tries to perform an axis step on each of these strings -- which is not allowed.
It seems you want to access the document quantuminformation.xml within the collection ExemploBibtex. To open a specific file of a collection, use the following syntax instead:
collection("ExemploBibtex/quantuminformation.xml")/modsCollection
I cut off the last axis steps for readability and to keep the code lines short; simply add them again, they're fine.
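The namespace point from the first answer is not specific to XQuery. As a rough illustration, the Python sketch below uses the standard-library ElementTree on a cut-down copy of the MODS sample (the closing tags and the prefix name v3 are added here for the example): an unqualified path matches nothing, while a namespace-qualified path finds the title.

```python
import xml.etree.ElementTree as ET

# Cut-down, closed version of the MODS sample; the default namespace
# on the root element applies to every child element as well.
MODS_SAMPLE = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="ISI:000330282600027" version="3.0">
    <titleInfo>
      <title>{Minimum Relative Entropy for Quantum Estimation: Feasibility and General Solution}</title>
    </titleInfo>
  </mods>
</modsCollection>"""

# Prefix-to-URI mapping used only by the query; the prefix name is arbitrary.
NS = {"v3": "http://www.loc.gov/mods/v3"}

root = ET.fromstring(MODS_SAMPLE)

# Without a namespace, the steps look for elements in no namespace.
unqualified = root.findall("mods/titleInfo/title")

# With the namespace bound to a prefix, the same steps match.
qualified = root.findall("v3:mods/v3:titleInfo/v3:title", NS)

print(len(unqualified))
for title in qualified:
    print(title.text)
```

The same reasoning applies to the collection("ExemploBibtex/quantuminformation.xml") form: the remaining steps still have to name the elements in the http://www.loc.gov/mods/v3 namespace.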

How to get alt text from images in a docx file using Open XML and C#

I am creating a web form that will do a 508 compliance check on Word documents. I am looking through MSDN and other sites for how to get the information I need from a file the user selects. The one thing I can't find is how to find images and check whether they have alternative text. Any help would be greatly appreciated!
Images inserted into 2007+ Word documents are Drawing objects. So you can traverse the XML for w:drawing members.
http://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.drawing.aspx
The w:drawing member will have a child called wp:inline, which corresponds to the Inline class.
http://msdn.microsoft.com/en-us/library/documentformat.openxml.drawing.wordprocessing.inline.aspx
The wp:inline member will have a child called wp:docPr.
http://msdn.microsoft.com/en-us/library/documentformat.openxml.drawing.wordprocessing.docproperties.aspx
The wp:docPr element may have an attribute called title, which holds the alternative text title, and an attribute called descr, which holds the alternative text itself.
Example XML:
<w:drawing xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<wp:inline distT="0" distB="0" distL="0" distR="0" wp14:anchorId="357A850A" wp14:editId="384E9053" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
<wp:extent cx="5943600" cy="4457700" />
<wp:effectExtent l="0" t="0" r="0" b="0" />
<wp:docPr id="1" name="Picture 1" descr="ALL TEXT HERE" title="ALT TEXT TITLE HERE"/>
...
I highly recommend you use the OpenXML Productivity Tool that comes with the OpenXML SDK.
You can do the same thing slightly more easily with unzip and a copy of lxprintf (part of the LTXML2 toolkit), by unzipping the slides in a loop and running lxprintf on each one to locate the wp:docPr element and output the values of #descr and #title, e.g.
# Quote the grep pattern and the #descr/#title arguments so the shell
# neither expands the pattern nor treats the arguments as comments.
for f in $(unzip -l demo.pptx | grep 'ppt/slides/slide.*\.xml' | awk '{print $NF}'); do
unzip -p demo.pptx "$f" |
lxprintf -e 'w:drawing/wp:inline/wp:docPr' "%s, %s\n" '#descr' '#title' -
done
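For a .docx file, the lookup described in the first answer can also be sketched without the SDK. The Python sketch below uses only the standard library (it is not the Open XML SDK, and the function name list_image_alt_text is hypothetical): it opens the .docx as a zip archive, reads word/document.xml, and reports every wp:docPr element with its descr and title, so that drawings without alternative text can be flagged. It assumes the images of interest live in the main document part.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the wp: prefix in the example XML above.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def list_image_alt_text(source):
    """Return (name, descr, title) for every wp:docPr in word/document.xml.

    `source` is a path or a binary file-like object. descr and title are
    None when the attribute is absent, i.e. no alternative text was set.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    results = []
    # iter() searches the whole tree, so docPr elements are found
    # wherever they are nested under a drawing.
    for doc_pr in root.iter(f"{{{WP_NS}}}docPr"):
        results.append(
            (doc_pr.get("name"), doc_pr.get("descr"), doc_pr.get("title"))
        )
    return results
```

A compliance check could then treat any entry whose descr is None or empty as a failure. The same attribute test carries over to C# once the wp:docPr elements have been located.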

XQuery: weird xsi attribute being inserted into my XQuery output

Here is an example of the XQuery output that I get:
<clinic>
<Name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Healthy Kids Pediatrics</Name>
<Address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">510 W 27th St, Los Angeles, CA 90007</Address>
<PhoneNumberList>213-555-5845</PhoneNumberList>
<NumberOfPatientGroups>2</NumberOfPatientGroups>
</clinic>
As you can see, strange xmlns:xsi attributes are being added to the <Name> and <Address> tags.
The funny thing is that if I go to the top of my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="vaccination.xsl"?>
<Vaccination xsi:noNamespaceSchemaLocation="vaccination.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
and remove the declaration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
then my XQuery output looks like this (which is what I want):
<clinic>
<Name>Healthy Kids Pediatrics</Name>
<Address>510 W 27th St, Los Angeles, CA 90007</Address>
<PhoneNumberList>213-555-5845</PhoneNumberList>
<NumberOfPatientGroups>2</NumberOfPatientGroups>
</clinic>
But when I view my XML in my browser, it gives an error like this:
XML Parsing Error: prefix not bound to a namespace
Location: file:///C:/Users/Pac/Desktop/csci585-hw3/vaccination.xml
Line Number 3, Column 1:<Vaccination xsi:noNamespaceSchemaLocation="vaccination.xsd">
^
Does anyone have an idea of how to remove those xsi declarations from my XQuery output without breaking my XML/XSL?
Removing the namespace declaration from the top node means the XML document is no longer namespace-well-formed, as the xsi prefix is used but not declared. This should have caused an error when you tried to load the document in a query.
I assume that the Name and Address nodes are copied directly from the source document and the other nodes are constructed.
When copying a node from the source document, the in-scope namespaces from the source node are combined with the in-scope namespaces of the node that contains the copy. The way these are combined is specified by the copy-namespaces mode.
In your case you want namespaces to be inherited from the parent node (the node in the query), but you do not want to preserve namespaces in the source document where they are unnecessary.
This can be achieved by adding the following line to the top of the query:
declare copy-namespaces no-preserve, inherit;
