Why is read_xml not properly reading an xml file in R?

I am reading the url below with this code in R. But, for some reason, R is not reading or recognizing the "language_dependence" poll that clearly exists in the file if you just copy and paste the below url into a browser.
library(XML)
library(xml2)
url <- "https://boardgamegeek.com//xmlapi//boardgame//44669&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&comments=1&pricehistory=1"
data <- read_xml(url)
xmlfile <- xmlParse(data)
xmltop = xmlRoot(xmlfile)
This is the language_dependence poll as it looks like in xmltop, from my code above.
xmltop
<poll name="language_dependence" title="Language Dependence" totalvotes="0">
</poll>
It should look like this:
<poll name="language_dependence" title="Language Dependence" totalvotes="0">
<results>
<result level="1" value="No necessary in-game text" numvotes="0"/>
<result level="2" value="Some necessary text - easily memorized or small crib sheet" numvotes="0"/>
<result level="3" value="Moderate in-game text - needs crib sheet or paste ups" numvotes="0"/>
<result level="4" value="Extensive use of text - massive conversion needed to be playable" numvotes="0"/>
<result level="5" value="Unplayable in another language" numvotes="0"/>
</results>
</poll>
I am out of ideas as to what could be going on here. This only happens once in a while; there are thousands of similar URLs that I read just fine with this code. Am I reading it in wrong? Is there a better way to read or parse it? I don't see anything wrong with the XML, but maybe I'm just missing something obvious. Thank you!!

Related

Preventing automatic line break in xml on Atom editor?

Hello everyone.
I am new to Atom and am using it to view XML files. (I haven't set up any additional packages yet. Version 1.19.4)
One of my XML files contains elements with many attributes. For example:
<book id="test_xml">
<class name="First_row" attrib_01="Grape" attrib_02="Apple" attrib_03="banana" attrib_04="Water melon" attrib_05="Orange" ... (and so on )
</book>
Every class has at least 50 attributes.
The first time I opened this XML file in Atom, it showed every class on a single line. (This is what I want.) But when I edit an attribute value ("Melon" to "Apple"), Atom suddenly breaks the line and shows the single line as multiple lines, as below.
<book id="Fruit">
<class name="First_row" attrib_01="Grape" attrib_02="Apple"
attrib_03="banana" attrib_04="Water melon"
attrib_05="Orange" ... (and so on )
</book>
Without changing the XML format, how can I prevent Atom from splitting the single line into multiple lines?
Thank you.

Notepad++ Function list for SQL

quick question .. I'm trying to get the function list option in Notepad++ going ...
Now, I found this thread:
Notepad++ Function List for PL/SQL
which helped get me started, however, I'm still struggling with something, and I can't seem to wrap my monkey-brain around it.
It'll be this section I need to focus:
<function
mainExpr="^[\t ]*(FUNCTION|PROCEDURE)[\s]*[\w]*[\s]*(\(|IS|AS)*"
displayMode="$functionName">
<functionName>
<nameExpr expr="[\w]+[\s]*(\(|IS|AS)"/>
</functionName>
</function>
That works perfectly fine .. so far.
However, I would like to also see PACKAGE header and PACKAGE BODY names in there as well .. just to help tidy things up.
I figured it'd be easy to tweak the RegExp, however, nothing I've tried is working
So I'm trying to pick out these kinds of scenarios:
CREATE PACKAGE aaa
CREATE OR REPLACE PACKAGE bbb
CREATE PACKAGE BODY ccc
CREATE OR REPLACE PACKAGE BODY ddd
all 4: aaa, bbb, ccc, and ddd.
I can't even get it to pull back one yet.. :(
Hoping I could get some help/hints/something ..
I know this is the main "logic":
mainExpr="^[\t ]*(FUNCTION|PROCEDURE)[\s]*[\w]*[\s]*(\(|IS|AS)*"
that finds the line(s) ..
And trying to matchup the logic with what it finds for .. say, FUNCTIONs, and what I want for PACKAGE ... I tried this:
mainExpr="^[\t ]*(FUNCTION|PROCEDURE|CREATE OR REPLACE PACKAGE)[\s]*[\w]*[\s]*(\(|IS|AS)*"
but even that doesn't pick out the header! O.o
I'm sure there's something I need to do with the part .. but again, not really understanding how it works ??
I've read this :
https://notepad-plus-plus.org/features/function-list.html
but there's obviously something about the syntax/usage of this thing I'm not fully understanding ..
hoping somebody can help me out?
I think your problem is coming from the Regex rather than anything you're doing incorrectly. I've made a new parser based on the one I found here: http://www.hermanmol.nl/?p=240
<parser id="plsql_func" displayName="PL/SQL" commentExpr="((/\*.*?\*)/|(--.*?$))">
<function
mainExpr="^[\w\s]{0,}(PACKAGE BODY|PACKAGE|FUNCTION|PROCEDURE)[\s]{1,}[\w_]{1,}">
<functionName>
<nameExpr expr="^[\w\s]{0,}(PACKAGE BODY|PACKAGE|FUNCTION|PROCEDURE)[\s]{1,}\K[\w_]{1,}"/>
</functionName>
</function>
</parser>
For me this seems to correctly pull out the Package, Procedures and Functions.
One thing to note, however: I could not get this to work using a file extension association, and used the following instead to test on a text file: <association langID= "0" id="plsql_func" />
I also placed the updated functionList.xml file in both the Program Files (x86)\Notepad++ and the Users\xxxxx\AppData\Roaming\Notepad++ directories.
Edit - a short explanation of the Regex, I'm not great at Regex but it was requested in the comments
^[\w\s]{0,} - From the beginning of the line, find 0 or more letters or white space characters
(PACKAGE BODY|PACKAGE|FUNCTION|PROCEDURE) - followed by any of these
[\s]{1,}[\w_]{1,} - followed by one or more whitespace characters, followed by one or more word characters (the name)
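One way to sanity-check the pattern outside Notepad++ is a short Python sketch. It is only an approximation: Python's re module has no \K, so the name is taken from a second capture group instead, and Notepad++ uses a different regex engine, so corner cases may behave differently. The sample lines are the four scenarios from the question; the helper name object_name is made up for this illustration.

```python
import re

# Same shape as the mainExpr above; the name is captured in group 2
# because Python's re module does not support \K.
PATTERN = re.compile(
    r"^[\w\s]{0,}(PACKAGE BODY|PACKAGE|FUNCTION|PROCEDURE)[\s]{1,}([\w_]{1,})"
)

def object_name(line):
    """Return the name following the keyword, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.group(2) if match else None

samples = [
    "CREATE PACKAGE aaa",
    "CREATE OR REPLACE PACKAGE bbb",
    "CREATE PACKAGE BODY ccc",
    "CREATE OR REPLACE PACKAGE BODY ddd",
]
names = [object_name(line) for line in samples]
print(names)  # ['aaa', 'bbb', 'ccc', 'ddd']
```

PACKAGE BODY is listed before PACKAGE in the alternation on purpose: with the order reversed, the third and fourth lines would yield BODY as the name.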
Thanks Chrisrs2292,
I was helped by the location of the functionList.xml file in Users\xxxxx\AppData\Roaming\Notepad++ directories.
RegEX for T-SQL:
<association id= "T-SQL_func" langID="17"/>
<!-- T-SQL-->
<parser displayName="T-SQL" id="T-SQL_func" commentExpr="(?s:/\*.*?\*/)|(?m-s:--.*?$)">
<function mainExpr='(?im)^\h*(create|alter)\s+(function|procedure)\s+((\[|")?[\w_]+(\]|")?\.?)?((\[|")?[\w_]+(\]|")?)?'
displayMode="$functionName">
<functionName>
<nameExpr expr='(?im)(function|procedure)\s+((\[|")?[\w_]+(\]|")?\.?)?((\[|")?[\w_]+(\]|")?)?' />
</functionName>
</function>
</parser>
Working from what Chrisrs2292 provided above, I played around with some PL/SQL code on regex101.com to find a regular expression that matches the functions/procedures/etc., and put that into the functionList.xml.
<parser
displayName="SQL Node"
id="sql_node"
commentExpr="((/\*.*?\*)/|(--.*?$))"
>
<function
mainExpr="^[\w\s]+(PACKAGE BODY|PACKAGE|PROCEDURE|FUNCTION)\s+[\w&quot;\.]+"
>
<functionName>
<nameExpr expr="^[\w\s]+(PACKAGE BODY|PACKAGE|PROCEDURE|FUNCTION)\s+\K[\w&quot;\.]+"/>
</functionName>
</function>
</parser>
The big change was that I had to include the double quote (using the XML entity &quot;) and the dot (\.), since I often quote names and like to use the fully qualified name (schema.procedure|function|etc). The minor changes replace {0,} with * and {1,} with +; these are cosmetic and should be interchangeable.
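To illustrate why the double quote has to be written as an entity inside the attribute, here is a small Python sketch using the standard library's ElementTree. It only demonstrates XML attribute parsing in general; it is not how Notepad++ itself loads functionList.xml, and the one-element documents are cut down for the example.

```python
import xml.etree.ElementTree as ET

# Escaped form: &quot; stands for a literal double quote inside the attribute.
escaped = r'<nameExpr expr="[\w&quot;\.]+"/>'
expr = ET.fromstring(escaped).get("expr")
print(expr)  # the value handed on after parsing, with a real double quote

# Unescaped form: the bare quote ends the attribute value early,
# so the element is no longer well-formed XML.
unescaped = r'<nameExpr expr="[\w"\.]+"/>'
try:
    ET.fromstring(unescaped)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
print(parsed_ok)
```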

Using awk/cut/sed with lines containing different numbers of fields

I'm using the pinboard.in API to get a list of my current bookmarks. The results look like this:
<post href="https://www.nocc.meezy.com/doc/view.cgi?id=715" time="2013-02-11T17:38:10Z" description="Disk Errors Process Flow Chart" extended="" tag="nocc work" hash="a3419515b2e956e86886ba630b6028b7" meta="d793aeef6133a26e361695181eb57b9d" />
<post href="https://www.nocc.meezy.com/doc/view.cgi?id=39" time="2013-02-11T17:38:08Z" description="Using socat" extended="" tag="socat work" hash="fd60523bf841b2b95674a0e1d4401f4d" meta="5f2b6ad395fe4da05b2987d199b675ea" />
<post href="https://agora.meezy.com/wiki/Development_Tools" time="2013-02-11T17:38:06Z" description="Development Tools - meezyWiki" extended="" tag="devtools work" hash="dcf904433987a125c00a88bcaf31cad27" meta="5e744562282561390a0417223d323aee" />
I'm only interested in the URL, description, and tags, so I'd like to have the results look like this:
https://www.nocc.meezy.com/doc/view.cgi?id=715 description="Disk Errors Process Flow Chart" tag="nocc work"
https://www.nocc.meezy.com/doc/view.cgi?id=39 description="Using socat" tag="socat work"
https://agora.meezy.com/wiki/Development_Tools description="Development Tools - meezyWiki" tag="devtools work"
I know a little bit about awk/cut/sed but not enough to tell them how to count the fields correctly when the description and tag fields contains spaces and different numbers of strings.
I could probably hack together some really crappy solution if my life depended on it but I'd rather get a proper solution by someone who knows them much better than I do.
Thanks
When you process XML with regex/awk/sed, you should know the risk. Here is a sed one-liner for your requirement:
sed -r 's/^.*"(http)/\1/; s/" time=.*( desc)/ \1/; s/extended=.*( tag=")/\1/; s/hash=.*//' file
test with your example:
kent$ sed -r 's/^.*"(http)/\1/; s/" time=.*( desc)/ \1/; s/extended=.*( tag=")/\1/; s/hash=.*//' file
https://www.nocc.meezy.com/doc/view.cgi?id=715 description="Disk Errors Process Flow Chart" tag="nocc work"
https://www.nocc.meezy.com/doc/view.cgi?id=39 description="Using socat" tag="socat work"
https://agora.meezy.com/wiki/Development_Tools description="Development Tools - meezyWiki" tag="devtools work"
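As an alternative that avoids the regex-on-XML risk mentioned above, the attributes can be read with an XML parser. The following is a minimal Python sketch, assuming each line holds one complete <post .../> element as in the question; the function name summarize is made up for this example, and a real API response may wrap the posts in a root element, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

def summarize(post_line):
    """Return 'href description="..." tag="..."' for one <post .../> line."""
    post = ET.fromstring(post_line)
    return '{} description="{}" tag="{}"'.format(
        post.get("href"), post.get("description"), post.get("tag")
    )

# One of the sample lines from the question.
sample = (
    '<post href="https://www.nocc.meezy.com/doc/view.cgi?id=39" '
    'time="2013-02-11T17:38:08Z" description="Using socat" extended="" '
    'tag="socat work" hash="fd60523bf841b2b95674a0e1d4401f4d" '
    'meta="5f2b6ad395fe4da05b2987d199b675ea" />'
)
summary = summarize(sample)
print(summary)
```

Because the parser looks attributes up by name, the number of words in the description or tag values does not matter.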

XForms bind element error

I am changing my code to use binds in XForms (which is better practice than using nodesets everywhere!) but I am getting errors.
The error message I receive is: "Error: XForms Error (8): id (data_criterion) does not refer to a bind element..."
From tutorials/guides I have been using, it seems as though this should work, but clearly I am missing something! (btw, I was modeling my binding code after the examples here: http://en.wikibooks.org/wiki/XForms/Bind)
I originally thought the problem was due to the fact I was using xf:select controls as opposed to xf:input like the examples, but even once I dumbed down my code to the most simplistic code, I still receive errors!
This is the model code I am using:
<xf:model id="select_data">
<xf:instance id="criteria_data" xmlns="">
<file>
<criteria>
<criterion></criterion>
</criteria>
</file>
</xf:instance>
<bind id="data_criterion" nodeset="instance('criteria_data')/criteria/criterion"/>
</xf:model>
As for the ui code, this is what I have:
<xf:input bind="data_criterion">
<xf:label>Enter criteria:</xf:label>
</xf:input>
The error message I receive is: "Error: XForms Error (8): id (data_criterion) does not refer to a bind element..."
Anyone have any insight to what the problem is? Also, is there any special usage of bindings and xf:select (with xf:itemset) controls that I should be aware of? (I am ultimately using a lot of xf:select controls on my form..)
Thanks in advance!
EDIT:
I ran the code through this validator, and I got this message (refers to the bind line):
"Warning: Should the following element have the XForms namespace applied?: bind (line 66)"
A couple of things you might want to change:
Not sure if this is the reason for the error, but the nodeset expression should be instance('criteria_data')/criteria/..., without file. Remember: instance() returns the root element, not the document node. (You already took care of this one by updating the question; good.)
You are missing the xf on the bind. It should be: <xf:bind id="data_criterion" nodeset="instance('criteria_data')/criteria/criterion"/>.
See below a full example with your code, which works fine for me under Orbeon Forms:
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xf="http://www.w3.org/2002/xforms"
xmlns:xxforms="http://orbeon.org/oxf/xml/xforms"
xmlns:ev="http://www.w3.org/2001/xml-events"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fr="http://orbeon.org/oxf/xml/form-runner">
<xhtml:head>
<xhtml:title>SO Bind</xhtml:title>
<xf:model id="select_data">
<xf:instance id="criteria_data" xmlns="">
<file>
<criteria>
<criterion>Gaga</criterion>
</criteria>
</file>
</xf:instance>
<xf:bind id="data_criterion" nodeset="instance('criteria_data')/criteria/criterion"/>
</xf:model>
</xhtml:head>
<xhtml:body>
<xf:input bind="data_criterion">
<xf:label>Enter criteria:</xf:label>
</xf:input>
</xhtml:body>
</xhtml:html>
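The namespace point can also be checked outside a form engine. This Python sketch parses a cut-down model with one unprefixed bind and one xf:bind and prints the qualified name of each. It assumes no default namespace is in scope (a real page may declare one, in which case the unprefixed bind lands in that namespace instead), and the ids unprefixed and prefixed are made up for the illustration.

```python
import xml.etree.ElementTree as ET

XF = "http://www.w3.org/2002/xforms"

model = ET.fromstring(
    '<xf:model xmlns:xf="http://www.w3.org/2002/xforms" id="select_data">'
    '<bind id="unprefixed"/>'
    '<xf:bind id="prefixed"/>'
    '</xf:model>'
)

# ElementTree writes qualified names as "{namespace-uri}local-name".
tags = {child.get("id"): child.tag for child in model}
print(tags)

# Only the prefixed element is found when searching for XForms binds.
xforms_binds = [b.get("id") for b in model.findall("{%s}bind" % XF)]
print(xforms_binds)
```

The unprefixed element is simply called bind with no namespace, so a processor looking for bind elements in the XForms namespace will not see it.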

How to convert ticks into a readable datetime with XSLT?

I have an XML with timestamps like this:
<node stamp="1236888746689" />
And I would like to display them in the result HTML as date with time.
Is there a way to do it with XSLT (any Version)?
EDIT:
I am using XSLT2.0 with Saxon9. The base date is 1970-01-01 0:00.
You take the date 1970-01-01T00:00:00 and add as many milliseconds as the value of the stamp tells you:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.w3.org/TR/xhtml1/strict">
<xsl:template match="node">
<xsl:value-of
select='xs:dateTime("1970-01-01T00:00:00") + @stamp * xs:dayTimeDuration("PT0.001S")'/>
</xsl:template>
</xsl:stylesheet>
If you are using an XSLT 1.0 processor which supports the EXSLT date functions (I've just tested this with libxslt in PHP), you can use date:add() and date:duration():
<xsl:value-of select="date:add('1970-01-01T00:00:00Z', date:duration(@stamp div 1000))"/>
The date:duration() function takes a number of seconds (so you have to divide your milliseconds by 1000) and turns it into a "duration" (in this case, "P14315DT20H12M26.6889998912811S"), which is then added to the start of your epoch (looks like the standard epoch, for this stamp) with date:add() to get a stamp of "2009-03-12T20:12:26.6889998912811Z". You can then format this using the EXSLT date functions or just substring(), depending on what you need.
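The same epoch arithmetic can be cross-checked outside XSLT. This is a small Python sketch, assuming the stamp is milliseconds since 1970-01-01T00:00:00Z as stated in the question; the function name ticks_to_datetime is made up for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ticks_to_datetime(ms):
    """Convert milliseconds since 1970-01-01T00:00:00Z to an aware datetime."""
    return EPOCH + timedelta(milliseconds=ms)

stamp = ticks_to_datetime(1236888746689)
print(stamp.isoformat(timespec="milliseconds"))  # 2009-03-12T20:12:26.689+00:00
```

The result agrees with the EXSLT output to the millisecond; the integer arithmetic here just avoids the floating-point tail (.6889998912811) seen in the duration.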
Belated answer, yes, I know, but I couldn't find the one I was looking for here, so I thought I'd pay it forward with my solution.
My XML was a few nodes dumped from Drupal using export_node and drush. I was using the xslt processor in PHP5, which only supports xslt 1.0. Some EXSLT functions appear to be supported, but I couldn't tell whether my syntax was wrong or the function I was trying to use was not supported. Anyway, the following worked for me. I used the example code from w3schools.com, but added a line right after declaring the xsltprocessor, like below:
$xp = new XsltProcessor();
$xp->registerPHPFunctions();
PHP has a trivial function for date conversion, so I cheated and used the PHP processor, since I was already using it to transform my xsl.
<xsl:for-each select="node_export/node">
<xsl:value-of select="php:function('date', 'n-j-y', number(timestamp))"/>
</xsl:for-each>
Hope this helps someone out there. I was banging my head for quite a while as I worked this one out.
If you wanted to use an XSL 1.0 processor that does not support the EXSLT date and time functions this is non-trivial, but it has been done.
You can have a look at Katy Coe's XSLT 1.0 implementation of the "iso-from-unix" function. It's part of a rather huge "free for non-commercial use" set of date and time functions she created.
However, your XSL processor must support the "http://exslt.org/functions" namespace for this implementation to work. Other than that there is no dependency on EXSLT.
P.S.: I'm aware that a Unix timestamp and ticks are not exactly the same thing. They are close enough, though.
XSLT is Turing complete, so there must be a way. :) Knowing at least a bit of XSLT, it will probably involve recursion.
You don't specify the exact interpretation of your "ticks", I'm guessing milliseconds since some epoch, but which? 1970?
