Safe navigation operator in JSONPath? - jsonpath

Ruby and many other languages support a safe navigation operator:
name = article&.author&.name
Is there any equivalent in JSONPath?

This is unnecessary in JSONPath. Navigation is safe by default. If a value doesn't exist at a given path, the returned node set will simply be empty; it shouldn't raise an error.
This may, of course, vary between implementations as we don't have a standard yet, but we're working on one.
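To make the "empty node set" behaviour concrete, here is a minimal sketch in plain Python. It is not a JSONPath implementation: `select` is a hypothetical helper that only handles simple child-name steps (the equivalent of `$.author.name`), and the sample documents are made up for illustration.

```python
from typing import Any, List


def select(document: Any, *keys: str) -> List[Any]:
    """Hypothetical helper: follow child-name steps and return a node list."""
    nodes = [document]
    for key in keys:
        # Nodes that are not objects, or that lack the key, drop out silently.
        nodes = [node[key] for node in nodes if isinstance(node, dict) and key in node]
    return nodes


article = {"author": {"name": "Ada"}}
orphan = {"title": "No author here"}

print(select(article, "author", "name"))  # ['Ada']
print(select(orphan, "author", "name"))   # []
```

The missing `author` key produces an empty list rather than an exception, which is why no `&.`-style operator is needed.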

Related

spaCy adding pointer to another token in custom component

I am trying to find how token.head and token.children are implemented. I want to replicate this implementation as I add a custom component to my spaCy pipeline for SRL.
That is, each token can point to predicates for which it is an argument. Intuitively, I think that this should work kind of like token.children wherein (I think) it returns a generator of the actually dependent child token objects.
I assume that I should not simply store an attribute of that token as this does not seem very memory efficient and rather redundant. Does anyone know the correct way to implement this? Or is this handled implicitly by the spaCy Underscore.set method?
Thanks!
The Token object is only a view -- it's sort of like holding a reference to the Doc object, and an index to the token. The Span object is like this too. This ensures there's a single source of truth, and only one copy of the data.
You can find the definition of the key structs in the spacy/structs.pxd file. This defines the attributes of the TokenC struct. The Doc object then holds an array of these, and a length. The Token objects are created on the fly when you index into the Doc. The data definition for the Doc object can be found in spacy/tokens/doc.pxd, and the implementation of the token access is in spacy/tokens/doc.pyx.
The way the parse tree is encoded in spaCy is a bit unsatisfying. I've made an issue about this on the tracker --- it feels like there should be a better solution.
What we do is encode the offset of the head relative to the token. So if you do &doc.c[i] + doc.c[i].head you'll get a pointer to the head. That part is okay. The part that's a bit weirder is that we track the left and right edges of the token's subtree, and the number of direct left and right children. To get the rightmost or leftmost child, we navigate around within this region. In practice this actually works pretty well because we're dealing with a contiguous block of memory, and loops in Cython are fast. But it still feels a bit janky.
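The relative-offset encoding can be sketched in plain Python. This is only an illustration of the idea, not spaCy's code: `TokenData`, `head_index` and `children` are hypothetical names, the root is assumed to have offset 0, and the child lookup scans the whole array instead of using the subtree edges described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenData:
    text: str
    head: int  # offset of the head relative to this token (assumed 0 for the root)


def head_index(tokens: List[TokenData], i: int) -> int:
    # Mirrors the pointer arithmetic "&doc.c[i] + doc.c[i].head" using indices.
    return i + tokens[i].head


def children(tokens: List[TokenData], i: int) -> List[int]:
    # Naive full scan; a real implementation would stay inside the subtree edges.
    return [j for j in range(len(tokens)) if j != i and head_index(tokens, j) == i]


# "She ate the pizza": "ate" is the root, "the" attaches to "pizza".
tokens = [
    TokenData("She", 1),
    TokenData("ate", 0),
    TokenData("the", 1),
    TokenData("pizza", -2),
]

print(tokens[head_index(tokens, 3)].text)          # ate
print([tokens[j].text for j in children(tokens, 1)])  # ['She', 'pizza']
```

Storing offsets rather than absolute indices keeps each entry small and makes a slice of the array self-describing, which suits a contiguous block of structs.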
As far as what you'll be able to do as a user...If you run your own fork of spaCy you can happily define your own data on the structs. But then you're running your own fork.
There's no way to attach "real" attributes to the Doc or Token objects, as these are defined as C-level types --- so their structure is defined statically; it's not dynamic. You could subclass the Doc but this is quite ugly: you need to also subclass.
This is why we have the underscore attributes, and the doc.user_data dictionary. It's really the only way to extend the objects. Fortunately you shouldn't really face a data redundancy problem. Nothing is stored on the Token objects. The definitions of your extensions are stored globally, within the Underscore class. Data is stored on the Doc object, even if it applies to a token --- again, the Token is a view. It can't own anything. So the Doc has to note that we have some value assigned to token i.
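The "Token is a view, the Doc owns the data" arrangement can be sketched without spaCy at all. Everything here is hypothetical (`MiniDoc`, `MiniToken`, the `"predicates"` key); it only illustrates how per-token values can live in a dictionary on the document, keyed by token index, so that throwaway token views never hold data themselves.

```python
from typing import Any, Dict, List, Tuple


class MiniDoc:
    """Hypothetical stand-in for a Doc: owns the words and all extension data."""

    def __init__(self, words: List[str]) -> None:
        self.words = list(words)
        # Extension values keyed by (attribute name, token index).
        self.user_data: Dict[Tuple[str, int], Any] = {}

    def __getitem__(self, i: int) -> "MiniToken":
        # A fresh view is created on every access; it holds no data of its own.
        return MiniToken(self, i)


class MiniToken:
    """Hypothetical stand-in for a Token: just a doc reference plus an index."""

    def __init__(self, doc: MiniDoc, i: int) -> None:
        self.doc = doc
        self.i = i

    @property
    def text(self) -> str:
        return self.doc.words[self.i]

    @property
    def predicates(self) -> List["MiniToken"]:
        # Only indices are stored; token views are rebuilt on demand.
        indices = self.doc.user_data.get(("predicates", self.i), [])
        return [self.doc[j] for j in indices]

    def add_predicate(self, predicate: "MiniToken") -> None:
        self.doc.user_data.setdefault(("predicates", self.i), []).append(predicate.i)


doc = MiniDoc(["She", "ate", "pizza"])
doc[0].add_predicate(doc[1])  # "She" is an argument of the predicate "ate"

print([t.text for t in doc[0].predicates])  # ['ate']
print(doc.user_data)                         # {('predicates', 0): [1]}
```

Because only integer indices are stored, pointing a token at its predicates adds no copies of token data, which addresses the redundancy worry in the question.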
If you're defining a tree-navigation system, I'd recommend considering defining it as your own Cython class, so you can use structs. If you use native Python types it'll be pretty slow and pretty large. If you pack the data into numpy arrays the representation will be more compact, but writing the code will be a pretty miserable experience, and the performance is likely to be not great.
In short:
Define your own types in Cython. Put the data into a struct owned by a cdef class, and give the class accessor methods.
Use the underscore attributes to access the data from spaCy's Doc, Span and Token objects.
If you come up with a compelling API for SRL and the data can be coded compactly into the TokenC struct, we'd consider adding it as native support.

Why does 'x-www-form-urlencoded' begin with 'x-www', when other standard content types do not?

I understand that in the past, it was standard for custom headers names to use the prefix "X-" (I'm aware it no longer is considered standard to do this), but I've been unable to find if there is any relationship between this naming convention and the value ("application/x-www-form-urlencoded"). Did it start out as a custom content-type value that was later adopted or something?
I found this link here, which certainly was interesting, but have been unable to find the answer to my question.
Does anybody know the reason this prefix was chosen, and what it signifies?
it was standard for custom headers names to use the prefix "X-"
Actually … no, not at all. To be precise: It has never been a standard, just a best practice. It allowed implementors to introduce new content types and codings without the need to write an entire RFC for it. Nowadays the IANA Media Type Registry is good for that. RFC 6648 put an end to this practice.
The reason application/x-www-form-urlencoded is prefixed in this way (it is listed as a proper MIME type in said registry, btw) stems from the fact that it is a "custom" method of structuring the query string in a URL. That part has never seen proper regulation. The people behind HTML just went and did it, which fully justified the prefix.
As far as the history: it has the x- prefix because it originated in a proposal from Mosaic, and since it was just a proposal, they used that x- extension prefix to initially define it. But then other browsers implemented it that way too, and nobody ever got around to taking the time to properly standardize an unprefixed alternative, so it just stuck that way, and here we are now.
It can be traced back to a 1993 thread on the www-talk mailing list titled “Submitting input-form data to server”, and in that thread, a September 1993 message from Marc Andreessen:
This is what we're doing in Mosaic 2.0… See
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
...for details on what we're up to
That link is broken now but the document, titled “Mosaic for X version 2.0 Fill-Out Form Support” is archived at archive.org. Here’s the relevant excerpt:
ENCTYPE specifies the encoding for the fill-out form contents. This attribute only applies if METHOD is set to POST -- and even then, there is only one possible value (the default, application/x-www-form-urlencoded) so far.
Anyway, application/x-www-form-urlencoded is now formally defined in the URL spec, with algorithms for parsing and serializing it—though the section it’s all defined in has this note:
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.
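For a feel of what the format looks like on the wire, here is a small round trip using Python's standard library. This is only an illustration: `urllib.parse` follows its own implementation rather than the URL spec's algorithms step by step, and the field names and values are made up.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields; note the space and the reserved characters.
fields = {"name": "Ada Lovelace", "query": "a&b=c"}

# Serialize: spaces become "+", reserved characters are percent-encoded.
body = urlencode(fields)
print(body)  # name=Ada+Lovelace&query=a%26b%3Dc

# Parse: every key maps to a list, because keys may repeat in a form body.
print(parse_qs(body))  # {'name': ['Ada Lovelace'], 'query': ['a&b=c']}
```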

What should I do with warnings that show up after upgrading to ARC code?

I got 213 warnings.
Here are some and their issues:
UserController.m:
ARC Issue: Assigning retained object to unsafe property; object will be released after assignment
LoginController.m
ARC Issue: Assigning retained object to unsafe property; object will be released after assignment
Well, I checked that the object is declared without strong or retain. However, the default for ARC files is strong, NOT assign.
Does the compiler still think that the files are non ARC files? Where can I check?
Semantic Issue: No ‘assign’, ‘retain’, or ‘copy’ attribute is specified - ‘assign’ is assumed
Semantic Issue: Default property attribute ‘assign’ not appropriate for non-gc object
Semantic Issue: No ‘assign’, ‘retain’, or ‘copy’ attribute is specified - ‘assign’ is assumed
Again, when no attribute is specified, retain should be assumed, since that is the new default for ARC files.
These warnings show up in code generated automatically by Core Data.
Should I just ignore those warnings?
But that's too annoying.
Replacing the code one by one is too time consuming. Also that means I am not taking advantage of the fact that the default is indeed strong.
Maybe I can search and replace. What exact format should I search and replace for?
Programs are working fine.
I would just turn off ARC for the current project and use it in the next new project that you create. Going from non-ARC to ARC is such a pain D:

Character entity references - numeric or not?

So, I know that I can represent an ampersand as &amp;amp; or &amp;#38;.
I have found that at least one method of parsing XML does not allow for the abbreviation-based style - only numeric. Is there a best-practice? I want to instruct my team to use the numeric versions because of my experience, but one instance hardly seems like enough reason to convince them.
Which method should we favor?
XML predefines only a small set of these symbolic entities: amp, lt, gt, quot and apos.
The symbolic names we're familiar with, such as &amp;copy;, exist because they appear in the HTML DTD, here http://www.w3.org/TR/html4/sgml/entities.html (although I think most browsers have this baked in).
Therefore, if you are using (X)HTML, get your doctype right, and then follow the links on w3.org to XHTML to see the entities available.
As far as best practices, most people find the symbolic names easier to understand and will use them when available. I would recommend that.
The only reason not to is that there used to be cases in very old browsers when entities wouldn't work-- but I don't believe this is the case any more.
If you mean other HTML entities, with pure XML, only the entities amp, lt, gt, quot, and apos are pre-defined (apos is not available in HTML, but amp indeed should be).
However, all other HTML entities (such as nbsp) will not be available unless defined in the DOCTYPE, so in such a case, using numeric entities may indeed be preferable.
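A quick way to see the difference is to feed the three cases to a plain XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; `parses` is a hypothetical helper, and the sample markup is made up.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Hypothetical helper: report whether the text is accepted as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses("<p>fish &amp; chips</p>"))  # True: amp is predefined in XML
print(parses("<p>fish&#160;chips</p>"))   # True: numeric references always work
print(parses("<p>fish&nbsp;chips</p>"))   # False: nbsp is undefined without a DTD
```

So the numeric form is the portable choice for anything beyond the five predefined names, while the symbolic names stay readable wherever a DOCTYPE (or an HTML parser) defines them.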

Dynamic Typing without duck typing?

I'm used to dynamic typing meaning that the type info of an object (or a non-object-oriented structure) is checked at runtime and some sort of type error is thrown, i.e. if it quacks like a duck, it's a duck. Is there a different type of dynamic typing? (Please go into detail.)
Yes, absolutely. Duck-typing is an idiom which says that the type of a value at this moment in time is based on the fields and methods that it has right now. Dynamic typing just says that types are associated with run-time values, not with static variables and parameters. There is a difference between the two, and you can use the latter without the former.
For example, if you programmed in PHP and limited yourself to the basic types without using OO, then you would be using dynamic typing without using duck-typing.
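The same distinction can be shown in Python, which supports both styles. This is just a sketch with made-up function names: `double_nominal` relies on the runtime type attached to the value (dynamic typing, no duck typing), while `double_duck` accepts anything that happens to support the operation.

```python
def double_nominal(value):
    # Dynamic typing only: the check uses the type carried by the runtime value.
    if not isinstance(value, (int, float)):
        raise TypeError(f"expected a number, got {type(value).__name__}")
    return value * 2


def double_duck(value):
    # Duck typing: no type check, anything that supports "* 2" is accepted.
    return value * 2


print(double_nominal(21))  # 42
print(double_duck("ab"))   # abab

try:
    double_nominal("ab")
except TypeError as error:
    print(error)           # expected a number, got str
```

Both functions are dynamically typed, since neither parameter has a static type; only the second one is duck-typed.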
No, dynamic typing is when values have a type but variables do not, so most type checking is done at runtime. So, basically, if the value walks or quacks like a duck, it's a duck, else an error is thrown. Duck typing really just describes a feature of dynamic typing that ensures it will be typesafe (i.e. a method will only run if variable foo has the right attribute or can execute that method).
