Generate temporary URIs - uri

I have an application that, for a specific resource at URI x, generates a system of URIs such as x#a, x#b and so on, and stores them in an RDF graph at any position in a triple, even as predicates.
This is fine but the issue arises when the resource that should be described in this way is provided "inline" i.e. without a URI. In RDF I have the option to use blank nodes, but there are two issues I have with them:
Blank nodes are mostly regarded as having existential meaning, i.e. they denote some node. The nodes produced by the application are not just some nodes – they are exactly the nodes produced from one specific resource that was given to them at one specific time. In contrast, a blank node can be "satisfied" by a concrete URI node with the same relations to other nodes, becoming redundant in the presence of such a node.
Unfortunately, blank nodes cannot be used as predicates in RDF (or at least not in some serialization formats).
This means that I have to find a URI, but I am wondering what scheme to choose. So far I have considered these options:
Generate a new UUID and use urn:uuid:. This is sufficient, but I would prefer something that expresses the temporary status of the URI better, so that it would be apparent it was generated for this one case only. In comparison, UUIDs can be used as stable identifiers for more permanent resources as well, and there is no inherent distinction.
Use a proprietary URI, like nodeID: which is sometimes used for blank nodes, but is still a valid URI nonetheless. I would append a new UUID to that as well. This seems a bit better to me since it mimics blank nodes but without their different semantics, but I would still prefer something more standardized.
Use tag: URIs. Tags have a time component to them, so the temporal notion is expressed sufficiently. However, I am not sure about the authority since whoever runs the application will in essence become the authority. Another option is to use my own domain and somehow "grant" the rights of minting tags to anyone running the application (or just to anyone in general). This way, I can express everything I need (the temporary notion of the resource, the uniqueness since I would also include a new UUID in the URI, and a description of the purpose and use of the tags on the website itself).
This is in my opinion not a bad solution, but it somewhat limits the usage of the application, as once my control of the domain expires, users of the application will no longer have the rights to mint new tags under the original authority, and they will have to find a new one (unless I leave the authority blank; while it is not strictly stated by the specification, anything is technically valid since implementations must not reject non-conforming authorities). I am also concerned about the time component itself, as it might pose a security risk (although it can be less specific). I am also not overly accustomed to the concept of tags, so I am not sure if it isn't against their spirit.
I can provide all of these methods as options in the application, but I would like to know if there are perhaps better options.
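For illustration, the three candidate schemes could be minted along these lines (a sketch; the `tag:` authority `example.org` and the helper names are placeholders, and `nodeID:` is a nonstandard scheme as noted above):

```python
import uuid
from datetime import date

def urn_uuid():
    # Option 1: a standard urn:uuid: URI (RFC 4122)
    return uuid.uuid4().urn  # e.g. "urn:uuid:9a7b..."

def node_id():
    # Option 2: a nonstandard nodeID: URI mimicking blank-node labels
    return "nodeID:" + uuid.uuid4().hex

def tag_uri(authority="example.org"):
    # Option 3: a tag: URI (RFC 4151): "tag:" authority "," date ":" specific.
    # The minting date doubles as a hint that the URI was made for one occasion.
    return "tag:{},{}:{}".format(authority, date.today().isoformat(), uuid.uuid4())
```

Each helper embeds a fresh UUID, so uniqueness is covered in all three cases; the choice is purely about what the prefix communicates.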

Related

static in a web application

I want to generate a very short unique ID in my web app that can be used to handle sessions. Sessions, as in users connecting to each other's sessions with this ID.
But how can I keep track of these IDs? Basically, I want to generate a short ID, check whether it is already in use, and create a new one if it is.
Could I simply have a static class, that has a collection of these IDs? Are there other smarter, better ways to do this?
I would like to avoid using a database for this, if possible.
Generally, static variables, regardless of where they are declared, stay alive for the application's lifetime. An application's lifetime ends after the last request has been processed plus a configurable idle timeout (set in web.config). As a result, you can define a variable to store short IDs wherever is convenient for you.
However, there are also a number of well-known third-party tools to choose from. Memcached is one of the major options that deserves your notice, as it has been used widely in giant applications such as Facebook and others.
Depending on how you want to arrange your software architecture, you may prefer your own facilities or a third party's. The most considerable advantage of third parties is that they are industry-standard and well-tested in different situations, embodying best practices. Writing your own functions, on the other hand, gives you the power to write minimal code with more tailored behavior, as well as the ability to debug, which cannot be ignored in such situations.

Why isn't HTTP PUT allowed to do partial updates in a REST API?

Who says RESTful APIs must support partial updates separately via HTTP PATCH?
It seems to have no benefits. It adds more work to implement on the server side and more logic on the client side to decide which kind of update to request.
I am asking this question within the context of creating a REST API with HTTP that provides abstraction to known data models. Requiring PATCH for partial updates as opposed to PUT for full or partial feels like it has no benefit, but I could be persuaded.
Related
http://restcookbook.com/HTTP%20Methods/idempotency/ - this implies you don't have control over the server software that may cache requests.
What's the justification behind disallowing partial PUT? - no clear answer given, only reference to what HTTP defines for PUT vs PATCH.
http://groups.yahoo.com/neo/groups/rest-discuss/conversations/topics/17415 - shows the divide of thoughts on this.
Who says? The guy who invented REST says:
@mnot Oy, yes, PATCH was something I created for the initial HTTP/1.1 proposal because partial PUT is never RESTful. ;-)
https://twitter.com/fielding/status/275471320685367296
First of all, REST is an architectural style, and one of its principles is to leverage on the standardized behavior of the protocol underlying it, so if you want to implement a RESTful API over HTTP, you have to follow HTTP strictly for it to be RESTful. You're free to not do so if you think it's not adequate for your needs, nobody will curse you for that, but then you're not doing REST. You'll have to document where and how you deviate from the standard, creating a strong coupling between client and server implementations, and the whole point of using REST is precisely to avoid that and focus on your media types.
So, based on RFC 7231, PUT should be used only for complete replacement of a representation, in an idempotent operation. PATCH should be used for partial updates, which aren't required to be idempotent, but it's good practice to make them idempotent by requiring a precondition or validating the current state before applying the diff. If you need to do non-idempotent updates, partial or not, use POST. Simple. Everyone using your API who knows how PUT and PATCH work expects them to work that way, and you don't have to document or explain what the methods should do for a given resource. You're free to make PUT act in any other way you see fit, but then you'll have to document that for your clients, and you'll have to find another buzzword for your API, because that's not RESTful.
Keep in mind that REST is an architectural style focused on long term evolution of your API. To do it right will add more work now, but will make changes easier and less traumatic later. That doesn't mean REST is adequate for everything and everyone. If your focus is the ease of implementation and short term usage, just use the methods as you want. You can do everything through POST if you don't want to bother about clients choosing the right methods.
To expand on the existing answer, PUT is supposed to perform a complete update (overwrite) of the resource state simply because HTTP defines the method in this way. The original RFC 2616 about HTTP/1.1 is not very explicit about this, but RFC 7231 adds semantic clarifications:
4.3.4 PUT
The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload. A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.
As stated in the other answer, adhering to this convention simplifies the understanding and usage of APIs, and there is no need to explicitly document the behavior of the PUT method.
However, partial updates are not disallowed because of idempotency. I find this important to highlight, as these concepts are often confused, even on many StackOverflow answers (e.g. here).
Idempotent solely means that applying a request one or many times results in the same effect on the server. To quote RFC 7231 once more:
4.2.2 Idempotent methods
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
As long as a partial update contains only new values of the resource state and does not depend on previous values (i.e. those values are overwritten), the requirement of idempotency is fulfilled. Independently of how many times such a partial update is applied, the server's state will always hold the values specified in the request.
Whether an intermediate request from another client can change a different part of the resource is not relevant, because idempotency refers to the operation (i.e. the PUT method), not the state itself. And with respect to the operation of a partial overwriting update, its application yields the same effect after being applied once or many times.
On the contrary, an operation that is not idempotent depends on the current server state, therefore it leads to different results depending on how many times it is executed. The easiest example for this is incrementing a number (non-idempotent) vs. setting it to an absolute value (idempotent).
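That contrast can be made concrete with a toy resource (illustrative names; the two functions stand in for a non-idempotent and an idempotent update):

```python
resource = {"counter": 0}

def increment(res):
    # Non-idempotent: the result depends on the current state,
    # so repeating the request changes the outcome.
    res["counter"] += 1

def set_to(res, value):
    # Idempotent: applying it once or many times yields the same state.
    res["counter"] = value

increment(resource)
increment(resource)   # counter == 2, not 1: each repeat matters
set_to(resource, 5)
set_to(resource, 5)   # counter == 5 either way: repeats are harmless
```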
For non-idempotent changes, HTTP provides the methods POST and PATCH: PATCH is explicitly designed to carry modifications to an existing resource, while POST can be interpreted much more freely regarding the relation between request URI, body content and side effects on the server.
What does this mean in practice? REST is a paradigm for implementing APIs over the HTTP protocol -- a convention that many people have considered reasonable and that is thus likely to be adopted or understood. Still, there are controversies regarding what is RESTful and what isn't, but even leaving those aside, REST is not the only correct or meaningful way to build HTTP APIs.
The HTTP protocol itself puts constraints on what you may and may not do, and many of them have actual practical impact. For example, disregarding idempotency may result in cache servers changing the number of requests actually issued by the client, and subsequently disrupt the logic expected by applications. It is thus crucial to be aware of the implications when deviating from the standard.
Being strictly REST-conformant, there is no completely satisfying solution for partial updates (some even say this need alone is against REST). The problem is that PATCH, which at first appears to be made just for this purpose, is not required to be idempotent. Thus, by using PATCH for idempotent partial updates, you lose the advantages of idempotency (arbitrary numbers of automatic retries, simpler logic, potential for optimizations in client, server and network). As such, you may ask yourself whether using PUT is really the worst idea, as long as the behavior is clearly documented and nothing breaks because users (and intermediate network nodes) rely on certain behavior...?
Partial updates are allowed by PUT (according to RFC 7231 https://www.rfc-editor.org/rfc/rfc7231#section-4.3.4).
"... a PUT request is defined as replacing the state of the target resource." - replacing part of an object basically changes its state.
"Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, ..."
According to that RFC, the following request is valid: PUT /resource/123 {name: 'new name'}
It will change only the name of the specified resource. Specifying the id inside the request payload would be incorrect (as PUT does not allow partial updates of unspecified resources).
PS: Below is an example of when PATCH is useful.
Suppose an object contains an array. With PUT you can't update a specific value; you can only replace the whole list with a new one. With PATCH, you can replace one value with another. With maps and more complex objects the benefit is even bigger.
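As a sketch of that difference in effect, here is a toy in-memory resource, with a hand-rolled subset of JSON Patch (RFC 6902) standing in for a real HTTP stack (all names are hypothetical):

```python
import copy

resource = {"name": "report", "tags": ["draft", "internal"]}

def put(res, representation):
    # PUT: complete replacement - the stored state becomes exactly the payload.
    res.clear()
    res.update(copy.deepcopy(representation))

def patch(res, operations):
    # PATCH: apply a diff; here only the "replace" operation on paths
    # of the form "/field/index" is supported, for illustration.
    for op in operations:
        assert op["op"] == "replace"
        field, index = op["path"].lstrip("/").split("/")
        res[field][int(index)] = op["value"]

# Replace a single array element without touching the rest of the resource:
patch(resource, [{"op": "replace", "path": "/tags/1", "value": "public"}])
# resource["tags"] is now ["draft", "public"]
```

With PUT, achieving the same change would require sending the whole document, including every other field and array element.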

Assigning URIs to RDF Resources

I'm writing a desktop app using Gnome technologies, and I reached the
stage I started planning Semantic Desktop support.
After a lot of brainstorming, sketching ideas and models, writing notes
and reading a lot about RDF and related topics, I finally came up with a
plan draft.
The first thing I decided to do is to define the way I give URIs to
resources, and this is where I'd like to hear your advice.
My program consists of two parts:
1) On the lower level, an RDF schema is defined. It's a standard set of
classes and properties, possibly extended by users who want more options
(using a definition language translated to RDF).
2) On the high level, the user defines resources using those classes and
properties.
There's no problem with the lower level, because the data model is
public: Even if a user decides to add new content, she's very welcome to
share it and make other people's apps have more features. The problem is
with the second part. In the higher level, the user defines tasks,
meetings, appointments, plans and schedules. These may be private, and
the user may prefer not to have any info in the URI revealing the source
of the information.
So here are the questions I have on my mind:
1) Which URI scheme should I use? I don't have a website or any web
pages, so using http doesn't make sense. It also doesn't seem to make
sense to use any other standard IANA-registered URI. I've been
considering two options: Use some custom, my own, URI scheme name for
public resources, and use a bare URN for private ones, something like
this:
urn:random_name_i_made_up:some_private_resource_uuid
But I was wondering whether a custom URI scheme is a good decision, I'm
open to hear ideas from you :)
2) How to hide the private resources? On one hand, it may be very useful
for the URI to tell where a task came from, especially when tasks are
shared and delegated between people. On the other hand, it doesn't
consider privacy. Then I was thinking, can I/should I use two different
URI styles depending on user settings? This would create some
inconsistency. I'm not sure what to do here, since I don't have any
experience with URIs. Hopefully you have some advice for me.
1) Which URI scheme should I use?
I would advise the standard urn:uuid: followed by your resource UUID. Using standards is generally to be preferred over home-grown solutions!
2) How to hide the private resources?
Don't use different identifier schemes. Trying to bake authorization and access control into the identity scheme mixes the layers in a way that's bound to cause you pain in the future. For example, what happens if a user makes some currently private content (e.g. a draft) public once it is in its publishable form?
Have a single, uniform identifier solution, then provide one or more services that may or may not resolve a given identifier to a document, depending on context (user identity, metadata about the content itself, etc.). Yes, this is much like what an HTTP server would do, so you may want to reconsider whether to have an embedded HTTP service in your architecture. If not, the service you need will have many similarities to HTTP; you just need to be clear about the circumstances in which an identifier may be resolved to a document, what happens when that is either not possible or not permitted, and so on.
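A minimal sketch of that separation, assuming `urn:uuid:` identifiers and an in-memory store (all names are hypothetical; a real implementation would consult user identity and content metadata in richer ways):

```python
import uuid

class Store:
    """One uniform identifier scheme; access control lives in the resolver."""

    def __init__(self):
        self._docs = {}  # urn -> (document, is_private, owner)

    def add(self, document, owner, private=False):
        urn = uuid.uuid4().urn          # same scheme for public and private
        self._docs[urn] = (document, private, owner)
        return urn

    def resolve(self, urn, user):
        """Return the document, or None if unknown or not permitted."""
        entry = self._docs.get(urn)
        if entry is None:
            return None
        document, private, owner = entry
        if private and user != owner:
            return None                  # the URI itself reveals nothing
        return document
```

Because the URI carries no privacy information, flipping a draft to public is a metadata change in the store, not a re-identification of the resource.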
You may also want to consider where you're going to add the most value. Re-inventing the basic service access protocols may be a fun exercise, but your users may get more value if you re-use standard components at the basic service level, and concentrate instead on innovating and adding features once the user actually has access to the content objects.

Abstraction or not?

The other day I stumbled onto a rather old Usenet post by Linus Torvalds. It is the infamous "You are full of bull****" post in which he defends his choice of using plain C for Git over something more modern.
In particular this post made me think about the enormous amount of abstraction layers that accumulate one over the other where I work. Mine is a Windows .Net environment. I must say that I like C# and the .Net environment, it really makes most things easy.
Now, I come from a very different background made of Unix technologies like C and a plethora of scripting languages; to me, OOP is also just one, and not always the best, programming paradigm. I often struggle (in a working kind of way, of course!) with my colleagues (one in particular), because they appear to belong to the "any problem can be solved with an additional level of abstraction" church, while I'm more of the "keep it simple" school. I think there is a very different mental approach to problems that maybe comes from exposure to different cultures.
As a very simple example, for the first project I did here I needed some configuration for an application. I made a 10-row class to load and parse a txt file, located in the program's root dir, containing colon-separated key/value pairs, one per row. It worked.
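A loader along those lines might look like this (a sketch; the file name, comment syntax and other details are assumptions):

```python
def load_config(path="config.txt"):
    """Parse colon-separated key/value pairs, one per row."""
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip()
    return config
```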
In the end, to standardize the approach to the configuration problem, we now have a library to be located on every machine running each configured program that calls a service that, at startup, loads up an xml that contains the references to other xmls, one per application, that contain the configurations themselves.
Now, it is extensible and made up of fancy reusable abstractions, providers and all, but I still think that if one day we really happen to reuse part of it, the time it took to build would have let us write the needed code from scratch, or copy/paste the old code and modify it.
What are your thoughts about it? Can you point out some interesting reference dealing with the problem?
Thanks
Abstraction makes it easier to construct software and understand how it is put together, but it complicates fully understanding certain issues around performance and security, because the abstraction layers introduce certain kinds of complexity.
Torvalds' position is not absurd, but he is an extremist.
Simple answer: programming languages provide data structures and ways to combine them. Use these directly at first, do not abstract. If you find you have representation invariants to maintain that are at a high risk of being broken due to a large number of usage sites possibly outside your control, then consider abstraction.
To implement this, first provide functions and convert the call sites to use them without hiding the representation. Hide the data representation only when you're satisfied your functional representation is sufficient. Make sure at this time to document the invariant being protected.
An "extreme programming" version of this: do not abstract until you have test cases that break your program. If you think the invariant can be breached, write the case that breaks it first.
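The progression above can be sketched with a toy example (all names are illustrative):

```python
# Step 1: use the plain data structure directly at the call sites.
inventory = {"apples": 3}

# Step 2: once a representation invariant appears (counts must never go
# negative), route call sites through a function without hiding the dict.
def remove_item(inv, name, count):
    # The invariant is protected, and documented, in exactly one place.
    if inv.get(name, 0) < count:
        raise ValueError("not enough {} in stock".format(name))
    inv[name] -= count

# Step 3 (only if needed later): hide the dict behind a class once the
# functional interface has proven sufficient.
```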
Here's a similar question: https://stackoverflow.com/questions/1992279/abstraction-in-todays-languages-excited-or-sad.
I agree with @Steve Emmerson - 'Coders at Work' would give you some excellent perspective on this issue.

Best way to incorporate spell checkers with a build process

I try to externalize all strings (and other constants) used in any application I write, for many reasons that are probably second-nature to most stack-overflowers, but one thing I would like to have is the ability to automate spell checking of any user-visible strings. This poses a couple problems:
Not all strings are user-visible, and it's non-trivial to separate them and keep that separation in place (but it is possible)
Most, if not all, string externalization methods I've used involve significant text that will not pass a spell checker such as aspell/ispell (eg: theStrName="some string." and comments)
Many spellcheckers (once again, aspell/ispell) don't handle many words out of the box (generally technical terms, proper nouns, or just 'new' terminology, like metadata).
How do you incorporate something like this into your build procedures/test suites? It is not feasible to have someone manually spell check all the strings in an application each time they are changed -- and there is no chance that they will all be spelled correctly the first time.
We do it manually, if errors aren't picked up during testing then they're picked up by the QA team, or during localization by the translators, or during localization QA. Then we lodge a bug.
Most of our developers are not native English speakers, so it's not an uncommon problem for us. The number that slip through the cracks is so small that this is a satisfactory solution for us.
Nothing over a few hundred lines is ever 100% bug-free (well... maybe the odd piece of embedded code), just think of spelling mistakes as bugs and don't waste too much time on it.
As soon as your application matures, over 90% of strings won't change between releases, and it would be a reasonably trivial exercise to compare two versions of your resources, figure out what's new (check them first), what's changed/updated (check next) and what hasn't changed (no need to check these)
So think of it more like I need to check ALL of these manually the first time, and I'm only going to have to check 10% of them next time. Now ask yourself if you still really need to automate spell checking.
I can think of two ways to approach this semi-automatically:
Have the compiler help you differentiate between strings used in the UI and strings used elsewhere. Overload different variants of the string datatype depending on its purpose, and overload the output methods to only accept that type - that way you can create a fake UI that just outputs the UI strings, and do the spell checking on that.
If this is doable of course depends on the platform and the overall architecture of the application.
Another approach could be to simply update the spell checker's database with all the strings that appear in the code - comments, xpaths, table names, you name it - and regard them as perfectly cromulent. This will of course reduce the precision of the spell checking.
First thing, regarding string externalization - GNU gettext (if used properly) creates string files that contain almost no text other than the actual content of the strings (there are some headers, but it's easy to make a spell checker ignore them).
Second thing, what I would do is to run the spell checker in a continuous integration environment and have the errors fed externally, probably through a web interface but email will also work. Developers can then review the errors and either fix them in the code or use some easy interface to let the spell check know that a misspelling should be ignored (a web interface can integrate both the error view and the spell checker interface).
If you're using java and are storing your localized strings in resource bundles then you could check the Bundle.properties files and validate the bundle strings. You could also add a special comment annotation that your processor could use to determine if an entry should be skipped.
This method will allow you to give a hint as to the locale and provide a way of checking multiple languages within the one build process.
I can't answer how you would perform the actual spell checking itself, though I think what I've presented will guide you as to how to hook the spell checking into your process.
Use aspell. It's a programme, it's available for unixoids and cygwin, it can be run over lots of kinds of source code. Use it.
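As a sketch of how such a checker could be wired into a build step (the hard-coded word sets here are a stand-in for a real aspell invocation; all names are hypothetical):

```python
# Minimal sketch of a build-step checker. A project word list covers
# technical terms ("metadata", product names) that stock dictionaries miss.
DICTIONARY = {"some", "string", "welcome", "back"}  # stand-in for aspell
PROJECT_WORDS = {"metadata", "xpath"}               # local additions

def check_strings(strings):
    """Return the list of words not found in either word list."""
    known = DICTIONARY | PROJECT_WORDS
    errors = []
    for s in strings:
        for word in s.lower().replace(".", " ").split():
            if word not in known:
                errors.append(word)
    return errors

def main(strings):
    errors = check_strings(strings)
    if errors:
        print("misspelled:", ", ".join(errors))
        return 1  # a nonzero exit status fails the build
    return 0
```

In a real pipeline, the `check_strings` stand-in would shell out to aspell and the input would come from the externalized string files rather than a list.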
First point, please don't put it into your build process. I would be a vengeful coder if I (meaning my computer) had to spell check all the content on the site every time I tried to debug or build a new feature. I don't even think this kind of operation belongs in a unit test (you're testing a human interface, not a computerised one).
Second point, don't write a script. You're going to have so many false positives fall through the cracks that people will stop reading the reports and you are no better off than when you started.
Third point, this is probably most easily solved by having humans do it: QA team, copy writers, beta testers, translators, etc. All the big sites with internationalised content that I've built had the same process: we took the copy from the copy writers, sent it to the translating service/agency, put it into the persistence layer, and deployed it. Testers (QA, developers, PMs, designers, etc.) would find spelling or grammatical mistakes and lodge bug reports. There is just too much red tape and pairs of eyes for that many spelling/grammar errors to slip through.
Fourth point, there will always be spelling and grammar mistakes on your page. Even major newspaper web sites haven't gotten around this and they have whole office buildings filled with editors.
