BizTalk core format: redundant but fully standards-compliant, or "keep it simple, stupid"?

I'm quite new to BizTalk mapping, and my question is this:
Let's say I need to receive a UBL document, convert it to my BizTalk core format, and send out the same UBL. Of course I could map UBL -> UBL 1:1 without a core format, but I may need the core if I have to send out something else (EDIFACT, OIOXML, whatever), so I believe it's good practice to use one. The flow then looks like UBL -> incoming map -> core -> outgoing map -> UBL.
So the question is: what is the best practice to create core format schema?
My incoming file must meet all OIOUBL standards, so I have to use a predefined XSD schema. The same goes for the outgoing file.
On the other hand, I know for a fact that in my case this standard contains a lot of redundant fields. I will never use some of these fields or parameters; others are constants, so there's no need to store them, since we can just define default values in the outgoing map, and so on.
So my question is: what is the best practice for building the core schema? Is it better to use the full UBL XSD, which meets all standards even if it's redundant (this would simplify the incoming and outgoing maps, since I could just use a 1:1 mass copy), or is it better to KISS and simplify the core as much as possible, using only the fields I really need and adding others one by one as the need arises?
This question is not about code - just about what is the best practice.
Thanks. I would appreciate any advice.

The usual best practice is for the internal core (canonical) schema to match the inbound external schema, apart from the target namespace, so that you don't lose any data in the first mapping. Quite often, when you later need to send the data to another system and build another outbound map, the fields you require include one that you didn't need for the original system. Of course, every rule has exceptions, and it is a matter of judgement when it isn't appropriate to do this.


Protobuf 3 breaks contract additivity

I'm using Protobuf 3 along with gRPC in a distributed environment ("microservices").
Because Protobuf 3 does not support not-set/missing values, I have run into the following issue with contract additivity.
Imagine I have Service A and a couple of consumer services, B and C, owned by Team B and Team C.
If I add a field, say a boolean, to the contract of Service A, at first it will have its default value, which will be written, say, to the database as is.
Then Team B updates their service to use the updated contract and passes 'true' as the field value.
Team C still uses the old contract and calls the same service, so the value gets reset to false. But Team C didn't mean to do that; moreover, they weren't aware of that field at all.
Thus, Service A cannot extend its contract at all, because consumers that haven't been updated yet, for whatever reason, can corrupt data and Service A can do nothing about it.
In Thrift such things are handled with a single check (.isSet()).
There are dirty workarounds like wrapping primitives in objects, but that forces you to use library-implementation-specific checks by reference (at least in Java), which seems more like a poor hack than a robust solution. Also, eventually I'd have to wrap everything in wrappers, which, as you can imagine, is not a great solution either.
What best practices do you use to manage such situations in Protobuf 3 in 2017? How do you manage/coordinate contract updates between teams/services? Thanks
Note: this question is not exactly about how to implement detection of not-set/missing values, but rather about how to live without it and follow the Protobuf 3 philosophy.
I think the problem here is that trying to check for field presence this way is not really an idiomatic use of protocol buffers (not even in proto2). It sounds like you are trying to evolve your schema by adding new fields but not reading those new fields unless you're sure they came from an updated client. The idiomatic way is to do this instead: just make sure the defaults for the new fields are reasonable and maintain compatible behavior if they're not explicitly set. Then don't try to check for presence--just read the fields and older clients will get good default behavior.
To give you an example, let's say you're adding a new feature that can be enabled or disabled. The right way to do this would be to add a bool field in your request message called enable_new_feature. Since older clients don't know about this field, their requests will have this default to false and so they get the old behavior they're expecting. Adding a disable_new_feature field instead would probably be the wrong way to do it because then you would indeed break older clients by enabling something they didn't want.
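The behavior described above can be sketched in a few lines of Python. This is only a model of the idea, not generated protobuf code: `UpdateRequest`, `decode`, `handle` and the `user_id` field are hypothetical names, and a plain dataclass stands in for a proto3 message whose missing scalar fields read as their defaults.

```python
from dataclasses import dataclass, fields


@dataclass
class UpdateRequest:
    """Hypothetical stand-in for a proto3 request message."""
    user_id: int = 0
    # Field added in a later contract version. As with proto3 scalars,
    # an absent value reads as the default (False).
    enable_new_feature: bool = False


def decode(payload: dict) -> UpdateRequest:
    """Mimic decoding: unknown keys are ignored, missing keys take defaults."""
    known = {f.name for f in fields(UpdateRequest)}
    return UpdateRequest(**{k: v for k, v in payload.items() if k in known})


def handle(req: UpdateRequest) -> str:
    """Server logic that reads the field without checking for presence."""
    return "new behavior" if req.enable_new_feature else "old behavior"


# A client built against the old contract never sends the new field.
old_client_payload = {"user_id": 7}
# An updated client opts in explicitly.
new_client_payload = {"user_id": 7, "enable_new_feature": True}
```

Because the default value means "behave as before", the old client's request decodes to the old behavior without the server having to know which contract version the caller used.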
Using oneof looks like a better/cleaner alternative to wrappers. See this answer to a similar question:

How does one expose constants in a java google app engine Endpoints API?

Simple question -- how do you expose constants in a java google app engine Endpoints API?
public static final int CODE_FOO = 3845;
I'd like the client of the Endpoints to be able to match on CODE_FOO rather than on 3845. I'll end up doing enum wrappers (which probably is better anyway) but I'm just starting to be curious if this is even doable? Thx
Note that this isn't a full answer, but here is a workaround: in Android Studio, create a very lightweight "common" Java project and put anything you want to keep in sync there, such as constants and common types that you want exposed (e.g. an enum representing all possible return/error codes).
This way you should get pretty decent compile-time safety and keep these definitions in sync.
Please feel free to comment if anyone has better suggestions.
This is unfortunately a Law of Information (ahem). If you have a message protocol you defined, both sides of the interaction need to be aware of the messages that could be passed. There's no other way for the client to be aware of what it needs to respond to. Ajax libraries hard-code the number "200" to be able to detect a successful request, as one example.
Yes, just use a switch statement on strings inside your client code. Or, you could use a dictionary of strings pointing to functions and just call the function after de-referencing the dictionary given the string you got.
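As a rough illustration of the dictionary-of-functions idea, here is a minimal Python sketch. The code name `CODE_FOO` comes from the question; `CODE_BAR`, the handler functions and `dispatch` are made up for the example.

```python
def handle_foo() -> str:
    return "handled foo"


def handle_bar() -> str:
    return "handled bar"


# Map the string codes the server sends to client-side handler functions.
# "CODE_BAR" is a hypothetical second code, added only for illustration.
HANDLERS = {
    "CODE_FOO": handle_foo,
    "CODE_BAR": handle_bar,
}


def dispatch(code: str) -> str:
    """Look up the handler for a code and call it; report unknown codes."""
    handler = HANDLERS.get(code)
    if handler is None:
        return f"unknown code: {code}"
    return handler()
```

The client then matches on the readable name rather than on 3845, although both sides still have to agree on the set of names.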

REST design: what verb and resource name to use for a filtering service

I am developing a cleanup/filtering service with a method that receives a list of objects serialized as XML and applies some filtering rules to return a subset of those objects.
In a RESTful service, what verb should I use for such a method? I thought GET was the natural choice, but then I have to put the serialized XML in the body of the request, which works but feels incorrect. The other verbs don't seem to fit semantically.
What is a good way to define that service interface? Naming the resource /Cleanup or /Filter seems weird, mainly because in the examples I see online the resource name is always a noun rather than a verb.
Am I right to feel that REST services are better suited to CRUD operations, and that you start bending the rules in situations like this one? If so, am I making the wrong architectural choice?
I pushed to develop this service in a RESTful style (as opposed to SOAP) for simplicity, but awkward cases like this come up a lot and make me feel like I am missing something. Am I choosing REST where it shouldn't be used, or maybe overthinking things that don't really matter? In that case, what really matters?
REST is about using HTTP the way it was designed. To be RESTful consider (title was REST design :):
URLs should be permalinks to a resource (caching benefits, storing/sharing endpoints etc...)
Because they are permalinks to a resource, having verbs in the URL is a hint that you're on the wrong path (filter is a verb).
A collection of resources can be an endpoint /foos.
If you want to filter the collection of resources, consider querystring params like ?filter= or something like ?ids=1,2,3,4,5.
A GET should not change resources. Note that 'cleanup' implies something getting deleted, so be cautious of changes to resources when you do a GET. REST says a GET shouldn't alter resources. Imagine a caching server receiving your cleanup request as a GET and returning OK because it's cached. Caching servers know not to cache a POST, DELETE, etc. (that's the way HTTP was designed).
Don't rule out multiple calls. For example, you may do a GET to filter and obtain the set of resources to clean up, followed by one or many DELETE calls to do the cleanup.
Sometimes there's a temporal resource, like a transaction or a 'job', that could do work like a cleanup. Don't rule out a POST to that resource with the body containing the items to clean up, which returns a job id. You can then query the job id for the cleanup progress or status.
It's hard to give exact guidance because the question isn't clear, but hopefully the RESTful principles and thoughts above set you on the right track. If you clarify the exact calls, I'll try to recommend APIs.
So, let's say you wanted to cleanup duplicate foos.
[GET] /foos/duplicates (or /foos?filter=duplicates)
returns a body with the identifiers of the foos that are duplicates. Let's say that returns 1,2,5 (they could be names).
Then you could issue:
[DELETE] /foos with the body being an array containing 1,2,5 (or names if unique). The DELETE call is idempotent, so even if the GET response was cached, it's fine according to REST principles.
It's also possible and valid not to go the REST route and to use something like POX or JSON-RPC over HTTP; just realize that at that point it's not REST. That's fine, but you're not getting the benefits of REST described in Fielding's thesis.
Also, read this:
After reading the comment where you clarified you're sending the server a set of objects (not persisted server side) and it returns the subset with the dupes filtered out (like a server side helper function), some options are:
Do this client/browser side if possible. Why take the network round trip to filter dupes out of a collection?
If for some reason only the server has the specific knowledge/data to determine that two items are functionally equivalent (even though the data is not exactly the same), then consider POSTing the data set to the server, with the response body containing the unique/filtered set. Even though the server isn't persisting the set, it would fall into the category of a 'temporal' object or set that the server is modifying. It's not conceptually a GET of server resources, and caching offers no benefits in that scenario.
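The POST option can be sketched as a plain function that a handler would call. This is a minimal sketch under stated assumptions: the body is JSON rather than the XML used in the question (purely to keep the example short), and `equivalence_key`, `filter_duplicates`, `handle_post` and the `name` field are hypothetical, with a made-up rule that two items are equivalent when their names match after trimming and case-folding.

```python
import json


def equivalence_key(item: dict) -> str:
    """Hypothetical server-side rule for 'functionally equivalent' items."""
    return item["name"].strip().casefold()


def filter_duplicates(items: list) -> list:
    """Keep the first item of each equivalence class, preserving order."""
    seen = set()
    unique = []
    for item in items:
        key = equivalence_key(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def handle_post(body: str) -> str:
    """What a POST handler might do: parse the posted set, return the subset."""
    return json.dumps(filter_duplicates(json.loads(body)))
```

Nothing is persisted and nothing is cacheable here, which is why POST fits better than GET: the request body is the input, and the response body is the filtered set.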
Last question first: What really matters is getting the job done in a way that is
As easy to use as practical
Easily maintained by future programmers (likely to include yourself)
REST is a natural fit for operations on resources where each URL matches some object that can be manipulated. It is a less natural fit for other uses, but these are more guidelines than actual rules. Others have pointed out the original dissertation on REST, but it is worth remembering that few implementations are pure.
If you have several URLs that perform these transformative kinds of functions, consider putting them in their own special URL space, like /api/filter and /api/transliterate. That will help users and maintainers alike know that certain URLs aren't REST but are more like remote procedure calls. Posting data to these URLs results in you getting some kind of data back.
If you get stuck on specific names, make a list of candidates, have a few beers, then choose one from the list. That's what I do when I get stuck on minutiae.
SOAP is a neat protocol and has its uses, but it tends to be very heavy. Good documentation and consistency are probably more important to your budding API than using any specific technology.

Database structure of a triple store?

I want to use RDF / triples in my Symfony2 project in order to organize things (in my case it is Tags).
I would see something like this :
ENTITY TAG <-------------- TAG_TAG --------------> ASSOCIATION_TYPE
^ |
Fields :
Tag (text)
Description (text/html)
With this structure, I would be able:
To store triple associations
To set different association types. For example, PHP is a Programming_language; is a website; but the Earth turns around the Sun.
To set parameters (which allow giving more information inside associations)
We could consider a many-to-many relation between TAG_TAG and ASSOCIATION_TYPE. That way we could set several parameters.
So I have several questions:
Do you think this is a good way to store triples efficiently?
Is there any RDF layer to extract existing RDF/triples databases and populate my own?
Should I consider using some kind of triple store like Sesame with Symfony?
To answer your questions:
1) I'm not entirely sure what you're asking. If you're asking if that's a reasonable way to model your data, it's probably ok. But your diagram is not clear and you're a bit light on details. Best thing to do is just do something that works to start with. You can improve the modeling later without much of a hassle.
If you're asking about storage of triples, don't. See my response to #3.
2) There are many RDF libraries available: Jena and Sesame in Java, dotNetRDF for the .NET world, RDFLib in Python, Redland for C, etc.
3) Yes. Don't attempt to re-invent the wheel and build your own triple store. It's not an easy project and you won't do better than even the worst existing triple store on any reasonable time scale.
As Michael said - please don't build your own triple store! There are several solutions available in PHP:
ARC2 provides a triple store based on MySQL
The librdf extension provides a PHP wrapper for the standard RDF C library
The Erfurt Library is an abstraction library for connecting to the open source Virtuoso Server triple store, but it also has its own triple store based on Zend DB that can be used with MySQL.

node_load or direct query?

What rule of thumb do you use for deciding to use node_load() or just writing a direct db_query()?
In a situation I'm looking at right now I need to get some node data and resolve data on two nodereference fields. So that would be 3 calls to node_load(). At some point here, would it be more efficient to construct the query with Joins directly?
This is for use in a self contained module that won't be distributed or used anywhere else, so I don't believe I need to worry about subverting node modification hooks (or do I?).
Thinking about my question more, node_load() is only really applicable when you have one node to grab (and then maybe drilling down further into nodereferences like in my example). But as soon as you need to return more than one node based on some criteria, you're pretty much forced to use db_query right? Does Drupal have any abstracted API for writing queries like this?
Not a full answer (Not sure myself), just some hints.
node_load() is using a static cache (in Drupal 7, you can even use the entity_cache module to make it a permanent cache). If the nodes you are loading are being used a second time on the same page, that call will be free.
Querying CCK-tables is tricky. The schema structure can change completely based on configuration, for example when using a single or multiple values.
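The static-cache hint above can be illustrated with a small analogy in Python. This is not Drupal code: `node_load` here is a toy function that only mirrors the idea of a per-request cache, and `fetch_node_from_db`, `stats` and the node fields are invented for the sketch.

```python
# Per-request cache, analogous in spirit to a static cache keyed by node id.
_node_cache = {}

# Counter used only to make the number of simulated queries visible.
stats = {"queries": 0}


def fetch_node_from_db(nid: int) -> dict:
    """Stand-in for the real database work behind a node load."""
    stats["queries"] += 1
    return {"nid": nid, "title": f"Node {nid}"}


def node_load(nid: int) -> dict:
    """Return the cached node if present; hit the 'database' only once."""
    if nid not in _node_cache:
        _node_cache[nid] = fetch_node_from_db(nid)
    return _node_cache[nid]
```

Loading the same node twice costs one simulated query, so if the referenced nodes are already used elsewhere on the page, the extra loads add little, whereas a hand-written join bypasses that cache.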
The reasoning behind using API methods rather than direct DB calls is that they provide a DB abstraction layer, so your app can move between supported database engines. It also lets your app gracefully handle any schema changes (however unlikely) that core or a module may make to the tables in question. It's also likely easier, as @Berdir says, for CCK fields and node reference fields, but that depends on which you are more confident with: the Drupal API & PHP, or MySQL. The payoff of doing it the Drupal way is increased future productivity and a better understanding of the codebase and what is possible :)
Oh and my rule of thumb is - Do it the Drupal way if at all possible (possible being variable depending on app time/cost/performance/whatever requirements)
