Where can I find research data that proves best practices for creating public APIs? - api-design

I need to persuade management (product management and others) that just "publicising" internal private APIs is a bad idea compared to the best practice of creating a public API candidate, use it internally and when satisfied make it public. Can anyone help me find some facts like research papers that helps me make the argument?

I'm not aware of any specific research since the public interface to any API is highly subjective and specifically tied to a problem domain.
The first few pages of this pdf are an ok overview of an API for a business person:
http://aarontgrogg.com/wp-content/uploads/2009/09/How-to-Build-API-and-why-it-matters.pdf
This blog posts section header's highlight key points that your business partners need to think about as I think you're aware already. I would search for best practices around these specific subjects as they pertain to a public API: http://gaejexperiments.wordpress.com/2010/07/01/public-api-design-factors/
API Format Rest vs WebServices
Response Format XML, JSON
Service Contract Description
Authentication Mechanism for the Consumers of the API
Service Versioning (so you can roll out new versions of the API without blowing everyone up)
Rate Limits (obviously, for any number of things, preventing DOS attacks, and just managing system load)
Documentation
Helper Libraries
Website for the public api
Depending on what type of API it is... A SUPPORT TEAM
This doesn't address your internal processes either. Should your internal systems be able to evolve faster than the public api? In most cases I think the answer is yes, as your company wants to be agile with their business model and strategy. Having 3rd parties consume your internal systems is going to force your company to make a decision of who's more important when it's time to make an update. Either your company will have to version it's internal service and hope the third party consumers upgrade in a timely fashion, or just break the integration for all the third party consumers.
At the end of the day, it might not be worth doing. You can only screw over the people using your API so many times before they stop using it. What good is it if no one uses it.
I have been in the position before where the business has wanted an API pushed out too fast and without any governance around it. It resulted in all of my time being spent supporting people who were integrating with our API, and writing code samples for them.

Related

When a G-Suite form is embedded on external website, does any form data get stored on the host site?

This question comes up because of very specific HIPAA requirements. A Covered Entity(CE) eg, doctor can't use a cloud storage provider (CSP) unless they have a Business Associate Agreement (BAA) with the CSP, even if the data are encrypted and the CSP has no access. I'm not a security expert, but most web hosts' security would IMO satisfy HIPAA, IF there were a BAA.
There's a conduit exception for video, ISPs, and other electronic equivalents of USPS that do not store electronic Protected Health Information (e-PHI.)
I don't know why, but the web hosts who will sign a BAA charge $100-300/ mo for very basic hosting other sites charge $5-15/mo for. I think they're preying on CE ignorance with the perception there's lots of money sloshing around, true for radiology, but not for primary care.
G-Suite will execute a BAA, which makes G-Suite a reasonably-priced solution for gathering Protected Health Information (PHI) patient input, while keeping the CE compliant with HIPAA.
It's worth noting that "HIPAA compliance" is ONLY a property of CEs and Electronic Medical Records, not other software or sites. Any other product or service claiming "HIPAA compliance" is misrepresenting itself.
I find Google Sites not as user-friendly as most web hosts. There's less hand-holding for doing things like installing WP add-ins, or adding SSL certificates. Or maybe Google just does a terrible job of explaining how to actually DO something with a site hosted there. In any case, it seems easier to run a website on a web host that's set up to manage software and WP plug-ins for amateurs.
I'm willing to be educated on this. (24 hours later--I did a lot of self-education-see answer below.)
The basic HIPAA privacy requirements are rather simple:
CEs can use PHI to treat and carry out essential functions, but must
not share it with anyone not entitled to it.
The basic HIPAA security requirements are also simple:
Make a security risk analysis.
Implement reasonable security measures and
Document why various measures were taken or not.
Some elements are required, others must simply be addressed, evaluated and documented.
For example, 2FA is "addressable" as is data encryption, but making an analysis, having physical security and employee training are required.
So my question is whether a G-Suite form embedded in a website on another web host stores any data on that web host, or does it all go back to G-Suite, eg G-Drive, where it's secure and covered by a BAA?
The problem when you know very little about a topic is, you don't know what to ask. I know a bunch about HIPAA, not much about HTML. I did a lot more research, and there's at least two answers.
The short answer is, NO, the embedded frame is an iframe HTTPS linked to G-Suite.
The form in the iframe is a window into docs.google.com, so data never gets off docs.google.com, where it's covered by G-Suite's BAA. The host site is in effect a conduit.
<iframe src="https://docs.google.com/forms......…</iframe>
Note https
Embedding the form does not create a HIPAA violation.
The second answer is, G-Suite has its own content management system and website builder, which requires very little technical skill. Thus there's no need to install Wordpress or anything else, you just drag-and-drop to create a site. All the back end stuff is done for you. Duh. And they execute a BAA, all for $6 a month. So G-Suite is much simpler, in fact so simple that only a child can do it. Their help pages leave much to be desired.
Bottom line--for small covered entities, G Suite is a very economical website solution that doesn't create a HIPAA violation. Wish I knew this yesterday!
FYI: HIPAA compliant Cloud Services

Where do APIs get their information from

After some time being working with Restful APIs I would like to know a bit more about their internal functionality.
I would like a simple explanation about how the API`s get access to the data that they provide as responses to our requests.
There are APIs, for example weather API`s or sports APIs that are capable to provide responses with very recent data (such as sports results), I am wondering where or how they get that updated info almost as soon as it is available.
I have seen here on SO questions with answers pointing to API design tutorials, but not to this particular topic.
An API is usually simply a facade (or an interface if you prefer) to some information resource. The idea behind it is to "hide" any complexity from the user, to unify several services to a single access point or even to keep the details about the implementation of the actual service a secret.
This being said you probably understand now that there can't be one definitive answer to the question "where do APIs get their info from?". But some common answers are:
other APIs
some proprietary/in-house developed service/database
etc.
For sports APIs - probably they are being provided by some sports media, which has the results as soon as they get out, so they just enter them in their DB and immediately they become available through their API.
For weather forecasts - again as with the sports API they are probably provided by a company dealing with weather forecasts.
If it's easier for you you can think of the "read-only" APIs as rss feeds in a way.
I hope this clears the things a bit for you.
You could have a look at Stack Share to see what companies use for databases and whatnot. But there isn't a universal answer, every company uses whatever works for them.
This usually means that te company has its own database in which the data is stored. But they might also get their data from another company.
But a 'database' is not just SQL, maybe they use unstructured data or any of the other options to store data.
That's where the "whatever works" comes from. The company chooses a solution they go with which best fits their needs.

Orchestrating microservices

What is the standard pattern of orchestrating microservices?
If a microservice only knows about its own domain, but there is a flow of data that requires that multiple services interact in some manner, what's the way to go about it?
Let's say we have something like this:
Invoicing
Shipment
And for the sake of the argument, let's say that once an order has been shipped, the invoice should be created.
Somewhere, someone presses a button in a GUI, "I'm done, let's do this!"
In a classic monolith service architecture, I'd say that there is either an ESB handling this, or the Shipment service has knowledge of the invoice service and just calls that.
But what is the way people deal with this in this brave new world of microservices?
I do get that this could be considered highly opinion-based. but there is a concrete side to it, as microservices are not supposed to do the above.
So there has to be a "what should it by definition do instead", which is not opinion-based.
Shoot.
The Book Building Microservices describes in detail the styles mentioned by #RogerAlsing in his answer.
On page 43 under Orchestration vs Choreography the book says:
As we start to model more and more complex logic, we have to deal with
the problem of managing business processes that stretch across the
boundary of individual services. And with microservices, we’ll hit
this limit sooner than usual. [...] When it comes to actually
implementing this flow, there are two styles of architecture we could
follow. With orchestration, we rely on a central brain to guide and
drive the process, much like the conductor in an orchestra. With
choreography, we inform each part of the system of its job and let it
work out the details, like dancers all find‐ ing their way and
reacting to others around them in a ballet.
The book then proceeds to explain the two styles. The orchestration style corresponds more to the SOA idea of orchestration/task services, whereas the choreography style corresponds to the dumb pipes and smart endpoints mentioned in Martin Fowler's article.
Orchestration Style
Under this style, the book above mentions:
Let’s think about what an orchestration solution would look like for
this flow. Here, probably the simplest thing to do would be to have
our customer service act as the central brain. On creation, it talks
to the loyalty points bank, email service, and postal service [...],
through a series of request/response calls. The
customer service itself can then track where a customer is in this
process. It can check to see if the customer’s account has been set
up, or the email sent, or the post delivered. We get to take the
flowchart [...] and model it directly into code. We could even use
tooling that implements this for us, perhaps using an appropriate
rules engine. Commercial tools exist for this very purpose in the form
of business process modeling software. Assuming we use synchronous
request/response, we could even know if each stage has worked [...]
The downside to this orchestration approach is that the customer
service can become too much of a central governing authority. It can
become the hub in the middle of a web and a central point where logic
starts to live. I have seen this approach result in a small number of
smart “god” services telling anemic CRUD-based services what to do.
Note: I suppose that when the author mentions tooling he's referring to something like BPM (e.g. Activity, Apache ODE, Camunda). As a matter of fact, the Workflow Patterns Website has an awesome set of patterns to do this kind of orchestration and it also offers evaluation details of different vendor tools that help to implement it this way. I don't think the author implies one is required to use one of these tools to implement this style of integration though, other lightweight orchestration frameworks could be used e.g. Spring Integration, Apache Camel or Mule ESB
However, other books I've read on the topic of Microservices and in general the majority of articles I've found in the web seem to disfavor this approach of orchestration and instead suggest using the next one.
Choreography Style
Under choreography style the author says:
With a choreographed approach, we could instead just have the customer
service emit an event in an asynchronous manner, saying Customer
created. The email service, postal service, and loyalty points bank
then just subscribe to these events and react accordingly [...]
This approach is significantly more decoupled. If some
other service needed to reach to the creation of a customer, it just
needs to subscribe to the events and do its job when needed. The
downside is that the explicit view of the business process we see in
[the workflow] is now only implicitly reflected in our system [...]
This means additional work is needed to ensure that you can monitor
and track that the right things have happened. For example, would you
know if the loyalty points bank had a bug and for some reason didn’t
set up the correct account? One approach I like for dealing with this
is to build a monitoring system that explicitly matches the view of
the business process in [the workflow], but then tracks what each of
the services do as independent entities, letting you see odd
exceptions mapped onto the more explicit process flow. The [flowchart]
[...] isn’t the driving force, but just one lens through
which we can see how the system is behaving. In general, I have found
that systems that tend more toward the choreographed approach are more
loosely coupled, and are more flexible and amenable to change. You do
need to do extra work to monitor and track the processes across system
boundaries, however. I have found most heavily orchestrated
implementations to be extremely brittle, with a higher cost of change.
With that in mind, I strongly prefer aiming for a choreographed
system, where each service is smart enough to understand its role in
the whole dance.
Note: To this day I'm still not sure if choreography is just another name for event-driven architecture (EDA), but if EDA is just one way to do it, what are the other ways? (Also see What do you mean by "Event-Driven"? and The Meanings of Event-Driven Architecture). Also, it seems that things like CQRS and EventSourcing resonate a lot with this architectural style, right?
Now, after this comes the fun. The Microservices book does not assume microservices are going to be implemented with REST. As a matter of fact in the next section in the book, they proceed to consider RPC and SOA-based solutions and finally REST. An important point here is that Microservices does not imply REST.
So, What About HATEOAS? (Hypermedia as the Engine of Application State)
Now, if we want to follow the RESTful approach we cannot ignore HATEOAS or Roy Fielding will be very much pleased to say in his blog that our solution is not truly REST. See his blog post on REST API Must be Hypertext Driven:
I am getting frustrated by the number of people calling any HTTP-based
interface a REST API. What needs to be done to make the REST
architectural style clear on the notion that hypertext is a
constraint? In other words, if the engine of application state (and
hence the API) is not being driven by hypertext, then it cannot be
RESTful and cannot be a REST API. Period. Is there some broken manual
somewhere that needs to be fixed?
So, as you can see, Fielding thinks that without HATEOAS you are not truly building RESTful applications. For Fielding, HATEOAS is the way to go when it comes to orchestrating services. I am just learning all this, but to me, HATEOAS does not clearly define who or what is the driving force behind actually following the links. In a UI that could be the user, but in computer-to-computer interactions, I suppose that needs to be done by a higher level service.
According to HATEOAS, the only link the API consumer truly needs to know is the one that initiates the communication with the server (e.g. POST /order). From this point on, REST is going to conduct the flow, because, in the response of this endpoint, the resource returned will contain the links to the next possible states. The API consumer then decides what link to follow and move the application to the next state.
Despite how cool that sounds, the client still needs to know if the link must be POSTed, PUTed, GETed, PATCHed, etc. And the client still needs to decide what payload to pass. The client still needs to be aware of what to do if that fails (retry, compensate, cancel, etc.).
I am fairly new to all this, but for me, from HATEOAs perspective, this client, or API consumer is a high order service. If we think it from the perspective of a human, you can imagine an end-user on a web page, deciding what links to follow, but still, the programmer of the web page had to decide what method to use to invoke the links, and what payload to pass. So, to my point, in a computer-to-computer interaction, the computer takes the role of the end-user. Once more this is what we call an orchestrations service.
I suppose we can use HATEOAS with either orchestration or choreography.
The API Gateway Pattern
Another interesting pattern is suggested by Chris Richardson who also proposed what he called an API Gateway Pattern.
In a monolithic architecture, clients of the application, such as web
browsers and native applications, make HTTP requests via a load
balancer to one of N identical instances of the application. But in a
microservice architecture, the monolith has been replaced by a
collection of services. Consequently, a key question we need to answer
is what do the clients interact with?
An application client, such as a native mobile application, could make
RESTful HTTP requests to the individual services [...] On the surface
this might seem attractive. However, there is likely to be a
significant mismatch in granularity between the APIs of the individual
services and data required by the clients. For example, displaying one
web page could potentially require calls to large numbers of services.
Amazon.com, for example,
describes how some
pages require calls to 100+ services. Making that many requests, even
over a high-speed internet connection, let alone a lower-bandwidth,
higher-latency mobile network, would be very inefficient and result in
a poor user experience.
A much better approach is for clients to make a small number of
requests per-page, perhaps as few as one, over the Internet to a
front-end server known as an API gateway.
The API gateway sits between the application’s clients and the
microservices. It provides APIs that are tailored to the client. The
API gateway provides a coarse-grained API to mobile clients and a
finer-grained API to desktop clients that use a high-performance
network. In this example, the desktop clients make multiple requests
to retrieve information about a product, whereas a mobile client
makes a single request.
The API gateway handles incoming requests by making requests to some
number of microservices over the high-performance LAN. Netflix, for
example,
describes
how each request fans out to on average six backend services. In this
example, fine-grained requests from a desktop client are simply
proxied to the corresponding service, whereas each coarse-grained
request from a mobile client is handled by aggregating the results of
calling multiple services.
Not only does the API gateway optimize communication between clients
and the application, but it also encapsulates the details of the
microservices. This enables the microservices to evolve without
impacting the clients. For example, two microservices might be
merged. Another microservice might be partitioned into two or more
services. Only the API gateway needs to be updated to reflect these
changes. The clients are unaffected.
Now that we have looked at how the API gateway mediates between the
application and its clients, let’s now look at how to implement
communication between microservices.
This sounds pretty similar to the orchestration style mentioned above, just with a slightly different intent, in this case, it seems to be all about performance and simplification of interactions.
Trying to aggregate the different approaches here.
Domain Events
The dominant approach for this seems to be using domain events, where each service publish events regarding what have happened and other services can subscribe to those events.
This seems to go hand in hand with the concept of smart endpoints, dumb pipes that is described by Martin Fowler here: http://martinfowler.com/articles/microservices.html#SmartEndpointsAndDumbPipes
Proxy
Another apporach that seems common is to wrap the business flow in its own service.
Where the proxy orchestrates the interaction between the microservices like shown in the below picture:
.
Other patterns of the composition
This page contains various composition patterns.
So, how is orchestration of microservices different from orchestration of old SOA services that are not “micro”? Not much at all.
Microservices usually communicate using http (REST) or messaging/events. Orchestration is often associated with orchestration platforms that allow you to create a scripted interaction among services to automate workflows. In the old SOA days, these platforms used WS-BPEL. Today's tools don't use BPEL. Examples of modern orchestration products: Netflix Conductor, Camunda, Zeebe, Azure Logic Apps, Baker.
Keep in mind that orchestration is a compound pattern that offers several capabilities to create complex compositions of services. Microservices are more often seen as services that should not participate in complex compositions and rather be more autonomous.
I can see a microservice being invoked in an orchestrated workflow to do some simple processing, but I don’t see a microservice being the orchestrator service, which often uses mechanisms such as compensating transactions and state repository (dehydration).
So you're having two services:
Invoice micro service
Shipment micro service
In real life, you would have something where you hold the order state. Let's call it order service. Next you have order processing use cases, which know what to do when the order transitions from one state to another. All these services contain a certain set of data, and now you need something else, that does all the coordination. This might be:
A simple GUI knowing all your services and implementing the use cases ("I'm done" calls the shipment service)
A business process engine, which waits for an "I'm done" event. This engine implements the use cases and the flow.
An orchestration micro service, let's say the order processing service itself that knows the flow/use cases of your domain
Anything else I did not think about yet
The main point with this is that the control is external. This is because all your application components are individual building blocks, loosely coupled. If your use cases change, you have to alter one component in one place, which is the orchestration component. If you add a different order flow, you can easily add another orchestrator that does not interfere with the first one. The micro service thinking is not only about scalability and doing fancy REST API's but also about a clear structure, reduced dependencies between components and reuse of common data and functionality that are shared throughout your business.
HTH, Mark
If the State needs to be managed then the Event Sourcing with CQRS is the ideal way of communication. Else, an Asynchronous messaging system (AMQP) can be used for inter microservice communication.
From your question, it is clear that the ES with CQRS should be the right mix. If using java, take a look at Axon framework. Or build a custom solution using Kafka or RabbitMQ.
You can implement orchestration by using spring State machine model.
Steps
Add below dependency to your project ( if you are using Maven)
<dependency>
<groupId>org.springframework.statemachine</groupId>
<artifactId>spring-statemachine-core</artifactId>
<version>2.2.0.RELEASE</version>
</dependency>
Define states and events e.g. State 1, State 2 and Event 1 and Event 2
Provide state machine implementation in buildMachine() method.
configureStates
configureTransitions
Send events to state machine
Refer to documentation page for complete code
i have written few posts on this topic:
Maybe these posts can also help:
API Gateway pattern - Course-grained api vs fine-grained apis
https://www.linkedin.com/pulse/api-gateway-pattern-ronen-hamias/
https://www.linkedin.com/pulse/successfulapi-ronen-hamias/
Coarse-grained vs Fine-grained service API
By definition a coarse-grained service operation has broader scope than a fine-grained service, although the terms are relative. coarse-grained increased design complexity but can reduce the number of calls required to complete a task. at micro-services architecture coarse-grained may reside at the API Gateway layer and orchestrate several micro-services to complete specific business operation. coarse-grained APIs needs to be carefully designed as involving several micro-services that managing different domain of expertise has a risk to mix-concerns in single API and breaking the rules described above. coarse-grained APIs may suggest new level of granularity for business functions that where not exist otherwise. for example hire employee may involve two microservices calls to HR system to create employee ID and another call to LDAP system to create a user account. alternatively client may have performed two fine-grained API calls to achieve the same task. while coarse-grained represents business use-case create user account, fine-grained API represent the capabilities involved in such task. further more fine-grained API may involve different technologies and communication protocols while coarse-grained abstract them into unified flow. when designing a system consider both as again there is no golden approach that solve everything and there is trad-off for each. Coarse-grained are particularly suited as services to be consumed in other Business contexts, such as other applications, line of business or even by other organizations across the own Enterprise boundaries (typical B2B scenarios).
the answer to the original question is SAGA pattern.

Is Firebase an all-purpose database?

I've been reading about Firebase and playing with it for a short while. The idea (BAAS) and implementation are impressive, and having programmed with Javascript it seems a viable choice. Not having to deal with scaling and other server side concerns makes it even more attractive.
My question is: generally speaking, is Firebase a first class back-end candidate for any average data-based application? e.g. billing, CRM, e-commerce, social, location based, etc. I do not include super light or heavy extremes such as a basic chat, or a nuclear plant monitor...
The answer may not be a clear yes/no, but was it built to support the general application space, or just stand out as a real-time read/write data service?
Would appreciate answers based on experience and existing production applications.
Thanks
Yes, Firebase is intended to be a first class back-end for any data based Web, iOS or Android application. The service offers real-time data reads and writes, but also comes with a powerful and flexible security system that allows you to write secure client-only apps, without needing any server code to enforce data boundaries.
There are several apps in production listed on the front page as customer and on the app showcase page on https://firebase.google.com/customers/
Firebase is now more capable and is considered as a full stand-alone back-end, especially after the introduction of cloud function. https://firebase.google.com/docs/functions/
Firebase may not have support for transaction spanning multiple business objects.
e.g. When a sales order is booked then it needs to update inventory for multiple items, update billing in receivables, give sales credit to multiple sales persons etc.
Firebase team is supposed to come up with a database trigger option which will make all these happen.

Will Google block my access if I use their features without token?

I'm using this link https://www.google.com/reader/api/0/stream/contents/feed/FEEDHERE?output=json&n=20
to fetch feeds using Google's algorithm. As you can see I'm not adding any other parameters, just fetching the returned data in JSON format. My app will be heavily used hopefully and if I send a lot of requests to this link, will Google block my access or something?
Is there anything I can include, like userip, url for my app (so if they have problem to just contact me) or something else?
The most basic answer to your question is that Google will change its Terms of Service whenever it likes, and you've got no say in the matter. So if it's allowed today, it might not be allowed tomorrow, at Google's whim.
On this issue, though, you seem fairly safe. From the Terms of Service (these is the general document, since Reader doesn't seem to have a specific one):
Don’t misuse our Services. For example, don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide.
Google provides RSS and Atom. They provide these feeds, so I assume they expect that they'll be used. They don't say that it's a misuse to point someone else at those feeds, so it looks OK for now, but they could add such a clause at any time.
All online services are subject to the terms and conditions of the providers of those services. So, as others have said, they may be ok with your use today, but they can change their mind any time down the line. I doubt including a URL or email or contact info will help anything, because when these services change, they don't notify every user of the service, they just announce the change publicly, and usually they give several month's notice in order to give users a chance to adapt their applications, but this is not standardized or enforced so there is no guarantee. One example would be the fairly recent discontinuance of the Google Finance API (for which no replacement has been announced).
The safest approach would be to design your app such that this feature that uses google's functionality is decoupled as much as possible from the rest of your app, so that, when or if the availability of the service changes (ie it's no longer available at all) you can adapt your app to use some other source for the feeds with minimal impact to the rest of the app. Design for change and plan for the worst.

Resources