Where do APIs get their information from? - api-design

After some time working with RESTful APIs, I would like to know a bit more about how they work internally.
I would like a simple explanation of how APIs get access to the data that they return in response to our requests.
Some APIs, for example weather or sports APIs, are able to respond with very recent data (such as sports results). I am wondering where or how they get that updated information almost as soon as it becomes available.
I have seen questions here on SO with answers pointing to API design tutorials, but none addressing this particular topic.

An API is usually simply a facade (or an interface if you prefer) to some information resource. The idea behind it is to "hide" any complexity from the user, to unify several services to a single access point or even to keep the details about the implementation of the actual service a secret.
That being said, you can probably see that there is no single definitive answer to the question "where do APIs get their info from?" Some common answers are:
other APIs
some proprietary/in-house developed service/database
etc.
For sports APIs - they are probably provided by some sports media company that has the results as soon as they come out, so they just enter them into their DB and the results immediately become available through the API.
For weather forecasts - as with the sports APIs, they are probably provided by a company that deals in weather forecasting.
If it helps, you can think of such "read-only" APIs as RSS feeds, in a way.
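To make the "facade" idea concrete, here is a minimal, purely illustrative sketch (Python/Flask, with an invented in-memory "database" standing in for whatever proprietary store the provider actually uses):

    # Hypothetical sketch: the API is just a thin facade over an internal data store.
    from flask import Flask, jsonify

    app = Flask(__name__)

    # Stand-in for the provider's real database, which staff or upstream feeds
    # update as soon as new results arrive.
    LATEST_SCORES = {
        "example-match": {"home": 2, "away": 1, "status": "final"},
    }

    @app.route("/v1/scores/<match_id>")
    def get_score(match_id):
        # The client only sees this endpoint; where the data really lives is hidden.
        score = LATEST_SCORES.get(match_id)
        if score is None:
            return jsonify({"error": "unknown match"}), 404
        return jsonify(score)

    if __name__ == "__main__":
        app.run()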
I hope this clears things up a bit for you.

You could have a look at Stack Share to see what companies use for databases and whatnot. But there isn't a universal answer, every company uses whatever works for them.
This usually means that the company has its own database in which the data is stored. But they might also get their data from another company.
A 'database' is not necessarily SQL, though; they may use unstructured data or any of the other options for storing data.
That's where "whatever works" comes from: the company goes with whichever solution best fits its needs.

Related

End-to-end encrypted mobile backend as a service?

I'm thinking of using an MBaaS such as Firebase or Kinvey for my next app, and am wondering if any exist which encrypt application data end-to-end (i.e. such that the encryption keys are never shared with the service provider). This seems feasible in theory, since the server is not expected to do any computation on the data, only store it and deliver it to clients.
Does such a service exist? I've found ZeroDB and Crypton, but neither are available as services AFAICT, which means I'd have to administer, scale, and back them up myself. I also thought of using something like Firebase and encrypting my app's data before I pass it to the Firebase API, but I'm wary of writing a one-off crypto layer like that unless I have to (i.e. I'd rather use something that's been peer-reviewed).
Alternatively, if no such service currently exists, why not? Is it technically infeasible, or is there just no market for it?
Edit: This seems closest to what I'm looking for, but considering the broken links on their website I'm guessing it's defunct: Adreneline Mobility
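For concreteness, the one-off crypto layer I mentioned would look roughly like this (a minimal sketch using the cryptography library's Fernet recipe; the backend call is hypothetical, and key distribution between the user's devices is exactly the hard part I'd rather not hand-roll):

    # Illustrative sketch: encrypt client-side, store only ciphertext in the MBaaS.
    from cryptography.fernet import Fernet

    # In a real app this key must be generated on-device and never sent to the
    # backend; sharing it with the user's other devices is the hard part.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    plaintext = b'{"note": "only the client can read this"}'
    ciphertext = fernet.encrypt(plaintext)

    # backend.save("notes/123", ciphertext)   # hypothetical MBaaS call
    # later, on a device that holds the key:
    assert fernet.decrypt(ciphertext) == plaintext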
The answer to your question is actually available on the market. CloudMine offers end-to-end encryption (disclosure - I work at CloudMine). They have a largely healthcare-focused offering, so it has to stand up to HIPAA and other government regulations around data security.
Here's a good overview video on security featuring CloudMine's CTO. The first 45 sec. provide some more information on our encryption techniques.
I know I'm being the "sales guy" right now but I'm happy to hop on a call to share what we've built and discuss your specific use case. You can email me at nick at cloudmineinc.com if you're interested.
Virgil Security (full disclosure - I work there) has an end-to-end encryption SDK that works for any endpoint, and also has a special integration with Firebase. It's open source, of course. Check it out and feel free to ask any questions of the team here or on Slack - https://e3kit.readme.io/

Is there a good way to link registered users' emails with data in google analytics?

If I build a website for my new awesome mobile app (or web service or whatever) I might want to do a slow launch, sending email invites to the first x people to register on the site.
Is there a good way to link each registered email to the corresponding data in google analytics (or any similar service), and query them based on location, language, etc.?
Maybe the Spanish version isn't quite done yet, so I don't want to invite people who used a Spanish browser to sign up. Or maybe my app is location-dependent (like timetables for buses) and just doesn't work at all outside of my home town.
I really want to have a simple email-only "registration".
It is completely possible, although it may breach some of GA's terms of use if done wrong.
You should not store email addresses in any way as part of your GA data because it would be considered personally identifiable data. However, there is nothing saying that you couldn't store a kind of GUID for each user, and then compare that with email addresses offline - although the user should be made aware that any actions they take while using your service/application/whatever are being tracked with the capability of being personally identified.
As far as getting the actual data that you are discussing, language and location are stored by GA by default, so no headache there!
The best way to store the user's GUID would probably be in a custom dimension. How you do this is going to depend on how you build your product. I had to write a tracking library using the Measurement Protocol for an AS3 project a while back because there isn't an AS3 library that is supported anymore. If you are using JavaScript, it will be much easier, as Google offers native JS libraries to handle web analytics.
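As a rough illustration only (this uses the legacy Universal Analytics Measurement Protocol; the property ID and the dimension index are placeholders you would replace with your own):

    # Sketch: send a hit that carries a pseudonymous GUID in a custom dimension,
    # so it can be joined with your own registration records offline.
    import uuid
    import requests

    user_guid = str(uuid.uuid4())  # stored server-side next to the user's email

    requests.post(
        "https://www.google-analytics.com/collect",
        data={
            "v": "1",              # protocol version
            "tid": "UA-XXXXXX-Y",  # placeholder property ID
            "cid": user_guid,      # anonymous client ID
            "t": "pageview",
            "dp": "/signup",
            "cd1": user_guid,      # custom dimension 1 = the GUID, never the email
        },
        timeout=5,
    )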
Finally, try taking a look at the documentation. It's pretty easy to understand.

Will Google block my access if I use their features without token?

I'm using this link https://www.google.com/reader/api/0/stream/contents/feed/FEEDHERE?output=json&n=20
to fetch feeds using Google's algorithm. As you can see, I'm not adding any other parameters, just fetching the returned data in JSON format. Hopefully my app will be heavily used, and if I send a lot of requests to this link, will Google block my access or something?
Is there anything I can include, like userip or a URL for my app (so that if they have a problem they can just contact me), or something else?
The most basic answer to your question is that Google will change its Terms of Service whenever it likes, and you've got no say in the matter. So if it's allowed today, it might not be allowed tomorrow, at Google's whim.
On this issue, though, you seem fairly safe. From the Terms of Service (this is the general document, since Reader doesn't seem to have a specific one):
Don’t misuse our Services. For example, don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide.
Google provides RSS and Atom. They provide these feeds, so I assume they expect that they'll be used. They don't say that it's a misuse to point someone else at those feeds, so it looks OK for now, but they could add such a clause at any time.
All online services are subject to the terms and conditions of the providers of those services. So, as others have said, they may be OK with your use today, but they can change their mind at any time down the line. I doubt including a URL or email or contact info will help anything, because when these services change, the provider doesn't notify every user of the service; they just announce the change publicly, usually with several months' notice to give users a chance to adapt their applications - but this is not standardized or enforced, so there is no guarantee. One example would be the fairly recent discontinuation of the Google Finance API (for which no replacement has been announced).
The safest approach would be to design your app such that this feature that uses Google's functionality is decoupled as much as possible from the rest of your app, so that, if or when the availability of the service changes (i.e. it's no longer available at all), you can adapt your app to use some other source for the feeds with minimal impact on the rest of the app. Design for change and plan for the worst.
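A sketch of that kind of decoupling (class and method names are invented for illustration; the Reader endpoint itself is long gone, which rather proves the point):

    # Sketch: hide the third-party feed behind a small interface so the rest of
    # the app never depends on Google's URL or response format directly.
    from abc import ABC, abstractmethod
    import requests

    class FeedSource(ABC):
        @abstractmethod
        def fetch(self, feed_url: str, limit: int = 20) -> list[dict]:
            ...

    class GoogleReaderSource(FeedSource):  # replaceable when the service changes
        def fetch(self, feed_url: str, limit: int = 20) -> list[dict]:
            resp = requests.get(
                "https://www.google.com/reader/api/0/stream/contents/feed/" + feed_url,
                params={"output": "json", "n": limit},
                timeout=10,
            )
            resp.raise_for_status()
            return resp.json().get("items", [])

    # The rest of the app only ever talks to a FeedSource, so swapping in a
    # different provider later is a one-class change.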

Where can I find research data that proves best practices for creating public APIs?

I need to persuade management (product management and others) that just "publicising" internal private APIs is a bad idea compared to the best practice of creating a public API candidate, using it internally, and making it public once you are satisfied with it. Can anyone help me find some facts, such as research papers, that help me make the argument?
I'm not aware of any specific research since the public interface to any API is highly subjective and specifically tied to a problem domain.
The first few pages of this pdf are an ok overview of an API for a business person:
http://aarontgrogg.com/wp-content/uploads/2009/09/How-to-Build-API-and-why-it-matters.pdf
This blog post's section headers highlight key points that your business partners need to think about, as I think you're already aware. I would search for best practices around these specific subjects as they pertain to a public API (see the sketch after this list): http://gaejexperiments.wordpress.com/2010/07/01/public-api-design-factors/
API format: REST vs. web services
Response format: XML, JSON
Service contract description
Authentication mechanism for the consumers of the API
Service versioning (so you can roll out new versions of the API without blowing everyone up)
Rate limits (obviously, for any number of things: preventing DoS attacks, and just managing system load)
Documentation
Helper libraries
Website for the public API
Depending on what type of API it is... a SUPPORT TEAM
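To show how a couple of those factors surface in code, here is a tiny hypothetical sketch (a versioned URL plus rate-limit headers; authentication, contracts, and documentation all live outside the code):

    # Hypothetical sketch: a versioned public endpoint that advertises rate limits.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/v1/widgets")
    def list_widgets_v1():
        response = jsonify([{"id": 1, "name": "example"}])
        # Headers commonly used to communicate rate limits to consumers.
        response.headers["X-RateLimit-Limit"] = "1000"
        response.headers["X-RateLimit-Remaining"] = "997"
        return response

    # A breaking change ships as /api/v2/widgets; v1 keeps working until it is
    # formally deprecated, so third-party consumers are not broken overnight.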
This doesn't address your internal processes either. Should your internal systems be able to evolve faster than the public API? In most cases I think the answer is yes, as your company wants to be agile with its business model and strategy. Having third parties consume your internal systems is going to force your company to decide who is more important when it's time to make an update. Either your company will have to version its internal service and hope the third-party consumers upgrade in a timely fashion, or it will just break the integration for all the third-party consumers.
At the end of the day, it might not be worth doing. You can only screw over the people using your API so many times before they stop using it, and what good is an API no one uses?
I have been in the position before where the business has wanted an API pushed out too fast and without any governance around it. It resulted in all of my time being spent supporting people who were integrating with our API, and writing code samples for them.

User ownership of personal information [closed]

At the moment it seems that most webapps store their user-data centrally.
I would like to see a movement towards giving the user total access and ownership of their own personal information and data; ultimately allowing the user to choose where their data is stored.
As an example - with an application like facebook, the user's profile data could exist on any device that they own (e.g. their mobile phone) ... facebook would then request the data from the user, and make use of it.
Does anyone see this idea becoming a reality? Is it a ridiculous idea?
CLARIFICATION:
The information would at least need to be cache-able. The motivation behind the idea was to give the user more control over their own data - the user is self-publishing an authoritative version of what they are happy for the world to see.
I'm imagining a future which is largely dictated by choices which are made now. Perhaps physical location of the data isn't actually important - and is more a symbolic gesture... but I think that decoupling the relationship between our information and the companies that make use of it could be a positive thing.
But perhaps, the details do need a bit more work ;)
What about performance? Imagine you want to search data that is located on hundreds of mobile phones or private distributed systems.
What you're describing is similar to a combination of OpenID Attribute Exchange, Portable Contacts, and OpenSocial: one repository of user data that every other provider would feed off. It's nice for a user, but I would not go so far as to tie it to a specific device - rather, a federated identity that you control from one vendor's website/application.
I am with you on this one.
And I think the key technology might be RDF. Since vocabularies such as FOAF are already used in these social applications, it is a small step from Facebook storing your RDF graph to you storing it yourself, and saying: this is me, these are my friends, or anything else you might want someone to know.
This approach could be generalised to other personal information you might need an authorised party to know, like health records.
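As a toy illustration of "storing your own graph" (Python's rdflib, with made-up identifiers):

    # Toy sketch: a self-hosted FOAF description the user controls and publishes.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import FOAF, RDF

    me = URIRef("https://example.org/people/alice#me")    # invented identifier
    friend = URIRef("https://example.org/people/bob#me")

    g = Graph()
    g.add((me, RDF.type, FOAF.Person))
    g.add((me, FOAF.name, Literal("Alice Example")))
    g.add((me, FOAF.knows, friend))

    # The user keeps and serves this file; services read it rather than own it.
    print(g.serialize(format="turtle"))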
There are quite a few conceptual problems with what you are suggesting.
Firstly, every time you reconnected to the system, you would need to upload your personal information back into the system so that it could interact with you. This adds quite an overhead to the sign-in/handshake/auth with the remote system.
Secondly, a lot of online systems (particularly online communities) rely on you leaving an online profile of yourself so that other users can interact with you (via your profile) when you yourself are offline. This data would have to be kept somewhere central.
At the very least, the online system would need a very basic profile to represent you, so that you could login & authenticate against... which sounds like a contradiction to what you are suggesting.
Performance would suffer should the user have physical possession of the data; e.g., thumb drive, local drive. However, if a "padded cell" solution were possible where the user has complete rights to a vault that the application could reach quickly, then there might be a possibility.
This really isn't a technology issue, but rather one of corporate policy. Facebook could easily craft a policy that states that your records are yours, just like a bank should. They just don't. For that matter, many other institutions that are supposed to guard our personal information - our property, if I may invoke John Locke - fail miserably at it. If they reviewed their practices for violations of policy and were honest, you could trust them. Unfortunately this just doesn't happen.
The IRS, Homeland Security and other agencies will always require that an institution yield access to assets. In the current climate I can't see how it would be allowed for individuals to remain in physical possession of electronic records that a bank or institution would use online.
Don't misinterpret me - I think your idea is a good one to pursue, but it's more of a corporate policy issue than a technical one.
You need to clarify what you mean by ownership. Are you trying to ensure that the data is only stored on your own devices? As others have pointed out, this will make building social networks impossible. You would disappear from Facebook when you weren't connected to it, for example.
Or are you trying to ensure that a single authoritative copy exists and that services defer to it? This might be more possible, and would require essentially synching the master copy on your cell phone with the server when possible.
Or are you trying to ensure that you can edit/delete your account at any time? Most sites already work like this.
The user still wouldn't be sure they 'own' their data, simply because they'd have to upload it every time they connect, and the company it's being sent to could still do whatever it wants with it. It could just not display your profile when you're not online, but still keep a copy of it somewhere.
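To illustrate the "single authoritative copy that services defer to" variant, here is a hedged sketch of a service re-fetching a user-hosted profile only when it has changed (the URL is invented; caching via ETag is just one possible mechanism):

    # Sketch: the service caches the user's self-hosted profile and revalidates
    # it with a conditional request, deferring to the user's copy as the master.
    import requests

    PROFILE_URL = "https://alice.example.org/profile.json"  # user-controlled location

    def refresh_profile(cached_profile, cached_etag):
        headers = {"If-None-Match": cached_etag} if cached_etag else {}
        resp = requests.get(PROFILE_URL, headers=headers, timeout=10)
        if resp.status_code == 304:          # unchanged: keep using the cache
            return cached_profile, cached_etag
        resp.raise_for_status()
        return resp.json(), resp.headers.get("ETag")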
Total access, ownership and location choices of personal information and data is an interesting goal but your example illustrates some fundamental architecture issues.
For example, Facebook is effectively a publishing mechanism. Anything you put on a public profile has essentially left the realm of information that you can reasonably expect to keep private. As a result, let's assume that public forums are outside the scope of your idea.
Within the realm of things that you can expect to keep private, I'm a big fan of encryption combined with physical and network security balanced against the need for performance. You use the mobile phone as an example. In that case, you almost certainly have at least three problems:
What encryption is used on the phone? Any?
Physical security risk is quite high - have you ever had an expensive portable electronic device stolen? There seems to be quite the stolen phone market out there....
The phone becomes a network hotspot - every service that needs your information would need to make an individual connection to your phone before it could satisfy a request. Your phone needs to be on, you need to have a sufficiently fat data pipeline, etc.
If you flip your idea around, however, it becomes clear that any organization that does require persistent storage of your sensitive private information (aka SPI) should meet some fundamental (and auditable) requirements:
Demonstrated need to persist the information: many web services already ask "should I remember you?" or "do you want to create an account?" I think the default answer should always be "NO" unless I say otherwise explicitly.
No resale or sharing of SPI. If I didn't tell my bank or my bookstore that they can share my demographic information, they shouldn't be able to. Admittedly, my phone number and address are in the book, so I can't expect that I'll stay off of every mailing list but this would at least make things less convenient for the telemarketers.
Encryption all the time. My SPI should never be stored in the clear.
Physical security all the time. My SPI should never be on a laptop drive.
Given all of the above, it would be possible for you to partially achieve the goal of controlling the dissemination of your SPI. It wouldn't be perfect. The moment you type anything in, there is immediately a non-zero risk that someone somewhere has somehow figured out how to monitor or capture it. Even so, you would have some control over where your information goes, some assurance that it would only go where you tell it to go, and the probability of it being stolen would be somewhat reduced.
Admittedly, that's a lot of weasel words in a row....
We are currently developing a platform to allow people to exercise the right to access their personal data (habeas data) against any holder of such data.
Rather than following the approach you suggest, we actually pursue a different strategy: we take snapshots of the personal data as it is in the database of the "data holder" whenever the individual wants to access her data.
Our objective is to give people freedom in the management of their own personal data, allowing them to share it with others based on their previous consent.
I would be happy to discuss this further with you should you be interested.
Please read Architecture Astronauts.
