For which purposes does LinkedIn use Kafka?

Can anyone tell me for which specific purposes LinkedIn uses Kafka? I have read quite a few articles on the LinkedIn engineering blog about Kafka, where they explain how they use it and how much of a performance benefit they have achieved.
Does LinkedIn use Kafka to notify other users in the network that their friend xxx has a new status update, something like that?

For a good overview not only of Kafka's usage at LinkedIn, but also of its origin and motivation, read Jay Kreps's The Log: What every software engineer should know about real-time data's unifying abstraction. It's quite long but worth the time.
For a specific use case, you can read my write-up of the Call Graph Analysis pipeline based on Kafka and Samza. Additionally, the Tools team wrote about how they consume the output of that job to monitor site performance in near real time.
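To make the "activity stream" idea from the question concrete, here is a rough sketch of what publishing and consuming a status-update event through Kafka could look like. This is not LinkedIn's actual code: it assumes the Node kafkajs client, a hypothetical user-activity topic, and a local broker.

    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ clientId: "status-update-service", brokers: ["localhost:9092"] });

    // Producer side: the service handling "xxx posted a status update"
    // writes one event per update, keyed by user so per-user ordering holds.
    export async function publishStatusUpdate(userId: string, text: string): Promise<void> {
      const producer = kafka.producer();
      await producer.connect();
      await producer.send({
        topic: "user-activity",
        messages: [{ key: userId, value: JSON.stringify({ type: "status_update", userId, text, ts: Date.now() }) }],
      });
      await producer.disconnect();
    }

    // Consumer side: a separate notification service reads the same topic
    // and decides which connections/followers should be notified.
    export async function runNotificationConsumer(): Promise<void> {
      const consumer = kafka.consumer({ groupId: "notification-service" });
      await consumer.connect();
      await consumer.subscribe({ topic: "user-activity", fromBeginning: false });
      await consumer.run({
        eachMessage: async ({ message }) => {
          const event = JSON.parse(message.value?.toString() ?? "{}");
          console.log("notify followers of", event.userId);
        },
      });
    }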

Related

Where do APIs get their information from?

After some time working with RESTful APIs, I would like to know a bit more about their internal functionality.
I would like a simple explanation of how APIs get access to the data that they provide as responses to our requests.
There are APIs, for example weather APIs or sports APIs, that can provide responses with very recent data (such as sports results); I am wondering where or how they get that updated info almost as soon as it is available.
I have seen questions here on SO with answers pointing to API design tutorials, but not to this particular topic.
An API is usually just a facade (or an interface, if you prefer) over some information resource. The idea behind it is to "hide" any complexity from the user, to unify several services behind a single access point, or even to keep the details of the actual service's implementation a secret.
That being said, you can probably see now that there can't be one definitive answer to the question "where do APIs get their info from?". But some common answers are:
other APIs
some proprietary/in-house developed service/database
etc.
For sports APIs - they are probably provided by some sports media outlet that has the results as soon as they come out, so they just enter them into their DB and the results immediately become available through their API.
For weather forecasts - as with the sports APIs, they are probably provided by a company dealing with weather forecasting.
If it's easier for you, you can think of the "read-only" APIs as something like RSS feeds.
I hope this clears things up a bit for you.
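To illustrate the "facade over someone's data store" point, here is a deliberately tiny, hypothetical sketch: an HTTP endpoint that just reads whatever the provider most recently stored (an in-memory map standing in for their DB) and returns it as JSON. All names are made up.

    import { createServer } from "node:http";

    // Stand-in for the provider's internal database; the sports media
    // company writes results here the moment they come in.
    const latestScores = new Map<string, { home: number; away: number }>();
    latestScores.set("match-42", { home: 2, away: 1 });

    // The public API is only a thin facade over that store.
    createServer((req, res) => {
      const match = (req.url ?? "").replace("/scores/", "");
      const score = latestScores.get(match);
      if (!score) {
        res.writeHead(404);
        res.end();
        return;
      }
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ match, ...score }));
    }).listen(8080);

As soon as the provider updates the map (their database), the next GET /scores/match-42 returns the fresh data; nothing extra is needed for the response to be "recent".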
You could have a look at StackShare to see what companies use for databases and whatnot. But there isn't a universal answer; every company uses whatever works for them.
That usually means the company has its own database in which the data is stored, but they might also get their data from another company.
Also, a 'database' is not necessarily SQL; they might store unstructured data or use any of the other available options.
That's where the "whatever works" comes from: the company chooses whichever solution best fits its needs.

End-to-end encrypted mobile backend as a service?

I'm thinking of using an MBaaS such as Firebase or Kinvey for my next app, and am wondering if any exist which encrypt application data end-to-end (i.e. such that the encryption keys are never shared with the service provider). This seems feasible in theory, since the server is not expected to do any computation on the data, only store it and deliver it to clients.
Does such a service exist? I've found ZeroDB and Crypton, but neither are available as services AFAICT, which means I'd have to administer, scale, and back them up myself. I also thought of using something like Firebase and encrypting my app's data before I pass it to the Firebase API, but I'm wary of writing a one-off crypto layer like that unless I have to (i.e. I'd rather use something that's been peer-reviewed).
Alternatively, if no such service currently exists, why not? Is it technically infeasible, or is there just no market for it?
Edit: This seems closest to what I'm looking for, but considering the broken links on their website I'm guessing it's defunct: Adreneline Mobility
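For reference, the kind of one-off layer I'm wary of writing would look roughly like this (a WebCrypto sketch; key generation, storage and sharing between devices are hand-waved, which is exactly the part I don't want to get wrong):

    // Encrypt a record client-side before handing it to the backend SDK.
    // AES-GCM via WebCrypto; key management is deliberately omitted.
    async function encryptRecord(key: CryptoKey, record: object): Promise<{ iv: string; data: string }> {
      const iv = crypto.getRandomValues(new Uint8Array(12));
      const plaintext = new TextEncoder().encode(JSON.stringify(record));
      const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
      return {
        iv: btoa(String.fromCharCode(...iv)),
        data: btoa(String.fromCharCode(...new Uint8Array(ciphertext))),
      };
    }

    // Only the opaque blob would ever reach the provider, e.g.
    //   const blob = await encryptRecord(key, { note: "hello" });
    //   someBackendRef.set(blob); // hypothetical call; provider stores ciphertext it cannot read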
The answer to your question is actually available on the market: CloudMine offers end-to-end encryption (disclosure - I work at CloudMine). It has a largely healthcare-focused offering, so it has to stand up to HIPAA and other government regulations around data security.
Here's a good overview video on security featuring CloudMine's CTO; the first 45 seconds give some more information on our encryption techniques.
I know I'm being the "sales guy" right now but I'm happy to hop on a call to share what we've built and discuss your specific use case. You can email me at nick at cloudmineinc.com if you're interested.
Virgil Security (full disclosure - I work there) has an end-to-end encryption SDK that works for any endpoint, and also has a special integration with Firebase. It's open source, of course. Check it out and feel free to ask any questions of the team here or on Slack - https://e3kit.readme.io/

How do I monitor a Google doc's live update with Wireshark?

I'm trying to get a better understanding of how Google docs makes and handles requests for changes in documents on the fly, and am wondering if there is something specific I could be monitoring with Wireshark.
Thank you.
Google Docs uses HTTPS to encrypt its communication, so you cannot monitor any specific content-related changes; at most you can see that a client is talking to docs.google.com (for example, from the TLS SNI field in the handshake) and how much traffic is flowing.
That said, if you're willing to delve into the realm of timing analysis, you may be able to reconstruct some details from the encrypted sessions by correlating packet sizes and timing with known actions, but this requires a significant amount of manual work.

Architecture For A Real-Time Data Feed And Website

I have been given access to a real time data feed which provides location information, and I would like to build a website around this, but I am a little unsure on what architecture to use to achieve my needs.
Unfortunately the feed I have access to will only allow a single connection per IP address, therefore building a website that talks directly to the feed is out - as each user would generate a new request, which would be rejected. It would also be desirable to perform some pre-processing on the data, so I guess I will need some kind of back end which retrieves the data, processes it, then makes it available to a website.
From a front-end connection perspective, web services sound like they may work, but would this also create multiple connections to the feed, one per user? I would also like the back-end connection to be persistent, so that data is retrieved and processed even when the site is not being visited; I believe IIS recycles web services and websites when they are idle?
I would like to keep the design fairly flexible - in future I will be adding some mobile clients, so the API needs to support remote connections.
The simple solution would have been to log all the processed data to a database, which could then be picked up by the website, but this loses the real-time aspect of the data. Ideally I would be looking to push the data to the website every time it changes or new data is received.
What is the best way of achieving this, and what technologies are there out there that may assist here? Comet architecture sounds close to what I need, but that would require building a back end that can handle multiple web based queries at once, which seems like quite a task.
Ideally I would be looking for a C# / ASP.NET based solution with Javascript client side, although I guess this question is more based on architecture and concepts than technological implementations of these.
Thanks in advance for all advice!
Realtime Data Consumer
The simplest solution would seem to be having one component that is dedicated to reading the realtime feed. It could then publish the received data on to a queue (or multiple queues) for consumption by other components within your architecture.
This component (A) would be a standalone process, maybe a service.
Queue consumers
The queue(s) can be read by:
a component (B) dedicated to persisting data for future retrieval or querying. If the amount of data is large you could add more components that read from the persistence queue.
a component (C) that publishes the data directly to any connected subscribers. It could also do some processing, but if you are looking at doing large amounts of processing you may need multiple components that perform this task.
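As a rough sketch of that A -> queue -> B/C shape (RabbitMQ's Node client is used purely as a stand-in here; on a .NET stack you would reach for MSMQ/NServiceBus as listed in the summary below):

    import amqp from "amqplib";

    // Component A: one dedicated process holds the single allowed feed
    // connection and pushes everything it receives onto a queue.
    async function startFeedPublisher(): Promise<(raw: string) => void> {
      const conn = await amqp.connect("amqp://localhost");
      const ch = await conn.createChannel();
      await ch.assertQueue("location-updates", { durable: true });
      return (raw) => {
        ch.sendToQueue("location-updates", Buffer.from(raw), { persistent: true });
      };
    }

    // Components B and C: each reads its own queue (in practice you would
    // fan out via an exchange so both see every message), one persisting
    // the data and the other pushing it to connected subscribers.
    async function startConsumer(queue: string, handle: (msg: string) => void): Promise<void> {
      const conn = await amqp.connect("amqp://localhost");
      const ch = await conn.createChannel();
      await ch.assertQueue(queue, { durable: true });
      await ch.consume(queue, (msg) => {
        if (msg) {
          handle(msg.content.toString());
          ch.ack(msg);
        }
      });
    }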
Realtime web technology components (D)
If you are using a .NET stack then it seems like SignalR is getting the most traction. You could also look at XSockets (there are more options in my realtime web tech guide; just search for '.NET').
You'll want to use SignalR to manage subscriptions and then publish messages to the registered clients (PubSub - this SO post seems relevant; maybe you can ask for a bit more info).
You could also look at offloading the PubSub component to a hosted service such as Pusher, who I work for. This will handle managing subscriptions and component C would just need to publish data to an appropriate channel. There are other options all listed in the realtime web tech guide.
All these components come with a JavaScript library.
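For instance, the browser side of component D could look something like this with SignalR's JavaScript client (the current @microsoft/signalr package is assumed; the hub URL and method name are made up):

    import * as signalR from "@microsoft/signalr";

    // Component D, client side: subscribe once and let the server push
    // location updates as they arrive, instead of polling for them.
    const connection = new signalR.HubConnectionBuilder()
      .withUrl("/hubs/locations") // hypothetical hub exposed by the ASP.NET back end
      .withAutomaticReconnect()
      .build();

    connection.on("locationUpdated", (update: { id: string; lat: number; lng: number }) => {
      console.log("new position", update); // update the map/UI here
    });

    connection.start().catch((err) => console.error(err));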
Summary
Components:
A - .NET service - that publishes info to queue(s)
Queues - MSMQ, NServiceBus etc.
B - Could also be a simple .NET service that reads a queue.
C - this really depends on D since some realtime web technologies will be able to directly integrate. But it could also just be a simple .NET service that reads a queue.
D - Realtime web technology that offers a simple way of routing information to subscribers (PubSub).
If you provide any more info I'll update my answer.
A good solution to this would be something like http://rubyeventmachine.com/ or http://nodejs.org/ . It's not ASP.NET, but it can easily solve the issue of distributing real-time data to other users. Since user connections, subscriptions and broadcasting to channels are built into each, that will make coding the rest super simple. Your clients would just connect over standard TCP.
If you needed clients to poll for updates then you would need a queue system to store info for the next request. That could be a simple array, or a more complicated queue system, depending on your requirements and number of users.
There may be solutions for .NET that I am not aware of that do the same thing, but those are the two I know of.
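If you did go the Node route, the "broadcast to everyone who is connected" part can be a handful of lines with the ws package (an add-on, not Node core; shown only to give a feel for the shape):

    import { WebSocketServer, WebSocket } from "ws";

    // All connected browsers/mobile clients register here; whatever reads
    // the processed feed calls broadcast() for every new update.
    const wss = new WebSocketServer({ port: 8081 });

    export function broadcast(update: object): void {
      const payload = JSON.stringify(update);
      for (const client of wss.clients) {
        if (client.readyState === WebSocket.OPEN) {
          client.send(payload);
        }
      }
    }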

Where can I find research data that proves best practices for creating public APIs?

I need to persuade management (product management and others) that just "publicising" internal private APIs is a bad idea compared to the best practice of creating a public API candidate, using it internally, and making it public only once it has proven itself. Can anyone help me find some facts, such as research papers, that would help me make the argument?
I'm not aware of any specific research since the public interface to any API is highly subjective and specifically tied to a problem domain.
The first few pages of this PDF are an OK overview of an API for a business person:
http://aarontgrogg.com/wp-content/uploads/2009/09/How-to-Build-API-and-why-it-matters.pdf
This blog post's section headers highlight key points that your business partners need to think about, as I think you're already aware. I would search for best practices around these specific subjects as they pertain to a public API: http://gaejexperiments.wordpress.com/2010/07/01/public-api-design-factors/
API format: REST vs. web services
Response format: XML, JSON
Service contract description
Authentication mechanism for the consumers of the API
Service versioning (so you can roll out new versions of the API without blowing everyone up; see the sketch after this list)
Rate limits (obviously, for any number of things: preventing DoS attacks, and just managing system load)
Documentation
Helper libraries
Website for the public API
Depending on what type of API it is... a support team
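A hypothetical sketch of how two of those points, versioning and rate limits, typically surface in a public HTTP API (Express is assumed; the header names and numbers are illustrative only):

    import express from "express";

    const app = express();

    // Version the public surface in the URL so /v2 can evolve later
    // without breaking existing /v1 consumers.
    app.get("/v1/users/:id", (req, res) => {
      res.set({
        "X-RateLimit-Limit": "1000",    // calls allowed per hour for this API key
        "X-RateLimit-Remaining": "997", // would come from a real counter per consumer
      });
      res.json({ id: req.params.id, name: "example user" });
    });

    app.listen(3000);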
This doesn't address your internal processes either. Should your internal systems be able to evolve faster than the public API? In most cases I think the answer is yes, as your company wants to be agile with its business model and strategy. Having third parties consume your internal systems is going to force your company to decide who is more important when it's time to make an update: either your company has to version its internal service and hope the third-party consumers upgrade in a timely fashion, or it just breaks the integration for all the third-party consumers.
At the end of the day, it might not be worth doing. You can only screw over the people using your API so many times before they stop using it, and what good is an API no one uses?
I have been in the position before where the business has wanted an API pushed out too fast and without any governance around it. It resulted in all of my time being spent supporting people who were integrating with our API, and writing code samples for them.
