Cache HTTP requests in a serverless environment

Cache HTTP requests in a serverless environment - http

I have an serverless lambda which does the following:
Start with a set of ids in the query (example.com?ids=a,b,c)
Does HTTP request to another webservice (based on the given ids) which I do not control
Renders the website based on the other webservice response
All works, no issues so far.
Today I introduced a new UI for my website. The user can toggle between "a tableview" and "a listview".
Because those differents views can also be controlled via (another) query paramter, I do a simple "redirect" to my own website. Assuming I'm looking currently at the tableview, for the "show listview" textfield I have a simple <a href="example.com?ids=a,b,c&view=list">[...]<a>.
This redirect leads, of course, to another call to the "other webservice". Even if I can be pretty sure that the content haven't change since my last call (just a few seconds/minutes ago).
My question is:
Can I somehow cache the HTTP requests from my lambda so that we won't do the call again?
I'm somewhat aware of the Cache-Control headers, but since it is an serverless environment it could (and probably will?! I don't know but I don't even care 😅) another machine without this cache. And therefore it will not be an cache hit and will do the requests anyways.
Please don't answer with solutions like "Use JavaScript for changing the UI". I'm aware that this is possible, but my main question is just how (and even if I can) cache such requests in a serverless environment.
Thanks in advance!

From documentation and common best practices we get the impression, that a Serverless function or more specifically an AWS Lambda function has only a very short lifespan. This is to the point, that we need to assume that a function is provisioned into its (firecracker) micro container for a single call only and gets de-provisioned afterwards.
However, to safe resources and to improve performance, the life cycle of a Lambda is rather: provisioning, use for several distinct function calls, de-provisioning.
This means irregardless of the used language, the container gets reused for a certain amount of time. Global resources you create in that time (global variables, static objects, files) will survive beyond a single function call.
Your case
In your exact case you can then implement whichever caching strategy you want. This should work most of the times for your use-case with two pitfalls you need to be aware of:
The micro container gets re-used between requests between different clients. Meaning that of course you need to have a way of access control to your cache, if this is relevant to your use-case.
You do not have direct control over the timeout time of your Lambda, meaning that you should anticipate that every now and then a user will experience the overhead of a non-cached request just due to bad timing.
Let us know about your final solution.

Related

HTTP session tracking through base URL "resource"?

A little background: We're currently trying try specify an HTTP API between a couple of vendors so that different products can easily inter operate. We're not writing any "server" software yet, nor any client, but just laying out the basics of the API so that every party can start prototyping and then we can refine it. So the typical use-case for this API would be being used by (thin) HTTP layers inside a given application, not from within the browser.
Communication doesn't really make sense without having session state here, so we were looking into how to track sessions typically.
Thing is, we want to keep the implementation of the API as easy as possible with as little burden as possible on any used HTTP library.
Someone proposed to manage session basically through "URL rewriting", but a little more explicit:
POST .../service/session { ... }
=> reply with 201 Created and session URL location .../service/session/{session-uuid}
subsequent requests use .../service/session/{session-uuid}/whatever
to end the session the client does DELETE .../service/session/{session-uuid}
Looking around the web, initial searches indicate this is somewhat untypical.
Is this a valid approach? Specific drawbacks or pros?
The pros we identified: (please debunk where appropriate)
Simple on the implementation, no cookie or header tracking etc. required
Orthogonal to client authentication mechanism - if authentication is appropriate, we could easily pass the URLs to a second app that could continue to use the session (valid use case in our case)
Should be safe, as we're going https exclusively for this.
Since, PHPSESSID was mentioned, I stumbled upon this other question, where it is mentioned that the "session in URL" approach may be more vulnerable to session fixation attacks.
However, see 2nd bullet above: We plan to implement~specify authentication/authorization orthogonally to this session concept, so passing around the "session" url might even be a feature, so we think we're quite fine with having the session appear in the URL.

How to handle network calls in Microservices architecture

We are using Micro services architecture where top services are used for exposing REST API's to end user and backend services does the work of querying database.
When we get 1 user request we make ~30k requests to backend service. We are using RxJava for top service so all 30K requests gets executed in parallel.
We are using haproxy to distribute the load between backend services.
However when we get 3-5 user requests we are getting network connection Exceptions, No Route to Host Exception, Socket connection Exception.
What are the best practices for this kind of use case?

Well you ended up with the classical microservice mayhem. It's completely irrelevant what technologies you employ - the problem lays within the way you applied the concept of microservices!
It is natural in this architecture, that services call each other (preferably that should happen asynchronously!!). Since I know only little about your service APIs I'll have to make some assumptions about what went wrong in your backend:
I assume that a user makes a request to one service. This service will now (obviously synchronously) query another service and receive these 30k records you described. Since you probably have to know more about these records you now have to make another request per record to a third service/endpoint to aggregate all the information your frontend requires!
This shows me that you probably got the whole thing with bounded contexts wrong! So much for the analytical part. Now to the solution:
Your API should return all the information along with the query that enumerates them! Sometimes that could seem like a contradiction to the kind of isolation and authority over data/state that the microservices pattern specifies - but it is not feasible to isolate data/state in one service only because that leads to the problem you currently have - all other services HAVE to query that data every time to be able to return correct data to the frontend! However it is possible to duplicate it as long as the authority over the data/state is clear!
Let me illustrate that with an example: Let's assume you have a classical shop system. Articles are grouped. Now you would probably write two microservices - one that handles articles and one that handles groups! And you would be right to do so! You might have already decided that the group-service will hold the relation to the articles assigned to a group! Now if the frontend wants to show all items in a group - what happens: The group service receives the request and returns 30'000 Article numbers in a beautiful JSON array that the frontend receives. This is where it all goes south: The frontend now has to query the article-service for every article it received from the group-service!!! Aaand your're screwed!
Now there are multiple ways to solve this problem: One is (as previously mentioned) to duplicate article information to the group-service: So every time an article is assigned to a group using the group-service, it has to read all the information for that article form the article-service and store it to be able to return it with the get-me-all-the-articles-in-group-x query. This is fairly simple but keep in mind that you will need to update this information when it changes in the article-service or you'll be serving stale data from the group-service. Event-Sourcing can be a very powerful tool in this use case and I suggest you read up on it! You can also use simple messages sent from one service (in this case the article-service) to a message bus of your preference and make the group-service listen and react to these messages.
Another very simple quick-and-dirty solution to your problem could also be just to provide a new REST endpoint on the articles services that takes an array of article-ids and returns the information to all of them which would be much quicker. This could probably solve your problem very quickly.
A good rule of thumb in a backend with microservices is to aspire for a constant number of these cross-service calls which means your number of calls that go across service boundaries should never be directly related to the amount of data that was requested! We closely monitory what service calls are made because of a given request that comes through our API to keep track of what services calls what other services and where our performance bottlenecks will arise or have been caused. Whenever we detect that a service makes many (there is no fixed threshold but everytime I see >4 I start asking questions!) calls to other services we investigate why and how this could be fixed! There are some great metrics tools out there that can help you with tracing requests across service boundaries!
Let me know if this was helpful or not, and whatever solution you implemented!

Would real world Meteor application use server side Methods almost exclusively?

I'm learning Meteor and fundamentally enjoy how fast I can build data driven applications however as I went through the Creating Posts chapter in the Discover Meteor book I learned about using server side Methods. Specifically the primary reason (and there are a number of very valid reasons to use these) was because of the timestamp. You wouldn't want to rely on the client date/time, you'd want to use the server date/time.
Makes sense except that in almost every application I've ever built we store date/time of row create/update in a column. Effectively every single create or update to the database records date/time which in Meteor now looks like I would need to use server side Methods to ensure data integrity.
If I'm understanding correctly that pretty much eliminates the ease of use and real-time nature of a client side Collection because I'll need to use Methods for almost every single update and create to our databases.
Just wanted to check and see how everyone else is doing this in the real world. Are you just querying a server side Method that just returns the date/time and then using client side Collection or something else?
Thanks!

The short answer to this question is that yes, every operation that affects the server's database will go through a server-side method. The only difference is whether you are defining this method explicitly or not.
When you are just getting started with Meteor, you will probably do insert/update/remove operations directly on client collections using validators, which check for whether the operation is allowed. This usage is actually calling predefined methods on both the server and client: (for a collection named foo the you have /foo/insert, for example) which simply checks the specified validators before doing the operation. As you become more familiar with Meteor you will probably override these default methods, for reasons you described (among others.)
When using your own methods, you will typically want to define a method both on the server and the client, just as the default collection functions do for you. This is because of Meteor's latency compensation, which allows most client operations to be reflected immediately in the browser without any noticeable lag, as long as they are permitted. Meteor does this by first simulating the effect of a method call in the client, updating the client's cached data temporarily, then sending the actual method call to the server. If the server's method causes a different set of changes than the client's simulation, the client's cache will be updated to reflect this when the server method returns. This also means that if the client's method would have done the same as the server, we've basically allowed for an instant operation from the perspective of the client.
By defining your own methods on the server and client, you can extend this to fill your own needs. For example, if you want to insert timestamps on updates, have the client insert whatever timestamp in the simulation method. The server will insert an authoritative timestamp, which will replace the client's timestamp when the method returns. From the client's perspective, the insert operation will be instant, except for an update to the timestamp if the client's time happens to be way off. (By the way, you may want to check out my timesync package for displaying relative server time accurately on the client.)
A final note: it's good to understand what scope you are doing collection operations in, as this was one of the this that originally confused me about Meteor. For example, if you have a collection instance in the client Foo, Foo.insert() in normal client code will call the default pair of client/server methods. However, Foo.insert() in a client method will run only in a simulation and will never call server code - so you will need to define the same method on the server and make sure you do Foo.insert() there as well, for the method to work properly.
A good rule of thumb for moving forward is to replace groups of validated collection operations with your own methods that do the same operations, and then adding specific extra features on the server and client respectively.

In short— yes!
Publications exist to send out a 'live', and dynamic, subset of the database to the client, sending DDP added messages for existing records, followed by a ready, and then added, changed, and deleted messages to keep the client's cache consistent.
Methods exist to- directly, or indirectly— cause Mongo Updates, and like it was mentioned by Andrew, they are always in use.
But truly, because of Meteor's publication architecture, any edits to collections that are currently being published to at least one client, will be published via DDP - regardless of the source of the change to Mongo - even an outside process.

How do I handle use 100 Continue in a REST web service?

Some background
I am planning to writing a REST service which helps facilitate collaboration between multiple client systems. Similar to how git or hg handle things I want the client to perform all merging locally and for the server to reject new changes unless they have been merged with existing changes.
How I want to handle it
I don't want clients to have to upload all of their change sets before being told they need to merge first. I would like to do this by performing a POST with the Expect 100 Continue header. The server can then verify that it can accept the change sets based on the header information (not hard for me in this case) and either reject the request or send the 100 Continue status through to the client who will then upload the changes.
My problem
As far as I have been able to figure out so far ASP.NET doesn't support this scenario, by the time you see the request in your controller actions the POST body has normally already been completely uploaded. I've had a brief look at WCF REST but I haven't been able to see a way to do it there either, their conditional PUT example has the full request body before rejecting the request.
I'm happy to use any alternative framework that runs on .net or can easily be made to run on Windows Azure.

I can't recommend WcfRestContrib enough. It's free, and it has a lot of abilities.
But I think you need to use OpenRasta instead of WCF in order to do what you're wanting. There's a lot of stuff out there on it, like wiki, blog post 1, blog post 2. It might be a lot to take in, but it's a .NET framework thats truly focused on being RESTful, and not RPC like WCF. And it has the ability work with headers, like you asked about. It even has PipelineContributors, which have access to the whole context of a call and can halt execution, handle redirections, or even render something different than what was expected.
EDIT:
As far as I can tell, this isn't possible in OpenRasta after all, because "100 continue is usually handled by the hosting environment, not by OR, so there’s no support for it as such, because we don’t get a chance to respond in the asp.net pipeline"

Is it wrong to switch client logic in the service tier?

We have two client apps (a web app and an agent app) accessing methods on the same service, but with slightly different requirements. My team wants to control behaviour on the service side by passing in a ApplicationType parameter to every method - which is essentially an enum containing the name of the calling client application - which is then used as a key for a database lookup to configure the service with client-specific options.
Something about this makes me uneasy as I don't think the service should really have to be aware of which client is calling it. I'm being told that it's easier to do it this way than pass a load of options dynamically through the method call.
Is there anything wrong with the client application telling the service who they are? Or is there really no difference between passing a config key versus a set of parameterized options?
One immediate problem I can see is that if we ever opened the service to another client run by a third party, we'd have to maintain their configuration settings locally for them. At the moment we own both client apps so it's not so much of a problem.
How would you do it?

In a layered solution, you should always consider your layers as onion-like layers, and dependencies should always go inwards, never outwards.
So your GUI/App layer should depend on the businesslogic layer, the businesslogic layer should depend on the data access layer, and similar.
Unless you categorize the clients (web, win, wpf, cli), or generalize it with client profiles (which client applications can configure), I would never pass in the name of the calling application, as this would make the business logic layer aware of and dependent upon the outside layer.
What kind of differences are we talking about that would depend on the type of application? If you elaborate a bit on the differences here, perhaps someone can come up with some helpful advice on other ways to solve this.
But I would definitely look for other ways before going down your described path.

Can't you create two different services, one for each application? The two services will share a lot of code or call a single internal service with different parameterization depending on what outer service was called.

From a design perspective, this is no different than having users with different profiles. From a security perspective, I hope your applications are doing something to identify themselves, lest users of one application figure out a way to invoke the other applications logic as a hack. (Image a HR application being used by the mafia and a bank at the same time, one customer would be interesting in hacking the other customer's application on a shared application host)
In .net the design doesn't feel this way because the credentials live on the thread (i.e. when you set the IIPrincipal, that info rides on the thread-- it is communicated along with each method call, but not as a parameter.)
Maybe what you are looking for in terms of a more elegant design is an ApplicationIdentity attribute. You'd have to write a custom one, I don't know of one in the framework right now.

This is a hard topic to discuss without a solid example.
You are right for feeling that way. Sending in the client type to change behaviour is not correct. It's not a bad idea for logging... but that's about it.
Here is what I would do:
Review each method to see what needs to be different and why.
Create different methods for different usages. The method name should be self explanatory. If you ever need to break compatibility, you have more control (assuming you're not using a versioning system which would be overkill for an in-house-only service).
In some cases request parameters (flags/enum values) are more appropriate.
In some cases knowing the operating environment is more appropriate (especially for data security). The operating environment almost always sent during a login request. Something like "attended"/"secure" (agent client) vs "unattended"/"not secure" (web client). Now you must exchange a session key (HTTP cookie or an application level session id). Sessions obviously doesn't work if you need to be 100% stateless -- especially if you want to scale-out without session replication... if you have that requirement, send a structure in every request.
Think of requests like functions in your code. You wouldn't put a magic parameter that changes the behaviour of the function. You would create multiple functions that each behave differently. Whoever is using the function makes the decision which one to call.
So why is client type so wrong? Client type has no specific meaning on its own. It has many meanings and they may change over time. It's simply informational which is why it is a handy thing to log. An operating environment does have a specific meaning.
Here is a scenario to consider: What if a new client type is developed that is slightly different in a way that would break compatibility with the original request? Now you have two requests. 2 clients use Request A and 1 client uses Request B. If you pass in a client type to each request, the server is expected to work for every possible client type. Much harder to test and maintain!!

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex