I have an API that currently does not use any caching. I do have one piece of Middleware that I am using that generates cache headers (Cache-Control, Expires, ETag, Last-Modified - using the https://github.com/KevinDockx/HttpCacheHeaders library). It does not store anything as it only generates the headers.
When an If-None-Match header is passed with the API request, the middleware compares the ETag value passed in against the currently generated value and, if they match, sends a 304 Not Modified as the response (httpContext.Response.StatusCode = StatusCodes.Status304NotModified;).
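For illustration, the comparison described above is roughly this (a minimal sketch, not the actual HttpCacheHeaders implementation; ComputeETagForRequest is a hypothetical helper):
app.Use(async (httpContext, next) =>
{
    string incoming = httpContext.Request.Headers["If-None-Match"];
    string current = ComputeETagForRequest(httpContext); // hypothetical helper

    if (!string.IsNullOrEmpty(incoming) && incoming == current)
    {
        httpContext.Response.StatusCode = StatusCodes.Status304NotModified;
        return; // short-circuit: MVC never runs
    }

    httpContext.Response.Headers["ETag"] = current;
    await next();
});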
I'm using a Redis cache and I'm not sure how to implement cache invalidation. I used the Microsoft.Extensions.Caching.Redis package in my project. I installed Redis locally and used it in my controller as below:
[AllowAnonymous]
[ProducesResponseType(200)]
[Produces("application/json", "application/xml")]
public async Task<IActionResult> GetEvents([FromQuery] ParameterModel model)
{
    var cachedEvents = await _cache.GetStringAsync("events");
    IEnumerable<Event> events = null;
    if (!string.IsNullOrEmpty(cachedEvents))
    {
        events = JsonConvert.DeserializeObject<IEnumerable<Event>>(cachedEvents);
    }
    else
    {
        events = await _eventRepository.GetEventsAsync(model);
        string item = JsonConvert.SerializeObject(events, new JsonSerializerSettings()
        {
            ReferenceLoopHandling = ReferenceLoopHandling.Ignore
        });
        await _cache.SetStringAsync("events", item);
    }
    var eventsToReturn = _mapper.Map<IEnumerable<EventViewDto>>(events);
    return Ok(eventsToReturn);
}
Note that _cache here is using IDistributedCache. This works: the second time, the request hits the cache. But when the Events I am fetching are modified, it does not take the modified values into account; it serves up the same value without doing any validation.
My middleware is set up as:
Cache Header Middleware -> MVC. So the cache header middleware will first compare the ETag value sent by the client and either decide to forward the request to MVC or short-circuit it with a 304 Not Modified response.
My plan was to add a piece of middleware prior to the cache header one (i.e. My Middleware -> Cache Header Middleware -> MVC) and wait for a response back from the cache header middleware and check if the response is a 304. If 304, go to the cache and retrieve the response. Otherwise update the response in the cache.
Is this the ideal way of doing cache invalidation? Is there a better way of doing it? With the above method, I'll have to inspect each 304 response, determine the route, and have some sort of logic to work out which cache key to use. Not sure if that is the best approach.
If you can provide some guidelines and documentation/tutorials on cache invalidation, that would be really helpful.
Here is a guideline based on how a service I support uses cache invalidation on a CQRS system.
The command system receives create, update, delete requests from clients. The request is applied to Origin. The request is broadcast to listeners.
A separate invalidation service exists and subscribes to the change list. When a command event is received, the configured distributed caches are examined for the item in the event. A couple of different actions are taken based on the particular system.
The first option is that the Invalidation service removes the item from the distributed cache. Subsequently, consumers of the services sharing the distributed cache will eventually suffer a cache miss, retrieve the item from storage and add the latest version of the item to the distributed cache. In this scenario there is a race condition between all of the discrete machines in the services, and Origin may receive multiple requests for the same item in a short window. If the item is expensive to retrieve, this can strain Origin. But the invalidation scenario is very simple.
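As a rough sketch of this first option, assuming StackExchange.Redis, a pub/sub channel named "cache-invalidation" and an "item:{id}" key convention (all illustrative):
using StackExchange.Redis;

var redis = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var db = redis.GetDatabase();
var subscriber = redis.GetSubscriber();

// Each command event carries the ID of the item that changed; the Invalidation
// service simply deletes the key so the next consumer suffers a cache miss
// and refills the entry from Origin.
await subscriber.SubscribeAsync("cache-invalidation", (channel, message) =>
{
    db.KeyDelete($"item:{message}");
});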
The second option is that the Invalidation service makes a request to one of the services using the same distributed cache and asks the service to ignore the cache and get the latest version of the item from Origin. This addresses the potential spike from multiple discrete machines calling Origin. But it means the Invalidation service is more tightly coupled to the other related services. And the service now has an API that allows a caller to bypass its caching strategy. Access to the uncached API would need to be secured to just the Invalidation service and other authorized callers.
In either case, all of the discrete machines that use the same Redis database also subscribe to the command change list. Any individual machine just processes changes locally by removing items from its local cache. No error occurs if the item is not present; the item will be refreshed from Redis or Origin on the next request. For hot items, this means multiple requests to Origin could still be issued by any machine that has removed the hot item before Redis has been updated. It can be beneficial for the discrete machines to locally cache an "item being retrieved" task that all subsequent requests can await rather than calling Origin, as sketched below.
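A minimal sketch of that last idea, caching the in-flight retrieval task locally so that concurrent requests for a hot item share a single Origin call (FetchFromOriginAsync is an assumed helper):
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static readonly ConcurrentDictionary<string, Lazy<Task<string>>> _inFlight =
    new ConcurrentDictionary<string, Lazy<Task<string>>>();

static async Task<string> GetOrFetchAsync(string key)
{
    // All concurrent callers for the same key await the same task.
    var lazy = _inFlight.GetOrAdd(key,
        k => new Lazy<Task<string>>(() => FetchFromOriginAsync(k)));
    try
    {
        return await lazy.Value;
    }
    finally
    {
        _inFlight.TryRemove(key, out _); // allow the next refresh to start fresh
    }
}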
In addition to the discrete machines and a shared Redis, the invalidation logic also extends to Akamai and similar content distribution networks. Once the Redis cache has been invalidated, the invalidation routine uses the CDN APIs to flush the item. Akamai is fairly well behaved and, if configured correctly, makes a relatively small number of calls to Origin for the updated item. Ideally the service has already retrieved the item and copies exist both in the discrete machines' local caches and in the shared Redis. CDN invalidation can be another source of request spikes if not anticipated and designed for correctly.
Within Redis, among the discrete machines sharing it, a design that uses Redis to indicate an item is being refreshed can also shield Origin from multiple requests for the same item. A simple counter, whose key is based on the item ID plus the current time interval rounded to the nearest minute, 30 seconds, etc., can use the Redis INCR command: the machine that gets a count of 1 accesses Origin while all the others wait, as sketched below.
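A sketch of that gate, again assuming StackExchange.Redis and the same assumed FetchFromOriginAsync helper:
async Task<string> GetWithGateAsync(IDatabase db, string itemId)
{
    // One bucket per 30-second interval; only the INCR winner calls Origin.
    long bucket = DateTimeOffset.UtcNow.ToUnixTimeSeconds() / 30;
    string gateKey = $"refresh:{itemId}:{bucket}";

    long count = await db.StringIncrementAsync(gateKey);
    await db.KeyExpireAsync(gateKey, TimeSpan.FromMinutes(2)); // don't leak gate keys

    if (count == 1)
    {
        string fresh = await FetchFromOriginAsync(itemId);     // assumed helper
        await db.StringSetAsync($"item:{itemId}", fresh, TimeSpan.FromHours(1));
        return fresh;
    }

    // Everyone else waits briefly for the winner to repopulate the cache.
    for (int attempt = 0; attempt < 20; attempt++)
    {
        await Task.Delay(250);
        RedisValue cached = await db.StringGetAsync($"item:{itemId}");
        if (cached.HasValue)
            return cached;
    }
    return await FetchFromOriginAsync(itemId); // fallback if the winner failed
}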
Finally, for hot items, it can be helpful to have a Time To Refresh value attached to the item. If all payloads have a wrapper similar to the one below, then when an item is retrieved and its refresh time has passed, the caller performs a background refresh of the item. For hot items this means they will be refreshed in the cache before their expiration. For a system with heavy reads and low volumes of writes, caching items for an hour with a refresh time of something less than an hour means the hot items will generally stay in Redis.
Here is a sample wrapper for cached items. In all cases it is assumed that the caller knows type T based on the item key being requested. The actual payload written to Redis is assumed to be a byte array, serialized and possibly gzipped. The SchemaVersion provides a hint about how the Redis string was created.
interface CacheItem<T>
{
    string Key { get; }
    DateTimeOffset ExpirationTime { get; }
    DateTimeOffset TimeToRefresh { get; }
    int SchemaVersion { get; }
    T Item { get; }
}
When storing:
var redisString = Gzip.Compress(NetDataContractSerializer.Serialize(cacheItem));
When retrieving, the item is recreated by the complementary uncompress and deserialize methods.
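For illustration, the compress/uncompress pair could be as simple as the following (using JSON here instead of NetDataContractSerializer; CachedEvent is a hypothetical concrete type implementing CacheItem<T>):
using System.IO;
using System.IO.Compression;
using System.Text;
using Newtonsoft.Json;

static byte[] Compress(string json)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(json);
            gzip.Write(bytes, 0, bytes.Length);
        }
        return output.ToArray();
    }
}

static string Decompress(byte[] payload)
{
    using (var input = new MemoryStream(payload))
    using (var gzip = new GZipStream(input, CompressionMode.Decompress))
    using (var reader = new StreamReader(gzip, Encoding.UTF8))
        return reader.ReadToEnd();
}

// Storing:    db.StringSet(item.Key, Compress(JsonConvert.SerializeObject(item)));
// Retrieving: var item = JsonConvert.DeserializeObject<CachedEvent>(Decompress(bytes));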
In an old-school server environment, you initialize an SDK (like the Twitter SDK) when the server starts up, using dotenv to read secrets and tokens from your .env file like so:
import dotenv from 'dotenv';
import {Client} from 'twitter-api-sdk';
dotenv.config();
const twitterClient = new Client(process.env.TWITTER_SECRET_INFO);
And then you would use the twitterClient object to get data in one of the route handlers.
What's the best practice for initializing something like the twitter client in Hono with Cloudflare?
In the old service worker framework, I could have treated the secret info as a global environment variable much like in Node/Express, but in the new module worker code you have to access the environment variables as a parameter passed to a function call. It looks like Hono manages this by passing contexts to methods like .use/.get/.post.
Ideally, though, I wouldn't reinitialize the twitter connection on every request, especially since I'm just getting public info with a token, not dealing with any user login/password info.
Is there any way to do this in Hono/Cloudflare, or do I have to initialize the Twitter client middleware on each request? I looked at the Hono class constructor, but from what I can tell, all it does is take a router config object.
And from what I can tell from the Cloudflare docs, module workers have the same issue. Whereas constants in a service worker were declared outside the route handler, it looks like everything in a module worker is declared inside the fetch handler. Is there any way to initialize once during the life of the worker and not on each request?
In principle you could initialize the client on the first request:
let twitterClient = null;

export default {
  async fetch(req, env, ctx) {
    if (!twitterClient) {
      twitterClient = new Client(env.TWITTER_SECRET_INFO);
    }
    // ... normal code ...
  }
}
That said, though, is creating a new client actually expensive?
Constructing the client does not "initialize a connection". The client presumably makes requests by calling fetch(). The fetch() API doesn't expose any way to control the underlying connections used; each fetch() operates effectively independently. But, the Workers Runtime will automatically reuse connections behind the scenes, when possible. It could even reuse the same connection for two completely unrelated Workers, if they are contacting the same destination host. So it may be that even creating a new client with every request, you're already getting good connection reuse.
That said, perhaps the client has to do some sort of key exchange upfront, e.g. exchanging a long-lived refresh token for an access token. That is annoying to have to repeat on every request. So in that sense, maybe caching it in a global helps.
However, note that Workers creates LOTS of instances of your Worker around the world. You may find if you curl your Worker several times in a row, each request lands on a different instance. You may find that caching in global state does not actually have much impact unless you have a large amount of traffic.
Caching may be more effective if you use the Cache API to store cached values into the colo-wide cache. Unfortunately, client libraries designed for Node environments may not provide the right hooks to do this.
One final note: Note that putting live resources (things that are not just plain data structures) into the global scope can be dangerous on Workers, because in general a Promise created on behalf of one incoming request cannot be awaited in the context of some other request. So if that twitter client does do some sort of upfront key exchange and tries to have all requests wait for that to complete, you may find that if you receive multiple requests at once before the initial key exchange finishes, all except the first request end up failing. To be honest, I would recommend creating a new client for every request unless you see a measurable performance problem from this.
Session variables
In every web app you can get and set session variables.
PHP:
$foo = $_SESSION['myVar'];
.NET (MVC, in Controller):
using System.Web.Mvc;
// ...
var foo = Session["myVar"];
I am looking for some detailed information on session variables:
Their initial purpose (what problems did they aim to address?)
Common use cases
Storage
Where is it stored on the system?
Hard drive, RAM, ...
Who is storing it?
Client / Server
I guess it's server-side, so what is managing it?
Web server (Apache, IIS, ...) / Web app
What is the lifetime of a session variable?
The session, right? So when does a session start, when does it end, and how does the system know when it can get rid of these variables (GC mechanism)?
Security
Known security flaws?
PS: I would like to allow people here to build good documentation about this concept. Feel free to edit the question if you think some questions should be added or edited.
Purpose
Session Variables were created primarily to deal with the stateless nature of the HTTP protocol. Because each page request is handled pretty much completely separately from every other page request, developers wanted ways to tie strings of requests together. The canonical example of this is a login page that authenticates the user and then changes the behavior of pages requested after login.
To help with this problem, many languages and/or frameworks provided the concept of a Session Variable which would let the developer store data that would be associated with a specific browser and would persist across separate requests from that same browser.
So, to take logins as an example, on the first request from a new browser, the Session Variable would be blank. Then the user would fill out authentication information and, assuming it was correct, the server-side code would set the Session Variable for that browser to contain some sort of identifier saying that the browser was authenticated. Then, during subsequent requests, the code could check that identifier in the Session Variable before running code that required being logged in.
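In ASP.NET terms, the flow described above is roughly the following (names like ValidateCredentials and "UserID" are illustrative):
// On successful login: mark this browser's session as authenticated.
if (ValidateCredentials(username, password))   // assumed helper
    Session["UserID"] = user.Id;

// On subsequent requests: check the marker before running protected code.
if (Session["UserID"] == null)
    Response.Redirect("~/Login");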
Another common use case would be for a "wizard" workflow. You might have a multi-page form that you want the user to fill in over several separate requests. As the user fills out the form, you can add the values to the session until the user gets to the end of the form at which time you could save it in some more permanent storage.
Storage and Management
There are many ways to store Session Variables. Any sort of storage that persists across requests will work. Probably the most basic way is to create a separate file for each session. PHP does this by taking a session ID that it has stored as a cookie in the browser and then looking for a file with a name derived from the session ID.
You can also store Session Variables in databases, shared memory, or even in the cookie itself. Ruby on Rails stores Session Variables by encrypting the data and then setting the cookie to the encrypted data. So the session gets stored in the user's browser itself.
Most typically the Session Variable is associated with a cookie that is stored in the web browser in some way. This cookie is usually managed automatically by the language or framework that the web server application is written in. The language or framework detects a new session and creates a new Session Variable that it provides to the web server application via some sort of API. The web server application can then use the API to store information in the Session Variable, to delete it, create a new one, etc. Usually the framework has some default value for the lifetime of the session, adjustable via the API. I think the most typical default is the lifetime of the browser process, via a cookie whose lifetime is tied to the user's browser process.
Security
There are a lot of security issues around Session Variables because they are typically used to manage authorization and authentication in web applications.
For example, many applications set the session lifetime just by using the lifetime associated with the cookie. Many login systems want to force the user to re-login after a specified time, but you can't trust the browser to expire the cookie when you tell it to. The browser could be buggy, could be written by a malicious person, or the cookie's lifetime could be manipulated by the user herself. So if the Session Variable API you are using relies on the cookie lifetime, you may need a secondary mechanism that forces the Session Variable to expire even if the cookie doesn't, as sketched below.
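A minimal sketch of such a secondary server-side check (ASP.NET flavored; the key name and the 30-minute limit are illustrative):
// At login: record the issue time server-side; don't trust the cookie's lifetime.
Session["AuthIssuedAtUtc"] = DateTime.UtcNow;

// On each request: force re-login after 30 minutes regardless of the cookie.
var issuedAt = Session["AuthIssuedAtUtc"] as DateTime?;
if (issuedAt == null || DateTime.UtcNow - issuedAt.Value > TimeSpan.FromMinutes(30))
{
    Session.Abandon();
    Response.Redirect("~/Login");
}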
Some other security issues involve storage. If you store a session ID in a cookie and then use that session ID as the file name to store the Session Variable in, a malicious browser can change the session ID in the cookie to another ID, and then requests from that browser would start using some other browser's session file.
Another issue is stolen session information. Through XSS or packet inspection, session information can be stolen from a user's browser session and then used by a malicious user to access that user's account. This sort of problem is typically mitigated by using SSL to protect the session in transit.
This page explains a lot of the security issues when using PHP's implementation of Session Variables. Ruby on Rails has a similar page that outlines the security issues with Session Variables for that platform.
So, I will be taking this question on under two considerations:
1. I am answering under PHP guidelines.
2. I am assuming that a shared hosting service is used.
Storage
With the use of shared hosting, the php.ini file holds this answer. The file is created, physically, at the path you specify through the "session.save_path" line within the php.ini file.
Source: php.net Manual
Who Stores Session
The session is TECHNICALLY stored by the SERVER, though it is requested, obviously, by the client. So, answer: SERVER.
Source: session_start
Who Manages It
If your session.save_path is set to go somewhere on a shared hosting server, then they control the GC that destroys it or ignores it until later. Instances have actually happened for me where other clients on the shared hosting server had their session.gc_maxlifetime set to a MUCH shorter value than I did, causing my session files to be destroyed after the amount of time that THEY set (the other shared users). To get around this, set your "session.save_path" to somewhere within your OWN file tree.
Lifetime
As said previously, "session.gc_maxlifetime" controls this file's "expiration". Along with this, the "session.gc_probability" and "session.gc_divisor" should be considered, and set to "1" and "100", respectively. Google search this for further explanation.
Source: session.gc_maxlifetime
Security
I'm going to let php.net handle this, but here's the link!
Source: Security
I take an ASP.NET application scenario as an example.
In ASP.NET/MVC, HttpContext.Current.Session provides access to RAM that is managed by the server (web server/app server, IIS). In the case of Internet Information Server, the RAM used is located inside a so-called application pool and used by one or more apps running inside the web/app server. From the programmer's point of view the structure is a dictionary, which means that in C# you can use the this[] operator to write to and read from the Session object.
// write access
var CurrentArticle = 123456;
Session["CurrentArticle"] = CurrentArticle;
//...
// read access
var CurrentArticle = 0;
CurrentArticle = (int)Session[nameof(CurrentArticle)];
The Session object provided by .NET will be created in the Session_Start method and deleted in Session_End. However, you don't have to use the system's default session store and can implement your own, e.g. like this:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Caching;
using System.Web;
// using MyOtherStuff;

namespace MyStuff.Server.Context
{
    public class HttpSessionState : HttpSessionStateBase
    {
        // Fallback storage used when no HttpContext/Session is available
        // (e.g. in unit tests or on background threads).
        Dictionary<string, object> _sessionStorage = new Dictionary<string, object>();

        public override object this[string name]
        {
            get
            {
                if (HttpContext.Current == null || HttpContext.Current.Session == null)
                {
                    return _sessionStorage.ContainsKey(name) ? _sessionStorage[name] : null;
                }
                return HttpContext.Current.Session[name];
            }
            set
            {
                if (HttpContext.Current == null || HttpContext.Current.Session == null)
                    _sessionStorage[name] = value;
                else
                    HttpContext.Current.Session[name] = value;
            }
        }
    }

    public class Current
    {
        /// <summary>
        /// Objects stored in Cache expire after some time; compare Application
        /// </summary>
        public static ExpiringCache Cache = new ExpiringCache();   // defined elsewhere

        /// <summary>
        /// Objects stored in Application don't expire
        /// </summary>
        public static Application Application = new Application(); // defined elsewhere

        public static HttpSessionState Session = new HttpSessionState();

        private static System.Web.HttpServerUtility server;
        public static System.Web.HttpServerUtility Server
        {
            get
            {
                // Use the real server utility when a request context exists.
                if (System.Web.HttpContext.Current != null)
                    return System.Web.HttpContext.Current.Server;
                if (server != null)
                    return server;
                server = new System.Web.HttpApplication().Server;
                if (server != null)
                    return server;
                throw new NotSupportedException("HDitem.ApplicationServices.Current was not initialized (server)");
            }
            set { server = value; }
        }
    }
    //..
}
Every new browser connecting to your server creates a new session. If you don't care about the data from previous sessions of the same user (if your app has users) then you are probably done here.
If you want to re-connect a new session to one or more previous sessions, identified by some combination of data you have available about this user (e.g. through the request, a cookie or the like), or most easily by the user's authentication, then you might want to store the data of the session in Session_End rather than deleting it, and recover it in Session_Start or any time thereafter (as soon as you have enough data about the user of this session to identify her). In this case you need some form of session persistence (which presumably boils down to a hard disk or SSD, to refer to your question); it can come in any form of per-user storage, sometimes a user profile in a database, or any file format like XML or JSON.
So in other words: I don't want to generalize too much here, but session storage is ideally very fast memory storage, potentially saved to external storage if session persistence is implemented.
The above-mentioned session storage is located on the server side. Modern browsers also have a built-in local storage (localStorage) that can be accessed via JavaScript. This local storage can be used to create a session memory that works differently from the server-side session but can of course be synchronized through explicit requests or attachments to requests (cookies).
I have an ASP.NET Web API implementation where, to store and access data/variables across consecutive calls, I am using a session state object as shown below; it can be successfully accessed across multiple separate calls by a browser:
// Access the current session object
var blSession = HttpContext.Current.Session;
// create the BL object using the user id
BL accessBL = new BL(userID);
// Store the Bl object in the session object dictionary
blSession["UserBL"] = accessBL;
I have to enable the following setting in Global.asax for the Session object to be accessible:
protected void Application_PostAuthorizeRequest()
{
    // Enable session state in the Web API pipeline after authorization
    HttpContext.Current.SetSessionStateBehavior(SessionStateBehavior.Required);
}
The issue comes in when the Web API shown above has to be accessed via another ASP.NET MVC client hosted separately on a different machine. In that case the same consecutive calls do not maintain state as shown above, which leads to an exception, since the consecutive calls rely on session data to proceed.
I have seen a similar issue when I use the Fiddler debugger: since it gets hosted as a web proxy, consecutive calls through it fail too, because the state is not maintained. In my understanding, the issue is due to the session cookie being set across domains, which doesn't work for security reasons.
I know a workaround is to use an application-wide store like Cache, but please suggest if you have a way to get the session state working. Let me know if you need more details.
If you have not set up an alternative way to do session state, then the default behavior is to store it in memory on the server. This is why you are seeing issues when the request is handled by a different ASP.NET server.
Web API calls are meant to be stateless. That is, they should not behave like a classic ASP.NET application that relies on the framework to store user-specific information in session variables across HTTP requests. For each call, pass in a user-specific identifier or token that you can then use to look up information stored in your backend. You can store this information in your database or a distributed cache like Memcached for faster retrieval.
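A minimal sketch of that pattern, using the IDistributedCache abstraction from earlier in this thread (the header name, repository method and DTO type are illustrative):
[HttpGet]
public async Task<IActionResult> GetUserData()
{
    string token = Request.Headers["X-User-Token"]; // assumed header name
    if (string.IsNullOrEmpty(token))
        return Unauthorized();

    string cached = await _cache.GetStringAsync($"user:{token}");
    if (cached == null)
    {
        var data = await _userRepository.GetByTokenAsync(token); // assumed helper
        if (data == null)
            return Unauthorized();

        cached = JsonConvert.SerializeObject(data);
        await _cache.SetStringAsync($"user:{token}", cached,
            new DistributedCacheEntryOptions { SlidingExpiration = TimeSpan.FromMinutes(20) });
    }
    return Ok(JsonConvert.DeserializeObject<UserDataDto>(cached));
}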
Is it possible to use it like any other object shared between threads, with locking?
The main idea is to track whether some web method has already been invoked by the same user, because we want to block more than X calls in a minute. We would keep the relevant data in a Hashtable.
ASP.NET Sessions are dependent on cookies, which are easily circumvented by malicious users. Since malicious users seem to be your problem, ASP.NET sessions aren't the solution.
You could write a wrapper class containing a static Dictionary<UserID, List<DateTime>>, allowing operations on the dictionary only through 1-2 methods that lock it properly.
e.g.
class UserLogger
{
    private const int MaxCallsPerMinute = 10;   // the "X" from the question
    private readonly object _lock = new object();
    private readonly Dictionary<int, List<DateTime>> _visits =
        new Dictionary<int, List<DateTime>>();

    // Returns true while the user is within the allowed number of calls per minute.
    public bool AddPageVisit(int userID)
    {
        lock (_lock)
        {
            DateTime now = DateTime.UtcNow;

            List<DateTime> visits;
            if (!_visits.TryGetValue(userID, out visits))
                _visits[userID] = visits = new List<DateTime>();

            visits.Add(now);
            visits.RemoveAll(t => t < now.AddMinutes(-1)); // drop datetimes older than a minute
            // (a fuller version would also prune other users' empty entries here)

            return visits.Count <= MaxCallsPerMinute;      // check against the max allowed
        }
    }
}

class RestrictedController
{
    private static readonly UserLogger _userLogger = new UserLogger();
}
Disclaimer: I wrote this code directly in the answer; treat it as a sketch, it may contain typos.
There is a big disadvantage to ASP.NET session state: it serializes execution of requests with the same SessionID unless the session is in a read-only state. I haven't used it that way myself, but I think it can solve your task.
ASP.NET Session State Overview
Concurrent Requests and Session State
Access to ASP.NET session state is exclusive per session, which means that if two different users make concurrent requests, access to each separate session is granted concurrently. However, if two concurrent requests are made for the same session (by using the same SessionID value), the first request gets exclusive access to the session information. The second request executes only after the first request is finished. (The second session can also get access if the exclusive lock on the information is freed because the first request exceeds the lock time-out.) If the EnableSessionState value in the @ Page directive is set to ReadOnly, a request for the read-only session information does not result in an exclusive lock on the session data. However, read-only requests for session data might still have to wait for a lock set by a read-write request for session data to clear.
I have created an ASMX web service which does some Active Directory stuff behind the scenes.
As I wish to retain certain information within the web service under a user session, I decided to add [WebMethod(EnableSession = true)] and start using session variables.
However, when I turn that option on, the round-trip time from app -> web service -> app becomes ridiculously long (about a minute or more).
If I remove [WebMethod(EnableSession = true)], it is fairly fast.
Anyone know what is going on?
Possible reasons:
Session state is stored out of process (state server / SQL Server) and getting/storing it takes a long time.
You are making multiple concurrent requests (including service requests) under the same session. ASP.NET ensures that only one session-enabled (session read/write) request executes at a time, and hence multiple concurrent requests would queue up.
EDIT:
For #2, the obvious solution is to avoid using session state: for example, can you put the relevant information into another store such as a cache or a database (expensive)?
If you are only reading session state in the web service, then you may take advantage of read-only session state (see IReadOnlySessionState). Read-only session state allows concurrent read-only requests; a read/write request will still block all other requests. Now, EnableSession on the WebMethod attribute does not support this: it either provides no session or a read/write session. So one workaround is to implement your own handler that implements IReadOnlySessionState, route the ASMX request to this handler using an HTTP module, and then switch the handler back to the default one later. Because your handler requires read-only session state, you will have read-only session state - see this forum post where such an HTTP module that switches the handler has been given.
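For reference, the marker interface itself is trivial to use; a minimal sketch of a handler that gets read-only session access (and therefore no exclusive lock) could look like this, with the "UserID" key purely illustrative:
using System.Web;
using System.Web.SessionState;

// Implementing IReadOnlySessionState (instead of IRequiresSessionState) tells
// ASP.NET to give this handler session access without taking the exclusive
// session lock, so concurrent read-only requests are not serialized.
public class ReadOnlySessionHandler : IHttpHandler, IReadOnlySessionState
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Session is readable here; writes are not supported in read-only mode.
        var userId = context.Session != null ? context.Session["UserID"] : null;
        context.Response.Write("Hello " + userId);
    }
}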