In the Spring Kafka documentation (https://docs.spring.io/spring-kafka/docs/2.3.3.RELEASE/reference/html/#transactions), it mentions:
Transactions are enabled by providing the DefaultKafkaProducerFactory with a transactionIdPrefix. In that case, instead of managing a single shared Producer, the factory maintains a cache of transactional producers. When the user calls close() on a producer, it is returned to the cache for reuse instead of actually being closed. The transactional.id property of each producer is transactionIdPrefix + n
How is this cache configured, e.g. the producer pool size?
Does it dynamically create a new producer when there are no available producers in the cache for a given transaction?
It depends on whether the transaction is producer-only, and on the producerPerConsumerPartition property, which is true by default (for consumer-initiated transactions).
That property exists to support EOSMode.ALPHA (or the fallback to ALPHA when BETA is configured but the broker is older than 2.5).
See here for more information about exactly once semantics.
When using producerPerConsumerPartition=false and for producer-only transactions, there is no limit to the cache size; new producers are created when the cache is empty, and returned to the cache when "closed".
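For reference, a minimal sketch of where those pieces are configured; the broker address and prefix are placeholders, and the setter names follow the Spring Kafka 2.3 API:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;

public class TxProducerFactoryConfig {

    public static DefaultKafkaProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        DefaultKafkaProducerFactory<String, String> pf = new DefaultKafkaProducerFactory<>(props);
        // Setting a transactionIdPrefix is what switches the factory from one
        // shared producer to the cache of transactional producers.
        pf.setTransactionIdPrefix("tx-");          // transactional.id becomes transactionIdPrefix + n
        pf.setProducerPerConsumerPartition(false); // use the single unbounded cache described above
        return pf;
    }
}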
I am looking at the Axon Framework and have managed to get a test application up and running, but I have a question about how EventHandlers should be treated when using a store that has persistence in the read model.
From my (possibly naive) understanding, @EventHandler-annotated methods in my Projection class get called from the very first event when the application is first launched. This mechanism seems to assume that the Projection uses some kind of volatile store (e.g. an in-memory SQL database like H2) which is re-created from scratch during application boot.
However, if the store were persistent, in something like Elasticsearch, I would want the @EventHandler to resume from its last persisted event instead of from the beginning.
Is there any way to control the behaviour of the @EventHandler in this way?
Axon has two types of Event Processors: Subscribing and Tracking.
The Subscribing mode (which was the default up to Axon 3) will handle events in the thread that delivers them. That means you're at "the mercy" of the delivery guarantees of whichever component delivers the events.
The Tracking mode (which is the default since Axon 4 when using an Event Store or otherwise a source that supports it) will have events handled in dedicated threads, managed by the Event Processor itself. That means events are handled asynchronously from the actual publication mechanism.
The Tracking Event Processor uses Tokens to keep track of progress. These Tokens are stored in a TokenStore and updated as the Processor correctly processes each incoming event (possibly in batches). You decide where those tokens are stored. If you update a relational database, we recommend storing the tokens in the same database, so that event changes and tokens are updated atomically.
If you don't specify any TokenStore, what happens depends on your setup: on Spring Boot, Axon will attempt to detect a suitable TokenStore implementation for you; otherwise, it may very well just be an in-memory TokenStore, which causes Processors to re-initialize on every startup (and possibly start from the beginning).
To configure a TokenStore:
- On Spring (Boot), simply add a bean of type TokenStore with the implementation you want to use.
- When using Axon's Configuration API, use one of the registerTokenStore(...) methods on the EventProcessingConfigurer.
When the Tracking Processor starts, it will check the Token Store for previous progress, and continue from there automatically.
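For illustration, a minimal sketch of the Spring approach, assuming Axon 4 with a JPA-backed store (the builder names follow the Axon 4 API; the exact wiring depends on your project):

import org.axonframework.common.jpa.EntityManagerProvider;
import org.axonframework.eventhandling.tokenstore.TokenStore;
import org.axonframework.eventhandling.tokenstore.jpa.JpaTokenStore;
import org.axonframework.serialization.Serializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TokenStoreConfig {

    // Store tokens in the same relational database as the projections, so
    // projection updates and token updates can be committed atomically.
    @Bean
    public TokenStore tokenStore(EntityManagerProvider entityManagerProvider, Serializer serializer) {
        return JpaTokenStore.builder()
                .entityManagerProvider(entityManagerProvider)
                .serializer(serializer)
                .build();
    }
}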
I have an API that currently does not use any caching. I do have one piece of middleware that generates cache headers (Cache-Control, Expires, ETag, Last-Modified, using the https://github.com/KevinDockx/HttpCacheHeaders library). It does not store anything; it only generates the headers.
When an If-None-Match header is passed with the API request, the middleware checks the ETag value passed in against the currently generated value, and if they match, sends a 304 Not Modified response (httpContext.Response.StatusCode = StatusCodes.Status304NotModified;).
I'm using a Redis cache and I'm not sure how to implement cache invalidation. I used the Microsoft.Extensions.Caching.Redis package in my project. I installed Redis locally and used it in my controller as below:
[AllowAnonymous]
[ProducesResponseType(200)]
[Produces("application/json", "application/xml")]
public async Task<IActionResult> GetEvents([FromQuery] ParameterModel model)
{
    // Try the cache first.
    var cachedEvents = await _cache.GetStringAsync("events");
    IEnumerable<Event> events = null;
    if (!string.IsNullOrEmpty(cachedEvents))
    {
        events = JsonConvert.DeserializeObject<IEnumerable<Event>>(cachedEvents);
    }
    else
    {
        // Cache miss: query the repository and populate the cache.
        events = await _eventRepository.GetEventsAsync(model);
        string item = JsonConvert.SerializeObject(events, new JsonSerializerSettings()
        {
            ReferenceLoopHandling = ReferenceLoopHandling.Ignore
        });
        await _cache.SetStringAsync("events", item);
    }
    var eventsToReturn = _mapper.Map<IEnumerable<EventViewDto>>(events);
    return Ok(eventsToReturn);
}
Note that _cache here is using IDistributedCache. This works: the second time, the request hits the cache. But when the Events I am fetching are modified, the cache does not take the modified values into account. It serves up the same value without doing any validation.
My middleware is set up as:
Cache Header Middleware -> MVC. So the cache headers pipeline will first compare the ETag value sent by the client and either decide to forward the request to MVC or short-circuit it with a 304 Not Modified response.
My plan was to add a piece of middleware prior to the cache header one (i.e. My Middleware -> Cache Header Middleware -> MVC), wait for the response to come back from the cache header middleware, and check whether the response is a 304. If it is a 304, go to the cache and retrieve the response; otherwise, update the response in the cache.
Is this the ideal way of doing cache invalidation? Is there a better way of doing it? With the above method, I'll have to inspect each 304 response, determine the route, and have some sort of logic to work out which cache key to use. Not sure that is the best approach.
If you can provide some guidelines and documentation/tutorials on cache invalidation, that would be really helpful.
Here is a guideline based on how a service I support uses cache invalidation on a CQRS system.
The command system receives create, update, and delete requests from clients. Each request is applied to Origin and then broadcast to listeners.
A separate invalidation service exists and subscribes to the change list. When a command event is received, the configured distributed caches are examined for the item in the event. A couple of different actions are taken based on the particular system.
The first option: the Invalidation service removes the item from the distributed cache. Consumers of the services sharing that cache will subsequently suffer a cache miss, retrieve the item from storage, and add the latest version of the item back to the distributed cache. In this scenario there is a race condition between all of the discrete machines in the services, and Origin may receive multiple requests for the same item in a short window. If the item is expensive to retrieve, this can strain Origin. But the invalidation scenario is very simple.
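A minimal sketch of this first option, shown in Java with the Jedis client (an assumption; the ChangeEvent shape is hypothetical, standing in for whatever the command broadcast delivers):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class InvalidationListener {

    // Hypothetical event shape: carries the cache key of the changed item.
    public interface ChangeEvent {
        String getItemKey();
    }

    private final JedisPool pool = new JedisPool("localhost", 6379); // placeholder address

    // Called for every create/update/delete broadcast from the command system.
    public void onChangeEvent(ChangeEvent event) {
        try (Jedis jedis = pool.getResource()) {
            jedis.del(event.getItemKey()); // DEL is a no-op if the key is already absent
        }
    }
}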
The second option: the Invalidation service makes a request to one of the services using the same distributed cache and asks it to ignore the cache and get the latest version of the item from Origin. This addresses the potential spike from multiple discrete machines calling Origin, but it couples the Invalidation service more tightly to the other related services. And the service now has an API that allows a caller to bypass its caching strategy, so access to the uncached API would need to be restricted to the Invalidation service and other authorized callers.
In either case, all of the discrete machines that use the same Redis database also subscribe to the command change list. Any individual machine just processes changes locally by removing items from its local cache. No error occurs if the item is not present; the item will be refreshed from Redis or Origin on the next request. For hot items, this means multiple requests to Origin could still be received from any machine that has removed the hot item before Redis has been updated. It can be beneficial for the discrete machines to locally cache an "item being retrieved" task that all subsequent requests can await, rather than each calling Origin.
In addition to the discrete machines and a shared Redis, the invalidation logic also extends to Akamai and similar content distribution networks. Once the Redis cache has been invalidated, the invalidation routine uses the CDN APIs to flush the item. Akamai is fairly well behaved and, if configured correctly, makes a relatively small number of calls to Origin for the updated item. Ideally the service has already retrieved the item, and copies exist in both the discrete machines' local caches and the shared Redis. CDN invalidation can be another source of request spikes if not anticipated and designed for correctly.
Among the discrete machines sharing Redis, a design that uses Redis to indicate an item is being refreshed can also shield Origin from multiple requests for the same item. Use a simple counter whose key is based on the item ID and the current time interval (rounded to the nearest minute, 30 seconds, etc.) together with the Redis INCR command: the machine that gets the count of 1 accesses Origin while all others wait.
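Sketched in Java with the Jedis client (an assumption; any client exposing INCR and EXPIRE supports the same pattern):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class RefreshGate {

    private final JedisPool pool = new JedisPool("localhost", 6379); // placeholder address

    // Returns true only for the caller that should refresh the item from Origin;
    // all other callers should wait for the refreshed value to appear in the cache.
    public boolean tryAcquire(String itemId) {
        long bucket = System.currentTimeMillis() / 30_000; // current 30-second interval
        String key = "refresh:" + itemId + ":" + bucket;
        try (Jedis jedis = pool.getResource()) {
            long count = jedis.incr(key);
            jedis.expire(key, 60); // let stale gate keys clean themselves up
            return count == 1;     // only the first caller in this interval wins
        }
    }
}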
Finally, for hot items, it can be helpful to have a Time To Refresh value attached to the item. If all payloads have a wrapper similar to the one below, then when an item is retrieved and its refresh time has passed, the caller performs a background refresh of the item. For hot items this means they will be refreshed from cache before their expiration. For a system with heavy reads and low volumes of writes, caching items for an hour with a refresh time of something less than an hour means the hot items will generally stay in Redis.
Here is a sample wrapper for cached items. In all cases it is assumed that the caller knows type T based on the item key being requested. The actual payload written to Redis is assumed to be a serialized and possibly gzipped byte array. The SchemaVersion provides a hint as to how the Redis string was created.
interface CacheItem<T>
{
    string Key { get; }
    DateTimeOffset ExpirationTime { get; }
    DateTimeOffset TimeToRefresh { get; }
    int SchemaVersion { get; }
    T Item { get; }
}
When storing:
var redisString = Gzip.Compress(NetDataContractSerializer.Serialize(cacheItem));
When retrieving, the item is recreated by the complementary uncompress and deserialize methods.
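For illustration, here are the two gzip halves sketched in Java; this mirrors, rather than reproduces, the .NET pseudocode above, since the serializer on either side only needs to produce and accept a byte array:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class Gzip {

    // Compress the serialized payload before writing it to Redis.
    public static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain);
        }
        return bos.toByteArray();
    }

    // Uncompress the payload read from Redis before deserializing it.
    public static byte[] uncompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        }
    }
}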
I read a document from Oracle which explains what idempotent means in BPEL.
13.3.2 Partner Link Property

You can dynamically configure a partner link at runtime in BPEL. This is useful for scenarios in which the target service that BPEL wants to invoke is not known until runtime. The following Partner Link properties can be tuned for performance:

13.3.2.1 idempotent

An idempotent activity is an activity that can be retried (for example, an assign activity or an invoke activity). Oracle BPEL Server saves the instance after a nonidempotent activity. This property is applicable to both durable and transient processes.

Values:

This property has the following values:

False: Activity is dehydrated immediately after execution and recorded in the dehydration store. When idempotent is set to False, it provides better failover protection, but may impact performance if the BPEL process accesses the dehydration store frequently.

True (default): If Oracle BPEL Server fails, it performs the activity again after restarting. This is because the server does not dehydrate immediately after the invoke and no record exists that the activity executed. Some examples of where this property can be set to True are: read-only services (for example, CreditRatingService) or local EJB/WSIF invocations that share the instance's transaction.
But I wonder: is there any way to set an activity as idempotent or non-idempotent, at design time or at runtime?
The idempotent property can be set at the partner link operation level using the deployment descriptor property idempotent.
Refer to Section C.1, Introduction to Deployment Descriptor Properties, in the Oracle SOA developer's guide for more information.
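As a sketch of where this goes (assuming Oracle SOA Suite 11g conventions; the component and partner link names are placeholders), the property is attached to the BPEL component in composite.xml:

<!-- composite.xml: deployment descriptor properties for a BPEL component use
     the bpel.partnerLink.<partnerLinkName>.<property> naming pattern -->
<component name="MyBPELProcess">
  <implementation.bpel src="MyBPELProcess.bpel"/>
  <property name="bpel.partnerLink.LoanService.idempotent">false</property>
</component>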
In reading an article on N-Tiered Applications, I came across information regarding concurrency tokens and change tracking information:
Another important concept to understand is that while the default-generated entities support serialization, their change-tracking information is stored in the ObjectStateManager (a part of the ObjectContext), which does not support serialization.
My question is three-fold:
1. Is the same thing true when using DbContext?
2. If the only interaction with the database is in a Repository class within a using statement, does closing the database connection when the program leaves the using statement get rid of any option for change tracking?
3. Can this be leveraged as/with a concurrency token?
Yes. DbContext is just a wrapper around ObjectContext, and it exposes change-tracking information through the ChangeTracker property (which returns a DbChangeTracker) and, for a particular entity, through the Entry method (which returns a DbEntityEntry<T>).
Yes. Disposing the context will remove all change-tracking information.
Concurrency tokens and change tracking are two completely different concepts. Change tracking tells the context what operations it has to execute on the database when you call SaveChanges; it tracks the changes you made to your entities since you loaded them into the current context instance. A concurrency token resolves optimistic concurrency in the database: it validates that another process / thread / user / context instance didn't change the same record your context is going to modify during SaveChanges.
I was reading a book on Java Servlets where I came across HttpSessionActivationListener. It was specified that in a clustered environment, there can be only one HttpSession object containing a specific session ID. Assume there are two nodes, A and B, in a cluster:
The first request goes to node A. Here an HttpSession S1 is created along with session attributes, and the response goes back to the client.
The same client sends a subsequent request. This request goes to node B. Now the session object S1 is moved from node A to node B (passivated on node A and activated on node B).
In this case, should the session object, along with its attributes, be serializable? What happens if it is not serializable?
In order to count the number of active sessions, should the sessions in both nodes be added up to get the actual value? How is this usually done?
Also, I guess the ServletContext is unique for each JVM. Are attributes set on the ServletContext copied to the ServletContext in all the other nodes of the cluster?
Usually I've seen people use sticky sessions (typically provided by the load balancer; for example, EC2 ELB has this feature: http://shlomoswidler.com/2010/04/elastic-load-balancing-with-sticky-sessions.html), OR the session data is stored in a shared repository such as a database or NoSQL store.
Spring Session offers a capability called 'Clustered Sessions', and it also has a feature to offload the session to a Redis or GemFire caching solution.
Reference: http://projects.spring.io/spring-session/
Using a caching solution like Infinispan, Hazelcast, or Redis would be the way to go if you want sessions to survive server failure. Application servers provide these functions out of the box nowadays; you can just enable them from the admin interface for web/EJB/JMS persistence. If you are storing something in the session in your code, you can use the JCache API to store it in the underlying cache. JCache provides a product-independent caching API, making your code portable across caching solutions.
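For illustration, a minimal JCache (JSR-107) sketch; the cache name and key scheme are placeholders, and a JCache-compliant provider (Infinispan, Hazelcast, etc.) is assumed to be on the classpath:

import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.MutableConfiguration;

public class SessionAttributeCache {

    public static void main(String[] args) {
        // Picks up whichever JCache provider is on the classpath.
        CacheManager manager = Caching.getCachingProvider().getCacheManager();

        Cache<String, String> cache = manager.createCache(
                "session-attributes", new MutableConfiguration<String, String>());

        // Keying by session ID plus attribute name lets any node read it back.
        cache.put("ABC123:cartTotal", "42.50");
        System.out.println(cache.get("ABC123:cartTotal"));
    }
}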