Spring Cloud Stream Kafka transactions on the producer side - spring-kafka

We have a Spring Cloud Stream app using Kafka. The requirement is that, on the producer side, a list of messages needs to be put in a topic in a transaction. There is no consumer for the messages in the same app. When I initiated the transaction using spring.cloud.stream.kafka.binder.transaction.transaction-id-prefix, I got an error that there is no subscriber for the dispatcher, and that the total number of partitions obtained from the topic is less than the transactions configured. The app is not able to obtain the partitions for the topic in transaction mode. Could you please tell me if I am missing anything? I will post detailed logs tomorrow.
Thanks

You need to show your code and configuration as well as the versions you are using.
Producer-only transactions are discussed in the documentation.
Enable transactions by setting spring.cloud.stream.kafka.binder.transaction.transactionIdPrefix to a non-empty value, e.g. tx-. When used in a processor application, the consumer starts the transaction; any records sent on the consumer thread participate in the same transaction. When the listener exits normally, the listener container will send the offset to the transaction and commit it. A common producer factory is used for all producer bindings configured using spring.cloud.stream.kafka.binder.transaction.producer.* properties; individual binding Kafka producer properties are ignored.
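For example, a minimal application.properties sketch (the tx- value is illustrative; requiredAcks is typically also set for transactional producers):
spring.cloud.stream.kafka.binder.transaction.transaction-id-prefix=tx-
spring.cloud.stream.kafka.binder.required-acks=all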
If you wish to use transactions in a source application, or from some arbitrary thread for producer-only transactions (e.g. a @Scheduled method), you must get a reference to the transactional producer factory and define a KafkaTransactionManager bean using it.
@Bean
public PlatformTransactionManager transactionManager(BinderFactory binders) {
    ProducerFactory<byte[], byte[]> pf = ((KafkaMessageChannelBinder) binders.getBinder(null,
            MessageChannel.class)).getTransactionalProducerFactory();
    return new KafkaTransactionManager<>(pf);
}
Notice that we get a reference to the binder using the BinderFactory; use null in the first argument when there is only one binder configured. If more than one binder is configured, use the binder name to get the reference. Once we have a reference to the binder, we can obtain a reference to the ProducerFactory and create a transaction manager.
Then you can just use normal Spring transaction support, e.g. TransactionTemplate or @Transactional, for example:
public static class Sender {

    @Transactional
    public void doInTransaction(MessageChannel output, List<String> stuffToSend) {
        stuffToSend.forEach(stuff -> output.send(new GenericMessage<>(stuff)));
    }

}
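Or, equivalently, with a TransactionTemplate (a sketch; transactionManager is the bean defined earlier, and output and stuffToSend are as in the example above):
TransactionTemplate template = new TransactionTemplate(transactionManager);
template.executeWithoutResult(status ->
        stuffToSend.forEach(stuff -> output.send(new GenericMessage<>(stuff))));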
If you wish to synchronize producer-only transactions with those from some other transaction manager, use a ChainedTransactionManager.
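For example, a minimal sketch that chains the Kafka transaction manager defined above with a JDBC one (the jdbcTransactionManager bean is an assumption; ChainedTransactionManager comes from spring-data-commons):
@Bean
public ChainedTransactionManager chainedTransactionManager(
        @Qualifier("transactionManager") PlatformTransactionManager kafkaTm,
        @Qualifier("jdbcTransactionManager") PlatformTransactionManager jdbcTm) {
    // Transactions start in the order given and commit (or roll back) in
    // reverse order, so here the Kafka transaction commits before the JDBC one.
    return new ChainedTransactionManager(jdbcTm, kafkaTm);
}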

Related

Dynamic scoped dependency resolution in a console app for multiple tenants using dependency injection

In the case of a Web API, each request is a distinct scope and dependencies registered as scoped get resolved per request. So resolving dependencies per request per tenant is easy, as the tenant information (like TenantId) can be passed in the HTTP request headers like below:
services.TryAddScoped<ITenantContext>(x =>
{
    var context = x.GetService<IHttpContextAccessor>().HttpContext;
    var tenantId = context.Request.Headers["TenantId"].ToString();
    var tenantContext = GetTenantContext(tenantId);
    return tenantContext;
});
Other registrations first resolve TenantContext and use it to resolve other dependencies. For example, IDatabase will be registered as below. During resolution it will resolve and connect to specific tenant database.
services.TryAddScoped<IDatabase>(x =>
{
    var tenantContext = x.GetService<ITenantContext>();
    return new Database(tenantContext.DatabaseConnectionString);
});
This is all good in a Web API service because each request is a scope. I am facing challenges using dependency injection in a multi-tenant console app. Suppose the app processes items from a multi-tenant queue and each message can belong to a different tenant. While processing each message, it commits data to a tenant-specific database. So in this case the scope is each message in the queue, and the message contains the tenantId.
So when the app reads a message from the queue, it needs to get the TenantContext and then resolve other dependencies based on that TenantContext.
One straightforward option I see for achieving this dynamic resolution is to create the dependent objects manually using the TenantContext, but then I wouldn't be able to leverage dependency injection. All objects would be created manually and disposed of after going out of scope once the message is processed.
var message = GetMessageFromQueue(queueName);
var tenantContext = GetTenantContext(message.TenantId);
var database = GetDatabaseObject(tenantContext);
// Do other processing now that we have a database object connected to the specific tenant DB
Is there an option in DI where I can pass in the TenantId dynamically, so that the TenantContext gets set for this scope and all further resolution within the scope then leverages this TenantContext?
Because the role of the tenancy goes beyond the implementation ("this uses X database") and is actually contextual to the action being performed ("this uses X database and must use this connection string based on the context being handled in the action"), there is some risk of alternate implementations assuming that ambient context is present when it is not expressly described in your interface in some way; that is where the DI issue is coming up here.
You might be able to:
Update your interfaces so that the tenancy information is an expected parameter of your methods. This ensures that regardless of future implementation, the presence of the tenant ID is explicit in their signature:
public interface ITenantDatabase {
    public TResponse Get(string TenantId, int Id);
    // ... other methods ...
}
Add a factory wrapper around your existing interfaces to handle assigning the context at object creation and have that factory return the IDatabase instance. This is basically what you are proposing manually but with an abstraction around it that you could register and inject to keep the code that leverages it from being responsible for the logic:
public interface ITenantDatabaseFactory {
    public IDatabase GetDatabaseForTenant(int TenantId);
}
// Add an implementation that manually generates and returns the scoped objects

How to load the caching layer with data when asp.net core web api is created?

I have created a web API that handles the creation of a JWT token based on the encrypted user details that it receives in a POST request.
In addition to this, the STS API should also handle the population of the caching layer (Redis or Hazelcast) with all the user data present in the database. Presently I have registered the caching service using dependency injection. This will happen only once, when the API is first initialized.
services.AddSingleton<ICacheService, RedisCacheService>();
And in the TokenController I added the service as a constructor parameter to initialize the CachingService class and thereby initialize the caching layer, so that when the cacheService object is first initialized it fetches all the user rows from the database and stores them as key-value pairs inside the Redis/Hazelcast database.
public TokenController(
    ICryptographyService cryptographyService,
    crudDBContext crudDBContext,
    IConfiguration configuration,
    ICacheService cacheService)
{
    _cryptographyService = cryptographyService;
    _context = crudDBContext;
    _config = configuration;
    _cacheService = cacheService;
}
But the TokenController constructor is initialized only when an endpoint is called, so I had to create a separate default [HttpGet] endpoint to ensure that the constructor is called when the STS API is first initialized, so that the cacheService object gets created and the data gets loaded into the cache.
[HttpGet]
public ActionResult<string> Get()
{
    return "STS";
}
Please let me know if there is a proper way of doing this without calling an endpoint; that is, to be able to use dependency injection but at the same time run some code without an endpoint being called. I need to use dependency injection because I should be able to switch between Redis and Hazelcast by just changing the class name in the Startup.cs file.
With respect to Hazelcast and dependency injection: first, you would need to use the sources and not the Hazelcast NuGet version. Next, the configuration depends on whether you are in a Container Environment or a Hosted Environment. In both cases configuration keys will be gathered from the same sources and in the same order, and options will be registered in the service container and be available via dependency injection.

RocksDb Java API support for Transactions

Does the RocksDB Java API have support for transactions? I see that there is a TransactionDB class present in the JAR. I am not able to do a begin transaction on the TransactionDB class.
RocksDB db = TransactionDB.open(options, "/Users/jagannathan/Desktop/My Files/db/rocksdb");
I am not able to do db.beginTransaction, as such methods are not available. Any pointers on how to accomplish this in Java are appreciated.
You need to use a different open method. You currently use the open method of the base class (RocksDB).
Use either:
public static TransactionDB open(Options options,
        TransactionDBOptions transactionDbOptions,
        java.lang.String path)
or
public static TransactionDB open(DBOptions dbOptions,
        TransactionDBOptions transactionDbOptions,
        java.lang.String path,
        java.util.List<ColumnFamilyDescriptor> columnFamilyDescriptors,
        java.util.List<ColumnFamilyHandle> columnFamilyHandles)
to get a TransactionDB object. You can then use this object to call beginTransaction, which will return a Transaction object. This transaction can then be used similarly to a RocksDB, where you can put, delete, etc., and commit when you're done.
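Putting it together, a minimal runnable sketch (the database path and key/value are illustrative; this assumes the rocksdbjni artifact is on the classpath):
import java.nio.charset.StandardCharsets;
import org.rocksdb.*;

public class TxExample {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             TransactionDBOptions txnDbOptions = new TransactionDBOptions();
             TransactionDB db = TransactionDB.open(options, txnDbOptions, "/tmp/rocksdb-tx");
             WriteOptions writeOptions = new WriteOptions();
             // beginTransaction is available on TransactionDB, not on the base RocksDB class
             Transaction txn = db.beginTransaction(writeOptions)) {
            txn.put("key".getBytes(StandardCharsets.UTF_8), "value".getBytes(StandardCharsets.UTF_8));
            txn.commit(); // or txn.rollback() on failure
        }
    }
}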

When should I use a batch consumer vs a single record consumer

As far as I know, there is no special concept such as a batch consumer in the Apache Kafka documentation, but spring-kafka has an option to create a batch consumer using the below code snippet.
@Bean
public KafkaListenerContainerFactory<?> batchFactory() {
    ConcurrentKafkaListenerContainerFactory<Integer, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setBatchListener(true); // <<<<<<<<<<<<<<<<<<<<<<<<<
    return factory;
}
Now my question is: when should I use a batch consumer vs a single record consumer? Can someone share a few use cases to explain the usage of a single record consumer vs a batch consumer?
As per the below thread, the main difference between a single record consumer and a batch consumer is how many of the records from a poll are handed to the listener at a time.
What's the basic difference between single record kafka consumer and kafka batch consumer?
An example of using a batch listener might be if you want to send data from multiple records to a database in one SQL statement (which might be more efficient than sending them one at a time).
Another case is if you are using transactions; again, it might be more efficient to use one transaction for multiple SQL updates.
Generally, it's a performance consideration; if that's not a concern then the simpler one-at-a-time processing might be more suitable.
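For illustration, a sketch of a batch listener wired to the batchFactory above for the one-SQL-statement case (the topic name and the jdbcTemplate field are assumptions):
@KafkaListener(topics = "events", containerFactory = "batchFactory")
public void listen(List<String> records) {
    // The whole batch from the poll arrives in one call, so it can be
    // written to the database with a single batched statement.
    jdbcTemplate.batchUpdate("INSERT INTO events (payload) VALUES (?)",
            records, records.size(),
            (ps, payload) -> ps.setString(1, payload));
}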

Are custom repository instances created per request?

I am trying to create an application with mikro-orm and apollo-server-express, and I want to use the batch processing and caching of the Facebook dataloader.
Normally, Facebook dataloader instances are created per request. If mikro-orm also creates custom repository instances per request, and all calls to EntityManager.getRepository() in the same request get the same instance, it may be the perfect place to create the dataloader instances.
Repositories are created as singletons, so only one instance exists per EntityManager instance. You should fork this EM to have one instance per request, either manually, or via RequestContext middleware:
https://b4nan.github.io/mikro-orm/identity-map/
This way, each request will have its own EntityManager, that will have its own cache of repository instances.
Keep in mind that if you use RequestContext, you should get the request-specific EntityManager from it, and get the repository from there:
// beware that this will return null if the context is not yet started
const em = RequestContext.getEntityManager();
// gets request specific repository instance
const repo = em.getRepository(Book);
