Single vs Multiple @KafkaListener methods for Multiple Topics - spring-kafka

We are creating a spring-kafka app that listens to multiple topics. What is the difference between having a single method with a @KafkaListener annotation for multiple topics vs having multiple methods, each with its own @KafkaListener annotation for a single topic? Is there any benefit of doing it one way vs the other?

It depends on the concurrency, and your requirements.
Let's say each topic has 10 partitions and you have concurrency = 5.
With one annotation, you will get 5 consumers, each getting 2 partitions from each topic.
If you have two topics, the same thread is used to process 2 partitions from each.
Now, let's say records in topic1 take much longer to process than those from topic2. Records from topic2 could sit "behind" those from topic1. In that case, you might prefer to configure a separate listener container for each topic.
For low volume applications, one listener with multiple topics will be ok.
It all depends on your requirements.
By the way, you can put multiple @KafkaListener annotations on the same method; each one will create its own listener container.
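For illustration, here is a minimal sketch of the two layouts; the topic names, listener ids, and concurrency values are placeholders, the two options are alternatives rather than something to run side by side, and it assumes a spring-kafka version (2.2+) where @KafkaListener is repeatable and has a concurrency attribute.

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class TopicListeners {

    // Option 1: one annotation, one container; partitions from both topics are
    // spread across the same 5 consumer threads, so slow topic1 records can
    // delay topic2 records assigned to the same thread.
    @KafkaListener(id = "combined", topics = { "topic1", "topic2" }, concurrency = "5")
    public void listenToBoth(String message) {
        // process records from either topic
    }

    // Option 2: repeat the annotation; each one creates its own listener
    // container with its own threads, so the topics cannot block each other.
    @KafkaListener(id = "t1Only", topics = "topic1", concurrency = "5")
    @KafkaListener(id = "t2Only", topics = "topic2", concurrency = "5")
    public void listenSeparately(String message) {
        // process records; each container fails, pauses, and scales independently
    }
}
```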

Related

Dynamic Kafka Consumer using functional paradigm

I have a particular requirement in which I want to collect messages from a topic for a specified duration (for example, 40 seconds), but only when asked to (so start a consumer for 40 seconds on request and then stop it).
I came across examples of creating consumers dynamically using
DefaultKafkaConsumerFactory
ConcurrentKafkaListenerContainerFactory
Firstly, I would like to know if it's possible to do the same using KStream + the functional paradigm (i.e. the Function/BiFunction/Consumer interfaces)?
Secondly, what should I do if another call comes in to start collecting messages while the first duration hasn't finished? Create a new container (with a unique group id, of course)?
Lastly, when the collection duration has expired, I have no use for this container anymore. I could stop the container, but what then? Can it be reused when a new request comes in?
It looks like you are using the Kafka Streams binder. If so, you can control the way the processors are started programmatically.
See these sections from the reference docs for more details.
https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/spring-cloud-stream-binder-kafka.html#_manually_starting_kafka_streams_processors
https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/spring-cloud-stream-binder-kafka.html#_manually_starting_kafka_streams_processors_selectively
You can also stop/start the bindings using a REST endpoint: https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/spring-cloud-stream-binder-kafka.html#_binding_visualization_and_control_in_kafka_streams_binder
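If the plain spring-kafka classes mentioned in the question (DefaultKafkaConsumerFactory and a listener container) turn out to be a better fit than the Streams binder, a time-boxed collector could be sketched roughly as below; the topic, bootstrap servers, unique-group-id scheme, and the 40-second window are assumptions. A stopped container can be start()ed again for a later request, or you can simply build a fresh one each time.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.MessageListener;

public class TimeBoxedCollector {

    // Collect record values from the given topic for a fixed window, then stop.
    public List<String> collect(String topic, Duration window) throws InterruptedException {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "collector-" + UUID.randomUUID()); // unique group per request
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        List<String> collected = new CopyOnWriteArrayList<>();
        ContainerProperties containerProps = new ContainerProperties(topic);
        containerProps.setMessageListener(
                (MessageListener<String, String>) record -> collected.add(record.value()));

        ConcurrentMessageListenerContainer<String, String> container =
                new ConcurrentMessageListenerContainer<>(
                        new DefaultKafkaConsumerFactory<String, String>(props), containerProps);

        container.start();
        try {
            Thread.sleep(window.toMillis()); // e.g. Duration.ofSeconds(40)
        } finally {
            container.stop(); // the container could be restarted later if reuse is desired
        }
        return collected;
    }
}
```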

Apache Ignite Data Segregation

I have an application that creates persistent caches in a fixed region (MYAPP_REGION) with fixed cache names (MyApp.Data.Class1, MyApp.Data.Class2, etc.).
I am deploying 2 instances of this application for 2 different customers, but they use the same Ignite cluster.
What is the correct way to separate the data between the instances: do I change the cache names per customer, or is a region per customer enough?
In an RDBMS scenario, we would create 2 different databases, so I am wondering how to achieve the same thing when using Ignite as the storage solution.
Well, as you have mentioned, there are a variety of options. If it's only a logical division and you are OK with resource sharing, then, just like with a regular RDBMS, use multiple caches/tables or different SQL schemas. Keep in mind the desired data distribution and the number of caches/tables per customer. For example, if you have 3 nodes and 3 customers with about the same amount of data, you would most likely want a custom affinity function to collocate each customer's data on a single node, but that is a somewhat different question.
If you want more physical division, for example if one of the customers needs more resources or special features like native persistence, then it's better to follow the separate-regions approach, which might even end up as separate clusters.
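To make the two options concrete, here is a hedged sketch under assumed names (the region names, cache names, schemas, and size limits are placeholders, not anything the cluster defines for you): customer-prefixed cache names plus SQL schemas for the logical split, and per-customer data regions when one customer needs its own resource limits.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CustomerSegregation {

    public static void main(String[] args) {
        // Physical separation: one data region per customer, so memory limits
        // (or native persistence) can be tuned for each customer independently.
        DataStorageConfiguration storage = new DataStorageConfiguration()
                .setDataRegionConfigurations(
                        new DataRegionConfiguration()
                                .setName("CUSTOMER_A_REGION")
                                .setMaxSize(512L * 1024 * 1024),
                        new DataRegionConfiguration()
                                .setName("CUSTOMER_B_REGION")
                                .setMaxSize(256L * 1024 * 1024));

        Ignite ignite = Ignition.start(
                new IgniteConfiguration().setDataStorageConfiguration(storage));

        // Logical separation: customer-prefixed cache names and per-customer SQL
        // schemas, each cache pinned to its customer's region.
        IgniteCache<Long, String> customerA = ignite.getOrCreateCache(
                new CacheConfiguration<Long, String>("CustomerA.MyApp.Data.Class1")
                        .setDataRegionName("CUSTOMER_A_REGION")
                        .setSqlSchema("CUSTOMER_A"));

        IgniteCache<Long, String> customerB = ignite.getOrCreateCache(
                new CacheConfiguration<Long, String>("CustomerB.MyApp.Data.Class1")
                        .setDataRegionName("CUSTOMER_B_REGION")
                        .setSqlSchema("CUSTOMER_B"));
    }
}
```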

I would like to better understand how Grakn handles super nodes

1. Is it just sharding and sending multiple queries?
2. If that is the case, how does it keep the relations across shards?
3. Can I set or hint the sharding method?
4. Can I split the dataset and somehow hint that there is another dataset in which a "cloned" entity resides?
Grakn handles super node sharding transparently in the schema, and will have automatic instance sharding in upcoming versions.
Sharding is a relatively simple mechanism: a node with more than a threshold number of edges (set by knowledge-base.type-shard-threshold in grakn.properties) is split into a root node, with new edges attaching to child nodes that are invisible to the user. Since the goal of this is to be hands-off, we don't currently offer a way to control the sharding mechanism.
That should cover questions 1-3, but I don't follow question 4. Splitting a dataset but inserting it into the same keyspace will be no different from not splitting it in the first place. Splitting it into two keyspaces will isolate your data and there will be no sharing at all.

Doing complex reports with microservices

I'm starting a new project and am interested in architecting it as microservices. I'm trying to wrap my head around it:
Say that I have an order service and a product service. Now I want to make a report service that gives me all orders that contain a product from a certain product category.
Since orders don't know about products, that means I would need to fetch all orders, loop over them, fetch the products for each order, and then return the orders that match.
Is this assumption correct or is there any more efficient way of doing this with microservices?
In a microservices architecture, the procedure is to distill the use cases and the service boundaries of the application. In the question above, there are at least two service boundaries, namely one for transactions and another for reporting.
When you have two different service boundaries, the typical approach is to duplicate some data elements between them, e.g. whenever you make a sale, the data should be sent to both the transactional and the reporting services. One possible approach to broadcasting the data to the different boundaries is to use a message queue. Duplicating the data allows the services to evolve and operate independently and become self-sufficient, which is one of the goals of microservices.
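As a rough illustration of that duplication idea (not a prescribed design: the topic name, event shape, and group id are assumptions, and Kafka is just one possible broker), the order service could publish an event that the reporting service folds into its own read model:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
class OrderService {

    private final KafkaTemplate<String, String> kafkaTemplate;

    OrderService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void placeOrder(String orderId, String productCategory) {
        // 1. persist the order in the order service's own store
        // 2. broadcast the event so other boundaries can build their own view
        kafkaTemplate.send("order-events", orderId, orderId + ":" + productCategory);
    }
}

@Service
class ReportingProjection {

    // The reporting service keeps a denormalized copy (order id + category),
    // so "orders containing a product from category X" becomes a local query
    // instead of a fan-out of calls to the order and product services.
    @KafkaListener(topics = "order-events", groupId = "reporting")
    public void onOrderEvent(String event) {
        // upsert the event into the reporting service's read model
    }
}
```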
A personal word of advice though, you might want to start with a monolith before going the microservices route. Microservices are generally more operationally heavy; it will be difficult to reason about its advantages during the initial application stages. It tends to work better after having developed the monolithic application since it would be easier to see what didn't work and what could be improved by a microservices-like system.

Multiple Entity Framework models and the objectcontext

I have an ASP.NET web application that uses Entity Framework. The application's data layer uses a method outlined by Jordan Van Gogh, found here. In a nutshell, this method uses one shared ObjectContext instance per HTTP request, which is disposed of by an AspNetObjectContextDisposalModule that implements IHttpModule.
I have added another project and a few additional tables, and set up a data layer that copies (exactly) the example I described above into my application. This additional project and its different data model work perfectly. I can perform operations using the 2 data layers with seemingly no consequences. Of course, the ObjectSets are different in the 2 data layers, as they represent different tables.
My question is this:
Is this approach a good idea? I get most of what is going on behind the scenes, but both of these models use System.Data.Objects.ObjectContext. If user A performs an operation using the first data layer while user B simultaneously performs an operation using the second data layer, are there going to be problems with the "shared" ObjectContext?
Thanks. And be gentle.
Edit
Note: I am using different ObjectContext keys
You should be OK:
The ObjectContext is per HTTP request, so contexts from different users will not interfere with each other.
Each context updates different tables, so the two contexts will not interfere with each other.
One thing you may have to watch out for is what happens if two users update the same data at the same time.
