Records are filtered one-by-one in FilteringBatchMessageListenerAdapter - spring-kafka

I just realized that in spring-kafka the message filtering for batch listeners is applied to one record at a time (see FilteringBatchMessageListenerAdapter, version 2.7.4):
@Override
public void onMessage(List<ConsumerRecord<K, V>> consumerRecords, @Nullable Acknowledgment acknowledgment,
        Consumer<?, ?> consumer) {
    Iterator<ConsumerRecord<K, V>> iterator = consumerRecords.iterator();
    while (iterator.hasNext()) {
        if (filter(iterator.next())) {
            iterator.remove();
        }
    }
    ...
}
Since my filter implementation has to do a lookup in our database for each record, the whole processing slows down.
Is there a way to override this behaviour so that I only need to query the database once for the whole batch?

Not out of the box; you would have to do the filtering in your batch listener instead.
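A minimal sketch of what "filtering in the batch listener" could look like, assuming the filter criterion is "drop records whose key already exists in the database". RecordView (a stand-in for ConsumerRecord) and ExistingKeysRepository (one query for the whole batch, for example an IN (...) query) are hypothetical names, and the Kafka/Spring types are left out so the sketch compiles with the JDK alone.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for ConsumerRecord<String, String>; only the key matters here.
final class RecordView {
    final String key;
    final String value;

    RecordView(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical data access: ONE query for the whole batch, e.g. SELECT id ... WHERE id IN (:keys).
interface ExistingKeysRepository {
    Set<String> findExisting(Set<String> keys);
}

final class BatchFilter {
    private BatchFilter() {}

    /** Drops records whose key already exists, querying the database once per batch instead of once per record. */
    static List<RecordView> discardExisting(List<RecordView> records, ExistingKeysRepository repo) {
        // Collect every key first so the lookup needs a single round trip.
        Set<String> keys = records.stream().map(r -> r.key).collect(Collectors.toSet());
        Set<String> existing = keys.isEmpty() ? Set.of() : repo.findExisting(keys);
        // Keep only the records the single lookup did not flag.
        return records.stream()
                .filter(r -> !existing.contains(r.key))
                .collect(Collectors.toList());
    }
}
```

In a real listener, the List<ConsumerRecord<K, V>> received by the batch listener method would be filtered this way at the start of the method, with no record filter strategy configured on the container factory, so the adapter's per-record filter() is never invoked.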

Related

Singleton State vs. Singleton Event

I have a Blazor Server app that is connected to a SQL database and is, at this point, relatively complex. Since the main focus is usability, we ran into some problems when accessing the database directly (components not updating correctly, etc.).
Therefore, I am trying to create a StateService that acts as a kind of "cache": data is stored in it and components can access it without any loading time. During my research, some questions came up that the documentation couldn't answer for me.
The Problem
All components should always have the latest state of the data. This means that clients need to be notified of any change automatically and refresh their state. The service should also be able to serve ~1,000 concurrent users without requiring an upgrade to a high-end server (I know this is very vague).
Possible Solutions
Singleton State
I have a service that holds the data in a property and exposes an OnChange event. Whenever the data property is set, the event is raised. Components use this service to display the data. When I add data to the database, the data is automatically reloaded into the state. I registered this service as a singleton, so there is only one instance for the lifetime of the server.
public class SharedStateService
{
    public event Action OnChange;
    private ICollection<MyData>? myData;
    public ICollection<MyData>? MyData
    {
        get => this.myData;
        set
        {
            this.myData = value;
            this.OnChange?.Invoke();
        }
    }
}
public class MyDataService
{
    private readonly SharedStateService sharedStateService;
    private readonly TestDbContext context;
    public MyDataService(TestDbContext context, SharedStateService sharedService)
    {
        this.context = context;
        this.sharedStateService = sharedService;
    }
    public async Task<bool> DeleteData(MyData data)
    {
        try
        {
            this.context.Set<MyData>().Remove(data);
            await this.context.SaveChangesAsync();
        }
        catch (Exception)
        {
            return false;
        }
        await this.ReloadData();
        return true;
    }
    public async Task ReloadData()
    {
        this.sharedStateService.MyData =
            await this.context.Set<MyData>().ToListAsync();
    }
}
In my views, it is now possible to subscribe to the OnChange event and freely use the MyData property.
@inject SharedStateService SharedStateService
@implements IDisposable
<table class="table">
    <thead>
        <tr>
            <!-- ... -->
        </tr>
    </thead>
    <tbody>
        @foreach (var data in SharedStateService.MyData)
        {
            <tr>
                <!-- ... -->
            </tr>
        }
    </tbody>
</table>
@code {
    public void Dispose()
    {
        SharedStateService.OnChange -= Refresh;
    }
    protected override void OnInitialized()
    {
        SharedStateService.OnChange += Refresh;
    }
    private async void Refresh()
    {
        await InvokeAsync(this.StateHasChanged);
    }
}
The problem I see with this approach is that the entire data set is kept on the server at all times. Could this cause any problems, or am I overthinking it? What are the possible risks of such an approach?
Singleton Event
It is similar to the singleton state, but I do not store the data anywhere. Instead of the state, I have a service that only provides an event that components can subscribe to. This service is, again, registered as a singleton.
public class RefreshService
{
    public event Action OnChange;
    public void Refresh()
    {
        OnChange?.Invoke();
    }
}
This service is then injected into the data providers and called whenever a change occurs.
I extend MyDataService with a new method:
public async Task<ICollection<MyData>> GetAll()
{
    return await this.context.Set<MyData>().ToListAsync();
}
Afterwards, in my view, I add a property and adjust the Refresh method to load the data into this local property:
private async void Refresh()
{
    this.MyData = await MyDataService.GetAll();
    await InvokeAsync(this.StateHasChanged);
}
This approach is very similar to the first one, but the data doesn't have to be kept in memory permanently. Is this approach easier on the server? Could it lead to wrong data being displayed, since every component keeps its own copy of the data?
I know this is a long read, but maybe someone knows which approach is generally preferable.
Listening to data changes is not a bad idea; the only thing I would focus on is the way you delete and change data. First, I would use EFCore.BulkExtensions, purely for performance: if you will be updating/deleting data all the time, it's worth it (mainly because your database will grow over time).
I think the proper solution is the second one, Singleton Event, because it avoids a possible error with the first one. Consider this scenario: you have 1,000 users, and many of them are probably interacting with the data at the same time. If you delete and then refresh, the data could become inconsistent; but if you receive the change event, you can use it as a flag that the data needs to be reloaded before the user interacts with it.
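The "event as a flag" idea can be sketched as follows. This is a Java translation of the pattern rather than Blazor code: RefreshNotifier plays the role of RefreshService, and StaleAwareView is a hypothetical per-view holder whose loader stands in for a call such as MyDataService.GetAll(). The change event only marks the view as stale; the data is reloaded the next time the view actually reads it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Mirrors RefreshService: one shared notifier that holds no data, it only says "something changed".
final class RefreshNotifier {
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    void subscribe(Runnable listener) { listeners.add(listener); }

    void unsubscribe(Runnable listener) { listeners.remove(listener); }

    void notifyChanged() { listeners.forEach(Runnable::run); }
}

// Hypothetical per-view holder: the change event only sets a flag, data is reloaded lazily.
final class StaleAwareView<T> {
    private final RefreshNotifier notifier;
    private final Supplier<T> loader;   // stands in for a call such as MyDataService.GetAll()
    private final AtomicBoolean stale = new AtomicBoolean(true);
    private final Runnable onChange = () -> stale.set(true);
    private T data;

    StaleAwareView(RefreshNotifier notifier, Supplier<T> loader) {
        this.notifier = notifier;
        this.loader = loader;
        notifier.subscribe(onChange);
    }

    /** Returns the data, reloading it first if a change was signalled since the last load. */
    synchronized T current() {
        // Clear the flag before loading: a change signalled during the load marks it stale again.
        if (stale.getAndSet(false)) {
            data = loader.get();
        }
        return data;
    }

    /** Equivalent of Dispose(): stop listening for changes. */
    void close() { notifier.unsubscribe(onChange); }
}
```

Views that are never read again after a change never hit the database, which is where the saving over reloading on every event comes from.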
Finally, I think you could use the BulkInsertOrUpdateOrDelete method: rows whose id doesn't exist yet are inserted, rows that changed are updated, and existing rows whose id is missing from the new data are deleted, all in one optimized bulk extensions call. If you can't add another library, you should write your own add/update/delete method.
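If you do write your own add/update/delete method, its core is a diff by id between the current rows and the desired rows. A minimal in-memory sketch of that planning step (SyncPlan and InsertUpdateDelete are hypothetical names; applying the three lists to the database is left to your data access layer, and change detection assumes the row type implements equals):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

final class SyncPlan<T> {
    final List<T> toInsert = new ArrayList<>();
    final List<T> toUpdate = new ArrayList<>();
    final List<T> toDelete = new ArrayList<>();
}

final class InsertUpdateDelete {
    private InsertUpdateDelete() {}

    /** New ids are inserted, changed rows are updated, ids absent from `desired` are deleted. */
    static <K, T> SyncPlan<T> plan(List<T> current, List<T> desired, Function<? super T, ? extends K> idOf) {
        // Index current rows by id; whatever is left in this map at the end gets deleted.
        Map<K, T> remaining = new HashMap<>();
        for (T row : current) {
            remaining.put(idOf.apply(row), row);
        }
        SyncPlan<T> plan = new SyncPlan<>();
        for (T row : desired) {
            T existing = remaining.remove(idOf.apply(row));
            if (existing == null) {
                plan.toInsert.add(row);          // id not in the database yet
            } else if (!Objects.equals(existing, row)) {
                plan.toUpdate.add(row);          // same id, different content
            }
        }
        plan.toDelete.addAll(remaining.values()); // in the database but no longer wanted
        return plan;
    }
}
```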

Cassandra Async reads and writes, Best practices

To set the context:
We have 4 tables in Cassandra; one is the data table and the remaining three are search tables (let's assume the tables are DATA, SEARCH1, SEARCH2 and SEARCH3).
We have an initial load requirement of up to 15k rows in one request for the DATA table, and hence for the search tables, which have to be kept in sync.
We do this with batch inserts, each batch containing 4 queries (one per table), to keep the tables consistent.
But for every batch we first need to read the data: if the row exists, we only update the DATA table's lastUpdatedDate column; otherwise we insert into all 4 tables.
Below is a code snippet showing how we do it:
public List<Item> loadData(List<Item> items) {
    CountDownLatch latch = new CountDownLatch(items.size());
    ForkJoinPool pool = new ForkJoinPool(6);
    pool.submit(() -> items.parallelStream().forEach(item -> {
        BatchStatement batch = prepareBatchForCreateOrUpdate(item);
        batch.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
        ResultSetFuture future = getSession().executeAsync(batch);
        Futures.addCallback(future, new AsyncCallBack(latch), pool);
    }));
    try {
        latch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    } finally {
        pool.shutdown(); // release the pool's threads once we stop waiting
    }
    // TODO Consider what to do with the failed items: retry, or remove them from the returned list?
    return items;
}
private BatchStatement prepareBatchForCreateOrUpdate(Item item) {
    BatchStatement batch = new BatchStatement();
    Item existingItem = getExisting(item); // synchronous read
    if (null != existingItem) {
        existingItem.setLastUpdatedDateTime(new Timestamp(System.currentTimeMillis()));
        batch.add(existingItem);
        return batch;
    }
    batch.add(item);
    batch.add(convertItemToSearch1(item));
    batch.add(convertItemToSearch2(item));
    batch.add(convertItemToSearch3(item));
    return batch;
}
class AsyncCallBack implements FutureCallback<ResultSet> {
    private final CountDownLatch latch;
    AsyncCallBack(CountDownLatch latch) {
        this.latch = latch;
    }
    // Count down the latch on both success and failure, so that the thread waiting in latch.await() knows when all async calls have completed.
    @Override
    public void onSuccess(ResultSet result) {
        latch.countDown();
    }
    @Override
    public void onFailure(Throwable t) {
        LOGGER.warn("Failed async query execution, Cause:{}:{}", t.getCause(), t.getMessage());
        latch.countDown();
    }
}
The execution takes about 1.5 to 2 minutes for 15k items, including the network round trip between the application and the Cassandra cluster (both reside on the same DNS but in different pods on Kubernetes).
We are thinking of making the read call getExisting(item) async as well, but handling the failure cases is becoming complex.
Is there a better approach for data loads into Cassandra (considering only async writes through the DataStax Enterprise Java driver)?
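First thing: batches in Cassandra are not the same thing as batches in relational databases, and by using them you're putting more load on the cluster (a logged batch that spans several partitions, as yours does across 4 tables, is first written to the batchlog and coordinated by a single node).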
Regarding making everything async, I would consider the following approach:
issue the query to the DB, obtain a future, and add a listener to it that is executed when the query finishes (override onSuccess);
from that method, schedule the next actions based on the result obtained from Cassandra.
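A JDK-only sketch of that chaining, using CompletableFuture in place of the driver's ResultSetFuture and Guava callbacks. The Store interface and its readExisting, updateLastUpdated and insertAll methods are hypothetical stand-ins for async queries (with driver 4.x, session.executeAsync(...) returns a CompletionStage, so the same thenCompose chain would apply). Ids stand in for items. The point is that read and write failures end up in one place, which keeps the failure handling simple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

final class AsyncLoader {
    private AsyncLoader() {}

    // Hypothetical async operations; in real code each would wrap an executeAsync(...) call.
    interface Store {
        CompletableFuture<Optional<String>> readExisting(String id);
        CompletableFuture<Void> updateLastUpdated(String id);
        CompletableFuture<Void> insertAll(String id);   // DATA + SEARCH1..3
    }

    /** Completes with true if the item was written, false if the read or the write failed. */
    static CompletableFuture<Boolean> loadOne(Store store, String id) {
        return store.readExisting(id)
                // Pick the next query only once the read has completed, without blocking.
                .thenCompose(existing -> existing.isPresent()
                        ? store.updateLastUpdated(id)
                        : store.insertAll(id))
                .thenApply(ignored -> true)
                // Failures from any step of the chain land here.
                .exceptionally(t -> false);
    }

    /** Starts every chain, waits once, and returns the ids that failed so they can be retried. */
    static List<String> loadAll(Store store, List<String> ids) {
        List<CompletableFuture<Boolean>> futures = new ArrayList<>();
        for (String id : ids) {
            futures.add(loadOne(store, id));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            if (!futures.get(i).join()) {
                failed.add(ids.get(i));
            }
        }
        return failed;
    }
}
```

loadAll starts every chain at once, so for 15k items it should be combined with a cap on in-flight requests.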
One thing you need to check is that you don't issue too many requests simultaneously. In version 3 of the protocol, you can have up to 32k in-flight requests per connection, but in your case you may issue up to 60k (4x15k) requests. I'm using a wrapper around the Session class to limit the number of in-flight requests.
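The wrapper itself is not shown here, so the following is only one possible way to build such a limit, not the answerer's code: a Semaphore caps the number of pending requests, and each permit is released when its request completes, successfully or not. InFlightLimiter is a hypothetical name; with driver 4.x a call might look like limiter.submit(() -> session.executeAsync(stmt).toCompletableFuture()).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Caps the number of in-flight async requests with a Semaphore (hypothetical helper). */
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller while maxInFlight requests are pending, then starts the request. */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> request) {
        try {
            permits.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // keep the interrupt visible to the caller
            return CompletableFuture.failedFuture(e);
        }
        try {
            // Give the permit back when the request finishes, whether it succeeded or failed.
            return request.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            permits.release();                    // the request never started
            throw e;
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```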

rxJava Observer.onNext not called second time

I am using RxJava to fetch data from the database and show it in a RecyclerView. The relevant code is shown below:
private void updateUI() {
    ContactsLab contactsLab = ContactsLab.get(getActivity());
    Subscription sub = contactsLab.getContactList()
            .subscribeOn(Schedulers.io())
            .observeOn(AndroidSchedulers.mainThread())
            .toList()
            .subscribe(onContactsReceived());
    mCompositeSubscription.add(sub);
}
ContactsLab is a singleton that returns an Observable of Contact objects.
The onContactsReceived function is shown below:
private Observer<List<Contact>> onContactsReceived() {
    return new Observer<List<Contact>>() {
        @Override
        public void onCompleted() {}
        @Override
        public void onError(Throwable e) {}
        @Override
        public void onNext(List<Contact> contacts) {
            if (mContactsAdapter == null) {
                mContactsAdapter = new ContactsAdapter(contacts);
                mRecyclerView.setAdapter(mContactsAdapter);
            } else {
                mContactsAdapter.setContactList(contacts);
                mContactsAdapter.notifyDataSetChanged();
            }
        }
    };
}
The updateUI function is called in my fragment's onResume, but the view is updated only the first time. If I come back to this fragment from any other fragment (having added more items to the database), onResume is called, updateUI runs, and onContactsReceived also runs, but it returns immediately without calling onNext or onCompleted.
I think this has something to do with the way RxJava handles observables, but I have no idea how to fix it (I read about defer but couldn't understand much). Can somebody please help?
Edit:
The getContactList function looks like this:
public rx.Observable<Contact> getContactList() {
    List<Contact> contacts = new ArrayList<>();
    ContactCursorWrapper cursorWrapper = queryContacts(null, null);
    try {
        cursorWrapper.moveToFirst();
        while (!cursorWrapper.isAfterLast()) {
            contacts.add(cursorWrapper.getContact());
            cursorWrapper.moveToNext();
        }
    } finally {
        cursorWrapper.close();
    }
    return rx.Observable.from(contacts);
}
Basically, it queries the database and maps the returned Cursor into my Contact class (which is a POJO). I used rx.Observable.from to get an observable whose items are then collected with toList and passed to the adapter.
I used this approach to avoid calling notifyDataSetChanged after each item (and call it only once after receiving all of them).
What's the right approach to minimize the number of notifyDataSetChanged calls and also, refresh each time onResume is called?
Your observable contactsLab.getContactList().toList() has terminated. toList() collects all emissions from a source Observable into a list and emits the entire list once the source Observable terminates (see the documentation). You aren't going to observe any more emissions from it.
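To make the toList() behaviour concrete, here is a plain-Java model of the operator. Source and Operators are hypothetical types, not the RxJava API: the list is emitted only from the completion callback, so a completing source yields exactly one list and nothing afterwards, while a source that never completes yields no list at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** A tiny push-based source: a simplified model of an Observable, not RxJava itself. */
interface Source<T> {
    void subscribe(Consumer<T> onNext, Runnable onCompleted);
}

final class Operators {
    private Operators() {}

    /** Models toList(): buffer every item and emit the whole list once, when the upstream completes. */
    static <T> Source<List<T>> toList(Source<T> upstream) {
        return (onNext, onCompleted) -> {
            List<T> buffer = new ArrayList<>();   // a fresh buffer per subscription
            upstream.subscribe(buffer::add, () -> {
                onNext.accept(buffer);   // the single emission
                onCompleted.run();       // after this the stream is finished
            });
        };
    }
}
```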

RxJava one observable, multiple subscribers, one execution

I create an Observable from a long-running operation with a callback like this:
public Observable<API> login() {
    return Observable.create(new Observable.OnSubscribe<API>() {
        @Override
        public void call(final Subscriber<? super API> subscriber) {
            API.login(new SimpleLoginListener() {
                @Override
                public void onLoginSuccess(String token) {
                    subscriber.onNext(API.from(token));
                    subscriber.onCompleted();
                }
                @Override
                public void onLoginFailed(String reason) {
                    subscriber.onNext(API.error());
                    subscriber.onCompleted();
                }
            });
        }
    });
}
A successfully logged-in API is the precondition for multiple other operations like api.getX() and api.getY(), so I thought I could chain these operations with RxJava and flatMap like this (simplified): login().getX() or login().getY().
My biggest problem now is that I don't have control over when login(callback) is executed. However, I want to be able to reuse the login result for all calls.
This means: the wrapped login(callback) call should be executed only once. The result should then be used for all following calls.
It seems the result would be similar to a queue that aggregates subscribers and then shares the result of the first execution.
What is the best way to achieve this? Am I missing a simpler alternative?
I tried the code from this question and experimented with cache(), share(), publish(), refCount() etc., but with every one of those operators the wrapped function is called three times when I do this:
apiWrapper.getX();
apiWrapper.getX();
apiWrapper.getY();
Is there something like autoConnect(time window) that aggregates multiple successive subscribers?
Applying cache() should make sure login is only called once, provided that all callers subscribe to the same cached Observable instance (each call to login() builds a new one).
public Observable<API> login() {
    return Observable.<API>create(s -> {
        API.login(new SimpleLoginListener() {
            @Override
            public void onLoginSuccess(String token) {
                s.setProducer(new SingleProducer<>(s, API.from(token)));
            }
            @Override
            public void onLoginFailed(String reason) {
                s.setProducer(new SingleProducer<>(s, API.error()));
            }
        });
    }).cache();
}
If, for some reason, you want to "clear" the cache, you can use the following trick:
AtomicReference<Observable<API>> loginCache = new AtomicReference<>(login());
public Observable<API> cachedLogin() {
    return Observable.defer(() -> loginCache.get());
}
public void clearLoginCache() {
    loginCache.set(login());
}
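The same "execute once, share with every caller, allow a reset" behaviour can be sketched without RxJava, which may help show what cache() plus the AtomicReference trick achieves. SharedOnce is a hypothetical helper built on CompletableFuture: only the first caller starts the real call, and later callers get the same future until clear() is invoked. As with cache(), a failed login stays cached until it is cleared.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Runs an async call at most once and shares its result with every caller (hypothetical helper). */
final class SharedOnce<T> {
    private final Supplier<CompletableFuture<T>> call;
    private final AtomicReference<CompletableFuture<T>> cached = new AtomicReference<>();

    SharedOnce(Supplier<CompletableFuture<T>> call) {
        this.call = call;
    }

    CompletableFuture<T> get() {
        while (true) {
            CompletableFuture<T> existing = cached.get();
            if (existing != null) {
                return existing;                  // someone already started the call
            }
            CompletableFuture<T> fresh = new CompletableFuture<>();
            if (cached.compareAndSet(null, fresh)) {
                // Only the caller that won the race starts the real call.
                try {
                    call.get().whenComplete((value, error) -> {
                        if (error != null) fresh.completeExceptionally(error);
                        else fresh.complete(value);
                    });
                } catch (RuntimeException e) {
                    fresh.completeExceptionally(e);
                }
                return fresh;
            }
            // Lost the race (or clear() ran in between): loop and read the current future.
        }
    }

    /** Forgets the cached result, so the next get() runs the call again. */
    void clear() {
        cached.set(null);
    }
}
```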
OK, I think I found one major problem in my approach:
Observable.create() is a factory method, so even if every single observable was working as intended, I was creating many of them. One way to avoid this mistake is to create a single instance:
if (instance == null) { instance = Observable.create(...); }
return instance;

What Exactly Does HttpApplicationState.Lock Do?

My application stores two related bits of data in application state. Each time I read these two values, I may (depending on their values) need to update both of them.
So to prevent updating them while another thread is in the middle of reading them, I'm locking application state.
But the documentation for the HttpApplicationState.Lock method doesn't tell me exactly what it does.
For example:
How does it lock? Does it block any other thread from writing the data?
Does it also block read access? If not, then this exercise is pointless because the two values could be updated after another thread has read the first value but before it has read the second.
In addition to preventing multiple threads from writing the data at the same time, it is helpful to also prevent a thread from reading while another thread is writing; otherwise, the first thread could think it needs to refresh the data when it's not necessary. I want to limit the number of times I perform the refresh.
Looking at the code, Lock() only acquires the write lock; it does not take the read lock.
public void Lock()
{
    this._lock.AcquireWrite();
}
public void UnLock()
{
    this._lock.ReleaseWrite();
}
public object this[string name]
{
    get
    {
        return this.Get(name);
    }
    set
    {
        // here is the effect on the lock
        this.Set(name, value);
    }
}
public void Set(string name, object value)
{
    this._lock.AcquireWrite();
    try
    {
        base.BaseSet(name, value);
    }
    finally
    {
        this._lock.ReleaseWrite();
    }
}
public object Get(string name)
{
    object obj2 = null;
    this._lock.AcquireRead();
    try
    {
        obj2 = base.BaseGet(name);
    }
    finally
    {
        this._lock.ReleaseRead();
    }
    return obj2;
}
Individual reads and writes are already thread-safe, meaning each of them already takes the lock. So if you read data in a loop (or read several related values), you can take the lock around the whole loop to prevent other threads from changing the data in the middle.
It's also worth reading this answer: Using static variables instead of Application state in ASP.NET
It's better to avoid using Application to store data and to use a static member with your own lock mechanism instead: first, because Microsoft recommends it, and second, because application state takes the lock on every single read/write access.
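A sketch of "a static member with your own lock" for the two related values in the question, written in Java with ReentrantReadWriteLock (in C#, ReaderWriterLockSlim would play the same role). The field names and the refresh rule are hypothetical. Both values are read under one read lock, and the staleness check is repeated after taking the write lock, so threads that queued up behind a refresh do not refresh again.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Two related values in static fields, guarded by one explicit lock (hypothetical names). */
final class SharedPair {
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();
    private static long version;       // hypothetical first value
    private static long refreshedAt;   // hypothetical second value, must stay consistent with version

    private SharedPair() {}

    /** Reads both values under one read lock, so no writer can change them in between. */
    static long[] read() {
        LOCK.readLock().lock();
        try {
            return new long[] {version, refreshedAt};
        } finally {
            LOCK.readLock().unlock();
        }
    }

    /** Check-then-update under the write lock; returns false if another thread already refreshed. */
    static boolean refreshIfOlderThan(long cutoff, long now) {
        LOCK.writeLock().lock();
        try {
            if (refreshedAt >= cutoff) {
                return false;              // re-checked under the lock: someone else refreshed
            }
            version++;
            refreshedAt = now;
            return true;
        } finally {
            LOCK.writeLock().unlock();
        }
    }
}
```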
