Using a messaging queue like AMQP, ZMQ, or Kafka with RocksDB

I have gone through some material, talks, and documentation on RocksDB, and I find it interesting for several use cases. I would like to understand how it can be populated with data arriving through message queues like AMQP, ZMQ, or Kafka.
I haven't gone through the individual files on GitHub, in case there is an explanation of this there.
Please share your thoughts/experiences on this.
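To make the question concrete, the pattern I have in mind is a consumer process that reads messages off the queue and writes each one into a local RocksDB instance (RocksDB being an embedded library rather than a server). Below is a minimal sketch using the Kafka consumer client and RocksJava; the broker address, topic name, and database path are placeholder assumptions.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class KafkaToRocksDb {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();

            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            props.put("group.id", "rocksdb-writer");
            props.put("key.deserializer", ByteArrayDeserializer.class.getName());
            props.put("value.deserializer", ByteArrayDeserializer.class.getName());

            try (Options options = new Options().setCreateIfMissing(true);
                 RocksDB db = RocksDB.open(options, "/tmp/ingest-db");       // placeholder path
                 KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {

                consumer.subscribe(Collections.singletonList("events"));     // placeholder topic
                while (true) {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<byte[], byte[]> record : records) {
                        if (record.key() == null) {
                            continue; // skip keyless messages in this sketch
                        }
                        // Each message becomes one key/value pair in RocksDB
                        db.put(record.key(), record.value());
                    }
                }
            }
        }
    }

The same loop shape would apply with an AMQP or ZMQ client; only the consuming half changes, while the RocksDB side stays the same.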
Thanks

Related

Event-driven API with real-time streaming analytics from a datastream? (Kappa architecture, IoT)

I've recently read up on common Big Data architectures (Lambda and Kappa) and I'm trying to put them into practice in the context of an IoT application.
As of right now, events are produced, ingested into a database, queried, and provided as a REST API (backend) for a React frontend. However, this architecture is not event-driven, as the frontend isn't notified or updated when there are new events. I use frequent HTTP requests to "simulate" a real-time application.
Now at first glance, the Kappa architecture seems like the perfect fit for my needs, but I'm having trouble finding a technology that lets me write dynamic aggregation queries and serve them to a frontend.
As I understand it, frameworks like Apache Flink (or Spark Structured Streaming) are a great way to write such queries and apply them to the datastream, but the queries are static and can't be changed.
I'd like to find a way to filter, group, and aggregate events from a stream and provide the results to a frontend using WebSockets or SSE. As of right now, the aggregates don't need to be persisted, as they are strictly for visualization (this will probably change in the future).
I integrated a Kafka broker into my application and all events are ingested into a topic, ready for consumption.
Before I implemented Kafka, I tried to apply aggregation pipelines to my MongoDB change feed, which isn't fully supported and therefore doesn't fit my needs.
I tried using Apache Druid, but it seems as if it only supports a request/response pattern and can't stream query results for consumption.
I've looked into Apache Flink, but it seems as if you can only define static queries that are then submitted to the Flink cluster. Interactive/ad-hoc queries don't seem to be possible, which is really sad, as it looked very promising otherwise.
I think I've found a way that could work using Kafka + Kafka Streams (see the sketch after the questions below), but I'm not really satisfied with it, and this is why I'm writing this post.
My problem boils down to 2 questions:
How can I properly create interactive queries (filter, group (windowing), aggregate) and receive a continuous stream of results?
How can I serve this result stream to a frontend for visualization and thereby create a truly event-driven API?
I'd like to only rely on open-source/free software (Apache etc.).
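For reference, the Kafka + Kafka Streams idea I mentioned is roughly the sketch below: a single, fixed windowed aggregation whose results go to an output topic that a WebSocket/SSE bridge could consume. The topic names, application id, and one-minute window are placeholder assumptions, and the part I'm unhappy about is exactly what this does not solve: the topology is fixed at deploy time, and Kafka Streams "interactive queries" only let you read the state of an already-deployed aggregation, not submit ad-hoc ones.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.WindowedSerdes;

    import java.time.Duration;
    import java.util.Properties;

    public class WindowedCountApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "iot-aggregator");    // placeholder id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

            Duration windowSize = Duration.ofMinutes(1);
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("iot-events", Consumed.with(Serdes.String(), Serdes.String())) // placeholder topic
                   .filter((deviceId, payload) -> payload != null)                // filter
                   .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))    // group per device
                   .windowedBy(TimeWindows.ofSizeWithNoGrace(windowSize))         // window
                   .count()                                                       // aggregate
                   .toStream()
                   .to("iot-aggregates",                                          // continuously updated results
                       Produced.with(
                           WindowedSerdes.timeWindowedSerdeFrom(String.class, windowSize.toMillis()),
                           Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }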

Spring Cloud Stream multi-binder setup - remote source Kafka with local destination Kafka possible?

I am trying to consume messages from a remote Kafka and produce output to a local Kafka setup.
Additionally, I don't want to 'pollute' the remote Kafka with Kafka Streams-specific technical topics (e.g. intermediate KTable stores).
I created an example app for demo purposes; however, it doesn't work as expected, and I cannot say why.
Not working means: I don't really understand what's going on. Multiple consumers are being created, but all of them point to localhost; none seems to point to the remote Kafka.
https://github.com/webermich/multibinder
https://github.com/webermich/multibinder/blob/main/src/main/resources/application.yaml
So my questions are:
Is my general understanding correct that I can build something like I described above?
If the answer to 1) is yes: can anyone spot my mistakes?
I am also really unsure about the type: kstream of my bindings; I feel this is wrong.
Is there a link to a working example?
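For reference, stripped of the binder framework, what I'm trying to achieve is just a consumer pointed at the remote cluster and a producer pointed at the local one. A plain-client sketch of the behaviour I expect Spring Cloud Stream to wire up for me (broker addresses and topic names are placeholders):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class RemoteToLocalBridge {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "remote-kafka:9092");   // placeholder remote cluster
            consumerProps.put("group.id", "bridge");
            consumerProps.put("key.deserializer", StringDeserializer.class.getName());
            consumerProps.put("value.deserializer", StringDeserializer.class.getName());

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");      // placeholder local cluster
            producerProps.put("key.serializer", StringSerializer.class.getName());
            producerProps.put("value.serializer", StringSerializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

                consumer.subscribe(Collections.singletonList("remote-topic"));   // placeholder topics
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Re-publish each record on the local cluster
                        producer.send(new ProducerRecord<>("local-topic", record.key(), record.value()));
                    }
                }
            }
        }
    }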

Browsing JMS Durable Subscription

One of the projects I'm on is currently using a JMS topic setup, with the client application containing listeners for two different durable subscribers.
Long story short, we're looking at several different ways to solve an ordering issue, and one of those involves looking at JMS timestamps. At first we were thinking we might use whatever the durable-subscription equivalent of a QueueBrowser is, but so far I haven't found anything.
Is there any way to accomplish either browsing the contents of a durable subscription, or seeing the next message without actually consuming it?
JMS does not provide any API to browse messages on a topic or in a durable subscription. However, there is a TopicBrowser interface, which is an Oracle-specific extension to JMS.
You can use JMSToolBox on SourceForge to subscribe to the topic in parallel with your regular clients and capture all the messages posted to that topic.
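For contrast, standard JMS does offer non-destructive browsing, but only for queues, via QueueBrowser. A minimal sketch (the connection factory and queue name are supplied by the caller and are placeholders):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.Queue;
    import javax.jms.QueueBrowser;
    import javax.jms.Session;
    import java.util.Enumeration;

    public class QueuePeek {
        // Browse a queue without consuming its messages.
        public static void browse(ConnectionFactory factory, String queueName) throws JMSException {
            Connection connection = factory.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue(queueName);

                QueueBrowser browser = session.createBrowser(queue);   // non-destructive read
                Enumeration<?> messages = browser.getEnumeration();
                while (messages.hasMoreElements()) {
                    Message message = (Message) messages.nextElement();
                    System.out.println("JMSTimestamp=" + message.getJMSTimestamp()
                            + " id=" + message.getJMSMessageID());
                }
            } finally {
                connection.close();
            }
        }
    }

This only helps if the messages can also be routed to a queue alongside the topic.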

DynamoDB streams in Python

I would like to read data from a DynamoDB stream in Python, and the alternatives I have found so far are:
Use the low-level DynamoDB Streams library functions (as described here): this solution, however, seems almost impossible to maintain in a production environment, with the application having to track the status of shards, etc.
Use the KCL library designed for reading Kinesis streams: the Python version of the library seems unable to read from a DynamoDB stream.
What are the options to successfully process DynamoDB streams in Python? (Links to possible examples would be super helpful.)
PS: I have considered using a Lambda function to process the DynamoDB stream, but for this task I would like to read the stream in an application, as it has to interact with other components, which cannot be done via a Lambda function.
I would still suggest using Lambda. The setup is very easy as well as very robust (it's easy to manage retries, batching, downtimes, ...).
Then, from your Lambda invocation, you can easily send the data to your existing program in whatever way is convenient (including, but not limited to: SNS, SQS, a custom server webhook, or a custom pub/sub service you own, etc.).
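To illustrate the relay pattern, here is a sketch of a Lambda handler that forwards each stream record to SQS, written in Java using the aws-lambda-java-events and SQS SDK v2 libraries (a Python handler receives the same event shape); the queue URL is a hypothetical stand-in for whatever channel your existing application consumes:

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

    public class StreamRelayHandler implements RequestHandler<DynamodbEvent, Void> {

        private static final SqsClient SQS = SqsClient.create();
        // Hypothetical queue consumed by the existing application
        private static final String QUEUE_URL =
                "https://sqs.us-east-1.amazonaws.com/123456789012/app-events";

        @Override
        public Void handleRequest(DynamodbEvent event, Context context) {
            for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
                // Forward the raw stream record; the receiving app decides how to parse it
                SQS.sendMessage(SendMessageRequest.builder()
                        .queueUrl(QUEUE_URL)
                        .messageBody(record.getEventName() + ":" + record.getDynamodb().toString())
                        .build());
            }
            return null;
        }
    }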

In addition to the included Mongo datastore, how do I add and subscribe to additional sources of live data (e.g. a separate Riak DB)?

We have a system in mind whereby we will use the Meteor stack as is, but we would also like to have additional sources of live data that we can subscribe to.
I assume this would involve implementing DDP for the other data sources (in this case a Riak DB, and potentially RabbitMQ).
The additional sources would be read-only, but we need to update things based on the changes in the DB, hence the need for some sort of subscription.
So my questions are:
Given that we need to have multiple livedata sources, is implementing DDP even the correct approach?
Where would I start implementing DDP for Riak (pointers, examples if possible)?
Is there possibly some simpler way to achieve live updates from multiple sources, given that the extra sources would be read-only?
Thanks in advance :)
DDP is a client/server protocol, not a server-to-database protocol. This is not the approach I would take, especially for read-only data.
Instead, I would wrap a Riak Node.js library in a Meteor package, using a Fiber. You could look at the Mongo driver for a complicated example of this, or at the HTTP package for a simpler example. (Packages are found in /usr/local/meteor/packages.)
As the Node driver returns data, it would call back into your Meteor code to populate the collection. See the code snippet at "In Meteor, how to remove items from a non-Mongo Collection?"
