Is there any scalable data streaming framework that includes a UI that supports defining on-demand queries? [closed] - bigdata

Let's say I'm ingesting real-time data and want the end users who are viewing the data in real time to be able to group the information in different ways and quickly get views that represent the new groupings.
So if the data were, for example, all the transactions on an e-commerce web site, and the user was viewing a live grid of transactions (e.g. in a Kibana-like website, with live data delivered via websocket) and wanted to group by country and view the top 10 countries by the notional price of their transactions, the UI would send that command down to the servers, the servers would do all the calculations needed to feed the user the aggregated notionals of the top countries, and that stream would be constantly updated as new data was processed.
I know there are frameworks (e.g. Flink, Storm, Kafka Streams) that let you define such calculations in code, but is there any framework that lets the user pick different aggregations and set them up in real time?

I'll answer for Flink:
Apache Zeppelin has good integration with Flink. It lets users set up dynamic Flink queries on demand (user-defined); of course, you'll need a Flink cluster to attach to. The results also refresh in the Zeppelin UI.
I'm speculating, but I think the backend receives Flink's updates and the UI polls for new data at a fixed interval. While this is not done reactively, I believe this is as good as it currently gets.
Here is a Flink blog article on it: https://flink.apache.org/news/2020/06/15/flink-on-zeppelin-part1.html

Flink can do this out of the box, using Flink SQL with the SQL client. You can interactively create dynamic, continuously updating queries that stream their results into Elasticsearch (for example).
This talk is a good intro that shows off what's possible. It includes a bunch of example queries, and uses Grafana on top of MySQL for dashboarding. You can do the same with Elasticsearch/Kibana, if you prefer.
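For illustration, here is a rough sketch of what a continuously updating Top-N query for the scenario in the question might look like in Flink SQL; the transactions source table, its country and price columns, and the Elasticsearch-backed sink table are assumptions, not something Flink defines for you.
    -- Hypothetical table names; define the source/sink with the connectors you actually use.
    -- Continuously maintain the top 10 countries by total notional and stream the
    -- continuously updated result into an Elasticsearch-backed sink table.
    INSERT INTO top_countries_es          -- sink table created with the Elasticsearch connector
    SELECT country, total_notional
    FROM (
        SELECT country, total_notional,
               ROW_NUMBER() OVER (ORDER BY total_notional DESC) AS row_num
        FROM (
            SELECT country, SUM(price) AS total_notional
            FROM transactions             -- source table, e.g. backed by the Kafka connector
            GROUP BY country
        )
    )
    WHERE row_num <= 10;
Each new transaction updates the aggregate and, when the ranking changes, the sink receives the corrected rows, which is exactly the "constantly updating grouped view" the question describes.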

Related

Which of the design patterns (with or without SQS) below should they use to process spiked data volume in AWS? [closed]

I am studying for AWS and came across this seemingly debatable question online; I'm wondering if posting it here can get more input:
A company is building a voting system for a popular TV show. Viewers will watch the performances, then visit the show's website to vote for their favorite performer. It is expected that in a short period of time after the show has finished, the site will receive millions of visitors. The visitors will first log in to the site using their Amazon.com credentials and then submit their vote. After the voting is completed, the page will display the vote totals. The company needs to build the site such that it can handle the rapid influx of traffic while maintaining good performance, but also wants to keep costs to a minimum. Which of the design patterns below should they use?
A. Use CloudFront and an Elastic Load Balancer in front of an auto-scaled set of web servers; the web servers will first call the Login With Amazon service to authenticate the user, then process the user's vote and store the result in a multi-AZ Relational Database Service instance.
B. Use CloudFront and the static website hosting feature of S3 with the JavaScript SDK to call the Login With Amazon service to authenticate the user, and use IAM roles to gain permissions to a DynamoDB table to store the user's vote.
C. Use CloudFront and an Elastic Load Balancer in front of an auto-scaled set of web servers; the web servers will first call the Login With Amazon service to authenticate the user, then process the user's vote and store the result in a DynamoDB table, using IAM Roles for EC2 instances to gain permissions to the DynamoDB table.
D. Use CloudFront and an Elastic Load Balancer in front of an auto-scaled set of web servers; the web servers will first call the Login With Amazon service to authenticate the user, then process the user's vote and store the result in an SQS queue, using IAM Roles for EC2 instances to gain permissions to the SQS queue. A set of application servers will then retrieve the items from the queue and store the result in a DynamoDB table.
The original question is from http://www.briefmenow.org/amazon/which-of-the-design-patterns-below-should-they-use/#comment-25822 and it is heavily debated there.
My thought is to limit the number of AWS services so that the cost stays minimal. RDS is excluded here because three of the options point to DynamoDB (I am not a cloud guru; this is my instinctive judgement). S3 is only good for static websites, so B is excluded.
Between C and D I chose C because: 1. it doesn't need SQS, which is an extra cost; 2. I don't know whether SQS can process the volume at the expected IOPS/throughput. Both C and D use DynamoDB, which I think is a good choice, but D doesn't mention the permissions the application servers (i.e. EC2 instances) would need to access DynamoDB. So D is excluded here.
Am I missing anything here?
There is no standard, authoritative answer provided for this question.
Thank you very much for the discussion.
Highly opinionated question.
This is one way to solve the problem.
Design considerations
High number of requests in a short span of time:
a. We must be able to auto-scale.
b. Write traffic should be spread out so that we don't have to provision a lot of DynamoDB capacity.
c. Reads should not require database join and count operations, so that we don't have to provision a lot of RAM or DynamoDB capacity.
Assumptions
Eventual consistency of votes is fine.
Writes should be strongly consistent. (If votes are immutable (cannot be changed once cast), it may be fine to use SQS; otherwise, bringing in SQS adds a lot of complexity to the system, as explained below.)
Architecture
Components
Use CloudFront and the static website hosting feature of S3 to power the website (so that it is horizontally scalable). (Note: this approach is called client-side rendering and has several cons, e.g. SEO; do your research before choosing it. If you need server-side rendering, put another server behind an AWS ELB that calls the other services and builds the page. Weigh the pros and cons of both approaches; for the rest of this answer I assume you are doing server-side rendering.)
The website calls your services to render the page.
Your service is deployed on EC2 with auto scaling enabled.
All reads are served from ElastiCache (deployed in a master/replica configuration so there is no single point of failure).
All writes go to DynamoDB consistently. (Most services need to read a consistent state to decide the next state; if there is an SQS queue in between, it becomes impossible to determine the exact state of the system at any given point in time. But this also means you pay more for DynamoDB, so enable auto scaling on the table.)
Since the major share of the load will be reads, keep the aggregates in ElastiCache. To update ElastiCache, you can have a Lambda function subscribed to the table's DynamoDB stream (a sketch follows below).
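A minimal sketch of that Lambda, assuming the votes table's stream is enabled, a performerId attribute on each item, per-performer totals kept in a Redis hash, and a REDIS_URL environment variable pointing at the ElastiCache endpoint (all of these names are illustrative, not a fixed design):
    // Hypothetical sketch: a Lambda attached to the votes table's DynamoDB stream keeps
    // per-performer totals in ElastiCache (Redis). Table/attribute names, the Redis key
    // layout and the REDIS_URL variable are illustrative assumptions.
    import { DynamoDBStreamEvent } from "aws-lambda";
    import Redis from "ioredis";

    const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

    export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
      for (const record of event.Records) {
        // Only count newly inserted votes; ignore modifications and deletions in this sketch.
        if (record.eventName !== "INSERT" || !record.dynamodb?.NewImage) continue;

        const performerId = record.dynamodb.NewImage.performerId?.S;
        if (!performerId) continue;

        // Increment the aggregate that the read path serves straight from the cache.
        await redis.hincrby("vote_totals", performerId, 1);
      }
    };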
Contingencies
You should have a plan to repopulate ElastiCache in case it goes down and you have to rehydrate its state.
Here are the cons of your options
Option A cons:
It will require a huge amount of read capacity, since you will be aggregating results.
Option B cons:
It will require a huge amount of read capacity, since you will be aggregating results.
Directly calling DynamoDB from the frontend might not be a good idea, considering that how you store and retrieve data may evolve over time.
See cons of client side rendering.
Option C cons:
It will require a huge amount of read capacity, since you will be aggregating results.
Option D cons:
It will require a huge amount of read capacity, since you will be aggregating results.
Asynchronous applications have their own complexities.
For example:
If a user votes and then refreshes the page, you may show them that they haven't cast any vote, because your application has yet to process the SQS event.
If a user first upvotes and then downvotes, your system may record that the user has upvoted, since standard SQS events are not processed in order.
Despite the above cons, if you still want to take this approach: what you are trying to do here is called event sourcing. You should use Kafka instead of SQS so that your events are ordered (see the sketch below).
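A minimal sketch of producing keyed vote events with kafkajs, so that all events for a given user land in the same partition and are consumed in order (broker address, topic name and event shape are assumptions for illustration):
    // Sketch: key vote events by userId so Kafka keeps per-user ordering
    // (e.g. an upvote followed by a downvote is consumed in that order).
    // Broker addresses, topic name and the event shape are illustrative assumptions.
    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ clientId: "voting-app", brokers: ["broker-1:9092"] });
    export const producer = kafka.producer();

    export async function recordVote(userId: string, performerId: string): Promise<void> {
      // All messages with the same key go to the same partition, preserving order per user.
      await producer.send({
        topic: "votes",
        messages: [{ key: userId, value: JSON.stringify({ userId, performerId, at: Date.now() }) }],
      });
    }

    // Usage: connect once at startup, then record votes as they come in.
    // await producer.connect();
    // await recordVote("user-123", "performer-7");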

Let client add data to Firestore through HTTP POST cloud function (Architecture) [closed]

So I'm building a web shop using Firestore and Firebase for the first time, and I'm also new to NoSQL. I have the following architectural problem: when a customer places an order, the client sends the ordered products directly to Firestore, which stores the order in a raw_orders collection. My idea was then to have a Cloud Function trigger on the document creation, process it, and build an invoice and such. However, I read that this function invocation may be delayed by up to 10 seconds, and I would like a synchronous solution instead.
Instead, I had the idea of creating an HTTP Cloud Function that the customer can POST the order to; the HTTP function then processes the order, pushes it to Firestore, and returns the order ID (or something similar) to the customer. This approach feels much safer since the user won't have to talk to the database directly. It also avoids the issue that a function triggered by a Firestore create might be delayed.
However, I'm new to Firebase and I'm not sure whether this is the architecturally preferred way. The method I propose seems more in line with regular old REST APIs.
What do you think?
Thanks!
It sounds like you definitely have some server-side code and database operations that you can't trust the clients to do. (Keep in mind that Firestore security rules are your only protection: anyone can run whatever code they want within those rules, not just the code you provide.)
Cloud functions give you exactly this -- and since you both want the operation to be synchronous (from the view of your client) and presumably have some way for the client to react to errors in the process, a cloud function would make a lot of sense for you to use.
Using cloud functions in this way is very common in Firebase apps, even if it isn't pure REST.
Moreover, if you are using Firebase more generally in your client, it might be more natural to use a callable cloud function rather than an http function, as this will handle the marshaling of the parameters in a much more native way than a raw HTTP request might. However, it isn't clear in your case since it sounds like you're using the REST API today.
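As a rough illustration of the callable approach (the orders collection name and the payload shape are assumptions made for this sketch, not a prescribed schema):
    // Minimal sketch of a callable Cloud Function (firebase-functions v1 API).
    // The "orders" collection and the order payload shape are illustrative assumptions.
    import * as functions from "firebase-functions";
    import * as admin from "firebase-admin";

    admin.initializeApp();

    export const placeOrder = functions.https.onCall(async (data, context) => {
      // Reject unauthenticated callers before touching the database.
      if (!context.auth) {
        throw new functions.https.HttpsError("unauthenticated", "Sign in to place an order.");
      }

      // Validate (and price) the order server side instead of trusting the client.
      const items = data.items;
      if (!Array.isArray(items) || items.length === 0) {
        throw new functions.https.HttpsError("invalid-argument", "Order has no items.");
      }

      // Write the processed order; invoice generation could happen here as well.
      const orderRef = await admin.firestore().collection("orders").add({
        uid: context.auth.uid,
        items,
        createdAt: admin.firestore.FieldValue.serverTimestamp(),
      });

      // The client gets the order id back synchronously.
      return { orderId: orderRef.id };
    });
On the web client this would be invoked with httpsCallable from the Firebase SDK, which handles auth tokens and serialization for you; with a plain HTTP function you would do that marshaling yourself.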
Keep in mind that there are additional costs (or specific quotas, depending on your pricing plan) for running cloud functions. The pricing is based on invocations, as well as CPU and RAM usage of your functions.

Which is better among Cloud Firestore and Realtime Database in Firebase for working with large and complex data structures? [closed]

I am developing Android and iOS apps, with web backend admin support, for an organisation-oriented employee, task and issue management system. Issues and tasks will be posted by members of the public, and issues will be addressed and solved by people in the organization.
My app will have dynamic keyword filtering over new issue postings, and an algorithm will keep running to identify issue categories dynamically from the issues being posted.
The search results and filters in my app should be fast at retrieving the data, and this should not affect my application's performance. I don't know which one is good to use for a case like this.
I use the Realtime Database for a large iOS app that contains users, jobs for users, favourites, messages, etc., so I need the database to show results in real time as well as being reliable.
The best features for a big app like the one I have built are:
Offline capabilities
If the user goes offline, the database is still responsive and persists data to disk, which resynchronises when a connection is established.
Data synchronization
All of my users on the app can see changes happening instantly, such as notifications, messages, job updates, etc. It's reliable and prevents any potential risk of overlap.
https://firebase.google.com/docs/database/
Additionally, it sounds like you will benefit from the easy JSON structure that the Firebase Realtime Database offers for building categories, etc.
I use a lot of filters in my iOS app, and the calls to the database return results almost instantly, given that a connection is established; it's very flexible.
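For example, a filtered query with the web SDK looks roughly like this (the jobs node, its status field, and the limit are made-up illustrations; the iOS SDK has equivalent calls):
    // Sketch of a filtered Realtime Database query (Firebase web SDK, v9 modular API).
    // The "jobs" node, its "status" field and the app config are illustrative assumptions.
    import { initializeApp } from "firebase/app";
    import {
      getDatabase, ref, query, orderByChild, equalTo, limitToFirst, onValue,
    } from "firebase/database";

    const app = initializeApp({ /* your Firebase config */ });
    const db = getDatabase(app);

    // Listen for up to 20 open jobs; the callback re-fires whenever the matching data changes.
    const openJobs = query(ref(db, "jobs"), orderByChild("status"), equalTo("open"), limitToFirst(20));
    onValue(openJobs, (snapshot) => {
      snapshot.forEach((child) => {
        console.log(child.key, child.val());
      });
    });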
In terms of reliability
Cloud Firestore was in beta at the time of writing. Stability in a beta product is not always the same as that of a fully launched product, whereas the Realtime Database is a mature product.
Use the Firebase Realtime Database if your apps are not going to serve more than a thousand people, because the Realtime Database doesn't handle complex, fast queries the way Cloud Firestore does, and you cannot easily migrate your app later if it is built on the Realtime Database.
Firestore has more features than the Realtime Database.
You can check the differences between the two here:
https://firebase.google.com/docs/database/rtdb-vs-firestore

Is firebase realtime json database suitable for data broadcasting? [closed]

I am considering using Firebase as a way to broadcast data messages to many connected users on native mobile apps running actively in the foreground.
In a "channel" (presumably a node in the database) there might be a new 1 KB message every second or so, with potentially thousands of users listening in.
The ideal latency should be less than a second.
Is Firebase realtime json database ideal for this use case?
What are the limitations on number of users, number of messages and latency?
How does it compare to "Google Cloud Messaging", native push notifications, or other frameworks, for the same purpose?
Firebase's Realtime Database is a real-time JSON database, and it would work absolutely fine for what you are requesting.
There is no limit on the total number of users you can have, but there is a limit on the number of simultaneously connected users. The free pricing tier allows 100 active connections at one time; the paid tiers allow far more concurrent connections. There is no limit on the number of messages. Latency is very low; changes are displayed almost instantly.
I haven't personally worked with Google Cloud Messaging or any other real-time frameworks, so I can't answer that. But Firebase has great documentation and is very easy to set up and implement. The only downside is that Firebase does not currently provide push notifications; however, they can easily be implemented with a push notification service such as Batch.
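A sketch of the channel pattern from the question with the Realtime Database web SDK (the channels/news path and the message shape are assumptions made for illustration):
    // Sketch of a broadcast "channel" node (Firebase web SDK, v9 modular API).
    // The channels/news path and message shape are illustrative assumptions.
    import { initializeApp } from "firebase/app";
    import { getDatabase, ref, push, query, limitToLast, onChildAdded } from "firebase/database";

    const app = initializeApp({ /* your Firebase config */ });
    const db = getDatabase(app);
    const channel = ref(db, "channels/news");

    // Publisher: append a ~1 KB message roughly once per second.
    export function broadcast(payload: object): void {
      push(channel, { ...payload, sentAt: Date.now() });
    }

    // Subscribers: every connected client receives new messages as they arrive,
    // typically well under a second after they are written.
    onChildAdded(query(channel, limitToLast(1)), (snapshot) => {
      console.log("new message", snapshot.val());
    });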

Meteor.js: How are HTML templates served? [closed]

I read that Meteor doesn't do any server side processing of view templates. Does this mean that HTML is served as-is to the client, with AJAX requests to populate dynamic parts of the page?
How does this compare to server side template processing in terms of serving the same dynamic-content-rich page to many users (100s of thousands)?
Your question is kind of broad, and I don't think this is the place for such comparisons, but I'll try to get you started regarding Meteor's approach:
Meteor used to be bundled with a view layer called Blaze. It is still the official view layer, but it appears that the next version will be based on React.js. Anyway, Meteor is more loosely coupled from Blaze now and you can choose any view layer, with Blaze, React and Angular being officially supported.
All of the above are templates/components that are compiled to JavaScript and rendered on the client, based on state/data available locally.
This data is usually obtained via a pub/sub mechanism (using a local cache that mimics the MongoDB interface, called MiniMongo) and mutated via async RPC mechanism called Meteor Methods.
The Meteor servers monitor the database for changes by looking at a change stream called OpLog.
When a client requests data (via a subscription), the server fetches the initial data and monitors for changes. If an OpLog change matches a subscription's criteria, an update is sent to the client.
The notion of Reactive Computations is used throughout the framework, where some data sources can be invalidated and re-evaluate functions that depend on them.
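A small sketch of that publish/subscribe-plus-Methods flow, using the classic Meteor API the answer describes (the Messages collection, publication name and fields are assumptions for illustration):
    // Sketch of Meteor's pub/sub + Methods flow (classic, pre-async API to match the
    // Meteor version discussed above). Collection and field names are illustrative.
    import { Meteor } from "meteor/meteor";
    import { Mongo } from "meteor/mongo";

    export const Messages = new Mongo.Collection("messages");

    if (Meteor.isServer) {
      // Server: publish a cursor; Meteor watches the MongoDB oplog and pushes
      // matching changes to subscribed clients over DDP.
      Meteor.publish("messages.recent", function () {
        return Messages.find({}, { sort: { createdAt: -1 }, limit: 50 });
      });

      // Mutations go through an RPC-style Method instead of direct client writes.
      Meteor.methods({
        "messages.insert"(text: string) {
          Messages.insert({ text, createdAt: new Date() });
        },
      });
    }

    if (Meteor.isClient) {
      // Client: subscribe, then read reactively from the local MiniMongo cache.
      Meteor.subscribe("messages.recent");
      Meteor.call("messages.insert", "hello");
    }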
Combined with a client-side router you often get what is commonly referred to as a SPA (single page application).
The state of the application (route + data + local state) normally dictates what views are rendered on the screen.
Currently, Meteor bundles the views and other code during the build process and sends the bundle to the client, which then has all of the code that it needs to render all of the views and fetch the required data.
A more modular approach is being investigated by the community (via alternative build methods) and is expected in the upcoming version 1.3 of Meteor.
The data transport mechanism is the Meteor DDP (distributed data protocol), which uses a WebSocket when possible to transfer data back and forth between the client and the server, so no need for AJAX/Comet calls for each state mutation.
I think that the spectrum of alternative implementations is too broad to discuss in a SO answer. It really depends on how "reactive" or "real time" you want your app to be.
The server capacity greatly depends on your implementation:
the amount of data each user needs and the frequency in which this data changes
the way you construct your queries (getting the data you need and doing so efficiently)
the way you partition your code
the hardware, of course
It can range from hundreds to tens of thousands of connected users per server; there is no real way to provide a generic answer here.
An interesting guide is being created to demonstrate best practices using Meteor.
