OpenSearch - Anomaly Detector - Kibana

Context:
There are four fields in my data stream (text): id, userName, loginDateTime, and loginGeoLocation. A user shouldn't be allowed to log in from different machines (geoLocations), but they are logging in from multiple machines. The data stream comes from many external systems that we have no control over; it lands on a Kafka topic, and a Logstash pipeline picks it up and pushes it to an OpenSearch index.
Problem Statement:
I want to identify (and alert on) a user logging in from a totally different location than their usual one. For example, user-a normally logs in from Redwood City, CA but suddenly logs in from Boston, MA. On this event an alert should trigger and send an email/push notification, etc. How can we achieve this using the pipeline, Logstash, or any other method available with OpenSearch, short of custom development or an interceptor on the stream?

After spending a lot of time on this, I figured out a few ways to do it:
Create an anomaly detector (https://www.elastic.co/guide/en/machine-learning/7.17/ml-configuring-detector-custom-rules.html)
Use the fingerprint filter in Logstash while ingesting the data (https://www.elastic.co/blog/logstash-lessons-handling-duplicates)
Customise a plugin to apply custom-written rules (https://github.com/logstash-plugins/logstash-input-java_input_example); a sketch of this approach follows below.
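For the third option, here is a minimal sketch of what a custom Logstash Java filter plugin could look like, modeled on the plugin API used in the official Java plugin examples (the repository linked above shows an input plugin; a filter is the more natural fit for a rule like this). The plugin name, the config options, and the naive in-memory lookup are all assumptions for illustration, not a production detector:

```java
import co.elastic.logstash.api.Configuration;
import co.elastic.logstash.api.Context;
import co.elastic.logstash.api.Event;
import co.elastic.logstash.api.Filter;
import co.elastic.logstash.api.FilterMatchListener;
import co.elastic.logstash.api.LogstashPlugin;
import co.elastic.logstash.api.PluginConfigSpec;

import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Flags an event when a userName shows up with a geoLocation different from
// the last one seen for that user. Field names default to the ones from the
// question; the plugin name and option names are invented for this sketch.
@LogstashPlugin(name = "login_location_check")
public class LoginLocationCheck implements Filter {

    public static final PluginConfigSpec<String> USER_FIELD =
            PluginConfigSpec.stringSetting("user_field", "userName");
    public static final PluginConfigSpec<String> LOCATION_FIELD =
            PluginConfigSpec.stringSetting("location_field", "loginGeoLocation");

    private final String id;
    private final String userField;
    private final String locationField;
    // Last known location per user. In-memory only, so it is lost on restart
    // and not shared across Logstash instances; purely illustrative.
    private final Map<String, String> lastLocation = new ConcurrentHashMap<>();

    public LoginLocationCheck(String id, Configuration config, Context context) {
        this.id = id;
        this.userField = config.get(USER_FIELD);
        this.locationField = config.get(LOCATION_FIELD);
    }

    @Override
    public Collection<Event> filter(Collection<Event> events, FilterMatchListener matchListener) {
        for (Event event : events) {
            Object user = event.getField(userField);
            Object location = event.getField(locationField);
            if (user == null || location == null) {
                continue;
            }
            String previous = lastLocation.put(user.toString(), location.toString());
            if (previous != null && !previous.equals(location.toString())) {
                // Downstream outputs (an alerting index, email output, etc.)
                // can key off this field.
                event.setField("location_changed", true);
                matchListener.filterMatched(event);
            }
        }
        return events;
    }

    @Override
    public Collection<PluginConfigSpec<?>> configSchema() {
        return Arrays.<PluginConfigSpec<?>>asList(USER_FIELD, LOCATION_FIELD);
    }

    @Override
    public String getId() {
        return id;
    }
}
```

Keeping the last-seen location in a plain map only works for a single pipeline worker and is lost on restart, so a real version would need a shared store (or the anomaly detector from option 1).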
Hope it helps someone with a similar problem.

Related

NiFi FlowFile Progress

I'm trying to figure out what one could do to see where a flowfile is in a NiFi flow, over HTTP. For example, say I have a webpage where a user can upload files. I want to indicate to the user that the file is currently being ingested/processed, and possibly what stage it's at. What does NiFi offer that I could leverage to get this information? Is there a way to see which processors a flowfile has gone through, or the processor/queue it's currently in?
Thanks
One option would be to use NiFi's provenance events: issue a provenance query for the flow file UUID to see all the current events, which should give you a graph of all the processors it has passed through:
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data_provenance
You can open Chrome Dev Tools while using provenance features in the UI and see what calls are being made to the REST API.
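As a rough sketch, submitting that provenance query from Java could look like the following. The /nifi-api/provenance path is the provenance resource of the REST API, but the host, port, security setup, and the exact JSON field names here are assumptions; capture the real payload your NiFi version sends with Dev Tools, as described above, and mirror it:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Submits a provenance query for a single flow file UUID against the NiFi REST API.
public class ProvenanceQuery {

    public static void main(String[] args) throws Exception {
        String flowFileUuid = args[0];

        // Provenance queries are asynchronous: the POST returns a query resource
        // that you then poll (GET /nifi-api/provenance/{id}) until it finishes,
        // and finally DELETE to clean it up. The payload shape below is an
        // assumption; verify it against what the UI sends in your version.
        String body = """
                {
                  "provenance": {
                    "request": {
                      "maxResults": 1000,
                      "searchTerms": { "FlowFileUUID": "%s" }
                    }
                  }
                }
                """.formatted(flowFileUuid);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/nifi-api/provenance"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Once the query has finished, the events list names the component
        // (processor) each event came from, which gives you the path taken.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```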
Another option is to build some kind of status updates into your flow. You could stand up your own HTTP service that receives simple events like an id, timestamp, and processor name; then in your flow you could put InvokeHTTP processors wherever you want to report status to your service. Your UI would then use the status events from its own DB or wherever you store them.
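A bare-bones sketch of such a status service, using only the JDK's built-in HTTP server (the port, path, and event format are made up for illustration; in the flow, an InvokeHTTP processor would POST to it at each point you want to report):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal status-event receiver: the flow POSTs small events to /status and the
// UI can later read the latest status per file id from the in-memory map
// (a real implementation would parse JSON and persist to a database).
public class StatusService {

    private static final Map<String, String> latestStatus = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);

        server.createContext("/status", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                // Expected payload (illustrative): "fileId=abc123;processor=ValidateRecord"
                String event = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                String fileId = event.split(";")[0].replace("fileId=", "").trim();
                latestStatus.put(fileId, event);
            }
            exchange.sendResponseHeaders(200, -1); // no response body
            exchange.close();
        });

        server.start();
        System.out.println("Status service listening on :8081");
    }
}
```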

How to handle network calls in Microservices architecture

We are using a microservices architecture where top services expose REST APIs to the end user and backend services do the work of querying the database.
When we get 1 user request, we make ~30k requests to the backend service. We are using RxJava for the top service, so all 30k requests get executed in parallel.
We are using HAProxy to distribute the load between the backend services.
However, when we get 3-5 user requests, we start getting network connection exceptions: No Route to Host exceptions and socket connection exceptions.
What are the best practices for this kind of use case?
Well, you ended up with the classic microservice mayhem. It's completely irrelevant what technologies you employ - the problem lies within the way you applied the concept of microservices!
It is natural in this architecture that services call each other (preferably that should happen asynchronously!). Since I know only a little about your service APIs, I'll have to make some assumptions about what went wrong in your backend:
I assume that a user makes a request to one service. This service will now (obviously synchronously) query another service and receive these 30k records you described. Since you probably have to know more about these records, you now have to make another request per record to a third service/endpoint to aggregate all the information your frontend requires!
This shows me that you probably got the whole thing with bounded contexts wrong! So much for the analytical part. Now to the solution:
Your API should return all the information along with the query that enumerates them! Sometimes that can seem like a contradiction of the kind of isolation and authority over data/state that the microservices pattern specifies - but it is not feasible to isolate data/state in one service only, because that leads to the problem you currently have: all other services HAVE to query that data every time to be able to return correct data to the frontend! However, it is possible to duplicate it as long as the authority over the data/state is clear!
Let me illustrate that with an example: let's assume you have a classical shop system. Articles are grouped. Now you would probably write two microservices - one that handles articles and one that handles groups! And you would be right to do so! You might have already decided that the group-service will hold the relation to the articles assigned to a group! Now if the frontend wants to show all items in a group, what happens? The group-service receives the request and returns 30,000 article numbers in a beautiful JSON array that the frontend receives. This is where it all goes south: the frontend now has to query the article-service for every single article it received from the group-service! And you're screwed!
Now there are multiple ways to solve this problem. One is (as previously mentioned) to duplicate article information to the group-service: every time an article is assigned to a group using the group-service, it has to read all the information for that article from the article-service and store it, to be able to return it with the get-me-all-the-articles-in-group-x query. This is fairly simple, but keep in mind that you will need to update this information when it changes in the article-service, or you'll be serving stale data from the group-service. Event sourcing can be a very powerful tool in this use case and I suggest you read up on it! You can also use simple messages sent from one service (in this case the article-service) to a message bus of your preference and make the group-service listen and react to these messages, as sketched below.
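Just to show the shape of the messaging variant, here is a sketch of the group-service side as a plain Kafka consumer (the topic name, the key/value layout, and Kafka itself as the bus are assumptions; any message bus works the same way):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Group-service side: keep a local, read-optimized copy of article data so that
// "all articles in group X" can be answered without calling the article-service.
// The article-service stays the authority; this copy is only updated from its events.
public class ArticleEventListener {

    // Local copy keyed by article id; a real service would persist this.
    private static final Map<String, String> articleCopy = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "group-service");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // "article-events" topic name and the key/value layout are illustrative.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("article-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // key = article id, value = serialized article details
                    articleCopy.put(record.key(), record.value());
                }
            }
        }
    }
}
```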
Another very simple, quick-and-dirty solution could be to provide a new REST endpoint on the article-service that takes an array of article ids and returns the information for all of them in one call, which would be much quicker. This could probably solve your problem very quickly.
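A sketch of what such a batch endpoint could look like on the article-service, here with Spring MVC as one possible stack (the path, the Article record, and the in-memory store are placeholders):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// One request for N articles instead of N requests for one article each.
@RestController
public class ArticleBatchController {

    public record Article(String id, String name, double price) {}

    // Stand-in for the real data access layer.
    private final Map<String, Article> store = new ConcurrentHashMap<>();

    @PostMapping("/articles/lookup")
    public List<Article> lookup(@RequestBody List<String> articleIds) {
        // Resolve every requested id in a single round trip to this service.
        return articleIds.stream()
                .map(store::get)
                .filter(article -> article != null)
                .toList();
    }
}
```

One request now resolves the whole list of articles, so the number of cross-service calls no longer grows with the size of the group.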
A good rule of thumb in a backend with microservices is to aim for a constant number of these cross-service calls, meaning the number of calls that go across service boundaries should never be directly related to the amount of data that was requested! We closely monitor which service calls are made for a given request that comes through our API, to keep track of which services call which other services and where our performance bottlenecks will arise or have been caused. Whenever we detect that a service makes many calls to other services (there is no fixed threshold, but every time I see >4 I start asking questions!) we investigate why and how this could be fixed! There are some great metrics tools out there that can help you with tracing requests across service boundaries!
Let me know if this was helpful or not, and whatever solution you implemented!

Use a web service in the same project or handle it with code?

This is a theoretical question.
Imagine an ASP.NET website. By clicking a button, the site sends mail. Now:
I can send mail asynchronously with code
I can send mail using QueueBackgroundWorkItem
I can call a one-way web service located in the same website
I can call a one-way web service located in ANOTHER website (or another subdomain)
None of the above solutions waits for the mail operation to complete, so they are all fine.
My question is: why should I use the service solution instead of the others? Is there an advantage?
The 4th solution adds additional TCP/IP traffic to use the service, so it's not efficient, right?
If so, using a service under the same website (3rd solution) also generates additional traffic. Is that correct?
I need to understand why people use services under the same website. Is there any reason besides making something available to AJAX calls?
Any information would be great. I really need to get opinions.
Best
The most appropriate architecture will depend on several factors:
the volume of emails that needs to be sent
the need to reuse the email sending capability beyond the use case described
the simplicity of implementation, deployment, and maintenance of the code
Separating out the sending of emails in a service either in the same or another web application will make it available to other applications and from client side code. It also adds some complexity to the code calling the service as it will need to deal with the case when the service is not available and handle errors that may occur when placing the call.
Using a separate web application for the service is useful if the volume of emails sent is really large, as it allows you to offload the work to one or more servers if needed. Given the use case described (a user clicks a button), this seems rather unlikely, unless the web site will have really large traffic. Creating a separate web application adds significant development, deployment, and maintenance work, initially and over time.
Unless the volume of emails to be sent is really large (millions per day) or there is a need to reuse the email capability in other systems, creating the email sending function within the same web application (first two options listed in the question) is almost certainly the best way to go. It will result in the least amount of initial work, is easy to deploy, and (perhaps most importantly) will be the easiest to maintain.
An important concern to pay significant attention to when implementing an email sending function is robustness. Robustness can be achieved with any of the possible architectures and is somewhat of a different concern than the one emphasized by the question. However, it is important to consider the proper course of action if (1) the receiving SMTP server refuses to take the message (e.g., mailbox full, non-existent account, rejection as spam) or (2) an NDR is generated after the message is sent (e.g., rejection as spam). Depending on the kind of email sent, it may be OK to ignore these errors, or some corrective action may be needed (e.g., retry sending, alert the user who originated the email, ...).

Pattern for long running tasks invoked through ASP.NET

I need to invoke a long running task from an ASP.NET page, and allow the user to view the task's progress as it executes.
In my current case I want to import data from a series of data files into a database, but this involves a fair amount of processing. I would like the user to see how far through the files the task is, and any problems encountered along the way.
Due to limited processing resources I would like to queue the requests for this service.
I have recently looked at Windows Workflow and wondered if it might offer a solution?
I am thinking of a solution that might look like:
ASP.NET AJAX page -> WCF Service -> MSMQ -> Workflow Service *or* Windows Service
Does anyone have any ideas, experience or have done this sort of thing before?
I've got a book that covers explicitly how to integrate WF (Workflow Foundation) and WCF. It's too much to post here, obviously. I think your question deserves a longer answer than can readily be given on this forum, but Microsoft offers some guidance.
And a Google search for "WCF and WF" turns up plenty of results.
I did have an app under development where we used a similar process using MSMQ. The idea was to deliver emergency messages to all of our stores in case of product recalls, or known issues that affect a large number of stores. It was developed and tested OK.
We ended up not using MSMQ because of a business requirement - we needed to know if a message was not received immediately so that we could call the store, rather than just letting the store get it when their PC was able to pick up the message from the queue. However, it did work very well.
The article I linked to above is a good place to start.
Our current design, the one that we went live with, does exactly what you asked about, with a Windows service.
We have a web page to enter messages and pick distribution lists; these are saved in a database.
We have a separate Windows service (we call it the AlertSender) that polls the database and checks for new messages.
The store-level PCs have a Windows service that hosts a WCF client that listens for messages (the AlertListener).
When the AlertSender finds messages that need to go out, it sends them to the AlertListener, which is responsible for displaying the message to the stores and playing an alert sound.
As the messages are sent, the AlertSender updates the status of the message in the database.
As stores receive the message, a co-worker enters their employee # and clicks a button to acknowledge that they've received the message. (Critical business requirement for us because if all stores don't get the message we may need to physically call them to have them remove tainted product from shelves, etc.)
Finally, our administrative piece has a report (ASP.NET) tied to an AlertId that shows all of the pending messages, and their status.
You could have the back-end import process write status records to the database as it completes sections of the task, and the web-app could simply poll the database at arbitrary intervals, and update a progress-bar or otherwise tick off tasks as they're completed, whatever is appropriate in the UI.
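The pattern itself is framework-agnostic. Purely as an illustration (sketched in Java here rather than .NET, with an invented import_progress table and columns), the importer side could look roughly like this; the page then just polls that row by job id and renders a progress bar:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

// Illustrative importer loop: after each file, update one progress row that
// the web page polls. Connection string, table, and columns are made up.
public class ImportJob {

    public static void main(String[] args) throws Exception {
        String jobId = args[0];
        List<String> files = List.of(args).subList(1, args.length);

        try (Connection db = DriverManager.getConnection(
                "jdbc:postgresql://localhost/app", "app", "secret")) {
            int done = 0;
            for (String file : files) {
                importFile(file);   // the actual long-running work
                done++;
                try (PreparedStatement update = db.prepareStatement(
                        "UPDATE import_progress SET files_done = ?, total_files = ?, last_file = ? WHERE job_id = ?")) {
                    update.setInt(1, done);
                    update.setInt(2, files.size());
                    update.setString(3, file);
                    update.setString(4, jobId);
                    update.executeUpdate();
                }
            }
        }
    }

    private static void importFile(String file) {
        // Parse and load the file; problems encountered could be appended
        // to a separate errors table keyed by job_id for display in the UI.
    }
}
```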

Email notification when an error occurs

I need to design a bug alert system where the web support team is notified via email when a user of our website encounters an error of any sort (a database exception, or a 404).
What would be the best way to design this section of the project? Any ideas would be appreciated.
You may want to look into using the global.asax file for application-wide error intercepting. A quick search yields this step-by-step walk-through:
http://aspnetresources.com/articles/CustomErrorPages.aspx
Depending on the volume of traffic you're expecting, sending an e-mail every time an error is intercepted may not be the best approach. At best, you'd flood inboxes (and make the support staff very unhappy), and at worst you'd get your mail servers blacklisted for spamming. The approach that I've used in the past on high-traffic sites is to queue up errors in a table that is read and purged at a set interval by a separate process. The process would aggregate the errors, grouping them by type, number of occurrences, etc, then send out an e-mail report to the support mailing lists.
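Purely as an illustration of that aggregation pass (sketched in Java, though the pattern is not tied to any stack, and with an invented error_log table), the separate process could look roughly like this:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Periodically aggregate queued errors by type and build one report e-mail
// instead of one e-mail per error. Connection string, table, and columns are
// made up; plug in whatever your error logging actually writes.
public class ErrorDigest {

    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/app", "app", "secret");
             Statement stmt = db.createStatement()) {

            StringBuilder report = new StringBuilder("Errors in the last interval:\n");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT error_type, COUNT(*) AS occurrences FROM error_log " +
                    "WHERE reported = false GROUP BY error_type ORDER BY occurrences DESC")) {
                while (rs.next()) {
                    report.append(rs.getString("error_type"))
                          .append(": ")
                          .append(rs.getInt("occurrences"))
                          .append(" occurrence(s)\n");
                }
            }

            // Send `report` to the support mailing list with your mail library of
            // choice, then mark the rows as reported (or purge them).
            stmt.executeUpdate("UPDATE error_log SET reported = true WHERE reported = false");
            System.out.println(report);
        }
    }
}
```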
ASP.NET health monitoring may be of interest: http://msdn.microsoft.com/en-us/library/ms998306.aspx. It's really simpler to use than this article first appears and doesn't require any additional components - it's all built-in.
I would implement an HttpModule that captures the application's Error event.
This would allow the module to be reused across multiple applications. The destination email addresses, SMTP server, etc. could be set in the HttpModule and overridden in the web.config file for maximum flexibility.
