Does U-SQL allow custom code to call external services?

In U-SQL custom code (code-behind or assemblies), can external services be called, e.g. Bing Search or Bing Maps?
Thanks,
Nasir

This is currently not supported, for the following reason:
Imagine that you write a UDF or UDO (e.g., an extractor) that calls a REST endpoint of a service that normally receives a few calls per minute from the same originating IP address. But now you execute this user code in a U-SQL job that is scaled out over millions of rows, possibly running on hundreds of vertices concurrently. This is a - hopefully unintended - distributed denial of service attack against that service, and it will most likely lead to that service experiencing an outage and our IP ranges getting blocked.
Thus, we are currently closing off our containers and recommend that you use other mechanisms (like obtaining a data set for coordinate translations) instead.

Related

What is the difference between tracing calls in a distributed system using the ELK stack and using Dynatrace distributed tracing?

In my project this week I got a backlog item where I was asked to analyze how we can use distributed tracing in our microservices. We already have the ELK stack, where we check logs in the Kibana dashboard. I tried to filter calls using a correlation id and this works fine. However, our application also uses Dynatrace for tracing. My question is: if we have the ELK stack to check logs and find issues in PRD, what extra does Dynatrace distributed tracing give us? What is the main difference that I am not able to understand? Can anyone please try to explain it in simple language? I am new to this topic, so some concepts are not clear to me yet.
Distributed tracing is an industry method that allows developers to monitor the performance of the APIs they use without actually being able to analyze the backing microservice's code.
From the official docs:
ELK distributed tracing enables you to analyze performance throughout your microservice architecture by tracing the entirety of a request - from the initial web request on your front-end service all the way to database queries made on your back-end services.
It works by injecting a custom traceparent HTTP header into outgoing requests. This header includes information, like trace-id, which is used to identify the current trace, and parent-id, which is used to identify the parent of the current span on incoming requests or the current span on an outgoing request.
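To make this concrete, here is a minimal sketch (Node.js, plain https module; the URL and id values are made up for illustration) of what injecting a W3C traceparent header into an outgoing call looks like:

const https = require('https');

// W3C Trace Context format: <version>-<trace-id>-<parent-id>-<flags>
// trace-id (32 hex chars) is shared by every span of one request;
// parent-id (16 hex chars) identifies the calling span.
const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';   // example value
const parentId = '00f067aa0ba902b7';                  // example value

https.get('https://backend.example.com/orders', {
  headers: { traceparent: `00-${traceId}-${parentId}-01` }
}, res => {
  // the downstream service reads traceparent, keeps the same trace-id,
  // and sends a new parent-id on its own outgoing calls
  console.log('status', res.statusCode);
});

Kibana/APM then groups everything that shares one trace-id into a single waterfall view, which is essentially what you are doing by hand today with the correlation id.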
Dynatrace distributed tracing is almost the same, plus code-execution analysis. In Dynatrace's words: PurePaths plus code-level visibility and AI-based application performance monitoring. It is an easy way to see how your microservices work together, the full service flow, and problem analysis.
More explanations are available on the Dynatrace blog.

How can I track WebSockets in Node.js (ws) using Azure Application Insights?

I'm using Node.js and ws for my WebSocket servers and want to know the best-practice methods for tracking connections and incoming and outgoing messages with Azure Application Insights.
It appears as though this service is really only designed for HTTP requests and responses, so would I be fine if I tracked everything as an event? I'm currently passing the JSON.parse'd connection message values.
What to do here really depends on the semantics of your WebSocket operations. You will have to track these manually, since the Application Insights SDK can't infer the semantics to map to Request/Dependency/Event/Trace the same way it can for HTTP. The method names in the API do indeed make this unclear for non-HTTP, but it becomes clearer if you map the methods to the telemetry schema they generate and to what those item types actually represent.
If you would consider receiving a socket message to semantically begin an "operation" that triggers dependencies in your code, you should use trackRequest to record this information. This will populate the information in the most useful way for you to take advantage of the UI in the Azure Portal (e.g. response-time analysis in the Performance blade or failure-rate analysis in the Failures blade). Because this request isn't HTTP, you'll have to bend your data to fit the schema a bit. An example:
client.trackRequest({name:"WS Event (low cardinality name)", url:"WS Event (high cardinality name)", duration:309, resultCode:200, success:true});
In this example, use the name field to indicate that items sharing this name are related and should be grouped in the UI. Use the url field for information that more completely describes the operation (like GET parameters would in HTTP). For example, name might be "SendInstantMessage" and url might be "SendInstantMessage/user:Bob".
In the same way, if you consider sending a socket message to be a request for information from your app that has a meaningful impact on how your "operation" behaves, you should use trackDependency to record this information. Much like above, doing this will populate the data in the most useful way to take advantage of the Portal UI (Application Map, in this case, would then be able to show you the % of failed WebSocket calls).
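A minimal sketch of what that could look like, mirroring the trackRequest example above (the field values and names are illustrative, not prescribed by the SDK):

client.trackDependency({
  target: "wss://chat.example.com",        // illustrative endpoint
  name: "SendInstantMessage",              // low-cardinality name used for grouping
  data: "SendInstantMessage/user:Bob",     // high-cardinality detail
  duration: 12,                            // ms, if you measure the round trip
  resultCode: 0,
  success: true,
  dependencyTypeName: "WebSocket"
});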
If you find you're using websockets in a way that doesn't really fit into these, tracking as an event as you are now would be the correct use of the API.
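For completeness, the event-based tracking you are doing now would look something like this (the property names are just examples):

client.trackEvent({
  name: "WS message received",                        // groups all messages of this kind
  properties: { messageType: "chat", user: "Bob" }    // your JSON.parse'd values
});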

How to handle network calls in Microservices architecture

We are using a microservices architecture where top-level services expose REST APIs to end users and backend services do the work of querying the database.
When we get one user request, we make ~30k requests to the backend services. We are using RxJava for the top-level service, so all 30k requests get executed in parallel.
We are using HAProxy to distribute the load between the backend services.
However, when we get 3-5 user requests we start getting network connection exceptions: No Route to Host exceptions and socket connection exceptions.
What are the best practices for this kind of use case?
Well, you ended up with the classical microservice mayhem. It's completely irrelevant what technologies you employ - the problem lies in the way you applied the concept of microservices!
It is natural in this architecture that services call each other (preferably, that should happen asynchronously!!). Since I know only a little about your service APIs, I'll have to make some assumptions about what went wrong in your backend:
I assume that a user makes a request to one service. This service will now (obviously synchronously) query another service and receive these 30k records you described. Since you probably have to know more about these records, you now have to make another request per record to a third service/endpoint to aggregate all the information your frontend requires!
This shows me that you probably got the whole thing with bounded contexts wrong! So much for the analytical part. Now to the solution:
Your API should return all the information along with the query that enumerates the records! Sometimes that can seem like a contradiction to the kind of isolation and authority over data/state that the microservices pattern specifies - but it is not feasible to isolate data/state in one service only, because that leads to the problem you currently have: all other services HAVE to query that data every time to be able to return correct data to the frontend! It is, however, possible to duplicate it as long as the authority over the data/state is clear!
Let me illustrate that with an example: let's assume you have a classical shop system. Articles are grouped. Now you would probably write two microservices - one that handles articles and one that handles groups! And you would be right to do so! You might have already decided that the group-service will hold the relation to the articles assigned to a group! Now, if the frontend wants to show all items in a group, what happens? The group-service receives the request and returns 30,000 article numbers in a beautiful JSON array that the frontend receives. This is where it all goes south: the frontend now has to query the article-service for every article it received from the group-service!!! And you're screwed!
Now there are multiple ways to solve this problem. One is (as previously mentioned) to duplicate article information to the group-service: every time an article is assigned to a group using the group-service, it has to read all the information for that article from the article-service and store it, so it can return it with the get-me-all-the-articles-in-group-x query. This is fairly simple, but keep in mind that you will need to update this information when it changes in the article-service, or you'll be serving stale data from the group-service. Event sourcing can be a very powerful tool in this use case and I suggest you read up on it! You can also use simple messages sent from one service (in this case the article-service) to a message bus of your preference, and make the group-service listen and react to these messages, as sketched below.
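A rough sketch of that messaging approach (the bus and message names here are hypothetical stand-ins for your broker of choice):

// trivial in-process stand-in for a real message bus (RabbitMQ, Kafka, ...)
const bus = {
  handlers: {},
  subscribe(topic, fn) { (this.handlers[topic] = this.handlers[topic] || []).push(fn); },
  publish(topic, msg) { (this.handlers[topic] || []).forEach(fn => fn(msg)); }
};

// the group-service keeps its own copy of the article fields it needs,
// updated whenever the article-service publishes a change
const articleCache = new Map();  // articleId -> article data

bus.subscribe('article-updated', msg => articleCache.set(msg.articleId, msg.article));

function getArticlesInGroup(group) {
  // no cross-service calls: answer from the locally duplicated data
  return group.articleIds.map(id => articleCache.get(id));
}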
Another very simple, quick-and-dirty solution could be to provide a new REST endpoint on the article-service that takes an array of article ids and returns the information for all of them in one response, which would be much quicker. This could probably solve your problem very quickly.
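A minimal sketch of such a batch endpoint (shown with Node/Express for brevity - your stack is Java/RxJava, but the shape is identical; the route and helper names are made up):

const express = require('express');
const app = express();
app.use(express.json());

// POST /articles/batch with body { "ids": [1, 2, 3, ...] }
app.post('/articles/batch', async (req, res) => {
  const ids = req.body.ids || [];
  res.json(await loadArticlesByIds(ids));   // one call instead of 30k
});

// stand-in for a single database query, e.g. SELECT ... WHERE id IN (...)
async function loadArticlesByIds(ids) {
  return ids.map(id => ({ id, title: `article ${id}` }));
}

app.listen(3000);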
A good rule of thumb in a backend with microservices is to aim for a constant number of these cross-service calls, which means the number of calls that cross service boundaries should never be directly related to the amount of data that was requested! We closely monitor which service calls are made for a given request that comes through our API, to keep track of which services call which other services and where our performance bottlenecks will arise or have been caused. Whenever we detect that a service makes many calls to other services (there is no fixed threshold, but every time I see >4 I start asking questions!), we investigate why and how this could be fixed! There are some great metrics tools out there that can help you with tracing requests across service boundaries!
Let me know if this was helpful or not, and whatever solution you implemented!

Use a web service in the same project or handle it with code?

This is a theoretical question.
Imagine an ASP.NET website. By clicking a button, the site sends mail. Now:
I can send the mail asynchronously with code
I can send the mail using QueueBackgroundWorkItem
I can call a ONEWAY web service located in the same website
I can call a ONEWAY web service located in ANOTHER website (or another subdomain)
None of the above solutions wait for the mail operation to be completed, so they are fine.
My question is: why should I use the service solution instead of the other solutions? Is there an advantage?
The 4th solution adds additional TCP/IP traffic to use the service, so it's not efficient, right?
If so, using a service under the same website (3rd solution) also generates additional traffic. Is that correct?
I need to understand why people use services under the same website. Is there any reason besides making something available to AJAX calls?
Any information would be great. I really need to get opinions.
Best
The most appropriate architecture will depend on several factors:
the volume of emails that needs to be sent
the need to reuse the email sending capability beyond the use case described
the simplicity of implementation, deployment, and maintenance of the code
Separating out the sending of emails into a service, either in the same or another web application, will make it available to other applications and to client-side code. It also adds some complexity to the code calling the service, as it will need to deal with the case when the service is not available and handle errors that may occur when placing the call.
Using a separate web application for the service is useful if the volume of emails sent is really large, as it allows you to offload the work to one or more servers if needed. Given the use case described (a user clicks a button), this seems rather unlikely, unless the web site will have really large traffic. Creating a separate web application adds significant development, deployment, and maintenance work, initially and over time.
Unless the volume of emails to be sent is really large (millions per day) or there is a need to reuse the email capability in other systems, creating the email sending function within the same web application (first two options listed in the question) is almost certainly the best way to go. It will result in the least amount of initial work, is easy to deploy, and (perhaps most importantly) will be the easiest to maintain.
An important concern to pay significant attention to when implementing an email sending function is robustness. Robustness can be achieved with any of the possible architectures and is somewhat of a different concern from the one emphasized in the question. However, it is important to consider the proper course of action if (1) the receiving SMTP server refuses to take the message (e.g., mailbox full; non-existent account; rejection as spam) or (2) an NDR is generated after the message is sent (e.g., rejection as spam). Depending on the kind of email sent, it may be OK to ignore these errors, or some corrective action may be needed (e.g., retry sending, alert the user who originated the email, ...).
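Whatever architecture is chosen, the error-handling shape is roughly the same. Here is an illustrative sketch (sendMail is a hypothetical stand-in for whatever mail API the chosen stack provides, e.g. SmtpClient in ASP.NET):

// retry a transient failure a few times, then give up and surface the error,
// e.g. by logging it or alerting the user who originated the email
async function sendWithRetry(message, attempts = 3) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await sendMail(message);              // the real send call goes here
      return true;
    } catch (err) {
      if (attempt === attempts) {
        console.error('giving up on', message.subject, err);
        return false;
      }
      await new Promise(r => setTimeout(r, 1000 * attempt));  // simple backoff
    }
  }
}

// stand-in implementation so the sketch is self-contained
async function sendMail(message) {
  if (!message.to) throw new Error('missing recipient');
}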

Event Driven Architecture - Service Contract Design

I'm having difficulty turning a requirement I have into something that will fit into our nascent SOA/EDA.
We have a component I'll call the Data Downloader. This is a facade for an external data provider that has both high latency and a cost associated with every request. I want to take this component and create a re-usable service out of it with a clear contract definition. It is up to me to decide how that contract should work, however its responsibilities are two-fold:
Maintain the parameter list (called a Download Definition) for an upcoming scheduled download
Manage the technical details of the communication to the external service
Basically, it manages the 'how' of the communication. The 'what' and the 'when' are the responsibilities of two other components:
The 'what' is managed by 'Clients' who are responsible for determining the parameters for the download.
The 'when' is managed by a dedicated scheduling component. Because of the cost associated with the downloads we'd like to batch the requests intraday.
Hopefully this sequence diagram explains the responsibilities of the services:
Because each of the responsibilities are split out in three different components, we get all sorts of potential race conditions with async messaging. For instance when the Scheduler tells the Downloader to do its work, because the 'Append to Download Definition' command is asynchronous, there is no guarantee that the pending requests from Client A have actually been serviced. But this all screams high-coupling to me; why should the Scheduler necessarily know about any 'prerequisite' client requests that need to have been actioned before it can invoke a download?
Some potential solutions we've toyed with:
Make the 'Append to Download Definition' command a blocking request/response operation. But this then breaks the performance and scalability benefits of having an EDA.
Build something in the Downloader to ensure that it only runs when there are no pending commands in its incoming request queue. But that then introduces a dependency on the underlying messaging infrastructure which I don't like either.
Makes me think I'm thinking about this problem in a completely backward way. Or is this just a classic case of someone trying to fit a synchronous RPC requirement into an async event-driven architecture?
The thing I like most about EDA and SOA is that it almost completely eliminates the notion of race conditions. As long as your events are associated with some association key (e.g. downloadId), the problem you describe can be addressed with several solutions of different complexities, depending on your needs. I'm not sure I totally understand the described use case, but I will try my best.
Off the top of my head:
The DataDownloader maintains a list of received Download Definitions and a list of triggered downloads. When a definition is received, it is checked against the triggers list to see if the associated download has already been triggered, and if it has, the download is executed. When a TriggerDownloadCommand is received, the definitions list is checked for a definition with the associated downloadId, and if one is found, the download is executed.
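A minimal in-memory sketch of that first option (downloadId and executeDownload are illustrative names):

// the download runs as soon as both the definition and the trigger have arrived,
// regardless of the order in which the two messages are received
const definitions = new Map();   // downloadId -> definition
const triggered = new Set();     // downloadIds already triggered

function onDownloadDefinition(downloadId, definition) {
  definitions.set(downloadId, definition);
  if (triggered.has(downloadId)) executeDownload(definition);
}

function onTriggerDownloadCommand(downloadId) {
  triggered.add(downloadId);
  if (definitions.has(downloadId)) executeDownload(definitions.get(downloadId));
}

function executeDownload(definition) {
  // placeholder for the actual call to the external data provider
  console.log('downloading with', definition);
}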
For more complex situations, consider using the Saga pattern, which is implemented by some third-party messaging infrastructures. With some simple configuration, it will handle both messages and initiate the actual download when the required condition is satisfied. This is more appropriate for distributed systems, where an in-memory collection is out of the question.
You can also configure your scheduler (or the trigger command handler) to retry when an error is signaled (e.g. by an exception), in order to avoid that race condition, and ultimately give up after a specified timeout.
Does this help?
