What is the BatchSize parameter in the SignalR Crank load testing tool?
Also, please help me understand what the "connectbatch" function does. As I understand it, it just creates n client connections based on the input parameter "clients"?
BatchSize defines the number of connections that are created in one batch.
Once the batch size is reached, Crank waits for all created clients to connect successfully before it moves on.
Crank (and SignalR in general) leans heavily on the TPL and async/await, which enables a large number of connection requests to be created in one batch.
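For illustration only, here is a sketch in Python (not Crank's actual C# code) of what batched connection creation looks like; connect_client() is a hypothetical stand-in for opening one client connection:

```
# Hypothetical sketch (not Crank's code): create client connections in batches
# of batch_size and wait for every connection in a batch to succeed before
# starting the next batch, which is what BatchSize implies.
import asyncio

async def connect_client(i: int):
    # stand-in for opening one client connection (assumption)
    await asyncio.sleep(0.01)
    return f"connection-{i}"

async def connect_batches(clients: int, batch_size: int):
    connections = []
    for start in range(0, clients, batch_size):
        batch = [connect_client(i) for i in range(start, min(start + batch_size, clients))]
        # the whole batch is awaited before moving on
        connections.extend(await asyncio.gather(*batch))
    return connections

# asyncio.run(connect_batches(clients=1000, batch_size=50))
```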
PROBLEM
Our PROCESSING SERVICE is serving UI, API, and internal clients and listening for commands from Kafka.
A few API clients might create a lot of generation tasks (one task is N messages) in a short time. With Kafka, we can't control how commands are distributed, because each command goes to a partition that is consumed by exactly one processing instance (aka worker). Thus, UI requests could wait too long while API requests are being processed.
In an ideal implementation, we would handle all tasks evenly, regardless of their size. The capacity of the processing service would be distributed among all active tasks, and even if the cluster is heavily loaded, we could always be sure that a newly arrived task starts processing almost immediately, at least before the processing of all the other tasks ends.
SOLUTION
Instead, we want an architecture with a separate queue per combination of customer and endpoint. This gives us much better isolation, as well as the ability to dynamically adjust throughput on a per-customer basis.
On the producer side:
the task comes from the client
immediately create a queue for this task
send all messages to this queue
On the consumer side:
in one process, you constantly update the list of queues
in other processes, you follow this list and consume, for example, one message from each queue (see the sketch after this list)
scale consumers
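A minimal sketch of that consumer side, assuming a hypothetical broker client that exposes list_queues() and receive_one():

```
# Hypothetical sketch of the consumer side: one background thread refreshes the
# list of per-task queues, while the worker loop takes at most one message from
# each queue per pass. The broker client API is made up for illustration.
import threading
import time

class FairConsumer:
    def __init__(self, broker):
        self.broker = broker      # hypothetical client with list_queues()/receive_one()
        self.queues = []
        threading.Thread(target=self._refresh, daemon=True).start()

    def _refresh(self):
        while True:
            self.queues = self.broker.list_queues()   # keep the queue list up to date
            time.sleep(1)

    def run(self, handle):
        while True:
            for q in list(self.queues):
                msg = self.broker.receive_one(q)       # at most one message per queue per pass
                if msg is not None:
                    handle(q, msg)
```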
QUESTION
Is there any common solution to such a problem, using RabbitMQ or any other tooling? Historically, we use Kafka on the project, so an approach that uses it would be amazing, but we can use any technology for the solution.
Why not use Spark to execute the messages within the task? What I'm thinking is that each worker creates a Spark context that then parallelizes the messages. The function that is mapped can be based on which Kafka topic the user is consuming. I suspect, however, that your queues might have tasks that contain a mixture of messages: UI, API calls, etc. This will result in a more complex mapping function. If you're not using a standalone cluster and are using YARN or something similar, you can change the queueing method that the Spark master is using.
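A minimal sketch of that idea with PySpark; handle_message() is a hypothetical placeholder for the real processing logic:

```
# Hypothetical sketch: parallelize the messages of one task with Spark.
from pyspark import SparkContext

sc = SparkContext(appName="task-worker")   # one context per worker process

def handle_message(msg):
    # placeholder for the real message processing (assumption)
    return {"message": msg, "status": "done"}

def process_task(messages):
    # distribute the task's messages across the cluster and process them
    return sc.parallelize(messages).map(handle_message).collect()
```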
As I understand the problem, you want to isolate requests per customer using dynamically allocated queues, which would allow each customer's tasks to be executed independently. The problem looks similar to the head-of-line blocking issue in networking.
Dynamically allocating queues is difficult. It can also lead to an explosion in the number of queues, which can be a burden on the infrastructure. Also, some queues could be empty or very lightly loaded. RabbitMQ won't help here; it is a queue with a different protocol than Kafka.
One alternative is to use a custom partitioner in Kafka that can look at the partition load and balance the tasks based on it. This works if the tasks are independent in nature and no state store is maintained in the worker.
The other alternative would be to load balance at the customer level. In this case you select a dedicated set of predefined queues for a set of customers, so customers with certain IDs will be served by a given set of queues. The downside is that some queues can have less load than others. This solution is similar to Virtual Output Queuing in networking.
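For the Kafka case, a rough sketch of that customer-level mapping using kafka-python; the topic name, partition counts, and group size are illustrative assumptions:

```
# Hypothetical sketch: pin each customer to a dedicated subset of partitions
# (similar in spirit to Virtual Output Queuing). Topic name, partition counts
# and group size are illustrative assumptions, not taken from the question.
import zlib
from kafka import KafkaProducer   # kafka-python

TOPIC = "generation-tasks"        # hypothetical topic
PARTITIONS_PER_GROUP = 4          # dedicated partitions per customer group
TOTAL_PARTITIONS = 32             # total partitions of the topic

producer = KafkaProducer(bootstrap_servers="localhost:9092")

def partition_for(customer_id: str, message_index: int) -> int:
    # stable hash -> customer group -> spread that customer's messages
    # round-robin across the group's dedicated partitions
    group = zlib.crc32(customer_id.encode()) % (TOTAL_PARTITIONS // PARTITIONS_PER_GROUP)
    return group * PARTITIONS_PER_GROUP + (message_index % PARTITIONS_PER_GROUP)

def send_task(customer_id: str, messages) -> None:
    for i, payload in enumerate(messages):
        producer.send(TOPIC, value=payload, partition=partition_for(customer_id, i))
    producer.flush()
```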
My understanding is that partitioning the messages does not ensure an even load balance. I think you should avoid over-engineering and building custom stuff on top of the Kafka partitioner, and instead think about a good partitioning key that will allow you to use Kafka efficiently.
grpc_impl::ServerReaderWriter/grpc_impl::internal::ReaderInterface implement NextMessageSize(), but from the naming it looks like it'd only return the size of the immediate next message, and from this thread and the documentation it seems that the return value is only an upper bound.
For streaming applications (e.g. audio/video, text, any real time duplex streams), it'd be helpful to know how much data arrived from the client, so that it could be e.g. processed in bulk, or to measure non-realtimeness, or to adapt to variable streaming rates, etc.
Thanks for any pointers and explanations.
The current API does not provide such capabilities. It is normally recommended to keep reading from the stream especially if the application is expecting to receive messages. If the application stops reading, gRPC would also stop reading at some point depending on how resource quota is configured. Even if the configuration is such that gRPC never stops reading, we risk gRPC consuming too much memory.
It seems to me that what you want is to build a layer on top of gRPC that will buffer messages so that you can process them in bulk and perform measurements.
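A minimal sketch of such a buffering layer in Python gRPC: a background thread keeps draining the request iterator into a queue, so the handler can see how many messages have actually arrived and pull them in bulk (the servicer method name is hypothetical):

```
# Hypothetical sketch: buffer an incoming gRPC stream so the handler can
# inspect and process the backlog in bulk. Service/method names are made up.
import queue
import threading

class BufferedStream:
    """Drains a streaming request iterator into an in-memory queue."""

    def __init__(self, request_iterator):
        self._queue = queue.Queue()
        self._done = threading.Event()
        threading.Thread(target=self._drain, args=(request_iterator,), daemon=True).start()

    def _drain(self, request_iterator):
        for msg in request_iterator:   # keep reading so gRPC flow control stays happy
            self._queue.put(msg)
        self._done.set()

    def pending(self) -> int:
        # approximate number of messages that have arrived but not been consumed
        return self._queue.qsize()

    def take_all(self):
        # pull everything currently buffered, for bulk processing
        batch = []
        while True:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                return batch

# Inside a bidi-streaming servicer method (hypothetical):
# def Chat(self, request_iterator, context):
#     stream = BufferedStream(request_iterator)
#     ...
```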
I have a BizTalk orchestration which processes a single message. These messages are actually batches of messages. Most of the time, the batch size n is small (<1,000), but once in a while there are very large batches (>50,000). We have a high throughput of messages as well.
The orchestration takes an amount of system memory that is linear, O(n), in the batch size, and I know from observation that a single server can process up to an accumulated batch size of ~250k in parallel before it runs out of system memory and only returns OutOfMemoryExceptions. (This kills the BizTalk host instance, the orchestrations start up on another host, and that host ultimately breaks again, leaving our BizTalk group in a broken state that can currently only be recovered by manual intervention.)
Small batches are common, large batches are rare but kind of deadly if there is more than one at the same time.
I know the batch size in advance, so I could tell BizTalk about it, but I see no way to interact with throttling. By the time throttling detects a lack of system memory, it is already too late.
Do I have to build my own queueing and dispatching on top of BizTalk to achieve my goals?
Our current solution is to use a semaphore with a value of 8; every large message (n > 1,000) needs to get a semaphore slot before it is allowed to start processing. We had an edge case the other day where even this was too much. We reduced 8 to 4 to resolve it, but now we have noticeably impacted the general throughput.
Any idea or hint is welcome!
Don't use XmlDocument within your processing. It will further exacerbate your memory issues. Prefer XmlReader for sure here. However, I'd still try to move processing outside of your orchestration. Even if you can get the streaming working in a .NET component called from the orchestration, you can still end up with an orchestration instance that runs for a long time and consumes lots of memory, which should be avoided whenever possible. Therefore...
Avoid letting the orchestration get messages that large to begin with. It may be possible to debatch the message using the OOB XmlDisassembler if you can mark the schema as an envelope schema; if not, you may need to create a custom disassembler component to do your debatching (just remember to promote/write the proper context properties to the newly created messages from the original). If you use some streaming techniques (see https://www.microsoft.com/en-us/download/details.aspx?id=20375) in the pipeline, you can greatly reduce the memory footprint and have much greater control there. Again, use XmlReader to actually parse and debatch the message (it shouldn't be super difficult; look into ReadToFollowing and ReadSubtree, as in this question: Splitting large xml files in to sub files without memory contention). A rough sketch of the streaming debatch idea follows at the end of this answer. You might get away with doing this in an orchestration rather than a pipeline component, but in a pipeline component it should be easier to control memory usage. You may also look into promoting things like a batch ID if you need to correlate the messages back together.
If you get a large batch, you will still need to throttle the number of concurrent orchestrations; you could do so as Richard Seroter suggests here, which uses multiple convoys that correlate on instance IDs to prevent too many from running at once. Alternatively, you could use ordered delivery on the receive shape (see MSDN), which would probably be my preferred option as it takes significantly less work and won't face the concerns around zombie messages that are possible with convoys.
Basically: try to think small and lean as much as possible and BizTalk will be happier. BizTalk would much rather process 1000 small messages in a second than 1 very large message in a minute.
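BizTalk pipeline components are .NET, but purely to show the streaming shape of the debatching suggested above, here is an analogous sketch in Python using iterparse; the "Message" element name and the batchId attribute are illustrative assumptions:

```
# Hypothetical sketch (not BizTalk code): stream through a large batch file and
# emit each child message separately, keeping memory usage flat. The element
# name "Message" and the batchId attribute are illustrative assumptions.
import xml.etree.ElementTree as ET

def debatch(path, emit):
    """Walk the envelope as a stream and hand each message to emit()."""
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)            # first start event is the envelope/root
    batch_id = root.get("batchId")     # e.g. promote this for correlation
    for event, elem in context:
        if event == "end" and elem.tag.endswith("Message"):
            emit(batch_id, ET.tostring(elem))
            root.clear()               # drop processed children -> flat memory

# usage (send_to_messagebox is a hypothetical sink):
# debatch("big_batch.xml", lambda bid, msg: send_to_messagebox(bid, msg))
```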
We have a BizTalk 2010 receive location which gets a 70 MB file and then uses an inbound map (in the receive location) and an outbound map (in the send port) to produce a 1 GB file.
While performing the above process, a lot of disk I/O is consumed in SQL Server, and the performance of other receive location processes is highly affected.
We have tried reducing the maximum disk I/O threads in the host instance of that receive location, but it still consumes a lot of disk I/O in SQL Server.
In fact, this process's priority is very low. Is there any way to reduce the disk I/O usage of this process so that the performance of other processes stays normal?
This issue isn't related to the speed of the file input but, as you mentioned in a comment, to the load placed on the MessageBox when trying to persist the 1 GB map output. You have a few options here to try to minimize the impact this will have on other processes:
Adjust the throttling settings on the newly created host to something very low. This may or may not work the way you want it to though.
Set a service window on the receive location for these files so that they only run during off hours. This would be ideal if you don't have 24/7 demand on the MessageBox and can afford to have slow response times in the middle of the night (say 2-3am).
If your requirements can handle this, don't map the file in the receive port, but instead route it to an orchestration and/or custom pipeline component that will split it into smaller pieces and then map the smaller pieces. This should at least give you more fine-grained control over the speed at which these are processed (have a delay shape in the loop that processes the pieces). There'd still possibly be issues when you joined them back together, but it shouldn't be as bad as your current process.
It may also be worth looking at your map. If there are lots of slow/processor-heavy calls, you might be able to refactor it.
Ideally you should debatch the file, apply business logic (including the map) to each individual segment, and then load them into SQL one at a time. Later you can use a pipeline or some other .NET component to pull the data from SQL and rebatch it. Handling big XML (10 times the size of the equivalent flat file) in the BizTalk MessageBox is not a very good practice.
If, however, it is a pure messaging scenario, you can convert the file into a stream and route it to the destination.
I was reading a comment about server architecture.
http://news.ycombinator.com/item?id=520077
In this comment, the person says 3 things:
The event loop, time and again, has been shown to truly shine for a high number of low activity connections.
In comparison, a blocking IO model with threads or processes has been shown, time and again, to cut down latency on a per-request basis compared to an event loop.
On a lightly loaded system the difference is indistinguishable. Under load, most event loops choose to slow down, most blocking models choose to shed load.
Are any of these true?
There is also another article here titled "Why Events Are A Bad Idea (for High-concurrency Servers)":
http://www.usenix.org/events/hotos03/tech/vonbehren.html
Typically, if the application is expected to handle millions of connections, you can combine the multi-threaded paradigm with the event-based one.
First, spawn N threads, where N == the number of cores/processors on your machine. Each thread will have a list of asynchronous sockets that it's supposed to handle.
Then, for each new connection from the acceptor, "load-balance" the new socket to the thread with the fewest sockets.
Within each thread, use an event-based model for all the sockets, so that each thread can actually handle multiple sockets "simultaneously."
With this approach,
You never spawn a million threads. You just have as many as your system can handle.
You utilize the event-based model on multiple cores as opposed to a single core.
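A minimal sketch of this combination in Python, assuming a plain TCP echo handler and an arbitrary port (both hypothetical):

```
# Hypothetical sketch: N event-loop threads, each handling many sockets, with
# new connections assigned to the least-loaded loop. The port and the trivial
# echo handler are illustrative assumptions.
import asyncio
import os
import socket
import threading

class LoopWorker:
    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.active = 0   # approximate number of sockets owned by this loop
        threading.Thread(target=self.loop.run_forever, daemon=True).start()

    def assign(self, conn: socket.socket) -> None:
        self.active += 1
        asyncio.run_coroutine_threadsafe(self._handle(conn), self.loop)

    async def _handle(self, conn: socket.socket) -> None:
        conn.setblocking(False)
        loop = asyncio.get_running_loop()
        try:
            while data := await loop.sock_recv(conn, 4096):
                await loop.sock_sendall(conn, data)   # trivial echo handler
        finally:
            conn.close()
            self.active -= 1

def serve(host="0.0.0.0", port=9000):
    workers = [LoopWorker() for _ in range(os.cpu_count() or 4)]
    with socket.create_server((host, port)) as server:
        while True:
            conn, _ = server.accept()
            # "load-balance": hand the socket to the loop with the fewest sockets
            min(workers, key=lambda w: w.active).assign(conn)
```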
Not sure what you mean by "low activity", but I believe the major factor would be how much you actually need to do to handle each request. Assuming a single-threaded event-loop, no other clients would get their requests handled while you handled the current request. If you need to do a lot of stuff to handle each request ("lots" meaning something that takes significant CPU and/or time), and assuming your machine actually is able to multitask efficiently (that taking time does not mean waiting for a shared resource, like a single CPU machine or similar), you would get better performance by multitasking. Multitasking could be a multithreaded blocking model, but it could also be a single-tasking event loop collecting incoming requests, farming them out to a multithreaded worker factory that would handle those in turn (through multitasking) and sending you a response ASAP.
I don't believe slow connections with the clients matter that much, as I believe the OS would handle them efficiently outside of your app (assuming you don't block the event loop for multiple round trips with the client that initially initiated the request), but I haven't tested this myself.
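To illustrate the "event loop + multithreaded worker factory" variant described above, here is a minimal sketch in Python; the port and expensive_handler() are hypothetical:

```
# Hypothetical sketch: a single event loop accepts requests and farms the heavy
# work out to a worker pool, replying as soon as each result is ready.
import asyncio
from concurrent.futures import ThreadPoolExecutor

def expensive_handler(payload: bytes) -> bytes:
    # stand-in for work that takes significant CPU and/or time (assumption)
    return payload[::-1]

pool = ThreadPoolExecutor(max_workers=8)   # the "worker factory"
# (for truly CPU-bound work in Python you would swap in a ProcessPoolExecutor)

async def on_client(reader, writer):
    loop = asyncio.get_running_loop()
    while request := await reader.readline():
        # the event loop stays free to serve other clients while the pool works
        response = await loop.run_in_executor(pool, expensive_handler, request)
        writer.write(response)
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(on_client, "0.0.0.0", 9001)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```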